A Heterogeneous Multimodal Object Recognition Method Based on Deep Learning

Source journal: Journal of Central South University (Science and Technology), 2016, Issue 5

Authors: HU Chao, WEN Mengfei, LIU Weirong

Pages: 1580-1588

Keywords: object recognition; deep learning; convolutional neural network; restricted Boltzmann machine; canonical correlation analysis

Abstract: A heterogeneous multimodal object recognition method based on deep learning is proposed. First, because media streams carry audio and video information at the same time, a heterogeneous multimodal deep learning structure is constructed that combines the strengths of the convolutional neural network (CNN) and the restricted Boltzmann machine (RBM). The audio and video information are processed separately and in parallel, and a shared feature representation is generated using canonical correlation analysis (CCA). The temporal coherence of video frames is then exploited to optimize the parameters and further improve recognition accuracy. Experiments were conducted on a standard audio and face database and on clips taken from actual movies. The results show that for both video sources, the proposed method significantly improves object recognition accuracy.
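
The abstract does not say how the CCA-based shared representation is computed. As a rough illustration only, the pure-Python sketch below finds the first canonical pair between an "audio" feature matrix and a "video" feature matrix and fuses the two projections into one shared feature per sample. The ridge term, the power-iteration solver, the averaging fusion rule, the synthetic data (standing in for CNN/RBM outputs), and every function name here are assumptions for illustration, not the authors' method.

```python
import math
import random

def mean_center(rows):
    """Subtract the per-column mean from a list of equal-length feature rows."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def cov(a_rows, b_rows):
    """Sample cross-covariance A^T B / (n - 1) of two centered row lists."""
    n = len(a_rows)
    return [[sum(ra[i] * rb[j] for ra, rb in zip(a_rows, b_rows)) / (n - 1)
             for j in range(len(b_rows[0]))] for i in range(len(a_rows[0]))]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def inverse(m):
    """Gauss-Jordan inversion with partial pivoting (m square, non-singular)."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def first_canonical_pair(x_rows, y_rows, ridge=1e-6, iters=200):
    """Return (w_x, w_y, rho) for the leading CCA direction pair."""
    x, y = mean_center(x_rows), mean_center(y_rows)
    cxx, cyy, cxy = cov(x, x), cov(y, y), cov(x, y)
    for c in (cxx, cyy):  # assumed small ridge for numerical stability
        for i in range(len(c)):
            c[i][i] += ridge
    cyy_inv_cyx = matmul(inverse(cyy), transpose(cxy))
    # Leading eigenvector of Cxx^-1 Cxy Cyy^-1 Cyx gives w_x (power iteration).
    m = matmul(matmul(inverse(cxx), cxy), cyy_inv_cyx)
    w_x = [1.0] * len(m)
    for _ in range(iters):
        w_x = matvec(m, w_x)
        norm = math.sqrt(sum(v * v for v in w_x))
        w_x = [v / norm for v in w_x]
    w_y = matvec(cyy_inv_cyx, w_x)  # w_y is proportional to Cyy^-1 Cyx w_x
    rho = pearson(matvec(x, w_x), matvec(y, w_y))
    return w_x, w_y, rho

def shared_feature(x_rows, y_rows, w_x, w_y):
    """Hypothetical fusion: mean of the two standardized canonical projections."""
    def standardize(s):
        m = sum(s) / len(s)
        sd = math.sqrt(sum((v - m) ** 2 for v in s) / (len(s) - 1))
        return [(v - m) / sd for v in s]
    u = standardize(matvec(mean_center(x_rows), w_x))
    v = standardize(matvec(mean_center(y_rows), w_y))
    return [(a + b) / 2 for a, b in zip(u, v)]

# Synthetic demo: a latent signal z appears in one audio column and one video column.
random.seed(0)
audio, video = [], []
for _ in range(300):
    z = random.gauss(0, 1)
    audio.append([z + 0.1 * random.gauss(0, 1), random.gauss(0, 1)])
    video.append([random.gauss(0, 1), -z + 0.1 * random.gauss(0, 1), random.gauss(0, 1)])
w_a, w_v, rho = first_canonical_pair(audio, video)
fused = shared_feature(audio, video, w_a, w_v)
print(round(rho, 3), len(fused))
```

In this toy setup the canonical correlation should come out close to 1 and the weights should concentrate on the columns that carry the shared signal. In the paper's setting, the inputs would instead be the features produced by the audio and video branches.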
