Research on Cross-media Correlation Analysis by Fusing Semantic Features and Distribution Features
Liu Zhongbao 1,2, Zhao Wenjuan 1,2
1. Institute of Language Intelligence, Beijing Language and Culture University, Beijing 100083
2. Key Laboratory of Cloud Computing and Internet-of-Things Technology, Quanzhou University of Information Engineering, Quanzhou 362000
|
|
Abstract Media data of several types, such as text, images, video, and audio, are multi-source and heterogeneous, which gives rise to the semantic gap problem. Current research focuses mostly on text and images, presumably because it is difficult to measure the correlation among a larger variety of media types. We therefore perform cross-media correlation analysis by fusing semantic features and distribution features so as to produce a consistent representation of different types of media data. The different types of media data are first vectorized and input into the proposed model. Then, a bidirectional long short-term memory (BiLSTM) network is used to extract context information and obtain the feature vectors. Finally, the correlation between the different types of media data is analyzed by fusing the semantic features and distribution features, and all types of media data are represented consistently. Comparative experiments show that the proposed method outperforms several traditional methods, such as CCA (canonical correlation analysis), KCCA (kernel canonical correlation analysis), and Deep-SM (deep semantic matching), which indicates that it can precisely detect the semantic correlation between different types of media data. The paper offers guidance and reference for research on cross-media correlation analysis.
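To make the pipeline in the abstract concrete, the following is a minimal PyTorch sketch: each modality's vectorized input is encoded by its own BiLSTM, and the resulting features are trained with a fused objective in which a cosine-similarity term stands in for the semantic features and a simple moment-matching term stands in for the distribution features. All module names, dimensions, and the fusion weight are illustrative assumptions for exposition, not the authors' published implementation.

```python
# Sketch of the abstract's pipeline: vectorize -> BiLSTM encoding ->
# fused semantic + distribution correlation objective. Assumed, not
# the paper's exact model.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BiLSTMEncoder(nn.Module):
    """Encode a sequence of input vectors into one feature vector."""

    def __init__(self, in_dim: int, hidden_dim: int, out_dim: int):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden_dim, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, in_dim); mean-pool the BiLSTM outputs
        # to obtain one context-aware feature vector per item.
        out, _ = self.lstm(x)
        return self.proj(out.mean(dim=1))


def fused_correlation_loss(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Fuse a semantic term (cosine similarity of paired items) with a
    distribution term (matching the first two moments of each feature set).
    Both terms and the 0.1 weight are illustrative assumptions."""
    semantic = 1.0 - F.cosine_similarity(a, b).mean()
    distribution = ((a.mean(0) - b.mean(0)).pow(2).sum()
                    + (a.var(0) - b.var(0)).pow(2).sum())
    return semantic + 0.1 * distribution


if __name__ == "__main__":
    # Hypothetical dimensions: 300-d word embeddings for text,
    # 2048-d CNN region features for images, shared 64-d output space.
    text_enc = BiLSTMEncoder(in_dim=300, hidden_dim=128, out_dim=64)
    image_enc = BiLSTMEncoder(in_dim=2048, hidden_dim=128, out_dim=64)
    text = torch.randn(8, 20, 300)    # 8 texts, 20 tokens each
    image = torch.randn(8, 49, 2048)  # 8 images, 7x7 region grid
    loss = fused_correlation_loss(text_enc(text), image_enc(image))
    print(loss.item())
```

Minimizing such a loss pushes paired items from different media types toward a common representation while aligning the overall feature distributions, which is one plausible reading of fusing semantic and distribution features.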
|
Received: 07 April 2020
|
|
|
|
|
|
|