Li Zhiyi, Huang Zifeng, Xu Xiaomian. A Review of the Cross-Modal Retrieval Model and Feature Extraction Based on Representation Learning[J]. Journal of the China Society for Scientific and Technical Information (情报学报), 2018, 37(4): 422-435.