Representation Learning of Academic Papers Based on Unsupervised Graph Neural Networks (基于无监督图神经网络的学术文献表示学习研究)

doi:10.3772/j.issn.1000-0135.2022.01.007

Journal of the China Society for Scientific and Technical Information (情报学报)

2022, Vol. 41, Issue 1: 62-72

Section: Information Analysis Methods and Techniques

Using Unsupervised Graphs of Neural Networks for Constructing Learning Representations of Academic Papers

Ding Heng, Ren Weiqiang, Cao Gaohui

School of Information Management, Central China Normal University, Wuhan 430079

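Abstract: Feature representation of academic papers is a key step in academic big-data services such as literature search, classification and organization, and personalized recommendation. Prior studies show that graph neural networks can learn paper representations effectively. However, existing work concentrates on supervised methods, which place high demands on the size and quality of the dataset and yield representations that are tightly coupled to a specific task. This paper therefore introduces four unsupervised graph neural network methods into representation learning for academic papers. Paper representation vectors are learned from the citation, co-citation, and bibliographic coupling networks of the Cora, CiteSeer, and DBLP (database systems and logic programming) datasets and applied to two downstream tasks: paper classification and paper recommendation. The results show that (1) Deep Graph Infomax is well suited to paper classification, while the adversarially regularized variational graph autoencoder performs better on paper recommendation; and (2) on the Cora dataset, the citation network is better suited than the co-citation and bibliographic coupling networks to learning general-purpose paper representation vectors.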
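As an illustration of the three network types named in the abstract, the sketch below derives co-citation and bibliographic coupling edges from a list of citation pairs. It uses the standard definitions only (two papers are co-cited when a third paper cites both; two papers are coupled when they share a reference). The function name `derive_networks` and the toy data are hypothetical, and this is not the authors' preprocessing pipeline.

```python
from collections import defaultdict
from itertools import combinations


def derive_networks(citations):
    """Build weighted co-citation and bibliographic coupling edges.

    citations: iterable of (citing, cited) pairs (the citation network).
    Returns two dicts mapping a sorted paper pair to an edge weight.
    """
    cites = defaultdict(set)     # paper -> set of papers it cites
    cited_by = defaultdict(set)  # paper -> set of papers that cite it
    for citing, cited in citations:
        cites[citing].add(cited)
        cited_by[cited].add(citing)

    # Co-citation: two papers appear together in one reference list.
    co_citation = defaultdict(int)
    for refs in cites.values():
        for a, b in combinations(sorted(refs), 2):
            co_citation[(a, b)] += 1

    # Bibliographic coupling: two papers cite the same paper.
    coupling = defaultdict(int)
    for citers in cited_by.values():
        for a, b in combinations(sorted(citers), 2):
            coupling[(a, b)] += 1

    return dict(co_citation), dict(coupling)


# Toy citation network (hypothetical paper IDs).
toy_citations = [
    ("p1", "p3"), ("p1", "p4"),
    ("p2", "p3"), ("p2", "p4"),
    ("p5", "p3"),
]
co_citation, coupling = derive_networks(toy_citations)
print(co_citation)
print(coupling)
```

Here each edge weight counts how many papers co-cite a pair, or how many references a pair shares. Either derived graph could then be passed to an unsupervised graph encoder in place of the raw citation graph.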

Keywords: unsupervised learning, graph neural networks, representation learning, academic papers

Received: 2021-03-27

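Funding: National Natural Science Foundation of China, Young Scientists Fund project "Automatic Survey Generation for Academic Literature Based on Deep Semantic Representation and Multi-Document Summarization" (71904058); China Postdoctoral Science Foundation project "Research on a Summarization-Based Academic Search Engine for Automatic Survey Writing" (2020M682458).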

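About the authors: Ding Heng (丁恒), male, born in 1988, PhD, lecturer and master's supervisor; main research areas: information retrieval, text mining, and artificial intelligence. Ren Weiqiang (任卫强), male, born in 1997, master's student; main research area: natural language processing. Cao Gaohui (曹高辉), male, born in 1980, PhD, associate professor; main research areas: information retrieval and information organization, and user behavior; E-mail: ghcao@mail.ccnu.edu.c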

Cite this article:

丁恒, 任卫强, 曹高辉. 基于无监督图神经网络的学术文献表示学习研究[J]. 情报学报, 2022, 41(1): 62-72.
Ding Heng, Ren Weiqiang, Cao Gaohui. Using Unsupervised Graphs of Neural Networks for Constructing Learning Representations of Academic Papers[J]. Journal of the China Society for Scientific and Technical Information, 2022, 41(1): 62-72.

Link to this article:

https://qbxb.istic.ac.cn/CN/10.3772/j.issn.1000-0135.2022.01.007 or https://qbxb.istic.ac.cn/CN/Y2022/V41/I1/62
