Detection of Scientific Knowledge Structure Based on Graph Representation Learning
Liu Feifan1,2, Zhang Shuang1,2, Luo Shuangling3, Xia Haoxiang1,2
1. Institute of Systems Engineering, Dalian University of Technology, Dalian 116024
2. Research Center for Big Data and Intelligent Decision-Making, Dalian University of Technology, Dalian 116024
3. School of Maritime Economics and Management, Dalian Maritime University, Dalian 116026
Abstract: Accurately detecting the structure of scientific knowledge is fundamental to understanding how the subdisciplines of a field develop, to formulating science and technology policy, and to managing research. Existing approaches rely mainly on either text mining or social and complex network analysis. Few studies fully integrate the information obtained from the two; more often, one is used only to cross-validate the other. This study therefore draws on graph deep learning and proposes a framework for detecting scientific knowledge structure that combines deep graph neural network models with document representation and manifold learning algorithms. The framework is validated on two datasets, one representing a basic research discipline and the other an emerging research field. The experimental results show that graph deep learning can effectively integrate the topical features of the literature with the features of its citation relationships, and thereby detects a clearer domain knowledge structure. The study extends the application scenarios of graph neural network models and offers a reference for applications in scientific and technical information engineering.
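To make the idea of integrating topical features with citation links concrete, the following is a minimal sketch of a single graph-convolution-style propagation step, in which each paper's feature vector is averaged with those of the papers it is linked to. It is not the authors' implementation: the abstract does not specify the model, so the symmetric normalization D^-1/2 (A + I) D^-1/2 X, the function name gcn_propagate, the treatment of citations as undirected, and the toy data are all assumptions made for illustration. Learned weights, nonlinearities, and the manifold learning step are omitted.

```python
import math


def gcn_propagate(features, edges):
    """Apply one normalized propagation step, D^-1/2 (A + I) D^-1/2 X.

    features: list of per-paper feature vectors (e.g. topic or document embeddings).
    edges: list of (i, j) citation pairs, treated as undirected (an assumption).
    """
    n = len(features)
    dim = len(features[0])

    # Neighbor sets with a self-loop on every node (the "+ I" term).
    neighbors = [{i} for i in range(n)]
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    degree = [len(nb) for nb in neighbors]

    # Each output row is a degree-normalized sum over the node and its neighbors.
    out = []
    for i in range(n):
        row = [0.0] * dim
        for j in neighbors[i]:
            weight = 1.0 / math.sqrt(degree[i] * degree[j])
            for k in range(dim):
                row[k] += weight * features[j][k]
        out.append(row)
    return out


# Hypothetical toy data: four papers with 2-dimensional topic vectors,
# linked in a chain 0 - 1 - 2 - 3 by citations.
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
edges = [(0, 1), (1, 2), (2, 3)]
smoothed = gcn_propagate(features, edges)
```

In this toy example, paper 1 starts with weight only on the first topic but, after propagation, also carries weight on the second topic because it is linked to paper 2. This is the sense in which citation structure and textual features are blended into one representation.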
Received: 09 October 2020