Research on Constructing an Academic Knowledge Graph of Multi-dimensional Knowledge Elements in Academic Full Texts
Shen Si (1,2), Zhu Yufei (1)
1. School of Economics & Management, Nanjing University of Science & Technology, Nanjing 210094; 2. China Center for International Economic Exchanges, Beijing 100050
Abstract Academic texts contain a large amount of knowledge element information, and mining and organizing these knowledge elements can effectively improve the utilization efficiency of academic resources. Constructing an academic knowledge graph that connects the various implicit "knowledge elements" in an article not only saves scholars time when locating knowledge points but also helps them discover related knowledge points through the communities formed in the knowledge graph. Starting from three dimensions and drawing on literature research and other methods, this paper identifies 18 kinds of key knowledge elements in academic papers, takes the textual descriptions of knowledge elements as entity objects, and outlines a conceptual framework for an academic knowledge graph. Then, 515 articles published in JASIST are selected to study manual annotation and deep learning-based entity extraction of the key knowledge elements in each paper. The study examines whether each kind of knowledge element causes problems during manual annotation and whether its automatic extraction reaches the expected performance, in order to screen the knowledge elements to be used in constructing the knowledge graph. On this basis, nine kinds of knowledge elements are selected: mathematical formulas, software tools, data sources, specific models, tables, figures, research prospects, research problems, and research results. Together with the bibliographic metadata, they are used to generate triples composed of head entities, relations, and tail entities, which are stored in a graph database for visual evaluation. Finally, the visualization of the graph and knowledge element retrieval are studied to demonstrate its feasibility and scalability. The results show that some knowledge elements in the text are suitable for large-scale automatic annotation, and that the various kinds of knowledge elements can form a dense knowledge community through mutual links.
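The triple-generation and retrieval steps summarized above can be illustrated with a minimal sketch. It is not the authors' implementation or schema: the `has_<element type>` relation naming, the paper IDs, and the toy element strings are hypothetical; entity normalization is reduced to trimming whitespace and exact string matching; and an in-memory set of triples stands in for the graph database. Only the nine knowledge element types come from the text.

```python
from collections import defaultdict
from dataclasses import dataclass

# The nine knowledge element types retained in the study.
KNOWLEDGE_ELEMENT_TYPES = (
    "mathematical_formula", "software_tool", "data_source", "specific_model",
    "table", "figure", "research_prospect", "research_problem", "research_result",
)
# Hypothetical relation naming scheme: one "has_<type>" relation per element type.
ELEMENT_RELATIONS = {f"has_{t}" for t in KNOWLEDGE_ELEMENT_TYPES}


@dataclass(frozen=True)
class Triple:
    head: str
    relation: str
    tail: str


def build_triples(paper_id, metadata, elements):
    """Turn one paper's bibliographic metadata and extracted knowledge
    elements, given as (element_type, text) pairs, into head-relation-tail triples."""
    triples = [Triple(paper_id, f"has_{field}", value) for field, value in metadata.items()]
    for element_type, text in elements:
        if element_type not in KNOWLEDGE_ELEMENT_TYPES:
            raise ValueError(f"unknown knowledge element type: {element_type}")
        # Minimal normalization (assumption): trim whitespace only.
        triples.append(Triple(paper_id, f"has_{element_type}", text.strip()))
    return triples


class TripleStore:
    """In-memory stand-in for a graph database (hypothetical)."""

    def __init__(self):
        self.triples = set()
        self.heads_by_tail = defaultdict(set)

    def add(self, triples):
        for t in triples:
            self.triples.add(t)
            self.heads_by_tail[t.tail].add(t.head)

    def heads_for(self, relation, tail):
        """Knowledge element retrieval: papers linked to `tail` via `relation`."""
        return sorted(h for h in self.heads_by_tail.get(tail, set())
                      if Triple(h, relation, tail) in self.triples)

    def linked_papers(self, paper_id):
        """Papers sharing at least one knowledge element (tail entity) with paper_id."""
        own = {t.tail for t in self.triples
               if t.head == paper_id and t.relation in ELEMENT_RELATIONS}
        return sorted({t.head for t in self.triples
                       if t.relation in ELEMENT_RELATIONS
                       and t.tail in own and t.head != paper_id})


if __name__ == "__main__":
    store = TripleStore()
    store.add(build_triples("paper_A", {"title": "Toy paper A"},
                            [("software_tool", "ToolX"), ("data_source", "DatasetY")]))
    store.add(build_triples("paper_B", {"title": "Toy paper B"},
                            [("software_tool", "ToolX ")]))
    store.add(build_triples("paper_C", {"title": "Toy paper C"},
                            [("data_source", "DatasetZ")]))
    print(store.heads_for("has_software_tool", "ToolX"))  # ['paper_A', 'paper_B']
    print(store.linked_papers("paper_A"))                 # ['paper_B']
```

Linking papers through shared tail entities is what allows knowledge elements to form the communities described above; a real graph database would answer the same queries with indexed traversals rather than the linear scans used here, and a real pipeline would need far stronger entity normalization than whitespace trimming.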
Received: 04 May 2023
|
|
|
|