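Abstract: Keyword citation networks have two limitations: their nodes carry no semantic role, and their links are of a single type. To address these, this paper uses the semantic functions of academic text to enrich the semantic information of the nodes and of the relations between them, and proposes a fine-grained keyword citation network. First, the content of academic texts is parsed to extract document keywords, citation links, citation contexts, and cited objects, and their lexical functions and citation functions are identified. Next, the fine-grained keyword citation network is constructed with complex network graph methods, and domain knowledge is analyzed along three dimensions: citation-function-sensitive subnetwork analysis, multidimensional relation analysis of specific nodes, and fine-grained analysis of domain knowledge evolution. An empirical study is carried out on the proceedings of the ACL (Association for Computational Linguistics) conference. The results verify the effectiveness of the proposed method. They reveal patterns of use, extension, and comparison among knowledge units in the NLP (natural language processing) domain, show how specific research problems have developed and how specific methods have been applied, and trace the fine-grained evolution of domain knowledge. This work extends the methods and depth of knowledge network research and offers a new perspective and path for the multidimensional analysis of domain knowledge.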
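The network construction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Keyword` type, the function names, the label sets ("problem"/"method" for lexical functions, "use"/"extend"/"compare" for citation functions), and the toy records are all assumptions made for illustration. Each keyword node carries a lexical function, and each directed edge from a citing keyword to a cited keyword carries the citation function of the link, so that a function-specific subnetwork or the relation profile of a single node can be read off directly.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Keyword:
    """A network node: a keyword plus its lexical function (assumed labels)."""
    term: str
    role: str  # e.g. "problem" or "method"


def build_network(citations):
    """Build weighted, typed edges from citation records.

    Each record is (citing_keywords, cited_keywords, citation_function).
    The result maps (source, target, function) to the number of citations
    supporting that typed edge.
    """
    edges = defaultdict(int)
    for citing, cited, function in citations:
        for src in citing:
            for dst in cited:
                edges[(src, dst, function)] += 1
    return dict(edges)


def subnetwork(edges, function):
    """Keep only the edges that carry one citation function."""
    return {(s, d): w for (s, d, f), w in edges.items() if f == function}


def relation_profile(edges, node):
    """Total edge weight per (direction, citation function) for one node."""
    profile = defaultdict(int)
    for (s, d, f), w in edges.items():
        if s == node:
            profile[("out", f)] += w
        if d == node:
            profile[("in", f)] += w
    return dict(profile)


# Hypothetical toy data, not taken from the ACL corpus.
mt = Keyword("machine translation", "problem")
bpe = Keyword("subword units", "method")
attn = Keyword("attention", "method")

citations = [
    ([mt, attn], [bpe], "use"),
    ([mt], [bpe], "use"),
    ([attn], [bpe], "compare"),
]

edges = build_network(citations)
use_net = subnetwork(edges, "use")
bpe_profile = relation_profile(edges, bpe)
```

Here `use_net` holds only the "use" edges, which corresponds to the citation-function-sensitive subnetwork view, and `bpe_profile` summarizes how one node is cited under each function. Splitting the citation records by publication year before calling `build_network` would give one network per period, a simple starting point for the evolution analysis.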