An Approach to Identifying Transformative Research by Integrating Citation Function and Triangular Citation Structure
Zheng Zhejun1,2, Ma Yaxue1,2, Liang Zhentao3, Bai Yun3, Pei Lei1,2
1. Laboratory of Data Intelligence and Interdisciplinary Innovation, Nanjing University, Nanjing 210023
2. School of Information Management, Nanjing University, Nanjing 210023
3. School of Information Management, Wuhan University, Wuhan 430072
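Abstract: Transformative research is a precursor to new scientific and technological paradigms or fields, and identifying it is important for research management and technology foresight. Existing studies rarely consider how different citation relations affect the judgment of a target paper's value. To address this, this paper proposes a method for identifying transformative research that integrates citation function and triangular citation structure. Triangular citation structures among the target paper and its preceding and succeeding papers are obtained according to different combinations of citation functions, and consolidation or disruption relations between papers are extracted from them. On this basis, an ego-centric consolidation-disruption citation (ECCD) network is constructed for the target paper. Taking the structural features of the ECCD network and textual content features as input, a graph attention neural network model is built to identify transformative research that combines high academic impact with expert-recognized disruptiveness. An empirical analysis on the PMCOA (PubMed Central Open Access Subset) dataset shows that the best F1 score on the transformative research identification task is 0.3926, outperforming the other baseline models. An interpretability analysis of the model parameters shows that the disruptive citation relations identified from citation functions play an important role in identifying both high-impact research and transformative research.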
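The triangular citation step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the mapping from citation-function combinations to consolidation or disruption, so the code falls back on the familiar disruption-index convention (a succeeding paper that also cites one of the target's preceding papers closes a triangle and consolidates; one that cites only the target disrupts). The function name `eccd_edges`, the predicate `is_substantive`, and the function labels `background`, `use`, and `compare` are all hypothetical.

```python
from collections import defaultdict


def eccd_edges(citations, focal, is_substantive=lambda fn: True):
    """Label each successor of `focal` as consolidating or disrupting.

    citations: dict mapping (citing, cited) -> citation function label.
    is_substantive: hypothetical predicate deciding whether a citation
        function counts when closing a triangle. It is a placeholder
        for the paper's function-combination rules, which the abstract
        does not specify.
    """
    refs = defaultdict(dict)  # citing paper -> {cited paper: function}
    for (src, dst), fn in citations.items():
        refs[src][dst] = fn

    predecessors = set(refs[focal])  # papers the focal paper cites
    successors = {src for (src, dst) in citations if dst == focal}

    labels = {}
    for s in sorted(successors):
        # Predecessors that s also cites close a (p, focal, s) triangle.
        shared = sorted(
            p for p in predecessors
            if p in refs[s] and is_substantive(refs[s][p])
        )
        labels[s] = ("consolidation", shared) if shared else ("disruption", [])
    return labels


# Toy data: F is the target paper, P* precede it, S* succeed it.
toy = {
    ("F", "P1"): "background",
    ("F", "P2"): "use",
    ("S1", "F"): "use",
    ("S1", "P1"): "use",
    ("S2", "F"): "use",
    ("S3", "F"): "compare",
    ("S3", "P2"): "background",
}

plain = eccd_edges(toy, "F")
filtered = eccd_edges(toy, "F", is_substantive=lambda fn: fn != "background")
```

In this toy graph `S2` cites only the target and is labeled disruptive under both settings, while `S3` changes from consolidating to disruptive once background-only citations to preceding papers are ignored. That is the kind of distinction a citation-function-aware rule is meant to capture.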
Zheng Zhejun, Ma Yaxue, Liang Zhentao, Bai Yun, Pei Lei. An Approach to Identifying Transformative Research by Integrating Citation Function and Triangular Citation Structure[J]. Journal of the China Society for Scientific and Technical Information (情报学报), 2025, 44(8): 950-961.