Essential Reference Measurements from the Perspective of Full-Text: Concept Definition, Index System, and Identification Model
Lin Gege¹, Hou Haiyan¹, Pan Yuxin¹, Liang Guoqiang², Hu Zhigang³
1. School of Public Administration and Policy, Dalian University of Technology, Dalian 116024
2. College of Economics and Management, Beijing University of Technology, Beijing 100124
3. Institute for Science, Technology and Society, South China Normal University, Guangzhou 510006
Abstract Identifying the essential references within a citing document is fundamental to the thorough evaluation of scientific achievements. This study examines the measurement of essential references from a full-text perspective, covering concept definition, construction of an indicator system, and optimization of identification models, with the aim of providing a more precise tool for scientific evaluation. First, the concept of an essential reference was defined, and an indicator system for identifying essential references was constructed, comprising two dimensions (bibliographic information and citation information), eight sub-dimensions, and 33 citation feature indicators. Second, the citation feature indicators were selected and optimized using several machine learning models, including random forest, support vector machine, and logistic regression. Their correlations and information gains were analyzed, and 21 important citation feature indicators were retained and used to validate the effectiveness of the identification models. The results indicate that indicators based on citation information are more important and contribute more to the identification of essential references than those based on bibliographic information. The machine learning models identified essential references well: the random forest, support vector machine, and logistic regression models all achieved area under the receiver operating characteristic curve (AUC) values above 0.85, indicating that the models are effective and robust. The essential reference measurement method and identification models provide more accurate tools for scientific evaluation and lay a foundation for further research in citation analysis.
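The two quantities the abstract relies on, information gain for screening indicators and AUC for judging a model, can be illustrated with a minimal sketch. This is not the authors' implementation: the eight toy labels, the `multi_mention` indicator, and the model scores are invented for illustration, and the function names are hypothetical. Information gain is computed here for a discrete indicator as the reduction in label entropy, and AUC as the probability that a randomly chosen essential reference is scored above a randomly chosen non-essential one.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def auc(scores, labels):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration: 1 = essential reference, 0 = not essential.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
# Hypothetical indicator: is the reference mentioned more than once in the full text?
multi_mention = [1, 1, 0, 0, 0, 0, 1, 0]
# Hypothetical classifier scores for the same eight references.
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1, 0.6, 0.35]

ig = information_gain(multi_mention, labels)
area = auc(scores, labels)
print(f"information gain = {ig:.3f} bits, AUC = {area:.3f}")
```

In a setting like the one described, an indicator with near-zero information gain would be a candidate for removal, and a model would be compared against the 0.85 AUC level reported above. Continuous indicators would need binning or a different estimator before this discrete formula applies.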
Received: 22 December 2023