Multi-disciplinary Citation Classification with Multiple Features
Zheng Zhihan, Li Xinyu, Meng Fan, Bu Yi
Department of Information Management, Peking University, Beijing 100871
Abstract: As the primary means of understanding citation behavior in depth, citation classification plays an important role in many scenarios, such as document management, retrieval, and utilization. Building on a review of the main mechanisms of citation behavior and of prior citation classification research, this study applies machine learning methods to further explore citation classification. The fields of the original dataset are supplemented by matching its records against a literature database and by analyzing the documents themselves, and four major categories of features that may be related to citation classification are extracted while constructing the classification model. Feature selection is then performed with a simulated annealing algorithm. The results indicate that the resulting random forest model performs best on both citation influence classification and citation function classification, outperforming the classification model that combines a support vector machine with a SciBERT linear layer. The proposed model improves the automatic classification of multidisciplinary citations, and the feature extraction and selection process, together with the exploration of relationships between citation categories and related factors, offers a useful reference for related research.
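The abstract states only that feature selection is performed with simulated annealing. As a minimal sketch under stated assumptions, and not the authors' implementation, such a wrapper search can anneal over a binary mask of candidate features: each move flips one feature in or out, improvements are always accepted, and a worse mask is accepted with probability exp(Δ/T), so the search can escape local optima while the temperature T is high. The function name `simulated_annealing_select`, the geometric cooling schedule and its parameter values, and the `toy_score` objective are all hypothetical; in the setting described, the objective would more plausibly be a cross-validated score of the random forest trained on the selected features.

```python
import math
import random
from typing import Callable, List, Sequence


def simulated_annealing_select(
    n_features: int,
    score: Callable[[Sequence[bool]], float],
    t_start: float = 1.0,
    t_end: float = 1e-3,
    cooling: float = 0.95,
    steps_per_temp: int = 20,
    seed: int = 0,
) -> List[bool]:
    """Search for a feature mask that maximizes `score` (hypothetical helper).

    A neighbour of a mask is obtained by flipping one randomly chosen feature.
    Worse neighbours are accepted with probability exp(delta / T), so the
    search explores widely at first and becomes nearly greedy as T cools.
    """
    rng = random.Random(seed)
    current = [rng.random() < 0.5 for _ in range(n_features)]
    current_score = score(current)
    best, best_score = list(current), current_score
    t = t_start
    while t > t_end:
        for _ in range(steps_per_temp):
            candidate = list(current)
            i = rng.randrange(n_features)
            candidate[i] = not candidate[i]  # add or drop one feature
            cand_score = score(candidate)
            delta = cand_score - current_score  # > 0 means improvement
            if delta >= 0 or rng.random() < math.exp(delta / t):
                current, current_score = candidate, cand_score
                if current_score > best_score:  # remember best mask seen
                    best, best_score = list(current), current_score
        t *= cooling  # geometric cooling schedule (assumed)
    return best


# Hypothetical toy objective standing in for classifier accuracy:
# features 0, 2 and 5 are "informative"; every other selected feature adds noise.
INFORMATIVE = {0, 2, 5}


def toy_score(mask: Sequence[bool]) -> float:
    return sum(1.0 if i in INFORMATIVE else -0.3 for i, keep in enumerate(mask) if keep)


mask = simulated_annealing_select(8, toy_score)
print([i for i, keep in enumerate(mask) if keep])
```

To use a real classifier instead of the toy objective, `score` could wrap, for example, scikit-learn's `cross_val_score` with a `RandomForestClassifier` on the selected columns, returning a very low score for an empty mask so that no model is fitted without features.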
Received: 04 October 2023
(Responsible editor: Wei Ruibin)
|
|