Using Behavior and Influence Assessment of Algorithms Based on Full-text Academic Articles
Zhang Chengzhi (1, 2, 3), Ding Ruiyi (1, 3), Wang Yuzhuo (1, 3)
1. Department of Information Management, Nanjing University of Science & Technology, Nanjing 210094;
2. Jiangsu Key Laboratory of Data Engineering and Knowledge Service (Nanjing University), Nanjing 210093;
3. Jiangsu Collaborative Innovation Center of Social Safety Science and Technology, Nanjing 210094
Abstract: Data mining algorithms are widely used in scientific research and practice. Investigating how these algorithms are mentioned in academic papers and assessing their influence can help researchers understand which algorithms are used in their field and select those appropriate for a given research task. We use the full text of academic articles to analyze how algorithms are used and to evaluate their influence. Taking natural language processing as the target field, we collect the full-text proceedings of the China National Conference on Computational Linguistics (CCL) from 1993 to 2016 and examine the top 10 data mining algorithms from four aspects: frequency of mention, location of mention, motivation for mention, and age distribution. The influence of each algorithm is then evaluated according to these four aspects. The experimental results show clear differences among the 10 algorithms: SVM has the highest influence, while CART and Apriori have low influence. The findings can offer recommendations to researchers, especially novices working on data-driven research or applications, and suggest new ideas for assessing algorithm influence.
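As an illustration of two of the four aspects (frequency and location of mention), the following is a minimal sketch of how algorithm mentions might be tallied per section of a paper. It is not the procedure used in the study: the alias table `ALIASES`, the section names, the sample text, and the helper functions `compile_patterns` and `count_mentions` are all hypothetical, and simple surface-form matching stands in for whatever mention identification the authors applied. Motivation for mention and age distribution are not covered here.

```python
import re
from collections import Counter

# Hypothetical alias table (an assumption for illustration only).
# Only the three algorithms named in the abstract are included.
ALIASES = {
    "SVM": ["SVM", "support vector machine"],
    "CART": ["CART", "classification and regression tree"],
    "Apriori": ["Apriori"],
}


def compile_patterns(aliases):
    """Build one regex per surface form, with an optional plural 's'.

    All-uppercase forms (acronyms) are matched case-sensitively so that
    'CART' does not match the ordinary word 'cart'; other forms are
    matched case-insensitively.
    """
    patterns = {}
    for name, forms in aliases.items():
        compiled = []
        for form in forms:
            flags = 0 if form.isupper() else re.IGNORECASE
            compiled.append(re.compile(r"\b" + re.escape(form) + r"s?\b", flags))
        patterns[name] = compiled
    return patterns


def count_mentions(paper, patterns):
    """Count mentions keyed by (algorithm, section).

    `paper` maps a section name to its text. Every matching surface form
    counts as one mention, so 'support vector machines (SVM)' counts twice.
    """
    counts = Counter()
    for section, text in paper.items():
        for name, regexes in patterns.items():
            n = sum(len(rx.findall(text)) for rx in regexes)
            if n:
                counts[(name, section)] += n
    return counts


if __name__ == "__main__":
    # Invented sample text, not taken from the CCL proceedings.
    sample_paper = {
        "introduction": "Support vector machines (SVM) are widely used.",
        "method": "We train an SVM and compare it with CART.",
    }
    result = count_mentions(sample_paper, compile_patterns(ALIASES))
    for (algorithm, section), n in sorted(result.items()):
        print(algorithm, section, n)
```

Summing the counts over sections gives a per-algorithm frequency, and keeping the section key gives a rough picture of where in a paper each algorithm is mentioned.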
Received: 04 July 2018