Towards Automatic Literature Review Generation System: Research on Document Value Assessment
Ding Heng, Ruan Jinglong
School of Information Management, Central China Normal University, Wuhan 430079

Abstract The information overload caused by the "paper explosion" has drawn attention to research on automatic review systems. The first problem such a system must solve is how to automatically select the important documents that reflect the development of knowledge in a field. Starting from the factors that influence review authors' selection of references, this study mines the regularities in review authors' citation behavior to assess the value of documents, and constructs a document evaluation model for automatic review systems based on the learning-to-rank framework. The study uses Microsoft Academic Graph as the data source to construct an experimental data set and evaluates the results with two indicators: ΔP@K and NDCG@K. The experiments yield two findings: (1) Compared with the pointwise and listwise approaches, the pairwise approach is better suited to training the document evaluation model, achieving 0.274, 0.085, 0.738, and 0.831 on ΔP@100, ΔP@200, NDCG@100, and NDCG@200, respectively. (2) Knowledge importance, literature quality, and influence contribute most to the improvement of the model and are the primary considerations of review authors when they evaluate the value of the literature and choose references.
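As an illustration of the NDCG@K indicator named in the abstract, the following minimal Python sketch computes it for a single ranked list. It assumes the common formulation with a linear gain and a log2 rank discount, and it assumes that every relevant document already appears in the ranked list; the abstract does not state which variant the study uses, so the function names and the toy labels here are hypothetical.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k positions (linear gain, log2 discount)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG@k divided by the DCG@k of the ideal (descending) ordering of the same labels."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant document in the list
    return dcg_at_k(relevances, k) / ideal


# Hypothetical labels in model-ranked order: 1 = cited by the review, 0 = not cited.
ranked_labels = [1, 0, 1, 0, 0]
score = ndcg_at_k(ranked_labels, k=3)
print(f"NDCG@3 = {score:.4f}")
```

A perfect ranking places all cited documents first and scores 1.0; the further down the list a cited document falls, the less it contributes to the score.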
Received: 11 October 2021
Responsible editor: Pan Yao