Identification and Utilization of Key Points of Scientific Papers Based on Peer Review Texts
Chen Chong1, Cheng Zijia2, Wang Chuanqing3, Li Lei1 |
1. School of Government, Beijing Normal University, Beijing 100875; 2. School of Information Resource Management, Renmin University of China, Beijing 100872; 3. National Science Library, Chinese Academy of Sciences, Beijing 100190
Abstract Scientific researchers often search the literature with specific tasks in mind, such as seeking topics, methods, or conclusions. However, distinguishing the numerous key points of scientific papers and judging their value is time-consuming and laborious, and requires extensive professional knowledge. Peer review texts disclose a paper's key points and offer authoritative evaluations of its reference value, and can therefore effectively help meet these needs. This study takes peer review texts as its object, defines the types of key points in reviews on the basis of the typical elements of scientific research activities, and extracts the key points of a paper described in its peer reviews through supervised learning, which not only provides a structured summary of the paper's key points but can also assist literature retrieval. We collected 549 papers published in Acta Psychologica Sinica between 2014 and 2020, together with their corresponding reviews, and defined four types of key points: general information, methods, results, and highlights. Four classification models were then trained, using SVM (support vector machine), FastText, TextCNN (text convolutional neural network), and BiLSTM (bi-directional long short-term memory), and their results compared. Experiments show that BiLSTM performs best at key point recognition, with an average accuracy of 91% across five tests. The highlights are further divided into four subtypes: topic selection, value, method, and writing; an SVM classifier performs this subclassification with an F1 score of 85%. As an application of these results, this study uses the recognized key points to facilitate in-depth understanding of scientific papers, classifies search results on the basis of the highlights, and improves the organization and service form of paper retrieval. The study's contributions are as follows: (1) introducing the research problem of mining the key points of a scientific paper from its peer reviews and constructing the framework and hierarchy of key points; (2) transforming key point recognition into a classification task and comparing various classification methods to determine the overall optimal method; and (3) organizing retrieval results by key points, thereby helping users understand and judge the results.
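To make the modeling step concrete, the following is a minimal, illustrative sketch of a BiLSTM sentence classifier for the four key-point types, assuming a PyTorch implementation; the tokenization, vocabulary size, and hyperparameters are demonstration assumptions, not the configuration reported in the paper.

# Illustrative sketch: BiLSTM classifier over review sentences, one of the
# four key-point types per sentence. Hyperparameters are assumed values.
import torch
import torch.nn as nn

KEY_POINT_TYPES = ["general information", "method", "result", "highlight"]

class BiLSTMClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim,
                              batch_first=True, bidirectional=True)
        # Forward and backward final states are concatenated: 2 * hidden_dim.
        self.fc = nn.Linear(2 * hidden_dim, len(KEY_POINT_TYPES))

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer-encoded review sentences
        embedded = self.embedding(token_ids)
        _, (hidden, _) = self.bilstm(embedded)
        # hidden: (2, batch, hidden_dim); join the two directions into
        # a single sentence representation before classification.
        sentence_repr = torch.cat([hidden[0], hidden[1]], dim=1)
        return self.fc(sentence_repr)

model = BiLSTMClassifier(vocab_size=20000)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One training step on a toy batch (random ids stand in for real sentences).
batch = torch.randint(1, 20000, (32, 50))   # 32 sentences, 50 tokens each
labels = torch.randint(0, 4, (32,))         # gold key-point type per sentence
optimizer.zero_grad()
loss = criterion(model(batch), labels)
loss.backward()
optimizer.step()

The second-stage highlight subclassification can be sketched in the same hypothetical spirit with a scikit-learn SVM over TF-IDF features; the feature choice and training sentences below are placeholders, not the paper's setup or data.

# Illustrative sketch: SVM subclassifier for highlight sentences
# (topic selection / value / method / writing) over TF-IDF features.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

highlight_clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())

# Placeholder examples; in the study's pipeline, the inputs would be the
# highlight sentences recognized by the first-stage classifier.
sentences = ["The topic is novel and socially relevant.",
             "The experimental design is rigorous.",
             "The paper is clearly written and well organized.",
             "The findings have practical value for clinicians."]
labels = ["topic selection", "method", "writing", "value"]
highlight_clf.fit(sentences, labels)
print(highlight_clf.predict(["The manuscript reads smoothly."]))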
Received: 07 May 2022