Recognition Method and Annotation of Academic Claim Sentences
Xu Jian1,2, Guo Yufan1, Yu Xuehan1, Huang Yuxin1, Yang Tingting1, Wang Weiyi1, Liu Zheng1
1. College of Information Management, Nanjing Agricultural University, Nanjing 210095; 2. The Post-Doctoral Research Center of Agricultural & Forestry Economics and Management, College of Economics and Management, Nanjing Agricultural University, Nanjing 210095
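Abstract: Claim sentences in academic texts convey scholars' views and judgments on research questions. Identifying them helps organize and mine the academic viewpoints they contain, and so helps scholars carry out research more efficiently. Building on a synthesis of previous studies, this paper proposes three sufficient conditions and three necessary conditions for judging claim sentences, constructing the judgment criteria from both the affirmative and the negative perspective. A claim sentence annotation system was developed, and annotation experiments at the abstract level and the full-text level were carried out on selected papers from the field of information resource management. The paper evaluates how well sequential minimal optimization (SMO), support vector machine (SVM), naive Bayes, decision tree, k-nearest neighbor, BERT (bidirectional encoder representations from transformers) + FC (fully connected), and BERT + BiLSTM (bidirectional long short-term memory) classifiers identify claim sentences. The findings are as follows: (1) using the proposed criteria, annotators reached high agreement in labeling claim and non-claim sentences in academic texts at both the abstract and the full-text level; (2) when only text features are used, BERT + BiLSTM performs best, with precision, recall, and F1 all above 90%; (3) claim and non-claim sentences differ in their frequency distributions over length, position within the paragraph, position within the document, and TextRank weight; (4) at the abstract level, adding the length feature to the SMO classifier improves recognition performance by 0.5%, and at the full-text level, adding the length, relative position within the paragraph, and relative position within the document features to the SVM classifier yields a 2% improvement in F1.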
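The non-text features named in the abstract (length, relative position within the paragraph, relative position within the document) could be computed along the following lines. This is a minimal sketch and not the authors' implementation: the abstract does not define the features precisely, so measuring length in characters and scaling positions to the range 0 to 1 are assumptions, and the names `SentenceFeatures`, `relative_position`, and `extract_features` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SentenceFeatures:
    text: str
    length: int              # number of characters (assumed unit of length)
    pos_in_paragraph: float  # 0.0 = first sentence of its paragraph, 1.0 = last
    pos_in_document: float   # 0.0 = first sentence of the document, 1.0 = last


def relative_position(index: int, total: int) -> float:
    """Map a 0-based index to [0, 1]; a one-item sequence maps to 0.0."""
    return 0.0 if total <= 1 else index / (total - 1)


def extract_features(paragraphs: List[List[str]]) -> List[SentenceFeatures]:
    """Compute length and position features for every sentence.

    `paragraphs` is a document already split into paragraphs, each of which
    is a list of sentences (sentence splitting is assumed to be done upstream).
    """
    total_sentences = sum(len(p) for p in paragraphs)
    features: List[SentenceFeatures] = []
    doc_index = 0  # running 0-based sentence index across the whole document
    for paragraph in paragraphs:
        for i, sentence in enumerate(paragraph):
            features.append(
                SentenceFeatures(
                    text=sentence,
                    length=len(sentence),
                    pos_in_paragraph=relative_position(i, len(paragraph)),
                    pos_in_document=relative_position(doc_index, total_sentences),
                )
            )
            doc_index += 1
    return features
```

Features of this kind would typically be appended to the text representation before training a classifier such as SVM. How the authors actually combined them is not stated in the abstract.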