Recognition Method and Annotation of Academic Claim Sentences

Journal of the China Society for Scientific and Technical Information (情报学报), 2022, Vol. 41, Issue 7: 707-719
DOI: 10.3772/j.issn.1000-0135.2022.07.005

Xu Jian¹,², Guo Yufan¹, Yu Xuehan¹, Huang Yuxin¹, Yang Tingting¹, Wang Weiyi¹, Liu Zheng¹

1. College of Information Management, Nanjing Agricultural University, Nanjing 210095
2. The Post-Doctoral Research Center of Agricultural & Forestry Economics and Management, College of Economics and Management, Nanjing Agricultural University, Nanjing 210095


Abstract: Claim sentences in academic texts express scholars' opinions and judgments on research questions. Identifying them helps organize and mine the academic ideas they contain, which in turn helps scholars carry out research more efficiently. Building on previous studies, this paper proposes three sufficient conditions and three prerequisite conditions for judging whether a sentence is a claim sentence, clarifying the judgment criteria from both positive and negative perspectives. We construct an annotation scheme for claim sentences, select a small set of papers in the field of information resource management, and conduct claim-sentence annotation experiments at both the abstract and full-text levels. We then evaluate how well seven classifiers recognize claim sentences: sequential minimal optimization (SMO), support vector machine (SVM), naive Bayes, decision tree, k-nearest neighbor (kNN), BERT+FC, and BERT+BiLSTM. The results show the following. (1) Using the proposed criteria, annotators achieve high agreement when labeling claim and non-claim sentences in academic texts at both the abstract and full-text levels. (2) When only textual features are used, BERT+BiLSTM performs best, with precision, recall, and F1 all exceeding 90%. (3) In academic papers, claim and non-claim sentences differ in length and in their relative positions within the paragraph and within the full text. (4) At the abstract level, adding the length feature to the SMO classifier improves its F1 score by 0.5%; at the full-text level, adding length and relative position within the paragraph and the full text to the SVM classifier improves its F1 score by 2%.
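The abstract leans on two quantities that are easy to make concrete: the structural features (sentence length and relative position) and annotator agreement. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes sentences are already segmented and grouped by paragraph, measures length in characters (a natural choice for Chinese text, but an assumption here), and defines relative position as a sentence's 1-based index divided by the number of sentences in its paragraph or in the whole text. The abstract does not name its agreement statistic, so Cohen's kappa for two annotators is used as a common choice. The names `SentenceFeatures`, `structural_features`, and `cohen_kappa` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Hashable, List, Sequence


@dataclass
class SentenceFeatures:
    """Structural features for one sentence (hypothetical container)."""
    text: str
    length: int              # length in characters (assumed measure)
    pos_in_paragraph: float  # (index within paragraph) / (paragraph size), in (0, 1]
    pos_in_text: float       # (index within text) / (total sentences), in (0, 1]


def structural_features(paragraphs: List[List[str]]) -> List[SentenceFeatures]:
    """Compute length and relative-position features for every sentence.

    `paragraphs` holds already-segmented sentences grouped by paragraph;
    sentence segmentation is assumed to happen upstream.
    """
    total = sum(len(p) for p in paragraphs)
    features: List[SentenceFeatures] = []
    index_in_text = 0
    for paragraph in paragraphs:
        size = len(paragraph)
        for index_in_paragraph, sentence in enumerate(paragraph, start=1):
            index_in_text += 1
            features.append(SentenceFeatures(
                text=sentence,
                length=len(sentence),
                pos_in_paragraph=index_in_paragraph / size,
                pos_in_text=index_in_text / total,
            ))
    return features


def cohen_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa between two annotators who labeled the same sentences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("expected two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of sentences given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label rates.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout; kappa is
        # undefined here, and 1.0 is returned by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    doc = [
        ["This study proposes three sufficient conditions.", "Prior work is reviewed."],
        ["The results show that sentence length is informative."],
    ]
    for f in structural_features(doc):
        print(f.length, round(f.pos_in_paragraph, 2), round(f.pos_in_text, 2))
    # 1 = claim, 0 = non-claim (toy labels, not data from the paper)
    print(cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0]))
```

In a setup like the one the abstract describes, these three values would be appended to each sentence's textual features before training the SMO or SVM classifier; how the paper normalizes or weights them is not stated in the abstract.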

Key words: academic text; claim sentence; textual feature; machine learning; recognition

Received: 02 May 2021


Cite this article:

Xu Jian, Guo Yufan, Yu Xuehan, et al. Recognition Method and Annotation of Academic Claim Sentences[J]. 情报学报 (Journal of the China Society for Scientific and Technical Information), 2022, 41(7): 707-719.

URL: https://qbxb.istic.ac.cn/EN/10.3772/j.issn.1000-0135.2022.07.005 or https://qbxb.istic.ac.cn/EN/Y2022/V41/I7/707


Editorial Office: JCSSTI Editorial Office, No. 15 Fuxing Road, Haidian, Beijing 100038
Tel: +86(010)68598273; Fax: +86(010)68598285; E-mail: qbxb@istic.ac.cn
Copyright © 2015 by the Journal of The China Society for Scientific and Technical Information
ISSN: 1000-0135 CN: 11-2257 / G3