情报学报  2021, Vol. 40 Issue (6): 621-629    DOI: 10.3772/j.issn.1000-0135.2021.06.007
Automatic Labeling of Semantic Clauses in Research Articles
Huang Wenbin1, Wang Yueqian1, Bu Yi1, Che Shangkun2
1.Department of Information Management, Peking University, Beijing 100871
2.School of Economics and Management, Tsinghua University, Beijing 100084
Abstract  Analyzing the semantic structure of research articles supports tasks such as information extraction and information retrieval. This paper describes that structure by using machine learning to recognize the semantic type of each discourse segment in an article. We extracted the macro structure of each article, together with the syntactic and lexical information of each discourse segment, as input features and trained five models: support vector machines (SVM), conditional random fields (CRF), random forests (RF), a gradient boosting classifier (GBC), and a stochastic gradient descent classifier (SGD). We then combined the three best-performing models (CRF, SVM, and GBC) into a bagging model for classifying all discourse segments in the full text. Experimental results showed that the bagging model achieved higher accuracy and F-score than the baseline model when classifying discourse segments from both the full text and the result sections. A topic-clustering experiment further demonstrated the model's effectiveness for topic detection, a common text mining task.
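The abstract does not state how the three models' outputs are merged. As a minimal sketch, assuming a simple majority vote over per-segment labels, the combination step might look like the following. The function name `majority_vote`, the tie-breaking rule, and the label names are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label sequences by majority vote.

    predictions: list of equal-length label lists, one list per model.
    When no label has a strict majority, the first model's label is used
    (an arbitrary tie-breaking choice made for this sketch).
    """
    if not predictions:
        raise ValueError("need at least one model's predictions")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all models must label the same segments")

    combined = []
    for labels in zip(*predictions):
        top_label, top_count = Counter(labels).most_common(1)[0]
        # Accept the most common label only if more than half the models agree.
        combined.append(top_label if top_count > len(labels) / 2 else labels[0])
    return combined


# Hypothetical labels for four discourse segments from three models.
crf = ["goal", "method", "result", "result"]
svm = ["goal", "method", "method", "fact"]
gbc = ["fact", "method", "result", "implication"]
ensemble = majority_vote([crf, svm, gbc])
```

Listing the CRF output first means it decides three-way disagreements. That ordering is an assumption of this sketch, not a claim about the authors' design.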
Key words: research article; semantic labeling; text classification; machine learning; clustering
Received: 13 May 2020     
Cite this article:
Huang Wenbin, Wang Yueqian, Bu Yi, et al. Automatic Labeling of Semantic Clauses in Research Articles[J]. 情报学报, 2021, 40(6): 621-629.
URL:  
https://qbxb.istic.ac.cn/EN/10.3772/j.issn.1000-0135.2021.06.007     OR     https://qbxb.istic.ac.cn/EN/Y2021/V40/I6/621