Using Full Content to Automatically Classify the Research Methods of Academic Articles

Journal of the China Society for Scientific and Technical Information (情报学报), 2020, Vol. 39, Issue (8): 852-862
DOI: 10.3772/j.issn.1000-0135.2020.08.007

Zhang Chengzhi¹, Li Zhuo¹, Chu Heting²

1. Department of Information Management, School of Economics and Management, Nanjing University of Science & Technology, Nanjing 210094
2. Palmer School of Library and Information Science, Long Island University, New York 11548

Abstract: Automatic classification of the research methods used in academic papers supports the evaluative analysis of those methods and gives researchers a basis for recommending or selecting appropriate methods for their own work. Compared with abstracts alone, the full content of an article contains more context about its research methods, which makes it a valuable source for such automatic classification. This study examines the full content of 820 academic papers in the field of library and information science (LIS). Experts in LIS annotated the research methods used in these papers, and the annotations were used to generate a training corpus for research method classification. Because the task is multi-label, we adopted both the problem transformation approach and the algorithm adaptation approach. Naïve Bayes and Support Vector Machine were used as the underlying classifiers of the problem transformation approach to construct six different classification models. In addition, the ML-KNN model, an algorithm adaptation method, was used to automatically classify the research methods in the chosen articles. The experimental results showed that classification performance improved greatly with the full article compared with the abstract alone. The Naïve Bayes algorithm under the classifier chain strategy of the problem transformation approach performed best, reaching an F₁ value of 0.705. The results also showed that research methods are represented differently across academic papers, and that a small training set leads to low generalizability of the automatic classification results.
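The classifier chain strategy with a Naïve Bayes base classifier, reported in the abstract as the best-performing configuration, can be illustrated with a minimal sketch. This is not the authors' implementation. The toy documents, the two label names, the encoding of earlier labels as pseudo-tokens, and the class names `BinaryNB` and `ClassifierChain` are all assumptions made for illustration.

```python
import math
from collections import Counter


class BinaryNB:
    """Multinomial Naive Bayes for one yes/no label, with add-one smoothing."""

    def fit(self, docs, ys):
        self.vocab = {tok for doc in docs for tok in doc}
        self.tok_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter(ys)
        self.n_docs = len(docs)
        for doc, y in zip(docs, ys):
            self.tok_counts[y].update(doc)
        return self

    def predict(self, doc):
        scores = {}
        for y in (0, 1):
            # Smoothed log prior, so a class unseen in training stays defined.
            score = math.log((self.doc_counts[y] + 1) / (self.n_docs + 2))
            denom = sum(self.tok_counts[y].values()) + len(self.vocab)
            for tok in doc:
                if tok in self.vocab:  # ignore tokens never seen in training
                    score += math.log((self.tok_counts[y][tok] + 1) / denom)
            scores[y] = score
        return int(scores[1] > scores[0])


class ClassifierChain:
    """One binary classifier per label; each also sees the earlier labels."""

    def __init__(self, labels):
        self.labels = list(labels)

    @staticmethod
    def _label_tokens(active_labels):
        # Assumption: earlier labels are passed along as extra pseudo-tokens.
        return [f"__label__{name}" for name in active_labels]

    def fit(self, docs, label_sets):
        self.models = []
        for i, label in enumerate(self.labels):
            earlier = self.labels[:i]
            # Training uses the true values of the earlier labels in the chain.
            augmented = [
                doc + self._label_tokens([p for p in earlier if p in ls])
                for doc, ls in zip(docs, label_sets)
            ]
            ys = [int(label in ls) for ls in label_sets]
            self.models.append(BinaryNB().fit(augmented, ys))
        return self

    def predict(self, doc):
        predicted = []
        for label, model in zip(self.labels, self.models):
            # Prediction uses the labels already predicted earlier in the chain.
            if model.predict(doc + self._label_tokens(predicted)):
                predicted.append(label)
        return set(predicted)


# Hypothetical research-method categories and toy tokenized documents.
LABELS = ["experiment", "survey"]
train_docs = [
    ["questionnaire", "respondents", "sample"],
    ["questionnaire", "interview", "respondents"],
    ["classifier", "corpus", "accuracy"],
    ["corpus", "baseline", "accuracy"],
    ["questionnaire", "classifier", "accuracy", "respondents"],
]
train_labels = [
    {"survey"},
    {"survey"},
    {"experiment"},
    {"experiment"},
    {"experiment", "survey"},
]

chain = ClassifierChain(LABELS).fit(train_docs, train_labels)
print(chain.predict(["questionnaire", "respondents"]))
```

Each model in the chain decides one label and passes its decision forward, which lets later models exploit correlations between research methods that independent per-label classifiers would ignore. In practice a library implementation would normally replace the hand-written classes, for example scikit-learn's `ClassifierChain` wrapped around `MultinomialNB`; the abstract does not state which toolkit the authors used.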

Keywords: classification of research methods; text classification; full-text content; multi-label classification

Received: 15 April 2019

Cite this article:

Zhang Chengzhi, Li Zhuo, Chu Heting. Using Full Content to Automatically Classify the Research Methods of Academic Articles[J]. Journal of the China Society for Scientific and Technical Information (情报学报), 2020, 39(8): 852-862.

URL:

https://qbxb.istic.ac.cn/EN/10.3772/j.issn.1000-0135.2020.08.007 OR https://qbxb.istic.ac.cn/EN/Y2020/V39/I8/852

Editorial Office: JCSSTI Editorial Office, No. 15 Fuxing Road, Haidian, Beijing 100038
Tel: +86(010)68598273; Fax: +86(010)68598285; E-mail: qbxb@istic.ac.cn
Copyright © 2015 by the Journal of The China Society for Scientific and Technical Information
ISSN: 1000-0135 CN: 11-2257 / G3