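Recognition of Lexical Functions in Academic Texts: Problem Method Extraction Based on Title Generation Strategy and Attention Mechanism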

doi:10.3772/j.issn.1000-0135.2021.01.005

Journal of the China Society for Scientific and Technical Information (情报学报), 2021, Vol. 40, Issue 1: 43-52


Recognition of Lexical Functions in Academic Texts: Problem Method Extraction Based on Title Generation Strategy and Attention Mechanism

Cheng Qikai^1,2, Li Pengcheng^1,2, Zhang Guobiao^1,2, Lu Wei^1,2

1. School of Information Management, Wuhan University, Wuhan 430072
2. Institute for Information Retrieval and Knowledge Mining, Wuhan University, Wuhan 430072


Abstract: Problem and method identification aims to extract research problems and research methods from academic texts. Traditional recognition methods suffer from low accuracy, limited recall, and poor generalization because training sets are difficult to obtain. To address these issues, this study proposes a method for recognizing problems and methods in academic texts based on deep learning and a title generation strategy. The method recasts problem and method extraction as the generation of a title in a specific form. A seq2seq model with an attention mechanism captures multi-layer semantic word information and generates the problem and method terms of an academic text. Experimental results show that, by applying deep learning and the title generation strategy, the approach effectively identifies the core research problems and core research methods in academic literature.
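As a rough illustration of the attention step mentioned in the abstract, the sketch below computes attention weights and a context vector for one decoding step. It assumes plain dot-product scoring, which the abstract does not specify; the function names `softmax` and `attend` and the toy vectors are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Dot-product attention (an assumed scoring function).

    Scores each encoder state against the current decoder state,
    normalizes the scores with softmax, and returns the weights
    together with the weighted sum of encoder states (the context).
    """
    scores = [
        sum(d * e for d, e in zip(decoder_state, enc))
        for enc in encoder_states
    ]
    weights = softmax(scores)
    dim = len(decoder_state)
    context = [
        sum(w * enc[i] for w, enc in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Toy encoder states for three source tokens (made-up values).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = attend(decoder_state, encoder_states)
```

In a seq2seq title generator, a context vector of this kind would typically be combined with the decoder state before predicting the next output word, so that source tokens most similar to the current state contribute most to the generated problem or method term.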

Keywords: lexical function recognition; deep learning; automatic abstraction; academic text

Received: 16 May 2020


Cite this article:

Cheng Qikai, Li Pengcheng, Zhang Guobiao, et al. Recognition of Lexical Functions in Academic Texts: Problem Method Extraction Based on Title Generation Strategy and Attention Mechanism[J]. Journal of the China Society for Scientific and Technical Information, 2021, 40(1): 43-52.

URL:

https://qbxb.istic.ac.cn/EN/10.3772/j.issn.1000-0135.2021.01.005 OR https://qbxb.istic.ac.cn/EN/Y2021/V40/I1/43


Editorial Office: JCSSTI Editorial Office, No. 15 Fuxing Road, Haidian, Beijing 100038
Tel: +86(010)68598273; Fax: +86(010)68598285; E-mail: qbxb@istic.ac.cn
Copyright © 2015 by the Journal of The China Society for Scientific and Technical Information
ISSN: 1000-0135 CN: 11-2257 / G3