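Recognition of Lexical Functions in Academic Texts: Problem Method Extraction Based on Title Generation Strategy and Attention Mechanism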

doi:10.3772/j.issn.1000-0135.2021.01.005

Journal of the China Society for Scientific and Technical Information (情报学报), 2021, Vol. 40, Issue 1: 43-52

Intelligence Analysis Methods and Techniques

Recognition of Lexical Functions in Academic Texts: Problem Method Extraction Based on Title Generation Strategy and Attention Mechanism

Cheng Qikai (程齐凯)^1,2, Li Pengcheng (李鹏程)^1,2, Zhang Guobiao (张国标)^1,2, Lu Wei (陆伟)^1,2

1. School of Information Management, Wuhan University, Wuhan 430072
2. Institute for Information Retrieval and Knowledge Mining, Wuhan University, Wuhan 430072

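Abstract: Lexical function recognition in academic texts aims to extract the terms that denote problems, methods, objects, and similar roles in a scholarly document. Traditional recognition methods suffer from low recognition accuracy, limited recall, and poor generalization because training data are hard to obtain. To address these problems, this study proposes a lexical function recognition method based on deep learning and a title generation strategy, recasting the task from information extraction into a specific form of title generation. A seq2seq model with an attention mechanism captures multi-level semantic information about terms and generates the terms that refer to a paper's problem and method. Experimental results show that, by applying deep learning and the title generation strategy, the proposed model effectively identifies the main research problem and main research method of a paper from its abstract, and clearly outperforms existing methods in recognition quality.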
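As a rough illustration of the attention step in a seq2seq decoder, the sketch below weights each encoder state by its similarity to the current decoder state and then averages the encoder states with those weights. It is a minimal sketch under stated assumptions, not the authors' model: the abstract does not specify the scoring function, so dot-product scoring is assumed here, and the function names (`softmax`, `attend`) and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Dot-product attention (assumed scoring function, not taken from the paper).

    Returns the attention weights over the encoder states and the
    resulting context vector (weighted sum of the encoder states).
    """
    scores = [
        sum(d * h for d, h in zip(decoder_state, state))
        for state in encoder_states
    ]
    weights = softmax(scores)
    dim = len(decoder_state)
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Toy encoder states for three abstract tokens (hypothetical values).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = attend(decoder_state, encoder_states)
```

In this toy input the first and third encoder states score equally against the decoder state, so they receive equal weight, while the second receives less. A generation model would feed `context` into the decoder at each step when producing the problem and method terms.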

Keywords: lexical function recognition, deep learning, automatic summarization, academic text

Received: 2020-05-16

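Funding: National Natural Science Foundation of China project "Citation Recommendation for Academic Literature Based on Multi-Semantic Information Fusion" (71673211); National Natural Science Foundation of China Young Scientists Fund project "Diversified Citation Recommendation Based on Deep Semantic Mining" (71704137).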

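About the authors: Cheng Qikai (程齐凯), male, born 1989, PhD, associate professor; main research interests: natural language processing, information retrieval, and machine learning. Li Pengcheng (李鹏程), male, born 1994, PhD candidate; research interests: text mining and deep learning. Zhang Guobiao (张国标), male, born 1990, PhD candidate; research interests: image recognition and deep learning. Lu Wei (陆伟), male, born 1974, PhD, professor and doctoral supervisor; main research interests: information retrieval, knowledge management, and data intelligence; E-mail: weilu@whu.edu.c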

Cite this article:

Cheng Qikai, Li Pengcheng, Zhang Guobiao, Lu Wei. Recognition of Lexical Functions in Academic Texts: Problem Method Extraction Based on Title Generation Strategy and Attention Mechanism[J]. Journal of the China Society for Scientific and Technical Information, 2021, 40(1): 43-52.

Link to this article:

https://qbxb.istic.ac.cn/CN/10.3772/j.issn.1000-0135.2021.01.005 or https://qbxb.istic.ac.cn/CN/Y2021/V40/I1/43
