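Structured Abstract Generation for Scientific and Technological Papers by Integrating Moves and Text Features (融合语步和文本多特征的科技论文结构化摘要生成)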

doi:10.3772/j.issn.1000-0135.2023.10.004

情报学报 (Journal of the China Society for Scientific and Technical Information)

2023, Vol. 42

Issue (10): 1176-1186

Section: Information Technology and Applications


Xi Haixu (习海旭)^1,2, He Sheng (何胜)^1, Huang Chunguo (黄纯国)^1

1. School of Computer Engineering, Jiangsu University of Technology, Changzhou 213001
2. Department of Information Management, School of Economics & Management, Nanjing University of Science & Technology, Nanjing 210094


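Abstract: In the mobile internet era, mobile reading and fragmented reading have become the mainstream ways people read. Providing summaries during reading to improve reading efficiency is one important way to address information overload. Scientific research papers are long, broad in content, and rich in domain knowledge, so summarizing them is more challenging than summarizing ordinary text such as news. This paper proposes a structured summarization method for scientific and technological papers. First, the paper is divided into different moves. Second, extractive summarization is performed on the text of each move separately: multiple text features are integrated, by weight, into the iterative computation of the TextRank algorithm, and the MMR (maximal marginal relevance) algorithm is introduced to remove redundancy from the preselected summary set. Finally, dependency parsing is used to analyze the text semantically and condense the summary further, and the per-move summaries are combined into a structured abstract. The results show that, compared with the baseline models, the gains in relevance, diversity, and readability metrics differ across moves. Combined with human evaluation, the method significantly improves summary diversity while improving relevance and readability to some extent.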
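The extractive step described in the abstract (feature-weighted TextRank followed by MMR redundancy removal) can be illustrated with a minimal sketch. The abstract does not name the text features, their weights, or the exact point at which they enter the iteration, so everything specific here is an assumption made for illustration: the two features (sentence position and length), the `weights` dictionary, the use of the feature score as the teleport term of the TextRank update, the whitespace tokenizer, and all function names. It is not the authors' implementation.

```python
import math
from collections import Counter


def tokenize(sentence):
    # Assumption: whitespace tokenization; Chinese text would need a word segmenter.
    return sentence.lower().split()


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similarity_matrix(sentences):
    """Symmetric sentence-similarity matrix with a zero diagonal."""
    bags = [Counter(tokenize(s)) for s in sentences]
    n = len(bags)
    return [[0.0 if i == j else cosine(bags[i], bags[j]) for j in range(n)] for i in range(n)]


def feature_prior(sentences, weights):
    """Hypothetical features (position, length) combined linearly, normalized to sum to 1."""
    n = len(sentences)
    max_len = max(len(tokenize(s)) for s in sentences)
    raw = []
    for i, s in enumerate(sentences):
        position = 1.0 - i / n                  # earlier sentences score higher
        length = len(tokenize(s)) / max_len     # longer sentences score higher
        raw.append(weights["position"] * position + weights["length"] * length)
    total = sum(raw)
    return [r / total for r in raw]


def weighted_textrank(sim, prior, damping=0.85, iterations=50):
    """TextRank iteration whose teleport term is the feature prior (an assumed fusion scheme)."""
    n = len(sim)
    out_sum = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            incoming = sum(
                sim[j][i] / out_sum[j] * scores[j] for j in range(n) if out_sum[j] > 0
            )
            new_scores.append((1 - damping) * prior[i] + damping * incoming)
        scores = new_scores
    return scores


def mmr_select(scores, sim, k, lam=0.7):
    """Greedy MMR: trade off relevance against similarity to already selected sentences."""
    top = max(scores)
    selected = []
    candidates = list(range(len(scores)))
    while candidates and len(selected) < k:
        def mmr(i):
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            return lam * scores[i] / top - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return sorted(selected)  # restore document order
```

In the setting the abstract describes, these functions would be applied to the sentences of one move at a time, and the per-move selections would then be concatenated into the structured abstract. The value of `lam` controls how strongly near-duplicate sentences are penalized: lower values favor diversity over relevance.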


Keywords: move, feature fusion, scientific paper summarization, dependency parsing, semantic analysis

Received: 2022-10-24

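Funding: National Social Science Fund of China project "Research on the Construction and Application of a Context-Aware Mobile Library Service Model" (19BTQ045).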

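About the authors: Xi Haixu, male, born in 1981, associate professor and doctoral candidate; main research areas: information retrieval and text mining; E-mail: xihaixu@jsut.edu.cn. He Sheng, male, born in 1971, PhD, professor, master's supervisor; main research areas: data mining and natural language processing. Huang Chunguo, male, born in 1966, master's degree, professor; main research areas: smart education, data analysis, and data mining.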

Cite this article:

Xi Haixu, He Sheng, Huang Chunguo. Structured Abstract Generation for Scientific and Technological Papers by Integrating Moves and Text Features[J]. 情报学报, 2023, 42(10): 1176-1186.

Link to this article:

https://qbxb.istic.ac.cn/CN/10.3772/j.issn.1000-0135.2023.10.004 or https://qbxb.istic.ac.cn/CN/Y2023/V42/I10/1176

Responsible editor: Pan Yao (潘尧)