Structured Abstract Generation for Scientific and Technological Papers by Integrating Moves and Text Features
Xi Haixu 1,2, He Sheng 1, Huang Chunguo 1
1. School of Computer Engineering, Jiangsu University of Technology, Changzhou 213001; 2. Department of Information Management, School of Economics & Management, Nanjing University of Science & Technology, Nanjing 210094
Abstract In the era of the mobile Internet, mobile and fragmented reading has become the dominant mode of public reading. Providing concise summaries of key content is an important way to mitigate information overload and improve reading efficiency. Summarizing scientific and technological papers is more challenging than summarizing ordinary texts such as news, because such papers are long, vary in content, and depend on domain knowledge. This paper proposes a structured summarization method for scientific papers. First, a paper is divided into moves, and the text of each move is summarized separately. Multiple text features are integrated, according to their weights, into the iterative calculation of the TextRank algorithm, and the maximal marginal relevance (MMR) algorithm is applied to remove redundancy from the pre-selected summary set. Finally, the text is analyzed semantically using dependency syntax analysis, and the summary is further condensed and combined into a structured summary. Experimental results show that the method performs differently from the benchmark model across moves in terms of relevance, diversity, and readability. Together with manual evaluation, the results indicate that the method significantly improves the diversity of the summary while also improving its relevance and readability to a certain extent.
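The extractive step described in the abstract (feature-weighted TextRank followed by MMR) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how features enter the iteration, so this sketch assumes they act as a normalized prior in the teleport term of the TextRank update. The function names, the bag-of-words cosine similarity, and the parameters `d`, `lam`, and `prior` are all hypothetical choices made for illustration. Move segmentation and dependency-based compression are omitted.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lowercased token counts; a stand-in for richer sentence vectors."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def feature_textrank(sentences, prior, d=0.85, iters=50):
    """TextRank with a feature-based prior (assumed integration scheme).

    prior[i] is a combined, non-negative feature weight for sentence i
    (for example position or keyword overlap); it biases the teleport term.
    """
    vecs = [bag_of_words(s) for s in sentences]
    n = len(vecs)
    sim = [
        [cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out = [sum(row) for row in sim]  # total edge weight leaving each node
    total = sum(prior)
    p = [x / total for x in prior] if total > 0 else [1.0 / n] * n
    scores = p[:]
    for _ in range(iters):
        scores = [
            (1 - d) * p[i]
            + d * sum(sim[j][i] / out[j] * scores[j] for j in range(n) if out[j] > 0)
            for i in range(n)
        ]
    return scores, sim


def mmr_select(scores, sim, k, lam=0.5):
    """Greedy MMR: trade off relevance against similarity to chosen sentences."""
    top = max(scores) or 1.0
    rel = [s / top for s in scores]  # rescale so relevance and similarity are comparable
    selected, candidates = [], list(range(len(scores)))
    while candidates and len(selected) < k:
        def mmr(i):
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            return lam * rel[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy "move" containing one duplicated sentence.
move_sentences = [
    "TextRank ranks sentences in a graph",
    "TextRank ranks sentences in a graph",
    "MMR removes redundant sentences from the summary",
    "Dependency parsing trims the summary sentences",
]
scores, sim = feature_textrank(move_sentences, prior=[1.0, 1.0, 1.0, 1.0])
summary_ids = mmr_select(scores, sim, k=2)
```

In this toy run the two identical sentences receive the same TextRank score, and the MMR penalty keeps the second copy out of the two-sentence selection. In a structured summary, the same routine would be applied to each move separately and the per-move results concatenated.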
Received: 24 October 2022
Responsible editor: Pan Yao