Research on Event Extraction from Ancient Books Based on Machine Reading Comprehension

Yu Xuehan1,2, He Lin1,2, Wang Xianqi1,2

1. College of Information Management, Nanjing Agricultural University, Nanjing 210095
2. Research Center for Humanities and Social Computing, Nanjing Agricultural University, Nanjing 210095
|
|
Abstract: Exploring the context of ancient Chinese classics and extracting the events and event arguments they contain are critical for reading and understanding their content quickly. At present, research on event extraction from ancient books is based mainly on pattern matching, machine learning, and neural networks. This paper integrates the machine reading comprehension (MRC) paradigm into existing neural network-based methods, combining the "event type" and "argument role" in event extraction into the form of a question so that the answer is the corresponding event argument. Zuo Zhuan (in annalistic style) and The Historical Records (in annals-biographies style) are selected as the training and generalization data, respectively, and distractor sentences are introduced during generalization to verify the robustness of the model, providing a reference approach for event extraction from ancient Chinese texts.
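The MRC formulation described above can be illustrated with a minimal sketch. All names, templates, and the example passage below are hypothetical illustrations, not the paper's actual implementation; a real system would predict the answer span with a pretrained reader (e.g., a BERT-family model) rather than by string matching.

```python
# Minimal sketch: casting event-argument extraction as machine reading
# comprehension. The "event type" and "argument role" are combined into a
# natural-language question; the answer span in the passage is the argument.

def build_question(event_type: str, argument_role: str) -> str:
    """Combine event type and argument role into an MRC-style question.
    The template here is a hypothetical English stand-in."""
    return f"In the {event_type} event, what is the {argument_role}?"

def extract_answer(passage: str, answer_text: str):
    """Locate the answer span (start, end) in the passage. A real reader
    would predict these offsets from (question, passage) with a neural
    model; returning None mirrors SQuAD 2.0-style unanswerable questions."""
    start = passage.find(answer_text)
    if start == -1:
        return None
    return (start, start + len(answer_text))

# Illustrative passage (invented, in English for readability).
passage = "The Duke of Zheng attacked the state of Xu in summer."
question = build_question("attack", "attacker")
span = extract_answer(passage, "The Duke of Zheng")
print(question)  # In the attack event, what is the attacker?
print(span)      # (0, 17)
```

The key design point is that one extraction model can serve every (event type, argument role) pair, since the pair is encoded in the question rather than in the model's output layer.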
Received: 21 April 2022
|
|
|
|
|
|
|