Named Entity Recognition of Local Chronicles Literature in Traditional Chinese Opera Based on Multi-dimensional Feature Analysis
Zhai Shanshan1,2, Yu Huajuan1, Chen Jianyao3, Xia Lixin1
1. School of Information Management, Central China Normal University, Wuhan 430079
2. Intelligent Computing Laboratory for Cultural Heritage, Wuhan University, Wuhan 430072
3. University of Wisconsin-Milwaukee, Milwaukee 53202
Abstract Local chronicles are a unique and highly valuable form of regional documentation in China. Digitizing these records and mining the knowledge they contain is crucial for the inheritance and dissemination of traditional Chinese culture, as well as for the construction of a culturally strong nation. Named entity recognition (NER) is a fundamental technology for organizing and discovering knowledge within local chronicles. Although NER for local chronicles has made some progress, a systematic technical solution adapted to the specific features of these texts and the characteristics of domain resources is still lacking. This study therefore proposes an approach to NER in local chronicles of traditional Chinese opera that integrates multi-dimensional features with Bi-LSTM-CRF. First, the distinctive traits of opera entities within local chronicles are analyzed by combining syntactic features with textual features such as symbols, suffixes, word structure, context, and negative examples. Next, the Bi-LSTM-CRF model, which performs well on long text structures, is used together with the parsed features of opera entities to improve the efficiency of entity recognition. Finally, an empirical study is conducted on the “Chu Opera Chronicles.” The results show that the proposed model outperforms the baseline model in named entity recognition, achieving an F1 score of 0.869.
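For readers unfamiliar with the reported metric, the following is a minimal sketch of how an F1 score for NER is commonly computed at the entity level from BIO tag sequences. The abstract does not say whether the authors score at the entity or token level, so the exact-match span scoring here is an assumption, and the tag names `PLAY` and `PER` are hypothetical labels chosen only for illustration.

```python
from typing import List, Set, Tuple

# (start index, exclusive end index, entity type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence; stray I- tags are ignored."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any entity that runs to the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if not continues:
            if start is not None:
                spans.add((start, i, etype))
            start, etype = None, None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
    return spans


def entity_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) using exact span and type matches."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    true_pos = len(gold_spans & pred_spans)
    precision = true_pos / len(pred_spans) if pred_spans else 0.0
    recall = true_pos / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical tags: the second entity is predicted with a wrong boundary.
gold = ["B-PLAY", "I-PLAY", "O", "B-PER", "I-PER", "O"]
pred = ["B-PLAY", "I-PLAY", "O", "B-PER", "O", "O"]
print(entity_f1(gold, pred))
```

Under exact-match scoring, a prediction with a wrong boundary counts as both a false positive and a false negative, which is why boundary cues such as symbols and suffixes can matter for the final score.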
Received: 18 October 2023