Named Entity Recognition of Ancient Texts Based on the Enhancement of Multimodal Information from Chinese Characters and Pictographic Visual Alignment
Zheng Xuhui1,2, Wang Hao1,2, Qiu Jingwen1,2
1. School of Information Management, Nanjing University, Nanjing 210023
2. Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023
Zheng Xuhui, Wang Hao, Qiu Jingwen. Named Entity Recognition of Ancient Texts Based on the Enhancement of Multimodal Information from Chinese Characters and Pictographic Visual Alignment[J]. Journal of the China Society for Scientific and Technical Information (情报学报), 2025, 44(4): 452-465.
(Responsible editor: Wei Ruibin)