Named Entity Recognition of Ancient Books Based on MU Sequence Labeling
Xu Qiankun1, Wang Dongbo1,2, Liu Yutong1, Huang Shuiqing1,2
1. College of Information Management, Nanjing Agricultural University, Nanjing 210095; 2. Research Center for Humanities and Social Computing, Nanjing Agricultural University, Nanjing 210095
Xu Qiankun, Wang Dongbo, Liu Yutong, Huang Shuiqing. Named Entity Recognition of Ancient Books Based on MU Sequence Labeling[J]. 情报学报 (Journal of the China Society for Scientific and Technical Information), 2025, 44(6): 736-747.