Tang Xuemei, Su Qi, Wang Jun, Yang Hao. Ancient Chinese Word Segmentation Based on Graph Convolutional Neural Network[J]. Journal of the China Society for Scientific and Technical Information (情报学报), 2023, 42(6): 740-750.
(Responsible editor: Wang Keping)