Research on Chinese Named Entity Linking Based on Multi-feature Fusion
Lin Zefei (1, 2), Ou Shiyan (1)
1. School of Information Management, Nanjing University, Nanjing 210093; 2. College of Social Development, Fujian Normal University, Fuzhou 350007
Abstract Named entity linking (NEL) disambiguates ambiguous named entity mentions in a text by mapping each mention to its correct entry in a knowledge base. Most current NEL research and practice relies on Wikipedia and targets Western-language texts rather than Chinese texts. This study proposes a Chinese named entity linking method based on Baidu Encyclopedia. The method integrates single and collective named entity disambiguation features and applies different feature combinations according to text length. In addition, a two-stage disambiguation strategy was designed, in which a second round optimizes the results of the first round of disambiguation. Experiments on real Chinese corpora showed that multi-feature fusion and two-stage disambiguation significantly improve disambiguation accuracy. A comparative experiment demonstrated that the method outperforms a similar state-of-the-art system, the Chinese NEL service of the Knowledge Works Lab at Fudan University.
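The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature names (`popularity`, `context_similarity`, `coherence`), the weights, the 50-character length threshold, and the `relatedness` table are all hypothetical, chosen only to show how length-dependent feature selection, weighted fusion, and a second disambiguation round might fit together.

```python
from typing import Dict, FrozenSet, List

# All names, weights, and thresholds below are illustrative assumptions,
# not values taken from the paper.
WEIGHTS = {"popularity": 0.4, "context_similarity": 0.6, "coherence": 0.5}
SHORT_TEXT_LIMIT = 50  # characters; hypothetical short/long cut-off

# mention -> candidate entity -> feature name -> feature value
Candidates = Dict[str, Dict[str, Dict[str, float]]]


def select_features(text: str) -> List[str]:
    """Choose a feature combination according to text length."""
    if len(text) <= SHORT_TEXT_LIMIT:
        return ["popularity", "context_similarity"]
    return ["popularity", "context_similarity", "coherence"]


def fused_score(values: Dict[str, float], active: List[str]) -> float:
    """Weighted sum of the active features (missing features count as 0)."""
    return sum(WEIGHTS[name] * values.get(name, 0.0) for name in active)


def first_round(candidates: Candidates, active: List[str]) -> Dict[str, str]:
    """Pick the best candidate per mention using single-mention features only."""
    single = [name for name in active if name != "coherence"]
    return {
        mention: max(cands, key=lambda c: fused_score(cands[c], single))
        for mention, cands in candidates.items()
    }


def second_round(
    candidates: Candidates,
    first: Dict[str, str],
    relatedness: Dict[FrozenSet[str], float],
    active: List[str],
) -> Dict[str, str]:
    """Re-rank each mention using coherence with the other first-round picks."""
    result = {}
    for mention, cands in candidates.items():
        others = [entity for m, entity in first.items() if m != mention]

        def score(cand: str) -> float:
            values = dict(cands[cand])
            if others:
                # Average relatedness to the entities chosen for other mentions.
                values["coherence"] = sum(
                    relatedness.get(frozenset((cand, o)), 0.0) for o in others
                ) / len(others)
            return fused_score(values, active)

        result[mention] = max(cands, key=score)
    return result


# Toy data: an ambiguous mention "Apple" next to an unambiguous "Jobs".
candidates: Candidates = {
    "Apple": {
        "Apple (fruit)": {"popularity": 0.6, "context_similarity": 0.5},
        "Apple Inc.": {"popularity": 0.4, "context_similarity": 0.5},
    },
    "Jobs": {"Steve Jobs": {"popularity": 1.0, "context_similarity": 0.9}},
}
relatedness = {frozenset(("Apple Inc.", "Steve Jobs")): 0.9}

long_text = "x" * 60  # stands in for a text longer than the threshold
active = select_features(long_text)
round_one = first_round(candidates, active)
round_two = second_round(candidates, round_one, relatedness, active)
```

In this toy setup the first round picks the more popular candidate for "Apple", and the second round can revise that choice once the coherence feature takes the other mention's linked entity into account. For a short text, `select_features` omits the coherence feature, so the second round leaves the first-round result unchanged.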
Received: 04 September 2018