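Abstract: Named entity recognition (NER) is a fundamental task in natural language processing, and its output is used in a wide range of applications. Because linked data carries rich semantic knowledge, it can further improve existing NER approaches. This paper implements a configurable Chinese-English NER system based on linked data that disambiguates entities during recognition and extends the recognition results, offering a new direction for improving NER. Specifically, we (1) construct a cross-domain Chinese-English named entity dictionary from DBpedia; (2) design a Hive-based data storage model with distributed management and use it to organize, store, and extend the DBpedia dataset; and (3) design a graph-based NER algorithm that exploits the semantic relations in linked data to disambiguate named entities. We test the algorithm on the DBpedia Spotlight NER Corpus and compare its results with three systems: DBpedia Spotlight, NERSO, and Zemanta. The results show that our algorithm performs better in recall, precision, and F-measure.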