Research on the Discovery of Entity Relationships in Subdivided Domains under the Guidance of a Small-scale Knowledge Base
Chen Guo1,2, Xu Tianxiang1
1. Department of Information Management, Nanjing University of Science and Technology, Nanjing 210094; 2. Jiangsu Science and Technology Collaborative Innovation Center of Social Public Safety, Nanjing 210094
Abstract The acquisition of entity relationships in subdivided domains is a key issue for deepening and generalizing applications of knowledge engineering. Current approaches rely heavily on manually annotated corpora. A natural way to reduce this reliance is to use an existing (or low-cost) knowledge base in the subdivided domain as a guide. Unlike a general knowledge base, a domain knowledge base is often small, so it is necessary not only to use its ready-made knowledge content but also to fully exploit the "domain meta-knowledge" it contains. This paper proposes an entity relationship discovery scheme for subdivided domains that combines domain meta-knowledge with word embedding vector analogy. First, the entity relationship constraints of a specific subdivided domain are described on the basis of the existing knowledge base; for example, the symptom representation relationship consists of <disease, symptom> entity pairs. Second, the word embedding vectors of the domain entities are computed from the corresponding domain corpus. Third, the small number of high-quality entity relationships provided by the knowledge base are used to learn positive and negative vector benchmarks for the word embedding analogy of each relationship type, and an entity relationship classifier is trained on this basis. Finally, for a given domain entity, the entities that form each type of relationship with it are obtained by combining relational constraints, word embedding similarity, and word embedding analogy results. Taking cardiovascular data as an example, a small amount of domain knowledge extracted from an encyclopedia can be used to obtain good entity relationship recognition results.
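The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy vectors, the entity names, the `TYPES` table, and the helper functions `relation_offset` and `rank_tails` are all hypothetical, and a plain cosine score against the mean seed offset stands in for the trained entity relationship classifier. The sketch only shows how a type constraint (the tail must be a symptom) and an embedding analogy learned from a few seed <disease, symptom> pairs can be combined to rank candidate entities.

```python
import numpy as np

# Hypothetical toy embeddings; in practice these would be word vectors
# trained on a domain corpus.
EMB = {
    "hypertension": np.array([1.0, 0.0, 0.0]),
    "headache":     np.array([1.0, 1.0, 0.0]),
    "angina":       np.array([0.0, 0.0, 1.0]),
    "chest_pain":   np.array([0.0, 1.0, 1.0]),
    "arrhythmia":   np.array([1.0, 0.0, 1.0]),
    "palpitation":  np.array([1.0, 1.0, 1.0]),
    "aspirin":      np.array([1.0, -1.0, 1.0]),
}

# Hypothetical entity types, standing in for the relationship constraints
# (domain meta-knowledge) read off the small knowledge base.
TYPES = {
    "hypertension": "disease", "angina": "disease", "arrhythmia": "disease",
    "headache": "symptom", "chest_pain": "symptom", "palpitation": "symptom",
    "aspirin": "drug",
}

# A few seed <disease, symptom> pairs taken from the knowledge base.
SEEDS = [("hypertension", "headache"), ("angina", "chest_pain")]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def relation_offset(seed_pairs):
    """Mean (tail - head) offset over the seed pairs: the analogy benchmark."""
    return np.mean([EMB[t] - EMB[h] for h, t in seed_pairs], axis=0)


def rank_tails(head, offset, tail_type):
    """Rank entities of the required type by how well (tail - head) matches the offset."""
    scored = []
    for name, vec in EMB.items():
        # Apply the relationship constraint: skip the head itself and wrong types.
        if name == head or TYPES[name] != tail_type:
            continue
        scored.append((name, cosine(vec - EMB[head], offset)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    offset = relation_offset(SEEDS)
    for name, score in rank_tails("arrhythmia", offset, "symptom"):
        print(f"{name}\t{score:.3f}")
```

With these toy vectors the type constraint excludes `aspirin` before any scoring, and the analogy score then orders the remaining symptom candidates. A classifier trained on positive and negative offset vectors, as the abstract describes, would replace the single cosine score.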
Received: 19 December 2018