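Research on the Discovery of Entity Relationships in Subdivided Domains under the Guidance of a Small-scale Knowledge Base (小规模知识库指导下的细分领域实体关系发现研究)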

doi:10.3772/j.issn.1000-0135.2019.11.008

Journal of the China Society for Scientific and Technical Information (情报学报)

2019, Vol. 38, Issue 11: 1200-1211

Information Analysis Methods and Technology


Chen Guo (陈果)^1,2, Xu Tianxiang (许天祥)^1

1. Department of Information Management, School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094
2. Jiangsu Science and Technology Collaborative Innovation Center of Social Public Safety, Nanjing 210094


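Abstract: Acquiring entity relationships in subdivided domains is a key problem for deepening and generalizing the application of knowledge engineering, and its core difficulty at present is a heavy dependence on manually annotated corpora. A natural solution is to use a knowledge base that already exists in the subdivided domain (or that can be obtained at low cost) as guidance. Unlike general-purpose knowledge bases, knowledge bases for subdivided domains tend to be small, so it is necessary not only to use the ready-made knowledge they contain but also to fully exploit the regular "domain meta-knowledge" implicit in them. This paper proposes a scheme for discovering entity relationships in subdivided domains that combines domain meta-knowledge with word-embedding analogy. First, entity-relationship constraints for the specific subdivided domain are abstracted from the existing knowledge base; for example, a symptom-manifestation relationship is formed by a <disease, symptom> entity pair. Second, word-embedding vectors for the domain entities are computed from a corresponding domain corpus. Next, from the small number of high-quality entity relationships in the knowledge base, positive and negative reference vectors for the word-embedding analogy of each relationship type are learned, and an entity-relationship classifier is trained on that basis. Finally, for a given domain entity, the relationship constraints, word-embedding similarity, and word-embedding analogy results are combined for classification, yielding the entities that form each type of relationship with it. Taking data from the cardiovascular domain as an example, good entity-relationship recognition results are obtained using only a small amount of domain knowledge extracted from an encyclopedia.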
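The four steps in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the entity names, the hand-made 3-dimensional "embeddings", the relation name `symptom_of`, the helper names, and the use of a 1-nearest-neighbour comparison over offset vectors as a stand-in for the trained classifier, whose actual form the abstract does not specify.

```python
import numpy as np

# Hypothetical toy data: entity types and 3-d vectors invented for illustration.
# In practice the vectors would come from embeddings trained on a domain corpus.
ENTITY_TYPE = {
    "hypertension": "disease", "angina": "disease",
    "headache": "symptom", "dizziness": "symptom",
    "chest_pain": "symptom", "palpitation": "symptom",
    "aspirin": "drug",
}
EMB = {
    "hypertension": np.array([1.0, 0.0, 0.0]),
    "angina": np.array([0.0, 1.0, 0.0]),
    "headache": np.array([1.0, 0.0, 1.0]),
    "dizziness": np.array([0.9, 0.1, 1.0]),
    "chest_pain": np.array([0.0, 1.0, 1.0]),
    "palpitation": np.array([0.1, 0.9, 1.0]),
    "aspirin": np.array([0.0, 1.0, 0.9]),
}

# Step 1: relation constraint (domain meta-knowledge), as (head type, tail type).
CONSTRAINT = {"symptom_of": ("disease", "symptom")}

# Step 3: a few known positive and negative pairs from the small knowledge base.
POSITIVE = {"symptom_of": [("hypertension", "headache"), ("angina", "chest_pain")]}
NEGATIVE = {"symptom_of": [("hypertension", "chest_pain"), ("angina", "headache")]}


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def offset(head, tail):
    """Analogy offset vector: tail embedding minus head embedding."""
    return EMB[tail] - EMB[head]


def is_related(head, tail, relation):
    """Assumed stand-in classifier: 1-nearest-neighbour over reference offsets."""
    v = offset(head, tail)
    best_pos = max(cosine(v, offset(h, t)) for h, t in POSITIVE[relation])
    best_neg = max(cosine(v, offset(h, t)) for h, t in NEGATIVE[relation])
    return best_pos > best_neg


def discover(head, relation, min_sim=0.0):
    """Step 4: combine type constraint, similarity filter, and analogy result."""
    head_type, tail_type = CONSTRAINT[relation]
    if ENTITY_TYPE[head] != head_type:
        return []
    known = {t for h, t in POSITIVE[relation] if h == head}
    found = []
    for cand, cand_type in ENTITY_TYPE.items():
        if cand_type != tail_type or cand in known:
            continue  # violates the relation constraint or is already known
        if cosine(EMB[head], EMB[cand]) < min_sim:
            continue  # too dissimilar to the head entity
        if is_related(head, cand, relation):
            found.append(cand)
    return sorted(found)


print(discover("hypertension", "symptom_of"))
print(discover("angina", "symptom_of"))
```

In this toy setup the drug entity has an offset from `angina` that resembles the positive reference offsets, so only the type constraint keeps it out of the results. This is one way to picture why the abstract combines constraints with the analogy step instead of relying on embeddings alone.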

Keywords: domain entity relationship, word-embedding analogy, terminology analysis, domain knowledge analysis

Received: 2018-12-19

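Funding: National Social Science Fund of China, Youth Project "Research on Semantic Mining and Knowledge Evolution of Scientific and Technical Vocabulary from the Perspective of Domain Analysis" (16CTQ024).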

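About the author: Chen Guo, male, born in 1986, PhD, associate professor and master's supervisor. His main research interests are domain knowledge analysis and knowledge services. E-mail: delphi1987@qq.com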

Cite this article:

Chen Guo, Xu Tianxiang. Research on the Discovery of Entity Relationships in Subdivided Domains under the Guidance of a Small-scale Knowledge Base[J]. Journal of the China Society for Scientific and Technical Information (情报学报), 2019, 38(11): 1200-1211.

Link to this article:

https://qbxb.istic.ac.cn/CN/10.3772/j.issn.1000-0135.2019.11.008 or https://qbxb.istic.ac.cn/CN/Y2019/V38/I11/1200
