Learning Concept Hierarchies from Chinese Academic Literature for Domain Ontology Construction
Tang Lin1, Guo Chonghui1, Chen Jingfeng1, Sun Leilei2
1. Institute of Systems Engineering, Dalian University of Technology, Dalian 116024
2. SKLSDE Lab and BDBC, Beihang University, Beijing 100083
Abstract: Constructing a domain ontology from academic literature is of great significance for promoting the development of a discipline. Taking Chinese academic literature as the data source, this study proposes a semi-automatic method for extracting concept hierarchies. First, a fine-grained, general research framework for constructing the hierarchical relations of a domain ontology is proposed. Then, a novel concept representation fusion method is developed that combines the semantic features of concepts, learned with deep learning, with the time series of concept frequencies. By combining the affinity propagation (AP) clustering algorithm, Prim's algorithm, and data from a Web search engine, a rule-based reasoning ontology concept hierarchy extraction algorithm (RROCHE) is proposed, with which concept hierarchy relations are learned semi-automatically. The algorithm is then applied to the academic literature on Chinese word segmentation. Numerical experiments examined the feasibility and effectiveness of the proposed method. The proposed method can also be applied to other domains.
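The abstract names the building blocks but not their details, so the following is only a minimal sketch of two of them: fusing a semantic vector with a frequency time series, and linking concepts with Prim's algorithm. The weighted concatenation in `fuse`, the `weight` parameter, the cosine distance, and all toy vectors and counts are assumptions made for illustration, not the authors' formulation; AP clustering, the Web search step, and the rule-based reasoning that orients edges into parent-child relations are omitted.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(semantic_vec, yearly_freq, weight=0.5):
    """Assumed fusion rule: weighted concatenation of the two normalised views."""
    sem = [weight * x for x in l2_normalize(semantic_vec)]
    tmp = [(1.0 - weight) * x for x in l2_normalize(yearly_freq)]
    return sem + tmp


def cosine_distance(a, b):
    """1 - cosine similarity; defined as 1.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)


def prim_mst(names, vectors):
    """Prim's algorithm on the complete cosine-distance graph.

    Returns undirected tree edges as (attached_to, new_concept, distance),
    grown from the first concept in `names`.
    """
    n = len(names)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known distance from each node to the tree
    link = [-1] * n         # tree node that realises that distance
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the closest node that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if link[u] >= 0:
            edges.append((names[link[u]], names[u], best[u]))
        # Relax distances of the remaining nodes through u.
        for v in range(n):
            if not in_tree[v]:
                d = cosine_distance(vectors[u], vectors[v])
                if d < best[v]:
                    best[v] = d
                    link[v] = u
    return edges


# Toy data (hypothetical): concept -> (semantic vector, yearly frequency counts).
concepts = {
    "word segmentation": ([0.9, 0.1, 0.0], [5, 8, 12]),
    "conditional random field": ([0.8, 0.3, 0.1], [1, 4, 9]),
    "maximum matching": ([0.7, 0.0, 0.4], [6, 5, 3]),
    "neural network": ([0.2, 0.9, 0.1], [0, 2, 10]),
}
names = list(concepts)
vectors = [fuse(sem, freq) for sem, freq in concepts.values()]
mst_edges = prim_mst(names, vectors)

for attached_to, new_concept, dist in mst_edges:
    print(f"{attached_to} -- {new_concept} (distance {dist:.3f})")
```

The resulting tree is undirected: it only says which concepts are most closely linked. Deciding which end of an edge is the broader concept is the part the abstract assigns to rule-based reasoning and search-engine data, which this sketch does not attempt.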
Received: 25 March 2019