Research on Hierarchy Identification of Chinese Terms in the Field of E-government
Zhang Wei 1,2; Wang Hao 1,2; Deng Sanhong 1,2; Zhang Baolong 1,2
1. School of Information Management, Nanjing University, Nanjing 210023; 2. Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing University, Nanjing 210023
Abstract Driven by data, e-government information resources are increasingly multi-source and heterogeneous. Using a large-scale corpus to design an automatic scheme for identifying the deep hierarchy of Chinese terms in the e-government field not only compensates for the shortcomings of manually compiled thesauri in content and structure but also has practical significance for the dissemination and subsequent application of government information resources in China. This paper therefore identifies the deep associations between terms in the e-government thesaurus from the dual perspectives of content and structure. The content-based hierarchy generated by spectral clustering serves as the preliminary framework, and the structure-based hierarchy generated by formal concept analysis guides its subsequent revision, yielding an ontology of e-government terms that takes both the recall and the accuracy of related terms into account. The results show that the hierarchy of the resulting e-government terminology ontology is reasonable and effective, and the evaluation of the term hierarchies indicates that the ontology is readily expandable and extensible.
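To make the structure-based step more concrete, the following is a minimal sketch of how formal concept analysis can yield broader/narrower relations between terms. It is not the authors' implementation: the abstract does not say which entities act as objects and attributes, so the sketch assumes documents are objects and terms are attributes, and the toy context `DOCS` and the helper names (`shared_terms`, `docs_with`, `formal_concepts`, `direct_parents`) are hypothetical.

```python
from itertools import combinations

# Hypothetical toy formal context: document (object) -> terms (attributes).
DOCS = {
    "d1": {"government", "e-government", "online services"},
    "d2": {"government", "e-government", "open data"},
    "d3": {"government"},
}


def shared_terms(docs, context):
    """Intent: the terms common to every document in `docs`."""
    terms = set().union(*context.values())
    for d in docs:
        terms &= context[d]
    return terms


def docs_with(terms, context):
    """Extent: the documents that contain every term in `terms`."""
    return {d for d, ts in context.items() if terms <= ts}


def formal_concepts(context):
    """Enumerate (extent, intent) pairs by closing every subset of documents.

    Exponential in the number of documents, so suitable for toy data only.
    """
    concepts = set()
    names = sorted(context)
    for r in range(len(names) + 1):
        for subset in combinations(names, r):
            intent = shared_terms(subset, context)
            extent = docs_with(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


def direct_parents(context):
    """Map each term to its nearest broader terms.

    Assumption: term u is broader than term t when the set of documents
    containing u strictly includes the set of documents containing t.
    """
    terms = set().union(*context.values())
    extent = {t: frozenset(docs_with({t}, context)) for t in terms}
    broader = {t: {u for u in terms if extent[t] < extent[u]} for t in terms}
    # Keep only the closest ancestors: drop u if another ancestor v lies below it.
    return {
        t: {u for u in ups if not any(extent[v] < extent[u] for v in ups)}
        for t, ups in broader.items()
    }


if __name__ == "__main__":
    for term, parents in sorted(direct_parents(DOCS).items()):
        print(term, "->", sorted(parents))
```

In this toy context, "government" ends up with no broader term, "e-government" sits directly under it, and the two remaining terms sit under "e-government". A realistic context would be far larger and would need a dedicated lattice algorithm rather than subset enumeration.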
Received: 07 November 2019