Knowledge-Discovery Method Driven by the Collaboration of Data and Knowledge: Concept, Mechanism, and Model
Yao Sumei 1,2, Lu Quan 1,2,3
1. Center for Studies of Information Resources, Wuhan University, Wuhan 430072
2. School of Information Management, Wuhan University, Wuhan 430072
3. Big Data Institute, Wuhan University, Wuhan 430072
Abstract Knowledge discovery is a critical theoretical framework for addressing the challenges posed by vast amounts of data and complex problems, advancing scientific research, and enhancing decision support. “Data” and “knowledge” are core concepts in information science, and knowledge discovery driven by either data or knowledge is an essential approach to solving research problems in data-intensive or knowledge-intensive contexts. However, pervasive problems of imperfect data and uncertain knowledge limit the effectiveness of these single-driver methods. The co-driven approach offers an innovative pathway for discovering new knowledge through the complementary integration of data and knowledge, yet a comprehensive, in-depth analysis of co-driven methods is still lacking. This study follows the cognitive logic of “what,” “why,” and “how” to explore the basic concepts, mechanisms, and models of knowledge discovery driven by the collaboration of data and knowledge. First, it defines knowledge discovery co-driven by data and knowledge and explains in detail the newly introduced concepts of imperfect data and uncertain knowledge, which are essential components of this framework. Second, it examines the multi-path, multi-objective strategies for integrating data into knowledge-driven knowledge discovery and for integrating knowledge into data-driven knowledge discovery, and it explains the essence and operational mechanisms of co-driven knowledge discovery by emphasizing the cross-complementarity between data and knowledge. Finally, it proposes a problem- and scenario-driven basic model of knowledge discovery co-driven by data and knowledge and elaborates on three primary categories of internal modeling: predominantly knowledge-driven discovery (construction and error-correction modes), predominantly data-driven discovery (embedding, correction, and guidance modes), and other collaborative knowledge-discovery methods (hybrid and concurrent modes). The co-driven knowledge-discovery approach, which encompasses multiple co-driven modes, balances the complementary and synergistic effects of data and knowledge, thus providing a more comprehensive framework and process for knowledge discovery. It expands methodological innovation and problem-solving perspectives within the discipline of information resource management.
Received: 12 March 2024