Discipline Identification Methods for Knowledge Units: A Comparative Study towards Feature Mining
Cao Yujie¹, Xiang Rongrong¹, Mao Jin², Wang Shiyun²
1. School of Information Management, Central China Normal University, Wuhan 430079
2. School of Information Management, Wuhan University, Wuhan 430072
Abstract The features of knowledge units are the basis for identifying the disciplines to which those units belong. Mining the key features of knowledge units can improve identification performance and thereby better support interdisciplinary research at the level of knowledge content. In this study, we used 16 discipline identification methods for knowledge units and compared their discriminative performance on knowledge units with different word frequencies and disciplinary coverages. We then evaluated three features implied by the methods (disciplinary importance, disciplinary relevance, and disciplinary discriminability), both individually and in combination, to find the best-performing feature subset. The test dataset was constructed from data in the interdisciplinary field of “computational medicine.” The experimental results showed that combining the three features achieved better performance on all groups, while the performance advantage of disciplinary importance indicates that it is the most important of the three; the discipline identification of high-frequency words needs to focus on disciplinary importance, while low-frequency words need to focus on disciplinary importance. For knowledge units with multidisciplinary coverage, disciplinary discriminability must be considered in addition to disciplinary importance. The findings provide theoretical guidance and practical suggestions for optimizing discipline identification methods for knowledge units.
Received: 01 November 2022