Interdisciplinary Literature Identification Method Based on an Improved Deep Learning Model
Feng Ling1,2, Pan Yuntao1
1. Research Center for Scientific Measurement and Evaluation, Institute of Scientific and Technical Information of China, Beijing 100038
2. School of Information Engineering, North China University of Water Resources and Electric Power, Zhengzhou 450046
Abstract Effectively identifying interdisciplinary literature helps researchers grasp trends in interdisciplinary research promptly and track research activity in interdisciplinary fields in real time, and it provides strong support for scientific research decision-making. This paper proposes a method for identifying interdisciplinary literature that is based on semantic intersection and uses an improved deep learning model. First, a training dataset for interdisciplinary literature identification is constructed through “text merging.” Then, an improved deep-learning-based text classification model is proposed and trained on this dataset. Finally, the trained model is used to determine whether a new publication is interdisciplinary. Empirical studies are conducted on two datasets, “Dental Materials” and “Computational Biology.” The results indicate that the proposed method is effective: the area under the curve (AUC) reaches 0.741 on “Dental Materials” and 0.966 on “Computational Biology.” Unlike traditional deep-learning-based text classification methods, the proposed method can train an interdisciplinary literature identification model from existing non-interdisciplinary literature alone, without relying on any prior knowledge of interdisciplinarity. Thus, when a new publication appears, the method can accurately determine whether it is interdisciplinary, which enables real-time monitoring of cutting-edge interdisciplinary fields with development potential. The method also identifies interdisciplinary literature considerably more efficiently than traditional methods.
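The abstract names “text merging” and AUC but does not spell out either step, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that a pseudo-interdisciplinary (positive) sample is made by concatenating two abstracts from different disciplines, that a negative sample concatenates two abstracts from the same discipline, and that each discipline has at least two documents. The function names `build_merged_dataset` and `auc`, the toy corpus, and the discipline labels are all hypothetical.

```python
import random


def build_merged_dataset(docs_by_discipline, n_per_class, seed=0):
    """Build (text, label) pairs by concatenating two abstracts.

    Assumed labeling rule (not taken from the paper):
      label 1 = the two abstracts come from different disciplines,
      label 0 = both abstracts come from the same discipline.
    Every discipline must contain at least two documents.
    """
    rng = random.Random(seed)
    disciplines = sorted(docs_by_discipline)
    samples = []
    for _ in range(n_per_class):
        # Positive: one abstract from each of two distinct disciplines.
        d1, d2 = rng.sample(disciplines, 2)
        first = rng.choice(docs_by_discipline[d1])
        second = rng.choice(docs_by_discipline[d2])
        samples.append((first + " " + second, 1))
        # Negative: two distinct abstracts from a single discipline.
        d = rng.choice(disciplines)
        first, second = rng.sample(docs_by_discipline[d], 2)
        samples.append((first + " " + second, 0))
    return samples


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy corpus with hypothetical discipline names and texts.
corpus = {
    "dentistry": ["resin composite wear", "enamel bonding strength"],
    "materials": ["polymer fatigue testing", "ceramic sintering kinetics"],
}
training_pairs = build_merged_dataset(corpus, n_per_class=3)
print(len(training_pairs))  # 3 positives + 3 negatives
print(auc([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.4]))
```

In this sketch the classifier itself is left out: any text classification model that outputs a score per merged text could be trained on `training_pairs`, and its scores on held-out labeled documents would then be passed to `auc`. The pairwise AUC is quadratic in the number of samples, which is fine for illustration but would normally be replaced by a rank-based computation on larger test sets.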
Received: 02 August 2023