Interdisciplinary Topic Identification Method Based on Semantic Similarity Relationship

doi:10.3772/j.issn.1000-0135.2024.01.004

情报学报 (Journal of the China Society for Scientific and Technical Information), 2024, Vol. 43, Issue 1: 34-47

Information Theory and Methods

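Wang Weijun (王卫军)^1,2,3, Ning Zhiyuan (宁致远)^2,3, Dong Hao (董昊)^2,3, Qiao Ziyue (乔子越)^2,3, Du Yi (杜一)^2,3, Zhou Yuanchun (周园春)^2,3

1. Library of Henan University of Economics and Law, Zhengzhou 450046
2. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190
3. University of Chinese Academy of Sciences, Beijing 100049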

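Abstract: Identifying research content shared by different disciplines is one approach to knowledge discovery in interdisciplinary research. Research content with similar semantics across disciplines more clearly reflects the integration and exchange of knowledge between those disciplines. To obtain semantically similar interdisciplinary research topics from scientific literature data, this paper proposes an unsupervised contrastive learning method for learning representations of the semantic similarity relationships among scientific papers and their keywords, and builds on it a model for identifying semantically similar interdisciplinary topics. The model uses the Spearman correlation coefficient as the evaluation metric for interdisciplinary topics, which addresses the lack of interdisciplinary research datasets in existing studies. The results show that the proposed model captures the semantic similarity relationships between scientific papers and their keywords reasonably well and reflects the interdisciplinary trends between two disciplines.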
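The abstract names two technical ingredients: unsupervised contrastive learning of text representations, and the Spearman correlation coefficient as the evaluation metric. The sketch below illustrates both in plain Python under stated assumptions. The loss is the generic SimCSE-style in-batch InfoNCE objective, which is an assumption: the abstract does not specify the authors' exact loss, encoder, or temperature. Embeddings are passed in as plain lists rather than produced by a pretrained encoder, and the function names (`cosine`, `info_nce_loss`, `average_ranks`, `spearman`) and the temperature `tau = 0.05` are hypothetical choices for illustration. Spearman's rho is computed the standard way, as the Pearson correlation of tie-averaged ranks; in a typical semantic-similarity evaluation it compares the model's cosine scores for text pairs against reference scores, while how the authors apply it to interdisciplinary topics is not stated in the abstract.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(view_a: List[Vector], view_b: List[Vector], tau: float = 0.05) -> float:
    """SimCSE-style unsupervised contrastive loss with in-batch negatives.

    Assumption: view_a[i] and view_b[i] encode the same text (for example, two
    encoder passes with different dropout masks); every view_b[j] with j != i
    acts as a negative for view_a[i]. tau is a hypothetical temperature.
    """
    total = 0.0
    for i, a in enumerate(view_a):
        logits = [cosine(a, b) / tau for b in view_b]
        # Cross-entropy with target i, using log-sum-exp for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]
    return total / len(view_a)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the tie-averaged ranks.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real ones would come from a pretrained encoder.
    a = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[0.9, 0.1], [0.1, 0.9]]
    swapped = [[0.1, 0.9], [0.9, 0.1]]
    print(info_nce_loss(a, aligned) < info_nce_loss(a, swapped))  # True

    # Hypothetical model cosine scores vs. reference scores for four text pairs.
    model_scores = [0.12, 0.45, 0.33, 0.81]
    reference = [1.0, 3.0, 2.0, 4.0]
    print(round(spearman(model_scores, reference), 6))  # 1.0
```

Running the file prints `True`, because correctly paired views yield a lower loss than mismatched ones, and then `1.0`, because the toy model scores rank the four pairs in the same order as the reference. The in-batch negatives are what make the objective unsupervised: no labeled similar or dissimilar pairs are needed, only two views of each text.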

Keywords: research projects, interdisciplinary topics, contrastive learning, representation learning

Received: 2023-02-15

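Funding: Key Program of the National Natural Science Foundation of China, "Knowledge Graph Construction for Domain Big Data" (61836013); Excellent Young Scientists Fund of the National Natural Science Foundation of China (T2322027); Youth Innovation Promotion Association of the Chinese Academy of Sciences (2021166).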

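About the authors: Wang Weijun, male, born in 1981, Ph.D., associate research librarian; research interests: knowledge graphs and research data mining. Ning Zhiyuan, male, born in 1996, Ph.D. candidate; research interests: knowledge graphs and natural language processing. Dong Hao, male, born in 1997, Ph.D. candidate; research interests: knowledge reasoning and knowledge graphs. Qiao Ziyue, male, born in 1996, Ph.D.; research interests: knowledge graphs and graph data mining. Du Yi, corresponding author, male, born in 1988, Ph.D., research professor and master's supervisor; research interests: big data and knowledge graphs; E-mail: duyi@cnic.cn. Zhou Yuanchun, male, born in 1975, Ph.D., research professor and doctoral supervisor; research interests: big data and knowledge graphs.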

Cite this article:

Wang Weijun, Ning Zhiyuan, Dong Hao, Qiao Ziyue, Du Yi, Zhou Yuanchun. Interdisciplinary Topic Identification Method Based on Semantic Similarity Relationship[J]. Journal of the China Society for Scientific and Technical Information (情报学报), 2024, 43(1): 34-47.

Permalink:

https://qbxb.istic.ac.cn/CN/10.3772/j.issn.1000-0135.2024.01.004 or https://qbxb.istic.ac.cn/CN/Y2024/V43/I1/34
