Word-Vector-Network-Based Analysis of Scientific Research Topic Evolution: Revealing the Semantic Drift Process
Wang Hongyu (1, 3), Shi Kaiwen (2, 4), Wang Xiaoguang (2), Jin Zhuang (1), Zheng Yang (2, 4), Huang Han (1, 3)
1. School of Management, Wuhan University of Technology, Wuhan 430070
2. School of Information Management, Wuhan University, Wuhan 430072
3. Institute of Digital Governance and Management Decision Innovation, Wuhan University of Technology, Wuhan 430070
4. Yunnan Key Laboratory of Service Computing, Yunnan University of Finance and Economics, Kunming 650221
Abstract The co-word network is a classical knowledge network for characterizing the knowledge structure of a discipline, but it suffers from sparse co-occurrence of feature keywords, synonymy among keywords, and insufficient use of the corpus. As a result, it struggles to capture the semantic associations among keywords accurately when applied to large-scale disciplinary data. Extending the co-word network theoretically and methodologically, so that it can reveal the semantic evolution of research topics at both the macro and micro levels, is therefore of practical value. This study treats the feature keywords of a discipline as network vertices, obtains their vector representations with the GloVe global word embedding model, and uses the normalized cosine similarity between the corresponding word vectors as edge weights, yielding a fully connected, undirected word vector network. The paper then analyzes the roles and characteristics of disciplinary word vector networks, proposes a framework for analyzing the semantic drift of research topics based on these networks, and compares the semantic association relations they capture with the co-occurrence relations captured by the co-word network. The results show that the proposed disciplinary word vector network, as a special class of knowledge network, is a mapping of the feature-keyword co-word network onto a high-dimensional semantic space, and is clearly useful for analyzing community structure and temporal evolution. Compared with the co-word network, the word vector network agrees with it in identifying the key concepts of a discipline, reflects the discipline's knowledge structure more stably and comprehensively, and reveals finer-grained evolutionary processes, such as the semantic drift of research topics at the micro level.
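The network construction summarized in the abstract can be illustrated with a short sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the keyword vectors are already available (for example, loaded from a trained GloVe model; the toy 2-D vectors below are invented), assumes that "normalized cosine similarity" means the linear rescaling (1 + cos) / 2 onto [0, 1], and uses the change in a keyword's top-k neighbours between two periods as a stand-in drift measure. All helper names (`build_word_vector_network`, `top_neighbors`, `neighbor_drift`) are hypothetical.

```python
import math
from itertools import combinations

# Minimal sketch, not the authors' implementation. Assumptions (hypothetical):
#  - keyword vectors are plain lists (e.g. loaded from a GloVe model);
#  - "normalized cosine similarity" is the rescaling (1 + cos) / 2 onto [0, 1];
#  - drift is approximated by the change in a keyword's top-k neighbours.


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def normalized_cosine(u, v):
    """Map cosine similarity from [-1, 1] to [0, 1] (assumed normalization)."""
    return (1.0 + cosine(u, v)) / 2.0


def build_word_vector_network(vectors):
    """Fully connected, undirected network as {frozenset({a, b}): weight}.

    Every unordered pair of keywords gets exactly one edge, so n keywords
    yield n * (n - 1) / 2 edges.
    """
    return {
        frozenset((a, b)): normalized_cosine(vectors[a], vectors[b])
        for a, b in combinations(sorted(vectors), 2)
    }


def top_neighbors(edges, word, k):
    """The k keywords joined to `word` by the heaviest edges (ties by name)."""
    scored = []
    for pair, weight in edges.items():
        if word in pair:
            (other,) = pair - {word}
            scored.append((weight, other))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [other for _, other in scored[:k]]


def neighbor_drift(edges_t1, edges_t2, word, k):
    """1 - Jaccard overlap of a keyword's top-k neighbours in two periods."""
    n1 = set(top_neighbors(edges_t1, word, k))
    n2 = set(top_neighbors(edges_t2, word, k))
    union = n1 | n2
    return 0.0 if not union else 1.0 - len(n1 & n2) / len(union)


# Toy 2-D vectors for two periods (invented for illustration only).
period_1 = {"digital economy": [1.0, 0.0], "e-commerce": [0.9, 0.1],
            "data element": [0.0, 1.0]}
period_2 = {"digital economy": [0.0, 1.0], "e-commerce": [1.0, 0.0],
            "data element": [0.1, 0.9]}

net_1 = build_word_vector_network(period_1)
net_2 = build_word_vector_network(period_2)
print(top_neighbors(net_1, "digital economy", k=1))  # ['e-commerce']
print(top_neighbors(net_2, "digital economy", k=1))  # ['data element']
print(neighbor_drift(net_1, net_2, "digital economy", k=1))  # 1.0
```

Comparing neighbourhoods rather than raw vectors is a deliberate choice here: embeddings trained separately on each period generally live in arbitrarily rotated spaces, while cosine similarities within each period are unaffected by such rotations. Note also that the edge count of a fully connected network grows quadratically with the number of keywords, so weak edges are often pruned before community detection.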
Received: 30 November 2024
(Responsible editor: Feng Jiaqi)