Topic Mining and Dynamic Evolution Analysis of Funding Projects: Case Studies of AI Field in NSF Data
Jin Jialin1, Wang Yuefen1,2,3, Ba Zhichao4, Cen Yonghua2,3
1. School of Economics & Management, Nanjing University of Science & Technology, Nanjing 210094
2. Management School of Tianjin Normal University, Tianjin 300387
3. Institute for Big Data Science, Tianjin Normal University, Tianjin 300387
4. Laboratory of Data Intelligence and Interdisciplinary Innovation, Nanjing University, Nanjing 210023

Abstract This study constructs a process for topic mining and dynamic evolution analysis of funding project data. By modeling the relationship between funding projects and the directorates that fund them through title and abstract metadata, the process identifies, from the perspective of project content, the characteristics of topics, the scope and focus of each directorate, and the development direction and evolution of a given funding field. First, keywords are extracted from project titles and abstracts with the rapid automatic keyword extraction (RAKE) algorithm, and core keywords are obtained through term segmentation. Next, word vectors for the core keywords are trained with Google's word2vec tool, and the vectors are clustered with the k-means algorithm to mine topics. Finally, the distribution of topics is described, and the similarity between topics is calculated with the word mover's distance (WMD) algorithm to analyze the evolutionary trend and the main evolutionary paths of the topics. Applied to artificial intelligence (AI) projects in National Science Foundation (NSF) data, the process recognizes the topics within the AI field and the specific focus of different directorates. The evolution of these topics is complex, with frequent splitting and merging, yet it follows clear evolutionary paths with prominent foci, and the key evolutionary path can be identified from the evolutionary intensity of topics. These results indicate that the process can reveal funding directions for integrating and promoting related technologies in specific fields, and can support academic research and government planning.
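The keyword-extraction and topic-similarity steps named in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The RAKE part follows the scoring described by Rose et al.: candidate phrases are split at stopwords and punctuation, each word is scored as deg(w)/freq(w), and a phrase scores the sum of its word scores. The WMD part is exact only in a special case assumed here: both topics are represented by the same number n of words, each carrying mass 1/n, so an optimal transport plan can be taken as a permutation and a brute-force search over permutations gives the word mover's distance. The stopword list, the toy 2-D vectors standing in for word2vec embeddings, and the mapping `1 / (1 + d)` from distance to similarity are hypothetical; the abstract does not specify how WMD is converted into a similarity.

```python
import math
import re
from collections import defaultdict
from itertools import permutations

# Hypothetical, deliberately tiny stopword list; RAKE normally uses a full stoplist.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to",
             "with", "is", "are", "by", "from", "this", "that", "using"}


def rake_candidates(text):
    """Split text into candidate phrases at punctuation and stopwords."""
    phrases = []
    for fragment in re.split(r"[.,;:!?()\n]", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9-]+", fragment):
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake_scores(text):
    """Return (phrase, score) pairs, highest first; score = sum of deg(w)/freq(w)."""
    phrases = rake_candidates(text)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree counts co-occurrences inside the phrase, including the word itself.
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    scores = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def euclidean(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def wmd_uniform(words_a, words_b, vectors):
    """Word mover's distance when each list has n words carrying mass 1/n.

    With uniform, equal masses an optimal transport plan can be chosen as a
    permutation, so brute force over permutations is exact (feasible only for small n).
    """
    if len(words_a) != len(words_b):
        raise ValueError("this sketch assumes equal-length word lists")
    n = len(words_a)
    cost = [[euclidean(vectors[a], vectors[b]) for b in words_b] for a in words_a]
    best = min(sum(cost[i][p[i]] for i in range(n)) for p in permutations(range(n)))
    return best / n


def topic_similarity(words_a, words_b, vectors):
    # Hypothetical distance-to-similarity mapping; the paper's own formula may differ.
    return 1.0 / (1.0 + wmd_uniform(words_a, words_b, vectors))


if __name__ == "__main__":
    text = "Deep learning for robot control. Robot control with reinforcement learning."
    print(rake_scores(text))
    # Toy 2-D vectors standing in for trained word2vec embeddings.
    vectors = {"learning": (1.0, 0.0), "neural": (0.9, 0.1),
               "robot": (0.0, 1.0), "control": (0.1, 0.9)}
    print(wmd_uniform(["learning", "robot"], ["neural", "control"], vectors))
```

In a realistic setting the vectors would come from a trained word2vec model, topics would be keyword clusters produced by k-means, and a general optimal-transport solver would replace the permutation search, which grows factorially with n.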
Received: 22 July 2021
|
|
|
|
|
|