Research on Scientific Topic Prediction from the Perspective of Knowledge Unit Reorganization
Liang Jiwen1,2, Yang Jianlin1,2, Wang Wei1,2
1. School of Information Management, Nanjing University, Nanjing 210023; 2. Jiangsu Key Laboratory of Data Engineering & Knowledge Service, Nanjing 210023
Abstract Accurate scientific topic prediction can clarify the future development direction of a discipline and provide a reference for development planning and management decision-making in scientific research. This paper focuses on predicting new scientific topics from the perspective of knowledge unit reorganization. It treats the relationship between a topic and its feature words as analogous to the relationship between a scientific concept and its knowledge units, and on that basis proposes a scientific topic prediction method. First, the LDA (latent Dirichlet allocation) topic model is used to obtain the global topics, feature words, and probability matrix, and the feature word vectors are obtained by transposing the vector space. Second, vector adjustment coefficients are calculated from the feature word frequencies predicted by the ARIMA (autoregressive integrated moving average) model to obtain the feature word prediction vectors; the t-SNE (t-distributed stochastic neighbor embedding) algorithm is applied to reduce the dimensionality of the prediction vectors; and the low-dimensional prediction vectors are then clustered by the fuzzy C-means algorithm to generate prediction topics, which realizes the reorganization of knowledge units. Finally, among the prediction topics formed by aggregating several original topics, those that carry a new interpretation are selected as the scientific topic prediction results. The field of “knowledge management-knowledge organization-knowledge service” is taken as an example for empirical research. The results show that the proposed method can effectively predict new scientific topics, such as digital humanities and knowledge payment, for which the essential concepts and corresponding research content of some words had not yet appeared at the time of prediction.
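The reorganization step described above can be illustrated with a minimal, pure-Python sketch. It is not the authors' implementation. The LDA, ARIMA, and t-SNE stages are replaced here by small stand-in inputs, the probability matrix is assumed to be a topic-by-word matrix, and the adjustment coefficient is assumed to be the ratio of predicted to current word frequency, since the abstract does not define it. The function names (`word_vectors`, `adjust`, `fuzzy_c_means`), the toy matrix, and the word labels are all hypothetical.

```python
import random


def word_vectors(topic_word):
    """Transpose a K x V topic-word matrix into V word vectors of length K."""
    return [list(column) for column in zip(*topic_word)]


def adjust(vectors, current_freq, predicted_freq):
    """Scale each word vector by predicted/current frequency.

    ASSUMPTION: this ratio stands in for the paper's adjustment coefficient,
    whose exact definition is not given in the abstract.
    """
    return [[x * (p / c) for x in vec]
            for vec, c, p in zip(vectors, current_freq, predicted_freq)]


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy C-means; returns (centers, memberships)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([r / total for r in row])
    centers = []
    for _ in range(n_iter):
        # Centers are means of the points weighted by membership ** m.
        centers = []
        for j in range(n_clusters):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / total
                            for d in range(dim)])
        # Memberships are proportional to distance ** (-2 / (m - 1)).
        new_u = []
        for p in points:
            dist = [max(sum((a - b) ** 2 for a, b in zip(p, c)) ** 0.5, 1e-12)
                    for c in centers]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dist]
            total = sum(inv)
            new_u.append([v / total for v in inv])
        shift = max(abs(a - b)
                    for old, new in zip(u, new_u) for a, b in zip(old, new))
        u = new_u
        if shift < tol:
            break
    return centers, u


# Toy stand-ins (hypothetical): 2 original topics x 4 feature words.
words = ["ontology", "taxonomy", "platform", "payment"]
topic_word = [[0.40, 0.40, 0.10, 0.10],
              [0.10, 0.10, 0.40, 0.40]]
current_freq = [10, 10, 10, 10]
predicted_freq = [10, 10, 20, 20]   # would come from a forecasting model

prediction_vectors = adjust(word_vectors(topic_word), current_freq, predicted_freq)
centers, memberships = fuzzy_c_means(prediction_vectors, n_clusters=2)
for word, row in zip(words, memberships):
    print(word, "-> prediction topic", row.index(max(row)))
```

Because fuzzy C-means assigns each word a degree of membership in every cluster rather than a single label, a word can contribute to more than one prediction topic, which is consistent with the idea of reorganizing knowledge units across original topics. In a fuller pipeline the prediction vectors would be reduced in dimensionality before clustering, as the abstract describes.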
Received: 11 May 2022