Emerging Topic Recognition Based on Three-Dimensional Topic Feature Measurement
Zheng Dejun, Cheng Wei
College of Information Management, Nanjing Agricultural University, Nanjing 210095
Abstract Recognizing emerging topics helps track the latest developments in a field over time and provides valuable information to support researchers' topic selection and research managers' policy decisions. This study proposes an emerging topic recognition method based on three-dimensional topic feature measurement. First, BERTopic is used for topic modeling that incorporates domain semantic knowledge, and each topic is represented by its documents, which serve as the basic unit. Next, a three-dimensional topic feature index framework covering time, reference, and correlation is constructed to identify emerging topics. The feasibility and effectiveness of the method are then examined through an empirical study on data from the text classification domain. The results show that taking documents as the basic unit supports a richer exploration of topic features, that the three-dimensional topic feature index framework has good adaptability and extensibility, and that the method can be generalized to other domains. At the theoretical level, this work offers a reference method for research on emerging topic recognition; at the practical level, it can serve as a reference tool for science and technology intelligence analysis and for analyzing development trends in a domain.
Received: 30 April 2023
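The abstract does not give the indicator formulas, so the following is only a minimal Python sketch, under stated assumptions, of how a framework of this kind could turn document-level attributes into per-topic scores along the three dimensions. It assumes each document already carries a topic id (for example, taken from the topic list returned by BERTopic's `fit_transform`), a publication year, a citation count, and the set of topics of the documents it references. The `Doc` fields, the time indicator (negated mean document age), the reference indicator (mean citations per document), the correlation indicator (number of distinct other topics referenced), min-max normalization, and equal weighting are all hypothetical, illustrative choices, not the authors' method.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Doc:
    topic: int                    # topic id assigned by a topic model (assumed given)
    year: int                     # publication year
    citations: int                # times the document has been cited
    cited_topics: frozenset[int]  # hypothetical: topic ids of the documents it references


def min_max(values):
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def emerging_scores(docs, current_year, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Rank topics by a weighted sum of three normalized, illustrative indicators."""
    # Documents are the basic unit: group them by their assigned topic.
    by_topic = {}
    for d in docs:
        by_topic.setdefault(d.topic, []).append(d)
    topics = sorted(by_topic)

    # Time dimension (hypothetical): younger topics score higher, so negate mean age.
    time_raw = [-mean(current_year - d.year for d in by_topic[t]) for t in topics]
    # Reference dimension (hypothetical): mean citations per document.
    ref_raw = [mean(d.citations for d in by_topic[t]) for t in topics]
    # Correlation dimension (hypothetical): distinct other topics referenced.
    corr_raw = [
        len(set().union(*(d.cited_topics for d in by_topic[t])) - {t})
        for t in topics
    ]

    dims = [min_max(time_raw), min_max(ref_raw), min_max(corr_raw)]
    scores = {
        t: sum(w * dim[i] for w, dim in zip(weights, dims))
        for i, t in enumerate(topics)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data, invented purely for illustration.
SAMPLE_DOCS = [
    Doc(0, 2015, 50, frozenset({1})),
    Doc(0, 2016, 30, frozenset({1})),
    Doc(1, 2021, 10, frozenset({0, 2})),
    Doc(1, 2022, 20, frozenset({2})),
    Doc(2, 2019, 5, frozenset()),
]

if __name__ == "__main__":
    for topic, score in emerging_scores(SAMPLE_DOCS, current_year=2023):
        print(topic, round(score, 3))
```

Because each topic is represented by its documents, every dimension in this sketch is simply an aggregate over those documents. That is one way to read the extensibility claim: adding a dimension means adding one more aggregation function and one more weight.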