Recent Advancements in Detection and Evolutionary Tracking of Scientific Topics: A Multi-perspective Survey and Prospect
Cen Yonghua (1,2), Wang Yuefen (1,2)
1. Management School, Tianjin Normal University, Tianjin 300387
2. Institute for Big Data Science, Tianjin Normal University, Tianjin 300387
Abstract: Using effective approaches to detect and track topics comprehensively and accurately in the large body of historical scientific literature is an active, cutting-edge problem that has drawn substantial interest from several disciplines, especially scientometrics within information science. Existing approaches fall into four perspectives: frequency, content, citation, and synthesized analyses. This study reviews recent literature published in prestigious international and domestic journals and summarizes the main progress of each perspective in scientific topic detection and evolution analysis by tracing its essential implementation path. In particular, it examines the risks inherent in the existing perspectives: bias arising from the heterogeneous importance of knowledge units or network relationships, the temporal decay of knowledge, the small-sample trap for emergent topics, the difficulty of fitting the natural development and evolution of topics, and the failure to portray knowledge flow and evolution at the micro level. In response to these issues, the study draws attention to the future trend toward a synthesized perspective.
Received: 18 April 2022