Time-lag Calculation and Enlightenment of Multi-source Science and Technology Literature Fusion for the Detection of Emerging Research Topic: A Case Study in the Field of Agriculture
Yang Jinqing (1, 2), Lu Wei (1, 2), Wu Leyan (1, 2)
1. School of Information Management, Wuhan University, Wuhan 430072
2. Institute for Information Retrieval and Knowledge Mining, Wuhan University, Wuhan 430072

Abstract To explore the time lag between data sources in emerging-topic detection based on multi-source data fusion, this paper designs a scheme for calculating that lag. First, research topics are extracted from four kinds of scientific and technological literature datasets, and a similarity matrix is constructed from the pairwise similarities between those topics. Second, the Hungarian algorithm is used to find the optimal topic matching, that is, the combination with the minimum similarity loss. Finally, a linear equation model is constructed, and the time lag is calculated by fitting this model. Using experimental data of 337,790 abstracts in agricultural disciplines from 2009 to 2016, we extracted 250, 260, 260, and 240 research topics from fund projects, patents, journal articles, and conference papers, respectively. Applying the proposed time-lag calculation method, we find that journal articles lag behind fund project texts and conference papers by one year, and patent documents lag behind journal articles by one year. Together with previous research results from different disciplines, these findings verify the feasibility and effectiveness of the time-lag calculation method for multi-source scientific and technological literature and offer a new idea for formulating multi-source data fusion strategies.
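The three steps summarized above (topic similarity matrix, minimum-loss matching, lag fitting) can be sketched as a small stdlib-only Python pipeline. This is an illustrative sketch, not the authors' implementation, and it rests on several assumptions not stated in this abstract: each topic is a topic-word weight vector over a shared vocabulary, similarity is cosine similarity, the "similarity loss" of a matched pair is taken as 1 - similarity, and the linear model is a hypothetical one in which, for each candidate lag k, the yearly strength series of the following source is fitted as follower[t + k] = a * leader[t] + b and the k with the smallest mean squared residual is kept. All function names (`similarity_matrix`, `match_topics`, `best_lag`, and so on) are hypothetical.

```python
import math


def cosine(p, q):
    """Cosine similarity of two topic-word weight vectors (assumed: same vocabulary order)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def similarity_matrix(topics_a, topics_b):
    """Rows: topics of source A; columns: topics of source B."""
    return [[cosine(p, q) for q in topics_b] for p in topics_a]


def hungarian(cost):
    """Minimum-cost assignment for an n x m cost matrix with n <= m.

    Classic O(n^2 * m) Hungarian algorithm with row/column potentials.
    Returns cols, where cols[i] is the column assigned to row i.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)  # potentials (1-indexed)
    p, way = [0] * (m + 1), [0] * (m + 1)    # p[j]: row matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:  # grow an alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):  # shift potentials by the smallest slack
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # augment along the stored alternating path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    cols = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            cols[p[j] - 1] = j - 1
    return cols


def match_topics(sim):
    """Pair topics of two sources so the total similarity loss (1 - sim) is minimal."""
    n, m = len(sim), len(sim[0])
    if n <= m:
        cols = hungarian([[1.0 - s for s in row] for row in sim])
        return sorted(enumerate(cols))
    # More topics in A than in B: solve the transposed problem instead.
    rows = hungarian([[1.0 - sim[i][j] for i in range(n)] for j in range(m)])
    return sorted((i, j) for j, i in enumerate(rows))


def mse_of_linear_fit(x, y):
    """Mean squared residual of the least-squares line y = a * x + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y)) / n


def best_lag(leader, follower, max_lag=3):
    """Hypothetical lag model: return the k minimizing the fit error of
    follower[t + k] ~ a * leader[t] + b over equal-length yearly series."""
    errors = {}
    for k in range(max_lag + 1):
        errors[k] = mse_of_linear_fit(leader[: len(leader) - k], follower[k:])
    return min(errors, key=errors.get)
```

For example, with `leader = [1, 3, 2, 5, 4, 7, 6, 9]` and a follower series equal to `2 * leader + 1` delayed by one year, `best_lag` returns 1. The Hungarian step is written out so the sketch has no dependencies; in practice `scipy.optimize.linear_sum_assignment` solves the same rectangular assignment problem. Note that comparing residuals across lags uses overlaps of different lengths, so a real analysis would likely aggregate the fit over all matched topic pairs rather than rely on a single pair.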
Received: 26 September 2019
|
|
|
|
|
|
|