Research on Topic Discovery and Evolution Based on Time Series Clustering
Li Hailin and Wu Xianli
College of Business Administration, Huaqiao University, Quanzhou 362021
Abstract To address the limited variety of existing methods for topic discovery and evolution analysis of literature, this paper proposes a method based on time series clustering. Co-word analysis is used to build the co-occurrence matrix of high-frequency keywords in a document dataset. The co-occurrence matrix is converted into a similarity matrix using the Ochiai coefficient, and the affinity propagation clustering algorithm is then applied to discover the topics of the documents. Next, the research popularity of each topic over a given period is measured and converted into time series data, and time series clustering is used to group the topics and analyze their evolution trends. The method is applied to journal literature on innovation management indexed in CNKI from 2000 to 2018. The experimental results show that it can effectively discover the research topics of the journals and analyze the evolution trends of these topics well.
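The first two steps of the abstract (keyword co-occurrence counting and conversion to Ochiai similarity) can be illustrated with a minimal sketch. It assumes the common form of the Ochiai coefficient, co-occurrence count divided by the square root of the product of the two keyword frequencies; the function name `ochiai_similarity` and the sample keyword lists are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def ochiai_similarity(docs):
    """Return (vocab, sim) for a list of per-document keyword lists.

    sim[(a, b)] = c_ab / sqrt(c_a * c_b), where c_a is the number of
    documents containing keyword a and c_ab the number containing both.
    This is an illustrative assumption, not the authors' code.
    """
    freq = Counter()  # documents containing each keyword
    cooc = Counter()  # documents containing each keyword pair
    for keywords in docs:
        unique = sorted(set(keywords))  # count each keyword once per document
        freq.update(unique)
        cooc.update(combinations(unique, 2))  # pairs are in sorted order

    vocab = sorted(freq)
    sim = {}
    for a in vocab:
        for b in vocab:
            if a == b:
                sim[a, b] = 1.0
            else:
                pair = (a, b) if a < b else (b, a)
                sim[a, b] = cooc[pair] / sqrt(freq[a] * freq[b])
    return vocab, sim


# Hypothetical keyword lists, one per document.
docs = [
    ["innovation", "management", "enterprise"],
    ["innovation", "management"],
    ["innovation", "enterprise"],
    ["knowledge", "management"],
]
vocab, sim = ochiai_similarity(docs)
```

The resulting symmetric similarity values could then be passed to a clustering routine that accepts a precomputed similarity matrix, which is the role affinity propagation plays in the abstract.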
Received: 30 November 2018