Research on Hotspots and Trends of Domestic Text Mining Based on Cluster Analysis
Tan Zhanglu, Peng Shengnan, Wang Zhaoguang |
School of Management, China University of Mining and Technology (Beijing), Beijing 100083
Abstract: Understanding the research hotspots and trends in domestic (Chinese) text mining matters for two reasons. It shows how the field's content develops and changes, and it helps advance related research. This study takes as its sample 1,155 text-mining-related papers published from 1998 to 2017 and indexed in the CNKI database. The keyword frequency matrix of these papers serves as the data. Cluster analysis is performed with SPSS. The chi-square statistic is then used to extract keywords with high chi-square values, which are used to interpret the resulting clusters. Based on the clustering results, the text mining literature is divided into 13 categories, giving a macro-level picture of the research hotspots and trends in domestic text mining. The results show the following:

(i) Basic research on text mining, text big data preprocessing, and the application domains of text mining are hot topics.
(ii) Relatively few applied studies address association rules, text clustering, and text classification.
(iii) Text topic analysis, text big data preprocessing, and web text mining are likely to become new research trends.
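The abstract's pipeline has three steps: build a keyword frequency matrix, cluster the documents, and label each cluster by its highest chi-square keywords. It can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' SPSS procedure.

- The six-document corpus is invented for illustration.
- The distance measure (squared Euclidean) and linkage (average, or between-groups) are assumptions, because the abstract does not report the SPSS settings.
- The keyword score uses the standard 2x2 contingency form, N(AD - BC)^2 / ((A+C)(B+D)(A+B)(C+D)).
- Only keywords positively associated with a cluster are ranked.

```python
"""Toy sketch: keyword frequency matrix -> average-linkage clustering -> chi-square keywords.

Assumptions (not taken from the paper): the six documents below are invented,
distances are squared Euclidean, and linkage is average (between-groups).
"""
from itertools import combinations

# Hypothetical toy corpus: each document is represented only by its keywords.
DOCS = [
    ["text classification", "SVM", "feature selection"],
    ["text classification", "feature selection", "CHI"],
    ["text classification", "SVM", "CHI"],
    ["topic model", "LDA", "microblog"],
    ["topic model", "LDA", "topic evolution"],
    ["LDA", "microblog", "topic evolution"],
]


def build_matrix(docs):
    """Return (vocab, rows) where rows[i][j] counts keyword vocab[j] in document i."""
    vocab = sorted({kw for doc in docs for kw in doc})
    index = {kw: j for j, kw in enumerate(vocab)}
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for kw in doc:
            row[index[kw]] += 1
        rows.append(row)
    return vocab, rows


def sq_euclidean(u, v):
    """Squared Euclidean distance between two frequency vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def average_linkage(rows, n_clusters):
    """Agglomerative clustering: repeatedly merge the pair of clusters whose
    mean pairwise document distance is smallest, until n_clusters remain."""
    clusters = [[i] for i in range(len(rows))]

    def dist(c1, c2):
        total = sum(sq_euclidean(rows[i], rows[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # combinations yields (a, b) with a < b, so deleting b leaves index a valid.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: dist(clusters[p[0]], clusters[p[1]]))
        clusters[a] += clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


def chi_square(docs, members, keyword):
    """2x2 chi-square score of `keyword` for the cluster `members`.

    Keywords under-represented in the cluster (AD <= BC) score 0,
    so only positively associated keywords are ranked.
    """
    n = len(docs)
    inside = set(members)
    a = sum(1 for i in inside if keyword in docs[i])  # in cluster, has keyword
    b = sum(1 for i in range(n) if i not in inside and keyword in docs[i])  # outside, has it
    c = len(inside) - a  # in cluster, lacks keyword
    d = n - len(inside) - b  # outside, lacks keyword
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0 or a * d <= b * c:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def top_keywords(docs, vocab, members, k=2):
    """Highest-scoring keywords for a cluster (ties broken alphabetically)."""
    return sorted(vocab, key=lambda t: (-chi_square(docs, members, t), t))[:k]


if __name__ == "__main__":
    vocab, rows = build_matrix(DOCS)
    for members in average_linkage(rows, n_clusters=2):
        print(members, top_keywords(DOCS, vocab, members))
```

On the toy corpus, the sketch separates a text-classification group from a topic-model group and labels each by its top chi-square keywords. On a real corpus, `n_clusters` would be set to the number of categories retained (13 in this study). The one-sided filter matters because the plain chi-square statistic is symmetric. Without it, a keyword that is entirely absent from a cluster would score as highly as one that defines it.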
Received: 27 September 2018
|
|
|
|