Theme Analysis Based on Keyword Importance and Affinity Propagation Clustering
Li Hailin (1, 2), Wan Xiaoji (1), Lin Chunpei (1)
1. College of Business Administration, Huaqiao University, Quanzhou 362021; 2. Research Center of Applied Statistics and Big Data, Huaqiao University, Xiamen 361021
Abstract Traditional scientometric methods have two shortcomings: co-occurrence analysis does not consider keyword importance, and theme analysis cannot adaptively extract core themes. To address both, this paper proposes a theme analysis method based on keyword importance and affinity propagation clustering. Drawing on the probable habits of most authors, the method collects the keywords of each paper according to how strongly they relate to the paper's research content, computes an importance measure for the keywords in the papers, and constructs a keyword similarity matrix. Core themes are then extracted and analyzed by combining this matrix with affinity propagation clustering, which identifies the most representative member of each cluster. In this study, keywords were collected from a specialized journal in library and information science for the period 2012 to 2016, and keyword clustering based on the importance measure was carried out. The evolutionary trends of the keywords and core themes were then analyzed. The proposed method considers keyword importance and identifies core themes automatically. It also offers a new data mining approach to thematic document analysis and improves topic recognition for journals and disciplines in related fields.
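The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. The abstract gives neither the importance formula nor the similarity definition, so everything specific here is an assumption of the sketch rather than the authors' definition: that keyword order reflects relevance, the linear position weighting in `keyword_weights`, the product-of-weights co-occurrence similarity in `similarity_matrix`, the median preference value, the damping factor, and the invented toy keyword lists. The message-passing updates follow the standard affinity propagation formulation (responsibilities and availabilities), written in plain Python for readability rather than speed.

```python
from itertools import combinations


def keyword_weights(keywords):
    """Assumed linear position weighting: first keyword 1.0, last keyword 1/n."""
    n = len(keywords)
    return {kw: (n - pos) / n for pos, kw in enumerate(keywords)}


def similarity_matrix(papers):
    """Importance-weighted co-occurrence (an assumption, not the paper's formula)."""
    vocab = sorted({kw for paper in papers for kw in paper})
    index = {kw: i for i, kw in enumerate(vocab)}
    S = [[0.0] * len(vocab) for _ in vocab]
    for paper in papers:
        w = keyword_weights(paper)
        for a, b in combinations(paper, 2):
            i, j = index[a], index[b]
            S[i][j] += w[a] * w[b]
            S[j][i] += w[a] * w[b]
    return vocab, S


def affinity_propagation(S, preference=None, damping=0.5, iters=200):
    """Standard affinity propagation on a similarity matrix with at least 2 items.

    Returns (exemplar indices, label list), where labels[i] is the exemplar of item i.
    """
    n = len(S)
    S = [row[:] for row in S]  # work on a copy; the diagonal is overwritten below
    if preference is None:
        off_diag = sorted(S[i][j] for i in range(n) for j in range(n) if i != j)
        preference = off_diag[len(off_diag) // 2]  # median off-diagonal similarity
    for i in range(n):
        S[i][i] = preference
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iters):
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            first = max(range(n), key=vals.__getitem__)
            best = vals[first]
            second = max(v for k, v in enumerate(vals) if k != first)
            for k in range(n):
                new = S[i][k] - (second if k == first else best)
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]  # positive responsibilities from items other than k
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    score = [R[k][k] + A[k][k] for k in range(n)]
    exemplars = [k for k in range(n) if score[k] > 0]
    if not exemplars:  # fall back to the single strongest candidate
        exemplars = [max(range(n), key=score.__getitem__)]
    labels = [i if i in exemplars else max(exemplars, key=lambda e: S[i][e])
              for i in range(n)]
    return exemplars, labels


# Invented toy data: each list holds one paper's keywords in the author's order.
papers = [
    ["topic analysis", "co-word analysis", "keywords"],
    ["co-word analysis", "keywords", "knowledge map"],
    ["information behavior", "social media", "users"],
    ["social media", "information behavior", "privacy"],
]
vocab, S = similarity_matrix(papers)
exemplars, labels = affinity_propagation(S)
for i, kw in enumerate(vocab):
    print(f"{kw} -> theme represented by: {vocab[labels[i]]}")
```

Each exemplar returned by `affinity_propagation` plays the role of a cluster's most representative keyword, which is how the abstract describes core themes being identified. On real data a library implementation with a precomputed similarity matrix would be the more practical choice, and the preference value would need tuning because it controls how many themes emerge.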
Received: 08 December 2017