A Co-word Analysis Method Based on Semantic Relevance and Fuzzy Clustering
Lu Quan (1, 2), Cao Yue (1), Chen Jing (3)
1. Center for Studies of Information Resources, Wuhan University, Wuhan 430072
2. Big Data Institute, Wuhan University, Wuhan 430072
3. School of Information Management, Central China Normal University, Wuhan 430079
Abstract Co-word analysis is a fundamental method for text content analysis, but existing co-word analysis methods have two shortcomings. First, the semantic relevance of word pairs is not considered when the keyword co-word matrix is constructed. Second, cluster analysis of the co-word matrix does not allow a word to belong to more than one topic. This study proposes a co-word analysis method based on semantic relevance and fuzzy clustering. Domain keywords are extracted using Donohue's formula and the g-index of word frequency, and semantic vector representations of the keywords are learned with a word embedding model. A semantic weighted co-word matrix is then constructed that combines co-occurrence features with semantic relevance to measure the correlation between word pairs. Combining the fuzzy C-means clustering algorithm with factor-based dimensionality reduction, the semantic weighted co-word matrix is used for fuzzy clustering of keywords. This overcomes the oversimplified word topic attribution of hard clustering, improves the information quality of clusters, and reveals the relationships between clusters. Experiments on journal literature about infectious diseases verify the effectiveness and superiority of the method.
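The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the weighting formula, so multiplying the co-occurrence count by the cosine similarity of the word vectors is an assumption made here for illustration. The function names, the toy matrices, and the fuzzifier `m = 2` are likewise hypothetical, and the factor dimensionality reduction step is omitted.

```python
import numpy as np

def donohue_threshold(n_singletons):
    """Donohue's high-frequency boundary: T = (-1 + sqrt(1 + 8 * I1)) / 2,
    where I1 is the number of words that occur exactly once."""
    return (-1.0 + np.sqrt(1.0 + 8.0 * n_singletons)) / 2.0

def g_index(freqs):
    """Largest g such that the g most frequent words together
    account for at least g**2 occurrences."""
    ranked = np.sort(np.asarray(freqs))[::-1]
    cum = np.cumsum(ranked)
    ranks = np.arange(1, len(ranked) + 1)
    hits = ranks[cum >= ranks ** 2]
    return int(hits.max()) if hits.size else 0

def semantic_weighted_matrix(cooc, vectors):
    """ASSUMED weighting (not from the paper): co-occurrence count
    multiplied by cosine similarity, with negative similarities clipped to 0."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    cos = np.clip(unit @ unit.T, 0.0, 1.0)
    return cooc * cos

def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy C-means: returns the membership matrix U
    (rows sum to 1) and the cluster centers."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # guard against division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers

# Toy data (hypothetical): 3 keywords, 2-dimensional word vectors.
cooc = np.array([[0, 4, 1], [4, 0, 0], [1, 0, 0]], dtype=float)
vectors = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]])
S = semantic_weighted_matrix(cooc, vectors)
U, centers = fuzzy_c_means(S, c=2)
```

Each row of `U` gives one keyword's degree of membership in every cluster, so a keyword can contribute to several topics at once, unlike the single label assigned by hard clustering.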
Received: 20 July 2021