Topic Mining of Online Reviews Based on Gaussian Latent Dirichlet Allocation
Guo Xianda, Zhao Narisa, Gao Huan, Yang Xinyi
Institute of Systems Engineering, Dalian University of Technology, Dalian 116024
Abstract Current topic mining methods tend to generate sparse and semantically incoherent topics, which limits their practical applicability. To overcome these limitations, this study proposes a topic mining method for online reviews based on Gaussian latent Dirichlet allocation (Gaussian LDA). Word vectors for the reviews are first trained with word2vec, and the topic distribution of each review is then obtained from the Gaussian LDA model. These topic distributions are used to compute a similarity matrix over the reviews, and the affinity propagation clustering algorithm is applied to cluster them; topics are discovered by analyzing the resulting clusters. Finally, the TextRank algorithm extracts the key sentences of each topic to generate a topic summary, which completes the description of the topic. The proposed method alleviates the information overload that consumers face when reading online reviews. Its effectiveness and practical value are demonstrated by experiments on online product reviews from seven platforms, including Taobao, Jingdong, and Douban.
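Two steps of this pipeline, the similarity matrix and the graph-based ranking, can be illustrated with a minimal sketch. It is not the authors' implementation. The topic distributions are made-up numbers, cosine similarity is an assumed choice (the abstract does not name the measure), and the function names `similarity_matrix` and `rank_items` are hypothetical. For brevity the ranking reuses the same similarity matrix as edge weights, whereas the original TextRank builds its sentence graph from word-overlap similarity.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_matrix(topic_dists):
    """Pairwise similarity of reviews from their topic distributions."""
    return [[cosine(p, q) for q in topic_dists] for p in topic_dists]


def rank_items(sim, damping=0.85, iters=100):
    """TextRank-style scores: PageRank power iteration on a weighted graph."""
    n = len(sim)
    # Drop self-similarity so an item cannot vote for itself.
    w = [[0.0 if i == j else sim[i][j] for j in range(n)] for i in range(n)]
    out = [sum(row) for row in w]  # total outgoing weight of each node
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) / n
            + damping * sum(w[j][i] / out[j] * scores[j]
                            for j in range(n) if out[j])
            for i in range(n)
        ]
    return scores


# Hypothetical topic distributions: four reviews over three topics.
topic_dists = [
    [0.80, 0.10, 0.10],
    [0.70, 0.20, 0.10],
    [0.10, 0.10, 0.80],
    [0.75, 0.15, 0.10],
]
sim = similarity_matrix(topic_dists)
scores = rank_items(sim)
best = max(range(len(scores)), key=scores.__getitem__)
print("most central review:", best)
```

In a fuller pipeline, a matrix like `sim` would be passed to an affinity propagation implementation as precomputed similarities, and the ranking would be run separately on the sentences of each resulting cluster.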
Received: 19 July 2019