Research on Microblog Rumor Identification Based on LDA and Random Forest
Zeng Ziming1,2, Wang Jing1,2
1. Center for the Study of Information Resources, Wuhan 430072; 2. Laboratory Center for Library and Information Science, Wuhan 430072
Abstract The spread of Internet rumors harms everyday life and social stability. To support rumor control, this paper analyzes posts about the “haze” rumors on the Sina Weibo microblogging platform in 2016 and constructs reliability and influence variables from Weibo data and prior research. A latent Dirichlet allocation (LDA) model is then used to extract the topic distribution of the experimental text data. Using the reliability variable, the influence variable, and the topic probabilities as features, the paper classifies posts with a random forest to identify rumors. The experimental results show that the topic probabilities play an important role in rumor identification, and that the LDA-based random forest model improves the accuracy of rumor identification.
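The pipeline summarized in the abstract can be illustrated with a minimal sketch, assuming scikit-learn. It is not the authors' implementation: the function name `build_features`, the toy English posts, the two numeric columns standing in for the reliability and influence variables, and all parameter values are hypothetical. Real Weibo text would also need Chinese word segmentation before vectorization.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer


def build_features(texts, extra_features, n_topics=2, seed=0):
    """Append LDA topic probabilities to hand-built numeric variables."""
    # Bag-of-words counts are the input LDA expects.
    counts = CountVectorizer().fit_transform(texts)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    # One row per document; each row is a probability distribution over topics.
    topic_probs = lda.fit_transform(counts)
    return np.hstack([np.asarray(extra_features, dtype=float), topic_probs])


# Toy posts (hypothetical); label 1 = rumor, 0 = non-rumor.
texts = [
    "haze causes incurable disease share now",
    "official report on haze levels this week",
    "secret haze cure hidden by doctors share",
    "city issues haze warning and school notice",
    "haze contains deadly bacteria forward now",
    "weather bureau forecasts haze clearing tomorrow",
]
# Columns: stand-ins for the reliability and influence variables (made-up values).
extra = [[0.2, 0.9], [0.9, 0.4], [0.1, 0.8], [0.8, 0.3], [0.2, 0.7], [0.9, 0.5]]
labels = [1, 0, 1, 0, 1, 0]

X = build_features(texts, extra)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
predictions = clf.predict(X)  # predictions on the training data, for illustration only
```

In practice the classifier would be evaluated on held-out posts, and `clf.feature_importances_` could be inspected to see how much the topic-probability columns contribute relative to the other variables.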
Received: 03 November 2017