Misinformation Identification Method by Automatic Iterative Clustering Data Set for Training
Zhang Junsheng¹, Sun Xiaoping², Liu Zhihui¹
1. Institute of Scientific and Technical Information of China, Beijing 100038
2. KL-IIP, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190
Zhang Junsheng, Sun Xiaoping, Liu Zhihui. Misinformation Identification Method by Automatic Iterative Clustering Data Set for Training[J]. Journal of the China Society for Scientific and Technical Information, 2023, 42(1): 59-73.
(Responsible editor: Wang Keping)