Research on Fake News Detection Based on Multimodal Transformer
Wang Zhenyu, Zhu Xuefang
School of Information Management, Nanjing University, Nanjing 210023

Abstract: Fake news detection is an essential area of natural language processing that aims to reduce the negative impact of misinformation on society. Most existing multimodal fake news detection methods use pre-trained models as feature extractors, which has two shortcomings: (1) the parameters of the pre-trained models are typically frozen during training, even though these models are not flawless; (2) CNN-based image feature extractors are typically more complex than Transformer-based text feature extractors, but because image features are usually extracted and stored in advance, this shortcoming tends to be overlooked. This study therefore proposes a multimodal end-to-end Transformer. It unifies feature extraction across modalities by extracting image features with a vision Transformer rather than a CNN, cross-fuses image and text features with a co-attention module, and is evaluated in comparative experiments on three public datasets. The experimental results show that the proposed model outperforms the baseline models.
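As a rough illustration of the cross-fusion idea described in the abstract, the sketch below lets text tokens attend to image patches and image patches attend to text tokens, then pools and concatenates the two results. It is a minimal sketch under stated assumptions, not the authors' implementation: it uses a single attention head with identity projections (no learned weights), mean pooling, and hand-written toy vectors in place of real Transformer outputs. All function names (`cross_attention`, `co_attention_fuse`, and so on) are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Scaled dot-product attention of `queries` over `keys_values`.

    Assumption: identity projections, i.e. no learned W_q, W_k, W_v.
    """
    d = len(queries[0])
    attended = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
            for k in keys_values
        ]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, keys_values)) for j in range(d)]
        )
    return attended


def mean_pool(tokens):
    """Average a sequence of vectors into one vector."""
    n = len(tokens)
    return [sum(t[j] for t in tokens) / n for j in range(len(tokens[0]))]


def co_attention_fuse(text_tokens, image_patches):
    """Cross-fuse the two modalities and return one joint feature vector."""
    text_attended = cross_attention(text_tokens, image_patches)   # text queries image
    image_attended = cross_attention(image_patches, text_tokens)  # image queries text
    return mean_pool(text_attended) + mean_pool(image_attended)   # list concatenation


# Toy inputs standing in for text-encoder tokens and vision-Transformer patches.
text_tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0]]
image_patches = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

fused = co_attention_fuse(text_tokens, image_patches)
print(len(fused))  # joint vector length: 2 * feature dimension
```

In a full model, the fused vector would feed a classifier that predicts whether a post is fake, and the attention projections and both encoders would be trained end to end rather than frozen.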
Received: 16 November 2022
(Responsible editor: Wang Keping)