Comparison and Improvement of Health Misinformation Identification Methods in WeChat Official Account Articles
Wang Lei¹, Song Shijie², Zhu Qinghua¹
¹ School of Information Management, Nanjing University, Nanjing 210023
² Business School, Hohai University, Nanjing 211000
Abstract: In recent years, the proliferation of health misinformation in WeChat official account articles has hindered users’ access to reliable health knowledge and weakened their ability to make informed health decisions. Curbing its spread requires methods that identify and detect health misinformation automatically. This study uses samples from two sources: health articles published by authoritative accounts (e.g., “Science China,” “Ding Xiang Doctor,” and government accounts) and articles that have been labeled as containing health misinformation. Health misinformation is identified in four steps: word segmentation, stop-word removal, syntactic feature extraction, and text classification. We selected the best classifier by comparing accuracy, precision, recall, training time, and other performance indicators. Moreover, to address polysemy and synonymy in text classification, we used Latent Dirichlet Allocation (LDA) topic analysis to extract semantic features from the text and proposed a feature extraction method that combines syntactic and semantic features (“syntax plus semantics”). The experiments suggest that the proposed method outperforms methods based solely on semantic feature extraction as well as other prior models. By proposing a new method for identifying health misinformation in WeChat official account articles, this study may have practical implications for the governance of online health misinformation.
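The identification pipeline and the classifier comparison summarized above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: word segmentation is assumed to have been done upstream (for Chinese text, typically with a segmenter such as jieba), tokens are shown in English for readability, the stop-word list is hypothetical, raw token counts stand in for the syntactic features, and multinomial naive Bayes is used only as one example candidate classifier. Label 1 marks misinformation and label 0 an authoritative article; precision and recall are computed for label 1.

```python
import math
import time
from collections import Counter

# Hypothetical stop-word list; a real system would load a full Chinese stop-word list.
STOP_WORDS = {"the", "a", "of", "and", "is"}


def remove_stop_words(tokens):
    """Drop stop words from an article that has already been word-segmented."""
    return [t for t in tokens if t not in STOP_WORDS]


class NaiveBayes:
    """Multinomial naive Bayes over token counts, with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for d in docs for t in d}
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(t for d in class_docs for t in d)
            denom = sum(counts.values()) + len(self.vocab)
            self.log_lik[c] = {t: math.log((counts[t] + 1) / denom) for t in self.vocab}
        return self

    def predict(self, doc):
        # Tokens unseen during training are skipped (they add 0 to every class score).
        scores = {
            c: self.log_prior[c] + sum(self.log_lik[c].get(t, 0.0) for t in doc)
            for c in self.classes
        }
        return max(scores, key=scores.get)


def evaluate(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall), with precision/recall for `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


def run(model_factory, train_docs, train_labels, test_docs, test_labels):
    """Preprocess, time the training step, and score one candidate classifier."""
    train = [remove_stop_words(d) for d in train_docs]
    test = [remove_stop_words(d) for d in test_docs]
    start = time.perf_counter()
    model = model_factory().fit(train, train_labels)
    train_seconds = time.perf_counter() - start
    predictions = [model.predict(d) for d in test]
    return evaluate(test_labels, predictions), train_seconds


if __name__ == "__main__":
    # Toy, made-up examples: 1 = misinformation, 0 = authoritative article.
    train_docs = [
        ["miracle", "cure", "cancer", "share", "now"],
        ["the", "miracle", "food", "cures", "everything"],
        ["guideline", "recommends", "regular", "exercise"],
        ["study", "shows", "the", "vaccine", "is", "effective"],
    ]
    train_labels = [1, 1, 0, 0]
    test_docs = [["share", "this", "miracle", "cure"], ["the", "guideline", "recommends", "vaccine"]]
    test_labels = [1, 0]
    metrics, seconds = run(NaiveBayes, train_docs, train_labels, test_docs, test_labels)
    print("accuracy, precision, recall:", metrics)  # (1.0, 1.0, 1.0) on this toy data
    print(f"training time: {seconds:.6f} s")
```

Because `run` takes a model factory, comparing classifiers amounts to calling it once per candidate (any class with `fit` and `predict` methods) and ranking the returned metrics and training times; a realistic comparison would also use a held-out test set far larger than this toy one.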
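The “syntax plus semantics” idea can be illustrated with a small feature-fusion sketch. The abstract does not say exactly how the two views are combined, so this assumes the simplest option: concatenating each article's relative term frequencies (standing in for the syntactic features) with its LDA document-topic proportions (the semantic features). Because synonyms that appear in similar contexts tend to be assigned to the same topic, two articles can receive similar topic proportions even when their word choices differ, which is the motivation for adding the semantic view. LDA is fitted here by collapsed Gibbs sampling in plain Python so the example has no dependencies; the hyperparameters (`alpha`, `beta`, `iterations`, `num_topics`, `seed`) and all function names are illustrative assumptions, and a practical system would more likely use an established implementation such as gensim's `LdaModel` or scikit-learn's `LatentDirichletAllocation`.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return each document's topic proportions."""
    rng = random.Random(seed)
    vocab_size = len({w for d in docs for w in d})
    doc_topic = [[0] * num_topics for _ in docs]          # n[d][k]: tokens in d assigned to k
    topic_word = [Counter() for _ in range(num_topics)]   # n[k][w]: times w assigned to k
    topic_total = [0] * num_topics                        # n[k]: all tokens assigned to k
    assignments = []
    for d, doc in enumerate(docs):                        # random initial topic per token
        z_doc = [rng.randrange(num_topics) for _ in doc]
        for w, k in zip(doc, z_doc):
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Take the current token out of the counts before resampling it.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized full conditional p(z_i = j | all other assignments).
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + beta)
                    / (topic_total[j] + vocab_size * beta)
                    for j in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Posterior-mean estimate theta[d][k] = (n[d][k] + alpha) / (N_d + K * alpha).
    return [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]


def term_frequency_features(docs, vocab):
    """Relative term frequencies over a fixed vocabulary (the 'syntactic' view here)."""
    rows = []
    for doc in docs:
        counts = Counter(doc)
        n = len(doc) or 1
        rows.append([counts[w] / n for w in vocab])
    return rows


def syntax_plus_semantics(docs, num_topics=2):
    """Concatenate term-frequency features with LDA topic proportions per document."""
    vocab = sorted({w for d in docs for w in d})
    syntactic = term_frequency_features(docs, vocab)
    semantic = lda_gibbs(docs, num_topics)
    return [s + t for s, t in zip(syntactic, semantic)], vocab
```

Each fused vector has `len(vocab) + num_topics` entries and can be passed to any candidate classifier. Since the term-frequency block grows with the vocabulary while the topic block has only `num_topics` entries, the two blocks may need reweighting or scaling in practice so that the semantic view is not swamped.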
Received: 23 December 2021
|
|
|
|
|
|
|