Mechanism and Empirical Research on Forecasting Influenza Epidemic Fused with Baidu Index
Wang Ruojia 1,2
1. Department of Information Management, Peking University, Beijing 100871; 2. Institute of Ocean Research, Peking University, Beijing 100871
Abstract: This study examines the mechanism behind, and the feasibility of, forecasting influenza epidemics by combining search-query data with actual influenza surveillance data. First, the logical relationship between online information searching and conventional surveillance data is analyzed using concepts such as information behavior and information-seeking behavior. Then, guided by this theoretical framework, keywords are selected using the range selection method and cross-correlation analysis. Finally, three forecasting models are built and compared. The results show that (i) the empirical findings support the theoretical framework: keywords that reflect flu trends ten weeks in advance relate to influenza vaccines, keywords that lead by one week relate to influenza symptoms, and most keywords that move simultaneously with flu trends are frequently used influenza-related terms; and (ii) all three models predict influenza effectively, with the support vector machine yielding the most accurate forecasts.
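The lead times reported for each keyword group come from cross-correlation analysis. The following is a minimal sketch of that idea, not the study's actual procedure: it computes the Pearson correlation between a keyword's weekly search index and the flu series shifted by 0 to 12 weeks, then picks the lag with the highest correlation. The function names (`lagged_correlation`, `best_lead`), the 12-week lag window, and all data are assumptions made for illustration; the series are synthetic and constructed so that their leads are known in advance.

```python
import numpy as np


def lagged_correlation(search, flu, max_lag):
    """Correlate search[t] with flu[t + lag] for lag = 0..max_lag (in weeks)."""
    search = np.asarray(search, dtype=float)
    flu = np.asarray(flu, dtype=float)
    result = {}
    for lag in range(max_lag + 1):
        x = search[: len(search) - lag]  # search index at week t
        y = flu[lag:]                    # flu activity at week t + lag
        result[lag] = float(np.corrcoef(x, y)[0, 1])
    return result


def best_lead(search, flu, max_lag):
    """Return (lag, correlation) for the lag with the highest correlation."""
    corr = lagged_correlation(search, flu, max_lag)
    lag = max(corr, key=corr.get)
    return lag, corr[lag]


# Synthetic data only (hypothetical, not from the study).
rng = np.random.default_rng(0)
weeks = 120
signal = rng.normal(size=weeks + 10)
flu = signal[:weeks]

# Each keyword series is the flu series shifted earlier by a known number
# of weeks, plus a small amount of noise.
vaccine_kw = signal[10 : weeks + 10] + rng.normal(scale=0.1, size=weeks)
symptom_kw = signal[1 : weeks + 1] + rng.normal(scale=0.1, size=weeks)
general_kw = signal[:weeks] + rng.normal(scale=0.1, size=weeks)

for name, series in [
    ("vaccine", vaccine_kw),
    ("symptom", symptom_kw),
    ("general", general_kw),
]:
    lag, r = best_lead(series, flu, max_lag=12)
    print(f"{name}: best lead = {lag} weeks, r = {r:.2f}")
```

On real search-index data the correlation peak is usually much less sharp than in this constructed example, so a selection threshold on the correlation coefficient would typically be applied as well.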
Received: 25 June 2017