刘冰, 庞琳. 国内外大数据质量研究述评[J]. 情报学报, 2019, 38(2): 217-226.
Liu Bing, Pang Lin. Review of Domestic and International Research on Big Data Quality[J]. Journal of the China Society for Scientific and Technical Information, 2019, 38(2): 217-226.
[1] Lohr S. The age of big data[N]. New York Times, 2012-02-11.
[2] Laney D. 3D data management: Controlling data volume, velocity and variety[J]. META Group Research Note, 2001, 6: 70.
[3] Gantz J, Reinsel D. Extracting value from chaos[J]. IDC iView, 2011, 1142(2011): 1-12.
[4] Gudivada V N, Baeza-Yates R, Raghavan V V. Big data: Promises and problems[J]. IEEE Computer, 2015, 48(3): 20-23.
[5] Franks B. 驾驭大数据[M]. 北京: 人民邮电出版社, 2013.
[6] Kulkarni A. A study on metadata management and quality evaluation in big data management[J]. Engineering Technology & Applied Science Research, 2016, 4(7): 455-459.
[7] Lee Y W, Pipino L L, Funk J D, et al. 数据质量征途[M]. 黄伟, 王嘉寅, 苏秦, 等译. 北京: 高等教育出版社, 2015.
[8] 汪应洛, 黄伟, 朱志祥. 大数据产业及管理问题的一些初步思考[J]. 科技促进发展, 2014(1): 15-19.
[9] Immonen A, Pääkkönen P, Ovaska E. Evaluating the quality of social media data in big data architecture[J]. IEEE Access, 2015, 3: 2028-2043.
[10] Liu J, Li J, Li W, et al. Rethinking big data: A review on the data quality and usage issues[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2016, 115: 134-142.
[11] Boyd D, Crawford K. Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon[J]. Information, Communication and Society, 2012, 15: 662-679.
[12] Sukumar R, Ramachandran N, Ferrell R K. ‘Big Data’ in health care: How good is it?[J]. International Journal of Health Care Quality Assurance, 2015: 2-9.
[13] Caballero I, Serrano M, Piattini M. A data quality in use model for big data[C]// Proceedings of the International Conference on Conceptual Modeling. Heidelberg: Springer, 2014: 65-74.
[14] Cai L, Zhu Y Y. The challenges of data quality and data quality assessment in the big data era[J]. Data Science Journal, 2015, 14: Article No. 2.
[15] Wahyudi A, Kuk G, Janssen M. A process pattern model for tackling and improving big data quality[J]. Information Systems Frontiers, 2018, 20: 457-469.
[16] Haryadi A F, Hulstijn J, Wahyudi A, et al. Antecedents of big data quality: An empirical examination in financial service organizations[C]// Proceedings of 2016 IEEE International Conference on Big Data. IEEE, 2016: 116-121.
[17] Gao J, Xie C, Tao C. Big data validation and quality assurance—Issuses, challenges, and needs[C]// Proceedings of 2016 IEEE Symposium on Service-Oriented System Engineering. IEEE, 2016: 433-441.
[18] Batini C, Rula A, Scannapieco M, et al. From data quality to big data quality[J]. Journal of Database Management, 2015, 26(1): 60-82.
[19] Rao D, Gudivada V N, Raghavan V V. Data quality issues in big data[C]// Proceedings of IEEE International Conference on Big Data. IEEE, 2015: 2654-2660.
[20] Haryadi A F. Requirements on and antecedents of big data quality: An empirical examination to improve big data quality in financial service organizations[D]. Delft: Delft University of Technology, 2016: 13.
[21] Glowalla P, Balazy P, Basten D, et al. Process-driven data quality management—An application of the combined conceptual life cycle model[C]// Proceedings of the 2014 47th Hawaii International Conference on System Sciences. Washington DC: IEEE Computer Society, 2014: 4700-4709.
[22] Clarke. The OECD guidelines[EB/OL]. [2017-04-04]. http://www.rogerclarke.com/DV/PaperOECD.html.
[23] Soares S. Big data governance: An emerging imperative[M]. MC Press, 2012.
[24] Aggarwal A. Data quality evaluation framework to assess the dimensions of 3V’s of big data[J]. International Journal of Emerging Technology and Advanced Engineering, 2017, 7(10): 503-506.
[25] Toivonen M. Big data quality challenges in the context of business analytics[D]. Helsinki: University of Helsinki, 2015: 47-48.
[26] Kläs M, Trendowicz A, Jedlitschka A. What makes big data different from a data quality assessment perspective? Practical challenges for data and information quality research[R]. ODQ2015 30 March 2015, Garching, Germany.
[27] Ardagna D, Cappiello C, Samá W, et al. Context-aware data quality assessment for big data[J]. Future Generation Computer Systems, 2018, 89: 548-562.
[28] 张绍华, 潘蓉, 宗宇伟. 大数据治理与服务[M]. 上海: 上海科学技术出版社, 2016: 120.
[29] Juddoo S. Overview of data quality challenges in the context of Big Data[C]// Proceedings of the 2015 International Conference on Computing, Communication and Security. IEEE, 2015: 1-9.
[30] Sneed H M, Erdoes K. Testing big data (assuring the quality of large databases)[C]// Proceedings of the 2015 IEEE Eighth International Conference on Software Testing, Verification and Validation Workshops. IEEE, 2015: 1-6.
[31] Liedtke C A. Quality, analytics, and big data[R]. Strategic Improvement Systems, 2016.
[32] 蔡莉, 朱扬勇. 大数据质量[M]. 上海: 上海科学技术出版社, 2017: 5.
[33] Federal D A S. Data quality framework, version 1.0[R]. Justice Sector Information Strategy, Ministry of Justice, US, 2008.
[34] Parkinson J. Six big data challenges[EB/OL]. [2017-02-01]. http://www.cioinsight.com/c/a/Expert-Voices/Managing-Big-Data-Six-Operational-Challenges-484979.
[35] Loshin D. Big data analytics: From strategic planning to enterprise integration with tools, techniques, NoSQL, and graph[M]. Morgan Kaufmann Publishers, 2013: 13.
[36] Ge M, Dohnal V. Quality management in big data[J]. Informatics, 2018, 5: 19.
[37] Calder A. ISO/IEC 38500: The IT governance standard[M]. IT Governance Publishing, 2008.
[38] Data Governance Institute. The DGI data governance framework[R]. 2009.
[39] IBM Corporation. IBM data governance council maturity model: Building a roadmap for effective data governance[R]. 2007.
[40] ISACA. COBIT 5: Enabling information[M]. ISA, 2013.
[41] Gartner Group. Big data[EB/OL]. http://www.gartner.com/it-glossary/big-data.
[42] DAMA International. DAMA数据管理知识体系指南[M]. 马欢, 刘晨, 等译. 北京: 清华大学出版社, 2012.
[43] Taleb I, Dssouli R, Serhani M A. Big data pre-processing: A quality framework[C]// Proceedings of the IEEE International Congress on Big Data. IEEE, 2015: 191-198.
[44] Taleb I, Serhani M A, Dssouli R. Big data quality: A survey[C]// Proceedings of the 2018 IEEE International Congress on Big Data. IEEE, 2018: 166-173.
[45] Chen Y T, Sun E W, Lin Y B. Coherent quality management for big data systems: a dynamic approach for stochastic time consistency[J]. Annals of Operations Research, 2018: Article No. 2795.
[46] Cheah Y W, Canon R, Plale B, et al. Milieu: Lightweight and configurable big data provenance for science[C]// Proceedings of the 2013 IEEE International Congress on Big Data. IEEE, 2013: 46-53.
[47] Becker D, King T D, McMullen B. Big data, big data quality problem[C]// Proceedings of the 2015 IEEE International Conference on Big Data. Santa Clara: IEEE, 2015: 2644-2653.
[48] Pawar S H, Thakore D. An assessment model to evaluate quality attributes in big data quality[J]. International Journal of Computer Science Trends and Technology, 2017, 5(2): 373-376.
[49] Reddy G M, Deshmukh G, Kumar R A, et al. Enhanced big data quality frame work[J]. International Journal of Computer Science and Information Technologies, 2016, 7(3): 1408-1409.
[50] Saha B, Srivastava D. Data quality: The other face of Big Data[C]// Proceedings of the International Conference on Data Engineering. IEEE, 2014: 1294-1297.
[51] 金范. 数据质量管理与安全管理[M]. 上海: 上海科学技术出版社, 2016: 47.
[52] Soares S. 大数据治理[M]. 匡斌, 译. 北京: 清华大学出版社, 2014.
[53] Taleb I, El Kassabi H T, Serhani M A, et al. Big data quality: A quality dimensions evaluation[C]// Proceedings of the 2016 International IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress. IEEE, 2016: 759-765.
[54] Merino J, Caballero I, Rivas B, et al. A data quality in use model for big data[J]. Future Generation Computer Systems, 2016, 63: 123-130.
[55] Krogstie J, Gao S. A semiotic approach to investigate quality issues of open big data ecosystems[M]// Information and Knowledge Management in Complex Systems. Springer International Publishing, 2015: 41-50.
[56] Bizer C. Quality-driven information filtering—in the context of web-based information systems[M]. Saarbrücken: VDM Verlag, 2007: 1-22.
[57] Desai K Y. Big data quality modeling and validation[D]. San Jose: San José State University, 2018, 5: 18-58.
[58] Fabijan A, Helena H O, Bosch J. Customer feedback and data collection techniques in software R&D: A literature review[C]// Proceedings of the International Conference of Software Business. Springer, 2015, 1: 139-153.
[59] Bertino E. Big data—Opportunities and challenges panel position paper[C]// Proceedings of the 2013 IEEE 37th Annual Computer Software and Applications Conference. Washington DC: IEEE Computer Society, 2013: 479-480.
[60] 莫祖英. 大数据质量测度模型构建[J]. 情报理论与实践, 2018, 41(3): 11-15.
[61] Floridi L. Big data and information quality[M]// The Philosophy of Information Quality. Springer International Publishing, 2014: 303-315.
[62] Abdullah N, Ismail S A, Sophiayati S, et al. Data quality in big data: A review[J]. International Journal of Advances in Soft Computing and its Applications, 2015: 17-27.
[63] Sukumar S R, Natarajan R, Ferrell R K. Quality of big data in health care[J]. International Journal of Health Care Quality Assurance, 2015, 28(6): 621-634.
[64] Firmani D, Mecella M, Scannapieco M, et al. On the meaningfulness of “Big Data Quality”[J]. Data Science and Engineering, 2016, 1(1): 6-20.
[65] Juddoo S. Overview of data quality challenges in the context of Big Data[C]// Proceedings of the 2015 International Conference on Computing, Communication and Security. IEEE, 2016.
[66] Dumbill E. Making sense of big data[J]. Big Data, 2013, 1(1): 1-2.
[67] Becker D, King T D, McMullen B, et al. Big data quality case study preliminary findings[R]. U.S. Army Medcom Mods, 2013: 1-54.
[68] Kläs M, Putz W, Lutz T. Quality evaluation for big data: A scalable assessment approach and first evaluation results[C]// Proceedings of the Joint Conference of the International Workshop on Software Measurement & the International Conference on Software Process & Product Measurement. IEEE, 2017.
[69] Yao L, Ge Z. Big data quality prediction in the process industry: A distributed parallel modeling framework[J]. Journal of Process Control, 2018, 68: 1-13.
[70] Farzi S, Dastjerdi A B. Data quality measurement using data mining[J]. International Journal of Computer Theory and Engineering, 2010, 2(1): 115-118.
[71] Han R, Nie L, Ghanem M M, et al. Elastic algorithms for guaranteeing quality monotonicity in big data mining[C]// Proceedings of the 2013 IEEE International Conference on Big Data, 2013: 45-50.
[72] Li L L, Li J Z, Gao H. Evaluating entity-description conflict on duplicated data[J]. Journal of Combinatorial Optimization, 2016, 31(2): 918-941.
[73] Lai S T, Leu F Y. An iterative and incremental data preprocessing procedure for improving the risk of big data project[C]// Proceedings of the International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing. Heidelberg: Springer, 2017, 612: 483-492.
[74] Lin Y M, Wang H Z, Li J Z, et al. Data source selection for information integration in big data era[J]. Information Sciences, 2019, 479: 197-213.
[75] Miao D, Li J, Liu X, et al. Vertex cover in conflict graphs: Complexity and a near optimal approximation[C]// Proceedings of the International Conference on Combinatorial Optimization and Applications. New York: Springer, 2015: 395-408.
[76] Heinrich B, Hristova D. A fuzzy metric for currency in the context of Big Data[C]// Proceedings of the Twenty Second European Conference on Information Systems, 2014: 1-15.
[77] Li M H, Li J Z, Cheng S Y. Uncertain rule based method for evaluating data currency[J]. Journal of Software, 2014, 25(S2): 147-156.
[78] Endler G, Baumgärtel P, Wahl A M, et al. Is estimation of data completeness through time series forecasts feasible[C]// Proceedings of the Advances in Databases and Information Systems. Springer International Publishing, 2015: 261-274.
[79] Razniewski S, Nutt W. Assessing the completeness of geographical data[C]// Proceedings of the Big Data. Berlin: Springer, 2013: 228-237.
[80] Emran N A, Embury S, Missier P, et al. Measuring data completeness for microbial genomics database[C]// Proceedings of the Intelligent Information and Database Systems. Berlin: Springer, 2013: 186-195.
[81] 周傲英, 金澈清, 王国仁, 等. 不确定性数据管理技术研究综述[J]. 计算机学报, 2009, 32(1): 1-16.
[82] Zhang Y, Wang H Z, Yang Z S, et al. Relative accuracy evaluation[J]. PLoS ONE, 2014, 9(8): e103853.
[83] Heinrich B, Klier M, Schiller A, et al. Assessing data quality–A probability-based metric for semantic consistency[J]. Decision Support Systems, 2018, 110: 95-106.
[84] 罗纳德·巴赫曼, 吉多·肯珀, 托马斯·格尔策. 大数据时代下半场: 数据治理、驱动与变现[M]. 刘志则, 刘源, 译. 北京: 北京联合出版公司, 2017: 101.
[85] Sidi F, Panahy P H S, Affendey L S, et al. Data quality: A survey of data quality dimensions[C]// Proceedings of the 2012 International Conference on Information Retrieval & Knowledge Management. IEEE, 2012: 300-304.
[86] Ganapathi A, Chen Y, Ganapathi A, et al. Data quality: Experiences and lessons from operationalizing big data[C]// Proceedings of the IEEE International Conference on Big Data. IEEE, 2017.
[87] 叶焕倬, 吴迪. 相似重复记录清理方法研究综述[J]. 现代图书情报技术, 2010, 26(9): 56-66.
[88] 蒋勋, 刘喜文. 大数据环境下面向知识服务的数据清洗研究[J]. 图书与情报, 2013(5): 16-21.
[89] 庞雄文, 姚占林, 李拥军. 大数据量的高效重复记录检测方法[J]. 华中科技大学学报(自然科学版), 2010(2): 8-11.
[90] Williamson A. Big data and the implications for government[J]. Legal Information Management, 2014, 14(4): 253-257.
[91] Ciancarini P, Poggi F, Russo D. Big data quality: a roadmap for open data[C]// Proceedings of the 2016 IEEE Second International Conference on Big Data Computing Service and Applications. IEEE, 2016: 210-215.
[92] 洪学海, 王志强, 杨青海. 面向共享的政府大数据质量标准化问题研究[J]. 大数据, 2017(3): 44-52.
[93] 马一鸣. 政府大数据质量评价体系构建研究[D]. 长春: 吉林大学, 2016.
[94] Juddoo S, George C, Duquenoy P, et al. Data governance in the health industry: Investigating data quality dimensions within a big data context[J]. Applied System Innovation, 2018, 1(4): 43.
[95] Juddoo S, George C. Discovering the most important data quality dimensions in health big data using latent semantic analysis[C]// Proceedings of the IEEE International Conference on Advances in Big Data, Computing and Data Communication Systems, Durban, South Africa, 2018.
[96] Hoffman S. Medical big data and big data quality problems[J]. Social Science Electronic Publishing, 2014: 289-316.
[97] 马国耀, 孙勇韬, 马玉玲. 数据校验技术在医疗健康大数据质量控制中的应用分析[J]. 中国卫生信息管理杂志, 2016, 13(4): 417-419.
[98] 陈超. 电力大数据质量评价模型及动态探查技术研究[J]. 现代电子技术, 2014(4): 153-155.
[99] Hazen B, Boone C, Ezell J, et al. Data quality for data science, predictive analytics, and big data in supply chain management: An introduction to the problem and suggestions for research and applications[J]. International Journal of Production Economics, 2014, 154: 72-80.