Research on Privacy Data Identification and Measurement Based on Medical Information Text
Zhang Kailiang1,2, Zang Guoquan1,2, Xiao Yang1
1. School of Information Management, Zhengzhou University, Zhengzhou 450001; 2. Research Institute of Data Science, Zhengzhou City, Zhengzhou 450001
Abstract Data classification results in medical industry standards are vague and are rarely accompanied by quantitative measurements. To address these problems, this study uses text mining of medical information to measure the privacy of medical data objectively. The measurement results can serve as a reference for verifying and improving current medical data classifications. The medically sensitive data were drawn from four sources: industry standards, laws and regulations, academic papers, and breach cases. A medically sensitive data unit consists of sensitive nouns (also known as sensitive data items), sensitive verbs, and sensitive degree words, which are used in the privacy recognition model. The privacy measurement model considers the sensitivity, semantic strength, and text strength of sensitive data. Ranked by privacy value, medical application data scored highest, followed by health status data, medical payment data, and personal attribute data.
Received: 15 January 2024