Chinese Disease Name Normalization Based on Multi-task Learning and Polymorphic Semantic Features
Han Pu¹˒², Zhang Zhanpeng¹, Zhang Wei¹
1. School of Management, Nanjing University of Posts & Telecommunications, Nanjing 210003
2. Jiangsu Provincial Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023
Han Pu, Zhang Zhanpeng, Zhang Wei. Chinese Disease Name Normalization Based on Multi-task Learning and Polymorphic Semantic Features[J]. Journal of the China Society for Scientific and Technical Information (情报学报), 2021, 40(11): 1234-1244.