基于BiLSTM-CRF模型的食品安全事件词性自动标注研究

doi:10.3772/j.issn.1000-0135.2018.12.004

情报学报 (Journal of the China Society for Scientific and Technical Information), 2018, Vol. 37, Issue (12): 1204-1211


Part-of-Speech Automated Annotation of Food Safety Events Based on BiLSTM-CRF

Xu Fei^{1,2}, Ye Wenhao^{3}, Song Yinghua^{1,2}

1. School of Management, Wuhan University of Technology, Wuhan 430070;
2. China Research Center for Emergency Management, Wuhan University of Technology, Wuhan 430070;
3. School of Information Management, Nanjing University, Nanjing 210023


Abstract: The accuracy and recall of part-of-speech annotation directly affect the knowledge and strategy mining applied downstream to food-safety incidents: they directly determine the performance of term and entity extraction in food-safety events and, to some extent, the accuracy of classification, clustering, and association knowledge mining on those events. Part-of-speech annotation experiments were conducted with a traditional machine learning model (CRF) and with deep learning models (RNN, BiLSTM, and BiLSTM-CRF). Across forty groups of experiments, the deep learning models achieved higher annotation F-scores than CRF: the average F-scores of RNN and BiLSTM were 2.43% and 3.93% higher, respectively. BiLSTM-CRF, which integrates the strengths of both BiLSTM and CRF, performed best overall: its F-score was 7.12% higher than that of BiLSTM, and the best model reached an F-score of 95.89%.
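The F-scores compared above combine precision and recall for each tag. As a minimal sketch only, the following Python shows one common way to compute per-tag precision, recall, and F1 and to macro-average them over tags. The abstract does not state the paper's exact evaluation protocol, so the averaging scheme, the function names `per_tag_prf` and `macro_f1`, and the toy tags are assumptions made for illustration.

```python
from collections import Counter


def per_tag_prf(gold, pred):
    """Return {tag: (precision, recall, f1)} for two equal-length tag sequences."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct tag: true positive for that tag
        else:
            fp[p] += 1          # predicted tag was wrong: false positive
            fn[g] += 1          # gold tag was missed: false negative
    scores = {}
    for tag in set(gold) | set(pred):
        p_den = tp[tag] + fp[tag]
        r_den = tp[tag] + fn[tag]
        precision = tp[tag] / p_den if p_den else 0.0
        recall = tp[tag] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        scores[tag] = (precision, recall, f1)
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of per-tag F1 (assumed averaging scheme, not the paper's)."""
    scores = per_tag_prf(gold, pred)
    if not scores:
        return 0.0
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Toy example with hypothetical tags: n = noun, v = verb, a = adjective.
gold = ["n", "v", "n", "a", "n"]
pred = ["n", "v", "v", "a", "n"]
scores = per_tag_prf(gold, pred)
overall = macro_f1(gold, pred)
```

In the toy example one noun is mislabeled as a verb, which lowers recall for `n` and precision for `v` while leaving `a` untouched. Scoring per tag makes that kind of error visible where a single accuracy figure would hide it.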

Key words: part-of-speech; food safety events; deep learning model; conditional random field

Received: 14 August 2018


Cite this article:

Xu Fei, Ye Wenhao, Song Yinghua. Part-of-Speech Automated Annotation of Food Safety Events Based on BiLSTM-CRF[J]. 情报学报, 2018, 37(12): 1204-1211.

URL:

https://qbxb.istic.ac.cn/EN/10.3772/j.issn.1000-0135.2018.12.004 OR https://qbxb.istic.ac.cn/EN/Y2018/V37/I12/1204

