Key Technologies of Event Base Construction and Event Detection in the Field of Science and Technology
Liu Yao1, Fang Xiaowei2, Qin Xun2
1. Institute of Scientific and Technical Information of China, Beijing 100038
2. School of Software & Microelectronics, Peking University, Beijing 100091
Abstract This study proposes a method for constructing a risk event base and an event detection model for the field of science and technology. Based on an analysis of the text features of online news sources, models of a meta-event resource database and a theme-event regenerated resource database are constructed from crawled science and technology news, and a comprehensive evaluation model is proposed for event detection. For detection, the study proposes a two-branch transformer model that extracts lexical features related to the degree of risk from risk events and reduces the interference of text domain features with risk event classification, so that risk events can be identified. Experimental results show that both the proposed risk event detection model and the index for judging the risk propensity of meta-events are effective. The study offers a reference for constructing risk event bases in science and technology, and the proposed language model offers a methodological and technical reference for research on risk event detection.
Received: 01 April 2021