Deep Learning-Based Classification of Pre-Qin Classics Questions
Wang Dongbo¹, Gao Ruiqing¹, Shen Si², Li Bin³
1. College of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095; 2. School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094; 3. College of Literature, Nanjing Normal University, Nanjing 210097
Abstract: In recent years, automated question answering (QA) systems have become a research hotspot in machine learning, information retrieval, and natural language processing. Such systems return concise, accurate natural-language answers to the questions users pose. Because question classification is the first step in building a QA system, its results directly affect the quality of the whole system. However, most current research on question classification focuses on modern Chinese, and relatively few studies address questions about ancient Chinese texts. Starting from the concept of question classification, this paper constructs a question classification scheme for ancient documents and uses TF-IDF to extract category feature words. We then conduct automatic classification experiments on questions about the classics using a support vector machine (SVM), conditional random fields (CRF), and a deep learning model (Bi-LSTM). The results show that the Bi-LSTM model performs best of the three, achieving an average harmonic mean (F1 score) of 94.78% across the seven categories proposed in this paper, which indicates strong application value.
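The abstract states only that TF-IDF is used to extract category feature words; the exact weighting is not given here. The sketch below shows one common variant, assuming that each category's questions are pooled into a single pseudo-document and that idf is computed over categories as log(N / df), so a term occurring in every category scores zero. The function name `category_feature_words`, the toy questions, and the category labels `person` and `location` are hypothetical illustrations, not the paper's seven categories, and questions are assumed to be already word-segmented.

```python
import math
from collections import Counter, defaultdict


def category_feature_words(samples, top_k=5):
    """Rank terms per category by TF-IDF (hypothetical helper, one common variant).

    Assumption: each category's questions are pooled into one pseudo-document,
    and idf = log(N / df) with N = number of categories and df = number of
    categories containing the term.

    samples: iterable of (tokens, category) pairs, tokens already segmented.
    Returns {category: [(term, score), ...]} sorted by descending score,
    ties broken alphabetically by term.
    """
    # Pool raw term counts per category.
    tf = defaultdict(Counter)
    for tokens, category in samples:
        tf[category].update(tokens)

    n_categories = len(tf)

    # Category-level document frequency of each term.
    df = Counter()
    for counts in tf.values():
        df.update(counts.keys())

    features = {}
    for category, counts in tf.items():
        total = sum(counts.values())
        # Normalised term frequency times inverse category frequency.
        scores = {
            term: (count / total) * math.log(n_categories / df[term])
            for term, count in counts.items()
        }
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        features[category] = ranked[:top_k]
    return features


if __name__ == "__main__":
    # Toy, pre-segmented questions with hypothetical category labels.
    samples = [
        (["孔子", "是", "何人"], "person"),
        (["颜回", "是", "何人"], "person"),
        (["曲阜", "是", "何处"], "location"),
    ]
    feats = category_feature_words(samples, top_k=2)
    print([term for term, _ in feats["person"]])    # ['何人', '孔子']
    print([term for term, _ in feats["location"]])  # ['何处', '曲阜']
```

Because "是" appears in both categories, its idf is log(2/2) = 0 and it drops to the bottom of each ranking, while interrogative words such as "何人" and "何处" rise to the top. A per-question idf, smoothing, or a chi-square filter would be equally plausible design choices; the source does not say which the authors used.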
Received: 06 February 2018
|
|
|
|
|
|
|