Wang Yi, Xie Juan, Cheng Ying. Deep Neural Networks Language Model Based on CNN and LSTM Hybrid Architecture[J]. Journal of the China Society for Scientific and Technical Information (情报学报), 2018, 37(2): 194-205.
[1] Wen Juan. Research and application of statistical language models[D]. Beijing: Beijing University of Posts and Telecommunications, 2010. (in Chinese)
[2] Bengio Y, Ducharme R, Vincent P, et al. A neural probabilistic language model[J]. Journal of Machine Learning Research, 2003, 3: 1137-1155.
[3] Baroni M, Dinu G, Kruszewski G. Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors[C]// Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics. Stroudsburg: Association for Computational Linguistics, 2014: 238-247.
[4] Bengio S, Bengio Y. Taking on the curse of dimensionality in joint distributions using neural networks[J]. IEEE Transactions on Neural Networks, 2000, 11(3): 550-557.
[5] Mikolov T, Chen K, Corrado G, et al. Efficient estimation of word representations in vector space[EB/OL]. [2013-09-07]. https://arxiv.org/abs/1301.3781.
[6] Hinton G E. Learning distributed representations of concepts[C]// Proceedings of the Eighth Annual Conference of the Cognitive Science Society, Amherst, 1986, 1: 12.
[7] Le Q, Mikolov T. Distributed representations of sentences and documents[C]// Proceedings of the 31st International Conference on Machine Learning, 2014, 14: 1188-1196.
[8] Yu M, Dredze M. Improving lexical embeddings with semantic knowledge[C]// Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics. Stroudsburg: Association for Computational Linguistics, 2014: 545-550.
[9] Ma W C, Suel T. Structural sentence similarity estimation for short texts[C]// Proceedings of the Twenty-Ninth International Florida Artificial Intelligence Research Society Conference. Association for the Advancement of Artificial Intelligence, 2016: 232-237.
[10] Pennington J, Socher R, Manning C D. GloVe: Global vectors for word representation[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: Association for Computational Linguistics, 2014, 14: 1532-1543.
[11] Cohen J D, Servan-Schreiber D, McClelland J L. A parallel distributed processing approach to automaticity[J]. The American Journal of Psychology, 1992, 105(2): 239-269.
[12] Elman J L. Finding structure in time[J]. Cognitive Science, 1990, 14(2): 179-211.
[13] Graves A. Supervised sequence labelling with recurrent neural networks[M]. Berlin: Springer, 2012: 15-35.
[14] Mikolov T, Karafiát M, Burget L, et al. Recurrent neural network based language model[C]// Proceedings of the 11th Annual Conference of the International Speech Communication Association, Makuhari, 2010, 2: 3.
[15] Mikolov T, Deoras A, Kombrink S, et al. Empirical evaluation and combination of advanced language modeling techniques[C]// Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011.
[16] Bengio Y, Simard P, Frasconi P. Learning long-term dependencies with gradient descent is difficult[J]. IEEE Transactions on Neural Networks, 1994, 5(2): 157-166.
[17] Hochreiter S, Bengio Y, Frasconi P, et al. Gradient flow in recurrent nets: The difficulty of learning long-term dependencies[EB/OL]. [2014-11-19]. https://www.researchgate.net/publication/2839938_Gradient_Flow_in_Recurrent_Nets_the_Difficulty_of_Learning_Long-Term_Dependencies.
[18] Lipton Z C, Berkowitz J, Elkan C. A critical review of recurrent neural networks for sequence learning[EB/OL]. [2015-06-05]. https://arxiv.org/pdf/1506.00019.
[19] Hochreiter S, Schmidhuber J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
[20] Kang M, Ng T, Nguyen L. Mandarin word-character hybrid-input neural network language model[C]// Proceedings of the 12th Annual Conference of the International Speech Communication Association, Florence, Italy, 2011: 625-628.
[21] Kombrink S, Mikolov T, Karafiát M, et al. Recurrent neural network based language modeling in meeting recognition[C]// Proceedings of the 12th Annual Conference of the International Speech Communication Association, Florence, Italy, 2011: 2877-2880.
[22] Mikolov T, Kombrink S, Burget L, et al. Extensions of recurrent neural network language model[C]// Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Prague, Czech Republic, 2011: 5528-5531.
[23] Shi Y Z, Zhang W Q, Liu J, et al. RNN language model with word clustering and class-based output layer[J]. EURASIP Journal on Audio, Speech, and Music Processing, 2013: 22.
[24] Karpathy A, Johnson J, Li F F. Visualizing and understanding recurrent networks[EB/OL]. [2015-11-17]. https://arxiv.org/pdf/1506.02078.
[25] Ballesteros M, Dyer C, Smith N A. Improved transition-based parsing by modeling characters instead of words with LSTMs[EB/OL]. [2015-08-11]. https://arxiv.org/abs/1508.00657.
[26] Cho K, Van Merriënboer B, Gulcehre C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation[EB/OL]. [2014-09-03]. https://arxiv.org/abs/1406.1078.
[27] Sutskever I, Vinyals O, Le Q V. Sequence to sequence learning with neural networks[C]// Proceedings of the Conference on Advances in Neural Information Processing Systems, 2014: 3104-3112.
[28] Kalchbrenner N, Grefenstette E, Blunsom P. A convolutional neural network for modelling sentences[EB/OL]. [2014-04-08]. https://arxiv.org/abs/1404.2188.
[29] Kim Y. Convolutional neural networks for sentence classification[EB/OL]. [2014-09-03]. https://arxiv.org/abs/1408.5882.
[30] Tang D Y, Qin B, Liu T. Document modeling with gated recurrent neural network for sentiment classification[C]// Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: Association for Computational Linguistics, 2015: 1422-1432.
[31] Wang L, Luís T, Marujo L, et al. Finding function in form: Compositional character models for open vocabulary word representation[EB/OL]. [2016-05-23]. https://arxiv.org/abs/1508.02096.
[32] Kang M, Ng T, Nguyen L. Mandarin word-character hybrid-input neural network language model[C]// Proceedings of the 12th Annual Conference of the International Speech Communication Association, Florence, Italy, 2011: 625-628.
[33] Dos Santos C N, Zadrozny B. Learning character-level representations for part-of-speech tagging[C]// Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 2014, 32: 1818-1826.
[34] Bojanowski P, Joulin A, Mikolov T. Alternative structures for character-level RNNs[EB/OL]. [2015-11-24]. https://arxiv.org/abs/1511.06303.
[35] Luong M T, Manning C D. Achieving open vocabulary neural machine translation with hybrid word-character models[EB/OL]. [2016-06-23]. https://arxiv.org/pdf/1604.00788.pdf.
[36] Zhu Dexi. Lectures on Grammar[M]. Beijing: The Commercial Press, 1982. (in Chinese)
[37] Abadi M, Agarwal A, Barham P, et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems[EB/OL]. [2016-03-16]. https://arxiv.org/abs/1603.04467.
[38] Looks M, Herreshoff M, Hutchins D L, et al. Deep learning with dynamic computation graphs[EB/OL]. [2017-02-22]. https://arxiv.org/abs/1702.02181.
[39] Sak H, Senior A, Beaufays F. Long short-term memory based recurrent neural network architectures for large vocabulary speech recognition[EB/OL]. [2014-02-05]. https://arxiv.org/abs/1402.1128.
[40] Bengio Y, Senécal J S. Adaptive importance sampling to accelerate training of a neural probabilistic language model[J]. IEEE Transactions on Neural Networks, 2008, 19(4): 713-722.
[41] Gutmann M, Hyvärinen A. Noise-contrastive estimation: A new estimation principle for unnormalized statistical models[C]// Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, 2010: 297-304.
[42] Mnih A, Teh Y W. A fast and simple algorithm for training neural probabilistic language models[EB/OL]. [2012-06-27]. https://arxiv.org/abs/1206.6426.
[43] Dyer C. Notes on noise contrastive estimation and negative sampling[EB/OL]. [2014-10-30]. https://arxiv.org/abs/1410.8251.
[44] Chelba C, Mikolov T, Schuster M, et al. One billion word benchmark for measuring progress in statistical language modeling[EB/OL]. [2014-03-04]. https://arxiv.org/abs/1312.3005.
[45] Ji S H, Vishwanathan S V N, Satish N, et al. BlackOut: Speeding up recurrent neural network language models with very large vocabularies[EB/OL]. [2016-03-31]. https://arxiv.org/abs/1511.06909.
[46] Williams W, Prasad N, Mrva D, et al. Scaling recurrent neural network language models[C]// Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Brisbane, QLD, Australia, 2015: 5391-5395.
[47] Jozefowicz R, Vinyals O, Schuster M, et al. Exploring the limits of language modeling[EB/OL]. [2016-02-11]. https://arxiv.org/abs/1602.02410.
[48] Li X, Qin T, Yang J, et al. LightRNN: Memory and computation-efficient recurrent neural networks[C]// Proceedings of the 30th Conference on Neural Information Processing Systems, Barcelona, Spain, 2016: 4385-4393.
[49] Dauphin Y N, Fan A, Auli M, et al. Language modeling with gated convolutional networks[EB/OL]. [2016-11-23]. https://arxiv.org/abs/1612.08083.