A Review of Word Representation Learning

Pan Jun 1,2, Wu Zongda 2

1. School of Science, Zhejiang University of Science and Technology, Hangzhou 310023
2. Wenzhou Popper Big Data Research, Wenzhou 325035

Abstract: Word representation that reflects semantic meaning is fundamental to natural language understanding tasks. The traditional approach of encoding words through a hand-built semantic dictionary is impractical due to its high construction cost, and one-hot representation suffers from defects such as high dimensionality and data sparsity. Distributed word representation, which projects words into vectors in a low-dimensional real-valued space, can capture the semantic relatedness between words and has been widely used in many NLP tasks. In this paper, we present an in-depth study of word representation learning methods from the perspectives of input data, learning objectives, and optimization algorithms, focusing on the theoretical basis, key techniques, evaluation methods, and application fields. We then summarize the main challenges and the latest advances in this research area, and finally discuss possible directions for future work.

Received: 21 December 2018
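
To make the contrast drawn in the abstract concrete, the following minimal Python sketch (not from the paper; the three-word vocabulary and hand-picked 3-dimensional vectors are invented purely for illustration) compares cosine similarity under one-hot and distributed representations:

import numpy as np

vocab = ["king", "queen", "apple"]

# One-hot: dimension equals vocabulary size, and every pair of distinct
# words has cosine similarity 0, so no relatedness signal is captured.
one_hot = np.eye(len(vocab))

# Distributed: low-dimensional dense vectors (hand-picked here) in which
# semantically related words lie close together.
embedding = {
    "king":  np.array([0.90, 0.80, 0.10]),
    "queen": np.array([0.85, 0.75, 0.20]),
    "apple": np.array([0.10, 0.20, 0.90]),
}

def cosine(u, v):
    # Cosine similarity, the usual relatedness measure for word vectors.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(one_hot[0], one_hot[1]))                 # 0.0 for any distinct pair
print(cosine(embedding["king"], embedding["queen"]))  # ~1.00: related words are close
print(cosine(embedding["king"], embedding["apple"]))  # ~0.30: unrelated words are far

In practice the dense vectors are of course learned from corpora, by the neural and matrix-factorization methods this review surveys, rather than set by hand.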