Research of Multilingual Author-Topic Model for Profiling Researcher Interests
Li Yan, Liu Zhihui, Gao Yingfan
Institute of Scientific and Technical Information of China, Beijing 100038
Abstract Against the background of big data and globalization, automatically mining latent topics and accurately profiling researchers' interests from massive multilingual literature are key problems for knowledge services and cross-language information retrieval. Existing methods for describing researchers' interests mostly rely on literature in a single language and therefore do not apply to multilingual datasets. Building on the author-topic model and the multilingual topic model, this study proposes the JointAT (joint author-topic) model to profile researchers' interests from multilingual datasets, together with a Gibbs sampling method for estimating its parameters. The experimental results indicate that the JointAT model generalizes better than the author-topic model.
Received: 26 April 2019