Investigating the Performance Improvement of Microblog Keyword Extraction by Eye-tracking Data from General-domain Corpus
Zhang Chengzhi, Hu Shaohu, Zhang Yingyi
Department of Information Management, School of Economics and Management, Nanjing University of Science & Technology, Nanjing 210094
Abstract Eye-tracking data record a reader's eye movements while browsing information. Some studies have used eye-tracking data to measure readers' attention to individual words and added this attention as a feature to microblog keyword extraction models to improve their performance. However, existing research extracts keywords using only the total gaze duration from general-domain eye-tracking data, and the impact of eye-tracking data on microblog keyword extraction remains to be fully explored. To this end, this article comprehensively examines how eye-tracking data from a general-domain corpus affect microblog keyword extraction from two aspects: the selection of eye movement features, and the combination of eye movement features with text features. In addition, because the eye-tracking data and the test data sets differ greatly in size, the eye movement features are too sparse, which limits their effectiveness. To resolve this issue, an expansion scheme for eye movement data is proposed.
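The sparsity problem described in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical word-level lexicon of average total gaze duration built from a general-domain eye-tracking corpus, and looks up each microblog token in it. The function names, the toy records, and the zero default for unseen words are all illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict


def build_gaze_lexicon(records):
    """Average total gaze duration (ms) per lower-cased word type.

    `records` is an iterable of (word, total_gaze_duration_ms) pairs,
    a hypothetical flattened view of a general-domain eye-tracking corpus.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for word, duration_ms in records:
        key = word.lower()
        totals[key] += duration_ms
        counts[key] += 1
    return {word: totals[word] / counts[word] for word in totals}


def gaze_features(tokens, lexicon, default=0.0):
    """Look up one gaze feature per token; unseen tokens get `default`."""
    return [lexicon.get(token.lower(), default) for token in tokens]


def coverage(tokens, lexicon):
    """Fraction of tokens that have an eye-tracking entry."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token.lower() in lexicon)
    return hits / len(tokens)


# Toy data (assumed for illustration only).
records = [("the", 120.0), ("The", 100.0), ("market", 260.0), ("report", 240.0)]
lexicon = build_gaze_lexicon(records)

post = ["the", "new", "iphone", "market", "lol"]
features = gaze_features(post, lexicon)
print(features)                 # gaze feature per token, 0.0 where unseen
print(coverage(post, lexicon))  # share of tokens with a gaze entry
```

Tokens absent from the eye-tracking corpus fall back to the default value, so the lower the coverage, the less information the gaze feature contributes. This is the gap that an expansion scheme for eye movement data would need to close.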
Received: 01 June 2020