Microblog Retrieval Model Combining User Interest and Mixed Estimation
Wu Shufang1,2, Zhang Xiongtao1, Zhu Jie3
1. School of Management, Hebei University, Baoding 071000
2. College of Management and Economics, Tianjin University, Tianjin 300000
3. Department of Information Management, the Central Institute for Correctional Police, Baoding 071000
Abstract With the continued development of mobile internet technology, microblog retrieval has become an important part of microblog services. Because microblog retrieval differs from traditional text retrieval, this paper proposes a new microblog retrieval model that improves two components of the query likelihood model: the document prior probability and the estimation of the document language model. For the document prior, each user's interest in blogs is quantified to build an interest blog library for that user, and the prior probability of a microblog document is then computed from this library. For the document language model, blog content and user interaction information are combined to identify related blogs, which are used to smooth the original blog; this mixed estimation addresses the data sparseness of short microblog texts. Experiments on real data crawled from Sina show that the model outperforms several state-of-the-art models on P@15, P@30, and MRR.
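The abstract does not give the model's formulas, so the following is only a minimal sketch of the general idea: a query likelihood score of the form log P(d) + sum over query terms of log P(q|d), where P(q|d) mixes the blog's own term distribution with that of its related blogs and a collection background. The linear interpolation, the weights `lam` and `mu`, the `floor` value, and all function names are assumptions made for illustration, not the authors' method. The `prior` argument stands in for the interest-based document prior, whose computation is not specified here.

```python
import math
from collections import Counter


def unigram_lm(tokens):
    """Maximum-likelihood unigram model: term -> relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()} if total else {}


def mixed_term_prob(term, own_lm, related_lm, collection_lm,
                    lam=0.5, mu=0.3, floor=1e-9):
    """Assumed mixed estimate: interpolate own, related-blog, and collection models.

    lam and mu are hypothetical weights with lam + mu < 1; floor keeps the
    probability positive for terms unseen in the collection.
    """
    background = collection_lm.get(term, floor)
    return (lam * own_lm.get(term, 0.0)
            + mu * related_lm.get(term, 0.0)
            + (1.0 - lam - mu) * background)


def score(query, doc, related_docs, collection_lm, prior, lam=0.5, mu=0.3):
    """Log query likelihood plus log document prior (prior must be > 0)."""
    own_lm = unigram_lm(doc)
    # Pool the tokens of all related blogs into one smoothing model.
    related_lm = unigram_lm([t for d in related_docs for t in d])
    log_likelihood = sum(
        math.log(mixed_term_prob(q, own_lm, related_lm, collection_lm, lam, mu))
        for q in query
    )
    return math.log(prior) + log_likelihood


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    blog = ["user", "interest"]
    related = [["microblog", "search"]]
    collection = unigram_lm(blog + related[0] + ["microblog", "retrieval"])
    print(score(["microblog"], blog, related, collection, prior=0.5))
```

In this sketch a short blog that never mentions a query term can still receive a non-negligible score when its related blogs contain that term, which is the effect the mixed estimation is meant to have on sparse short texts.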
Received: 28 September 2018