Keyword-Based Clustering Ensembles in Academic Documents
Zhang Yingyi1,2, Zhang Chengzhi1,2, Chen Guo1
1. Department of Information Management, Nanjing University of Science & Technology, Nanjing 210094; 2. Institute of Scientific and Technical Information of China, Beijing 100038