Hu Zewen, Ren Ping, Cui Jingjing. Study on Identification of Potential “Treasures” in Massive Papers Based on Machine Learning Models[J]. Journal of the China Society for Scientific and Technical Information (情报学报), 2023, 42(2): 189-202.
(Responsible editor: Wang Keping)