Towards an Appropriate Scale of Datasets for Domain Bibliometrics: Empirical Study under Multiple Tasks
Chen Guo (1,2), Wang Panting (1), Wang Yuefen (1)
1. Department of Information Management, School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094
2. Jiangsu Science and Technology Collaborative Innovation Center of Social Public Safety, Nanjing 210094
Abstract
A complete dataset for domain bibliometrics cannot be constructed, because the literature of any domain is both concentrated in a few core sources and scattered across many others. This raises a fundamental question: what dataset scale is appropriate? The answer depends on the specific bibliometric analysis task. In this study, we consider typical task scenarios defined by the domain scale, the element analyzed (subject classification, country, institution, keyword, reference, author, and their co-occurrence relationships), the number N of top elements examined, and whether the elements are ranked. Based on these scenarios, we designed a corresponding experimental scheme. Taking "artificial intelligence" as an example domain, we constructed subsets at different sampling scales and obtained 4,800 indices measuring how closely each subset reproduces the results of the full dataset. The results show that a small dataset is sufficient for reliable results when analyzing subject classifications or countries. Author analysis is highly sensitive to data scale and should therefore use as large a dataset as possible. For institutions, keywords, or references, a dataset of a certain scale, determined by the specific task scenario, also yields reliable results. Finally, co-occurrence analysis requires a larger dataset, particularly when more top elements are examined or when the elements are ranked.
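The abstract does not state how the 4,800 indices were computed. As a hypothetical sketch only, assume each record is a dict mapping an element type (e.g. "country", "keyword") to a list of values. A subset's fidelity for a top-N task could then be measured by the share of the full dataset's top-N elements that the subset also recovers (unranked tasks) and by Spearman's rank correlation over the shared elements (ranked tasks). The function names, the record layout, simple random sampling, and both measures are illustrative assumptions, not the authors' method.

```python
import random
from collections import Counter
from itertools import combinations

# Hypothetical record layout (assumption): {"country": ["US"], "keyword": ["a", "b"]}


def top_n(records, field, n):
    """Return the n most frequent values of `field`; ties keep first-seen order."""
    counts = Counter(v for r in records for v in r[field])
    return [v for v, _ in counts.most_common(n)]


def cooccurrence_records(records, field):
    """Replace each record's values with unordered value pairs, so top_n ranks co-occurrences."""
    return [{field: list(combinations(sorted(set(r[field])), 2))} for r in records]


def overlap_ratio(sample_top, full_top):
    """Unranked task (assumed measure): share of the full top-N also found by the sample."""
    if not full_top:
        return 1.0
    return len(set(sample_top) & set(full_top)) / len(full_top)


def spearman_on_shared(sample_top, full_top):
    """Ranked task (assumed measure): Spearman's rho over elements present in both lists."""
    sample_set, full_set = set(sample_top), set(full_top)
    full_order = [v for v in full_top if v in sample_set]
    sample_order = [v for v in sample_top if v in full_set]
    k = len(full_order)
    if k < 2:
        return None  # rho is undefined for fewer than two shared elements
    rank_full = {v: i for i, v in enumerate(full_order)}
    rank_sample = {v: i for i, v in enumerate(sample_order)}
    d2 = sum((rank_full[v] - rank_sample[v]) ** 2 for v in full_order)
    return 1 - 6 * d2 / (k * (k * k - 1))  # untied-rank formula


def fidelity(records, field, n, fraction, seed=0):
    """Draw a simple random subset and compare its top-N list with the full dataset's."""
    rng = random.Random(seed)
    size = max(1, round(len(records) * fraction))
    sample = rng.sample(records, size)
    full_top = top_n(records, field, n)
    sample_top = top_n(sample, field, n)
    return overlap_ratio(sample_top, full_top), spearman_on_shared(sample_top, full_top)
```

Repeating `fidelity` over a grid of element types, N values, and sampling fractions would yield one fidelity value per scenario. Co-occurrence tasks can reuse the same functions after `cooccurrence_records` converts each record into value pairs.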
Received: 21 July 2019
|
|
|
|
|
|
|