Pattern Mining Based on Attribution Analysis and Its Empirical Study
Cui Yunxue, Wang Xianwen, Wang Yongzhen
WISE Lab, Institute of Science of Science and S&T Management, Dalian University of Technology, Dalian 116024
Abstract Citation patterns in academic literature reflect a variety of citation motives, which makes it difficult to understand researchers' citation behavior in depth. To address this issue, this study took attribution analysis as its research perspective and collected 500,000 citation relationships from PubMed Central as the research sample, in order to reveal the reasons underlying citation patterns and explain them quantitatively. First, 12 citation reasons, covering both academic and non-academic citation motives, were selected as features to characterize the citation relationships. Second, using the constructed features, a decision forest algorithm was applied in classification experiments on the 500,000 real citation relationships and an equal number of paired virtual citation relationships. Finally, the experimental results were attributed using the SHapley Additive exPlanations (SHAP) framework to evaluate the influence and mode of action of each of the 12 citation reasons in citation decisions. The empirical results indicate that the roles played by different citation reasons in deciding whether to cite an article vary considerably. Specifically, topic relevance, similarity of the research context, and the academic level of the cited author play a major role in the citation decision, whereas factors such as journal influence and topic frontiers play a minor role. In addition, citation reasons act on citation decisions in different ways. The relationship between the value of a feature and its degree of influence on the citation decision can be summarized into four types: S-curve, logarithmic growth, dichotomous, and random fluctuation.
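The attribution step described in the abstract rests on Shapley values, which split a model's output for one sample among its input features. The sketch below is only an illustration of that idea under stated assumptions, not the authors' pipeline: it computes exact Shapley values by enumerating feature subsets for a hypothetical hand-written scoring function, whereas the study applies the SHAP framework to a trained decision forest. The function `toy_citation_score`, its coefficients, the sample values, and the all-zero baseline are all invented for illustration; only the feature names are borrowed from the abstract.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline input.

    Features outside a coalition are set to their baseline value, which is
    one common convention and an assumption of this sketch.
    """
    n = len(x)

    def value(subset):
        # Features in the coalition take values from x; the rest use baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = set(coalition)
                # Marginal contribution of feature i to this coalition.
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_citation_score(z):
    """Hypothetical scorer; the coefficients are made up for illustration."""
    topic_relevance, context_similarity, journal_influence = z
    return (0.6 * topic_relevance
            + 0.3 * topic_relevance * context_similarity
            + 0.1 * journal_influence)


x = [0.9, 0.8, 0.5]          # hypothetical feature values for one citation pair
baseline = [0.0, 0.0, 0.0]   # hypothetical reference point
phi = shapley_values(toy_citation_score, x, baseline)

for name, contribution in zip(
        ["topic_relevance", "context_similarity", "journal_influence"], phi):
    print(f"{name}: {contribution:.3f}")
```

The contributions sum to the difference between the score at `x` and the score at the baseline, which is the additivity property that lets per-feature influences be compared on one scale. Exact enumeration grows exponentially with the number of features, so it is shown here only because the toy example has three.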
Received: 01 April 2022
Responsible editor: Pan Yao