Attribution Analysis of Research Collaboration Frequency Based on XGBoost-SHAP Framework
Peng Zhaoqi1,2, Shi Bin1,2, Yang Alex Jie1,2, Deng Sanhong1,2
1. School of Information Management, Nanjing University, Nanjing 210023
2. Key Laboratory of Data Engineering and Knowledge Services in Provincial Universities (Nanjing University), Nanjing 210023
Abstract The frequency of scientific collaboration can reveal complex collaborative characteristics and is key to a comprehensive understanding of the intensity and patterns of researchers’ collaborative relationships. To reveal the key drivers and inherent patterns behind varying collaboration frequencies, this study selected 13,220,951 authentic collaborative relationships from the PubMed Knowledge Graph 2.0 (PKG 2.0) biomedical dataset as research samples. First, collaboration frequency was treated as a proxy for the strength of a collaborative relationship and divided into low, medium, and high levels according to its distribution. Second, ten variables across four dimensions (research similarity, achievement production models, academic capital, and individual attributes) were selected to form a feature system describing author collaboration intensity, and the eXtreme Gradient Boosting (XGBoost) algorithm was employed to capture the complex correlations among these high-dimensional features. Finally, the SHapley Additive exPlanations (SHAP) framework was used to attribute the model's predictions and to evaluate the degree of influence and the interaction mechanisms of the features at different collaboration intensities. The results show that thematic similarity between authors plays the dominant role in classifying collaboration frequency, followed by differences in the number of papers and in citations (total and average). High thematic congruence is the core factor sustaining high-frequency collaboration, whereas low-frequency collaboration prioritizes cumulative output over the impact of individual papers. The effects of thematic similarity, paper count differences, and H-index differences on collaboration frequency follow distinct patterns: bimodal symmetry, threshold stability, and diminishing marginal effects, respectively. Furthermore, high-frequency collaborations show remarkable tolerance for heterogeneity in factors such as knowledge structure or influence composition, often forming effective complementary divisions of labor through coordinated effort.
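The attribution step in the abstract rests on Shapley values, which split a prediction's deviation from a baseline among the input features. The sketch below illustrates that idea only: `toy_score`, the three feature names, and all numeric values are hypothetical stand-ins, not the paper's trained XGBoost model or data. It also enumerates coalitions exactly, which is feasible for a handful of features; tree-based SHAP implementations use a faster algorithm.

```python
from itertools import combinations
from math import factorial


def toy_score(x):
    """Hypothetical stand-in for a model's high-frequency collaboration score.

    Invented for illustration; it includes one interaction term so the
    attribution is not trivially additive.
    """
    return (
        2.0 * x["topic_similarity"]
        - 0.5 * x["paper_count_diff"]
        - 0.3 * x["h_index_diff"]
        + 1.0 * x["topic_similarity"] * x["paper_count_diff"]
    )


def shapley_values(model, instance, baseline):
    """Exact Shapley values by enumerating every feature coalition.

    Features outside a coalition take their baseline value. The cost
    grows as 2^n, so this only suits a small number of features.
    """
    names = list(instance)
    n = len(names)

    def value(coalition):
        mixed = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return model(mixed)

    phi = {}
    for feat in names:
        others = [k for k in names if k != feat]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude `feat`.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                gain = value(set(subset) | {feat}) - value(set(subset))
                total += weight * gain
        phi[feat] = total
    return phi


# Hypothetical author pair and reference point (all values are assumptions).
instance = {"topic_similarity": 0.9, "paper_count_diff": 0.2, "h_index_diff": 0.1}
baseline = {"topic_similarity": 0.5, "paper_count_diff": 0.5, "h_index_diff": 0.5}

phi = shapley_values(toy_score, instance, baseline)
for name, contribution in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:18s} {contribution:+.4f}")
```

The contributions sum to `toy_score(instance) - toy_score(baseline)`, the additivity property that lets SHAP values be read as a decomposition of a single prediction. The interaction term's effect is shared equally between the two features involved.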
Received: 15 April 2025
(Responsible editor: Wei Ruibin)