Propensity Score Matching: Facilitating the Causal Inference of Data Science-Oriented Information Studies
Wang Xiaolun, Zhao Yuxiang, Wang Yuefen
School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094
Abstract In the era of big data, data science-oriented quantitative research in information science must move beyond correlation analysis to causal analysis in order to mine the value of data more effectively and to broaden the quantitative methods available to information studies. This article introduces propensity score matching (PSM), a statistical method for estimating treatment effects from secondary or observational data. First, we clarify the method’s origins and principles and walk through its implementation steps. Second, we discuss the need for controlled experiments and causal inference in data science-oriented quantitative information studies and highlight the advantages and contributions of PSM. Third, we review in depth the existing work that applies PSM in information science and related fields, and outline directions for future research applying PSM in information science. Finally, we discuss the prospects and challenges of the method in information science. The study thus offers a data science perspective on quantitative research in information science.
Received: 10 April 2020