Modeling and Methodological Advances in Causal Inference by Shuxi Zeng
Total Page:16
File Type:pdf, Size:1020Kb
Modeling and Methodological Advances in Causal Inference by Shuxi Zeng Department of Statistical Science Duke University Date: Approved: Fan Li, Advisor Surya T. Tokdar Jason Xu Susan C. Alberts Dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Statistical Science in the Graduate School of Duke University 2021 ABSTRACT Modeling and Methodological Advances in Causal Inference by Shuxi Zeng Department of Statistical Science Duke University Date: Approved: Fan Li, Advisor Surya T. Tokdar Jason Xu Susan C. Alberts An abstract of a dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Statistical Science in the Graduate School of Duke University 2021 Copyright © 2021 by Shuxi Zeng All rights reserved Abstract This thesis develops novel theory, methods, and models in three major areas in causal inference: (i) propensity score weighting methods for randomized experiments and observational studies; (ii) causal mediation analysis with sparse and irregular longi- tudinal data; and (iii) machine learning methods for causal inference. All theoretical and methodological developments are accompanied by extensive simulation studies and real world applications. Our contribution to propensity score weighting method is presented in Chapter 2 and 3. In Chapter 2, we investigate the use of propensity score weighting in the ran- domized trials for covariate adjustment. We introduce the class of balancing weights and establish its theoretical properties. We demonstrate that it is asymptotically equivalent to the analysis of covariance (ANCOVA) and derive the closed-form vari- ance estimator. We further recommend the overlap weighting estimator based on its semiparametric efficiency and good finite-sample performance. In Chapter 3, We proposed a class of propensity score weighting estimators causal inference for survival outcomes based on the pseudo-observations. This class of estimators are applicable to several different target populations, survival causal estimands, as well as binary and multiple treatments. We study the theoretical properties of the weighting estimator and derive a new closed-form variance estimator. Our contribution to causal mediation analysis is presented in Chapter 4. Causal mediation analysis studies the causal relationships between treatment, outcome and an intermediate variable (i.e. mediator) that lies in between. We extend the existing causal mediation framework to the setting where both the mediator and outcome are measured repeatedly on sparse and irregular time grids. We view the observed mediator and outcome trajectories as realizations of underlying smooth stochastic processes and define causal estimands of direct and indirect effects accordingly. We provide assumptions to nonparametrically identify these estimands. We further de- iv vise a functional principal component analysis (FPCA) approach to estimate the smooth processes and consequently causal effects. We adopt the Bayesian paradigm to properly quantify the uncertainties in estimation. Our contribution to machine learning methods for causal inference is presented in Chapter 5 and 6. In Chapter 5, we develop a new algorithm that learns double-robust representations in observational studies, leading to consistent causal estimation if the model for either the propensity score or the outcome, but not necessarily both, is correctly specified. Specifically, we use the entropy balancing method to learn the weights that minimize the Jensen-Shannon divergence of the representation between the treated and control groups, based on which we make robust and efficient coun- terfactual predictions for both individual and average treatment effects. In Chapter 6, we study how to build a robust prediction model by exploiting the causal relation- ships among predictors. We propose a causal transfer random forest method learning the stable causal relationships efficiently from a large scale of observational data and a small amount of randomized data. We provide theoretical justifications and vali- date the algorithm empirically with synthetic experiments and real world prediction tasks. v Contents Abstract iv List of Tables xi List of Figures xiii Acknowledgements xv 1 Introduction 2 1.1 Motivation . .2 1.2 Research questions and main contributions . .4 1.3 Outline . .8 2 Propensity score weighting in RCT 10 2.1 Introduction . 10 2.2 Propensity score weighting for covariate adjustment . 13 2.2.1 The balancing weights . 13 2.2.2 The overlap weights . 16 2.3 Efficiency considerations and variance estimation . 18 2.3.1 Continuous outcomes . 19 2.3.2 Binary outcomes . 21 2.3.3 Variance estimation . 22 2.4 Simulation studies . 24 2.4.1 Simulation design . 24 2.4.2 Results on efficiency of point estimators . 27 2.4.3 Results on variance and interval estimators . 29 vi 2.4.4 Simulation studies with binary outcomes . 31 2.5 Application to the Best Apnea Interventions for Research Trial . 31 2.6 Discussion . 36 3 Propensity score weighting for survival outcome 40 3.1 Introduction . 40 3.2 Propensity score weighting with survival outcomes . 43 3.2.1 Time-to-event outcomes, causal estimands and assumptions . 43 3.2.2 Balancing weights with pseudo-observations . 44 3.3 Theoretical properties . 47 3.4 Simulation studies . 53 3.4.1 Simulation design . 53 3.4.2 Simulation results . 55 3.5 Application to National Cancer Database . 59 4 Mediation analysis with sparse and irregular longitudinal data 64 4.1 Introduction . 64 4.2 Motivating application: early adversity, social bond and stress . 67 4.2.1 Biological background . 67 4.2.2 Data . 69 4.3 Causal mediation framework . 71 4.3.1 Setup and causal estimands . 71 4.3.2 Identification assumptions . 74 4.4 Modeling mediator and outcome via functional principal component analysis . 77 4.5 Empirical application . 81 vii 4.5.1 Results of FPCA . 81 4.5.2 Results of causal mediation analysis . 84 4.6 Simulations . 86 4.6.1 Simulation design . 86 4.6.2 Simulation results . 89 5 Double robust representation learning 91 5.1 Introduction . 91 5.2 Background . 93 5.2.1 Setup and assumptions . 93 5.2.2 Related work . 94 5.3 Methodology . 97 5.3.1 Proposal: unifying covariate balance and representation learning 97 5.3.2 Practical implementation . 99 5.3.3 Theoretical properties . 102 5.4 Experiments . 104 5.4.1 Experimental setups . 105 5.4.2 Learned balanced representations . 106 5.4.3 Performance on semi-synthetic or real-world dataset . 108 5.4.4 High-dimensional performance and double robustness . 109 6 Causal transfer random forest 111 6.1 Introduction . 111 6.2 Related work . 114 6.2.1 Off-policy learning in online systems . 114 6.2.2 Transfer learning and domain adaptation . 114 viii 6.2.3 Causality and invariant learning . 115 6.3 Causal Transfer Random Forest . 116 6.3.1 Problem setup . 116 6.3.2 Proposed algorithm . 118 6.3.3 Interpretations from causal learning . 121 6.4 Experiments on synthetic data . 123 6.4.1 Setup and baselines . 123 6.4.2 Synthetic data with explicit mechanism . 124 6.4.3 Synthetic auction: implicit mechanism . 128 6.5 Experiments on real-world data . 130 6.5.1 Randomized experiment (R-data)............... 130 6.5.2 Robustness to real-world data shifts . 131 6.5.3 End-to-end marketplace optimization . 132 7 Conclusions 136 8 Appendix 141 8.1 Appendix for Chapter 2 . 141 8.1.1 Proofs of the propositions in Section 2.3 . 141 8.1.2 Derivation of the asymptotic variance and its consistent esti- mator in Section 2.3 . 150 8.1.3 Variance estimator forτ ^AIPW ................... 153 8.1.4 Additional simulations with binary outcomes . 154 8.1.5 Additional tables . 161 8.2 Appendix for Chapter 3 . 167 8.2.1 Proof of theoretical properties . 167 ix 8.2.2 Details on simulation design . 182 8.2.3 Additional simulation results . 185 8.2.4 Additional information of the application . 190 8.3 Appendix for Chapter 4 . 196 8.3.1 Proof of Theorem 3 . 196 8.3.2 Gibbs sampler . 200 8.3.3 Individual imputed process . 203 8.3.4 Simulation results for sample size N = 500; 1000 . 204 8.4 Appendix for Chapter 5 . 207 8.4.1 Theorem proofs . 207 8.4.2 Generalization to other estimands . 214 8.4.3 Experiments details . 216 8.5 Appendix for Chapter 6 . 218 8.5.1 Details on experiments . 218 8.5.2 Proof for theorems . 220 Bibliography 222 Biography 248 x List of Tables 2.1 Performance comparison under different scenarios for continuous out- comes in simulated RCT. 30 2.2 Baseline balance check for BestAIR study. 33 2.3 Results for application in BestAIR study. 35 3.1 Simulation results for zero treatment effect under different scenarios. 58 3.2 Results in NCDB application. 63 4.1 Summary of early adversity conditions in baboon study. 69 4.2 Mediation analysis results for baboon study. 85 4.3 Performance comparison for mediation analysis in simulations. 90 5.1 Results comparison on benchmark dataset for DRRL. 108 6.1 Performance comparison in real-world click predictions. 132 6.2 Performance comparison in real-world tuning tasks. 134 8.1 Performance comparison with continuous outcomes in simulated RCT. 163 8.2 Performance comparison with binary outcomes in simulated RCT, sce- nario (a)-(d). 164 8.3 Performance comparison with binary outcomes in simulated RCT, sce- nario (e)-(h). 165 8.4 Non convergence frequency with binary outcomes in simulated RCT. 166 8.5 Simulation results with non-zero treatment effect under different sce- narios. 189 8.6 Descriptive statistics of NCDB application. 190 xi 8.7 Additional simulations results for mediation analysis. 206 8.8 Hyperparameter choices . 216 8.9 Comparison for distribution shifts in tuning tasks. ..