Statistics and Causal Inference
Kosuke Imai
Princeton University
Empirical Implications of Theoretical Models (EITM) Summer Institute, June 2012
Three Modes of Statistical Inference
1. Descriptive inference: summarizing and exploring data
   - Inferring "ideal points" from roll-call votes
   - Inferring "topics" from texts and speeches
   - Inferring "social networks" from surveys
2. Predictive inference: forecasting out-of-sample data points
   - Inferring future state failures from past failures
   - Inferring population average turnout from a sample of voters
   - Inferring individual-level behavior from aggregate data
3. Causal inference: predicting counterfactuals
   - Inferring the effects of ethnic minority rule on civil war onset
   - Inferring why incumbency status affects election outcomes
   - Inferring whether the lack of war among democracies can be attributed to regime types
What is "Identification"?
- Inference: learn about what you do not observe (parameters) from what you do observe (data)
- Identification: how much can we learn about the parameters from an infinite amount of data?
- Ambiguity vs. uncertainty
- Identification assumptions vs. statistical assumptions
- Point identification vs. partial identification
FURTHER READING: C. F. Manski. (2007). Identification for Prediction and Decision. Harvard University Press.
What is Causal Inference?
Comparison between factual and counterfactual
Incumbency effect: What would have been the election outcome if a candidate were not an incumbent?
Resource curse thesis: What would have been the GDP growth rate without oil?
Democratic peace theory: Would the two countries have escalated crisis in the same situation if they were both autocratic?
FURTHER READING: Holland, P. (1986). Statistics and causal inference (with discussion). Journal of the American Statistical Association, Vol. 81: 945–960.
Defining Causal Effects
Units: i = 1, ..., n
"Treatment": T_i = 1 if treated, T_i = 0 otherwise
Observed outcome: Y_i
Pre-treatment covariates: X_i
Potential outcomes: Y_i(1) and Y_i(0), where Y_i = Y_i(T_i)

  Voters   Contact   Turnout          Age   Party ID
  i        T_i       Y_i(1)  Y_i(0)   X_i   X_i
  1        1         1       ?        20    D
  2        0         ?       0        55    R
  3        0         ?       1        40    R
  ...
  n        1         0       ?        62    D
Causal effect: Y_i(1) − Y_i(0)
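The table above can be mimicked in a few lines of numpy. This is a minimal sketch with made-up potential outcomes; the fundamental problem of causal inference is that only one of Y_i(1), Y_i(0) is ever observed, so the unit-level effects below are computable only because the data are simulated.

```python
import numpy as np

# Made-up potential outcomes for n = 4 voters; in real data the two
# columns are never both observed for the same unit
Y1 = np.array([1, 1, 0, 0])    # Y_i(1): turnout if contacted
Y0 = np.array([0, 0, 1, 0])    # Y_i(0): turnout if not contacted
T = np.array([1, 0, 0, 1])     # treatment: contact

# The observed outcome reveals exactly one potential outcome per unit
Y = np.where(T == 1, Y1, Y0)

# Unit-level causal effects, computable only because this is simulated
tau = Y1 - Y0
```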
The Key Assumptions
No simultaneity (different from endogeneity)
No interference between units: Y_i(T_1, T_2, ..., T_n) = Y_i(T_i)
Potential violations:
1. Spill-over effects
2. Carry-over effects
Cluster randomized experiments as a solution (more later)
Stable Unit Treatment Value Assumption (SUTVA): no interference + "the same version" of the treatment
Potential outcomes are thought of as fixed: the data cannot distinguish fixed and random potential outcomes
But potential outcomes across units have a distribution
The observed outcome is random because the treatment is random
Multi-valued treatment: more potential outcomes for each unit
Causal Effects of Immutable Characteristics
"No causation without manipulation" (Holland, 1986)
Immutable characteristics: gender, race, age, etc.
What does the causal effect of gender mean?
- Causal effect of having a female politician on policy outcomes (Chattopadhyay and Duflo, 2004 QJE)
- Causal effect of having a discussion leader with certain preferences on deliberation outcomes (Humphreys et al. 2006 WP)
- Causal effect of a job applicant's gender/race on call-back rates (Bertrand and Mullainathan, 2004 AER)
Average Treatment Effects
Sample Average Treatment Effect (SATE):

  SATE ≡ (1/n) Σ_{i=1}^n (Y_i(1) − Y_i(0))

Population Average Treatment Effect (PATE):

  PATE ≡ E(Y_i(1) − Y_i(0))

Population Average Treatment Effect for the Treated (PATT):

  PATT ≡ E(Y_i(1) − Y_i(0) | T_i = 1)

Causal heterogeneity: a zero ATE does not mean a zero effect for everyone!
Other quantities: conditional ATE, quantile treatment effects, etc.
Classical Randomized Experiments
- Units: i = 1, ..., n; may constitute a simple random sample from a population
- Treatment: T_i ∈ {0, 1}
- Outcome: Y_i = Y_i(T_i)
- Complete randomization of the treatment assignment: exactly n_1 units receive the treatment, and n_0 = n − n_1 units are assigned to the control group
- Assumption: Σ_{i=1}^n T_i = n_1 and, for all i = 1, ..., n,

  (Y_i(1), Y_i(0)) ⊥⊥ T_i,   Pr(T_i = 1) = n_1/n

- Estimand: SATE or PATE
- Estimator: the difference-in-means

  τ̂ ≡ (1/n_1) Σ_{i=1}^n T_i Y_i − (1/n_0) Σ_{i=1}^n (1 − T_i) Y_i
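A minimal simulation of this design; the potential outcomes are made up (constant additive effect of 2 for every unit), and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

n, n1 = 100, 50
# Simulated potential outcomes (observable only because this is a simulation);
# constant additive effect of 2 for every unit
Y0 = rng.normal(0, 1, n)
Y1 = Y0 + 2.0

# Complete randomization: exactly n1 of the n units are treated
T = np.zeros(n, dtype=int)
T[rng.choice(n, size=n1, replace=False)] = 1
Y = np.where(T == 1, Y1, Y0)          # observed outcome Y_i = Y_i(T_i)

# Difference-in-means estimator
tau_hat = Y[T == 1].mean() - Y[T == 0].mean()
```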
Estimation of Average Treatment Effects
- Key idea (Neyman 1923): randomness comes from the treatment assignment (plus sampling, for the PATE) alone
- Design-based (randomization-based) rather than model-based
- Statistical properties of τ̂ are based on the design features
- Define O ≡ {Y_i(0), Y_i(1)}_{i=1}^n
- Unbiasedness (over repeated treatment assignments), using E(T_i | O) = n_1/n:

  E(τ̂ | O) = (1/n_1) Σ_{i=1}^n E(T_i | O) Y_i(1) − (1/n_0) Σ_{i=1}^n {1 − E(T_i | O)} Y_i(0)
            = (1/n) Σ_{i=1}^n (Y_i(1) − Y_i(0)) = SATE

- Over repeated sampling: E(τ̂) = E(E(τ̂ | O)) = E(SATE) = PATE
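The unbiasedness argument can be checked exactly by brute force: with a small n we can enumerate every complete-randomization assignment and verify that the average of τ̂ over all assignments equals the SATE. A sketch with made-up potential outcomes:

```python
import itertools
import numpy as np

# Fixed potential outcomes O for n = 6 units (arbitrary numbers)
Y1 = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0])
Y0 = np.array([2.0, 0.0, 2.0, 1.0, 3.0, 4.0])
n, n1 = 6, 3
sate = (Y1 - Y0).mean()

# Enumerate every complete-randomization assignment: (n choose n1) of them
estimates = []
for treated in itertools.combinations(range(n), n1):
    T = np.zeros(n)
    T[list(treated)] = 1
    Y = np.where(T == 1, Y1, Y0)
    estimates.append(Y[T == 1].mean() - Y[T == 0].mean())

# Averaging over all assignments recovers the SATE exactly
mean_tau = np.mean(estimates)
```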
Relationship with Regression
The model: Y_i = α + β T_i + ε_i, where E(ε_i) = 0
Equivalence: the least squares estimate β̂ = difference-in-means
Potential outcomes representation:

  Y_i(T_i) = α + β T_i + ε_i

Constant additive unit causal effect: Y_i(1) − Y_i(0) = β for all i, and α = E(Y_i(0))
A more general representation:

  Y_i(T_i) = α + β T_i + ε_i(T_i), where E(ε_i(t)) = 0

  Y_i(1) − Y_i(0) = β + ε_i(1) − ε_i(0)

β = E(Y_i(1) − Y_i(0)), and α = E(Y_i(0)) as before
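The equivalence of the least squares slope and the difference-in-means is easy to verify numerically; a sketch on simulated data (all numbers made up):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
T = rng.integers(0, 2, n)
Y = 1.0 + 2.5 * T + rng.normal(0, 1, n)

# Least squares fit of Y on (1, T)
X = np.column_stack([np.ones(n), T])
alpha_hat, beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]

# The slope equals the difference in group means; the intercept
# equals the control-group mean
diff_in_means = Y[T == 1].mean() - Y[T == 0].mean()
```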
Bias of Model-Based Variance
The design-based perspective: use Neyman's exact variance
What is the bias of the model-based variance estimator?
Finite sample bias:

  Bias = E( σ̂² / Σ_{i=1}^n (T_i − T̄_n)² ) − ( σ_1²/n_1 + σ_0²/n_0 )
       = {(n_1 − n_0)(n − 1) / (n_1 n_0 (n − 2))} (σ_1² − σ_0²)

The bias is zero when n_1 = n_0 or σ_1² = σ_0²
In general, the bias can be negative or positive and does not asymptotically vanish
Robust Standard Error
Suppose Var(ε_i | T) = σ²(T_i) ≠ σ²
Heteroskedasticity-consistent robust variance estimator:

  Var̂((α̂, β̂) | T) = ( Σ_{i=1}^n x_i x_i^⊤ )^{−1} ( Σ_{i=1}^n ε̂_i² x_i x_i^⊤ ) ( Σ_{i=1}^n x_i x_i^⊤ )^{−1}

where in this case x_i = (1, T_i)^⊤ is a column vector of length 2
Model-based justification: asymptotically valid in the presence of heteroskedastic errors
Design-based evaluation:

  Finite Sample Bias = −( σ_1²/n_1² + σ_0²/n_0² )

The bias vanishes asymptotically
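For x_i = (1, T_i), the sandwich formula reduces to the sum of within-group terms, Σ_{T=1} ε̂_i²/n_1² + Σ_{T=0} ε̂_i²/n_0². A numpy sketch that checks this identity on simulated heteroskedastic data (all numbers made up):

```python
import numpy as np

rng = np.random.default_rng(2)
n, n1, n0 = 200, 100, 100
T = np.zeros(n, dtype=int)
T[rng.choice(n, size=n1, replace=False)] = 1
# Heteroskedastic errors: the error variance differs by treatment arm
Y = 1.0 + 2.0 * T + rng.normal(0, np.where(T == 1, 2.0, 0.5), n)

X = np.column_stack([np.ones(n), T])
beta = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ beta

# HC0 sandwich: (X'X)^{-1} (sum e_i^2 x_i x_i') (X'X)^{-1}
bread = np.linalg.inv(X.T @ X)
meat = (X * (e ** 2)[:, None]).T @ X
V = bread @ meat @ bread
se_hc0 = np.sqrt(V[1, 1])

# With x_i = (1, T_i), HC0 equals the within-group formula exactly
# (group residual means are zero, so .var() is the mean of e^2)
se_groups = np.sqrt(e[T == 1].var() / n1 + e[T == 0].var() / n0)
```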
Cluster Randomized Experiments
- Units: i = 1, 2, ..., n_j
- Clusters of units: j = 1, 2, ..., m
- Treatment at the cluster level: T_j ∈ {0, 1}
- Outcome: Y_ij = Y_ij(T_j)
- Random assignment: (Y_ij(1), Y_ij(0)) ⊥⊥ T_j
- Estimands at the unit level:

  SATE ≡ (1 / Σ_{j=1}^m n_j) Σ_{j=1}^m Σ_{i=1}^{n_j} (Y_ij(1) − Y_ij(0))
  PATE ≡ E(Y_ij(1) − Y_ij(0))

- Random sampling of clusters and units
Merits and Limitations of CREs
- Interference between units within a cluster is allowed
- Assumption: no interference between units of different clusters
- Often easy to implement: Mexican health insurance experiment
- Opportunity to estimate spill-over effects: D. W. Nickerson, spill-over effect of get-out-the-vote canvassing within the household (APSR, 2008)
- Limitations:
  1. A large number of possible treatment assignments
  2. Loss of statistical power
Design-Based Inference
For simplicity, assume equal cluster sizes, i.e., n_j = n for all j
The difference-in-means estimator:

  τ̂ ≡ (1/m_1) Σ_{j=1}^m T_j Ȳ_j − (1/m_0) Σ_{j=1}^m (1 − T_j) Ȳ_j

where Ȳ_j ≡ Σ_{i=1}^{n_j} Y_ij / n_j
It is easy to show E(τ̂ | O) = SATE and thus E(τ̂) = PATE
Exact population variance:

  Var(τ̂) = Var(Ȳ_j(1))/m_1 + Var(Ȳ_j(0))/m_0

Intracluster correlation coefficient ρ_t:

  Var(Ȳ_j(t)) = (σ_t²/n) {1 + (n − 1) ρ_t} ≤ σ_t²
Cluster Standard Error
Cluster-robust variance estimator:

  Var̂((α̂, β̂) | T) = ( Σ_{j=1}^m X_j^⊤ X_j )^{−1} ( Σ_{j=1}^m X_j^⊤ ε̂_j ε̂_j^⊤ X_j ) ( Σ_{j=1}^m X_j^⊤ X_j )^{−1}

where in this case X_j = [1 T_j] is an n_j × 2 matrix and ε̂_j = (ε̂_1j, ..., ε̂_{n_j j})^⊤ is a column vector of length n_j
Design-based evaluation (assume n_j = n for all j):

  Finite Sample Bias = −( V(Ȳ_j(1))/m_1² + V(Ȳ_j(0))/m_0² )

The bias vanishes asymptotically as m → ∞ with n fixed
Implication: cluster standard errors by the unit of treatment assignment
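The cluster-robust sandwich can be assembled by summing X_j^⊤ ε̂_j ε̂_j^⊤ X_j over clusters. A sketch on simulated data with a cluster random effect, so that the intracluster correlation is substantial and the naive iid variance is too small (all numbers made up):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 30, 10                                   # m clusters, n units each
T_cluster = rng.integers(0, 2, m)               # cluster-level treatment
T = np.repeat(T_cluster, n)
cluster = np.repeat(np.arange(m), n)
# Cluster random effect induces intracluster correlation (rho = 0.5 here)
Y = 2.0 * T + np.repeat(rng.normal(0, 1, m), n) + rng.normal(0, 1, m * n)

X = np.column_stack([np.ones(m * n), T])
beta = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ beta

# Cluster-robust sandwich: sum X_j' e_j e_j' X_j over clusters j
bread = np.linalg.inv(X.T @ X)
meat = np.zeros((2, 2))
for j in range(m):
    Xj, ej = X[cluster == j], e[cluster == j]
    sj = Xj.T @ ej
    meat += np.outer(sj, sj)
se_cluster = np.sqrt((bread @ meat @ bread)[1, 1])

# Naive iid variance ignores the clustering and understates uncertainty
sigma2 = e @ e / (m * n - 2)
se_naive = np.sqrt(sigma2 * bread[1, 1])
```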
Example: Seguro Popular de Salud (SPS)
- Evaluation of the Mexican universal health insurance program
- Aim: "provide social protection in health to the 50 million uninsured Mexicans"
- A key goal: reduce out-of-pocket health expenditures
- Sounds obvious, but not easy to achieve in developing countries
- Individuals must affiliate in order to receive SPS services
- 100 health clusters nonrandomly chosen for evaluation
- Matched-pair design: based on population, socio-demographics, poverty, education, health infrastructure, etc.
- "Treatment clusters": encouragement for people to affiliate
- Data: aggregate characteristics, surveys of 32,000 individuals
Relative Efficiency of Matched-Pair Design (MPD)
- Compare with the completely-randomized design
- Greater (positive) correlation within pair → greater efficiency
- UATE: MPD is between 1.1 and 2.9 times more efficient
- PATE: MPD is between 1.8 and 38.3 times more efficient!

[Scatter plot: relative efficiency of MPD for the PATE (y-axis, 0 to 30+) against relative efficiency for the UATE (x-axis, 0.0 to 3.0).]
Methodological Challenges
Even randomized experiments often require sophisticated statistical methods
Deviation from the protocol:
1. Spill-over and carry-over effects
2. Noncompliance
3. Missing data, measurement error
Beyond the average treatment effect:
1. Treatment effect heterogeneity
2. Causal mechanisms
Getting more out of randomized experiments:
1. Generalizing experimental results
2. Deriving individualized treatment rules
Challenges of Observational Studies
Randomized experiments vs. observational studies:
- Tradeoff between internal and external validity
- Endogeneity: selection bias
- Generalizability: sample selection, Hawthorne effects, realism
Statistical methods cannot replace good research design
"Designing" observational studies:
- Natural experiments (haphazard treatment assignment)
- Examples: birthdays, weather, close elections, arbitrary administrative rules and boundaries
"Replicating" randomized experiments
Key questions:
1. Where are the counterfactuals coming from?
2. Is it a credible comparison?
A Close Look at Fixed Effects Regression

- Fixed effects models are a primary workhorse for causal inference
- Used for stratified experimental and observational data
- Also used to adjust for unobservables in observational studies:
  "Good instruments are hard to find ..., so we'd like to have other tools to deal with unobserved confounders. This chapter considers ... strategies that use data with a time or cohort dimension to control for unobserved but fixed omitted variables" (Angrist & Pischke, Mostly Harmless Econometrics)
  "fixed effects regression can scarcely be faulted for being the bearer of bad tidings" (Green et al., Dirty Pool)
- Common claim: fixed effects models are superior to matching estimators because the latter can only adjust for observables
- Question: what are the exact causal assumptions underlying fixed effects regression models?

Identification of the Average Treatment Effect
Assumption 1: Overlap (i.e., no extrapolation)

  0 < Pr(T_i = 1 | X_i = x) < 1 for any x ∈ 𝒳

Assumption 2: Ignorability (exogeneity, unconfoundedness, no omitted variable, selection on observables, etc.)

  {Y_i(1), Y_i(0)} ⊥⊥ T_i | X_i = x for any x ∈ 𝒳

Conditional expectation function: μ(t, x) = E(Y_i(t) | T_i = t, X_i = x)
Regression-based estimator:

  τ̂ = (1/n) Σ_{i=1}^n {μ̂(1, X_i) − μ̂(0, X_i)}

The delta method is a pain, but simulation is easy (Zelig)
Matching and Regression in Cross-Section Settings
  Units             1    2    3    4    5
  Treatment status  T    T    C    C    T
  Outcome           Y_1  Y_2  Y_3  Y_4  Y_5

Estimating the Average Treatment Effect (ATE) via matching:

  Y_1 − (Y_3 + Y_4)/2
  Y_2 − (Y_3 + Y_4)/2
  (Y_1 + Y_2 + Y_5)/3 − Y_3
  (Y_1 + Y_2 + Y_5)/3 − Y_4
  Y_5 − (Y_3 + Y_4)/2
Matching Representation of Simple Regression
Cross-section simple linear regression model:

  Y_i = α + β X_i + ε_i

Binary treatment: X_i ∈ {0, 1}
Equivalent matching estimator:

  β̂ = (1/N) Σ_{i=1}^N (Ŷ_i(1) − Ŷ_i(0))

where

  Ŷ_i(1) = Y_i                                                      if X_i = 1
         = Σ_{i'=1}^N X_{i'} Y_{i'} / Σ_{i'=1}^N X_{i'}             if X_i = 0

  Ŷ_i(0) = Σ_{i'=1}^N (1 − X_{i'}) Y_{i'} / Σ_{i'=1}^N (1 − X_{i'}) if X_i = 1
         = Y_i                                                      if X_i = 0

Treated units are matched with the average of the non-treated units
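This matching representation is easy to confirm numerically; a sketch comparing the OLS slope with the imputation-based matching estimator on simulated data (all numbers made up):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 50
X = rng.integers(0, 2, N)                      # binary regressor
Y = 0.5 + 1.7 * X + rng.normal(0, 1, N)

# OLS slope from regressing Y on X (np.polyfit returns the slope first)
beta_ols = np.polyfit(X, Y, 1)[0]

# Matching representation: impute each unit's missing potential outcome
# with the average observed outcome of the opposite group
Y1_hat = np.where(X == 1, Y, Y[X == 1].mean())
Y0_hat = np.where(X == 0, Y, Y[X == 0].mean())
beta_match = (Y1_hat - Y0_hat).mean()
```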
One-Way Fixed Effects Regression
Simple (one-way) FE model:

  Y_it = α_i + β X_it + ε_it

Commonly used by applied researchers:
- Stratified randomized experiments (Duflo et al. 2007)
- Stratification and matching in observational studies
- Panel data, both experimental and observational

β̂_FE may be biased for the ATE even if X_it is exogenous within each unit
It converges to a weighted average of conditional ATEs:

  β̂_FE →_p E(σ_i² ATE_i) / E(σ_i²),  where σ_i² = Σ_{t=1}^T (X_it − X̄_i)² / T

How are the counterfactual outcomes estimated under the FE model?
Unit fixed effects ⇒ within-unit comparison
Mismatches in One-Way Fixed Effects Model
  Units →
  C T C C T
  T C T T C
  C C T C C
  T T T C T
  (rows: time periods)

T: treated observations; C: control observations
In the original figure, circles mark proper matches and triangles mark "mismatches" ⇒ attenuation bias
Matching Representation of Fixed Effects Regression
Proposition 1

  β̂_FE = (1/K) { (1/(NT)) Σ_{i=1}^N Σ_{t=1}^T (Ŷ_it(1) − Ŷ_it(0)) }

  Ŷ_it(x) = Y_it                               if X_it = x
          = (1/(T − 1)) Σ_{t' ≠ t} Y_it'       if X_it = 1 − x,   for x = 0, 1

  K = (1/(NT)) Σ_{i=1}^N Σ_{t=1}^T { X_it (1/(T − 1)) Σ_{t' ≠ t} (1 − X_it') + (1 − X_it) (1/(T − 1)) Σ_{t' ≠ t} X_it' }

K: the average proportion of proper matches across all observations
More mismatches ⇒ a larger adjustment
The adjustment is required except in very special cases
It "fixes" the attenuation bias, but the adjustment is not sufficient
The fixed effects estimator is a special case of matching estimators
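Proposition 1 can be verified on simulated panel data; a sketch that computes β̂_FE both by the within transformation and by the matching representation with the adjustment factor K (all numbers made up; every unit is forced to have within-unit treatment variation):

```python
import numpy as np

rng = np.random.default_rng(5)
N, T = 8, 5                                     # N units, T periods
X = rng.integers(0, 2, (N, T))
X[:, 0], X[:, 1] = 0, 1                         # within-unit variation
Y = 1.0 + 2.0 * X + rng.normal(0, 1, (N, T))

# One-way fixed effects via the within (demeaning) transformation
Xd = X - X.mean(axis=1, keepdims=True)
Yd = Y - Y.mean(axis=1, keepdims=True)
beta_fe = (Xd * Yd).sum() / (Xd ** 2).sum()

# Matching representation: each observation is matched to the average
# of the same unit's other periods
other_y = (Y.sum(axis=1, keepdims=True) - Y) / (T - 1)   # mean Y_it', t' != t
other_x = (X.sum(axis=1, keepdims=True) - X) / (T - 1)   # mean X_it', t' != t
Y1_hat = np.where(X == 1, Y, other_y)
Y0_hat = np.where(X == 0, Y, other_y)
K = (X * (1 - other_x) + (1 - X) * other_x).mean()       # proper-match share
beta_match = (Y1_hat - Y0_hat).mean() / K
```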
Unadjusted Matching Estimator
  Units →
  C T C C T
  T C T T C
  C C T C C
  T T T C T
  (rows: time periods)

Consistent if the treatment is exogenous within each unit
Only equal to the fixed effects estimator if heterogeneity in either treatment assignment or treatment effect is non-existent
Unadjusted Matching = Weighted FE Estimator
Proposition 2
The unadjusted matching estimator

  β̂_M = (1/(NT)) Σ_{i=1}^N Σ_{t=1}^T (Ŷ_it(1) − Ŷ_it(0))

where

  Ŷ_it(1) = Y_it                                                  if X_it = 1
          = Σ_{t'=1}^T X_it' Y_it' / Σ_{t'=1}^T X_it'             if X_it = 0

  Ŷ_it(0) = Σ_{t'=1}^T (1 − X_it') Y_it' / Σ_{t'=1}^T (1 − X_it') if X_it = 1
          = Y_it                                                  if X_it = 0

is equivalent to the weighted fixed effects model

  (α̂_M, β̂_M) = argmin_{(α, β)} Σ_{i=1}^N Σ_{t=1}^T W_it (Y_it − α_i − β X_it)²

  W_it ≡ T / Σ_{t'=1}^T X_it'         if X_it = 1
       ≡ T / Σ_{t'=1}^T (1 − X_it')  if X_it = 0
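Proposition 2 can likewise be checked numerically; a sketch comparing the unadjusted matching estimator with a weighted least squares fit that uses the weights W_it and explicit unit dummies (all numbers made up; every unit has both treated and control periods):

```python
import numpy as np

rng = np.random.default_rng(6)
N, T = 8, 5
X = rng.integers(0, 2, (N, T))
X[:, 0], X[:, 1] = 0, 1                         # within-unit variation
Y = 1.0 + 2.0 * X + rng.normal(0, 1, (N, T))

# Unadjusted matching: within each unit, impute the missing potential
# outcome with that unit's opposite-status average
n1 = X.sum(axis=1, keepdims=True)               # treated periods per unit
Y1bar = (X * Y).sum(axis=1, keepdims=True) / n1
Y0bar = ((1 - X) * Y).sum(axis=1, keepdims=True) / (T - n1)
beta_match = np.where(X == 1, Y - Y0bar, Y1bar - Y).mean()

# Equivalent weighted one-way FE regression with weights W_it
W = np.where(X == 1, T / n1, T / (T - n1)).reshape(-1)
D = np.kron(np.eye(N), np.ones((T, 1)))         # unit dummies
Z = np.column_stack([D, X.reshape(-1)])
A = Z.T @ (Z * W[:, None])                      # Z' diag(W) Z
b = Z.T @ (W * Y.reshape(-1))                   # Z' diag(W) y
beta_wfe = np.linalg.solve(A, b)[-1]
```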
Equal Weights
  Treatment      Weights
  C T C C T      0  1   0 0 1/2
  T C T T C      0 1/2  0 0  1
  C C T C C      0 1/2  0 0  0
  T T T C T      0  0   0 0 1/2
Different Weights
  Treatment      Weights
  C T C C T      0  1   0 0 3/4
  T C T T C      0 2/3  0 0  1
  C C T C C      0 1/3  0 0  0
  T T T C T      0  0   0 0 1/4

Any within-unit matching estimator leads to a weighted fixed effects regression with particular weights
We derive the regression weights given any matching estimator for various quantities (ATE, ATT, etc.)
First Difference = Matching = Weighted One-Way FE
  ΔY_it = β ΔX_it + ε_it,  where ΔY_it = Y_it − Y_{i,t−1} and ΔX_it = X_it − X_{i,t−1}

  Treatment      Weights
  C T C C T      0 1 0 0 0
  T C T T C      0 1 0 0 0
  C C T C C      0 0 0 0 0
  T T T C T      0 0 0 0 0
Mismatches in Two-Way FE Model
  Y_it = α_i + γ_t + β X_it + ε_it

  Units →
  C T C C T
  T C T T C
  C C T C C
  T T T C T
  (rows: time periods)

Triangles in the original figure mark two kinds of mismatches:
- Same treatment status
- Neither the same unit nor the same time
Mismatches in Weighted Two-Way FE Model
  Units →
  C T C C T
  T C T T C
  C C T C C
  T T T C T
  (rows: time periods)

Some mismatches can be eliminated
You can NEVER eliminate them all
Cross Section Analysis = Weighted Time FE Model

[Figure: average outcomes of the treatment and control groups at times t and t+1; the cross-section comparison takes the control group's outcome at t+1 as the counterfactual for the treated group.]
First Difference = Weighted Unit FE Model

[Figure: the same average-outcome plot; the first-difference comparison takes the treatment group's own outcome at time t as the counterfactual.]
What about Difference-in-Differences (DiD)?

[Figure: the same average-outcome plot; DiD constructs the counterfactual by shifting the treatment group's outcome at time t by the control group's change from t to t+1 (parallel trends).]
General DiD = Weighted Two-Way (Unit and Time) FE

2 × 2: the standard two-way fixed effects estimator
General setting: multiple time periods, repeated treatments

  Treatment      Weights
  C T C C T      0  0   0 0  0
  T C T T C      0 1/2  0 1 1/2
  C C T C C      0 1/2  0 1 1/2
  T T T C T      0  0   0 0  0

Weights can be negative ⇒ the method of moments estimator
Fast computation is available
Effects of GATT Membership on International Trade
1. Controversy
   - Rose (2004): no effect of GATT membership on trade
   - Tomz et al. (2007): significant effect with non-member participants
2. The central role of fixed effects models:
   - Rose (2004): one-way (year) fixed effects for dyadic data
   - Tomz et al. (2007): two-way (year and dyad) fixed effects
   - Rose (2005): "I follow the profession in placing most confidence in the fixed effects estimators; I have no clear ranking between country-specific and country pair-specific effects."
   - Tomz et al. (2007): "We, too, prefer FE estimates over OLS on both theoretical and statistical ground"
Data and Methods
Data:
- Data set from Tomz et al. (2007)
- Effect of GATT: 1948–1994
- 162 countries and 196,207 (dyad-year) observations

Year fixed effects model:

  ln Y_it = α_t + β X_it + δ^⊤ Z_it + ε_it

- X_it: formal membership/participant; (1) Both vs. One, (2) One vs. None, (3) Both vs. One/None
- Z_it: 15 dyad-varying covariates (e.g., log product GDP)
- Year fixed effects: standard, weighted, and first difference
- Two-way fixed effects: standard and difference-in-differences
Empirical Results
[Figure: point estimates for (a) formal membership and (b) participants across three comparisons (Both vs. One/None, Both vs. One, One vs. None). Each panel shows year, unit, and two-way fixed effects estimators, in standard and weighted versions, plus first-difference and DiD estimates; the vertical axis runs from −0.4 to 0.4.]
Matching as Nonparametric Preprocessing
- Assume exogeneity holds: matching does NOT solve endogeneity
- Need to model E(Y_i | T_i, X_i)
- Parametric regression: functional-form/distributional assumptions ⇒ model dependence
- Nonparametric regression ⇒ curse of dimensionality
- Preprocess the data so that the treatment and control groups are similar to each other w.r.t. the observed pre-treatment covariates
- Goal of matching: achieve balance = independence between T and X
- "Replicate" the randomized treatment w.r.t. observed covariates
- Reduced model dependence: minimal role for statistical modeling
Sensitivity Analysis
- Consider a simple pair-matching of treated and control units
- Assumption: treatment assignment is "random"
- Difference-in-means estimator
- Question: how large a departure from the key (untestable) assumption must occur for the conclusions to no longer hold?
- Rosenbaum's sensitivity analysis: for any pair j,

  1/Γ ≤ { Pr(T_1j = 1) / Pr(T_1j = 0) } / { Pr(T_2j = 1) / Pr(T_2j = 0) } ≤ Γ

- Under ignorability, Γ = 1 for all j
- How do the results change as you increase Γ?
- Limitations of sensitivity analysis
- FURTHER READING: P. Rosenbaum. Observational Studies.
The Role of Propensity Score
The probability of receiving the treatment:

  π(X_i) ≡ Pr(T_i = 1 | X_i)

The balancing property:

  T_i ⊥⊥ X_i | π(X_i)

Exogeneity given the propensity score (under exogeneity given the covariates):

  (Y_i(1), Y_i(0)) ⊥⊥ T_i | π(X_i)

Dimension reduction
But the true propensity score is unknown: the propensity score tautology (more later)
Classical Matching Techniques
- Exact matching
- Mahalanobis distance matching: √{(X_i − X_j)^⊤ Σ̂^{−1} (X_i − X_j)}
- Propensity score matching
- One-to-one, one-to-many, and subclassification
- Matching with caliper
- Which matching method to choose? Whatever gives you the "best" balance!
- Importance of substantive knowledge: propensity score matching with exact matching on key confounders

FURTHER READING: Rubin (2006). Matched Sampling for Causal Effects (Cambridge UP)
How to Check Balance
- The success of a matching method depends on the resulting balance
- How should one assess the balance of the matched data?
- Ideally, compare the joint distribution of all covariates for the matched treatment and control groups
- In practice, this is impossible when X is high-dimensional
- Check various lower-dimensional summaries: (standardized) mean difference, variance ratio, empirical CDF, etc.
- Frequent use of balance tests:
  - t test for the difference in means for each variable of X
  - other test statistics, e.g., χ², F, and Kolmogorov–Smirnov tests
  - statistically insignificant test statistics used as a justification for the adequacy of the chosen matching method and/or as a stopping rule for maximizing balance
An Illustration of Balance Test Fallacy

[Figure: two panels plotted against the number of controls randomly dropped (0 to 400). Left: math test score balance measured by the difference in means, mean deviation, and QQ plot. Right: the t-statistic drifts into the "statistical insignificance" region as observations are discarded.]
Problems with Hypothesis Tests as Stopping Rules
- A balance test is a function of both balance and statistical power
- The more observations dropped, the less power the tests have
- The t-test is affected by factors other than balance:

  √n_m (X̄_mt − X̄_mc) / √( s²_mt / r_m + s²_mc / (1 − r_m) )

where:
- X̄_mt and X̄_mc are the sample means
- s²_mt and s²_mc are the sample variances
- n_m is the total number of remaining observations
- r_m is the ratio of remaining treated units to the total number of remaining observations
Advances in Matching Methods
- The main problem of matching: balance checking
- Skip balance checking altogether: specify a balance metric and optimize it
- Optimal matching: minimize the sum of distances
- Genetic matching: maximize the minimum p-value
- Coarsened exact matching: exact match on binned covariates
- SVM matching: find the largest balanced subset
Inverse Propensity Score Weighting
Matching is inefficient because it throws away data
Weighting by the inverse propensity score:

  (1/n) Σ_{i=1}^n { T_i Y_i / π̂(X_i) − (1 − T_i) Y_i / (1 − π̂(X_i)) }

An improved weighting scheme:

  Σ_{i=1}^n {T_i Y_i / π̂(X_i)} / Σ_{i=1}^n {T_i / π̂(X_i)} − Σ_{i=1}^n {(1 − T_i) Y_i / (1 − π̂(X_i))} / Σ_{i=1}^n {(1 − T_i) / (1 − π̂(X_i))}

Unstable when some weights are extremely small
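A sketch of both weighting estimators on simulated data where the true propensity score is known (in practice π̂ would be estimated); the covariate confounds treatment and outcome, and the true ATE is 2. All numbers are made up.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2000
x = rng.normal(0, 1, n)
pi = 1 / (1 + np.exp(-0.5 * x))                # true propensity score
T = rng.binomial(1, pi)
Y = 1.0 + 2.0 * T + x + rng.normal(0, 1, n)    # true ATE = 2; x confounds

# Horvitz-Thompson weighting estimator
tau_ht = np.mean(T * Y / pi - (1 - T) * Y / (1 - pi))

# Improved (normalized-weights) estimator
tau_norm = (np.sum(T * Y / pi) / np.sum(T / pi)
            - np.sum((1 - T) * Y / (1 - pi)) / np.sum((1 - T) / (1 - pi)))
```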
Efficient Doubly-Robust Estimators
The estimator by Robins et al.:

  τ̂_DR ≡ { (1/n) Σ_{i=1}^n μ̂(1, X_i) + (1/n) Σ_{i=1}^n T_i (Y_i − μ̂(1, X_i)) / π̂(X_i) }
        − { (1/n) Σ_{i=1}^n μ̂(0, X_i) + (1/n) Σ_{i=1}^n (1 − T_i)(Y_i − μ̂(0, X_i)) / (1 − π̂(X_i)) }

Consistent if either the propensity score model or the outcome model is correct
(Semiparametrically) efficient
FURTHER READING: Lunceford and Davidian (2004, Stat. in Med.)
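A sketch of the double robustness property on simulated data: with a correct outcome model, τ̂_DR stays close to the truth even when the plugged-in propensity score is badly misspecified (here a constant 1/2). All numbers are made up.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 5000
x = rng.normal(0, 1, n)
pi = 1 / (1 + np.exp(-x))                      # true propensity score
T = rng.binomial(1, pi)
Y = 1.0 + 2.0 * T + 3.0 * x + rng.normal(0, 1, n)   # true ATE = 2

# Correct outcome model: OLS of Y on (1, T, x)
M = np.column_stack([np.ones(n), T, x])
b = np.linalg.lstsq(M, Y, rcond=None)[0]
mu1 = b[0] + b[1] + b[2] * x                   # mu_hat(1, x)
mu0 = b[0] + b[2] * x                          # mu_hat(0, x)

# Deliberately misspecified propensity score: a constant 1/2
pi_bad = np.full(n, 0.5)

# Doubly robust (augmented IPW) estimator
tau_dr = (np.mean(mu1 + T * (Y - mu1) / pi_bad)
          - np.mean(mu0 + (1 - T) * (Y - mu0) / (1 - pi_bad)))
```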
Propensity Score Tautology
- The propensity score is unknown
- Dimension reduction is purely theoretical: one must model T_i given X_i
- Diagnostics: covariate balance checking
- In practice, ad hoc specification searches are conducted
- Model misspecification is always possible
- Theory (Rubin et al.): ellipsoidal covariate distributions ⇒ equal percent bias reduction
- Skewed covariates are common in applied settings
- Propensity score methods can be sensitive to misspecification
Kang and Schafer (2007, Statistical Science)
Simulation study: the deteriorating performance of propensity score weighting methods when the model is misspecified
Setup:
- 4 covariates X*_i: all i.i.d. standard normal
- Outcome model: linear model
- Propensity score model: logistic model with linear predictors
- Misspecification induced by measurement error:

  X_i1 = exp(X*_i1 / 2)
  X_i2 = X*_i2 / (1 + exp(X*_i1)) + 10
  X_i3 = (X*_i1 X*_i3 / 25 + 0.6)³
  X_i4 = (X*_i2 + X*_i4 + 20)²

Weighting estimators to be evaluated:
1. Horvitz–Thompson (HT)
2. Inverse-probability weighting with normalized weights (IPW)
3. Weighted least squares regression (WLS)
4. Doubly-robust least squares regression (DR)
Weighting Estimators Do Fine If the Model is Correct
                           Bias             RMSE
  Sample size  Estimator   GLM     True     GLM      True
  (1) Both models correct
  n = 200      HT         −0.01    0.68    13.07    23.72
               IPW        −0.09   −0.11     4.01     4.90
               WLS         0.03    0.03     2.57     2.57
               DR          0.03    0.03     2.57     2.57
  n = 1000     HT         −0.03    0.29     4.86    10.52
               IPW        −0.02   −0.01     1.73     2.25
               WLS        −0.00   −0.00     1.14     1.14
               DR         −0.00   −0.00     1.14     1.14
  (2) Propensity score model correct
  n = 200      HT         −0.32   −0.17    12.49    23.49
               IPW        −0.27   −0.35     3.94     4.90
               WLS        −0.07   −0.07     2.59     2.59
               DR         −0.07   −0.07     2.59     2.59
  n = 1000     HT          0.03    0.01     4.93    10.62
               IPW        −0.02   −0.04     1.76     2.26
               WLS        −0.01   −0.01     1.14     1.14
               DR         −0.01   −0.01     1.14     1.14
Weighting Estimators Are Sensitive to Misspecification
                           Bias             RMSE
  Sample size  Estimator   GLM     True     GLM      True
  (3) Outcome model correct
  n = 200      HT         24.72    0.25   141.09    23.76
               IPW         2.69   −0.17    10.51     4.89
               WLS        −1.95    0.49     3.86     3.31
               DR          0.01    0.01     2.62     2.56
  n = 1000     HT         69.13   −0.10  1329.31    10.36
               IPW         6.20   −0.04    13.74     2.23
               WLS        −2.67    0.18     3.08     1.48
               DR          0.05    0.02     4.86     1.15
  (4) Both models incorrect
  n = 200      HT         25.88   −0.14   186.53    23.65
               IPW         2.58   −0.24    10.32     4.92
               WLS        −1.96    0.47     3.86     3.31
               DR         −5.69    0.33    39.54     3.69
  n = 1000     HT         60.60    0.05  1387.53    10.52
               IPW         6.18   −0.04    13.40     2.24
               WLS        −2.68    0.17     3.09     1.47
               DR        −20.20    0.07   615.05     1.75
Smith and Todd (2005, J. of Econometrics)
LaLonde (1986; Amer. Econ. Rev.):
- Randomized evaluation of a job training program
- Replace the experimental control group with another non-treated group
- Current Population Survey and Panel Study of Income Dynamics
- Many evaluation estimators didn't recover the experimental benchmark

Dehejia and Wahba (1999; J. of Amer. Stat. Assoc.):
- Apply propensity score matching
- Estimates are close to the experimental benchmark

Smith and Todd (2005):
- Dehejia & Wahba (DW)'s results are sensitive to model specification
- They are also sensitive to the selection of the comparison sample
Propensity Score Matching Fails Miserably
One of the most difficult scenarios identified by Smith and Todd:
- LaLonde experimental sample rather than the DW sample
- Experimental estimate: $886 (s.e. = 488)
- PSID sample rather than the CPS sample

Evaluation bias:
- Conditional probability of being in the experimental sample
- Comparison between the experimental control group and the PSID sample
- "True" estimate = 0
- Logistic regression for the propensity score
- One-to-one nearest neighbor matching with replacement

  Propensity score model    Estimates
  Linear                     −835  (886)
  Quadratic                 −1620 (1003)
  Smith and Todd (2005)     −1910 (1004)
Covariate Balancing Propensity Score
Recall the dual characteristics of the propensity score:
1. Conditional probability of treatment assignment
2. Covariate balancing score

Implied moment conditions:
1. Score equation:

  E{ T_i π′_β(X_i) / π_β(X_i) − (1 − T_i) π′_β(X_i) / (1 − π_β(X_i)) } = 0

2. Balancing condition:

  E{ T_i X̃_i / π_β(X_i) − (1 − T_i) X̃_i / (1 − π_β(X_i)) } = 0

where X̃_i = f(X_i) is any vector-valued function
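In the just-identified case with X̃_i = (1, X_i), the sample balancing condition alone pins down β and can be solved directly with a root finder. A sketch on simulated data using scipy (this bypasses the over-identified GMM machinery discussed next; all numbers made up):

```python
import numpy as np
from scipy.optimize import root

rng = np.random.default_rng(9)
n = 1000
x = rng.normal(0, 1, n)
pi_true = 1 / (1 + np.exp(-(0.3 + 0.8 * x)))
T = rng.binomial(1, pi_true)
Xt = np.column_stack([np.ones(n), x])          # x_tilde = (1, x)

def balance_moments(beta):
    """Sample analogue of the CBPS balancing condition."""
    p = 1 / (1 + np.exp(-(Xt @ beta)))         # logistic propensity score
    w = T / p - (1 - T) / (1 - p)              # = (T - p) / {p(1 - p)}
    return (Xt * w[:, None]).mean(axis=0)

sol = root(balance_moments, x0=np.zeros(2))    # 2 moments, 2 parameters
beta_cbps = sol.x

# At the solution, the inverse-propensity-weighted means of (1, x) are
# exactly balanced between the treatment and control groups
imbalance = balance_moments(beta_cbps)
```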
Generalized Method of Moments (GMM) Framework
Over-identification: more moment conditions than parameters
GMM (Hansen 1982):

  β̂_GMM = argmin_{β ∈ Θ} ḡ_β(T, X)^⊤ Σ_β(T, X)^{−1} ḡ_β(T, X)

where ḡ_β(T, X) = (1/N) Σ_{i=1}^N g_β(T_i, X_i), and g_β(T_i, X_i) stacks the two moment conditions:

  g_β(T_i, X_i) = ( T_i π′_β(X_i)/π_β(X_i) − (1 − T_i) π′_β(X_i)/(1 − π_β(X_i)),
                    T_i X̃_i/π_β(X_i) − (1 − T_i) X̃_i/(1 − π_β(X_i)) )

"Continuous updating" GMM estimator with the following Σ:

  Σ_β(T, X) = (1/N) Σ_{i=1}^N E( g_β(T_i, X_i) g_β(T_i, X_i)^⊤ | X_i )

Newton-type optimization algorithm with the MLE as starting values
Revisiting Kang and Schafer (2007)
                           Bias                              RMSE
  Sample size  Estimator   GLM    Balance  CBPS    True      GLM     Balance  CBPS   True
  (1) Both models correct
  n = 200      HT         −0.01    2.02    0.73    0.68     13.07    4.65    4.04   23.72
               IPW        −0.09    0.05   −0.09   −0.11      4.01    3.23    3.23    4.90
               WLS         0.03    0.03    0.03    0.03      2.57    2.57    2.57    2.57
               DR          0.03    0.03    0.03    0.03      2.57    2.57    2.57    2.57
  n = 1000     HT         −0.03    0.39    0.15    0.29      4.86    1.77    1.80   10.52
               IPW        −0.02    0.00   −0.03   −0.01      1.73    1.44    1.45    2.25
               WLS        −0.00   −0.00   −0.00   −0.00      1.14    1.14    1.14    1.14
               DR         −0.00   −0.00   −0.00   −0.00      1.14    1.14    1.14    1.14
  (2) Propensity score model correct
  n = 200      HT         −0.32    1.88    0.55   −0.17     12.49    4.67    4.06   23.49
               IPW        −0.27   −0.12   −0.26   −0.35      3.94    3.26    3.27    4.90
               WLS        −0.07   −0.07   −0.07   −0.07      2.59    2.59    2.59    2.59
               DR         −0.07   −0.07   −0.07   −0.07      2.59    2.59    2.59    2.59
  n = 1000     HT          0.03    0.38    0.15    0.01      4.93    1.75    1.79   10.62
               IPW        −0.02   −0.00   −0.03   −0.04      1.76    1.45    1.46    2.26
               WLS        −0.01   −0.01   −0.01   −0.01      1.14    1.14    1.14    1.14
               DR         −0.01   −0.01   −0.01   −0.01      1.14    1.14    1.14    1.14
CBPS Makes Weighting Methods Work Better
                                    Bias                            RMSE
Sample size  Estimator     GLM  Balance   CBPS   True     GLM  Balance   CBPS   True

(3) Outcome model correct
n = 200      HT          24.72     0.33  −0.47   0.25  141.09     4.55   3.70  23.76
             IPW          2.69    −0.71  −0.80  −0.17   10.51     3.50   3.51   4.89
             WLS         −1.95    −2.01  −1.99   0.49    3.86     3.88   3.88   3.31
             DR           0.01     0.01   0.01   0.01    2.62     2.56   2.56   2.56
n = 1000     HT          69.13    −2.14  −1.55  −0.10  1329.31    3.12   2.63  10.36
             IPW          6.20    −0.87  −0.73  −0.04   13.74     1.87   1.80   2.23
             WLS         −2.67    −2.68  −2.69   0.18    3.08     3.13   3.14   1.48
             DR           0.05     0.02   0.02   0.02    4.86     1.16   1.16   1.15

(4) Both models incorrect
n = 200      HT          25.88     0.39  −0.41  −0.14  186.53     4.64   3.69  23.65
             IPW          2.58    −0.71  −0.80  −0.24   10.32     3.49   3.50   4.92
             WLS         −1.96    −2.01  −2.00   0.47    3.86     3.88   3.88   3.31
             DR          −5.69    −2.20  −2.18   0.33   39.54     4.22   4.23   3.69
n = 1000     HT          60.60    −2.16  −1.56   0.05  1387.53    3.11   2.62  10.52
             IPW          6.18    −0.87  −0.72  −0.04   13.40     1.86   1.80   2.24
             WLS         −2.68    −2.69  −2.70   0.17    3.09     3.14   3.15   1.47
             DR         −20.20    −2.89  −2.94   0.07  615.05     3.47   3.53   1.75
CBPS Sacrifices Likelihood for Better Balance
[Figure omitted: scatterplots comparing GLM and CBPS across simulated datasets — log-likelihood (GLM vs. CBPS), covariate imbalance (GLM vs. CBPS, log scale), and the likelihood–balance tradeoff — shown both when both models are specified correctly and when neither model is.]
Revisiting Smith and Todd (2005)
Evaluation bias: "true" bias = 0
CBPS improves propensity score matching across specifications and matching methods
However, the specification test rejects the null
                1-to-1 Nearest Neighbor        Optimal 1-to-N Nearest Neighbor
Specification    GLM     Balance   CBPS         GLM     Balance   CBPS
Linear          −835     −559      −302        −885     −257      −38
                (886)    (898)     (873)       (435)    (492)     (488)
Quadratic       −1620    −967      −1040       −1270    −306      −140
                (1003)   (882)     (831)       (406)    (407)     (392)
Smith & Todd    −1910    −1040     −1313       −1029    −672      −32
                (1004)   (860)     (800)       (413)    (387)     (397)
Standardized Covariate Imbalance
Covariate imbalance in the (optimal 1-to-N) matched sample, standardized difference-in-means:

                       Linear                  Quadratic               Smith & Todd
                GLM    Balance  CBPS    GLM    Balance  CBPS    GLM    Balance  CBPS
Age            −0.060  −0.035  −0.063  −0.060  −0.035  −0.063  −0.031   0.035  −0.013
Education      −0.208  −0.142  −0.126  −0.208  −0.142  −0.126  −0.262  −0.168  −0.108
Black          −0.087   0.005  −0.022  −0.087   0.005  −0.022  −0.082  −0.032  −0.093
Married         0.145   0.028   0.037   0.145   0.028   0.037   0.171   0.031   0.029
High school     0.133   0.089   0.174   0.133   0.089   0.174   0.189   0.095   0.160
74 earnings    −0.090   0.025   0.039  −0.090   0.025   0.039  −0.079   0.011   0.019
75 earnings    −0.118   0.014   0.043  −0.118   0.014   0.043  −0.120  −0.010   0.041
Hispanic        0.104  −0.013   0.000   0.104  −0.013   0.000   0.061   0.034   0.102
74 employed     0.083   0.051  −0.017   0.083   0.051  −0.017   0.059   0.068   0.022
75 employed     0.073  −0.023  −0.036   0.073  −0.023  −0.036   0.099  −0.027  −0.098
Log-likelihood −326    −342    −345    −293    −307    −297    −295    −231    −296
Imbalance       0.507   0.264   0.312   0.544   0.304   0.300   0.515   0.359   0.383
Extensions to Other Causal Inference Settings
Propensity score methods are widely applicable
This means that CBPS is also widely applicable
Potential extensions:
1 Non-binary treatment regimes
2 Causal inference with longitudinal data
3 Generalizing experimental estimates
4 Generalizing instrumental variable estimates
All of these are situations where balance checking is difficult
Concluding Remarks
Matching methods do:
  make causal assumptions transparent by identifying counterfactuals
  make regression models robust by reducing model dependence
Matching methods cannot solve endogeneity
  Only good research design can overcome endogeneity
Recent advances in matching methods directly optimize balance
  The same idea applies to the propensity score
Next methodological challenge: panel data
  Fixed effects regression assumes no carry-over effect
  It does not model dynamic treatment regimes
Coping with Endogeneity in Observational Studies
Selection bias in observational studies
Two research design strategies:
1 Find a plausibly exogenous treatment
2 Find a plausibly exogenous instrument
A valid instrument satisfies the following conditions:
1 It is exogenously assigned – no confounding
2 It monotonically affects the treatment
3 It affects the outcome only through the treatment – no direct effect
Challenge: plausibly exogenous instruments with no direct effect tend to be weak
Partial Compliance in Randomized Experiments
Unable to force all experimental subjects to take the (randomly) assigned treatment/control
Intention-to-Treat (ITT) effect ≠ treatment effect
Selection bias: self-selection into the treatment/control groups
Political information bias: effects of campaign on voting behavior Ability bias: effects of education on wages Healthy-user bias: effects of exercises on blood pressure
Encouragement design: randomize the encouragement to receive the treatment rather than the receipt of the treatment itself
Potential Outcomes Notation
Randomized encouragement: Zi ∈ {0, 1}
Potential treatment variables: (Ti(1), Ti(0))
1 Ti(z) = 1: would receive the treatment if Zi = z
2 Ti(z) = 0: would not receive the treatment if Zi = z
Observed treatment receipt indicator: Ti = Ti (Zi )
Observed and potential outcomes: Yi = Yi (Zi , Ti (Zi ))
Can be written as Yi = Yi (Zi )
No interference assumption for Ti(Zi) and Yi(Zi, Ti)
Randomization of encouragement:
  (Yi(1), Yi(0), Ti(1), Ti(0)) ⊥⊥ Zi
But (Yi(1), Yi(0)) is not independent of Ti given Zi = z, i.e., selection bias
Principal Stratification Framework
Imbens and Angrist (1994, Econometrica); Angrist, Imbens, and Rubin (1996, JASA)
Four principal strata (latent types):
  compliers:     (Ti(1), Ti(0)) = (1, 0)
  always-takers: (Ti(1), Ti(0)) = (1, 1)   (non-compliers)
  never-takers:  (Ti(1), Ti(0)) = (0, 0)   (non-compliers)
  defiers:       (Ti(1), Ti(0)) = (0, 1)   (non-compliers)
Observed cells and the principal strata they may contain:
            Zi = 1                   Zi = 0
  Ti = 1    Complier/Always-taker    Defier/Always-taker
  Ti = 0    Defier/Never-taker       Complier/Never-taker
Instrumental Variables and Causality
Randomized encouragement as an instrument for the treatment
Two additional assumptions:
1 Monotonicity: no defiers
    Ti(1) ≥ Ti(0) for all i
2 Exclusion restriction: the instrument (encouragement) affects the outcome only through the treatment
    Yi(1, t) = Yi(0, t) for t = 0, 1
    Zero ITT effect for always-takers and never-takers
ITT effect decomposition:
  ITT = ITTc × Pr(compliers) + ITTa × Pr(always-takers) + ITTn × Pr(never-takers)
      = ITTc × Pr(compliers)
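A toy arithmetic check of the decomposition (all numbers are made up): under the exclusion restriction ITTa = ITTn = 0, so the overall ITT shrinks the complier effect by the complier share, and dividing by that share recovers ITTc.

```python
# Toy check of the ITT decomposition (hypothetical strata shares and effects)
p_c, p_a, p_n = 0.6, 0.25, 0.15     # complier / always-taker / never-taker shares
itt_c, itt_a, itt_n = 5.0, 0.0, 0.0  # exclusion restriction zeroes the last two

itt = itt_c * p_c + itt_a * p_a + itt_n * p_n
print(itt)        # overall ITT = 3.0
print(itt / p_c)  # recovers ITT_c = 5.0
```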
IV Estimand and Interpretation
IV estimand:
  ITTc = ITT / Pr(compliers)
       = [E(Yi | Zi = 1) − E(Yi | Zi = 0)] / [E(Ti | Zi = 1) − E(Ti | Zi = 0)]
       = Cov(Yi, Zi) / Cov(Ti, Zi)
ITTc = Complier Average Treatment Effect (CATE), a.k.a. Local Average Treatment Effect (LATE)
CATE ≠ ATE unless the ATE for noncompliers equals the CATE
A different encouragement (instrument) yields a different set of compliers
Debate among Deaton, Heckman, and Imbens in the Journal of Economic Literature
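The Wald form of the estimand can be verified on a small constructed population of latent types (numbers made up; monotonicity and the exclusion restriction hold by construction), where the ratio of ITT effects recovers exactly the complier effect:

```python
# Sketch: Wald/IV estimand on a constructed population of 10 units
import numpy as np

# 6 compliers (treatment effect +2), 2 always-takers, 2 never-takers
T0 = np.array([0] * 6 + [1] * 2 + [0] * 2)   # treatment if Z = 0
T1 = np.array([1] * 6 + [1] * 2 + [0] * 2)   # treatment if Z = 1
Y0 = np.full(10, 1.0)                        # outcome without treatment
Y1 = Y0 + np.array([2.0] * 6 + [0.5] * 2 + [0.0] * 2)  # outcome with treatment

# Population means under each encouragement arm (exclusion restriction:
# Z matters only through T)
Y_z1 = np.where(T1 == 1, Y1, Y0).mean()
Y_z0 = np.where(T0 == 1, Y1, Y0).mean()
wald = (Y_z1 - Y_z0) / (T1.mean() - T0.mean())
print(wald)  # 2.0 = CATE, the effect among the 6 compliers only
```

Note that the always-takers' effect (0.5) never enters: they take the treatment under both arms, so they contribute nothing to either difference.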
Violation of IV Assumptions
Violation of the exclusion restriction:
  Large sample bias = ITTnoncomplier × Pr(noncomplier) / Pr(complier)
  Weak encouragement (instruments)
  Direct effects of encouragement; failure of randomization; alternative causal paths
Violation of monotonicity:
  Large sample bias = {CATE + ITTdefier} × Pr(defier) / [Pr(complier) − Pr(defier)]
  Depends on the proportion of defiers and the heterogeneity of causal effects
An Example: Testing Habitual Voting
Gerber et al. (2003, AJPS): randomized encouragement to vote in an election
Treatment: turnout in the election
Outcome: turnout in the next election
Monotonicity: being contacted by a canvasser would never discourage anyone from voting
Exclusion restriction: being contacted by a canvasser in this election has no effect on turnout in the next election other than through turnout in this election
CATE: habitual voting for those who would vote if and only if contacted by a canvasser in this election
Multi-valued Treatment
Angrist and Imbens (1995, JASA)
Two-stage least squares regression:
  Ti = α2 + β2 Zi + ηi
  Yi = α3 + γ Ti + εi
With a binary encouragement and a binary treatment, γ̂ is the sample analogue of the CATE (no covariates), and γ̂ →p CATE (with covariates)
Binary encouragement, multi-valued treatment:
  Monotonicity: Ti(1) ≥ Ti(0)
  Exclusion restriction: Yi(1, t) = Yi(0, t) for each t = 0, 1, ..., K
Estimator
γ̂TSLS →p Cov(Yi, Zi) / Cov(Ti, Zi) = E[Yi(1) − Yi(0)] / E[Ti(1) − Ti(0)]
  = Σ_{k=0}^{K} Σ_{j=k+1}^{K} wjk E[ (Yi(1) − Yi(0)) / (j − k) | Ti(1) = j, Ti(0) = k ]
where wjk is a weight, summing to one, defined as
  wjk = (j − k) Pr(Ti(1) = j, Ti(0) = k) / Σ_{k′=0}^{K} Σ_{j′=k′+1}^{K} (j′ − k′) Pr(Ti(1) = j′, Ti(0) = k′)
Easy interpretation under the constant additive effect assumption for every complier type
If the encouragement induces at most one additional dose of the treatment, then
  wk ∝ Pr(Ti(1) = k, Ti(0) = k − 1)
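The weights wjk are simple to compute from any hypothetical joint distribution of (Ti(1), Ti(0)). A sketch with K = 2 and made-up probabilities (monotonicity puts mass only where j ≥ k):

```python
# Sketch: TSLS weights w_jk for a multi-valued treatment (K = 2, made-up Pr)
import numpy as np

K = 2
# p[j, k] = Pr(T(1) = j, T(0) = k); monotonicity => mass only where j >= k
p = np.zeros((K + 1, K + 1))
p[0, 0], p[1, 1], p[1, 0], p[2, 1], p[2, 2] = 0.2, 0.2, 0.3, 0.2, 0.1

# Normalizing constant: sum of (j - k) * Pr over all complier types j > k
denom = sum((j - k) * p[j, k]
            for k in range(K + 1) for j in range(k + 1, K + 1))
w = {(j, k): (j - k) * p[j, k] / denom
     for k in range(K + 1) for j in range(k + 1, K + 1) if p[j, k] > 0}
print(w)                 # {(1, 0): 0.6, (2, 1): 0.4}
print(sum(w.values()))   # weights sum to 1
```

The units who do not change their dose (j = k) get zero weight: the TSLS coefficient averages per-dose effects only over units the encouragement actually moves.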
Partial Identification of the ATE
Balke and Pearl (1997, JASA)
Randomized binary encouragement Zi; binary treatment Ti = Ti(Zi)
Binary outcome Yi = Y*i(Ti, Zi) = Y*i(Ti), where the second equality assumes the exclusion restriction
16 latent types defined by (Y*i(1), Y*i(0), Ti(1), Ti(0)):
  q(y1, y0, t1, t0) ≡ Pr(Y*i(1) = y1, Y*i(0) = y0, Ti(1) = t1, Ti(0) = t0)
ATE:
  E[Y*i(1) − Y*i(0)] = Σ_{y0} Σ_{t1} Σ_{t0} q(1, y0, t1, t0) − Σ_{y1} Σ_{t1} Σ_{t0} q(y1, 1, t1, t0)
Derivation of Sharp Bounds
The data generating mechanism implies
  Pr(Yi = y, Ti = 1 | Zi = 1) = Σ_{y0} Σ_{t0} q(y, y0, 1, t0)
  Pr(Yi = y, Ti = 0 | Zi = 1) = Σ_{y1} Σ_{t0} q(y1, y, 0, t0)
  Pr(Yi = y, Ti = 1 | Zi = 0) = Σ_{y0} Σ_{t1} q(y, y0, t1, 1)
  Pr(Yi = y, Ti = 0 | Zi = 0) = Σ_{y1} Σ_{t1} q(y1, y, t1, 0)
Monotonicity (optional): q(y1, y0, 0, 1) = 0
Obtain sharp bounds via linear programming algorithms
The bounds are sometimes informative
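The linear programming step can be sketched directly with scipy.optimize.linprog: minimize and maximize the ATE (a linear function of q) subject to q reproducing the observed distribution. The observed probabilities below are generated from a made-up latent-type distribution, so the true ATE must fall inside the resulting bounds.

```python
# Sketch: Balke-Pearl sharp bounds on the ATE via linear programming
import itertools
import numpy as np
from scipy.optimize import linprog

types = list(itertools.product([0, 1], repeat=4))  # (y1, y0, t1, t0): 16 types
rng = np.random.default_rng(2)
q_true = rng.dirichlet(np.ones(16))                # hidden "truth" (made up)

def observed(q):
    """P(Y = y, T = t | Z = z) implied by q, as a length-8 vector."""
    rows = []
    for z in (1, 0):
        for t in (1, 0):
            for y in (0, 1):
                s = 0.0
                for (y1, y0, t1, t0), qi in zip(types, q):
                    tz = t1 if z == 1 else t0      # treatment actually taken
                    yz = y1 if tz == 1 else y0     # outcome actually realized
                    if tz == t and yz == y:
                        s += qi
                rows.append(s)
    return np.array(rows)

# Equality constraints: candidate q must reproduce the observed distribution
# (rows within each z sum to 1, so sum(q) = 1 is implied)
A_eq = np.array([observed(np.eye(16)[i]) for i in range(16)]).T
b_eq = observed(q_true)
c = np.array([y1 - y0 for (y1, y0, t1, t0) in types])  # ATE coefficients

lo = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * 16).fun
hi = -linprog(-c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * 16).fun
ate_true = c @ q_true
print(lo, ate_true, hi)  # lower bound <= true ATE <= upper bound
```

Adding the optional monotonicity restriction amounts to fixing the q variables with (t1, t0) = (0, 1) at zero, which can only tighten the bounds.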
Fuzzy Regression Discontinuity Design
Sharp regression discontinuity design: Ti = 1{Xi ≥ c}
What happens if we have noncompliance?
Forcing variable as an instrument: Zi = 1{Xi ≥ c}
Potential outcomes: Ti(z) and Yi(z, t)
Assumptions:
1 Monotonicity: Ti(1) ≥ Ti(0)
2 Exclusion restriction: Yi(0, t) = Yi(1, t)
3 Continuity: E(Ti(z) | Xi = x) and E(Yi(z, Ti(z)) | Xi = x) are continuous in x
Estimand: E(Yi(1, Ti(1)) − Yi(0, Ti(0)) | Complier, Xi = c)
Estimator:
  [lim_{x↓c} E(Yi | Xi = x) − lim_{x↑c} E(Yi | Xi = x)] / [lim_{x↓c} E(Ti | Xi = x) − lim_{x↑c} E(Ti | Xi = x)]
Disadvantage: external validity
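The estimator is a ratio of two discontinuities. A crude sketch (simulated data; one-sided local means in a fixed window stand in for the limits, ignoring the boundary bias that local linear regression would correct):

```python
# Sketch: fuzzy RD as a ratio of jumps in E[Y | X] and E[T | X] at the cutoff
import numpy as np

rng = np.random.default_rng(3)
n, c = 20_000, 0.0
X = rng.uniform(-1, 1, n)                      # forcing variable
Z = (X >= c).astype(int)                       # crossing the threshold
pT = 0.2 + 0.6 * Z                             # compliance jumps 0.2 -> 0.8
T = rng.binomial(1, pT)                        # actual treatment receipt
Y = 1.0 + 2.0 * T + 0.5 * X + rng.normal(0, 0.1, n)  # true effect = 2

h = 0.1                                        # bandwidth around the cutoff
above = (X >= c) & (X < c + h)
below = (X < c) & (X >= c - h)
est = (Y[above].mean() - Y[below].mean()) / (T[above].mean() - T[below].mean())
print(est)  # close to the complier effect of 2
```

In practice one would replace the window means with local linear fits on each side of c and choose h by a data-driven bandwidth selector; this sketch only shows the Wald-at-the-cutoff logic.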
An Example: Class Size Effect (Angrist and Lavy)
Effect of class size on student test scores
Maimonides' Rule: maximum class size = 40, so predicted class size drops discontinuously when enrollment crosses a multiple of 40
Concluding Remarks
Instrumental variables in randomized experiments: dealing with partial compliance
  Additional (untestable) assumptions are required; alternatives include
  1 partial identification
  2 sensitivity analysis
  ITT vs. CATE
Instrumental variables in observational studies: dealing with selection bias
  Validity of instrumental variables requires rigorous justification
  Tradeoff between internal and external validity