Estimation of Average Treatment Effects
Total Page:16
File Type:pdf, Size:1020Kb
Estimation of Average Treatment Effects Honors Thesis Peter Zhang Abstract The estimation of average treatment effects is an important issue in economic evaluations of the impact of policy intervention on job employment and the effect of education and training on income. This honors thesis is concerned with studying different approaches to the estimation of an average treatment effect. Motivated by the works of Khandker, Koolwal, and Samad (2010) and Kreif, Grieve, Radice, and Sekhon (2013), we consider complete-case method, inverse probability weighting and propensity score matching methods using the propensity score, the regression method using two treatment-specific regression models, augmented inverse probability weighting and general- ized method of moments methods using both the propensity score and treatment-specific regression models. These methods yield six different estimates of average treatment effects. Our empirical study is an application of these six estimates to the assessment of the National Supported Work job-training program; our analysis shows a positive impact of this job-training program with higher annual earnings for those participating in the program. We conduct a Monte Carlo simulation study that investigates these six estimates’ performance. Overall, we find that the regression and general- ized method of moments approaches produce least biased and most efficient estimates of an average treatment effect; yet the resulting estimates are most robust against functional form misspecification of the propensity score and treatment-specific regression models. 1. Introduction Causality, i.e., the relationship between cause and effect, is a topic of great significance in eco- nomics and related fields such as medicine, epidemiology, and health science. For example, the impact of policy intervention on job employment, the effect of education and training on income, and the effectiveness of a drug (or whether it is effective at all) on a disease are all questions of causality. One important and commonly used measure of causality is the average treatment effect (ATE) for a binary policy or treatment on a scalar outcome, which is the mean outcome difference between 1 the treatment and control groups. Economic evaluations of policy or health care interventions in many observational studies often require identification and estimation of ATEs, which is challenging because randomized experiments cannot always be implemented. The potential outcome framework, described in Khandker, Koolwal, and Samad (2010), provides a useful way of identifying and evaluating ATEs in observational studies. Under this conceptual framework, the problem of estimating an ATE becomes a missing data problem due to the absence of data on counterfactual outcomes (namely, outcomes from treated units had they not been exposed to the treatment). The presence of missing data in causal inference often leads to potential selection bias and thus complicates the relationship between cause and effect. To account for potential selection bias in treatment (program) participation that might affect evaluation of ATEs, policy makers need to isolate the effect of the policy intervention on outcomes from other observed and unobserved confounding factors affecting the outcomes. This implies that the evaluation of ATEs in observational studies often requires adjustment for differences in baseline covariates because treatment and control groups can be unbalanced on measured or unmeasured covariates. As discussed in Khandker, Koolwal, and Samad (2010), there are a number of different approaches in the literature to handling missing counterfactuals in impact evaluation theory. Rosenbaum and Rubin (1983) introduce the notion of propensity score (PS) defined as the probability of assignment to treatment conditional on covariates. The PS gives rise to two different methods of estimating ATEs, the inverse probability weighting (IPW) method and the propensity score matching (PSM); both of which control the differences between treatment and control groups by adjusting baseline covariates and thus reduce the selection bias. The IPW method weighs each complete outcome by the inverse probability of itself being observed, whereas the PSM method constructs a matching counterfactual or control group that is as similar to the treatment group as possible in terms of observed covariates based on PS. Instead of modeling the probability of assignment on covariates, the regression method estimates the ATE by modeling the outcome on covariates, as discussed in Kreif, Grieve, Radice, and Sekhon (2013). Robins, Rotnitzky, and Zhao (1994) propose a hybrid of the PS and regression methods to estimate ATEs. Their method, called the augmented inverse probability weighting (AIPW) method, is doubly robust (DR) in the sense that it renders consistent estimates of ATEs if either one of the PS or regression models is correctly specified, but not necessarily both. Compared to the PS and regression methods, the AIPW method is less sensitive to the 2 misspecification of the PS or regression model. For other approaches to estimating the ATE in the literature, see, for example, Khandker, Koolwal, and Samad (2010). The generalized method of moments (GMM) method, due to Hansen (1982), is an effective and widely used moments-based approach to parameter estimation in microeconometrics studies. The focus of this thesis is to explore an alternative approach to estimation of ATEs from a GMM perspective by exploiting the PS and regression models. The goal of this thesis is two-fold. The first goal is to apply the GMM-based estimate of ATEs to the program evaluation in econometrics by analyzing a real dataset from a randomized evaluation of a labor market training program, the National Supported Work (NSW) job-training program, originally analyzed by Lalonde (1986) and subsequently by Dehejia and Wahba (1999). It is also of interest to compare the GMM-based estimate of ATEs with other existing ATE estimates described in Khandker, Koolwal, and Samad (2010). Thus, our second goal is to perform a simulation study to examine the merits of the GMM-based and other related methods. This thesis is ordered as follows. Section 2 describes various methodologies of estimating ATEs. The results of the analysis of the NSW job-training data are reported in Section 3. Section 4 presents simulations results comparing various ATE estimates. Finally, the thesis is concluded with discussion in Section 5. 2. Methodology For the ith participant with i = 1,...,n, let Ti represent the treatment assignment taking value 1 if participant i is treated and value 0 if participant i is not treated. Furthermore, let Yi(1) denote the potential outcome under treatment and Yi(0) denote the potential outcome when there is no treatment; then the ATE is given by ATE = E[Yi(1) − Yi(0)]. The potential outcomes Y1(1) and Yi(0) cannot be observed simultaneously for each participant i; instead, we can only observe Yi = TiYi(1)+ (1 − Ti)Yi(0), the observed outcome from participant i. In addition to observe Ti and Yi, we can also observe Xi, a set of other baseline covariates or characteristics of individual i, such as his or her household and local environment. Thus, the observed data is composed of (Ti,Yi,Xi) for i = 1,...,n. Without adjustment for baseline covariates, a complete-case estimator of ATE is 3 simply the mean difference in observed outcomes by treatment status: n n i=1 TiYi i=1(1 − Ti)Yi ATECC = n − n . P i=1 Ti P i=1(1 − Ti) d P P It is widely known that ATECC is biased in observational studies and requires adjustment for differ- ences in baseline covariates.d One important assumption in causal inference is the unconfoundedness or conditional indepen- dence assumption (Rosenbaum and Rubin, 1983), which states that conditional on a set of observable covariates Xi that are not affected by treatment, the potential outcomes Yi(1) and Yi(0) are indepen- dent of treatment assignment Ti; symbolically, the unconfoundedness assumption can be expressed as (Yi(1),Yi(0))⊥ Ti|Xi. Under the unconfoundedness assumption, the propensity score, which is the probability of treatment assignment conditional on covariates, is given by p(Xi)= P (Ti =1|Xi)= P (Ti =1|Yi(1),Yi(0),Xi). As in Kreif, Grieve, Radice, and Sekhon (2013), a commonly adopted parametric model for PS is the logistic regression model given by T eη Xi p(Xi, η)= T , 1+ eη Xi where η is a vector parameter and can be estimated by the maximum likelihood estimateη ˆ based on the observed data (Ti,Xi), i =1,...,n. As pointed by Kreif, Grieve, Radice, and Sekhon (2013), the IPW estimate of ATE is obtained below by reweighting the observed outcomes for treatment and control samples using the inverse of the estimated probability of the observed treatment: n n 1 1 ATE = T w Y − (1 − T )w Y , IPW n i i i n i i i Xi=1 Xi=1 d where wi = 1/p(Xi, ηˆ). The validity of this IPW estimate relies on the correct specification of the PS model p(Xi, η). Kreif, Grieve, Radice, and Sekhon (2013) also consider employing two generalized linear models g1(Xi, β1) and g0(Xi, β0) to estimate ATE as given below: n n 1 1 ATE = g (X , βˆ ) − g (X , βˆ ), REG n 1 i 1 n 0 i 0 Xi=1 Xi=1 d 4 where βˆ1 and βˆ0 are the maximum likelihood estimates or the least squares estimates of the parameter vectors β1 and β0. The validity of this regression estimate depends on the correct specification of the two treatment-specific regression models g1(Xi, β1) and g0(Xi, β0). Combining the models for the PS and for the potential outcomes, Kreif, Grieve, Radice, and Sekhon (2013) consider the following AIPW estimate of ATE: n n 1 1 ATE = T w [Y − g (X , βˆ )] − (1 − T )w [Y − g (X , βˆ )] AIPW n i i i 1 i 1 n i i i 0 i 0 Xi=1 Xi=1 d n n 1 1 + g (X , βˆ ) − g (X , βˆ ).