Organizational Research Methods 00(0) 1-39 ª The Author(s) 2012 Using the Propensity Score Reprints and permission: sagepub.com/journalsPermissions.nav Method to Estimate Causal DOI: 10.1177/1094428112447816 http://orm.sagepub.com Effects: A Review and Practical Guide
Mingxiang Li1
Abstract Evidence-based management requires management scholars to draw causal inferences. Researchers generally rely on observational data sets and regression models where the independent variables have not been exogenously manipulated to estimate causal effects; however, using such models on observational data sets can produce a biased effect size of treatment intervention. This article introduces the propensity score method (PSM)—which has previously been widely employed in social science disciplines such as public health and economics—to the management field. This research reviews the PSM literature, develops a procedure for applying the PSM to estimate the cau- sal effects of intervention, elaborates on the procedure using an empirical example, and discusses the potential application of the PSM in different management fields. The implementation of the PSM in the management field will increase researchers’ ability to draw causal inferences using observational data sets.
Keywords causal effect, propensity score method, matching
Management scholars are interested in drawing causal inferences (Mellor & Mark, 1998). One example of a causal inference that researchers might try to determine is whether a specific manage- ment practice, such as group training or a stock option plan, increases organizational performance. Typically, management scholars rely on observational data sets to estimate causal effects of the management practice. Yet, endogeneity—which occurs when a predictor variable correlates with the error term—prevents scholars from drawing correct inferences (Antonakis, Bendahan, Jacquart, & Lalive, 2010; Wooldridge, 2002). Econometricians have proposed a number of techniques to deal
1Department of Management and Human Resources, University of Wisconsin-Madison, Madison, WI, USA
Corresponding Author: Mingxiang Li, Department of Management and Human Resources, University of Wisconsin-Madison, 975 University Avenue, 5268 Grainger Hall, Madison, WI 53706, USA Email: [email protected]
Downloaded from orm.sagepub.com at Vrije Universiteit 34820 on January 30, 2014 2 Organizational Research Methods 00(0) with endogeneity—including selection models, fixed effects models, and instrumental variables, all of which have been used by management scholars. In this article, I introduce the propensity score method (PSM) as another technique that can be used to calculate causal effects. In management research, many scholars are interested in evidence-based management (Rynes, Giluk, & Brown, 2007), which ‘‘derives principles from research evidence and translates them into practices that solve organizational problems’’ (Rousseau, 2006, p. 256). To contribute to evidence- based management, scholars must be able to draw correct causal inferences. Cox (1992) defined a cause as an intervention that brings about a change in the variable of interest, compared with the baseline control model. A causal effect can be simply defined as the average effect due to a certain intervention or treatment. For example, researchers might be interested in the extent to which train- ing influences future earnings. While field experiment is one approach that can be used to correctly estimate causal effects, in many situations field experiments are impractical. This has prompted scholars to rely on observational data, which makes it difficult for scholars to gauge unbiased causal effects. The PSM is a technique that, if used appropriately, can increase scholars’ ability to draw causal inferences using observational data. Though widely implemented in other social science fields, the PSM has generally been over- looked by management scholars. Since it was introduced by Rosenbaum and Rubin (1983), the PSM has been widely used by economists (Dehejia & Wahba, 1999) and medical scientists (Wolfe & Michaud, 2004) to estimate the causal effects. Recently, financial scholars (Campello, Graham, & Harvey, 2010), sociologists (Gangl, 2006; Grodsky, 2007), and political scientists (Arceneaux, Gerber, & Green, 2006) have implemented the PSM in their empirical studies. A Google Scholar search in early 2012 showed that over 7,300 publications cited Rosenbaum and Rubin’s classic 1983 article that introduced the PSM. An additional Web of Science analysis indicated that over 3,000 academic articles cited this influential article. Of these citations, 20% of the publications were in economics, 14% were in statistics, 10% were in methodological journals, and the remain- ing 56% were in health-related fields. Despite the widespread use of the PSM across a variety of disciplines, it has not been employed by management scholars, prompting Gerhart’s (2007) con- clusion that ‘‘to date, there appear to be no applications of propensity score in the management literature’’ (p. 563). This article begins with an overview of a counterfactual model, experiment, regression, and endo- geneity. This section illustrates why the counterfactual model is important for estimating causal effects and why regression models sometimes cannot successfully reconstruct counterfactuals. This is followed by a short review of the PSM and a discussion of the reasons for using the PSM. The third section employs a detailed example to illustrate how a treatment effect can be estimated using the PSM. The following section presents a short summary on the empirical studies that used the PSM in other social science fields, along with a description of potential implementation of the PSM in the management field. Finally, this article concludes with a discussion of the pros and cons of using the PSM to estimate causal effects.
Estimating Causal Effects Without the Propensity Score Method Evidence-based practices use quantitative methods to find reliable effects that can be implemen- ted by practitioners and administrators to develop and adopt effective policy interventions. Because the application of specific recommendations derived from evidence-based research is not costless, it is crucial for social scientists to draw correct causal inferences. As pointed out by King, Keohane, and Verba (1994), ‘‘we should draw causal inferences where they seem appro- priate but also provide the reader with the best and most honest estimate of the uncertainty of that inference’’ (p. 76).
Downloaded from orm.sagepub.com at Vrije Universiteit 34820 on January 30, 2014 Li 3
Counterfactual Model To better understand causal effect, it is important to discuss counterfactuals. In Rubin’s causal model (see Rubin, 2004, for a summary), Y1i and Y0i are potential earnings for individual i when i receives (Y1i) or does not receive training (Y0iÞ. The fundamental problem of making a causal inference is how to reconstruct the outcomes that are not observed, sometimes called counterfactuals, because they are not what happened. Conceptually, either the treatment or the nontreatment is not observed and hence is ‘‘missing’’ (Morgan & Winship, 2007). Specifically, if i received training at time t,the earnings for i at t þ 1isY1i. But if i also did not receive training at time t, the potential earnings for i at t þ 1isY0i. Then the effect of training can be simply expressed as Y1i Y0i. Yet, because it is impossible for i to simultaneously receive (Y1i) and not receive (Y0iÞ the training, scholars need to find other ways to overcome this fundamental problem. One can also understand this fundamental issue as the ‘‘what-if’’ problem. That is, what if individual i does not receive training? Hence, recon- structing the counterfactuals is crucial to estimate unbiased causal effects. The counterfactual model shows that it is impossible to calculate individual-level treatment effects, and therefore scholars have to calculate aggregated treatment effects (Morgan & Winship, 2007). There are two major versions of aggregated treatment effects: the average treatment effect (ATE) and the average treatment effect on the treated group (ATT). A simple definition of the ATE can be written as
ATE ¼ EYðÞ 1ijTi ¼ 1; 0 EðY0ijTi ¼ 1; 0Þ; ð1:1aÞ where E(.) represents the expectation in the population. Ti denotes the treatment with the value of 1 for the treated group and the value of 0 for the control group. In other words, the ATE can be defined as the average effect that would be observed if everyone in the treated and the control groups received treatment, compared with if no one in both groups received treatment (Harder, Stuart, & Anthony, 2010). The definition of ATT can be expressed as
ATT ¼ EYðÞ 1ijTi ¼ 1 EðY0ijTi ¼ 1Þ: ð1:1bÞ In contrast to the ATE, the ATT refers to the average difference that would be found if everyone in the treated group received treatment compared with if none of these individuals in the treated group received treatment. The value for the ATE will be the same as that for the ATT when the research design is experimental.1
Experiment There are different ways to estimate treatment effects other than PSM. Of these, the experiment is the gold standard (Antonakis et al., 2010). If the participants are randomly assigned to the treated or the control group, then the treatment effect can simply be estimated by comparing the mean differ- ence between these two groups. Experimental data can generate an unbiased estimator for causal effects because the randomized design ensures the equivalent distributions of the treated and the control groups on all observed and unobserved characteristics. Thus, any observed difference on out- come can be caused only by the treatment difference. Because randomized experiments can success- fully reconstruct counterfactuals, the causal effect generated by experiment is unbiased.
Regression In situations when the causal effects of training cannot be studied using an experimental design, scholars want to examine whether receiving training (T) has any effect on future earnings (Y). In this case, scholars generally rely on potentially biased observational data sets to investigate the causal
Downloaded from orm.sagepub.com at Vrije Universiteit 34820 on January 30, 2014 4 Organizational Research Methods 00(0) effect. For example, one can use a simple regression model by regressing future earnings (Y)on training (T) and demographic variables such as age (x1) and race (x2).
Y ¼ b0 þ b1x1 þ b2x2 þ tT þ e: ð1:2Þ Scholars then interpret the results by saying ‘‘ceteris paribus, the effect due to training is t.’’ They typically assume t is the causal effect due to management intervention. Indeed, regression or the structural equation models (SEM) (cf. Duncan, 1975; James, Mulaik, & Brett, 1982) is still a domi- nant approach for estimating treatment effect.2 Yet, regression cannot detect whether the cases are comparable in terms of distribution overlap on observed characteristics. Thus, regression models are unable to reconstruct counterfactuals. One can easily find many empirical studies that seek to esti- mate causal effects by regressing an outcome variable on an intervention dummy variable. The find- ings of these studies, which used observational data sets, could be wrong because they did not adjust for the distribution between the treated and control groups.
Endogeneity In addition to the nonequivalence of distribution between the control and treated groups, another severe error that prevents scholars from calculating unbiased causal effects is endogeneity. This occurs when predictor T correlates with error term e in Equation 1.2. A number of review articles have described the endogeneity problem and warned management scholars of its biasing effects (e.g., Antonakis et al., 2010; Hamilton & Nickerson, 2003). As discussed previously, endogeneity manifests from measurement error, simultaneity, and omitted variables. Measurement error typically attenuates the effect size of regression estimators in explanatory variables. Simultaneity happens when at least one of the predictors is determined simultaneously along with the dependent variable. An example of simultaneity is the estimation of price in a supply and demand model (Greene, 2008). An omitted variable appears when one does not control for additional variables that correlate with explanatory as well as dependent variables. Of these three sources of endogeneity, the omitted variable bias has probably received the most attention from management scholars. Returning to the earlier training example, suppose the researcher only controls for demographic variables but does not control for an individual’s ability. If training correlates with ability and ability correlates with future earnings, the result will be biased because of endogeneity. Consequently, omitting ability will cause a correlation between training dummy T and residuals e. This violates the assumption of strict exogeneity for linear regression models. Thus, the estimated causal effect (tÞ in Equation 1.2 will be biased. If the omitted variable is time-invariant, one can use the fixed effects model to deal with endogeneity (Allison, 2009). Beck, Bru¨derl, and Woywode’s (2008) simulation showed that the fixed effects model provided correction for biased estimation due to the omitted variable. One can also view nonrandom sample selection as a special case of the omitted variable problem. Taking the effect of training on earnings as an example, one can only observe earnings for individ- uals who are employed. Employed individuals could be a nonrandom subset of the population. One can write the nonrandom selection process as Equation 1.3, D ¼ aZ þ u; ð1:3Þ where D is latent selection variable (1 for employed individuals), Z represents a vector of variables (e.g., education level) that predicts selection, and u denotes disturbances. One can call Equation 1.2 the substantive equation and Equation 1.3 the selection equation. Sample selection bias is likely to materialize when there is correlation between the disturbances for substantive (e) and selection equation (u) (Antonakis et al., 2010, p. 1094; Berk, 1983; Heckman, 1979). When there is a correla- tion between e and u, the Heckman selection model, rather than the PSM, should be used to calculate
Downloaded from orm.sagepub.com at Vrije Universiteit 34820 on January 30, 2014 Li 5 causal effect (Antonakis et al., 2010). To correct for the sample selection bias, one can first fit the selection model using probit or logit model. Then the predicted values from the selection model will be saved to compute the density and distribution values, from which the inverse Mills ratio (l)—the ratio for the density value to the distribution value—will be calculated. Finally, the inverse Mills ratio will be included in the substantive Equation 1.2 to correct for the bias of t due to selection. For more information on two-stage selection models, readers can consult Berk (1983).
The Propensity Score Method Having briefly reviewed existing techniques for estimating causal effects, I now discuss how PSM can help scholars to draw correct causal inferences. The PSM is a technique that allows researchers to reconstruct counterfactuals using observational data. It does this by reducing two sources of bias in the observational data: bias due to lack of distribution overlap and bias due to different density weighting (Heckman, Ichimura, Smith, & Todd, 1998). A propensity score can be defined as the probability of study participants receiving a treatment based on observed characteristics. The PSM refers to a special procedure that uses propensity scores and matching algorithm to calculate the cau- sal effect. Before moving on, it is useful to conceptually differentiate PSM from Heckman’s (1979) ‘‘selec- tion model.’’ His selection model deals with the probability of treatment assignment indirectly from instrumental variables. Thus, the probability calculated using the selection model requires one or more variables that are not censored or truncated and that can predict the selection. For example, if one wanted to study how training affects future earnings, one must consider the self-selection problem, because wages can only be observed for individuals who are already employed. Using the predicted probability calculated from the first stage (Equation 1.3), one can compute the inverse Mills ratio and insert this variable to the wage prediction model to correct for selection bias. In con- trast to the predicted probability calculated in the Heckman selection model, propensity scores are calculated directly only through observed predictors. Furthermore, the propensity scores and the pre- dicted probabilities calculated using Heckman selection have different purposes in estimating causal effects: The probabilities estimated from the Heckman model generate an inverse Mills ratio that can be used to adjust for bias due to censoring or truncation, whereas the probabilities calculated in the PSM are used to adjust covariate distribution between the treated group and the control group.
Reasons for Using the PSM Because there are many methods that can estimate causal effects, why should management scholars care about the PSM? One reason is that most publications in the management field rely on observa- tional data. Such large data can be relatively inexpensive to obtain, yet they are almost always obser- vational rather than experimental. By adjusting covariates between the treated and control groups, the PSM allows scholars to reconstruct counterfactuals using observational data. If the strongly ignorable assumption that will be discussed in the next section is satisfied, then the PSM can produce an unbiased causal effect using observational data sets. Second, mis-specified econometric models using observational data sometimes produce biased estimators. One source of such bias is that the two samples lack distribution overlap, and regression analysis cannot tell researchers the distribution overlap between two samples. Cochran (1957, pp. 265-266) illustrated this problem using the following example: ‘‘Suppose that we were adjusting for differences in parents’ income in a comparison of private and public school children, and that the private-school incomes ranged from $10,000–$12,000, while the public-school incomes ranged from $4,000–$6,000. The covariance would adjust results so that they allegedly applied to a mean income of $8,000 in each group, although neither group has any observations in which incomes are
Downloaded from orm.sagepub.com at Vrije Universiteit 34820 on January 30, 2014 6 Organizational Research Methods 00(0) at or near this level.’’ The PSM can easily detect the lack of covariate distribution between two groups and adjust the distribution accordingly. Third, linear or logistic models have been used to adjust for confounding covariates, but such models rely on assumptions regarding functional form. For example, one assumption required for a linear model to produce an unbiased estimator is that it does not suffer from the aforementioned problem of endogeneity. Although the procedure to calculate propensity scores is parametric, using propensity scores to compute causal effect is largely nonparametric. Thus, using the PSM to calcu- late the causal effect is less susceptible to the violation of model assumptions. Overall, when one is interested in investigating the effectiveness of a certain management practice but is unable to collect experimental data, the PSM should be used, at least as a robust test to justify the findings estimated by parametric models.
Overview of the PSM The concept of subclassification is helpful for understanding the PSM. Simply comparing the mean difference of the outcome variables in two groups typically leads to biased estimators, because the distributions of the observational variables in the two groups may differ. Cochran’s (1968) subclas- sification method first divides an observational variable into n subclasses and then estimates the treatment effect by comparing the weighted means of the outcome variable in each subclass. He used two approaches to demonstrate the effectiveness of subclassification in reducing bias in observa- tional studies. First, he used an empirical example (death rate for smoking groups with country of origin and age as covariates) to show that when age was divided into two classes more than half the effect of the age bias was removed. Second, he used a mathematical model to derive the proportion of bias that can be removed through subclassification. For different distribution functions, using five or six subclasses will typically remove 90% or more of the bias shown in the raw comparison. With more than six subclasses, only small amounts of additional bias can be removed. Yet, subclassifica- tion is difficult to utilize if many confounding covariates exist (Rubin, 1997). To overcome the difficulty of estimating the treatment effects using Cochran’s technique, Rosen- baum and Rubin (1983) developed the PSM. The key objective of the PSM is to replace the many confounding covariates in an observational study with one function of these covariates. The function (or the propensity score) captures the likelihood of study participants receiving a treatment based on observed covariates. The estimated propensity score is then used as the only confounding covariate to adjust for all of the covariates that go into the estimation. Since the propensity score adjusts for all covariates using a simple variable and Cochran found that five blocks can remove 90% of bias due to raw comparison, stratifying the propensity score into five blocks can generally remove much of the difference due to the non-overlap of all observed covariates between the treated group and the con- trol group. Central to understanding the PSM is the balancing score. Rosenbaum and Rubin (1983) defined the balancing score as a function of observable covariates such that the conditional distribution of X given the balancing score is the same for the treated group and the control group. Formally, the bal- ancing score bðXÞ satisfies X?TjbðXÞ, where X is a vector of the observed covariates, T represents the treatment assignment, and ? refers to independence. Rosenbaum and Rubin argued that the pro- pensity score is a type of balancing score. They further proved that the finest balancing score is bXðÞ¼X, the coarsest balancing score is the propensity score, and any score that is finer than the propensity score is the balancing score. Rosenbaum and Rubin (1983) also introduced the strongly ignorable assumption, which implies that given the balancing scores, the distributions of the covariates between the treated and the control groups are the same. They further showed that treatment assignment is strongly ignorable if it satis- fies the condition of unconfoundedness and overlap. Unconfoundedness means that conditional on
Downloaded from orm.sagepub.com at Vrije Universiteit 34820 on January 30, 2014 Li 7
observational covariates X, potential outcomes (Y1 and Y0) are not influenced by treatment assign- ment (Y1; Y0?TjX). This assumption simply asserts that the researcher can observe all variables that need to be adjusted. The overlap assumption means that given covariates X, the person with the same X values has positive and equal opportunity of being assigned to the treated group or the control group ð0