Propensity Score Matching Regression Discontinuity Limited Dependent Variables
Total Page:16
File Type:pdf, Size:1020Kb
Propensity Score Matching Regression Discontinuity Limited Dependent Variables Christopher F Baum EC 823: Applied Econometrics Boston College, Spring 2013 Christopher F Baum (BC / DIW) PSM, RD, LDV Boston College, Spring 2013 1 / 99 Propensity score matching Propensity score matching Policy evaluation seeks to determine the effectiveness of a particular intervention. In economic policy analysis, we rarely can work with experimental data generated by purely random assignment of subjects to the treatment and control groups. Random assignment, analogous to the ’randomized clinical trial’ in medicine, seeks to ensure that participation in the intervention, or treatment, is the only differentiating factor between treatment and control units. In non-experimental economic data, we observe whether subjects were treated or not, but in the absence of random assignment, must be concerned with differences between the treated and non-treated. For instance, do those individuals with higher aptitude self-select into a job training program? If so, they are not similar to corresponding individuals along that dimension, even though they may be similar in other aspects. Christopher F Baum (BC / DIW) PSM, RD, LDV Boston College, Spring 2013 2 / 99 Propensity score matching The key concern is that of similarity. How can we find individuals who are similar on all observable characteristics in order to match treated and non-treated individuals (or plants, or firms...) With a single measure, we can readily compute a measure of distance between a treated unit and each candidate match. With multiple measures defining similarity, how are we to balance similarity along each of those dimensions? The method of propensity score matching (PSM) allows this matching problem to be reduced to a single dimension: that of the propensity score. That score is defined as the probability that a unit in the full sample receives the treatment, given a set of observed variables. If all information relevant to participation and outcomes is observable to the researcher, the propensity score will produce valid matches for estimating the impact of an intervention. Thus, rather than matching on all values of the variables, individual units can be compared on the basis of their propensity scores alone. Christopher F Baum (BC / DIW) PSM, RD, LDV Boston College, Spring 2013 3 / 99 Propensity score matching An important attribute of PSM methods is that they do not require the functional form to be correctly specified. If we used OLS methods such as y = Xβ + Dγ + where y is the outcome, X are covariates and D is the treatment indicator, we would be assuming that the effects of treatment are constant across individuals. We need not make this assumption to employ PSM. As we will see, a crucial assumption is made on the contents of X, which should include all variables that can influence the probability of treatment. Christopher F Baum (BC / DIW) PSM, RD, LDV Boston College, Spring 2013 4 / 99 Propensity score matching Why use matching methods? Why use matching methods? The greatest challenge in evaluating a policy intervention is obtaining a credible estimate of the counterfactual: what would have happened to participants (treated units) had they not participated? Without a credible answer, we cannot rule out that whatever successes have occurred among participants could have happened anyway. This relates to the fundamental problem of causal inference: it is impossible to observe the outcomes of the same unit in both treatment conditions at the same time. The impact of a treatment on individual i, δi , is the difference between potential outcomes with and without treatment: δi = Y1i − Y0i where states 0 and 1 corrrespond to non-treatment and treatment, respectively. Christopher F Baum (BC / DIW) PSM, RD, LDV Boston College, Spring 2013 5 / 99 Propensity score matching Why use matching methods? To evaluate the impact of a program over the population, we may compute the average treatment effect (ATE): ATE = E[δi ] = E(Y1 − Y0) Most often, we want to compute the average treatment effect on the treated (ATT): ATT = E(Y1 − Y0jD = 1) where D = 1 refers to the treatment. Christopher F Baum (BC / DIW) PSM, RD, LDV Boston College, Spring 2013 6 / 99 Propensity score matching Why use matching methods? The problem is that not all of these parameters are observable, as they rely on counterfactual outcomes. For instance, we can rewrite ATT as ATT = E(Y1jD = 1) − E(Y0jD = 1) The second term is the average outcome of treated individuals had they not received the treatment. We cannot observe that, but we do observe a corresponding quantity for the untreated, and can compute ∆ = E(Y1jD = 1) − E(Y0jD = 0) The difference between ATT and ∆ can be defined as ∆ = ATT + SB where SB is the selection bias term: the difference between the counterfactual for treated units and observed outcomes for untreated units. Christopher F Baum (BC / DIW) PSM, RD, LDV Boston College, Spring 2013 7 / 99 Propensity score matching Why use matching methods? For the computable quantity ∆ to be useful, the SB term must be zero. But selection bias in a non-experimental context is often sizable. For instance, those who voluntarily sign up for a teacher-training program may be the more motivated teachers, who might be more likely to do well (in terms of student test scores) even in the absence of treatment. In other cases, the bias may not arise due to individuals self-selecting into treatment, but being selected for treatment on the basis of an interview or evaluation of their willingness to cooperate with the program. This gives rise to administrative selection bias or program placement bias. Even in the case of a randomized experiment, participants selected for treatment may choose not to be treated, or may not comply with all aspects of the treatment regime. In this sense, even a randomized trial may involve bias in evaluating the effects of treatment, and nonexperimental methods may be required to adjust for that bias. Christopher F Baum (BC / DIW) PSM, RD, LDV Boston College, Spring 2013 8 / 99 Propensity score matching Requirements for PSM validity Requirements for PSM validity Two key assumptions underly the use of matching methods, and PSM in particular: 1 Conditional independence: there exists a set X of observable covariates such that after controlling for these covariates, the potential outcomes are independent of treatment status: (Y1; Y0) ? DjX 2 Common support: for each value fo X, there is a positive probability of being both treated and untreated: 0 < P(D = 1jX) < 1 Christopher F Baum (BC / DIW) PSM, RD, LDV Boston College, Spring 2013 9 / 99 Propensity score matching Requirements for PSM validity The conditional independence assumption (Y1; Y0) ? DjX implies that after controlling for X, the assignment of units to treatment is ‘as good as random.’ This assumption is also known as selection on observables, and it requires that all variables relevant to the probability of receiving treatment may be observed and included in X. This allows the untreated units to be used to construct an unbiased counterfactual for the treatment group. Christopher F Baum (BC / DIW) PSM, RD, LDV Boston College, Spring 2013 10 / 99 Propensity score matching Requirements for PSM validity The common support assumption 0 < P(D = 1jX) < 1 implies that the probability of receiving treatment for each possible value of the vector X is strictly within the unit interval: as is the probability of not receiving treatment. This assumption of common support ensures that there is sufficient overlap in the characteristics of treated and untreated units to find adequate matches. When these assumptions are satisfied, the treatment assignment is said to be strongly ignorable in the terminology of Rosenbaum and Rubin (Biometrika, 1983). Christopher F Baum (BC / DIW) PSM, RD, LDV Boston College, Spring 2013 11 / 99 Propensity score matching Basic mechanics of matching Basic mechanics of matching The procedure for estimating the impact of a program can be divided into three steps: 1 Estimate the propensity score 2 Choose a matching algorithm that will use the estimated propensity scores to match untreated units to treated units 3 Estimate the impact of the intervention with the matched sample and calculate standard errors Christopher F Baum (BC / DIW) PSM, RD, LDV Boston College, Spring 2013 12 / 99 Propensity score matching Basic mechanics of matching To estimate the propensity score, a logit or probit model is usually employed. It is essential that a flexible functional form be used to allow for possible nonlinearities in the participation model. This may involve the introduction of higher-order terms in the covariates as well as interaction terms. There will usually be no comprehensive list of the clearly relevant variables that would assure that the matched comparison group will provide an unbiased estimate of program impact. Obviously explicit criteria that govern project or program eligibility should be included, as well as factors thought to influence self-selection and administrative selection. Christopher F Baum (BC / DIW) PSM, RD, LDV Boston College, Spring 2013 13 / 99 Propensity score matching Basic mechanics of matching In choosing a matching algorithm, you must consider whether matching is to be performed with or without replacement. Without replacement, a given untreated unit can only be matched with one treated unit. A criterion for assessing the quality of the match must also be defined. The number of untreated units to be matched with each treated unit must also be chosen. Early matching estimators paired each treated unit with one unit from the control group, judged most similar. Researchers have found that estimators are more stable if a number of comparison cases are considered for each treated case, usually implying that the matching will be done with replacement. Christopher F Baum (BC / DIW) PSM, RD, LDV Boston College, Spring 2013 14 / 99 Propensity score matching Basic mechanics of matching The matching criterion could be as simple as the absolute difference in the propensity score for treated vs.