
Journal of Economic Literature 2009, 47:1, 5–86. http://www.aeaweb.org/articles.php?doi=10.1257/jel.47.1.5

Recent Developments in the Econometrics of Program Evaluation

Guido W. Imbens and Jeffrey M. Wooldridge*

Many empirical questions in economics and other social sciences depend on causal effects of programs or policies. In the last two decades, much research has been done on the econometric and statistical analysis of such causal effects. This recent theoretical literature has built on, and combined features of, earlier work in both the statistics and econometrics literatures. It has by now reached a level of maturity that makes it an important tool in many areas of empirical research in economics, including labor economics, public finance, development economics, industrial organization, and other areas of empirical microeconomics. In this review, we discuss some of the recent developments. We focus primarily on practical issues for empirical researchers, as well as provide a historical overview of the area and give references to more technical research.

* Imbens: Harvard University and NBER. Wooldridge: Michigan State University. Financial support for this research was generously provided through NSF grants SES 0136789, 0452590 and 08. We are grateful for comments by Esther Duflo, Caroline Hoxby, Roger Gordon, Jonathan Beauchamp, Larry Katz, Eduardo Morales, and two anonymous referees.

1. Introduction

Many empirical questions in economics and other social sciences depend on causal effects of programs or policies. In the last two decades, much research has been done on the econometric and statistical analysis of such causal effects. This recent theoretical literature has built on, and combined features of, earlier work in both the statistics and econometrics literatures. It has by now reached a level of maturity that makes it an important tool in many areas of empirical research in economics and suitable for a review. In this article, we attempt to present such a review. We will focus on practical issues for empirical researchers, as well as provide an historical overview of the area and give references to more technical research. This review complements and extends other reviews and discussions, including those by Richard Blundell and Monica Costa Dias (2002), Guido W. Imbens (2004), and Joshua D. Angrist and Alan B. Krueger (1999), and the books by Paul R. Rosenbaum (1995), Judea Pearl (2000), Myoung-Jae Lee (2005a), Donald B. Rubin (2006), Marco Caliendo (2006), Angrist and Jörn-Steffen Pischke (2009), Howard S. Bloom (2005), Stephen L. Morgan and Christopher Winship (2007), Jeffrey M. Wooldridge (2002), and Imbens and Rubin (forthcoming). In addition, the reviews in James J. Heckman, Robert J. LaLonde, and Jeffrey A. Smith (1999), Heckman and Edward Vytlacil (2007a, 2007b), and Jaap H. Abbring and Heckman (2007) provide an excellent overview of the important theoretical work by Heckman and his coauthors in this area.

The central problem studied in this literature is that of evaluating the effect of the exposure of a set of units to a program, or treatment, on some outcome. In economic studies, the units are typically economic agents such as individuals, households, markets, firms, counties, states, or countries but, in other disciplines where evaluation methods are used, the units can be animals, plots of land, or physical objects. The treatments can be job search assistance programs, educational programs, vouchers, laws or regulations, medical drugs, environmental exposure, or technologies. A critical feature is that, in principle, each unit can be exposed to multiple levels of the treatment. Moreover, this literature is focused on settings with observations on units exposed, and not exposed, to the treatment, with the evaluation based on comparisons of units exposed and not exposed.1 For example, an individual may enroll or not in a training program, or he or she may receive or not receive a voucher, or be subject to a particular regulation or not. The object of interest is a comparison of the two outcomes for the same unit when exposed, and when not exposed, to the treatment. The problem is that we can at most observe one of these outcomes because the unit can be exposed to only one level of the treatment. Paul W. Holland (1986) refers to this as the fundamental problem of causal inference. In order to evaluate the effect of the treatment, we therefore always need to compare distinct units receiving the different levels of the treatment. Such a comparison can involve different physical units or the same physical unit at different times.

Footnote 1: As opposed to studies where the causal effect of fundamentally new programs is predicted through direct identification of preferences and production functions.

The problem of evaluating the effect of a binary treatment or program is a well-studied problem with a long history in both econometrics and statistics. This is true both in the theoretical literature as well as in the more applied literature. The econometric literature goes back to early work by Orley Ashenfelter (1978) and subsequent work by Ashenfelter and David Card (1985), Heckman and Richard Robb (1985), LaLonde (1986), Thomas Fraker and Rebecca Maynard (1987), Card and Daniel G. Sullivan (1988), and Charles F. Manski (1990). Motivated primarily by applications to the evaluation of labor market programs in observational settings, the focus in the econometric literature is traditionally on endogeneity, or self-selection, issues. Individuals who choose to enroll in a training program are by definition different from those who choose not to enroll. These differences, if they influence the response, may invalidate causal comparisons of outcomes by treatment status, possibly even after adjusting for observed covariates. Consequently, many of the initial theoretical studies focused on the use of traditional econometric methods for dealing with endogeneity, such as fixed effect methods from panel data analyses, and instrumental variables methods. Subsequently, the econometrics literature has combined insights from the semiparametric literature to develop new estimators for a variety of settings, requiring fewer functional form and homogeneity assumptions.

The statistics literature starts from a different perspective. This literature originates in the analysis of randomized experiments by Ronald A. Fisher (1935) and Jerzy Splawa-Neyman (1990). From the early 1970s, Rubin (1973a, 1973b, 1974, 1977, 1978), in a series of papers, formulated the now dominant approach to the analysis of causal effects in observational studies. Rubin proposed the interpretation of causal statements as comparisons of so-called potential outcomes: pairs of outcomes defined for the same unit given different levels of exposure to the treatment, with the researcher only observing the potential outcome corresponding to the level of the treatment received. Models are developed for the pair of potential outcomes rather than solely for the observed outcome. Rubin's formulation of the evaluation problem, or the problem of causal inference, labeled the Rubin Causal Model (RCM) by Holland (1986), is by now standard in both the statistics and econometrics literature. One of the attractions of the potential outcomes setup is that from the outset it allows for general heterogeneity in the effects of the treatment. Such heterogeneity is important in practice, and it is important theoretically as it is often the motivation for the endogeneity problems that concern economists. One additional advantage of the potential outcomes setup is that the parameters of interest can be defined, and the assumptions stated, without reference to particular statistical models.

Of particular importance in Rubin's approach is the relationship between treatment assignment and the potential outcomes. The simplest case for analysis is when assignment to treatment is randomized and, thus, independent of covariates as well as the potential outcomes. In such classical randomized experiments, it is straightforward to obtain estimators for the average effect of the treatment with attractive properties under repeated sampling, e.g., the difference in means by treatment status. Randomized experiments have been used in some areas in economics. In the 1970s, negative income tax experiments received widespread attention. In the late 1980s, following an influential paper by LaLonde (1986) that concluded econometric methods were unable to replicate experimental results, more emphasis was put on experimental evaluations of labor market programs, although more recently this emphasis seems to have weakened a bit. In the last couple of years, some of the most interesting experiments have been conducted in development economics (e.g., Edward Miguel and Michael Kremer 2004; Esther Duflo 2001; Angrist, Eric Bettinger, and Kremer 2006; Abhijit V. Banerjee et al. 2007) and behavioral economics (e.g., Marianne Bertrand and Sendhil Mullainathan 2004). Nevertheless, experimental evaluations remain relatively rare in economics. More common is the case where economists analyze data from observational studies. Observational data generally create challenges in estimating causal effects but, in one important special case, variously referred to as unconfoundedness, exogeneity, ignorability, or selection on observables, questions regarding identification and estimation of the policy effects are fairly well understood. All these labels refer to some form of the assumption that adjusting treatment and control groups for differences in observed covariates, or pretreatment variables, removes all biases in comparisons between treated and control units. This case is of great practical relevance, with many studies relying on some form of this assumption. The semiparametric efficiency bound has been calculated for this case (Jinyong Hahn 1998) and various semiparametric estimators have been proposed (Hahn 1998; Heckman, Hidehiko Ichimura, and Petra E. Todd 1998; Keisuke Hirano, Imbens, and Geert Ridder 2003; Xiaohong Chen, Han Hong, and Alessandro Tarozzi 2008; Imbens, Whitney K. Newey, and Ridder 2005; Alberto Abadie and Imbens 2006). We discuss the current state of this literature, and the practical recommendations coming out of it, in detail in this review.

Without unconfoundedness, there is no general approach to estimating treatment effects. Various methods have been proposed for special cases and, in this review, we will discuss several of them. One approach (Rosenbaum and Rubin 1983b; Rosenbaum 1995) consists of sensitivity analyses, where the robustness of estimates to specific limited departures from unconfoundedness is investigated. A second approach, developed by Manski (1990, 2003, 2007), consists of bounds analyses, where ranges of estimands consistent with the data and the limited assumptions the researcher is willing to make are derived and estimated. A third approach, instrumental variables, relies on the presence of additional treatments, the so-called instruments, that satisfy specific exogeneity and exclusion restrictions. The formulation of this method in the context of the potential outcomes framework is presented in Imbens and Angrist (1994) and Angrist, Imbens, and Rubin (1996). A fourth approach applies to settings where, in its pure form, overlap is completely absent because the assignment is a deterministic function of covariates, but comparisons can be made exploiting continuity of average outcomes as a function of covariates. This setting, known as the regression discontinuity design, has a long tradition in statistics (see William R. Shadish, Thomas D. Cook, and Donald T. Campbell 2002 and Cook 2008 for historical perspectives), but has recently been revived in the economics literature through work by Wilbert van der Klaauw (2002), Hahn, Todd, and van der Klaauw (2001), David S. Lee (2001), and Jack R. Porter (2003). Finally, a fifth approach, referred to as difference-in-differences, relies on the presence of additional data in the form of samples of treated and control units before and after the treatment. An early application is Ashenfelter and Card (1985). Recent theoretical work includes Abadie (2005), Bertrand, Duflo, and Mullainathan (2004), Stephen G. Donald and Kevin Lang (2007), and Susan Athey and Imbens (2006).

In this review, we will discuss in detail some of the new methods that have been developed in this literature. We will pay particular attention to the practical issues raised by the implementation of these methods. At this stage, the literature has matured to the extent that it has much to offer the empirical researcher. Although the evaluation problem is one where identification problems are important, there is currently a much better understanding of which assumptions are most useful, as well as a better set of methods for inference given different sets of assumptions.

Most of this review will be limited to settings with binary treatments. This is in keeping with the literature, which has largely focused on the binary treatment case. There are some extensions of these methods to multivalued, and even continuous, treatments (e.g., Imbens 2000; Michael Lechner 2001; Lechner and Ruth Miquel 2005; Richard D. Gill and James M. Robins 2001; Hirano and Imbens 2004), and some of these extensions will be discussed in the current review. But the work in this area is ongoing, and much remains to be done here.

The running example we will use throughout the paper is that of a job market training program. Such programs have been among the leading applications in the economics literature, starting with Ashenfelter (1978) and including LaLonde (1986) as a particularly influential study. In such settings, a number of individuals do, or do not, enroll in a training program, with labor market outcomes, such as yearly earnings or employment status, as the main outcome of interest. An individual not participating in the program may have chosen not to do so, or may have been ineligible for various reasons. Understanding the choices made, and constraints faced, by the potential participants is a crucial component of any analysis. In addition to observing participation status and outcome measures, we typically observe individual background characteristics, such as education levels and age, as well as information regarding prior labor market histories, such as earnings at various levels of aggregation (e.g., yearly, quarterly, or monthly). In addition, we may observe some of the constraints faced by the individuals, including measures used to determine eligibility, as well as measures of general labor market conditions in the local labor markets faced by potential participants.

2. The Rubin Causal Model: Potential Outcomes, the Assignment Mechanism, and Interactions

In this section, we describe the essential elements of the modern approach to program evaluation, based on the work by Rubin. Suppose we wish to analyze a job training program using observations on N individuals, indexed by i = 1, …, N. Some of these individuals were enrolled in the training program. Others were not enrolled, either because they were ineligible or chose not to enroll. We use the indicator Wi to indicate whether individual i enrolled in the training program, with Wi = 0 if individual i did not, and Wi = 1 if individual i did, enroll in the program. We use W to denote the N-vector with i-th element equal to Wi, and N0 and N1 to denote the number of control and treated units, respectively. For each unit, we also observe a K-dimensional column vector of covariates or pretreatment variables, Xi, with X denoting the N × K matrix with i-th row equal to X′i.

2.1 Potential Outcomes

The first element of the RCM is the notion of potential outcomes. For individual i, for i = 1, …, N, we postulate the existence of two potential outcomes, denoted by Yi(0) and Yi(1). The first, Yi(0), denotes the outcome that would be realized by individual i if he or she did not participate in the program. Similarly, Yi(1) denotes the outcome that would be realized by individual i if he or she did participate in the program. Individual i can either participate or not participate in the program, but not both, and thus only one of these two potential outcomes can be realized. Prior to the assignment being determined, both are potentially observable, hence the label potential outcomes. If individual i participates in the program, Yi(1) will be realized and Yi(0) will ex post be a counterfactual outcome. If, on the other hand, individual i does not participate in the program, Yi(0) will be realized and Yi(1) will be the ex post counterfactual. We will denote the realized outcome by Yi, with Y the N-vector with i-th element equal to Yi. The preceding discussion implies that

Yi = Yi(Wi) = Yi(0) · (1 − Wi) + Yi(1) · Wi = { Yi(0) if Wi = 0; Yi(1) if Wi = 1 }.

The potential outcomes are tied to the specific manipulation that would have made one of them the realized outcome. The more precise the specification of the manipulation, the more well-defined the potential outcomes are.

This distinction between the pair of potential outcomes (Yi(0), Yi(1)) and the realized outcome Yi is the hallmark of modern statistical and econometric analyses of treatment effects. We offer some comments on it. The potential outcomes framework has important precursors in a variety of other settings. Most directly, in the context of randomized experiments, the potential outcome framework was introduced by Splawa-Neyman (1990) to derive the properties of estimators and confidence intervals under repeated sampling.

The potential outcomes framework also has important antecedents in econometrics. Specifically, it is interesting to compare the distinction between the potential outcomes Yi(0) and Yi(1) and the realized outcome Yi in Rubin's approach to Trygve Haavelmo's (1943) work on simultaneous equations models (SEMs). Haavelmo discusses identification of supply and demand models. He makes a distinction between "any imaginable price π" as the argument in the demand and supply functions, qd(π) and qs(π), and the "actual price p," which is the observed equilibrium price satisfying qs(p) = qd(p). The supply and demand functions play the same role as the potential outcomes in Rubin's approach, with the equilibrium price similar to the realized outcome. Curiously, Haavelmo's notational distinction between equilibrium and potential prices has gotten blurred in many textbook discussions of simultaneous equations. In such discussions, the starting point is often the general formulation YΓ + XB = U, for N × M vectors of realized outcomes Y, N × L matrices of exogenous covariates X, and an N × M matrix of unobserved components U. A nontrivial byproduct of the potential outcomes approach is that it forces users of SEMs to articulate what the potential outcomes are, thereby leading to better applications of SEMs. A related point is made in Pearl (2000).

Another area where potential outcomes are used explicitly is in the econometric analyses of production functions. Similar to the potential outcomes framework, a production function g(x, ε) describes production levels that would be achieved for each value of a vector of inputs, some observed (x) and some unobserved (ε). Observed inputs may be chosen partly as a function of (expected) values of unobserved inputs. Only for the level of inputs actually chosen do we observe the level of the output. Potential outcomes are also used explicitly in labor market settings by A. D. Roy (1951). Roy models individuals choosing from a set of occupations. Individuals know what their earnings would be in each of these occupations and choose the occupation (treatment) that maximizes their earnings. Here we see the explicit use of the potential outcomes, combined with a specific selection/assignment mechanism, namely, choosing the treatment with the highest potential outcome.

The potential outcomes framework has a number of advantages over a framework based directly on realized outcomes. The first advantage of the potential outcomes framework is that it allows us to define causal effects before specifying the assignment mechanism, and without making functional form or distributional assumptions. The most common definition of the causal effect at the unit level is as the difference Yi(1) − Yi(0), but we may wish to look at ratios Yi(1)/Yi(0), or other functions. Such definitions do not require us to take a stand on whether the effect is constant or varies across the population. Further, defining individual-specific treatment effects using potential outcomes does not require us to assume endogeneity or exogeneity of the assignment mechanism. By contrast, the causal effects are more difficult to define in terms of the realized outcomes. Often, researchers write down a regression function Yi = α + τ · Wi + εi. This regression function is then interpreted as a structural equation, with τ as the causal effect. Left unclear is whether the causal effect is constant or not, and what the properties of the unobserved component, εi, are. The potential outcomes approach separates these issues, and allows the researcher to first define the causal effect of interest without considering probabilistic properties of the outcomes or assignment.

The second advantage of the potential outcomes approach is that it links the analysis of causal effects to explicit manipulations. Considering the two potential outcomes forces the researcher to think about scenarios under which each outcome could be observed, that is, to consider the kinds of experiments that could reveal the causal effects. Doing so clarifies the interpretation of causal effects. For illustration, consider a couple of recent examples from the economics literature. First, consider the causal effects of gender or ethnicity on outcomes of job applications. Simple comparisons of economic outcomes by ethnicity are difficult to interpret. Are they the result of discrimination by employers, or are they the result of differences between applicants, possibly arising from discrimination at an earlier stage of life? Now, one can obtain unambiguous causal interpretations by linking comparisons to specific manipulations. A recent example is the study by Bertrand and Mullainathan (2004), who compare callback rates for job applications submitted with names that suggest African-American or Caucasian ethnicity. Their study has a clear manipulation (a name change) and therefore a clear causal effect. As a second example, consider some recent economic studies that have focused on causal effects of individual characteristics such as beauty (e.g., Daniel S. Hamermesh and Jeff E. Biddle 1994) or height. Do the differences in earnings by ratings on a beauty scale represent causal effects? One possible interpretation is that they represent causal effects of plastic surgery. Such a manipulation would make differences causal, but it appears unclear whether cross-sectional correlations between beauty and earnings in a survey from the general population represent causal effects of plastic surgery.

A third advantage of the potential outcomes approach is that it separates the modeling of the potential outcomes from that of the assignment mechanism. Modeling the realized outcome is complicated by the fact that it combines the potential outcomes and the assignment mechanism. The researcher may have very different sources of information to bear on each. For example, in the labor market program example, we can consider the outcome, say earnings, in the absence of the program: Yi(0). We can model this in terms of individual characteristics and labor market histories. Similarly, we can model the outcome given enrollment in the program, again conditional on individual characteristics and labor market histories. Then, finally, we can model the probability of enrolling in the program given the earnings in both treatment arms, conditional on individual characteristics. This sequential modeling will lead to a model for the realized outcome, but it may be easier than directly specifying a model for the realized outcome.

A fourth advantage of the potential outcomes approach is that it allows us to formulate probabilistic assumptions in terms of potentially observable variables, rather than in terms of unobserved components. In this approach, many of the critical assumptions will be formulated as (conditional) independence assumptions involving the potential outcomes. Assessing their validity requires the researcher to consider the dependence structure if all potential outcomes were observed. By contrast, models in terms of realized outcomes often formulate the critical assumptions in terms of errors in regression functions. To be specific, consider again the regression function Yi = α + τ · Wi + εi. Typically, (conditional independence) assumptions are made on the relationship between εi and Wi. Such assumptions implicitly bundle a number of assumptions, including functional-form assumptions and substantive exogeneity assumptions. This bundling makes the plausibility of these assumptions more difficult to assess.

A fifth advantage of the potential outcomes approach is that it clarifies where the uncertainty in the estimators comes from. Even if we observe the entire (finite) population, as is increasingly common with the growing availability of administrative data sets, so that we can estimate population averages with no uncertainty, causal effects will be uncertain because for each unit at most one of the two potential outcomes is observed. One may still use superpopulation arguments to justify approximations to the finite sample distributions, but such arguments are not required to motivate the existence of uncertainty about the causal effect.

2.2 The Assignment Mechanism to simple mean differences by assignment. Such analyses are valid, but often they are not The second ingredient of the RCM is the the most powerful tools available to exploit assignment mechanism. This is defined as the the randomization. We discuss the analysis conditional probability of receiving the treat- of randomized experiments, including more ment, as a function of potential outcomes and powerful randomization-based methods for observed covariates. We distinguish three inference, in section 4. classes of assignment mechanisms, in order of The second class of assignment mecha- increasing complexity of the required analysis. nisms maintains the restriction that the The first class of assignment mechanisms is assignment probabilities do not depend on that of randomized experiments. In random- the potential outcomes, or ized experiments, the probability of assign- ment to treatment does not vary with potential W AY (0), Y (1)B X , i ǁ i i | i outcomes, and is a known function of covari- ates. The leading case is that of a completely where A B C denotes conditional indepen- ǁ | randomized experiment where, in a popula- dence of A a nd B given C. However, in contrast tion of N units, N N randomly chosen units to randomized experiments, the assignment 1 < are assigned to the treatment and the remain- probabilities are no longer assumed to be a ing N N N units are in the control group. known function of the covariates. The pre- 0 = − 1 There are important variations on this exam- cise form of this critical assumption, not tied ple, such as pairwise randomization, where to functional form or distributional assump- initially units are matched in pairs, and in a tions, was first presented in Rosenbaum second stage one unit in each pair is randomly and Rubin (1983b). Following Rubin (1990) assigned to the treatment. 
Another variant is a we refer to this assignment mechanism as general stratified experiment, where random- unconfounded assignment. Somewhat con- ization takes place within a finite number of fusingly, this assumption, or variations on it, strata. In any case, there are in practice few are in the literature also referred to by vari- experiments in economics, and most of those ous other labels. These include selection on are of the completely randomized experiment observables,2 exogeneity,3 and conditional variety, so we shall limit our discussion to this type of experiment. It should be noted though that if one has the opportunity to design a 2 Although Heckman, Ichimura, and Todd (1997, page 611) write that “In the language of Heckman and Robb randomized experiment, and if pretreatment (1985), matching assumes that selection is on observables” variables are available, stratified experiments (their italics), the original definition in Heckman and Robb are at least as good as completely randomized (1985, page 163) is not equivalent to unconfoundedness. In the context of a single cross-section version of their two experiments, and typically better, in terms of equation selection model, Y W and W = Xi′β + i α + εi i expected mean squared error, even in finite 1{Z 0}, they define selection bias to refer to = i′ γ + νi > the case where E[ W ] 0, and selection-on-observables samples. See Imbens et al. (2008) for more εi i ≠ to the case where selection bias is present and caused by details. The use of formal randomization correlation between and Z , rather than by correlation εi i has become more widespread in the social between and . εi νi 3 Although X is not exogenous for E[Y (1) Y (0)], sciences in recent years, sometimes as a for- i i − i according to the definitions in Robert F. Engle, David mal design for an evaluation and sometimes F. 
Hendry and Jean-Francois Richard (1983), because as an acceptable way of allocating scarce ­knowledge of its marginal distribution contains infor- mation about E[Y (1) Y (0)], standard usage of the resources. The analysis of such experiments i − i term “exogenous” does appear to capture the notion of is often straightforward. In practice, however, unconfoundedness, e.g., Manski et al. (1992), and Imbens researchers have typically limited themselves (2004). Imbens and Wooldridge: Econometrics of Program Evaluation 13 independence4. Although the analysis of data outcomes for another unit. Only the level of with such assignment mechanisms is not as the treatment applied to the specific individ- straightforward as that of randomized exper- ual is assumed to potentially affect outcomes iments, there are now many practical meth- for that particular individual. In the statistics ods available for this case. We review them literature, this assumption is referred to as in section 5. the Stable-Unit-Treatment-Value-Assumption The third class of assignment mechanisms (Rubin 1978). In this paper, we mainly focus contains all remaining assignment mecha- on settings where this assumption is main- nisms with some dependence on potential tained. In the current section, we discuss outcomes.5 Many of these create substantive some of the literature motivated by concerns problems for the analysis, for which there is about this assumption. no general solution. There are a number of This lack-of-interaction assumption is very special cases that are by now relatively well plausible in many biomedical applications. understood, and we discuss these in section 6. Whether one individual receives or does The most prominent of these cases are instru- not receive a new treatment for a stroke or mental variables, regression discontinuity, and not is unlikely to have a substantial impact differences-in-differences. In addition, we on health outcomes for any other individual. 
In addition, we discuss two general methods that also relax the unconfoundedness assumption but do not replace it with additional assumptions. The first relaxes the unconfoundedness assumption in a limited way and investigates the sensitivity of the estimates to such violations. The second drops the unconfoundedness assumption entirely and establishes bounds on estimands of interest. The latter is associated with the work by Manski (1990, 1995, 2007).

5 This includes some mechanisms where the dependence on potential outcomes does not create any problems in the analyses. Most prominent in this category are sequential assignment mechanisms. For example, one could randomly assign the first ten units to the treatment or control group with probability 1/2. From then on one could skew the assignment probability toward the treatment with the most favorable outcomes so far. For example, if the active treatment looks better than the control treatment based on the first N units, then the (N+1)th unit is assigned to the active treatment with probability 0.8, and vice versa. Such assignment mechanisms are not very common in economics settings, and we ignore them in this discussion.

2.3 Interactions and General Equilibrium Effects

In most of the literature, it is assumed that treatments received by one unit do not affect outcomes for another unit. Only the level of the treatment applied to the specific individual is assumed to potentially affect outcomes for that particular individual. In the statistics literature, this assumption is referred to as the Stable-Unit-Treatment-Value-Assumption (Rubin 1978). In this paper, we mainly focus on settings where this assumption is maintained. In the current section, we discuss some of the literature motivated by concerns about this assumption.

This lack-of-interaction assumption is very plausible in many biomedical applications. Whether one individual receives a new treatment for a stroke or not is unlikely to have a substantial impact on health outcomes for any other individual. However, there are also many cases in which such interactions are a major concern and the assumption is not plausible. Even in the early experimental literature, with applications to the effect of various fertilizers on crop yields, researchers were cognizant of potential problems with this assumption. In order to minimize leaking of fertilizer applied to one plot into an adjacent plot, experimenters used guard rows to physically separate the plots that were assigned different fertilizers. A different concern arises in epidemiological applications when the focus is on treatments such as vaccines for contagious diseases. In that case, it is clear that the vaccination of one unit can affect the outcomes of others in their proximity, and such effects are a large part of the focus of the evaluation.

In economic applications, interactions between individuals are also a serious concern. It is clear that a labor market program that affects the labor market outcomes for one individual potentially has an effect on the labor market outcomes for others. In a world with a fixed number of jobs, a training program could only redistribute the jobs, and ignoring this constraint on the number of jobs by using a partial, instead of a general, equilibrium analysis could lead one to erroneously conclude that extending the program to the entire population would raise aggregate employment. Such concerns have rarely been addressed in the recent program evaluation literature. Exceptions include Heckman, Lance Lochner, and Christopher Taber (1999), who provide some simulation evidence for the potential biases that may result from ignoring these issues. In practice these general equilibrium effects may, or may not, be a serious problem. The indirect effect on one individual of exposure to the treatment of a few other units is likely to be much smaller than the direct effect of the exposure of the first unit itself. Hence, with most labor market programs both small in scope and with limited effects on the individual outcomes, it appears unlikely that general equilibrium effects are substantial, and they can probably be ignored for most purposes.

One general solution to these problems is to redefine the unit of interest. If the interactions between individuals are at an intermediate level, say a local labor market, or a classroom, rather than global, one can analyze the data using the local labor market or classroom as the unit, changing the no-interaction assumption to require the absence of interactions among local labor markets or classrooms. Such aggregation is likely to make the no-interaction assumption more plausible, albeit at the expense of reduced precision.

An alternative solution is to directly model the interactions. This involves specifying which individuals interact with each other, and possibly the relative magnitudes of these interactions. In some cases it may be plausible to assume that interactions are limited to individuals within well-defined, possibly overlapping groups, with the intensity of the interactions equal within this group. This would be the case in a world with a fixed number of jobs in a local labor market. Alternatively, it may be that interactions occur in broader groups but decline in importance depending on some distance metric, either geographical distance or proximity in some economic metric.

The most interesting literature in this area views the interactions not as a nuisance but as the primary object of interest. This literature, which includes models of social interactions and peer effects, has been growing rapidly in the last decade, following the early work by Manski (1993). See Manski (2000a) and William Brock and Steven N. Durlauf (2000) for recent surveys. Empirical work includes Jeffrey R. Kling, Jeffrey B. Liebman, and Katz (2007), who look at the effect of households moving to neighborhoods with higher average socioeconomic status; Bruce I. Sacerdote (2001), who studies the effect of college roommate behavior on a student's grades; Edward L. Glaeser, Sacerdote, and Jose A. Scheinkman (1996), who study social interactions in criminal behavior; Anne C. Case and Lawrence F. Katz (1991), who look at neighborhood effects on disadvantaged youths; Bryan S. Graham (2008), who infers interactions from the effect of class size on the variation in grades; and Angrist and Lang (2004), who study the effect of desegregation programs on students' grades. Many identification and inferential questions remain unanswered in this literature.
3. What Are We Interested In? Estimands and Hypotheses

In this section, we discuss some of the questions that researchers have asked in this literature. A key feature of the current literature, and one that makes it more important to be precise about the questions of interest, is the accommodation of general heterogeneity in treatment effects. In contrast, in many early studies it was assumed that the effect of a treatment was constant, implying that the effect of various policies could be captured by a single parameter. The essentially unlimited heterogeneity in the effects of the treatment allowed for in the current literature implies that it is generally not possible to capture the effects of all policies of interest in terms of a few summary statistics. In practice researchers have reported estimates of the effects of a few focal policies. In this section we describe some of these estimands. Most of these estimands are average treatment effects, either for the entire population or for some subpopulation, although some correspond to other features of the joint distribution of potential outcomes.

Most of the empirical literature has focused on estimation. Much less attention has been devoted to testing hypotheses regarding the properties or presence of treatment effects. Here we discuss null and alternative hypotheses that may be of interest in settings with heterogeneous effects. Finally, we discuss some of the recent literature on decision-theoretic approaches to program evaluation that ties estimands more closely to optimal policies.

3.1 Average Treatment Effects

The econometric literature has largely focused on average effects of the treatment. The two most prominent average effects are defined over an underlying population. In cases where the entire population can be sampled, population treatment effects rely on the notion of a superpopulation, where the current population that is available is viewed as just one of many possibilities. In either case, the sample of size N is viewed as a random sample from a large (super-)population, and interest is in the average effect in the superpopulation.6 The most popular treatment effect is the Population Average Treatment Effect (PATE), the population expectation of the unit-level causal effect, Yi(1) − Yi(0):

τPATE = E[Yi(1) − Yi(0)].

If the policy under consideration would expose all units to the treatment or none at all, this is the most relevant quantity. Another popular estimand is the Population Average Treatment effect on the Treated (PATT), the average over the subpopulation of treated units:

τPATT = E[Yi(1) − Yi(0) | Wi = 1].

In many observational studies, τPATT is a more interesting estimand than the overall average effect. As an example, consider the case where a well defined population was exposed to a treatment, say a job training program. There may be various possibilities for a comparison group, including subjects drawn from public use data sets. In that case, it is generally not interesting to consider the effect of the program for the comparison group: for many members of the comparison group (e.g., individuals with stable, high-wage jobs) it is difficult and uninteresting to imagine their being enrolled in the labor market program. (Of course, the problem of averaging across units that are unlikely to receive future treatments can be mitigated by more carefully constructing the comparison group to be more like the treatment group, making τPATE a more meaningful parameter. See the discussion below.)

6 For simplicity, we restrict ourselves to random sampling. Some data sets are obtained by stratified sampling. Most of the estimators we consider can be adjusted for stratified sampling. See, for example, Wooldridge (1999, 2007) on inverse probability weighting of averages and objective functions.
A second case where τPATT is the estimand of most interest is the setting of a voluntary program where those not enrolled will never be required to participate in the program. A specific example is the effect of serving in the military, where an interesting question concerns the foregone earnings for those who served (Angrist 1998).

In practice, there is typically little motivation presented for the focus on the overall average effect or the average effect for the treated. Take a job training program. The overall average effect would be the parameter of interest if the policy under consideration is a mandatory exposure to the treatment versus complete elimination. It is rare that these are the alternatives, with exemptions more typically granted to various subpopulations. Similarly, the average effect for the treated would be informative about the effect of entirely eliminating the current program. More plausible regime changes would correspond to a modest extension of the program to other jurisdictions, or a contraction to a more narrow population.

A somewhat subtle issue is that we may wish to separate the extrapolation from the sample to the superpopulation from the problem of inference for the sample at hand. This suggests that, rather than focusing on PATE or PATT, we might first focus on the average causal effect conditional on the covariates in the sample,

τCATE = (1/N) Σ_{i=1}^{N} E[Yi(1) − Yi(0) | Xi],

and, similarly, the average over the subsample of treated units:

τCATT = (1/N1) Σ_{i: Wi=1} E[Yi(1) − Yi(0) | Xi],

where N1 is the number of treated units. If the effect of the treatment or intervention is constant (Yi(1) − Yi(0) = τ for some constant τ), all four estimands, τPATE, τPATT, τCATE, and τCATT, are obviously identical. However, if there is heterogeneity in the effect of the treatment, the estimands may all be different.
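To make the distinction between these estimands concrete, the following sketch (our hypothetical example, with a made-up data generating process, not a calculation from the paper) simulates a large population in which the unit-level effect increases with Xi and higher-Xi units are more likely to be treated, so that the average effect for the treated exceeds the population average effect:

```python
# Hypothetical illustration: with heterogeneous effects and selection into
# treatment related to Xi, the average effect on the treated differs from
# the population average effect.
import numpy as np

rng = np.random.default_rng(1)
N = 1_000_000                          # a large simulated "population"
x = rng.normal(size=N)
tau = 1.0 + x                          # unit-level effect Yi(1) - Yi(0)
y0 = x + rng.normal(size=N)
y1 = y0 + tau
p = 1.0 / (1.0 + np.exp(-x))           # P(Wi = 1 | Xi): high-x units enroll more
w = rng.random(N) < p

pate = (y1 - y0).mean()                # approximates E[Yi(1) - Yi(0)]
patt = (y1 - y0)[w].mean()             # approximates E[Yi(1) - Yi(0) | Wi = 1]
print(pate, patt)                      # patt exceeds pate here
```

With a constant effect (tau not depending on Xi) the two averages would coincide, as noted in the text; here the positive selection on Xi pushes the effect for the treated above the population average.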
The difference between τPATE and τCATE (and between τPATT and τCATT) is relatively subtle. Most estimators that are attractive for the population treatment effect are also attractive for the corresponding conditional average treatment effect, and vice versa. Therefore, we do not have to be particularly concerned with the distinction between the two estimands at the estimation stage. However, there is an important difference between the population and conditional estimands at the inference stage. If there is heterogeneity in the effect of the treatment, we can estimate the sample average treatment effect τCATE more precisely than the population average treatment effect τPATE. When one estimates the variance of an estimator τ̂, which can serve as an estimate for τPATE or τCATE, one therefore needs to be explicit about whether one is interested in the variance relative to the population or to the conditional average treatment effect. We will return to this issue in section 5.

A more general class of estimands includes average causal effects for subpopulations and weighted average causal effects. Let 𝔸 be a subset of the covariate space 𝕏, and let τCATE,𝔸 denote the conditional average causal effect for the subpopulation with Xi ∈ 𝔸:

τCATE,𝔸 = (1/N𝔸) Σ_{i: Xi∈𝔸} E[Yi(1) − Yi(0) | Xi],

where N𝔸 is the number of units with Xi ∈ 𝔸. Richard K. Crump et al. (2009) argue for considering such estimands. Their argument is not based on the intrinsic interest of these subpopulations. Rather, they show that such estimands may be much easier to estimate than τCATE (or τCATT). Instead of solely reporting an imprecisely estimated average effect for the overall population, they suggest it may be informative to also report a precise estimate for the average effect of some subpopulation. They then propose a particular set 𝔸 for which the average effect is most easily estimable. See section 5.10.2 for more details. The Crump et al. estimates would not necessarily have as much external validity as estimates for the overall population, but they may be much more informative for the sample at hand. In any case, in many instances the larger policy questions concern extensions of the interventions or treatments to other populations, so that external validity may be elusive irrespective of the estimand.

In settings with selection on unobservables, the enumeration of the estimands of interest becomes more complicated. A leading case is instrumental variables. In the presence of heterogeneity in the effect of the treatment, one can typically not identify the average effect of the treatment even in the presence of valid instruments. There are two new approaches in the recent literature. One is to focus on bounds for well-defined estimands, such as the average effect τPATE or τCATE. Manski (1990, 2003) developed this approach in a series of papers. An alternative is to focus on estimands that can be identified under weaker conditions than those required for the average treatment effect. Imbens and Angrist (1994) show that one can, under much weaker conditions than required for identification of τPATE, identify the average effect for the subpopulation of units whose treatment status is affected by the instrument. They refer to this subpopulation as the compliers. This does not directly fit into the classification above since the subpopulation is not defined solely in terms of covariates. We discuss this estimand in more detail in section 6.3.

3.2 Quantile and Distributional Treatment Effects and Other Estimands

An alternative class of estimands consists of quantile treatment effects. These have only recently been studied and applied in the economics literature, although they were introduced in the statistics literature in the 1970s. Kjell Doksum (1974) and Erich L. Lehmann (1974) define

(1) τq = F_{Y(1)}^{-1}(q) − F_{Y(0)}^{-1}(q),

as the q-th quantile treatment effect. There are some important issues in interpreting these quantile treatment effects. First, note that these quantile effects are defined as differences between quantiles of the two marginal potential outcome distributions, and not as quantiles of the unit-level effect,

(2) τ̃q = F_{Y(1)−Y(0)}^{-1}(q).

In general, the quantile of the difference, τ̃q, differs from the difference in the quantiles, τq, unless there is perfect rank correlation between the potential outcomes Yi(0) and Yi(1) (the leading case of this is the constant additive treatment effect). The quantile treatment effects, τq, have received much more attention, and in our view rightly so, than the quantiles of the treatment effect, τ̃q. There are two issues regarding the choice between a focus on the difference in quantiles versus quantiles of the difference. The first issue is substantive. Suppose a policy maker is faced with the choice of assigning all members of a subpopulation, homogenous in covariates Xi, to the treatment group, or assigning all of them to the control group. The resulting outcome distribution is either f_{Y(0)}(y) or f_{Y(1)}(y), assuming the subpopulation is large. Hence the choice should be governed by the preferences of the policymaker over these distributions (which can often be summarized by differences in the quantiles), and not depend on aspects of the joint distribution f_{Y(0),Y(1)}(y, z) that do not affect the two marginal distributions. (See Heckman and Smith 1997 for a somewhat different view.) The second issue is statistical. In general the τ̃q are not (point-)identified without assumptions on the rank correlation between the potential outcomes, even with data from a randomized experiment. In a randomized experiment, one can identify f_{Y(0)}(y) and f_{Y(1)}(y) (and any functional thereof) but not the joint distribution f_{Y(0),Y(1)}(y, z). Note that this issue does not arise if we look at average effects, because the mean of the difference is equal to the difference of the means: E[Yi(1) − Yi(0)] = E[Yi(1)] − E[Yi(0)].
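The distinction between equations (1) and (2) can be illustrated numerically. In the sketch below (our hypothetical example, not from the paper), Yi(0) and Yi(1) are independent normals, so the marginals pin down τq while the quantile of the unit-level difference is much larger; with perfect rank correlation the two notions coincide:

```python
# Hypothetical illustration of tau_q (difference in marginal quantiles)
# versus tau_tilde_q (quantile of the unit-level difference).
import numpy as np

rng = np.random.default_rng(2)
N = 1_000_000
y0 = rng.normal(size=N)
y1 = rng.normal(size=N) + 1.0          # same shape, shifted; ranks reshuffled
q = 0.9

tau_q = np.quantile(y1, q) - np.quantile(y0, q)   # close to 1.0
tau_tilde_q = np.quantile(y1 - y0, q)             # close to 1 + 1.28 * sqrt(2)
print(tau_q, tau_tilde_q)

# with perfect rank correlation (constant additive effect) they coincide
y1_rc = y0 + 1.0
print(np.quantile(y1_rc, q) - np.quantile(y0, q),
      np.quantile(y1_rc - y0, q))
```

In a randomized experiment only the two marginal distributions, and hence τq, are identified; τ̃q depends on the joint distribution, which the data do not reveal.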
A complication facing researchers interested in quantile treatment effects is that the difference in marginal quantiles, τq, is in general not equal to the average difference in the conditional quantiles, where the latter are defined as

τq(x) = F_{Y(1)|X}^{-1}(q | x) − F_{Y(0)|X}^{-1}(q | x).

In other words, even if we succeed in estimating τq(x), we cannot simply average τq(Xi) across i to consistently estimate τq. Marianne Bitler, Jonah Gelbach, and Hilary Hoynes (2006) estimate quantile treatment effects in a randomized evaluation of a job training program. Sergio Firpo (2007) develops methods for estimating τq in observational studies given unconfoundedness. Abadie, Angrist, and Imbens (2002) and Victor Chernozhukov and Christian B. Hansen (2005) study quantile treatment effects in instrumental variables settings.

3.3 Testing

The literature on hypothesis testing in program evaluation is relatively limited. Most of the testing in applied work has focused on the null hypothesis that the average effect of interest is zero. Because many of the commonly used estimators for average treatment effects are asymptotically normally distributed with zero asymptotic bias, it follows that standard confidence intervals (the point estimate plus or minus a constant times the standard error) can be used for testing such hypotheses. However, there are other interesting hypotheses to consider.

One question of interest is whether there is any effect of the program, that is, whether the distribution of Yi(1) differs from that of Yi(0). This is equivalent to the hypothesis that not just the mean, but all moments, are identical in the two treatment groups. Abadie (2002) studies such tests in settings with randomized experiments as well as settings with instrumental variables, using Kolmogorov-Smirnov type testing procedures.

A second set of questions concerns treatment effect heterogeneity. Even if the average effect is zero, it may be important to establish whether a targeted implementation of the intervention, with only those who can expect to benefit from the intervention assigned to it, could improve average outcomes. In addition, in cases where there is not sufficient information to obtain precise inferences for the average causal effect τPATE, it may still be possible to establish whether there are any subpopulations with an average effect positive or different from zero, or whether there are subpopulations with an average effect exceeding some threshold. It may also be interesting to test whether there is any evidence of heterogeneity in the treatment effect by observable characteristics. This bears heavily on the question of whether the estimands are useful for extrapolation to other populations, which may differ in terms of some observable characteristics. Crump et al. (2008) study these questions in settings with unconfounded treatment assignment.
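A test of the distributional null discussed above can be based on the two-sample Kolmogorov-Smirnov statistic with a randomization p-value. The sketch below is a generic version of this idea on simulated data, not Abadie's (2002) exact procedure; the two arms share the same mean but differ in spread, an alternative against which a mean-based test has essentially no power:

```python
# Sketch of a Kolmogorov-Smirnov-type test of the null that the outcome
# distributions in the two treatment arms coincide, with a permutation
# p-value as justified by randomized assignment. Simulated data.
import numpy as np

def ks_stat(y, w):
    grid = np.sort(y)
    y1, y0 = np.sort(y[w == 1]), np.sort(y[w == 0])
    F1 = np.searchsorted(y1, grid, side="right") / y1.size  # treated ecdf
    F0 = np.searchsorted(y0, grid, side="right") / y0.size  # control ecdf
    return np.abs(F1 - F0).max()

def permutation_pvalue(y, w, draws=500, seed=0):
    rng = np.random.default_rng(seed)
    t_obs = ks_stat(y, w)
    t = np.array([ks_stat(y, rng.permutation(w)) for _ in range(draws)])
    return (1 + (t >= t_obs).sum()) / (1 + draws)

rng = np.random.default_rng(3)
w = np.repeat([0, 1], 500)
y = rng.normal(size=1000) * np.where(w == 1, 3.0, 1.0)  # equal means, unequal spread
p = permutation_pvalue(y, w)
print(p)   # small: the distributions differ even though the means agree
```

Because the KS statistic compares entire empirical distribution functions, it detects differences in any feature of the distributions, not only in their means.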
3.4 Decision-Theoretic Questions

Recently, a small but innovative literature has started to move away from the focus on summary statistics of the distribution of treatment effects or potential outcomes to directly address policies of interest. This is very much a literature in progress. Manski (2000b, 2001, 2002, 2004), Rajeev H. Dehejia (2005b), and Hirano and Porter (2008) study the problem faced by program administrators who can assign individuals to the active treatment or to the control group. These administrators have available two pieces of information: first, covariate information for these individuals, and second, information about the efficacy of the treatment based on a finite sample of other individuals for whom both outcome and covariate information is available. The administrator may care about the entire distribution of outcomes, or solely about average outcomes, and may also take into account costs associated with participation. If the administrator knew exactly the conditional distribution of the potential outcomes given the covariate information, this would be a simple problem: the administrator would simply compare the expected welfare for different rules and choose the one with the highest value. However, the administrator does not have this knowledge and needs to make a decision given uncertainty about these distributions. In these settings, it is clearly important that the statistical model allows for heterogeneity in the treatment effects.

Graham, Imbens, and Ridder (2006) extend the type of problems studied in this literature by incorporating resource constraints. They focus on problems that include as a special case the problem of allocating a fixed number of slots in a program to a set of individuals on the basis of observable characteristics of these individuals, given a random sample of individuals for whom outcome and covariate information is available.

4. Randomized Experiments

Experimental evaluations have traditionally been rare in economics. In many cases ethical considerations, as well as the reluctance of administrators to deny services to randomly selected individuals after they have been deemed eligible, have made it difficult to get approval for, and implement, randomized evaluations. Nevertheless, the few experiments that have been conducted, including some of the labor market training programs, have generally been influential, sometimes extremely so. More recently, many exciting and thought-provoking experiments have been conducted in development economics, raising new issues of design and analysis (see Duflo, Rachel Glennerster, and Kremer 2008 for a review).

With experimental data the statistical analysis is generally straightforward. Differencing average outcomes by treatment status or, equivalently, regressing the outcome on an intercept and an indicator for the treatment, leads to an unbiased estimator for the average effect of the treatment. Adding covariates to the regression function typically improves precision without jeopardizing consistency, because the randomization implies that in large samples the treatment indicator and the covariates are independent. In practice, researchers have rarely gone beyond basic regression methods. In principle, however, there are additional methods that can be useful in these settings. In section 4.2, we review one important experimental technique, randomization-based inference, including Fisher's method for calculating exact p-values, that deserves wider usage in the social sciences. See Rosenbaum (1995) for a textbook discussion.
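The two points above, that the difference in means equals the coefficient on the treatment indicator in a regression on a constant and that indicator, and that adding a prognostic covariate tightens the standard error, can be verified directly. The sketch below uses simulated data and a minimal hand-rolled OLS routine (a hypothetical example of ours, not a calculation from the paper):

```python
# Simulated randomized experiment: the difference in means equals the OLS
# coefficient on the treatment dummy; adding a covariate that predicts the
# outcome leaves the estimand unchanged but shrinks the standard error.
import numpy as np

rng = np.random.default_rng(4)
N = 1000
w = np.zeros(N)
w[rng.choice(N, N // 2, replace=False)] = 1.0   # completely randomized design
x = rng.normal(size=N)                          # pretreatment covariate
y = 1.0 * w + 2.0 * x + rng.normal(size=N)      # true average effect is 1

def ols(y, regressors):
    X = np.column_stack([np.ones(len(y))] + regressors)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, se

diff = y[w == 1].mean() - y[w == 0].mean()
b_short, se_short = ols(y, [w])      # coefficient on w equals diff exactly
b_long, se_long = ols(y, [w, x])     # same target, smaller standard error
print(diff, b_short[1], se_short[1])
print(b_long[1], se_long[1])
```

The covariate soaks up residual variance, so the standard error on the treatment coefficient falls, while randomization guarantees the coefficient still estimates the average treatment effect.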
4.1 Randomized Experiments in Economics

Randomized experiments have a long tradition in biostatistics. In this literature they are often viewed as the only credible approach to establishing causality. For example, the United States Food and Drug Administration typically requires evidence from randomized experiments in order to approve new drugs and medical procedures. A first comment concerns the fact that even randomized experiments rely to some extent on substantive knowledge. It is only once the researcher is willing to limit interactions between units that randomization can establish causal effects. In settings with potentially unrestricted interactions between units, randomization by itself cannot solve the identification problems required for establishing causality. In biomedical settings, where such interaction effects are often arguably absent, randomized experiments are therefore particularly attractive. Moreover, in biomedical settings it is often possible to keep the units ignorant of their treatment status, further enhancing the interpretation of the estimated effects as causal effects of the treatment, and thus improving the external validity.
In the economics literature, randomization has played a much less prominent role. At various times social experiments have been conducted, but they have rarely been viewed as the sole method for establishing causality, and in fact they have sometimes been regarded with some suspicion concerning the relevance of the results for policy purposes (e.g., Heckman and Smith 1995; see Gary Burtless 1995 for a more positive view of experiments in social sciences). Part of this may be due to the fact that for the treatments of interest to economists, e.g., education and labor market programs, it is generally impossible to do blind or double-blind experiments, creating the possibility of placebo effects that compromise the internal validity of the estimates. Nevertheless, this suspicion often downplays the fact that many of the concerns that have been raised in the context of randomized experiments, including those related to missing data and external validity, are often equally present in observational studies.

Among the early social experiments in economics were the negative income tax experiments in Seattle and Denver in the early 1970s, formally referred to as the Seattle and Denver Income Maintenance Experiments (SIME and DIME). In the 1980s, a number of papers called into question the reliability of econometric and statistical methods for estimating causal effects in observational studies. In particular, LaLonde (1986) and Fraker and Maynard (1987), using data from the National Supported Work (NSW) programs, suggested that widely used econometric methods were unable to replicate the results from experimental evaluations. These influential conclusions encouraged government agencies to insist on the inclusion of experimental evaluation components in job training programs. Examples of such programs include the Greater Avenues to INdependence (GAIN) programs (e.g., James Riccio and Daniel Friedlander 1992), the WIN programs (e.g., Judith M. Gueron and Edward Pauly 1991; Friedlander and Gueron 1992; Friedlander and Philip K. Robins 1995), the Self Sufficiency Project in Canada (Card and Dean R. Hyslop 2005, and Card and Robins 1996), and the Statistical Assistance for Programme Selection in Switzerland (Stefanie Behncke, Markus Frölich, and Lechner 2006). Like the NSW evaluation, these experiments have been useful not merely in establishing the effects of particular programs but also in providing fertile testing grounds for new statistical evaluation methods.

Recently there have been a large number of exciting and innovative experiments, mainly in development economics but also in other areas, including public finance (Duflo and Emmanuel Saez 2003; Duflo et al. 2006; Raj Chetty, Adam Looney, and Kory Kroft forthcoming). The experiments in development economics include many educational experiments (e.g., T. Paul Schultz 2001; Orazio Attanasio, Costas Meghir, and Ana Santiago 2005; Duflo and Rema Hanna 2005; Banerjee et al. 2007; Duflo 2001; Miguel and Kremer 2004). Others study topics as wide-ranging as corruption (Benjamin A. Olken 2007; Claudio Ferraz and Frederico Finan 2008) or gender issues in politics (Raghabendra Chattopadhyay and Duflo 2004). In a number of these experiments, economists have been involved from the beginning in the design of the evaluations, leading to closer connections between the substantive economic questions and the design of the experiments, thus improving the ability of these studies to lead to conclusive answers to interesting questions. These experiments have also led to renewed interest in questions of optimal design. Some of these issues are discussed in Duflo, Glennerster, and Kremer (2008), Miriam Bruhn and David McKenzie (2008), and Imbens et al. (2008).
4.2 Randomization-Based Inference and Fisher's Exact P-Values

Fisher (1935) was interested in calculating p-values for hypotheses regarding the effect of treatments. The aim is to provide exact inferences for a finite population of size N. This finite population may be a random sample from a large superpopulation, but that is not exploited in the analysis. The inference is nonparametric in that it does not make functional form assumptions regarding the effects; it is exact in that it does not rely on large sample approximations. In other words, the p-values coming out of this analysis are exact and valid irrespective of the sample size.

The most common null hypothesis in Fisher's framework is that of no effect of the treatment for any unit in this population, against the alternative that, at least for some units, there is a non-zero effect:

H0 : Yi(0) = Yi(1), ∀ i = 1, …, N,

against Ha : ∃ i such that Yi(0) ≠ Yi(1).

It is not important that the null hypothesis is that the effects are all zero. What is essential is that the null hypothesis is sharp, that is, the null hypothesis specifies the value of all unobserved potential outcomes for each unit. A more general null hypothesis could be that Yi(0) = Yi(1) + c for some prespecified c, or that Yi(0) = Yi(1) + ci for some set of prespecified ci. Importantly, this framework cannot accommodate null hypotheses such as the hypothesis that the average effect of the treatment is zero, against the alternative of a non-zero average effect, or

H0′ : (1/N) Σ_{i} (Yi(1) − Yi(0)) = 0,

against Ha′ : (1/N) Σ_{i} (Yi(1) − Yi(0)) ≠ 0.

Whether the null of no effect for any unit versus the null of no effect on average is more interesting was the subject of a testy exchange between Fisher (who focused on the first) and Neyman (who thought the latter was the interesting hypothesis, and who stated that the first was only of academic interest) in Splawa-Neyman (1990). Putting the argument about its ultimate relevance aside, Fisher's test is a powerful tool for establishing whether a treatment has any effect. It is not essential in this framework that the probabilities of assignment to the treatment group are equal for all units. It is crucial, however, that the probability of any particular assignment vector is known. These probabilities may differ by unit provided the probabilities are known.

The implication of Fisher's framework is that, under the null hypothesis, we know the exact value of all the missing potential outcomes. Thus there are no nuisance parameters under the null hypothesis. As a result, we can deduce the distribution of any statistic, that is, any function of the realized values of (Yi, Wi), i = 1, …, N, generated by the randomization. For example, suppose the statistic is the average difference between treated and control outcomes, T(W, Y) = Ȳ1 − Ȳ0, where Ȳw = Σ_{i: Wi=w} Yi / Nw, for w = 0, 1. Now suppose we had assigned a different set of units to the treatment. Denote the vector of alternative treatment assignments by W̃. Under the null hypothesis, we know all the potential outcomes and thus we can deduce what the value of the statistic would have been under that alternative assignment, namely T(W̃, Y). We can infer the value of the statistic for all possible values of the assignment vector W, and since we know the distribution of W, we can deduce the distribution of T(W, Y).
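The procedure just described is straightforward to implement. The sketch below is our own illustration on simulated data, not the calculations behind table 1; it approximates the randomization distribution by redrawing the assignment vector many times, and computes p-values for both the difference in means and the difference in average ranks:

```python
# Fisher randomization inference under the sharp null Yi(0) = Yi(1): all
# potential outcomes are then known, so the distribution of any statistic
# can be simulated by redrawing the assignment vector. Simulated data.
import numpy as np

def randomization_pvalue(y, w, stat, draws=2000, seed=0):
    rng = np.random.default_rng(seed)
    t_obs = abs(stat(y, w))
    # under the sharp null, outcomes are unchanged by reassignment
    t = np.array([abs(stat(y, rng.permutation(w))) for _ in range(draws)])
    return (1 + (t >= t_obs).sum()) / (1 + draws)

def mean_diff(y, w):
    return y[w == 1].mean() - y[w == 0].mean()

def rank_diff(y, w):
    r = y.argsort().argsort() + 1.0   # ranks; ties would be averaged in practice
    return mean_diff(r, w)

rng = np.random.default_rng(5)
w = np.repeat([0, 1], 50)
y = 1.0 * w + rng.standard_t(df=3, size=100)   # heavy-tailed outcomes

p_mean = randomization_pvalue(y, w, mean_diff)
p_rank = randomization_pvalue(y, w, rank_diff)
print(p_mean, p_rank)
```

With fifty units per arm the randomization distribution is sampled rather than enumerated; the rank statistic is the less outlier-sensitive choice when outcomes, such as earnings, are heavy-tailed.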

1 of the treatment assignment is referred to as H0 : ​ __ ​ ​ ​ ​Yi A(1) Yi(0)B 0, the randomization distribution. The p-value ′ N ∑i − = of the statistic is then calculated as the prob- 1 against Ha : ​ __ ​ ​ ​ ​Yi A(1) Yi(0)B 0. ability of a value for the statistic that is at ′ N ∑i − ≠ 22 Journal of Economic Literature, Vol. XLVII (March 2009) least as large, in absolute value, as that of the this point, we took data from eight random- observed statistic, T(W, Y). ized evaluations of labor market programs. In moderately large samples, it is typi- Four of the programs are from the WIN cally not feasible to calculate the exact demonstration programs. The four evalua- p-values for these tests. In that case, one tions took place in Arkansas, Baltimore, San can approximate the p-value by basing it on Diego, and Virginia. See Gueron and Pauly a large number of draws from the random- (1991), Friedlander and Gueron (1992), David ization distribution. Here the approximation Greenberg and Michael Wiseman (1992), error is of a very different nature than that and Friedlander and Robins (1995) for more in typical large sample approximations: it is detailed discussions of each of these evalu- controlled by the researcher, and if more ations. The second set of four programs is precision is desired one can simply increase from the GAIN programs in California. The the number of draws from the randomiza- four locations are Alameda, Los Angeles, tion distribution. Riverside, and San Diego. See Riccio and In the form described above, with the Friedlander (1992), Riccio, Friedlander, and ­statistic equal to the difference in averages Freedman (1994), and Dehejia (2003) for by treatment status, the results are typically more details on these programs and their not that different from those using Wald evaluations. 
In each location, we take as the tests based on large sample normal approxi- outcome total earnings for the first (GAIN) mations to the sampling distribution to the or second (WIN) year following the program, difference in means ​Y__ ​ ​Y__ ​, as long as the and we focus on the subsample of individuals 1 − 0 sample size is moderately large. The Fisher who had positive earnings at some point prior approach to calculating p-values is much to the program. We calculate three p-values more interesting with other choices for for each location. The first p-value is based the statistic. For example, as advocated by on the normal approximation to the t-statis- Rosenbaum in a series of papers (Rosenbaum tic calculated as the difference in average 1984a, 1995), a generally attractive choice is outcomes for treated and control individu- the difference in average ranks by treatment als divided by the estimated standard error. status. First the outcome is converted into The second p-value is based on randomiza- ranks (typically with, in case of ties, all pos- tion inference using the difference in aver- sible rank orderings averaged), and then the age outcomes by treatment status. And the test is applied using the average difference third p-value is based on the randomization in ranks by treatment status as the statistic. distribution using the difference in average The test is still exact, with its exact distri- ranks by treatment status as the statistic. The bution under the null hypothesis known as results are in table 1. the Wilcoxon distribution. Naturally, the test In all eight cases, the p-values based on based on ranks is less sensitive to ­outliers the t-test are very similar to those based than the test based on the difference in on randomization inference. This outcome means. 
is not surprising given the reasonably large If the focus is on establishing whether the sample sizes, ranging from 71 (Arkansas, treatment has some effect on the outcomes, WIN) to 4,779 (San Diego, GAIN). However, rather than on estimating the average size in a number of cases, the p-value for the of the effect, such rank tests are much more rank test is fairly different from that based likely to provide informative conclusions on the level difference. In both sets of four than standard Wald tests based differences locations there is one location where the in averages by treatment status. To illustrate rank test suggests a clear rejection at the Imbens and Wooldridge: Econometrics of Program Evaluation 23
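The randomization p-values described above can be approximated by re-drawing assignment vectors and recomputing the statistic. The following is a minimal illustrative sketch (not the code used for table 1; the function names are our own), computing both the levels-based and the ranks-based p-value, with ties assigned the average rank:

```python
import random

def randomization_pvalues(y, w, draws=2000, seed=0):
    """Approximate Fisher exact p-values by Monte Carlo.

    Under the sharp null Yi(0) = Yi(1) for all i, the outcome vector is
    fixed under any assignment, so we re-draw the assignment vector
    (holding the number of treated units fixed) and recompute the
    statistic.  Returns two-sided p-values for the difference in means
    (levels) and for the difference in mean ranks.
    """
    rng = random.Random(seed)
    n = len(y)
    n_treated = sum(w)

    def average_ranks(values):
        # convert outcomes to ranks, averaging the ranks of tied values
        order = sorted(range(len(values)), key=lambda i: values[i])
        ranks = [0.0] * len(values)
        i = 0
        while i < len(order):
            j = i
            while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
                j += 1
            avg = (i + j) / 2.0 + 1.0  # average of 1-based positions i..j
            for k in range(i, j + 1):
                ranks[order[k]] = avg
            i = j + 1
        return ranks

    def diff_in_means(values, assign):
        treated = [v for v, a in zip(values, assign) if a == 1]
        control = [v for v, a in zip(values, assign) if a == 0]
        return sum(treated) / len(treated) - sum(control) / len(control)

    ranks = average_ranks(y)
    obs_level = abs(diff_in_means(y, w))
    obs_rank = abs(diff_in_means(ranks, w))
    hits_level = hits_rank = 0
    for _ in range(draws):
        treated_ids = set(rng.sample(range(n), n_treated))
        assign = [1 if i in treated_ids else 0 for i in range(n)]
        if abs(diff_in_means(y, assign)) >= obs_level:
            hits_level += 1
        if abs(diff_in_means(ranks, assign)) >= obs_rank:
            hits_rank += 1
    return hits_level / draws, hits_rank / draws
```

As the text notes, the Monte Carlo error here is controlled by the researcher: increasing `draws` tightens the approximation to the exact p-value.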

Table 1
P-values for Fisher Exact Tests: Ranks versus Levels

                               Sample Size                    p-values
Program   Location         Controls   Treated     t-test   FET (levels)   FET (ranks)
GAIN      Alameda               601       597      0.835          0.836         0.890
GAIN      Los Angeles          1400      2995      0.544          0.531         0.561
GAIN      Riverside            1040      4405      0.000          0.000         0.000
GAIN      San Diego            1154      6978      0.057          0.068         0.018
WIN       Arkansas               37        34      0.750          0.753         0.805
WIN       Baltimore             260       222      0.339          0.339         0.286
WIN       San Diego             257       264      0.136          0.137         0.024
WIN       Virginia              154       331      0.960          0.957         0.249

In the GAIN (San Diego) evaluation, the p-value goes from 0.068 (levels) to 0.018 (ranks), and in the WIN (San Diego) evaluation, the p-value goes from 0.137 (levels) to 0.024 (ranks). It is not surprising that the tests give different results. Earnings data are very skewed. A large proportion of the populations participating in these programs have zero earnings even after conditioning on positive past earnings, and the earnings distribution for those with positive earnings is skewed. In those cases, a rank-based test is likely to have more power against alternatives that shift the distribution toward higher earnings than tests based on the difference in means.

As a general matter it would be useful in randomized experiments to include such results for rank-based p-values, as a generally applicable way of establishing whether the treatment has any effect. As with all omnibus tests, one should use caution in interpreting a rejection, as the test can pick up interesting changes in the distribution (such as a mean or median effect) but also less interesting changes (such as higher moments about the mean).

5. Estimation and Inference under Unconfoundedness

Methods for estimation of average treatment effects under unconfoundedness are the most widely used in this literature. The central paper in this literature, which introduces the key assumptions, is Rosenbaum and Rubin (1983b), although the literature goes further back (e.g., William G. Cochran 1968; Cochran and Rubin 1973; Rubin 1977). Often the unconfoundedness assumption, which requires that conditional on observed covariates there are no unobserved factors that are associated both with the assignment and with the potential outcomes, is controversial. Nevertheless, in practice, where often data have been collected in order to make this assumption more plausible, there are many cases where there is no clearly superior alternative, and the only alternative is to abandon the attempt to get precise inferences. In this section, we discuss some of these methods and the issues related to them. A general theme of this literature is that the concern is more with biases than with efficiency.

Among the many recent economic applications relying on assumptions of this type are Blundell et al. (2001), Angrist (1998), Card and Hyslop (2005), Card and Brian P.

McCall (1996), V. Joseph Hotz, Imbens, and Jacob A. Klerman (2006), Card and Phillip B. Levine (1994), Card, Carlos Dobkin, and Nicole Maestas (2004), Hotz, Imbens, and Julie H. Mortimer (2005), Lechner (2002a), Abadie and Javier Gardeazabal (2003), and Bloom (2005).

This setting is closely related to that underlying standard multiple regression analysis with a rich set of controls. See, for example, Burt S. Barnow, Glen G. Cain, and Arthur S. Goldberger (1980). Unconfoundedness implies that we have a sufficiently rich set of predictors for the treatment indicator, contained in the vector of covariates Xi, such that adjusting for differences in these covariates leads to valid estimates of causal effects. Combined with linearity assumptions of the conditional expectations of the potential outcomes given covariates, the unconfoundedness assumption justifies linear regression. But in the last fifteen years the literature has moved away from the earlier emphasis on regression methods. The main reason is that, although locally linearity of the regression functions may be a reasonable approximation, in many cases the estimated average treatment effects based on regression methods can be severely biased if the linear approximation is not accurate globally. To assess the potential problems with (global) regression methods, it is useful to report summary statistics of the covariates by treatment status. In particular, one may wish to report, for each covariate, the difference in averages by treatment status, scaled by the square root of the sum of the variances, as a scale-free measure of the difference in distributions. To be specific, one may wish to report the normalized difference

    (3) ΔX = (X̄1 − X̄0) / √(S0² + S1²),

where for w = 0, 1, Sw² = Σ{i : Wi = w} (Xi − X̄w)² / (Nw − 1) is the sample variance of Xi in the subsample with treatment Wi = w. Imbens and Rubin (forthcoming) suggest as a rule of thumb that with a normalized difference exceeding one quarter, linear regression methods tend to be sensitive to the specification. Note the difference with the often reported t-statistic for the null hypothesis of equal means,

    (4) T = (X̄1 − X̄0) / √(S0²/N0 + S1²/N1).

The reason for focusing on the normalized difference, (3), rather than on the t-statistic, (4), as a measure of the degree of difficulty in the statistical problem of adjusting for differences in covariates, comes from their relation to the sample size. Clearly, simply increasing the sample size does not make the problem of inference for the average treatment effect inherently more difficult. However, quadrupling the sample size leads, in expectation, to a doubling of the t-statistic. In contrast, increasing the sample size does not systematically affect the normalized difference. In the landmark LaLonde (1986) paper the normalized difference in means exceeds unity for many of the covariates, immediately showing that standard regression methods are unlikely to lead to credible results for those data, even if one views unconfoundedness as a reasonable assumption.

As a result of the concerns with the sensitivity of results based on linear regression methods to seemingly minor changes in specification, the literature has moved to more sophisticated methods for adjusting for differences in covariates. Some of these more sophisticated methods use the propensity score (the conditional probability of receiving the treatment) in various ways. Others rely on pairwise matching of treated units to control units, using values of the covariates to match.
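Both balance measures in (3) and (4) are straightforward to compute; the following is a small illustrative sketch for a scalar covariate (our own function names), which also makes the sample-size point concrete: replicating the data leaves ΔX essentially unchanged while inflating T.

```python
from math import sqrt

def _mean(v):
    return sum(v) / len(v)

def _variance(v):
    # sample variance with the (n - 1) denominator, as in the text
    m = _mean(v)
    return sum((vi - m) ** 2 for vi in v) / (len(v) - 1)

def normalized_difference(x_treated, x_control):
    """Equation (3): (X̄1 − X̄0) / sqrt(S0² + S1²), a scale-free measure
    of covariate imbalance that does not grow with the sample size."""
    return (_mean(x_treated) - _mean(x_control)) / sqrt(
        _variance(x_control) + _variance(x_treated))

def t_statistic(x_treated, x_control):
    """Equation (4): the two-sample t-statistic for equal means, which
    grows like sqrt(N) for a fixed degree of imbalance."""
    return (_mean(x_treated) - _mean(x_control)) / sqrt(
        _variance(x_control) / len(x_control)
        + _variance(x_treated) / len(x_treated))
```

Quadrupling the sample (here, repeating each observation four times) roughly doubles the t-statistic but leaves the normalized difference nearly unchanged, which is exactly why (3) is the better measure of the difficulty of covariate adjustment.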
Although these estimators appear at first sight to be quite different, many (including nonparametric versions of the regression estimators) in fact achieve the semiparametric efficiency bound; thus, they would tend to be similar in large samples. Choices among them typically rely on small sample arguments, which are rarely formalized, and which do not uniformly favor one estimator over another. Most estimators currently in use can be written as the difference of a weighted average of the treated and control outcomes, with the weights in both groups adding up to one:

    τ̂ = Σ{i=1}^N λi · Yi, with Σ{i : Wi = 1} λi = 1, and Σ{i : Wi = 0} λi = −1.

The estimators differ in the way the weights λi depend on the full vector of assignments and matrix of covariates (including those of other units). For example, some estimators implicitly allow the weights to be negative for the treated units and positive for control units, whereas others do not. In addition, some depend on essentially all other units whereas others depend only on units with similar covariate values. Nevertheless, despite the commonalities of the estimators and large sample equivalence results, in practice the performance of the estimators can be quite different, particularly in terms of robustness and bias. Little is known about finite sample properties. The few simulation studies include Zhong Zhao (2004), Frölich (2004a), and Matias Busso, John DiNardo, and Justin McCrary (2008). On a more positive note, some understanding has been reached regarding the sensitivity of specific estimators to particular configurations of the data, such as limited overlap in covariate distributions. Currently, the best practice is to combine linear regression with either propensity score or matching methods in ways that explicitly rely on local, rather than global, linear approximations to the regression functions.

An ongoing discussion concerns the role of the propensity score, e(x) = pr(Wi = 1 | Xi = x), introduced by Rosenbaum and Rubin (1983b), and indeed whether there is any role for this concept. See for recent contributions to this discussion Hahn (1998), Imbens (2004), Angrist and Hahn (2004), Peter C. Austin (2008a, 2008b), Dehejia (2005a), Smith and Todd (2001, 2005), Heckman, Ichimura, and Todd (1998), Frölich (2004a, 2004b), B. B. Hansen (2008), Jennifer Hill (2008), Robins and Ya'acov Ritov (1997), Rubin (1997, 2006), and Elizabeth A. Stuart (2008).

In this section, we first discuss the key assumptions underlying an analysis based on unconfoundedness. We then review some of the efficiency bound results for average treatment effects. Next, in sections 5.3 to 5.5, we briefly review the basic methods relying on regression, propensity score methods, and matching. Although still fairly widely used, we do not recommend these methods in practice. In sections 5.6 to 5.8, we discuss three of the combination methods that we view as more attractive and recommend in practice. We discuss estimating variances in section 5.9. Next we discuss implications of lack of overlap in the covariate distributions. In particular, we discuss two general methods for constructing samples with improved covariate balance, both relying heavily on the propensity score. In section 5.11, we describe methods that can be used to assess the plausibility of the unconfoundedness assumption, even though this assumption is not directly testable. We discuss methods for testing for the presence of average treatment effects and for the presence of treatment effect heterogeneity under unconfoundedness in section 5.12.

5.1 Identification

The key assumption is unconfoundedness, introduced by Rosenbaum and Rubin (1983b),

Assumption 1 (Unconfoundedness):

    Wi ⊥ (Yi(0), Yi(1)) | Xi.

The unconfoundedness assumption is often controversial, as it assumes that beyond the observed covariates Xi there are no (unobserved) characteristics of the individual associated both with the potential outcomes and the treatment.7 Nevertheless, this kind of assumption is used routinely in multiple regression analysis. In fact, suppose we assume that the treatment effect, τ, is constant, so that, for each random draw i, τ = Yi(1) − Yi(0). Further, assume that Yi(0) = α + β′Xi + εi, where εi = Yi(0) − E[Yi(0) | Xi] is the residual capturing the unobservables affecting the response in the absence of treatment. Then, with the observed outcome defined as Yi = (1 − Wi) · Yi(0) + Wi · Yi(1), we can write

    Yi = α + τ · Wi + β′Xi + εi,

and unconfoundedness is equivalent to independence of εi and Wi, conditional on Xi. Imbens (2004) discusses some economic models that imply unconfoundedness. These models assume agents choose to participate in a program if the benefits, equal to the difference in potential outcomes, exceed the costs associated with participation. It is important here that there is a distinction between the objective of the participant (net benefits), and the outcome that is the focus of the researcher (gross benefits). (See Athey and Scott Stern 1998 for some discussion.) Unconfoundedness is implied by independence of the costs and benefits, conditional on observed covariates.

7 Unconfoundedness generally fails if the covariates themselves are affected by treatment. Wooldridge (2005) provides a simple example where treatment is randomized with respect to the counterfactual outcomes but not with respect to the covariates. Unconfoundedness is easily shown to fail.

The second assumption used to identify treatment effects is that for all possible values of the covariates, there are both treated and control units.

Assumption 2 (Overlap):

    0 < pr(Wi = 1 | Xi = x) < 1, for all x.

We call this the overlap assumption as it implies that the support of the conditional distribution of Xi given Wi = 0 overlaps completely with that of the conditional distribution of Xi given Wi = 1.

With a random sample (Wi, Xi), i = 1, …, N, we can estimate the propensity score e(x) = pr(Wi = 1 | Xi = x), and this can provide some guidance for determining whether the overlap assumption holds. Of course common parametric models, such as probit and logit, ensure that all estimated probabilities are strictly between zero and one, and so examining the fitted probabilities from such models can be misleading. We discuss approaches for improving overlap in section 5.10.

The combination of unconfoundedness and overlap was referred to by Rosenbaum and Rubin (1983b) as strong ignorability. There are various ways to establish identification of various average treatment effects under strong ignorability. Perhaps the easiest is to note that τ(x) ≡ E[Yi(1) − Yi(0) | Xi = x] is identified for x in the support of the covariates:

    (5) τ(x) = E[Yi(1) | Xi = x] − E[Yi(0) | Xi = x]
             = E[Yi(1) | Wi = 1, Xi = x] − E[Yi(0) | Wi = 0, Xi = x]
             = E[Yi | Wi = 1, Xi = x] − E[Yi | Wi = 0, Xi = x],

where the second equality follows by unconfoundedness: E[Yi(w) | Wi = w, Xi] does not depend on w. By the overlap assumption, we can estimate both terms in the last line, and therefore we can identify τ(x). Given that we can identify τ(x) for all x, we can identify the expected value across the population distribution of the covariates,

    (6) τPATE = E[τ(Xi)],

as well as τPATT and other estimands.

5.2 Efficiency Bounds

Before discussing specific estimation methods, it is useful to see what we can learn about the parameters of interest, given just the strong ignorability of treatment assignment assumption, without functional form or distributional assumptions. In order to do so, we need some additional notation. Let σ0²(x) = V(Yi(0) | Xi = x) and σ1²(x) = V(Yi(1) | Xi = x) denote the conditional variances of the potential outcomes given the covariates. Hahn (1998) derives the lower bounds for asymptotic variances of √N-consistent estimators for τPATE as

    (7) V_PATE = E[ σ1²(Xi)/e(Xi) + σ0²(Xi)/(1 − e(Xi)) + (τ(Xi) − τ)² ],

where p = E[e(Xi)] is the unconditional treatment probability. Interestingly, this lower bound holds irrespective of whether the propensity score is known or not.

The form of this variance bound is informative. It is no surprise that τPATE is more difficult to estimate the larger are the variances σ0²(x) and σ1²(x). However, as shown by the presence of the third term, it is also more difficult to estimate τPATE, the more variation there is in the average treatment effect conditional on the covariates. If we focus instead on estimating τCATE, the conditional average treatment effect, the third term drops out, and the variance bound for τCATE is

    (8) V_CATE = E[ σ1²(Xi)/e(Xi) + σ0²(Xi)/(1 − e(Xi)) ].

Still, the role of heterogeneity in the treatment effect is potentially important. Suppose we actually had prior knowledge that the average treatment effect conditional on the covariates is constant, or τ(x) = τPATE for all x. Given this assumption, the model is closely related to the partial linear model (Peter M. Robinson 1988; James H. Stock 1989). Given this prior knowledge, the variance bound is

    (9) V_const = ( E[ ( σ1²(Xi)/e(Xi) + σ0²(Xi)/(1 − e(Xi)) )^(−1) ] )^(−1).

This variance bound can be much lower than (8) if there is variation in the propensity score. Knowledge of lack of variation in the treatment effect can be very valuable, or, conversely, allowing for general heterogeneity in the treatment effect can be expensive in terms of precision.

In addition to the conditional variances of the counterfactual outcomes, a third important determinant of the efficiency bound is the propensity score. Because it enters into (7) in the denominator, the presence of units with the propensity score close to zero or one will make it difficult to obtain precise estimates of the average effect of the treatment. One approach to address this problem, developed by Crump et al. (2009) and discussed in more detail in section 5.10, is to drop observations with the propensity score close to zero and one, and focus on the average effect of the treatment in the subpopulation with propensity scores away from zero and one. Suppose we focus on τ_CATE,A, the average of τ(Xi) for Xi in a subset A of the covariate space. Then the variance bound is

    (10) V_CATE,A = (1 / pr(Xi ∈ A)) × E[ σ1²(Xi)/e(Xi) + σ0²(Xi)/(1 − e(Xi)) | Xi ∈ A ].

By excluding from the set A subsets of the covariate space where the propensity score is close to zero or one, we may be able to estimate τ_CATE,A more precisely than τCATE. (If we are instead interested in τCATT, we only need to worry about covariate values where e(x) is close to one.)

Having displayed these lower bounds on variances for the average treatment effects, a natural question is: Are there estimators that achieve these lower bounds that do not require parametric models or functional form restrictions on either the conditional means or the propensity score? The answer in general is yes, and we now consider different classes of estimators in turn.
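The trade-off between bounds (8) and (9) is easy to evaluate directly when the covariate is discrete and e(x), σ0²(x), σ1²(x) are known. A small numerical sketch (our own helper names; the inequality V_const ≤ V_CATE is Jensen's inequality applied to the inverse):

```python
def bound_cate(cells):
    """Equation (8): V_CATE = E[ sigma1^2(X)/e(X) + sigma0^2(X)/(1 - e(X)) ].
    `cells` lists tuples (pr_x, e_x, var1_x, var0_x) over a discrete covariate."""
    return sum(p * (v1 / e + v0 / (1.0 - e)) for p, e, v1, v0 in cells)

def bound_constant_effect(cells):
    """Equation (9): the bound under a constant conditional treatment effect,
    the inverse of the expected inverse of the integrand in (8)."""
    return 1.0 / sum(p / (v1 / e + v0 / (1.0 - e)) for p, e, v1, v0 in cells)
```

With variation in the propensity score across covariate values, the constant-effect bound is strictly below the bound allowing for heterogeneity; when e(x) and the conditional variances are constant, the two coincide. This is the sense in which allowing for general heterogeneity "can be expensive in terms of precision."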

5.3 Regression Methods

To describe the general approach to regression methods for estimating average treatment effects, define μ0(x) and μ1(x) to be the two regression functions for the potential outcomes:

    μ0(x) = E[Yi(0) | Xi = x], and μ1(x) = E[Yi(1) | Xi = x].

By definition, the average treatment effect conditional on Xi = x is τ(x) = μ1(x) − μ0(x). As we discussed in the identification subsection, under the unconfoundedness assumption, μ0(x) = E[Yi | Wi = 0, Xi = x] and μ1(x) = E[Yi | Wi = 1, Xi = x], which means we can estimate μ0(·) using regression methods for the untreated subsample and μ1(·) using the treated subsample. Given consistent estimators μ̂0(·) and μ̂1(·), a consistent estimator for either τPATE or τCATE is

    (11) τ̂reg = (1/N) Σ{i=1}^N ( μ̂1(Xi) − μ̂0(Xi) ).

Given parametric models for μ0(·) and μ1(·), estimation and inference are straightforward.8 In the simplest case, we assume each conditional mean can be expressed as functions linear in parameters, say

    (12) μ0(x) = α0 + β0′(x − ψX), and μ1(x) = α1 + β1′(x − ψX),

where we take deviations from the overall population covariate mean ψX so that the treatment effect is the difference in intercepts. (Naturally, as in any regression context, we can replace x with general functions of x.) Of course, we rarely know the population mean of the covariates, so in estimation we replace ψX with the sample average across all units, X̄. Then τ̂reg is simply

    (13) τ̂reg = α̂1 − α̂0.

This estimator is also obtained from the coefficient on the treatment indicator Wi in the regression of Yi on 1, Wi, Xi, Wi · (Xi − X̄). Standard errors can be obtained from standard least squares regression output. (As we show below, in the case of estimating τPATE, the usual standard error, whether or not it is made robust to heteroskedasticity, ignores the estimation error in X̄ as an estimator of ψX; technically, the conventional standard error is only valid for τCATE and not for τPATE.)

8 There is a somewhat subtle issue in estimating treatment effects from stratified samples or samples with missing values of the covariates. If the missingness or stratification are determined by outcomes on the covariates, Xi, and the conditional means are correctly specified, then the missing data or stratification can be ignored for the purposes of estimating the regression parameters; see, for example, Wooldridge (1999, 2007). However, sample selection or stratification based on Xi cannot be ignored in estimating, say, τPATE, because τPATE equals the expected difference in regression functions across the population distribution of Xi. Therefore, consistent estimation of τPATE requires applying inverse probability weights or sampling weights to the average in (11).

A different representation of τ̂reg is useful in order to illustrate some of the concerns with regression estimators in this setting. Suppose we do use the linear model in (12). It can be shown that

    (14) τ̂reg = Ȳ1 − Ȳ0 − ( (N0/(N0 + N1)) · β̂1 + (N1/(N0 + N1)) · β̂0 )′ (X̄1 − X̄0).

To adjust for differences in covariates between treated and control units, the simple difference in average outcomes, Ȳ1 − Ȳ0, is adjusted by the difference in average covariates, X̄1 − X̄0, multiplied by the weighted average of the regression coefficients β̂0 and β̂1 in the two treatment regimes. This is a useful representation. It shows that if the averages of the covariates in the two treatment arms are very different, then the adjustment to the simple mean difference can be large. We can see that even more clearly by inspecting the predicted outcome for the treated units had they been subject to the control treatment:

    Ê[Yi(0) | Wi = 1] = Ȳ0 + β̂0′(X̄1 − X̄0).

The regression parameter β̂0 is estimated on the control sample, where the average of the covariates is equal to X̄0. It therefore likely provides a good approximation to the conditional mean function around that value. However, this estimated regression function is then used to predict outcomes in the treated sample, where the average of the covariates is equal to X̄1. If these covariate averages are very different, and thus the regression model is used to predict outcomes far away from where the parameters were estimated, the results can be sensitive to minor changes in the specification. Unless the linear approximation to the regression function is globally accurate, regression may lead to severe biases. Another way of interpreting this problem is as a multicollinearity problem. If the averages of the covariates in the two treatment arms are very different, the correlation between the covariates and the treatment indicator is relatively high. Although conventional least squares standard errors take the degree of multicollinearity into account, they do so conditional on the specification of the regression function. Here the concern is that any misspecification may be exacerbated by the collinearity problem. As noted in the introduction to section 5, an easy way to establish the severity of this problem is to inspect the normalized differences (X̄1 − X̄0) / √(S0² + S1²).

In the case of the standard regression estimator it is straightforward to derive and to estimate the variance when we view the estimator as an estimator of τCATE. Assuming the linear regression model is correctly specified, we have

    (15) √N (τ̂reg − τCATE) →d N(0, V0 + V1), where Vw = N · E[(α̂w − αw)²],

which can be obtained directly from standard regression output. Estimating the variance when we view the estimator as an estimator of τPATE requires adding a term capturing the variation in the treatment effect conditional on the covariates. The form is then

    √N (τ̂reg − τPATE) →d N(0, V0 + V1 + Vτ),

where the third term in the normalized variance is

    Vτ = (β1 − β0)′ E[(Xi − E[Xi])(Xi − E[Xi])′] (β1 − β0).
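The estimator in (12)–(14) amounts to fitting separate regressions in the two arms and comparing their fitted values at the overall covariate mean. A minimal sketch for a scalar covariate (our own function names), which also verifies the representation (14) numerically:

```python
def _mean(v):
    return sum(v) / len(v)

def _ols_slope(ys, xs):
    # least squares slope of ys on xs (scalar covariate)
    my, mx = _mean(ys), _mean(xs)
    return sum((yi - my) * (xi - mx) for yi, xi in zip(ys, xs)) / sum(
        (xi - mx) ** 2 for xi in xs)

def _split(y, w, x):
    y1 = [yi for yi, wi in zip(y, w) if wi == 1]
    x1 = [xi for xi, wi in zip(x, w) if wi == 1]
    y0 = [yi for yi, wi in zip(y, w) if wi == 0]
    x0 = [xi for xi, wi in zip(x, w) if wi == 0]
    return y1, x1, y0, x0

def regression_adjustment(y, w, x):
    """tau_hat_reg as in (12)-(13): fit linear regressions separately in
    each arm and take the difference of fitted values at the overall
    sample mean of the covariate."""
    y1, x1, y0, x0 = _split(y, w, x)
    xbar = _mean(x)
    b1, b0 = _ols_slope(y1, x1), _ols_slope(y0, x0)
    mu1_at_xbar = _mean(y1) + b1 * (xbar - _mean(x1))
    mu0_at_xbar = _mean(y0) + b0 * (xbar - _mean(x0))
    return mu1_at_xbar - mu0_at_xbar

def regression_adjustment_rep14(y, w, x):
    """The same estimator via representation (14):
    Ybar1 - Ybar0 - (N0/(N0+N1) b1 + N1/(N0+N1) b0) * (Xbar1 - Xbar0)."""
    y1, x1, y0, x0 = _split(y, w, x)
    n0, n1 = len(y0), len(y1)
    b1, b0 = _ols_slope(y1, x1), _ols_slope(y0, x0)
    beta_bar = (n0 * b1 + n1 * b0) / (n0 + n1)
    return _mean(y1) - _mean(y0) - beta_bar * (_mean(x1) - _mean(x0))
```

The two computations agree exactly, which is the content of (14): the covariate adjustment is the imbalance X̄1 − X̄0 scaled by a weighted average of the arm-specific slopes, and it can be large when the arms are badly imbalanced.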

This third term can be estimated as

    V̂τ = (β̂1 − β̂0)′ ( (1/N) Σ{i=1}^N (Xi − X̄)(Xi − X̄)′ ) (β̂1 − β̂0).

In practice, this additional term is rarely incorporated, and researchers instead report the variance corresponding to τCATE. In cases where the slope coefficients do not differ substantially across the two regimes (equivalently, when the coefficients on the interaction terms Wi · (Xi − X̄) are "small"), this last term is likely to be swamped by the variances in (15).

In many cases, researchers have sought to go beyond simple parametric models for the regression functions. Two general directions have been explored. The first relies on local smoothing, and the second on increasingly flexible global approximations. We discuss both in turn.

Heckman, Ichimura, and Todd (1997) and Heckman et al. (1998) consider local smoothing methods to estimate the two regression functions. The first method they consider is kernel regression. Given a kernel K(·), and a bandwidth h, the kernel estimator for μw(x) is

    μ̂w(x) = Σ{i : Wi = w} Yi · λi, with weight λi = K((Xi − x)/h) / Σ{i : Wi = w} K((Xi − x)/h).

Although the rate of convergence of the kernel estimator to the regression function is slower than the conventional parametric rate N^(−1/2), the rate of convergence of the implied estimator for the average treatment effect, τ̂reg in (11), is the regular parametric rate under regularity conditions. These conditions include smoothness of the regression functions and require the use of higher order kernels (with the order of the kernel depending on the dimension of the covariates). In practice, researchers have not used higher order kernels and, with positive kernels, the bias for kernel estimators is a more severe problem than for the matching estimators discussed in section 5.5.

Kernel regression of this type can be interpreted as locally fitting a constant regression function. A general alternative is to fit locally a polynomial regression function. The leading case of this is local linear regression (J. Fan and I. Gijbels 1996), applied to estimation of average treatment effects by Heckman, Ichimura, and Todd (1997) and Heckman et al. (1998). Define α̂(x) and β̂(x) as the local least squares estimates, based on locally fitting a linear regression function:

    (α̂(x), β̂(x)) = argmin over (α, β) of Σ{i=1}^N λi · (Yi − α − β′(Xi − x))²,

with the same weights λi as in the standard kernel estimator. The regression function at x is then estimated as μ̂(x) = α̂(x). In order to achieve convergence at the best possible rate for τ̂reg, one needs to use higher order kernels, although the order required is less than that for the standard kernel estimator.

For both the standard kernel estimator and the local linear estimator an important choice is that of the bandwidth h. In practice, researchers have used ad hoc methods for bandwidth selection. Formal results on bandwidth selection from the literature on nonparametric regression are not directly applicable. Those results are based on minimizing a global criterion such as the expected value of the squared difference between the estimated and true regression function, with the expectation taken with respect to the marginal distribution of the covariates. Thus, they focus on estimating the regression function well everywhere. Here the focus is on a particular scalar functional of the regression function, and it is not clear whether the conventional methods for bandwidth choices have good properties.
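The kernel estimator above, plugged into (11), can be sketched as follows. This is an illustrative implementation only (our own function names, a Gaussian kernel for concreteness, and no bandwidth selection); as the text notes, a positive kernel like this one does not deliver the formal rate results, which require higher order kernels.

```python
from math import exp

def gaussian_kernel(u):
    # a positive kernel; higher order kernels would be needed for the
    # formal rate results discussed in the text
    return exp(-0.5 * u * u)

def kernel_regression(x0, xs, ys, h):
    """mu_hat_w(x0) = sum Yi * lambda_i, with
    lambda_i = K((Xi - x0)/h) / sum_j K((Xj - x0)/h),
    computed on one treatment arm's subsample (xs, ys)."""
    weights = [gaussian_kernel((xi - x0) / h) for xi in xs]
    total = sum(weights)
    return sum(wi * yi for wi, yi in zip(weights, ys)) / total

def tau_reg_kernel(y, w, x, h):
    """tau_hat_reg from (11): average over all units of
    mu_hat_1(Xi) - mu_hat_0(Xi), with each regression function
    estimated by kernel regression on the corresponding subsample."""
    x1 = [xi for xi, wi in zip(x, w) if wi == 1]
    y1 = [yi for yi, wi in zip(y, w) if wi == 1]
    x0 = [xi for xi, wi in zip(x, w) if wi == 0]
    y0 = [yi for yi, wi in zip(y, w) if wi == 0]
    diffs = [kernel_regression(xi, x1, y1, h) - kernel_regression(xi, x0, y0, h)
             for xi in x]
    return sum(diffs) / len(diffs)
```

When the two arms have identical covariate designs and the outcomes differ by a constant shift, the estimator recovers that shift exactly for any bandwidth, since the kernel weights in the two arms coincide.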

Although formal results are typically given for the case with continuous regressors, modifications have been developed that allow for both continuous and discrete covariates (Jeffrey S. Racine and Qi Li 2004). All such methods require choosing the degree of smoothing (often known as bandwidths), and there has not been much work on choosing bandwidths for the particular problem of estimating average treatment effects, where the parameter of interest is effectively the average of a regression function and not the entire function. See Imbens (2004) for more discussion. Although the estimators based on local smoothing have not been shown to attain the variance efficiency bound, it is likely that they can be constructed to do so under sufficient smoothness conditions.

An alternative to local smoothing methods are global smoothing methods, such as series or sieve estimators. Such estimators are parametric for a given sample size, with the number of parameters, and thus the flexibility of the model, increasing with the sample size. One attraction of such methods is that often estimation and inference can proceed as if the model is completely parametric. The amount of smoothing is determined by the number of terms in the series, and the large-sample analysis is carried out with the number of terms growing as a function of the sample size. Again, little is known about how to choose the number of terms when interest lies in average treatment effects. For the average treatment effect case, Hahn (1998), Imbens, Newey, and Ridder (2005), Andrea Rotnitzky and Robins (1995), and Chen, Hong, and Tarozzi (2008) have developed estimators of this type. Hahn shows that estimators in this class can achieve the variance lower bounds for estimating $\tau_{PATE}$. For a simple version of such an estimator, suppose that $X_i$ is a scalar. Then we can approximate $\mu_w(x)$ by a $K$-th order polynomial,

\[
\mu_{w,K}(x) = \sum_{k=0}^{K} \beta_{w,k} \cdot x^k.
\]

We then estimate the $\beta_{w,k}$ by least squares regression, and estimate the average treatment effect using (11). This is a special case of the estimators discussed in Imbens, Newey, and Ridder (2005) and Chen, Hong, and Tarozzi (2008), with formal results presented for the case with general $X_i$. Imbens, Newey, and Ridder (2005) also discuss methods for choosing the number of terms in the series based on the expected squared error for the average treatment effect.

If the outcome is binary, or more generally of a limited dependent variable form, a linear series approximation to the regression function is not necessarily attractive. It is likely that one can use increasingly flexible approximations based on models that exploit the structure of the outcome data. For the case with binary outcomes, Hirano, Imbens, and Ridder (2003) show how using a polynomial approximation to the log odds ratio leads to an attractive estimator for the conditional mean. See Chen (2007) for a general discussion of such models. One can imagine that, in cases with nonnegative response variables, exponential regression functions, or those derived from specific models, such as the Tobit model (when the response can pile up at zero), combined with polynomial approximations in the linear index function, might be useful.

Generally, methods based on global approximations suffer from the same drawbacks as linear regression. If the covariate distributions are substantially different in the two treatment groups, estimates based on such methods rely, perhaps more than is desired, on extrapolation. Using these methods in cases with substantial differences in covariate distributions is therefore not recommended (except possibly in cases where the sample has been trimmed so that the covariates across the two treatment regimes have considerable overlap).
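A minimal sketch of the scalar-covariate series estimator, fitting a $K$-th order polynomial by least squares within each treatment arm and then averaging as in (11). The simulated data and the particular choice $K = 4$ are illustrative assumptions (how to choose $K$ is precisely the open question noted above):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative simulated data: scalar X, true treatment effect of 2.0.
N = 1000
X = rng.uniform(-2, 2, size=N)
W = (rng.uniform(size=N) < 1 / (1 + np.exp(-X))).astype(int)
Y = 2.0 * W + np.cos(X) + rng.normal(scale=0.5, size=N)

K = 4  # order of the polynomial approximation (a tuning choice)
P = np.vander(X, K + 1, increasing=True)  # columns 1, x, x^2, ..., x^K

# Estimate beta_{w,k} by least squares separately in each treatment arm.
b1, *_ = np.linalg.lstsq(P[W == 1], Y[W == 1], rcond=None)
b0, *_ = np.linalg.lstsq(P[W == 0], Y[W == 0], rcond=None)

# Average the difference of the fitted regression functions, as in (11).
tau_series = np.mean(P @ b1 - P @ b0)
```

Because the model is parametric for the given sample size, standard parametric inference can be applied conditional on $K$; the nonparametric interpretation comes from letting $K$ grow with $N$.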

Before we turn to propensity score methods, we should comment on estimating the average treatment effects on the treated, $\tau_{PATT}$ and $\tau_{CATT}$. In this case, $\hat\tau(X_i)$ is averaged across observations with $W_i = 1$, rather than across the entire sample as in (11). Because $\hat\mu_1(x)$ is estimated on the treated subsample, in estimating $\tau_{PATT}$ or $\tau_{CATT}$ there is no problem if $\mu_1(x)$ is poorly estimated at covariate values that are common in the control group but scarce in the treatment group. But we must have a good estimate of $\mu_0(x)$ at covariate values common in the treatment group, and this is not ensured, because we can only use the control group to obtain $\hat\mu_0(x)$. Nevertheless, in many settings $\mu_0(x)$ can be estimated well over the entire range of the covariates, because the control group often includes units that are similar to those in the treatment group. By contrast, often there are numerous control group units (for example, high-income workers in the context of a job training program) that are quite different from any units in the treatment group, making the ATE parameters considerably more difficult to estimate than the ATT parameters. (Further, the ATT parameters are often more interesting from a policy perspective in such cases, unless one redefines the population to exclude units that are unlikely ever to be in the treatment group.)

5.4 Methods Based on the Propensity Score

The first set of alternatives to regression estimators relies on estimates of the propensity score. These methods were introduced in Rosenbaum and Rubin (1983b). An early economic discussion is in Card and Sullivan (1988). Rosenbaum and Rubin show that, under unconfoundedness, independence of potential outcomes and treatment indicators also holds after conditioning solely on the propensity score, $e(x) = \Pr(W_i = 1 \mid X_i = x)$:

\[
W_i \perp \left(Y_i(0), Y_i(1)\right) \mid X_i \quad \Longrightarrow \quad W_i \perp \left(Y_i(0), Y_i(1)\right) \mid e(X_i).
\]

The basic insight is that, for any binary variable $W_i$ and any random vector $X_i$, it is true (without assuming unconfoundedness) that

\[
W_i \perp X_i \mid e(X_i).
\]

Hence, within subpopulations with the same value of the propensity score, covariates are independent of the treatment indicator and thus cannot lead to biases (in the same way that, in a regression framework, omitted variables that are uncorrelated with the included covariates do not introduce bias). Since under unconfoundedness all biases can be removed by adjusting for differences in covariates, this means that within subpopulations homogeneous in the propensity score there are no biases in comparisons between treated and control units. Given the Rosenbaum–Rubin result, it is sufficient, under the maintained assumption of unconfoundedness, to adjust solely for differences in the propensity score between treated and control units.

This result can be exploited in a number of ways. Here we discuss three that have been used in practice. The first two of these methods exploit the fact that the propensity score can be viewed as a covariate that is sufficient to remove biases in estimation of average treatment effects; for this purpose, any one-to-one function of the propensity score could also be used. The third method further uses the fact that the propensity score is the conditional probability of receiving the treatment.

The first method simply uses the propensity score in place of the covariates in regression analysis. Define $\nu_w(e) = E[Y_i \mid W_i = w, e(X_i) = e]$. Unconfoundedness in combination with the Rosenbaum–Rubin result implies that $\nu_w(e) = E[Y_i(w) \mid e(X_i) = e]$. Then we can estimate $\nu_w(e)$ very generally using kernel or series estimation on the propensity score, something that is greatly simplified by the fact that the propensity score is a scalar. Heckman, Ichimura, and Todd (1998) consider local smoothers, and Hahn (1998) considers a series estimator. In either case, we have the consistent estimator

\[
\hat\tau_{regprop} = \frac{1}{N} \sum_{i=1}^{N} \left( \hat\nu_1(e(X_i)) - \hat\nu_0(e(X_i)) \right),
\]

which is simply the average of the differences in predicted values for the treated and untreated outcomes. Interestingly, Hahn shows that, unlike when we use regression to adjust for the full set of covariates, the series regression estimator based on adjusting for the known propensity score does not achieve the efficiency bound.

Although methods of this type have been used in practice, probably because of their simplicity, regression on simple functions of the propensity score is not recommended. Because the propensity score does not have a substantive meaning, it is difficult to motivate a low order polynomial as a good approximation to the conditional expectation. For example, a linear model in the propensity score is unlikely to provide a good approximation: individuals with propensity scores of 0.45 and 0.50 are likely to be much more similar than individuals with propensity scores of 0.01 and 0.06. Moreover, no formal asymptotic properties have been derived for the case with the propensity score unknown.

The second method, variously referred to as blocking, subclassification, or stratification, also adjusts for differences in the propensity score in a way that can be interpreted as regression, but in a more flexible manner. Originally suggested by Rosenbaum and Rubin (1983b), the idea is to partition the sample into strata by (discretized) values of the propensity score, and then analyze the data within each stratum as if the propensity score were constant and the data could be interpreted as coming from a completely randomized experiment. This can be interpreted as approximating the conditional mean of the potential outcomes by a step function. To be more precise, let $0 = c_0 < c_1 < c_2 < \cdots < c_J = 1$ be boundary values, and define $B_{ij}$, for $i = 1, \ldots, N$ and $j = 1, \ldots, J-1$, as the indicators

\[
B_{ij} = \begin{cases} 1 & \text{if } c_{j-1} \le e(X_i) < c_j, \\ 0 & \text{otherwise,} \end{cases}
\qquad \text{and} \qquad B_{iJ} = 1 - \sum_{j=1}^{J-1} B_{ij}.
\]

Now estimate within stratum $j$ the average treatment effect $\tau_j = E[Y_i(1) - Y_i(0) \mid B_{ij} = 1]$ as

\[
\hat\tau_j = \bar Y_{j1} - \bar Y_{j0}, \qquad \text{where} \quad \bar Y_{jw} = \frac{1}{N_{jw}} \sum_{i: W_i = w} B_{ij} \cdot Y_i \quad \text{and} \quad N_{jw} = \sum_{i: W_i = w} B_{ij}.
\]

If $J$ is sufficiently large and the differences $c_j - c_{j-1}$ small, there is little variation in the propensity score within a stratum or block, and one can analyze the data as if the propensity score is constant, and thus as if the data within a block were generated by a completely randomized experiment (with the assignment probabilities constant within a stratum, but varying between strata). The average treatment effect is then estimated as the weighted average of the within-stratum estimates:

\[
\hat\tau_{block} = \sum_{j=1}^{J} \hat\tau_j \cdot \frac{N_{j0} + N_{j1}}{N}.
\]

With $J$ large, the implicit step function approximation to the regression functions $\nu_w(e)$ will be accurate. Cochran (1968) shows, in a Gaussian example, that with five equal-sized blocks the remaining bias is less than 5 percent of the bias in the simple difference between average outcomes among treated and controls.

Motivated by Cochran's calculations, researchers have often used five strata, although, depending on the sample size and the joint distribution of the data, fewer or more blocks will generally lead to a lower expected mean squared error.

The variance for this estimator is typically calculated conditional on the strata indicators, assuming random assignment within the strata. That is, for stratum $j$ the estimator is $\hat\tau_j$, and its variance is estimated as $\hat V_j = \hat V_{j0} + \hat V_{j1}$, where

\[
\hat V_{jw} = \frac{S_{jw}^2}{N_{jw}}, \qquad \text{with} \quad S_{jw}^2 = \frac{1}{N_{jw}} \sum_{i: B_{ij} = 1, W_i = w} \left(Y_i - \bar Y_{jw}\right)^2.
\]

The overall variance is then estimated as

\[
\hat{\mathbb{V}}(\hat\tau_{block}) = \sum_{j=1}^{J} \left(\hat V_{j0} + \hat V_{j1}\right) \cdot \left(\frac{N_{j0} + N_{j1}}{N}\right)^2.
\]

This variance estimator is appropriate for $\tau_{CATE}$, although it ignores biases arising from variation in the propensity score within strata.

The third method exploiting the propensity score is based on weighting. Recall that $\tau_{PATE} = E[Y_i(1) - Y_i(0)] = E[Y_i(1)] - E[Y_i(0)]$. We consider the two terms separately. Because $W_i \cdot Y_i = W_i \cdot Y_i(1)$, we have

\[
E\!\left[\frac{W_i\,Y_i}{e(X_i)}\right]
= E\!\left[\frac{W_i\,Y_i(1)}{e(X_i)}\right]
= E\!\left[E\!\left[\frac{W_i\,Y_i(1)}{e(X_i)} \,\Big|\, X_i\right]\right]
= E\!\left[\frac{E(W_i \mid X_i) \cdot E(Y_i(1) \mid X_i)}{e(X_i)}\right]
= E\!\left[\frac{e(X_i) \cdot E(Y_i(1) \mid X_i)}{e(X_i)}\right]
= E\!\left[E(Y_i(1) \mid X_i)\right] = E[Y_i(1)],
\]

where the second and final equalities follow by iterated expectations and the third equality holds by unconfoundedness. The implication is that weighting the treated population by the inverse of the propensity score recovers the expectation of the unconditional response under treatment. A similar calculation shows $E[(1 - W_i)\,Y_i / (1 - e(X_i))] = E[Y_i(0)]$, and together these imply

\[
(16) \qquad \tau_{PATE} = E\!\left[\frac{W_i\,Y_i}{e(X_i)} - \frac{(1 - W_i)\,Y_i}{1 - e(X_i)}\right].
\]

Equation (16) suggests an obvious estimator of $\tau_{PATE}$:

\[
(17) \qquad \hat\tau_{weight} = \frac{1}{N} \sum_{i=1}^{N} \left[\frac{W_i\,Y_i}{e(X_i)} - \frac{(1 - W_i)\,Y_i}{1 - e(X_i)}\right],
\]

which, as a sample average from a random sample, is consistent for $\tau_{PATE}$ and $\sqrt{N}$-asymptotically normally distributed. The estimator in (17) is essentially due to D. G. Horvitz and D. J. Thompson (1952).9

In practice, (17) is not a feasible estimator because it depends on the propensity score function $e(\cdot)$, which is rarely known. A surprising result is that, even if we know the propensity score, $\hat\tau_{weight}$ does not achieve the efficiency bound given in (7). It turns out to be better, in terms of large sample efficiency, to weight using the estimated rather than the true propensity score. Hirano, Imbens, and Ridder (2003) establish conditions under which replacing $e(\cdot)$ with a logistic sieve estimator results in a weighted propensity score estimator that achieves the variance bound. The estimator is practically simple to compute, as estimation of the propensity score involves a straightforward logit estimation involving flexible functions of the covariates. Theoretically, the number of terms in the approximation should increase with the sample size. In the second step, given the estimated propensity score $\hat e(x)$, one estimates

\[
(18) \qquad \hat\tau_{ipw} = \sum_{i=1}^{N} \frac{W_i\,Y_i}{\hat e(X_i)} \Big/ \sum_{i=1}^{N} \frac{W_i}{\hat e(X_i)} \;-\; \sum_{i=1}^{N} \frac{(1 - W_i)\,Y_i}{1 - \hat e(X_i)} \Big/ \sum_{i=1}^{N} \frac{1 - W_i}{1 - \hat e(X_i)}.
\]

We refer to this as the inverse probability weighting (IPW) estimator. See Hirano, Imbens, and Ridder (2003) for intuition as to why estimating the propensity score leads, asymptotically, to a more efficient estimator than knowing the true propensity score.

Ichimura and Oliver Linton (2005) studied $\hat\tau_{ipw}$ when $\hat e(\cdot)$ is obtained via kernel regression, and they consider the problem of optimal bandwidth choice when the object of interest is $\tau_{PATE}$. More recently, Li, Racine, and Wooldridge (forthcoming) consider kernel estimation for discrete as well as continuous covariates. The estimator proposed by Li, Racine, and Wooldridge achieves the variance lower bound. See Hirano, Imbens, and Ridder (2003) and Wooldridge (2007) for methods for estimating the variance of these estimators.

A particular concern with IPW estimators arises, again, when the covariate distributions are substantially different for the two treatment groups. In that case, the propensity score gets close to zero or one for some values of the covariates. Small or large values of the propensity score raise a number of issues. One concern is that alternative parametric models for the binary data, such as probit and logit models, which provide similar approximations in terms of estimated probabilities over the middle ranges of their arguments, tend to differ more when the probabilities are close to zero or one. Thus the choice of model and specification becomes more important, and it is often difficult to make well motivated choices in treatment effect settings. A second concern is that, for units with propensity scores close to zero or one, the weights can be large, making those units particularly influential in the estimates of the average treatment effects, and thus making the estimator imprecise. These concerns are less serious than those regarding regression estimators, because at least the IPW estimates will accurately reflect this uncertainty. Still, these concerns make the simple IPW estimators less attractive. (As in the regression case, the problem can be less severe for the ATT parameters, because propensity score values close to zero then play no role; problems for estimating the ATT arise when some units, as described by their observed covariates, are almost certain to receive treatment.)

Note that the blocking estimator can also be interpreted as a weighting estimator. Consider observations in block $j$: within the block, the $N_{j1}$ treated observations all get equal weight $1/N_{j1}$.

9 Because the Horvitz–Thompson estimator is based on sample averages, adjustments for stratified sampling are straightforward if one is provided sampling weights.
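A minimal sketch of the feasible two-step IPW estimator in (18): first fit a logit for the propensity score (here by a simple Newton-Raphson loop rather than a sieve), then form the ratio-of-weighted-sums estimator. The simulated data and the single-covariate logit specification are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative simulated data: true treatment effect of 2.0, with X as
# a confounder entering both the assignment and the outcome.
N = 4000
X = rng.normal(size=N)
W = (rng.uniform(size=N) < 1 / (1 + np.exp(-0.5 * X))).astype(int)
Y = 2.0 * W + X + rng.normal(size=N)

# Step 1: estimate the propensity score with a logit (Newton-Raphson on
# the log likelihood; gradient Z'(W - p), Hessian Z' diag(p(1-p)) Z).
Z = np.column_stack([np.ones(N), X])
g = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-Z @ g))
    g += np.linalg.solve((Z * (p * (1 - p))[:, None]).T @ Z, Z.T @ (W - p))
e_hat = 1 / (1 + np.exp(-Z @ g))

# Step 2: the ratio (normalized-weights) IPW estimator in (18).
w1 = W / e_hat
w0 = (1 - W) / (1 - e_hat)
tau_ipw = (w1 @ Y) / w1.sum() - (w0 @ Y) / w0.sum()
```

Normalizing the weights so they sum to one within each arm, as (18) does, tends to behave better in finite samples than the raw Horvitz–Thompson average in (17).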
In the estimator for the overall average treatment effect, this block gets weight $(N_{j0} + N_{j1})/N$, so we can write $\hat\tau_{block} = \sum_{i=1}^{N} \lambda_i \cdot Y_i$, where for treated observations in block $j$ the weight, normalized by $N$, is $N \cdot \lambda_i = (N_{j0} + N_{j1})/N_{j1}$, and for control observations it is $N \cdot \lambda_i = -(N_{j0} + N_{j1})/N_{j0}$. Implicitly, this estimator is based on an estimate of the propensity score in block $j$ equal to $N_{j1}/(N_{j0} + N_{j1})$. Compared to the IPW estimator, the propensity score is smoothed within the block. This has the advantage of avoiding particularly large weights, but comes at the expense of introducing bias if the propensity score is correctly specified.

5.5 Matching

Matching estimators impute the missing potential outcomes using only the outcomes of a few nearest neighbors of the opposite treatment group. In that sense, matching is similar to nonparametric kernel regression, with the number of neighbors playing the role of the bandwidth in the kernel regression. A formal difference with kernel methods is that the asymptotic distribution of matching estimators is derived conditional on the implicit bandwidth, that is, the number of neighbors, which is often fixed at a small number, e.g., one. Under such asymptotics, the implicit estimate $\hat\mu_w(x)$ is (close to) unbiased, but not consistent, for $\mu_w(x)$. In contrast, the kernel regression estimators discussed in the previous section implied consistency of $\hat\mu_w(x)$.

Matching estimators have the attractive feature that the smoothing parameters are easily interpretable. Given the matching metric, the researcher only has to choose the number of matches. Using only a single match leads to the most credible inference, with the least bias, at the cost of sacrificing some precision. This sits well with the focus in the literature on reducing bias rather than variance. It also can make the matching estimator easier to use than estimators that require more complex choices of smoothing parameters, and this may be another explanation for its popularity.

Matching estimators have been widely studied in practice and theory (e.g., X. Gu and Rosenbaum 1993; Rosenbaum 1989, 1995, 2002; Rubin 1973b, 1979; Rubin and Neal Thomas 1992a, 1992b, 1996, 2000; Heckman, Ichimura, and Todd 1998; Dehejia and Wahba 1999; Abadie and Imbens 2006; Alexis Diamond and Jasjeet S. Sekhon 2008; Sekhon forthcoming; Sekhon and Richard Grieve 2008; Rosenbaum and Rubin 1985; Stefano M. Iacus, Gary King, and Giuseppe Porro 2008). Most often they have been applied in settings where (1) the interest is in the average treatment effect for the treated, and (2) there is a large reservoir of potential controls, although recent work (Abadie and Imbens 2006) shows that matching estimators can be modified to estimate the overall average effect. The setting with many potential controls allows the researcher to match each treated unit to one or more distinct controls, hence the label "matching without replacement." Given the matched pairs, the treatment effect within a pair is estimated as the difference in outcomes, and the overall average as the average of the within-pair differences. Exploiting the representation of the estimator as a difference in two sample means, inference is based on standard methods for differences in means, or on methods for paired randomized experiments, ignoring any remaining bias.

Fully efficient matching algorithms, which take into account the effect of a particular choice of match for treated unit $i$ on the pool of potential matches for unit $j$, are computationally cumbersome. In practice, researchers use greedy algorithms that match units sequentially, most commonly ordering the units by the value of the propensity score and matching the highest propensity score units first. See Gu and Rosenbaum (1993) and Rosenbaum (1995) for discussions.

Abadie and Imbens (2006) study the formal asymptotic properties of matching estimators in a different setting, where both treated and control units are (potentially) matched and matching is done with replacement. Code for the Abadie–Imbens estimator is available in Matlab and Stata (see Abadie et al. 2004).10 Formally, given a sample $\{(Y_i, X_i, W_i)\}_{i=1}^{N}$, let $\ell_1(i)$ be the nearest neighbor to $i$, that is, $\ell_1(i)$ is equal to the integer $j \in \{1, \ldots, N\}$ such that $W_j \ne W_i$ and

\[
\lVert X_j - X_i \rVert = \min_{k: W_k \ne W_i} \lVert X_k - X_i \rVert.
\]

More generally, let $\ell_m(i)$ be the index that satisfies $W_{\ell_m(i)} \ne W_i$ and that is the $m$-th closest to unit $i$:

\[
\sum_{l: W_l \ne W_i} 1\!\left\{ \lVert X_l - X_i \rVert \le \lVert X_{\ell_m(i)} - X_i \rVert \right\} = m,
\]

where $1\{\cdot\}$ is the indicator function, equal to one if the expression in brackets is true and zero otherwise. In other words, $\ell_m(i)$ is the index of the unit in the opposite treatment group that is the $m$-th closest to unit $i$ in terms of the distance measure based on the norm $\lVert \cdot \rVert$. Let $\mathcal{M}(i) \subset \{1, \ldots, N\}$ denote the set of indices for the first $M$ matches for unit $i$: $\mathcal{M}(i) = \{\ell_1(i), \ldots, \ell_M(i)\}$.

10 See Sascha O. Becker and Andrea Ichino (2002) and Edwin Leuven and Barbara Sianesi (2003) for alternative Stata implementations of matching estimators.
of the outcomes for the matches, by defin- There are three caveats to the Abadie– ˆ ˆ ing ​ Y i ​(0) and Y​ i ​(1) as Imbens bias result. First, it is only the con- tinuous covariates that should be counted in

ˆ Yi if Wi 0, the dimension of the covariates. With dis- Yi(0) e = ​ ​ = 1/M ​ j (i)​ ​ ​ Yj if Wi 1, crete covariates the matching will be exact ∑ ∈M = in large samples, and as a result such cova-

1/M ​ j (i)​ ​ Yj if Wi 0, riates do not contribute to the order of the Yˆ (1) ∑ ∈M = ​ i​ e Y if W 1, bias. Second, if one matches only the treated, = i i = and the number of potential controls is much The simple matching estimator discussed in larger than the number of treated units, one Abadie and Imbens is then can justify ignoring the bias by appealing to N an asymptotic sequence where the number

ˆ 1 ˆ ˆ (19) match​ __ ​ ​ ​ A​ Y i ​(1) ​ Y i ​(0)B. of potential controls increases faster with τ = ​ N i∑1 − = the sample size than the number of treated Abadie and Imbens show that the bias of units. Specifically, if the number of controls, 1/K this estimator is of order O(N− ), where K N0, and the number of treated, N1, satisfy 4/K is the dimension of the covariates. Hence, if N1 N0​​ ​ 0, then the bias disappears in /​ → __ one studies the asymptotic distribution of the large samples after normalization by ​ N1 ​. __ √ estimator by normalizing by ​ √N ​ (as can be Third, even though the order of the bias may justified by the fact that the variance of the be high, the actual bias may still be small estimator is of order O(1/N)), the bias does if the coefficients in the leading term are not disappear if the dimension of the covari- small. This is possible if the biases for differ- ates is equal to two, and will dominate the ent units are at least partially offsetting. For large sample variance if K is at least three. To example, the leading term in the bias relies put this result in perspective, it is useful to on the regression function being nonlinear, relate it to bias properties of estimators based and the density of the covariates having a on kernel regression. Kernel estimators can nonzero slope. If either the regression func- be viewed as matching estimators where tion is well approximated by a linear func- all observations within some bandwidth hN tion, or the density is approximately flat, the receive some weight. As the sample size N bias may be fairly limited. increases, the bandwidth hN shrinks, but Abadie and Imbens (2006) also show sufficiently slow in order to ensure that the that matching estimators are generally not number of units receiving non-zero weights efficient. Even in the case where the bias diverges. 
If all the weights are positive, the is of low enough order to be dominated by bias for kernel estimators would generally be the variance, the estimators do not reach worse. In order to achieve root-N consistency, the efficiency bound given a fixed number 38 Journal of Economic Literature, Vol. XLVII (March 2009) of matches. To reach the bound the num- on estimating (x) E[Y (w) X x] for μw = i | i = ber of matches would need to increase with w 0, 1 and averaging the difference as in = the sample size. If M , with M N 0, (11), and the second is based on estimating → ∞ / → then the matching estimator is essentially the propensity score e(x) pr(W 1 X x) = i = | i = like a nonparametric regression estima- and using that to weight the outcomes as in tor. However, it is not clear that using an (18). For each approach, we have discussed approximation based on a sequence with estimators that achieve the asymptotic effi- an increasing number of matches improves ciency bound. If we have large sample sizes, the accuracy of the approximation. Given relative to the dimension of Xi, we might that in an actual data set one uses a spe- think our nonparametric estimators of the cific number of matches,M , it would appear conditional means or propensity score are appropriate to calculate the asymptotic sufficiently accurate to invoke the asymptotic variance conditional on that number, rather efficiency results described above. than approximate the distribution as if this In other cases, however, we might choose number is large. Calculations in Abadie and flexible parametric models without being Imbens show that the efficiency loss from confident that they necessarily approximate even a very small number of matches is the means or propensity score well. As we quite modest, and so the concerns about the discussed earlier, one reason for viewing esti- inefficiency of matching estimators may not mators of conditional means or propensity be very relevant in practice. 
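A minimal sketch of the matching-with-replacement estimator (19), using a single match ($M = 1$) on a scalar covariate with the absolute-distance metric. The simulated data are an illustrative assumption, and the brute-force nearest-neighbor search stands in for the more careful implementations cited above:

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative simulated data: scalar X, confounded assignment, true effect 1.0.
N = 1000
X = rng.normal(size=N)
W = (rng.uniform(size=N) < 1 / (1 + np.exp(-X))).astype(int)
Y = 1.0 * W + 2 * X + rng.normal(scale=0.5, size=N)

M = 1  # a single match, the leading case discussed above
X1, Y1 = X[W == 1], Y[W == 1]
X0, Y0 = X[W == 0], Y[W == 0]

def impute(x, Xs, Ys):
    """Average outcome of the M nearest opposite-group neighbors of x
    (matching with replacement: a neighbor can be reused across units)."""
    idx = np.argsort(np.abs(Xs - x))[:M]
    return Ys[idx].mean()

# Impute the missing potential outcome for every unit, then average as in (19).
y1_hat = np.where(W == 1, Y, np.array([impute(x, X1, Y1) for x in X]))
y0_hat = np.where(W == 0, Y, np.array([impute(x, X0, Y0) for x in X]))
tau_match = np.mean(y1_hat - y0_hat)
```

With a scalar covariate ($K = 1$) the bias term is of order $O(N^{-1})$ and therefore asymptotically negligible, consistent with the Abadie–Imbens result discussed above.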
Little is known about the optimal number of matches, or about data-dependent ways of choosing it.

All of the distance metrics used in practice standardize the covariates in some manner. Abadie and Imbens use a diagonal matrix, with each diagonal element equal to the inverse of the corresponding covariate variance. The most common metric is the Mahalanobis metric, which is based on the inverse of the full covariance matrix of the covariates. Zhao (2004), in an interesting discussion of the choice of metrics, suggests some alternatives that depend on the correlations between covariates, treatment assignment, and outcomes. So far there is little experience with any metrics beyond the inverse-of-the-variances and the Mahalanobis metrics. Zhao (2004) reports the results of some simulations using his proposed metrics, finding no clear winner given his specific design.

5.6 Combining Regression and Propensity Score Weighting

In sections 5.3 and 5.4, we described methods for estimating average causal effects based on two strategies: the first is based on estimating $\mu_w(x) = E[Y_i(w) \mid X_i = x]$ for $w = 0, 1$ and averaging the difference as in (11), and the second is based on estimating the propensity score $e(x) = \Pr(W_i = 1 \mid X_i = x)$ and using it to weight the outcomes as in (18). For each approach, we have discussed estimators that achieve the asymptotic efficiency bound. If we have large sample sizes, relative to the dimension of $X_i$, we might think our nonparametric estimators of the conditional means or the propensity score are sufficiently accurate to invoke the asymptotic efficiency results described above.

In other cases, however, we might choose flexible parametric models without being confident that they necessarily approximate the conditional means or the propensity score well. As we discussed earlier, one reason for viewing estimators of conditional means or propensity scores as flexible parametric models is that doing so greatly simplifies standard error calculations for the treatment effect estimates. In such cases, one might want to adopt a strategy that combines regression and propensity score methods in order to achieve some robustness to misspecification of the parametric models.

It may be helpful to think about the analogy to omitted variable bias. Suppose we are interested in the coefficient on $W_i$ in the (long) linear regression of $Y_i$ on a constant, $W_i$, and $X_i$. Suppose we omit $X_i$ from the long regression, and just run the short regression of $Y_i$ on a constant and $W_i$. The bias in the estimate from the short regression is equal to the product of the coefficient on $X_i$ in the long regression and the coefficient on $W_i$ in a regression of $X_i$ on a constant and $W_i$. Weighting can be interpreted as removing the correlation between $W_i$ and $X_i$, and regression as removing the direct effect of $X_i$. Weighting therefore removes the bias from omitting $X_i$ from the regression.
As a result, Score Weighting i combining regression and weighting can lead In sections 5.3 and 5.4, we describe meth- to additional robustness by both removing ods for estimating average causal effects the correlation between the omitted covari- based on two strategies: the first is based ates, and by reducing the ­correlation between Imbens and Wooldridge: Econometrics of Program Evaluation 39 the omitted and included variables. This is (2007), weighting the objective function the idea behind the doubly-robust estima- by any nonnegative function of Xi does not tors developed in Robins and Rotnitzky affect consistency of least squares.11 As a (1995), Robins, Rotnitzky and Lue Ping Zhao result, even if the logit model for the propen- (1995), and Mark J. van der Laan and Robins sity score is misspecified, the binary response

Suppose we model the two regression functions as μ_w(x) = α_w + β_w′(x − X̄), for w = 0, 1 (where we abuse notation a bit and insert the sample averages of the covariates for their population means). More generally, we may use a nonlinear model for the conditional expectation, or just a more flexible linear approximation. Suppose we model the propensity score as e(x) = p(x; γ), for example as p(x; γ) = exp(γ_0 + x′γ_1)/(1 + exp(γ_0 + x′γ_1)). In the first step, we estimate γ by maximum likelihood and obtain the estimated propensity scores as ê(X_i) = p(X_i; γ̂). In the second step, we use linear regression, where we weight the objective function by the inverse probability of treatment or nontreatment. Specifically, to estimate (α_0, β_0) and (α_1, β_1), we would solve the weighted least squares problems

(20)  min_{α_0, β_0}  Σ_{i: W_i = 0} (Y_i − α_0 − β_0′(X_i − X̄))² / (1 − p(X_i; γ̂)),

and

      min_{α_1, β_1}  Σ_{i: W_i = 1} (Y_i − α_1 − β_1′(X_i − X̄))² / p(X_i; γ̂).

Given the estimated conditional mean functions, we estimate τ_PATE using the expression τ̂_reg = α̂_1 − α̂_0 as in equation (13). But what is the motivation for weighting by the inverse propensity score when we did not use such weighting in section 5.3? The motivation is the double robustness result due to Robins and Rotnitzky (1995); see also Daniel O. Scharfstein, Rotnitzky, and Robins (1999).

First, suppose that the conditional expectation is indeed linear, or E[Y_i(w) | X_i = x] = α_w + β_w′(x − X̄). Then, as discussed in the treatment effect context by Wooldridge (2003), the MLE γ̂ still has a well-defined probability limit, say γ*, and the IPW estimator that uses weights 1/p(X_i; γ̂) for treated observations and 1/(1 − p(X_i; γ̂)) for control observations is asymptotically equivalent to the estimator that uses weights based on γ*.[12] It does not matter that for some x, e(x) ≠ p(x; γ*). This is the first part of the double robustness result: if the parametric conditional means for E[Y(w) | X = x] are correctly specified, the model for the propensity score can be arbitrarily misspecified for the true propensity score. Equation (20) still leads to a consistent estimator for τ_PATE.[11]

When the conditional means are correctly specified, weighting will generally hurt in terms of asymptotic efficiency. The optimal weight is the inverse of the variance, and in general there is no reason to expect that weighting by the inverse of (one minus) the propensity score gives a good approximation to that. Specifically, under homoskedasticity of Y_i(w), so that σ_w² = σ_w²(x), in the context of least squares, the IPW estimator of (α_w, β_w) is less efficient than the unweighted estimator; see Wooldridge (2007). The motivation for propensity score weighting is different: it offers a robustness advantage for estimating τ_PATE.

The second part of the double robustness result assumes that the logit model (or an alternative binary response model) is correctly specified for the propensity score, so that e(x) = p(x; γ*), but allows the conditional mean functions to be misspecified.

[11] More generally, it does not affect the consistency of any quasi-likelihood method that is robust for estimating the parameters of the conditional mean. These are likelihoods in the linear exponential family, as described in C. Gourieroux, A. Monfort, and A. Trognon (1984a, 1984b).
[12] See Wooldridge (2007).
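As a concrete illustration, the two-step procedure behind equation (20) (a logit model for the propensity score, followed by inverse-probability-weighted least squares in each treatment arm) can be sketched in a few lines. This is a minimal numpy sketch, not code from the literature; the function names and the simulated design in the usage note are ours.

```python
import numpy as np

def fit_logit(X, w, n_iter=25):
    """Fit a logit propensity-score model p(x; g) by Newton-Raphson.
    X: (n, k) covariates (no intercept column); w: (n,) 0/1 treatment.
    Returns the fitted propensity scores e_hat(X_i)."""
    Z = np.column_stack([np.ones(len(w)), X])      # add intercept
    g = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ g))
        grad = Z.T @ (w - p)                       # score of the log-likelihood
        hess = -(Z * (p * (1 - p))[:, None]).T @ Z # Hessian (negative definite)
        g -= np.linalg.solve(hess, grad)           # Newton ascent step
    return 1.0 / (1.0 + np.exp(-Z @ g))

def dr_weighted_regression(y, w, X):
    """Weighted least squares of Y on centered covariates within each arm,
    with weights 1/e_hat (treated) and 1/(1 - e_hat) (controls), as in
    equation (20); returns tau_hat = alpha1_hat - alpha0_hat."""
    e = fit_logit(X, w)
    Xc = np.column_stack([np.ones(len(y)), X - X.mean(axis=0)])
    alpha = {}
    for arm, wt in ((0, 1.0 / (1.0 - e)), (1, 1.0 / e)):
        m = (w == arm)
        sw = np.sqrt(wt[m])                        # WLS via sqrt-weight rescaling
        coef, *_ = np.linalg.lstsq(Xc[m] * sw[:, None], y[m] * sw, rcond=None)
        alpha[arm] = coef[0]                       # intercept estimates E[Y(arm)]
    return alpha[1] - alpha[0]
```

On simulated data with a linear outcome model and a logit assignment rule (both correctly specified, so either half of the double robustness argument applies), the estimator recovers the constant treatment effect.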

The result is that in that case α̂_w → E[Y_i(w)], and thus τ̂ = α̂_1 − α̂_0 → E[Y_i(1)] − E[Y_i(0)] = τ_PATE, and the estimator is still consistent. Let the weight for control observations be λ_i = (1 − p(X_i; γ*))⁻¹ / Σ_{j: W_j = 0} (1 − p(X_j; γ*))⁻¹. Then the least squares estimator for α̂_0 is

(21)  α̂_0 = Σ_{i=1}^N (1 − W_i) λ_i × (Y_i − β̂_0′(X_i − X̄)).

The weights imply that E[(1 − W_i) λ_i Y_i] = E[Y_i(0)] and E[(1 − W_i) λ_i (X_i − X̄)] = E[X_i − X̄] = 0, and as a result α̂_0 → E[Y_i(0)]. Similarly, the average of the predicted values for Y_i(1) converges to E[Y_i(1)], and so the resulting estimator τ̂_ipw = α̂_1 − α̂_0 is consistent for τ_PATE and τ_CATE irrespective of the shape of the regression functions. This is the second part of the double robustness result, at least for linear regression.

For certain kinds of responses, including binary responses, fractional responses, and count responses, linearity of E[Y_i(w) | X_i = x] is a poor assumption. Using linear conditional expectations for limited dependent variables effectively abdicates the first part of the double robustness result. Instead, we should use coherent models of the conditional means, as well as a sensible model for the propensity score, with the hope that the mean functions, the propensity score, or both are correctly specified. Beyond specifying logically coherent models for E[Y_i(w) | X_i = x] so that the first part of double robustness has a chance, for the second part we need to choose functional forms and estimators with the following property: even when the mean functions are misspecified, E[Y_i(w)] = E[μ(X_i, δ_w*)], where δ_w* is the probability limit of δ̂_w. Fortunately, for the common kinds of limited dependent variables used in applications, such functional forms and estimators exist; see Wooldridge (2007) for further discussion.

Once we estimate τ based on (20), how should we obtain a standard error? The normalized variance still has the form V_0 + V_1, where V_w = E[(α̂_w − μ_w)²]. One option is to exploit the representation of α̂_0 as a weighted average of Y_i + β̂_0′(X_i − X̄), and use the naive variance estimator based on weighted least squares with known weights:

(22)  V̂_0 = Σ_{i: W_i = 0} λ̂_i² · (Y_i + β̂_0′(X_i − X̄) − α̂_0)²,

and similarly for V_1. In general, we may again want to adjust for the estimation of the parameters in γ. See Wooldridge (2007) for details.

Although combining weighting and regression is more attractive than either weighting or regression on their own, it still requires at least one of the two specifications to be accurate globally. It has been used regularly in the epidemiology literature, partly through the efforts of Robins and his coauthors, but has not been widely used in the economics literature.

5.7 Subclassification and Regression

We can also combine subclassification with regression. The advantage relative to weighting and regression is that we do not use global approximations to the regression function. The idea is that within stratum j, we estimate the average treatment effect by regressing the outcome on a constant, an indicator for the treatment, and the covariates, instead of simply taking the difference in averages by treatment status as in section 5.4. The latter can be viewed as a regression estimate based on a regression with only an intercept and the treatment indicator. The further regression adjustment simply adds (some of) the covariates to that regression. The key difference with using regression in the full sample is that, within a stratum, the propensity score varies relatively little. As a result, the covariate distributions are similar, and the regression function is not used to extrapolate far out of sample.

To be precise, we estimate, on the observations with B_ij = 1, the regression function

Y_i = α_j + τ_j · W_i + β_j′ X_i + ε_i,

by least squares, obtaining the estimates τ̂_j and estimated variances V̂_j. Dropping X_i from this regression leads to τ̂_j = Ȳ_j1 − Ȳ_j0, which is the blocking estimator we discussed in section 5.4. We average the estimated stratum-specific average treatment effects, weighted by the relative stratum size:

τ̂ = Σ_{j=1}^J ((N_j0 + N_j1)/N) · τ̂_j,

with estimated variance

V̂ = Σ_{j=1}^J ((N_j0 + N_j1)/N)² · V̂_j.

With a modest number of strata, this already leads to an estimator that is considerably more flexible and robust than either subclassification alone, or regression alone. It is probably one of the more attractive estimators in practice. Imbens and Rubin (forthcoming) suggest data-dependent methods for choosing the number of strata.

5.8 Matching and Regression

Once we have the N pairs (Ŷ_i(0), Ŷ_i(1)), the simple matching estimator given in (19) averages the difference. This estimator may still be biased due to discrepancies between the covariates of the matched observations and their matches. One can attempt to reduce this bias by using regression methods. This use of regression is very different from using regression methods on the full sample. Here the covariate distributions are likely to be similar in the matched sample, and so the regression is not used to extrapolate far out of sample.

The idea behind the regression adjustment is to replace Ŷ_i(0) and Ŷ_i(1) by

Ŷ_i(0) = Y_i if W_i = 0, and (1/M) Σ_{j ∈ M(i)} (Y_j + β̂_0′(X_i − X_j)) if W_i = 1,

Ŷ_i(1) = (1/M) Σ_{j ∈ M(i)} (Y_j + β̂_1′(X_i − X_j)) if W_i = 0, and Y_i if W_i = 1,

where the average of the matched outcomes is adjusted by the difference in covariates relative to the matched observation. The only question left is how to estimate the regression coefficients β_0 and β_1. For various methods, see D. Quade (1982), Rubin (1979), and Abadie and Imbens (2006). The methods differ in whether the difference in outcomes is modeled as linear in the difference in covariates, or the original conditional outcome distributions are approximated by linear regression functions, and on what sample the regression functions are estimated.

Here is one simple regression adjustment. To be clear, it is useful to introduce some additional notation. Given the set of matching indices M(i), define

X̂_i(0) = X_i if W_i = 0, and (1/M) Σ_{j ∈ M(i)} X_j if W_i = 1,

X̂_i(1) = (1/M) Σ_{j ∈ M(i)} X_j if W_i = 0, and X_i if W_i = 1,

and let β̂_w be based on a regression of Ŷ_i(w) on a constant and X̂_i(w):

(α̂_w, β̂_w′)′ = ( Σ_{i=1}^N (1, X̂_i(w)′)′ (1, X̂_i(w)′) )⁻¹ Σ_{i=1}^N (1, X̂_i(w)′)′ Ŷ_i(w).

Like the combination of subclassification and regression, this leads to relatively robust estimators. Abadie and Imbens (2008a) find that the method works well in simulations based on the LaLonde data.

5.9 A General Method for Estimating Variances

For some of the estimators discussed in the previous sections, particular variance estimators have been used. Assuming that a particular parametric model is valid, one can typically use standard methods based on likelihood theory or generalized method of moments theory. Often, these methods rely on consistent estimation of components of the variance. Here we discuss two general methods for estimating variances that apply to all estimators.

The first approach is to use bootstrapping (Bradley Efron and Robert J. Tibshirani 1993; A. C. Davison and D. V. Hinkley 1997; Joel L. Horowitz 2001). Bootstrapping has been widely used in the treatment effects literature, as it is straightforward to implement. It has rarely been formally justified, although in many cases it is likely to be valid given that many of the estimators are asymptotically linear. However, in some cases it is known that bootstrapping is not valid. Abadie and Imbens (2008a) show that, for a fixed number of matches, bootstrapping is not valid for matching estimators. It is likely that the problems that invalidate the bootstrap disappear if the number of matches increases with the sample size (thus, the bootstrap might be valid for kernel estimators). Nevertheless, because in practice researchers often use a small number of matches, or nonnegative kernels, it is not clear whether the bootstrap is an effective method for obtaining standard errors and constructing confidence intervals. In cases where bootstrapping is not valid, often subsampling (Dimitris N. Politis, Joseph P. Romano, and Michael Wolf 1999) remains valid, but this has not been applied in practice.

There is an alternative, general, method for estimating variances of treatment effect estimators, developed by Abadie and Imbens (2006), that does not require additional nonparametric estimation. First, recall that most estimators are of the form

τ̂ = Σ_{i=1}^N λ_i · Y_i,  with Σ_{i: W_i = 1} λ_i = 1 and Σ_{i: W_i = 0} λ_i = −1,

with the weights λ_i generally functions of all covariates and all treatment indicators. Conditional on the covariates and the treatment indicators (and thus relative to τ_CATE), the variance of such an estimator is

V(τ̂ | X_1, …, X_N, W_1, …, W_N) = Σ_{i=1}^N λ_i² · σ_{W_i}²(X_i).

In order to use this representation, we need estimates of σ_{W_i}²(X_i) for all i. Fortunately, these need not be consistent estimates, as long as the estimation errors are not too highly correlated, so that the weighted average of the estimates is consistent for the weighted average of the variances. This is similar to the way robust (Huber-Eicker-White) standard errors allow for general forms of heteroskedasticity without having to consistently estimate the conditional variance function.

Abadie and Imbens (2006) suggested using a matching estimator for σ_{W_i}²(X_i). The idea behind this matching variance estimator is that if we can find two treated units with X_i = x, we can estimate σ_1²(x) as σ̂_1²(x) = (Y_i − Y_j)²/2. In general, it is difficult to find exact matches, but, again, this is not necessary. Instead, one uses the closest match within the set of units with the same treatment status. Let ν(i) be the unit closest to i, with the same treatment indicator (W_ν(i) = W_i), so that

‖X_ν(i) − X_i‖ = min_{j: W_j = W_i, j ≠ i} ‖X_j − X_i‖.

Then we can estimate σ_{W_i}²(X_i) as

σ̂_{W_i}²(X_i) = (Y_i − Y_ν(i))² / 2.

This way we can estimate σ_{W_i}²(X_i) for all units. Note that these are not consistent estimators of the conditional variances. As the sample size increases, the bias of these estimators will disappear, just as we saw that the bias of the matching estimator for the average treatment effect disappears under similar conditions. We then use these estimates of the conditional variance to estimate the variance of the estimator:

V̂(τ̂) = Σ_{i=1}^N λ̂_i² · σ̂_{W_i}²(X_i).

An extension to allow for clustering has been developed by Samuel Hanson and Adi Sunderam (2008).

5.10 Overlap in Covariate Distributions

In practice, a major concern in applying methods under the assumption of unconfoundedness is lack of overlap in the covariate distributions. In fact, once one is committed to the unconfoundedness assumption, this may well be the main problem facing the analyst. The overlap issue was highlighted in papers by Dehejia and Wahba (1999) and Heckman, Ichimura, and Todd (1998). Dehejia and Wahba reanalyzed data on a job training program originally analyzed by LaLonde (1986). LaLonde (1986) had attempted to replicate results from an experimental evaluation of a job training program, the National Supported Work (NSW) program, using a comparison group constructed from two public use data sets, the Panel Study of Income Dynamics (PSID) and the Current Population Survey (CPS). The NSW program targeted individuals who were disadvantaged, with very poor labor market histories. As a result, they were very different from the raw comparison groups constructed by LaLonde from the CPS and PSID. LaLonde partially addressed this problem by limiting his raw comparison samples based on single covariate criteria (e.g., limiting it to individuals with zero earnings in the year prior to the program). Dehejia and Wahba looked at this problem more systematically and found that a major concern is the lack of overlap in the covariate distributions.

Traditionally, overlap in the covariate distributions was assessed by looking at summary statistics of the covariate distributions by treatment status. As discussed before in the introduction to section 5, it is particularly useful to report differences in average covariates normalized by the square root of the sum of the within-treatment-group variances. In table 2, we report, for the LaLonde data, averages and standard deviations of the basic covariates, and the normalized difference. For four out of the ten covariates the means are more than a standard deviation apart. This immediately suggests that the technical task of adjusting for differences in the covariates is a challenging one. Although reporting normalized differences in covariates by treatment status is a sensible starting point, inspecting differences one covariate at a time is not generally sufficient. Even if all these differences are small, there may still be areas with limited overlap. Formally, we are concerned with regions in the covariate space where the density of covariates in one treatment group is zero and the density in the other treatment group is not. This corresponds to the propensity score being equal to zero or one. Therefore, a more direct way of assessing the overlap in covariate distributions is to inspect histograms of the estimated propensity score by treatment status.

Once it has been established that overlap is a concern, several strategies can be used. We briefly discuss two of the earlier specific suggestions, and then describe in more detail two general methods. In practice, researchers have often simply dropped observations with propensity score close to zero or one, with the actual cutoff value chosen in an ad hoc fashion. Dehejia and Wahba (1999) focus on the average effect for the treated. After estimating the propensity score, they find the smallest

Table 2
Balance Improvements in the LaLonde Data (Dehejia-Wahba Sample)

                     CPS Controls    NSW Treated          Normalized Difference Treated - Controls
                     (15,992)        (185)           All       ê(X_i) ≥ e_1   P-score   Maha
Covariate            mean (s.d.)     mean (s.d.)     (16,177)  (6,286)        (370)     (370)

Age                  33.23 (11.05)   25.82 (7.16)    −0.56     −0.25          −0.08     −0.16
Education            12.03 (2.87)    10.35 (2.01)    −0.48     −0.30          −0.02     −0.09
Married               0.71 (0.45)     0.19 (0.39)    −0.87     −0.46          −0.01     −0.20
Nodegree              0.30 (0.46)     0.71 (0.46)     0.64      0.42           0.08      0.18
Black                 0.07 (0.26)     0.84 (0.36)     1.72      1.45          −0.02      0.00
Hispanic              0.07 (0.26)     0.06 (0.24)    −0.04     −0.22          −0.02      0.00
Earn '74             14.02 (9.57)     2.10 (4.89)    −1.11     −0.40          −0.07      0.00
Earn '74 positive     0.88 (0.32)     0.29 (0.46)    −1.05     −0.72          −0.07      0.00
Earn '75             13.65 (9.27)     1.53 (3.22)    −1.23     −0.35          −0.02     −0.01
Earn '75 positive     0.89 (0.31)     0.40 (0.49)    −0.84     −0.54          −0.09      0.00

value of the estimated propensity score among the treated units, e_1 = min_{i: W_i = 1} ê(X_i). They then drop all control units with an estimated propensity score lower than this threshold e_1. The idea behind this suggestion is that control units with very low values for the propensity score may be so different from treated units that including them in the analysis is likely to be counterproductive. (In effect, the population over which the treatment effects are calculated is redefined.) A concern is that the results may be sensitive to the choice of the specific threshold e_1. If, for example, one used as the threshold the K-th order statistic of the estimated propensity score among the treated (Lechner 2002a, 2002b), the results might change considerably. In the column of table 2 labeled ê(X_i) ≥ e_1, we report the normalized difference (normalized using the same denominator, equal to the square root of the sum of the within-treatment-group sample variances) after removing 9,891 (out of a total of 16,177) control observations whose estimated propensity score was smaller than the smallest value of the estimated propensity score among the treated, e_1 = 0.00051. This improves the covariate balance, but many of the normalized differences are still substantial.

Heckman, Ichimura, and Todd (1997) and Heckman et al. (1998) develop a different method. They focus on estimation of the set where the density of the propensity score conditional on the treatment is bounded away from zero for both treatment regimes. Specifically, they first estimate the density functions f(e | W = w), for w = 0, 1, nonparametrically. They then evaluate the estimated density f̂(ê(X_i) | W_i = 0) at all N values X_i, and do the same for the estimated density f̂(ê(X_i) | W_i = 1). Given these 2N values, they calculate the (2N · q)-th order statistic of these 2N estimated densities. Denote this order statistic by f̂_q. Then, for each unit i, they compare the estimated density f̂(ê(X_i) | W_i = 0) to f̂_q, and f̂(ê(X_i) | W_i = 1) to f̂_q. If either of those estimated densities is below the order statistic, the observation gets dropped from the analysis. Smith and Todd (2005) implement this method with q = 0.02, but provide no motivation for the choice of the threshold.
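The normalized difference reported in tables 2 and 3 is straightforward to compute; a minimal numpy sketch (the function name is ours), which, unlike a t-statistic, does not grow with the sample size:

```python
import numpy as np

def normalized_differences(X, w):
    """Normalized covariate differences: (mean_treated - mean_control)
    divided by the square root of the sum of the within-treatment-group
    sample variances, one value per covariate column."""
    Xt, Xc = X[w == 1], X[w == 0]
    num = Xt.mean(axis=0) - Xc.mean(axis=0)
    den = np.sqrt(Xt.var(axis=0, ddof=1) + Xc.var(axis=0, ddof=1))
    return num / den
```

For example, a covariate with control values (0, 2) and treated values (3, 5) has mean difference 3 and within-group variances 2 and 2, giving a normalized difference of 3/√4 = 1.5.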

5.10.1 Matching to Improve Overlap in Covariate Distributions

A systematic method for dropping control units who are different from the treated units is to construct a matched sample. This approach has been pushed by Rubin in a series of studies; see Rubin (2006). It is designed for settings where the interest is in the average effect for the treated (e.g., as in the LaLonde application). It relies on the control sample being larger than the treated sample, and works especially well when the control sample is much larger.

First, the treated observations are ordered, typically by decreasing values of the estimated propensity score, since treated observations with high values of the propensity score are generally more difficult to match. Then the first treated unit (e.g., the one with the highest value for the estimated propensity score) is matched to the nearest control unit. Next, the second treated unit is matched to the nearest control unit, excluding the control unit that was used as a match for the first treated unit. Matching without replacement all treated units in this manner leads to a sample of 2 × N_1 units (where N_1 is the size of the original treated subsample), half of them treated and half of them control units.

Note that the matching is not necessarily used here as the final analysis. We do not propose to estimate the average treatment effect for the treated by averaging the differences within the pairs. Instead, this is intended as a preliminary analysis, with the goal being the construction of a sample with more overlap. Given a more balanced sample, one can use any of the previously discussed methods for estimating the average effect of the treatment, including regression, propensity score methods, or matching. Using those methods on the balanced sample is likely to reduce bias relative to using the simple difference in averages by treatment status.

The last two columns in table 2 report the balance in the ten covariates after constructing a matched sample in this fashion. In both cases the treated units were matched in reverse order of the estimated propensity score. The "P-score" column is based on matching on the estimated propensity score, and the "Maha" column is based on matching on all the covariates, using the Mahalanobis metric (the inverse of the covariance matrix of the covariates). Matching, either on the estimated propensity score or on the full set of covariates, dramatically improves the balance. Whereas before some of the covariates differed by as much as 1.7 times a standard deviation, now the normalized differences are all less than one tenth of a standard deviation. The remaining differences are not negligible, however. For example, average differences in 1974 earnings are still on the order of $700, which, given the experimental estimate from the LaLonde (1986) paper of about $2,000, is substantial. As a result, simple estimators such as the average of the within-matched-pair differences are not likely to lead to credible estimates. Nevertheless, maintaining unconfoundedness, this matched sample is sufficiently well balanced that one may be able to obtain credible and robust estimates from it in a way that the original sample would not allow.

5.10.2 Trimming to Improve Overlap in Covariate Distributions

Matching without replacement does not work if the estimand of interest is the overall average treatment effect. For that case, Crump et al. (2009) suggest an easily implementable way of selecting the subpopulation with overlap, consistent with the current practice of dropping observations with propensity score values close to zero or one. Their method is generally applicable and in particular does not require that the control sample is larger than the treated sample. They consider estimation of the average treatment effect for the subpopulation with X_i ∈ 𝔸. They suggest choosing the set 𝔸 from the set of all subsets of the covariate space to minimize the asymptotic variance of the efficient estimator of the average treatment effect for that set. Under some conditions (in particular homoskedasticity), they show that the optimal set 𝔸* depends only on the value of the propensity score. This method suggests discarding observations with a propensity score less than α away from the two extremes, zero and one:

𝔸* = {x ∈ 𝕏 | α ≤ e(x) ≤ 1 − α},

where α satisfies a condition based on the marginal distribution of the propensity score:

1/(α · (1 − α)) = 2 · E[ 1/(e(X) · (1 − e(X))) | 1/(e(X) · (1 − e(X))) < 1/(α · (1 − α)) ].

Based on empirical examples and numerical calculations with beta distributions for the propensity score, Crump et al. (2009) suggest that the rule of thumb fixing α at 0.10 gives good results.

To illustrate this method, table 3 presents summary statistics for data from Imbens, Rubin, and Sacerdote (2001) on lottery players, including "winners" who won big prizes, and "losers" who did not. Even though winning the lottery is obviously random, variation in the number of tickets bought, and nonresponse, creates imbalances in the covariate distributions. In the full sample (sample size N = 496), some of the covariates differ by as much as 0.64 standard deviations. Following the Crump et al. calculations leads to a bound of 0.0914. Discarding the observations with an estimated propensity score outside the interval [0.0914, 0.9086] leads to a sample size of 388. In this subsample, the largest normalized difference is 0.35, about half of what it is in the full sample, with this improvement obtained by dropping approximately 20 percent of the original sample.

A potentially controversial feature of all these methods is that they change what is being estimated. Instead of estimating τ_PATE, the Crump et al. (2009) approach estimates τ_CATE,𝔸. This results in reduced external validity, but it is likely to improve internal validity.

5.11 Assessing the Unconfoundedness Assumption

The unconfoundedness assumption used in section 5 is not testable. It states that the conditional distribution of the outcome under the control treatment, given receipt of the active treatment and covariates, is identical to the distribution of the control outcome conditional on being in the control group and covariates. A similar assumption is made for the distribution of the treatment outcome. Yet since the data are completely uninformative about the distribution of Y_i(0) for those who received the active treatment and of Y_i(1) for those receiving the control, the data can never reject the unconfoundedness assumption. Nevertheless, there are often indirect ways of assessing this assumption. The most important of these were developed in Rosenbaum (1987) and Heckman and Hotz (1989). Both methods rely on testing the null hypothesis that an average causal effect is zero, where the particular average causal effect is known to equal zero. If the testing procedure rejects the null hypothesis, this is interpreted as weakening the support for the unconfoundedness assumption. These tests can be divided into two groups.

The first set of tests focuses on estimating the causal effect of a treatment that is known not to have an effect. It relies on the presence of two or more control groups (Rosenbaum 1987). Suppose one has two potential control groups, for example eligible nonparticipants and ineligibles, as in Heckman, Ichimura, and

Table 3
Balance Improvements in the Lottery Data

                       Losers          Winners            Normalized Difference Treated - Controls
                       (259)           (237)           All         0.0914 ≤ ê(X_i) ≤ 0.9086
Covariate              mean (s.d.)     mean (s.d.)     (496)       (388)

Year Won               1996.4 (1.0)    1996.1 (1.3)    −0.19       −0.13
Tickets Bought          2.19 (1.77)     4.57 (3.28)     0.64        0.33
Age                     53.2 (12.9)     47.0 (13.8)    −0.33       −0.19
Male                    0.67 (0.47)     0.58 (0.49)    −0.13       −0.09
Years of Schooling      14.4 (2.0)      13.0 (2.2)     −0.50       −0.35
Working Then            0.77 (0.42)     0.80 (0.40)     0.06        0.02
Earnings Year −6        15.6 (14.5)     12.0 (11.8)    −0.19       −0.10
Earnings Year −5        16.0 (15.0)     12.1 (12.0)    −0.20       −0.12
Earnings Year −4        16.2 (15.4)     12.0 (12.1)    −0.21       −0.15
Earnings Year −3        16.6 (16.3)     12.8 (12.7)    −0.18       −0.14
Earnings Year −2        17.6 (16.9)     13.5 (13.0)    −0.19       −0.15
Earnings Year −1        18.0 (17.2)     14.5 (13.6)    −0.16       −0.14
Pos Earnings Year −6    0.69 (0.46)     0.70 (0.46)     0.02        0.05
Pos Earnings Year −5    0.68 (0.47)     0.74 (0.44)     0.10        0.07
Pos Earnings Year −4    0.69 (0.46)     0.73 (0.44)     0.07        0.02
Pos Earnings Year −3    0.68 (0.47)     0.73 (0.44)     0.09        0.02
Pos Earnings Year −2    0.68 (0.47)     0.74 (0.44)     0.10        0.04
Pos Earnings Year −1    0.69 (0.46)     0.74 (0.44)     0.07        0.02

Todd (1997). One can estimate a "pseudo" average treatment effect by analyzing the data from these two control groups as if one of them is the treatment group. In that case, the treatment effect is known to be zero, and statistical evidence of a nonzero effect implies that at least one of the control groups is invalid. Again, not rejecting the test does not imply the unconfoundedness assumption is valid (as both control groups could suffer the same bias), but nonrejection in the case where the two control groups could potentially have different biases makes it more plausible that the unconfoundedness assumption holds. The key for the power of this test is to have available control groups that are likely to have different biases, if they have any at all. Comparing ineligibles and eligible nonparticipants as in Heckman, Ichimura, and Todd (1997) is a particularly attractive comparison. Alternatively, one may use geographically distinct comparison groups, for example from areas bordering on different sides of the treatment group.

To be more specific, let G_i be an indicator variable denoting the membership of the group, taking on three values, G_i ∈ {−1, 0, 1}. For units with G_i = −1 or 0, the treatment indicator W_i is equal to 0:

W_i = 0 if G_i ∈ {−1, 0}, and W_i = 1 if G_i = 1.

Unconfoundedness only requires that

(23)  Y_i(0), Y_i(1) ⫫ W_i | X_i,

and this is not testable. Instead we focus on testing an implication of the stronger conditional independence relation

(24)  Y_i(0), Y_i(1) ⫫ G_i | X_i.

This independence condition implies (23), but in contrast to that assumption, it also implies testable restrictions. In particular, we focus on the implication that

(25)  Y_i(0) ⫫ G_i | X_i, G_i ∈ {−1, 0}  ⇔  Y_i ⫫ G_i | X_i, G_i ∈ {−1, 0},

because G_i ∈ {−1, 0} implies that Y_i = Y_i(0).

Because condition (24) is slightly stronger than unconfoundedness, the question is whether there are interesting settings where the weaker condition of unconfoundedness holds, but not the stronger condition. To discuss this question, it is useful to consider two alternative conditional independence conditions, both of which are implied by (24):

(26)  (Y_i(0), Y_i(1)) ⫫ W_i | X_i, G_i ∈ {−1, 1},

and

(27)  (Y_i(0), Y_i(1)) ⫫ W_i | X_i, G_i ∈ {0, 1}.

If (26) holds, then we can estimate the average causal effect by invoking the unconfoundedness assumption using only the first control group. Similarly, if (27) holds, then we can estimate the average causal effect by invoking the unconfoundedness assumption using only the second control group. The point is that it is difficult to envision a situation where unconfoundedness based on the two comparison groups holds—that is, (23) holds—but it does not hold using only one of the two comparison groups at a time. In practice, it seems likely that if unconfoundedness holds, then so would the stronger condition (24), and we have the testable implication (25).

Next, we turn to implementation of the tests. We can simply test whether there is a difference in average values of Y_i between the two control groups, after adjusting for differences in X_i. That is, we effectively test whether

E[ E[Y_i | G_i = −1, X_i] − E[Y_i | G_i = 0, X_i] ] = 0.

More generally, we may wish to test

E[Y_i | G_i = −1, X_i = x] − E[Y_i | G_i = 0, X_i = x] = 0

for all x in the support of X_i, using the methods discussed in Crump et al. (2008b). We can also include transformations of the basic outcomes in the procedure to test for differences in other aspects of the conditional distributions.

A second set of tests of unconfoundedness focuses on estimating the causal effect of the treatment on a variable known to be unaffected by it, typically because its value is determined prior to the treatment itself. Such a variable can be time-invariant, but the most interesting case is in considering the treatment effect on a lagged outcome. If it is not zero, this implies that the treated observations are distinct from the controls; namely, that the distribution of Y_i(0) for the treated units is not comparable to the distribution of Y_i(0) for the controls. If the treatment effect is instead zero, it is more plausible that the unconfoundedness assumption holds. Of course this does not directly test the unconfoundedness assumption; in this setting, being able to reject the null of no effect does not directly reflect on the hypothesis of interest, unconfoundedness. Nevertheless, if the variables used in this proxy test are closely related to the outcome of interest, the test arguably has more power. For these tests it is clearly helpful to have a number of lagged outcomes.
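A minimal version of this placebo test regresses the pseudo outcome (for example, the most recent preprogram outcome) on the treatment indicator and the remaining covariates, and inspects the t-statistic on the treatment indicator. This is a sketch with classical standard errors, not the specific procedures cited above; the function name and the simulated check in the usage note are ours.

```python
import numpy as np

def lagged_outcome_placebo(y_pseudo, w, X_r):
    """Placebo test sketch: regress the pseudo outcome (e.g. Y_{i,-1}) on
    the treatment indicator and the remaining covariates (e.g. earlier
    lags). Returns the coefficient on W and its classical t-statistic;
    a large |t| casts doubt on unconfoundedness."""
    n = len(y_pseudo)
    Z = np.column_stack([np.ones(n), w, X_r])
    coef, *_ = np.linalg.lstsq(Z, y_pseudo, rcond=None)
    resid = y_pseudo - Z @ coef
    sigma2 = resid @ resid / (n - Z.shape[1])      # residual variance
    cov = sigma2 * np.linalg.inv(Z.T @ Z)          # classical OLS covariance
    return coef[1], coef[1] / np.sqrt(cov[1, 1])
```

In a simulated check, a treatment assigned independently of the past produces a small t-statistic, while a treatment that selects directly on the lagged outcome produces a large one.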

First partition the vector of covariates Xi into two parts, a (scalar) pseudo outcome, denoted by Xi^p, and the remainder, denoted by Xi^r, so that Xi = (Xi^p, Xi^r′)′. Now we will assess whether the following conditional independence relation holds:

(28) Xi^p ǁ Wi | Xi^r.

The two issues are, first, the interpretation of this condition and its relationship to the unconfoundedness assumption, and second, the implementation of the test.

The first issue concerns the link between the conditional independence relation in (28) and original unconfoundedness. This link, by necessity, is indirect, as unconfoundedness cannot be tested directly. Here we lay out the arguments for the connection. First consider a related condition:

(29) Yi(0), Yi(1) ǁ Wi | Xi^r.

If this modified unconfoundedness condition were to hold, one could use the adjustment methods using only the subset of covariates Xi^r. In practice, though not necessarily, this is a stronger condition than the original unconfoundedness condition, which requires conditioning on the full vector Xi. One has to be careful here, because it is theoretically possible that unconfoundedness holds conditional on a subset of the covariates, but at the same time does not hold conditional on the full set of covariates. In practice this situation is rare, though. For example, it is difficult to imagine in an evaluation of a labor market program that unconfoundedness would hold given age and the level of education, but not if one additionally conditions on gender. Generally, making subpopulations more homogeneous in pretreatment variables tends to improve the plausibility of unconfoundedness.

The modified unconfoundedness condition (29) is not testable for the same reasons the original unconfoundedness assumption is not testable. Nevertheless, if one has a proxy for either of the potential outcomes, and in particular a proxy that is observed irrespective of the treatment status, one can test independence for that proxy variable. We use the pseudo outcome Xi^p as such a proxy variable. That is, we view Xi^p as a proxy for, say, Yi(0), and assess (29) by testing (28).

The most convincing applications of these assessments are settings where the two links are plausible. One of the leading examples is where Xi contains multiple lagged measures of the outcome. For example, in the evaluation of the effect of a labor market program on annual earnings, one might have observations on earnings for, say, six years prior to the program. Denote these lagged outcomes by Yi,−1, …, Yi,−6, where Yi,−1 is the most recent and Yi,−6 is the most distant preprogram earnings measure. One could implement the above ideas using earnings for the most recent preprogram year Yi,−1 as the pseudo outcome Xi^p, so that the vector of remaining pretreatment variables Xi^r would still include the five prior years of preprogram earnings Yi,−2, …, Yi,−6 (ignoring additional pretreatment variables). In that case one might reasonably argue that if unconfoundedness holds given six years of preprogram earnings, it is plausible that it would also hold given only five years of preprogram earnings. Moreover, under unconfoundedness Yi(0) is independent of Wi given Yi,−1, …, Yi,−6, which would suggest that it is plausible that Yi,−1 is independent of Wi given Yi,−2, …, Yi,−6. Given those arguments, one can plausibly assess unconfoundedness by testing whether

Yi,−1 ǁ Wi | Yi,−2, …, Yi,−6.

The implementation of the tests is the same as in the first set of tests for assessing unconfoundedness.
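As a rough illustration, one way to implement such a pseudo-outcome check is to regress the most recent preprogram outcome on the treatment indicator and the remaining lags, and examine the coefficient on the treatment. The sketch below is not the authors' implementation; it simulates data in which the condition holds by construction, and all variable names and parameter values are invented.

```python
# Sketch of a pseudo-outcome test: regress Y_{i,-1} on the treatment
# indicator W_i and the remaining lags Y_{i,-2},...,Y_{i,-6}.  Under
# the condition being assessed, the coefficient on W_i should be
# statistically indistinguishable from zero.  Simulated data only.
import numpy as np

rng = np.random.default_rng(0)
n = 2000
lags = rng.normal(size=(n, 5))                        # Y_{i,-2},...,Y_{i,-6}
y_m1 = lags @ np.full(5, 0.3) + rng.normal(size=n)    # pseudo outcome Y_{i,-1}
# treatment depends only on the lags, so the condition holds by construction
w = rng.binomial(1, 1 / (1 + np.exp(-lags[:, 0])))

# OLS of the pseudo outcome on (1, W, lags), with a conventional t-statistic
X = np.column_stack([np.ones(n), w, lags])
coef, *_ = np.linalg.lstsq(X, y_m1, rcond=None)
resid = y_m1 - X @ coef
sigma2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
t_stat = coef[1] / se
print(f"coefficient on W: {coef[1]:.3f} (t = {t_stat:.2f})")
```

A large t-statistic here would cast doubt on unconfoundedness given the remaining lags; a small one is supportive but, as emphasized above, does not establish the assumption.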

We can simply test whether the estimates of the average difference between the groups adjusted for differences in Xi^r are zero, or test whether the average difference is zero for all values of the covariates (e.g., Crump et al. 2008).

5.12 Testing

Most of the focus in the evaluation literature has been on estimating average treatment effects. Testing has largely been limited to the null hypothesis that the average effect is zero. In that case testing is straightforward since many estimators exist for the average treatment effect that are approximately normally distributed in large samples with zero asymptotic bias. In addition there is some testing based on the Fisher approach using the randomization distribution. In many cases, however, there are other null hypotheses of interest. Crump et al. (2008) develop tests of the null hypotheses of zero average effects conditional on the covariates, and of a constant average effect conditional on the covariates. Formally, in the first case the null hypothesis is

(30) H0 : τ(x) = 0, ∀x,

against the alternative hypothesis

Ha : τ(x) ≠ 0, for some x.

Recall that τ(x) = E[Yi(1) − Yi(0) | Xi = x] is the average effect for the subpopulation with covariate value x. The second hypothesis studied by Crump et al. (2008) is

(31) H0 : τ(x) = τPATE, ∀x,

against the alternative hypothesis

Ha : τ(x) ≠ τPATE, for some x.

Part of their motivation is that in many cases there is substantive interest in whether the program is beneficial for some groups, even if on average it does not affect outcomes.13 They show that in some data sets they reject the null hypothesis (30) even though they cannot reject the null hypothesis of a zero average effect.

13 A second motivation is that it may be impossible to obtain precise estimates for τPATE even in cases where one can convincingly reject some of the hypotheses regarding τ(x).

Taking the motivation in Crump et al. (2008) one step further, one may also be interested in testing the null hypothesis that the conditional distribution of Yi(0) given Xi = x is the same as the conditional distribution of Yi(1) given Xi = x. Under the maintained hypothesis of unconfoundedness, this is equivalent to testing the null hypothesis

H0 : Yi ǁ Wi | Xi,

against the alternative hypothesis that Yi is not independent of Wi given Xi. Tests of this type can be implemented using the methods of Linton and Pedro Gozalo (2003). There have been no applications of these tests in the program evaluation literature.

5.13 Selection of Covariates

A very important set of decisions in implementing all of the methods described in this section involves the choice of covariates to be included in the regression functions or the propensity score. Except for warnings about including covariates that are themselves influenced by the treatment (for example, Heckman and Salvador Navarro-Lozano 2004; Wooldridge 2005), the literature has not been very helpful. Consequently, researchers have just included all covariates linearly, without much systematic effort to find more compelling specifications. Most of the technical results using nonparametric methods include rates at which the smoothing parameters should change with the sample size. For example, one would have to choose the bandwidth if using kernel estimators, or the number of terms in the series if using series estimators. The program evaluation literature does not provide much guidance as to how to choose these smoothing parameters in practice. More generally, the nonparametric estimation literature has little to offer in this regard. Most of the results in this literature offer optimal choices for smoothing parameters if the criterion is integrated squared error. In the current setting the interest is in a scalar parameter, and the choice of smoothing parameter that is optimal for the regression function itself need not be close to optimal for the average treatment effect.

Hirano and Imbens (2001) consider an estimator that combines weighting with the propensity score and regression. In their application they have a large number of covariates, and they suggest deciding which ones to include on the basis of t-statistics. They find that the results are fairly insensitive to the actual cutoff point if they use the weighting/regression estimator, but find more sensitivity if they only use weighting or regression. They do not provide formal properties for these choices. Ichimura and Linton (2005) consider inverse probability weighting estimators and analyze the formal problem of bandwidth selection with the focus on the average treatment effect. Imbens, Newey, and Ridder (2005) look at series regression estimators and analyze the choice of the number of terms to be included, again with the objective being the average treatment effect. Imbens and Rubin (forthcoming) discuss some stepwise covariate selection methods for finding a specification for the propensity score. It is clear that more work needs to be done in this area, both for the case where the choice is which covariates to include from a large set of potential covariates, and in the case where the choice concerns functional form and functions of a small set of covariates.

6. Selection on Unobservables

In this section we discuss a number of methods that relax the pair of assumptions made in section 5. Unlike in the setting under unconfoundedness, there is not a unified set of methods for this case. In a number of special cases there are well-understood methods, but there are many cases without clear recommendations. We will highlight some of the controversies and different approaches. First we discuss some methods that simply drop the unconfoundedness assumption. Next, in section 6.2, we discuss sensitivity analyses that relax the unconfoundedness assumption in a more limited manner. In section 6.3, we discuss instrumental variables methods. Then, in section 6.4 we discuss regression discontinuity designs, and in section 6.5 we discuss difference-in-differences methods.

6.1 Bounds

In a series of papers and books, Manski (1990, 1995, 2003, 2005, 2007) has developed a general framework for inference in settings where the parameters of interest are not identified. Manski's key insight is that even if in large samples one cannot infer the exact value of the parameter, one may be able to rule out some values that one could not rule out a priori. Prior to Manski's work, researchers had typically dismissed models that are not point-identified as not useful in practice. This framework is not restricted to causal settings, and the reader is referred to Manski (2007) for a general discussion of the approach. Here we limit the discussion to program evaluation settings.

We start by discussing Manski's perspective in a very simple case. Suppose we have no covariates and a binary outcome Yi ∈ {0, 1}.

Let the goal be inference for the average effect in the population, τPATE. We can decompose the population average treatment effect as

τPATE = E[Yi(1) | Wi = 1] · pr(Wi = 1)
 + E[Yi(1) | Wi = 0] · pr(Wi = 0)
 − E[Yi(0) | Wi = 1] · pr(Wi = 1)
 − E[Yi(0) | Wi = 0] · pr(Wi = 0).

Of the eight components of this expression, we can estimate six. The data contain no information about the remaining two, E[Yi(1) | Wi = 0] and E[Yi(0) | Wi = 1]. Because the outcome is binary, and before seeing any data, we can deduce that these two conditional expectations must lie inside the interval [0, 1], but we cannot say any more without additional assumptions. This implies that without additional assumptions we can be sure that

τPATE ∈ [τl, τu],

where we can express the lower and upper bound in terms of estimable quantities,

τl = E[Yi(1) | Wi = 1] · pr(Wi = 1) − pr(Wi = 1) − E[Yi(0) | Wi = 0] · pr(Wi = 0),

and

τu = E[Yi(1) | Wi = 1] · pr(Wi = 1) + pr(Wi = 0) − E[Yi(0) | Wi = 0] · pr(Wi = 0).

In other words, we can bound the average treatment effect. In this example the bounds are tight, meaning that without additional assumptions we cannot rule out any value inside the bounds. See Manski et al. (1992) for an empirical example of these particular bounds.

In this specific case the bounds are not particularly informative. The width of the bounds, the difference τu − τl, with τl and τu given above, is always equal to one, implying we can never rule out a zero average treatment effect. (In some sense this is obvious: if we refrain from making any assumptions regarding the treatment effects, we cannot rule out that the treatment effect is zero for any unit.) In general, however, we can add some assumptions, short of making an assumption as strong as unconfoundedness that gets us back to the point-identified case. With such weaker assumptions we may be able to tighten the bounds and obtain informative results, without making strong assumptions that strain credibility. The presence of covariates increases the scope for additional assumptions that may tighten the bounds.
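A minimal sketch of the sample analogs of these bounds, together with the widened interval attributed to Imbens and Manski (2004) in this section, can be written as follows; the simulated data and the bootstrap standard errors are purely illustrative.

```python
# Sample analogs of the Manski bounds for a binary outcome, plus an
# Imbens–Manski style interval (lower bound minus 1.645 standard
# errors, upper bound plus 1.645 standard errors).  Simulated data.
import numpy as np

rng = np.random.default_rng(1)
n = 5000
w = rng.binomial(1, 0.4, n)
y = rng.binomial(1, np.where(w == 1, 0.6, 0.3))

p1 = w.mean()                      # pr(W = 1)
ybar1 = y[w == 1].mean()           # E[Y | W = 1]
ybar0 = y[w == 0].mean()           # E[Y | W = 0]

tau_l = ybar1 * p1 - p1 - ybar0 * (1 - p1)
tau_u = ybar1 * p1 + (1 - p1) - ybar0 * (1 - p1)
print(f"bounds: [{tau_l:.3f}, {tau_u:.3f}], width = {tau_u - tau_l:.3f}")

# crude bootstrap standard errors for the two bound estimates
draws = []
for _ in range(200):
    idx = rng.integers(0, n, n)
    wb, yb = w[idx], y[idx]
    pb = wb.mean()
    m1, m0 = yb[wb == 1].mean(), yb[wb == 0].mean()
    draws.append((m1 * pb - pb - m0 * (1 - pb),
                  m1 * pb + (1 - pb) - m0 * (1 - pb)))
se_l, se_u = np.std(np.array(draws), axis=0)
ci = (tau_l - 1.645 * se_l, tau_u + 1.645 * se_u)
print(f"interval for tau_PATE: [{ci[0]:.3f}, {ci[1]:.3f}]")
```

Note that the estimated width is exactly one regardless of the data, as the text points out, so the bounds themselves can never exclude zero here.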
Examples of such assumptions include those in the spirit of instrumental variables, where some covariates are known not to affect the potential outcomes (e.g., Manski 2007), or monotonicity assumptions where expected outcomes are monotonically related to covariates or treatments (e.g., Manski and John V. Pepper 2000). For an application of these methods, see Hotz, Charles H. Mullin, and Seth G. Sanders (1997). We return to some of these settings in section 6.3.

This discussion has focused on identification and demonstrated what can be learned in large samples. In practice these bounds need to be estimated, which leads to additional uncertainty regarding the estimands. A fast developing literature (e.g., Horowitz and Manski 2000; Imbens and Manski 2004; Chernozhukov, Hong, and Elie Tamer 2007; Arie Beresteanu and Francesca Molinari 2006; Romano and Azeem M. Shaikh 2006a, 2006b; Ariel Pakes et al. 2006; Adam M. Rosen 2006; Donald W. K. Andrews and Gustavo Soares 2007; Ivan A. Canay 2007; and Jörg Stoye 2007) discusses the construction of confidence intervals in general settings with partial identification. One point of contention in this literature has been whether the focus should be on confidence intervals for the parameter of interest (τPATE in this case), or for the identified set. Imbens and Manski (2004) develop confidence sets for the parameter. In large samples, and at a 95 percent confidence level, the Imbens–Manski confidence intervals amount to taking the lower bound minus 1.645 times the standard error of the lower bound and the upper bound plus 1.645 times its standard error. The reason for using 1.645 rather than 1.96 is to take account of the fact that, even in the limit, the width of the confidence set will not shrink to zero, and therefore one only needs to be concerned with one-sided errors. Chernozhukov, Hong, and Tamer (2007) focus on confidence sets that include the entire partially identified set itself with fixed probability. For a given confidence level, the latter approach generally leads to larger confidence sets than the Imbens–Manski approach. See also Romano and Shaikh (2006a, 2006b) for subsampling approaches to inference in these settings.

6.2 Sensitivity Analysis

Unconfoundedness has traditionally been seen as an all or nothing assumption: either it is satisfied and one proceeds accordingly using the methods appropriate under unconfoundedness, such as matching, or the assumption is deemed implausible and one considers alternative methods. The latter include the bounds approach discussed in section 6.1, as well as approaches relying on alternative assumptions, such as instrumental variables, which will be discussed in section 6.3. However, there is an important alternative that has received much less attention in the economics literature. Instead of completely relaxing the unconfoundedness assumption, the idea is to relax it slightly. More specifically, violations of unconfoundedness are interpreted as evidence of the presence of unobserved covariates that are correlated, both with the potential outcomes and with the treatment indicator. The size of the bias these violations of unconfoundedness can induce depends on the strength of these correlations. Sensitivity analyses investigate whether results obtained under the maintained assumption of unconfoundedness can be changed substantially, or even overturned entirely, by modest violations of the unconfoundedness assumption.

To be specific, consider a job training program with voluntary enrollment. Suppose that we have monthly labor market histories for a two-year period prior to the program. We may be concerned that individuals choosing to enroll in the program are more motivated to find a job than those that choose not to enroll in the program. This unobserved motivation may be related to subsequent earnings both in the presence and in the absence of training. Conditioning on the recent labor market histories of individuals may limit the bias associated with this unobserved motivation, but it need not eliminate it entirely. However, we may be willing to limit how highly correlated unobserved motivation is with the enrollment decision and the earnings outcomes in the two regimes, conditional on the labor market histories. For example, if we compare two individuals with the same labor market history for the last two years, e.g., not employed the last six months and working the eighteen months before, and both with one two-year-old child, it may be reasonable to assume that these two cannot differ radically in their unobserved motivation given that their recent labor market outcomes have been so similar. The sensitivity analyses developed by Rosenbaum and Rubin (1983a) formalize this idea and provide a tool for making such assessments.
Imbens (2003) applies this sensitivity analysis to data from labor market training programs.

The second approach is associated with work by Rosenbaum (1995). Similar to the Rosenbaum–Rubin approach, Rosenbaum's method relies on an unobserved covariate that generates the deviations from unconfoundedness. The analysis differs in that sensitivity is measured using only the relation between the unobserved covariate and the treatment assignment, with the focus on the correlation required to overturn, or change substantially, p-values of statistical tests of no effect of the treatment.

6.2.1 The Rosenbaum–Rubin Approach to Sensitivity Analysis

The starting point is that unconfoundedness is satisfied only conditional on the observed covariates Xi and an unobserved scalar covariate Ui:

Yi(0), Yi(1) ǁ Wi | Xi, Ui.

This setup in itself is not restrictive, although once parametric assumptions are made the assumption of a scalar unobserved covariate Ui is restrictive.

Now consider both the conditional distribution of the potential outcomes given observed and unobserved covariates, and the conditional probability of assignment given observed and unobserved covariates. Rather than attempting to estimate both these conditional distributions, the idea behind the sensitivity analysis is to specify the form and the amount of dependence of these conditional distributions on the unobserved covariate, and estimate only the dependence on the observed covariates. Conditional on the specification of the first part, estimation of the latter is typically straightforward. The idea is then to vary the amount of dependence of the conditional distributions on the unobserved covariate and assess how much this changes the point estimate of the average treatment effect.

Typically the sensitivity analysis is done in fully parametric settings, although since the models can be arbitrarily flexible, this is not particularly restrictive. Following Rosenbaum and Rubin (1983b), we illustrate this approach in a setting with binary outcomes. See Imbens (2003) and Lee (2005b) for examples in economics. Rosenbaum and Rubin (1983a) fix the marginal distribution of the unobserved covariate to be binomial with p = pr(Ui = 1), and assume independence of Ui and Xi. They specify a logistic distribution for the treatment assignment:

pr(Wi = 1 | Xi = x, Ui = u)
 = exp(α0 + α1′x + α2·u) / [1 + exp(α0 + α1′x + α2·u)].

They also specify logistic regression functions for the two potential outcomes:

pr(Yi(w) = 1 | Xi = x, Ui = u)
 = exp(βw0 + βw1′x + βw2·u) / [1 + exp(βw0 + βw1′x + βw2·u)].

For the subpopulation with Xi = x and Ui = u, the average treatment effect is

E[Yi(1) − Yi(0) | Xi = x, Ui = u]
 = exp(β10 + β11′x + β12·u) / [1 + exp(β10 + β11′x + β12·u)]
 − exp(β00 + β01′x + β02·u) / [1 + exp(β00 + β01′x + β02·u)].

The average treatment effect τCATE can be expressed in terms of the parameters of this model and the distribution of the observable covariates by averaging over Xi, and integrating out the unobserved covariate Ui:

τCATE ≡ τ(p, α2, β02, β12, α0, α1, β00, β01, β10, β11)

 = (1/N) Σ_{i=1}^{N} [ p · ( exp(β10 + β11′Xi + β12) / [1 + exp(β10 + β11′Xi + β12)]
   − exp(β00 + β01′Xi + β02) / [1 + exp(β00 + β01′Xi + β02)] )
  + (1 − p) · ( exp(β10 + β11′Xi) / [1 + exp(β10 + β11′Xi)]
   − exp(β00 + β01′Xi) / [1 + exp(β00 + β01′Xi)] ) ].

We do not know the values of the parameters (p, α, β), but the data are somewhat informative about them. One conventional approach would be to attempt to estimate all parameters, and then use those estimates to obtain an estimate for the average treatment effect. Given the specific parametric model this may be possible, although in general this would be difficult given the inclusion of unobserved covariates in the basic model. A second approach, as discussed in section 6.1, is to derive bounds on τ given the model and the data. A sensitivity analysis offers a third approach.

The Rosenbaum–Rubin sensitivity analysis proceeds by dividing the parameters into two sets. The first set includes the parameters that would be set to boundary values under unconfoundedness, (α2, β02, β12), plus the parameter p capturing the marginal distribution of the unobserved covariate Ui. Together we refer to these as the sensitivity parameters, θsens = (p, α2, β02, β12). The second set consists of the remaining parameters, θother = (α0, α1, β00, β01, β10, β11). The idea is that θsens is difficult to estimate. Estimates of the other parameters under unconfoundedness could be obtained by fixing α2 = β02 = β12 = 0 and p at an arbitrary value. The data are not directly informative about the effect of an unobserved covariate in the absence of functional form assumptions, and so attempts to estimate θsens are unlikely to be effective. Given θsens, however, estimating the remaining parameters is considerably easier. In the second step the plan is therefore to fix the first set of parameters, estimate the others by maximum likelihood, and then translate this into an estimate for τ. Thus, for fixed θsens, we first estimate the remaining parameters through maximum likelihood:

θ̂other(θsens) = arg max_θother L(θother | θsens),

where L(·) is the logarithm of the likelihood function. Then we consider the function

τ(θsens) = τ(θsens, θ̂other(θsens)).

Finally, in the third step, we consider the range of values of the function τ(θsens) for a reasonable set of values for the sensitivity parameters θsens, and obtain a set of values for τCATE.

The key question is how to choose the set of reasonable values for the sensitivity parameters. If we do not wish to restrict this set at all, we end up with unrestricted bounds along the lines of section 6.1. The power from the sensitivity approach comes from the researcher's willingness to put real limits on the values of the sensitivity parameters (p, α2, β02, β12). Among these parameters it is difficult to put real limits on p, and typically it is fixed at 1/2, with little sensitivity to its choice. The more interesting parameters are (α2, β02, β12). Let us assume that the effect of the unobserved covariate is the same in both treatment arms, β2 ≡ β02 = β12, so that there are only two parameters left to fix, α2 and β2. Imbens (2003) suggests linking the parameters to the effects of the observed covariates on assignment and potential outcomes. Specifically, he suggests calculating the partial correlations between observed covariates and the treatment and potential outcomes, and then as a benchmark looking at the sensitivity to an unobserved covariate that has partial correlations with treatment and potential outcomes as high as any of the observed covariates. For example, Imbens considers, in the labor market training example, what the effect would be of omitting unobserved motivation, if in fact motivation had as much explanatory power for future earnings and for treatment choice as did earnings in the year prior to the training program. A bounds analysis, in contrast, would implicitly allow unobserved motivation to completely determine both selection into the program and future earnings. Even though putting hard limits on the effect of motivation on earnings and treatment choice may be difficult, it may be reasonable to put some limits on it, and the Rosenbaum–Rubin sensitivity analysis provides a useful framework for doing so.

6.2.2 Rosenbaum's Method for Sensitivity Analysis

Rosenbaum (1995) developed a slightly different approach. The advantage of his approach is that it requires fewer tuning parameters than the Rosenbaum–Rubin approach. Specifically, it only requires the researcher to consider the effect unobserved confounders may have on the probability of treatment assignment. Rosenbaum's focus is on the effect the presence of unobserved covariates could have on the p-value for the test of no effect of the treatment based on the unconfoundedness assumption, in contrast to the Rosenbaum–Rubin focus on point estimates for average treatment effects. Consider two units i and j with the same value for the covariates, xi = xj. If the unconfoundedness assumption conditional on Xi holds, both units must have the same probability of assignment to the treatment, e(xi) = e(xj). Now suppose unconfoundedness only holds conditional on both Xi and a binary unobserved covariate Ui. In that case the assignment probabilities for these two units may differ. Rosenbaum suggests bounding the ratio of the odds ratios e(xi)/(1 − e(xi)) and e(xj)/(1 − e(xj)):

1/Γ ≤ [e(xi) · (1 − e(xj))] / [(1 − e(xi)) · e(xj)] ≤ Γ.

If Γ = 1, we are back in the setting with unconfoundedness. If we allow Γ = ∞, we are not restricting the association between the treatment indicator and the potential outcomes. Rosenbaum investigates how much the odds would have to differ in order to substantially change the p-value. Or, starting from the other side, he investigates for fixed values of Γ what the implication is for the p-value.

For example, suppose that a test of the null hypothesis of no effect has a p-value of 0.0001 under the assumption of unconfoundedness. If the data suggest it would take the presence of an unobserved covariate that changes the odds of participation by a factor of ten in order to increase that p-value to 0.05, then one would likely consider the result to be very robust. If instead a small change in the odds of participation, say with a value of Γ = 1.5, would be sufficient for a change of the p-value to 0.05, the study would be much less robust.
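As a rough illustration of the three-step Rosenbaum–Rubin procedure of section 6.2.1, the sketch below strips the problem down by omitting observed covariates Xi entirely: θsens = (p, α2, β2) is fixed, θother is estimated by maximum likelihood, and τ(θsens) is traced over a small grid. This is not the authors' implementation; the simulated data, starting values, and grid are all invented.

```python
# Minimal Rosenbaum–Rubin sensitivity sketch with binary Y, W and no
# observed covariates.  The likelihood integrates out the binary
# unobserved covariate U.  Simulated data only.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n = 5000
u = rng.binomial(1, 0.5, n)                      # unobserved covariate
w = rng.binomial(1, 1 / (1 + np.exp(-(-0.5 + 1.0 * u))))
y = rng.binomial(1, 1 / (1 + np.exp(-(-0.2 + 0.6 * w + 0.8 * u))))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def negloglik(theta_other, theta_sens):
    a0, b00, b10 = theta_other
    p, a2, b2 = theta_sens
    lik = np.zeros(n)
    for uu, pu in ((0.0, 1.0 - p), (1.0, p)):    # integrate out U
        pw = sigmoid(a0 + a2 * uu)
        py = sigmoid(np.where(w == 1, b10, b00) + b2 * uu)
        lik += pu * pw**w * (1 - pw)**(1 - w) * py**y * (1 - py)**(1 - y)
    return -np.log(np.clip(lik, 1e-300, None)).sum()

def tau(theta_sens):
    # step 2: maximum likelihood for theta_other given fixed theta_sens
    p, a2, b2 = theta_sens
    fit = minimize(negloglik, np.zeros(3), args=(theta_sens,),
                   method="Nelder-Mead")
    a0, b00, b10 = fit.x
    return (p * (sigmoid(b10 + b2) - sigmoid(b00 + b2))
            + (1 - p) * (sigmoid(b10) - sigmoid(b00)))

# step 3: trace tau over a grid of sensitivity parameter values
for a2 in (0.0, 1.0):
    for b2 in (0.0, 1.0):
        print(f"alpha2={a2}, beta2={b2}: tau = {tau((0.5, a2, b2)):.3f}")
```

At α2 = β2 = 0 the procedure reproduces the unconfounded (raw difference in means) estimate; the spread of τ across the rest of the grid is the sensitivity measure.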

6.3 Instrumental Variables

In this section, we review the recent literature on instrumental variables. We focus on the part of the literature concerned with heterogeneous effects. In the current section, we limit the discussion to the case with a binary endogenous variable. The early literature focused on identification of the population average treatment effect and the average effect on the treated. Identification of these estimands ran into serious problems once researchers wished to allow for unrestricted heterogeneity in the effect of the treatment. In an important early result, Bloom (1984) showed that if eligibility for the program is used as an instrument, then one can identify the average effect of the treatment for those who received the treatment. Key for the Bloom result is that the instrument changes the probability of receiving the treatment to zero. In order to identify the average effect on the overall population, the instrument would also need to shift the probability of receiving the treatment to one. This type of identification is sometimes referred to as identification at infinity (Gary Chamberlain 1986; Heckman 1990) in settings with a continuous instrument. The practical usefulness of such identification results is fairly limited outside of cases where eligibility is randomized. Finding a credible instrument is typically difficult enough, without also requiring that the instrument shift the probability of the treatment close to zero and one. In fact, the focus of the current literature on instruments that can credibly be expected to satisfy exclusion restrictions makes it even more difficult to find instruments that even approximately satisfy these support conditions. Imbens and Angrist (1994) got around this problem by changing the focus to average effects for the subpopulation that is affected by the instrument.

Initially we focus on the case with a binary instrument. This case provides some of the clearest insight into the identification problems. In that case the identification at infinity arguments are obviously not satisfied and so one cannot (point-)identify the population average treatment effect.

6.3.1 A Binary Instrument

Imbens and Angrist adopt a potential outcome notation for the receipt of the treatment, as well as for the outcome itself. Let Zi denote the value of the instrument for individual i. Let Wi(0) and Wi(1) denote the level of the treatment received if the instrument takes on the values 0 and 1 respectively. As before, let Yi(0) and Yi(1) denote the potential values for the outcome of interest. The observed treatment is, analogously to the relation between the observed outcome Yi and the potential outcomes Yi(0) and Yi(1),

Wi = Wi(0) · (1 − Zi) + Wi(1) · Zi = { Wi(0) if Zi = 0; Wi(1) if Zi = 1 }.

Exogeneity of the instrument is captured by the assumption that all potential outcomes are independent of the instrument, or

(Yi(0), Yi(1), Wi(0), Wi(1)) ǁ Zi.

Formulating exogeneity in this way is attractive compared to conventional residual-based definitions, as it does not require the researcher to specify a regression function in order to define the residuals. This assumption captures two properties of the instrument. First, it captures random assignment of the instrument, so that causal effects of the instrument on the outcome and on the treatment received can be estimated consistently. This part of the assumption, which is implied by explicit randomization of the instrument, as for example in the seminal draft lottery study by Angrist (1990), is not sufficient for causal interpretations of instrumental variables methods. The second part of the assumption captures an exclusion restriction that there is no direct effect of the instrument on the outcome. This second part is captured by the absence of z in the definition of the potential outcome Yi(w). This part of the assumption is not implied by randomization of the instrument, and it has to be argued on a case by case basis. See Angrist, Imbens, and Rubin (1996) for more discussion of the distinction between these two assumptions, and for a formulation that separates them.

Imbens and Angrist introduce a new concept, the compliance type of an individual. The type of an individual describes the level of the treatment that an individual would receive given each value of the instrument.

In other words, it is captured by the pair of values (Wi(0), Wi(1)). With both the treatment and instrument binary, there are four types of responses for the potential treatment. It is useful to define the compliance types explicitly:

       never-taker    if Wi(0) = Wi(1) = 0,
Ti =   complier       if Wi(0) = 0, Wi(1) = 1,
       defier         if Wi(0) = 1, Wi(1) = 0,
       always-taker   if Wi(0) = Wi(1) = 1.

The labels never-taker, complier, defier, and always-taker (e.g., Angrist, Imbens, and Rubin 1996) refer to the setting of a randomized experiment with noncompliance, where the instrument is the (random) assignment to the treatment and the endogenous regressor is an indicator for the actual receipt of the treatment. Compliers are in that case individuals who (always) comply with their assignment, that is, take the treatment if assigned to it and not take it if assigned to the control group. One cannot infer from the observed data (Zi, Wi, Yi) whether a particular individual is a complier or not. It is important not to confuse compliers (who comply with their actual assignment and would have complied with the alternative assignment) with individuals who are observed to comply with their actual assignment: that is, individuals who complied with the assignment they actually received, Zi = Wi. For such individuals we do not know what they would have done had their assignment been different, that is, we do not know the value of Wi(1 − Zi).

Imbens and Angrist then invoke an additional assumption they refer to as monotonicity. Monotonicity requires that Wi(1) ≥ Wi(0) for all individuals, or that increasing the level of the instrument does not decrease the level of the treatment. This assumption is equivalent to ruling out the presence of defiers, and it is therefore sometimes referred to as the "no-defiance" assumption (Alexander Balke and Pearl 1994; Pearl 2000). Note that in the Bloom set up with one-sided noncompliance both always-takers and defiers are absent by assumption.

Under these two assumptions (independence of all four potential outcomes (Yi(0), Yi(1), Wi(0), Wi(1)) and the instrument Zi, and monotonicity), Imbens and Angrist show that one can identify the average effect of the treatment for the subpopulation of compliers. Before going through their argument, it is useful to see why we cannot generally identify the average effect of the treatment for other subpopulations. Clearly, one cannot identify the average effect of the treatment for never-takers because they are never observed receiving the treatment, and so E[Yi(1) | Ti = n] is not identified. Thus, only compliers are observed in both treatment groups, so only for this group is there any chance of identifying the average treatment effect. In order to understand the positive component of the Imbens–Angrist result, that we can identify the average effect for compliers, it is useful to consider the subpopulations defined by instrument and treatment. Table 4 shows the information we have about the individual's type given the monotonicity assumption. Consider individuals with (Zi = 1, Wi = 0). Because of monotonicity such individuals can only be never-takers. Similarly, individuals with (Zi = 0, Wi = 1) can only be always-takers. However, consider individuals with (Zi = 0, Wi = 0). Such individuals can be either compliers or never-takers. We cannot infer the type of such individuals from the observed data alone. Similarly, individuals with (Zi = 1, Wi = 1) can be either compliers or always-takers.

The intuition for the identification result is as follows. The first step is to see that we can infer the population proportions of the three remaining subpopulations, never-takers, always-takers, and compliers (using the fact that the monotonicity assumption rules out the presence of defiers).
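The identification argument can be illustrated numerically. The simulation below is a sketch under assumed type shares and effect sizes (none of which come from the text): it checks that pr(Wi = 1 | Zi = 0) recovers the always-taker share, that pr(Wi = 0 | Zi = 1) recovers the never-taker share, and that the instrumental variables ratio recovers the compliers' average effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Assumed (not from the text) type shares: never-taker, complier, always-taker.
t = rng.choice(3, size=n, p=[0.3, 0.5, 0.2])   # 0 = n, 1 = c, 2 = a
z = rng.integers(0, 2, size=n)                  # randomized binary instrument

# Monotone potential treatments: W(0) = 1 only for always-takers,
# W(1) = 1 for compliers and always-takers (no defiers).
w0 = (t == 2).astype(int)
w1 = ((t == 1) | (t == 2)).astype(int)
w = z * w1 + (1 - z) * w0

# Potential outcomes with an assumed effect of 2.0 for compliers.
y0 = rng.normal(size=n) + 0.5 * t
y1 = y0 + np.where(t == 1, 2.0, 1.0)
y = np.where(w == 1, y1, y0)

p_a = w[z == 0].mean()          # pr(W = 1 | Z = 0): always-taker share
p_n = 1 - w[z == 1].mean()      # pr(W = 0 | Z = 1): never-taker share
p_c = 1 - p_n - p_a             # complier share, by subtraction

# The Wald / instrumental variables ratio recovers the compliers' effect.
late = (y[z == 1].mean() - y[z == 0].mean()) / (
    w[z == 1].mean() - w[z == 0].mean())
```

Never-takers and always-takers contribute nothing to the numerator, because their treatment status does not respond to the instrument; only the compliers' effect survives, scaled by their share, which the denominator removes.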
Call these population shares Pt = pr(Ti = t), for t ∈ {n, a, c}.

Table 4
Type by Observed Variables

                    Zi = 0                    Zi = 1
Wi = 0    Never-taker/Complier      Never-taker
Wi = 1    Always-taker              Always-taker/Complier

Consider the subpopulation with Zi = 0. Within this subpopulation we observe Wi = 1 only for always-takers. Hence the conditional probability of Wi = 1 given Zi = 0 is equal to the population share of always-takers: Pa = pr(Wi = 1 | Zi = 0). Similarly, in the subpopulation with Zi = 1 we observe Wi = 0 only for never-takers. Hence the population share of never-takers is equal to the conditional probability of Wi = 0 given Zi = 1: Pn = pr(Wi = 0 | Zi = 1). The population share of compliers is then obtained by subtracting the population shares of never-takers and always-takers from one: Pc = 1 − Pn − Pa.

The second step uses the distribution of Yi given (Zi, Wi). We can infer the distribution of Yi | Wi = 0, Ti = n from the subpopulation with (Zi, Wi) = (1, 0), since all these individuals are known to be never-takers. Then we use the distribution of Yi | Zi = 0, Wi = 0. This is a mixture of the distribution of Yi | Wi = 0, Ti = n and the distribution of Yi | Wi = 0, Ti = c, with mixture probabilities equal to the relative population shares, Pn/(Pc + Pn) and Pc/(Pc + Pn), respectively. Since we already inferred the population shares of the never-takers and compliers, as well as the distribution of Yi | Wi = 0, Ti = n, we can back out the conditional distribution of Yi | Wi = 0, Ti = c. Similarly we can infer the conditional distribution of Yi | Wi = 1, Ti = c. The difference between the means of these two conditional distributions is the Local Average Treatment Effect (LATE) (Imbens and Angrist 1994):

τLATE = E[Yi(1) − Yi(0) | Wi(0) = 0, Wi(1) = 1]

      = E[Yi(1) − Yi(0) | Ti = complier].

In practice one need not estimate the local average treatment effect by decomposing the mixture distributions directly. Imbens and Angrist show that LATE equals the standard instrumental variables estimand, the ratio of the covariance of Yi and Zi and the covariance of Wi and Zi:

τLATE = (E[Yi | Zi = 1] − E[Yi | Zi = 0]) / (E[Wi | Zi = 1] − E[Wi | Zi = 0])

      = E[Yi · (Zi − E[Zi])] / E[Wi · (Zi − E[Zi])],

which can be estimated using two-stage least squares. For applications using parametric models with covariates, see Hirano et al. (2000) and Fabrizia Mealli et al. (2004).

Earlier we argued that one cannot consistently estimate the average effect for either never-takers or always-takers in this setting. Nevertheless, we can still use the bounds approach from Manski (1990, 1995) to bound the average effect for the full population. To understand the nature of the bound, it is useful to decompose the average effect τPATE by compliance type (maintaining monotonicity, so there are no defiers):

τPATE = Pn · E[Yi(1) − Yi(0) | Ti = n]

      + Pa · E[Yi(1) − Yi(0) | Ti = a]

      + Pc · E[Yi(1) − Yi(0) | Ti = c].

The only quantities not consistently estimable are the average effects for never-takers and always-takers. Even for those we have some information. For example, we can write E[Yi(1) − Yi(0) | Ti = n] = E[Yi(1) | Ti = n] − E[Yi(0) | Ti = n]. The second term we can estimate, and the data are completely uninformative about the first term. Hence, if there are natural bounds on Yi(1) (for example, if the outcome is binary), we can use that to bound E[Yi(1) | Ti = n], and then in turn use that to bound τPATE. These bounds are tight. See Manski (1990), Toru Kitagawa (2008), and Balke and Pearl (1994).

6.3.2 Multivalued Instruments and Weighted Local Average Treatment Effects

The previous discussion was in terms of a single binary instrument. In that case there is no other average effect of the treatment that can be estimated consistently other than the local average treatment effect, τLATE. With a multivalued instrument, or with multiple binary instruments (still maintaining the setting of a binary treatment; see Angrist and Imbens (1995) and Card (2001) for extensions of the local average treatment effect concept to the multiple treatment case), we can estimate a variety of local average treatment effects. Let 핑 = {z1, … , zK} denote the set of values for the instruments. Initially we take the set of values to be finite. Then for each pair (zk, zl) with pr(Wi = 1 | Zi = zk) > pr(Wi = 1 | Zi = zl) one can define a local average treatment effect:

τLATE(zk, zl) = E[Yi(1) − Yi(0) | Wi(zl) = 0, Wi(zk) = 1].

We can combine these to estimate any weighted average of these local average treatment effects:

τLATE,λ = Σk,l λk,l · τLATE(zk, zl).

Imbens and Angrist show that the standard instrumental variables estimand, using g(Zi) as an instrument for Wi, is equal to a particular weighted average:

E[Yi · (g(Zi) − E[g(Zi)])] / E[Wi · (g(Zi) − E[g(Zi)])] = τLATE,λ,

for a particular set of nonnegative weights λ, as long as E[Wi | g(Zi) = g] increases in g.

Heckman and Vytlacil (2006) and Heckman, Sergio Urzua, and Vytlacil (2006) study the case with a continuous instrument. They use an additive latent single index setup where the treatment received is equal to

Wi = 1{h(Zi) + Vi ≥ 0},

where h(·) is strictly monotonic and the latent type Vi is independent of Zi. In general, in the presence of multiple instruments, this latent single index framework imposes substantive restrictions.14 Without loss of generality we can take the marginal distribution of Vi to be uniform.

14 See Vytlacil (2002) for a discussion in the case with binary instruments, where the latent index set up implies no loss of generality.

Given this framework, Heckman, Urzua, and Vytlacil (2006) define the marginal treatment effect as a function of the latent type v of an individual,

τMTE(v) = E[Yi(1) − Yi(0) | Vi = v].

In the single continuous instrument case, τMTE(v) is, under some differentiability and invertibility conditions, equal to a limit of local average treatment effects:

τMTE(v) = lim_{z ↓ h⁻¹(v)} τLATE(h⁻¹(v), z).

A parametric version of this concept goes back to work by Anders Björklund and Robert Moffitt (1987). All average treatment effects, including the overall average effect, the average effect for the treated, and any local average treatment effect, can now be expressed in terms of integrals of this marginal treatment effect, as shown in Heckman and Vytlacil (2005). For example, τPATE = ∫₀¹ τMTE(v) dv. A complication in practice is that not necessarily all the marginal treatment effects can be estimated. For example, if the instrument is binary, Zi ∈ {0, 1}, then for individuals with Vi < min(−h(0), −h(1)), it follows that Wi = 0, and for these never-takers we cannot estimate τMTE(v). Any average effect that requires averaging over such values of v is therefore also not point-identified. Moreover, average effects that can be expressed as integrals of τMTE(v) may be identified even if some of the τMTE(v) that are being integrated over are not identified. Again, in a binary instrument example with pr(Wi = 1 | Zi = 1) = 1 and pr(Wi = 1 | Zi = 0) = 0, the average treatment effect τPATE is identified, but τMTE(v) is not identified for any value of v.

6.4 Regression Discontinuity Designs

Regression discontinuity (RD) methods have been around for a long time in the psychology and applied statistics literature, going back to the early 1960s. For discussions and references from this literature, see Donald L. Thistlethwaite and Campbell (1960), William M. K. Trochim (2001), Shadish, Cook, and Campbell (2002), and Cook (2008). Except for some important foundational work by Goldberger (1972a, 1972b), it is only recently that these methods have attracted much attention in the economics literature. For some of the recent applications, see Van Der Klaauw (2002, 2008a), Lee (2008), Angrist and Victor Lavy (1999), DiNardo and Lee (2004), Kenneth Y. Chay and Michael Greenstone (2005), Card, Alexandre Mas, and Jesse Rothstein (2007), Lee, Enrico Moretti, and Matthew J. Butler (2004), Jens Ludwig and Douglas L. Miller (2007), Patrick J. McEwan and Joseph S. Shapiro (2008), Sandra E. Black (1999), Susan Chen and van der Klaauw (2008), Ginger Zhe Jin and Phillip Leslie (2003), Thomas Lemieux and Kevin Milligan (2008), Per Pettersson-Lidbom (2007, 2008), and Pettersson-Lidbom and Björn Tyrefors (2007). Key theoretical and conceptual contributions include the interpretation of estimates for fuzzy regression discontinuity designs allowing for general heterogeneity of treatment effects (Hahn, Todd, and van der Klaauw 2001), adaptive estimation methods (Yixiao Sun 2005), methods for bandwidth selection tailored to the RD setting (Ludwig and Miller 2005; Imbens and Karthik Kalyanaraman 2008), and various tests for discontinuities in means and distributions of nonaffected variables (Lee 2008; McCrary 2008) and for misspecification (Lee and Card 2008). For recent reviews in the economics literature, see van der Klaauw (2008b), Imbens and Lemieux (2008), and Lee and Lemieux (2008).

The basic idea behind the RD design is that assignment to the treatment is determined, either completely or partly, by the value of a predictor (the forcing variable Xi) being on either side of a common threshold. This generates a discontinuity, sometimes of size one, in the conditional probability of receiving the treatment as a function of this particular predictor. The forcing variable is often itself associated with the potential outcomes, but this association is assumed to be smooth. As a result, any discontinuity of the conditional distribution of the outcome as a function of this covariate at the threshold is interpreted as evidence of a causal effect of the treatment. The design often arises from administrative decisions, where the incentives for individuals to participate in a program are rationed for reasons of resource constraints, and clear transparent rules, rather than discretion by administrators, are used for the allocation of these incentives.

It is useful to distinguish between two general settings, the sharp and the fuzzy regression discontinuity designs (e.g., Trochim 1984, 2001; Hahn, Todd, and van der Klaauw 2001; Imbens and Lemieux 2008; van der Klaauw 2008b; Lee and Lemieux 2008).

6.4.1 The Sharp Regression Discontinuity Design

In the sharp regression discontinuity (SRD) design, the assignment Wi is a deterministic function of one of the covariates, the forcing (or treatment-determining) variable Xi:

Wi = 1[Xi ≥ c],

where 1[·] is the indicator function, equal to one if the event in brackets is true and zero otherwise. All units with a covariate value of at least c are in the treatment group (and participation is mandatory for these individuals), and all units with a covariate value less than c are in the control group (members of this group are not eligible for the treatment). In the SRD design, we focus on estimation of

(32) τSRD = E[Yi(1) − Yi(0) | Xi = c].

(Naturally, if the treatment effect is constant, then τSRD = τPATE.) Writing this expression as E[Yi(1) | Xi = c] − E[Yi(0) | Xi = c], we focus on identification of the two terms separately. By design there are no units with Xi = c for whom we observe Yi(0). To estimate E[Yi(w) | Xi = c] without making functional form assumptions, we exploit the possibility of observing units with covariate values arbitrarily close to c.15 In order to justify this averaging, we make a smoothness assumption that the two conditional expectations E[Yi(w) | Xi = x], for w = 0, 1, are continuous in x. Under this assumption, E[Yi(0) | Xi = c] = lim_{x↑c} E[Yi(0) | Xi = x] = lim_{x↑c} E[Yi | Xi = x], implying that

τSRD = lim_{x↓c} E[Yi | Xi = x] − lim_{x↑c} E[Yi | Xi = x],

where this expression uses the fact that Wi is a deterministic function of Xi (a key feature of the SRD). The statistical problem becomes one of estimating a regression function nonparametrically at a boundary point. We discuss the statistical problem in more detail in section 6.4.4.

6.4.2 The Fuzzy Regression Discontinuity Design

In the fuzzy regression discontinuity (FRD) design, the probability of receiving the treatment need not change from zero to one at the threshold. Instead, the design only requires a discontinuity in the probability of assignment to the treatment at the threshold:

lim_{x↓c} pr(Wi = 1 | Xi = x) ≠ lim_{x↑c} pr(Wi = 1 | Xi = x).

In practice, the discontinuity needs to be sufficiently large that typically it can be seen easily in simple graphical analyses. These discontinuities can arise if incentives to participate in a program change discontinuously at a threshold, without these incentives being powerful enough to move all units from nonparticipation to participation. In this design we look at the ratio of the jump in the regression of the outcome on the covariate to the jump in the regression of the treatment indicator on the covariate

as an average causal effect of the treatment. Formally, the functional of interest is

τFRD = (lim_{x↓c} E[Yi | Xi = x] − lim_{x↑c} E[Yi | Xi = x]) / (lim_{x↓c} E[Wi | Xi = x] − lim_{x↑c} E[Wi | Xi = x]).

Hahn, Todd, and van der Klaauw (2001) exploit the instrumental variables connection to interpret the fuzzy regression discontinuity design when the effect of the treatment varies by unit. They define a complier to be a unit whose participation is affected by the cutoff point. That is, a complier is someone with a value of the forcing variable Xi close to c, who would participate if c were chosen to be just below Xi, and not participate if c were chosen to be just above Xi. Hahn, Todd, and van der Klaauw then exploit that structure to show that, in combination with a monotonicity assumption,

τFRD = E[Yi(1) − Yi(0) | unit i is a complier and Xi = c].

The estimand τFRD is an average effect of the treatment, but only averaged for units with Xi = c (by regression discontinuity), and only for compliers (people who are affected by the threshold). Clearly the analysis generally does not have much external validity. It is only valid for the subpopulation who are compliers at the threshold, and it is only valid for the subpopulation with Xi = c. Nevertheless, the FRD analysis may do well in terms of internal validity.

It is useful to compare the RD method in this setting with standard methods based on unconfoundedness. In contrast to the SRD case, an unconfoundedness-based analysis is possible in the FRD setting because some treated observations will have Xi ≤ c, and some control observations will have Xi ≥ c. Ignoring the FRD setting (that is, ignoring the discontinuity in E[Wi | Xi = x] at x = c) and acting as if unconfoundedness holds would lead to estimating the average treatment effect at Xi = c based on the expression

τunconf = E[Yi | Xi = c, Wi = 1] − E[Yi | Xi = c, Wi = 0],

which equals E[Yi(1) − Yi(0) | Xi = c] under unconfoundedness. In fact, under unconfoundedness one can estimate the average effect E[Yi(1) − Yi(0) | Xi = x] at values of x different from c. However, an interesting result is that if unconfoundedness holds, the FRD approach also estimates E[Yi(1) − Yi(0) | Xi = c], as long as the potential outcomes have smooth expectations as a function of the forcing variable around x = c. A special case of this is discussed in Hahn, Todd, and van der Klaauw (2001), who assume only that treatment is unconfounded with respect to the individual-specific gain. Therefore, in principle, there are situations where even if one believes that unconfoundedness holds, one may wish to use the FRD approach. In particular, even if we maintain unconfoundedness, a standard analysis based on τunconf can be problematic because the potential discontinuities in the regression functions (at x = c) under the FRD design invalidate the usual statistical methods that treat the regression functions as continuous at x = c.

Although unconfoundedness in the FRD setting is possible, its failure makes it difficult to interpret τunconf. By contrast, provided monotonicity holds, the FRD parameter, τFRD, identifies the average treatment effect for compliers at x = c. In other words, approaches that exploit the FRD nature of the design identify an (arguably) interesting parameter both when unconfoundedness holds and in a leading case (monotonicity) when unconfoundedness fails.

15 Although in principle the first term in the difference in (32) would be straightforward to estimate if we actually observed individuals with Xi = c, with continuous covariates we also need to estimate this term by averaging over units with covariate values close to c.
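The contrast between the two estimands can be sketched numerically. All magnitudes below are assumptions (a participation jump of 0.4 at the cutoff, a constant effect of 1.0, and selection on an unobserved type), not figures from the text: the ratio of the two jumps recovers the effect at the cutoff, while the naive comparison of treated and control units near the cutoff is biased because unconfoundedness fails.

```python
import numpy as np

rng = np.random.default_rng(1)
n, c, h = 400_000, 0.0, 0.1

x = rng.uniform(-1, 1, size=n)                   # forcing variable
v = rng.uniform(0, 1, size=n)                    # latent type: confounds W and Y
w = (v <= 0.3 + 0.4 * (x >= c)).astype(int)      # participation jumps 0.3 -> 0.7
y = 0.2 * x + v + 1.0 * w + rng.normal(0, 0.3, size=n)   # constant effect 1.0

left = (x >= c - h) & (x < c)
right = (x >= c) & (x < c + h)

# FRD: ratio of the jump in E[Y | X] to the jump in E[W | X] at the cutoff.
tau_frd = (y[right].mean() - y[left].mean()) / (
    w[right].mean() - w[left].mean())

# Naive treated-control comparison near the cutoff, ignoring the design:
# biased here because low-v units select into treatment.
near = left | right
tau_unconf = y[near & (w == 1)].mean() - y[near & (w == 0)].mean()
```

Because v is independent of x, its contribution cancels in the numerator of the ratio, but it does not cancel in the naive treated-control comparison, which is pulled well below the true effect of 1.0.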

6.4.3 Graphical Methods

Graphical analyses are typically an integral part of any RD analysis. RD designs suggest that the effect of the treatment of interest can be measured by the value of the discontinuity in the conditional expectation of the outcome as a function of the forcing variable at a particular point. Inspecting the estimated version of this conditional expectation is a simple yet powerful way to visualize the identification strategy. Moreover, to assess the credibility of the RD strategy, it can be useful to inspect additional graphs, as discussed below in section 6.4.5. For strikingly clear examples of such plots, see Lee, Moretti, and Butler (2004), Rafael Lalive (2008), and Lee (2008).

The main plot in a SRD setting is a histogram-type estimate of the average value of the outcome by the forcing variable. For some binwidth h, and for some number of bins K0 and K1 to the left and right of the cutoff value, respectively, construct bins (bk, bk+1], for k = 1, … , K = K0 + K1, where bk = c − (K0 − k + 1) · h. Then calculate the number of observations in each bin, and the average outcome in the bin:

Nk = Σ_{i=1}^{N} 1[bk < Xi ≤ bk+1],

Ȳk = (1/Nk) · Σ_{i=1}^{N} Yi · 1[bk < Xi ≤ bk+1].

The key plot is that of the Ȳk, for k = 1, … , K, against the midpoints of the bins, b̃k = (bk + bk+1)/2. The question is whether around the threshold c (by construction on the edge of one of the bins) there is any evidence of a jump in the conditional mean of the outcome. The formal statistical analyses discussed below are essentially just sophisticated versions of this, and if the basic plot does not show any evidence of a discontinuity, there is relatively little chance that the more sophisticated analyses will lead to robust and credible estimates with statistically and substantially significant magnitudes.

In addition to inspecting whether there is a jump at this value of the covariate, one should inspect the graph to see whether there are any other jumps in the conditional expectation of Yi given Xi that are comparable in size to, or larger than, the discontinuity at the cutoff value. If so, and if one cannot explain such jumps on substantive grounds, it would call into question the interpretation of the jump at the threshold as the causal effect of the treatment.

In order to optimize the visual clarity it is recommended to calculate averages that are not smoothed across the cutoff point c. In addition, it is recommended not to artificially smooth on either side of the threshold in a way that implies that the only discontinuity in the estimated regression function is at c. One should therefore use nonsmooth methods such as the histogram-type estimators described above rather than smooth methods such as kernel estimators.

In a FRD setting, one should also calculate

W̄k = (1/Nk) · Σ_{i=1}^{N} Wi · 1[bk < Xi ≤ bk+1],

and plot the W̄k against the bin centers b̃k, in the same way as described above.

6.4.4 Estimation and Inference

The object of interest in regression discontinuity designs is a difference in two regression functions at a particular point (in the SRD case), or the ratio of two differences of regression functions (in the FRD case). These estimands are identified without functional form assumptions, and in general one might therefore like to use nonparametric regression methods that allow for flexible functional forms. Because we are interested in the behavior of the regression functions around a single value of the covariate, it is attractive to use local smoothing methods such as kernel regression rather than global smoothing methods such as sieves or series regression, because the latter will generally be sensitive to the behavior of the regression function away from the threshold. Local smoothing methods are generally well understood (e.g., Charles J. Stone 1977; Herman J. Bierens 1987; Härdle 1990; Adrian Pagan and Aman Ullah 1999). For a particular choice of the kernel K(·), e.g., a rectangular kernel K(z) = 1[−h ≤ z ≤ h], or a Gaussian kernel K(z) = exp(−z²/2)/√(2π), the regression function at x, m(x) = E[Yi | Xi = x], is estimated as

m̂(x) = Σ_{i=1}^{N} Yi · λi,  with weights  λi = K((Xi − x)/h) / Σ_{j=1}^{N} K((Xj − x)/h).

An important difference with the primary focus in the nonparametric regression literature is that in the RD setting we are interested in the value of the regression functions at boundary points. Standard kernel regression methods do not work well in such cases. More attractive methods for this case are local linear regression methods (Fan and Gijbels 1996; Porter 2003; Burkhardt Seifert and Theo Gasser 1996, 2000; Ichimura and Todd 2007), where locally a linear regression function, rather than a constant regression function, is fitted. This leads to an estimator for the regression function at x equal to

m̂(x) = α̂,  where  (α̂, β̂) = argmin_{α,β} Σ_{i=1}^{N} λi · (Yi − α − β · (Xi − x))²,

with the same weights λi as before. In that case the main remaining choice concerns the bandwidth, denoted by h. Suppose one uses a rectangular kernel, K(z) = 1[−h ≤ z ≤ h] (and typically the results are relatively robust with respect to the choice of the kernel). The choice of bandwidth then amounts to dropping all observations such that Xi ∉ [c − h, c + h]. The question becomes how to choose the bandwidth h.

Most standard methods for choosing bandwidths in nonparametric regression, including both cross-validation and plug-in methods, are based on criteria that integrate the squared error over the entire distribution of the covariates: ∫_z (m̂(z) − m(z))² fX(z) dz. For our purposes this criterion does not reflect the object of interest. We are specifically interested in the regression function at a single point; moreover, this point is always a boundary point. Thus we would like to choose h to minimize E[(m̂(c) − m(c))²] (using the data with Xi ≤ c only, or using the data with Xi ≥ c only). If the density of the forcing variable is high at the threshold, a bandwidth selection procedure based on global criteria may lead to a bandwidth that is much larger than is appropriate.

There are few attempts to formalize or standardize the choice of a bandwidth for such cases. Ludwig and Miller (2005) and Imbens and Lemieux (2008) discuss some cross-validation methods that target more directly the object of interest in RD designs. Assuming the density of Xi is continuous at c, and that the conditional variance of Yi given Xi is continuous and equal to σ² at Xi = c, Imbens and Kalyanaraman (2009) show that the optimal bandwidth depends on the second derivatives of the regression functions at the threshold and has the form

h_opt = N^{−1/5} · C_K · ( σ² · (1/p + 1/(1 − p)) / ( lim_{x↓c} (∂²m/∂x²(x))² + lim_{x↑c} (∂²m/∂x²(x))² ) )^{1/5},

where p is the fraction of observations with Xi ≥ c, and C_K is a constant that depends on the kernel. For a rectangular kernel K(z) = 1[−h ≤ z ≤ h], the constant equals C_K = 2.70.
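With a rectangular kernel, the local linear estimator reduces to least squares on the observations within h of the cutoff, with separate intercepts and slopes on the two sides; the intercept difference estimates τSRD. The sketch below uses simulated data, an assumed true jump of 2.0, and an arbitrary fixed bandwidth rather than the plug-in choice.

```python
import numpy as np

rng = np.random.default_rng(2)
n, c, h, tau = 100_000, 0.0, 0.25, 2.0      # tau and h are assumptions

x = rng.uniform(-1, 1, size=n)
d = (x >= c).astype(float)                  # sharp design: W = 1[X >= c]
y = 0.5 * x + tau * d + rng.normal(0, 0.5, size=n)

# Rectangular kernel: keep only |X - c| <= h, then fit separate local
# lines on each side; the coefficient on d is the jump in the intercept.
keep = np.abs(x - c) <= h
xc, dk, yk = x[keep] - c, d[keep], y[keep]
design = np.column_stack([np.ones(xc.size), dk, xc, dk * xc])
coef, *_ = np.linalg.lstsq(design, yk, rcond=None)
tau_hat = coef[1]                           # estimated jump at the cutoff
```

Fitting a line, rather than a constant, on each side is what removes the first-order boundary bias that plain kernel averaging suffers from at the cutoff.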

Imbens and Kalyanaraman propose and implement a plug-in method for the bandwidth.16 If one uses a rectangular kernel, and given a choice for the bandwidth, estimation for the SRD and FRD designs can be based on ordinary least squares and two-stage least squares, respectively. If the bandwidth goes to zero sufficiently fast, so that the asymptotic bias can be ignored, one can also base inference on these methods. (See Hahn, Todd, and van der Klaauw 2001 and Imbens and Lemieux 2008.)

6.4.5 Specification Checks

There are two important concerns in the application of RD designs, be they sharp or fuzzy. These concerns can sometimes be assuaged by investigating various implications of the identification argument underlying the regression discontinuity design.

A first concern about RD designs is the possibility of other changes at the same threshold value of the covariate. For example, the same age limit may affect eligibility for multiple programs. If all the programs whose eligibility changes at the same cutoff value affect the outcome of interest, an RD analysis may mistakenly attribute the combined effect to the treatment of interest. The second concern is that of manipulation by the individuals of the covariate value that underlies the assignment mechanism. The latter is less of a concern when the forcing variable is a fixed, immutable characteristic of an individual such as age. It is a particular concern when eligibility criteria are known to potential participants and are based on variables that are affected by individual choices. For example, if eligibility for financial aid depends on test scores that are graded by teachers who know the cutoff values, there may be a tendency to push grades high enough to make students eligible. Alternatively, if thresholds are known to students, they may take the test multiple times in an attempt to raise their score above the threshold.

There are two sets of specification checks that researchers can typically perform to at least partly assess the empirical relevance of these concerns. Although the proposed procedures do not directly test null hypotheses that are required for the RD approach to be valid, it is typically difficult to argue for the validity of the approach when these null hypotheses do not hold. First, one may look for discontinuities in the average value of the covariates around the threshold. In most cases, the reason for the discontinuity in the probability of the treatment does not suggest a discontinuity in the average value of covariates. Finding a discontinuity in other covariates typically casts doubt on the assumptions underlying the RD design. Specifically, for covariates Zi, the test would look at the difference

τZ = lim_{x↓c} E[Zi | Xi = x] − lim_{x↑c} E[Zi | Xi = x].

Second, McCrary (2008) suggests testing the null hypothesis of continuity of the density of the covariate that underlies the assignment at the threshold, against the alternative of a jump in the density function at that point. A discontinuity in the density of this covariate at the particular point where the discontinuity in the conditional expectation occurs is suggestive of violations of the no-manipulation assumption. Here the focus is on the difference

τf = lim_{x↓c} fX(x) − lim_{x↑c} fX(x).

In both cases a substantially and statistically significant difference in the left and right limits suggests that there may be problems with the RD approach. In practice, more useful than formal statistical tests are graphical analyses of the type discussed in section 6.4.3, where histogram-type estimates of the conditional expectation E[Zi | Xi = x] and of the marginal density fX(x) are graphed.

16 Code in Matlab and Stata for calculating the optimal bandwidth is available on their website.
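Both checks amount to comparing left and right limits at the cutoff. A sketch on simulated data with no manipulation and a predetermined covariate (all magnitudes assumed): the covariate jump should be near zero, and bin counts just left and right of the cutoff, which estimate the two density limits, should be close.

```python
import numpy as np

rng = np.random.default_rng(3)
n, c, h = 200_000, 0.0, 0.1

x = rng.uniform(-1, 1, size=n)               # forcing variable, no manipulation
z_cov = 0.1 * x + rng.normal(0, 1, size=n)   # predetermined covariate

left = (x >= c - h) & (x < c)
right = (x >= c) & (x < c + h)

# Covariate check: the jump in E[Z | X = x] at c should be near zero.
tau_z = z_cov[right].mean() - z_cov[left].mean()

# McCrary-style density check: counts in narrow bins on either side of c
# approximate the left and right density limits; their ratio should be
# close to one under continuity.
count_ratio = right.sum() / left.sum()
```

In a manipulated sample (say, if units just below c retook a test to cross the threshold), mass would shift from just left to just right of c, pushing the count ratio above one.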

6.5 Difference-in-Differences Methods

Since the seminal work by Ashenfelter (1978) and Ashenfelter and Card (1985), the use of Difference-In-Differences (DID) methods has become widespread in empirical economics. Influential applications include Philip J. Cook and George Tauchen (1982, 1984), Card (1990), Bruce D. Meyer, W. Kip Viscusi, and David L. Durbin (1995), Card and Krueger (1993, 1994), Nada Eissa and Liebman (1996), Blundell, Alan Duncan, and Meghir (1998), and many others. The DID approach is often associated with so-called "natural experiments," where policy changes can be used to effectively define control and treatment groups. See Angrist and Krueger (1999), Angrist and Pischke (2009), and Blundell and Thomas MaCurdy (1999) for textbook discussions.

The simplest setting is one where outcomes are observed for units observed in one of two groups, in one of two time periods. Only units in one of the two groups, in the second time period, are exposed to a treatment. There are no units exposed to the treatment in the first period, and units from the control group are never observed to be exposed to the treatment. The average gain over time in the non-exposed (control) group is subtracted from the gain over time in the exposed (treatment) group. This double differencing removes biases in second period comparisons between the treatment and control group that could be the result of permanent differences between those groups, as well as biases from comparisons over time in the treatment group that could be the result of time trends unrelated to the treatment. In general this allows for the endogenous adoption of the new treatment (see Timothy Besley and Case 2000 and Athey and Imbens 2006). We discuss here the conventional set up and recent work on inference (Bertrand, Duflo, and Mullainathan 2004; Hansen 2007a, 2007b; Donald and Lang 2007), as well as the recent extensions by Athey and Imbens (2006), who develop a functional form-free version of the difference-in-differences methodology, and Abadie, Diamond, and Jens Hainmueller (2007), who develop a method for constructing an artificial control group from multiple nonexposed groups.

6.5.1 Repeated Cross Sections

The standard model for the DID approach is as follows. Individual i belongs to a group, G_i ∈ {0, 1} (where group 1 is the treatment group), and is observed in time period T_i ∈ {0, 1}. For i = 1, …, N, a random sample from the population, individual i's group identity and time period can be treated as random variables. In the standard DID model, we can write the outcome for individual i in the absence of the intervention, Y_i(0), as

(33)  Y_i(0) = α + β·T_i + γ·G_i + ε_i,

with unknown parameters α, β, and γ. We ignore the potential presence of other covariates, which introduce no special complications. The second coefficient in this specification, β, represents the time component common to both groups. The third coefficient, γ, represents a group-specific, time-invariant component. The fourth term, ε_i, represents unobservable characteristics of the individual. This term is assumed to be independent of the group indicator and to have the same distribution over time, i.e., ε_i ⊥ (G_i, T_i), and is normalized to have mean zero.

An alternative set up leading to the same estimator allows for a time-invariant individual-specific fixed effect γ_i, potentially correlated with G_i, and models Y_i(0) as

(34)  Y_i(0) = α + β·T_i + γ_i + ε_i.

(See, e.g., Angrist and Krueger 1999.) This generalization of the standard model does not affect the standard DID estimand, and it will be subsumed as a special case of the model we propose.

The equation for the outcome without the treatment is combined with an equation for the outcome given the treatment: Y_i(1) = Y_i(0) + τ_DID. The standard DID estimand is under this model equal to

(35)  τ_DID = E[Y_i(1)] − E[Y_i(0)]
        = (E[Y_i | G_i = 1, T_i = 1] − E[Y_i | G_i = 1, T_i = 0])
        − (E[Y_i | G_i = 0, T_i = 1] − E[Y_i | G_i = 0, T_i = 0]).

In other words, the population average difference over time in the control group (G_i = 0) is subtracted from the population average difference over time in the treatment group (G_i = 1) to remove biases associated with a common time trend unrelated to the intervention.

We can estimate τ_DID simply using least squares methods on the regression function for the observed outcome,

Y_i = α + β_1·T_i + γ_1·G_i + τ_DID·W_i + ε_i,

where the treatment indicator W_i is equal to the interaction of the group and time indicators, W_i = T_i·G_i. Thus the treatment effect is estimated through the coefficient on the interaction between the indicators for the second time period and the treatment group. This leads to

τ̂_DID = (Ȳ_11 − Ȳ_10) − (Ȳ_01 − Ȳ_00),

where Ȳ_gt = Σ_{i: G_i = g, T_i = t} Y_i / N_gt is the average outcome among units in group g and time period t.

6.5.2 Multiple Groups and Multiple Periods

With multiple time periods and multiple groups we can use a natural extension of the two-group two-time-period model for the outcome in the absence of the intervention. Let T and G denote the number of time periods and groups respectively. Then:

(36)  Y_i(0) = α + Σ_{t=1}^{T} β_t·1[T_i = t] + Σ_{g=1}^{G} γ_g·1[G_i = g] + ε_i,

with separate parameters for each group and time period, γ_g and β_t, for g = 1, …, G and t = 1, …, T, where the initial time period coefficient and first group coefficient have implicitly been normalized to zero. This model is then combined with the additive model for the treatment effect, Y_i(1) = Y_i(0) + τ_DID, implying that the parameters of this model can still be estimated by ordinary least squares based on the regression function

(37)  Y_i = α + Σ_{t=1}^{T} β_t·1[T_i = t] + Σ_{g=1}^{G} γ_g·1[G_i = g] + τ_DID·I_i + ε_i,

where I_i is now an indicator for unit i being in a group and time period that was exposed to the treatment.

This model with more than two time periods, or more than two groups, or both, imposes testable restrictions on the data. For example, if groups g_1 and g_2 are both not exposed to the treatment in periods t_1 and t_2, under this model the double difference

(Ȳ_{g2,t2} − Ȳ_{g2,t1}) − (Ȳ_{g1,t2} − Ȳ_{g1,t1})

should estimate zero, which can be tested using conventional methods—this possibility is exploited in the next subsection. In the two-period, two-group setting there are no testable restrictions on the four group/period means.

6.5.3 Standard Errors in the Multiple Group and Multiple Period Case

Recently there has been attention called to the concern that ordinary least squares standard errors for the DID estimator may not be accurate in the presence of correlations between outcomes within groups and between time periods. This is a particular case of clustering where the regressor of interest does not vary within clusters. See Brent R. Moulton (1990), Moulton and William C. Randolph (1989), and Wooldridge (2002) for a general discussion. The specific problem has been analyzed recently by Donald and Lang (2007), Bertrand, Duflo, and Mullainathan (2004), and Hansen (2007a, 2007b).

The starting point of these analyses is a particular structure on the error term ε_i:

ε_i = η_{G_i,T_i} + ν_i,

where ν_i is an individual-level idiosyncratic error term, and η_{gt} is a group/time specific component. The unit level error term ν_i is independent across all units, E[ν_i·ν_j] = 0 if i ≠ j, and E[ν_i²] = σ_ν². Now suppose we also assume that η_{gt} ~ N(0, σ_η²), and all the η_{gt} are independent. Let us focus initially on the two-group, two-time-period case. With a large number of units in each group and time period, Ȳ_gt → α + β_t + γ_g + 1[g = 1, t = 1]·τ_DID + η_{gt}, so that

τ̂_DID = (Ȳ_11 − Ȳ_10) − (Ȳ_01 − Ȳ_00) → τ_DID + (η_11 − η_10) − (η_01 − η_00) ~ N(τ_DID, 4·σ_η²).

Thus, in this case with two groups and two time periods, the conventional DID estimator is not consistent. In fact, no consistent estimator exists because there is no way to eliminate the influence of the four unobserved components η_{gt}. In this two-group, two-time-period case the problem is even worse than the absence of a consistent estimator, because one cannot even establish whether there is a clustering problem: the data are not informative about the value of σ_η². If we have data from more than two groups or from more than two time periods, we can typically estimate σ_η², and thus, at least under the normality and independence assumptions for η_{gt}, construct confidence intervals for τ_DID. Consider, for example, the case with three groups and two time periods. If groups G_i = 0, 1 are both not treated in the second period, then (Ȳ_11 − Ȳ_10) − (Ȳ_01 − Ȳ_00) ~ N(0, 4·σ_η²), which can be used to obtain an unbiased estimator for σ_η². See Donald and Lang (2007) for details.

Bertrand, Duflo, and Mullainathan (2004) and Hansen (2007a, 2007b) focus on the case with multiple (more than two) time periods. In that case we may wish to relax the assumption that the η_{gt} are independent over time. Note that with data from only two time periods there is no information in the data that allows one to establish the absence of independence over time. The typical generalization is to allow for an autoregressive structure on the η_{gt}, for example,

η_{gt} = α·η_{g,t−1} + ω_{gt},

with a serially uncorrelated ω_{gt}. More generally, with T time periods, one can allow for an autoregressive process of order T − 2. Using simulations and real data calculations based on data for fifty states and multiple time periods, Bertrand, Duflo, and Mullainathan (2004) show that corrections to the conventional standard errors taking into account the clustering and autoregressive structure make a substantial difference. Hansen (2007a, 2007b) provides additional large sample results under sequences where the number of time periods increases with the sample size.

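Before turning to panel data, it may help to see the basic two-group, two-period estimator in code: it is simply a combination of four cell means, equivalently the OLS coefficient on the interaction term. A minimal sketch with data simulated from the model in (33) plus a constant treatment effect; all parameter values are hypothetical.

```python
import random

def did_estimate(data):
    """Double-difference estimator: data is a list of (y, g, t) tuples.
    Returns (Ybar_11 - Ybar_10) - (Ybar_01 - Ybar_00), which equals the
    OLS coefficient on the interaction G*T in the saturated regression."""
    def mean(g, t):
        ys = [y for (y, gi, ti) in data if gi == g and ti == t]
        return sum(ys) / len(ys)
    return (mean(1, 1) - mean(1, 0)) - (mean(0, 1) - mean(0, 0))

random.seed(1)
alpha, beta, gamma, tau = 1.0, 0.5, 2.0, 3.0   # hypothetical parameters of (33)
data = []
for _ in range(20_000):
    g, t = random.randint(0, 1), random.randint(0, 1)
    y = alpha + beta * t + gamma * g + tau * g * t + random.gauss(0, 1)
    data.append((y, g, t))

print(round(did_estimate(data), 2))  # close to tau = 3.0
```

Running least squares of Y_i on a constant, T_i, G_i, and the interaction T_i·G_i returns the same number as the coefficient on the interaction.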
6.5.4 Panel Data

Now suppose we have panel data, in the two period, two group case. Here we have N individuals, indexed i = 1, …, N, for whom we observe (G_i, Y_i0, Y_i1, X_i0, X_i1), where G_i is, as before, group membership, X_it is the covariate value for unit i at time t, and Y_it is the outcome for unit i at time t.

One option is to proceed with estimation exactly as before, essentially ignoring the fact that the observations in different time periods come from the same unit. We can now interpret the estimator as the ordinary least squares estimator based on the regression function for the difference outcomes:

Y_i1 − Y_i0 = β + τ_DID·G_i + ε_i,

which leads to the double difference as the estimator for τ_DID: τ̂_DID = (Ȳ_11 − Ȳ_10) − (Ȳ_01 − Ȳ_00). This estimator is identical to that discussed in the context of repeated cross-sections, and so does not exploit directly the panel nature of the data.

A second, and very different, approach with panel data, which does exploit the specific features of the panel data, would be to assume unconfoundedness given lagged outcomes. Let us look at the differences between these two approaches in a simple setting, without covariates, and assuming linearity. In that case the DID approach suggests the regression of Y_i1 − Y_i0 on the group indicator, leading to τ̂_DID. The unconfoundedness assumption would suggest the regression of the difference Y_i1 − Y_i0 on the group indicator and the lagged outcome Y_i0:

Y_i1 − Y_i0 = β + τ_unconf·G_i + δ·Y_i0 + ε_i.

While it appears that the analysis based on unconfoundedness is necessarily less restrictive because it allows a free coefficient on Y_i0, this is not the case. The DID assumption implies that adjusting for lagged outcomes actually compromises the comparison because Y_i0 may in fact be correlated with ε_i. In the end, the two approaches make fundamentally different assumptions. One needs to choose between them based on substantive knowledge. When the estimated coefficient on the lagged outcome is close to zero, obviously there will be little difference between the point estimates. In addition, using the formula for omitted variable bias in least squares estimation, the results will be very similar if the average outcomes in the treatment and control groups are similar in the first period. Finally, note that in the repeated cross-section case the choice between the DID and unconfoundedness approaches did not arise because the unconfoundedness approach is not feasible: it is not possible to adjust for lagged outcomes when we do not have the same units available in both periods.

As a practical matter, the DID approach appears less attractive than the unconfoundedness-based approach in the context of panel data. It is difficult to see how making treated and control units comparable on lagged outcomes will make the causal interpretation of their difference less credible, as suggested by the DID assumptions.

6.5.5 The Changes-in-Changes Model

Now we return to the setting with two groups, two time periods, and repeated cross-sections. Athey and Imbens (2006) generalize the standard model in several ways. They relax the additive linear model by assuming that, in the absence of the intervention, the outcomes satisfy

(38)  Y_i(0) = h_0(U_i, T_i),

with h_0(u, t) increasing in u. The random variable U_i represents all unobservable characteristics of individual i, and (38) incorporates the idea that the outcome of an individual with U_i = u will be the same in a given time period, irrespective of the group membership. The distribution of U_i is allowed to vary across groups, but not over time within groups, so that U_i ⊥ T_i | G_i. Athey and Imbens call the resulting model the changes-in-changes (CIC) model.

The standard DID model in (33) adds three additional assumptions to the CIC model, namely

(39)  U_i − E[U_i | G_i] ⊥ G_i (additivity),

(40)  h_0(u, t) = φ(u + δ·t) (single index model),

for a strictly increasing function φ(·), and

(41)  φ(·) is the identity function (identity transformation).

In the CIC extension, the treatment group's distribution of unobservables may be different from that of the control group in arbitrary ways. In the absence of treatment, all differences between the two groups can be interpreted as coming from differences in the conditional distribution of U given G. The model further requires that the changes over time in the distribution of each group's outcome (in the absence of treatment) arise solely from the fact that h_0(u, 0) differs from h_0(u, 1), that is, the relation between unobservables and outcomes changes over time.

Like the standard model, the Athey–Imbens approach does not rely on tracking individuals over time. Although the distribution of U_i is assumed not to change over time within groups, the model does not make any assumptions about whether a particular individual has the same realization U_i in each period. Thus, the estimators derived by Athey and Imbens will be the same whether one observes a panel of individuals over time or a repeated cross section. Just as in the standard DID approach, if one only wishes to estimate the effect of the intervention on the treatment group, no assumptions are required about how the intervention affects outcomes.

The average effect of the treatment for the second period treatment group is τ_CIC = E[Y_i(1) − Y_i(0) | G_i = 1, T_i = 1]. Because the first term of this expression is equal to E[Y_i(1) | G_i = 1, T_i = 1] = E[Y_i | G_i = 1, T_i = 1], it can be estimated directly from the data. The difficulty is in estimating the second term. Under the assumptions of monotonicity of h_0(u, t) in u, and conditional independence of T_i and U_i given G_i, Athey and Imbens show that in fact the full distribution of Y_i(0) given G_i = T_i = 1 is identified through the equality

(42)  F_{Y11}(y) = F_{Y10}(F_{Y00}^{-1}(F_{Y01}(y))),

where F_{Ygt}(y) denotes the distribution function of Y_i given G_i = g and T_i = t. The expected outcome for the second period treatment group under the control treatment is

E[Y_i(0) | G_i = 1, T_i = 1] = E[F_{Y01}^{-1}(F_{Y00}(Y_{i10}))].

To analyze the counterfactual effect of the intervention on the control group, Athey and Imbens assume that, in the presence of the intervention,

Y_i(1) = h_1(U_i, T_i)

for some function h_1(u, t) that is increasing in u. That is, the effect of the treatment at a given time is the same for individuals with the same U_i = u, irrespective of the group. No further assumptions are required on the functional form of h_1, so the treatment effect, equal to h_1(u, 1) − h_0(u, 1) for individuals with unobserved component u, can differ across individuals. Because the distribution of the unobserved component U_i can vary across groups, the average return to the policy intervention can vary across groups as well.
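The identification result in (42) suggests a plug-in estimator: replace each distribution function by its empirical counterpart and average the transformed first-period treatment-group outcomes. The sketch below does this with simulated data; the data-generating process and all numbers are hypothetical, and this is not Athey and Imbens's own implementation.

```python
import random
from bisect import bisect_right

def ecdf(sample):
    """Empirical distribution function of a sample."""
    s, n = sorted(sample), len(sample)
    return lambda y: bisect_right(s, y) / n

def equantile(sample):
    """Empirical quantile (inverse distribution) function of a sample."""
    s, n = sorted(sample), len(sample)
    return lambda q: s[min(int(q * n), n - 1)]

def cic_effect(y00, y01, y10, y11):
    """Plug-in CIC estimate of the effect on the treated in period 1:
    mean(Y | G=1, T=1) minus the mean of F01^{-1}(F00(y)) taken over the
    (G=1, T=0) sample, mirroring the identification result in the text."""
    f00, q01 = ecdf(y00), equantile(y01)
    counterfactual = sum(q01(f00(y)) for y in y10) / len(y10)
    return sum(y11) / len(y11) - counterfactual

random.seed(2)

def h0(u, t):
    # Hypothetical nontreatment outcome function, increasing in u.
    return u + 0.5 * t + 0.2 * u * t

u0 = [random.gauss(0.0, 1) for _ in range(5000)]  # control-group unobservables
u1 = [random.gauss(0.5, 1) for _ in range(5000)]  # treatment group: shifted U
y00 = [h0(u, 0) for u in u0]
y01 = [h0(u, 1) for u in u0]
y10 = [h0(u, 0) for u in u1]
y11 = [h0(u, 1) + 1.0 for u in u1]                # treatment adds 1.0

print(round(cic_effect(y00, y01, y10, y11), 2))   # close to the true effect of 1.0
```

Note that the standard DID estimator would not in general recover the effect here, since the time change in h0 is not additively separable from u.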

6.5.6 The Abadie–Diamond–Hainmueller Artificial Control Group Approach

Abadie, Diamond, and Hainmueller (2007) develop a very interesting alternative approach to the setting with multiple control groups. See also Abadie and Gardeazabal (2003). Here we discuss a simple version of their approach, with T + 1 time periods, and G + 1 groups, one treated in the final period, and G not treated in any period. The Abadie–Diamond–Hainmueller idea is to construct an artificial control group that is more similar to the treatment group in the initial period than any of the control groups on their own. Let G_i = G denote the treated group, and G_i = 0, …, G − 1 denote the G control groups. The outcome for the final period treatment group in the absence of the treatment will be estimated as a weighted average of period T outcomes in the G control groups,

Ê[Y_i(0) | T_i = T, G_i = G] = Σ_{g=0}^{G−1} λ_g·Ȳ_{gT},

with weights λ_g satisfying Σ_{g=0}^{G−1} λ_g = 1 and λ_g ≥ 0. The weights are chosen to make the weighted control group resemble the treatment group prior to the treatment. That is, the weights λ_g are chosen to minimize the difference between the treatment group and the weighted average of the control groups prior to the treatment, namely,

‖ ( Ȳ_{G,0} − Σ_{g=0}^{G−1} λ_g·Ȳ_{g,0} , … , Ȳ_{G,T−1} − Σ_{g=0}^{G−1} λ_g·Ȳ_{g,T−1} ) ‖,

where ‖·‖ denotes a measure of distance. One can also add group level covariates to the criterion to determine the weights. These group-level covariates may be averages of individual level covariates, or quantiles of the distribution of within group covariates. The idea is that the future path of the artificial control group, consisting of the λ-weighted average of all the control groups, mimics the path that would have been observed in the treatment group in the absence of the treatment. Applications in Abadie, Diamond, and Hainmueller (2007) to estimation of the effect of smoking legislation in California and the effect of reunification on West Germany are very promising.

7. Multivalued and Continuous Treatments

Most of the recent econometric program evaluation literature has focused on the case with a binary treatment. As a result this case is now understood much better than it was a decade or two ago. However, much less is known about settings with multivalued, discrete or continuous treatments. Such cases are common in practice. Social programs are rarely homogenous. Typically individuals are assigned to various activities and regimes, often sequentially, and tailored to their specific circumstances and characteristics.

To provide some insight into the issues arising in settings with multivalued treatments we discuss in this review five separate cases. First, the simplest setting, where the treatment is discrete and one is willing to assume unconfoundedness of the treatment assignment. In that case straightforward extensions of the binary treatment case can be used to obtain estimates and inferences for causal effects. Second, we look at the case with a continuous treatment under unconfoundedness. In that case, the definition of the propensity score requires some modification, but many of the insights from the binary treatment case still carry over. Third, we look at the case where units can be exposed to a sequence of binary treatments.

For example, an individual may remain in a training program for a number of periods. In each period the assignment to the program is assumed to be unconfounded, given permanent characteristics and outcomes up to that point. In the last two cases we briefly discuss multivalued endogenous treatments. In the fourth case, we look at settings with a discrete multivalued treatment in the presence of endogeneity. We allow the treatment to be continuous in the final case. The last two cases tie in closely with the simultaneous equations literature, where, somewhat separately from the program evaluation literature, there has been much recent work on nonparametric identification and estimation. Especially in the discrete case, many of the results in this literature are negative in the sense that, without unattractive restrictions on heterogeneity or functional form, few objects of interest are point-identified. Some of the literature has turned toward establishing bounds. This is an area with much ongoing work and considerable scope for further research.

7.1 Multivalued Discrete Treatments with Unconfounded Treatment Assignment

If there are a few different levels of the treatment, rather than just two, essentially all of the methods discussed before go through in the unconfoundedness case. Suppose, for example, that the treatment can take one of three levels, say W_i ∈ {0, 1, 2}. In order to estimate the effect of treatment level 2 relative to treatment level 1, one can simply put aside the data for units exposed to treatment level 0 if one is willing to assume unconfoundedness. More specifically, one can estimate the average outcome for each treatment level conditional on the covariates, E[Y_i(w) | X_i = x], using data on units exposed to treatment level w, and average these over the (estimated) marginal distribution of the covariates, F̂_X(x). In practice, the overlap assumption may be more likely to be violated with more than two treatments. For example, with three treatments, it may be that no units are exposed to treatment level 2 if X_i is in some subset of the covariate space. The insights from the binary case directly extend to this multiple (but few) treatment case. If the number of treatments is relatively large, one may wish to smooth across treatment levels in order to improve precision of the inferences.

7.2 Continuous Treatments with Unconfounded Treatment Assignment

In the case where the treatment takes on many values, Imbens (2000), Lechner (2001, 2004), Hirano and Imbens (2004), and Carlos A. Flores (2005) extended some of the propensity score methodology under unconfoundedness. The key maintained assumption is that adjusting for pre-treatment differences removes all biases, and thus solves the problem of drawing causal inferences. This is formalized by using the concept of weak unconfoundedness, introduced by Imbens (2000). Assignment to treatment W_i is weakly unconfounded, given pre-treatment variables X_i, if

W_i ⊥ Y_i(w) | X_i,

for all w. Compare this to the stronger assumption made by Rosenbaum and Rubin (1983b) in the binary case:

W_i ⊥ (Y_i(0), Y_i(1)) | X_i,

which requires the treatment W_i to be independent of the entire set of potential outcomes. Instead, weak unconfoundedness requires only pairwise independence of the treatment with each of the potential outcomes. A similar assumption is used in Robins and Rotnitzky (1995). The definition of weak unconfoundedness is also similar to that of "missing at random" (Rubin 1976, 1987; Roderick J. A. Little and Rubin 1987) in the missing data literature.
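With a discrete covariate, the adjustment strategy of section 7.1, estimating the average outcome at each treatment level within covariate cells and then averaging over the marginal covariate distribution, takes only a few lines. A minimal sketch; all names and parameter values are hypothetical.

```python
import random

def average_outcome(data, w):
    """Estimate E[Y(w)] under (weak) unconfoundedness with a discrete
    covariate: estimate E[Y | W=w, X=x] cell by cell, then average over
    the empirical marginal distribution of X (all units, whatever their W)."""
    cell_mean = {}
    for x in set(rec[2] for rec in data):
        ys = [y for (y, wi, xi) in data if wi == w and xi == x]
        cell_mean[x] = sum(ys) / len(ys)
    return sum(cell_mean[xi] for (_, _, xi) in data) / len(data)

random.seed(3)
data = []
for _ in range(30_000):
    x = random.randint(0, 1)
    # Confounded assignment: units with x = 1 are more likely to get w = 2.
    probs = [0.5, 0.3, 0.2] if x == 0 else [0.2, 0.3, 0.5]
    w = random.choices([0, 1, 2], weights=probs)[0]
    y = 2.0 * w + 3.0 * x + random.gauss(0, 1)   # hypothetical outcome model
    data.append((y, w, x))

effect_2_vs_1 = average_outcome(data, 2) - average_outcome(data, 1)
print(round(effect_2_vs_1, 2))  # close to the true effect of 2.0
```

A naive comparison of mean outcomes at levels 2 and 1 would be biased upward here, because units with the higher covariate value are overrepresented at treatment level 2.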

Although in substantive terms the weak unconfoundedness assumption is not very different from the assumption used by Rosenbaum and Rubin (1983b), it is important that one does not need the stronger assumption to validate estimation of the expected value of Y_i(w) by adjusting for X_i: under weak unconfoundedness, we have E[Y_i(w) | X_i] = E[Y_i(w) | W_i = w, X_i] = E[Y_i | W_i = w, X_i], and expected outcomes can then be estimated by averaging these conditional means: E[Y_i(w)] = E[E[Y_i(w) | X_i]]. In practice, it can be difficult to estimate E[Y_i(w)] in this manner when the dimension of X_i is large, or if w takes on many values, because the first step requires estimation of the expectation of Y_i(w) given the treatment level and all pretreatment variables. It was this difficulty that motivated Rosenbaum and Rubin (1983b) to develop the propensity score methodology.

Imbens (2000) introduces the generalized propensity score for the multiple treatment case. It is the conditional probability of receiving a particular level of the treatment given the pretreatment variables:

r(w, x) ≡ pr(W_i = w | X_i = x).

In the continuous case, where, say, W_i takes values in the unit interval, r(w, x) is the conditional density f_{W|X}(w | x). Suppose assignment to treatment W_i is weakly unconfounded given pretreatment variables X_i. Then, by the same argument as in the binary treatment case, assignment is weakly unconfounded given the generalized propensity score: as δ → 0,

1{w − δ ≤ W_i ≤ w + δ} ⊥ Y_i(w) | r(w, X_i),

for all w. This is the point where using the weak form of the unconfoundedness assumption is important. There is, in general, no scalar function of the covariates such that the level of the treatment W_i is independent of the set of potential outcomes {Y_i(w)}_{w ∈ [0,1]}, unless additional structure is imposed on the assignment mechanism; see, for example, Marshall M. Joffe and Rosenbaum (1999).

Because weak unconfoundedness given all pretreatment variables implies weak unconfoundedness given the generalized propensity score, one can estimate average outcomes by conditioning solely on the generalized propensity score. If assignment to treatment is weakly unconfounded given pretreatment variables X_i, then two results follow. First, for all w,

β(w, r) ≡ E[Y_i(w) | r(w, X_i) = r] = E[Y_i | W_i = w, r(W_i, X_i) = r],

which can be estimated using data on Y_i, W_i, and r(W_i, X_i). Second, the average outcome given a particular level of the treatment, E[Y_i(w)], can be estimated by appropriately averaging β(w, r):

E[Y_i(w)] = E[β(w, r(w, X_i))].

As with the implementation of the binary treatment propensity score methodology, the implementation of the generalized propensity score method consists of three steps. In the first step the score r(w, x) is estimated. With a binary treatment the standard approach (Rosenbaum and Rubin 1984; Rosenbaum 1995) is to estimate the propensity score using a logistic regression. More generally, if the treatments correspond to ordered levels of a treatment, such as the dose of a drug or the time over which a treatment is applied, one may wish to impose smoothness of the score in w. For continuous W_i, Hirano and Imbens (2004) use a lognormal distribution. In the second step, the conditional expectation β(w, r) = E[Y_i | W_i = w, r(W_i, X_i) = r] is estimated. Again, the implementation may be different in the case where the levels of the treatment are qualitatively distinct than in the case where smoothness of the conditional expectation function in w is appropriate.
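To make the three steps concrete, the sketch below estimates a dose–response function using a normal linear model for the treatment given the covariate (in the spirit of, but not identical to, Hirano and Imbens's lognormal example). Purely so that the linear second-step regression is correctly specified, the hypothetical outcome is generated to be exactly linear in the treatment and the true score; everything here is an illustration under stated assumptions, not the authors' implementation.

```python
import math
import random

def normal_pdf(v, mean, sd):
    return math.exp(-0.5 * ((v - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def solve3(a, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]

def ols3(rows, ys):
    """Least squares with three regressors via the normal equations."""
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yv for r, yv in zip(rows, ys)) for i in range(3)]
    return solve3(xtx, xty)

random.seed(4)
n = 20_000
x = [random.gauss(0, 1) for _ in range(n)]
w = [xv + random.gauss(0, 1) for xv in x]          # treatment depends on X

# Hypothetical outcome: exactly linear in W and the true score, so the
# linear second-step model below is correctly specified by construction.
r_true = [normal_pdf(wv, xv, 1.0) for wv, xv in zip(w, x)]
y = [wv + 2.0 * rv + random.gauss(0, 0.5) for wv, rv in zip(w, r_true)]

# Step 1: model W | X as N(theta0 + theta1*X, sigma^2), fit by least squares.
mx, mw = sum(x) / n, sum(w) / n
theta1 = sum((a - mx) * (b - mw) for a, b in zip(x, w)) / sum((a - mx) ** 2 for a in x)
theta0 = mw - theta1 * mx
sigma = math.sqrt(sum((b - theta0 - theta1 * a) ** 2 for a, b in zip(x, w)) / n)

def gps(wv, xv):
    return normal_pdf(wv, theta0 + theta1 * xv, sigma)

# Step 2: estimate beta(w, r) by regressing Y on (1, W, R-hat).
b0, b1, b2 = ols3([[1.0, wv, gps(wv, xv)] for wv, xv in zip(w, x)], y)

# Step 3: average beta-hat(w, r(w, X_i)) over the sample distribution of X;
# the score is evaluated at r(w, X_i), not at r(W_i, X_i).
def mu(wv):
    return sum(b0 + b1 * wv + b2 * gps(wv, xv) for xv in x) / n

print(round(mu(1.0) - mu(0.0), 2))  # close to the true value of about 0.88
```

In applications the second step is usually specified more flexibly (Hirano and Imbens use a quadratic in w and r with an interaction), since exact linearity in the score is an artifact of this illustration.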

Here, some form of linear or nonlinear regression may be used. In the third step the average response at treatment level w is estimated as the average of the estimated conditional expectation, β̂(w, r(w, X_i)), averaged over the distribution of the pretreatment variables, X_1, …, X_N. Note that to get the average E[Y_i(w)], the second argument in the conditional expectation β(w, r) is evaluated at r(w, X_i), not at r(W_i, X_i).

7.2.1 Dynamic Treatments with Unconfounded Treatment Assignment

Multiple-valued treatments can arise because at any point in time individuals can be assigned to multiple different treatment arms, or because they can be assigned sequentially to different treatments. Gill and Robins (2001) analyze this case, where they assume that at any point in time an unconfoundedness assumption holds. Lechner and Miquel (2005) (see also Lechner 1999, and Lechner, Miquel, and Conny Wunsch 2004) study a related case, where again a sequential unconfoundedness assumption is maintained to identify the average effects of interest. Abbring and Gerard J. van den Berg (2003) study settings with duration data. These methods hold great promise but, until now, there have been few substantive applications.

7.3 Multivalued Discrete Endogenous Treatments

In settings with general heterogeneity in the effects of the treatment, the case with more than two treatment levels is considerably more challenging than the binary case. There are few studies investigating identification in these settings. Angrist and Imbens (1995) and Angrist, Kathryn Graddy, and Imbens (2000) study the interpretation of the standard instrumental variable estimand, the ratio of the covariances of outcome and instrument and treatment and instrument. They show that in general, with a valid instrument, the instrumental variables estimand can still be interpreted as an average causal effect, but with a complicated weighting scheme. There are essentially two levels of averaging going on. First, at each level of the treatment we can only get the average effect of a unit increase in the treatment for compliers at that level. In addition, there is averaging over all levels of the treatment, with the weights equal to the proportion of compliers at that level.

Imbens (2007) studies, in more detail, the case where the endogenous treatment takes on three values and shows the limits to identification in the case with heterogenous treatment effects.

7.4 Continuous Endogenous Treatments

Perhaps surprisingly, there are many more results for the case with continuous endogenous treatments than for the discrete case that do not impose restrictive assumptions. Much of the focus has been on triangular systems, with a single unobserved component of the equation determining the treatment:

W_i = h(Z_i, η_i),

where η_i is scalar, and an essentially unrestricted outcome equation:

Y_i = g(W_i, ε_i),

where ε_i may be a vector. Blundell and James L. Powell (2003, 2004), Chernozhukov and Hansen (2005), Imbens and Newey (forthcoming), and Andrew Chesher (2003) study various versions of this setup. Imbens and Newey (forthcoming) show that if h(z, η) is strictly monotone in η, then one can identify average effects of the treatment subject to support conditions on the instrument. They suggest a control function approach to estimation. First η is normalized to have a uniform distribution on [0, 1] (e.g., Rosa L. Matzkin 2003). Then η_i is estimated as η̂_i = F̂_{W|Z}(W_i | Z_i). In the second stage, Y_i is regressed nonparametrically on W_i and η̂_i. Chesher (2003) studies local versions of this problem.

When the treatment equation has an additive form, say W_i = h_1(Z_i) + η_i, where η_i is independent of Z_i, Blundell and Powell (2003, 2004) derive nonparametric control function methods for estimating the average structural function, E[g(w, ε_i)]. The general idea is to first obtain residuals η̂_i = W_i − ĥ_1(Z_i) from a nonparametric regression. Next, a nonparametric regression of Y_i on W_i and η̂_i is used to recover m(w, η) = E(Y_i | W_i = w, η_i = η). Blundell and Powell show that the average structural function is generally identified as E[m(w, η_i)], which is easily estimated by averaging out η̂_i across the sample.

8. Conclusion

Over the last two decades, there has been a proliferation of the literature on program evaluation. This includes theoretical econometrics work, as well as empirical work. Important features of the modern literature are the convergence of the statistical and econometric literatures, with the Rubin potential outcomes framework now the dominant framework. The modern literature has stressed the importance of relaxing functional form and distributional assumptions, and has allowed for general heterogeneity in the effects of the treatment. This has led to renewed interest in identification questions, leading to unusual and controversial estimands such as the local average treatment effect (Imbens and Angrist 1994), as well as to the literature on partial identification (Manski 1990). It has also borrowed heavily from the semiparametric literature, using both efficiency bound results (Hahn 1998) and methods for inference based on series and kernel estimation (Newey 1994a, 1994b). It has by now matured to the point that it is of great use for practitioners.

References

Abadie, Alberto. 2002. "Bootstrap Tests of Distributional Treatment Effects in Instrumental Variable Models." Journal of the American Statistical Association, 97(457): 284–92.
Abadie, Alberto. 2003. "Semiparametric Instrumental Variable Estimation of Treatment Response Models." Journal of Econometrics, 113(2): 231–63.
Abadie, Alberto. 2005. "Semiparametric Difference-in-Differences Estimators." Review of Economic Studies, 72(1): 1–19.
Abadie, Alberto, Joshua D. Angrist, and Guido W. Imbens. 2002. "Instrumental Variables Estimates of the Effect of Subsidized Training on the Quantiles of Trainee Earnings." Econometrica, 70(1): 91–117.
Abadie, Alberto, Alexis Diamond, and Jens Hainmueller. 2007. "Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program." National Bureau of Economic Research Working Paper 12831.
Abadie, Alberto, David Drukker, Jane Leber Herr, and Guido W. Imbens. 2004. "Implementing Matching Estimators for Average Treatment Effects in Stata." Stata Journal, 4(3): 290–311.
Abadie, Alberto, and Javier Gardeazabal. 2003. "The Economic Costs of Conflict: A Case Study of the Basque Country." American Economic Review, 93(1): 113–32.
Abadie, Alberto, and Guido W. Imbens. 2006. "Large Sample Properties of Matching Estimators for Average Treatment Effects." Econometrica, 74(1): 235–67.
Abadie, Alberto, and Guido W. Imbens. 2008a. "Bias Corrected Matching Estimators for Average Treatment Effects." Unpublished.
Abadie, Alberto, and Guido W. Imbens. 2008b. "On the Failure of the Bootstrap for Matching Estimators." Econometrica, 76(6): 1537–57.
Abadie, Alberto, and Guido W. Imbens. Forthcoming. "Estimation of the Conditional Variance in Paired Experiments." Annales d'Economie et de Statistique.
Abbring, Jaap H., and James J. Heckman. 2007. "Econometric Evaluation of Social Programs, Part III: Distributional Treatment Effects, Dynamic Treatment Effects, Dynamic Discrete Choice, and General Equilibrium Policy Evaluation." In Handbook of Econometrics, Volume 6B, ed. James J. Heckman and Edward E. Leamer, 5145–5303. Amsterdam; New York and Oxford: Elsevier Science, North-Holland.
Abbring, Jaap H., and Gerard J. van den Berg. 2003. "The Nonparametric Identification of Treatment Effects in Duration Models." Econometrica, 71(5): 1491–1517.
Andrews, Donald W. K., and Gustavo Soares. 2007. "Inference for Parameters Defined by Moment Inequalities Using Generalized Moment Selection." Cowles Foundation Discussion Paper 1631.
Angrist, Joshua D. 1990. "Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social

Angrist, Joshua D. 1998. “Estimating the Labor Market Impact of Voluntary Military Service Using Social Security Data on Military Applicants.” Econometrica, 66(2): 249–88.
Angrist, Joshua D. 2004. “Treatment Effect Heterogeneity in Theory and Practice.” Economic Journal, 114(494): C52–83.
Angrist, Joshua D., Eric Bettinger, and Michael Kremer. 2006. “Long-Term Educational Consequences of Secondary School Vouchers: Evidence from Administrative Records in Colombia.” American Economic Review, 96(3): 847–62.
Angrist, Joshua D., Kathryn Graddy, and Guido W. Imbens. 2000. “The Interpretation of Instrumental Variables Estimators in Simultaneous Equations Models with an Application to the Demand for Fish.” Review of Economic Studies, 67(3): 499–527.
Angrist, Joshua D., and Jinyong Hahn. 2004. “When to Control for Covariates? Panel Asymptotics for Estimates of Treatment Effects.” Review of Economics and Statistics, 86(1): 58–72.
Angrist, Joshua D., and Guido W. Imbens. 1995. “Two-Stage Least Squares Estimation of Average Causal Effects in Models with Variable Treatment Intensity.” Journal of the American Statistical Association, 90(430): 431–42.
Angrist, Joshua D., Guido W. Imbens, and Donald B. Rubin. 1996. “Identification of Causal Effects Using Instrumental Variables.” Journal of the American Statistical Association, 91(434): 444–55.
Angrist, Joshua D., and Alan B. Krueger. 1999. “Empirical Strategies in Labor Economics.” In Handbook of Labor Economics, Volume 3A, ed. Orley Ashenfelter and David Card, 1277–1366. Amsterdam; New York and Oxford: Elsevier Science, North-Holland.
Angrist, Joshua D., and Kevin Lang. 2004. “Does School Integration Generate Peer Effects? Evidence from Boston’s Metco Program.” American Economic Review, 94(5): 1613–34.
Angrist, Joshua D., and Victor Lavy. 1999. “Using Maimonides’ Rule to Estimate the Effect of Class Size on Scholastic Achievement.” Quarterly Journal of Economics, 114(2): 533–75.
Angrist, Joshua D., and Jörn-Steffen Pischke. 2009. Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton: Princeton University Press.
Ashenfelter, Orley. 1978. “Estimating the Effect of Training Programs on Earnings.” Review of Economics and Statistics, 60(1): 47–57.
Ashenfelter, Orley, and David Card. 1985. “Using the Longitudinal Structure of Earnings to Estimate the Effect of Training Programs.” Review of Economics and Statistics, 67(4): 648–60.
Athey, Susan, and Guido W. Imbens. 2006. “Identification and Inference in Nonlinear Difference-in-Differences Models.” Econometrica, 74(2): 431–97.
Athey, Susan, and Scott Stern. 1998. “An Empirical Framework for Testing Theories About Complementarity in Organizational Design.” National Bureau of Economic Research Working Paper 6600.
Attanasio, Orazio, Costas Meghir, and Ana Santiago. 2005. “Education Choices in Mexico: Using a Structural Model and a Randomized Experiment to Evaluate Progresa.” Institute for Fiscal Studies Centre for the Evaluation of Development Policies Working Paper EWP05/01.
Austin, Peter C. 2008a. “A Critical Appraisal of Propensity-Score Matching in the Medical Literature between 1996 and 2003.” Statistics in Medicine, 27(12): 2037–49.
Austin, Peter C. 2008b. “Discussion of ‘A Critical Appraisal of Propensity-Score Matching in the Medical Literature between 1996 and 2003’: Rejoinder.” Statistics in Medicine, 27(12): 2066–69.
Balke, Alexander, and Judea Pearl. 1994. “Nonparametric Bounds of Causal Effects from Partial Compliance Data.” University of California Los Angeles Cognitive Systems Laboratory Technical Report R-199.
Banerjee, Abhijit V., Shawn Cole, Esther Duflo, and Leigh Linden. 2007. “Remedying Education: Evidence from Two Randomized Experiments in India.” Quarterly Journal of Economics, 122(3): 1235–64.
Barnow, Burt S., Glen G. Cain, and Arthur S. Goldberger. 1980. “Issues in the Analysis of Selectivity Bias.” In Evaluation Studies, Volume 5, ed. Ernst W. Stromsdorfer and George Farkas, 43–59. San Francisco: Sage.
Becker, Sascha O., and Andrea Ichino. 2002. “Estimation of Average Treatment Effects Based on Propensity Scores.” Stata Journal, 2(4): 358–77.
Behncke, Stefanie, Markus Frölich, and Michael Lechner. 2006. “Statistical Assistance for Programme Selection—For a Better Targeting of Active Labour Market Policies in Switzerland.” University of St. Gallen Department of Economics Discussion Paper 2006-09.
Beresteanu, Arie, and Francesca Molinari. 2006. “Asymptotic Properties for a Class of Partially Identified Models.” Institute for Fiscal Studies Centre for Microdata Methods and Practice Working Paper CWP10/06.
Bertrand, Marianne, Esther Duflo, and Sendhil Mullainathan. 2004. “How Much Should We Trust Differences-in-Differences Estimates?” Quarterly Journal of Economics, 119(1): 249–75.
Bertrand, Marianne, and Sendhil Mullainathan. 2004. “Are Emily and Greg More Employable than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination.” American Economic Review, 94(4): 991–1013.
Besley, Timothy, and Anne C. Case. 2000. “Unnatural Experiments? Estimating the Incidence of Endogenous Policies.” Economic Journal, 110(467): F672–94.
Bierens, Herman J. 1987. “Kernel Estimators of Regression Functions.” In Advances in Econometrics: Fifth World Congress, Volume 1, ed. Truman F. Bewley, 99–144. Cambridge and New York: Cambridge University Press.

Bitler, Marianne, Jonah Gelbach, and Hilary Hoynes. 2006. “What Mean Impacts Miss: Distributional Effects of Welfare Reform Experiments.” American Economic Review, 96(4): 988–1012.
Björklund, Anders, and Robert Moffitt. 1987. “The Estimation of Wage Gains and Welfare Gains in Self-Selection.” Review of Economics and Statistics, 69(1): 42–49.
Black, Sandra E. 1999. “Do Better Schools Matter? Parental Valuation of Elementary Education.” Quarterly Journal of Economics, 114(2): 577–99.
Bloom, Howard S. 1984. “Accounting for No-Shows in Experimental Evaluation Designs.” Evaluation Review, 8(2): 225–46.
Bloom, Howard S., ed. 2005. Learning More from Social Experiments: Evolving Analytic Approaches. New York: Russell Sage Foundation.
Blundell, Richard, and Monica Costa Dias. 2002. “Alternative Approaches to Evaluation in Empirical Microeconomics.” Institute for Fiscal Studies Centre for Microdata Methods and Practice Working Paper CWP10/02.
Blundell, Richard, Monica Costa Dias, Costas Meghir, and John Van Reenen. 2001. “Evaluating the Employment Impact of a Mandatory Job Search Assistance Program.” Institute for Fiscal Studies Working Paper WP01/20.
Blundell, Richard, Alan Duncan, and Costas Meghir. 1998. “Estimating Labor Supply Responses Using Tax Reforms.” Econometrica, 66(4): 827–61.
Blundell, Richard, Amanda Gosling, Hidehiko Ichimura, and Costas Meghir. 2004. “Changes in the Distribution of Male and Female Wages Accounting for Employment Composition Using Bounds.” Institute for Fiscal Studies Working Paper W04/25.
Blundell, Richard, and Thomas MaCurdy. 1999. “Labor Supply: A Review of Alternative Approaches.” In Handbook of Labor Economics, Volume 3A, ed. Orley Ashenfelter and David Card, 1559–1695. Amsterdam; New York and Oxford: Elsevier Science, North-Holland.
Blundell, Richard, and James L. Powell. 2003. “Endogeneity in Nonparametric and Semiparametric Regression Models.” In Advances in Economics and Econometrics: Theory and Applications, Eighth World Congress, Volume 2, ed. Mathias Dewatripont, Lars Peter Hansen, and Stephen J. Turnovsky, 312–57. Cambridge and New York: Cambridge University Press.
Blundell, Richard, and James L. Powell. 2004. “Endogeneity in Semiparametric Binary Response Models.” Review of Economic Studies, 71(3): 655–79.
Brock, William, and Steven N. Durlauf. 2000. “Interactions-Based Models.” National Bureau of Economic Research Technical Working Paper 258.
Bruhn, Miriam, and David McKenzie. 2008. “In Pursuit of Balance: Randomization in Practice in Development Field Experiments.” Policy Research Working Paper 4752.
Burtless, Gary. 1995. “The Case for Randomized Field Trials in Economic and Policy Research.” Journal of Economic Perspectives, 9(2): 63–84.
Busso, Matias, John DiNardo, and Justin McCrary. 2008. “Finite Sample Properties of Semiparametric Estimators of Average Treatment Effects.” Unpublished.
Caliendo, Marco. 2006. Microeconometric Evaluation of Labour Market Policies. Heidelberg: Springer, Physica-Verlag.
Cameron, A. Colin, and Pravin K. Trivedi. 2005. Microeconometrics: Methods and Applications. Cambridge and New York: Cambridge University Press.
Canay, Ivan A. 2007. “EL Inference for Partially Identified Models: Large Deviations Optimality and Bootstrap Validity.” Unpublished.
Card, David. 1990. “The Impact of the Mariel Boatlift on the Miami Labor Market.” Industrial and Labor Relations Review, 43(2): 245–57.
Card, David. 2001. “Estimating the Return to Schooling: Progress on Some Persistent Econometric Problems.” Econometrica, 69(5): 1127–60.
Card, David, Carlos Dobkin, and Nicole Maestas. 2004. “The Impact of Nearly Universal Insurance Coverage on Health Care Utilization and Health: Evidence from Medicare.” National Bureau of Economic Research Working Paper 10365.
Card, David, and Dean R. Hyslop. 2005. “Estimating the Effects of a Time-Limited Earnings Subsidy for Welfare-Leavers.” Econometrica, 73(6): 1723–70.
Card, David, and Alan B. Krueger. 1993. “Trends in Relative Black–White Earnings Revisited.” American Economic Review, 83(2): 85–91.
Card, David, and Alan B. Krueger. 1994. “Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania.” American Economic Review, 84(4): 772–93.
Card, David, and Phillip B. Levine. 1994. “Unemployment Insurance Taxes and the Cyclical and Seasonal Properties of Unemployment.” Journal of Public Economics, 53(1): 1–29.
Card, David, Alexandre Mas, and Jesse Rothstein. 2007. “Tipping and the Dynamics of Segregation.” National Bureau of Economic Research Working Paper 13052.
Card, David, and Brian P. McCall. 1996. “Is Workers’ Compensation Covering Uninsured Medical Costs? Evidence from the ‘Monday Effect.’” Industrial and Labor Relations Review, 49(4): 690–706.
Card, David, and Philip K. Robins. 1996. “Do Financial Incentives Encourage Welfare Recipients to Work? Evidence from a Randomized Evaluation of the Self-Sufficiency Project.” National Bureau of Economic Research Working Paper 5701.
Card, David, and Daniel G. Sullivan. 1988. “Measuring the Effect of Subsidized Training Programs on Movements In and Out of Employment.” Econometrica, 56(3): 497–530.
Case, Anne C., and Lawrence F. Katz. 1991. “The Company You Keep: The Effects of Family and Neighborhood on Disadvantaged Youths.” National Bureau of Economic Research Working Paper 3705.

Chamberlain, Gary. 1986. “Asymptotic Efficiency in Semi-parametric Models with Censoring.” Journal of Econometrics, 32(2): 189–218.
Chattopadhyay, Raghabendra, and Esther Duflo. 2004. “Women as Policy Makers: Evidence from a Randomized Policy Experiment in India.” Econometrica, 72(5): 1409–43.
Chay, Kenneth Y., and Michael Greenstone. 2005. “Does Air Quality Matter? Evidence from the Housing Market.” Journal of Political Economy, 113(2): 376–424.
Chen, Susan, and Wilbert van der Klaauw. 2008. “The Work Disincentive Effects of the Disability Insurance Program in the 1990s.” Journal of Econometrics, 142(2): 757–84.
Chen, Xiaohong. 2007. “Large Sample Sieve Estimation of Semi-nonparametric Models.” In Handbook of Econometrics, Volume 6B, ed. James J. Heckman and Edward E. Leamer, 5549–5632. Amsterdam and Oxford: Elsevier, North-Holland.
Chen, Xiaohong, Han Hong, and Alessandro Tarozzi. 2008. “Semiparametric Efficiency in GMM Models with Auxiliary Data.” Annals of Statistics, 36(2): 808–43.
Chernozhukov, Victor, and Christian B. Hansen. 2005. “An IV Model of Quantile Treatment Effects.” Econometrica, 73(1): 245–61.
Chernozhukov, Victor, Han Hong, and Elie Tamer. 2007. “Estimation and Confidence Regions for Parameter Sets in Econometric Models.” Econometrica, 75(5): 1243–84.
Chesher, Andrew. 2003. “Identification in Nonseparable Models.” Econometrica, 71(5): 1405–41.
Chetty, Raj, Adam Looney, and Kory Kroft. Forthcoming. “Salience and Taxation: Theory and Evidence.” American Economic Review.
Cochran, William G. 1968. “The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies.” Biometrics, 24(2): 295–314.
Cochran, William G., and Donald B. Rubin. 1973. “Controlling Bias in Observational Studies: A Review.” Sankhya, 35(4): 417–46.
Cook, Philip J., and George Tauchen. 1982. “The Effect of Liquor Taxes on Heavy Drinking.” Bell Journal of Economics, 13(2): 379–90.
Cook, Philip J., and George Tauchen. 1984. “The Effect of Minimum Drinking Age Legislation on Youthful Auto Fatalities, 1970–1977.” Journal of Legal Studies, 13(1): 169–90.
Cook, Thomas D. 2008. “‘Waiting for Life to Arrive’: A History of the Regression–Discontinuity Design in Psychology, Statistics and Economics.” Journal of Econometrics, 142(2): 636–54.
Crump, Richard K., V. Joseph Hotz, Guido W. Imbens, and Oscar A. Mitnik. 2008. “Nonparametric Tests for Treatment Effect Heterogeneity.” Review of Economics and Statistics, 90(3): 389–405.
Crump, Richard K., V. Joseph Hotz, Guido W. Imbens, and Oscar A. Mitnik. 2009. “Dealing with Limited Overlap in Estimation of Average Treatment Effects.” Biometrika, 96: 187–99.
Davison, A. C., and D. V. Hinkley. 1997. Bootstrap Methods and Their Application. Cambridge and New York: Cambridge University Press.
Dehejia, Rajeev H. 2003. “Was There a Riverside Miracle? A Hierarchical Framework for Evaluating Programs with Grouped Data.” Journal of Business and Economic Statistics, 21(1): 1–11.
Dehejia, Rajeev H. 2005a. “Practical Propensity Score Matching: A Reply to Smith and Todd.” Journal of Econometrics, 125(1–2): 355–64.
Dehejia, Rajeev H. 2005b. “Program Evaluation as a Decision Problem.” Journal of Econometrics, 125(1–2): 141–73.
Dehejia, Rajeev H., and Sadek Wahba. 1999. “Causal Effects in Nonexperimental Studies: Reevaluating the Evaluation of Training Programs.” Journal of the American Statistical Association, 94(448): 1053–62.
Diamond, Alexis, and Jasjeet S. Sekhon. 2008. “Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies.” Unpublished.
DiNardo, John, and David S. Lee. 2004. “Economic Impacts of New Unionization on Private Sector Employers: 1984–2001.” Quarterly Journal of Economics, 119(4): 1383–1441.
Doksum, Kjell. 1974. “Empirical Probability Plots and Statistical Inference for Nonlinear Models in the Two-Sample Case.” Annals of Statistics, 2(2): 267–77.
Donald, Stephen G., and Kevin Lang. 2007. “Inference with Difference-in-Differences and Other Panel Data.” Review of Economics and Statistics, 89(2): 221–33.
Duflo, Esther. 2001. “Schooling and Labor Market Consequences of School Construction in Indonesia: Evidence from an Unusual Policy Experiment.” American Economic Review, 91(4): 795–813.
Duflo, Esther, William Gale, Jeffrey B. Liebman, Peter Orszag, and Emmanuel Saez. 2006. “Saving Incentives for Low- and Middle-Income Families: Evidence from a Field Experiment with H&R Block.” Quarterly Journal of Economics, 121(4): 1311–46.
Duflo, Esther, Rachel Glennerster, and Michael Kremer. 2008. “Using Randomization in Development Economics Research: A Toolkit.” In Handbook of Development Economics, Volume 4, ed. T. Paul Schultz and John Strauss, 3895–3962. Amsterdam and Oxford: Elsevier, North-Holland.
Duflo, Esther, and Rema Hanna. 2005. “Monitoring Works: Getting Teachers to Come to School.” National Bureau of Economic Research Working Paper 11880.
Duflo, Esther, and Emmanuel Saez. 2003. “The Role of Information and Social Interactions in Retirement Plan Decisions: Evidence from a Randomized Experiment.” Quarterly Journal of Economics, 118(3): 815–42.
Efron, Bradley, and Robert J. Tibshirani. 1993. An Introduction to the Bootstrap. New York and London: Chapman and Hall.

Eissa, Nada, and Jeffrey B. Liebman. 1996. “Labor Supply Response to the Earned Income Tax Credit.” Quarterly Journal of Economics, 111(2): 605–37.
Engle, Robert F., David F. Hendry, and Jean-Francois Richard. 1983. “Exogeneity.” Econometrica, 51(2): 277–304.
Fan, J., and I. Gijbels. 1996. Local Polynomial Modelling and Its Applications. London: Chapman and Hall.
Ferraz, Claudio, and Frederico Finan. 2008. “Exposing Corrupt Politicians: The Effects of Brazil’s Publicly Released Audits on Electoral Outcomes.” Quarterly Journal of Economics, 123(2): 703–45.
Firpo, Sergio. 2007. “Efficient Semiparametric Estimation of Quantile Treatment Effects.” Econometrica, 75(1): 259–76.
Fisher, Ronald A. 1935. The Design of Experiments, First edition. London: Oliver and Boyd.
Flores, Carlos A. 2005. “Estimation of Dose-Response Functions and Optimal Doses with a Continuous Treatment.” Unpublished.
Fraker, Thomas, and Rebecca Maynard. 1987. “The Adequacy of Comparison Group Designs for Evaluations of Employment-Related Programs.” Journal of Human Resources, 22(2): 194–227.
Friedlander, Daniel, and Judith M. Gueron. 1992. “Are High-Cost Services More Effective than Low-Cost Services?” In Evaluating Welfare Training Programs, ed. Charles F. Manski and Irwin Garfinkel, 143–98. Cambridge and London: Harvard University Press.
Friedlander, Daniel, and Philip K. Robins. 1995. “Evaluating Program Evaluations: New Evidence on Commonly Used Nonexperimental Methods.” American Economic Review, 85(4): 923–37.
Frölich, Markus. 2004a. “Finite-Sample Properties of Propensity-Score Matching and Weighting Estimators.” Review of Economics and Statistics, 86(1): 77–90.
Frölich, Markus. 2004b. “A Note on the Role of the Propensity Score for Estimating Average Treatment Effects.” Econometric Reviews, 23(2): 167–74.
Gill, Richard D., and James M. Robins. 2001. “Causal Inference for Complex Longitudinal Data: The Continuous Case.” Annals of Statistics, 29(6): 1785–1811.
Glaeser, Edward L., Bruce Sacerdote, and Jose A. Scheinkman. 1996. “Crime and Social Interactions.” Quarterly Journal of Economics, 111(2): 507–48.
Goldberger, Arthur S. 1972a. “Selection Bias in Evaluating Treatment Effects: Some Formal Illustrations.” Unpublished.
Goldberger, Arthur S. 1972b. “Selection Bias in Evaluating Treatment Effects: The Case of Interaction.” Unpublished.
Gourieroux, C., A. Monfort, and A. Trognon. 1984a. “Pseudo Maximum Likelihood Methods: Applications to Poisson Models.” Econometrica, 52(3): 701–20.
Gourieroux, C., A. Monfort, and A. Trognon. 1984b. “Pseudo Maximum Likelihood Methods: Theory.” Econometrica, 52(3): 681–700.
Graham, Bryan S. 2008. “Identifying Social Interactions through Conditional Variance Restrictions.” Econometrica, 76(3): 643–60.
Graham, Bryan S., Guido W. Imbens, and Geert Ridder. 2006. “Complementarity and Aggregate Implications of Assortative Matching: A Nonparametric Analysis.” Unpublished.
Greenberg, David, and Michael Wiseman. 1992. “What Did the OBRA Demonstrations Do?” In Evaluating Welfare and Training Programs, ed. Charles F. Manski and Irwin Garfinkel, 25–75. Cambridge and London: Harvard University Press.
Gu, X., and Paul R. Rosenbaum. 1993. “Comparison of Multivariate Matching Methods: Structures, Distances and Algorithms.” Journal of Computational and Graphical Statistics, 2(4): 405–20.
Gueron, Judith M., and Edward Pauly. 1991. From Welfare to Work. New York: Russell Sage Foundation.
Haavelmo, Trygve. 1943. “The Statistical Implications of a System of Simultaneous Equations.” Econometrica, 11(1): 1–12.
Hahn, Jinyong. 1998. “On the Role of the Propensity Score in Efficient Semiparametric Estimation of Average Treatment Effects.” Econometrica, 66(2): 315–31.
Hahn, Jinyong, Petra E. Todd, and Wilbert van der Klaauw. 2001. “Identification and Estimation of Treatment Effects with a Regression-Discontinuity Design.” Econometrica, 69(1): 201–09.
Ham, John C., and Robert J. LaLonde. 1996. “The Effect of Sample Selection and Initial Conditions in Duration Models: Evidence from Experimental Data on Training.” Econometrica, 64(1): 175–205.
Hamermesh, Daniel S., and Jeff E. Biddle. 1994. “Beauty and the Labor Market.” American Economic Review, 84(5): 1174–94.
Hansen, B. B. 2008. “The Essential Role of Balance Tests in Propensity-Matched Observational Studies: Comments on ‘A Critical Appraisal of Propensity-Score Matching in the Medical Literature between 1996 and 2003’ by Peter Austin.” Statistics in Medicine, 27(12): 2050–54.
Hansen, Christian B. 2007a. “Asymptotic Properties of a Robust Variance Matrix Estimator for Panel Data When T Is Large.” Journal of Econometrics, 141(2): 597–620.
Hansen, Christian B. 2007b. “Generalized Least Squares Inference in Panel and Multilevel Models with Serial Correlation and Fixed Effects.” Journal of Econometrics, 140(2): 670–94.
Hanson, Samuel, and Adi Sunderam. 2008. “The Variance of Average Treatment Effect Estimators in the Presence of Clustering.” Unpublished.
Hardle, Wolfgang. 1990. Applied Nonparametric Regression. Cambridge; New York and Melbourne: Cambridge University Press.
Heckman, James J. 1990. “Varieties of Selection Bias.” American Economic Review, 80(2): 313–18.
Heckman, James J., and V. Joseph Hotz. 1989. “Choosing among Alternative Nonexperimental Methods for Estimating the Impact of Social Programs: The Case of Manpower Training.” Journal of the American Statistical Association, 84(408): 862–74.

Heckman, James J., Hidehiko Ichimura, Jeffrey A. Smith, and Petra E. Todd. 1998. “Characterizing Selection Bias Using Experimental Data.” Econometrica, 66(5): 1017–98.
Heckman, James J., Hidehiko Ichimura, and Petra E. Todd. 1997. “Matching as an Econometric Evaluation Estimator: Evidence from Evaluating a Job Training Programme.” Review of Economic Studies, 64(4): 605–54.
Heckman, James J., Hidehiko Ichimura, and Petra E. Todd. 1998. “Matching as an Econometric Evaluation Estimator.” Review of Economic Studies, 65(2): 261–94.
Heckman, James J., Robert J. LaLonde, and Jeffrey A. Smith. 1999. “The Economics and Econometrics of Active Labor Market Programs.” In Handbook of Labor Economics, Volume 3A, ed. Orley Ashenfelter and David Card, 1865–2097. Amsterdam; New York and Oxford: Elsevier Science, North-Holland.
Heckman, James J., Lance Lochner, and Christopher Taber. 1999. “Human Capital Formation and General Equilibrium Treatment Effects: A Study of Tax and Tuition Policy.” Fiscal Studies, 20(1): 25–40.
Heckman, James J., and Salvador Navarro-Lozano. 2004. “Using Matching, Instrumental Variables, and Control Functions to Estimate Economic Choice Models.” Review of Economics and Statistics, 86(1): 30–57.
Heckman, James J., and Richard Robb Jr. 1985. “Alternative Methods for Evaluating the Impact of Interventions.” In Longitudinal Analysis of Labor Market Data, ed. James J. Heckman and Burton Singer, 156–245. Cambridge; New York and Sydney: Cambridge University Press.
Heckman, James J., and Jeffrey A. Smith. 1995. “Assessing the Case for Social Experiments.” Journal of Economic Perspectives, 9(2): 85–110.
Heckman, James J., and Jeffrey A. Smith. 1997. “Making the Most Out of Programme Evaluations and Social Experiments: Accounting for Heterogeneity in Programme Impacts.” Review of Economic Studies, 64(4): 487–535.
Heckman, James J., Sergio Urzua, and Edward Vytlacil. 2006. “Understanding Instrumental Variables in Models with Essential Heterogeneity.” Review of Economics and Statistics, 88(3): 389–432.
Heckman, James J., and Edward Vytlacil. 2005. “Structural Equations, Treatment Effects, and Econometric Policy Evaluation.” Econometrica, 73(3): 669–738.
Heckman, James J., and Edward Vytlacil. 2007a. “Econometric Evaluation of Social Programs, Part I: Causal Models, Structural Models and Econometric Policy Evaluation.” In Handbook of Econometrics, Volume 6B, ed. James J. Heckman and Edward E. Leamer, 4779–4874. Amsterdam and Oxford: Elsevier, North-Holland.
Heckman, James J., and Edward Vytlacil. 2007b. “Econometric Evaluation of Social Programs, Part II: Using the Marginal Treatment Effect to Organize Alternative Econometric Estimators to Evaluate Social Programs, and to Forecast Their Effects in New Environments.” In Handbook of Econometrics, Volume 6B, ed. James J. Heckman and Edward E. Leamer, 4875–5143. Amsterdam and Oxford: Elsevier, North-Holland.
Hill, Jennifer. 2008. “Discussion of Research Using Propensity-Score Matching: Comments on ‘A Critical Appraisal of Propensity-Score Matching in the Medical Literature between 1996 and 2003’ by Peter Austin.” Statistics in Medicine, 27(12): 2055–61.
Hirano, Keisuke, and Guido W. Imbens. 2001. “Estimation of Causal Effects Using Propensity Score Weighting: An Application to Data on Right Heart Catheterization.” Health Services and Outcomes Research Methodology, 2(3–4): 259–78.
Hirano, Keisuke, and Guido W. Imbens. 2004. “The Propensity Score with Continuous Treatments.” In Applied Bayesian Modeling and Causal Inference from Incomplete-Data Perspectives, ed. Andrew Gelman and Xiao-Li Meng, 73–84. Hoboken, N.J.: Wiley.
Hirano, Keisuke, Guido W. Imbens, and Geert Ridder. 2003. “Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score.” Econometrica, 71(4): 1161–89.
Hirano, Keisuke, Guido W. Imbens, Donald B. Rubin, and Xiao-Hua Zhou. 2000. “Assessing the Effect of an Influenza Vaccine in an Encouragement Design.” Biostatistics, 1(1): 69–88.
Hirano, Keisuke, and Jack R. Porter. 2008. “Asymptotics for Statistical Treatment Rules.” http://www.u.arizona.edu/~hirano/hp3_2008_08_10.pdf.
Holland, Paul W. 1986. “Statistics and Causal Inference.” Journal of the American Statistical Association, 81(396): 945–60.
Horowitz, Joel L. 2001. “The Bootstrap.” In Handbook of Econometrics, Volume 5, ed. James J. Heckman and Edward Leamer, 3159–3228. Amsterdam; London and New York: Elsevier Science, North-Holland.
Horowitz, Joel L., and Charles F. Manski. 2000. “Nonparametric Analysis of Randomized Experiments with Missing Covariate and Outcome Data.” Journal of the American Statistical Association, 95(449): 77–84.
Horvitz, D. G., and D. J. Thompson. 1952. “A Generalization of Sampling without Replacement from a Finite Universe.” Journal of the American Statistical Association, 47(260): 663–85.
Hotz, V. Joseph, Guido W. Imbens, and Jacob A. Klerman. 2006. “Evaluating the Differential Effects of Alternative Welfare-to-Work Training Components: A Reanalysis of the California GAIN Program.” Journal of Labor Economics, 24(3): 521–66.
Hotz, V. Joseph, Guido W. Imbens, and Julie H. Mortimer. 2005. “Predicting the Efficacy of Future Training Programs Using Past Experiences at Other Locations.” Journal of Econometrics, 125(1–2): 241–70.

Hotz, V. Joseph, Charles H. Mullin, and Seth G. Sanders. 1997. “Bounding Causal Effects Using Data from a Contaminated Natural Experiment: Analysing the Effects of Teenage Childbearing.” Review of Economic Studies, 64(4): 575–603.
Iacus, Stefano M., Gary King, and Giuseppe Porro. 2008. “Matching for Causal Inference without Balance Checking.” Unpublished.
Ichimura, Hidehiko, and Oliver Linton. 2005. “Asymptotic Expansions for Some Semiparametric Program Evaluation Estimators.” In Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg, ed. Donald W. K. Andrews and James H. Stock, 149–70. Cambridge and New York: Cambridge University Press.
Ichimura, Hidehiko, and Petra E. Todd. 2007. “Implementing Nonparametric and Semiparametric Estimators.” In Handbook of Econometrics, Volume 6B, ed. James J. Heckman and Edward E. Leamer, 5369–5468. Amsterdam and Oxford: Elsevier, North-Holland.
Imbens, Guido W. 2000. “The Role of the Propensity Score in Estimating Dose-Response Functions.” Biometrika, 87(3): 706–10.
Imbens, Guido W. 2003. “Sensitivity to Exogeneity Assumptions in Program Evaluation.” American Economic Review, 93(2): 126–32.
Imbens, Guido W. 2004. “Nonparametric Estimation of Average Treatment Effects under Exogeneity: A Review.” Review of Economics and Statistics, 86(1): 4–29.
Imbens, Guido W. 2007. “Non-additive Models with Endogenous Regressors.” In Advances in Economics and Econometrics: Theory and Applications, Ninth World Congress, Volume 3, ed. Richard Blundell, Whitney K. Newey, and Torsten Persson, 17–46. Cambridge and New York: Cambridge University Press.
Imbens, Guido W., and Joshua D. Angrist. 1994. “Identification and Estimation of Local Average Treatment Effects.” Econometrica, 62(2): 467–75.
Imbens, Guido W., and Karthik Kalyanaraman. 2009. “Optimal Bandwidth Choice for the Regression Discontinuity Estimator.” National Bureau of Economic Research Working Paper 14726.
Imbens, Guido W., Gary King, David McKenzie, and Geert Ridder. 2008. “On the Benefits of Stratification in Randomized Experiments.” Unpublished.
Imbens, Guido W., and Thomas Lemieux. 2008. “Regression Discontinuity Designs: A Guide to Practice.” Journal of Econometrics, 142(2): 615–35.
Imbens, Guido W., and Charles F. Manski. 2004. “Confidence Intervals for Partially Identified Parameters.” Econometrica, 72(6): 1845–57.
Imbens, Guido W., and Whitney K. Newey. Forthcoming. “Identification and Estimation of Triangular Simultaneous Equations Models without Additivity.” Econometrica.
Imbens, Guido W., Whitney K. Newey, and Geert Ridder. 2005. “Mean-Squared-Error Calculations for Average Treatment Effects.” Unpublished.
Imbens, Guido W., and Donald B. Rubin. 1997a. “Bayesian Inference for Causal Effects in Randomized Experiments with Noncompliance.” Annals of Statistics, 25(1): 305–27.
Imbens, Guido W., and Donald B. Rubin. 1997b. “Estimating Outcome Distributions for Compliers in Instrumental Variables Models.” Review of Economic Studies, 64(4): 555–74.
Imbens, Guido W., and Donald B. Rubin. Forthcoming. Causal Inference in Statistics and the Social Sciences. Cambridge and New York: Cambridge University Press.
Imbens, Guido W., Donald B. Rubin, and Bruce I. Sacerdote. 2001. “Estimating the Effect of Unearned Income on Labor Earnings, Savings, and Consumption: Evidence from a Survey of Lottery Players.” American Economic Review, 91(4): 778–94.
Jin, Ginger Zhe, and Phillip Leslie. 2003. “The Effect of Information on Product Quality: Evidence from Restaurant Hygiene Grade Cards.” Quarterly Journal of Economics, 118(2): 409–51.
Joffe, Marshall M., and Paul R. Rosenbaum. 1999. “Invited Commentary: Propensity Scores.” American Journal of Epidemiology, 150(4): 327–33.
Kitagawa, Toru. 2008. “Identification Bounds for the Local Average Treatment Effect.” Unpublished.
Kling, Jeffrey R., Jeffrey B. Liebman, and Lawrence F. Katz. 2007. “Experimental Analysis of Neighborhood Effects.” Econometrica, 75(1): 83–119.
Lalive, Rafael. 2008. “How Do Extended Benefits Affect Unemployment Duration? A Regression Discontinuity Approach.” Journal of Econometrics, 142(2): 785–806.
LaLonde, Robert J. 1986. “Evaluating the Econometric Evaluations of Training Programs with Experimental Data.” American Economic Review, 76(4): 604–20.
Lechner, Michael. 1999. “Earnings and Employment Effects of Continuous Off-the-Job Training in East Germany after Unification.” Journal of Business and Economic Statistics, 17(1): 74–90.
Lechner, Michael. 2001. “Identification and Estimation of Causal Effects of Multiple Treatments under the Conditional Independence Assumption.” In Econometric Evaluation of Labour Market Policies, ed. Michael Lechner and Friedhelm Pfeiffer, 43–58. Heidelberg and New York: Physica; Mannheim: Centre for European Economic Research.
Lechner, Michael. 2002a. “Program Heterogeneity and Propensity Score Matching: An Application to the Evaluation of Active Labor Market Policies.” Review of Economics and Statistics, 84(2): 205–20.
Lechner, Michael. 2002b. “Some Practical Issues in the Evaluation of Heterogeneous Labour Market Programmes by Matching Methods.” Journal of the Royal Statistical Society: Series A (Statistics in Society), 165(1): 59–82.
Lechner, Michael. 2004. “Sequential Matching Estimation of Dynamic Causal Models.” University of St. Gallen Department of Economics Discussion Paper 2004-06.

­Estimation of Dynamic Causal Models.” University ­Evidence from a Regression Discontinuity Design.” of St. Gallen Department of Economics Discussion National Bureau of Economic Research Working Paper 2004-06. Paper 11702. Lechner, Michael, and Ruth Miquel. 2005. “Identi- Ludwig, Jens, and Douglas L. Miller. 2007. “Does fication of the Effects of Dynamic Treatments By Head Start Improve Children’s Life Chances? Evi- Sequential Conditional Independence Assump- dence from a Regression Discontinuity Design.” tions.” University of St. Gallen Department of Eco- Quarterly Journal of Economics, 122(1): 159–208. nomics Discussion Paper 2005-17. Manski, Charles F. 1990. “Nonparametric Bounds on Lechner, Michael, Ruth Miquel, and Conny Wunsch. Treatment Effects.” American Economic Review, 2004. “Long-Run Effects of Public Sector Spon- 80(2): 319–23. sored Training in West Germany.” Institute for the Manski, Charles F. 1993. “Identification of Endogenous Study of Labor Discussion Paper 1443. Social Effects: The Reflection Problem.” Review of Lee, David S. 2001. “The Electoral Advantage to Economic Studies, 60(3): 531–42. Incumbency and the Voters’ Valuation of Politicians’ Manski, Charles F. 1995. Identification Problems in Experience: A Regression Discontinuity Analysis of the Social Sciences. Cambridge and London: Har- Elections to the U.S. . . . ” National Bureau of Eco- vard University Press. nomic Research Working Paper 8441. Manski, Charles F. 2000a. “Economic Analysis of Lee, David S. 2008. “Randomized Experiments from Social Interactions.” Journal of Economic Perspec- Non-random Selection in U.S. House Elections.” tives, 14(3): 115–36. Journal of Econometrics, 142(2): 675–97. Manski, Charles F. 2000b. “Identification Problems Lee, David S., and David Card. 2008. “Regression and Decisions under Ambiguity: Empirical Analysis Discontinuity Inference with Specification Error.” of Treatment Response and Normative Analysis of Journal of Econometrics, 142(2): 655–74. 
Treatment Choice.” Journal of Econometrics, 95(2): Lee, David S., and Thomas Lemieux. 2008. “Regression 415–42. Discontinuity Designs in Economics.” Unpublished. Manski, Charles F. 2001. “Designing Programs for Lee, David S., Enrico Moretti, and Matthew J. Butler. Heterogeneous Populations: The Value of Covariate 2004. “Do Voters Affect or Elect Policies? Evidence Information.” American Economic Review, 91(2): from the U.S. House.” Quarterly Journal of Eco- 103–06. nomics, 119(3): 807–59. Manski, Charles F. 2002. “Treatment Choice under Lee, Myoung-Jae. 2005a. Micro-Econometrics for Ambiguity Induced By Inferential Problems.” Jour- Policy, Program, and Treatment Effects. Oxford and nal of Statistical Planning and Inference, 105(1): New York: . 67–82. Lee, Myoung-Jae. 2005b. “Treatment Effect and Sen- Manski, Charles F. 2003. Partial Identification of sitivity Analysis for Self-Selected Treatment and Probabilities Distributions. New York and Heidel- Selectively Observed Response.” Unpublished. berg: Springer. Lehmann, Erich L. 1974. Nonparametrics: Statistical Manski, Charles F. 2004. “Statistical Treatment Rules Methods Based on Ranks. San Francisco: Holden- for Heterogeneous Populations.” Econometrica, Day. 72(4): 1221–46. Lemieux, Thomas, and Kevin Milligan. 2008. “Incen- Manski, Charles F. 2005. Social Choice with Partial tive Effects of Social Assistance: A Regression Dis- Knowledge of Treatment Response. Princeton and continuity Approach.” Journal of Econometrics, Oxford: Princeton University Press. 142(2): 807–28. Manski, Charles F. 2007. Identification for Prediction Leuven, Edwin, and Barabara Sianesi. 2003. and Decision. Cambridge and London: Harvard “PSMATCH2: Stata Module to Perform Full University Press. Mahalanobis and Propensity Score Matching, Manski, Charles F., and John V. Pepper. 2000. 
“Mono- Common Support Graphing, and Covariate Imbal- tone Instrumental Variables: With an Application ance Testing.” http://ideas.repec.org/c/boc/bocode/ to the Returns to Schooling.” Econometrica, 68(4): s432001.html. 997–1010. Li, Qi, Jeffrey S. Racine, and Jeffrey M. Wooldridge. Manski, Charles F., Gary D. Sandefur, Sara McLana- Forthcoming. “Efficient Estimaton of Average han, and Daniel Powers. 1992. “Alternative Esti- Treatment Effects with Mixed Categorical and Con- mates of the Effect of Family Structure during tinuous Data.” Journal of Business and Economic Adolescence on High School Graduation.” Journal Statistics. of the American Statistical Association, 87(417): Linton, Oliver, and Pedro Gozalo. 2003. “Conditional 25–37. Independence Restrictions: Testing and Estima- Matzkin, Rosa L. 2003. “Nonparametric Estimation tion.” Unpublished. of Nonadditive Random Functions.” Econometrica, Little, Roderick J. A., and Donald B. Rubin. 1987. 71(5): 1339–75. Statistical Analysis with Missing Data. New York: McCrary, Justin. 2008. “Manipulation of the Running Wiley. Variable in the Regression Discontinuity Design: Ludwig, Jens, and Douglas L. Miller. 2005. “Does A Density Test.” Journal of Econometrics, 142(2): Head Start Improve Children’s Life Chances? 698–714. 84 Journal of Economic Literature, Vol. XLVII (March 2009)

McEwan, Patrick J., and Joseph S. Shapiro. 2008. “The Quade, D. 1982. “Nonparametric Analysis of Covari- Benefits of Delayed Primary School Enrollment: ance By Matching.” Biometrics, 38(3): 597–611. Discontinuity Estimates Using Exact Birth Dates.” Racine, Jeffrey S., and Qi Li. 2004. “Nonparametric Journal of Human Resources, 43(1): 1–29. Estimation of Regression Functions with Both Cat- Mealli, Fabrizia, Guido W. Imbens, Salvatore Ferro, egorical and Continuous Data.” Journal of Econo- and Annibale Biggeri. 2004. “Analyzing a Random- metrics, 119(1): 99–130. ized Trial on Breast Self-Examination with Noncom- Riccio, James, and Daniel Friedlander. 1992. GAIN: pliance and Missing Outcomes.” Biostatistics, 5(2): Program Strategies, Participation Patterns, and 207–22. First-Year Impacts in Six Countries. New York: Meyer, Bruce D., W. Kip Viscusi, and David L. Durbin. Manpower Demonstration Research Corporation. 1995. “Workers’ Compensation and Injury Duration: Riccio, James, Daniel Friedlander, and Stephen Freed- Evidence from a Natural Experiment.” American man. 1994. GAIN: Benefits, Costs, and Three-Year Economic Review, 85(3): 322–40. Impacts of a Welfare-to-Work Program. New York: Miguel, Edward, and Michael Kremer. 2004. “Worms: Manpower Demonstration Research Corporation. Identifying Impacts on Education and Health in the Robins, James M., and Ya’acov Ritov. 1997. “Toward Presence of Treatment Externalities.” Economet- a Curse of Dimensionality Appropriate (CODA) rica, 72(1): 159–217. Asymptotic Theory for Semi-parametric Models.” Morgan, Stephen L., and Christopher Winship. 2007. Statistics in Medicine, 16(3): 285–319. Counterfactuals and Causal Inference: Methods and Robins, James M., and Andrea Rotnitzky. 1995. “Semi- Principles for Social Research. Cambridge and New parametric Efficiency in Multivariate Regression York: Cambridge University Press. Models with Missing Data.” Journal of the American Moulton, Brent R. 1990. 
“An Illustration of a Pitfall Statistical Association, 90(429): 122–29. in Estimating the Effects of Aggregate Variables on Robins, James M., Andrea Rotnitzky, and Lue Ping Micro Unit.” Review of Economics and Statistics, Zhao. 1995. “Analysis of Semiparametric Regression 72(2): 334–38. Models for Repeated Outcomes in the Presence of Moulton, Brent R., and William C. Randolph. 1989. Missing Data.” Journal of the American Statistical “Alternative Tests of the Error Components Model.” Association, 90(429): 106–21. Econometrica, 57(3): 685–93. Robinson, Peter M. 1988. “Root-N-Consistent Semi- Newey, Whitney K. 1994a. “Kernel Estimation of parametric Regression.” Econometrica, 56(4): Partial Means and a General Variance Estimator.” 931–54. , 10(2): 233–53. Romano, Joseph P., and Azeem M. Shaikh. 2006a. Newey, Whitney K. 1994b. “Series Estimation of “Inference for Identifiable Parameters in Partially Regression Functionals.” Econometric Theory, Identified Econometric Models.” Stanford University 10(1): 1–28. Department of Statistics Technical Report 2006-9. Olken, Benjamin A. 2007. “Monitoring Corruption: Romano, Joseph P., and Azeem M. Shaikh. 2006b. Evidence from a Field Experiment in Indonesia.” “Inference for the Identified Set in Partially Identi- Journal of Political Economy, 115(2): 200–249. fied Econometric Models.” Unpublished. Pagan, Adrian, and Aman Ullah. 1999. Nonparamet- Rosen, Adam M. 2006. “Confidence Sets for Partially ric Econometrics. Cambridge; New York and Mel- Identified Parameters That Satisfy a Finite Number bourne: Cambridge University Press. of Moment Inequalities.” Institute for Fiscal Studies Pakes, Ariel, Jack R. Porter, Kate Ho, and Joy Ishii. Centre for Microdata Methods and Practice Work- 2006. “Moment Inequalities and Their Application.” ing Paper CWP25/06. Institute for Fiscal Studies Centre for Microdata Rosenbaum, Paul R. 1984a. “Conditional Permutation Methods and Practice Working Paper CWP16/07. 
Tests and the Propensity Score in Observational Pearl, Judea. 2000. Causality: Models, Reasoning, and Studies.” Journal of the American Statistical Asso- Inference. Cambridge; New York and Melbourne: ciation, 79(387): 565–74. Cambridge University Press. Rosenbaum, Paul R. 1984b. “The Consequences of Pettersson-Lidbom, Per. 2007. “The Policy Conse- Adjustment for a Concomitant Variable That Has quences of Direct versus Representative Democracy: Been Affected By the Treatment.” Journal of the A Regression-Discontinuity Approach.” Unpublished. Royal Statistical Society: Series A (Statistics in Soci- Pettersson-Libdom, Per. 2008. “Does the Size of the ety), 147(5): 656–66. Legislature Affect the Size of Government? Evidence Rosenbaum, Paul R. 1987. “The Role of a Second Con- from Two Natural Experiments.” Unpublished. trol Group in an Observational Study.” Statistical Pettersson-Lidbom, Per, and Björn Tyrefors. 2007. “Do Science, 2(3): 292–306. Parties Matter for Economic Outcomes? A Regres- Rosenbaum, Paul R. 1989. “Optimal Matching for sion-Discontinuity Approach.” Unpublished. Observational Studies.” Journal of the American Politis, Dimitris N., Joseph P. Romano, and Michael Statistical Association, 84(408): 1024–32. Wolf. 1999. Subsampling. New York: Springer, Rosenbaum, Paul R. 1995. Observational Studies. New Verlag York; Heidelberg and London: Springer. Porter, Jack R. 2003. “Estimation in the Regression Rosenbaum, Paul R. 2002. “Covariance Adjustment Discontinuity Model.” Unpublished. in Randomized Experiments and Observational Imbens and Wooldridge: Econometrics of Program Evaluation 85

­Studies.” Statistical Science, 17(3): 286–327. ­Linear ­Propensity Score Methods with Normal Dis- Rosenbaum, Paul R., and Donald B. Rubin. 1983a. tributions.” Biometrika, 79(4): 797–809. “Assessing Sensitivity to an Unobserved Binary Rubin, Donald B., and Neal Thomas. 1996. “Matching Covariate in an Observational Study with Binary Using Estimated Propensity Scores: Relating Theory Outcome.” Journal of the Royal Statistical Society: to Practice.” Biometrics, 52(1): 249–64. Series B (Statistical Methodology), 45(2): 212–18. Rubin, Donald B., and Neal Thomas. 2000. “Com- Rosenbaum, Paul R., and Donald B. Rubin. 1983b. bining Propensity Score Matching with Additional “The Central Role of the Propensity Score in Obser- Adjustments for Prognostic Covariates.” Journal vational Studies for Causal Effects.” Biometrika, of the American Statistical Association, 95(450): 70(1): 41–55. 573–85. Rosenbaum, Paul R., and Donald B. Rubin. 1984. Sacerdote, Bruce. 2001. “Peer Effects with Random “Reducing Bias in Observational Studies Using Sub- Assignment: Results for Dartmouth Roommates.” classification on the Propensity Score.”Journal of the Quarterly Journal of Economics, 116(2): 681–704. American Statistical Association, 79(387): 516–24. Scharfstein, Daniel O, Andrea Rotnitzky, and James Rosenbaum, Paul R., and Donald B. Rubin. 1985. M. Robins. 1999. “Adjusting for Nonignorable Drop- “Constructing a Control Group Using Multivariate Out Using Semiparametric Nonresponse Models.” Matched Sampling Methods That Incorporate the Journal of the American Statistical Association, Propensity Score.” American Statistician, 39(1): 94(448): 1096–1120. 33–38. Schultz, T. Paul. 2001. “School Subsidies for the Poor: Rotnitzky, Andrea, and James M. Robins. 1995. “Semi- Evaluating the Mexican Progresa Poverty Program.” parametric Regression Estimation in the Presence of Yale University Economic Growth Center Discus- Dependent Censoring.” Biometrika, 82(4): 805–20. sion Paper 834. Roy, A. D. 1951. 
“Some Thoughts on the Distribution of Seifert, Burkhardt, and Theo Gasser. 1996. “Finite- Earnings.” Oxford Economic Papers, 3(2): 135–46. Sample Variance of Local Polynomials: Analysis and Rubin, Donald B. 1973a. “Matching to Remove Bias in Solutions.” Journal of the American Statistical Asso- Observational Studies.” Biometrics, 29(1): 159–83. ciation, 91(433): 267–75. Rubin, Donald B. 1973b. “The Use of Matched Sam- Seifert, Burkhardt, and Theo Gasser. 2000. “Data pling and Regression Adjustment to Remove Bias in Adaptive Ridging in Local Polynomial Regression.” Observational Studies.” Biometrics, 29(1): 184–203. Journal of Computational and Graphical Statistics, Rubin, Donald B. 1974. “Estimating Causal Effects 9(2): 338–60. of Treatments in Randomized and Nonrandomized Sekhon, Jasjeet S. Forthcoming. “Multivariate and Pro- Studies.” Journal of Educational Psychology, 66(5): pensity Score Matching Software with Automated 688–701. Balance Optimization: The Matching Package for Rubin, Donald B. 1976. “Inference and Missing Data.” R.” Journal of Statistical Software. Biometrika, 63(3): 581–92. Sekhon, Jasjeet S., and Richard Grieve. 2008. “A New Rubin, Donald B. 1977. “Assignment to Treatment Non-parametric Matching Method for Bias Adjust- Group on the Basis of a Covariate.” Journal of Edu- ment with Applications to Economic Evaluations.” cational Statistics, 2(1): 1–26. http://sekhon.berkeley.edu/papers/GeneticMatch- Rubin, Donald B. 1978. “Bayesian Inference for Causal ing_SekhonGrieve.pdf. Effects: The Role of Randomization.” Annals of Sta- Shadish, William R., Thomas D. Cook, and Donald T. tistics, 6(1): 34–58. Campbell. 2002. Experimental and Quasi-Exper- Rubin, Donald B. 1979. “Using Multivariate Matched imental Designs for Generalized Causal Inference. Sampling and Regression Adjustment to Control Bias Boston: Houghton Mifflin. in Observational Studies.” Journal of the American Smith, Jeffrey A., and Petra E. Todd. 2001. 
“Recon- Statistical Association, 74(366): 318–28. ciling Conflicting Evidence on the Performance of Rubin, Donald B. 1987. Multiple Imputation for Non- Propensity-Score Matching Methods.” American response in Surveys. New York: Wiley. Economic Review, 91(2): 112–18. Rubin, Donald B. 1990. “Formal Mode of Statistical Smith, Jeffrey A., and Petra E. Todd. 2005. “Does Inference for Causal Effects.” Journal of Statistical Matching Overcome Lalonde’s Critique of Nonex- Planning and Inference, 25(3): 279–92. perimental Estimators?” Journal of Econometrics, Rubin, Donald B. 1997. “Estimating Causal Effects 125(1–2): 305–53. from Large Data Sets Using Propensity Scores.” Splawa-Neyman, Jerzy. 1990. “On the Application of Annals of Internal Medicine, 127(5 Part 2): 757–63. Probability Theory to Agricultural Experiments. Rubin, Donald B. 2006. Matched Sampling for Causal Essays on Principles. Section 9.” Statistical Science, Effects. Cambridge and New York: Cambridge Uni- 5(4): 465–72. (Orig. pub. 1923.) versity Press. Stock, James H. 1989. “Nonparametric Policy Analy- Rubin, Donald B., and Neal Thomas. 1992a. “Affinely sis.” Journal of the American Statistical Association, Invariant Matching Methods with Ellipsoidal Distri- 84(406): 567–75. butions.” Annals of Statistics, 20(2): 1079–93. Stone, Charles J. 1977. “Consistent Nonparametric Rubin, Donald B., and Neal Thomas. 1992b. Regression.” Annals of Statistics, 5(4): 595–620. ­“Characterizing the Effect of Matching Using Stoye, Jörg. 2007. “More on Confidence Intervals for 86 Journal of Economic Literature, Vol. XLVII (March 2009)

Partially Identified Parameters.” Unpublished. An Evaluation of Title I.” Journal of Econometrics, Stuart, Elizabeth A. 2008. “Developing Practical Rec- 142(2): 731–56. ommendations for the Use of Propensity Scores: Dis- Van der Klaauw, Wilbert. 2008b. “Regression-Discon- cussion of ‘A Critical Appraisal of Propensity Score tinuity Analysis: A Survey of Recent Developments Matching in the Medical Literature between 1996 in Economics.” Labour, 22(2): 219–45. and 2003’ by Peter Austin.” Statistics in Medicine, Van der Laan, Mark J., and James M. Robins. 2003. 27(12): 2062–65. Unified Methods for Censored Longitudinal Data Sun, Yixiao. 2005. “Adaptive Estimation of the Regres- and Causality. New York: Springer, Physica-Verlag. sion Discontinuity Model.” Unpublished. Vytlacil, Edward. 2002. “Independence, Monotonicity, Thistlethwaite, Donald L., and Donald T. Campbell. and Latent Index Models: An Equivalence Result.” 1960. “Regression-Discontinuity Analysis: An Alter- Econometrica, 70(1): 331–41. native to the Ex Post Facto Experiment.” Journal of Wooldridge, Jeffrey M. 1999. “Asymptotic Properties Educational Psychology, 51(6): 309–17. of Weighted M-Estimators for Variable Probability Trochim, William M. K. 1984. Research Design for Samples.” Econometrica, 67(6): 1385–1406. Program Evaluation: The Regression-Discon- Wooldridge, Jeffrey M. 2002. Econometric Analysis of tinuity Approach. Thousand Oaks, Calif.: Sage Cross Section and Panel Data. Cambridge and Lon- Publications. don: MIT Press. Trochim, William M. K. 2001. “Regression-Disconti- Wooldridge, Jeffrey M. 2005. “Violating Ignorability nuity Design.” In International Encyclopedia of the of Treatment By Controlling for Too Many Factors.” Social and Behavioral Sciences, Volume 20, ed. Neil Econometric Theory, 21(5): 1026–28. J. Smelser and Paul B. Baltes, 12940–45. Oxford: Wooldridge, Jeffrey M. 2007. “Inverse Probability Elsevier Science. Weighted Estimation for General Missing Data Van der Klaauw, Wilbert. 
2002. “Estimating the Effect Problems.” Journal of Econometrics, 141(2): of Financial Aid Offers on College Enrollment: A 1281–1301. Regression-Discontinuity Approach.” International Zhao, Zhong. 2004. “Using Matching to Estimate Economic Review, 43(4): 1249–87. Treatment Effects: Data Requirements, Matching Van der Klaauw, Wilbert. 2008a. “Breaking the Link Metrics, and Monte Carlo Evidence.” Review of between Poverty and Low Student Achievement: Economics and Statistics, 86(1): 91–107. This article has been cited by:

Risk selection and cost shifting in a prospective physician payment system: Evidence from Ontario. Health Policy 115, 249-257. [CrossRef] 92. Henrik Hansen, Neda Trifković. 2014. Food Standards are Good – For Middle-Class Farmers. World Development 56, 226-242. [CrossRef] 93. Nassul S. Kabunga, Thomas Dubois, Matin Qaim. 2014. Impact of tissue culture banana technology on farm household income and food security in Kenya. Food Policy 45, 25-34. [CrossRef] 94. Ryota Nakamura, Marc Suhrcke, Rachel Pechey, Marcello Morciano, Martin Roland, Theresa M. Marteau. 2014. Impact on alcohol purchasing of a ban on multi-buy promotions: a quasi-experimental evaluation comparing Scotland with England and Wales. Addiction 109:10.1111/add.2014.109.issue-4, 558-567. [CrossRef] 95. Halbert White, Haiqing Xu, Karim Chalak. 2014. Causal discourse in a game of incomplete information. Journal of Econometrics . [CrossRef] 96. C. Noelke, D. Horn. 2014. Social Transformation and the Transition from Vocational Education to Work in Hungary: A Differences-in-differences Approach. European Sociological Review . [CrossRef] 97. S. R. Cotten, G. Ford, S. Ford, T. M. Hale. 2014. Internet Use and Depression Among Retired Older Adults in the United States: A Longitudinal Analysis. The Journals of Gerontology Series B: Psychological Sciences and Social Sciences . [CrossRef] 98. Hang Gao, Johannes Van Biesebroeck. 2014. Effects of Deregulation and Vertical Unbundling on the Performance of China's Electricity Generation Sector. The Journal of Industrial Economics 62:10.1111/ joie.2014.62.issue-1, 41-76. [CrossRef] 99. Bryan S. Graham, Guido W. Imbens, Geert Ridder. 2014. Complementarity and aggregate implications of assortative matching: A nonparametric analysis. Quantitative Economics 5, 29-66. [CrossRef] 100. John V. Duca, Anil Kumar. 2014. Financial literacy and mortgage equity withdrawals. Journal of Urban Economics 80, 62-75. [CrossRef] 101. Anthony A. Braga, David M. Hureau, Andrew V. 
Papachristos. 2014. Deterring Gang-Involved Gun Violence: Measuring the Impact of Boston’s Operation Ceasefire on Street Gang Behavior. Journal of Quantitative Criminology 30, 113-139. [CrossRef] 102. Nicolas R. Ziebarth, Martin Karlsson. 2014. THE EFFECTS OF EXPANDING THE GENEROSITY OF THE STATUTORY SICKNESS INSURANCE SYSTEM. Journal of Applied Econometrics 29:10.1002/jae.v29.2, 208-230. [CrossRef] 103. Claudio A. Agostini, Claudia Martínez A.. 2014. Response of Tax Credit Claims to Tax Enforcement: Evidence from a Quasi-Experiment in Chile. Fiscal Studies 35:10.1111/fisc.2014.35.issue-1, 41-65. [CrossRef] 104. S. Michael Gaddis, Douglas Lee Lauen. 2014. School accountability and the black–white test score gap. Social Science Research 44, 15-31. [CrossRef] 105. Elisa Iezzi, Matteo Lippi Bruni, Cristina Ugolini. 2014. The role of GP's compensation schemes in diabetes care: Evidence from panel data. Journal of Health Economics 34, 104-120. [CrossRef] 106. Vijesh V. Krishna, Prakashan C. Veettil. 2014. Productivity and efficiency impacts of conservation tillage in northwest Indo-Gangetic Plains. Agricultural Systems . [CrossRef] 107. John Herbert Ainembabazi, Arild Angelsen. 2014. Do commercial forest plantations reduce pressure on natural forests? Evidence from forest policy reforms in Uganda. Forest Policy and Economics 40, 48-56. [CrossRef] 108. Maria Sudulich, Matthew Wall, Leonardo Baccini. 2014. Wired Voters: The Effects of Internet Use on Voters’ Electoral Uncertainty. British Journal of Political Science 1-29. [CrossRef] 109. James J. Heckman, Hedibert F. Lopes, Rémi Piatek. 2014. Treatment Effects: A Bayesian Perspective. Econometric Reviews 33, 36-67. [CrossRef] 110. Hanna Hottenrott, Cindy Lopes-Bento. 2014. (International) R&D collaboration and SMEs: The effectiveness of targeted public R&D support schemes. Research Policy . [CrossRef] 111. Daniel I. Rees, Joseph J. Sabia. 2014. The kid's speech: The effect of stuttering on human capital acquisition. 
Economics of Education Review 38, 76-88. [CrossRef] 112. William P. Warburton, Rebecca N. Warburton, Arthur Sweetman, Clyde Hertzman. 2014. The Impact of Placing Adolescent Males into Foster Care on Education, Income Assistance, and Convictions. Canadian Journal of Economics/Revue canadienne d'économique 47:10.1111/caje.2014.47.issue-1, 35-69. [CrossRef] 113. K. J. Wendland, S. K. Pattanayak, E. O. Sills. 2014. National-level differences in the adoption of environmental health technologies: a cross-border comparison from Benin and Togo. Health Policy and Planning . [CrossRef] 114. Breno Sampaio. 2014. Identifying peer states for transportation policy analysis with an application to New York's handheld cell phone ban. Transportmetrica A: Transport Science 10, 1-14. [CrossRef] 115. Paulo Bastos, Natália P. Monteiro, Odd Rune Straume. 2014. The impact of private vs. public ownership on the level and structure of employment. Economics of Transition n/a-n/a. [CrossRef] 116. V. V. Acharya, R. P. Baghai, K. V. Subramanian. 2014. Wrongful Discharge Laws and Innovation. Review of Financial Studies 27, 301-346. [CrossRef] 117. Erik Ruzek, Margaret Burchinal, George Farkas, Greg J. Duncan. 2014. The quality of toddler child care and cognitive skills at 24 months: Propensity score analysis results from the ECLS-B. Early Childhood Research Quarterly 29, 12-21. [CrossRef] 118. Federico R León, Rebecka Lundgren, Irit Sinai, Ragini Sinha, Victoria Jennings. 2014. Increasing literate and illiterate women's met need for contraception via empowerment: a quasi-experiment in rural India. Reproductive Health 11, 74. [CrossRef] 119. Randall Juras. 2014. The effect of public employment on children’s work and school attendance: evidence from a social protection program in Argentina. IZA Journal of Labor & Development 3, 14. [CrossRef] 120. Anna Bottasso, Chiara Piccardo. 2014. Export activity and firm heterogeneity: a survey of the empirical evidence for Italy. 
ECONOMIA E POLITICA INDUSTRIALE 27-61. [CrossRef] 121. B.H. BaltagiPanel Data and Difference-in-Differences Estimation 425-433. [CrossRef] 122. Chihiro Udagawa, Ian Hodge, Mark Reader. 2014. Farm Level Costs of Agri-environment Measures: The Impact of Entry Level Stewardship on Cereal Farm Incomes. Journal of Agricultural Economics 65:10.1111/jage.2014.65.issue-1, 212-233. [CrossRef] 123. Ryo Takahashi, Yasuyuki Todo. 2014. The impact of a shade coffee certification program on forest conservation using remote sensing and household data. Environmental Impact Assessment Review 44, 76-81. [CrossRef] 124. Nicolai V. Kuminoff, V. Kerry Smith, Christopher Timmins. 2013. The New Economics of Equilibrium Sorting and Policy Evaluation Using Housing Markets. Journal of Economic Literature 51:4, 1007-1062. [Abstract] [View PDF article] [PDF with links] 125. Sankar Mukhopadhyay, Jeanne Wendel. 2013. Evaluating an employee wellness program. International Journal of Health Care Finance and Economics 13, 173-199. [CrossRef] 126. Patricia Apps, Silvia Mendolia, Ian Walker. 2013. The impact of pre-school on adolescents’ outcomes: Evidence from a recent English cohort. Economics of Education Review 37, 183-199. [CrossRef] 127. Abdulbaki Bilgic, Steven T. Yen. 2013. Household food demand in Turkey: A two-step demand system approach. Food Policy 43, 267-277. [CrossRef] 128. Rahul Rawat, Elizabeth Faust, John A. Maluccio, Suneetha Kadiyala. 2013. The impact of a food assistance program on nutritional status, disease progression, and food security among people living with HIV in Uganda. JAIDS Journal of Acquired Immune Deficiency Syndromes 1. [CrossRef] 129. Eskil Heinesen, Christophe Kolodziejczyk. 2013. Effects of breast and colorectal cancer on labour market outcomes—Average effects and educational gradients. Journal of Health Economics 32, 1028-1042. [CrossRef] 130. Xiaohong Chen. 2013. Comment. Journal of the American Statistical Association 108, 1262-1264. [CrossRef] 131. 
Martin Foureaux Koppensteiner. 2013. Automatic Grade Promotion and Student Performance: Evidence from Brazil. Journal of Development Economics . [CrossRef] 132. Carlos A. Flores, Oscar A. Mitnik. 2013. Comparing Treatments across Labor Markets: An Assessment of Nonexperimental Multiple-Treatment Strategies. Review of Economics and Statistics 95, 1691-1707. [CrossRef] 133. Robert E. Larzelere, Ronald B. Cox. 2013. Making Valid Causal Inferences About Corrective Actions by Parents From Longitudinal Data. Journal of Family Theory & Review 5:10.1111/jftr.2013.5.issue-4, 282-299. [CrossRef] 134. Noémi Kreif, Richard Grieve, Rosalba Radice, Jasjeet S. Sekhon. 2013. Regression-adjusted matching and double-robust methods for estimating average treatment effects in health economic evaluation. Health Services and Outcomes Research Methodology 13, 174-202. [CrossRef] 135. Ola Lotherington Vestad. 2013. Labour supply effects of early retirement provision. Labour Economics 25, 98-109. [CrossRef] 136. Jasmin Kantarevic, Boris Kralj. 2013. LINK BETWEEN PAY FOR PERFORMANCE INCENTIVES AND PHYSICIAN PAYMENT MECHANISMS: EVIDENCE FROM THE DIABETES MANAGEMENT INCENTIVE IN ONTARIO. Health Economics 22:10.1002/ hec.v22.12, 1417-1439. [CrossRef] 137. Demosthenes Ioannou, Livio Stracca. 2013. Have euro area and EU governance worked? Just the facts. European Journal of Political Economy . [CrossRef] 138. Ulf Rinne, Arne Uhlendorff, Zhong Zhao. 2013. Vouchers and caseworkers in training programs for the unemployed. Empirical Economics 45, 1089-1127. [CrossRef] 139. Lee Pinkowitz, Jason Sturgess, Rohan Williamson. 2013. Do cash stockpiles fuel cash acquisitions?. Journal of Corporate Finance 23, 128-149. [CrossRef] 140. K. Takahashi, C. B. Barrett. 2013. The System of Rice Intensification and its Impacts on Household Income and Child Schooling: Evidence from Rural Indonesia. American Journal of Agricultural Economics . [CrossRef] 141. 
Bola Amoke Awotide, Aziz Karimov, Aliou Diagne, Tebila Nakelse. 2013. The impact of seed vouchers on poverty reduction among smallholder rice farmers in Nigeria. Agricultural Economics 44:10.1111/ agec.2013.44.issue-6, 647-658. [CrossRef] 142. Yiwei Fang, Iftekhar Hasan, Katherin Marton. 2013. Institutional Development and Bank Stability: Evidence from Transition Countries. Journal of Banking & Finance . [CrossRef] 143. R. Forrest McCluer, Martha A. Starr. 2013. Using Difference in Differences to Estimate Damages in Healthcare Antitrust: A Case Study of Marshfield Clinic. International Journal of the Economics of Business 20, 447-469. [CrossRef] 144. Yi Lu, Zhigang Tao, Yan Zhang. 2013. How do exporters respond to antidumping investigations?. Journal of International Economics 91, 290-300. [CrossRef] 145. Ryo Takahashi, Yasuyuki Todo. 2013. The impact of a shade coffee certification program on forest conservation: A case study from a wild coffee forest in Ethiopia. Journal of Environmental Management 130, 48-54. [CrossRef] 146. Brad R. Humphreys, Joseph Marchand. 2013. New casinos and local labor markets: Evidence from Canada. Labour Economics 24, 151-160. [CrossRef] 147. Luis Ayala, Magdalena Rodríguez. 2013. Evaluating social assistance reforms under programme heterogeneity and alternative measures of success. International Journal of Social Welfare 22:10.1111/ ijsw.2013.22.issue-4, 406-419. [CrossRef] 148. Benjamin Schünemann, Michael Lechner, Conny Wunsch. 2013. Do Long-Term Unemployed Workers Benefit from Targeted Wage Subsidies?. German Economic Review n/a-n/a. [CrossRef] 149. Torsten Biemann, Nils Braakmann. 2013. The impact of international experience on objective and subjective career success in early careers. The International Journal of Human Resource Management 24, 3438-3456. [CrossRef] 150. Thomas K. Bauer, Sebastian Braun, Michael Kvasnicka. 2013. The Economic Integration of Forced Migrants: Evidence for Post-War Germany. 
123:10.1111/ ecoj.2013.123.issue-571, 998-1024. [CrossRef] 151. Allen Blackman. 2013. Evaluating forest conservation policies in developing countries using remote sensing data: An introduction and practical guide. Forest Policy and Economics 34, 1-16. [CrossRef] 152. Luis Ayala, Magdalena Rodríguez. 2013. Health-related effects of welfare-to-work policies. Social Science & Medicine 93, 103-112. [CrossRef] 153. R. Esposti, F. Sotte. 2013. Evaluating the effectiveness of agricultural and rural policies: an introduction. European Review of Agricultural Economics 40, 535-539. [CrossRef] 154. Paul J. Ferraro, Juan José Miranda. 2013. Heterogeneous treatment effects and mechanisms in information-based environmental policies: Evidence from a large-scale field experiment. Resource and Energy Economics 35, 356-379. [CrossRef] 155. Gábor Kézdi, Gergely Csorba. 2013. Estimating Consumer Lock-In Effects from Firm-Level Data. Journal of Industry, Competition and Trade 13, 431-452. [CrossRef] 156. Andreas Peichl, Nico Pestel, Sebastian Siegloch. 2013. The politicians’ wage gap: insights from German members of parliament. Public Choice 156, 653-676. [CrossRef] 157. Byung-Seong Min. 2013. Evaluation of board reforms: An examination of the appointment of outside directors. Journal of the Japanese and International Economies 29, 21-43. [CrossRef] 158. George Messinis. 2013. Returns to education and urban-migrant wage differentials in China: IV quantile treatment effects. China Economic Review 26, 39-55. [CrossRef] 159. Stephen G. Donald, Yu-Chin Hsu. 2013. Estimation and inference for distribution functions and quantile functions in treatment effect models. Journal of Econometrics . [CrossRef] 160. Shephard Siziba, Kefasi Nyikahadzoi, Joachim Binam Nyemeck, Aliou Diagne, Adekunle Adewale, Fatunbi Oluwole. 2013. Estimating the impact of innovation systems on maize yields: the case of Iar4d in southern Africa. Agrekon 52, 83-100. [CrossRef] 161. 
Sea-Jin Chang, Jaiho Chung, Jon Jungbien Moon. 2013. When do foreign subsidiaries outperform local firms?. Journal of International Business Studies . [CrossRef] 162. A. Acharya, S. Vellakkal, F. Taylor, E. Masset, A. Satija, M. Burke, S. Ebrahim. 2013. The Impact of Health Insurance Schemes for the Informal Sector in Low- and Middle-Income Countries: A Systematic Review. The World Bank Research Observer 28, 236-266. [CrossRef] 163. Kazunobu Hayakawa, Toshiyuki Matsuura, Kazuyuki Motohashi, Ayako Obashi. 2013. Two- dimensional analysis of the impact of outward FDI on performance at home: Evidence from Japanese manufacturing firms. Japan and the World Economy 27, 25-33. [CrossRef] 164. Irma Arteaga, Sarah Humpage, Arthur J. Reynolds, Judy A. Temple. 2013. One year of preschool or two: Is it important for adult outcomes?. Economics of Education Review . [CrossRef] 165. Tania Barham, Jacob Rowberry. 2013. Living longer: The effect of the Mexican conditional cash transfer program on elderly mortality. Journal of Development Economics . [CrossRef] 166. Alan R. Ellis, M. Alan Brookhart. 2013. Approaches to inverse-probability-of-treatment–weighted estimation with concurrent treatments. Journal of Clinical Epidemiology 66, S51-S56. [CrossRef] 167. B. Sampaio, G. R. Sampaio, Y. Sampaio. 2013. On Estimating the Effects of Immigrant Legalization: Do U.S. Agricultural Workers Really Benefit?. American Journal of Agricultural Economics 95, 932-948. [CrossRef] 168. Fabiana Fontes Rocha, Marislei Nishijima, Sandro Garcia Duarte Peixoto. 2013. Primary health care policies: investigation on morbidity. Applied Economics Letters 20, 1046-1051. [CrossRef] 169. C. Bontemps, Z. Bouamra-Mechemache, M. Simioni. 2013. Quality labels and firm survival: some first empirical evidence. European Review of Agricultural Economics 40, 413-439. [CrossRef] 170. G. von Graevenitz. 2013. Trade mark cluttering-evidence from EU enlargement. Oxford Economic Papers 65, 721-745. [CrossRef] 171. J. 
Alix-Garcia, A. Bartlett, D. Saah. 2013. The landscape of conflict: IDPs, aid and land-use change in Darfur. Journal of Economic Geography 13, 589-617. [CrossRef] 172. Fuxiu Jiang, Bing Zhu, Jicheng Huang. 2013. CEO's financial experience and earnings management. Journal of Multinational Financial Management 23, 134-145. [CrossRef] 173. Martin Huber, Michael Lechner, Conny Wunsch. 2013. The performance of estimators based on the propensity score. Journal of Econometrics 175, 1-21. [CrossRef] 174. Paul J Ferraro, Merlin M Hanauer, Daniela A Miteva, Gustavo Javier Canavire-Bacarreza, Subhrendu K Pattanayak, Katharine R E Sims. 2013. More strictly protected areas are not necessarily more protective: evidence from Bolivia, Costa Rica, Indonesia, and Thailand. Environmental Research Letters 8, 025011. [CrossRef] 175. Vibeke Myrup Jensen. 2013. Working longer makes students stronger? The effects of ninth grade classroom hours on ninth grade student performance. Educational Research 55, 180-194. [CrossRef] 176. Farrukh Suvankulov. 2013. Internet recruitment and job performance: case of the US Army. The International Journal of Human Resource Management 24, 2237-2254. [CrossRef] 177. Anika Sieber, Tobias Kuemmerle, Alexander V. Prishchepov, Kelly J. Wendland, Matthias Baumann, Volker C. Radeloff, Leonid M. Baskin, Patrick Hostert. 2013. Landsat-based mapping of post-Soviet land-use change to assess the effectiveness of the Oksky and Mordovsky protected areas in European Russia. Remote Sensing of Environment 133, 38-51. [CrossRef] 178. Annette Schminke, Johannes Van Biesebroeck. 2013. Using export market performance to evaluate regional preferential policies in China. Review of World Economics 149, 343-367. [CrossRef] 179. R. Gutman, D.B. Rubin. 2013. Robust estimation of causal effects of binary treatments in unconfounded studies with dichotomous outcomes. Statistics in Medicine 32:10.1002/sim.v32.11, 1795-1814. [CrossRef] 180. John Moffat. 2013. 
Regional Selective Assistance (RSA) in Scotland: Does It Make a Difference to Plant Survival?. Regional Studies 1-14. [CrossRef] 181. Thomas G. Koch. 2013. Using RD design to understand heterogeneity in health insurance crowd- out. Journal of Health Economics 32, 599-611. [CrossRef] 182. Harald Oberhofer. 2013. Employment Effects of Acquisitions: Evidence from Acquired European Firms. Review of Industrial Organization 42, 345-363. [CrossRef] 183. Nguyen Viet Cuong. 2013. Which covariates should be controlled in propensity score matching? Evidence from a simulation study. Statistica Neerlandica 67:10.1111/stan.v67.2, 169-180. [CrossRef] 184. Aki Jääskeläinen, Harri Laihonen. 2013. Overcoming the specific performance measurement challenges of knowledge‐intensive organizations. International Journal of Productivity and Performance Management 62, 350-363. [CrossRef] 185. Gianluca Fiorentini, Matteo Lippi Bruni, Cristina Ugolini. 2013. GPs and hospital expenditures. Should we keep expenditure containment programs alive?. Social Science & Medicine 82, 10-20. [CrossRef] 186. Michael Lechner, Conny Wunsch. 2013. Sensitivity of matching-based program evaluations to the availability of control variables. Labour Economics 21, 111-121. [CrossRef] 187. Roberto Leombruni, Tiziano Razzolini, Francesco Serti. 2013. The pecuniary and non-pecuniary costs of job displacement–The risky job of being back to work. European Economic Review . [CrossRef] 188. Noémi Kreif, Richard Grieve, M. Zia Sadique. 2013. STATISTICAL METHODS FOR COST-EFFECTIVENESS ANALYSES THAT USE OBSERVATIONAL DATA: A CRITICAL APPRAISAL TOOL AND REVIEW OF CURRENT PRACTICE. Health Economics 22, 486-500. [CrossRef] 189. Sanders Korenman, Kristin S. Abner, Robert Kaestner, Rachel A. Gordon. 2013. The Child and Adult Care Food Program and the nutrition of preschoolers. Early Childhood Research Quarterly 28, 325-336. [CrossRef] 190. Yu Xiao, Jun Wan, Geoffrey J. D. Hewings. 2013. 
Flooding and the Midwest economy: assessing the Midwest floods of 1993 and 2008. GeoJournal 78, 245-258. [CrossRef] 191. Sea-Jin Chang, Jaiho Chung, Jon Jungbien Moon. 2013. When do wholly owned subsidiaries perform better than joint ventures?. Strategic Management Journal 34:10.1002/smj.2013.34.issue-3, 317-337. [CrossRef] 192. Seung-Hyun Hong. 2013. MEASURING THE EFFECT OF NAPSTER ON RECORDED MUSIC SALES: DIFFERENCE-IN-DIFFERENCES ESTIMATES UNDER COMPOSITIONAL CHANGES. Journal of Applied Econometrics 28:10.1002/jae.v28.2, 297-324. [CrossRef] 193. Richard Fabling, Lynda Sanderson. 2013. Exporting and firm performance: Market entry, investment and expansion. Journal of International Economics 89, 422-431. [CrossRef] 194. Taro Esaka. 2013. Evaluating the effect of de facto pegs on currency crises. Journal of Policy Modeling . [CrossRef] 195. Jin Wang. 2013. The economic impact of Special Economic Zones: Evidence from Chinese municipalities. Journal of Development Economics 101, 133-147. [CrossRef] 196. Christina Weiland, Hirokazu Yoshikawa. 2013. Impacts of a Prekindergarten Program on Children's Mathematics, Language, Literacy, Executive Function, and Emotional Skills. Child Development n/ a-n/a. [CrossRef] 197. Rahel Aichele, Gabriel Felbermayr. 2013. Estimating the Effects of Kyoto on Bilateral Trade Flows Using Matching Econometrics. The World Economy 36, 303-330. [CrossRef] 198. RICHARD HARRIS, QIAN CHER LI, JOHN MOFFAT. 2013. THE IMPACT OF HIGHER EDUCATION INSTITUTION-FIRM KNOWLEDGE LINKS ON ESTABLISHMENT- LEVEL PRODUCTIVITY IN BRITISH REGIONS*. The Manchester School 81:10.1111/ manc.2013.81.issue-2, 143-162. [CrossRef] 199. Edoardo Masset, Marie Gaarder, Penelope Beynon, Christelle Chapoy. 2013. What is the impact of a policy brief? Results of an experiment in research dissemination. Journal of Development Effectiveness 5, 50-63. [CrossRef] 200. Joshua K. Abbott, H. Allen Klaiber. 2013. 
The value of water as an urban club good: A matching approach to community-provided lakes. Journal of Environmental Economics and Management 65, 208-224. [CrossRef] 201. Caterina Giannetti, Nicola Jentzsch. 2013. Credit reporting, financial intermediation and identification systems: International evidence. Journal of International Money and Finance 33, 60-80. [CrossRef] 202. Dirk Czarnitzki, Cindy Lopes-Bento. 2013. Value for money? New microeconometric evidence on public R&D grants in Flanders. Research Policy 42, 76-89. [CrossRef] 203. Julia Koschinsky. 2013. The case for spatial analysis in evaluation to reduce health inequities. Evaluation and Program Planning 36, 172-176. [CrossRef] 204. Degnet Abebaw, Mekbib G. Haile. 2013. The impact of cooperatives on agricultural technology adoption: Empirical evidence from Ethiopia. Food Policy 38, 82-91. [CrossRef] 205. Metin Akyol, Michael Neugart, Stefan Pichler. 2013. Were the Hartz Reforms Responsible for the Improved Performance of the German Labour Market?. Economic Affairs 33:10.1111/ ecaf.2013.33.issue-1, 34-47. [CrossRef] 206. Matias D. Cattaneo, Max H. Farrell. 2013. Optimal convergence rates, Bahadur representation, and asymptotic normality of partitioning estimators. Journal of Econometrics . [CrossRef] 207. Alan R. Ellis, Stacie B. Dusetzina, Richard A. Hansen, Bradley N. Gaynes, Joel F. Farley, Til Stürmer. 2013. Investigating differences in treatment effect estimates between propensity score matching and weighting: a demonstration using STAR*D trial data. Pharmacoepidemiology and Drug Safety 22:10.1002/pds.v22.2, 138-144. [CrossRef] 208. Halbert White, Karim Chalak. 2013. Identification and Identification Failure for Treatment Effects Using Structural Systems. Econometric Reviews 32, 273-317. [CrossRef] 209. Bernadette M. Wanjala, Roldan Muradian. 2013. Can Big Push Interventions Take Small-scale Farmers out of Poverty? Insights from the Sauri Millennium Village in Kenya. World Development . 
[CrossRef] 210. D.L. MillimetEmpirical Methods for Political Economy Analyses of Environmental Policy 250-260. [CrossRef] 211. Dennis K. Orthner, Hinckley Jones-Sanpei, Patrick Akos, Roderick A. Rose. 2013. Improving Middle School Student Engagement Through Career-Relevant Instruction in the Core Curriculum. The Journal of Educational Research 106, 27-38. [CrossRef] 212. Gábor Békés, Péter Harasztosi. 2013. Agglomeration premium and trading activity of firms. Regional Science and Urban Economics 43, 51-64. [CrossRef] 213. Kishore Gawande, Hank Jenkins-Smith, May Yuan. 2013. The long-run impact of nuclear waste shipments on the property market: Evidence from a quasi-experiment. Journal of Environmental Economics and Management 65, 56-73. [CrossRef] 214. Gustavo Canavire-Bacarreza, Merlin M. Hanauer. 2013. Estimating the Impacts of Bolivia’s Protected Areas on Poverty. World Development 41, 265-285. [CrossRef] 215. Kwaw S. Andam, Paul J. Ferraro, Merlin M. Hanauer. 2013. The effects of protected area systems on ecosystem restoration: a quasi-experimental design to estimate the impact of Costa Rica's protected area system on forest regrowth. Conservation Letters n/a-n/a. [CrossRef] 216. Hyun Ah Kim, Yong-seong Kim, Myoung-jae Lee. 2012. Treatment effect analysis of early reemployment bonus program: panel MLE and mode-based semiparametric estimator for interval truncation. Portuguese Economic Journal 11, 189-209. [CrossRef] 217. Malcolm Keswell, Justine Burns, Rebecca Thornton. 2012. Evaluating the Impact of Health Programmes on Productivity. African Development Review 24:10.1111/afdr.2012.24.issue-4, 302-315. [CrossRef] 218. Doris Läpple, Thia Hennessy, Carol Newman. 2012. Quantifying the Economic Return to Participatory Extension Programmes in Ireland: an Endogenous Switching Regression Analysis. Journal of Agricultural Economics no-no. [CrossRef] 219. Arne Feddersen, Wolfgang Maennig. 2012. Sectoral labour market effects of the 2006 FIFA World Cup. 