Recent Developments in the Econometrics of Program Evaluation

Journal of Economic Literature 2009, 47:1, 5–86 http:www.aeaweb.org/articles.php?doi=10.1257/jel.47.1.5 Recent Developments in the Econometrics of Program Evaluation Guido W. Imbens and Jeffrey M. Wooldridge* Many empirical questions in economics and other social sciences depend on causal effects of programs or policies. In the last two decades, much research has been done on the econometric and statistical analysis of such causal effects. This recent theoretical literature has built on, and combined features of, earlier work in both the statistics and econometrics literatures. It has by now reached a level of maturity that makes it an important tool in many areas of empirical research in economics, including labor economics, public finance, development economics, industrial organization, and other areas of empirical microeconomics. In this review, we discuss some of the recent developments. We focus primarily on practical issues for empirical researchers, as well as provide a historical overview of the area and give references to more technical research. 1. Introduction research in economics and suitable for a review. In this article, we attempt to pres- any empirical questions in economics ent such a review. We will focus on practi- Mand other social sciences depend on cal issues for empirical researchers, as well as causal effects of programs or policies. In the provide an historical overview of the area and last two decades, much research has been give references to more technical research. done on the econometric and statistical anal- This review complements and extends other ysis of such causal effects. This recent theo- reviews and discussions, including those by retical literature has built on, and combined Richard Blundell and Monica Costa Dias features of, earlier work in both the statistics (2002), Guido W. Imbens (2004), and Joshua and econometrics literatures. It has by now D. Angrist and Alan B. Krueger (1999) and reached a level of maturity that makes it an the books by Paul R. Rosenbaum (1995), important tool in many areas of empirical Judea Pearl (2000), Myoung-Jae Lee (2005a), Donald B. Rubin (2006), Marco Caliendo (2006), Angrist and Jörn-Steffen Pischke * Imbens: Harvard University and NBER. Wooldridge: Michigan State University. Financial support for this (2009), Howard S. Bloom (2005), Stephen research was generously provided through NSF grants L. Morgan and Christopher Winship (2007), SES 0136789, 0452590 and 08. We are grateful for com- Jeffrey M. Wooldridge (2002) and Imbens and ments by Esther Duflo, Caroline Hoxby, Roger Gordon, Jonathan Beauchamp, Larry Katz, Eduardo Morales, and Rubin (forthcoming). In addition, the reviews two anonymous referees. in James J. Heckman, Robert J. LaLonde, 5 6 Journal of Economic Literature, Vol. XLVII (March 2009) and Jeffrey A. Smith (1999), Heckman and can involve different physical units or the Edward Vytlacil (2007a, 2007b), and Jaap same physical unit at different times. H. Abbring and Heckman (2007) provide an The problem of evaluating the effect of a excellent overview of the important theoreti- binary treatment or program is a well studied cal work by Heckman and his coauthors in problem with a long history in both econo- this area. metrics and statistics. This is true both in The central problem studied in this liter- the theoretical literature as well as in the ature is that of evaluating the effect of the more applied literature. The econometric exposure of a set of units to a program, or literature goes back to early work by Orley treatment, on some outcome. In economic Ashenfelter (1978) and subsequent work by studies, the units are typically economic Ashenfelter and David Card (1985), Heckman agents such as individuals, households, mar- and Richard Robb (1985), LaLonde (1986), kets, firms, counties, states, or countries Thomas Fraker and Rebecca Maynard but, in other disciplines where evaluation (1987), Card and Daniel G. Sullivan (1988), methods are used, the units can be animals, and Charles F. Manski (1990). Motivated plots of land, or physical objects. The treat- primarily by applications to the evaluation of ments can be job search assistance programs, labor market programs in observational set- educational programs, vouchers, laws or tings, the focus in the econometric literature regulations, medical drugs, environmental is traditionally on endogeneity, or self-selec- exposure, or technologies. A critical feature tion, issues. Individuals who choose to enroll is that, in principle, each unit can be exposed in a training program are by definition dif- to multiple levels of the treatment. Moreover, ferent from those who choose not to enroll. this literature is focused on settings with These differences, if they influence the observations on units exposed, and not response, may invalidate causal comparisons exposed, to the treatment, with the evalua- of outcomes by treatment status, possibly tion based on comparisons of units exposed even after adjusting for observed covariates. and not exposed.1 For example, an individual Consequently, many of the initial theoreti- may enroll or not in a training program, or he cal studies focused on the use of traditional or she may receive or not receive a voucher, econometric methods for dealing with endo- or be subject to a particular regulation or geneity, such as fixed effect methods from not. The object of interest is a comparison panel data analyses, and instrumental vari- of the two outcomes for the same unit when ables methods. Subsequently, the economet- exposed, and when not exposed, to the treat- rics literature has combined insights from ment. The problem is that we can at most the semiparametric literature to develop new observe one of these outcomes because the estimators for a variety of settings, requir- unit can be exposed to only one level of the ing fewer functional form and homogeneity treatment. Paul W. Holland (1986) refers to assumptions. this as the fundamental problem of causal The statistics literature starts from a dif- inference. In order to evaluate the effect of ferent perspective. This literature originates the treatment, we therefore always need to in the analysis of randomized experiments by compare distinct units receiving the different Ronald A. Fisher (1935) and Jerzy Splawa- levels of the treatment. Such a comparison Neyman (1990). From the early 1970s, Rubin (1973a, 1973b, 1974, 1977, 1978), in a series of papers, formulated the now dominant 1 As oppposed to studies where the causal effect of fundamentally new programs is predicted through direct approach to the analysis of causal effects in identification of preferences and production functions. observational studies. Rubin proposed the Imbens and Wooldridge: Econometrics of Program Evaluation 7 interpretation of causal statements as com- market programs, although more recently parisons of so-called potential outcomes: this emphasis seems to have weakened a pairs of outcomes defined for the same unit bit. In the last couple of years, some of the given different levels of exposure to the treat- most interesting experiments have been ment, with the ressearcher only observing conducted in development economics (e.g., the potential outcome corresponding to the Edward Miguel and Michael Kremer 2004; level of the treatment received. Models are Esther Duflo 2001; Angrist, Eric Bettinger, developed for the pair of potential outcomes and Kremer 2006; Abhijit V. Banerjee rather than solely for the observed outcome. et al. 2007) and behavioral econom- Rubin’s formulation of the evaluation prob- ics (e.g., Marianne Bertrand and Sendhil lem, or the problem of causal inference, Mullainathan 2004). Nevertheless, experi- labeled the Rubin Causal Model (RCM) by mental evaluations remain relatively rare in Holland (1986), is by now standard in both economics. More common is the case where the statistics and econometrics literature. economists analyze data from observational One of the attractions of the potential out- studies. Observational data generally cre- comes setup is that from the outset it allows ate challenges in estimating causal effects for general heterogeneity in the effects of the but, in one important special case, variously treatment. Such heterogeneity is important referred to as unconfoundedness, exogene- in practice, and it is important theoretically ity, ignorability, or selection on observables, as it is often the motivation for the endogene- questions regarding identification and esti- ity problems that concern economists. One mation of the policy effects are fairly well additional advantage of the potential out- understood. All these labels refer to some come set up is that the parameters of interest form of the assumption that adjusting treat- can be defined, and the assumptions stated, ment and control groups for differences in without reference to particular statistical observed covariates, or pretreatment vari- models. ables, remove all biases in comparisons Of particular importance in Rubin’s between treated and control units. This case approach is the relationship between treat- is of great practical relevance, with many ment assignment and the potential out- studies relying on some form of this assump- comes. The simplest case for analysis is when tion. The semiparametric efficiency bound assignment to treatment is randomized and, has been calculated for this case (Jinyong thus, independent of covariates as well as the Hahn 1998) and various semiparametric potential outcomes. In such classical ran- estimators have been proposed (Hahn 1998; domized experiments, it is straightforward Heckman, Hidehiko Ichimura, and Petra to obtain estimators for the average effect E. Todd 1998; Keisuke Hirano, Imbens, of the treatment with attractive properties and Geert Ridder 2003; Xiaohong Chen, under repeated sampling, e.g., the difference Han Hong, and Alessandro Tarozzi 2008; in means by treatment status. Randomized Imbens, Whitney K. Newey, and Ridder experiments have been used in some areas 2005; Alberto Abadie and Imbens 2006). We in economics. In the 1970s, negative income discuss the current state of this literature, tax experiments received widespread atten- and the practical recommendations coming tion.

Recent Developments in the Econometrics of Program Evaluation

Skill Versus Voice in Local Development

Allied Social Science Associations Atlanta, GA January 3–5, 2010

Personalities and Public Sector Performance: Evidence from a Health Experiment in Pakistan

The Political Economy of IMF Surveillance

(G)ARCH Processes and the Phenomenon of Misleading and Unambiguous Signals

CFE-Cmstatistics 2020 Book of Abstracts

Family Planning & Reproductive Health

An Experiment in Candidate Selection

Computational and Financial Econometrics (CFE 2017)

How Effective Is Community Driven Development

How to Improve Education Outcomes Most Efficiently? a Comparison of 150 Interventions Using the New Learning-Adjusted Years of Schooling Metric Noam Angrist, David K

Volumen 53 Volumen 64