Empirical Methods in Applied Economics
Jörn-Steffen Pischke
LSE
October 2005

1 The Evaluation Problem: Introduction and Randomized Experiments

1.1 The Nature of Selection Bias

Many questions in economics and the social sciences are evaluation questions of the form: what is the effect of a treatment ($D$) on an outcome ($y$)? Consider a simple question of this type: Do hospitals make people healthier? The answer seems to be a pretty obvious yes, but how do we really know this? One approach to answering the question would be to get some data and run a regression. For example, $y$ could be self-reported health status, and $D$ could be an indicator (dummy variable) for whether the individual went to a hospital in the previous year. There are many issues with these measures, for example whether self-reported health status is reliable and comparable across individuals. These questions are important, but I will not discuss them here. We can then run a regression:

$$y = \alpha + \beta D + \varepsilon \tag{1}$$

I have run this regression with data for 12,640 individuals from Germany for 1994. The health measure ranges from 1 to 5, with 5 being the best health. The regression yields the following result:

$$\text{health} = \underset{(0.09)}{2.52} - \underset{(0.03)}{0.58}\,\text{hospital} + \varepsilon$$

Taken at face value, the regression result suggests that going to the hospital makes people sicker, and the effect is strongly significant. Of course, it is easy to see why this interpretation of the regression result is silly: people who go to the hospital are less healthy to begin with, and being treated may well improve their health. However, even after the hospitalization they are not as healthy as those individuals who never get hospitalized. Regression only uncovers correlations but does not prove that the causality runs from the right-hand-side variable (hospital) to the left-hand-side variable (health). The inherent problem here is referred to as selection bias, and it is useful to look at it a bit more formally.
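Before formalizing this, the hospital regression can be mimicked with a small simulation. The sketch below is only illustrative and does not use the German data: the health scale, the selection rule, and the value `TRUE_EFFECT = 0.5` are all assumptions chosen for the example. It builds in a positive effect of hospitalization, lets people with poor untreated health select into the hospital, and then runs the regression of $y$ on the dummy $D$ in closed form.

```python
import random
from statistics import fmean

random.seed(0)  # fixed seed so the sketch is reproducible

TRUE_EFFECT = 0.5  # assumption: a hospital stay raises health by 0.5 points


def simulate(n=20000):
    """Return (D, y) pairs in which sicker people select into treatment."""
    rows = []
    for _ in range(n):
        y0 = random.gauss(3.0, 1.0)  # untreated health on a hypothetical continuous scale
        # Assumed selection rule: people with low untreated health go to the hospital.
        d = 1 if y0 + random.gauss(0.0, 0.5) < 2.0 else 0
        y = y0 + TRUE_EFFECT * d  # observed health
        rows.append((d, y))
    return rows


def ols_on_dummy(rows):
    """Closed-form OLS of y on a constant and the dummy D."""
    d = [r[0] for r in rows]
    y = [r[1] for r in rows]
    d_bar, y_bar = fmean(d), fmean(y)
    cov_dy = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
    var_d = sum((di - d_bar) ** 2 for di in d)
    beta = cov_dy / var_d
    alpha = y_bar - beta * d_bar
    return alpha, beta


rows = simulate()
alpha_hat, beta_hat = ols_on_dummy(rows)
diff_in_means = (fmean([y for d, y in rows if d == 1])
                 - fmean([y for d, y in rows if d == 0]))

print(f"true effect:         {TRUE_EFFECT:+.2f}")
print(f"OLS slope on D:      {beta_hat:+.2f}")
print(f"difference in means: {diff_in_means:+.2f}")
```

With a single dummy regressor the OLS slope is numerically the difference in group means, so under this assumed selection rule the estimated coefficient comes out negative even though the simulated effect is positive.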
Think about our treatment variable $D$ as binary, $D \in \{0, 1\}$. This is the variable of interest in the regression. Extending the framework to multivalued or continuous treatments is possible, but the basic insights are the same. We will concentrate on the binary case in most of these notes. Think of two potential outcomes $y_0$ and $y_1$, which are defined as

$$\text{outcome} = \begin{cases} y_1 & \text{if } D = 1 \\ y_0 & \text{if } D = 0 \end{cases}$$

These are potential or counterfactual outcomes in the sense that I can think of both outcomes for each individual. For example, $y_0$ is the health status of an individual if the individual had not gone to the hospital, irrespective of whether the individual actually went. However, each individual will only realize one of these outcomes, depending on whether the treatment is chosen ($D = 1$) or not ($D = 0$). Consider an individual who gets hospitalized. I will only ever observe $y_1$ for this individual. However, I would like to know $y_0$ for this individual (what would the health of this individual have been like had he or she not gone to the hospital), since I am interested in the treatment effect $y_1 - y_0$. Since I do not observe $y_0$ in this case, the only available strategy is to substitute the health of someone who was not hospitalized instead.

Let us return to the realized outcome $y$, which we observe:

$$y = \begin{cases} y_1 & \text{if } D = 1 \\ y_0 & \text{if } D = 0 \end{cases} \;=\; y_0 + (y_1 - y_0) D \tag{2}$$

This notation is useful because $y_1 - y_0$ is the treatment effect for an individual. For most of these lectures we will concentrate on models with homogeneous (and linear) treatment effects, so $y_1 - y_0$ is the same for everybody: a constant parameter to be estimated. In principle, there could be a distribution of both $y_1$ and $y_0$ as well as $y_1 - y_0$ in the population, and we will return to this in the last lecture.

In the constant treatment effects case we can rewrite (2) in the more familiar form

$$y = \underbrace{\alpha}_{E(y_0)} + \underbrace{\beta}_{(y_1 - y_0)} D + \underbrace{\varepsilon}_{y_0 - E(y_0)}$$

It is now straightforward to characterize what is happening in the simple regression (1).
Take expectations conditional on $D$:

$$E(y \mid D = 1) = \alpha + \beta + E(\varepsilon \mid D = 1)$$

$$E(y \mid D = 0) = \alpha + E(\varepsilon \mid D = 0)$$

The OLS estimate of $\beta$ is

$$E(y \mid D = 1) - E(y \mid D = 0) = \underbrace{\beta}_{\text{treatment effect}} + \underbrace{E(\varepsilon \mid D = 1) - E(\varepsilon \mid D = 0)}_{\text{selection bias}} \tag{3}$$

Notice that the selection bias can be rewritten in terms of $y_0$:

$$E(\varepsilon \mid D = 1) - E(\varepsilon \mid D = 0) = E(y_0 \mid D = 1) - E(y_0 \mid D = 0) \tag{4}$$

In terms of the hospital example, the selection bias is the average difference in counterfactual health status (health had the individuals not gone to the hospital) between the group that is hospitalized and the group that is not. If the sick go to the hospital, the selection bias will be negative. The goal of evaluation research is to overcome the selection bias. These lectures will deal with various methods which are popular in the literature.

1.2 How Randomized Experiments Overcome the Selection Bias: The Tennessee STAR Experiment

It is easy to see from (4) that statistical independence of $D$ and $y_0, y_1$ (or $\varepsilon$), i.e. $D \perp y_0, y_1$, solves the selection problem. In fact, independence is stronger than what is required; mean independence is enough as long as we are only interested in means. Clearly, random assignment of $D$ ensures independence. To get at the result another way, consider the simple comparison of means estimator:

$$\begin{aligned} E(y \mid D = 1) - E(y \mid D = 0) &= E(y_1 \mid D = 1) - E(y_0 \mid D = 0) \\ &= E(y_1) - E(y_0) = \beta \end{aligned}$$

where the second line is a consequence of statistical independence of treatment and the counterfactual outcomes.

Randomized experiments are considered the gold standard of evaluation research. This does not mean that random assignment experiments are without problems, but in principle they perfectly solve the evaluation problem. Experiments in the social sciences are not as common as in the natural sciences, but they have been around for many decades, and they are becoming more and more prevalent.
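The decomposition in (3) and (4), and the way random assignment removes the selection term, can be checked numerically. The following is a sketch under assumed values, not an analysis of real data: `BETA`, the distribution of $y_0$, and the two assignment rules (`self_selection`, `random_assignment`) are hypothetical. Because the simulation generates $y_0$ directly, the normally unobservable term $E(y_0 \mid D = 1) - E(y_0 \mid D = 0)$ can be computed.

```python
import random
from statistics import fmean

BETA = 0.5  # assumed constant treatment effect y1 - y0


def self_selection(y0, rng):
    """Assumed rule: low untreated health makes treatment more likely."""
    return 1 if y0 + rng.gauss(0.0, 0.5) < 2.0 else 0


def random_assignment(y0, rng):
    """Coin flip that ignores y0 entirely."""
    return 1 if rng.random() < 0.5 else 0


def decompose(assign, n=50000, seed=1):
    """Return (difference in means of y, selection bias in y0)."""
    rng = random.Random(seed)
    y0 = [rng.gauss(3.0, 1.0) for _ in range(n)]
    d = [assign(v, rng) for v in y0]
    y = [v + BETA * di for v, di in zip(y0, d)]  # equation (2) with a constant effect

    def group_mean(values, group):
        return fmean([v for v, di in zip(values, d) if di == group])

    diff = group_mean(y, 1) - group_mean(y, 0)    # left-hand side of (3)
    bias = group_mean(y0, 1) - group_mean(y0, 0)  # right-hand side of (4)
    return diff, bias


diff_sel, bias_sel = decompose(self_selection)
diff_rnd, bias_rnd = decompose(random_assignment)

print(f"self-selection:    diff = {diff_sel:+.3f}, selection bias = {bias_sel:+.3f}")
print(f"random assignment: diff = {diff_rnd:+.3f}, selection bias = {bias_rnd:+.3f}")
```

In both cases the difference in means equals `BETA` plus the selection bias. Under the assumed self-selection rule the bias is large and negative, while under the coin flip it is close to zero, so the comparison of means recovers approximately `BETA`.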
Even in situations where we do not have experimental data, the idea of a randomization experiment is a useful conceptual benchmark, and it often helps us structure a successful empirical approach.

As an example of an experiment, consider a large-scale social experiment on class size called the Tennessee STAR experiment. We would like to know what the effect of smaller classes is on student achievement. Class size is typically not randomly assigned but chosen by school administrators. For example, in many countries classes for remedial students tend to be smaller and classes for advanced students tend to be bigger. This assignment would clearly result in selection bias in observational data. In addition, school funding is mostly local in the US, so better funded schools with smaller classes are often in richer neighborhoods. The children in these schools may do better for reasons that have nothing to do with class size.

So in order to answer this question, the legislature of the US state of Tennessee funded a randomized experiment on small classes and teacher aides by appropriating US$12 million. It was implemented for a cohort of kindergartners in 1985/86 and lasted for four years, i.e. until the kindergartners were in grade 3. Starting from grade 4, everybody was in regular sized classes again. The experiment involved about 11,600 children. The average class size in regular Tennessee classes in 1985/86 was about 22.3.

The idea of the experiment was to assign students to one of three treatments: small classes with 13-17 children, regular classes with 22-25 children, or regular classes with a full-time teacher aide. Other regular size classes also had a part-time aide. Schools with at least three classes could choose to participate in the experiment. The 1985/86 entering kindergarten cohort was randomly assigned to one of the class types within these schools. In addition, teachers were randomly assigned to these classes.
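Random assignment within schools can be sketched as follows. This is a hypothetical illustration, not the procedure actually used in STAR: the school names, student identifiers, and the equal split across the three class types are assumptions (the real design had small classes of 13-17 and regular classes of 22-25 children). The point is only that the shuffle is done separately within each school, so every school contributes students to every class type.

```python
import random
from collections import Counter

CLASS_TYPES = ("small", "regular", "regular_aide")


def assign_within_school(student_ids, rng):
    """Shuffle one school's students and deal them across the class types."""
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    return {sid: CLASS_TYPES[i % len(CLASS_TYPES)]
            for i, sid in enumerate(shuffled)}


rng = random.Random(1985)  # fixed seed so the assignment can be reproduced

# Hypothetical rosters for two participating schools.
schools = {
    "school_A": [f"A{i}" for i in range(60)],
    "school_B": [f"B{i}" for i in range(45)],
}

# Randomize separately within each school.
assignment = {name: assign_within_school(ids, rng)
              for name, ids in schools.items()}

for name, by_student in assignment.items():
    print(name, dict(Counter(by_student.values())))
```

Randomizing within schools means that comparisons across class types are not confounded by differences between schools, such as funding or neighborhood.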
Pupils who repeated or skipped a grade left the experiment. Students who entered an experimental school and grade later were added to the experiment and randomly assigned to one of the classes. One somewhat unfortunate aspect of the experiment is that students in the regular and regular/aide classes were reassigned after the kindergarten year, possibly due to protests of the parents with children in the regular classrooms. There was also some switching of children after the kindergarten year. Despite these problems, the STAR experiment seems to have been an extremely well implemented randomized trial.

The outcome measures collected on the students in the STAR experiment are test scores on the Stanford Achievement Test and the Tennessee Basic Skills First Test. These tests were given in March of each school year. The results I will present are from Krueger (1999) and come from kindergarten through 3rd grade. Krueger uses the percentile rank on each test, using the regular and regular/aide classes as the basis for scaling. The measure he looks at is the average over three subject tests.

The first question to ask about a randomized experiment is whether the randomization was carried out properly. In order to do this, it is typical to look at the pre-treatment outcome. Since the experiment should have no effect on test scores prior to the experimental intervention, the pre-treatment outcomes should be balanced, i.e.