The Estimation of Causal Effects from Observational Data
Total Page:16
File Type:pdf, Size:1020Kb
Annu. Rev. Sociol. 1999. 25:659-707 Copyright( 1999 by Annual Reviews. All rights reserved THE ESTIMATIONOF CAUSAL EFFECTSFROM OBSERVATIONAL DATA ChristopherWinship and StephenL. Morgan HarvardUniversity, Department of Sociology, William James Hall, 33 KirklandStreet, Cambridge,Massachusetts 02138; e-mail: [email protected]; [email protected] KEY WORDS: causal inference,causal analysis, counterfactual,treatment effect, selection bias ABSTRACT When experimentaldesigns are infeasible, researchersmust resort to the use of observationaldata from surveys, censuses, and administrativerecords. Because assignmentto the independentvariables of observationaldata is usually nonran- dom, the challenge of estimating causal effects with observationaldata can be formidable. In this chapter,we review the large literatureproduced primarily by statisticians and econometriciansin the past two decades on the estimation of causal effects from observationaldata. We firstreview the now widely accepted counterfactualframework for the modeling of causal effects. After examining estimators, both old and new, that can be used to estimate causal effects from cross-sectional data, we present estimatorsthat exploit the additionalinforma- tion furnishedby longitudinaldata. Because of the size and technical natureof the literature,we cannot offer a fully detailed and comprehensivepresentation. Instead, we present only the main features of methods that are accessible and potentiallyof use to quantitativelyoriented sociologists. INTRODUCTION Most quantitative empirical analyses are motivated by the desire to estimate the causal effect of an independent variable on a dependent variable. Although the randomized experiment is the most powerful design for this task, in most social science research done outside of psychology, experimental designs are infeasible. Social experiments are often too expensive and may require the 659 0360-0572/9/0815-0659$08.00 660 WINSHIP& MORGAN unethicalcoercion of subjects. Subjectsmay be unwilling to follow the experi- mentalprotocol, and the treatmentof interestmay not be directlymanipulable. For example, without considerablepower and a total absence of conscience, a researchercould not randomlyassign individualsto differentlevels of edu- cational attainmentin orderto assess the effect of educationon earnings. For these reasons, sociologists, economists, and political scientists must rely on what is now known as observationaldata-data that have been generatedby something other than a randomizedexperiment-typically surveys, censuses, or administrativerecords. The problems of using observationaldata to make causal inferences are considerable(Lieberson 1985, LaLonde 1986). In the past two decades, how- ever, statisticians(e.g. Rubin,Rosenbaum) and econometricians (e.g. Heckman, Manski)have madeconsiderable progress in clarifyingthe issues involvedwhen observationaldata are used to estimatecausal effects. In some cases, this hard- won clarityhas permittedthe developmentof new and more powerfulmethods of analysis. This line of researchis distinct from the work of sociologists and others who in the 1970s and 1980s developed path analysis and its general- ization, covariancestructure analysis. Despite their differences, both areas of researchare often labeled causal analysis. Statisticiansand econometricianshave adopteda sharedconceptual frame- work thatcan be used to evaluatethe appropriatenessof differentestimators in specific circumstances. This framework,to be describedbelow, also clarifies the propertiesof estimatorsthat are needed to obtain consistent estimates of causal effects in particularapplications. Our chapterprovides an overview of the work that has been done by statis- ticians and econometricianson causal analysis. We hope it will provide the reader with a basic appreciationof the conceptual advances that have been made and some of the methods that are now available for estimating causal effects. Because the literatureis massive and often technical, we do not at- temptto be comprehensive.Rather, we presentmaterial that we believe is most accessible and useful to practicingresearchers. As is typical of the literaturewe arereviewing, we use the languageof exper- iments in describingthese methods. This usage is an indicationof the advances thathave been made;we now have a conceptualframework that allows us to use the traditionalexperimental language and perspective to discuss and analyze observationaldata. Throughoutthis chapter,we write of individualswho are subjectto treatment,and we describeindividuals as having been assigned to ei- thera treatmentor a controlgroup. The reader,however, should not assumethat the thinkingand methods we review applyonly to the limited set of situationsin which it is strictlyproper to talk abouttreatment and control groups. In almost any situationwhere a researcherattempts to estimatea causaleffect, the analysis can be described,at least in terms of a thoughtexperiment, as an experiment. ESTIMATIONOF CAUSAL EFFECTS 661 The chapterconsists of threemajor sections. The firstpresents the conceptual frameworkand problems associated with using observationaldata to estimate causal effects. It presentsthe counterfactualaccount of causalityand its associ- ated definitionof a causal effect. We also discuss the basic problemsthat arise when using observationaldata to estimate a causal effect, and we show that there are two distinct sources of possible bias: Outcomesfor the treatmentand control groups may differ even in the absence of treatment;and the potential effect of the treatmentmay differ for the treatmentand controlgroups. We then present a general frameworkfor analyzing how assignment to the treatment group is relatedto the estimationof a causal effect. The second section examines cross-sectionalmethods for estimatingcausal effects. It discusses the bounds that data place on the permissible range of a causal effect; it also discusses the use of controlvariables to eliminatepotential differences between the treatmentand control groups that are related to the outcome. We review standardregression and matchingapproaches and discuss methods that condition on the likelihood of being assigned to the treatment. These latter methods include the regression discontinuitydesign, propensity score techniques, and dummy endogenous variablemodels. This section also discusses the use of instrumentalvariables to estimatecausal effects, presenting their developmentas a methodto identify parametersin simultaneousequation models and reviewing currentresearch on what instrumentalvariables identify in the presence of differenttypes of treatment-effectheterogeneity. The third section discusses methods for estimatingcausal effects from lon- gitudinaldata. We present the interruptedtime-series design, then use a rela- tively generalmodel specificationfor the structureof unobservablesto compare change-score analysis, differentiallinear growthrate models, and the analysis of covariance.The key lesson here is that no one method is appropriatefor all cases. This section also discusses how to use data to help determine which method is appropriatein a particularapplication. The paper concludes with a discussion of the general importanceof the methods reviewed for improvingthe quality of quantitativeempirical research in sociology. We have more powerful methods available,but more important, we have a frameworkfor examining the plausibility of assumptions behind different methods and thus a way of analyzing the quality and limitations of particularempirical estimates. BASIC CONCEPTUALFRAMEWORK In the past two decades, statisticiansand econometricianshave adopteda com- mon conceptualframework for thinkingabout the estimationof causaleffects- the counterfactualaccount of causality. The usefulness of the counterfactual frameworkis threefold. It provides an explicit frameworkfor understanding 662 WINSHIP& MORGAN (a) the limitationsof observationaldata, (b) how the treatmentassignment pro- cess may be relatedto the outcome of interest,and (c) the type of information that is providedby the data in the absence of any assumptions. The Counterfactual Account Of Causality Discussions of causality in the social sciences often degenerateinto fruitless philosophical digressions (e.g., see McKim & Turner1997, Singer & Marini 1987). In contrast,the developmentof the counterfactualdefinition of causality has yielded practicalvalue. With its origins in the early work on experimen- tal designs by Fisher (1935), Neyman (1923, 1935), Cochran& Cox (1950), Kempthorne(1952), andCox (1958a,b), the counterfactualframework has been formalized and extended to nonexperimentaldesigns in a series of papers by Rubin (1974, 1977, 1978, 1980, 1981, 1986, 1990; see also Pratt& Schlaifer 1984). However,it also has roots in the economics literature(Roy 1951, Quandt 1972). The counterfactualaccount has provided a conceptual and notational frameworkfor analyzing problems of causality that is now dominantin both statisticsand econometrics. Holland (1986), Pratt& Schlaifer(1988), andSobel (1995, 1996) providedetailed exegeses of this work. Let Y be an intervallevel measure of an outcome of interest,either contin- uous or discrete or a mixtureof the two. Examples are earnings,mathematics aptitude, educational attainment,employment status, and age at death. As- sume that individualscan be exposed to only one of two alternativestates but that each individual could a priori be exposed to either state. Each state is characterizedby a distinct set of conditions,