
Two-Stage Least Squares Estimation of Average Causal Effects in Models With Variable Treatment Intensity

Joshua D. ANGRIST and Guido W. IMBENS*

Two-stage least squares (TSLS) is widely used in econometrics to estimate parameters in systems of linear simultaneous equations and to solve problems of omitted-variables bias in single-equation estimation. We show here that TSLS can also be used to estimate the average causal effect of variable treatments such as drug dosage, hours of exam preparation, cigarette smoking, and years of schooling. The average causal effect in which we are interested is a conditional expectation of the difference between the outcomes of the treated and what these outcomes would have been in the absence of treatment. Given mild regularity assumptions, the probability limit of TSLS is a weighted average of per-unit average causal effects along the length of an appropriately defined causal response function. The weighting function is illustrated in an empirical example based on the relationship between schooling and earnings.

KEY WORDS: Instrumental variables; Rubin causal model; Schooling; Wages.

1. INTRODUCTION

Econometricians and statisticians have developed a variety of techniques for the estimation of systems of linear simultaneous equations. The simplest and most commonly used of these techniques is the class of instrumental variables (IV) estimators, of which two-stage least squares (TSLS) is the most important special case. IV and TSLS were developed in early research on simultaneous equations estimation (by Wright [1928] and Theil [1958], among others), and both estimators are now described in every textbook (e.g., Theil 1971). These techniques are typically introduced as solutions to the problem of "simultaneous equations bias." More generally, however, IV provides a powerful and flexible estimation strategy that can be used to tackle the problem of omitted-variables bias in a wide range of single-equation regression applications, such as models with mismeasured regressors (Durbin 1954) and the estimation of treatment effects in manpower training programs (Heckman and Hotz 1989; Heckman and Robb 1985).

The use of IV and TSLS to estimate treatment effects is not limited to econometrics. Permutt and Hebel (1989) used TSLS to estimate the effect of maternal smoking on birth weight. Hearst, Newman, and Hulley's (1986) use of the draft lottery to estimate the effect of Vietnam-era military service on civilian mortality is another epidemiological application of IV. Angrist and Krueger (1992) used IV to estimate the effect of children's age at school entry on their ultimate educational attainment. The Powers and Swinton (1984) "encouragement design" used to study the effect of test preparation on graduate record examination (GRE) scores, discussed by Holland (1988), also leads naturally to IV and TSLS estimators. Finally, the analysis used by Robins (1989, sec. 18) and Robins and Tsiatis (1991) to correct for noncompliance in clinical trials is an application of instrumental variables to experimental data.

Evaluation research in econometrics and applications of IV and TSLS in other fields typically rely on regression models with constant coefficients. The regression approach to evaluation postulates a hypothetical linear response function. In contrast, the statistical literature on evaluation has been strongly influenced by Rubin's (1974, 1977) model for causal inference using counterfactual outcomes. (The ideas behind this framework date back to Neyman and Fisher; see Rubin 1990 for historical details.) In a methodological paper related to this one (Imbens and Angrist 1994), we discuss the use of IV to estimate average causal effects in Rubin's causal model for binary treatments, as well as how to estimate the sampling variance of IV estimates of treatment effects. In another recent paper (Angrist, Imbens, and Rubin 1995), we present a detailed analysis of the conceptual issues and problems that arise when IV is used to estimate the average causal effect of a binary treatment in the Rubin causal model, together with a survey of evaluation research in econometrics and [...].

In this article, we show how TSLS can be used to estimate average causal effects in a version of Rubin's causal model that allows for variable treatment intensity, multiple instruments, and covariates. In particular, we show that TSLS applied to a causal model with variable treatment intensity and nonignorable treatment assignment identifies a weighted average of per-unit treatment effects along the length of a causal response function. Our results do not hinge on linearity of the relationships between response variables, treatment intensities, and instruments.

A second contribution of this article is to illustrate the causal response weighting function in an empirical example based on the work by Angrist and Krueger (1991), using IV and TSLS to estimate the effect of years of schooling on earnings. The weighting function in the application is of interest because the relationship between schooling and earnings is one of the most important empirical regularities in economics. More generally, the weighting formula can help researchers understand which observations are contributing to a particular estimate, and the formula provides a causal interpretation for some of the simple estimators commonly used in applied research.

* Joshua D. Angrist is Senior Lecturer, Department of Economics, Hebrew University, Jerusalem 91095, Israel. Guido W. Imbens is Associate Professor, Department of Economics, Harvard University, Cambridge, MA 02138. The authors thank Don Rubin for raising the question that this article answers, and seminar participants at the Harvard-MIT Econometrics Workshop and the University of Wisconsin for helpful comments. Four referees and an associate editor also made helpful comments. Financial support from the National Science Foundation (SES-9122627) and the Nederlandse Organisatie Voor Wetenschappelijk Onderzoek is also gratefully acknowledged. Part of the article was written while the second author was visiting the Hebrew University Department of Economics.

© 1995 American Statistical Association. Journal of the American Statistical Association, June 1995, Vol. 90, No. 430, Applications and Case Studies.

2. APPLICATION: THE EFFECT OF COMPULSORY SCHOOL ATTENDANCE ON EARNINGS

The theory of human capital (Becker 1964) says that years of schooling can be treated much the same way as an investment in physical capital, yielding a rate of return something like an interest rate. The empirical counterpart of this theory is the "human capital earnings function," a semilogarithmic regression of earnings on schooling and other covariates. An example is the following model for microdata:

$$Y = \gamma_0 + X_1\gamma_1 + \rho S + \varepsilon, \qquad (1)$$

where $Y$ is the log of weekly earnings, $\gamma_0$ is a constant, $X_1$ is a row vector of covariates, $\gamma_1$ a vector of coefficients, $S$ is years of schooling, and the coefficient $\rho$ is the approximate percentage return to a year of schooling. Equation (1) is sometimes augmented with an equation describing how schooling is related to the covariates, $X_1$, and additional covariates, $X_2$:

$$S = \delta_0 + X_1\delta_1 + X_2\delta_2 + \eta. \qquad (2)$$

Econometricians typically assume that a semilogarithmic response function for earnings is a reasonably good approximation to the true earnings function. Ordinary least squares (OLS) applied to (1), however, may lead to biased estimates of $\rho$ even if the true response function is linear. The reason is that schooling is determined by individual choices under constraints. For example, the literature on schooling and earnings has devoted considerable attention to the problem of "ability bias" in estimates of the economic return to schooling. This is a form of omitted-variables bias that would arise if more able individuals in the labor market get more schooling, perhaps because of better access to capital markets. The observed positive correlation between schooling and earnings would then partly reflect the fact that those with more schooling have higher earnings potential. In terms of Equations (1) and (2), ability is common to the error terms, $\varepsilon$ and $\eta$, and so these error terms are correlated.

One infeasible solution to this problem is to conduct an experiment in which schooling is randomly assigned. Random assignment would eliminate the correlation between schooling and ability or unobserved earnings potential. Even in the absence of a true experiment, a "natural experiment" may generate instrumental variables that effectively do the same thing.

Instrumental variables are variables related to the outcome of interest solely through the treatment of interest. For example, in two recent papers, Angrist and Krueger (1991, 1992) showed that students' quarter of birth interacts with compulsory attendance laws and age at school entry to generate variation in years of completed schooling. State compulsory attendance laws typically require students to enter school in the fall of the year in which they turn 6, but allow students to drop out of school when they reach age 16. This induces a relationship between quarter of birth and educational attainment, because students born in the first quarter of the year enter school at an older age than do students born in later quarters. Students who enter school at an older age thus are allowed to drop out after having completed less schooling than students who enter school at a younger age. If students' quarter of birth is correlated with earnings solely because it is correlated with schooling, then it is an instrument for schooling in an earnings equation.

One way to convert this idea into an estimation strategy is to compare the education and earnings of people born in the first quarter to the education and earnings of people born in a later quarter, say the fourth. This leads to the simplest possible IV estimator: Wald's method of fitting straight lines (Durbin 1954). As an example, calculations underlying Wald estimates based on a first quarter/fourth quarter comparison for men are laid out in Table 1. Panel A of the table shows results tabulated from data on the wages and earnings of men in the 1970 Census, and Panel B shows results tabulated using data from the 1980 Census. In both data sets, men born in the first quarter earn slightly less and have slightly less schooling than men born in the fourth quarter. The ratio of differences in earnings to differences in schooling generates a Wald estimate of the return to schooling of 5.3% using the 1970 Census and 8.9% using the 1980 Census.

Durbin (1954) showed that under the null hypothesis that the OLS estimate is consistent, the sampling variance of the

Table 1. Compulsory School Attendance

                                          (1)            (2)            (3)
                                        Born in        Born in       Difference
                                      1st quarter    4th quarter    (std. error)
                                        of year        of year        (1) - (2)

Panel A: Wald Estimates for 1970 Census - Men Born 1920-1929 (a)
ln(weekly wage)                         5.1485         5.1578         -.00935
                                                                      (.00374)
Education                              11.3996        11.5754         -.1758
                                                                      (.0192)
Wald est. of return to education                                       .0531
                                                                      (.0196)
OLS est. of return to education (b)                                    .0797
                                                                      (.0005)

Panel B: Wald Estimates for 1980 Census - Men Born 1930-1939
ln(weekly wage)                         5.8916         5.9051         -.01349
                                                                      (.00337)
Education                              12.6881        12.8394         -.1514
                                                                      (.0162)
Wald est. of return to education                                       .0891
                                                                      (.0210)
OLS est. of return to education                                        .0703
                                                                      (.0005)

(a) The sample size is 122,223 in Panel A and 162,515 in Panel B. Each sample consists of men born in the first and fourth quarters of the year in the United States who had positive earnings in the year preceding the survey. The 1980 Census sample is drawn from the 5% sample, and the 1970 Census sample is from the state, county, and neighborhoods 1% samples. A detailed description of the data sets is provided in the Appendix to Angrist and Krueger (1991).
(b) The OLS return to education was estimated from a bivariate regression of log weekly earnings on years of education in a sample of men born in the first and fourth quarters.
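The Wald calculation described for Table 1 can be reproduced from the tabulated differences. The short Python sketch below is only an illustration: the function name `wald_estimate` is hypothetical, and the inputs are the rounded differences from column (3) of Table 1, so the results agree with the tabulated .0531 and .0891 only up to rounding of those inputs.

```python
# Wald estimate: ratio of the difference in mean log wages to the
# difference in mean schooling (first quarter minus fourth quarter).
# Inputs are copied from column (3) of Table 1; names are illustrative.

def wald_estimate(diff_outcome: float, diff_treatment: float) -> float:
    """Return the Wald (grouping) IV estimate from two differences in means."""
    if diff_treatment == 0:
        raise ValueError("the instrument does not shift the treatment")
    return diff_outcome / diff_treatment

# Panel A (1970 Census) and Panel B (1980 Census).
wald_1970 = wald_estimate(-0.00935, -0.1758)
wald_1980 = wald_estimate(-0.01349, -0.1514)

print(f"1970 Census Wald estimate: {wald_1970:.4f}")
print(f"1980 Census Wald estimate: {wald_1980:.4f}")
```

Both differences are negative, so their ratio is positive: men born in the first quarter have slightly less schooling and slightly lower earnings.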
difference between Wald and OLS estimates is the difference in their variances. Ignoring the small sampling error of the OLS estimates, the Wald estimates are within sampling error of the OLS estimates (8% and 7% in the two Census data sets). Thus the instrumental variables estimates seem to support a conclusion that the OLS estimates are not biased by unobserved ability in the error term. (A detailed case for this claim is made in Angrist and Krueger 1991, which also discusses strategies to control for age effects when using quarter of birth as an instrument for schooling.)

The general formula for an IV estimate is $(Z'W)^{-1}Z'y$, where $Z$ is a matrix of instruments conformable to the matrix of regressors, $W$, with rows including $X_1$ and $S$, and $y$ is a vector of observations on the dependent variable. Because $X_1$ is assumed to be uncorrelated with $\varepsilon$, $Z$ typically includes $X_1$ and a single variable not in $X_1$, perhaps taken from $X_2$ in (2). For example, $X_2$ might include quarter of birth. For IV to be consistent, $Z$ must be asymptotically uncorrelated with the regression error and the probability limit of $(Z'W/n)$ must be nonsingular ($n$ is the sample size).

An alternative estimation strategy based on the same idea is TSLS. There are potentially three different IV or Wald estimates that could be computed using quarter-of-birth dummies. A TSLS estimator using all the information available on quarter of birth is calculated by first regressing $S$ (the endogenous regressor) on all covariates included in the equation, $X_1$, and on all the potential instruments excluded from the equation, $X_2$ (in this case, three quarter-of-birth dummies). The second stage in the TSLS procedure is to estimate

$$Y = \gamma_0 + X_1\gamma_1 + \rho\hat{S} + \nu,$$

where $\hat{S}$ is the fitted value from the first-stage regression and $\nu = \{\varepsilon + \rho[S - \hat{S}]\}$. In this case TSLS can be interpreted as an IV estimator where the instruments are $X_1$ and $\hat{S}$ (Theil 1971), or as an efficient linear combination of alternative IV estimates using single quarter-of-birth dummies.

Columns (1)-(2) and (5)-(6) of Table 2 report OLS and TSLS estimates of Equation (1) without covariates using both Census data sets. The excluded instruments used to construct the TSLS estimates in these columns are three quarter-of-birth dummies. That is, the estimates are regressions of the log weekly wage on fitted values from a "first-stage" regression of schooling on a constant and three quarter-of-birth dummies.

Columns (3)-(4) and (7)-(8) report estimates based on equations where the set of covariates, $X_1$, includes nine year-of-birth dummies. The excluded instruments used to construct the TSLS estimates in columns (4) and (8) include three quarter-of-birth dummies interacted with 10 year-of-birth dummies. This specification allows for year-of-birth effects in the outcome equation and allows the relationship between schooling and quarter of birth to differ for each year of birth. The OLS and TSLS estimates are similar and differ little across model specifications.

3. THE AVERAGE CAUSAL EFFECT OF A VARIABLE TREATMENT

Equation (1) is a structural relationship derived from assumptions about human behavior, but it is not necessarily a causal relationship in the Rubin (1974) sense. In this section we present results that give IV and TSLS estimates of Equation (1) a causal interpretation. Suppose that each individual would earn $Y_j$ if he or she had $j$ years of schooling for $j = 0, 1, 2, \ldots, J$. It is useful to imagine that a full set of $Y_j$ exists for each person, even though only one is observed. The set of $Y_j$ for one person is assumed independent of the outcomes and treatment status of other people. Rubin has called independence of these potential or counterfactual outcomes across individuals the stable unit treatment values (SUTVA) assumption. A full set of counterfactual outcomes and SUTVA are commonly assumed in the statistics literature on causality (e.g., Holland 1986), although these are not trivial assumptions. Elsewhere (Angrist et al. 1995), we have discussed conceptual issues associated with these assumptions in an IV context.

Provided that we are willing to entertain the notion of counterfactual outcomes, the objective of causal inference is to uncover information about the distribution of $Y_j - Y_{j-1}$, which is the causal effect of the $j$th year of schooling (Holland 1986; Rubin 1974, 1977). We view estimates of $\rho$ in Equation (1) as having a causal interpretation when they have probability limit equal to a weighted average of $E[Y_j - Y_{j-1}]$ for all $j$ in some subpopulation or subpopulations of interest. In general, this will not be the case. (If the treatment level, $S$, is randomly assigned, then $E[Y_j - Y_{j-1}]$ can be consistently

Table 2. TSLS Estimates of the Return to Schooling

                 Born 1920-1929, 1970 census            Born 1930-1939, 1980 census
                 OLS      TSLS      OLS      TSLS       OLS      TSLS      OLS      TSLS
                 (1)      (2)       (3)      (4)        (5)      (6)       (7)      (8)
Education        .080     .063      .080     .077       .071     .103      .071     .089
                (.0004)  (.016)    (.0004)  (.015)     (.0003)  (.020)    (.0003)  (.016)
YOB dummies      no       no        yes      yes        no       no        yes      yes
χ² (dof)                  2.4 (2)            36.0 (29)           2.9 (2)            25.4 (29)
Sample size      247,199                                329,509

NOTE: Each sample consists of men born in any quarter in the United States who had positive earnings in the year preceding the survey. The table reports coefficients from OLS and TSLS regressions of the log weekly wage on years of completed schooling. The excluded instruments in columns 2 and 6 are three quarter-of-birth dummies. The excluded instruments in columns 4 and 8 are three quarter-of-birth dummies times 10 year-of-birth dummies. The χ² statistic is an instrument-error orthogonality test statistic.

estimated by subtracting the average response for individuals with treatment level $j - 1$ from the average response for individuals with treatment level $j$.)

We define $S_Z \in \{0, 1, 2, \ldots, J\}$ to be the number of years of schooling completed by a student conditional on the student's quarter of birth, $Z$. As with $Y_j$, $S_Z$ is assumed to exist for each value of $Z$ for each person, even though only one $S_Z$ is observed. This setup incorporates two innovations to the original Rubin (1974) framework by allowing for both multivalued treatments and counterfactual treatment status. Rubin (1978) and Robins (1989a, b) also discussed causal inference with multivalued treatments, and Robins (1989a), Robins and Greenland (1992) and Holland (1988) used the notion of counterfactual treatment status. As far as we know, however, these ideas have not been used previously in an IV or TSLS framework.

Initially we assume that $Z$ is coded to take on only two values, 0 and 1, indicating first or later quarters of birth. $S_0$ is the years of schooling that would be attained by an individual born in the first quarter, and $S_1$ is the years of schooling that would be attained by the same individual if he or she were to be born in later quarters. For each person in the Census sample, we observe the triple $(Z, S, Y)$, where $Z$ is the quarter of birth, $S = S_Z = Z \cdot S_1 + (1 - Z) \cdot S_0$ is years of completed schooling, and $Y = Y_S$ is earnings. Our principal identifying assumption (apart from assuming the existence of $Y_j$) is that $Z$ is independent of all potential outcomes and potential treatment intensities. As a practical matter, this assumption may be true only after conditioning on covariates. (In fact, the need to control for covariates sometimes motivates the use of TSLS instead of IV.) Formally, we have the following assumption.

Assumption 1 (Independence). The random variables $S_0, S_1, Y_0, Y_1, \ldots, Y_J$ are jointly independent of $Z$.

In the compulsory schooling example, Assumption 1 requires that quarter of birth has no effect on earnings other than through its effect on schooling. This is the meaning of the notion that quarter of birth provides a natural experiment that can be used to estimate the effect of schooling on earnings. Assumption 1 can also be viewed as a nonparametric version of the assumptions that the instruments, $X_2$, are uncorrelated with $\varepsilon$ and $\eta$, the error terms in Equations (1) and (2). Angrist et al. (1995) discussed the impact of deviations from Assumption 1 on IV estimates in the binary treatment case.

It is important to note that outside a linear regression framework, the independence assumption alone is not usually sufficient to identify a meaningful average treatment effect. This point is easiest to see in a simple example where $S$ is binary (perhaps indicating high school graduates versus nongraduates). A comparison of outcomes by different values of the instrument gives

$$E[Y \mid Z = 1] - E[Y \mid Z = 0] = E[Y_0 + (Y_1 - Y_0)S_1 \mid Z = 1] - E[Y_0 + (Y_1 - Y_0)S_0 \mid Z = 0]. \qquad (3)$$

By Assumption 1, this simplifies to

$$E[Y_0 + (Y_1 - Y_0)S_1] - E[Y_0 + (Y_1 - Y_0)S_0] = E[(Y_1 - Y_0)(S_1 - S_0)]. \qquad (4)$$

Without imposing additional restrictions, $S_1 - S_0$ can be equal to 1, 0, or -1. A value of 1 indicates individuals who are induced to graduate high school by the instrument (i.e., being born in a late quarter), a value of 0 indicates those whose schooling status is unchanged, and a value of -1 indicates those who are induced to drop out of high school before graduating. Therefore,

$$E[(Y_1 - Y_0)(S_1 - S_0)] = E[Y_1 - Y_0 \mid S_1 - S_0 = 1] \cdot \Pr[S_1 - S_0 = 1] - E[Y_1 - Y_0 \mid S_1 - S_0 = -1] \cdot \Pr[S_1 - S_0 = -1]. \qquad (5)$$

Individuals whose schooling status is unaffected by the instrument clearly do not contribute anything to comparisons of average outcomes by instrument status. But the group that does contribute includes both "switchers-in" and "switchers-out." It is theoretically possible to have a situation where the treatment effect, $Y_1 - Y_0$, is positive for everyone but the sizes of the groups of switchers-in and switchers-out are such that the average difference in outcomes is zero or even negative. Suppose, for example, that the treatment effect (effect of graduating high school) equals $\alpha$ for those induced to graduate and $2\alpha$ for those induced to drop out. If $\Pr[S_1 - S_0 = 1] = 2/3$ and $\Pr[S_1 - S_0 = -1] = 1/3$, then the average difference in $Y$ conditional on $Z$ is zero, even though $Y_1 - Y_0 > 0$ for everyone.

The most common way to get around this problem is simply to assume a constant unit treatment effect, $Y_j - Y_{j-1} = \alpha$, for all $j$ and all individuals. This is the assumption underlying econometric applications using linear regression models, as well as the application of instrumental variables techniques by Permutt and Hebel (1989). In his comment on Holland's (1988) discussion of causality, Leamer (1988) pointed out that in linear models with constant treatment effects, the problem of using instrumental variables for causal inference is straightforward.

Instead of restricting treatment effect heterogeneity, in this article we impose a nonparametric restriction on the process determining $S$ as a function of $Z$. This restriction is that either $S_1 - S_0 \ge 0$ or $S_1 - S_0 \le 0$ for everyone. If in the binary treatment example, $S_1 - S_0 \ge 0$, then (5) becomes

$$E[(Y_1 - Y_0)(S_1 - S_0)] = E[Y_1 - Y_0 \mid S_1 - S_0 = 1] \cdot \Pr[S_1 - S_0 = 1]. \qquad (6)$$

The conditional expectation, $E[Y_1 - Y_0 \mid S_1 - S_0 = 1]$, is what we have called a local average treatment effect (LATE; Imbens and Angrist 1994). LATE is the average causal effect of treatment for those whose treatment status is affected by the instrument (i.e., for those for whom $S_1 = 1$ and $S_0 = 0$). Formally, the following monotonicity condition is sufficient (given independence) for LATE to be identified.
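The binary-treatment algebra in Equations (4)-(6) can be checked with exact arithmetic. The Python sketch below is illustrative only: the table of types, the scale alpha = 1, and the function names are assumptions introduced here, not part of the original analysis. It reproduces the switchers example (a 2/3 share of switchers-in with effect alpha and a 1/3 share of switchers-out with effect 2*alpha) and then shows that, once switchers-out are ruled out, the ratio of the outcome difference to the treatment difference equals the average effect for those induced into treatment.

```python
from fractions import Fraction as F

# Each hypothetical type is (population share, S0, S1, unit effect Y1 - Y0).
# Shares and effects are illustrative; alpha = 1 is an arbitrary scale.
alpha = F(1)

def reduced_form(types):
    """E[(Y1 - Y0)(S1 - S0)], the right-hand side of Equation (4)."""
    return sum(p * effect * (s1 - s0) for p, s0, s1, effect in types)

def first_stage(types):
    """E[S1 - S0], the difference in mean treatment by instrument status."""
    return sum(p * (s1 - s0) for p, s0, s1, _ in types)

# Switchers-in (S0 = 0, S1 = 1) and switchers-out (S0 = 1, S1 = 0).
with_switchers_out = [
    (F(2, 3), 0, 1, alpha),      # induced to graduate, effect alpha
    (F(1, 3), 1, 0, 2 * alpha),  # induced to drop out, effect 2 * alpha
]
# Zero, although every individual effect is positive.
print(reduced_form(with_switchers_out))

# No switchers-out: the remaining third is unaffected (S0 = S1).
monotone = [
    (F(2, 3), 0, 1, alpha),
    (F(1, 3), 1, 1, 2 * alpha),
]
late = reduced_form(monotone) / first_stage(monotone)
# Average effect among those with S1 - S0 = 1.
print(late)
```

Exact fractions are used so that the cancellation in the first case is not obscured by floating-point rounding.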

Assumption 2 (Monotonicity). With probability 1, either $S_1 - S_0 \ge 0$ or $S_1 - S_0 \le 0$ for each person.

This assumption has been discussed previously by Robins (1989a), who showed that it does not sharpen bounds for population-average treatment effects, and by Permutt and Hebel (1989) in the context of a regression model for the effect of maternal smoking on birth weight. In the smoking example, monotonicity means that a randomized antismoking intervention never increases smoking. In the schooling example, monotonicity means that, because of compulsory attendance laws, people born in quarters 2-4 complete at least as much schooling as they would have completed had they been born in the first quarter.

Assumption 2 is not verifiable, because it involves unobserved variables (i.e., only one of $S_1$ or $S_0$ is observed). Nevertheless, for multivalued treatments ($J > 1$), Assumption 2 has the testable implication that the cumulative distribution function (CDF) of $S$ given $Z = 1$ and the CDF of $S$ given $Z = 0$ should not cross, because if $S_1 \ge S_0$ with probability 1, then $\Pr(S_1 \ge j) \ge \Pr(S_0 \ge j)$ for all $j$. This implies $\Pr(S \ge j \mid Z = 1) \ge \Pr(S \ge j \mid Z = 0)$ or $F_S(j \mid Z = 0) \ge F_S(j \mid Z = 1)$, where $F_S$ is the CDF of $S$. We investigate this implication in the schooling example that follows. If $J = 1$ and the treatment is binary, then the CDFs cannot cross.

The main theoretical result of the article is now given for the case where $S_1 - S_0 \ge 0$.

Theorem 1. Suppose that Assumptions 1 and 2 hold and that $\Pr(S_1 \ge j > S_0) > 0$ for at least one $j$. Then

$$\frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}{E[S \mid Z = 1] - E[S \mid Z = 0]} = \sum_{j=1}^{J} \omega_j \cdot E[Y_j - Y_{j-1} \mid S_1 \ge j > S_0] \equiv \beta, \qquad (7)$$

where

$$\omega_j = \frac{\Pr(S_1 \ge j > S_0)}{\sum_{i=1}^{J} \Pr(S_1 \ge i > S_0)}.$$

This implies that $0 \le \omega_j \le 1$ and $\sum_{j=1}^{J} \omega_j = 1$, so that $\beta$ is a weighted average per-unit treatment effect.

Proof of the theorem is given in the Appendix. The proof follows the same lines as the development from Equations (3)-(6) and generalizes our earlier result (Imbens and Angrist 1994) to models with variable treatment intensity. A simplified example with three treatment intensities helps to understand the more general result. Suppose that $S$ can be equal to 0, 1, or 2. We can write

$$Y = Y_0 + (Y_1 - Y_0) \cdot I[S \ge 1] + (Y_2 - Y_1) \cdot I[S \ge 2],$$

where $I[A]$ is the indicator function for event $A$. A version of Equation (4) for this case is

$$E[Y \mid Z = 1] - E[Y \mid Z = 0] = E[(Y_1 - Y_0)(I[S_1 \ge 1] - I[S_0 \ge 1])] + E[(Y_2 - Y_1)(I[S_1 \ge 2] - I[S_0 \ge 2])].$$

By virtue of Assumption 2, $I[S_1 \ge 1] - I[S_0 \ge 1]$ and $I[S_1 \ge 2] - I[S_0 \ge 2]$ must both be either 1 or 0. Therefore, $\Pr\{I[S_1 \ge 1] - I[S_0 \ge 1] = 1\} = \Pr(S_1 \ge 1 > S_0)$ and $\Pr\{I[S_1 \ge 2] - I[S_0 \ge 2] = 1\} = \Pr(S_1 \ge 2 > S_0)$.

The requirement that $\Pr(S_1 \ge j > S_0) > 0$ for some $j$ means that the instrument must affect the level of treatment, $S$. Also, note that in the proof of Theorem 1, $S$ is assumed to take on only integer values between 0 and $J$. It is enough, however, that $S$ be bounded and take on a finite number of rational values. Then one can always use a linear transformation to ensure that $S$ takes on integer values only between 0 and $J$. A linear transformation of $S$ does not have any effect on the numerator of (7) and multiplies the denominator by a constant. Thus the linear transformation amounts to changing the units in which treatment intensity is measured.

Theorem 1 is important because it shows that in a wide variety of models and circumstances, it is possible to identify features of the distribution of $Y_j - Y_{j-1}$. For example, the monotonicity assumption appears plausible in research designs based on the draft lottery (e.g., Angrist 1990) and in designs based on randomly assigned encouragement or intention-to-treat such as discussed by Powers and Swinton (1984) and Holland (1988). This assumption is also mechanically satisfied in the latent index models commonly used in econometrics (Imbens and Angrist 1994).

We refer to the parameter $\beta$ as the average causal response (ACR). This parameter captures a weighted average of causal responses to a unit change in treatment, for those whose treatment status is affected by the instrument. The weight attached to the average of $Y_j - Y_{j-1}$ is proportional to the number of people who, because of the instrument, change their treatment from less than $j$ units to $j$ or more units. This proportion is $\Pr(S_1 \ge j > S_0)$. In the schooling example, this is the proportion of people who, by accident of birth, are induced to complete additional years or fractional years of schooling. Note that this group need not be representative of the population, and that the members of this group cannot be identified from the data because membership involves unobserved counterfactual treatment status.

A referee made the point that although the ACR is a weighted average, it averages together components that are potentially overlapping. For example, someone who is induced to graduate high school by having been born in a late quarter, but would have completed only 11th grade had he or she been born in the first quarter, contributes to the population of individuals for whom $S_1 \ge 12 > S_0$. But anyone who is induced to graduate high school, but would have otherwise completed only 10th grade, will be in the population of individuals for whom $S_1 \ge 12 > S_0$ and for whom $S_1 \ge 11 > S_0$.

Similarly, suppose that the instrument induces some fraction of the sample to go from 10 to 12 units of treatment but has no effect otherwise. Then the ACR can be written as the sum of two single-unit average effects,

$$\tfrac{1}{2} E[Y_{12} - Y_{11} \mid S_1 \ge 12 > S_0] + \tfrac{1}{2} E[Y_{11} - Y_{10} \mid S_1 \ge 11 > S_0],$$

although what really is identified is $E[Y_{12} - Y_{10} \mid S_1 \ge 12, 11 > S_0]$. In the schooling and other examples, however, most individuals would probably not be involved in an overlap of this kind, because the instrument would typically be expected to cause no more than a one-unit increment in treatment intensity for any particular individual.

3.1 Incorrectly Coded Binary Treatments

Theorem 1 has a simple corollary that can be used to interpret parameter estimates in models where a variable treatment is incorrectly parameterized as a binary treatment. For example, Permutt and Hebel (1989) discussed conditions sufficient to identify the effect of smoking when it is assumed that all that matters for health is whether any cigarettes are smoked. Similarly, econometricians sometimes estimate the effect of college and/or high school graduation on earnings, ignoring the fact that dummy variables indicating graduation are nonlinear functions of an underlying years-of-schooling variable (e.g., Rosen and Willis 1979).

Corollary (Misspecified binary treatment). Suppose that the treatment of interest is assumed to be an indicator function of $S$, say $d \equiv I(S \ge l)$, for some $1 \le l \le J$. Then

$$\frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}{E[d \mid Z = 1] - E[d \mid Z = 0]} = \phi \cdot \beta,$$

where

$$\phi = \frac{\sum_{j=1}^{J} \Pr(S_1 \ge j > S_0)}{\Pr(S_1 \ge l > S_0)} \ge 1.$$

Note that the only situation where $\phi = 1$ is when the instrument has no effect other than to cause people to switch from $S = l - 1$ to $S = l$. Thus when a variable treatment is incorrectly parameterized as binary, the resulting estimate tends to be too large relative to the average per-unit effect along the length of the response function. On the other hand, by virtue of monotonicity, the sign of the ACR is still consistently estimated.

3.2 Estimation of the ACR and the Weighting Function

A natural estimator of $\beta$ is its sample analog. This estimator is an application of Wald's (1940) grouping method of fitting straight lines, where the data have been grouped by the instrument. Durbin (1954) appears to have been the first to point out that the Wald estimator is also an instrumental variables estimator.

The ACR weights in Theorem 1 can be estimated using a random sample of $(Y, S, Z)$ because

$$\Pr(S_1 \ge j > S_0) = \Pr(S_1 \ge j) - \Pr(S_0 \ge j) = \Pr(S_0 < j) - \Pr(S_1 < j) = \Pr(S < j \mid Z = 0) - \Pr(S < j \mid Z = 1).$$

Thus the weighting function can be consistently estimated from the difference between the empirical CDFs of $S$ given $Z$.

4. MULTIPLE INSTRUMENTS AND MODELS WITH COVARIATES

Because different instruments are associated with different weighting schemes in the definition of the ACR, the discussion in the previous section provides an explanation of why estimates of $\beta$ constructed using different instruments might differ. The typical econometric application of TSLS, however, imposes a constant-treatment-effect model in which $Y_j - Y_{j-1} = \alpha$ for all $j$ and all individuals. In this case, alternative instrumental variables estimates of the same $\alpha$ can be combined into a single, more efficient estimate using TSLS. What does the TSLS estimator, which combines alternative instrumental variables estimates, produce when it is applied to the heterogeneous-treatment-effects model outlined in Section 3?

We explore this question for the case where $K$ mutually orthogonal binary instruments are combined to form a single TSLS estimate. This is a fairly general example, because any [...]

Theorem 2 shows that the TSLS estimator constructed by using a constant plus $K$ linearly independent dummy variables, $d_k = I(Z = k)$, as instruments has probability limit equal to a weighted average of $K$ linearly independent ACRs, $\beta_{k,k-1}$, where

$$\beta_{k,k-1} = \frac{E[Y \mid Z = k] - E[Y \mid Z = k-1]}{E[S \mid Z = k] - E[S \mid Z = k-1]}.$$

Because each $\beta_{k,k-1}$ is a weighted average of points on the causal response function, the TSLS estimate also converges to a weighted average of points on the causal response function.

Let the points of support of $Z$ be ordered such that $l < m$ implies $E[S \mid Z = l] < E[S \mid Z = m]$. Note that using $K$ dummies, $d_k = I(Z = k)$, plus a constant in TSLS estimation is the same as instrumental variables estimation using $E[S \mid Z]$ plus a constant as instruments. We then have the following theorem.

Theorem 2. Suppose that $E[S \mid Z]$ and a constant are used to construct instrumental variables estimates of $\beta_Z$ in the equation

$$Y = \gamma + \beta_Z S + \varepsilon.$$

The resulting estimate has probability limit

$$\beta_Z = \sum_{k=1}^{K} \mu_k \, \beta_{k,k-1},$$

where

$$\mu_k = (E[S \mid Z = k] - E[S \mid Z = k-1]) \cdot \frac{\sum_{l=k}^{K} \pi_l (E[S \mid Z = l] - E[S])}{\sum_{l=0}^{K} \pi_l E[S \mid Z = l](E[S \mid Z = l] - E[S])}$$

and $\pi_l = \Pr[Z = l]$. Moreover, $0 \le \mu_k \le 1$ and $\sum_{k=1}^{K} \mu_k = 1$.

This result modifies and generalizes a previous result of ours (Imbens and Angrist 1994) for binary treatments. Again, details of the proof are in the Appendix. The theorem follows partly from standard formulas interpreting TSLS using mutually orthogonal instruments as a weighted average of each of the instrumental variables estimates obtained taking the instruments one by one. Moreover, when the instruments are mutually exclusive dummy variables, TSLS can be written as a linear combination of linearly independent Wald estimates (Angrist 1991). The proof essentially combines these two results.

Theorem 2 provides a useful interpretation for conventional TSLS estimates. Just as the simple Wald estimator converges to a weighted average effect along the length of the causal response function, TSLS estimates provide one way of combining a set of different weighted average effects into a new weighted average. The weights used to construct TSLS estimates from Wald estimates are proportional to $(E[S \mid Z = k] - E[S \mid Z = k-1])$. Thus the better the Wald estimate, in the sense of being based on an instrument with a bigger impact on the regressor, the more weight it receives in the TSLS linear combination. The second component of the weighting function, $\sum_{l=k}^{K} \pi_l (E[S \mid Z = l] - E[S])$, simplifies to $[E(S \mid Z \ge k) - E(S \mid Z < k)]\,P(Z \ge k)[1 - P(Z \ge k)]$. Thus TSLS gives more weight to Wald estimates that are closer to the center of the distribution of $Z$.

4.1 TSLS Estimates of Models with Covariates

Conditional on discrete covariates such as year of birth, the problem of identifying and estimating the ACR is exactly the same as outlined previously. Therefore, analysis of models with discrete covariates can proceed in subsamples where the covariates are fixed. A more parsimonious approach exploits the fact that instrumental variables estimates of average treatment effects have a useful averaging property in pooled subsamples. In particular, ignoring the fact that the ACR may vary with the covariates leads to a variance-weighted average treatment effect.

The following result formally describes the probability limit of the instrumental variables estimator when we allow for a changing intercept but fix the treatment effect across covariates.

Theorem 3. Let $g(X)$ be a design matrix constructed from indicator variables for each value of $X$. Consider the TSLS estimate computed using $g(X)$ and a full set of interactions between $g(X)$ and $Z$ as instruments for a regression of $Y$ on rows of $g(X)$ and $S$. The resulting estimate is

$$\beta_x = \frac{E\{Y \cdot (E[S \mid X, Z] - E[S \mid X])\}}{E\{S \cdot (E[S \mid X, Z] - E[S \mid X])\}} \tag{8}$$

$$= \frac{E\{\beta(X)\theta(X)\}}{E[\theta(X)]}, \tag{9}$$

where $\theta(X) = E\{E[S \mid X, Z] \cdot (E[S \mid X, Z] - E[S \mid X]) \mid X\}$ and

$$\beta(X) = \frac{E\{Y \cdot (E[S \mid X, Z] - E[S \mid X]) \mid X\}}{E\{S \cdot (E[S \mid X, Z] - E[S \mid X]) \mid X\}}. \tag{10}$$

Proof: Equation (8) is immediate from the definition of TSLS using dummy variable instruments. The weighting formulas (9) and (10) can be established by iterating expectations and using the definition of $\beta_z$ from Theorem 2.

Note that $\beta(X)$ is the TSLS estimate, $\beta_z$, constructed using $Z$ as an instrument in a population where $X$ is fixed. Thus Theorem 3 says that the TSLS estimate of a single treatment effect in a model with dummy variable covariates is a weighted average of the TSLS estimates conditional on the covariates. The weights consist of the variance of $E[S \mid X, Z]$ conditional on the covariates.

4.2 Inference

In another paper (Imbens and Angrist 1994), we discussed results on the asymptotic variance of IV and TSLS estimates in models with binary treatments. These results apply to the case discussed here, and they imply that standard errors of the ACR can be calculated using formulas of Huber (1967) and White (1982). One reason for reporting TSLS estimates as well as Wald estimates is that in models with constant treatment effects, the TSLS estimates have asymptotically lower sampling variance than any single Wald estimate. In general, however, this need not be true if there is variation in the average causal response across instruments. Nevertheless, TSLS provides a convenient way to combine alternative IV estimates in a single statistic.

TSLS estimators are also associated with an overidentification test statistic that equals the objective function implicitly minimized by the estimates (Newey 1985). In a constant-treatment-effect model estimated by TSLS, the statistic provides an overidentification test for the null hypothesis that all the instruments are orthogonal to the regression error term. The constant treatment effect is overidentified because any single instrument would be sufficient for identification. But in the model outlined here, each instrument can lead to a different estimate even though all the instruments satisfy the independence assumption. In fact, Theorems 1, 2, and 3 provide possible explanations for why estimates of causal effects such as the economic returns to schooling may differ in studies using different samples or in a single sample with different instruments and covariates. For example, a recent study using instrumental variables to estimate the returns to schooling in a sample of twins (Ashenfelter and Krueger 1994) leads to estimated coefficients roughly double those reported here and by Angrist and Krueger (1991).

5. IV ESTIMATES OF THE RETURNS TO SCHOOLING: FOR WHOM?

Angrist and Krueger (1991) used linear regression models with constant coefficients to interpret estimates of the return to schooling based on quarter of birth. In the context of the causal model outlined here, however, the Wald estimates in

438 Journal of the American Statistical Association, June 1995

Figure 1. Schooling CDF by Quarter of Birth (Men Born 1920-1929; Data From the 1970 Census). Quarter of birth: solid line, first; dashed line, fourth. Vertical axis: CDF; horizontal axis: years of schooling.

Table 1 should be interpreted as the average effect of a 1-year increase in schooling, for people whose schooling is influenced by quarter of birth. This is a small group, not necessarily representative of the entire population. To identify the ACR for this group, the monotonicity condition requires that men born in the fourth quarter get at least as much schooling as they would have if they had been born in the first quarter. If this condition is satisfied, then we can get some idea of the size and characteristics of the group contributing to the ACR through the ACR weighting function.

The CDF's of schooling by quarter of birth for men in the 1970 and 1980 Censuses are graphed in Figures 1 and 2. Both figures show that the CDF for men born in the fourth quarter lies below the CDF for men born in the first quarter. This is important evidence in favor of the monotonicity assumption in this example. The weighting function underlying estimates of the ACR in Table 2 is proportional to the difference between the CDF of schooling for men born in the first quarter and the CDF of schooling for men born in the fourth quarter. For each level of schooling, j, this difference is the fraction of the population whose schooling is switched by quarter of birth from less than j years to at least j years.

Figures 3 and 4 show differences in the CDF of schooling by quarter of birth. In each figure, differences between the

Figure 2. Schooling CDF by Quarter of Birth (Men Born 1930-1939; Data From the 1980 Census). Quarter of birth: solid line, first; dashed line, fourth. Vertical axis: CDF; horizontal axis: years of schooling.

Angrist and Imbens: Estimation of Average Causal Effects 439

Figure 3. First-Fourth Quarter Difference in Schooling CDF (Men Born 1920-1929; Data From the 1970 Census). Dotted lines are 95% confidence intervals. Horizontal axis: years of schooling.
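The curve in Figure 3 is the estimated ACR weighting function before normalization, together with a pointwise band. A minimal sketch of that calculation follows, assuming NumPy and hypothetical names (`acr_weights`, `s`, `z`, `levels`, and the toy arrays) that do not appear in the original. The band uses the conventional standard error for a difference in proportions with a normal critical value of 1.96; this is an illustration under those assumptions, not the code used to produce the figure.

```python
import numpy as np


def acr_weights(s, z, levels):
    """Estimate Pr(S1 >= j > S0) at each j in `levels` from CDF differences.

    s : treatment intensity (e.g., years of schooling)
    z : binary instrument coded 0/1 (hypothetical coding: 1 = fourth quarter)
    Returns (diff, half_width, weights), where diff[j] estimates
    Pr(S < j | Z = 0) - Pr(S < j | Z = 1), half_width is the 95% pointwise
    band half-width, and weights is diff normalized to sum to 1.
    """
    s = np.asarray(s, dtype=float)
    z = np.asarray(z)
    s0, s1 = s[z == 0], s[z == 1]
    n0, n1 = s0.size, s1.size

    # Empirical Pr(S < j | Z = 0) and Pr(S < j | Z = 1) at each level j.
    f0 = np.array([(s0 < j).mean() for j in levels])
    f1 = np.array([(s1 < j).mean() for j in levels])

    diff = f0 - f1
    # Conventional standard error for a difference in two proportions.
    se = np.sqrt(f0 * (1 - f0) / n0 + f1 * (1 - f1) / n1)
    half_width = 1.96 * se
    weights = diff / diff.sum()
    return diff, half_width, weights


# Toy data (purely illustrative): four units in each instrument group.
s_demo = np.array([0, 1, 1, 2, 1, 1, 2, 2])
z_demo = np.array([0, 0, 0, 0, 1, 1, 1, 1])
diff, half_width, weights = acr_weights(s_demo, z_demo, levels=[1, 2])
```

When `levels` covers every value from 1 to the maximum treatment intensity, the unnormalized differences sum to the difference in mean treatment between the two instrument groups, which is the denominator of the Wald estimate.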

CDF of schooling for men born in the first and fourth quarters are plotted, along with 95% pointwise confidence bands (calculated using the conventional formula for a difference in proportions). ACR weighting functions for estimates based on comparisons between first- and fourth-quarter births are the CDF differences plotted in the figures, normalized to sum to 1.

The figures show that the groups contributing most to estimates of the ACR based on quarter of birth are those with 8-12 years of schooling. Both figures show declines in the weighting function at around 12 years of schooling. A maximum of around 2% of the sample was induced by being born in the fourth quarter to complete 11th grade, but much smaller fractions were induced to complete higher grades. This is not surprising, because compulsory attendance laws affect mainly high school students and cannot compel students to go to college. Note that some weight is contributed by college attenders, perhaps because some students forced

Figure 4. First-Fourth Quarter Difference in Schooling CDF (Men Born 1930-1939; Data From the 1980 Census). Dotted lines are 95% confidence intervals. Horizontal axis: years of schooling.

by accident of birth to graduate high school decided later to go on to college after all.

One interesting feature of Figures 3 and 4 is that Figure 3, for men born in 1920-1929, shows a much sharper drop at 12 years of schooling than does Figure 4, for men born in 1930-1939. Therefore, men who ended up completing some college because they were forced to graduate high school contribute more to the estimates for men born in 1930-1939 than to the estimates for men born in 1920-1929. This difference may explain the higher Wald and TSLS estimates for men born in 1930-1939 (despite the fact that OLS estimates for the more recent cohort are lower), because the returns to the last year of college tend to be substantially higher than those for any single year of high school (Card and Krueger 1992).

Figures 5 and 6 plot the contrast between schooling CDF's for birth quarters 1-3 relative to fourth-quarter births. The figures show that schooling CDF's are essentially ordered by quarter of birth. This is evidence that any adjacent pair of quarters can be used to define a binary instrumental variable that satisfies the monotonicity assumption. TSLS using three quarter-of-birth dummies is a weighted average of the three

Figure 5. Differences in Schooling CDF by Quarter of Birth (Men Born 1920-1929; Data From the 1970 Census). Quarter of birth: solid line, 1st-4th; - - -, 2nd-4th; - -, 3rd-4th. Horizontal axis: years of schooling.

Figure 6. Differences in Schooling CDF by Quarter of Birth (Men Born 1930-1939; Data From the 1980 Census). Quarter of birth: solid line, 1st-4th; - - -, 2nd-4th; - -, 3rd-4th. Horizontal axis: years of schooling.

possible Wald estimates based on adjacent quarters of birth. The TSLS estimates, reported in Table 2, are .063 in the 1970 Census and .103 in the 1980 Census. These are estimated with slightly greater precision than the Wald estimates reported in Table 1.

Estimates of models including year of birth dummies are reported in columns (3)-(4) and (7)-(8) of Table 2. The instrument list for these models includes a set of three quarter-of-birth dummies for each year of birth. In the context of Theorem 3, the TSLS estimates in columns (4) and (8) can be interpreted as a weighted average of separate TSLS estimates of the ACR for each year of birth.

The TSLS overidentification test statistics for each of the models reported in Table 2 are far from critical values at conventional significance levels under the null hypothesis of constant treatment effects and instrument error orthogonality. Thus the test statistics cast little doubt on the constant-treatment-effect and independence assumptions.

6. SUMMARY AND CONCLUSIONS

This article defines the average causal response to variable treatments such as drug dosage, cigarettes smoked, hours of study, and years of schooling. We have shown here that a weighted average of per-unit causal responses to a change in treatment intensity is identified in a wide variety of models and circumstances. The average response that we can identify is for individuals whose treatment status is affected by an instrumental variable that is independent of potential outcomes and potential treatment intensities. The monotonicity condition imposed when deriving this result requires only that the instrumental variable affect treatment intensity in the same direction for each unit of observation. This condition has testable implications in models with variable treatment intensities.

We have presented formulas for the weighting functions that underlie IV and TSLS estimates of average causal effects. These formulas can help empirical researchers understand which observations are contributing to a particular estimate and provide a causal interpretation for some of the simple estimators commonly used in applied research. The interpretation of TSLS and the example presented here serve to emphasize a point made earlier by Rubin (1986), that observational data can only be informative about the causal effect of treatment for those whose treatment status can be thought of as having been manipulated in some way. This paper shows that the estimated treatment effect may change when the nature of this manipulation changes.

APPENDIX: PROOFS

Proof of Theorem 1

Let $I(A)$ be the indicator function for the event $A$. Define the following indicators: $\lambda_{Zj} = I(S_Z \ge j)$ for $Z = 0, 1$ and $j = 0, 1, 2, \ldots, J + 1$. Note that $\lambda_{Z0} = 1$ and $\lambda_{Z,J+1} = 0$ for all $Z$. In terms of the $\lambda_{Zj}$, $Y$ can be written as

$$Y = Z \cdot Y_{S_1} + (1 - Z) \cdot Y_{S_0} = Z \cdot \sum_{j=0}^{J} Y_j (\lambda_{1j} - \lambda_{1,j+1}) + (1 - Z) \cdot \sum_{j=0}^{J} Y_j (\lambda_{0j} - \lambda_{0,j+1}).$$

Using the independence assumption, $E[Y \mid Z = 1] - E[Y \mid Z = 0]$ is, therefore,

$$E\Big\{\sum_{j=0}^{J} Y_j \cdot [\lambda_{1j} - \lambda_{1,j+1} - \lambda_{0j} + \lambda_{0,j+1}]\Big\} = E\Big\{\sum_{j=1}^{J} (Y_j - Y_{j-1})(\lambda_{1j} - \lambda_{0j}) + Y_0 (\lambda_{10} - \lambda_{00})\Big\} = E\Big\{\sum_{j=1}^{J} (Y_j - Y_{j-1})(\lambda_{1j} - \lambda_{0j})\Big\},$$

because $\lambda_{Z0} = 1$ for $Z = 0, 1$. Note that $\lambda_{1j} \ge \lambda_{0j}$ by Assumption 2 and that $\lambda_{1j}$ and $\lambda_{0j}$ equal 0 or 1. Therefore, $\lambda_{1j} - \lambda_{0j}$ equals 0 or 1, and we can write the previous expression as

$$\sum_{j=1}^{J} E[Y_j - Y_{j-1} \mid \lambda_{1j} - \lambda_{0j} = 1] \cdot \Pr(\lambda_{1j} - \lambda_{0j} = 1) = \sum_{j=1}^{J} E[Y_j - Y_{j-1} \mid S_1 \ge j > S_0] \cdot \Pr(S_1 \ge j > S_0).$$

Similarly, for the denominator, $S = Z \cdot S_1 + (1 - Z) \cdot S_0$ and, because $j$ plays the role played by $Y_j$ in the numerator,

$$E[S \mid Z = 1] - E[S \mid Z = 0] = E\Big\{\sum_{j=0}^{J} j\,(\lambda_{1j} - \lambda_{1,j+1} - \lambda_{0j} + \lambda_{0,j+1})\Big\} = E\Big\{\sum_{j=1}^{J} (\lambda_{1j} - \lambda_{0j})\Big\} = \sum_{j=1}^{J} \Pr(S_1 \ge j > S_0).$$

Proof of Theorem 2

The denominator of the formula for $\mu_k$ is the same as the denominator of the expression for $\beta_z$. To evaluate the numerator, we can write

$$E[Y \mid Z = l] = \beta_{l,l-1}(E[S \mid Z = l] - E[S \mid Z = l-1]) + E[Y \mid Z = l-1] = \sum_{k=1}^{l} \beta_{k,k-1}(E[S \mid Z = k] - E[S \mid Z = k-1]) + E[Y \mid Z = 0]$$

and

$$E\{Y \cdot (E[S \mid Z] - E[S])\} = E\{E[Y \mid Z = l] \cdot (E[S \mid Z = l] - E[S])\}.$$

Using the first line to substitute for $E[Y \mid Z = l]$ in $E\{Y \cdot (E[S \mid Z] - E[S])\}$, we have (the $E[Y \mid Z = 0]$ term drops out because $\sum_{l} \pi_l (E[S \mid Z = l] - E[S]) = 0$)

$$\sum_{l=1}^{K} \sum_{k=1}^{l} \pi_l (E[S \mid Z = l] - E[S])\,\beta_{k,k-1}\,(E[S \mid Z = k] - E[S \mid Z = k-1]) = \sum_{k=1}^{K} \sum_{l=k}^{K} \pi_l (E[S \mid Z = l] - E[S])\,\beta_{k,k-1}\,(E[S \mid Z = k] - E[S \mid Z = k-1]).$$

This establishes the right side of the formula for the weights, $\mu_k$. The weights are nonnegative, because the points of support of $Z$ are ordered so that $E[S \mid Z = k] > E[S \mid Z = k-1]$. To show that the weights sum to 1, note that the sum of the numerator of each $\mu_k$ is

$$\sum_{k=1}^{K} \sum_{l=k}^{K} \pi_l (E[S \mid Z = l] - E[S])(E[S \mid Z = k] - E[S \mid Z = k-1]).$$

Reversing the order of summation as before, this equals

$$\sum_{l=1}^{K} \sum_{k=1}^{l} \pi_l (E[S \mid Z = l] - E[S])(E[S \mid Z = k] - E[S \mid Z = k-1]).$$

Reversing the first two steps of the proof for the numerator, this is

$$\sum_{l=0}^{K} \pi_l (E[S \mid Z = l] - E[S])\,E[S \mid Z = l].$$

[Received August 1992. Revised March 1994.]

REFERENCES

Angrist, J. (1990), "Lifetime Earnings and the Vietnam-Era Draft Lottery: Evidence from Social Security Administrative Records," American Economic Review, 80, 313-335.
Angrist, J. (1991), "Grouped-Data Estimation and Testing in Simple Labor Supply Models," Journal of Econometrics, 47, 243-266.
Angrist, J., Imbens, G., and Rubin, D. (1995), "Identification of Causal Effects Using Instrumental Variables," Journal of the American Statistical Association, forthcoming.
Angrist, J., and Krueger, A. (1991), "Does Compulsory School Attendance Affect Schooling and Earnings?," Quarterly Journal of Economics, 106, 979-1014.
Angrist, J., and Krueger, A. (1992), "The Effect of Age at School Entry on Educational Attainment: An Application of Instrumental Variables with Moments From Two Samples," Journal of the American Statistical Association, 87, 328-336.
Ashenfelter, O., and Krueger, A. (1994), "Estimates of the Economic Return to Schooling from a New Sample of Twins," American Economic Review, 84, 1157-1173.
Becker, G. (1964), Human Capital, Chicago: University of Chicago Press.
Card, D., and Krueger, A. (1992), "Does School Quality Matter? Returns to Education and the Characteristics of Public Schools in the United States," Journal of Political Economy, 100, 1-40.
Durbin, J. (1954), "Errors in Variables," Review of the International Statistical Institute, 22, 23-32.
Hearst, N., Newman, T., and Hulley, S. (1986), "Delayed Effects of the Military Draft on Mortality: A Randomized Natural Experiment," New England Journal of Medicine, 314, 620-624.
Heckman, J., and Hotz, V. J. (1989), "Choosing Among Alternative Nonexperimental Methods for Estimating the Impact of Social Programs: The Case of Manpower Training," Journal of the American Statistical Association, 84, 862-880.
Heckman, J., and Robb, R. (1985), "Alternative Methods for Evaluating the Impact of Interventions," in Longitudinal Analysis of Labor Market Data, eds. J. Heckman and B. Singer, New York: Cambridge University Press, pp. 145-245.
Holland, P. (1986), "Statistics and Causal Inference," Journal of the American Statistical Association, 81, 945-970.
Holland, P. (1988), "Causal Inference, Path Analysis, and Recursive Structural Equations Models," in Sociological Methodology, ed. Clifford C. Clogg, Washington: American Sociological Association, pp. 449-484.
Huber, P. J. (1967), "The Behavior of Maximum Likelihood Estimates Under Nonstandard Conditions," in Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, 1, pp. 221-233.
Imbens, G., and Angrist, J. (1994), "Identification and Estimation of Local Average Treatment Effects," Econometrica, 62, 467-476.
Leamer, E. E. (1988), "Discussion," in Sociological Methodology 1988 (Vol. 18), Washington: American Sociological Association, Chapter 14.
Newey, W. (1985), "Generalized Method-of-Moments Estimation and Testing," Journal of Econometrics, 29, 229-256.
Newey, W. (1990), "Efficient Instrumental Variables Estimation of Nonlinear Models," Econometrica, 58, 809-838.
Permutt, T., and Hebel, J. (1989), "Simultaneous-Equation Estimation in a Clinical Trial of the Effect of Smoking on Birth Weight," Biometrics, 45, 619-622.
Powers, D. E., and Swinton, S. S. (1984), "Effects of Self-Study for Coachable Test Item Types," Journal of Educational Psychology, 76, 266-278.
Robins, J. M. (1989a), "The Analysis of Randomized and Non-Randomized AIDS Treatment Trials Using a New Approach to Causal Inference in Longitudinal Studies," in Health Service Research Methodology: A Focus on AIDS, NCHSR, U.S. Public Health Service, eds. L. Sechrest, H. Freeman, and A. Bailey, pp. 113-159.
Robins, J. M. (1989b), "The Control of Confounding by Intermediate Variables," Statistics in Medicine, 8, 679-701.
Robins, J. M., and Greenland, S. (1992), "Identifiability and Exchangeability for Direct and Indirect Effects," Epidemiology, 3, 143-155.
Robins, J. M., and Tsiatis, A. (1991), "Correcting for Non-Compliance in Randomized Trials Using Rank-Preserving Structural Failure Time Models," Communications in Statistics Part A-Theory and Methods, 20, 2609-2631.
Rosen, S., and Willis, R. J. (1979), "Education and Self-Selection," Journal of Political Economy, 87, S7-S36.
Rubin, D. (1974), "Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies," Journal of Educational Psychology, 66, 688-701.
Rubin, D. (1977), "Assignment to a Treatment Group on the Basis of a Covariate," Journal of Educational Statistics, 2, 1-26.
Rubin, D. (1978), "Bayesian Inference for Causal Effects: The Role of Randomization," The Annals of Statistics, 6, 34-58.
Rubin, D. (1986), Comment on "Statistics and Causal Inference" by P. Holland, Journal of the American Statistical Association, 81, 945-970.
Rubin, D. (1990), "Comment: Neyman (1923) and Causal Inference in Experiments and Observational Studies," Statistical Science, 5, 472-480.
Theil, H. (1958), Economic Forecasts and Policy, Amsterdam: North-Holland.
Theil, H. (1971), Principles of Econometrics, New York: John Wiley.
Wald, A. (1940), "The Fitting of Straight Lines if Both Variables are Subject to Error," Annals of Mathematical Statistics, 11, 284-300.
White, H. (1982), "Instrumental Variables Estimation With Independent Observations," Econometrica, 50, 482-499.
Wright, S. (1928), Appendix to The Tariff on Animal and Vegetable Oils, by P. G. Wright, New York: MacMillan.
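Numerical illustration of Theorem 2. The decomposition of TSLS into a weighted average of adjacent Wald estimates can be checked on simulated data. The sketch below assumes NumPy; the function name `tsls_as_weighted_wald`, the simulated variables, and the data-generating process are hypothetical and appear nowhere in the article. It uses sample analogs of $\pi_l$, $E[S \mid Z = l]$, and $E[Y \mid Z = l]$, and it assumes that the cell means of $S$ are all distinct.

```python
import numpy as np


def tsls_as_weighted_wald(y, s, z):
    """Return (wald, mu, tsls) for a discrete instrument z.

    wald[k-1] is the adjacent-cell Wald estimate beta_{k,k-1}, mu[k-1] is the
    sample analog of the Theorem 2 weight mu_k, and tsls is the IV estimate
    that uses the cell means E[S | Z] (plus a constant) as instruments.
    Assumes the cell means of s are all distinct.
    """
    y = np.asarray(y, dtype=float)
    s = np.asarray(s, dtype=float)
    _, inv = np.unique(np.asarray(z), return_inverse=True)
    inv = inv.ravel()
    counts = np.bincount(inv)
    pi = counts / counts.sum()                  # pi_l = Pr[Z = l]
    es = np.bincount(inv, weights=s) / counts   # E[S | Z = l]
    ey = np.bincount(inv, weights=y) / counts   # E[Y | Z = l]
    s_bar = s.mean()

    # TSLS with a full set of instrument dummies: instrument S with E[S | Z].
    fitted = es[inv]
    tsls = np.sum(y * (fitted - s_bar)) / np.sum(s * (fitted - s_bar))

    # Order the support of Z so that E[S | Z] is increasing, as in the text.
    order = np.argsort(es)
    pi, es, ey = pi[order], es[order], ey[order]

    wald = np.diff(ey) / np.diff(es)            # beta_{k,k-1}, k = 1..K
    dev = pi * (es - s_bar)                     # pi_l (E[S|Z=l] - E[S])
    tail = np.cumsum(dev[::-1])[::-1][1:]       # sum over l >= k, k = 1..K
    denom = np.sum(pi * es * (es - s_bar))
    mu = np.diff(es) * tail / denom
    return wald, mu, tsls


# Simulated data with a nonlinear response (purely illustrative).
rng = np.random.default_rng(0)
z_sim = rng.integers(0, 4, size=4000)
s_sim = rng.poisson(8 + z_sim).astype(float)
y_sim = 0.05 * s_sim + 0.002 * s_sim**2 + rng.normal(size=z_sim.size)

wald, mu, tsls = tsls_as_weighted_wald(y_sim, s_sim, z_sim)
```

With sample analogs the identity is exact up to floating-point error, so `np.sum(mu * wald)` should agree with `tsls`, and the weights should be nonnegative and sum to 1.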