In Journal of American Statistical Association, Vol. 95, No. 450, 428-431, June 2000. TECHNICAL REPORT With discussions by D. Cox; G. Casella & S. Schwartz; J. Pearl; J. Robins & S. Greenland; R-269 D. Rubin; G. Shafer; L. Wasserman. June 2000 Comment Judea PEARL

1. BACKGROUND ships among observablevariables remain invariantwhen The fieldof statisticshas seenmany well-meaning cru- the values of those variableschange relative to our imme- sades againstthreats from metaphysics and otherheresy. diate observations.For example,Ohm's law (V = IR) as- In its foundingprospectus of 1834,the Royal Statistical sertsthat the ratiobetween the current(I) and the voltage Societyresolved "to excludecarefully all Opinionsfrom (V) across a resistorremains constant for all values of I, its transactionsand publications-toconfine its attentionincluding yet-unobserved values of I. We usually express rigorouslyto facts."This clause was officiallystruck out thisclaim in a functionor a hypotheticalsentence: "Had the in 1858,when it becameobvious that facts void of the- currentin theresistor been I (insteadof theobserved value orycould not take very far (Annals of theRoyal Io) the voltage would have been V = I(Vo/Io)," know- StatisticalSociety 1934, p. 16). ing perfectlywell thatthere is no way to simultaneously KarlPearson launched his ownmetaphysics "red scare" measure I and Io. (Every mathematicalfunction is inter- aboutcausality in 1911:"Beyond such discarded fundamen- preted hypothetically, and the studyof counterfactualsis talsas 'matter'and 'force' lies still another fetish amidst the merelya studyof standardmathematical functions.) Such inscrutablearcana of modern science, namely, the category sentencesappear to be counterfactual,because they deal of causeand effect" (Pearson 1911, p. iv). Pearson'sobjec- withunobserved quantities that differ from (and henceseem tionto theoreticalconcepts such as "matter"and "force" to contradict)those actually observed. Nonetheless, this cir- was so fierceand his rejectionof determinismso absolute cumstantialnonobservability and apparentcontradiction do thathe consignedstatistics to almosta centuryof neglect notdiminish whatsoever the ability to submitphysical laws withinthe study of causalinference. Philip Dawid was one to empiricaltests. Scientific methods thrive on attemptsto of a handfulof statisticianswho boldly protested the stale- confirmor falsifythe predictionsof such laws. mateover causality: " is one of themost The same applies to stochastic processes (or data- important,most subtle, and most neglected of all theprob- generationmodels), usually writtenin the formof func- lemsof statistics"(Dawid 1979). tionalrelations y = f(x, U), whereX and U standfor two In thepast two decades,owing largely to progressin sets of randomvariables, with joint distributionP(x, u), counterfactual,graphical, and structuralanalyses, causal- and f is a function(usually of unknownform) that deter- ityhas beentransformed into a mathematicaltheory with mines the value of the outcome Y = y in termsof ob- well-definedsemantics and well-foundedlogic, and many served and unobservedquantities, X = x and U = u. To practicalproblems that were long regarded as eithermeta- see how counterfactualsand joint probabilitiesof counter- physicalor unmanageable can now be solvedusing elemen- factualsemerge from such a stochasticmodel, I consider a tarymathematics. (See Pearl2000 fora gentleintroduction simple case where Y and X are binaryvariables (e.g., to thecounterfactual, graphical, and structural equation ap- treatmentand response)and U is an arbitrarycomplex set of proachesto causality.)In thearticle, Professor Dawid wel- all othervariables that may influenceY. For any given conditionU the X Y comesthe new progressin causal analysisbut expresses u, relationshipbetween and must be one mistrustof thequasi-deterministic methods by whichthis of the (only) fourbinary functions progresshas beenachieved. fo: y = 0 or {Yo = O,Y,= 0}, Attitudesof towardcounterfactuals and suspicion struc- fl: y = x or {Yo = O,Y1= 1} turalequation models are currently pervasive among statis- x ticians,and Dawid should be commendedfor bringing such f2: y Xor {Yo = 1,Y,= 0}, concernsinto the open. By helpingto dispelmisconcep- and tionsabout counterfactuals, Dawid's article may well have f3: y = or{Yo = 1, Y =1}. (1) rescuedstatistics from another century of stagnationover As u varies along its domain,the only effectit can have causality. on the model is to switchthe relationshipbetween X and Y among these fourfunctions. This partitionsthe domain 2. THE EMPIRICAL CONTENT OF of U into fourequivalence classes, whereeach class con- COUNTERFACTUALS tains those pointsu thatcorrespond to the same function. The word"counterfactual" is a misnomer.Counterfac- The probabilityP(u) thus induces a probabilityfunction tualscarry as clearan empiricalmessage as anyscientific over the potential response pairs {Yo, Y1} shownin (1). This laws,and indeed are fundamentalto them.The essenceof constructionis the inverseof the one discussedin Dawid's any scientificlaw lies in theclaim that certain relation- Section 13; one startswith genuineconcomitants U, and

JudeaPearl is Professorof ComputerScience and Statistics,University ? 2000 American Statistical Association of California,Los Angeles,CA 90024 (E-mail:[email protected], Web: Journal of the American Statistical Association www.cs.ucla..edu/-judea/). June 2000, Vol. 95, No. 450, Theory and Methods

428

This content downloaded from 131.179.232.118 on Fri, 12 Sep 2014 18:41:13 PM All use subject to JSTOR Terms and Conditions Pearl: Comment 429 theyturn into jointly distributedcounterfactual concomi- queriesread: tants{Yo, Y1} thatDawid calls metaphysicaland fatalistic. Admittedly,when u standsas the identityof a person, QI = P (Y1 = 0) - P (YO = 0) the mappingof u into the pair {Yo, Y1} appears horridly and fatalistic,as if thatperson is somehowdoomed to reactin = a predeterminedway to treatment(X 1) and no treat- QII = P (YO 1|X = i,Y = 0). (2) ment (X = 0). However,if one views u as the sum total of all experimentalconditions that mightpossibly affect In words,Qll standsfor the probabilitythat my headache thatindividual's reaction (including biological, psychologi- would have stayedhad I not takenaspirin (Yo = 1), given cal, and spiritualfactors, operating both before and afterthe thatI did in fact take aspirin (X = 1) and the headache applicationof the treatment),then the mappingis seen to has gone (Y = 0). (I restrictthe population to personswho evolve reasonablyand naturallyfrom the functionalmodel have headachesprior to consideringaspirin.) Dawid is cor- y = f (x, u). This quasi-deterministicfunctional model mir- rect in statingthat the two queries are of differenttypes, rorsLaplace's conceptionof nature(Laplace 1814), accord- and thelanguage of counterfactualsdisplays this difference ing to whichof nature'slaws are deterministic,and random- and its ramificationsin vividmathematical form. By exam- ness surfacesmerely due to our ignoranceof theunderlying iningtheir respective formulas, one can immediatelydetect boundaryconditions. (The structuralequation models used thatQll is conditionedon the outcomeY = 0, whereasQI in economics,biology, and stochasticcontrol are typical is unconditioned.This impliesthat some knowledgeof the examplesof Laplacian models.)Dawid deteststhis concep- functionalrelationship (between X and Y) mustbe invoked tion. This is not because it ever failed to match macro- in estimatingQll (Balke and Pearl 1994). I challengeDawid scopic empiricaldata (only quantummechanical phenom- to express Qll, let alone formulateconditions for its esti- ena exhibitassociations that might conflict with the Lapla- mationin a counterfactual-freelanguage. For background cian model),but rather because it appearsto standcontrary information,the identificationof QI requires exogeneity to the "familiarstatistical framework and machinery"(Sec. (i.e., randomizedtreatment), whereas thatof Qll requires 7). I fail to see why a frameworkand machinerythat did both exogeneityand monotonicity;both assumptionshave not exactlyexcel in the causal arena shouldbe deprivedof testableimplications (Pearl 2000, p. 294). Epidemiologists enhancementand retooling. are well aware of the differencebetween QI and Qll [they usuallywrite Qll = QI/P (Y = O|X = 1)], thoughthe cor- 3. EMPIRICISM VERSUS IDENTIFIABILITY respondingidentification conditions for Qll are oftennot Dawid's empiricismis summarizedin his abstract: spelled out as clearlyas theycould (Greenlandand Robins 1988). By definition,one can never observe such [counterfactual] quantities,nor assess empiricallythe validityof any model- What is puzzlingin Dawid's articleis thathe considers ing assumptionmade aboutthem, even thoughone's conclu- Qll to be, on one hand,valid and important(Sec. 3) and,on sions may be sensitiveto theseassumptions. the otherhand, untestable (Sec. 11); the two are irreconcil- able. If Qll is valid and important,then one shouldexpect This warningis not entirelyaccurate. Many counter- the magnitudeof Qll to affectsome futuredecisions, and factual modeling assumptionsdo have testable implica- can then use the correctnessof those decisions as a test tions; for example, exogeneity(or ignorability)(Y1 ii X) (henceinterpretation) of theempirical claims made by Qll. > and monotonicity(Y1 (u) Yo(u)) each can be falsifiedby What are those claims,and how can theybe tested? and data comparingexperimental nonexperimental (Pearl Accordingto theinterpretation given in theprevious sec- 2000, p. 294). More important,the warning is eitherempty tion,counterfactual claims are merelyconversational short- or self-contradictory.If one's conclusionshave no practi- hand for scientificpredictions. Hence Qll stands for the cal thentheir to invalidassump- consequences, sensitivity probabilitythat a person will benefitfrom taking aspirin is is tions totallyharmless, and Dawid's warning empty. in the nextheadache episode, giventhat aspirin proved ef- on other one's conclusionsdo have practical If, the hand, fectivefor that person in the past (i.e., X = 1,Y = 0). consequences,then theirsensitivity to assumptionsauto- Therefore,Qll is testablein sequentialexperiments where makes those testable,and Dawid's matically assumptions subjects'reactions to aspirinare monitoredrepeatedly over turnscontradictory. warning time. (One needs to assume thata person's characteristics The and headache, which two queries about aspirin do not change over time,an assumptionthat is testablein Dawid uses to distinguisheffects of causes fromcauses of principle.)In such testsone can easily verifywhether sub- effects from may servewell to illustrate ("sheep" "goats"), jects who have had one positive experiencewith aspirin in the inconsistency Dawid's philosophy.The two queries (X = 1,Y = 0) have a higherthan average probabilityof are benefitingfrom aspirin in the future. I. I have a headache.Will it help if I take aspirin? I have arguedelsewhere (Pearl 2000, p. 217) thatcounter- II. My headachehas gone. Is it because I took aspirin? factualqueries of typeII are the normin practicaldecision making,whereas causal effectqueries (type I) are theexcep- LettingX =1 standfor "takingaspirin" and and Y =1 tion.The reasonis thatdecision-related queries are usually stand for "having a headache" (after 1/2 hour, say), the broughtinto focus by observationsthat could be modified counterfactualexpressions for the probabilities of thesetwo by thedecision (e.g., a patientsuffering from a set of symp-

This content downloaded from 131.179.232.118 on Fri, 12 Sep 2014 18:41:13 PM All use subject to JSTOR Terms and Conditions 430 Journalof the AmericanStatistical Association, June 2000 toms).The case-specificinformation provided by thoseob- nately,careful reading of his articleshows that David aims servationsis essentialfor properlyassessing the effectof to imposean overlyrestrictive and unworkabletype of safe- the decision,and conditioningon these observationsleads guard,a typerejected in almostevery branch of science. to queriesof typeII, as in Qll. The Bayesian approachpro- posed by Dawid cannot properlyhandle conditioningon 4. PRAGMATICVERSUS DOGMATIC EMPIRICISM factorsthat are affectedby the treatment,and thus pre- cludesanswering the most common type of decision-related The requirementof identifiability,as just stated,is a re- queries. (Detailed dynamicmodels or temporallyindexed strictionon the typeof queriesone may ask (or inferences data for every conceivable set of observationswould be one maymake) and noton thetype of modelsone mayuse. neededfor specifying the probabilities in the decisiontrees This bringsup the differencebetween pragmatic and dog- of such analyses.) maticempiricism. A pragmaticempiricist insists on asking I agree withDawid thatcertain assumptions needed for empiricallytestable queries, but leaves the choice of theo- identifyingcausal quantitiesare not easily understood(let ries to convenienceand imagination;the dogmaticempiri- alone ascertained)when phrased in counterfactualterms. cist insistson positingonly theoriesthat are expressiblein Typical examples are assumptionsof ignorability(Rosen- empiricallytestable vocabulary. As an extremeexample, a baum and Rubin 1983), which involve conditionalinde- strictlydogmatic empiricist would shunthe use of negative pendenciesamong counterfactualvariables. However, this numbers,because negativequantities are not observablein cognitivedifficulty comes not because counterfactualsare isolation.For a less extremeexample, a pragmaticempiri- untestable,but rather because dependenciesamong counter- cist would welcome the counterfactualmodel of individual factualsare derivedquantities that are a few stepsremoved causal effects(ICE) (see Sec. 5.2) as long as it leads to fromthe way we conceptualizecause-effect relationships. valid and empiricallytestable estimation of the quantityof To overcomethis difficulty,a hybrid form of analysiscan interest(e.g., ACE). Dawid rejectsthis model a prioribe- be used,in whichassumptions are expressedin thefriendly cause it startswith unobservable unit-based counterfactual formof functionalrelationships (or diagrams),and causal terms,Y1 (u) and Yo(u), and thusfails the dogmatic require- queries(e.g., Qll) are posed and evaluatedin counterfactual mentthat the entireanalysis, including all auxiliarysym- vocabulary(Pearl 2000, p. 215-7, 231-4). Functionalmod- bols and all intermediatesteps, "involve only terms subject els, in theform of nonparametricstructural equations, thus to empiricalscrutiny." What is gained by this prohibition, provideboth formalsemantics and conceptualbasis for a accordingto Dawid, is protectionfrom asking nonidentifi- completemathematical theory of counterfactuals. able queries.His proposal,in theform of Bayesiandecision In Section 5.4, Dawid restateshis empiricistphilosophy trees,indeed ensures that one does notask certainforbidden in the formof a requirementwhich he calls Jeifreys'slaw: questions,but unfortunately,it also ensuresthat one never asks or answersimportant questions (such as Qll) thatcan- ... mathematicallydistinct models that cannot be distin- not be expressedin his restrictedlanguage. It is a stifling guished on the basis of empiricalobservation should lead insurancepolicy, analogous to banningdivision from arith- to indistinguishableinference. meticsto protectone fromdividing by 0. (Overprotection This requirementreads like a tautology:If two models en- may also temptthe counterfactualcamp; see Imbens and tail two distinguishableinferences, and if thedifference be- Rubin 1995.) tweenthe two inferencesmatters at all, thenthe two mod- Science rejected this kind of insurancelong ago. The els can easily be distinguishedby whatever(empirical) cri- Babyloniansastronomers were mastersof black box pre- terionused to distinguishthe two inferences.Dawid may diction,far surpassingtheir Greek rivals in accuracy and have meantthe following: consistency(Toulmin 1961, pp. 27-30). Yet sciencefavored the creative-speculativestrategy of the Greek astronomers, ... mathematicallydistinct models that cannot be distin- whichwas wild withmetaphysical imagery: circular tubes guished on the basis of past empiricalobservation should fullof fire,small holes throughwhich the fire was visibleas lead to indistinguishableinference regarding future observa- stars,and hemisphericalearth riding on turtlebacks. It was tion(which may be obtainedunder new experimentalcondi- tions). this wild modelingstrategy, not Babylonianrigidity, that jolted Eratosthenes(276-194 B.C.) to performone of the This is noneother but the requirement of identifiability(see, mostcreative in theancient world and measure e.g.,Pearl 1995). It requires,for example, that if ourdata are the radiusof the earth. nonexperimental,then two models that are indistinguishable This creative speculate-test-rejectstrategy (which is on the basis of those data entailthe same value of the av- my understandingof Popperian empiricism)is practiced erage causal effect(ACE)-a quantitydiscernible in exper- throughoutscience because it aims at understandingthe imentalstudies. It likewiserequires that if one's data come mechanismsbehind the observationsand thusgives rise to fromstatic experiments, then two models thatare indistin- new questionsand new experiments,which eventually yield guishableon the basis of those data entail the same value predictionsunder novel sets of conditions.Quantum me- of Q1I-a quantitydiscernible in sequentialexperiments. chanics was inventedprecisely because J. J. Thomsonand If the aim of Dawid's empiricismis to safeguardidenti- othersdared take deterministic classical mechanicsvery se- fiability,his proposal would be welcome by all causal an- riously,and boldly asked "metaphysical"questions about alysts,including adventurous counterfactualists. Unfortu- physicalproperties of electronswhen electronswere un-

This content downloaded from 131.179.232.118 on Fri, 12 Sep 2014 18:41:13 PM All use subject to JSTOR Terms and Conditions Robins and Greenland: Comment 431 observable.The language of counterfactualslikewise en- mathbooks? Examiningthe major tangible results in causal ables the statisticianto pose and reject a much richerset inferencein the past two decades (e.g., propensityscores, of "whatif" questionsthan does the languageof Bayesian identificationconditions, covariate selection, asymptotic decisiontheory. Giving up thisrichness is the price to pay bounds)reveals that, although these results could have been forDawid's insurance. derivedwithout counterfactuals, they simply were not. This may not be takenas a coincidenceif one asks why it was 5. COUNTERFACTUALS AS INSTRUMENTS Eratosthenesthat measured the size of the earthand not Dawid reports(at the end of Sec. 10.2) thatthe bounds some Babylonianastronomer, master in black box predic- for causal effectsin clinical trials with imperfectcom- tion.The success of thecounterfactual language stems from pliance (Balke and Pearl 1997) are "sheep-like"-namely two ingredientsnecessary for scientific progress in general: valid,meaningful, and safe evenfor counterfactually averse (a) theuse of modelinglanguages that are somewhatricher statisticians.Ironically, when we examine the conditional thanthe ones neededfor routine predictions, and (b) theuse probabilitiesthat achieve those bounds,we findthat they of powerfulmathematics to filter,rather than muzzle, the representsubjects with deterministicbehavior, compliers, untestablequeries thatsuch languagestempt us to ask. never-takers,and defiers,precisely the kind of behaviorthat Dawid is invitingcausality to submitto the Babylonian Dawid rejectsas "fatalistic"(Sec. 7.1). The lesson is illu- safeguardof black box mentality.I dare predictthat causal- minating:Even startingwith the best sheep-likeintentions, itywill rejecthis offer. thereis no escape fromcounterfactuals and goat-likedeter- minismin causal analysis. ADDITIONAL REFERENCES This lesson leads to a new way of legitimizingcoun- Annals of theRoyal StatisticalSociety 1834-1934 (1934), The Royal Sta- terfactualanalysis in the conservativecircles of statistics. tisticalSociety, London, p. 16. Researcherswho mistrustthe quasi-deterministicmodels Balke, A., and Pearl,J. (1997), "Bounds on TreatmentEffects From Stud- ies With ImperfectCompliance," Journal of the AmericanStatistical of Laplace (i.e., y = f(x, u)) can now view these mod- Association,92, 1172-1176. els as limitpoints of a space of nondeterministicmodels Greenland,S., and Robins,J. (1988), "ConceptualProblems in theDefini- P(ylx) constrainedto agree with the observed data. Ac- tion and Interpretationof AttributableFractions," American Journal of cordingly,the mistrustfulanalysis of counterfactualscan ,128, 1185-1197. Imbens,G. W., and Rubin,D. R. (1995), Discussion of "Causal Diagrams now be viewed as a benignanalysis of limitpoints of or- forEmpirical Research" by J. Pearl,, 82, 694-695. dinaryprobability spaces, in muchthe same way thatirra- Laplace, P. S. (1814), Essai Philosophiquesure les Probabilites,New York: tionalnumbers can be viewed as limitpoints (or Dedekind Courcier,English translationby F. W. Truscottand F. L. Emory,New cuts) of benignsets of rationalnumbers. York: Wiley,1902. Pearl,J. (2000), Causality,New York: CambridgeUniversity Press. Dawid is correctin notingthat many problems about the Pearson, K. (1911), Grammarof Science, (3rd ed.), London: A. and C. effectsof causes can be reinterpretedand solved in non- Black. counterfactualterms. Analogously, some of my colleagues Rosenbaum,P., and Rubin, D. (1983), "The CentralRole of Propensity = Score in ObservationalStudies for Causal Effects,"Biomnetrica, 70, 41- can derive De-Moivre's theorem,cosnO Re[(cos0 + 55. i sin O)n], withoutthe use of those mistrustfulimaginary Toulmin,S. (1961), Forecastand Understanding,Bloomington, IN: Indiana numbers.So, shouldwe strikecomplex analysisfrom our UniversityPress.

Comment JamesM. ROBINS and SanderGREENLAND

By narrowlyconcentrating on randomizedexperiments relianceon counterfactualsor theirlogical equivalentscan- with completecompliance, Dawid, in our opinion,incor- not be avoided. rectlyconcludes that an approachto causal inferencebased Causal inferencefrom observationaldata and broken on "decision analysis"and freeof counterfactualsis com- experimentshistorically has been viewed as problematic, pletely satisfactoryfor addressingthe problem of infer- and even illegitimate,by most statisticians.Thus we re- ence about the effectsof causes. We argue thatwhen at- gard it as a serious oversightfor Dawid to deny the temptingto estimatethe effectsof causes in observational usefulness of a counterfactualswithout a more careful considerationof observationalstudies and brokenexper- studiesor in randomizedexperiments with noncompliance iments.The purpose of this discussionis to redressthat (termed broken experimentsby Barnard et al. (1998), oversight,by reviewingthe considerationsthat have led

JamesM. Robins is Professorof Epidemiologyand ,Har- vard School of Public Health,Boston, MA 02115. Sander Greenlandis ? 2000 American Statistical Association Professorof Epidemiology,UCLA School of Public Health,Los Angeles, Journal of the American Statistical Association CA 90095. June 2000, Vol. 95, No. 450, Theory and Methods

This content downloaded from 131.179.232.118 on Fri, 12 Sep 2014 18:41:13 PM All use subject to JSTOR Terms and Conditions