Causal Inference Without Counterfactuals

In Journal of the American Statistical Association, Vol. 95, No. 450, 428-431, June 2000. With discussions by D. Cox; G. Casella & S. Schwartz; J. Pearl; J. Robins & S. Greenland; D. Rubin; G. Shafer; L. Wasserman.

TECHNICAL REPORT R-269, June 2000

Comment

Judea PEARL

Judea Pearl is Professor of Computer Science and Statistics, University of California, Los Angeles, CA 90024 (E-mail: [email protected], Web: www.cs.ucla.edu/~judea/). © 2000 American Statistical Association.

1. BACKGROUND

The field of statistics has seen many well-meaning crusades against threats from metaphysics and other heresy. In its founding prospectus of 1834, the Royal Statistical Society resolved "to exclude carefully all Opinions from its transactions and publications - to confine its attention rigorously to facts." This clause was officially struck out in 1858, when it became obvious that facts void of theory could not take statistics very far (Annals of the Royal Statistical Society 1934, p. 16).

Karl Pearson launched his own metaphysics "red scare" about causality in 1911: "Beyond such discarded fundamentals as 'matter' and 'force' lies still another fetish amidst the inscrutable arcana of modern science, namely, the category of cause and effect" (Pearson 1911, p. iv). Pearson's objection to theoretical concepts such as "matter" and "force" was so fierce and his rejection of determinism so absolute that he consigned statistics to almost a century of neglect within the study of causal inference. Philip Dawid was one of a handful of statisticians who boldly protested the stalemate over causality: "Causal inference is one of the most important, most subtle, and most neglected of all the problems of statistics" (Dawid 1979).

In the past two decades, owing largely to progress in counterfactual, graphical, and structural analyses, causality has been transformed into a mathematical theory with well-defined semantics and well-founded logic, and many practical problems that were long regarded as either metaphysical or unmanageable can now be solved using elementary mathematics. (See Pearl 2000 for a gentle introduction to the counterfactual, graphical, and structural equation approaches to causality.) In the article, Professor Dawid welcomes the new progress in causal analysis but expresses mistrust of the quasi-deterministic methods by which this progress has been achieved.

Attitudes of suspicion toward counterfactuals and structural equation models are currently pervasive among statisticians, and Dawid should be commended for bringing such concerns into the open. By helping to dispel misconceptions about counterfactuals, Dawid's article may well have rescued statistics from another century of stagnation over causality.

2. THE EMPIRICAL CONTENT OF COUNTERFACTUALS

The word "counterfactual" is a misnomer. Counterfactuals carry as clear an empirical message as any scientific laws, and indeed are fundamental to them. The essence of any scientific law lies in the claim that certain relationships among observable variables remain invariant when the values of those variables change relative to our immediate observations. For example, Ohm's law ($V = IR$) asserts that the ratio between the current ($I$) and the voltage ($V$) across a resistor remains constant for all values of $I$, including yet-unobserved values of $I$. We usually express this claim in a function or a hypothetical sentence: "Had the current in the resistor been $I$ (instead of the observed value $I_0$) the voltage would have been $V = I(V_0/I_0)$," knowing perfectly well that there is no way to simultaneously measure $I$ and $I_0$. (Every mathematical function is interpreted hypothetically, and the study of counterfactuals is merely a study of standard mathematical functions.) Such sentences appear to be counterfactual, because they deal with unobserved quantities that differ from (and hence seem to contradict) those actually observed. Nonetheless, this circumstantial nonobservability and apparent contradiction do not diminish whatsoever the ability to submit physical laws to empirical tests. Scientific methods thrive on attempts to confirm or falsify the predictions of such laws.

The same applies to stochastic processes (or data-generation models), usually written in the form of functional relations $y = f(x, u)$, where $X$ and $U$ stand for two sets of random variables, with joint distribution $P(x, u)$, and $f$ is a function (usually of unknown form) that determines the value of the outcome $Y = y$ in terms of observed and unobserved quantities, $X = x$ and $U = u$. To see how counterfactuals and joint probabilities of counterfactuals emerge from such a stochastic model, I consider a simple case where $X$ and $Y$ are binary variables (e.g., treatment and response) and $U$ is an arbitrary complex set of all other variables that may influence $Y$. For any given condition $U = u$, the relationship between $X$ and $Y$ must be one of the (only) four binary functions

\[
\begin{aligned}
f_0 &: y = 0 \quad \text{or} \quad \{Y_0 = 0,\ Y_1 = 0\},\\
f_1 &: y = x \quad \text{or} \quad \{Y_0 = 0,\ Y_1 = 1\},\\
f_2 &: y \neq x \quad \text{or} \quad \{Y_0 = 1,\ Y_1 = 0\},\\
f_3 &: y = 1 \quad \text{or} \quad \{Y_0 = 1,\ Y_1 = 1\}.
\end{aligned}
\tag{1}
\]

As $u$ varies along its domain, the only effect it can have on the model is to switch the relationship between $X$ and $Y$ among these four functions. This partitions the domain of $U$ into four equivalence classes, where each class contains those points $u$ that correspond to the same function. The probability $P(u)$ thus induces a probability function over the potential response pairs $\{Y_0, Y_1\}$ shown in (1). This construction is the inverse of the one discussed in Dawid's Section 13; one starts with genuine concomitants $U$, and they turn into jointly distributed counterfactual concomitants $\{Y_0, Y_1\}$ that Dawid calls metaphysical and fatalistic.

Admittedly, when $u$ stands as the identity of a person, the mapping of $u$ into the pair $\{Y_0, Y_1\}$ appears horridly fatalistic, as if that person is somehow doomed to react in a predetermined way to treatment ($X = 1$) and no treatment ($X = 0$). However, if one views $u$ as the sum total of all experimental conditions that might possibly affect that individual's reaction (including biological, psychological, and spiritual factors, operating both before and after the application of the treatment), then the mapping is seen to evolve reasonably and naturally from the functional model $y = f(x, u)$. This quasi-deterministic functional model mirrors Laplace's conception of nature (Laplace 1814), according to which nature's laws are deterministic, and randomness surfaces merely due to our ignorance of the underlying boundary conditions. (The structural equation models used in economics, biology, and stochastic control are typical examples of Laplacian models.) Dawid detests this conception. This is not because it ever failed to match macroscopic empirical data (only quantum mechanical phenomena exhibit associations that might conflict with the Laplacian model), but rather because it appears to stand contrary to the "familiar statistical framework and machinery" (Sec. 7). I fail to see why a framework and machinery that did not exactly excel in the causal arena should be deprived of enhancement and retooling.

3. EMPIRICISM VERSUS IDENTIFIABILITY

Dawid's empiricism is summarized in his abstract:

    By definition, one can never observe such [counterfactual] quantities, nor assess empirically the validity of any modeling assumption made about them, even though one's conclusions may be sensitive to these assumptions.

This warning is not entirely accurate. Many counterfactual modeling assumptions do have testable implications; for example, exogeneity (or ignorability) ($Y_1 \perp\!\!\!\perp X$) and monotonicity ($Y_1(u) \geq Y_0(u)$) each can be falsified by comparing experimental and nonexperimental data (Pearl [...]

[...] queries read:

\[
\begin{aligned}
Q_I &= P(Y_1 = 0) - P(Y_0 = 0),\\
Q_{II} &= P(Y_0 = 1 \mid X = 1, Y = 0).
\end{aligned}
\tag{2}
\]

In words, $Q_{II}$ stands for the probability that my headache would have stayed had I not taken aspirin ($Y_0 = 1$), given that I did in fact take aspirin ($X = 1$) and the headache has gone ($Y = 0$). (I restrict the population to persons who have headaches prior to considering aspirin.) Dawid is correct in stating that the two queries are of different types, and the language of counterfactuals displays this difference and its ramifications in vivid mathematical form. By examining their respective formulas, one can immediately detect that $Q_{II}$ is conditioned on the outcome $Y = 0$, whereas $Q_I$ is unconditioned. This implies that some knowledge of the functional relationship (between $X$ and $Y$) must be invoked in estimating $Q_{II}$ (Balke and Pearl 1994). I challenge Dawid to express $Q_{II}$, let alone formulate conditions for its estimation, in a counterfactual-free language. For background information, the identification of $Q_I$ requires exogeneity (i.e., randomized treatment), whereas that of $Q_{II}$ requires both exogeneity and monotonicity; both assumptions have testable implications (Pearl 2000, p. 294). Epidemiologists are well aware of the difference between $Q_I$ and $Q_{II}$ [they usually write $Q_{II} = Q_I / P(Y = 0 \mid X = 1)$], though the corresponding identification conditions for $Q_{II}$ are often not spelled out as clearly as they could be (Greenland and Robins 1988).

What is puzzling in Dawid's article is that he considers $Q_{II}$ to be, on one hand, valid and important (Sec. 3) and, on the other hand, untestable (Sec. 11); the two are irreconcilable. If $Q_{II}$ is valid and important, then one should expect the magnitude of $Q_{II}$ to affect some future decisions, and can then use the correctness of those decisions as a test (hence interpretation) of the empirical claims made by $Q_{II}$. What are those claims, and how can they be tested? According to the interpretation
