
ARTICLE Communicated by Terrence Sejnowski

Causality, Conditional Independence, and Graphical Separation in Settable Systems

Karim Chalak
[email protected]
Department of Economics, Boston College, Chestnut Hill, MA 02467, U.S.A.

Downloaded from http://direct.mit.edu/neco/article-pdf/24/7/1611/1065694/neco_a_00295.pdf by guest on 26 September 2021

Halbert White [email protected] Department of Economics, University of California, San Diego, La Jolla, CA 92093, U.S.A.

We study the connections between causal relations and conditional independence within the settable systems extension of the Pearl causal model (PCM). Our analysis clearly distinguishes between causal notions and probabilistic notions, and it does not formally rely on graphical representations. As a foundation, we provide definitions in terms of suitable functional dependence for direct causality and for indirect and total causality via and exclusive of a set of variables. Based on these foundations, we provide causal and stochastic conditions formally characterizing conditional dependence among random vectors of interest in structural systems by stating and proving the conditional Reichenbach principle of common cause, obtaining the classical Reichenbach principle as a corollary. We apply the conditional Reichenbach principle to show that the useful tools of d-separation and D-separation can be employed to establish conditional independence within suitably restricted settable systems analogous to Markovian PCMs.

1 Introduction

This article studies the connections between probabilistic conditional independence and notions of causality based on functional dependence. Our results shed light on two questions fundamental to the understanding of empirical relationships. First, what implications for the joint distribution of variables of interest derive from knowledge of functionally defined causal relationships between them? Conversely, what restrictions (if any) on the possible causal relationships holding between variables of interest follow from knowledge of the joint distribution governing these variables? These questions lie at the heart of Reichenbach's (1956) principle of common cause, which holds that if two random variables are correlated, then

one causes the other or there is an underlying common cause. They are also addressed in the context of the Pearl causal model (PCM) (Pearl, 2000) by such notions as d-separation and D-separation (Geiger, Verma, & Pearl, 1990). Nevertheless, the status of Reichenbach's venerable principle is still ambiguous (Dawid, 2010a), and results for d-separation and D-separation establish conditional independence under a "Markovian" condition that requires the existence of certain jointly independent "background variables." Not only is the Markovian condition strong, but it intermingles functionally determined causal notions with probabilistic conditions (independence) to deliver its results.

The main goal of this article, then, is to answer the two questions we have set out by connecting causal concepts founded on functional dependence to notions of probabilistic dependence without imposing strong Markovian-type assumptions and in such a way that the roles of causality and probability are clearly delineated. Specifically, our main results provide new necessary and sufficient conditions for conditional independence relations to hold among structurally related variables of interest. These results are not accessible using the PCM. This delivers a generally applicable framework for disentangling the causal relations underlying probabilistic dependence, thereby facilitating learning about empirical relations in both experimental and nonexperimental contexts. (See Chalak & White, 2011, for a taxonomy of the use of conditional independence relations in this context.) We obtain our results using the settable systems (SS) causal framework of White and Chalak (2009; referred to hereafter as WC), an extension of the PCM.

Neural Computation 24, 1611–1668 (2012) © 2012 Massachusetts Institute of Technology
As WC and White, Chalak, and Lu (2010) discuss and illustrate with numerous examples, SS can better accommodate systems involving optimization, nonunique equilibrium, and learning. Among the features distinguishing SS from the PCM are SS's lack of a unique fixed point requirement for the structural equations, and its "unlumping" of the PCM's background variables into fixed system attributes that do not act causally and structurally exogenous variables that can play a causal role. To introduce SS here and to concretely illustrate its advantages for our purposes relative to the PCM or the causal framework of Spirtes, Glymour, and Scheines (1993; referred to hereafter as SGS), we use a simple advice-action-outcome example in which an expert (e.g., a physician or financial advisor) advises an agent (e.g., a patient or investor) who is undertaking an action that influences an outcome of interest.

In pursuing our goal, we make a number of related further contributions. First, by operating within SS, our results connecting causal and probabilistic relations are valid in contexts where other causal frameworks, such as the PCM or SGS, may not apply. Second, to provide a foundation for connecting causal and probabilistic dependence, we introduce new rigorous and general function-based definitions of direct causality, as well as of indirect and total causality via and exclusive of a set of variables. We use our advice-action-outcome example to demonstrate the intuitive content of our definitions. Although these definitions may lend themselves to convenient graphical representations, they do not rely on properties of graphs.
Our notions extend and complement definitions for indirect and path-specific effects in Pearl (2001) and Avin, Shpitser, and Pearl (2005) and related notions of direct, indirect, and total effects proposed in Robins and Greenland (1992), SGS, Robins (2003), Rubin (2004), Didelez, Dawid, and Geneletti (2006), and Geneletti (2007).

Third, using these foundations, we establish general connections between causal and probabilistic dependence by stating and proving a new result, the conditional Reichenbach principle of common cause. As a corollary, this makes rigorous the classical (unconditional) Reichenbach principle of common cause. Among other things, this clarifies ambiguities in the literature surrounding the formal status of Reichenbach's principle (Spohn, 1980; Hausman & Woodward, 1999; Cartwright, 2000; Dawid, 2010a). Significantly, the conditional Reichenbach principle holds generally and does not require Markovian structure, as in Geiger et al. (1990), SGS, or Pearl (1993, 1995, 2000). Because the conditional Reichenbach principle is not restricted to Markovian systems, it permits a clear separation between causal relations and probabilistic relations.

Fourth, we apply SS and the conditional Reichenbach principle to shed light on a variety of results from the PCM and DAG literature connecting causality and conditional independence. We show that the useful tools of d-separation and D-separation (Geiger et al., 1990) can be applied within a suitably restricted SS, analogous to Markovian PCMs. Specifically, we show that the conditional Reichenbach principle ensures the directed local Markov property (Lauritzen, Dawid, Larsen, & Leimer, 1990) among variables generated by the unimpeded evolution of the restricted settable system.
In such systems, d-separation and D-separation (Geiger et al., 1990) can be used to establish conditional independence. We show as well how causal intuitions associated with these graphical separation principles can fail, even in Markovian systems, or hold, even in non-Markovian systems. Correspondingly, the PCM and DAG literature recognizes that for Markovian PCMs, d-separation is sufficient but not necessary for conditional independence. To accommodate this, SGS refer to distributions in which failure of d-separation implies conditional dependence as "faithful," and Pearl (2000) refers to such distributions as "stable." It is not clear, however, what is excluded by faithfulness and stability restrictions. The need for these qualifications arises in part because the graphical PCM and DAG causal semantics is not sufficiently rich to accommodate functional definitions of causality. Because the conditional Reichenbach principle does not rely on graphs, and therefore does not rely on d-separation, it offers deeper insight into the notions of faithfulness and stability. The PCM and DAG literature also establishes that conditioning on "common successors" induces conditional dependence among "causes" in faithful or stable Markovian systems (SGS; Pearl, 2000). We provide mild restrictions on probability measures and response functions demonstrating that this property holds not only for faithful or stable Markovian systems but also for systems that need not be faithful, stable, or Markovian.

Taken together, the results of this article show how the SS extension of the PCM overcomes several cogent criticisms of the use of PCM DAGs for the study of the connections between causality and conditional independence (Dawid, 2002, 2010a, 2010b), while preserving many appealing features of the PCM, including response functions.
In particular, our results show that certain PCM and DAG concepts invoked in the literature, such as PCM background variables, the Markov properties, DAG-enhanced bases, DAG chance and deterministic nodes, and the assumption of faithfulness or stability, are not fundamental to establishing these connections. Nevertheless, we show how certain of these can be helpful for studying conditional independence relations in suitably restricted settable systems.

The article is organized as follows. In section 2, we review related strands of the literature that provide background relevant to our goals. Section 3 introduces our advice-action-outcome example, using this to illustrate certain difficulties, related to those discussed in Dawid (2002, 2010a, 2010b), that arise when studying the connections between probabilistic and causal relations using probabilistic DAGs, the PCM, and PCM DAGs. Section 4 places this example in the SS framework, illustrating various features of SS and showing how SS overcomes the difficulties encountered in section 3. Sections 5, 6, and 7 formalize the content of section 4. Section 5 formally introduces a version of WC's SS, conveniently suited to formulating rigorous definitions, given in section 6, of direct causality based on functional dependence, as well as notions of indirect causality via and exclusive of a set of variables in recursive systems. With this foundation, section 7 establishes the conditional Reichenbach principle of common cause, yielding necessary and sufficient causal and stochastic conditions for conditional dependence of vectors of random variables in recursive SS. The traditional Reichenbach principle obtains as a corollary.
Section 8 applies the conditional Reichenbach principle to shed light on a variety of results from the PCM and DAG literature connecting causality and conditional independence. Section 9 summarizes and concludes. The appendix contains proofs of all results.

2 Background and Related Literature

2.1 Probabilistic DAGs. Graphical representations of probabilistic relations, and in particular conditional independence relations (Dawid, 1979), have been extensively studied in artificial intelligence and statistics (Lauritzen & Spiegelhalter, 1988; Smith, 1989; Pearl, 1988, 2000; Lauritzen & Richardson, 2002; Wermuth & Cox, 2004). An important contribution is the introduction of graphical criteria, applicable to DAGs, that characterize

Figure 1: G1.

independence and conditional independence relations among variables in Bayesian networks or directed Markov fields (Lauritzen et al., 1990; Geiger et al., 1990). Such a DAG is said to represent a probability distribution for random variables, each represented at a node, when the joint density function exists and factorizes as the product of the densities of each variable conditional on its parents in the graph. For example, in G1 (see Figure 1), we have

p(x1, x2, x3, x4, x5) = p1(x1) p2(x2) p3(x3 | x1, x2) p4(x4 | x2) p5(x5 | x4),

where the left-hand term denotes the joint density and each right-hand term denotes the density of one variable conditional on the value of its "parents." Following Dawid (2002), we refer to DAGs of this kind as "probabilistic DAGs." Lauritzen et al. (1990, theorem 1) show that the joint density admits such a recursive factorization if and only if the collection of conditional independence statements that each variable is conditionally independent of its nondescendants given its parents in the DAG holds. Lauritzen et al. (1990) call this the "directed local Markov property"; in SGS (p. 54) this is the "causal Markov property"; Pearl (2000, theorem 1.2.7) calls it the "parental Markov condition." Letting ⊥ denote independence, G1 implies, for example, that X1 ⊥ X2 and X3 ⊥ X4 | (X1, X2) for any distribution represented by G1.
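To see these implications at work, here is a small linear-Gaussian simulation of a distribution that G1 represents; the coefficients and noise distributions are our illustrative assumptions, not the article's, and in this Gaussian setting zero partial correlation corresponds to conditional independence:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Linear-Gaussian distribution represented by G1 (illustrative coefficients):
# each variable depends only on its parents plus independent noise.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 0.8 * x1 + 0.5 * x2 + rng.normal(size=n)  # parents X1, X2
x4 = 0.7 * x2 + rng.normal(size=n)             # parent X2
x5 = 0.9 * x4 + rng.normal(size=n)             # parent X4

def partial_corr(a, b, cond):
    """Correlation of a and b after linearly regressing out cond."""
    Z = np.column_stack([np.ones(len(a))] + list(cond))
    ra = a - Z @ np.linalg.lstsq(Z, a, rcond=None)[0]
    rb = b - Z @ np.linalg.lstsq(Z, b, rcond=None)[0]
    return np.corrcoef(ra, rb)[0, 1]

print(np.corrcoef(x1, x2)[0, 1])        # X1 ⊥ X2: near zero
print(partial_corr(x3, x4, [x1, x2]))   # X3 ⊥ X4 | (X1, X2): near zero
print(partial_corr(x3, x4, []))         # unconditionally dependent
```

The first two sample quantities shrink toward zero as n grows, while the unconditional correlation of X3 and X4 stays away from zero, reflecting their common cause X2.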

2.2 Attributing Causal Meaning to Probabilistic DAGs. A causal meaning is sometimes attributed to DAGs (SGS; Pearl, 2000). For example, an arrow from X2 to X3 in G1 (see Figure 1) is interpreted as "X2 is a direct cause of X3." But as Dawid (2002, p. 164) notes, "There is absolutely nothing in the probabilistic semantics by which such graphs are supposed to be interpreted that is relevant to such causal intuitions." Dawid (2010a, p. 66) warns that to represent causal concepts such as direct causal effect using a DAG, these must be defined a priori "by other, necessarily non-graphical, considerations not involving these terms." In the PCM, certain causal concepts based on functional dependence are represented by a DAG. Specifically, Pearl (2000) defines total and direct causal effects, linking these concepts to the connectivity of corresponding PCM DAGs. Pearl (2001) and Avin et al. (2005) similarly provide PCM DAG-based definitions for indirect effects and path-specific effects. Related notions of direct, indirect, and total effects have been proposed in Robins and Greenland (1992), SGS, Robins (2003), Didelez et al. (2006), and Geneletti (2007) (see also Rubin, 2004).

The implications of PCM notions of causality for conditional independence relations have been investigated in Markovian PCMs (Geiger et al., 1990; SGS; Pearl, 1993, 2000). But concerns regarding the strong assumptions on causal structures and probability measures imposed in Markovian PCMs have been raised in the literature (Dawid, 2002; Duvenaud, Eaton, Murphy, & Schmidt, 2010).
Dawid (2002, 2010a, 2010b) argues that a clear separation of causal and probabilistic semantics and an explicit statement of the imposed assumptions are needed to justify the simultaneous use of a DAG to appropriately represent probabilistic and causal relations embodied in the PCM. Dawid (2002, 2010a, 2010b) advocates the use of "extended conditional independence" relations and their representations using "influence diagrams" to achieve this and presents a series of instructive examples and discussions. These papers nevertheless do not put forward a self-contained formal framework accomplishing the desired separation. Nor does this exist elsewhere. As stated in section 1, a main goal here is to provide such a framework.

2.3 d-Separation. Using properties of conditional independence (Dawid, 1979; Studeny, 1993), one can infer further conditional independence relations that hold among the variables represented in a probabilistic DAG. In particular, Geiger et al. (1990; see also Verma & Pearl, 1988; Geiger & Pearl, 1993; Pearl, 2000) provide a graphical criterion, called d-separation, that can identify exactly the conditional independence relations implied by a probabilistic DAG under the "graphoid" axioms. Lauritzen et al. (1990, proposition 3) provide graphical criteria equivalent to d-separation and show that when applied to a probabilistic DAG, these are equivalent to the directed local Markov property (Lauritzen et al., 1990, theorem 1).

For example, in probabilistic DAG G1, one can inspect whether nodes Xi and Xj are d-separated by a set of nodes W ⊆ {X1, . . . , X5} \ {Xi, Xj}. For this, let an (Xi, Xj)-trail in G1 be any sequence of arrows linking Xi to Xj, irrespective of their directionality. Then W d-separates Xi and Xj in G1 if every (Xi, Xj)-trail in G1 contains either (1) a node Wk ∈ W that does not have converging arrows along the (Xi, Xj)-trail or (2) a node Xk that has converging arrows along the (Xi, Xj)-trail, such that neither Xk nor any of its descendants are in W (Geiger et al., 1993; Pearl, 2000, definition 1.2.3; see also Lauritzen et al., 1990, for an equivalent graphical criterion). Thus, from

G1, X3 ⊥ X4 | X2, since X3 and X4 are d-separated by X2, and X2 ⊥ X5 | X4, since X4 d-separates X2 and X5.

Figure 2: G2.
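The trail-blocking criterion for G1 can be implemented directly; the integer encoding of nodes and the helper names below are ours, and the sketch enumerates simple trails rather than aiming for efficiency:

```python
# G1 from Figure 1: X1 -> X3 <- X2 -> X4 -> X5
edges = [(1, 3), (2, 3), (2, 4), (4, 5)]
nodes = [1, 2, 3, 4, 5]

parents = {v: {a for a, b in edges if b == v} for v in nodes}

def descendants(v):
    """All nodes reachable from v along directed edges."""
    out, stack = set(), [v]
    while stack:
        u = stack.pop()
        for a, b in edges:
            if a == u and b not in out:
                out.add(b)
                stack.append(b)
    return out

def trails(j, path):
    """All simple trails from path[0] to j, ignoring edge direction."""
    u = path[-1]
    if u == j:
        yield list(path)
        return
    for a, b in edges:
        for nxt in ((b,) if a == u else (a,) if b == u else ()):
            if nxt not in path:
                path.append(nxt)
                yield from trails(j, path)
                path.pop()

def blocked(trail, W):
    """A trail is blocked if some interior node blocks it per the criterion."""
    for k in range(1, len(trail) - 1):
        prev, v, nxt = trail[k - 1], trail[k], trail[k + 1]
        if prev in parents[v] and nxt in parents[v]:      # converging arrows
            if v not in W and not (descendants(v) & W):
                return True
        elif v in W:                                      # noncollider in W
            return True
    return False

def d_separated(i, j, W):
    return all(blocked(t, W) for t in trails(j, [i]))

print(d_separated(3, 4, {2}))     # X2 d-separates X3 and X4
print(d_separated(2, 5, {4}))     # X4 d-separates X2 and X5
print(d_separated(1, 2, set()))   # X1 and X2 unconditionally separated
print(d_separated(1, 2, {3}))     # conditioning on collider X3 connects them
```

The last call illustrates the "converging arrows" clause: putting the common child X3 in the conditioning set unblocks the trail X1 – X3 – X2.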

2.4 Attributing Causal Meaning to d-Separation. Implications of d-separation have been ascribed causal intuition (Pearl, 2000). In Figure 1, d-separation implies X2 ⊥ X5 | X4, which has been interpreted to mean that conditioning on a variable X4 that fully mediates the effect of a cause X2 on a response X5 renders X2 and X5 conditionally independent. Similarly, X3 ⊥ X4 | X2 has been interpreted to mean that conditioning on the common cause X2 of the two effects X3 and X4 renders X3 and X4 conditionally independent. Also, the fact that X1 ⊥ X2 | X3 is not implied by d-separation has been attributed to the notion that conditioning on a common response

X3 of causes X1 and X2 renders these conditionally dependent. As Dawid (2002) warns, there is no formal basis for such causal interpretations in probabilistic DAGs.

3 A Motivating Example

To illustrate issues raised in the preceding section and motivate the formal results presented in the subsequent sections, we consider a concrete example where an expert e advises an agent a on an action that may influence an outcome of interest to a. For example, e may be a physician recommending a medical treatment to patient a, or e may be a financial expert recommending an investment plan to investor a.

3.1 A Probabilistic DAG. Consider the simple probabilistic DAG in Figure 2 involving variables Y1^c, Y2^c, and Y3^c denoting the advice of expert e,¹ the action of agent a, and the outcome, respectively. Probabilistically, G2 shows that e's advice is independent of the outcome given a's action, Y1^c ⊥ Y3^c | Y2^c, due to the lack of an arrow between Y1^c and Y3^c, implying d-separation.

3.2 A Pearl Causal Model. A PCM (Pearl, 2000, definition 7.1.1) for this example assumes that each of the endogenous variables (Y1^c for advice, Y2^c for action, and Y3^c for outcome) is determined as a function of its parents and

¹ The superscript c denotes canonical variables arising from the natural (unimpeded) operation of the system. This conforms with notation formally introduced below.

background variables that are "often unobservable" (Pearl, 2000, p. 203). For simplicity, let U1, U2, and U3 be random background variables, each associated with the endogenous variables Y1^c, Y2^c, and Y3^c, respectively, as in Pearl (2000, p. 68), for example. In particular, suppose that

Y1^c = g1(U1),
Y2^c = g2(Y1^c, U2),
Y3^c = g3(Y2^c, U3),

with g1, g2, and g3 denoting potential response functions. Observe that we assume that Y1^c is not an explicit argument of the potential response function g3. As WC discuss, the PCM rules out any causal role for the background U's, since these are not subject to "counterfactual variation" (see Pearl, 2000, definition 7.1.3). Since g3 excludes Y1^c from its arguments, e's advice is assumed to not directly cause the outcome. Also, this PCM implicitly excludes endogenous variables other than Y1^c, Y2^c, and Y3^c. Dawid (2010b) argues for explicitly referencing the "causal ambit," that is, the set of variables to which the subset of endogenous variables {Y1^c, Y2^c, Y3^c} belongs, in order to discuss notions such as an unobserved common cause. We return to this shortly.

3.3 d-Separation and Conditional Independence in PCM DAGs. How, if at all, can this PCM generate the same conditional independence relations among (Y1^c, Y2^c, Y3^c) as those encoded by the probabilistic DAG G2? To answer this, assume that the background variables (U1, U2, U3) corresponding to the advice, action, and outcome are jointly independent. This yields a Markovian model in which the jointly independent "arbitrarily distributed random disturbances . . . represent independent background factors that the investigator chooses not to include in the analysis" (Pearl, 2000, pp. 68–69). Now consider the PCM DAG associated with this Markovian model. In PCM DAGs, arrows between endogenous variables denote direct causal relations (Pearl, 2000). Thus, the assumption that e's advice does not directly cause the outcome is represented by a missing arrow from Y1^c to Y3^c. Typically only the endogenous variables Y1^c, Y2^c, and Y3^c are represented at the nodes of a Markovian PCM DAG (Pearl, 2000). This yields the PCM DAG depicted by G2, which is isomorphic (has identical connectivity) to the probabilistic DAG also depicted by G2. Using this PCM structure and properties of conditional independence, it can be shown that Y1^c ⊥ Y3^c | Y2^c,² as represented in probabilistic DAG G2.
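A quick numerical sanity check of this Markovian case (binary background variables and XOR response functions are our illustrative choices, not the article's): simulating the three structural equations with jointly independent U's, the conditional distribution of Y3^c given Y2^c does not vary with Y1^c, consistent with Y1^c ⊥ Y3^c | Y2^c:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Jointly independent binary background variables: the Markovian assumption.
u1, u2, u3 = (rng.integers(0, 2, size=n) for _ in range(3))
y1 = u1          # Y1 = g1(U1): advice
y2 = y1 ^ u2     # Y2 = g2(Y1, U2): action (illustrative XOR form)
y3 = y2 ^ u3     # Y3 = g3(Y2, U3): outcome

# Within each stratum of Y2, the frequency of Y3 = 1 should not depend on Y1.
for v2 in (0, 1):
    freqs = [y3[(y2 == v2) & (y1 == v1)].mean() for v1 in (0, 1)]
    print(v2, freqs)   # the two conditional frequencies nearly coincide
```

With jointly independent U's, the two frequencies in each Y2 stratum agree up to sampling error, mirroring the derivation in footnote 2.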

² We refer to lemmas in Dawid (1979) in what follows. Since Y3^c = g3(Y2^c, U3), we have that Y1^c ⊥ Y3^c | (Y2^c, U3). By joint independence of (U1, U2, U3) and since Y1^c = g1(U1)

Here, the PCM represented by the PCM DAG G2 generates conditional independence relations among the endogenous variables (Y1^c, Y2^c, Y3^c) that coincide with those encoded via the d-separation criterion in probabilistic DAG G2, isomorphic to PCM DAG G2.

3.4 Conditional Independence Without d-Separation. Suppose that agent a fully complies with expert e's advice. One way to represent this is to exclude background variable U2 from the arguments of g2 so that Y2^c = g2(Y1^c). In this case, arguments similar to those in section 3.3 give that Y1^c ⊥ Y3^c | Y2^c. But we now also have that Y2^c ⊥ Y3^c | Y1^c, even though Y2^c and Y3^c are not d-separated by Y1^c in probabilistic DAG G2. Here, the PCM generates conditional independence relations that are not encoded via d-separation in probabilistic DAG G2. Because Y2^c is fully determined by Y1^c, Geiger et al. (1990) refer to Y2^c as a "deterministic node" and to Y1^c and Y3^c as "chance nodes." Nevertheless, Y1^c, Y2^c, and Y3^c are all random variables, and the distinction between deterministic and chance nodes is not immediately readable from PCM DAG G2 because background variables are not represented there. Rather, additional information regarding the nature of the dependence of the endogenous variables on the background variables is needed in order to distinguish between chance and deterministic nodes. Geiger et al. (1990) provide an alternative graphical criterion called D-separation to imply conditional independence relations in a DAG that modifies probabilistic DAG G2 to encode such distinctions at the nodes.
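The deterministic node is easy to see in simulation (binary advice and an additive outcome equation are our illustrative assumptions): conditioning on Y1^c leaves Y2^c constant, so Y2^c ⊥ Y3^c | Y1^c holds trivially despite the lack of d-separation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

u1 = rng.integers(0, 2, size=n)   # background for advice
u3 = rng.normal(size=n)           # background for outcome
y1 = u1                           # advice: Y1 = g1(U1)
y2 = y1                           # full compliance: Y2 = g2(Y1), U2 excluded
y3 = 0.7 * y2 + u3                # outcome: Y3 = g3(Y2, U3)

# Within each stratum of Y1, the action Y2 takes a single value, so it
# cannot covary with Y3 given Y1.
for v in (0, 1):
    print(v, set(y2[y1 == v].tolist()))
```

A degenerate conditional distribution is independent of everything, which is exactly why deterministic nodes generate conditional independence relations that d-separation alone does not encode.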

3.5 d-Separation Without Conditional Independence. Now suppose that the background variables (U1, U2, U3) are not jointly independent. Because we do not modify the causal relations among the endogenous variables, the arrows linking endogenous variables in PCM DAG G2 remain unaltered. However, the arguments from section 3.3 to establish Y1^c ⊥ Y3^c | Y2^c are no longer valid, and conditional dependence of Y1^c and Y3^c given Y2^c may hold, even though Y1^c and Y3^c are d-separated by Y2^c in PCM DAG G2. As we discuss below, there may be particular choices for the background variable distribution and for the potential response functions ensuring that Y1^c ⊥ Y3^c | Y2^c. Nevertheless, the functions and underlying probability distribution needed for this are extremely special. In the absence of such special conditions, Y1^c and Y3^c generally fail to be conditionally independent given Y2^c, so the PCM does not generate conditional independence relations that are implied by d-separation in the probabilistic DAG corresponding to the PCM

DAG G2. Here, too, the PCM DAG must be augmented with structure to

and Y2^c = g2(g1(U1), U2), lemma 4.2(i) gives that (Y1^c, Y2^c) ⊥ U3 and in particular that Y1^c ⊥ U3 | Y2^c by lemma 4.3. The converse of lemma 4.3 then gives that Y1^c ⊥ (U3, Y3^c) | Y2^c. Finally, lemma 4.2(i) ensures that Y1^c ⊥ Y3^c | Y2^c.

avoid incorrect probabilistic inference based on d-separation. To show dependence, a PCM DAG is augmented with bidirected arcs between nodes corresponding to endogenous variables whose background variables are not independent. Since the PCM rules out exogenous causes, one way to accommodate these is to assign certain background variables a causal status within the PCM so that they then become endogenous. Indeed, to facilitate applying the d-separation criterion to PCM DAGs, a bidirected arc linking two endogenous variables is often replaced in the PCM DAG by an unobserved endogenous common cause of the two endogenous variables (Pearl, 2000). But this implies that to study the connections between conditional independence relations and causal relations within the PCM framework, one must specify the causal ambit discussed in Dawid (2010b); specify which observables and unobservables are endogenous; and, importantly, assume independence or dependence relations among background variables that do not have a causal status. The existence of jointly independent background variables is a strong assumption. Often these do not emerge naturally from the system of interest and thus appear artificial. As Dawid (2002, p. 183) observes, "When the additional variables are pure mathematical fictions, introduced merely so as to reproduce the desired probabilistic structure of the domain variables, there seems absolutely no good reason to include them in the model."
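The failure described in section 3.5 can be made concrete with a linear-Gaussian sketch; the shared component linking U1 and U3 and all coefficients are our assumptions. Here Y1^c and Y3^c are d-separated by Y2^c in G2, yet their sample partial correlation given Y2^c is far from zero:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# U1 and U3 share a common component, so (U1, U2, U3) are NOT jointly
# independent (the shared component is our illustrative device).
common = rng.normal(size=n)
u1 = common + rng.normal(size=n)
u2 = rng.normal(size=n)
u3 = common + rng.normal(size=n)

y1 = u1               # Y1 = g1(U1): advice
y2 = 0.8 * y1 + u2    # Y2 = g2(Y1, U2): action
y3 = 0.7 * y2 + u3    # Y3 = g3(Y2, U3): outcome

def partial_corr(a, b, c):
    """Correlation of a and b after linearly regressing out c."""
    Z = np.column_stack([np.ones(len(c)), c])
    ra = a - Z @ np.linalg.lstsq(Z, a, rcond=None)[0]
    rb = b - Z @ np.linalg.lstsq(Z, b, rcond=None)[0]
    return np.corrcoef(ra, rb)[0, 1]

# Far from zero: Y1 and Y3 are conditionally dependent given Y2,
# despite their d-separation by Y2 in DAG G2.
print(partial_corr(y1, y3, y2))
```

Restoring joint independence of the U's (setting the shared component to zero) drives this partial correlation back to zero, recovering the Markovian case of section 3.3.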

4 Settable Systems Formulation

This section places the example within the settable system framework. This permits separating causal and probabilistic semantics and addressing the difficulties, discussed in section 3, that arise for the PCM in connecting causal and probabilistic relations. A settable systems model for our advice-action-outcome scenario represents an environment in which advice, action, and outcome may be random. Recall that random variables are measurable mappings from an underlying measurable space, say (Ω, F), to the real line or one of its subsets. For example, in our scenario, the agent's action Y2^c is a random variable. Thus, it is not a real number but rather a measurable mapping Y2^c : Ω → R, say.³ We left these underlying details implicit above, but to properly describe SS, we must now make them explicit. Also, we did not distinguish above between the agent's random action Y2^c, determined in response to the advisor's advice, and the random action determining the random outcome, Y3^c. In SS, however, this distinction is crucial. For example, in SS, we distinguish between an action determined

³ Y2^c determines a real number y2^c = Y2^c(ω) for each ω ∈ Ω; y2^c is called a realized value or realization of Y2^c. Measurability means that {ω : Y2^c(ω) ≤ y} ∈ F for all y ∈ R.

in response to other system variables, that is, a response Y2, and an arbitrary action that may determine other system variables, that is, a setting Z2. Thus, in SS, actions come in two flavors: responses and settings. The same is true for all other system variables. This enforces Strotz and Wold's (1960) prescription to distinguish between arbitrary values for system variables and values determined by the operation of the system, for example, as determined by optimization or equilibrium.⁴ To accomplish a similar objective, the PCM defines degenerate settings via submodels (Pearl, 2000; see also Avin et al., 2005), given a unique fixed point assumption (not required here), with extensions to stochastic interventions given in Pearl (2000, section 4.2; see also Shpitser, Richardson, & Robins, 2009).

In the special case, further discussed below, where Zi = Yi for all i, we call the responses and settings canonical, as these represent the unimpeded evolution of the system. In this case, we write Y2^c ≡ Z2^c ≡ Z2 = Y2 and refer to these as canonical responses or canonical settings. Thus, our example concerns a canonical system; settable systems describe structures that underlie (and include) canonical systems.

To provide a convenient foundation for our advice-action-outcome example, we take Ω to be the product Ω0 × Ω1 × Ω2 × Ω3, with each Ωi a copy of the principal space Ω0. An element ω ≡ (ω0, . . . , ω3) ∈ Ω represents a possibility, with components ωi determining the value of setting Zi and ω0 representing a state of nature—typically unobservable—that may directly influence responses. This product structure is not necessary, but it is convenient, as it enables a clear delineation of the various channels of influence.

First, consider a setting value for expert e's advice, Z1(ω1). Formally, Z1 : Ω1 → S1, where S1 ⊆ R represents the admissible values for e's advice.
Agent a’s response value is

Y2(ω0, ω1) ≡ r2(ω0, Z1(ω1)),

where r2 is a measurable response function that maps e's advice and the state of nature ω0 to a's action, taking values in S[2] ⊆ R, the admissible values for a's response. The response function can be constant in ω0, so only e's advice determines a's action; it can be constant in Z1(ω1), so a always ignores e's advice; or it can depend on both, yielding a random action Y2 determined in part by e's advice and in part by other random factors. Typically r2 embodies some underlying principle or law; for example, a may choose her action to optimize her expected utility.

In contrast to the response Y2, the setting Z2 : Ω2 → S2 (with S[2] ⊆ S2 ⊆ R) gives an arbitrary (typically suboptimal) action value, Z2(ω2). Generally

⁴ Strotz and Wold (1960) describe this as wiping out equations of the system that otherwise govern the behavior of a given variable and replacing these with arbitrary values.

the outcome response value can depend on the state of nature, ω0, e's advice, Z1(ω1), and a's action, Z2(ω2), that is,

$$Y_3(\omega_0, \omega_1, \omega_2) \equiv r_3(\omega_0, Z_1(\omega_1), Z_2(\omega_2)),$$

where $r_3$ is the outcome response function. In our advice-action-outcome example, $r_3$ does not depend on e's advice $Z_1(\omega_1)$; we relax this restriction below. Unlike the PCM in section 3.2 (see also Pearl, 2000), the setting $Z_2$ of a's action need not coincide with a's response $Y_2$. As mentioned above, the PCM employs submodels instead of settings.

In SS, both the setting $Z_2$ and the response $Y_2$ refer to a's action. Together they define the settable variable $\mathcal{X}_2 : \{0, 1\} \times \Omega \to \mathsf{S}_2$ for a's action as

$$\mathcal{X}_2(0, \omega) \equiv Y_2(\omega_0, \omega_1) \quad \text{and} \quad \mathcal{X}_2(1, \omega) \equiv Z_2(\omega_2), \qquad \omega \in \Omega.$$

Similarly, the outcome setting $Z_3$ and response $Y_3$ define the outcome settable variable $\mathcal{X}_3$ as $\mathcal{X}_3(0, \omega) \equiv Y_3(\omega_0, \omega_1, \omega_2)$ and $\mathcal{X}_3(1, \omega) \equiv Z_3(\omega_3)$.

In our example, the expert's advice is not determined by a's action settings $Z_2$ or the outcome settings $Z_3$. We call variables not directly determined by any system variables other than $\omega_0$ or $\omega_1$ fundamental, and we write fundamental response values $Y_1(\omega_0) \equiv r_1(\omega_0)$ for some response function $r_1$. The corresponding fundamental settable variables are $\mathcal{X}_1(0, \omega) \equiv Y_1(\omega_0)$ and $\mathcal{X}_1(1, \omega) \equiv Z_1(\omega_1)$. For fundamental variables, it is often convenient to let $\omega_1 \equiv \omega_0$ and set $r_1(\omega_0) \equiv Z_1(\omega_1)$, so that $Y_1 = Z_1$.

It is also convenient to define the principal setting $Z_0$ as the identity mapping $Z_0 : \Omega_0 \to \Omega_0$. The corresponding principal response $Y_0$ and principal settable variable $\mathcal{X}_0$ are such that, by convention, $\mathcal{X}_0(1, \omega) \equiv Z_0(\omega_0) = \omega_0$ and $\mathcal{X}_0(0, \omega) \equiv Y_0(\omega) = Z_0(\omega_0)$. Writing $\mathcal{X} \equiv \{\mathcal{X}_0, \mathcal{X}_1, \mathcal{X}_2, \mathcal{X}_3\}$, we can represent this example as the settable system $\mathcal{S} \equiv \{(\Omega, \mathcal{F}), \mathcal{X}\}$.

A significant feature of the SS formalism is that the presence of the principal settable variable $\mathcal{X}_0$ obviates the need to introduce background variables, as in the PCM, to induce randomness in the responses. Instead, SS explicitly specify the dependence of the responses on other settings and on elements of the principal space $\Omega_0$ indexing states of nature (see also Heckerman & Shachter, 1995). Nor can we dispense with this structure without dispensing with the foundations needed to formalize stochastic behavior. Systems in which the principal settable variable is absent (i.e., constant) are necessarily nonstochastic.

In $\mathcal{S}$, the action and outcome responses $Y_2$ and $Y_3$ are determined separately as functions of all other system settings. Alternatively, we may consider what happens when the action and outcome responses are jointly determined under uncertainty, given a setting of e's advice.
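The division of labor between settings and responses in this example can be made concrete in a short Python sketch. The response functions `r1`, `r2`, `r3` below are hypothetical illustrative forms (the text leaves them abstract); only the structure matters: settings are arbitrary admissible values, while responses are determined by the system.

```python
import math

# Hypothetical response functions for the advice-action-outcome example.
# The paper leaves r1, r2, r3 abstract; these forms are illustration only.
def r1(w0):                 # expert e's advice, driven by the state of nature
    return math.tanh(w0)

def r2(w0, z1):             # agent a's action, given state of nature and advice
    return 0.5 * z1 + 0.1 * w0

def r3(w0, z1, z2):         # outcome; here r3 ignores z1, as in the example
    return z2 + 0.2 * w0

# Settings are arbitrary admissible values; responses are system-determined.
w0, z1, z2 = 0.3, 0.8, -1.0
Y2 = r2(w0, z1)             # a's response to the advice setting z1 in state w0
Y3 = r3(w0, z1, z2)         # outcome response to settings (z1, z2) in state w0
print(Y2, Y3)
```

Note that `r3` is constant in its advice argument, mirroring the example's restriction that the outcome does not depend directly on e's advice.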
To represent this, we partition the system's $n = 3$ units into blocks. Specifically, consider blocking together units 2 and 3, separate from a block including just unit

1. This is represented by the partition $\Pi \equiv \{\Pi_1, \Pi_2\}$, where $\Pi_1 = \{1\}$ and $\Pi_2 = \{2, 3\}$. In this case, responses $Y_2^{\Pi}$ and $Y_3^{\Pi}$ are jointly determined as

$$Y_i^{\Pi}(\omega) \equiv r_i^{\Pi}(Z_1^{\Pi}(\omega_1), \omega_0), \qquad \text{for } i \in \Pi_2,$$

with $Z_1^{\Pi}$ a setting of e's advice under this partition. Thus, settings and response functions $r_i^{\Pi}$ are partition specific. This system contrasts with the elementary partitioned settable system $\mathcal{S}$. In the elementary partition, each unit $i$ forms its own block $\Pi_i^{e} = \{i\}$. Partitioning in SS permits operations analogous to those implemented by the submodel and do operator devices of the PCM. For the remainder of this example, we work with the elementary partition $\Pi^{e} = \{\Pi_i^{e}, i = 1, 2, 3\}$ defining $\mathcal{S} = \mathcal{S}^{e} \equiv \{(\Omega, \mathcal{F}), (\Pi^{e}, \mathcal{X}^{e})\}$, where now the partition and the partition dependence of the settable variables are made explicit.

An important feature of this system $\mathcal{S}^{e}$ is the inherent ordering of the variables, such that settings of $\mathcal{X}_i$ may determine responses of $\mathcal{X}_j$ only if $i < j$. When this holds, we say that the system is a recursive partitioned settable system.

4.1 Causality in Settable Systems. What does it mean for e's recommendation $\mathcal{X}_1$ to directly cause a's action $\mathcal{X}_2$ in settable system $\mathcal{S}$? To formalize this notion, we first define an admissible intervention $(z_0, z_1, z_2, z_3) \to (z_0^*, z_1^*, z_2^*, z_3^*)$ to $(\mathcal{X}_0, \mathcal{X}_1, \mathcal{X}_2, \mathcal{X}_3)$ to be two points $(z_0, z_1, z_2, z_3)$ and $(z_0^*, z_1^*, z_2^*, z_3^*)$ in the admissible space $\mathsf{S}_{[0:3]} \subseteq \Omega_0 \times \mathsf{S}_1 \times \mathsf{S}_2 \times \mathsf{S}_3$, the space of all jointly admissible setting values. Underlying this intervention is a primary intervention defined as two possibilities $\omega \to \omega^*$, $\omega = (\omega_0, \omega_1, \omega_2, \omega_3)$ and $\omega^* = (\omega_0^*, \omega_1^*, \omega_2^*, \omega_3^*)$.

Often we may hold constant all but one element in considering interventions, for example, $\omega \to \omega^*$, where $\omega^* = (\omega_0, \omega_1^*, \omega_2, \omega_3)$, yielding setting values $z_1 = Z_1(\omega_1)$ and $z_1^* = Z_1(\omega_1^*)$ of e's recommendations in a state of nature $z_0 = \omega_0$. Due to the recursive structure, any differences between $(\omega_2, \omega_3)$ and $(\omega_2^*, \omega_3^*)$ are irrelevant. Thus, it suffices here just to consider pairs of possibilities in $\mathsf{S}_{[0:1]} \subseteq \Omega_0 \times \mathsf{S}_1$. Generally, constraints on joint setting values $(z_0, z_1)$ may imply that $\mathsf{S}_{[0:1]} \neq \Omega_0 \times \mathsf{S}_1$; otherwise, $\mathsf{S}_{[0:1]} = \Omega_0 \times \mathsf{S}_1$.

Causal relations in SS are features of the response functions over their domain. For example, we say that $\mathcal{X}_1$ directly causes $\mathcal{X}_2$ in $\mathcal{S}$ if there exists an admissible intervention $(z_0, z_1) \to (z_0, z_1^*)$ with a nonzero direct effect $\Delta^{D}_{1,2,\mathcal{S}}(z_0, z_1, z_1^*) \equiv r_2(z_0, z_1^*) - r_2(z_0, z_1)$, and we write $\mathcal{X}_1 \Rightarrow^{D}_{\mathcal{S}} \mathcal{X}_2$. Otherwise, $\mathcal{X}_1$ does not directly cause $\mathcal{X}_2$ in $\mathcal{S}$, written $\mathcal{X}_1 \not\Rightarrow^{D}_{\mathcal{S}} \mathcal{X}_2$.

We emphasize that this defines causal relations in terms of settable variables rather than in terms of arbitrary random variables or events. The latter have no necessary causal structure beyond that arising from the fact that random variables are measurable functions of some underlying $\omega$. In contrast, settable variables embody explicit structural relations holding

Figure 3: G3.

among the variables of the system. This further entails that causal relations are always relative to a settable system and, in particular, relative to the governing partition.

Similarly, we can formalize the notion of $\mathcal{X}_1$ directly causing $\mathcal{X}_3$ in system $\mathcal{S}$. Specifically, a setting $Z_1$ of e's recommendation may directly influence a's outcome $Y_3$. For example, this may represent a form of placebo effect in the physician and patient example. We say that $\mathcal{X}_1$ directly causes $\mathcal{X}_3$ in $\mathcal{S}$ if there exists an admissible intervention $(z_0, z_1, z_2) \to (z_0, z_1^*, z_2)$ to $(\mathcal{X}_0, \mathcal{X}_1, \mathcal{X}_2)$ such that $\Delta^{D}_{1,3,\mathcal{S}}(z_0, z_2, z_1, z_1^*) \equiv r_3(z_0, z_1^*, z_2) - r_3(z_0, z_1, z_2) \neq 0$, and we then write $\mathcal{X}_1 \Rightarrow^{D}_{\mathcal{S}} \mathcal{X}_3$. Otherwise, $\mathcal{X}_1$ does not directly cause $\mathcal{X}_3$ in $\mathcal{S}$, written $\mathcal{X}_1 \not\Rightarrow^{D}_{\mathcal{S}} \mathcal{X}_3$. Below, we discuss the sense in which $\mathcal{X}_1$ acts on $\mathcal{X}_3$ without operating through any other system variable; in such cases, we say that $\mathcal{X}_1$ causes $\mathcal{X}_3$ exclusive of $\mathcal{X}_2$ in $\mathcal{S}$, written $\mathcal{X}_1 \Rightarrow^{\sim\{2\}}_{\mathcal{S}} \mathcal{X}_3$. These two concepts are closely related, but it will be important to distinguish them.

It can be useful to visually represent causal relations in $\mathcal{S}$ using a direct causality graph. This consists of a collection of nodes corresponding to settable variables and a collection of directed arrows between nodes. A directed arrow links one node to another if and only if the first is a direct cause of the second. For example, Figure 3 visualizes possible direct causality relations in system $\mathcal{S}$. We emphasize that direct causality graphs are neither probabilistic DAGs nor PCM DAGs.
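The existence test underlying direct causality can be probed mechanically. The sketch below, with an illustrative response function and hypothetical finite grids of admissible setting values (assumptions made here so the search is exhaustive), searches interventions $(z_0, z_1) \to (z_0, z_1^*)$ for a nonzero direct effect:

```python
# Sketch: decide whether X1 directly causes X2 by searching admissible
# interventions (z0, z1) -> (z0, z1*) for a nonzero direct effect
# r2(z0, z1*) - r2(z0, z1). The grids and r2 are illustrative assumptions.
def r2(z0, z1):
    return 0.5 * z1 + 0.1 * z0

def direct_effect(r, z0, z1, z1_star):
    # holds the state of nature z0 fixed, varies only the advice setting
    return r(z0, z1_star) - r(z0, z1)

Z0_grid = [0.0, 1.0]          # admissible states of nature (assumed finite)
Z1_grid = [0.0, 0.5, 1.0]     # admissible advice values (assumed finite)
directly_causes = any(
    direct_effect(r2, z0, z1, z1s) != 0
    for z0 in Z0_grid for z1 in Z1_grid for z1s in Z1_grid if z1s != z1
)
print(directly_causes)  # True, so X1 => X2 directly for this r2
```

A response function constant in $z_1$ would make every such direct effect zero, so the search would report that $\mathcal{X}_1$ does not directly cause $\mathcal{X}_2$.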

4.2 Conditional Independence in Settable Systems. We now consider conditional independence relations among certain random variables in settable system $\mathcal{S}$. Here we focus on the responses of settable variables in idle regimes (Pearl, 2000; Dawid, 2010a), that is, in an environment where the system operates naturally (i.e., without interference). For this, we focus on canonical settings of e's recommendation and a's action and outcome, denoted by $Z_1^c$, $Z_2^c$, and $Z_3^c$, determined as⁵

5. Note that $\omega_1$, $\omega_2$, and $\omega_3$ appear as arguments of $Z_1^c$, $Z_2^c$, and $Z_3^c$, respectively. Thus, $\omega_1$, $\omega_2$, and $\omega_3$ must be functions of $\omega_0$. Without loss of generality, we may take $\omega_1 = \omega_2 = \omega_3 = \omega_0$.

Figure 4: G4.

$$Z_1^c(\omega_1) = Y_1^c(\omega_0) = r_1(\omega_0),$$
$$Z_2^c(\omega_2) = Y_2^c(\omega_0, \omega_1) = r_2(\omega_0, Z_1^c(\omega_1)),$$
$$Z_3^c(\omega_3) = Y_3^c(\omega_0, \omega_1, \omega_2) = r_3(\omega_0, Z_1^c(\omega_1), Z_2^c(\omega_2)).$$
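This recursion can be sketched directly: in the canonical regime, settings coincide with responses, so the whole system is evaluated by forward substitution from the state of nature $\omega_0$ alone. The functional forms below are hypothetical.

```python
# Canonical responses: settings coincide with responses, so the system is
# evaluated by forward recursion from the state of nature w0 alone.
# r1, r2, r3 are hypothetical illustrative forms.
def r1(w0): return 2.0 * w0
def r2(w0, z1): return z1 + w0
def r3(w0, z1, z2): return z2 - w0

def canonical_responses(w0):
    y1 = r1(w0)              # Z1^c = Y1^c
    y2 = r2(w0, y1)          # Z2^c = Y2^c, fed the canonical advice
    y3 = r3(w0, y1, y2)      # Z3^c = Y3^c, fed the canonical advice and action
    return y1, y2, y3

print(canonical_responses(0.5))  # (1.0, 1.5, 1.0)
```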

Given a possibility $\omega$, expert e recommends $Y_1^c(\omega_0)$ based solely on $\omega_0$, and a responds with $Y_2^c(\omega_0) = r_2(\omega_0, Y_1^c(\omega_0))$, yielding an outcome $Y_3^c(\omega_0) = r_3(\omega_0, Y_1^c(\omega_0), Y_2^c(\omega_0))$. We call $Y_1^c$, $Y_2^c$, and $Y_3^c$ canonical responses. In this special case, settings and responses of settable variables coincide. These now correspond to the PCM endogenous variables discussed in section 3.2. The PCM fixed point requirement holds trivially because of recursivity. We may also refer to the principal canonical setting and response $Z_0^c = Y_0^c = Z_0$.

By focusing on canonical settings, we specify the regimes underlying canonical responses, as in Dawid (2002, 2010a, 2010b). Nevertheless, we maintain that response functions are invariant to different settings of a system's settable variables. This is essentially without loss of generality, as the response function fully embodies the consequences for the response of any change in the argument settings.

Now let $P$ be a probability measure on $(\Omega, \mathcal{F})$. How can conditional independence relations hold among canonical responses $Y_1^c$, $Y_2^c$, and $Y_3^c$? We distinguish between two possibilities. First, we consider conditional independence relations that hold among canonical responses for any probability measure. Second, we consider conditional independence relations that may hold only for some probability measures on $(\Omega, \mathcal{F})$.

To illustrate, suppose that e's recommendation fully determines a's action, such as when patient a fully complies with doctor e's advice. Then $\mathcal{X}_0$ has no impact on $\mathcal{X}_2$ except through $\mathcal{X}_1$. (Having introduced canonical settings, effects via and exclusive of other settable variables are meaningful.) Here, $\mathcal{X}_0 \not\Rightarrow^{\sim\{1\}}_{\mathcal{S}} \mathcal{X}_2$, since for all admissible interventions to $(\mathcal{X}_0, \mathcal{X}_1)$, the causal effect of $\mathcal{X}_0$ on $\mathcal{X}_2$ exclusive of $\mathcal{X}_1$ in $\mathcal{S}$ is zero:

$$\Delta^{\sim\{1\}}_{0,2,\mathcal{S}}(z_0, z_0^*, z_1) \equiv r_2(z_0^*, z_1) - r_2(z_0, z_1) = 0.$$

Here, this also corresponds to $\mathcal{X}_0 \not\Rightarrow^{D}_{\mathcal{S}} \mathcal{X}_2$ (see Figure 4). Thus, we can write $Y_2(\omega) = r_2(\omega_0, Z_1(\omega_1)) = \tilde{r}_2(Z_1(\omega_1))$ for some measurable function $\tilde{r}_2$.

Figure 5: G5.

In particular, canonical settings $Z_0^c$ and $Z_1^c$ yield the canonical response $Y_2^c = r_2(Y_0^c, Y_1^c) = \tilde{r}_2(Y_1^c)$. It follows that for any probability measure $P$, $\mathcal{X}_0 \not\Rightarrow^{\sim\{1\}}_{\mathcal{S}} \mathcal{X}_2$ implies that $Y_2^c \perp Y_3^c \mid Y_1^c$. In this circumstance, we say that $\mathcal{X}_2$ and $\mathcal{X}_3$ are causally isolated given $\mathcal{X}_1$. The isolation is from the potential common cause $\mathcal{X}_0$. Because Figure 4 is a direct causality graph and not a probabilistic DAG, the notion of d-separation does not apply to it. Naively (mis)applying d-separation to G4, we see that $\mathcal{X}_2$ and $\mathcal{X}_3$ are not d-separated by $\mathcal{X}_1$ there.

Similarly, suppose that a's action fully determines the outcome. Then $\mathcal{X}_0 \not\Rightarrow^{\sim\{2\}}_{\mathcal{S}} \mathcal{X}_3$, so that for all admissible interventions $(z_0, r_1(z_0), z_2) \to (z_0^*, r_1(z_0^*), z_2)$ to $(\mathcal{X}_0, \mathcal{X}_1, \mathcal{X}_2)$, we have

$$\Delta^{\sim\{2\}}_{0,3,\mathcal{S}}(z_0, z_0^*, z_2) \equiv r_3(z_0^*, r_1(z_0^*), z_2) - r_3(z_0, r_1(z_0), z_2) = 0,$$

and now we say that $\mathcal{X}_1$ and $\mathcal{X}_3$ are causally isolated given $\mathcal{X}_2$. We can then write $Y_3(\omega) = r_3(\omega_0, Z_1(\omega_1), Z_2(\omega_2)) = \tilde{r}_3(Z_2(\omega_2))$ for some measurable function $\tilde{r}_3$. In particular, the canonical response $Y_3^c$ is given by $Y_3^c = r_3(Y_0^c, Y_1^c, Y_2^c) = \tilde{r}_3(Y_2^c)$, and therefore for any probability measure $P$, $\mathcal{X}_0 \not\Rightarrow^{\sim\{2\}}_{\mathcal{S}} \mathcal{X}_3$ implies that $Y_1^c \perp Y_3^c \mid Y_2^c$. A sufficient condition for $\mathcal{X}_0 \not\Rightarrow^{\sim\{2\}}_{\mathcal{S}} \mathcal{X}_3$ is that $\mathcal{X}_0 \not\Rightarrow^{D}_{\mathcal{S}} \mathcal{X}_3$ and $\mathcal{X}_1 \not\Rightarrow^{D}_{\mathcal{S}} \mathcal{X}_3$, as depicted in Figure 5.

These examples demonstrate that for any probability measure $P$, conditional causal isolation is sufficient for conditional independence among canonical responses. It follows that failure of conditional causal isolation is a necessary requirement for conditional dependence for some $P$. In particular, settable variables $\mathcal{X}_2$ and $\mathcal{X}_3$ must share the principal settable variable $\mathcal{X}_0$ as a common cause exclusive of the third variable $\mathcal{X}_1$ in order for the canonical responses $Y_2^c$ and $Y_3^c$ to be conditionally dependent given $Y_1^c$. We term this result the conditional Reichenbach principle of common cause.

The unconditional counterpart of this result formally establishes Reichenbach's principle of common cause in recursive SS. This states that, trivially, the principal settable variable $\mathcal{X}_0$ must be a common cause of two settable variables, say $\mathcal{X}_1$ and $\mathcal{X}_2$, in order for their canonical responses $Y_1^c$ and $Y_2^c$ to be dependent. Otherwise, either $Y_1^c$ or $Y_2^c$ (or both) is a constant.
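A quick numerical illustration of the full-compliance case, under an assumed distribution for $\omega_0$ and hypothetical functional forms: because $Y_2^c = \tilde{r}_2(Y_1^c)$, the canonical action is degenerate (constant) conditional on the canonical advice, and a degenerate variable is conditionally independent of anything, under any $P$.

```python
import random

# Full compliance: a's action depends on e's advice only, so Y2^c = r2~(Y1^c)
# and, conditional on Y1^c, Y2^c is degenerate. The discretized advice rule
# and the distribution of w0 are illustrative assumptions.
def r1(w0): return round(w0)          # advice takes finitely many values
def r2_tilde(z1): return 3 * z1 + 1   # a complies exactly with the advice

random.seed(0)
groups = {}
for _ in range(1000):
    w0 = random.uniform(0, 2)         # assumed distribution of the state
    y1 = r1(w0)
    y2 = r2_tilde(y1)
    groups.setdefault(y1, set()).add(y2)

# Within each Y1^c cell, Y2^c takes a single value: degenerate given Y1^c,
# hence conditionally independent of Y3^c (or anything else) for any P.
print(all(len(v) == 1 for v in groups.values()))  # True
```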

Figure 6: G6.

Figure 7: G7.

Observe that we do not need to employ, nor do we employ, notions of d-separation in the above discussion, as these are irrelevant here.

On the other hand, conditional causal isolation is not necessary for conditional independence. The latter may arise for specific probability measures and causal relations. Systems analogous to Markovian PCMs fall into this category. SS can accommodate such special systems but need not be confined to them. To illustrate, consider a Markovian-type settable system $\mathcal{S}_M$ for the same example as above. In $\mathcal{S}_M$, the principal settable variable $\mathcal{X}_0$ does not directly cause settable variables $\mathcal{X}_4$, $\mathcal{X}_5$, and $\mathcal{X}_6$, corresponding now to e's recommendation, a's action, and the outcome. Instead, $\mathcal{S}_M$ assumes the existence of fundamental settable variables $\mathcal{X}_1$, $\mathcal{X}_2$, and $\mathcal{X}_3$ (so that $\mathcal{S}_M$ has $n = 6$ units) such that (see Figure 6):

$$Y_4(\omega) = r_4(Z_1(\omega_1)),$$
$$Y_5(\omega) = r_5(Z_2(\omega_2), Z_4(\omega_4)),$$
$$Y_6(\omega) = r_6(Z_3(\omega_3), Z_4(\omega_4), Z_5(\omega_5)).$$

Finally and importantly, the probability measure $P$ ensures that the canonical settings $(Z_1^c, Z_2^c, Z_3^c)$ are jointly independent (securing the Markov property).

Now consider probabilistic DAG G7, isomorphic to the subgraph involving $\mathcal{X}_4$, $\mathcal{X}_5$, and $\mathcal{X}_6$ in Figure 6. In Figure 7, we replace settable variables with canonical responses $Y_4^c$, $Y_5^c$, and $Y_6^c$ at the nodes. Arguments similar to those in section 3.3 demonstrate that conditional causal isolation, together with the joint independence imposed on $(Z_1^c, Z_2^c, Z_3^c)$ via $P$, imply that the local Markov property holds in DAGs such as G7. Hence, one can

Figure 8: G8.

Figure 9: G9.

Figure 10: G10.

Figure 11: G11.

apply d-separation to learn about conditional independence relations among canonical responses $(Y_4^c, Y_5^c, Y_6^c)$.

For instance, if $\mathcal{X}_4 \not\Rightarrow^{D}_{\mathcal{S}_M} \mathcal{X}_6$ as in Figure 8, we conclude that $Y_4^c \perp Y_6^c \mid Y_5^c$, since $Y_5^c$ d-separates $Y_4^c$ and $Y_6^c$ in probabilistic DAG G9 (see Figure 9). This result formalizes the intuition that conditioning on a variable that fully mediates the effects of a cause on an effect renders the cause and effect conditionally independent. Similarly, if $\mathcal{X}_5 \not\Rightarrow^{D}_{\mathcal{S}_M} \mathcal{X}_6$ as in Figure 10, we conclude that $Y_5^c \perp Y_6^c \mid Y_4^c$, since $Y_4^c$ d-separates $Y_5^c$ and $Y_6^c$ in probabilistic DAG G11 (see Figure 11), also formalizing the rough intuition that conditioning on a common cause of two variables renders these conditionally independent.

When, as is true here, canonical responses are such that $Y_5^c \perp Y_6^c \mid Y_4^c$ even though $\mathcal{X}_5$ and $\mathcal{X}_6$ are not causally isolated given $\mathcal{X}_4$, we say that $\mathcal{X}_5$

Figure 12: G12.

Figure 13: G13.

and $\mathcal{X}_6$ are $P$-stochastically isolated given $\mathcal{X}_4$. In such cases, conditional independence holds only for specific probability measures, such as for the Markovian system of our example.

Next, suppose that $\mathcal{X}_4 \not\Rightarrow^{D}_{\mathcal{S}_M} \mathcal{X}_5$ as in Figure 12. Then $Y_4^c$ and $Y_5^c$ are not d-separated given $Y_6^c$ in probabilistic DAG G13 (see Figure 13). Although lack of d-separation does not generally ensure that $Y_4^c \not\perp Y_5^c \mid Y_6^c$, it is often assumed that this is the case. For example, SGS refer to distributions in which failure of d-separation implies conditional dependence as "faithful," and Pearl (2000) refers to such distributions as "stable." Below, we give general conditions, valid for both Markovian and non-Markovian systems, under which this conclusion holds (see also Wermuth & Cox, 2004). When $Y_4^c \not\perp Y_5^c \mid Y_6^c$ holds, the heuristic intuition that conditioning on a common response induces dependence among independent causes is made formal.

We emphasize that the assumptions about the existence of fundamental variables $\mathcal{X}_1$, $\mathcal{X}_2$, and $\mathcal{X}_3$; the (lack of) causal relations involving these fundamental variables; and the canonical settings $Z_1^c$, $Z_2^c$, and $Z_3^c$ being jointly independent are very strong assumptions that need not hold in general and that must be carefully justified if imposed. Nor are these assumptions and graphical criteria essential to the study of causal and probabilistic relations, as discussed above.
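The three d-separation claims just discussed (chain, fork, and collider) can be checked mechanically. The following is a small, naive path-enumeration d-separation checker, suitable only for tiny graphs such as G9, G11, and G13; production code would use a reachability algorithm such as Bayes-ball.

```python
def descendants(dag, node):
    # dag: dict node -> list of children
    seen, stack = set(), [node]
    while stack:
        n = stack.pop()
        for c in dag.get(n, []):
            if c not in seen:
                seen.add(c); stack.append(c)
    return seen

def d_separated(dag, x, y, Z):
    # Enumerate simple undirected paths; a path is blocked if some
    # non-collider on it lies in Z, or some collider has no descendant
    # (including itself) in Z. Naive: only for small graphs.
    Z = set(Z)
    parents = {}
    for p, cs in dag.items():
        for c in cs:
            parents.setdefault(c, []).append(p)
    nbrs = {n: set(dag.get(n, [])) | set(parents.get(n, []))
            for n in set(dag) | set(parents)}
    def paths(path):
        if path[-1] == y:
            yield path
            return
        for n in nbrs.get(path[-1], ()):
            if n not in path:
                yield from paths(path + [n])
    for path in paths([x]):
        blocked = False
        for i in range(1, len(path) - 1):
            a, b, c = path[i - 1], path[i], path[i + 1]
            collider = b in dag.get(a, []) and b in dag.get(c, [])
            if collider:
                if not ({b} | descendants(dag, b)) & Z:
                    blocked = True; break
            elif b in Z:
                blocked = True; break
        if not blocked:
            return False          # an active path exists
    return True

chain = {4: [5], 5: [6]}          # Y4 -> Y5 -> Y6, as in Figure 9
fork = {4: [5, 6]}                # Y5 <- Y4 -> Y6, as in Figure 11
collider = {4: [6], 5: [6]}       # Y4 -> Y6 <- Y5, as in Figure 13
print(d_separated(chain, 4, 6, {5}),     # True: mediator blocks
      d_separated(fork, 5, 6, {4}),      # True: common cause blocks
      d_separated(collider, 4, 5, {6}))  # False: conditioning on a collider
```

Unconditionally, the collider graph does d-separate 4 and 5, matching the intuition that independent causes become dependent only once their common response is conditioned on.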

5 Settable Systems

In this and the following two sections, we expand on the content of the example to arrive at a general framework connecting causal relations and conditional independence. For this, we use the SS extension of the PCM. In this section, we briefly describe specialized versions of White and Chalak's (2009) definitions that are sufficient for our purposes. We refer the interested reader to WC, White, Chalak, and Lu (2010), and White, Xu, and Chalak (2011) for detailed discussions of SS, their relationship to the PCM, and a series of examples demonstrating how SS extend the PCM to accommodate optimization, nonunique equilibrium, and learning.

Heuristically, a stochastic settable system is a mathematical framework describing an environment in which a number of units interact under uncertainty. A unit is construed broadly. It could be a neuron, person, machine, firm, market, or a player-decision pair in decision or game theory, for example. There may be a countable infinity of units $i$, $i = 1, \ldots, n$, where $n \in \bar{\mathbb{N}}^{+} \equiv \mathbb{N}^{+} \cup \{\infty\}$ and $\mathbb{N}^{+}$ denotes the positive integers. When $n = \infty$, we interpret $i = 1, \ldots, n$ as $i = 1, 2, \ldots$. Random variables are defined on a measurable space $(\Omega, \mathcal{F})$; this provides the foundation for probabilistic statements. It is often convenient to define a principal space $\Omega_0$ and let $\Omega \equiv \times_{i=0}^{n} \Omega_i$, with each $\Omega_i$ a copy of $\Omega_0$. An often convenient choice is $\Omega_0 = \mathbb{R}$.

In SS, there is a settable variable $\mathcal{X}_i$ for each unit $i$. $\mathcal{X}_i$ has a dual aspect. It can be set to a random variable $Z_i : \Omega_i \to \mathsf{S}_i$ (the setting), where $\mathsf{S}_i$, the admissible setting values for $Z_i$, is a multi-element subset of $\mathbb{R}$. Or it can be free to respond to settings of other settable variables in the system, denoted by the response $Y_i : \Omega \to \mathsf{S}_{[i]}$, with $\mathsf{S}_{[i]} \subseteq \mathsf{S}_i$ the admissible response values for $Y_i$.
The response $Y_i$ of $\mathcal{X}_i$ to the settings of other settable variables is determined by a response function, $r_i$, according to a governing principle such as optimization, yielding the response for unit $i$ that is best in some sense. The dual role of a settable variable $\mathcal{X}_i : \{0, 1\} \times \Omega \to \mathsf{S}_i$, distinguishing responses $\mathcal{X}_i(0, \omega) \equiv Y_i(\omega)$ and settings $\mathcal{X}_i(1, \omega) \equiv Z_i(\omega_i)$, $\omega \in \Omega$, enables us to formalize the directional nature of causal relations, whereby settings of some variables (causes) determine responses of others.

The principal unit $i = 0$ plays a key role in understanding and formalizing the connections between probabilistic and causal relations. We let the principal setting $Z_0$ and principal response $Y_0$ of the principal settable variable $\mathcal{X}_0$ be such that $Z_0 : \Omega_0 \to \Omega_0$ is the identity map, $Z_0(\omega_0) \equiv \omega_0$, and we define $Y_0(\omega) \equiv Z_0(\omega_0)$. The setting $Z_0$ of the principal settable variable may directly influence all other responses in the system, whereas its response $Y_0$ is unaffected by other settings. Thus, $\mathcal{X}_0$ introduces an aspect of pure randomness to responses of settable variables.

White and Chalak (2009) also explicitly reference attributes, that is, fixed objects (e.g., numbers such as $n$ or sets such as $\mathsf{S}_i$). For conciseness, we leave attributes implicit here.

5.1 Elementary Settable Systems. In elementary SS, the response $Y_i$ is determined (actually or potentially) by the settings of all other variables in the system, denoted $Z_{(i)}$. Thus, in elementary SS, $Y_i = r_i(Z_{(i)})$. The relation $Y_i = r_i(Z_{(i)})$ corresponds to a structural equation in the classical formulation of systems of structural equations (Heckman, 2005).

The settings $Z_{(i)}$ take values in $\mathsf{S}_{(i)} \subseteq \Omega_0 \times \prod_{j \neq i} \mathsf{S}_j$. We have $\mathsf{S}_{(i)} \subset \Omega_0 \times \prod_{j \neq i} \mathsf{S}_j$ if there are joint restrictions on the admissible setting values, as in "mixed-strategy" static games of complete information, for example, where certain elements of $\mathsf{S}_{(i)}$ might represent probabilities that must sum to one (see White & Chalak, 2009). We now give a formal definition of elementary SS.

Definition 1. An elementary settable system is a pair $\mathcal{S} \equiv \{(\Omega, \mathcal{F}), \mathcal{X}\}$ with components defined as follows. Let $(\Omega, \mathcal{F})$ be a measurable space such that $\Omega \equiv \times_{i=0}^{n} \Omega_i$, $n \in \bar{\mathbb{N}}^{+}$, with each $\Omega_i$ a copy of the principal space $\Omega_0$, containing at least two elements. To define $\mathcal{X} \equiv \{\mathcal{X}_0, \mathcal{X}_1, \ldots, \mathcal{X}_n\}$, let the principal setting $Z_0 : \Omega_0 \to \Omega_0$ be the identity mapping, and define the principal response $Y_0$ and principal settable variable $\mathcal{X}_0$ by $Y_0(\omega) \equiv \mathcal{X}_0(0, \omega) \equiv \mathcal{X}_0(1, \omega) \equiv Z_0(\omega_0)$, $\omega \in \Omega$. For $i = 1, 2, \ldots, n$, $n \in \bar{\mathbb{N}}^{+}$, let $\mathsf{S}_i$ be a multi-element Borel-measurable subset of $\mathbb{R}$, and let settings $Z_i : \Omega_i \to \mathsf{S}_i$ be surjective measurable functions. Let $Z_{(i)}$ be the vector including every setting except $Z_i$ and taking values in $\mathsf{S}_{(i)} \subseteq \Omega_0 \times \prod_{j \neq i} \mathsf{S}_j$, $\mathsf{S}_{(i)} \neq \emptyset$. Let response functions $r_i : \mathsf{S}_{(i)} \to \mathsf{S}_{[i]}$ with $\mathsf{S}_{[i]} \subseteq \mathsf{S}_i$ be measurable functions, and define responses $Y_i(\omega) \equiv r_i(Z_{(i)}(\omega))$. Settable variables $\mathcal{X}_i : \{0, 1\} \times \Omega \to \mathsf{S}_i$ are given by

$$\mathcal{X}_i(0, \omega) \equiv Y_i(\omega) \quad \text{and} \quad \mathcal{X}_i(1, \omega) \equiv Z_i(\omega_i), \qquad \omega \in \Omega.$$
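Definition 1 can be rendered as a minimal data structure. The sketch below is an illustrative implementation, not part of the formal development; it omits the measurability, surjectivity, and admissibility requirements and uses toy settings and response functions.

```python
# A minimal rendering of definition 1 (illustrative only; measurability,
# surjectivity, and admissibility restrictions are not enforced). Settings
# map coordinates w_i to admissible values; each response function takes the
# vector Z_(i) of all settings other than Z_i.
class ElementarySettableSystem:
    def __init__(self, settings, responses):
        self.Z = settings      # {i: callable w_i -> S_i}; Z[0] is the identity
        self.r = responses     # {i: callable on a dict of the other settings}

    def X(self, i, flag, w):
        # w = (w_0, ..., w_n); flag 1 gives the setting, flag 0 the response
        if flag == 1:
            return self.Z[i](w[i])
        if i == 0:
            return self.Z[0](w[0])   # principal response equals its setting
        z_other = {j: self.Z[j](w[j]) for j in self.Z if j != i}
        return self.r[i](z_other)

# A toy system with the principal unit 0 and n = 2 ordinary units.
S = ElementarySettableSystem(
    settings={0: lambda w0: w0, 1: lambda w1: w1, 2: lambda w2: w2},
    responses={1: lambda z: z[0] + z[2], 2: lambda z: 2 * z[1]},
)
w = (0.5, 1.0, 2.0)
print(S.X(1, 1, w), S.X(1, 0, w), S.X(2, 0, w))  # 1.0 2.5 2.0
```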

A settable system is thus composed of a stochastic component, the measurable space $(\Omega, \mathcal{F})$, and a structural or causal component $\mathcal{X}$, resting on the stochastic component and consisting of settable variables.

5.2 Partitioned Settable Systems. In definition 1, a single response $Y_i$ is free to respond to settings of all other variables in the system. We also wish to consider systems in which responses of several settable variables jointly depend on settings of the remaining variables in the system (Wermuth & Cox, 2004). This can occur, for example, when responses are determined as a solution to a joint optimization problem. Such specifications are formally implemented in SS by partitioning the system under study to group jointly responding variables into specific blocks. The system in definition 1 is called elementary because every unit $i$ forms a block by itself. We now define general partitioned SS.

Definition 2. A partitioned settable system is a pair $\mathcal{S} \equiv \{(\Omega, \mathcal{F}), (\Pi, \mathcal{X}^{\Pi})\}$ with components defined as follows. Let $(\Omega, \mathcal{F})$, $\mathcal{X}_0$, $n$, and $\mathsf{S}_i$, $i = 1, \ldots, n$, be as in definition 1. Let $\Pi = \{\Pi_b\}$ be a partition of $\{1, \ldots, n\}$, with cardinality $B \in \bar{\mathbb{N}}^{+}$ ($B \equiv \#\Pi$). To define $\mathcal{X}^{\Pi} \equiv \{\mathcal{X}_0, \mathcal{X}_1^{\Pi}, \mathcal{X}_2^{\Pi}, \ldots\}$, for $i = 1, 2, \ldots, n$, let $Z_i^{\Pi}$ be settings, and let $Z_{(b)}^{\Pi}$ be the vector containing $Z_0$ and $Z_i^{\Pi}$, $i \notin \Pi_b$, and taking values in $\mathsf{S}_{(b)}^{\Pi} \subseteq \Omega_0 \times \prod_{i \notin \Pi_b} \mathsf{S}_i$, $\mathsf{S}_{(b)}^{\Pi} \neq \emptyset$, $b = 1, \ldots, B$. For $b = 1, \ldots, B$ and $i \in \Pi_b$, suppose there exist $\mathsf{S}_{[i]}^{\Pi} \subseteq \mathsf{S}_i$ and measurable functions $r_i^{\Pi} : \mathsf{S}_{(b)}^{\Pi} \to \mathsf{S}_{[i]}^{\Pi}$, specific to $\Pi$, such

that responses $Y_i^{\Pi}(\omega)$ are jointly determined as $Y_i^{\Pi} \equiv r_i^{\Pi}(Z_{(b)}^{\Pi})$. Settable variables $\mathcal{X}_i^{\Pi} : \{0, 1\} \times \Omega \to \mathsf{S}_i$ are given by

$$\mathcal{X}_i^{\Pi}(0, \omega) \equiv Y_i^{\Pi}(\omega) \quad \text{and} \quad \mathcal{X}_i^{\Pi}(1, \omega) \equiv Z_i^{\Pi}(\omega_i), \qquad \omega \in \Omega.$$

The settings $Z_{(b)}^{\Pi}$ are allowed to be partition specific; this is especially relevant when the admissible set $\mathsf{S}_{(b)}^{\Pi}$ imposes restrictions on the admissible values of $Z_{(b)}^{\Pi}$. Crucially, response functions and responses are partition specific. In definition 2, the joint response function $r_{[b]}^{\Pi} \equiv (r_i^{\Pi}, i \in \Pi_b)$ specifies how the settings $Z_{(b)}^{\Pi}$ outside block $\Pi_b$ determine the joint response $Y_{[b]}^{\Pi} \equiv (Y_i^{\Pi}, i \in \Pi_b)$, that is, $Y_{[b]}^{\Pi} = r_{[b]}^{\Pi}(Z_{(b)}^{\Pi})$.

It is also convenient to let $\Pi_0 = \{0\}$ represent the block corresponding to $\mathcal{X}_0$.

5.3 Recursive Settable Systems. In what follows, we often consider recursive partitioned SS, defined next. For $0 \le a \le b$, we define $\Pi_{[a:b]} \equiv \Pi_a \cup \cdots \cup \Pi_{b-1} \cup \Pi_b$. (For $a < b$, $\Pi_{[b:a]} \equiv \emptyset$.)

Definition 3 (recursive partitioned settable system). Let $\mathcal{S}$ be a partitioned settable system. For $b = 0, 1, \ldots, B$, let $Z_{[0:b]}^{\Pi}$ denote the vector containing the settings $Z_i^{\Pi}$ for $i \in \Pi_{[0:b]}$ and taking values in $\mathsf{S}_{[0:b]} \subseteq \Omega_0 \times \prod_{i \in \Pi_{[1:b]}} \mathsf{S}_i$, $\mathsf{S}_{[0:b]} \neq \emptyset$. For $b = 1, \ldots, B$ and $i \in \Pi_b$, suppose that $r^{\Pi} \equiv \{r_i^{\Pi}\}$ is such that the responses $Y_i^{\Pi} = \mathcal{X}_i^{\Pi}(0, \cdot)$ are determined as $Y_i^{\Pi} \equiv r_i^{\Pi}(Z_{[0:b-1]}^{\Pi})$. Then we say that $\Pi$ is a recursive partition, $r^{\Pi}$ is recursive, and $\mathcal{S} \equiv \{(\Omega, \mathcal{F}), (\Pi, \mathcal{X}^{\Pi})\}$ is a recursive partitioned settable system or simply recursive.
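In a recursive partition, responses in block $b$ depend only on settings of blocks $0, \ldots, b-1$, so canonical values can be computed block by block. The sketch below is an illustrative rendering of this forward recursion, using the partition $\Pi_1 = \{1\}$, $\Pi_2 = \{2, 3\}$ from the earlier example and hypothetical response functions.

```python
# Sketch of a recursive partitioned system: responses in block b depend only
# on settings of blocks 0..b-1, so canonical values are computed block by
# block. Partition, response functions, and w0 are illustrative assumptions.
def evaluate_recursive(blocks, r, w0):
    # blocks: list of lists of unit indices, in order Pi_1, ..., Pi_B
    # r: {i: callable taking a dict of earlier settings (0 is the principal)}
    z = {0: w0}                       # principal setting Z_0
    for block in blocks:
        # joint determination within the block: all responses use the same
        # snapshot of earlier settings, never each other
        responses = {i: r[i](dict(z)) for i in block}
        z.update(responses)           # canonical regime: settings = responses
    return z

blocks = [[1], [2, 3]]                # Pi_1 = {1}, Pi_2 = {2, 3}
r = {1: lambda z: z[0] + 1,
     2: lambda z: 2 * z[1],
     3: lambda z: z[0] - z[1]}
print(evaluate_recursive(blocks, r, 1.0))  # {0: 1.0, 1: 2.0, 2: 4.0, 3: -1.0}
```

Taking a snapshot of the earlier settings before updating enforces that units 2 and 3 respond jointly to block-1 settings rather than to each other.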

We employ the structure of recursive systems to provide definitions of indirect and total causality. This facilitates comparing our results to the DAG-related literature.

6 Causality in Settable Systems

We now employ SS to give definitions of several notions of causality based on functional dependence: direct causality and indirect and total causality via and exclusive of a given set of variables. These notions refine and extend related concepts in the literature referenced below. We define causality in terms of settable variables rather than random variables or events, as is typical elsewhere. While they lend themselves to graphical representations, our definitions do not rely on properties of graphs.

For notational ease, we may suppress reference in what follows to the superscript $\Pi$ in $Z_i^{\Pi}$, $r_i^{\Pi}$, $Y_i^{\Pi}$, and $\mathcal{X}_i^{\Pi}$, but it should be borne in mind that these are partition specific.

6.1 Direct Causality and Direct Causality Graphs. Direct causality can be defined for both recursive and nonrecursive SS. Heuristically, we say that a settable variable $\mathcal{X}_i$, $i \notin \Pi_b$, directly causes $\mathcal{X}_j$, $j \in \Pi_b$, in $\mathcal{S}$ when there exist responses for $\mathcal{X}_j$ that differ for different settings of $\mathcal{X}_i$, while holding all other variables corresponding to units outside $\Pi_b$ to the same setting values. There are two main ingredients to this notion of direct causality.

Let $z_{(b)(i)}$ denote the vector containing all elements of setting values $z_{(b)}$ except $z_i$. The first ingredient is an admissible intervention, $(z_{(b)(i)}, z_i) \to (z_{(b)(i)}, z_i^*) \equiv ((z_{(b)(i)}, z_i), (z_{(b)(i)}, z_i^*))$, defined as a pair of elements of $\mathsf{S}_{(b)}$, where we abuse notation somewhat by conveniently reordering the vector arguments. This intervention references only setting values corresponding to units outside $\Pi_b$. Note also that it differs only in the final component. The second ingredient is the behavior of the response to this intervention. We formalize this notion of direct causality as follows:

Definition 4 (direct causality). Let $\mathcal{S}$ be a partitioned settable system. For given positive integer $b$, let $j \in \Pi_b$. (i) For given $i \notin \Pi_b$, $\mathcal{X}_i$ directly causes $\mathcal{X}_j$ in $\mathcal{S}$ if there exists an admissible intervention $(z_{(b)(i)}, z_i) \to (z_{(b)(i)}, z_i^*)$ such that the direct causal effect of $\mathcal{X}_i$ on $\mathcal{X}_j$ at $(z_{(b)(i)}, z_i, z_i^*)$ in $\mathcal{S}$ is nonzero:

$$\Delta^{D}_{i,j,\mathcal{S}}(z_{(b)(i)}, z_i, z_i^*) \equiv r_j(z_{(b)(i)}, z_i^*) - r_j(z_{(b)(i)}, z_i) \neq 0,$$

and we write $\mathcal{X}_i \Rightarrow^{D}_{\mathcal{S}} \mathcal{X}_j$. Otherwise, we say $\mathcal{X}_i$ does not directly cause $\mathcal{X}_j$ in $\mathcal{S}$ and write $\mathcal{X}_i \not\Rightarrow^{D}_{\mathcal{S}} \mathcal{X}_j$. (ii) For $i, j \in \Pi_b$, $\mathcal{X}_i \not\Rightarrow^{D}_{\mathcal{S}} \mathcal{X}_j$.

We emphasize that although we follow the literature in referring to interventions, with their mechanistic or manipulative connotations, the mathematical concept involves only the properties of a response function on its domain. According to this definition, direct causality may fail either because the set $\mathsf{S}_{(b)}$ is so constrained that it does not possess an admissible intervention of the desired form or because it does, but the response is the same for both elements of every admissible intervention of the specified form. The latter is perhaps the more common or intuitively appealing possibility, but we need not distinguish further between these possibilities.

Note that by definition, variables within the same block do not directly cause each other. In particular, $\mathcal{X}_i \not\Rightarrow^{D}_{\mathcal{S}} \mathcal{X}_i$. Also, definition 4 permits mutual causality, so that $\mathcal{X}_i \Rightarrow^{D}_{\mathcal{S}} \mathcal{X}_j$ and $\mathcal{X}_j \Rightarrow^{D}_{\mathcal{S}} \mathcal{X}_i$ without contradiction for $i$ and $j$ in different blocks. Mutual causality is ruled out in SGS (p. 42), for example, where it is an axiom that if A causes B, then B does not cause A.

Importantly, definition 4 permits an explicit causal role for $\mathcal{X}_0$. Similar to $\mathcal{X}_0$, PCM background variables are not determined by other system variables, but background variables explicitly cannot act as causes in the PCM. Thus, $\mathcal{X}_0$ serves here not only as a formal device to introduce randomness but also as a bearer of causal influence. This causal role may be controversial from an ontological perspective, but it is clearly formally coherent.⁶ Indeed, treating $\mathcal{X}_0$ as somehow noncausal introduces an arbitrary asymmetry into the functionally based notions of causality developed here: if causality is expressed as functional dependence and if system responses are functionally dependent on settings of $\mathcal{X}_0$, on what basis can $\mathcal{X}_0$ be treated as noncausal?
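Definition 4 can be probed computationally when the admissible setting values are given as finite grids (an assumption made here purely for illustration). The sketch searches, for each value of $z_{(b)(i)}$, whether the response to $r_j$ varies across settings of unit $i$; the response function below is a hypothetical one in which the outcome ignores the advice.

```python
from itertools import product

# Sketch of definition 4: X_i directly causes X_j if some admissible
# intervention (z_(b)(i), z_i) -> (z_(b)(i), z_i*) yields a nonzero direct
# effect on r_j. Admissible values are supplied as finite grids (an
# illustrative assumption that makes the search exhaustive).
def directly_causes(r_j, grids, i):
    # grids: {k: iterable of admissible values for setting k}
    # r_j: callable taking a dict of setting values for all its arguments
    others = sorted(k for k in grids if k != i)
    for combo in product(*(grids[k] for k in others)):
        base = dict(zip(others, combo))       # fixed z_(b)(i)
        vals = {r_j({**base, i: zi}) for zi in grids[i]}
        if len(vals) > 1:   # response differs across settings of unit i
            return True
    return False

r3 = lambda z: z[0] + z[2]    # outcome ignores z[1], as in the example
grids = {0: [0.0, 1.0], 1: [0.0, 1.0], 2: [0.0, 1.0]}
print(directly_causes(r3, grids, 1),  # False: X1 does not directly cause X3
      directly_causes(r3, grids, 2))  # True: X2 does
```

Shrinking a grid to a single point models the first failure mode in the text: a set $\mathsf{S}_{(b)}$ so constrained that no admissible intervention of the required form exists.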
We do not expect that this treatment will be palatable to all, but its formal coherence and the insight it lends into the Reichenbach principle and into other causal questions make it worthy of careful consideration.

The direct causal effect of $\mathcal{X}_i$ on $\mathcal{X}_j$ at $(z_{(b)(i)}, z_i, z_i^*)$ in $\mathcal{S}$, $\Delta^{D}_{i,j,\mathcal{S}}(z_{(b)(i)}, z_i, z_i^*)$, in definition 4 corresponds to the notion of a "controlled" direct effect in Pearl (2001). Nevertheless, the PCM requires a unique fixed point, a requirement absent here; the PCM also does not have a notion of partitioning, so the PCM notion pertains only to elementary partitions; and the PCM does not account for possible joint restrictions on setting values or responses and thus effectively assumes that $\mathsf{S}_{(b)} = \Omega_0 \times \prod_{i \neq j} \mathsf{S}_i$ and $\mathsf{S}_{[i]} = \mathsf{S}_i$.

Direct causality relations have a convenient graphical representation. For this, we introduce notions of paths, successors, predecessors, and intercessors, adapting graph-theoretic concepts discussed, for example, by Bang-Jensen and Gutin (2001):

Definition 5 (paths, successors, predecessors, and intercessors). Let $S$ be a partitioned settable system. For given positive integer $b$, let $j \in \Pi_b$ and $i \notin \Pi_b$. We call the collection of settable variables $\{X_i, X_{i_1}, \ldots, X_{i_m}, X_j\}$ an $(X_i, X_j)$-walk of length $m+1$ if $X_i \Rightarrow^D_S X_{i_1} \Rightarrow^D_S \cdots \Rightarrow^D_S X_{i_m} \Rightarrow^D_S X_j$. When the elements of an $(X_i, X_j)$-walk are distinct, we call it an $(X_i, X_j)$-path. We say $X_i$ precedes $X_j$ and that $X_j$ succeeds $X_i$ if there exists at least one $(X_i, X_j)$-path of positive length, and we call $X_i$ a predecessor of $X_j$ and call $X_j$ a successor of $X_i$. If $X_i$ precedes $X_j$ and $X_i$ succeeds $X_j$, we say $X_i$ and $X_j$ belong to a cycle. If $X_i$ and $X_j$ do not belong to a cycle, $X_k$ succeeds $X_i$, and $X_k$ precedes $X_j$, we say $X_k$ intercedes $X_i$ and $X_j$ and call $X_k$ an $(X_i, X_j)$-intercessor. We denote by $I_{i:j}$ the set of $(X_i, X_j)$-intercessors.

The direct causality graph for a given partitioned settable system $S$ is a directed graph $G \equiv (V, E)$ with a nonempty countable set of vertices $V = \{X_i : i = 0, 1, \ldots, n\}$ and a set of arcs $E \subset V \times V$ of ordered pairs of distinct vertices such that an arc $(X_i, X_j)$ belongs to $E$ if and only if $X_i \Rightarrow^D_S X_j$. From definition 4, there exists at most one $(X_i, X_j)$ arc, so $G$ need not contain, nor can it contain, parallel arcs. Since $X_i \not\Rightarrow^D_S X_i$, there can be no arc $(X_i, X_i)$ in $E$, so $G$ need not and cannot contain self-directed arcs or loops.

Figure 14 illustrates the concepts of definition 5. We have that $\{X_0, X_1, X_2, X_3, X_1, X_2, X_3, X_4\}$ is an $(X_0, X_4)$-walk of length 7 and $\{X_0, X_1, X_2, X_3, X_4\}$

6 We thank an anonymous reviewer for emphasizing these points.

Figure 14: G14.

a $(X_0, X_4)$-path of length 4. We also have that $X_0$ precedes $X_4$, $X_3$ succeeds $X_1$, and $X_1$ and $X_3$ belong to a cycle, as do $X_4$ and $X_5$. The set of $(X_1, X_4)$-intercessors is given by $I_{1:4} = \{X_2, X_3\}$. We use the term intercessor instead of mediator, as the latter may connote transmission, and intercessors need not transmit effects, as we explain below.

We emphasize that these direct causality graphs differ from other graphs in the literature. Nodes in direct causality graphs represent settable variables, not random variables or events; arcs represent direct causality relations, not functional or probabilistic dependence.

6.1.1 Direct Causality in Recursive Settable Systems. We now specialize definition 4 to recursive systems. For this, let $0 \le b_1 < b_2$ and take $i \in \Pi_{b_1}$ and $j \in \Pi_{b_2}$. We write values of settings corresponding to $\Pi_{[a:b]}$ as $z_{[a:b]}$. We also let $z_{[0:b](i)}$ denote a vector of values for settings for all settable variables corresponding to $\Pi_{[0:b]}$ except $X_i$. Since $S$ is recursive, we can express response values for $X_j$ as $r_j(z_{[0:b_2-1]})$. We abuse notation somewhat to permute the arguments of $r_j$ in a way that emphasizes their recursive relation to the argument corresponding to $X_i$. In particular, we write

$$r_j(z_{[0:b_1](i)}, z_i, z_{[b_1+1:b_2-1]}) = r_j(z_{[0:b_2-1]}).$$

Definition 4 then concludes that $X_i \Rightarrow^D_S X_j$ if there exists an admissible intervention $(z_{[0:b_1](i)}, z_i, z_{[b_1+1:b_2-1]}) \to (z_{[0:b_1](i)}, z_i^*, z_{[b_1+1:b_2-1]})$ such that

$$\Delta^D_{i,j,S}(z_{[0:b_1](i)}, z_i, z_i^*, z_{[b_1+1:b_2-1]}) \equiv r_j(z_{[0:b_1](i)}, z_i^*, z_{[b_1+1:b_2-1]}) - r_j(z_{[0:b_1](i)}, z_i, z_{[b_1+1:b_2-1]}) \neq 0.$$

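To fix ideas, this direct-effect contrast can be evaluated for a small hypothetical recursive system; the response function $r_2$ below is invented for illustration and is not taken from the text.

```python
# Hypothetical recursive system with unit blocks {X0}, {X1}, {X2};
# X2's response function r2 is invented for illustration.
def r2(z0, z1):
    return 3.0 * z1              # z0 is in the argument list but ignored

def delta_D(rj, args, args_star):
    # Direct-effect contrast: responses under two argument lists that
    # differ only in the intervened coordinate.
    return rj(*args_star) - rj(*args)

# X1 directly causes X2: an admissible intervention on z1 changes the response.
print(delta_D(r2, (1.0, 0.0), (1.0, 2.0)))   # 6.0, nonzero

# X0 does not directly cause X2: interventions on z0 alone never move it.
print(delta_D(r2, (0.0, 1.0), (5.0, 1.0)))   # 0.0
```

The second contrast illustrates the failure mode noted earlier: an admissible intervention exists, but the response is the same for both of its elements.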
Clearly, if $S$ is recursive, successors do not directly cause predecessors; that is, if $i \in \Pi_{b_1}$ and $j \in \Pi_{b_2}$ with $b_2 < b_1$, then $X_i \not\Rightarrow^D_S X_j$. In particular, if $X_i \Rightarrow^D_S X_j$, then $X_j \not\Rightarrow^D_S X_i$. Thus, recursive systems do not admit mutual causality. For the direct causality graph, this means that we cannot have both

arcs $(X_i, X_j)$ and $(X_j, X_i)$ belonging to $E$. In addition, a recursive system $S$ is acyclic: it does not admit cycles of the form $X_i \Rightarrow^D_S X_{i_1} \Rightarrow^D_S \cdots \Rightarrow^D_S X_{i_m} \Rightarrow^D_S X_i$. Thus, when $S$ is recursive, its corresponding direct causality graph $G$ is a DAG.

In the expression above for recursive settable system direct causality, the values for successors to $X_i$ (corresponding to blocks $\Pi_{[b_1+1:b_2-1]}$) are set to

the same arbitrary value $z_{[b_1+1:b_2-1]}$ in both argument lists. Sometimes it is of interest to evaluate the direct effect of $X_i$ on $X_j$ when values of $X_i$'s successors are set in both argument lists to the response values obtained when $X_i$ is set to $z_i$. When a setting is given by the response to its predecessors' settings, we call it canonical. Thus, when it exists, the canonical setting for $X_i$, $i \in \Pi_b$, is

$$Z_i^c = Y_i \equiv r_i(Z_{[0:b-1]}).$$

Then setting values $z^c_{[b_1+1:b_2-1]}$ determined as responses $y_{[b_1+1:b_2-1]}$ to the admissible values of their predecessors' settings are

$$y_{[b_1+1:b_2-1]} = r_{[b_1+1:b_2-1]}(z_{[0:b_1]}).$$

The elements of this response vector obtain by recursive substitution. Any given vector element depends on only its corresponding predecessors. The associated direct effect is

$$\Delta^D_{i,j,S}(z_{[0:b_1](i)}, z_i, z_i^*, y_{[b_1+1:b_2-1]}) \equiv r_j(z_{[0:b_1](i)}, z_i^*, y_{[b_1+1:b_2-1]}) - r_j(z_{[0:b_1](i)}, z_i, y_{[b_1+1:b_2-1]}).$$

When $S$ is recursive, we sometimes abuse notation to denote $\Delta^D_{i,j,S}(z_{[0:b_1](i)}, z_i, z_i^*, y_{[b_1+1:b_2-1]})$ simply as $\Delta^D_{i,j,S}(z_{[0:b_1](i)}, z_i, z_i^*)$. Pearl (2001) refers to this as the "natural" direct effect. But unlike in the PCM, we do not require a unique fixed point, we allow for partitioning, and we permit constraints on joint settings.

6.1.2 Relation to Other Notions of Direct Causality. Using system $S_{15}$ with Figure 15, we illustrate the relationships between definition 4 and several other notions of direct effects discussed in the literature:

$$X_1(0, \cdot) = r_1(X_0(1, \cdot)),$$
$$X_2(0, \cdot) = r_2(X_0(1, \cdot), X_1(1, \cdot)).$$

Figure 15: G15.

By definition 4, $X_0 \Rightarrow^D_{S_{15}} X_2$ if there is an admissible intervention $(z_0, z_1) \to (z_0^*, z_1)$ with

$$\Delta^D_{0,2,S_{15}}(z_0, z_0^*, z_1) \equiv r_2(z_0^*, z_1) - r_2(z_0, z_1) \neq 0.$$

A nonzero $\Delta^D_{0,2,S_{15}}(z_0, z_0^*, z_1)$ justifies a link from $X_0$ to $X_2$ in $G_{15}$. As mentioned above, $\Delta^D_{0,2,S_{15}}(z_0, z_0^*, z_1)$ corresponds to the notion of controlled direct effect in Pearl (2001).

If $z_1$ is restricted to a specific suitable value, then we obtain a notion in the spirit of the "standardized" direct effect of Didelez et al. (2006) and Geneletti (2007). The canonical choice $z_1^c = r_1(z_0)$ yields Pearl's (2001) natural direct effect,

$$\Delta^D_{0,2,S_{15}}(z_0, z_0^*, r_1(z_0)) \equiv r_2(z_0^*, r_1(z_0)) - r_2(z_0, r_1(z_0)).$$

Robins and Greenland (1992) and Robins (2003) call this the “pure” direct effect. These same authors refer to the following contrast as the “total direct effect”:

$$\Delta^D_{0,2,S_{15}}(z_0, z_0^*, r_1(z_0^*)) \equiv r_2(z_0^*, r_1(z_0^*)) - r_2(z_0, r_1(z_0^*)).$$

The literature also considers notions of direct effects defined as a contrast in some aspect of the distributions of responses for different settings. For example, let $P$ be a probability measure on $(\Omega, \mathcal{F})$; then the average direct effect of $X_1$ on $X_2$ at $(z_1, z_1^*)$ in $S_{15}$ is

$$E[\Delta^D_{1,2,S_{15}}(Z_0, z_1, z_1^*)] = E[r_2(Z_0, z_1^*) - r_2(Z_0, z_1)],$$

where $E$ is the expectation operator associated with $P$.

Here, we consider direct effects to be differences in response values for any admissible intervention of the specified form. As Holland (1986) notes, these effects need not be identifiable without other assumptions. Nevertheless, the direct causality concept of definition 4 is in a precise sense the simplest and most general of the alternatives discussed. It is simplest in that direct causality is well defined even in the absence of recursive structure or fixed points. It is most general, as it is necessary but not sufficient for the others.

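The several direct-effect contrasts can be made concrete with a minimal Python sketch for a hypothetical linear instance of $S_{15}$; the response functions $r_1$ and $r_2$ below are invented for illustration, not taken from the text.

```python
import random

# Hypothetical linear instance of system S_15; r1 and r2 are invented.
def r1(z0):
    return 2.0 * z0              # X1's response to X0's setting

def r2(z0, z1):
    return z0 + 0.5 * z1         # X2's response to the settings of X0 and X1

def controlled_direct_effect(z0, z0_star, z1):
    # Hold z1 fixed at an arbitrary value.
    return r2(z0_star, z1) - r2(z0, z1)

def natural_direct_effect(z0, z0_star):
    # "Pure" direct effect: hold z1 at the canonical value r1(z0).
    return r2(z0_star, r1(z0)) - r2(z0, r1(z0))

def total_direct_effect(z0, z0_star):
    # "Total" direct effect: hold z1 at the canonical value r1(z0*).
    return r2(z0_star, r1(z0_star)) - r2(z0, r1(z0_star))

def average_direct_effect(z1, z1_star, n=100_000, seed=0):
    # Monte Carlo estimate of E[r2(Z0, z1*) - r2(Z0, z1)], Z0 ~ Uniform(0, 1).
    rng = random.Random(seed)
    draws = [rng.random() for _ in range(n)]
    return sum(r2(z0, z1_star) - r2(z0, z1) for z0 in draws) / n

print(controlled_direct_effect(1.0, 2.0, 3.0))   # 1.0
print(natural_direct_effect(1.0, 2.0))           # 1.0
print(total_direct_effect(1.0, 2.0))             # 1.0
```

Because this $r_2$ is additive in $z_0$ and $z_1$, the controlled, natural, and total direct effects coincide; with an interaction between $z_0$ and $z_1$ they generally differ.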
6.2 Indirect Causality in Settable Systems. We next define notions of indirect causality for recursive systems. We distinguish notions of indirect causality via and exclusive of specified variables. These definitions extend notions of indirect causality in Robins and Greenland (1992), SGS, Pearl (2001), Robins (2003), Didelez et al. (2006), and Geneletti (2007), and notions of path-specific effects in Pearl (2001) and Avin et al. (2005). Although these extensions are of interest in their own right, their significance is enhanced by their role in providing foundations for the conditional Reichenbach principle of common cause, as well as later results on d-separation and D-separation.

6.2.1 Indirect Causality via Given Variables. Motivating Examples. The basic idea of indirect causality adopted here is straightforward. Consider, for example, the system illustrated in Figure 15. There, $X_0$ indirectly causes $X_2$ via $X_1$ if there exists an admissible intervention $(z_0, r_1(z_0)) \to (z_0, r_1(z_0^*))$ such that

$$\Delta^{I[\{1\}]}_{0,2,S_{15}}(z_0, z_0^*) \equiv r_2(z_0, r_1(z_0^*)) - r_2(z_0, r_1(z_0)) \neq 0.$$

In the first case, $Z_0$ is set to the value $z_0$ and $Z_1$ to the canonical value $r_1(z_0)$. In the second case, $Z_0$ is set to the value $z_0$ and $Z_1$ to the canonical value $r_1(z_0^*)$ that obtains when $Z_0$ is set to $z_0^*$. This corresponds to the notion of "natural indirect effect" in Pearl (2001) and Didelez et al. (2006) and to the notion of "pure indirect effect" in Robins and Greenland (1992) and Robins (2003).

It is necessary but not sufficient for our notion of indirect causality that $X_0$ directly cause $X_1$ and that $X_1$ directly cause $X_2$. We emphasize that transitivity of causation is not guaranteed here, unlike classical treatments such as SGS, where transitivity of causation is axiomatic. Instead, transitivity depends on the response functions. For example, if $r_1(z_0) = \max(z_0, 0)$ and $r_2(z_0, z_1) = \min(z_1, 0)$, then $X_0 \Rightarrow^D_{S_{15}} X_1$ and $X_1 \Rightarrow^D_{S_{15}} X_2$, but $X_0$ does not indirectly cause $X_2$, as $r_2(z_0, r_1(z_0^*)) = \min(\max(z_0^*, 0), 0) = 0$ for all $z_0^*$. With transitivity, $X_i$ is an indirect cause of $X_j$ if there exists an $(X_i, X_j)$-path of length greater than 2 (SGS). Although this example conveys the basic idea, we work with more refined notions of indirect causality, elaborated below.

In $G_{15}$ (see Figure 15), $X_1$ is the only $(X_0, X_2)$-intercessor. In the presence of multiple intercessors, we may be interested in indirect causality via just one specified variable. Consider, for example, the system illustrated in Figure 16.

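The max/min transitivity example just given is easy to check numerically; the sketch below uses the response functions from that example.

```python
# Response functions from the transitivity example in the text:
# r1(z0) = max(z0, 0) and r2(z0, z1) = min(z1, 0).
def r1(z0):
    return max(z0, 0.0)

def r2(z0, z1):
    return min(z1, 0.0)

# X0 directly causes X1, and X1 directly causes X2:
assert r1(1.0) != r1(-1.0)            # intervening on z0 moves X1's response
assert r2(0.0, -1.0) != r2(0.0, 1.0)  # intervening on z1 moves X2's response

# ... yet the indirect effect of X0 on X2 via X1 is identically zero,
# since min(max(z0*, 0), 0) = 0 for every z0*.
def indirect_via_1(z0, z0_star):
    return r2(z0, r1(z0_star)) - r2(z0, r1(z0))

for z0, z0_star in [(-2.0, 3.0), (0.5, -4.0), (1.0, 2.0)]:
    assert indirect_via_1(z0, z0_star) == 0.0
```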
Figure 16: G16.

Figure 17: G17.

Here, we say that $X_0$ indirectly causes $X_3$ via $X_1$ if there exists an admissible intervention $(z_0, r_1(z_0), z_2) \to (z_0, r_1(z_0^*), z_2)$ such that

$$\Delta^{I[\{1\}]}_{0,3,S_{16}}(z_0, z_0^*, z_2) \equiv r_3(z_0, r_1(z_0^*), z_2) - r_3(z_0, r_1(z_0), z_2) \neq 0.$$

Restricting $z_2$ to the value $r_2(z_0)$ in the above difference essentially gives the path-specific effect transmitted through the path $\{X_0, X_1, X_3\}$ in Avin et al. (2005).

More generally, we may consider notions of indirect causality via not just one but several settable variables, as illustrated in Figure 17. Here, $X_0$ indirectly causes $X_4$ via $\{X_1, X_3\}$ if there exists an admissible intervention $(r_2(z_0, r_1(z_0)), r_1(z_0), r_3[r_1(z_0), r_2(z_0, r_1(z_0))]) \to (r_2(z_0, r_1(z_0^*)), r_1(z_0^*), r_3[r_1(z_0^*), r_2(z_0, r_1(z_0^*))])$ such that

$$\Delta^{I[\{1,3\}]}_{0,4,S_{17}}(z_0, z_0^*) \equiv r_4(r_2(z_0, r_1(z_0^*)), r_1(z_0^*), r_3[r_1(z_0^*), r_2(z_0, r_1(z_0^*))])$$
$$- \; r_4(r_2(z_0, r_1(z_0)), r_1(z_0), r_3[r_1(z_0), r_2(z_0, r_1(z_0))]) \neq 0.$$

(Here and elsewhere, we simplify notation by omitting response function arguments corresponding to variables that are not direct causes of the specified response.) Setting the first arguments of $r_4$ above to $r_2(z_0, r_1(z_0^*))$ and $r_2(z_0, r_1(z_0))$ ensures that the difference in the response for $X_4$ is not due to effects transmitted through the path $\{X_0, X_2, X_4\}$.

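This path-specific contrast can be computed for a hypothetical linear instance of the Figure 17 structure; the coefficients below are invented for illustration.

```python
# Hypothetical linear instance of the Figure 17 structure; the coefficients
# are invented for illustration. Argument order: r4(z1, z2, z3).
def r1(z0):         return 2.0 * z0
def r2(z0, z1):     return z0 + z1
def r3(z1, z2):     return z1 - 0.5 * z2
def r4(z1, z2, z3): return z1 + z2 + z3

def indirect_via_1_and_3(z0, z0_star):
    # The intervention z0 -> z0* reaches X4 only through X1 and X3:
    # r2's own z0 argument is held at the unstarred z0 in both terms,
    # blocking transmission along the path {X0, X2, X4}.
    def term(z0_for_paths):
        y1 = r1(z0_for_paths)
        y2 = r2(z0, y1)          # z0 fixed; y1 varies with the intervention
        y3 = r3(y1, y2)
        return r4(y1, y2, y3)
    return term(z0_star) - term(z0)

print(indirect_via_1_and_3(1.0, 2.0))   # 5.0 for these coefficients
```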
Figure 18: G18.

The General Case. In the general case, the idea underlying indirect causality in recursive systems is essentially the same as in these examples, but to express it rigorously demands careful attention to a perhaps daunting mass of detail. Roughly speaking, however, we say that $X_i$ indirectly causes $X_j$ via $(X_i, X_j)$-intercessors $X_A$ if the response of $X_j$ differs when the effects of setting $X_i$ to the value $z_i^*$ as opposed to $z_i$ are not transmitted directly, but only through $X_A$. Because what follows is admittedly (and, apparently, unavoidably) heavy going, the more casual reader may want to let these heuristics suffice on first reading and skim along to section 6.4, where less intricate and more intuitive results again emerge.

In order to study the response of $X_j$ under the relevant scenarios, we partition the $(X_i, X_j)$-intercessors in a recursive manner relative to $X_A$. We distinguish the $(X_i, X_j)$-intercessors that belong to paths through $X_A$ from those that do not. Among the former, we distinguish the variables that strictly precede $X_A$; $X_A$; the variables that intercede elements of $X_A$; and the variables that strictly succeed $X_A$.

For illustration, we employ system $S_{18}$, with direct causality relations illustrated in Figure 18, where $\Pi_1 = \{1, 2\}$ and $\Pi_b = \{b+1\}$ for $b = 2, \ldots, 8$. The complexity of this example is not capricious. This is the simplest system permitting a full illustration of the relationships that must be considered in a general definition of indirect causality.

To begin the illustration, take $b_1 < b_2$, $i \in \Pi_{b_1}$, $j \in \Pi_{b_2}$. For example, in $S_{18}$, let $b_1 = 1$ and $b_2 = 8$, let $i = 2$ (the second element of $\Pi_1 = \{1, 2\}$), and let $j = 9$ (the sole element of $\Pi_8$). We denote by $\mathrm{ind}(I_{i:j})$ the indexes of the elements of the $(X_i, X_j)$-intercessors $I_{i:j}$. For example, in $S_{18}$, we have $\mathrm{ind}(I_{2:9}) = \{3, 4, 5, 6, 7, 8\}$.
We treat elements of $\Pi_{[0:b_2]}$ that do not correspond to $(X_i, X_j)$-intercessors as elements of $\Pi_{[0:b_1]}$ or $\Pi_{b_2}$ without loss of generality. Here, $\mathrm{ind}(I_{i:j}) = \Pi_{[b_1+1:b_2-1]}$.

Let $A$ be a subset of $\mathrm{ind}(I_{i:j})$. In $S_{18}$, we can let $A = \{5, 7\}$, say. In what follows, we order the arguments of response values $r_j(z_{[0:b_2-1]})$ for $X_j$ to emphasize their recursive ordering in relation to $X_i$ and $X_A$.

For given $k \in A$, let $I^k_{i:j} \equiv I_{i:k} \cup \{X_k\} \cup I_{k:j}$ denote the $(X_i, X_j)$-intercessors for paths through $X_k$, and for $X_A \equiv \cup_{k \in A} \{X_k\}$, let $I^A_{i:j} \equiv \cup_{k \in A} I^k_{i:j}$ denote the $(X_i, X_j)$-intercessors for paths through $X_A$. (For $A = \emptyset$, we let $I^A_{i:j} = \emptyset$.) Thus, in $S_{18}$, we have $\mathrm{ind}(I^5_{2:9}) = \mathrm{ind}(I^7_{2:9}) = \{3, 5, 6, 7, 8\}$, and it follows that $\mathrm{ind}(I^A_{2:9}) = \{3, 5, 6, 7, 8\}$ as well.

Let $X_{\bar A} \equiv I_{i:j} \setminus I^A_{i:j}$ denote the $(X_i, X_j)$-intercessors not belonging to paths

through $X_A$, and let $\bar A$ denote the set of indexes of the elements of $X_{\bar A}$. In system $S_{18}$, $\bar A = \mathrm{ind}(I_{2:9}) \setminus \mathrm{ind}(I^A_{2:9}) = \{3, 4, 5, 6, 7, 8\} \setminus \{3, 5, 6, 7, 8\} = \{4\}$. Thus, we have $\mathrm{ind}(I_{i:j}) = \mathrm{ind}(I^A_{i:j}) \cup \bar A$ and $\mathrm{ind}(I^A_{i:j}) \cap \bar A = \emptyset$.

We now partition $\mathrm{ind}(I^A_{i:j})$ into four mutually exclusive and collectively exhaustive subsets. First, let $X_{\hat A} \equiv \cup_{k,l \in A} I_{k:l} \setminus X_A$ denote the inter-$A$ intercessors excluded from $X_A$, and let $\hat A$ denote the set of indexes of the elements of $X_{\hat A}$. In $S_{18}$, we have $\hat A = \{6\}$.

Next, we distinguish between the $(X_i, X_j)$-intercessors for paths through $X_A$ that strictly precede or succeed $X_A$. We define the $X_A$ predecessors excluded from $X_A \cup X_{\hat A}$:

$$P^A_{i:j} \equiv \cup_{k \in A} \{X_l \in I^A_{i:j} \text{ and } X_l \notin (X_A \cup X_{\hat A}) : X_l \text{ precedes } X_k\},$$

and the $X_A$ successors excluded from $X_A \cup X_{\hat A}$:

$$S^A_{i:j} \equiv \cup_{k \in A} \{X_l \in I^A_{i:j} \text{ and } X_l \notin (X_A \cup X_{\hat A}) : X_l \text{ succeeds } X_k\}.$$

In the example illustrated in Figure 18, we have $\mathrm{ind}(P^A_{2:9}) = \{3\}$ and $\mathrm{ind}(S^A_{2:9}) = \{8\}$.

By construction, $\mathrm{ind}(I^A_{i:j}) = \mathrm{ind}(P^A_{i:j}) \cup A \cup \hat A \cup \mathrm{ind}(S^A_{i:j})$, and these subsets are mutually exclusive. In our example, $\mathrm{ind}(I^A_{2:9}) = \{3, 5, 6, 7, 8\}$ and $\mathrm{ind}(P^A_{2:9}) \cup A \cup \hat A \cup \mathrm{ind}(S^A_{2:9}) = \{3\} \cup \{5, 7\} \cup \{6\} \cup \{8\}$. Thus, $\mathrm{ind}(P^A_{i:j})$, $A$, $\hat A$, and $\mathrm{ind}(S^A_{i:j})$ partition $\mathrm{ind}(I^A_{i:j})$.

We now use this partition to represent response values for $X_j$ in a convenient form. Recall that $z_{[0:b_1](i)}$ denotes a vector of values for settings for the vector of settable variables $X_{[0:b_1](i)}$ corresponding to $\Pi_{[0:b_1]} \setminus \{i\}$. Thus, in $S_{18}$, $z_{[0:1](2)}$ denotes values of settings for $X_0$ and $X_1$. Similarly, let $z_{i:A}$, $z_A$, $z_{\hat A}$, $z_{\bar A}$, and $z_{A:j}$ denote vectors of values of settings for elements of $P^A_{i:j}$, $X_A$, $X_{\hat A}$, $X_{\bar A}$, and $S^A_{i:j}$, respectively. We now slightly abuse notation to represent response values for $X_j$ (recall $j \in \Pi_{b_2}$) as

$$r_j(z_{[0:b_1](i)}, z_i, z_{i:A}, z_A, z_{\hat A}, z_{\bar A}, z_{A:j}) = r_j(z_{[0:b_2-1]}),$$

where we reorder the arguments of $r_j$ to emphasize the settings $z_i$ and $z_A$ of $X_i$ and $X_A$.

Note that when $A = \mathrm{ind}(I_{i:j})$, the sets $\mathrm{ind}(P^A_{i:j})$, $\hat A$, $\bar A$, and $\mathrm{ind}(S^A_{i:j})$ are empty, and we write $r_j(z_{[0:b_1](i)}, z_i, z_A) = r_j(z_{[0:b_2-1]})$. Alternatively, when $A = \emptyset$, the sets $\mathrm{ind}(P^A_{i:j})$, $\hat A$, and $\mathrm{ind}(S^A_{i:j})$ are empty, whereas $\bar A = \mathrm{ind}(I_{i:j})$, and we write $r_j(z_{[0:b_1](i)}, z_i, z_{\bar A}) = r_j(z_{[0:b_2-1]})$.

We use the recursiveness of $S$ and the definitions above to represent vectors of response values for elements of $P^A_{i:j}$, $X_A$, $X_{\hat A}$, $X_{\bar A}$, and $S^A_{i:j}$, respectively, in the following form, useful for general definitions of indirect causality:

$$r_{i:A}(z_{[0:b_1](i)}, z_i),$$
$$r_A(z_{[0:b_1](i)}, z_i, z_{i:A}),$$
$$r_{\hat A}(z_{[0:b_1](i)}, z_i, z_{i:A}, z_A),$$
$$r_{\bar A}(z_{[0:b_1](i)}, z_i, z_{i:A}, z_A),$$
$$r_{A:j}(z_{[0:b_1](i)}, z_i, z_{i:A}, z_A, z_{\hat A}, z_{\bar A}).$$

Here too, the elements of these response vectors obtain by recursive substitution. Any given element of one of these vectors depends on only its predecessors. Thus, although $z_A$ appears as an argument in $r_{\bar A}$, only the predecessor elements of $z_A$ for a given response determine that response. By definition, an element of $X_{\bar A}$ cannot directly cause elements of $P^A_{i:j}$, $X_A$, or $X_{\hat A}$, nor can it be directly caused by elements of $X_A$, $X_{\hat A}$, or $S^A_{i:j}$.

Last, we denote canonical settings, defined as responses to specific setting values, by

$$y_{i:A} = r_{i:A}(z_{[0:b_1](i)}, z_i), \qquad y^*_{i:A} = r_{i:A}(z_{[0:b_1](i)}, z_i^*),$$
$$y_A = r_A(z_{[0:b_1](i)}, z_i, y_{i:A}), \qquad y^*_A = r_A(z_{[0:b_1](i)}, z_i^*, y^*_{i:A}),$$
$$y_{\hat A} = r_{\hat A}(z_{[0:b_1](i)}, z_i, y_{i:A}, y_A), \qquad y^*_{\hat A} = r_{\hat A}(z_{[0:b_1](i)}, z_i^*, y^*_{i:A}, y^*_A),$$
$$y_{\bar A} = r_{\bar A}(z_{[0:b_1](i)}, z_i, y_{i:A}, y_A), \qquad y^*_{\bar A} = r_{\bar A}(z_{[0:b_1](i)}, z_i^*, y^*_{i:A}, y^*_A).$$

We can now state our first formal definition of indirect causality.

Definition 6 (indirect causality via $X_A$). Let $S$ be recursive. For given nonnegative integers $b_1$ and $b_2$ with $b_1 < b_2$, let $i \in \Pi_{b_1}$, $j \in \Pi_{b_2}$, and let $A$ be a subset of $\mathrm{ind}(I_{i:j})$. We say that $X_i$ indirectly causes $X_j$ via $X_A$ in $S$ if there exists an admissible intervention to $(X_{[0:b_1](i)}, X_i, P^A_{i:j}, X_A, X_{\hat A}, X_{\bar A}, S^A_{i:j})$ with corresponding

responses for $X_j$ such that the indirect causal effect of $X_i$ on $X_j$ via $X_A$ at $(z_{[0:b_1](i)}, z_i, z_i^*, z_{i:A}, z_{\bar A})$ in $S$ is nonzero:

$$\Delta^{I[A]}_{i,j,S}(z_{[0:b_1](i)}, z_i, z_i^*, z_{i:A}, z_{\bar A})$$
$$\equiv r_j(z_{[0:b_1](i)}, z_i, z_{i:A}, z_{\bar A}, y^*_A, r_{\hat A}(z_{[0:b_1](i)}, z_i, z_{i:A}, y^*_A), r_{A:j}[z_{[0:b_1](i)}, z_i, z_{i:A}, z_{\bar A}, y^*_A, r_{\hat A}(z_{[0:b_1](i)}, z_i, z_{i:A}, y^*_A)])$$
$$- \; r_j(z_{[0:b_1](i)}, z_i, z_{i:A}, z_{\bar A}, y_A, r_{\hat A}(z_{[0:b_1](i)}, z_i, z_{i:A}, y_A), r_{A:j}[z_{[0:b_1](i)}, z_i, z_{i:A}, z_{\bar A}, y_A, r_{\hat A}(z_{[0:b_1](i)}, z_i, z_{i:A}, y_A)]) \neq 0,$$

and we write $X_i \Rightarrow^{I[A]}_S X_j$. Otherwise, we say that $X_i$ does not indirectly cause $X_j$ via $X_A$ in $S$, and we write $X_i \not\Rightarrow^{I[A]}_S X_j$. When $A = \mathrm{ind}(I_{i:j})$, we denote the indirect causal effect of $X_i$ on $X_j$ at $(z_{[0:b_1](i)}, z_i, z_i^*)$ in $S$ by $\Delta^I_{i,j,S}(z_{[0:b_1](i)}, z_i, z_i^*)$. When $A = \mathrm{ind}(I_{i:j})$ and $X_i \Rightarrow^{I[A]}_S X_j$, we say $X_i$ indirectly causes $X_j$ in $S$ and write $X_i \Rightarrow^I_S X_j$; when $A = \mathrm{ind}(I_{i:j})$ and $X_i \not\Rightarrow^{I[A]}_S X_j$, we say that $X_i$ does not indirectly cause $X_j$ in $S$ and write $X_i \not\Rightarrow^I_S X_j$.

Consider $S_{18}$, illustrated in Figure 18. With $A = \{5, 7\}$, by definition 6, $X_2 \Rightarrow^{I[A]}_{S_{18}} X_9$ if there is an admissible intervention $(z_1, z_4, r_6(z_2, y_5), r_8(y_7)) \to (z_1, z_4, r_6(z_2, y_5^*), r_8(y_7^*))$ such that

$$\Delta^{I[A]}_{2,9,S_{18}}(z_1, z_2, z_2^*, z_4) \equiv r_9(z_1, z_4, r_6(z_2, y_5^*), r_8(y_7^*)) - r_9(z_1, z_4, r_6(z_2, y_5), r_8(y_7)) \neq 0.$$

Intuitively, definition 6 concludes that $X_2 \Rightarrow^{I[A]}_{S_{18}} X_9$ if the response of $X_9$ differs when the effects of setting $X_2$ to the value $z_2^*$ as opposed to $z_2$ are not transmitted directly but only through $X_A$. Thus, setting values for $X_1$ and $X_4$ are $z_1$ and $z_4$ in both responses of $X_9$. On the other hand, setting values for $X_6$ and $X_8$ differ across the two responses of $X_9$ only in response to different settings of $(X_5, X_7)$.

If $X_i \Rightarrow^I_S X_j$, then $X_i \Rightarrow^{I[A]}_S X_j$ for some nonempty $A \subset \mathrm{ind}(I_{i:j})$. The converse need not hold because $X_i$ can indirectly cause $X_j$ through each of two distinct intercessors whose associated effects may cancel each other. For example, in $S_{18}$, it may be that $X_2$ indirectly causes $X_9$ via $X_4$ as well as via $X_6$ but that $X_2$ does not indirectly cause $X_9$ via $\{X_4, X_6\}$.

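Such cancellation is easy to exhibit in a small hypothetical system (much simpler than $S_{18}$; the response functions below are invented for illustration):

```python
# Hypothetical system in which two indirect channels exactly cancel:
# X0 -> X1 -> X3 and X0 -> X2 -> X3, with r3(z1, z2) = z1 - z2.
def r1(z0):     return z0
def r2(z0):     return z0
def r3(z1, z2): return z1 - z2

def indirect_via(A, z0, z0_star):
    # Transmit the intervention z0 -> z0* only through the intercessors in A.
    z1 = r1(z0_star if 1 in A else z0)
    z2 = r2(z0_star if 2 in A else z0)
    return r3(z1, z2) - r3(r1(z0), r2(z0))

print(indirect_via({1}, 0.0, 1.0))     #  1.0: nonzero via X1 alone
print(indirect_via({2}, 0.0, 1.0))     # -1.0: nonzero via X2 alone
print(indirect_via({1, 2}, 0.0, 1.0))  #  0.0: the two channels cancel
```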
6.2.2 Indirect Causality Exclusive of Given Variables. We now introduce an indirect causality concept complementary to that above. For example, in system $S_{16}$ illustrated in Figure 16, we say that $X_0$ indirectly causes $X_3$

exclusive of $X_1$ if there exists an admissible intervention $(z_0, z_1, r_2(z_0)) \to (z_0, z_1, r_2(z_0^*))$ such that

$$\Delta^{I[\sim\{1\}]}_{0,3,S_{16}}(z_0, z_0^*, z_1) \equiv r_3(z_0, z_1, r_2(z_0^*)) - r_3(z_0, z_1, r_2(z_0)) \neq 0.$$

Similarly, in $S_{17}$, we say that $X_0$ indirectly causes $X_4$ exclusive of $X_1$ and $X_3$ if there exists an admissible intervention $(r_2(z_0, z_1), z_3) \to (r_2(z_0^*, z_1), z_3)$ such that

$$\Delta^{I[\sim\{1,3\}]}_{0,4,S_{17}}(z_0, z_0^*, z_1, z_3) \equiv r_4(r_2(z_0^*, z_1), z_3) - r_4(r_2(z_0, z_1), z_3) \neq 0.$$

More generally, we say that $X_i$ indirectly causes $X_j$ exclusive of $(X_i, X_j)$-intercessors $X_A$ if the response of $X_j$ differs when the effects of setting $X_i$ to the value $z_i^*$ as opposed to $z_i$ are transmitted indirectly through all succeeding variables except $X_A$. These examples are instances of the following definition:

Definition 7 (indirect causality exclusive of $X_A$). Let $S$ and $A$ be as in definition 6. We say that $X_i$ indirectly causes $X_j$ exclusive of $X_A$ in $S$ if there exists an admissible intervention to $(X_{[0:b_1](i)}, X_i, P^A_{i:j}, X_A, X_{\hat A}, X_{\bar A}, S^A_{i:j})$ with corresponding responses for $X_j$ such that the indirect causal effect of $X_i$ on $X_j$ exclusive of $X_A$ at $(z_{[0:b_1](i)}, z_i, z_i^*, z_A)$ in $S$ is nonzero:

$$\Delta^{I[\sim A]}_{i,j,S}(z_{[0:b_1](i)}, z_i, z_i^*, z_A)$$
$$\equiv r_j(z_{[0:b_1](i)}, z_i, y^*_{i:A}, y^*_{\bar A}, z_A, r_{\hat A}(z_{[0:b_1](i)}, z_i^*, y^*_{i:A}, z_A), r_{A:j}[z_{[0:b_1](i)}, z_i^*, y^*_{i:A}, y^*_{\bar A}, z_A, r_{\hat A}(z_{[0:b_1](i)}, z_i^*, y^*_{i:A}, z_A)])$$
$$- \; r_j(z_{[0:b_1](i)}, z_i, y_{i:A}, y_{\bar A}, z_A, r_{\hat A}(z_{[0:b_1](i)}, z_i, y_{i:A}, z_A), r_{A:j}[z_{[0:b_1](i)}, z_i, y_{i:A}, y_{\bar A}, z_A, r_{\hat A}(z_{[0:b_1](i)}, z_i, y_{i:A}, z_A)]) \neq 0;$$

and we write $X_i \Rightarrow^{I[\sim A]}_S X_j$. Otherwise, we say that $X_i$ does not indirectly cause $X_j$ exclusive of $X_A$ in $S$, and we write $X_i \not\Rightarrow^{I[\sim A]}_S X_j$.

In system $S_{18}$ with $A = \{5, 7\}$, by definition 7, $X_2 \Rightarrow^{I[\sim A]}_{S_{18}} X_9$ if there exists an admissible intervention $(z_1, y_4, r_6(z_2, z_5), r_8(z_7)) \to (z_1, y_4^*, r_6(z_2^*, z_5), r_8(z_7))$ such that

$$\Delta^{I[\sim A]}_{2,9,S_{18}}(z_1, z_2, z_2^*, z_5, z_7) \equiv r_9(z_1, y_4^*, r_6(z_2^*, z_5), r_8(z_7)) - r_9(z_1, y_4, r_6(z_2, z_5), r_8(z_7)) \neq 0.$$

Intuitively, definition 7 concludes that $X_2 \Rightarrow^{I[\sim A]}_{S_{18}} X_9$ if the response of $X_9$ differs when the effects of setting $X_2$ to the value $z_2^*$ as opposed to $z_2$ are transmitted indirectly through all succeeding variables, except through $X_A$.

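The contrast between "via" and "exclusive of" can be computed for a hypothetical instance of the Figure 16 structure; the functional forms below are invented for illustration.

```python
# Hypothetical instance of the Figure 16 structure, X0 -> {X1, X2} -> X3
# (with X0 -> X3 as well); the functional forms are invented.
def r1(z0):         return 2.0 * z0
def r2(z0):         return -1.0 * z0
def r3(z0, z1, z2): return z0 + z1 + z2

def indirect_via_1(z0, z0_star, z2):
    # Transmit z0 -> z0* through X1 only; X2's setting stays at z2.
    return r3(z0, r1(z0_star), z2) - r3(z0, r1(z0), z2)

def indirect_excl_1(z0, z0_star, z1):
    # Transmit z0 -> z0* through every succeeding variable except X1,
    # i.e., here through X2 only; X1's setting stays at z1.
    return r3(z0, z1, r2(z0_star)) - r3(z0, z1, r2(z0))

print(indirect_via_1(0.0, 1.0, 0.0))   #  2.0
print(indirect_excl_1(0.0, 1.0, 0.0))  # -1.0
```

With only the two intercessors $X_1$ and $X_2$, "exclusive of $X_1$" here coincides with "via $X_2$"; when an intercessor lies on multiple paths, as in $S_{17}$, such equivalences can fail.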
6.3 Total Causality in Recursive Settable Systems. In analyzing relations between causality and conditional independence, it turns out to be important to keep track of channels of both indirect and direct causality. Consider system $S_{15}$ illustrated in Figure 15, for example. There, we say that $X_0$ (totally) causes $X_2$ via $X_1$ if there exists an admissible intervention $(z_0, r_1(z_0)) \to (z_0^*, r_1(z_0^*))$ such that

$$\Delta^{[\{1\}]}_{0,2,S_{15}}(z_0, z_0^*) \equiv r_2(z_0^*, r_1(z_0^*)) - r_2(z_0, r_1(z_0)) \neq 0.$$

Intuitively, the response of $X_2$ differs when the effect of setting $Z_0$ to the value $z_0^*$ as opposed to $z_0$ is transmitted fully, taking into account both direct and indirect effects. Similarly, in system $S_{16}$ illustrated in Figure 16, we say that $X_0$ (totally) causes $X_3$ via $X_1$ if there exists an admissible intervention $(z_0, r_1(z_0), z_2) \to (z_0^*, r_1(z_0^*), z_2)$ such that

$$\Delta^{[\{1\}]}_{0,3,S_{16}}(z_0, z_0^*, z_2) \equiv r_3(z_0^*, r_1(z_0^*), z_2) - r_3(z_0, r_1(z_0), z_2) \neq 0.$$

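The total effect in $S_{15}$ telescopes into a direct piece and an indirect piece; the sketch below verifies this for a hypothetical nonlinear instance ($r_1$ and $r_2$ are invented for illustration).

```python
# Hypothetical nonlinear instance of S_15; r1 and r2 are invented.
def r1(z0):     return z0 + 1.0
def r2(z0, z1): return z0 * z1

def total_effect(z0, z0_star):
    # Transmit z0 -> z0* fully: directly and through X1's canonical setting.
    return r2(z0_star, r1(z0_star)) - r2(z0, r1(z0))

def natural_indirect(z0, z0_star):
    # Direct argument held at z0; only X1's canonical setting moves.
    return r2(z0, r1(z0_star)) - r2(z0, r1(z0))

def direct_at_starred_canonical(z0, z0_star):
    # Controlled direct effect evaluated at z1 = r1(z0*).
    return r2(z0_star, r1(z0_star)) - r2(z0, r1(z0_star))

z0, z0s = 1.0, 3.0
total = total_effect(z0, z0s)
parts = direct_at_starred_canonical(z0, z0s) + natural_indirect(z0, z0s)
print(total, parts)   # 10.0 10.0: the two pieces telescope to the total
```

The identity holds for any $r_2$, since the middle term $r_2(z_0, r_1(z_0^*))$ cancels; section 6.4 states decompositions of this kind in general form.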
We now provide formal definitions of (total) causality via and exclusive of a set of variables:

Definition 8 ($A$-causality). Let $S$ and $A$ be as in definition 6. We say that $X_i$ causes $X_j$ via $X_A$ (or $X_i$ $A$-causes $X_j$) in $S$ if there exists an admissible intervention to $(X_{[0:b_1](i)}, X_i, P^A_{i:j}, X_A, X_{\hat A}, X_{\bar A}, S^A_{i:j})$ with corresponding responses for $X_j$ such that the causal effect of $X_i$ on $X_j$ via $X_A$ at $(z_{[0:b_1](i)}, z_i, z_i^*, z_{i:A}, z_{\bar A})$ in $S$ is nonzero:

$$\Delta^{[A]}_{i,j,S}(z_{[0:b_1](i)}, z_i, z_i^*, z_{i:A}, z_{\bar A})$$
$$\equiv r_j(z_{[0:b_1](i)}, z_i^*, z_{i:A}, z_{\bar A}, y^*_A, r_{\hat A}(z_{[0:b_1](i)}, z_i, z_{i:A}, y^*_A), r_{A:j}[z_{[0:b_1](i)}, z_i, z_{i:A}, z_{\bar A}, y^*_A, r_{\hat A}(z_{[0:b_1](i)}, z_i, z_{i:A}, y^*_A)])$$
$$- \; r_j(z_{[0:b_1](i)}, z_i, z_{i:A}, z_{\bar A}, y_A, r_{\hat A}(z_{[0:b_1](i)}, z_i, z_{i:A}, y_A), r_{A:j}[z_{[0:b_1](i)}, z_i, z_{i:A}, z_{\bar A}, y_A, r_{\hat A}(z_{[0:b_1](i)}, z_i, z_{i:A}, y_A)]) \neq 0;$$

and we write $X_i \Rightarrow^{[A]}_S X_j$. Otherwise, we say that $X_i$ does not $A$-cause $X_j$ in $S$, and we write $X_i \not\Rightarrow^{[A]}_S X_j$. When $A = \mathrm{ind}(I_{i:j})$, we denote the causal effect of $X_i$ on $X_j$ at $(z_{[0:b_1](i)}, z_i, z_i^*)$ in $S$ by $\Delta_{i,j,S}(z_{[0:b_1](i)}, z_i, z_i^*)$. When $A = \mathrm{ind}(I_{i:j})$ and

$X_i \Rightarrow^{[A]}_S X_j$, we say $X_i$ causes $X_j$ in $S$, and we write $X_i \Rightarrow_S X_j$; when $A = \mathrm{ind}(I_{i:j})$ and $X_i \not\Rightarrow^{[A]}_S X_j$, we say that $X_i$ does not cause $X_j$ in $S$, and we write $X_i \not\Rightarrow_S X_j$.

Definition 9 ($\sim A$-causality). Let $S$ and $A$ be as in definition 6. We say that $X_i$ causes $X_j$ exclusive of $X_A$ (or $X_i$ $\sim A$-causes $X_j$) in $S$ if there exists an admissible intervention to $(X_{[0:b_1](i)}, X_i, P^A_{i:j}, X_A, X_{\hat A}, X_{\bar A}, S^A_{i:j})$ with corresponding responses for $X_j$ such that the causal effect of $X_i$ on $X_j$ exclusive of $X_A$ at $(z_{[0:b_1](i)}, z_i, z_i^*, z_A)$ in $S$ is nonzero:

$$\Delta^{\sim A}_{i,j,S}(z_{[0:b_1](i)}, z_i, z_i^*, z_A)$$
$$\equiv r_j(z_{[0:b_1](i)}, z_i^*, y^*_{i:A}, y^*_{\bar A}, z_A, r_{\hat A}(z_{[0:b_1](i)}, z_i^*, y^*_{i:A}, z_A), r_{A:j}[z_{[0:b_1](i)}, z_i^*, y^*_{i:A}, y^*_{\bar A}, z_A, r_{\hat A}(z_{[0:b_1](i)}, z_i^*, y^*_{i:A}, z_A)])$$
$$- \; r_j(z_{[0:b_1](i)}, z_i, y_{i:A}, y_{\bar A}, z_A, r_{\hat A}(z_{[0:b_1](i)}, z_i, y_{i:A}, z_A), r_{A:j}[z_{[0:b_1](i)}, z_i, y_{i:A}, y_{\bar A}, z_A, r_{\hat A}(z_{[0:b_1](i)}, z_i, y_{i:A}, z_A)]) \neq 0;$$

and we write $X_i \Rightarrow^{\sim A}_S X_j$. Otherwise, we say that $X_i$ does not cause $X_j$ exclusive of $X_A$ (or $X_i$ does not $\sim A$-cause $X_j$) in $S$, and we write $X_i \not\Rightarrow^{\sim A}_S X_j$.

Thus, definitions 8 and 9 are analogous to definitions 6 and 7, with the difference that the direct effect of $X_i$ on $X_j$ is now further taken into account.

6.4 Relations Among Total, Direct, and Indirect Causality. We now relate the various causality notions defined above. These relations are completely intuitive, but it is important that they be made rigorous. Moreover, their plausibility suggests that the foregoing definitions are natural in an important sense.

First, we verify that a causal effect of $X_i$ on $X_j$ via $X_A$ can be decomposed into a direct causal effect of $X_i$ on $X_j$ and an indirect causal effect of $X_i$ on $X_j$ via $X_A$.

Proposition 1. Let $S$ and $A$ be as in definition 6. For all admissible interventions underlying the following direct effect and indirect effect via $X_A$, we have

$$\Delta^{[A]}_{i,j,S}(z_{[0:b_1](i)}, z_i, z_i^*, z_{i:A}, z_{\bar A})$$
$$= \Delta^D_{i,j,S}(z_{[0:b_1](i)}, z_i, z_i^*, z_{i:A}, z_{\bar A}, y^*_A, r_{\hat A}(z_{[0:b_1](i)}, z_i, z_{i:A}, y^*_A), r_{A:j}[z_{[0:b_1](i)}, z_i, z_{i:A}, z_{\bar A}, y^*_A, r_{\hat A}\{z_{[0:b_1](i)}, z_i, z_{i:A}, y^*_A\}])$$
$$+ \; \Delta^{I[A]}_{i,j,S}(z_{[0:b_1](i)}, z_i, z_i^*, z_{i:A}, z_{\bar A}).$$

A corollary then links $A$-causality, direct causality, and indirect causality via $X_A$:

Corollary 1. Let $S$ and $A$ be as in definition 6. Suppose that there exist admissible interventions underlying the direct effect and indirect effect via $X_A$ in proposition 1. If $\Delta^{[A]}_{i,j,S}(z_{[0:b_1](i)}, z_i, z_i^*, z_{i:A}, z_{\bar A}) \neq 0$, then $X_i \Rightarrow^D_S X_j$ or $X_i \Rightarrow^{I[A]}_S X_j$ or both.

An important special case of corollary 1 occurs when $A = \mathrm{ind}(I_{i:j})$:

Corollary 2. Set $A = \mathrm{ind}(I_{i:j})$ in corollary 1. If $\Delta_{i,j,S}(z_{[0:b_1](i)}, z_i, z_i^*) \neq 0$, then $X_i \Rightarrow^D_S X_j$ or $X_i \Rightarrow^I_S X_j$ or both.

Corollary 2 verifies the plausible claim that for admissible interventions, if $X_i$ causes $X_j$, it does so directly, indirectly, or both. Significantly, the converse need not hold, as direct and indirect causal channels can cancel one another. Corollary 1 extends this proposition to $A$-causality. A similar result holds for $\sim A$-causality:

Proposition 2. Let $S$ and $A$ be as in definition 6. For all admissible interventions underlying the following direct effect and indirect effect exclusive of $X_A$, we have

$$\Delta^{\sim A}_{i,j,S}(z_{[0:b_1](i)}, z_i, z_i^*, z_A)$$
$$= \Delta^D_{i,j,S}(z_{[0:b_1](i)}, z_i, z_i^*, y^*_{i:A}, y^*_{\bar A}, z_A, r_{\hat A}(z_{[0:b_1](i)}, z_i^*, y^*_{i:A}, z_A), r_{A:j}[z_{[0:b_1](i)}, z_i^*, y^*_{i:A}, y^*_{\bar A}, z_A, r_{\hat A}(z_{[0:b_1](i)}, z_i^*, y^*_{i:A}, z_A)])$$
$$+ \; \Delta^{I[\sim A]}_{i,j,S}(z_{[0:b_1](i)}, z_i, z_i^*, z_A).$$

Corollary 3. Let $S$ and $A$ be as in definition 6. Suppose that there exist admissible interventions underlying the direct effect and indirect effect exclusive of $X_A$ in proposition 2. If $\Delta^{\sim A}_{i,j,S}(z_{[0:b_1](i)}, z_i, z_i^*, z_A) \neq 0$, then $X_i \Rightarrow^D_S X_j$ or $X_i \Rightarrow^{I[\sim A]}_S X_j$ or both.

Proposition 8 in the appendix collects additional useful basic results on (indirect) causality via or exclusive of $X_A$ for the special cases $A = \emptyset$ or $A = \mathrm{ind}(I_{i:j})$.

Observe that in $S_{16}$, $X_0 \Rightarrow^{[\{1\}]}_{S_{16}} X_3$ is equivalent to $X_0 \Rightarrow^{[\sim\{2\}]}_{S_{16}} X_3$. In general, the relation between the notions $X_i \Rightarrow^{[A]}_S X_j$ and $X_i \Rightarrow^{\sim B}_S X_j$ with $B = \mathrm{ind}(I_{i:j}) \setminus A$ is more intricate. For example, it is straightforward to verify that in $S_{17}$, the notions $X_0 \Rightarrow^{[\{1,3\}]}_{S_{17}} X_4$ and $X_0 \Rightarrow^{\sim\{2\}}_{S_{17}} X_4$ do not coincide. This contrast arises due to $X_2$ interceding $X_0$ and $X_4$ along multiple paths in $S_{17}$.

It is thus useful and of interest to study even more refined notions of (indirect) causality using the framework in this article. In particular, for disjoint subsets $A$ and $B$ of $\mathrm{ind}(I_{i:j})$, we can study the notions of $X_i$ (indirectly) causing $X_j$ via $X_A$ and via $X_B$; via $X_A$ and exclusive of $X_B$; via $X_A$ or

X X X . exclusive of B; and exclusive of A or exclusive of B For brevity, we leave a formal treatment of these causal notions to other work, since the current definitions suffice for the version of the conditional Reichenbach principle we present next.

7 Conditional Independence in Recursive Systems

In this section, we use the foundations provided in sections 5 and 6 to characterize the relations between conditional independence and settable system notions of causality by establishing the conditional Reichenbach principle of common cause. The classical unconditional Reichenbach principle follows immediately.

We focus on canonical systems. Recall that the canonical setting for $X_i$, $i \in \Pi_b$, is

$$Z_i^c = Y_i \equiv r_i(Z_{[0:b-1]}).$$

Letting this expression recursively define the canonical settings $Z^c_{[0:b-1]}$, $b = 1, \ldots, B$, with $Z_0^c \equiv Z_0 = Y_0 \equiv Y_0^c$, we also define canonical responses

$$Y_i^c \equiv r_i(Z^c_{[0:b-1]}), \qquad i \in \Pi_b,\ b = 1, \ldots, B.$$

Below, whenever we reference canonical responses, we implicitly assume their existence. In the next two sections, we always reference canonical responses. Accordingly, we drop the superscript $c$ and write $Y_i$ in place of $Y_i^c$, and so on, for notational convenience.
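The recursion defining canonical settings and responses can be sketched in code. The following illustration is ours, not the authors': it assumes a toy scalar system in which each block $b$ holds a single settable variable with response function $r_b$ taking the vector of earlier canonical settings, and the response functions below are hypothetical.

```python
# A minimal sketch (our own illustration, not from the paper) of the canonical
# recursion: starting from the principal setting Z_0 = Y_0, each block's canonical
# response is its response function evaluated at the canonical settings of all
# earlier blocks, and that response then serves as the block's canonical setting.

def canonical_responses(y0, r):
    """y0: principal setting/response Y_0; r: dict mapping block b -> response fn r_b."""
    z = [y0]                  # canonical settings Z^c_{[0:b-1]}, grown block by block
    for b in sorted(r):
        z.append(r[b](z))     # Y^c_b = r_b(Z^c_{[0:b-1]}), and then Z^c_b = Y^c_b
    return z

# Hypothetical two-block example: Y_1 = 2*Y_0 and Y_2 = Y_0 + Y_1.
r = {1: lambda z: 2 * z[0], 2: lambda z: z[0] + z[1]}
print(canonical_responses(1.0, r))   # [1.0, 2.0, 3.0]
```

Because the system is recursive, a single forward pass through the blocks suffices; existence of canonical responses is exactly the requirement that each step of this recursion is well defined.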

7.1 The Conditional Reichenbach Principle of Common Cause. So far, none of our formal definitions or results have required probability measures. To relate causality and probabilistic dependence, we now explicitly introduce probability measures $P$ on $(\Omega, \mathcal{F})$. Our next result formalizes a conditional version of Reichenbach's principle:

Proposition 3 (the conditional Reichenbach principle of common cause, I). Let $S$ be a recursive partitioned settable system, and for $a, b \geq 0$, let $i \in \Pi_a$ and $j \in \Pi_b$, $i \neq j$. Let $X_i$ and $X_j$ be settable variables with canonical responses $Y_i$ and $Y_j$. Let $A \subset \{1, \ldots, n\} \setminus \{i, j\}$, and let $X_A$ be the corresponding vector of settable variables with canonical responses $Y_A$. For every probability measure $P$ on $(\Omega, \mathcal{F})$, if $Y_i \not\perp Y_j \mid Y_A$, then either:

i. $i = 0$ and $X_0$ causes $X_j$ exclusive of $A_j \equiv A \cap \mathrm{ind}(I_{0:j})$, that is, $X_0 \Rightarrow^{\sim A_j}_S X_j$; or
ii. $j = 0$ and $X_0$ causes $X_i$ exclusive of $A_i \equiv A \cap \mathrm{ind}(I_{0:i})$, that is, $X_0 \Rightarrow^{\sim A_i}_S X_i$; or
iii. $i, j \neq 0$ and $X_0 \Rightarrow^{\sim A_j}_S X_j$ and $X_0 \Rightarrow^{\sim A_i}_S X_i$.

The traditional Reichenbach principle of common cause follows by putting A = ∅:

Corollary 4 (the Reichenbach principle of common cause). Let $S$, $X_i$, and $X_j$ be as in proposition 3. For every probability measure $P$ on $(\Omega, \mathcal{F})$, if $Y_i \not\perp Y_j$, then:

i. $i = 0$ and $X_0$ causes $X_j$, that is, $X_0 \Rightarrow_S X_j$; or
ii. $j = 0$ and $X_0$ causes $X_i$, that is, $X_0 \Rightarrow_S X_i$; or
iii. $i, j \neq 0$ and $X_0 \Rightarrow_S X_i$ and $X_0 \Rightarrow_S X_j$.

This provides fully explicit conditions, both causal and probabilistic, under which the Reichenbach principle of common cause holds, that is, under which it is true that when canonical responses for two settable variables are probabilistically dependent, either one causes the other or there exists an underlying common cause. Note that while the possibility that one variable causes the other is not explicit in item iii, it is nevertheless implicit, as one way in which we may have $X_0 \Rightarrow_S X_j$ is via the indirect channel $X_0 \Rightarrow_S X_i \Rightarrow_S X_j$. If this fails in iii, then there nevertheless must be a common cause, $X_0$.

This analysis reveals that the traditional unconditional Reichenbach principle is not a deep fact. The reason is that the principal settable variable $X_0$ can always serve as a universal common cause. Moreover, because the principal setting values $z_0$ are identified with the underlying elements $\omega_0$ of the principal universe $\Omega_0$, one cannot dispense with this universal common cause without dispensing with the underlying structure supporting probability statements. This demonstrates the indispensable and dramatically simplifying role played by the principal variable $X_0$ as a universal common cause. Once this role is understood, the content of the unconditional Reichenbach principle is no longer mysterious. Its previously perplexing status can be understood as a consequence of the lack of a proper context for its formulation. The settable system framework supplies this context.

The conditional Reichenbach principle is substantive, however, as it implies that in recursive causal systems, knowledge of conditional dependence relations such as $Y_i \not\perp Y_j \mid Y_A$ is informative about the possible causal relations holding between settable variables $X_i$ and $X_j$.
Proposition 3 implies that in recursive systems, in order for two canonical responses $Y_i$ and $Y_j$ to be conditionally dependent given a vector of canonical responses $Y_A$, it must be that the principal variable $X_0$ causes at least $X_i$ or $X_j$ exclusive of the relevant subsets of $A$. Otherwise, we can express $Y_i$ or $Y_j$ (or both) as a function of the relevant subvector of $Y_A$. As proposition 3 has $A \subset \{1, \ldots, n\} \setminus \{i, j\}$, it is necessary that $0 \notin A$; $Y_i \not\perp Y_j \mid Y_A$ cannot hold otherwise.

Further, the possibility that one variable causes the other is again implicit in item iii. One way in which we may have $X_0 \Rightarrow^{\sim A_j}_S X_j$ and $X_0 \Rightarrow^{\sim A_i}_S X_i$ is via the indirect channel $X_0 \Rightarrow^{\sim A_i}_S X_i \Rightarrow^{\sim A_{i:j}}_S X_j$ with $A_{i:j} \equiv A \cap \mathrm{ind}(I_{i:j})$. But even if this fails in iii, then there nevertheless must be a common cause, $X_0$.

If the conclusion of proposition 3 holds (regardless of whether the stated conditions hold), then the direct causality graph $G$ associated with $S$ has the following simple property:

Proposition 4. Let $S$, $X_i$, $X_j$, and $X_A$ be as in proposition 3, and let $G$ be the direct causality graph corresponding to $S$. Suppose that conclusions i to iii of proposition 3 hold. Then there exist an $(X_0, X_i)$ path (if $i \neq 0$) and an $(X_0, X_j)$ path (if $j \neq 0$) that do not contain elements of $X_A$.

Thus, it suffices for $Y_i \perp Y_j \mid Y_A$ that $X_0 \not\Rightarrow^{\sim A_i}_S X_i$ or $X_0 \not\Rightarrow^{\sim A_j}_S X_j$, which is implied by the absence of an $(X_0, X_i)$ path or an $(X_0, X_j)$ path that does not contain elements of $X_A$.

To illustrate, we apply proposition 4 to system $S_{18}$. We have $Y_0 \perp Y_i \mid Y_2$ for $i = 3, \ldots, 8$, as $X_0 \not\Rightarrow^{\sim\{2\}}_{S_{18}} X_i$ for $i = 3, \ldots, 8$. Similarly, $Y_0 \perp Y_9 \mid (Y_1, Y_2)$, as $X_0 \not\Rightarrow^{\sim\{1,2\}}_{S_{18}} X_9$. Also, we have that $Y_2 \perp Y_5 \mid Y_3$, since $X_0 \not\Rightarrow^{\sim\{3\}}_{S_{18}} X_5$. In addition, we have that $Y_3 \perp Y_5 \mid Y_2$, as $X_0 \not\Rightarrow^{\sim\{2\}}_{S_{18}} X_3$ and $X_0 \not\Rightarrow^{\sim\{2\}}_{S_{18}} X_5$. These facts hold for every $P$.

These examples require knowledge of only direct causality relations. The direct causality graph suffices for this. But specific properties of the response functions, not indicated by the graph, may also be important. To illustrate, consider determining whether $Y_2 \perp Y_3$ in $S_{18}$. Corollary 4 gives that either $X_0 \not\Rightarrow_{S_{18}} X_2$ or $X_0 \not\Rightarrow_{S_{18}} X_3$ (or both) is sufficient for this to hold. We know from $G_{18}$ that $X_0 \Rightarrow_{S_{18}} X_2$, but determining whether $X_0 \Rightarrow_{S_{18}} X_3$ requires additional information about the functional form of response functions $r_2$ and $r_3$.

Similarly, in $S_{18}$, the contrapositive of proposition 3 gives that $X_0 \not\Rightarrow^{\sim\{3\}}_{S_{18}} X_7$ ensures $Y_2 \perp Y_7 \mid Y_3$. But determining whether $X_0 \not\Rightarrow^{\sim\{3\}}_{S_{18}} X_7$ holds requires additional information, not contained in $G_{18}$, about the functional forms of the response functions. Thus, similar to the situation for PCM DAGs, SS direct causality graphs do not provide complete information about the causal relations needed for resolving questions of conditional independence. Indeed, as Dawid (2010a) argued, nongraphical representations of causality are indispensable here. Our function-based definitions of causality supply just the information needed to relate causality to conditional independence.

Because a direct causality graph is not a probabilistic DAG, there is no reason to expect d-separation to be informative about conditional independence in direct causality DAGs such as $G_{18}$. For example, although $Y_3 \perp Y_5 \mid Y_2$, $X_3$ and $X_5$ are not d-separated by $X_2$ in $G_{18}$. Similarly, we have $Y_2 \perp Y_5 \mid (Y_3, Y_6)$, since $X_0 \not\Rightarrow^{\sim\{3\}}_{S_{18}} X_5$, whereas $X_2$ and $X_5$ are not d-separated by $(X_3, X_6)$ in $G_{18}$ due to the "collider" $X_2 \rightarrow X_6 \leftarrow X_5$. There is no paradox here: d-separation implies conditional independence in a certain class of probabilistic DAGs, but it does not generally apply to direct causality graphs.

Note also that because the path $\{X_2, X_6, X_7\}$ does not contain $X_3$, we have that $X_2$ and $X_7$ are not d-separated by $X_3$ in $G_{18}$. To conclude that $Y_2 \perp Y_7 \mid Y_3$ in such situations in PCM DAGs, Pearl (2000) and SGS introduce the assumptions of stability or faithfulness of the probability measure $P$. In sharp contrast, proposition 3 imposes no restrictions on $P$; instead, the properties of the response functions play the key role in determining the presence or absence of causal relations.

7.2 Characterizing the Conditional Reichenbach Principle. The conditional Reichenbach principle gives necessary but not sufficient causal conditions for conditional dependence. Specifically, $Y_i \perp Y_j \mid Y_A$ can hold even when the conclusion of proposition 3 holds. Examples of this are easy to construct:

Example 1. Consider system $S_{16}$ with $G_{16}$, and suppose that $Y_1$ and $Y_2$ are jointly normal with mean 0, variance 1, and correlation $\rho$. Then $Y_1$ and $Y_2$ are independent if and only if $\rho = 0$. When $\rho = 0$, $Y_1 \perp Y_2$ even though $X_1$ and $X_2$ share the common cause $X_0$.

It is also easy to construct examples in which independence holds between directly causally related variables:

Example 2. Consider system $S_{16}$ with $G_{16}$, and suppose that $Y_1$ and $Y_2$ are jointly normal with mean 0, variance 1, and correlation $\rho$. Suppose also that $X_2 \Rightarrow^D_{S_{16}} X_3$, with

$$Y_3 = Y_1 + aY_2.$$

Then $Y_2$ and $Y_3$ are also jointly normal, with mean 0. Let $a = -\rho$. Then $Y_3$ and $Y_2$ have zero correlation, so they are independent, even though $X_2 \Rightarrow^D_{S_{16}} X_3$. (Note that $Y_3$ has nonzero variance as long as $|a| < 1$.)

It is thus useful to refine the possibilities for conditional independence to distinguish situations in which causal restrictions among settable variables ensure that their canonical responses are conditionally independent for any probability measure and those where conditional independence holds only for some choice of $P$. Direct causality restrictions may be sufficient but are not necessary for the first situations to obtain, as seen above. Also, the second situations can hold due to a particular choice of $P$ only or due to both a particular configuration of response functions and a particular choice of $P$. The following definitions are useful for this:

Definition 10 (conditional causal isolation and conditional P-stochastic isolation). Let $S$, $Y_i$, $Y_j$, and $Y_A$ be as in proposition 3. Suppose that the conclusion of proposition 3 fails; then $X_i$ and $X_j$ are causally isolated given $X_A$. Let $P$ be a probability measure on $(\Omega, \mathcal{F})$, and suppose that $Y_i \perp Y_j \mid Y_A$ when $X_i$ and $X_j$ are not causally isolated given $X_A$; then we say that $X_i$ and $X_j$ are $P$-stochastically isolated given $X_A$.

From definition 10, we have that $X_i$ and $X_j$ are causally isolated given $X_A$ when $X_0 \not\Rightarrow^{\sim A_i}_S X_i$ or $X_0 \not\Rightarrow^{\sim A_j}_S X_j$. The "isolation" is from the potential cause $X_0$. When $A = \emptyset$, we say that $X_i$ and $X_j$ are causally isolated when the conclusion of corollary 4 does not hold, that is, when $X_0 \not\Rightarrow_S X_i$ or $X_0 \not\Rightarrow_S X_j$. Conditional causal isolation arises when, for one or the other of $X_i$ and $X_j$, the response functions channel the effects of the principal cause $X_0$ in just the right way so as to yield canonical responses $Y_i$ or $Y_j$ (or both) expressible just as a function of the relevant subsets of $Y_A$ (i.e., $Y_{A_i}$ or $Y_{A_j}$).

Conditional $P$-stochastic isolation is just conditional independence without conditional causal isolation. It can arise either from $P$ alone, as in example 1, or from just the right combination of $P$ and functional relations between multiple causes (common or direct), as in example 2. The utility of conditional $P$-stochastic isolation is that it permits distinguishing between guaranteed sources of conditional independence (conditional causal isolation) and more special or exceptional cases.

Provided that the underlying structure ensures that the individual conditional distributions of $Y_i$ given $Y_A$ and of $Y_j$ given $Y_A$ are each regular (see Dudley, 2002), there exists a joint probability measure $P^*$ such that $Y_i$ and $Y_j$ are conditionally independent given $Y_A$ (see propositions III.2.1 and III.2.2 of Neveu, 1965).
Nevertheless, the structural relations holding jointly among $Y_i$, $Y_j$, and $Y_A$ may impose restrictions sufficient to rule out conditional independence, so that although $P^*$ exists, it does not represent the joint distribution of $Y_i$ and $Y_j$ given $Y_A$. An example occurs below when we provide conditions ensuring that conditioning on successors rules out conditional independence.

On the other hand, when conditional independence is not strictly ruled out, $P$-stochastic isolation may be adopted without theoretical contradiction. As $P$-stochastic isolation is a potentially quite strong restriction, it should not be casually assumed. Instead, it should be empirically subjected to falsification whenever feasible by testing the conditional independence(s) it may be thought to justify. White and Chalak (2010) discuss such tests.
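The cancellation underlying example 2 is easy to verify numerically. The following sketch is our own illustration (assuming, as in example 2, the linear response $Y_3 = Y_1 + aY_2$ with $a = -\rho$); it checks that the sample correlation of $Y_2$ and $Y_3$ is near zero even though $Y_2$ enters $Y_3$'s response directly, an instance of $P$-stochastic isolation.

```python
import numpy as np

# Numerical check of example 2 (our own sketch, not from the paper):
# (Y1, Y2) jointly normal with mean 0, variance 1, correlation rho; the direct
# channel Y3 = Y1 + a*Y2 with a = -rho exactly offsets the common-cause channel
# through Y1, so Y2 and Y3 are uncorrelated (hence, by joint normality, independent).
rng = np.random.default_rng(0)
rho = 0.6
cov = np.array([[1.0, rho], [rho, 1.0]])
y1, y2 = rng.multivariate_normal([0.0, 0.0], cov, size=500_000).T

a = -rho
y3 = y1 + a * y2

print(np.corrcoef(y2, y3)[0, 1])   # sample correlation near 0
print(np.var(y3))                  # near 1 - rho**2, nonzero since |a| < 1
```

Analytically, $\mathrm{Cov}(Y_2, Y_3) = \rho + a = 0$ and $\mathrm{Var}(Y_3) = 1 - \rho^2$, so the independence here hinges on the particular pair $(P, r_3)$ rather than on causal isolation.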

Although we label the next statement as a corollary, it is really just a reformulation of the preceding definition. Its utility lies in providing a convenient explicit characterization of the relation between conditional independence and our function-based causality definitions, making it clear that conditional causal isolation goes only so far. In its absence, P-stochastic isolation can still ensure conditional independence.

Corollary 5 (conditional Reichenbach principle of common cause, II). Suppose the conditions of proposition 3 hold. For given probability measure $P$ on $(\Omega, \mathcal{F})$, $Y_i \perp Y_j \mid Y_A$ if and only if either $X_i$ and $X_j$ are causally isolated given $X_A$ or $X_i$ and $X_j$ are $P$-stochastically isolated given $X_A$.

When $A = \emptyset$, corollary 5 corresponds to and completes Reichenbach's principle of common cause. Theorem 1 in the appendix treats the vector case.

8 Settable Systems and Graphical Separation

As section 2 discusses, implications of d-separation in probabilistic DAGs have sometimes been ascribed causal intuition (Pearl, 2000). Absent other causal relations and expressed in our notation, these can be stated for canonical responses $Y_i$ and $Y_j$ as:

d.1. $Y_i \perp Y_j \mid Y_A$, provided $X_A$ fully mediates the effect of $X_i$ on $X_j$.
d.2. $Y_i \perp Y_j \mid Y_A$, provided $X_A$ denotes the common causes for $X_i$ and $X_j$ or fully mediates the effects of these common causes on either (or both) $X_i$ or $X_j$.
d.3. $Y_i \not\perp Y_j \mid Y_A$, provided $X_A$ is caused by both $X_i$ and $X_j$.

We reiterate that such causal interpretations are problematic in the context of probabilistic DAGs. Further, as discussed in section 3, even in the PCM, d.1 to d.3 may or may not hold. We now apply the conditional Reichenbach principle to assess the validity of d.1 to d.3 in SS.

First, we consider restricted SS analogous to Markovian PCMs. We show that in these special systems, the directed local Markov property holds, so that d-separation implies that d.1 and d.2 hold. Next, we discuss other special SS where conditional independence relations additional to the local Markov property may hold for canonical responses, as encoded by the D-separation criteria (Geiger et al., 1990). Here, d.1 and d.2 hold, as well as other statements that generally fail in Markovian systems. Finally, we provide general conditions for d.3 to hold in SS without requiring the local Markov property.

The conditions given here do not rely on graphs or notions of stability or faithfulness. Thus, d-separation, D-separation, and stability or faithfulness are not fundamental to establishing the connections between functionally defined causal relations and conditional independence, nor are they a natural starting point or context for this study. Still, they can be useful for inferring conditional independence in suitably restricted SS.

8.1 Conditioning on Predecessors

8.1.1 The Markovian PCM and d-Separation in Settable Systems. We now provide conditions sufficient for d.1 and d.2 to hold in SS. Our next result describes a settable system analogous to the Markovian PCM that generalizes the examples illustrated in $G_6$ through $G_{13}$. In particular, we show that here the local Markov property holds for certain random variables analogous to endogenous variables in the PCM.

Proposition 5. Let $S$ be recursive. Suppose that $X_0 \Rightarrow^D_S X_k$ for all $k \in \Pi_1$ and that the elements of $\Pi_1$ are in one-to-one correspondence with those of $\Pi_{[2:B]}$, such that for each $i \in \Pi_{[2:B]}$, there is a unique $k \in \Pi_1$ such that $X_k \Rightarrow^D_S X_i$. Suppose further that $X_0 \not\Rightarrow^D_S X_i$ for all $i \in \Pi_{[2:B]}$. For given $i \in \Pi_b$, $b \geq 2$, let $C \equiv \{l \in \Pi_{[2:b-1]} : X_l \Rightarrow^D_S X_i\}$ and let $A \equiv \{j \in \Pi_{[2:B]} \setminus C : X_j \text{ does not succeed } X_i\}$. Let $Y_i$, $Y_C$, and $Y_A$ be canonical responses of $X_i$, $X_C$, and $X_A$. If $P$ is a probability measure on $(\Omega, \mathcal{F})$ such that $\{Y_k : k \in \Pi_1\}$ are jointly independent, then $Y_i \perp Y_A \mid Y_C$.

A special case of proposition 5 obtains for $C = \emptyset$, in which case $Y_i \perp Y_A$ follows. Here, conditional independence holds but not conditional causal isolation, an example of $P$-stochastic isolation. Such systems are very special, as probability measures ensuring joint independence of $\{Y_k : k \in \Pi_1\}$ are shy in the set of all joint probability distributions.⁷

It is easy to construct a probabilistic DAG compatible with the distribution of canonical responses $\{Y_i : i \in \Pi_{[2:B]}\}$. This DAG is isomorphic to the subgraph of the settable system direct causality graph corresponding to elements of $\Pi_{[2:B]}$, substituting canonical responses for settable variables at the nodes. Further, Lauritzen et al. (1990, proposition 3) ensure that for such systems, d-separation or equivalent graphical criteria can identify exactly the conditional independence relations implied by the directed local Markov property. Thus, here, for $i, j \in \Pi_{[2:B]}$, $Y_i \not\perp Y_j$ implies that $X_i \Rightarrow_S X_j$, $X_j \Rightarrow_S X_i$, or there exists $k \in \Pi_{[2:B]}$ such that $X_k \Rightarrow_S X_i$ and $X_k \Rightarrow_S X_j$. Nevertheless, section 7 demonstrates that Reichenbach's principle holds here ($X_0 \Rightarrow_S X_i$ or $X_0 \Rightarrow_S X_j$), regardless of the special Markovian structure, and extends in a natural way to its conditional counterpart.

For example, consider the advice-action-outcome example illustrated in direct causality graph $G_8$ (see Figure 8), where the expert's advice does not directly affect the outcome. With no restrictions on $P$, the expert's advice, $Y_4$, and the outcome, $Y_6$, need not be conditionally independent given the canonical agent action, $Y_5$, since $X_4$ and $X_6$ need not be causally isolated given $X_5$. This is despite the fact that the agent's action $X_5$ fully mediates the effect of the advice $X_4$ on the outcome $X_6$. Nevertheless, if we impose the strong assumption that the causal structure is as in $G_8$ with $(Y_1, Y_2, Y_3)$ jointly independent, then by proposition 5, $Y_4 \perp Y_6 \mid Y_5$, so that $X_4$ and $X_6$ are $P$-stochastically isolated given $X_5$. Indeed, $Y_5$ d-separates $Y_4$ and $Y_6$ in probabilistic DAG $G_9$ associated with this "Markovian" structure.

⁷ Shyness is the function space analog of being a subset of a set of Lebesgue measure zero. See Corbae, Stinchcombe, and Zeman (2009) for a discussion of shyness.
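A linear-Gaussian instance of this structure can be checked by simulation. The responses below are our own hypothetical choices, not the paper's system: $Y_1, Y_2, Y_3$ are jointly independent first-block responses, $Y_4 = Y_1$ is the advice, $Y_5 = Y_4 + Y_2$ the action, and $Y_6 = Y_5 + Y_3$ the outcome. For jointly normal variables, $Y_4 \perp Y_6 \mid Y_5$ corresponds to a zero partial correlation.

```python
import numpy as np

# Simulation sketch (our own hypothetical linear responses) of the Markovian
# structure in proposition 5: advice Y4 = Y1, action Y5 = Y4 + Y2,
# outcome Y6 = Y5 + Y3, with Y1, Y2, Y3 jointly independent standard normals.
rng = np.random.default_rng(2)
n = 400_000
y1, y2, y3 = rng.standard_normal((3, n))
y4 = y1
y5 = y4 + y2
y6 = y5 + y3

def partial_corr(a, b, c):
    # correlate the residuals of a and b after linearly removing c from each
    res_a = a - (np.cov(a, c)[0, 1] / np.var(c)) * c
    res_b = b - (np.cov(b, c)[0, 1] / np.var(c)) * c
    return np.corrcoef(res_a, res_b)[0, 1]

print(np.corrcoef(y4, y6)[0, 1])   # clearly nonzero: Y4 and Y6 are dependent
print(partial_corr(y4, y6, y5))    # near 0: Y4 and Y6 independent given Y5
```

The vanishing partial correlation reflects $P$-stochastic isolation: it depends on the joint independence of the first block, not on any causal isolation of $X_4$ and $X_6$ given $X_5$.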

8.1.2 Deterministic and Chance Nodes and D-Separation. Geiger et al. (1990) study DAGs that distinguish between deterministic and chance nodes. A deterministic node corresponds to a random variable that is conditionally independent of all other random variables given its DAG parents, whereas a chance node corresponds to a random variable that is conditionally independent of its "nondescendants" (nonsuccessors) given its parents.

Geiger et al. (1990) call the conditional independence statements corresponding to deterministic and chance nodes an "enhanced basis" and provide an analog to d-separation for these DAGs called "D-separation" that ensures conditional independence under the graphoid axioms. Similar to probabilistic DAGs, these DAGs do not contain any necessary causal content. In Markovian PCM graphs (such as $G_9$), none of the nodes are fully determined by their parents, and thus d-separation and D-separation coincide in such DAGs.

Suitably restricted SS can embody D-separation. For example, consider the probabilistic DAG $G^*$ corresponding to the direct causality graph $G$ for a recursive system $S$ that substitutes canonical responses for settable variables. By theorem 1, if $C \subset \mathrm{ind}(I_{0:i})$ is such that $X_0 \not\Rightarrow^{\sim C}_S X_i$ and $A = \{0, \ldots, n\} \setminus (\{i\} \cup C)$, then $Y_i \perp Y_A \mid Y_C$. In particular, $X_0 \not\Rightarrow^{\sim C}_S X_i$ when the set $C$ corresponds to all direct causes (i.e., the "parents") of $X_i$. In this sense, in $G^*$, $Y_0$ is a chance node, whereas the nodes $Y_i$, $i \neq 0$, are deterministic.

It can be verified that for disjoint sets $D$, $E$, and $F$ in $\{0, \ldots, n\}$, $Y_D$ and $Y_E$ are not D-separated given $Y_F$ in $G^*$ if and only if (a)(i) $0 \in D$ and (ii) for some $j \in E$, there exists an $(X_0, X_j)$ path in $G$ that does not contain elements of $X_F$; or (b)(i) $0 \in E$ and (ii) for some $i \in D$, there exists an $(X_0, X_i)$ path in $G$ that does not contain elements of $X_F$; or (c)(i) $0 \notin D \cup E$ and (ii) (a.ii) and (b.ii) hold.
Although the graphical D-separation criteria are sufficient for $Y_D \perp Y_E \mid Y_F$, they are not necessary. A more general condition for $Y_D \perp Y_E \mid Y_F$ is the failure of theorem 1's condition (a), as this is implied by, but does not imply, D-separation in $G^*$.

Figure 19: G19.

Our next result describes a restricted settable system similar to that in proposition 5 that generates random variables forming an enhanced basis:

Proposition 6. Let $S$ be recursive. Suppose that $X_0 \Rightarrow^D_S X_k$ for all $k \in \Pi_1$ and that for each $k \in \Pi_1$, there is a unique $i \in \Pi_{[2:B]}$ such that $X_k \Rightarrow^D_S X_i$. Suppose further that $X_0 \not\Rightarrow^D_S X_i$ for all $i \in \Pi_{[2:B]}$. For given $i \in \Pi_b$, $b \geq 2$, let $C \equiv \{l \in \Pi_{[2:b-1]} : X_l \Rightarrow^D_S X_i\}$. Let $A_1 \equiv \{l \in \Pi_{[2:B]} \setminus (C \cup \{i\})\}$ and $A_2 \equiv \{j \in \Pi_{[2:B]} \setminus (C \cup \{i\}) : X_j \text{ does not succeed } X_i\}$. Let $P$ be a probability measure on $(\Omega, \mathcal{F})$. (i) Suppose that $X_k \not\Rightarrow^D_S X_i$ for all $k \in \Pi_1$. Then $Y_i \perp Y_{A_1} \mid Y_C$. (ii) Suppose that (a) $X_k \Rightarrow^D_S X_i$ for some $k \in \Pi_1$ and (b) $P$ is such that $\{Y_k : k \in \Pi_1\}$ are jointly independent. Then $Y_i \perp Y_{A_2} \mid Y_C$.

Part i ensures conditional causal isolation. Part ii gives $P$-stochastic isolation when $\{Y_k : k \in \Pi_1\}$ are jointly independent, generating an enhanced basis involving $\{Y_i, i \in \Pi_{[2:B]}\}$. This is represented in DAG $G^{\dagger}$ isomorphic to the subgraph for $i \in \Pi_{[2:B]}$ of $G$, substituting canonical responses for settable variables at the nodes. If $X_k \not\Rightarrow^D_S X_i$ for all $k \in \Pi_1$, $Y_i$ is represented by a dashed deterministic node in $G^{\dagger}$. Otherwise, $Y_i$ is represented by a solid chance node. Applying D-separation to $G^{\dagger}$ identifies exactly the conditional independence relations implied by this enhanced basis under the graphoid axioms.

To illustrate, consider the canonical responses of the advice-action-outcome example illustrated in Figure 19, where the expert's advice, $X_3$, has no effect on the outcome, $X_5$, and the agent fully complies with the expert's advice.

Since $X_0 \not\Rightarrow^{\sim 3}_{S_{19}} X_4$, lemma 1 ensures that the agent's action canonical response $Y_4$ is determined as a function of the expert's advice canonical response $Y_3$. Thus, $Y_4$ is represented by a deterministic (dashed) node in Figure 20. On the other hand, $Y_3$ and $Y_5$ are represented by chance (solid) nodes in $G_{20}$.

Figure 20: $G_{20}$.

Proposition 6 gives that $Y_4 \perp Y_5 \mid Y_3$ for any $P$, a consequence of conditional causal isolation. If $Y_1$ and $Y_2$ are independent, then $X_3$ and $X_5$ are $P$-stochastically isolated given $X_4$, and $Y_3 \perp Y_5 \mid Y_4$.

Clearly, such structures impose very strong restrictions on both the causal relationships of the system and the distribution of the responses in the first block, $\{Y_k : k \in \Pi_1\}$.

8.2 Conditioning on Successors. Unlike properties d.1 and d.2, which concern the independence of successors conditioning on predecessors, d.3 is a statement about the conditional dependence of predecessors, conditioning on successors. Without further conditions, d.3 need not hold in recursive SS. For example, for canonical responses in system $S_{18}$, we have that $Y_2 \perp Y_5 \mid (Y_3, Y_6)$ since $X_0 \not\Rightarrow^{\sim\{3\}}_{S_{18}} X_5$, even though $X_2 \Rightarrow^D_{S_{18}} X_6$ and $X_5 \Rightarrow^D_{S_{18}} X_6$.

Nevertheless, d.3 may follow from the conditional Reichenbach principle for special cases under further assumptions. For instance, the local Markov property holds among certain canonical responses in Markovian systems as described in section 8.1; thus, d.3 holds for these provided faithfulness (SGS) or stability (Pearl, 2000) holds. But these notions do not provide satisfying insight, as they simply impose whatever might be needed to ensure d.3. Moreover, the local Markov property is a strong restriction.

Our next result extends d.3 to systems that need not be Markovian (see also Wermuth & Cox, 2004). In order to provide simple informative primitive conditions for d.3, we introduce two convenient definitions. First, we specify what we mean by saying that predecessors $(Y_1, Y_2)$ are jointly continuously distributed at a point $(y_1^*, y_2^*)$. For simplicity, we let $Y_1$ and $Y_2$ be scalar. For given $\epsilon > 0$, $y_1^* \in \mathbb{R}$, and $y_2^* \in \mathbb{R}$, define neighborhoods $N_1(\epsilon) \equiv [y_1^* - \epsilon, y_1^* + \epsilon]$, $N_2(\epsilon) \equiv [y_2^* - \epsilon, y_2^* + \epsilon]$, and $N(\epsilon) \equiv N_1(\epsilon) \times N_2(\epsilon)$.

Definition 11. We say $Y_1$ and $Y_2$ are jointly continuously distributed at $(y_1^*, y_2^*)$ if there exists $\epsilon > 0$ such that if $A \subset N(\epsilon)$ is Borel measurable, then $P[(Y_1, Y_2) \in A] > 0$ if and only if $\lambda(A) > 0$, where $\lambda$ denotes Lebesgue measure on $\mathbb{R}^2$.

Assuming that $Y_1$ and $Y_2$ are jointly continuously distributed ensures that both $Y_1$ and $Y_2$ exhibit nontrivial random variation and that neither completely determines the other.

Next, we state a mild restriction on the response function:

Definition 12. Let $f : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ be such that there exist $y_1^* \in \mathbb{R}$, $y_2^* \in \mathbb{R}$, and $\epsilon > 0$ such that for all $y_1$ in $N_1(\epsilon)$, $f(y_1, \cdot)$ is strictly monotone on $N_2(\epsilon)$ and for all $y_2$ in $N_2(\epsilon)$, $f(\cdot, y_2)$ is strictly monotone on $N_1(\epsilon)$. Then $f$ is locally strictly monotone at $(y_1^*, y_2^*)$.

As special cases, locally strictly monotone functions can be locally strictly increasing or decreasing. The definition also covers mixed cases where, for example, for all $y_1$ in $N_1(\epsilon)$, $f(y_1, \cdot)$ is strictly decreasing on $N_2(\epsilon)$ and for all $y_2$ in $N_2(\epsilon)$, $f(\cdot, y_2)$ is strictly increasing on $N_1(\epsilon)$. Local strict monotonicity is a mild restriction, sufficient to ensure that $X_1$ and $X_2$ both cause $X_3$ with canonical response $Y_3 = r_3(Y_1, Y_2)$.

Proposition 7. Let $S$ be recursive, and suppose that $Y_3 = r_3(Y_1, Y_2)$. Suppose further that for some $(y_1^*, y_2^*)$, $Y_1$ and $Y_2$ are jointly continuously distributed at $(y_1^*, y_2^*)$ and that $r_3 : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ is both continuous and locally strictly monotone at $(y_1^*, y_2^*)$. Then $Y_1 \not\perp Y_2 \mid Y_3$.

This delivers d.3 without imposing the local Markov property. Observe that it is straightforward to extend this to allow conditioning on both successors and nonsuccessors. For brevity, we leave further investigation of d.3 and its conditional extension to future work.
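Proposition 7's conclusion can be illustrated by simulation. The example below is our own (assuming $r_3(y_1, y_2) = y_1 + y_2$, which is continuous and strictly monotone in each argument): conditioning two independent predecessors on their sum induces strong negative dependence, as d.3 asserts.

```python
import numpy as np

# Illustration of proposition 7 / d.3 (our own sketch, not from the paper):
# Y1, Y2 independent standard normals; Y3 = r3(Y1, Y2) = Y1 + Y2 is continuous
# and strictly monotone in each argument, so Y1 and Y2 are dependent given Y3.
rng = np.random.default_rng(1)
y1 = rng.standard_normal(1_000_000)
y2 = rng.standard_normal(1_000_000)
y3 = y1 + y2

print(np.corrcoef(y1, y2)[0, 1])            # near 0: unconditionally independent

# Approximate conditioning on Y3 by restricting to a thin slice around 0.
sel = np.abs(y3) < 0.05
print(np.corrcoef(y1[sel], y2[sel])[0, 1])  # strongly negative: y2 must offset y1 on the slice
```

Within the slice, $y_2 \approx -y_1$ up to the slice width, so the conditional correlation is close to $-1$; no faithfulness or local Markov assumption is involved.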

9 Conclusion

We study the connections between conditional independence and causal relations within the SS extension of the PCM. As foundations, we provide formal function-based definitions of direct causality and of indirect and total causality via and exclusive of a set of variables. These definitions complement and extend those in Robins and Greenland (1992), SGS, Pearl (2000, 2001), Robins (2003), Avin et al. (2005), Didelez et al. (2006), and Geneletti (2007). With these foundations, we state and prove the conditional Reichenbach principle of common cause, characterizing the relations between conditional independence and causality. The classical Reichenbach principle follows as a corollary. We address concerns raised by Dawid (2002, 2010a, 2010b) by demonstrating how the SS extension of the PCM permits a clear separation between causal and probabilistic concepts. In particular, we introduce concepts of conditional causal isolation and conditional P-stochastic isolation to distinguish between situations in which causal restrictions among settable variables ensure that their responses are conditionally independent for any probability measure and those where conditional independence is due to a particular choice of probability measure.

We apply the conditional Reichenbach principle to examine the concepts of d-separation and D-separation, and we provide conditions under which the causal intuitions these notions support hold or fail in SS. Our results show that background variables, the Markov properties, enhanced bases, chance and deterministic nodes, and the assumption of faithfulness or stability are not fundamental to relating conditional independence and causality. We nevertheless show that d-separation and D-separation can help in understanding conditional independence relations in particular restricted SS.

This article's results have direct relevance for empirical research by, among other things, providing conditions ensuring conditional independence relations useful for recovering causal effects from experimental and observational data and by providing foundations enabling researchers to identify, justify, and test the validity of covariates in treatment effect estimation, of instruments in instrumental variables estimation, and of predictors in forecasting models. Specifically, these results support Chalak and White's (2011) study of the choice of instruments in extended instrumental variables estimation, White and Lu's (2011) study of the choice of covariates for treatment effect estimation, and White and Chalak's (in press) study of necessary and sufficient conditions for the identification of causal effects.

A primary focus here is on recursive systems. An interesting direction for further research is to extend our concepts and results to nonrecursive systems (see, e.g., Lauritzen & Richardson, 2002; Wermuth & Cox, 2004; White & Chalak, 2009).
Our framework also provides an appropriate foundation for studying the identification and estimation of direct, indirect, and more refined causal effects (see Avin et al., 2005; Didelez et al., 2006; Geneletti, 2007; Shpitser & Pearl, 2008).

Mathematical Appendix

Proof of Proposition 1. Beginning with $\Delta^{[A]}_{i,j,S}(z_{[0:b_1]}(i), z_i, z_i^*, z_{i:A}, z_A)$, the result obtains by adding and subtracting
$$r_j\big(z_{[0:b_1]}(i), z_i, z_{i:A}, y_A^*, r_A\{z_0, z_{[1:b_1]}(i), z_i, z_{i:A}, y_A^*\}, r_{A:j}\big[z_{[0:b_1]}(i), z_i, z_{i:A}, y_A^*, r_A\{z_0, z_{[1:b_1]}(i), z_i, z_{i:A}, y_A^*\}\big]\big)$$
and applying definitions 4 and 6.

Proof of Corollary 1. Since there exist admissible interventions underlying the direct effect and the indirect effect via $X_A$ in proposition 1, there exists an admissible intervention underlying the total effect via $X_A$ in proposition 1.

Suppose that $\Delta^{[A]}_{i,j,S}(z_{[0:b_1]}(i), z_i, z_i^*, z_{i:A}, z_A) \ne 0$. Then by proposition 1, either
$$\Delta^{D}_{i,j,S}\big(z_{[0:b_1]}(i), z_i, z_i^*, z_{i:A}, z_A, y_A^*, r_A\{z_0, z_{[1:b_1]}(i), z_i, z_{i:A}, y_A^*\}, r_{A:j}\big[z_{[0:b_1]}(i), z_i, z_{i:A}, y_A^*, r_A\{z_0, z_{[1:b_1]}(i), z_i, z_{i:A}, y_A^*\}\big]\big) \ne 0,$$
or $\Delta^{I[A]}_{i,j,S}(z_{[0:b_1]}(i), z_i, z_i^*, z_{i:A}, z_A) \ne 0$, or both. Thus, $X_i \Rightarrow^{D}_S X_j$ or $X_i \Rightarrow^{I[A]}_S X_j$ or both.

Proof of Corollary 2. Apply corollary 1 with $A = \mathrm{ind}(I_{i:j})$.

Proof of Proposition 2. The proof is analogous to that for proposition 1.

Proof of Corollary 3. The proof is analogous to that for corollary 1.

We next collect together useful basic results on (indirect) causality via or exclusive of $X_A$ for the special cases $A = \emptyset$ or $A = \mathrm{ind}(I_{i:j})$.

Proposition 8. Let $S$, $i$, and $j$ be as in definition 6, $A = \emptyset$, and $B = \mathrm{ind}(I_{i:j})$. Then

a. $X_i \not\Rightarrow^{I[A]}_S X_j$
b. $X_i \not\Rightarrow^{I[\sim B]}_S X_j$
c. $X_i \Rightarrow^{I[\sim A]}_S X_j$ if and only if $X_i \Rightarrow^{I[B]}_S X_j$
d. $X_i \Rightarrow^{\sim A}_S X_j$ if and only if $X_i \Rightarrow^{[B]}_S X_j$
e. $X_i \Rightarrow^{[A]}_S X_j$ if and only if $X_i \Rightarrow^{\sim B}_S X_j$
f. $X_i \Rightarrow^{[A]}_S X_j$ if and only if $X_i \Rightarrow^{D}_S X_j$

It follows from e and f in proposition 8 that $\emptyset$-causality and ${\sim}A$-causality with $A = \mathrm{ind}(I_{i:j})$ are equivalent to direct causality in recursive systems.

Proof of Proposition 8.

a. We have $\mathrm{ind}(I_{i:j}) = {\sim}A$, and thus $X_i \not\Rightarrow^{I[A]}_S X_j$, since $r_j(z_{[0:b_1]}(i), z_i, z_{\sim A}) - r_j(z_{[0:b_1]}(i), z_i, z_{\sim A}) = 0$ for all function arguments.

b. We have $B = \mathrm{ind}(I_{i:j})$, and thus $X_i \not\Rightarrow^{I[\sim B]}_S X_j$, since $r_j(z_{[0:b_1]}(i), z_i, z_B) - r_j(z_{[0:b_1]}(i), z_i, z_B) = 0$ for all function arguments.

c. $X_i \Rightarrow^{I[\sim A]}_S X_j$ if there is an admissible intervention with $r_j(z_{[0:b_1]}(i), z_i, y_{\sim A}) - r_j(z_{[0:b_1]}(i), z_i, y_{\sim A}^*) \ne 0$. Also, $X_i \Rightarrow^{I[B]}_S X_j$ if there is an admissible intervention with $r_j(z_{[0:b_1]}(i), z_i, y_B) - r_j(z_{[0:b_1]}(i), z_i, y_B^*) \ne 0$. But ${\sim}A = \mathrm{ind}(I_{i:j}) = B$, and the two definitions coincide.

d. $X_i \Rightarrow^{\sim A}_S X_j$ if there is an admissible intervention with $r_j(z^*_{[0:b_1]}(i), z^*_i, y_A) - r_j(z_{[0:b_1]}(i), z_i, y_A) \ne 0$. Also, $X_i \Rightarrow^{[B]}_S X_j$ if there is an admissible intervention with $r_j(z^*_{[0:b_1]}(i), z^*_i, y_B) - r_j(z_{[0:b_1]}(i), z_i, y_B) \ne 0$. But ${\sim}A = \mathrm{ind}(I_{i:j}) = B$, and the two definitions coincide.

e. $X_i \Rightarrow^{[A]}_S X_j$ if there is an admissible intervention with $r_j(z_{[0:b_1]}(i), z_i^*, z_{\sim A}) - r_j(z_{[0:b_1]}(i), z_i, z_{\sim A}) \ne 0$. Also, $X_i \Rightarrow^{\sim B}_S X_j$ if there is an admissible intervention with $r_j(z_{[0:b_1]}(i), z_i^*, z_B) - r_j(z_{[0:b_1]}(i), z_i, z_B) \ne 0$. But ${\sim}A = \mathrm{ind}(I_{i:j}) = B$, and the two definitions coincide.

f. $X_i \Rightarrow^{[A]}_S X_j$ if there is an admissible intervention with $r_j(z_{[0:b_1]}(i), z_i^*, z_{\sim A}) - r_j(z_{[0:b_1]}(i), z_i, z_{\sim A}) \ne 0$. Also, $X_i \Rightarrow^{D}_S X_j$ if there is an admissible intervention with $r_j(z_{[0:b_1]}(i), z_i^*, z_{[b_1+1:b_2-1]}) - r_j(z_{[0:b_1]}(i), z_i, z_{[b_1+1:b_2-1]}) \ne 0$. But ${\sim}A = \mathrm{ind}(I_{i:j}) = \Pi_{[b_1+1:b_2-1]}$, and the two definitions coincide.

We next state a lemma that plays a key role in formalizing the conditional Reichenbach principle of common cause. For $i \in \Pi_{b_1}$, $j \in \Pi_{b_2}$, $b_1 < b_2$, and $A \subseteq \mathrm{ind}(I_{i:j})$, we let the elements of $y_{A:j} = r_{A:j}(z_{[0:b_1]}(i), z_i, y_{i:A}, y_{\bar A}, y_A, y_A)$ obtain by recursive substitution.
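The phrase "obtain by recursive substitution" has a simple computational reading: in a recursive system, canonical response values are computed block by block, each response function evaluated at the responses already obtained. A minimal sketch (our own illustration, using a hypothetical three-variable system; the function names are not from the article):

```python
def canonical_responses(order, response_fns, y0):
    """Compute canonical responses by recursive substitution: each X_j's
    settings are replaced by the responses of its predecessors.

    `order` lists indexes in block order, `response_fns[j]` maps the dict of
    earlier responses to y_j, and `y0` is the fundamental variable's value."""
    y = {0: y0}
    for j in order:
        y[j] = response_fns[j](y)  # r_j evaluated at predecessors' responses
    return y

# Hypothetical recursive system: Y1 = r1(Y0), Y2 = r2(Y0, Y1), Y3 = r3(Y1, Y2).
fns = {
    1: lambda y: 2 * y[0],
    2: lambda y: y[0] + y[1],
    3: lambda y: y[1] * y[2],
}
print(canonical_responses([1, 2, 3], fns, y0=1.0))
# {0: 1.0, 1: 2.0, 2: 3.0, 3: 6.0}
```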

Lemma 1. Let $S$ be recursive, $j \in \Pi_b$, and let $A$ be a subset of $\mathrm{ind}(I_{0:j})$. If $X_0 \not\Rightarrow^{\sim A}_S X_j$, then there exists a measurable function $\tilde r_j$ such that for all admissible setting values,

$$r_j(z_0, y_{0:A}, y_{\bar A}, z_A, r_A(z_0, y_{0:A}, z_A), r_{A:j}[z_0, y_{0:A}, y_{\bar A}, z_A, r_A(z_0, y_{0:A}, z_A)]) = \tilde r_j(z_A).$$

In particular, if the canonical response $Y_j^c$ for $X_j$ exists, then

$$y_j^c \equiv r_j(z_0, y_{0:A}, y_{\bar A}, y_A, y_A, y_{A:j}) = \tilde r_j(y_A^c).$$

Note that here $i = 0$ and $z_0, y_{0:A}, y_{\bar A}, y_A, y_A, y_{A:j}$ are canonical setting and response values. Thus, if $X_0 \not\Rightarrow^{\sim A}_S X_j$, then, provided it exists, we can express a canonical response $Y_j^c$ as a function of the canonical responses $Y_A^c$.

Proof of Lemma 1. Denote by $S^*$ the space of jointly admissible settings for $(X_0, P^A_{0:j}, X_{\bar A}, X_A, X_A, S^A_{0:j})$ of the form $(z_0^*, y_{0:A}^*, y_{\bar A}^*, z_A, r_A(z_0^*, y_{0:A}^*, z_A), r_{A:j}[z_0^*, y_{0:A}^*, y_{\bar A}^*, z_A, r_A(z_0^*, y_{0:A}^*, z_A)])$. First, suppose that $S^*$ is empty or a singleton. Then there does not exist an admissible intervention to $(X_0, P^A_{0:j}, X_{\bar A}, X_A, X_A, S^A_{0:j})$ of the specified form, and thus $X_0 \not\Rightarrow^{\sim A}_S X_j$. It

follows trivially that there exists a measurable function $\tilde r_j$ such that

$$r_j(z_0, y_{0:A}, y_{\bar A}, z_A, r_A(z_0, y_{0:A}, z_A), r_{A:j}[z_0, y_{0:A}, y_{\bar A}, z_A, r_A(z_0, y_{0:A}, z_A)]) = \tilde r_j(z_A),$$

and in particular, if $(z_0, y_{0:A}, y_{\bar A}, y_A, y_A, y_{A:j}) \in S^*$, we have $y_j^c = r_j(z_0, y_{0:A}, y_{\bar A}, y_A, y_A, y_{A:j}) = \tilde r_j(y_A^c)$. Second, suppose that $S^*$ is a multi-element set and that $X_0 \not\Rightarrow^{\sim A}_S X_j$. Then by definition 9, for all admissible interventions to $(X_0, P^A_{0:j}, X_{\bar A}, X_A, X_A, S^A_{0:j})$ with the following corresponding responses for $X_j$, we have

$$r_j(z_0^*, y_{0:A}^*, y_{\bar A}^*, z_A, r_A(z_0^*, y_{0:A}^*, z_A), r_{A:j}[z_0^*, y_{0:A}^*, y_{\bar A}^*, z_A, r_A(z_0^*, y_{0:A}^*, z_A)]) - r_j(z_0, y_{0:A}, y_{\bar A}, z_A, r_A(z_0, y_{0:A}, z_A), r_{A:j}[z_0, y_{0:A}, y_{\bar A}, z_A, r_A(z_0, y_{0:A}, z_A)]) = 0.$$

Therefore, there exists a measurable function $z_A \mapsto \tilde r_j(z_A)$ such that for all elements of $S^*$,

$$r_j(z_0, y_{0:A}, y_{\bar A}, z_A, r_A(z_0, y_{0:A}, z_A), r_{A:j}[z_0, y_{0:A}, y_{\bar A}, z_A, r_A(z_0, y_{0:A}, z_A)]) = \tilde r_j(z_A).$$

In particular, for all $(z_0, y_{0:A}, y_{\bar A}, y_A, y_A, y_{A:j}) \in S^*$ we have

$$r_j(z_0, y_{0:A}, y_{\bar A}, y_A, y_A, y_{A:j}) = \tilde r_j(y_A) = \tilde r_j(y_A^c).$$

A special case of lemma 1 occurs when A = ∅:

Corollary 6. Let $S$ and $Y_j^c$ be as in lemma 1. If $X_0 \not\Rightarrow_S X_j$, then $Y_j^c$ is constant.

Proof of Corollary 6. Let $A = \emptyset$. Proposition 8(d) gives that $X_0 \Rightarrow^{\sim A}_S X_j$ if and only if $X_0 \Rightarrow_S X_j$. Since $\tilde r_j(z_A)$ must then be constant, the result follows from lemma 1.

Proof of Proposition 3. Apply theorem 1 (below) with A ={i} and B ={j}.

Proof of Corollary 4. Apply proposition 3 with A = ∅ and corollary 1.

Proof of Proposition 4. We prove the contrapositive.

(i) Suppose that $i = 0$ and that there does not exist an $(X_0, X_j)$ path that does not contain elements of $X_A$. Let $A_j = A \cap \mathrm{ind}(I_{0:j})$. Denote by $S^*$ the space of jointly admissible settings to $(X_0, P^{A_j}_{0:j}, X_{\bar A_j}, X_{A_j}, X_{A_j}, S^{A_j}_{0:j})$ of the form
$$(z_0^*, y_{0:A_j}^*, y_{\bar A_j}^*, z_{A_j}, r_{A_j}(z_0^*, y_{0:A_j}^*, z_{A_j}), r_{A_j:j}[z_0^*, y_{0:A_j}^*, y_{\bar A_j}^*, z_{A_j}, r_{A_j}(z_0^*, y_{0:A_j}^*, z_{A_j})]).$$
First, suppose that $S^*$ is empty or a singleton. Then there does not exist an admissible intervention to $(X_0, P^{A_j}_{0:j}, X_{\bar A_j}, X_{A_j}, X_{A_j}, S^{A_j}_{0:j})$ of the specified form, and thus $X_0 \not\Rightarrow^{\sim A_j}_S X_j$, a contradiction. Second, suppose that $S^*$ is a multi-element set. By construction, for all admissible interventions to $(X_0, P^{A_j}_{0:j}, X_{\bar A_j}, X_{A_j}, X_{A_j}, S^{A_j}_{0:j})$ with the following corresponding responses for $X_j$, we have
$$r_j(z_0^*, y_{0:A_j}^*, y_{\bar A_j}^*, z_{A_j}, r_{A_j}(z_0^*, y_{0:A_j}^*, z_{A_j}), r_{A_j:j}[z_0^*, y_{0:A_j}^*, y_{\bar A_j}^*, z_{A_j}, r_{A_j}(z_0^*, y_{0:A_j}^*, z_{A_j})]) - r_j(z_0, y_{0:A_j}, y_{\bar A_j}, z_{A_j}, r_{A_j}(z_0, y_{0:A_j}, z_{A_j}), r_{A_j:j}[z_0, y_{0:A_j}, y_{\bar A_j}, z_{A_j}, r_{A_j}(z_0, y_{0:A_j}, z_{A_j})]) = 0.$$
Otherwise, by definition 4, there must exist an $(X_0, X_j)$ path that does not contain elements of $X_{A_j}$ and therefore of $X_A$. Then by definition 9, $X_0 \not\Rightarrow^{\sim A_j}_S X_j$, a contradiction.

(ii) Suppose that $j = 0$ and that there does not exist an $(X_0, X_i)$ path that does not contain elements of $X_A$. Then an argument parallel to part (i) leads to $X_0 \not\Rightarrow^{\sim A_i}_S X_i$ (with $A_i \equiv A \cap \mathrm{ind}(I_{0:i})$), a contradiction.

(iii) Suppose that $i, j \ne 0$ and that there does not exist (a) an $(X_0, X_i)$ path that does not contain elements of $X_A$ or (b) an $(X_0, X_j)$ path that does not contain elements of $X_A$ (or both). Then arguments parallel to (i) or (ii) (or both) imply that $X_0 \not\Rightarrow^{\sim A_j}_S X_j$ or $X_0 \not\Rightarrow^{\sim A_i}_S X_i$ (or both), a contradiction.

Proof of Corollary 5. The result is immediate from proposition 3 and the contrapositive of the definition of conditional stochastic isolation.

A.1 Conditional Reichenbach for the Vector Case. In applications, we are often interested in conditional independence relations between vectors. For this, we first extend the meaning of the notations $X_i \Rightarrow^D_S X_j$ and $X_i \not\Rightarrow^D_S X_j$ to accommodate disjoint sets of multiple settable variables appearing on the right- and left-hand sides. For nonempty disjoint collections of indexes $A$ and $B$, we let $X_A$ be a vector of settable variables whose indexes belong to $A$, and similarly for $X_B$, and we write $X_A \Rightarrow^D_S X_B$ if $X_i \Rightarrow^D_S X_j$ for some $i \in A \cap \Pi_a$ and $j \in B \cap \Pi_b$ with $a < b$. Otherwise, we write $X_A \not\Rightarrow^D_S X_B$, indicating that $X_i \not\Rightarrow^D_S X_j$ for all $i \in A \cap \Pi_a$ and $j \in B \cap \Pi_b$ with $a < b$. Observe that for a recursive system $S$, it is thus possible to have $X_A \Rightarrow^D_S X_B$ and $X_B \Rightarrow^D_S X_A$.

Similarly, we extend the notations $\Rightarrow^{I[A]}$, $\Rightarrow^{I[\sim A]}$, $\Rightarrow^{[A]}$, and $\Rightarrow^{\sim A}$ and their negations to accommodate disjoint sets of multiple settable variables appearing on the right- and left-hand sides. To do this requires some further notation. Let $I_{A:B} = \cup_{i \in A} \cup_{j \in B} I_{i:j} \setminus (X_A \cup X_B)$ denote the set of $(X_A, X_B)$-intercessors, and let $C \subset \mathrm{ind}(I_{A:B})$. For given $i \in A$ and $j \in B$, let $C_{i:j} = C \cap \mathrm{ind}(I_{i:j})$. Then we say that $X_A \Rightarrow^{[C]}_S X_B$ if there exist $i \in A \cap \Pi_a$ and $j \in B \cap \Pi_b$ with $a < b$ such that $X_i \Rightarrow^{[C_{i:j}]}_S X_j$. Otherwise, we write $X_A \not\Rightarrow^{[C]}_S X_B$, indicating that $X_i \not\Rightarrow^{[C_{i:j}]}_S X_j$ for all $i \in A \cap \Pi_a$ and $j \in B \cap \Pi_b$ with $a < b$. The notations other than $\Rightarrow^{[A]}_S$ and $\not\Rightarrow^{[A]}_S$ in the list above are defined analogously for vectors of variables. The definitions of conditional causal isolation and conditional P-stochastic isolation generalize to the vector case in the obvious way.
Thus, if $Y_A^c \perp Y_B^c \mid Y_C^c$ when $X_A$ and $X_B$ are not causally isolated given $X_C$ (i.e., condition (a) in theorem 1 below holds), then we say that $X_A$ and $X_B$ are P-stochastically isolated given $X_C$.

Theorem 1 (conditional Reichenbach principle of common cause III). Let $S$ be recursive. Let $A$ and $B$ be nonempty disjoint subsets of $\{0, \ldots, n\}$ and let $C \subset \{1, \ldots, n\} \setminus (A \cup B)$. Let $X_A$, $X_B$, and $X_C$ be the corresponding vectors of settable variables with canonical responses $Y_A$, $Y_B$, and $Y_C$. For given probability measure $P$ on $(\Omega, \mathcal{F})$, $Y_A \not\perp Y_B \mid Y_C$ if and only if (a) either:

i. $0 \in A$ and $X_0$ causes $X_B$ exclusive of $C_B \equiv C \cap \mathrm{ind}(I_{\{0\}:B})$, that is, $X_0 \Rightarrow^{\sim C_B}_S X_B$; or
ii. $0 \in B$ and $X_0$ causes $X_A$ exclusive of $C_A \equiv C \cap \mathrm{ind}(I_{\{0\}:A})$, that is, $X_0 \Rightarrow^{\sim C_A}_S X_A$; or
iii. $0 \notin A \cup B$ and $X_0 \Rightarrow^{\sim C_A}_S X_A$ and $X_0 \Rightarrow^{\sim C_B}_S X_B$;

and (b) $X_A$ and $X_B$ are not $P$-stochastically isolated given $X_C$.

Proof of Theorem 1. Let $P$ be any probability measure. First, we prove that if $Y_A \not\perp Y_B \mid Y_C$, then $X_A$ and $X_B$ are not causally isolated given $X_C$. (i) Suppose that $0 \in A$ and $X_0 \not\Rightarrow^{\sim C_B}_S X_B$. Then $X_0 \not\Rightarrow^{\sim C_{0:j}}_S X_j$ for all $j \in B$. For given $j \in B$, let $X_{C_{0:j}}$ be a vector of settable variables and let $Y_{C_{0:j}}$ denote the corresponding canonical responses. By lemma 1, it follows that $Y_j = \tilde r_j(Y_{C_{0:j}})$ for all $j \in B$. Let $X_{C_B}$ be a vector of settable variables, and let $Y_{C_B}$ denote the corresponding

canonical responses; then we have $Y_B = \tilde r_B(Y_{C_B})$. Let $C_B^c = C \setminus C_B$, let $X_{C_B^c}$ be a vector of settable variables, and let $Y_{C_B^c}$ denote the corresponding canonical responses. Then $Y_C = (Y_{C_B}, Y_{C_B^c})$. Since $Y_B = \tilde r_B(Y_{C_B})$, we have that $(Y_A, Y_{C_B^c}) \perp Y_B \mid Y_{C_B}$. We then have that $Y_A \perp Y_B \mid (Y_{C_B}, Y_{C_B^c})$ (Dawid, 1979, converse of lemma 4.3; Smith, 1989, property 3), that is, $Y_A \perp Y_B \mid Y_C$, a contradiction. (Note that when $C_B = C$, the result is immediate. Also, when

$C_B = \emptyset$, $Y_B$ is constant and the result is trivial.) (ii) Suppose that $0 \in B$ and that $X_0 \not\Rightarrow^{\sim C_A}_S X_A$. The result is symmetric to (i), yielding that $Y_A \perp Y_B \mid Y_C$, a contradiction. (iii) Suppose that $0 \notin A \cup B$ and that $X_0 \not\Rightarrow^{\sim C_A}_S X_A$ or $X_0 \not\Rightarrow^{\sim C_B}_S X_B$. Suppose that $X_0 \not\Rightarrow^{\sim C_A}_S X_A$; then an argument similar to (i) gives that $Y_A \perp Y_B \mid Y_C$, a contradiction. Alternatively, suppose that $X_0 \not\Rightarrow^{\sim C_B}_S X_B$. Then by a parallel argument, we obtain that $Y_A \perp Y_B \mid Y_C$, a contradiction.

That $X_A$ and $X_B$ are not stochastically isolated given $X_C$ follows by the definition of conditional stochastic isolation. The rest of the proof follows from (the contrapositive of) the definition of conditional stochastic isolation.
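The mechanics behind theorem 1 can be illustrated by simulation (a hypothetical numerical sketch of our own, not the article's measure-theoretic argument): when the fundamental variable causes $X_B$ only through $X_C$, $Y_B$ is a function of $Y_C$, as in lemma 1, so $Y_A$ and $Y_B$ are dependent unconditionally but independent given $Y_C$. All response functions below are invented for illustration:

```python
import random

random.seed(0)
n = 20000
y0 = [random.gauss(0.0, 1.0) for _ in range(n)]  # fundamental variable
# Hypothetical recursive system: YC responds to Y0; YB responds only
# through YC; YA responds to Y0 directly.
yc = [v + 0.5 for v in y0]      # Y_C = r_C(Y_0)
yb = [2.0 * v for v in yc]      # Y_B = r~_B(Y_C): X_0 causes X_B only via X_C
ya = list(y0)                   # Y_A = r_A(Y_0)

def corr(u, w):
    """Sample Pearson correlation."""
    mu, mw = sum(u) / len(u), sum(w) / len(w)
    cov = sum((a - mu) * (b - mw) for a, b in zip(u, w))
    vu = sum((a - mu) ** 2 for a in u)
    vw = sum((b - mw) ** 2 for b in w)
    return cov / (vu * vw) ** 0.5

# Unconditionally, the common cause Y0 makes YA and YB dependent ...
print(abs(corr(ya, yb)) > 0.9)
# ... but within any stratum of YC, YB is (nearly) constant, because
# YB = r~_B(YC), so YA carries no further information about it.
stratum = [i for i in range(n) if 0.4 < yc[i] < 0.6]
spread_b = max(yb[i] for i in stratum) - min(yb[i] for i in stratum)
print(spread_b < 0.5)
```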

Proof of Proposition 5. Let $k \in \Pi_1$ be such that $X_k \Rightarrow^D_S X_i$. By construction, we have that $X_0 \not\Rightarrow^{\sim C \cup \{k\}}_S X_i$. Theorem 1 gives that $Y_i \perp Y_A \mid (Y_C, Y_k)$. Further, since elements of $X_C$ and $X_A$ do not succeed $X_i$, there exists a set $D \subset \Pi_1 \setminus \{k\}$ such that $X_0 \not\Rightarrow^{\sim D}_S (X_C, X_A)$. It follows from lemma 1 that there exists a measurable function $\tilde r_{C,A}$ such that $(y_C, y_A) = \tilde r_{C,A}(y_D)$. Since $\{Y_k : k \in \Pi_1\}$ are jointly independent, we have $Y_k \perp Y_D$. It follows from Dawid (1979, lemma 4.2(i)) that $Y_k \perp (Y_C, Y_A)$. Also, the converse of lemma 4.3 in Dawid (1979) gives that $Y_k \perp Y_A \mid Y_C$. Given that $Y_i \perp Y_A \mid (Y_C, Y_k)$, Dawid (1979, lemmas 4.3 and 4.2) give that $Y_i \perp Y_A \mid Y_C$.

Proof of Proposition 6. (i) By construction, $X_0 \not\Rightarrow^{\sim C}_S X_i$. Thus, by theorem 1, $Y_i \perp Y_{A_1} \mid Y_C$. (ii) An argument similar to that for proposition 5 gives that $Y_i \perp Y_{A_2} \mid Y_C$.

Proof of Proposition 7. We consider only the locally strictly increasing case; the other cases are similar. To prove the result, we choose sets $S_1$, $S_2$, and $S_3$ such that $P[Y_3 \in S_3] = \gamma > 0$, $P[Y_1 \in S_1, Y_3 \in S_3] = \alpha > 0$, $P[Y_2 \in S_2, Y_3 \in S_3] = \beta > 0$, and $P[Y_1 \in S_1, Y_2 \in S_2, Y_3 \in S_3] = 0$. Then

$$P[Y_1 \in S_1 \mid Y_3 \in S_3] \times P[Y_2 \in S_2 \mid Y_3 \in S_3] = \frac{\alpha \beta}{\gamma^2} \ne 0 = P[Y_1 \in S_1, Y_2 \in S_2 \mid Y_3 \in S_3],$$

and thus $Y_1 \not\perp Y_2 \mid Y_3$. For the given $\epsilon > 0$, $y_1^*$, and $y_2^*$, let $N_1^-(\epsilon) \equiv [y_1^* - \epsilon, y_1^*)$, $N_1^+(\epsilon) \equiv [y_1^*, y_1^* + \epsilon]$, $N_2^-(\epsilon) \equiv [y_2^* - \epsilon, y_2^*)$, and $N_2^+(\epsilon) \equiv [y_2^*, y_2^* + \epsilon]$. We let $S_1 = N_1^-(\epsilon)$, $S_2 = N_2^-(\epsilon)$, and $S_3 \equiv [y_3^*, \bar y_3]$, where $y_3^* \equiv r_3(y_1^*, y_2^*)$ and $\bar y_3 \equiv \min\{r_3(y_1^*, y_2^* + \epsilon), r_3(y_1^* + \epsilon, y_2^*)\}$. By the locally strictly increasing property, $y_3^* < \bar y_3$. We have $P[Y_3 \in S_3] \ge P[Y_1 \in N_1^+(\epsilon), Y_2 \in N_2^+(\epsilon), Y_3 \in S_3] > 0$, as $r_3$ is locally strictly increasing at $(y_1^*, y_2^*)$ and the specified event has positive Lebesgue measure. Next, $P[Y_1 \in S_1, Y_3 \in S_3] \ge P[Y_1 \in N_1^-(\epsilon), Y_2 \in N_2^+(\epsilon), Y_3 \in S_3] > 0$ for the same reasons. Similarly, $P[Y_2 \in S_2, Y_3 \in S_3] \ge P[Y_1 \in N_1^+(\epsilon), Y_2 \in N_2^-(\epsilon), Y_3 \in S_3] > 0$. But $P[Y_1 \in S_1, Y_2 \in S_2, Y_3 \in S_3] = P[Y_1 \in N_1^-(\epsilon), Y_2 \in N_2^-(\epsilon), Y_3 \in S_3] = 0$ by the locally strictly increasing property, which completes the proof.
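The construction in this proof can be checked numerically. The sketch below is our own illustration: the specific choice $r_3(y_1, y_2) = y_1 + y_2$ is a hypothetical locally strictly increasing response, and the uniform inputs are invented. It shows independent $Y_1$ and $Y_2$ becoming strongly negatively dependent once we condition on a narrow band of $Y_3$:

```python
import random

random.seed(1)
n = 200000
y1 = [random.random() for _ in range(n)]
y2 = [random.random() for _ in range(n)]
y3 = [a + b for a, b in zip(y1, y2)]  # r3 strictly increasing in both arguments

def corr(u, w):
    """Sample Pearson correlation."""
    mu, mw = sum(u) / len(u), sum(w) / len(w)
    cov = sum((a - mu) * (b - mw) for a, b in zip(u, w))
    vu = sum((a - mu) ** 2 for a in u)
    vw = sum((b - mw) ** 2 for b in w)
    return cov / (vu * vw) ** 0.5

# Y1 and Y2 are independent by construction, but conditioning on a narrow
# band of Y3 = Y1 + Y2 forces them to move in opposite directions.
band = [i for i in range(n) if 0.98 < y3[i] < 1.02]
u = [y1[i] for i in band]
w = [y2[i] for i in band]

print(abs(corr(y1, y2)) < 0.01)  # unconditionally (nearly) uncorrelated
print(corr(u, w) < -0.95)        # strong negative dependence given Y3
```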

Acknowledgments

An earlier version of this article was circulated under the title "Independence and Conditional Independence in Causal Systems." We thank five anonymous reviewers; participants of the Joint Statistical Meetings 2011 and the Econometrics Seminars at Harvard/MIT, Yale, NYU, and Pennsylvania State University; as well as Julian Betts, Jin Seo Cho, Graham Elliott, Clive Granger, Arthur Lewbel, Mark Machina, Judea Pearl, Dimitris Politis, and especially Ruth Williams for their helpful comments and suggestions. We also thank Sin-Miao Wang and Tao Yang for their assistance with the preparation of this manuscript. Any errors are solely our responsibility.

References

Avin, C., Shpitser, I., & Pearl, J. (2005). Identifiability of path-specific effects. In Proceedings of the International Joint Conference on Artificial Intelligence (pp. 357–363). San Francisco: Morgan Kaufmann.

Bang-Jensen, J., & Gutin, G. (2001). Digraphs: Theory, algorithms and applications. New York: Springer-Verlag.

Cartwright, N. (2000). Measuring causes: Invariance, modularity and the causal Markov condition. London: Centre for Philosophy of Natural and Social Science.

Chalak, K., & White, H. (2011). An extended class of instrumental variables for the estimation of causal effects. Canadian Journal of Economics, 44, 1–51.

Corbae, D., Stinchcombe, M., & Zeman, J. (2009). An introduction to mathematical analysis for economic theory and econometrics. Princeton, NJ: Princeton University Press.

Dawid, A. P. (1979). Conditional independence in statistical theory. Journal of the Royal Statistical Society, B, 41, 1–31.

Dawid, A. P. (2002). Influence diagrams for causal modeling and inference. International Statistical Review, 70, 161–189.

Dawid, A. P. (2010a). Beware of the DAG! Journal of Machine Learning Research Workshop and Conference Proceedings, 6, 59–86.

Dawid, A. P. (2010b). Seeing and doing: The Pearlian synthesis. In R. Dechter, H. Geffner, & J. Y. Halpern (Eds.), Heuristics, probability and causality: A tribute to Judea Pearl (pp. 309–325). London: College Publications.

Didelez, V., Dawid, A. P., & Geneletti, S. (2006). Direct and indirect effects of sequential treatments. In R. Dechter & T. S. Richardson (Eds.), Proceedings of the 22nd Annual Conference on Uncertainty in Artificial Intelligence (pp. 138–146). N.p.: AUAI Press.

Dudley, R. M. (2002). Real analysis and probability. Cambridge: Cambridge University Press.

Duvenaud, D., Eaton, D., Murphy, K., & Schmidt, M. (2010). Causal learning without DAGs. Journal of Machine Learning Research Workshop and Conference Proceedings, 6, 177–190.

Geiger, D., & Pearl, J. (1993). Logical and algorithmic properties of conditional independence and graphical models. Annals of Statistics, 21, 2001–2021.

Geiger, D., Verma, T. S., & Pearl, J. (1990). Identifying independence in Bayesian networks. Networks, 20, 507–534.

Geneletti, S. (2007). Identifying direct and indirect effects in a non-counterfactual framework. Journal of the Royal Statistical Society, B, 69, 199–215.

Hausman, D., & Woodward, J. (1999). Independence, invariance and the causal Markov condition. British Journal for the Philosophy of Science, 50, 521–583.

Heckerman, D., & Shachter, R. (1995). Decision-theoretic foundations for causal reasoning. Journal of Artificial Intelligence Research, 3, 405–430.

Heckman, J. (2005). The scientific model of causality. Sociological Methodology, 35, 1–97.

Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81, 945–970.

Lauritzen, S. L., Dawid, A. P., Larsen, B. N., & Leimer, H.-G. (1990). Independence properties of directed Markov fields. Networks, 20, 491–505.

Lauritzen, S. L., & Richardson, T. S. (2002). Chain graph models and their causal interpretation. Journal of the Royal Statistical Society, B, 64, 321–361.

Lauritzen, S. L., & Spiegelhalter, D. J. (1988). Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society, B, 50, 157–224.

Neveu, J. (1965). Mathematical foundations of the calculus of probability (Trans. A. Feinstein). San Francisco: Holden-Day.

Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. San Mateo, CA: Morgan Kaufmann.

Pearl, J. (1993). Aspects of graphical methods connected with causality. In Proceedings of the 49th Session of the International Statistical Institute (pp. 391–401).

Pearl, J. (1995). Causal diagrams for empirical research. Biometrika, 82, 669–710.

Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge: Cambridge University Press.

Pearl, J. (2001). Direct and indirect effects. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (pp. 411–420). San Francisco: Morgan Kaufmann.

Reichenbach, H. (1956). The direction of time. Berkeley: University of California Press.

Robins, J. (2003). Semantics of causal models and the identification of direct and indirect effects. In P. Green, N. L. Hjort, & S. Richardson (Eds.), Highly structured stochastic systems (pp. 70–81). New York: Oxford University Press.

Robins, J., & Greenland, S. (1992). Identifiability and exchangeability for direct and indirect effects. Epidemiology, 3, 143–155.

Rubin, D. (2004). Direct and indirect causal effects via potential outcomes. Scandinavian Journal of Statistics, 31, 161–170.

Shpitser, I., & Pearl, J. (2008). Complete identification methods for the causal hierarchy. Journal of Machine Learning Research, 9, 1941–1979.

Shpitser, I., Richardson, T., & Robins, J. (2009). Testing edges by truncations. In Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence. Cambridge, MA: AAAI Press.

Smith, J. Q. (1989). Influence diagrams for statistical modeling. Annals of Statistics, 17, 654–672.

Spirtes, P., Glymour, C., & Scheines, R. (1993). Causation, prediction and search. Berlin: Springer-Verlag.

Spohn, W. (1980). Stochastic independence, causal independence, and shieldability. Journal of Philosophical Logic, 9, 73–99.

Strotz, R., & Wold, H. (1960). Recursive vs. nonrecursive systems: An attempt at synthesis. Econometrica, 28, 417–427.

Studeny, M. (1993). Formal properties of conditional independence in different calculi of AI. In M. Clarke, R. Kruse, & S. Moral (Eds.), Symbolic and quantitative approaches to reasoning and uncertainty (pp. 341–348). Berlin: Springer-Verlag.

Verma, T., & Pearl, J. (1988). Causal networks: Semantics and expressiveness. In Proceedings, 4th Workshop on Uncertainty in Artificial Intelligence (pp. 352–359). New York: Elsevier.

Wermuth, N., & Cox, D. R. (2004). Joint response graphs and separation induced by triangular systems. Journal of the Royal Statistical Society, B, 66, 687–717.

White, H., & Chalak, K. (2009). Settable systems: An extension of Pearl's causal model with optimization, equilibrium, and learning. Journal of Machine Learning Research, 10, 1759–1799.

White, H., & Chalak, K. (2010). Testing a conditional form of exogeneity. Economics Letters, 109, 88–90.

White, H., & Chalak, K. (in press). Identification and identification failure for treatment effects using structural systems. Econometric Reviews.

White, H., Chalak, K., & Lu, X. (2010). Linking Granger causality and the Pearl causal model with settable systems. Journal of Machine Learning Research Workshop and Conference Proceedings, 12, 1–29.

White, H., & Lu, X. (2011). Causal diagrams for treatment effect estimation with application to efficient covariate selection. Review of Economics and Statistics, 93, 1453–1459.

White, H., Xu, H., & Chalak, K. (2011). Causal discourse in a game of incomplete information (discussion paper). San Diego, CA: Department of Economics, University of California, San Diego.

Received November 15, 2010; accepted December 19, 2011.