Identifying Independencies in Causal Graphs with Feedback
Judea Pearl
Cognitive Systems Laboratory
Computer Science Department
University of California
Los Angeles, CA 90024
[email protected]

Rina Dechter
Information & Computer Science
University of California, Irvine
Irvine, CA 92717
rdechter@ics.uci.edu

Abstract

We show that the d-separation criterion constitutes a valid test for conditional independence relationships that are induced by feedback systems involving discrete variables.

1 INTRODUCTION

It is well known that the d-separation test is sound and complete relative to the independencies assumed in the construction of Bayesian networks [Verma and Pearl, 1988; Geiger et al., 1990]. In other words, any d-separation condition in the network corresponds to a genuine independence condition in the underlying probability distribution and, conversely, every d-connection corresponds to a dependency in at least one distribution compatible with the network.
The situation with feedback systems is more complicated, primarily because the probability distributions associated with such systems do not lend themselves to a simple product decomposition. The joint distribution of a feedback system cannot be written as a product of the conditional distributions of each child variable given its parents. Rather, the joint distribution is governed by the functional relationships that tie the variables together.

Spirtes (1994) and Koster (1995) have nevertheless shown that the d-separation test is valid for cyclic graphs, provided that the equations are linear and all distributions are Gaussian. In this paper we extend the results of Spirtes and Koster and show that the d-separation test is valid for nonlinear feedback systems and non-Gaussian distributions, provided the variables are discrete.

2 BAYESIAN NETWORKS VS. CAUSAL NETWORKS: A REVIEW

In this section we first review the basic notions and nomenclature associated with Bayesian networks, and then we contrast these notions with those associated with recursive and nonrecursive causal models. The reader is encouraged to consult the example in Section 3.1, where these notions are given graphical representations.

2.1 BAYESIAN NETWORKS

Definition 1 Let V = {X_1, ..., X_n} be an ordered set of variables, and let P(v) be the joint probability distribution on these variables. A set of variables PA_j is said to be the Markovian parents of X_j if PA_j is a minimal set of predecessors of X_j that renders X_j independent of all its other predecessors. In other words, PA_j is any subset of {X_1, ..., X_{j-1}} satisfying

    P(x_j | pa_j) = P(x_j | x_1, ..., x_{j-1})    (1)

such that no proper subset of PA_j satisfies Eq. (1).

Definition 1 assigns to each variable X_j a select set of variables PA_j that are sufficient for determining the probability of X_j; knowing the values of other preceding variables is redundant once we know the values pa_j of the parent set PA_j. This assignment can be represented in the form of a directed acyclic graph (DAG) in which variables are represented by nodes and arrows are drawn from each node of the parent set PA_j toward the child node X_j. Definition 1 also suggests a simple recursive method for constructing such a DAG: at the ith stage of the construction, select any minimal set of X_i's predecessors that satisfies Eq. (1), call this set PA_i (connoting "parents"), and draw an arrow from each member of PA_i to X_i. The result is a DAG, called a Bayesian network [Pearl, 1988], in which an arrow from X_i to X_j assigns X_i as a Markovian parent of X_j, consistent with Definition 1.

The construction implied by Definition 1 defines a Bayesian network as a carrier of conditional independence information that is obtained along a specific ordering O.
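The recursive construction suggested by Definition 1 can be sketched by brute force for small discrete distributions. The encoding below (a dict from value tuples to probabilities, binary variables, and the three-variable chain used as input) is our own illustrative choice, not from the paper: for each X_j in order, we search the subsets of its predecessors, smallest first, for a set satisfying Eq. (1).

```python
# Brute-force sketch of Definition 1: find a minimal Markovian parent
# set PA_j for each X_j, given an ordered joint distribution P.
from itertools import combinations, product

def satisfies_eq1(P, j, parents):
    """Check Eq. (1): P(x_j | pa_j) == P(x_j | x_1..x_{j-1}) for all values."""
    def marg(fixed):                      # marginal prob. with some vars fixed
        return sum(p for v, p in P.items()
                   if all(v[i] == x for i, x in fixed))
    for vals in product([0, 1], repeat=j + 1):        # values for X_1..X_j
        den_full = marg(list(enumerate(vals[:-1])))   # P(x_1..x_{j-1})
        den_pa = marg([(i, vals[i]) for i in parents])
        if den_full == 0 or den_pa == 0:
            continue
        lhs = marg([(i, vals[i]) for i in parents] + [(j, vals[j])]) / den_pa
        rhs = marg(list(enumerate(vals))) / den_full
        if abs(lhs - rhs) > 1e-9:
            return False
    return True

def markovian_parents(P, n):
    """For each X_j (0-indexed, in the given order), a minimal PA_j."""
    return {j: next(set(c) for size in range(j + 1)
                    for c in combinations(range(j), size)
                    if satisfies_eq1(P, j, c))
            for j in range(n)}

# A chain X1 -> X2 -> X3: X3 depends on X2 only.
P = {}
for x1, x2, x3 in product([0, 1], repeat=3):
    p1 = 0.6 if x1 == 1 else 0.4
    p2 = 0.9 if x2 == x1 else 0.1          # X2 listens to X1
    p3 = 0.8 if x3 == x2 else 0.2          # X3 listens to X2 only
    P[(x1, x2, x3)] = p1 * p2 * p3

print(markovian_parents(P, 3))   # -> {0: set(), 1: {0}, 2: {1}}
```

Drawing arrows from each recovered parent set to its child yields the chain DAG X1 → X2 → X3, as Definition 1 prescribes.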
Clearly, every distribution satisfying Eq. (1) must decompose (using the chain rule of probability calculus) into the product

    P(x_1, ..., x_n) = ∏_j P(x_j | pa_j)    (2)

which is no longer order-specific. Conversely, for every distribution decomposed as in Eq. (2) one can find an ordering O that would produce G as a Bayesian network. If a probability distribution P admits the product decomposition dictated by G, as given in Eq. (2), we say that G and P are compatible.

A convenient way of characterizing the set of distributions compatible with a DAG G is to list the set of (conditional) independencies that each such distribution must satisfy. These independencies can be read off the DAG by using a graphical criterion called d-separation [Pearl, 1988]. To test whether X is independent of Y given Z in the distributions represented by G, we examine G and test whether the nodes corresponding to the variables Z d-separate all paths from nodes in X to nodes in Y. By a path we mean a sequence of consecutive edges (of any directionality) in the DAG.

Definition 2 (d-separation) A path p is said to be d-separated (or blocked) by a set of nodes Z iff:

(i) p contains a chain i → j → k or a fork i ← j → k such that the middle node j is in Z, or

(ii) p contains an inverted fork i → j ← k such that neither the middle node j nor any of its descendants (in G) are in Z.

If X, Y, and Z are three disjoint subsets of nodes in a DAG G, then Z is said to d-separate X from Y, denoted (X ⫫ Y | Z)_G, iff Z d-separates every path from a node in X to a node in Y.

To distinguish between the graphical notion of d-separation, (X ⫫ Y | Z)_G, and the probabilistic notion of conditional independence, we will use the notation (X ⊥ Y | Z)_P for the latter. The connection between the two is given in Theorem 1.

Theorem 1 [Verma and Pearl, 1988; Geiger et al., 1990] For any three disjoint subsets of nodes (X, Y, Z) in a DAG G, and for all probability functions P, we have

(i) (X ⫫ Y | Z)_G ⟹ (X ⊥ Y | Z)_P whenever G and P are compatible, and

(ii) if (X ⊥ Y | Z)_P holds in all distributions compatible with G, then (X ⫫ Y | Z)_G.

An alternative test for d-separation has been devised by Lauritzen et al. (1990), based on the notion of moralized ancestral graphs. To test for (X ⫫ Y | Z)_G, delete from G all nodes except those in {X, Y, Z} and their ancestors, connect by an edge every pair of nodes that share a common child, and remove all arrows from the arcs. (X ⫫ Y | Z)_G holds iff Z is a cutset of the resulting undirected graph, separating nodes of X from those of Y.

This alternative test of d-separation will play a major role in our proofs, hence we cast it in some extra notation. Denote by (X ⊥ Y | Z)_G the condition that Z intercepts all paths between X and Y in some undirected graph G. Let G_XYZ stand for the undirected ancestral graph obtained through Lauritzen's construction. Lauritzen's equivalence theorem amounts to asserting

    (X ⊥ Y | Z)_{G_XYZ} ⟺ (X ⫫ Y | Z)_G    (3)

Note that the operator ⊥ denotes ordinary separation in undirected graphs while ⫫ stands for d-separation in DAGs.

2.2 CAUSAL THEORIES AND CAUSAL GRAPHS

A causal theory is a fully specified model of the causal relationships that govern a given domain, that is, a mathematical object that provides an interpretation (and computation) of every causal query about the domain. Following [Pearl, 1995b] we will adopt here a definition that generalizes most causal models used in engineering and economics.

Definition 3 A causal theory is a four-tuple

    T = <V, U, P(u), {f_i}>

where

(i) V = {X_1, ..., X_n} is a set of observed variables,

(ii) U = {U_1, ..., U_n} is a set of exogenous (often unmeasured) variables that represent disturbances, abnormalities, or assumptions,

(iii) P(u) is a distribution function over U_1, ..., U_n, and

(iv) {f_i} is a set of n deterministic functions, each of the form

    x_i = f_i(pa_i, u_i),    i = 1, ..., n    (4)

where PA_i is a subset of variables in V not containing X_i, and pa_i is any instance of PA_i.
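The moralized-ancestral-graph test of Lauritzen et al. (1990), reviewed earlier in this section, can be sketched directly. The graph encoding (a dict from each node to its parent set) and all helper names below are our own, not the paper's: restrict to ancestors of X ∪ Y ∪ Z, "marry" parents that share a child, drop edge directions, and check whether Z cuts X off from Y.

```python
# Sketch of the Lauritzen et al. (1990) moralization test for
# d-separation (X dsep Y | Z)_G in a DAG.
def d_separated(parents, X, Y, Z):
    """parents: dict node -> set of parent nodes (a DAG)."""
    # 1. Keep only X, Y, Z and their ancestors.
    keep, frontier = set(), set(X) | set(Y) | set(Z)
    while frontier:
        n = frontier.pop()
        if n not in keep:
            keep.add(n)
            frontier |= parents.get(n, set())
    # 2. Moralize: undirected parent-child edges plus parent-parent edges.
    edges = set()
    for child in keep:
        pas = parents.get(child, set()) & keep
        edges |= {frozenset((p, child)) for p in pas}
        edges |= {frozenset((p, q)) for p in pas for q in pas if p != q}
    # 3. Z is a cutset iff no path from X reaches Y while avoiding Z.
    seen, stack = set(Z), list(X)
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        if n in Y:
            return False            # X and Y connected around Z
        seen.add(n)
        stack += [next(iter(e - {n})) for e in edges if n in e]
    return True

# Inverted fork (collider) A -> C <- B, as in Definition 2(ii):
# A and B are d-separated by the empty set but not given {C}.
dag = {"A": set(), "B": set(), "C": {"A", "B"}}
print(d_separated(dag, {"A"}, {"B"}, set()))   # True
print(d_separated(dag, {"A"}, {"B"}, {"C"}))   # False
```

The collider example shows why step 1 matters: with Z empty, C is not an ancestor of {X, Y, Z} and is deleted before moralization, so no A–B edge is ever created; conditioning on C brings it (and the moralizing A–B edge) back, unblocking the path, in agreement with Definition 2.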
We will further assume that the set of equations in (iv) has a unique solution for X_1, ..., X_n, given any value of the disturbances U_1, ..., U_n.¹ Therefore, the distribution P(u) induces a unique distribution on the observables, which we denote by P_T(v).

¹The uniqueness assumption is equivalent to the requirement that the set of equations {f_i}, viewed as a dynamic physical system, be stable. Indeed, if V may attain two different states under the same U, it means that a small perturbation in U would result in a drastic change in V, hence, instability.

As in the case of Bayesian networks, drawing arrows between the variables PA_i and X_i defines a directed graph G_T, which we call the causal graph of T. However, the construction of causal graphs differs from that of Bayesian networks in two ways. First, G_T may, in general, be cyclic. Second, unlike the Markovian parents in Definition 1, the sets PA_i are specified by the modeler based on assumed functional relationships among the variables, not by conditional independence.

Causal theories that obey Eq. (6) but possibly not Eq. (5) will be called semi-Markovian. Such theories are "complete" in the sense that all probabilistic dependencies are explained in terms of causal dependencies. We will first state our results for semi-Markovian theories and then extend them to general, non-Markovian theories.
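A minimal sketch of Definition 3 for discrete variables follows; the two structural functions, the feedback loop X1 ⇄ X2, and the distribution P(u) are our own toy choices, not from the paper. For each disturbance value u we solve the simultaneous equations x_i = f_i(pa_i, u_i) of Eq. (4) by enumeration, check uniqueness (footnote 1 ties uniqueness of the solution to stability of the system), and accumulate the induced distribution P_T(v). Note that here P(u) puts zero mass on the one disturbance value, (1, 1), for which these particular equations would have two solutions, so the uniqueness assumption holds on the support of P(u).

```python
# Sketch of a discrete causal theory T = <V, U, P(u), {f_i}> with
# feedback: X1 depends on X2 and X2 depends on X1 (a cyclic graph G_T).
from itertools import product
from collections import defaultdict

def f1(x2, u1):          # X1 = f1(X2, U1)
    return x2 if u1 == 1 else 0

def f2(x1, u2):          # X2 = f2(X1, U2)
    return x1 if u2 == 1 else 1

# P(u): zero mass on (1, 1), where the equations would be bistable.
P_u = {(0, 0): 0.4, (0, 1): 0.3, (1, 0): 0.3}

P_T = defaultdict(float)                 # induced distribution on V
for (u1, u2), p in P_u.items():
    # Solve the simultaneous equations of Eq. (4) by enumeration.
    sols = [(x1, x2) for x1, x2 in product([0, 1], repeat=2)
            if x1 == f1(x2, u1) and x2 == f2(x1, u2)]
    assert len(sols) == 1, "uniqueness assumption violated"
    P_T[sols[0]] += p

print(dict(P_T))
```

Because the equations are solved jointly rather than sampled parent-by-parent, P_T(v) is well defined even though G_T is cyclic and Eq. (2) does not apply; this is exactly the setting in which the validity of d-separation must be re-established.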