Reading Dependencies from Polytree-Like Bayesian Networks

Jose M. Peña
Division of Computational Biology
Department of Physics, Chemistry and Biology
Linköping University, SE-58183 Linköping, Sweden

Abstract

We present a graphical criterion for reading dependencies from the minimal directed independence map $G$ of a graphoid $p$ when $G$ is a polytree and $p$ satisfies composition and weak transitivity. We prove that the criterion is sound and complete. We argue that assuming composition and weak transitivity is not too restrictive.

1 INTRODUCTION

A minimal directed independence map $G$ of an independence model $p$ is used to read independencies that hold in $p$. Sometimes, however, $G$ can also be used to read dependencies holding in $p$. For instance, if $p$ is a graphoid that is faithful to $G$ then, by definition, lack of vertex separation is a sound and complete graphical criterion for reading dependencies from $G$. If $p$ is simply a graphoid, then there also exists a sound and complete graphical criterion for reading dependencies from $G$ [Bouckaert, 1995]. However, this criterion cannot be applied to check whether two sets of nodes $\mathbf{X}$ and $\mathbf{Y}$ are dependent given a third set $\mathbf{Z}$ unless $\mathbf{X}$ and $\mathbf{Y}$ are adjacent in $G$, i.e. unless there is an edge in $G$ from some $A \in \mathbf{X}$ to some $B \in \mathbf{Y}$. In this paper, we try to overcome this shortcoming so that $\mathbf{X}$ and $\mathbf{Y}$ are not required to be adjacent in $G$. We do so by constraining $G$ and $p$. Specifically, we present a sound and complete graphical criterion for reading dependencies from $G$ under the assumptions that $G$ is a polytree and $p$ is a graphoid that satisfies composition and weak transitivity. We argue that assuming composition and weak transitivity is not too restrictive. Specifically, we show that there exist important families of probability distributions, among them Gaussian probability distributions, that satisfy these two properties.

The rest of the paper is organized as follows. We start by reviewing some concepts in Section 2. We show in Section 3 that assuming composition and weak transitivity is not too restrictive. We present our graphical criterion in Section 4 and prove that it is sound and complete. Finally, we close with some discussion in Section 5.

2 PRELIMINARIES

The definitions and results in this section are taken from [Lauritzen, 1996, Pearl, 1988, Studený, 2005]. We use the juxtaposition $\mathbf{X}\mathbf{Y}$ to denote $\mathbf{X} \cup \mathbf{Y}$, and $X$ to denote the singleton $\{X\}$. Let $\mathbf{U}$ denote a set of random variables. Unless otherwise stated, all the independence models and graphs in this paper are defined over $\mathbf{U}$.

Let $\mathbf{X}$, $\mathbf{Y}$, $\mathbf{Z}$ and $\mathbf{W}$ denote four mutually disjoint subsets of $\mathbf{U}$. An independence model $p$ is a set of independencies of the form $\mathbf{X}$ is independent of $\mathbf{Y}$ given $\mathbf{Z}$. We represent that an independence is in $p$ by $\mathbf{X} \perp\!\!\!\perp \mathbf{Y} \mid \mathbf{Z}$ and that an independence is not in $p$ by $\mathbf{X} \not\perp\!\!\!\perp \mathbf{Y} \mid \mathbf{Z}$. In the latter case, we may equivalently say that the dependence $\mathbf{X} \not\perp\!\!\!\perp \mathbf{Y} \mid \mathbf{Z}$ is in $p$. An independence model is a graphoid when it satisfies the following five properties:

Symmetry: $\mathbf{X} \perp\!\!\!\perp \mathbf{Y} \mid \mathbf{Z} \Rightarrow \mathbf{Y} \perp\!\!\!\perp \mathbf{X} \mid \mathbf{Z}$.
Decomposition: $\mathbf{X} \perp\!\!\!\perp \mathbf{Y}\mathbf{W} \mid \mathbf{Z} \Rightarrow \mathbf{X} \perp\!\!\!\perp \mathbf{Y} \mid \mathbf{Z}$.
Weak union: $\mathbf{X} \perp\!\!\!\perp \mathbf{Y}\mathbf{W} \mid \mathbf{Z} \Rightarrow \mathbf{X} \perp\!\!\!\perp \mathbf{Y} \mid \mathbf{Z}\mathbf{W}$.
Contraction: $\mathbf{X} \perp\!\!\!\perp \mathbf{Y} \mid \mathbf{Z}\mathbf{W} \wedge \mathbf{X} \perp\!\!\!\perp \mathbf{W} \mid \mathbf{Z} \Rightarrow \mathbf{X} \perp\!\!\!\perp \mathbf{Y}\mathbf{W} \mid \mathbf{Z}$.
Intersection: $\mathbf{X} \perp\!\!\!\perp \mathbf{Y} \mid \mathbf{Z}\mathbf{W} \wedge \mathbf{X} \perp\!\!\!\perp \mathbf{W} \mid \mathbf{Z}\mathbf{Y} \Rightarrow \mathbf{X} \perp\!\!\!\perp \mathbf{Y}\mathbf{W} \mid \mathbf{Z}$.

Any strictly positive probability distribution is a graphoid.

Let $sep(\mathbf{X}, \mathbf{Y} \mid \mathbf{Z})$ denote that $\mathbf{X}$ is separated from $\mathbf{Y}$ given $\mathbf{Z}$ in a graph $G$. Specifically, $sep(\mathbf{X}, \mathbf{Y} \mid \mathbf{Z})$ holds when every path in $G$ between $\mathbf{X}$ and $\mathbf{Y}$ is blocked by $\mathbf{Z}$. If $G$ is an undirected graph (UG), then a path in $G$ between $\mathbf{X}$ and $\mathbf{Y}$ is blocked by $\mathbf{Z}$ when there exists some $Z \in \mathbf{Z}$ in the path. We say that a node is a head-to-head node in a path if it has two parents in the path. If $G$ is a directed and acyclic graph (DAG), then a path in $G$ between $\mathbf{X}$ and $\mathbf{Y}$ is blocked by $\mathbf{Z}$ when there exists a node $Z$ in the path such that either (i) $Z$ is not a head-to-head node in the path and $Z \in \mathbf{Z}$, or (ii) $Z$ is a head-to-head node in the path and neither $Z$ nor any of its descendants in $G$ is in $\mathbf{Z}$.

An independence model $p$ is faithful to an UG or DAG $G$ when $\mathbf{X} \perp\!\!\!\perp \mathbf{Y} \mid \mathbf{Z}$ iff $sep(\mathbf{X}, \mathbf{Y} \mid \mathbf{Z})$. Any independence model that is faithful to some UG or DAG is a graphoid. An UG (resp. DAG) $G$ is an undirected (resp. directed) independence map of an independence model $p$ when $\mathbf{X} \perp\!\!\!\perp \mathbf{Y} \mid \mathbf{Z}$ if $sep(\mathbf{X}, \mathbf{Y} \mid \mathbf{Z})$. Moreover, $G$ is a minimal undirected (resp. directed) independence map of $p$ when removing any edge from $G$ makes it cease to be an independence map of $p$. We abbreviate minimal undirected (resp. directed) independence map as MUI (resp. MDI) map. If $G$ is a MUI map of $p$, then two nodes $X$ and $Y$ are adjacent in $G$ iff $X \not\perp\!\!\!\perp Y \mid \mathbf{U} \setminus (XY)$. On the other hand, if $G$ is a MDI map of $p$, then the parents of a node $X$ in $G$, $Pa(X)$, are the smallest subset of the nodes preceding $X$ in a given total ordering of $\mathbf{U}$, $Pre(X)$, such that $X \perp\!\!\!\perp Pre(X) \setminus Pa(X) \mid Pa(X)$. We denote the children of a node $X$ by $Ch(X)$.

Let $Cl$ denote the set of cliques of an UG $G$. A Markov network (MN) is a pair $(G, \theta)$ where $G$ is an UG and $\theta$ are parameters specifying a non-negative function for the random variables in each $C \in Cl$, $\phi(C)$. The MN represents the probability distribution $p = K \prod_{C \in Cl} \phi(C)$ where $K$ is a normalizing constant. If a probability distribution $p$ can be represented by a MN with UG $G$, then $G$ is an undirected independence map of $p$. When $p$ is strictly positive, the opposite also holds.

A Bayesian network (BN) is a pair $(G, \theta)$ where $G$ is a DAG and $\theta$ are parameters specifying a multinomial probability distribution for each $X \in \mathbf{U}$ given its parents in $G$, $p(X \mid Pa(X))$. The BN represents the multinomial probability distribution $p = \prod_{X \in \mathbf{U}} p(X \mid Pa(X))$. A probability distribution $p$ can be represented by a BN with DAG $G$ iff $G$ is a directed independence map of $p$.

Given an UG (resp. DAG) $G$, we denote by $M(G)$ all the multinomial probability distributions that can be represented by a MN (resp. BN) with UG (resp. DAG) $G$. Finally, recall that a polytree is a directed graph without undirected cycles. In other words, there exists at most one undirected path between any two nodes $X$ and $Y$. We denote that path by $X : Y$ if it exists. Recall also that a directed tree is a polytree where every node has at most one parent.

3 CWT GRAPHOIDS

[...] $V \in \mathbf{U} \setminus (\mathbf{X}\mathbf{Y}\mathbf{Z})$. We now argue that there exist important families of probability distributions that are CWT graphoids and, thus, that CWT graphoids are worth studying. For instance, any probability distribution that is Gaussian or faithful to some UG or DAG is a CWT graphoid [Pearl, 1988, Studený, 2005]. The theorem below proves that the marginals and conditionals of a probability distribution that is a CWT graphoid are CWT graphoids, although they may be neither Gaussian nor faithful to any UG or DAG. We give an example at the end of this section. We refer the reader to [Peña et al., 2006a, Peña et al., 2006b, Peña et al., 2007] for the proofs of the theorems in this section.

Theorem 1 Let $p$ be a probability distribution that is a CWT graphoid and let $\mathbf{W} \subseteq \mathbf{U}$. Then, $p(\mathbf{U} \setminus \mathbf{W})$ is a CWT graphoid. If $p(\mathbf{U} \setminus \mathbf{W} \mid \mathbf{W} = \mathbf{w})$ has the same independencies for all $\mathbf{w}$, then $p(\mathbf{U} \setminus \mathbf{W} \mid \mathbf{W} = \mathbf{w})$ for any $\mathbf{w}$ is a CWT graphoid.

Hereinafter, we say that a probability distribution $p$ has context-specific independencies if there exists some $\mathbf{W} \subseteq \mathbf{U}$ such that $p(\mathbf{U} \setminus \mathbf{W} \mid \mathbf{W} = \mathbf{w})$ does not have the same independencies for all $\mathbf{w}$. We now show that it is not too restrictive to assume, as in the theorem above, that a probability distribution is a CWT graphoid that has no context-specific independencies, because there exist important families of probability distributions all or almost all of whose members satisfy such assumptions. For instance, a Gaussian probability distribution is a CWT graphoid [Studený, 2005], and has no context-specific independencies because its independencies are determined by its covariance matrix [Lauritzen, 1996]. The theorems below prove that this is also the case for almost all the probability distributions in $M(G)$ where $G$ is an UG or DAG.

Theorem 2 Let $G$ be an UG. $M(G)$ has non-zero Lebesgue measure wrt $\mathbb{R}^n$, where $n$ is the number of MN parameters corresponding to $G$. The probability distributions in $M(G)$ that are not faithful to $G$ or have context-specific independencies have zero Lebesgue measure wrt $\mathbb{R}^n$.

Theorem 3 Let $G$ be a DAG. $M(G)$ has non-zero Lebesgue measure wrt $\mathbb{R}^n$, where $n$ is the number of BN parameters corresponding to $G$. The probability distributions in $M(G)$ that are not faithful to $G$ or have context-specific independencies have zero Lebesgue measure wrt $\mathbb{R}^n$.
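As an illustration of the separation criterion for DAGs recalled in Section 2, the following is a minimal Python sketch, not taken from the paper. It assumes single nodes X and Y rather than sets, represents the DAG as a hypothetical dictionary mapping each node to the set of its parents, and enumerates the simple undirected paths between X and Y. In a polytree there is at most one such path, so the enumeration is cheap; for a general DAG it can be exponential. All function names are illustrative.

```python
from typing import Dict, Iterator, List, Set

Parents = Dict[str, Set[str]]  # assumed encoding: node -> set of its parents


def children_of(parents: Parents) -> Dict[str, Set[str]]:
    """Invert the parent map to obtain Ch(X) for every node X."""
    ch: Dict[str, Set[str]] = {v: set() for v in parents}
    for node, pa in parents.items():
        for p in pa:
            ch[p].add(node)
    return ch


def descendants(node: str, ch: Dict[str, Set[str]]) -> Set[str]:
    """All nodes reachable from `node` by following directed edges."""
    seen: Set[str] = set()
    stack = [node]
    while stack:
        for c in ch[stack.pop()]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen


def simple_paths(x: str, y: str, parents: Parents,
                 ch: Dict[str, Set[str]]) -> Iterator[List[str]]:
    """Yield every simple path between x and y, ignoring edge directions."""
    def walk(path: List[str]) -> Iterator[List[str]]:
        last = path[-1]
        if last == y:
            yield list(path)
            return
        for nxt in parents[last] | ch[last]:
            if nxt not in path:
                path.append(nxt)
                yield from walk(path)
                path.pop()
    yield from walk([x])


def blocked(path: List[str], z: Set[str], parents: Parents,
            ch: Dict[str, Set[str]]) -> bool:
    """Apply conditions (i) and (ii) to each intermediate node of the path."""
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        head_to_head = prev in parents[node] and nxt in parents[node]
        if head_to_head:
            # (ii): neither the node nor any of its descendants is in Z
            if node not in z and not (descendants(node, ch) & z):
                return True
        elif node in z:
            # (i): a non-head-to-head node that is in Z
            return True
    return False


def sep(x: str, y: str, z: Set[str], parents: Parents) -> bool:
    """sep(X, Y | Z): every path between x and y is blocked by z."""
    ch = children_of(parents)
    return all(blocked(p, z, parents, ch)
               for p in simple_paths(x, y, parents, ch))


# Example polytree: A -> C <- B, C -> D
example: Parents = {"A": set(), "B": set(), "C": {"A", "B"}, "D": {"C"}}
```

On the example polytree, `sep("A", "B", set(), example)` holds because C is a head-to-head node outside Z with no descendant in Z, whereas conditioning on the descendant D, as in `sep("A", "B", {"D"}, example)`, opens the path. Note that separation only licenses reading an independence from an independence map; reading dependencies from lack of separation requires the further assumptions discussed in the paper.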
