
The Causal Interpretations of Bayesian Hypergraphs

Zhiyu Wang1, Mohammad Ali Javidian2, Linyuan Lu1, Marco Valtorta2
1Department of Mathematics, 2Department of Computer Science and Engineering
University of South Carolina, Columbia, SC 29208

Abstract

We propose a causal interpretation of Bayesian hypergraphs (Javidian et al. 2018), a probabilistic graphical model whose structure is a directed acyclic hypergraph, that extends the causal interpretation of LWF chain graphs. We provide intervention formulas and a graphical criterion for intervention in Bayesian hypergraphs that specializes to a new graphical criterion for intervention in LWF chain graphs and sheds light on the causal interpretation of interaction as represented by undirected edges in LWF chain graphs or heads in Bayesian hypergraphs.

Introduction and Motivation

Probabilistic graphical models are graphs in which nodes represent random variables and edges represent conditional independence assumptions. They provide a compact way to represent the joint probability distribution of a set of random variables. In undirected graphical models, e.g., Markov networks (see (Darroch et al. 1980; Pearl 1988)), there is a simple rule for determining independence: two sets of nodes A and B are conditionally independent given C if removing C separates A and B. On the other hand, directed graphical models, e.g., Bayesian networks (see (Kiiveri, Speed, and Carlin 1984; Wermuth and Lauritzen 1983; Pearl 1988)), which consist of a directed acyclic graph (DAG) and a corresponding set of conditional probability tables, have a more complicated rule (d-separation) for determining independence. More complex graphical models include various types of graphs with edges of several types (e.g., (Cox and Wermuth 1993; 1996; Richardson and Spirtes 2002; Peña 2014)), including chain graphs (Lauritzen and Wermuth 1989; Lauritzen 1996), for which different interpretations have emerged (Andersson, Madigan, and Perlman 1996; Drton 2009).

Probabilistic Graphical Models (PGMs) enjoy a well-deserved popularity because they allow explicit representation of structural constraints in the language of graphs and similar structures. From the perspective of efficient belief update, factorization of the joint probability distribution of the random variables corresponding to the nodes of the graph is paramount, because it allows decomposition of the calculation of the evidence or of the posterior probability (Lauritzen and Jensen 1997). The proliferation of different PGMs that allow factorizations of different kinds leads us to consider a more general graphical structure in this paper, namely directed acyclic hypergraphs. Since there are many more hypergraphs than DAGs, undirected graphs, chain graphs, and, indeed, other graph-based networks, Bayesian hypergraphs can model much finer factorizations and thus are more computationally efficient. See (Javidian et al. 2018) for examples of how Bayesian hypergraphs can model graphically functional dependencies that are hidden in probability tables when using BNs or CGs. We provide a causal interpretation of Bayesian hypergraphs that extends the causal interpretation of LWF chain graphs (Lauritzen and Richardson 2002) by giving corresponding formulas and a graphical criterion for intervention, which operationalize the intuition that feedback processes in LWF chain graph and Bayesian hypergraph models turn into causal processes when variables in them are conditioned upon by intervention. The addition of feedback processes and their causal interpretation is a conceptual advance within the three-level causal hierarchy (Pearl 2009).

Bayesian Hypergraphs: Definitions

Hypergraphs are generalizations of graphs in which each edge is allowed to contain more than two vertices.
Formally, an (undirected) hypergraph is a pair H = (V, E), where V = {v_1, v_2, ..., v_n} is the set of vertices (or nodes) and E = {h_1, h_2, ..., h_m} is the set of hyperedges, where h_i ⊆ V for all i ∈ [m]. If |h_i| = k for every i ∈ [m], then we say H is a k-uniform (undirected) hypergraph. A directed hyperedge or hyperarc h is an ordered pair h = (X, Y) of (possibly empty) subsets of V with X ∩ Y = ∅; X is called the tail of h while Y is the head of h. We write X = T(h) and Y = H(h). We say a directed hyperedge h is fully directed if neither H(h) nor T(h) is empty. A directed hypergraph is a hypergraph such that all of the hyperedges are directed. An (s, t)-uniform directed hypergraph is a directed hypergraph such that the tail and head of every directed edge have size s and t, respectively. For example, any DAG is a (1, 1)-uniform directed hypergraph (but not vice versa), and an undirected graph is a (0, 2)-uniform directed hypergraph. Given a hypergraph H, we use V(H) and E(H) to denote the vertex set and edge set of H, respectively.

We say two vertices u and v are co-head (or co-tail) if there is a directed hyperedge h such that {u, v} ⊆ H(h) (or {u, v} ⊆ T(h), respectively). Given another vertex u ≠ v, we say u is a parent of v, denoted by u → v, if there is a directed hyperedge h such that u ∈ T(h) and v ∈ H(h). If u and v are co-head, then u is a neighbor of v; if u, v are neighbors, we denote them by u − v. Given v ∈ V, we define the parent (pa(v)), neighbor (nb(v)), boundary (bd(v)), ancestor (an(v)), anterior (ant(v)), descendant (de(v)), and non-descendant (nd(v)) sets for hypergraphs exactly as for graphs (and therefore use the same names). The same holds for the equivalent concepts for τ ⊆ V. Note that it is possible that some vertex u is both a parent and a neighbor of v.

A partially directed cycle in H is a sequence {v_1, v_2, ..., v_k} such that v_i is either a neighbor or a parent of v_{i+1} for all 1 ≤ i ≤ k, and v_i → v_{i+1} for some 1 ≤ i ≤ k, where v_{k+1} ≡ v_1. We say a directed hypergraph H is acyclic if H contains no partially directed cycle. For ease of reference, we call a directed acyclic hypergraph a DAH or a Bayesian hypergraph structure (as used in Definition 1). Note that for any two vertices u, v in a directed acyclic hypergraph H, u cannot be both a parent and a neighbor of v; otherwise we would have a partially directed cycle.

Remark 1. DAHs are generalizations of undirected graphs, DAGs, and chain graphs. In particular, an undirected graph can be viewed as a DAH in which every hyperedge is of the form (∅, {u, v}). A DAG is a DAH in which every hyperedge is of the form ({u}, {v}). A chain graph is a DAH in which every hyperedge is of the form (∅, {u, v}) or ({u}, {v}).

We define the chain components of H as the equivalence classes under the equivalence relation in which two vertices v_1, v_t are equivalent if there exists a sequence of distinct vertices v_1, v_2, ..., v_t such that v_i and v_{i+1} are co-head for all i ∈ [t − 1]. The chain components {τ : τ ∈ D} yield a unique natural partition of the vertex set V(H) = \bigcup_{\tau \in D} \tau with the following property:

Proposition 1. Let H be a DAH and {τ : τ ∈ D} be its chain components. Let G be the graph obtained from H by contracting each element of {τ : τ ∈ D} into a single vertex and creating a directed edge from τ_i ∈ V(G) to τ_j ∈ V(G) in G if and only if there exists a hyperedge h ∈ E(H) such that T(h) ∩ τ_i ≠ ∅ and H(h) ∩ τ_j ≠ ∅. Then G is a DAG.

Proof. See (Javidian et al. 2018). □

Note that the DAG obtained in Proposition 1 is unique; given a DAH H, we call such a G the canonical DAG of H.

Definition 1. A Bayesian hypergraph is a triple (V, H, P) such that V is a set of random variables, H is a DAH on the vertex set V, and P is a multivariate probability distribution on V such that the local Markov property holds with respect to the DAH H, i.e., for any vertex v ∈ V(H),

    v \perp (nd(v) \setminus cl(v)) \mid bd(v).    (1)

For a Bayesian hypergraph H whose underlying DAH is an LWF DAH, we call H an LWF Bayesian hypergraph.

Bayesian hypergraphs factorizations

The factorization of a probability measure P according to a Bayesian hypergraph is similar to that of a chain graph. Before we present the factorization property, let us introduce some additional terminology. Given a DAH H, we use H^u to denote the undirected hypergraph obtained from H by replacing each directed hyperedge h = (A, B) of H with an undirected hyperedge A ∪ B. Given a family of sets F, define a partial order (F, ≤) on F such that for two sets A, B ∈ F, A ≤ B if and only if A ⊆ B. Let M(F) denote the set of maximal elements of F, i.e., no element of M(F) is contained in another element of F. When F is a set of directed hyperedges, we abuse notation and write M(F) = M(F^u). Let H be a directed acyclic hypergraph and {τ : τ ∈ D} be its chain components. Assume that a probability distribution P has a density f with respect to some product measure μ = ×_{α∈V} μ_α on X = ×_{α∈V} X_α. We say that the probability measure P factorizes according to H if it has a density f such that:

(i) f factorizes as in the directed acyclic case:

    f(x) = \prod_{\tau \in D} f(x_\tau \mid x_{pa(\tau)}).    (2)

(ii) For each τ ∈ D, define H*_τ to be the subhypergraph of H_{τ∪pa(τ)} containing all edges h in H_{τ∪pa(τ)} such that H(h) ⊆ τ. Then

    f(x_\tau \mid x_{pa(\tau)}) = \prod_{h \in M(H^*_\tau)} \psi_h(x),    (3)

where the ψ_h are non-negative functions depending only on x_h and \int_{X_\tau} \prod_{h \in M(H^*_\tau)} \psi_h(x) \, \mu_\tau(dx_\tau) = 1. Equivalently, we can also write f(x_\tau \mid x_{pa(\tau)}) as

    f(x_\tau \mid x_{pa(\tau)}) = Z^{-1}(x_{pa(\tau)}) \prod_{h \in M(H^*_\tau)} \psi_h(x),    (4)

where Z(x_{pa(\tau)}) = \int_{X_\tau} \prod_{h \in M(H^*_\tau)} \psi_h(x) \, \mu_\tau(dx_\tau).
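To make the preceding definitions and Proposition 1 concrete, here is a minimal Python sketch; the encoding of a DAH as (tail, head) pairs and all class and function names are ours, for illustration only, not from (Javidian et al. 2018).

```python
from itertools import combinations

class DAH:
    """A directed acyclic hypergraph encoded as (tail, head) pairs of
    disjoint vertex sets. Illustrative encoding only."""

    def __init__(self, vertices, hyperedges):
        self.V = set(vertices)
        self.E = [(frozenset(t), frozenset(h)) for t, h in hyperedges]

    def chain_components(self):
        # Union-find over the co-head relation: u ~ v if some hyperedge
        # has both u and v in its head.
        parent = {v: v for v in self.V}
        def find(v):
            while parent[v] != v:
                parent[v] = parent[parent[v]]
                v = parent[v]
            return v
        for _, head in self.E:
            for u, v in combinations(head, 2):
                parent[find(u)] = find(v)
        comps = {}
        for v in self.V:
            comps.setdefault(find(v), set()).add(v)
        return list(comps.values())

    def canonical_dag(self):
        # Proposition 1: contract each chain component to a single node and
        # add tau_i -> tau_j iff some hyperedge's tail meets tau_i and its
        # head meets tau_j.
        comps = [frozenset(c) for c in self.chain_components()]
        comp_of = {v: c for c in comps for v in c}
        edges = {(comp_of[u], comp_of[w])
                 for t, h in self.E for u in t for w in h
                 if comp_of[u] != comp_of[w]}
        return comps, edges

# The DAH of Figure 3(b) below, with hyperedges ({a},{c,d}) and ({a,b},{d,e}):
H = DAH("abcde", [({"a"}, {"c", "d"}), ({"a", "b"}, {"d", "e"})])
print(H.chain_components())  # [{'a'}, {'b'}, {'c', 'd', 'e'}] (order may vary)
```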
Remark 2. One of the key advantages of Bayesian hypergraphs is that they allow much finer factorizations of probability distributions compared to chain graph models. We illustrate with a simple example in Figure 1.

[Figure 1: (1) a chain graph G; (2) a Bayesian hypergraph H, both on vertices a, b, c, d.]

Note that in Figure 1 (1), the factorization according to G is

    f(x) = f(x_a) f(x_b) f(x_{cd} \mid x_{ab}) = f(x_a) f(x_b) \psi_{abcd}(x).

In Figure 1 (2), the factorization according to H is

    f(x) = f(x_a) f(x_b) f(x_{cd} \mid x_{ab}) = f(x_a) f(x_b) \psi_{abc}(x) \psi_{abd}(x) \psi_{cd}(x).

Note that although G and H have the same global Markov properties, the factorization according to H is one step finer than the factorization according to G. Suppose each of the variables in {a, b, c, d} can take k values. Then the factorization according to G requires a conditional probability table of size k^4, while the factorization according to H only needs tables of size Θ(k^3) asymptotically. Hence, a Bayesian hypergraph model allows much finer factorizations and thus achieves higher memory efficiency.

Remark 3. We remark that the factorization formula defined in (3) is in fact the most general possible, in the sense that it allows all possible factorizations of a probability distribution admitted by a DAH. In particular, given a Bayesian hypergraph H and one of its chain components τ, the factorization scheme in (3) allows a distinct function for each maximal subset of τ ∪ pa_D(τ) that intersects τ (pa_D(τ) is the parent set of τ in the canonical DAG of H). For each subset S of τ ∪ pa_D(τ) that does not intersect τ, recall that the factorization in (3) can be rewritten as

    f(x_\tau \mid x_{pa(\tau)}) = \frac{\prod_{h \in M(H^*_\tau)} \psi_h(x)}{\int_{X_\tau} \prod_{h \in M(H^*_\tau)} \psi_h(x) \, \mu_\tau(dx_\tau)}.

Observe that a factor ψ_S(x) for such an S does not depend on the values of the variables in τ. Hence ψ_S(x) can be factored out of the integral above and cancels with itself in f(x_\tau \mid x_{pa(\tau)}). Thus, the factorization formula in (3) or (4) in fact allows distinct functions for all possible maximal subsets of τ ∪ pa_D(τ).
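As a quick sanity check on the table-size claim in Remark 2, the following sketch counts the entries of the tables needed by the two factorizations; the function and variable names are ours, and k = 10 is an arbitrary choice.

```python
# Counting table entries for the two factorizations in Figure 1 (Remark 2);
# k is the number of values each variable can take.
def table_entries(scopes, k):
    return sum(k ** len(s) for s in scopes)

k = 10
size_G = table_entries([{"a"}, {"b"}, {"a", "b", "c", "d"}], k)
size_H = table_entries([{"a"}, {"b"}, {"a", "b", "c"}, {"a", "b", "d"}, {"c", "d"}], k)
print(size_G, size_H)  # 10020 vs. 2120: Theta(k^4) vs. Theta(k^3)
```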
Intervention in Bayesian hypergraphs

Formally, intervention in Bayesian hypergraphs can be defined analogously to intervention in LWF chain graphs (Lauritzen and Richardson 2002). In this section, we give graphical procedures that are consistent with the intervention formulas for chain graphs (Equations (5), (6)) and for Bayesian hypergraphs (Equations (7), (8)). Before we present the details, we need some additional definitions and tools to determine when factorizations according to two chain graphs or DAHs are equivalent in the sense that they can be written as products of the same type of functions (functions that depend on the same sets of variables). We say two chain graphs G_1, G_2 admit the same factorization decomposition if every probability density f that factorizes according to G_1 also factorizes according to G_2, and vice versa. Similarly, two DAHs H_1, H_2 admit the same factorization decomposition if every probability density f that factorizes according to H_1 also factorizes according to H_2, and vice versa.

Factorization equivalence and intervention in LWF chain graphs

In this subsection, we give graphical procedures to model intervention based on the formula introduced in (Lauritzen and Richardson 2002). Let us first give some background. In many statistical contexts, we would like to modify the distribution of a variable Y by intervening externally and forcing the value of another variable X to be x. This is commonly referred to as conditioning by intervention or conditioning by action and denoted by Pr(y‖x) or Pr(y | X ← x). Other expressions such as Pr(Y_x = y), P_{man(x)}(y), set(X = x), X = x̂, or do(X = x) have also been used to denote intervention conditioning (Splawa-Neyman 1990; Rubin 1974; Pearl 1993; 1995; 2009).

Let G be a chain graph with chain components {τ : τ ∈ D}. Moreover, assume that a subset A of the variables in V(G) is set such that for every a ∈ A, x_a = a_0. Lauritzen and Richardson, in (Lauritzen and Richardson 2002), generalized the conditioning-by-intervention formula for DAGs and gave the following formula for intervention in chain graphs (where it is understood that the probability of any configuration of variables inconsistent with the intervention is zero). A probability density f factorizes according to G (with A intervened) if

    f(x \| x_A) = \prod_{\tau \in D} f(x_{\tau \setminus A} \mid x_{pa(\tau)}, x_{\tau \cap A}).    (5)

Moreover, for each τ ∈ D,

    f(x_{\tau \setminus A} \mid x_{pa(\tau)}, x_{\tau \cap A}) = Z^{-1}(x_{pa(\tau)}, x_{\tau \cap A}) \prod_{h \in C} \psi_h(x),    (6)

where C is the set of maximal cliques in (G_{\tau \cup pa(\tau)})^m and Z(x_{pa(\tau)}, x_{\tau \cap A}) = \int_{X_{\tau \setminus A}} \prod_{h \in C} \psi_h(x) \, \mu_{\tau \setminus A}(dx_{\tau \setminus A}).

Definition 2. Let G_1 and G_2 be two chain graphs. Given subsets A_1 ⊆ V(G_1) and A_2 ⊆ V(G_2), we say (G_1, A_1) and (G_2, A_2) are factorization-equivalent[1] if they become the same chain graph after removing from G_i all vertices in A_i, together with the edges incident to vertices in A_i, for i ∈ {1, 2}. Typically, A_i in Definition 2 is a set of constant variables in V(G_i) created by intervention.

[1] This term was defined for a different purpose in (Studený 1992).

Theorem 1. Let G_1 and G_2 be two chain graphs defined on the same set of variables V. Moreover, suppose a common set of variables A in V is set by intervention such that for every a ∈ A, x_a = a_0. If (G_1, A) and (G_2, A) are factorization-equivalent, then G_1 and G_2 admit the same factorization decomposition.

Proof. Let G_0 be the chain graph obtained from G_1 by removing all vertices in A and the edges incident to A. It suffices to show that G_1 and G_2 both admit the same factorization decomposition as G_0. Let D_1, D_0 be the sets of chain components of G_1 and G_0, respectively. Let τ ∈ D_1 be an arbitrary chain component of G_1. By the factorization formula in (6), it follows that

    f(x_{\tau \setminus A} \mid x_{pa(\tau)}, x_{\tau \cap A}) = Z^{-1}(x_{pa(\tau)}, x_{\tau \cap A}) \prod_{h \in C} \psi_h(x),

where C is the set of maximal cliques in (G_{\tau \cup pa(\tau)})^m and Z(x_{pa(\tau)}, x_{\tau \cap A}) = \int_{X_{\tau \setminus A}} \prod_{h \in C} \psi_h(x) \, \mu_{\tau \setminus A}(dx_{\tau \setminus A}). Notice that for any maximal clique h_1 ∈ C such that h_1 ∩ A = ∅, h_1 is also a clique in (G_0[τ\A])^m. For h_1 ∈ C with h_1 ∩ A ≠ ∅, there are two cases:

Case 1: (h_1 ∩ τ) \ A ≠ ∅. In this case, observe that h_1 \ A is also a clique in (G_0[τ\A])^m, and thus is contained in some maximal clique h′ in (G_0[τ\A])^m. Since all variables in A are pre-set as constants, it follows that ψ_{h_1}(x) also appears in a factor in the factorization of f according to G_0.

Case 2: h_1 ∩ τ ⊆ A. In this case, note that h_1 ∩ τ is disjoint from τ \ A. Hence ψ_{h_1}(x) appears as a factor independent of x_{τ\A} in both Z^{-1}(x_{pa(\tau)}, x_{\tau \cap A}) and \prod_{h \in C} \psi_h(x), and thus cancels with itself.

Thus every probability density f that factorizes according to G_1 also factorizes according to G_0. On the other hand, it is easy to see that for every τ′ ∈ D_0 and every maximal clique h′ in (G_0[τ′])^m, h′ is contained in some maximal clique h in (G_1[τ])^m for some τ ∈ D_1. Hence we can conclude that G_1 and G_0 admit the same factorization decomposition. The same argument also works for G_2 and G_0. Thus, G_1 and G_2 admit the same factorization decomposition. □
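The deletion test in Definition 2 is easy to mechanize. The sketch below uses our own encoding of a chain graph as a pair (undirected edges, directed edges); the example graphs g and ghat are hypothetical and chosen only to exercise the test.

```python
# Definition 2: (G1, A1) and (G2, A2) are factorization-equivalent iff
# removing the intervened vertices and their incident edges leaves the
# same chain graph.
def delete_vertices(chain_graph, A):
    undirected, directed = chain_graph
    kept_u = {e for e in undirected if not (set(e) & A)}
    kept_d = {(u, w) for u, w in directed if u not in A and w not in A}
    return kept_u, kept_d

def factorization_equivalent(g1, A1, g2, A2):
    return delete_vertices(g1, A1) == delete_vertices(g2, A2)

# Hypothetical example: c -- d -- e undirected, b -> e directed; ghat is g
# with c's undirected edge turned into c -> d (cf. Theorem 2 below).
g    = ({frozenset("cd"), frozenset("de")}, {("b", "e")})
ghat = ({frozenset("de")}, {("b", "e"), ("c", "d")})
print(factorization_equivalent(g, {"c"}, ghat, {"c"}))  # True
```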
We now define a graphical procedure (call it the redirection procedure) that is consistent with the intervention formulas in Equations (5) and (6). Let G be a chain graph. Given an intervened set of variables A ⊆ V(G), let Ĝ be the chain graph obtained from G by performing the following operation: for every u ∈ A and every undirected edge e = {u, w} containing u, replace e by a directed edge from u to w; finally, remove all directed edges that point to some vertex in A. By replacing the undirected edges with directed edges, we replace any feedback mechanism that includes a variable in A with a causal mechanism. The intuition behind the procedure is the following: since a variable that is set by intervention cannot be modified, the symmetric feedback relation is turned into an asymmetric causal one. Similarly, we can justify this graphical procedure as equivalent to "striking out" some equations in the Gibbs process on top of p. 338 of (Lauritzen and Richardson 2002), as Lauritzen and Richardson (Richardson 2018) did for Equation (18) in (Lauritzen and Richardson 2002).

Theorem 2. Let G be a chain graph with a subset of variables A ⊆ V(G) set by intervention such that x_a = a_0 for every a ∈ A. Let Ĝ be obtained from G by the redirection procedure. Then G and Ĝ admit the same factorization decomposition.

Proof. It is not hard to see that removing from Ĝ and G all vertices in A and all edges incident to A results in the same chain graph. Hence, by Theorem 1, G and Ĝ admit the same factorization decomposition. □
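The redirection procedure itself can be sketched in the same encoding; this is our illustrative rendering of the operation just described, not code from the paper.

```python
# Redirection procedure for chain graphs: every undirected edge {u, w} with
# u in A becomes the directed edge u -> w, and directed edges pointing into
# A are removed.
def redirect(chain_graph, A):
    undirected, directed = chain_graph
    kept_u = {e for e in undirected if not (set(e) & A)}
    new_d = set(directed)
    for e in undirected - kept_u:
        u, w = tuple(e)
        if u in A:
            new_d.add((u, w))
        if w in A:
            new_d.add((w, u))
    # drop directed edges whose head is an intervened vertex
    new_d = {(u, w) for u, w in new_d if w not in A}
    return kept_u, new_d

# For the hypothetical g of the previous snippet, redirect(g, {"c"}) equals
# ghat, so the check suggested by Theorem 2 passes:
# factorization_equivalent(g, {"c"}, redirect(g, {"c"}), {"c"})  -> True
```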
Example 1. Consider the chain graph G shown in Figure 2. Let Ĝ be the graph obtained from G through the redirection procedure described in this subsection, and let G_0 be the chain graph obtained from G by deleting the vertex c_0 and the edges incident to c_0. We compare the factorization decompositions given by formulas (5), (6) with those given by the graph structures Ĝ and G_0.

[Figure 2: (a) A chain graph G; (b) the graph Ĝ obtained from G through the redirection procedure; (c) the graph G_0 obtained from G by deleting the variables in A.]

By formulas (5) and (6) proposed in (Lauritzen and Richardson 2002), when x_c is set to c_0 by intervention,

    f(x \| x_c) = f(x_a) f(x_b) f(x_{de} \mid x_{a b c_0}) = f(x_a) f(x_b) \frac{\psi_{a c_0 d}(x) \psi_{abde}(x)}{\sum_{d,e} \psi_{a c_0 d}(x) \psi_{abde}(x)}.

Now consider the factorization according to Ĝ. The chain components of Ĝ are {{a}, {b}, {c}, {d, e}}, with x_c set to c_0. The factorization according to Ĝ is

    f(x \| x_c) = f_{\hat G}(x_a) f_{\hat G}(x_b) f_{\hat G}(x_c) f_{\hat G}(x_{de} \mid x_{a b c_0}) = f_{\hat G}(x_a) f_{\hat G}(x_b) f_{\hat G}(x_c) \frac{\psi_{a c_0 d}(x) \psi_{abde}(x)}{\sum_{d,e} \psi_{a c_0 d}(x) \psi_{abde}(x)},

where f(x_c) = 1 when x_c = c_0 and 0 otherwise. Hence G and Ĝ admit the same factorization.

Now consider the factorization according to G_0. The chain components of G_0 are {{a}, {b}, {d, e}}. The factorization according to G_0 is

    f_0(x) = f_0(x_a) f_0(x_b) f_0(x_{de} \mid x_{ab}) = f_0(x_a) f_0(x_b) \frac{\psi_{ad}(x) \psi_{abde}(x)}{\sum_{d,e} \psi_{ad}(x) \psi_{abde}(x)}.

Observe that f_0(x) has the same form of decomposition as f(x \| x_c), since x_c is set to c_0 in ψ_{a c_0 d}(x) (with the understanding that the probability of any configuration of variables with x_c ≠ c_0 is zero). Hence we can conclude that G, Ĝ (with x_c intervened), and G_0 admit the same factorization decomposition.
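Example 1 can also be checked numerically. The sketch below fills the cliques of Example 1 with hypothetical random positive tables and verifies that the conditional obtained from G with x_c intervened (Equation (6)) coincides with the one obtained from G_0, whose factor ψ_ad is the c = c_0 slice of ψ_{a c_0 d}.

```python
import itertools, random

random.seed(0)
k, c0 = 2, 1  # binary variables; c is intervened to the value c0

# Hypothetical positive potential tables for the cliques of Example 1.
psi_acd  = {i: random.random() + 0.1 for i in itertools.product(range(k), repeat=3)}
psi_abde = {i: random.random() + 0.1 for i in itertools.product(range(k), repeat=4)}

def f_de_given(a, b, c):
    # Equation (6) for the chain component {d, e}: normalize over d, e only.
    w = {(d, e): psi_acd[a, c, d] * psi_abde[a, b, d, e]
         for d in range(k) for e in range(k)}
    z = sum(w.values())
    return {de: v / z for de, v in w.items()}

# G0's factor psi_ad is the c = c0 slice of psi_acd.
psi_ad = {(a, d): psi_acd[a, c0, d] for a in range(k) for d in range(k)}

def f0_de_given(a, b):
    w = {(d, e): psi_ad[a, d] * psi_abde[a, b, d, e]
         for d in range(k) for e in range(k)}
    z = sum(w.values())
    return {de: v / z for de, v in w.items()}

for a, b in itertools.product(range(k), repeat=2):
    lhs, rhs = f_de_given(a, b, c0), f0_de_given(a, b)
    assert all(abs(lhs[de] - rhs[de]) < 1e-12 for de in lhs)
print("G (with x_c intervened) and G0 induce the same conditional on {d, e}")
```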
Factorization equivalence and intervention in Bayesian hypergraphs

Intervention in Bayesian hypergraphs can be modeled analogously to the case of chain graphs. We use the same notation as before. Let H be a DAH and {τ : τ ∈ D} be its chain components. Moreover, assume that a subset A of the variables in V(H) is set such that for every a ∈ A, x_a = a_0. Then a probability density f factorizes according to H (with A intervened) as follows (where it is understood that the probability of any configuration of variables inconsistent with the intervention is zero):

    f(x \| x_A) = \prod_{\tau \in D} f(x_{\tau \setminus A} \mid x_{pa(\tau)}, x_{\tau \cap A}).    (7)

For each τ ∈ D, define H*_τ to be the subhypergraph of H_{τ∪pa_D(τ)} containing all edges h in H_{τ∪pa(τ)} such that H(h) ⊆ τ; then

    f(x_{\tau \setminus A} \mid x_{pa(\tau)}, x_{\tau \cap A}) = Z^{-1}(x_{pa(\tau)}, x_{\tau \cap A}) \prod_{h \in M(H^*_\tau)} \psi_h(x),    (8)

where Z(x_{pa(\tau)}, x_{\tau \cap A}) = \int_{X_{\tau \setminus A}} \prod_{h \in M(H^*_\tau)} \psi_h(x) \, \mu_{\tau \setminus A}(dx_{\tau \setminus A}) and the ψ_h are non-negative functions that depend only on x_h.

Definition 3. Let H_1 and H_2 be two Bayesian hypergraphs. Given subsets of variables A_1 ⊆ V(H_1) and A_2 ⊆ V(H_2), we say (H_1, A_1) and (H_2, A_2) are factorization-equivalent if performing the following operations on H_1 and H_2 results in the same directed acyclic hypergraph:

(i) deleting all hyperedges with empty head, i.e., hyperedges of the form (S, ∅);
(ii) deleting every hyperedge that is contained in some other hyperedge, i.e., deleting h if there is another h′ such that T(h) ⊆ T(h′) and H(h) ⊆ H(h′);
(iii) shrinking all hyperedges of H_i containing vertices in A_i, i.e., replacing every hyperedge h of H_i by h′ = (T(h)\A_i, H(h)\A_i), for i ∈ {1, 2}.

Typically, A_i in Definition 3 is a set of constant variables in V created by intervention.
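The three reduction operations of Definition 3 can be sketched as follows, reusing the (tail, head) encoding of the earlier snippets; applying them once in the order (iii), (i), (ii) suffices for the example shown, though the definition allows repeated application.

```python
# A sketch of the three operations in Definition 3.
def reduce_dah(hyperedges, A):
    A = frozenset(A)
    E = {(t - A, h - A) for t, h in hyperedges}          # (iii) shrink
    E = {(t, h) for t, h in E if h}                      # (i) drop empty heads
    return {e for e in E                                 # (ii) drop dominated
            if not any(e != f and e[0] <= f[0] and e[1] <= f[1] for f in E)}

# For the DAH of Figure 3(b) with c intervened, ({a},{c,d}) shrinks to
# ({a},{d}), which is contained in ({a,b},{d,e}) and is therefore removed:
E = {(frozenset("a"), frozenset("cd")), (frozenset("ab"), frozenset("de"))}
print(reduce_dah(E, "c"))  # {(frozenset({'a','b'}), frozenset({'d','e'}))}, up to ordering
```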
Theorem 3. Let H_1 and H_2 be two DAHs defined on the same set of variables V. Moreover, suppose a common set of variables A in V is set by intervention such that for every a ∈ A, x_a = a_0. If (H_1, A) and (H_2, A) are factorization-equivalent, then H_1 and H_2 admit the same factorization decomposition.

Proof. Similarly to the proof of Theorem 1, let H_0 be the DAH obtained from H_1 (or H_2) by performing the operations above repeatedly. Let D_1 and D_0 be the sets of chain components of H_1 and H_0, respectively. First, note that operation (i) does not affect the factorization, since hyperedges of the form h = (S, ∅) never appear in the factorization decomposition, because H(h) ∩ τ = ∅ for every τ ∈ D_1. Second, operation (ii) does not change the factorization decomposition either: if a hyperedge h is contained in another hyperedge h′ as defined, then ψ_h(x) can simply be absorbed into ψ_{h′}(x) by replacing ψ_{h′}(x) with ψ_{h′}(x) · ψ_h(x).

Now let τ ∈ D_1 be an arbitrary chain component of H_1 and h_1 ∈ H_1[τ]*, i.e., a hyperedge of H_1 whose head intersects τ. Suppose that τ is separated into several chain components τ′_1, τ′_2, ..., τ′_t in H_0 because of the shrinking operation. If h_1 ∩ A = ∅, then h_1 is also a hyperedge in H_0[τ\A]*. If h_1 ∩ A ≠ ∅, there are two cases:

Case 1: H(h_1) ⊆ A. Then, since the variables in A are constants, it follows that in Equation (8), ψ_{h_1}(x) does not depend on the variables in τ\A. Hence ψ_{h_1}(x) appears as a factor independent of the variables in τ\A in both Z^{-1}(x_{pa(\tau)}, x_{\tau \cap A}) and \prod_{h \in M(H^*_\tau)} \psi_h(x), and thus cancels with itself. Note that h_1 does not exist in H_0 either, since h_1 becomes a hyperedge with empty head after being shrunk and is thus deleted in operation (i).

Case 2: H(h_1)\A ≠ ∅. In this case, H(h_1)\A must be entirely contained in one of {τ′_1, ..., τ′_t}; without loss of generality, say H(h_1)\A ⊆ τ′_1 in H_0. Then note that h_1\A must be contained in some maximal hyperedge h′ in E(H_0) such that H(h′) ∩ τ′_1 ≠ ∅. Moreover, recall that the variables in A are constants. Hence ψ_{h_1} must appear in some factor in the factorization of f according to H_0.

Thus every probability density f that factorizes according to H_1 also factorizes according to H_0. On the other hand, it is not hard to see that for every τ′ ∈ D_0 and every hyperedge h′ in (H_0[τ′])*, h′ is contained in some maximal hyperedge h in (H_1[τ])* for some τ ∈ D_1. Hence we can conclude that H_1 and H_0 admit the same factorization decomposition. The same argument also works for H_2 and H_0. Thus, H_1 and H_2 admit the same factorization decomposition. □

We now present a graphical procedure (again called the redirection procedure) for modeling intervention in Bayesian hypergraphs. Let H be a DAH and {τ : τ ∈ D} be its chain components. Suppose a set of variables x_A is set by intervention. We then modify H as follows: for each hyperedge h ∈ E(H) such that S = H(h) ∩ A ≠ ∅, replace the hyperedge h by h′ = (T(h) ∪ S, H(h)\S). If a hyperedge has the empty set as its head, delete that hyperedge. Call the resulting hypergraph Ĥ_A. We will show that the factorization according to Ĥ_A is consistent with Equation (8).

Theorem 4. Let H be a Bayesian hypergraph and {τ : τ ∈ D} be its chain components. Given an intervened set of variables x_A, let Ĥ_A be the DAH obtained from H by replacing each hyperedge h ∈ E(H) satisfying S = H(h) ∩ A ≠ ∅ by the hyperedge h′ = (T(h) ∪ S, H(h)\S) and removing hyperedges with empty head. Then H and Ĥ_A admit the same factorization decomposition.

Proof. This is a corollary of Theorem 3, since performing the operations (i)–(iii) in the definition of factorization-equivalence of DAHs on H and Ĥ_A results in the same DAH. □
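The following sketch renders this procedure in the same encoding as the earlier snippets; applied to the DAH H of Example 2 below, it produces exactly the hyperedges of Ĥ in Figure 3(c).

```python
# Redirection procedure for DAHs (Theorem 4): intervened vertices move from
# heads to tails, and hyperedges left with an empty head are deleted.
def redirect_dah(hyperedges, A):
    A = frozenset(A)
    E = set()
    for t, h in hyperedges:
        s = h & A                                   # intervened part of head
        t, h = (t | s, h - s) if s else (t, h)
        if h:                                       # drop empty-head edges
            E.add((t, h))
    return E

# Figure 3(b) -> Figure 3(c): ({a},{c,d}) becomes ({a,c},{d}) when c is
# intervened, while ({a,b},{d,e}) is untouched.
E = {(frozenset("a"), frozenset("cd")), (frozenset("ab"), frozenset("de"))}
print(redirect_dah(E, "c"))
```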
Example 2. Let G be the chain graph shown in Figure 3(a) and H be the canonical LWF Bayesian hypergraph of G shown in Figure 3(b), constructed by the procedure in (Javidian et al. 2018, Section 2.4). H has two directed hyperedges, ({a}, {c, d}) and ({a, b}, {d, e}). Applying the redirection procedure for intervention in Bayesian hypergraphs leads to the Bayesian hypergraph Ĥ in Figure 3(c). We show that using Equations (5) and (6) for Figure 3(a) leads to the same result as using the factorization formula for the Bayesian hypergraph in Figure 3(c).

[Figure 3: (a) A chain graph G; (b) the canonical LWF DAH H of G; (c) the resulting hypergraph Ĥ after performing the graphical procedure on H when the variable c is intervened.]

First, we compute f(x \| x_c) for the chain graph in Figure 3(a). Based on Equation (5), we have

    f(x \| x_c) = f(x_a) f(x_b) f(x_{de} \mid x_{a b c_0}),

as the effect of the atomic intervention do(X_c = c_0). Then, using Equation (6) gives

    f(x \| x_c) = f(x_a) f(x_b) \frac{\psi_{a c_0 d}(x) \psi_{abde}(x)}{\sum_{d,e} \psi_{a c_0 d}(x) \psi_{abde}(x)}.    (9)

Now, we compute f(x) for the Bayesian hypergraph in Figure 3(c). Using Equation (2) gives

    f(x \| x_c) = f(x_a) f(x_b) f(x_c) f(x_{de} \mid x_{a b c_0}).

Applying formula (3) gives

    f(x \| x_c) = f(x_a) f(x_b) f(x_c) \frac{\psi_{a c_0 d}(x) \psi_{abde}(x)}{\sum_{d,e} \psi_{a c_0 d}(x) \psi_{abde}(x)}.    (10)

Note that f(x_c) = 1 when x_c = c_0, and f(x_c) = 0 otherwise. As a result, the right-hand sides of Equations (9) and (10) are the same.

[Figure 4: Commutative diagram of factorization equivalence.]

Remark 4. Figure 4 summarizes our results. Given a chain graph G and its canonical LWF DAH H, Theorem 4 in (Javidian et al. 2018) shows that G and H admit the same factorization decomposition. Suppose a set of variables A is set by intervention. Theorems 1 and 2 show that the chain graphs obtained from G by the redirection procedure or by deleting the variables in A admit the same factorization decomposition, which is also consistent with the intervention formula introduced in (Lauritzen and Richardson 2002). Similarly, Theorems 3 and 4 show that the DAHs obtained from H by the redirection procedure or by shrinking the variables in A admit the same factorization decomposition, which is consistent with a hypergraph analogue of the formula in (Lauritzen and Richardson 2002).

Acknowledgments. This work is primarily supported by Office of Naval Research grant ONR N00014-17-1-2842.

References
Andersson, S. A.; Madigan, D.; and Perlman, M. D. 1996. Alternative Markov properties for chain graphs. In Uncertainty in Artificial Intelligence, 40–48.
Cox, D. R., and Wermuth, N. 1993. Linear dependencies represented by chain graphs. Statistical Science 8(3):204–218.
Cox, D. R., and Wermuth, N. 1996. Multivariate Dependencies: Models, Analysis and Interpretation. Chapman and Hall.
Darroch, J. N.; Lauritzen, S. L.; and Speed, T. P. 1980. Markov fields and log-linear interaction models for contingency tables. The Annals of Statistics 8(3):522–539.
Drton, M. 2009. Discrete chain graph models. Bernoulli 15(3):736–753.
Javidian, M. A.; Lu, L.; Valtorta, M.; and Wang, Z. 2018. On a hypergraph probabilistic graphical model. https://arxiv.org/abs/1811.08372.
Kiiveri, H.; Speed, T. P.; and Carlin, J. B. 1984. Recursive causal models. Journal of the Australian Mathematical Society 36:30–52.
Lauritzen, S., and Jensen, F. V. 1997. Local computations with valuations from a commutative semigroup. Annals of Mathematics and Artificial Intelligence 21:51–69.
Lauritzen, S., and Richardson, T. 2002. Chain graph models and their causal interpretations. Journal of the Royal Statistical Society, Series B (Statistical Methodology) 64(3):321–348.
Lauritzen, S., and Wermuth, N. 1989. Graphical models for associations between variables, some of which are qualitative and some quantitative. The Annals of Statistics 17(1):31–57.
Lauritzen, S. 1996. Graphical Models. Oxford Science Publications.
Pearl, J. 1988. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann.
Pearl, J. 1993. [Bayesian analysis in expert systems]: Comment: Graphical models, causality and intervention. Statistical Science 8(3):266–269.
Pearl, J. 1995. Causal diagrams for empirical research. Biometrika 82(4):669–710.
Pearl, J. 2009. Causality: Models, Reasoning, and Inference. Cambridge University Press.
Peña, J. M. 2014. Marginal AMP chain graphs. International Journal of Approximate Reasoning 55:1185–1206.
Richardson, T. S., and Spirtes, P. 2002. Ancestral graph Markov models. The Annals of Statistics 30(4):962–1030.
Richardson, T. S. 2018. Personal communication.
Rubin, D. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5):688–701.
Splawa-Neyman, J. 1990. On the application of probability theory to agricultural experiments: Essay on principles. Statistical Science 5(4):465–472.
Studený, M. 1992. Conditional independence relations have no finite complete characterization. In Kubík, S., and Víšek, J., eds., Information Theory, Statistical Decision Functions and Random Processes: Proceedings of the 11th Prague Conference, volume B, 377–396.
Wermuth, N., and Lauritzen, S. L. 1983. Graphical and recursive models for contingency tables. Biometrika 70(3):537–552.