Discrete Bayesian Networks: The Exact Posterior Marginal Distributions

Do Le (Paul) Minh Department of ISDS, California State University, Fullerton CA 92831, USA [email protected] August 27, 2018

Abstract: In a , we wish to evaluate the marginal prob- ability of a query variable, which may be conditioned on the observed values of some evidence variables. Here we first present our “border algorithm,” which converts a BN into a directed chain. For the polytrees, we then present in details, with some modifications and within the border algorithm frame- work, the “revised polytree algorithm” by Peot & Shachter (1991). Finally, we present our “parentless polytree method,” which, coupled with the bor- der algorithm, converts any Bayesian network into a polytree, rendering the complexity of our inferences independent of the size of network, and linear with the number of its evidence and query variables. All quantities in this paper have probabilistic interpretations.

Keywords: Bayesian networks; Exact inference; Border algorithm; Re-

arXiv:1411.6300v1 [cs.AI] 23 Nov 2014 vised polytree algorithm; Parentless polytree method

1 The Bayesian Networks (BNs)

Consider a G defined over a set of ` nodes V = {V1,V2, ..., V`}, in which each node represents a variable. (We denote both a variable and its corresponding node by the same notation, and use the two terms inter- changeably.) The pairs of nodes (Vi,Vj) may be connected by either the

1 Figure 1: The Bayesian Network A

directed edge Vi → Vj or Vj → Vi, but not both. It is not necessary that all pairs be connected in this manner. In this paper, we will first use the graph in Figure 1 as an example. For node V ∈ V, we call

1. the nodes sending the directed edges to V the “parents” of V . We denote the set of the parents of V by HV . In Figure 1, HH = {C,D}. A node is said to be a “root” if it has no parents. (For example, nodes A, B and G.)

2. the nodes receiving the directed edges from V the “children” of V . We denote the set of the children of V by LV . In Figure 1, LD = {H,I}.A node is said to be a “leaf” if it has no children. (For example, nodes J, K and L.) We also call the parents and children of V its “neighbors.”

3. the parents of the children of V , except V , the “co-parents” of V . We

denote the set of the co-parents of V by KV = {∪η∈LV Hη}\V . (We denote by X \Y the set {X : X ∈ X ,X/∈ Y}. X \Y = ∅ iff X ⊆ Y.) In our example, KD = {C,F }.

The set of edges connecting nodes Vi and Vj either directly or via other nodes Vk, ..., Vm in the form of Vi → Vk → ... → Vm → Vj is called a “ directed path” from Vi to Vj. We restrict ourselves to the “directed acyclic

2 graph” (DAG) in which there is no directed path that starts and ends at the same node. If there is a directed path from Vi to Vj, we say Vi is an “ancestor” of Vj and Vj a “descendant” of Vi. Let NV and MV be the set of all ancestors and descendants of V , respectively. In Figure 1 , NI = {A, B, D, F }, MC = {H, J, K}. The “Markovian assumption” of a DAG is that every variable is condi- tionally independent of its non-descendants given its parents. Attached to each node V ∈ V is a conditional probability distribution Pr {V |HV }. If a node has no parent, its distribution is unconditional. We assume in this paper that all V ∈ V are discrete, and all conditional probability distribu- tions are in the form of the conditional probability tables (CPTs), taking strictly positive values. We assume that the “size” of Pr {V |HV } (that is, the number of possible values of V and HV ) is finite for all V ∈ V. A “Bayesian network” (BN) is a pair (G, Θ), where G is a DAG over a set of variables V = {V1,V2, ..., V`} (called the “network structure”) and Θ a set of all CPTs (called the “network parametrization”). We will refer to the DAG in Figure 1 and its parametrization the Bayesian network A, or the BN A. It has been shown that the dependence constraints imposed by G and the numeric constraints imposed by Θ result in the unique joint probability distribution, Y Pr {V} = Pr {V1,V2, ..., V`} = Pr {V |HV } . (1) V ∈V This equation is known as the “chain rule for Bayesian networks” (Pearl, 1987, Equation 3). In our example,

Pr {A = a, B = b, C = c, ..., L = `} = Pr {A = a} Pr {B = b} Pr {C = c|A = a, B = b} ... Pr {L = `|I = i} .

1.1 The Marginal Distribution We wish to evaluate the marginal probability Pr {Q}, in which Q ∈ V is known as a “query variable.” This probability may be conditioned on the fact that some other variables in V are observed to take certain values. Suppose f is a function defined over a set of variables X ⊆ V. We say the “scope” of f is X . We list out the scope if necessary, such as f (X ); if not, we simply write f (·).

3 In this paper, suppose X = {Y, Z} ⊆ V where Y∩Z = ∅ and Y = {Y1, ..., Yn}. We express Pr {X } as Pr {Y, Z}. Given Pr {Y, Z}, “summing out” (or “elim- inating”) Y from Pr {Y, Z} means obtaining Pr {Z} as follows: For every fixed Z = z, X Pr {z, Y} Y ! X X X = ... Pr {z, Y1 = y1, ..., Yn−1 = yn−1,Yn = yn}

Y1 Yn−1 Yn   X X = ...  Pr {z, Y1 = y1, ..., Yn−1 = yn−1} = Pr {z} . Y1 Yn−1

We write, X Pr {Z, Y} = Pr {Z} . (2) Y

One way to evaluate the marginal probability Pr {Vj} is to use Equation (1) to calculate the joint probability Pr {V1, ..., V`}, then sum out all variables in {V1, ..., Vj−1,Vj+1, ..., V`}. This brute-force method is known to be NP- hard; that is, there is often an exponential relationship between the number of variables ` and the complexity of computations (Cooper, 1990). Thus it may be infeasible for large networks. There have been many attempts in the literature to find the most efficient methods to calculate Pr {Q}. They can be divided into two broad categories: the approximate and the exact methods. One example of the approximate methods is using Gibbs samplings to generate “variates” (or “instantiations”) for V, then using statistical techniques to find an estimate for Pr {Q}. (See Pearl, 1987.) In this paper, we present a method to compute Pr {Q} exactly, apart from precision or rounding errors. Guo & Hsu (2002) did a survey of the exact algorithms for the BNs, including the two most well-known ones, namely the variable eliminations (Zhang & Poole, 1996; Dechter, 1999) and the clique- propagations (Lau- ritzen & Spiegelhalter, 1988; Lepar & Shenoy, 1999). Other methods re- viewed were the message propagations in polytrees (Kim & Pearl, 1983; Pearl 1986a, 1986b), loop cutset conditioning (Pearl, 1986b; D´ıez, 1996), arc reversal/node reduction (Shachter, 1990), symbolic probabilistic infer- ence (Shachter et al., 1990) and differential approach (Darwiche, 2003). We

4 also want to mention the more recent LAZY propagation algorithm (Madsen & Jensen, 1999). In this paper, we first present the border algorithm. Like the clique-tree propagation, instead of obtaining the joint probability Pr {V1, ..., V`}, the border algorithm breaks a Bayesian network into smaller parts and calculate the marginal probabilities of these parts, avoiding the exponential blow-ups associated with large networks. In the next section, we first show how a BN can be so divided, in such a way that its independency structure can be exploited. In Section 3, we explain how to calculate the marginal probability of each part when there is no observed evidence. In Section 4, we show how to calculate them, conditional on some observed evidences. In Section 5, we focus on a special kind of BN called the “polytrees,” and present in details, with some modifications and within the border algorithm framework, the “revised polytree algorithm” by Peot & Shachter (1991). In Section 6, we present our parentless polytree method, which, coupled with the border algorithm, can convert any BN into a polytree. This part is static, in that they need to be done only once, off-line, prior to any dialogue with a user. Then we show the dynamic, on-line part of our method, in which the conditional marginal probabilities can be calculated whenever new evidences are entered or queries posed. Finally, our discussions and summary are presented in Section 7.

2 Partitioning a DAG

In this section, we will show how a BN can be partitioned into smaller parts.

2.1 The Set Relationships Consider a non-empty set of nodes X ⊆ V. We also call

1. HX = {∪V ∈X HV } \X the “parent” of X . If HX = ∅, we say X is “parentless” (or “ancestral”). For the BN A, H{A,H} = {C,D}.

2. LX = {∪V ∈X LV } \ {X , HX } the “child” of X . If LX = ∅, we say X is “childless.” For the BN A, L{A,H} = {J, K}. (Although D is a child of A, it is also a parent of H; so it is a member of H{A,H}, not of L{A,H}.)

5 3. KX = {∪V ∈X KV } \ {X , HX , LX } the “co-parent” of X . If a child of V ∈ X is also in X , then all its parents are in {X , HX }. Thus we are only concerned with the children of V in LX . KX therefore can also be

defined as {∪V ∈LX HV } \ {X , LX }. For BN A, K{A,H} = {B, G, I}. If KX = ∅, we say X is “co-parentless.”

2.2 The Growing Parentless Set Consider a parentless set P ⊆ V. It is “growing” when it “recruits” new members. There are simple algorithms in the literature that allow P to recruit a member in a “topological order” (that is, after all its parents), so that it is always parentless. (For example, Koller & Friedman, 2009, p. 1146.) Let us call D = V\P the “bottom part” of the BN. We present here an algorithm that not only constructs a growing parentless P, but also divides P into two parts: P = {A, B}, where A is called the “top part,” and B the “border” that separates A from D. It will become clear later why we wish to keep the size of border B (that is, the number of possible values of B and HB) as small as possible. We call the members of A, B and D the “top variables,” the “border variables,” and the “bottom variables”, respectively. For the initial top part A, we start with A = ∅. We use a co-parentless set of roots as the initial border B. There is at least one such set of roots in a BN. (Suppose a set of roots has a non-root co-parent. Then if we trace through the ancestors of this co-parent, we must encounter another set of roots. Again, if this set is not co-parentless, we trace up further. Eventually, we must see a co-parentless set of roots in a finite BN.) In our example, none of the roots is co-parentless, but the set {A, B} is. All bottom variables will eventually join P. However, we do not choose which bottom variable to join next. This method does not give us control over the membership of the top part A. Instead, we first decide which variable in B is to be “promoted” to A. The promotion of a variable B ∈ B may leave a “hole” in the border B; thus B no longer separates A from D. This necessitates recruiting some bottom variables into B to fill that hole, allowing P to grow. We call the set of the bottom variables that are recruited into border B upon the promotion of node B the “cohort” of B and denote it by C. To fill the hole, C must include at least the part of the “Markov blanket” of B (that is, all its children and co-parents) in D. C may be empty, or may be more than what we need for B to separate A and D.

6 For P to remain parentless, it is necessary that HC ⊆ P. Because all members of C are separated from A by B, C cannot have any parent in A. So we only need HC ⊆ B. Below are the many ways by which we can choose the next variable B ∈ B to promote to A, approximately in order of preference to keep the size of B small:

1. B has no bottom children, hence no bottom co-parent. Then C = ∅.

2. B has bottom children, but no bottom co-parent. Then C = LB ∩ D. (This is why we start B with a co-parentless set of roots.)

3. B has bottom co-parents, which are roots or have no bottom parents. Then C = {LB ∪ KB} ∩ D. In Figure 2, variable J (having co-parent N, with HN = K/∈ D) can be promoted with cohort {N,O}. Variable H can also be promoted with cohort {M,L}, because its co-parent L is a root.

4. B is a fictitious variable ∅, the cohort of which is a bottom variable having all parents in P. In Figure 2, we can recruit variable V (resulting in new border {K, J, I, H, V }).

5. B is a fictitious variable ∅, the cohort of which is a bottom root. In Figure 2, we can recruit root P (resulting in new border {K, J, I, H, P }) or root S (resulting in new border {K, J, I, H, S}).

6. B is any variable in B, the cohort of which includes not only its bottom children, but also all their bottom ancestors (hence the bottom co- parents of B and perhaps some roots). In Figure 2, I can be promoted with cohort {V, W, U, S}.

7. B is a fictitious variable ∅, the cohort of which includes any bottom variable, together with all its bottom ancestors. Unless it is necessary, the worst (but legal) strategy is to bring all bottom variables simulta- neously into B.

8. B is a fictitious variable ∅, the cohort of which is the whole parentless set P∗ ⊆ D. This is equivalent to “merging” P and P∗. In Figure 2, we can merge P = {A, B} with the parentless P∗ = {P, Q, R, S, T, U}. If P∗ has already been divided into the top part A∗ and the border B∗, then A∗ can be merged with A and B∗ with B. (This is equivalent to

7 Figure 2: A Bayesian Network

simultaneously promoting all members of A∗ after merging P and P∗.) In Figure 2, if B∗ = {U, T }, then the new border is {K,J,I,H,U,T }.

We continue to use the notations such as P, A, B and D. However, we also define the initial top part as A0 = ∅, and denote the initial border, comprising of a co-parentless set of roots, by B0. At “time” j ≥ 1, the variable promoted to Aj−1 (which may be a fictitious variable ∅) is re-named as Vj; the resulting top part becomes Aj. Thus, for all j ≥ 1, Aj = {Aj−1,Vj} = {V1, ..., Vj} . (3)

Let C0 = B0. For all j ≥ 1, the cohort associated with Vj is re-named as Cj. After promoting Vj ∈ Bj−1 and recruiting Cj, the resulting border is, for all j ≥ 1, Bj = {Bj−1\Vj, Cj} , (4) with HCj ⊆ Bj−1. Let P0 = B0. The parentless set P grows cohort-by-cohort as, for all j ≥ 1,

j Pj = {Aj, Bj} = {Aj−1,Vj} ∪ {Bj−1\Vj, Cj} = {Pj−1, Cj} = ∪k=0Ck. (5)

8 Eventually, all variables in V will join P. Let γ be the time that this happens. We call Bγ the “last” border. Then V is partitioned into disjoint sets as γ V = Pγ = ∪k=0Ck. Let the bottom part at time j (0 ≤ j ≤ γ) be γ Dj = V\Pj = Pγ\Pj = ∪k=j+1Ck = {Cj+1, Dj+1} . (6) The above promotion rules do not result in a unique promotion order; and we do not attempt to optimize here, so that the maximum size of all borders Bi (i = 1, 2, ..., γ) is as small as possible. We can heuristically search among all members of Bj to identify the node whose promotion leads to the smallest next border Bj+1, but this does not guarantee a global minimum. The above order of preference may help. We show the results obtained by one particular promotion order for the BN A in Table 1. The last column shows the rule we use to promote Vi. The function Φ (Ci) will be introduced later.

time i Vi Bi\Ci Ci Φ(Ci) Rule 0 ∅ ∅ A, B Pr {A} Pr {B} 1 A B C,D,F Pr {C|A, B} Pr {D|A, B} Pr {F |A, B} 2 2 B C,D,F ∅ 1 1 3 C D,F H Pr {H|C,D} 2 4 D F,H I Pr {I|D,F } 2 5 F H,I ∅ 1 1 6 H I J, K, G Pr {J|G, H} Pr {K|H,I} Pr {G} 3 7 G I, J, K ∅ 1 1 γ = 8 I J, K L Pr {L|I} 2

Table 1: The borders obtained for the BN A from one particular promotion order

To keep {Pi, i = 1, 2, ..., γ} parentless, some borders Bi may have more members than what required to separate Ai and Di. Rule 1 is useful in this case, to reduce the membership of Bi to its minimum. For example, it was used to reduce B4 = {F,H,I} to B5 = {H,I}. Here we construct a directed chain of possibly overlapping borders {Bi, i = 1, 2, ..., γ}, called the “border chain.” A border chain is Markovian, in the sense that the knowledge of Bj is sufficient for the study of Bj+1. Figure 3 shows the corre- sponding border chain for the BN A.

9 Figure 3: A border chain for the BN A

3 Inferences without Evidences

In this section, we explain how the “prior marginal probability” Pr {Bi} can be calculated, assuming that no variable is observed taking any value.

3.1 The Parentless Set Probabilities We first present the following important lemma, which is based on a simple observation that in a BN, a parentless set of nodes and its parametrization is a Bayesian sub-network: Lemma 1 If P ⊆ V is parentless, then Y Pr {P} = Pr {V |HV } . V ∈P Proof. The lemma follows from Equation (1) For the BN A, as {A, B, D} is parentless, Pr {A, B, D} = Pr {A} Pr {B} Pr {D|A, B} . We do not use {A, B, D} however, because D alone does not separate {A, B} from the rest of the network. So for a general BN, we start with the parentless P0 = B0 = C0 and define Y Φ(C0) = Pr {P0} = Pr {B0} = Pr {C0} = Pr {V } . (7)

V ∈B0

Recall that, when Vj is promoted, it brings a cohort Cj into Bj−1. By the Markovian assumption, we do not need the whole Pr {Cj|Pj−1}, but only  Pr Cj|HC with HC ⊆ Bj−1 ⊆ Pj−1. For all 0 ≤ j ≤ γ, let us denote j j  the “cohort probability tables” Pr Cj|HCj by Φ (Cj). If Cj = ∅, we set Φ(Cj) = 1. Column 5 of Table 1 shows the cohort probability tables for the BN A.

10 Theorem 2 For all 0 ≤ j ≤ γ,

j Y Pr {Pj} = Φ (Cj) Pr {Pj−1} = Φ(Ck) . k=0 Proof. From Equation (5),

Pr {Pj} = Pr {Pj−1, Cj} = Pr {Cj|Pj−1} Pr {Pj−1}  = Pr Cj|HCj Pr {Pj−1} = Φ (Cj) Pr {Pj−1} .

The theorem follows because Pr {P0} = Φ (C0). Theorem 2 can be used to obtain the joint probability of Pj when j is small. For example,

Pr {P1} = Pr {A, B, C, D, F } = Φ (C0)Φ(C1) .

However, as P grows, eventually we return to Equation (1): Pr {V} = Qγ Pr {Pγ} = k=0 Φ(Ck), which is what we did not want to use in the first place. Fortunately, as we will see in the next section, what we have here is a very “cruel” parentless set of nodes that, after promoting and extracting information from a member, immediately “eliminates” that member!

3.2 The Border Probabilities

We now show how Pr {Bj} can be recursively calculated from Pr {Bj−1}:

Theorem 3 For all 1 ≤ j ≤ γ, X Pr {Bj} = Φ(Cj) Pr {Bj−1} .

Vj

Proof. For all Pj (1 ≤ j ≤ γ), our strategy is not to eliminate all members of Aj at the same time, in the form of X X X Pr {Bj} = Pr {Aj, Bj} = Pr {Pj} = Φ(Cj) Pr {Pj−1} .

Aj Aj Aj

Rather, we eliminate the variables in Aj one-by-one: After variable Vj is promoted into Aj = {Aj−1,Vj}, it is immediately eliminated. In other words,

11  because the scope of Φ (Cj) (which is Cj, HCj ⊆ {Cj, Bj−1}) does not include any member of Aj−1, as far as the summing out of Aj−1 is concerned, Φ (Cj) can be treated as a constant:   X X X Pr {Bj} = Φ(Cj) Pr {Pj−1} = Φ(Cj)  Pr {Aj−1, Bj−1} ,

{Aj−1,Vj } Vj Aj−1 hence the theorem. There must be one value of τ (0 ≤ τ ≤ γ) such that Pr {Bτ } can be calculated. At least, from Equation (7), we know Pr {B0}. Starting with Pr {Bτ }, we can calculate Pr {Bj} for all τ < j ≤ γ recursively by the above theorem. We call our algorithm the “border algorithm” because it breaks the large joint probability Pr {V} down into many smaller border probabilities Pr {Bj}, thus avoiding the exponential blow-ups associated with large networks. That is why we want the size of the largest border to be as small as possible. We now show how the marginal probabilities can be obtained given some evidences.

4 Inferences with Evidences

For a variable V ∈ V, let Va (V ) be the set of possible values of V such that Pr {V = v|HV } > 0. A variable E is said to be an “evidence variable” if it is observed taking value only in a subset Vae (E) ⊂ Va (E). Variable V is non- evidential if Vae (V ) = Va (V ). For example, suppose Va (X) = {1, 2, 3}. If X is observed not taking value 3, then it is evidential with Vae (X) = {1, 2}. Let E be the set of all evidence variables. Consider set Y = {Y1, ..., Yn} ⊆ V. We denote the event that Y occurs by e e [Y] = {Y ∈ Va (Y1) × ... × Va (Yn)} . If Y ∩ E = ∅,[Y] is a sure event. Thus [Y] = [Y ∩ E]. One of the most important tasks in analyzing a BN is to calculate Pr {Q| [E]}, which is known as the “posterior marginal distribution” (or the “conditional marginal distribution”) of a “query variable” Q. For the rest of this paper, we will show how we can first calculate the joint distribution table Pr {Q, [E]} for all possible values of Q. This allows

12 P us to calculate Pr {[E]} = Q Pr {Q, [E]} and then

Pr {Q, [E]} Pr {Q, [E]} Pr {Q| [E]} = P = . Q Pr {Q, [E]} Pr {[E]}

4.1 The Evidence Indicator Columns Consider a table having t rows, each row corresponding to an instantiation of a set of variables X ⊆ V. If an evidence variable E is in X , we define an “evidence indicator column” IE having size t, such that it takes value 1 if E ∈ Vae (E), and 0 otherwise. For a non-evidence variable V ∈ X , we also define the column IV having size t, all members of which are 1. We will use the following notation for a set of nodes X = Y ∪ Z ⊆ V: Y IX = IV = IY∪Z = IY IZ . V ∈X

Multiplying a table having scope X with column IX is equivalent to zeroing out the rows inconsistent with the evidences in X . For the CPT Pr {V |HV }, we define its “reduced CPT” as:

Prr {V |HV } = Pr {V |HV } IV ∪HV . (8)  Previously we defined the cohort probability tables Φ (Cj) = Pr Cj|HCj for all 0 ≤ j ≤ γ. We now define the “reduced cohort probability tables” as

φ (C ) = Φ (C ) I . j j Cj ∪HCj

The following lemma is the evidential version of Equation (2):

Lemma 4 Given Y, Z ⊆ V, Y ∩ Z = ∅, then with scope Z, X Pr {Z, [Y]} = Pr {Z, Y} IY . Y

Proof. If Y ∩ E = ∅, we have Equation (2) because IY = 1 and Pr {Z, [Y]} = e Pr {Z}. Suppose Y = {Y1, ..., Yn} is observed taking value in Va (Y1) × ... ×

13 e Va (Yn). For every fixed Z = z, summing out Y yields: X Pr {z, Y} IY Y n ! X X X Y = ... Pr {z, Y1 = y1, ..., Yn−1 = yn−1,Yn = yn} IYi

Y1 Yn−1 Yn i=1

 n−1  X X e Y = ...  Pr {z, Y1 = y1, ..., Yn−1 = yn−1,Yn ∈ Va (Yn)} IYi  Y1 Yn−1 i=1 X e e = Pr {z, Y1 = y1,Y2 ∈ Va (Y2) , ..., Yn ∈ Va (Yn)} IY1 = Pr {z, [Y]} .

Y1

We are now ready to obtain the necessary information for the calculations of Pr {Q, [E]}.

4.2 The Downward Pass for the Top Evidences

We first consider the “top evidences” within the top part Aj, and define the following notation: For all 0 ≤ j ≤ γ, by Lemma 4,

Π(Bj) = Pr {Bj, [Aj]} IBj X X = Pr {Bj, Aj} IAj ∪Bj = Pr {Pj} IPj . (9)

Aj Aj The following theorem is the evidential version of Theorem 3: Theorem 5 For all 1 ≤ j ≤ γ, X Π(Bj) = φ (Cj)Π(Bj−1) .

Vj

Proof. Because HCj ⊆ Pj−1, we have Pj = Cj ∪ Pj−1 = Cj ∪ HCj ∪ Pj−1. From Definition (9) and Theorem 2, X X Π(B ) = Pr {P } I = Φ(C ) I Pr {P } I j j Pj j Cj ∪HCj j−1 Pj−1 Aj Aj X = φ (Cj) Pr {Pj−1} IPj−1 .

Aj

14  From Equation (3), and as the scope of φ (Cj) (which is Cj, HCj ) is not in Aj−1, X X X Π(Bj) = φ (Cj) Pr {Pj−1} IPj−1 = φ (Cj)Π(Bj−1) .

Vj Aj−1 Vj

There must be one value of ν (0 ≤ ν ≤ γ) such that Π (Bν) can be calculated. Let α be the first time an evidence variable is recruited into P.

For all 0 ≤ j < α, Aj has no evidence and IBj = 1; thus Π (Bj) = Pr {Bj}. Aα also has no evidence and thus,

Π(Bα) = Pr {Bα} IBα . (10)

Starting with Π (Bν), we can calculate Π (Bj) recursively for all ν < j ≤ γ by the above theorem. For the BN A, assume E = {H = h, K = k}. Since H ∈ C3, α = 3. Thus

Π(B0) = Pr {B0} = Pr {A, B} ;

Π(B1) = Pr {B1} = Pr {B,C,D,F } ;

Π(B2) = Pr {B2} = Pr {C,D,F } ;

Π(B3) = Pr {D,F,H} IH = Pr {D, F, h} .

1. Border B4 = {F,H,I} has V4 = D and C4 = I: X X Π(B4) = φ (C4)Π(B3) = Pr {I|D,F } Pr {D, F, h} D D X = Pr {D, F, I, h} = Pr {F, I, h} . D

2. Border B5 = {H,I} has V5 = F and C5 = ∅: X X Π(B5) = φ (C5)Π(B4) = Pr {F, I, h} = Pr {I, h} . F F

3. Border B6 = {I, J, K, G} has V6 = H = h and C6 = {J, K, G}:

Π(B6) = φ (C6)Π(B5) = Pr {J|G, h} Pr {k|I, h} Pr {G} Pr {I, h} = Pr {I, J, G, k, h} .

15 4. Border B7 = {I, J, K} has V7 = G and C7 = ∅: X X Π(B7) = φ (C7)Π(B6) = Pr {I, J, G, k, h} = Pr {I, J, k, h} . G G

5. Border B8 = {J, K, L} has V8 = I and C8 = L: X X Π(B8) = φ (C8)Π(B7) = Pr {L|I} Pr {I, J, k, h} = Pr {J, L, k, h} . I I

4.3 The Upward Pass for the Bottom Evidences

Moving downward border-by-border from B0 to Bj, we can only collect infor- mation about the top evidences inside Pj. To collect information about the bottom evidences inside Dj, we move upward from the last border Bγ to Bj. In the downward passes, we make use of the parentless property of Pj; in the upward passes we need the fact that the border Bj separates Aj and Dj = V\Pj. Thus, to study Dj, we do not need the information of the whole Pj, but only of Bj. We first present the following lemma:

Lemma 6 For all 1 ≤ j ≤ γ,

γ Y Pr {Dj−1|Bj−1} = Φ(Ck) = Φ (Cj) Pr {Dj|Bj} . k=j

Proof. For all 1 ≤ j ≤ γ, as both V = Pγ = {Dj−1, Pj−1} and Pj−1 are parentless, from Theorem 2,

γ Y Pr {Pγ} = Φ(Ck) = Pr {Dj−1, Pj−1} = Pr {Dj−1|Pj−1} Pr {Pj−1} k=0 j−1 Y = Pr {Dj−1|Bj−1} Φ(Ck) . k=0

Assuming all Pr {V |HV } > 0,

γ γ Y Y Pr {Dj−1|Bj−1} = Φ(Ck) = Φ (Cj) Φ(Ck) = Φ (Cj) Pr {Dj|Bj} . k=j k=j+1

16 We define the following notation: For all 0 ≤ j ≤ γ − 1, by Lemma 4, X Λ(Bj) = Pr {[Dj] |Bj} IBj = Pr {Dj|Bj} IDj ∪Bj . (11)

Dj

Since Dγ = ∅, we also define Λ (Bγ) = IBγ . Although we write Λ (Bj), the scope of Λ (Bj) may not be the whole Bj, because Bj may have more variables than the minimal set needed to separate Aj and Dj. For example, in the BN A, while B4 = {F,H,I}, we only need {H,I} for the study of D4 = {G, J, K, L}.

Theorem 7 For all 1 ≤ j ≤ γ, X Λ(Bj−1) = φ (Cj)Λ(Bj) .

Cj

Proof. From Equation (6), Dj−1 = {Cj, Dj}. From Equation (4), Bj−1 ∪Cj = Vj ∪ Bj. Also, if Vj 6= ∅, then its cohort Cj must include all its bottom children, or Vj ∈ HCj ⊆ Bj−1. Thus,  Bj−1 ∪ Dj−1 = Bj−1 ∪ Cj ∪ HCj ∪ {Cj ∪ Dj}

= Vj ∪ Bj ∪ HCj ∪ {Cj ∪ Dj} = Bj ∪ HCj ∪ Cj ∪ Dj. From Lemma 6, X X Λ(B ) = Pr {D |B } I = Φ(C ) I Pr {D |B } I . j−1 j−1 j−1 Bj−1∪Dj−1 j Cj ∪HCj j j Dj ∪Bj Dj−1 {Dj ,Cj }  Because the scope of φ (Cj) (which is Cj, HCj ⊆ {Cj, Pj−1} = Pj) is not in Dj, from Equation (6), X X X Λ(Bj−1) = φ (Cj) Pr {Dj|Bj} IDj ∪Bj = φ (Cj)Λ(Bj) .

Cj Dj Cj

Suppose there is a value of ω (1 < ω ≤ γ) such that Λ (Bω) can be calculated. Especially, let β be the last time an evidence variable is recruited into P. Then for all β ≤ j ≤ γ, Dj has no evidence. Hence

Λ(Bj) = IBj for all β ≤ j ≤ γ. (12)

17 Starting with Λ (Bω), we can calculate Λ (Bj) for all 0 ≤ j < ω recursively by the above lemma. For the BN A, with E = {H = h, K = k}. Thus β = 6 and Λ (B8) = Λ(B7) = Λ (B6) = IK .

1. Because B6 has cohort C6 = {J, K, G}: X Λ(B5) = φ (C6)Λ(B6)

C6 ! X X = Pr {k|h, I} Pr {G} Pr {J|G, h} IK = Pr {k|I, h} . G J

2. Because B5 has cohort C5 = ∅:

Λ(B4) = Λ (B5) = Pr {k|I, h} .

3. Because B4 has cohort C4 = I: X X Λ(B3) = φ (C4)Λ(B4) = Pr {I|D,F } Pr {k|I, h}

C4 I X = Pr {k, I|D, F, h} = Pr {k|D, F, h} . I

4. Because B3 has cohort C3 = H = h:

Λ(B2) = φ (C3)Λ(B3) = Pr {h|C,D} Pr {k|D, F, h} = Pr {h, k|C,D,F } .

5. Because B2 has cohort C2 = ∅:

Λ(B1) = Λ (B2) = Pr {h, k|C,D,F } .

6. Because B1 has cohort C1 = {C,D,F }: X Λ(B0) = φ (C1)Λ(B1)

C1 X = Pr {C|A, B} Pr {D|A, B} Pr {F |A, B} Pr {h, k|C,D,F } {C,D,F } = Pr {h, k|A, B} .

18 4.4 The Posterior Marginal Distributions Combining the downward and upward passes yields:

Theorem 8 For all 0 ≤ j ≤ γ,

Pr {Bj, [E\Bj]} IBj = Π (Bj)Λ(Bj) .

Proof. By Lemma 4, because the event [E\Bj] is the same as the event [V\Bj] = [Aj ∪ Dj], X Pr {Bj, [E\Bj]} IBj = Pr {Bj, [Aj ∪ Dj]} IBj = Pr {Bj, Aj, Dj} IBj ∪Aj ∪Dj

{Aj ,Dj } X = Pr {Aj, Bj} Pr {Dj|Bj} IBj ∪Aj ∪Dj .

{Aj ,Dj }

As Dj ∩ {Aj, Bj} = ∅,   X X Pr {Bj, [E\Bj]} IBj = Pr {Aj, Bj} IAj ∪Bj  Pr {Dj|Bj} IDj ∪Bj  Aj Dj X = Pr {Aj, Bj} IAj ∪Bj Λ(Bj) .

Aj

As Bj ∩ Aj = ∅, X Pr {Bj, [E\Bj]} IBj = Λ (Bj) Pr {Aj, Bj} IAj ∪Bj = Π (Bj)Λ(Bj) .

Aj

Corollary 9 For node Q ∈ Bj where 0 ≤ j ≤ γ, X Pr {Q, [E\Q]} IQ = Π(Bj)Λ(Bj) .

Bj \Q

If Q/∈ E, X Pr {Q, [E]} = Π(Bj)Λ(Bj) .

Bj \Q

19 Proof. For node Q ∈ Bj, X X X Pr {Q, [E\Q]} IQ = Pr {E} IE = IBj Pr {Bj, E\Bj} IE\Bj

E\Q Bj \Q E\Bj X X = IBj Pr {Bj, [E\Bj]} = Π(Bj)Λ(Bj) .

Bj \Q Bj \Q

Recall that β is the last time an evidence is recruited into P, if we are looking for the “post-evidence” Pr {Q, [E\Q]} IQ where Q ∈ Bj and β ≤ j ≤ γ, then due to Equation (12), we can find them by the downward pass alone as P Π(B ) I = P Π(B ). Bj \Q j Bj Bj \Q j For the BN A with E = {H = h, K = k}, by the downward pass alone we already have

Pr {B8, [E\B8]} IB8 = Π (B8) = Pr {J, L, k, h} ;

Pr {B7, [E\B7]} IB7 = Π (B7) = Pr {I, J, k, h} ;

Pr {B6, [E\B6]} IB6 = Π (B6) = Pr {I, J, G, k, h} . Now with Theorem 8,

1.Π( B5)Λ(B5) = Pr {I, h} Pr {k|I, h} = Pr {I, h, k} .

2.Π( B4)Λ(B4) = Pr {F, I, h} Pr {k|I, h} = Pr {F, I, h, k} .

3.Π( B3)Λ(B3) = Pr {D, F, h} Pr {k|D, F, h} = Pr {D, F, h, k} .

4.Π( B2)Λ(B2) = Pr {C,D,F } Pr {h, k|C,D,F } = Pr {C, D, F, h, k} .

5.Π( B1)Λ(B1) = Pr {B,C,D,F } Pr {h, k|C,D,F } = Pr {B, C, D, F, h, k} .

6.Π( B0)Λ(B0) = Pr {A, B} Pr {h, k|A, B} = Pr {A, B, h, k} . A variable V ∈ V may appear in more than one borders. For example, variable I appears in B4, B5, B6 and B7. We obtain the same result regardless which of these borders we choose to marginalize. Border algorithm is applicable to all BNs. In the next section, we study a special kind of BNs called the polytrees, and present in details, with some modifications and within the border algorithm framework, the “revised poly- tree algorithm” by Peot & Shachter (1991). This is an important section, because we will show later that, with the help of the border algorithm, any BN can be modified to become a polytree.

20 Figure 4: The Polytree B

5 The Revised Polytree Algorithm

A polytree is a BN which is “singly connected;” that is, there is only one undirected path connecting any two nodes. (From now on, “path” means “undirected path.”) The BN A in Figure 1 is not a polytree because there are 2 paths from A to H, namely A − C − H and A − D − H. In other words, while we assume all BNs are acyclic (that is, they have no directed cycles), a polytree also does not have any undirected (or “loop”). The BN A has loop A − C − H − D − A. As an illustration, we will use the polytree as shown in Figure 4, which we will refer to as the Polytree B.

5.1 To Find the Path Connecting Two Nodes Here we present a method to identify the unique path connecting any two nodes in a polytree. We strategically designate some nodes as “hubs,” and pre-load the unique

21 path connecting each pair of hubs, excluding the hubs themselves. The bigger the network, the more hubs we need. For the Polytree B, let us pick nodes J and H as hubs, connected by path I − M − D − C. For each node, we also pre-load the path from it to its nearest hub. For node P the path is P − I − J; for node A the path is A − D − C − H. To find the path from node X to node Y :

1. Form the (possibly cyclic) path from X to the hub nearest to X, then to the hub nearest to Y , then to Y . For nodes P and A, this path is (P − I − J) − (I − M − D − C) − (H − C − D − A).

2. Replace the largest loop around each hub with its furthest node. With the above path, replace loop I − J − I with node I, and loop D − C − H − C − D with node D, resulting in path P − I − M − D − A connecting nodes P and A.

5.2 The Decompositions by Nodes Node V in a polytree decomposes the polytree into two parts:

1. PV , the parentless set including V and all the nodes that are connected to V “from above,” via its parents HV . (In Figure 4, PD is in the shaded region.) PV has border V and the top part AV = PV \V . Consistent with Definition (9), we define X Π(V ) = Pr {V, [AV ]} IV = Pr {PV } IPV . (13)

AV

2. DV , the bottom set of PV , in which all nodes are connected to V “from below,” via its children LV . Consistent with Definition (11), we define X Λ(V ) = Pr {[DV ] |V } IV = Pr {DV |V } IV ∪DV . (14)

DV

5.3 The Decompositions by Edges So far, we focused on the nodes in a BN. Let us now consider a typical edge X → Y . While node X decomposes the polytree into PX and DX , edge X → Y also decomposes it into two parts:

22 1. TX→Y , the parentless set of nodes on the parent side of edge X → Y , having border X. Not only does TX→Y include PX , but also all the nodes that connect to X from below, except those via Y . In the Polytree B, in addition to PD, TD→M also includes {N,R12,L9,L10}. Consistent with Definition (9), we define the downward message about the evidences in TX→Y that node X can send to its child Y as X ΠY (X) = Pr {X, [TX→Y \X]} IX = Pr {TX→Y } ITX→Y . (15)

TX→Y \X

2. The bottom set UX→Y = V\TX→Y , on the child side of edge X → Y , separated from TX→Y \X by X. Not only does UX→Y include {Y, DY }, but also all the nodes that connect to Y from above, except those via X. Hence,  UX→Y = Y, DY , ∪V ∈HY \X TV →Y . (16) On the other hand,  TX→Y = PX , ∪V ∈LX \Y UX→V . (17)

Consistent with Definition (11), we define the upward message about the evidences in UX→Y that node Y can send to its parent X as X ΛY (X) = Pr {[UX→Y ] |X} IX = Pr {UX→Y |X} IX∪UX→Y . (18)

UX→Y

We will often use the following two properties related to edge X → Y in a polytree:

1. Because two distinct parents Z and T of X have X as a common child, the two parentless sets TZ→X and TT →X must be disjoint and indepen- dent (otherwise, there are two paths from their common member to Y , via the two parents). Thus, Y Pr {∪V ∈HX TV →X } = Pr {TV →X } .

V ∈HX

23 2. Because two distinct children Y and W of X have X as a common parent, the two sets UX→Y and UX→W must be disjoint and indepen- dent given X (otherwise, there are two paths from X to their common member, via the two children). Thus, Y Pr {∪V ∈LX UX→V |X} = Pr {UX→V |X} .

V ∈LX 5.4 The Message Propagations in the Polytrees We now present the lemmas about the relationships among Π (X), Λ (X), ΠY (X) and ΛY (X), which are known in the literature, but are now proven within the border algorithm framework: Lemma 10 For edge X → Y in a polytree, Y ΠY (X) = Π (X) ΛV (X) .

V ∈LX \Y

Proof. Consider the parentless PX , having border X. As in Equation (17), recruiting the childless ∪V ∈LX \Y UX→V without promotion results in the par- entless TX→Y with border X ∪V ∈LX \Y UX→V . From Theorem 5,   Π X ∪V ∈L \Y UX→V = Π (X) Pr ∪V ∈L \Y UX→V |X IX∪ U X X V ∈LX \Y X→V Y = Π (X) Pr {UX→V |X} IX∪UX→V .

V ∈LX \Y

Now use Rule 1 to promote the childless ∪V ∈LX \Y UX→V without cohort, resulting in the parentless TX→Y having border X. From Theorem 5, X  ΠY (X) = Π X ∪V ∈LX \Y UX→V ∪ U V ∈LX \Y X→V X Y = Π(X) Pr {UX→V |X} IX∪UX→V . ∪ U V ∈LX \Y X→V V ∈LX \Y

As the scope of Π (X) (which is X) is not in ∪V ∈LX \Y UX→V , ! Y X ΠY (X) = Π (X) Pr {UX→V |X} IX∪UX→V ,

V ∈LX \Y UX→V which is the lemma by Definition (18). This is similar to Equation (4.45) in Pearl, 1988.

24 Lemma 11 For edge X → Y in a polytree, X X Y ΛY (X) = Λ(Y ) Prr {Y |HY } ΠY (V ) .

Y HY \X V ∈HY \X

Proof. Consider the parentless TX→Y , having border X and the bottom set UX→Y . As shown in Equation (16), recruiting Y ∪V ∈HY \X TV →Y without promotion results in the parentless PY with bottom part DY . The reduced cohort table is  Pr Y ∪V ∈H \X TV →Y |X IX∪Y ∪ T Y V ∈HY \X V →Y   = Pr Y | ∪V ∈H \X TV →Y ,X Pr ∪V ∈H \X TV →Y |X IX∪Y ∪ T Y Y V ∈HY \X V →Y

Because HY ⊆ X ∪V ∈HY \X TV →Y , the reduced cohort table becomes  Pr {Y |HY } IY ∪H Pr ∪V ∈H \X TV →Y I∪ T Y Y V ∈HY \X V →Y Y = Prr {Y |HY } Pr {TV →Y } ITV →Y .

V ∈HY \X

From Theorem 7, as Y/∈ ∪V ∈HY \X TV →Y ,   X Y ΛY (X) = Prr {Y |HY } Pr {TV →Y } ITV →Y  Λ(Y ) Y ∪ T V ∈HY \X V →Y V ∈HY \X   X X Y X = Λ(Y ) Prr {Y |HY }  Pr {TV →Y } ITV →Y  ,

Y HY \X V ∈HY \X TV →Y \V which is the lemma by Definition (15). This is similar to Equation (4.44) in Pearl (1988). The values of Π (·) and Λ (·) can be calculated from the messages as in the following lemmas:

Lemma 12 For node X in a polytree, X Y Π(X) = Prr {X|HX } ΠX (V ) .

HX V ∈HX

25 Proof. Consider the parentless ∪V ∈HX TV →X having border HX and the top part ∪V ∈HX {TV →X \V }. By Definition (9), X Π(H ) = Pr {∪ T } I X V ∈HX V →X ∪V ∈HX TV →X

∪V ∈HX {TV →X \V }   Y X Y =  Pr {TV →X } ITV →X  = ΠX (V ) .

V ∈HX TV →X \V V ∈HX

Now recruit X, resulting in the parentless PX = X ∪V ∈HX TV →X . Then use Rule 1 to promote HX without cohort, leaving X as the border of PX . From Theorem 5, X X Y Π(X) = Prr {X|HX } Π(HX ) = Prr {X|HX } ΠX (V ) ,

HX HX V ∈HX hence the lemma, which is similar to Equation (4.38) in Pearl (1988).

Lemma 13 For node X in a polytree, Y Λ(X) = ΛV (X) .

V ∈LX

Proof. Consider the parentless PX , having border X. Recruit the rest of the network DX = ∪V ∈LX UX→V without promotion, resulting in the bottom part ∅ with Λ (∅) = 1. From Theorem 7, X Λ(X) = Pr {∪ U |X} I Λ( ) V ∈LX X→V X∪V ∈LX UX→V ∅

∪V ∈LX UX→V Y X = Pr {UX→V |X} IX∪UX→V ,

V ∈LX UX→V hence the lemma by Definition (18), which is similar to Equation (4.35) in Pearl (1988). Our dealing with the evidences here is slightly different to that in Peot & Shachter (1991, pp. 308-309). While we attach the indicator column IV to all Π (V ), ΠY (V ), Λ (V ) and ΛY (V ), they call it the “local evidence” ΛV (V ) and attach it to Λ (V ) only. Combining Lemmas 10 and 12 yields:

26 Theorem 14 If node X has received the messages from all members of HX ∪ {LX \Y }, then it can send a downward message to its child Y as   ! Y X Y ΠY (X) =  ΛV (X) Prr {X|HX } ΠX (V ) .

V ∈LX \Y HX V ∈HX

Combining Lemmas 11 and 13 yields:

Theorem 15 If node X has received the messages from all members of {HX \H} ∪ LX , then it can send an upward message to is parent H as !   X Y X Y ΛX (H) = ΛV (X) Prr {X|HX } ΠX (V ) .

X V ∈LX HX \H V ∈HX \H

5.5 The Collection Phase In the first phase of the revised polytree algorithm, which is known as the “collection phase,” a randomly selected node P is designated as a “pivot,” and the messages are passed (or “propagated”) to P .

5.5.1 The Evidential Cores The “evidential core” (or EC) of a polytree is the smallest sub-polytree which contains all the evidences. In other words, it comprises of the evidence set E and all the nodes and edges on the path connecting every pair of the evidence nodes. Corresponding to a particular evidence set, the EC is a unique. Figure 5 shows the EC in the Polytree B, corresponding to E = {B,C,K,L4}, not including the nodes or edges in dash, such as node L1. We call it the EC B. A node in a polytree is said to be “inside” if it is in the EC; otherwise it is “outside.” Likewise, an edge X → Y is “inside” if both X and Y are inside; otherwise it is “outside.” A path is “inside” if all its component edges are. It is important to observe that, for an outside edge X → Y , the whole EC can only be either on its parent side TX→Y , or its child side UX→Y , not both. Otherwise, being connected, the EC must include edge X → Y on the path connecting the evidences on both sides.

27 Figure 5: The EC B in the Polytree B, corresponding to E = {B,C,K,L4}

28 5.5.2 The Boundary Conditions We want to use Theorems 14 and 15 for calculating the messages along the inside edges only. However, this requires the knowledge of the boundary conditions; that is, of the messages from the outside neighbors to the inside nodes, such as from those in dash in Figure 5. So we need the following theorem:

Theorem 16 Consider an inside node X of an EC. (a) If V ∈ HX is outside, then ΠX (V ) = Pr {V }. (b) If V ∈ LX is outside, then ΛV (X) = IX .

Proof. (a) Because the inside X is on the child side of edge V → X, so is the whole EC, and thus TV →X has no evidence, or ITV →X = 1. Hence, by Definition (15), ΠX (V ) = Pr {V } . (b) If V ∈ LX is outside, then because the inside X is on the parent side of edge X → V , so is the whole EC; hence UX→V has no evidence. By Definition (18), ΛV (X) = IX . So, assuming that all prior probabilities Pr {V } are pre-loaded, Theorems 14 and 15 can be used without the need to calculate the messages from the outside neighbors.

5.5.3 The Message Initializations Like all polytrees, an EC has its own roots and leaves. For example, the sets of roots and leaves of the EC B are {K, A, C} and {B,L4,M}, respectively. By definition of the EC, every path in it must end with an evidence node, although there are evidence nodes not at the end of a path (such as nodes C in the EC B). Also, an evidence node at the end of a path can only have one inside child or parent, otherwise the path does not end with it. If it has one child, then it is a root of the EC (such as nodes K in the EC B); if one parent, it is a leaf (such as nodes B and L4 in the EC B). We initialize the message propagations in an EC as follows:

Theorem 17 Consider an EC. (a) For an evidence root R having one inside child C, ΠC (R) = Π (R) = Pr {R} IR. (b) For an evidence leaf L, Λ(L) = IL.

29 Proof. (a) Because AR = PR\R is non-evidential, Π (R) = Pr {R} IR by Definition (13). Also, because R has only one inside child C,ΛV (R) = IR for all V ∈ LR\C. Thus ΠC (R) = Π (R) from Lemma 10. (b) For an evidence leaf L, DL has no evidence and thus Λ (L) = IL by Definition (14).

5.5.4 To the Pivot In the collection phase, starting with an evidence root or leaf in the EC, we send the messages from all the inside nodes to a randomly selected inside pivot node P . At any time, if P has not received a message along an edge to or from it, we trace along the possibly many inside paths leading to this edge, and must see either an inside node that is ready to do so, or an evidence node at the end of an inside path that can send its message to P by Theorem 17. We may need both Theorems 14 and 15 to send the messages along a path, because the directions may change (for example, along path B ← A → D). In the EC B in Figure 5, if we use node D as the pivot, then the collection phase includes the messages sent along paths B ← A → D, L4 ← H ← C → D and K → M ← D.

5.6 The Distribution Phase In the second phase of the revised polytree algorithm, which is known as the “distribution phase,” the messages are passed from the pivot node P to a query variable.

5.6.1 The Informed Nodes Once a node Q has received the messages from all its neighbors, we say it is “informed.” At the end of the collection phase, the pivot node is the first informed node. The posterior marginal probability of an informed node can now be ob- tained: With the messages from all its parents, we can use Lemma 12 to calculate Π (Q); with the messages from all its children, we can use Lemma 13 to calculate Λ (Q). Pr {Q, [E\Q]} IQ then can be calculated by Theorem 8 as Π (Q)Λ(Q). Alternatively, we can use the following theorem:

30 Theorem 18 With edge Q → Y ,

Pr {Q, [E\Q]} IQ = ΠY (Q)ΛY (Q) .

Proof. By Theorem 8, Lemmas 13 and 10, Y Pr {Q, [E\Q]} IQ = Π (Q)Λ(Q) = Π (Q) ΛV (Q)

V ∈LQ   Y = Π(Q) ΛV (Q) ΛY (Q) = ΠY (Q)ΛY (Q) .

V ∈LQ\Y

5.6.2 The Inside Query Variables Let p be the number of paths connecting an inside node V with the rest of the network (that is, the number of its neighbors). If it is not the pivot, then only one of these paths leads to the pivot node P . For us to use Theorem 14 or 15 to send a message from V to P in the collection phase, V must have received the messages along all p − 1 paths, except the one leading to P . Now, in the distribution phase, once it receives a message from the informed P , it becomes informed. Let J be the set of all the informed nodes, which we call the “informed set.” When we sent the messages from an informed node V to an uninformed node Q, not only Q, but also all the nodes along the path from V to Q become informed. In the EC B in Figure 5, the message propagations from the in- formed node D to node H also make node C informed. Thus, “spreading out” from a single pivot node, the informed set forms a connected sub-polytree, in that there is a single path connecting any two informed nodes and all nodes along that path are informed.

5.6.3 The Outside Query Variables Starting from an inside pivot node, the informed set J does not have to cover the EC and can spread beyond it. Consider now an outside uninformed node Q. The paths starting from Q to all nodes in J must go through a unique “gate” T ∈ J ; otherwise, because J is connected, there are more than one paths connecting Q with

31 an informed node via the different gates. Thus Q only needs the messages sent from T to it (in this direction) to become informed, as this has all the evidential information. All other messages sent to it along other paths are from the outside neighbors. In this manner, the informed set J spreads to Q. For the Polytree B in Figure 5, suppose J = {D,C,H} and the outside node R8 is a query variable. The gate from R8 to J is C and the messages along path C ← L3 → R8 are all that are needed to make R8 informed. Peot & Shachter’s revised polytree algorithm has been very much ne- glected in the literature because not many practical BNs have this form. While they further suggested that their algorithm be applied to a general BN via “cutset conditioning,” we continue this paper by presenting a novel method to convert any BN into a polytree.

6 The Border Polytrees (BPs)

In Section 2, we showed that the border algorithm “stretches” a BN into a border chain, which is a special form of the polytrees, in which each node has at most one parent and one child. For example, we stretched the BN A in Figure 1 into a border chain in Figure 3. We will now show how to convert any BN into a polytree.

6.1 Stage I: The Macro-node Polytrees Our method involves two stages: In Stage I, we partition the BN to form a polytree. A set of nodes is said to be “combinable” into a “macro-node” if the network remains acyclic after the combination of its members. For example, we cannot combine nodes A and H in the BN A, as this results in the directed loop {A, H} → D → {A, H}. According to Chang & Fung’s (1989) Node Aggregation Theorem, if all directed paths connecting any two nodes in a set do not contain a node outside it, then this set is combinable. Hence, set {A, C, D, H} in the BN A is. If two loops in a DAG share a common node, they belong to the same “independent loop set” (ILS). If we convert all ILSs in a DAG into polytrees, then the DAG itself becomes a polytree. Because each ILS is combinable, we can combine the nodes in each for this purpose. However this may yield some

32 Figure 6: An isolated loop with five roots and five leaves unnecessarily large macro-nodes. Chang & Fung suggest two methods of converting an ILS into a polytree; one is a heuristic search through the space of its “feasible partitions” for the optimal node combinations, the other is what they call the “graph-directed algorithm.” Here, we propose an algorithm which is somewhat along the line of Ng and Levitt’s method for incrementally extending what they called a “layered” polytree (1994).

6.1.1 The Isolated Loops In a DAG, let us isolate one of its loops (that is, delete all nodes outside it) and call it an “isolated loop.” Some non-root and non-leaf nodes of the DAG may now become the roots or leaves of the isolated loop. On the other hand, some roots or leaves of the former may no longer be in the latter. Let us label the r roots in a loop consecutively as ρ1, ..., ρr, with ρr+1 = ρ1. (See Figure 6(a).)

33 Lemma 19 An isolated loop has the same number of roots and leaves, and can be organized into a star shape.

Proof. Two neighboring roots cannot be in a parent-child relationship, thus must be connected from below via a leaf. If there are r = 2 roots, then we need two leaves to have a loop connecting all four. However, if r > 2 and there are more than one leaves in the direct path connecting any two consecutive roots, we have a loop connecting all four, in addition to the loop connecting all r roots, which violates our isolated loop assumption. Thus with r roots, we have r leaves. Figure 6(a) shows a typical loop which is organized into a star shape. Let λi be the single leaf between ρi and ρi+1. Let κi,k (k = 1, ..., ni) be the ni nodes in the directed path from ρi to λi and γi,k (k = 1, ..., mi) be the mi nodes in the directed path from ρi+1 to λi. There are more than one ways we can combine the nodes in an isolated loop in order to “open” it; that is, to make it a polytree. For example, we can combine all its roots after some appropriate node combinations to make path ρi → ... → λi and path ρi+1 → ... → λi having the same “length;” that is, have the same number of macro-nodes. For example, to have the polytree in Figure 6(b), we form path ρ2 → γ2,1 → {λ1, γ2,2}, so that it has the same length as path ρ1 → κ1,1 → λ1, before combining them into path {ρ1, ρ2} → {γ2,1, κ1,1} → {λ1, γ2,2}. Similarly, we can combine all its leaves. Another novel method is to “fold” the loop along one of its axes, bringing together the pairs of opposite paths in order to combine them, creating an acyclic semi-star. The star must be made “symmetrical” around the chosen axis first, in that each pair of opposite paths must have the same length. The loop in Figure 6(a) is not symmetrical around axis λ2 − ρ5. To make it symmetrical, among other combinations, we may form path {ρ2, γ2,1} → γ2,2 → λ1, so that it can be combined with path ρ3 → κ3,1 → λ3. The polytree in Figure 6(c) is one possible result of folding the loop in Figure 6(a) along its axis λ2 − ρ5. So it seems that we can covert a DAG into a polytree by open all loops in isolation, one-by-one in this manner. Unfortunately, this procedure generally is not correct, as this may make other loops in the same ILS cyclic. For example, if there is an additional directed path ρ2 → η → ρ3 in Figure 6(a), then combining ρ2 and ρ3 as in Figure 6(c) violates the Node Aggregation Theorem. Furthermore, these cyclic loops may not be easily identifiable.

34 We now suggest a correct and systematic way to identify and open all loops in a DAG.

6.1.2 The Parentless Polytree Method In our “parentless polytree method,” we construct a growing parentless poly- tree from a DAG as follows: We start with a root in the DAG and execute the following steps:

1. While there is an un-recruited node

(a) Recruit node τ in a topological order (that is, after all its parents {π1, ..., πp}), together with edge τ ← π1

(b) While there is an un-recruited edge τ ← πi

i. recruit edge τ ← πi ii. isolate and open any resulted loop (c) End while (d) Go to Step 1

2. End while

The growing polytree must be kept parentless, so that no directed path can return to it, and thus no directed cycle via a node outside the polytree can occur as a result of any node combinations inside the polytree. When we recruit node τ having p parents into the polytree P, we bring with it p edges τ ← πi (i = 1, ..., p), where each pair {πi, πj} is connected by at most one path. (We have no path if πi and πj are in two unconnected p subsets of P.) Recruiting τ with all its parents creates at most C2 loops. “Recruiting edge τ ← πi” means adding it in the polytree. Following Ng and Levitt (1994), not only do we recruit the nodes one-at-a-time, we also recruit the edges one-at-a-time. If τ has only one parent, then with edge τ ← π1 we still have a polytree. Otherwise, after recruiting edge τ ← π2, we have at most one loop τ ← π1 − ... − π2 → τ. After opening this loop, we again have a polytree. Then edge τ ← π3 yields at most another loop via π3.... In this way, we only have to open at most p − 1 easily identifiable loops.

35 If two loops created by τ have different pairs of parents {πi, πj} and {πk, πm}, then opening one cannot make the other cyclic. So let us consider two loops having parents {πi, πj} and {πj, πk} and recruit edges τ ← πi and τ ← πj first, resulting in a loop having path πi → τ ← πj. (If P is not parentless, we may have a loop with path πi → τ → ψ instead.) If we open this loop without combining τ with any node, then when we recruit edge τ ← πk later, the resulting loop is acyclic, because path πj → τ ← πk is still in it. It is feasible not to combine leaf τ in the folding method, as we do not need the macro-node {τ, πi} to reduce the length of path τ ← πi ← ... ← ρk and we can choose the symmetrical axis going through τ, so that it is not combined with any other leaf. But suppose we wish to combine leaf τ with node η in the loop with {πi, πj}, and only to find out later when recruiting edge τ ← πk that loop {η, τ} → ... → πk → {η, τ} is cyclic because of the directed path η → ... → πk → τ. In this case, we “open” this loop by expand the macro-node {η, τ} to {η, ..., πk, τ}. If there is any loop in this macro-node, it has the form τ ← ... − η → ... → πk → τ and therefore is not cyclic. After recruiting all p edges, we have a polytree (without any cycle, hence is acyclic), and can return to Step 1. After recruiting all nodes and their edges in a DAG in this manner, we will have partitioned the DAG into what we call a “macro-node polytree.” For illustration, let us use the BN in Figure 7, which we will refer to as the Bayesian network C, or BN C. Assume that, after forming the macro-nodes {R,T } and {B,C}, we arrive at the parentless polytree above the curved line in Figure 7(a).

1. Recruiting node X results in loop X − U − {R,T } − V − X, which can be opened by forming the macro-node {U, V } .

2. Recruiting node Y and then node Z results in loop Z − Y − {U, V } − X − Z, which can be opened by forming the macro-node {X,Y } .

3. Recruiting node Q results in loop Q − {U, V } − {R,T } − S − M − Q, which can be opened by forming the macro-node {M, S, U, V } .

4. Recruiting edges P ← {M, S, U, P } and P ← Q results in loop P − Q−{M, S, U, V }−P , which can be opened by forming the macro-node {M, S, U, V, Q} .

36 Figure 7: The Bayesian Network C, or BN C

5. Recruiting edge P ← N results in loop P −N −L−{M, S, U, V, Q}−P , which can be opened by forming the macro-node {N,M,S,U,V,Q} .

6. Recruiting edges J ← P and J ← O results in loop J−P −{N,M,S,U,V,Q}− O − J, which can be opened by forming the macro-node {O,P } .

7. Recruiting edge J ← F results in loop J − {O,P } − {B,C} − F − J, which can be opened by forming the macro-node {F,O,P } .

8. Recruiting node H results in loop H − {F,O,P } − {B,C} − D − H, which can be opened by forming the macro-node {D,F,O,P } .

9. Recruiting node I results in loop I − H − {D,F,O,P } − I, which can be opened by forming the macro-node {I,H} .

We now can apply the revised polytree algorithm to the final macro-node polytree corresponding to the BN C, as shown in Figure 7(b). However, the size of some CPTs can be large, such as

Pr {D,F,O,P |N, M, S, U, V, Q, G, B, C} = Pr {O|S, G, B, C} , (19)

37 where O = {D,F,O,P } and S = {N,M,S,U,V,Q}. This is why we do not stop after Stage I, but continue to Stage II.

6.2 Stage II: The Border Polytrees In Stage II, we use the border algorithm, which explores the independence relationships between the individual nodes within the macro-nodes, to stretch each macro-node into a border chain, if it is not already in this form. A result obtained for the BN C after Stage II is shown in Figure 8. Inside a macro-node, we follow the eight rules in §2.2 for promoting a node from a border and recruiting its cohort, so that each border separates the top part of the macro-node with its bottom part. However, because not all macro-notes are parentless, we need to introduce the following additional rules:

9. We cannot stretch a macro-node before its parents. In the BN C, we must start with the macro-node {K}, {R,T }, {G} or {A}.

10. Macro-node Γ cannot recruit any member of its macro-node child ∆. In the macro-node S of the BN C, border {N,M,Q} is obtained from border {N, M, Q, V } by promoting V without recruiting its child Y in macro-node {X,Y }.

11. If macro-node ∆ has parents in macro-node Γ, the entire set Γ ∩ H∆ must be together in at least one border in Γ. In the BN C, be- cause S ∩ H{X,Y } = {U, V } we form border {N, M, U, V } ⊆ S, be- cause S ∩ HO = {N,M,Q} we form border {N,M,Q} ⊆ S. Also, because O ∩ H{H,I} = {D,F } and O ∩ HJ = {F,O,P }, we form bor- der {D,F,O,P } ⊆ O.

12. Suppose we wish to promote node B ∈ Bi−1 ⊆ ∆. By Rule 10, we do not recruit LB * ∆. However, we need to recruit all un-recruited

members of LB ∩ ∆ (that is, FB = LB ∩ DBi−1 ∩ ∆) and all members

of HFB ⊆ V (that is, the co-parents of B), even if they are not in ∆.

Assume there are r macro-nodes Γk 6= ∆ (k = 1, ..., r) such that Γk ∩

HFB 6= ∅. By Rules 9 and 11, we must have already constructed in Γk ∗ a border Bi−1,k such that {Γk ∩ HFB } ⊆ Bi−1,k = {Γk ∩ H∆} ⊆ Bi−1,k. ∗ For notational advantages, we also denote Bi−1 ⊆ ∆ by Bi−1,0 or Bi−1,0.

38 Figure 8: BP C, a border polytree for the BN C

39 We break our procedure into two steps: (i) We first recruit to ∆ all Γk ∩ H∆ (k = 1, ..., r) (which have not been recruited) to form border r r ∗ Bi = Bi−1 ∪j=1 {Γk ∩ H∆} = ∪j=0Bi−1,j ⊇ HFB , (20)

where Bi has all Bi−1,k as parents. (ii) Then we promote node B from border Bi with cohort FB ⊆ ∆ by Rule 2.

Note that the entire set Γk ∩ H∆ constructed by Rule 11 must be recruited into ∆ together. This is to ensure that there is at most one edge connecting any two stretched macro-nodes. Otherwise, the main benefit of the macro-node polytrees is destroyed. In the BN C, suppose we wish to promote node N from border {N,P } ⊂ O. Because FN = LN ∩ D{N,P } ∩ O = O, we need to recruit nodes C ∈ {B,C} ∩ HO and G ∈ {G} ∩ HO. However, Rule 11 puts {B,C} ∩ HO = {B,C} in one border and instead of C alone, we 2 ∗ recruit the whole set {B,C}. We thus form border ∪j=0Bi−1,j with ∗ ∗ ∗ Bi−1,0 = {N,P }, Bi−1,1 = {B,C}, Bi−1,2 = {G} before promoting N with cohort O. 13. Rule 12 also applies when we start stretching a non-root macro-node ∆. We do not start with the nodes having parents in it, but with node V having parents only in other macro-nodes Γk 6= ∆ (k = 1, ..., r); that is, ∆ ∩ HV = ∅, and Γk ∩ HV 6= ∅. In this case we first form border ∗ ∗ Bi as in Equation (20), with Bi−1,0 = Bi−1,0 = ∅ and with Bi−1,k such ∗ that {Γk ∩ HV } ⊆ Bi−1,k = {Γk ∩ H∆} ⊆ Bi−1,k. Then we can recruit V . In the BN C, to start stretching macro-node S with node S having 2 ∗ parents in macro-nodes {L} and {R,T }, we first form border ∪j=1Bi−1,j ∗ ∗ with Bi−1,1 = HS ∩ {L} = L and Bi−1,2 = HS ∩ {R,T } = {R,T }. We then can promote R with cohort {S,U}. ∗ If r = 1 and Bi−1,1 = Bi−1,1, then there is no need to repeat Bi = Bi−1,1. Suppose we wish to start stretching macro-node O in the BN C by recruiting node P . Because P only has parents in S, there is no need ∗ to repeat border Bi−1,1 = Bi−1,1 = S ∩ HO = {N,M,Q}. We simply promote M to recruit P . After stretching all macro-nodes, we obtain a “border polytree” or a BP, made up of the “borders.” The variables in each border are called its “com- ponent variables.” The BP in Figure 8 is called the BP C.

Figure 9: The macro-node & border polytrees for (a) the Dyspnoea example and (b) the BN A

For comparison with an undirected junction tree of the well-known “Dyspnoea” example by Lauritzen & Spiegelhalter (1988, pp. 164 and 209), we include in Figure 9(a) its macro-node and border polytrees. Figure 9(b) shows the macro-node and border polytrees of the BN A.

Like a clique tree (or a junction tree), a BP has the “running intersection property;” that is, after a variable is recruited into border B_j, it stays in all future consecutive borders B_{j+1}, B_{j+2}, ... until its promotion; then it will never be recruited again into any later border. (A small programmatic check of this property is sketched at the end of this subsection.) However, a BP is not “family preserving,” in that it is not necessary that a variable be with all its parents in one border. Most importantly, a BP is directional, while a clique tree is not.

The borders in a BP generally have smaller CPTs than the macro-nodes in a macro-node polytree. In the BN C, instead of the CPT as in Equation (19) involving thirteen variables, we now have Pr {D,F,O,P |B,F,O,P}, with five variables {D,F,O,P,B}. Despite their smaller CPTs, their posterior marginal probabilities can still be calculated by adapting the results in Section 4, as we will show next.
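The running intersection property can be checked mechanically along any directed path of borders. The sketch below assumes each border is given as a Python set of variable names; the path used for the demonstration is the fragment {N, M, Q, V} → {N, M, Q} → {N, P, Q} → {N, P} of Figure 8, and the function name is ours.

def has_running_intersection(path):
    """Each variable must occupy one contiguous run of borders along the path."""
    left_for_good = set()   # variables already promoted out of the path
    current = set()
    for border in path:
        if border & left_for_good:      # a promoted variable was recruited again
            return False
        left_for_good |= current - border
        current = set(border)
    return True

path = [{"N", "M", "Q", "V"}, {"N", "M", "Q"}, {"N", "P", "Q"}, {"N", "P"}]
print(has_running_intersection(path))                     # True
print(has_running_intersection([{"A"}, {"B"}, {"A"}]))    # False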

6.3 The Message Propagations in a Border Polytree

There are two types of non-root borders in a BP:

1. Bi = {Bi−1, Ci}\Vi, obtained by following Rules 1-8 in §2.2, having Bi−1 as a single parent. (If Bi = X and Bi−1 = Y , then Ci = X, Vi = Y .)

2. B_i = ∪_{j=0}^{r} B*_{i−1,j}, obtained by Rule 12 or 13, with H_{B_i} = {B_{i−1,j}, j = 0, ..., r}.
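For readers who prefer to see the bookkeeping spelled out, the following hypothetical Python record distinguishes the two kinds of non-root borders; none of these field names appear in the paper, and it is only one possible representation.

from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass
class Border:
    variables: Set[str]                                  # component variables
    parents: List["Border"] = field(default_factory=list)
    # type 1: a single parent, a promoted variable V_i and a cohort C_i
    promoted: Optional[str] = None
    cohort: Optional[Set[str]] = None
    # type 2 (Rule 12 or 13): the recruited parts B*_{i-1,j} of several parents
    recruited_parts: Optional[List[Set[str]]] = None

    @property
    def is_type_1(self) -> bool:
        return len(self.parents) == 1 and self.promoted is not None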

Returning to the proofs of Lemmas 10 and 13, we see that they are still valid for the BPs. However, we need to modify Lemmas 11 and 12 according to the type of border in a BP.

6.3.1 The Downward Propagations

We first modify Lemma 12:

Lemma 20 For border B_i = {B_{i−1}, C_i}\V_i,

Π(B_i) = Σ_{V_i} φ(C_i) Π_{B_i}(B_{i−1}).

If B_i is the only child of B_{i−1},

Π(B_i) = Σ_{V_i} φ(C_i) Π(B_{i−1}).

Proof. Consider the parentless T_{B_{i−1}→B_i} with border B_{i−1}. Promoting V_i ∈ B_{i−1} with cohort C_i yields the parentless P_{B_i} with border B_i. This is because, with {B_{i−1}\V_i} ⊆ T_{B_{i−1}→B_i},

T_{B_{i−1}→B_i} ∪ C_i = T_{B_{i−1}→B_i} ∪ {B_{i−1}\V_i} ∪ C_i = T_{B_{i−1}→B_i} ∪ B_i = P_{B_i}.

The lemma follows from Theorem 5. If B_i is the only child of B_{i−1}, by Lemma 10, Π_{B_i}(B_{i−1}) = Π(B_{i−1}).
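The arithmetic of Lemma 20 is a single multiply-and-marginalize step. The numpy sketch below applies it to the type-1 border {B,C,G,O,P} of BP C, obtained from {B,C,G,N,P} by promoting N with cohort {O} (the step that reappears in Equation (21)); the tables are random placeholders, all variables are assumed binary, and the axis labels are ours.

import numpy as np
rng = np.random.default_rng(0)

# reduced cohort table phi(O) = Pr{O|C,G,N} I_O, with O observed at its first value
p_O_given_CGN = rng.random((2, 2, 2, 2))               # axes C, G, N, O
p_O_given_CGN /= p_O_given_CGN.sum(axis=-1, keepdims=True)
I_O = np.array([1.0, 0.0])                             # evidence indicator column
phi = p_O_given_CGN * I_O                              # reduced cohort table

pi_BCGNP = rng.random((2, 2, 2, 2, 2))                 # Pi_{B_i}(B_{i-1}), axes B,C,G,N,P

# Lemma 20:  Pi(B_i) = sum over V_i = N of  phi(C_i) * Pi_{B_i}(B_{i-1})
pi_BCGOP = np.einsum('cgno,bcgnp->bcgop', phi, pi_BCGNP)
print(pi_BCGOP.shape)                                  # (2, 2, 2, 2, 2): axes B,C,G,O,P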

Lemma 21 For border B_i = ∪_{j=0}^{r} B*_{i−1,j},

Π(B_i) = ∏_{j=0}^{r} Σ_{B_{i−1,j}\B*_{i−1,j}} Π_{B_i}(B_{i−1,j}).

Proof. Consider the parentless T̄_{i−1} = ∪_{j=0}^{r} T_{B_{i−1,j}→B_i} having border B̄_{i−1} = ∪_{j=0}^{r} B_{i−1,j}. Because all T_{B_{i−1,j}→B_i} are parentless and independent, from Definitions (9) and (15),

Π(B̄_{i−1}) = Pr{B̄_{i−1}, T̄_{i−1}\B̄_{i−1}} I_{B̄_{i−1}}
= Pr{∪_{j=0}^{r} (B_{i−1,j}, T_{B_{i−1,j}→B_i}\B_{i−1,j})} I_{∪_{j=0}^{r} B_{i−1,j}}
= ∏_{j=0}^{r} Pr{B_{i−1,j}, T_{B_{i−1,j}→B_i}\B_{i−1,j}} I_{B_{i−1,j}} = ∏_{j=0}^{r} Π_{B_i}(B_{i−1,j}).

Because B_{i−1,j}\B*_{i−1,j} does not have children in the bottom set {B_i, D_{B_i}}, we use Rule 1 to promote ∪_{j=0}^{r} {B_{i−1,j}\B*_{i−1,j}} without cohort to obtain the same parentless T̄_{i−1} with border B_i = ∪_{j=0}^{r} B*_{i−1,j}. From Theorem 5,

Π(B_i) = Σ_{∪_{j=0}^{r} {B_{i−1,j}\B*_{i−1,j}}} Π(B̄_{i−1}) = Σ_{∪_{j=0}^{r} {B_{i−1,j}\B*_{i−1,j}}} ∏_{j=0}^{r} Π_{B_i}(B_{i−1,j}),

hence the lemma. Combining Lemma 10 with the above lemmas yields the following BP version of Theorem 14.

Theorem 22 If border B_i has received the messages from all members of H_{B_i} ∪ {L_{B_i}\B_{i+1,j}}, then it can send a downward message to its child B_{i+1,j} as

Π_{B_{i+1,j}}(B_i) = Π(B_i) ∏_{W∈L_{B_i}\B_{i+1,j}} Λ_W(B_i),

where Π(B_i) is calculated by Lemma 20 or 21.
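For a type-2 border, Lemma 21 and Theorem 22 amount to marginalizing each incoming parent message down to its recruited part, multiplying the independent factors, and then attaching the λ-messages from the other children. The sketch below uses a hypothetical border B_i = {X, Y, Z} with parent borders {X, W}, {Y} and {Z} (so B*_{i−1,0} = {X}); all tables are random placeholders.

import numpy as np
rng = np.random.default_rng(1)

pi_XW = rng.random((2, 2))        # Pi_{B_i}(B_{i-1,0}), axes X, W
pi_Y  = rng.random(2)             # Pi_{B_i}(B_{i-1,1})
pi_Z  = rng.random(2)             # Pi_{B_i}(B_{i-1,2})

# Lemma 21: sum each parent message over B_{i-1,j}\B*_{i-1,j}, then multiply
# the independent factors to obtain Pi(B_i).
pi_X = pi_XW.sum(axis=1)                                # sum out W
pi_Bi = np.einsum('x,y,z->xyz', pi_X, pi_Y, pi_Z)

# Theorem 22: the downward message to one child multiplies Pi(B_i) by the
# lambda-messages received from all the *other* children (one shown here).
lam_from_other_child = rng.random((2, 2, 2))            # Lambda_W(B_i)
pi_to_child = pi_Bi * lam_from_other_child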

6.3.2 The Upward Propagations

We now present the following BP versions of Lemma 11:

Lemma 23 For border B_i = {B_{i−1}, C_i}\V_i,

Λ_{B_i}(B_{i−1}) = Σ_{C_i} φ(C_i) Λ(B_i).

If B_i is the only child of B_{i−1},

Λ(B_{i−1}) = Σ_{C_i} φ(C_i) Λ(B_i).

Proof. Consider the parentless T_{B_{i−1}→B_i} with border B_{i−1}. As shown in Lemma 20, promoting V_i ∈ B_{i−1} with cohort C_i yields the parentless P_{B_i} with border B_i. The lemma follows from Theorem 7. If B_i is the only child of B_{i−1}, by Lemma 13, Λ_{B_i}(B_{i−1}) = Λ(B_{i−1}).

Lemma 24 If border B_i = ∪_{j=0}^{r} B*_{i−1,j}, then for all 0 ≤ k ≤ r,

Λ_{B_i}(B_{i−1,k}) = Σ_{H_{B_i}\B_{i−1,k}} Λ(B_i) ∏_{W∈H_{B_i}\B_{i−1,k}} Π_{B_i}(W).

Proof. Consider the parentless T_{B_{i−1,k}→B_i} (0 ≤ k ≤ r) having border B_{i−1,k}. Let I_{i−1,k} = H_{B_i}\B_{i−1,k}. We recruit ∪_{W∈I_{i−1,k}} T_{W→B_i} without promotion, resulting in the parentless T̄_{i−1} = ∪_{j=0}^{r} T_{B_{i−1,j}→B_i} having border B̄_{i−1} = ∪_{j=0}^{r} B_{i−1,j}. From Theorem 7,

Λ_{B_i}(B_{i−1,k}) = Σ_{∪_{W∈I_{i−1,k}} T_{W→B_i}} [∏_{W∈I_{i−1,k}} Pr{T_{W→B_i}} I_{T_{W→B_i}}] Λ(B̄_{i−1})
= Σ_{I_{i−1,k}} { Σ_{∪_{W∈I_{i−1,k}} {T_{W→B_i}\W}} [∏_{W∈I_{i−1,k}} Pr{T_{W→B_i}} I_{T_{W→B_i}}] Λ(B̄_{i−1}) }.

Because B̄_{i−1} ∩ {∪_{W∈I_{i−1,k}} (T_{W→B_i}\W)} = ∅,

Λ_{B_i}(B_{i−1,k}) = Σ_{I_{i−1,k}} Λ(B̄_{i−1}) { Σ_{∪_{W∈I_{i−1,k}} {T_{W→B_i}\W}} ∏_{W∈I_{i−1,k}} Pr{T_{W→B_i}} I_{T_{W→B_i}} }
= Σ_{I_{i−1,k}} Λ(B̄_{i−1}) ∏_{W∈I_{i−1,k}} Σ_{T_{W→B_i}\W} Pr{T_{W→B_i}} I_{T_{W→B_i}}.

By Definition (15),

Λ_{B_i}(B_{i−1,k}) = Σ_{I_{i−1,k}} Λ(B̄_{i−1}) ∏_{W∈I_{i−1,k}} Π_{B_i}(W).

Now use Rule 1 to promote ∪_{j=0}^{r} {B_{i−1,j}\B*_{i−1,j}} without cohort, leaving border B_i = ∪_{j=0}^{r} B*_{i−1,j}. From Theorem 7, Λ(B̄_{i−1}) = Λ(B_i), hence the lemma.

Lemma 24 can be carried out more efficiently as

Λ_{B_i}(B_{i−1,k}) = Σ_{B_{i−1,r}} Π^{B_{i−1,r}}_{B_i} ... Σ_{B_{i−1,k+1}} Π^{B_{i−1,k+1}}_{B_i} Σ_{B_{i−1,k−1}} Π^{B_{i−1,k−1}}_{B_i} ... Σ_{B_{i−1,0}} Π^{B_{i−1,0}}_{B_i} Λ(B_i),

where Π^{B_{i−1,j}}_{B_i} = Π_{B_i}(B_{i−1,j}) for all 0 ≤ j ≤ r.

Combining Lemma 13 with the above lemmas yields the BP version of Theorem 15:

Theorem 25 If border B_i has received the messages from all members of {H_{B_i}\B_{i−1,k}} ∪ L_{B_i}, then it can send an upward message Λ_{B_i}(B_{i−1,k}) to its parent B_{i−1,k} as in Lemma 23 or 24, in which

Λ(B_i) = ∏_{W∈L_{B_i}} Λ_W(B_i).

Both the downward and upward passes use the reduced cohort probability tables φ(·), rather than the individual CPTs. So all Φ(·) should be calculated and pre-loaded.
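The upward pass mirrors the downward one. The sketch below first applies Lemma 23 to a concrete step of BP C (step 2 of the second collection-phase example in §6.3.6), and then Lemma 24 and Theorem 25 to the same hypothetical border B_i = {X, Y, Z} used above; as before, the tables are random placeholders rather than the parametrization of BN C.

import numpy as np
rng = np.random.default_rng(2)

# Lemma 23:  Lambda(B,C,G,N,P) = sum_O Pr{O|C,G,N} I_O Lambda(B,C,G,O,P)
p_O_given_CGN = rng.random((2, 2, 2, 2))                  # axes C, G, N, O
p_O_given_CGN /= p_O_given_CGN.sum(axis=-1, keepdims=True)
I_O = np.array([1.0, 0.0])
lam_BCGOP = rng.random((2, 2, 2, 2, 2))                   # axes B, C, G, O, P
lam_BCGNP = np.einsum('cgno,o,bcgop->bcgnp', p_O_given_CGN, I_O, lam_BCGOP)

# Theorem 25 and Lemma 24 for B_i = {X, Y, Z} with parents {X, W}, {Y}, {Z}
lam_Bi = rng.random((2, 2, 2)) * rng.random((2, 2, 2))    # product over children
pi_Y, pi_Z = rng.random(2), rng.random(2)                 # messages from the other parents
lam_to_parent_0 = np.einsum('xyz,y,z->x', lam_Bi, pi_Y, pi_Z)
# The message is a function of the parent border {X, W}; it simply does not
# depend on W, so it is constant along the W axis when used there.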

6.3.3 The Border Evidential Cores

A border is evidential if one of its component variables is. Let the “border evidential core” (or “border EC”) be the smallest sub-polytree which contains all the evidence variables in its borders. For the BN C, let us assume E = {B,O,Q}. Then the largest sub-polytree in Figure 8 containing all the evidence variables includes path {N, M, Q, V} → {N,M,Q} → {N,P,Q} → ... → {D,F,O,P} and path {B,C} → {B, C, G, N, P}. However, its border EC is smaller, including only path {N,P,Q} → {N,P} → {B, C, G, N, P} → {B, C, G, O, P}. For some evidence sets E, the border EC is not unique.

A border is “inside” if it is in the border EC; otherwise it is “outside.” Edge B_i − B_j is “inside” if both borders are inside; otherwise it is “outside.” A path is “inside” if all its component edges are.

6.3.4 The Boundary Conditions

As in the polytrees, the calculations of the messages along an inside edge of a border EC may require the knowledge of the boundary conditions; that is, the messages to an inside border from its outside neighboring borders. However, unlike the polytree, our definition of the border ECs allows the outside borders to be evidential. (For example, borders {N, M, Q, V} or {B,F,O,P} in Figure 8 with E = {B,O,Q}.) So we need the following BP version of Theorem 16:

Theorem 26 Consider an inside border Bi.

(a) If Bi−1 ∈ HBi is outside, then ΠBi (Bi−1) = Pr {Bi−1} IBi−1 .

(b) If Bi+1 ∈ LBi is outside, then ΛBi+1 (Bi) = IBi .

Proof. (a) In a border polytree, assume B_i is inside, B_{i−1} ∈ H_{B_i} is outside, and the outside parentless T_{B_{i−1}→B_i} has an evidence variable E ∈ B_{e1} such that E ∉ B_i. By its definition, the border EC must include another evidence border B_{e2} ∋ E. By the running intersection property, the path connecting B_{e1} and B_{e2} cannot go through B_i. Thus there are two paths connecting B_{e1} and B_{e2}, including the one via B_i. Because this is contradictory to the polytree assumption, we must have E ∈ B_i. By the running intersection property, E ∈ B_{i−1}. In other words, I_{T_{B_{i−1}→B_i}} = I_{B_{i−1}}. By Definition (15),

Π_{B_i}(B_{i−1}) = Σ_{T_{B_{i−1}→B_i}\B_{i−1}} Pr{T_{B_{i−1}→B_i}} I_{T_{B_{i−1}→B_i}} = Pr{B_{i−1}} I_{B_{i−1}}.

(b) In a border polytree, assume B_i is inside, B_{i+1} ∈ L_{B_i} is outside, and the bottom part U_{B_i→B_{i+1}} has an evidence variable E ∈ B_{e3} such that E ∉ B_i. The border EC must include another evidence border B_{e4} ∋ E. By the running intersection property, the path connecting B_{e3} and B_{e4} cannot go through B_i. Thus there are two paths connecting B_{e3} and B_{e4}, including the one via B_i. As this is a contradiction to the polytree assumption, all evidence variables in U_{B_i→B_{i+1}} must be in B_i. In other words, I_{U_{B_i→B_{i+1}}} = I_{B_i}. By Definition (18),

Λ_{B_{i+1}}(B_i) = Σ_{U_{B_i→B_{i+1}}} Pr{U_{B_i→B_{i+1}}|B_i} I_{B_i ∪ U_{B_i→B_{i+1}}} = Σ_{U_{B_i→B_{i+1}}} Pr{U_{B_i→B_{i+1}}|B_i} I_{B_i} = I_{B_i}.
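In practice, Theorem 26 means that the messages crossing the boundary of the border EC never have to be computed: they are read off the pre-loaded prior marginals and the evidence indicators. A minimal sketch, with placeholder tables and binary variables:

import numpy as np

# (a) Outside parent border B_{i-1} = {X, W}: the pi-message into the EC is the
#     pre-loaded prior marginal times the evidence indicators of {X, W}.
prior_XW = np.full((2, 2), 0.25)          # placeholder pre-loaded Pr{X, W}
I_X = np.array([1.0, 0.0])                # X observed at its first value
I_W = np.array([1.0, 1.0])                # W unobserved
pi_from_outside_parent = prior_XW * I_X[:, None] * I_W[None, :]

# (b) Outside child of an inside border B_i = {Y, Z}: the lambda-message is
#     simply the evidence indicator of B_i itself.
I_Y = np.array([1.0, 1.0])                # Y unobserved
I_Z = np.array([0.0, 1.0])                # Z observed at its second value
lam_from_outside_child = I_Y[:, None] * I_Z[None, :]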

6.3.5 The Message Initializations

Following is the BP version of Theorem 17:

Theorem 27 Consider a border EC.

(a) If BR is a root with inside child BC , then Π(BR) = ΠBC (BR) = Pr {BR} IBR .

(b) If BL is a leaf, then Λ(BL) = IBL in Lemma 23 or 24.

Proof. (a) Let B_R be a root of the border EC. As shown in the proof of Theorem 26(a), for all W ∈ H_{B_R}, all evidence variables in T_{W→B_R} are in the inside B_R; so are all evidence variables in P_{B_R} = B_R ∪ {∪_{W∈H_{B_R}} T_{W→B_R}}. Hence I_{P_{B_R}} = I_{B_R}. From Definition (13),

Π(B_R) = Σ_{A_{B_R}} Pr{P_{B_R}} I_{P_{B_R}} = Σ_{A_{B_R}} Pr{A_{B_R}, B_R} I_{B_R} = Pr{B_R} I_{B_R}.

Also, because B_R has only one inside child B_C, Λ_W(B_R) = I_{B_R} for all W ∈ L_{B_R}\B_C. Thus Π(B_R) = Π_{B_C}(B_R) from Lemma 10.

(b) Let B_L be a leaf of the border EC. As shown in the proof of Theorem 26(b), for all W ∈ L_{B_L}, all evidence variables in U_{B_L→W} must be in B_L; so are all evidence variables in D_{B_L} = ∪_{W∈L_{B_L}} U_{B_L→W}. Hence I_{D_{B_L}} = I_{B_L}. From Definition (14),

Λ(B_L) = Σ_{D_{B_L}} Pr{D_{B_L}|B_L} I_{B_L ∪ D_{B_L}} = Σ_{D_{B_L}} Pr{D_{B_L}|B_L} I_{B_L} = I_{B_L}.

For the border chain obtained in Section 2, recall that we denote the first time and the last time an evidence is recruited into P by α and β, respectively. We now see that the part of the border chain from α to β is its border EC, and Theorem 27 is consistent with Equations (10) and (12).
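A small sketch of the evidence indicator columns and of the initializations in Theorem 27, using the root {N,P,Q} and the leaf {B,C,G,O,P} of the border EC of BP C with E = {B,O,Q}; the helper function, the placeholder prior Pr{N,P,Q} and the particular observed values are ours.

import numpy as np

def indicator(n_values, observed=None):
    """I_V: all ones if V is unobserved; otherwise ones at the observed
    value(s).  Several observed values correspond to soft evidence."""
    if observed is None:
        return np.ones(n_values)
    col = np.zeros(n_values)
    col[list(np.atleast_1d(observed))] = 1.0
    return col

I_B, I_O, I_Q = indicator(2, 1), indicator(2, 0), indicator(2, 0)

# Theorem 27(a): at the root,  Pi(B_R) = Pi_{B_C}(B_R) = Pr{B_R} I_{B_R}
prior_NPQ = np.full((2, 2, 2), 1.0 / 8)       # placeholder pre-loaded Pr{N,P,Q}
pi_NPQ = prior_NPQ * I_Q                      # I_Q broadcasts along the Q axis

# Theorem 27(b): at the leaf,  Lambda(B_L) = I_{B_L} = I_B I_O
lam_BCGOP = (I_B[:, None, None, None, None]
             * I_O[None, None, None, :, None]
             * np.ones((2, 2, 2, 2, 2)))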

6.3.6 The Collection Phase

The inferences in a border polytree are carried out in the same manner as in a polytree: In the collection phase, we start with Theorem 27 and propagate the messages inside the border EC to an arbitrarily chosen inside pivot border. For the border EC in Figure 8 with E = {B,O,Q}, suppose we pick its leaf {B, C, G, O, P} as the pivot, then the collection phase includes:

1. By Theorem 27(a),

Π(N,P,Q) = Pr {N,P,Q} IQ.

2. By Lemma 20(b) and Theorem 22,

Π_{{B,C,G,N,P}}(N,P) = Π(N,P) = Σ_Q Pr{N,P,Q} I_Q.

3. By Lemma 21, with the boundary conditions Π_{{B,C,G,N,P}}(B,C) = Pr{B,C} I_B and Π_{{B,C,G,N,P}}(G) = Pr{G},

Π(B, C, G, N, P) = Π_{{B,C,G,N,P}}(B,C) Π_{{B,C,G,N,P}}(G) Π_{{B,C,G,N,P}}(N,P)
= Pr{B,C} I_B Pr{G} Σ_Q Pr{N,P,Q} I_Q.

4. By Lemma 20(b), at the pivot border,

Π(B, C, G, O, P) = Σ_N Pr{O|C, G, N} I_O Π(B, C, G, N, P)
= Σ_N Pr{O|C, G, N} I_O Pr{B,C} I_B Pr{G} Σ_Q Pr{N,P,Q} I_Q.    (21)

With the initial condition Λ(B, C, G, O, P) = I_B I_O, the pivot border is now informed. By Corollary 9, Pr{P, [B,O,Q]} can be calculated as

Σ_{B,C,G,O} Π(B, C, G, O, P) Λ(B, C, G, O, P)
= Σ_{B,C,G,O} Σ_N Pr{O|C, G, N} I_O Pr{B,C} I_B Pr{G} Σ_Q Pr{N,P,Q} I_Q.    (22)

It is easy to verify that the RHS is indeed Pr {P, [B,O,Q]}. On the other hand, suppose we pick the root {N,P,Q} as the pivot border; then the collection phase includes:

1. By Theorem 27(b),

Λ(B, C, G, O, P ) = IBIO.

2. By Lemma 23(b),

Λ(B, C, G, N, P) = Σ_O Pr{O|C, G, N} I_O Λ(B, C, G, O, P) = Σ_O Pr{O|C, G, N} I_O I_B.

3. By Theorem 25 and Lemma 24, with the boundary conditions Π{B,C,G,N,P } (B,C) = Pr {B,C} IB and Π{B,C,G,N,P } (G) = Pr {G},

Λ(N,P) = Λ_{{B,C,G,N,P}}(N,P) = Σ_{B,C,G} Π_{{B,C,G,N,P}}(B,C) Π_{{B,C,G,N,P}}(G) Λ(B, C, G, N, P)
= Σ_{B,C,G} Pr{B,C} I_B Pr{G} Σ_O Pr{O|C, G, N} I_O.

4. By Theorem 25 and Lemma 23(b), at the pivot border,

Λ(N,P,Q) = Λ (N,P ) . (23)

With the initial condition Π(N,P,Q) = Pr{N,P,Q} I_Q, the pivot border is now informed. By Corollary 9, Pr{P, [B,O,Q]} can be found by

Σ_{N,Q} Π(N,P,Q) Λ(N,P,Q) = Σ_{N,Q} Pr{N,P,Q} I_Q Σ_{B,C,G} Pr{B,C} I_B Pr{G} Σ_O Pr{O|C, G, N} I_O,

which is the same as Equation (22).
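The two collection-phase orderings above can be checked numerically. The sketch below evaluates the right-hand side of Equation (22) directly and then via the four steps toward the pivot {B,C,G,O,P}; the tables Pr{N,P,Q}, Pr{B,C}, Pr{G} and Pr{O|C,G,N} are random placeholders standing in for the pre-loaded prior marginals and the CPT of BN C, and the observed values of B, O and Q are chosen arbitrarily.

import numpy as np
rng = np.random.default_rng(3)

p_NPQ = rng.random((2, 2, 2)); p_NPQ /= p_NPQ.sum()
p_BC  = rng.random((2, 2));    p_BC  /= p_BC.sum()
p_G   = rng.random(2);         p_G   /= p_G.sum()
p_O_given_CGN = rng.random((2, 2, 2, 2))
p_O_given_CGN /= p_O_given_CGN.sum(axis=-1, keepdims=True)
I_B, I_O, I_Q = np.array([0., 1.]), np.array([1., 0.]), np.array([1., 0.])

# Equation (22), evaluated in one shot
direct = np.einsum('cgno,o,bc,b,g,npq,q->p',
                   p_O_given_CGN, I_O, p_BC, I_B, p_G, p_NPQ, I_Q)

# the same quantity via the four collection-phase steps
pi_NPQ   = p_NPQ * I_Q                                                  # step 1
pi_NP    = pi_NPQ.sum(axis=2)                                           # step 2
pi_BCGNP = np.einsum('bc,b,g,np->bcgnp', p_BC, I_B, p_G, pi_NP)         # step 3
pi_BCGOP = np.einsum('cgno,o,bcgnp->bcgop',
                     p_O_given_CGN, I_O, pi_BCGNP)                      # step 4
via_messages = np.einsum('bcgop,b,o->p', pi_BCGOP, I_B, I_O)            # Corollary 9
assert np.allclose(direct, via_messages)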

6.3.7 The Distribution Phase

In the collection phase, the messages converge to the pivot border; so it helps to know all the evidence variables and construct the border EC before the message propagations. Introducing a new evidence may require re-calculating some messages already sent to the pivot border. On the other hand, as discussed in §5.6, the distribution phase starts when the pivot border becomes the single member of the informed set J. Introducing a new query border Q, which includes a query variable, only requires calculating the messages from the corresponding gate in J to Q, making Q informed and thus allowing Pr{Q, [E\Q]} I_Q to be calculated as Π(Q) Λ(Q) by Theorem 8. There is no message re-calculation when the query variables are considered one at a time.

For the BN C with E = {B,O,Q}, assume we wish to obtain the posterior marginal of the outside variable M. With Λ(N,P,Q) obtained in Equation (23), we can use Lemma 23 to send an upward message to border {N,M,Q} as

Λ(N,M,Q) = Σ_P Pr{P|N,M,Q} I_Q Λ(N,P,Q).

Finally, with Π(N,M,Q) = Pr{N,M,Q} I_Q,

Pr{M, [B,O,Q]} = Σ_{N,Q} Π(N,M,Q) Λ(N,M,Q) = Σ_{N,Q} Pr{N,M,Q} I_Q Λ(N,M,Q).

On the other hand, if we wish to obtain the posterior marginal of the outside variable F, then we can start with Π(B, C, G, O, P) calculated in Equation (21):

1. Using Lemma 20,

Π(B,C,O,P) = Σ_G Π(B, C, G, O, P).

2. Using Lemma 20, Π(B,F,O,P) = Σ_C Pr{F|B,C} I_B Π(B,C,O,P).

Finally, with Λ(B,F,O,P) = I_B I_O,

Pr{F, [B,O,Q]} = Σ_{B,O,P} Π(B,F,O,P) I_B I_O.
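The same factor arithmetic carries out the distribution-phase query for F. The sketch below starts from a placeholder Π(B,C,G,O,P) and a random stand-in for Pr{F|B,C}; only the two Lemma-20 steps and the final summation of the text are performed.

import numpy as np
rng = np.random.default_rng(4)

pi_BCGOP = rng.random((2, 2, 2, 2, 2))                   # from the collection phase
p_F_given_BC = rng.random((2, 2, 2))                     # axes B, C, F
p_F_given_BC /= p_F_given_BC.sum(axis=-1, keepdims=True)
I_B, I_O = np.array([0., 1.]), np.array([1., 0.])

pi_BCOP = pi_BCGOP.sum(axis=2)                                        # step 1 (Lemma 20)
pi_BFOP = np.einsum('bcf,b,bcop->bfop', p_F_given_BC, I_B, pi_BCOP)   # step 2 (Lemma 20)
# Finally, with Lambda(B,F,O,P) = I_B I_O:
p_F_and_evidence = np.einsum('bfop,b,o->f', pi_BFOP, I_B, I_O)        # Pr{F,[B,O,Q]}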

If all non-evidence variables are the query variables, the messages may be sent “asynchronously,” without a goal; that is, once a border becomes informed, it may send the messages to all of its neighbors that have not received a message from it. (See Díez & Mira, 1994.) Especially in this case,

1. If the number of children of B_i is large, then it may be advantageous to calculate Λ(B_i) = ∏_{W∈L_{B_i}} Λ_W(B_i) first; then, for all B_{i+1,j} ∈ L_{B_i} in Theorem 22, we calculate Λ(B_i)/Λ_{B_{i+1,j}}(B_i) instead of ∏_{W∈L_{B_i}\B_{i+1,j}} Λ_W(B_i);

2. If the number of parents of B_i is large, then it may be advantageous to calculate Π(H_{B_i}) = ∏_{W∈H_{B_i}} Π_{B_i}(W) first; then, for all B_{i−1,k} ∈ H_{B_i} in Lemma 24, we calculate Π(H_{B_i})/Π_{B_i}(B_{i−1,k}) instead of ∏_{W∈H_{B_i}\B_{i−1,k}} Π_{B_i}(W).

7 Discussions

1. Our algorithm can handle what are known as the “soft evidences.” An evidence variable E is soft if Vae(E), its set of observed values, may have more than one member. In other words, the evidence indicator column I_E may have more than one non-zero value. This is less restrictive than the “hard evidence” assumption normally found in other inference algorithms, which requires Vae(E) to have only one value. (See Langevin & Valtorta, 2008.)

2. All junction-tree based inferences share a worst-case complexity, which is exponential with respect to the largest clique size of the underlying undirected graph. According to Wu and Butz (2005), “Lauritzen and Spiegelhalter... were concerned with the size of the clique in the junction tree (transformed from the DAG of a BN), and they realized that their method would not be computational feasible if a large clique is present in the junction tree. The Hugin architecture has the same concern as the Lauritzen-Spiegelhalter architecture, namely, the size of the clique in a junction tree. The Shafer-Shenoy architecture... used hypertree and Markov tree (junction tree) to describe the architecture. In [9], it was repeatedly emphasized that the efficiency and feasibility of their architecture depends on the size of the clique in a junction tree.” ([9] referred to Shafer, 1996.) Wu and Butz (2005) then showed that “the presence of a node with a large number of parents can occur in both singly connected and multiply connected BNs. Therefore, in both singly and multiply connected BNs, the computation for exact inference will be exponential in the worst case.”

In our algorithm, let us similarly assume that the largest border size in a border polytree is not too large. This imposes some limitations on the sizes of all cohort probability tables Φ(·), which depend on the numbers of parents and children, and on the number of possible values, of each variable V ∈ V.

3. The collection phase collects information about the evidences in the entire border polytree. It is intuitive that this process should be reduced to within the sub-polytree border EC: We only need to pass messages within it, toward its inside pivot border, starting from its evidential roots and leaves as in Theorem 27. In other words, in the collection phase, the BP is “pruned” to its border EC. As illustrated above with the BN C having E = {B,O,Q}, the collection phase to the pivot border {N,P,Q} only requires four message propagations within its border EC.

This message passing reduction is possible in our algorithm because we know the boundary conditions: Having a directed border polytree, we can use Theorem 26 because we can determine whether a border’s outside neighbor is its parent or child. Also, it is important that all prior marginals Pr{B_i} are pre-calculated (only once) and pre-loaded for this theorem. For example, to use Equation (21), besides the CPTs Pr{O|C, G, N} and Pr{G}, we need the prior marginals Pr{N,P,Q} and Pr{B,C}.

In many applications, the difference between the prior and the posterior marginals of a variable is more telling than the posterior itself, so pre-calculating the prior marginals should be done anyhow. Off-line, we calculate Pr{B_i} as Π(B_i) without evidence, in a topological order of the macro-nodes and of the borders inside each macro-node, using Lemma 20 (with the complete cohort probability tables Φ(·)) and Lemma 21 (with Π_{B_i}(B_{i−1,j}) = Π(B_{i−1,j}) by Theorem 22).

In the collection phase, each border in an EC sends only one message toward the pivot border. Thus the time complexity of the collection phase is linear with respect to the number of borders within the sub-polytree border EC. With one additional evidence, this number increases by the number of borders between it and the pivot border. If N is the sole evidence in the BN C, then the collection phase is not needed, as the pivot border {N,P,Q} is automatically informed with Λ(N,P,Q) = I_N and Π(N,P,Q) = Pr{N,P,Q} I_N, where Pr{N,P,Q} is pre-loaded. Thus the time complexity in the collection phase is linear with respect to the number of evidence variables.

4. Similarly, in the distribution phase, we pass messages from the growing informed set J to a query border Q. As discussed above, the propagation starts from the informed gate connecting J with Q; hence there is no need to re-visit any node inside J. In other words, in the distribution phase, the BP is “pruned” to its “query core,” which is the smallest sub-polytree that contains all the query borders. The computational complexity in the distribution phase is linear with respect to the number of borders inside the query core, which in turn is linear with the number of query variables. Again, the pre-loaded prior marginals Pr{B_i} and the directed border polytrees are essential for the boundary conditions.

Consider the case where all non-evidence variables are the query variables. As far as we know, all junction-tree based inference architectures (including the LAZY propagation algorithm, Madsen & Jensen, 1999) require two passes through the entire network, one in each phase. Our algorithm requires one pass through the smaller border EC in the collection phase, and one pass through the entire network in the distribution phase. On the other hand, consider the case in which the single evidence variable and the single query variable are in the same border; our algorithm requires no message propagation in either phase.

5. In summary, the novel features in this paper are:

i. The parentless polytree method (§6.1) to partition a BN into a macro-node polytree, by opening the loops in an otherwise growing parentless polytree.

ii. The border algorithm to construct a directed chain from a BN (§2), or from a macro-node (§6.2).

Combining the above two algorithms, we can convert any Bayesian network into a border polytree. The border algorithm then provides the means to propagate the downward and upward messages in a border polytree, allowing us to calculate its posterior marginal probabilities (§6.3).

Also, the message propagation in the distribution phase is carried out one query border at a time, within the query core sub-polytree only (§5.5). With the above novel features, the time complexity of our inferences in a Bayesian network is linear with respect to the number of its evidence and query variables, regardless of the number of borders in its corresponding border polytree, or the number of its variables.

References

1. Chang, K. and Fung, R. (1989) “Node Aggregation for Distributed Inference in Bayesian Networks” In Proceedings of the 11th International Joint Conference on Artificial Intelligence, Detroit, Michigan, 265–270.

2. Cooper, G. F. (1990) “The computational complexity of probabilistic inference using Bayesian belief networks” Artificial Intelligence, 42, 393–405.

3. Darwiche, A. (2003) “A differential approach to inference in Bayesian networks” Journal of the ACM, 50, 280–305.

4. Dechter, R. (1999) “Bucket elimination: A unifying framework for reasoning” Artificial Intelligence, 113, 41–85.

5. Díez, F. (1996) “Local conditioning in Bayesian networks” Artificial Intelligence, 87.

6. Díez, F. and Mira, J. (1994) “Distributed inference in Bayesian networks” Cybernetics and Systems, 25, 39–61.

7. Guo, H. and Hsu, W. (2002) “A survey of algorithms for real-time Bayesian network inference” In the joint AAAI-02/KDD-02/UAI-02 workshop on Real-Time Decision Support and Diagnosis Systems.

8. Kim, J. H. and Pearl, J. (1983) “A computational model for combined causal and diagnostic reasoning in inference engines” Proceedings of the 8th International Joint Conference on Artificial Intelligence, Karlsruhe, West Germany, 190–193.

9. Koller, D. and Friedman, N. (2009) Probabilistic Graphical Models: Principles and Techniques, Massachusetts: MIT Press.

10. Langevin, S. and Valtorta, M. (2008) “Performance evaluation of algorithms for soft evidential update in Bayesian networks: First results” Scalable Uncertainty Management, 284–297.

11. Lauritzen, S. L. and Spiegelhalter, D. J. (1988) “Local computations with probabilities on graphical structures and their applications to expert systems” Journal of the Royal Statistical Society, Series B, 50, 157–224.

12. Lepar, V. and Shenoy, P. (1999) “A comparison of Lauritzen-Spiegelhalter, Hugin, and Shenoy-Shafer architectures for computing marginals of probability distributions” in G. F. Cooper & S. Moral (eds.), Uncertainty in Artificial Intelligence, 14, 328–337, Morgan Kaufmann, San Francisco, CA.

13. Madsen, A. L. and Jensen, F. V. (1999) “LAZY propagation: A junction tree inference algorithm based on lazy evaluation” Artificial Intelligence, 113, 203–245.

14. Ng, K. and Levitt, T. S. (1994) “Incremental Dynamic Construction of Layered Polytree Networks” Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, 440–446.

15. Pearl, J. (1986a) “A constraint-propagation approach to probabilistic reasoning” In L. N. Kanal and J. F. Lemmer (Eds.), Proceedings of the 2nd Conference on Uncertainty in Artificial Intelligence, Amsterdam, North-Holland, 357–369.

16. Pearl, J. (1986b) “Fusion, propagation and structuring in belief networks” Artificial Intelligence, 29, 241–288.

17. Pearl, J. (1987) “Evidential reasoning using stochastic simulation of causal models” Artificial Intelligence, 32, 245–257.

18. Pearl, J. (1988) Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann.

19. Peot, M. A. and Shachter, R. D. (1991) “Fusion and propagation with multiple observations in belief networks” Artificial Intelligence, 48, 299–318.

20. Shachter, R. D. (1990) “Evidence absorption and propagation through evidence reversals” In M. Henrion, R. D. Shachter, J. F. Lemmer, and L. N. Kanal (Eds.), Uncertainty in Artificial Intelligence, 5, 173–190.

21. Shachter, R. D., D’Ambrosio, B. and Del Favero, B. D. (1990) “Symbolic probabilistic inference in belief networks” Proceedings of the 8th National Conference on Artificial Intelligence, MIT Press, Boston, 126–131.

22. Shafer, G. (1996) Probabilistic Expert Systems, Society for Industrial and Applied Mathematics.

23. Wu, D. and Butz, C. (2005) “On the complexity of probabilistic inference in singly connected Bayesian networks” Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing, 581–590.

24. Zhang, N. L. and Poole, D. (1996) “Exploiting causal independence in Bayesian network inference” Journal of Artificial Intelligence Research, 5, 301–328.
