
Loopy Belief Propagation in Bayesian Networks: origin and possibilistic perspectives

Amen Ajroud (1), Mohamed Nazih Omri (2), Habib Youssef (3), Salem Benferhat (4)
(1) ISET Sousse, TUNISIA − amen.ajroud@isetso.rnu.tn
(2) IPEIM Monastir - University of Monastir, TUNISIA − nazih.omri@ipeim.rnu.tn
(3) ISITC Hammam-Sousse - University of Monastir, TUNISIA − habib.youssef@fsm.rnu.tn
(4) CRIL, Lens - University of Artois, FRANCE − benferhat@cril.univ-artois.fr

Abstract: In this paper we present a synthesis of the work performed on two inference algorithms: Pearl's belief propagation (BP), applied to Bayesian networks without loops (i.e. polytrees), and the loopy belief propagation (LBP) algorithm (inspired by BP), which is applied to networks containing undirected cycles. It is known that the BP algorithm, applied to Bayesian networks with loops, gives incorrect numerical results, i.e. incorrect posterior probabilities. Murphy et al. [7] found that the LBP algorithm converges on several networks and that, when this occurs, LBP gives a good approximation of the exact posterior probabilities. However, this algorithm presents an oscillatory behaviour when it is applied to the QMR (Quick Medical Reference) network [15]. This phenomenon prevents the LBP algorithm from converging towards a good approximation of the posterior probabilities. We believe that translating the inference computation problem from the probabilistic framework to the possibilistic framework will allow a performance improvement of the LBP algorithm. We hope that an adaptation of this algorithm to a possibilistic causal network will show an improvement of the convergence of LBP.

1. Review of Bayesian Networks

Bayesian networks are powerful tools for modelling causes and effects in a wide variety of domains. They use graphs capturing the notion of causality between variables, and probability theory to express the strength of that causality. Bayesian networks are very effective for modelling situations where some information is already known and incoming data is uncertain or partially unavailable. These networks also offer consistent semantics for representing causes and effects via an intuitive graphical representation. Because of all these capabilities, Bayesian networks are regarded as systems for uncertain knowledge representation; they have a large number of applications with efficient algorithms and strong theoretical foundations [9], [10], [11] and [12].

Theoretically, a Bayesian network is a directed acyclic graph (DAG) made up of nodes and causal edges. Each node has a probability of taking a certain value. Nodes are often binary, though a Bayesian network may have n-ary nodes. Parent and child nodes are defined as follows: a directed edge goes from a parent to a child, and each child node has a conditional probability table (CPT) conditioned on the values of its parents. There are no directed cycles in the graph, though there may be "loops", or undirected cycles. An example network is shown in Figure 1, with parents Ui sharing a child X. The node X is a child of the Ui's as well as being a parent of the Yi's.
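To make the DAG-plus-CPT structure concrete, here is a minimal sketch, not taken from the paper, of how a small polytree of the kind shown in Figure 1 could be encoded; the node names and probability values are invented for illustration.

```python
# Hypothetical encoding of a small polytree (same shape as Figure 1, reduced):
# two parents U1, U2, one child X, one grandchild Y1. All variables are binary
# and all numbers are invented for illustration.
bayes_net = {
    "U1": {"parents": [], "cpt": {(): 0.3}},             # P(U1 = 1)
    "U2": {"parents": [], "cpt": {(): 0.6}},             # P(U2 = 1)
    "X":  {"parents": ["U1", "U2"],                      # P(X = 1 | U1, U2)
           "cpt": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.7, (1, 1): 0.95}},
    "Y1": {"parents": ["X"],                             # P(Y1 = 1 | X)
           "cpt": {(0,): 0.2, (1,): 0.8}},
}
```

Each CPT row stores P(node = 1 | parents), so P(node = 0 | parents) is its complement; the edges contain no directed cycle, matching the DAG requirement above.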

[Figure 1: A Bayesian network (polytree). Parents U1, ..., Um share the child X, which is itself a parent of the children Y1, ..., Yn.]

2. Bayesian Network inference

In a Bayesian network, the objective of inference is to compute P(X|E), the posterior probabilities of some "query" nodes (denoted X) given some observed values of evidence nodes (denoted E, E⊄X) [13]. A simple form of this problem results when X is a single node, i.e., computing the posterior marginal probabilities of a single query node.

When a large Bayesian network is built, the feasibility of probabilistic inference becomes an issue: when the network is simple, the calculation of these probabilities is not very difficult. On the other hand, when the network becomes very large, several problems emerge: the inference requires an enormous amount of memory and the calculation becomes very complex or even, in certain cases, cannot be completed. Inference algorithms are classified in two groups [5] (a small worked example is sketched after this list):
- Exact algorithms: these methods exploit the conditional independences encoded in the network and give, for each inference, the exact posterior probabilities. Exact probabilistic inference in general has been proven to be NP-hard by Cooper [4].
- Approximate algorithms: they are the alternative to exact algorithms when the networks become very complex. They estimate the posterior probabilities in various ways. Approximating probabilistic inference was also shown to be NP-hard by Dagum and Luby [8].
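As a small worked illustration of exact inference (again not from the paper; it reuses the invented numbers of the sketch above), the posterior P(X | Y1 = 1) on the toy polytree can be obtained by brute-force enumeration: sum the factorized joint distribution over the unobserved variables and normalize.

```python
import itertools

# Invented CPTs for the toy polytree U1, U2 -> X -> Y1 (all variables binary).
p_u1 = {0: 0.7, 1: 0.3}                                         # P(U1)
p_u2 = {0: 0.4, 1: 0.6}                                         # P(U2)
p_x1 = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.7, (1, 1): 0.95}    # P(X = 1 | U1, U2)
p_y11 = {0: 0.2, 1: 0.8}                                        # P(Y1 = 1 | X)

def joint(u1, u2, x, y1):
    """P(U1=u1, U2=u2, X=x, Y1=y1), factorized along the DAG."""
    px = p_x1[(u1, u2)] if x == 1 else 1.0 - p_x1[(u1, u2)]
    py = p_y11[x] if y1 == 1 else 1.0 - p_y11[x]
    return p_u1[u1] * p_u2[u2] * px * py

# Exact posterior P(X | Y1 = 1) by enumerating the hidden variables U1, U2.
unnorm = {x: sum(joint(u1, u2, x, 1)
                 for u1, u2 in itertools.product((0, 1), repeat=2))
          for x in (0, 1)}
z = sum(unnorm.values())
posterior = {x: unnorm[x] / z for x in (0, 1)}
print(posterior)
```

This enumeration is exact but its cost grows exponentially with the number of unobserved nodes, which is why dedicated exact algorithms and approximate algorithms both matter in practice.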

3. Pearl‘s algorithm

At the beginning of the 1980s, Pearl published an efficient message-propagation inference algorithm for polytrees [6] and [10]. This algorithm is an exact belief propagation procedure but works only for polytrees [12]. Consider the case of a general discrete node X having parents U1, ..., Um and children Y1, ..., Yn, as shown in Figure 1. Evidence is represented by E, with evidence "above" X (evidence at ancestors of X) denoted e+ and evidence "below" X (evidence at descendants of X) denoted e-.

Knowledge of evidence can flow either down the network (from parent to child) or up the network (from child to parent). We use the notation π(Ui) to represent a message from parent Ui to its child and λ(Yj) to represent a message from child Yj to its parent (see Figure 2) [13].

[Figure 2: Message propagation. Node X exchanges π and λ messages with its parents U1, U2 and its children Y1, Y2.]

The posterior probability (belief) of node X, given the values of its parents and children, can be computed as follows:

BEL(x) = P(X = x | e) = \alpha \, \lambda(x) \, \pi(x)

where \alpha is a normalizing constant, \lambda(x) = P(e^- | x) and \pi(x) = P(x | e^+).

To calculate \lambda(x) = P(e^- | x), it is assumed that node X has received all \lambda messages from its c children. \lambda_{AB}(x) denotes the \lambda message sent from node A to node B:

\lambda(x) = \prod_{j=1}^{c} \lambda_{Y_j X}(x)

In the same way, in computing \pi(x) = P(x | e^+), we assume that node X has received all \pi messages from its p parents. Similarly, \pi_{AB}(x) denotes the \pi message sent from node A to node B. By summing over the CPT of node X, we can express \pi(x):

\pi(x) = \sum_{u_1, \dots, u_p} P(x \mid u_1, \dots, u_p) \prod_{j=1}^{p} \pi_{U_j X}(u_j)

Thus, we also need to compute the outgoing messages:

\pi_{X Y_j}(x) = \alpha \, \pi(x) \prod_{k \neq j} \lambda_{Y_k X}(x)

and

\lambda_{Y_j X}(x) = \sum_{y_j} \lambda_{Y_j}(y_j) \sum_{v_1, \dots, v_q} P(y_j \mid x, v_1, \dots, v_q) \prod_{k=1}^{q} \pi_{V_k Y_j}(v_k)

where V_1, ..., V_q denote the parents of Y_j other than X.

To summarize, Pearl's algorithm proceeds as follows:

A- Initialization step
- For all evidence nodes Vi = ei in E:
  λ(xi) = 1 wherever xi = ei; 0 otherwise
  π(xi) = 1 wherever xi = ei; 0 otherwise
- For nodes without parents:
  π(xi) = p(xi), the prior probabilities
- For nodes without children:
  λ(xi) = 1 uniformly (normalize at the end)

B- Iterate until no change occurs
- (For each node X) if X has received all the π messages from its parents, calculate π(x).
- (For each node X) if X has received all the λ messages from its children, calculate λ(x).
- (For each node X) if π(x) has been calculated and X has received all the λ messages from all its children except Y, calculate π_XY(x) and send it to Y.
- (For each node X) if λ(x) has been calculated and X has received all the π messages from all its parents except U, calculate λ_XU(x) and send it to U.

C- Compute BEL(x) = α λ(x) π(x) and normalize.

The belief propagation algorithm has polynomial complexity in the number of nodes and converges in time proportional to the diameter of the network [12]. In addition, the computation in a node is proportional to its CPT size. We point out that exact probabilistic inference in general has been proven to be NP-hard [4].

Actually, the majority of Bayesian networks are not polytrees, and it is proven that Pearl's algorithm, applied to Bayesian networks with loops, gives incorrect numerical results, i.e. incorrect posterior probabilities. Pearl proposed an exact inference algorithm for multiply connected networks called "loop cutset conditioning" [10]. This algorithm changes the connectivity of a network and renders it singly connected; the resulting network is then solved by Pearl's algorithm. The complexity of this method grows exponentially with the size of the loop cutset of a multiply connected network, and unfortunately the loop cutset minimization problem is NP-hard.

Another exact inference algorithm, called "clique-tree propagation" [14], transforms a multiply connected network into a clique tree, then performs the message propagation method over the transformed network. The clique-tree propagation algorithm can be extremely slow for dense networks since its complexity is exponential in the size of the largest clique of the transformed network. Approximate inference algorithms remain the best alternative to exact inference when it is difficult or impossible to apply an exact inference method.
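The fusion equations of this section can be written compactly in code. The following sketch, with an invented CPT and invented incoming messages (the CPT reuses the toy numbers of the earlier sketches), computes λ(x), π(x), BEL(x) and one outgoing π message for a single binary node X with two parents and two children, following the formulas above.

```python
import itertools
import numpy as np

# P(X | U1, U2), indexed as cpt[u1, u2, x]; values are invented.
cpt = np.array([[[0.9, 0.1], [0.5, 0.5]],
                [[0.3, 0.7], [0.05, 0.95]]])

# pi messages received from the parents: pi_{Ui X}(ui), one distribution per parent.
pi_from_parents = [np.array([0.7, 0.3]),     # from U1
                   np.array([0.4, 0.6])]     # from U2

# lambda messages received from the children: lambda_{Yj X}(x).
lam_from_children = [np.array([0.2, 0.8]),   # from Y1
                     np.array([1.0, 1.0])]   # from Y2 (no evidence below Y2)

# lambda(x) = prod_j lambda_{Yj X}(x)
lam = np.prod(lam_from_children, axis=0)

# pi(x) = sum_{u1,u2} P(x | u1, u2) * pi_{U1 X}(u1) * pi_{U2 X}(u2)
pi = np.zeros(2)
for u1, u2 in itertools.product(range(2), repeat=2):
    pi += cpt[u1, u2] * pi_from_parents[0][u1] * pi_from_parents[1][u2]

# BEL(x) = alpha * lambda(x) * pi(x)
bel = lam * pi
bel /= bel.sum()

# Outgoing message to child Y1: pi_{X Y1}(x) = alpha * pi(x) * prod_{k != 1} lambda_{Yk X}(x)
pi_to_y1 = pi * lam_from_children[1]
pi_to_y1 /= pi_to_y1.sum()

print("BEL(X):", bel)
print("pi message to Y1:", pi_to_y1)
```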

4. Loopy belief propagation

The "Loopy belief propagation" (LBP) algorithm is an approximate inference algorithm which applies the rules of belief propagation over networks with loops [7]. The main idea of LBP is to keep passing messages around the network until a stable belief state is reached (if ever). The LBP algorithm may not give exact results on a network with loops, but it can be used in the following way: iterate message propagation until convergence. Empirically, several applications of the LBP algorithm have proved successful, among them the decoding of error-correcting codes (turbo codes) [1].

We define here some notations used in the algorithm:
- λY(X) is the message sent to X by a child node Y;
- πX(U) is the message sent to X by its parent U;
- λX(X) is a message sent by X to itself if it is observed (X ∈ E).

We allow messages to change at each iteration t: λ^(t)(X) denotes a message at iteration t. The belief is the normalized product of all messages after convergence:

BEL(x) = α λ(x) π(x) ≈ P(X = x | E).

To represent the message propagation between the nodes, we draw the incoming messages exchanged at step t for a node X with parents U = {U1, ..., Un} and children Y = {Y1, ..., Ym} (see Figure 3):

[Figure 3: Incoming messages to node X at step t.]

with

\lambda^{(t)}(x) = \lambda_X(x) \prod_{j=1}^{m} \lambda^{(t)}_{Y_j}(x)    (1)

and

\pi^{(t)}(x) = \sum_{u_1, \dots, u_n} P(x \mid u_1, \dots, u_n) \prod_{i=1}^{n} \pi^{(t)}_{X}(u_i)    (2)

Similarly, we draw the outgoing messages exchanged at step t+1 from a node X to its parents and children (see Figure 4):

[Figure 4: Outgoing messages from node X at step t+1.]

with

\pi^{(t+1)}_{Y_j}(x) = \alpha \, \pi^{(t)}(x) \, \lambda_X(x) \prod_{k \neq j} \lambda^{(t)}_{Y_k}(x)

and

\lambda^{(t+1)}_{X}(u_i) = \alpha \sum_{x} \lambda^{(t)}(x) \sum_{u_k : k \neq i} P(x \mid u_1, \dots, u_n) \prod_{k \neq i} \pi^{(t)}_{X}(u_k)

Nodes are updated in parallel: at each iteration, all nodes compute their outgoing messages based on the input of their neighbours from the previous iteration. The messages are said to converge if none of the beliefs in successive iterations changes by more than a small threshold (e.g. 10^-4) [7]. When the LBP algorithm converges, the posterior probability values it provides are often a good approximation of the exact inference result. But if it does not converge, it may oscillate between two belief states.

Murphy et al. tested the LBP algorithm over both synthetic (PYRAMID and toyQMR) and real-world (ALARM and QMR-DT) networks [7]. They noted that LBP converges for all networks except QMR-DT, on which the algorithm oscillates between two belief states. They explain this result by the low prior probability values and the absence of randomization in the QMR-DT network. They tried to avoid oscillations by using a "momentum" term in equations (1) and (2). In general, they found that momentum significantly reduced the chance of oscillation. However, in several cases the beliefs to which the algorithm converged were quite inaccurate.
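To make the iteration loop concrete, here is a sketch of the parallel update scheme with a convergence threshold and a damping ("momentum") factor. For brevity it uses the pairwise (Markov random field) form of belief propagation on a three-node cycle rather than the directed λ/π form of this paper; the graph, potentials and damping value are invented, but the parallel update, the small-threshold stopping test and the momentum idea are the ones discussed above.

```python
import numpy as np

# A 3-node undirected cycle with invented pairwise potentials and local evidence.
nodes = [0, 1, 2]
edges = [(0, 1), (1, 2), (2, 0)]
psi = {e: np.array([[1.2, 0.6], [0.6, 1.2]]) for e in edges}   # pairwise potentials
phi = {0: np.array([0.7, 0.3]),                                 # local evidence
       1: np.array([0.5, 0.5]),
       2: np.array([0.4, 0.6])}

# messages msgs[(i, j)]: message from node i to node j, initialized uniformly.
msgs = {(i, j): np.ones(2) / 2 for (i, j) in edges}
msgs.update({(j, i): np.ones(2) / 2 for (i, j) in edges})

def belief(i, messages):
    b = phi[i] * np.prod([messages[(k, i)] for k in nodes if (k, i) in messages], axis=0)
    return b / b.sum()

damping = 0.5          # "momentum": blend new and old messages to fight oscillation
threshold = 1e-4
old_beliefs = {i: belief(i, msgs) for i in nodes}

for t in range(100):
    new_msgs = {}
    for (i, j) in msgs:                      # all messages recomputed in parallel
        incoming = np.prod([msgs[(k, i)] for k in nodes
                            if (k, i) in msgs and k != j], axis=0)
        pot = psi[(i, j)] if (i, j) in psi else psi[(j, i)].T
        m = (phi[i] * incoming) @ pot        # sum over the states of node i
        m /= m.sum()
        new_msgs[(i, j)] = damping * msgs[(i, j)] + (1 - damping) * m
    msgs = new_msgs
    beliefs = {i: belief(i, msgs) for i in nodes}
    if max(np.abs(beliefs[i] - old_beliefs[i]).max() for i in nodes) < threshold:
        print(f"converged after {t + 1} iterations")
        break
    old_beliefs = beliefs
```

Setting the damping factor to 0 recovers the plain parallel update; values between 0 and 1 play the role of the momentum term mentioned above.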

5. Introduction to the possibility theory

Possibility theory, introduced by Zadeh [16] and developed by Dubois and Prade [3], treats uncertainty in a qualitative or quantitative way. Uncertainty in possibility theory is represented by a pair of dual "measures" of possibility and necessity, usually graded on the unit interval, called the possibilistic scale. Possibility measures are max-decomposable for the union of events, in contrast with probability measures, which are additive, while necessity measures are min-decomposable for the intersection of events.

Possibility theory, which grew out of fuzzy set theory, allows more flexibility in the treatment of the available information than the probabilistic framework. It differs from probability theory especially by the fact that it is possible to distinguish uncertainty from imprecision, which is not the case with probabilities [2].

We can distinguish between qualitative and quantitative possibility theories. Qualitative possibility theory can be defined in purely ordinal settings, while quantitative possibility theory requires the use of a numerical scale. Quantitative possibility measures can be viewed as upper bounds of imprecisely known probability measures. Several operational semantics for possibility degrees have recently been obtained. Qualitative and quantitative possibility theories differ in the way conditioning is defined: it is based respectively on the minimum and on the product operation (both rules are recalled at the end of this section).

A logical counterpart of possibility theory has been developed for almost twenty years and is known as possibilistic logic. A possibilistic logic formula is a pair made of a classical logic formula and a weight understood as a lower bound of a necessity measure. Various extensions of possibilistic logic handle lower bounds of guaranteed or ordinary possibility functions, weights involving variables, fuzzy constants, and multiple-source information. Graphical representations of possibilistic logic bases, using the two types of conditioning, have also been obtained [2].
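For reference, the two conditioning rules mentioned above can be stated as follows; this is the standard formulation found in the possibility-theory literature (e.g. Dubois and Prade), given here as a reminder rather than quoted from the paper. For an event φ with Π(φ) > 0 and an interpretation ω:

```latex
% Min-based conditioning (qualitative possibility theory)
\pi(\omega \mid_{\min} \phi) =
  \begin{cases}
    1           & \text{if } \pi(\omega) = \Pi(\phi) \text{ and } \omega \models \phi,\\
    \pi(\omega) & \text{if } \pi(\omega) < \Pi(\phi) \text{ and } \omega \models \phi,\\
    0           & \text{if } \omega \not\models \phi.
  \end{cases}

% Product-based conditioning (quantitative possibility theory)
\pi(\omega \mid_{\times} \phi) =
  \begin{cases}
    \pi(\omega) / \Pi(\phi) & \text{if } \omega \models \phi,\\
    0                       & \text{otherwise,}
  \end{cases}
\qquad \text{with } \Pi(\phi) = \max_{\omega \models \phi} \pi(\omega).
```

Min-based conditioning preserves the ordinal character of the qualitative setting, while product-based conditioning is the possibilistic counterpart of Bayesian conditioning used in the quantitative setting.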

6. Conclusion and future work

In this paper we have presented two inference algorithms. The first, Pearl's belief propagation, is an exact algorithm applied to Bayesian networks without loops. The second, LBP, is an approximate inference algorithm which applies the rules of belief propagation over networks with loops. The LBP algorithm gives a good approximation of the posterior probabilities when it converges. However, convergence is not guaranteed when LBP is applied to the QMR-DT network. We estimate that transforming the inference computation problem from probability theory to possibility theory will allow us to improve the performance of the LBP algorithm. We are currently studying this transformation, which we have not tried out yet.

References

[1] C. Berrou, A. Glavieux and P. Thitimajshima. Near Shannon limit error correcting coding and decoding: Turbo codes. In Proceedings of the IEEE International Communications Conference '93, 1993.
[2] D. Dubois and H. Prade. Possibility theory, probability theory and multiple-valued logics: a clarification. Annals of Mathematics and Artificial Intelligence, 32:35-66, Kluwer, Dordrecht, 2001.
[3] D. Dubois and H. Prade. Possibility Theory. Plenum Press, 1988.
[4] G. Cooper. The computational complexity of probabilistic inference using Bayesian belief networks. Artificial Intelligence, 42:393-405, 1990.
[5] H. Guo and W. Hsu. A survey of algorithms for real-time Bayesian network inference. In AAAI/KDD/UAI-2002 Joint Workshop on Real-Time Decision Support and Diagnosis Systems, pages 1-12, 2002.
[6] J. H. Kim and J. Pearl. A computational model for causal and diagnostic reasoning in inference engines. In Proceedings of the 8th International Joint Conference on Artificial Intelligence, Karlsruhe, West Germany, pages 190-193, 1983.
[7] K. P. Murphy, Y. Weiss and M. I. Jordan. Loopy belief propagation for approximate inference: an empirical study. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence (UAI), pages 467-475, 1999.
[8] P. Dagum and M. Luby. Approximating probabilistic inference in Bayesian belief networks is NP-hard. Artificial Intelligence, 60:141-153, 1993.
[9] J. Pearl. Bayesian networks: a model of self-activated memory for evidential reasoning. In Proceedings of the Cognitive Science Society, UC Irvine, pages 329-334, 1985.
[10] J. Pearl. Fusion, propagation and structuring in belief networks. UCLA Computer Science Department Technical Report 850022 (R-42); Artificial Intelligence, 29:241-288, 1986.
[11] J. Pearl and T. Verma. Influence diagrams and d-separation. UCLA Cognitive Systems Laboratory, Technical Report 880052, 1988.
[12] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers, Palo Alto, 1988.
[13] P. Naïm, P.-H. Wuillemin, P. Leray, O. Pourret and A. Becker. Réseaux Bayésiens. Eyrolles, 2004.
[14] S. L. Lauritzen and D. J. Spiegelhalter. Local computations with probabilities on graphical structures and their applications to expert systems. Journal of the Royal Statistical Society, Series B, 50:154-227, 1988.
[15] T. S. Jaakkola and M. I. Jordan. Variational probabilistic inference and the QMR-DT network. Journal of Artificial Intelligence Research, 10:291-322, 1999.
[16] L. A. Zadeh. Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets and Systems, 1:3-28, 1978.
