
Loopy Belief Propagation in Bayesian Networks: origin and possibilistic perspectives

Amen Ajroud (1), Mohamed Nazih Omri (2), Habib Youssef (3), Salem Benferhat (4)
(1) ISET Sousse, TUNISIA − amen.ajroud@isetso.rnu.tn
(2) IPEIM Monastir - University of Monastir, TUNISIA − nazih.omri@ipeim.rnu.tn
(3) ISITC Hammam-Sousse - University of Monastir, TUNISIA − habib.youssef@fsm.rnu.tn
(4) CRIL, Lens - University of Artois, FRANCE − benferhat@cril.univ-artois.fr

Abstract: In this paper we present a synthesis of the work performed on two inference algorithms: Pearl's belief propagation (BP), applied to Bayesian networks without loops (i.e. polytrees), and the loopy belief propagation (LBP) algorithm (inspired by BP), which is applied to networks containing undirected cycles. It is known that the BP algorithm, applied to Bayesian networks with loops, gives incorrect numerical results, i.e. incorrect posterior probabilities. Murphy et al. [7] found that the LBP algorithm converges on several networks and that, when this occurs, LBP gives a good approximation of the exact posterior probabilities. However, this algorithm presents an oscillatory behaviour when it is applied to the QMR (Quick Medical Reference) network [15]. This phenomenon prevents the LBP algorithm from converging towards a good approximation of the posterior probabilities. We believe that translating the inference computation problem from the probabilistic framework to the possibilistic framework will allow a performance improvement of the LBP algorithm. We hope that an adaptation of this algorithm to a possibilistic causal network will show an improvement of the convergence of LBP.

1. Review of Bayesian Networks

Bayesian networks are powerful tools for modelling causes and effects in a wide variety of domains. They use graphs capturing the notion of causality between variables, and probability theory to express the strength of that causality. Bayesian networks are very effective for modelling situations where some information is already known and incoming data is uncertain or partially unavailable. These networks also offer consistent semantics for representing causes and effects via an intuitive graphical representation. Because of all these capabilities, Bayesian networks are regarded as systems for uncertain knowledge representation; they have a large number of applications with efficient algorithms and strong theoretical foundations [9], [10], [11] and [12].

Theoretically, a Bayesian network is a directed acyclic graph (DAG) made up of nodes and causal edges. Each node has a probability of taking a certain value. Nodes are often binary, though a Bayesian network may have n-ary nodes. Parent and child nodes are defined as follows: a directed edge goes from a parent to a child, and each child node has a conditional probability table (CPT) conditioned on the values of its parents. There are no directed cycles in the graph, though there may be "loops", or undirected cycles. An example network is shown in Figure 1, with parents Ui sharing a child X. The node X is a child of the Ui's as well as being a parent of the Yi's.
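To make the DAG-plus-CPT structure concrete, here is a minimal sketch, not taken from the paper, of how a small polytree of the kind shown in Figure 1 could be encoded; the node names and probability values are invented for illustration.

```python
# Hypothetical encoding of a small polytree (same shape as Figure 1, reduced):
# two parents U1, U2, one child X, one grandchild Y1. All variables are binary
# and all numbers are invented for illustration.
bayes_net = {
    "U1": {"parents": [], "cpt": {(): 0.3}},             # P(U1 = 1)
    "U2": {"parents": [], "cpt": {(): 0.6}},             # P(U2 = 1)
    "X":  {"parents": ["U1", "U2"],                      # P(X = 1 | U1, U2)
           "cpt": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.7, (1, 1): 0.95}},
    "Y1": {"parents": ["X"],                             # P(Y1 = 1 | X)
           "cpt": {(0,): 0.2, (1,): 0.8}},
}
```

Each CPT row stores P(node = 1 | parents), so P(node = 0 | parents) is its complement; the edges contain no directed cycle, matching the DAG requirement above.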

[Figure 1: A Bayesian network (polytree). Parents U1, ..., Um share the child X, which is itself a parent of the children Y1, ..., Yn.]

2. Bayesian Network inference

In a Bayesian network, the objective of inference is to compute P(X|E), the posterior probabilities of some "query" nodes (denoted X) given some observed values of evidence nodes (denoted E, E⊄X) [13]. A simple form of this problem results when X is a single node, i.e., computing the posterior marginal probabilities of a single query node.

When a large Bayesian network is built, the feasibility of probabilistic inference becomes an issue: when the network is simple, the calculation of these probabilities is not very difficult. On the other hand, when the network becomes very large, several problems emerge: the inference requires an enormous amount of memory and the calculation becomes very complex or even, in certain cases, cannot be completed. Inference algorithms are classified in two groups [5] (a small worked example is sketched after this list):
- Exact algorithms: these methods exploit the conditional independences encoded in the network and give, for each inference, the exact posterior probabilities. Exact probabilistic inference in general has been proven to be NP-hard by Cooper [4].
- Approximate algorithms: they are the alternative to exact algorithms when the networks become very complex. They estimate the posterior probabilities in various ways. Approximating probabilistic inference was also shown to be NP-hard by Dagum and Luby [8].
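As a small worked illustration of exact inference (again not from the paper; it reuses the invented numbers of the sketch above), the posterior P(X | Y1 = 1) on the toy polytree can be obtained by brute-force enumeration: sum the factorized joint distribution over the unobserved variables and normalize.

```python
import itertools

# Invented CPTs for the toy polytree U1, U2 -> X -> Y1 (all variables binary).
p_u1 = {0: 0.7, 1: 0.3}                                         # P(U1)
p_u2 = {0: 0.4, 1: 0.6}                                         # P(U2)
p_x1 = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.7, (1, 1): 0.95}    # P(X = 1 | U1, U2)
p_y11 = {0: 0.2, 1: 0.8}                                        # P(Y1 = 1 | X)

def joint(u1, u2, x, y1):
    """P(U1=u1, U2=u2, X=x, Y1=y1), factorized along the DAG."""
    px = p_x1[(u1, u2)] if x == 1 else 1.0 - p_x1[(u1, u2)]
    py = p_y11[x] if y1 == 1 else 1.0 - p_y11[x]
    return p_u1[u1] * p_u2[u2] * px * py

# Exact posterior P(X | Y1 = 1) by enumerating the hidden variables U1, U2.
unnorm = {x: sum(joint(u1, u2, x, 1)
                 for u1, u2 in itertools.product((0, 1), repeat=2))
          for x in (0, 1)}
z = sum(unnorm.values())
posterior = {x: unnorm[x] / z for x in (0, 1)}
print(posterior)
```

This enumeration is exact but its cost grows exponentially with the number of unobserved nodes, which is why dedicated exact algorithms and approximate algorithms both matter in practice.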

3. Pearl‘s algorithm

At the beginning of the 1980s, Pearl published an efficient message-propagation inference algorithm for polytrees [6] and [10]. This algorithm is an exact belief propagation procedure but works only for polytrees [12]. Consider the case of a general discrete node X having parents U1, ..., Um and children Y1, ..., Yn, as shown in Figure 1. Evidence is represented by E, with evidence "above" X (evidence at ancestors of X) denoted e+ and evidence "below" X (evidence at descendants of X) denoted e-.

Knowledge of evidence can flow either down the network (from parent to child) or up the network (from child to parent). We use the notation π(Ui) to represent a message from parent Ui to its child and λ(Yj) to represent a message from child Yj to its parent (see Figure 2) [13].

[Figure 2: Message propagation. Node X exchanges π and λ messages with its parents U1, U2 and its children Y1, Y2.]

The posterior probability (belief) of node X, given the values of its parents and children, can be computed as follows:

BEL(x) = P(X = x | e) = \alpha \, \lambda(x) \, \pi(x)

where \alpha is a normalizing constant, \lambda(x) = P(e^- | x) and \pi(x) = P(x | e^+).

To calculate \lambda(x) = P(e^- | x), it is assumed that node X has received all \lambda messages from its c children. \lambda_{AB}(x) denotes the \lambda message sent from node A to node B:

\lambda(x) = \prod_{j=1}^{c} \lambda_{Y_j X}(x)

In the same way, in computing \pi(x) = P(x | e^+), we assume that node X has received all \pi messages from its p parents. Similarly, \pi_{AB}(x) denotes the \pi message sent from node A to node B. By summing over the CPT of node X, we can express \pi(x):

\pi(x) = \sum_{u_1, \dots, u_p} P(x \mid u_1, \dots, u_p) \prod_{j=1}^{p} \pi_{U_j X}(u_j)

Thus, we also need to compute the outgoing messages:

\pi_{X Y_j}(x) = \alpha \, \pi(x) \prod_{k \neq j} \lambda_{Y_k X}(x)

and

\lambda_{Y_j X}(x) = \sum_{y_j} \lambda_{Y_j}(y_j) \sum_{v_1, \dots, v_q} P(y_j \mid x, v_1, \dots, v_q) \prod_{k=1}^{q} \pi_{V_k Y_j}(v_k)

where V_1, ..., V_q denote the parents of Y_j other than X.

To summarize, Pearl's algorithm proceeds as follows:

A- Initialization step
- For all evidence nodes Vi = ei in E:
  λ(xi) = 1 wherever xi = ei; 0 otherwise
  π(xi) = 1 wherever xi = ei; 0 otherwise
- For nodes without parents:
  π(xi) = p(xi), the prior probabilities
- For nodes without children:
  λ(xi) = 1 uniformly (normalize at the end)

B- Iterate until no change occurs
- (For each node X) if X has received all the π messages from its parents, calculate π(x).
- (For each node X) if X has received all the λ messages from its children, calculate λ(x).
- (For each node X) if π(x) has been calculated and X has received all the λ messages from all its children except Y, calculate π_XY(x) and send it to Y.
- (For each node X) if λ(x) has been calculated and X has received all the π messages from all its parents except U, calculate λ_XU(x) and send it to U.

C- Compute BEL(x) = α λ(x) π(x) and normalize.

The belief propagation algorithm has polynomial complexity in the number of nodes and converges in time proportional to the diameter of the network [12]. In addition, the computation in a node is proportional to its CPT size. We point out that exact probabilistic inference in general has been proven to be NP-hard [4].

Actually, the majority of Bayesian networks are not polytrees, and it is proven that Pearl's algorithm, applied to Bayesian networks with loops, gives incorrect numerical results, i.e. incorrect posterior probabilities. Pearl proposed an exact inference algorithm for multiply connected networks called "loop cutset conditioning" [10]. This algorithm changes the connectivity of a network and renders it singly connected; the resulting network is then solved by Pearl's algorithm. The complexity of this method grows exponentially with the size of the loop cutset of a multiply connected network, and unfortunately the loop cutset minimization problem is NP-hard.

Another exact inference algorithm, called "clique-tree propagation" [14], transforms a multiply connected network into a clique tree, then performs the message propagation method over the transformed network. The clique-tree propagation algorithm can be extremely slow for dense networks since its complexity is exponential in the size of the largest clique of the transformed network. Approximate inference algorithms remain the best alternative to exact inference when it is difficult or impossible to apply an exact inference method.
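The fusion equations of this section can be written compactly in code. The following sketch, with an invented CPT and invented incoming messages (the CPT reuses the toy numbers of the earlier sketches), computes λ(x), π(x), BEL(x) and one outgoing π message for a single binary node X with two parents and two children, following the formulas above.

```python
import itertools
import numpy as np

# P(X | U1, U2), indexed as cpt[u1, u2, x]; values are invented.
cpt = np.array([[[0.9, 0.1], [0.5, 0.5]],
                [[0.3, 0.7], [0.05, 0.95]]])

# pi messages received from the parents: pi_{Ui X}(ui), one distribution per parent.
pi_from_parents = [np.array([0.7, 0.3]),     # from U1
                   np.array([0.4, 0.6])]     # from U2

# lambda messages received from the children: lambda_{Yj X}(x).
lam_from_children = [np.array([0.2, 0.8]),   # from Y1
                     np.array([1.0, 1.0])]   # from Y2 (no evidence below Y2)

# lambda(x) = prod_j lambda_{Yj X}(x)
lam = np.prod(lam_from_children, axis=0)

# pi(x) = sum_{u1,u2} P(x | u1, u2) * pi_{U1 X}(u1) * pi_{U2 X}(u2)
pi = np.zeros(2)
for u1, u2 in itertools.product(range(2), repeat=2):
    pi += cpt[u1, u2] * pi_from_parents[0][u1] * pi_from_parents[1][u2]

# BEL(x) = alpha * lambda(x) * pi(x)
bel = lam * pi
bel /= bel.sum()

# Outgoing message to child Y1: pi_{X Y1}(x) = alpha * pi(x) * prod_{k != 1} lambda_{Yk X}(x)
pi_to_y1 = pi * lam_from_children[1]
pi_to_y1 /= pi_to_y1.sum()

print("BEL(X):", bel)
print("pi message to Y1:", pi_to_y1)
```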

4. Loopy belief propagation

The "Loopy belief propagation" (LBP) algorithm is an approximate inference algorithm which applies the rules of belief propagation over networks with loops [7]. The main idea of LBP is to keep passing messages around the network until a stable belief state is reached (if ever). The LBP algorithm may not give exact results on a network with loops, but it can be used in the following way: iterate message propagation until convergence. Empirically, several applications of the LBP algorithm have proved successful, among them the decoding of error-correcting codes (turbo codes) [1].

We define here some notations used in the algorithm:
- λY(X) is the message sent to X by a child node Y;
- πX(U) is the message sent to X by its parent U;
- λX(X) is a message sent by X to itself if it is observed (X ∈ E).

We allow messages to change at each iteration t: λ^(t)(X) denotes a message at iteration t. The belief is the normalized product of all messages after convergence:

BEL(x) = α λ(x) π(x) ≈ P(X = x | E).

To represent the message propagation between the nodes, we draw the incoming messages exchanged at step t for a node X with parents U = {U1, ..., Un} and children Y = {Y1, ..., Ym} (see Figure 3):

[Figure 3: Incoming messages to node X at step t.]

with

\lambda^{(t)}(x) = \lambda_X(x) \prod_{j=1}^{m} \lambda^{(t)}_{Y_j}(x)    (1)

and

\pi^{(t)}(x) = \sum_{u_1, \dots, u_n} P(x \mid u_1, \dots, u_n) \prod_{i=1}^{n} \pi^{(t)}_{X}(u_i)    (2)

Similarly, we draw the outgoing messages exchanged at step t+1 from a node X to its parents and children (see Figure 4):

[Figure 4: Outgoing messages from node X at step t+1.]

with

\pi^{(t+1)}_{Y_j}(x) = \alpha \, \pi^{(t)}(x) \, \lambda_X(x) \prod_{k \neq j} \lambda^{(t)}_{Y_k}(x)

and

\lambda^{(t+1)}_{X}(u_i) = \alpha \sum_{x} \lambda^{(t)}(x) \sum_{u_k : k \neq i} P(x \mid u_1, \dots, u_n) \prod_{k \neq i} \pi^{(t)}_{X}(u_k)

Nodes are updated in parallel: at each iteration, all nodes compute their outgoing messages based on the input of their neighbours from the previous iteration. The messages are said to converge if none of the beliefs in successive iterations changes by more than a small threshold (e.g. 10^-4) [7]. When the LBP algorithm converges, the posterior probability values it provides are often a good approximation of the exact inference result. But if it does not converge, it may oscillate between two belief states.

Murphy et al. tested the LBP algorithm over both synthetic (PYRAMID and toyQMR) and real-world (ALARM and QMR-DT) networks [7]. They noted that LBP converges for all networks except QMR-DT, on which the algorithm oscillates between two belief states. They explain this result by the low prior probability values and the absence of randomization in the QMR-DT network. They tried to avoid oscillations by using a "momentum" term in equations (1) and (2). In general, they found that momentum significantly reduced the chance of oscillation. However, in several cases the beliefs to which the algorithm converged were quite inaccurate.
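To make the iteration loop concrete, here is a sketch of the parallel update scheme with a convergence threshold and a damping ("momentum") factor. For brevity it uses the pairwise (Markov random field) form of belief propagation on a three-node cycle rather than the directed λ/π form of this paper; the graph, potentials and damping value are invented, but the parallel update, the small-threshold stopping test and the momentum idea are the ones discussed above.

```python
import numpy as np

# A 3-node undirected cycle with invented pairwise potentials and local evidence.
nodes = [0, 1, 2]
edges = [(0, 1), (1, 2), (2, 0)]
psi = {e: np.array([[1.2, 0.6], [0.6, 1.2]]) for e in edges}   # pairwise potentials
phi = {0: np.array([0.7, 0.3]),                                 # local evidence
       1: np.array([0.5, 0.5]),
       2: np.array([0.4, 0.6])}

# messages msgs[(i, j)]: message from node i to node j, initialized uniformly.
msgs = {(i, j): np.ones(2) / 2 for (i, j) in edges}
msgs.update({(j, i): np.ones(2) / 2 for (i, j) in edges})

def belief(i, messages):
    b = phi[i] * np.prod([messages[(k, i)] for k in nodes if (k, i) in messages], axis=0)
    return b / b.sum()

damping = 0.5          # "momentum": blend new and old messages to fight oscillation
threshold = 1e-4
old_beliefs = {i: belief(i, msgs) for i in nodes}

for t in range(100):
    new_msgs = {}
    for (i, j) in msgs:                      # all messages recomputed in parallel
        incoming = np.prod([msgs[(k, i)] for k in nodes
                            if (k, i) in msgs and k != j], axis=0)
        pot = psi[(i, j)] if (i, j) in psi else psi[(j, i)].T
        m = (phi[i] * incoming) @ pot        # sum over the states of node i
        m /= m.sum()
        new_msgs[(i, j)] = damping * msgs[(i, j)] + (1 - damping) * m
    msgs = new_msgs
    beliefs = {i: belief(i, msgs) for i in nodes}
    if max(np.abs(beliefs[i] - old_beliefs[i]).max() for i in nodes) < threshold:
        print(f"converged after {t + 1} iterations")
        break
    old_beliefs = beliefs
```

Setting the damping factor to 0 recovers the plain parallel update; values between 0 and 1 play the role of the momentum term mentioned above.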

5. Introduction to the possibility theory

Possibility theory, introduced by Zadeh [16] and developed by Dubois and Prade [3], treats uncertainty in a qualitative or quantitative way. Uncertainty in possibility theory is represented by a pair of dual "measures" of possibility and necessity, usually graded on the unit interval, called the possibilistic scale. Possibility measures are max-decomposable for the union of events, in contrast with probability measures, which are additive, while necessity measures are min-decomposable for the intersection of events.

Possibility theory, which grew out of fuzzy set theory, allows more flexibility in the treatment of the available information than the probabilistic framework. It differs from probability theory especially by the fact that it is possible to distinguish uncertainty from imprecision, which is not the case with probabilities [2].

We can distinguish between qualitative and quantitative possibility theories. Qualitative possibility theory can be defined in purely ordinal settings, while quantitative possibility theory requires the use of a numerical scale. Quantitative possibility measures can be viewed as upper bounds of imprecisely known probability measures. Several operational semantics for possibility degrees have recently been obtained. Qualitative and quantitative possibility theories differ in the way conditioning is defined: it is based respectively on the minimum and on the product operation (both rules are recalled at the end of this section).

A logical counterpart of possibility theory has been developed for almost twenty years and is known as possibilistic logic. A possibilistic logic formula is a pair made of a classical logic formula and a weight understood as a lower bound of a necessity measure. Various extensions of possibilistic logic handle lower bounds of guaranteed or ordinary possibility functions, weights involving variables, fuzzy constants, and multiple-source information. Graphical representations of possibilistic logic bases, using the two types of conditioning, have also been obtained [2].
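For reference, the two conditioning rules mentioned above can be stated as follows; this is the standard formulation found in the possibility-theory literature (e.g. Dubois and Prade), given here as a reminder rather than quoted from the paper. For an event φ with Π(φ) > 0 and an interpretation ω:

```latex
% Min-based conditioning (qualitative possibility theory)
\pi(\omega \mid_{\min} \phi) =
  \begin{cases}
    1           & \text{if } \pi(\omega) = \Pi(\phi) \text{ and } \omega \models \phi,\\
    \pi(\omega) & \text{if } \pi(\omega) < \Pi(\phi) \text{ and } \omega \models \phi,\\
    0           & \text{if } \omega \not\models \phi.
  \end{cases}

% Product-based conditioning (quantitative possibility theory)
\pi(\omega \mid_{\times} \phi) =
  \begin{cases}
    \pi(\omega) / \Pi(\phi) & \text{if } \omega \models \phi,\\
    0                       & \text{otherwise,}
  \end{cases}
\qquad \text{with } \Pi(\phi) = \max_{\omega \models \phi} \pi(\omega).
```

Min-based conditioning preserves the ordinal character of the qualitative setting, while product-based conditioning is the possibilistic counterpart of Bayesian conditioning used in the quantitative setting.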

6. Conclusion and future work

In this paper we have presented two inference algorithms. The first, Pearl's belief propagation, is an exact algorithm applied to Bayesian networks without loops. The second, LBP, is an approximate inference algorithm which applies the rules of belief propagation over networks with loops. The LBP algorithm gives a good approximation of the posterior probabilities when it converges. However, convergence is not guaranteed when LBP is applied to the QMR-DT network. We estimate that transforming the inference computation problem from probability theory to possibility theory will allow us to improve the performance of the LBP algorithm. We are currently studying this transformation, which we have not tried out yet.

References

[1] C. Berrou, A. Glavieux and P. Thitimajshima. Near Shannon limit error correcting coding and decoding: Turbo codes. In Proceedings of the IEEE International Communications Conference '93, 1993.
[2] D. Dubois and H. Prade. Possibility theory, probability theory and multiple-valued logics: a clarification. Annals of Mathematics and Artificial Intelligence, 32:35-66, Kluwer, Dordrecht, 2001.
[3] D. Dubois and H. Prade. Possibility Theory. Plenum Press, 1988.
[4] G. Cooper. The computational complexity of probabilistic inference using Bayesian belief networks. Artificial Intelligence, 42:393-405, 1990.
[5] H. Guo and W. Hsu. A survey of algorithms for real-time Bayesian network inference. In AAAI/KDD/UAI-2002 Joint Workshop on Real-Time Decision Support and Diagnosis Systems, pages 1-12, 2002.
[6] J. H. Kim and J. Pearl. A computational model for causal and diagnostic reasoning in inference engines. In Proceedings of the 8th International Joint Conference on Artificial Intelligence, Karlsruhe, West Germany, pages 190-193, 1983.
[7] K. P. Murphy, Y. Weiss and M. I. Jordan. Loopy belief propagation for approximate inference: an empirical study. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence (UAI), pages 467-475, 1999.
[8] P. Dagum and M. Luby. Approximating probabilistic inference in Bayesian belief networks is NP-hard. Artificial Intelligence, 60:141-153, 1993.
[9] J. Pearl. Bayesian networks: a model of self-activated memory for evidential reasoning. In Proceedings of the Cognitive Science Society, UC Irvine, pages 329-334, 1985.
[10] J. Pearl. Fusion, propagation and structuring in belief networks. UCLA Computer Science Department Technical Report 850022 (R-42); Artificial Intelligence, 29:241-288, 1986.
[11] J. Pearl and T. Verma. Influence diagrams and d-separation. UCLA Cognitive Systems Laboratory, Technical Report 880052, 1988.
[12] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers, Palo Alto, 1988.
[13] P. Naïm, P.-H. Wuillemin, P. Leray, O. Pourret and A. Becker. Réseaux Bayésiens. Eyrolles, 2004.
[14] S. L. Lauritzen and D. J. Spiegelhalter. Local computations with probabilities on graphical structures and their applications to expert systems. Journal of the Royal Statistical Society, Series B, 50:154-227, 1988.
[15] T. S. Jaakkola and M. I. Jordan. Variational probabilistic inference and the QMR-DT network. Journal of Artificial Intelligence Research, 10:291-322, 1999.
[16] L. A. Zadeh. Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets and Systems, 1:3-28, 1978.
