Bayesian Networks: Belief Propagation


Bayesian Networks: Belief Propagation (Cont'd)
Huizhen Yu, Dept. Computer Science, Univ. of Helsinki
Probabilistic Models, Spring 2010 (Feb. 18)

Outline: Belief Propagation: Review and Examples; Generalized Belief Propagation – Max-Product; Applications to Loopy Graphs.
Announcement: the last exercise will be posted online soon.

Review of Last Lecture

We studied an algorithm for computing marginal posterior distributions:
• It works in singly connected networks, which are DAGs whose undirected versions are trees.
• It is suitable for parallel implementation.
• It is derived recursively by (i) dividing the total evidence into pieces, according to the independence structure represented by the DAG, and then (ii) incorporating the evidence pieces into either the probability terms (π-messages) or the likelihood terms (λ-messages).

Queries answerable by the algorithm for a singly connected network:
• P(X = x | e) for a single x;
• P(Xv = xv | e) for all xv and v ∈ V;
• most probable configurations, arg max_x p(x, e). This can be related to finding globally optimal solutions by distributed local computation. (Details are given today.)

Practice: Belief Propagation for HMM

(Figure: an HMM unrolled over time steps t = 0, 1, 2, 3, ..., n, n+1; the observation variables, shown in black, are instantiated; the latent variables, shown in white, are X1, X2, ....)

The total evidence at time t is e_t. How would you use message passing to calculate
• p(x_t | e_t), for all x_t? (You will obtain the so-called forward algorithm as a special case.)
• p(x_{t+1} | e_t), for all x_{t+1}? (This is a prediction problem.)
• p(x_k | e_t), for all x_k, k < t? (You will obtain the so-called backward algorithm as a special case.)

A Fault-Detection Example

A logic circuit for fault detection and its Bayesian network (Pearl 1988):
• P(Xi = 1) = pi, P(Xi = 0) = 1 − pi = qi;
• Yi = Y_{i−1} AND Xi, with Y0 = 1 always;
• Xi is normal if Xi = 1, and faulty if Xi = 0;
• normally all variables are on, and a failure occurs if Y3 = 0.
(Figure: a chain of AND gates with inputs X1, X2, X3 and outputs Y1, Y2, Y3, and the corresponding Bayesian network.)

Example: Belief Updating

Without observing any evidence, all the π-messages are prior probabilities:
π_{Xi,Yi}(xi) = [pi, qi], i = 1, 2, 3;   π_{Y0,Y1}(y0) = [1, 0];
π_{Y1,Y2}(y1) = [p1, q1];   π_{Y2,Y3}(y2) = [p1 p2, 1 − p1 p2],
for xi = 1, 0 and yi = 1, 0.

Suppose e : {X2 = 1, Y3 = 0} is received. Then X2 updates its message to Y2, and Y2 updates its message to Y3:
π_{X2,Y2}(x2) = [p2, 0];   π_{Y2,Y3}(y2) = [p1 p2, q1 p2].

The λ-messages propagated from Y3 upwards are:
λ_{Y3,X3}(x3) = [p2 q1, p2];   λ_{Y3,Y2}(y2) = [q3, 1];
λ_{Y2,X2}(x2) = [p1 q3 + q1, p1 + q1 q3];   λ_{Y2,Y1}(y1) = [p2 q3, p2];
λ_{Y1,X1}(x1) = [p2 q3, p2].

So
P(X3 = 0 | e) = q3 p2 / (p3 p2 q1 + q3 p2) = q3 / (p3 q1 + q3) = q3 / (1 − p1 p3),
P(X1 = 0 | e) = q1 p2 / (p1 p2 q3 + q1 p2) = q1 / (p1 q3 + q1) = q1 / (1 − p1 p3).

Example: Explanations Based on Beliefs

If q1 = 0.45 and q3 = 0.4, we obtain
P(X1 = 0 | e) = 0.672 > P(X1 = 1 | e) = 0.328,
P(X3 = 0 | e) = 0.597 > P(X3 = 1 | e) = 0.403.
Is I1 = {X1 = 0, X3 = 0} therefore the most probable explanation of e? There are three possible explanations:
I1 = {X1 = 0, X3 = 0},   I2 = {X1 = 0, X3 = 1},   I3 = {X1 = 1, X3 = 0}.
Direct calculation shows
P(I1 | e) = q1 q3 / (1 − p1 p3),   P(I2 | e) = q1 p3 / (1 − p1 p3),   P(I3 | e) = p1 q3 / (1 − p1 p3).
So if 0.5 > q1 > q3, then, based on the evidence, I2 is the most probable explanation while I1 is the least probable one.
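To make the numbers above concrete, here is a small brute-force check (a sketch, not part of the original slides) that conditions the joint distribution of the circuit on the evidence e = {X2 = 1, Y3 = 0}. The value q2 = 0.3 is an arbitrary assumption; the conditional probabilities do not depend on it because X2 is observed.

```python
# Brute-force check of the fault-detection posteriors computed above.
from itertools import product

q = {1: 0.45, 2: 0.30, 3: 0.40}          # P(Xi = 0) = qi, P(Xi = 1) = pi = 1 - qi

def joint():
    """Yield each configuration (x1, x2, x3), the circuit output Y3, and its prior probability."""
    for x in product([0, 1], repeat=3):
        y, prob = 1, 1.0                  # Y0 = 1
        for i, xi in enumerate(x, start=1):
            prob *= (1 - q[i]) if xi == 1 else q[i]
            y = y and xi                  # Yi = Y(i-1) AND Xi
        yield x, y, prob

# Condition on the evidence e: X2 = 1, Y3 = 0.
pe = sum(p for x, y3, p in joint() if x[1] == 1 and y3 == 0)
post = {x: p / pe for x, y3, p in joint() if x[1] == 1 and y3 == 0}

print(sum(p for x, p in post.items() if x[0] == 0))        # P(X1=0|e) ~ 0.672
print(sum(p for x, p in post.items() if x[2] == 0))        # P(X3=0|e) ~ 0.597
print(post[(0, 1, 1)], post[(1, 1, 0)], post[(0, 1, 0)])   # P(I2|e) ~ 0.403 > P(I3|e) ~ 0.328 > P(I1|e) ~ 0.269
```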
Recap of the previous lecture ("Bayesian Networks: Message Passing in Singly Connected Networks", Feb. 16). Outline: Inference Engine: Overview; Belief Propagation in Chains; Belief Propagation in Trees; Belief Propagation in Singly Connected Networks; Loopy Belief Propagation; Conditioning.

Recall Notation for Singly Connected Networks

Consider a vertex v.
• pa(v) = {u1, ..., un}, ch(v) = {w1, ..., wm};
• T_vu, u ∈ pa(v): the sub-polytree containing the parent u, resulting from removing the edge (u, v);
• T_vw, w ∈ ch(v): the sub-polytree containing the child w, resulting from removing the edge (v, w).
For a sub-polytree T, denote by e_T the evidence contained in T.
(Figure: a vertex v with parents u1, ..., un and children w1, ..., wm; π-messages arrive from the parent side and λ-messages from the child side, with the evidence split into e+ and e− accordingly.)
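As a small illustration of this notation (a sketch with assumed data structures, not from the slides), removing an edge from the undirected skeleton of a polytree splits it into two sub-polytrees; the helper below returns the vertex set on one side of a removed edge, applied here to the fault-detection network from the examples above.

```python
from collections import deque

def side(adj, u, v):
    """Vertices reachable from u without crossing the (removed) edge {u, v}."""
    seen, queue = {u}, deque([u])
    while queue:
        a = queue.popleft()
        for b in adj[a]:
            if (a, b) in {(u, v), (v, u)} or b in seen:
                continue
            seen.add(b)
            queue.append(b)
    return seen

# Undirected skeleton of the fault-detection network.
adj = {"X1": ["Y1"], "X2": ["Y2"], "X3": ["Y3"],
       "Y0": ["Y1"], "Y1": ["X1", "Y0", "Y2"],
       "Y2": ["X2", "Y1", "Y3"], "Y3": ["X3", "Y2"]}

# The sub-polytree containing the parent Y2 after removing the edge (Y2, Y3).
print(side(adj, "Y2", "Y3"))    # {'Y2', 'X2', 'Y1', 'X1', 'Y0'}
```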
Recommended publications
  • Inference Algorithm for Finite-Dimensional Spin Glasses: Belief Propagation on the Dual Lattice
    PHYSICAL REVIEW E 84, 046706 (2011) Inference algorithm for finite-dimensional spin glasses: Belief propagation on the dual lattice Alejandro Lage-Castellanos and Roberto Mulet Department of Theoretical Physics and Henri-Poincare´ Group of Complex Systems, Physics Faculty, University of Havana, La Habana, Codigo Postal 10400, Cuba Federico Ricci-Tersenghi Dipartimento di Fisica, INFN–Sezione di Roma 1, and CNR–IPCF, UOS di Roma, Universita` La Sapienza, Piazzale Aldo Moro 5, I-00185 Roma, Italy Tommaso Rizzo Dipartimento di Fisica and CNR–IPCF, UOS di Roma, Universita` La Sapienza, Piazzale Aldo Moro 5, I-00185 Roma, Italy (Received 18 February 2011; revised manuscript received 15 September 2011; published 24 October 2011) Starting from a cluster variational method, and inspired by the correctness of the paramagnetic ansatz [at high temperatures in general, and at any temperature in the two-dimensional (2D) Edwards-Anderson (EA) model] we propose a message-passing algorithm—the dual algorithm—to estimate the marginal probabilities of spin glasses on finite-dimensional lattices. We use the EA models in 2D and 3D as benchmarks. The dual algorithm improves the Bethe approximation, and we show that in a wide range of temperatures (compared to the Bethe critical temperature) our algorithm compares very well with Monte Carlo simulations, with the double-loop algorithm, and with exact calculation of the ground state of 2D systems with bimodal and Gaussian interactions. Moreover, it is usually 100 times faster than other provably convergent methods, as the double-loop algorithm. In 2D and 3D the quality of the inference deteriorates only where the correlation length becomes very large, i.e., at low temperatures in 2D and close to the critical temperature in 3D.
  • Many-Pairs Mutual Information for Adding Structure to Belief Propagation Approximations
    Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence (2008) Many-Pairs Mutual Information for Adding Structure to Belief Propagation Approximations Arthur Choi and Adnan Darwiche Computer Science Department University of California, Los Angeles Los Angeles, CA 90095 {aychoi,darwiche}@cs.ucla.edu Abstract instances of the satisfiability problem (Braunstein, Mezard,´ and Zecchina 2005). IBP has also been applied towards a We consider the problem of computing mutual informa- variety of computer vision tasks (Szeliski et al. 2006), par- tion between many pairs of variables in a Bayesian net- work. This task is relevant to a new class of Generalized ticularly stereo vision, where methods based on belief prop- Belief Propagation (GBP) algorithms that characterizes agation have been particularly successful. Iterative Belief Propagation (IBP) as a polytree approx- The successes of IBP as an approximate inference algo- imation found by deleting edges in a Bayesian network. rithm spawned a number of generalizations, in the hopes of By computing, in the simplified network, the mutual in- understanding the nature of IBP, and further to improve on formation between variables across a deleted edge, we the quality of its approximations. Among the most notable can estimate the impact that recovering the edge might are the family of Generalized Belief Propagation (GBP) al- have on the approximation. Unfortunately, it is compu- gorithms (Yedidia, Freeman, and Weiss 2005), where the tationally impractical to compute such scores for net- works over many variables having large state spaces. “structure” of an approximation is given by an auxiliary So that edge recovery can scale to such networks, we graph called a region graph, which specifies (roughly) which propose in this paper an approximation of mutual in- regions of the original network should be handled exactly, formation which is based on a soft extension of d- and to what extent different regions should be mutually con- separation (a graphical test of independence in Bayesian sistent.
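The edge-recovery scores discussed in this abstract are mutual-information values between the endpoints of a deleted edge. As a generic, hedged illustration (not the paper's method or code), the mutual information of two discrete variables can be computed directly from their joint probability table:

```python
import numpy as np

def mutual_information(joint):
    """I(X;Y) in nats, computed from a joint probability table p(x, y)."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)      # marginal p(x)
    py = joint.sum(axis=0, keepdims=True)      # marginal p(y)
    mask = joint > 0
    return float((joint[mask] * np.log(joint[mask] / (px @ py)[mask])).sum())

# Strongly coupled endpoints give high mutual information, so recovering such
# an edge is expected to change the approximation the most.
print(mutual_information([[0.45, 0.05],
                          [0.05, 0.45]]))      # ~ 0.37 nats
```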
  • Variational Message Passing
    Journal of Machine Learning Research 6 (2005) 661–694 Submitted 2/04; Revised 11/04; Published 4/05 Variational Message Passing John Winn [email protected] Christopher M. Bishop [email protected] Microsoft Research Cambridge Roger Needham Building 7 J. J. Thomson Avenue Cambridge CB3 0FB, U.K. Editor: Tommi Jaakkola Abstract Bayesian inference is now widely established as one of the principal foundations for machine learn- ing. In practice, exact inference is rarely possible, and so a variety of approximation techniques have been developed, one of the most widely used being a deterministic framework called varia- tional inference. In this paper we introduce Variational Message Passing (VMP), a general purpose algorithm for applying variational inference to Bayesian Networks. Like belief propagation, VMP proceeds by sending messages between nodes in the network and updating posterior beliefs us- ing local operations at each node. Each such update increases a lower bound on the log evidence (unless already at a local maximum). In contrast to belief propagation, VMP can be applied to a very general class of conjugate-exponential models because it uses a factorised variational approx- imation. Furthermore, by introducing additional variational parameters, VMP can be applied to models containing non-conjugate distributions. The VMP framework also allows the lower bound to be evaluated, and this can be used both for model comparison and for detection of convergence. Variational message passing has been implemented in the form of a general purpose inference en- gine called VIBES (‘Variational Inference for BayEsian networkS’) which allows models to be specified graphically and then solved variationally without recourse to coding.
  • Linearized and Single-Pass Belief Propagation
Linearized and Single-Pass Belief Propagation. Wolfgang Gatterbauer, Stephan Günnemann, Danai Koutra, Christos Faloutsos (Carnegie Mellon University).
ABSTRACT: How can we tell when accounts are fake or real in a social network? And how can we tell which accounts belong to liberal, conservative or centrist users? Often, we can answer such questions and label nodes in a network based on the labels of their neighbors and appropriate assumptions of homophily ("birds of a feather flock together") or heterophily ("opposites attract"). One of the most widely used methods for this kind of inference is Belief Propagation (BP), which iteratively propagates the information from a few nodes with explicit labels throughout a network until convergence. A well-known problem with BP, however, is that there are no known exact guarantees of convergence in graphs with loops.
Figure 1: Three types of network effects with example coupling matrices. Shading intensity corresponds to the affinities or coupling strengths between classes of neighboring nodes.
(a) homophily (D: Democrats, R: Republicans): D [0.8, 0.2]; R [0.2, 0.8]
(b) heterophily (T: Talkative, S: Silent): T [0.3, 0.7]; S [0.7, 0.3]
(c) general case (H: Honest, A: Accomplice, F: Fraudster): H [0.6, 0.3, 0.1]; A [0.3, 0.0, 0.7]; F [0.1, 0.7, 0.2]
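To give a feel for the coupling-matrix idea the abstract describes, here is a deliberately naive propagation sketch; it is not the paper's linearized (LinBP) update rule, and the toy graph, seed labels, and synchronous schedule are all assumptions. Beliefs at unlabeled nodes are repeatedly combined with neighbors' beliefs passed through the homophily matrix of Figure 1(a).

```python
import numpy as np

# Homophily coupling between classes 0 = D and 1 = R (Figure 1a).
H = np.array([[0.8, 0.2],
              [0.2, 0.8]])

edges = [(0, 1), (1, 2), (2, 3)]           # small path graph (assumed)
beliefs = np.full((4, 2), 0.5)             # uninformative initial beliefs
beliefs[0] = [0.9, 0.1]                    # node 0 labeled (almost surely) D
beliefs[3] = [0.1, 0.9]                    # node 3 labeled (almost surely) R

for _ in range(20):                        # naive synchronous updates
    new = beliefs.copy()
    for i in range(4):
        msg = np.ones(2)
        for a, b in edges:
            if i in (a, b):
                j = b if i == a else a
                msg *= H @ beliefs[j]      # neighbor's belief passed through the coupling
        new[i] = msg / msg.sum()
    new[0], new[3] = beliefs[0], beliefs[3]  # keep the labeled nodes clamped
    beliefs = new

print(beliefs.round(3))   # unlabeled nodes lean toward the class of their nearer seed
```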
  • α Belief Propagation for Approximate Inference
α Belief Propagation for Approximate Inference. Dong Liu, Minh Thành Vu, Zuxing Li, and Lars K. Rasmussen. KTH Royal Institute of Technology, Stockholm, Sweden. e-mail: {doli, mtvu, zuxing, [email protected]}.
Abstract: The belief propagation (BP) algorithm is a widely used message-passing method for inference in graphical models. BP on loop-free graphs converges in linear time. But for graphs with loops, BP's performance is uncertain, and the understanding of its solution is limited. To gain a better understanding of BP in general graphs, we derive an interpretable belief propagation algorithm that is motivated by minimization of a localized α-divergence. We term this algorithm α belief propagation (α-BP). It turns out that α-BP generalizes standard BP. In addition, this work studies the convergence properties of α-BP. We prove and offer the convergence conditions for α-BP. Experimental simulations on random graphs validate our theoretical results. The application of α-BP to practical problems is also demonstrated.
I. INTRODUCTION. Bayesian inference provides a general mathematical framework for many learning tasks such as classification, denoising, object detection, and signal detection. The wide range of applications includes, but is not limited to, image processing [34], multi-input multi-output (MIMO) signal detection in digital communication [2], [10], inference on structured lattices [5], and machine learning [19], [14], [33]. Generally speaking, a core problem in these applications is inference on the statistical properties of a (hidden) variable x = (x_1, ..., x_N) that usually cannot be observed. Specifically, practical interest usually lies in computing the most probable state x given the joint probability p(x), or a marginal probability p(x_c), where x_c is a subset of x.
  • Markov Random Field Modeling, Inference & Learning in Computer Vision & Image Understanding: a Survey Chaohui Wang, Nikos Komodakis, Nikos Paragios
Markov Random Field Modeling, Inference & Learning in Computer Vision & Image Understanding: A Survey. Chaohui Wang, Nikos Komodakis, Nikos Paragios. Computer Vision and Image Understanding, Elsevier, 2013, 117 (11), pp. 1610-1627. doi:10.1016/j.cviu.2013.07.004. HAL Id: hal-00858390v1, https://hal.archives-ouvertes.fr/hal-00858390v1 (submitted on 5 Sep 2013, v1; last revised 6 Sep 2013, v2).
Affiliations: (a) Center for Visual Computing, Ecole Centrale Paris, Grande Voie des Vignes, Châtenay-Malabry, France; (b) Perceiving Systems Department, Max Planck Institute for Intelligent Systems, Tübingen, Germany; (c) LIGM laboratory, University Paris-East & Ecole des Ponts ParisTech, Marne-la-Vallée, France; (d) GALEN Group, INRIA Saclay - Île de France, Orsay, France.
Abstract: In this paper, we present a comprehensive survey of Markov Random Fields (MRFs) in computer vision and image understanding, with respect to the modeling, the inference and the learning.
  • Belief Propagation for the (Physicist) Layman
Belief Propagation for the (physicist) layman. Florent Krzakala, ESPCI ParisTech, Laboratoire PCT, UMR Gulliver CNRS-ESPCI 7083, 10 rue Vauquelin, 75231 Paris, France. http://www.pct.espci.fr/~florent/ (Dated: January 30, 2011).
These lecture notes have been prepared for a series of courses in a master's program in Lyon (France), and other teaching in Beijing (China) and Tokyo (Japan). They consist of a short introduction to message-passing algorithms, and in particular Belief Propagation, using a statistical physics formulation. They are intended as a first introduction to such ideas, which are now starting to be widely used in physics, constraint optimization, Bayesian inference and quantitative biology. The reader interested in pursuing her journey beyond these notes is referred to [1].
I. WHY STATISTICAL PHYSICS? Why should we speak about statistical physics in this lecture about complex systems?
A. Physics. First of all, because of physics itself. Statistical physics is a fundamental tool and one of the most formidable successes of modern physics. Here, we shall however use it only as a probabilistic toolbox for other problems. It is perhaps better to do a recap before we start. In statistical physics, we consider N discrete (not always, actually, but for now let us assume discrete) variables σi and a cost function H({σ}), which we call the Hamiltonian. We introduce the temperature T (and the inverse temperature β = 1/T), and the idea is that each configuration, or assignment, of the variables {σ} has a weight e^{−βH({σ})}, which is called the
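As a minimal illustration (with assumed parameters, not taken from the lecture notes) of the weighting e^{−βH({σ})} just described, i.e. the Boltzmann weight, one can brute-force a three-spin Ising chain and read off a marginal:

```python
import numpy as np
from itertools import product

J, beta = 1.0, 0.5                       # coupling and inverse temperature (assumed)

def H(s):
    """Ising chain Hamiltonian H(s) = -J * (s1*s2 + s2*s3)."""
    return -J * (s[0] * s[1] + s[1] * s[2])

states = list(product([-1, 1], repeat=3))
weights = np.array([np.exp(-beta * H(s)) for s in states])
probs = weights / weights.sum()          # normalizing by Z = sum of Boltzmann weights

# Marginal probability that the middle spin is +1 (0.5 by symmetry here).
print(sum(p for s, p in zip(states, probs) if s[1] == 1))
```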
  • Efficient Belief Propagation for Higher Order Cliques Using Linear Constraint Nodes
Efficient Belief Propagation for Higher Order Cliques Using Linear Constraint Nodes. Brian Potetz, Tai Sing Lee. Department of Computer Science & Center for the Neural Basis of Cognition, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213.
Abstract: Belief propagation over pairwise-connected Markov Random Fields has become a widely used approach, and has been successfully applied to several important computer vision problems. However, pairwise interactions are often insufficient to capture the full statistics of the problem. Higher-order interactions are sometimes required. Unfortunately, the complexity of belief propagation is exponential in the size of the largest clique. In this paper, we introduce a new technique to compute belief propagation messages in time linear with respect to clique size for a large class of potential functions over real-valued variables. We discuss how this technique can be generalized to still wider classes of potential functions at varying levels of efficiency. Also, we develop a form of nonparametric belief representation specifically designed to address issues common to networks with higher-order cliques and also to the use of guaranteed-convergent forms of belief propagation. To illustrate these techniques, we perform efficient inference in graphical models where the spatial prior of natural images is captured by 2×2 cliques. This approach shows significant improvement over the commonly used pairwise-connected models, and may benefit a variety of applications using belief propagation to infer images or range images, including stereo, shape-from-shading, image-based rendering, segmentation, and matting.
Key words: Belief propagation, Higher order cliques, Non-pairwise cliques, Factor graphs, Continuous Markov Random Fields
  • 13 : Variational Inference: Loopy Belief Propagation and Mean Field
10-708: Probabilistic Graphical Models, Spring 2012. 13 : Variational Inference: Loopy Belief Propagation and Mean Field. Lecturer: Eric P. Xing. Scribes: Peter Schulam and William Wang.
1 Introduction. Inference problems involve answering a query that concerns the likelihood of observed data. For example, to answer a query about a marginal p(x_A), we can perform the marginalization operation, summing p(x) over the variables not in A. Or, for queries concerning conditionals, such as p(x_A | x_B), we can first compute the joint, and divide by the marginal p(x_B). Sometimes, to answer a query, we might also need to compute the mode of the density, x̂ = arg max_{x ∈ X^m} p(x). So far in the class, we have covered the exact inference problem. To perform exact inference, we know that brute-force search might be too inefficient for large graphs with complex structures, so a family of message-passing algorithms, such as forward-backward, sum-product, max-product, and junction tree, was introduced. However, although we know that these message-passing-based exact inference algorithms work well for tree-structured graphical models, it was also shown in class that they might not yield consistent results for loopy graphs, or the convergence might not be guaranteed. Also, for complex graphical models such as the Ising model, we cannot run an exact inference algorithm such as the junction tree algorithm, because it is computationally intractable. In this lecture, we look at two variational inference algorithms: loopy belief propagation (yww) and mean field approximation (pschulam).
2 Loopy Belief Propagation. The general idea of loopy belief propagation is that even though we know the graph contains loops and the messages might circulate indefinitely, we still let it run anyway and hope for the best.
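A tiny brute-force example (not from the scribe notes; the pairwise factors are made up) of the three query types mentioned above, a marginal, a conditional, and the mode, for a three-variable chain model:

```python
import numpy as np
from itertools import product

# Toy chain p(x1, x2, x3) proportional to f(x1, x2) * g(x2, x3), binary variables.
f = np.array([[4.0, 1.0], [1.0, 4.0]])   # assumed pairwise factors
g = np.array([[2.0, 1.0], [1.0, 2.0]])

joint = {x: f[x[0], x[1]] * g[x[1], x[2]] for x in product([0, 1], repeat=3)}
Z = sum(joint.values())                  # normalizing constant

p_x1 = [sum(v for x, v in joint.items() if x[0] == a) / Z for a in (0, 1)]    # marginal p(x1)
p_x1_given_x3 = [sum(v for x, v in joint.items() if x[0] == a and x[2] == 1)
                 / sum(v for x, v in joint.items() if x[2] == 1)
                 for a in (0, 1)]                                             # conditional p(x1 | x3 = 1)
mode = max(joint, key=joint.get)                                              # arg max_x p(x)

print(p_x1, p_x1_given_x3, mode)
```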
  • Characterization of Belief Propagation and Its Generalizations
MITSUBISHI ELECTRIC RESEARCH LABORATORIES, http://www.merl.com. Characterization of belief propagation and its generalizations. Jonathan S. Yedidia, William T. Freeman, Yair Weiss. TR2001-15, March 2001.
Abstract: Graphical models are used in many scientific disciplines, including statistical physics, machine learning, and error-correcting coding. One typically seeks the marginal probability of selected variables (nodes of the graph) given observations at other variables. Belief propagation (BP) is a fast marginalization method, based on passing local messages. Designed for singly-connected graphs, BP nonetheless works well in many applications involving graphs with loops, for reasons that were not well understood. We characterize the BP solutions, showing that BP can only converge to a stationary point of an approximate free energy, known as the Bethe free energy in statistical physics. This understanding lets us construct new message-passing algorithms based on improvements to Bethe's approximation introduced by Kikuchi and others. The new generalized belief propagation (GBP) algorithms are much more accurate than ordinary BP for some problems, and permit solutions to Kikuchi approximations for otherwise intractable inhomogeneous systems. We illustrate GBP with a spin-glass example and an error-correcting code, showing dramatically improved estimates of local magnetizations and decoding performance using GBP.
  • Particle Belief Propagation
    Particle Belief Propagation Alexander Ihler David McAllester [email protected] [email protected] Dept. of Computer Science Toyota Technological Institite, Chicago University of California, Irvine Abstract passing” algorithms such as belief propagation (Pearl, 1988). Traditionally, most such work has focused on systems of many variables, each of which has a rela- The popularity of particle filtering for infer- tively small state space (number of possible values), ence in Markov chain models defined over or particularly nice parametric forms (such as jointly random variables with very large or contin- Gaussian distributions). For systems with continuous- uous domains makes it natural to consider valued variables, or discrete-valued variables with very sample–based versions of belief propagation large domains, one possibility is to reduce the effec- (BP) for more general (tree–structured or tive state space through gating, or discarding low- loopy) graphs. Already, several such al- probability states (Freeman et al., 2000; Coughlan and gorithms have been proposed in the litera- Ferreira, 2002), or through random sampling (Arulam- ture. However, many questions remain open palam et al., 2002; Koller et al., 1999; Sudderth et al., about the behavior of particle–based BP al- 2003; Isard, 2003; Neal et al., 2003). The best-known gorithms, and little theory has been devel- example of the latter technique is particle filtering, de- oped to analyze their performance. In this fined on Markov chains, in which each distribution is paper, we describe a generic particle belief represented using a finite collection of samples, or par- propagation (PBP) algorithm which is closely ticles. It is therefore only natural to consider general- related to previously proposed methods.
  • Belief Propagation Neural Networks
    Belief Propagation Neural Networks Jonathan Kuck1, Shuvam Chakraborty1, Hao Tang2, Rachel Luo1, Jiaming Song1, Ashish Sabharwal3, and Stefano Ermon1 1Stanford University 2Shanghai Jiao Tong University 3Allen Institute for Artificial Intelligence {kuck,shuvamc,rsluo,tsong,ermon}@stanford.edu, [email protected], [email protected] Abstract Learned neural solvers have successfully been used to solve combinatorial opti- mization and decision problems. More general counting variants of these problems, however, are still largely solved with hand-crafted solvers. To bridge this gap, we introduce belief propagation neural networks (BPNNs), a class of parameterized operators that operate on factor graphs and generalize Belief Propagation (BP). In its strictest form, a BPNN layer (BPNN-D) is a learned iterative operator that provably maintains many of the desirable properties of BP for any choice of the parameters. Empirically, we show that by training BPNN-D learns to perform the task better than the original BP: it converges 1.7x faster on Ising models while pro- viding tighter bounds. On challenging model counting problems, BPNNs compute estimates 100’s of times faster than state-of-the-art handcrafted methods, while returning an estimate of comparable quality. 1 Introduction Probabilistic inference problems arise in many domains, from statistical physics to machine learning. There is little hope that efficient, exact solutions to these problems exist as they are at least as hard as NP-complete decision problems. Significant research has been devoted across the fields of machine learning, statistics, and statistical physics to develop variational and sampling based methods to approximate these challenging problems [13, 34, 48, 6, 38].