Understanding and Improving Belief Propagation
An academic essay in the field of the Natural Sciences, Mathematics and Computer Science
Doctoral thesis
to obtain the degree of doctor from the Radboud Universiteit Nijmegen, on the authority of the Rector Magnificus, Prof. mr. S.C.J.J. Kortmann, according to the decision of the Council of Deans, to be defended in public on Wednesday, 7 May 2008, at 13:30 hours precisely
by
Joris Marten Mooij
born in Nijmegen on 11 March 1980

Promotor: Prof. dr. H.J. Kappen
Manuscript committee:
Prof. dr. N.P. Landsman
Prof. dr. M. Opper (University of Southampton)
Prof. dr. Z. Ghahramani (University of Cambridge)
Prof. dr. T.S. Jaakkola (Massachusetts Institute of Technology)
Dr. T. Heskes
The research reported here was sponsored by the Interactive Collaborative Information Systems (ICIS) project (supported by the Dutch Ministry of Economic Affairs, grant BSIK03024) and by the Dutch Technology Foundation (STW).
Copyright © 2008 Joris Mooij
ISBN 978-90-9022787-0
Printed by PrintPartners Ipskamp, Enschede

Contents
Title page
Table of Contents

1 Introduction
  1.1 A gentle introduction to graphical models
    1.1.1 The Asia network: an example of a Bayesian network
    1.1.2 The trade-off between computation time and accuracy
    1.1.3 Image processing: an example of a Markov random field
    1.1.4 Summary
  1.2 A less gentle introduction to Belief Propagation
    1.2.1 Bayesian networks
    1.2.2 Markov random fields
    1.2.3 Factor graphs
    1.2.4 Inference in graphical models
    1.2.5 Belief Propagation: an approximate inference method
    1.2.6 Related approximate inference algorithms
    1.2.7 Applications of Belief Propagation
  1.3 Outline of this thesis

2 Sufficient conditions for convergence of BP
  2.1 Introduction
  2.2 Background
    2.2.1 Factor graphs
    2.2.2 Belief Propagation
  2.3 Special case: binary variables with pairwise interactions
    2.3.1 Normed spaces, contractions and bounds
    2.3.2 The basic tool
    2.3.3 Sufficient conditions for BP to be a contraction
    2.3.4 Beyond norms: the spectral radius
    2.3.5 Improved bound for strong local evidence
  2.4 General case
    2.4.1 Quotient spaces
    2.4.2 Constructing a norm on V
    2.4.3 Local ℓ∞ norms
    2.4.4 Special cases
    2.4.5 Factors containing zeros
  2.5 Comparison with other work
    2.5.1 Comparison with work of Tatikonda and Jordan
    2.5.2 Comparison with work of Ihler et al.
    2.5.3 Comparison with work of Heskes
  2.6 Numerical comparison of various bounds
    2.6.1 Uniform couplings, uniform local field
    2.6.2 Nonuniform couplings, zero local fields
    2.6.3 Fully random models
  2.7 Discussion
  2.A Generalizing the ℓ1-norm
  2.B Proof that (2.43) equals (2.44)

3 BP and phase transitions
  3.1 Introduction
  3.2 The Bethe approximation and the BP algorithm
    3.2.1 The graphical model
    3.2.2 Bethe approximation
    3.2.3 BP algorithm
    3.2.4 The connection between BP and the Bethe approximation
  3.3 Stability analysis for binary variables
    3.3.1 BP for binary variables
    3.3.2 Local stability of undamped, parallel BP fixed points
    3.3.3 Local stability conditions for damped, parallel BP
    3.3.4 Uniqueness of BP fixed points and convergence
    3.3.5 Properties of the Bethe free energy for binary variables
  3.4 Phase transitions
    3.4.1 Ferromagnetic interactions
    3.4.2 Antiferromagnetic interactions
    3.4.3 Spin-glass interactions
  3.5 Estimates of phase-transition temperatures
    3.5.1 Random graphs with arbitrary degree distributions
    3.5.2 Estimating the PA-FE transition temperature
    3.5.3 The antiferromagnetic case
    3.5.4 Estimating the PA-SG transition temperature
  3.6 Conclusions
  3.A Proof of Theorem 3.2

4 Loop Corrections
  4.1 Introduction
  4.2 Theory
    4.2.1 Graphical models and factor graphs
    4.2.2 Cavity networks and loop corrections
    4.2.3 Combining approximate cavity distributions to cancel out errors
    4.2.4 A special case: factorized cavity distributions
    4.2.5 Obtaining initial approximate cavity distributions
    4.2.6 Differences with the original implementation
  4.3 Numerical experiments
    4.3.1 Random regular graphs with binary variables
    4.3.2 Multi-variable factors
    4.3.3 Alarm network
    4.3.4 Promedas networks
  4.4 Discussion and conclusion
  4.A Original approach by Montanari and Rizzo (2005)
    4.A.1 Neglecting higher-order cumulants
    4.A.2 Linearized version

5 Novel bounds on marginal probabilities
  5.1 Introduction
  5.2 Theory
    5.2.1 Factor graphs
    5.2.2 Convexity
    5.2.3 Measures and operators
    5.2.4 Convex sets of measures
    5.2.5 Boxes and smallest bounding boxes
    5.2.6 The basic lemma
    5.2.7 Examples
    5.2.8 Propagation of boxes over a subtree
    5.2.9 Bounds using self-avoiding walk trees
  5.3 Related work
    5.3.1 The Dobrushin-Tatikonda bound
    5.3.2 The Dobrushin-Taga-Mase bound
    5.3.3 Bound Propagation
    5.3.4 Upper and lower bounds on the partition sum
  5.4 Experiments
    5.4.1 Grids with binary variables
    5.4.2 Grids with ternary variables
    5.4.3 Medical diagnosis
  5.5 Conclusion and discussion

List of Notations
Bibliography
Summary
Samenvatting
Publications
Acknowledgments
Curriculum Vitae
Chapter 1
Introduction
This chapter gives a short introduction to graphical models, explains the Belief Propagation algorithm that is central to this thesis and motivates the research reported in later chapters. The first section uses intuitively appealing examples to illustrate the most important concepts and should be readable even for those who have no background in science. Hopefully, it succeeds in giving a relatively clear answer to the question “Can you explain what your research is about?” that often causes the author some difficulties at birthday parties. The second section does assume a background in science. It gives more precise definitions of the concepts introduced earlier and it may be skipped by the less or differently specialized reader. The final section gives a short introduction to the research questions that are studied in this thesis.
1.1 A gentle introduction to graphical models
Central to the research reported in this thesis are the concepts of probability theory and graph theory, which are both branches of mathematics that occur widely in many different applications. Quite recently, these two branches of mathematics have been combined in the field of graphical models. In this section I will explain by means of two “canonical” examples the concept of graphical models. Graphical models can be roughly divided into two types, called Bayesian networks and Markov random fields. The concept of a Bayesian network will be introduced in the first subsection using an example from the context of medical diagnosis. In the second subsection, we will discuss the basic trade-off in the calculation of (approximations to) probabilities, namely that of computation time and accuracy. In the third subsection, the concept of a Markov random field will be explained using an example from the field of image processing. The fourth subsection is a short summary and briefly describes the research questions addressed in this thesis.

Figure 1.1: The Asia network, an example of a Bayesian network. The graph has nodes A, S, T, L, B, E, X and D, with the following meanings:

Random variable   Meaning
A                 Recent trip to Asia
T                 Patient has tuberculosis
S                 Patient is a smoker
L                 Patient has lung cancer
B                 Patient has bronchitis
E                 Patient has T and/or L
X                 Chest X-ray is positive
D                 Patient has dyspnoea
1.1.1 The Asia network: an example of a Bayesian network

To explain the concept of a Bayesian network, I will make use of the highly simplified and stylized hypothetical example of a doctor who tries to find the most probable diagnosis that explains the symptoms of a patient. This example, called the Asia network, is borrowed from Lauritzen and Spiegelhalter [1988]. The Asia network is a simple example of a Bayesian network. It describes the probabilistic relationships between different random variables, which in this particular example correspond to possible diseases, possible symptoms, risk factors and test results. The Asia network illustrates the mathematical modeling of reasoning in the presence of uncertainty as it occurs in medical diagnosis. A graphical representation of the Asia network is given in figure 1.1. The nodes of the graph (visualized as circles) represent random variables. The edges of the graph connecting the nodes (visualized as arrows between the circles) represent probabilistic dependencies between the random variables. Qualitatively, the model represents the following (highly simplified) medical knowledge. A recent trip to Asia (A) increases the chance of contracting tuberculosis (T). Smoking (S) is a risk factor for both lung cancer (L) and bronchitis (B). The presence of either (E) tuberculosis or lung cancer can be detected by an X-ray (X), but the X-ray cannot distinguish between them. Dyspnoea (D), or shortness of breath, may be caused by either (E) tuberculosis or lung cancer, but also by bronchitis (B). In this particular Bayesian network, all these random variables can have two possible values: either “yes” or “no”, which we will abbreviate as “y” and “n”, respectively.¹ This model can be used to answer several questions in the following hypothetical situation. Imagine a patient who complains about dyspnoea and has recently visited
¹ In general, the number of possible values of a random variable is unlimited.
Asia. The doctor would like to know the probabilities that each of the diseases (lung cancer, tuberculosis and bronchitis) is present. Suppose that tuberculosis can be ruled out by another test; how would that change the belief in lung cancer? Further, would knowing smoking history or getting an X-ray be most informative about the probability of lung cancer? Finally, which information was the most important for forming the diagnosis? In order to proceed, it will be convenient to introduce some notation from probability theory. The probability that some statement F is true is denoted by P(F). Probabilities are numbers between 0 and 1, where P(F) = 0 means that F is false with absolute certainty, P(F) = 1 means that F is true with absolute certainty, and if P(F) is anything in between, it means that it is not certain whether F is true or false. If P(F) is close to 0, it is unlikely that F is true, whereas if P(F) is close to 1 it is likely that F is true. For our purposes, the statement F can be any instantiation of (one or more of) the random variables that we are considering. For example, the statement can be “the patient has bronchitis”, which is an instantiation of the random variable B, that can be abbreviated as “B = y”. Another possible statement is “the patient does not have bronchitis”, which we can write as “B = n”. Thus P(B = y) is the probability that the patient has bronchitis and P(B = n) is the probability that the patient does not have bronchitis. Both probabilities have to sum to one: P(B = y) + P(B = n) = 1, because the patient either has or does not have bronchitis. The statement can also be a more complicated combination of random variables, e.g., P(S = y, L = n) is the probability that the patient smokes but does not have lung cancer. In addition we need another notion and notation from probability theory, namely that of conditional probabilities. If we are given more information about the patient, probabilities may change.
For example, the probability that the patient has lung cancer increases if we learn that the patient smokes. For statements F and G, the conditional probability of F, given that G is true, is denoted as P(F | G). As before, the statements F and G are instantiations of (some of the) random variables that we are considering. For example, the conditional probability that the patient has lung cancer given that the patient smokes is denoted as P(L = y | S = y). The value of this conditional probability is higher than P(L = y), which is the probability that the patient has lung cancer if we have no further information about whether the patient smokes or not. Another example of a conditional probability is P(D = y | B = y, E = n); this is the probability that a patient has dyspnoea, given that the patient has bronchitis but has neither tuberculosis nor lung cancer. The numerical values for the probabilities can be provided by medical studies. For example, according to the results of Villeneuve and Mao [1994], the lifetime probability of developing lung cancer, given that one is a smoker, is about 14%, whereas it is only about 1.4% if one has never smoked.²

² In reality, the probability of developing lung cancer is different for males and females and depends on many other variables, such as age and the smoking history of the patient. We will come back to this point later.

The complete conditional probability table for L given S (i.e., whether the patient develops lung cancer, given the smoking status of the patient) is then:
P(L | S)   S = y   S = n
L = y      14%     1.4%
L = n      86%     98.6%
Note that each column sums to 100%, which expresses that with absolute certainty the patient either develops lung cancer or not. This conditional probability table for P(L | S) corresponds with the edge from S to L in figure 1.1. Another conditional probability table that we can easily specify (even without consulting medical studies) is P(E | T, L):

P(E | T, L)   T = y, L = y   T = n, L = y   T = y, L = n   T = n, L = n
E = y         100%           100%           100%           0%
E = n         0%             0%             0%             100%
This simply reflects the definition of “T and/or L” in terms of T and L according to elementary logic. The conditional probability table for P(E | T, L) corresponds with the edges from T and L to E in figure 1.1. Another probability table (not a conditional one) that is relevant here is P(S), the probability that the patient smokes. In 2006, the percentage of smokers in The Netherlands was 29.6%.³ Therefore a realistic probability table for P(S) is:
S   P(S)
y   29.6%
n   70.4%

This probability table corresponds with the node S in figure 1.1. In order to give a complete quantitative specification of the graphical model shown in figure 1.1, one would have to specify each of the following probability tables: P(A), P(T | A), P(L | S), P(B | S), P(D | B, E), P(E | T, L), P(X | E) and P(S). Note how this corresponds with the graph: for each random variable, we need the probability distribution of that variable conditional on its parents. By the parents of a variable we mean those random variables that point directly towards it. For example, the parents of D are E and B, whereas S has no parents. This means that we have to specify the conditional probability table for P(D | E, B) and the probability table of P(S). We will not explicitly give all these (conditional) probability tables here but will assume that they are known and that the graphical model illustrated in figure 1.1 is thus completely specified. Then, the complete
³ According to the CBS (Centraal Bureau voor Statistiek).

joint probability distribution of all the random variables can be obtained simply by multiplying all the (conditional) probability tables:
P(A, T, L, S, B, X, E, D) = P(A) × P(T | A) × P(L | S) × P(B | S)
                            × P(E | T, L) × P(D | B, E) × P(X | E) × P(S).   (1.1)

This formula should be read as follows. Given an instantiation of all 8 random variables, e.g., A = n, T = n, L = n, S = y, B = n, X = n, E = n, D = n, we can calculate the probability of that instantiation by multiplying the corresponding values in the smaller probability tables:
P(A = n, T = n, L = n, S = y, B = n, X = n, E = n, D = n)
  = P(A = n) × P(T = n | A = n) × P(L = n | S = y)
    × P(B = n | S = y) × P(E = n | T = n, L = n)
    × P(D = n | B = n, E = n) × P(X = n | E = n) × P(S = y),

which will give us some number (0.150837 if you use the original model by Lauritzen and Spiegelhalter [1988]). Because the model consists of 8 binary (i.e., yes/no valued) random variables, the complete probability table for the joint probability distribution P(A, T, L, S, B, X, E, D) would contain 2^8 = 256 entries: one value for each possible assignment of all the random variables. Part of this table is given in table 1.1.⁴ The completely specified model can be used to answer many different questions that a doctor might be interested in. Let us return to the hypothetical example of the patient who complains about dyspnoea and has recently visited Asia. The doctor would like to know the probability that each of the diseases (T, L and B) is present. Thus, e.g., the doctor would like to know the value of:
P(T = y | D = y, A = y)

according to the model. An elementary fact of probability theory tells us that we can calculate this quantity by dividing two other probabilities:
P(T = y | D = y, A = y) = P(T = y, D = y, A = y) / P(D = y, A = y).   (1.2)

Another elementary result of probability says that, if we would like to calculate a probability of some instantiation of a particular subset of random variables, we
⁴ Note that by representing the joint probability distribution as a Bayesian network we actually need fewer than 256 numbers to specify the complete probabilistic model; indeed, we need just 2 + 4 + 4 + 4 + 8 + 8 + 4 + 2 = 36 numbers to specify all the probability tables P(A), P(T | A), P(L | S), P(B | S), P(E | T, L), P(D | B, E), P(X | E), P(S). In fact, we can use even fewer numbers because the columns of the tables sum to one. This efficiency of representation is one of the advantages of using a Bayesian network to specify a probability distribution over the alternative of “simply” writing down the complete joint probability table. By using equation (1.1), we can calculate any probability that we need.
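The parameter count in the footnote above follows directly from the graph: a table over binary variables has 2^(1+p) entries when the variable has p parents. A minimal sketch of this sanity check (the parent counts are read off figure 1.1):

```python
# Number of parents of each node in the Asia network (figure 1.1).
parents = {'A': 0, 'S': 0, 'T': 1, 'L': 1, 'B': 1, 'X': 1, 'E': 2, 'D': 2}

# A (conditional) probability table over binary variables has
# 2 ** (1 + number_of_parents) entries: one column of 2 entries
# per joint instantiation of the parents.
table_sizes = {v: 2 ** (1 + p) for v, p in parents.items()}
total = sum(table_sizes.values())   # 2 + 2 + 4 + 4 + 4 + 4 + 8 + 8 = 36
```

This reproduces the 36 numbers mentioned in the footnote, compared to the 2^8 = 256 entries of the full joint table.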
Table 1.1: (Part of the) probability table for the full joint probability distribution of all 8 random variables in the Asia network. Only part of the table is shown here; the full table has 2^8 = 256 entries.

A T L S B X E D   P(A, T, L, S, B, X, E, D)
n n n n n n n n   0.290362
n n n n n n n y   0.032262
n n n n n n y n   0.000000
n n n n n n y y   0.000000
n n n n n y n n   0.015282
...
n n n y n n n n   0.150837
...
y y y y y y y y   0.000013
have to sum the joint probability of all the random variables over all the possible instantiations of the other random variables. As an example, to calculate the numerator of the fraction in equation (1.2), we would like to calculate the probability P(T = y, D = y, A = y) of the instantiation T = y, D = y, A = y of the three random variables T, D, A. Thus we have to sum over all possible instantiations of the other random variables S, L, E, B, X. In mathematical notation:
P(T = y, D = y, A = y)
  = Σ_{S,L,E,B,X} P(A = y, T = y, L, S, B, X, E, D = y)
  = Σ_{S,L,E,B,X} P(A = y) × P(T = y | A = y) × P(L | S) × P(B | S)
                  × P(E | T = y, L) × P(D = y | B, E) × P(X | E) × P(S),

where we used equation (1.1) to write out the joint probability in terms of the smaller probability tables. Because each random variable can have two possible values, there are 2 × 2 × 2 × 2 × 2 = 2^5 = 32 possible instantiations of the random variables S, L, E, B, X that contribute to the sum. A similar formula can be derived for the denominator P(D = y, A = y) of the fraction in equation (1.2); there we are interested in the probability of an instantiation of the random variables D, A and therefore we have to sum over all possible instantiations of the six other random variables T, S, L, E, B, X, which gives a sum of 2^6 = 64 terms. In this way, we can calculate that P(T = y | D = y, A = y) ≈ 8.7751% if one uses the same model specification as in [Lauritzen and Spiegelhalter, 1988]. Obviously, it would take a human quite some time to do these calculations.
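The brute-force computation described above is easy to spell out in code. The sketch below follows equations (1.1) and (1.2); since this chapter does not list all of the network's probability tables, most of the numbers are invented placeholders (only P(S) and P(L | S) match values given in the text), so the resulting posterior is illustrative and will not equal the 8.7751% obtained from the original model.

```python
from itertools import product

# Hypothetical CPT entries; values marked "placeholder" are invented.
# Each table gives P(variable = 'y') as a function of its parents.
p_A = 0.01                                   # P(A = y), placeholder
p_S = 0.296                                  # P(S = y), from the text
p_T = {'y': 0.05, 'n': 0.01}                 # P(T = y | A), placeholder
p_L = {'y': 0.14, 'n': 0.014}                # P(L = y | S), from the text
p_B = {'y': 0.60, 'n': 0.30}                 # P(B = y | S), placeholder
p_X = {'y': 0.98, 'n': 0.05}                 # P(X = y | E), placeholder
p_D = {('y', 'y'): 0.90, ('y', 'n'): 0.80,   # P(D = y | B, E), placeholder
       ('n', 'y'): 0.70, ('n', 'n'): 0.10}

def bern(p, val):
    """Probability of val in {'y', 'n'} when P(y) = p."""
    return p if val == 'y' else 1.0 - p

def joint(A, T, L, S, B, X, E, D):
    """Equation (1.1): the joint is the product of all the tables.
    E is the deterministic 'T and/or L' variable."""
    if E != ('y' if (T == 'y' or L == 'y') else 'n'):
        return 0.0
    return (bern(p_A, A) * bern(p_S, S) * bern(p_T[A], T) *
            bern(p_L[S], L) * bern(p_B[S], B) * bern(p_X[E], X) *
            bern(p_D[(B, E)], D))

def prob(**fixed):
    """Marginal probability: sum the joint over all unfixed variables."""
    names = ['A', 'T', 'L', 'S', 'B', 'X', 'E', 'D']
    free = [n for n in names if n not in fixed]
    return sum(joint(**{**fixed, **dict(zip(free, vals))})
               for vals in product('yn', repeat=len(free)))

# Equation (1.2): P(T = y | D = y, A = y) as a ratio of two marginals.
# The numerator sums 2**5 = 32 terms, the denominator 2**6 = 64 terms.
posterior = prob(T='y', D='y', A='y') / prob(D='y', A='y')
```

The call `prob()` with no arguments sums all 2^8 = 256 terms of the joint table, which should give 1; this is a useful check on the implementation.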
However, computers can calculate very fast nowadays and we could instruct a computer in such a way that it performs precisely these calculations. Thereby it could assist the doctor by calculating (according to the model) the probabilities that each of the diseases is present, given that the patient has dyspnoea and has recently visited Asia. In a similar way, the elementary rules of probability theory can be applied in a rather straightforward manner to answer more difficult questions, like “If tuberculosis were ruled out by another test, how would that change the belief in lung cancer?”, “Would knowing smoking history or getting an X-ray contribute most information about cancer, given that smoking may ‘explain away’ the dyspnoea since bronchitis is considered a possibility?” and “When all information is in, can we identify which was the most influential in forming our judgment?” For readers who have not had any previous exposure to probability theory, the reasoning above may be difficult to understand. However, the important point is the following: any probability distribution that the doctor may be interested in (concerning the random variables in the model) in order to obtain a diagnosis for the patient can be calculated using elementary probability theory and the precise specification of the Bayesian network. For a human this would be a lengthy calculation, but a computer can do these calculations very fast (at least for this particular Bayesian network). As a final note, let us return to the probability that one develops lung cancer given that one smokes. One might object that this probability depends in reality on many other factors, such as the gender, the age, the amount of smoking and the number of years that the patient has been smoking.
However, we can in principle easily improve the realism of the model to take these dependences into account, e.g., by adding nodes for gender (G), age (Y) and smoking history (H), adding edges from these new nodes to L (lung cancer) and replacing the conditional probability table P(L | S) by a more complicated table P(L | S, G, Y, H) where the probability of developing lung cancer depends on more variables than in our simple model. This illustrates the modularity inherent in this way of modeling: if new medical studies result in more accurate knowledge about the chances of getting lung cancer from smoking, one only needs to modify the model locally (i.e., only change the model in the neighborhood of the nodes S and L). The rules of probability theory will ensure that answers to questions like “What disease most likely causes the positive X-ray?” depend on the complete model; improving the realism of the part of the model involving lung cancer and smoking will also automatically give a more accurate answer to those questions.
1.1.2 The trade-off between computation time and accuracy

For this small and simplified model, the calculations involved could even be done by hand. However, for larger and more realistic models, which may involve tens of thousands of variables which interact in highly complicated ways, the computational complexity to calculate the answer to questions like “What is the most likely disease
Table 1.2: The number of possible instantiations (i.e., joint assignments of all variables) as a function of the number N of binary variables.

N     2^N, the number of possible instantiations
1     2
2     4
3     8
4     16
5     32
10    1024
20    1048576
50    1125899906842624
100   1267650600228229401496703205376
200   1606938044258990275541962092341162602522202993782792835301376
that causes these symptoms?” explodes. Although it is easy in principle to write down a formula that gives the answer to that question (this would be a rather long formula, but similar to the ones we have seen before), actually calculating the result would involve adding enormous amounts of numbers. Indeed, in order to calculate a probability involving a few random variables, we have to add many probabilities, namely one for each possible instantiation of all the other random variables that we are not interested in. The number of such instantiations quickly increases with the number of variables, as shown in table 1.2. Even if we include only 200 diseases and symptoms in our model, calculating the probability of one disease given a few symptoms would require adding an enormous number of terms. Although a modern desktop computer can do many additions per second (about one billion, i.e., 1000000000), we conclude that for a realistic model involving thousands of variables, the patient will have died before the computer can calculate the probability of a single disease, even if we use the fastest supercomputer on earth.⁵ Thus although for small models we can actually calculate the probabilities of interest according to the model, it is completely impractical for large realistic models. It turns out that for the specific case of medical diagnosis, using certain assumptions on the probability distributions and several clever tricks (which are outside the scope of this introduction), one can significantly decrease the computation time needed to calculate the probabilities. Such a model, called Promedas, which contains thousands of diseases, symptoms and tests, is currently being developed in Nijmegen. It can calculate probabilities of interest within a few seconds on an
⁵ In fact, it is likely that the earth and maybe even the universe will have ceased to exist before a computer (using current technology) has calculated the probability of interest if one uses this “brute force” method of calculating probabilities. On the other hand, computers get faster each year: processing speed roughly doubles every 24 months. Extrapolating this variant of “Moore’s law” into the far future (which is not very realistic), it would take about three centuries before the calculation could be done within one day.
Figure 1.2: Left: reference image. Right: input image. The reference image defines the background and the input image consists of some interesting foreground imposed on the background. The task is to decide which part of the input image belongs to the foreground.

ordinary computer.⁶ However, there are many other applications (for which these simplifying assumptions cannot be made and the clever tricks cannot be applied) where the exact calculation of probabilities is impossible to perform within a reasonable amount of time. In these cases, one can try to calculate approximate probabilities using advanced approximation methods which have been specially developed for this purpose. If the approximate result can be calculated within a reasonable amount of time and its accuracy is sufficient for the application considered (e.g., knowing the probabilities that some disease causes the observed symptoms to ten decimal places is usually not necessary; one or two decimal places may be more than enough), then this forms a viable alternative to the exact calculation. This illustrates the basic trade-off in the field known as approximate inference: computation time versus accuracy.
1.1.3 Image processing: an example of a Markov random field

To introduce the concept of a Markov random field, another type of graphical model, I will use an example from the field of image processing. The task that we will consider is that of separating foreground from background. Suppose that we have two images, one of which is the reference image that defines the background, and one where there is some foreground in front of the background, which we call the input image (see figure 1.2 for an example). By comparing the input image with the reference image, we try to infer which part of the input image is foreground and which part belongs to the background. We can then extract only the foreground part of the input image, filtering out the background. This may have applications in surveillance (surveillance cameras only need to store the interesting parts of the
⁶ A demonstration version of Promedas is available at http://www.promedas.nl/
Figure 1.3: An image consists of many pixels, small squares with a uniform intensity. The images used here consist of 640 × 480 = 307200 pixels.
Figure 1.4: Left: difference between input and reference image. Right: simple estimate of foreground, obtained by thresholding the difference image on the left.

video frames, i.e., the foreground, and thus save storage space) and video conferencing (if we only transmit the foreground, this will save bandwidth and thus costs), but I have chosen this example mainly for educational purposes. As illustrated in figure 1.3, an image is digitally represented as a grid of many pixels: small squares that have a uniform intensity. The intensity of a pixel is a number between 0 and 255, where 0 is black and 255 is white, and anything in between is a shade of gray, where a larger intensity value corresponds with a lighter shade of gray. The images used in this example are 640 pixels in width and 480 pixels in height and were made by a surveillance camera. We will denote the intensity of a pixel at some location i of the reference image by R_i and the intensity of a pixel at the same location i of the input image by I_i. A crude way to separate foreground from background is to consider the differences between the images. More precisely, for each location i, we can calculate the absolute value of the difference in intensity for both pixels corresponding to that location, i.e., d_i := |I_i − R_i|.⁷ This can be done using some of the more advanced image processing applications, e.g., GIMP or Adobe® Photoshop®. Figure 1.4
⁷ For a real number x, its absolute value |x| is x if x ≥ 0 and −x if x < 0.
shows the difference image, where the absolute difference d_i between the intensity values I_i and R_i of the input and reference image is represented by a shade of gray (black corresponding to the maximum possible difference of 255 and white corresponding to the minimum possible difference of 0). We can now choose some threshold value c, and decide that all pixels i for which the absolute difference d_i is larger than c belong to the foreground, and all pixels for which the absolute difference d_i is smaller than c belong to the background. The result is shown in figure 1.4. This is a fast method, but the quality of the result is not satisfying: instead of identifying the whole person as foreground, it only identifies parts of the person as foreground, omitting many little and a few larger regions that (according to the human eye) clearly belong to the foreground. On the other hand, many little parts of the background, where the intensities of the reference and input image differ slightly because of changed lighting conditions, get incorrectly classified as foreground. In order to improve the classification, we would like to somehow impose the criterion that we are only interested in large contiguous foreground objects: we would like to catch a burglar, not a fly. The key idea is to also take into account neighboring pixels. Every pixel (except those on the border of the image) has four neighboring pixels: one to the left, one to the right, one above and one below. We are going to construct a probability model such that if the absolute difference d_i is large, the probability that the pixel at location i belongs to the foreground should be high. Furthermore, if the majority of the neighboring pixels of the pixel at location i belong to the foreground with high probability, then the probability that the pixel itself belongs to the foreground should also be high.
Vice versa, if the neighboring pixels belong to the background with high probability, then the probability that the pixel itself belongs to the background should increase. For each location i, we introduce a random variable x_i that can have two possible values: either x_i = −1, which means “the pixel at location i belongs to the background”, or x_i = +1, which means “the pixel at location i belongs to the foreground”. We are going to construct a probability distribution that takes into account all the 640 × 480 = 307200 random variables x_i. We choose this probability distribution in such a way that P(x_i = 1) is large if d_i > c (in words, the probability that pixel i belongs to the foreground is large if the difference between input and reference image at that location is large) and P(x_i = 1) is small if d_i < c. Note that P(x_i = 1) + P(x_i = −1) = 1 because the pixel at location i either belongs to the foreground or to the background. Thus, if P(x_i = 1) is large then P(x_i = −1) is small and vice versa. Furthermore, the probability P(x_i = 1) depends on the probabilities P(x_j = 1) for other pixels j. Indeed, if j is a neighbor of i, then we should have that P(x_i = 1 | x_j = 1) is larger than P(x_i = 1), i.e., the probability that pixel i belongs to the foreground should increase when we learn that its neighbor j belongs to the foreground. The actual construction of this probability distribution is a bit technical, but in principle we only translate our qualitative description above into a more quantitative and precise mathematical formulation. The complete probability distribution P({x_i}) is a function of all the 307200 random variables x_i (we write {x_i} when we refer to the whole collection of random variables x_i for all pixels i).⁸
The full probability distribution will be a product of two types of factors: a "local evidence" factor $\psi_i(x_i)$ for each pixel $i$ and a "compatibility" factor $\psi_{ij}(x_i, x_j)$ for each pair $\{i, j\}$ of neighboring pixels $i$ and $j$. We take the full probability distribution to be the product of all these factors:

$$P(\{x_i\}) = \frac{1}{Z} \left( \prod_i \psi_i(x_i) \right) \left( \prod_{\{i,j\}} \psi_{ij}(x_i, x_j) \right). \qquad (1.3)$$

This probability distribution is an example of a Markov random field. We have used a convenient mathematical abbreviation for what would otherwise be an enormous formula: $\prod_i \psi_i(x_i)$ means that we have to take the product of the functions $\psi_i(x_i)$ over all possible pixel locations $i$ (in this case, that would be a product of 307200 factors). Similarly, $\prod_{\{i,j\}} \psi_{ij}(x_i, x_j)$ means that we have to take the product of the functions $\psi_{ij}(x_i, x_j)$ over all possible pairs of neighboring pixels $i$ and $j$ (which would be a product of 613280 factors). As an example, if we had images consisting of 3 pixels in one row (with labels $i$, $j$, $k$), writing out equation (1.3) would give:

$$P(x_i, x_j, x_k) = \frac{1}{Z}\, \psi_i(x_i)\psi_j(x_j)\psi_k(x_k)\, \psi_{ij}(x_i, x_j)\psi_{jk}(x_j, x_k). \qquad (1.4)$$

Obviously, I will not write out equation (1.3) for the larger images considered here, because that would just be a waste of paper; instead, please use your imagination. Graphical representations of the Markov random field defined in equation (1.3) are given in figure 1.5 for two cases, the first being the example of 3 pixels in a row, the second a slightly larger image of $5 \times 5$ pixels. The nodes in the graphs (represented as circles) correspond to random variables and the edges (represented as lines connecting the circles) correspond to the compatibility functions.
The constant $Z$ is the normalization constant, which ensures that the function $P(\{x_i\})$ is actually a probability distribution: if we add the values of $P(\{x_i\})$ for all possible configurations of $\{x_i\}$, i.e., for all possible background/foreground assignments (which is an enormous number of terms, namely $2^{307200}$; imagine how large this number is by comparing the values in table 1.2), we should obtain 1. For the
⁸ Compare this with the Asia network, where the complete probability distribution was a function of only 8 random variables.
Figure 1.5: Examples of Markov random fields. Left: corresponding with three pixels in a row, labeled $i$, $j$ and $k$. Right: corresponding with an image of $5 \times 5$ pixels.

simple example of images consisting of only 3 pixels in a row, this means that

$$\begin{aligned}
\sum_{x_i = \pm 1}\, \sum_{x_j = \pm 1}\, \sum_{x_k = \pm 1} P(x_i, x_j, x_k)
&= P(x_i = 1, x_j = 1, x_k = 1) + P(x_i = 1, x_j = 1, x_k = -1) \\
&+ P(x_i = 1, x_j = -1, x_k = 1) + P(x_i = 1, x_j = -1, x_k = -1) \\
&+ P(x_i = -1, x_j = 1, x_k = 1) + P(x_i = -1, x_j = 1, x_k = -1) \\
&+ P(x_i = -1, x_j = -1, x_k = 1) + P(x_i = -1, x_j = -1, x_k = -1) \\
&= 1.
\end{aligned}$$
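The normalization above can be checked numerically for the three-pixel example. The following sketch uses made-up (hypothetical) factor values; any positive choice would do, since dividing by $Z$ always makes the eight probabilities sum to one.

```python
import itertools

def psi_local(x):                  # local evidence factor (same for all pixels here)
    return 2.0 if x == +1 else 0.5

def psi_pair(x, y):                # compatibility factor for neighboring pixels
    return 3.0 if x == y else 1.0

def unnormalized(xi, xj, xk):      # the product of factors inside equation (1.4)
    return (psi_local(xi) * psi_local(xj) * psi_local(xk)
            * psi_pair(xi, xj) * psi_pair(xj, xk))

configs = list(itertools.product([-1, +1], repeat=3))
Z = sum(unnormalized(*cfg) for cfg in configs)          # sum of 2**3 = 8 terms
total = sum(unnormalized(*cfg) / Z for cfg in configs)
print(total)   # 1.0 up to floating-point rounding
```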
Using equation (1.4) for each of the eight terms in this sum, we obtain an equation for $Z$, which can be solved for the value of the normalization constant $Z$. A similar, but much longer, equation can be used in principle (but not in practice) to calculate the value of $Z$ for the $640 \times 480$ images that we are interested in. We still have to specify which functions $\psi_i(x_i)$ and $\psi_{ij}(x_i, x_j)$ we will use. For the "local evidence" functions, we use
$$\psi_i(x_i) = \exp\left( x_i\, \theta \tanh\left( \frac{d_i - c}{w} \right) \right).$$
This function is shown in figure 1.6 for three different choices of the parameters $\theta$ and $w$. The parameter $\theta$ determines the height of the curves, whereas the parameter $w$ determines how steep the curves are. Note that $\psi_i(x_i = 1)$ is large if $d_i$ is larger than the threshold $c$ (i.e., if the difference between input and reference image at location $i$ is larger than the threshold) and small if $d_i$ is smaller than the threshold $c$; for $\psi_i(x_i = -1)$, the opposite is true. This means that the contribution of the local evidence function to the joint probability is as discussed before: a larger difference between input and reference image at location $i$ increases the probability that $x_i = 1$ (and at the same time, decreases the probability that $x_i = -1$).
Figure 1.6: Local evidence functions $\psi_i(x_i)$, as a function of $d_i - c$, the difference of $d_i$ with the threshold $c$. Left: $\theta = 1$, $w = 25$; center: $\theta = 1$, $w = 100$; right: $\theta = 2$, $w = 100$. The parameter $\theta$ determines the height of the curves, the parameter $w$ the steepness.
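In code, the local evidence function is a one-liner. The sketch below uses the parameter values of the left panel of figure 1.6 ($\theta = 1$, $w = 25$) and the threshold $c = 20$ from the application in figure 1.7 as (hypothetical) defaults.

```python
import math

def psi_local(x, d, c=20.0, theta=1.0, w=25.0):
    """psi_i(x_i) = exp(x_i * theta * tanh((d_i - c) / w))."""
    return math.exp(x * theta * math.tanh((d - c) / w))

# A pixel whose difference d_i greatly exceeds the threshold c favors
# foreground (x = +1) over background (x = -1):
print(psi_local(+1, d=200.0))   # close to e^theta
print(psi_local(-1, d=200.0))   # close to e^(-theta)
```

Note that $\psi_i(+1)\,\psi_i(-1) = 1$ for any $d_i$: only the ratio of the two values matters, reflecting the invariance of (1.3) under rescaling of factors.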
Let $i$ and $j$ be the locations of two neighboring pixels. For the compatibility factor $\psi_{ij}(x_i, x_j)$, we use

$$\psi_{ij}(x_i, x_j) = \exp(J x_i x_j),$$

where $J > 0$ is a parameter that describes how "compatible" pixels $i$ and $j$ are. The compatibility factor is large if $x_i = x_j$ and small if $x_i \ne x_j$. The larger $J$, the larger the difference between those two values. In other words, the parameter $J$ determines how much the neighboring locations $i$ and $j$ influence each other regarding their values of $x_i$ and $x_j$. If $J$ is large, then it will be highly unlikely that $x_i$ differs from $x_j$, which will result in a larger probability for configurations having large contiguous pieces of foreground and background. For the readers with less mathematical background, the discussion above may be difficult to understand. The important point is that we have constructed a probability distribution that describes the probability that each pixel is either foreground or background (based on the input and reference image and a few parameters that can be adjusted to obtain optimal results), which satisfies our two desiderata: (i) the larger the difference between input and reference image at some location, the larger the probability that the input image at that location is actually part of the foreground; (ii) neighboring pixels influence each other, i.e., if the neighborhood of some pixel is with high probability part of the foreground, then the probability that the pixel itself is part of the foreground should increase. In theory, we could now, for each pixel $i$, calculate the probability $P(x_i = 1)$ that the pixel is part of the foreground, according to the probability distribution in (1.3), by multiplying the 307200 local evidence factors and the 613280 compatibility factors together, and summing over all the configurations of the random variables that we are not interested in (i.e., all configurations of the $x_j$ for $j \ne i$).
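On a tiny image this exact computation is feasible. The sketch below enumerates all configurations of a hypothetical $2 \times 3$ "image" (so only $2^6 = 64$ terms), with $J$, $\theta$, $w$, $c$ as in figure 1.7 and made-up intensity differences $d$; the pixel sizes and difference values are illustrative, not from the thesis.

```python
import itertools
import math

H, W = 2, 3
J, theta, w, c = 3.0, 3.5, 40.0, 20.0
d = [[150.0, 140.0, 10.0],     # hypothetical absolute intensity differences
     [160.0, 12.0, 8.0]]

pixels = [(r, s) for r in range(H) for s in range(W)]
edges = ([((r, s), (r, s + 1)) for r in range(H) for s in range(W - 1)]
         + [((r, s), (r + 1, s)) for r in range(H - 1) for s in range(W)])

def weight(x):   # unnormalized probability of a configuration x: pixel -> +1/-1
    lw = sum(x[p] * theta * math.tanh((d[p[0]][p[1]] - c) / w) for p in pixels)
    lw += sum(J * x[a] * x[b] for a, b in edges)
    return math.exp(lw)

Z = 0.0
marg = {p: 0.0 for p in pixels}
for cfg in itertools.product((-1, +1), repeat=len(pixels)):
    x = dict(zip(pixels, cfg))
    wt = weight(x)
    Z += wt
    for p in pixels:
        if x[p] == +1:
            marg[p] += wt
marg = {p: m / Z for p, m in marg.items()}   # P(x_p = +1) for each pixel
print(marg[(0, 0)])   # high: large difference and foreground-leaning neighbors
```

For the real $640 \times 480$ image the same loop would have to visit $2^{307200}$ configurations, which is exactly the explosion discussed next.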
However, this sum consists of $2^{307200 - 1}$ terms and it is a completely hopeless task to calculate the value of this sum. We have again encountered the computational complexity explosion that occurs in probability distributions that depend on a large number of random variables.⁹ Thus, while we have solved the problem in theory by constructing a probability distribution that can be used to calculate for each pixel the probability that it is either foreground or background, it is not possible to do these calculations exactly within a reasonable amount of time (using hardware currently available).

⁹ Actually, by repeatedly applying the distributive law (which says that $a(b + c) = ab + ac$), one can greatly reduce the computational complexity in this case, reducing it to a sum of "only" $2^{960}$ terms. Although this is an enormous reduction, it is still impossible to calculate that sum within a reasonable amount of time with current technology.

Figure 1.7: Top left: result of applying the approximation method Belief Propagation to the probability distribution in equation (1.3), using $J = 3$, $\theta = 3.5$, $w = 40$ and $c = 20$. Top right: the corresponding filtered input image, where the background has been removed. Bottom left: simple threshold image for comparison. Bottom right: result of a different approximation method (Mean Field), which is not as good as the Belief Propagation result.

However, we can calculate approximations to the probabilities $P(x_i = 1)$ using approximation techniques that have been developed for this purpose. These approximation techniques will not yield the exact probabilities according to the probability distribution we constructed, but can give reasonable approximations within a reasonable amount of time. We have applied one such approximation technique, called Belief Propagation, and shown the result in figure 1.7. Although it is an approximation, it is clearly a much better approximation to the truth than the one we obtained by the fast local thresholding technique discussed earlier. Note that our probability distribution has correctly filled in the missing spots in the body, apart from one hole in the hair. It has also correctly removed the noise in the background, apart from two remaining regions (which are actually shades and reflections caused by the human body).
In the incorrectly classified hair region, the reference image and the input image turn out to be almost indistinguishable. A human that does the foreground classification task will probably decide that this region also belongs to the foreground, based on his knowledge about how haircuts are supposed to look. However, our probability distribution does not know anything about haircuts; it has made the decision purely by looking at the difference of the intensities of the input and reference images in that region, and thus we can understand why it makes an error in that region.
1.1.4 Summary

Graphical models are used and studied in various applied statistical and computational fields, e.g., machine learning and artificial intelligence, computational biology, statistical signal/image processing, communication and information theory, and statistical physics. We have seen two examples (one from medical diagnosis, the other from image processing) in which graphical models can be used to model real world problems. A fundamental technical problem that we encountered is the explosion of the computational complexity when the number of random variables increases. We have seen that in some cases, the exact calculation of probabilities of interest is still possible, whereas in other cases, it is completely hopeless. In the latter cases, one can instead use approximation methods to calculate approximations to the probabilities of interest. If the approximations are accurate enough and can be obtained within a reasonable amount of time, this is a viable alternative to the exact calculation. Over the last century, researchers have developed many different approximation methods, each with its own characteristics of accuracy and computation time. One of the most elementary yet successful approximation methods (not least because it is a very fast method) is Belief Propagation. This is the approximation method that we have applied to solve our image processing problem in the last example. Belief Propagation is the object of further study in the rest of this thesis. Although it has been applied in many different situations, sometimes with spectacular success, the theoretical understanding of its accuracy and the computation time it needs was not very well developed when I started working on this research topic four years ago.
It was not fully understood in what circumstances Belief Propagation would actually yield an approximation, how much computation time would be needed to calculate the approximation, and how accurate the approximation would be. The results of my research, reported in the next chapters of this thesis, can be very briefly summarized as contributing better answers to these questions, a deeper understanding of the Belief Propagation method, as well as a way to improve the accuracy of the Belief Propagation approximation.

1.2 A less gentle introduction to Belief Propagation
We continue this introduction by giving more formal definitions of the concepts introduced in the previous section. This requires a stronger mathematical background of the reader.

A (probabilistic) graphical model is a convenient representation in terms of a graph of the dependency relations between random variables. The qualitative information provided by the graph, together with quantitative information about these dependencies, forms a modular specification of the joint distribution of the random variables. From a slightly different point of view, the graph represents a factorization of the joint distribution in terms of factors that depend only on local subsets of random variables. The structure of this factorization can be exploited to improve the efficiency of calculations of expectation values and marginals of the joint distribution or as a basis for calculating approximations of quantities of interest.

The class of graphical models can be subdivided into directed and undirected graphical models. Directed graphical models are also known as Bayesian networks, belief networks, causal networks or influence diagrams. We have seen an example of a Bayesian network in subsection 1.1.1, the Asia network. The subclass of undirected graphical models can be subdivided again into Markov random fields (also called Markov networks) and factor graphs. We have seen an example of a Markov random field (the probability distribution corresponding to the foreground classification task) in subsection 1.1.3.

We will repeatedly use the following notational conventions. Let $N \in \mathbb{N}^*$ and $\mathcal{V} := \{1, 2, \dots, N\}$. Let $(x_i)_{i \in \mathcal{V}}$ be a family of $N$ discrete random variables, where each variable $x_i$ takes values in a discrete domain $\mathcal{X}_i$. In this thesis, we focus on the case of discrete variables for simplicity; it may be possible to generalize our results towards the case of continuous random variables. We will frequently use the following multi-index notation: let $A = \{i_1, i_2, \dots, i_m\} \subseteq \mathcal{V}$ with $i_1 < i_2 < \dots < i_m$;¹⁰ we write $\mathcal{X}_A := \mathcal{X}_{i_1} \times \mathcal{X}_{i_2} \times \dots \times \mathcal{X}_{i_m}$ and, for any family $(Y_i)_{i \in B}$ with $A \subseteq B \subseteq \mathcal{V}$, we write $Y_A := (Y_{i_1}, Y_{i_2}, \dots, Y_{i_m})$. For example, $x_{\{5,3\}} = (x_3, x_5) \in \mathcal{X}_{\{5,3\}} = \mathcal{X}_3 \times \mathcal{X}_5$.

1.2.1 Bayesian networks

A directed graph $\mathcal{G} = (\mathcal{V}, \mathcal{D})$ is a pair of vertices (nodes) $\mathcal{V}$ and directed edges $\mathcal{D} \subseteq \{(i, j) : i, j \in \mathcal{V}, i \ne j\}$. A directed path in $\mathcal{G}$ is a sequence of nodes $(i_t)_{t=1}^T$ such that $(i_t, i_{t+1}) \in \mathcal{D}$ for each $t = 1, \dots, T - 1$; if $i_1 = i_T$ then the directed path is called a directed cycle. A directed acyclic graph $\mathcal{G} = (\mathcal{V}, \mathcal{D})$ is a directed graph with no directed cycles, i.e., there is no (nonempty) directed path in $\mathcal{G}$ that starts and ends at the same vertex. For $i \in \mathcal{V}$, we define the set $\mathrm{par}(i)$ of parent nodes of $i$ to be the set of nodes that point directly towards $i$, i.e., $\mathrm{par}(i) := \{j \in \mathcal{V} : (j, i) \in \mathcal{D}\}$.

¹⁰ Note the difference between a family and a set: a family $(Y_i)_{i \in B}$ is a mapping from some set $B$ to another set which contains $\{Y_i : i \in B\}$. We use families as the generalization to arbitrary index sets of (ordered) $n$-tuples, and sets if the ordering or the number of occurrences of each element is unimportant.

A Bayesian network consists of a directed acyclic graph $\mathcal{G} = (\mathcal{V}, \mathcal{D})$ and a family $(P_i)_{i \in \mathcal{V}}$ of (conditional) probability distributions, one for each vertex in $\mathcal{V}$. Each vertex $i \in \mathcal{V}$ represents a random variable $x_i$ which takes values in a finite set $\mathcal{X}_i$. If $\mathrm{par}(i) = \emptyset$, then $P_i$ is a probability distribution for $x_i$; otherwise, $P_i$ is a conditional probability distribution for $x_i$ given $x_{\mathrm{par}(i)}$. The joint probability distribution represented by the Bayesian network is the product of all the probability distributions $(P_i)_{i \in \mathcal{V}}$:

$$P(x_\mathcal{V}) = \prod_{i \in \mathcal{V}} P_i(x_i \mid x_{\mathrm{par}(i)}), \qquad (1.5)$$

where $P_i(x_i \mid x_{\mathrm{par}(i)}) = P_i(x_i)$ if $\mathrm{par}(i) = \emptyset$.

Causal networks
A Bayesian network describes conditional independencies of a set of random vari- ables, not necessarily their causal relations. However, causal relations can be mod- eled by the closely related causal Bayesian network. The additional semantics of the causal Bayesian network specify that if a random variable xi is actively caused to be in a state ξ (an operation written as “do(xi = ξ)”), then the probability distribution changes to the one of the Bayesian network obtained by cutting the edges from par(i) to i, and setting xi to the caused value ξ [Pearl, 2000], i.e., by defining Pi(xi) = δξ(xi). Note that this is very different from observing that xi is in some state ξ; the latter is modeled by conditioning on xi = ξ, i.e., by calculating
$$P(x_\mathcal{V} \mid x_i = \xi) = \frac{P(x_\mathcal{V}, x_i = \xi)}{P(x_i = \xi)}.$$

1.2.2 Markov random fields

An undirected graph is a pair $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ consisting of a set of nodes $\mathcal{V}$ and a set of undirected edges $\mathcal{E} \subseteq \{\{i, j\} : i, j \in \mathcal{V}, i \ne j\}$. A clique of $\mathcal{G}$ is a subset $C \subseteq \mathcal{V}$ that is fully connected in $\mathcal{G}$, i.e., $\{i, j\} \in \mathcal{E}$ for all $i, j \in C$ with $i \ne j$. A clique is maximal if it is not a strict subset of another clique.

A Markov random field (or Markov network) consists of an undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ and a family of potential functions (also called factors or clique potentials) $(\psi_I)_{I \in \mathcal{C}}$, where $\mathcal{C}$ is the set of all maximal cliques of the graph $\mathcal{G}$. Each vertex $i \in \mathcal{V}$ represents a random variable $x_i$ which takes values in the finite set $\mathcal{X}_i$ and each edge $\{i, j\} \in \mathcal{E}$ represents an "interaction" between the random variables $x_i$ and $x_j$. The potential functions are nonnegative functions $\psi_I : \mathcal{X}_I \to [0, \infty)$ that depend on the random variables in the clique $I \in \mathcal{C}$. The joint probability distribution represented by the Markov random field is given by:
$$P(x_\mathcal{V}) = \frac{1}{Z} \prod_{I \in \mathcal{C}} \psi_I(x_I),$$
where $x_I$ is the state of the random variables in the clique $I$, and the normalizing constant $Z$ (also called partition function) is defined as

$$Z = \sum_{x_\mathcal{V} \in \mathcal{X}_\mathcal{V}} \prod_{I \in \mathcal{C}} \psi_I(x_I).$$

An example of a Markov random field that is studied in statistical physics is the Ising model.
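For a tiny Ising model the partition function can be computed by direct enumeration. The sketch below uses a ring of 4 spins with a uniform (illustrative) coupling $J$ and zero local fields, so that the maximal cliques are just the edges; as a sanity check, the result can be compared against the closed-form transfer-matrix expression $Z = (2\cosh J)^N + (2\sinh J)^N$ for a ring.

```python
import itertools
import math

J = 0.5                                   # illustrative coupling strength
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]  # a ring of N = 4 spins

# Z = sum over all 2^4 spin configurations of the product of edge factors
# exp(J * x_a * x_b), written here as a single exponential of the sum.
Z = sum(math.exp(sum(J * x[a] * x[b] for a, b in edges))
        for x in itertools.product((-1, +1), repeat=4))
print(Z)
```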
1.2.3 Factor graphs

Both Bayesian networks and Markov random fields can be represented in a unifying representation, called factor graph [Kschischang et al., 2001]. Factor graphs explicitly express the factorization structure of the corresponding probability distribution.

We consider a probability distribution over $x_\mathcal{V} \in \mathcal{X}_\mathcal{V}$ that can be written as a product of factors $(\psi_I)_{I \in \mathcal{F}}$:

$$P(x_\mathcal{V}) = \frac{1}{Z} \prod_{I \in \mathcal{F}} \psi_I(x_{N_I}), \qquad Z = \sum_{x_\mathcal{V} \in \mathcal{X}_\mathcal{V}} \prod_{I \in \mathcal{F}} \psi_I(x_{N_I}). \qquad (1.6)$$

For each factor index $I \in \mathcal{F}$, there is an associated subset $N_I \subseteq \mathcal{V}$ of variable indices and the factor $\psi_I$ is a nonnegative function $\psi_I : \mathcal{X}_{N_I} \to [0, \infty)$. For a Bayesian network, each factor corresponds to a (conditional) probability table, whereas for a Markov random field, each factor corresponds to a maximal clique of the undirected graph.

We can represent the structure of the probability distribution (1.6) using a factor graph $(\mathcal{V}, \mathcal{F}, \mathcal{E})$. This is a bipartite graph, consisting of variable nodes $i \in \mathcal{V}$, factor nodes $I \in \mathcal{F}$, and an undirected edge $\{i, I\}$ between $i \in \mathcal{V}$ and $I \in \mathcal{F}$ if and only if $i \in N_I$, i.e., if $\psi_I$ depends on $x_i$. We will represent factor nodes visually as rectangles and variable nodes as circles. Examples of factor graphs, corresponding to the Asia network in figure 1.1 and the Markov random fields in figure 1.5, are shown in figure 1.8. It is trivial to represent a Bayesian network or a Markov random field as a factor graph, and also trivial to represent a factor graph as a Markov random field, but it is less trivial to represent a factor graph as a Bayesian network. In this thesis, we will regard Bayesian networks and Markov random fields as special cases of factor graphs.

The neighbors of a factor node $I \in \mathcal{F}$ are precisely the variables $N_I$, and the neighbors $N_i$ of a variable node $i \in \mathcal{V}$ are the factors that depend on that variable, i.e., $N_i := \{I \in \mathcal{F} : i \in N_I\}$. Further, we define for each variable $i \in \mathcal{V}$ the set $\Delta i := \bigcup_{I \in N_i} N_I$, consisting of all variables that appear in some factor in which variable $i$ participates, and the set $\partial i := \Delta i \setminus \{i\}$, the Markov blanket of $i$. For $J \subseteq \mathcal{V} \setminus \Delta i$, $x_i$ is conditionally independent of $x_J$ given the Markov blanket $x_{\partial i}$ of $i$:

$$P(x_i \mid x_J, x_{\partial i}) = P(x_i \mid x_{\partial i}) \quad \text{for } J \subseteq \mathcal{V} \setminus \Delta i.$$
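These neighborhood sets are straightforward to compute once a factor graph is given as a map from factor indices to the variable sets $N_I$. The sketch below (an illustration, not the thesis's notation in code) uses the three-pixel chain of figure 1.5.

```python
# Factor graph of the three-pixel chain: factor name -> set N_I of variables.
factors = {'psi_i': {'i'}, 'psi_j': {'j'}, 'psi_k': {'k'},
           'psi_ij': {'i', 'j'}, 'psi_jk': {'j', 'k'}}

def neighbors(i):
    """N_i: the factors that depend on variable i."""
    return {I for I, N_I in factors.items() if i in N_I}

def delta(i):
    """Delta_i: union of N_I over all I in N_i."""
    return set().union(*(factors[I] for I in neighbors(i)))

def markov_blanket(i):
    """The Markov blanket of i: Delta_i \\ {i}."""
    return delta(i) - {i}

print(sorted(markov_blanket('j')))   # ['i', 'k']
```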
Figure 1.8: Examples of factor graphs, corresponding with the Bayesian network in figure 1.1 and the two Markov random fields in figure 1.5.
We will generally use uppercase letters for indices of factors ($I, J, K, \ldots \in \mathcal{F}$) and lowercase letters for indices of variables ($i, j, k, \ldots \in \mathcal{V}$).

1.2.4 Inference in graphical models

In this thesis, we will define inference in a graphical model to be the task of calculating marginal probabilities of subsets of random variables, possibly conditional on observed values of another subset of random variables. This can be done exactly (exact inference) or in an approximate way (approximate inference). Another inference task that is often considered is to calculate the MAP state, i.e., the joint assignment of a subset of random variables which has the largest probability (possibly conditional on observed values of another subset of random variables). We focus on the first inference problem, i.e., the approximation of marginal probabilities.

In general, the normalizing constant $Z$ in (1.6) (also called partition function) is not known and exact computation of $Z$ is infeasible, due to the fact that the number of terms to be summed is exponential in $N$. Similarly, computing marginal distributions $P(x_J)$ of $P(x_\mathcal{V})$ for subsets of variables $J \subseteq \mathcal{V}$ is known to be NP-hard [Cooper, 1990]. Furthermore, approximate inference within given error bounds is NP-hard [Dagum and Luby, 1993; Roth, 1996]. Because of the many applications in which inference plays a role, the development and understanding of approximate inference methods is thus an important topic of research.
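On very small models, exact inference is of course possible by brute-force enumeration. The following sketch computes a marginal for a minimal Bayesian network with DAG $1 \to 3 \leftarrow 2$, using the factorization of equation (1.5); the CPT numbers are made up for illustration.

```python
import itertools

P1 = [0.6, 0.4]                                   # P1(x1), par(1) = {}
P2 = [0.7, 0.3]                                   # P2(x2), par(2) = {}
P3 = {(0, 0): [0.9, 0.1], (0, 1): [0.4, 0.6],     # P3(x3 | x1, x2)
      (1, 0): [0.5, 0.5], (1, 1): [0.1, 0.9]}

def joint(x1, x2, x3):
    """Equation (1.5): the product of all (conditional) distributions."""
    return P1[x1] * P2[x2] * P3[(x1, x2)][x3]

# Marginal P(x3 = 1): sum the joint over the variables we are not interested
# in. The cost of this enumeration is exponential in the number of variables,
# which is why it only works for very small networks.
p_x3 = sum(joint(x1, x2, 1) for x1, x2 in itertools.product([0, 1], repeat=2))
print(p_x3)
```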
1.2.5 Belief Propagation: an approximate inference method

Belief Propagation (BP) is a popular algorithm for approximate inference that has been reinvented many times in different fields. It is also known as Loopy Belief Propagation (where the adjective "loopy" is used to emphasize that it is used on a graphical model with cycles), the Sum-Product Algorithm and the Bethe-Peierls approximation. In artificial intelligence, it is commonly attributed to Pearl [1988]. In the context of error-correcting (LDPC) codes, it was already proposed by Gallager [1963]. The earliest description of Belief Propagation in the statistical physics literature known to the author is [Nakanishi, 1981] (for the special case of a binary, pairwise Markov random field). A few years ago it became clear [Yedidia et al., 2001] that the BP algorithm is strongly related to the Bethe-Peierls approximation, which was invented originally in statistical mechanics [Bethe, 1935; Peierls, 1936]; this discovery led to a renewed interest in Belief Propagation and related inference methods. We will henceforth use the acronym "BP" since it can be interpreted as being an abbreviation for either "Belief Propagation" or "Bethe-Peierls approximation".

BP calculates approximations to the factor marginals $(P(x_I))_{I \in \mathcal{F}}$ and the variable marginals $(P(x_i))_{i \in \mathcal{V}}$ of the probability distribution (1.6) of a factor graph [Kschischang et al., 2001; Yedidia et al., 2005]. The calculation is done by message-passing on the factor graph. Each node passes messages to its neighbors: variable nodes pass messages to factor nodes and factor nodes pass messages to variable nodes. The outgoing messages are functions of the incoming messages at each node. This iterative process is repeated using some schedule that describes the sequence of message updates in time. This process can either converge to some fixed point or go on ad infinitum.
If BP converges, the approximate marginals (called beliefs) can be calculated from the fixed point messages.

For the factor graph formulation of BP (see also figure 1.9), it is convenient to discriminate between two types of messages: messages $\mu_{I \to i} : \mathcal{X}_i \to [0, \infty)$ sent from factors $I \in \mathcal{F}$ to neighboring variables $i \in N_I$, and messages $\mu_{i \to I} : \mathcal{X}_i \to [0, \infty)$ from variables $i \in \mathcal{V}$ to neighboring factors $I \in N_i$. The messages that are sent by a node depend on the incoming messages; the new messages, designated by $\mu'$, are given in terms of the incoming messages by the following BP update equations:

$$\mu'_{j \to I}(x_j) \propto \prod_{J \in N_j \setminus I} \mu_{J \to j}(x_j) \qquad \forall j \in \mathcal{V},\; \forall I \in N_j, \qquad (1.7)$$

$$\mu'_{I \to i}(x_i) \propto \sum_{x_{N_I \setminus i}} \psi_I(x_{N_I}) \prod_{j \in N_I \setminus i} \mu_{j \to I}(x_j) \qquad \forall I \in \mathcal{F},\; \forall i \in N_I. \qquad (1.8)$$

Usually, one normalizes the messages such that $\sum_{x_i \in \mathcal{X}_i} \mu(x_i) = 1$. This is only done
Figure 1.9: Part of the factor graph illustrating the BP update rules (1.7) and (1.8). The factor nodes $I, J, K \in \mathcal{F}$ are drawn as rectangles, the variable nodes $i, j, k, l \in \mathcal{V}$ as circles. Note that $N_j \setminus I = \{J, K\}$ and $N_I \setminus i = \{j, k, l\}$.

for numerical stability reasons; in fact, the final results are invariant under rescaling each message with an arbitrary positive scale factor. Alternatively, we can use the following update equations, which are formulated in terms of only the messages sent from factors to variables:

$$\mu'_{I \to i}(x_i) \propto \sum_{x_{N_I \setminus i}} \psi_I(x_{N_I}) \prod_{j \in N_I \setminus i}\, \prod_{J \in N_j \setminus I} \mu_{J \to j}(x_j) \qquad \forall I \in \mathcal{F},\; \forall i \in N_I. \qquad (1.9)$$

This equation is obtained by substituting (1.7) into (1.8). The initial messages at $t = 0$ are often taken to be uniform, i.e.,
$$\mu^{(0)}_{I \to i}(x_i) \propto 1 \qquad \forall I \in \mathcal{F},\; \forall i \in N_I.$$

There is some freedom of choice in the ordering of the message updates. For $I \in \mathcal{F}$ and $i \in N_I$, we also write $I \to i$ for the directed edge $(I, i)$. Let $\mathcal{D} := \{(I, i) : I \in \mathcal{F}, i \in N_I\}$. With an update schedule we mean a particular ordering of the BP update equations in time. Formally, an update schedule is a pair $(e, \tau)$, where $e$ is a (possibly random) sequence of edges $(e_t)_{t \in \mathbb{N}}$ with $e_t \in \mathcal{D}$ for all $t \in \mathbb{N}$, and $\tau$ is a sequence of functions $(\tau_t)_{t \in \mathbb{N}}$ where each $\tau_t$ is a function $\mathcal{D} \to \{0, 1, 2, \dots, t\}$. The BP algorithm according to the update schedule $(e, \tau)$ is specified by the following update of the messages $\mu^{(t)}$ at time $t$ to the new messages $\mu^{(t+1)}$ at time $t + 1$:
$$\mu^{(t+1)}_e = \begin{cases} \mu'_e\!\left( \big(\mu^{(\tau_t(d))}_d\big)_{d \in \mathcal{D}} \right) & \text{if } e = e_t, \\[4pt] \mu^{(t)}_e & \text{if } e \ne e_t, \end{cases}$$

i.e., only the message on edge $e_t$ is updated using the update rule (1.9), using as input the messages at previous times $\tau_t(d)$. Some update schedules that are often used are:
• Parallel updates: calculate all new messages as a function of the current messages and then simultaneously set all messages to their new values.

• Sequential updates in fixed order: determine some fixed linear ordering of $\mathcal{D}$ and update in that order, using the most recent messages available for each update.

• Sequential updates in random order: before doing a batch of $\#(\mathcal{D})$ message updates, construct a new random linear ordering of $\mathcal{D}$ and update the next batch in that order, using the most recent messages available for each individual update.

• Random updates: at each timestep $t$, draw a random element from $\mathcal{D}$ and update the corresponding message, using the most recent messages available.

• Maximum residual updating [Elidan et al., 2006]: calculate all residuals, i.e., the differences between the updated and current messages, and update only the message with the largest residual (according to some measure). Then, only the residuals that depend on the updated message need to be recalculated and the process repeats.
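The parallel schedule can be sketched in a few dozen lines. The code below is a minimal illustration (not the thesis's implementation) of the update equations (1.7) and (1.8) on the three-pixel chain of figure 1.5, with made-up factor tables over states 0/1; since this factor graph is acyclic, the resulting beliefs are exact, which is checked against brute-force enumeration.

```python
import itertools

factors = {   # name -> (variables N_I, table indexed by the joint state)
    'psi_i':  (('i',),     {(0,): 0.5, (1,): 2.0}),
    'psi_j':  (('j',),     {(0,): 1.0, (1,): 1.0}),
    'psi_k':  (('k',),     {(0,): 2.0, (1,): 0.5}),
    'psi_ij': (('i', 'j'), {s: 3.0 if s[0] == s[1] else 1.0
                            for s in itertools.product((0, 1), repeat=2)}),
    'psi_jk': (('j', 'k'), {s: 3.0 if s[0] == s[1] else 1.0
                            for s in itertools.product((0, 1), repeat=2)}),
}

def normalize(m):
    s = sum(m)
    return [v / s for v in m]

edges = [(I, v) for I, (N, _) in factors.items() for v in N]
mf = {e: [1.0, 1.0] for e in edges}   # factor-to-variable messages mu_{I->v}
mv = {e: [1.0, 1.0] for e in edges}   # variable-to-factor messages mu_{v->I}

for _ in range(10):                   # parallel sweeps; 10 suffices on this tree
    # Equation (1.7): mu'_{v->I} is the product of incoming mu_{J->v}, J != I.
    new_mv = {}
    for (I, v) in edges:
        msg = [1.0, 1.0]
        for (J, u) in edges:
            if u == v and J != I:
                msg = [a * b for a, b in zip(msg, mf[(J, v)])]
        new_mv[(I, v)] = normalize(msg)
    # Equation (1.8): mu'_{I->v} sums the factor against incoming messages.
    new_mf = {}
    for (I, v) in edges:
        N, table = factors[I]
        msg = [0.0, 0.0]
        for state in itertools.product((0, 1), repeat=len(N)):
            s = dict(zip(N, state))
            w = table[state]
            for u in N:
                if u != v:
                    w *= mv[(I, u)][s[u]]
            msg[s[v]] += w
        new_mf[(I, v)] = normalize(msg)
    mv, mf = new_mv, new_mf

def belief(v):                        # b_v(x_v), the approximate marginal
    b = [1.0, 1.0]
    for (I, u) in edges:
        if u == v:
            b = [a * c for a, c in zip(b, mf[(I, v)])]
    return normalize(b)

print(belief('j'))   # [0.5, 0.5] here, by the symmetry of the chosen factors
```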
If the messages converge to some fixed point $\mu^{(\infty)}$, the approximate marginals, often called beliefs, are calculated as follows:

$$P(x_i) \approx b_i(x_i) \propto \prod_{I \in N_i} \mu^{(\infty)}_{I \to i}(x_i) \qquad \forall i \in \mathcal{V},$$

$$P(x_I) \approx b_I(x_I) \propto \psi_I(x_{N_I}) \prod_{i \in N_I} \mu^{(\infty)}_{i \to I}(x_i) \qquad \forall I \in \mathcal{F}.$$

The beliefs are normalized such that

$$\sum_{x_i \in \mathcal{X}_i} b_i(x_i) = 1 \quad \forall i \in \mathcal{V}, \qquad \sum_{x_I \in \mathcal{X}_I} b_I(x_I) = 1 \quad \forall I \in \mathcal{F}.$$

If the factor graph is acyclic, i.e., it contains no loops, then BP with any reasonable update schedule will converge towards a unique fixed point within a finite number of iterations and the beliefs can be shown to be exact. However, if the factor graph contains cycles, then the BP marginals are only approximations to the exact marginals; in some cases, these approximations can be surprisingly accurate, in other cases, they can be highly inaccurate.

A fixed point $\mu^{(\infty)}$ always exists if all factors are strictly positive [Yedidia et al., 2005]. However, the existence of a fixed point does not necessarily imply convergence towards the fixed point. Indeed, fixed points may be unstable, and there may be multiple fixed points (corresponding to different final beliefs). If BP does not converge, one can try to damp the message updates as a possible remedy. The new message is then a convex combination of the old and the updated message, either according to:

$$\mu^{(t+1)}_d = \epsilon\, \mu^{(t)}_d + (1 - \epsilon)\, \mu'_d \qquad d \in \mathcal{D},$$

or in the logarithmic domain:
$$\log \mu^{(t+1)}_d = \epsilon \log \mu^{(t)}_d + (1 - \epsilon) \log \mu'_d \qquad d \in \mathcal{D},$$

for some $\epsilon \in [0, 1)$.

Other methods that can be applied if BP does not converge (or converges too slowly) are double-loop methods [Yuille, 2002; Heskes et al., 2003; Heskes, 2006]. Yedidia et al. [2001] showed that fixed points of BP correspond to stationary points of the Bethe free energy. The double-loop methods exploit this correspondence by directly minimizing the Bethe free energy. The corresponding non-convex constrained minimization problem can be solved by performing a sequence of convex constrained minimizations of upper bounds on the Bethe free energy. In this way, the method is guaranteed to converge to a minimum of the Bethe free energy, which corresponds with a BP fixed point.
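The two damping variants are one-liners once a message is represented as a list of nonnegative reals. The helper names and the choice $\epsilon = 0.5$ below are illustrative; note that the logarithmic variant amounts to a (weighted) geometric mean of old and proposed message.

```python
import math

def damp(old, proposed, eps=0.5):
    """Damped update in the linear domain: eps*old + (1-eps)*proposed."""
    return [eps * o + (1.0 - eps) * p for o, p in zip(old, proposed)]

def damp_log(old, proposed, eps=0.5):
    """Damped update in the logarithmic domain (a weighted geometric mean)."""
    return [math.exp(eps * math.log(o) + (1.0 - eps) * math.log(p))
            for o, p in zip(old, proposed)]

print(damp([0.8, 0.2], [0.2, 0.8]))       # [0.5, 0.5] (arithmetic mean)
print(damp_log([0.8, 0.2], [0.2, 0.8]))   # [0.4, 0.4] up to rounding
```

With $\epsilon = 0$ both variants reduce to the undamped update; increasing $\epsilon$ slows the dynamics down, which can stabilize oscillating message updates.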
1.2.6 Related approximate inference algorithms

BP can be regarded as the most elementary member of a family of related algorithms, consisting of:
• the Max-Product algorithm [Weiss and Freeman, 2001], a zero temperature version of BP;

• Generalized Belief Propagation (GBP) [Yedidia et al., 2005] and the Cluster Variation Method (CVM) [Pelizzola, 2005], where variables are clustered in "regions" or "clusters" in order to increase accuracy;

• double-loop algorithms [Yuille, 2002; Heskes et al., 2003; Heskes, 2006], where the inner loop is equivalent to GBP;

• Expectation Propagation (EP) [Minka, 2001; Welling et al., 2005] and the Expectation Consistent (EC) approximation [Opper and Winter, 2005], which can be regarded as generalizations of BP [Heskes et al., 2005];

• Survey Propagation (SP) [Braunstein and Zecchina, 2004; Braunstein et al., 2005], which turned out to be equivalent to a special case of the BP algorithm;

• Fractional Belief Propagation [Wiegerinck and Heskes, 2003].

A good theoretical understanding of BP may therefore be beneficial to understanding these other algorithms as well. In this thesis, we focus on BP because of its simplicity and its successes in solving nontrivial problems.
1.2.7 Applications of Belief Propagation

We have given two examples of situations where approximate inference can be used to solve real world problems. In recent years, mainly due to increased computational power, the number of applications of approximate inference methods has seen an enormous growth. To convey some sense of the diversity of these applications, we provide a few references to a small subset of these applications (found by searching on the internet for scientific articles reporting the application of Belief Propagation).

Many applications can be found in vision and image processing. Indeed, BP has been applied to stereo vision [Felzenszwalb and Huttenlocher, 2004, 2006; Sun et al., 2003; Tappen and Freeman, 2003], super-resolution [Freeman et al., 2000, 2002; Gupta et al., 2005], shape matching [Coughlan and Ferreira, 2002; Coughlan and Shen, 2004], image reconstruction [Felzenszwalb and Huttenlocher, 2006; Tanaka, 2002], inference of human upper body motion [Gao and Shi, 2004], panorama generation [Brunton and Shu, 2006], surface reconstruction [Petrovic et al., 2001], skin detection [Zheng et al., 2004], hand tracking [Sudderth et al., 2004], inferring facial components [Sudderth et al., 2003], and unwrapping phase images [Frey et al., 2002]. Very successful applications can also be found in error correcting codes, e.g., Turbo Codes [McEliece et al., 1998] and Low Density Parity Check codes [Gallager, 1963; Frey and MacKay, 1997]. In combinatorial optimization, in particular satisfiability and graph coloring, an algorithm called Survey Propagation recently redefined the state-of-the-art [Mézard and Zecchina, 2002; Braunstein et al., 2005]. Later it was discovered that it was actually equivalent to a special case of BP [Braunstein and Zecchina, 2004]. BP has been applied for diagnosis, for example in medical diagnosis [Murphy et al., 1999; Wemmenhove et al., 2007].
In computer science, it has been suggested as a natural algorithm in sensor networks [Ihler et al., 2005c; Crick and Pfeffer, 2003], for data cleaning [Chu et al., 2005] and for content distribution in peer-to-peer networks [Bickson et al., 2006]. In biology, it has been used to predict protein folding [Kamisetty et al., 2006]. Finally, in keeping with the latest fashions, BP has even been used for solving Sudokus [Dangauthier, 2006].
1.3 Outline of this thesis
In this section, we briefly motivate the research questions addressed in this thesis and summarize the results obtained in later chapters.

In practice, there are at least three important issues when applying BP to concrete problems: (i) it is usually not known a priori whether BP will converge and how many iterations are needed; (ii) if BP converges, it is not known how large the error of the resulting approximate marginals is; (iii) if the error is too large for the application, can the error be reduced in some way?

The issues of convergence and accuracy may actually be interrelated: the "folklore" is that convergence difficulties of BP often indicate low quality of the corresponding Bethe approximation. This would imply that the pragmatic solution for the convergence problem (forcing BP to converge by means of damping, the use of other update schemes or applying double-loop methods) would yield low quality results. Furthermore, if we could quantify the relation between error and convergence rate, this might yield a practical way of estimating the error from the observed rate of convergence. For the case of a graphical model with a single loop, these questions have been solved by Weiss [2000]. However, it has turned out to be difficult to generalize that work to factor graphs with more than one loop. Significant progress has been made in recent years regarding the question under what conditions BP converges [Tatikonda and Jordan, 2002; Tatikonda, 2003; Ihler et al., 2005b,a], on the uniqueness of fixed points [Heskes, 2004], and on the accuracy of the marginals [Tatikonda, 2003; Taga and Mase, 2006a], but the theoretical understanding was (and is) still incomplete. Further, even though many methods have been proposed in recent years to reduce the error of BP marginals, these methods are all in some sense "local" (although more global than BP).
We felt that it should be possible to take into account longer loops in the factor graph (which may be important when the interactions along those loops are strong), instead of only taking into account short loops (as usually done with GBP). These questions have been the motivation for the research reported in the next chapters. We finish this introductory chapter with a short summary of all the following chapters.
Convergence of BP
In chapter 2, we study the question of convergence and uniqueness of the fixed point for parallel, undamped BP. We derive novel conditions that guarantee convergence of BP to a unique fixed point, irrespective of the initial messages. The conditions are applicable to arbitrary factor graphs with discrete variables and factors that contain zeros. For the special case of binary variables with pairwise interactions, we derive stronger results that take into account single-variable factors and the type of pairwise interactions (attractive, mixed or repulsive). We show empirically that these bounds improve upon existing bounds.
Phase transitions and BP
While we focussed on undamped parallel BP in chapter 2, in the next chapter, we investigate the influence of damping and the use of alternative update schemes. We focus on the special case of binary variables with pairwise interactions and zero local fields in the interest of simplicity. Whereas in the previous chapter we studied the global (“uniform”) convergence properties of BP, in chapter 3 we analyze the local stability of the “high-temperature” fixed point of BP.11 Further, we investigate the relationship between the properties of this fixed point and the properties of the corresponding stationary point of the Bethe free energy.
11 If the interactions are weak enough, BP has a unique fixed point. In statistical physics, weak interactions correspond to high temperatures. Therefore, we call this particular fixed point the high-temperature BP fixed point.
We distinguish three cases for the interactions: ferromagnetic (attractive), antiferromagnetic (repulsive) and spin-glass (mixed). We prove that the convergence conditions for undamped, parallel BP derived in chapter 2 are sharp in the ferromagnetic case. Also, the use of damping would only slow down convergence to the high-temperature fixed point. In contrast, in the antiferromagnetic case, the use of damping or sequential updates significantly improves the range of instances on which BP converges. In the spin-glass case, we observe that damping only slightly improves convergence of BP.

Further, we show how one can estimate analytically the temperature (interaction strength) at which the high-temperature BP fixed point becomes unstable for random graphs with arbitrary degree distributions and random interactions, extending the worst-case results with some average-case results. The results we obtain are in agreement with the results of the replica method from statistical physics. This provides a link between statistical physics and the properties of the BP algorithm. In particular, it leads to the conclusion that the behavior of BP is closely related to the phase transitions in the underlying graphical model.
Reducing the BP error
In the fourth chapter, we show how the accuracy of BP can be improved by taking into account the influence of loops in the graphical model. Extending a method proposed by Montanari and Rizzo [2005], we propose a novel way of generalizing the BP update equations by dropping the basic BP assumption of independence of incoming messages. We call this method the Loop Correction (LC) method.

The basic idea behind the Loop Correction method is the following. A cavity distribution of some variable in a graphical model is the probability distribution on its Markov blanket for a modified graphical model, in which all factors involving that variable have been removed, thereby breaking all the loops involving that variable. The Loop Correction method consists of two steps: first, the cavity distributions of all variables are estimated (using some approximate inference method), and second, these initial estimates are improved by a message-passing algorithm, which reduces the errors in the estimated cavity distributions.

If the initial cavity approximations are taken to be uniform (or completely factorized) distributions, the Loop Correction algorithm reduces to the BP algorithm. In that sense, it can be considered to be a generalization of BP. On the other hand, if the initial cavity approximations contain the effective interactions between variables in the cavity, application of the Loop Correction method usually gives significantly better results than the original (uncorrected) approximate inference algorithm used to estimate the cavity approximations. Indeed, we often observe that the loop-corrected error is approximately the square of the error of the uncorrected approximate inference method.

We report the results of an extensive experimental comparison of various approximate inference methods on a variety of graphical models, including real world networks.
We conclude that the LC method obtains the most accurate results in general, at the cost of significantly increased computation time compared to BP.
Bounds on marginal probabilities
In the final chapter, we further develop some of the ideas that arose out of the convergence analysis in chapter 2 and the cavity interpretation in chapter 4. From chapter 2, we take the idea of studying how the distance between two different message vectors (for the same factor graph) evolves during BP iterations. From chapter 4, we take the cavity interpretation that relates the exact marginals to the BP marginals. The key insight exploited in chapter 5 is that by combining and extending these ideas, it is possible to derive rigorous bounds on the exact single-variable marginals. By construction, the same bounds also apply to the BP beliefs. We also derive a related method that propagates bounds over a "self-avoiding-walk tree", inspired by recent results of Ihler [2007]. We show empirically that our bounds often outperform existing bounds in terms of accuracy and/or computation time. We apply the bounds on factor graphs arising in a medical diagnosis application and show that the bounds can yield nontrivial results.
Chapter 2
Sufficient conditions for convergence of BP
We derive novel conditions that guarantee convergence of Belief Propagation (BP) to a unique fixed point, irrespective of the initial messages, for parallel (synchronous) updates. The computational complexity of the conditions is polynomial in the number of variables. In contrast with previously existing conditions, our results are directly applicable to arbitrary factor graphs (with discrete variables) and are shown to be valid also in the case of factors containing zeros, under some additional conditions. We compare our bounds with existing ones, numerically and, if possible, analytically. For binary variables with pairwise interactions, we derive sufficient conditions that take into account local evidence (i.e., single-variable factors) and the type of pairwise interactions (attractive or repulsive). It is shown empirically that this bound outperforms existing bounds.
2.1 Introduction
Belief Propagation [Pearl, 1988; Kschischang et al., 2001], also known as "Loopy Belief Propagation" and as the "Sum-Product Algorithm", which we will henceforth abbreviate as BP, is a popular algorithm for approximate inference in graphical models. Applications can be found in diverse areas such as error correcting codes (iterative channel decoding algorithms for Turbo Codes and Low Density Parity Check Codes [McEliece et al., 1998]), combinatorial optimization (satisfiability problems such as 3-SAT and graph coloring [Braunstein and Zecchina, 2004]) and computer vision (stereo matching [Sun et al., 2003] and image restoration [Tanaka, 2002]). BP
can be regarded as the most elementary one in a family of related algorithms, consisting of double-loop algorithms [Heskes et al., 2003], GBP [Yedidia et al., 2005], EP [Minka, 2001], EC [Opper and Winter, 2005], the Max-Product Algorithm [Weiss and Freeman, 2001], the Survey Propagation Algorithm [Braunstein and Zecchina, 2004; Braunstein et al., 2005] and Fractional BP [Wiegerinck and Heskes, 2003]. A good understanding of BP may therefore be beneficial to understanding these other algorithms as well.

In practice, there are two major obstacles in the application of BP to concrete problems: (i) if BP converges, it is not clear whether the results are a good approximation of the exact marginals; (ii) BP does not always converge, and in these cases gives no approximations at all. These two issues might actually be interrelated: the "folklore" is that failure of BP to converge often indicates low quality of the Bethe approximation on which it is based. This would mean that if one has to "force" BP to converge (e.g., by using damping or double-loop approaches), one may expect the results to be of low quality.

Although BP is an old algorithm that has been reinvented in many fields, a thorough theoretical understanding of the two aforementioned issues and their relation is still lacking. Significant progress has been made in recent years regarding the question under what conditions BP converges [Tatikonda and Jordan, 2002; Tatikonda, 2003; Ihler et al., 2005b],$^1$ on the uniqueness of fixed points [Heskes, 2004], and on the accuracy of the marginals [Tatikonda, 2003], but the theoretical understanding is still incomplete. For the special case of a graphical model consisting of a single loop, it has been shown that convergence rate and accuracy are indeed related [Weiss, 2000].

© 2007 IEEE. Reprinted, with permission, from [Mooij and Kappen, 2007b], which extends [Mooij and Kappen, 2005b].
In this work, we study the question of convergence of BP and derive new suffi- cient conditions for BP to converge to a unique fixed point. Our results are more general and in certain cases stronger than previously known sufficient conditions.
2.2 Background
To introduce our notation, we give a short treatment of factorizing probability distributions, the corresponding visualizations called factor graphs, and the BP algorithm on factor graphs. For an excellent, more extensive treatment of these topics we refer the reader to [Kschischang et al., 2001].
$^1$ After initial submission of this work, [Ihler et al., 2005a] came to our attention, which contains improved versions of results in [Ihler et al., 2005b], some of which are similar or identical to results presented here (see also section 2.5.2).

2.2.1 Factor graphs

Consider $N$ random variables $x_i$ for $i \in \mathcal{V} := \{1, 2, \dots, N\}$, with $x_i$ taking values in a finite set $\mathcal{X}_i$. We will frequently use the following multi-index notation. Let
$A = \{i_1, i_2, \dots, i_m\} \subseteq \mathcal{V}$ with $i_1 < i_2 < \dots < i_m$. We write $\mathcal{X}_A := \mathcal{X}_{i_1} \times \mathcal{X}_{i_2} \times \dots \times \mathcal{X}_{i_m}$, and for any family $(Y_i)_{i \in B}$ with $A \subseteq B \subseteq \mathcal{V}$, we write $Y_A := (Y_{i_1}, Y_{i_2}, \dots, Y_{i_m})$. We are interested in the class of probability measures on $\mathcal{X}_{\mathcal{V}}$ that can be written as a product of factors (also called potentials or interactions):
$$\mathbb{P}(x_1, \dots, x_N) := \frac{1}{Z} \prod_{I \in \mathcal{F}} \psi_I(x_{N_I}). \tag{2.1}$$
For each factor index $I \in \mathcal{F}$, there is an associated subset $N_I \subseteq \mathcal{V}$ of variable indices and the factor $\psi_I$ is a positive function$^2$ $\psi_I : \mathcal{X}_{N_I} \to (0, \infty)$. $Z$ is a normalizing constant ensuring that $\sum_{x_{\mathcal{V}} \in \mathcal{X}_{\mathcal{V}}} \mathbb{P}(x_{\mathcal{V}}) = 1$. The class of probability measures described by (2.1) contains Markov random fields as well as Bayesian networks. We will use uppercase letters for indices of factors ($I, J, K, \dots \in \mathcal{F}$) and lowercase letters for indices of variables ($i, j, k, \dots \in \mathcal{V}$).

The factor graph that corresponds to the probability distribution (2.1) is a bipartite graph with variable nodes $i \in \mathcal{V}$, factor nodes $I \in \mathcal{F}$, and edges between variable nodes and factor nodes; there is an edge between variable node $i$ and factor node $I$ if and only if the factor $\psi_I$ depends on the variable $x_i$, i.e., if $i \in N_I$. The neighbors of a factor node $I \in \mathcal{F}$ are precisely the variables $N_I$, and the neighbors $N_i$ of a variable node $i \in \mathcal{V}$ are the factors that depend on that variable, i.e., $N_i := \{I \in \mathcal{F} : i \in N_I\}$. For each variable node $i \in \mathcal{V}$, we define the set of its neighboring variable nodes by $\partial i := \bigcup_{I \in N_i} N_I \setminus \{i\}$, i.e., $\partial i$ is the set of indices of those variables that interact directly with $x_i$.
2.2.2 Belief Propagation

Belief Propagation is an algorithm that calculates approximations to the marginals $\{\mathbb{P}(x_{N_I})\}_{I \in \mathcal{F}}$ and $\{\mathbb{P}(x_i)\}_{i \in \mathcal{V}}$ of the probability measure (2.1). The calculation is done by message-passing on the factor graph: each node passes messages to its neighbors (see also figure 2.1). One usually discriminates between two types of messages: messages $\mu_{I \to i}(x_i)$ from factors to variables and messages $\mu_{i \to I}(x_i)$ from variables to factors (where $i \in \mathcal{V}$, $I \in N_i$). Both messages are positive functions on $\mathcal{X}_i$, or, equivalently, vectors in $\mathbb{R}^{\mathcal{X}_i}$ (with positive components). The messages that are sent by a node depend on the incoming messages; the new messages, designated by $\mu'$, are given in terms of the incoming messages by the following BP update rules:$^3$

$$\mu'_{j \to I}(x_j) \propto \prod_{J \in N_j \setminus I} \mu_{J \to j}(x_j) \qquad \forall j \in \mathcal{V},\ \forall I \in N_j; \tag{2.2}$$

$$\mu'_{I \to i}(x_i) \propto \sum_{x_{N_I \setminus i}} \psi_I(x_{N_I}) \prod_{j \in N_I \setminus i} \mu_{j \to I}(x_j) \qquad \forall I \in \mathcal{F},\ \forall i \in N_I. \tag{2.3}$$

$^2$ In subsection 2.4.5 we will loosen this assumption and allow for factors containing zeros.
$^3$ We abuse notation slightly by writing $X \setminus x$ instead of $X \setminus \{x\}$ for sets $X$.
Figure 2.1: Part of the factor graph illustrating the BP update rules (2.2) and (2.3). The factor nodes $I, J, J' \in \mathcal{F}$ are drawn as rectangles, the variable nodes $i, j, j', j'' \in \mathcal{V}$ as circles. Note that $N_j \setminus I = \{J, J'\}$ and $N_I \setminus i = \{j, j', j''\}$. Apart from the messages that have been drawn, each edge also carries a message flowing in the opposite direction.
Usually, one normalizes the messages in the $\ell_1$-sense, i.e., such that

$$\sum_{x_i \in \mathcal{X}_i} \mu_{i \to I}(x_i) = 1, \qquad \sum_{x_i \in \mathcal{X}_i} \mu_{I \to i}(x_i) = 1 \qquad \forall i \in \mathcal{V},\ \forall I \in N_i.$$

If all messages have converged to some fixed point $\mu^{(\infty)}$, one calculates the approximate marginals, called beliefs, as follows:

$$b_i(x_i) \propto \prod_{I \in N_i} \mu^{(\infty)}_{I \to i}(x_i) \approx \mathbb{P}(x_i) \qquad \forall i \in \mathcal{V},$$

$$b_I(x_{N_I}) \propto \psi_I(x_{N_I}) \prod_{i \in N_I} \mu^{(\infty)}_{i \to I}(x_i) \approx \mathbb{P}(x_{N_I}) \qquad \forall I \in \mathcal{F},$$

where the normalization is by definition in $\ell_1$-sense. A fixed point always exists if all factors are strictly positive [Yedidia et al., 2005]. However, the existence of a fixed point does not necessarily imply convergence towards the fixed point, and fixed points may be unstable. Note that the beliefs are invariant under rescaling of the messages

$$\mu^{(\infty)}_{I \to i}(x_i) \mapsto \alpha_{I \to i}\, \mu^{(\infty)}_{I \to i}(x_i), \qquad \mu^{(\infty)}_{i \to I}(x_i) \mapsto \alpha_{i \to I}\, \mu^{(\infty)}_{i \to I}(x_i)$$

for arbitrary positive constants $\alpha$, which shows that the precise way of normalizing the messages in (2.2) and (2.3) is irrelevant. For numerical stability however, some way of normalization (not necessarily in $\ell_1$-sense) is desired to ensure that the messages stay in some compact domain.

In the following, we will formulate everything in terms of the messages $\mu_{I \to i}(x_i)$ from factors to variables; the update equations are then obtained by substituting (2.2) in (2.3):

$$\mu'_{I \to i}(x_i) = C_{I \to i} \sum_{x_{N_I \setminus i}} \psi_I(x_{N_I}) \prod_{j \in N_I \setminus i} \prod_{J \in N_j \setminus I} \mu_{J \to j}(x_j), \tag{2.4}$$

with $C_{I \to i}$ such that $\sum_{x_i \in \mathcal{X}_i} \mu'_{I \to i}(x_i) = 1$. We consider here BP with a parallel update scheme, which means that all message updates (2.4) are done in parallel.

2.3 Special case: binary variables with pairwise interactions
In this section we investigate the simple case of binary variables (i.e., $\#(\mathcal{X}_i) = 2$ for all $i \in \mathcal{V}$), and in addition we assume that all potentials consist of at most two variables ("pairwise interactions"). Although this is a special case of the more general theory to be presented later on, we start with this simple case because it illustrates most of the underlying ideas without getting involved with the additional technicalities of the general case.

We will assume that all variables are $\pm 1$-valued, i.e., $\mathcal{X}_i = \{-1, +1\}$ for all $i \in \mathcal{V}$. We take the factor index set as $\mathcal{F} = \mathcal{F}_1 \cup \mathcal{F}_2$ with $\mathcal{F}_1 = \mathcal{V}$ (the "local evidence") and $\mathcal{F}_2 \subseteq \{\{i, j\} : i, j \in \mathcal{V}, i \neq j\}$ (the "pair potentials"). The probability measure (2.1) can then be written as

$$\mathbb{P}(x_{\mathcal{V}}) = \frac{1}{Z} \exp\Big( \sum_{\{i,j\} \in \mathcal{F}_2} J_{ij} x_i x_j + \sum_{i \in \mathcal{F}_1} \theta_i x_i \Big) \tag{2.5}$$

for some choice of the parameters $J_{ij}$ ("couplings") and $\theta_i$ ("local fields"), with $\psi_i(x_i) = \exp(\theta_i x_i)$ for $i \in \mathcal{F}_1$ and $\psi_{ij}(x_i, x_j) = \exp(J_{ij} x_i x_j)$ for $\{i, j\} \in \mathcal{F}_2$.

Note from (2.4) that the messages sent from single-variable factors $\mathcal{F}_1$ to variables are constant. Thus the question whether messages converge can be decided by studying only the messages sent from pair potentials $\mathcal{F}_2$ to variables. It turns out to be advantageous to use the following "natural" parameterization of the messages (often used in statistical physics):
$$\tanh \nu_{i \to j} := \mu_{\{i,j\} \to j}(x_j = 1) - \mu_{\{i,j\} \to j}(x_j = -1), \tag{2.6}$$

where $\nu_{i \to j} \in \mathbb{R}$ is now interpreted as a message sent from variable $i$ to variable $j$ (instead of a message sent from the factor $\{i, j\}$ to variable $j$). Note that in the pairwise case, the product over $j \in N_I \setminus i$ in (2.4) becomes trivial. Some elementary algebraic manipulations show that the BP update equations (2.4) become particularly simple in this parameterization: they can be written as

$$\tanh(\nu'_{i \to j}) = \tanh(J_{ij}) \tanh\Big( \theta_i + \sum_{t \in \partial i \setminus j} \nu_{t \to i} \Big), \tag{2.7}$$

where $\partial i = \{t \in \mathcal{V} : \{i, t\} \in \mathcal{F}_2\}$ are the variables that interact with $i$ via a pair potential. Defining the set of ordered pairs $\mathcal{D} := \{i \to j : \{i, j\} \in \mathcal{F}_2\}$, we see that the parallel BP update is a mapping $f : \mathbb{R}^{\mathcal{D}} \to \mathbb{R}^{\mathcal{D}}$; (2.7) specifies the component $f(\nu)_{i \to j} := \nu'_{i \to j}$ in terms of the components of $\nu$. Our goal is now to derive sufficient conditions under which the mapping $f$ is a contraction. For this we need some elementary but powerful mathematical theorems.
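The pairwise update (2.7) is straightforward to implement directly. The following is a minimal sketch (not the implementation used in this thesis), assuming a symmetric coupling matrix `J` with zero diagonal and storing the message $\nu_{i \to j}$ in entry `nu[i, j]` of an $N \times N$ array; the helper name `bp_update` is ours:

```python
import numpy as np

def bp_update(nu, J, theta):
    """One parallel BP update in the nu-parameterization (eq. 2.7):
    tanh(nu'[i,j]) = tanh(J[i,j]) * tanh(theta[i] + sum_{t in di\j} nu[t,i]).
    Only entries nu[i, j] with J[i, j] != 0 are meaningful; the rest stay zero."""
    N = len(theta)
    incoming = theta + nu.sum(axis=0)            # theta_i + sum_t nu[t, i]
    nu_new = np.zeros_like(nu)
    for i in range(N):
        for j in range(N):
            if J[i, j] != 0.0:
                cavity = incoming[i] - nu[j, i]  # exclude the message from j
                nu_new[i, j] = np.arctanh(np.tanh(J[i, j]) * np.tanh(cavity))
    return nu_new
```

Iterating `bp_update` until the messages stop changing and reading off $\tanh(\theta_i + \sum_{t \in \partial i} \nu_{t \to i})$ then gives the BP approximation to the magnetizations; on a tree this reproduces the exact marginals.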
2.3.1 Normed spaces, contractions and bounds

In this subsection we introduce some (standard) notation and remind the reader of some elementary but important properties of vector norms, matrix norms, contractions and the Mean Value Theorem in arbitrary normed vector spaces, which are the main mathematical ingredients for our basic tool, Lemma 2.3. The reader familiar with these topics can skip this subsection and proceed directly to Lemma 2.3 in section 2.3.2.

Let $(V, \|\cdot\|)$ be a normed finite-dimensional real vector space. Examples of norms that will be important later on are the $\ell_1$-norm on $\mathbb{R}^N$, defined by
$$\|x\|_1 := \sum_{i=1}^N |x_i|$$

and the $\ell_\infty$-norm on $\mathbb{R}^N$, defined by
$$\|x\|_\infty := \max_{i \in \{1,\dots,N\}} |x_i|.$$
A norm on a vector space $V$ induces a metric on $V$ by the definition $d(v, w) := \|v - w\|$. The resulting metric space is complete.$^4$

Let $(X, d)$ be a metric space. A mapping $f : X \to X$ is called a contraction with respect to $d$ if there exists $0 \le K < 1$ such that

$$d(f(x), f(y)) \le K\, d(x, y) \qquad \text{for all } x, y \in X. \tag{2.8}$$

In case $d$ is induced by a norm $\|\cdot\|$, we will call a contraction with respect to $d$ a $\|\cdot\|$-contraction. If $(X, d)$ is complete, we can apply the following theorem, due to Banach:

Theorem 2.1 (Contracting Mapping Principle) Let $f : X \to X$ be a contraction of a complete metric space $(X, d)$. Then $f$ has a unique fixed point $x_\infty \in X$ and for any $x \in X$, the sequence $x, f(x), f^2(x), \dots$ obtained by iterating $f$ converges to $x_\infty$. The rate of convergence is at least linear, since $d(f(x), x_\infty) \le K\, d(x, x_\infty)$ for all $x \in X$.

Proof. Can be found in many textbooks on analysis. □
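As a small illustration of Theorem 2.1 (our own example, not taken from this chapter), the sketch below iterates the contraction $f(x) = \tfrac{1}{2}\cos x$ on $\mathbb{R}$, for which $K = \tfrac{1}{2}$ since $|f'(x)| \le \tfrac{1}{2}$; the distance to the fixed point shrinks by at least the factor $K$ at every step:

```python
import math

def banach_iterate(f, x0, n_iter=60):
    """Iterate a mapping f starting from x0, recording the whole trajectory."""
    xs = [x0]
    for _ in range(n_iter):
        xs.append(f(xs[-1]))
    return xs

# f(x) = 0.5*cos(x) is a contraction on R with K = 0.5, since |f'(x)| <= 0.5.
xs = banach_iterate(lambda x: 0.5 * math.cos(x), x0=3.0)
x_inf = xs[-1]                        # numerically converged fixed point
errors = [abs(x - x_inf) for x in xs]
```

After 60 iterations the iterate is at the unique fixed point to machine precision, and the recorded errors decrease at least geometrically with ratio $\tfrac{1}{2}$, matching the linear convergence rate of the theorem.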
Note that linear convergence means that the error decreases exponentially; indeed, $d(f^n(x), x_\infty) \le C K^n$ for some $C$.

Let $(V, \|\cdot\|)$ be a normed space. The norm induces a matrix norm (also called operator norm) on linear mappings $A : V \to V$, defined as follows:

$$\|A\| := \sup_{v \in V,\ \|v\| \le 1} \|A v\|.$$

$^4$ Completeness is a topological property which we will not further discuss, but we need it to apply Theorem 2.1.
The $\ell_1$-norm on $\mathbb{R}^N$ induces the following matrix norm:
$$\|A\|_1 = \max_{j \in \{1,\dots,N\}} \sum_{i=1}^N |A_{ij}|, \tag{2.9}$$
where $A_{ij} := (A e_j)_i$ with $e_j$ the $j$-th canonical basis vector. The $\ell_\infty$-norm on $\mathbb{R}^N$ induces the following matrix norm:
$$\|A\|_\infty = \max_{i \in \{1,\dots,N\}} \sum_{j=1}^N |A_{ij}|. \tag{2.10}$$
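In code, the induced norms (2.9) and (2.10) are simply the maximum absolute column sum and the maximum absolute row sum; the helper names below are ours, as a quick sketch:

```python
import numpy as np

def norm_l1(A):
    """Matrix norm induced by the l1 vector norm (eq. 2.9):
    the maximum absolute column sum."""
    return np.abs(A).sum(axis=0).max()

def norm_linf(A):
    """Matrix norm induced by the l-infinity vector norm (eq. 2.10):
    the maximum absolute row sum."""
    return np.abs(A).sum(axis=1).max()
```

These agree with NumPy's built-in `np.linalg.norm(A, 1)` and `np.linalg.norm(A, np.inf)`, and both dominate the spectral radius of `A` (a fact used later in section 2.3.4).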
In the following consequence of the well-known Mean Value Theorem, the matrix norm of the derivative ("Jacobian") $f'(v)$ at $v \in V$ of a differentiable mapping $f : V \to V$ is used to bound the distance of the $f$-images of two vectors:

Lemma 2.2 Let $(V, \|\cdot\|)$ be a normed space and $f : V \to V$ a differentiable mapping. Then, for $x, y \in V$:
$$\|f(y) - f(x)\| \le \|y - x\| \cdot \sup_{z \in [x,y]} \|f'(z)\|,$$

where we wrote $[x, y]$ for the segment $\{\lambda x + (1 - \lambda) y : \lambda \in [0, 1]\}$ joining $x$ and $y$.

Proof. See [Dieudonné, 1969, Thm. 8.5.4]. □
2.3.2 The basic tool

Combining Theorem 2.1 and Lemma 2.2 immediately yields our basic tool:
Lemma 2.3 Let $(V, \|\cdot\|)$ be a normed space, $f : V \to V$ differentiable and suppose that

$$\sup_{v \in V} \|f'(v)\| < 1.$$

Then $f$ is a $\|\cdot\|$-contraction by Lemma 2.2. Hence, for any $v \in V$, the sequence $v, f(v), f^2(v), \dots$ converges to a unique fixed point $v_\infty \in V$ with a convergence rate that is at least linear, by Theorem 2.1.
2.3.3 Sufficient conditions for BP to be a contraction

We apply Lemma 2.3 to the case at hand: the parallel BP update mapping $f : \mathbb{R}^{\mathcal{D}} \to \mathbb{R}^{\mathcal{D}}$, written out in components in (2.7). Different choices of the vector norm on $\mathbb{R}^{\mathcal{D}}$ will yield different sufficient conditions for whether iterating $f$ will converge to a unique fixed point. We will study two examples: the $\ell_1$-norm and the $\ell_\infty$-norm. The derivative of $f$ is easily calculated from (2.7) and is given by
$$f'(\nu)_{i \to j, k \to l} = \frac{\partial \nu'_{i \to j}}{\partial \nu_{k \to l}} = A_{i \to j, k \to l}\, B_{i \to j}(\nu), \tag{2.11}$$

where$^5$

$$B_{i \to j}(\nu) := \frac{1 - \tanh^2\big( \theta_i + \sum_{t \in \partial i \setminus j} \nu_{t \to i} \big)}{1 - \tanh^2\big( \nu'_{i \to j}(\nu) \big)} \operatorname{sgn} J_{ij}, \tag{2.12}$$

$$A_{i \to j, k \to l} := \tanh|J_{ij}|\; \delta_{il}\; \mathbb{1}_{\partial i \setminus j}(k). \tag{2.13}$$

Note that we have absorbed all $\nu$-dependence in the factor $B_{i \to j}(\nu)$; the reason for this will become apparent later on. The factor $A_{i \to j, k \to l}$ is nonnegative and independent of $\nu$ and captures the structure of the graphical model. Note that $\sup_{\nu \in V} |B_{i \to j}(\nu)| = 1$, implying that
$$\left| \frac{\partial \nu'_{i \to j}}{\partial \nu_{k \to l}} \right| \le A_{i \to j, k \to l} \tag{2.14}$$

everywhere on $V$.
Example: the $\ell_\infty$-norm

The $\ell_\infty$-norm on $\mathbb{R}^{\mathcal{D}}$ yields the following condition:

Corollary 2.4 For binary variables with pairwise interactions and probability distribution (2.5), if

$$\max_{i \in \mathcal{V}} \big( \#(\partial i) - 1 \big) \max_{j \in \partial i} \tanh|J_{ij}| < 1, \tag{2.15}$$

BP is an $\ell_\infty$-contraction and converges to a unique fixed point, irrespective of the initial messages.

Proof. Using (2.10), (2.13) and (2.14):
$$\|f'(\nu)\|_\infty = \max_{i \to j} \sum_{k \to l} \left| \frac{\partial \nu'_{i \to j}}{\partial \nu_{k \to l}} \right| \le \max_{i \to j} \sum_{k \to l} \tanh|J_{ij}|\; \delta_{il}\; \mathbb{1}_{\partial i \setminus j}(k) = \max_{i \to j} \sum_{k \in \partial i \setminus j} \tanh|J_{ij}| = \max_{i \in \mathcal{V}} \big( \#(\partial i) - 1 \big) \max_{j \in \partial i} \tanh|J_{ij}|,$$

and now simply apply Lemma 2.3. □
Another example: the $\ell_1$-norm

Using the $\ell_1$-norm instead, we find:

Corollary 2.5 For binary variables with pairwise interactions and probability distribution (2.5), if

$$\max_{i \in \mathcal{V}} \max_{k \in \partial i} \sum_{j \in \partial i \setminus k} \tanh|J_{ij}| < 1, \tag{2.16}$$

BP is an $\ell_1$-contraction and converges to a unique fixed point, irrespective of the initial messages.
$^5$ For a set $X$, we define the indicator function $\mathbb{1}_X$ of $X$ by $\mathbb{1}_X(x) = 1$ if $x \in X$ and $\mathbb{1}_X(x) = 0$ if $x \notin X$.
Proof. Similar to the proof of Corollary 2.4, now using (2.9) instead of (2.10):

$$\|f'(\nu)\|_1 \le \max_{k \to l} \sum_{i \to j} \tanh|J_{ij}|\; \delta_{il}\; \mathbb{1}_{\partial i \setminus j}(k) = \max_{i \in \mathcal{V}} \max_{k \in \partial i} \sum_{j \in \partial i \setminus k} \tanh|J_{ij}|. \qquad \square$$
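The left-hand sides of conditions (2.15) and (2.16) can be evaluated numerically for any coupling matrix. The sketch below uses our own helper names and assumes a symmetric coupling matrix `J` with zero diagonal:

```python
import numpy as np

def linf_bound(J):
    """Left-hand side of (2.15): max_i (#(di) - 1) * max_{j in di} tanh|J_ij|."""
    N = J.shape[0]
    vals = []
    for i in range(N):
        nbrs = [j for j in range(N) if J[i, j] != 0.0]
        if len(nbrs) > 1:
            vals.append((len(nbrs) - 1) * max(np.tanh(abs(J[i, j])) for j in nbrs))
    return max(vals, default=0.0)

def l1_bound(J):
    """Left-hand side of (2.16): max_i max_{k in di} sum_{j in di\k} tanh|J_ij|."""
    N = J.shape[0]
    vals = []
    for i in range(N):
        nbrs = [j for j in range(N) if J[i, j] != 0.0]
        for k in nbrs:
            vals.append(sum(np.tanh(abs(J[i, j])) for j in nbrs if j != k))
    return max(vals, default=0.0)
```

Consistent with the remark that (2.16) is implied by (2.15), `l1_bound(J)` never exceeds `linf_bound(J)`: each cavity sum of $\tanh|J_{ij}|$ is at most $(\#(\partial i) - 1)$ times the largest term.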
It is easy to see that condition (2.16) is implied by (2.15), but not conversely; thus in this case the $\ell_1$-norm yields a tighter bound than the $\ell_\infty$-norm.

2.3.4 Beyond norms: the spectral radius

Instead of pursuing a search for the optimal norm, we will derive a criterion for convergence based on the spectral radius of the matrix (2.13). The key idea is to look at several iterations of BP at once. This will yield a significantly stronger condition for convergence of BP to a unique fixed point. For a square matrix $A$, we denote by $\sigma(A)$ its spectrum, i.e., the set of eigenvalues of $A$. By $\rho(A)$ we denote its spectral radius, which is defined as $\rho(A) := \sup |\sigma(A)|$, i.e., the largest modulus of eigenvalues of $A$.$^6$
Lemma 2.6 Let $f : X \to X$ be a mapping, $d$ a metric on $X$ and suppose that $f^N$ is a $d$-contraction for some $N \in \mathbb{N}$. Then $f$ has a unique fixed point $x_\infty$ and for any $x \in X$, the sequence $x, f(x), f^2(x), \dots$ obtained by iterating $f$ converges to $x_\infty$.

Proof. Take any $x \in X$. Consider the $N$ sequences obtained by iterating $f^N$, starting respectively in $x, f(x), \dots, f^{N-1}(x)$:
$$\begin{aligned}
&x,\; f^N(x),\; f^{2N}(x),\; \dots \\
&f(x),\; f^{N+1}(x),\; f^{2N+1}(x),\; \dots \\
&\quad\vdots \\
&f^{N-1}(x),\; f^{2N-1}(x),\; f^{3N-1}(x),\; \dots
\end{aligned}$$
Each sequence converges to $x_\infty$ since $f^N$ is a $d$-contraction with fixed point $x_\infty$. But then the sequence $x, f(x), f^2(x), \dots$ must converge to $x_\infty$. □
Theorem 2.7 Let $f : \mathbb{R}^m \to \mathbb{R}^m$ be differentiable and suppose that $f'(x) = B(x) A$, where $A$ has nonnegative entries and $B$ is diagonal with bounded entries $|B_{ii}(x)| \le 1$. If $\rho(A) < 1$ then for any $x \in \mathbb{R}^m$, the sequence $x, f(x), f^2(x), \dots$ obtained by iterating $f$ converges to a fixed point $x_\infty$, which does not depend on $x$.

Proof. For a matrix $B$, we will denote by $|B|$ the matrix with entries $|B|_{ij} = |B_{ij}|$. For two matrices $B, C$ we will write $B \le C$ if $B_{ij} \le C_{ij}$ for all entries $(i, j)$. Note that if $|B| \le |C|$, then $\|B\|_1 \le \|C\|_1$. Also note that $|BC| \le |B|\,|C|$. Finally, if $0 \le A$ and $B \le C$, then $AB \le AC$ and $BA \le CA$.

$^6$ One should not confuse the spectral radius $\rho(A)$ with the spectral norm $\|A\|_2 = \sqrt{\rho(A^T A)}$ of $A$, the matrix norm induced by the $\ell_2$-norm.
Using these observations and the chain rule, we have for any $n \in \mathbb{N}^*$ and any $x \in \mathbb{R}^m$:

$$\big| (f^n)'(x) \big| = \left| \prod_{i=1}^n f'\big( f^{i-1}(x) \big) \right| \le \prod_{i=1}^n \Big( \big| B\big( f^{i-1}(x) \big) \big|\, A \Big) \le A^n,$$
hence $\|(f^n)'(x)\|_1 \le \|A^n\|_1$. By the Gelfand spectral radius theorem,
$$\lim_{n \to \infty} \big\| A^n \big\|_1^{1/n} = \rho(A).$$

Choose $\epsilon > 0$ such that $\rho(A) + \epsilon < 1$. For some $N$, $\|A^N\|_1 \le (\rho(A) + \epsilon)^N < 1$. Hence for all $x \in \mathbb{R}^m$, $\|(f^N)'(x)\|_1 < 1$. Applying Lemma 2.3, we conclude that $f^N$ is an $\ell_1$-contraction. Now apply Lemma 2.6. □
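The Gelfand limit can be observed numerically. In the illustrative sketch below (our own example, not taken from the thesis), the matrix has $\ell_1$ norm larger than 1, so Lemma 2.3 with the $\ell_1$-norm alone is inconclusive, yet $\|A^n\|_1^{1/n}$ settles down to $\rho(A) < 1$:

```python
import numpy as np

# A nonnegative matrix whose l1 norm exceeds 1 while its spectral radius
# stays below 1: single-step norm bounds fail, the multi-step view succeeds.
A = np.array([[0.0, 1.5],
              [0.1, 0.0]])
rho = max(abs(np.linalg.eigvals(A)))   # = sqrt(0.15), about 0.387

def gelfand_estimate(A, n):
    """n-th Gelfand estimate ||A^n||_1^(1/n) of the spectral radius."""
    return np.linalg.norm(np.linalg.matrix_power(A, n), 1) ** (1.0 / n)
```

This mirrors the role of the theorem in the proof above: even when $\|A\|_1 > 1$, some power $A^N$ has $\|A^N\|_1 < 1$ whenever $\rho(A) < 1$.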
Using (2.11), (2.12) and (2.13), this immediately yields:
Corollary 2.8 For binary variables with pairwise interactions and probability distribution (2.5), BP converges to a unique fixed point (irrespective of the initial messages), if the spectral radius of the $\#(\mathcal{D}) \times \#(\mathcal{D})$ matrix
$$A_{i \to j, k \to l} := \tanh|J_{ij}|\; \delta_{il}\; \mathbb{1}_{\partial i \setminus j}(k)$$

is strictly smaller than 1.

The calculation of the spectral radius of the (sparse) matrix $A$ can be done using standard numerical techniques in linear algebra.

Any matrix norm of $A$ is actually an upper bound on the spectral radius $\rho(A)$, since for any eigenvalue $\lambda$ of $A$ with eigenvector $v$ we have $|\lambda| \|v\| = \|\lambda v\| = \|A v\| \le \|A\| \|v\|$, hence $\rho(A) \le \|A\|$. This implies that no norm in Lemma 2.3 will result in a sharper condition than Corollary 2.8, hence the title of this section. Further, for a given matrix $A$ and some $\epsilon > 0$, there exists a vector norm $\|\cdot\|$ such that the induced matrix norm of $A$ satisfies $\rho(A) \le \|A\| \le \rho(A) + \epsilon$; see [Deutsch, 1975] for a constructive proof. Thus for given $A$ one can approximate $\rho(A)$ arbitrarily closely by induced matrix norms. This immediately gives a result on the convergence rate of BP (in case $\rho(A) < 1$): for any $\epsilon > 0$, there exists a norm-induced metric such that the linear rate of contraction of BP with respect to that metric is bounded from above by $\rho(A) + \epsilon$.

One might think that there is a shorter proof of Corollary 2.8: it seems quite plausible intuitively that in general, for a continuously differentiable $f : \mathbb{R}^m \to \mathbb{R}^m$, iterating $f$ will converge to a unique fixed point if
$$\sup_{x \in \mathbb{R}^m} \rho\big( f'(x) \big) < 1.$$

However, this conjecture (which has been open for a long time) has been shown to be true in two dimensions but false in higher dimensions [Cima et al., 1997].
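Corollary 2.8 is straightforward to check numerically: build the matrix $A$ indexed by the ordered pairs $\mathcal{D}$ and compute its spectral radius. A minimal dense sketch (our own helper name, assuming a symmetric coupling matrix `J` with zero diagonal):

```python
import numpy as np

def bp_matrix(J):
    """Build the #(D) x #(D) matrix of Corollary 2.8,
    A[(i->j),(k->l)] = tanh|J_ij| * delta_{il} * 1_{di\j}(k)."""
    N = J.shape[0]
    edges = [(i, j) for i in range(N) for j in range(N)
             if i != j and J[i, j] != 0.0]          # ordered pairs i -> j
    idx = {e: n for n, e in enumerate(edges)}
    A = np.zeros((len(edges), len(edges)))
    for (i, j) in edges:
        for (k, l) in edges:
            if l == i and k != j:                    # delta_{il} and k in di\j
                A[idx[(i, j)], idx[(k, l)]] = np.tanh(abs(J[i, j]))
    return A
```

Since any induced matrix norm upper-bounds $\rho(A)$, the spectral radius computed this way is never larger than the $\ell_1$ and $\ell_\infty$ quantities of Corollaries 2.5 and 2.4, which are exactly $\|A\|_1$ and $\|A\|_\infty$ of this matrix. For large sparse models one would use a sparse eigensolver instead of the dense construction shown here.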
2.3.5 Improved bound for strong local evidence

Empirically, it is known that the presence of strong local fields (i.e., single-variable factors which are far from uniform) often improves the convergence of BP. However, our results so far are completely independent of the parameters $(\theta_i)_{i \in \mathcal{V}}$ that measure the strength of the local evidence. By proceeding more carefully than we have done above, the results can easily be improved in such a way that local evidence is taken into account.

Consider the quantity $B_{i \to j}$ defined in (2.12). We have bounded this quantity by noting that $\sup_{\nu \in V} |B_{i \to j}(\nu)| = 1$. Note that for all BP updates (except for the first one), the argument $\nu$ (the incoming messages) is in $f(V)$, which can be considerably smaller than the complete vector space $V$. Thus, after the first BP update, we can use
$$\sup_{\nu \in f(V)} |B_{i \to j}(\nu)| = \sup_{\nu \in f(V)} \frac{1 - \tanh^2\big( \theta_i + \sum_{k \in \partial i \setminus j} \nu_{k \to i} \big)}{1 - \tanh^2\big( \nu'_{i \to j}(\nu) \big)} = \sup_{\nu \in f(V)} \frac{1 - \tanh^2( \eta_{i \setminus j} )}{1 - \tanh^2(J_{ij}) \tanh^2( \eta_{i \setminus j} )},$$

where we used (2.7) and defined the cavity field

$$\eta_{i \setminus j}(\nu) := \theta_i + \sum_{k \in \partial i \setminus j} \nu_{k \to i}. \tag{2.17}$$

The function

$$x \mapsto \frac{1 - \tanh^2 x}{1 - \tanh^2 J_{ij} \cdot \tanh^2 x}$$

is strictly decreasing for $x \ge 0$ and symmetric around $x = 0$; thus, defining

$$\eta^*_{i \setminus j} := \inf_{\nu \in f(V)} \big| \eta_{i \setminus j}(\nu) \big|, \tag{2.18}$$

we obtain

$$\sup_{\nu \in f(V)} |B_{i \to j}(\nu)| = \frac{1 - \tanh^2 \eta^*_{i \setminus j}}{1 - \tanh^2(J_{ij}) \tanh^2 \eta^*_{i \setminus j}}.$$
Now, from (2.7) we derive that

$$\big\{ \nu_{k \to i} : \nu \in f(V) \big\} = \big( -|J_{ki}|,\ |J_{ki}| \big),$$

hence

$$\big\{ \eta_{i \setminus j}(\nu) : \nu \in f(V) \big\} = \big( \eta^{(-)}_{i \setminus j},\ \eta^{(+)}_{i \setminus j} \big),$$

where we defined

$$\eta^{(\pm)}_{i \setminus j} := \theta_i \pm \sum_{k \in \partial i \setminus j} |J_{ki}|.$$
We conclude that $\eta^*_{i \setminus j}$ is simply the distance between 0 and the interval $\big( \eta^{(-)}_{i \setminus j},\ \eta^{(+)}_{i \setminus j} \big)$, i.e.,

$$\eta^*_{i \setminus j} = \begin{cases} -\eta^{(+)}_{i \setminus j} & \text{if } \eta^{(+)}_{i \setminus j} < 0, \\ \eta^{(-)}_{i \setminus j} & \text{if } \eta^{(-)}_{i \setminus j} > 0, \\ 0 & \text{otherwise.} \end{cases}$$
Thus the element $A_{i \to j, k \to i}$ (for $i \in \partial j$, $k \in \partial i \setminus j$) of the matrix $A$ defined in Corollary 2.8 can be replaced by

$$\frac{1 - \tanh^2 \eta^*_{i \setminus j}}{1 - \tanh^2(J_{ij}) \tanh^2 \eta^*_{i \setminus j}} \tanh|J_{ij}| = \frac{\tanh\big( |J_{ij}| - \eta^*_{i \setminus j} \big) + \tanh\big( |J_{ij}| + \eta^*_{i \setminus j} \big)}{2},$$

which is generally smaller than $\tanh|J_{ij}|$ and thus gives a tighter bound.

This trick can be repeated arbitrarily often: assume that $m \ge 0$ BP updates have been done already, which means that it suffices to take the supremum of $|B_{i \to j}(\nu)|$ over $\nu \in f^m(V)$. Define for all $i \to j \in \mathcal{D}$ and all $t = 0, 1, \dots, m$:

$$\underline{\eta}^{(t)}_{i \setminus j} := \inf \big\{ \eta_{i \setminus j}(\nu) : \nu \in f^t(V) \big\}, \tag{2.19}$$

$$\overline{\eta}^{(t)}_{i \setminus j} := \sup \big\{ \eta_{i \setminus j}(\nu) : \nu \in f^t(V) \big\}, \tag{2.20}$$

and define the intervals

$$\mathcal{H}^{(t)}_{i \setminus j} := \big[ \underline{\eta}^{(t)}_{i \setminus j},\ \overline{\eta}^{(t)}_{i \setminus j} \big]. \tag{2.21}$$

Specifically, for $t = 0$ we have $\underline{\eta}^{(0)}_{i \setminus j} = -\infty$ and $\overline{\eta}^{(0)}_{i \setminus j} = \infty$, which means that

$$\mathcal{H}^{(0)}_{i \setminus j} = \mathbb{R}. \tag{2.22}$$

Using (2.7) and (2.17), we obtain the following recursive relations for the intervals (where we use interval arithmetic defined in the obvious way):
$$\mathcal{H}^{(t+1)}_{i \setminus j} = \theta_i + \sum_{k \in \partial i \setminus j} \tanh^{-1}\Big( \tanh(J_{ki}) \tanh \mathcal{H}^{(t)}_{k \setminus i} \Big). \tag{2.23}$$

Using this recursion relation, one can calculate $\mathcal{H}^{(m)}_{i \setminus j}$ and define $\eta^*_{i \setminus j}$ as the distance (in absolute value) of the interval $\mathcal{H}^{(m)}_{i \setminus j}$ to 0:

$$\eta^*_{i \setminus j} = \begin{cases} -\overline{\eta}^{(m)}_{i \setminus j} & \text{if } \overline{\eta}^{(m)}_{i \setminus j} < 0, \\ \underline{\eta}^{(m)}_{i \setminus j} & \text{if } \underline{\eta}^{(m)}_{i \setminus j} > 0, \\ 0 & \text{otherwise.} \end{cases} \tag{2.24}$$
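The recursion (2.23) and the definition (2.24) can be sketched as follows, assuming a symmetric coupling matrix `J` with zero diagonal; since $x \mapsto \tanh^{-1}\big( \tanh(J_{ki}) \tanh x \big)$ is monotone, the image of an interval is obtained from the images of its endpoints (the helper name `eta_star` is ours):

```python
import numpy as np

def eta_star(J, theta, m):
    """Cavity-field intervals H^{(t)}_{i\j} via the interval recursion (2.23),
    returning eta*_{i\j}: the distance of H^{(m)}_{i\j} to 0 (eq. 2.24).
    Intervals are stored as lo/hi arrays indexed by the ordered pair (i, j)."""
    N = J.shape[0]
    lo = np.full((N, N), -np.inf)   # H^{(0)} = R
    hi = np.full((N, N), np.inf)
    for _ in range(m):
        lo_new = np.zeros((N, N))
        hi_new = np.zeros((N, N))
        for i in range(N):
            nbrs = [k for k in range(N) if J[i, k] != 0.0]
            for j in nbrs:
                a = b = theta[i]
                for k in nbrs:
                    if k == j:
                        continue
                    t = np.tanh(J[k, i])
                    # monotone map: endpoint images bound the interval image
                    ends = sorted([np.arctanh(t * np.tanh(lo[k, i])),
                                   np.arctanh(t * np.tanh(hi[k, i]))])
                    a += ends[0]
                    b += ends[1]
                lo_new[i, j], hi_new[i, j] = a, b
        lo, hi = lo_new, hi_new
    eta = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            if J[i, j] != 0.0:
                if hi[i, j] < 0:
                    eta[i, j] = -hi[i, j]
                elif lo[i, j] > 0:
                    eta[i, j] = lo[i, j]
    return eta
```

For $m = 1$ this reproduces the one-step bound above: the interval collapses to $\theta_i \pm \sum_{k \in \partial i \setminus j} |J_{ki}|$, and the resulting improved matrix entry $\tfrac{1}{2}\big[ \tanh(|J_{ij}| - \eta^*) + \tanh(|J_{ij}| + \eta^*) \big]$ never exceeds $\tanh|J_{ij}|$.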
Thus by replacing the matrix A in Corollary 2.8 by