Graphical Models and Message Passing

Yunshu Liu

ASPITRG Research Group

2013-07-16

References:
[1] Steffen Lauritzen, Graphical Models, Oxford University Press, 1996.
[2] Michael I. Jordan, Graphical Models, Statistical Science, Vol. 19, No. 1, Feb. 2004.
[3] Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer-Verlag New York, Inc., 2006.
[4] Daphne Koller and Nir Friedman, Probabilistic Graphical Models: Principles and Techniques, The MIT Press, 2009.
[5] Kevin P. Murphy, Machine Learning: A Probabilistic Perspective, The MIT Press, 2012.

Outline

Preliminaries on Graphical Models
- Directed graphical model
- Undirected graphical model
- Directed graph and Undirected graph

Message passing and the sum-product algorithm
- Factor graphs
- Sum-product algorithm

Outline

- Preliminaries on Graphical Models
- Message passing and the sum-product algorithm

Preliminaries on Graphical Models

Motivation: Curse of dimensionality

- Matroid enumeration
- Polyhedron computation
- Entropic Region

- Machine learning: computing likelihoods, marginal probabilities, ...

Graphical models help capture complex dependencies among random variables, build large-scale statistical models, and design efficient algorithms for inference.

Preliminaries on Graphical Models

Definition of graphical models: A graphical model is a probabilistic model for which a graph denotes the conditional dependence structure between random variables.

Example: Suppose MIT and Stanford accept undergraduate students based only on GPA.

- MIT: accepted by MIT
- Stanford: accepted by Stanford
- GPA: the applicant's GPA

[Figure: DAG with edges GPA → MIT and GPA → Stanford]

Given Alice's GPA as GPA_Alice,

P(MIT | Stanford, GPA_Alice) = P(MIT | GPA_Alice)

We say MIT is conditionally independent of Stanford given GPA_Alice, sometimes written (MIT ⊥ Stanford | GPA_Alice).

Bayesian networks: directed graphical model

A Bayesian network consists of a collection of probability distributions P over x = {x_1, ..., x_K} that factorize over a directed acyclic graph (DAG) in the following way:

p(x) = p(x_1, ..., x_K) = ∏_{k=1}^{K} p(x_k | pa_k)

where pa_k denotes the set of direct parent nodes of x_k. Aliases of Bayesian networks:

- probabilistic directed graphical model: defined via a directed acyclic graph (DAG)
- belief networks
- causal networks: directed arrows represent causal relations

Bayesian networks: directed graphical model

Examples of Bayesian networks

Consider an arbitrary joint distribution p(x) = p(x_1, x_2, x_3) over three variables; we can write

p(x_1, x_2, x_3) = p(x_3 | x_1, x_2) p(x_1, x_2)
                 = p(x_3 | x_1, x_2) p(x_2 | x_1) p(x_1)

which can be expressed as the following directed graph:

[Figure: fully connected DAG with edges x_1 → x_2, x_1 → x_3, x_2 → x_3]

Bayesian networks: directed graphical model

Examples

Similarly, if we change the order of x_1, x_2 and x_3 (i.e., consider the other permutations of them), we can express p(x_1, x_2, x_3) in five other ways, for example

p(x_1, x_2, x_3) = p(x_1 | x_2, x_3) p(x_2, x_3)
                 = p(x_1 | x_2, x_3) p(x_2 | x_3) p(x_3)

which corresponds to the following directed graph:

[Figure: fully connected DAG with edges x_3 → x_2, x_3 → x_1, x_2 → x_1]

Bayesian networks: directed graphical model

Examples

Recall the previous example about how MIT and Stanford accept undergraduate students. If we assign x_1 to "GPA", x_2 to "accepted by MIT" and x_3 to "accepted by Stanford", then since p(x_3 | x_1, x_2) = p(x_3 | x_1), we have

p(x_1, x_2, x_3) = p(x_3 | x_1, x_2) p(x_2 | x_1) p(x_1)
                 = p(x_3 | x_1) p(x_2 | x_1) p(x_1)

which corresponds to the following directed graph:

[Figure: DAG with edges GPA (x_1) → MIT (x_2) and GPA (x_1) → Stanford (x_3)]
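To make the factorization concrete, here is a minimal sketch; the CPT values below are invented for illustration (not taken from the slides). It builds the joint p(x_1, x_2, x_3) = p(x_3 | x_1) p(x_2 | x_1) p(x_1) from tables and checks numerically that P(MIT | Stanford, GPA) = P(MIT | GPA):

```python
import itertools

# Hypothetical CPTs for the GPA example (all numbers are made up).
# x1 = GPA (0 = low, 1 = high), x2 = accepted by MIT, x3 = accepted by Stanford.
p_x1 = {0: 0.7, 1: 0.3}
p_x2_given_x1 = {0: {0: 0.95, 1: 0.05}, 1: {0: 0.4, 1: 0.6}}   # p(x2 | x1)
p_x3_given_x1 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}     # p(x3 | x1)

# Joint distribution from the DAG factorization p(x1) p(x2|x1) p(x3|x1).
joint = {(x1, x2, x3): p_x1[x1] * p_x2_given_x1[x1][x2] * p_x3_given_x1[x1][x3]
         for x1, x2, x3 in itertools.product([0, 1], repeat=3)}

def conditional(joint, query_idx, value, given):
    """p(x[query_idx] = value | given), with given = {index: value}."""
    num = sum(p for x, p in joint.items()
              if x[query_idx] == value and all(x[i] == v for i, v in given.items()))
    den = sum(p for x, p in joint.items()
              if all(x[i] == v for i, v in given.items()))
    return num / den

# Check p(MIT | Stanford, GPA) == p(MIT | GPA) for one assignment.
lhs = conditional(joint, 1, 1, {0: 1, 2: 1})   # p(x2=1 | x1=1, x3=1)
rhs = conditional(joint, 1, 1, {0: 1})         # p(x2=1 | x1=1)
print(lhs, rhs)   # equal, since x2 is independent of x3 given x1 under this factorization
```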


Markov random fields: undirected graphical model

In the undirected case, the probability distribution factorizes according to functions defined on the cliques of the graph.

A clique is a subset of nodes in a graph such that there exists a link between all pairs of nodes in the subset. A maximal clique is a clique such that it is not possible to include any other nodes from the graph in the set without it ceasing to be a clique.

[Figure (Bishop, Fig. 8.29): a four-node undirected graph with a clique outlined in green and a maximal clique outlined in blue]

Example of cliques: {x_1,x_2}, {x_2,x_3}, {x_3,x_4}, {x_2,x_4}, {x_1,x_3}, {x_1,x_2,x_3}, {x_2,x_3,x_4}
Maximal cliques: {x_1,x_2,x_3}, {x_2,x_3,x_4}

Markov random fields: undirected graphical model

Markov random fields: Definition Denote C as a clique, xC the set of variables in clique C and ψC(xC) a nonnegative potential function associated with clique C. Then a Markov random field is a collection of distributions that factorize as a product of potential functions ψC(xC) over the maximal cliques of the graph:

p(x) = (1/Z) ∏_C ψ_C(x_C)

where the normalization constant Z = ∑_x ∏_C ψ_C(x_C) is sometimes called the partition function.

Markov random fields: undirected graphical model

Factorization of undirected graphs

Question: how to write the joint distribution for this undirected graph?

[Figure: undirected graph with edges x_1 — x_2 and x_1 — x_3; the independence (2 ⊥ 3 | 1) holds]

Answer:

p(x) = (1/Z) ψ_{12}(x_1, x_2) ψ_{13}(x_1, x_3)

where ψ_{12}(x_1, x_2) and ψ_{13}(x_1, x_3) are the potential functions and Z is the partition function that makes sure p(x) satisfies the conditions to be a probability distribution.
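A quick numeric sanity check of this factorization, with made-up potential tables for binary variables; it computes the partition function Z by brute-force summation and normalizes the product of potentials:

```python
import itertools

# Hypothetical potentials on the cliques {x1,x2} and {x1,x3}; values are arbitrary.
psi_12 = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}
psi_13 = {(0, 0): 1.5, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0}

# Unnormalized product of potentials over all joint configurations.
unnorm = {(x1, x2, x3): psi_12[(x1, x2)] * psi_13[(x1, x3)]
          for x1, x2, x3 in itertools.product([0, 1], repeat=3)}

Z = sum(unnorm.values())                      # partition function
p = {x: v / Z for x, v in unnorm.items()}     # normalized joint p(x)

print(Z, sum(p.values()))   # the normalized probabilities sum to 1
```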

Markov random fields: undirected graphical model

Markov property

Given an undirected graph G = (V, E) and a set of random variables X = (X_a)_{a∈V} indexed by V, we have the following Markov properties:

- Pairwise Markov property: any two non-adjacent variables are conditionally independent given all other variables: X_a ⊥ X_b | X_{V∖{a,b}} if {a, b} ∉ E.
- Local Markov property: a variable is conditionally independent of all other variables given its neighbors: X_a ⊥ X_{V∖(nb(a)∪{a})} | X_{nb(a)}, where nb(a) is the set of neighbors of node a.
- Global Markov property: any two subsets of variables are conditionally independent given a separating subset: X_A ⊥ X_B | X_S, where every path from a node in A to a node in B passes through S (i.e., when we remove all the nodes in S, there are no paths connecting any node in A to any node in B).

Markov random fields: undirected graphical model

Examples of Markov properties, for the undirected graph in Murphy's Figure 19.2(b):
- Pairwise Markov property: (1 ⊥ 7 | 2,3,4,5,6), (3 ⊥ 4 | 1,2,5,6,7)
- Local Markov property: (1 ⊥ 4,5,6,7 | 2,3), (4 ⊥ 1,3 | 2,5,6,7)
- Global Markov property: (1 ⊥ 6,7 | 3,4,5), (1,2 ⊥ 6,7 | 3,4,5)

[Figure (Murphy, Fig. 19.2): (a) a DGM on nodes 1-7; (b) its moralized version, represented as a UGM]

Markov random fields: undirected graphical model

Relationship between the different Markov properties and the factorization property. (F): factorization property; (G): global Markov property; (L): local Markov property; (P): pairwise Markov property.

(F) ⇒ (G) ⇒ (L) ⇒ (P)

If we assume strictly positive p(·), then also

(P) ⇒ (F)

which gives us the Hammersley-Clifford theorem.

Markov random fields: undirected graphical model

The Hammersley-Clifford theorem (see Koller and Friedman 2009, p. 131 for a proof): consider a graph G; for strictly positive p(·), the following Markov property and factorization property are equivalent.

Markov property: any two subsets of variables are conditionally independent given a separating subset (X_A ⊥ X_B | X_S), where every path from a node in A to a node in B passes through S.

Factorization property: the distribution p factorizes according to G if it can be expressed as a product of potential functions over the maximal cliques of G.

Markov random fields: undirected graphical model

Example: Gaussian Markov random fields

A multivariate normal distribution forms a Markov random field w.r.t. a graph G = (V, E) if the missing edges correspond to zeros in the concentration matrix (the inverse covariance matrix).

Consider x = {x_1, ..., x_K} ~ N(0, Σ) with Σ regular and K = Σ^{-1}. The concentration matrix of the conditional distribution of (x_i, x_j) given x_{∖{i,j}} is

K_{{i,j}} = ( k_ii  k_ij
              k_ji  k_jj )

Hence

x_i ⊥ x_j | x_{∖{i,j}}  ⇔  k_ij = 0  ⇔  {i, j} ∉ E

Markov random fields: undirected graphical model

Example: Gaussian Markov random fields

The joint distribution of a Gaussian Markov random field factorizes as

log p(x) = const − (1/2) ∑_i k_ii x_i^2 − ∑_{{i,j}∈E} k_ij x_i x_j

The zero entries in K are called structural zeros since they represent the absent edges in the undirected graph.
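The sketch below (an assumed numeric example, not from the slides) illustrates the correspondence: a precision matrix K for the chain x_1 - x_2 - x_3 has the structural zero k_13 = 0, the covariance Σ = K^{-1} is nevertheless dense, and the conditional covariance of (x_1, x_3) given x_2 has a vanishing off-diagonal entry, i.e. x_1 ⊥ x_3 | x_2:

```python
import numpy as np

# Concentration (precision) matrix for the chain x1 - x2 - x3:
# the missing edge {1,3} appears as the structural zero K[0,2] = K[2,0] = 0.
K = np.array([[ 2.0, -0.8,  0.0],
              [-0.8,  2.0, -0.6],
              [ 0.0, -0.6,  2.0]])
Sigma = np.linalg.inv(K)          # covariance; generally has NO zeros

# Conditional covariance of (x1, x3) given x2:
# Sigma_AA - Sigma_AB Sigma_BB^{-1} Sigma_BA with A = {1, 3}, B = {2}.
A, B = [0, 2], [1]
cond_cov = (Sigma[np.ix_(A, A)]
            - Sigma[np.ix_(A, B)] @ np.linalg.inv(Sigma[np.ix_(B, B)]) @ Sigma[np.ix_(B, A)])

print(Sigma)               # dense: marginally, x1 and x3 are correlated
print(cond_cov[0, 1])      # ~0: x1 is independent of x3 given x2, matching the zero in K
```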

Directed graph and Undirected graph

Moral graph: from directed graph to undirected graph

Moralization: "marrying the parents", so to speak; force the parents of each node to form a complete sub-graph.

[Figure (Murphy, Fig. 19.2): (a) a DGM; (b) its moralized version, represented as a UGM]

Motivation: why not simply convert a directed graph to an undirected graph? This works fine for a Markov chain, but not when a node has more than one parent (previous example).
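Moralization is easy to sketch in code. The helper below assumes the DAG is given as a parent map, and applies it to the standard v-structure x_1 → x_3 ← x_2, the case where a node has two parents and the extra "marriage" edge is needed:

```python
from itertools import combinations

def moralize(parents):
    """Moral graph of a DAG given as {node: set of parents}.
    Marry all pairs of parents of each node, then drop edge directions."""
    edges = set()
    for child, pa in parents.items():
        for p in pa:
            edges.add(frozenset((p, child)))          # original edge, now undirected
        for p, q in combinations(sorted(pa), 2):
            edges.add(frozenset((p, q)))              # marry co-parents
    return edges

# v-structure x1 -> x3 <- x2: moralization adds the undirected edge {x1, x2}.
dag = {'x1': set(), 'x2': set(), 'x3': {'x1', 'x2'}}
print(sorted(tuple(sorted(e)) for e in moralize(dag)))
# [('x1', 'x2'), ('x1', 'x3'), ('x2', 'x3')]
```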


Directed graph and Undirected graph

Markov blanket

In a directed graph, the Markov blanket of a given node x_i is the set of nodes comprising the parents, the children and the co-parents (the other parents of its children) of x_i. In an undirected graph, the Markov blanket of a given node x_i is the set of its neighboring nodes.

[Figure (Murphy, Fig. 19.1): a 2-D lattice represented (a) as a DAG and (b) as a UGM; the node X8 is independent of all other nodes given its Markov blanket: its parents, children and co-parents in the DAG, and its neighbors in the UGM]

Directed graph and Undirected graph

Markov blanket in a directed graph

Let x = {x_1, ..., x_K}. Then

p(x_i | x_{∖i}) = p(x_1, ..., x_K) / ∑_{x_i} p(x_1, ..., x_K)
               = ∏_k p(x_k | pa_k) / ∑_{x_i} ∏_k p(x_k | pa_k)
               = f(x_i, mb(x_i))

where pa_k is the set of parent nodes of x_k and mb(x_i) is the Markov blanket of node x_i.

Example: p(X_8 | X_{∖8}) = p(X_8 | X_3, X_7) p(X_9 | X_4, X_8) p(X_13 | X_8, X_12) / ∑_{X_8} p(X_8 | X_3, X_7) p(X_9 | X_4, X_8) p(X_13 | X_8, X_12)

Directed graph and Undirected graph

Markov blanket in an undirected graph

Let x = {x_1, ..., x_K}. Then

p(x_i | x_{∖i}) = p(x_i | mb(x_i))

where mb(x_i) is the Markov blanket of node x_i.

Example: p(X_8 | X_{∖8}) = p(X_8 | X_3, X_7, X_9, X_13)

Note: the Markov blanket of a node in a directed graph equals its Markov blanket (set of neighbours) in the moral graph.

Directed graph and Undirected graph

Directed graph vs. Undirected graph

[Figure: Venn diagram over the set of all distributions; the families representable by directed graphs and by undirected graphs intersect in the chordal family, with small example graphs shown]

What is chordal?

Directed graph and Undirected graph

What is Chordal? Chordal or decomposable: If we collapse together all the variables in each maximal clique to make the ”mega-variables”, the resulting graph will be a tree.

Why are chordal graphs the intersection of directed and undirected graphs? Hint: the moral graph.

Directed graph and Undirected graph

Directed graphs: dropping edges

[Figure: three-node directed graphs and the conditional independencies they encode:
- x_2 ← x_1 → x_3 (fork): (2 ⊥ 3 | 1)
- a chain through x_2: (1 ⊥ 3 | 2)
- x_1 → x_3 ← x_2 (collider): (1 ⊥ 2 | ∅), but not (1 ⊥ 2 | 3)]

Directed graph and Undirected graph

Undirected graphs: remove arrows

[Figure: three-node undirected graphs obtained by removing the arrows; for the graph x_2 — x_1 — x_3 the independence (2 ⊥ 3 | 1) holds]

Directed graph and Undirected graph

Example of a 4-atom distribution:

x4 x3 x2 x1    p(x)             p(x) when I(x3; x4) = 0
0  0  0  0     a                αβ
0  1  0  1     b - a            α(1 - β)
1  0  0  1     c - a            (1 - α)β
1  1  1  1     1 + a - b - c    (1 - α)(1 - β)

It satisfies the following conditional independence relations:
(1, 2 | 3), (1, 2 | 4), (1, 2 | 34), (1 | 34), (2 | 34), (1 | 234), (2 | 134), (3 | 124), (4 | 123), (3, 4 | ∅)

Outline

- Preliminaries on Graphical Models

- Message passing and the sum-product algorithm

Factor graphs

What directed and undirected graphical models have in common: both allow a global function to be expressed as a product of factors over subsets of variables.

Factor graphs: Definition

A factor graph is a bipartite graph (bigraph) that expresses the structure of the factorization (1):

p(x) = ∏_s f_s(x_s)    (1)

where x_s denotes a subset of the variables x. In a factor graph, the random variables are usually drawn as round nodes and the factors as square nodes. There is an edge between the factor node f_s(x_s) and the variable node x_i if and only if x_i ∈ x_s.

Factor graphs

Example of factor graph:

Factor graph: connections run only between variable nodes and factor nodes.

[Figure: variable nodes x_1, ..., x_5 and factor nodes f_a, f_b, f_c, f_d]

p(x) = f_a(x_1, x_3) f_b(x_3, x_4) f_c(x_2, x_4, x_5) f_d(x_1, x_3)

Note: the same subset of variables x_s can appear multiple times, for example x_s = {x_1, x_3} appears twice (as f_a and f_d) in the factor graph above.
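One convenient way to hold such a factor graph in code is to store each factor together with its scope; the bipartite structure is then implicit in the variable/factor incidence. This is only a sketch with placeholder tables (binary variables, arbitrary values), not a fixed representation:

```python
import numpy as np

# Each factor: (scope, table) with the table indexed by the scope variables in order.
# The numeric values are placeholders; only the structure matches the figure above.
factors = {
    'fa': (('x1', 'x3'), np.array([[1.0, 2.0], [3.0, 1.0]])),
    'fb': (('x3', 'x4'), np.array([[2.0, 1.0], [1.0, 2.0]])),
    'fc': (('x2', 'x4', 'x5'), np.ones((2, 2, 2))),
    'fd': (('x1', 'x3'), np.array([[1.0, 0.5], [0.5, 1.0]])),   # same scope as fa
}

# Edges of the bipartite factor graph: (factor node, variable node).
edges = [(name, v) for name, (scope, _) in factors.items() for v in scope]
print(edges)

def unnormalized_p(assign):
    """Product of all factors evaluated at a full assignment {var: 0/1}."""
    val = 1.0
    for scope, table in factors.values():
        val *= table[tuple(assign[v] for v in scope)]
    return val

print(unnormalized_p({'x1': 0, 'x2': 1, 'x3': 1, 'x4': 0, 'x5': 1}))
```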

Factor graphs

From directed graph to factor graph:

[Figure: (a) a directed graph on x_1, x_2, x_3; (b) a factor graph with a single factor f; (c) a factor graph with factors f_a, f_b, f_c]

(a): A directed graph with factorization p(x_1) p(x_2) p(x_3 | x_1, x_2).
(b): A factor graph with a single factor f(x_1, x_2, x_3) = p(x_1) p(x_2) p(x_3 | x_1, x_2).
(c): A different factor graph with factors f_a(x_1) f_b(x_2) f_c(x_1, x_2, x_3) such that f_a(x_1) = p(x_1), f_b(x_2) = p(x_2) and f_c(x_1, x_2, x_3) = p(x_3 | x_1, x_2).

Factor graphs

From undirected graph to factor graph: the difference between (1/Z) ∏_C ψ_C(x_C) and ∏_s f_s(x_s)

[Figure: (a) an undirected graph on x_1, x_2, x_3; (b) a factor graph with a single factor f; (c) a factor graph with factors f_a, f_b]

(a): An undirected graph with a single clique potential ψ(x_1, x_2, x_3).
(b): A factor graph with factor f(x_1, x_2, x_3) = ψ(x_1, x_2, x_3).
(c): A different factor graph with factors f_a(x_1, x_2, x_3) f_b(x_1, x_2) = ψ(x_1, x_2, x_3).

Note: when a clique is large, factorizing it further is usually very useful.

Factor graphs

Why introduce Factor graphs?

- Both directed and undirected graphs can be converted to factor graphs.
- Factor graphs allow us to be more explicit about the details of the factorization.
- When we connect factors to exponential families, the product of factors becomes a sum of parameters under a single exp.
- Multiple different factor graphs can represent the same directed or undirected graph, so we can look for factor graphs that are better suited to our inference task.

Sum-product algorithms

Computing the marginal probability p(x_i):

p(x_i) = ∑_{x∖x_i} p(x)

What if p(x) = (1/Z) ∏_C ψ_C(x_C) or p(x) = ∏_s f_s(x_s)? Can we design more efficient algorithms by exploiting the conditional independence structure of the model?

Sum-product algorithms

Yes, we can. The key idea of the sum-product algorithm:

xy + xz = x(y + z)

[Figure: expression trees for xy + xz and x(y + z)]

Expression trees of xy + xz and x(y + z), where the leaf nodes represent variables and the internal nodes represent arithmetic operators (addition, multiplication, negation, etc.). Factoring out x turns two multiplications and one addition into one of each; the sum-product algorithm applies the same idea to large sums of products of factors.

Sum-product algorithms

Example: computing p(x_1) = ∑_{x∖x_1} p(x) on an undirected chain

p(x) = (1/Z) ψ_{12}(x_1, x_2) ψ_{23}(x_2, x_3) ψ_{34}(x_3, x_4)

[Figure: chain x_1 — x_2 — x_3 — x_4 with messages m_4(x_3), m_3(x_2), m_2(x_1) passed toward x_1]

p(x_1) = (1/Z) ∑_{x_2} ∑_{x_3} ∑_{x_4} ψ_{12}(x_1, x_2) ψ_{23}(x_2, x_3) ψ_{34}(x_3, x_4)
       = (1/Z) ∑_{x_2} ψ_{12}(x_1, x_2) ∑_{x_3} ψ_{23}(x_2, x_3) ∑_{x_4} ψ_{34}(x_3, x_4)
       = (1/Z) ∑_{x_2} ψ_{12}(x_1, x_2) ∑_{x_3} ψ_{23}(x_2, x_3) m_4(x_3)
       = (1/Z) ∑_{x_2} ψ_{12}(x_1, x_2) m_3(x_2)
       = (1/Z) m_2(x_1)
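The same right-to-left elimination can be written directly with numpy. The pairwise potential tables below are arbitrary random positive values (assumptions for illustration), and the result is compared against brute-force marginalization:

```python
import numpy as np

rng = np.random.default_rng(0)
# Arbitrary positive pairwise potentials psi12(x1,x2), psi23(x2,x3), psi34(x3,x4).
psi12, psi23, psi34 = (rng.uniform(0.5, 2.0, size=(2, 2)) for _ in range(3))

# Messages passed right-to-left along the chain x1 - x2 - x3 - x4.
m4 = psi34.sum(axis=1)            # m4(x3) = sum_x4 psi34(x3, x4)
m3 = psi23 @ m4                   # m3(x2) = sum_x3 psi23(x2, x3) m4(x3)
m2 = psi12 @ m3                   # m2(x1) = sum_x2 psi12(x1, x2) m3(x2)
p_x1 = m2 / m2.sum()              # normalizing divides out Z

# Brute-force check: sum the full joint over x2, x3, x4.
joint = np.einsum('ab,bc,cd->abcd', psi12, psi23, psi34)
p_x1_brute = joint.sum(axis=(1, 2, 3))
p_x1_brute /= p_x1_brute.sum()

print(np.allclose(p_x1, p_x1_brute))   # True
```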

Sum-product algorithms

Example: computing p(x_1) = ∑_{x∖x_1} p(x) on a factor graph

Factor graph: connections run only between variable nodes and factor nodes.

[Figure: the same factor graph as before, with variable nodes x_1, ..., x_5 and factor nodes f_a, f_b, f_c, f_d]

p(x) = f_a(x_1, x_3) f_b(x_3, x_4) f_c(x_2, x_4, x_5) f_d(x_1, x_3)

p(x_1) = ∑_{x∖x_1} p(x) = ∑_{x_2} ∑_{x_3} ∑_{x_4} ∑_{x_5} p(x)
       = ∑_{x_2, x_3} f_a(x_1, x_3) f_d(x_1, x_3) ∑_{x_4} f_b(x_3, x_4) ∑_{x_5} f_c(x_2, x_4, x_5)

Sum-product algorithms

The sum-product algorithm uses a message passing scheme to exploit the conditional independence structure and change the order of sums and products.

Sum-product algorithms

Message passing on a factor graph, variable x to factor f:

m_{x→f}(x) = ∏_{k ∈ nb(x)∖f} m_{f_k→x}(x)

[Figure: variable node x collects messages m_{f_1→x}, ..., m_{f_K→x} from its other neighboring factors and sends m_{x→f} to factor f]

Here nb(x) denotes the neighboring factor nodes of x. If nb(x)∖f = ∅, then m_{x→f}(x) = 1.

Sum-product algorithms

Message passing on a factor graph, factor f to variable x:

m_{f→x}(x) = ∑_{x_1, ..., x_M} f(x, x_1, ..., x_M) ∏_{m ∈ nb(f)∖x} m_{x_m→f}(x_m)

[Figure: factor node f collects messages m_{x_1→f}, ..., m_{x_M→f} from its other neighboring variables and sends m_{f→x} to variable x]

Here nb(f) denotes the neighboring variable nodes of f. If nb(f)∖x = ∅, then m_{f→x}(x) = f(x).
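Written for tabular factors over small discrete variables, the two update rules look like this (a sketch only: messages are 1-D numpy arrays indexed by the variable's states, and only the local updates are shown, not the scheduling over a tree):

```python
import numpy as np

def msg_var_to_factor(incoming, card):
    """m_{x->f}: elementwise product of the messages arriving from all of x's
    OTHER neighboring factors; a leaf variable sends the all-ones message."""
    out = np.ones(card)
    for m in incoming:
        out = out * m
    return out

def msg_factor_to_var(table, scope, target, incoming):
    """m_{f->x}: multiply the factor table by the incoming messages from every
    other variable in its scope, then sum those variables out."""
    t = np.asarray(table, dtype=float)
    for axis, v in enumerate(scope):
        if v != target:
            shape = [1] * t.ndim
            shape[axis] = t.shape[axis]
            t = t * incoming[v].reshape(shape)   # broadcast message along its axis
    other_axes = tuple(ax for ax, v in enumerate(scope) if v != target)
    return t.sum(axis=other_axes)

# Example: leaf x1 sends the all-ones message to fa(x1, x2);
# fa then sends sum_x1 fa(x1, x2) on to x2 (made-up table values).
fa = np.array([[1.0, 2.0],
               [3.0, 4.0]])                      # fa[x1, x2]
m_x1_to_fa = msg_var_to_factor([], card=2)       # leaf: [1, 1]
m_fa_to_x2 = msg_factor_to_var(fa, ('x1', 'x2'), 'x2', {'x1': m_x1_to_fa})
print(m_fa_to_x2)                                # [4. 6.], the column sums of fa
```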

Sum-product algorithms

Message passing on a tree-structured factor graph

[Figure: tree-structured factor graph with variable nodes x_1, x_2, x_3, x_4 and factor nodes f_a (between x_1 and x_2), f_b (between x_2 and x_3), f_c (between x_2 and x_4)]

Sum-product algorithms

Message passing on a tree structured factor graph Down - up:

m_{x_1→f_a}(x_1) = 1
m_{f_a→x_2}(x_2) = ∑_{x_1} f_a(x_1, x_2)

m_{x_4→f_c}(x_4) = 1
m_{f_c→x_2}(x_2) = ∑_{x_4} f_c(x_2, x_4)

m_{x_2→f_b}(x_2) = m_{f_a→x_2}(x_2) m_{f_c→x_2}(x_2)
m_{f_b→x_3}(x_3) = ∑_{x_2} f_b(x_2, x_3) m_{x_2→f_b}(x_2)

Sum-product algorithms

Message passing on a tree structured factor graph Up - down:

m_{x_3→f_b}(x_3) = 1
m_{f_b→x_2}(x_2) = ∑_{x_3} f_b(x_2, x_3)

m_{x_2→f_a}(x_2) = m_{f_b→x_2}(x_2) m_{f_c→x_2}(x_2)
m_{f_a→x_1}(x_1) = ∑_{x_2} f_a(x_1, x_2) m_{x_2→f_a}(x_2)

m_{x_2→f_c}(x_2) = m_{f_a→x_2}(x_2) m_{f_b→x_2}(x_2)
m_{f_c→x_4}(x_4) = ∑_{x_2} f_c(x_2, x_4) m_{x_2→f_c}(x_2)

Sum-product algorithms

Message passing on a tree structured factor graph

Calculating p(x_2):

p(x_2) = (1/Z) m_{f_a→x_2}(x_2) m_{f_b→x_2}(x_2) m_{f_c→x_2}(x_2)
       = (1/Z) {∑_{x_1} f_a(x_1, x_2)} {∑_{x_3} f_b(x_2, x_3)} {∑_{x_4} f_c(x_2, x_4)}
       = (1/Z) ∑_{x_1} ∑_{x_3} ∑_{x_4} f_a(x_1, x_2) f_b(x_2, x_3) f_c(x_2, x_4)
       = ∑_{x_1} ∑_{x_3} ∑_{x_4} p(x)
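To double-check the hand derivation, the sketch below plugs arbitrary (made-up) binary tables into f_a, f_b, f_c, forms the three incoming messages at x_2 exactly as above, and compares the normalized product with brute-force marginalization:

```python
import numpy as np

rng = np.random.default_rng(1)
fa = rng.uniform(0.5, 2.0, (2, 2))   # fa[x1, x2], arbitrary positive values
fb = rng.uniform(0.5, 2.0, (2, 2))   # fb[x2, x3]
fc = rng.uniform(0.5, 2.0, (2, 2))   # fc[x2, x4]

# Incoming messages at x2 (the leaves x1, x3, x4 each send the all-ones message).
m_fa_to_x2 = fa.sum(axis=0)          # sum_x1 fa(x1, x2)
m_fb_to_x2 = fb.sum(axis=1)          # sum_x3 fb(x2, x3)
m_fc_to_x2 = fc.sum(axis=1)          # sum_x4 fc(x2, x4)

p_x2 = m_fa_to_x2 * m_fb_to_x2 * m_fc_to_x2
p_x2 /= p_x2.sum()                   # normalizing divides out Z

# Brute force: p(x2) proportional to sum_{x1,x3,x4} fa(x1,x2) fb(x2,x3) fc(x2,x4).
joint = np.einsum('ab,bc,bd->abcd', fa, fb, fc)
p_x2_brute = joint.sum(axis=(0, 2, 3))
p_x2_brute /= p_x2_brute.sum()

print(np.allclose(p_x2, p_x2_brute))   # True
```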

Junction tree algorithm

Junction tree algorithm: message passing on general graphs

- 1. Moralize to an undirected graph if starting with a directed graph
- 2. Triangulate the graph to make it chordal
- 3. Build the junction tree
- 4. Run the message passing algorithm on the junction tree

Junction tree algorithm

- 1. Moralize to an undirected graph if starting with a directed graph.

Moralization: "marrying the parents", so to speak; force the parents of each node to form a complete sub-graph.

[Figure (Murphy, Fig. 19.2): (a) a DGM; (b) its moralized version, represented as a UGM]

Junction tree algorithm

- 2. Triangulate the graph to make it chordal: find chord-less cycles containing four or more nodes and add extra links to eliminate such chord-less cycles.

[Figure: a five-node cycle on x_1, ..., x_5 is triangulated by adding chords, yielding the maximal cliques used on the next slide]

Junction tree algorithm

- 3. Build the junction tree. The nodes of a junction tree are the maximal cliques of the triangulated graph; the links connect pairs of nodes (maximal cliques) that have variables in common. The junction tree is built so that the sum of the edge weights in the tree is maximal, where the weight of an edge is the number of variables shared by the two maximal cliques it connects.

[Figure: the triangulated five-node graph with maximal cliques {x_1,x_2,x_5}, {x_2,x_3,x_5}, {x_3,x_4,x_5}; of the two candidate clique trees shown, the chain with separators {x_2,x_5} and {x_3,x_5} is the valid junction tree ("Yes"), the other is not ("No")]
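Step 3 can be sketched as a maximum-weight spanning tree over the maximal cliques, with edge weights given by the sizes of the clique intersections (a greedy Kruskal with a small union-find; the clique sets below are the ones from the five-node example above):

```python
from itertools import combinations

def junction_tree(cliques):
    """Greedy maximum-weight spanning tree over the cliques, where an edge's
    weight is the size of the intersection of the two cliques it joins."""
    parent = list(range(len(cliques)))
    def find(i):                               # union-find with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    # Candidate edges, heaviest (largest intersection) first.
    candidates = sorted(combinations(range(len(cliques)), 2),
                        key=lambda e: -len(cliques[e[0]] & cliques[e[1]]))
    tree = []
    for i, j in candidates:
        ri, rj = find(i), find(j)
        if ri != rj:                           # adding the edge keeps it a tree
            parent[ri] = rj
            tree.append((cliques[i], cliques[j]))
    return tree

# Maximal cliques of the triangulated five-node example above.
cliques = [frozenset({'x1', 'x2', 'x5'}),
           frozenset({'x2', 'x3', 'x5'}),
           frozenset({'x3', 'x4', 'x5'})]
for a, b in junction_tree(cliques):
    print(sorted(a), '--', sorted(b), 'separator:', sorted(a & b))
```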

Junction tree algorithm

- 4. Run the message passing algorithm on the junction tree. The message passing algorithm on the junction tree is similar to the sum-product algorithm on a tree; the result gives the joint distribution of each maximal clique of the triangulated graph.

The computational complexity of the junction tree algorithm is determined by the size of the largest clique in the triangulated graph; this is lower bounded by the treewidth of the graph, where the treewidth is defined as one less than the size of the largest clique in an optimal triangulation.

Thanks!

Questions!