CS 3750 Machine Learning Lecture 5
Markov Random Fields
Milos Hauskrecht [email protected] 5329 Sennott Square
CS 3750 Advanced Machine Learning
Markov random fields
• Probabilistic models with symmetric dependences. – Typically models spatially varying quantities
P( x) ∝ ∏ φ c ( xc ) c∈cl ( x )
φ c ( xc ) - A potential function (defined over factors)
- If φ c ( x c ) is strictly positive we can rewrite the definition as: 1 P( x) = exp − E ( x ) ∑ c c - Energy function Z c∈cl ( x ) - Gibbs (Boltzman) distribution Z = exp − E ( x ) - A partition function ∑∑ c c xx∈∈{)x} c cl (
CS 3750 Advanced Machine Learning Graphical representation of MRFs
An undirected network (also called independence graph) • G = (S, E) – S=1, 2, .. N correspond to random variables – (i, j) ∈ E ⇔ ∃c :{i, j} ⊂ c
or xi and xj appear within the same factor c Example: – variables A,B ..H A G – Assume the full joint of MRF H P( A, B,... H ) ~ B C F φ1 ( A, B, C )φ 2 (B, D, E )φ3 ( A, G ) φ (C , F )φ (G , H )φ (F , H ) 4 5 6 D E
CS 3750 Advanced Machine Learning
Markov random fields
• regular lattice (Ising model)
• Arbitrary graph
CS 3750 Advanced Machine Learning Markov random fields
• Pairwise Markov property – Two nodes in the network that are not directly connected can be made independent given all other nodes • Local Markov property – A set of nodes (variables) can be made independent from the rest of nodes variables given its immediate neighbors • Global Markov property – A vertex set A is independent of the vertex set B (A and B are disjoint) given set C if all chains in between elements in A and B intersect C
CS 3750 Advanced Machine Learning
Types of Markov random fields
• MRFs with discrete random variables – Clique potentials can be defined by mapping all clique- variable instances to R – Example: Assume two binary variables A,B with values {a1,a2,a3} and {b1,b2} are in the same clique c. Then:
a1 b1 0.5 φ c ( A, B) ≅
a1 b2 0.2
a2 b1 0.1
a2 b2 0.3
a3 b1 0.2
a3 b2 0.4
CS 3750 Advanced Machine Learning Types of Markov random fields
• Gaussian Markov Random Field x ~ N (µ, Σ) 1 1 p(x | µ, Σ) = exp − (x − µ)T Σ −1 (x − µ) d / 2 1/ 2 (2π ) Σ 2
• Precision matrix Σ −1 • Variables in x are connected in the network only if they have a nonzero entry in the precision matrix – All zero entries are not directly connected – Why?
CS 3750 Advanced Machine Learning
Tree decomposition of the graph
A G • A tree decomposition of H a graph G: B C F – A tree T with a vertex set associated to every node. D E – For all edges {v,w}∈G: there is a set containing both v and w in T. AA G – For every v ∈G : the nodes G C G B C C F H F in T that contain v form a connected subtree. B D E
CS 3750 Advanced Machine Learning Tree decomposition of the graph
A G • A tree decomposition of H a graph G: B C F – A tree T with a vertex set associated to every node. D E Cliques in – For all edges {v,w}∈G: the graph there is a set containing both v and w in T. AA G – For every v ∈G : the nodes G C G B C C F H F in T that contain v form a connected subtree. B D E
CS 3750 Advanced Machine Learning
Tree decomposition of the graph
A G • Another tree H decomposition of a B C F graph G: – A tree T with a vertex set D E associated to every node. – For all edges {v,w}∈G: there is a set containing AA G both v and w in T. G C H B C C F – For every v ∈G : the nodes in T that contain v form a B D connected subtree. E
CS 3750 Advanced Machine Learning Treewidth of the graph
A G • Width of the tree H decomposition: B C F maxi∈I | X i | −1 • Treewidth of a graph D E G: tw(G)= minimum width over all tree AA G decompositions of G. G C G B C C F H F
B D E
CS 3750 Advanced Machine Learning
Trees
Why do we like trees? • Inference in trees structures can be done in time linear in the number of nodes
AB C D
E
CS 3750 Advanced Machine Learning Clique tree
• Clique tree = a tree decomposition of the graph • Can be constructed: – from the induced graph Built by running the variable elimination procedure – from the chordal graph Built by running the triangulation algorithm
• We have precompiled the clique tree. • So how to take advantage of the clique tree to perform inferences?
CS 3750 Advanced Machine Learning
VE on the Clique tree
• Variable Elimination on the clique tree – works on factors • Makes factor a data structure – Sends and receives messages • Cluster graph for set of factors, each node i is associated with
a subset (cluster) Ci. – Family-preserving: each factor’s variables are completely embedded in a cluster
CS 3750 Advanced Machine Learning Clique tree properties
• Sepset Sij = Ci ∩C j – separation set: Variables X on one side of sepset are separated from the variables Y on the other side in the factor graph given variables in S • Running intersection property
–if Ci and Cj both contain X, then all cliques on the unique path between them also contain X
CS 3750 Advanced Machine Learning
C Clique trees D I π 0 (G, S, I) G S C,D G,I,D G,S,I L K H J π 0 (C, D) π 0 (G, I, D)
G,J,S,L S,K Running intersection: E.g. Cliques involving S form a connected subtree. π 0 (G, J, S, L) 0 H,G,J Initial potentials π i : 0 Assign factors to cliques π (H,G, J ) and multiply them.
CS 3750 Advanced Machine Learning Message Passing VE
• Query for P(J) C τ (D) = π 0[C,D] –Eliminate C: 1 ∑ 1 D I C G C,D G,I,D G,S,I Message sent S Æ from [C,D] D to [G,I,D] L Message received K J at [G,I,D] -- G,J,S,L S,K H [G,I,D] updates: H,G,J
0 π2[G,I,D]=τ1(D)×π2[G,I,D]
CS 3750 Advanced Machine Learning
Message Passing VE
• Query for P(J) C τ (G,I) = π [G,I,D] –Eliminate D: 2 ∑ 2 D I D G C,D G,I,D G,S,I Message sent Æ Æ from [G,I,D] S G,I D to [G,S,I] L Message received K J at [G,S,I] -- G,J,S,L SK H [G,S,I] updates: H,G,J
0 π3[G,S,I]=τ2(G,I)×π3[G,S,I]
CS 3750 Advanced Machine Learning Message Passing VE
• Query for P(J)
–Eliminate I:τ3(G,S) = ∑π3[G,S,I] C I D I C,D G,I,D G,S,I Message sent Æ G Æ from [G,S,I] D G,I L G,S to [G,J,S,L] S Message received L at [G,J,S,L] -- G,J,S,L S,K K [G,J,S,L] updates: H J
H,G,J 0 π4[G,J,S,L]=τ3(G,S)×π4[G,J,S,L] ! [G,J,S,L] is not ready!
CS 3750 Advanced Machine Learning
Message Passing VE
• Query for P(J) C –Eliminate H:τ4(G,J) = ∑π5[H,G,J] H D I C,D G,I,D G,S,I Message sent Æ G Æ from [H,G,J] D G,I L G,S to [G,J,S,L] S L G,J,S,L S,K K H J K G,J H,G,J 0 π4[G,J,S,L]=τ3(G,S)×τ4(G,J)×π4[G,J,S,L] And …
CS 3750 Advanced Machine Learning Message Passing VE
• Query for P(J) 0 –Eliminate K: τ6(S) = ∑π [S,K] C K D I C,D G,I,D G,S,I Message sent Æ G Æ from [S,K] D G,I L G,S to [G,J,S,L] S Å L G,J,S,L S S,K K All messages H J received at [G,J,S,L] K G,J [G,J,S,L] updates: H,G,J
0 π4[G,J,S,L]=τ3(G,S)×τ4(G,J)×τ6(S)×π4[G,J,S,L] And calculate P(J) from it by summing out G,S,L
CS 3750 Advanced Machine Learning
Message Passing VE
• [G,J,S,L] clique potential • … is used to finish the inference
C,D G,I,D G,S,I Æ Æ D G,I L G,S
G,J,S,L S,K Å S G,J K
H,G,J
CS 3750 Advanced Machine Learning Message passing VE
•Often, many marginals are desired – Inefficient to re-run each inference from scratch – One distinct message per edge & direction • Methods : – Compute (unnormalized) marginals for any vertex (clique) of the tree
– Results in a calibrated clique tree ∑ π i = ∑ π j C i − S ij C j − S ij
• Recap: three kinds of factor objects – Initial potentials, final potentials and messages
CS 3750 Advanced Machine Learning
Two-pass message passing VE
• Chose the root clique, e.g. [S,K] • Propagate messages to the root
C,D G,I,D G,S,I Æ Æ D G,I L G,S
G,J,S,L S,K Æ S G,J K
H,G,J
CS 3750 Advanced Machine Learning Two-pass message passing VE
• Send messages back from the root
Å Å D G,I C,D G,I,D G,S,I
G,S
G,J,S,L S,K Å S G,J L Notation: number the cliques and H,G,J denote the messages
δ i → j
CS 3750 Advanced Machine Learning
Message Passing: BP
• Graphical model of a distribution – More edges = larger expressive power – Clique tree also a model of distribution – Message passing preserves the model but changes the parameterization • Different but equivalent algorithm
CS 3750 Advanced Machine Learning Factor division
A=1 B=1 0.5 A=1 B=1 0.5/0.4=1.25
A=1 B=2 0.4 A=1 B=2 0.4/0.4=1.0 A=1 0.4 A=2 B=1 0.8 A=2 B=1 0.8/0.4=2.0 A=2 0.4 A=2 B=2 0.2 A=3 0.5 A=2 B=2 0.2/0.4=2.0
A=3 B=1 0.6 A=3 B=1 0.6/0.5=1.2
A=3 B=2 0.5 A=3 B=2 0.5/0.5=1.0
Inverse of factor product
CS 3750 Advanced Machine Learning
Message Passing: BP
• Each node: multiply all the messages and divide by the one that is coming from node we are sending the message to – Clearly the same as VE
∑ π i ∑ ∏ δ k → i C i − S ij C i − S ij k ∈ N ( i ) δ i → j = = = ∑ ∏ δ k → i δ j → i δ j → i C i − S ij k ∈ N ( i ) \ j
– Initialize the messages on the edges to 1
CS 3750 Advanced Machine Learning Message Passing: BP
µ2,3 =1
A,B B B,C C C,D
0 0 0 π1 (A, B) π 2 (B,C) π 3 (C, D)
0 Store the last message δ 2→3 = ∑π 2 (B,C) on the edge and divide B each passing message by the last stored.
0 δ 2−>3 0 0 π 3 (C, D) = π 3 (C, D) = π 3 (C, D)∑π 2 (B,C) µ2,3 B
0 µ2,3 = δ 2→3 = ∑π 2 (B,C) New message B
CS 3750 Advanced Machine Learning
Message Passing: BP
0 µ2,3 = ∑π 2 (B,C) B
A,B B B,C C C,D
0 0 π1 (A, B) π 2 (B,C) π 3 (C, D)
0 0 0 π 3 (C, D) = π 3 (C, D)∑π 2 (B,C) = π 3 (C, D)µ2,3 Store the last message B on the edge and divide each passing message δ3−>2 = ∑π 3 (C, D) by the last stored. D
0 0 δ3−>2 π2 (B,C) 0 0 0 π2 (B,C) =π2 (B,C) = ×∑π3 (C,D)×µ2,3(C) =π2 (B,C)×∑π3 (C,D) µ2,3(C) µ2,3(C) D D
0 0 µ2,3 = δ3−>2 = ∑π 3 (C, D) = ∑∑π 3 (C, D) π 2 (B,C) New message D DB
CS 3750 Advanced Machine Learning Message Passing: BP
0 0 µ2,3 = ∑π 3 (C, D)∑π 2 (B,C) DB
A,B B B,C C C,D
0 π1 (A, B) π 2 (B,C) π 3 (C, D)
0 0 π 3 (C, D) = π 3 (C, D)∑π 2 (B,C) Store the last message B on the edge and divide each passing message δ3−>2 = ∑π 3 (C, D) by the last stored. D 0 0 The same as before π2 (B,C) =π2 (B,C)×∑π3 (C,D) D 0 0 ∑π3 (C,D)×∑π2 (B,C) δ3−>2 DB π2 (B,C) =π2 (B,C) =π2 (B,C)× 0 0 =π2 (B,C) µ2,3(C) ∑∑π3 (C,D)× π2 (B,C) DB
CS 3750 Advanced Machine Learning
Message Propagation: BP
• Lauritzen-Spiegelhalter algorithm • Two kinds of objects: clique and sepset potentials – Initial potentials not kept • Improved “stability” of asynchronous algorithm (repeated messages cancel out) • New distribution representation – clique tree potential
∏ π i (C i ) C i ∈T π T = = PF ( X ) ∏ µ ij ( S ij ) ( C i ↔ C j )∈T
– Clique tree invariant = PF
CS 3750 Advanced Machine Learning Loopy belief propagation • The asynchronous BP algorithm works on clique trees • What if we run the belief propagation algorithm on a non-tree structure? A,B B B,C A B
A C D C
A,D D C,D • Sometimes converges • If it converges it leads to an approximate solution • Advantage: tractable for large graphs
CS 3750 Advanced Machine Learning
Loopy belief propagation
• If the BP algorithm converges, it converges to an optimum of the Bethe free energy See papers: • Yedidia J.S., Freeman W.T. and Weiss Y. Generalized Belief Propagation, 2000 • Yedidia J.S., Freeman W.T. and Weiss Y. Understanding Belief Propagation and Its Generalizations, 2001
CS 3750 Advanced Machine Learning