
PROBABILISTIC MODELS FOR STRUCTURED DATA

Lecture 04

Instructor: Yizhou Sun [email protected]

January 29, 2019

Content
• From Sequence to Graph: Markov Random Field
• Inference
  • VE
  • Belief Propagation
  • Loopy Belief Propagation
• Learning
  • Exponential Family
  • The Learning Framework
• Summary

From Sequence to Graph

• Dependencies exist among data points, and these dependencies form a graph
• Examples
  • Image semantic labeling (a regular graph)
  • User profiling in a social network (a general graph)

e.g., Friends tend to have the same voting preference: Democratic vs. Republican

Motivating Example
• Modeling voting preferences among persons
• Persons: A, B, C, D
• Each person can take a binary value
  • 1: Democratic
  • 0: Republican
• Friendships: (A,B), (B,C), (C,D), and (D,A)
• Friends tend to take the same value
  • Indicated by a factor $\phi(X, Y)$, which assigns a higher score to consistent votes among friends

• The joint probability
  • $P(A, B, C, D) = \frac{1}{Z}\,\phi(A,B)\,\phi(B,C)\,\phi(C,D)\,\phi(D,A)$
  • $Z = \sum_{A,B,C,D} \phi(A,B)\,\phi(B,C)\,\phi(C,D)\,\phi(D,A)$: normalization constant
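To make the factorization concrete, here is a minimal Python sketch of this example; the potential values (10 for agreement, 1 for disagreement) are assumed for illustration and are not from the slides.

```python
import itertools

# A minimal sketch of the voting example (hypothetical potential values):
# phi(x, y) scores agreement between two friends' votes (1 = Democratic, 0 = Republican).
def phi(x, y):
    return 10.0 if x == y else 1.0  # assumed scores; any positive numbers work

edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]

def unnormalized(assign):
    score = 1.0
    for u, v in edges:
        score *= phi(assign[u], assign[v])
    return score

# Partition function Z: sum over all 2^4 = 16 joint configurations.
configs = [dict(zip("ABCD", vals)) for vals in itertools.product([0, 1], repeat=4)]
Z = sum(unnormalized(c) for c in configs)

# Joint probability of everyone voting Democratic.
p_all_dem = unnormalized({"A": 1, "B": 1, "C": 1, "D": 1}) / Z
print(Z, p_all_dem)
```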

Motivating Example (Cont.)

• Given the model, some interesting questions can be asked:
  • What is the most likely vote assignment, i.e., the one that maximizes the joint probability?
  • If we know A is Republican and C is Democratic, what are the most likely votes for B and D?
  • How can we learn the model parameters, i.e., the score assigned to each possible factor configuration?
    • E.g., $\beta_{11} = \phi(X = 1, Y = 1) = ?$

Formal Definition

• A Markov Random Field (MRF) is a distribution P over random variables $x_1, x_2, \ldots, x_n$ defined by an undirected graph G
• Gibbs distribution: $P(x_1, x_2, \ldots, x_n) = \frac{1}{Z} \prod_{c \in C} \phi_c(x_c)$
  • $C$: cliques in graph G
  • $\phi_c(x_c)$: factor or potential function
  • $Z = \sum_{x_1, x_2, \ldots, x_n} \prod_{c \in C} \phi_c(x_c)$: partition function
• Log-linear form (if $\phi_c(x_c) > 0$)
  • $P(x_1, x_2, \ldots, x_n) = \frac{1}{Z} \exp\left(\sum_{c \in C} \log \phi_c(x_c)\right) = \frac{1}{Z} \exp\left(-\sum_{c \in C} E_c(x_c)\right)$
  • $E_c(x_c) = -\log \phi_c(x_c)$: energy function (the lower the better)
  • $U(x) = \sum_{c \in C} E_c(x_c)$: total energy

Graphical Representation of MRFs

• $G = (V, E)$
  • $V = \{1, 2, \ldots, N\}$: a set of random variables
  • $(i, j) \in E \iff \exists c: i \in c \text{ and } j \in c$ (i.e., $X_i$ and $X_j$ appear in the same factor)
  • Neighbors: $N(i) = \{j: (i, j) \in E\}$
• Example
  • $P(A, B, \ldots, H) \propto \phi_1(A,B,C)\,\phi_2(B,D,E)\,\phi_3(A,G)\,\phi_4(C,F)\,\phi_5(G,H)\,\phi_6(F,H)$

Conditional Independence in MRF

• Key properties
  • Global Markov property
    • For sets of nodes A, B, and C: $X_A \perp_G X_B \mid X_C$ iff C separates A from B in the graph
    • E.g., $X_{\{1,2\}} \perp X_{\{6,7\}} \mid X_{\{3,4,5\}}$
  • Local Markov property
    • A node is independent of the rest of the nodes in the graph, given its immediate neighbors (its Markov blanket)
    • E.g., $X_1 \perp X_{\{4,5,6,7\}} \mid X_{\{2,3\}}$
  • Pairwise Markov property
    • Two nodes in the network that are not directly connected are independent given all other nodes
    • E.g., $X_1 \perp X_7 \mid X_{\{2,3,4,5,6\}}$

Hammersley-Clifford Theorem

A strictly positive distribution $p(\boldsymbol{x}) > 0$ satisfies the conditional independence properties of an undirected graph G

if and only if

$p(\boldsymbol{x})$ can be factorized into a product of factors, one per maximal clique, i.e., $P(x_1, x_2, \ldots, x_n) = \frac{1}{Z} \prod_{c \in C} \phi_c(x_c)$

Examples of MRFs: Discrete MRFs

• Nodes are discrete random variables
• Factor or clique potentials: given a configuration of the random variables in a clique, assign a real number to it
  • E.g., for two variables A and B with values $a_1, a_2, a_3$ and $b_1, b_2$ in the same clique c, a possible potential function can be defined as a table assigning a real number to each of the $3 \times 2 = 6$ configurations
• Scope of a factor: the set of variables defining the factor, e.g., {A, B} in this case

Examples of MRFs: Gaussian MRFs

• Nodes are continuous random variables
• Precision matrix: $H = \Sigma^{-1}$
• Variables in $\boldsymbol{x}$ are connected in the network only if they have a nonzero entry in the precision matrix
• Potentials are defined over cliques of edges
  • For an edge $(i, j)$, the potential is defined as $\exp\left\{-\frac{1}{2} H_{ij} (x_i - \mu_i)(x_j - \mu_j)\right\}$
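As a quick illustration of the edge–precision correspondence, here is a minimal numpy sketch; the tridiagonal precision matrix below is an assumed example (a chain $x_1 - x_2 - x_3 - x_4$), not taken from the slides.

```python
import numpy as np

# Sketch: a chain-structured Gaussian MRF x1 - x2 - x3 - x4.
# The (hypothetical) precision matrix H is tridiagonal: only chain neighbors interact.
H = np.array([[ 2., -1.,  0.,  0.],
              [-1.,  2., -1.,  0.],
              [ 0., -1.,  2., -1.],
              [ 0.,  0., -1.,  2.]])

Sigma = np.linalg.inv(H)   # covariance: generally dense even though H is sparse
print(np.round(Sigma, 3))  # non-neighbors are marginally correlated...
print(np.round(np.linalg.inv(Sigma), 3))  # ...but H_ij = 0 encodes conditional independence
```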

Examples of MRFs: Pairwise MRF

• A very simple factorization
• Only considers factors over vertices and edges

• $p(x_1, x_2, \ldots, x_n) = \frac{1}{Z} \prod_{i \in V} \phi_i(x_i) \prod_{(i,j) \in E} \phi_{ij}(x_i, x_j)$
• E.g., Ising model: a mathematical model of ferromagnetism in statistical mechanics
  • Each atom takes one of two discrete values: $x_i \in \{+1, -1\}$
  • Edge potential: $\phi_{ij}(x_i, x_j) = \exp(w_{ij}\, x_i x_j)$

    • $w_{ij} > 0$: ferromagnetic
    • $w_{ij} < 0$: antiferromagnetic
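The following Python sketch spells out the Ising factorization on a toy 2×2 grid; the coupling value w and the grid itself are illustrative assumptions, not from the slides.

```python
import itertools
import math

# Sketch of a tiny Ising model on a 2x2 grid; spins take values in {+1, -1}.
# w > 0 (ferromagnetic) favors aligned neighbors; w < 0 would favor disagreement.
w = 0.5  # assumed coupling strength, shared by all edges
edges = [((0, 0), (0, 1)), ((1, 0), (1, 1)), ((0, 0), (1, 0)), ((0, 1), (1, 1))]
sites = [(0, 0), (0, 1), (1, 0), (1, 1)]

def unnormalized(spins):
    # exp(sum_{(i,j) in E} w * x_i * x_j)
    return math.exp(sum(w * spins[i] * spins[j] for i, j in edges))

Z = sum(unnormalized(dict(zip(sites, s)))
        for s in itertools.product([+1, -1], repeat=len(sites)))
aligned = unnormalized({s: +1 for s in sites})
print(aligned / Z)  # aligned configurations get higher probability when w > 0
```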

Content
• From Sequence to Graph: Markov Random Field
• Inference
  • VE
  • Belief Propagation
  • Loopy Belief Propagation
• Learning
  • Exponential Family
  • The Learning Framework
• Summary

Inference Problems

• Marginal distribution inference
  • What is the marginal probability $P(x_i)$?
    • E.g., what is the marginal probability that one person's voting preference is "Republican"?
  • What is the marginal probability of a random variable conditioned on some observed variables, e.g., $P(x_i \mid x_j = 1)$?
    • E.g., what is the marginal probability of one person's voting preference given some other people's observed preferences?
• Maximum a posteriori (MAP) inference
  • What is the most likely assignment to a set of random variables (possibly given some evidence)?
    • E.g., what are the most likely voting preferences for everyone in a social network?

Inference Methods

• Marginal inference
  • Variable elimination
  • Belief propagation: sum-product message passing
• MAP inference
  • Belief propagation: max-product message passing

An Illustrative Example

• Consider the marginal inference problem in a Markov chain
  • $P(x_1, \ldots, x_n) = P(x_1) \prod_{t=2}^{n} P(x_t \mid x_{t-1})$
  • Assume each variable takes d discrete values
• How do we compute the marginal probability of $x_n$?
  • $P(x_n) = \sum_{x_1} \cdots \sum_{x_{n-1}} P(x_1, \ldots, x_n) = \sum_{x_1} \cdots \sum_{x_{n-1}} P(x_1) \prod_{t=2}^{n} P(x_t \mid x_{t-1})$
  • Pushing the sums inward:
    $P(x_n) = \sum_{x_{n-1}} P(x_n \mid x_{n-1}) \sum_{x_{n-2}} P(x_{n-1} \mid x_{n-2}) \cdots \sum_{x_1} P(x_1)\, P(x_2 \mid x_1)$
  • Each inner sum yields an intermediary factor, starting from $\tau(x_2) = \sum_{x_1} P(x_1) P(x_2 \mid x_1)$ and ending with $\tau(x_{n-1})$
  • Familiar procedure? The cost drops from $O(d^n)$ to $O(n d^2)$ (cf. the forward algorithm for HMMs)
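A small numpy sketch of this computation, with randomly generated (hypothetical) distributions, comparing the pushed-in sums against naive enumeration:

```python
import itertools
import numpy as np

# Sketch: marginal P(x_n) in a Markov chain by pushing sums inward (O(n d^2)),
# checked against brute-force enumeration (O(d^n)). Distributions are random here.
rng = np.random.default_rng(0)
d, n = 3, 6
p1 = rng.dirichlet(np.ones(d))                                 # P(x_1)
T = [rng.dirichlet(np.ones(d), size=d) for _ in range(n - 1)]  # T[t][a, b] = P(x_{t+2}=b | x_{t+1}=a)

# Dynamic programming: tau plays the role of the intermediary factor.
tau = p1
for t in range(n - 1):
    tau = tau @ T[t]          # sum_{x_t} tau(x_t) P(x_{t+1} | x_t)
print(tau)                    # P(x_n)

# Brute force for comparison.
brute = np.zeros(d)
for xs in itertools.product(range(d), repeat=n):
    p = p1[xs[0]]
    for t in range(n - 1):
        p *= T[t][xs[t], xs[t + 1]]
    brute[xs[-1]] += p
print(np.allclose(tau, brute))  # True
```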

General VE

• Given an MRF (or another graphical model)

• $P(x_1, \ldots, x_n) = \frac{1}{Z} \prod_{c \in C} \phi_c(x_c)$
• Compute the unnormalized marginal, then normalize it after the computation
  • $\tilde{P}(x_1, \ldots, x_n) = \prod_{c \in C} \phi_c(x_c)$
  • Compute $\tilde{P}(x_i)$ by VE
  • Normalize the marginal: $P(x_i) = \frac{1}{Z} \tilde{P}(x_i)$

Factor Operations

• Factor product
  • $\phi_3 := \phi_1 \times \phi_2$
  • $\phi_3(x_c) = \phi_1(x_{c_1}) \times \phi_2(x_{c_2})$
  • The scope of $\phi_3$ is the union of the scopes of $\phi_1$ and $\phi_2$
  • $x_{c_i}$ denotes the assignment to the variables in the scope of $\phi_i$, obtained by restricting $x_c$ to that scope
  • E.g., $\phi_3(a, b, c) := \phi_1(a, b) \times \phi_2(b, c)$

Factor Operations (Cont.)

• Factor marginalization
  • Locally eliminates a set of variables from a factor
  • E.g., $\tau(x) = \sum_y \phi(x, y)$

(Figure: marginalizing out variable B from factor $\phi(A, B, C)$ to obtain $\tau(A, C) = \sum_B \phi(A, B, C)$)
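A minimal Python sketch of these two operations, representing a factor as a (scope, table) pair; the concrete potential values are assumed for illustration only.

```python
from itertools import product

# A minimal sketch of factors as tables: a factor is (scope, table), where
# scope is a tuple of variable names and table maps value-tuples to scores.
def factor_product(f1, f2):
    (s1, t1), (s2, t2) = f1, f2
    scope = s1 + tuple(v for v in s2 if v not in s1)
    # Possible values of each variable, read off from the factors' tables.
    domain = {v: sorted({key[s.index(v)] for (s, t) in (f1, f2) if v in s for key in t})
              for v in scope}
    table = {}
    for vals in product(*(domain[v] for v in scope)):
        assign = dict(zip(scope, vals))
        table[vals] = t1[tuple(assign[v] for v in s1)] * t2[tuple(assign[v] for v in s2)]
    return scope, table

def marginalize(factor, var):
    scope, table = factor
    new_scope = tuple(v for v in scope if v != var)
    new_table = {}
    for vals, score in table.items():
        key = tuple(v for v, name in zip(vals, scope) if name != var)
        new_table[key] = new_table.get(key, 0.0) + score
    return new_scope, new_table

# E.g., phi3(a, b, c) := phi1(a, b) * phi2(b, c), then tau(a, c) = sum_b phi3(a, b, c)
phi1 = (("A", "B"), {(a, b): 1.0 + (a == b) for a in (0, 1) for b in (0, 1)})
phi2 = (("B", "C"), {(b, c): 1.0 + (b == c) for b in (0, 1) for c in (0, 1)})
phi3 = factor_product(phi1, phi2)
print(marginalize(phi3, "B"))
```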

The Variable Elimination Algorithm

• Given an elimination order
  • Deciding the best order is an NP-hard problem
• The algorithm (see the sketch below): for each random variable $X_i$, following the given order,
  1. Multiply all factors $\phi$ containing $X_i$
  2. Marginalize out $X_i$ to obtain a new factor $\tau$
  3. Replace the factors containing $X_i$ by $\tau$
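Continuing the factor sketch above, a minimal version of this loop might look as follows; it reuses factor_product and marginalize and assumes an order in which every eliminated variable appears in at least one factor.

```python
# Continuing the sketch above (reuses factor_product, marginalize, phi1, phi2).
# Variable elimination for the unnormalized marginal over the remaining variables.
def variable_elimination(factors, order):
    factors = list(factors)
    for var in order:
        involved = [f for f in factors if var in f[0]]      # 1. factors containing X_i
        rest = [f for f in factors if var not in f[0]]
        prod = involved[0]
        for f in involved[1:]:
            prod = factor_product(prod, f)
        tau = marginalize(prod, var)                        # 2. marginalize out X_i
        factors = rest + [tau]                              # 3. replace them by tau
    result = factors[0]
    for f in factors[1:]:
        result = factor_product(result, f)
    return result

# Unnormalized P(C) for the chain A - B - C, eliminating A then B:
scope, table = variable_elimination([phi1, phi2], order=["A", "B"])
Z = sum(table.values())
print({k: v / Z for k, v in table.items()})  # normalized marginal P(C)
```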

Example of VE

• $P(A, B, \ldots, H) \propto \phi_1(A,B,C)\,\phi_2(B,D,E)\,\phi_3(A,G)\,\phi_4(C,F)\,\phi_5(G,H)\,\phi_6(F,H)$
• Compute $P(B)$
• Eliminate in the order E, D, H, F, G, C, A

(Figures: the intermediate factors produced at each elimination step)

Question

• What can we obtain by computing $\sum_B \tilde{P}(B)$?
  • (It is the partition function $Z$, which we can use to normalize the marginal: $P(B) = \tilde{P}(B) / Z$.)

Introducing Evidence

• What if some variables are observed?
  • $P(Y \mid E = e) = \frac{P(Y, E = e)}{P(E = e)}$
  • E.g., $P(B \mid A = a_1, C = c_2)$
• Computation flow
  • Perform variable elimination on $P(Y, E = e)$
    • For every factor that involves variables in E, fix their values to e
  • Perform variable elimination on $P(E = e)$

Running Time

• Time complexity: $O(m\, d^M)$
  • m: number of variables
  • d: number of states for each variable
  • M: maximum size of any factor during the elimination process
• E.g., the size of $\phi(A, B, C)$ is 3, and we need to go through all $3 \times 2 \times 2 = 12$ configurations to obtain $\tau(A, C)$

Elimination Orderings

• Finding the best ordering is an NP-hard problem
• Some useful heuristics
  • Min-neighbors: choose the variable with the fewest dependent variables
  • Min-weight: choose the variable that minimizes the product of the cardinalities of its dependent variables
  • Min-fill: choose the vertex that minimizes the number of edges that will be added to the graph (see the sketch below)
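A sketch of the min-fill heuristic, assuming the graph is given as an adjacency dictionary; the example graph is the one from the earlier VE example, and this is one reasonable implementation rather than the slides' own.

```python
# Sketch of the min-fill heuristic: repeatedly eliminate the vertex whose
# elimination adds the fewest fill-in edges (edges between its not-yet-connected neighbors).
def min_fill_order(adj):
    adj = {v: set(ns) for v, ns in adj.items()}
    order = []
    while adj:
        def fill_cost(v):
            nbrs = list(adj[v])
            return sum(1 for i in range(len(nbrs)) for j in range(i + 1, len(nbrs))
                       if nbrs[j] not in adj[nbrs[i]])
        v = min(adj, key=fill_cost)
        order.append(v)
        nbrs = list(adj[v])
        for i in range(len(nbrs)):           # connect v's neighbors to each other
            for j in range(i + 1, len(nbrs)):
                adj[nbrs[i]].add(nbrs[j])
                adj[nbrs[j]].add(nbrs[i])
        for u in nbrs:                       # remove v from the graph
            adj[u].discard(v)
        del adj[v]
    return order

# Graph from the earlier example: cliques {A,B,C}, {B,D,E}, {A,G}, {C,F}, {G,H}, {F,H}
graph = {"A": {"B", "C", "G"}, "B": {"A", "C", "D", "E"}, "C": {"A", "B", "F"},
         "D": {"B", "E"}, "E": {"B", "D"}, "F": {"C", "H"}, "G": {"A", "H"}, "H": {"F", "G"}}
print(min_fill_order(graph))
```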

Content
• From Sequence to Graph: Markov Random Field
• Inference
  • VE
  • Belief Propagation
  • Loopy Belief Propagation
• Learning
  • Exponential Family
  • The Learning Framework
• Summary

Belief Propagation

• Limitation of VE
  • Each run of VE can answer only one query
  • E.g., computing $P(Y_1)$ and $P(Y_2)$ needs two runs
• Can we share intermediate factors across computations?
  • Belief propagation: variable elimination as message passing

In the Case of Tree Structure

• Compute the marginal probability $p(x_i)$ in a tree structure
  • Tree: no cycles (a connected acyclic graph)
• The optimal order:
  • Set $x_i$ as the root
  • Traverse the nodes in postorder
    • Start from the leaf nodes; go up the tree only after a node's children have been visited (left, right, up)

Postorder: 4 5 2 3 1

In the Case of Tree Structure (Cont.)

• At each step, eliminate $x_j$ following the proposed order
  • Suppose the parent node of $x_j$ is $x_k$
  • $\tau_{jk}(x_k) = \sum_{x_j} \phi(x_k, x_j)\, \tau_j(x_j)$
  • The factor being marginalized has size 2 (scope $\{x_j, x_k\}$)
• Example: compute $p(x_3)$
  • Postorder: $x_2, x_1, x_4, x_5, x_3$
  • Eliminate $x_2$: $\tau_{21}(x_1) = \sum_{x_2} \phi(x_1, x_2)$
  • Eliminate $x_1$: $\tau_{13}(x_3) = \sum_{x_1} \phi(x_3, x_1) \times \tau_{21}(x_1)$
  • Eliminate $x_4$: $\tau_{43}(x_3) = \sum_{x_4} \phi(x_4, x_3)$
  • Eliminate $x_5$: $\tau_{53}(x_3) = \sum_{x_5} \phi(x_5, x_3)$
  • $p(x_3) \propto \tau_{13}(x_3) \times \tau_{43}(x_3) \times \tau_{53}(x_3)$

Message Passing View
• When $x_j$ is marginalized out, it receives all the signals from the variables underneath it in the tree
  • These can be summarized as a factor $\tau_j(x_j)$
  • $\tau_j(x_j)$ can be viewed as a message that $x_j$ sends to $x_k$
• $x_i$ (the root) receives messages from all its immediate children to obtain the final marginal
• What if we change the root of the tree, i.e., compute the marginal of a different variable?
  • Do we need to re-compute messages?

The Message-Passing Algorithm

• How do we compute all the messages we need?
• A node $x_i$ sends a message to a neighbor $x_j$ whenever it has received messages from all of its other neighbors (all nodes besides $x_j$)
• The algorithm finishes in $2|E|$ steps, where $|E|$ is the number of edges
  • Each edge receives messages twice: $x_i \rightarrow x_j$ and $x_j \rightarrow x_i$
• These messages are exactly the intermediate factors in the VE algorithm

Sum-Product Message Passing

• While there is a node $x_i$ ready to send a message to $x_j$ (meaning $x_i$ has received messages from all its other neighbors):
  • $m_{i \to j}(x_j) = \sum_{x_i} \phi(x_i)\, \phi(x_i, x_j) \prod_{k \in N(i) \setminus j} m_{k \to i}(x_i)$
  • This is the $\tau_{ij}(x_j)$ from the previous example
• The marginal probability can then be computed as
  • $p(x_i) \propto \phi(x_i) \prod_{k \in N(i)} m_{k \to i}(x_i)$
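A numpy sketch of sum-product message passing on the 5-node tree used in the earlier example; the node and edge potentials are random placeholders, not values from the slides.

```python
import numpy as np

# Sketch: sum-product message passing on the tree with edges 1-2, 1-3, 3-4, 3-5.
rng = np.random.default_rng(1)
d = 2
nodes = [1, 2, 3, 4, 5]
edges = [(1, 2), (1, 3), (3, 4), (3, 5)]
node_pot = {i: rng.random(d) + 0.1 for i in nodes}            # assumed phi(x_i)
edge_pot = {e: rng.random((d, d)) + 0.1 for e in edges}       # assumed phi(x_i, x_j)
nbrs = {i: [j for e in edges for j in e if i in e and j != i] for i in nodes}

def pot(i, j):   # edge potential as a matrix indexed by (x_i, x_j)
    return edge_pot[(i, j)] if (i, j) in edge_pot else edge_pot[(j, i)].T

# Send each directed message once its dependencies are available (2|E| messages total).
msgs = {}
while len(msgs) < 2 * len(edges):
    for i in nodes:
        for j in nbrs[i]:
            ready = all((k, i) in msgs for k in nbrs[i] if k != j)
            if (i, j) not in msgs and ready:
                incoming = np.prod([msgs[(k, i)] for k in nbrs[i] if k != j], axis=0) \
                           if len(nbrs[i]) > 1 else np.ones(d)
                msgs[(i, j)] = pot(i, j).T @ (node_pot[i] * incoming)

# Marginal at any node: p(x_i) proportional to phi(x_i) times the incoming messages.
for i in nodes:
    b = node_pot[i] * np.prod([msgs[(k, i)] for k in nbrs[i]], axis=0)
    print(i, b / b.sum())
```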

Example of Sum-Product

• 8 messages: $m_{21}(x_1)$, $m_{13}(x_3)$, $m_{43}(x_3)$, $m_{53}(x_3)$, $m_{31}(x_1)$, $m_{12}(x_2)$, $m_{34}(x_4)$, $m_{35}(x_5)$
• Marginal probability
  • E.g., $p(x_1) \propto \phi(x_1) \times m_{31}(x_1) \times m_{21}(x_1)$

Max-Product Message Passing

• Now let's consider MAP inference
  • $\boldsymbol{x}^* = \arg\max_{x_1, \ldots, x_n} p(x_1, \ldots, x_n)$ (find the most probable assignment)
• In a Markov chain model (which can be viewed as a chain MRF)
  • Replace the previous sum with max
  • $\tilde{p}^* = \max_{x_1} \cdots \max_{x_n} \phi(x_1) \prod_{i=2}^{n} \phi(x_i, x_{i-1}) = \max_{x_n} \max_{x_{n-1}} \phi(x_n, x_{n-1}) \max_{x_{n-2}} \phi(x_{n-1}, x_{n-2}) \cdots \max_{x_1} \phi(x_2, x_1)\, \phi(x_1)$
  • Keep back-pointers to the best assignment of $x_i$ for each assignment of $x_{i+1}$
    • E.g., keep a back-pointer to the best assignment of $x_1$ for each assignment of $x_2$
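A numpy sketch of this max-product, Viterbi-style computation on a chain; the node and edge potentials are assumed random values for illustration.

```python
import numpy as np

# Sketch: max-product MAP inference on a chain MRF with node potentials phi_i(x_i)
# and edge potentials phi(x_t, x_{t+1}) (random placeholders here).
rng = np.random.default_rng(2)
d, n = 3, 5
node_pot = rng.random((n, d)) + 0.1
edge_pot = rng.random((n - 1, d, d)) + 0.1   # edge_pot[t, a, b] = phi(x_t = a, x_{t+1} = b)

# Forward pass: m[t, b] = best unnormalized score of x_1..x_t ending with x_t = b.
m = np.zeros((n, d))
back = np.zeros((n, d), dtype=int)           # back-pointers to the best previous state
m[0] = node_pot[0]
for t in range(1, n):
    scores = m[t - 1][:, None] * edge_pot[t - 1] * node_pot[t][None, :]
    back[t] = scores.argmax(axis=0)
    m[t] = scores.max(axis=0)

# Backtracking from the best final state recovers the MAP assignment.
x = [int(m[-1].argmax())]
for t in range(n - 1, 0, -1):
    x.append(int(back[t][x[-1]]))
print(list(reversed(x)))
```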

Max-Product Message Passing (Cont.)

• In a tree MRF model
  • Replace sum with max in sum-product message passing
  • $m_{i \to j}(x_j) = \max_{x_i} \phi(x_i)\, \phi(x_i, x_j) \prod_{k \in N(i) \setminus j} m_{k \to i}(x_i)$
  • Max-marginal: $p_m(x_i) \propto \phi(x_i) \prod_{k \in N(i)} m_{k \to i}(x_i)$
• Backtracking
  • Keep a back-pointer to the best assignment of $x_i$ for each assignment of $x_j$
  • From the max-marginals, a mode (MAP assignment) can be determined by backtracking

This generalizes the Viterbi algorithm!

For a General Graph Structure

• Consider an MRF with pairwise potentials (pairwise MRF)

• $p(x_1, x_2, \ldots, x_n) \propto \prod_{i} \phi(x_i) \prod_{(i,j) \in E} \phi(x_i, x_j)$
• Loopy belief propagation (an approximate algorithm)
  • Given an edge order, perform message passing iteratively
  • $m^{t+1}_{i \to j}(x_j) = \sum_{x_i} \phi(x_i)\, \phi(x_i, x_j) \prod_{k \in N(i) \setminus j} m^{t}_{k \to i}(x_i)$
  • Convergence is not guaranteed; if it converges, the result is usually good
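A sketch of loopy BP on a small pairwise MRF with a cycle; the potentials are random placeholders, messages are normalized after each update for numerical stability, and the sweep count is fixed rather than checked for convergence.

```python
import numpy as np

# Sketch of loopy belief propagation on a 4-cycle pairwise MRF (assumed potentials).
rng = np.random.default_rng(3)
d = 2
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]          # a 4-cycle, like the voting example
node_pot = {i: rng.random(d) + 0.1 for i in range(4)}
edge_pot = {e: rng.random((d, d)) + 0.1 for e in edges}
nbrs = {i: [j for e in edges for j in e if i in e and j != i] for i in range(4)}
pot = lambda i, j: edge_pot[(i, j)] if (i, j) in edge_pot else edge_pot[(j, i)].T

msgs = {(i, j): np.ones(d) / d for i in range(4) for j in nbrs[i]}
for _ in range(50):                               # fixed number of sweeps
    new = {}
    for (i, j) in msgs:
        incoming = np.ones(d)
        for k in nbrs[i]:
            if k != j:
                incoming *= msgs[(k, i)]
        m = pot(i, j).T @ (node_pot[i] * incoming)
        new[(i, j)] = m / m.sum()                 # normalize the message
    msgs = new

for i in range(4):                                # approximate marginals (beliefs)
    b = node_pot[i].copy()
    for k in nbrs[i]:
        b *= msgs[(k, i)]
    print(i, b / b.sum())
```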

Other Inference Methods

• Junction tree
• Mean field
• Sampling

Content
• From Sequence to Graph: Markov Random Field
• Inference
  • VE
  • Belief Propagation
  • Loopy Belief Propagation
• Learning
  • Exponential Family
  • The Learning Framework
• Summary

Exponential Family

• Canonical form
  • $p(y; \eta) = b(y) \exp\left(\eta^T T(y) - a(\eta)\right)$
  • $\eta$: natural parameter
  • $T(y)$: sufficient statistic
  • $a(\eta)$: log partition function, for normalization
    • $a(\eta) = \log \sum_y b(y) \exp(\eta^T T(y))$ (discrete case)
  • $b(y)$: a function that depends only on $y$

Examples of Exponential Family

• $p(y; \eta) = b(y) \exp\left(\eta^T T(y) - a(\eta)\right)$
• Many distributions belong to this family: Gaussian, Bernoulli, Poisson, beta, Dirichlet, categorical, …
• For Gaussian (when we are not interested in $\sigma$): $\eta = \mu$, $T(y) = y$
• For Bernoulli: $\eta$ is the log-odds, $T(y) = y$
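The derivations shown on the original slides did not survive the text extraction; the standard forms they refer to are reconstructed below.

```latex
% Bernoulli with mean \varphi:
p(y;\varphi) = \varphi^{y}(1-\varphi)^{1-y}
             = \exp\!\left( y \log\tfrac{\varphi}{1-\varphi} + \log(1-\varphi) \right),
\qquad b(y)=1,\quad T(y)=y,\quad \eta=\log\tfrac{\varphi}{1-\varphi},\quad a(\eta)=\log(1+e^{\eta}).

% Gaussian with unit variance (\sigma not of interest):
p(y;\mu) = \tfrac{1}{\sqrt{2\pi}}\, e^{-y^{2}/2}\,\exp\!\left( \mu y - \tfrac{\mu^{2}}{2} \right),
\qquad b(y)=\tfrac{1}{\sqrt{2\pi}}\, e^{-y^{2}/2},\quad T(y)=y,\quad \eta=\mu,\quad a(\eta)=\tfrac{\eta^{2}}{2}.
```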

Properties of Exponential Family

• $\log p(y; \eta)$ is concave with respect to $\eta$
  • If a local maximum exists, then it is a global maximum
• Concavity proof
  • $\log p(y; \eta) = \log b(y) + \eta^T T(y) - a(\eta)$
  • $\nabla_\eta \log p(y; \eta) = T(y) - \nabla_\eta a(\eta) = T(y) - E_{y \sim p}[T(y)]$
  • $\nabla^2_\eta \log p(y; \eta) = -\mathrm{cov}(T(y))$
  • A covariance matrix is always positive semi-definite
  • Therefore, the second derivative (Hessian) is always negative semi-definite, which implies that the log-likelihood function is concave with respect to $\eta$

The Learning Problem in MRF

• Consider an MRF with parameters $\beta$
  • $p(x_1, \ldots, x_n; \beta) = \frac{1}{Z(\beta)} \prod_{c \in C} \phi_c(x_c; \beta)$
• The goal: find $\beta$ that maximizes the likelihood function
  • Usually over multiple observed configurations: $D = \{\boldsymbol{x}^1, \boldsymbol{x}^2, \ldots, \boldsymbol{x}^N\}$
• Computational challenge of the gradient ascent method
  • The gradient is intractable to compute (it involves the partition function $Z(\beta)$)

Reparametrizing

• $p(x_1, \ldots, x_n; \beta) = \frac{1}{Z(\beta)} \exp\left(\sum_{c \in C} \log \phi_c(x_c; \beta)\right) = \frac{1}{Z(\beta)} \exp\left(\sum_{c \in C} \sum_{x'_c} 1\{x_c = x'_c\} \log \phi_c(x'_c; \beta)\right) = \frac{\exp\left(\boldsymbol{\theta}^T \boldsymbol{f}(\boldsymbol{x})\right)}{Z(\boldsymbol{\theta})}$
  • A special case of the exponential family, where $b(x) = 1$
  • $x'_c$: a possible assignment/configuration of clique c
  • $1\{x_c = x'_c\}$: indicator function, 1 if $x_c = x'_c$ holds and 0 otherwise
  • $\boldsymbol{f}(\boldsymbol{x})$: a vector of indicator functions
    • Size: $\sum_c (\#\text{ of configurations for clique } c)$
  • $\boldsymbol{\theta}$: the set of all parameters, defined by $\log \phi_c(x'_c; \beta)$

In a Pairwise MRF Case

• Consider a pairwise MRF in which each variable takes possible values from $\{1, 2, \ldots, M\}$, and every edge shares the same type of factor function (shared parameters)
  • $p(x_1, x_2, \ldots, x_n) \propto \prod_{(i,j) \in E} \phi(x_i, x_j) = \exp\left(\sum_{(i,j) \in E} \log \phi(x_i, x_j)\right)$
  • $= \exp\left(\sum_{(i,j) \in E} \sum_{m,n} 1\{x_i = m, x_j = n\} \log \phi(m, n)\right) = \exp\left(\sum_{m,n} \theta_{mn}\, f_{mn}(\boldsymbol{x})\right)$
  • $f_{mn}(\boldsymbol{x})$: the number of edges with configuration $(m, n)$
  • $\theta_{mn} = \log \phi(m, n)$
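A small Python sketch of the sufficient statistics $f_{mn}(\boldsymbol{x})$ for the voting-style example; the parameter values and the observed configuration are assumed for illustration.

```python
import numpy as np

# Sketch: sufficient statistics for a shared-parameter pairwise MRF.
# f[m, n] counts the edges whose endpoints take the configuration (m, n),
# so log p(x) = sum_{m,n} theta[m, n] * f[m, n] - log Z(theta).
M = 2
theta = np.log(np.array([[10., 1.], [1., 10.]]))   # assumed: agreement scored 10, disagreement 1
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]

def suff_stats(assign):
    f = np.zeros((M, M))
    for u, v in edges:
        f[assign[u], assign[v]] += 1
    return f

x = {"A": 1, "B": 1, "C": 0, "D": 1}               # a hypothetical joint configuration
f = suff_stats(x)
print(f)                                           # edge-configuration counts
print(np.sum(theta * f))                           # unnormalized log-score theta^T f(x)
```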

Example

• Consider a pairwise MRF where each site can take only two values {0, 1}
• For each pair, there are 4 possible configurations: 00, 01, 10, 11
• Parameter matrix:
  $\begin{pmatrix} \beta_{00} & \beta_{01} \\ \beta_{10} & \beta_{11} \end{pmatrix} = \begin{pmatrix} \exp(\theta_{00}) & \exp(\theta_{01}) \\ \exp(\theta_{10}) & \exp(\theta_{11}) \end{pmatrix}$
• E.g., $\phi(0, 0) = \beta_{00} = \exp(\theta_{00})$

Properties under the New Form

• $\nabla_{\boldsymbol{\theta}} \log p(\boldsymbol{x}; \boldsymbol{\theta}) = \boldsymbol{f}(\boldsymbol{x}) - \nabla_{\boldsymbol{\theta}} \log Z(\boldsymbol{\theta}) = \boldsymbol{f}(\boldsymbol{x}) - E_{\boldsymbol{x} \sim p}[\boldsymbol{f}(\boldsymbol{x})]$
• The first derivative is expensive to compute
  • Remember that $E_{\boldsymbol{x} \sim p}[\boldsymbol{f}(\boldsymbol{x})] = \sum_{\boldsymbol{x}} p(\boldsymbol{x}) \boldsymbol{f}(\boldsymbol{x})$ involves enumerating all possible $\boldsymbol{x}$
• $\nabla^2_{\boldsymbol{\theta}} \log p(\boldsymbol{x}; \boldsymbol{\theta}) = -\mathrm{cov}(\boldsymbol{f}(\boldsymbol{x}))$
• Concave
  • Any local optimum is a global optimum

Approximate Learning

• Pseudo-likelihood

• Gibbs sampling-based gradient ascent*
  • Gibbs sampling is one of the Markov chain Monte Carlo (MCMC) methods

Pseudo-Likelihood

• Approximate the likelihood with the pseudo-likelihood
  • $\log p(\boldsymbol{x}; \boldsymbol{\theta}) \approx \sum_i \log p(x_i \mid x_{N(i)}; \boldsymbol{\theta})$
  • $N(i)$ is the set of neighbors of $i$ in the graph
• The pseudo-likelihood is not equal to the likelihood
  • Applying the chain rule gives the exact factorization: $\log p(\boldsymbol{x}; \boldsymbol{\theta}) = \log p(x_1) + \sum_{i=2}^{n} \log p(x_i \mid x_1, \ldots, x_{i-1})$
• The pseudo-likelihood is tractable (each conditional needs only a local normalization; see the sketch below)
• The pseudo-likelihood is concave
  • A sum of concave functions is concave
• Works well in practice

More on the Conditional Distribution: Pairwise MRF Setting

• $p(x_i \mid x_{N(i)}; \boldsymbol{\theta}) = p(x_i \mid x_{-i}; \boldsymbol{\theta})$ (local Markov property)
  $= \frac{p(\boldsymbol{x})}{p(x_{-i})} = \frac{\prod_{(i,j) \in E} \phi(x_i, x_j) \prod_{(k,j) \in E,\, k, j \neq i} \phi(x_k, x_j)}{\sum_{x'_i} \prod_{(i,j) \in E} \phi(x'_i, x_j) \prod_{(k,j) \in E,\, k, j \neq i} \phi(x_k, x_j)}$
  $= \frac{\prod_{j \in N(i)} \phi(x_i, x_j)}{\sum_{x'_i} \prod_{j \in N(i)} \phi(x'_i, x_j)}$
  $= \frac{\exp\left(\sum_{j \in N(i)} \sum_{m,n} 1\{x_i = m, x_j = n\}\, \theta_{mn}\right)}{Z_i(\boldsymbol{\theta})}$
  • $Z_i(\boldsymbol{\theta})$: a local normalization constant, summing only over the values of $x_i$
• Gradient of the conditional log-likelihood:
  $\frac{\partial \log p(x_i \mid x_{N(i)}; \boldsymbol{\theta})}{\partial \theta_{mn}} = \sum_{j \in N(i)} 1\{x_i = m, x_j = n\} - E_{x_i \mid x_{N(i)}}\left[\sum_{j \in N(i)} 1\{x_i = m, x_j = n\}\right]$

Content
• From Sequence to Graph: Markov Random Field
• Inference
  • VE
  • Belief Propagation
  • Loopy Belief Propagation
• Learning
  • Exponential Family
  • The Learning Framework
• Summary

Summary
• From Sequence to Graph: Markov Random Field
• Inference
  • VE
  • Belief Propagation
    • Sum-product message passing; max-product message passing
  • Loopy Belief Propagation
    • For general graphs
• Learning
  • Exponential Family
    • MRF belongs to the exponential family
  • The Learning Framework
    • Pseudo-likelihood

References

• Stanford cs228 course notes: https://ermongroup.github.io/cs228-notes/
• Kevin P. Murphy, Machine Learning: A Probabilistic Perspective, 2012.
• Martin Wainwright, Graphical Models, Message-Passing, and Variational Methods: https://people.eecs.berkeley.edu/~wainwrig/Talks/Wainwright_PartI.pdf
• CMU 10-708 course, Probabilistic Graphical Models: https://www.cs.cmu.edu/~epxing/Class/10708-14/scribe_notes/scribe_note_lecture13.pdf
• J. Coughlan, belief propagation tutorial: http://computerrobotvision.org/2009/tutorial_day/crv09_belief_propagation_v2.pdf
• Amir Globerson, MIT course notes: http://people.csail.mit.edu/dsontag/courses/pgm12/slides/pseudolikelihood_notes.pdf
