
PROBABILISTIC MODELS FOR STRUCTURED DATA
04: Markov Random Field
Instructor: Yizhou Sun
[email protected]
January 29, 2019

Content
• From Sequence to Graph: Markov Random Field
• Inference
  • VE
  • Belief Propagation
  • Loopy Belief Propagation
• Learning
  • Exponential Family
  • The Learning Framework
• Summary

From Sequence to Graph
• Dependency exists among data points, which form a graph
• Examples
  • Image semantic labeling (a regular graph)
  • User profiling in a social network (a general graph), e.g., friends tend to have the same voting preference: Democratic vs. Republican

Motivating Example
• Modeling voting preference among persons
  • Persons: A, B, C, D
  • Each person can take a binary value
    • 1: democratic
    • 0: republican
  • Friendships: (A,B), (B,C), (C,D), and (D,A)
• Friends tend to take the same value
  • Indicated by a factor φ(·,·), which assigns a higher score to consistent votes among friends
• The joint probability
  • P(A, B, C, D) = (1/Z) φ(A,B) φ(B,C) φ(C,D) φ(D,A)
  • Z = Σ_{A,B,C,D} φ(A,B) φ(B,C) φ(C,D) φ(D,A): normalization constant

Motivating Example (Cont.)
• Given the model, some interesting questions can be asked:
  • What will be the most likely vote assignment, i.e., the one that maximizes the joint probability?
  • If we know A is republican and C is democratic, what will be the most likely votes for B and D?
  • How can we learn the parameters of the model, i.e., the score assigned to each possible factor configuration? E.g., φ_11 = φ(A = 1, B = 1) = ?

Formal Definition
• A Markov Random Field (MRF) is a probability distribution P over random variables x_1, x_2, …, x_n defined by an undirected graph G
• P(x_1, x_2, …, x_n) = (1/Z) ∏_{c∈C} φ_c(x_c)  (Gibbs distribution)
  • C: cliques in graph G
  • φ_c(x_c): factor or potential function
  • Z = Σ_{x_1,…,x_n} ∏_{c∈C} φ_c(x_c): partition function
• Log-linear form (if φ_c > 0)
  • P(x_1, x_2, …, x_n) = (1/Z) exp(Σ_{c∈C} log φ_c(x_c)) = (1/Z) exp(−E(x))
  • E(x) = −Σ_{c∈C} log φ_c(x_c): energy function (the lower the better)

Graphical Representation of MRFs
• G = (V, E)
  • V = {1, 2, …, n}, a set of random variables
  • (i, j) ∈ E ⟺ ∃c: {x_i, x_j} ⊂ c (x_i and x_j appear in the same factor)
  • Neighbors: N(i) = {j : (i, j) ∈ E}
• Example
  • P(x_1, x_2, …, x_7) ∝ a product of clique factors defined on the example graph (figure omitted)

Conditional Independence in MRF
• Key properties
  • Global Markov property
    • For sets of nodes A, B, and C: X_A ⊥ X_B | X_C iff C separates A from B in the graph
    • E.g., {1,2} ⊥ {6,7} | {3,4,5}
  • Local Markov property
    • A node is independent of the rest of the nodes in the graph, given its immediate neighbors (its Markov blanket)
    • E.g., 1 ⊥ {4,5,6,7} | {2,3}
  • Pairwise Markov property
    • Two nodes in the network that are not directly connected can be made independent given all other nodes
    • E.g., 1 ⊥ 7 | {2,3,4,5,6}

Hammersley-Clifford Theorem
• If P > 0 and P satisfies the conditional independence properties of an undirected graph G, then P can be factorized as a product of factors, one per maximal clique, i.e.,
  • P(x_1, x_2, …, x_n) = (1/Z) ∏_{c∈C} φ_c(x_c)

Examples of MRFs: Discrete MRFs
• Nodes are discrete random variables
• Factor or clique potentials: given a configuration of the random variables in a clique, assign a real number to it
  • E.g., for two variables A ∈ {a_1, a_2, a_3} and B ∈ {b_1, b_2} in the same clique c, a possible potential function is a table assigning a score to each of the 3 × 2 configurations (table omitted)
  • Scope of a factor: the set of the variables defining the factor, e.g., {A, B} in this case

Examples of MRFs: Gaussian MRFs
• Nodes are continuous random variables
• Precision matrix: H = Σ^{−1}
• Variables x_i and x_j are connected in the network only if they have a nonzero entry H_ij in the precision matrix
• Potentials are defined over cliques of edges
  • For an edge (i, j), the potential is defined as exp{−(1/2)(x_i − μ_i) H_ij (x_j − μ_j)}
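To make the Gibbs distribution and the partition function concrete, here is a minimal Python sketch of the motivating voting example earlier in this section. The factor values (10 for agreeing friends, 1 otherwise) and the helper names are illustrative assumptions, not values from the lecture.

```python
import itertools

# A minimal sketch of the voting example: four binary variables (1 = democratic,
# 0 = republican) with one pairwise factor per friendship edge.
# The factor values below are illustrative, not taken from the lecture.
people = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]

def phi(x_i, x_j):
    """Pairwise potential: friends who agree get a higher score (assumed values)."""
    return 10.0 if x_i == x_j else 1.0

def unnormalized_prob(assignment):
    """Product of all edge factors for one full assignment (dict: name -> 0/1)."""
    score = 1.0
    for i, j in edges:
        score *= phi(assignment[i], assignment[j])
    return score

# Partition function Z: sum of the unnormalized score over all 2^4 assignments.
all_assignments = [dict(zip(people, vals))
                   for vals in itertools.product([0, 1], repeat=len(people))]
Z = sum(unnormalized_prob(a) for a in all_assignments)

# Joint probability of one configuration, e.g. everyone voting democratic.
x = {"A": 1, "B": 1, "C": 1, "D": 1}
print("P(A=1, B=1, C=1, D=1) =", unnormalized_prob(x) / Z)

# Marginal P(A = 1) by summing the joint over all other variables.
p_a1 = sum(unnormalized_prob(a) for a in all_assignments if a["A"] == 1) / Z
print("P(A=1) =", p_a1)
```

Brute-force enumeration over all 2^4 assignments is only feasible because this graph is tiny; the inference section below is about avoiding exactly this exponential enumeration.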
Examples of MRFs: Pairwise MRF
• A very simple factorization
  • Only considers factors on vertices and edges
  • P(x_1, x_2, …, x_n) = (1/Z) ∏_{i∈V} φ_i(x_i) ∏_{(i,j)∈E} φ_ij(x_i, x_j)
• E.g., Ising model: a mathematical model of ferromagnetism in statistical mechanics
  • Each atom x_i takes one of two discrete values {+1, −1}
  • Edge potentials of the form φ_ij(x_i, x_j) = exp{w_ij x_i x_j}, (i, j) ∈ E
    • w_ij > 0: ferromagnetic
    • w_ij < 0: antiferromagnetic

Content
• From Sequence to Graph: Markov Random Field
• Inference
  • VE
  • Belief Propagation
  • Loopy Belief Propagation
• Learning
  • Exponential Family
  • The Learning Framework
• Summary

Inference Problems
• Marginal distribution inference
  • What is the marginal probability of a random variable, P(x_i)?
    • E.g., what is the marginal probability of one person's voting preference being "republican"?
  • What is the marginal probability of a random variable conditioned on some observed variables, e.g., P(x_i | x_j = 1)?
    • E.g., what is the marginal probability of one person's voting preference given that some people's voting preferences are observed?
• Maximum a posteriori inference (MAP)
  • What is the most likely assignment of a set of random variables (possibly given some evidence)?
    • E.g., what are the most likely voting preferences for everyone in a social network?

Inference Methods
• Marginal inference
  • Variable elimination
  • Belief propagation: sum-product message passing
• MAP inference
  • Belief propagation: max-product message passing

An Illustrative Example
• Consider the marginal inference problem in a Markov chain
  • P(x_1, …, x_n) = P(x_1) ∏_{t=2}^{n} P(x_t | x_{t−1})
  • Assume each variable takes d discrete values
• How to compute the marginal probability of x_n?
  • P(x_n) = Σ_{x_1} ⋯ Σ_{x_{n−1}} P(x_1, …, x_n)
           = Σ_{x_1} ⋯ Σ_{x_{n−1}} P(x_1) ∏_{t=2}^{n} P(x_t | x_{t−1})
           = Σ_{x_{n−1}} P(x_n | x_{n−1}) Σ_{x_{n−2}} P(x_{n−1} | x_{n−2}) ⋯ Σ_{x_1} P(x_2 | x_1) P(x_1)
  • Each inner sum produces an intermediary factor, e.g., τ(x_2) = Σ_{x_1} P(x_2 | x_1) P(x_1), which feeds the next sum
  • Familiar procedure? (the forward recursion over the chain)

General VE Algorithm
• Given an MRF (or other graphical model)
  • P(x_1, …, x_n) = (1/Z) ∏_{c∈C} φ_c(x_c)
• Compute the unnormalized marginal, then normalize it after the computation
  • P̃(x_1, …, x_n) = ∏_{c∈C} φ_c(x_c)
  • Compute P̃(x_i) by VE
  • Normalize the marginal: P(x_i) = P̃(x_i) / Z, where Z = Σ_{x_i} P̃(x_i)

Factor Operations
• Factor product
  • φ_3 := φ_1 × φ_2
  • φ_3(x_c) = φ_1(x_c^{(1)}) × φ_2(x_c^{(2)})
    • The scope of φ_3 is the union of the scopes of φ_1 and φ_2
    • x_c^{(i)} denotes the assignment to the variables in the scope of φ_i obtained by restricting x_c to that scope
  • E.g., φ_3(A, B, C) := φ_1(A, B) × φ_2(B, C)

Factor Operations (Cont.)
• Factor marginalization
  • Locally eliminates a set of variables from a factor
  • E.g., τ(A, C) = Σ_B φ(A, B, C), marginalizing out variable B from factor φ(A, B, C)

The Variable Elimination Algorithm
• Given an elimination order
  • Deciding the best order is an NP-hard problem
• The algorithm: for each random variable x_i, following the given order
  1. Multiply all factors containing x_i
  2. Marginalize out x_i to obtain a new factor τ
  3. Replace the factors containing x_i by τ

Example of VE
• P(A, B, C, D, E, F, G, H) ∝ a product of factors over the cliques of the example graph (graph and step-by-step elimination figures omitted)
• Compute P(B)
• Eliminate in the order E, D, H, F, G, C, A

Question
• What can we obtain by Σ_B P̃(B)?
  • The partition function Z

Introducing Evidence
• What if some variables are observed?
  • P(Y | E = e) = P(Y, E = e) / P(E = e)
  • E.g., P(Y | X_1 = e_1, X_2 = e_2)
• Computation flow
  • Perform variable elimination on P(Y, E = e)
    • For every factor that has variables in E, fix their values to e
  • Perform variable elimination on P(E = e)

Running Time
• Time complexity: O(m · d^M)
  • m: number of variables
  • d: number of states for each variable
  • M: maximum size of any factor during the elimination process
  • E.g., the size of φ(A, B, C) is 3; with |A| = 3 and |B| = |C| = 2, we need to go through all 3 × 2 × 2 = 12 configurations to obtain τ(A, C)

Elimination Orderings
• Finding the best ordering is an NP-hard problem
• Some useful heuristics
  • Min-neighbors: choose a variable with the fewest dependent variables
  • Min-weight: choose variables to minimize the product of the cardinalities of their dependent variables
  • Min-fill: choose vertices to minimize the number of edges that will be added to the graph
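As a small recap of the VE section, the sketch below runs variable elimination on the Markov chain from the illustrative example, eliminating x_1, …, x_{n−1} in order. The chain length, number of states, and the (random) probabilities are made-up assumptions.

```python
import numpy as np

# A minimal sketch of variable elimination on the Markov chain example:
# P(x_1, ..., x_n) = P(x_1) * prod_t P(x_t | x_{t-1}), each variable with d states.
# The chain length, d, and the random probabilities are made-up assumptions.
d, n = 3, 5
rng = np.random.default_rng(0)

p1 = rng.random(d)
p1 /= p1.sum()                                   # P(x_1)
T = rng.random((d, d))
T /= T.sum(axis=1, keepdims=True)                # T[a, b] = P(x_t = b | x_{t-1} = a)

# Eliminate x_1, x_2, ..., x_{n-1} in order, keeping one intermediary factor tau.
tau = p1                                         # tau(x_1) = P(x_1)
for t in range(2, n + 1):
    # tau_new(x_t) = sum_{x_{t-1}} P(x_t | x_{t-1}) * tau(x_{t-1})
    tau = T.T @ tau

print("P(x_n) =", tau)                           # sums to 1, since every factor here is a distribution
```

Each elimination step is a d × d matrix-vector product, so the total cost is O(n · d^2) rather than the O(d^n) of naive summation over all configurations, matching the running-time discussion above (here M = 2).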
Content
• From Sequence to Graph: Markov Random Field
• Inference
  • VE
  • Belief Propagation
  • Loopy Belief Propagation
• Learning
  • Exponential Family
  • The Learning Framework
• Summary

Belief Propagation
• Limitation of VE
  • Each run of VE can only answer one query
  • E.g., P(x_1) and P(x_2) need two runs
  • Can we share intermediate factors across computations?
• Belief Propagation: variable elimination as message passing

In the Case of Tree Structure
• Compute the marginal probability P(x_i) in a tree structure
  • Tree: no cycles (acyclic connected graph)
• The optimal order:
  • Set x_i as the root
  • Traverse the nodes in postorder
    • Start from the leaf nodes; visit a node only after its children have been visited (left, right, up)
  • Example postorder: 4, 5, 2, 3, 1 (tree figure omitted)

In the Case of Tree Structure (Cont.)
• At each step, eliminate x_j following the proposed order
  • Suppose the parent node of x_j is x_p; the factor to marginalize is φ′(x_j, x_p), the product of all current factors involving x_j, and its size is 2
  • Eliminating x_j gives the new factor m_jp(x_p) = Σ_{x_j} φ′(x_j, x_p)
• Example: compute P(x_3) (tree figure omitted)
  • Postorder with x_3 as root: x_2, x_1, x_4, x_5, x_3
  • Eliminate x_2: m_21(x_1) = Σ_{x_2} φ(x_2) φ(x_1, x_2)
  • Eliminate x_1: m_13(x_3) = Σ_{x_1} φ(x_1) φ(x_3, x_1) × m_21(x_1)
  • Eliminate x_4: m_43(x_3) = Σ_{x_4} φ(x_4) φ(x_4, x_3)
  • Eliminate x_5: m_53(x_3) = Σ_{x_5} φ(x_5) φ(x_5, x_3)
  • P̃(x_3) = φ(x_3) × m_13(x_3) × m_43(x_3) × m_53(x_3)

Message Passing View
• When x_j is marginalized out, it receives all the signals from the variables underneath it in the tree
  • These can be summarized as a factor m_jp(x_p)
  • m_jp(x_p) can be considered as a message that x_j sends to its parent x_p
• The root receives messages from all its immediate children to obtain the final marginal
• What if we change the root of the tree, i.e., compute the marginal of a different variable?
  • Do we need to re-compute the messages?

The Message-Passing Algorithm
• How do we compute all the messages we need?
  • A node x_i sends a message to a neighbor x_j whenever it has received messages from all its neighbors besides x_j
• Each edge (i, j) carries messages twice, m_{i→j} and m_{j→i}, so 2|E| messages in total
• These messages are the intermediate factors in the VE algorithm

Sum-Product Message Passing
• While there is a node x_i ready to send a message to x_j (meaning x_i has received the messages from all neighbors other than x_j):
  • m_{i→j}(x_j) = Σ_{x_i} φ(x_i) φ(x_i, x_j) ∏_{ℓ∈N(i)\{j}} m_{ℓ→i}(x_i)
  • E.g., m_13(x_3) in the previous example
• The marginal probability: P(x_i) ∝ φ(x_i) ∏_{ℓ∈N(i)} m_{ℓ→i}(x_i), the product of the node potential and all incoming messages
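To make the message schedule concrete, here is a minimal Python sketch of sum-product message passing on a small tree. The edge set (1-2, 1-3, 3-4, 3-5) is inferred from the messages in the example above, and the random potentials and helper functions (nbrs, pot) are illustrative assumptions, not part of the lecture.

```python
import numpy as np

# A minimal sketch of sum-product message passing on a small tree.
# The edge set below is inferred from the messages in the example above;
# the potentials are random placeholders, and nbrs/pot are illustrative helpers.
d = 2
rng = np.random.default_rng(1)
nodes = [1, 2, 3, 4, 5]
edges = [(1, 2), (1, 3), (3, 4), (3, 5)]

node_pot = {i: rng.random(d) + 0.1 for i in nodes}        # phi_i(x_i)
edge_pot = {e: rng.random((d, d)) + 0.1 for e in edges}   # phi_ij(x_i, x_j)

def nbrs(i):
    """Neighbors of node i in the tree."""
    return [b if a == i else a for (a, b) in edges if i in (a, b)]

def pot(i, j):
    """Edge potential as a (d, d) array indexed by (x_i, x_j)."""
    return edge_pot[(i, j)] if (i, j) in edge_pot else edge_pot[(j, i)].T

# m[(i, j)][x_j] = sum_{x_i} phi_i(x_i) phi_ij(x_i, x_j) prod_{l in N(i)\{j}} m[(l, i)][x_i]
messages = {}
pending = [(i, j) for i in nodes for j in nbrs(i)]
while pending:
    for (i, j) in pending:
        others = [l for l in nbrs(i) if l != j]
        if all((l, i) in messages for l in others):       # ready to send
            vec = node_pot[i].copy()                      # a function of x_i
            for l in others:
                vec *= messages[(l, i)]
            messages[(i, j)] = pot(i, j).T @ vec          # sum out x_i
            pending.remove((i, j))
            break
    else:
        raise RuntimeError("no sendable message; the graph is not a tree")

# Marginal at any node: node potential times all incoming messages, then normalize.
for i in nodes:
    belief = node_pot[i].copy()
    for l in nbrs(i):
        belief *= messages[(l, i)]
    print(f"P(x_{i}) =", belief / belief.sum())
```

After all 2|E| messages have been computed once, the marginal of every node is read off from the same message set, which is exactly the saving over re-running VE for each query.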