
CS 6347

Lecture 4

Markov Random Fields (a.k.a. Undirected Graphical Models)

• A Bayesian network is a directed graphical model that captures independence relationships of a given distribution

– Encodes local Markov independence assumptions that each node is independent of its non-descendants given its parents

– Corresponds to a factorization of the joint distribution

$$p(x_1, \dots, x_n) = \prod_i p(x_i \mid x_{\text{parents}(i)})$$
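For concreteness, a small example (not from the slides): in a chain-structured network 퐴 → 퐵 → 퐶, each node's only parent is its predecessor, so the factorization reads

$$p(x_A, x_B, x_C) = p(x_A)\, p(x_B \mid x_A)\, p(x_C \mid x_B)$$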

Limits of Bayesian Networks

• Not all sets of independence relations can be captured by a Bayesian network

– 퐴 ⊥ 퐶 | 퐵, 퐷

– 퐵 ⊥ 퐷 | 퐴, 퐶

• Possible DAGs that represent these independence relationships?

[Figure: two candidate DAGs over the nodes 퐴, 퐵, 퐶, 퐷]

Markov Random Fields (MRFs)

• A Markov random field is an undirected graphical model

– Undirected graph 퐺 = (푉, 퐸)

– One node for each random variable

– One potential function or "factor" associated with each of the cliques, 퐶, of the graph

– Nonnegative potential functions represent interactions and need not correspond to conditional distributions (they may not even sum to one)

Markov Random Fields (MRFs)

• A Markov random field is an undirected graphical model

– Corresponds to a factorization of the joint distribution

$$p(x_1, \dots, x_n) = \frac{1}{Z} \prod_{c \in C} \psi_c(x_c)$$

$$Z = \sum_{x_1', \dots, x_n'} \prod_{c \in C} \psi_c(x_c')$$


The normalizing constant, 푍, is often called the partition function.

An Example

[Figure: MRF over the nodes 퐴, 퐵, 퐶 connected in a triangle]

• $p(x_A, x_B, x_C) = \frac{1}{Z}\, \psi_{AB}(x_A, x_B)\, \psi_{BC}(x_B, x_C)\, \psi_{AC}(x_A, x_C)$

• Each potential function can be specified as a table, as before:

휓퐴퐵(푥퐴, 푥퐵) | 푥퐴 = 0 | 푥퐴 = 1
푥퐵 = 0 | 1 | 1
푥퐵 = 1 | 1 | 0
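As a concrete illustration, here is a minimal Python sketch (not part of the slides) that evaluates this triangle MRF by brute force: it multiplies the clique potentials, sums over all eight assignments to obtain 푍, and normalizes. Only the 휓퐴퐵 table matches the slide; the 휓퐵퐶 and 휓퐴퐶 values are made up for the example.

```python
# Brute-force evaluation of the triangle MRF p = (1/Z) psi_AB psi_BC psi_AC
# over three binary variables. psi_AB follows the table above; psi_BC and
# psi_AC are illustrative made-up potentials.
import itertools

psi_AB = {(0, 0): 1.0, (1, 0): 1.0, (0, 1): 1.0, (1, 1): 0.0}  # keyed by (x_A, x_B)
psi_BC = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}  # made up
psi_AC = {(0, 0): 1.0, (0, 1): 3.0, (1, 0): 3.0, (1, 1): 1.0}  # made up

def unnormalized(xa, xb, xc):
    """Product of the clique potentials for one joint assignment."""
    return psi_AB[xa, xb] * psi_BC[xb, xc] * psi_AC[xa, xc]

# Partition function: sum of the unnormalized measure over all 2^3 assignments.
Z = sum(unnormalized(xa, xb, xc)
        for xa, xb, xc in itertools.product([0, 1], repeat=3))

def p(xa, xb, xc):
    """Normalized joint probability p(x_A, x_B, x_C)."""
    return unnormalized(xa, xb, xc) / Z

print(Z, p(0, 0, 0))
```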

The Ising Model of Ferromagnets

• Each atom has an associated spin that is biased by both its neighbors in the material and an external magnetic field

– Spins can be either +1 or −1

– Edge potentials capture the local interactions

– Singleton potentials capture the external field

[Figure: grid of atoms with spins +1/−1]

$$p(x_V) = \frac{1}{Z} \exp\Big( \sum_{i \in V} h_i x_i + \sum_{(i,j) \in E} J_{ij} x_i x_j \Big)$$
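A small code sketch (an assumed example, not from the slides) of this distribution on a tiny chain of three atoms: the field strengths ℎ and couplings 퐽 below are made up, and 푍 is computed by brute force over all 2³ spin configurations.

```python
# Unnormalized Ising measure exp( sum_i h_i x_i + sum_{(i,j) in E} J_ij x_i x_j )
# on a small graph, normalized by brute-force summation over spin configurations.
import itertools
import math

V = [0, 1, 2]                    # three atoms
E = [(0, 1), (1, 2)]             # interaction edges (a chain)
h = {0: 0.5, 1: 0.0, 2: -0.5}    # external-field strengths (made up)
J = {(0, 1): 1.0, (1, 2): 1.0}   # coupling strengths (made up)

def weight(x):
    """x maps each node in V to a spin in {+1, -1}."""
    total = sum(h[i] * x[i] for i in V)
    total += sum(J[i, j] * x[i] * x[j] for (i, j) in E)
    return math.exp(total)

Z = sum(weight(dict(zip(V, spins)))
        for spins in itertools.product([+1, -1], repeat=len(V)))

x = {0: +1, 1: +1, 2: -1}
print(weight(x) / Z)             # probability of this spin configuration
```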

Independence Assertions

• Instead of d-separation, we need only consider separation:

– If 푋 ⊆ 푉 is graph separated from 푌 ⊆ 푉 by 푍 ⊆ 푉 (i.e., all paths from 푋 to 푌 pass through 푍), then 푋 ⊥ 푌 | 푍

– What independence assertions follow from this MRF?

[Figure: MRF over the nodes 퐴, 퐵, 퐶, 퐷]
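A minimal Python sketch (an assumed helper, not from the slides) of the graph-separation test: delete the separating set 푍 and check with breadth-first search whether 푋 can still reach 푌. The 4-cycle below is used purely as an illustration; it is consistent with the independence assertions from the earlier slide.

```python
# Graph separation: Z separates X from Y iff every path from X to Y passes
# through Z, i.e., X cannot reach Y once the nodes in Z are removed.
from collections import deque

def separated(adj, X, Y, Z):
    """adj: dict node -> set of neighbors (undirected graph)."""
    blocked = set(Z)
    frontier = deque(x for x in X if x not in blocked)
    visited = set(frontier)
    while frontier:
        u = frontier.popleft()
        if u in Y:
            return False                 # found a path from X to Y avoiding Z
        for v in adj[u]:
            if v not in blocked and v not in visited:
                visited.add(v)
                frontier.append(v)
    return True

# A 4-cycle A - B - C - D - A.
adj = {'A': {'B', 'D'}, 'B': {'A', 'C'}, 'C': {'B', 'D'}, 'D': {'A', 'C'}}
print(separated(adj, {'A'}, {'C'}, {'B', 'D'}))   # True:  A is separated from C by {B, D}
print(separated(adj, {'A'}, {'C'}, {'B'}))        # False: the path A - D - C survives
```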

Independence Assertions

[Figure: chain MRF 퐴 - 퐵 - 퐶]

$$p(x_A, x_B, x_C) = \frac{1}{Z}\, \psi_{AB}(x_A, x_B)\, \psi_{BC}(x_B, x_C)$$

• How does separation imply independence?

• Show that 퐴 ⊥ 퐶 | 퐵
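A sketch of the argument (filling in the steps implied by the slide, using the factorization above): conditioning on 푥퐵 cancels 푍 and splits the conditional into a term depending only on 푥퐴 and a term depending only on 푥퐶,

$$p(x_A, x_C \mid x_B) = \frac{p(x_A, x_B, x_C)}{\sum_{x_A', x_C'} p(x_A', x_B, x_C')} = \frac{\psi_{AB}(x_A, x_B)}{\sum_{x_A'} \psi_{AB}(x_A', x_B)} \cdot \frac{\psi_{BC}(x_B, x_C)}{\sum_{x_C'} \psi_{BC}(x_B, x_C')},$$

and summing over 푥퐶 (resp. 푥퐴) shows that these two factors are exactly 푝(푥퐴 | 푥퐵) and 푝(푥퐶 | 푥퐵), so 푝(푥퐴, 푥퐶 | 푥퐵) = 푝(푥퐴 | 푥퐵) 푝(푥퐶 | 푥퐵), i.e., 퐴 ⊥ 퐶 | 퐵.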

Independence Assertions

• In particular, each variable is independent of all of its non-neighbors given its neighbors

– All paths leaving a single variable must pass through some neighbor

• If the joint probability distribution, 푝, factorizes with respect to the graph 퐺, then 퐺 is an I-map for 푝

• If 퐺 is an I-map of a positive distribution 푝, then 푝 factorizes with respect to the graph 퐺

– Hammersley-Clifford Theorem


BNs vs. MRFs

Property | Bayesian Networks | Markov Random Fields
Factorization | Conditional Distributions | Potential Functions
Distribution | Product of Conditional Distributions | Normalized Product of Potentials
Cycles | Not Allowed | Allowed
Partition Function | 1 | Potentially NP-hard to Compute
Independence Test | d-Separation | Graph Separation

Moralization

• Every Bayesian network can be converted into an MRF with some possible loss of independence information

– Remove the direction of all arrows in the network

– If 퐴 and 퐵 are parents of 퐶 in the Bayesian network, we add an edge between 퐴 and 퐵 in the MRF

• This procedure is called "moralization" because it "marries" the parents of every node



[Figure: a Bayesian network over 퐴, 퐵, 퐶, 퐷 and its moralized MRF]
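A minimal Python sketch (an assumed helper, not from the slides) of the moralization procedure just described: drop edge directions and connect every pair of parents that share a child.

```python
# Moralize a DAG given as a mapping from each node to the set of its parents.
from itertools import combinations

def moralize(parents):
    """Return the undirected edge set of the moral graph (edges as frozensets)."""
    edges = set()
    for child, pars in parents.items():
        for p in pars:
            edges.add(frozenset((p, child)))      # drop the arrow's direction
        for p, q in combinations(sorted(pars), 2):
            edges.add(frozenset((p, q)))          # "marry" co-parents of the child
    return edges

# Hypothetical example: the v-structure A -> C <- B moralizes to the triangle over A, B, C.
print(moralize({'A': set(), 'B': set(), 'C': {'A', 'B'}}))
```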

Moralization

[Figure: a Bayesian network over 퐴, 퐵, 퐶 and its moralized MRF]

• What independence information is lost?

Factorizations

• Many factorizations over the same graph may represent the same joint distribution

– Some are better than others (e.g., they more compactly represent the distribution)

– Simply looking at the graph is not enough to understand which specific factorization is being assumed

[Figure: an undirected graph over the nodes 퐴, 퐵, 퐶, 퐷]

Factor Graphs

• Factor graphs are used to explicitly represent a given factorization over a given graph

– Not a different model, but rather a different way to visualize an MRF

– Undirected graph with two types of nodes: variable nodes (circles) and factor nodes (squares)

– Factor nodes are connected to the variable nodes on which they depend

Factor Graphs

$$p(x_A, x_B, x_C) = \frac{1}{Z}\, \psi_{AB}(x_A, x_B)\, \psi_{BC}(x_B, x_C)\, \psi_{AC}(x_A, x_C)$$

[Figure: factor graph with variable nodes 퐴, 퐵, 퐶 (circles) and factor nodes 휓퐴퐵, 휓퐵퐶, 휓퐴퐶 (squares)]
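A small sketch (an assumed representation, not from the slides): one direct way to store this factor graph is a mapping from each factor node to the variable nodes it touches, from which the bipartite adjacency follows.

```python
# Factor graph for the factorization above: each factor node and the variable
# nodes it is connected to.
factors = {
    'psi_AB': ('A', 'B'),
    'psi_BC': ('B', 'C'),
    'psi_AC': ('A', 'C'),
}

# Neighbors of each variable node are the factor nodes that mention it.
var_neighbors = {}
for factor, scope in factors.items():
    for v in scope:
        var_neighbors.setdefault(v, set()).add(factor)

print(var_neighbors)   # e.g. 'A' is connected to psi_AB and psi_AC
```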

Conditional Random Fields (CRFs)

• Undirected graphical models that represent conditional probability distributions 푝(푌 | 푋)

– Potentials can depend on both 푋 and 푌; the 푋 variables are always observed, so only the distribution of 푌 given 푋 is modeled

$$p(Y \mid X) = \frac{1}{Z(x)} \prod_{c \in C} \psi_c(x_c, y_c)$$

$$Z(x) = \sum_{y'} \prod_{c \in C} \psi_c(x_c, y_c')$$

Log-Linear Models

• CRFs often assume that the potentials are log-linear functions

$$\psi_c(x_c, y_c) = \exp\big(w \cdot f_c(x_c, y_c)\big)$$

where 푓푐 is referred to as a feature vector and 푤 is a vector of feature weights (a small sketch of such a potential follows below)

• The feature weights are typically learned from data

• CRFs don’t require us to model the full joint distribution (which may not be possible anyhow)
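A small Python sketch (an assumed example, not from the slides) of a log-linear CRF on a tiny chain of labels: each clique potential is exp(푤 ⋅ 푓푐), and the conditional normalizer 푍(푥) sums over label configurations only. The features and weights below are made up.

```python
# Log-linear CRF potentials psi_c = exp(w . f_c(x_c, y_c)) on a tiny model with
# two binary labels y1, y2, two observed values x1, x2, and three cliques:
# one per (x_i, y_i) pair plus one on (y1, y2).
import itertools
import math

w = [1.5, -0.5, 0.8]   # weights for the three features below (would be learned)

def f_node(x_i, y_i):
    # Feature 1 fires in proportion to x_i when the label is 1; feature 2 is a bias on label 1.
    return [x_i if y_i == 1 else 0.0, 1.0 if y_i == 1 else 0.0, 0.0]

def f_pair(y_i, y_j):
    # Feature 3 fires when neighboring labels agree.
    return [0.0, 0.0, 1.0 if y_i == y_j else 0.0]

def psi(features):
    return math.exp(sum(wi * fi for wi, fi in zip(w, features)))

def unnormalized(x, y):
    value = psi(f_pair(y[0], y[1]))
    for x_i, y_i in zip(x, y):
        value *= psi(f_node(x_i, y_i))
    return value

x = (0.2, 1.7)                                   # observed values
Z_x = sum(unnormalized(x, y) for y in itertools.product([0, 1], repeat=2))
for y in itertools.product([0, 1], repeat=2):
    print(y, unnormalized(x, y) / Z_x)           # p(y | x)
```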

Conditional Random Fields (CRFs)

• Binary image segmentation

– Label the pixels of an image as belonging to the foreground or background

– +/- correspond to foreground/background

– Interaction between neighboring pixels in the image depends on how similar the pixels are

• Similar pixels should prefer having the same spin (i.e., being in the same part of the image)
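One way this preference is often encoded (a sketch; the Gaussian similarity form and the parameter names are assumptions, not from the slides): the edge potential rewards neighboring pixels for taking the same label, more strongly when their colors are similar.

```python
# Similarity-weighted edge potential for binary segmentation: agreement between
# neighboring labels is rewarded in proportion to how similar the pixels look.
import math

def edge_potential(color_i, color_j, label_i, label_j, beta=2.0):
    """Colors are scalars in [0, 1]; labels are +1 (foreground) or -1 (background)."""
    similarity = math.exp(-beta * (color_i - color_j) ** 2)
    return math.exp(similarity if label_i == label_j else -similarity)

# Similar pixels strongly prefer matching labels...
print(edge_potential(0.50, 0.52, +1, +1), edge_potential(0.50, 0.52, +1, -1))
# ...while dissimilar pixels express only a weak preference.
print(edge_potential(0.10, 0.90, +1, +1), edge_potential(0.10, 0.90, +1, -1))
```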

Conditional Random Fields (CRFs)

• Binary image segmentation

– This can be modeled as a CRF where the image information (e.g., pixel colors) is observed, but the segmentation is unobserved

– Because the model is conditional, we don’t need to describe the joint probability distribution of (natural) images and their foreground/background segmentations

– CRFs will be particularly important when we want to learn graphical models

Low Density Parity Check Codes

• Want to send a message across a noisy channel in which bits can be flipped with some probability – use error correcting codes

[Figure: factor graph with parity-check factors 휓퐴, 휓퐵, 휓퐶 connected to the codeword bits 푦1, …, 푦6, and channel factors 휙1, …, 휙6 connecting each 푦푖 to the received bit 푥푖]

• 휓퐴, 휓퐵, 휓퐶 are all parity check constraints: they equal one if their input contains an even number of ones and zero otherwise

• 휙푖(푥푖, 푦푖) = 푝(푦푖 | 푥푖), the probability that the 푖th bit was flipped during transmission

Low Density Parity Check Codes


• The parity check constraints enforce that the 푦’s can only be one of a few possible codewords: 000000, 001011, 010101, 011110, 100110, 101101, 110011, 111000

• Decoding the message that was sent is equivalent to computing the most likely codeword under the joint probability distribution
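As a quick check (an illustrative sketch; the exact variable scopes of 휓퐴, 휓퐵, 휓퐶 are inferred from the codeword list above, so treat them as an assumption), enumerating all assignments that satisfy three parity checks recovers exactly these eight codewords.

```python
# Enumerate the assignments of (y1, ..., y6) allowed by three parity-check
# factors; each factor is 1 when its scope contains an even number of ones.
import itertools

checks = [(0, 1, 3),   # a check over y1, y2, y4 (scopes inferred, see above)
          (0, 2, 4),   # a check over y1, y3, y5
          (1, 2, 5)]   # a check over y2, y3, y6

def parity_ok(y, scope):
    return sum(y[i] for i in scope) % 2 == 0

codewords = [''.join(map(str, y))
             for y in itertools.product([0, 1], repeat=6)
             if all(parity_ok(y, scope) for scope in checks)]
print(codewords)
# ['000000', '001011', '010101', '011110', '100110', '101101', '110011', '111000']
```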

Low Density Parity Check Codes


• The most likely codeword is given by MAP inference:

$$\arg\max_y \; p(y \mid x)$$

• Do we need to compute the partition function for MAP inference?
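One way to see the answer (a short aside, not spelled out on this slide): the partition function 푍(푥) does not depend on 푦, so it rescales every candidate codeword equally and cannot change the maximizer,

$$\arg\max_y \; p(y \mid x) = \arg\max_y \; \frac{1}{Z(x)} \prod_{c \in C} \psi_c(x_c, y_c) = \arg\max_y \; \prod_{c \in C} \psi_c(x_c, y_c),$$

so MAP inference can be carried out on the unnormalized product of potentials alone.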
