Chapter 10: Random Fields

LEARNING AND INFERENCE IN GRAPHICAL MODELS Chapter 10: Random Fields Dr. Martin Lauer University of Freiburg Machine Learning Lab Karlsruhe Institute of Technology Institute of Measurement and Control Systems Learning and Inference in Graphical Models. Chapter 10 – p. 1/38 References for this chapter ◮ Christopher M. Bishop, Pattern Recognition and Machine Learning, ch. 8, Springer, 2006 ◮ Michael Ying Yang and Wolfgang Forstner,¨ A hierarchical conditional random field model for labeling and classifying images of man-made scenes. In: IEEE International Conference on Computer Vision Workshops (ICCV Workshops), pp. 196-203, 2011 Learning and Inference in Graphical Models. Chapter 10 – p. 2/38 Motivation Bayesian networks model clear dependencies, often causal dependencies. Bayesian networks are acyclic. How can we model mutual and cyclic dependencies? Example (economy): ◮ demand and supply determine the price ◮ high price fosters supply ◮ low price fosters demand Learning and Inference in Graphical Models. Chapter 10 – p. 3/38 Motivation Example (physics): modeling ferromagnetism in statistical mechanics ◮ a grid of magnetic dipoles in a volume ◮ every dipole causes a force on its neighbors ◮ every dipole is forced by its neighbors The dipoles might change their orientation. Every configuration of the magnetic dipole field can be characterized by its energy. The probability of a certain configuration depends on its energy: high energy configurations are less probable, low energy configurations are more probable. → Ising-model (Ernst Ising, 1924) Learning and Inference in Graphical Models. Chapter 10 – p. 4/38 Markov random fields ◮ a Markov random field (MRF) is a undirected, connected graph A ◮ each node represents a random variable • open circles indicate non-observed random variables C • filled circles indicate observed random variables B • dots indicate given constants ◮ links indicate an explicitly modeled stochastic D dependence Learning and Inference in Graphical Models. Chapter 10 – p. 5/38 Markov random fields Joint probability distribution of a MRF is defined over cliques in the graph X1 Definition: A clique of size k is a subset C of k nodes of the MRF so that for each pair X,Y ∈ C with X =6 Y holds that X and Y are connected by an edge. X4 Example: X2 The MRF on the right has ◮ one clique of size 3: X3 {X2,X3,X4} ◮ four cliques of size 2: {X1,X2}, {X2,X3}, {X2,X4}, {X3,X4} ◮ four cliques of size 1: {X1}, {X2}, {X3}, {X4} Learning and Inference in Graphical Models. Chapter 10 – p. 6/38 Markov random fields For every clique C in the MRF we specify a potential function ψC : C → R>0 ◮ large values of ψC indicate that a certain configuration of the random variables in the clique is more probable ◮ small values of ψC indicate that a certain configuration of the random variables in the clique is less probable The joint distribution of the MRF is defined as the product of the potential functions for all cliques 1 p(X ,...,X )= ψ (C) 1 n Z C C Cliques ∈Y with Z = C∈Cliques ψC (C)d(X1,...,Xn) the partition function Remark: calculatingR Q Z might be very hard in practice Learning and Inference in Graphical Models. Chapter 10 – p. 7/38 Markov random fields Potential functions are usually given in terms of Gibbs/Boltzmann distributions −EC (C) ψC (C)= e with EC : C → R an “energy function” ◮ large energy means low probability ◮ small energy means large probability Hence, the overall probability distribution of an MRF is 1 E C p(X ,...,X )= e− PC∈Cliques C ( ) 1 n Z Learning and Inference in Graphical Models. Chapter 10 – p. 8/38 Markov random fields Example: let us model the food preferences of a group of four persons: Antonia, Ben, Charles, and Damaris. They might choose between pasta, fish, and meat ◮ Ben likes meat and pasta but hates fish ◮ Antonia, Ben, and Charles prefer to choose the same ◮ Charles is vegetarian ◮ Damaris prefers to choose something else than all the other → create an MRF on the blackboard that models the food preferences of the four persons and assign potential functions to the cliques. Learning and Inference in Graphical Models. Chapter 10 – p. 9/38 Markov random fields One way to model the food preference task A C Random variables A, B, C, D model Antonias, Bens, Charles, and Damaris’ choice. Discrete variables with values 1=pasta, 2=fish, 3=meat Energy functions which are relevant B D (all others are constant): 0 if b ∈{1, 3} 0 if a =6 d E{B}(b)= E{A,D}(a, d)= (100 if b =2 (10 if a = d 0 if a = b = c 0 if b =6 d E{A,B,C}(a,b,c)= E{B,D}(b,d)= (30 otherwise (10 if b = d 0 if c =1 0 if c =6 d E{C,D}(c,d)= E{C}(c)= 50 if c =2 (10 if c = d 200 if c =3 Learning and Inference in Graphical Models. Chapter 10 – p. 10/38 Factor graphs Like for Bayesian networks we can define factor graphs over MRFs. A factor graph is a bipartite graph with two kind of nodes: ◮ variable nodes that model random variables ◮ factor nodes that model a probabilistic relationship between variable nodes. Each factor node is assigned with a potential function Variable nodes and factor nodes are connected by undirected links. For each MRF we can create a factor graph as follows: ◮ the set of variable nodes is taken from the nodes of the MRF ◮ for each non-constant potential function ψC • we create a new factor node f • we connect f with all variable nodes in clique C • we assign the potential function ψC to f Hence, the joint probability of the MRF is equal to the Gibbs distribution over the sum of all factor potentials Learning and Inference in Graphical Models. Chapter 10 – p. 11/38 Factor graphs The factor graph of the food preference task looks likes E{C} A C E{A,B,V } E{C,D} E{A,D} B D E{B,D} E{B} Learning and Inference in Graphical Models. Chapter 10 – p. 12/38 Stochastic inference in Markov random fields How can we calculate p(U = u|O = o) and arg maxu p(U = u|O = u)? ◮ if the factor graph related to a MRF is a tree, we can use the sum-product and max-sum algorithm introduced in chapter 4. ◮ in the general case there are no efficient exact algorithms ◮ we can build variational approximations (chapter 6) for approximate inference ◮ we can use MCMC samplers (chapter 7) for numerical inference ◮ we can use local optimization (chapter 8) Example: in the food preference task, ◮ what is the overall best choice of food? ◮ what is the best choice of food if Antonia eats fish? Learning and Inference in Graphical Models. Chapter 10 – p. 13/38 Special types of MRFs MRFs are very general and can be used for many purposes. Some models have been shown to be very useful. In this lecture, we introduce ◮ the Potts model. Useful for image segmentation and noise removal ◮ Conditional random fields. Usefule for image segmentation ◮ the Boltzmann machine. Useful for unsupervised and supervised learning ◮ Markov logic networks. Useful for logic inference on noisy data (chapter 11) Learning and Inference in Graphical Models. Chapter 10 – p. 14/38 Potts Model Learning and Inference in Graphical Models. Chapter 10 – p. 15/38 Potts model The Potts model can be used for segmentation and noise removal in images and other sensor data. We discuss it in the image segmentation case Assume, ◮ an image is composed out of several areas (e.g. foreground/background, object A/object B/background) ◮ each area has a characteristic color or gray value ◮ pixels in the image are corrupted by noise ◮ neighboring pixels are very likely to belong to the same area How can we model these assumptions with a MRF? Learning and Inference in Graphical Models. Chapter 10 – p. 16/38 Potts model ◮ every pixel belongs to a certain area. We Yi−1,j−1 Yi−1,j Yi−1,j+1 model it with a discrete random variable Xi−1,j−1 Xi−1,j Xi−1,j+1 Xi,j. The true class label is unobserved. ◮ the color/gray value of each pixel is Yi,j−1 Yi,j Yi,j+1 described by a random variable Y . i,j Xi,j−1 Xi,j Xi,j+1 The color value is observed. Yi+1,j−1 Yi+1,j Yi+1,j+1 ◮ Xi,j and Yi,j are stochastically dependent. This dependency can be Xi+1,j−1 Xi+1,j Xi+1,j+1 described by an energy function ◮ the class labels of neighboring pixels are stochastically dependent. This can be described by an energy functions. ◮ we can provide priors for the class label as energy function on individual Xi,j Learning and Inference in Graphical Models. Chapter 10 – p. 17/38 Potts model energy functions on cliques: ◮ similarity of neighboring nodes 0 if xi,j = xi+1,j E{Xi,j ,Xi+1,j }(xi,j,xi+1,j)= (1 if xi,j =6 xi+1,j 0 if xi,j = xi,j+1 E{Xi,j ,Xi,j+1}(xi,j,xi,j+1)= (1 if xi,j =6 xi,j+1 ◮ dependecy between observed color/gray value and class label. Assume each class k can be characterized by a typical color/gray value ck E{Xi,j ,Yi,j }(xi,j,yi,j)= ||Yi,j − cXi,j || ◮ overall preference for certain classes. Assume a prior distribution p over the classes E{Xi,j }(xi,j)= − log p(Xi,j) Learning and Inference in Graphical Models.

Chapter 10: Random Fields

Image Segmentation Combining Markov Random Fields and Dirichlet Processes

Lecturenotes 4 MCMC I – Contents

Mathematisches Forschungsinstitut Oberwolfach Scaling Limits in Models of Statistical Mechanics

Random and Out-Of-Equilibrium Potts Models Christophe Chatelain

Theory of Continuum Percolation I. General Formalism

CFT Interpretation of Merging Multiple SLE Traces

A Novel Approach for Markov Random Field with Intractable Normalising Constant on Large Lattices

A Review on Statistical Inference Methods for Discrete Markov Random Fields Julien Stoehr, Richard Everitt, Matthew T

Critical Gaussian Multiplicative Chaos: Convergence of The

The Potts Model and Tutte Polynomial, and Associated Connections Between Statistical Mechanics and Graph Theory

Markov Chain Monte Carlo Sampling

Sharp Phase Transition for the Random-Cluster and Potts Models Via Decision Trees