
LEARNING AND INFERENCE IN GRAPHICAL MODELS
Chapter 10: Random Fields
Dr. Martin Lauer
University of Freiburg, Machine Learning Lab
Karlsruhe Institute of Technology, Institute of Measurement and Control Systems

References for this chapter
◮ Christopher M. Bishop, Pattern Recognition and Machine Learning, ch. 8, Springer, 2006
◮ Michael Ying Yang and Wolfgang Förstner, A hierarchical conditional random field model for labeling and classifying images of man-made scenes. In: IEEE International Conference on Computer Vision Workshops (ICCV Workshops), pp. 196-203, 2011

Motivation
Bayesian networks model clear, often causal, dependencies, and they are acyclic. How can we model mutual and cyclic dependencies?
Example (economy):
◮ demand and supply determine the price
◮ a high price fosters supply
◮ a low price fosters demand

Motivation
Example (physics): modeling ferromagnetism in statistical mechanics
◮ a grid of magnetic dipoles in a volume
◮ every dipole exerts a force on its neighbors
◮ every dipole is acted on by its neighbors
The dipoles might change their orientation. Every configuration of the magnetic dipole field can be characterized by its energy, and the probability of a configuration depends on its energy: high-energy configurations are less probable, low-energy configurations are more probable.
→ Ising model (Ernst Ising, 1924)

Markov random fields
◮ a Markov random field (MRF) is an undirected, connected graph
◮ each node represents a random variable
  • open circles indicate non-observed random variables
  • filled circles indicate observed random variables
  • dots indicate given constants
◮ links indicate an explicitly modeled stochastic dependence
(figure: an example MRF with four nodes A, B, C, D)

Markov random fields
The joint probability distribution of an MRF is defined over the cliques of the graph.
Definition: A clique of size k is a subset C of k nodes of the MRF such that every pair X, Y ∈ C with X ≠ Y is connected by an edge.
Example: an MRF with nodes X1, X2, X3, X4 and edges {X1,X2}, {X2,X3}, {X2,X4}, {X3,X4} has
◮ one clique of size 3: {X2, X3, X4}
◮ four cliques of size 2: {X1,X2}, {X2,X3}, {X2,X4}, {X3,X4}
◮ four cliques of size 1: {X1}, {X2}, {X3}, {X4}

Markov random fields
For every clique C in the MRF we specify a potential function ψ_C : C → R_{>0}
◮ large values of ψ_C indicate that a certain configuration of the random variables in the clique is more probable
◮ small values of ψ_C indicate that a certain configuration of the random variables in the clique is less probable
The joint distribution of the MRF is defined as the product of the potential functions over all cliques:
  p(X_1, \dots, X_n) = \frac{1}{Z} \prod_{C \in \mathrm{Cliques}} \psi_C(C)
with the partition function
  Z = \int \prod_{C \in \mathrm{Cliques}} \psi_C(C) \, d(X_1, \dots, X_n)
Remark: calculating Z might be very hard in practice.
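To make the definition concrete, here is a minimal sketch (illustration only, not part of the original slides) that evaluates the product of clique potentials and the partition function for a tiny discrete MRF by brute force. The chain structure X1 - X2 - X3, the binary domains, and the potential values are assumed for the example; for discrete variables the integral defining Z becomes a sum over all configurations.

```python
import itertools

values = (0, 1)                       # each variable is binary
n_vars = 3

def psi_pair(a, b):
    # pairwise clique potential: neighboring variables prefer equal values
    return 2.0 if a == b else 1.0

cliques = [((0, 1), psi_pair),        # clique {X1, X2}
           ((1, 2), psi_pair)]        # clique {X2, X3}

def unnormalized(x):
    # product of all clique potentials for one configuration x
    p = 1.0
    for idx, psi in cliques:
        p *= psi(*(x[i] for i in idx))
    return p

# for discrete variables the partition function is a sum over all configurations
Z = sum(unnormalized(x) for x in itertools.product(values, repeat=n_vars))

def p(x):
    return unnormalized(x) / Z

print(Z)             # 18.0 for these illustration potentials
print(p((0, 0, 0)))  # most probable (together with (1, 1, 1)): all neighbors agree
print(p((0, 1, 0)))  # least probable: both neighboring pairs disagree
```

Brute-force enumeration like this is only feasible for a handful of variables; it is shown here purely to make the role of Z tangible.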
Markov random fields
Potential functions are usually given in terms of Gibbs/Boltzmann distributions:
  \psi_C(C) = e^{-E_C(C)}
with E_C : C → R an "energy function"
◮ large energy means low probability
◮ small energy means high probability
Hence, the overall probability distribution of an MRF is
  p(X_1, \dots, X_n) = \frac{1}{Z} e^{-\sum_{C \in \mathrm{Cliques}} E_C(C)}

Markov random fields
Example: let us model the food preferences of a group of four persons: Antonia, Ben, Charles, and Damaris. They may choose between pasta, fish, and meat.
◮ Ben likes meat and pasta but hates fish
◮ Antonia, Ben, and Charles prefer to choose the same dish
◮ Charles is a vegetarian
◮ Damaris prefers to choose something different from all the others
→ create an MRF on the blackboard that models the food preferences of the four persons and assign potential functions to the cliques.

Markov random fields
One way to model the food preference task:
Random variables A, B, C, D model Antonia's, Ben's, Charles', and Damaris' choices. They are discrete variables with the values 1=pasta, 2=fish, 3=meat.
The relevant energy functions (all others are constant) are:
  E_{\{B\}}(b) = \begin{cases} 0 & \text{if } b \in \{1,3\} \\ 100 & \text{if } b = 2 \end{cases}
  E_{\{A,B,C\}}(a,b,c) = \begin{cases} 0 & \text{if } a = b = c \\ 30 & \text{otherwise} \end{cases}
  E_{\{C\}}(c) = \begin{cases} 0 & \text{if } c = 1 \\ 50 & \text{if } c = 2 \\ 200 & \text{if } c = 3 \end{cases}
  E_{\{A,D\}}(a,d) = \begin{cases} 0 & \text{if } a \neq d \\ 10 & \text{if } a = d \end{cases}
  E_{\{B,D\}}(b,d) = \begin{cases} 0 & \text{if } b \neq d \\ 10 & \text{if } b = d \end{cases}
  E_{\{C,D\}}(c,d) = \begin{cases} 0 & \text{if } c \neq d \\ 10 & \text{if } c = d \end{cases}
(figure: MRF over the nodes A, B, C, D)

Factor graphs
Like for Bayesian networks, we can define factor graphs over MRFs. A factor graph is a bipartite graph with two kinds of nodes:
◮ variable nodes that model random variables
◮ factor nodes that model a probabilistic relationship between variable nodes; each factor node is assigned a potential function
Variable nodes and factor nodes are connected by undirected links.
For each MRF we can create a factor graph as follows:
◮ the set of variable nodes is taken from the nodes of the MRF
◮ for each non-constant potential function ψ_C
  • we create a new factor node f
  • we connect f with all variable nodes in clique C
  • we assign the potential function ψ_C to f
Hence, the joint probability of the MRF is equal to the Gibbs distribution over the sum of all factor potentials.

Factor graphs
The factor graph of the food preference task looks like:
(figure: variable nodes A, B, C, D connected to the factor nodes E_{B}, E_{A,B,C}, E_{C}, E_{A,D}, E_{B,D}, E_{C,D})

Stochastic inference in Markov random fields
How can we calculate p(U = u | O = o) and arg max_u p(U = u | O = o)?
◮ if the factor graph related to an MRF is a tree, we can use the sum-product and max-sum algorithms introduced in chapter 4
◮ in the general case there are no efficient exact algorithms
◮ we can build variational approximations (chapter 6) for approximate inference
◮ we can use MCMC samplers (chapter 7) for numerical inference
◮ we can use local optimization (chapter 8)
Example: in the food preference task,
◮ what is the overall best choice of food?
◮ what is the best choice of food if Antonia eats fish?
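Since the food preference model has only 3^4 = 81 assignments, the two example questions can be answered exactly by brute force. The following sketch (illustration code, not from the lecture) implements the energy functions from the slide "One way to model the food preference task" and minimizes the total energy, which is equivalent to maximizing the Gibbs probability.

```python
import itertools

PASTA, FISH, MEAT = 1, 2, 3
choices = (PASTA, FISH, MEAT)

def total_energy(a, b, c, d):
    e = 0
    e += 0 if b in (PASTA, MEAT) else 100       # E_{B}: Ben hates fish
    e += 0 if a == b == c else 30               # E_{A,B,C}: prefer the same dish
    e += {PASTA: 0, FISH: 50, MEAT: 200}[c]     # E_{C}: Charles is a vegetarian
    e += 10 if a == d else 0                    # E_{A,D}: Damaris differs from Antonia
    e += 10 if b == d else 0                    # E_{B,D}: Damaris differs from Ben
    e += 10 if c == d else 0                    # E_{C,D}: Damaris differs from Charles
    return e

# overall best choice: arg min of the total energy (= arg max of the probability)
best = min(itertools.product(choices, repeat=4), key=lambda x: total_energy(*x))
print("best overall (a, b, c, d):", best)

# best choice given that Antonia eats fish: condition on a = FISH
best_fish = min(((FISH, b, c, d) for b, c, d in itertools.product(choices, repeat=3)),
                key=lambda x: total_energy(*x))
print("best if Antonia eats fish:", best_fish)
```

Exhaustive search is only viable for such tiny models; for larger graphs one of the approximate methods listed above has to be used instead.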
Special types of MRFs
MRFs are very general and can be used for many purposes. Some models have proven to be particularly useful. In this lecture, we introduce
◮ the Potts model: useful for image segmentation and noise removal
◮ conditional random fields: useful for image segmentation
◮ the Boltzmann machine: useful for unsupervised and supervised learning
◮ Markov logic networks: useful for logical inference on noisy data (chapter 11)

Potts Model

Potts model
The Potts model can be used for segmentation and noise removal in images and other sensor data. We discuss it for the image segmentation case.
Assume that
◮ an image is composed of several areas (e.g. foreground/background, or object A/object B/background)
◮ each area has a characteristic color or gray value
◮ the pixels in the image are corrupted by noise
◮ neighboring pixels are very likely to belong to the same area
How can we model these assumptions with an MRF?

Potts model
◮ every pixel belongs to a certain area. We model it with a discrete random variable X_{i,j}. The true class label is unobserved.
◮ the color/gray value of each pixel is described by a random variable Y_{i,j}. The color value is observed.
◮ X_{i,j} and Y_{i,j} are stochastically dependent. This dependency can be described by an energy function.
◮ the class labels of neighboring pixels are stochastically dependent. This can also be described by an energy function.
◮ we can provide priors for the class labels as energy functions on the individual X_{i,j}
(figure: grid of unobserved label nodes X_{i,j} with one observed color node Y_{i,j} attached to each, and links between horizontally and vertically neighboring label nodes)

Potts model
Energy functions on the cliques:
◮ similarity of neighboring nodes:
  E_{\{X_{i,j}, X_{i+1,j}\}}(x_{i,j}, x_{i+1,j}) = \begin{cases} 0 & \text{if } x_{i,j} = x_{i+1,j} \\ 1 & \text{if } x_{i,j} \neq x_{i+1,j} \end{cases}
  E_{\{X_{i,j}, X_{i,j+1}\}}(x_{i,j}, x_{i,j+1}) = \begin{cases} 0 & \text{if } x_{i,j} = x_{i,j+1} \\ 1 & \text{if } x_{i,j} \neq x_{i,j+1} \end{cases}
◮ dependency between the observed color/gray value and the class label. Assume each class k can be characterized by a typical color/gray value c_k:
  E_{\{X_{i,j}, Y_{i,j}\}}(x_{i,j}, y_{i,j}) = \| y_{i,j} - c_{x_{i,j}} \|
◮ overall preference for certain classes. Assume a prior distribution p over the classes:
  E_{\{X_{i,j}\}}(x_{i,j}) = -\log p(x_{i,j})
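As a rough illustration of how these energies are used, the sketch below evaluates the Potts energy terms on a tiny synthetic grayscale image and runs a few sweeps of iterated conditional modes (ICM), a simple greedy local optimization in the spirit of chapter 8. The class gray values c_k, the smoothness weight beta, the synthetic image, and the choice of ICM are assumptions made for this example; they are not prescribed by the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

c = np.array([0.2, 0.8])            # typical gray value c_k of class k (assumed)
prior = np.array([0.5, 0.5])        # prior p over the classes (assumed uniform)
beta = 2.0                          # weight of the neighbor-similarity term (assumed)

# synthetic noisy observation: left half dark, right half bright
y = np.concatenate([np.full((8, 4), 0.2), np.full((8, 4), 0.8)], axis=1)
y += 0.15 * rng.standard_normal(y.shape)

def local_energy(x, i, j, label):
    """All energy terms that involve pixel (i, j) when it takes the given label."""
    e = abs(y[i, j] - c[label])                 # E_{X,Y}: data term
    e -= np.log(prior[label])                   # E_{X}: class prior
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < x.shape[0] and 0 <= nj < x.shape[1]:
            e += beta * (label != x[ni, nj])    # E_{X,X}: neighbor similarity
    return e

# ICM: start from the data-only labeling, then greedily relabel one pixel at a time
x = np.argmin(np.abs(y[..., None] - c), axis=-1)
for _ in range(5):                              # a few sweeps suffice for this toy image
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] = min(range(len(c)), key=lambda k: local_energy(x, i, j, k))

print(x)                                        # denoised label image
```

ICM only finds a local optimum of the energy; MCMC sampling (chapter 7) or variational methods (chapter 6) are the alternatives mentioned earlier for this kind of model.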