Markov Random Fields and Images

Volume 11 (4) 1998, pp. 413 { 437 Markov Random Fields and Images Patrick Perez Irisa/Inria, Campus de Beaulieu, 35042 Rennes Cedex, France e-mail: [email protected] At the intersection of statistical physics and probability theory,Markov random elds and Gibbs distributions have emerged in the early eighties as p owerful to ols for mo deling images and coping with high-dimensional inverse problems from low- level vision. Since then, they have b een used in many studies from the image pro cessing and computer vision community. Abrief and simple intro duction to the basics of the domain is prop osed. 1. Introduction and general framework With a seminal pap er by Geman and Geman in 1984 [18], powerful to ols known for long by physicists [2] and statisticians [3] were brought in a com- prehensive and stimulating way to the knowledge of the image pro cessing and computer vision community. Since then, their theoretical richness, their prac- tical versatility, and a numb er of fruitful connections with other domains, have resulted in a profusion of studies. These studies deal either with the mo d- eling of images (for synthesis, recognition or compression purp oses) or with the resolution of various high-dimensional inverse problems from early vision (e.g., restoration, deblurring, classi cation, segmentation, data fusion, surface reconstruction, optical ow estimation, stereo matching, etc. See collections of examples in [11,30,40]). The implicit assumption b ehind probabilistic approaches to image analysis is that, for a given problem, there exists a probability distribution that can capture to some extent the variability and the interactions of the di erent sets of relevant image attributes. Consequently, one considers the variables of the n problem as random variables forming a set (or random vector) X = (X ) i i=1 1 with joint probability distribution P . X 1 P is actually a probability mass in the case of discrete variables, and a probability density X function when the X 's are continuously valued. In the latter case, all summations over i states or con gurations should b e replaced byintegrals. 413 The rst critical step toward probabilistic mo deling thus obviously relies on the choice of the multivariate distribution P . Since there is so far no really X generic theory for selecting a mo del, a tailor-made parameterized function P X is generally chosen among standard ones, based on intuition of the desirable 2 prop erties . The basic characteristic of chosen distributions is their decomp osition as a pro duct of factors dep ending on just a few variables (one or two in most cases). Also, a given distribution involves only a few typ es of factors. One has simply to sp ecify these local \interaction" factors (which might be complex, and might involvevariables of di erent nature) to de ne, up to some multiplicative constant, the joint distribution P (x ;::: ;x ): one ends X 1 n up with a global mo del. With such a setup, eachvariable only directly dep ends on a few other \neigh- b oring" variables. From a more global p oint of view, all variables are mutually dep endent, but only through the combination of successive lo cal interactions. This key notion can b e formalized considering the graph for which i and j are neighb ors if x and x app ear within a same lo cal comp onent of the chosen i j factorization. This graph turns out to b e a p owerful to ol to account for lo cal and global structural prop erties of the mo del, and to predict changes in these prop erties through various manipulations. From a probabilistic p oint of view, this graph neatly captures Markov-typ e conditional indep endencies among the random variables attached to its vertices. After the sp eci cation of the mo del, one deals with its actual use for mo d- eling a class of problems and for solving them. At that p oint, as we shall see, one of the two following things will have to b e done: (1) drawing samples from the joint distribution, or from some conditional distribution deduced from the jointlaw when part of the variables are observed and thus xed; (2) maximiz- ing some distribution (P itself, or some conditional or marginal distribution X deduced from it). The very high dimensionality of image problems under concern usually ex- cludes any direct metho d for p erforming b oth tasks. However the lo cal decom- p osition of P fortunately allows to devise suitable deterministic or sto chastic X iterative algorithms, based on a common principle: at each step, just a few variables (often a single one) are considered, all the others b eing \frozen". Markovian prop erties then imply that the computations to be done remain lo cal, that is, they only involve neighb oring variables. This pap er is intended to give a brief (and de nitely incomplete) overview of how Markovian mo dels can b e de ned and manipulated in the prosp ect of mo deling and analyzing images. Starting from the formalization of Markov random elds (mrfs) on graphs through the sp eci cation of a Gibbs distribution (x2), the standard issues of interest are then grossly reviewed: sampling from a high-dimensional Gibbs distribution (x3); learning mo dels (at least parameters) from observed images (x4); using the Bayesian machinery to cop e with 2 Sup erscript denotes a parameter vector. Unless necessary, it will b e dropp ed for nota- tional convenience. 414 inverse problems, based on learned mo dels (x5); estimating parameters with partial observations, esp ecially in the case of inverse problems (x6). Finally two mo deling issues (namely the intro duction of so-called auxiliary variables, and the de nition of hierarchical mo dels), which are receiving a great deal of attention from the community at the moment, are evoked (x7). 2. Gibbs distribution and graphical Markovproperties Let us now make more formal acquaintance with Gibbs distributions and their Markov prop erties. Let X ;i = 1;::: ;n, be random variables taking values i in some discrete or continuous state space , and form the random vector T n X =(X ;::: ;X ) with con guration set = . All sorts of state spaces 1 n are used in practice. More common examples are: = f0;::: ;255g for 8-bit quantized luminances; = f1;::: ;Mg for semantic \lab elings" involving M classes; = R for continuously-valued variables like luminance, depth, etc.; = fu ;::: ;u g fv ;::: ;v g in matching problems involving max max max max 2 displacementvectors or stereo disparities for instance; = R in vector eld- based problems like optical ow estimation or shap e-from-shading. As said in the intro duction, P exhibits a factorized form: X Y P (x) / f (x ); (1) X c c c2C where C consists of small index subsets c, the factor f dep ends only on the c Q variable subset x = fx ;i 2 cg, and f is summable over . If, in addi- c i c c tion, the pro duct is p ositive (8x 2 ; P (x) > 0), then it can be written in X exp onential form (letting V = ln f ): c c X 1 P (x)= expf V (x )g: (2) X c c Z c Well known from physicists, this is the Gibbs (or Boltzman) distribution with P interaction potential fV ;c 2Cg, energy U = V , and partition function (of c c c P 3 parameters) Z = expfU (x)g . Con gurations of lower energies are x2 the more likely, whereas high energies corresp ond to low probabilities. The interaction structure induced by the factorized form is conveniently captured by a graph that statisticians refer to as the independencegraph: the Q indep endence graph asso ciated with the factorization f is the simple c c2C undirected graph G =[S; E ] with vertex set S = f1;::: ;ng, and edge set E de ned as: fi; j g2 E ()9c2C :fi; j gc, i.e., i and j are neighb ors if x i and x app ear simultaneously within a same factor f . The neighborhood n(i) j c 4 . As a consequence of site i is then de ned as n(i) = fj 2 S : fi; j g2 Eg 3 Formal expression of the normalizing constant Z must not veil the fact that it is unknown and b eyond reach in general, due to the intractable summation over . 4 n = fn(i);i 2 Sg is called a neighborhood system, and the neighb orho o d of some subset a S is de ned as n(a)=fj 2Sa:n(j)\a6=g. 415 of the de nition, any subset c is either a singleton or comp osed of mutually neighb oring sites: C is a set of cliques for G . When variables are attached to the pixels of an image, the most common neighb orho o d systems are the regular ones where a site away from the b order of the lattice has four or eight neighb ors. In the rst case ( rst-order neighb orho o d system, like in Figure 1.a) subsets c have at most two elements, whereas, in the second-order neighb orho o d system cliques can exhibit up to 4 sites. How- ever, other graph structures are also used: in segmentation applications where the image plane is partitioned, G might b e the planar graph asso ciated to the partition (Figure 1.b); and hierarchical image mo dels often live on (quad)-trees (Figure 1.c). (a) (b) (c) Figure 1.

Load more