IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 12, NO. 7, JULY 1990

Boundary Detection by Constrained Optimization

DONALD GEMAN, MEMBER, IEEE, STUART GEMAN, MEMBER, IEEE, CHRISTINE GRAFFIGNE, AND PING DONG, MEMBER, IEEE

Abstract—We use a statistical framework for finding boundaries and for partitioning scenes into homogeneous regions. The model is a joint probability distribution for the array of pixel gray levels and an array of "labels." In boundary finding, the labels are binary, zero or one, representing the absence or presence of boundary elements. In partitioning, the label values are generic: two labels are the same when the corresponding scene locations are considered to belong to the same region. The distribution incorporates a measure of disparity between certain spatial features of pairs of blocks of pixel gray levels, using the Kolmogorov-Smirnov nonparametric measure of difference between the distributions of these features. Large disparities encourage intervening boundaries and distinct partition labels. The number of model parameters is minimized by forbidding label configurations that are inconsistent with prior beliefs, such as those defining very small regions, or redundant or blindly ending boundary placements. Forbidden configurations are assigned probability zero. We examine the MAP (maximum a posteriori) estimator of boundary placements and partitionings. The forbidden states introduce constraints into the calculation of these configurations. Stochastic relaxation methods are extended to accommodate constrained optimization, and experiments are performed on some texture collages and some natural scenes.

Index Terms—Annealing, Bayesian inference, boundary finding, constrained optimization, Gibbs distribution, MAP estimate, Markov random field, segmentation, stochastic relaxation, texture discrimination.

Manuscript received April 18, 1988; revised September 14, 1989. Recommended for acceptance by A. K. Jain. The work of D. Geman and P. Dong was supported in part by the Office of Naval Research under Contract N00014-86-K-0027. The work of S. Geman and C. Graffigne was supported in part by the Army Research Office under Contract DAAL03-86-K-0171 to the Center for Intelligent Control Systems, by the National Science Foundation under Grant DMS-8352087, and by the General Motors Research Laboratories.
D. Geman is with the Department of Mathematics and Statistics, University of Massachusetts, Amherst, MA 01003.
S. Geman is with the Division of Applied Mathematics, Brown University, Providence, RI 02912.
C. Graffigne was with the Division of Applied Mathematics, Brown University, Providence, RI 02912. She is now with the French National Center for Scientific Research (CNRS), Equipe de Statistique Appliquée, Bâtiment 425, Université de Paris-Sud, 91405 Orsay Cedex, France.
P. Dong was with the Department of Mathematics and Statistics, University of Massachusetts, Amherst, MA 01003. He is now with Codex, Mansfield, MA 02048.
IEEE Log Number 9036112.

I. INTRODUCTION

MANY problems in image analysis, from signal restoration to object recognition, involve representations of the observed data, usually radiant energy or range measurements, in terms of unobserved attributes or label variables. These representations may be conclusive, or serve as intermediate data structures for further analysis, perhaps involving additional data, assorted "sketches," or stored models. This work concerns two such representations related to image segmentation: partition and boundary labels. These, in turn, are special cases of a general "label model" in the form of an "energy functional" involving two components, one of which expresses the interactions between the labels and the (intensity) data, and the other encodes constraints derived from general information or expectations about label patterns. The labeling we seek is, by definition, the minimum of this energy.

The partition labels do not classify. Instead they are generic and are assigned to blocks of pixels; the size of the blocks (or label resolution) is variable, and depends on the resolution of the data and the intended interpretations. The boundary labels are just "on"/"off," and are also of variable resolution, associated with an interpixel sublattice. In both cases, the interaction term incorporates a measure of disparity between certain spatial features of pairs of blocks of pixel gray levels, using the Kolmogorov-Smirnov nonparametric measure of difference between the distributions of these features. Large disparities encourage intervening boundaries and distinct partition labels. The number of model parameters is reduced by forbidding label configurations that are inconsistent with prior expectations, such as those defining very small regions, or redundant or blindly ending boundary placements. These forbidden states introduce constraints into the calculation of the optimal label configuration.

Both models are applied mainly to the problem of texture segmentation. The data is a gray-level image consisting of textured regions, such as a mosaic of microtextures from the Brodatz album [6], a patch of rug inside plastic, or radar-imaged ice floes in water. It is well known that humans perceive "textural" boundaries between regions of approximately the same average brightness because the regions themselves, although containing sharp intensity changes, are perceived as "homogeneous" based on other properties, namely those involving the spatial distribution of intensity values. The goal, then, is to find the (visually) distinct texture regions, either by assigning categorical labels to the pixels, or by constructing a boundary map. Obviously, the problem is more difficult than texture classification, in which we are presented with only one texture from a given list; segmentation may be complicated by an absence of information about the number of textures, or about the size, shape, or number of regions. There is no "model" for the individual textures, and hence no capacity for texture synthesis. Our approach is


therefore different from those for classification and seg- Apparently, some of these ideas have been around for mentation in which textures are discriminated based on a while. For example, the Kolmogorov-Smirnov statistic model parameters which are estimated from specific tex- is recommended in [51], and reference is made to still ture samples. Partitionings and boundary placements are earlier papers; more recently, see [60]. Moreover, the dis- driven by the observed spatial statistics as summarized by tributional properties of residuals (from surface-fitting) selected features. Still, the labeling is not unsupervised are advocated in [29], [48] for detecting discontinuities. because in some cases we use “training samples” to de- It is certainly our contention that the statistical warehouse termine feature thresholds for the disparity measures; see is full of useful tools for computer vision. Sections I1 and 111. Finally, our model may be interpreted in a Bayesian Since many visually distinct textures have nearly iden- framework as a “prior” joint probability distribution for tical histograms, segmentation must rely on features, or the array of pixel gray levels and the array of labels. For- transformations, which go beyond the raw gray levels, bidden configurations are assigned probability zero, and involving various spatial statistics. We experimented with the label estimate is then associated with the MAP (max- several conventional classes of features, in particular the imum a posteriori) estimate. We shall discuss this inter- well-known ones based on cooccurrence matrices [ 161, pretation in more detail later on. Suffice to say that, [3 13, but finally adopted a new class based mainly on “di- whereas our formulation of the problem and definition of rectional residuals” and involving third and higher order the “best” labeling are formally independent of the sto- distributions. However, our viewpoint is exactly that ex- chastic outlook, our optimization procedures are in fact pressed by Zobrist and Thompson [62], Triendl and Hen- strongly motivated by this viewpoint. In fact, in order to dersen [59], and others: instead of trying to find exactly deal with the more difficult textures, the deterministic the “right” features to convert texture to tone differences, (‘ ‘zero-temperature”) relaxation algorithm we use for one should find a mechanism for integrating multiple, most of our experiments must be replaced by a version of even redundant, ‘‘cues.’’ The label model provides a co- stochastic relaxation extended to accommodate con- herent method for integrating such information and, strained sampling and optimization. simultaneously, organizing the label patterns. Whereas texture discrimination may be regarded as the A. Applications detection of discontinuities in surface composition, we Texture is a dominant feature in remotely sensed im- also apply the boundary model to the problem of locating ages and regions cannot be distinguished by methods sudden changes in depth (occluding boundaries) or shape based solely on shading, such as edge detectors or clus- (surface creases, etc.). The idea is to define contours tering algorithms. Specifically, for example, one might which are faithful to the 3-D scene but avoid the “non- wish to determine the concentration of ice in synthetic physical” edges due to noise, digitization, texture, light- aperture radar images of the ocean, or analyze multispec- ing, etc. 
In particular, the boundaries should be con- tral satellite data for land use classification. Another ap- nected, unique, and reasonably smooth, unlike the typical plication is to wafer inspection: low magnification views output of a pure edge detector. Indeed, we formulate of memory arrays appear as highly structured textures, boundary detection as a single optimization problem, fus- and other geometries have a characteristic, but random, ing the detection of edges with their pruning, linking, graining. In addition, the use of boundary maps as the smoothing, and so on. Obviously, there are discontinu- input to further processing is ubiquitous in computer vi- ities, such as shadows, which are essentially impossible sion; for example, algorithms for stereopsis, optical flow, to distinguish from the occluding and shape boundaries, and simple object recognition are often based on matching at least without information from multiple sensors or a boundary segments. Other applications include the anal- rich knowledge base, in which case boundary classifica- ysis of medical images, automated navigation, and the de- tion becomes possible. tection of roads, faults, field boundaries, etc. in remotely Our model enjoys some invariance properties. In par- sensed images. (Obviously no generic algorithm provides ticular, due to the use of the Kolmogorov-Smirnov dis- “off-the-shelf” solutions to real problems; genuine ap- tance, the detected labels are unaffected by linear changes plications require substantial adaptations.) in illumination, and in some cases by all monotone data transformations. Such distortions often result from sensor B. Label Model nonlinearities, digitizers, and film properties. In regard to Let x = { xs, s E S } and y = { yij, 1 5 i, j I N ] rotation invariance, the estimated labeling will be inde- denote, respectively, the labels and the data; thus x, is the pendent of the relative orientation of the training samples label at “site” s E S and yQ is the gray-level at pixel (i, with the image data to the extent that the features are ro- j). The set S of label sites is a regular lattice, distinct tation invariant. The pixel lattice precludes true invari- from that of the pixels, and typically more sparse; the ance. However, since our collection of features is basi- coarseness depends on the label resolution o. For parti- cally isotropic, our results are approximately invariant. tioning, we associate each site s E S with a block of pix- See [38] for an approach to rotation-invariant classijica- els, “sitting below it,” if we were to stack the label lat- tion. tice on top of the pixel lattice. In the boundary model, GEMAN et al. : BOUNDARY DETECTION BY CONSTRAINED OPTIMIZATION 61 1

. pairs of nearby sites in S define boundary segments, and 11), and yif;, yiij) are the data in the two blocks associated these are associated with pairs of pixel blocks, sitting with ( s, t ) for the ith transform. Often, we simply take “across from each other,” with respect to the segments m = 1 andy(” = y. 4 (see Fig. 6). Later, we will define a neighborhood system The other component in the model is a penalty function for S such that the bonding is nearest-neighbor (relative V(x) which counts the number of “taboo patterns” in x; to a) in the boundary model, whereas in the region model states x for which V(x) > 0 are “forbidden.” For ex- there are interactions at all scales. This has important con- ample, boundary maps are penalized for dead-ends, sequences for the distribution of local minima in the “en- “clutter,” density, etc. whereas partitions are penalized ergy landscape”; see Section 11. Other energy functionals for too many transitions or regions which are “too small. ” with global interactions can be found in [22], [26], and Given the observed image y, our estimate i? = a(y) is [471. then any solution to the constrained optimization problem The “interaction” between x and y is defined in terms minimize,: v(x)= U(x, y) . (1.2) of an enerw“4 function We seek to minimize the energy of the data-label inter- action over all possible nonforbidden label states x. The rationale for constrained optimization is that our The summation extends over all “neighboring pairs” (or expectations about certain types of labels are quite precise “bonds”) ( s, t ), s, t E S; 9,,t(y) is a measure of the and rigid. For example, most “physical boundaries” are disparity between the two blocks of pixel data associated smooth, persistent, and well-localized; consequently it is with the label sites s, t E S. *,,,(x) depends only on the reasonable to impose these assumptions on image bound- labels x, and x,. In fact, we simply take *,,,(x) = 1 - aries, and corresponding restrictions on partition geome- x,x, in the boundary model and *,,,(x) = SXs=,, in the tries. Contrast this with other inference problems, for ex- partition model. In this way, in the “low energy states,” ample restoring an image degraded by blur and noise. large disparities (a > 0) will typically be coupled with Aside from contraints derived from scene-specific knowl- an active boundary (x, = x, = 1 ) or with dissimilar region edge, the only reasonable generic constraints might be labels (x, # x,) and small disparities (9 < 0) will be ‘‘piecewise continuity, ” and generally the degree of am- coupled with an inactive boundary (x, = 0 or x, = 0) or biguity favors more flexible constraints, such as those in with equal region labels (x, = x,). U, or in the energy functions used in [20] and [22]. The interaction between the labels and the data is based There are no multiplicative parameters in the model, on various disparity measures for comparing two (possi- such as the “smoothing” or “weighting” parameters in bly distant) blocks of image data. These measures derive [3], [17], [19], [20], and [22]; in effect, the energy is U from the raw data as well as from various transforma- + X V with X = 00. Thresholds must be selected for the tions. 
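In the notation just introduced, the interaction energy and the constrained estimate (1.2) can be summarized as

```latex
\[
U(x,y) \;=\; \sum_{\langle s,t\rangle} \Phi_{s,t}(y)\,\Psi_{s,t}(x),
\qquad
\Psi_{s,t}(x) \;=\;
\begin{cases}
1 - x_s x_t & \text{boundary model},\\
\delta_{x_s = x_t} & \text{partition model},
\end{cases}
\]
\[
\hat{x} \;=\; \hat{x}(y) \;\in\; \operatorname*{arg\,min}_{x \,:\, V(x)=0} \; U(x,y).
\]
```

This is a restatement of the definitions above; the neighbor pairs, the disparity Φ, and the penalty count V are specified in Sections II and III.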
We experiment with several transformations y -+ disparity measures; this may be done “in the blind,” but y ’ for texture analysis, for example transformations of the the performance of the algorithm is certainly increased if form these choices are data-driven, either from prior experi- ence with the types of textures, or by extracting training Y; = Y, - samples, as we have done here (see Section V). Fortu- nately, the algorithm is not unduly sensitive to these where Eai = 1 and { tj } are pixels nearby to pixel t in the choices within a certain range of values. Other inputs in- same row, column, or diagonal. We call these “direc- clude the label resolution, block sizes, and penalty pat- tional residuals,” regarding Eaj y,, as a “predictor” of y,. terns. The algorithm is robust against these choices pro- We have also investigated raw gray levels ( y‘ = y), other vided that modest information is available about the pixel features based on local measures of gray-level range, and resolution. The selection of penalty patterns is nearly self- isotropic versions of (1.1). In any case, disparity is mea- evident: as already noted, our expectations about the ge- sured by the Kolmogorov-Smirnov statistic (or distance), ometry of regions and boundaries are very concrete and a common tool in nonparametric statistics which has de- easily encoded in local constraints. sirable invariance properties. (In particular, using the di- rectional residuals, the disparity measure is invariant to C. Constrained Stochastic Relaxation linear distortions (yo -, ayi/ + b) of the raw data, and The search for i,the optimal labeling, is based on a using the raw data itself for comparisons, the disparity version of stochastic relaxation which incorporates hard measure is invariant to all monotone (data) transforma- constraints. The theoretical foundations are laid out in tions.) The general form of the 9 term is then [ 181, although there is enough information provided here (see Section IV) to keep this paper self-contained. (See also Grenander [28] and Linsker [41] for similar modifi- cations of stochastic relaxation.) We employ two algo- where 4 is monotone increasing, p denotes a distance rithms, both of which are approximations to precise com- based on the Kolmogorov-Smirnov statistic (see Section putational procedures, in which convergence is at least __


theoretically guaranteed, if not always realized in prac- is intended to measure the overall similarity between two tice. regions. Edges are then detected using the Robert’s cross Introduce a control parameter t corresponding to “tem- gradient applied to this distance; there is no effort to or- perature” and another control parameter X, corresponding ganize the responses. Our experiments compare favorably to a Lagrange multiplier for the constraint V = 0. Let to those in [57] and [59], both of which utilize collages of natural textures similar to ours. Local filtering is also the focus of Laws’ approach [40], although the mecha- nism for combining cues is entirely different from ours. where y (the data) is fixed, tk L 0, and Xk 7 03. One Among the statistical approaches to texture segmenta-

generates a sequence of states fk,k = 1, 2, * - - 3 by tion are those in which the image data is regarded as a Monte Carlo sampling from the Gibbs measures with en- realization of a two-dimensional stochastic process, or ergy functions U,. If tk is fixed, then under suitable con- random field. In Simchony and Chellappa [53] and in ditions on the growth of Xk, the sequence xk converges to Derin and Cole [12], the image is modeled as a two-lay- a sample from the Gibbs distribution with energy U, but ered Markov random field: the “upper level” is the re- restricted to V = 0, i.e., constrained to assign probability gion process, a simple Ising-type process, and the “bot- zero to the forbidden configurations. We refer to this as tom level” is the observed intensity process, with some

“constrained stochastic relaxation. ” On the other hand, distribution (e.g., Gaussian) conditional on the regions. if Xk 7 03 and tk L 0, both at suitably (coupled) rates Both papers employ stochastic relaxation to search for the (see Section IV), then the sequence converges to a solu- MAP and related estimators. Modestino et al. [45] use tion of (1.2); this is “constrained simulated annealing. ” random tessellations for the upper level and maximum The first algorithm we use is deterministic, with the likelihood estimation based on cooccurrence data for the temperature fixed at zero, and is essentially the ICM (“it- labeling. In each case, the random field parameters are erated conditional mode”) algorithm of Besag [2], al- assumed known in advance or are estimated from training though X = hk 7 a.The convergence is rapid and the samples. The image data in [12] is generated from the performance is sufficient for all but the most difficult im- model, and simpler than the actual texture collages in [53] ages. The second algorithm is constrained stochastic re- and [45]. Again, our results compare favorably. laxation with the temperature fixed at some “small” value Finally, Kashyap and Eom [37] employ a “long cor- (but, again, X = hk 7 03); the idea is simply that the relation” random field model and statistical hypothesis very likely states of the constrained Gibbs distribution testing to obtain a boundary map. Boundary placements should be “close” to the mode, i.e., f. This “low tem- are estimated in each small strip of the image based on perature sampling’ ’ is more computationally demanding least-squares estimates of the model parameters. There is than ICM, but invariably arrives at a better labeling; see no training data and the results are good, although, nat- Section V on experiments. urally, the boundaries are somewhat disorganized. D. Related Work E. Stochastic Formulation The subject of edge detection is very active; however, This work is an extension, incorporating hard con- as we have already mentioned, the raw output of standard straints, of a Bayesian paradigm in which prior probabil- edge detectors, for example those based on differential ity models are constructed for both the observed and operators, would generally require considerable “pro- unobserved scene attributes. The degradation process is cessing” to produce unique, smooth, persistent bound- modeled as a conditional probability distribution, and the aries. The work in “regularization theory” [44], [49], as estimates are obtained by a form of Monte Carlo sam- well as our earlier paper [20], is different in that “edge” pling. This framework has been applied by numerous re- or “line” process is employed there as a device for sus- searchers, including applications to image restoration pending continuity constraints in the context of recon- [20], [2], [9], [24], [28], image synthesis [28], computed structing an image from sparse data, restoring an image tomography [22], texture and boundary analysis [ 171, corrupted by blur and noise, and similar tasks not aimed [53], [ 191 , [2 11 , [25], scene segmentation based on opti- at boundary detection per se. cal flow [47] or shading and texture [ 111, [ 131, frame-to- In contrast, there is much related work in the area of frame matching for computing optical flow and stereo dis- texture segmentation. In Triendl [58] and Triendl and parity [36], and surface reconstruction [43], [44]. 
Thompson [59], local property values referred to as “tex- To facilitate placing our model in the general frame- ture parameters” are computed in a neighborhood of each work, let us write x = (xL,xp), where xLis the vector of pixel and edge maps are computed for several principal boundary or region labels, and xp is the vector of pixel components of these feature images. These maps are in- gray levels. These are the relevant image attributes for the tegrated across components, and then spatially, resulting problem at hand. The prior distribution ll is a probability in a final boundary map. Similarly, in Thompson [57] (see for x: 0 I n(x) I 1, C,II(x) = 1, where C, is the also Zobrist and Thompson [62]), the idea is to integrate summation over all configurations of x (all assignments of multiple cues into a single “distance function” which is gray levels and boundary placements, for example). a linear combination of elementary measures, and which Adopting the Gibbs representation in the unconstrained __


case we would represent II as exact MAP estimate and approximations derived from simulated annealing in the commentary of Greig, Por- 1 n(x) = - exp { - i=Cexp{- teous, and Seheult [27]. However, pixel-based error I X measures are too local for boundary analysis. In particu- The real-valued function U is called the energy, and evi- lar, the Bayes rule based on misclassification error rate, dently determines II. In our case we separate the prior namely the marginal (individual component) modes of energy into a pixel-label interaction term and a pure label n(x I y), is unsuitable because this estimator lacks the fine contribution. The former, U (xL, xp), promotes place- structure we expect of boundary maps; placement deci- ments of boundaries, or assignments of distinct labels, be- sions cannot be based on the data alone-pending labels tween regions in the image that demonstrate distinct spa- (i.e., context) must be considered. See [44], [50], and tial patterns. The pure label contribution is to inhibit [61] for discussions of alternative loss functions and per- “blind” endings of boundaries, redundant boundary rep- formance criteria. resentations, excessively small or thin regions, and other 11. PARTITIONMODEL unexpected label configurations. Rather than inhibiting these configurations with high energy via large parame- A. Partitionings ters, we have found it convenient, especially when work- Denote the pixel (image) lattice { (i,j ): 1 Ii, j IN } ing with boundaries and partitionings, to extend this by SI and let S, (formally S in Section I) be the label lat- framework, by allowing ‘‘infinite energies” (zero prob- tice, just a copy of SI in the case of the partition model. abilities) in the prior distribution. (See Moussouris [46] For each experiment, a resolution o is chosen, which de- for an analysis of Gibbs measures with “forbidden termines a sublattice Sfp’ C s,, and the coarseness of the states. ”) We forbid these configurations by introducing V partitioning. Larger 0’s will correspond to coarser parti- = V(xL) 2 0 and concentrating on {x: V(xL)= O}. tionings and give more reliable results (see Section V), The prior, then, is of the form but they lose boundary detail. Specifically, let

The constraint, V(xL)= 0, amounts to a placement of Recall that the observation process, or data, consists of infinite energy barriers in the “energy landscape. ” These gray levels y,, s E SI. With the usual gray-level discreti- inhibit the free flow that is essential to the good perfor- zation, the state, or configuration, space for the data is mance of stochastic relaxation; indeed the theory will1 in {{ys}:s~Sr,0_(y,~255) general break down, and convergence is no longer guar- a,= anteed. As indicated above, a simple and effective solu- The configuration space for the partitioning x is deter- tion is to introduce the barriers gradually during the re- mined by U and by a maximum number of allowed regions laxation process. Again, the supporting convergence P: theory is spelled out in [ 181. There is usually a problem-specific degradation that ay’= { {x,]: s E sp, 0 5 x, IP - 1). precludes directly observing x. It may be the blur and Recall that the labels are generic: x defines a partitioning noise introduced by the camera and recording devices, the by identifying sites with a given label (0, 1, * . ,p- attenuated Radon transform that figures into emission 1) as belonging to the same region. Only the sublattice tomography, or, as in this case, an “occlusion”: we as- Sfp)is labeled, and a maximum number of labels (regions) sume the gray levels xp are observed (uncorrupted) is fixed a priori. A prior estimate of the number of distinct whereas the labels xLare of course unobserved. Thus the (but not necessarily connected) regions must be available, data is y = xp and our only interest is in estimating the since the model often subdivides homogeneous regions unobserved label process xL. II ( y I x) is singular, and the when P is too large (see Section V). The boundary model posterior distribution for xLgiven the data is (Section 111) is more robust in this regard. Each label site s E S, is associated with a square block D,E SI of pixel sites centered at s. (Recall that S, is just a copy of S,; we sometimes use “s” ambiguously to ref- There is skepticism about the MAP estimator: see, e.g., erence a site in S, und the corresponding site in s/.) x, [2], [15], [44]. It has sometimes been found to be too labels the pixels in 0,: { { yr }: r E D,} . As we will see “global,” leading to gross mislabeling in certain classi- shortly, the partitioning is based on the spatial statistics fication problems and “over-smoothing” in surface re- of these (overlapping) subimages. The size of D, is there- construction and image restoration. (See [7] for a different fore important. We have experimented only with textures view.) The discussion paper of Besag [2] has shed much (the boundary model has been applied more generally), light on the subject; see especially the remarks of Silver- and is obvious that for these the pixel blocks { D,)seSy man [52] on MAP versus simulations from the posterior must be large enough to capture the characteristic pattern n(x I y), and the remarkable comparisons between the of the texture, at least in comparison to the other textures 614 IEEE TRANSACTIONS ON PATTERN ANALYSlS AND MACHINE INTELLIGENCE, VOL. 12, NO. 7. JULY 1990 present. Of course, “large enough” is with respect to the 0001100 1112211 2221100 0001100 1112211 2221100 features used, but in the absence of a multiscale analysis, 0011100 1122211 2211100 an a priori choice of scale is unavoidable. 
In all of our 0011000 1122111 2211000 0110000 1221111 2110000 experiments, 1 D,1 = 441, a 21 x 21 square block of pix- 0110000 1221111 2110000 els. There is again a resolution issue: larger blocks char- 1110000 2221111 1110000 acterize the textures more reliably, having less within-re- Fig. 1. Left: original “image” and a correct labeling. Middle: a correct gion variation, but boundary detail is sacrificed. labeling. Right: spurious labeling. Label-Data Interaction B. 0001122 1112200 0001100 We establish a neighborhood system on the label lattice 0001122 1112200 0001100 0011122 1122200 0011100 Sp’:each s E Sp’is associated with a set of neighbors N, 0011222 1122000 0011000 C The system is symmetric, meaning that s E N, 0112222 1220000 0110000 Sp’. e 0112222 1220000 0110000 r EN,. As we shall see, the neighborhood system largely 1112222 2220000 1110000 determines the computational burden. For now we will Fig. 2. Left: original “image” and a correct labeling. Middle: a correct proceed as though the neighborhood system is given, but labeling. Right: spurious labeling. we will have much more to say about it shortly. Let (s, t), denote a neighbor pair, meaning s, t E Sfp’, s E N,. We will introduce a disparity measure a,,, = two kinds of global minima: correct labelings, in which a,,,( y) for each neighbor pair ( s, t ),. Roughly speak- there are two populations of labels corresponding to the ing, +,,, measures the similarity between the pixel gray two gray-level regions; and spurious labelings, in which levels in the two pixel blocks associated with s, t E Sp’. the three regions (two of type “0” and one of type “1”) For the partition model, +,,t is simply - 1 (“similar”) or are given three distinct labels. + 1 (“dissimilar”); it is more complicated for the bound- (See Fig. 2.) If R = 3, and region “0” does not ary model. The interaction energy is then neighbor region “2”, then there are again two kinds of global minima: correct labelings have three labels; spu- rious labelings have only two, incorrectly identifying re- gions “0” and “2”. where 6{,,,,) = 1 if x, = x,, and 0 otherwise. In the low Quite obviously, the model requires more global inter- energy states, similar (resp. dissimilar) pairs, a,,, = - 1 actions. In particular, just a few long range interactions (resp. a,,, = +l), are associated with identical (resp. would distinguish the correct from the spurious labelings. distinct) labels: x, = x, (resp. x, # x,). Although U(x, Only a correct labeling would achieve the global mini- y)is conceived of as the interaction term in a prior dis- mum of U in these two examples. tribution that is jointly on x and y, only the posterior dis- The third difficulty with local neighborhoods is com- tribution is actually used, and y is fixed by observation. putational, and is already apparent when R = 1 and P = It would be interesting, and perhaps instructive (see [35]), 2. This time there are only two global minima, and each to sample from the joint distribution, but computationally is a desirable labeling ( { x, = 0 V, E S, } or { x, = 1 Vs E very expensive. S, } ) . But, with N = 5 12, for example, consider the label configuration in which xu = 0 whenever 1 Ii I256 and C. Neighborhood System xu = 1 whenever 257 I i I 512, a half “black” and A simple example will serve to highlight the issues. half “white” picture. This is a local minimum, and rather Suppose y has R constant gray-level (untextured) regions severe in that it would take very many “uphill” or “flat” (y, E (0, 1, . . 
* R - l}, s E SI), and u = 1 (full reso- moves (single site changes) to arrive at either of the global lution). Of course, in this case y is a labeling, so there is minima. SR is a local relaxation algorithm, and despite no point in bringing in the partition process x; but this is the various convergence theorems, the practical fact of just an illustrative example. The obvious disparity mea- the matter is that “wide” local minima such as these are sure is simply @,,, = - l if y, = y,, and + l otherwise: impossible to cope with. (But, there are some encourag- ing results in the direction of multiscale relaxation, see [24], [5], [54].) Many readers will be reminded of the Ising model, in the absence of an external field, and the Entertain, for the time being, a nearest neighbor system notorious difficulty of finding its (two) global minima by on S,, which is the natural choice. To be concrete, take Monte Carlo relaxation. In fact, the R = 1, P = 2 energy N, to be the four (two horizontal and two vertical) nearest landscape is identical to that of the Ising model, as is neighbors of s. There are three essential difficulties with readily demonstrated by a suitable transformation of the this choice of neighborhood system. Two can be readily label variables. (Indeed, the same goes for the R = 2, P appreciated: = 2 case, although this is less obvious. A suitable trans- (See Fig. 1.) If R = 2 and P = 3, and if region “0” formation identifies the two Ising minima with the two (i.e., { s E SI: y, = 0}) is split into two disjoint pieces acceptable labelings: x, = 0 e y, = 0 and x, = 1 e y, by region “1” (i.e., {s E SI: y, = l}), then (2.1) has = 0.) ~
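The coexistence of correct and spurious global minima in Fig. 1 is easy to verify numerically: with the ±1 disparity above and four-nearest-neighbor bonds, a correct labeling (two label populations) and a spurious one (three distinct labels) attain the same energy, since no bond joins the two disconnected pieces of region "0". A minimal check using the arrays of Fig. 1:

```python
import numpy as np

def grid(rows):
    return np.array([[int(c) for c in row] for row in rows])

# Fig. 1: original image (two gray levels), a correct labeling, a spurious labeling.
image    = grid(["0001100", "0001100", "0011100", "0011000",
                 "0110000", "0110000", "1110000"])
correct  = grid(["1112211", "1112211", "1122211", "1122111",
                 "1221111", "1221111", "2221111"])
spurious = grid(["2221100", "2221100", "2211100", "2211000",
                 "2110000", "2110000", "1110000"])

def energy(x, y):
    """U(x, y) of (2.1): sum over four-nearest-neighbor bonds of
    Phi * 1{x_s = x_t}, with Phi = -1 if y_s = y_t and +1 otherwise."""
    u, (n, m) = 0, y.shape
    for i in range(n):
        for j in range(m):
            for di, dj in ((0, 1), (1, 0)):        # each bond counted once
                k, l = i + di, j + dj
                if k < n and l < m:
                    phi = -1 if y[i, j] == y[k, l] else +1
                    u += phi * int(x[i, j] == x[k, l])
    return u

# Both labelings reach the same (minimal) energy: no bond connects the two
# disconnected pieces of region "0", so nothing in (2.1) distinguishes them.
print(energy(correct, image), energy(spurious, image))
```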


These local minima can be mostly eliminated by intro- neighborhood system would have a gradual fall-off of in- ducing long range interactions in the label lattice Sp’; the teraction densities with distance, a system with an equal same remedy as for the label ambiguities. We will provide number of neighbors at each Manhattan distance, for ex- a heuristic argument for the important role of long range ample. interactions in creating a favorable energy landscape. In any case, simulations firmly establish their utility. First D. Kolmogorov-Smirnov Statistic recall that the distance between two sites in a graph is the At the heart of the partitioning and boundary algorithms smallest number of edges that must be crossed in travel- is a disparity measure ay,,.Recall that ifs, tare neighbors ing from one site to the other. Notice that in the four near- in Sp’( (s, t),) then a,,/ is a measure of disparity be- est neighbor graph (two-dimensional lattice) the average tween two corresponding blocks of pixel data, { { y,}: r distance between sites is large. Correct partitioning re- E D,} and { { yr}: r E D,} in the case of the partition quires all pairs of label sites to resolve their relationships model. We base as,,on the Kolmogorov-Smirnov dis- (“same” or “different”), as dictated by the statistics of tance, a measure of separation between two probability their associated pixel blocks. Of course most pairs are not distributions, well-known in statistics. When applied to neighbors. With a local relaxation, such as SR, the reso- the sample distributions (i.e., histograms) for two sets of lution is achieved by propagating relationships through data, say U(’) = {U\’), vi’), - , vi:)} and d2)= intervening sites. Thus the task is facilitated by minimiz- { vi2’, vi2’, - * , U(*)} it provides a test statistics for the ing the number of intervening sites, and a relatively small hypothesis that v(Irand d2)are samples from the same number of long range connections can drastically reduce underlying probability distribution, meaning that Fl = F2 the typical number of these. where, for i = 1, 2, U(’) = {vy), dj’),* * , U:’} are The largest distance over all pairs of sites is the diam- independent and identically distributed with F, (t) = eter of a graph. In an appropriate limiting (large graph) P(25(‘’5 t). The test is designed for continuous distri- sense, random graphs have minimum diameter among all butions and has a powerful invariance property which will graphs of fixed degree.’ In light of our heuristics, this be discussed below. suggests a random neighborhood system for Sp’. Indeed, The sample distribution function of a data set { vl, v2, random neighborhoods have a remarkable effect on the ... 9 U, 1 is structure of local minima for these systems. In a series of experiments, with “perfect” disparity data (such as the (T = 1 gray-level problems discussed above) we could al- ways achieve the global minimum by single-site iterative Thus, P is a step function, with jumps occurring at the improvement when adopting a random graph neighbor- points { vk } . It characterizes the histogram. Now consider hood configuration, using rather modest degrees for large two se;s of data U(’), d2)with sample distribution func- graphs. We conjecture, but have been unable to prove, tions Fl, F2. 
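Written out, the sample distribution function of a data set v = {v_1, ..., v_n}, and the Kolmogorov-Smirnov distance of (2.2) discussed next, are

```latex
\[
\hat{F}(t) \;=\; \frac{1}{n}\,\#\{\,k \le n : v_k \le t\,\},
\qquad
d\bigl(v^{(1)}, v^{(2)}\bigr) \;=\; \max_{-\infty < t < \infty}\,
\bigl|\hat{F}_1(t) - \hat{F}_2(t)\bigr| .
\]
```

Since each empirical distribution function is a step function with jumps at its data points, the maximum is attained at one of those points.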
The Kolmogorov-Smimov distance (or sta- that even with the degree a vanishingly small fraction of tistic) is tke maximum (vertical) distance between the the graph size, random graphs (in the “large graph graphs of F,, F2, i.e., limit”) have no local minima, under the Ising potential or the potential U (x, y) with perfect disparity data (2. I). d(v“’, d2’)= max IP,(t) - P2(t)l. (2.2) Of course the disparity data is not usually perfect. In -a c,)(y(’)) - l] “measure of homogeneity” which is invariant to point- 1 ~}(y)- 1. ibrated, regardless of the differing statistical properties of In other words, @,,r is 1 or - 1 depending on whether the the two regions; i.e., the disparity measure has the same Kolmogorov-Smimov statistic is above threshold or not. interpretation anywhere in the image. Of course, many distinct textures have nearly identical F, Penalties histograms (see [21] for some experiments with partition- ing and classification of such textures, also in the Baye- Recall that V(x) counts the total number of “penal- sian framework). In these cases, discrimination will rely ties’’ associated with x E np9‘).There are two kinds of on features, or transformations, that go beyond raw gray “forbidden” configurations that give rise to penalties: levels, involving various spatial statistics. We use several roughly, these correspond to very small regions and very of these at once, defining @,,, to be 1 if the Kolmogorov- narrow regions. Fix r~ and s E Sp). Let E, be the 5 x 5 Smimov statistic associated with any of these transfor- block of sites in Sp) centered at s. A configuration x is mations exceeds a transformation-specific threshold, and “small at s E SF)” if fewer than nine labels in { x,: t E - 1 otherwise. The philosophy is simple: if enough trans- E,} agree with x,. Notice that a right comer at s is al- formations are employed, then two distinct textures will lowed; there are exactly nine agreements. The total num- differ significantly in at least one of the aspects repre- ber of penalties for “small regions” is sented by the transformations. Unfortunately, the imple- (2.4) mentation of this idea is complicated; more transforma- tions mean more thresholds to adjust, and more Obviously, the numbers “5” and “9”, as well as other possibilities for “false alarms” based on “normal” vari- penalty parameters below, are quite arbitrary, and could ations within homogeneous regions. reasonably be scale-dependent. Proceeding more formally, let A denote one such data As for “thin regions,” these are regions that have a transformation and put y’ = A( y), the transformed im- horizontal or vertical “neck” that is only one label-site age. In general, y: is a function of both y, and the gray- wide (at resolution a). Let 7h be a one-site horizontal levels in a window centered at t E s,. For example, y: translation within Sp), and let 7, be the analogous vertical might be the mean, range, or variance of y in a neighbor- translation. Penalties arise when either { xSpr,,# x, and x, hood of t, or a measure of the local “energy” or “en- # x, + T,, } or { x, - rl, # x, and x, # x, + rl,}. The number of tropy.” Or, y,! might be a directional residual defined in ‘ ‘thin-region’’ penalties is therefore (1.1); isotropic residuals, in which the pixels { t,} sur- round t, are also effective. Notice that any A given by (2.5) (1.1) is linear in the sense that if y, + ay, + b, Vs E SI then y: --* 1 a I y: , Vs E SI, and recall that the Kolmogorov- and V(x)is just the sum of (2.4) and (2.5). GEMAN er al. 
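As a sketch of how the block comparison might be implemented, the fragment below computes the Kolmogorov-Smirnov distance between two pixel blocks on each of several transformed images and thresholds it to produce Φ_{s,t} ∈ {−1, +1}. The per-transform thresholds are assumed to be supplied, for instance from training samples as discussed above.

```python
import numpy as np

def ks_distance(a, b):
    """Kolmogorov-Smirnov distance between the empirical distributions of
    two samples: max_t |F1(t) - F2(t)|, as in (2.2)."""
    a, b = np.sort(np.ravel(a)), np.sort(np.ravel(b))
    pts = np.concatenate([a, b])
    f1 = np.searchsorted(a, pts, side="right") / a.size
    f2 = np.searchsorted(b, pts, side="right") / b.size
    return float(np.max(np.abs(f1 - f2)))

def disparity(transformed, block_s, block_t, thresholds):
    """Phi_{s,t}: +1 ("dissimilar") if the Kolmogorov-Smirnov statistic
    exceeds its transform-specific threshold for ANY of the transformed
    images, and -1 ("similar") otherwise.

    transformed : list of transformed images y^(1), ..., y^(m)
    block_s/t   : slices (or index arrays) selecting the blocks D_s and D_t
    thresholds  : one threshold per transform (assumed given)
    """
    for y_i, c_i in zip(transformed, thresholds):
        if ks_distance(y_i[block_s], y_i[block_t]) > c_i:
            return +1
    return -1
```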
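The penalty count V(x) follows directly from the two verbal definitions: "small region" sites with fewer than nine agreements in the 5 × 5 window E_s, and one-site-wide horizontal or vertical necks. In the sketch below, whether simultaneous horizontal and vertical violations at one site count once or twice, and how sites near the lattice border are treated, are assumptions the text does not settle.

```python
import numpy as np

def count_penalties(x):
    """V(x) for the partition model: "small region" violations (2.4) plus
    one-site-wide "thin region" necks (2.5).  Sites too close to the lattice
    border are skipped here (an implementation choice)."""
    n, m = x.shape
    v = 0
    # (2.4) "small at s": fewer than nine labels in the 5 x 5 window E_s agree with x_s.
    for i in range(2, n - 2):
        for j in range(2, m - 2):
            agree = np.sum(x[i - 2:i + 3, j - 2:j + 3] == x[i, j])
            if agree < 9:
                v += 1
    # (2.5) "thin at s": both one-step horizontal (or both vertical) neighbors disagree.
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            horiz = x[i, j] != x[i, j - 1] and x[i, j] != x[i, j + 1]
            vert  = x[i, j] != x[i - 1, j] and x[i, j] != x[i + 1, j]
            if horiz or vert:
                v += 1
    return v
```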

G. Summary 0 0 0 We are given + + 0 0 0 I) a gray-level image y = { yij >; 2) a resolution u = 1,2, * * - , and a maximum number + + of labels P; 0 0 0 3) a disparity measure CP,,,( y)for each pair ( s, t ), in Fig. 3. Pixel sites ( O)and boundary sites ( + ) for a 3 x 3 lattice the sublattice Sfp’; 4) a collection of penalty patterns. +.+’+’+ +..+.. + The (MAP) partitioning .2 = a( y)is then any solution ...... x E QfP,‘) of the constrained optimization +.+.+.+ ...... +‘.+.’+ +.+.+.+ ...... +’+.+.+ +..+.. + where V(x)is the number of penalties in x. Sp Fig. 4. The boundary grid S‘,.’ at resolutions U = 2 (left) and U = 3 (right). 111. BOUNDARYMODEL A. Boundary Maps The pixel lattice is again SI. Let S, denote another reg- maps x which are faithful to the data y in the sense that ular lattice interspersed among the pixels (see Fig. 3) and “large” values of As,,(y) are associated with “on” seg- of dimension (N - 1 ) X (N - 1 ); these are the “bound- ments (x,x, = I) and “small” values with “off” seg- ary sites.” We will associate s = (i,j ) E S, with the pixel ments (x,x, = 0). There are no a priori constraints on x (i,j ) E SI to the upper left of s. at this point; in fact, because of digitization effects, tex- Given y, a gray-level image, we wish to assign values tures, and so-on, the energy U will typically favor maps to the boundary variables x = { x,, s E S,}, where x, = x with undesirable deadends, multiple representations, 1 (resp. 0) indicates the presence (resp. absence) of a high curvature, etc. These will be penalized later on. A boundary at site s E S,. We have already discussed the simple choice for the x/y interaction is corresponding interpretation of the boundary map x = x ( y ) in terms of physical discontinuities in the underlying u(x, Y> = (1 - xsxr)$(A,.,(Y)> (3.1) three-dimensional scene. We establish a boundary reso- (S,f)O lution or grid size a 2 1, analogous to the resolution used where the summation extends over all nearest-neighbor earlier for the partition model. Let Sg) C SB denote the pairs (s, t),; the “weighting function” +(x), x 1 0, sublattice {(iu + 1,ju + 1): 1 5 i,j 5 (N - 2)/u}. will be described presently. Only the variables x,, s E Sfp’, interact directly with the The energy in (3. l), which is similar to a “spin-glass” data; the remaining variables x,, s E S, \ Sg), are deter- in statistical mechanics, is a variation of the ones we used mined by those on the “grid” Sfp’. Fig. 4 shows the grids in our previous work [19], [20]; when a = 1, the variable Sh2) and Sb3’ for N = 8; the sites off the grid are denoted xsxr corresponds directly to the “edge” or “line” vari- by dots. The selection of a influences the interpretation ables in [20] and [47]. Since y is given, the term 1 - x,~, of x, the computational load, the interaction range at the can be replaced by -x,xr with no change in the resulting pixel level, and is related to the role played by the size of boundary interpretation. By contrast, in [20] we were the spatial filter in edge detection methods based on dif- concerned with image restoration and regarded both x and ferential operators. Finally, let QI and Qg)denote the state y as unobservable; the data then consists of some trans- spaces of intensity arrays and boundary maps respec- formation of y, involving for example blur and noise. 
In tively; that is, that case, or in conceiving U as defining a prior distribu- tion over both y and x, the bond between the associated QI = { {ys}:s €SI, 0 5 ys 5 2551, pixels should be broken when the edge is active, i.e., 1 np= { (x,}: s E sp, x, E (0, I}}. - x,x, = 0. The term 1 - x,xt is exactly analogous to the “controlled-continuity functions” in [56]. See also [35] Sometimes, we simply write Q, for fig’. for experiments involving simulations from a related Mar- kov random field model; the resulting “fantasies” ( y, x) B. Boundary-Data Interaction are generated by stochastic relaxation and yield insight Let ( s, t ),, s, t E Sfp’ denote a nearest-neighbor pair into the nature of these layered Markov models. relativetothegrid. Thus,s = (iu+ l,ja+ 1),t= (ka Returning to (3. I), a little reflection shows that 4 should + 1, la + 1 ) is such a (horizontal or vertical) pair if either be increasing, with 4(0) < 0 < $(+m); otherwise, if i = k andj = I ? 1, orj = I and i = k +_ 1. We identify $ were never negative, the energy would always be min- ( s, t ), with the elementary boundary segment consisting imized with x, 3 1. The intercept d* = $-‘(O) is criti- of the horizontal or vertical string of u + 1 sites (in S,) cal; values of A above (resp. below) d* will promote including s, t and the u - 1 sites “in between.” (resp. inhibit) boundary formation. The influence of this The energy function U(x,y) should promote boundary threshold is reduced by choosing 4’ (d”) = 0. We employ 618 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 12. NO. 7. JULY 1990 the simple quadratic 0 + + + 0 0 0 + Fig. 5. Pixi pairs (:>’s) associated with horizontal and vertical boundary segments.


Notice that if we select α = max Δ_{s,t}(y), then the maximum "penalty" (φ(0) = −1) and "reward" (φ(α) = 1) are balanced.
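One function having all of the stated properties — increasing on [0, α], φ(0) = −1, a zero at d* with φ'(d*) = 0, and φ(α) = 1 — is the piecewise quadratic below. This is an illustrative choice consistent with the description here, not necessarily the exact form used in the experiments.

```latex
\[
\phi(\Delta) \;=\;
\begin{cases}
-\bigl(1 - \Delta/d^{*}\bigr)^{2}, & 0 \le \Delta \le d^{*},\\[4pt]
\bigl((\Delta - d^{*})/(\alpha - d^{*})\bigr)^{2}, & d^{*} \le \Delta \le \alpha .
\end{cases}
\]
```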

C. Disparity Measures 00000 00000 +(t) We employ one type of measure for depth and shape 00000 00000 boundaries and another for the texture experiments. In the former case, the disparity measure involves the (raw) Fig. 6. Pixel blocks D,,and D,,associated with the boundary segment ( S, gray-levels only, whereas for texture discrimination we t)3. also consider data transforms based on the directional re- siduals (1.1). Except when U = 1, the data sets are com- One difficulty with (3.4) is that the distance between pared by the Kolmogorov-Smimov distance. two nonoverlapping histograms is the maximum value, The disparity measure should gauge the intensity namely 1, regardless of the amount of separation. Thus, “flux,” As,,= A$P,‘(y)2 0, across (s, t),, i.e., orthog- two constant regions differing by a single gray-level are onal to the associated segment. At the highest resolution as “far apart” as two differing by 255 levels. Thus, it is (U = l), the measure 1 ys, - y,, 1 (where s*, t* are the occasionally necessary to “desensitize” (3.4) for exam- two pixels associated with the boundary sites s, t-see ple by “smearing” the data or perhaps adding some noise Fig. 5) can be effective for simple scenes but necessitates to it; see Section V. a single differential threshold: differences above d* (resp. Raw gray-level data is generally not satisfactory for below d* ) promote (resp. inhibit) boundary formation. discriminating textures. Instead, as discussed in Section Typically, however, this measure will fluctuate consid- 11, we base the disparity measure on several data trans- erably over the image, complicating the selection of d*. formations, involving higher order spatial statistics, such (Such is the case, e.g., for the “cart” scene, see Section as the directional residuals defined in (1.1). Given a fam- .) V Moreover, this measure lacks any invariance proper- ily A,, A2, * * . , A, of these transforms (see Section II), ties, as will be explained below. A more effective mea- a resolution U, and blocks D,,, D,* as above, define sure is one of the form A,,,(y) = max [c;’d(y‘”(D,.), Y(~)(D~*)](3.5) Isism

where y(;)= Ai (y),1 5 i Im, and y(’)(D,)= { y:’), where the sum extends over parallel edges ( si, ti ) in the s E D,}, exactly as in Section 11. Then A.v,,(y) > d* immediate vicinity of ( s, t ). Thus the difference I ys. - (and, hence, +(As,,( y)) > 0) if and only if d( y(i)(D.y*), y,, 1 is “modulated” by adjacent, competing differences. y(‘)(D,*))2 d*cj for some transform i. The thresholds The result is a spatially varying threshold and the distri- cl, * - , c, are again chosen to limit “false alarms.” bution of As,t(y) across the image is less v@able than Finally, we note that (3.5) has the same desirable invari- that of 1 ys, - y,* I. Choosing y = const. x A, where A ance properties as the measure constructed for partition- is the mean (raw) absolute intensity difference over all ing (Section 11). (vertical and horizontal) bonds, renders As,,( y)invariant to linear transformations of the data; that is, As,,(y) = D. Penalties As,,(ay+ 6)for any a, b. V(x)again denotes the total number of “penalties” as- At lower resolution, let D,, and D,, denote two adjacent sociated with x E ap). These penalties are simply local blocks of pixels, of equal size and shape. An example is binary pattems over subsets of Sg). Fig. 7 illustrates a illustrated in Fig. 6 for the case of two square blocks of family of four such pattems; they can be associated with size 52 = 25 pixels which straddle a vertical boundary any resolution U by the obvious scaling. These corre- segment with U = 3. Let y(D,) = { ys, s E D,}, r = s*, spond, respectively, to an isolated or abandoned segment, t*, be the corresponding gray-levels and set sharp turn, quadruple junction, and “small” structure. Depending on U, the pixel resolution, and scene infor- As.,(Y> = d(Y(DS*)>Y(D,*))3 (3.4) mation, we may or may not wish to include the latter where d is the Kolmogorov-Smirnov distance discussed three. For example, including the last one with U = 6 in Section 11. This is the disparity measure used for the would prohibit detection of a square structure of pixel size House and Ice Floe scenes (see Section V). 6 x 6. GEMAN et al. : BOUNDARY DETECTION BY CONSTRAINED OPTIMIZATION 619
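In clean form, the two block-based disparity measures (3.4) and (3.5) read

```latex
\[
\Delta_{s,t}(y) \;=\; d\bigl(y(D_{s^{*}}),\, y(D_{t^{*}})\bigr),
\qquad
\Delta_{s,t}(y) \;=\; \max_{1 \le i \le m}\;
c_i^{-1}\, d\bigl(y^{(i)}(D_{s^{*}}),\, y^{(i)}(D_{t^{*}})\bigr),
\quad y^{(i)} = A_i(y),
\]
```

where d is the Kolmogorov-Smirnov distance of Section II, D_{s*} and D_{t*} are the two pixel blocks straddling the boundary segment, and c_1, ..., c_m are the per-transform thresholds.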

We have studied [20] Monte Carlo site-replacement al- gorithms for the unconstrained versions of these prob- Fig. 7. Forbidden boundary patterns. lems: stochastic relaxation (SR) for sampling, and sto- chastic relaxation with simulated annealing (SA) for optimization. SA was devised in [8] and [39] for mini- 1.1.0 11100 ..... 00100 mizing a “cost functional” U (e.g., the tour length for 0.1.0 00100 the traveling salesman problem) by regarding U as the en- ..... 00100 ergy of a physical system and simulating the dynamics of 0’1.1 00111 chemical annealing. The effect is to drive the system to- (a) (b) wards the “ground states,” i.e., the minimizers of U. Fig. 8. (a) A boundary configuration at resolution U = 2. (b) Completion This is accomplished by applying the Metropolis (relax- of (a) from U = 2 to U = 1. ation) algorithm to the Boltzmann distribution

Finally, there is a natural extension from Qg)to QB (4.1 ) which is useful for display and evaluation. Given x E Qg),we define x, = 0 for sites s E SB \ Sg)lying on a row or column disjoint from Sg), and x,~= xt,xt,if s lies on a at successively lower values of the “temperature” t. segment ( tl, tz >,. Thus, for example, the state x E Qb2) We presented two theorems in 1201: one for generating in Fig. 8(a) is identified with the state x E QB in 8(b). a sequence { X(k) } which converges in distribution to E. Summary (4.1) for t = 1 (SR), and one for generating a sequence { X(k) } having asymptotic distribution the uniform mea- We are given sure over Q, = (x E Q: U(x)= U}, U = minx U(x),(SR 1) a gray-level image y = { yu}; with SA). The essence of the latter algorithm is a “cool- 2) a resolution level U = 1, 2, * * . ; ing schedule” t = t,, t2, - - * for establishing conver- 3) a disparity measure @,,t( y) = +( As,t(y)) for each gence. SA has been extensively studied recently [4], [lo], neighbor pair ( s, t >, in the sublattice Sg); [23], 1241, [30], 1321, 1341, [55]; see also the comprehen- 4) a collection of penalty patterns. sive review [l] and the references therein. The (MAP) boundary estimate i = i ( y ) is any solution Results concerning constrained SR and SA are reported x E Qg)of the constrained optimization in [18], which was motivated by a desire to find a theo- minimizex:v(x)=o (1 - xsxt>+(As,r) retical foundation for the algorithms used here. We have (s,r), deviated from the instructions in 1181, with regard to the where V(x)is the number of penalties in x. cooling schedule, but at least we know that the algorithms represent approximations to rigorous results. IV. ALGORITHMS Both algorithms produce a Markov chain on Q by sam- We begin with an abstract formulation of the optimi- pling from the low-order, marginal conditional distribu- zation and sampling problems outlined in Sections 1-111. tions of the free Gibbs measures Recall that our search algorithms are motivated by the sto- chastic formulation of the problem, in particular the role exp { -t-’(U(x) + AV(~))} n(x; t, A) = of the posterior distribution (x I y ). However, y is fixed n CX’ exp { -t-’(U(x’) + AV(~’)))‘ by observation, and can be ignored in this discussion of computational issues. Thus, we are given two functions It is easy to check that U and V on a space of configurations 52 = { (xs,,xs2, ... , x,,): x,, E A, 1 s i IM } where A is finite, M is lim n(x; 1, A) = IT*(x) (4.2) X+W very large, and S = { sl, s2, * , sM} is a collection of “sites,” typically a 2-D lattice. Write x = (x,,, . . . , and that xsM)for an element of Q, and let

Ω* = {x : V(x) = v},    v = min_x V(x),

Π*(x) = 1_{Ω*}(x) exp{−U(x)} / Σ_{x' ∈ Ω*} exp{−U(x')}.

We wish to solve the constrained optimization problem minimize {U(x) : V(x) = v} or to sample from the Gibbs distribution Π*. (Recall that sampling at low temperature allows us to approximate the global minimum or to entertain other estimates, for instance the posterior mean.)

lim_{t→0, λ→∞} Π(x; t, λ) = 1_{Ω*_ξ}(x) / |Ω*_ξ|    (4.3)

where Ω*_ξ = {ω ∈ Ω*: U(ω) = ξ} and ξ = min_{x∈Ω*} U(x). Let Π_0 denote the uniform measure in (4.3). Sampling directly from Π(x; t, λ) is impossible due to the size of Ω; otherwise one could just use (4.2) and (4.3) to generate a sequence of random variables X(k), k = 1, 2, ..., with values in Ω and limiting distribution either Π* or Π_0. However, we can evaluate ratios Π(x; t, λ)/Π(z; t, λ), x, z ∈ Ω, and hence conditional probabilities. The price

Fix two sequences {t_k}, {λ_k}, a "site visitation" schedule {A_k}, A_k ⊂ S, and let Π_k(x) = Π(x; t_k, λ_k). The set A_k is the cluster of sites to be updated at "time" k; the "centers" of the clusters are addressed in a raster scan. In our experiments we take either |A_k| = 1 or |A_k| = 5, in which case the A_k's are of the form {(i, j), (i + 1, j), (i − 1, j), (i, j + 1), (i, j − 1)}.

Define a nonhomogeneous Markov chain {X(k), k = 0, 1, 2, ...} on Ω as follows. Put X(0) = η arbitrarily. Given X(k) = (X_{s_1}(k), ..., X_{s_M}(k)), define X_s(k + 1) = X_s(k) for s ∉ A_{k+1}, and let {X_s(k + 1) : s ∈ A_{k+1}} be the (multivariate) sample from the conditional probability distribution Π_{k+1}(x_s, s ∈ A_{k+1} | x_s = X_s(k), s ∉ A_{k+1}). Then, under suitable conditions on {t_k} and {λ_k}, either

    lim_{k→∞} P(X(k) = x | X(0) = η) = Π*(x)

or the limit is Π₀(x). The condition in the former case (constrained SR) is that t_k ≡ 1, λ_k ↗ ∞, and λ_k ≤ const · log k. The condition for convergence to Π₀ (constrained SA) is that t_k ↓ 0, λ_k ↗ ∞, and λ_k/t_k ≤ const · log k. The algorithm yields a solution to the constrained optimization problem (1.2) in the sense that the asymptotic distribution of X(k) is uniform over the solution set: if the solution is unique, i.e., Ω*₀ = {x₀}, then X(k) → x₀ in probability. See [18] for proofs.

A. Approximations

The logarithmic rate is certainly slow. Still, we often adhere to it for ordinary annealing; others [36], [47] have as well. We refer the reader to [35] for some interesting comparisons between schedules. It is commonplace to find linear (t_k = t₀ − ak) and exponential (t_k = (1 − γ)^k t₀, γ small) schedules; here k refers to the number of sweeps or iterations of S; in our experiments S = S_P^(σ) or S_B^(σ). We now describe several protocols used in our experiments.

One variant we do not use is to fix λ_k ≡ λ very large and do ordinary annealing, which might appear sensible since the solutions to min {U(x) : V(x) = 0} coincide with those of min {U(x) + λV(x)} for all λ sufficiently large (due to the fact that Ω is finite). However, this is not practical: unless t₀ is very large and t_k is reduced very slowly, the system immediately gets stuck in local energy minima of U + λV which are basically independent of the data, although faithful to the constraints. It is better to begin with states faithful to the data and slowly impose the constraints, a standard technique in conventional optimization.

One variation of constrained SR that has been effective is "low-temperature sampling": fix t_k = ε (small) and let λ_k ↗ ∞. The idea is to reach a likely state of the posterior distribution Π(x | y). In practice, we allow λ_k to grow linearly; the details are in Section V.

Another variation is the analog for constrained relaxation of "zero-temperature" sampling, which has been extensively studied by Besag [2] under the name ICM (for "iterated conditional modes"); see also [11], [14], and [22]. Without constraints, this algorithm, which is deterministic, results in a sequence of states X(k) which monotonically decrease the global energy, i.e., increase the posterior likelihood. The constrained version operates as follows. Recall that when the set of sites A_{k+1} is visited for updating, we defined X(k + 1) by replacing the coordinates of X(k) in A_{k+1} by a sample drawn from the conditional distribution of Π_{k+1} on {x_s, s ∈ A_{k+1}} given the values {x_s = X_s(k), s ∉ A_{k+1}}. Suppose we replace the sample with the mode, i.e., the most likely vector {x_s, s ∈ A_{k+1}} conditional upon {x_s = X_s(k), s ∉ A_{k+1}}. In essence, we fix t_k = 0. This generates a deterministic sequence X(k), k = 0, 1, 2, ..., depending only on X(0), Π_k, and {A_k}. (Notice that the mode is unaffected by t_k since it corresponds to the minimum of U(x) + λ_k V(x).) Then, during the kth sweep, with λ_k fixed, the energy U + λ_k V is successively reduced, just as in ICM, where λ_k = 0. Of course, since there is no fixed (reference) energy, the algorithm cannot be conceived as one of iterative improvement. Several experiments were run with both the stochastic and deterministic algorithms; see Section V.

B. Summary

Each experiment used one of the following two variations on constrained relaxation (see Section V for details); a sketch of the common update step appears after the listings.

Choose a label resolution σ and an associated pixel block size.
Select features.
Choose a site visitation schedule {A_k}, A_k ⊂ S, k = 1, 2, ....
Fix a linear growth schedule for {λ_k}.
Choose X(0), arbitrary.
Set k = 0.

Deterministic Algorithm:
0) Set t_k ≡ 1, k = 1, 2, ....
1) Define X_s(k + 1) = X_s(k), s ∉ A_{k+1}, and define {X_s(k + 1) : s ∈ A_{k+1}} to be the (multivariate) mode of Π_{k+1}(x_s, s ∈ A_{k+1} | x_s = X_s(k), s ∉ A_{k+1}).
2) k = k + 1.
3) Go to 1).

Stochastic Algorithm:
0) Set t_k ≡ ε, k = 1, 2, ... (ε "small").
1) Define X_s(k + 1) = X_s(k), s ∉ A_{k+1}, and define {X_s(k + 1) : s ∈ A_{k+1}} to be a (multivariate) sample from Π_{k+1}(x_s, s ∈ A_{k+1} | x_s = X_s(k), s ∉ A_{k+1}).
2) k = k + 1.
3) Go to 1).

Usually, but not always, the deterministic algorithm was sufficient (again, see Section V).
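A minimal sketch of the update step shared by the two listings (the energy callback, cluster shape, and grid bookkeeping are illustrative, not the authors' Fortran code): each visited site is updated together with its four nearest neighbours, and the cluster is replaced either by the conditional mode or by a sample from the conditional Gibbs distribution.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

def update_cluster(x, cluster, labels, energy, t, lam, deterministic):
    """One constrained-relaxation update of a small cluster of sites.

    `energy(x)` returns the pair (U(x), V(x)).  The cluster is replaced by
    the conditional mode (deterministic, "zero temperature") or by a sample
    from the conditional Gibbs distribution at temperature t with penalty
    weight lam.  All names here are illustrative.
    """
    configs = list(itertools.product(labels, repeat=len(cluster)))
    energies = np.empty(len(configs))
    for i, cfg in enumerate(configs):
        for s, v in zip(cluster, cfg):
            x[s] = v
        U, V = energy(x)
        energies[i] = U + lam * V
    if deterministic:
        best = int(np.argmin(energies))                 # conditional mode
    else:
        w = np.exp(-(energies - energies.min()) / t)    # conditional sample
        best = rng.choice(len(configs), p=w / w.sum())
    for s, v in zip(cluster, configs[best]):
        x[s] = v

def sweep(x, shape, labels, energy, t, lam, deterministic=True):
    """Raster-scan sweep; each visited site is updated together with its
    four nearest neighbours (|A_k| = 5), as in the experiments."""
    rows, cols = shape
    for i in range(rows):
        for j in range(cols):
            cluster = [(i, j)] + [(i + di, j + dj)
                                  for di, dj in [(1, 0), (-1, 0), (0, 1), (0, -1)]
                                  if 0 <= i + di < rows and 0 <= j + dj < cols]
            update_cluster(x, cluster, labels, energy, t, lam, deterministic)

if __name__ == "__main__":
    # Toy demonstration on a 4 x 4 binary field with a Potts-style U and
    # no constraint term (V = 0); lam grows linearly over the sweeps.
    x = {(i, j): int(rng.integers(0, 2)) for i in range(4) for j in range(4)}
    def energy(x):
        U = sum(x[(i, j)] != x[(i, j + 1)] for i in range(4) for j in range(3))
        U += sum(x[(i, j)] != x[(i + 1, j)] for i in range(3) for j in range(4))
        return U, 0
    for k in range(5):
        sweep(x, (4, 4), labels=[0, 1], energy=energy, t=1.0, lam=float(k))
    print(x)
```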

V. EXPERIMENTS

A. Partition Model

There are three experiments: an L-band synthetic aperture radar (SAR) image of ice floes in the ocean (Fig. 9), a texture mosaic constructed from the Brodatz album [6] (Fig. 10), and another mosaic from pieces of rug, plastic, and cloth (Fig. 11).

(Fortran code and terminal sessions are available. We are grateful to the Radar Division at ERIM for providing us with the SAR image, collected for the U.S. Geological Survey under Contract 14-08-0001-21748 and the Office of Naval Research under Contracts N-00014-81-C-0692 and N-00014-81-C-0295.)

Processing: In each experiment the partitioning was randomly initiated; the labels x_s, s ∈ S_P^(σ), were chosen independently and uniformly from 0, 1, ..., P − 1. Thereafter, label sites were visited and updated one at a time, by a "raster scan" sweep through the label array. MAP partitionings were approximated by "zero-temperature" sampling (see Section IV), with λ = λ_k increasing with the number of sweeps. Specifically, λ was held at 0 through the first 10 sweeps, and thereafter was raised by 1 every 5 sweeps: λ_k = 0, k = 1, ..., 10; λ_k = 1, k = 11, ..., 15; λ_k = 2, k = 16, ..., 20; etc. Most probably, λ could have been increased more rapidly, perhaps with every sweep, without substantially changing the results, but this was not systematically investigated. For the three experiments shown in Figs. 9, 10, and 11, between 15 and 50 sweeps sufficed to bring the changes in labels to a halt; see below for more details. Recall that zero-temperature sampling corresponds to choosing the conditional mode. Occasionally there are ties, and these were resolved by choosing randomly, and uniformly, from the collection of modes.

As a general rule, results were less reliable at higher resolutions (lower σ's) and when more labels were allowed (higher values of P). In these cases, repeated experiments, with different initializations, often produced different results. With P too large, homogeneous regions were frequently subdivided, being assigned two or three labels. With σ too small, the tendency was to mislabel small patches within a given texture. It is likely that many of these mistakes correspond to local minima; perhaps some could be corrected by following a proper annealing schedule (see Section IV) and by more careful choices of thresholds (see below). Here again, definitive experiments have not been done.

Parameter Selection: The resolution σ was 7 for the SAR picture (Fig. 9), 15 for the Brodatz collage, and 13 for the pieces of rug, plastic, and cloth. These numbers were chosen more or less ad hoc, but are small enough to capture the important detail of the respective pictures while not so small as to incur the degraded performance seen at higher resolutions and mentioned earlier.

The number of allowed labels is also important; recall that too many usually results in over-segmentation. This was actually used to advantage in the SAR experiment (Fig. 9), where there are evidently two varieties of ice. The best segmentations were obtained by allowing three labels. Invariably, two would be assigned to the ice and one to the water. Using just two labels led to mistakes within the ice regions, although there was little experimentation with the Kolmogorov-Smirnov threshold, and no attempt was made with the data transforms (m > 1) used for the collages. In the other experiments, the number of labels was set to the number of texture species in the scene.

Measures of Disparity: Recall that the disparity measure is derived from the Kolmogorov-Smirnov distance between blocks of pixel data under various transformations, as defined in Section II, equation (2.3). For the SAR image, good partitionings were obtained using only the raw data: m = 1 and y^(1) is just y in (2.3). Evidently, gray-level distributions are enough to segment the water and ice "textures," at least when supplemented by the "prior constraints" embodied in the penalty term V(x).

The texture collages in Figs. 10 and 11 are harder. We used four data transformations in addition to the raw pixel data. Hence, for these experiments m = 5, y^(1) = y, and y^(2), ..., y^(5) are based on various transforms. In particular, y_s^(2) measures the intensity range in the 7 × 7 pixel block V_s centered at s:

    y_s^(2) = max_{t ∈ V_s} y_t − min_{t ∈ V_s} y_t;

y_s^(3) is the "residual" (1.1) obtained by comparing y_s to the 24 "boundary pixels" ∂V_s of V_s (i.e., all pixels on the perimeter of the 7 × 7 block); and y^(4) and y^(5) are horizontal and vertical "directional residuals."

The most important parameters were the thresholds {c_i}, 1 ≤ i ≤ m, associated with the Kolmogorov-Smirnov statistics [see (2.3)]. For the SAR experiment, m = 1, and the threshold was guessed, a priori; it was found that small changes are reflected only in the lesser details of the segmentation. For the collages (m = 5), the thresholds were chosen by examining histograms of Kolmogorov-Smirnov distances for block pairs within homogeneous samples of the textures. Thresholds were set so that no more than 3 or 4% of these intraregion distances would be above threshold (a "false alarm"). Of course, we would have preferred to find more or less universal thresholds, one for each data transform, but this may not be possible. Conceivably, with enough of the "right" transforms, one could set conservative (high) and nearly universal thresholds, and be assured that visibly distinct textures would be segmented with respect to at least one of the transforms. Recall that the disparity measure (2.3) is constructed to signal "different" when the distance between blocks, with respect to any of the transforms, exceeds threshold.
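A sketch of the ingredients just described: the Kolmogorov-Smirnov distance between the empirical distributions of two blocks, plus two block features (the intensity range, and an illustrative center-versus-perimeter residual; the exact residual (1.1) of the paper may differ in detail). The function names and brute-force window loops are ours, not the authors'.

```python
import numpy as np

def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov distance between the empirical
    distributions of the values in blocks a and b."""
    a, b = np.sort(a.ravel()), np.sort(b.ravel())
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.abs(cdf_a - cdf_b).max())

def range_feature(img, half=3):
    """Intensity range over the (2*half+1) x (2*half+1) window centred at
    each pixel (the 7 x 7 window of the text for half = 3)."""
    H, W = img.shape
    out = np.zeros((H, W), dtype=float)
    for i in range(H):
        for j in range(W):
            win = img[max(0, i - half):i + half + 1, max(0, j - half):j + half + 1]
            out[i, j] = win.max() - win.min()
    return out

def perimeter_residual(img, half=3):
    """Illustrative stand-in for the residual feature: centre pixel minus
    the mean of the 24 perimeter pixels of its 7 x 7 window."""
    H, W = img.shape
    out = np.zeros((H, W), dtype=float)
    for i in range(half, H - half):
        for j in range(half, W - half):
            win = img[i - half:i + half + 1, j - half:j + half + 1]
            perim = np.concatenate([win[0, :], win[-1, :], win[1:-1, 0], win[1:-1, -1]])
            out[i, j] = img[i, j] - perim.mean()
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.integers(0, 256, size=(32, 32)).astype(float)
    b1, b2 = img[:16, :16], img[16:, 16:]
    print(ks_distance(b1, b2), range_feature(img)[10, 10], perimeter_residual(img)[10, 10])
```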

Fig. 9. (a) Synthetic aperture radar image of oceanic ice floes. (b) Evolution of the label states from random initialization (upper left) to final partition (lower right).

Fig. 10. (a) Collage of five Brodatz textures. (b) Evolution of the labels, upper left to lower right.

Fig. 11. (a) Texture collage: rug, plastic, cloth. (b) Evolution of the labels, upper left to lower right.

Fig. 9 (SAR): As mentioned earlier, three labels were used, with the expectation that the ice would segment into two regions (basically, dark and light). The resolution was σ = 7, and the Kolmogorov-Smirnov statistic was computed only on the raw data, so m = 1. The threshold was c₁ = 0.15. The original image is 512 × 512 (the pixel resolution is about 4 by 4 m), but to avoid special treatment of the boundary, only the 462 × 462 piece shown in Fig. 9(a) was processed. The label lattice S_P^(σ) is 64 × 64. Fig. 9(b) shows the evolution of the partitioning during the relaxation. For display, gray levels were arbitrarily assigned to the labels. The upper left panel is the random starting configuration. Successive panels show the states of the labels after each five iterations (full sweeps). In the bottom right panel, the two labels associated with ice are combined "by hand."

Fig. 10 (Brodatz Textures): The Kolmogorov-Smirnov thresholds were c₁ = 0.40, c₂ = 0.53, c₃ = 0.26, c₄ = 0.28, and c₅ = 0.19, corresponding to the transforms y^(1), ..., y^(5) discussed above. A 246 × 246 piece of the original 256 × 256 image was processed, and is shown in Fig. 10(a). Leather and water are on top, grass and wood on the bottom, and sand is in the middle. The resolution was σ = 15, which resulted in a 16 × 16 label lattice S_P^(σ). Fig. 10(b) shows the random starting configuration (upper left panel), the configuration after 5 iterations (upper right panel), after 10 iterations (lower left panel), and after 15 iterations (lower right panel), by which point the labels had stopped changing.

Fig. 11 (Rug, Plastic, Cloth): The 216 × 216 image in Fig. 11(a) was partitioned at resolution σ = 13, with a 16 × 16 label lattice. The Kolmogorov-Smirnov thresholds were c₁ = 0.90, c₂ = 0.49, c₃ = 0.20, c₄ = 0.11, and c₅ = 0.12, corresponding to the same data transforms used for the Brodatz textures (Fig. 10). The experiment makes apparent a hazard of long-range bonds: the gradual but marked lighting variation across the top of the image produces a large Kolmogorov-Smirnov distance when raw pixel blocks from the left and right sides are compared. This makes it necessary to essentially ignore the raw-data Kolmogorov-Smirnov statistic and base the partitioning on the four data transformations; hence the threshold c₁ = 0.9. The transformed data are far less sensitive to lighting gradients. Fig. 11(b) displays the evolution of the partitioning during relaxation. The layout is the same one used in the previous figures, showing every 5 iterations, except that there are 10 iterations between the final two panels. The lower right panel is the partitioning after the 30th sweep, by which time the pattern was "frozen."
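The threshold-selection rule used above (calibrate each c_i so that only a small fraction of within-texture block pairs exceed it, then declare two blocks "different" as soon as any transform's distance exceeds its threshold) can be sketched as follows; the distance arrays in the example are synthetic stand-ins, not measured data.

```python
import numpy as np

def calibrate_threshold(intraregion_distances, false_alarm=0.04):
    """Pick a KS threshold so that at most `false_alarm` of the distances
    measured between block pairs drawn from the same texture exceed it."""
    return float(np.quantile(np.asarray(intraregion_distances), 1.0 - false_alarm))

def blocks_differ(ks_distances, thresholds):
    """Disparity in the spirit of (2.3): the block pair is declared
    'different' as soon as any transform's KS distance exceeds its c_i."""
    return any(d > c for d, c in zip(ks_distances, thresholds))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # One synthetic within-texture distance sample per transform (m = 5).
    thresholds = [calibrate_threshold(rng.uniform(0, 0.3, size=500)) for _ in range(5)]
    print(thresholds)
    print(blocks_differ([0.12, 0.05, 0.40, 0.10, 0.02], thresholds))
```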

B. Boundary Model

There are five test images: one made indoors from tinker toys ("cart"), an outdoor scene of a house, another of ice floes in the ocean (the same SAR image used above), and two texture mosaics constructed from the Brodatz album.

Processing: All the experiments were performed with the same site-visitation schedule. Given the resolution σ, which varies among experiments, the sites of the sublattice S_B^(σ) were addressed in a raster scan and five sites were simultaneously updated. Specifically, at each visit to the site (iσ + 1, jσ + 1), the values of the boundary process at this site and its four nearest neighbors, {((i ± 1)σ + 1, (j ± 1)σ + 1)}, were replaced based on the conditional distribution of these five boundary variables given the variables at the other sites and the data y. Of course, this distribution is concentrated on the 2⁵ = 32 possible configurations of these five variables.

Two update mechanisms were employed: stochastic relaxation and the "zero-temperature," deterministic variation discussed earlier. In the former case, the updated binary quintuple is a sample from the aforementioned conditional distribution, which varies depending on the penalty weight λ_k for the kth sweep of the lattice S_B^(σ). Of course, "0-temperature" refers to replacing the sample by the conditional mode. In both cases we let λ_k grow linearly.

Stochastic relaxation at low temperature is more effective than at zero temperature (essentially iterative improvement). However, deterministic relaxation sufficed for all but two scenes, the ice floes and the four-texture collage; those results could not be duplicated with deterministic relaxation. In one case, we present both results for comparison. Generally, deterministic relaxation stabilizes in 5-10 sweeps, whereas stochastic relaxation requires more sweeps, perhaps 20-60. We provide several pictures showing the evolution of the algorithm.

Penalties: All the experiments were conducted with the same forbidden patterns, namely those in Fig. 12, with the exception of the house scene, for which the last pattern was omitted. (At the resolution used for the house, namely σ = 3, the inclusion of that pattern would inhibit the formation of structures at the scale of six pixels; many such nontrivial structures appear in that scene.) Thus, the penalty function V(x) records a unit penalty for each occurrence in the boundary map x = {x_s, s ∈ S_B^(σ)} of any of the five patterns depicted in Fig. 12. It is interesting to note that in no case was the final labeling completely free of penalties, i.e., with V(x̂) = 0. Perhaps this could be achieved with a proper annealing schedule, or with updates of more than five sites.

Fig. 12. Forbidden patterns for the boundary experiments.

Measures of Disparity: All the experiments are based on instances of the measures (3.3)-(3.5) described in Section III.

1) For the first experiment, the cart scene, the boundary resolution is σ = 1 and we employed the measure given in (3.3) with γ = 10Δ̄, where Δ̄ is the mean absolute intensity difference over the image, and with the raw difference |y_{s*} − y_{t*}| modulated by the four nearest differences of the same orientation as (s*, t*). For the horizontal pair (s, t) of adjacent boundary sites, s* = (i, j) and t* = (i + 1, j). The utility seems largely impervious to the choice of the scaling constant (here, 10) for the mean, as well as to the range of the modulation.

2) We used the Kolmogorov-Smirnov measure (3.4) for both the house and ice floes scenes. For the house, we chose σ = 3 and blocks of size 25; the setup is depicted in Fig. 6. Due to the uniform character of the background (e.g., the sky), the distance (3.4) was computed based on the transformed data ỹ_t = y_t + η_t, where the {η_t} are independent variables distributed with a triangular density, specifically that of 10(u₁ + u₂), with u₁ and u₂ uniform (and independent) on [0, 1].
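A sketch of the transformation just described for the house scene: independent noise with the triangular density of 10(u₁ + u₂) is added to the raw gray levels before the Kolmogorov-Smirnov distance (3.4) is computed, so that nearly constant regions such as the sky do not produce degenerate empirical distributions. (The function name and interface are ours, not the authors'.)

```python
import numpy as np

def triangular_dither(img, scale=10.0, rng=None):
    """Add independent noise with the triangular density of scale*(u1 + u2),
    u1, u2 uniform on [0, 1], to every pixel; the dithered image is then
    used when computing block-to-block Kolmogorov-Smirnov distances."""
    rng = np.random.default_rng() if rng is None else rng
    u1 = rng.uniform(0.0, 1.0, size=img.shape)
    u2 = rng.uniform(0.0, 1.0, size=img.shape)
    return img.astype(float) + scale * (u1 + u2)

# Example: dither a flat (sky-like) block so its empirical CDF is no longer a step.
if __name__ == "__main__":
    flat = np.full((5, 5), 200.0)
    print(triangular_dither(flat, rng=np.random.default_rng(0)))
```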

The boundary resolution for the radar experiment is σ = 8, reflecting the larger important structures there; the image is 512 × 512. The dynamic range is very narrow, and the difference between the dark water and somewhat less dark ice is essentially one of texture, due in part to the customary speckle noise. In particular, the ice cannot be well-differentiated from the water based on shading alone. The disparity measure is (3.4), applied to the raw image data over 24 × 24 blocks. The problem encountered in the house scene is actually alleviated by the speckle.

3) The texture mosaic experiments are based on the measure (3.5) for a particular family of five data transformations or "features." In each case, the resolution is σ = 5 and the block size is 21 × 21. Recall that these five features are combined into a single measure of change according to the formula in (3.5). The transformations used are the range

    y_s^(2) = max_{t ∈ V_s} y_t − min_{t ∈ V_s} y_t

over a 7 × 7 window V_s centered at pixel s, and the four directional residuals. These residuals were then uniformly averaged over V_s, yielding the final features y^(1), ..., y^(5).

It is instructive to compare the Kolmogorov-Smirnov differences for the raw and transformed data over these texture mosaics. Typically, if one looks at the resulting two histograms of differences for a given transform, one finds that, whereas the raw (Kolmogorov-Smirnov) differences are actually larger at the texture borders, the transitions between the borders and interiors are sharper for the transformed data. Detecting the boundaries with the raw data necessitates an unacceptable number of "false alarms" in the sense of interior "microedges."

Finally, the values of the constants c₁, ..., c₅ used in the construction of Δ_{s,t} [see (3.5)] are selected by restricting the percentage of false alarms. The details are given in the following section.

Parameter Selection: Recall that the total change across the boundary segment (s, t) is measured by φ(Δ_{s,t}(y)), where φ is given in (3.2). Given Δ, there are two parameters to choose: a normalizing constant α and the intercept Δ* = φ⁻¹(0).

For the object boundary experiments, namely the cart, house, and ice floes, the parameters α and Δ* were chosen as follows. Find the mean disparity over all (vertical and horizontal) values of Δ_{s,t} for relevant bonds; take α equal to the 99th percentile of those above the mean and Δ* equal to the 70th percentile of those above the mean. This yields the values α = 150, Δ* = 42 for the cart scene; recall that for this experiment, both the grid and block sizes are unity. For the house scene (σ = 3) the Kolmogorov-Smirnov statistics were computed over 5 × 5 blocks, and the resulting parameters are then α = 1 and Δ* = 0.7. (The number of distances at the maximum value Δ = 1 was considerable.) Finally, for the ice floes, the recipe above yielded α = 0.33, Δ* = 0.13.

Turning to the experiments with texture mosaics, let c_{i,k} denote the normalizing constant in (3.5) for feature i, 1 ≤ i ≤ 5, and texture k, 1 ≤ k ≤ K, where K is the number of textures in the mosaic. For each feature i and texture type k, we computed the histogram of the (combined vertical and horizontal) Kolmogorov-Smirnov distances and selected c_{i,k} as the 100(1 − γ) percentile of that histogram. Specifically, we took γ = 0.01 for the two Brodatz collages. (Other experiments indicated that any small value of γ will suffice, say 0 ≤ γ ≤ 0.03.) Thus, 100(1 − γ)% of the distances d(y^(i)(D₁), y^(i)(D₂)) are below c_{i,k} within each texture type k. Now set c_i = max_{1 ≤ k ≤ K} c_{i,k}, ensuring that at most 100Kγ% of the interior differences Δ_{s,t}(y) within the entire collage will exceed the threshold Δ* = 1. Finally, we put α = 2.

Fig. 13 (Cart Scene): Sixty sweeps of stochastic relaxation were run with t_k ≡ 0.05 and λ_k ↗ 3. Actually, all the boundaries were "in place" after about 10 sweeps, as illustrated in Fig. 13(b), which shows every third sweep up to the 46th. The image is 110 × 110. Not shown is a run with the deterministic algorithm; the results are virtually indistinguishable.

Fig. 14 (House Scene): This 256 × 256 monochrome image was supplied to us by the Visions group at the University of Massachusetts. The update is by deterministic relaxation with λ_k increasing linearly from λ₀ = 0 to λ₁₀ = 2.

Fig. 15 (Ice Floes): The image in Fig. 15(a) is 512 × 512. We did 60 sweeps of stochastic relaxation with t_k = 0.1 and λ_k ↗ 2. Fig. 15(b) shows 16 "snapshots": every third sweep, as in Fig. 13, as well as the final (60th) sweep.

Fig. 16 (Brodatz Collage 1): The collage is composed of nine Brodatz textures [Fig. 16(a)]: leather, grass, and pigskin (top row), raffia, wool, and straw (middle row), and water, wood, and sand (bottom row). Two of the textures, leather and water, are repeated in the two circles. The image size is 384 × 384, the individual textures all being 128 × 128. We show the results [Fig. 16(b)] of both the deterministic (left) and stochastic (right) algorithms; they are roughly comparable. Other false-alarm rates (γ = 0.005, 0.02, and 0.03) yield the same overall quality.

Fig. 17 (Brodatz Collage 2): There are four textures [Fig. 17(a)]: raffia (upper left), sand (upper right), wool (bottom), and pigskin (center). Two runs (different seeds) are shown [Fig. 17(b)], individual frames representing every third sweep.
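The parameter-selection recipes above lend themselves to a direct sketch: for the object-boundary scenes, α and Δ* are percentiles of the above-mean disparities; for the texture mosaics, each c_i is the maximum over textures of a within-texture 100(1 − γ) percentile. (Array layouts and function names are illustrative, not the authors'.)

```python
import numpy as np

def disparity_parameters(disparities, a_pct=99, dstar_pct=70):
    """Object-boundary recipe: restrict to the disparities Delta_{s,t} lying
    above their mean, then take alpha as their 99th percentile and the
    intercept Delta* = phi^{-1}(0) as their 70th percentile."""
    d = np.asarray(disparities, dtype=float).ravel()
    above = d[d > d.mean()]
    return float(np.percentile(above, a_pct)), float(np.percentile(above, dstar_pct))

def texture_thresholds(ks_by_feature_and_texture, gamma=0.01):
    """Texture-mosaic recipe: c_{i,k} is the 100*(1-gamma) percentile of the
    within-texture KS distances for feature i and texture k, and
    c_i = max_k c_{i,k}.  Input: ks_by_feature_and_texture[i][k] is an array
    of such distances."""
    cs = []
    for per_texture in ks_by_feature_and_texture:
        c_ik = [np.quantile(np.asarray(d), 1.0 - gamma) for d in per_texture]
        cs.append(float(max(c_ik)))
    return cs

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(disparity_parameters(rng.exponential(20.0, size=10000)))
    print(texture_thresholds([[rng.uniform(0, 0.3, 500) for _ in range(4)]
                              for _ in range(5)]))
```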


Fig. 13. (a) Image of a tinker toy cart. (b) Evolution of the boundaries (upper left to lower right) with multiple-site stochastic relaxation.

Fig. 15. (a) Ice floes image [see Fig. 9(a)]. (b) Evolution of the boundaries (at resolution σ = 8) during stochastic relaxation.

Fig. 14. (a) House scene. (b) Final boundary placements at resolution σ = 3 with deterministic relaxation.

Fig. 16. (a) Collage of nine Brodatz textures. (b) Final boundary placements with deterministic relaxation (left) and stochastic relaxation (right).

Fig. 18. Family of possible boundary primitives.

Fig. 17. (a) Collage of four Brodatz textures. (b) Two runs of stochastic relaxation showing every third sweep.

VI. GENERALIZATIONS

These models may be extended in many directions, or combined into a single model. What follows is a brief description of three such generalizations.

A. Other Boundary Primitives

The elementary boundary unit or primitive is a horizontal or vertical segment whose length is resolution-dependent, namely σ + 1 in pixel units. As a result, a discontinuity running at 45° is detected and localized with less reliability than one at 0° or 90°, where the disparity measure has maximum sensitivity. An obvious remedy is to replace the pairs (s, t) by other "primitives," e.g., a distinguished family of triples, such as the six represented (up to translates) in Fig. 18. More generally, for any such family one can consider an interaction term built from measures Δ_A(y) of the disparity in the "A-direction"; in Fig. 18, these directions are, respectively, π/4, π/4, −π/4, −π/4, π/2, and 0. One can imagine a variety of ways to situate two appropriately shaped sets of pixels to straddle (and abut) segments such as these.

B. Multivariate Data

Data may be available from several sensors, e.g., multispectral satellite data, or an optical camera and laser radar. One might integrate these with a composite disparity measure of the type used in the texture studies, viz. Δ = max_k Δ^(k), in order to improve discrimination.

Another possibility is to attempt to classify boundaries, especially if range data is also available. Let y^(1) and y^(2) be range and brightness values, and let x_s assume three values, say 0, 1, 2, corresponding to "off," "occluding (= depth) boundary," and "other boundary" (for instance a crease or shadow). Now rig the energy function to couple Δ(y^(1)) with ξ = {ξ_s}, ξ_s = 1_{{1}}(x_s), and Δ(y^(2)) with η = {η_s}, η_s = 1_{{1,2}}(x_s); for example, just add the two corresponding terms to form U. The penalty patterns are the usual ones regarding dead ends, etc. (with 0's and 2's as well as 0's and 1's), together with mixtures of 0, 1, 2 corresponding to physically implausible (or impossible) transitions.

C. Region-Boundary Model

Put the boundary and region labels into a single model, for example of the form U₁(x′, y) + U₂(x″, y), where x′ denotes the region (partition) labels and x″ the boundary labels. Now penalize improper local configurations in the pair (x′, x″), for example "type 1" errors (a boundary "between" like region labels) and "type 2" errors (no boundary "between" unlike labels). The problem may be that there are deep local minima which are unfaithful to the data but difficult to escape from, at least without updating many sites.

VII. SUMMARY

We have developed algorithms for partitioning an image, possibly textured, into homogeneous regions and for locating boundaries at significant transitions. Both are based on a scale-dependent notion of disparity, or gradient, and both incorporate prior expectations about regular boundary or region configurations.

The disparity measure scores the difference between the statistical structures of two scale-dependent blocks of pixels. We have experimented with several measures. Ideally, the disparity will be large when there is an apparent difference, either in gray level or in texture, between the blocks. Usually, it was necessary to tune the measure to the particular textures or structures involved; a more universal measure may require both better preprocessing (e.g., first extracting reflectance from intensity [33]) and better use of "high-level" information about expected macrostructures and shapes. For texture discrimination, by either partitions or boundary placement, we introduce a class of features, or transformations, that are decidedly multivariate, depending on the spatial distribution of large numbers of pixel gray levels. Our disparity measure is then a composite of measures of differences in the histograms of the block data, under the various transformations. Low-order features, such as those derived solely from raw gray-level histograms and cooccurrence matrices, were not as effective in our framework.

Disparity measures between pairs of pixel blocks drive the segmentations or boundary placements through a "label model" that specifies likely label configurations conditional on disparity data. For partitioning, labels are generic and associated with local blocks of the image. Two labels are the same if their respective regions are judged to be instances of the same texture. For boundary placement, the labels are zero or one, interpreted as indicating, respectively, the absence or presence of boundary elements. A priori knowledge about acceptable label configurations, which, for example, may preclude very small or thin regions, or cluttered boundary elements, is applied by restricting labels to an appropriate subset of all possible configurations. The result of modeling disparity-label interactions and of defining restricted configurations can be regarded as a Gibbs distribution jointly on pixel gray levels and label configurations, with the marginal label distribution supported on a subset of the configuration space.

Partitioning and boundary finding is accomplished by approximating the maximum a posteriori (MAP) label configuration, conditioned on observed pixel data. Because certain configurations are forbidden, MAP estimation amounts to constrained optimization. Stochastic relaxation and simulated annealing are extended to accommodate constraints by introducing a nonnegative constraint function that is zero only for allowed label configurations. The constraint function, with a multiplicative constant, is added to the posterior energy, and the constant is slowly increased during relaxation. Straightforward calculations establish an upper bound on the rate of increase of this multiplicative constant that ensures convergence of the relaxation and annealing algorithms to the desired limits. In a series of partitioning and boundary-finding experiments, deterministic and other fast variations of the constrained relaxation algorithm are found to be effective.

The partitioning model is appropriate when a small number of homogeneous regions are present. Disjoint instances of a common texture are automatically identified. The boundary model can be effective in complex, multitextured scenes. Both models sometimes require prior training to adjust parameters.

ACKNOWLEDGMENT

We are indebted to E. Bienenstock for recommending the random topology employed in the partition model, and for pointing out a fascinating connection to some work in neural modeling [42].

REFERENCES

[1] E. Aarts and P. van Laarhoven, "Simulated annealing: A pedestrian review of the theory and some applications," in NATO Advanced Study Institute on Pattern Recognition: Theory and Applications, Spa, Belgium, 1986.
[2] J. Besag, "On the statistical analysis of dirty pictures" (with discussion), J. Roy. Statist. Soc., ser. B, vol. 48, pp. 259-302, 1986.
[3] A. Blake, "The least disturbance principle and weak constraints," Pattern Recog. Lett., vol. 1, pp. 393-399, 1983.
[4] E. Bonomi and J.-L. Lutton, "The n-city travelling salesman problem: Statistical mechanics and Metropolis algorithm," SIAM Rev., vol. 26, pp. 551-568, 1984.
[5] A. Brandt, "Multi-level approaches to large scale problems," in Proc. Int. Congr. Mathematicians 1986, A. M. Gleason, Ed. Providence, RI: Amer. Math. Soc., 1987.
[6] P. Brodatz, Texture: A Photographic Album for Artists and Designers. New York: Dover, 1966.
[7] T. M. Cannon, H. J. Trussell, and B. R. Hunt, "Comparison of image restoration methods," Appl. Opt., vol. 17, pp. 3385-3390, 1978.
[8] V. Cerny, "A thermodynamical approach to the travelling salesman problem: An efficient simulation algorithm," Inst. Phys. Biophys., Comenius Univ., Bratislava, Czechoslovakia, 1982.
[9] B. Chalmond, "Image restoration using an estimated Markov model," Dep. Math., Univ. Paris, Orsay, France, 1987.
[10] T.-S. Chiang and Y. Chow, "On eigenvalues and optimal annealing rate," Inst. Math., Academia Sinica, Taipei, Taiwan, preprint, 1987.
[11] F. S. Cohen and D. B. Cooper, "Simple parallel hierarchical and relaxation algorithms for segmenting noncausal Markovian random fields," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-9, pp. 195-219, 1987.
[12] H. Derin and W. S. Cole, "Segmentation of textured images using Gibbs random fields," Comput. Vision, Graphics, Image Processing, vol. 35, pp. 72-98, 1986.
[13] H. Derin and H. Elliott, "Modeling and segmentation of noisy and textured images using Gibbs random fields," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-9, pp. 39-55, 1987.
[14] H. Derin and C. Won, "A parallel image segmentation algorithm using relaxation with varying neighborhoods and its mapping to array processors," Dep. Elec. Comput. Eng., Univ. Massachusetts, Tech. Rep., 1986.
[15] P. A. Devijver and M. M. Dekesel, "Learning the parameters of a hidden Markov random field image model: A simple example," in Pattern Recognition Theory and Applications, P. A. Devijver and J. Kittler, Eds. Heidelberg: Springer-Verlag, 1987, pp. 141-163.
[16] A. Gagalowicz and Song De Ma, "Sequential synthesis of natural textures," Comput. Vision, Graphics, Image Processing, vol. 30, pp. 289-315, 1985.
[17] D. Geman, "A stochastic model for boundary detection," Image Vision Comput., May 1987.
[18] D. Geman and S. Geman, "Relaxation and annealing with constraints," Division Appl. Math., Brown Univ., Complex Systems Tech. Rep. 35, 1987.
[19] D. Geman, S. Geman, and C. Graffigne, "Locating texture and object boundaries," in Pattern Recognition Theory and Applications, P. A. Devijver and J. Kittler, Eds. Heidelberg: Springer-Verlag, 1987.
[20] S. Geman and D. Geman, "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-6, pp. 721-741, 1984.
[21] S. Geman and C. Graffigne, "Markov random field image models and their applications to computer vision," in Proc. Int. Congr. Mathematicians 1986, A. M. Gleason, Ed. Providence, RI: Amer. Math. Soc., 1987.
[22] S. Geman and D. E. McClure, "Statistical methods for tomographic image reconstruction," in Bull. ISI (Proc. 46th Session Int. Statistical Institute), vol. 52, 1987.
[23] B. Gidas, "Nonstationary Markov chains and convergence of the annealing algorithm," J. Statist. Phys., vol. 39, pp. 73-131, 1985.
[24] B. Gidas, "A renormalization group approach to image processing problems," IEEE Trans. Pattern Anal. Machine Intell., vol. 11, pp. 164-180, Feb. 1989.
[25] C. Graffigne, "Experiments in texture analysis and segmentation," Ph.D. dissertation, Division Appl. Math., Brown Univ., 1987.
[26] P. J. Green, Discussion on "On the statistical analysis of dirty pictures," by J. Besag, J. Roy. Statist. Soc., vol. B-48, pp. 259-302, 1986.
[27] D. M. Greig, B. T. Porteous, and A. H. Seheult, Discussion on "On the statistical analysis of dirty pictures," by J. Besag, J. Roy. Statist. Soc., vol. B-48, pp. 259-302, 1986.
[28] U. Grenander, "Tutorial in pattern theory," Division Appl. Math., Brown Univ., Lecture Notes Volume, 1984.
[29] W. E. L. Grimson and T. Pavlidis, "Discontinuity detection for visual surface reconstruction," Comput. Vision, Graphics, Image Processing, vol. 30, pp. 316-330, 1985.
[30] B. Hajek, "A tutorial survey of theory and applications of simulated annealing," in Proc. 24th IEEE Conf. Decision and Control, 1985, pp. 755-760.
[31] R. M. Haralick, K. Shanmugam, and I. Dinstein, "Textural features for image classification," IEEE Trans. Syst., Man, Cybern., vol. 6, pp. 610-621, 1973.
[32] R. Holley and D. Stroock, "Simulated annealing via Sobolev inequalities," preprint, 1987.
[33] B. K. P. Horn and M. J. Brooks, "The variational approach to shape from shading," Comput. Vision, Graphics, Image Processing, vol. 33, pp. 174-208, 1986.
[34] C.-R. Hwang and S.-J. Sheu, "Large time behaviors of perturbed diffusion Markov processes with applications," I, II, and III, Inst. Math., Academia Sinica, Taipei, Taiwan, preprint, 1987.
[35] A. Kashko and B. F. Buxton, "Markov random fantasies," preprint, 1986.
[36] A. Kashko, H. Buxton, and B. F. Buxton, "Parallel stochastic optimization in computer vision," preprint, 1987.
[37] R. L. Kashyap and K. Eom, "Texture boundary detection based on the long correlation model," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-2, pp. 58-67, 1989.
[38] R. L. Kashyap and A. Khotanzad, "A model-based method for rotation invariant texture classification," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-8, pp. 472-481, 1986.
[39] S. Kirkpatrick, C. D. Gelatt, Jr., and M. P. Vecchi, "Optimization by simulated annealing," Science, vol. 220, pp. 671-680, 1983.
[40] K. Laws, "Textured image segmentation," Ph.D. dissertation, Univ. Southern California, Los Angeles, USCIPI Rep. 940, 1980.
[41] R. Linsker, "An iterative-improvement penalty-function-driven wire routing system," IBM J. Res. Develop., vol. 28, pp. 613-624, 1984.
[42] C. von der Malsburg and E. Bienenstock, "Statistical coding and short-term synaptic plasticity: A scheme for knowledge representation in the brain," in Disordered Systems and Biological Organization (NATO ASI Series), E. Bienenstock, F. Fogelman Soulié, and G. Weisbuch, Eds. Berlin: Springer-Verlag, 1986.
[43] J. L. Marroquin, "Surface reconstruction preserving discontinuities," Artificial Intell. Lab., Massachusetts Inst. Technol., Memo 792, 1984.
[44] J. L. Marroquin, S. Mitter, and T. Poggio, "Probabilistic solution of ill-posed problems in computational vision," J. Amer. Statist. Assoc., vol. 82, pp. 76-89, 1987.
[45] J. W. Modestino, R. W. Fries, and A. L. Vickers, "Texture discrimination based upon an assumed stochastic texture model," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-3, pp. 557-580, 1981.
[46] J. Moussouris, "Gibbs and Markov random systems with constraints," J. Statist. Phys., vol. 10, pp. 11-33, 1974.
[47] D. W. Murray and B. F. Buxton, "Scene segmentation from visual motion using global optimization," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-9, pp. 220-228, 1987.
[48] T. Pavlidis, "A critical survey of image analysis methods," expanded version of paper delivered to the 8th Int. Conf. Pattern Recognition, Paris, France, Oct. 1986.
[49] T. Poggio, V. Torre, and C. Koch, "Computational vision and regularization theory," Nature, vol. 317, pp. 314-319, 1985.
[50] B. D. Ripley, "Statistics, images, and pattern recognition," Canadian J. Statist., vol. 14, pp. 83-111, 1986.
[51] A. Rosenfeld and L. S. Davis, "Image segmentation and image models," Proc. IEEE, vol. 67, pp. 764-772, 1979.
[52] B. W. Silverman, Discussion on "On the statistical analysis of dirty pictures," by J. Besag, J. Roy. Statist. Soc., vol. B-48, pp. 259-302, 1986.
[53] T. Simchony and R. Chellappa, "Stochastic and deterministic algorithms for texture segmentation," Dep. EE-Syst., Univ. Southern California, 1988.
[54] R. H. Swendsen and J.-S. Wang, "Nonuniversal critical dynamics in Monte Carlo simulations," Phys. Rev. Lett., vol. 58, pp. 86-88, 1987.
[55] H. Szu, "Non-convex optimization," in Proc. SPIE Conf. Real Time Signal Processing IX, vol. 698, 1986.
[56] D. Terzopoulos, "Regularization of inverse visual problems involving discontinuities," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-8, pp. 413-424, 1986.
[57] W. B. Thompson, "Textural boundary analysis," IEEE Trans. Comput., vol. C-26, pp. 272-276, 1977.
[58] E. E. Triendl, "Texturerkennung und Texturreproduktion," Kybernetik, vol. 13, pp. 1-5, 1973.
[59] E. Triendl and T. Henderson, "A model for texture edges," in Proc. 5th Int. Conf. Pattern Recognition, Miami Beach, FL, Dec. 1-4, 1980, pp. 1100-1102.
[60] H. L. Voorhees, Jr., "Finding texture boundaries in images," Master's thesis, Dep. Elec. Eng. Comput. Sci., Massachusetts Inst. Technol., 1987.
[61] W. A. Yasnoff, J. K. Mui, and J. W. Bacus, "Error measures for scene segmentation," Pattern Recog., vol. 9, pp. 217-231, 1977.
[62] A. L. Zobrist and W. B. Thompson, "Building a distance function for Gestalt grouping," IEEE Trans. Comput., vol. C-24, pp. 718-728, 1975.

Donald Geman (M'84) received the B.A. degree in English literature from the University of Illinois, Urbana, in 1965 and the Ph.D. degree in mathematics from Northwestern University, Evanston, IL, in 1970.
Since 1970 he has been a member of the Department of Mathematics and Statistics, University of Massachusetts, Amherst, where he is currently Professor of Mathematics and Statistics. His visiting appointments include those at the University of North Carolina (1976-1977), Brown University (1983), and Université de Paris-Sud (1986). His research interests are in stochastic processes, image processing, and computer vision.

Stuart Geman (M'84) received the B.A. degree in physics from the University of Michigan, Ann Arbor, in 1971, the M.S. degree in physiology from Dartmouth College, Hanover, NH, in 1973, and the Ph.D. degree in applied mathematics from the Massachusetts Institute of Technology, Cambridge, in 1977.
Since 1977 he has been a member of the Division of Applied Mathematics at Brown University, Providence, RI, where he is currently Professor of Applied Mathematics. His research interests include stochastic processes, image and speech processing, and computer vision.
Dr. Geman is a Fellow of the Institute of Mathematical Statistics and was a recipient of the Presidential Young Investigator Award.

Christine Graffigne was born in Liévin, France, on December 1, 1959. She received the Master's degree in mathematics from Lille University, France, in 1982, the French Ph.D. degree from Paris-Sud University, Orsay, France, in 1986, and the Ph.D. degree in applied mathematics from Brown University, Providence, RI, in June 1987.
From 1987 until 1988, she was an Assistant Professor at Paris-Sud University, Orsay, France. Since 1988, she has been a Research Scholar with the French National Center for Scientific Research (CNRS), and has worked at Paris-Sud University. Her research interests are focused on stochastic processes, image processing, pattern recognition, and parallel computing.

Ping Dong (M'88) was born in Zhuzhou, China, on December 2, 1954. He received the B.S. and M.S. degrees in electrical engineering from Northwest Telecommunication Engineering Institute, Xian, China, in 1979 and 1981, respectively, and the Ph.D. degree in electrical engineering from the University of Massachusetts at Amherst in 1987.
He joined the Codex, Motorola research and advanced development department, Canton, MA, in 1987. His current research interests are image segmentation and estimation algorithms, random signal processing and estimation algorithms, and communication theory.