Robust Analysis of Feature Spaces:

Color Image Segmentation

Dorin Comaniciu Peter Meer

Department of Electrical and Computer Engineering

Rutgers University, Piscataway, NJ 08855, USA

Keywords: robust pattern analysis, low-level vision, content-based indexing

Abstract

A general technique for the recovery of significant image features is presented. The technique is based on the mean shift algorithm, a simple nonparametric procedure for estimating density gradients. Drawbacks of the current methods (including robust clustering) are avoided. Feature spaces of any nature can be processed, and as an example, color image segmentation is discussed. The segmentation is completely autonomous; only its class is chosen by the user. Thus, the same program can produce a high quality edge image, or provide, by extracting all the significant colors, a preprocessor for content-based query systems. A 512 × 512 color image is analyzed in less than 10 seconds on a standard workstation. Gray level images are handled as color images having only the lightness coordinate.

1 Introduction

Feature space analysis is a widely used tool for solving low-level image understanding tasks. Given an image, feature vectors are extracted from local neighborhoods and mapped into the space spanned by their components. Significant features in the image then correspond to high density regions in this space. Feature space analysis is the procedure of recovering the centers of the high density regions, i.e., the representations of the significant image features. Histogram-based techniques and the Hough transform are examples of this approach.

When the number of distinct feature vectors is large, the size of the feature space is reduced by grouping nearby vectors into a single cell. A discretized feature space is called an accumulator. Whenever the size of the accumulator cell is not adequate for the data, serious artifacts can appear. The problem was extensively studied in the context of the Hough transform, e.g. [5]. Thus, for satisfactory results a feature space should have a continuous coordinate system. The content of a continuous feature space can be modeled as a sample from a multivariate, multimodal probability distribution. Note that for real images the number of modes can be very large, of the order of tens.

The highest density regions correspond to clusters centered on the modes of the underlying probability distribution. Traditional clustering techniques [6] can be used for feature space analysis, but they are reliable only if the number of clusters is small and known a priori. Estimating the number of clusters from the data is computationally expensive and not guaranteed to produce a satisfactory result.

An assumption used much too often is that the individual clusters obey multivariate normal distributions, i.e., the feature space can be modeled as a mixture of Gaussians. The parameters of the mixture are then estimated by minimizing an error criterion. For example, a large class of thresholding algorithms is based on the Gaussian mixture model of the histogram, e.g. [11]. However, there is no theoretical evidence that an extracted normal cluster necessarily corresponds to a significant image feature. On the contrary, a strong artifact cluster may appear when several features are mapped into partially overlapping regions.

Nonparametric density estimation [4, Chap. 6] avoids the use of the normality assumption. The two families of methods, Parzen window and k-nearest neighbors, both require additional input information (type of the kernel, number of neighbors). This information must be provided by the user, and for multimodal distributions it is difficult to guess the optimal setting.

Nevertheless, a reliable general technique for feature space analysis can be developed using a simple nonparametric density estimation algorithm. In this paper we propose such a technique whose robust behavior is superior to methods employing robust estimators from statistics.

2 Requirements for Robustness

The estimation of a cluster center is called in statistics the multivariate location problem. To be robust, an estimator must tolerate a percentage of outliers, i.e., data points not obeying the underlying distribution of the cluster. Numerous robust techniques were proposed [10, Sec. 7.1], and in computer vision the most widely used is the minimum volume ellipsoid (MVE) estimator proposed by Rousseeuw [10, p. 258].

The MVE estimator is affine equivariant (an affine transformation of the input is passed on to the estimate) and has a high breakdown point (tolerates up to half the data being outliers). The estimator finds the center of the highest density region by searching for the minimal volume ellipsoid containing at least $h$ data points. The multivariate location estimate is the center of this ellipsoid. To avoid combinatorial explosion a probabilistic search is employed. Let the dimension of the data be $p$. A small number of $(p+1)$-tuples of points are randomly chosen. For each $(p+1)$-tuple the mean vector and covariance matrix are computed, defining an ellipsoid. The ellipsoid is inflated to include $h$ points, and the one having the minimum volume provides the MVE estimate.

Based on MVE, a robust clustering technique with applications in computer vision was proposed in [7]. The data is analyzed under several "resolutions" by applying the MVE estimator repeatedly with $h$ values representing fixed percentages of the data points. The best cluster then corresponds to the $h$ value yielding the highest density inside the minimum volume ellipsoid. The cluster is removed from the feature space, and the whole procedure is repeated until the space is empty. The robustness of MVE should ensure that each cluster is associated with only one mode of the underlying distribution. The number of significant clusters is not needed a priori.

The robust clustering method was successfully employed for the analysis of a large variety of feature spaces, but was found to become less reliable once the number of modes exceeded ten. This is mainly due to the normality assumption embedded into the method. The ellipsoid defining a cluster can also be viewed as the high confidence region of a multivariate normal distribution. Arbitrary feature spaces are not mixtures of Gaussians, and constraining the shape of the removed clusters to be elliptical can introduce serious artifacts. The effect of these artifacts propagates as more and more clusters are removed. Furthermore, the estimated covariance matrices are not reliable since they are based on only $p+1$ points. Subsequent postprocessing based on all the points declared inliers cannot fully compensate for an initial error.

To be able to correctly recover a large number of significant features, the problem of feature space analysis must be solved in context. In image understanding tasks the data to be analyzed originates in the image domain. That is, the feature vectors satisfy additional, spatial constraints. While these constraints are indeed used in the current techniques, their role is mostly limited to compensating for feature allocation errors made during the independent analysis of the feature space. To be robust the feature space analysis must fully exploit the image domain information.

As a consequence of the increased role of image domain information the burden on the feature space analysis can be reduced. First all the significant features are extracted, and only after that are the clusters containing the instances of these features recovered. The latter procedure uses image domain information and avoids the normality assumption.

Significant features correspond to high density regions, and to locate these regions a search window must be employed. The number of parameters defining the shape and size of the window should be minimal, and therefore whenever it is possible the feature space should be isotropic. A space is isotropic if the distance between two points is independent of the location of the point pair. The most widely used isotropic space is the Euclidean space, where a sphere, having only one parameter (its radius), can be employed as search window. The isotropy requirement determines the mapping from the image domain to the feature space. If the isotropy condition cannot be satisfied, a Mahalanobis metric should be defined from the statement of the task.

We conclude that robust feature space analysis requires a reliable procedure for the detection of high density regions. Such a procedure is presented in the next section.

3 Mean Shift Algorithm

A simple, nonparametric technique for estimation of the density gradient was proposed in 1975 by Fukunaga and Hostetler [4, p. 534]. The idea was recently generalized by Cheng [2].

Assume, for the moment, that the probability density function $p(x)$ of the $p$-dimensional feature vectors $x$ is unimodal. This condition is for the sake of clarity only and will be removed later. A sphere $S_x$ of radius $r$, centered on $x$, contains the feature vectors $y$ such that $\| y - x \| \le r$. The expected value of the vector $z = y - x$, given $x$ and $S_x$, is

$$\mu = E[\, z \mid S_x \,] = \int_{S_x} (y - x)\, p(y \mid S_x)\, dy = \int_{S_x} (y - x)\, \frac{p(y)}{p(y \in S_x)}\, dy \tag{1}$$

If $S_x$ is sufficiently small we can approximate

$$p(y \in S_x) = p(x)\, V_{S_x} \quad \text{where} \quad V_{S_x} = c \cdot r^p \tag{2}$$

is the volume of the sphere. The first order approximation of $p(y)$ is

$$p(y) = p(x) + (y - x)^T \nabla p(x) \tag{3}$$

where $\nabla p(x)$ is the gradient of the probability density function in $x$. Then

$$\mu = \int_{S_x} \frac{(y - x)(y - x)^T}{V_{S_x}}\, \frac{\nabla p(x)}{p(x)}\, dy \tag{4}$$

since the first term vanishes. The value of the integral is [4, p. 535]

$$\mu = \frac{r^2}{p + 2}\, \frac{\nabla p(x)}{p(x)} \tag{5}$$

or

$$E[\, x \mid x \in S_x \,] - x = \frac{r^2}{p + 2}\, \frac{\nabla p(x)}{p(x)} \tag{6}$$

Thus, the mean shift vector, the vector of difference between the local mean and the center of the window, is proportional to the gradient of the probability density at $x$. The proportionality factor varies as the reciprocal of $p(x)$. This is beneficial when the highest density region of the probability density function is sought. Such a region corresponds to large $p(x)$ and small $\nabla p(x)$, i.e., to small mean shifts. On the other hand, low density regions correspond to large mean shifts (amplified also by small $p(x)$ values). The shifts are always in the direction of the probability density maximum, the mode. At the mode the mean shift is close to zero. This property can be exploited in a simple, adaptive steepest ascent algorithm.

Mean Shift Algorithm

1. Choose the radius $r$ of the search window.

2. Choose the initial location of the window.

3. Compute the mean shift vector and translate the search window by that amount.

4. Repeat until convergence.

To illustrate the ability of the mean shift algorithm, 200 data points were generated from two normal distributions, both having unit variance. The first hundred points belonged to a zero-mean distribution, the second hundred to a distribution having mean 3.5. The data is shown as a histogram in Figure 1. It should be emphasized that the feature space is processed as an ordered one-dimensional sequence of points, i.e., it is continuous. The mean shift algorithm starts from the location of the mode detected by the one-dimensional MVE mode detector, i.e., the center of the shortest rectangular window containing half the data points [10, Sec. 4.2]. Since the data is bimodal with nearby modes, the mode estimator fails and returns a location in the trough. The starting point is marked by the cross at the top of Figure 1.

Figure 1: An example of the mean shift algorithm.

In this synthetic data example no a priori information is available about the analysis window. Its size was taken equal to that returned by the MVE estimator, 3.2828. Other, more adaptive strategies for setting the search window size can also be defined.

Table 1: Evolution of Mean Shift Algorithm

Initial Mode    Initial Mean    Final Mean
1.5024          1.4149          0.1741

In Table 1 the initial values and the final location, shown with a star at the top of Figure 1, are given.

The mean shift algorithm is the tool needed for feature space analysis. The unimodality condition can be relaxed by randomly choosing the initial location of the search window. The algorithm then converges to the closest high density region. The outline of a general procedure is given below.

Feature Space Analysis

1. Map the image domain into the feature space.

2. Define an adequate number of search windows at random locations in the space.

3. Find the high density region centers by applying the mean shift algorithm to each window.

4. Validate the extracted centers with image domain constraints to provide the feature palette.

5. Allocate, using image domain information, all the feature vectors to the feature palette.

The procedure is very general and applicable to any feature space. In the next section we describe a color image segmentation technique developed based on this outline.
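As a rough illustration of the Mean Shift Algorithm outline and of Steps 2 and 3 of the Feature Space Analysis outline, the following Python sketch may help. It is not the authors' implementation: the function names `mean_shift_mode` and `find_modes`, the tolerance and iteration cap, and the rule that merges converged centers lying closer than one radius are all assumptions made for the example. The starting locations are passed in explicitly so that the behavior is reproducible; in the procedure described in the text they would be drawn at random.

```python
import math


def mean_shift_mode(points, start, radius, tol=1e-3, max_iter=500):
    """Translate a spherical window until the mean shift magnitude is below tol.

    points: sequence of equal-length tuples; start: tuple of the same length.
    """
    center = tuple(float(c) for c in start)
    for _ in range(max_iter):
        inside = [p for p in points if math.dist(p, center) <= radius]
        if not inside:                       # empty window: local mean undefined
            break
        # Local mean of the feature vectors falling inside the window.
        mean = tuple(sum(coords) / len(inside) for coords in zip(*inside))
        shift = math.dist(mean, center)      # magnitude of the mean shift vector
        center = mean                        # translate the window by the shift
        if shift < tol:
            break
    return center


def find_modes(points, starts, radius, tol=1e-3):
    """Run mean shift from every start; keep one center per distinct mode."""
    modes = []
    for start in starts:
        center = mean_shift_mode(points, start, radius, tol)
        # Assumed merging rule: centers closer than one radius are the same mode.
        if all(math.dist(center, m) >= radius for m in modes):
            modes.append(center)
    return modes


if __name__ == "__main__":
    # Two well separated one-dimensional groups, stored as 1-tuples.
    data = [(0.0,), (0.1,), (-0.1,), (0.2,), (-0.2,), (5.0,), (5.1,), (4.9,)]
    for mode in find_modes(data, starts=[(0.15,), (4.5,), (-0.2,), (5.1,)], radius=1.0):
        print(mode)
```

Each iteration replaces the window center by the local mean, which is the same as translating the window by the sample version of the mean shift vector in equation (6). The validation and allocation steps (4 and 5) need image domain information and are not covered by this sketch.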

4 Color Image Segmentation

Image segmentation, partitioning the image into homogeneous regions, is a challenging task. The richness of visual information makes bottom-up, solely image driven approaches always prone to errors. To be reliable, the current systems must be large and incorporate numerous ad-hoc procedures, e.g. [1]. The paradigms of gray level image segmentation (pixel-based, area-based, edge-based) are also used for color images. In addition, the physics-based methods take into account information about the image formation processes as well. See, for example, the reviews [8, 12]. The proposed segmentation technique does not consider the physical processes; it uses only the given image, i.e., a set of RGB vectors. Nevertheless, it can be easily extended to incorporate supplementary information about the input. Color similarity is used as the homogeneity criterion.

Since perfect segmentation cannot be achieved without a top-down, knowledge driven component, a bottom-up segmentation technique should

• only provide the input into the next stage where the task is accomplished using a priori knowledge about its goal; and

• eliminate, as much as possible, the dependence on user set parameter values.

Segmentation resolution is the most general parameter characterizing a segmentation technique. While this parameter has a continuous scale, three important classes can be distinguished.

Undersegmentation corresponds to the lowest resolution. Homogeneity is defined with a large tolerance margin and only the most significant colors are retained for the feature palette. The region boundaries in a correctly undersegmented image are the dominant edges in the image.

Oversegmentation corresponds to intermediate resolution. The feature palette is rich enough that the image is broken into many small regions from which any sought information can be assembled under knowledge control. Oversegmentation is the recommended class when the goal of the task is object recognition.

Quantization corresponds to the highest resolution. The feature palette contains all the important colors in the image. This segmentation class became important with the spread of image databases, e.g., [3, 9]. The full palette, possibly together with the underlying spatial structure, is essential for content-based queries.

The proposed color segmentation technique operates in any of these three classes. The user only chooses the desired class; the specific operating conditions are derived automatically by the program.

Images are usually stored and displayed in the RGB space. However, to ensure the isotropy of the feature space, a uniform color space with the perceived color differences measured by Euclidean distances should be used. We have chosen the $L^*u^*v^*$ space [13, Sec. 3.3.9], whose coordinates are related to the RGB values by nonlinear transformations. The daylight standard $D_{65}$ was used as reference illuminant. The chromatic information is carried by $u^*$ and $v^*$, while the lightness coordinate $L^*$ can be regarded as the relative brightness. Psychophysical experiments show that the $L^*u^*v^*$ space may not be perfectly isotropic [13, p. 311]; however, it was found satisfactory for image understanding applications. The image capture/display operations also introduce deviations which are most often neglected.

The steps of color image segmentation are presented below. The acronyms ID and FS stand for image domain and feature space respectively. All feature space computations are performed in the $L^*u^*v^*$ space.

1. [FS] Definition of the segmentation parameters.
The user only indicates the desired class of segmentation. The class definition is translated into three parameters

• the radius of the search window, $r$;

• the smallest number of elements required for a significant color, $N_{min}$;

• the smallest number of contiguous pixels required for a significant image region, $N_{con}$.

The size of the search window determines the resolution of the segmentation, smaller values corresponding to higher resolutions. The subjective (perceptual) definition of a homogeneous region seems to depend on the "visual activity" in the image. Within the same segmentation class an image containing large homogeneous regions should be analyzed at higher resolution than an image with many textured areas. The simplest measure of the "visual activity" can be derived from the global covariance matrix. The square root of its trace, $\sigma$, is related to the power of the signal (image). The radius $r$ is taken proportional to $\sigma$. The rules defining the three segmentation class parameters are given in Table 2. These rules were used in the segmentation of a large variety of images, ranging from simple blood cells to complex indoor and outdoor scenes. When the goal of the task is well defined and/or all the images are of the same type, the parameters can be fine tuned.
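The translation of a segmentation class into the three parameters of Step 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `CLASS_RULES` and `segmentation_parameters` are hypothetical, the radius factors 0.4, 0.3 and 0.2 and the counts are read from Table 2, and the covariance is normalized by the number of vectors (population variance), a choice the text does not specify. The trace of the covariance matrix is computed as the sum of the per-coordinate variances.

```python
import math
from statistics import pvariance

# (radius factor, N_min, N_con) for each segmentation class, as read from Table 2.
CLASS_RULES = {
    "undersegmentation": (0.4, 400, 10),
    "oversegmentation": (0.3, 100, 10),
    "quantization": (0.2, 50, 0),
}


def segmentation_parameters(features, seg_class):
    """Return (r, N_min, N_con) for a list of L*u*v* feature vectors.

    sigma is the square root of the trace of the global covariance matrix;
    the trace equals the sum of the variances of the three coordinates.
    Assumption: population variance (division by n), not stated in the text.
    """
    sigma = math.sqrt(sum(pvariance(channel) for channel in zip(*features)))
    factor, n_min, n_con = CLASS_RULES[seg_class]
    return factor * sigma, n_min, n_con


if __name__ == "__main__":
    # Four made-up feature vectors, just to exercise the function.
    vectors = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (2.0, 2.0, 0.0)]
    print(segmentation_parameters(vectors, "oversegmentation"))
```

Because $r$ scales with $\sigma$, an image with little visual activity (small $\sigma$) gets a smaller window and is therefore analyzed at a higher resolution within the same class, as the text requires.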

Table 2: Segmentation Class Parameters

Segmentation Class    $r$              $N_{min}$    $N_{con}$
Undersegmentation     $0.4\sigma$      400          10
Oversegmentation      $0.3\sigma$      100          10
Quantization          $0.2\sigma$      50           0

2. [ID+FS] Definition of the search window.
The initial location of the search window in the feature space is randomly chosen. To ensure that the search starts close to a high density region several location candidates are examined. The random sampling is performed in the image domain and a few, $M = 25$, pixels are chosen. For each pixel, the mean of its 3 × 3 neighborhood is computed and mapped into the feature space. If the neighborhood belongs to a larger homogeneous region, with high probability the location of the search window will be as wanted. To further increase this probability, the window containing the highest density of feature vectors is selected from the $M$ candidates.

3. [FS] Mean shift algorithm.
To locate the closest mode the mean shift algorithm is applied to the selected search window. Convergence is declared when the magnitude of the shift becomes less than 0.1.

4. [ID+FS] Removal of the detected feature.
The pixels yielding feature vectors inside the search window at its final location are discarded from both domains. Additionally, their 8-connected neighbors in the image domain are also removed independent of the feature vector value. These neighbors can have "strange" colors due to the image formation process and their removal cleans the background of the feature space. Since all pixels are reallocated in Step 7, possible errors will be corrected.

5. [ID+FS] Iterations.
Repeat Steps 2 to 4 until the number of feature vectors in the selected search window no longer exceeds $N_{min}$.

6. [ID] Determining the initial feature palette.
In the feature space a significant color must be based on a minimum of $N_{min}$ vectors. Similarly, to declare a color significant in the image domain more than $N_{min}$ pixels of that color should belong to a connected component. From the extracted colors only those are retained for the initial feature palette which yield at least one connected component in the image of size larger than $N_{min}$. The neighbors removed in Step 4 are also considered when defining the connected components. Note that the threshold is not $N_{con}$, which is used only at the postprocessing stage.

7. [ID+FS] Determining the final feature palette.
The initial feature palette provides the colors allowed when segmenting the image. If the palette is not rich enough the segmentation resolution was not chosen correctly and should be increased to the next class. All the pixels are reallocated based on this palette. First, the pixels yielding feature vectors inside the search windows at their final location are considered. These pixels are allocated to the color of the window center without taking into account image domain information. The windows are then inflated to double volume (their radius is multiplied with $\sqrt[3]{2}$). The newly incorporated pixels are retained only if they have at least one neighbor which was already allocated to that color. The mean of the feature vectors mapped into the same color is the value retained for the final palette. At the end of the allocation procedure a small number of pixels can remain unclassified. These pixels are allocated to the closest color in the final feature palette.

8. [ID+FS] Postprocessing.
This step depends on the goal of the task. The simplest procedure is the removal from the image of all small connected components of size less than $N_{con}$. These pixels are allocated to the majority color in their 3 × 3 neighborhood, or in the case of a tie to the closest color in the feature space.

In Figure 2 the house image containing 9603 different colors is shown. The segmentation results for the three classes and the region boundaries are given in Figure 5a-f. Note that undersegmentation yields a good edge map, while in the quantization class the original image is closely reproduced with only 37 colors. A second example using the oversegmentation class is shown in Figure 3. Note the details on the fuselage.

Figure 2: The house image, 255 × 192 pixels, 9603 colors.

Figure 3: Color image segmentation example. (a) Original image, 512 × 512 pixels, 77041 colors. (b) Oversegmentation: 21/21 colors.

5 Discussion

The simplicity of the basic computational module, the mean shift algorithm, enables the feature space analysis to be accomplished very fast. From a 512 × 512 pixels image a palette of 10-20 features can be extracted in less than 10 seconds on an Ultra SPARC 1 workstation. To achieve such a speed the implementation was optimized and whenever possible, the feature space (containing fewer distinct elements than the image domain) was used for array scanning; lookup tables were employed instead of frequently repeated computations; direct addressing instead of nested pointers; fixed point arithmetic instead of floating point calculations; partial computation of the Euclidean distances, etc.

The analysis of the feature space is completely autonomous, due to the extensive use of image domain information. All the examples in this paper, and dozens more not shown here, were processed using the parameter values given in Table 2. Recently Zhu and Yuille [14] described a segmentation technique incorporating complex global optimization methods (snakes, minimum description length) with sensitive parameters and thresholds. To segment a color image over a hundred iterations were needed. When the images used in [14] were processed with the technique described in this paper, results of the same quality were obtained unsupervised and in less than a second. Figure 4 shows one of the results, to be compared with Figure 14h in [14]. The new technique can be used unmodified for segmenting gray level images, which are handled as color images with only the $L^*$ coordinates. In Figure 6 an example is shown.

Figure 4: Performance comparison. (a) Original image, 116 × 261 pixels, 200 colors. (b) Undersegmentation: 5/4 colors. Region boundaries.

The result of segmentation can be further refined by local processing in the image domain. For example, robust analysis of the pixels in a large connected component yields the inlier/outlier dichotomy which then can be used to recover discarded fine details.

In conclusion, we have presented a general technique for feature space analysis with applications in many low-level vision tasks like thresholding, edge detection, segmentation. The nature of the feature space is not restricted; currently we are working on applying the technique to range image segmentation, Hough transform and optical flow decomposition.
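The iterative extraction of Steps 2 to 5 in Section 4 can be sketched, for the feature space only, as the following Python loop. It is a simplified illustration and not the authors' program: the image domain parts are left out (candidates are taken directly from the feature vectors instead of from 3 × 3 neighborhood means, and the 8-connected neighbors are not removed), and the function name `extract_colors`, the seed and the cap of 1000 mean shift iterations are assumptions. The values $M = 25$ and the convergence threshold 0.1 are those given in the text.

```python
import math
import random


def extract_colors(features, radius, n_min, n_candidates=25, tol=0.1, seed=0):
    """Feature-space-only sketch of Steps 2 to 5; image domain handling omitted."""
    rng = random.Random(seed)
    remaining = [tuple(float(c) for c in f) for f in features]
    palette = []

    def window(center):
        # Feature vectors currently inside the spherical search window.
        return [f for f in remaining if math.dist(f, center) <= radius]

    while remaining:
        # Step 2: among a few random candidates, start from the densest window.
        candidates = rng.sample(remaining, min(n_candidates, len(remaining)))
        center = max(candidates, key=lambda c: len(window(c)))
        # Step 3: mean shift until the shift magnitude falls below tol.
        for _ in range(1000):
            inside = window(center)
            if not inside:
                break
            mean = tuple(sum(coords) / len(inside) for coords in zip(*inside))
            shift = math.dist(mean, center)
            center = mean
            if shift < tol:
                break
        inside = window(center)
        # Step 5: stop once the window no longer holds more than n_min vectors.
        if len(inside) <= n_min:
            break
        palette.append(center)
        # Step 4 (feature space part only): discard the vectors inside the window.
        removed = set(inside)
        remaining = [f for f in remaining if f not in removed]
    return palette


if __name__ == "__main__":
    group_a = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
    group_b = [(a + 100, b + 100, c + 100) for a, b, c in group_a]
    for color in sorted(extract_colors(group_a + group_b, radius=5.0, n_min=2)):
        print(color)
```

The centers returned by this loop correspond to the colors extracted before validation; Steps 6 to 8 then use connected components in the image domain to build and apply the feature palette.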

Figure 5: The three segmentation classes for the house image. The right column shows the region boundaries. (a)(b) Undersegmentation. Number of colors extracted initially and in the feature palette: 8/8. (c)(d) Oversegmentation: 24/19 colors. (e)(f) Quantization: 49/37 colors.

Figure 6: Gray level image segmentation example. (a) Original image, 256 × 256 pixels. (b) Undersegmentation: 5 gray levels. (c) Region boundaries.

Acknowledgement

The research was supported by the National Science Foundation under the grant IRI-9530546.

References

[1] J.R. Beveridge, J. Griffith, R.R. Kohler, A.R. Hanson, E.M. Riseman, "Segmenting images using localized histograms and region merging", Int'l. J. of Comp. Vis., vol. 2, 311-347, 1989.

[2] Y. Cheng, "Mean shift, mode seeking, and clustering", IEEE Trans. Pattern Anal. Machine Intell., vol. 17, 790-799, 1995.

[3] M. Flickner et al., "Query by image and video content: The QBIC system", Computer, vol. 28, no. 9, 23-32, 1995.

[4] K. Fukunaga, Introduction to Statistical Pattern Recognition, Second Ed., Boston: Academic Press, 1990.

[5] J. Illingworth, J. Kittler, "A survey of the Hough transform", Comp. Vis., Graph. and Imag. Proc., vol. 44, 87-116, 1988.

[6] A.K. Jain, R.C. Dubes, Algorithms for Clustering Data, Englewood Cliffs, NJ: Prentice Hall, 1988.

[7] J.-M. Jolion, P. Meer, S. Bataouche, "Robust clustering with applications in computer vision", IEEE Trans. Pattern Anal. Machine Intell., vol. 13, 791-802, 1991.

[8] Q.T. Luong, "Color in computer vision", in Handbook of Pattern Recognition and Computer Vision, C.H. Chen, L.F. Pau, and P.S.P. Wang (Eds.), Singapore: World Scientific, 311-368, 1993.

[9] A. Pentland, R.W. Picard, S. Sclaroff, "Photobook: Content-based manipulation of image databases", Int'l. J. of Comp. Vis., vol. 18, 233-254, 1996.

[10] P.J. Rousseeuw, A.M. Leroy, Robust Regression and Outlier Detection, New York: Wiley, 1987.

[11] P.K. Sahoo, S. Soltani, A.K.C. Wong, "A survey of thresholding techniques", Comp. Vis., Graph. and Imag. Proc., vol. 41, 233-260, 1988.

[12] W. Skarbek, A. Koschan, Colour Image Segmentation - A Survey, Technical Report 94-32, Technical University Berlin, October 1994.

[13] G. Wyszecki, W.S. Stiles, Color Science: Concepts and Methods, Quantitative Data and Formulae, Second Ed., New York: Wiley, 1982.

[14] S.C. Zhu, A. Yuille, "Region competition: Unifying snakes, region growing, and Bayes/MDL for multiband image segmentation", IEEE Trans. Pattern Anal. Machine Intell., vol. 18, 884-900, 1996.