Robust Analysis of Feature Spaces:
Color Image Segmentation
Dorin Comaniciu Peter Meer
Department of Electrical and Computer Engineering
Rutgers University, Piscataway, NJ 08855, USA
Keywords: robust pattern analysis, low-level vision, content-based indexing
Abstract

A general technique for the recovery of significant image features is presented. The technique is based on the mean shift algorithm, a simple nonparametric procedure for estimating density gradients. Drawbacks of the current methods (including robust clustering) are avoided. A feature space of any nature can be processed, and as an example, color image segmentation is discussed. The segmentation is completely autonomous; only its class is chosen by the user. Thus, the same program can produce a high quality edge image, or provide, by extracting all the significant colors, a preprocessor for content-based query systems. A 512 × 512 color image is analyzed in less than 10 seconds on a standard workstation. Gray level images are handled as color images having only the lightness coordinate.

1 Introduction

Feature space analysis is a widely used tool for solving low-level image understanding tasks. Given an image, feature vectors are extracted from local neighborhoods and mapped into the space spanned by their components. Significant features in the image then correspond to high density regions in this space. Feature space analysis is the procedure of recovering the centers of the high density regions, i.e., the representations of the significant image features. Histogram based techniques and the Hough transform are examples of the approach.

When the number of distinct feature vectors is large, the size of the feature space is reduced by grouping nearby vectors into a single cell. A discretized feature space is called an accumulator. Whenever the size of the accumulator cell is not adequate for the data, serious artifacts can appear. The problem was extensively studied in the context of the Hough transform, e.g. [5]. Thus, for satisfactory results a feature space should have a continuous coordinate system. The content of a continuous feature space can be modeled as a sample from a multivariate, multimodal probability distribution. Note that for real images the number of modes can be very large, of the order of tens.

The highest density regions correspond to clusters centered on the modes of the underlying probability distribution. Traditional clustering techniques [6] can be used for feature space analysis, but they are reliable only if the number of clusters is small and known a priori. Estimating the number of clusters from the data is computationally expensive and not guaranteed to produce a satisfactory result.

A much too often used assumption is that the individual clusters obey multivariate normal distributions, i.e., the feature space can be modeled as a mixture of Gaussians. The parameters of the mixture are then estimated by minimizing an error criterion. For example, a large class of thresholding algorithms is based on the Gaussian mixture model of the histogram, e.g. [11]. However, there is no theoretical evidence that an extracted normal cluster necessarily corresponds to a significant image feature. On the contrary, a strong artifact cluster may appear when several features are mapped into partially overlapping regions.

Nonparametric density estimation [4, Chap. 6] avoids the use of the normality assumption. The two families of methods, Parzen window and k-nearest neighbors, both require additional input information (the type of the kernel, the number of neighbors). This information must be provided by the user, and for multimodal distributions it is difficult to guess the optimal setting.

Nevertheless, a reliable general technique for feature space analysis can be developed using a simple nonparametric density estimation algorithm. In this paper we propose such a technique whose robust behavior is superior to methods employing robust estimators from statistics.

2 Requirements for Robustness

Estimation of a cluster center is called in statistics the multivariate location problem. To be robust, an estimator must tolerate a percentage of outliers, i.e., data points not obeying the underlying distribution
of the cluster. Numerous robust techniques were proposed [10, Sec. 7.1], and in computer vision the most widely used is the minimum volume ellipsoid (MVE) estimator proposed by Rousseeuw [10, p. 258].

The MVE estimator is affine equivariant (an affine transformation of the input is passed on to the estimate) and has a high breakdown point (it tolerates up to half the data being outliers). The estimator finds the center of the highest density region by searching for the minimal volume ellipsoid containing at least h data points. The multivariate location estimate is the center of this ellipsoid. To avoid combinatorial explosion a probabilistic search is employed. Let the dimension of the data be p. A small number of (p+1)-tuples of points are randomly chosen. For each (p+1)-tuple the mean vector and covariance matrix are computed, defining an ellipsoid. The ellipsoid is inflated to include h points, and the one having the minimum volume provides the MVE estimate.

Based on MVE, a robust clustering technique with applications in computer vision was proposed in [7]. The data is analyzed under several "resolutions" by applying the MVE estimator repeatedly with h values representing fixed percentages of the data points. The best cluster then corresponds to the h value yielding the highest density inside the minimum volume ellipsoid. The cluster is removed from the feature space, and the whole procedure is repeated till the space is empty. The robustness of MVE should ensure that each cluster is associated with only one mode of the underlying distribution. The number of significant clusters is not needed a priori.

The robust clustering method was successfully employed for the analysis of a large variety of feature spaces, but was found to become less reliable once the number of modes exceeded ten. This is mainly due to the normality assumption embedded into the method. The ellipsoid defining a cluster can also be viewed as the high confidence region of a multivariate normal distribution. Arbitrary feature spaces are not mixtures of Gaussians, and constraining the shape of the removed clusters to be elliptical can introduce serious artifacts. The effect of these artifacts propagates as more and more clusters are removed. Furthermore, the estimated covariance matrices are not reliable since they are based on only p+1 points. Subsequent postprocessing based on all the points declared inliers cannot fully compensate for an initial error.

To be able to correctly recover a large number of significant features, the problem of feature space analysis must be solved in context. In image understanding tasks the data to be analyzed originates in the image domain. That is, the feature vectors satisfy additional, spatial constraints. While these constraints are indeed used in the current techniques, their role is mostly limited to compensating for feature allocation errors made during the independent analysis of the feature space. To be robust the feature space analysis must fully exploit the image domain information.

As a consequence of the increased role of image domain information the burden on the feature space analysis can be reduced. First all the significant features are extracted, and only then are the clusters containing the instances of these features recovered. The latter procedure uses image domain information and avoids the normality assumption.

Significant features correspond to high density regions, and to locate these regions a search window must be employed. The number of parameters defining the shape and size of the window should be minimal, and therefore whenever possible the feature space should be isotropic. A space is isotropic if the distance between two points is independent of the location of the point pair. The most widely used isotropic space is the Euclidean space, where a sphere, having only one parameter (its radius), can be employed as search window. The isotropy requirement determines the mapping from the image domain to the feature space. If the isotropy condition cannot be satisfied, a Mahalanobis metric should be defined from the statement of the task.

We conclude that robust feature space analysis requires a reliable procedure for the detection of high density regions. Such a procedure is presented in the next section.

3 Mean Shift Algorithm

A simple, nonparametric technique for estimation of the density gradient was proposed in 1975 by Fukunaga and Hostetler [4, p. 534]. The idea was recently generalized by Cheng [2].

Assume, for the moment, that the probability density function p(x) of the p-dimensional feature vectors x is unimodal. This condition is for the sake of clarity only; it will be removed later. A sphere S_x of radius r, centered on x, contains the feature vectors y such that ||y - x|| <= r. The expected value of the vector z = y - x, given x and S_x, is

    \mu = E[z \mid S_x] = \int_{S_x} (y - x)\, p(y \mid S_x)\, dy
        = \int_{S_x} (y - x)\, \frac{p(y)}{p(y \in S_x)}\, dy                 (1)
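The expectation above, and the mean shift relation it leads to in Eq. (6), can be checked numerically. The sketch below compares the empirical local mean shift with r^2/(p+2) * grad p(x)/p(x) for a one-dimensional standard normal; the sample size, window radius and evaluation point are illustrative choices, not values from the paper.

```python
import numpy as np

# Numerical check of the mean shift relation (Eq. 6):
#   E[x | x in S_x] - x  ~  r^2/(p+2) * grad p(x) / p(x)
# For a 1-D standard normal, grad p(x)/p(x) = -x.
rng = np.random.default_rng(0)
samples = rng.standard_normal(1_000_000)        # p = 1

x, r = 0.5, 0.3                                 # arbitrary test point and radius
in_window = samples[np.abs(samples - x) <= r]   # points inside S_x
empirical_shift = in_window.mean() - x          # local mean minus window center
theoretical_shift = r**2 / (1 + 2) * (-x)       # r^2/(p+2) * grad p / p = -0.015

print(f"empirical {empirical_shift:+.4f}  theoretical {theoretical_shift:+.4f}")
```

With a million samples the two quantities agree closely, illustrating that the local mean shift indeed points along the density gradient.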
If S_x is sufficiently small we can approximate

    p(y \in S_x) = p(x)\, V_{S_x}, \qquad V_{S_x} = c\, r^p                   (2)

where V_{S_x} is the volume of the sphere. The first order approximation of p(y) is

    p(y) = p(x) + (y - x)^T \nabla p(x)                                       (3)

where \nabla p(x) is the gradient of the probability density function in x. Then

    \mu = \int_{S_x} (y - x)(y - x)^T\, \frac{\nabla p(x)}{V_{S_x}\, p(x)}\, dy    (4)

since the first term vanishes. The value of the integral is [4, p. 535]

    \mu = \frac{r^2}{p+2}\, \frac{\nabla p(x)}{p(x)}                          (5)

or

    E[x \mid x \in S_x] - x = \frac{r^2}{p+2}\, \frac{\nabla p(x)}{p(x)}      (6)

Thus, the mean shift vector, the vector of difference between the local mean and the center of the window, is proportional to the gradient of the probability density at x. The proportionality factor is the reciprocal of p(x). This is beneficial when the highest density region of the probability density function is sought. Such a region corresponds to large p(x) and small \nabla p(x), i.e., to small mean shifts. On the other hand, low density regions correspond to large mean shifts (amplified also by small p(x) values). The shifts are always in the direction of the probability density maximum, the mode. At the mode the mean shift is close to zero. This property can be exploited in a simple, adaptive steepest ascent algorithm.

Mean Shift Algorithm
1. Choose the radius r of the search window.
2. Choose the initial location of the window.
3. Compute the mean shift vector and translate the search window by that amount.
4. Repeat till convergence.

To illustrate the ability of the mean shift algorithm, 200 data points were generated from two normal distributions, both having unit variance. The first hundred points belonged to a zero-mean distribution, the second hundred to a distribution having mean 3.5. The data is shown as a histogram in Figure 1. It should be emphasized that the feature space is processed as an ordered one-dimensional sequence of points, i.e., it is continuous. The mean shift algorithm starts from the location of the mode detected by the one-dimensional MVE mode detector, i.e., the center of the shortest rectangular window containing half the data points [10, Sec. 4.2]. Since the data is bimodal with nearby modes, the mode estimator fails and returns a location in the trough. The starting point is marked by the cross at the top of Figure 1.

Figure 1: An example of the mean shift algorithm.

In this synthetic data example no a priori information is available about the analysis window. Its size was taken equal to that returned by the MVE estimator, 3.2828. Other, more adaptive strategies for setting the search window size can also be defined.

Table 1: Evolution of the Mean Shift Algorithm

    Initial Mode    Initial Mean    Final Mean
    1.5024          1.4149          0.1741

In Table 1 the initial values and the final location, shown with a star at the top of Figure 1, are given.

The mean shift algorithm is the tool needed for feature space analysis. The unimodality condition can be relaxed by randomly choosing the initial location of the search window. The algorithm then converges to the closest high density region. The outline of a general procedure is given below.

Feature Space Analysis
1. Map the image domain into the feature space.
2. Define an adequate number of search windows at random locations in the space.
3. Find the high density region centers by applying the mean shift algorithm to each window.
4. Validate the extracted centers with image domain constraints to provide the feature palette.
5. Allocate, using image domain information, all the feature vectors to the feature palette.

The procedure is very general and applicable to any feature space. In the next section we describe a color image segmentation technique developed based on this outline.
4 Color Image Segmentation

Image segmentation, partitioning the image into homogeneous regions, is a challenging task. The richness of visual information makes bottom-up, solely image driven approaches always prone to errors. To be reliable, the current systems must be large and incorporate numerous ad-hoc procedures, e.g. [1]. The paradigms of gray level image segmentation (pixel-based, area-based, edge-based) are also used for color images. In addition, the physics-based methods take into account information about the image formation processes as well. See, for example, the reviews [8, 12].

The proposed segmentation technique does not consider the physical processes; it uses only the given image, i.e., a set of RGB vectors. Nevertheless, it can be easily extended to incorporate supplementary information about the input. As homogeneity criterion color similarity is used.

Since perfect segmentation cannot be achieved without a top-down, knowledge driven component, a bottom-up segmentation technique should
- only provide the input into the next stage, where the task is accomplished using a priori knowledge about its goal; and
- eliminate, as much as possible, the dependence on user set parameter values.

Segmentation resolution is the most general parameter characterizing a segmentation technique. While this parameter has a continuous scale, three important classes can be distinguished.
- Undersegmentation corresponds to the lowest resolution. Homogeneity is defined with a large tolerance margin and only the most significant colors are retained for the feature palette. The region boundaries in a correctly undersegmented image are the dominant edges in the image.
- Oversegmentation corresponds to intermediate resolution. The feature palette is rich enough that the image is broken into many small regions from which any sought information can be assembled under knowledge control. Oversegmentation is the recommended class when the goal of the task is object recognition.
- Quantization corresponds to the highest resolution. The feature palette contains all the important colors in the image. This segmentation class became important with the spread of image databases, e.g., [3, 9]. The full palette, possibly together with the underlying spatial structure, is essential for content-based queries.

The proposed color segmentation technique operates in any of these three classes. The user only chooses the desired class; the specific operating conditions are derived automatically by the program.

Images are usually stored and displayed in the RGB space. However, to ensure the isotropy of the feature space, a uniform color space with the perceived color differences measured by Euclidean distances should be used. We have chosen the L*u*v* space [13, Sec. 3.3.9], whose coordinates are related to the RGB values by nonlinear transformations. The daylight standard D65 was used as reference illuminant. The chromatic information is carried by u* and v*, while the lightness coordinate L* can be regarded as the relative brightness. Psychophysical experiments show that the L*u*v* space may not be perfectly isotropic [13, p. 311]; however, it was found satisfactory for image understanding applications. The image capture/display operations also introduce deviations which are most often neglected.

The steps of color image segmentation are presented below. The acronyms ID and FS stand for image domain and feature space respectively. All feature space computations are performed in the L*u*v* space.

1. [FS] Definition of the segmentation parameters.
The user only indicates the desired class of segmentation. The class definition is translated into three parameters:
- the radius of the search window, r;
- the smallest number of elements required for a significant color, N_min;
- the smallest number of contiguous pixels required for a significant image region, N_con.
The size of the search window determines the resolution of the segmentation, smaller values corresponding to higher resolutions. The subjective (perceptual) definition of a homogeneous region seems to depend on the "visual activity" in the image. Within the same segmentation class an image containing large homogeneous regions should be analyzed at higher resolution than an image with many textured areas. The simplest measure of the "visual activity" can be derived from the global covariance matrix. The square root of its trace, σ, is related to the power of the signal (image). The radius r is taken proportional to σ. The rules defining the three segmentation class parameters are given in Table 2. These rules were used in the segmentation of a large variety of images, ranging from simple blood cells to complex indoor and outdoor scenes. When the goal of the task is well defined and/or all the images are of the same type, the parameters can be fine tuned.
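The nonlinear RGB to L*u*v* mapping can be sketched as follows. The paper only states that CIE L*u*v* with illuminant D65 is used; the sRGB-like primaries assumed here, and the standard CIE L*u*v* formulas, are illustrative assumptions rather than the authors' exact transformation.

```python
import numpy as np

M = np.array([[0.4124, 0.3576, 0.1805],     # linear RGB -> XYZ (sRGB primaries, assumed)
              [0.2126, 0.7152, 0.0722],
              [0.0193, 0.1192, 0.9505]])
Xn, Yn, Zn = 0.9505, 1.0, 1.089             # D65 reference white
un = 4 * Xn / (Xn + 15 * Yn + 3 * Zn)       # u', v' of the white point
vn = 9 * Yn / (Xn + 15 * Yn + 3 * Zn)

def rgb_to_luv(rgb):
    """Map one linear RGB triplet (components in [0, 1]) to (L*, u*, v*)."""
    X, Y, Z = M @ np.asarray(rgb, dtype=float)
    # CIE lightness: cube root above the dark threshold, linear below it.
    L = 116.0 * (Y / Yn) ** (1 / 3) - 16.0 if Y / Yn > 0.008856 else 903.3 * Y / Yn
    d = X + 15 * Y + 3 * Z
    if d == 0:                              # black maps to the origin
        return 0.0, 0.0, 0.0
    u = 13 * L * (4 * X / d - un)           # chromatic coordinates
    v = 13 * L * (9 * Y / d - vn)
    return L, u, v

print(rgb_to_luv([1.0, 1.0, 1.0]))  # reference white: L* = 100, u* = v* = 0
```

Euclidean distances computed on the resulting (L*, u*, v*) triplets approximate perceived color differences, which is what makes a single spherical search window workable in the feature space.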
Table 2: Segmentation Class Parameters

    Class                r       N_min   N_con
    Undersegmentation    0.4σ    400     10
    Oversegmentation     0.3σ    100     10
    Quantization         0.2σ    50      0

2. [ID+FS] Definition of the search window.
The initial location of the search window in the feature space is randomly chosen. To ensure that the search starts close to a high density region several location candidates are examined. The random sampling is performed in the image domain and a few, M = 25, pixels are chosen. For each pixel, the mean of its 3 × 3 neighborhood is computed and mapped into the feature space. If the neighborhood belongs to a larger homogeneous region, with high probability the location of the search window will be as wanted. To further increase this probability, the window containing the highest density of feature vectors is selected from the M candidates.

3. [FS] Mean shift algorithm.
To locate the closest mode the mean shift algorithm is applied to the selected search window. Convergence is declared when the magnitude of the shift becomes less than 0.1.

4. [ID+FS] Removal of the detected feature.
The pixels yielding feature vectors inside the search window at its final location are discarded from both domains. Additionally, their 8-connected neighbors in the image domain are also removed, independent of the feature vector value. These neighbors can have "strange" colors due to the image formation process and their removal cleans the background of the feature space. Since all pixels are reallocated in Step 7, possible errors will be corrected.

5. [ID+FS] Iterations.
Repeat Steps 2 to 4, till the number of feature vectors in the selected search window no longer exceeds N_min.

6. [ID] Determining the initial feature palette.
In the feature space a significant color must be based on a minimum of N_min vectors. Similarly, to declare a color significant in the image domain more than N_min pixels of that color should belong to a connected component. From the extracted colors only those are retained for the initial feature palette which yield at least one connected component in the image of size larger than N_min. The neighbors removed at Step 4 are also considered when defining the connected components. Note that the threshold is not N_con, which is used only at the postprocessing stage.

7. [ID+FS] Determining the final feature palette.
The initial feature palette provides the colors allowed when segmenting the image. If the palette is not rich enough the segmentation resolution was not chosen correctly and should be increased to the next class. All the pixels are reallocated based on this palette. First, the pixels yielding feature vectors inside the search windows at their final location are considered. These pixels are allocated to the color of the window center without taking into account image domain information. The windows are then inflated to double volume (their radius is multiplied by the cube root of 2). The newly incorporated pixels are retained only if they have at least one neighbor which was already allocated to that color. The mean of the feature vectors mapped into the same color is the value retained for the final palette. At the end of the allocation procedure a small number of pixels can remain unclassified. These pixels are allocated to the closest color in the final feature palette.

8. [ID+FS] Postprocessing.
This step depends on the goal of the task. The simplest procedure is the removal from the image of all small connected components of size less than N_con. These pixels are allocated to the majority color in their 3 × 3 neighborhood, or in the case of a tie to the closest color in the feature space.

In Figure 2 the house image containing 9603 different colors is shown. The segmentation results for the three classes and the region boundaries are given in Figure 5a-f. Note that undersegmentation yields a good edge map, while in the quantization class the original image is closely reproduced with only 37 colors. A second example using the oversegmentation class is shown in Figure 3. Note the details on the fuselage.

5 Discussion

The simplicity of the basic computational module, the mean shift algorithm, enables the feature space analysis to be accomplished very fast. From a 512 × 512 pixels image a palette of 10-20 features can be extracted in less than 10 seconds on an Ultra SPARC 1 workstation. To achieve such a speed the implementation was optimized: whenever possible, the feature space (containing fewer distinct elements than the image domain) was used for array scanning; lookup tables were employed instead of frequently repeated computations; direct addressing instead of nested pointers; fixed point arithmetic instead of floating point calculations; partial computation of the Euclidean distances; etc.

Figure 2: The house image, 255 × 192 pixels, 9603 colors.

The analysis of the feature space is completely autonomous, due to the extensive use of image domain information. All the examples in this paper, and dozens more not shown here, were processed using the parameter values given in Table 2. Recently Zhu and Yuille [14] described a segmentation technique incorporating complex global optimization methods (snakes, minimum description length) with sensitive parameters and thresholds. To segment a color image over a hundred iterations were needed. When the images used in [14] were processed with the technique described in this paper, the same quality results were obtained unsupervised and in less than a second. Figure 4 shows one of the results, to be compared with Figure 14h in [14]. The new technique can be used unmodified for segmenting gray level images, which are handled as color images with only the L* coordinate. In Figure 6 an example is shown.

Figure 3: Color image segmentation example. (a) Original image, 512 × 512 pixels, 77041 colors. (b) Oversegmentation: 21/21 colors.

The result of segmentation can be further refined by local processing in the image domain. For example, robust analysis of the pixels in a large connected component yields the inlier/outlier dichotomy which then can be used to recover discarded fine details.

In conclusion, we have presented a general technique for feature space analysis with applications in many low-level vision tasks like thresholding, edge detection, and segmentation. The nature of the feature space is not restricted; currently we are working on applying the technique to range image segmentation, the Hough transform, and optical flow decomposition.

Figure 4: Performance comparison. (a) Original image, 116 × 261 pixels, 200 colors. (b) Undersegmentation: 5/4 colors. Region boundaries.
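The iterative feature extraction loop (Steps 2 to 5 of the segmentation procedure) can be sketched on a toy one-dimensional feature space standing in for L*u*v*. The cluster locations, point counts, window radius and thresholds below are illustrative assumptions, not the paper's data or parameter rules.

```python
import numpy as np

rng = np.random.default_rng(2)
# Toy 1-D feature space: three "colors" centered at 0, 5 and 9.
features = np.concatenate([rng.normal(0, 0.3, 300),
                           rng.normal(5, 0.3, 200),
                           rng.normal(9, 0.3, 80)])

r, N_min, M = 0.5, 50, 25   # window radius, significance threshold, candidates

def count(center):
    """Number of feature vectors inside the window at this center."""
    return np.sum(np.abs(features - center) <= r)

palette = []
while features.size:
    # Step 2: pick the densest of M random starting locations.
    cands = rng.choice(features, size=min(M, features.size), replace=False)
    center = max(cands, key=count)
    # Step 3: mean shift until the shift becomes small.
    for _ in range(100):
        window = features[np.abs(features - center) <= r]
        shift = window.mean() - center
        center += shift
        if abs(shift) < 0.01:
            break
    window_mask = np.abs(features - center) <= r
    # Step 5: stop when no significant feature is left.
    if window_mask.sum() < N_min:
        break
    palette.append(float(center))        # record the feature ...
    features = features[~window_mask]    # Step 4: ... and remove its points

print(len(palette), [round(c, 2) for c in palette])
```

Each pass removes the points of the detected feature, so subsequent searches are attracted by the remaining, weaker modes; the loop terminates once no window gathers N_min vectors.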
Figure 5: The three segmentation classes for the house image. The right column shows the region boundaries. (a)(b) Undersegmentation. Number of colors extracted initially and in the feature palette: 8/8. (c)(d) Oversegmentation: 24/19 colors. (e)(f) Quantization: 49/37 colors.

Figure 6: Gray level image segmentation example. (a) Original image, 256 × 256 pixels. (b) Undersegmentation: 5 gray levels. (c) Region boundaries.

Acknowledgement

The research was supported by the National Science Foundation under the grant IRI-9530546.

References

[1] J.R. Beveridge, J. Griffith, R.R. Kohler, A.R. Hanson, E.M. Riseman, "Segmenting images using localized histograms and region merging", Int'l. J. of Comp. Vis., vol. 2, 311-347, 1989.
[2] Y. Cheng, "Mean shift, mode seeking, and clustering", IEEE Trans. Pattern Anal. Machine Intell., vol. 17, 790-799, 1995.
[3] M. Flickner et al., "Query by image and video content: The QBIC system", Computer, vol. 28, no. 9, 23-32, 1995.
[4] K. Fukunaga, Introduction to Statistical Pattern Recognition, Second Ed., Boston: Academic Press, 1990.
[5] J. Illingworth, J. Kittler, "A survey of the Hough transform", Comp. Vis., Graph. and Imag. Proc., vol. 44, 87-116, 1988.
[6] A.K. Jain, R.C. Dubes, Algorithms for Clustering Data, Englewood Cliffs, NJ: Prentice Hall, 1988.
[7] J.-M. Jolion, P. Meer, S. Bataouche, "Robust clustering with applications in computer vision", IEEE Trans. Pattern Anal. Machine Intell., vol. 13, 791-802, 1991.
[8] Q.T. Luong, "Color in computer vision", in Handbook of Pattern Recognition and Computer Vision, C.H. Chen, L.F. Pau, and P.S.P. Wang (Eds.), Singapore: World Scientific, 311-368, 1993.
[9] A. Pentland, R.W. Picard, S. Sclaroff, "Photobook: Content-based manipulation of image databases", Int'l. J. of Comp. Vis., vol. 18, 233-254, 1996.
[10] P.J. Rousseeuw, A.M. Leroy, Robust Regression and Outlier Detection, New York: Wiley, 1987.
[11] P.K. Sahoo, S. Soltani, A.K.C. Wong, "A survey of thresholding techniques", Comp. Vis., Graph. and Imag. Proc., vol. 41, 233-260, 1988.
[12] W. Skarbek, A. Koschan, Colour Image Segmentation - A Survey, Technical Report 94-32, Technical University Berlin, October 1994.
[13] G. Wyszecki, W.S. Stiles, Color Science: Concepts and Methods, Quantitative Data and Formulae, Second Ed., New York: Wiley, 1982.
[14] S.C. Zhu, A. Yuille, "Region competition: Unifying snakes, region growing, and Bayes/MDL for multiband image segmentation", IEEE Trans. Pattern Anal. Machine Intell., vol. 18, 884-900, 1996.