Mean Shift Analysis and Applications

Dorin Comaniciu Peter Meer

Department of Electrical and Computer Engineering

Rutgers University, Piscataway, NJ 08854-8058, USA

fcomanici, [email protected]

Abstract

A nonparametric estimator of density gradient, the mean shift, is employed in the joint, spatial-range (value) domain of gray level and color images for discontinuity preserving filtering and image segmentation. Properties of the mean shift are reviewed and its convergence on lattices is proven. The proposed filtering method associates with each pixel in the image the closest local mode in the density distribution of the joint domain. Segmentation into a piecewise constant structure requires only one more step, fusion of the regions associated with nearby modes. The proposed technique has two parameters controlling the resolution in the spatial and range domains. Since convergence is guaranteed, the technique does not require the intervention of the user to stop the filtering at the desired image quality. Several examples, for gray and color images, show the versatility of the method and compare favorably with results described in the literature for the same images.

1 Introduction

Low level tasks are misleadingly difficult and often yield unreliable results, since the employed techniques rely upon the correct choice by the user of the tuning parameter values. Today, it is an accepted fact in the vision community that the execution of low level tasks should be task driven, i.e., supported by independent high level information. To be able to successfully complement this paradigm, the low-level techniques must become more autonomous. In this paper we propose such a technique for image smoothing and for segmentation.

The mean shift estimate of the gradient of a density function and the associated iterative procedure of mode seeking have been developed by Fukunaga and Hostetler in [6]. Only recently, however, the nice properties of data compaction and dimensionality reduction of the mean shift have been exploited in low level computer vision tasks (color space analysis [3], face tracking [1]).

In this paper we describe a new application based on the theoretical results obtained in [4]. We show that high quality edge preserving filtering and image segmentation can be obtained by applying the mean shift in the combined spatial-range domain. The methods we developed are conceptually very simple, being based on the same idea of iteratively shifting a fixed size window to the average of the data points within. Details in the image are preserved due to the nonparametric character of the analysis which does not assume a priori any particular structure for the data.

The paper is organized as follows. Section 2 discusses the estimation of the density gradient and defines the mean shift vector. The convergence of the mean shift procedure is proven in Section 3 for discrete data. Section 4 defines the processing principle in the joint spatial-range domain. Mean shift filtering is explained and filtering examples are given in Section 5. The proposed mean shift segmentation is introduced and analyzed in Section 6.

2 Density Gradient Estimation

Let $\{x_i\}_{i=1 \ldots n}$ be an arbitrary set of $n$ points in the $d$-dimensional Euclidean space $R^d$. The multivariate kernel density estimate obtained with kernel $K(x)$ and window radius $h$, computed in the point $x$ is defined as [12, p.76]

$$\hat{f}(x) = \frac{1}{n h^d} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right). \tag{1}$$

The optimum kernel yielding minimum mean integrated square error (MISE) is the Epanechnikov kernel

$$K_E(x) = \begin{cases} \frac{1}{2} c_d^{-1} (d+2)(1 - x^T x) & \text{if } x^T x < 1 \\ 0 & \text{otherwise} \end{cases} \tag{2}$$

where $c_d$ is the volume of the unit $d$-dimensional sphere [12, p.76].

The use of a differentiable kernel allows us to define the estimate of the density gradient as the gradient of the kernel density estimate (1)

$$\hat{\nabla} f(x) \equiv \nabla \hat{f}(x) = \frac{1}{n h^d} \sum_{i=1}^{n} \nabla K\left(\frac{x - x_i}{h}\right). \tag{3}$$

Conditions on the kernel $K(x)$ and the window radius $h$ to guarantee asymptotic unbiasedness, mean-square consistency, and uniform consistency are derived in [6].

For the Epanechnikov kernel (2) the density gradient estimate (3) becomes

$$\hat{\nabla} f(x) = \frac{1}{n (h^d c_d)} \frac{d+2}{h^2} \sum_{x_i \in S_h(x)} [x_i - x] = \frac{n_x}{n (h^d c_d)} \frac{d+2}{h^2} \left( \frac{1}{n_x} \sum_{x_i \in S_h(x)} [x_i - x] \right) \tag{4}$$

where the region $S_h(x)$ is a hypersphere of radius $h$ having the volume $h^d c_d$, centered on $x$, and containing $n_x$ data points. The last term in (4)

$$M_h(x) \equiv \frac{1}{n_x} \sum_{x_i \in S_h(x)} [x_i - x] = \frac{1}{n_x} \sum_{x_i \in S_h(x)} x_i - x \tag{5}$$

is called the sample mean shift. Using a kernel different from the Epanechnikov kernel results in a weighted mean computation in (5).

The quantity $\frac{n_x}{n (h^d c_d)}$ is the kernel density estimate $\hat{f}(x)$ computed with the hypersphere $S_h(x)$ (the uniform kernel), and thus we can write (4) as

$$\hat{\nabla} f(x) = \hat{f}(x) \frac{d+2}{h^2} M_h(x), \tag{6}$$

which yields

$$M_h(x) = \frac{h^2}{d+2} \frac{\hat{\nabla} f(x)}{\hat{f}(x)}. \tag{7}$$

The expression (7) was first derived in [6] and shows that an estimate of the normalized gradient can be obtained by computing the sample mean shift in a uniform kernel centered on $x$. The mean shift vector has the direction of the gradient of the density estimate at $x$ when this estimate is obtained with the Epanechnikov kernel.

Since the mean shift vector always points towards the direction of the maximum increase in the density, it can define a path leading to a local density maximum, i.e., to a mode of the density (Figure 1).

Figure 1: Successive computations of the mean shift define a path leading to a local density maximum.

The mean shift procedure, obtained by successive

• computation of the mean shift vector $M_h(x)$

• translation of the window $S_h(x)$ by $M_h(x)$,

is guaranteed to converge, as it will be shown in the next section.

3 Convergence

Let $\{y_k\}_{k=1,2\ldots}$ denote the sequence of successive locations of the mean shift procedure. By definition we have for each $k = 1, 2 \ldots$

$$y_{k+1} = \frac{1}{n_k} \sum_{x_i \in S_h(y_k)} x_i, \tag{8}$$

where $y_1$ is the center of the initial window and $n_k$ is the number of points falling in the window $S_h(y_k)$ centered on $y_k$.

The convergence of the mean shift has been justified as a consequence of relation (7) (see [2]). However, while it is true that the mean shift vector $M_h(x)$ has the direction of the gradient of the density estimate at $x$, it is not apparent that the density estimate at locations $\{y_k\}_{k=1,2\ldots}$ is a monotonic increasing sequence. Moving in the direction of the gradient guarantees hill climbing only for infinitesimal steps. The following theorem asserts the convergence for discrete data.

Theorem 1 Let $\hat{f}_E = \left\{ \hat{f}_k(y_k, K_E) \right\}_{k=1,2\ldots}$ be the sequence of density estimates obtained using Epanechnikov kernel and computed in the points $\{y_k\}_{k=1,2\ldots}$ defined by the successive locations of the mean shift procedure with uniform kernel. The sequence is convergent.

Proof Since the data set $\{x_i\}_{i=1 \ldots n}$ has finite cardinality $n$, the sequence $\hat{f}_E$ is bounded. Moreover, we will show that $\hat{f}_E$ is strictly monotonic increasing, i.e., if $y_k \neq y_{k+1}$ then $\hat{f}_E(k) < \hat{f}_E(k+1)$, for all $k = 1, 2 \ldots$.

Let $n_k$, $n'_k$, and $n''_k$ with $n_k = n'_k + n''_k$ be the number of data points falling in the $d$-dimensional windows (Figure 2) $S_h(y_k)$, $S'_h(y_k) = S_h(y_k) - S''_h(y_k)$, and $S''_h(y_k) = S_h(y_k) \cap S_h(y_{k+1})$.

Figure 2: The $d$-dimensional windows used in the proof of convergence: $S_h(y_k)$, $S'_h(y_k)$, and $S''_h(y_k)$. The point $y_{k+1}$ is the mean of the data points falling in $S_h(y_k)$.

Without loss of generality we can assume the origin located at $y_k$. Using the definition of the density estimate (1) with the Epanechnikov kernel (2) and noting that $\|y_k - x_i\|^2 = \|x_i\|^2$ we have

$$\hat{f}_E(k) = \hat{f}_k(y_k, K_E) = \frac{1}{n h^d} \sum_{x_i \in S_h(y_k)} K_E\left(\frac{y_k - x_i}{h}\right) = \frac{d+2}{2n (h^d c_d)} \sum_{x_i \in S_h(y_k)} \left(1 - \frac{\|x_i\|^2}{h^2}\right). \tag{9}$$

Since the kernel $K_E$ is nonnegative we also have

$$\hat{f}_E(k+1) = \hat{f}_{k+1}(y_{k+1}, K_E) \geq \frac{1}{n h^d} \sum_{x_i \in S''_h(y_k)} K_E\left(\frac{y_{k+1} - x_i}{h}\right) = \frac{d+2}{2n (h^d c_d)} \sum_{x_i \in S''_h(y_k)} \left(1 - \frac{\|y_{k+1} - x_i\|^2}{h^2}\right). \tag{10}$$

Hence, knowing that $n'_k = n_k - n''_k$ we obtain

$$\hat{f}_E(k+1) - \hat{f}_E(k) \geq \frac{d+2}{2n (h^d c_d) h^2} \left[ \sum_{x_i \in S_h(y_k)} \|x_i\|^2 - \sum_{x_i \in S''_h(y_k)} \|y_{k+1} - x_i\|^2 - n'_k h^2 \right], \tag{11}$$

where the last term appears due to the different summation boundaries.

Also, by definition $\|y_{k+1} - x_i\|^2 \geq h^2$ for all $x_i \in S'_h(y_k)$, which implies that

$$\sum_{x_i \in S'_h(y_k)} \|y_{k+1} - x_i\|^2 \geq n'_k h^2. \tag{12}$$

Finally, employing (12) in (11) and using (8) we obtain

$$\begin{aligned} \hat{f}_E(k+1) - \hat{f}_E(k) &\geq \frac{d+2}{2n (h^d c_d) h^2} \left[ \sum_{x_i \in S_h(y_k)} \|x_i\|^2 - \sum_{x_i \in S_h(y_k)} \|y_{k+1} - x_i\|^2 \right] \\ &= \frac{d+2}{2n (h^d c_d) h^2} \left[ 2 y_{k+1}^T \sum_{x_i \in S_h(y_k)} x_i - n_k \|y_{k+1}\|^2 \right] \\ &= \frac{d+2}{2n (h^d c_d) h^2} \, n_k \|y_{k+1}\|^2. \end{aligned} \tag{13}$$

The last item of the relation (13) is strictly positive except when $y_k = y_{k+1} = 0$.

Being bounded and strictly monotonic increasing, the sequence $\hat{f}_E$ is convergent. Note that if $y_k = y_{k+1}$ then $y_k$ is the limit of $\hat{f}_E$, i.e., $y_k$ is the fixed point of the mean shift procedure.

4 Processing in Spatial-Range Domain

An image is typically represented as a 2-dimensional lattice of $r$-dimensional vectors (pixels), where $r$ is 1 in the gray level case, 3 for color images, or $r > 3$ in the multispectral case. The space of the lattice is known as the spatial domain while the gray level, color, or spectral information is represented in the range domain. However, after a proper normalization with $\sigma_s$ and $\sigma_r$, global parameters in the spatial and range domains, the location and range vectors can be concatenated to obtain a spatial-range domain of dimension $d = r + 2$.

The main novelty of this paper is to apply the mean shift procedure for the data points in the joint spatial-range domain. Each data point becomes associated to a point of convergence which represents the local mode of the density in the $d$-dimensional space. The process, having the parameters $\sigma_s$ and $\sigma_r$, takes into account simultaneously both the spatial and range information.

The output of the mean shift filter for an image pixel is defined as the range information carried by the point of convergence. This process achieves a high quality, discontinuity preserving spatial filtering. For the segmentation task, the convergence points sufficiently close in the joint domain are fused to obtain the homogeneous regions in the image.

The proposed spatial-range filtering and segmentation are described in the sequel with results shown for both gray level and color images. The perceptually uniform L*u*v* space has been used to represent the color information, while for the gray level cases only the L* component has been considered.

5 Filtering

Let $\{x_j\}_{j=1 \ldots n}$ and $\{z_j\}_{j=1 \ldots n}$ be the $d$-dimensional original and filtered image points in the spatial-range domain. The superscripts $s$ and $r$ will denote the spatial and range parts of the vectors, respectively. The original data is assumed to be normalized with $\sigma_s$ for the spatial part and $\sigma_r$ for the range.

Mean Shift Filtering

For each $j = 1 \ldots n$

1. Initialize $k = 1$ and $y_k = x_j$.

2. Compute $y_{k+1} = \frac{1}{n_k} \sum_{x_i \in S_1(y_k)} x_i$, $k \leftarrow k + 1$ till convergence.

3. Assign $z_j = (x_j^s, y_{conv}^r)$.

The last assignment specifies that the filtered data at the spatial location of $x_j$ will have the range components of the point of convergence $y_{conv}$. The number of points in the window $S_1(y_k)$ of radius 1 and centered on $y_k$ is $n_k$. The unit radius of the window is due to the normalization.
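The three steps of the filtering algorithm can be sketched in a few lines of Python. This is only an illustrative sketch for a gray level image stored as a list of rows, not the Java implementation mentioned in the experiments; the function name `mean_shift_filter`, the stopping tolerance `eps`, and the iteration cap `max_iter` are assumptions introduced here.

```python
import math


def mean_shift_filter(image, sigma_s, sigma_r, eps=1e-3, max_iter=100):
    """Mean shift filter a gray level image given as a list of rows.

    Each pixel becomes the joint-domain point (row/sigma_s, col/sigma_s,
    value/sigma_r); the window S_1(y) is the open unit ball around y.
    eps and max_iter are illustrative stopping controls (assumptions).
    """
    rows, cols = len(image), len(image[0])
    reach = int(math.floor(sigma_s)) + 1  # pixel margin covering the unit ball
    filtered = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Step 1: initialize the window at the pixel itself.
            y = (r / sigma_s, c / sigma_s, image[r][c] / sigma_r)
            for _ in range(max_iter):
                # Step 2: y_{k+1} is the mean of the points inside S_1(y_k).
                cr = int(round(y[0] * sigma_s))
                cc = int(round(y[1] * sigma_s))
                sums, n_k = [0.0, 0.0, 0.0], 0
                for i in range(max(0, cr - reach), min(rows, cr + reach + 1)):
                    for j in range(max(0, cc - reach), min(cols, cc + reach + 1)):
                        x = (i / sigma_s, j / sigma_s, image[i][j] / sigma_r)
                        if sum((a - b) ** 2 for a, b in zip(x, y)) < 1.0:
                            sums = [s + a for s, a in zip(sums, x)]
                            n_k += 1
                if n_k == 0:  # defensive guard against an empty window
                    break
                y_next = tuple(s / n_k for s in sums)
                shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(y_next, y)))
                y = y_next
                if shift < eps:
                    break
            # Step 3: keep the spatial location, take the range part of y_conv.
            filtered[r][c] = y[2] * sigma_r
    return filtered
```

On a small synthetic image with a textured dark region next to a bright flat region, a call such as `mean_shift_filter(image, sigma_s=2, sigma_r=20)` should smooth the texture while leaving the step between the two regions intact, because pixels on opposite sides of the step are farther apart than the unit window in the range dimension.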

5.1 Arithmetic Complexity

In a practical implementation the lattice structure of the spatial domain is used for the efficient search of the points $x_i \in S_1(y_k)$. This search can obviously be limited to a rectangular window of size $2 \times 2$ in the normalized space, which corresponds to $(2\lfloor \sigma_s \rfloor + 1)^2$ image pixels, where $\lfloor \cdot \rfloor$ is the down-rounded integer.

By denoting with $k_c$ the mean number of iterations to convergence, the arithmetic complexity of mean shift filtering is about $k_c (2\lfloor \sigma_s \rfloor + 1)^2$ flops per image pixel.
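As a rough worked instance of this estimate, assume the settings reported for the cameraman image in Section 5.3: $\sigma_s = 8$, a mean of about 3 iterations per pixel, and a $256 \times 256$ image. Taking $k_c = 3$ exactly is a simplification made here for the arithmetic.

$$(2\lfloor 8 \rfloor + 1)^2 = 17^2 = 289, \qquad k_c (2\lfloor \sigma_s \rfloor + 1)^2 \approx 3 \cdot 289 = 867 \ \text{flops per pixel}, \qquad 867 \cdot 256^2 \approx 5.7 \times 10^7 \ \text{flops per image}.$$

This is an order-of-magnitude figure only, since the estimate counts one operation per pixel visited in the search window.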

5.2 Normalization Constants

The value of $\sigma_s$ is related to the spatial resolution of the analysis while the value of $\sigma_r$ defines the range (color) resolution.

An asymptotically optimal (in the MISE sense) gradient estimate is obtained when the distribution in the joint space is normal. The radius of the searching window is a function of the number of data points $n$ [11, p.152]. In our case, however, the data is far from being normal. Therefore, no theoretical constraints can be imposed on the values of $\sigma_s$ and $\sigma_r$, which are task dependent and in practical settings their choice should incorporate a top-down, knowledge driven component.

A challenging issue not considered in this paper is the adaptive definition of the normalization constants. To take into account the nonstationarity of the input, adaptive kernel estimation techniques were proposed in the statistical literature [14], however for less complex data. Beside exploiting a priori information (often available for low level vision), robust image understanding methods can also be helpful.
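The role of the two constants is easiest to see when the joint-domain points are built explicitly. The sketch below assumes an image stored as a list of rows whose pixels are either numbers (gray level) or tuples of range components (color); the helper name `to_joint_domain` is hypothetical and not part of the paper.

```python
def to_joint_domain(image, sigma_s, sigma_r):
    """Return the normalized joint-domain points of an image.

    The lattice coordinates are divided by sigma_s and the range components
    by sigma_r, so that a window of radius 1 in the joint domain spans
    sigma_s pixels spatially and sigma_r units in range.
    """
    points = []
    for row_idx, row in enumerate(image):
        for col_idx, pixel in enumerate(row):
            # Treat a scalar gray level as a range vector with r = 1.
            rng = pixel if isinstance(pixel, (tuple, list)) else (pixel,)
            spatial = (row_idx / sigma_s, col_idx / sigma_s)
            points.append(spatial + tuple(v / sigma_r for v in rng))
    return points
```

Each returned point has dimension $d = r + 2$: three components for a gray level pixel and five for a color pixel.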

Figure 3: Cameraman image. (a) Original. (b) Mean shift filtered $(\sigma_s, \sigma_r) = (8, 4)$.

5.3 Experiments

Mean shift filtering with $(\sigma_s, \sigma_r) = (8, 4)$ has been applied to the often used $256 \times 256$ gray level cameraman image (Figure 3a), the processed image being shown in Figure 3b. The regions containing the grass field have been almost completely smoothed while details such as the tripod and the buildings in the background were preserved.

The entire processing time was a few seconds on a standard laptop with a 233 MHz Pentium II processor. We used a Java implementation of the algorithm. The mean number of iterations necessary for convergence was very low, around 3, due to the relatively small number of data points falling in the searching window.

To illustrate the effectiveness of the filtering process, the region marked in Figure 3a is represented in three dimensions in Figure 4a. In Figure 4b the mean shift paths associated with each pixel from the central plateau and the line are shown. Note that the convergence points (black dots) are situated in the opposite direction relative to the edge, while the shifts on the line remain on it. As a result, the filtered data (Figure 4c) shows clean quasi-homogeneous regions.

A second filtering example is given in Figure 5b. The original, $512 \times 512$ color image baboon has been processed with a mean shift filter having $(\sigma_s, \sigma_r) = (16, 16)$. While the texture of the fur has been cleaned, the details of the eyes and the whiskers remained crisp.

Figure 4: A $40 \times 20$ window from the image cameraman. (a) Original data (rotated and flipped over for better visualization). (b) Mean shift paths for the points in the central and top white plateaus. (c) Filtering result $(\sigma_s, \sigma_r) = (8, 4)$. (d) Segmentation result (see Section 6 for details).

5.4 Comparison to Bilateral Filtering

We note here two important differences between the mean shift and bilateral filtering proposed by Tomasi and Manduchi [15]. Both methods are based on the same principle, the simultaneous processing of both the spatial and range domains. However, while the bilateral filtering uses a static window in the two domains, the mean shift window is dynamic, moving in the direction of the maximum increase in the density. Therefore, the mean shift filtering has a more powerful adaptation to the local structure of the data.

In addition, the filtering iterations proposed in [15] do not have a stopping criterion. After a sufficient number of iterations, the processed image collapses to a flat surface. The same observation is valid for other adaptive smoothing techniques [9, 10]. The process defined by mean shift is run till convergence and maintains the structure of the data.

Figure 5: Baboon image. (a) Original. (b) Mean shift filtered $(\sigma_s, \sigma_r) = (16, 16)$.

6 Segmentation

The mean shift segmentation in the spatial-range domain has the same simple design as the filtering process. Again, we assume the input data to be normalized with $(\sigma_s, \sigma_r)$. Let $\{x_j\}_{j=1 \ldots n}$ be the original image points, $\{z_j\}_{j=1 \ldots n}$ the points of convergence, and $\{L_j\}_{j=1 \ldots n}$ a set of labels (scalars).

Mean Shift Segmentation

1. For each $j = 1 \ldots n$ run the mean shift procedure for $x_j$ and store the convergence point in $z_j$.

2. Identify clusters $\{C_p\}_{p=1 \ldots m}$ of convergence points by linking together all $z_j$ which are closer than 0.5 from each other in the joint domain.

3. For each $j = 1 \ldots n$ assign $L_j = \{p \mid z_j \in C_p\}$.

4. Optional: Eliminate spatial regions smaller than $M$ pixels.

The first step of the segmentation is a filtering process. However, all the information about the $d$-dimensional convergence point is stored now in $z_j$, not only its range part. Note also that the number of clusters $m$ is controlled by the parameters $(\sigma_s, \sigma_r)$.

The arithmetic complexity of the segmentation is similar to that of the mean shift filtering, its first step being the most computationally expensive.

6.1 Experiments

We employed the algorithm described above with $(\sigma_s, \sigma_r, M) = (8, 7, 20)$ to segment the $256 \times 256$ gray level image MIT (Figure 6a). The segmentation is presented in Figure 6b with the associated contours in Figure 6c. Compare the result in Figure 6 with the segmentations of the same image through clustering [3, Figure 4] or using Gibbs random field [8, Figure 7].

Returning to the cameraman image, Figure 7 shows the reconstructed image after the regions corresponding to the sky and grass were replaced with white. Observe the preservation of the details. The mean shift segmentation has been applied with $(\sigma_s, \sigma_r, M) = (8, 4, 10)$. Figure 4d shows the segmentation with the same parameters of the selected rectangular window in Figure 3a.

Figure 7: Segmentation with $(\sigma_s, \sigma_r, M) = (8, 4, 10)$ and reconstruction of the cameraman image after the elimination of regions representing sky and grass.

The segmentation with $(\sigma_s, \sigma_r, M) = (16, 7, 40)$ of the $512 \times 512$ color image lake is shown in Figure 8b. Compare this result with that of the multiscale approach in [13, Figure 11]. Finally, one can compare the contours of the color image hand presented in Figure 9 with those from [16, Figure 15] obtained through a complex global optimization.
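Steps 2 and 3 of the Mean Shift Segmentation algorithm in Section 6 (linking convergence points closer than 0.5 and assigning labels) might be implemented along the following lines. This is a minimal sketch using a union-find structure and a brute-force pairwise comparison, both of which are choices made here for brevity; the function name `cluster_convergence_points` is hypothetical, and the optional elimination of small regions (step 4) is not covered.

```python
def cluster_convergence_points(z, link_dist=0.5):
    """Label convergence points so that linked points share a label.

    Two points are linked when their joint-domain distance is below
    link_dist; clusters are the connected components of these links.
    """
    parent = list(range(len(z)))

    def find(i):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Step 2: link every pair of convergence points closer than link_dist.
    for i in range(len(z)):
        for j in range(i + 1, len(z)):
            dist_sq = sum((a - b) ** 2 for a, b in zip(z[i], z[j]))
            if dist_sq < link_dist ** 2:
                parent[find(i)] = find(j)

    # Step 3: give each point the index p of the cluster C_p it belongs to.
    cluster_index = {}
    labels = []
    for i in range(len(z)):
        labels.append(cluster_index.setdefault(find(i), len(cluster_index)))
    return labels
```

The pairwise loop is quadratic in the number of points, so for a full image one would presumably restrict the comparison to lattice neighbors, in the same way the filtering step exploits the lattice structure.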

6.2 Discussion

It is interesting to contrast the mean shift segmentation with those based on the attraction force field [13] and edge flow propagation [7]. While all the three methods employ a vector field to detect regions in the spatial domain, only the mean shift based segmentation has strong statistical foundations. Our method associates the current pixel with a mode of the density located in its neighborhood (measured in both spatial and range domains).

The attraction force field defined in [13] is computed at each pixel as a vector sum of pairwise affinities between the current pixel and all other pixels. No theoretical evidence of the existence of such a force field is given.

The edge flow in [7] is obtained at each location for a given set of directions as the magnitude of the gradient of a smoothed image. The quantization of the edge flow direction, however, may introduce artifacts. Recall that, by contrast, the direction of the mean shift is dictated solely by the data.

Figure 6: MIT image. (a) Original. (b) Segmented $(\sigma_s, \sigma_r, M) = (8, 7, 20)$. A number of 225 homogeneous regions are identified. (c) Corresponding contours allow the delineation of the walls, sky, steps, inscription on the building, etc.

Figure 9: Hand image. (a) Original. (b) Contours $(\sigma_s, \sigma_r, M) = (16, 19, 40)$.

Figure 8: Lake image. (a) Original. (b) Segmented $(\sigma_s, \sigma_r, M) = (16, 7, 40)$.

7 Conclusions

This paper suggests that effective image analysis can be implemented based on the mean shift procedure. The nonparametric estimation of the density gradient in the spatial-range domain is a useful tool for bottom-up computer vision tasks such as edge preserving filtering and segmentation. The methods we proposed can be easily extended to the processing of other low level image features like the texture or optical flow.

Acknowledgment

This research was supported by the NSF under the grant IRI-9530546.

References

[1] G.R. Bradski, "Real Time Face and Object Tracking as a Component of a Perceptual User Interface", IEEE Workshop Applications of Comp. Vis., Princeton, 214-219, 1998.

[2] Y. Cheng, "Mean Shift, Mode Seeking, and Clustering", IEEE Trans. PAMI, vol. 17, 790-799, 1995.

[3] D. Comaniciu, P. Meer, "Robust Analysis of Feature Spaces: Color Image Segmentation", IEEE Conf. Comp. Vis. and Pattern Recogn., Puerto Rico, 750-755, 1997.

[4] D. Comaniciu, P. Meer, "Distribution Free Decomposition of Multivariate Data", Pattern Anal. and Applications, vol. 2, 22-30, 1999.

[5] P.F. Felzenszwalb, D.P. Huttenlocher, "Image Segmentation Using Local Variation", IEEE Conf. Comp. Vis. and Pattern Recogn., Santa Barbara, 98-103, 1998.

[6] K. Fukunaga, L.D. Hostetler, "The Estimation of the Gradient of a Density Function, with Applications in Pattern Recognition", IEEE Trans. Info. Theory, vol. IT-21, 32-40, 1975.

[7] W.Y. Ma, B.S. Manjunath, "Edge Flow: A Framework of Boundary Detection and Image Segmentation", IEEE Conf. Comp. Vis. and Pattern Recogn., Puerto Rico, 744-749, 1997.

[8] T.N. Pappas, "An Adaptive Clustering Algorithm for Image Segmentation", IEEE Trans. Signal Process., vol. 40, 901-914, 1992.

[9] P. Perona, J. Malik, "Scale-Space and Edge Detection Using Anisotropic Diffusion", IEEE Trans. PAMI, vol. 12, 629-639, 1990.

[10] P. Saint-Marc, J.S. Chen, G.G. Medioni, "Adaptive Smoothing: A General Tool for Early Vision", IEEE Trans. PAMI, vol. 13, 514-529, 1991.

[11] D.W. Scott, Multivariate Density Estimation, New York: Wiley, 1992.

[12] B.W. Silverman, Density Estimation for Statistics and Data Analysis, New York: Chapman and Hall, 1986.

[13] M. Tabb, N. Ahuja, "Multiscale Image Segmentation by Integrated Edge and Region Detection", IEEE Trans. Image Process., vol. 6, 642-655, 1997.

[14] G.R. Terrell, D.W. Scott, "Variable Density Estimation", The Annals of Statistics, vol. 20, 1236-1265, 1992.

[15] C. Tomasi, R. Manduchi, "Bilateral Filtering for Gray and Color Images", Int'l Conf. Comp. Vis., Bombay, India, 839-846, 1998.

[16] S.C. Zhu, A. Yuille, "Region Competition: Unifying Snakes, Region Growing, and Bayes/MDL for Multiband Image Segmentation", IEEE Trans. PAMI, vol. 18, 884-900, 1996.