Mean-Shift Tracking
R. Collins, CSE, PSU, CSE598G Spring 2006

Appearance-Based Tracking
current frame + previous location → likelihood over object location → current location
• Appearance model (e.g., image template, or color/intensity/edge histograms)
• Mode-seeking (e.g., mean-shift; Lucas-Kanade; particle filtering)

Mean-Shift Motivation
• To track non-rigid objects (like a walking person), it is hard to specify an explicit 2D parametric motion model.
• Appearances of non-rigid objects can sometimes be modeled with color distributions.

The mean-shift algorithm is an efficient approach to tracking objects whose appearance is defined by histograms (not limited to only color).

Credits: Many slides borrowed from
www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/mean_shift/mean_shift.ppt
"Mean Shift Theory and Applications" by Yaron Ukrainitz & Bernard Sarel, Weizmann Institute
...even the slides on my own work! --Bob

Intuitive Description
Region of interest; center of mass; mean-shift vector.
Objective: find the densest region of a distribution of identical billiard balls. The region of interest is repeatedly shifted by the mean-shift vector, i.e. toward the center of mass of the points inside it.

What is Mean Shift?
A tool for finding modes in a set of data samples, manifesting an underlying probability density function (PDF) in R^N.

PDF in feature space:
• Color space
• Scale space
• Actually any feature space you can conceive
• …

The ingredients: a discrete PDF representation of the data; non-parametric density estimation; non-parametric density GRADIENT estimation (mean shift); PDF analysis.

Non-Parametric Density Estimation

Assumption: the data points are sampled from an underlying PDF.
Data point density implies PDF value!
(Figure: an assumed underlying PDF and real data samples drawn from it.)

Appearance via Color Histograms
Color distribution: a histogram over the discretized color space (R', G', B'), normalized to have unit weight.

R' = R >> (8 - nbits)
G' = G >> (8 - nbits)
B' = B >> (8 - nbits)

Total histogram size is (2^nbits)^3. For example, 4-bit encoding of the R, G and B channels yields a histogram of size 16*16*16 = 4096.

Smaller Color Histograms
Histogram information can be much smaller if we are willing to accept a loss in color resolvability: keep three 1D marginal distributions (marginal R, marginal G, and marginal B) instead of the joint 3D histogram.

R' = R >> (8 - nbits)
G' = G >> (8 - nbits)
B' = B >> (8 - nbits)

Total histogram size is 3*(2^nbits). For example, 4-bit encoding of the R, G and B channels yields a histogram of size 3*16 = 48.

(Color histogram example: red, green and blue marginal distributions of a sample image.)
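To make the two histogram layouts concrete, here is a minimal numpy sketch (not from the original slides; the function names and the 8-bit RGB input assumption are mine) of the joint and marginal quantized color histograms described above:

```python
import numpy as np

def joint_color_histogram(rgb, nbits=4):
    """Joint 3D histogram: quantize each 8-bit channel to nbits
    ((2**nbits)**3 bins total), then normalize to unit weight."""
    q = (rgb.reshape(-1, 3) >> (8 - nbits)).astype(np.int64)  # R', G', B'
    bins = (q[:, 0] << (2 * nbits)) | (q[:, 1] << nbits) | q[:, 2]
    hist = np.bincount(bins, minlength=(2 ** nbits) ** 3).astype(float)
    return hist / hist.sum()

def marginal_color_histograms(rgb, nbits=4):
    """Three 1D marginal histograms (3 * 2**nbits bins total): much
    smaller, at the cost of some color resolvability."""
    q = (rgb.reshape(-1, 3) >> (8 - nbits)).astype(np.int64)
    hists = [np.bincount(q[:, c], minlength=2 ** nbits) for c in range(3)]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()
```

With nbits=4 these reproduce the 4096-bin joint and 48-bin marginal examples above.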

Normalized Color
(r, g, b) → (r', g', b') = (r, g, b) / (r + g + b)

Normalized color divides out pixel luminance (brightness), leaving behind only chromaticity (color) information. The result is less sensitive to variations due to illumination/shading.
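A quick sketch of that normalization (the float conversion and the guard against black pixels are my additions):

```python
import numpy as np

def normalized_color(rgb):
    """(r,g,b) -> (r,g,b)/(r+g+b): divides out luminance, keeping only
    chromaticity, so the result is less sensitive to shading changes."""
    rgb = rgb.astype(float)
    s = rgb.sum(axis=-1, keepdims=True)
    return rgb / np.maximum(s, 1e-9)  # avoid dividing by zero at black pixels
```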

Intro to Parzen Estimation (aka Kernel Density Estimation)
A mathematical model of how histograms are formed. Assume continuous data points:
• Convolve the points with a box filter of width w (e.g., [1 1 1])
• Take samples of the result, with spacing w
• The resulting value at point u represents the count of data points falling in the range u - w/2 to u + w/2

Example histograms: box filters of increasing width, e.g. [1 1 1], then [1 1 1 1 1 1 1], then [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1], give increased smoothing.

Why Formulate it This Way?
• It generalizes from the box filter to other filters (for example a Gaussian).
• The Gaussian acts as a smoothing filter.

Kernel Density Estimation
• Parzen windows: approximate the probability density by estimating the local density of points (the same idea as a histogram).
• Convolve the points with a window/kernel function (e.g., a Gaussian) using a scale parameter (e.g., sigma).
(Figure from Hastie et al.)
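The following 1D sketch (my own illustration, not from the slides) shows the Parzen view of a histogram: a box kernel of width w reproduces bin counts, while a Gaussian kernel gives a smooth density estimate:

```python
import numpy as np

def parzen_density(samples, query, width, kernel="box"):
    """Parzen / kernel density estimate  P(x) = (1/n) sum_i K(x - x_i).
    'box' with width w matches histogram counts (up to the 1/w area
    normalization); 'gauss' uses width as the smoothing sigma."""
    d = query[:, None] - samples[None, :]               # pairwise x - x_i
    if kernel == "box":
        k = (np.abs(d) <= width / 2) / width            # unit-area box filter
    else:
        k = np.exp(-0.5 * (d / width) ** 2) / (width * np.sqrt(2 * np.pi))
    return k.mean(axis=1)                               # average over data points

# Example: smooth estimate from 100 samples, evaluated on a grid
# parzen_density(np.random.randn(100), np.linspace(-3, 3, 61), 0.5, "gauss")
```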

Density Estimation at Different Scales
• Example: density estimates for 5 data points with differently-scaled kernels (figures from Duda et al.)
• Scale influences accuracy vs. generality (overfitting)

Smoothing Scale Selection
• Unresolved issue: how to pick the scale (sigma value) of the smoothing filter
• Answer for now: it is a user-settable parameter

Kernel Density Estimation: Parzen Windows, General Framework

A kernel density estimate is a function of some finite number of data points $\mathbf{x}_1 \ldots \mathbf{x}_n$:

$$P(\mathbf{x}) = \frac{1}{n}\sum_{i=1}^{n} K(\mathbf{x}-\mathbf{x}_i)$$

Kernel properties:
• Normalized: $\int_{R^d} K(\mathbf{x})\,d\mathbf{x} = 1$
• Symmetric: $\int_{R^d} \mathbf{x}\,K(\mathbf{x})\,d\mathbf{x} = \mathbf{0}$
• Exponential weight decay: $\lim_{\|\mathbf{x}\|\to\infty} \|\mathbf{x}\|^{d} K(\mathbf{x}) = 0$
• Uncorrelated dimensions: $\int_{R^d} \mathbf{x}\mathbf{x}^{T} K(\mathbf{x})\,d\mathbf{x} = c\,\mathbf{I}$

In practice one uses one of two forms:

$$K(\mathbf{x}) = c\prod_{i=1}^{d} k(x_i) \;\;\text{(same function on each dimension)} \qquad\text{or}\qquad K(\mathbf{x}) = c\,k(\|\mathbf{x}\|) \;\;\text{(function of the vector length only)}$$

Kernel Density Estimation: Various Kernels

$$P(\mathbf{x}) = \frac{1}{n}\sum_{i=1}^{n} K(\mathbf{x}-\mathbf{x}_i)$$

Examples:
• Epanechnikov kernel: $K_E(\mathbf{x}) = \begin{cases} c\,(1-\|\mathbf{x}\|^2) & \|\mathbf{x}\|\le 1 \\ 0 & \text{otherwise} \end{cases}$
• Uniform kernel: $K_U(\mathbf{x}) = \begin{cases} c & \|\mathbf{x}\|\le 1 \\ 0 & \text{otherwise} \end{cases}$
• Normal kernel: $K_N(\mathbf{x}) = c\,\exp\!\left(-\tfrac{1}{2}\|\mathbf{x}\|^2\right)$

Key idea: the superposition of kernels centered at each data point is equivalent to convolving the data points with the kernel (Data * Kernel).

Kernel Density Gradient Estimation (Mean Shift)

Key idea: give up estimating the PDF itself! Estimate ONLY its gradient:

$$\nabla P(\mathbf{x}) = \frac{1}{n}\sum_{i=1}^{n} \nabla K(\mathbf{x}-\mathbf{x}_i)$$

The gradient of a superposition of kernels centered at each data point is equivalent to convolving the data points with the gradient of the kernel.

Using the kernel form $K(\mathbf{x}-\mathbf{x}_i) = c\,k\!\left(\left\|\frac{\mathbf{x}-\mathbf{x}_i}{h}\right\|^2\right)$, where h is the size of the window, and defining $g(x) = -k'(x)$ and $g_i = g\!\left(\left\|\frac{\mathbf{x}-\mathbf{x}_i}{h}\right\|^2\right)$, we get:

$$\nabla P(\mathbf{x}) = \frac{c}{n}\sum_{i=1}^{n}\nabla k_i = \frac{c}{n}\left[\sum_{i=1}^{n} g_i\right]\left[\frac{\sum_{i=1}^{n}\mathbf{x}_i\,g_i}{\sum_{i=1}^{n} g_i} - \mathbf{x}\right]$$

The first bracket is yet another kernel density estimation (with profile g); the second bracket is the mean-shift vector:

$$\mathbf{m}(\mathbf{x}) = \frac{\sum_{i=1}^{n}\mathbf{x}_i\,g\!\left(\left\|\frac{\mathbf{x}-\mathbf{x}_i}{h}\right\|^2\right)}{\sum_{i=1}^{n} g\!\left(\left\|\frac{\mathbf{x}-\mathbf{x}_i}{h}\right\|^2\right)} - \mathbf{x}$$

Computing the mean shift, the simple mean-shift procedure is:
• Compute the mean-shift vector m(x)
• Translate the kernel window by m(x)
• Repeat until convergence
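A minimal sketch of this procedure (my own code; it assumes the point samples are rows of a numpy array and uses a Gaussian profile, for which g(r) = -k'(r) is again an exponential up to a constant):

```python
import numpy as np

def mean_shift_mode(points, x0, h, tol=1e-5, max_iter=500):
    """Simple mean-shift procedure: compute the mean-shift vector m(x),
    translate the window by m(x), and repeat until the step size falls
    below a lower bound."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        r = np.sum((points - x) ** 2, axis=1) / h ** 2  # ||(x - x_i)/h||^2
        g = np.exp(-0.5 * r)                            # g_i for a Gaussian profile
        m = (points * g[:, None]).sum(0) / g.sum() - x  # mean-shift vector m(x)
        x = x + m                                       # translate the window
        if np.linalg.norm(m) < tol:                     # convergence lower bound
            break
    return x
```

Running this from every data point and grouping the converged positions (after the pruning steps described below) yields the modes.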

Mean Shift Mode Detection
What happens if we reach a saddle point? Perturb the mode position and check if we return back.

Updated mean-shift procedure:
• Find all modes using the simple mean-shift procedure
• Prune modes by perturbing them (this finds saddle points and plateaus)
• Prune nearby modes: take the highest mode in the window

Mean Shift Properties
• Automatic convergence speed: the mean-shift vector size depends on the gradient itself (adaptive gradient ascent)
• Near maxima, the steps are small and refined
• Convergence is guaranteed for infinitesimal steps only, i.e. it is infinitely convergent (therefore set a lower bound on the step size)
• For a uniform kernel, convergence is achieved in a finite number of steps
• A normal kernel exhibits a smooth trajectory, but is slower than the uniform kernel

Real Modality Analysis
Tessellate the space with windows and run the procedure in parallel. The blue data points were traversed by the windows towards the modes; the window tracks signify the steepest-ascent directions.

Adaptive Mean Shift: an example on real data.

Mean Shift Strengths & Weaknesses

Strengths:
• An application-independent tool
• Suitable for real data analysis
• Does not assume any prior shape (e.g., elliptical) on data clusters
• Can handle arbitrary feature spaces
• Only ONE parameter to choose
• h (window size) has a physical meaning, unlike k-means

Weaknesses:
• The window size (bandwidth selection) is not trivial
• An inappropriate window size can cause modes to be merged, or generate additional "shallow" modes → use an adaptive window size

Mean Shift Applications

Clustering
Cluster: all data points in the attraction basin of a mode.
Attraction basin: the region for which all trajectories lead to the same mode.

Synthetic examples: simple modal structures and complex modal structures.
From: Mean Shift: A Robust Approach Toward Feature Space Analysis, by Comaniciu and Meer.

Clustering: Real Example
Feature space: L*u*v color representation. Initial window centers → modes found → modes after pruning → final clusters.
Final clusters shown in the 2D (L*u) space representation. Note: not all trajectories in the attraction basin reach the same mode.

Non-Rigid Object Tracking
From: Kernel-Based Object Tracking, by Comaniciu, Ramesh, Meer (CRM).

Non-Rigid Object Tracking
Real-time, object-based applications: surveillance, driver assistance, video compression.

Mean-Shift Object Tracking, General Framework: Target Representation
Choose a reference model in the current frame → choose a feature space → represent the model in the chosen feature space.

Mean-Shift Object Tracking, General Framework: Target Localization
Start from the position of the model in the current frame → search in the model's neighborhood in the next frame → find the best candidate by maximizing a similarity function. Repeat the same process (model and candidate) in the next pair of frames.

Using Mean-Shift for Tracking in Color Images: two approaches
1) Create a color "likelihood" image, with pixels weighted by similarity to the desired color (best for unicolored objects).
2) Represent the color distribution with a histogram. Use mean-shift to find the region that has the most similar distribution of colors.

Mean-Shift Tracking
Ideally, we want an indicator function that returns 1 for pixels on the object we are tracking, and 0 for all other pixels. Instead, we compute likelihood maps where the value at a pixel is proportional to the likelihood that the pixel comes from the object we are tracking. Computation of likelihood can be based on:
• color
• texture
• shape (boundary)
• predicted location

Mean-Shift on Weight Images
Note: so far, we have described mean-shift as operating over a set of point samples. Now let the pixels form a uniform grid of data points a, each with a weight w(a) (the pixel value) proportional to the "likelihood" that the pixel is on the object we want to track, and perform the standard mean-shift algorithm using this weighted set of points:

$$\Delta\mathbf{x} = \frac{\sum_{\mathbf{a}} K(\mathbf{a}-\mathbf{x})\,w(\mathbf{a})\,(\mathbf{a}-\mathbf{x})}{\sum_{\mathbf{a}} K(\mathbf{a}-\mathbf{x})\,w(\mathbf{a})}$$
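A sketch of that update on a weight image (my own code, with a Gaussian kernel K; in practice one would restrict the sums to a window around x rather than the whole image):

```python
import numpy as np

def mean_shift_on_weight_image(w, x0, h, n_iter=30):
    """Mean-shift where the samples are grid pixels a with weights w(a):
    x <- x + sum_a K(a-x) w(a) (a-x) / sum_a K(a-x) w(a)."""
    ys, xs = np.mgrid[0:w.shape[0], 0:w.shape[1]]
    a = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)  # (col,row) coords
    wa = w.ravel().astype(float)
    x = np.asarray(x0, dtype=float)                               # (col,row) start
    for _ in range(n_iter):
        K = np.exp(-0.5 * np.sum((a - x) ** 2, axis=1) / h ** 2)  # kernel weights
        shift = ((a - x) * (K * wa)[:, None]).sum(0) / (K * wa).sum()
        x = x + shift                                             # weighted mean shift
        if np.linalg.norm(shift) < 1e-3:
            break
    return x
```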

Nice Property
Running mean-shift with kernel K on weight image w is equivalent to performing gradient ascent in a (virtual) image formed by convolving w with some "shadow" kernel H.

Kernel-Shadow Pairs
Given a convolution (shadow) kernel H, what is the corresponding mean-shift kernel K? Perform the change of variables $r = \|\mathbf{a}-\mathbf{x}\|^2$ and rewrite $H(\mathbf{a}-\mathbf{x}) \Rightarrow h(\|\mathbf{a}-\mathbf{x}\|^2) \Rightarrow h(r)$. Then kernel K must satisfy

$$h'(r) = -c\,k(r)$$

Examples of shadow → kernel pairs: Epanechnikov → flat (uniform); Gaussian → Gaussian.

Note: the mode we are looking for is a mode of the location (x,y) likelihood, NOT a mode of the color distribution!

Using Mean-Shift on Color Models
Recall the two approaches:
1) Create a color "likelihood" image, with pixels weighted by similarity to the desired color (best for unicolored objects).
2) Represent the color distribution with a histogram. Use mean-shift to find the region that has the most similar distribution of colors.

High-Level Overview of the histogram approach:
• Spatially smooth the similarity function by introducing a spatial kernel (Gaussian, box filter).
• Take the derivative of the similarity with respect to colors. This tells us what colors we need more/less of to make the current histogram more similar to the reference histogram.
• The result is the weighted mean shift we used before; however, the color weights are now computed "on-the-fly", and change from one iteration to the next.

Mean-Shift Object Tracking: Target Representation
Choose a reference target model (e.g., a quantized color space) → choose a feature space → represent the model by its PDF in the feature space.

Mean-Shift Object Tracking: PDF Representation
Target model (centered at 0): $\hat{\mathbf{q}} = \{\hat{q}_u\}_{u=1..m}$ with $\sum_{u=1}^{m}\hat{q}_u = 1$
Target candidate (centered at y): $\hat{\mathbf{p}}(\mathbf{y}) = \{\hat{p}_u(\mathbf{y})\}_{u=1..m}$ with $\sum_{u=1}^{m}\hat{p}_u = 1$
(Figure: m-bin color histograms of the model and of the candidate.)

Similarity function: $f(\mathbf{y}) = f[\hat{\mathbf{q}}, \hat{\mathbf{p}}(\mathbf{y})]$

Mean-Shift Object Tracking: Smoothness of the Similarity Function
Similarity function: $f(\mathbf{y}) = f[\hat{\mathbf{p}}(\mathbf{y}), \hat{\mathbf{q}}]$
Problem: if the target is represented by color info only, spatial info is lost; there are large similarity variations between adjacent locations, so f is not smooth and gradient-based optimizations are not robust.
Solution: mask the target with an isotropic kernel in the spatial domain. Then f(y) becomes smooth in y.

Mean-Shift Object Tracking: Finding the PDF of the Target Model
Target pixel locations: $\{\mathbf{x}_i\}_{i=1..n}$
$k(x)$: a differentiable, isotropic, convex, monotonically decreasing kernel profile (peripheral pixels get low weight, since they are the ones affected by occlusion and background interference)
$b(\mathbf{x})$: the color bin index (1..m) of pixel x

Probability of feature u in the model:
$$\hat{q}_u = C \sum_{i:\,b(\mathbf{x}_i)=u} k\!\left(\|\mathbf{x}_i\|^2\right)$$
Probability of feature u in the candidate centered at y:
$$\hat{p}_u(\mathbf{y}) = C_h \sum_{i:\,b(\mathbf{x}_i)=u} k\!\left(\left\|\frac{\mathbf{y}-\mathbf{x}_i}{h}\right\|^2\right)$$
The $k(\cdot)$ terms are the pixel weights, and C, C_h are normalization factors ensuring each histogram sums to one.
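A sketch of these histograms (my own code; it assumes a precomputed bin-index image, 0-based bin indices, and uses the Epanechnikov profile k(r) = 1 - r for r ≤ 1):

```python
import numpy as np

def kernel_histogram(bins, locs, center, h, m):
    """p_u = C * sum over pixels i with b(x_i)=u of k(||(center - x_i)/h||^2),
    with an Epanechnikov profile; C normalizes the histogram to sum to 1.
    bins: (n,) int bin indices b(x_i); locs: (n,2) float pixel coordinates.
    For the model q, pass the model's own center and radius h."""
    r = np.sum((locs - center) ** 2, axis=1) / h ** 2   # ||(y - x_i)/h||^2
    k = np.clip(1.0 - r, 0.0, None)                     # Epanechnikov profile
    p = np.bincount(bins, weights=k, minlength=m)
    return p / max(p.sum(), 1e-12)                      # normalization constant
```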

Mean-Shift Object Tracking: Target Localization Algorithm
Start from the position of the model in the current frame → search in the model's neighborhood in the next frame → find the best candidate by maximizing the similarity function.

Mean-Shift Object Tracking: The Similarity Function
Target model: $\hat{\mathbf{q}} = (\hat{q}_1,\ldots,\hat{q}_m)$
Target candidate: $\hat{\mathbf{p}}(\mathbf{y}) = (\hat{p}_1(\mathbf{y}),\ldots,\hat{p}_m(\mathbf{y}))$
Similarity function: $f(\mathbf{y}) = f[\hat{\mathbf{p}}(\mathbf{y}), \hat{\mathbf{q}}] = \;?$

The Bhattacharyya coefficient: treat $\sqrt{\hat{\mathbf{q}}} = \left(\sqrt{\hat{q}_1},\ldots,\sqrt{\hat{q}_m}\right)$ and $\sqrt{\hat{\mathbf{p}}(\mathbf{y})} = \left(\sqrt{\hat{p}_1(\mathbf{y})},\ldots,\sqrt{\hat{p}_m(\mathbf{y})}\right)$ as unit vectors. Then

$$f(\mathbf{y}) = \cos\theta_{\mathbf{y}} = \frac{\sqrt{\hat{\mathbf{p}}(\mathbf{y})}^{\,T}\sqrt{\hat{\mathbf{q}}}}{\left\|\sqrt{\hat{\mathbf{p}}(\mathbf{y})}\right\|\left\|\sqrt{\hat{\mathbf{q}}}\right\|} = \sum_{u=1}^{m}\sqrt{\hat{p}_u(\mathbf{y})\,\hat{q}_u}$$

Mean-Shift Object Tracking: Approximating the Similarity Function

$$f(\mathbf{y}) = \sum_{u=1}^{m}\sqrt{\hat{p}_u(\mathbf{y})\,\hat{q}_u}$$

With model location $\mathbf{y}_0$ and candidate location y, a linear (Taylor) approximation around $\mathbf{y}_0$ gives

$$f(\mathbf{y}) \approx \frac{1}{2}\sum_{u=1}^{m}\sqrt{\hat{p}_u(\mathbf{y}_0)\,\hat{q}_u} + \frac{1}{2}\sum_{u=1}^{m}\hat{p}_u(\mathbf{y})\sqrt{\frac{\hat{q}_u}{\hat{p}_u(\mathbf{y}_0)}}$$

The first term is independent of y. Substituting $\hat{p}_u(\mathbf{y}) = C_h \sum_{i:\,b(\mathbf{x}_i)=u} k\!\left(\left\|\frac{\mathbf{y}-\mathbf{x}_i}{h}\right\|^2\right)$, the second term becomes

$$\frac{C_h}{2}\sum_{i=1}^{n} w_i\,k\!\left(\left\|\frac{\mathbf{y}-\mathbf{x}_i}{h}\right\|^2\right), \qquad w_i = \sum_{u=1}^{m}\sqrt{\frac{\hat{q}_u}{\hat{p}_u(\mathbf{y}_0)}}\;\delta\!\left[b(\mathbf{x}_i)-u\right]$$

a density estimate (as a function of y).

Mean-Shift Object Tracking: Maximizing the Similarity Function
The mode of $\frac{C_h}{2}\sum_i w_i\,k\!\left(\left\|\frac{\mathbf{y}-\mathbf{x}_i}{h}\right\|^2\right)$ is the sought maximum.
Important assumption: the target representation provides sufficient discrimination, so that there is only one mode in the searched neighborhood.
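The two quantities this derivation needs, as small helpers (my code, not from the slides; bin indices assumed 0-based):

```python
import numpy as np

def bhattacharyya(p, q):
    """Similarity f(y) = sum_u sqrt(p_u(y) * q_u): the cosine of the
    angle between the unit vectors sqrt(p) and sqrt(q)."""
    return float(np.sum(np.sqrt(p * q)))

def pixel_weights(bins, p0, q):
    """w_i = sqrt(q_u / p_u(y0)) for u = b(x_i): large where the candidate
    has too little of the model's color, small where it has too much."""
    return np.sqrt(q[bins] / np.maximum(p0[bins], 1e-12))
```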

Mean-Shift Object Tracking: About Kernels and Profiles
A special class of radially symmetric kernels: $K(\mathbf{x}) = c\,k(\|\mathbf{x}\|^2)$, where k is called the profile of kernel K. As before, define $g(x) = -k'(x)$.

Mean-Shift Object Tracking: Applying Mean-Shift
The mode of $\frac{C_h}{2}\sum_i w_i\,k\!\left(\left\|\frac{\mathbf{y}-\mathbf{x}_i}{h}\right\|^2\right)$ is the sought maximum.

Original mean-shift: find the mode of $c\sum_i k\!\left(\left\|\frac{\mathbf{y}-\mathbf{x}_i}{h}\right\|^2\right)$ using the iteration

$$\mathbf{y}_1 = \frac{\sum_{i=1}^{n}\mathbf{x}_i\,g\!\left(\left\|\frac{\mathbf{y}_0-\mathbf{x}_i}{h}\right\|^2\right)}{\sum_{i=1}^{n} g\!\left(\left\|\frac{\mathbf{y}_0-\mathbf{x}_i}{h}\right\|^2\right)}$$

Extended mean-shift: find the mode of $c\sum_i w_i\,k\!\left(\left\|\frac{\mathbf{y}-\mathbf{x}_i}{h}\right\|^2\right)$ using

$$\mathbf{y}_1 = \frac{\sum_{i=1}^{n}\mathbf{x}_i\,w_i\,g\!\left(\left\|\frac{\mathbf{y}_0-\mathbf{x}_i}{h}\right\|^2\right)}{\sum_{i=1}^{n} w_i\,g\!\left(\left\|\frac{\mathbf{y}_0-\mathbf{x}_i}{h}\right\|^2\right)}$$

Mean-Shift Object Tracking: Choosing the Kernel
Using the Epanechnikov profile, $k(x) = 1-x$ for $x \le 1$ (0 otherwise), gives the uniform profile $g(x) = -k'(x) = 1$ for $x \le 1$ (0 otherwise). The extended mean-shift update then reduces to a simple weighted average of the pixel locations inside the window:

$$\mathbf{y}_1 = \frac{\sum_{i=1}^{n}\mathbf{x}_i\,w_i}{\sum_{i=1}^{n} w_i}$$

Mean-Shift Object Tracking: Adaptive Scale
Problem: the scale of the target changes in time, so the scale (h) of the kernel must be adapted.
Solution: run the localization 3 times with different h, and choose the h that achieves maximum similarity (see the sketch below).

Mean-Shift Object Tracking: Results (from Comaniciu, Ramesh, Meer)
The tracker handles partial occlusion, distraction, and motion blur.
Feature space: 16×16×16 quantized RGB. Target: manually selected on the 1st frame. Average number of mean-shift iterations: 4.

Mean-Shift Object Tracking: Results (from Comaniciu, Ramesh, Meer)
Feature space: 128×128 quantized RG.

Mean-Shift Object Tracking: Results (from Comaniciu, Ramesh, Meer)
The man himself… Handling scale changes. Feature space: 128×128 quantized RG.

Mean-Shift Object Tracking: The Scale Selection Problem
• Kernel too big: poor localization; in uniformly colored regions, similarity is invariant to h.
• Kernel too small: a smaller h may achieve better similarity, so nothing keeps h from shrinking too small!
• In short, h mustn't get too big or too small.

Some Approaches to Size Selection
• Choose one scale and stick with it.
• Bradski's CAMSHIFT tracker computes principal axes and scales from the second moment matrix of the blob. Assumes a single blob and little clutter.
• CRM adapt the window size by +/- 10% and evaluate using the Bhattacharyya coefficient. Although this does stop the window from growing too big, it is not sufficient to keep the window from shrinking too much.
• Comaniciu's variable-bandwidth methods. Computationally complex.
• Rasmussen and Hager: add a border of pixels around the window (center-surround), and require that pixels in the window should look like the object, while pixels in the border should not.

Scale-Space Motivation
Previous method: spatial localization for several scales, then choose the best scale. This method: simultaneous localization in both space and scale.
From: Mean-shift Blob Tracking through Scale Space, by R. Collins.

Tracking Through Scale Space
Form a resolution scale space by convolving the image with Gaussians of increasing variance. Lindeberg proposes that the natural scale for describing a feature is the scale at which a normalized differential operator for detecting that feature achieves a local maximum both spatially and in scale. For blob detection, the Laplacian operator is used, leading to a search for modes in a LOG scale space. (In practice, we approximate the LOG operator by a DOG.)

Lindeberg's Theory
The Laplacian of Gaussian (LOG) operator selects blob-like features. The 2D LOG filter with scale σ is

$$\mathrm{LOG}(\mathbf{x};\sigma) = \nabla^2 G(\mathbf{x};\sigma) = \frac{\|\mathbf{x}\|^2 - 2\sigma^2}{\sigma^4}\,G(\mathbf{x};\sigma)$$

Multi-scale feature selection process: convolve the original image $f(\mathbf{x})$ with LOG filters at scales $\sigma \in \{\sigma_1,\ldots,\sigma_k\}$ to build a 3D scale-space representation

$$L(\mathbf{x},\sigma) = \mathrm{LOG}(\mathbf{x};\sigma) * f(\mathbf{x})$$

then maximize: the best features are at the $(\mathbf{x},\sigma)$ that maximize L. (Figure: the 250 strongest responses on an example image; a large circle denotes a large scale.)

Tracking Through Scale Space: Approximating LOG using DOG
The LOG filter is approximated by a difference of Gaussians (DOG):

$$\mathrm{LOG}(\mathbf{x};\sigma) \approx \mathrm{DOG}(\mathbf{x};\sigma) = G(\mathbf{x};\sigma) - G(\mathbf{x};1.6\,\sigma)$$

i.e., the difference of a 2D Gaussian (μ = 0) with scale σ and one with scale 1.6σ. Why DOG?
• Gaussian pyramids are created faster
• A Gaussian can be used as a mean-shift kernel

Tracking Through Scale Space: Using Lindeberg's Theory
Recall: the model $\hat{\mathbf{q}} = \{\hat{q}_u\}_{u=1..m}$ is centered at $\mathbf{y}_0$; the candidate is $\hat{\mathbf{p}}(\mathbf{y}) = \{\hat{p}_u(\mathbf{y})\}_{u=1..m}$; $b(\mathbf{x})$ is the color bin of pixel x; and the pixel weight is

$$w(\mathbf{x}) = \sqrt{\frac{\hat{q}_{b(\mathbf{x})}}{\hat{p}_{b(\mathbf{x})}(\mathbf{y}_0)}}$$

the likelihood that each candidate pixel belongs to the target. Convolving the weight image with a filter bank of 3D spatial kernels (DOG filters at multiple scales, centered at the current location) together with a 1D scale kernel (Epanechnikov) yields a 3D scale-space representation E(x,σ). Modes are blobs in this scale space, so we need a mean-shift procedure that finds local modes in E(x,σ).
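A small sketch of the DOG approximation (my code; the 1.6 scale ratio is from the slide, the kernel-size parameter is my addition):

```python
import numpy as np

def gaussian2d(size, sigma):
    """Unit-mass, zero-mean, separable 2D Gaussian on a (size x size) grid."""
    r = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-0.5 * (r / sigma) ** 2)
    G = np.outer(g, g)
    return G / G.sum()

def dog2d(size, sigma):
    """DOG(x; sigma) = G(x; sigma) - G(x; 1.6*sigma): the blob filter
    that approximates the LOG at scale sigma."""
    return gaussian2d(size, sigma) - gaussian2d(size, 1.6 * sigma)
```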

Outline of Approach: the Scale-Space Kernel
General idea: build a "designer" shadow kernel that generates the desired DOG scale space when convolved with the weight image w(x).

Change variables, and take derivatives of the shadow kernel to find the corresponding mean-shift kernels, using the kernel-shadow relationship h'(r) = -c k(r) shown earlier.

Given an initial estimate (x0, σ0), apply the mean-shift algorithm to find the nearest local mode in scale space. Note that, using mean-shift, we DO NOT have to explicitly generate the scale space.

Tracking Through Scale Space: Applying Mean-Shift
Use interleaved spatial/scale mean-shift:
• Spatial stage: fix σ and look for the best x
• Scale stage: fix x and look for the best σ
Iterate the stages until convergence of both x and σ, moving from the initial estimate (x0, σ0) to the mode (xopt, σopt). A sketch of this loop follows.
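A sketch of the interleaved iteration (my code; spatial_mode and scale_mode are hypothetical helpers standing in for the 2D spatial and 1D scale mean-shift steps in E(x, σ)):

```python
import numpy as np

def track_scale_space(E_modes, x0, s0, n_outer=10):
    """Interleaved spatial/scale mean-shift. E_modes is assumed to bundle
    two hypothetical helpers: spatial_mode(x, s) -> best x for fixed sigma,
    and scale_mode(x, s) -> best sigma for fixed x."""
    x, s = np.asarray(x0, dtype=float), float(s0)
    for _ in range(n_outer):
        x_new = E_modes.spatial_mode(x, s)    # spatial stage: fix sigma, move x
        s_new = E_modes.scale_mode(x_new, s)  # scale stage: fix x, move sigma
        if np.linalg.norm(x_new - x) < 0.5 and abs(s_new - s) < 1e-3:
            return x_new, s_new               # converged in both x and sigma
        x, s = x_new, s_new
    return x, s
```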

Sample results: no scaling; the +/- 10% rule; scale-space tracking.

Tracking Through Scale Space: Results
• Fixed scale
• +/- 10% scale adaptation
• Tracking through scale space