Mean-Shift Tracking
R. Collins, CSE, PSU, CSE598G Spring 2006

Appearance-Based Tracking
current frame + previous location → likelihood over object location → current location
• Appearance model (e.g., image template, or color/intensity/edge histograms)
• Mode-seeking (e.g., mean-shift; Lucas-Kanade; particle filtering)

Mean-Shift Motivation
• To track non-rigid objects (like a walking person), it is hard to specify an explicit 2D parametric motion model.
• Appearances of non-rigid objects can sometimes be modeled with color distributions.

The mean-shift algorithm is an efficient approach to tracking objects whose appearance is defined by histograms (not limited to only color).

Credits: Many slides borrowed from
www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/mean_shift/mean_shift.ppt
"Mean Shift Theory and Applications" by Yaron Ukrainitz & Bernard Sarel, Weizmann Institute
...even the slides on my own work! --Bob

Intuitive Description
Region of interest; center of mass; mean-shift vector.
Objective: find the densest region of a distribution of identical billiard balls. The region of interest is repeatedly shifted by the mean-shift vector, i.e. toward the center of mass of the points inside it.

What is Mean Shift?
A tool for finding modes in a set of data samples, manifesting an underlying probability density function (PDF) in R^N.

PDF in feature space:
• Color space
• Scale space
• Actually any feature space you can conceive
• …

The ingredients: a discrete PDF representation of the data; non-parametric density estimation; non-parametric density GRADIENT estimation (mean shift); PDF analysis.

Non-Parametric Density Estimation

Assumption: the data points are sampled from an underlying PDF.
Data point density implies PDF value!
(Figure: an assumed underlying PDF and real data samples drawn from it.)

Appearance via Color Histograms
Color distribution: a histogram over the discretized color space (R', G', B'), normalized to have unit weight.

R' = R >> (8 - nbits)
G' = G >> (8 - nbits)
B' = B >> (8 - nbits)

Total histogram size is (2^nbits)^3. For example, 4-bit encoding of the R, G and B channels yields a histogram of size 16*16*16 = 4096.

Smaller Color Histograms
Histogram information can be much smaller if we are willing to accept a loss in color resolvability: keep three 1D marginal distributions (marginal R, marginal G, and marginal B) instead of the joint 3D histogram.

R' = R >> (8 - nbits)
G' = G >> (8 - nbits)
B' = B >> (8 - nbits)

Total histogram size is 3*(2^nbits). For example, 4-bit encoding of the R, G and B channels yields a histogram of size 3*16 = 48.

(Color histogram example: red, green and blue marginal distributions of a sample image.)
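To make the two histogram layouts concrete, here is a minimal numpy sketch (not from the original slides; the function names and the 8-bit RGB input assumption are mine) of the joint and marginal quantized color histograms described above:

```python
import numpy as np

def joint_color_histogram(rgb, nbits=4):
    """Joint 3D histogram: quantize each 8-bit channel to nbits
    ((2**nbits)**3 bins total), then normalize to unit weight."""
    q = (rgb.reshape(-1, 3) >> (8 - nbits)).astype(np.int64)  # R', G', B'
    bins = (q[:, 0] << (2 * nbits)) | (q[:, 1] << nbits) | q[:, 2]
    hist = np.bincount(bins, minlength=(2 ** nbits) ** 3).astype(float)
    return hist / hist.sum()

def marginal_color_histograms(rgb, nbits=4):
    """Three 1D marginal histograms (3 * 2**nbits bins total): much
    smaller, at the cost of some color resolvability."""
    q = (rgb.reshape(-1, 3) >> (8 - nbits)).astype(np.int64)
    hists = [np.bincount(q[:, c], minlength=2 ** nbits) for c in range(3)]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()
```

With nbits=4 these reproduce the 4096-bin joint and 48-bin marginal examples above.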

Normalized Color
(r, g, b) → (r', g', b') = (r, g, b) / (r + g + b)

Normalized color divides out pixel luminance (brightness), leaving behind only chromaticity (color) information. The result is less sensitive to variations due to illumination/shading.
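A quick sketch of that normalization (the float conversion and the guard against black pixels are my additions):

```python
import numpy as np

def normalized_color(rgb):
    """(r,g,b) -> (r,g,b)/(r+g+b): divides out luminance, keeping only
    chromaticity, so the result is less sensitive to shading changes."""
    rgb = rgb.astype(float)
    s = rgb.sum(axis=-1, keepdims=True)
    return rgb / np.maximum(s, 1e-9)  # avoid dividing by zero at black pixels
```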

Intro to Parzen Estimation (aka Kernel Density Estimation)
A mathematical model of how histograms are formed. Assume continuous data points:
• Convolve the points with a box filter of width w (e.g., [1 1 1])
• Take samples of the result, with spacing w
• The resulting value at point u represents the count of data points falling in the range u - w/2 to u + w/2

Example histograms: box filters of increasing width, e.g. [1 1 1], then [1 1 1 1 1 1 1], then [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1], give increased smoothing.

Why Formulate it This Way?
• It generalizes from the box filter to other filters (for example a Gaussian).
• The Gaussian acts as a smoothing filter.

Kernel Density Estimation
• Parzen windows: approximate the probability density by estimating the local density of points (the same idea as a histogram).
• Convolve the points with a window/kernel function (e.g., a Gaussian) using a scale parameter (e.g., sigma).
(Figure from Hastie et al.)
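The following 1D sketch (my own illustration, not from the slides) shows the Parzen view of a histogram: a box kernel of width w reproduces bin counts, while a Gaussian kernel gives a smooth density estimate:

```python
import numpy as np

def parzen_density(samples, query, width, kernel="box"):
    """Parzen / kernel density estimate  P(x) = (1/n) sum_i K(x - x_i).
    'box' with width w matches histogram counts (up to the 1/w area
    normalization); 'gauss' uses width as the smoothing sigma."""
    d = query[:, None] - samples[None, :]               # pairwise x - x_i
    if kernel == "box":
        k = (np.abs(d) <= width / 2) / width            # unit-area box filter
    else:
        k = np.exp(-0.5 * (d / width) ** 2) / (width * np.sqrt(2 * np.pi))
    return k.mean(axis=1)                               # average over data points

# Example: smooth estimate from 100 samples, evaluated on a grid
# parzen_density(np.random.randn(100), np.linspace(-3, 3, 61), 0.5, "gauss")
```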

Density Estimation at Different Scales
• Example: density estimates for 5 data points with differently-scaled kernels (figures from Duda et al.)
• Scale influences accuracy vs. generality (overfitting)

Smoothing Scale Selection
• Unresolved issue: how to pick the scale (sigma value) of the smoothing filter
• Answer for now: it is a user-settable parameter

Kernel Density Estimation: Parzen Windows, General Framework

A kernel density estimate is a function of some finite number of data points $\mathbf{x}_1 \ldots \mathbf{x}_n$:

$$P(\mathbf{x}) = \frac{1}{n}\sum_{i=1}^{n} K(\mathbf{x}-\mathbf{x}_i)$$

Kernel properties:
• Normalized: $\int_{R^d} K(\mathbf{x})\,d\mathbf{x} = 1$
• Symmetric: $\int_{R^d} \mathbf{x}\,K(\mathbf{x})\,d\mathbf{x} = \mathbf{0}$
• Exponential weight decay: $\lim_{\|\mathbf{x}\|\to\infty} \|\mathbf{x}\|^{d} K(\mathbf{x}) = 0$
• Uncorrelated dimensions: $\int_{R^d} \mathbf{x}\mathbf{x}^{T} K(\mathbf{x})\,d\mathbf{x} = c\,\mathbf{I}$

In practice one uses one of two forms:

$$K(\mathbf{x}) = c\prod_{i=1}^{d} k(x_i) \;\;\text{(same function on each dimension)} \qquad\text{or}\qquad K(\mathbf{x}) = c\,k(\|\mathbf{x}\|) \;\;\text{(function of the vector length only)}$$

Kernel Density Estimation: Various Kernels

$$P(\mathbf{x}) = \frac{1}{n}\sum_{i=1}^{n} K(\mathbf{x}-\mathbf{x}_i)$$

Examples:
• Epanechnikov kernel: $K_E(\mathbf{x}) = \begin{cases} c\,(1-\|\mathbf{x}\|^2) & \|\mathbf{x}\|\le 1 \\ 0 & \text{otherwise} \end{cases}$
• Uniform kernel: $K_U(\mathbf{x}) = \begin{cases} c & \|\mathbf{x}\|\le 1 \\ 0 & \text{otherwise} \end{cases}$
• Normal kernel: $K_N(\mathbf{x}) = c\,\exp\!\left(-\tfrac{1}{2}\|\mathbf{x}\|^2\right)$

Key idea: the superposition of kernels centered at each data point is equivalent to convolving the data points with the kernel (Data * Kernel).

Kernel Density Gradient Estimation (Mean Shift)

Key idea: give up estimating the PDF itself! Estimate ONLY its gradient:

$$\nabla P(\mathbf{x}) = \frac{1}{n}\sum_{i=1}^{n} \nabla K(\mathbf{x}-\mathbf{x}_i)$$

The gradient of a superposition of kernels centered at each data point is equivalent to convolving the data points with the gradient of the kernel.

Using the kernel form $K(\mathbf{x}-\mathbf{x}_i) = c\,k\!\left(\left\|\frac{\mathbf{x}-\mathbf{x}_i}{h}\right\|^2\right)$, where h is the size of the window, and defining $g(x) = -k'(x)$ and $g_i = g\!\left(\left\|\frac{\mathbf{x}-\mathbf{x}_i}{h}\right\|^2\right)$, we get:

$$\nabla P(\mathbf{x}) = \frac{c}{n}\sum_{i=1}^{n}\nabla k_i = \frac{c}{n}\left[\sum_{i=1}^{n} g_i\right]\left[\frac{\sum_{i=1}^{n}\mathbf{x}_i\,g_i}{\sum_{i=1}^{n} g_i} - \mathbf{x}\right]$$

The first bracket is yet another kernel density estimation (with profile g); the second bracket is the mean-shift vector:

$$\mathbf{m}(\mathbf{x}) = \frac{\sum_{i=1}^{n}\mathbf{x}_i\,g\!\left(\left\|\frac{\mathbf{x}-\mathbf{x}_i}{h}\right\|^2\right)}{\sum_{i=1}^{n} g\!\left(\left\|\frac{\mathbf{x}-\mathbf{x}_i}{h}\right\|^2\right)} - \mathbf{x}$$

Computing the mean shift, the simple mean-shift procedure is:
• Compute the mean-shift vector m(x)
• Translate the kernel window by m(x)
• Repeat until convergence
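A minimal sketch of this procedure (my own code; it assumes the point samples are rows of a numpy array and uses a Gaussian profile, for which g(r) = -k'(r) is again an exponential up to a constant):

```python
import numpy as np

def mean_shift_mode(points, x0, h, tol=1e-5, max_iter=500):
    """Simple mean-shift procedure: compute the mean-shift vector m(x),
    translate the window by m(x), and repeat until the step size falls
    below a lower bound."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        r = np.sum((points - x) ** 2, axis=1) / h ** 2  # ||(x - x_i)/h||^2
        g = np.exp(-0.5 * r)                            # g_i for a Gaussian profile
        m = (points * g[:, None]).sum(0) / g.sum() - x  # mean-shift vector m(x)
        x = x + m                                       # translate the window
        if np.linalg.norm(m) < tol:                     # convergence lower bound
            break
    return x
```

Running this from every data point and grouping the converged positions (after the pruning steps described below) yields the modes.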

Mean Shift Mode Detection
What happens if we reach a saddle point? Perturb the mode position and check if we return back.

Updated mean-shift procedure:
• Find all modes using the simple mean-shift procedure
• Prune modes by perturbing them (this finds saddle points and plateaus)
• Prune nearby modes: take the highest mode in the window

Mean Shift Properties
• Automatic convergence speed: the mean-shift vector size depends on the gradient itself (adaptive gradient ascent)
• Near maxima, the steps are small and refined
• Convergence is guaranteed for infinitesimal steps only, i.e. it is infinitely convergent (therefore set a lower bound on the step size)
• For a uniform kernel, convergence is achieved in a finite number of steps
• A normal kernel exhibits a smooth trajectory, but is slower than the uniform kernel

Real Modality Analysis
Tessellate the space with windows and run the procedure in parallel. The blue data points were traversed by the windows towards the modes; the window tracks signify the steepest-ascent directions.

Adaptive Mean Shift: an example on real data.

Mean Shift Strengths & Weaknesses

Strengths:
• An application-independent tool
• Suitable for real data analysis
• Does not assume any prior shape (e.g., elliptical) on data clusters
• Can handle arbitrary feature spaces
• Only ONE parameter to choose
• h (window size) has a physical meaning, unlike k-means

Weaknesses:
• The window size (bandwidth selection) is not trivial
• An inappropriate window size can cause modes to be merged, or generate additional "shallow" modes → use an adaptive window size

Mean Shift Applications

Clustering
Cluster: all data points in the attraction basin of a mode.
Attraction basin: the region for which all trajectories lead to the same mode.

Synthetic examples: simple modal structures and complex modal structures.
From: Mean Shift: A Robust Approach Toward Feature Space Analysis, by Comaniciu and Meer.

Clustering: Real Example
Feature space: L*u*v color representation. Initial window centers → modes found → modes after pruning → final clusters.
Final clusters shown in the 2D (L*u) space representation. Note: not all trajectories in the attraction basin reach the same mode.

Non-Rigid Object Tracking
From: Kernel-Based Object Tracking, by Comaniciu, Ramesh, Meer (CRM).

Non-Rigid Object Tracking
Real-time, object-based applications: surveillance, driver assistance, video compression.

Mean-Shift Object Tracking, General Framework: Target Representation
Choose a reference model in the current frame → choose a feature space → represent the model in the chosen feature space.

Mean-Shift Object Tracking, General Framework: Target Localization
Start from the position of the model in the current frame → search in the model's neighborhood in the next frame → find the best candidate by maximizing a similarity function. Repeat the same process (model and candidate) in the next pair of frames.

Using Mean-Shift for Tracking in Color Images: two approaches
1) Create a color "likelihood" image, with pixels weighted by similarity to the desired color (best for unicolored objects).
2) Represent the color distribution with a histogram. Use mean-shift to find the region that has the most similar distribution of colors.

Mean-Shift Tracking
Ideally, we want an indicator function that returns 1 for pixels on the object we are tracking, and 0 for all other pixels. Instead, we compute likelihood maps where the value at a pixel is proportional to the likelihood that the pixel comes from the object we are tracking. Computation of likelihood can be based on:
• color
• texture
• shape (boundary)
• predicted location

Mean-Shift on Weight Images
Note: so far, we have described mean-shift as operating over a set of point samples. Now let the pixels form a uniform grid of data points a, each with a weight w(a) (the pixel value) proportional to the "likelihood" that the pixel is on the object we want to track, and perform the standard mean-shift algorithm using this weighted set of points:

$$\Delta\mathbf{x} = \frac{\sum_{\mathbf{a}} K(\mathbf{a}-\mathbf{x})\,w(\mathbf{a})\,(\mathbf{a}-\mathbf{x})}{\sum_{\mathbf{a}} K(\mathbf{a}-\mathbf{x})\,w(\mathbf{a})}$$
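A sketch of that update on a weight image (my own code, with a Gaussian kernel K; in practice one would restrict the sums to a window around x rather than the whole image):

```python
import numpy as np

def mean_shift_on_weight_image(w, x0, h, n_iter=30):
    """Mean-shift where the samples are grid pixels a with weights w(a):
    x <- x + sum_a K(a-x) w(a) (a-x) / sum_a K(a-x) w(a)."""
    ys, xs = np.mgrid[0:w.shape[0], 0:w.shape[1]]
    a = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)  # (col,row) coords
    wa = w.ravel().astype(float)
    x = np.asarray(x0, dtype=float)                               # (col,row) start
    for _ in range(n_iter):
        K = np.exp(-0.5 * np.sum((a - x) ** 2, axis=1) / h ** 2)  # kernel weights
        shift = ((a - x) * (K * wa)[:, None]).sum(0) / (K * wa).sum()
        x = x + shift                                             # weighted mean shift
        if np.linalg.norm(shift) < 1e-3:
            break
    return x
```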

Nice Property
Running mean-shift with kernel K on weight image w is equivalent to performing gradient ascent in a (virtual) image formed by convolving w with some "shadow" kernel H.

Kernel-Shadow Pairs
Given a convolution (shadow) kernel H, what is the corresponding mean-shift kernel K? Perform the change of variables $r = \|\mathbf{a}-\mathbf{x}\|^2$ and rewrite $H(\mathbf{a}-\mathbf{x}) \Rightarrow h(\|\mathbf{a}-\mathbf{x}\|^2) \Rightarrow h(r)$. Then kernel K must satisfy

$$h'(r) = -c\,k(r)$$

Examples of shadow → kernel pairs: Epanechnikov → flat (uniform); Gaussian → Gaussian.

Note: the mode we are looking for is a mode of the location (x,y) likelihood, NOT a mode of the color distribution!

Using Mean-Shift on Color Models
Recall the two approaches:
1) Create a color "likelihood" image, with pixels weighted by similarity to the desired color (best for unicolored objects).
2) Represent the color distribution with a histogram. Use mean-shift to find the region that has the most similar distribution of colors.

High-Level Overview of the histogram approach:
• Spatially smooth the similarity function by introducing a spatial kernel (Gaussian, box filter).
• Take the derivative of the similarity with respect to colors. This tells us what colors we need more/less of to make the current histogram more similar to the reference histogram.
• The result is the weighted mean shift we used before; however, the color weights are now computed "on-the-fly", and change from one iteration to the next.

Mean-Shift Object Tracking: Target Representation
Choose a reference target model (e.g., a quantized color space) → choose a feature space → represent the model by its PDF in the feature space.

Mean-Shift Object Tracking: PDF Representation
Target model (centered at 0): $\hat{\mathbf{q}} = \{\hat{q}_u\}_{u=1..m}$ with $\sum_{u=1}^{m}\hat{q}_u = 1$
Target candidate (centered at y): $\hat{\mathbf{p}}(\mathbf{y}) = \{\hat{p}_u(\mathbf{y})\}_{u=1..m}$ with $\sum_{u=1}^{m}\hat{p}_u = 1$
(Figure: m-bin color histograms of the model and of the candidate.)

Similarity function: $f(\mathbf{y}) = f[\hat{\mathbf{q}}, \hat{\mathbf{p}}(\mathbf{y})]$

Mean-Shift Object Tracking: Smoothness of the Similarity Function
Similarity function: $f(\mathbf{y}) = f[\hat{\mathbf{p}}(\mathbf{y}), \hat{\mathbf{q}}]$
Problem: if the target is represented by color info only, spatial info is lost; there are large similarity variations between adjacent locations, so f is not smooth and gradient-based optimizations are not robust.
Solution: mask the target with an isotropic kernel in the spatial domain. Then f(y) becomes smooth in y.

Mean-Shift Object Tracking: Finding the PDF of the Target Model
Target pixel locations: $\{\mathbf{x}_i\}_{i=1..n}$
$k(x)$: a differentiable, isotropic, convex, monotonically decreasing kernel profile (peripheral pixels get low weight, since they are the ones affected by occlusion and background interference)
$b(\mathbf{x})$: the color bin index (1..m) of pixel x

Probability of feature u in the model:
$$\hat{q}_u = C \sum_{i:\,b(\mathbf{x}_i)=u} k\!\left(\|\mathbf{x}_i\|^2\right)$$
Probability of feature u in the candidate centered at y:
$$\hat{p}_u(\mathbf{y}) = C_h \sum_{i:\,b(\mathbf{x}_i)=u} k\!\left(\left\|\frac{\mathbf{y}-\mathbf{x}_i}{h}\right\|^2\right)$$
The $k(\cdot)$ terms are the pixel weights, and C, C_h are normalization factors ensuring each histogram sums to one.
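A sketch of these histograms (my own code; it assumes a precomputed bin-index image, 0-based bin indices, and uses the Epanechnikov profile k(r) = 1 - r for r ≤ 1):

```python
import numpy as np

def kernel_histogram(bins, locs, center, h, m):
    """p_u = C * sum over pixels i with b(x_i)=u of k(||(center - x_i)/h||^2),
    with an Epanechnikov profile; C normalizes the histogram to sum to 1.
    bins: (n,) int bin indices b(x_i); locs: (n,2) float pixel coordinates.
    For the model q, pass the model's own center and radius h."""
    r = np.sum((locs - center) ** 2, axis=1) / h ** 2   # ||(y - x_i)/h||^2
    k = np.clip(1.0 - r, 0.0, None)                     # Epanechnikov profile
    p = np.bincount(bins, weights=k, minlength=m)
    return p / max(p.sum(), 1e-12)                      # normalization constant
```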

Mean-Shift Object Tracking: Target Localization Algorithm
Start from the position of the model in the current frame → search in the model's neighborhood in the next frame → find the best candidate by maximizing the similarity function.

Mean-Shift Object Tracking: The Similarity Function
Target model: $\hat{\mathbf{q}} = (\hat{q}_1,\ldots,\hat{q}_m)$
Target candidate: $\hat{\mathbf{p}}(\mathbf{y}) = (\hat{p}_1(\mathbf{y}),\ldots,\hat{p}_m(\mathbf{y}))$
Similarity function: $f(\mathbf{y}) = f[\hat{\mathbf{p}}(\mathbf{y}), \hat{\mathbf{q}}] = \;?$

The Bhattacharyya coefficient: treat $\sqrt{\hat{\mathbf{q}}} = \left(\sqrt{\hat{q}_1},\ldots,\sqrt{\hat{q}_m}\right)$ and $\sqrt{\hat{\mathbf{p}}(\mathbf{y})} = \left(\sqrt{\hat{p}_1(\mathbf{y})},\ldots,\sqrt{\hat{p}_m(\mathbf{y})}\right)$ as unit vectors. Then

$$f(\mathbf{y}) = \cos\theta_{\mathbf{y}} = \frac{\sqrt{\hat{\mathbf{p}}(\mathbf{y})}^{\,T}\sqrt{\hat{\mathbf{q}}}}{\left\|\sqrt{\hat{\mathbf{p}}(\mathbf{y})}\right\|\left\|\sqrt{\hat{\mathbf{q}}}\right\|} = \sum_{u=1}^{m}\sqrt{\hat{p}_u(\mathbf{y})\,\hat{q}_u}$$

Mean-Shift Object Tracking: Approximating the Similarity Function

$$f(\mathbf{y}) = \sum_{u=1}^{m}\sqrt{\hat{p}_u(\mathbf{y})\,\hat{q}_u}$$

With model location $\mathbf{y}_0$ and candidate location y, a linear (Taylor) approximation around $\mathbf{y}_0$ gives

$$f(\mathbf{y}) \approx \frac{1}{2}\sum_{u=1}^{m}\sqrt{\hat{p}_u(\mathbf{y}_0)\,\hat{q}_u} + \frac{1}{2}\sum_{u=1}^{m}\hat{p}_u(\mathbf{y})\sqrt{\frac{\hat{q}_u}{\hat{p}_u(\mathbf{y}_0)}}$$

The first term is independent of y. Substituting $\hat{p}_u(\mathbf{y}) = C_h \sum_{i:\,b(\mathbf{x}_i)=u} k\!\left(\left\|\frac{\mathbf{y}-\mathbf{x}_i}{h}\right\|^2\right)$, the second term becomes

$$\frac{C_h}{2}\sum_{i=1}^{n} w_i\,k\!\left(\left\|\frac{\mathbf{y}-\mathbf{x}_i}{h}\right\|^2\right), \qquad w_i = \sum_{u=1}^{m}\sqrt{\frac{\hat{q}_u}{\hat{p}_u(\mathbf{y}_0)}}\;\delta\!\left[b(\mathbf{x}_i)-u\right]$$

a density estimate (as a function of y).

Mean-Shift Object Tracking: Maximizing the Similarity Function
The mode of $\frac{C_h}{2}\sum_i w_i\,k\!\left(\left\|\frac{\mathbf{y}-\mathbf{x}_i}{h}\right\|^2\right)$ is the sought maximum.
Important assumption: the target representation provides sufficient discrimination, so that there is only one mode in the searched neighborhood.
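The two quantities this derivation needs, as small helpers (my code, not from the slides; bin indices assumed 0-based):

```python
import numpy as np

def bhattacharyya(p, q):
    """Similarity f(y) = sum_u sqrt(p_u(y) * q_u): the cosine of the
    angle between the unit vectors sqrt(p) and sqrt(q)."""
    return float(np.sum(np.sqrt(p * q)))

def pixel_weights(bins, p0, q):
    """w_i = sqrt(q_u / p_u(y0)) for u = b(x_i): large where the candidate
    has too little of the model's color, small where it has too much."""
    return np.sqrt(q[bins] / np.maximum(p0[bins], 1e-12))
```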

Mean-Shift Object Tracking: About Kernels and Profiles
A special class of radially symmetric kernels: $K(\mathbf{x}) = c\,k(\|\mathbf{x}\|^2)$, where k is called the profile of kernel K. As before, define $g(x) = -k'(x)$.

Mean-Shift Object Tracking: Applying Mean-Shift
The mode of $\frac{C_h}{2}\sum_i w_i\,k\!\left(\left\|\frac{\mathbf{y}-\mathbf{x}_i}{h}\right\|^2\right)$ is the sought maximum.

Original mean-shift: find the mode of $c\sum_i k\!\left(\left\|\frac{\mathbf{y}-\mathbf{x}_i}{h}\right\|^2\right)$ using the iteration

$$\mathbf{y}_1 = \frac{\sum_{i=1}^{n}\mathbf{x}_i\,g\!\left(\left\|\frac{\mathbf{y}_0-\mathbf{x}_i}{h}\right\|^2\right)}{\sum_{i=1}^{n} g\!\left(\left\|\frac{\mathbf{y}_0-\mathbf{x}_i}{h}\right\|^2\right)}$$

Extended mean-shift: find the mode of $c\sum_i w_i\,k\!\left(\left\|\frac{\mathbf{y}-\mathbf{x}_i}{h}\right\|^2\right)$ using

$$\mathbf{y}_1 = \frac{\sum_{i=1}^{n}\mathbf{x}_i\,w_i\,g\!\left(\left\|\frac{\mathbf{y}_0-\mathbf{x}_i}{h}\right\|^2\right)}{\sum_{i=1}^{n} w_i\,g\!\left(\left\|\frac{\mathbf{y}_0-\mathbf{x}_i}{h}\right\|^2\right)}$$

Mean-Shift Object Tracking: Choosing the Kernel
Using the Epanechnikov profile, $k(x) = 1-x$ for $x \le 1$ (0 otherwise), gives the uniform profile $g(x) = -k'(x) = 1$ for $x \le 1$ (0 otherwise). The extended mean-shift update then reduces to a simple weighted average of the pixel locations inside the window:

$$\mathbf{y}_1 = \frac{\sum_{i=1}^{n}\mathbf{x}_i\,w_i}{\sum_{i=1}^{n} w_i}$$

Mean-Shift Object Tracking: Adaptive Scale
Problem: the scale of the target changes in time, so the scale (h) of the kernel must be adapted.
Solution: run the localization 3 times with different h, and choose the h that achieves maximum similarity (see the sketch below).

Mean-Shift Object Tracking: Results (from Comaniciu, Ramesh, Meer)
The tracker handles partial occlusion, distraction, and motion blur.
Feature space: 16×16×16 quantized RGB. Target: manually selected on the 1st frame. Average number of mean-shift iterations: 4.

Mean-Shift Object Tracking: Results (from Comaniciu, Ramesh, Meer)
Feature space: 128×128 quantized RG.

Mean-Shift Object Tracking: Results (from Comaniciu, Ramesh, Meer)
The man himself… Handling scale changes. Feature space: 128×128 quantized RG.

Mean-Shift Object Tracking: The Scale Selection Problem
• Kernel too big: poor localization; in uniformly colored regions, similarity is invariant to h.
• Kernel too small: a smaller h may achieve better similarity, so nothing keeps h from shrinking too small!
• In short, h mustn't get too big or too small.

Some Approaches to Size Selection
• Choose one scale and stick with it.
• Bradski's CAMSHIFT tracker computes principal axes and scales from the second moment matrix of the blob. Assumes a single blob and little clutter.
• CRM adapt the window size by +/- 10% and evaluate using the Bhattacharyya coefficient. Although this does stop the window from growing too big, it is not sufficient to keep the window from shrinking too much.
• Comaniciu's variable-bandwidth methods. Computationally complex.
• Rasmussen and Hager: add a border of pixels around the window (center-surround), and require that pixels in the window should look like the object, while pixels in the border should not.

Scale-Space Motivation
Previous method: spatial localization for several scales, then choose the best scale. This method: simultaneous localization in both space and scale.
From: Mean-shift Blob Tracking through Scale Space, by R. Collins.

Tracking Through Scale Space
Form a resolution scale space by convolving the image with Gaussians of increasing variance. Lindeberg proposes that the natural scale for describing a feature is the scale at which a normalized differential operator for detecting that feature achieves a local maximum both spatially and in scale. For blob detection, the Laplacian operator is used, leading to a search for modes in a LOG scale space. (In practice, we approximate the LOG operator by a DOG.)

Lindeberg's Theory
The Laplacian of Gaussian (LOG) operator selects blob-like features. The 2D LOG filter with scale σ is

$$\mathrm{LOG}(\mathbf{x};\sigma) = \nabla^2 G(\mathbf{x};\sigma) = \frac{\|\mathbf{x}\|^2 - 2\sigma^2}{\sigma^4}\,G(\mathbf{x};\sigma)$$

Multi-scale feature selection process: convolve the original image $f(\mathbf{x})$ with LOG filters at scales $\sigma \in \{\sigma_1,\ldots,\sigma_k\}$ to build a 3D scale-space representation

$$L(\mathbf{x},\sigma) = \mathrm{LOG}(\mathbf{x};\sigma) * f(\mathbf{x})$$

then maximize: the best features are at the $(\mathbf{x},\sigma)$ that maximize L. (Figure: the 250 strongest responses on an example image; a large circle denotes a large scale.)

Tracking Through Scale Space: Approximating LOG using DOG
The LOG filter is approximated by a difference of Gaussians (DOG):

$$\mathrm{LOG}(\mathbf{x};\sigma) \approx \mathrm{DOG}(\mathbf{x};\sigma) = G(\mathbf{x};\sigma) - G(\mathbf{x};1.6\,\sigma)$$

i.e., the difference of a 2D Gaussian (μ = 0) with scale σ and one with scale 1.6σ. Why DOG?
• Gaussian pyramids are created faster
• A Gaussian can be used as a mean-shift kernel

Tracking Through Scale Space: Using Lindeberg's Theory
Recall: the model $\hat{\mathbf{q}} = \{\hat{q}_u\}_{u=1..m}$ is centered at $\mathbf{y}_0$; the candidate is $\hat{\mathbf{p}}(\mathbf{y}) = \{\hat{p}_u(\mathbf{y})\}_{u=1..m}$; $b(\mathbf{x})$ is the color bin of pixel x; and the pixel weight is

$$w(\mathbf{x}) = \sqrt{\frac{\hat{q}_{b(\mathbf{x})}}{\hat{p}_{b(\mathbf{x})}(\mathbf{y}_0)}}$$

the likelihood that each candidate pixel belongs to the target. Convolving the weight image with a filter bank of 3D spatial kernels (DOG filters at multiple scales, centered at the current location) together with a 1D scale kernel (Epanechnikov) yields a 3D scale-space representation E(x,σ). Modes are blobs in this scale space, so we need a mean-shift procedure that finds local modes in E(x,σ).
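A small sketch of the DOG approximation (my code; the 1.6 scale ratio is from the slide, the kernel-size parameter is my addition):

```python
import numpy as np

def gaussian2d(size, sigma):
    """Unit-mass, zero-mean, separable 2D Gaussian on a (size x size) grid."""
    r = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-0.5 * (r / sigma) ** 2)
    G = np.outer(g, g)
    return G / G.sum()

def dog2d(size, sigma):
    """DOG(x; sigma) = G(x; sigma) - G(x; 1.6*sigma): the blob filter
    that approximates the LOG at scale sigma."""
    return gaussian2d(size, sigma) - gaussian2d(size, 1.6 * sigma)
```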

Outline of Approach: the Scale-Space Kernel
General idea: build a "designer" shadow kernel that generates the desired DOG scale space when convolved with the weight image w(x).

Change variables, and take derivatives of the shadow kernel to find the corresponding mean-shift kernels, using the kernel-shadow relationship h'(r) = -c k(r) shown earlier.

Given an initial estimate (x0, σ0), apply the mean-shift algorithm to find the nearest local mode in scale space. Note that, using mean-shift, we DO NOT have to explicitly generate the scale space.

Tracking Through Scale Space: Applying Mean-Shift
Use interleaved spatial/scale mean-shift:
• Spatial stage: fix σ and look for the best x
• Scale stage: fix x and look for the best σ
Iterate the stages until convergence of both x and σ, moving from the initial estimate (x0, σ0) to the mode (xopt, σopt). A sketch of this loop follows.
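A sketch of the interleaved iteration (my code; spatial_mode and scale_mode are hypothetical helpers standing in for the 2D spatial and 1D scale mean-shift steps in E(x, σ)):

```python
import numpy as np

def track_scale_space(E_modes, x0, s0, n_outer=10):
    """Interleaved spatial/scale mean-shift. E_modes is assumed to bundle
    two hypothetical helpers: spatial_mode(x, s) -> best x for fixed sigma,
    and scale_mode(x, s) -> best sigma for fixed x."""
    x, s = np.asarray(x0, dtype=float), float(s0)
    for _ in range(n_outer):
        x_new = E_modes.spatial_mode(x, s)    # spatial stage: fix sigma, move x
        s_new = E_modes.scale_mode(x_new, s)  # scale stage: fix x, move sigma
        if np.linalg.norm(x_new - x) < 0.5 and abs(s_new - s) < 1e-3:
            return x_new, s_new               # converged in both x and sigma
        x, s = x_new, s_new
    return x, s
```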

Sample results: no scaling; the +/- 10% rule; scale-space tracking.

Tracking Through Scale Space: Results
• Fixed scale
• +/- 10% scale adaptation
• Tracking through scale space