Robert Collins, CSE586

Advanced Clustering Methods: MeanShift, MedoidShift and QuickShift
(nonparametric mode seeking: no need to know the number of clusters in advance!)

References:

D. Comaniciu and P. Meer, "Mean Shift: A Robust Approach toward Feature Space Analysis," IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 24(5), pp. 603-619, 2002.

Y. Sheikh, E. Khan, and T. Kanade, "Mode-seeking via Medoidshifts," IEEE International Conference on Computer Vision (ICCV), 2007.

A. Vedaldi and S. Soatto, "Quick Shift and Kernel Methods for Mode Seeking," European Conference on Computer Vision (ECCV), 2008.

What is Mean Shift?

A tool for finding modes in a set of data samples, manifesting an underlying probability density function (PDF) in R^N.

[diagram: Data -> non-parametric density estimation (a discrete PDF representation) -> non-parametric density GRADIENT estimation (mean shift) -> PDF analysis. Slides: Ukrainitz & Sarel, Weizmann]

Parzen Density Estimation

Given a set of data samples xi, i = 1...n, convolve with a kernel function H to generate a smooth function f(x). This is equivalent to a superposition of multiple kernels, one centered at each data point.
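A minimal Python sketch of such a Parzen (kernel density) estimate, assuming a Gaussian kernel; the function name and the normalization constant are illustrative choices, not taken from the slides:

import numpy as np

def parzen_density(x, samples, sigma=1.0):
    # Superposition of Gaussian kernels, one centered at each data sample.
    r = np.sum((samples - x) ** 2, axis=1)      # squared distances ||xi - x||^2
    k = np.exp(-r / (2 * sigma ** 2))           # kernel response per sample
    d = samples.shape[1]                        # dimensionality of the data
    return k.sum() / (len(samples) * (2 * np.pi * sigma ** 2) ** (d / 2))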

Smoothing Scale Selection

• Unresolved issue: how to pick the scale (sigma value) of the smoothing kernel
• Scale influences accuracy vs. generality (overfitting)
• Answer for now: a user-settable parameter

Density Estimation at Different Scales

• Example: density estimates for 5 data points with differently-scaled kernels
[figures from Duda et al.]
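For instance, evaluating the sketch above on five 1-D points at several sigmas illustrates the accuracy vs. generality tradeoff (the point values are illustrative, not the ones from Duda et al.):

samples = np.array([[1.0], [2.0], [2.5], [5.0], [6.0]])   # five 1-D data points
for sigma in (0.1, 0.5, 2.0):   # small sigma overfits; large sigma oversmooths
    print(sigma, parzen_density(np.array([2.0]), samples, sigma))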

Mean-Shift

The mean-shift algorithm is a hill-climbing algorithm that seeks modes of a density without explicitly computing that density. The density is implicitly represented by the raw samples and a kernel function: it is the density that would be computed if Parzen estimation were applied to the data with the given kernel.

[figure: Parzen density estimate P(x) formed from samples x1, x2, x3]

MeanShift as Mode Seeking

Seeking the mode from an initial point y1:
• Construct a tight convex lower bound b at y1, with b(y1) = P(y1)
• Find y2 to maximize b
• Note that P(y2) >= b(y2) > b(y1) = P(y1); therefore P(y2) > P(y1)

[figure: P(x) with the bound b touching it at y1; y2, the maximizer of b, has P(y2) > P(y1)]

MeanShift as Mode Seeking (continued)

Move to y2 and repeat until you reach a mode.

[figure: successive iterates y2, y3, y4, y5, y6 climbing P(x) toward the mode]

Aside: EM Optimizes a Lower Bound

• The optimization EM performs can also be described as constructing a lower bound, optimizing it, then repeating until convergence.
• In the case of EM the lower bound is constructed by Jensen's inequality:
      log[ (1/N) Σi f(xi) ] >= (1/N) Σi log f(xi)
• The EM bound is not necessarily quadratic, but it is solvable in closed form.

MeanShift in Equations

Parzen density estimate using {x1, x2, ..., xN}:

    P(y) = (1/N) Σj Φ(||y - xj||^2)

where Φ(r) is a convex, symmetric windowing (weighting) kernel applied to the squared distance r = ||y - xj||^2. Examples:

    Epanechnikov: Φ(r) = 1 - r
    Gaussian:     Φ(r) = exp(-r)

Form a lower bound on each Φj by expanding about r0 = ||y_old - xj||^2 using a first-order Taylor series expansion, f(r) ≈ f(r0) + (r - r0) f'(r0) + h.o.t. Because Φ is convex, this tangent line lies below Φ everywhere, so it is a genuine lower bound.

MeanShift in Equations (continued)

Lower bound of Φ at r0:

    Φ(r) >= Φ(r0) + (r - r0) Φ'(r0)

Now let wj = -Φ'(r0j), where r0j = ||y_old - xj||^2, and substitute r = ||y - xj||^2 into the Parzen equation to get a quadratic lower bound (wrt y) on the Parzen density function:

    b(y) = (1/N) Σj [ Φ(r0j) + (r0j - ||y - xj||^2) wj ]

We want to find y to maximize this; we therefore need to maximize -Σj wj ||y - xj||^2. Take the derivative wrt y, set it to 0, and solve:

    Σj wj (y - xj) = 0
    y_new = ( Σj wj xj ) / ( Σj wj )      <- mean-shift update eqn

where wj = -Φ'(||y_old - xj||^2).
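A small Python sketch of one such update step, assuming the Gaussian kernel: for Φ(r) = exp(-r/(2σ^2)), the weight -Φ'(r) is proportional to Φ(r), and the constant of proportionality cancels in the ratio.

def mean_shift_step(y_old, X, sigma=1.0):
    # One mean-shift update: y_new = Σj wj xj / Σj wj, wj = -Φ'(||y_old - xj||^2).
    r = np.sum((X - y_old) ** 2, axis=1)       # r0j = ||y_old - xj||^2
    w = np.exp(-r / (2 * sigma ** 2))          # Gaussian weights (up to a constant)
    return w @ X / w.sum()                     # weighted mean of the data points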

MeanShift in Equations (summary)

• Density estimate: the Parzen estimate P(y)
• Mean-shift update eqn: y_new = ( Σj wj xj ) / ( Σj wj )
• Weights at each iteration: wj = -Φ'(||y_old - xj||^2)
      Epanechnikov kernel -> flat (constant) weights
      Gaussian kernel -> Gaussian weights

Intuitive Description

Objective: find the densest region of a distribution of identical billiard balls. Place a region of interest on the data, compute the center of mass of the points inside it, and translate the region along the mean-shift vector, which points from the region's center to that center of mass.
[slides: Ukrainitz & Sarel, Weizmann]

[the animation repeats the cycle (region of interest -> center of mass -> mean-shift vector -> move) until the region of interest converges on the densest region]

MeanShift Clustering

For each data point xi:
  Let yi(0) = xi; t = 0
  Repeat:
    Compute weights w1, w2, ..., wN for all data points, using yi(t)
    Compute the updated location yi(t+1) using the mean-shift update eqn
    If yi(t) and yi(t+1) are "close enough", declare convergence to a mode and exit
    else t = t + 1
  end (repeat)

The number of distinct modes found = the number of clusters; points that converge to the "same" mode are labeled as belonging to one cluster. A runnable sketch follows.
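A minimal Python sketch of this procedure, using mean_shift_step from above; the tolerance and the mode-merging radius are illustrative heuristics (the "close enough" tests the slides mention), not prescribed values:

def mean_shift_cluster(X, sigma=1.0, tol=1e-5, max_iter=500):
    modes = []
    for y in X:                                    # start one climb from each point
        for _ in range(max_iter):
            y_new = mean_shift_step(y, X, sigma)
            if np.linalg.norm(y_new - y) < tol:    # "close enough": converged
                break
            y = y_new
        modes.append(y)
    labels, centers = [], []                       # merge coincident modes
    for m in modes:
        for k, c in enumerate(centers):
            if np.linalg.norm(m - c) < 100 * tol:  # heuristic merge radius
                labels.append(k)
                break
        else:
            labels.append(len(centers))
            centers.append(m)
    return np.array(labels), np.array(centers)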

What is MedoidShift?

Instead of solving for a continuous-valued y to maximize the quadratic lower bound on the Parzen density estimate, consider only a discrete set of y locations, taken from the initial set of data points {x1, x2, ..., xN}. That is, for the first step towards the mode, yi(0) = xi as before, but y is now chosen from a discrete set of data points.

MedoidShift

Implications of this strategy for clustering:

Since we are choosing the best y from among {x1, x2, ..., xN}, yi(1) will be one of the original data points. Therefore, we only need to compute a single step starting from each data point, because yi(2) will then be the result of y_{yi(1)}(1).

Different clusters are found as separate connected components (trees, actually, so we can do an efficient tree traversal to locate all the points in each cluster).

Also note: there is no need for heuristic distance thresholds to test for convergence to a mode, or to determine membership in a cluster.

[figure: each point xi links to the data point reached in one step; following links, e.g. y5(0) -> y5(1) -> y5(2) -> y5(3), walks up the tree to the mode]
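A Python sketch of this single-step-per-point idea, again assuming a Gaussian kernel; the argmin formulation of "best discrete y" follows from the quadratic lower bound derived earlier, with constants dropped:

def medoid_shift(X, sigma=1.0):
    D = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)  # pairwise ||xi - xj||^2
    W = np.exp(-D / (2 * sigma ** 2))                         # weights wij = -Φ'(Dij)
    # Starting at xi, the bound is maximized by the data point xk that
    # minimizes Σj wij ||xk - xj||^2, i.e. the argmin over row i of (W @ D).
    parent = np.argmin(W @ D, axis=1)                         # one discrete step per point
    labels = parent.copy()
    for _ in range(len(X)):                                   # follow links to tree roots
        labels = labels[labels]
    return parent, labels                                     # roots label the clusters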

MedoidShift behavior asymptotically approaches MeanShift behavior as the number of data samples approaches infinity.

Advantages of MedoidShift

• Automatically determines the number of clusters (as does mean-shift)

• Previous computations can be reused when new samples are added or old samples are deleted (good for incremental clustering applications)

• Can work in domains where only distances between samples are defined (e.g. EMD or ISOMAP); there is no need to compute a mean.

• No need for heuristic terminating conditions (we don't need a distance threshold to determine convergence)

Drawbacks of MedoidShift

• O(N^3) complexity [although not really; see the Vedaldi and Soatto paper]
• Can fail to find all the modes of a density, thereby causing over-fragmentation

MeanShift vs MedoidShift

[figure: from a given starting point, a continuous meanshift step moves part-way toward the mode, but medoidshift has to stay at the starting data point]

Iterated MedoidShift (Seeds of QuickShift!)

• Keep a count of how many points move to each xj, and apply a new iteration of medoidshift with the points weighted by those counts.
• Each iteration is essentially using a smoother Parzen estimate.

Vedaldi and Soatto

Contributions:
• If using Euclidean distance, MedoidShift can actually be implemented in O(N^2)
• Generalize to non-Euclidean distances using the kernel trick (similar to how many algorithms are "kernelized")
• Show that MedoidShift does not always find all modes
• Propose QuickShift, an alternative approach that is simple, fast, and yields a one-parameter family of clustering results that explicitly represents the tradeoff between over- and under-fragmentation

Kernel MedoidShift

Insight: medoidshift is defined in terms of pairwise distances between data points. Therefore, using the "kernel trick", we can generalize it to handle non-Euclidean distances defined by a kernel matrix K.
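In kernel form the pairwise squared distances come straight from the kernel matrix via the standard identity d^2(xi, xj) = Kii + Kjj - 2 Kij; a one-function sketch (the slides do not spell this step out):

def kernel_distances(K):
    # Squared distances in the feature space induced by kernel matrix K.
    diag = np.diag(K)
    return diag[:, None] + diag[None, :] - 2 * K

The resulting distance matrix can then be dropped into medoid_shift above in place of the Euclidean D.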

What is QuickShift?

A combination of the above kernel trick and a new move rule: instead of choosing the data point that optimizes the lower bound function b(x) of the Parzen estimate P(x), choose the closest data point that yields a higher value of P(x) directly.

QuickShift vs MedoidShift

• Recall our failure case for medoidshift:

[figure: from the starting point, continuous meanshift goes part-way to the mode, medoidshift has to stay put, but quickshift jumps to the nearest data point of higher density]
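A Python sketch of this rule, under the same Gaussian-kernel assumption as the earlier sketches:

def quick_shift(X, sigma=1.0):
    D = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)  # pairwise squared distances
    P = np.exp(-D / (2 * sigma ** 2)).sum(axis=1)             # Parzen density at each point
    parent = np.arange(len(X))                                # roots point to themselves
    dist = np.full(len(X), np.inf)
    for i in range(len(X)):
        higher = np.where(P > P[i])[0]                        # points of higher density
        if len(higher) > 0:
            j = higher[np.argmin(D[i, higher])]               # the closest such point
            parent[i], dist[i] = j, np.sqrt(D[i, j])
    return parent, dist                                       # edges of the quickshift tree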

Quickshift Benefits

• Similar benefits to medoidshift, plus additional ones:
• All points are in one connected tree. You can therefore explore various clusterings by choosing a threshold T and pruning the edges with d(xi, xj) > T, keeping only edges shorter than T (a sketch follows).
• Runs much faster in practice than either meanshift or medoidshift

Comparisons

[figure: the mode-seeking trajectories produced by meanshift, medoidshift, and quickshift on the same data set]
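Continuing the sketch above, pruning the tree at a threshold T and reading off the clusters (the label-propagation loop is one simple, illustrative way to find the roots):

def prune(parent, dist, T):
    parent = parent.copy()
    cut = dist > T                                 # break edges longer than T
    parent[cut] = np.flatnonzero(cut)              # cut points become their own roots
    for _ in range(len(parent)):                   # propagate links up to the roots
        parent = parent[parent]
    return parent                                  # cluster label = index of tree root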

Applications

• Image segmentation by clustering (r,g,b), or (r,g,b,x,y), or whatever your favorite color space is
• Manifold clustering using ISOMAP distance
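As one concrete (r,g,b,x,y) segmentation route, a sketch using scikit-learn's off-the-shelf MeanShift; the bandwidth value is arbitrary and feature scaling between color and position is left untreated:

from sklearn.cluster import MeanShift

def segment(image, bandwidth=20.0):
    # Cluster pixels in (r, g, b, x, y) feature space; returns a label image.
    h, w, _ = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    features = np.column_stack([image.reshape(-1, 3),
                                xs.ravel(), ys.ravel()]).astype(float)
    labels = MeanShift(bandwidth=bandwidth).fit(features).labels_
    return labels.reshape(h, w)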

Applications

Clustering images using a Bhattacharyya coefficient on 5D histograms composed of (L,a,b,x,y) values.

Clustering bag-of-features signatures using a Chi-squared kernel (distances between histograms) to discover visual "categories" in Caltech-4.

[figure: clusters in 400D space, visualized by projection to rank 2; 5 discovered categories, with the ground-truth category "airplanes" split into two clusters]
