
Clustering Lecture 5: Mixture Model
Jing Gao
SUNY Buffalo

Outline
• Basics
– Motivation, definition, evaluation
• Methods
– Partitional
– Hierarchical
– Density-based
– Mixture model
– Spectral methods
• Advanced topics
– Clustering ensemble
– Clustering in MapReduce
– Semi-supervised clustering, subspace clustering, co-clustering, etc.

Using Probabilistic Models for Clustering
• Hard vs. soft clustering
– Hard clustering: every point belongs to exactly one cluster
– Soft clustering: every point belongs to several clusters with certain degrees of membership
• Probabilistic clustering
– Each cluster is mathematically represented by a parametric distribution
– The entire data set is modeled by a mixture of these distributions

Gaussian Distribution
• The probability density function f(x) is a function of x given μ and σ:
  N(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right)
• Changing μ shifts the distribution left or right
• Changing σ increases or decreases the spread

Likelihood
• Which Gaussian distribution is more likely to have generated the data?
• Define the likelihood as a function of μ and σ given x_1, x_2, …, x_n:
  L(\mu, \sigma) = \prod_{i=1}^{n} N(x_i \mid \mu, \sigma^2)

Gaussian Distribution (Multivariate)
• Multivariate Gaussian with mean μ and covariance Σ:
  N(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1}(x-\mu)\right)
• Log likelihood:
  L(\mu, \Sigma) = \sum_{i=1}^{n} \ln N(x_i \mid \mu, \Sigma) = \sum_{i=1}^{n}\left(-\frac{1}{2}(x_i-\mu)^T \Sigma^{-1}(x_i-\mu) - \frac{1}{2}\ln|\Sigma|\right) + \text{const}

Maximum Likelihood Estimate
• MLE
– Find the model parameters μ, Σ that maximize the log likelihood L(μ, Σ)
• MLE for the Gaussian:
  \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \hat{\Sigma} = \frac{1}{n}\sum_{i=1}^{n} (x_i-\hat{\mu})(x_i-\hat{\mu})^T

Gaussian Mixture
• Linear combination of Gaussians:
  p(x) = \sum_{k=1}^{K} \pi_k\, N(x \mid \mu_k, \Sigma_k), \qquad \sum_{k=1}^{K}\pi_k = 1
  where the parameters to be estimated are \pi_k, \mu_k, \Sigma_k for k = 1, \dots, K
• To generate a data point:
– first pick one of the components with probability \pi_k
– then draw a sample from that component distribution
• Each data point is generated by one of the K components; a latent variable indicating that component is associated with each point

Gaussian Mixture
• Maximize the log likelihood:
  \ln p(X \mid \pi, \mu, \Sigma) = \sum_{i=1}^{n} \ln\left(\sum_{k=1}^{K} \pi_k\, N(x_i \mid \mu_k, \Sigma_k)\right)
• Without knowing the values of the latent variables, we have to maximize the incomplete log likelihood

Expectation-Maximization (EM) Algorithm
• E-step: for given parameter values, compute the expected values of the latent variables (the responsibilities of the data points):
  \gamma(z_{ik}) = \frac{\pi_k\, N(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j\, N(x_i \mid \mu_j, \Sigma_j)}
– Note that \gamma(z_{ik}) \in [0,1] instead of z_{ik} \in \{0,1\}, but we still have \sum_{k=1}^{K} \gamma(z_{ik}) = 1
• M-step: maximize the expected complete log likelihood
• Parameter updates:
  \mu_k = \frac{\sum_{i=1}^{n} \gamma(z_{ik})\, x_i}{\sum_{i=1}^{n} \gamma(z_{ik})}, \qquad
  \Sigma_k = \frac{\sum_{i=1}^{n} \gamma(z_{ik})\,(x_i-\mu_k)(x_i-\mu_k)^T}{\sum_{i=1}^{n} \gamma(z_{ik})}, \qquad
  \pi_k = \frac{1}{n}\sum_{i=1}^{n} \gamma(z_{ik})

EM Algorithm
• Iterate the E-step and M-step until the log likelihood of the data no longer increases
– Converges to a local optimum
– Restart the algorithm with different initial guesses of the parameters (as in K-means)
• Relation to K-means
– Consider a GMM with a common covariance \epsilon I shared by all components
– As \epsilon \to 0, the two methods coincide
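The lecture stops at the update equations; the sketch below is a minimal NumPy/SciPy implementation of exactly these E- and M-steps, not code from the slides. The function name em_gmm, the regularization term reg, and the synthetic two-blob data are illustrative assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=100, seed=0, reg=1e-6):
    """EM for a Gaussian mixture: alternate the E-step (responsibilities)
    and the M-step (update pi, mu, Sigma), mirroring the slides above."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Initialization: uniform weights, K random points as means, shared data covariance
    pi = np.full(K, 1.0 / K)
    mu = X[rng.choice(n, K, replace=False)]
    sigma = np.array([np.cov(X.T) + reg * np.eye(d) for _ in range(K)])

    for _ in range(n_iter):
        # E-step: gamma[i, k] = pi_k N(x_i | mu_k, Sigma_k) / sum_j pi_j N(x_i | mu_j, Sigma_j)
        dens = np.column_stack([
            pi[k] * multivariate_normal.pdf(X, mean=mu[k], cov=sigma[k])
            for k in range(K)
        ])
        gamma = dens / dens.sum(axis=1, keepdims=True)

        # M-step: responsibility-weighted MLE updates for each component
        Nk = gamma.sum(axis=0)                 # effective cluster sizes
        pi = Nk / n                            # mixing weights
        mu = (gamma.T @ X) / Nk[:, None]       # weighted means
        for k in range(K):
            diff = X - mu[k]
            sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + reg * np.eye(d)

    return pi, mu, sigma, gamma

if __name__ == "__main__":
    # Two well-separated synthetic Gaussian blobs (illustrative data only)
    rng = np.random.default_rng(1)
    X = np.vstack([
        rng.normal([0, 0], 0.5, size=(200, 2)),
        rng.normal([3, 3], 1.0, size=(200, 2)),
    ])
    pi, mu, sigma, gamma = em_gmm(X, K=2)
    print("mixing weights:", pi)
    print("means:\n", mu)
    print("soft assignment of first point:", gamma[0])
```

In practice the loop would also track the log likelihood to detect convergence, and the algorithm would be rerun from several initializations, as the slide above notes.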
K-means vs GMM
• K-means
– Objective function: minimize the sum of squared Euclidean distances
– Can be optimized by an EM-style algorithm: the E-step assigns points to clusters, the M-step optimizes the cluster centers
– Performs hard assignment during the E-step
– Assumes spherical clusters with equal probability for each cluster
• GMM
– Objective function: maximize the log-likelihood
– Fit by the EM algorithm: the E-step computes posterior probabilities of membership, the M-step optimizes the parameters
– Performs soft assignment during the E-step
– Can be used for non-spherical clusters and can generate clusters with different probabilities (see the usage sketch after the take-away message)

Mixture Model
• Strengths
– Gives probabilistic cluster assignments
– Has a probabilistic interpretation
– Can handle clusters with varying sizes, variances, etc.
• Weaknesses
– Initialization matters
– An appropriate distribution must be chosen for each cluster
– Overfitting issues

Take-away Message
• Probabilistic clustering
• Maximum likelihood estimation
• Gaussian mixture model for clustering
• EM algorithm that alternately assigns points to clusters and estimates model parameters
• Strengths and weaknesses
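As a rough illustration of the K-means vs GMM comparison and the strengths listed above, the usage sketch below fits both models with scikit-learn, assuming it is installed; the data, parameter choices, and printed quantities are illustrative and not part of the lecture.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two elongated, differently weighted blobs (illustrative data only)
X = np.vstack([
    rng.multivariate_normal([0, 0], [[2.0, 0.0], [0.0, 0.1]], size=300),
    rng.multivariate_normal([0, 3], [[0.3, 0.0], [0.0, 0.3]], size=100),
])

# K-means: hard assignments, implicitly spherical clusters
hard_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# GMM: full covariances and mixing weights, soft assignments via predict_proba
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(X)
soft = gmm.predict_proba(X)          # responsibilities, each row sums to 1
print("estimated mixing weights:", gmm.weights_)
print("soft assignment of first point:", soft[0])
print("hard K-means label of first point:", hard_labels[0])
```

The GMM recovers the unequal cluster weights and elongated shapes through its mixing proportions and full covariances, while K-means can only return one hard label per point.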