Deep Adversarial Gaussian Mixture Auto-Encoder for Clustering
Warith HARCHAOUI, Pierre-Alexandre MATTEI, Charles BOUVEYRON
Université Paris Descartes (MAP5) and Oscaro.com Research & Development
February 2017

Clustering
Clustering is grouping similar objects together!

Thesis
Representation learning and clustering operate in symbiosis.

Gaussian Mixture Model
- Density estimation applied to clustering, with K modes/clusters
- Linear complexity, suitable for large-scale problems

Learning Representations
- Successful in a supervised context (kernel SVM)
- Successful in an unsupervised context (spectral clustering)

Auto-Encoder
An auto-encoder is a neural network that consists of:
- an encoder E : ℝ^D → ℝ^d (compression)
- a decoder D : ℝ^d → ℝ^D (decompression)
with D ≫ d and D(E(x)) ≈ x.

Optimization Scheme
Figure: Global optimization scheme for DAC. The encoder maps the input space to the code space and the decoder maps it back; in the code space, a GMM models the Gaussian clusters (π, µ, Σ) and a discriminator operates on the codes.

Adversarial Auto-Encoder
An adversarial auto-encoder is a neural network that consists of:
- an encoder E : ℝ^D → ℝ^d (compression)
- a decoder D : ℝ^d → ℝ^D (decompression)
- a prior P : ℝ^d → ℝ with ∫_{ℝ^d} P = 1, associated with a random generator of distribution P
- a discriminator A : ℝ^d → [0, 1] ⊂ ℝ that distinguishes fake data from the random generator and real data from the encoder

Optimizations
Three objectives (a minimal code sketch is given after the conclusion):
- The encoder and decoder try to minimize the reconstruction loss.
- The discriminator tries to distinguish fake codes (from the random generator associated with the prior) from real codes (from the encoder).
- The encoder also tries to fool the discriminator (opposite of the discriminator's loss function).

Results
Method                                                       MNIST-70k   Reuters-10k   HHAR
DAC EC (Ensemble Clustering)                                  96.50       73.34         81.24
DAC                                                           94.08       72.14         80.50
GMVAE                                                         88.54       -             -
DEC                                                           84.30       72.17         79.86
AE + GMM (full covariances, median accuracy over 10 runs)     82.56       70.12         78.48
GMM                                                           53.73       54.72         60.34
KM                                                            53.47       54.04         59.98
Table: Experimental accuracy results (%, the higher, the better), based on the Hungarian method.

Visualizations
Figure: Confusion matrix for DAC on MNIST (rows: actual class 0-9, columns: predicted class 0-9; best seen in color).

Figure: Generated digit images. From left to right, the ten classes found by DAC, ordered thanks to the Hungarian algorithm. From top to bottom (µ_k, µ_k + 0.5σ, ..., µ_k + 3.5σ), we go further and further in random directions from the centroids, the first row being the decoded centroids.

Figure: Principal component analysis rendering of the code space for MNIST at the end of the DAC optimization, with colors indicating the true labels (best seen in color).

Conclusion
Representation learning and clustering operate in symbiosis.
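Appendix: Training Sketch
The deck itself contains no code; the following is a minimal, hypothetical PyTorch sketch of the three objectives above, not the authors' implementation. The network sizes, optimizer settings, the fixed diagonal-covariance mixture prior, and the toy data are assumptions made for brevity (DAC re-estimates the mixture on the code space during training, as suggested by the optimization scheme figure).

import torch
import torch.nn as nn
from sklearn.mixture import GaussianMixture

D_in, d, K = 784, 10, 10   # input dim, code dim, number of clusters (assumed values)

encoder = nn.Sequential(nn.Linear(D_in, 256), nn.ReLU(), nn.Linear(256, d))
decoder = nn.Sequential(nn.Linear(d, 256), nn.ReLU(), nn.Linear(256, D_in))
discrim = nn.Sequential(nn.Linear(d, 256), nn.ReLU(), nn.Linear(256, 1), nn.Sigmoid())

# Gaussian mixture prior (pi, mu, sigma); kept fixed here for brevity,
# whereas DAC re-fits the mixture on the code space during training.
pi = torch.full((K,), 1.0 / K)
mu = torch.randn(K, d)
sigma = torch.ones(K, d)   # diagonal covariances in this sketch

def sample_prior(n):
    """Draw n codes from the Gaussian mixture prior."""
    ks = torch.multinomial(pi, n, replacement=True)
    return mu[ks] + sigma[ks] * torch.randn(n, d)

opt_ae = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
opt_disc = torch.optim.Adam(discrim.parameters(), lr=1e-3)
opt_enc = torch.optim.Adam(encoder.parameters(), lr=1e-3)
bce, mse = nn.BCELoss(), nn.MSELoss()

def train_step(x):
    n = x.size(0)
    # 1) Reconstruction: encoder and decoder minimise ||D(E(x)) - x||^2.
    opt_ae.zero_grad()
    mse(decoder(encoder(x)), x).backward()
    opt_ae.step()
    # 2) Discriminator: following the deck's convention, encoder codes are
    #    "real" (label 1) and prior samples are "fake" (label 0).
    opt_disc.zero_grad()
    loss_d = bce(discrim(encoder(x).detach()), torch.ones(n, 1)) \
           + bce(discrim(sample_prior(n)), torch.zeros(n, 1))
    loss_d.backward()
    opt_disc.step()
    # 3) Encoder fools the discriminator (opposite labels), pushing the
    #    code distribution towards the Gaussian mixture prior.
    opt_enc.zero_grad()
    bce(discrim(encoder(x)), torch.zeros(n, 1)).backward()
    opt_enc.step()

# Toy usage on random data; in practice x would be mini-batches of real inputs.
for _ in range(10):
    train_step(torch.rand(64, D_in))

# Reading off clusters: fit a GMM on the learned codes and predict components.
with torch.no_grad():
    codes = encoder(torch.rand(1000, D_in)).numpy()
labels = GaussianMixture(n_components=K, covariance_type="full").fit_predict(codes)

Because the adversarial game is symmetric, swapping the "real" and "fake" labels leaves the equilibrium unchanged; training the encoder with the labels opposite to the discriminator's drives the code distribution towards the Gaussian mixture prior, so that a GMM fitted in the code space yields the cluster assignments.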
References
Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, and Ian Goodfellow. Adversarial autoencoders. arXiv preprint arXiv:1511.05644, 2015.
Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, and Pierre-Antoine Manzagol. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11(Dec):3371–3408, 2010.