
Unsupervised Learning: Deep Auto-encoder

Credits

• This lecture contains public domain materials from
  – Guy Nolan
  – Stefania Raimondo
  – Jason Su
  – Hung-yi Lee

Auto-encoder

• NN Encoder: maps an input object (e.g., a 28 × 28 = 784-pixel image) to a code, a compact representation of the input (usually < 784 dims).
• NN Decoder: maps the code back to a reconstruction of the original object.
• The encoder and decoder are learned together.

Recap: PCA

Minimize $\|x - \hat{x}\|^2$, i.e. make the reconstruction as close as possible to the input.

Input layer $x$ → encode with weights $W$ → hidden layer $c$ (linear; the bottleneck layer) → decode with weights $W^T$ → output layer $\hat{x}$.

The output of the hidden layer is the code. A symmetric encoder/decoder (using $W$ and $W^T$) is not necessary.
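A minimal Keras sketch of this PCA-like auto-encoder follows. Layer sizes (784 → 30 → 784) match the figure; `x_train` is an assumed array of flattened, normalized images, and the optimizer is an illustrative choice. With linear activations and squared error the learned code spans the same subspace as PCA, although Keras does not tie the decoder weights to $W^T$.

```python
# Minimal linear auto-encoder (PCA-like): encode a 784-dim x to a 30-dim code c,
# decode back to x_hat, and minimize the squared reconstruction error ||x - x_hat||^2.
from tensorflow.keras import layers, models

inputs = layers.Input(shape=(784,))
code = layers.Dense(30, activation=None, name="code")(inputs)      # encoder: c = W x
outputs = layers.Dense(784, activation=None, name="recon")(code)   # decoder: x_hat = W' c

autoencoder = models.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")                  # mean squared reconstruction error

# x_train: hypothetical array of flattened, normalized images, shape (N, 784)
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=256)
```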

Deep Auto-encoder

• Of course, the auto-encoder can be deep.

[Figure: a deep auto-encoder. The input $x$ passes through encoder layers with weights $W_1, W_2, \dots$ down to the bottleneck "code" layer, then through decoder layers with weights $\dots, W_2^T, W_1^T$ up to the output $\hat{x}$, which is trained to be as close as possible to the input. The weights can be initialized layer-by-layer with RBMs.]

Reference: Hinton, Geoffrey E., and Ruslan R. Salakhutdinov. "Reducing the dimensionality of data with neural networks." Science 313.5786 (2006): 504-507.

Deep Auto-encoder

[Figure: a 784-pixel original image is compressed to 30 dimensions and reconstructed, by PCA (784 → 30 → 784) and by a deep auto-encoder (784 → 1000 → 500 → 250 → 30 → 250 → 500 → 1000 → 784); the deep auto-encoder gives noticeably sharper reconstructions.]

Deep example: https://cs.stanford.edu/people/karpathy/convnetjs/demo/autoencoder.html - by Andrej Karpathy
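A Keras sketch of the deep auto-encoder with the layer sizes from the figure above. Activations, the optimizer, and the number of epochs are illustrative choices, not taken from the original paper.

```python
# Deep auto-encoder, 784-1000-500-250-30-250-500-1000-784 as in the figure above.
from tensorflow.keras import layers, models

inputs = layers.Input(shape=(784,))
h = layers.Dense(1000, activation="relu")(inputs)
h = layers.Dense(500, activation="relu")(h)
h = layers.Dense(250, activation="relu")(h)
code = layers.Dense(30, activation="linear", name="code")(h)   # bottleneck / code layer
h = layers.Dense(250, activation="relu")(code)
h = layers.Dense(500, activation="relu")(h)
h = layers.Dense(1000, activation="relu")(h)
outputs = layers.Dense(784, activation="sigmoid")(h)           # pixel values in [0, 1]

deep_ae = models.Model(inputs, outputs)
encoder = models.Model(inputs, code)                           # reusable encoder half
deep_ae.compile(optimizer="adam", loss="mse")
# deep_ae.fit(x_train, x_train, epochs=50, batch_size=256)     # x_train: (N, 784) in [0, 1]
```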

[Figure: reducing MNIST digits to a 2-dimensional code (784 → 1000 → 500 → 250 → 2) and reconstructing (2 → 250 → 500 → 1000 → 784); in 2-D, the deep auto-encoder's codes separate the digit classes more clearly than PCA's.]

More:
• Contractive auto-encoder. Ref: Rifai, Salah, et al. "Contractive auto-encoders: Explicit invariance during feature extraction." Proceedings of the 28th International Conference on Machine Learning (ICML-11). 2011.
• De-noising auto-encoder.

[Figure: de-noising auto-encoder. Add noise to the input $x$ to obtain $x'$, encode $x'$ to the code $c$, decode to $\hat{x}$, and train $\hat{x}$ to be as close as possible to the original (clean) $x$.]

Vincent, Pascal, et al. "Extracting and composing robust features with denoising autoencoders." ICML, 2008.

Deep Auto-encoder - Example

[Figure: example on MNIST. Images are mapped by the NN encoder to a 32-dim code (compare PCA to 32 dims); the codes and the raw pixels are visualized with t-SNE (t-Distributed Stochastic Neighbor Embedding).]

Auto-encoder – Text Retrieval

Vector Space Model / Bag-of-words

Each document (and each query) is represented as a word-count vector. For the query word string "This is an apple": this = 1, is = 1, a = 0, an = 1, apple = 1, pen = 0, …

Semantics are not considered.

Auto-encoder – Text Retrieval

Documents talking about the same thing will have close codes; the query is encoded the same way and matched against the document codes.

[Figure: a bag-of-words vector (2000 dims, document or query) is compressed through 500 → 250 → 125 to a 2-dimensional code, and the output is trained to be as close as possible to the input. Compare LSA, which projects documents onto 2 latent topics.]
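A hypothetical retrieval sketch using the codes: `encoder` stands for the encoder half of the auto-encoder above (2000 → 500 → 250 → 125 → 2), assumed already trained on bag-of-words document vectors. Cosine similarity is one reasonable way to compare codes; the slide does not prescribe a particular metric.

```python
# Rank documents by how close their codes are to the query's code.
import numpy as np

def retrieve(query_bow, doc_bows, encoder, top_k=5):
    """Return indices of the top_k documents whose codes are closest to the query's code."""
    query_code = encoder.predict(query_bow[None, :])                     # shape (1, 2)
    doc_codes = encoder.predict(doc_bows)                                # shape (n_docs, 2)
    # cosine similarity between the query code and every document code
    q = query_code / (np.linalg.norm(query_code) + 1e-9)
    d = doc_codes / (np.linalg.norm(doc_codes, axis=1, keepdims=True) + 1e-9)
    sims = d @ q.ravel()
    return np.argsort(-sims)[:top_k]
```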

Auto-encoder for CNN

[Figure: a convolutional auto-encoder. The encoder alternates convolution and pooling down to the code; the decoder mirrors it with unpooling and deconvolution stages, e.g. unpooling a 14 × 14 feature map back to 28 × 28.]

CNN - Unpooling
• One option: remember which positions held the max values during pooling and put the values back there, with zeros elsewhere.
• Alternative: simply repeat the values. Source of image: https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/image_segmentation.html

CNN - Deconvolution
• Actually, deconvolution is convolution.
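A Keras sketch of such a convolutional auto-encoder. Filter counts are illustrative; `UpSampling2D` simply repeats values, i.e. the "alternative" unpooling mentioned above, and the following convolution plays the role of the deconvolution step.

```python
# Convolutional auto-encoder: convolution + pooling down to the code,
# then up-sampling (unpooling) + convolution back up to the image.
from tensorflow.keras import layers, models

inputs = layers.Input(shape=(28, 28, 1))
h = layers.Conv2D(16, (3, 3), activation="relu", padding="same")(inputs)
h = layers.MaxPooling2D((2, 2))(h)                     # 28x28 -> 14x14
h = layers.Conv2D(8, (3, 3), activation="relu", padding="same")(h)
code = layers.MaxPooling2D((2, 2))(h)                  # 14x14 -> 7x7 code

h = layers.UpSampling2D((2, 2))(code)                  # unpooling: 7x7 -> 14x14
h = layers.Conv2D(8, (3, 3), activation="relu", padding="same")(h)
h = layers.UpSampling2D((2, 2))(h)                     # 14x14 -> 28x28
outputs = layers.Conv2D(1, (3, 3), activation="sigmoid", padding="same")(h)

conv_ae = models.Model(inputs, outputs)
conv_ae.compile(optimizer="adam", loss="binary_crossentropy")
```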

Auto-encoder – Pre-training DNN

• Greedy Layer-wise Pre-training again

[Figure: the target network is 784 → 1000 → 1000 → 500 → 10 (output). Step 1: train an auto-encoder 784 → 1000 → 784 on the inputs $x$, with encoder weights $W_1$ and decoder weights $W_1'$ and target $\hat{x} \approx x$; keep $W_1$.]

Auto-encoder – Pre-training DNN

• Greedy Layer-wise Pre-training again

[Figure: Step 2: fix $W_1$, compute the first-layer activations $a^1$, and train an auto-encoder 1000 → 1000 → 1000 with weights $W_2$ and $W_2'$ and target $\hat{a}^1 \approx a^1$; keep $W_2$.]

Auto-encoder – Pre-training DNN

• Greedy Layer-wise Pre-training again

[Figure: Step 3: fix $W_1$ and $W_2$, compute $a^2$, and train an auto-encoder 1000 → 500 → 1000 with weights $W_3$ and $W_3'$ and target $\hat{a}^2 \approx a^2$; keep $W_3$.]

Auto-encoder – Pre-training DNN

• Greedy Layer-wise Pre-training again

[Figure: Step 4: randomly initialize the output weights $W_4$ (500 → 10), then fine-tune the whole network 784 → 1000 → 1000 → 500 → 10 by backpropagation, starting from the pre-trained $W_1$, $W_2$, $W_3$.]
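A hedged Keras sketch of the whole procedure. The layer sizes follow the figures above; activations, epochs, and the optimizer are illustrative, and the random stand-in data keeps the sketch self-contained in place of real MNIST arrays.

```python
# Greedy layer-wise pre-training: initialize each hidden layer of the
# 784-1000-1000-500-10 classifier from an auto-encoder trained on the previous
# layer's output, leave W4 random, then fine-tune everything by back-propagation.
import numpy as np
from tensorflow.keras import layers, models

def pretrain_layer(features, width):
    """Train a one-hidden-layer auto-encoder on `features`; return (weights, encoded features)."""
    dim = features.shape[1]
    inp = layers.Input(shape=(dim,))
    hidden = layers.Dense(width, activation="relu")(inp)
    out = layers.Dense(dim, activation="linear")(hidden)
    ae = models.Model(inp, out)
    ae.compile(optimizer="adam", loss="mse")
    ae.fit(features, features, epochs=5, batch_size=256, verbose=0)
    enc = models.Model(inp, hidden)
    return ae.layers[1].get_weights(), enc.predict(features, verbose=0)

# x_train, y_train: MNIST-style data; random stand-ins keep the sketch runnable.
x_train = np.random.rand(1000, 784).astype("float32")
y_train = np.random.randint(0, 10, size=1000)

features, pretrained = x_train, []
for width in (1000, 1000, 500):                      # W1, W2, W3
    w, features = pretrain_layer(features, width)
    pretrained.append(w)

# Build the classifier, copy in the pre-trained weights, leave W4 random, then fine-tune.
clf = models.Sequential([
    layers.Dense(1000, activation="relu", input_shape=(784,)),
    layers.Dense(1000, activation="relu"),
    layers.Dense(500, activation="relu"),
    layers.Dense(10, activation="softmax"),          # W4: random initialization
])
for dense, weights in zip(clf.layers[:3], pretrained):
    dense.set_weights(weights)
clf.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# clf.fit(x_train, y_train, epochs=10, batch_size=256)   # fine-tune by back-propagation
```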

SIMPLE LATENT SPACE INTERPOLATION - KERAS

[Figure: two inputs are passed through the encoder to obtain codes $z_1$ and $z_2$.]

[Figure: interpolated codes $z_i = \alpha z_1 + (1 - \alpha) z_2$, for $\alpha$ ranging from 0 to 1, are passed through the decoder to generate the intermediate images.]

SIMPLE LATENT SPACE INTERPOLATION – KERAS CODE EXAMPLE
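A small sketch of the interpolation, assuming `encoder` and `decoder` are the two halves of a trained auto-encoder (e.g. the deep auto-encoder above split at the code layer); the number of steps is a placeholder.

```python
# Decode convex combinations z_i = alpha*z1 + (1 - alpha)*z2 of two codes.
import numpy as np

def interpolate(encoder, decoder, x1, x2, steps=10):
    """Return images decoded from codes interpolated between encode(x1) and encode(x2)."""
    z1 = encoder.predict(x1[None, :])
    z2 = encoder.predict(x2[None, :])
    images = []
    for alpha in np.linspace(0.0, 1.0, num=steps):
        z_i = alpha * z1 + (1.0 - alpha) * z2   # convex combination of the two codes
        images.append(decoder.predict(z_i)[0])
    return images
```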

DENOISING AUTOENCODERS

Intuition:
- We still aim to encode the input and NOT to mimic the identity function.
- We try to undo the effect of a corruption process stochastically applied to the input.
This yields a more robust model.

[Figure: noisy input → Encoder → latent space representation → Decoder → denoised input.]

DENOISING AUTOENCODERS

Use case:
- Extract a robust representation for an NN classifier.

[Figure: noisy input → Encoder → latent space representation, which is fed to the classifier.]

DENOISING AUTOENCODERS

Instead of trying to mimic the identity function by minimizing

$L(x, g(f(x)))$

where $L$ is some loss function,

a DAE instead minimizes

$L(x, g(f(\tilde{x})))$

where $\tilde{x}$ is a copy of $x$ that has been corrupted by some form of noise.

DENOISING AUTOENCODERS

Idea: a representation that is robust against noise. Corruption processes:
- Random assignment of a subset of the inputs to 0, with probability $v$.
- Gaussian additive noise.

DENOISING AUTOENCODERS

• The reconstruction $\hat{x}$ is computed from the corrupted input $\tilde{x}$.
• The loss function compares the reconstruction $\hat{x} = g(f(\tilde{x}))$ with the noiseless $x$.

❖ The autoencoder cannot fully trust each feature of $x$ independently, so it must learn the correlations between $x$'s features.
❖ Based on those relations we obtain a model that is less prone to changes, i.e. more robust.
➢ Given the noise process $p(\tilde{x} \mid x)$, we are forcing the hidden layer to learn a generalized structure of the data. (A small sketch of the corruption processes follows below.)
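A small sketch of the two corruption processes $p(\tilde{x} \mid x)$ listed above: masking noise (set each input to 0 with probability $v$) and additive Gaussian noise. The default values of $v$ and $\sigma$ are placeholders.

```python
# Corrupt an input x into x_tilde with masking noise and/or Gaussian noise.
import numpy as np

def corrupt(x, v=0.3, sigma=0.0, rng=None):
    """Return a corrupted copy x_tilde of x."""
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) >= v                      # keep each entry with probability 1 - v
    x_tilde = x * mask                                   # masked entries are set to 0
    if sigma > 0:
        x_tilde = x_tilde + rng.normal(0.0, sigma, size=x.shape)
    return x_tilde
```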

DENOISING AUTOENCODERS - PROCESS

1. Take some input $x$ and apply noise to obtain $\tilde{x}$.
2. Feed $\tilde{x}$ through the DAE: encode and decode to obtain the reconstruction $\hat{x} = g(f(\tilde{x}))$.
3. Compare $\hat{x}$ with the original (clean) $x$.

DENOISING CONVOLUTIONAL AE – KERAS

- 50 epochs.
- Noise factor 0.5.
- 92% accuracy on the validation set.
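A training sketch matching the settings above, assuming `conv_ae` is a convolutional auto-encoder like the one sketched earlier and that the image arrays are scaled to [0, 1]; it corrupts the inputs with Gaussian noise (factor 0.5) and trains the model to reconstruct the clean images.

```python
# Train a denoising convolutional auto-encoder: noisy inputs, clean targets.
import numpy as np

def train_denoising(conv_ae, x_train, x_test, noise_factor=0.5, epochs=50):
    x_train_noisy = np.clip(x_train + noise_factor * np.random.normal(size=x_train.shape), 0.0, 1.0)
    x_test_noisy = np.clip(x_test + noise_factor * np.random.normal(size=x_test.shape), 0.0, 1.0)
    conv_ae.fit(x_train_noisy, x_train,                  # noisy input, clean target
                epochs=epochs, batch_size=128,
                validation_data=(x_test_noisy, x_test))
    return conv_ae
```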

STACKED AE

- Motivation:
  ❑ We want to harness the feature-extraction quality of an AE to our advantage.
  ❑ For example: we can build a deep supervised classifier whose input is the output of an SAE.
  ❑ The benefit: our deep model's weights are not randomly initialized but are rather "smartly selected".
  ❑ Also, using this unsupervised technique lets us make use of a larger unlabeled dataset.

STACKED AE

- Building an SAE consists of two phases:
  1. Train each AE layer one after the other.
  2. Connect any classifier (SVM / FC NN layer, etc.).

STACKED AE

[Figure: $x$ → SAE → Classifier → $y$.]

STACKED AE – TRAIN PROCESS

First Layer Training (AE 1)

$x \rightarrow f_1(x) = z_1 \rightarrow g_1(z_1) = \hat{x}$

STACKED AE – TRAIN PROCESS

Second Layer Training (AE 2)

$x \rightarrow f_1(x) = z_1 \rightarrow f_2(z_1) = z_2 \rightarrow g_2(z_2) = \hat{z}_1$ (with $f_1$ kept fixed)

STACKED AE – TRAIN PROCESS

Add any classifier

$x \rightarrow f_1(x) = z_1 \rightarrow f_2(z_1) = z_2 \rightarrow$ Classifier $\rightarrow$ Output
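A minimal sketch of phase 2, assuming `enc1` and `enc2` are the trained Keras encoder models for $f_1$ and $f_2$ from the two training steps above; the softmax layer stands in for "any classifier".

```python
# Attach a classifier on top of a stacked auto-encoder's frozen encoders.
from tensorflow.keras import layers, models

def build_sae_classifier(enc1, enc2, input_dim=784, n_classes=10):
    enc1.trainable = False                               # keep the SAE features fixed
    enc2.trainable = False
    inputs = layers.Input(shape=(input_dim,))
    z1 = enc1(inputs)                                    # z1 = f1(x)
    z2 = enc2(z1)                                        # z2 = f2(z1)
    outputs = layers.Dense(n_classes, activation="softmax")(z2)
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# sae_clf = build_sae_classifier(enc1, enc2)
# sae_clf.fit(x_train, y_train, epochs=10, batch_size=256)
```

https://mi.eng.cam.ac.uk/projects/segnet/#demo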

U-Net
• Current state of the art
  – Very popular in MICCAI 2016
  – Works well with little data
• Influenced by the previous architectures
  – Up-conv 2x2 = bilinear up-sampling then 2x2 convolution (sketched below)
  – 2D slices
  – 3x3 convolutions
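A small sketch of the up-conv block described in the list above, assuming tf.keras; the filter count is a placeholder.

```python
# "Up-conv 2x2": bilinear up-sampling by a factor of 2, followed by a 2x2 convolution.
from tensorflow.keras import layers

def up_conv_2x2(x, filters):
    x = layers.UpSampling2D(size=(2, 2), interpolation="bilinear")(x)
    return layers.Conv2D(filters, (2, 2), activation="relu", padding="same")(x)
```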

https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/