Unsupervised Learning: Deep Auto-encoder

Credits
• This lecture contains public domain materials from Guy Nolan, Stefania Raimondo, Jason Su, and Hung-yi Lee.

Auto-encoder
NN Encoder: compresses the input object (e.g. a 28 x 28 = 784-pixel image) into a compact code representation, usually of dimension < 784.
NN Decoder: can reconstruct the original object from the code.
The encoder and the decoder are learned together.

Recap: PCA
Minimize the reconstruction error $\|x - \hat{x}\|^2$, i.e. make the output as close as possible to the input.
Encode: $c = Wx$ (input layer -> hidden layer, linear). Decode: $\hat{x} = W^T c$ (hidden layer -> output layer).
The hidden layer is the bottleneck layer, and its output is the code.
Symmetric weights ($W$ and $W^T$) are not necessary.
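A minimal sketch of the one-hidden-layer auto-encoder above, written with tensorflow.keras (an assumption; any framework works). The 784-dim input and the bottleneck size follow the slide; everything else is illustrative.

```python
# Minimal linear auto-encoder sketch (assumes tensorflow.keras is available).
from tensorflow import keras
from tensorflow.keras import layers

input_dim, code_dim = 784, 30          # 28 x 28 pixels -> 30-dim code

inputs = keras.Input(shape=(input_dim,))
code = layers.Dense(code_dim, activation=None, name="code")(inputs)   # c = W x
outputs = layers.Dense(input_dim, activation=None)(code)              # x_hat = W' c (not tied to W^T)

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")   # minimize ||x - x_hat||^2

# x_train: rows are flattened images scaled to [0, 1]
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=128)
```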
Deep Auto-encoder
• Of course, the auto-encoder can be deep
[Figure: a deep auto-encoder. Input x -> encoder layers (weights W1, W2, ...) -> bottleneck layer (the code) -> decoder layers (weights ... W2^T, W1^T) -> output x_hat, trained so the output is as close as possible to the input. Historically, the weights were initialized layer-by-layer by RBMs.]
Reference: Hinton, Geoffrey E., and Ruslan R. Salakhutdinov. "Reducing the dimensionality of data with neural networks." Science 313.5786 (2006): 504-507.

Deep Auto-encoder
[Figure: reconstructions of a 784-pixel original image. PCA compresses 784 -> 30 and back; the deep auto-encoder 784 - 1000 - 500 - 250 - 30 - 250 - 500 - 1000 - 784 gives a noticeably sharper reconstruction.]

Deep Autoencoder Example
https://cs.stanford.edu/people/karpathy/convnetjs/demo/autoencoder.html - by Andrej Karpathy
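A sketch of the deep architecture above in tensorflow.keras (an assumption), keeping separate `encoder` and `decoder` models so the 30-dim code can be inspected or decoded on its own; the ReLU/sigmoid activations are illustrative, not from the slides.

```python
# Deep auto-encoder sketch: 784 - 1000 - 500 - 250 - 30 - 250 - 500 - 1000 - 784.
from tensorflow import keras
from tensorflow.keras import layers

def mlp(sizes, in_dim, last_activation):
    """Stack Dense layers of the given widths; ReLU inside, custom activation at the end."""
    inp = keras.Input(shape=(in_dim,))
    h = inp
    for width in sizes[:-1]:
        h = layers.Dense(width, activation="relu")(h)
    out = layers.Dense(sizes[-1], activation=last_activation)(h)
    return keras.Model(inp, out)

encoder = mlp([1000, 500, 250, 30], in_dim=784, last_activation=None)       # x -> 30-dim code
decoder = mlp([250, 500, 1000, 784], in_dim=30, last_activation="sigmoid")  # code -> reconstruction

inputs = keras.Input(shape=(784,))
deep_ae = keras.Model(inputs, decoder(encoder(inputs)))
deep_ae.compile(optimizer="adam", loss="mse")
# deep_ae.fit(x_train, x_train, epochs=20, batch_size=128)
```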
[Figure: a deep auto-encoder 784 - 1000 - 500 - 250 - 2 - 250 - 500 - 1000 - 784; the 2-dim code can be plotted directly to visualize the data.]

De-noising auto-encoder
[Figure: add noise to the input x to get x', encode x' to the code c, decode to x_hat, and make x_hat as close as possible to the original x.]
More: Contractive auto-encoder
Ref: Rifai, Salah, et al. "Contractive auto-encoders: Explicit invariance during feature extraction." Proceedings of the 28th International Conference on Machine Learning (ICML-11). 2011.
Ref: Vincent, Pascal, et al. "Extracting and composing robust features with denoising autoencoders." ICML, 2008.

Deep Auto-encoder - Example
[Figure: an NN encoder maps each image to a code c; visualizations compare PCA (32-dim) with pixel -> t-SNE (t-Distributed Stochastic Neighbor Embedding).]

Auto-encoder – Text Retrieval
Vector Space Model / Bag-of-word
word string "This is an apple" -> this: 1, is: 1, a: 0, an: 1, apple: 1, pen: 0
Each document (and each query) is represented by such a vector over the vocabulary.
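A tiny sketch of the bag-of-word representation above (plain Python; the six-word vocabulary is the one from the slide):

```python
# Bag-of-word sketch for the query "This is an apple" (vocabulary from the slide).
vocabulary = ["this", "is", "a", "an", "apple", "pen"]

def bag_of_words(text):
    """Count how often each vocabulary term appears in the text."""
    words = text.lower().split()
    return [words.count(term) for term in vocabulary]

print(bag_of_words("This is an apple"))   # [1, 1, 0, 1, 1, 0]
```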
Semantics are not considered.

Auto-encoder – Text Retrieval
Compress each bag-of-word vector to a low-dimensional code: documents talking about the same thing will have close codes, and a query is matched against documents by comparing their codes.
[Figure: a text-retrieval auto-encoder compressing the 2000-dim bag-of-word vector (document or query) through layers of 500, 250 and 125 units down to a 2-dim code. Compare with LSA, which projects documents onto 2 latent topics.]

Auto-encoder for CNN
[Figure: a convolutional auto-encoder. Encoder: convolution -> pooling -> convolution -> pooling -> code. Decoder: unpooling -> deconvolution -> unpooling -> deconvolution, with the output trained to be as close as possible to the input.]

CNN - Unpooling
[Figure: unpooling enlarges a 14 x 14 feature map back to 28 x 28 by placing each value at the location remembered from max pooling and filling the rest with zeros. Alternative: simply repeat the values. Source of image: https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/image_segmentation.html]

CNN - Deconvolution
Actually, deconvolution is convolution.
[Figure: illustration that deconvolution can be carried out as an ordinary convolution over a zero-padded input, with the overlapping filter contributions summed.]
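A minimal sketch of such a convolutional auto-encoder in tensorflow.keras (an assumption): pooling in the encoder, up-sampling (unpooling by repeating values) and Conv2DTranspose (the "deconvolution", which is really a convolution) in the decoder. Filter counts and activations are illustrative, not from the slides.

```python
# Convolutional auto-encoder sketch (assumes tensorflow.keras; sizes are illustrative).
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(28, 28, 1))

# Encoder: convolution + pooling
x = layers.Conv2D(16, 3, activation="relu", padding="same")(inputs)
x = layers.MaxPooling2D(2)(x)                       # 28x28 -> 14x14
x = layers.Conv2D(8, 3, activation="relu", padding="same")(x)
code = layers.MaxPooling2D(2)(x)                    # 14x14 -> 7x7 "code"

# Decoder: unpooling (repeat values) + "deconvolution" (transposed convolution)
x = layers.UpSampling2D(2)(code)                    # 7x7 -> 14x14
x = layers.Conv2DTranspose(16, 3, activation="relu", padding="same")(x)
x = layers.UpSampling2D(2)(x)                       # 14x14 -> 28x28
outputs = layers.Conv2D(1, 3, activation="sigmoid", padding="same")(x)

conv_ae = keras.Model(inputs, outputs)
conv_ae.compile(optimizer="adam", loss="binary_crossentropy")
# conv_ae.fit(x_train, x_train, epochs=10, batch_size=128)   # x_train: (N, 28, 28, 1) in [0, 1]
```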
Auto-encoder – Pre-training DNN
• Greedy Layer-wise Pre-training again
Target network: Input 784 -> 1000 -> 1000 -> 500 -> output 10.
Step 1: train an auto-encoder 784 -> 1000 -> 784 that reconstructs the input x (weights W1 and W1'); keep W1.
Step 2: fix W1, compute the 1000-dim features a1, and train an auto-encoder 1000 -> 1000 -> 1000 with a1 as the target (weights W2 and W2'); keep W2.
Step 3: fix W1 and W2, compute a2, and train an auto-encoder 1000 -> 500 -> 1000 with a2 as the target (weights W3 and W3'); keep W3.
Finally: randomly initialize W4 for the 500 -> 10 output layer and fine-tune the whole network by backpropagation.
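A sketch of greedy layer-wise pre-training under the layer sizes above, using tensorflow.keras (an assumption). Each auto-encoder is trained on the output of the previously trained, frozen encoder layers; helper names such as `pretrain_layer` and the placeholder data are illustrative.

```python
# Greedy layer-wise pre-training sketch (assumes tensorflow.keras; helper names are illustrative).
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def pretrain_layer(features, n_hidden, epochs=5):
    """Train a one-hidden-layer auto-encoder on `features`; return the encoder Dense layer."""
    n_in = features.shape[1]
    inp = keras.Input(shape=(n_in,))
    enc = layers.Dense(n_hidden, activation="relu")   # W_i
    dec = layers.Dense(n_in)                          # W_i'
    ae = keras.Model(inp, dec(enc(inp)))
    ae.compile(optimizer="adam", loss="mse")
    ae.fit(features, features, epochs=epochs, batch_size=128, verbose=0)
    return enc

# x_train: (N, 784) array scaled to [0, 1]
x_train = np.random.rand(1000, 784).astype("float32")   # placeholder data

feats, encoders = x_train, []
for width in (1000, 1000, 500):          # W1, W2, W3
    enc = pretrain_layer(feats, width)
    encoders.append(enc)
    feats = enc(feats).numpy()           # fixed features for the next layer

# Stack the pre-trained layers, add a randomly initialized output layer W4, then fine-tune.
inp = keras.Input(shape=(784,))
h = inp
for enc in encoders:
    h = enc(h)
out = layers.Dense(10, activation="softmax")(h)          # W4, random init
classifier = keras.Model(inp, out)
classifier.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# classifier.fit(x_train, y_train, epochs=10)             # fine-tune by backpropagation
```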
Simple Latent Space Interpolation - Keras
Encode two inputs to get their codes $z_1$ and $z_2$, interpolate
$z_i = \alpha z_1 + (1 - \alpha) z_2$ for $\alpha \in [0, 1]$,
and run each $z_i$ through the decoder to generate the intermediate images.

Simple Latent Space Interpolation – Keras Code Example
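The code slide itself was not preserved in this extraction; below is a minimal sketch of what latent-space interpolation looks like given a trained Keras encoder/decoder pair (the `encoder` and `decoder` models are assumed to exist, e.g. built like the deep auto-encoder sketch earlier).

```python
# Latent space interpolation sketch (assumes trained `encoder` and `decoder` keras.Model objects).
import numpy as np

def interpolate(x1, x2, n_steps=10):
    """Decode points on the straight line between the codes of x1 and x2."""
    z1 = encoder.predict(x1[None, ...])      # code of the first image
    z2 = encoder.predict(x2[None, ...])      # code of the second image
    images = []
    for alpha in np.linspace(0.0, 1.0, n_steps):
        z_i = alpha * z1 + (1.0 - alpha) * z2    # z_i = alpha*z1 + (1-alpha)*z2
        images.append(decoder.predict(z_i)[0])
    return np.stack(images)

# frames = interpolate(x_test[0], x_test[1])   # e.g. morph between two digits
```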
Denoising Autoencoders
Intuition:
- We still aim to encode the input, and NOT to mimic the identity function.
- We try to undo the effect of a corruption process stochastically applied to the input.
This gives a more robust model.
[Figure: noisy input -> Encoder -> latent space representation -> Decoder -> denoised input.]

Denoising Autoencoders
Use case:
- Extract a robust representation for an NN classifier.
[Figure: noisy input -> Encoder -> latent space representation.]

Denoising Autoencoders
Instead of trying to mimic the identity function by minimizing
$L(x, g(f(x)))$,
where $L$ is some loss function, a DAE instead minimizes
$L(x, g(f(\tilde{x})))$,
where $\tilde{x}$ is a copy of $x$ that has been corrupted by some form of noise.
Denoising Autoencoders
Idea: learn a representation $f(\tilde{x})$ that is robust against noise. Typical corruption:
- Random assignment of a subset of the inputs to 0, with probability $v$.
- Gaussian additive noise.
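A small sketch of the two corruption processes named above (numpy; the dropout probability `v` corresponds to the slide's $v$, and the default values are illustrative):

```python
# Corruption process p(x_tilde | x) sketch: masking noise and Gaussian additive noise.
import numpy as np

def mask_noise(x, v=0.3):
    """Randomly set each input to 0 with probability v."""
    return x * (np.random.rand(*x.shape) >= v)

def gaussian_noise(x, sigma=0.5):
    """Gaussian additive noise with standard deviation sigma."""
    return x + sigma * np.random.normal(size=x.shape)
```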
Denoising Autoencoders
• The reconstruction $\hat{x}$ is computed from the corrupted input $\tilde{x}$.
• The loss function compares the reconstruction $\hat{x}$ with the noiseless $x$.
❖ The autoencoder cannot fully trust each feature of $x$ independently, so it must learn the correlations between $x$'s features.
❖ Based on those relations we obtain a model that is less prone to changes in the input. Noise process: $p(\tilde{x} \mid x)$.
➢ We are forcing the hidden layer to learn a generalized structure of the data.

Denoising Autoencoders - Process
1. Take some input $x$ and apply noise to obtain $\tilde{x}$.
2. Encode and decode with the DAE to compute $g(f(\tilde{x}))$.
3. The output is the reconstruction $\hat{x} = g(f(\tilde{x}))$.
4. Compare $\hat{x}$ with the original $x$.

Denoising Convolutional AE – Keras
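The original code slide was not preserved; below is a sketch of a denoising convolutional auto-encoder in tensorflow.keras (an assumption), using the noise factor 0.5 quoted on the next slide. The architecture reuses the convolutional pattern sketched earlier; filter counts are illustrative.

```python
# Denoising convolutional auto-encoder sketch (assumes tensorflow.keras).
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

noise_factor = 0.5

def add_noise(x):
    """Gaussian additive noise, clipped back to [0, 1]."""
    return np.clip(x + noise_factor * np.random.normal(size=x.shape), 0.0, 1.0)

inputs = keras.Input(shape=(28, 28, 1))
x = layers.Conv2D(32, 3, activation="relu", padding="same")(inputs)
x = layers.MaxPooling2D(2)(x)
x = layers.Conv2D(32, 3, activation="relu", padding="same")(x)
x = layers.MaxPooling2D(2)(x)                                            # 7x7 code
x = layers.Conv2DTranspose(32, 3, strides=2, activation="relu", padding="same")(x)
x = layers.Conv2DTranspose(32, 3, strides=2, activation="relu", padding="same")(x)
outputs = layers.Conv2D(1, 3, activation="sigmoid", padding="same")(x)

denoise_ae = keras.Model(inputs, outputs)
denoise_ae.compile(optimizer="adam", loss="binary_crossentropy")

# Train to map the noisy input back to the clean input:
# denoise_ae.fit(add_noise(x_train), x_train, epochs=50, batch_size=128,
#                validation_data=(add_noise(x_test), x_test))
```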
- 50 epochs.
- Noise factor 0.5.
- 92% accuracy on the validation set.
Stacked AE
- Motivation:
  ❑ We want to harness the feature extraction quality of an AE to our advantage.
  ❑ For example, we can build a deep supervised classifier whose input is the output of a SAE.
  ❑ The benefit: our deep model's weights W are not randomly initialized but are rather "smartly selected".
  ❑ Also, using this unsupervised technique lets us make use of a larger unlabeled dataset.
- Building a SAE consists of two phases:
  1. Train each AE layer one after the other.
  2. Connect any classifier (SVM / FC NN layer, etc.); see the sketch after the train process below.

[Figure: x -> SAE -> Classifier -> y.]

Stacked AE – Train Process
First layer training (AE 1): $x \rightarrow f_1(x) = z_1 \rightarrow g_1(z_1) = \hat{x}$.
Second layer training (AE 2): $z_1 \rightarrow f_2(z_1) = z_2 \rightarrow g_2(z_2) = \hat{z}_1$.
Add any classifier: $x \rightarrow f_1(x) = z_1 \rightarrow f_2(z_1) = z_2 \rightarrow$ Classifier $\rightarrow$ Output.

https://mi.eng.cam.ac.uk/projects/segnet/#demo
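A sketch of phase 2 under the setup above: feed the data through the frozen pre-trained encoder layers (e.g. the hypothetical `encoders` list from the layer-wise pre-training sketch earlier) and connect an SVM classifier (scikit-learn is an assumption).

```python
# Phase 2 sketch: use frozen SAE features as the input to any classifier, here an SVM.
# Assumes `encoders` (trained Dense layers f1, f2, ...) from the layer-wise sketch earlier,
# plus scikit-learn for the classifier.
from sklearn.svm import SVC

def sae_features(x):
    """x -> f1(x) -> f2(...) -> z, using the frozen pre-trained encoder layers."""
    h = x
    for enc in encoders:
        h = enc(h).numpy()
    return h

# x_train, y_train: labeled subset; the SAE itself can be trained on a larger unlabeled set.
# clf = SVC().fit(sae_features(x_train), y_train)
# accuracy = clf.score(sae_features(x_test), y_test)
```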
U-Net
• Current state of the art
  – Very popular in MICCAI 2016
  – Works well with low data
• Influenced by the previous architectures
  – Up-conv 2x2 = bilinear up-sampling then 2x2 convolution
  – 2D slices
  – 3x3 convolutions
https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/
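A small sketch of the "up-conv 2x2" block described above, in tensorflow.keras (an assumption): bilinear up-sampling followed by a 2x2 convolution. The filter count and activation are illustrative.

```python
# U-Net style "up-conv 2x2" sketch: bilinear up-sampling followed by a 2x2 convolution.
from tensorflow.keras import layers

def up_conv(x, filters):
    """Double the spatial resolution of x, then apply a 2x2 convolution."""
    x = layers.UpSampling2D(size=2, interpolation="bilinear")(x)   # bilinear up-sampling
    return layers.Conv2D(filters, 2, activation="relu", padding="same")(x)
```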