
Unsupervised Learning: Deep Auto-encoder

Credits

• This lecture contains public domain materials from
  – Guy Nolan
  – Stefania Raimondo
  – Jason Su
  – Hung-yi Lee

Auto-encoder

• NN Encoder: maps an input object (e.g., a 28 × 28 = 784-pixel image) to a code, a compact representation of the input (usually < 784 dims).
• NN Decoder: maps the code back to a reconstruction of the original object.
• The encoder and decoder are learned together.

Recap: PCA

Minimize $\|x - \hat{x}\|^2$, i.e. make the reconstruction as close as possible to the input.

Input layer $x$ → encode with weights $W$ → hidden layer $c$ (linear; the bottleneck layer) → decode with weights $W^T$ → output layer $\hat{x}$.

The output of the hidden layer is the code. A symmetric encoder/decoder (using $W$ and $W^T$) is not necessary.
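A minimal Keras sketch of this PCA-like auto-encoder follows. Layer sizes (784 → 30 → 784) match the figure; `x_train` is an assumed array of flattened, normalized images, and the optimizer is an illustrative choice. With linear activations and squared error the learned code spans the same subspace as PCA, although Keras does not tie the decoder weights to $W^T$.

```python
# Minimal linear auto-encoder (PCA-like): encode a 784-dim x to a 30-dim code c,
# decode back to x_hat, and minimize the squared reconstruction error ||x - x_hat||^2.
from tensorflow.keras import layers, models

inputs = layers.Input(shape=(784,))
code = layers.Dense(30, activation=None, name="code")(inputs)      # encoder: c = W x
outputs = layers.Dense(784, activation=None, name="recon")(code)   # decoder: x_hat = W' c

autoencoder = models.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")                  # mean squared reconstruction error

# x_train: hypothetical array of flattened, normalized images, shape (N, 784)
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=256)
```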

Deep Auto-encoder

• Of course, the auto-encoder can be deep.

[Figure: a deep auto-encoder. The input $x$ passes through encoder layers with weights $W_1, W_2, \dots$ down to the bottleneck "code" layer, then through decoder layers with weights $\dots, W_2^T, W_1^T$ up to the output $\hat{x}$, which is trained to be as close as possible to the input. The weights can be initialized layer-by-layer with RBMs.]

Reference: Hinton, Geoffrey E., and Ruslan R. Salakhutdinov. "Reducing the dimensionality of data with neural networks." Science 313.5786 (2006): 504-507.

Deep Auto-encoder

[Figure: a 784-pixel original image is compressed to 30 dimensions and reconstructed, by PCA (784 → 30 → 784) and by a deep auto-encoder (784 → 1000 → 500 → 250 → 30 → 250 → 500 → 1000 → 784); the deep auto-encoder gives noticeably sharper reconstructions.]

Deep example: https://cs.stanford.edu/people/karpathy/convnetjs/demo/autoencoder.html - by Andrej Karpathy
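A Keras sketch of the deep auto-encoder with the layer sizes from the figure above. Activations, the optimizer, and the number of epochs are illustrative choices, not taken from the original paper.

```python
# Deep auto-encoder, 784-1000-500-250-30-250-500-1000-784 as in the figure above.
from tensorflow.keras import layers, models

inputs = layers.Input(shape=(784,))
h = layers.Dense(1000, activation="relu")(inputs)
h = layers.Dense(500, activation="relu")(h)
h = layers.Dense(250, activation="relu")(h)
code = layers.Dense(30, activation="linear", name="code")(h)   # bottleneck / code layer
h = layers.Dense(250, activation="relu")(code)
h = layers.Dense(500, activation="relu")(h)
h = layers.Dense(1000, activation="relu")(h)
outputs = layers.Dense(784, activation="sigmoid")(h)           # pixel values in [0, 1]

deep_ae = models.Model(inputs, outputs)
encoder = models.Model(inputs, code)                           # reusable encoder half
deep_ae.compile(optimizer="adam", loss="mse")
# deep_ae.fit(x_train, x_train, epochs=50, batch_size=256)     # x_train: (N, 784) in [0, 1]
```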

[Figure: reducing MNIST digits to a 2-dimensional code (784 → 1000 → 500 → 250 → 2) and reconstructing (2 → 250 → 500 → 1000 → 784); in 2-D, the deep auto-encoder's codes separate the digit classes more clearly than PCA's.]

More:
• Contractive auto-encoder. Ref: Rifai, Salah, et al. "Contractive auto-encoders: Explicit invariance during feature extraction." Proceedings of the 28th International Conference on Machine Learning (ICML-11). 2011.
• De-noising auto-encoder.

[Figure: de-noising auto-encoder. Add noise to the input $x$ to obtain $x'$, encode $x'$ to the code $c$, decode to $\hat{x}$, and train $\hat{x}$ to be as close as possible to the original (clean) $x$.]

Vincent, Pascal, et al. "Extracting and composing robust features with denoising autoencoders." ICML, 2008.

Deep Auto-encoder - Example

[Figure: example on MNIST. Images are mapped by the NN encoder to a 32-dim code (compare PCA to 32 dims); the codes and the raw pixels are visualized with t-SNE (t-Distributed Stochastic Neighbor Embedding).]

Auto-encoder – Text Retrieval

Vector Space Model / Bag-of-words

Each document (and each query) is represented as a word-count vector. For the query word string "This is an apple": this = 1, is = 1, a = 0, an = 1, apple = 1, pen = 0, …

Semantics are not considered.

Auto-encoder – Text Retrieval

Documents talking about the same thing will have close codes; the query is encoded the same way and matched against the document codes.

[Figure: a bag-of-words vector (2000 dims, document or query) is compressed through 500 → 250 → 125 to a 2-dimensional code, and the output is trained to be as close as possible to the input. Compare LSA, which projects documents onto 2 latent topics.]
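A hypothetical retrieval sketch using the codes: `encoder` stands for the encoder half of the auto-encoder above (2000 → 500 → 250 → 125 → 2), assumed already trained on bag-of-words document vectors. Cosine similarity is one reasonable way to compare codes; the slide does not prescribe a particular metric.

```python
# Rank documents by how close their codes are to the query's code.
import numpy as np

def retrieve(query_bow, doc_bows, encoder, top_k=5):
    """Return indices of the top_k documents whose codes are closest to the query's code."""
    query_code = encoder.predict(query_bow[None, :])                     # shape (1, 2)
    doc_codes = encoder.predict(doc_bows)                                # shape (n_docs, 2)
    # cosine similarity between the query code and every document code
    q = query_code / (np.linalg.norm(query_code) + 1e-9)
    d = doc_codes / (np.linalg.norm(doc_codes, axis=1, keepdims=True) + 1e-9)
    sims = d @ q.ravel()
    return np.argsort(-sims)[:top_k]
```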

Auto-encoder for CNN

[Figure: a convolutional auto-encoder. The encoder alternates convolution and pooling down to the code; the decoder mirrors it with unpooling and deconvolution stages, e.g. unpooling a 14 × 14 feature map back to 28 × 28.]

CNN - Unpooling
• One option: remember which positions held the max values during pooling and put the values back there, with zeros elsewhere.
• Alternative: simply repeat the values. Source of image: https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/image_segmentation.html

CNN - Deconvolution
• Actually, deconvolution is convolution.
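A Keras sketch of such a convolutional auto-encoder. Filter counts are illustrative; `UpSampling2D` simply repeats values, i.e. the "alternative" unpooling mentioned above, and the following convolution plays the role of the deconvolution step.

```python
# Convolutional auto-encoder: convolution + pooling down to the code,
# then up-sampling (unpooling) + convolution back up to the image.
from tensorflow.keras import layers, models

inputs = layers.Input(shape=(28, 28, 1))
h = layers.Conv2D(16, (3, 3), activation="relu", padding="same")(inputs)
h = layers.MaxPooling2D((2, 2))(h)                     # 28x28 -> 14x14
h = layers.Conv2D(8, (3, 3), activation="relu", padding="same")(h)
code = layers.MaxPooling2D((2, 2))(h)                  # 14x14 -> 7x7 code

h = layers.UpSampling2D((2, 2))(code)                  # unpooling: 7x7 -> 14x14
h = layers.Conv2D(8, (3, 3), activation="relu", padding="same")(h)
h = layers.UpSampling2D((2, 2))(h)                     # 14x14 -> 28x28
outputs = layers.Conv2D(1, (3, 3), activation="sigmoid", padding="same")(h)

conv_ae = models.Model(inputs, outputs)
conv_ae.compile(optimizer="adam", loss="binary_crossentropy")
```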

Auto-encoder – Pre-training DNN

• Greedy Layer-wise Pre-training again

[Figure: the target network is 784 → 1000 → 1000 → 500 → 10 (output). Step 1: train an auto-encoder 784 → 1000 → 784 on the inputs $x$, with encoder weights $W_1$ and decoder weights $W_1'$ and target $\hat{x} \approx x$; keep $W_1$.]

Auto-encoder – Pre-training DNN

• Greedy Layer-wise Pre-training again

[Figure: Step 2: fix $W_1$, compute the first-layer activations $a^1$, and train an auto-encoder 1000 → 1000 → 1000 with weights $W_2$ and $W_2'$ and target $\hat{a}^1 \approx a^1$; keep $W_2$.]

Auto-encoder – Pre-training DNN

• Greedy Layer-wise Pre-training again

[Figure: Step 3: fix $W_1$ and $W_2$, compute $a^2$, and train an auto-encoder 1000 → 500 → 1000 with weights $W_3$ and $W_3'$ and target $\hat{a}^2 \approx a^2$; keep $W_3$.]

Auto-encoder – Pre-training DNN

• Greedy Layer-wise Pre-training again

[Figure: Step 4: randomly initialize the output weights $W_4$ (500 → 10), then fine-tune the whole network 784 → 1000 → 1000 → 500 → 10 by backpropagation, starting from the pre-trained $W_1$, $W_2$, $W_3$.]
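A hedged Keras sketch of the whole procedure. The layer sizes follow the figures above; activations, epochs, and the optimizer are illustrative, and the random stand-in data keeps the sketch self-contained in place of real MNIST arrays.

```python
# Greedy layer-wise pre-training: initialize each hidden layer of the
# 784-1000-1000-500-10 classifier from an auto-encoder trained on the previous
# layer's output, leave W4 random, then fine-tune everything by back-propagation.
import numpy as np
from tensorflow.keras import layers, models

def pretrain_layer(features, width):
    """Train a one-hidden-layer auto-encoder on `features`; return (weights, encoded features)."""
    dim = features.shape[1]
    inp = layers.Input(shape=(dim,))
    hidden = layers.Dense(width, activation="relu")(inp)
    out = layers.Dense(dim, activation="linear")(hidden)
    ae = models.Model(inp, out)
    ae.compile(optimizer="adam", loss="mse")
    ae.fit(features, features, epochs=5, batch_size=256, verbose=0)
    enc = models.Model(inp, hidden)
    return ae.layers[1].get_weights(), enc.predict(features, verbose=0)

# x_train, y_train: MNIST-style data; random stand-ins keep the sketch runnable.
x_train = np.random.rand(1000, 784).astype("float32")
y_train = np.random.randint(0, 10, size=1000)

features, pretrained = x_train, []
for width in (1000, 1000, 500):                      # W1, W2, W3
    w, features = pretrain_layer(features, width)
    pretrained.append(w)

# Build the classifier, copy in the pre-trained weights, leave W4 random, then fine-tune.
clf = models.Sequential([
    layers.Dense(1000, activation="relu", input_shape=(784,)),
    layers.Dense(1000, activation="relu"),
    layers.Dense(500, activation="relu"),
    layers.Dense(10, activation="softmax"),          # W4: random initialization
])
for dense, weights in zip(clf.layers[:3], pretrained):
    dense.set_weights(weights)
clf.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# clf.fit(x_train, y_train, epochs=10, batch_size=256)   # fine-tune by back-propagation
```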

SIMPLE LATENT SPACE INTERPOLATION - KERAS

[Figure: two inputs are passed through the encoder to obtain codes $z_1$ and $z_2$.]

[Figure: interpolated codes $z_i = \alpha z_1 + (1 - \alpha) z_2$, for $\alpha$ ranging from 0 to 1, are passed through the decoder to generate the intermediate images.]

SIMPLE LATENT SPACE INTERPOLATION – KERAS CODE EXAMPLE
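A small sketch of the interpolation, assuming `encoder` and `decoder` are the two halves of a trained auto-encoder (e.g. the deep auto-encoder above split at the code layer); the number of steps is a placeholder.

```python
# Decode convex combinations z_i = alpha*z1 + (1 - alpha)*z2 of two codes.
import numpy as np

def interpolate(encoder, decoder, x1, x2, steps=10):
    """Return images decoded from codes interpolated between encode(x1) and encode(x2)."""
    z1 = encoder.predict(x1[None, :])
    z2 = encoder.predict(x2[None, :])
    images = []
    for alpha in np.linspace(0.0, 1.0, num=steps):
        z_i = alpha * z1 + (1.0 - alpha) * z2   # convex combination of the two codes
        images.append(decoder.predict(z_i)[0])
    return images
```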

DENOISING AUTOENCODERS

Intuition:
- We still aim to encode the input and NOT to mimic the identity function.
- We try to undo the effect of a corruption process stochastically applied to the input.
This yields a more robust model.

[Figure: noisy input → Encoder → latent space representation → Decoder → denoised input.]

DENOISING AUTOENCODERS

Use case:
- Extract a robust representation for an NN classifier.

[Figure: noisy input → Encoder → latent space representation, which is fed to the classifier.]

DENOISING AUTOENCODERS

Instead of trying to mimic the identity function by minimizing

$L(x, g(f(x)))$

where $L$ is some loss function,

a DAE instead minimizes

$L(x, g(f(\tilde{x})))$

where $\tilde{x}$ is a copy of $x$ that has been corrupted by some form of noise.

DENOISING AUTOENCODERS

Idea: a representation that is robust against noise. Corruption processes:
- Random assignment of a subset of the inputs to 0, with probability $v$.
- Gaussian additive noise.

DENOISING AUTOENCODERS

• The reconstruction $\hat{x}$ is computed from the corrupted input $\tilde{x}$.
• The loss function compares the reconstruction $\hat{x} = g(f(\tilde{x}))$ with the noiseless $x$.

❖ The autoencoder cannot fully trust each feature of $x$ independently, so it must learn the correlations between $x$'s features.
❖ Based on those relations we obtain a model that is less prone to changes, i.e. more robust.
➢ Given the noise process $p(\tilde{x} \mid x)$, we are forcing the hidden layer to learn a generalized structure of the data. (A small sketch of the corruption processes follows below.)
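A small sketch of the two corruption processes $p(\tilde{x} \mid x)$ listed above: masking noise (set each input to 0 with probability $v$) and additive Gaussian noise. The default values of $v$ and $\sigma$ are placeholders.

```python
# Corrupt an input x into x_tilde with masking noise and/or Gaussian noise.
import numpy as np

def corrupt(x, v=0.3, sigma=0.0, rng=None):
    """Return a corrupted copy x_tilde of x."""
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) >= v                      # keep each entry with probability 1 - v
    x_tilde = x * mask                                   # masked entries are set to 0
    if sigma > 0:
        x_tilde = x_tilde + rng.normal(0.0, sigma, size=x.shape)
    return x_tilde
```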

DENOISING AUTOENCODERS - PROCESS

1. Take some input $x$ and apply noise to obtain $\tilde{x}$.
2. Feed $\tilde{x}$ through the DAE: encode and decode to obtain the reconstruction $\hat{x} = g(f(\tilde{x}))$.
3. Compare $\hat{x}$ with the original (clean) $x$.

DENOISING CONVOLUTIONAL AE – KERAS

- 50 epochs.
- Noise factor 0.5.
- 92% accuracy on the validation set.
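A training sketch matching the settings above, assuming `conv_ae` is a convolutional auto-encoder like the one sketched earlier and that the image arrays are scaled to [0, 1]; it corrupts the inputs with Gaussian noise (factor 0.5) and trains the model to reconstruct the clean images.

```python
# Train a denoising convolutional auto-encoder: noisy inputs, clean targets.
import numpy as np

def train_denoising(conv_ae, x_train, x_test, noise_factor=0.5, epochs=50):
    x_train_noisy = np.clip(x_train + noise_factor * np.random.normal(size=x_train.shape), 0.0, 1.0)
    x_test_noisy = np.clip(x_test + noise_factor * np.random.normal(size=x_test.shape), 0.0, 1.0)
    conv_ae.fit(x_train_noisy, x_train,                  # noisy input, clean target
                epochs=epochs, batch_size=128,
                validation_data=(x_test_noisy, x_test))
    return conv_ae
```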

STACKED AE

- Motivation:
  ❑ We want to harness the feature-extraction quality of an AE to our advantage.
  ❑ For example: we can build a deep supervised classifier whose input is the output of an SAE.
  ❑ The benefit: our deep model's weights are not randomly initialized but are rather "smartly selected".
  ❑ Also, using this unsupervised technique lets us make use of a larger unlabeled dataset.

STACKED AE

- Building an SAE consists of two phases:
  1. Train each AE layer one after the other.
  2. Connect any classifier (SVM / FC NN layer, etc.).

STACKED AE

[Figure: $x$ → SAE → Classifier → $y$.]

STACKED AE – TRAIN PROCESS

First Layer Training (AE 1)

$x \rightarrow f_1(x) = z_1 \rightarrow g_1(z_1) = \hat{x}$

STACKED AE – TRAIN PROCESS

Second Layer Training (AE 2)

$x \rightarrow f_1(x) = z_1 \rightarrow f_2(z_1) = z_2 \rightarrow g_2(z_2) = \hat{z}_1$ (with $f_1$ kept fixed)

STACKED AE – TRAIN PROCESS

Add any classifier

$x \rightarrow f_1(x) = z_1 \rightarrow f_2(z_1) = z_2 \rightarrow$ Classifier $\rightarrow$ Output
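A minimal sketch of phase 2, assuming `enc1` and `enc2` are the trained Keras encoder models for $f_1$ and $f_2$ from the two training steps above; the softmax layer stands in for "any classifier".

```python
# Attach a classifier on top of a stacked auto-encoder's frozen encoders.
from tensorflow.keras import layers, models

def build_sae_classifier(enc1, enc2, input_dim=784, n_classes=10):
    enc1.trainable = False                               # keep the SAE features fixed
    enc2.trainable = False
    inputs = layers.Input(shape=(input_dim,))
    z1 = enc1(inputs)                                    # z1 = f1(x)
    z2 = enc2(z1)                                        # z2 = f2(z1)
    outputs = layers.Dense(n_classes, activation="softmax")(z2)
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# sae_clf = build_sae_classifier(enc1, enc2)
# sae_clf.fit(x_train, y_train, epochs=10, batch_size=256)
```

https://mi.eng.cam.ac.uk/projects/segnet/#demo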

U-Net
• Current state of the art
  – Very popular in MICCAI 2016
  – Works well with little data
• Influenced by the previous architectures
  – Up-conv 2x2 = bilinear up-sampling then 2x2 convolution (sketched below)
  – 2D slices
  – 3x3 convolutions
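A small sketch of the up-conv block described in the list above, assuming tf.keras; the filter count is a placeholder.

```python
# "Up-conv 2x2": bilinear up-sampling by a factor of 2, followed by a 2x2 convolution.
from tensorflow.keras import layers

def up_conv_2x2(x, filters):
    x = layers.UpSampling2D(size=(2, 2), interpolation="bilinear")(x)
    return layers.Conv2D(filters, (2, 2), activation="relu", padding="same")(x)
```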

https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/