Unsupervised Learning: Deep Auto-encoder

Credits
• This lecture contains public domain materials from Guy Nolan, Stefania Raimondo, Jason Su, and Hung-yi Lee.

Auto-encoder
• An NN encoder compresses the input object (e.g. a 28 x 28 = 784-pixel image) into a compact code, usually much smaller than 784 dimensions.
• An NN decoder reconstructs the original object from the code.
• The encoder and decoder are learned together so that the reconstruction is as close as possible to the input.

Recap: PCA
• PCA can be seen as a linear auto-encoder with a single hidden layer: encode $c = Wx$, decode $\hat{x} = W^T c$, and minimize the reconstruction error $\|x - \hat{x}\|^2$.
• The hidden (bottleneck) layer is linear, and its output is the code.

Deep Auto-encoder
• Of course, the auto-encoder can be deep: $x \to W_1 \to W_2 \to \dots \to$ code $\to \dots \to W_2^T \to W_1^T \to \hat{x}$, trained so that input and output are as close as possible.
• A symmetric decoder (weights tied to the transposed encoder weights) is not necessary.
• Hinton and Salakhutdinov initialized the weights with RBMs, layer by layer.
• Reference: Hinton, Geoffrey E., and Ruslan R. Salakhutdinov. "Reducing the dimensionality of data with neural networks." Science 313.5786 (2006): 504-507.
• Example architecture on MNIST: 784 - 1000 - 500 - 250 - 30 - 250 - 500 - 1000 - 784. Reconstructions from the 30-dim deep auto-encoder code are clearly closer to the original images than those from a 30-dim PCA projection.

Deep Autoencoder Example
• Interactive browser demo by Andrej Karpathy: https://cs.stanford.edu/people/karpathy/convnetjs/demo/autoencoder.html
• With a 2-dim bottleneck (784 - 1000 - 500 - 250 - 2 - 250 - 500 - 1000 - 784) the code can be plotted directly.

More auto-encoder variants
• Contractive auto-encoder. Ref: Rifai, Salah, et al. "Contractive auto-encoders: Explicit invariance during feature extraction." Proceedings of the 28th International Conference on Machine Learning (ICML-11). 2011.
• De-noising auto-encoder: add noise to the input x to obtain x', encode and decode x', and make the reconstruction as close as possible to the clean x. Ref: Vincent, Pascal, et al. "Extracting and composing robust features with denoising autoencoders." ICML, 2008.

Deep Auto-encoder – Example
• Encode images with an NN encoder into a 32-dim code and compare against PCA; the codes (and the raw pixels) can be visualized with t-Distributed Stochastic Neighbor Embedding (t-SNE).

Auto-encoder – Text Retrieval
• Vector space model with bag-of-words: the query "This is an apple" becomes (this: 1, is: 1, a: 0, an: 1, apple: 1, pen: 0, ...). Semantics are not considered.
• Feed the bag-of-words vector (e.g. 2000-dim) of a document or query through an auto-encoder 2000 - 500 - 250 - 125 - 2. Documents talking about the same thing get close codes, so a query can be matched to documents in the 2-dim code space.
• Compare with LSA, which projects documents onto 2 latent topics.

Auto-encoder for CNN
• Encoder: convolution -> pooling -> convolution -> pooling -> code.
• Decoder: deconvolution -> unpooling -> deconvolution -> unpooling -> deconvolution, trained so that the output is as close as possible to the input.
• Unpooling (e.g. 14 x 14 -> 28 x 28): remember where the pooled maxima came from and put the values back there; a simpler alternative is to just repeat the values. (Source of image: https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/image_segmentation.html)
• Deconvolution: actually, deconvolution is convolution (applied to a padded / up-sampled input).

Auto-encoder – Pre-training DNN (Greedy Layer-wise Pre-training)
Suppose the target network is 784 - 1000 - 1000 - 500 - 10 (output):
1. Train an auto-encoder 784 - 1000 - 784 on the input $x$ and keep the encoder weights $W_1$.
2. Fix $W_1$, compute the hidden activations $a_1$, and train an auto-encoder 1000 - 1000 - 1000 on $a_1$; keep $W_2$.
3. Fix $W_1$ and $W_2$, compute $a_2$, and train an auto-encoder 1000 - 500 - 1000 on $a_2$; keep $W_3$.
4. Randomly initialize the output-layer weights $W_4$, then fine-tune the whole network by backpropagation, using the pre-trained weights as initialization. (A hedged Keras sketch of layer-wise pre-training with a classifier on top appears at the end of these notes.)

Simple Latent Space Interpolation – Keras
• Encode two inputs $x_1$ and $x_2$ into codes $z_1$ and $z_2$.
• Interpolate between them: $z = \alpha z_1 + (1 - \alpha) z_2$ for $\alpha \in [0, 1]$.
• Decode each interpolated code to obtain a smooth morph between the two reconstructions; a sketch in Keras follows.
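A minimal sketch of this interpolation, assuming tf.keras. The toy encoder/decoder pair built here (names `encoder`, `decoder`, `latent_dim = 32`) and the random stand-in inputs are illustrative assumptions, not the network or data from the slides; in practice the encoder and decoder would come from a trained auto-encoder on flattened 28 x 28 images.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

latent_dim = 32  # assumed code size, not taken from the slides

# Toy encoder/decoder pair so the sketch is self-contained; in practice these
# would be the trained halves of an auto-encoder on 784-dim inputs.
encoder = keras.Sequential([
    keras.Input(shape=(784,)),
    layers.Dense(latent_dim, activation="relu"),
])
decoder = keras.Sequential([
    keras.Input(shape=(latent_dim,)),
    layers.Dense(784, activation="sigmoid"),
])

# Two inputs to interpolate between (random stand-ins for real images).
x1 = np.random.rand(1, 784).astype("float32")
x2 = np.random.rand(1, 784).astype("float32")

# Encode both inputs into their latent codes z1 and z2.
z1 = encoder.predict(x1, verbose=0)
z2 = encoder.predict(x2, verbose=0)

# z = alpha * z1 + (1 - alpha) * z2, decoded for several values of alpha.
for alpha in np.linspace(0.0, 1.0, num=10):
    z = alpha * z1 + (1.0 - alpha) * z2
    x_interp = decoder.predict(z, verbose=0)
    # x_interp can be reshaped to 28 x 28 and plotted to see the morph.
```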
Denoising Autoencoders (DAE)
Intuition:
• We still aim to encode the input and NOT to mimic the identity function.
• We try to undo the effect of a corruption process stochastically applied to the input: noisy input -> encoder -> latent space representation -> decoder -> denoised output. This yields a more robust model.
Use case:
• Extract a robust representation to feed into an NN classifier.

How it differs from a plain auto-encoder:
• Instead of mimicking the identity function by minimizing $L(x, g(f(x)))$, where $L$ is some loss function, a DAE minimizes $L(x, g(f(\tilde{x})))$, where $\tilde{x}$ is a copy of $x$ that has been corrupted by some form of noise.
• Typical corruption processes $p(\tilde{x} \mid x)$: randomly set a subset of the inputs to 0 with probability $v$, or add Gaussian noise.
• The reconstruction $\hat{x}$ is computed from the corrupted input $\tilde{x}$, but the loss compares $\hat{x}$ with the noiseless $x$.
• The autoencoder cannot fully trust each feature of $x$ independently, so it must learn the correlations between the features of $x$; based on those relations the model becomes less prone to changes in individual inputs.
• We are forcing the hidden layer to learn a generalized structure of the data.

Denoising Autoencoders – Process
1. Take some input $x$.
2. Apply noise to obtain $\tilde{x}$.
3. Encode and decode: $\hat{x} = g(f(\tilde{x}))$.
4. Compare $\hat{x}$ with the original $x$.

Denoising Convolutional AE – Keras
• 50 epochs, noise factor 0.5, 92% accuracy on the validation set.
• A hedged Keras sketch of such a denoising convolutional auto-encoder is given after the U-Net section below.

Stacked AE
Motivation:
• We want to harness the feature-extraction quality of an AE to our advantage.
• For example, we can build a deep supervised classifier whose input is the output of a stacked auto-encoder (SAE).
• The benefit: our deep model's weights are not randomly initialized but are "smartly selected" by unsupervised pre-training.
• Because this technique is unsupervised, it also lets us exploit a larger unlabeled dataset.

Building a SAE consists of two phases:
1. Train each AE layer one after the other.
2. Connect any classifier (SVM / FC NN layer etc.).

Stacked AE – Training Process
• First layer training (AE 1): $x \to z_1 = f_1(x) \to \hat{x} = g_1(z_1)$.
• Second layer training (AE 2): $z_1 \to z_2 = f_2(z_1) \to \hat{z}_1 = g_2(z_2)$.
• Finally, add any classifier: $x \to f_1(x) = z_1 \to f_2(z_1) = z_2 \to$ classifier $\to$ output. (See the stacked-AE sketch at the end of these notes.)

SegNet demo: https://mi.eng.cam.ac.uk/projects/segnet/#demo

U-Net
• Current state of the art; very popular at MICCAI 2016; works well with little data.
• Influenced by the encoder-decoder architectures above.
• Up-conv 2x2 = bilinear up-sampling followed by a 2x2 convolution; works on 2D slices; uses 3x3 convolutions.
• https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/
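The sketch below is a minimal Keras (tf.keras) denoising convolutional auto-encoder in the spirit of the "Denoising Convolutional AE – Keras" result above: it keeps the slide's 50 epochs and noise factor 0.5, but the specific architecture, optimizer, loss, and batch size are assumptions, not the exact code from the lecture.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Load MNIST and scale to [0, 1]; images become 28 x 28 x 1 tensors.
(x_train, _), (x_test, _) = keras.datasets.mnist.load_data()
x_train = x_train.astype("float32")[..., None] / 255.0
x_test = x_test.astype("float32")[..., None] / 255.0

# Corruption process p(x_tilde | x): additive Gaussian noise, factor 0.5 as on the slide.
noise_factor = 0.5
x_train_noisy = np.clip(x_train + noise_factor * np.random.normal(size=x_train.shape), 0.0, 1.0)
x_test_noisy = np.clip(x_test + noise_factor * np.random.normal(size=x_test.shape), 0.0, 1.0)

inputs = keras.Input(shape=(28, 28, 1))
# Encoder: convolution + pooling down to a small spatial code.
x = layers.Conv2D(32, 3, activation="relu", padding="same")(inputs)
x = layers.MaxPooling2D(2, padding="same")(x)
x = layers.Conv2D(32, 3, activation="relu", padding="same")(x)
encoded = layers.MaxPooling2D(2, padding="same")(x)
# Decoder: convolution + up-sampling ("repeat the values" unpooling) back to 28 x 28.
x = layers.Conv2D(32, 3, activation="relu", padding="same")(encoded)
x = layers.UpSampling2D(2)(x)
x = layers.Conv2D(32, 3, activation="relu", padding="same")(x)
x = layers.UpSampling2D(2)(x)
decoded = layers.Conv2D(1, 3, activation="sigmoid", padding="same")(x)

autoencoder = keras.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")

# Train on (noisy input, clean target) pairs: the loss compares the
# reconstruction of the corrupted x_tilde with the noiseless x.
autoencoder.fit(x_train_noisy, x_train,
                epochs=50, batch_size=128, shuffle=True,
                validation_data=(x_test_noisy, x_test))
```

Note that `UpSampling2D` implements the simpler unpooling alternative from the "Auto-encoder for CNN" section (repeat the values), and the "deconvolution" is realized as an ordinary convolution on the up-sampled feature map.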

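Finally, a hedged sketch of the "Stacked AE" / "Pre-training DNN" idea: train each AE layer one after the other, then stack the encoders and attach a classifier. Layer sizes loosely follow the pre-training example in these notes (784 -> 1000 -> 1000 -> 10); the helper name `train_ae_layer`, the epoch counts, and the choice of softmax classifier are illustrative assumptions.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

(x_train, y_train), _ = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0

def train_ae_layer(data, n_hidden, epochs=5):
    """Train one auto-encoder layer data -> n_hidden -> data and return its encoder f_i."""
    inputs = keras.Input(shape=(data.shape[1],))
    code = layers.Dense(n_hidden, activation="relu")(inputs)
    recon = layers.Dense(data.shape[1])(code)  # linear reconstruction, MSE loss
    ae = keras.Model(inputs, recon)
    ae.compile(optimizer="adam", loss="mse")
    ae.fit(data, data, epochs=epochs, batch_size=128, verbose=0)
    return keras.Model(inputs, code)  # keep only the encoder half

# Phase 1: greedy layer-wise training (AE 1 on x, then AE 2 on z1).
enc1 = train_ae_layer(x_train, 1000)           # f1: x  -> z1
z1 = enc1.predict(x_train, verbose=0)
enc2 = train_ae_layer(z1, 1000)                # f2: z1 -> z2

# Phase 2: stack the pre-trained encoders and connect a classifier on top.
inputs = keras.Input(shape=(784,))
features = enc2(enc1(inputs))
outputs = layers.Dense(10, activation="softmax")(features)  # any classifier could go here
model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=128)  # fine-tune by backpropagation
```

Because the pre-trained encoders stay trainable inside the final model, the last `fit` call performs the fine-tuning-by-backpropagation step described in the pre-training section, with the unsupervised weights serving only as the initialization.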