COMP 551 – Applied Machine Learning Lecture 16: Deep Learning
Instructor: Joelle Pineau ([email protected])
Class web page: www.cs.mcgill.ca/~jpineau/comp551
Unless otherwise noted, all material posted for this course is copyright of the instructor, and cannot be reused or reposted without the instructor's written permission.

The deep learning objective
Figure 1: We would like the raw input image to be transformed into gradually higher levels of representation, representing more and more abstract functions of the raw input, e.g., edges, local shapes, object parts, etc. In practice, we do not know in advance what the "right" representation should be for all these levels of abstraction, although linguistic concepts might help in guessing what the higher levels should implicitly represent.
Learning an autoencoder function
• Goal: Learn a compressed representation of the input data.
• We have two functions:
  – Encoder: h = fW(x) = sf(Wx)
  – Decoder: x’ = gW’(h) = sg(W’h)
  where s(·) can be a sigmoid, linear, or other function and W, W’ are weight matrices.
• To train, minimize the reconstruction error:
  Err(W,W’) = ∑i=1:n L[ xi , gW’(fW(xi)) ]
  using squared-error loss (continuous inputs) or cross-entropy (binary inputs).
[Diagram: x → (encoder f) → h → (decoder g) → x’]
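As an illustration, the encoder/decoder pair and the squared-error training objective above can be sketched in NumPy. This is a minimal sketch, not code from the course; the function names, hyperparameters, and toy data are all illustrative assumptions.

```python
import numpy as np

# Minimal autoencoder sketch: sigmoid encoder h = s(Wx), linear decoder
# x' = W'h, squared-error loss, trained by batch gradient descent.
# All names and hyperparameters here are illustrative, not from the slides.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_autoencoder(X, n_hidden=2, lr=0.1, n_iters=500, seed=0):
    """X: (n, d) data matrix. Returns (W, Wp, losses)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(0, 0.1, size=(n_hidden, d))   # encoder weights
    Wp = rng.normal(0, 0.1, size=(d, n_hidden))  # decoder weights
    losses = []
    for _ in range(n_iters):
        H = sigmoid(X @ W.T)          # h = s_f(W x)
        Xr = H @ Wp.T                 # x' = W' h (linear decoder)
        R = Xr - X                    # reconstruction residual
        losses.append(np.sum(R ** 2))
        # Backprop for Err(W, W') = sum_i || x_i - g(f(x_i)) ||^2
        dWp = 2 * R.T @ H
        dH = 2 * R @ Wp
        dW = (dH * H * (1 - H)).T @ X   # chain rule through the sigmoid
        Wp -= lr * dWp / n
        W -= lr * dW / n
    return W, Wp, losses

# Tiny demo: data lying near a 1-D manifold embedded in 3-D.
rng = np.random.default_rng(1)
t = rng.uniform(-1, 1, size=(50, 1))
X = np.hstack([t, 2 * t, -t])
W, Wp, losses = fit_autoencoder(X, n_hidden=1)
print(losses[0], losses[-1])  # reconstruction error should shrink
```

A single hidden unit suffices here because the toy data is intrinsically one-dimensional; the compressed code h plays the role of the learned representation.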
PCA vs autoencoders
In the case of linear functions
  fW(x) = Wx,   gW’(h) = W’h,
with squared-error loss:
  Err(W,W’) = ∑i=1:n || xi – gW’(fW(xi)) ||²
we can show that the minimum-error solution W yields the same subspace as PCA.
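This equivalence can be checked numerically: train a linear autoencoder by gradient descent and compare its reconstructions against the rank-k PCA reconstruction (top-k SVD truncation). The setup below is an illustrative sketch with assumed toy data, not code from the slides.

```python
import numpy as np

# Sketch: a linear autoencoder with squared-error loss recovers the PCA
# subspace. We compare its reconstruction error with the rank-k PCA
# reconstruction. Data, sizes, and learning rate are assumptions.

rng = np.random.default_rng(0)
n, d, k = 200, 5, 2
# Zero-mean data concentrated near a k-dimensional subspace.
B = rng.normal(size=(k, d))
X = rng.normal(size=(n, k)) @ B + 0.01 * rng.normal(size=(n, d))
X -= X.mean(axis=0)

# PCA reconstruction via SVD: project onto the top-k right singular vectors.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
X_pca = X @ Vt[:k].T @ Vt[:k]

# Linear autoencoder x' = W'(Wx), trained by batch gradient descent.
W = rng.normal(0, 0.1, size=(k, d))
Wp = rng.normal(0, 0.1, size=(d, k))
lr = 0.01
for _ in range(5000):
    H = X @ W.T                      # linear encoder
    R = H @ Wp.T - X                 # residual of linear decoder
    Wp -= lr * 2 * R.T @ H / n
    W -= lr * 2 * (R @ Wp).T @ X / n

X_ae = X @ W.T @ Wp.T
err_pca = np.sum((X - X_pca) ** 2)
err_ae = np.sum((X - X_ae) ** 2)
print(err_pca, err_ae)  # the two errors should nearly coincide
```

Note the learned W and W’ need not equal the principal components themselves; gradient descent converges to some basis of the same k-dimensional subspace, which is why we compare reconstruction errors rather than weight matrices.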
Stacked autoencoders
Key idea: Apply greedy layerwise unsupervised pre-training.
http://www.dmi.usherb.ca/~larocheh/projects_deep_learning.html
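The greedy layerwise procedure can be sketched as follows: train the first autoencoder on the raw data, freeze it, encode the data, and train a second autoencoder on those codes. This is a minimal NumPy sketch of the idea; the helper name `train_layer` and all hyperparameters are assumptions, not from the slides.

```python
import numpy as np

# Sketch of greedy layerwise unsupervised pre-training: each autoencoder
# layer is trained on the (frozen) output of the previous one.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_layer(X, n_hidden, lr=0.1, n_iters=300, seed=0):
    """Train one sigmoid autoencoder layer; return its encoder weights W."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(0, 0.1, size=(n_hidden, d))
    Wp = rng.normal(0, 0.1, size=(d, n_hidden))
    for _ in range(n_iters):
        H = sigmoid(X @ W.T)
        R = H @ Wp.T - X
        dWp = 2 * R.T @ H
        dH = 2 * R @ Wp
        dW = (dH * H * (1 - H)).T @ X
        Wp -= lr * dWp / n
        W -= lr * dW / n
    return W

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 8))   # toy data, purely illustrative

# Greedy stacking: layer 2 never sees the raw data, only layer 1's codes.
W1 = train_layer(X, n_hidden=4)
H1 = sigmoid(X @ W1.T)          # freeze layer 1, encode the data
W2 = train_layer(H1, n_hidden=2)
H2 = sigmoid(H1 @ W2.T)         # top-level representation
print(H2.shape)
```

After pre-training, the stacked encoders W1, W2 would typically initialize a deep network that is then fine-tuned with supervised backpropagation.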
Regularization of autoencoders
• How can we generate sparse autoencoders? (And also, why?)
• Weight tying of the encoder and decoder weights (W’ = Wᵀ) to explicitly constrain (regularize) the learned function.
• Directly penalize the output of the hidden units (e.g. with an L1 penalty) to introduce sparsity in the learned representation.
• Penalize the average output (over a batch of data) to encourage it to approach a fixed target.
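The L1-penalty option above can be sketched by adding a term λ·∑|h| to the reconstruction loss, whose gradient is simply λ·sign(h) with respect to the hidden activations. The sketch below is illustrative (names, λ, and toy data are assumptions); it compares mean hidden activation with and without the penalty.

```python
import numpy as np

# Sketch of a sparse autoencoder: squared-error reconstruction loss plus
# an L1 penalty lam * sum|h| on the hidden activations, which drives
# activations toward 0 and yields a sparser representation.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, n_hidden=4, lam=0.0, lr=0.1, n_iters=400, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(0, 0.1, size=(n_hidden, d))
    Wp = rng.normal(0, 0.1, size=(d, n_hidden))
    for _ in range(n_iters):
        H = sigmoid(X @ W.T)
        R = H @ Wp.T - X
        dWp = 2 * R.T @ H
        # Reconstruction gradient plus L1 penalty gradient; sign(H) = 1
        # here since sigmoid outputs are strictly positive.
        dH = 2 * R @ Wp + lam * np.sign(H)
        dW = (dH * H * (1 - H)).T @ X
        Wp -= lr * dWp / n
        W -= lr * dW / n
    return sigmoid(X @ W.T)

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 6))   # toy data, purely illustrative
H_plain = train(X, lam=0.0)
H_sparse = train(X, lam=1.0)
print(H_plain.mean(), H_sparse.mean())  # penalized run is sparser
```

The last bullet (penalizing the batch-average activation toward a fixed target) would replace the L1 term with, e.g., a squared or KL divergence between the mean of H and a small target such as 0.05.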
Extracting and Composing Robust Features with Denoising Autoencoders
2.3. The Denoising Autoencoder

To test our hypothesis and enforce robustness to partially destroyed inputs we modify the basic autoencoder […] towards reconstructing the uncorrupted version from the corrupted version. Note that in this way, the autoencoder cannot learn the identity, unlike the basic autoencoder, thus removing the constraint that d
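The denoising idea from the excerpt above can be sketched as follows: corrupt each input (here by zeroing a random fraction of its components, one of the corruption schemes discussed in that paper), encode the corrupted input, but compute the loss against the clean input. The corruption fraction, sizes, and data below are illustrative assumptions.

```python
import numpy as np

# Sketch of a denoising autoencoder: the encoder sees a corrupted input,
# but the reconstruction is scored against the uncorrupted original, so
# the model cannot simply learn the identity map.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n, d, k = 200, 8, 4
X = rng.normal(size=(n, d))     # toy "clean" data, purely illustrative

W = rng.normal(0, 0.1, size=(k, d))
Wp = rng.normal(0, 0.1, size=(d, k))
lr, destroy_frac = 0.1, 0.25
losses = []

for _ in range(300):
    # Corruption: zero out a random 25% of the entries of each input.
    mask = rng.random(X.shape) > destroy_frac
    Xc = X * mask
    H = sigmoid(Xc @ W.T)       # encode the *corrupted* input
    R = H @ Wp.T - X            # ...but compare against the *clean* input
    losses.append(np.sum(R ** 2))
    Wp -= lr * 2 * R.T @ H / n
    W -= lr * 2 * ((R @ Wp) * H * (1 - H)).T @ Xc / n

print(losses[0], losses[-1])    # denoising loss should decrease
```

Because the target is the clean x while the encoder only sees the corrupted x̃, the hidden representation must capture statistical structure across input dimensions in order to fill in the destroyed values.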