Deep Learning Unsupervised Learning
Russ Salakhutdinov
Machine Learning Department, Carnegie Mellon University
Canadian Institute for Advanced Research

Tutorial Roadmap
Part 1: Supervised (Discriminative) Learning: Deep Networks
Part 2: Unsupervised Learning: Deep Generative Models
Part 3: Open Research Questions

Unsupervised Learning
Non-probabilistic Models
Ø Sparse Coding
Ø Autoencoders
Ø Others (e.g. k-means)

Probabilistic (Generative) Models
• Tractable Models (explicit density p(x)): Ø Fully-observed Belief Nets Ø NADE Ø PixelRNN
• Non-Tractable Models (explicit density p(x)): Ø Boltzmann Machines Ø Variational Autoencoders Ø Helmholtz Machines Ø Many others…
• Implicit density: Ø Generative Adversarial Networks Ø Moment Matching Networks

Tutorial Roadmap
• Basic Building Blocks:
  Ø Sparse Coding
  Ø Autoencoders
• Deep Generative Models
  Ø Restricted Boltzmann Machines
  Ø Deep Boltzmann Machines
  Ø Helmholtz Machines / Variational Autoencoders
• Generative Adversarial Networks

Sparse Coding
• Sparse coding (Olshausen & Field, 1996). Originally developed to explain early visual processing in the brain (edge detection).
• Objective: given a set of input data vectors {x_1, …, x_N}, learn a dictionary of bases {φ_1, …, φ_K} such that
  x_n ≈ Σ_k a_nk φ_k,
  where the coefficients a_n are sparse: mostly zeros.
• Each data vector is represented as a sparse linear combination of bases.

Sparse Coding: Natural Images
Learned bases: "edges"
New example:
x = 0.8 * [basis 1] + 0.3 * [basis 2] + 0.5 * [basis 3]
[0, 0, …, 0.8, …, 0.3, …, 0.5, …] = coefficients (feature representation)
Slide Credit: Honglak Lee

Sparse Coding: Training
• Input: image patches.
• Learn a dictionary of bases.
• Objective (reconstruction error plus sparsity penalty):
  min over {φ_k}, {a_n} of Σ_n || x_n − Σ_k a_nk φ_k ||² + λ Σ_n,k | a_nk |
• Alternating optimization:
  1. Fix the dictionary of bases and solve for the activations a (a standard Lasso problem).
  2. Fix the activations a, optimize the dictionary of bases (a convex QP problem).

Sparse Coding: Testing Time
• Input: a new image patch x*, and the K learned bases.
• Output: the sparse representation a of the image patch x*.
x* = 0.8 * [basis 1] + 0.3 * [basis 2] + 0.5 * [basis 3]
[0, 0, …, 0.8, …, 0.3, …, 0.5, …] = coefficients (feature representation)
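The alternating optimization described above can be sketched in a few lines of Python. This is only an illustrative toy implementation, not the algorithm used in the cited work: the coding step uses scikit-learn's Lasso, and the dictionary step uses an unconstrained least-squares update with renormalization in place of the constrained convex QP mentioned on the slides; K, λ, and the iteration count are placeholder values.

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_coding(X, K=64, lam=0.1, n_iters=20, seed=0):
    """Alternating optimization for sparse coding (illustrative sketch).
    X: (D, N) matrix with one image patch per column."""
    rng = np.random.RandomState(seed)
    D, N = X.shape
    Phi = rng.randn(D, K)
    Phi /= np.linalg.norm(Phi, axis=0)            # unit-norm bases
    A = np.zeros((K, N))
    for _ in range(n_iters):
        # 1. Fix the dictionary, solve for activations (a standard Lasso problem per patch).
        lasso = Lasso(alpha=lam, fit_intercept=False, max_iter=2000)
        for n in range(N):
            A[:, n] = lasso.fit(Phi, X[:, n]).coef_
        # 2. Fix the activations, update the dictionary (plain least squares plus
        #    renormalization here, standing in for the constrained convex QP).
        Phi = X @ A.T @ np.linalg.pinv(A @ A.T)
        Phi /= np.linalg.norm(Phi, axis=0) + 1e-12
    return Phi, A
```

At test time, encoding a new patch x* amounts to re-running only step 1 with the learned dictionary held fixed.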
Image Classification
• Evaluated on the Caltech101 object category dataset (9K images, 101 classes).
• Pipeline: input image → learned bases → features (coefficients) → classification algorithm (SVM).
• Results (accuracy):
  Baseline (Fei-Fei et al., 2004): 16%
  PCA: 37%
  Sparse Coding: 47%
Slide Credit: Honglak Lee
(Lee, Battle, Raina, Ng, 2006)

Interpreting Sparse Coding
[Figure: x → encoding f(x) → sparse features a → decoding g(a) → x']
• Sparse, over-complete representation a.
• Encoding a = f(x) is an implicit, nonlinear function of x.
• Reconstruction (or decoding) x' = g(a) is linear and explicit.
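To make the implicit/explicit distinction concrete, here is a minimal sketch assuming a fixed dictionary Phi; the Lasso call and the penalty weight are illustrative choices, not part of the original slides.

```python
from sklearn.linear_model import Lasso

def decode(Phi, a):
    """Explicit, linear decoding: x' = g(a) = Phi @ a."""
    return Phi @ a

def encode(Phi, x, lam=0.1):
    """Implicit, nonlinear encoding: a = f(x) is defined as the solution of a
    Lasso problem rather than a closed-form layer (lam is an illustrative weight)."""
    return Lasso(alpha=lam, fit_intercept=False, max_iter=2000).fit(Phi, x).coef_
```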
Autoencoder
• Encoder: feed-forward, bottom-up path from the input image to the feature representation.
• Decoder: feed-back, generative, top-down path from the feature representation back to the input image.
• Details of what goes inside the encoder and decoder matter!
• Need constraints to avoid learning an identity.

Autoencoder
Binary Features z
• Encoder: filters W followed by a sigmoid function, z = σ(Wx).
• Decoder: linear filters D, reconstruction Dz.
Input Image x
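A minimal numpy sketch of this encoder/decoder pair and the squared-error training described on the following slides; the layer sizes, initialization scale, and learning rate are illustrative assumptions, not values from the tutorial.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

class SimpleAutoencoder:
    """z = sigmoid(W x), reconstruction x_hat = D z, trained by minimizing
    squared reconstruction error with plain gradient descent."""
    def __init__(self, d_in, k, seed=0):
        rng = np.random.RandomState(seed)
        self.W = 0.01 * rng.randn(k, d_in)   # encoder filters
        self.D = 0.01 * rng.randn(d_in, k)   # linear decoder filters

    def encode(self, x):
        return sigmoid(self.W @ x)

    def decode(self, z):
        return self.D @ z

    def step(self, x, lr=0.1):
        z = self.encode(x)
        err = self.decode(z) - x                  # gradient of 0.5*||x_hat - x||^2 w.r.t. x_hat
        dz = (self.D.T @ err) * z * (1.0 - z)     # backprop through the sigmoid encoder
        self.D -= lr * np.outer(err, z)           # backprop through the linear decoder
        self.W -= lr * np.outer(dz, x)
        return 0.5 * float(err @ err)             # reconstruction error for monitoring
```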
Autoencoder
• An autoencoder with D inputs, D outputs, and K hidden units, with K < D.
• Given an input x, its reconstruction is given by x̂ = D σ(Wx).
• We can determine the network parameters W and D by minimizing the reconstruction error: Σ_n || x_n − D σ(W x_n) ||².

Autoencoder: Linear Features
• If the hidden and output layers are linear (z = Wx, reconstruction Wz), it will learn hidden units that are a linear function of the data and minimize the squared error.
• The K hidden units will span the same space as the first K principal components. The weight vectors may not be orthogonal.
• With nonlinear hidden units, we have a nonlinear generalization of PCA.

Another Autoencoder Model
• Encoder: filters W with a sigmoid, z = σ(Wx); decoder with tied weights and a sigmoid, σ(Wᵀz), for binary input x.
• Need additional constraints to avoid learning an identity.
• Relates to Restricted Boltzmann Machines (later).

Predictive Sparse Decomposition
• Encoder: filters W with a sigmoid, z = σ(Wx); linear decoder filters D, reconstruction Dz of the real-valued input x.
• At training time, an L1 sparsity penalty is placed on the features z.
(Kavukcuoglu, Ranzato, Fergus, LeCun, 2009)

Stacked Autoencoders
• Stack encoder/decoder pairs (with sparsity on the features) and train them with greedy layer-wise learning, with class labels at the top.
• Then remove the decoders and use the feed-forward part: a standard, or convolutional, neural network architecture.
• Parameters can be fine-tuned using backpropagation.

Deep Autoencoders
[Figure: layer-wise RBM pretraining of a 2000-1000-500-30 stack, unrolling into a deep encoder-decoder with a 30-D code layer, and fine-tuning with backpropagation.]
• A 25x25 – 2000 – 1000 – 500 – 30 autoencoder is used to extract 30-D real-valued codes for Olivetti face patches.
• Top: random samples from the test dataset. Middle: reconstructions by the 30-dimensional deep autoencoder. Bottom: reconstructions by 30-dimensional PCA.

Information Retrieval
[Figure: 2-D codes for Reuters documents, grouped by topic (European Community Monetary/Economic, Interbank Markets, Energy Markets, Disasters and Accidents, Legal/Judicial, Leading Economic Indicators, Government Borrowings, Accounts/Earnings), compared with a 2-D LSA space.]
• The Reuters Corpus Volume II contains 804,414 newswire stories (randomly split into 402,207 training and 402,207 test).
• "Bag-of-words" representation: each article is represented as a vector containing the counts of the most frequently used 2000 words.
(Hinton and Salakhutdinov, Science 2006)

Tutorial Roadmap
• Basic Building Blocks: Sparse Coding, Autoencoders
• Deep Generative Models: Restricted Boltzmann Machines, Deep Boltzmann Machines, Helmholtz Machines / Variational Autoencoders
• Generative Adversarial Networks

Fully Observed Models
• Maximum likelihood: θ* = arg max_θ E_{x∼p_data} [ log p_model(x | θ) ]
• Fully-visible belief net: explicitly model conditional probabilities:
  p_model(x) = p_model(x_1) ∏_{i=2}^{n} p_model(x_i | x_1, …, x_{i−1})
• Each conditional can be a complicated neural network.
• A number of successful models, including:
  Ø NADE, RNADE (Larochelle et al., 2011)
  Ø Pixel CNN (van den Oord et al., 2016)
  Ø Pixel RNN (van den Oord et al., 2016)
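To make the chain-rule factorization concrete, here is a toy fully-visible belief net over binary vectors. It is only a sketch: the per-dimension logistic conditionals and the parameter layout are illustrative assumptions, whereas NADE and PixelCNN/RNN share parameters across positions and use much richer conditionals.

```python
import numpy as np

def log_sigmoid(u):
    return -np.logaddexp(0.0, -u)          # numerically stable log(sigmoid(u))

def fvbn_log_likelihood(x, weights, biases):
    """Toy fully-visible belief net over a binary vector x of length n:
    log p(x) = sum_i log p(x_i | x_1, ..., x_{i-1}), with each conditional
    p(x_i = 1 | x_<i) = sigmoid(b_i + w_i . x_<i).
    `weights[i]` has shape (i,), `biases` has shape (n,)."""
    total = 0.0
    for i, xi in enumerate(x):
        logit = biases[i] + (weights[i] @ x[:i] if i > 0 else 0.0)
        # Bernoulli log-likelihood of x_i under the i-th conditional.
        total += xi * log_sigmoid(logit) + (1.0 - xi) * log_sigmoid(-logit)
    return total
```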
Pixel CNN
[Figure: image samples from a Pixel CNN.]

Restricted Boltzmann Machines
• Graphical models are a powerful framework for representing dependency structure between random variables.
• Hidden variables: stochastic binary feature detectors. Visible variables: the image. Pair-wise and unary energy terms.
• An RBM is a Markov Random Field with:
  Ø Stochastic binary visible variables
  Ø Stochastic binary hidden variables
  Ø Bipartite connections.
(Related: Markov random fields, Boltzmann machines, log-linear models.)

Learning Features
• Observed data: a subset of 25,000 handwritten characters. Learned W: "edges" (a subset of 1000 features).
• A new image is represented as a sparse combination of the learned features.
• Logistic function: suitable for modeling binary images.

RBMs for Real-valued & Count Data
• Learned features (out of 10,000) from 4 million unlabelled images.
• Learned features ("topics") from the Reuters dataset (804,414 unlabeled newswire stories, bag-of-words), for example:
  Ø russian, russia, moscow, yeltsin, soviet
  Ø clinton, house, president, bill, congress
  Ø computer, system, product, software, develop
  Ø trade, country, import, world, economy
  Ø stock, wall, street, point, dow

Collaborative Filtering
• Binary hidden units h: user preferences. Multinomial visible units v: user ratings.
• Netflix dataset: 480,189 users, 17,770 movies, over 100 million ratings.
• Learned features ("genres"), for example:
  Ø Fahrenheit 9/11, Bowling for Columbine, The People vs. Larry Flynt, Canadian Bacon, La Dolce Vita
  Ø Independence Day, The Day After Tomorrow, Con Air, Men in Black II, Men in Black
  Ø Friday the 13th, The Texas Chainsaw Massacre, Children of the Corn, Child's Play, The Return of Michael Myers
  Ø Scary Movie, Naked Gun, Hot Shots!, American Pie, Police Academy
• State-of-the-art performance on the Netflix dataset. (Salakhutdinov, Mnih, Hinton, ICML 2007)

Different Data Modalities
• Binary/Gaussian/Softmax RBMs: all have binary hidden variables but use them to model different kinds of data (binary, real-valued, 1-of-K).
• It is easy to infer the states of the hidden variables given the visibles.

Product of Experts
• The joint distribution is given by P(v, h) = (1/Z) exp( vᵀWh + bᵀv + aᵀh ).
• Marginalizing over hidden variables gives a product of experts: P(v) = (1/Z) exp(bᵀv) ∏_j ( 1 + exp(a_j + Σ_i W_ij v_i) ).
• Example topics:
  Ø government, authority, power, empire, federation
  Ø clinton, house, president, bill, congress
  Ø bribery, corruption, dishonesty, corrupt, fraud
  Ø mafia, business, gang, mob, insider
  Ø stock, wall, street, point, dow
• Topics "government", "corruption" and "mafia" can combine to give very high probability to the word "Silvio Berlusconi".
[Figure: precision-recall curves for document retrieval; the 50-D Replicated Softmax model outperforms 50-D LDA.]

Deep Boltzmann Machines
• Learn simpler representations, then compose more complex ones.
• Input: pixels (image) → low-level features: edges → higher-level features: combinations of edges. Built from unlabeled inputs.
(Salakhutdinov 2008; Salakhutdinov & Hinton 2009)

Model Formulation
• Layers h1, h2, h3 with weights W1, W2, W3; the model parameters take the same form as in RBMs.
• Dependencies between hidden variables; all connections are undirected.
• Bottom-up and top-down: the input to each hidden unit combines signals from the layer below and the layer above.
• Hidden variables are dependent even when conditioned on the input.
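The slides note that inferring the hidden states of an RBM is easy because the conditionals are factorial. A common way to train such a model, not spelled out above, is a contrastive-divergence-style update; the sketch below shows a CD-1 step for a binary-binary RBM with an illustrative learning rate.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def rbm_cd1_update(v0, W, b_v, b_h, lr=0.01, rng=np.random):
    """One CD-1 style update for a binary-binary RBM.
    Shapes: v0 (D,), W (D, K), b_v (D,), b_h (K,). Updates W, b_v, b_h in place."""
    ph0 = sigmoid(v0 @ W + b_h)                   # p(h_j = 1 | v0): factorial, easy to compute
    h0 = (rng.rand(*ph0.shape) < ph0).astype(float)
    pv1 = sigmoid(h0 @ W.T + b_v)                 # one step of block Gibbs sampling
    v1 = (rng.rand(*pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + b_h)
    # Approximate log-likelihood gradient: data statistics minus model statistics.
    W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
    b_v += lr * (v0 - v1)
    b_h += lr * (ph0 - ph1)
```

For the Deep Boltzmann Machine described next, this simple recipe no longer applies directly, which motivates the approximate learning scheme on the following slides.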
Approximate Learning
• (Approximate) maximum likelihood: the gradient of the log-likelihood involves a data-dependent expectation and a model expectation over h1, h2, h3.
• Both expectations are intractable! The posterior over the hidden variables is not factorial any more.
• Use variational inference for the data-dependent expectation and a stochastic approximation (MCMC-based) for the model expectation.

Good Generative Model?
• Handwritten characters: simulated samples from the model are hard to tell apart from real data.

Handwriting Recognition
• MNIST dataset (60,000 examples of 10 digits), error rates:
  Logistic regression: 12.0%
  K-NN: 3.09%
  Neural Net (Platt 2005): 1.53%
  SVM (Decoste et al. 2002): 1.40%
  Deep Autoencoder (Bengio et al. 2007): 1.40%
  Deep Belief Net (Hinton et al. 2006): 1.20%
  DBM: 0.95%
• Optical Character Recognition (42,152 examples of 26 English letters), error rates:
  Logistic regression: 22.14%
  K-NN: 18.92%
  Neural Net: 14.62%
  SVM (Larochelle et al. 2009): 9.70%
  Deep Autoencoder (Bengio et al. 2007): 10.05%
  Deep Belief Net (Larochelle et al. 2009): 9.68%
  DBM: 8.40%
• Permutation-invariant version.

3-D Object Recognition
• NORB dataset (24,000 examples), error rates:
  Logistic regression: 22.5%
  K-NN (LeCun 2004): 18.92%
  SVM (Bengio & LeCun 2007): 11.6%
  Deep Belief Net (Nair & Hinton 2009): 9.0%
  DBM: 7.2%

Pattern Completion
[Figure: partially observed inputs completed by the model.]

Data: Collection of Modalities
• Multimedia content on the web: image + text + audio (e.g. tags "car, automobile"; "sunset, pacific ocean, baker beach, seashore, ocean").
• Product recommendation systems.
• Robotics applications: vision, audio, touch sensors, motor control.

Challenges - I
• Very different input representations: images are real-valued and dense; text is discrete and sparse.
• Difficult to learn cross-modal features from low-level representations.
Challenges - II
• Noisy and missing data. Example user tags: "pentax, k10d, pentaxda50200, kangarooisland, sa, australiansealion"; "mickikrimmel, mickipedia, headshot"; "<no text>"; "unseulpixel, naturey".
• Text generated by the model for the same images: "beach, sea, surf, strand, shore, wave, seascape, sand, ocean, waves"; "portrait, girl, woman, lady, blonde, pretty, gorgeous, expression, model"; "night, notte, traffic, light, lights, parking, darkness, lowlight, nacht, glow"; "fall, autumn, trees, leaves, foliage, forest, woods, branches, path".

Multimodal DBM
• A Gaussian model over dense, real-valued image features and a Replicated Softmax model over word counts, joined by shared higher layers.
(Srivastava & Salakhutdinov, NIPS 2012; JMLR 2014)

Text Generated from Images
• Given an image, generate tags. Examples: "dog, cat, pet, kitten, puppy, ginger, tongue, kitty, dogs, furry"; "insect, butterfly, insects, bug, butterflies, lepidoptera"; "sea, france, boat, mer, beach, river, bretagne, plage, brittany"; "graffiti, streetart, stencil, sticker, urbanart, graff, sanfrancisco"; "portrait, child, kid, ritratto, kids, children, boy, cute, boys, italy"; "canada, nature, sunrise, ontario, fog, mist, bc, morning".

Generating Text from Images
• Samples drawn after every 50 steps of Gibbs updates. Further examples: "portrait, women, army, soldier, mother, postcard, soldiers"; "obama, barackobama, election, politics, president, hope, change, sanfrancisco, convention, rally"; "water, glass, beer, bottle, drink, wine, bubbles, splash, drops, drop".

Images from Text
• Given tags such as "water, red, sunset", "nature, flower, red, green", "blue, green, yellow, colors", or "chocolate, cake", retrieve matching images.

MIR-Flickr Dataset
• 1 million images along with user-assigned tags, e.g. "nikon, abigfave, goldstaraward, nikond80, d80, anawesomeshot, theperfectphotographer, damniwishidtakenthat, spiritofphotography"; "food, cupcake, vegan"; "sculpture, beauty, stone"; "nikon, green, light, photoshop, apple, d70, abstract, lines, graphic"; "white, yellow, sky, geotagged, flash"; "bus, reflection, cielo, bilbao, reflejo". (Huiskes et al.)

Results
• Logistic regression on the top-level representation; multimodal inputs; mean average precision.
  Random: MAP 0.124, Precision@50 0.124
  LDA [Huiskes et al.] (labeled 25K examples): MAP 0.492, Precision@50 0.754
  SVM [Huiskes et al.] (labeled 25K examples): MAP 0.475, Precision@50 0.758
  DBM-Labelled (labeled 25K examples): MAP 0.526, Precision@50 0.791
  Deep Belief Net (+ 1 million unlabelled): MAP 0.638, Precision@50 0.867
  Autoencoder (+ 1 million unlabelled): MAP 0.638, Precision@50 0.875
  DBM (+ 1 million unlabelled): MAP 0.641, Precision@50 0.873

Helmholtz Machines
• Generative process (top-down) and approximate inference (bottom-up) through layers h3, h2, h1 to the input data v.
• Hinton, Dayan, Frey, and Neal, Science 1995
• Kingma & Welling, 2014
• Rezende, Mohamed, Wierstra, 2014
• Mnih & Gregor, 2014
• Bornschein & Bengio, 2015
• Tang & Salakhutdinov, 2013

Helmholtz Machines vs. DBMs
• A Helmholtz machine pairs a directed, top-down generative process with a separate bottom-up approximate-inference network; a Deep Boltzmann Machine uses undirected connections throughout.

Variational Autoencoders (VAEs)
• The VAE defines a generative process in terms of ancestral sampling through a cascade of hidden stochastic layers:
  p_θ(x) = Σ over h1, …, hL of p_θ(hL) p_θ(hL−1 | hL) ⋯ p_θ(x | h1)
• Each term may denote a complicated nonlinear relationship.
• θ denotes the parameters of the VAE; L is the number of stochastic layers.
• Sampling and probability evaluation is tractable for each conditional.
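A rough sketch of what ancestral sampling through such a cascade looks like. The Gaussian hidden layers, the single tanh layer per conditional, and the `theta` dictionary layout are assumptions made only for illustration, not the tutorial's exact parameterization.

```python
import numpy as np

def sample_vae_generative(theta, L=3, h_dim=20, x_dim=784, rng=np.random):
    """Ancestral sampling through a cascade of stochastic layers.
    theta holds keys "W1".."W{L-1}", "b1".., plus "W0", "b0" for p(x | h1)."""
    h = rng.randn(h_dim)                          # top layer: h_L ~ N(0, I)
    for l in range(L - 1, 0, -1):
        W, b = theta[f"W{l}"], theta[f"b{l}"]     # parameters of p(h_{l-1} | h_l)
        mu = np.tanh(h @ W) + b                   # mean from a small nonlinear net
        h = mu + rng.randn(h_dim)                 # sample h_{l-1} with unit variance
    logits = h @ theta["W0"] + theta["b0"]        # parameters of p(x | h_1)
    probs = 1.0 / (1.0 + np.exp(-logits))
    return (rng.rand(x_dim) < probs).astype(float)
```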
VAE: Example
• A generative process with two stochastic layers connected by a deterministic layer; each conditional term denotes a one-layer neural net.
• Sampling and probability evaluation is tractable for each conditional.

Variational Bound
• The VAE is trained to maximize the variational lower bound:
  L(x) = E_{q(h|x)} [ log p_θ(x, h) − log q(h | x) ] ≤ log p_θ(x)
• Trading off the data log-likelihood and the KL divergence from the true posterior.
• Hard to optimize the variational bound with respect to the recognition network (high variance).
• The key idea of Kingma and Welling is to use the reparameterization trick.

Reparameterization Trick
• Assume that the recognition distribution is Gaussian, q(h | x) = N(h; μ(x), σ²(x)), with mean and covariance computed from the state of the hidden units at the previous layer.
• Alternatively, we can express this in terms of an auxiliary variable: ε ∼ N(0, I), h = μ(x) + σ(x) · ε.
• The recognition distribution can thus be expressed as a deterministic mapping of x and ε; the distribution of ε does not depend on the model parameters.

Computing the Gradients
• The gradient w.r.t. the parameters, both recognition and generative, can be computed by backprop: for fixed ε, the mapping h = μ(x) + σ(x) · ε is a deterministic neural net.

Importance Weighted Autoencoders
• Can improve the VAE by using the following k-sample importance weighting of the log-likelihood:
  L_k(x) = E [ log (1/k) Σ_{i=1}^{k} w_i ], with unnormalized importance weights w_i = p_θ(x, h_i) / q(h_i | x), where the h_i are sampled from the recognition network.
(Burda, Grosse, Salakhutdinov, 2015)
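A single-sample sketch of the reparameterized variational bound. Here `enc` and `dec` are hypothetical callables returning the Gaussian recognition parameters and Bernoulli decoder means; in practice the gradient with respect to both networks is obtained by backpropagating through this computation (e.g. with an autodiff framework), precisely because h is a deterministic function of (x, ε).

```python
import numpy as np

def elbo_single_sample(x, enc, dec, rng=np.random):
    """Monte-Carlo estimate of the variational lower bound for one datapoint.
    enc(x) -> (mu, log_sigma) of the Gaussian q(h | x); dec(h) -> Bernoulli means of p(x | h)."""
    mu, log_sigma = enc(x)
    eps = rng.randn(*mu.shape)                     # auxiliary noise, independent of the parameters
    h = mu + np.exp(log_sigma) * eps               # reparameterized sample from q(h | x)
    x_mean = dec(h)
    # log p(x | h) for binary x under a Bernoulli decoder.
    log_px_h = np.sum(x * np.log(x_mean + 1e-9) + (1 - x) * np.log(1 - x_mean + 1e-9))
    # Analytic KL( N(mu, sigma^2) || N(0, I) ).
    kl = 0.5 * np.sum(np.exp(2.0 * log_sigma) + mu ** 2 - 1.0 - 2.0 * log_sigma)
    return log_px_h - kl                           # single-sample estimate of the lower bound
```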
Generating Images from Captions
• Generative model: a stochastic recurrent network, a chained sequence of variational autoencoders with a single stochastic layer.
• Recognition model: a deterministic recurrent network.
(Gregor et al. 2015; Mansimov, Parisotto, Ba, Salakhutdinov, 2015)

Motivating Example
• Can we generate images from natural language descriptions? "A stop sign is flying in blue skies." "A pale yellow school bus is flying in blue skies." "A herd of elephants is flying in blue skies." "A large commercial airplane is flying in blue skies."
(Mansimov, Parisotto, Ba, Salakhutdinov, 2015)

Flipping Colors
• "A yellow / red / green / blue school bus parked in the parking lot."
(Mansimov, Parisotto, Ba, Salakhutdinov, 2015)

Novel Scene Compositions
• "A toilet seat sits open in the bathroom" vs. "A toilet seat sits open in the grass field." Ask Google?

(Some) Open Problems
• Reasoning, Attention, and Memory
• Natural Language Understanding
• Deep Reinforcement Learning
• Unsupervised Learning / Transfer Learning / One-Shot Learning

Who-Did-What Dataset
• Document: "…arrested Illinois governor Rod Blagojevich and his chief of staff John Harris on corruption charges … included Blagojevich allegedly conspiring to sell or trade the senate seat left vacant by President-elect Barack Obama…"
• Query: "President-elect Barack Obama said Tuesday he was not aware of alleged corruption by X who was arrested on charges of trying to sell Obama's senate seat."
• Answer: Rod Blagojevich
(Onishi, Wang, Bansal, Gimpel, McAllester, EMNLP, 2016)

Recurrent Neural Network
• The hidden state at time step t is a nonlinearity applied to the hidden state at the previous time step and the input at time step t (h1, h2, h3 computed from inputs x1, x2, x3).

Gated Attention Mechanism
• Use Recurrent Neural Networks (RNNs) to encode a document and a query.
• Use element-wise multiplication to model the interactions between document and query.
(Dhingra, Liu, Yang, Cohen, Salakhutdinov, ACL 2017)

Multi-hop Architecture
• Reasoning requires several passes over the context.
(Dhingra, Liu, Yang, Cohen, Salakhutdinov, ACL 2017)

Analysis of Attention
• Context: "…arrested Illinois governor Rod Blagojevich and his chief of staff John Harris on corruption charges … included Blagojevich allegedly conspiring to sell or trade the senate seat left vacant by President-elect Barack Obama…"
• Query: "President-elect Barack Obama said Tuesday he was not aware of alleged corruption by X who was arrested on charges of trying to sell Obama's senate seat."
• Answer: Rod Blagojevich
[Figure: attention distributions at Layer 1 and Layer 2.]
Code + Data: https://github.com/bdhingra/ga-reader

Incorporating Prior Knowledge
• "Mary got the football. She went to the kitchen. She left the ball there." An RNN augmented with coreference and hypernymy/hyponymy links: Memory as Acyclic Graph Encoding (MAGE)-RNN.
(Dhingra, Yang, Cohen, Salakhutdinov 2017)

Incorporating Prior Knowledge
• Text: "Her plain face broke into a huge smile when she saw Terry. 'Terry!' she called out. She rushed to meet him and they embraced. 'Hon, I want you to meet an old friend, Owen McKenna. Owen, please meet Emily.' She gave me a quick nod and turned back to X."
• A recurrent neural network builds the text representation, augmented with coreference (CoreNLP dependency parses), entity relations (Freebase), and word relations (WordNet).

Neural Story Telling
• Sample from the generative model (recurrent neural network): "She was in love with him for the first time in months, so she had no intention of escaping. The sun had risen from the ocean, making her feel more alive than normal. She is beautiful, but the truth is that I do not know what to do. The sun was just starting to fade away, leaving people scattered around the Atlantic Ocean."
(Kiros et al., NIPS 2015)

(Some) Open Problems
• Reasoning, Attention, and Memory
• Natural Language Understanding
• Deep Reinforcement Learning
• Unsupervised Learning / Transfer Learning / One-Shot Learning

Learning Behaviors
• Learning to map sequences of observations to actions, for a particular goal.

Reinforcement Learning with Memory
• A learned agent with an external memory maps observations/states and rewards to actions.
• Example: learning a 3-D game without memory (Chaplot, Lample, AAAI 2017).
(Differentiable Neural Computer, Graves et al., Nature, 2016; Neural Turing Machine, Graves et al., 2014)

Deep RL with Memory
• A learned agent with a structured memory.
(Parisotto, Salakhutdinov, 2017)

Random Maze with Indicator
• Indicator: either blue or pink.
  Ø If blue, find the green block.
  Ø If pink, find the red block.
• Negative reward if the agent does not find the correct block in N steps or goes to the wrong block.
• The agent writes to memory (M_t → M_{t+1}) and reads from it with attention.
(Parisotto, Salakhutdinov, 2017)

Building Intelligent Agents
• A learned agent with an external memory and a knowledge base, mapping observations/states and rewards to actions.
• Learning from fewer examples, fewer experiences.

Summary
• Efficient learning algorithms for Deep Unsupervised Models.
• Applications: text and image retrieval / image tagging (e.g. "mosque, tower, building, cathedral, dome, castle"), learning a category hierarchy, object recognition and detection, speech recognition (HMM decoder), and multimodal data (e.g. "sunset, pacific ocean, beach, seashore").
• Deep models improve the current state-of-the-art in many application domains:
  Ø Object recognition and detection, text and image retrieval, handwritten character and speech recognition, and others.

Thank you