
Russ Salakhutdinov

Machine Learning Department, Carnegie Mellon University
Canadian Institute for Advanced Research

Tutorial Roadmap

Part 1: Supervised (Discriminative) Learning: Deep Networks

Part 2: Unsupervised Learning: Deep Generative Models

Part 3: Open Research Questions

Unsupervised Learning

Non-probabilistic Models
Ø Sparse Coding
Ø Autoencoders
Ø Others (e.g. k-means)

Probabilistic (Generative) Models
Ø Tractable Models: Fully observed Belief Nets, NADE, PixelRNN
Ø Non-Tractable Models: Boltzmann Machines, Variational Autoencoders, Helmholtz Machines, many others…
Ø Implicit Density: Generative Adversarial Networks, Moment Matching Networks

(Explicit Density p(x) vs. Implicit Density)

Tutorial Roadmap

• Basic Building Blocks: Ø Sparse Coding Ø Autoencoders

• Deep Generative Models Ø Restricted Boltzmann Machines Ø Deep Boltzmann Machines Ø Helmholtz Machines / Variational Autoencoders

• Generative Adversarial Networks

Sparse Coding

• Sparse coding (Olshausen & Field, 1996). Originally developed to explain early visual processing in the brain (edge detection).

• Objective: Given a set of input data vectors {x_n}, learn a dictionary of bases {φ_k} such that each x_n can be written as Σ_k a_{nk} φ_k, with coefficients a_{nk} that are:

Sparse: mostly zeros

• Each data vector is represented as a sparse linear combination of bases.

Sparse Coding: Natural Images

Learned bases: "Edges"

New example

x = 0.8 × (basis) + 0.3 × (basis) + 0.5 × (basis)

[0, 0, …, 0.8, …, 0.3, …, 0.5, …] = coefficients (feature representation)

Slide Credit: Honglak Lee

Sparse Coding: Training

• Input image patches: x^(1), …, x^(m)
• Learn a dictionary of bases: φ_1, …, φ_K

min over {φ_k}, {a^(n)}:  Σ_n || x^(n) − Σ_k a_k^(n) φ_k ||²  +  λ Σ_n Σ_k |a_k^(n)|
(Reconstruction error + Sparsity penalty)

• Alternating Optimization:
1. Fix the dictionary of bases {φ_k} and solve for the activations a (a standard Lasso problem).
2. Fix the activations a and optimize the dictionary of bases (a convex QP problem).

Sparse Coding: Testing Time

• Input: a new image patch x*, and the K learned bases.
• Output: sparse representation a of the image patch x*.
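To make the alternating scheme concrete, here is a small NumPy sketch; it is only an illustration under toy assumptions: the Lasso step is approximated with a few ISTA (proximal-gradient) iterations and the dictionary step with a re-normalized least-squares update rather than the constrained QP mentioned above, and all sizes, data, and hyperparameters are made up.

```python
import numpy as np

def sparse_coding(X, K=64, lam=0.1, n_outer=20, n_ista=50):
    """Toy alternating optimization for sparse coding.
    X: (N, D) data matrix; returns dictionary Phi (K, D) and codes A (N, K)."""
    rng = np.random.default_rng(0)
    N, D = X.shape
    Phi = rng.normal(size=(K, D))
    Phi /= np.linalg.norm(Phi, axis=1, keepdims=True)         # unit-norm bases
    A = np.zeros((N, K))
    for _ in range(n_outer):
        # Step 1: fix Phi, solve the Lasso for A with ISTA (proximal gradient).
        L = np.linalg.norm(Phi @ Phi.T, 2)                     # Lipschitz constant
        for _ in range(n_ista):
            grad = (A @ Phi - X) @ Phi.T
            A = A - grad / L
            A = np.sign(A) * np.maximum(np.abs(A) - lam / L, 0.0)  # soft-threshold
        # Step 2: fix A, update Phi by least squares, then re-normalize
        # (a stand-in for the constrained QP on the slide).
        Phi = np.linalg.lstsq(A, X, rcond=None)[0]
        Phi /= np.linalg.norm(Phi, axis=1, keepdims=True) + 1e-8
    return Phi, A

# Example: 200 random 8x8 "patches".
X = np.random.default_rng(1).normal(size=(200, 64))
Phi, A = sparse_coding(X)
print(A.shape, (np.abs(A) > 1e-6).mean())   # codes are mostly zeros
```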

x* = 0.8 × (basis) + 0.3 × (basis) + 0.5 × (basis)

[0, 0, …, 0.8, …, 0.3, …, 0.5, …] = coefficients (feature representation)

Image Classification

Evaluated on the Caltech101 object category dataset (9K images, 101 classes).

Pipeline: Input Image → Features (coefficients over the learned bases) → Classification (SVM)

Algorithm: Accuracy
Baseline (Fei-Fei et al., 2004): 16%
PCA: 37%
Sparse Coding: 47%

Slide Credit: Honglak Lee. Lee, Battle, Raina, Ng, 2006

Interpreting Sparse Coding

[Diagram: input x → implicit, nonlinear encoding a = f(x) → sparse features a → explicit, linear decoding x' = g(a).]

• Sparse, over-complete representation a.
• Encoding a = f(x) is an implicit and nonlinear function of x.
• Reconstruction (or decoding) x' = g(a) is linear and explicit.

Feature Representation

Encoder: feed-forward, bottom-up path. Decoder: feed-back, generative, top-down path.

Input Image

• Details of what goes inside the encoder and decoder matter!
• Need constraints to avoid learning an identity mapping.

Autoencoder

Binary Features z

Encoder filters W: z = σ(Wx). Decoder filters D (linear path): reconstruction Dz.

Input Image x

Autoencoder

Binary Features z

• An autoencoder with D inputs, D outputs, and K hidden units, with K < D.

• Given an input x, its reconstruction is given by Dz = D σ(Wx).

Input Image x

Autoencoder


• We can determine the network parameters W and D by minimizing the reconstruction error: E(W, D) = Σ_n || x^(n) − D σ(W x^(n)) ||².
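As an illustration of minimizing this reconstruction error, here is a minimal NumPy sketch of a one-hidden-layer autoencoder (sigmoid encoder, linear decoder) trained by plain gradient descent; the data, layer sizes, and learning rate are arbitrary toy choices, not anything used in the tutorial.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def train_autoencoder(X, K=32, lr=0.01, epochs=200, seed=0):
    """Minimize sum_n ||x_n - D sigmoid(W x_n)||^2 by gradient descent.
    X: (N, D_in) data; returns encoder W (K, D_in) and decoder D (D_in, K)."""
    rng = np.random.default_rng(seed)
    N, D_in = X.shape
    W = rng.normal(scale=0.1, size=(K, D_in))
    D = rng.normal(scale=0.1, size=(D_in, K))
    for _ in range(epochs):
        Z = sigmoid(X @ W.T)                 # encoder: z = sigmoid(W x)
        X_hat = Z @ D.T                      # linear decoder: x_hat = D z
        err = X_hat - X                      # reconstruction error
        grad_D = err.T @ Z / N
        grad_Z = err @ D                     # backprop through the decoder
        grad_W = (grad_Z * Z * (1 - Z)).T @ X / N   # through the sigmoid
        D -= lr * grad_D
        W -= lr * grad_W
    return W, D

X = np.random.default_rng(1).random((500, 64))   # toy "image patches"
W, D = train_autoencoder(X)
Z = sigmoid(X @ W.T)
print("reconstruction MSE:", np.mean((Z @ D.T - X) ** 2))
```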

Autoencoder

Linear Features z

• If the hidden and output layers are linear, the autoencoder will learn hidden units that are a linear function of the data and minimize the squared reconstruction error (z = Wx).
• The K hidden units will span the same space as the first K principal components, although the weight vectors may not be orthogonal.

Input Image x

• With nonlinear hidden units, we have a nonlinear generalization of PCA.

Another Autoencoder Model

Binary Features z

Encoder filters W: z = σ(Wx). Decoder (sigmoid path): reconstruction σ(Wᵀz).

Binary Input x

• Need additional constraints to avoid learning an identity mapping.
• Relates to Restricted Boltzmann Machines (later).

Predictive Sparse Decomposition

Binary Features z

L1 sparsity on z. Encoder filters W: z = σ(Wx). Decoder filters D (linear path): reconstruction Dz.

Real-valued Input x

At training time, both the encoder and the decoder paths are used.

Kavukcuoglu, Ranzato, Fergus, LeCun, 2009

Stacked Autoencoders

[Diagram: a stack of autoencoders. Input x → Encoder (with Decoder and Sparsity) → Features → Encoder (with Decoder and Sparsity) → Features → Encoder (with Decoder) → Class Labels.]

Stacked Autoencoders

[The same stack, trained with Greedy Layer-wise Learning.]

Stacked Autoencoders

Class Labels

• Remove the decoders and use the feed-forward (encoder) part.
• Standard or convolutional neural network.
• Parameters can be fine-tuned using backpropagation.

Input x

Deep Autoencoders

[Figure: a 2000-1000-500-30 deep autoencoder. Each layer is pretrained as an RBM (weights W1…W4), the stack is unrolled into an encoder-decoder network with a 30-dimensional code layer, and the whole network is then fine-tuned: Pretraining → Unrolling → Finetuning.]
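A compact NumPy sketch of the greedy layer-wise recipe just outlined; as a simplification it pretrains each layer as a shallow tied-weight autoencoder rather than an RBM (which is what the figure above uses), and every size and hyperparameter is illustrative only.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def pretrain_layer(X, K, lr=0.05, epochs=100, rng=None):
    """Train one sigmoid autoencoder layer on X and return its encoder weights."""
    rng = rng or np.random.default_rng(0)
    N, D = X.shape
    W = rng.normal(scale=0.1, size=(K, D))     # encoder (and tied decoder) weights
    for _ in range(epochs):
        Z = sigmoid(X @ W.T)
        X_hat = Z @ W                           # tied linear decoder
        err = X_hat - X
        grad_W = (Z.T @ err + ((err @ W.T) * Z * (1 - Z)).T @ X) / N
        W -= lr * grad_W
    return W

# Greedy layer-wise pretraining: each layer is trained on the features
# produced by the layer below (autoencoders here; the slide uses RBMs).
rng = np.random.default_rng(1)
X = rng.random((500, 64))                       # toy input "images"
sizes = [32, 16, 8]                             # hidden layer widths
weights, H = [], X
for K in sizes:
    W = pretrain_layer(H, K, rng=rng)
    weights.append(W)
    H = sigmoid(H @ W.T)                        # features fed to the next layer

# "Unrolling": encode with the stack, decode with the transposed weights.
def encode(X):
    H = X
    for W in weights:
        H = sigmoid(H @ W.T)
    return H

def decode(H):
    for i, W in enumerate(reversed(weights)):
        H = H @ W                               # mirror path back toward the input
        if i < len(weights) - 1:
            H = sigmoid(H)
    return H

print("code shape:", encode(X).shape)
print("pretrained reconstruction MSE:", np.mean((decode(encode(X)) - X) ** 2))
# Fine-tuning would now adjust all of `weights` jointly by backpropagating
# the reconstruction (or classification) error through encode/decode.
```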

Deep Autoencoders

• A 25x25 – 2000 – 1000 – 500 – 30 autoencoder to extract 30-D real-valued codes for Olivetti face patches.

• Top: Random samples from the test dataset.
• Middle: Reconstructions by the 30-dimensional deep autoencoder.
• Bottom: Reconstructions by 30-dimensional PCA.

Information Retrieval

[Figure: documents from the Reuters corpus mapped into a 2-D LSA space, with clusters labeled European Community Monetary/Economic, Interbank Markets, Energy Markets, Disasters and Accidents, Leading Economic Indicators, Legal/Judicial, Government Borrowings, Accounts/Earnings.]

• The Reuters Corpus Volume II contains 804,414 newswire stories (randomly split into 402,207 training and 402,207 test).
• "Bag-of-words" representation: each article is represented as a vector containing the counts of the 2000 most frequently used words.

(Hinton and Salakhutdinov, Science 2006)

Tutorial Roadmap

• Basic Building Blocks: Ø Sparse Coding Ø Autoencoders

• Deep Generative Models Ø Restricted Boltzmann Machines Ø Deep Boltzmann Machines Ø Helmholtz Machines / Variational Autoencoders

• Generative Adversarial Networks

Fully Observed Models

Maximum likelihood:
θ* = arg max_θ E_{x ~ p_data} log p_model(x | θ)

Fully-visible belief net:
• Explicitly model conditional probabilities:
p_model(x) = p_model(x_1) ∏_{i=2}^{n} p_model(x_i | x_1, …, x_{i-1})

Each conditional can be a complicated neural network.

• A number of successful models, including:
Ø NADE, RNADE (Larochelle et al., 2011)
Ø Pixel CNN (van den Oord et al., 2016)
Ø Pixel RNN (van den Oord et al., 2016)
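A tiny NumPy sketch of the chain-rule factorization above, using hypothetical logistic conditionals with a strictly lower-triangular weight matrix so that x_i depends only on x_1, …, x_{i-1} (loosely NADE-flavoured; the parameters here are random placeholders, not a trained model).

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def log_likelihood(x, W, b):
    """Chain-rule factorization for a binary vector x:
    log p(x) = sum_i log p(x_i | x_1, ..., x_{i-1}),
    with each conditional a logistic function of the preceding dimensions."""
    n = x.shape[0]
    ll = 0.0
    for i in range(n):
        p_i = sigmoid(b[i] + W[i, :i] @ x[:i])   # p(x_i = 1 | x_<i)
        ll += x[i] * np.log(p_i) + (1 - x[i]) * np.log(1 - p_i)
    return ll

rng = np.random.default_rng(0)
n = 10
W = np.tril(rng.normal(size=(n, n)), k=-1)       # strictly lower-triangular
b = rng.normal(size=n)
x = (rng.random(n) < 0.5).astype(float)
print("log p(x) =", log_likelihood(x, W, b))
```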

Pixel CNN

Restricted Boltzmann Machines

Feature Detectors

Graphical Models: a powerful framework for representing dependency structure between random variables.

[Figure: a bipartite graph with hidden variables on top and visible variables (image pixels) below, with pair-wise potentials on the edges and unary potentials on the nodes.]

RBM is a Markov Random Field with:
• Stochastic binary visible variables
• Stochastic binary hidden variables
• Bipartite connections.

Markov random fields, Boltzmann machines, log-linear models.

Learning Features

Observed Data: subset of 25,000 characters. Learned W: "edges" (subset of 1000 features).

[Figure: a new image decomposed as a sparse combination of the learned feature detectors (sparse representations).]

Logistic function: suitable for modeling binary images.
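For concreteness, here is a minimal NumPy sketch of training such a binary RBM with one-step contrastive divergence (CD-1); the learning algorithm is not spelled out on these slides, and the data and hyperparameters below are toy placeholders.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def train_rbm_cd1(V, n_hidden=64, lr=0.05, epochs=50, seed=0):
    """Train a binary RBM on binary data V (N, D) with 1-step contrastive divergence."""
    rng = np.random.default_rng(seed)
    N, D = V.shape
    W = rng.normal(scale=0.01, size=(D, n_hidden))
    a = np.zeros(D)            # visible biases
    b = np.zeros(n_hidden)     # hidden biases
    for _ in range(epochs):
        # Positive phase: P(h = 1 | v) for the data.
        ph_data = sigmoid(V @ W + b)
        h_sample = (rng.random(ph_data.shape) < ph_data).astype(float)
        # Negative phase: one step of Gibbs sampling v -> h -> v' -> h'.
        pv_recon = sigmoid(h_sample @ W.T + a)
        v_recon = (rng.random(pv_recon.shape) < pv_recon).astype(float)
        ph_recon = sigmoid(v_recon @ W + b)
        # Approximate gradient of the log-likelihood.
        W += lr * (V.T @ ph_data - v_recon.T @ ph_recon) / N
        a += lr * (V - v_recon).mean(axis=0)
        b += lr * (ph_data - ph_recon).mean(axis=0)
    return W, a, b

# Toy binary "images".
V = (np.random.default_rng(1).random((500, 100)) < 0.3).astype(float)
W, a, b = train_rbm_cd1(V)
print("hidden unit probabilities for the first image:", sigmoid(V[:1] @ W + b).round(2))
```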

RBMs for Real-valued & Count Data

Learned features (out of 10,000) from 4 million unlabelled images.

Learned features: ``topics''
• russian, russia, moscow, yeltsin, soviet
• clinton, house, president, bill, congress
• computer, system, product, software, develop
• trade, country, import, world, economy
• stock, wall, street, point, dow

Reuters dataset: 804,414 unlabeled newswire stories, Bag-of-Words representation.

Collaborative Filtering

Binary hidden: user preferences h. Multinomial visible: user ratings v.

Netflix dataset: 480,189 users, 17,770 movies, over 100 million ratings.

Learned features: ``genre''
• Fahrenheit 9/11, Bowling for Columbine, The People vs. Larry Flynt, Canadian Bacon, La Dolce Vita
• Independence Day, The Day After Tomorrow, Con Air, Men in Black II, Men in Black
• Friday the 13th, The Texas Chainsaw Massacre, Children of the Corn, Child's Play, The Return of Michael Myers
• Scary Movie, Naked Gun, Hot Shots!, American Pie, Police Academy

State-of-the-art performance on the Netflix dataset.

(Salakhutdinov, Mnih, Hinton, ICML 2007)

Different Data Modalities

• Binary / Gaussian / Softmax RBMs: all have binary hidden variables but use them to model different kinds of data.

[Examples of visible units: binary (e.g. 0 0 0 1 0), real-valued, and 1-of-K.]

• It is easy to infer the states of the hidden variables given the visible data.

Product of Experts

The joint distribution is defined through an energy function over the visible and hidden variables.

Marginalizing over the hidden variables yields a Product of Experts:
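The formulas referred to above appear as images on the original slides; for reference, the standard binary-RBM form (with visible biases a, hidden biases b, and weights W) is:

```latex
E(\mathbf{v},\mathbf{h}) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i W_{ij} h_j,
\qquad
P(\mathbf{v},\mathbf{h}) = \frac{1}{Z}\exp\big(-E(\mathbf{v},\mathbf{h})\big)

% Inference of the hidden units factorizes:
P(h_j = 1 \mid \mathbf{v}) = \sigma\Big(b_j + \sum_i W_{ij} v_i\Big)

% Marginalizing over h gives a Product of Experts:
P(\mathbf{v}) = \frac{1}{Z}\, e^{\sum_i a_i v_i} \prod_j \Big(1 + e^{\,b_j + \sum_i W_{ij} v_i}\Big)
```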

Learned topics (top words per topic):
• government, authority, power, empire, federation
• clinton, house, president, bill, congress
• bribery, corruption, dishonesty, corrupt, fraud
• mafia, business, gang, mob, insider
• stock, wall, street, point, dow
…

Topics "government", "corruption" and "mafia" can combine to give very high probability to the word "Silvio Berlusconi".

[Figure: precision-recall curves for document retrieval, comparing the 50-D Replicated Softmax model with 50-D LDA.]

Deep Boltzmann Machines

Low-level features: Edges

Built from unlabeled inputs.

Input: Pixels (Image)

(Salakhutdinov 2008; Salakhutdinov & Hinton 2012)

Deep Boltzmann Machines

Learn simpler representations, then compose more complex ones.

Higher-level features: Combinations of edges

Low-level features: Edges

Built from unlabeled inputs.

Input: Pixels (Image)

(Salakhutdinov 2008; Salakhutdinov & Hinton 2009)

Model Formulation

• Same as RBMs: the model parameters are the weight matrices W1, W2, W3.
• Dependencies between hidden variables.
• All connections are undirected.
• Bottom-up and top-down: each hidden layer receives input from both the layer below and the layer above.
• Hidden variables are dependent even when conditioned on the input.

Approximate Learning

(Approximate) Maximum Likelihood:
• Both expectations (the data-dependent expectation and the model's expectation) are intractable!
• The posterior over the hidden variables is not factorial any more.
• Use a variational approximation for the data-dependent expectation and a stochastic, MCMC-based approximation for the model's expectation.
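To illustrate the variational part of this recipe, here is a toy NumPy sketch of mean-field fixed-point updates for a two-layer DBM (the slides use three hidden layers); the weights and data are random placeholders, and the MCMC-based model expectation is only noted in a comment.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def mean_field(v, W1, W2, n_iters=20):
    """Mean-field inference q(h1, h2 | v) = prod_j q(h1_j) prod_k q(h2_k) for a 2-layer DBM.
    Each hidden unit's mean receives both bottom-up and top-down input."""
    mu1 = np.full(W1.shape[1], 0.5)          # q(h1_j = 1)
    mu2 = np.full(W2.shape[1], 0.5)          # q(h2_k = 1)
    for _ in range(n_iters):
        mu1 = sigmoid(v @ W1 + mu2 @ W2.T)   # bottom-up from v, top-down from h2
        mu2 = sigmoid(mu1 @ W2)              # bottom-up from h1
    return mu1, mu2

rng = np.random.default_rng(0)
D, H1, H2 = 20, 10, 5
W1 = rng.normal(scale=0.1, size=(D, H1))
W2 = rng.normal(scale=0.1, size=(H1, H2))
v = (rng.random(D) < 0.5).astype(float)
mu1, mu2 = mean_field(v, W1, W2)
print("q(h1=1):", mu1.round(2))
# The intractable model expectation would be handled separately,
# e.g. with persistent Gibbs chains (the stochastic MCMC-based approximation).
```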

Good Generative Model?

Handwritten Characters

[Figures: simulated samples from the model shown next to real handwritten characters, with the panel order swapped across slides.]

Handwriting Recognition

MNIST Dataset: 60,000 examples of 10 digits
Learning Algorithm: Error
• Logistic regression: 12.0%
• K-NN: 3.09%
• Neural Net (Platt 2005): 1.53%
• SVM (Decoste et al. 2002): 1.40%
• Deep Autoencoder (Bengio et al. 2007): 1.40%
• Deep Belief Net (Hinton et al. 2006): 1.20%
• DBM: 0.95%

Optical Character Recognition: 42,152 examples of 26 English letters
Learning Algorithm: Error
• Logistic regression: 22.14%
• K-NN: 18.92%
• Neural Net: 14.62%
• SVM (Larochelle et al. 2009): 9.70%
• Deep Autoencoder (Bengio et al. 2007): 10.05%
• Deep Belief Net (Larochelle et al. 2009): 9.68%
• DBM: 8.40%

Permutation-invariant version.

3-D Object Recognition

NORB Dataset: 24,000 examples
Learning Algorithm: Error
• Logistic regression: 22.5%
• K-NN (LeCun 2004): 18.92%
• SVM (Bengio & LeCun 2007): 11.6%
• Deep Belief Net (Nair & Hinton 2009): 9.0%
• DBM: 7.2%

Pattern Completion

Data – Collection of Modalities

• Multimedia content on the web: image + text + audio.
• Product recommendation systems.
• Robotics applications: motor control, touch sensors.

[Example images with tags: car, automobile; sunset, pacific ocean, baker beach, seashore, ocean.]

Vision, Audio

Challenges - I

Image vs. Text: very different input representations (e.g. sunset, pacific ocean, baker beach, seashore, ocean).
• Images – real-valued, dense
• Text – discrete, sparse

Dense vs. sparse: difficult to learn cross-modal features from low-level representations.

Challenges - II

Noisy and missing data. Example image tags:
• pentax, k10d, pentaxda50200, kangarooisland, sa, australiansealion
• mickikrimmel, mickipedia, headshot
• < no text >
• unseulpixel, naturey

Challenges - II

Text generated by the model:
• Given: pentax, k10d, pentaxda50200, kangarooisland, sa, australiansealion → Generated: beach, sea, surf, strand, shore, wave, seascape, sand, ocean, waves
• Given: mickikrimmel, mickipedia, headshot → Generated: portrait, girl, woman, lady, blonde, pretty, gorgeous, expression, model
• Given: < no text > → Generated: night, notte, traffic, light, lights, parking, darkness, lowlight, nacht, glow
• Given: unseulpixel, naturey → Generated: fall, autumn, trees, leaves, foliage, forest, woods, branches, path

Multimodal DBM

Gaussian model over dense, real-valued image features; Replicated Softmax over sparse word counts.

(Srivastava & Salakhutdinov, NIPS 2012, JMLR 2014)

Text Generated from Images

Given an image, generated tags:
• dog, cat, pet, kitten, puppy, ginger, tongue, kitty, dogs, furry
• insect, butterfly, insects, bug, butterflies, lepidoptera
• sea, france, boat, mer, beach, river, bretagne, plage, brittany
• graffiti, streetart, stencil, sticker, urbanart, graff, sanfrancisco
• portrait, child, kid, ritratto, kids, children, boy, cute, boys, italy
• canada, nature, sunrise, ontario, fog, mist, bc, morning

Generating Text from Images

Samples drawn after every 50 steps of Gibbs updates.

Text Generated from Images

Given an image, generated tags:
• portrait, women, army, soldier, mother, postcard, soldiers
• obama, barackobama, election, politics, president, hope, change, sanfrancisco, convention, rally
• water, glass, beer, bottle, drink, wine, bubbles, splash, drops, drop

Images from Text

Given tags, retrieved images:
• water, red, sunset

• nature, flower, red, green
• blue, green, yellow, colors
• chocolate, cake

MIR-Flickr Dataset

• 1 million images along with user-assigned tags.

[Figure: example MIR-Flickr images with user-assigned tags such as nikon, d80, abigfave, goldstaraward; food, cupcake, vegan; sculpture, beauty, stone; sky, geotagged, reflection. Huiskes et al.]

Results

• Logistic regression on the top-level representation.
• Multimodal Inputs.

Average Precision:
Learning Algorithm: MAP / Precision@50
• Random: 0.124 / 0.124
• LDA [Huiskes et al.] (labeled 25K examples): 0.492 / 0.754
• SVM [Huiskes et al.] (labeled 25K examples): 0.475 / 0.758
• DBM-Labelled (labeled 25K examples): 0.526 / 0.791
• Deep Belief Net (+ 1 million unlabelled): 0.638 / 0.867
• Autoencoder (+ 1 million unlabelled): 0.638 / 0.875
• DBM (+ 1 million unlabelled): 0.641 / 0.873

Helmholtz Machines

• Hinton, G. E., Dayan, P., Frey, B. J., and Neal, R., Science 1995

• Kingma & Welling, 2014 Generave Approximate Process • Inference Rezende, Mohamed, Daan, 2014 h3 W3 • Mnih & Gregor, 2014 2 h • Bornschein & Bengio, 2015 W2 • Tang & Salakhutdinov, 2013 h1 W1 v Input data Helmholtz Machines vs. DBMs

Helmholtz Machine Deep Boltzmann Machine

Generave Approximate Process Inference 3 h3 h 3 W3 W 2 h2 h 2 W2 W h1 h1 W1 W1 v v Input data Variaonal Autoencoders (VAEs) • The VAE defines a generave process in terms of ancestral sampling through a cascade of hidden stochasc layers:

Generative Process:
p_θ(x, h^1, …, h^L) = p(h^L) p_θ(h^{L-1} | h^L) ⋯ p_θ(x | h^1)

• Each term may denote a complicated nonlinear relationship.
• θ denotes the parameters of the VAE.
• L is the number of stochastic layers.
• Sampling and probability evaluation is tractable for each term.

VAE: Example

• The VAE defines a generative process in terms of ancestral sampling through a cascade of hidden stochastic layers. Each conditional term denotes a one-layer neural net, and the architecture alternates stochastic and deterministic layers.
• Sampling and probability evaluation is tractable for each term.

Variational Bound

• The VAE is trained to maximize the variational lower bound:
L(x) = E_{q(h|x)}[ log p_θ(x, h) - log q(h | x) ] = log p_θ(x) - KL( q(h | x) || p_θ(h | x) )

• Trading off the data log-likelihood and the KL divergence from the true posterior.

• It is hard to optimize the variational bound with respect to the recognition network (high-variance gradient estimates).
• The key idea of Kingma and Welling is to use the reparameterization trick.

Reparameterization Trick

• Assume that the recognition distribution is Gaussian,
q(h | x) = N(h; μ, σ²),
with mean μ and covariance σ² computed from the state of the hidden units at the previous layer.

• Alternatively, we can express this in terms of an auxiliary variable ε:
ε ~ N(0, I),  h = μ + σ ⊙ ε  (element-wise)

• The recognition distribution can be expressed in terms of a deterministic mapping:
h = g(x, ε) = μ(x) + σ(x) ⊙ ε

The mapping is deterministic, and the distribution of ε does not depend on the parameters of the encoder.

Computing the Gradients

• The gradient w.r.t. the parameters of both the recognition and the generative networks can be computed by backprop, since for fixed ε the mapping h = g(x, ε) is a deterministic neural net.
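A small NumPy sketch of the reparameterization trick as a gradient estimator, using a simple stand-in objective f(h) in place of the full variational bound; the means, variances, and sample counts are arbitrary toy values.

```python
import numpy as np

# Reparameterization trick: h = mu + sigma * eps with eps ~ N(0, I),
# so gradients of an expectation over q(h|x) = N(mu, sigma^2) flow
# through mu and sigma by the ordinary chain rule.
rng = np.random.default_rng(0)
mu, log_sigma = np.array([0.5, -0.2]), np.array([0.1, 0.3])

def f(h):                 # stand-in for log p(x, h) - log q(h|x) inside the bound
    return -np.sum(h ** 2)

def grad_estimate(mu, log_sigma, n_samples=10_000):
    sigma = np.exp(log_sigma)
    eps = rng.normal(size=(n_samples, mu.size))
    h = mu + sigma * eps                      # deterministic in (mu, sigma) given eps
    # d f / d h for f(h) = -||h||^2 is -2h; chain rule gives the gradients below.
    df_dh = -2.0 * h
    grad_mu = df_dh.mean(axis=0)              # dh/dmu = 1
    grad_log_sigma = (df_dh * eps * sigma).mean(axis=0)   # dh/dlog_sigma = sigma*eps
    return grad_mu, grad_log_sigma

g_mu, g_ls = grad_estimate(mu, log_sigma)
print("grad wrt mu:", g_mu)                   # analytic answer: -2*mu
print("grad wrt log_sigma:", g_ls)            # analytic answer: -2*sigma^2
```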

Importance Weighted Autoencoders

• Can improve the VAE by using the following k-sample importance weighting of the log-likelihood:
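The k-sample bound referred to above is shown as an image on the slide; in the standard IWAE form it reads:

```latex
\mathcal{L}_k(x) \;=\; \mathbb{E}_{h_1,\dots,h_k \sim q(h \mid x)}
\left[ \log \frac{1}{k} \sum_{i=1}^{k} \frac{p_\theta(x, h_i)}{q(h_i \mid x)} \right]
\;\le\; \log p_\theta(x),
\qquad
w_i = \frac{p_\theta(x, h_i)}{q(h_i \mid x)}
```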

where the importance weights are unnormalized and the samples h_i are drawn from the recognition network.

Burda, Grosse, Salakhutdinov, 2015

Generating Images from Captions

Stochastic Layer

• Generative Model: a Stochastic Recurrent Network, a chained sequence of Variational Autoencoders with a single stochastic layer.
• Recognition Model: a Deterministic Recurrent Network.

Gregor et al., 2015; (Mansimov, Parisotto, Ba, Salakhutdinov, 2015)

Motivating Example

• Can we generate images from natural language descriptions?

"A stop sign is flying in blue skies."   "A pale yellow school bus is flying in blue skies."

"A herd of elephants is flying in blue skies."   "A large commercial airplane is flying in blue skies."

(Mansimov, Parisotto, Ba, Salakhutdinov, 2015)

Flipping Colors

"A yellow school bus parked in the parking lot."   "A red school bus parked in the parking lot."

"A green school bus parked in the parking lot."   "A blue school bus parked in the parking lot."

(Mansimov, Parisotto, Ba, Salakhutdinov, 2015)

Novel Scene Compositions

"A toilet seat sits open in the bathroom."   "A toilet seat sits open in the grass field."

Ask Google?

(Some) Open Problems

• Reasoning, Attention, and Memory

• Natural Language Understanding

• Deep Reinforcement Learning

• Unsupervised Learning / Transfer Learning / One-Shot Learning

Who-Did-What Dataset

• Document: "…arrested Illinois governor Rod Blagojevich and his chief of staff John Harris on corruption charges … included Blagojevich allegedly conspiring to sell or trade the senate seat left vacant by President-elect Barack Obama…"

• Query: "President-elect Barack Obama said Tuesday he was not aware of alleged corruption by X who was arrested on charges of trying to sell Obama's senate seat."
• Answer: Rod Blagojevich

Onishi, Wang, Bansal, Gimpel, McAllester, EMNLP, 2016

Recurrent Neural Network: the hidden state at time step t applies a nonlinearity to the hidden state at the previous time step and the input at time step t (unrolled as h1, h2, h3 over inputs x1, x2, x3).

Gated Attention Mechanism

• Use Recurrent Neural Networks (RNNs) to encode a document and a query:

Ø Use element-wise multiplication to model the interactions between the document and the query (a minimal sketch follows below):
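A minimal NumPy sketch of this gated-attention step, assuming per-token document encodings D and query encodings Q are already given (e.g. from bidirectional RNNs); the dimensions and data are toy placeholders rather than the GA Reader's actual configuration.

```python
import numpy as np

def softmax(u, axis=-1):
    u = u - u.max(axis=axis, keepdims=True)
    e = np.exp(u)
    return e / e.sum(axis=axis, keepdims=True)

def gated_attention(D, Q):
    """For each document token, attend over the query tokens and gate the
    document encoding by element-wise multiplication with the attended query.
    D: (T_doc, d) document token encodings, Q: (T_query, d) query token encodings."""
    scores = D @ Q.T                      # (T_doc, T_query) token-pair similarities
    alpha = softmax(scores, axis=1)       # attention over query tokens
    q_tilde = alpha @ Q                   # (T_doc, d) query summary per document token
    return D * q_tilde                    # element-wise gating

rng = np.random.default_rng(0)
D = rng.normal(size=(30, 16))             # e.g. encoder states for 30 document tokens
Q = rng.normal(size=(8, 16))              # e.g. encoder states for 8 query tokens
X = gated_attention(D, Q)
print(X.shape)                            # gated document representation, fed to the next hop
```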

(Dhingra, Liu, Yang, Cohen, Salakhutdinov, ACL 2017)

Multi-hop Architecture

• Reasoning requires several passes over the context.

(Dhingra, Liu, Yang, Cohen, Salakhutdinov, ACL 2017)

Analysis of Attention

• Context: "…arrested Illinois governor Rod Blagojevich and his chief of staff John Harris on corruption charges … included Blagojevich allegedly conspiring to sell or trade the senate seat left vacant by President-elect Barack Obama…"
• Query: "President-elect Barack Obama said Tuesday he was not aware of alleged corruption by X who was arrested on charges of trying to sell Obama's senate seat."
• Answer: Rod Blagojevich

[Figure: attention maps at Layer 1 and Layer 2 of the reader.]

Code + Data: https://github.com/bdhingra/ga-reader

Incorporating Prior Knowledge

Mary got the football

She went to the kitchen

She left the ball there

Edges: RNN (sequence), Coreference, Hyper/Hyponymy.

Dhingra, Yang, Cohen, Salakhutdinov, 2017

Incorporating Prior Knowledge

Mary got the football

She went to the kitchen

She left the ball there

Edges: RNN (sequence), Coreference, Hyper/Hyponymy.

Memory as Acyclic Graph Encoding (MAGE) - RNN

[Diagram: an RNN whose hidden state h_t is computed from h_{t-1}, the input x_t, and gated reads g_t from a memory M_t with one slot per entity e_1 … e_|E|, which is then updated to M_{t+1}.]

Dhingra, Yang, Cohen, Salakhutdinov, 2017

Incorporating Prior Knowledge

Example passage: "Her plain face broke into a huge smile when she saw Terry. 'Terry!' she called out. She rushed to meet him and they embraced. 'Hon, I want you to meet an old friend, Owen McKenna. Owen, please meet Emily.' She gave me a quick nod and turned back to X"

Sources of prior knowledge:
• Coreference, dependency parses: CoreNLP
• Entity relations: Freebase
• Word relations: WordNet

Recurrent Neural Network

Text Representation

Neural Story Telling

Sample from the generative model (recurrent neural network):

She was in love with him for the first time in months, so she had no intention of escaping. The sun had risen from the ocean, making her feel more alive than normal. She is beautiful, but the truth is that I do not know what to do. The sun was just starting to fade away, leaving people scattered around the Atlantic Ocean.

(Kiros et al., NIPS 2015)

(Some) Open Problems

• Reasoning, Attention, and Memory

• Natural Language Understanding

• Deep Reinforcement Learning

• Unsupervised Learning / Transfer Learning / One-Shot Learning

Learning Behaviors

Observation → Action

Learning to map sequences of observations to actions, for a particular goal.

Reinforcement Learning with Memory

[Diagram: an agent with a learned external memory maps Observation / State to Action and receives a Reward.]

Differentiable Neural Computer, Graves et al., Nature, 2016; Neural Turing Machine, Graves et al., 2014

Learning a 3-D game without memory (Chaplot, Lample, AAAI 2017).

Differentiable Neural Computer, Graves et al., Nature, 2016; Neural Turing Machine, Graves et al., 2014

Deep RL with Memory

[Diagram: an agent with a learned, structured memory maps Observation / State to Action and receives a Reward.]

Parisotto, Salakhutdinov, 2017

Random Maze with Indicator

• Indicator: Either blue or pink

Ø If blue, find the green block Ø If pink, find the red block

• Negative reward if the agent does not find the correct block within N steps or goes to the wrong block.

Parisotto, Salakhutdinov, 2017

Random Maze with Indicator

[Diagram: the structured memory M_t is written to at each step (M_t → M_{t+1}) and read with attention.]

Parisotto, Salakhutdinov, 2017

Random Maze with Indicator

Building Intelligent Agents

[Diagram: an agent with a learned external memory and a Knowledge Base maps Observation / State to Action and receives a Reward.]

Building Intelligent Agents

Learning from Fewer Examples, Fewer Experiences.

Summary

• Efficient learning for Deep Unsupervised Models

Text & image retrieval / Image Tagging Learning a Category Object recognion Hierarchy

mosque, tower, building, cathedral, dome, castle Object Detecon Speech Recognion Mulmodal Data HMM decoder

sunset, pacific ocean, beach, seashore

• Deep models improve the current state-of-the art in many applicaon domains: Ø Object recognion and detecon, text and image retrieval, handwrien character and speech recognion, and others. Thank you