Lecture 9: Recurrent Neural Networks
Deep Learning @ UvA
Previous Lecture
o Word and Language Representations
o From n-grams to Neural Networks
o Word2vec
o Skip-gram
Lecture Overview
o Recurrent Neural Networks (RNN) for sequences
o Backpropagation Through Time
o RNNs using Long Short-Term Memory (LSTM)
o Applications of Recurrent Neural Networks
Recurrent Neural Networks

[Figure: a neural network cell with input, output, and recurrent state connections]

Sequential vs. static data
o Abusing the terminology:
  ◦ static data come in a mini-batch of size $D$
  ◦ dynamic data come in a mini-batch of size 1
What about inputs that appear in sequences, such as text? Could a neural network handle such modalities?
Modelling sequences is …
o ... roughly equivalent to predicting what is going to happen next
$\Pr(x) = \prod_i \Pr(x_i \mid x_1, \dots, x_{i-1})$
o Easy to generalize to sequences of arbitrary length
o Considering small chunks $x_i$ → fewer parameters, easier modelling
o It is often convenient to pick a “frame” $T$:
$\Pr(x) = \prod_i \Pr(x_i \mid x_{i-T}, \dots, x_{i-1})$
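For instance, for the three-word sequence “I am Bond” with frame $T = 2$ (a worked example using the deck's own toy sentence): $\Pr(\text{I am Bond}) = \Pr(\text{I}) \cdot \Pr(\text{am} \mid \text{I}) \cdot \Pr(\text{Bond} \mid \text{I am})$, i.e. every factor conditions on at most the $T$ previous words.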
Word representations
o One-hot vector
  ◦ After the one-hot vector, add an embedding
o Or, instead of a one-hot vector, directly use a word representation
  ◦ Word2vec
  ◦ GloVe

Vocabulary: I, am, Bond, James, tired, “,”, McGuire, “!”
One-hot vectors for the words “I am Bond James”:
I     → [1 0 0 0 0 0 0 0]
am    → [0 1 0 0 0 0 0 0]
Bond  → [0 0 1 0 0 0 0 0]
James → [0 0 0 1 0 0 0 0]
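As an illustration, a minimal Python sketch of one-hot encoding over this toy vocabulary (the words and their ordering are taken from the slide's example):

```python
import numpy as np

# Toy vocabulary from the slide's example
vocab = ["I", "am", "Bond", "James", "tired", ",", "McGuire", "!"]
word_to_idx = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Return the one-hot vector for `word` over the toy vocabulary."""
    v = np.zeros(len(vocab))
    v[word_to_idx[word]] = 1.0
    return v

for w in ["I", "am", "Bond", "James"]:
    print(f"{w:6s}", one_hot(w))
```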
What is a sequence, really?
o Data inside a sequence are non-i.i.d.
  ◦ i.e., not independent and identically distributed
o The next “word” depends on the previous “words”
  ◦ Ideally on all of them
o We need context, and we need memory!
o How do we model context and memory?

Example: “I am Bond , James ___”. Candidate continuations: McGuire, Bond, tired, am, “!”. The context makes “Bond” the right prediction.
Modelling-wise, what is memory?
o A representation (a variable) of the past
o How do we adapt a simple Neural Network to include a memory variable?

[Figure: a network unrolled over time steps $t, t{+}1, t{+}2, t{+}3$: inputs $x_t$ enter through input parameters $U$, the memory $c_t$ is carried forward through memory parameters $W$, and outputs $y_t$ are produced through output parameters $V$]
Recurrent Neural Network (RNN)
[Figure: left, the unrolled/unfolded network: inputs $x_t, x_{t+1}, x_{t+2}$ feed memories $c_t, c_{t+1}, c_{t+2}$ through $U$, consecutive memories are linked by $W$, and each memory emits an output $y$ through $V$; right, the equivalent compact recurrent network, where $W$ is a self-loop from $c_{t-1}$ to $c_t$]
Why unrolled?
o Imagine we care only for 3-grams
o Are the two networks below that different?
  ◦ Steps instead of layers
  ◦ Step parameters are the same (shared parameters within the NN)
o Sometimes the intermediate outputs are not even needed
  ◦ Removing them, we almost end up with a standard Neural Network

[Figure: a 3-gram unrolled recurrent network (“Layer/Step” 1-3 sharing $U$, $W$, $V$, with outputs $y_1, y_2, y_3$) next to a 3-layer neural network (Layers 1-3, each with its own $W$), both ending in a final output]
RNN equations
o At time step $t$ (the input $x_t$ is a one-hot vector; see the code sketch below):
$c_t = \tanh(U x_t + W c_{t-1})$
$y_t = \mathrm{softmax}(V c_t)$
o Example
  ◦ Vocabulary of 500 words
  ◦ An input projection of 50 dimensions ($U$: [50 × 500])
  ◦ A memory of 128 units ($c_t$: [128 × 1], $W$: [128 × 128])
  ◦ An output projection of 500 dimensions ($V$: [500 × 128])
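These equations translate almost line for line into code. A minimal sketch of one forward step, assuming illustrative dimensions and, for brevity, folding the slide's 50-dimensional input projection directly into $U$:

```python
import numpy as np

def softmax(z):
    z = z - z.max()                    # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

K, H = 500, 128                        # vocabulary size, memory units
rng = np.random.default_rng(0)
U = rng.normal(0, 0.01, (H, K))        # input parameters
W = rng.normal(0, 0.01, (H, H))        # memory parameters
V = rng.normal(0, 0.01, (K, H))        # output parameters

def rnn_step(x_t, c_prev):
    """c_t = tanh(U x_t + W c_{t-1}),  y_t = softmax(V c_t)."""
    c_t = np.tanh(U @ x_t + W @ c_prev)
    y_t = softmax(V @ c_t)
    return c_t, y_t

x = np.zeros(K); x[42] = 1.0           # a one-hot input word
c, y = rnn_step(x, np.zeros(H))        # y is a distribution over the 500 words
```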
Loss function
o Cross-entropy loss
$P = \prod_{t,k} y_{tk}^{l_{tk}} \;\Rightarrow\; \mathcal{L} = -\frac{1}{T}\log P = -\frac{1}{T}\sum_t \sum_k l_{tk} \log y_{tk}$
  ◦ For non-target words $l_{tk} = 0$ ⇒ compute the loss only for the target words
  ◦ What loss would a random prediction give? ⇒ $\mathcal{L} = \log K$ for a uniform guess over a vocabulary of $K$ words
o Usually the loss is reported as perplexity (see the sketch below):
$J = 2^{\mathcal{L}}$
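A small sketch of how this could be computed, assuming base-2 logarithms so that the perplexity is exactly $J = 2^{\mathcal{L}}$ (toy values, not the lecture's code):

```python
import numpy as np

def cross_entropy(y, targets):
    """Mean negative log2-probability assigned to the target words."""
    T = len(targets)
    return -sum(np.log2(y[t][targets[t]]) for t in range(T)) / T

def perplexity(y, targets):
    return 2.0 ** cross_entropy(y, targets)

# Two time steps over a 3-word vocabulary; targets are word indices.
y = [np.array([0.7, 0.2, 0.1]), np.array([0.1, 0.8, 0.1])]
print(perplexity(y, targets=[0, 1]))   # confident and correct: low perplexity
```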
Training an RNN
o Backpropagation Through Time (BPTT)

[Figure: the network unrolled over three steps, with inputs $x_1, x_2, x_3$, shared parameters $U$, $W$, $V$, and outputs $y_1, y_2, y_3$]
Backpropagation Through Time
o We need $\frac{\partial \mathcal{L}}{\partial V}$, $\frac{\partial \mathcal{L}}{\partial W}$, $\frac{\partial \mathcal{L}}{\partial U}$
o To make it simpler, let's focus on step 3: $\frac{\partial \mathcal{L}_3}{\partial V}$, $\frac{\partial \mathcal{L}_3}{\partial W}$, $\frac{\partial \mathcal{L}_3}{\partial U}$

Recap: $c_t = \tanh(U x_t + W c_{t-1})$, $y_t = \mathrm{softmax}(V c_t)$, $\mathcal{L} = -\sum_t l_t \log y_t = \sum_t \mathcal{L}_t$
Backpropagation Through Time
$\frac{\partial \mathcal{L}_3}{\partial V} = \frac{\partial \mathcal{L}_3}{\partial y_3} \frac{\partial y_3}{\partial V} = (y_3 - l_3) \times c_3$
Backpropagation Through Time
o $\frac{\partial \mathcal{L}_3}{\partial W} = \frac{\partial \mathcal{L}_3}{\partial y_3} \frac{\partial y_3}{\partial c_3} \frac{\partial c_3}{\partial W}$
o What is the relation between $c_3$ and $W$?
  ◦ Two-fold: in $c_t = \tanh(U x_t + W c_{t-1})$, $W$ appears both directly and indirectly, through $c_{t-1}$
o So we need the total derivative: $\frac{\partial f(\varphi(x), \psi(x))}{\partial x} = \frac{\partial f}{\partial \varphi}\frac{\partial \varphi}{\partial x} + \frac{\partial f}{\partial \psi}\frac{\partial \psi}{\partial x}$
o Hence $\frac{\partial c_3}{\partial W} \propto c_2 + \frac{\partial c_2}{\partial W}$ (taking $\frac{\partial W}{\partial W} = 1$)
Recursively
o $\frac{\partial c_3}{\partial W} = c_2 + \frac{\partial c_2}{\partial W}$
o $\frac{\partial c_2}{\partial W} = c_1 + \frac{\partial c_1}{\partial W}$
o $\frac{\partial c_1}{\partial W} = c_0 + \frac{\partial c_0}{\partial W}$
o Putting it together (implemented in the sketch below):
$\frac{\partial c_3}{\partial W} = \sum_{t=1}^{3} \frac{\partial c_3}{\partial c_t} \frac{\partial c_t}{\partial W} \;\Rightarrow\; \frac{\partial \mathcal{L}_3}{\partial W} = \sum_{t=1}^{3} \frac{\partial \mathcal{L}_3}{\partial y_3} \frac{\partial y_3}{\partial c_3} \frac{\partial c_3}{\partial c_t} \frac{\partial c_t}{\partial W}$
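Putting the recursion to work, a hedged numpy sketch of BPTT for the RNN above: a forward pass that stores the memories, then a backward pass that accumulates exactly these sums over time steps (illustrative dimensions, no 1/T normalization; not the lecture's reference implementation):

```python
import numpy as np

K, H = 500, 128
rng = np.random.default_rng(0)
U, W, V = (rng.normal(0, 0.01, s) for s in [(H, K), (H, H), (K, H)])

def bptt(inputs, targets):
    """Forward over the sequence, then accumulate dL/dU, dL/dW, dL/dV."""
    c, y, loss = {-1: np.zeros(H)}, {}, 0.0
    for t, i in enumerate(inputs):                  # forward pass
        x = np.zeros(K); x[i] = 1.0
        c[t] = np.tanh(U @ x + W @ c[t - 1])
        e = np.exp(V @ c[t]); y[t] = e / e.sum()
        loss -= np.log(y[t][targets[t]])
    dU, dW, dV = np.zeros_like(U), np.zeros_like(W), np.zeros_like(V)
    dc_next = np.zeros(H)
    for t in reversed(range(len(inputs))):          # backward pass
        dy = y[t].copy(); dy[targets[t]] -= 1.0     # softmax + cross-entropy
        dV += np.outer(dy, c[t])
        dc = V.T @ dy + dc_next                     # from output and from t+1
        dz = (1.0 - c[t] ** 2) * dc                 # through tanh
        x = np.zeros(K); x[inputs[t]] = 1.0
        dU += np.outer(dz, x)
        dW += np.outer(dz, c[t - 1])
        dc_next = W.T @ dz                          # hand gradient to step t-1
    return loss, dU, dW, dV

loss, dU, dW, dV = bptt(inputs=[3, 14, 15], targets=[14, 15, 9])
```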
Are RNNs perfect?
o NO
  ◦ Although in theory, yes!
o Vanishing gradients
  ◦ After a few time steps the gradients become almost 0
o Exploding gradients
  ◦ After a few time steps the gradients become huge
o So RNNs can't really capture long-term dependencies
Exploding gradients
o $\frac{\partial \mathcal{L}_r}{\partial W} = \sum_{t=1}^{r} \frac{\partial \mathcal{L}_r}{\partial y_r} \frac{\partial y_r}{\partial c_r} \frac{\partial c_r}{\partial c_t} \frac{\partial c_t}{\partial W}$
o $\frac{\partial c_r}{\partial c_t}$ can be decomposed further with the chain rule:
$\frac{\partial c_r}{\partial c_t} = \frac{\partial c_r}{\partial c_{r-1}} \cdot \frac{\partial c_{r-1}}{\partial c_{r-2}} \cdot \ldots \cdot \frac{\partial c_{t+1}}{\partial c_t}$
  ◦ factors with $t \ll r$ → long-term factors; the rest → short-term factors
o When there are many long-term factors, for many of which $\frac{\partial c_i}{\partial c_{i-1}} \gg 1$,
  ◦ then $\frac{\partial c_r}{\partial c_t} \gg 1$
  ◦ then $\frac{\partial \mathcal{L}_r}{\partial W} \gg 1$ ⇒ the gradient explodes
Remedy for exploding gradients
o Rescale the gradient when its norm exceeds a threshold $\theta_0$
o Step 1: $g \leftarrow \frac{\partial \mathcal{L}}{\partial \theta}$
o Step 2: is $\lVert g \rVert > \theta_0$?
  ◦ Step 3a: if yes, $g \leftarrow \frac{\theta_0}{\lVert g \rVert}\, g$
  ◦ Step 3b: if no, do nothing
o Simple, but it works! (see the sketch below)
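In code the whole remedy is a few lines; the threshold value below is an assumption for illustration:

```python
import numpy as np

def clip_gradient(g, theta0=5.0):
    """Rescale g to norm theta0 whenever its norm exceeds theta0."""
    norm = np.linalg.norm(g)
    if norm > theta0:
        g = (theta0 / norm) * g     # direction is kept, magnitude is capped
    return g
```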
Vanishing gradients
o $\frac{\partial \mathcal{L}_r}{\partial W} = \sum_{t=1}^{r} \frac{\partial \mathcal{L}_r}{\partial y_r} \frac{\partial y_r}{\partial c_r} \frac{\partial c_r}{\partial c_t} \frac{\partial c_t}{\partial W}$
o $\frac{\partial c_r}{\partial c_t}$ decomposes the same way:
$\frac{\partial c_r}{\partial c_t} = \frac{\partial c_r}{\partial c_{r-1}} \cdot \frac{\partial c_{r-1}}{\partial c_{r-2}} \cdot \ldots \cdot \frac{\partial c_{t+1}}{\partial c_t}$
  ◦ again, factors with $t \ll r$ → long-term factors; the rest → short-term factors
o When many $\frac{\partial c_i}{\partial c_{i-1}} \to 0$ (e.g. with sigmoids),
  ◦ then $\frac{\partial c_r}{\partial c_t} \to 0$
  ◦ then $\frac{\partial \mathcal{L}_r}{\partial W} \to 0$ ⇒ the gradient vanishes
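A toy numeric demonstration of both failure modes, treating each $\partial c_i / \partial c_{i-1}$ as a scalar factor (assumed values):

```python
import numpy as np

# The gradient through r - t steps behaves like a product of per-step factors.
for factor in (0.5, 1.5):
    print(factor, factor ** np.arange(0, 51, 10))
# 0.5 collapses towards 0 after ~50 steps (vanishing);
# 1.5 grows to ~6e8 (exploding).
```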
Advanced RNNs

[Figure: an LSTM cell: the input $x_t$ and the previous output $m_{t-1}$ feed three sigmoid gates $f_t$, $i_t$, $o_t$ and a tanh candidate memory $\tilde{c}_t$; the cell state runs from $c_{t-1}$ to $c_t$ along an additive path, and the new output $m_t$ leaves through the output gate]

A more realistic memory unit needs what?
[Figure: a memory unit that takes the previous memory, the previous NN state, and the current input, and produces the new memory and NN state]
Long Short-Term Memory (LSTM): a beefed-up RNN
$i = \sigma(x_t U^{(i)} + m_{t-1} W^{(i)})$
$f = \sigma(x_t U^{(f)} + m_{t-1} W^{(f)})$
$o = \sigma(x_t U^{(o)} + m_{t-1} W^{(o)})$
$\tilde{c}_t = \tanh(x_t U^{(g)} + m_{t-1} W^{(g)})$
$c_t = c_{t-1} \odot f + \tilde{c}_t \odot i$
$m_t = \tanh(c_t) \odot o$

[Figure: the LSTM cell annotated with these quantities: gates $f_t$, $i_t$, $o_t$, candidate $\tilde{c}_t$, cell-state path $c_{t-1} \to c_t$, input $x_t$, and outputs $m_{t-1} \to m_t$]
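A minimal numpy sketch of one LSTM step following these equations (column-vector convention, biases omitted, dimensions illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

D, H = 50, 128                                     # input and memory sizes
rng = np.random.default_rng(0)
Ui, Uf, Uo, Ug = (rng.normal(0, 0.01, (H, D)) for _ in range(4))
Wi, Wf, Wo, Wg = (rng.normal(0, 0.01, (H, H)) for _ in range(4))

def lstm_step(x_t, c_prev, m_prev):
    i = sigmoid(Ui @ x_t + Wi @ m_prev)            # input gate
    f = sigmoid(Uf @ x_t + Wf @ m_prev)            # forget gate
    o = sigmoid(Uo @ x_t + Wo @ m_prev)            # output gate
    c_tilde = np.tanh(Ug @ x_t + Wg @ m_prev)      # candidate memory
    c_t = c_prev * f + c_tilde * i                 # new cell state
    m_t = np.tanh(c_t) * o                         # new output / NN state
    return c_t, m_t

c, m = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H))
```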
LSTM Step-by-Step: Step (1)
o E.g., model the sentence “Yesterday she slapped me. Today she loves me.”
o Decide what to forget and what to remember for the new memory (the forget gate $f_t$)
  ◦ Sigmoid ≈ 1 → remember everything
  ◦ Sigmoid ≈ 0 → forget everything
LSTM Step-by-Step: Step (2)
o Decide what new information to add to the new memory
  ◦ Modulate the input (the input gate $i_t$)
  ◦ Generate candidate memories ($\tilde{c}_t$)
LSTM Step-by-Step: Step (3)
o Compute and update the current cell state $c_t = c_{t-1} \odot f + \tilde{c}_t \odot i$, which depends on
  ◦ the previous cell state
  ◦ what we decided to forget
  ◦ what inputs we allowed
  ◦ the candidate memories
LSTM Step-by-Step: Step (4)
o Modulate the output (the output gate $o_t$)
  ◦ Does the cell state contain something relevant? → Sigmoid ≈ 1
o Generate the new memory $m_t = \tanh(c_t) \odot o$
LSTM Unrolled Network
o Macroscopically very similar to standard RNNs
o The engine is a bit different (more complicated)
  ◦ Because of their gates, LSTMs capture both long- and short-term dependencies

[Figure: three LSTM cells chained in time, each with the σ, σ, tanh, σ gate layout and the additive (× and +) cell-state path running across the top]
Even more beef to the RNNs
o LSTM with peephole connections
  ◦ The gates also have access to the previous cell state $c_{t-1}$ (not only the memories)
  ◦ Coupled forget and input gates: $c_t = f_t \odot c_{t-1} + (1 - f_t) \odot \tilde{c}_t$
  ◦ Bi-directional recurrent networks
o GRU (sketched below)
  ◦ A bit simpler than the LSTM
  ◦ Performance similar to the LSTM
o Deep LSTM
  ◦ Stack LSTM layers, e.g. LSTM (2) running on top of LSTM (1) at every time step
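To see why the GRU counts as simpler, a hedged sketch of one GRU step in its standard formulation (not from the slides; biases omitted): two gates and a single state instead of three gates and two states:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

D, H = 50, 128
rng = np.random.default_rng(0)
Uz, Ur, Uh = (rng.normal(0, 0.01, (H, D)) for _ in range(3))
Wz, Wr, Wh = (rng.normal(0, 0.01, (H, H)) for _ in range(3))

def gru_step(x_t, h_prev):
    z = sigmoid(Uz @ x_t + Wz @ h_prev)              # update gate
    r = sigmoid(Ur @ x_t + Wr @ h_prev)              # reset gate
    h_tilde = np.tanh(Uh @ x_t + Wh @ (r * h_prev))  # candidate state
    return (1.0 - z) * h_prev + z * h_tilde          # blend old and new state
```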
Applications of Recurrent Networks
Text generation
o Generate text like Nietzsche, Shakespeare or Wikipedia
o Generate Linux-kernel-like C++ code
o Or even generate a new website
$\Pr(x) = \prod_i \Pr(x_i \mid x_{\text{past}}; \theta)$, where $\theta$ are the LSTM parameters

[Figure: training: the LSTM reads Shakespeare (“To be or …”) and predicts the next tokens (“Two are or …”); inference: predictions are fed back in to generate new text (“O Romeo …”)]
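A hedged sketch of the inference loop: sample a token, feed it back in, repeat. The `model_step` interface is an assumption for illustration, not an API from the lecture:

```python
import numpy as np

def generate(model_step, vocab, seed_idx, length, state):
    """Autoregressive sampling: model_step(idx, state) -> (probs, state)."""
    idx, out = seed_idx, []
    for _ in range(length):
        probs, state = model_step(idx, state)
        idx = np.random.choice(len(vocab), p=probs)  # sample the next token
        out.append(vocab[idx])
    return "".join(out)
```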
Handwriting generation
o Problem 1: model real coordinates instead of one-hot vectors
o Recurrent Mixture Density Networks
o Have the outputs follow a Gaussian distribution
  ◦ The output needs to be suitably squashed
o We don't just fit a Gaussian to the data
  ◦ We also condition on the previous outputs
$\Pr(o_t) = \sum_i w_i(x_{1:T})\, \mathcal{N}\!\left(o_t \mid \mu_i(x_{1:T}), \Sigma_i(x_{1:T})\right)$
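A toy sketch of drawing one output from such a mixture; the weights, means and covariances below are placeholder values that would normally be produced by the network from $x_{1:T}$:

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.array([0.6, 0.4])                        # mixture weights, sum to 1
mu = np.array([[0.1, 0.0], [-0.2, 0.3]])        # component means (2-D pen offsets)
sigma = np.array([np.eye(2) * 0.01] * 2)        # component covariances

i = rng.choice(len(w), p=w)                     # pick a mixture component
o_t = rng.multivariate_normal(mu[i], sigma[i])  # sample the next pen offset
```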
Machine Translation
o The phrase in the source language is one sequence
  ◦ “Today the weather is very good”
o The phrase in the target language is also a sequence
  ◦ “Vandaag de weer is heel goed”
o Problems
  ◦ There is no perfect word alignment, and sentence lengths might differ
o Solution
  ◦ An encoder-decoder scheme (see the sketch after the figure)

[Figure: a chain of LSTMs encodes “Today the weather is good”, then a decoder chain of LSTMs emits “Vandaag de weer is heel goed”]
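A schematic sketch of that scheme; `encode_step` and `decode_step` are assumed interfaces for illustration, not code from the lecture:

```python
import numpy as np

def translate(encode_step, decode_step, source_ids, sos_id, eos_id, state):
    """Encoder-decoder: compress the source into a state, then unroll from it."""
    for x in source_ids:                   # encoder: read the whole source
        state = encode_step(x, state)
    out, y = [], sos_id
    while y != eos_id and len(out) < 100:  # decoder: emit until end-of-sentence
        probs, state = decode_step(y, state)
        y = int(np.argmax(probs))          # greedy decoding for simplicity
        out.append(y)
    return out
```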
Better Machine Translation
o It might even pay off to reverse the source sentence
  ◦ The first target words will then be closer to their respective source words
o The encoder and decoder parts can be modelled with different LSTMs
o Or with Deep LSTMs
[Figure: the same encoder-decoder, now reading the reversed source “good is weather the Today” and producing “Vandaag de weer is heel goed”]
Image captioning
o An image is a thousand words, literally!
o Pretty much the same as machine translation
o Replace the encoder part with the output of a ConvNet
  ◦ E.g. use an AlexNet or a VGG16 network
o Keep the decoder part operating as a translator

[Figure: a ConvNet encodes the image, and the LSTM decoder generates the caption “Today the weather is good”]
Question answering
o Bleeding-edge research, no real consensus
  ◦ Very interesting open research problems
o Again, pretty much like machine translation
o Encoder-decoder scheme
  ◦ Insert the question into the encoder part
  ◦ Model the answer at the decoder part
o You can also have question answering with images
  ◦ Again, bleeding-edge research
  ◦ How/where to add the image?
  ◦ What has been working so far is to add the image only at the beginning

Example (text) Q: “John entered the living room, where he met Mary. She was drinking some wine and watching a movie. What room did John enter?” A: “John entered the living room.”
Example (visual) Q: “What are the people playing?” A: “They play beach football.”
Summary
o Recurrent Neural Networks (RNN) for sequences
o Backpropagation Through Time
o RNNs using Long Short-Term Memory (LSTM)
o Applications of Recurrent Neural Networks
Next lecture
o Memory networks
o Recursive networks