Lecture 10: Recurrent Neural Networks
Fei-Fei Li & Justin Johnson & Serena Yeung, May 4, 2017

Administrative
A1 grades will go out soon. A2 is due today (11:59pm). The midterm is in-class on Tuesday; we will send out details on where to go soon.

Extra Credit: Train Game
More details on Piazza by early next week.

Last Time: CNN Architectures
AlexNet. [Figure: AlexNet architecture. Copyright Kaiming He, 2016. Reproduced with permission.]
VGG16, VGG19, and GoogLeNet. [Figure: layer-by-layer diagrams of VGG16, VGG19, and GoogLeNet. Copyright Kaiming He, 2016. Reproduced with permission.]
ResNet. [Figure: the full ResNet stack and the residual block, which adds an identity shortcut so the block computes F(x) + x from two 3x3 conv layers. Copyright Kaiming He, 2016. Reproduced with permission.]
DenseNet and FractalNet. [Figure: DenseNet dense blocks, in which each layer's output is concatenated with the outputs of all earlier layers in the block, and the FractalNet topology. Figures copyright Larsson et al., 2017. Reproduced with permission.]

AlexNet and VGG have tons of parameters in the fully connected layers. AlexNet has ~62M parameters in total, most of them in the FC layers:
- FC6: 256x6x6 -> 4096: 38M params
- FC7: 4096 -> 4096: 17M params
- FC8: 4096 -> 1000: 4M params
That is ~59M params in the FC layers alone!
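Those counts are easy to verify: the weight count of a fully connected layer is just inputs times outputs. A minimal sketch in plain Python (bias terms ignored; the helper name is ours, not from the lecture):

```python
# Quick check of the slide's FC parameter counts (biases ignored;
# they add only a few thousand parameters). fc_params is our own helper.
def fc_params(n_in, n_out):
    return n_in * n_out

fc6 = fc_params(256 * 6 * 6, 4096)   # 9216 * 4096, about 37.7M
fc7 = fc_params(4096, 4096)          # about 16.8M
fc8 = fc_params(4096, 1000)          # about 4.1M
total = fc6 + fc7 + fc8              # about 58.6M, i.e. the "~59M params in FC layers"
print(f"FC6 {fc6/1e6:.1f}M  FC7 {fc7/1e6:.1f}M  FC8 {fc8/1e6:.1f}M  total {total/1e6:.1f}M")
```

FC6 dominates because its input is the flattened 256x6x6 conv feature map.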
Today: Recurrent Neural Networks

"Vanilla" Neural Networks
So far we have seen "vanilla" feed-forward networks: one fixed-size input goes in, one fixed-size output comes out (one to one).

Recurrent Neural Networks: Process Sequences
RNNs let us handle other input/output shapes:
- one to many, e.g. image captioning: image -> sequence of words
- many to one, e.g. sentiment classification: sequence of words -> sentiment
- many to many, e.g. machine translation: sequence of words -> sequence of words
- many to many (one output per input), e.g. video classification on the frame level

Sequential Processing of Non-Sequence Data
RNNs are also useful when the data is not inherently sequential:
- Classify images by taking a series of "glimpses". Ba, Mnih, and Kavukcuoglu, "Multiple Object Recognition with Visual Attention", ICLR 2015; Gregor et al., "DRAW: A Recurrent Neural Network For Image Generation", ICML 2015.
- Generate images one piece at a time! Gregor et al., "DRAW: A Recurrent Neural Network For Image Generation", ICML 2015. Figure copyright Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, and Daan Wierstra, 2015. Reproduced with permission.

Recurrent Neural Network
An RNN maintains an internal state that is updated as each input arrives, and we usually want it to predict an output vector y at some (or every) time step. We can process a sequence of vectors x by applying a recurrence formula at every time step:

    h_t = f_W(h_{t-1}, x_t)

where h_t is the new state, h_{t-1} is the old state, x_t is the input vector at time step t, and f_W is some function with parameters W. Notice: the same function and the same set of parameters are used at every time step.

(Vanilla) Recurrent Neural Network
In the simplest case the state consists of a single "hidden" vector h, and the recurrence is

    h_t = tanh(W_hh h_{t-1} + W_xh x_t)
    y_t = W_hy h_t
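As a minimal sketch of this recurrence, here is one vanilla RNN step in NumPy. The weight names W_xh, W_hh, W_hy mirror the formulas above; the toy shapes and random initialization are ours, not the course's released code:

```python
import numpy as np

def rnn_step(h_prev, x, W_hh, W_xh, W_hy):
    """One vanilla RNN time step: update the hidden state, then read out an output."""
    h = np.tanh(W_hh @ h_prev + W_xh @ x)      # h_t = tanh(W_hh h_{t-1} + W_xh x_t)
    y = W_hy @ h                               # y_t = W_hy h_t
    return h, y

# Toy usage: the same weights are re-used at every time step.
hidden_size, input_size, output_size = 8, 4, 3
rng = np.random.default_rng(0)
W_hh = rng.standard_normal((hidden_size, hidden_size)) * 0.01
W_xh = rng.standard_normal((hidden_size, input_size)) * 0.01
W_hy = rng.standard_normal((output_size, hidden_size)) * 0.01

h = np.zeros(hidden_size)                           # h_0
for x_t in rng.standard_normal((5, input_size)):    # a length-5 input sequence
    h, y_t = rnn_step(h, x_t, W_hh, W_xh, W_hy)
```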
RNN: Computational Graph
Unrolling the recurrence in time gives an ordinary computational graph: starting from an initial state h_0, f_W maps (h_0, x_1) to h_1, then (h_1, x_2) to h_2, and so on up to h_T. The same weight matrix W is re-used at every time step, so the gradient on W is the sum of the gradients flowing in from every step.

RNN: Computational Graph: Many to Many
Every hidden state h_t produces an output y_t; each y_t has its own loss L_t, and the total loss L is the sum of the per-step losses L_1, ..., L_T.

RNN: Computational Graph: Many to One
Only the final hidden state h_T produces the output y, so the entire input sequence is summarized in that last state.

RNN: Computational Graph: One to Many
A single input x is fed in at the first step, and the unrolled network produces an output y_t at every subsequent step.

Sequence to Sequence: Many-to-one + One-to-many
Many to one: an encoder RNN (weights W_1) reads the input sequence x_1, ..., x_T and encodes it in a single vector, its final hidden state. One to many: a decoder RNN (weights W_2) then produces the output sequence y_1, y_2, ... from that single vector.

Example: Character-level Language Model
Vocabulary: [h, e, l, o]. Example training sequence: "hello". Each input character is encoded as a one-hot vector; at every time step the network outputs a score for each character in the vocabulary, and the training target is the next character in the sequence.

Example: Character-level Language Model Sampling
At test time, sample characters one at a time and feed each sampled character back into the model as the next input (a minimal code sketch of this loop appears at the end of this section). [Figure: starting from "h", the softmax over [h, e, l, o] is sampled to get "e", which is fed back in, and "l", "l", "o" are produced the same way.]

Backpropagation through time
Training runs forward through the entire sequence to compute the loss, then backward through the entire sequence to compute the gradient. For long sequences this is slow and memory-hungry.

Truncated Backpropagation through time
Instead, run forward and backward through chunks of the sequence rather than the whole sequence: carry hidden states forward in time forever, but only backpropagate for some smaller number of steps.
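A minimal, self-contained sketch of that sampling loop in NumPy. The weights here are random and untrained (so the samples will be gibberish); the point is only the mechanism of sampling a character and feeding it back in as the next one-hot input:

```python
import numpy as np

vocab = ['h', 'e', 'l', 'o']
char_to_ix = {ch: i for i, ch in enumerate(vocab)}
V, H = len(vocab), 16                        # vocab size, hidden size
rng = np.random.default_rng(0)
W_xh = rng.standard_normal((H, V)) * 0.01    # untrained weights, for illustration only
W_hh = rng.standard_normal((H, H)) * 0.01
W_hy = rng.standard_normal((V, H)) * 0.01

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sample(seed_char, n_chars):
    """Sample n_chars characters, feeding each sample back in as the next input."""
    h = np.zeros(H)
    x = np.zeros(V); x[char_to_ix[seed_char]] = 1.0   # one-hot seed character
    out = []
    for _ in range(n_chars):
        h = np.tanh(W_hh @ h + W_xh @ x)              # same vanilla recurrence as above
        p = softmax(W_hy @ h)                         # distribution over the vocabulary
        ix = rng.choice(V, p=p)                       # sample the next character
        out.append(vocab[ix])
        x = np.zeros(V); x[ix] = 1.0                  # feed the sample back in
    return ''.join(out)

print(sample('h', 10))
```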
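And a minimal sketch of truncated backpropagation through time for the same vanilla character-level RNN, in the style of Karpathy's min-char-rnn: the hidden state hprev is carried forward across chunks, but each backward pass stops at the chunk boundary. The names, shapes, and toy corpus are ours, not the lecture's code:

```python
import numpy as np

def chunk_loss_and_grads(inputs, targets, hprev, Wxh, Whh, Why, bh, by):
    """Forward one chunk of the sequence, then backpropagate only within that
    chunk; gradients are truncated at the chunk boundary."""
    xs, hs, ps = {}, {-1: np.copy(hprev)}, {}
    loss = 0.0
    for t in range(len(inputs)):                                   # forward through the chunk
        xs[t] = np.zeros((Wxh.shape[1], 1)); xs[t][inputs[t]] = 1  # one-hot input character
        hs[t] = np.tanh(Wxh @ xs[t] + Whh @ hs[t - 1] + bh)        # hidden state update
        y = Why @ hs[t] + by
        ps[t] = np.exp(y) / np.sum(np.exp(y))                      # softmax over the vocabulary
        loss += -np.log(ps[t][targets[t], 0])                      # cross-entropy loss
    dWxh, dWhh, dWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
    dbh, dby = np.zeros_like(bh), np.zeros_like(by)
    dhnext = np.zeros_like(hprev)
    for t in reversed(range(len(inputs))):                         # backward through the chunk only
        dy = np.copy(ps[t]); dy[targets[t]] -= 1                   # grad of softmax + cross-entropy
        dWhy += dy @ hs[t].T; dby += dy
        dh = Why.T @ dy + dhnext
        dhraw = (1 - hs[t] ** 2) * dh                              # backprop through tanh
        dbh += dhraw; dWxh += dhraw @ xs[t].T; dWhh += dhraw @ hs[t - 1].T
        dhnext = Whh.T @ dhraw
    return loss, (dWxh, dWhh, dWhy, dbh, dby), hs[len(inputs) - 1]

# Toy corpus and training loop: carry the hidden state forward across chunks,
# but never backpropagate past a chunk boundary.
data = "hello " * 500
chars = sorted(set(data)); char_to_ix = {c: i for i, c in enumerate(chars)}
V, H, seq_length, lr = len(chars), 64, 25, 0.1
rng = np.random.default_rng(0)
Wxh = rng.standard_normal((H, V)) * 0.01
Whh = rng.standard_normal((H, H)) * 0.01
Why = rng.standard_normal((V, H)) * 0.01
bh, by = np.zeros((H, 1)), np.zeros((V, 1))

hprev = np.zeros((H, 1))                                           # carried across all chunks
for p in range(0, len(data) - seq_length - 1, seq_length):
    inputs = [char_to_ix[c] for c in data[p:p + seq_length]]
    targets = [char_to_ix[c] for c in data[p + 1:p + seq_length + 1]]
    loss, grads, hprev = chunk_loss_and_grads(inputs, targets, hprev, Wxh, Whh, Why, bh, by)
    for param, grad in zip((Wxh, Whh, Why, bh, by), grads):
        param -= lr * np.clip(grad, -5, 5)                         # plain SGD with gradient clipping
```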