Lecture 4: Neural Networks and Backpropagation
Fei-Fei Li & Justin Johnson & Serena Yeung — Lecture 4, April 11, 2019

Administrative: Assignment 1
Assignment 1 due Wednesday April 17, 11:59pm
If using Google Cloud, you don’t need GPUs for this assignment!
We will distribute Google Cloud coupons by this weekend
Administrative: Alternate Midterm Time
If you need to request an alternate midterm time:
See Piazza for form, fill it out by 4/25 (two weeks from today)
Administrative: Project Proposal
Project proposal due 4/24
Administrative: Discussion Section
Discussion section tomorrow (1:20pm in Gates B03):
How to pick a project / How to read a paper
Where we are...
Linear score function
SVM loss (or softmax)
data loss + regularization
How to find the best W?
Finding the best W: Optimize with Gradient Descent
Landscape image is CC0 1.0 public domain Walking man image is CC0 1.0 public domain
Gradient descent
Numerical gradient: slow :(, approximate :(, easy to write :)
Analytic gradient: fast :), exact :), error-prone :(
In practice: Derive analytic gradient, check your implementation with numerical gradient
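Such a gradient check can be sketched in a few lines of numpy (a minimal sketch; the `num_grad` helper and the toy function are illustrative, not code from the lecture):

```python
import numpy as np

def num_grad(f, x, h=1e-5):
    """Centered-difference numerical gradient of a scalar function f at x."""
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    for _ in it:
        i = it.multi_index
        old = x[i]
        x[i] = old + h; fp = f(x)   # f(x + h)
        x[i] = old - h; fm = f(x)   # f(x - h)
        x[i] = old                  # restore
        grad[i] = (fp - fm) / (2 * h)
    return grad

# Check a hand-derived analytic gradient against the numerical one
f = lambda x: np.sum(x ** 2)        # toy "loss"
x = np.random.randn(3, 4)
analytic = 2 * x                    # derived on paper
assert np.allclose(analytic, num_grad(f, x), atol=1e-4)
```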
Problem: Linear Classifiers are not very powerful
Visual viewpoint: linear classifiers learn only one template per class.
Geometric viewpoint: linear classifiers can only draw linear decision boundaries.
One Solution: Feature Transformation
f(x, y) = (r(x, y), θ(x, y))
Transform data with a cleverly chosen feature transform f, then apply linear classifier
Color Histogram Histogram of Oriented Gradients (HoG)
Image features vs ConvNets
[Figure: a feature extractor f producing features, followed by a trained classifier giving 10 numbers as scores for classes, vs. a ConvNet trained end-to-end to produce the 10 class scores]
Krizhevsky, Sutskever, and Hinton, “Imagenet classification with deep convolutional neural networks”, NIPS 2012. Figure copyright Krizhevsky, Sutskever, and Hinton, 2012. Reproduced with permission.
Today: Neural Networks
Neural networks: without the brain stuff

(Before) Linear score function: f = W x
(Now) 2-layer Neural Network: f = W2 max(0, W1 x), or 3-layer Neural Network: f = W3 max(0, W2 max(0, W1 x))

“Neural Network” is a very broad term; these are more accurately called “fully-connected networks” or sometimes “multi-layer perceptrons” (MLP)

(In practice we will usually add a learnable bias at each layer as well)
For the 2-layer network, the shapes are: x (3072 entries) → W1 → hidden layer h (100 entries) → W2 → scores s (10 entries).
The function max(0, ·) here is called the activation function. Q: What if we try to build a neural network without one?
A: We end up with a linear classifier again!
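A quick numerical check of this collapse (a sketch, using the 3072 → 100 → 10 shapes from above):

```python
import numpy as np

np.random.seed(0)
W1 = np.random.randn(100, 3072)   # first layer
W2 = np.random.randn(10, 100)     # second layer
x = np.random.randn(3072)

# With no activation between them, the two layers are a single linear map:
W = W2.dot(W1)                    # shape [10 x 3072]
assert np.allclose(W2.dot(W1.dot(x)), W.dot(x))
```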
Activation functions: Sigmoid, tanh, ReLU, Leaky ReLU, Maxout, ELU. ReLU is a good default choice for most problems.
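The elementwise activations can be written in a few lines of numpy (a sketch; Maxout is omitted since it takes the max over several affine maps rather than acting elementwise):

```python
import numpy as np

sigmoid    = lambda x: 1 / (1 + np.exp(-x))               # squashes to (0, 1)
tanh       = np.tanh                                      # squashes to (-1, 1)
relu       = lambda x: np.maximum(0, x)                   # good default choice
leaky_relu = lambda x, a=0.01: np.where(x > 0, x, a * x)  # small negative slope
elu        = lambda x, a=1.0: np.where(x > 0, x, a * (np.exp(x) - 1))
```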
Neural networks: Architectures
Left: a “2-layer Neural Net”, or “1-hidden-layer Neural Net”. Right: a “3-layer Neural Net”, or “2-hidden-layer Neural Net”. The layers are “fully-connected” layers.
Example feed-forward computation of a neural network
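Such a forward pass can be sketched for a 3-layer net with sigmoid activations (the layer sizes here are made up for illustration):

```python
import numpy as np

f = lambda x: 1.0 / (1.0 + np.exp(-x))   # activation function (sigmoid)
x = np.random.randn(3, 1)                # random input vector (3x1)
W1, b1 = np.random.randn(4, 3), np.random.randn(4, 1)
W2, b2 = np.random.randn(4, 4), np.random.randn(4, 1)
W3, b3 = np.random.randn(1, 4), np.random.randn(1, 1)

h1 = f(W1.dot(x) + b1)    # first hidden layer (4x1)
h2 = f(W2.dot(h1) + b2)   # second hidden layer (4x1)
out = W3.dot(h2) + b3     # output neuron (1x1)
```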
Full implementation of training a 2-layer Neural Network needs ~20 lines:
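A sketch of what those ~20 lines can look like, with a sigmoid hidden layer and an L2 loss on random toy data (all sizes and the learning rate are illustrative choices, not the lecture's exact listing):

```python
import numpy as np

np.random.seed(0)
N, D_in, H, D_out = 64, 1000, 100, 10          # batch, input, hidden, output sizes
x, y = np.random.randn(N, D_in), np.random.randn(N, D_out)
w1, w2 = np.random.randn(D_in, H), np.random.randn(H, D_out)

losses = []
for t in range(300):
    h = 1 / (1 + np.exp(-x.dot(w1)))           # forward: hidden layer (sigmoid)
    y_pred = h.dot(w2)                         # forward: predictions
    losses.append(np.square(y_pred - y).sum()) # L2 loss
    grad_y_pred = 2.0 * (y_pred - y)           # backward: loss
    grad_w2 = h.T.dot(grad_y_pred)             # backward: second layer
    grad_h = grad_y_pred.dot(w2.T)
    grad_w1 = x.T.dot(grad_h * h * (1 - h))    # backward: sigmoid + first layer
    w1 -= 1e-4 * grad_w1                       # gradient descent step
    w2 -= 1e-4 * grad_w2
```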
Setting the number of layers and their sizes
More neurons = more capacity. Do not use the size of the neural network as a regularizer. Use stronger regularization instead:
(Web demo with ConvNetJS: http://cs.stanford.edu/people/karpathy/convnetjs/demo/classify2d.html)
This image by Fotis Bobolas is licensed under CC-BY 2.0
[Figure: a biological neuron — impulses are carried toward the cell body by the dendrites, and carried away from the cell body along the axon to the presynaptic terminals]
This image by Felipe Perucho is licensed under CC-BY 3.0
In the crude analogy, the cell body sums its inputs, and the neuron's firing rate is modeled by a sigmoid activation function applied to that sum.
Biological neurons: complex connectivity patterns. Neurons in a neural network: organized into regular layers for computational efficiency.
But neural networks with random connections can work too! Xie et al, “Exploring Randomly Wired Neural Networks for Image Recognition”, arXiv 2019
This image is CC0 Public Domain
Be very careful with your brain analogies!
Biological Neurons:
● Many different types
● Dendrites can perform complex non-linear computations
● Synapses are not a single weight but a complex non-linear dynamical system
● Rate code may not be adequate
[Dendritic Computation. London and Hausser]
Problem: How to compute gradients?
Nonlinear score function
SVM loss on predictions
Regularization
Total loss: data loss + regularization
If we can compute dL/dW1 and dL/dW2, then we can learn W1 and W2
(Bad) Idea: Derive the gradients on paper
Problem: Very tedious: lots of matrix calculus, need lots of paper
Problem: What if we want to change the loss? E.g. use softmax instead of SVM? Need to re-derive from scratch =(
Problem: Not feasible for very complex models!
Better Idea: Computational graphs + Backpropagation
[Computational graph: x and W feed a multiply (*) node producing the scores s; s feeds a hinge-loss node, W also feeds a regularization node R, and the two are added (+) into the total loss L]
Convolutional network (AlexNet): a much larger computational graph from the input image, through the weights, to the loss.
Figure copyright Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, 2012. Reproduced with permission.
Neural Turing Machine: an even more complex computational graph from input image to loss.
Figure reproduced with permission from a Twitter post by Andrej Karpathy.
Backpropagation: a simple example
f(x, y, z) = (x + y) z
e.g. x = -2, y = 5, z = -4
Want: df/dx, df/dy, df/dz

Introduce an intermediate variable q = x + y, so that f = q z. Forward pass: q = 3, f = -12.
Local gradients: df/dq = z = -4, df/dz = q = 3, dq/dx = 1, dq/dy = 1.
Chain rule: df/dx = (df/dq)(dq/dx) = -4 and df/dy = (df/dq)(dq/dy) = -4; in each case the result is [upstream gradient] × [local gradient].
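This can be checked in a few lines of Python (a sketch, assuming the function f(x, y, z) = (x + y)z with the values x = -2, y = 5, z = -4):

```python
# forward pass
x, y, z = -2.0, 5.0, -4.0
q = x + y              # q = 3
f = q * z              # f = -12

# backward pass: apply the chain rule in reverse order
df_dz = q              # local gradient of f = q*z w.r.t. z  -> 3
df_dq = z              # local gradient w.r.t. q             -> -4
df_dx = df_dq * 1.0    # dq/dx = 1, so upstream * local      -> -4
df_dy = df_dq * 1.0    # dq/dy = 1                           -> -4
```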
During the backward pass, each node f in the graph knows its “local gradients”: the derivatives of its output with respect to its inputs. When the “upstream gradient” (the gradient of the loss with respect to the node's output) arrives from the output side, the node multiplies it by each local gradient to produce the “downstream gradients”, which it passes back to its inputs.
Another example: f(w, x) = 1 / (1 + e^-(w0·x0 + w1·x1 + w2))
Walking backward through the graph, each step is [upstream gradient] × [local gradient]. At an add gate: [0.2] × [1] = 0.2 (for both inputs!). At the multiply gates: x0: [0.2] × [2] = 0.4, w0: [0.2] × [-1] = -0.2.
Computational graph representation may not be unique. Choose one where local gradients at each node can be easily expressed! Here, the entire sigmoid function can be a single node.
Sigmoid local gradient: dσ(x)/dx = (1 − σ(x)) σ(x)
So at the sigmoid node: [upstream gradient] × [local gradient] = [1.00] × [(1 - 0.73) (0.73)] = 0.2
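The local gradient formula is easy to verify numerically (a sketch; x = 1 is an assumed input chosen so that the forward value is σ(1) ≈ 0.73):

```python
import numpy as np

sigmoid = lambda x: 1 / (1 + np.exp(-x))

x = 1.0
s = sigmoid(x)                 # ~0.73
analytic = (1 - s) * s         # sigmoid local gradient, ~0.2

h = 1e-6                       # centered numerical derivative
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
assert abs(analytic - numeric) < 1e-8
```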
Patterns in gradient flow
add gate: gradient distributor. Inputs 3 and 4 give output 7; an upstream gradient of 2 flows unchanged to both inputs (2 and 2).
mul gate: “swap multiplier”. Inputs 2 and 3 give output 6; an upstream gradient of 5 becomes 5*3=15 on the first input and 2*5=10 on the second — each input receives the upstream gradient times the other input.
copy gate: gradient adder. An input of 7 is copied to two outputs (7 and 7); upstream gradients 4 and 2 sum to 4+2=6 on the input.
max gate: gradient router. Inputs 4 and 5 give output 5 (the max); an upstream gradient of 9 is routed entirely to the larger input (9 to the 5, 0 to the 4).
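These four rules can be written directly as backward functions (a sketch; the numbers in the comments are the worked values above):

```python
def add_backward(upstream):
    # add gate distributes the gradient: upstream 2 -> (2, 2)
    return upstream, upstream

def mul_backward(x, y, upstream):
    # mul gate swaps multipliers: inputs 2, 3 with upstream 5 -> (15, 10)
    return y * upstream, x * upstream

def copy_backward(up1, up2):
    # copy gate adds the gradients from its two uses: 4 + 2 -> 6
    return up1 + up2

def max_backward(x, y, upstream):
    # max gate routes the gradient to the larger input: inputs 4, 5, upstream 9 -> (0, 9)
    return (upstream, 0.0) if x > y else (0.0, upstream)
```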
Backprop Implementation: “Flat” code
Forward pass: compute the output, keeping every intermediate value.
Backward pass: compute the grads by walking the forward code in reverse — start from the base case (the gradient of the output with respect to itself is 1), then apply the sigmoid, add-gate, and multiply-gate rules line by line.
“Flat” Backprop: Do this for assignment 1! Stage your forward/backward computation! E.g. for the SVM: compute the scores, then the margins, then the loss in the forward pass, and backprop through those stages in reverse. Same idea for a two-layer neural net.
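As a sketch of the flat style on the sigmoid example from earlier (values chosen so the forward output matches the 0.73 and the gradients 0.4 / -0.2 above; w1, x1, w2 are assumed, and all variable names here are illustrative):

```python
import numpy as np

# assumed inputs for the sigmoid example
w0, x0 = 2.0, -1.0
w1, x1 = -3.0, -2.0
w2 = -3.0

# forward pass: compute the output, keeping every intermediate
s0 = w0 * x0
s1 = w1 * x1
s2 = s0 + s1
s3 = s2 + w2
L = 1.0 / (1.0 + np.exp(-s3))    # sigmoid output, ~0.73

# backward pass: walk the forward code in reverse
grad_L = 1.0                     # base case
grad_s3 = grad_L * (1 - L) * L   # sigmoid gate, ~0.2
grad_s2 = grad_s3                # add gate
grad_w2 = grad_s3
grad_s0 = grad_s2                # add gate
grad_s1 = grad_s2
grad_w0 = grad_s0 * x0           # multiply gate: ~0.2 * -1 = -0.2
grad_x0 = grad_s0 * w0           # ~0.2 *  2 =  0.4
grad_w1 = grad_s1 * x1
grad_x1 = grad_s1 * w1
```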
Backprop Implementation: Modularized API
Graph (or Net) object (rough pseudo code)
Modularized implementation: forward / backward API
Gate / Node / Function object (actual PyTorch code follows this pattern): for a multiply gate z = x * y (x, y, z are scalars), the forward needs to stash some values for use in backward; the backward multiplies the upstream gradient by the local gradients.
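A minimal runnable version of this API might look like (a sketch; the forward/backward method names follow the slide's pseudocode, everything else is illustrative):

```python
class MultiplyGate:
    def forward(self, x, y):
        self.x, self.y = x, y    # stash values for use in backward
        return x * y

    def backward(self, dz):      # dz: upstream gradient dL/dz
        dx = self.y * dz         # [local dz/dx] * [upstream]
        dy = self.x * dz         # [local dz/dy] * [upstream]
        return dx, dy

gate = MultiplyGate()
z = gate.forward(3.0, -4.0)      # z = -12.0
dx, dy = gate.backward(2.0)      # dx = -8.0, dy = 6.0
```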
Example: PyTorch operators
PyTorch sigmoid layer
Forward: compute the sigmoid of the input (the forward is actually defined elsewhere in the source).
Backward: multiply the upstream gradient by the sigmoid local gradient, reusing the output saved in the forward pass.
Source
So far: backprop with scalars
What about vector-valued functions?
Recap: Vector derivatives
Scalar to Scalar — regular derivative: if x changes by a small amount, how much will y change?
Vector to Scalar — derivative is the Gradient: for each element of x, if it changes by a small amount then how much will y change?
Vector to Vector — derivative is the Jacobian: for each element of x, if it changes by a small amount then how much will each element of y change?
Backprop with Vectors
Loss L is still a scalar! A node f takes input vectors x (dimension Dx) and y (dimension Dy) and produces an output vector z (dimension Dz). The “upstream gradient” dL/dz has dimension Dz: for each element of z, how much does it influence L? The “local gradients” are Jacobian matrices: dz/dx is [Dx × Dz] and dz/dy is [Dy × Dz]. The “downstream gradients” are matrix-vector multiplies of a local Jacobian with the upstream gradient, giving dL/dx (dimension Dx) and dL/dy (dimension Dy).
4D input x: 4D output y: [ 1 ] [ 1 ] [ -2 ] f(x) = max(0,x) [ 0 ] [ 3 ] (elementwise) [ 3 ] [ -1 ] [ 0 ]
Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - 107 April 11, 2019 Backprop with Vectors
4D input x: 4D output y: [ 1 ] [ 1 ] [ -2 ] f(x) = max(0,x) [ 0 ] [ 3 ] (elementwise) [ 3 ] [ -1 ] [ 0 ]
4D dL/dy: [ 4 ] [ -1 ] Upstream [ 5 ] gradient [ 9 ]
Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - 108 April 11, 2019 Backprop with Vectors
4D input x: 4D output y: [ 1 ] [ 1 ] [ -2 ] f(x) = max(0,x) [ 0 ] [ 3 ] (elementwise) [ 3 ] [ -1 ] [ 0 ]
Jacobian dy/dx 4D dL/dy: [ 1 0 0 0 ] [ 4 ] [ 0 0 0 0 ] [ -1 ] Upstream [ 0 0 1 0 ] [ 5 ] gradient [ 0 0 0 0 ] [ 9 ]
Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - 109 April 11, 2019 Backprop with Vectors
4D input x: 4D output y: [ 1 ] [ 1 ] [ -2 ] f(x) = max(0,x) [ 0 ] [ 3 ] (elementwise) [ 3 ] [ -1 ] [ 0 ]
[dy/dx] [dL/dy] 4D dL/dy: [ 1 0 0 0 ] [ 4 ] [ 4 ] [ 0 0 0 0 ] [ -1 ] [ -1 ] Upstream [ 0 0 1 0 ] [ 5 ] [ 5 ] gradient [ 0 0 0 0 ] [ 9 ] [ 9 ]
Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - 110 April 11, 2019 Backprop with Vectors
4D input x: 4D output y: [ 1 ] [ 1 ] [ -2 ] f(x) = max(0,x) [ 0 ] [ 3 ] (elementwise) [ 3 ] [ -1 ] [ 0 ]
4D dL/dx: [dy/dx] [dL/dy] 4D dL/dy: [ 4 ] [ 1 0 0 0 ] [ 4 ] [ 4 ] [ 0 ] [ 0 0 0 0 ] [ -1 ] [ -1 ] Upstream [ 5 ] [ 0 0 1 0 ] [ 5 ] [ 5 ] gradient [ 0 ] [ 0 0 0 0 ] [ 9 ] [ 9 ]
Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - 111 April 11, 2019 Backprop with Vectors
4D input x: 4D output y: [ 1 ] [ 1 ] [ -2 ] f(x) = max(0,x) [ 0 ] Jacobian is sparse: [ 3 ] [ 3 ] off-diagonal entries (elementwise) always zero! Never [ -1 ] [ 0 ] explicitly form Jacobian -- instead use implicit 4D dL/dx: [dy/dx] [dL/dy] 4D dL/dy: multiplication [ 4 ] [ 1 0 0 0 ] [ 4 ] [ 4 ] [ 0 ] [ 0 0 0 0 ] [ -1 ] [ -1 ] Upstream [ 5 ] [ 0 0 1 0 ] [ 5 ] [ 5 ] gradient [ 0 ] [ 0 0 0 0 ] [ 9 ] [ 9 ]
Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - 112 April 11, 2019 Backprop with Vectors
4D input x: 4D output y: [ 1 ] [ 1 ] [ -2 ] f(x) = max(0,x) [ 0 ] Jacobian is sparse: [ 3 ] [ 3 ] off-diagonal entries (elementwise) always zero! Never [ -1 ] [ 0 ] explicitly form Jacobian -- instead use implicit 4D dL/dx: [dy/dx] [dL/dy] 4D dL/dy: multiplication [ 4 ] [ 4 ] [ 0 ] [ -1 ] Upstream [ 5 ] [ 5 ] gradient [ 0 ] [ 9 ]
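The implicit multiplication for this ReLU example is a one-liner, and agrees with explicitly forming the Jacobian (a numpy sketch with the numbers above):

```python
import numpy as np

x = np.array([1.0, -2.0, 3.0, -1.0])
upstream = np.array([4.0, -1.0, 5.0, 9.0])    # dL/dy

# explicit (never do this): build the sparse Jacobian and multiply
J = np.diag((x > 0).astype(float))
explicit = J.dot(upstream)

# implicit: just zero the upstream gradient where x <= 0
implicit = upstream * (x > 0)

assert np.allclose(explicit, implicit)         # both give [4, 0, 5, 0]
```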
Backprop with Matrices (or Tensors): Loss L still a scalar!
dL/dx always has the same shape as x! With inputs x of shape [Dx×Mx] and y of shape [Dy×My], and an output z of shape [Dz×Mz], the upstream gradient dL/dz is [Dz×Mz] (for each element of z, how much does it influence L?). The “local gradients” are now generalized Jacobians of shape [(Dx×Mx)×(Dz×Mz)] and [(Dy×My)×(Dz×Mz)] (for each element of y, how much does it influence each element of z?), and the downstream gradients are again a (generalized) matrix-vector multiply of the local Jacobian with the upstream gradient.
Backprop with Matrices: Matrix Multiply y = x w
x: [N×D] =
[ 2 1 -3 ]
[ -3 4 2 ]
w: [D×M] =
[ 3 2 1 -1 ]
[ 2 1 3 2 ]
[ 3 2 1 -2 ]
y: [N×M] =
[ 13 9 -2 -6 ]
[ 5 2 17 1 ]
dL/dy: [N×M] =
[ 2 3 -3 9 ]
[ -8 1 4 6 ]
Jacobians: dy/dx is [(N×D)×(N×M)] and dy/dw is [(D×M)×(N×M)]. For a neural net we may have N=64, D=M=4096: each Jacobian takes 256 GB of memory! Must work with them implicitly!
Q: What parts of y are affected by one element of x? A: x[n,d] affects the whole row y[n,:].
Q: How much does x[n,d] affect y[n,m]? A: by w[d,m].
So dL/dx = (dL/dy) wᵀ, with shapes [N×D] = [N×M][M×D]. By similar logic, dL/dw = xᵀ (dL/dy), with shapes [D×M] = [D×N][N×M]. These formulas are easy to remember: they are the only way to make the shapes match up!
Also see derivation in the course notes: http://cs231n.stanford.edu/handouts/linear-backprop.pdf
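The two formulas can be verified numerically (a sketch; the sizes and the toy loss L = Σ c·y are illustrative, chosen so that dL/dy = c is known exactly):

```python
import numpy as np

np.random.seed(0)
N, D, M = 4, 3, 5
x = np.random.randn(N, D)
w = np.random.randn(D, M)
c = np.random.randn(N, M)      # toy loss L = sum(c * y), so dL/dy = c

dL_dy = c
dL_dx = dL_dy.dot(w.T)         # [N x D] = [N x M][M x D]
dL_dw = x.T.dot(dL_dy)         # [D x M] = [D x N][N x M]

# numerical check on one entry of x
h = 1e-6
xp = x.copy(); xp[0, 0] += h
num = ((c * xp.dot(w)).sum() - (c * x.dot(w)).sum()) / h
assert abs(num - dL_dx[0, 0]) < 1e-4
```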
Summary for today:
● (Fully-connected) Neural Networks are stacks of linear functions and nonlinear activation functions; they have much more representational power than linear classifiers
● backpropagation = recursive application of the chain rule along a computational graph to compute the gradients of all inputs/parameters/intermediates
● implementations maintain a graph structure, where the nodes implement the forward() / backward() API
● forward: compute the result of an operation and save any intermediates needed for gradient computation in memory
● backward: apply the chain rule to compute the gradient of the loss function with respect to the inputs
Next Time: Convolutional Networks!
A vectorized example (worked through on the slides):
Always check: the gradient with respect to a variable should have the same shape as the variable.
In discussion section: A matrix example...