Deep Learning: What's All the Fuss About?
Garrison W. Cottrell
Gary's Unbelievable Research Unit (GURU)
The Perceptual Expertise Network
The Temporal Dynamics of Learning Center
Computer Science and Engineering Department, UCSD
PACS-2017

Or, a Shallow Introduction to Deep Learning!

Outline
- What is Deep Learning?
- Why is it interesting?
- What can it do?

What is Deep Learning?
- Deep Learning refers to Deep Neural Networks
  – What's a neural network?
  – What's a deep neural network?

What's a neural network?
- A neural network is a kind of computational model inspired by the brain.
- It consists of "units" connected by weighted links.
- Here's a simple neural network, called a perceptron:

Perceptrons: A bit of history
Frank Rosenblatt studied a simple version of a neural net called a perceptron:
- A single layer of processing
- Binary output
- Can compute simple things like (some) boolean functions (OR, AND, etc.)

Perceptrons: A bit of history
- Computes the weighted sum of its inputs (the net input), compares it to a threshold, and "fires" if the net is greater than or equal to the threshold.
- Why is this a "neural" network?
  – Because we can think of the input units here as "neurons" that are spreading activation to the output unit via synaptic connections – like real neurons do.

The Perceptron Activation Rule
  net = W1*X1 + W2*X2 + ... + Wn*Xn
  output = 1 if net >= threshold, 0 otherwise
This is called a binary threshold unit.

Quiz!!!

X1  X2  OR(X1,X2)
 0   0   0
 0   1   1
 1   0   1
 1   1   1

Assume: FALSE == 0, TRUE == 1, so if X1 is false, it is 0.
Can you come up with a set of weights and a threshold so that a two-input perceptron computes OR?

Quiz

X1  X2  AND(X1,X2)
 0   0   0
 0   1   0
 1   0   0
 1   1   1

Assume: FALSE == 0, TRUE == 1.
Can you come up with a set of weights and a threshold so that a two-input perceptron computes AND?

Quiz

X1  X2  XOR(X1,X2)
 0   0   0
 0   1   1
 1   0   1
 1   1   0

Assume: FALSE == 0, TRUE == 1.
Can you come up with a set of weights and a threshold so that a two-input perceptron computes XOR?

Perceptrons
The goal was to make a neurally-inspired machine that could categorize inputs – and learn to do this from examples via supervised learning: the network is presented with many input-output examples, and the weights are adjusted by a learning rule:

  Wi(t+1) = Wi(t) + α*(target - output)*Xi

(the target is given by the example, α is the learning rate)
This rule is called the delta rule, because the weights are changed according to the delta – the difference between the target and the output.

How the delta rule works
  Wi(t+1) = Wi(t) + α*(target - output)*Xi
The network is presented with an input, it computes the output, and this is compared with the target.
- Notice that if the output and the target are the same, nothing happens – there is no change to the weight.
- If the output is 0 and the target is 1, and Xi is 1, then the weight is raised – which will tend to make the output 1 next time.
- If the output is 1 and the target is 0, and Xi is 1, then the weight is lowered – which will tend to make the output 0 next time.
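To make the activation rule and the delta rule concrete, here is a minimal Python sketch (my own illustration, not code from the talk). The fixed threshold of 0.5, the learning rate of 0.2, the epoch limit, and the name train_perceptron are all illustrative assumptions; the slides leave those choices open.

# A minimal sketch of a two-input perceptron with the binary threshold
# activation rule and the delta rule described above. The threshold is
# held fixed at 0.5 here purely for illustration.

def train_perceptron(patterns, alpha=0.2, threshold=0.5, max_epochs=100):
    """patterns: list of ((x1, x2), target) pairs. Returns (weights, converged)."""
    w = [0.0, 0.0]
    for _ in range(max_epochs):
        errors = 0
        for (x1, x2), target in patterns:
            net = w[0] * x1 + w[1] * x2              # weighted sum of inputs
            output = 1 if net >= threshold else 0    # binary threshold unit
            delta = target - output                  # the "delta" in the delta rule
            if delta != 0:
                errors += 1
                w[0] += alpha * delta * x1           # Wi(t+1) = Wi(t) + a*(target-output)*Xi
                w[1] += alpha * delta * x2
        if errors == 0:                              # every pattern correct: done
            return w, True
    return w, False

OR  = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

for name, data in [("OR", OR), ("AND", AND), ("XOR", XOR)]:
    w, ok = train_perceptron(data)
    print(name, "weights:", w, "learned:", ok)
# OR and AND are learned; XOR never converges, which is the point of the next slides.

Run as-is, the sketch finds weights that solve OR and AND, but for XOR the loop hits the epoch limit without ever getting every case right.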
Problems with perceptrons
- The learning rule comes with a great guarantee: anything a perceptron can compute, it can learn to compute.
- Problem: lots of things were not computable, e.g., XOR (Minsky & Papert, 1969)
- Minsky & Papert said:
  – If you had hidden units, you could compute any boolean function.
  – But no learning rule exists for such multilayer networks, and we don't think one will ever be discovered.

XOR: The smallest "hard problem"
Notice that the network could not do XOR. If there is another layer (here, just one unit!) between the input and the output, then the network can compute XOR:
- Without the middle unit, the rest of the network just computes OR.
- OR is "right" in 3 out of 4 cases for XOR:

X1  X2  OR(X1,X2)  XOR(X1,X2)
 0   0      0          0
 0   1      1          1
 1   0      1          1
 1   1      1          0

- The middle unit computes "AND" of the inputs and turns the output off just for the one exception to OR: the 1,1 case!

Multi-layer neural networks
Ok, we've established that:
1. The simplest neural networks have a single layer of processing.
2. By adding one more layer, networks can compute harder problems.
3. The AND unit can be thought of as a feature that is useful for solving the task.
Now, backpropagation learning generalizes the delta rule to multi-layer networks – it can be used to learn the AND feature!

Aside about perceptrons
- They didn't have hidden units – but Rosenblatt assumed nonlinear preprocessing!
- Hidden units compute features of the input.
- The nonlinear preprocessing is a way to choose features by hand.
- Support Vector Machines essentially do this in a principled way, followed by a (highly sophisticated) perceptron learning algorithm.

Enter Rumelhart, Hinton, & Williams (1985)
- (Re-)Discovered a learning rule for networks with hidden units.
- Works a lot like the perceptron algorithm:
  – Randomly choose an input-output pattern
  – Present the input, let activation propagate through the network
  – Give the teaching signal
  – Propagate the error back through the network (hence the name back propagation)
  – Change the connection strengths according to the error

Enter Rumelhart, Hinton, & Williams (1985)
[Figure: a layered network with INPUTS at the bottom, Hidden Units in the middle, and OUTPUTS at the top; activation flows forward, error flows back]
- The actual algorithm uses the chain rule of calculus to go downhill in an error measure with respect to the weights.
- The hidden units must learn features that solve the problem.

XOR Back Propagation Learning
[Figure: a randomly initialized network and the trained XOR network; the two hidden units end up computing OR and AND]
Here, the hidden units learned AND and OR – two features that, when combined appropriately, can solve the problem.
But, depending on initial conditions, there are an infinite number of ways to do XOR – backprop can surprise you with innovative solutions.
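As a companion to the slides above, here is a minimal sketch (my own, not from the talk) of one hand-wired XOR solution using the same binary threshold units: the output unit computes OR of the two inputs, and a hidden unit computes AND and inhibits the output in the single case (1,1) where OR and XOR disagree. The particular weights and thresholds are just one workable choice; as noted above, backprop can find many others.

# One hand-wired XOR network built from binary threshold units, matching the
# slides' description: OR at the output, an AND hidden unit that vetoes it.
# All weights and thresholds below are illustrative choices.

def step(net, threshold):
    """Binary threshold unit: fire iff the net input reaches the threshold."""
    return 1 if net >= threshold else 0

def xor_net(x1, x2):
    hidden = step(1.0 * x1 + 1.0 * x2, threshold=1.5)                  # AND of the inputs
    output = step(1.0 * x1 + 1.0 * x2 - 2.0 * hidden, threshold=0.5)   # OR, vetoed by AND
    return output

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_net(x1, x2))   # prints the XOR truth table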
Why is/was this wonderful?
1. Learns internal representations
2. Learns internal representations
3. Learns internal representations
- Generalizes to recurrent networks

Hinton's Family Trees example
- Idea: Learn to represent relationships between people that are encoded in a family tree:

Hinton's Family Trees example
- Idea 2: Learn distributed representations of concepts:
[Network diagram: localist people and localist relations as input; intermediate layers learn features of these entities useful for solving the task; localist outputs at the top]
- Localist: one unit "ON" to represent each item (a small sketch of this coding appears at the end of this section).

People hidden units: Hinton diagram
[Figure: Hinton diagram of the weights into the people hidden units]
- What does unit 1 encode?
- What does unit 2 encode?
- What does unit 6 encode?
- When all three are on, these units pick out Christopher and Penelope; other combinations pick out other parts of the trees.

Relation units
[Figure: Hinton diagram of the weights into the relation hidden units]
- What does the lower middle one code?

Lessons
- The network learns features in the service of the task – i.e., it learns features on its own.
- This is useful if we don't know what the features ought to be.
- The networks have been used in my lab for years to explain some human phenomena.

Switch to Demo
- This demo is downloadable from my website under "Resources"
- About the middle of the page: "A matlab neural net demo with face processing."

Multi-layer neural networks
Why is this interesting? Because now we can train neural networks to do very interesting tasks – like face recognition, object recognition, handwriting recognition, etc. Thus there was a lot of excitement about neural nets when the backpropagation learning rule was (re-)discovered in 1985.
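To make the localist ("one unit ON per item") coding used in the family-trees example concrete, here is a small Python sketch of how a person-relation query could be encoded as network input. It is my own illustration: the short name lists, the names one_hot and encode_query, and the unit ordering are assumptions, not details from the slides.

# Localist coding: each item gets its own unit, and exactly one unit is ON.
# The lists are truncated for illustration (Hinton's original task used
# 24 people and 12 relations).

PEOPLE = ["Christopher", "Penelope", "Andrew", "Christine"]
RELATIONS = ["father", "mother", "husband", "wife"]

def one_hot(item, vocabulary):
    """Localist code: a vector with a single 1 in the unit assigned to `item`."""
    return [1.0 if v == item else 0.0 for v in vocabulary]

def encode_query(person, relation):
    """Network input for the question 'who is the <relation> of <person>?'"""
    return one_hot(person, PEOPLE) + one_hot(relation, RELATIONS)

print(encode_query("Christopher", "wife"))
# -> [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]

The target output is encoded the same localist way; the distributed "features of these entities" live only in the hidden layers the network learns for itself.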