Lecture 4: Backpropagation
CMSC 35246: Deep Learning

Shubhendu Trivedi & Risi Kondor

University of Chicago

April 5, 2017

Things we will look at today

• More Backpropagation
• Still more backpropagation
• Quiz at 4:05 PM

To understand, let us just calculate!

One Neuron Again

[Figure: a single linear neuron with inputs $x_1, x_2, \dots, x_d$ and weights $w_1, w_2, \dots, w_d$]

$\hat{y} = x^T w = x_1 w_1 + x_2 w_2 + \dots + x_d w_d$

Consider an example $x$; the output for $x$ is $\hat{y}$; the correct answer is $y$. Loss $L = (y - \hat{y})^2$

Want to update $w_i$ (forget the closed form solution for a bit!)

Update rule: $w_i := w_i - \eta \frac{\partial L}{\partial w_i}$

Now $\frac{\partial L}{\partial w_i} = \frac{\partial (\hat{y} - y)^2}{\partial w_i} = 2(\hat{y} - y)\frac{\partial (x_1 w_1 + x_2 w_2 + \dots + x_d w_d)}{\partial w_i} = 2(\hat{y} - y)\,x_i$

Update rule (absorbing the factor of 2 into $\eta$):

$w_i := w_i - \eta(\hat{y} - y)x_i = w_i - \eta\delta x_i \quad \text{where } \delta = (\hat{y} - y)$

In vector form: $w := w - \eta\delta x$

Simple enough! Now let's graduate...
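Before graduating, a minimal numerical sketch of this update (the data values below are hypothetical, not from the slides): one gradient step for the single linear neuron, with the factor of 2 absorbed into the learning rate as above.

```python
import numpy as np

# Hypothetical example with d = 3 inputs and one training pair (x, y).
x = np.array([0.5, -1.0, 2.0])
y = 1.5
w = np.zeros(3)           # weights w_1 ... w_d
eta = 0.1                 # learning rate

y_hat = x @ w             # forward pass: y_hat = x^T w
delta = y_hat - y         # local error delta = (y_hat - y)
w = w - eta * delta * x   # update: w := w - eta * delta * x (vector form)
```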

Simple Feedforward Network

[Figure: a two-layer network with inputs $x_1, x_2, x_3$, hidden units $z_1, z_2$ (first-layer weights $w^{(1)}$), and output $\hat{y}$ (second-layer weights $w^{(2)}_1, w^{(2)}_2$)]

$z_1 = \tanh(a_1)$ where $a_1 = w^{(1)}_{11} x_1 + w^{(1)}_{21} x_2 + w^{(1)}_{31} x_3$

$z_2 = \tanh(a_2)$ where $a_2 = w^{(1)}_{12} x_1 + w^{(1)}_{22} x_2 + w^{(1)}_{32} x_3$

Output $\hat{y} = w^{(2)}_1 z_1 + w^{(2)}_2 z_2$; Loss $L = (\hat{y} - y)^2$

Want to assign credit for the loss $L$ to each weight
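As a concrete forward-pass sketch of this toy network (all weight and input values below are made up for illustration):

```python
import numpy as np

x1, x2, x3 = 1.0, -0.5, 2.0            # hypothetical inputs
y = 0.3                                 # hypothetical target
# first-layer weights w^(1)_{ij} (from input i to hidden unit j), chosen arbitrarily
w11, w21, w31 = 0.1, -0.2, 0.05
w12, w22, w32 = 0.3, 0.1, -0.1
v1, v2 = 0.4, -0.3                      # second-layer weights w^(2)_1, w^(2)_2

a1 = w11 * x1 + w21 * x2 + w31 * x3     # pre-activation of z1
a2 = w12 * x1 + w22 * x2 + w32 * x3     # pre-activation of z2
z1, z2 = np.tanh(a1), np.tanh(a2)       # hidden activations
y_hat = v1 * z1 + v2 * z2               # network output
L = (y_hat - y) ** 2                    # squared-error loss
```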

Top Layer

[Figure: the same network, with the output $\hat{y}$ and the second-layer weights $w^{(2)}_1, w^{(2)}_2$ highlighted]

Want to find: $\frac{\partial L}{\partial w^{(2)}_1}$ and $\frac{\partial L}{\partial w^{(2)}_2}$

Consider $w^{(2)}_1$ first

$\frac{\partial L}{\partial w^{(2)}_1} = \frac{\partial (\hat{y} - y)^2}{\partial w^{(2)}_1} = 2(\hat{y} - y)\frac{\partial (w^{(2)}_1 z_1 + w^{(2)}_2 z_2)}{\partial w^{(2)}_1} = 2(\hat{y} - y)\,z_1$

Familiar from earlier! The update for $w^{(2)}_1$ would be

$w^{(2)}_1 := w^{(2)}_1 - \eta \frac{\partial L}{\partial w^{(2)}_1} = w^{(2)}_1 - \eta\delta z_1 \quad \text{with } \delta = (\hat{y} - y)$

Likewise, for $w^{(2)}_2$ the update would be $w^{(2)}_2 := w^{(2)}_2 - \eta\delta z_2$

Next Layer

[Figure: the same network, with the first-layer weight $w^{(1)}_{11}$ highlighted]

There are six weights to assign credit for the loss incurred. Consider $w^{(1)}_{11}$ for an illustration; the rest are similar.

$\frac{\partial L}{\partial w^{(1)}_{11}} = \frac{\partial (\hat{y} - y)^2}{\partial w^{(1)}_{11}} = 2(\hat{y} - y)\frac{\partial (w^{(2)}_1 z_1 + w^{(2)}_2 z_2)}{\partial w^{(1)}_{11}}$

Now: $\frac{\partial (w^{(2)}_1 z_1 + w^{(2)}_2 z_2)}{\partial w^{(1)}_{11}} = w^{(2)}_1 \frac{\partial \tanh(w^{(1)}_{11} x_1 + w^{(1)}_{21} x_2 + w^{(1)}_{31} x_3)}{\partial w^{(1)}_{11}} + 0$

Which is: $w^{(2)}_1 (1 - \tanh^2(a_1))\,x_1$ (recall $a_1 = ?$)

So we have: $\frac{\partial L}{\partial w^{(1)}_{11}} = 2(\hat{y} - y)\,w^{(2)}_1 (1 - \tanh^2(a_1))\,x_1$

Lecture 4 Backpropagation CMSC 35246 (1) Likewise, if we were considering w22 , we’d have: ∂L (2) 2 (1) = 2(ˆy − y)w2 (1 − tanh (a2))x2 ∂w22 (1) (1) ∂L Weight update: w22 := w22 − η (1) ∂w22

Next Layer yˆ

(2) (2) w1 w2 z1 z2 (1) ∂L w (1) w(1) = 12 w22 32 ∂w(1) 11 (1) (1) (1) w11 w21 w31 (2) 2 x x x 2(ˆy − y)w1 (1 − tanh (a1))x1 1 2 3 Weight update: (1) (1) ∂L w11 := w11 − η (1) ∂w11

Lecture 4 Backpropagation CMSC 35246 ∂L (2) 2 (1) = 2(ˆy − y)w2 (1 − tanh (a2))x2 ∂w22 (1) (1) ∂L Weight update: w22 := w22 − η (1) ∂w22

Next Layer yˆ

(2) (2) w1 w2 z1 z2 (1) ∂L w (1) w(1) = 12 w22 32 ∂w(1) 11 (1) (1) (1) w11 w21 w31 (2) 2 x x x 2(ˆy − y)w1 (1 − tanh (a1))x1 1 2 3 Weight update: (1) (1) ∂L w11 := w11 − η (1) ∂w11

(1) Likewise, if we were considering w22 , we’d have:

Lecture 4 Backpropagation CMSC 35246 (1) (1) ∂L Weight update: w22 := w22 − η (1) ∂w22

Next Layer yˆ

(2) (2) w1 w2 z1 z2 (1) ∂L w (1) w(1) = 12 w22 32 ∂w(1) 11 (1) (1) (1) w11 w21 w31 (2) 2 x x x 2(ˆy − y)w1 (1 − tanh (a1))x1 1 2 3 Weight update: (1) (1) ∂L w11 := w11 − η (1) ∂w11

(1) Likewise, if we were considering w22 , we’d have: ∂L (2) 2 (1) = 2(ˆy − y)w2 (1 − tanh (a2))x2 ∂w22

Lecture 4 Backpropagation CMSC 35246 Next Layer yˆ

(2) (2) w1 w2 z1 z2 (1) ∂L w (1) w(1) = 12 w22 32 ∂w(1) 11 (1) (1) (1) w11 w21 w31 (2) 2 x x x 2(ˆy − y)w1 (1 − tanh (a1))x1 1 2 3 Weight update: (1) (1) ∂L w11 := w11 − η (1) ∂w11

(1) Likewise, if we were considering w22 , we’d have: ∂L (2) 2 (1) = 2(ˆy − y)w2 (1 − tanh (a2))x2 ∂w22 (1) (1) ∂L Weight update: w22 := w22 − η (1) ∂w22
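A quick way to sanity-check the formula above is to compare it with a finite-difference estimate. The sketch below does this for $\partial L / \partial w^{(1)}_{11}$ with made-up weights and inputs; the names `W1`, `v`, and `loss` are ours, not the lecture's.

```python
import numpy as np

# Hypothetical values for the toy 3-2-1 network.
x = np.array([1.0, -0.5, 2.0]); y = 0.3
W1 = np.array([[0.1,  0.3],
               [-0.2, 0.1],
               [0.05, -0.1]])            # W1[i-1, j-1] plays the role of w^(1)_{ij}
v = np.array([0.4, -0.3])                # (w^(2)_1, w^(2)_2)

def loss(W1):
    a = x @ W1                           # (a1, a2)
    return (np.tanh(a) @ v - y) ** 2

# Analytic gradient: 2(y_hat - y) * w^(2)_1 * (1 - tanh^2(a1)) * x1
a = x @ W1
y_hat = np.tanh(a) @ v
analytic = 2 * (y_hat - y) * v[0] * (1 - np.tanh(a[0]) ** 2) * x[0]

# Central finite-difference estimate on the same entry.
eps = 1e-6
W1p, W1m = W1.copy(), W1.copy()
W1p[0, 0] += eps
W1m[0, 0] -= eps
numeric = (loss(W1p) - loss(W1m)) / (2 * eps)
print(analytic, numeric)                 # the two numbers should agree closely
```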

Lecture 4 Backpropagation CMSC 35246 ∂L One can think of this as: = δ zi ∂w(2) i |{z} |{z} local error local input ∂L (2) 2 For next layer we had: (1) = (ˆy − y)wj (1 − tanh (aj))xi ∂wij (2) 2 (2) 2 Let δj = (ˆy − y)wj (1 − tanh (aj)) = δwj (1 − tanh (aj)) (Notice that δj contains the δ term (which is the error!)) ∂L Then: = δj xi ∂w(1) ij |{z} |{z} local error local input Neat!

Let’s clean this up...

∂L Recall, for top layer: (2) = (ˆy − y)zi = δzi (ignoring 2) ∂wi

Lecture 4 Backpropagation CMSC 35246 ∂L (2) 2 For next layer we had: (1) = (ˆy − y)wj (1 − tanh (aj))xi ∂wij (2) 2 (2) 2 Let δj = (ˆy − y)wj (1 − tanh (aj)) = δwj (1 − tanh (aj)) (Notice that δj contains the δ term (which is the error!)) ∂L Then: = δj xi ∂w(1) ij |{z} |{z} local error local input Neat!

Let’s clean this up...

∂L Recall, for top layer: (2) = (ˆy − y)zi = δzi (ignoring 2) ∂wi ∂L One can think of this as: = δ zi ∂w(2) i |{z} |{z} local error local input

Lecture 4 Backpropagation CMSC 35246 (2) 2 (2) 2 Let δj = (ˆy − y)wj (1 − tanh (aj)) = δwj (1 − tanh (aj)) (Notice that δj contains the δ term (which is the error!)) ∂L Then: = δj xi ∂w(1) ij |{z} |{z} local error local input Neat!

Let’s clean this up...

∂L Recall, for top layer: (2) = (ˆy − y)zi = δzi (ignoring 2) ∂wi ∂L One can think of this as: = δ zi ∂w(2) i |{z} |{z} local error local input ∂L (2) 2 For next layer we had: (1) = (ˆy − y)wj (1 − tanh (aj))xi ∂wij

Lecture 4 Backpropagation CMSC 35246 (Notice that δj contains the δ term (which is the error!)) ∂L Then: = δj xi ∂w(1) ij |{z} |{z} local error local input Neat!

Let’s clean this up...

∂L Recall, for top layer: (2) = (ˆy − y)zi = δzi (ignoring 2) ∂wi ∂L One can think of this as: = δ zi ∂w(2) i |{z} |{z} local error local input ∂L (2) 2 For next layer we had: (1) = (ˆy − y)wj (1 − tanh (aj))xi ∂wij (2) 2 (2) 2 Let δj = (ˆy − y)wj (1 − tanh (aj)) = δwj (1 − tanh (aj))

Lecture 4 Backpropagation CMSC 35246 ∂L Then: = δj xi ∂w(1) ij |{z} |{z} local error local input Neat!

Let’s clean this up...

∂L Recall, for top layer: (2) = (ˆy − y)zi = δzi (ignoring 2) ∂wi ∂L One can think of this as: = δ zi ∂w(2) i |{z} |{z} local error local input ∂L (2) 2 For next layer we had: (1) = (ˆy − y)wj (1 − tanh (aj))xi ∂wij (2) 2 (2) 2 Let δj = (ˆy − y)wj (1 − tanh (aj)) = δwj (1 − tanh (aj)) (Notice that δj contains the δ term (which is the error!))

Lecture 4 Backpropagation CMSC 35246 Neat!

Let’s clean this up...

∂L Recall, for top layer: (2) = (ˆy − y)zi = δzi (ignoring 2) ∂wi ∂L One can think of this as: = δ zi ∂w(2) i |{z} |{z} local error local input ∂L (2) 2 For next layer we had: (1) = (ˆy − y)wj (1 − tanh (aj))xi ∂wij (2) 2 (2) 2 Let δj = (ˆy − y)wj (1 − tanh (aj)) = δwj (1 − tanh (aj)) (Notice that δj contains the δ term (which is the error!)) ∂L Then: = δj xi ∂w(1) ij |{z} |{z} local error local input

Lecture 4 Backpropagation CMSC 35246 Let’s clean this up...

∂L Recall, for top layer: (2) = (ˆy − y)zi = δzi (ignoring 2) ∂wi ∂L One can think of this as: = δ zi ∂w(2) i |{z} |{z} local error local input ∂L (2) 2 For next layer we had: (1) = (ˆy − y)wj (1 − tanh (aj))xi ∂wij (2) 2 (2) 2 Let δj = (ˆy − y)wj (1 − tanh (aj)) = δwj (1 − tanh (aj)) (Notice that δj contains the δ term (which is the error!)) ∂L Then: = δj xi ∂w(1) ij |{z} |{z} local error local input Neat!

Let's clean this up...

Let's get a cleaner notation to summarize this

Let $w_{i\,j}$ be the weight for the connection FROM node $i$ TO node $j$

Then $\frac{\partial L}{\partial w_{i\,j}} = \delta_j z_i$

$\delta_j$ is the local error (going from $j$ backwards) and $z_i$ is the local input coming from $i$

Lecture 4 Backpropagation CMSC 35246 Credit Assignment: A Graphical Revision

0

w1 0 w2 0 z1 1 z2 2 w 5 2w4 2w3 1 w w 5 1 4 1 w3 1

x1 5 x2 4 3 x3

Let’s redraw our toy network with new notation and label nodes

Credit Assignment: Top Layer

[Figure: the labelled network, with node 0 carrying the local error $\delta$]

Local error from 0: $\delta = (\hat{y} - y)$, local input from 1: $z_1$

$\therefore \frac{\partial L}{\partial w_{1\,0}} = \delta z_1$; and update $w_{1\,0} := w_{1\,0} - \eta\delta z_1$

Local error from 0: $\delta = (\hat{y} - y)$, local input from 2: $z_2$

$\therefore \frac{\partial L}{\partial w_{2\,0}} = \delta z_2$ and update $w_{2\,0} := w_{2\,0} - \eta\delta z_2$

Credit Assignment: Next Layer

[Figure: the labelled network, with node 1 carrying the local error $\delta_1$ and node 2 carrying $\delta_2$]

Local error from 1: $\delta_1 = \delta\, w_{1\,0}\,(1 - \tanh^2(a_1))$, local input from 3: $x_3$

$\therefore \frac{\partial L}{\partial w_{3\,1}} = \delta_1 x_3$ and update $w_{3\,1} := w_{3\,1} - \eta\delta_1 x_3$

Local error from 2: $\delta_2 = \delta\, w_{2\,0}\,(1 - \tanh^2(a_2))$, local input from 4: $x_2$

$\therefore \frac{\partial L}{\partial w_{4\,2}} = \delta_2 x_2$ and update $w_{4\,2} := w_{4\,2} - \eta\delta_2 x_2$

Let's Vectorize

Let $W^{(2)} = \begin{pmatrix} w_{1\,0} \\ w_{2\,0} \end{pmatrix}$ (ignore that $W^{(2)}$ is a vector and hence it would be more appropriate to use $w^{(2)}$)

Let $W^{(1)} = \begin{pmatrix} w_{5\,1} & w_{5\,2} \\ w_{4\,1} & w_{4\,2} \\ w_{3\,1} & w_{3\,2} \end{pmatrix}$

Let $Z^{(1)} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}$ and $Z^{(2)} = \begin{pmatrix} z_1 \\ z_2 \end{pmatrix}$

Feedforward Computation

1. Compute $A^{(1)} = Z^{(1)T} W^{(1)}$
2. Apply the element-wise non-linearity: $Z^{(2)} = \tanh(A^{(1)})$
3. Compute the output $\hat{y} = Z^{(2)T} W^{(2)}$
4. Compute the loss on the example: $(\hat{y} - y)^2$
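A minimal NumPy sketch of these four steps, using the shapes from the vectorization slide ($W^{(1)} \in \mathbb{R}^{3 \times 2}$, $W^{(2)} \in \mathbb{R}^{2}$); all numerical values are made up:

```python
import numpy as np

Z1 = np.array([1.0, -0.5, 2.0])         # Z^(1) = (x1, x2, x3)
y = 0.3                                  # target for this example
W1 = np.array([[0.1,  0.3],
               [-0.2, 0.1],
               [0.05, -0.1]])            # W^(1), shape 3 x 2
W2 = np.array([0.4, -0.3])               # W^(2), shape 2

A1 = Z1 @ W1                             # step 1: A^(1) = Z^(1)^T W^(1)
Z2 = np.tanh(A1)                         # step 2: element-wise non-linearity
y_hat = Z2 @ W2                          # step 3: output
L = (y_hat - y) ** 2                     # step 4: loss on the example
```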

Flowing Backward

1. Top: Compute $\delta$
2. Gradient w.r.t. $W^{(2)}$: $\delta Z^{(2)}$
3. Compute $\delta_1 = (W^{(2)T}\delta) \odot (1 - \tanh(A^{(1)})^2)$
   Notes: (a) $\odot$ is the Hadamard product. (b) We have written $W^{(2)T}\delta$ since $\delta$ can be a vector when there are multiple outputs.
4. Gradient w.r.t. $W^{(1)}$: $\delta_1 Z^{(1)}$
5. Update $W^{(2)} := W^{(2)} - \eta\delta Z^{(2)}$
6. Update $W^{(1)} := W^{(1)} - \eta\delta_1 Z^{(1)}$
7. All the dimensionalities nicely check out!
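The same steps in NumPy, continuing the forward-pass sketch above with the same made-up numbers. Here there is a single output, so $\delta$ is a scalar, and we realize the $W^{(1)}$ gradient as the outer product of local input and local error (our reading of the slide's $\delta_1 Z^{(1)}$); the factor of 2 is again absorbed into $\eta$.

```python
import numpy as np

# Forward pass recap (made-up numbers, as in the previous sketch).
Z1 = np.array([1.0, -0.5, 2.0]); y = 0.3
W1 = np.array([[0.1, 0.3], [-0.2, 0.1], [0.05, -0.1]])
W2 = np.array([0.4, -0.3])
A1 = Z1 @ W1; Z2 = np.tanh(A1); y_hat = Z2 @ W2
eta = 0.1

delta = y_hat - y                               # step 1: top-level error
grad_W2 = delta * Z2                            # step 2: gradient w.r.t. W^(2), shape (2,)
delta1 = (W2 * delta) * (1 - np.tanh(A1) ** 2)  # step 3: back-propagated error (Hadamard product)
grad_W1 = np.outer(Z1, delta1)                  # step 4: gradient w.r.t. W^(1), shape (3, 2)
W2 = W2 - eta * grad_W2                         # step 5: update W^(2)
W1 = W1 - eta * grad_W1                         # step 6: update W^(1)
# step 7: the gradient shapes match W^(2) and W^(1), so the updates are well-defined
```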

So Far

Backpropagation in the context of neural networks is all about assigning credit (or blame!) for the error incurred to the weights
• We follow the path from the output (where we have an error signal) to the edge we want to consider
• We find the $\delta$s from the top down to the edge concerned by using the chain rule
• Once we have the partial derivative, we can write the update rule for that weight

What did we miss?

Exercise: What if there are multiple outputs? (look at the slide from last class)
Another exercise: Add bias neurons. What changes?
As we go down the network, notice that we need the previous $\delta$s
If we recompute them each time, the cost can blow up!
We need to book-keep derivatives as we go down the network and reuse them

A General View of Backpropagation

There is some redundancy in the upcoming slides, but redundancy can be good!

An Aside

Backpropagation only refers to the method for computing the gradient
It is used together with another algorithm, such as SGD, to learn using that gradient
Next: Computing the gradient $\nabla_x f(x, y)$ for an arbitrary $f$
$x$ is the set of variables whose derivatives are desired
Often we require the gradient of the cost $J(\theta)$ with respect to the parameters $\theta$, i.e. $\nabla_\theta J(\theta)$
Note: We restrict to the case where $f$ has a single output
First: Move to a more precise computational graph language!

Computational Graphs

Formalize computation as graphs
Nodes indicate variables (a scalar, vector, tensor, or another variable)
Operations are simple functions of one or more variables
Our graph language comes with a set of allowable operations
Examples:

$z = xy$

[Figure: computational graph in which nodes $x$ and $y$ feed a $\times$ operation producing $z$]

The graph uses the $\times$ operation for the computation

[Figure: computational graph in which $x$ and $w$ feed a dot operation producing $u_1$, $u_1$ and $b$ feed a $+$ operation producing $u_2$, and $\sigma$ produces $\hat{y}$]

Computes $\hat{y} = \sigma(x^T w + b)$

$H = \max\{0, XW + b\}$

[Figure: computational graph in which $X$ and $W$ feed MM producing $U_1$, $U_1$ and $b$ feed $+$ producing $U_2$, and Rc produces $H$]

MM is matrix multiplication and Rc is the ReLU activation

Back to backprop: Chain Rule

Backpropagation computes the chain rule, in a manner that is highly efficient

Let $f, g : \mathbb{R} \to \mathbb{R}$

Suppose $y = g(x)$ and $z = f(y) = f(g(x))$

Chain rule: $\frac{dz}{dx} = \frac{dz}{dy}\frac{dy}{dx}$
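For instance, with $y = g(x) = 3x + 1$ and $z = f(y) = y^2$:

$$\frac{dz}{dx} = \frac{dz}{dy}\,\frac{dy}{dx} = 2y \cdot 3 = 6(3x + 1)$$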

Lecture 4 Backpropagation CMSC 35246 z

dy dx y

dz dy x

dz dz dy Chain rule: = dx dy dx

Lecture 4 Backpropagation CMSC 35246 z

dz dz dy1 dy2 y1 y2

dy1 dy2 dx dx x

dz dz dy dz dy Multiple Paths: = 1 + 2 dx dy1 dx dy2 dx

Lecture 4 Backpropagation CMSC 35246 z

dz dz ... dz dy1 dy2 dyn y1 y2 yn

dy1 dy2 dyn dx dx dx x

dz X dz dyj Multiple Paths: = dx dy dx j j

Lecture 4 Backpropagation CMSC 35246 m n n Let g : R → R and f : R → R Suppose y = g(x) and z = f(y), then

∂z X ∂z ∂yj = ∂x ∂y ∂x i j j i

In vector notation:

 ∂z   P ∂z ∂yj  T ∂x1 j ∂yj ∂x1 ! .  .  ∂y  .  =  .  = ∇xz = ∇yz     ∂x ∂z P ∂z ∂yj ∂xm j ∂yj ∂xm

Chain Rule

m n Consider x ∈ R , y ∈ R

Lecture 4 Backpropagation CMSC 35246 Suppose y = g(x) and z = f(y), then

∂z X ∂z ∂yj = ∂x ∂y ∂x i j j i

In vector notation:

 ∂z   P ∂z ∂yj  T ∂x1 j ∂yj ∂x1 ! .  .  ∂y  .  =  .  = ∇xz = ∇yz     ∂x ∂z P ∂z ∂yj ∂xm j ∂yj ∂xm

Chain Rule

m n Consider x ∈ R , y ∈ R m n n Let g : R → R and f : R → R

Lecture 4 Backpropagation CMSC 35246 ∂z X ∂z ∂yj = ∂x ∂y ∂x i j j i

In vector notation:

 ∂z   P ∂z ∂yj  T ∂x1 j ∂yj ∂x1 ! .  .  ∂y  .  =  .  = ∇xz = ∇yz     ∂x ∂z P ∂z ∂yj ∂xm j ∂yj ∂xm

Chain Rule

m n Consider x ∈ R , y ∈ R m n n Let g : R → R and f : R → R Suppose y = g(x) and z = f(y), then

Lecture 4 Backpropagation CMSC 35246 In vector notation:

 ∂z   P ∂z ∂yj  T ∂x1 j ∂yj ∂x1 ! .  .  ∂y  .  =  .  = ∇xz = ∇yz     ∂x ∂z P ∂z ∂yj ∂xm j ∂yj ∂xm

Chain Rule

m n Consider x ∈ R , y ∈ R m n n Let g : R → R and f : R → R Suppose y = g(x) and z = f(y), then

∂z X ∂z ∂yj = ∂x ∂y ∂x i j j i

Lecture 4 Backpropagation CMSC 35246 Chain Rule

m n Consider x ∈ R , y ∈ R m n n Let g : R → R and f : R → R Suppose y = g(x) and z = f(y), then

∂z X ∂z ∂yj = ∂x ∂y ∂x i j j i

In vector notation:

 ∂z   P ∂z ∂yj  T ∂x1 j ∂yj ∂x1 ! .  .  ∂y  .  =  .  = ∇xz = ∇yz     ∂x ∂z P ∂z ∂yj ∂xm j ∂yj ∂xm

Lecture 4 Backpropagation CMSC 35246  ∂y  Gradient of x is a multiplication of a Jacobian matrix ∂x with a vector i.e. the gradient ∇yz Backpropagation consists of applying such Jacobian-gradient products to each operation in the computational graph In general this need not only apply to vectors, but can apply to tensors w.l.o.g

Chain Rule

!T ∂y ∇ z = ∇ z x ∂x y

 ∂y  ∂x is the n × m Jacobian matrix of g

Lecture 4 Backpropagation CMSC 35246 Backpropagation consists of applying such Jacobian-gradient products to each operation in the computational graph In general this need not only apply to vectors, but can apply to tensors w.l.o.g

Chain Rule

!T ∂y ∇ z = ∇ z x ∂x y

 ∂y  ∂x is the n × m Jacobian matrix of g  ∂y  Gradient of x is a multiplication of a Jacobian matrix ∂x with a vector i.e. the gradient ∇yz

Lecture 4 Backpropagation CMSC 35246 In general this need not only apply to vectors, but can apply to tensors w.l.o.g

Chain Rule

!T ∂y ∇ z = ∇ z x ∂x y

 ∂y  ∂x is the n × m Jacobian matrix of g  ∂y  Gradient of x is a multiplication of a Jacobian matrix ∂x with a vector i.e. the gradient ∇yz Backpropagation consists of applying such Jacobian-gradient products to each operation in the computational graph

Lecture 4 Backpropagation CMSC 35246 Chain Rule

!T ∂y ∇ z = ∇ z x ∂x y

 ∂y  ∂x is the n × m Jacobian matrix of g  ∂y  Gradient of x is a multiplication of a Jacobian matrix ∂x with a vector i.e. the gradient ∇yz Backpropagation consists of applying such Jacobian-gradient products to each operation in the computational graph In general this need not only apply to vectors, but can apply to tensors w.l.o.g
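A small NumPy sketch of one Jacobian-gradient (vector-Jacobian) product, for a made-up $g(x) = (x_1 x_2,\; x_1 + x_2,\; \sin x_1)$ and an arbitrary upstream gradient $\nabla_y z$; the Jacobian here is written out by hand:

```python
import numpy as np

x = np.array([0.7, -1.2])                  # x in R^m with m = 2

# Jacobian dy/dx of g at x (n x m, here 3 x 2), entered by hand for this toy g.
J = np.array([[x[1],         x[0]],        # row for y1 = x1 * x2
              [1.0,          1.0],         # row for y2 = x1 + x2
              [np.cos(x[0]), 0.0]])        # row for y3 = sin(x1)

grad_y = np.array([0.5, -2.0, 1.0])        # some upstream gradient nabla_y z
grad_x = J.T @ grad_y                      # nabla_x z = (dy/dx)^T nabla_y z, shape (2,)
```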

Lecture 4 Backpropagation CMSC 35246 Let the gradient of z with respect to a tensor X be ∇Xz If Y = g(X) and z = f(Y), then:

X ∂z ∇ z = (∇ Y ) X X j ∂Y j j

Chain Rule

We can ofcourse also write this in terms of tensors

Lecture 4 Backpropagation CMSC 35246 If Y = g(X) and z = f(Y), then:

X ∂z ∇ z = (∇ Y ) X X j ∂Y j j

Chain Rule

We can ofcourse also write this in terms of tensors

Let the gradient of z with respect to a tensor X be ∇Xz

Lecture 4 Backpropagation CMSC 35246 Chain Rule

We can ofcourse also write this in terms of tensors

Let the gradient of z with respect to a tensor X be ∇Xz If Y = g(X) and z = f(Y), then:

X ∂z ∇ z = (∇ Y ) X X j ∂Y j j

Lecture 4 Backpropagation CMSC 35246 Let for some node x the successors be: {y1, y2, . . . yn} Node: Computation result Edge: Computation dependency n dz X dz dyi = dx dy dx i=1 i

Recursive Application in a Computational Graph

Writing an algebraic expression for the gradient of a scalar with respect to any node in the computational graph that produced that scalar is straightforward using the chain-rule

Lecture 4 Backpropagation CMSC 35246 Node: Computation result Edge: Computation dependency n dz X dz dyi = dx dy dx i=1 i

Recursive Application in a Computational Graph

Writing an algebraic expression for the gradient of a scalar with respect to any node in the computational graph that produced that scalar is straightforward using the chain-rule

Let for some node x the successors be: {y1, y2, . . . yn}

Lecture 4 Backpropagation CMSC 35246 Edge: Computation dependency n dz X dz dyi = dx dy dx i=1 i

Recursive Application in a Computational Graph

Writing an algebraic expression for the gradient of a scalar with respect to any node in the computational graph that produced that scalar is straightforward using the chain-rule

Let for some node x the successors be: {y1, y2, . . . yn} Node: Computation result

Lecture 4 Backpropagation CMSC 35246 Recursive Application in a Computational Graph

Writing an algebraic expression for the gradient of a scalar with respect to any node in the computational graph that produced that scalar is straightforward using the chain-rule

Let for some node x the successors be: {y1, y2, . . . yn} Node: Computation result Edge: Computation dependency n dz X dz dyi = dx dy dx i=1 i

Lecture 4 Backpropagation CMSC 35246 Flow Graph (for previous slide)

z ...

y1 y2 ... yn

x

...

Lecture 4 Backpropagation CMSC 35246 Compute the value of each node given its ancestors Bpropagation: Output gradient = 1 Now visit nods in reverse order Compute gradient with respect to each node using gradient with respect to successors

Successors of x in previous slide {y1, y2, . . . yn}: n dz X dz dyi = dx dy dx i=1 i

Recursive Application in a Computational Graph

Fpropagation: Visit nodes in the order after a topological sort

Lecture 4 Backpropagation CMSC 35246 Bpropagation: Output gradient = 1 Now visit nods in reverse order Compute gradient with respect to each node using gradient with respect to successors

Successors of x in previous slide {y1, y2, . . . yn}: n dz X dz dyi = dx dy dx i=1 i

Recursive Application in a Computational Graph

Fpropagation: Visit nodes in the order after a topological sort Compute the value of each node given its ancestors

Lecture 4 Backpropagation CMSC 35246 Successors of x in previous slide {y1, y2, . . . yn}: n dz X dz dyi = dx dy dx i=1 i

Recursive Application in a Computational Graph

Fpropagation: Visit nodes in the order after a topological sort Compute the value of each node given its ancestors Bpropagation: Output gradient = 1 Now visit nods in reverse order Compute gradient with respect to each node using gradient with respect to successors

Lecture 4 Backpropagation CMSC 35246 n dz X dz dyi = dx dy dx i=1 i

Recursive Application in a Computational Graph

Fpropagation: Visit nodes in the order after a topological sort Compute the value of each node given its ancestors Bpropagation: Output gradient = 1 Now visit nods in reverse order Compute gradient with respect to each node using gradient with respect to successors

Successors of x in previous slide {y1, y2, . . . yn}:

Lecture 4 Backpropagation CMSC 35246 Recursive Application in a Computational Graph

Fpropagation: Visit nodes in the order after a topological sort Compute the value of each node given its ancestors Bpropagation: Output gradient = 1 Now visit nods in reverse order Compute gradient with respect to each node using gradient with respect to successors

Successors of x in previous slide {y1, y2, . . . yn}: n dz X dz dyi = dx dy dx i=1 i

Lecture 4 Backpropagation CMSC 35246 Every node type needs to know: • How to compute its output • How to compute its with respect to its inputs given the gradient w.r.t its outputs Makes for rapid prototyping

Automatic Differentiation

Computation of the gradient can be automatically inferred from the symbolic expression of fprop

Lecture 4 Backpropagation CMSC 35246 • How to compute its output • How to compute its gradients with respect to its inputs given the gradient w.r.t its outputs Makes for rapid prototyping

Automatic Differentiation

Computation of the gradient can be automatically inferred from the symbolic expression of fprop Every node type needs to know:

Lecture 4 Backpropagation CMSC 35246 • How to compute its gradients with respect to its inputs given the gradient w.r.t its outputs Makes for rapid prototyping

Automatic Differentiation

Computation of the gradient can be automatically inferred from the symbolic expression of fprop Every node type needs to know: • How to compute its output

Lecture 4 Backpropagation CMSC 35246 Makes for rapid prototyping

Automatic Differentiation

Computation of the gradient can be automatically inferred from the symbolic expression of fprop Every node type needs to know: • How to compute its output • How to compute its gradients with respect to its inputs given the gradient w.r.t its outputs

Lecture 4 Backpropagation CMSC 35246 Automatic Differentiation

Computation of the gradient can be automatically inferred from the symbolic expression of fprop Every node type needs to know: • How to compute its output • How to compute its gradients with respect to its inputs given the gradient w.r.t its outputs Makes for rapid prototyping
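As an illustration of the two requirements above, here is a made-up mini-API (not any particular framework's) in which each node type carries a forward rule and a rule mapping the gradient w.r.t. its output to gradients w.r.t. its inputs:

```python
import numpy as np

class Multiply:
    """Node type computing z = a * b (element-wise)."""
    def forward(self, a, b):
        self.a, self.b = a, b                 # cache inputs for the backward pass
        return a * b
    def backward(self, grad_out):
        # gradients w.r.t. the inputs, given the gradient w.r.t. the output
        return grad_out * self.b, grad_out * self.a

class Tanh:
    """Node type computing z = tanh(a)."""
    def forward(self, a):
        self.out = np.tanh(a)
        return self.out
    def backward(self, grad_out):
        return (grad_out * (1 - self.out ** 2),)
```

A graph of such nodes can then be differentiated generically by calling `backward` in reverse topological order, exactly as in the previous sketch.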

Lecture 4 Backpropagation CMSC 35246 Two paths lead backwards from J to weights: Through cross entropy and through regularization cost

Computational Graph for a MLP

Figure: Goodfellow et al.

To train we want to compute ∇W (1) J and ∇W (2) J

Lecture 4 Backpropagation CMSC 35246 Computational Graph for a MLP

Figure: Goodfellow et al.

To train we want to compute ∇W (1) J and ∇W (2) J Two paths lead backwards from J to weights: Through cross entropy and through regularization cost

Lecture 4 Backpropagation CMSC 35246 Computational Graph for a MLP

Figure: Goodfellow et al.

To train we want to compute ∇W (1) J and ∇W (2) J Two paths lead backwards from J to weights: Through cross entropy and through regularization cost

Lecture 4 Backpropagation CMSC 35246 Computational Graph for a MLP

Figure: Goodfellow et al.

Weight decay cost is relatively simple: Will always contribute 2λW (i) to gradient on W (i) Two paths lead backwards from J to weights: Through cross entropy and through regularization cost

Lecture 4 Backpropagation CMSC 35246 Computational Graph for a MLP

Figure: Goodfellow et al.

Weight decay cost is relatively simple: Will always contribute 2λW (i) to gradient on W (i) Two paths lead backwards from J to weights: Through cross entropy and through regularization cost
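For example, assuming the regularizer has the standard weight-decay form $\Omega = \lambda\big(\lVert W^{(1)} \rVert_F^2 + \lVert W^{(2)} \rVert_F^2\big)$ (the exact cost is in the figure), its contribution to each gradient is:

$$\nabla_{W^{(i)}}\, \lambda \lVert W^{(i)} \rVert_F^2 = 2\lambda W^{(i)}$$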

Lecture 4 Backpropagation CMSC 35246 Instead it just adds nodes to the graph that describe how to compute derivatives A graph evaluation engine will then do the actual computation Approach taken by and TensorFlow

Symbol to Symbol

Figure: Goodfellow et al.

In this approach backpropagation never accesses any numerical values

Lecture 4 Backpropagation CMSC 35246 A graph evaluation engine will then do the actual computation Approach taken by Theano and TensorFlow

Symbol to Symbol

Figure: Goodfellow et al.

In this approach backpropagation never accesses any numerical values Instead it just adds nodes to the graph that describe how to compute derivatives

Lecture 4 Backpropagation CMSC 35246 Approach taken by Theano and TensorFlow

Symbol to Symbol

Figure: Goodfellow et al.

In this approach backpropagation never accesses any numerical values Instead it just adds nodes to the graph that describe how to compute derivatives A graph evaluation engine will then do the actual computation

Lecture 4 Backpropagation CMSC 35246 Symbol to Symbol

Figure: Goodfellow et al.

In this approach backpropagation never accesses any numerical values Instead it just adds nodes to the graph that describe how to compute derivatives A graph evaluation engine will then do the actual computation Approach taken by Theano and TensorFlow
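A hedged sketch of what this looks like in the TensorFlow 1.x style the slide refers to: `tf.gradients` adds derivative nodes to the symbolic graph, and nothing is evaluated until a session runs them.

```python
import tensorflow as tf   # TensorFlow 1.x-style API assumed

x = tf.placeholder(tf.float32)         # symbolic input
y = tf.tanh(x) * x                     # symbolic expression for y(x)
grad_y_x, = tf.gradients(y, [x])       # symbol-to-symbol: new nodes computing dy/dx

with tf.Session() as sess:
    print(sess.run(grad_y_x, feed_dict={x: 0.5}))   # evaluate the derivative node
```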

Lecture 4 Backpropagation CMSC 35246 Next time

Regularization Methods for Deep Neural Networks
