CS4501: Introduction to Computer Vision
Neural Networks (NNs)
Artificial Neural Networks (ANNs)
Multi-layer Perceptrons (MLPs)

Previous
• Softmax Classifier
• Inference vs Training
• Gradient Descent (GD)
• Stochastic Gradient Descent (SGD)
• mini-batch Stochastic Gradient Descent (SGD)
• Max-Margin Classifier
• Regression vs Classification
• Issues with Generalization / Overfitting
• Regularization / momentum

Today’s Class
Neural Networks
• The Perceptron Model
• The Multi-layer Perceptron (MLP)
• Forward-pass in an MLP (Inference)
• Backward-pass in an MLP (Backpropagation)

Perceptron Model
Frank Rosenblatt (1957) - Cornell University
[Figure: inputs $x_1, x_2, \dots, x_n$, weighted by $w_1, w_2, \dots, w_n$, feed a single unit followed by an activation function.]
Activation function:
$$f(x) = \begin{cases} 1, & \text{if } \sum_{i=1}^{n} w_i x_i + b > 0 \\ 0, & \text{otherwise} \end{cases}$$
More: https://en.wikipedia.org/wiki/Perceptron

Activation Functions
• Step(x)
• Sigmoid(x)
• Tanh(x)
• ReLU(x) = max(0, x)

Two-layer Multi-layer Perceptron (MLP)
[Figure: inputs $x_1, \dots, x_4$ feed a "hidden" layer of units $a_1, \dots, a_4$, which feeds the output $\hat{y}_1$; a Loss / Criterion compares $\hat{y}_1$ with the label $y_1$.]

Linear Softmax
$$x_i = [x_{i1}\ x_{i2}\ x_{i3}\ x_{i4}] \qquad y_i = [1\ 0\ 0] \qquad \hat{y}_i = [f_1\ f_2\ f_3]$$
$$g_1 = w_{11}x_{i1} + w_{12}x_{i2} + w_{13}x_{i3} + w_{14}x_{i4} + b_1$$
$$g_2 = w_{21}x_{i1} + w_{22}x_{i2} + w_{23}x_{i3} + w_{24}x_{i4} + b_2$$
$$g_3 = w_{31}x_{i1} + w_{32}x_{i2} + w_{33}x_{i3} + w_{34}x_{i4} + b_3$$
$$W = \begin{bmatrix} w_{11} & w_{12} & w_{13} & w_{14} \\ w_{21} & w_{22} & w_{23} & w_{24} \\ w_{31} & w_{32} & w_{33} & w_{34} \end{bmatrix} \qquad b = [b_1\ b_2\ b_3]$$
$$f_1 = e^{g_1} / (e^{g_1} + e^{g_2} + e^{g_3})$$
$$f_2 = e^{g_2} / (e^{g_1} + e^{g_2} + e^{g_3})$$
$$f_3 = e^{g_3} / (e^{g_1} + e^{g_2} + e^{g_3})$$
Linear Softmax
$$x_i = [x_{i1}\ x_{i2}\ x_{i3}\ x_{i4}] \qquad y_i = [1\ 0\ 0] \qquad \hat{y}_i = [f_1\ f_2\ f_3]$$
$$W = \begin{bmatrix} w_{11} & w_{12} & w_{13} & w_{14} \\ w_{21} & w_{22} & w_{23} & w_{24} \\ w_{31} & w_{32} & w_{33} & w_{34} \end{bmatrix} \qquad b = [b_1\ b_2\ b_3]$$
$$g = W x_i^T + b^T$$
$$f = \text{softmax}(g)$$
$$f = \text{softmax}(W x_i^T + b^T)$$

Two-layer MLP + Softmax
$$a_1 = \text{sigmoid}(W_{[1]} x_i^T + b_{[1]}^T)$$
$$f = \text{softmax}(W_{[2]} a_1^T + b_{[2]}^T)$$

N-layer MLP + Softmax
$$a_1 = \text{sigmoid}(W_{[1]} x_i^T + b_{[1]}^T)$$
$$a_2 = \text{sigmoid}(W_{[2]} a_1^T + b_{[2]}^T)$$
$$\dots$$
$$a_k = \text{sigmoid}(W_{[k]} a_{k-1}^T + b_{[k]}^T)$$
$$\dots$$
$$f = \text{softmax}(W_{[n]} a_{n-1}^T + b_{[n]}^T)$$

Forward pass (Forward-propagation)
[Figure: the two-layer MLP with inputs $x_1, \dots, x_4$, hidden units $a_1, \dots, a_4$, output $\hat{y}_1$, and label $y_1$.]
$$z_j = \sum_{i=1}^{n} w_{1ij} x_i + b_{1j} \qquad a_j = \text{Sigmoid}(z_j)$$
$$p_1 = \sum_{j=1}^{n} w_{2j} a_j + b_2 \qquad \hat{y}_1 = \text{Sigmoid}(p_1)$$
$$Loss = L(y_1, \hat{y}_1)$$

How to train the parameters?
We can still use SGD.
We need the gradient of the loss with respect to every weight and bias in every layer $k$:
$$\frac{\partial L}{\partial W_{[k]ij}} \qquad \frac{\partial L}{\partial b_{[k]i}} \qquad \text{where } L = \text{Loss}(f, y)$$
How to train the parameters?
With $a_k = \text{sigmoid}(W_{[k]} a_{k-1}^T + b_{[k]}^T)$, $f = \text{softmax}(W_{[n]} a_{n-1}^T + b_{[n]}^T)$ and $L = \text{Loss}(f, y)$, the chain rule gives:
$$\frac{\partial L}{\partial W_{[k]ij}} = \frac{\partial L}{\partial a_{n-1}} \frac{\partial a_{n-1}}{\partial a_{n-2}} \cdots \frac{\partial a_{k+1}}{\partial a_k} \frac{\partial a_k}{\partial W_{[k]ij}}$$

Backward pass (Back-propagation)
[Figure: the two-layer MLP with inputs $x_1, \dots, x_4$, hidden units $a_1, \dots, a_4$, output $\hat{y}_1$, and label $y_1$.]
Gradients are computed from the loss back toward the input:
$$\frac{\partial L}{\partial \hat{y}_1} = \frac{\partial}{\partial \hat{y}_1} L(y_1, \hat{y}_1)$$
$$\frac{\partial L}{\partial p_1} = \left( \frac{\partial}{\partial p_1} \text{Sigmoid}(p_1) \right) \frac{\partial L}{\partial \hat{y}_1}$$
$$\frac{\partial L}{\partial w_{2j}} = \frac{\partial p_1}{\partial w_{2j}} \frac{\partial L}{\partial p_1}$$
$$\frac{\partial L}{\partial a_j} = \left( \frac{\partial}{\partial a_j} \Big( \sum_{k=1}^{n} w_{2k} a_k + b_2 \Big) \right) \frac{\partial L}{\partial p_1}$$
$$\frac{\partial L}{\partial z_j} = \left( \frac{\partial}{\partial z_j} \text{Sigmoid}(z_j) \right) \frac{\partial L}{\partial a_j}$$
$$\frac{\partial L}{\partial w_{1ij}} = \frac{\partial z_j}{\partial w_{1ij}} \frac{\partial L}{\partial z_j}$$
$$\frac{\partial L}{\partial x_i} = \sum_{j=1}^{n} \left( \frac{\partial}{\partial x_i} \Big( \sum_{k=1}^{n} w_{1kj} x_k + b_{1j} \Big) \right) \frac{\partial L}{\partial z_j}$$

(mini-batch) Stochastic Gradient Descent (SGD)
For the Softmax Classifier:
$$\ell(w, b) = \sum_{i \in B} -\log f_{i,label}(w, b)$$
λ = 0.01
Initialize w and b randomly
for e = 0, num_epochs do
    for t = 0, num_batches do
        Compute: ∂ℓ(w, b)/∂w and ∂ℓ(w, b)/∂b
        Update w: w = w − λ ∂ℓ(w, b)/∂w
        Update b: b = b − λ ∂ℓ(w, b)/∂b
        Print: ℓ(w, b) // Useful to see if this is becoming smaller or not.
    end
end

Automatic Differentiation
You only need to write code for the forward pass; the backward pass is computed automatically.
Frameworks such as PyTorch "record" the operations performed on tensors and compute gradients through the recorded operations when requested.
PyTorch (Facebook -- mostly): https://pytorch.org/
TensorFlow (Google -- mostly): https://www.tensorflow.org/
DyNet (team includes UVA Prof. Yangfeng Ji): http://dynet.io/

Example
• Provided in Assignment 3 and Assignment 4.
• Let’s dissect Assignment 3…

Defining a Linear Softmax classifier
Using a Linear Softmax classifier
Training a Linear Softmax classifier
What is trainLoader?
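The assignment code shown on these slides is not reproduced in this text. As a rough stand-in, the sketch below runs the mini-batch SGD loop from the lecture on a linear softmax classifier in plain Python, without any framework. Everything in it is an assumption made for illustration: the helper names (`softmax`, `predict`, `make_batches`, `train`), the toy data, and the learning rate. `make_batches` is a hypothetical helper that plays the role a data loader such as `trainLoader` presumably plays (handing out shuffled mini-batches); it is not the assignment's implementation.

```python
import math
import random

def softmax(scores):
    """Turn a list of scores g into probabilities f (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(W, b, x):
    """f = softmax(W x + b) for a single input vector x."""
    scores = [sum(w * xj for w, xj in zip(row, x)) + bk for row, bk in zip(W, b)]
    return softmax(scores)

def make_batches(data, batch_size, rng):
    """Hypothetical stand-in for a data loader: shuffle, then yield mini-batches."""
    data = data[:]
    rng.shuffle(data)
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]

def train(data, num_classes, num_features, lr=0.01, num_epochs=50, batch_size=2, seed=0):
    rng = random.Random(seed)
    # Initialize w and b randomly, as in the pseudocode.
    W = [[rng.uniform(-0.1, 0.1) for _ in range(num_features)] for _ in range(num_classes)]
    b = [rng.uniform(-0.1, 0.1) for _ in range(num_classes)]
    history = []
    for epoch in range(num_epochs):
        epoch_loss = 0.0
        for batch in make_batches(data, batch_size, rng):
            dW = [[0.0] * num_features for _ in range(num_classes)]
            db = [0.0] * num_classes
            for x, label in batch:
                f = predict(W, b, x)
                epoch_loss += -math.log(f[label])
                for k in range(num_classes):
                    # d(-log f_label)/d g_k = f_k - 1[k == label]
                    delta = f[k] - (1.0 if k == label else 0.0)
                    db[k] += delta
                    for j in range(num_features):
                        dW[k][j] += delta * x[j]
            # SGD update: parameter = parameter - lr * gradient
            for k in range(num_classes):
                b[k] -= lr * db[k]
                for j in range(num_features):
                    W[k][j] -= lr * dW[k][j]
        history.append(epoch_loss)
        if epoch % 50 == 0:
            print(f"epoch {epoch}: loss {epoch_loss:.4f}")  # should shrink over time
    return W, b, history

# Made-up toy data: 4 features, 3 classes, (input, label) pairs.
toy_data = [
    ([1.0, 0.0, 0.0, 0.2], 0), ([0.9, 0.1, 0.0, 0.1], 0),
    ([0.0, 1.0, 0.1, 0.0], 1), ([0.1, 0.8, 0.0, 0.2], 1),
    ([0.0, 0.1, 1.0, 0.1], 2), ([0.2, 0.0, 0.9, 0.0], 2),
]
# A larger learning rate than the slide's 0.01 is assumed here so the toy run converges quickly.
W, b, history = train(toy_data, num_classes=3, num_features=4, lr=0.5, num_epochs=200)
```

The gradient is written by hand here; with an automatic differentiation framework only the forward computation and the loss would need to be coded.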
Training a Linear Softmax classifier (improved)
This depends on the model but we don’t need it anymore

Defining a Two-layer Neural Network

Questions?
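The code for the "Defining a Two-layer Neural Network" slide is not included in this text. As a framework-free sketch of the forward and backward passes covered in the lecture, the block below implements a two-layer network with a single sigmoid output and checks one hand-derived gradient against a finite-difference estimate. The squared-error loss, the parameter values, and all function names are assumptions chosen for illustration; the lecture leaves the loss L generic.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def forward(params, x):
    """Forward pass: z_j -> a_j -> p -> y_hat."""
    w1, b1, w2, b2 = params
    z = [sum(w1[i][j] * x[i] for i in range(len(x))) + b1[j] for j in range(len(b1))]
    a = [sigmoid(zj) for zj in z]
    p = sum(w2[j] * a[j] for j in range(len(a))) + b2
    return a, sigmoid(p)

def loss(y, y_hat):
    # Assumed loss: squared error keeps the sketch short.
    return 0.5 * (y_hat - y) ** 2

def backward(params, x, y):
    """Backward pass: apply the chain rule from the loss back to the first layer."""
    w1, b1, w2, b2 = params
    a, y_hat = forward(params, x)
    dL_dyhat = y_hat - y                              # dL/dy_hat for squared error
    dL_dp = y_hat * (1.0 - y_hat) * dL_dyhat          # through the output sigmoid
    dL_dw2 = [a[j] * dL_dp for j in range(len(a))]    # dp/dw2_j = a_j
    dL_db2 = dL_dp
    dL_da = [w2[j] * dL_dp for j in range(len(a))]    # dp/da_j = w2_j
    dL_dz = [a[j] * (1.0 - a[j]) * dL_da[j] for j in range(len(a))]
    dL_dw1 = [[x[i] * dL_dz[j] for j in range(len(a))] for i in range(len(x))]
    dL_db1 = dL_dz
    return dL_dw1, dL_db1, dL_dw2, dL_db2

def numeric_dL_dw1(params, x, y, i, j, eps=1e-6):
    """Central finite-difference estimate of dL/dw1[i][j]."""
    w1, b1, w2, b2 = params
    def loss_with(delta):
        shifted = [row[:] for row in w1]
        shifted[i][j] += delta
        return loss(y, forward((shifted, b1, w2, b2), x)[1])
    return (loss_with(eps) - loss_with(-eps)) / (2 * eps)

# Made-up parameters: 4 inputs and 4 hidden units, as in the lecture's figure.
w1 = [[0.1, -0.2, 0.3, 0.0],
      [0.4, 0.1, -0.3, 0.2],
      [-0.1, 0.2, 0.0, 0.5],
      [0.3, -0.4, 0.1, 0.1]]
b1 = [0.0, 0.1, -0.1, 0.05]
w2 = [0.5, -0.3, 0.2, 0.1]
b2 = 0.0
params = (w1, b1, w2, b2)
x, y = [1.0, 0.5, -1.0, 2.0], 1.0

dL_dw1, dL_db1, dL_dw2, dL_db2 = backward(params, x, y)
analytic = dL_dw1[0][1]
numeric = numeric_dL_dw1(params, x, y, 0, 1)
print("analytic:", analytic, "numeric:", numeric)
```

Comparing a hand-written backward pass against finite differences in this way is a common sanity check before trusting the gradients in a training loop.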