
CS4501: Introduction to Vision
Neural Networks (NNs) / Artificial Neural Networks (ANNs) / Multi-layer Perceptrons (MLPs)

Previous Class

• Softmax Classifier
• Inference vs Training
• Gradient Descent (GD)
• Stochastic Gradient Descent (SGD)
• Mini-batch Stochastic Gradient Descent (SGD)
• Max-Margin Classifier
• Regression vs Classification
• Issues with Generalization / Overfitting
• Regularization / Momentum

Today's Class

Neural Networks
• The Perceptron Model
• The Multi-layer Perceptron (MLP)
• Forward-pass in an MLP (Inference)
• Backward-pass in an MLP (Training)

Perceptron Model

Frank Rosenblatt (1957) - Cornell University

": .:

- .; "; 1, if ) .*"* + 0 > 0 ! " = $ *+, . ) 0, otherwise < " < .=

"=

More: https://en.wikipedia.org/wiki/Perceptron

Activation Functions

Step(x)   Sigmoid(x)   Tanh(x)   ReLU(x) = max(0, x)
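The same four activations written out in numpy, so their behavior is easy to probe; `step` is the perceptron's original threshold, the others are its smooth or piecewise-linear successors.

```python
import numpy as np

def step(x):    return (x > 0).astype(float)      # perceptron threshold
def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))   # squashes to (0, 1)
def tanh(x):    return np.tanh(x)                 # squashes to (-1, 1)
def relu(x):    return np.maximum(0.0, x)         # max(0, x)

x = np.linspace(-3, 3, 7)
for f in (step, sigmoid, tanh, relu):
    print(f.__name__, np.round(f(x), 2))
```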

”hidden" layer & Loss / Criterion ! '" " & ! '# # & )(" )" ' !$ $ & ! '% %

Linear Softmax

$x_i = [x_{i1} \; x_{i2} \; x_{i3} \; x_{i4}]$ \qquad $y_i = [1 \; 0 \; 0]$ \qquad $\hat{y}_i = [f_1 \; f_2 \; f_3]$

One linear score per class:

$g_1 = w_{11} x_{i1} + w_{12} x_{i2} + w_{13} x_{i3} + w_{14} x_{i4} + b_1$
$g_2 = w_{21} x_{i1} + w_{22} x_{i2} + w_{23} x_{i3} + w_{24} x_{i4} + b_2$
$g_3 = w_{31} x_{i1} + w_{32} x_{i2} + w_{33} x_{i3} + w_{34} x_{i4} + b_3$

The softmax turns the scores into probabilities:

$f_1 = e^{g_1} / (e^{g_1} + e^{g_2} + e^{g_3})$
$f_2 = e^{g_2} / (e^{g_1} + e^{g_2} + e^{g_3})$
$f_3 = e^{g_3} / (e^{g_1} + e^{g_2} + e^{g_3})$

Collecting the weights and biases,

$$W = \begin{bmatrix} w_{11} & w_{12} & w_{13} & w_{14} \\ w_{21} & w_{22} & w_{23} & w_{24} \\ w_{31} & w_{32} & w_{33} & w_{34} \end{bmatrix} \qquad b = [b_1 \; b_2 \; b_3]$$

the scores become $g = W x_i^\top + b^\top$, the prediction is $f = \mathrm{softmax}(g)$, and the whole classifier is one expression:

$$\hat{y}_i = \mathrm{softmax}(W x_i^\top + b^\top)$$
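A sketch of this classifier in numpy with the slide's shapes (4 input features, 3 classes); random weights stand in for trained ones, and subtracting the max before exponentiating is a standard numerical-stability trick.

```python
import numpy as np

def softmax(g):
    g = g - g.max()                 # stabilize the exponentials
    e = np.exp(g)
    return e / e.sum()

W = np.random.randn(3, 4)           # one row of weights per class
b = np.random.randn(3)              # one bias per class

x_i = np.array([0.2, -1.0, 0.5, 3.0])   # a single input, as in the slide
g = W @ x_i + b                     # class scores g = W x_i^T + b^T
y_hat = softmax(g)                  # y_hat_i = softmax(W x_i^T + b^T)
print(y_hat, y_hat.sum())           # probabilities summing to 1
```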

Two-layer MLP + Softmax

$x_i = [x_{i1} \; x_{i2} \; x_{i3} \; x_{i4}]$ \qquad $y_i = [1 \; 0 \; 0]$ \qquad $\hat{y}_i = [f_1 \; f_2 \; f_3]$

$a_1 = \mathrm{sigmoid}(W_{[1]} x_i^\top + b_{[1]}^\top)$
$\hat{y}_i = \mathrm{softmax}(W_{[2]} a_1 + b_{[2]}^\top)$
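The same two equations as code; the hidden width of 5 is an arbitrary choice for the sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(g):
    e = np.exp(g - g.max())
    return e / e.sum()

W1, b1 = np.random.randn(5, 4), np.zeros(5)   # layer 1: 4 inputs -> 5 hidden
W2, b2 = np.random.randn(3, 5), np.zeros(3)   # layer 2: 5 hidden -> 3 classes

x_i = np.array([0.2, -1.0, 0.5, 3.0])
a1 = sigmoid(W1 @ x_i + b1)        # a_1 = sigmoid(W[1] x_i^T + b[1]^T)
y_hat = softmax(W2 @ a1 + b2)      # y_hat = softmax(W[2] a_1 + b[2]^T)
print(y_hat)
```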

N-layer MLP + Softmax

$x_i = [x_{i1} \; x_{i2} \; x_{i3} \; x_{i4}]$ \qquad $y_i = [1 \; 0 \; 0]$ \qquad $\hat{y}_i = [f_1 \; f_2 \; f_3]$

$a_1 = \mathrm{sigmoid}(W_{[1]} x_i^\top + b_{[1]}^\top)$
$a_2 = \mathrm{sigmoid}(W_{[2]} a_1 + b_{[2]}^\top)$
$\vdots$
$a_n = \mathrm{sigmoid}(W_{[n]} a_{n-1} + b_{[n]}^\top)$
$\vdots$
$\hat{y}_i = \mathrm{softmax}(W_{[N]} a_{N-1} + b_{[N]}^\top)$
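The N-layer stack is the same step repeated in a loop; the layer sizes below are arbitrary choices for the sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(g):
    e = np.exp(g - g.max())
    return e / e.sum()

sizes = [4, 8, 8, 3]                              # input, two hidden, output
Ws = [np.random.randn(o, i) for i, o in zip(sizes, sizes[1:])]
bs = [np.zeros(o) for o in sizes[1:]]

a = np.array([0.2, -1.0, 0.5, 3.0])               # a_0 = x_i
for W, b in zip(Ws[:-1], bs[:-1]):
    a = sigmoid(W @ a + b)                        # a_n = sigmoid(W[n] a_{n-1} + b[n])
y_hat = softmax(Ws[-1] @ a + bs[-1])              # softmax on the last layer
print(y_hat)
```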

$" = [$"& $"' $"( $")] !" = [1 0 0] !+" = [,- ,. ,/]

9 9 0& = 1234526(8[&]$ + ;[&]) 9 9 0' = 1234526(8[']0& + ;[']) … 9 9 0A = 1234526(8[A]0A?& + ;[A])

9 9 , = 15,=40$(8[>]0>?& + ;[>]) 16 Forward pass (Forward-propagation)

/ !+ = 4567859(*+) *+ = & 0"+1'+ + 3" +-.

& / <" = & 0#+!+ + 3# +-. )" = 4567859(<+) ! '" " & ! '# # & )(" )" ' !$ $ & !% '% =8>> = =()", )(")

How to train the parameters?

We can still use SGD. What we need are the gradients of the loss

$\ell = \mathit{Loss}(\hat{y}, y)$

with respect to every parameter:

$\dfrac{\partial \ell}{\partial w_{[n]ij}}$ and $\dfrac{\partial \ell}{\partial b_{[n]i}}$ for every layer $n$.

By the chain rule, each gradient factors through the activations of all the later layers:

$$\frac{\partial \ell}{\partial w_{[n]ij}} = \frac{\partial \ell}{\partial a_{N-1}} \cdot \frac{\partial a_{N-1}}{\partial a_{N-2}} \cdots \frac{\partial a_{n+1}}{\partial a_n} \cdot \frac{\partial a_n}{\partial w_{[n]ij}}$$

Backward pass (Back-propagation)

Apply the chain rule from the loss $\ell = L(y, \hat{y})$ back through the network:

Output layer:
$\dfrac{\partial \ell}{\partial s_k} = \dfrac{\partial \ell}{\partial \hat{y}_k} \cdot \dfrac{\partial}{\partial s_k} \mathrm{sigmoid}(s_k)$ \qquad $\dfrac{\partial \ell}{\partial w^{[2]}_{kj}} = \dfrac{\partial \ell}{\partial s_k} \cdot \dfrac{\partial s_k}{\partial w^{[2]}_{kj}} = \dfrac{\partial \ell}{\partial s_k} \, a_j$

Hidden layer:
$\dfrac{\partial \ell}{\partial a_j} = \sum_k \dfrac{\partial \ell}{\partial s_k} \, w^{[2]}_{kj}$ \qquad $\dfrac{\partial \ell}{\partial z_j} = \dfrac{\partial \ell}{\partial a_j} \cdot \dfrac{\partial}{\partial z_j} \mathrm{sigmoid}(z_j)$ \qquad $\dfrac{\partial \ell}{\partial w^{[1]}_{ji}} = \dfrac{\partial \ell}{\partial z_j} \, x_i$

[Figure: the same two-layer MLP diagram, with the gradients flowing backward.]
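The same derivatives as code, for a single example; this reuses the cached values from the forward sketch above and the squared-error stand-in loss, for which $\partial \ell / \partial \hat{y} = \hat{y} - y$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backward(x, y, z, a, s, y_hat, W2):
    # Output layer: dL/ds_k = dL/dy_hat_k * sigmoid'(s_k)
    ds = (y_hat - y) * sigmoid(s) * (1.0 - sigmoid(s))
    dW2 = np.outer(ds, a)            # dL/dw2_kj = dL/ds_k * a_j
    db2 = ds                         # dL/db2_k  = dL/ds_k
    # Hidden layer: sum contributions from every output unit.
    da = W2.T @ ds                   # dL/da_j = sum_k dL/ds_k * w2_kj
    dz = da * sigmoid(z) * (1.0 - sigmoid(z))   # dL/dz_j = dL/da_j * sigmoid'(z_j)
    dW1 = np.outer(dz, x)            # dL/dw1_ji = dL/dz_j * x_i
    db1 = dz
    return dW1, db1, dW2, db2
```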

(mini-batch) Stochastic Gradient Descent (SGD)

$\lambda = 0.01$

For the Softmax Classifier: $\ell(w, b) = \sum_{i \in B} -\log f_{y_i}(w, b)$, summed over the current mini-batch $B$.

Initialize w and b randomly

for e = 0, num_epochs do
  for b = 0, num_batches do
    Compute: $\partial \ell(w, b)/\partial w$ and $\partial \ell(w, b)/\partial b$
    Update w: $w = w - \lambda \, \partial \ell(w, b)/\partial w$
    Update b: $b = b - \lambda \, \partial \ell(w, b)/\partial b$
    Print: $\ell(w, b)$ // Useful to see if this is becoming smaller or not.
  end
end
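A runnable version of this pseudocode for the softmax classifier, with random toy data standing in for a real dataset; it uses the fact that the cross-entropy gradient with respect to the scores is (probabilities − one-hot labels).

```python
import numpy as np

def softmax(G):
    e = np.exp(G - G.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Toy data: 100 points, 4 features, 3 classes (random, just for the sketch).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = rng.integers(0, 3, size=100)

W, b, lam = np.zeros((3, 4)), np.zeros(3), 0.01
for epoch in range(50):
    for start in range(0, len(X), 20):            # mini-batches of 20
        xb, yb = X[start:start+20], y[start:start+20]
        P = softmax(xb @ W.T + b)                 # forward: probabilities
        P[np.arange(len(xb)), yb] -= 1            # dl/dg = probs - one-hot
        dW = P.T @ xb / len(xb)                   # dl/dW, averaged over batch
        db = P.mean(axis=0)                       # dl/db
        W -= lam * dW                             # w = w - λ dl/dw
        b -= lam * db                             # b = b - λ dl/db
    loss = -np.log(softmax(X @ W.T + b)[np.arange(len(X)), y]).mean()
    print(epoch, loss)                            # should shrink over epochs
```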

You only need to write code for the forward pass; the backward pass is computed automatically.

Frameworks such as PyTorch will "record" the operations performed on tensors and compute gradients through the "recorded" operations when requested.
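What this "recording" looks like in PyTorch: only the forward computation is written on tensors marked with `requires_grad=True`; `backward()` then fills in the gradients.

```python
import torch

W = torch.randn(3, 4, requires_grad=True)   # parameters are "watched"
b = torch.zeros(3, requires_grad=True)

x = torch.randn(4)
y = torch.tensor(0)                          # true class index

g = W @ x + b                                # forward pass only
loss = torch.nn.functional.cross_entropy(g.unsqueeze(0), y.unsqueeze(0))

loss.backward()                              # gradients computed automatically
print(W.grad.shape, b.grad.shape)            # dloss/dW, dloss/db
```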

PyTorch (Facebook -- mostly): https://pytorch.org/

TensorFlow (Google -- mostly): https://www.tensorflow.org/

DyNet (team includes UVA Prof. Yangfeng Ji): http://dynet.io/

Example

• Provided in Assignment 3 and Assignment 4.

• Let's dissect Assignment 3… (the slides step through the items below; a sketch follows this list)

Defining a Linear Softmax classifier
Using a Linear Softmax classifier
Training a Linear Softmax classifier
What is trainLoader?
Training a Linear Softmax classifier (improved)
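The assignment's actual code is not reproduced in these notes, so the following is a sketch of what those steps typically look like in PyTorch. `trainLoader` follows the slide's name; the dataset (MNIST), sizes, and hyperparameters are assumed stand-ins.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Defining a Linear Softmax classifier: one Linear layer; the softmax is
# folded into the CrossEntropyLoss criterion.
class LinearSoftmax(nn.Module):
    def __init__(self, in_features=784, num_classes=10):
        super().__init__()
        self.fc = nn.Linear(in_features, num_classes)

    def forward(self, x):
        return self.fc(x.view(x.size(0), -1))   # flatten image to a vector

# trainLoader batches and shuffles the training set (MNIST assumed here).
trainLoader = DataLoader(
    datasets.MNIST('data', train=True, download=True,
                   transform=transforms.ToTensor()),
    batch_size=64, shuffle=True)

model = LinearSoftmax()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Training a Linear Softmax classifier.
for epoch in range(5):
    for images, labels in trainLoader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()                 # autograd does the backward pass
        optimizer.step()                # SGD update of W and b
    print(epoch, loss.item())
```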

This depends on the model, but we don't need it anymore.

Defining a Two-layer Neural Network
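Again a sketch rather than the assignment's code: the two-layer network swaps the single Linear layer for Linear → activation → Linear. The improved training loop above works unchanged, which is the sense in which the loop no longer depends on the model.

```python
import torch.nn as nn

class TwoLayerNN(nn.Module):
    def __init__(self, in_features=784, hidden=128, num_classes=10):
        super().__init__()
        self.fc1 = nn.Linear(in_features, hidden)
        self.act = nn.Sigmoid()          # or nn.ReLU(); sizes are assumed
        self.fc2 = nn.Linear(hidden, num_classes)

    def forward(self, x):
        x = x.view(x.size(0), -1)        # flatten
        return self.fc2(self.act(self.fc1(x)))
```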

Questions?