Machine Learning and Computer Vision Group
Deep Learning with TensorFlow http://cvml.ist.ac.at/courses/DLWT_W18
Lecture 3: Artificial Neural Networks (Multilayer Perceptrons)
Artificial Neural Networks
Multi-layer Perceptrons
Lars Bollmann

Contents
01 Introduction: Brief history, the biological neuron, and artificial neural networks
02 Perceptron: Features & drawbacks
03 Multi-layer Perceptron: Description & training
04 Implementation in TensorFlow: High-Level API and plain TensorFlow

Brief history
(image from [1])
01 Introduction 02 Perceptron 03 Multi-layer Perceptron 04 TensorFlow

Biological neuron
(image from [2])
(images from [3] and [4])
Artificial neural network (ANN)

McCulloch & Pitts (1943)

[Figure 1 from [5]: diagrams of McCulloch-Pitts nets computing, among others, logical OR, logical AND, the heat illusion, and "learning". Caption: "The neuron c_i is always marked with the numeral i upon the body of the cell, and the corresponding action is denoted by 'N' with i as subscript, as in the text." All images from [5].]

The Perceptron
Rosenblatt (1957)

Features:
• Linear threshold unit (inputs x_1, x_2 with weights w_1,1, w_2,1 and bias b):

  y = 1 if w · x + b > 0, else 0

• Update rule for the weights:

  w_i,j(next step) = w_i,j + η (y_j − ŷ_j) x_i

• Converges if the training instances are linearly separable

Drawbacks:
• Still a linear classifier
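The update rule above can be sketched in a few lines of plain Python. This is a minimal illustration (not the lecture's demo code), trained on the logical AND function, which is linearly separable, so the rule is guaranteed to converge:

```python
# Minimal perceptron sketch: a linear threshold unit trained with
#   w_i <- w_i + eta * (y - y_hat) * x_i
# on the logical AND function (linearly separable).

def step(z):
    """Linear threshold unit: output 1 if w.x + b > 0, else 0."""
    return 1 if z > 0 else 0

def predict(w, b, x):
    return step(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train(data, eta=1, epochs=20):
    w, b = [0, 0], 0
    for _ in range(epochs):
        for x, y in data:
            err = y - predict(w, b, x)   # (y - y_hat) from the update rule
            w = [wi + eta * err * xi for wi, xi in zip(w, x)]
            b += eta * err
    return w, b

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train(AND)
print([predict(w, b, x) for x, _ in AND])  # [0, 0, 0, 1]
```

Trying the same sketch on XOR, which is not linearly separable, shows the drawback: the weights never settle on a separating line.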
Multi-layer Perceptron (MLP)
Input layer, hidden layer, output layer (diagram: inputs x_1, x_2 feed a hidden layer of summing units, which feed outputs y_1, y_2, y_3)

• Activation of one neuron:

  a_i^(L) = f_activation( Σ_j w_(i,j) a_j^(L−1) + b )

• Alternative activation functions: see [6]
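The activation formula can be applied layer by layer to compute the output of the whole network. A hedged sketch (not the lecture's code), with an arbitrarily chosen toy network of 2 inputs, 3 hidden neurons, and 2 outputs:

```python
# Forward pass of a small MLP: each neuron computes
#   a_i^(L) = f(sum_j w_(i,j) * a_j^(L-1) + b_i)
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(activations_prev, weights, biases, f=sigmoid):
    """One layer: weights[i][j] connects neuron j of the previous
    layer to neuron i of this layer."""
    return [f(sum(w * a for w, a in zip(row, activations_prev)) + b)
            for row, b in zip(weights, biases)]

# Toy network with hand-picked weights (purely illustrative values)
x = [0.5, -1.0]
hidden = layer(x, weights=[[0.1, 0.4], [-0.3, 0.8], [0.5, 0.5]],
               biases=[0.0, 0.1, -0.2])
output = layer(hidden, weights=[[0.7, -0.6, 0.2], [0.9, 0.1, -0.4]],
               biases=[0.05, -0.05])
print(output)  # two sigmoid activations, each in (0, 1)
```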
Training
Input layer, hidden layer, output layer (diagram: each output ŷ_i is compared to its target y_i via the squared error (ŷ_i − y_i)²)

• Activation of one neuron:

  a_i^(L) = f_activation( Σ_j w_(i,j) a_j^(L−1) + b )

• Find the optimal weights & bias terms to reduce a cost function (e.g. MSE, cross-entropy)
• Weights: control the influence of the neurons in the previous layer
• Bias terms: nudge a neuron towards being active/inactive
Training
Minimize the cost function using gradient descent (GD):

• Initialize the weights & bias terms with random values
• Average the cost over all m instances:

  C = (1/m) Σ_{i=1}^{m} C_i

• Use mini-batches to compute the gradient with respect to the weights and biases
• Take a gradient descent step
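The loop on this slide (random initialization, mini-batch gradient, descent step) can be sketched on a deliberately tiny model: a single weight w fitting y = 2x under squared-error cost. The data and hyperparameters are invented for illustration:

```python
# Mini-batch gradient descent on a one-parameter model y_hat = w * x,
# cost per instance C_i = (w * x_i - y_i)^2.
import random

random.seed(0)
data = [(x, 2.0 * x) for x in [0.1 * i for i in range(1, 21)]]  # y = 2x

w = random.uniform(-1, 1)        # initialize with a random value
eta, batch_size = 0.1, 5
for epoch in range(100):
    random.shuffle(data)
    for k in range(0, len(data), batch_size):
        batch = data[k:k + batch_size]
        # gradient dC/dw averaged over the mini-batch
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        w -= eta * grad          # gradient descent step
print(round(w, 3))  # close to 2.0
```

Averaging the gradient over a mini-batch rather than the full training set is what makes each step cheap while still pointing roughly downhill.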
Training
Backpropagation (diagram: one neuron with weight w^(L) and bias b, connecting activation a^(L−1) to a^(L)):

• Forward pass: compute the output of all neurons for a training instance x
• Backward pass: compute the partial derivatives for each weight and bias

• Cost for one instance: C_0 = (a^(L) − y)²
• Activation: a^(L) = f_act(z^(L)) with z^(L) = w^(L) a^(L−1) + b^(L)
• Derivative of C_0 with respect to w^(L) (chain rule):

  ∂C_0/∂w^(L) = ∂z^(L)/∂w^(L) · ∂a^(L)/∂z^(L) · ∂C_0/∂a^(L)

• Average over the mini-batch:

  ∂C_mb/∂w^(L) = (1/m_mb) Σ_{i=1}^{m_mb} ∂C_i/∂w^(L)

• Combine the partial derivatives into the gradient vector
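The chain rule above can be verified directly for a single neuron. A hedged sketch with arbitrarily chosen values for w, b, a^(L−1), and y, checked against a numerical derivative:

```python
# Chain rule for one neuron:
#   z = w * a_prev + b,  a = sigmoid(z),  C0 = (a - y)^2
#   dC0/dw = dz/dw * da/dz * dC0/da
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

a_prev, w, b, y = 0.8, 0.5, -0.1, 1.0   # illustrative values

z = w * a_prev + b
a = sigmoid(z)

dz_dw = a_prev                             # dz/dw
da_dz = sigmoid(z) * (1 - sigmoid(z))      # sigmoid derivative
dC_da = 2 * (a - y)                        # dC0/da for C0 = (a - y)^2
grad = dz_dw * da_dz * dC_da               # chain rule

# Numerical check: central difference (C0(w+h) - C0(w-h)) / (2h)
def cost(w_):
    return (sigmoid(w_ * a_prev + b) - y) ** 2

h = 1e-6
numeric = (cost(w + h) - cost(w - h)) / (2 * h)
print(abs(grad - numeric) < 1e-8)  # True
```

Backpropagation applies exactly this factorization neuron by neuron, reusing the intermediate terms from the forward pass.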
TensorFlow
High-Level API:
• TF.Learn
• DNNClassifier
• Parameters: # layers, # neurons per layer, batch size, # iterations, activation function

Plain TensorFlow:
1. Construction phase
2. Training phase
3. Using the trained network

→ Demo in Jupyter
Hyperparameters
# hidden layers:
• More layers: exponentially fewer neurons are needed for complex functions; networks converge faster, generalize better, and layers can be reused
• Start with few layers and increase the number

# neurons per layer:
• Funnel or constant size
• Somewhat of a "black art"
• Too many neurons cause overfitting

Activation function:
• ReLU is a good choice in general
• Softmax for the output layer when the classes are mutually exclusive
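The two activation choices recommended above are cheap to write out. A small sketch (illustrative values only): ReLU for hidden layers, and softmax for a mutually exclusive output layer, whose outputs are positive and sum to 1, i.e. a probability distribution over classes:

```python
# ReLU for hidden layers; softmax for a mutually exclusive output layer.
import math

def relu(z):
    return max(0.0, z)

def softmax(zs):
    m = max(zs)                        # subtract max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]

print([relu(z) for z in [-2.0, 0.0, 3.0]])  # [0.0, 0.0, 3.0]
probs = softmax([2.0, 1.0, 0.1])
print(sum(probs))                      # 1.0 up to rounding
```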
Summary
From Artificial Neurons to Deep Neural Networks: Initial concepts in the 1940s, a dark age, and the recent boom

Multi-layer Perceptron: Layers, neurons, bias terms, weights, activation functions

Training of ANNs: Gradient descent, backpropagation & mini-batches

Implementation in TensorFlow: High-Level API, plain TensorFlow, hyperparameter tuning

References
[1] https://www.slideshare.net/deview/251-deep-learning-using-cu-dnn/4
[2] https://en.wikipedia.org/wiki/Neuron#/media/File:Blausen_0657_MultipolarNeuron.png
[3] https://en.wikipedia.org/wiki/Neural_oscillation
[4] http://www.lesicalab.com/research/
[5] McCulloch, W. S. and Pitts, W. (1943), A logical calculus of the ideas immanent in nervous activity, Bulletin of Mathematical Biophysics 5, 115-133
[6] https://medium.com/@shrutijadon10104776/survey-on-activation-functions-for-deep-learning-9689331ba09