
Deep Learning with TensorFlow http://cvml.ist.ac.at/courses/DLWT_W18

Lecture 3: Artificial Neural Networks (Multilayer Perceptrons)

Artificial Neural Networks

Multi-layer Perceptrons

Lars Bollmann

Contents

01 Introduction Brief history, biological neuron and artificial neural network

02 Perceptron Features & drawbacks

03 Multi-layer Perceptron Description & training

04 Implementation: Tensorflow High-Level API and plain Tensorflow

Brief history

[1]

Biological neuron

[2]

[3] [4]


Artificial neural network (ANN)

McCulloch & Pitts (1943)

[Figure 1 from McCulloch & Pitts: example nets computing logical OR, logical AND, and a "heat" circuit illustrating "learning"; each neuron c_i is marked with the numeral i on the cell body and its action is denoted by N with the same subscript. All images from [5].]

The Perceptron

Rosenblatt (1957)

Features
• Linear threshold unit:
  $y_1 = \begin{cases} 1 & \text{if } \sum_{i=1}^{m} w_{i,1} x_i + b > 0 \\ 0 & \text{otherwise} \end{cases}$
• Update rule for weights:
  $w_{i,j}^{(\text{next step})} = w_{i,j} + \eta \, (y_j - \hat{y}_j) \, x_i$
• Converges if training instances are linearly separable

Drawbacks
• Still a linear classifier

(Diagram: inputs $x_1, x_2$ with weights $w_{1,1}, w_{2,1}$ and bias $b$ feeding a summation unit $\Sigma$ that produces the output $y_1$.)
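A minimal NumPy sketch of this update rule, assuming a single output unit and {0, 1} labels (the helper names and the tiny AND dataset below are my own illustration, not from the lecture):

```python
import numpy as np

def step(z):
    """Linear threshold unit: 1 if z > 0, else 0."""
    return (z > 0).astype(float)

def train_perceptron(X, y, eta=0.1, n_epochs=20):
    """X: (n_samples, n_features), y: (n_samples,) with values in {0, 1}."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_epochs):
        for x_i, y_i in zip(X, y):
            y_hat = step(w @ x_i + b)
            # w_i <- w_i + eta * (y - y_hat) * x_i
            w += eta * (y_i - y_hat) * x_i
            b += eta * (y_i - y_hat)
    return w, b

# Example: learn logical AND (linearly separable, so the rule converges)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1], dtype=float)
w, b = train_perceptron(X, y)
print(step(X @ w + b))  # expected: [0. 0. 0. 1.]
```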

Multi-layer Perceptron (MLP)

Input layer, hidden layer, output layer (diagram of a fully connected network: inputs $x_1, x_2$, bias units, summation units $\Sigma$, outputs $y_1, y_2, y_3$).

• Activation of one neuron:
  $a_i^{(L)} = f_{\text{activation}} \Big( \sum_j w_{(i,j)} \, a_j^{(L-1)} + b \Big)$
• Alternative activation functions: [6]
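As a concrete illustration of the per-neuron activation above, a minimal NumPy sketch of one fully connected layer, vectorized over all neurons in the layer (shapes, names, and the ReLU choice are my own assumptions):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def layer_forward(a_prev, W, b, f=relu):
    """a_prev: (n_prev,), W: (n_neurons, n_prev), b: (n_neurons,)."""
    z = W @ a_prev + b   # weighted sum per neuron
    return f(z)          # element-wise activation function

# Example: 2 inputs -> 3 hidden neurons
rng = np.random.default_rng(0)
a0 = np.array([0.5, -1.0])
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)
print(layer_forward(a0, W1, b1))
```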

Training

Input layer, hidden layer, output layer (same network, with squared errors $(\hat{y}_k - y_k)^2$ shown at the outputs).

• Activation of one neuron:
  $a_i^{(L)} = f_{\text{activation}} \Big( \sum_j w_{(i,j)} \, a_j^{(L-1)} + b \Big)$
• Find optimal weights & bias terms to reduce the cost function (e.g. MSE, cross-entropy)
• Weights: influence of neurons in the previous layer
• Bias terms: nudge a neuron towards being active/inactive
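For reference, minimal NumPy sketches of the two cost functions named above; the array shapes are my own assumptions:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: average of (y_hat - y)^2 over instances and outputs."""
    return np.mean((y_pred - y_true) ** 2)

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy for one-hot y_true and softmax probabilities y_pred,
    both of shape (n_instances, n_classes)."""
    return -np.mean(np.sum(y_true * np.log(y_pred + eps), axis=1))
```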

Training

Input layer, hidden layer, output layer: minimize the cost function using gradient descent (GD)

• Initialize weights & bias terms with random values
• Average cost over all instances: $C = \frac{1}{m} \sum_{i=1}^{m} C_i$
• Use mini-batches to compute the gradient with respect to the weights and biases
• Take a step in the direction of the negative gradient
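A minimal sketch of such a mini-batch gradient descent loop, assuming a `grad_fn` that returns the cost gradients averaged over a batch (the function name and signature are my own, not from the lecture):

```python
import numpy as np

def minibatch_gd(params, grad_fn, X, y, eta=0.01, batch_size=32, n_epochs=10):
    """params: list of parameter arrays (weights and biases), updated by GD steps."""
    n = len(X)
    rng = np.random.default_rng(0)
    for _ in range(n_epochs):
        idx = rng.permutation(n)                       # shuffle instances each epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            grads = grad_fn(params, X[batch], y[batch])  # gradients averaged over the mini-batch
            params = [p - eta * g for p, g in zip(params, grads)]  # step along the negative gradient
    return params
```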

Training

Input layer, hidden layer, output layer (diagram: $a^{(L-1)} \xrightarrow{\; w^{(L)},\, b \;} a^{(L)}$)

• Forward pass: compute the output of all neurons for a training instance x
• Backward pass: compute the partial derivatives for each weight and bias
• Cost: $C_0 = \left( a^{(L)} - y \right)^2$
• Activation: $a^{(L)} = f_{\text{act}}\left( z^{(L)} \right)$ with $z^{(L)} = w^{(L)} a^{(L-1)} + b$
• Average over the mini-batch: $\frac{\partial C_{mb}}{\partial w^{(L)}} = \frac{1}{m_{mb}} \sum_{i=1}^{m_{mb}} \frac{\partial C_i}{\partial w^{(L)}}$
• Derivative of $C_0$ with respect to $w^{(L)}$ (chain rule): $\frac{\partial C_0}{\partial w^{(L)}} = \frac{\partial z^{(L)}}{\partial w^{(L)}} \cdot \frac{\partial a^{(L)}}{\partial z^{(L)}} \cdot \frac{\partial C_0}{\partial a^{(L)}}$
• Combine the partial derivatives → gradient vector
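Written out with the slide's notation, the three chain-rule factors are (a small derivation step the slide leaves implicit):

$$\frac{\partial z^{(L)}}{\partial w^{(L)}} = a^{(L-1)}, \qquad \frac{\partial a^{(L)}}{\partial z^{(L)}} = f_{\text{act}}'\!\left( z^{(L)} \right), \qquad \frac{\partial C_0}{\partial a^{(L)}} = 2\left( a^{(L)} - y \right),$$

so that

$$\frac{\partial C_0}{\partial w^{(L)}} = a^{(L-1)} \, f_{\text{act}}'\!\left( z^{(L)} \right) \, 2\left( a^{(L)} - y \right).$$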

Tensorflow

High-Level API
• TF.Learn
• DNNClassifier
• Parameters:
  • # layers
  • # neurons per layer
  • batch size
  • # iterations

Plain Tensorflow
1. Construction phase
2. Training phase
3. Using the trained network

→ Demo in Jupyter
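A rough sketch of the high-level-API route, assuming the TensorFlow 1.x Estimator API in use at the time of the course; the MNIST-style shapes and placeholder data are my own assumptions and the actual demo notebook may differ:

```python
import numpy as np
import tensorflow as tf  # assumes TensorFlow 1.x, as used in the course

# Placeholder data standing in for a real dataset (e.g. MNIST-sized inputs).
X_train = np.random.rand(1000, 784).astype(np.float32)
y_train = np.random.randint(0, 10, size=1000)

# DNNClassifier configured with the hyperparameters listed on the slide
# (# layers, # neurons per layer, batch size, # iterations).
feature_columns = [tf.feature_column.numeric_column("x", shape=[784])]
dnn_clf = tf.estimator.DNNClassifier(
    hidden_units=[300, 100],          # two hidden layers, 300 and 100 neurons
    n_classes=10,
    feature_columns=feature_columns)

input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": X_train}, y=y_train,
    batch_size=50, num_epochs=None, shuffle=True)

dnn_clf.train(input_fn=input_fn, steps=2000)  # "# iterations"
```

The plain-Tensorflow route instead follows the three phases above: build the computation graph (construction phase), run it in a session to train the weights (training phase), and then feed new instances through the trained graph.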

Hyperparameters

# hidden layers
• More layers: exponentially fewer neurons are needed for complex functions
• Converge faster
• Generalize better
• Reuse of layers
• Start with few layers and increase the number

# neurons / layer
• Funnel shape or constant size
• "black art"
• Too many neurons cause overfitting

Activation function
• ReLU is a good choice in general
• Softmax for the output layer when classes are mutually exclusive
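As a small illustration of these guidelines (the concrete numbers below are my own assumptions, not recommendations from the lecture):

```python
import tensorflow as tf

hidden_units = [300, 100, 50]      # "funnel": layer width shrinks with depth
# hidden_units = [100, 100, 100]   # constant width is a common alternative
activation = tf.nn.relu            # ReLU as the default hidden-layer activation
# For mutually exclusive classes the output layer uses softmax
# (e.g. DNNClassifier handles this internally given n_classes).
```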

Summary

From Artificial Neurons to Deep Neural Networks Initial concepts in the 1940s, a dark age, and the recent boom

Multi-layer Perceptron Layers, neurons, bias, weights, activation function

Training of ANN Gradient descent, backpropagation & mini-batches

Implementation in Tensorflow High-Level API, plain TensorFlow, hyperparameter tuning

References

[1] https://www.slideshare.net/deview/251-deep-learning-using-cu-dnn/4

[2] https://en.wikipedia.org/wiki/Neuron#/media/File:Blausen_0657_MultipolarNeuron.png

[3] https://en.wikipedia.org/wiki/Neural_oscillation

[4] http://www.lesicalab.com/research/

[5] McCulloch and Pitts, A logical calculus of the ideas immanent in nervous activity

[6] https://medium.com/@shrutijadon10104776/survey-on-activation-functions-for-deep-learning-9689331ba09