Machine Learning and Computer Vision Group
Deep Learning with TensorFlow http://cvml.ist.ac.at/courses/DLWT_W18
Lecture 3: Artificial Neural Networks (Multilayer Perceptrons)
Artificial Neural Networks
Multi-layer Perceptrons
Lars Bollmann

Contents
01 Introduction: Brief history, the biological neuron, and artificial neural networks
02 Perceptron: Features & drawbacks
03 Multi-layer Perceptron: Description & training
04 Implementation in TensorFlow: High-Level API and plain TensorFlow

Brief history
(image from [1])
01 Introduction 02 Perceptron 03 Multi-layer Perceptron 04 TensorFlow

Biological neuron
(image from [2])
(images from [3] and [4])
Artificial neural network (ANN)

McCulloch & Pitts (1943)

[Figure 1 from [5]: diagrams of McCulloch-Pitts nets computing, among others, logical OR, logical AND, the heat illusion, and "learning". Caption: "The neuron c_i is always marked with the numeral i upon the body of the cell, and the corresponding action is denoted by 'N' with i as subscript, as in the text." All images from [5].]

The Perceptron
Rosenblatt (1957)

Features:
• Linear threshold unit (inputs x_1, x_2 with weights w_1,1, w_2,1 and bias b):

  y = 1 if w · x + b > 0, else 0

• Update rule for the weights:

  w_i,j(next step) = w_i,j + η (y_j − ŷ_j) x_i

• Converges if the training instances are linearly separable

Drawbacks:
• Still a linear classifier
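The update rule above can be sketched in a few lines of plain Python. This is a minimal illustration (not the lecture's demo code), trained on the logical AND function, which is linearly separable, so the rule is guaranteed to converge:

```python
# Minimal perceptron sketch: a linear threshold unit trained with
#   w_i <- w_i + eta * (y - y_hat) * x_i
# on the logical AND function (linearly separable).

def step(z):
    """Linear threshold unit: output 1 if w.x + b > 0, else 0."""
    return 1 if z > 0 else 0

def predict(w, b, x):
    return step(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train(data, eta=1, epochs=20):
    w, b = [0, 0], 0
    for _ in range(epochs):
        for x, y in data:
            err = y - predict(w, b, x)   # (y - y_hat) from the update rule
            w = [wi + eta * err * xi for wi, xi in zip(w, x)]
            b += eta * err
    return w, b

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train(AND)
print([predict(w, b, x) for x, _ in AND])  # [0, 0, 0, 1]
```

Trying the same sketch on XOR, which is not linearly separable, shows the drawback: the weights never settle on a separating line.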
Multi-layer Perceptron (MLP)
Input layer, hidden layer, output layer (diagram: inputs x_1, x_2 feed a hidden layer of summing units, which feed outputs y_1, y_2, y_3)

• Activation of one neuron:

  a_i^(L) = f_activation( Σ_j w_(i,j) a_j^(L−1) + b )

• Alternative activation functions: see [6]
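The activation formula can be applied layer by layer to compute the output of the whole network. A hedged sketch (not the lecture's code), with an arbitrarily chosen toy network of 2 inputs, 3 hidden neurons, and 2 outputs:

```python
# Forward pass of a small MLP: each neuron computes
#   a_i^(L) = f(sum_j w_(i,j) * a_j^(L-1) + b_i)
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(activations_prev, weights, biases, f=sigmoid):
    """One layer: weights[i][j] connects neuron j of the previous
    layer to neuron i of this layer."""
    return [f(sum(w * a for w, a in zip(row, activations_prev)) + b)
            for row, b in zip(weights, biases)]

# Toy network with hand-picked weights (purely illustrative values)
x = [0.5, -1.0]
hidden = layer(x, weights=[[0.1, 0.4], [-0.3, 0.8], [0.5, 0.5]],
               biases=[0.0, 0.1, -0.2])
output = layer(hidden, weights=[[0.7, -0.6, 0.2], [0.9, 0.1, -0.4]],
               biases=[0.05, -0.05])
print(output)  # two sigmoid activations, each in (0, 1)
```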
Training
Input layer, hidden layer, output layer (diagram: each output ŷ_i is compared to its target y_i via the squared error (ŷ_i − y_i)²)

• Activation of one neuron:

  a_i^(L) = f_activation( Σ_j w_(i,j) a_j^(L−1) + b )

• Find the optimal weights & bias terms to reduce a cost function (e.g. MSE, cross-entropy)
• Weights: control the influence of the neurons in the previous layer
• Bias terms: nudge a neuron towards being active/inactive
Training
Minimize the cost function using gradient descent (GD):

• Initialize the weights & bias terms with random values
• Average the cost over all m instances:

  C = (1/m) Σ_{i=1}^{m} C_i

• Use mini-batches to compute the gradient with respect to the weights and biases
• Take a gradient descent step
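The loop on this slide (random initialization, mini-batch gradient, descent step) can be sketched on a deliberately tiny model: a single weight w fitting y = 2x under squared-error cost. The data and hyperparameters are invented for illustration:

```python
# Mini-batch gradient descent on a one-parameter model y_hat = w * x,
# cost per instance C_i = (w * x_i - y_i)^2.
import random

random.seed(0)
data = [(x, 2.0 * x) for x in [0.1 * i for i in range(1, 21)]]  # y = 2x

w = random.uniform(-1, 1)        # initialize with a random value
eta, batch_size = 0.1, 5
for epoch in range(100):
    random.shuffle(data)
    for k in range(0, len(data), batch_size):
        batch = data[k:k + batch_size]
        # gradient dC/dw averaged over the mini-batch
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        w -= eta * grad          # gradient descent step
print(round(w, 3))  # close to 2.0
```

Averaging the gradient over a mini-batch rather than the full training set is what makes each step cheap while still pointing roughly downhill.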
Training
Backpropagation (diagram: one neuron with weight w^(L) and bias b, connecting activation a^(L−1) to a^(L)):

• Forward pass: compute the output of all neurons for a training instance x
• Backward pass: compute the partial derivatives for each weight and bias

• Cost for one instance: C_0 = (a^(L) − y)²
• Activation: a^(L) = f_act(z^(L)) with z^(L) = w^(L) a^(L−1) + b^(L)
• Derivative of C_0 with respect to w^(L) (chain rule):

  ∂C_0/∂w^(L) = ∂z^(L)/∂w^(L) · ∂a^(L)/∂z^(L) · ∂C_0/∂a^(L)

• Average over the mini-batch:

  ∂C_mb/∂w^(L) = (1/m_mb) Σ_{i=1}^{m_mb} ∂C_i/∂w^(L)

• Combine the partial derivatives into the gradient vector
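The chain rule above can be verified directly for a single neuron. A hedged sketch with arbitrarily chosen values for w, b, a^(L−1), and y, checked against a numerical derivative:

```python
# Chain rule for one neuron:
#   z = w * a_prev + b,  a = sigmoid(z),  C0 = (a - y)^2
#   dC0/dw = dz/dw * da/dz * dC0/da
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

a_prev, w, b, y = 0.8, 0.5, -0.1, 1.0   # illustrative values

z = w * a_prev + b
a = sigmoid(z)

dz_dw = a_prev                             # dz/dw
da_dz = sigmoid(z) * (1 - sigmoid(z))      # sigmoid derivative
dC_da = 2 * (a - y)                        # dC0/da for C0 = (a - y)^2
grad = dz_dw * da_dz * dC_da               # chain rule

# Numerical check: central difference (C0(w+h) - C0(w-h)) / (2h)
def cost(w_):
    return (sigmoid(w_ * a_prev + b) - y) ** 2

h = 1e-6
numeric = (cost(w + h) - cost(w - h)) / (2 * h)
print(abs(grad - numeric) < 1e-8)  # True
```

Backpropagation applies exactly this factorization neuron by neuron, reusing the intermediate terms from the forward pass.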
TensorFlow
High-Level API:
• TF.Learn
• DNNClassifier
• Parameters: # layers, # neurons per layer, batch size, # iterations, activation function

Plain TensorFlow:
1. Construction phase
2. Training phase
3. Using the trained network

→ Demo in Jupyter
Hyperparameters
# hidden layers:
• More layers: exponentially fewer neurons are needed for complex functions; networks converge faster, generalize better, and layers can be reused
• Start with few layers and increase the number

# neurons per layer:
• Funnel or constant size
• Somewhat of a "black art"
• Too many neurons cause overfitting

Activation function:
• ReLU is a good choice in general
• Softmax for the output layer when the classes are mutually exclusive
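The two activation choices recommended above are cheap to write out. A small sketch (illustrative values only): ReLU for hidden layers, and softmax for a mutually exclusive output layer, whose outputs are positive and sum to 1, i.e. a probability distribution over classes:

```python
# ReLU for hidden layers; softmax for a mutually exclusive output layer.
import math

def relu(z):
    return max(0.0, z)

def softmax(zs):
    m = max(zs)                        # subtract max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]

print([relu(z) for z in [-2.0, 0.0, 3.0]])  # [0.0, 0.0, 3.0]
probs = softmax([2.0, 1.0, 0.1])
print(sum(probs))                      # 1.0 up to rounding
```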
Summary
From Artificial Neurons to Deep Neural Networks: Initial concepts in the 1940s, a dark age, and the recent boom

Multi-layer Perceptron: Layers, neurons, bias terms, weights, activation functions

Training of ANNs: Gradient descent, backpropagation & mini-batches

Implementation in TensorFlow: High-Level API, plain TensorFlow, hyperparameter tuning

References
[1] https://www.slideshare.net/deview/251-deep-learning-using-cu-dnn/4
[2] https://en.wikipedia.org/wiki/Neuron#/media/File:Blausen_0657_MultipolarNeuron.png
[3] https://en.wikipedia.org/wiki/Neural_oscillation
[4] http://www.lesicalab.com/research/
[5] McCulloch, W. S. and Pitts, W. (1943), A logical calculus of the ideas immanent in nervous activity, Bulletin of Mathematical Biophysics 5, 115-133
[6] https://medium.com/@shrutijadon10104776/survey-on-activation-functions-for-deep-learning-9689331ba09