Multi-Layer Perceptrons


Machine Learning and Computer Vision Group
Deep Learning with TensorFlow, http://cvml.ist.ac.at/courses/DLWT_W18
Lecture 3: Artificial Neural Networks (Multi-layer Perceptrons)
Lars Bollmann

Contents
01 Introduction: brief history, the biological neuron and artificial neural networks
02 Perceptron: features and drawbacks
03 Multi-layer Perceptron: description and training
04 Implementation in TensorFlow: High-Level API and plain TensorFlow

01 Introduction

Brief history [1]
• Initial concepts in the 1940s, a long "dark age", and the recent deep learning boom.

Biological neuron
• (Figures from [2], [3] and [4]: a multipolar neuron, neural oscillations and recorded neural activity.)

Artificial neural network (ANN): McCulloch and Pitts (1943)
• Small networks of binary threshold units can implement logical functions; the annotated examples in the lecture include the logical OR, the logical AND, the heat illusion and a simple form of "learning".
• (Figure 1 from [5], "A logical calculus of the ideas immanent in nervous activity", shows example nets (a)-(i); each neuron $c_i$ is marked with the numeral $i$ and its action is denoted by $N$ with its subscript, e.g. net (b) computes the logical OR, $N_3(t) \equiv N_1(t-1) \vee N_2(t-1)$, and net (c) the logical AND, $N_3(t) \equiv N_1(t-1) \cdot N_2(t-1)$.)

02 Perceptron

The Perceptron: Rosenblatt (1957)
Features:
• Linear threshold unit (inputs $x_1, \dots, x_m$, weights $w_{i,j}$, bias $b$):
  $\hat{y}_j = \begin{cases} 1 & \text{if } \sum_{i=1}^{m} w_{i,j}\, x_i + b > 0 \\ 0 & \text{otherwise} \end{cases}$
• Update rule for the weights:
  $w_{i,j}^{(\text{next step})} = w_{i,j} + \eta\,(y_j - \hat{y}_j)\, x_i$
• Converges if the training instances are linearly separable.
Drawbacks:
• Still a linear classifier.
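To make the update rule concrete, here is a minimal sketch of a perceptron trained on the logical OR function from the McCulloch-Pitts example (which is linearly separable); the data set, learning rate and number of epochs are illustrative choices, not part of the lecture.

```python
import numpy as np

# Logical OR: linearly separable, so the perceptron rule converges
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 1])

w = np.zeros(2)   # weights w_1, w_2
b = 0.0           # bias term
eta = 0.1         # learning rate

for epoch in range(10):
    for x_i, y_i in zip(X, y):
        # Linear threshold unit: 1 if w . x + b > 0, else 0
        y_hat = 1 if np.dot(w, x_i) + b > 0 else 0
        # Update rule: w <- w + eta * (y - y_hat) * x  (bias updated analogously)
        w += eta * (y_i - y_hat) * x_i
        b += eta * (y_i - y_hat)

print(w, b)  # the weights stop changing once every instance is classified correctly
```

For data that is not linearly separable the loop keeps updating without ever converging, which is exactly the drawback noted above and the motivation for multi-layer networks.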
03 Multi-layer Perceptron

Multi-layer Perceptron (MLP)
• An input layer, one or more hidden layers and an output layer (diagram: inputs $x_1, x_2$, one hidden layer, outputs $y_1, y_2, y_3$).
• Activation of one neuron $i$ in layer $L$:
  $a_i^{(L)} = f_{\text{activation}}\Big(\sum_j w_{(i,j)}\, a_j^{(L-1)} + b\Big)$
• Alternative activation functions: see the survey in [6].

Training
• Find the optimal weights and bias terms that reduce a cost function (e.g. MSE or cross-entropy) built from the output errors $(\hat{y}_k - y_k)^2$.
• Weights: the influence of the neurons in the previous layer.
• Bias terms: nudge a neuron towards being active or inactive.

Training: gradient descent
• Minimize the cost function using gradient descent (GD).
• Initialize the weights and bias terms with random values.
• Average the cost over all $m$ training instances:
  $C = \frac{1}{m} \sum_{i=1}^{m} C_i$
• Use mini-batches to compute the gradient with respect to the weights and biases.
• Take a gradient descent step.

Training: backpropagation
• Forward pass: compute the output of all neurons for a training instance.
• Backward pass: compute the partial derivatives of the cost with respect to each weight and bias.
• Cost of one instance: $C_0 = \big(a^{(L)} - y\big)^2$
• Activation: $a^{(L)} = f_{\text{act}}\big(z^{(L)}\big)$ with $z^{(L)} = w^{(L)} a^{(L-1)} + b^{(L)}$
• Derivative of $C_0$ with respect to $w^{(L)}$ (chain rule):
  $\frac{\partial C_0}{\partial w^{(L)}} = \frac{\partial z^{(L)}}{\partial w^{(L)}} \cdot \frac{\partial a^{(L)}}{\partial z^{(L)}} \cdot \frac{\partial C_0}{\partial a^{(L)}}$
• Average over a mini-batch of size $m_{mb}$:
  $\frac{\partial C}{\partial w^{(L)}} = \frac{1}{m_{mb}} \sum_{i=1}^{m_{mb}} \frac{\partial C_i}{\partial w^{(L)}}$
• Combine the partial derivatives into the gradient vector.

04 TensorFlow

Implementation in TensorFlow
High-Level API (TF.Learn):
• DNNClassifier
• Parameters: number of layers, number of neurons per layer, batch size, number of iterations, activation function
Plain TensorFlow:
1. Construction phase
2. Training phase
3. Using the trained network
→ Demo in Jupyter
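Before the TensorFlow versions, here is a from-scratch sketch (NumPy only) of the training procedure described above: a forward pass, a backward pass that applies the chain rule, and a mini-batch gradient descent step for a network with one hidden layer. The sigmoid activation, the squared-error cost, the layer sizes and the synthetic data are assumptions made to keep the example small and runnable.

```python
import numpy as np

rng = np.random.RandomState(0)

# Synthetic data standing in for a real training set: 2 inputs, 3 one-hot outputs
X = rng.rand(200, 2)
Y = np.eye(3)[rng.randint(0, 3, size=200)]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Initialize weights and bias terms with random values
W1, b1 = 0.1 * rng.randn(2, 4), np.zeros(4)   # input -> hidden (4 neurons)
W2, b2 = 0.1 * rng.randn(4, 3), np.zeros(3)   # hidden -> output

eta, batch_size = 0.5, 32
for epoch in range(100):
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        x, y = X[idx], Y[idx]

        # Forward pass: a^(L) = f(z^(L)) with z^(L) = W^(L) a^(L-1) + b^(L)
        z1 = x @ W1 + b1; a1 = sigmoid(z1)
        z2 = a1 @ W2 + b2; a2 = sigmoid(z2)

        # Backward pass: chain rule, averaged over the mini-batch
        dC_da2 = 2.0 * (a2 - y) / len(idx)         # from C = (a^(L) - y)^2
        dC_dz2 = dC_da2 * a2 * (1.0 - a2)          # sigmoid derivative
        dW2, db2 = a1.T @ dC_dz2, dC_dz2.sum(axis=0)
        dC_dz1 = (dC_dz2 @ W2.T) * a1 * (1.0 - a1)
        dW1, db1 = x.T @ dC_dz1, dC_dz1.sum(axis=0)

        # Gradient descent step
        W2 -= eta * dW2; b2 -= eta * db2
        W1 -= eta * dW1; b1 -= eta * db1
```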
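A minimal sketch of the high-level route, assuming the TensorFlow 1.x Estimator API (tf.estimator.DNNClassifier, the successor of the TF.Learn DNNClassifier referred to in the slides). The synthetic data, the feature column named "x", the two hidden layers of 100 and 50 neurons, the batch size of 64 and the 2000 training steps are placeholder values for the parameters listed above.

```python
import numpy as np
import tensorflow as tf  # TensorFlow 1.x

# Synthetic data standing in for a real training set: 20 features, 3 classes
X_train = np.random.rand(1000, 20).astype(np.float32)
y_train = np.random.randint(0, 3, size=1000)

feature_columns = [tf.feature_column.numeric_column("x", shape=[20])]

# Number of layers / neurons per layer and the activation function
clf = tf.estimator.DNNClassifier(hidden_units=[100, 50],
                                 feature_columns=feature_columns,
                                 n_classes=3,
                                 activation_fn=tf.nn.relu)

# Batch size and number of iterations
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": X_train}, y=y_train, batch_size=64, num_epochs=None, shuffle=True)
clf.train(input_fn=train_input_fn, steps=2000)

# Using the trained network on a few instances
predict_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": X_train[:5]}, num_epochs=1, shuffle=False)
for pred in clf.predict(input_fn=predict_input_fn):
    print(pred["class_ids"])
```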
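And a sketch of the plain-TensorFlow route with its three phases, assuming the TensorFlow 1.x graph and session API; the network shape, learning rate, number of epochs and the random data are again illustrative rather than the lecture's demo.

```python
import numpy as np
import tensorflow as tf  # TensorFlow 1.x

# 1. Construction phase: build the computation graph
X = tf.placeholder(tf.float32, shape=(None, 20), name="X")
y = tf.placeholder(tf.int64, shape=(None,), name="y")
hidden = tf.layers.dense(X, 50, activation=tf.nn.relu)   # hidden layer
logits = tf.layers.dense(hidden, 3)                      # output layer (3 classes)
loss = tf.losses.sparse_softmax_cross_entropy(labels=y, logits=logits)
train_op = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(loss)
init = tf.global_variables_initializer()

# Synthetic data standing in for a real training set
X_data = np.random.rand(1000, 20).astype(np.float32)
y_data = np.random.randint(0, 3, size=1000)

# 2. Training phase: run the training op on mini-batches
with tf.Session() as sess:
    sess.run(init)
    batch_size = 64
    for epoch in range(10):
        idx = np.random.permutation(len(X_data))
        for start in range(0, len(X_data), batch_size):
            batch = idx[start:start + batch_size]
            sess.run(train_op, feed_dict={X: X_data[batch], y: y_data[batch]})

    # 3. Using the trained network: predict classes for new instances
    new_logits = sess.run(logits, feed_dict={X: X_data[:5]})
    print(new_logits.argmax(axis=1))
```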
Hyperparameters
Number of hidden layers:
• More layers need exponentially fewer neurons to represent complex functions, converge faster, generalize better and allow reuse of layers.
• Start with few layers and increase the number.
Number of neurons per layer:
• Funnel-shaped or constant layer sizes; choosing them remains something of a "black art".
• Too many neurons cause overfitting.
Activation function:
• ReLU is a good choice in general.
• Softmax for the output layer when the classes are mutually exclusive.

Summary
• From artificial neurons to deep neural networks: initial concepts in the 1940s, a dark age and the recent boom.
• Multi-layer Perceptron: layers, neurons, biases, weights, activation functions.
• Training of ANNs: gradient descent, backpropagation and mini-batches.
• Implementation in TensorFlow: High-Level API, plain TensorFlow, hyperparameter tuning.

References
[1] https://www.slideshare.net/deview/251-deep-learning-using-cu-dnn/4
[2] https://en.wikipedia.org/wiki/Neuron#/media/File:Blausen_0657_MultipolarNeuron.png
[3] https://en.wikipedia.org/wiki/Neural_oscillation
[4] http://www.lesicalab.com/research/
[5] McCulloch, W. S. and Pitts, W. (1943), "A logical calculus of the ideas immanent in nervous activity"
[6] https://medium.com/@shrutijadon10104776/survey-on-activation-functions-for-deep-learning-9689331ba09
