
Machine Learning - Winter Term 2016/17 - Chapter 07: Neural Networks I
Prof. Adrian Ulges
Master's programme "Computer Science", DCSM Department, University of Applied Sciences RheinMain
14 November 2016

Resources for the next Chapters

Online Book 1: The nice one
- Nielsen: "Neural Networks and Deep Learning", http://neuralnetworksanddeeplearning.com

Online Book 2: The tough one
- Goodfellow, Bengio, Courville: "Deep Learning", https://www.deeplearningbook.org

(Artificial) Neural Networks ...   (image: [1])
- ... are one of the core topics of artificial intelligence (AI)
- ... are biologically inspired: they are mathematical models simulating the process of "thinking" in living organisms' brains
- ... are graphs (or networks) of interlinked atomic processing units (neurons)
- ... are not programmed but define their behavior entirely through the graph's links and weights, which are adjusted by training
- ... are covered by two research fields:
  1. computational neuroscience (goal: better understanding of biological processes)
  2. machine learning / AI (here) (goal: solving practical data analysis problems)

Neural Networks ...
- ... have a turbulent history (timeline: 1940-2015)
- ... exist in many variants:
  - multi-layer perceptrons (in this lecture)
  - convolutional neural networks (in this lecture)
  - Boltzmann machines
  - RBF networks
  - recurrent neural networks
  - LSTM networks
  - ...
- ... are the most intensely studied machine learning model these days ("deep learning")

Deep Learning Applications
(images from [5], [6], [2], [4], [7])

Neural Networks: Biological Inspiration
Neurons are brain cells that send and receive electrical impulses.
(Figure: two neurons connected via a synapse; labels: soma, axon, dendrites, pre-synaptic and post-synaptic membrane.)

Neurons: Terminology
- cell body (soma): center of the neuron, with a diameter of 5-100 µm
- axon: thin, long (up to 1 m) nerve fibre; it splits and transmits impulses to other neurons
- dendrites: small, branch-like outgrowths that receive impulses and transmit them to the soma
- synapses: electro-chemical links between two neurons; they transfer the signal from an axon to the dendrites via transmitter substances

From Biological to Artificial Neural Networks
- Biological neural networks are (much) more complex than artificial ones
  - connectivity: 10^11 neurons, each connected with about 7,000 others
  - adaptivity: the number and connections of neurons change over time
- Biological neural networks are asynchronous and compute at a rather low frequency (1 kHz)
- The circuit layout is unknown

Learning
- Learning happens at the synapses, by changing the transmitter dose
  - sensitization: more transmitter, enhancing the signal
  - desensitization: less transmitter, suppressing the signal

Artificial Neural Networks
- a neuron computes a weighted sum of its inputs x_1, ..., x_m (with weights w_1, ..., w_m) and passes it through an activation function with threshold θ to produce the output y (see the code sketch after the outline below)
- learning happens by adapting the weights
- memory happens by storing the weights

Outline
1. Neurons
2. Training Neurons
3. The Capacity of Neurons
4. Neural Networks
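To make the "weighted sum + activation function" model concrete, here is a minimal Python sketch of a single artificial neuron with a step activation; the function names and the example weights are illustrative and not taken from the slides.

    import numpy as np

    def step(z, theta):
        # step activation: output 1 exactly when the weighted sum reaches the threshold
        return 1 if z >= theta else 0

    def artificial_neuron(x, w, theta):
        # weighted sum of the inputs, followed by the step activation
        return step(np.dot(w, x), theta)

    # example: two inputs, weights (0.5, 0.5), threshold 0.7
    print(artificial_neuron(np.array([1, 1]), np.array([0.5, 0.5]), 0.7))  # 1 (the neuron fires)
    print(artificial_neuron(np.array([1, 0]), np.array([0.5, 0.5]), 0.7))  # 0 (it does not fire)

Training (next section) adjusts exactly these weights and the threshold.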
The McP-Neuron
- the historically oldest neuron model (1943), by McCulloch & Pitts
- a strongly simplified model of biological neurons
- no learning yet

Definition
- input x = (x_1, ..., x_d) ∈ {0,1}^d
- output y ∈ {0,1}
- weights w = (w_1, ..., w_d) ∈ {-1,1}^d
  - w_j = 1: stimulating (= "anregend")
  - w_j = -1: inhibitory (= "hemmend")
- threshold θ ∈ R

The McP-Neuron: Computational Model
- compute the scalar product between the input signal x and the weights w, namely w · x
- apply an activation function f (here, a step function) to obtain the output y:

      y = f(w · x) = 1 if w · x ≥ θ, and 0 otherwise

- the McP-neuron is a function φ_{w,θ}: {0,1}^d → {0,1} with parameters w and θ
- if φ_{w,θ}(x) = 1, we say the neuron "is activated", or the neuron "fires"

McP-Neurons: Graphical Representation
(Figure: the inputs x_1, ..., x_m enter the neuron along edges labeled with the weights w_1, ..., w_m; the node carries the threshold θ and emits the output y.)
- input x
- weights w
- threshold θ

What can McP-Neurons do?
- We can model logic gates using McP-neurons:
  - NOT: one input with weight -1 and threshold -0.5
  - AND: two inputs with weights 1, 1 and threshold 2
  - OR: two inputs with weights 1, 1 and threshold 1

From McP-Neurons to the Perceptron

McP-Neurons: Limitations
- only boolean inputs/weights/outputs
- only binary activation functions
- no learning

Extensions: The Perceptron
- real-valued weights and inputs w, x ∈ R^d
- we will introduce a learning algorithm, the Delta rule

Perceptron: Graphical Interpretation
- the parameters w and θ define a hyperplane (at distance θ / ||w|| from the origin)
- w · x ≥ θ: x is on the positive side
- w · x < θ: x is on the negative side

Outline
1. Neurons
2. Training Neurons
3. The Capacity of Neurons
4. Neural Networks

Perceptron Training: Basics
- A neuron's function (or behavior) φ_{w,θ} is determined by the parameters w and θ
- Given labeled training data: how do we find the best parameters, i.e. the best hyperplane?
(Figure: two hyperplanes on the same labeled data set; bad parameters (w_2, θ_2) give an error rate of 12/30 = 40%, good parameters (w_1, θ_1) give an error rate of 0%.)

Perceptron Training: Vector Augmentation
Before learning, we simplify notation using a simple trick called vector augmentation.

Idea
- omit the separate threshold parameter θ and include it in the weight vector

So far ...
- x := (x_1, ..., x_d), w := (w_1, ..., w_d)
- fire if w · x ≥ θ (or, equivalently, w · x - θ ≥ 0)

Now ...
- x := (x_1, ..., x_d, 1), w := (w_1, ..., w_d, -θ)
- fire if w · x ≥ 0

Remark: This is just a change of notation (θ is no longer written separately), not a change of behavior!

Perceptron Training
Learning happens by iterative optimization. We start with random values for w and θ, and in each iteration we ...
- ... classify a training sample x and compare the result φ(x) with the target output y
- ... if necessary, correct the neuron's parameters such that the output φ(x) is adapted towards the desired output y:

      w_new ← w_old + ∆w
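As a quick sanity check of the vector augmentation trick, the following sketch compares the original firing rule w · x ≥ θ with the augmented rule on random inputs; the helper names and the random test data are illustrative only.

    import numpy as np

    rng = np.random.default_rng(42)

    def fires_original(x, w, theta):
        # original formulation: weights w and a separate threshold θ
        return np.dot(w, x) >= theta

    def fires_augmented(x, w, theta):
        # augmented formulation: x -> (x, 1), w -> (w, -θ), fire if the dot product is >= 0
        return np.dot(np.append(w, -theta), np.append(x, 1.0)) >= 0

    # the two formulations agree for arbitrary real-valued inputs, weights and thresholds
    for _ in range(1000):
        x, w, theta = rng.normal(size=3), rng.normal(size=3), rng.normal()
        assert fires_original(x, w, theta) == fires_augmented(x, w, theta)
    print("original and augmented rule agree on all tested cases")

This is why the training procedure below can simply compare w · x with 0.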
Perceptron Training: The Delta Rule
How do we compute the 'right' update ∆w? There are different strategies. The most famous one is the Delta Rule:

      ∆w = λ · (y − φ(x)) · x

Remarks
- we call λ the learning rate
- note: if x is classified correctly, then y − φ(x) = 0 and therefore ∆w = 0, i.e. no update

In runnable Python (with a fixed iteration budget standing in for the convergence test):

    import random

    def train_perceptron(samples, lam):
        # samples: list of pairs (x, y) with augmented inputs x and labels y in {0, 1}
        w = [random.uniform(-1, 1) for _ in range(len(samples[0][0]))]    # initialize w randomly
        for _ in range(10000):                                            # "until convergence"
            x, y = random.choice(samples)                                 # choose a random training sample (x, y)
            phi = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0   # compute φ(x), i.e. classify x
            if phi != y:
                w = [wi + lam * (y - phi) * xi for wi, xi in zip(w, x)]   # w := w + ∆w
        return w

The Delta Rule: Motivation

      ∆w = λ · (y − φ(x)) · x

- Why does the Delta rule work?
- We illustrate what happens when a training sample is misclassified ...
(Figure: a positive sample x lies on the wrong side of the hyperplane; with λ = 0.2, the update ∆w = 0.2 · x rotates the weight vector towards x, giving w_new = w + 0.2 · x.)

The Delta Rule: Motivation - Formal Motivation

The Delta Rule: Code Example
(Figures: the decision boundary of a perceptron trained with the Delta rule on a 2D toy data set, shown after iterations 0, 1, 2 and 1000.)

Outline
1. Neurons
2. Training Neurons
3. The Capacity of Neurons
4. Neural Networks

An Important Question ...
Can a single perceptron learn any function f: R^d → {−1, 1}?
(No: a single perceptron only separates the input space with one hyperplane, so it cannot represent functions that are not linearly separable, such as XOR.)

Achieving Non-Linearity by Connecting Neurons
We can realize XOR by connecting multiple neurons!
- neuron 1: inputs x_1, x_2 with weights 1, 1 and threshold 0.5 (an OR gate), output φ_1(x_1, x_2)
- neuron 2: inputs x_1, x_2 with weights 1, 1 and threshold 1.5 (an AND gate), output φ_2(x_1, x_2)
- neuron 3: inputs φ_1, φ_2 with weights 1, −1 and threshold 0.5, output φ_3(φ_1(x_1, x_2), φ_2(x_1, x_2))

  x_1 | x_2 | φ_1(x_1, x_2) | φ_2(x_1, x_2) | φ_3(φ_1, φ_2) | x_1 ⊗ x_2
  ----+-----+---------------+---------------+---------------+----------
   0  |  0  |       0       |       0       |       0       |    0
   1  |  0  |       1       |       0       |       1       |    1
   0  |  1  |       1       |       0       |       1       |    1
   1  |  1  |       1       |       1       |       0       |    0

Achieving Non-Linearity: Remarks
- This is only possible because of the neurons' non-linear activation functions!
  - with activation functions:
        y = f(w''_1 · f(w_1 x_1 + w_2 x_2) + w''_2 · f(w'_1 x_1 + w'_2 x_2))
  - without activation functions:
        y = w''_1 · (w_1 x_1 + w_2 x_2) + w''_2 · (w'_1 x_1 + w'_2 x_2) = w'''_1 x_1 + w'''_2 x_2
    This is just a linear function (and does not solve XOR).

Theorem (Learning Capacity of Networks of McP-Neurons)
We can realize any boolean function f: {0,1}^d → {0,1} by connecting not more than 2^d + 1 McP-neurons.
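To connect this back to the XOR construction above, here is a small Python sketch of the three-neuron network, using the weights and thresholds from the slide (neuron 1: weights 1, 1, threshold 0.5; neuron 2: weights 1, 1, threshold 1.5; neuron 3: weights 1, -1, threshold 0.5); the function names are illustrative only.

    def fires(x, w, theta):
        # McP-style unit: output 1 iff w · x >= theta
        return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0

    def xor_net(x1, x2):
        phi1 = fires((x1, x2), (1, 1), 0.5)        # neuron 1: OR
        phi2 = fires((x1, x2), (1, 1), 1.5)        # neuron 2: AND
        return fires((phi1, phi2), (1, -1), 0.5)   # neuron 3: fires iff OR is active and AND is not

    for x1 in (0, 1):
        for x2 in (0, 1):
            print(x1, x2, "->", xor_net(x1, x2))   # reproduces the XOR truth table from the slide

The two hidden neurons carve out the OR and AND regions, and the output neuron combines them, which is exactly the non-linearity argument made in the remarks above.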