Neural Networks I

Machine Learning { winter term 2016/17 { Chapter 07: Neural Networks I Prof. Adrian Ulges Masters \Computer Science" DCSM Department University of Applied Sciences RheinMain 1 14. November 2016 Resources for the next Chapters Online Book 1: The nice one I Nielsen: \Neural Networks and Deep Learning" http://neuralnetworksanddeeplearning.com Online Book 2: The tough one I Goodfellow, Bengio, Courville: \Deep learning" https://www.deeplearningbook.org 2 (Artificial) Neural Networks.... image: [1] I ... are one of the core topics of artificial intelligence (AI) I ... are biologically inspired: They are mathematical models simulating the process of \thinking" in living organisms' brains I ... are graphs (or networks) of interlinked atomic processing units (neurons) I ... are not programmed but define their own behavior entirely through the graph's links and weights. These are adjusted by training. I ... are covered by two research fields 1. computational neuroscience (goal: better understanding of biological processes) 2. machine learning / AI (here) (goal: solving practical data analysis problems) 3 Neural Networks.... I ... have a turbulent history 1940 1950 1960 1970 1980 1990 2000 2010 2015 I ... exist in many variants I multi-layer perceptron (in this lecture) I convolutional neural networks (in this lecture) I Boltzmann machines I RBF networks I recurrent neural networks I LSTM networks I ... I ... are the most intensely machine learning model these days (\deep learning") 4 Deep Learning Applications images from [5] [6] [2] [4] [7] 5 Neural Networks: Biological Inspiration Neurons are brain cells that send and receive electrical impulses. Neurons: Terminology I Cell body (soma): soma 1 center of the neuron, with electrical axon a diameter of 5 − 100µm impulse soma 3 synapse I axon: thin, long (up to 1m) axon dendrites nerve fibre, splits and dendrites dendrites transmits impulses to other neurons I dendrites: small, soma 2 branch-like outgrowths that axon synapse receive impulses and axon transmit them to the soma pre-synaptic post-synaptic membrane membrane I synapses: electro-chemical link between two neurons. Transfers signal from axon dentrite to dendrites via transmitter substances. 6 From Biological to Artificial Neural Networks I Biological neural networks are (much) more complex than artificial ones - connectivity: 1011 neurons, each connected with ≈ 7; 000 others - adaptivity: number and connections of neurons change over time I Biological neural networks are asynchronous and compute with rather low frequency (1 KHz) I The circuit layout is unknown Learning I Learning happens at synapses, by changing the transmitter dose - sensitization: more transmitter, enhancing the signal - desensitization: less transmitter, supressing the signal Artificial Neural Networks X1 w1 θ I weighted sum + activation function X y 2 Σ w2 I learning happens by adapting weights wm I memory happens by storing weights Xm 7 Outline 1. Neurons 2. Training Neurons 3. The Capacity of Neurons 4. Neural Networks 8 The McP-Neuron I The historically oldest (1943) neuron by McCulloch & Pitts I strongly simplified model of biological neurons I no learning yet Definition I input d x = (x1; :::; xd ) 2 f0; 1g I output y 2 f0; 1g I weights w = d (w1; :::; wd ) 2 {−1; 1g I wj = 1: stimulating X (="anregend") 1 w1 θ y I wj = −1: inhibitory X 2 w Σ (= \hemmend") 2 X wm I threshold θ 2 R m 9 The McP-Neuron Computational Model I compute the scalar product between input signal x and weights w, namely w · x I apply an activation function f (here, a step function), obtaining the output y y = f (w · x) 1 if w · x ≥ θ = 0 else f(x·w) I The McP-Neuron is a function φw,θ : f0; 1gd ! f0; 1g with parameters w; θ If φw,θ(x) = 1, we say the x·w I θ neuron \is activated", or the neuron “fires” 10 The McP-Neuron x → w McP-Neurons: Graphical Representation 1 1 . I input x . I weights w → w xm m I threshold θ θ ↓ y What can McP-Neurons do? I We can model logic gates using McP-neurons x → x → → 1 -1 1 1 x1 1 -0,5 x → → 2 1 x2 1 ↓ y 2 1 ↓ y ↓ y NOT AND OR 11 From McP-Neurons to the Perceptron McP-Neurons: Limitations I only boolean inputs/weights/outputs I only binary activation functions I no learning Extensions: The Perceptron d I real-valued weights and inputs w; x 2 R I We will introduce a learning algorithm, the Delta-rule 12 Perceptron: Graphical Interpretation I The parameters w; θ define a hyperplane I w · x ≥ θ: x is on the positive side I w · x < θ: x is on the negative side hy pe w rp la ne θ / ||w|| positive side negative side 13 Outline 1. Neurons 2. Training Neurons 3. The Capacity of Neurons 4. Neural Networks 14 Perceptron Training: Basics w,θ I A neuron's function (or behavior) φ is determined by the parameters w and θ I Given labeled training data: How do we find the best parameters / the best hyperplane? hyperplane 2 0 (w , θ ) positive 2 2 0 0 side 0 1 0 1 1 1 negative 0 0 0 0 side 1 1 0 0 1 1 0 0 1 1 0 1 0 1 0 0 1 1 0 0 1 1 0 0 0 1 1 1 0 1 1 1 0 1 0 1 0 1 0 1 0 0 1 1 negative 1 1 side hyperplane 1 positive θ (w1, 1) side bad parameters: good parameters: error rate 12 / 30 = 40% error rate 0% 15 Perceptron Training: Vector Augmentation Before learning, we simplify notation using a → w x1 1 simple trick called vector augmentation: . Idea . I omit the additional threshold parameter → w θ and include it in the weight vector xd d θ So far... ↓ y I x := (x1; :::; xd ); w := (w1; :::; wd ) I fire if w · x ≥ θ (or w · x − θ ≥ 0) → w x1 1 Now . I x := (x1; :::; xd ; 1); w := (w1; :::; wd ; −θ) . I fire if w · x ≥ 0 x → w Remark d d 1 → −θ This is just a change of notation (θ is not 0 written separately any more), not of behavior! ↓ y 16 Perceptron Training Learning happens by an iterative optimization. We start with random values for w and θ, and in each iteration we ... I ... classify a training sample x and compare the result φ(x) with the targeted result y I If necessary, we correct the neuron's parameters such that the output φ(x) is adapted towards the desired output y wnew ← wold + ∆w x1 w 1 no 2 φ(x) x θ w Σ =? 2 yes w xd d y 17 Perceptron Training: The Delta Rule How do we compute the 'right' update ∆w? There are different strategies. The most famous one is the Delta Rule: 4w = λ · y − φ(x) · x Remarks I We call λ the learning rate I Note: If xi is classified correctly, ∆w = 0 ! no update 1 function train perceptron(x1; :::; xn; y1; :::; yn; λ): 2 initialize w randomly 3 until convergence: 4 choose a random training sample (x; y) (with y 2 f0; 1g) 5 compute φ(x) // classify x 6 if φ(x) <> y: 7 w := w + λ · y − φ(x) · x | {z } ∆w 8 18 The Delta Rule: Motivation 4w = λ · y − φ(x) · x I Why does the Delta rule work? I We illustrate what happens when a training sample is misclassified ... hy pe rp la ne w w ∆w (=0.2·x) wnew =0.2 x 1 1 19 The Delta Rule: Motivation Formal Motivation 20 The Delta Rule: Motivation Formal Motivation 21 The Delta Rule: Code Example 22 The Delta Rule: Code Example iteration 0 10 8 6 4 2 0 0 2 4 6 8 10 23 The Delta Rule: Code Example iteration 1 10 8 6 4 2 0 0 2 4 6 8 10 24 The Delta Rule: Code Example iteration 2 10 8 6 4 2 0 0 2 4 6 8 10 25 The Delta Rule: Code Example iteration 1000 10 8 6 4 2 0 0 2 4 6 8 10 26 Outline 1. Neurons 2. Training Neurons 3. The Capacity of Neurons 4. Neural Networks 27 An Important Question... d Can a single perceptron learn any Function f : R ! {−1; 1g? 28 Achieving Non-Linearity by connecting Neurons We can realize XOR by connecting multiple neurons! x2 neuron 1 neuron 2 1 0 x1 1 1 n e x2 1 1 u ro n 0,5 1,5 2 0 1 n x φ φ e 1 1(x1,x2) 2(x1,x2) u ro n 1 1 -1 0,5 φ φ φ (x ,x ) 3( 1(x1,x2), 2 1 2 neuron 3 φ (x ,x ) ) 2 1 2 0 φ φ φ φ φ ⊗ x1 x2 (x ,x ) 2(x ,x ) 3( (x ,x ), 2(x ,x )) x x 3 1 1 2 1 2 1 1 2 1 2 1 2 n ro u 0 0 0 0 0 0 e n 1 0 1 1 0 1 1 0 1 1 0 1 0 1 1 φ 1(x1,x2) 1 1 1 1 0 0 29 neuron 1 neuron 2 x 1 1 Achieving Non-Linearity 1 x2 1 1 Remarks 0,5 1,5 (x ,x ) I This is only possible because 1(x1,x2) 1 2 of the neurons' non-linear 1 -1 0,5 ( 1(x1,x2), activation functions! neuron 3 (x1,x2) ) - with activation functions 00 00 0 0 y = f (w1 · f (w1x1 + w2x2) + w2 · f (w1x1 + w2x2)) - without activation functions 00 00 0 0 y = w1 · (w1x1 + w2x2) + w2 · (w1x1 + w2x2) 000 000 = w1 x1 + w2 x2 This is just a linear function (and does not solve XOR) 30 Achieving Non-Linearity by connecting Neurons Theorem (Learning Capacity of Networks of McP-Neurons) We can realize any boolean function f : f0; 1gd ! f0; 1g by connecting not more than 2d + 1 McP-neurons.

Load more