Neural Learning Methods: The Basic Neuron Model
Class #16: Neural Networks
Machine Learning (COMP 135): M. Allen, Monday, 28 Oct. 2019

Neural Learning Methods

} An obvious source of biological inspiration for learning research: the brain
} The work of McCulloch and Pitts (1943), which led to the perceptron, began as research into how we could precisely model the neuron and the network of connections that allow animals (like us) to learn
} These networks are used as classifiers: given an input, they label that input with a classification, or a distribution over possible classifications

The Basic Neuron Model

Source: Russell & Norvig, AI: A Modern Approach (Prentice Hall, 2010)

[Figure 18.19: a simple mathematical model of a neuron. The unit's output activation is $a_j = g\big(\sum_{i=0}^{n} w_{i,j}\, a_i\big)$, where $a_i$ is the output activation of unit $i$ and $w_{i,j}$ is the weight on the link from unit $i$ to this unit. The input links feed the input function $\Sigma$, which produces $in_j$; the activation function $g$ then produces the output $a_j$, which feeds the output links. The bias weight $w_{0,j}$ is attached to the fixed input $a_0 = 1$.]

} Each input $a_i$ to neuron $j$ is given a weight $w_{i,j}$
} Each neuron is treated as having a fixed dummy input, $a_0 = 1$
} A neuron gets input from a set of other neurons, or from the problem input, and computes the function $g$
} The input function is then the weighted linear sum:

$$in_j = \sum_{i=0}^{n} w_{i,j}\, a_i = w_{0,j}\, a_0 + w_{1,j}\, a_1 + w_{2,j}\, a_2 + \cdots + w_{n,j}\, a_n = w_{0,j} + w_{1,j}\, a_1 + w_{2,j}\, a_2 + \cdots + w_{n,j}\, a_n$$

} The output $a_j$ is either passed along to another set of neurons, or used as the final output for the learning problem itself

We've Seen This Before!

} The weighted linear sum of inputs, with the dummy input $a_0 = 1$, is just a form of the dot product that our classifiers have been using all along:

$$in_j = \sum_{i=0}^{n} w_{i,j}\, a_i = w_{0,j} + w_{1,j}\, a_1 + w_{2,j}\, a_2 + \cdots + w_{n,j}\, a_n = \mathbf{w}_j \cdot \mathbf{a}$$

} Remember that the "neuron" here is just another way of looking at the perceptron idea we already discussed
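To make the dot-product view concrete, here is a minimal Python sketch (my own illustration, not from the slides; the function name `neuron_output` and the use of NumPy are assumptions) of a single unit computing $a_j = g(\mathbf{w}_j \cdot \mathbf{a})$ with the dummy input $a_0 = 1$ prepended:

```python
import numpy as np

def neuron_output(weights, inputs, g):
    """One unit: a_j = g(in_j), where in_j = w_j . a with dummy a_0 = 1.

    weights: length n+1, starting with the bias weight w_0j
    inputs:  length n, the incoming activations a_1 .. a_n
    g:       the activation (output) function
    """
    a = np.concatenate(([1.0], inputs))  # prepend the fixed dummy input a_0 = 1
    in_j = np.dot(weights, a)            # the weighted linear sum in_j
    return g(in_j)

# Example: an identity activation just returns the weighted sum
print(neuron_output(np.array([0.5, 1.0, -2.0]), np.array([3.0, 1.0]), lambda z: z))
# 0.5 + 1.0*3.0 - 2.0*1.0 = 1.5
```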
Neuron Output Functions

} While the inputs to any neuron are combined in a linear fashion, the output function $g$ need not be linear
} The power of neural nets comes from the fact that we can combine large numbers of neurons together to compute any function (linear or not) that we choose

The Perceptron Threshold Function

} One possible output function is the binary threshold, which is suitable for "firm" classification problems, and causes the neuron to activate based on a simple binary rule:

$$g(in_j) = \begin{cases} 1 & \text{if } in_j \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

The Sigmoid Activation Function

} A function that has been used more often in neural networks is the logistic (also known as the sigmoid), as seen before:

$$g(in_j) = \frac{1}{1 + e^{-in_j}}$$

} This gives us a "soft" value, which we can often interpret as the probability of belonging to some output class

Power of Perceptron Networks

} A single-layer network combines a linear function of the input weights with the non-linear output function
} If we threshold the output, we have a boolean (1/0) function
} This is sufficient to compute numerous linear functions, such as OR and AND:

    x1 OR x2            x1 AND x2
    x1  x2 | y          x1  x2 | y
     0   0 | 0           0   0 | 0
     0   1 | 1           0   1 | 0
     1   0 | 1           1   0 | 0
     1   1 | 1           1   1 | 1

} A single-layer network with inputs for the variables (x1, x2) and a bias term (x0 == 1) can compute the OR of its inputs
} Threshold: (y == 1) if the weighted sum S >= 0; else (y == 0)
} What weights can we apply to the three inputs to produce OR?
} One answer: -0.5 + x1 + x2
} What about the AND function instead?
} One answer: -1.5 + x1 + x2

Linear Separation with Perceptron Networks

} We can think of binary functions as dividing the (x1, x2) plane into regions
} The ability to express such a function is analogous to the ability to linearly separate the data into such regions

[Plots: the OR and AND functions in the (x1, x2) plane; in each case a single line separates the points where y = 1 from those where y = 0.]
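As a quick check of the OR and AND answers above, here is a small Python sketch (my own, not course code) that implements the binary threshold and the sigmoid, and verifies that the weight vectors $(-0.5, 1, 1)$ and $(-1.5, 1, 1)$ compute OR and AND when the output is thresholded:

```python
import math

def threshold(in_j):
    # "firm" binary activation: fire (1) when in_j >= 0, else 0
    return 1 if in_j >= 0 else 0

def sigmoid(in_j):
    # "soft" logistic activation, always strictly between 0 and 1
    return 1.0 / (1.0 + math.exp(-in_j))

def unit(weights, x1, x2, g=threshold):
    # single-layer unit with bias input x0 = 1
    return g(weights[0] + weights[1] * x1 + weights[2] * x2)

w_or, w_and = (-0.5, 1, 1), (-1.5, 1, 1)
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, unit(w_or, x1, x2), unit(w_and, x1, x2))
# OR column: 0 1 1 1; AND column: 0 0 0 1
```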
Functions with Non-Linear Boundaries

} There are some functions that cannot be expressed using a single layer of linear weighted inputs and a non-linear output
} Again, this is analogous to the inability to linearly separate the data in some cases, as with XOR:

    x1 XOR x2
    x1  x2 | y
     0   0 | 0
     0   1 | 1
     1   0 | 1
     1   1 | 0

[Plot: the XOR function in the (x1, x2) plane; no single line separates the points where y = 1 from those where y = 0.]

MLPs for Non-Linear Boundaries

} Neural networks gain expressive power because they can have more than one layer
} A multi-layer perceptron has one or more hidden layers between input and output
} Each hidden node applies a non-linear activation function, producing output that it sends along to the next layer
} In such cases, much more complex functions are possible, corresponding to non-linear decision boundaries (as in the current homework assignment); a sketch of such a network appears at the end of this section

Review: Properties of the Sigmoid Function

} The sigmoid takes its name from the shape of its plot: an S-shaped curve

[Plot: the sigmoid curve, rising from 0 toward 1 and passing through (0, 0.5).]

} It always takes a value in the range $0 < g(in_j) < 1$
} The function is everywhere differentiable, and has a derivative that is easy to calculate, which turns out to be useful for learning:

$$g(in_j) = \frac{1}{1 + e^{-in_j}} \qquad g'(in_j) = g(in_j)\,\big(1 - g(in_j)\big)$$
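Finally, a sketch of the multi-layer idea (my own construction, with hand-picked weights rather than learned ones, not taken from the homework): XOR is OR-but-not-AND, so two hidden threshold units reusing the OR and AND weights from the earlier slides, plus one output unit, compute it; the last lines also check the sigmoid derivative identity numerically:

```python
import math

def threshold(z):
    return 1 if z >= 0 else 0

def xor_mlp(x1, x2):
    # hidden layer: h1 computes OR, h2 computes AND (weights from the slides)
    h1 = threshold(-0.5 + x1 + x2)
    h2 = threshold(-1.5 + x1 + x2)
    # output layer: fires when h1 is on and h2 is off, i.e. OR and not AND
    return threshold(-0.5 + h1 - h2)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, xor_mlp(x1, x2))   # prints the XOR column: 0 1 1 0

# Numerical check of g'(z) = g(z) * (1 - g(z)) at an arbitrary point
g = lambda z: 1.0 / (1.0 + math.exp(-z))
z, h = 0.3, 1e-6
numeric = (g(z + h) - g(z - h)) / (2 * h)       # central-difference estimate
print(abs(numeric - g(z) * (1 - g(z))) < 1e-9)  # True
```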