
Artificial Neural Networks
CS 5233 Artificial Intelligence

Contents
- Biological Neural Networks
- Artificial Neural Networks
- ANN Structure
- ANN Illustration
- Perceptrons: Perceptron Learning Rule, Perceptrons Continued, Example of Perceptron Learning (α = 1)
- Gradient Descent: Gradient Descent (Linear), The Delta Rule
- Multilayer Networks: Illustration, Sigmoid Activation, Plot of Sigmoid Function, Applying the Chain Rule, Backpropagation, Hypertangent Activation, Gaussian Activation Function, Issues in Neural Networks

Biological Neural Networks

- Neural networks are inspired by our brains. The human brain has about 10^11 neurons and 10^14 synapses.
- A neuron consists of a soma (cell body), axons (which send signals), and dendrites (which receive signals). A synapse connects an axon to a dendrite.
- Given a signal, a synapse might increase (excite) or decrease (inhibit) the electrical potential. A neuron fires when its electrical potential reaches a threshold.
- Learning might occur by changes to synapses.

Artificial Neural Networks

- An (artificial) neural network consists of units, connections, and weights. Inputs and outputs are numeric.
- Correspondence between biological and artificial neural networks:

      Biological NN       Artificial NN
      soma                unit
      axon, dendrite      connection
      synapse             weight
      potential           weighted sum
      threshold           bias weight
      signal              activation

ANN Structure

- A typical unit i receives inputs a_j from other units and performs a weighted sum:
      in_i = W_0,i + Σ_j W_j,i · a_j
  and outputs the activation a_i = g(in_i).
- Typically, input units store the inputs, hidden units transform the inputs into an internal numeric vector, and an output unit transforms the hidden values into the prediction.
- An ANN is a function f(x, W) = a, where x is an example, W is the weights, and a is the prediction (the activation value of the output unit).
- Learning is finding a W that minimizes error.

ANN Illustration

[Figure: a feed-forward network with input units x1–x4, hidden units H5 and H6, and output unit O7 producing output a7. Weights W15, W25, W35, W45 and bias weight W05 feed H5; W16, W26, W36, W46 and bias weight W06 feed H6; W57, W67, and bias weight W07 feed O7.]

Perceptrons

Perceptron Learning Rule

- A perceptron is a single unit with activation:
      a = sign(W_0 + Σ_j W_j · x_j)
  where sign returns −1 or 1, and W_0 is the bias weight.
- One version of the perceptron learning rule is (see the code sketch below):
      in ← W_0 + Σ_j W_j · x_j
      E ← max(0, 1 − y · in)
      if E > 0 then
          W_0 ← W_0 + α · y
          W_j ← W_j + α · x_j · y   for each input x_j
  Here x is the inputs, y ∈ {1, −1} is the target, and α > 0 is the learning rate.

Perceptrons Continued

- This learning rule tends to minimize E.
- The perceptron convergence theorem states that if some W classifies all the examples correctly, then the perceptron learning rule will converge to zero error on the training examples.
- Usually, many epochs (passes over the training examples) are needed until convergence.
- If zero error is not possible, use α ≈ 0.1/n, where n is the number of normalized or standardized inputs.
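As a concrete illustration of the perceptron learning rule above, here is a minimal Python sketch of one training epoch. It follows the pseudocode directly; the names perceptron_epoch, examples, w0, and w are my own illustrative choices, not part of the original notes.

```python
def perceptron_epoch(examples, w0, w, alpha=1.0):
    """One pass of the perceptron learning rule over (x, y) pairs, with y in {-1, +1}."""
    for x, y in examples:
        in_ = w0 + sum(wj * xj for wj, xj in zip(w, x))   # weighted sum
        E = max(0.0, 1.0 - y * in_)                        # E = max(0, 1 - y*in)
        if E > 0:                                          # misclassified or inside the margin
            w0 += alpha * y                                # bias-weight update
            w = [wj + alpha * xj * y for wj, xj in zip(w, x)]
    return w0, w

# Example usage on two of the training examples from the table below;
# in practice the epoch is repeated until E is zero on every example (if possible).
examples = [([0, 0, 0, 1], -1), ([1, 1, 1, 0], 1)]
w0, w = perceptron_epoch(examples, 0.0, [0.0, 0.0, 0.0, 0.0], alpha=1.0)
```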

Example of Perceptron Learning (α = 1)

Using α = 1:

    x1 x2 x3 x4 |  y | in | E | W0 W1 W2 W3 W4
                |    |    |   |  0  0  0  0  0   (initial weights)
     0  0  0  1 | -1 |  0 | 1 | -1  0  0  0 -1
     1  1  1  0 |  1 | -1 | 2 |  0  1  1  1 -1
     1  1  1  1 |  1 |  2 | 0 |  0  1  1  1 -1
     0  0  1  1 | -1 |  0 | 1 | -1  1  1  0 -2
     0  0  0  0 |  1 | -1 | 2 |  0  1  1  0 -2
     0  1  0  1 | -1 | -1 | 0 |  0  1  1  0 -2
     1  0  0  0 |  1 |  1 | 0 |  0  1  1  0 -2
     1  0  1  1 |  1 | -1 | 2 |  1  2  1  1 -1
     0  1  0  0 | -1 |  2 | 3 |  0  2  0  1 -1

The Delta Rule

- Given error E(W), obtain the gradient:
      ∇E(W) = (∂E/∂W_0, ∂E/∂W_1, ..., ∂E/∂W_n)
- To decrease error, use the update rule:
      W_j ← W_j − α · ∂E/∂W_j
- The LMS update rule can be derived using:
      E(W) = (y − (W_0 + Σ_j W_j · x_j))² / 2
- For the perceptron learning rule, use:
      E(W) = max(0, 1 − y · (W_0 + Σ_j W_j · x_j))
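To make the delta rule concrete, the sketch below estimates ∂E/∂W_j by finite differences for an arbitrary error function E(W) and then applies W_j ← W_j − α · ∂E/∂W_j. The finite-difference step and the names numeric_gradient and delta_rule_step are my own additions for illustration; the notes themselves derive the gradients analytically. With the squared error above, this behaves like the Widrow-Hoff (LMS) rule shown under Gradient Descent (Linear) below.

```python
def numeric_gradient(E, w, eps=1e-6):
    """Finite-difference estimate of the gradient of an error function E(w)."""
    grad = []
    for j in range(len(w)):
        w_plus = w[:j] + [w[j] + eps] + w[j + 1:]
        w_minus = w[:j] + [w[j] - eps] + w[j + 1:]
        grad.append((E(w_plus) - E(w_minus)) / (2 * eps))
    return grad

def delta_rule_step(E, w, alpha=0.1):
    """One delta-rule update: w_j <- w_j - alpha * dE/dw_j."""
    return [wj - alpha * gj for wj, gj in zip(w, numeric_gradient(E, w))]

# Squared error for a single example; w[0] plays the role of the bias weight W_0.
x, y = [1.0, 0.0, 1.0, 1.0], 1.0
squared_error = lambda w: (y - (w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))) ** 2 / 2
w = delta_rule_step(squared_error, [0.0] * 5)
```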


Gradient Descent

Gradient Descent (Linear)

- Suppose the activation is a linear function:
      a = W_0 + Σ_j W_j · x_j
- The Widrow-Hoff (LMS) update rule is (see the code sketch below):
      diff ← y − a
      W_0 ← W_0 + α · diff
      W_j ← W_j + α · x_j · diff   for each input j
  where α is the learning rate.
- This update rule tends to minimize the squared error
      E(W) = Σ_(x,y) (y − a)²

Multilayer Networks

Illustration

[Figure: the same four-input, two-hidden-unit, one-output network (x1–x4, H5, H6, O7 with output a7), here labeled with concrete numeric weights on the connections and on the bias inputs of H5, H6, and O7.]
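The Widrow-Hoff (LMS) rule above translates directly into a small training loop, sketched below. The function name lms_train, the fixed epoch count, and the (x, y) example layout are assumptions of mine, not part of the notes.

```python
def lms_train(examples, n_inputs, alpha=0.1, epochs=50):
    """Widrow-Hoff (LMS) training of a single linear unit."""
    w0, w = 0.0, [0.0] * n_inputs
    for _ in range(epochs):
        for x, y in examples:                               # one pass = one epoch
            a = w0 + sum(wj * xj for wj, xj in zip(w, x))   # linear activation
            diff = y - a
            w0 += alpha * diff                              # W_0 <- W_0 + alpha * diff
            w = [wj + alpha * xj * diff for wj, xj in zip(w, x)]
    return w0, w
```

Unlike the perceptron rule, every example updates the weights (diff is rarely exactly zero), and the updates tend to reduce the squared error Σ (y − a)².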

Sigmoid Activation

- The sigmoid function is defined as:
      sigmoid(x) = 1 / (1 + e^(−x))
- It is commonly used for ANN activation functions:
      a_i = sigmoid(in_i) = sigmoid(W_0,i + Σ_j W_j,i · a_j)
- Note that
      ∂sigmoid(x)/∂x = sigmoid(x) · (1 − sigmoid(x))

Plot of Sigmoid Function

[Figure: plot of sigmoid(x) for x from −4 to 4; the values rise from near 0 to near 1.]

Applying The Chain Rule

- Using E = (y_i − a_i)² for output unit i:
      ∂E/∂W_j,i = (∂E/∂a_i) · (∂a_i/∂in_i) · (∂in_i/∂W_j,i)
                = −2(y_i − a_i) · a_i(1 − a_i) · a_j
- For weights from input to hidden units:
      ∂E/∂W_k,j = (∂E/∂a_i) · (∂a_i/∂in_i) · (∂in_i/∂a_j) · (∂a_j/∂in_j) · (∂in_j/∂W_k,j)
                = −2(y_i − a_i) · a_i(1 − a_i) · W_j,i · a_j(1 − a_j) · x_k

Backpropagation

- Backpropagation is an application of the delta rule.
- Update each weight W using the rule:
      W ← W − α · ∂E/∂W
  where α is the learning rate.
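The two chain-rule gradients above translate directly into code. The following is a minimal sketch of one backpropagation step for a network with one sigmoid hidden layer and a sigmoid output unit, using E = (y − a_i)²; the function names and the weight layout (index 0 holding each unit's bias weight) are my own conventions, not the notes'.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def backprop_step(x, y, W_hidden, W_out, alpha=0.1):
    """One backpropagation update. W_hidden[j] holds the weights into hidden unit j
    (W_hidden[j][0] is its bias weight); W_out holds the weights into the output
    unit (W_out[0] is its bias weight, W_out[1 + j] weights hidden unit j)."""
    # Forward pass.
    a_hidden = [sigmoid(wj[0] + sum(wjk * xk for wjk, xk in zip(wj[1:], x)))
                for wj in W_hidden]
    a_out = sigmoid(W_out[0] + sum(wj * aj for wj, aj in zip(W_out[1:], a_hidden)))

    # dE/dW_{j,i} = -2 (y - a_i) a_i (1 - a_i) a_j   for weights into the output unit.
    delta_out = -2.0 * (y - a_out) * a_out * (1.0 - a_out)
    new_W_out = [W_out[0] - alpha * delta_out] + \
                [wj - alpha * delta_out * aj for wj, aj in zip(W_out[1:], a_hidden)]

    # dE/dW_{k,j} = -2 (y - a_i) a_i (1 - a_i) W_{j,i} a_j (1 - a_j) x_k   for hidden weights.
    new_W_hidden = []
    for wj, aj, wji in zip(W_hidden, a_hidden, W_out[1:]):
        delta_j = delta_out * wji * aj * (1.0 - aj)
        new_W_hidden.append([wj[0] - alpha * delta_j] +
                            [wjk - alpha * delta_j * xk for wjk, xk in zip(wj[1:], x)])
    return new_W_hidden, new_W_out

# Example usage on a 4-2-1 network like the illustration (hypothetical initial weights):
W_hidden = [[0.1] * 5, [-0.1] * 5]
W_out = [0.0, 0.2, -0.2]
W_hidden, W_out = backprop_step([1, 0, 1, 1], 1.0, W_hidden, W_out)
```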

Hypertangent Activation

[Figure: plot of tanh(x) for x from −2 to 2; the values range from −1 to 1.]

Issues in Neural Networks

- Sufficiently complex ANNs can approximate any "reasonable" function. ANNs have, approximately, a preference bias for interpolation.
- How many hidden units and layers? What are good initial values?
- One approach to avoid overfitting (sketched in code below):
  - Remove "validation" examples from the training examples.
  - Train the neural network using the training examples.
  - Choose the weights that are best on the validation set.
- Faster training might rely on:
  - Momentum, a running average of deltas;
  - Conjugate Gradient, a second-derivative method;
  - or RPROP, which updates based only on the sign of the derivative.
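The validation-set recipe above can be sketched as follows. Here train_one_epoch and validation_error are placeholders for whatever update rule and error measure are in use (for example, one backpropagation pass and the squared error); they, along with the other names, are assumptions of mine rather than functions defined in these notes.

```python
import random

def train_with_validation(examples, initial_weights, train_one_epoch, validation_error,
                          epochs=100, val_fraction=0.2):
    """Hold out validation examples, train on the rest, and keep the weights
    that score best on the validation set (early stopping against overfitting)."""
    examples = list(examples)
    random.shuffle(examples)
    n_val = int(len(examples) * val_fraction)
    val_exs, train_exs = examples[:n_val], examples[n_val:]

    weights = initial_weights
    best_weights, best_error = weights, float("inf")
    for _ in range(epochs):
        weights = train_one_epoch(weights, train_exs)   # update on training examples only
        err = validation_error(weights, val_exs)
        if err < best_error:                            # remember the best-generalizing weights
            best_weights, best_error = weights, err
    return best_weights
```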

Gaussian Activation Function

[Figure: plot of gaussian(x, 1) for x from −3 to 3; a bell-shaped curve peaking at 1 at x = 0.]
