
Artificial Intelligence: Representation and Problem Solving
15-381, January 16, 2007
Neural Networks
Michael S. Lewicki, Carnegie Mellon

Topics
• decision boundaries
• linear discriminants
• perceptron
• gradient learning
• neural networks

The Iris dataset with decision tree boundaries
[Figure: iris data plotted as petal width (cm) against petal length (cm), with the axis-aligned boundaries produced by a decision tree.]

The optimal decision boundary for C2 vs C3
• the optimal decision boundary is determined from the statistical distribution of the classes
• it is optimal only if the model is correct!
• it assigns a precise degree of uncertainty to the classification
[Figure: class-conditional densities p(petal length | C2) and p(petal length | C3), shown together with the corresponding boundary in the petal width vs. petal length plane.]

Optimal decision boundary
[Figure: posterior probabilities p(C2 | petal length) and p(C3 | petal length) plotted with the class-conditional densities p(petal length | C2) and p(petal length | C3) against petal length.]

Can we do better?
• the only way is to use more information
• decision trees use both petal width and petal length
[Figure: the class-conditional densities of petal length alongside the two-dimensional iris data.]

Arbitrary decision boundaries would be more powerful
• decision boundaries could be non-linear
[Figure: iris data in the petal width vs. petal length plane with a curved decision boundary sketched in.]

Defining a decision boundary
• consider just two classes
• we want points on one side of the line to be in class 1, otherwise class 2
• the 2D linear discriminant function is
  y = m^T x + b = m_1 x_1 + m_2 x_2 + b = \sum_i m_i x_i + b
• this defines a 2D plane, which leads to the decision:
  x \in class 1 if y \ge 0, and x \in class 2 if y < 0
• the decision boundary is y = m^T x + b = 0, or in terms of scalars:
  m_1 x_1 + m_2 x_2 = -b \Rightarrow x_2 = -(m_1 x_1 + b) / m_2
[Figure: the resulting linear boundary drawn in the x_1 vs. x_2 plane.]

Linear separability
• two classes are linearly separable if they can be separated by a linear combination of attributes
  - 1D: threshold
  - 2D: line
  - 3D: plane
  - M-D: hyperplane
[Figure: in the iris data, one pair of classes is linearly separable and another pair is not.]

Diagramming the classifier as a "neural" network
• the feedforward neural network is specified by weights w_i and a bias b:
  y = w^T x + b = \sum_{i=1}^{M} w_i x_i + b
  (the "output unit" y is connected to the "input units" x_1, ..., x_M through the "weights" w_1, ..., w_M and the "bias" b)
• it can be written equivalently as
  y = w^T x = \sum_{i=0}^{M} w_i x_i
  where w_0 = b is the bias and x_0 is a "dummy" input that is always 1

Determining, i.e. learning, the optimal linear discriminant
• first we must define an objective function, i.e. the goal of learning
• simple idea: adjust the weights so that the output y(x_n) matches the class c_n
• objective: minimize the sum-squared error over all patterns x_n:
  E = \frac{1}{2} \sum_{n=1}^{N} (w^T x_n - c_n)^2
• the notation x_n denotes a pattern vector: x_n = \{x_1, ..., x_M\}_n
• we define the desired class as c_n = 0 if x_n \in class 1, and c_n = 1 if x_n \in class 2
• (a short code sketch of this discriminant and objective follows below)
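The discriminant, the dummy-input convention, and the sum-squared-error objective above can be collected into a short NumPy sketch. This is not code from the lecture; the helper names (add_dummy_input, discriminant, sum_squared_error) are hypothetical and introduced here only for illustration.

```python
import numpy as np

def add_dummy_input(X):
    """Prepend the constant 'dummy' input x0 = 1 to each pattern (row),
    so that the bias b becomes the weight w0."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def discriminant(w, X):
    """Linear discriminant y = w^T x for every pattern (row of X)."""
    return X @ w

def classify(w, X):
    """The slides' decision rule: class 1 if y >= 0, class 2 otherwise,
    encoded here as 0 / 1 to match the targets c_n."""
    return np.where(discriminant(w, X) >= 0, 0, 1)

def sum_squared_error(w, X, c):
    """E = 1/2 * sum_n (w^T x_n - c_n)^2, the objective defined above."""
    residual = discriminant(w, X) - c
    return 0.5 * np.sum(residual ** 2)
```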
We've seen this before: curve fitting
• the data are noisy samples of t = \sin(2\pi x) + noise
[Figure: training points (x_n, t_n) and the fitted curve y(x_n, w); example from Bishop (2006), Pattern Recognition and Machine Learning.]

Neural networks compared to polynomial curve fitting
• the polynomial model is
  y(x, w) = w_0 + w_1 x + w_2 x^2 + \cdots + w_M x^M = \sum_{j=0}^{M} w_j x^j
• with the error
  E(w) = \frac{1}{2} \sum_{n=1}^{N} [y(x_n, w) - t_n]^2
• for the linear network, M = 1 and there are multiple input dimensions
[Figure: polynomial fits of increasing order to the noisy sine data; example from Bishop (2006), Pattern Recognition and Machine Learning.]

General form of a linear network
• a linear neural network is simply a linear transformation of the input:
  y_j = \sum_{i=0}^{M} w_{i,j} x_i
• or, in matrix-vector form, y = W x
• multiple outputs correspond to multivariate regression
[Network diagram: "outputs" y_1, ..., y_K connected to "inputs" x_0 = 1 (the "bias"), x_1, ..., x_M by the "weights" w_{ij}.]

Training the network: optimization by gradient descent
• we can adjust the weights incrementally to minimize the objective function
• this is called gradient descent (or gradient ascent if we're maximizing)
• the gradient descent rule for weight w_i is
  w_i^{t+1} = w_i^t - \epsilon \frac{\partial E}{\partial w_i}
• or, in vector form,
  w^{t+1} = w^t - \epsilon \frac{\partial E}{\partial w}
• for gradient ascent, the sign of the gradient step changes
[Figure: contours of the error surface with successive weight estimates w^1, w^2, w^3, w^4 stepping downhill.]

Computing the gradient
• idea: minimize the error by gradient descent
• take the derivative of the objective function with respect to the weights:
  E = \frac{1}{2} \sum_{n=1}^{N} (w^T x_n - c_n)^2
  \frac{\partial E}{\partial w_i} = \frac{2}{2} \sum_{n=1}^{N} (w_0 x_{0,n} + \cdots + w_i x_{i,n} + \cdots + w_M x_{M,n} - c_n) x_{i,n}
                                  = \sum_{n=1}^{N} (w^T x_n - c_n) x_{i,n}
• and in vector form:
  \frac{\partial E}{\partial w} = \sum_{n=1}^{N} (w^T x_n - c_n) x_n

Simulation: learning the decision boundary
• each iteration updates the weights with the gradient:
  w_i^{t+1} = w_i^t - \epsilon \frac{\partial E}{\partial w_i}, with \frac{\partial E}{\partial w_i} = \sum_{n=1}^{N} (w^T x_n - c_n) x_{i,n}
• epsilon is a small value: \epsilon = 0.1/N
• epsilon too large: learning diverges
• epsilon too small: convergence is slow
• (a minimal sketch of this training loop follows below)
[Figures, repeated over several slides as training proceeds: the decision boundary in the x_1 vs. x_2 plane after successive iterations, and the learning curve of error against iteration.]
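The gradient and the update rule above translate into a minimal batch gradient-descent loop. Again, this is an illustrative sketch rather than the lecture's code; it assumes the hypothetical helpers from the earlier sketch and uses the slides' step size \epsilon = 0.1/N.

```python
def gradient(w, X, c):
    """Vector-form gradient from the slides: dE/dw = sum_n (w^T x_n - c_n) x_n."""
    return X.T @ (X @ w - c)

def train(X, c, n_iterations=15, w_init=None):
    """Batch gradient descent: w^{t+1} = w^t - eps * dE/dw, with eps = 0.1/N."""
    N = X.shape[0]
    eps = 0.1 / N                    # too large: diverges; too small: slow convergence
    w = np.zeros(X.shape[1]) if w_init is None else w_init.copy()
    errors = []
    for _ in range(n_iterations):
        w = w - eps * gradient(w, X, c)
        errors.append(sum_squared_error(w, X, c))   # record the learning curve
    return w, errors
```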
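A possible usage example, with synthetic two-class data standing in for the iris measurements (the cluster centres, spreads, and seed below are made up for illustration; standardizing the features is an added step, not from the slides, that keeps the fixed step size 0.1/N stable):

```python
rng = np.random.default_rng(0)
# two synthetic 2D clusters playing the role of the two iris classes
X1 = rng.normal(loc=[4.0, 1.2], scale=0.4, size=(50, 2))   # class 1, target 0
X2 = rng.normal(loc=[5.5, 2.0], scale=0.4, size=(50, 2))   # class 2, target 1
features = np.vstack([X1, X2])
features = (features - features.mean(axis=0)) / features.std(axis=0)  # standardize
X = add_dummy_input(features)
c = np.concatenate([np.zeros(50), np.ones(50)])

w, errors = train(X, c, n_iterations=15)
print(errors[0], errors[-1])   # the error should fall, tracing out a learning curve
```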