Introduction to Multilayer Perceptrons

Barnabás Póczos

Multilayer Perceptron

ALVINN: An Autonomous Land Vehicle in a Neural Network (Dean A. Pomerleau, Carnegie Mellon University, 1989)

Training: uses a simulated road generator.


We want to solve: minimize the training error E(w) over the network weights w.

Starting Point

Fixed step size can be too big

Fixed step size can be too small
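To make this concrete, here is a minimal MATLAB sketch (not from the slides; the quadratic objective and the step sizes are illustrative assumptions) of gradient descent with a fixed step size on E(w) = w²:

% Gradient descent on E(w) = w^2 with a fixed step size eta.
% (Illustrative objective and step sizes, not taken from the slides.)
% The gradient is gradE(w) = 2*w, so the update is w <- w - eta*2*w.
E     = @(w) w.^2;
gradE = @(w) 2*w;

for eta = [0.01, 0.1, 1.1]          % too small, reasonable, too big
    w = 5;                          % starting point
    for iter = 1:25
        w = w - eta * gradE(w);     % fixed-step gradient update
    end
    fprintf('eta = %.2f  ->  w = %.4g, E(w) = %.4g\n', eta, w, E(w));
end

With eta = 1.1 the update map is w ↦ −1.2w, so the iterates diverge; with eta = 0.01 the iterates barely move in 25 steps; eta = 0.1 converges quickly toward the minimum at w = 0.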

Character Recognition with MLP

MATLAB demo: appcr1

The network

Noise-free input: the 26 letters of the alphabet, each given as a 7×5 pixel image.

Noisy inputs

MATLAB MLP Training

% Create the MLP
hiddenlayers = [10, 25];
net1 = feedforwardnet(hiddenlayers);
net1 = configure(net1, X, T);

% View the network
view(net1);

% Train
net1 = train(net1, X, T);

% Test
Y1 = net1(Xtest);

Prediction errors

▪ Network 1 was trained on clean images.
▪ Network 2 was trained on noisy images: 30 noisy copies of each letter are created (see the sketch below).
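A minimal sketch of the noisy-copy construction, assuming X and T are the letter images and targets from the listing above (the noise level 0.2 is an illustrative choice, not taken from the slides):

% Create 30 noisy copies of each letter and train a second network on them.
numCopies = 30;
Xn = repmat(X, 1, numCopies);                    % replicate the 26 clean letters
Tn = repmat(T, 1, numCopies);
Xn = min(max(Xn + 0.2*randn(size(Xn)), 0), 1);   % add Gaussian noise (illustrative level), clip to [0,1]

net2 = feedforwardnet([10, 25]);
net2 = configure(net2, Xn, Tn);
net2 = train(net2, Xn, Tn);

Y2 = net2(Xtest);                                % compare prediction errors with net1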

The Algorithm

Multilayer Perceptron

The gradient of the error
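The slide's formulas did not survive extraction; in standard notation (an assumption, chosen to match common MLP derivations), the squared error for one training sample and its gradient via the chain rule read:

$E(\mathbf{w}) = \tfrac{1}{2}\sum_k (y_k - t_k)^2, \qquad \frac{\partial E}{\partial w_{ji}} = \frac{\partial E}{\partial a_j}\,\frac{\partial a_j}{\partial w_{ji}} = \delta_j z_i$

where $a_j = \sum_i w_{ji} z_i$ is the pre-activation of unit $j$, $z_i$ is the input that unit receives, $y_k$ and $t_k$ are the network outputs and targets, and $\delta_j = \partial E / \partial a_j$.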

Notation

Some observations

The backpropagated error

Lemma

Therefore,
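The derivation steps were lost in extraction; the standard conclusion, in the notation introduced above, is the recursion that gives the algorithm its name:

$\delta_j = g'(a_j) \sum_k w_{kj}\,\delta_k$

where the sum runs over the units $k$ of the next layer and $g$ is the activation function; at the output layer, $\delta_k = g'(a_k)(y_k - t_k)$. The error terms are therefore computed backwards, layer by layer, from the outputs toward the inputs.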

The backpropagation algorithm
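A compact MATLAB sketch of the algorithm for one hidden layer, written from the standard equations above rather than recovered from the slides (the toy data, tanh activation, network sizes, and learning rate are all illustrative assumptions):

% One-hidden-layer MLP trained by backpropagation on the squared error.
% X: d-by-N inputs, T: 1-by-N targets (an illustrative toy regression problem).
rng(0);
X = rand(5, 200);
T = sin(sum(X, 1));
[d, N] = size(X);  h = 10;
W1 = 0.1*randn(h, d);  b1 = zeros(h, 1);     % hidden-layer parameters
W2 = 0.1*randn(1, h);  b2 = 0;               % output-layer parameters
eta = 0.05;                                  % learning rate

for epoch = 1:500
    % Forward pass
    A1 = W1*X + b1;                          % pre-activations a_j
    Z1 = tanh(A1);                           % hidden activations z_j
    Y  = W2*Z1 + b2;                         % linear output layer

    % Backward pass: delta for each layer
    D2 = Y - T;                              % output delta (linear output units)
    D1 = (1 - Z1.^2) .* (W2' * D2);          % backpropagated error; g'(a) = 1 - tanh(a)^2

    % Gradient step, averaged over the batch
    W2 = W2 - eta * (D2*Z1')/N;   b2 = b2 - eta * mean(D2, 2);
    W1 = W1 - eta * (D1*X')/N;    b1 = b1 - eta * mean(D1, 2);
end
fprintf('final MSE: %.4f\n', mean((Y(:) - T(:)).^2));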

What functions can multilayer perceptrons represent?

Perceptrons cannot represent the XOR function

f(0,0)=1, f(1,1)=1, f(0,1)=0, f(1,0)=0

(This truth table is the negation of XOR; neither it nor XOR is linearly separable, so no single perceptron can represent them, while one hidden layer suffices, as sketched below.)
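A minimal sketch (weights set by hand, not taken from the slides) of a two-layer threshold network computing the function f above: one hidden unit detects (1,1), the other detects (0,0), and the output unit ORs them.

% Hand-built two-layer network of threshold units computing f above.
step = @(a) double(a > 0);
f = @(x1, x2) step( step(x1 + x2 - 1.5) ...   % hidden unit 1: AND(x1, x2)
                  + step(0.5 - x1 - x2) ...   % hidden unit 2: NOR(x1, x2)
                  - 0.5 );                    % output unit: OR of the two

for x = [0 0; 0 1; 1 0; 1 1]'
    fprintf('f(%d,%d) = %d\n', x(1), x(2), f(x(1), x(2)));
end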

What functions can multilayer perceptrons represent?

Hilbert's 13th Problem

1902: Hilbert publishes a list of 23 "most important" problems in mathematics.

The 13th Problem: solve the general 7th-degree equation using continuous functions of two parameters. Hilbert's conjecture: it cannot be done.

Related conjecture: Let f be the function of 3 arguments defined by the equation

x⁷ + ax³ + bx² + cx + 1 = 0,

i.e., f(a,b,c) is a root x of this equation. Prove that f cannot be rewritten as a composition of finitely many continuous functions of two arguments.

Another form: prove that there is a continuous function of three variables that cannot be decomposed into finitely many continuous functions of two variables.

Function decompositions

f(x,y,z) = Φ₁(ψ₁(x), ψ₂(y)) + Φ₂(c₁ψ₃(y) + c₂ψ₄(z), x)

[Diagram: the decomposition drawn as a network. Inputs x, y, z feed the one-argument units ψ₁, …, ψ₄; the weighted sum c₁ψ₃(y) + c₂ψ₄(z) and the remaining arguments feed the two-argument units Φ₁ and Φ₂, whose outputs are summed to give f(x,y,z).]
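To make the diagram concrete, a tiny MATLAB sketch (the one-variable functions and constants are arbitrary illustrative choices) evaluating such a decomposition; every building block takes at most two arguments, yet the composite is a function of three:

% An arbitrary illustrative instance of the decomposition above.
psi1 = @(x) x.^2;   psi2 = @(y) sin(y);
psi3 = @(y) y;      psi4 = @(z) exp(z);
Phi1 = @(u, v) u .* v;
Phi2 = @(u, v) u + v;
c1 = 0.5;  c2 = 2;

f = @(x, y, z) Phi1(psi1(x), psi2(y)) + Phi2(c1*psi3(y) + c2*psi4(z), x);
f(1, 2, 0.5)    % evaluate the three-argument composite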

Function decompositions

In 1957, Arnold disproved Hilbert's conjecture.
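The result behind this is the Kolmogorov–Arnold representation theorem (stated here for reference, since the slide's own formulation was not recoverable): every continuous function $f : [0,1]^n \to \mathbb{R}$ can be written as

$f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\Bigl(\sum_{p=1}^{n} \psi_{q,p}(x_p)\Bigr)$

with continuous one-variable functions $\Phi_q$ and $\psi_{q,p}$; addition is the only genuinely multivariate operation needed.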

Function decompositions

Corollary: every continuous function of n variables can be represented exactly using only continuous one-variable functions and addition.

Issue: this statement is not constructive; it guarantees that the required functions exist but gives no method for finding them, and they can be highly non-smooth.

Universal Approximators

Kurt Hornik, Maxwell Stinchcombe and Halbert White: "Multilayer feedforward networks are universal approximators", Neural Networks, 2(5):359–366, 1989.

Definition: Σ(g) is the class of feedforward networks with one hidden layer and activation function g, i.e., all functions of the form x ↦ Σⱼ βⱼ g(wⱼᵀx + bⱼ).

Definition: g is a squashing function if it is non-decreasing with limit 0 at −∞ and limit 1 at +∞.

Theorem: for every squashing function g, the class Σ(g) is uniformly dense on compact sets in the space of continuous functions; single-hidden-layer feedforward networks are universal approximators.

Universal Approximators

Definition:

Theorem (Blum & Li, 1991):

Formal statement:

Proof

GOAL: approximate f by a finite weighted sum of indicator functions, in the style of a Riemann sum.

Integral approximation in 1-dim:

∫ f(x) dx ≈ Σᵢ f(xᵢ) Δxᵢ

Integral approximation in 2-dim:

∬ f(x,y) dx dy ≈ Σᵢ Σⱼ f(xᵢ, yⱼ) Δxᵢ Δyⱼ

Proof

GOAL: f(x) ≈ Σᵢ f(xᵢ) · 1{x ∈ Xᵢ}, where the polygons Xᵢ partition the domain and xᵢ ∈ Xᵢ.

The indicator function of the polygon Xᵢ can be learned by this neural network: it outputs 1 if x is in Xᵢ and −1 otherwise.

The weighted linear combination of these indicator functions will then be a good approximation of the original function f (see the sketch below).
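A minimal sketch (all weights hand-chosen for illustration, not from the slides) of how threshold units realize such an indicator: each hidden unit tests one half-plane, and the output unit fires only when every half-plane test passes.

% Indicator of the illustrative triangle {x >= 0, y >= 0, x + y <= 1}
% built from threshold units. Each hidden unit checks one half-plane;
% the output returns +1 only when all three tests pass, and -1 otherwise.
step = @(a) double(a > 0);
indicator = @(x, y) 2*step( step(x) + step(y) + step(1 - x - y) - 2.5 ) - 1;

indicator(0.2, 0.3)   % inside the triangle  -> +1
indicator(0.9, 0.9)   % outside the triangle -> -1

Summing many such indicators with weights f(xᵢ) gives exactly the Riemann-sum style approximation from the previous slides.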

Proof

This linear equation can also be solved.

Thanks for your attention!
