Introduction to Machine Learning
Multilayer Perceptron
Barnabás Póczos
The Multilayer Perceptron
ALVINN: An Autonomous Land Vehicle in a Neural Network. Dean A. Pomerleau, Carnegie Mellon University, 1989
Training: using a simulated road generator
We want to solve: min_w E(w), i.e. find the weights that minimize the training error, e.g. by gradient descent: w ← w − η·∇E(w)
Starting Point
Gradient descent can end up in different local minima from different starting points (illustrated by figures on the slides).
A fixed step size can be too big: the iterates overshoot the minimum and may diverge.
A fixed step size can be too small: progress toward the minimum becomes very slow.
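To make the step-size issue concrete, here is a minimal MATLAB sketch (not from the slides; the quadratic error function and the step sizes are chosen only for illustration):

% Gradient descent on E(w) = w^2 with a fixed step size eta.
% eta too big overshoots and diverges; eta too small crawls.
E  = @(w) w.^2;               % error function
dE = @(w) 2*w;                % its gradient
for eta = [1.1, 0.01, 0.4]    % too big, too small, reasonable
    w = 2;                    % starting point
    for t = 1:50
        w = w - eta*dE(w);    % fixed-step gradient update
    end
    fprintf('eta = %.2f -> w = %g, E(w) = %g\n', eta, w, E(w));
end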
Character Recognition with MLP
Matlab: appcr1
The network
Noise-free input:
26 different letters of size 7x5
Noisy inputs
Matlab MLP Training

% Create MLP
hiddenlayers = [10, 25];
net1 = feedforwardnet(hiddenlayers);
net1 = configure(net1, X, T);

% View
view(net1);

% Train
net1 = train(net1, X, T);

% Test
Y1 = net1(Xtest);

Prediction errors
▪ Network 1 was trained on clean images.
▪ Network 2 was trained on noisy images: 30 noisy copies of each letter are created (a sketch follows).
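A minimal sketch of how such a noisy training set can be built (the noise level 0.2 and the variable names are assumptions, not taken from appcr1):

% X: 35-by-26 matrix of clean 7x5 letter bitmaps (one letter per column)
% T: matching target matrix; replicate both and perturb the inputs
numCopies = 30;
Xn = repmat(X, 1, numCopies);
Xn = Xn + 0.2*randn(size(Xn));   % assumed Gaussian pixel noise
Tn = repmat(T, 1, numCopies);    % matching targets
% Network 2 is then trained on (Xn, Tn) exactly as net1 was on (X, T).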
The Backpropagation Algorithm
The gradient of the error
Notation
Write aj for the total input of unit j, zj = g(aj) for its output (g is the activation function), and δj = ∂E/∂aj for the error of unit j.
Some observations
The backpropagated error

Lemma: the error of a hidden unit can be computed from the errors of the units it feeds into:
δj = g'(aj) · Σk wkj δk

Therefore, starting from the errors of the output units, every δj can be computed in a single backward sweep, and the derivative of the error with respect to a weight is
∂E/∂wij = δj · zi
(the error of the downstream unit times the output of the upstream unit).
The backpropagation algorithm
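As a concrete companion to the algorithm, here is a minimal MATLAB sketch of one backpropagation step for a one-hidden-layer MLP with sigmoid hidden units, a linear output, and squared error (the sizes and names are illustrative assumptions, not the slides' notation):

% One backpropagation step on a single example (x, t).
sigma = @(a) 1./(1+exp(-a));           % sigmoid activation
x = [1; 0];  t = 1;                    % toy input and target
W1 = randn(3,2);  b1 = randn(3,1);     % hidden layer: 3 units, 2 inputs
W2 = randn(1,3);  b2 = randn(1,1);     % linear output unit
eta = 0.1;                             % step size

% Forward pass
a1 = W1*x + b1;   h = sigma(a1);       % hidden activations
y  = W2*h + b2;                        % network output

% Backward pass: propagate the error layer by layer
delta2 = y - t;                        % output error (squared loss)
delta1 = (W2'*delta2) .* h .* (1-h);   % backpropagated hidden error
W2 = W2 - eta*delta2*h';   b2 = b2 - eta*delta2;
W1 = W1 - eta*delta1*x';   b1 = b1 - eta*delta1;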
What functions can multilayer perceptrons represent?
Perceptrons cannot represent the XOR function (here written in complemented form):
f(0,0) = 1, f(1,1) = 1, f(0,1) = 0, f(1,0) = 0
No single linear threshold unit can separate {(0,0), (1,1)} from {(0,1), (1,0)}, but one hidden layer suffices (see the sketch below).
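A sketch showing that one hidden layer is enough for this truth table; the threshold units and hand-set weights are chosen for illustration:

% Two-layer network of threshold units computing f above:
% hidden unit 1 = AND(x1,x2), hidden unit 2 = NOR(x1,x2), output = OR.
step = @(a) double(a >= 0);                % Heaviside threshold
f = @(x1,x2) step( step(x1 + x2 - 1.5) ... % AND
                 + step(0.5 - x1 - x2) ... % NOR
                 - 0.5 );                  % OR of the two
disp([f(0,0), f(1,1), f(0,1), f(1,0)])     % prints 1 1 0 0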
Hilbert's 13th Problem
1902: 23 “most important” problems in mathematics
The 13th Problem: "Solve the 7th-degree equation using continuous functions of two parameters." Conjecture: it cannot be solved.
Related conjecture: Let f be the function of 3 arguments that returns a solution x of the equation
x^7 + a·x^3 + b·x^2 + c·x + 1 = 0
as a function f(a, b, c).
Prove that f cannot be rewritten as a composition of finitely many functions of two arguments.
Another rewritten form: Prove that there is a nonlinear continuous function of three variables that cannot be decomposed into finitely many continuous functions of two variables.

Function decompositions

f(x,y,z) = Φ1(ψ1(x), ψ2(y)) + Φ2(c1·ψ3(y) + c2·ψ4(z), x)
(Figure: network diagram of this decomposition. The inputs x, y, z pass through the one-variable functions ψ1, …, ψ4; a summation node forms c1·ψ3(y) + c2·ψ4(z); the outputs of Φ1 and Φ2 are summed to give f(x,y,z).)
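A concrete instance of such a decomposition in MATLAB (all the one- and two-variable functions and the constants are chosen arbitrarily for illustration):

% f(x,y,z) built only from functions of at most two arguments
psi1 = @(x) x.^2;   psi2 = @(y) sin(y);
psi3 = @(y) y;      psi4 = @(z) exp(z);
Phi1 = @(u,v) u + v;   Phi2 = @(u,v) u.*v;   % two-argument outer functions
c1 = 2;  c2 = -1;
f = @(x,y,z) Phi1(psi1(x), psi2(y)) + Phi2(c1*psi3(y) + c2*psi4(z), x);
f(1, 2, 0.5)                                 % evaluate at a sample point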
1957: Arnold disproves Hilbert's conjecture (the Kolmogorov-Arnold representation): every continuous function of n variables can be written using only continuous functions of one variable and addition:
f(x1, …, xn) = Σq=0..2n Φq( Σp=1..n ψqp(xp) )

Corollary: every continuous multivariate function can be decomposed into finitely many continuous one-variable functions and sums.

Issues: This statement is not constructive, and the inner one-variable functions can be highly non-smooth.
Universal Approximators
Kurt Hornik, Maxwell Stinchcombe and Halbert White: "Multilayer feedforward networks are universal approximators", Neural Networks, Vol. 2(3), 359-366, 1989
Definition: ΣN(g), the class of neural networks with 1 hidden layer and activation g:
ΣN(g) = { f : R^N → R, f(x) = Σj=1..q βj · g(wj·x + bj), q finite }

Definition: g is a squashing function if it is non-decreasing with g(λ) → 1 as λ → ∞ and g(λ) → 0 as λ → −∞.

Theorem: for every squashing function g, ΣN(g) is uniformly dense on compacta in the continuous functions: one hidden layer is enough to approximate any continuous function arbitrarily well on any compact set.
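To see the theorem's flavor numerically, here is a small MATLAB sketch: a member of Σ1(g) with randomly drawn hidden weights, whose output weights are fitted by least squares to a target function (the target, the number of hidden units, and the weight scale are all illustrative assumptions):

% Approximate sin(2*pi*x) on [0,1] by sum_j beta_j * g(wj*x + bj)
g = @(a) 1./(1+exp(-a));               % sigmoid squashing function
x = linspace(0, 1, 200)';              % grid on [0,1]
fx = sin(2*pi*x);                      % target values
q = 50;                                % number of hidden units
w = 20*randn(1,q);  b = 20*randn(1,q); % random hidden weights and biases
H = g(x*w + repmat(b, numel(x), 1));   % hidden-layer outputs (200-by-q)
beta = H \ fx;                         % least-squares output weights
max(abs(H*beta - fx))                  % maximum error on the grid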
Definition: a class of networks is dense in L2 if every square-integrable function can be approximated arbitrarily well in the L2 norm by some member of the class.

Theorem: (Blum & Li, 1991) networks with two hidden layers of threshold units and a linear output unit are dense in L2.

Formal statement: for every f in L2 and every ε > 0 there is such a network f̂ with ‖f − f̂‖2 < ε.
Proof

GOAL: approximate a given f in L2 arbitrarily well by such a network.

Integral approximation in 1-dim: as in a Riemann sum, approximate f by a step function that is constant on small intervals:
f(x) ≈ Σi f(xi) · 1{x in [xi, xi+1)}

Integral approximation in 2-dim: approximate f by a piecewise-constant function on small cells Xi of a grid:
f(x) ≈ Σi f(xi) · 1{x in Xi}

The indicator function of the Xi polygon can be learned by this neural network:
1 if x is in Xi, −1 otherwise
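A sketch of this indicator construction for one convex cell: each face of the polygon is a half-plane tested by a first-layer threshold unit, and a second-layer unit fires only when all tests pass (the specific triangle and the ±1 output scaling are illustrative):

% Indicator of the triangle {x >= 0, y >= 0, x + y <= 1} with threshold units
step = @(a) double(a >= 0);
A = [ 1  0;            % x >= 0
      0  1;            % y >= 0
     -1 -1 ];          % x + y <= 1, written as -x - y >= -1
c = [0; 0; -1];
indic = @(p) 2*step( sum(step(A*p - c)) - 2.5 ) - 1;   % +1 inside, -1 outside
[indic([0.2; 0.2]), indic([0.9; 0.9])]                 % prints 1 -1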
The weighted linear combination of these indicator functions will be a good approximation of the original function f; the linear system that determines the combination weights can also be solved.
Thanks for your attention!