Introduction to Multilayer Perceptrons

Barnabás Póczos

Multilayer Perceptron

ALVINN: An Autonomous Land Vehicle in a Neural Network (Dean A. Pomerleau, Carnegie Mellon University, 1989)

Training: uses a simulated road generator.


We want to solve: minimize the training error E(w) over the network weights w.

Starting Point

Fixed step size can be too big

Fixed step size can be too small
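To make this concrete, here is a minimal MATLAB sketch (not from the slides; the quadratic objective and the step sizes are illustrative assumptions) of gradient descent with a fixed step size on E(w) = w²:

% Gradient descent on E(w) = w^2 with a fixed step size eta.
% (Illustrative objective and step sizes, not taken from the slides.)
% The gradient is gradE(w) = 2*w, so the update is w <- w - eta*2*w.
E     = @(w) w.^2;
gradE = @(w) 2*w;

for eta = [0.01, 0.1, 1.1]          % too small, reasonable, too big
    w = 5;                          % starting point
    for iter = 1:25
        w = w - eta * gradE(w);     % fixed-step gradient update
    end
    fprintf('eta = %.2f  ->  w = %.4g, E(w) = %.4g\n', eta, w, E(w));
end

With eta = 1.1 the update map is w ↦ −1.2w, so the iterates diverge; with eta = 0.01 the iterates barely move in 25 steps; eta = 0.1 converges quickly toward the minimum at w = 0.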

Character Recognition with MLP

MATLAB demo: appcr1

The network

Noise-free input: the 26 letters of the alphabet, each given as a 7×5 pixel image.

Noisy inputs

MATLAB MLP Training

% Create the MLP
hiddenlayers = [10, 25];
net1 = feedforwardnet(hiddenlayers);
net1 = configure(net1, X, T);

% View the network
view(net1);

% Train
net1 = train(net1, X, T);

% Test
Y1 = net1(Xtest);

Prediction errors

▪ Network 1 was trained on clean images.
▪ Network 2 was trained on noisy images: 30 noisy copies of each letter are created (see the sketch below).
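A minimal sketch of the noisy-copy construction, assuming X and T are the letter images and targets from the listing above (the noise level 0.2 is an illustrative choice, not taken from the slides):

% Create 30 noisy copies of each letter and train a second network on them.
numCopies = 30;
Xn = repmat(X, 1, numCopies);                    % replicate the 26 clean letters
Tn = repmat(T, 1, numCopies);
Xn = min(max(Xn + 0.2*randn(size(Xn)), 0), 1);   % add Gaussian noise (illustrative level), clip to [0,1]

net2 = feedforwardnet([10, 25]);
net2 = configure(net2, Xn, Tn);
net2 = train(net2, Xn, Tn);

Y2 = net2(Xtest);                                % compare prediction errors with net1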

The Algorithm

Multilayer Perceptron

The gradient of the error
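The slide's formulas did not survive extraction; in standard notation (an assumption, chosen to match common MLP derivations), the squared error for one training sample and its gradient via the chain rule read:

$E(\mathbf{w}) = \tfrac{1}{2}\sum_k (y_k - t_k)^2, \qquad \frac{\partial E}{\partial w_{ji}} = \frac{\partial E}{\partial a_j}\,\frac{\partial a_j}{\partial w_{ji}} = \delta_j z_i$

where $a_j = \sum_i w_{ji} z_i$ is the pre-activation of unit $j$, $z_i$ is the input that unit receives, $y_k$ and $t_k$ are the network outputs and targets, and $\delta_j = \partial E / \partial a_j$.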

Notation

Some observations

The backpropagated error

Lemma

Therefore,
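The derivation steps were lost in extraction; the standard conclusion, in the notation introduced above, is the recursion that gives the algorithm its name:

$\delta_j = g'(a_j) \sum_k w_{kj}\,\delta_k$

where the sum runs over the units $k$ of the next layer and $g$ is the activation function; at the output layer, $\delta_k = g'(a_k)(y_k - t_k)$. The error terms are therefore computed backwards, layer by layer, from the outputs toward the inputs.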

The backpropagation algorithm
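A compact MATLAB sketch of the algorithm for one hidden layer, written from the standard equations above rather than recovered from the slides (the toy data, tanh activation, network sizes, and learning rate are all illustrative assumptions):

% One-hidden-layer MLP trained by backpropagation on the squared error.
% X: d-by-N inputs, T: 1-by-N targets (an illustrative toy regression problem).
rng(0);
X = rand(5, 200);
T = sin(sum(X, 1));
[d, N] = size(X);  h = 10;
W1 = 0.1*randn(h, d);  b1 = zeros(h, 1);     % hidden-layer parameters
W2 = 0.1*randn(1, h);  b2 = 0;               % output-layer parameters
eta = 0.05;                                  % learning rate

for epoch = 1:500
    % Forward pass
    A1 = W1*X + b1;                          % pre-activations a_j
    Z1 = tanh(A1);                           % hidden activations z_j
    Y  = W2*Z1 + b2;                         % linear output layer

    % Backward pass: delta for each layer
    D2 = Y - T;                              % output delta (linear output units)
    D1 = (1 - Z1.^2) .* (W2' * D2);          % backpropagated error; g'(a) = 1 - tanh(a)^2

    % Gradient step, averaged over the batch
    W2 = W2 - eta * (D2*Z1')/N;   b2 = b2 - eta * mean(D2, 2);
    W1 = W1 - eta * (D1*X')/N;    b1 = b1 - eta * mean(D1, 2);
end
fprintf('final MSE: %.4f\n', mean((Y(:) - T(:)).^2));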

What functions can multilayer perceptrons represent?

Perceptrons cannot represent the XOR function

f(0,0)=1, f(1,1)=1, f(0,1)=0, f(1,0)=0

(This truth table is the negation of XOR; neither it nor XOR is linearly separable, so no single perceptron can represent them, while one hidden layer suffices, as sketched below.)
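A minimal sketch (weights set by hand, not taken from the slides) of a two-layer threshold network computing the function f above: one hidden unit detects (1,1), the other detects (0,0), and the output unit ORs them.

% Hand-built two-layer network of threshold units computing f above.
step = @(a) double(a > 0);
f = @(x1, x2) step( step(x1 + x2 - 1.5) ...   % hidden unit 1: AND(x1, x2)
                  + step(0.5 - x1 - x2) ...   % hidden unit 2: NOR(x1, x2)
                  - 0.5 );                    % output unit: OR of the two

for x = [0 0; 0 1; 1 0; 1 1]'
    fprintf('f(%d,%d) = %d\n', x(1), x(2), f(x(1), x(2)));
end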

What functions can multilayer perceptrons represent?

Hilbert's 13th Problem

1902: Hilbert publishes a list of 23 "most important" problems in mathematics.

The 13th Problem: solve the general 7th-degree equation using continuous functions of two parameters. Hilbert's conjecture: it cannot be done.

Related conjecture: Let f be the function of 3 arguments defined by the equation

x⁷ + ax³ + bx² + cx + 1 = 0,

i.e., f(a,b,c) is a root x of this equation. Prove that f cannot be rewritten as a composition of finitely many continuous functions of two arguments.

Another form: prove that there is a continuous function of three variables that cannot be decomposed into finitely many continuous functions of two variables.

Function decompositions

f(x,y,z) = Φ₁(ψ₁(x), ψ₂(y)) + Φ₂(c₁ψ₃(y) + c₂ψ₄(z), x)

[Diagram: the decomposition drawn as a network. Inputs x, y, z feed the one-argument units ψ₁, …, ψ₄; the weighted sum c₁ψ₃(y) + c₂ψ₄(z) and the remaining arguments feed the two-argument units Φ₁ and Φ₂, whose outputs are summed to give f(x,y,z).]
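To make the diagram concrete, a tiny MATLAB sketch (the one-variable functions and constants are arbitrary illustrative choices) evaluating such a decomposition; every building block takes at most two arguments, yet the composite is a function of three:

% An arbitrary illustrative instance of the decomposition above.
psi1 = @(x) x.^2;   psi2 = @(y) sin(y);
psi3 = @(y) y;      psi4 = @(z) exp(z);
Phi1 = @(u, v) u .* v;
Phi2 = @(u, v) u + v;
c1 = 0.5;  c2 = 2;

f = @(x, y, z) Phi1(psi1(x), psi2(y)) + Phi2(c1*psi3(y) + c2*psi4(z), x);
f(1, 2, 0.5)    % evaluate the three-argument composite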

Function decompositions

In 1957, Arnold disproved Hilbert's conjecture.
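The result behind this is the Kolmogorov–Arnold representation theorem (stated here for reference, since the slide's own formulation was not recoverable): every continuous function $f : [0,1]^n \to \mathbb{R}$ can be written as

$f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\Bigl(\sum_{p=1}^{n} \psi_{q,p}(x_p)\Bigr)$

with continuous one-variable functions $\Phi_q$ and $\psi_{q,p}$; addition is the only genuinely multivariate operation needed.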

Function decompositions

Corollary: every continuous function of n variables can be represented exactly using only continuous one-variable functions and addition.

Issue: this statement is not constructive; it guarantees that the required functions exist but gives no method for finding them, and they can be highly non-smooth.

Universal Approximators

Kurt Hornik, Maxwell Stinchcombe and Halbert White: "Multilayer feedforward networks are universal approximators", Neural Networks, 2(5):359–366, 1989.

Definition: Σ(g) is the class of feedforward networks with one hidden layer and activation function g, i.e., all functions of the form x ↦ Σⱼ βⱼ g(wⱼᵀx + bⱼ).

Definition: g is a squashing function if it is non-decreasing with limit 0 at −∞ and limit 1 at +∞.

Theorem: for every squashing function g, the class Σ(g) is uniformly dense on compact sets in the space of continuous functions; single-hidden-layer feedforward networks are universal approximators.

Universal Approximators

Definition:

Theorem (Blum & Li, 1991):

Formal statement:

Proof

GOAL: approximate f by a finite weighted sum of indicator functions, in the style of a Riemann sum.

Integral approximation in 1-dim:

∫ f(x) dx ≈ Σᵢ f(xᵢ) Δxᵢ

Integral approximation in 2-dim:

∬ f(x,y) dx dy ≈ Σᵢ Σⱼ f(xᵢ, yⱼ) Δxᵢ Δyⱼ

Proof

GOAL: f(x) ≈ Σᵢ f(xᵢ) · 1{x ∈ Xᵢ}, where the polygons Xᵢ partition the domain and xᵢ ∈ Xᵢ.

The indicator function of the polygon Xᵢ can be learned by this neural network: it outputs 1 if x is in Xᵢ and −1 otherwise.

The weighted linear combination of these indicator functions will then be a good approximation of the original function f (see the sketch below).
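A minimal sketch (all weights hand-chosen for illustration, not from the slides) of how threshold units realize such an indicator: each hidden unit tests one half-plane, and the output unit fires only when every half-plane test passes.

% Indicator of the illustrative triangle {x >= 0, y >= 0, x + y <= 1}
% built from threshold units. Each hidden unit checks one half-plane;
% the output returns +1 only when all three tests pass, and -1 otherwise.
step = @(a) double(a > 0);
indicator = @(x, y) 2*step( step(x) + step(y) + step(1 - x - y) - 2.5 ) - 1;

indicator(0.2, 0.3)   % inside the triangle  -> +1
indicator(0.9, 0.9)   % outside the triangle -> -1

Summing many such indicators with weights f(xᵢ) gives exactly the Riemann-sum style approximation from the previous slides.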

Proof

This linear equation can also be solved.

Thanks for your attention!
