The Perceptron

Foundations of Data Analysis 04/09/2020

History of Perceptron

■ Frank Rosenblatt ■ 1928-1971

invented the perceptron algorithm

History of Perceptron

■ Marvin Minsky ■ 1927-2016

stated that a simple perceptron is limited to linearly separable problems

Perceptron Learning Algorithm


■ First neural network learning model in the 1960s

■ Simple and limited (single-layer model)

■ Basic concepts are similar to multi-layer models


What is Perceptron?

The goal of the perceptron algorithm is to find a hyperplane that separates a set of data into two classes.

[Figure: a hyperplane (decision boundary) separating Class 1 from Class 0]

• Binary classifier

• Supervised learning

Perceptron

$$f(x) = \begin{cases} 1 & \text{if } w \cdot x + b > 0 \\ 0 & \text{otherwise} \end{cases}$$

where $b$ is the bias term and $w \cdot x = \sum_{i=1}^{n} w_i x_i$ (dot product).

[Figure: the decision boundary with Class 1 on one side and Class 0 on the other]
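As a minimal sketch, this decision rule in Python (the weight, input, and bias values are illustrative assumptions, not from the slides):

```python
import numpy as np

def perceptron_output(w, x, b):
    """Decision rule: f(x) = 1 if w·x + b > 0, else 0."""
    return 1 if np.dot(w, x) + b > 0 else 0

# Illustrative values (assumed for the sketch)
w = np.array([0.5, -0.4])
x = np.array([1.0, 2.0])
b = 0.1
print(perceptron_output(w, x, b))  # 0.5 - 0.8 + 0.1 = -0.2 <= 0, so output is 0
```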

Perceptron

[Figure: a single neuron with inputs $x_1, \dots, x_n$, weights $w_1, \dots, w_n$, a net input, an activation function, and an output $z$]

$$z = \begin{cases} 1 & \text{if } w \cdot x > \theta \\ 0 & \text{if } w \cdot x \le \theta \end{cases} \qquad w \cdot x = \sum_{i=1}^{n} w_i x_i$$

• Learning weights such that an objective function is minimized

Activation Functions

Outputs the label given an input or a set of inputs.

Step function:
$$f(x) := \operatorname{sgn}(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ -1 & \text{if } x < 0 \end{cases}$$

ReLU (rectified linear unit):
$$f(x) = \max(0, x)$$

Sigmoid function:
$$f(x) := \sigma(x) = \frac{1}{1 + e^{-ax}}$$

Perceptron as a Single Layer Neuron
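A quick Python sketch of these three activations; the sigmoid slope parameter a defaults to 1 here as an assumption:

```python
import numpy as np

def step(x):
    """Step function: sgn(x), mapping to 1 if x >= 0, else -1."""
    return 1 if x >= 0 else -1

def relu(x):
    """ReLU: max(0, x)."""
    return max(0.0, x)

def sigmoid(x, a=1.0):
    """Sigmoid: 1 / (1 + e^(-a*x)); a controls the slope."""
    return 1.0 / (1.0 + np.exp(-a * x))

for v in (-2.0, 0.0, 2.0):
    print(v, step(v), relu(v), round(sigmoid(v), 3))
```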

Examples

[Figure: two example units. Unit 1: threshold θ = 0.2, with the values .9, .5, .1, -.3 shown as its inputs and weights; z = ? Unit 2: threshold θ = 0.1, with the values .8, .4, .6, .2, -.5, .2 shown as its inputs and weights; z = ? Apply z = 1 if w·x > θ, 0 if w·x ≤ θ.]
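A hedged Python sketch for checking such examples; the input/weight pairing below is an assumed reading of the first figure, not confirmed by the slides:

```python
import numpy as np

def threshold_unit(w, x, theta):
    """z = 1 if w·x > theta, else 0."""
    return 1 if np.dot(w, x) > theta else 0

# Assumed reading of the first example: inputs (.9, .5), weights (.1, -.3), theta = 0.2
x1 = np.array([0.9, 0.5])
w1 = np.array([0.1, -0.3])
print(threshold_unit(w1, x1, 0.2))  # w·x = 0.09 - 0.15 = -0.06 <= 0.2, so z = 0
```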

How to Learn Perceptron?

$$f(x) = \begin{cases} 1 & \text{if } w \cdot x + b > 0 \\ 0 & \text{otherwise} \end{cases}$$

$w$ and $b$ are unknown parameters.

[Figure: Class 1 and Class 0 points on either side of the decision boundary]

■ In supervised learning, the network's output is compared with known correct answers
■ Supervised learning
■ Learning with a teacher

Perceptron Learning Rules

■ Consider linearly separable problems
■ How to find appropriate weights?
■ Check whether the output $o$ has the desired value $d$ (given labels)

$$w^{\text{new}} = w^{\text{old}} + \Delta w, \qquad \Delta w_i = \eta\,(d - o)\, x_i$$

$\eta$ is called the learning rate, with $0 < \eta \le 1$.

Perceptron Convergence Theorem: Guaranteed to find a solution in finite time if a solution exists
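A minimal sketch of this learning rule as a training loop, assuming NumPy arrays, 0/1 labels, and a bias updated alongside the weights (all names and data are illustrative):

```python
import numpy as np

def train_perceptron(X, d, eta=0.1, max_epochs=100):
    """Perceptron learning rule: w_new = w_old + eta * (d - o) * x."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, target in zip(X, d):
            o = 1 if np.dot(w, x) + b > 0 else 0   # current output
            if o != target:
                w = w + eta * (target - o) * x     # update weights
                b = b + eta * (target - o)         # update bias
                errors += 1
        if errors == 0:   # converged: every point classified correctly
            break
    return w, b

# Illustrative linearly separable data: logical AND
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
d = np.array([0, 0, 0, 1])
print(train_perceptron(X, d))
```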

Perceptron Learning Rules

■ The algorithm converges to the correct classification if the training data is linearly separable

■ When assigning a value to η we must keep in mind two conflicting requirements:
■ Averaging of past inputs to provide stable weight estimates, which requires a small η
■ Fast adaptation with respect to real changes in the underlying distribution of the process generating the input vector x, which requires a large η

Linear Separability

Limited Functionality of Hyperplane

Multilayer Network

[Figure: a multilayer network with an input layer, a hidden layer, and an output layer]

$$o_1 = \operatorname{sgn}\!\left(\sum_{i=0}^{n} w_{1i}\, x_i\right), \qquad o_2 = \operatorname{sgn}\!\left(\sum_{i=0}^{n} w_{2i}\, x_i\right)$$
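A hedged NumPy sketch of such a layered forward pass, with each unit computing sgn of a weighted sum; the layer sizes and random weights are illustrative assumptions:

```python
import numpy as np

def sgn(z):
    """Step activation from the slides: 1 if z >= 0, else -1."""
    return np.where(z >= 0, 1, -1)

def layer_forward(W, x):
    """One layer of sgn units: o_j = sgn(sum_i W[j, i] * x[i])."""
    return sgn(W @ x)

# Illustrative 2-3-2 network with arbitrary weights
rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(3, 2))   # hidden layer: 3 units, 2 inputs
W_output = rng.normal(size=(2, 3))   # output layer: 2 units, 3 hidden inputs

x = np.array([0.5, -1.0])
h = layer_forward(W_hidden, x)       # hidden activations
o = layer_forward(W_output, h)       # outputs o_1 and o_2
print(h, o)
```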