The Perceptron
Foundations of Data Analysis 04/09/2020
History of Perceptron

■ Frank Rosenblatt (1928–1971)
  invented the perceptron algorithm
■ Marvin Minsky (1927–2016)
  stated that a simple perceptron is limited to linearly separable problems

Perceptron Learning Algorithm
■ First neural network learning model, in the 1960s
■ Simple and limited (single-layer model)
■ Basic concepts are similar to multi-layer models
What is the Perceptron?

The goal of the perceptron algorithm is to find a hyperplane that separates a set of data into two classes.

[Figure: hyperplane (decision boundary) separating Class 1 from Class 0]

• Binary classifier
• Supervised learning
Perceptron

f(x) = { 1 if w·x + b > 0; 0 otherwise }

where b is the bias term and w·x = ∑ᵢ₌₁ⁿ wᵢ xᵢ (dot product).

[Figure: hyperplane separating Class 1 from Class 0]
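The decision rule above can be sketched in a few lines of code (a minimal illustration; the function name and the example points are my own, not from the slides):

```python
def perceptron_output(w, x, b):
    """Return 1 if w·x + b > 0, else 0 (the perceptron decision rule)."""
    net = sum(wi * xi for wi, xi in zip(w, x))  # dot product w·x
    return 1 if net + b > 0 else 0

# One point on each side of the hyperplane w·x + b = 0:
print(perceptron_output([1.0, 1.0], [0.8, 0.6], -1.0))  # 0.8 + 0.6 - 1.0 = 0.4 > 0 → 1
print(perceptron_output([1.0, 1.0], [0.2, 0.3], -1.0))  # 0.2 + 0.3 - 1.0 = -0.5 ≤ 0 → 0
```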
Perceptron

[Figure: inputs x₁ … xₙ with weights w₁ … wₙ feed the net input through the activation function to the output z]

z = { 1 if w·x > θ; 0 if w·x ≤ θ },   w·x = ∑ᵢ₌₁ⁿ wᵢ xᵢ

• Learn weights such that an objective function is minimized
• Outputs the label given an input or a set of inputs
Step function:  f(x) := sgn(x) = { 1 if x ≥ 0; −1 if x < 0 }

ReLU (rectified linear unit):  f(x) = max(0, x)

Sigmoid function:  f(x) := σ(x) = 1 / (1 + e^(−ax))

Perceptron as a Single-Layer Neuron
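The three activation functions above can be written directly as code (a minimal sketch; the default slope a = 1 in the sigmoid is my choice for illustration):

```python
import math

def step(x):
    """Step function: 1 if x >= 0, else -1."""
    return 1 if x >= 0 else -1

def relu(x):
    """ReLU (rectified linear unit): max(0, x)."""
    return max(0.0, x)

def sigmoid(x, a=1.0):
    """Sigmoid with slope parameter a: 1 / (1 + e^(-a*x))."""
    return 1.0 / (1.0 + math.exp(-a * x))

print(step(-0.5))             # -1
print(relu(-0.5), relu(2.0))  # 0.0 2.0
print(sigmoid(0.0))           # 0.5 (the sigmoid is centered at x = 0)
```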
Examples

z = { 1 if w·x > θ; 0 if w·x ≤ θ }

[Figure: two example networks with numeric inputs and weights; the first has threshold θ = 0.2, the second θ = 0.1. Compute z for each.]
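The examples above are figures whose exact input/weight pairings are not recoverable here, but the same computation can be illustrated with assumed values (the pairing below is my own, not the slide's):

```python
def threshold_unit(w, x, theta):
    """z = 1 if w·x > theta, else 0 (the threshold rule from the slide)."""
    net = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if net > theta else 0

# Assumed values for illustration: weights [.4, .2], inputs [.8, .6], theta = 0.1
# net = 0.4*0.8 + 0.2*0.6 = 0.44 > 0.1, so z = 1
print(threshold_unit([0.4, 0.2], [0.8, 0.6], 0.1))  # 1
```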
How to Learn the Perceptron?

f(x) = { 1 if w·x + b > 0; 0 otherwise }

[Figure: hyperplane separating Class 1 from Class 0]

w and b are unknown parameters.

■ In supervised learning, the network's output is compared with the known correct answers
■ Supervised learning is "learning with a teacher"

CS 478 - Perceptrons

Perceptron Learning Rules
■ Consider linearly separable problems
■ How do we find appropriate weights?
■ Check whether the output o matches the desired value d (the given label)

w_i^new = w_i^old + Δw_i,   Δw_i = η (d − o) x_i

η is called the learning rate, with 0 < η ≤ 1.

Perceptron Convergence Theorem: guaranteed to find a solution in finite time if a solution exists.
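The learning rule above can be sketched as a training loop (a minimal illustration; η = 0.5, zero initial weights, and the trick of appending a constant 1 input so the threshold is learned as a weight are all my choices, not the slides'):

```python
def train_perceptron(data, eta=0.5, epochs=100):
    """Perceptron learning rule: w_i <- w_i + eta * (d - o) * x_i.

    `data` is a list of (x, d) pairs, where each x ends with a constant
    1 component so the threshold is learned as an ordinary weight.
    """
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        errors = 0
        for x, d in data:
            o = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
            if o != d:
                errors += 1
                w = [wi + eta * (d - o) * xi for wi, xi in zip(w, x)]
        if errors == 0:  # converged, as the theorem guarantees for separable data
            break
    return w

# Logical AND is linearly separable; the last input component is the constant 1.
and_data = [([0, 0, 1], 0), ([0, 1, 1], 0), ([1, 0, 1], 0), ([1, 1, 1], 1)]
w = train_perceptron(and_data)
```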
Perceptron Learning Rules

■ The algorithm converges to the correct classification if the training data is linearly separable
■ When assigning a value to η we must balance two conflicting requirements:
  ■ Averaging of past inputs to provide stable weight estimates, which requires a small η
  ■ Fast adaptation to real changes in the underlying distribution generating the input vector x, which requires a large η
Linear Separability
Limited Functionality of the Hyperplane

Multilayer Network
[Figure: network with an input layer, a hidden layer, and an output layer]

o₁ = sgn(∑ᵢ₌₀ⁿ w₁ᵢ xᵢ)
o₂ = sgn(∑ᵢ₌₀ⁿ w₂ᵢ xᵢ)
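A multilayer network overcomes the single hyperplane's limitation. As a sketch, a two-layer network with hand-picked weights (my own, chosen for illustration) computes XOR, which no single perceptron can:

```python
def sgn(x):
    """Sign activation: 1 if x >= 0, else -1."""
    return 1 if x >= 0 else -1

def two_layer_xor(x1, x2):
    """XOR of binary inputs via one hidden layer with fixed weights.

    h1 fires for x1 OR x2, h2 fires for x1 AND x2, and the output unit
    fires when h1 is on but h2 is off, i.e. exactly one input is 1.
    """
    h1 = sgn(x1 + x2 - 0.5)   # OR of the inputs
    h2 = sgn(x1 + x2 - 1.5)   # AND of the inputs
    o = sgn(h1 - h2 - 0.5)    # h1 and not h2
    return 1 if o == 1 else 0

print([two_layer_xor(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])
# [0, 1, 1, 0]
```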