Introduction to Natural Computation
Lecture 08
Perceptrons
Leandro Minku
Overview of the Lecture
Brief introduction to machine learning.
What are Perceptrons?
How do they learn?
Applications of Perceptrons.
Machine Learning
Focus: to study and develop computational models capable of improving their performance with experience and of acquiring knowledge on their own.
How? Through data examples.
Types of learning:
Supervised.
Unsupervised.
Semi-supervised.
Reinforcement.
Supervised Learning
Examples consist of:
Attributes: features of the examples.
Label: the desired output.
Unlabelled data are frequently called instances. We use algorithms to learn based on labelled examples and generalize to new instances.
Table: Credit Approval
Age | Gender | Salary | Bank account start time | ... | Good or bad payer
Neural Networks
[video]
A Single Neuron
A neuron receives several inputs and combines these in the cell body. If the input reaches a threshold, then the neuron may fire (produce an output). Some inputs are excitatory, while others are inhibitory.
Perceptron: An artificial neuron
Perceptron was developed by Frank Rosenblatt in 1957 and can be considered as the simplest artificial neural network.
Input function: u(x) = Σ_{i=1}^n wixi

Activation function (threshold): y = f(u(x)) = 1, if u(x) > θ; 0, otherwise

Activation state: 0 or 1 (or -1 or 1).
Inputs are typically in the range [0, 1], where 0 is “off” and 1 is “on”. Weights can be any real number (positive or negative).
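As a concrete sketch, the input function and threshold activation above can be written in a few lines of Python (function and variable names are my own, not from the lecture):

```python
def perceptron_output(weights, inputs, theta):
    """Single perceptron: weighted sum followed by a threshold activation."""
    u = sum(w * x for w, x in zip(weights, inputs))  # input function u(x)
    return 1 if u > theta else 0                     # fires only if u(x) > theta

# Example: two inputs with weights 1.0 and 1.0 and threshold 1.5.
print(perceptron_output([1.0, 1.0], [1.0, 1.0], 1.5))  # → 1
print(perceptron_output([1.0, 1.0], [0.0, 1.0], 1.5))  # → 0
```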
Perceptron – A linear classifier

Input function: u(x) = Σ_{i=1}^n wixi

Activation function (threshold): y = f(u(x)) = 1, if u(x) > θ; 0, otherwise

The firing condition can be rewritten step by step:
u(x) > θ
wx > θ
wx − θ > 0

Linear equation (two variables):
Slope–intercept form: y = mx + b
General form: ax + by + c = 0

Linear equation (n variables):
General form: a1x1 + a2x2 + ... + anxn + b = 0

So the Perceptron's decision boundary is the linear equation:
w1x1 + w2x2 + ... + wnxn − θ = 0

Above the linear boundary, class 1: w1x1 + w2x2 + ... + wnxn − θ > 0
Below or on the linear boundary, class 0: w1x1 + w2x2 + ... + wnxn − θ ≤ 0
Exercise
Consider the perceptron on the right and θ = 1.5.

Determine the outputs for the inputs below and plot them in a graph of x1 vs x2:
{0.0, 0.0}, {0.0, 1.0} and {1.0, 1.0}.

Use a closed circle to indicate output 1 and an open circle to indicate output 0. An example for the input {1.0, 0.0} is shown on the right.

[Figure: plot of x2 vs x1 on [0, 1] × [0, 1]; closed circle = class 1, open circle = class 0.]
Logic Gate AND
Consider the perceptron on the right and θ = 1.5.

x1 | x2 | x1 AND x2
0 | 0 | 0
0 | 1 | 0
1 | 0 | 0
1 | 1 | 1

[Figure: the four inputs in the x1 vs x2 plane; closed circle = class 1, open circle = class 0.]
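This AND perceptron can be checked directly; the sketch below assumes the weights w1 = w2 = 1 implied by the slide:

```python
def perceptron_and(x1, x2, theta=1.5):
    u = 1.0 * x1 + 1.0 * x2       # u(x) with w1 = w2 = 1
    return 1 if u > theta else 0

truth_table = {(x1, x2): perceptron_and(x1, x2)
               for x1 in (0, 1) for x2 in (0, 1)}
# truth_table == {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
```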
Exercise
Consider the perceptron on the right and θ = 1.5.

Draw a line that represents the decision boundary between classes 1 and 0.

Remembering (line equation): w1x1 + w2x2 = θ

You need two points to draw a line.

[Figure: the x1 vs x2 plane with the four AND inputs; closed circle = class 1, open circle = class 0.]
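To find two points for drawing the boundary, solve w1x1 + w2x2 = θ for x2 given x1; this sketch assumes the AND perceptron's values w1 = w2 = 1 and θ = 1.5:

```python
def boundary_x2(x1, w1=1.0, w2=1.0, theta=1.5):
    """x2 on the decision boundary w1*x1 + w2*x2 = theta for a given x1."""
    return (theta - w1 * x1) / w2

p1 = (0.5, boundary_x2(0.5))  # (0.5, 1.0)
p2 = (1.0, boundary_x2(1.0))  # (1.0, 0.5)
```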
Exercise
Consider the perceptron on the right and θ = 1.5.

Note that: Anything below or on the line is class 0. Anything above the line is class 1.

If we calculate the outputs for:
{0.2, −0.1}: 1 × 0.2 + 1 × (−0.1) = 0.1 < 1.5 → class 0
{0.9, 1.1}: 1 × 0.9 + 1 × 1.1 = 2 > 1.5 → class 1

A nice property
Perceptrons are noise tolerant!
A Nice Property of Perceptrons
Faulty or degraded hardware: What would happen if the input signals were to degrade somewhat? What would happen if the weights were slightly wrong?
x1 | x2 | x1 AND x2
0.2 | −0.1 | 0
0.1 | 1.05 | 0
1.1 | 0.3 | 0
0.91 | 1.0 | 1
Robustness
Neural networks are often still able to give us the right answer if there is a small amount of degradation. Compare this with classical computing approaches!
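The degraded inputs from the table can be checked against the same AND perceptron (w1 = w2 = 1, θ = 1.5); this is my sketch of the robustness argument, not part of the original slides:

```python
noisy_examples = [((0.2, -0.1), 0), ((0.1, 1.05), 0),
                  ((1.1, 0.3), 0), ((0.91, 1.0), 1)]

# Same AND perceptron as before, applied to the degraded inputs.
outputs = [1 if x1 + x2 > 1.5 else 0 for (x1, x2), _ in noisy_examples]
# outputs == [0, 0, 0, 1], matching the clean AND truth values
```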
Another Interesting Example – Logic Gate XOR

x1 | x2 | x1 XOR x2
0 | 0 | 0
0 | 1 | 1
1 | 0 | 1
1 | 1 | 0

[Figure: the four XOR inputs in the x1 vs x2 plane; closed circle = class 1, open circle = class 0.]

A not so nice property
Perceptrons cannot model non-linearly separable data.
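One way to see the XOR limitation empirically is to run the perceptron learning rule on the four XOR examples and observe that it never reaches zero error. This sketch folds θ in as a weight on a fixed −1 input (a trick covered later in the lecture); the learning rate and epoch count are arbitrary choices of mine:

```python
xor_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
w = [0.0, 0.0, 0.0]  # w1, w2, and theta treated as a third weight

for epoch in range(100):
    errors = 0
    for (x1, x2), t in xor_data:
        x = (x1, x2, -1)  # fixed -1 input paired with theta
        o = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
        for i in range(3):
            w[i] += 0.1 * (t - o) * x[i]  # perceptron learning rule
        errors += int(o != t)

# errors is still positive after the final epoch: XOR is not linearly
# separable, so no single perceptron can classify all four examples.
```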
Learning

So far:
We know what a Perceptron is.
We learnt that it works as a linear classifier.
We saw a Perceptron that implements AND.
We learnt that Perceptrons are somewhat robust.

But... choosing the weights and threshold θ for the Perceptron by hand is not easy! How can we learn the weights and threshold from examples?

We can use a learning algorithm that adjusts the weights and threshold θ based on examples.

[video]
Learning – A trick to learn θ

Σ_{i=1}^n wixi > θ
Σ_{i=1}^n wixi − θ > 0
w1x1 + w2x2 + ... + wnxn − θ > 0
w1x1 + w2x2 + ... + wnxn + θ(−1) > 0

We can consider θ as a weight to be learnt! Its input is fixed at −1.
The activation function is then: y = f(u(x)) = 1, if u(x) > 0; 0, otherwise
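The trick can be sketched in code by appending θ as an extra weight whose input is always −1 (names are illustrative):

```python
def output_with_bias_trick(weights, theta, inputs):
    extended_w = list(weights) + [theta]  # theta becomes just another weight
    extended_x = list(inputs) + [-1.0]    # paired with a fixed input of -1
    u = sum(w * x for w, x in zip(extended_w, extended_x))
    return 1 if u > 0 else 0              # threshold is now fixed at 0

# Agrees with the original formulation u(x) > theta:
print(output_with_bias_trick([1.0, 1.0], 1.5, [1.0, 1.0]))  # → 1
print(output_with_bias_trick([1.0, 1.0], 1.5, [1.0, 0.0]))  # → 0
```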
Perceptron's Learning Rule
As explained: Learning happens by adjusting weights. The threshold can be considered as a weight.
The adjustments are based on the following:
wi ← wi + ∆wi
∆wi = η(t − o)xi
Where:
η, 0 < η ≤ 1, is a constant called the learning rate.
t is the target output of the current example.
o is the output obtained by the Perceptron.
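As a sketch, one application of the rule to all weights of a perceptron (η, t, o as defined above; the function name is mine):

```python
def update_weights(weights, inputs, t, o, eta):
    """Apply wi <- wi + eta * (t - o) * xi to every weight."""
    return [w + eta * (t - o) * x for w, x in zip(weights, inputs)]

# If the output already matches the target, nothing changes:
update_weights([0.5, -0.2], [1.0, 1.0], t=1, o=1, eta=0.6)  # → [0.5, -0.2]
```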
Perceptron's Learning Rule – A closer look
Learning Rule
wi ← wi + ∆wi
∆wi = η(t − o)xi
Consider the following cases:
o = 1 and t = 1: ∆wi = η(t − o)xi = η(1 − 1)xi = 0
o = 0 and t = 1: ∆wi = η(t − o)xi = η(1 − 0)xi = ηxi
The learning rate η is positive and controls how big the changes ∆wi are.
If xi > 0, then ∆wi > 0 and wi increases, in an attempt to make wixi become larger than the threshold θ.
If xi < 0, then ∆wi < 0 and wi decreases. Note that wixi > 0 when wi < 0 (since xi < 0).
Learning Rule – Exercise

Perceptron's Learning Rule
wi ← wi + ∆wi
∆wi = η(t − o)xi

Consider a Perceptron with only one input x1, weight w1 = 0.5, threshold θ = 0 and learning rate η = 0.6. Consider also the training example {x1 = −1, t = 1}. For now, let's temporarily ignore the learning of the threshold and consider it fixed.

Determine:
The output of the Perceptron for the input −1: w1x1 = 0.5(−1) = −0.5 < θ → o = 0
The new weight w1 after applying the learning rule: ∆w1 = η(t − o)x1 = 0.6(1 − 0)(−1) = −0.6 → w1 = 0.5 − 0.6 = −0.1
The new output of the Perceptron for the input −1: w1x1 = −0.1(−1) = 0.1 > θ → o = 1
Learning Rule – To try at home
Do the same as in the previous slide, but using learning rate η = 0.3.
Do the same as in the previous slide, but considering the threshold θ as a weight to be learnt.
Learning Algorithm
Initialize all weights randomly.
repeat
  for each training example do
    Apply the learning rule.
  end for
until the error is acceptable or a certain number of iterations is reached.
This algorithm is guaranteed to find a solution with zero error in a finite number of iterations, as long as the examples are linearly separable.
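The algorithm above can be sketched on the AND examples, with θ folded in as a weight on a fixed −1 input; the initialization, learning rate, and iteration cap are illustrative choices of mine:

```python
import random

random.seed(0)  # for a reproducible run
and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = [random.uniform(-1, 1) for _ in range(3)]  # w1, w2, theta-as-weight
eta = 0.2

for _ in range(1000):                     # repeat ...
    errors = 0
    for (x1, x2), t in and_data:          # ... for each training example ...
        x = (x1, x2, -1)
        o = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
        for i in range(3):
            w[i] += eta * (t - o) * x[i]  # apply the learning rule
        errors += int(o != t)
    if errors == 0:                       # ... until the error is acceptable
        break

# AND is linearly separable, so the loop ends with zero error.
```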
Applications of Perceptrons
Perceptrons can be used only for linearly separable data. Examples:
SPAM filtering.
Web page classification.
However, if we use a transformation of the data into another space in which data separation is easier (kernels), Perceptrons can have more applications.
Neural networks with several layers of artificial neurons (multilayer perceptrons) can be used as universal approximators (next lecture)!
Summary
In this lecture, we’ve seen:
What a Perceptron is.
That the Perceptron works as a linear classifier.
That Perceptrons are somewhat robust.
A learning algorithm for Perceptrons.
What Perceptrons can be used for.
Further Reading
Gurney K. Introduction to Neural Networks. Taylor and Francis; 2003. Chapters 2, 3, 4.1–4.4.