
Machine Learning: Perceptrons

Prof. Dr. Martin Riedmiller

Albert-Ludwigs-University Freiburg, AG Maschinelles Lernen

Neural Networks

◮ The human brain has approximately $10^{11}$ neurons
◮ Switching time 0.001 s (computers: $\approx 10^{-10}$ s)
◮ Connections per neuron: $10^4$ – $10^5$
◮ 0.1 s for face recognition
◮ i.e. at most 100 computation steps
◮ parallelism
◮ additionally: robustness, distributedness
◮ ML aspects: use as an inspiration for artificial neural models and algorithms; do not try to explain biology: technically imitate and exploit capabilities

Biological Neurons

◮ Dendrites input information to the cell
◮ The neuron fires (has an action potential) if a certain threshold for the voltage is exceeded
◮ Output of information by the axon
◮ The axon is connected to dendrites of other cells via synapses
◮ Learning corresponds to adaptation of the efficiency of a synapse, i.e. of the synaptic weight

(Figure: schematic of a biological neuron with soma, dendrites, axon, and synapses)

Historical ups and downs

1942 artificial neurons (McCulloch/Pitts)
1949 Hebbian learning (Hebb)
1958 perceptron (Rosenblatt)
1960 Adaline/Madaline (Widrow/Hoff)
1960 Lernmatrix (Steinbuch)
1969 "Perceptrons" (Minsky/Papert)
1970 evolutionary algorithms (Rechenberg)
1972 self-organizing maps (Kohonen)
1982 Hopfield networks (Hopfield)
1986 backpropagation (orig. 1974)
1992 support vector machines
also: Boosting, Bayes, computational learning theory


Perceptrons: adaptive neurons


◮ perceptrons (Rosenblatt 1958, Minsky/Papert 1969) are generalized variants of a former, more simple model (McCulloch/Pitts neurons, 1942):
• inputs are weighted
• weights are real numbers (positive and negative)
• no special inhibitory inputs
◮ a perceptron with $n$ inputs is described by a weight vector $\vec{w} = (w_1, \ldots, w_n)^T \in \mathbb{R}^n$ and a threshold $\theta \in \mathbb{R}$. It calculates the following function:

$(x_1, \ldots, x_n)^T \mapsto y = \begin{cases} 1 & \text{if } x_1 w_1 + x_2 w_2 + \cdots + x_n w_n \ge \theta \\ 0 & \text{if } x_1 w_1 + x_2 w_2 + \cdots + x_n w_n < \theta \end{cases}$

Perceptrons: adaptive neurons (cont.)


◮ for convenience: replacing the threshold by an additional weight (bias weight) $w_0 = -\theta$. A perceptron with weight vector $\vec{w}$ and bias weight $w_0$ performs the following calculation:

$(x_1, \ldots, x_n)^T \mapsto y = f_{\text{step}}\big(w_0 + \sum_{i=1}^{n} w_i x_i\big) = f_{\text{step}}(w_0 + \langle\vec{w}, \vec{x}\rangle)$

with

$f_{\text{step}}(z) = \begin{cases} 1 & \text{if } z \ge 0 \\ 0 & \text{if } z < 0 \end{cases}$

(Figure: perceptron unit with inputs $x_1, \ldots, x_n$, weights $w_1, \ldots, w_n$, bias weight $w_0$ on a constant input 1, summation unit $\Sigma$, and output $y$)
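A minimal sketch of this computation in Python (the names f_step and perceptron_output are illustrative choices, not from the slides; the example weights are the ones used later in the deck for the first XOR perceptron):

def f_step(z):
    # threshold function: 1 if z >= 0, else 0
    return 1 if z >= 0 else 0

def perceptron_output(w, w0, x):
    # w: weights w_1..w_n, w0: bias weight, x: input pattern x_1..x_n
    # computes f_step(w0 + <w, x>)
    return f_step(w0 + sum(wi * xi for wi, xi in zip(w, x)))

# example: weight vector (1, 1) with bias weight -0.5
print(perceptron_output([1.0, 1.0], -0.5, [0.0, 1.0]))  # -> 1 (above the hyperplane)
print(perceptron_output([1.0, 1.0], -0.5, [0.0, 0.0]))  # -> 0 (below the hyperplane)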

Perceptrons: adaptive neurons (cont.)

geometric interpretation of a perceptron:
• input patterns $(x_1, \ldots, x_n)$ are points in $n$-dimensional space
• points with $w_0 + \langle\vec{w}, \vec{x}\rangle = 0$ are on a hyperplane defined by $w_0$ and $\vec{w}$
• points with $w_0 + \langle\vec{w}, \vec{x}\rangle > 0$ are above the hyperplane
• points with $w_0 + \langle\vec{w}, \vec{x}\rangle < 0$ are below the hyperplane
• perceptrons partition the input space into two halfspaces along a hyperplane

(Figures: hyperplane with upper and lower halfspace in the $(x_1, x_2)$-plane and in $(x_1, x_2, x_3)$-space)

Perceptron learning problem


◮ perceptrons can automatically adapt to example data ⇒ Supervised Learning: Classification
◮ perceptron learning problem:
given:
• a set of input patterns $P \subseteq \mathbb{R}^n$, called the set of positive examples
• another set of input patterns $N \subseteq \mathbb{R}^n$, called the set of negative examples
task:
• generate a perceptron that yields 1 for all patterns from $P$ and 0 for all patterns from $N$
◮ obviously, there are cases in which the learning task is unsolvable, e.g. $P \cap N \neq \emptyset$

Perceptron learning problem (cont.)


◮ Lemma (strict separability): Whenever there exists a perceptron that classifies all training patterns accurately, there is also a perceptron that classifies all training patterns accurately and no training pattern is located on the decision boundary, i.e. $w_0 + \langle\vec{w}, \vec{x}\rangle \neq 0$ for all training patterns.
Proof: Let $(\vec{w}, w_0)$ be a perceptron that classifies all patterns accurately. Hence,

$\langle\vec{w}, \vec{x}\rangle + w_0 \begin{cases} \ge 0 & \text{for all } \vec{x} \in P \\ < 0 & \text{for all } \vec{x} \in N \end{cases}$

Define $\varepsilon = \min\{ -(\langle\vec{w}, \vec{x}\rangle + w_0) \mid \vec{x} \in N \}$. Then:

$\langle\vec{w}, \vec{x}\rangle + w_0 + \frac{\varepsilon}{2} \begin{cases} \ge \frac{\varepsilon}{2} > 0 & \text{for all } \vec{x} \in P \\ \le -\frac{\varepsilon}{2} < 0 & \text{for all } \vec{x} \in N \end{cases}$

Thus, the perceptron $(\vec{w}, w_0 + \frac{\varepsilon}{2})$ proves the lemma.

Perceptron learning algorithm: idea


◮ assume the perceptron makes an error on a pattern $\vec{x} \in P$: $\langle\vec{w}, \vec{x}\rangle + w_0 < 0$
◮ how can we change $\vec{w}$ and $w_0$ to avoid this error? – we need to increase $\langle\vec{w}, \vec{x}\rangle + w_0$:
• increase $w_0$
• if $x_i > 0$, increase $w_i$
• if $x_i < 0$ ('negative influence'), decrease $w_i$
◮ perceptron learning algorithm: add $\vec{x}$ to $\vec{w}$, add 1 to $w_0$ in this case. Errors on negative patterns: analogously.

(Figures: geometric interpretation of increasing $w_0$ and of modifying $\vec{w}$)

Perceptron learning algorithm

Require: a set of positive training patterns $P$ and a set of negative training patterns $N$
Ensure: if one exists, a perceptron is learned that classifies all patterns accurately

1: initialize weight vector $\vec{w}$ and bias weight $w_0$ arbitrarily
2: while there exists a misclassified pattern $\vec{x} \in P \cup N$ do
3:   if $\vec{x} \in P$ then
4:     $\vec{w} \leftarrow \vec{w} + \vec{x}$
5:     $w_0 \leftarrow w_0 + 1$
6:   else
7:     $\vec{w} \leftarrow \vec{w} - \vec{x}$
8:     $w_0 \leftarrow w_0 - 1$
9:   end if
10: end while
11: return $\vec{w}$ and $w_0$
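A direct Python transcription of this pseudocode, meant only as a sketch (the function name perceptron_learn, the zero initialisation and the iteration cap are choices made here; the slides leave the initialisation arbitrary and put no cap on the loop):

def perceptron_learn(P, N, max_iter=100000):
    # P: list of positive patterns, N: list of negative patterns (lists of numbers)
    n = len((P + N)[0])
    w = [0.0] * n          # weight vector (any initialisation would do)
    w0 = 0.0               # bias weight

    def first_misclassified():
        for x in P:                                   # positive pattern classified as 0?
            if w0 + sum(wi * xi for wi, xi in zip(w, x)) < 0:
                return x, +1
        for x in N:                                   # negative pattern classified as 1?
            if w0 + sum(wi * xi for wi, xi in zip(w, x)) >= 0:
                return x, -1
        return None

    for _ in range(max_iter):
        m = first_misclassified()
        if m is None:
            return w, w0                              # all patterns classified accurately
        x, sign = m
        w = [wi + sign * xi for wi, xi in zip(w, x)]  # add / subtract the pattern
        w0 += sign                                    # adjust the bias weight
    return None                                       # gave up within the iteration budget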

Perceptron learning algorithm: example

$N = \{(1, 0)^T, (1, 1)^T\}$, $P = \{(0, 1)^T\}$

→ exercise
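If one wants to check the exercise in code, the perceptron_learn sketch from the previous slide can be applied directly (the exact weights found depend on the initialisation and on the order in which misclassified patterns are picked):

N = [[1.0, 0.0], [1.0, 1.0]]
P = [[0.0, 1.0]]
w, w0 = perceptron_learn(P, N)
print(w, w0)   # some separating weight vector and bias weight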

Perceptron learning algorithm: convergence


◮ Lemma (correctness of perceptron learning): Whenever the perceptron learning algorithm terminates, the perceptron given by $(\vec{w}, w_0)$ classifies all patterns accurately.
Proof: follows immediately from the algorithm.
◮ Theorem (termination of perceptron learning): Whenever there exists a perceptron that classifies all training patterns correctly, the perceptron learning algorithm terminates.
Proof: for simplification we will add the bias weight to the weight vector, i.e. $\vec{w} = (w_0, w_1, \ldots, w_n)^T$, and 1 to all patterns, i.e. $\vec{x} = (1, x_1, \ldots, x_n)^T$. We denote by $\vec{w}^{(t)}$ the weight vector in the $t$-th iteration of perceptron learning and by $\vec{x}^{(t)}$ the pattern used in the $t$-th iteration.

Perceptron learning algorithm: convergence proof (cont.)


Let $\vec{w}^*$ be a weight vector that strictly classifies all training patterns.

$\langle\vec{w}^*, \vec{w}^{(t+1)}\rangle = \langle\vec{w}^*, \vec{w}^{(t)} \pm \vec{x}^{(t)}\rangle = \langle\vec{w}^*, \vec{w}^{(t)}\rangle \pm \langle\vec{w}^*, \vec{x}^{(t)}\rangle \ge \langle\vec{w}^*, \vec{w}^{(t)}\rangle + \delta$

with $\delta := \min\big(\{\langle\vec{w}^*, \vec{x}\rangle \mid \vec{x} \in P\} \cup \{-\langle\vec{w}^*, \vec{x}\rangle \mid \vec{x} \in N\}\big)$

$\delta > 0$ since $\vec{w}^*$ strictly classifies all patterns.

Hence, $\langle\vec{w}^*, \vec{w}^{(t+1)}\rangle \ge \langle\vec{w}^*, \vec{w}^{(0)}\rangle + (t+1)\delta$

Perceptron learning algorithm: convergence proof (cont.)


$\|\vec{w}^{(t+1)}\|^2 = \langle\vec{w}^{(t+1)}, \vec{w}^{(t+1)}\rangle = \langle\vec{w}^{(t)} \pm \vec{x}^{(t)}, \vec{w}^{(t)} \pm \vec{x}^{(t)}\rangle = \|\vec{w}^{(t)}\|^2 \pm 2\langle\vec{w}^{(t)}, \vec{x}^{(t)}\rangle + \|\vec{x}^{(t)}\|^2 \le \|\vec{w}^{(t)}\|^2 + \varepsilon$

with $\varepsilon := \max\{\|\vec{x}\|^2 \mid \vec{x} \in P \cup N\}$ (the term $\pm 2\langle\vec{w}^{(t)}, \vec{x}^{(t)}\rangle$ is $\le 0$ because $\vec{x}^{(t)}$ was misclassified by $\vec{w}^{(t)}$).

Hence, $\|\vec{w}^{(t+1)}\|^2 \le \|\vec{w}^{(0)}\|^2 + (t+1)\varepsilon$

Perceptron learning algorithm: convergence proof (cont.)


$\cos\angle(\vec{w}^*, \vec{w}^{(t+1)}) = \frac{\langle\vec{w}^*, \vec{w}^{(t+1)}\rangle}{\|\vec{w}^*\| \cdot \|\vec{w}^{(t+1)}\|} \ge \frac{\langle\vec{w}^*, \vec{w}^{(0)}\rangle + (t+1)\delta}{\|\vec{w}^*\| \cdot \sqrt{\|\vec{w}^{(0)}\|^2 + (t+1)\varepsilon}} \xrightarrow{t \to \infty} \infty$

Since $\cos\angle(\vec{w}^*, \vec{w}^{(t+1)}) \le 1$, $t$ must be bounded above. □
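Assuming the algorithm is started with $\vec{w}^{(0)} = \vec{0}$ (an assumption made here only to keep the expression short; the slides allow arbitrary initialisation), the two bounds above combine into an explicit bound on the number of iterations:

\[
1 \;\ge\; \cos\angle(\vec{w}^*, \vec{w}^{(t+1)}) \;\ge\; \frac{(t+1)\,\delta}{\|\vec{w}^*\| \cdot \sqrt{(t+1)\,\varepsilon}}
\quad\Longrightarrow\quad
t+1 \;\le\; \frac{\|\vec{w}^*\|^2\,\varepsilon}{\delta^2}
\]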

Perceptron learning algorithm: convergence

◮ Lemma (worst case running time): If the given problem is solvable, perceptron learning terminates after at most $(n+1)^2\, 2^{(n+1)\log(n+1)}$ iterations.

(Figure: worst-case number of iterations as a function of $n$; on the order of $10^7$ iterations already for $n$ around 8)

◮ Exponential running time is a problem of the perceptron learning algorithm. There are algorithms that solve the problem in $O(n^{7/2})$.

Perceptron learning algorithm: cycle theorem


◮ Lemma: If a weight vector occurs twice during perceptron learning, the given task is not solvable. (Remark: here, weight vector means the extended variant that also contains $w_0$.)
Proof: see below
◮ Lemma: Starting the perceptron learning algorithm with weight vector $\vec{0}$ on an unsolvable problem, at least one weight vector will occur twice.
Proof: omitted, see Minsky/Papert, Perceptrons


Proof: Assume $\vec{w}^{(t+k)} = \vec{w}^{(t)}$. Meanwhile, the patterns $\vec{x}^{(t+1)}, \ldots, \vec{x}^{(t+k)}$ have been applied. Without loss of generality, assume $\vec{x}^{(t+1)}, \ldots, \vec{x}^{(t+q)} \in P$ and $\vec{x}^{(t+q+1)}, \ldots, \vec{x}^{(t+k)} \in N$. Hence:

$\vec{w}^{(t)} = \vec{w}^{(t+k)} = \vec{w}^{(t)} + \vec{x}^{(t+1)} + \cdots + \vec{x}^{(t+q)} - (\vec{x}^{(t+q+1)} + \cdots + \vec{x}^{(t+k)})$
$\Rightarrow \vec{x}^{(t+1)} + \cdots + \vec{x}^{(t+q)} = \vec{x}^{(t+q+1)} + \cdots + \vec{x}^{(t+k)}$

Assume a solution $\vec{w}^*$ exists. Then:

$\langle\vec{w}^*, \vec{x}^{(t+i)}\rangle \begin{cases} \ge 0 & \text{if } i \in \{1, \ldots, q\} \\ < 0 & \text{if } i \in \{q+1, \ldots, k\} \end{cases}$

Hence,

$\langle\vec{w}^*, \vec{x}^{(t+1)} + \cdots + \vec{x}^{(t+q)}\rangle \ge 0$
$\langle\vec{w}^*, \vec{x}^{(t+q+1)} + \cdots + \vec{x}^{(t+k)}\rangle < 0$

Contradiction!
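Taken together, the two lemmas above suggest a simple test: run perceptron learning from the zero weight vector and remember every extended weight vector seen so far; a repetition then certifies that the task is unsolvable. The sketch below assumes patterns with integer entries, so that weight vectors can be compared exactly (names and representation are choices made here, not from the slides):

def perceptron_solvable(P, N):
    # returns (True, (w, w0)) if a separating perceptron is found,
    # (False, None) if a weight vector repeats (task unsolvable)
    n = len((P + N)[0])
    w, w0 = [0] * n, 0                     # start from the zero weight vector
    seen = {(w0, *w)}                      # extended weight vectors observed so far
    while True:
        update = None
        for x in P:
            if w0 + sum(wi * xi for wi, xi in zip(w, x)) < 0:
                update = (x, +1)
                break
        if update is None:
            for x in N:
                if w0 + sum(wi * xi for wi, xi in zip(w, x)) >= 0:
                    update = (x, -1)
                    break
        if update is None:
            return True, (w, w0)           # all patterns classified accurately
        x, sign = update
        w = [wi + sign * xi for wi, xi in zip(w, x)]
        w0 += sign
        key = (w0, *w)
        if key in seen:
            return False, None             # weight vector occurred twice
        seen.add(key)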


Perceptron learning algorithm: Pocket algorithm

◮ how can we determine a "good" perceptron if the given task cannot be solved perfectly?
◮ "good" in the sense of: the perceptron makes a minimal number of errors
◮ perceptron learning: the number of errors does not decrease monotonically during learning
◮ idea: memorise the best weight vector that has occurred so far! ⇒ Pocket algorithm
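A minimal sketch of the Pocket idea (the update is the perceptron rule from above; recounting the errors over all patterns after every update is the simplest, though not the most efficient, way to decide what to keep in the "pocket"):

def count_errors(w, w0, P, N):
    # number of training patterns misclassified by the perceptron (w, w0)
    errors = sum(1 for x in P if w0 + sum(wi * xi for wi, xi in zip(w, x)) < 0)
    errors += sum(1 for x in N if w0 + sum(wi * xi for wi, xi in zip(w, x)) >= 0)
    return errors

def pocket_algorithm(P, N, iterations=1000):
    n = len((P + N)[0])
    w, w0 = [0.0] * n, 0.0
    best_errors, best_w, best_w0 = count_errors(w, w0, P, N), list(w), w0
    for _ in range(iterations):
        # pick some misclassified pattern (here: the first one found)
        update = next(((x, +1) for x in P
                       if w0 + sum(wi * xi for wi, xi in zip(w, x)) < 0), None)
        if update is None:
            update = next(((x, -1) for x in N
                           if w0 + sum(wi * xi for wi, xi in zip(w, x)) >= 0), None)
        if update is None:
            return w, w0                         # zero errors: perfect separation
        x, sign = update
        w = [wi + sign * xi for wi, xi in zip(w, x)]
        w0 += sign
        e = count_errors(w, w0, P, N)
        if e < best_errors:                      # memorise the best weight vector so far
            best_errors, best_w, best_w0 = e, list(w), w0
    return best_w, best_w0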

Perceptron networks


◮ perceptrons can only learn linearly separable problems
◮ famous counterexample: XOR$(x_1, x_2)$: $P = \{(0,1)^T, (1,0)^T\}$, $N = \{(0,0)^T, (1,1)^T\}$
◮ networks with several perceptrons are computationally more powerful (cf. McCulloch/Pitts neurons)
◮ let's try to find a network with two perceptrons that can solve the XOR problem:
• first step: find a perceptron that classifies three patterns accurately, e.g. $w_0 = -0.5$, $w_1 = w_2 = 1$ classifies $(0,0)^T$, $(0,1)^T$, $(1,0)^T$ but fails on $(1,1)^T$
• second step: find a perceptron that uses the output of the first perceptron as an additional input. Hence, the training patterns are: $N = \{(0,0,0), (1,1,1)\}$, $P = \{(0,1,1), (1,0,1)\}$. Perceptron learning yields: $v_0 = -1$, $v_1 = v_2 = -1$, $v_3 = 2$

Perceptron networks (cont.)

XOR-network:
(Figure: the first perceptron receives $x_1$ and $x_2$ with weights 1 and 1 and bias weight $-0.5$; the second perceptron receives $x_1$, $x_2$ and the output of the first perceptron with weights $-1$, $-1$ and $2$ and bias weight $-1$, and produces the output $y$)

Geometric interpretation in the $(x_1, x_2)$-plane:
(Figures: the partitioning of the first perceptron; the partitioning of the second perceptron, assuming the first perceptron yields 0; the partitioning of the second perceptron, assuming the first perceptron yields 1; and the combination of both)
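The network can be checked directly in code; the weights below are exactly the ones derived on the previous slide (first perceptron: $w_0 = -0.5$, $w_1 = w_2 = 1$; second perceptron: $v_0 = -1$, $v_1 = v_2 = -1$, $v_3 = 2$). The step function is repeated so that the sketch is self-contained:

def f_step(z):
    return 1 if z >= 0 else 0

def xor_network(x1, x2):
    # first perceptron: correct on (0,0), (0,1), (1,0), fails on (1,1)
    z = f_step(-0.5 + 1.0 * x1 + 1.0 * x2)
    # second perceptron: sees x1, x2 and the output z of the first perceptron
    return f_step(-1.0 - 1.0 * x1 - 1.0 * x2 + 2.0 * z)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_network(x1, x2))   # prints the XOR truth table: 0, 1, 1, 0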

Historical remarks


◮ Rosenblatt perceptron (1958):
• retinal input (array of pixels)
• preprocessing level, calculation of features
• adaptive linear classifier
• inspired by human vision
• if features are complex enough, everything can be classified
• if features are restricted (only parts of the retinal pixels available to features), some interesting tasks cannot be learned (Minsky/Papert, 1969)
◮ important idea: create features instead of learning from raw data

(Figure: retina → features → linear classifier $\Sigma$)

Summary

◮ Perceptrons are simple neurons with limited representation capabilities: linearly separable functions only
◮ simple but provably working learning algorithm
◮ networks of perceptrons can overcome limitations
◮ working in feature space may help to overcome the limited representation capability
