
Lecture 6. Notes on Linear Algebra.

COMP90051 Statistical Machine Learning

Semester 2, 2017 Lecturer: Andrey Kan

Copyright: University of Melbourne

This lecture

• Notes on linear algebra
∗ Vectors and dot products
∗ Hyperplanes and vector normals
• Perceptron
∗ Introduction to Artificial Neural Networks
∗ The perceptron model
∗ Stochastic gradient descent


Notes on Linear Algebra

Link between geometric and algebraic interpretation of ML methods

What are vectors?

Suppose $\mathbf{u} = [u_1, u_2]'$. What does $\mathbf{u}$ really represent?

• An ordered set of numbers $\{u_1, u_2\}$
• Cartesian coordinates of a point
• A direction

art: OpenClipartVectors at pixabay.com (CC0)

Dot product: algebraic definition

• Given two $m$-dimensional vectors $\mathbf{u}$ and $\mathbf{v}$, their dot product is $\mathbf{u} \cdot \mathbf{v} \equiv \mathbf{u}'\mathbf{v} \equiv \sum_{i=1}^{m} u_i v_i$
∗ E.g., a weighted sum of terms $\mathbf{x}'\mathbf{w}$ is a dot product

• Verify that if $k$ is a constant and $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$ are vectors of the same size then
$(k\mathbf{a})'\mathbf{b} = k(\mathbf{a}'\mathbf{b}) = \mathbf{a}'(k\mathbf{b})$
$\mathbf{a}'(\mathbf{b} + \mathbf{c}) = \mathbf{a}'\mathbf{b} + \mathbf{a}'\mathbf{c}$
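A quick numerical illustration of the definition and the two properties above, in Python with numpy (the vector values are made up for the example):

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])
w = np.array([7.0, 8.0, 9.0])
k = 2.5

print(u @ v)                                    # 32.0, i.e. sum of u_i * v_i
print(np.allclose((k * u) @ v, k * (u @ v)))    # True: (ka)'b = k(a'b)
print(np.allclose(u @ (v + w), u @ v + u @ w))  # True: a'(b + c) = a'b + a'c
```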

Dot product: geometric definition

• Given two $m$-dimensional vectors $\mathbf{u}$ and $\mathbf{v}$, their dot product is $\mathbf{u} \cdot \mathbf{v} \equiv \mathbf{u}'\mathbf{v} \equiv \|\mathbf{u}\| \|\mathbf{v}\| \cos\theta$
∗ Here $\|\mathbf{u}\|$ and $\|\mathbf{v}\|$ are L2 norms (i.e., Euclidean lengths) of vectors $\mathbf{u}$ and $\mathbf{v}$
∗ and $\theta$ is the angle between the vectors

The scalar projection of $\mathbf{u}$ onto $\mathbf{v}$ is given by $u_{\mathbf{v}} = \|\mathbf{u}\| \cos\theta$. Thus the dot product is $\mathbf{u}'\mathbf{v} = u_{\mathbf{v}} \|\mathbf{v}\| = v_{\mathbf{u}} \|\mathbf{u}\|$

Equivalence of definitions

• Lemma: The algebraic and geometric definitions are identical
• Proof sketch:
∗ Express the vectors using the standard vector basis $\mathbf{e}_1, \dots, \mathbf{e}_m$ in $\mathbb{R}^m$: $\mathbf{u} = \sum_{i=1}^{m} u_i \mathbf{e}_i$ and $\mathbf{v} = \sum_{i=1}^{m} v_i \mathbf{e}_i$
∗ Vectors $\mathbf{e}_i$ form an orthonormal basis: they have unit length and are orthogonal to each other, so $\mathbf{e}_i'\mathbf{e}_i = 1$ and $\mathbf{e}_i'\mathbf{e}_j = 0$ for $i \neq j$
∗ Therefore
$\mathbf{u}'\mathbf{v} = \mathbf{u}'\left(\sum_{i=1}^{m} v_i \mathbf{e}_i\right) = \sum_{i=1}^{m} v_i (\mathbf{u}'\mathbf{e}_i) = \sum_{i=1}^{m} v_i \|\mathbf{u}\| \cos\theta_i = \sum_{i=1}^{m} v_i u_i$
(where $\theta_i$ is the angle between $\mathbf{u}$ and $\mathbf{e}_i$, so that $\|\mathbf{u}\| \cos\theta_i = u_i$)

Geometric properties of the dot product

• If the two vectors are orthogonal then $\mathbf{u}'\mathbf{v} = 0$
• If the two vectors are parallel then $\mathbf{u}'\mathbf{v} = \|\mathbf{u}\| \|\mathbf{v}\|$; if they are anti-parallel then $\mathbf{u}'\mathbf{v} = -\|\mathbf{u}\| \|\mathbf{v}\|$
• $\mathbf{u}'\mathbf{u} = \|\mathbf{u}\|^2$, so $\|\mathbf{u}\| = \sqrt{u_1^2 + \dots + u_m^2}$ defines the Euclidean vector length
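To see the two definitions agree numerically, here is a small check with vectors whose angle is known by construction (illustrative values, assuming numpy):

```python
import numpy as np

u = np.array([3.0, 0.0])   # along the x-axis
v = np.array([2.0, 2.0])   # at 45 degrees to the x-axis
theta = np.pi / 4          # angle between u and v, known by construction

algebraic = u @ v                                                  # 6.0
geometric = np.linalg.norm(u) * np.linalg.norm(v) * np.cos(theta)

print(np.isclose(algebraic, geometric))           # True
print(np.isclose(u @ u, np.linalg.norm(u) ** 2))  # True: u'u = ||u||^2
```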

Hyperplanes and normal vectors

• A hyperplane defined by parameters $\mathbf{w}$ and $b$ is a set of points $\mathbf{x}$ that satisfy $\mathbf{x}'\mathbf{w} + b = 0$
• In 2D, a hyperplane is a line: a line is a set of points that satisfy $w_1 x_1 + w_2 x_2 + b = 0$

A normal vector for a hyperplane is a vector perpendicular to that hyperplane

Hyperplanes and normal vectors

• Consider a hyperplane defined by parameters $\mathbf{w}$ and $b$. Note that $\mathbf{w}$ is itself a vector
• Lemma: Vector $\mathbf{w}$ is a normal vector to the hyperplane
• Proof sketch:
∗ Choose any two points $\mathbf{u}$ and $\mathbf{v}$ on the hyperplane. Note that vector $\mathbf{u} - \mathbf{v}$ lies on the hyperplane
∗ Consider the dot product $(\mathbf{u} - \mathbf{v})'\mathbf{w} = \mathbf{u}'\mathbf{w} - \mathbf{v}'\mathbf{w} = (\mathbf{u}'\mathbf{w} + b) - (\mathbf{v}'\mathbf{w} + b) = 0$
∗ Thus $\mathbf{u} - \mathbf{v}$ lies on the hyperplane and is perpendicular to $\mathbf{w}$, and so $\mathbf{w}$ is a normal vector

Example in 2D

• Consider a line defined by $w_1$, $w_2$ and $b$
• Vector $\mathbf{w} = [w_1, w_2]'$ is a normal vector
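A minimal numerical check of the lemma (the particular line and points are assumptions for illustration), using numpy:

```python
import numpy as np

w = np.array([1.0, 2.0])   # line: x1 + 2*x2 - 4 = 0, so b = -4
b = -4.0

u = np.array([4.0, 0.0])   # two points chosen to lie on the line
v = np.array([0.0, 2.0])
assert np.isclose(u @ w + b, 0) and np.isclose(v @ w + b, 0)

print((u - v) @ w)  # 0.0: w is perpendicular to any vector lying on the line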

The same line can be written in slope-intercept form: $x_2 = -\frac{w_1}{w_2} x_1 - \frac{b}{w_2}$

Is logistic regression a linear method?

Logistic function

[Figure: the logistic function $f(s)$, an S-shaped curve over the reals $s$ (here $-10$ to $10$) rising from 0 to 1]

Logistic regression is a linear classifier

• Logistic regression model:
$P(\mathcal{Y} = 1 \mid \mathbf{x}) = \frac{1}{1 + \exp(-\mathbf{x}'\mathbf{w})}$

• Classification rule:
if $P(\mathcal{Y} = 1 \mid \mathbf{x}) > \frac{1}{2}$ then class "1", else class "0"

• Decision boundary:
$\frac{1}{1 + \exp(-\mathbf{x}'\mathbf{w})} = \frac{1}{2} \;\Leftrightarrow\; \exp(-\mathbf{x}'\mathbf{w}) = 1 \;\Leftrightarrow\; \mathbf{x}'\mathbf{w} = 0$
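The boundary algebra can be confirmed in a few lines of Python (illustrative weights; the sigmoid is written out directly):

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

w = np.array([2.0, -1.0])        # assumed weights, for illustration only
x = np.array([1.0, 2.0])         # here x'w = 0: a point on the boundary

p = sigmoid(x @ w)
print(p)                         # 0.5: probability exactly 1/2 on the boundary
print((x @ w > 0) == (p > 0.5))  # True: "p > 1/2" is the linear rule "x'w > 0"
```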

The Perceptron Model

A building block for artificial neural networks, yet another linear classifier

Biological inspiration

• Humans perform well at many tasks that matter
• Originally, neural networks were an attempt to mimic the human brain

photo: Alvesgaspar, Wikimedia Commons, CC3

Artificial neural network

• As a crude approximation, the human brain can be thought of as a mesh of interconnected processing nodes (neurons) that relay electrical signals

• An artificial neural network is a network of processing elements

• Each element converts its inputs to a single output
• The output is a function (called the activation function) of a weighted sum of the inputs

Outline

• In order to use an ANN we need (a) to design the network topology and (b) to adjust the weights to fit the given data
∗ In this course, we will focus exclusively on task (b) for a particular class of networks called feed-forward networks
• Training an ANN means adjusting weights for training data given a pre-defined network topology
• We will come back to ANNs and discuss ANN training in the next lecture
• Right now we will turn our attention to an individual network element, because it is an interesting model in itself

Perceptron model

[Figure: inputs $x_1$, $x_2$ and a constant input 1 are multiplied by weights $w_1$, $w_2$, $w_0$ respectively, summed ($\Sigma$) into $s$, and passed through the activation function $f(s)$]

• $x_1, x_2$ – inputs
• $w_1, w_2$ – synaptic weights
• $w_0$ – bias weight
• $f$ – activation function

Compare this model to logistic regression

Perceptron is a linear binary classifier

Perceptron is a binary classifier:
Predict class A if $s \geq 0$
Predict class B if $s < 0$
where $s = \sum_{i=0}^{m} x_i w_i$

Perceptron is a linear classifier:
$s$ is a linear function of the inputs, and the decision boundary is linear

[Figure: example for 2D data — the decision boundary shown as a plane of $s$ values over the $(x_1, x_2)$ plane with data points]
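A direct translation of this classification rule into Python (a sketch; the usual convention $x_0 = 1$ for the bias term is assumed, as in the sum above):

```python
import numpy as np

def perceptron_predict(x, w):
    """Classify one example. w[0] is the bias weight w_0, paired with
    a constant input x_0 = 1; the remaining weights pair with the features."""
    s = w[0] + x @ w[1:]          # s = sum_{i=0}^{m} x_i w_i with x_0 = 1
    return 'A' if s >= 0 else 'B'

# Example with made-up weights: here s = -0.5 + 1.0 - 0.5 = 0, printed as 'A'
print(perceptron_predict(np.array([1.0, 2.0]), np.array([-0.5, 1.0, -0.25])))
```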


Exercise: find weights of a perceptron capable of perfect classification of the following dataset

 x1   x2   y
  0    0   Class B
  0    1   Class B
  1    0   Class B
  1    1   Class A
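This dataset is the logical AND of $x_1$ and $x_2$. One candidate solution (our choice, not given on the slide) is $w_0 = -1.5$, $w_1 = w_2 = 1$, which can be checked mechanically:

```python
import numpy as np

data = [((0, 0), 'B'), ((0, 1), 'B'), ((1, 0), 'B'), ((1, 1), 'A')]
w = np.array([-1.5, 1.0, 1.0])    # [w_0, w_1, w_2]: one possible answer

for (x1, x2), label in data:
    s = w[0] + w[1] * x1 + w[2] * x2
    predicted = 'A' if s >= 0 else 'B'
    print((x1, x2), label, predicted)   # predictions match all four labels
```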

art: OpenClipartVectors at pixabay.com (CC0)

Loss function for perceptron

• Recall that "training" means finding weights that minimise some loss. Therefore, we proceed with considering the loss function for the perceptron
• Our task is binary classification. Let's arbitrarily encode one class as $+1$ and the other as $-1$. So each training example is now $\{\mathbf{x}, y\}$, where $y$ is either $+1$ or $-1$
• Recall that, in a perceptron, $s = \sum_{i=0}^{m} x_i w_i$, and the sign of $s$ determines the predicted class: $+1$ if $s > 0$, and $-1$ if $s < 0$
• Consider a single training example. If $y$ and $s$ have the same sign then the example is classified correctly. If $y$ and $s$ have different signs, the example is misclassified

Loss function for perceptron

• Consider a single training example. If $y$ and $s$ have the same sign then the example is classified correctly. If $y$ and $s$ have different signs, the example is misclassified
• The perceptron uses a loss function in which there is no penalty for correctly classified examples, while the penalty (loss) is equal to $|s|$ for misclassified examples*
• Formally:
∗ $L(s, y) = 0$ if both $s$, $y$ have the same sign
∗ $L(s, y) = |s|$ if $s$, $y$ have different signs
• This can be re-written as $L(s, y) = \max(0, -sy)$ (since $y$ is $\pm 1$, $-sy = |s|$ exactly when $s$ and $y$ have different signs)

[Figure: $L(s, y)$ plotted against $s$ — zero on the correctly classified side, increasing linearly on the misclassified side]

* This is similar, but not identical, to another widely used loss function called the hinge loss
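The loss in one line of Python, with a check that the piecewise definition and the $\max$ form agree (a small sketch with made-up values):

```python
def perceptron_loss(s, y):
    """Perceptron loss: 0 if correctly classified, |s| otherwise (y is +1 or -1)."""
    return max(0.0, -s * y)

print(perceptron_loss( 2.0,  1))  # 0.0: same sign, no penalty
print(perceptron_loss(-2.0,  1))  # 2.0: misclassified, penalty |s|
print(perceptron_loss( 0.5, -1))  # 0.5: misclassified, penalty |s|
```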

Stochastic gradient descent

• Split all training examples in $B$ batches
• Choose initial $\boldsymbol{\theta}^{(1)}$
• For $i$ from 1 to $T$ (iterations over the entire dataset are called epochs)
∗ For $j$ from 1 to $B$: do a gradient descent update using data from batch $j$
• Advantage of such an approach: computational feasibility for large datasets
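A generic SGD skeleton matching this pseudocode (a sketch; the gradient function, initial parameters, and data below are placeholders you would supply):

```python
import numpy as np

def sgd(grad, theta, batches, T, eta=0.1):
    """grad(theta, batch) -> gradient of the loss on that batch.
    Runs T epochs, taking one gradient step per batch."""
    for _ in range(T):           # each pass over all batches is one epoch
        for batch in batches:
            theta = theta - eta * grad(theta, batch)
    return theta

# Toy usage: minimise 0.5 * mean((theta - b)^2) over the points in each batch;
# theta settles near 2.5, the mean over both batches.
batches = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
grad = lambda theta, b: np.mean(theta - b)
print(sgd(grad, theta=0.0, batches=batches, T=100))
```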

Perceptron training

Choose initial guess $\mathbf{w}^{(0)}$, set $k = 0$
For $i$ from 1 to $T$ (epochs)
    For $j$ from 1 to $N$ (training examples)
        Consider example $\{\mathbf{x}_j, y_j\}$
        Update*: $\mathbf{w}^{(k+1)} = \mathbf{w}^{(k)} - \eta \nabla L(\mathbf{w}^{(k)})$, then increment $k$

Here $L(\mathbf{w}) = \max(0, -sy)$, $s = \sum_{i=0}^{m} x_i w_i$, and $\eta$ is the learning rate

* There is no derivative when $s = 0$, but this case is handled explicitly in the algorithm; see next slides

Perceptron training rule

• We have $\frac{\partial L}{\partial w_i} = 0$ when $sy > 0$
∗ We don't need to do an update when an example is correctly classified
• We have $\frac{\partial L}{\partial w_i} = -x_i$ when $y = 1$ and $s < 0$
• We have $\frac{\partial L}{\partial w_i} = x_i$ when $y = -1$ and $s > 0$

(Here $L = L(s, y)$ with $s = \sum_{i=0}^{m} x_i w_i$, as before)

Perceptron training algorithm

When classified correctly, weights are unchanged

When misclassified: $\mathbf{w}^{(k+1)} = \mathbf{w}^{(k)} - \eta(\pm \mathbf{x})$ ($\eta > 0$ is called the learning rate)

If $y = 1$, but $s < 0$:
$w_i \leftarrow w_i + \eta x_i$
$w_0 \leftarrow w_0 + \eta$

If $y = -1$, but $s \geq 0$:
$w_i \leftarrow w_i - \eta x_i$
$w_0 \leftarrow w_0 - \eta$

Convergence theorem: if the training data is linearly separable, the algorithm is guaranteed to converge to a solution. That is, there exists a finite $K$ such that $L(\mathbf{w}^{(K)}) = 0$

Perceptron convergence theorem

• Assumptions
∗ Linear separability: there exists $\mathbf{w}^*$ such that $y_i (\mathbf{w}^*)' \mathbf{x}_i \geq \gamma$ for all training data $i = 1, \dots, N$ and some positive $\gamma$
∗ Bounded data: $\|\mathbf{x}_i\| \leq R$ for $i = 1, \dots, N$ and some finite $R$
• Proof sketch
∗ Establish that $(\mathbf{w}^*)' \mathbf{w}^{(k)} \geq (\mathbf{w}^*)' \mathbf{w}^{(k-1)} + \gamma$
∗ It then follows that $(\mathbf{w}^*)' \mathbf{w}^{(k)} \geq k\gamma$
∗ Establish that $\|\mathbf{w}^{(k)}\|^2 \leq k R^2$
∗ Note that $\cos\left(\mathbf{w}^*, \mathbf{w}^{(k)}\right) = \frac{(\mathbf{w}^*)' \mathbf{w}^{(k)}}{\|\mathbf{w}^*\| \|\mathbf{w}^{(k)}\|}$
∗ Use the fact that $\cos(\dots) \leq 1$
∗ Arrive at $k \leq \frac{R^2 \|\mathbf{w}^*\|^2}{\gamma^2}$
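Putting the update rule together into a runnable training loop (a sketch, assuming numpy; it stops once an epoch makes no updates, which the theorem above guarantees will happen for separable data):

```python
import numpy as np

def train_perceptron(X, y, eta=0.1, max_epochs=1000):
    """X: (N, m) array of inputs; y: labels in {-1, +1}. Returns (w, w0)."""
    w = np.zeros(X.shape[1])           # feature weights w_1..w_m
    w0 = 0.0                           # bias weight w_0
    for _ in range(max_epochs):        # epochs
        updated = False
        for xi, yi in zip(X, y):       # one training example at a time
            s = xi @ w + w0
            if yi == 1 and s < 0:      # misclassified positive example
                w, w0, updated = w + eta * xi, w0 + eta, True
            elif yi == -1 and s >= 0:  # misclassified negative example
                w, w0, updated = w - eta * xi, w0 - eta, True
        if not updated:                # zero loss on every example: converged
            break
    return w, w0

# E.g., on the AND dataset from the exercise (class A = +1, class B = -1),
# this returns a separating weight vector.
```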

Pros and cons of perceptron learning

• If the data is linearly separable, the perceptron training algorithm will converge to a correct solution
∗ There is a formal proof (good!)
∗ It will converge to some solution (a separating boundary), one of infinitely many possible (bad!)
• However, if the data is not linearly separable, the training will fail completely rather than give some approximate solution (ugly!)

Perceptron Learning Example: Basic setup

$s = w_1 x_1 + w_2 x_2 + w_0$

$f(s) = \begin{cases} 1, & s \geq 0 \\ -1, & s < 0 \end{cases}$

(learning rate $\eta = 0.1$)

[Figure: the perceptron diagram with inputs $x_1$, $x_2$, constant input 1, weights $w_1$, $w_2$, $w_0$, summation $\Sigma$, and activation $f$]

Perceptron Learning Example: Start with random weights

$w_1 = 0.2$, $w_2 = 0.0$, $w_0 = -0.1$

(model as above; learning rate $\eta = 0.1$)

[Figure: the $(x_1, x_2)$ plane with data points and the initial decision boundary, the vertical line $x_1 = 0.5$]

Perceptron Learning Example: Consider training example 1

(current weights: $w_1 = 0.2$, $w_2 = 0.0$, $w_0 = -0.1$)

The first training example is the point $(1, 1)$, of class $-1$. Here $0.2 x_1 + 0.0 x_2 - 0.1 > 0$, so the perceptron predicts class $1$ and the example is misclassified. Update with $\eta = 0.1$:
$w_1 \leftarrow w_1 - \eta x_1 = 0.1$
$w_2 \leftarrow w_2 - \eta x_2 = -0.1$
$w_0 \leftarrow w_0 - \eta = -0.2$

Perceptron Learning Example: Update weights

(updated weights: $w_1 = 0.1$, $w_2 = -0.1$, $w_0 = -0.2$)

[Figure: the $(x_1, x_2)$ plane with the point $(1, 1)$ now on the class $-1$ side of the updated boundary, which crosses the $x_1$-axis at 2; legend: class $-1$, class $1$]

Perceptron Learning Example: Consider training example 2

(current weights: $w_1 = 0.1$, $w_2 = -0.1$, $w_0 = -0.2$)

The second training example is the point $(2, 1)$, of class $1$. Here $0.1 x_1 - 0.1 x_2 - 0.2 < 0$, so the perceptron predicts class $-1$ and the example is misclassified. Update with $\eta = 0.1$:
$w_1 \leftarrow w_1 + \eta x_1 = 0.3$
$w_2 \leftarrow w_2 + \eta x_2 = 0.0$
$w_0 \leftarrow w_0 + \eta = -0.1$

Perceptron Learning Example: Update weights

(updated weights: $w_1 = 0.3$, $w_2 = 0.0$, $w_0 = -0.1$)

[Figure: the $(x_1, x_2)$ plane with the point $(2, 1)$ now on the class $1$ side of the updated boundary, the vertical line $x_1 = 1/3$; legend: class $-1$, class $1$]

Perceptron Learning Example: Further examples

(current weights: $w_1 = 0.3$, $w_2 = 0.0$, $w_0 = -0.1$)

3rd point: correctly classified
4th point, $(1.5, 0.5)$: here $0.3 x_1 + 0.0 x_2 - 0.1 > 0$, but the point is of class $-1$, so it is incorrectly classified; update, etc.

Perceptron Learning Example: Further examples

$w_1 = \cdots$, $w_2 = \cdots$, $w_0 = \cdots$ (the weights keep being updated in the same way as further examples are processed)

Eventually, all the data will be correctly classified (provided it is linearly separable)
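The first two updates above can be replayed exactly with a few lines of Python (a sketch reproducing the slides' arithmetic; the class labels are those shown in the figures):

```python
w1, w2, w0, eta = 0.2, 0.0, -0.1, 0.1   # initial weights and learning rate
examples = [((1, 1), -1), ((2, 1), 1)]  # training examples 1 and 2

for (x1, x2), y in examples:
    s = w1 * x1 + w2 * x2 + w0
    if y == 1 and s < 0:                # misclassified positive: add
        w1, w2, w0 = w1 + eta * x1, w2 + eta * x2, w0 + eta
    elif y == -1 and s >= 0:            # misclassified negative: subtract
        w1, w2, w0 = w1 - eta * x1, w2 - eta * x2, w0 - eta
    print(round(w1, 1), round(w2, 1), round(w0, 1))
# Prints 0.1 -0.1 -0.2, then 0.3 0.0 -0.1, matching the slides
```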

[Figure: the $(x_1, x_2)$ plane with all data points on the correct side of the final decision boundary; legend: class $-1$, class $1$]

This lecture

• Notes on linear algebra
∗ Vectors and dot products
∗ Hyperplanes and vector normals
• Perceptron
∗ Introduction to Artificial Neural Networks
∗ The perceptron model
∗ Training algorithm
