
Introduction to Machine Learning
Lecture 7 (Amo G. Tong)

• Linear Classifiers
• Perceptron
• Sigmoid Unit

• Some materials are courtesy of Vibhav Gogate and Tom Mitchell.
• All pictures belong to their creators.

Linear Classifiers

• Input: a real vector $\mathbf{x} = (x_1, \dots, x_n)$ of features

• A weight vector $\boldsymbol{\omega} = (\omega_1, \dots, \omega_n)$

• A bias term: $\omega_0$

• A constant feature: $x_0 = 1$ for all instances

• Decision rule:

• $f(\mathbf{x}) = \text{positive}$ if $\sum_i \omega_i \cdot x_i > 0$

• $f(\mathbf{x}) = \text{negative}$ if $\sum_i \omega_i \cdot x_i \le 0$
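As a quick illustration, here is a minimal sketch of this decision rule in Python; the function name `linear_classify` and the NumPy dependency are my own choices, not from the slides:

```python
import numpy as np

def linear_classify(x, w):
    """Classify feature vector x with weights w = (w_0, w_1, ..., w_n)."""
    x = np.concatenate(([1.0], x))  # prepend the constant feature x_0 = 1
    return "positive" if np.dot(w, x) > 0 else "negative"

# Example: w_0 = -1, w_1 = 2 labels x_1 > 0.5 as positive.
print(linear_classify(np.array([0.7]), np.array([-1.0, 2.0])))  # -> positive
```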


• Exercise: is the classifier shown in the figure a linear classifier? [figure omitted]


• Example (from Gogate). [figure omitted]

Perceptron

$$o(x_1, \dots, x_n) = \begin{cases} 1 & \text{if } \omega_0 + \omega_1 x_1 + \cdots + \omega_n x_n > 0 \\ -1 & \text{else} \end{cases}$$

Equivalently, $o(x_1, \dots, x_n) = \mathrm{sgn}(\boldsymbol{\omega} \cdot \mathbf{x})$, where $\mathrm{sgn}(y) = \begin{cases} 1 & \text{if } y > 0 \\ -1 & \text{else} \end{cases}$

Decision Surface of Perceptron

[Figure: decision surfaces. In one dimension, the positive region is $\omega_0 + \omega_1 x_1 > 0$; in two dimensions, it is the half-plane $\omega_0 + \omega_1 x_1 + \omega_2 x_2 > 0$.]


• Representation power of the perceptron: all linearly separable data.

• Special cases: Boolean inputs, $x_i = 0$ or $1$.
• A perceptron can represent some Boolean functions, such as AND and OR.
• But not all of them; XOR is not linearly separable (see the sketch below the figure).

[Figure: AND is linearly separable; XOR is not.]
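To make the AND case concrete, the sketch below checks that the hand-picked weights $\omega_0 = -1.5$, $\omega_1 = \omega_2 = 1$ (my own illustration, not from the slides) realize AND over Boolean inputs:

```python
import numpy as np

def perceptron(x, w):
    # o(x) = 1 if w . (1, x) > 0, else -1 (constant feature prepended)
    return 1 if np.dot(w, np.concatenate(([1.0], x))) > 0 else -1

w_and = np.array([-1.5, 1.0, 1.0])  # hand-picked weights realizing AND
for x1 in (0, 1):
    for x2 in (0, 1):
        print((x1, x2), perceptron(np.array([x1, x2]), w_and))
# prints -1 for (0,0), (0,1), (1,0) and +1 for (1,1), matching AND
```

For XOR no such weights exist: the four points require $\omega_0 \le 0$, $\omega_0 + \omega_1 > 0$, $\omega_0 + \omega_2 > 0$, and $\omega_0 + \omega_1 + \omega_2 \le 0$; adding the two strict inequalities gives $2\omega_0 + \omega_1 + \omega_2 > 0$, while the other two give $2\omega_0 + \omega_1 + \omega_2 \le 0$, a contradiction.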

Training a Perceptron

• Two methods to train a perceptron:
• Method 1: the perceptron training rule.
• Method 2: LSM (least squares method) and gradient descent.

$$o(x_1, \dots, x_n) = \begin{cases} 1 & \text{if } \omega_0 + \omega_1 x_1 + \cdots + \omega_n x_n > 0 \\ -1 & \text{else} \end{cases}$$

General scheme:
    Initialize each $\omega_i$ to some small value.
    Update $\omega_i \leftarrow \omega_i + \Delta\omega_i$.
    Repeat until some criterion is met.

Example training data:

    x1  x2  x3  x4  label
    1   2   3   3   +
    2   6   3   3   -
    9   2   4   4   +
    2   7   5   3   -


• Method 1: perceptron training rule


Do until convergence:
    For each $x$ in $D$:
        For each $\omega_i$:
            $\omega_i \leftarrow \omega_i + \Delta\omega_i$, where $\Delta\omega_i = \eta (t - o) x_i$

• $t$: the target value.
• $o$: the current prediction.
• $\eta$: the learning rate, a small positive value.
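A minimal sketch of this rule in Python; the name `perceptron_train` and the fixed epoch count standing in for a real convergence check are my own choices:

```python
import numpy as np

def perceptron_train(X, t, eta=0.1, epochs=100):
    """Perceptron training rule: w_i <- w_i + eta * (t - o) * x_i."""
    X = np.hstack([np.ones((len(X), 1)), X])   # constant feature x_0 = 1
    w = np.zeros(X.shape[1])                   # initialize w_i to small values
    for _ in range(epochs):                    # stand-in for "do until converge"
        for x, target in zip(X, t):
            o = 1 if np.dot(w, x) > 0 else -1  # current prediction
            w += eta * (target - o) * x        # update every w_i at once
    return w
```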


• The perceptron training rule converges if the data is linearly separable and the learning rate is small enough.

• It may not converge if the data is not linearly separable.

• Your training data may not be linearly separable even if the underlying truth is linear, for example because of noise.
• Is there an approach that always converges? LSM and gradient descent.


• Method 2: LSM and Gradient Descent.

• Linear unit: $o_l(\mathbf{x}) = \omega_0 + \omega_1 x_1 + \cdots + \omega_n x_n$
• Define the error: $E_D(\boldsymbol{\omega}) = \frac{1}{2} \sum_{d \in D} (t_d - o_l(d))^2$
• This is the least squares method (LSM).

• Gradient descent:

Do until convergence:
    For each $\omega_i$:
        $\omega_i \leftarrow \omega_i + \Delta\omega_i$

• Gradient: $\frac{\partial E_D}{\partial \omega_i} = \sum_{d} (t_d - o_l(d)) (-x_{d,i})$
• Update: $\Delta\omega_i = -\eta \frac{\partial E_D}{\partial \omega_i}$

Do until convergence:
    For each $\omega_i$:
        $\omega_i \leftarrow \omega_i + \eta \sum_{d} (t_d - o_l(d)) \, x_{d,i}$
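In code, the batch update for the linear unit might look like the sketch below; the epoch count again stands in for a convergence check, and all names are my own:

```python
import numpy as np

def lsm_gradient_descent(X, t, eta=0.01, epochs=500):
    """Gradient descent on E_D(w) = 1/2 * sum_d (t_d - o_l(d))^2."""
    X = np.hstack([np.ones((len(X), 1)), X])  # constant feature x_0 = 1
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        o = X @ w                 # linear unit outputs o_l(d) for all d
        w += eta * X.T @ (t - o)  # w_i += eta * sum_d (t_d - o_l(d)) * x_{d,i}
    return w
```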

• Gradient descent converges to the hypothesis that minimizes the error, if the learning rate is sufficiently small:
• even if the data is not linearly separable;
• even if the data has noise.


• Batch mode: define the error for the whole training set and consider it as a whole:
  $E_D(\boldsymbol{\omega}) = \frac{1}{2} \sum_{d \in D} (t_d - o_l(d))^2$

• Incremental mode (online mode): define the error for each training instance and consider the instances one by one:
  $E_d(\boldsymbol{\omega}) = \frac{1}{2} (t_d - o_l(d))^2$

Batch mode:

Do until convergence:
    For each $\omega_i$, update: $\omega_i \leftarrow \omega_i - \eta \frac{\partial E_D}{\partial \omega_i}$

Incremental mode:

Do until convergence:
    For each $d$ in $D$:
        For each $\omega_i$, update: $\omega_i \leftarrow \omega_i - \eta \frac{\partial E_d}{\partial \omega_i}$
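The incremental variant applies the same gradient step one instance at a time; a sketch under the same assumptions as before:

```python
import numpy as np

def lsm_incremental(X, t, eta=0.001, epochs=500):
    """Online LSM: step on the per-instance error E_d, one instance at a time."""
    X = np.hstack([np.ones((len(X), 1)), X])  # constant feature x_0 = 1
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_d, t_d in zip(X, t):
            o_d = np.dot(w, x_d)          # linear unit output for instance d
            w += eta * (t_d - o_d) * x_d  # w_i -= eta * dE_d/dw_i
    return w
```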

• Batch mode vs incremental (online) mode:
• Incremental mode can approximate batch mode with arbitrarily small error.
• Incremental mode can better avoid local minima: it updates $\boldsymbol{\omega}$ more frequently, it is faster, and it deals well with dynamic datasets. The learning rate is usually smaller in incremental mode.
• Batch mode is more stable, and it is easy to check its convergence.

Two methods compared:

• Perceptron rule: $\Delta\omega_i = \eta (t - o) x_i$; considers the thresholded error output by the perceptron; converges if the data is separable by a perceptron.

• LSM and gradient descent: minimizes $\frac{1}{2} \sum_{d \in D} (t_d - o_l(d))^2$; considers the unthresholded error; always converges to the hypothesis that minimizes the error.

Note: minimizing the squared error does not necessarily minimize the number of misclassified samples.

• Multi-class perceptron: use a linear score $f(\mathbf{x}) = \omega_0 + \omega_1 x_1 + \cdots + \omega_n x_n$ for each class.

• Classes: {red, orange, blue}

• Task 1: red or not red, with score $f_r(\mathbf{x})$.

• Task 2: orange or not orange, with score $f_o(\mathbf{x})$.

• Task 3: blue or not blue, with score $f_b(\mathbf{x})$.

• $\mathrm{Class}(\mathbf{x}) = \arg\max_{r,o,b} \{ f_r(\mathbf{x}), f_o(\mathbf{x}), f_b(\mathbf{x}) \}$
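A sketch of this one-vs-rest scheme; the three weight vectors below are hypothetical placeholders standing in for trained weights, and the scores use $f(\mathbf{x}) = \boldsymbol{\omega} \cdot (1, \mathbf{x})$:

```python
import numpy as np

def multiclass_predict(x, weights_by_class):
    """Return the class whose linear score f_c(x) is largest."""
    x = np.concatenate(([1.0], x))  # constant feature x_0 = 1
    scores = {c: np.dot(w, x) for c, w in weights_by_class.items()}
    return max(scores, key=scores.get)  # argmax over the classes

# Hypothetical weights for the three one-vs-rest tasks:
weights = {"red":    np.array([ 0.1,  1.0, -1.0]),
           "orange": np.array([-0.2,  0.5,  0.5]),
           "blue":   np.array([ 0.0, -1.0,  1.0])}
print(multiclass_predict(np.array([2.0, 0.5]), weights))  # -> red
```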

Sigmoid Unit

• Input: a real vector $(x_1, \dots, x_n)$
• Output: a real value

$$o(x_1, \dots, x_n) = \frac{1}{1 + e^{-\boldsymbol{\omega} \cdot \mathbf{x}}}$$

$o(x_1, \dots, x_n) = \sigma(\boldsymbol{\omega} \cdot \mathbf{x})$, where $\sigma(y) = \frac{1}{1 + e^{-y}}$
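As a sketch, the sigmoid unit's output in NumPy; the function names are my own:

```python
import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))  # sigma(y) = 1 / (1 + e^-y)

def sigmoid_unit(x, w):
    """o(x) = sigma(w . x), with the constant feature x_0 = 1 prepended."""
    return sigmoid(np.dot(w, np.concatenate(([1.0], x))))
```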


• Parameters: $(\omega_0, \dots, \omega_n)$

• To train a sigmoid unit: define the error, then calculate the partial derivatives.


Derivative of the sigmoid:

$$\frac{\partial \sigma(y)}{\partial y} = \frac{\partial (1 + e^{-y})^{-1}}{\partial y}
= -(1 + e^{-y})^{-2} \cdot \frac{\partial (1 + e^{-y})}{\partial y}
= -(1 + e^{-y})^{-2} \cdot e^{-y} \cdot \frac{\partial (-y)}{\partial y}
= \frac{e^{-y}}{(1 + e^{-y})^2}
= \frac{e^{-y}}{(1 + e^{-y})(1 + e^{-y})}
= \sigma(y)\,(1 - \sigma(y))$$

By the chain rule:

$$\frac{\partial \sigma(\boldsymbol{\omega} \cdot \mathbf{x})}{\partial \omega_i}
= \frac{\partial \sigma(\boldsymbol{\omega} \cdot \mathbf{x})}{\partial (\boldsymbol{\omega} \cdot \mathbf{x})} \cdot \frac{\partial (\boldsymbol{\omega} \cdot \mathbf{x})}{\partial \omega_i}
= \sigma(\boldsymbol{\omega} \cdot \mathbf{x})\,(1 - \sigma(\boldsymbol{\omega} \cdot \mathbf{x})) \cdot x_i$$

• Define the error: $E_D(\boldsymbol{\omega}) = \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2$, where $o_d = \sigma(\boldsymbol{\omega} \cdot \mathbf{x}_d)$.

• Then:

$$\frac{\partial E_D}{\partial \omega_i} = \sum_{d \in D} -(t_d - o_d) \frac{\partial o_d}{\partial \omega_i} = \sum_{d \in D} -(t_d - o_d) \cdot x_{d,i} \cdot o_d (1 - o_d)$$
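Putting the pieces together, gradient descent for a sigmoid unit might be sketched as below (batch mode, fixed epoch count as a convergence-check stand-in, all names my own):

```python
import numpy as np

def sigmoid_train(X, t, eta=0.5, epochs=1000):
    """Gradient descent on E_D(w) = 1/2 * sum_d (t_d - o_d)^2 for a sigmoid unit."""
    X = np.hstack([np.ones((len(X), 1)), X])    # constant feature x_0 = 1
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        o = 1.0 / (1.0 + np.exp(-(X @ w)))      # o_d = sigma(w . x_d)
        grad = X.T @ (-(t - o) * o * (1 - o))   # dE_D/dw_i from the formula above
        w -= eta * grad                         # step opposite the gradient
    return w
```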

Perceptron and Sigmoid Unit

• Perceptron: $o(x_1, \dots, x_n) = \begin{cases} 1 & \text{if } \boldsymbol{\omega} \cdot \mathbf{x} > 0 \\ -1 & \text{else} \end{cases}$; outputs a class; not differentiable.

• Linear unit: $o(x_1, \dots, x_n) = \boldsymbol{\omega} \cdot \mathbf{x}$; outputs a value; differentiable; unbounded.

• Sigmoid unit: $o(x_1, \dots, x_n) = \frac{1}{1 + e^{-\boldsymbol{\omega} \cdot \mathbf{x}}}$; outputs a value; differentiable; bounded in $(0, 1)$.

Summary

• Perceptron (definition)
• Two methods to train a perceptron
• Batch mode vs online mode

• Sigmoid unit (definition)
• Formulas to train a sigmoid unit
