
COMP 551 – Applied Machine Learning
Lecture 5: Generative models for linear classification
Instructor: Joelle Pineau ([email protected])
Class web page: www.cs.mcgill.ca/~jpineau/comp551

Unless otherwise noted, all material posted for this course is copyright of the instructor, and cannot be reused or reposted without the instructor's written permission.

Today's quiz

• Q1. What is a linear classifier? (In contrast to a non-linear classifier.)
• Q2. Describe the difference between discriminative and generative classifiers.
• Q3. Consider the following data set. If you use logistic regression to compute a decision boundary, what is the prediction for x6?

    Data    Feature 1    Feature 2    Feature 3    Output
    x1      1            0            0            0
    x2      1            0            1            0
    x3      0            1            0            0
    x4      1            1            1            1
    x5      1            1            0            1
    x6      0            0            0            ?
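One way to check an answer to Q3 is to fit the classifier and query x6 directly. A minimal sketch using scikit-learn (the library choice and its default regularization are assumptions on my part; the quiz can equally be worked by hand):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # The five labelled rows of the quiz table (x1..x5).
    X_train = np.array([[1, 0, 0],
                        [1, 0, 1],
                        [0, 1, 0],
                        [1, 1, 1],
                        [1, 1, 0]])
    y_train = np.array([0, 0, 0, 1, 1])

    # Fit a linear decision boundary of the form sigma(w^T x + b).
    clf = LogisticRegression().fit(X_train, y_train)

    # Predicted label and class probabilities for x6 = (0, 0, 0).
    x6 = np.array([[0, 0, 0]])
    print(clf.predict(x6), clf.predict_proba(x6))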
Quick recap

• Two approaches for linear classification:
  – Discriminative learning: directly estimate P(y|x).
    E.g. logistic regression: P(y|x) = σ(w^T x) = 1 / (1 + e^{-w^T x}).
  – Generative learning: separately model P(x|y) and P(y); use these, through Bayes rule, to estimate P(y|x).

Linear discriminant analysis (LDA)

• Return to Bayes rule:

    P(y \mid x) = \frac{P(x \mid y) P(y)}{P(x)}

• LDA makes explicit assumptions about P(x|y):

    P(x \mid y) = \frac{1}{(2\pi)^{m/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)

  – Multivariate Gaussian, with mean μ and covariance matrix Σ.
  – Notation: here x is a single instance, represented as an m×1 vector.
• Key assumption of LDA: both classes have the same covariance matrix, Σ.
• Consider the log-odds ratio (again, P(x) doesn't matter for the decision):

    \log \frac{P(x \mid y=1) P(y=1)}{P(x \mid y=0) P(y=0)} = \log \frac{P(y=1)}{P(y=0)} - \frac{1}{2} (\mu_0 + \mu_1)^T \Sigma^{-1} (\mu_1 - \mu_0) + x^T \Sigma^{-1} (\mu_1 - \mu_0)

• This is a linear decision boundary: w0 + x^T w = 0.
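The boundary is linear because the shared covariance makes the quadratic terms in x cancel; this step is implicit on the slide. A worked expansion of the likelihood ratio under the shared-Σ assumption:

    \log \frac{P(x \mid y=1)}{P(x \mid y=0)}
      = -\frac{1}{2} (x - \mu_1)^T \Sigma^{-1} (x - \mu_1) + \frac{1}{2} (x - \mu_0)^T \Sigma^{-1} (x - \mu_0)
      = x^T \Sigma^{-1} (\mu_1 - \mu_0) - \frac{1}{2} (\mu_0 + \mu_1)^T \Sigma^{-1} (\mu_1 - \mu_0)

Reading off the coefficients gives w = Σ^{-1}(μ1 − μ0) and w0 = log[P(y=1)/P(y=0)] − (1/2)(μ0 + μ1)^T Σ^{-1}(μ1 − μ0).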
Learning in LDA: 2 class case

• Estimate P(y), μ, Σ from the training data, then apply the log-odds ratio.
  – P(y=0) = N0 / (N0 + N1)    P(y=1) = N1 / (N0 + N1)
    where N0, N1 are the numbers of training samples from classes 0 and 1, respectively.
  – μ0 = ∑i=1:n I(yi=0) xi / N0    μ1 = ∑i=1:n I(yi=1) xi / N1
    where I(·) is the indicator function: I(z) = 1 if z holds, I(z) = 0 otherwise.
  – Σ = ∑k=0:1 ∑i=1:n I(yi=k) (xi − μk)(xi − μk)^T / (N0 + N1 − 2)
• Given an input x, classify it as class 1 if the log-odds ratio is > 0, and as class 0 otherwise. A code sketch of this procedure follows.
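A minimal NumPy sketch of the procedure above (the function names and the pseudo-inverse fallback are my own choices; the estimation formulas are the slide's):

    import numpy as np

    def fit_lda(X, y):
        """Estimate 2-class LDA parameters and return the linear rule (w0, w).

        X: (n, m) array of instances; y: (n,) array of 0/1 labels.
        """
        X0, X1 = X[y == 0], X[y == 1]
        n0, n1 = len(X0), len(X1)

        # Class priors and per-class means.
        p0, p1 = n0 / (n0 + n1), n1 / (n0 + n1)
        mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)

        # Shared (pooled) covariance matrix, normalized by N0 + N1 - 2.
        S = ((X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)) / (n0 + n1 - 2)

        # Plug the estimates into the log-odds ratio, w0 + x^T w.
        # pinv rather than inv, in case the pooled covariance is singular
        # on very small samples (an implementation choice, not on the slide).
        S_inv = np.linalg.pinv(S)
        w = S_inv @ (mu1 - mu0)
        w0 = np.log(p1 / p0) - 0.5 * (mu0 + mu1) @ S_inv @ (mu1 - mu0)
        return w0, w

    def predict_lda(w0, w, X):
        # Class 1 exactly when the log-odds ratio is positive.
        return ((X @ w + w0) > 0).astype(int)

For example, w0, w = fit_lda(X_train, y_train) on the quiz data, followed by predict_lda(w0, w, x6), classifies x6 with the generative model instead of logistic regression.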