Introduction to Machine Learning
Lecture 3

Mehryar Mohri
Courant Institute and Google Research
[email protected]

Bayesian Learning

Bayes' Formula/Rule

\[ \Pr[Y \mid X] = \frac{\Pr[X \mid Y] \, \Pr[Y]}{\Pr[X]}. \]

Terminology:
• posterior probability: $\Pr[Y \mid X]$.
• likelihood: $\Pr[X \mid Y]$.
• prior: $\Pr[Y]$.
• evidence: $\Pr[X]$.

Mehryar Mohri - Introduction to Machine Learning page 3

Loss Function

Definition: function $L \colon \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}_+$ indicating the penalty for an incorrect prediction.
• $L(\hat{y}, y)$: loss for prediction of $\hat{y}$ instead of $y$.

Examples:
• zero-one loss: standard loss function in classification; $L(\hat{y}, y) = 1_{\hat{y} \neq y}$ for $\hat{y}, y \in \mathcal{Y}$.
• non-symmetric losses: e.g., for spam classification; $L(\text{ham}, \text{spam}) \leq L(\text{spam}, \text{ham})$.
• squared loss: standard loss function in regression; $L(\hat{y}, y) = (\hat{y} - y)^2$.

Classification Problem

Input space $\mathcal{X}$: e.g., set of documents.
• feature vector $\Phi(x) \in \mathbb{R}^N$ associated to $x \in \mathcal{X}$.
• notation: feature vector $x \in \mathbb{R}^N$.
• example: vector of word counts in document.

Output or target space $\mathcal{Y}$: set of classes; e.g., sport, business, art.

Problem: given $x$, predict the correct class $y \in \mathcal{Y}$ associated to $x$.

Bayesian Prediction

Definition: the expected conditional loss of predicting $\hat{y} \in \mathcal{Y}$ is

\[ \mathcal{L}[\hat{y} \mid x] = \sum_{y \in \mathcal{Y}} L(\hat{y}, y) \, \Pr[y \mid x]. \]

Bayesian decision: predict class minimizing expected conditional loss, that is

\[ \hat{y}^* = \operatorname*{argmin}_{\hat{y}} \mathcal{L}[\hat{y} \mid x] = \operatorname*{argmin}_{\hat{y}} \sum_{y \in \mathcal{Y}} L(\hat{y}, y) \, \Pr[y \mid x]. \]

• zero-one loss: $\hat{y}^* = \operatorname*{argmax}_{\hat{y}} \Pr[\hat{y} \mid x]$.
  Maximum a Posteriori (MAP) principle.

Binary Classification - Illustration

[Figure: the posterior probabilities $\Pr[y_1 \mid x]$ and $\Pr[y_2 \mid x]$ plotted as functions of $x$, with the vertical axis running from 0 to 1.]

Maximum a Posteriori (MAP)

Definition: the MAP principle consists of predicting according to the rule

\[ \hat{y} = \operatorname*{argmax}_{y \in \mathcal{Y}} \Pr[y \mid x]. \]
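The definitions above (Bayes' formula, expected conditional loss, Bayesian decision, and MAP) can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration and does not come from the lecture: the function names, the two-class ham/spam setting, the prior and likelihood values, and the cost of 5 for flagging a legitimate message as spam.

```python
def posterior_from_bayes(prior, likelihood):
    """Bayes' formula: Pr[y | x] = Pr[x | y] Pr[y] / Pr[x].

    prior[y] = Pr[y]; likelihood[y] = Pr[x | y] for one fixed observation x.
    """
    evidence = sum(likelihood[y] * prior[y] for y in prior)  # Pr[x]
    return {y: likelihood[y] * prior[y] / evidence for y in prior}


def expected_conditional_loss(y_hat, posterior, loss):
    """Sum over true classes y of L(y_hat, y) * Pr[y | x]."""
    return sum(loss(y_hat, y) * p for y, p in posterior.items())


def bayes_decision(posterior, loss):
    """Predict the class with the smallest expected conditional loss."""
    return min(
        posterior,
        key=lambda y_hat: expected_conditional_loss(y_hat, posterior, loss),
    )


def zero_one_loss(y_hat, y):
    """1 if the prediction is wrong, 0 otherwise."""
    return 0.0 if y_hat == y else 1.0


def spam_loss(y_hat, y):
    """Hypothetical non-symmetric loss with L(ham, spam) <= L(spam, ham)."""
    if y_hat == y:
        return 0.0
    # Predicting spam for a true ham message is assumed 5x more costly.
    return 5.0 if (y_hat, y) == ("spam", "ham") else 1.0


# Made-up numbers for a single message x.
prior = {"ham": 0.6, "spam": 0.4}
likelihood = {"ham": 0.10, "spam": 0.35}

posterior = posterior_from_bayes(prior, likelihood)  # roughly ham 0.3, spam 0.7

# Under zero-one loss the Bayesian decision coincides with the MAP rule.
map_prediction = max(posterior, key=posterior.get)
zero_one_prediction = bayes_decision(posterior, zero_one_loss)

# Under the non-symmetric loss the decision can differ from MAP.
cost_sensitive_prediction = bayes_decision(posterior, spam_loss)

print(map_prediction, zero_one_prediction, cost_sensitive_prediction)
```

With these assumed numbers, the zero-one decision agrees with MAP, while the non-symmetric loss shifts the prediction toward ham even though spam has the larger posterior, because the expected loss of predicting spam is dominated by the costly false-positive term.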