
The Naïve Bayes Classifier
Machine Learning

Today's lecture
• The naïve Bayes classifier
• Learning the naïve Bayes classifier
• Practical concerns

Where are we?
We have seen Bayesian learning:
– Using a probabilistic criterion to select a hypothesis
– Maximum a posteriori and maximum likelihood learning
You should know the difference between the two. We can also learn functions that predict probabilities of outcomes. That is different from using a probabilistic criterion to learn: it is maximum a posteriori (MAP) prediction, as opposed to MAP learning.

MAP prediction
Use the Bayes rule to predict y given an input x:

    P(Y = y \mid X = x) = \frac{P(X = x \mid Y = y) \, P(Y = y)}{P(X = x)}

The left-hand side is the posterior probability of the label being y for this input x. Predict the label y for the input x using

    y^* = \operatorname{argmax}_y \; P(X = x \mid Y = y) \, P(Y = y)

The denominator P(X = x) does not depend on y, so it can be dropped from the argmax. Don't confuse this with MAP learning: MAP learning uses the Bayes rule to select a hypothesis, while MAP prediction uses it to select a label for an input.

Two quantities appear in the argmax: the likelihood P(X = x | Y = y) of observing this input x when the label is y, and the prior probability P(Y = y) of the label being y. All we need are these two sets of probabilities.

Example: Tennis again

Prior: without any other information, what is the probability that I should play tennis?

    Play tennis    P(Play tennis)
    Yes            0.3
    No             0.7

Likelihood: on days that I do play tennis, what is the probability that the temperature is T and the wind is W? And on days that I don't?

    Temperature    Wind      P(T, W | Tennis = Yes)
    Hot            Strong    0.15
    Hot            Weak      0.40
    Cold           Strong    0.10
    Cold           Weak      0.35

    Temperature    Wind      P(T, W | Tennis = No)
    Hot            Strong    0.40
    Hot            Weak      0.10
    Cold           Strong    0.30
    Cold           Weak      0.20
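To make the prediction rule concrete, here is a minimal runnable sketch in Python. It is not from the lecture: the names prior, likelihood, and map_predict are our own choices; only the table values are copied from the slides above.

```python
# A minimal sketch of MAP prediction for the tennis example.
# Table values come from the slides; the names are our own.

prior = {"Yes": 0.3, "No": 0.7}

# likelihood[label][(temperature, wind)] = P(T, W | Tennis = label)
likelihood = {
    "Yes": {("Hot", "Strong"): 0.15, ("Hot", "Weak"): 0.40,
            ("Cold", "Strong"): 0.10, ("Cold", "Weak"): 0.35},
    "No":  {("Hot", "Strong"): 0.40, ("Hot", "Weak"): 0.10,
            ("Cold", "Strong"): 0.30, ("Cold", "Weak"): 0.20},
}

def map_predict(x):
    """Return argmax_y P(x | y) * P(y) over the labels."""
    return max(prior, key=lambda y: likelihood[y][x] * prior[y])

print(map_predict(("Hot", "Weak")))  # -> "Yes" (0.12 vs. 0.07)
```

The call at the bottom reproduces the worked example that follows.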
Should I play tennis?
Input: Temperature = Hot (H), Wind = Weak (W). Compute argmax_y P(H, W | play?) P(play?):

    P(H, W | Yes) P(Yes) = 0.4 × 0.3 = 0.12
    P(H, W | No) P(No)   = 0.1 × 0.7 = 0.07

MAP prediction = Yes.

How hard is it to learn probabilistic models?
Consider this training set, where Outlook is S(unny), O(vercast), or R(ainy); Temperature is H(ot), M(edium), or C(ool); Humidity is H(igh), N(ormal), or L(ow); and Wind is S(trong) or W(eak):

    #    O   T   H   W   Play?
    1    S   H   H   W   -
    2    S   H   H   S   -
    3    O   H   H   W   +
    4    R   M   H   W   +
    5    R   C   N   W   +
    6    R   C   N   S   -
    7    O   C   N   S   +
    8    S   M   H   W   -
    9    S   C   N   W   +
    10   R   M   N   W   +
    11   S   M   N   S   +
    12   O   M   H   S   +
    13   O   H   N   W   +
    14   R   M   H   S   -

We need to learn
1. the prior P(Play?), and
2. the likelihoods P(x | Play?).

Prior P(Play?)
• A single number. (Why only one? Because P(Play? = -) = 1 - P(Play? = +).)

Likelihood P(X | Play?)
• There are 4 features.
• For each value of Play? (+/-), we need a value for each possible assignment: P(O, T, H, W | Play?).
• If the features were all Boolean, that would be 2^4 - 1 = 15 parameters for each label: one for each assignment, minus one because the probabilities must sum to 1.
• Here the features take 3, 3, 3, and 2 values respectively, so we need 3 · 3 · 3 · 2 - 1 = 53 parameters for each label, as the sketch below computes.
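As a quick sanity check on that count, here is a hedged sketch; the helper name n_parameters is our own, not from the slides.

```python
# Hypothetical helper (our own): how many numbers are needed to
# specify the full joint likelihood tables plus the prior?
from math import prod

def n_parameters(value_counts, n_labels):
    per_label = prod(value_counts) - 1   # each table must sum to 1
    return n_labels * per_label + (n_labels - 1)

# Tennis data: features take 3, 3, 3, 2 values; labels are +/-.
print(n_parameters([3, 3, 3, 2], 2))    # -> 2 * 53 + 1 = 107
```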
In general

Prior P(Y)
• If there are k labels, then k - 1 parameters. (Why not k? Because the k probabilities must sum to 1.)

Likelihood P(X | Y)
• If there are d Boolean features, we need a value for each possible assignment P(x_1, x_2, …, x_d | y) for each label y.
• That is k(2^d - 1) parameters.

We need a lot of data to estimate this many numbers!
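To see how fast k(2^d - 1) grows, here is a small illustration with our own choices of d (the specific values are not from the slides):

```python
# Our own illustration: the parameter count k * (2**d - 1) for the
# full joint likelihood is exponential in the number of features d.
k = 2  # two labels, as in the tennis example
for d in (4, 10, 20, 30):
    print(d, k * (2**d - 1))
# d = 4  ->         30
# d = 10 ->       2046
# d = 20 ->    2097150
# d = 30 -> 2147483646
```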