Generative and Discriminative Learning

Machine Learning

What we saw most of the semester

• A fixed, unknown distribution D over X × Y
  – X: instance space; Y: label space (e.g., {+1, -1})
• Given a dataset S = {(x_i, y_i)}
• Learning
  – Identify a hypothesis space H and define a loss function L(h, x, y)
  – Minimize the average loss over the training data, plus regularization (written out as an objective after this slide)
• The guarantee
  – If we find an algorithm that minimizes the loss on the observed data,
  – then learning theory guarantees good future behavior (as a function of H)

Side note: Is this different from assuming a distribution over X and a fixed oracle function f?
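As a small aside (not on the slides), the "minimize the average loss plus regularization" step can be written as the usual regularized empirical-risk objective; the regularizer R and its weight λ here stand in for whatever penalty is chosen:

\hat{h} = \arg\min_{h \in H} \; \frac{1}{|S|} \sum_{(x_i, y_i) \in S} L(h, x_i, y_i) \; + \; \lambda \, R(h)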
Discriminative models

Goal: learn directly how to make predictions
• Look at many (positive and negative) examples
• Discover regularities in the data
• Use these to construct a prediction policy
• Assumptions come in the form of the hypothesis class

Bottom line: approximating h: X → Y amounts to estimating the conditional probability P(Y | X).

Generative models

• Explicitly model how instances in each category are generated, by modeling the joint probability of X and Y, that is, P(X, Y)
• That is, learn P(X | Y) and P(Y)
• The naïve Bayes classifier does this – naïve Bayes is a generative model
• Predict P(Y | X) using the Bayes rule

Example: Generative story of naïve Bayes

• First sample a label Y from P(Y)
• Given the label, sample each feature X_1, X_2, X_3, ..., X_d independently from its conditional distribution P(X_1 | Y), P(X_2 | Y), ..., P(X_d | Y)
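To make the generative story concrete, here is a minimal Python sketch (not from the slides) of a naïve Bayes model over d = 3 binary features: it samples (x, y) exactly as described above, then predicts with the Bayes rule, P(y | x) ∝ P(y) ∏_j P(x_j | y). The probability tables are made-up numbers for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up parameters for illustration: a binary label and d = 3 binary features.
p_y = {+1: 0.6, -1: 0.4}        # P(Y)
p_x_given_y = {                  # P(X_j = 1 | Y), one entry per feature
    +1: np.array([0.9, 0.7, 0.2]),
    -1: np.array([0.3, 0.4, 0.8]),
}

def sample_example():
    """The generative story: sample Y first, then each X_j independently given Y."""
    y = +1 if rng.random() < p_y[+1] else -1
    x = (rng.random(3) < p_x_given_y[y]).astype(int)
    return x, y

def posterior(x):
    """The Bayes rule: P(y | x) is proportional to P(y) * prod_j P(x_j | y)."""
    scores = {y: p_y[y] * np.prod(np.where(x == 1, probs, 1.0 - probs))
              for y, probs in p_x_given_y.items()}
    z = sum(scores.values())
    return {y: s / z for y, s in scores.items()}

x, y = sample_example()
print("sampled:", x, y, "posterior:", posterior(x))
```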
Generative vs Discriminative models

• Generative models – learn P(x, y)
  – Use the capacity of the model to characterize how the data is generated (both inputs and outputs)
  – E.g., naïve Bayes, hidden Markov models
• Discriminative models – learn P(y | x)
  – Use the model capacity to characterize the decision boundary only
  – E.g., logistic regression, conditional models (which go by several names), most neural models

A generative model tries to characterize the distribution of the inputs; a discriminative model doesn't care.
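As a hedged illustration of this contrast (not part of the slides), the sketch below fits a generative model (Gaussian naïve Bayes) and a discriminative model (logistic regression) to the same toy data, assuming NumPy and scikit-learn are available; the data and numbers are invented for the example.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB            # generative: models P(x | y) and P(y)
from sklearn.linear_model import LogisticRegression   # discriminative: models P(y | x) directly

rng = np.random.default_rng(0)

# Toy data: two Gaussian blobs labeled +1 and -1.
X_pos = rng.normal(loc=+1.0, scale=1.0, size=(100, 2))
X_neg = rng.normal(loc=-1.0, scale=1.0, size=(100, 2))
X = np.vstack([X_pos, X_neg])
y = np.array([+1] * 100 + [-1] * 100)

gen = GaussianNB().fit(X, y)           # estimates P(y) and per-class feature distributions P(x | y)
disc = LogisticRegression().fit(X, y)  # estimates only the decision boundary, i.e. P(y | x)

x_new = np.array([[0.5, -0.2]])
print(gen.predict_proba(x_new))    # posterior P(y | x) obtained via the Bayes rule
print(disc.predict_proba(x_new))   # P(y | x) modeled directly
```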
