Machine Learning CSE 6363 (Fall 2016)
Lecture 4 Bayesian Learning
Heng Huang, Ph.D. Department of Computer Science and Engineering
Probability Distribution
• Let’s start from a question
• A billionaire from the suburbs of Seattle asks you a question:
  – He says: I have a thumbtack. If I flip it, what’s the probability it will fall with the nail up?
  – You say: Please flip it a few times.
  – You say: The probability is:
  – He says: Why???
  – You say: Because…
Fall 2016 Heng Huang Machine Learning 2 Ref: Carlos Guestrin
Thumbtack – Binomial Distribution
• P(Heads) = q, P(Tails) = 1 - q
• Flips are:
  – Independent events
  – Identically distributed: each flip is a Bernoulli trial with parameter q, so the number of heads in a fixed number of flips follows a Binomial distribution
• Sequence D of $a_H$ Heads and $a_T$ Tails, with likelihood $P(D \mid q) = q^{a_H}(1-q)^{a_T}$
Maximum Likelihood Estimation
• Data: Observed set D of $a_H$ Heads and $a_T$ Tails
• Hypothesis: Binomial distribution
• Learning q is an optimization problem
  – What’s the objective function?
• MLE: Choose q that maximizes the probability of observed data:
  $\hat{q} = \arg\max_q P(D \mid q) = \arg\max_q \, q^{a_H}(1-q)^{a_T}$
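The optimization view can be checked numerically. The sketch below is an illustration only: the function names `log_likelihood` and `grid_mle` and the grid resolution are assumptions made here, not part of the lecture. It searches a grid of candidate q values for the one with the highest log-likelihood, using the 3 heads and 2 tails that appear later in the lecture.

```python
import math

def log_likelihood(q, heads, tails):
    """Log-probability of one specific flip sequence with the given counts."""
    return heads * math.log(q) + tails * math.log(1.0 - q)

def grid_mle(heads, tails, steps=1000):
    """Return the grid point in (0, 1) with the highest log-likelihood."""
    # The endpoints 0 and 1 are excluded so that log() is always defined.
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda q: log_likelihood(q, heads, tails))

q_hat = grid_mle(heads=3, tails=2)
print(q_hat)  # lands on the grid point closest to 3/5
```

A grid search is used only because it makes the "maximize the objective" idea concrete; for this model the maximizer has a closed form.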
Your First Learning Algorithm
• Set derivative to zero:
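The derivative step can be written out as follows. This is the standard derivation, sketched here under the assumption that we maximize the log-likelihood, which has the same maximizer as the likelihood $q^{a_H}(1-q)^{a_T}$.

```latex
\begin{aligned}
\frac{d}{dq}\,\ln P(D \mid q)
  &= \frac{d}{dq}\left[\, a_H \ln q + a_T \ln(1-q) \,\right] \\
  &= \frac{a_H}{q} - \frac{a_T}{1-q} = 0 \\
\Rightarrow\quad \hat{q}_{\mathrm{MLE}} &= \frac{a_H}{a_H + a_T}
\end{aligned}
```

With 3 heads and 2 tails this gives 3/5.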
What about Prior
• Billionaire says: I flipped 3 heads and 2 tails.
• You say: q = 3/5, I can prove it!
• Billionaire says: Wait, I know that the thumbtack is “close” to 50-50. What can you do?
• You say: I can learn it the Bayesian way…
• Rather than estimating a single q, we obtain a distribution over possible values of q
Bayesian Learning
• Use Bayes rule:
  $P(q \mid D) = \dfrac{P(D \mid q)\,P(q)}{P(D)}$
• Or equivalently:
  $P(q \mid D) \propto P(D \mid q)\,P(q)$
Bayesian Learning for Thumbtack
• Likelihood function is simply Binomial: $P(D \mid q) = q^{a_H}(1-q)^{a_T}$
• What about prior?
  – Represent expert knowledge
  – Simple posterior form
• Conjugate priors:
  – Closed-form representation of posterior
  – For Binomial, conjugate prior is Beta distribution
Beta Prior Distribution – P(q)
• Likelihood function:
• Posterior:
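The three quantities on this slide can be sketched as follows. The hyperparameter names $b_H$ and $b_T$ are an assumption introduced here for illustration, since the source text does not name them. The Beta density and its conjugate update are the standard ones, with $B(\cdot,\cdot)$ denoting the Beta function.

```latex
\begin{aligned}
\text{Prior:}\quad
  P(q) &= \mathrm{Beta}(q;\, b_H, b_T)
        = \frac{q^{\,b_H-1}(1-q)^{\,b_T-1}}{B(b_H, b_T)} \\
\text{Likelihood:}\quad
  P(D \mid q) &= q^{a_H}(1-q)^{a_T} \\
\text{Posterior:}\quad
  P(q \mid D) &\propto q^{\,a_H+b_H-1}(1-q)^{\,a_T+b_T-1}
  \;\;\Longrightarrow\;\;
  P(q \mid D) = \mathrm{Beta}(q;\, a_H+b_H,\; a_T+b_T)
\end{aligned}
```

The posterior has the same functional form as the prior, which is what makes the Beta prior conjugate to this likelihood.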
Posterior Distribution
• Prior:
• Data: $a_H$ Heads and $a_T$ Tails
• Posterior distribution:
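For a Beta prior the update amounts to adding the observed counts to the prior's parameters. The sketch below assumes a hypothetical prior Beta(10, 10), chosen only to mimic a belief that the thumbtack is "close" to 50-50. The function names and the prior values are illustrative, not from the lecture.

```python
def beta_posterior(prior_heads, prior_tails, heads, tails):
    """Conjugate update: Beta(b_H, b_T) prior plus counts -> Beta(b_H + a_H, b_T + a_T)."""
    return prior_heads + heads, prior_tails + tails

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Hypothetical prior Beta(10, 10), updated with 3 heads and 2 tails.
post = beta_posterior(10, 10, heads=3, tails=2)
print(post, beta_mean(*post))
```

With this prior the posterior mean stays close to 0.5, whereas the MLE from the same data is 3/5.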
Conjugate Priors
Ref: Kevin Murphy
The Beta Distribution
Bayesian Updating in Pictures
Posterior Predictive Distribution
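For the thumbtack model with a Beta prior, the posterior predictive probability of heads on the next flip is the posterior mean of q, obtained by integrating q out against the posterior. The sketch below assumes that standard result. The function name and the choice of a uniform Beta(1, 1) prior are illustrative.

```python
def predictive_heads(prior_heads, prior_tails, heads, tails):
    """P(next flip = heads | D) under a Beta(prior_heads, prior_tails) prior.

    Integrating q against the Beta posterior yields the posterior mean.
    """
    return (prior_heads + heads) / (prior_heads + prior_tails + heads + tails)

# Uniform prior Beta(1, 1) with 3 heads and 2 tails (Laplace's rule of succession).
p_next = predictive_heads(1, 1, heads=3, tails=2)
print(p_next)
```

Unlike the MLE, this prediction never reaches exactly 0 or 1 after a finite number of flips, because the prior counts are always added in.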
Effect of Prior Strength
Parameter Posterior – Small Sample, Uniform Prior
Parameter Posterior – Small Sample, Strong Prior
Maximum A Posteriori (MAP) Estimation
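MAP estimation picks the single value of q where the posterior is highest. For a Beta posterior with both parameters greater than 1, that mode has a closed form. The sketch below assumes this standard result; the function name and the two priors are illustrative choices.

```python
def map_estimate(prior_heads, prior_tails, heads, tails):
    """Mode of the Beta posterior; valid when both posterior parameters exceed 1."""
    alpha = prior_heads + heads
    beta = prior_tails + tails
    if alpha <= 1 or beta <= 1:
        raise ValueError("posterior mode is not in the interior of (0, 1)")
    return (alpha - 1) / (alpha + beta - 2)

# Uniform prior Beta(1, 1): the MAP estimate coincides with the MLE.
q_map_uniform = map_estimate(1, 1, heads=3, tails=2)
# Hypothetical strong prior Beta(10, 10): the estimate is pulled toward 0.5.
q_map_strong = map_estimate(10, 10, heads=3, tails=2)
print(q_map_uniform, q_map_strong)
```

Comparing the two calls shows how a stronger prior dominates a small sample.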
Integrate Out or Optimize
From Coin to Dice
MLE for Multinomial
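Going from a coin to a die, the MLE generalizes in the expected way: each face's probability is estimated by its observed frequency. The sketch below assumes a six-sided die and uses made-up rolls; the function name and data are illustrative only.

```python
from collections import Counter

def multinomial_mle(rolls, num_faces=6):
    """MLE of each face probability: count of that face divided by the total."""
    counts = Counter(rolls)
    total = len(rolls)
    return [counts.get(face, 0) / total for face in range(1, num_faces + 1)]

rolls = [1, 3, 3, 6, 2, 3, 6, 1, 5, 3]  # made-up data for illustration
theta_hat = multinomial_mle(rolls)
print(theta_hat)
```

Face 4 never appears in this sample, so its MLE is exactly zero, which is the kind of overconfident estimate a prior is meant to soften.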
Dirichlet Priors
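The Dirichlet distribution plays the same role for the multinomial that the Beta plays for the coin: the posterior is again a Dirichlet whose parameters are the prior parameters plus the observed counts. The sketch below assumes that standard conjugate update. The function names, the uniform prior, and the counts are illustrative.

```python
def dirichlet_posterior(prior, counts):
    """Conjugate update: Dirichlet(prior) plus counts -> Dirichlet(prior + counts)."""
    if len(prior) != len(counts):
        raise ValueError("prior and counts must have the same length")
    return [a + n for a, n in zip(prior, counts)]

def dirichlet_mean(alpha):
    """Mean of a Dirichlet distribution: each parameter divided by their sum."""
    total = sum(alpha)
    return [a / total for a in alpha]

prior = [1, 1, 1, 1, 1, 1]    # hypothetical uniform prior over six faces
counts = [2, 1, 4, 0, 1, 2]   # made-up roll counts for illustration
posterior = dirichlet_posterior(prior, counts)
print(posterior, dirichlet_mean(posterior))
```

Under the posterior mean, the face that was never observed still receives nonzero probability.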
Gaussian Density in 1-D
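The 1-D Gaussian density with mean μ and variance σ² is exp(-(x - μ)² / (2σ²)) / sqrt(2πσ²). A minimal sketch of evaluating it, with an illustrative function name and a standard normal as the example:

```python
import math

def gaussian_pdf(x, mean, variance):
    """Density of a 1-D Gaussian N(mean, variance) evaluated at x."""
    if variance <= 0:
        raise ValueError("variance must be positive")
    coeff = 1.0 / math.sqrt(2.0 * math.pi * variance)
    return coeff * math.exp(-((x - mean) ** 2) / (2.0 * variance))

# Peak height of the standard normal, which equals 1 / sqrt(2 * pi).
peak = gaussian_pdf(0.0, mean=0.0, variance=1.0)
print(peak)
```

The function takes the variance rather than the standard deviation; that is a choice made for this sketch.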
Multivariate Gaussian
Conditional Gaussian
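Conditioning a jointly Gaussian vector on some of its components gives another Gaussian. The sketch below covers only the bivariate case, where the standard formulas reduce to scalars: the conditional mean shifts by (s12 / s22)(x2 - mu2) and the conditional variance shrinks to s11 - s12² / s22. The function and parameter names are assumptions made for this illustration.

```python
def conditional_gaussian_2d(mu1, mu2, s11, s12, s22, x2):
    """Mean and variance of x1 given x2 for a bivariate Gaussian.

    mu1, mu2 are the means; s11, s22 the variances; s12 the covariance.
    """
    if s22 <= 0:
        raise ValueError("s22 must be positive")
    cond_mean = mu1 + (s12 / s22) * (x2 - mu2)
    cond_var = s11 - (s12 ** 2) / s22
    return cond_mean, cond_var

mean, var = conditional_gaussian_2d(
    mu1=0.0, mu2=0.0, s11=1.0, s12=0.5, s22=1.0, x2=2.0
)
print(mean, var)
```

The conditional variance does not depend on the observed value x2, only on the covariance structure.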