Machine Learning CSE 6363 (Fall 2016)

Lecture 4 Bayesian Learning

Heng Huang, Ph.. Department of Computer Science and Engineering

Probability Distribution

• Let’ start from a question • A billionaire from the suburbs of Seattle asks you a question: – He says: I have thumbtack, if I flip it, what’s the probability it will fall with the nail up? – You say: Please flip it a few times

– You say: The probability is: – He says: Why??? – You say: Because…

Fall 2016 Heng Huang Machine Learning 2 Ref: Carlos Guestrin Thumbtack – Binomial Distribution

• P(Heads) = , P(Tails) = 1 - q

• Flips are: – Independent events – Identically distributed according to Binomial distribution

• Sequence D of aH Heads and aT Tails

Fall 2016 Heng Huang Machine Learning 3 Ref: Carlos Guestrin Maximum Likelihood Estimation

• Data: Observed set D of aH Heads and aT Tails • Hypothesis: Binomial distribution • Learning q is an optimization problem – What’s the objective function? • MLE: Choose q that maximizes the probability of observed data:

Fall 2016 Heng Huang Machine Learning 4 Ref: Carlos Guestrin Your First Learning Algorithm

• Set derivative to zero:

Fall 2016 Heng Huang Machine Learning 5 Ref: Carlos Guestrin What about Prior

• Billionaire says: I flipped 3 heads and 2 tails. • You say: q = 3/5, I can prove it! • Billionaire says: Wait, I know that the thumbtack is “close” to 50-50. What can you? • You say: I can learn it the Bayesian way… • Rather than estimating a single q, we obtain a distribution over possible values of q

Fall 2016 Heng Huang Machine Learning 6 Ref: Carlos Guestrin Bayesian Learning

• Use Bayes rule:

• Or equivalently:

Fall 2016 Heng Huang Machine Learning 7 Ref: Carlos Guestrin Bayesian Learning for Thumbtack

• Likelihood function is simply Binomial:

• What about prior? – Represent expert knowledge – Simple posterior form • Conjugate priors: – Closed-form representation of posterior – For Binomial, conjugate prior is Beta distribution

Fall 2016 Heng Huang Machine Learning 8 Ref: Carlos Guestrin Beta Prior Distribution – P(q)

• Likelihood function: • Posterior:

Fall 2016 Heng Huang Machine Learning 9 Ref: Carlos Guestrin Posterior Distribution

• Prior:

• Data: aH Heads and aT Tails • Posterior distribution:

Fall 2016 Heng Huang Machine Learning 10 Ref: Carlos Guestrin Conjugate Priors

Fall 2016 Heng Huang Machine Learning 11 Ref: Kevin Murphy The Beta Distribution

Fall 2016 Heng Huang Machine Learning 12 Ref: Kevin Murphy The Beta Distribution

Fall 2016 Heng Huang Machine Learning 13 Ref: Kevin Murphy Bayesian Updating in Pictures

Fall 2016 Heng Huang Machine Learning 14 Ref: Kevin Murphy Posterior Predictive Distribution

Fall 2016 Heng Huang Machine Learning 15 Ref: Kevin Murphy Effect of Prior Strength

Fall 2016 Heng Huang Machine Learning 16 Ref: Kevin Murphy Effect of Prior Strength

Fall 2016 Heng Huang Machine Learning 17 Ref: Kevin Murphy Parameter Posterior – Small Sample, Uniform Prior

Fall 2016 Heng Huang Machine Learning 18 Ref: Kevin Murphy Parameter Posterior – Small Sample, Strong Prior

Fall 2016 Heng Huang Machine Learning 19 Ref: Kevin Murphy Maximum A Posteriori (MAP) Estimation

Fall 2016 Heng Huang Machine Learning 20 Ref: Kevin Murphy Integrate Out or Optimize

Fall 2016 Heng Huang Machine Learning 21 Ref: Kevin Murphy From Coin to Dice

Fall 2016 Heng Huang Machine Learning 22 Ref: Kevin Murphy MLE for Multinomial

Fall 2016 Heng Huang Machine Learning 23 Ref: Kevin Murphy Dirichlet Priors

Fall 2016 Heng Huang Machine Learning 24 Ref: Kevin Murphy Gaussian Density in 1-D

Fall 2016 Heng Huang Machine Learning 25 Ref: Kevin Murphy Multivariate Gaussian

Fall 2016 Heng Huang Machine Learning 26 Multivariate Gaussian

Fall 2016 Heng Huang Machine Learning 27 Ref: Kevin Murphy Conditional Gaussian

Fall 2016 Heng Huang Machine Learning 28 Ref: Kevin Murphy