Hierarchical Models & Bayesian Model Selection

Geoffrey Roeder
Departments of Computer Science and Statistics, University of British Columbia
Jan. 20, 2016

Contact information: please report any typos or errors to geoff[email protected]

Outline

1 Hierarchical Bayesian Modelling
  • Coin toss redux: point estimates for θ
  • Hierarchical models
  • Application to clinical study
2 Bayesian Model Selection
  • Introduction
  • Bayes Factors
  • Shortcut for Marginal Likelihood in Conjugate Case

Coin toss: point estimates for θ

Probability model

Consider the experiment of tossing a coin n times. Each toss results in heads with probability θ and tails with probability 1 − θ.

Let Y be a random variable denoting the number of observed heads in n coin tosses. Then we can model Y ∼ Bin(n, θ), with probability mass function

$$p(Y = y \mid \theta) = \binom{n}{y}\,\theta^{y}(1-\theta)^{n-y} \qquad (1)$$

We want to estimate the parameter θ.

Maximum Likelihood

By interpreting p(Y = y | θ) as a function of θ rather than y, we get the likelihood function for θ. Let ℓ(θ | y) := log p(y | θ), the log-likelihood. Then

$$\hat{\theta}_{\mathrm{ML}} = \operatorname*{argmax}_{\theta}\,\ell(\theta \mid y) = \operatorname*{argmax}_{\theta}\; y\log\theta + (n-y)\log(1-\theta) \qquad (2)$$

Since the log-likelihood is a concave function of θ, the maximizer is the unique stationary point:

$$0 = \frac{\partial \ell(\theta \mid y)}{\partial \theta}\bigg|_{\hat{\theta}_{\mathrm{ML}}} = \frac{y}{\hat{\theta}_{\mathrm{ML}}} - \frac{n-y}{1-\hat{\theta}_{\mathrm{ML}}} \quad\Longleftrightarrow\quad \hat{\theta}_{\mathrm{ML}} = \frac{y}{n} \qquad (3)$$
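To make the closed form concrete, here is a minimal sketch (not from the slides; the counts n and y are made-up illustration values) that checks a grid-search maximizer of the log-likelihood in (2) against the closed form y/n from (3):

```python
# A minimal sketch, not from the slides: verify numerically that the
# binomial log-likelihood (2) peaks at theta = y / n, as derived in (3).
# The counts below (7 heads in 10 tosses) are made-up illustration values.
import numpy as np

n, y = 10, 7

def log_likelihood(theta, y, n):
    """Binomial log-likelihood, dropping the constant log-binomial term."""
    return y * np.log(theta) + (n - y) * np.log(1 - theta)

# Dense grid over (0, 1), avoiding the endpoints where the log diverges.
thetas = np.linspace(1e-6, 1 - 1e-6, 100_000)
theta_grid = thetas[np.argmax(log_likelihood(thetas, y, n))]

print(f"closed form y/n:    {y / n:.4f}")       # 0.7000
print(f"grid-search argmax: {theta_grid:.4f}")  # ~0.7000
```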
What if the sample size is small? The guarantee that $\hat{\theta}_{\mathrm{ML}}$ approaches the true parameter is an asymptotic result; for small n the estimate can be far off (e.g., n = 2, y = 2 gives $\hat{\theta}_{\mathrm{ML}} = 1$).

Maximum A Posteriori

Alternative analysis: reverse the conditioning with Bayes' Theorem:

$$p(\theta \mid y) = \frac{p(y \mid \theta)\,p(\theta)}{p(y)} \qquad (4)$$

This lets us encode our prior beliefs or knowledge about θ in a prior distribution for the parameter, p(θ).

Recall that if p(y | θ) is in the exponential family, there exists a conjugate prior p(θ) such that if p(θ) ∈ F, then p(y | θ) p(θ) ∈ F, so the posterior stays in the same family.

We saw last time that the binomial is in the exponential family, and θ ∼ Beta(α, β) is a conjugate prior:

$$p(\theta \mid \alpha, \beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,\theta^{\alpha-1}(1-\theta)^{\beta-1} \qquad (5)$$
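A minimal numerical check of this conjugacy claim (not from the slides; the data counts and hyperparameters are made-up values): multiplying the binomial likelihood (1) by the Beta prior (5) and renormalizing should reproduce a Beta density exactly, up to grid error.

```python
# Numerical conjugacy check, not from the slides: the binomial likelihood (1)
# times the Beta prior (5), renormalized, should coincide with a Beta density.
# Data counts and hyperparameters are made-up illustration values.
import numpy as np
from scipy import stats

n, y = 10, 7            # hypothetical data: 7 heads in 10 tosses
alpha, beta = 2.0, 2.0  # hypothetical prior hyperparameters

thetas = np.linspace(1e-4, 1 - 1e-4, 10_000)
dx = thetas[1] - thetas[0]

# Unnormalized posterior: likelihood times prior, i.e. (4) with p(y) dropped.
unnorm = stats.binom.pmf(y, n, thetas) * stats.beta.pdf(thetas, alpha, beta)
posterior = unnorm / (unnorm.sum() * dx)  # renormalize on the grid

# Conjugate closed form: Beta(alpha + y, beta + n - y).
closed_form = stats.beta.pdf(thetas, alpha + y, beta + n - y)

print(np.max(np.abs(posterior - closed_form)))  # small: grid error only
```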
Moreover, for any given realization y of Y, the marginal distribution $p(y) = \int p(y \mid \theta')\,p(\theta')\,d\theta'$ is a constant. Thus $p(\theta \mid y) \propto p(y \mid \theta)\,p(\theta)$, so that

$$\hat{\theta}_{\mathrm{MAP}} = \operatorname*{argmax}_{\theta}\, p(\theta \mid y) = \operatorname*{argmax}_{\theta}\, p(y \mid \theta)\,p(\theta) = \operatorname*{argmax}_{\theta}\, \log p(y \mid \theta)\,p(\theta)$$

Up to constants, this objective is $(y+\alpha-1)\log\theta + (n-y+\beta-1)\log(1-\theta)$; by evaluating its first partial derivative w.r.t. θ and setting it to 0 at $\hat{\theta}_{\mathrm{MAP}}$, we can derive

$$\hat{\theta}_{\mathrm{MAP}} = \frac{y+\alpha-1}{n+\alpha-1+\beta-1} \qquad (6)$$

Choosing α, β

The point estimate $\hat{\theta}_{\mathrm{MAP}}$ shows that choices of α and β correspond to having already seen prior data: α − 1 pseudo-heads and β − 1 pseudo-tails. We can encode the strength of our prior belief using these parameters.

We can also choose an uninformative prior: Jeffreys' prior, which for the beta-binomial model corresponds to $(\alpha, \beta) = (\tfrac{1}{2}, \tfrac{1}{2})$.

Deriving an analytic form for the posterior is also possible when the prior is conjugate. We saw last week that for a single binomial experiment with a conjugate Beta prior, $\theta \mid y \sim \text{Beta}(\alpha + y,\; \beta + n - y)$: multiplying (1) by (5) gives the kernel $\theta^{y+\alpha-1}(1-\theta)^{n-y+\beta-1}$ of exactly this Beta.
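A minimal sketch (not from the slides; the sample is a made-up worst case for ML) comparing the ML estimate y/n with the MAP estimate (6) under a few Beta priors:

```python
# Compare ML and MAP point estimates on a made-up small sample where every
# toss came up heads, so the two estimators differ starkly.

def theta_ml(y, n):
    return y / n

def theta_map(y, n, alpha, beta):
    """MAP estimate (6); assumes alpha, beta >= 1 so the posterior mode
    lies in [0, 1] and this closed form is the maximizer."""
    return (y + alpha - 1) / (n + alpha + beta - 2)

n, y = 5, 5  # hypothetical: 5 heads in 5 tosses

print(f"ML:                   {theta_ml(y, n):.3f}")          # 1.000
print(f"MAP with Beta(1, 1):  {theta_map(y, n, 1, 1):.3f}")   # 1.000 (flat prior)
print(f"MAP with Beta(2, 2):  {theta_map(y, n, 2, 2):.3f}")   # 0.857
print(f"MAP with Beta(10,10): {theta_map(y, n, 10, 10):.3f}") # 0.609
```

With a flat Beta(1, 1) prior the MAP estimate reduces to the ML estimate, while stronger symmetric priors pull the estimate toward 1/2, which is exactly the pseudo-count reading of (6).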
