Conjugate Priors

Conjugate Priors

Lecture 7: Conjugate Priors Marina Meil˘a [email protected] Department of Statistics University of Washington February 1, 2018 Exponential families have conjugate priors Examples:Bernoulli model with Beta prior Examples:Multivariate normal with Normal-Inverse Wishart prior Example: Poisson distribution Reading B&S: 5.2, Hoff: 3.3,7.1{3 The posterior pθjx1:n in an exponential family I Exponential family model in canonical form θT x− (θ) pX (x; θ) = c(x)e (1) P Likelihood of sample x1:n with meanx ¯ = ( i xi )=n xT θ− (θ) pxjθ = c(x)e (2) Prior for parameter θ pθ(θ; parameters) the parameters will be specified shortly (3) By Bayes' rule T n(¯x θ− (θ))+ln pθ (θ;parameters) pθjx1:n / C(x1:n)e (4) Q with C(x1:n) = i c(xi ). I Let's look at the exponent T n(x ¯ θ − (θ) ) + ln pθ(θ; parameters) (5) |{z} | {z } bilinear ln Z(θ) I First two terms look like an exponential family in θ. What would it take to make the posterior be an exponential family? T Answer: ln pθ(θ; parameters) = ν0(µ0 θ − (θ))+constant(ν0;µ 0). The conjugate prior I A prior 1 T ν0(µ0 θ− (θ)) pθ(θ; ν0;µ 0) / e (6) Z(ν0;µ 0) is called conjugate prior for the exponential family defined by (1) I The normalization constant is Z φ(ν ;µ ) ν (µ T θ− (θ)) Z(ν0;µ 0) = e 0 0 = e 0 0 dθ (7) Θ I The posterior is now T T nx¯ +ν0µ0 T T (n+ν0) θ− (θ) −φ(ν0;µ0)) n(¯x θ− (θ))+ν0(µ0 θ− (θ)) n+ν0 pθjx1:n / e = e (8) T T nx¯ +ν0µ0 with hyper-parameters ν = n + ν0, µ = Exercise Why did the factor n+ν0 C(x1:n) disappear? I Hence, the ν parameter behaves like an equivalent sample size and the µ parameter like a mean value parameter and νµ like a equivalent sufficient statistic I When n ν0 the influence of the prior becomes neglijible, while for n ν0 , the prior sets the model mean of near µ0 Bernoulli model with Beta prior I See Lecture 6. Multivariate Normal I The multivariate normal distribution in p dimensions is 1 − 1 (x−µ)T Σ−1(x−µ) Normal(x; µ, Σ) = p e 2 (9) (2π)p=2 det Σ p p×p with x; µ 2 R and Σ 2 R positive definite. − 1 X T AX +bT X −1 −1 Remark When pX / e 2 , X ∼ Normal(A b; A ) I Useful to separate prior pµ;Σ = pµjΣpΣ The conjugate prior on µ pµ(µ; µ0; Λ0) = Normal(µ; µ0; Λ0) (10) 1 Pn T I Data sufficient statistics n; x¯; S, S = n i=1(xi − µ)(xi − µ) the sample covariance matrix −1 −1 −1 I The posterior covarianceΛ =Λ 0 + nΣ −1 I Cov is called a precision matrix. −1 −1 I The posterior mean µ =Λ (Λ0 µ0 + nx¯) I Data affects posterior of µ only viax ¯; n Defining prior overΣ I Idea \prior parameter is a sufficient statistic". Hence, conjugate distribution should be the distribution of statistics from Normal(0; S0) p I Assume z1:ν0 ∼ i.i.d.N(0; S0), zi 2 R . Pν0 T I Then S = z z is a covariance matrix (= ν × sample covar of z ), and it is ν0 i0=1 i i 0 1:ν0 non-singular w.p. 1 for ν0 ≥ p. The distribution of S is the Wishart distribution I 0ν0 −1 I We set the conjugate prior forΣ to be this distribution I . and we sayΣ is distributed as the Inverse Wishart Wishart and Inverse Wishart + I The Wishart distribution with ν0 degrees of freedom, over Sp the group of positive definite p × p matrices 1 1 −1 (ν0−p−1)=2 − trace S0 K pK (K; ν0; S0) = det K e 2 (11) ν0p=2 ν0 2 det S0Γp 2 ν0 p(p−1)=4 Qp ν0 j−1 with Γp 2 = π j=1 Γ 2 − 2 −1 I E[Σ ] = ν0S0 1 −1 I E[Σ] = S0 ν0−p−1 −1 I Posterior parameters ν0 + n, S0 + nS(µ) I again, posterior parameters and sufficient statistics combine linearly 1 −1 I Posterior expectation ofΣ= [S0 + nS(µ)] ν0+n−p−1 Univariate Normal and its conjugate prior 1 1 2 2 − 2 (x−µ) pX (x; µ;σ ) = p e 2σ (12) 2πσ2 I Prior p 2 = Normal(µ; µ ;λ ) µjσ ;µ0;λ0 0 0 I Posterior p 2 = Normal(µ; µ;λ ) µjx1:n;σ ;µ0;λ0 1 1 I Posterior mean E[µ] = µ0 + nx¯ λ λ0 I 1=λ0 is equivalent sample size 1 1 1 I Posterior variance = + n 2 λ0 λ0 σ I Precision increases with observing data The Poisson distribution and the Gamma prior 1 −λ x 1 θx−eθ I Poisson distribution PX (x) = x! e λ = Γ(x−1) e with θ = ln λ. I The conjugate prior is then (θµ−eθ )ν θ(νµ) −νeθ pλjµ / e = e e (13) I Changing the variable back to λ we have. dθ = dλ/λ and 1 p = λνµe−νλ / gamma(λ; µν; µ) (14) λjν,µ λ α I Recall that the mean of gamma(α, β) is β ; hence, E[λ] = µ..

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    11 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us