Bayesian Inference

Ricardo Ehlers [email protected]

Departamento de Matemática Aplicada e Estatística, Universidade de São Paulo

Introduction to Estimation

A decision problem is completely specified by the description of the following spaces:

• Parameter space, or states of nature, Θ.
• Space of possible results of an experiment, Ω.
• Space of possible actions, A.

A decision rule δ is a function defined on Ω taking values in A, i.e. δ : Ω → A.

For each decision rule δ and each possible value of the parameter θ we can associate a loss L(δ, θ), assumed to take nonnegative values. This measures the penalty incurred by decision δ when the parameter takes the value θ.

Definition. The risk of a decision rule δ, denoted R(δ), is the posterior expected loss, i.e.

R(δ) = E_{θ|x}[L(δ, θ)] = ∫_Θ L(δ, θ) p(θ|x) dθ.

Definition. A decision rule δ* is optimal if its risk is minimal, i.e.

R(δ*) ≤ R(δ), ∀δ.

• This rule is called the Bayes rule and its risk is the Bayes risk.
• Then δ* = arg min_δ R(δ) is the Bayes rule.
• Bayes risks can be used to compare decision rules. A decision rule δ1 is preferred to a rule δ2 if

R(δ1) < R(δ2)

Loss Functions

Definition. The quadratic loss function is defined as,

L(δ, θ) = (δ − θ)²,

and the associated risk is,

R(δ) = E_{θ|x}[(δ − θ)²] = ∫_Θ (δ − θ)² p(θ|x) dθ.

Lemma. If L(δ, θ) = (δ − θ)², the Bayes estimator of θ is E(θ|x). The Bayes risk is,

E{[E(θ|x) − θ]²} = Var(θ|x).

Definition. The absolute loss function is defined as,

L(δ, θ) = |δ − θ|.

Lemma. If L(δ, θ) = |δ − θ|, the Bayes estimator of θ is the median of the posterior distribution.

Definition. The 0-1 loss function is defined as,

L_ε(δ, θ) = 1 − I(|θ − δ| < ε),

where I(·) is an indicator function.

Lemma. Let δ(ε) be the Bayes rule for this loss function and δ* the mode of the posterior distribution. Then,

lim_{ε→0} δ(ε) = δ*.

So, the Bayes estimator for a 0-1 loss function is the mode of the posterior distribution.

Consequently, for a 0-1 loss function and θ continuous, the Bayes estimate is,

θ* = arg max_θ p(θ|x) = arg max_θ p(x|θ)p(θ) = arg max_θ [log p(x|θ) + log p(θ)].

This is also referred to as the generalized maximum likelihood estimator (GMLE).
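These three estimators are easy to check numerically. A minimal sketch in Python, assuming an illustrative Beta(3, 9) posterior (not taken from the slides):

```python
import numpy as np
from scipy import stats

# Illustrative posterior (an assumption, not from the slides): theta | x ~ Beta(3, 9).
posterior = stats.beta(3, 9)

mean = posterior.mean()                       # quadratic loss -> posterior mean
median = posterior.ppf(0.5)                   # absolute loss  -> posterior median
grid = np.linspace(0.001, 0.999, 10_000)
mode = grid[np.argmax(posterior.pdf(grid))]   # 0-1 loss       -> posterior mode

print(f"mean={mean:.4f}  median={median:.4f}  mode={mode:.4f}")
```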

Example. Let X1, ..., Xn be a random sample from a Bernoulli distribution with parameter θ. For a Beta(α, β) prior it follows that the posterior distribution is,

θ|x ∼ Beta(α + t, β + n − t)

where t = Σ_{i=1}^n xi. Under quadratic loss the Bayes estimate is given by,

E(θ|x) = (α + Σ_{i=1}^n xi)/(α + β + n).

Example. In the previous example suppose that n = 100 and t = 10, so that the Bayes estimate under quadratic loss is,

E(θ|x) = (α + 10)/(α + β + 100).

For a Beta(1,1) prior the estimate is 0.108, while for the quite different Beta(1,2) prior the estimate is 0.107. Both estimates are close to the maximum likelihood estimate 0.1. Under 0-1 loss with a Beta(1,1) prior the Bayes estimate is exactly 0.1.
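These numbers can be verified directly; a minimal sketch using scipy, with the n, t and priors of the example:

```python
from scipy import stats

n, t = 100, 10                                # sample size and number of successes

for a, b in [(1, 1), (1, 2)]:
    post = stats.beta(a + t, b + n - t)       # posterior Beta(a + t, b + n - t)
    mode = (a + t - 1) / (a + b + n - 2)      # posterior mode (0-1 loss)
    print(f"Beta({a},{b}) prior: mean = {post.mean():.3f}, mode = {mode:.3f}")
```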

Example. Bayesian estimates of θ under quadratic loss with a Beta(a, b) prior, varying n and keeping x̄ = 0.1.

                       Prior parameters (a, b)
 n     (1,1)          (1,2)          (2,1)          (0.001,0.001)  (7,1.5)
 10    0.167 (0.067)  0.154 (0.063)  0.231 (0.061)  0.1 (0.082)    0.432 (0.039)
 20    0.136 (0.038)  0.13 (0.037)   0.174 (0.035)  0.1 (0.043)    0.316 (0.025)
 50    0.115 (0.016)  0.113 (0.016)  0.132 (0.016)  0.1 (0.018)    0.205 (0.013)
 100   0.108 (0.008)  0.107 (0.008)  0.117 (0.008)  0.1 (0.009)    0.157 (0.007)
 200   0.104 (0.004)  0.103 (0.004)  0.108 (0.004)  0.1 (0.004)    0.129 (0.004)

Example. Let X1, ..., Xn be a random sample from a Poisson distribution with parameter θ. Using the conjugate Gamma prior it follows that,

X1, ..., Xn ∼ Poisson(θ), θ ∼ Gamma(α, β), θ|x ∼ Gamma(α + t, β + n),

where t = Σ_{i=1}^n xi. The Bayes estimate under quadratic loss is,

E[θ|x] = (α + Σ_{i=1}^n xi)/(β + n) = (α + n x̄)/(β + n),

while the Bayes risk is,

Var(θ|x) = (α + n x̄)/(β + n)² = E[θ|x]/(β + n).

If α → 0 and β → 0 then E[θ|x] → x̄ and Var(θ|x) → x̄/n. If n → ∞ then E[θ|x] → x̄.

Bayes estimates of θ under quadratic loss using a Gamma(a, b) prior, varying n and keeping x̄ = 10 (Bayes risks in parentheses).

                       Prior parameters (a, b)
 n     (1,0.01)        (1,2)          (2,1)          (0.001,0.001)  (7,1.5)
 10    10.09 (1.008)   8.417 (0.701)  9.273 (0.843)  9.999 (1)      9.304 (0.809)
 20    10.045 (0.502)  9.136 (0.415)  9.619 (0.458)  10 (0.5)       9.628 (0.448)
 50    10.018 (0.2)    9.635 (0.185)  9.843 (0.193)  10 (0.2)       9.845 (0.191)
 100   10.009 (0.1)    9.814 (0.096)  9.921 (0.098)  10 (0.1)       9.921 (0.098)
 200   10.004 (0.05)   9.906 (0.049)  9.96 (0.05)    10 (0.05)      9.96 (0.049)
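Each cell is the posterior mean E[θ|x] = (a + n x̄)/(b + n) with the Bayes risk E[θ|x]/(b + n) in parentheses; a minimal sketch that reproduces the table:

```python
xbar = 10                                     # sample mean kept fixed
priors = [(1, 0.01), (1, 2), (2, 1), (0.001, 0.001), (7, 1.5)]

for n in [10, 20, 50, 100, 200]:
    cells = []
    for a, b in priors:
        mean = (a + n * xbar) / (b + n)       # E[theta | x]
        risk = mean / (b + n)                 # Var(theta | x) = Bayes risk
        cells.append(f"{mean:.3f} ({risk:.3f})")
    print(n, "  ".join(cells))
```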

Example. If X1, ..., Xn is a random sample from a N(θ, σ²) distribution with σ² known and we use the conjugate prior θ ∼ N(µ0, τ0²), then the posterior is also normal, in which case mean, median and mode coincide. The Bayes estimate of θ is then given by,

(τ0⁻² µ0 + n σ⁻² x̄)/(τ0⁻² + n σ⁻²).

Under a Jeffreys prior for θ the Bayes estimate is simply x̄.
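A minimal sketch of this precision-weighted estimate, with assumed values for σ, the prior and simulated data:

```python
import numpy as np

rng = np.random.default_rng(42)
sigma = 1.0                                   # known standard deviation (assumed)
mu0, tau0 = 0.0, 10.0                         # prior N(mu0, tau0^2), deliberately vague
x = rng.normal(2.0, sigma, size=50)           # simulated data with true mean 2
n, xbar = len(x), x.mean()

# Posterior mean: precision-weighted average of prior mean and sample mean.
post_mean = (mu0 / tau0**2 + n * xbar / sigma**2) / (1 / tau0**2 + n / sigma**2)
print(post_mean, xbar)                        # nearly identical for a vague prior
```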

Example. Let X1, ..., Xn be a random sample from a N(θ, σ²) distribution with θ known and φ = σ⁻² unknown.

φ ∼ Gamma(n0/2, n0σ0²/2),
φ|x ∼ Gamma((n0 + n)/2, (n0σ0² + ns0²)/2),

where ns0² = Σ_{i=1}^n (xi − θ)². Then,

E(φ|x) = (n0 + n)/(n0σ0² + ns0²)

is the Bayes estimate under quadratic loss.
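A minimal sketch with simulated data and assumed hyperparameters n0 and σ0²:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 0.0                                   # known mean
phi_true = 4.0                                # precision used to simulate (assumed)
x = rng.normal(theta, 1 / np.sqrt(phi_true), size=100)

n0, sigma0_sq = 1.0, 1.0                      # assumed prior hyperparameters
n = len(x)
sum_sq = np.sum((x - theta) ** 2)             # n*s0^2 in the slides' notation

post_mean_phi = (n0 + n) / (n0 * sigma0_sq + sum_sq)
print(post_mean_phi)                          # close to phi_true for large n
```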

• The quadratic loss can be extended to the multivariate case,

L(δ, θ) = (δ − θ)′(δ − θ),

and the Bayes estimate is E(θ|x).

• Likewise, the 0-1 loss can also be extended,

L(δ, θ) = lim_{vol(A)→0} I(|θ − δ| ∈ A),

and the Bayes estimate is the joint mode of the posterior distribution.

• However, the absolute loss has no clear extension.

Example. Suppose X = (X1, ..., Xp) has a multinomial distribution with parameters n and θ = (θ1, ..., θp). If we adopt a Dirichlet prior with parameters α1, ..., αp, the posterior distribution is Dirichlet with parameters xi + αi, i = 1, ..., p. Under quadratic loss, the Bayes estimate of θ is E(θ|x), where,

E(θi|x) = E(θi|xi) = (xi + αi)/(n + Σ_{j=1}^p αj).
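A minimal sketch with hypothetical counts x and a uniform Dirichlet prior:

```python
import numpy as np

x = np.array([12, 30, 58])                    # hypothetical multinomial counts
alpha = np.array([1.0, 1.0, 1.0])             # Dirichlet prior parameters
n = x.sum()

theta_hat = (x + alpha) / (n + alpha.sum())   # posterior mean, quadratic loss
print(theta_hat, theta_hat.sum())             # estimates sum to 1
```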

Definition. The asymmetric linear loss function is defined as,

L(δ, θ) = c1(δ − θ)I_{(−∞,δ)}(θ) + c2(θ − δ)I_{(δ,∞)}(θ),

where c1 > 0 and c2 > 0. It can be shown that the Bayes estimate of θ is a value θ* such that,

P(θ ≤ θ*) = c2/(c1 + c2).

So the Bayes estimate is the quantile of order c2/(c1 + c2) of the posterior distribution.
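A minimal sketch, assuming illustrative costs c1, c2 and the Gamma(4, 0.5) posterior used later in the credible-interval figures:

```python
from scipy import stats

c1, c2 = 1.0, 3.0                             # underestimation 3x as costly (assumed)
posterior = stats.gamma(a=4, scale=1 / 0.5)   # Gamma(4, 0.5), rate parametrization

# Bayes estimate: the quantile of order c2 / (c1 + c2) of the posterior.
theta_star = posterior.ppf(c2 / (c1 + c2))
print(theta_star)                             # here the 75th percentile
```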

Definition. The Linex (Linear Exponential) loss function is defined as,

L(δ, θ) = exp[c(δ − θ)] − c(δ − θ) − 1,  c ≠ 0.

It can be shown that the Bayes estimator is,

δ* = −(1/c) log E[e^{−cθ}|x].
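A minimal numerical check of this estimator, assuming an illustrative N(m, v) posterior; for a normal posterior the closed form is δ* = m − cv/2, which the Monte Carlo average should match:

```python
import numpy as np
from scipy import stats

c = -1.0                                      # asymmetry parameter (illustrative)
m, v = 2.0, 1.0                               # assumed N(m, v) posterior

delta_closed = m - c * v / 2                  # closed form for a normal posterior
theta = stats.norm(m, np.sqrt(v)).rvs(size=1_000_000, random_state=1)
delta_mc = -np.log(np.mean(np.exp(-c * theta))) / c
print(delta_closed, delta_mc)                 # both approx 2.5 for c = -1
```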

The Linex function with c < 0 reflects small losses for overestimation and large losses for underestimation.

[Figure: Linex loss as a function of δ − θ for c = −0.5, −1 and −2.]

Credible Sets

• Point estimates simplify the posterior distribution into single figures.

• How precise are point estimates?

• We seek a compromise between reporting a single number summarizing the posterior distribution and reporting the distribution itself.

Definition. A set C ⊆ Θ is a 100(1−α)% credible set for θ if,

P(θ ∈ C) ≥ 1 − α.

• The inequality is useful when θ has a discrete distribution; otherwise an equality is used in practice (see the sketch below).
• This definition differs fundamentally from the classical confidence region.
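For a continuous posterior an interval with equality is immediate to compute. A minimal sketch, using the Gamma(4, 0.5) posterior that appears in the figures below:

```python
from scipy import stats

alpha = 0.10
posterior = stats.gamma(a=4, scale=1 / 0.5)   # Gamma(4, 0.5), rate parametrization

# Equal-tailed 90% credible interval: alpha/2 posterior probability in each tail.
lower, upper = posterior.ppf([alpha / 2, 1 - alpha / 2])
print(lower, upper)
```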

Invariance under Transformation

Credible sets are invariant under one-to-one parameter transformations. Let θ* = g(θ) and let C* denote the image of C under g. Then,

P(θ∗ ∈ C ∗) = 1 − α.

In the univariate case, if C = [a, b] is a 100(1−α)% credible interval for θ then [g(a), g(b)] is a 100(1−α)% credible interval for θ*.

• Credible sets are not unique in general.

• For any α > 0 there are infinitely many solutions to

P(θ ∈ C) = 1 − α.

[Figure: 90% credible intervals for a Poisson parameter θ when the posterior is Gamma(4, 0.5).]

[Figure: 90% credible intervals for a binomial parameter θ when the posterior is θ|x ∼ Beta(2, 1.5).]

[Figure: 90% credible intervals when the posterior is θ|x ∼ N(2, 1).]

Definition

A 100(1-α)% highest probability density (HPD) credible set for θ is a 100(1-α)% credible set for θ with the property

p(θ1|x) ≥ p(θ2|x), ∀θ1 ∈ C and ∀θ2 ∉ C.

• For symmetric distributions, HPD credible sets are obtained by fixing the same probability for the tails (as the sketch after this list illustrates).

• HPD credible sets are not invariant under transformation.

• In the univariate case, if C = [a, b] is a 100(1−α)% HPD interval for θ then [g(a), g(b)] is a 100(1−α)% credible interval for g(θ), but not necessarily an HPD one.
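A minimal numerical sketch of an HPD interval for a unimodal continuous posterior: among all intervals with posterior probability 1 − α, the HPD interval is the shortest, so we can search over the lower tail probability:

```python
from scipy import stats, optimize

def hpd_interval(posterior, alpha=0.10):
    """Shortest interval with posterior probability 1 - alpha (unimodal case)."""
    def width(p_low):
        return posterior.ppf(p_low + (1 - alpha)) - posterior.ppf(p_low)
    res = optimize.minimize_scalar(width, bounds=(0, alpha), method="bounded")
    return posterior.ppf(res.x), posterior.ppf(res.x + (1 - alpha))

print(hpd_interval(stats.gamma(a=4, scale=2)))    # the Gamma(4, 0.5) posterior
print(hpd_interval(stats.norm(2, 1)))             # symmetric: equal tails = HPD
```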

Example. Let X1, ..., Xn be a random sample from a N(θ, σ²) distribution with σ² known. If θ ∼ N(µ0, τ0²) then θ|x ∼ N(µ1, τ1²) and,

Z = (θ − µ1)/τ1 | x ∼ N(0, 1).

Define zα/2 as the value of Z such that,

P(Z ≤ zα/2) = 1 − α/2.

We can find zα/2 such that,

P(−zα/2 ≤ (θ − µ1)/τ1 ≤ zα/2) = 1 − α,

or, equivalently,

P(µ1 − zα/2 τ1 ≤ θ ≤ µ1 + zα/2 τ1) = 1 − α.

Then [µ1 − zα/2 τ1; µ1 + zα/2 τ1] is the 100(1−α)% HPD interval for θ.

Example. In the previous example, if τ0² → ∞ it follows that τ1⁻² → nσ⁻² and µ1 → x̄. Then,

Z = √n(θ − x̄)/σ | x ∼ N(0, 1).

The 100(1−α)% HPD credible interval is given by,

(x̄ − zα/2 σ/√n; x̄ + zα/2 σ/√n),

which coincides numerically with the classical confidence interval. The interpretation, however, is completely different.

Example. In the previous example, the classical approach would base inference on,

X̄ ∼ N(θ, σ²/n),

or equivalently,

U = √n(X̄ − θ)/σ ∼ N(0, 1).

U (called a pivot) is a function of the sample and of θ but its distribution does not depend on θ.

Again we can find the percentile zα/2 such that,

P(−zα/2 ≤ U ≤ zα/2) = 1 − α,

or, equivalently,

P(X̄ − zα/2 σ/√n ≤ θ ≤ X̄ + zα/2 σ/√n) = 1 − α.

However, this is a probabilistic statement about the limits of the interval, and not about θ.

The classical interpretation: if the same experiment were to be repeated infinitely many times, in 100(1 − α)% of them the random limits of the interval would include θ.

Useless in practice since it is based on unobserved samples.

In the example, when X̄ = x̄ is observed, it is said that there is a 100(1 − α)% confidence (not probability) that the interval (x̄ − zα/2 σ/√n; x̄ + zα/2 σ/√n) contains θ.

[Figure: 95% confidence intervals for the mean of 100 samples of size 20 simulated from a N(0, 100); arrows indicate intervals that do not contain zero. Realized confidence level = 95%.]

Normal Approximation

If the posterior distribution is unimodal and approximately symmetric, it can be approximated by a normal distribution centered about the posterior mode. Consider the Taylor expansion of log p(θ|x) about the mode θ*,

log p(θ|x) = log p(θ*|x) + (θ − θ*) [d log p(θ|x)/dθ]_{θ=θ*} + (1/2)(θ − θ*)² [d² log p(θ|x)/dθ²]_{θ=θ*} + ...

By definition,

[d log p(θ|x)/dθ]_{θ=θ*} = 0.

Defining,

h(θ) = −d² log p(θ|x)/dθ²,

it follows that,

p(θ|x) ≈ constant × exp{−(h(θ*)/2)(θ − θ*)²}.

Then, for large n, we have the following approximation,

θ|x ∼ N(θ*, h(θ*)⁻¹).
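A minimal numerical check, assuming an illustrative Gamma(a, b) posterior, for which θ* = (a − 1)/b and h(θ*) = b²/(a − 1):

```python
import numpy as np
from scipy import stats

a, b = 4.0, 0.5                               # illustrative Gamma(a, b), rate b
mode = (a - 1) / b                            # posterior mode theta*
h = b**2 / (a - 1)                            # h(theta*) for this posterior
approx = stats.norm(mode, 1 / np.sqrt(h))
exact = stats.gamma(a, scale=1 / b)

grid = np.linspace(1, 15, 8)
print(np.round(exact.pdf(grid), 4))
print(np.round(approx.pdf(grid), 4))          # close near the mode, worse in tails
```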

These results can be extended to the multivariate case. Since,

[∂ log p(θ|x)/∂θ]_{θ=θ*} = 0,

we define the matrix,

H(θ) = −∂² log p(θ|x)/∂θ∂θ′.

Then, for large n, we have the following approximation,

θ|x ∼ N(θ*, H(θ*)⁻¹).

In particular, it is possible to construct approximate credibility regions based on the above results.

Definition. Let θ ∈ Θ. A region C ⊂ Θ is an asymptotic 100(1 − α)% credibility region if,

lim_{n→∞} P(θ ∈ C|x) ≥ 1 − α.

[Figure: posterior Gamma density and its normal approximation with simulated data (n = 10).]

[Figure: posterior Gamma density and its normal approximation with simulated data (n = 100).]

Example. Consider the model,

X1, ..., Xn ∼ Poisson(θ), θ ∼ Gamma(α, β).

The posterior distribution is given by,

θ|x ∼ Gamma(α + Σ xi, β + n),

therefore,

p(θ|x) ∝ θ^{α + Σ xi − 1} exp{−θ(β + n)},

or equivalently,

log p(θ|x) = (α + Σ xi − 1) log θ − θ(β + n) + constant.

The first and second derivatives are,

d log p(θ|x)/dθ = −(β + n) + (α + Σ xi − 1)/θ,

d² log p(θ|x)/dθ² = −(α + Σ xi − 1)/θ².

It then follows that,

θ* = (α + Σ xi − 1)/(β + n),  h(θ) = (α + Σ xi − 1)/θ²,  h(θ*) = (β + n)²/(α + Σ xi − 1).

The approximate posterior distribution is,

θ|x ∼ N((α + Σ xi − 1)/(β + n), (α + Σ xi − 1)/(β + n)²).

An approximate 100(1−α)% credible interval for θ is,

θ* − zα/2 h(θ*)^{−1/2} < θ < θ* + zα/2 h(θ*)^{−1/2}.
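A minimal sketch with the numbers of the simulation reported below (n = 20, Σ xi = 35, Gamma(1, 2) prior):

```python
import numpy as np
from scipy import stats

a, b = 1.0, 2.0                               # Gamma prior parameters
n, t = 20, 35                                 # sample size and sum of observations
level = 0.95

mode = (a + t - 1) / (b + n)                  # theta* = 35/22
h = (b + n) ** 2 / (a + t - 1)                # h(theta*)
z = stats.norm.ppf((1 + level) / 2)
half = z / np.sqrt(h)
print(mode, (mode - half, mode + half))
```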

[Figure: posterior density and its normal approximation for 20 simulated Poisson observations with θ = 2, prior Gamma(1, 2), Σ xi = 35.]

Example. For the model X1, ..., Xn ∼ Bernoulli(θ) with θ ∼ Beta(α, β) the posterior is,

θ|x ∼ Beta(α + t, β + n − t),  t = Σ_{i=1}^n xi.

Then,

p(θ|x) ∝ θ^{α+t−1}(1 − θ)^{β+n−t−1},

log p(θ|x) = (α + t − 1) log θ + (β + n − t − 1) log(1 − θ) + constant,

d log p(θ|x)/dθ = (α + t − 1)/θ − (β + n − t − 1)/(1 − θ),

d² log p(θ|x)/dθ² = −(α + t − 1)/θ² − (β + n − t − 1)/(1 − θ)².

θ* = (α + t − 1)/(α + β + n − 2),  h(θ) = (α + t − 1)/θ² + (β + n − t − 1)/(1 − θ)²,  h(θ*) = (α + β + n − 2)/(θ*(1 − θ*)).

The approximate posterior distribution is,

θ|x ∼ N(θ*, θ*(1 − θ*)/(α + β + n − 2)).

An approximate 100(1−α)% credible interval for θ is,

[θ* − zα/2 √(θ*(1 − θ*)/(α + β + n − 2)); θ* + zα/2 √(θ*(1 − θ*)/(α + β + n − 2))].
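A minimal sketch with the numbers of the simulation reported below (n = 20, t = 3, Beta(1, 1) prior):

```python
import numpy as np
from scipy import stats

a, b, n, t = 1.0, 1.0, 20, 3                  # Beta(1, 1) prior, 3 successes in 20
level = 0.95

mode = (a + t - 1) / (a + b + n - 2)          # theta* = 3/20
var = mode * (1 - mode) / (a + b + n - 2)     # h(theta*)^{-1}
z = stats.norm.ppf((1 + level) / 2)
half = z * np.sqrt(var)
print(mode, (mode - half, mode + half))
```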

[Figure: conjugate posterior and its normal approximation for 20 simulated Bernoulli observations with θ = 0.2, prior Beta(1, 1), Σ_{i=1}^{20} xi = 3.]

Example. If X1, ..., Xn ∼ Exp(θ) and p(θ) ∝ 1, it follows that,

p(θ|x) ∝ θⁿ e^{−θt},  t = Σ_{i=1}^n xi,

π(θ) = log p(θ|x) = n log θ − θt + c,

π′(θ) = n/θ − t,

π″(θ) = −n/θ².

Then,

Mode(θ|x) = θ* = n/t = 1/x̄,

h(θ*) = n/(θ*)² = n x̄².

50 The approximate posterior distribution is,

θ|x ∼ N(1/x̄, 1/(n x̄²)).

An approximate 100(1−α)% credible interval for θ is,

[1/x̄ − zα/2 √(1/(n x̄²)); 1/x̄ + zα/2 √(1/(n x̄²))].
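A minimal sketch with simulated exponential data (assuming θ = 2):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
theta_true = 2.0                              # rate used to simulate (assumed)
x = rng.exponential(1 / theta_true, size=50)  # Exp(theta) has mean 1/theta
n, xbar = len(x), x.mean()

mode = 1 / xbar                               # theta* = 1/xbar
sd = 1 / (np.sqrt(n) * xbar)                  # h(theta*)^{-1/2}
z = stats.norm.ppf(0.975)
print(mode, (mode - z * sd, mode + z * sd))
```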
