
Chapter 4 More than one parameter

Aims: ◃ Moving towards practical applications ◃ Illustrating that computations quickly become involved ◃ Illustrating that frequentist results can be obtained with Bayesian procedures ◃ Illustrating a multivariate (independent) sampling approach

4.1 Introduction

• Most statistical models involve more than one parameter to estimate

• Examples:
◃ Normal distribution: µ and σ²
◃ Linear regression: regression coefficients β0, β1, . . . , βd and residual variance σ²
◃ Logistic regression: regression coefficients β0, β1, . . . , βd
◃ Multinomial distribution: class probabilities θ1, θ2, . . . , θd with ∑_(j=1)^d θj = 1

• This requires a prior for all parameters (together): it expresses our beliefs about the model parameters

• Aim: derive posterior for all parameters and their summary measures

• It turns out that in most cases analytical solutions for the posterior will no longer be possible

• In Chapter 6, we will see that Markov chain Monte Carlo methods are needed for this

• Here, we look at a simple multivariate sampling approach: Method of Composition

4.2 Joint versus marginal posterior inference

• Bayes theorem:

p(θ | y) = L(θ | y) p(θ) / ∫ L(θ | y) p(θ) dθ

◦ Hence, the same expression as before, but now θ = (θ1, θ2, . . . , θd)^T
◦ Now the prior p(θ) is multivariate. But often a prior is given for each parameter separately
◦ The posterior p(θ | y) is also multivariate. But we usually look only at the (marginal) posteriors p(θj | y) (j = 1, . . . , d)

• We also need, for each parameter: posterior mean, median (and sometimes mode), and credible intervals

• Illustration on the normal distribution with µ and σ² unknown

• Application: determining 95% normal range of alp (continuation of Example III.6)

• We look at three cases (priors): ◃ No prior knowledge is available ◃ Previous study is available ◃ Expert knowledge is available

• But first, a brief theoretical introduction

4.3 The normal distribution with µ and σ² unknown

Acknowledging that µ and σ2 are unknown

• Sample y1, . . . , yn of independent observations from N(µ, σ²)

• Joint likelihood of (µ, σ²) given y:

L(µ, σ² | y) = (2πσ²)^(−n/2) exp[ −(1/(2σ²)) ∑_(i=1)^n (yi − µ)² ]

• The posterior is again product of likelihood with prior divided by the denominator which involves an integral

• In this case analytical calculations are possible in 2 of the 3 cases

4.3.1 No prior knowledge on µ and σ² is available

• Noninformative joint prior p(µ, σ²) ∝ σ⁻² (µ and σ² a priori independent)

• Posterior distribution:

p(µ, σ² | y) ∝ (1/σ^(n+2)) exp{ −[(n − 1)s² + n(ȳ − µ)²] / (2σ²) }

[Figure: joint posterior density of (µ, σ²)]

Justification of the prior distribution

• Most often prior information on several parameters reaches us for each parameter separately and independently ⇒ p(µ, σ²) = p(µ) × p(σ²)

• And we have no prior information on µ nor on σ² ⇒ choice of prior distributions: p(µ) ∝ c and p(σ²) ∝ σ⁻²

• The chosen priors are called flat priors

• Motivation:
◦ If one is totally ignorant of a location parameter, then it could take any value on the real line with equal probability.
◦ If totally ignorant about the scale of a parameter, then it is as likely to lie in the interval 1–10 as in the interval 10–100. This implies a flat prior on the log scale.

• The flat prior p(log(σ)) = c is equivalent to the chosen prior p(σ²) ∝ σ⁻²

Marginal posterior distributions

Marginal posterior distributions are needed in practice

◃ p(µ | y)
◃ p(σ² | y)

• Calculation of the marginal posterior distributions involves integration:

p(µ | y) = ∫ p(µ, σ² | y) dσ² = ∫ p(µ | σ², y) p(σ² | y) dσ²

• Marginal posterior is weighted sum of conditional posteriors with weights = uncertainty on other parameter(s)

Conditional & marginal posterior distributions for the normal case

• Conditional posterior for µ: p(µ | σ², y) = N(ȳ, σ²/n)

• Marginal posterior for µ: p(µ | y) = t_(n−1)(ȳ, s²/n)

⇒ (µ − ȳ)/(s/√n) ∼ t_(n−1) (µ is the random variable)

• Marginal posterior for σ²: p(σ² | y) ≡ Inv-χ²(n − 1, s²) (scaled inverse chi-squared distribution)

⇒ (n − 1)s²/σ² ∼ χ²(n − 1) (σ² is the random variable)

= special case of IG(α, β), with α = (n − 1)/2 and β = (n − 1)s²/2

Some t-densities

[Figure: t-densities for varying degrees of freedom]

Some inverse-gamma densities

[Figure: inverse-gamma densities for varying parameter values]

Joint posterior distribution

• Joint posterior = multiplication of marginal with conditional posterior

p(µ, σ² | y) = p(µ | σ², y) p(σ² | y) = N(ȳ, σ²/n) × Inv-χ²(n − 1, s²)

• Normal-scaled-inverse chi-square distribution = N-Inv-χ²(ȳ, n, n − 1, s²)

[Figure: joint posterior density of (µ, σ²)]

⇒ A posteriori µ and σ2 are dependent

Posterior summary measures and PPD

For µ:

◃ Posterior mean = mode = median = ȳ

◃ Posterior variance = [(n − 1)/(n(n − 3))] s²
◃ 95% equal tail credible and HPD interval: [ȳ − t(0.025; n − 1) s/√n, ȳ + t(0.025; n − 1) s/√n]

For σ2:

◦ Posterior mean, mode, median, variance, 95% equal tail CI all analytically available ◦ 95% HPD interval is computed iteratively

PPD:
◦ t_(n−1)(ȳ, s²(1 + 1/n))-distribution

Implications of previous results

Frequentist versus Bayesian approach:
◃ Numerical results are the same
◃ Inference is based on different principles

Example IV.1: SAP study – Noninformative prior

◃ Example III.6: normal range for alp is too narrow
◃ Joint posterior distribution = N-Inv-χ² (NI prior + likelihood, see before)
◃ Marginal posterior distributions (red curves) for y = 100/√alp

[Figure: marginal posteriors of µ and σ²]

Normal range for alp:

• PPD for y = t249(7.11, 1.37)-distribution

• 95% normal range for alp = [104.1, 513.2], slightly wider than before
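A minimal R sketch of this back-calculation (assuming, as above, a t_(249)(7.11, 1.37) PPD for y = 100/√alp, with 1.37 taken as the PPD scale; the raw data are not needed):

  n <- 250; m <- 7.11; s.ppd <- 1.37               # assumed PPD location and scale
  y.lim <- m + qt(c(0.975, 0.025), df = n - 1) * s.ppd
  alp.range <- (100 / y.lim)^2                     # back-transform alp = (100/y)^2; order flips
  round(alp.range, 1)                              # approx. 104 and 514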

4.3.2 An historical study is available

• Posterior of historical data can be used as prior to the likelihood of current data

• Prior = N-Inv-χ²(µ0, κ0, ν0, σ0²)-distribution (from historical data)

• Posterior = N-Inv-χ²(µ̄, κ̄, ν̄, σ̄²)-distribution (combining data and N-Inv-χ² prior)
◃ The N-Inv-χ² prior is conjugate
◃ Again shrinkage of the posterior mean towards the prior mean
◃ Posterior variance = weighted average of prior variance, sample variance and the distance between prior and sample mean ⇒ the posterior variance is not necessarily smaller than the prior variance!

• Similar results for posterior measures and PPD as in first case

Example IV.2: SAP study – Conjugate prior

• Prior based on a retrospective study (Topal et al., 2003) of 65 ‘healthy’ subjects:
◦ Mean (SD) for y = 100/√alp = 5.25 (1.66)
◦ Conjugate prior = N-Inv-χ²(5.25, 65, 64, 2.76)
◦ Note: mean (SD) of the prospective data: 7.11 (1.4), quite different

◦ Posterior = N-Inv-χ²(6.72, 315, 314, 2.61)
◦ Posterior mean in between prior mean & sample mean, but:
◦ Posterior precision ≠ prior + sample precision
◦ Posterior variance < prior variance, but > sample variance
◦ Posterior variance with the informative prior > posterior variance with the NI prior
◦ Prior information did not lower posterior uncertainty; reason: conflict between likelihood and prior

Marginal posteriors:

[Figure: marginal posteriors of µ and σ²]

Red curves = marginal posteriors from informative prior (historical data)

Histograms of retro- and prospective data:

[Figure: histograms of the prospective data (likelihood) and the retrospective data (informative prior), on the 100·alp^(−1/2) scale]

4.3.3 Expert knowledge is available

• Expert knowledge available on each parameter separately

⇒ Joint prior N(µ0, σ0²) × Inv-χ²(ν0, τ0²) ≠ conjugate

• Posterior cannot be derived analytically, but numerical/sampling techniques are available

What now?

Computational problem: ◃ ‘Simplest problem’ in classical statistics is already complicated ◃ Ad hoc solution is still possible, but not satisfactory ◃ There is the need for another approach

4.4 Multivariate distributions

Distributions with a multivariate response:

◃ Multivariate normal distribution: generalization of the normal distribution
◃ Multivariate Student’s t-distribution: generalization of the location-scale t-distribution
◃ Multinomial distribution: generalization of the binomial distribution

Multivariate prior distributions:

◃ N-Inv-χ²-distribution: prior for N(µ, σ²)
◃ Dirichlet distribution: generalization of the beta distribution (prior for the multinomial distribution)
◃ (Inverse-)Wishart distribution: generalization of the (inverse-)gamma, prior for covariance matrices (see mixed model chapter)

Example IV.3: Young adult study – Smoking and alcohol drinking

• Study examining life style among young adults

                    Smoking
Alcohol             No     Yes
No-Mild            180      41
Moderate-Heavy     216      64
Total              396     105

• Of interest: association between smoking & alcohol-consumption

Likelihood part:

• 2×2 contingency table = multinomial model Mult(n, θ)
• θ = {θ11, θ12, θ21, θ22}, with θ22 = 1 − θ11 − θ12 − θ21, i.e. ∑_(i,j) θij = 1
• y = {y11, y12, y21, y22} and n = ∑_(i,j) yij

Mult(n, θ) = [n! / (y11! y12! y21! y22!)] θ11^(y11) θ12^(y12) θ21^(y21) θ22^(y22)

Dirichlet prior:

Conjugate prior to multinomial distribution = Dirichlet prior Dir(α)

p(θ | α) = (1/B(α)) ∏_(i,j) θij^(αij − 1)

◦ α = {α11, α12, α21, α22}
◦ B(α) = ∏_(i,j) Γ(αij) / Γ(∑_(i,j) αij)

⇒ Posterior distribution = Dir(α + y)

• Note: ◦ Dirichlet distribution = extension of beta distribution to higher dimensions ◦ Marginal distributions of a Dirichlet distribution = beta distribution

Measuring association:

• Association between smoking and alcohol consumption:

ψ = (θ11 θ22) / (θ12 θ21)

• Needed: p(ψ | y), but difficult to derive

• Alternative: replace the analytical calculations by a sampling procedure

Analysis of contingency table:

• Prior distribution: Dir(1, 1, 1, 1)

• Posterior distribution: Dir(180+1, 41+1, 216+1, 64+1)

• Sample of 10,000 generated values for the θ parameters

• 95% equal tail CI for ψ: [0.839, 2.014]

• Practically equal to the classically obtained interval (see the sketch below)
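A minimal R sketch of this sampling step (Dirichlet draws obtained via independent gamma variables; seed and number of draws are arbitrary):

  set.seed(123)
  a <- c(180, 41, 216, 64) + 1                    # Dir(1,1,1,1) prior + data
  M <- 10000
  g <- matrix(rgamma(M * 4, shape = rep(a, each = M)), nrow = M)
  theta <- g / rowSums(g)                         # each row ~ Dir(a)
  psi <- theta[, 1] * theta[, 4] / (theta[, 2] * theta[, 3])
  round(quantile(psi, c(0.025, 0.975)), 3)        # approx. [0.84, 2.01]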

Posterior distributions:

[Figure: sampled marginal posteriors of θ11, θ12, θ21 and ψ]

4.5 Frequentist properties of Bayesian inference

• Not of prime interest for a Bayesian to know the sampling properties of estimators

• However, it is important that the Bayesian approach most often gives the right answer

• What is known?
◃ Theory: the posterior is approximately normal for a large sample (Bayesian Central Limit Theorem, BCLT)
◃ Simulations: the Bayesian approach may offer alternative interval estimators with better coverage than classical frequentist approaches

4.6 The Method of Composition

A method to yield a random sample from a multivariate distribution

• Stagewise approach

• Based on factorization of joint distribution into a marginal & several conditionals

p(θ1, . . . , θd | y) = p(θd | y) p(θd−1 | θd, y) · · · p(θ1 | θ2, . . . , θd, y)

• Sampling approach:
◃ Sample θ̃d from p(θd | y)
◃ Sample θ̃d−1 from p(θd−1 | θ̃d, y)
◃ ...
◃ Sample θ̃1 from p(θ1 | θ̃2, . . . , θ̃d, y)

Sampling from the posterior when y ∼ N(µ, σ²), both parameters unknown

• Sample first σ²; then, given the sampled value σ̃², sample µ from p(µ | σ̃², y)

• Output for case 1 (no prior knowledge on µ and σ²) on the next page; a minimal sketch of the sampler follows below
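A minimal R sketch of this two-step sampler under the NI prior of Section 4.3.1 (the simulated y is only a stand-in for the real alp-based data):

  set.seed(1)
  y <- rnorm(250, 7.11, 1.4)                       # stand-in data
  n <- length(y); ybar <- mean(y); s2 <- var(y)
  M <- 1000
  sigma2 <- (n - 1) * s2 / rchisq(M, df = n - 1)   # step 1: Inv-chi2(n-1, s2) draws
  mu <- rnorm(M, ybar, sqrt(sigma2 / n))           # step 2: N(ybar, sigma2/n), one draw per sigma2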

Sampled posterior distributions:

[Figure: (a)–(b) sampled marginal posteriors of σ² and µ; (c) sampled joint posterior of (µ, σ²); (d) sampled PPD of ỹ]

4.7 Bayesian linear regression models

• Example of a classical multiple linear regression analysis

• Non-informative Bayesian multiple linear regression analysis:
◃ Non-informative prior for all parameters + classical linear regression likelihood
◃ Analytical results are available + the Method of Composition can be applied

4.7.1 The frequentist approach to linear regression

Classical regression model: y = Xβ + ε

◦ y = n × 1 vector of independent responses
◦ X = n × (d + 1) design matrix
◦ β = (d + 1) × 1 vector of regression parameters
◦ ε = n × 1 vector of random errors ∼ N(0, σ²I)

Likelihood:

L(β, σ² | y, X) = (2πσ²)^(−n/2) exp[ −(1/(2σ²)) (y − Xβ)^T (y − Xβ) ]

◦ MLE = LSE of β: β̂ = (X^T X)⁻¹ X^T y
◦ Residual sum of squares: S = (y − Xβ̂)^T (y − Xβ̂)
◦ Mean residual sum of squares: s² = S/(n − d − 1)

Example IV.7: Osteoporosis study – a frequentist linear regression analysis

◃ Cross-sectional study (Boonen et al., 1996)
◃ 245 healthy elderly women in a geriatric hospital
◃ Aim: find determinants of osteoporosis
◃ Average age = 75 yrs, range 70–90 yrs
◃ Marker for osteoporosis = tbbmc (in kg), measured for 234 women
◃ Simple linear regression model: regressing tbbmc on bmi
◃ Classical frequentist regression analysis:
◦ β̂0 = 0.813 (0.12)
◦ β̂1 = 0.0404 (0.0043)
◦ s = 0.29, with n − d − 1 = 232
◦ corr(β̂0, β̂1) = −0.99

Scatterplot + fitted regression line:

[Figure: tbbmc (kg) versus bmi (kg/m²) with the fitted line]

4.7.2 A noninformative Bayesian linear regression model

Bayesian linear regression model = prior information on regression parameters & residual variance + normal regression likelihood

• Noninformative prior for (β, σ2): p(β, σ2) ∝ σ−2

• Notation: omit design matrix X

• Posterior distributions:

p(β, σ² | y) = N_(d+1)(β | β̂, σ²(X^T X)⁻¹) × Inv-χ²(σ² | n − d − 1, s²)
p(β | σ², y) = N_(d+1)(β | β̂, σ²(X^T X)⁻¹)
p(σ² | y) = Inv-χ²(σ² | n − d − 1, s²)
p(β | y) = T_(n−d−1)(β | β̂, s²(X^T X)⁻¹)

4.7.3 Posterior summary measures for the linear regression model

• Posterior summary measures of (a) regression parameters β (b) parameter of residual variability σ2

• Univariate posterior summary measures:
◃ Marginal posterior mean (mode, median) of βj = MLE (LSE) β̂j
◃ 95% HPD interval for βj
◃ Marginal posterior mode and mean of σ²
◃ 95% HPD interval for σ²

Multivariate posterior summary measures

Multivariate posterior summary measures for β

• Posterior mean (mode) of β = β̂ (MLE = LSE)

• 100(1-α)%-HPD region

• Contour probability for H0 : β = β0

Posterior predictive distribution

• PPD of ỹ at covariate value x̃: t-distribution

• How to sample? ◃ Directly from t-distribution ◃ Method of Composition

4.7.4 Sampling from the posterior distribution

• Most posteriors can be sampled via standard sampling

• What about p(β | y) = multivariate t-distribution? How to sample from this distribution? (R function rmvt in mvtnorm)

• Easy with the Method of Composition: sample in two steps
◃ Sample from p(σ² | y): scaled inverse chi-squared distribution ⇒ σ̃²
◃ Sample from p(β | σ̃², y): multivariate normal distribution

Example IV.8: Osteoporosis study – Sampling with the Method of Composition

• Sample σ̃² from p(σ² | y) = Inv-χ²(σ² | n − d − 1, s²)

• Sample β̃ from p(β | σ̃², y) = N_(d+1)(β | β̂, σ̃²(X^T X)⁻¹)

• Sampled mean regression vector = (0.816, 0.0403)

• 95% equal tail CIs = β0: [0.594, 1.040] & β1: [0.0317, 0.0486]

• Contour probability for H0: β = 0 is < 0.001

• Marginal posterior of (β0, β1) has a ridge: r(β0, β1) = −0.99 (a sketch of this sampler follows below)
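A minimal R sketch of this Method of Composition sampler (stand-in data are simulated below; the real tbbmc/bmi values are not reproduced here):

  library(mvtnorm)
  set.seed(1)
  x <- runif(234, 20, 40)                           # stand-in bmi values
  y <- 0.81 + 0.04 * x + rnorm(234, 0, 0.29)        # stand-in tbbmc values
  X <- cbind(1, x); n <- nrow(X); d <- ncol(X) - 1
  XtXi <- solve(crossprod(X))
  bhat <- drop(XtXi %*% crossprod(X, y))            # LSE = MLE
  s2 <- sum((y - X %*% bhat)^2) / (n - d - 1)
  M <- 1000
  sigma2 <- (n - d - 1) * s2 / rchisq(M, df = n - d - 1)   # draws from p(sigma2 | y)
  beta <- t(sapply(sigma2, function(s2k)                    # draws from p(beta | sigma2, y)
    rmvnorm(1, mean = bhat, sigma = s2k * XtXi)))
  apply(beta, 2, quantile, c(0.025, 0.975))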

PPD:

• Distribution of a future observation at bmi=30

• Sample a future observation ỹ from N(µ̃30, σ̃²30), with:
◃ µ̃30 = β̃^T (1, 30)^T
◃ σ̃²30 = σ̃² [1 + (1, 30)(X^T X)⁻¹(1, 30)^T]

• Sampled mean and standard deviation = 2.033 and 0.282
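Continuing the sketch above (beta, sigma2, XtXi and M as before; xnew = (1, 30)^T is the covariate vector at bmi = 30):

  xnew <- c(1, 30)
  mu30 <- beta %*% xnew                             # per-draw regression mean at bmi = 30
  var30 <- sigma2 * (1 + drop(t(xnew) %*% XtXi %*% xnew))
  ynew <- rnorm(M, mu30, sqrt(var30))               # PPD draws
  c(mean(ynew), sd(ynew))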

Posterior distributions:

[Figure: (a)–(b) sampled marginal posteriors of β0 and β1; (c) sampled joint posterior of (β0, β1); (d) sampled PPD of ỹ at bmi = 30]

4.8 Bayesian generalized linear models

Generalized Linear Model (GLIM): extension of the linear regression model to a wide class of regression models

• Examples:
◦ Normal linear regression model: normal distribution for a continuous response, σ² assumed known
◦ Poisson regression model: Poisson distribution for a count response, log(mean) = linear function of covariates
◦ Logistic regression model: Bernoulli distribution for a binary response, logit of the probability = linear function of covariates

4.8.1 More complex regression models

• The multiparameter models considered so far are limited:
◃ for alp?
◃ Censored/truncated data?
◃ Cox regression?

• Postponed to the chapters on MCMC techniques

Take home messages

• Any practical application involves more than one parameter; hence Bayesian inference is immediately multivariate, even with univariate data.

• A multivariate prior is needed and a multivariate posterior is obtained, but the marginal posterior is the basis for practical inference

• Nuisance parameters:
◃ Bayesian inference: average out the nuisance parameters
◃ Classical inference: profile out (maximize over) the nuisance parameters

• Multivariate independent sampling can be done, if marginals can be computed

• Frequentist properties of Bayesian estimators (with NI priors) often good

Chapter 5 Choosing the prior distribution

Aims: ◃ Review the different principles that lead to a prior distribution ◃ Critically review the impact of the subjectivity of prior information

5.1 Introduction

Incorporating prior knowledge

◃ Unique feature for Bayesian approach ◃ But might introduce subjectivity ◃ Useful in clinical trials to reduce sample size

In this chapter we review different kinds of priors:

◃ Conjugate ◃ Noninformative ◃ Informative

5.2 The sequential use of Bayes theorem

• Posterior of the kth experiment = prior for the (k + 1)th experiment (sequential surgeries)

• In this way, the Bayesian approach can mimic our human learning process

• Meaning of ‘prior’ in prior distribution: ◦ Prior: prior knowledge should be specified independent of the collected data ◦ In RCTs: fix the prior distribution in advance

5.3 Conjugate prior distributions

In this section:

• Conjugate priors for univariate & multivariate data distributions

• Conditional conjugate and semi-conjugate distributions

5.3.1 Conjugate priors for univariate data distributions

• In previous chapters, examples were given whereby the combination of the prior with the likelihood gives a posterior of the same type as the prior.

• This property is called conjugacy.

• For an important class of distributions (those that belong to the exponential family) there is a recipe to produce the conjugate prior

Table: conjugate priors for univariate discrete data distributions

Exponential family member        Parameter   Conjugate prior
Bernoulli Bern(θ)                θ           Beta(α0, β0)
Binomial Bin(n, θ)               θ           Beta(α0, β0)
Negative binomial NB(k, θ)       θ           Beta(α0, β0)
Poisson Poisson(λ)               λ           Gamma(α0, β0)

Table: conjugate priors for univariate continuous data distributions

Exponential family member                     Parameter   Conjugate prior
Normal, variance fixed: N(µ, σ²), σ² fixed    µ           N(µ0, σ0²)
Normal, mean fixed: N(µ, σ²), µ fixed         σ²          IG(α0, β0) ≡ Inv-χ²(ν0, τ0²)
Normal: N(µ, σ²)                              µ, σ²       NIG(µ0, κ0, a0, b0) ≡ N-Inv-χ²(µ0, κ0, ν0, τ0²)
Exponential Exp(λ)                            λ           Gamma(α0, β0)

Recipe to choose conjugate priors

p(y | θ) ∈ exponential family:

p(y | θ) = b(y) exp[ c(θ)^T t(y) + d(θ) ]

◦ d(θ), b(y) = scalar functions; c(θ) = (c1(θ), . . . , cd(θ))^T
◦ t(y) = d-dimensional sufficient statistic for θ (canonical parameter)
◦ Examples: binomial distribution, Poisson distribution, normal distribution, etc.

For a random sample y = {y1, . . . , yn} of i.i.d. elements:

p(y | θ) = b(y) exp[ c(θ)^T t(y) + n d(θ) ]

◦ b(y) = ∏_(i=1)^n b(yi) & t(y) = ∑_(i=1)^n t(yi)

Recipe to choose conjugate priors

For the exponential family, the class ℑ of prior distributions closed under sampling is

p(θ | α, β) = k(α, β) exp[ c(θ)^T α + β d(θ) ]

◦ α = (α1, . . . , αd)^T and β hyperparameters
◦ Normalizing constant: k(α, β) = 1 / ∫ exp[ c(θ)^T α + β d(θ) ] dθ

Proof of closure:
p(θ | y) ∝ p(y | θ) p(θ)
= exp[ c(θ)^T t(y) + n d(θ) ] exp[ c(θ)^T α + β d(θ) ]
= exp[ c(θ)^T α* + β* d(θ) ], with α* = α + t(y) and β* = β + n

Recipe to choose conjugate priors

• Above rule gives the natural conjugate family

• Enlarge class of priors ℑ by adding extra parameters: conjugate family of priors, again closed under sampling (O’Hagan & Forster, 2004)

• The conjugate prior has the same functional form as the likelihood, obtained by replacing the data (t(y) and n) by parameters (α and β)

• A conjugate prior is model-dependent, in fact likelihood-dependent

Practical advantages when using conjugate priors

A (natural) conjugate prior distribution for the exponential family is convenient from several viewpoints:

• mathematical

• numerical

• interpretational (convenience prior): ◃ The likelihood of historical data can be easily turned into a conjugate prior. The natural conjugate distribution = equivalent to a fictitious experiment ◃ For a natural conjugate prior, the posterior mean = weighted combination of the prior mean and sample estimate

Example V.2: Dietary study – Normal versus t-prior

• Example II.2: IBBENS-2 normal likelihood was combined with N(328,100) (conjugate) prior distribution

• Replace the normal prior by a t30(328, 100)-prior ⇒ posterior practically unchanged, but 3 elegant features of normal prior are lost:

◃ Posterior cannot be determined analytically ◃ Posterior is not of the same class as the prior ◃ Posterior summary measures are not obvious functions of the prior and the sample summary measures

5.3.2 Conjugate prior for the normal distribution – mean and variance unknown

N(µ, σ2) with µ and σ2 unknown ∈ two-parameter exponential family

• Conjugate = product of a normal prior with inverse gamma prior

• Notation: NIG(µ0, κ0, a0, b0)

Mean known and variance unknown

• For σ2 unknown and µ known :

Natural conjugate is inverse gamma (IG) Equivalently: scaled inverse-χ2 distribution (Inv-χ2)

5.3.3 Multivariate data distributions

Priors for two popular multivariate models:

• Multinomial model

• Multivariate normal model

Table: conjugate priors for multivariate data distributions

Exponential family member                    Parameter   Conjugate prior
Multinomial Mult(n, θ)                       θ           Dirichlet(α0)
Normal, covariance fixed: N(µ, Σ), Σ fixed   µ           N(µ0, Σ0)
Normal, mean fixed: N(µ, Σ), µ fixed         Σ           IW(Λ0, ν0)
Normal: N(µ, Σ)                              µ, Σ        NIW(µ0, κ0, ν0, Λ0)

Multinomial model

Mult(n, θ): p(y | θ) = [n! / (y1! y2! · · · yk!)] ∏_(j=1)^k θj^(yj) ∈ exponential family

Natural conjugate: Dirichlet(α0) distribution

p(θ | α0) = [Γ(∑_(j=1)^k α0j) / ∏_(j=1)^k Γ(α0j)] ∏_(j=1)^k θj^(α0j − 1)

Properties:

◃ Posterior distribution = Dirichlet(α0 + y) ◃ Beta distribution = special case of a Dirichlet distribution with k = 2 ◃ Marginal distributions of the Dirichlet distribution = beta distributions ◃ Dirichlet(1, 1,..., 1) = extension of the classical uniform prior Beta(1,1)

Multivariate normal model

The p-dimensional multivariate normal distribution:

p(y1, . . . , yn | µ, Σ) = (2π)^(−np/2) |Σ|^(−n/2) exp[ −(1/2) ∑_(i=1)^n (yi − µ)^T Σ⁻¹ (yi − µ) ]

Conjugates:

◃ Σ known and µ unknown: N(µ0, Σ0) for µ

◃ Σ unknown and µ known: inverse Wishart distribution IW(Λ0, ν0) for Σ

◃ Σ unknown and µ unknown:

Normal-inverse Wishart distribution NIW(µ0, κ0, ν0, Λ0) for µ and Σ

5.3.4 Conditional conjugate and semi-conjugate priors

Example: θ = (µ, σ²) for y ∼ N(µ, σ²)

• Conditional conjugate for µ: N(µ0, σ0²)

• Conditional conjugate for σ²: IG(α, β)

• Semi-conjugate prior = product of conditional conjugates

• Often conjugate priors cannot be used in WinBUGS, but semi-conjugates are popular

5.3.5 Hyperpriors

Conjugate priors are restrictive to present prior knowledge

⇒ Give parameters of conjugate prior also a prior

Example:

• Prior: θ ∼ Beta(1, 1)

• Instead: θ ∼ Beta(α, β), with α ∼ Gamma(1, 3) and β ∼ Gamma(2, 4)
◃ α, β = hyperparameters
◃ Gamma(1, 3) × Gamma(2, 4) = hyperprior/hierarchical prior

• Aim: more flexibility in the prior distribution (and useful for Gibbs sampling)

5.4 Noninformative prior distributions

5.4.1 Introduction

Sometimes/often researchers cannot or do not wish to make use of prior knowledge ⇒ prior should reflect this absence of knowledge

• A prior that expresses no knowledge = (initially) called a noninformative (NI) prior

• Central question: What prior reflects absence of knowledge? ◃ Flat prior? ◃ Huge amount of research to find best NI prior ◃ Other terms for NI: non-subjective, objective, default, reference, weak, diffuse, flat, conventional and minimally informative, etc

• Challenge: make sure that posterior is a proper distribution!

5.4.2 Expressing ignorance

• Equal prior probabilities = principle of insufficient reason, principle of indifference, Bayes-Laplace postulate

• Unfortunately, a flat prior cannot express ignorance

Ignorance at different scales:

[Figure: a flat prior on σ induces a non-flat prior on σ², and a flat prior on σ² induces a non-flat prior on σ]

Ignorance on σ-scale is different from ignorance on σ2-scale

Ignorance cannot be expressed mathematically

5.4.3 General principles to choose noninformative priors

A lot of research has been spent on the specification of NI priors, most popular are Jeffreys priors:

• Result of a Bayesian analysis depends on choice of scale for flat prior: p(θ) ∝ c or p(h(θ)) ≡ p(ψ) ∝ c

• To preserve conclusions when changing scale: Jeffreys suggested a rule to construct priors based on the invariance principle/rule (conclusions do not change when changing scale)

• Jeffreys rule suggests a way to choose a scale to take the flat prior on

• Jeffreys rule also exists for more than one parameter (Jeffreys multi-parameter rule)

Examples of Jeffreys priors

• Binomial model: p(θ) ∝ θ^(−1/2)(1 − θ)^(−1/2) ⇔ flat prior on ψ(θ) = arcsin√θ

• Poisson model: p(λ) ∝ λ^(−1/2) ⇔ flat prior on ψ(λ) = √λ

• Normal model with σ fixed: p(µ) ∝ c

• Normal model with µ fixed: p(σ²) ∝ σ⁻² ⇔ flat prior on log(σ)

• Normal model with µ and σ² unknown: p(µ, σ²) ∝ σ⁻², which reproduces some classical frequentist results!

5.4.4 Improper prior distributions

• Many NI priors are improper (= AUC is infinite)

• Improper prior is technically no problem when posterior is proper

• Example: normal likelihood (µ unknown + σ² known) + flat prior p(µ) = c:

p(µ | y) = p(y | µ) p(µ) / ∫ p(y | µ) p(µ) dµ = p(y | µ) c / ∫ p(y | µ) c dµ = [1/(√(2π) σ/√n)] exp[ −(n/2) ((µ − ȳ)/σ)² ]

• Complex models: difficult to know when an improper prior yields a proper posterior (e.g. variance of the level-2 observations in a Gaussian hierarchical model)

• Interpretation of improper priors?

5.4.5 Weak/vague priors

• For practical purposes it is sufficient that the prior is locally uniform, also called vague or weak

• Locally uniform: prior ≈ constant on interval outside which likelihood ≈ zero

• Examples for a N(µ, σ²) likelihood:
◦ µ: N(0, σ0²) prior with σ0 large
◦ σ²: IG(ε, ε) prior with ε small ≈ Jeffreys prior

Locally uniform prior

[Figure: likelihood, posterior and locally uniform prior for µ]

Vague priors in software:

• WinBUGS allows only (proper) vague priors (Jeffreys priors are not allowed)
◦ mu ∼ dnorm(0.0,1.0E-6): normal prior with precision 10⁻⁶, i.e. variance 10⁶
◦ tau2 ∼ dgamma(0.001,0.001): gamma prior on the precision, i.e. an inverse gamma prior for the variance with shape = rate = 10⁻³

• SAS allows improper priors (allows Jeffreys priors)

Density of log(σ) for σ² (= 1/τ²) ∼ IG(ε, ε)

[Figure: induced density of log(σ)]

Density of log(σ) for σ² ∼ IG(ε, ε)

[Figure: induced density of log(σ)]

5.5 Informative prior distributions

5.5.1 Introduction

• In basically all research some prior knowledge is available

• In this section: ◃ Formalize the use of historical data as prior information using the power prior ◃ Review the use of clinical priors, which are prior distributions based on either historical data or on expert knowledge ◃ Priors that are based on formal rules expressing prior skepticism and optimism

• The set of priors representing prior knowledge = subjective or informative priors

• But, first two success stories how the Bayesian approach helped to find: ◃ a crashed plane ◃ a lost fisherman on the Atlantic Ocean

Locating a lost plane

◃ Statisticians helped locate an Air France plane in 2011 which was missing for two years using Bayesian methods ◃ June 2009: Air France flight 447 went missing flying from Rio de Janeiro in Brazil to Paris, France ◃ Debris from the Airbus A330 was found floating on the surface of the Atlantic five days later ◃ After a number of days, the debris would have moved with the ocean current, hence finding the black box is not easy ◃ Existing software (used by the US Coast Guard) did not help ◃ Senior analyst at Metron, Colleen Keller, relied on Bayesian methods to locate the black box in 2011

Bayesian Biostatistics - Piracicaba 2014 281 Members of the Brazilian Frigate Constituicao recovering debris A 2009 infrared satellite image shows weather conditions in June 2009 off the Brazilian coast and the plane search area

Debris from the Air France crash is laid out for investi- gation in 2009

Finding a lost fisherman on the Atlantic Ocean

New York Times (30 September 2014)

◃ “... if not for statisticians, a Long Island fisherman might have died in the Atlantic Ocean after falling off his boat early one morning last summer
◃ The man owes his life to a once obscure field known as Bayesian statistics – a set of mathematical rules for using new data to continuously update beliefs or existing knowledge
◃ It is proving especially useful in approaching complex problems, including searches like the one the Coast Guard used in 2013 to find the missing fisherman, John Aldridge
◃ But the current debate is about how scientists turn data into knowledge, evidence and predictions. Concern has been growing in recent years that some fields are not doing a very good job at this sort of inference. In 2012, for example, a team at the biotech company Amgen announced that they’d analyzed 53 cancer studies and found it could not replicate 47 of them
◃ The Coast Guard has been using Bayesian analysis since the 1970s. The approach lends itself well to problems like searches, which involve a single incident and many different kinds of relevant data, said Lawrence Stone, a statistician for Metron, a scientific consulting firm in Reston, Va., that works with the Coast Guard

The Coast Guard, guided by the statistical method of Thomas Bayes, was able to find the missing fisherman John Aldridge.

5.5.2 Data-based prior distributions

• In previous chapters: ◦ Combined historical data with current data assuming identical conditions ◦ Discounted importance of prior data by increasing variance

• Generalized by the power prior (Ibrahim and Chen):
◦ Likelihood of the historical data: L(θ | y0), based on y0 = {y01, . . . , y0n0}
◦ Prior of the historical data: p0(θ | c0)
◦ Power prior distribution:

p(θ | y0, a0) ∝ L(θ | y0)^(a0) p0(θ | c0)

with 0 ≤ a0 ≤ 1 (a0 = 0: no accounting for the historical data; a0 = 1: fully accounting)
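A minimal R sketch with a binomial likelihood and a Beta(1, 1) initial prior p0 (all numbers hypothetical): the power prior, and hence the posterior, then remain beta distributions.

  a0 <- 0.5                         # discount factor for the historical data
  y0 <- 20; n0 <- 50                # hypothetical historical data
  y1 <- 30; n1 <- 60                # hypothetical current data
  a.post <- 1 + a0 * y0 + y1        # Beta posterior parameters
  b.post <- 1 + a0 * (n0 - y0) + (n1 - y1)
  qbeta(c(0.025, 0.5, 0.975), a.post, b.post)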

5.5.3 Elicitation of prior knowledge

• Elicitation of prior knowledge: turn (qualitative) information from ‘experts’ into probabilistic language

• Challenges: ◃ Most experts have no statistical background ◃ What to ask to construct prior distribution: ◦ Prior mode, median, mean and prior 95% CI? ◦ Description of the prior: quartiles, mean, SD? ◃ Some probability statements are easier to elicit than others

Example V.5: Stroke study – Prior for 1st interim analysis from experts

Prior knowledge on θ (incidence of SICH), elicitation based on:

◦ Most likely value for θ and prior equal-tail 95% CI

◦ Prior belief pk on each of K intervals Ik ≡ [θk−1, θk) covering [0, 1]

[Figure: elicited prior density for θ]

Elicitation of prior knowledge – some remarks

• Community and consensus prior: obtained from a community of experts

• Difficulty in eliciting prior information on more than 1 parameter jointly

• Lack of Bayesian papers based on genuine prior information

Identifiability issues

• An overspecified model is non-identifiable

• An unidentified parameter that is given an NI prior also has an NI posterior

• The Bayesian approach can make parameters estimable, so that the model becomes identifiable

• In next example, not all parameters can be estimated without extra (prior) information

Example V.6: Cysticercosis study – Estimate prevalence without gold standard

Experiment:

◃ 868 pigs tested in Zambia with Ag-ELISA diagnostic test ◃ 496 pigs showed a positive test ◃ Aim: estimate the prevalence π of cysticercosis in Zambia among pigs

If an estimate of the sensitivity α and the specificity β is available, then:

π̂ = (p⁺ + β̂ − 1) / (α̂ + β̂ − 1)

◦ p⁺ = n⁺/n = proportion of subjects with a positive test
◦ α̂ and β̂ = estimated sensitivity and specificity
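A minimal R sketch of this correction, plugging in the means of the Beta(21, 12) and Beta(32, 4) expert priors used below as point estimates of α and β (an assumption made here for illustration only):

  p.plus <- 496 / 868
  alpha.hat <- 21 / (21 + 12)       # mean of Beta(21, 12)
  beta.hat  <- 32 / (32 + 4)        # mean of Beta(32, 4)
  (p.plus + beta.hat - 1) / (alpha.hat + beta.hat - 1)   # corrected prevalence estimate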

Data:

Table of results:

        Disease (True)
Test    +              −                 Observed
+       πα             (1 − π)(1 − β)    n⁺ = 496
−       π(1 − α)       (1 − π)β          n⁻ = 372
Total   π              1 − π             n = 868

◃ Only collapsed table is available ◃ Since α and β vary geographically, expert knowledge is needed

Prior and posterior:

• Prior distribution on π (p(π)), α (p(α)) and β (p(β)) is needed

• Posterior distribution:

p(π, α, β | n⁺, n⁻) ∝ (n choose n⁺) [πα + (1 − π)(1 − β)]^(n⁺) [π(1 − α) + (1 − π)β]^(n⁻) p(π) p(α) p(β)

• WinBUGS was used

Posterior of π:

(a) Uniform priors for π, α and β (no prior information)
(b) Beta(21, 12) prior for α and Beta(32, 4) prior for β (historical data)

[Figure: posterior of π under (a) and (b), each based on 10,000 sampled values]

5.5.4 Archetypal prior distributions

• Use of prior information in Phase III RCTs is problematic, except for medical device trials (FDA guidance document)

⇒ Pleas for objective priors in RCTs

• There is a role of subjective priors for interim analyses: ◃ Skeptical prior ◃ Enthusiastic prior

Example V.7: Skeptical priors in a phase III RCT

Tan et al. (2003):

◃ Phase III RCT for treating patients with hepatocellular carcinoma ◃ Standard treatment: surgical resection ◃ Experimental treatment: surgery + adjuvant radioactive iodine (adjuvant therapy) ◃ Planning: recruit 120 patients

Frequentist interim analyses for efficacy were planned:

◃ First interim analysis (30 patients): experimental treatment better (P = 0.01 < 0.029 = P -value of stopping rule) ◃ But, scientific community was skeptical about adjuvant therapy ⇒ New multicentric trial (300 patients) was set up

Prior to the start of the subsequent trial:

◃ Pretrial opinions of the 14 clinical investigators were elicited ◃ The prior distributions of each investigator were constructed by eliciting the prior belief on the treatment effect (adjuvant versus standard) on a grid of intervals ◃ Average of all priors = community prior ◃ Average of the priors of the 5 most skeptical investigators = skeptical prior

To exemplify the use of the skeptical prior:

◃ Combine skeptical prior with interim analysis results of previous trial ⇒ 1-sided contour probability (in 1st interim analysis) = 0.49 ⇒ The first trial would not have been stopped for efficacy

Questionnaire:

Prior of investigators:

Skeptical priors:

A formal skeptical/enthusiastic prior

Formal subjective priors (Spiegelhalter et al., 1994) in normal case:

• Useful in the context of monitoring clinical trials in a Bayesian manner

• θ = true effect of treatment (A versus B)

• Skeptical normal prior: choose mean and variance of p(θ) to reflect skepticism

• Enthusiastic normal prior: choose mean and variance of p(θ) to reflect enthusiasm

• See figure next page & book

Example V.8+9

[Figure: skeptical and enthusiastic normal priors for θ, each with a 5% tail area relative to the target effect θa]

5.6 Prior distributions for regression models

5.6.1 Normal linear regression

Normal linear regression model:

yi = xi^T β + εi, (i = 1, . . . , n)

y = Xβ + ε

Priors

• Non-informative priors:
◃ Popular NI prior: p(β, σ²) ∝ σ⁻² (Jeffreys multi-parameter rule)
◃ WinBUGS: product of independent N(0, σ0²) priors (σ0 large) + IG(ε, ε) (ε small)

• Conjugate priors:
◃ Conjugate NIG prior = N(β0, σ²Σ0) × IG(a0, b0) (or Inv-χ²(ν0, τ0²))

• Historical/expert priors:
◃ Prior knowledge on regression coefficients must be given jointly
◃ Elicitation process via distributions at covariate values
◃ Most popular: express the prior based on historical data

5.6.2 Generalized linear models

• In practice choice of NI priors much the same as with linear models

• But, too large prior variance may not be best for sampling, e.g. in logistic regression model

• In SAS: Jeffreys (improper) prior can be chosen

• Conjugate priors are based on fictive historical data ◃ Data augmentation priors & conditional mean priors ◃ Not implemented in classical software, but fictive data can be explicitly added and then standard software can be used

5.7 Modeling priors

Modeling prior: adapt characteristics of the statistical model

• Multicollinearity: an appropriate prior avoids inflation of the standard errors of β

• Numerical (separation) problems: appropriate prior avoids inflation of β

• Constraints on parameters: constraint can be put in prior

• Variable selection: prior can direct the variable search

Multicollinearity

Multicollinearity: |X^T X| ≈ 0 ⇒ regression coefficients and standard errors inflated

Ridge regression:
◃ Minimize: (y* − Xβ)^T (y* − Xβ) + λ β^T β, with λ ≥ 0 and y* = y − ȳ1n
◃ Estimate: β̂^R(λ) = (X^T X + λI)⁻¹ X^T y*

= Posterior mode of a Bayesian normal linear regression analysis with:
◃ Normal ridge prior N(0, τ²I) for β
◃ τ² = σ²/λ, with σ and λ fixed

• Can be easily extended to BGLIM
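A minimal R sketch of the ridge estimate, which equals the posterior mode under the normal ridge prior (X is assumed to have centered covariate columns; lambda fixed):

  ridge.beta <- function(X, y, lambda) {
    ystar <- y - mean(y)                            # y* = y - ybar 1_n
    solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, ystar))
  }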

Numerical (separation) problems

Separation problems in binary regression models: complete separation and quasi-complete separation. Solution: take a weakly informative prior on the regression coefficients (see the figure below)

[Figure: quasi-complete separation in the (x1, x2)-plane; weakly informative Cauchy prior (Gelman) versus a N(0,100) prior]

Constraints on parameters

Signal-Tandmobiel® study:

• θk = probability of caries experience (CE) among Flemish children in school year k (k = 1, . . . , 6)

• Constraint on parameters: θ1 ≤ θ2 ≤ · · · ≤ θ6

• Solutions:

◃ Prior on θ = (θ1, . . . , θ6)^T that maps all θ's violating the constraint to zero
◃ Neglect the values that are not allowed in the posterior (useful when sampling)

Other modeling priors

• LASSO prior (see Bayesian variable selection)

• ...

5.8 Other regression models

• A great variety of models

• Not considered here: conditional logistic regression model, Cox proportional hazards model, generalized linear mixed effects models

• ...

Take home messages

• Often prior is dominated by the likelihood (data)

• Prior in RCTs: prior to the trial

• Conjugate priors: convenient mathematically, computationally and from an interpretational viewpoint

• Conditional conjugate priors: heavily used in Gibbs sampling

• Hyperpriors: extend the range of conjugate priors, also important in Gibbs sampling

• Noninformative priors:
◃ do not exist, strictly speaking
◃ in practice vague priors (e.g. locally uniform) are OK
◃ important class of NI priors: Jeffreys priors
◃ be careful with improper priors: they might imply an improper posterior

• Informative priors:
◃ can be based on historical data & expert knowledge (but only useful when they represent the viewpoint of a community of experts)
◃ are useful in clinical trials to reduce the sample size

Chapter 6 Markov chain Monte Carlo sampling

Aims: ◃ Introduce the sampling approach(es) that revolutionized Bayesian approach

6.1 Introduction

◃ Solving the posterior distribution analytically is often not feasible, due to the difficulty of determining the integration constant
◃ Computing the integral with numerical integration methods is a practical alternative only if few parameters are involved
⇒ A new computational approach is needed

◃ Sampling is the way to go! ◃ With Markov chain Monte Carlo (MCMC) methods: 1. Gibbs sampler 2. Metropolis-(Hastings) algorithm

MCMC approaches have revolutionized Bayesian methods!

Intermezzo: Joint, marginal and conditional probability

Two (discrete) random variables X and Y

• Joint probability of X and Y: probability that X=x and Y=y happen together

• Marginal probability of X: probability that X=x happens

• Marginal probability of Y: probability that Y=y happens

• Conditional probability of X given Y=y: probability that X=x happens if Y=y

• Conditional probability of Y given X=x: probability that Y=y happens if X=x

Intermezzo: Joint, marginal and conditional probability

IBBENS study: 563 (556) bank employees in 8 subsidiaries of a Belgian bank participated in a dietary study

[Figure: scatterplot of weight (kg) versus length (cm)]


Intermezzo: Joint, marginal and conditional probability

IBBENS study: frequency table

(rows = weight in kg, columns = length in cm)

Weight     −150   150−160   160−170   170−180   180−190   190−200   200−   Total
−50           2        12         4         0         0         0      0      18
50−60         1        25        50        14         0         0      0      90
60−70         0        12        54        52        13         1      0     132
70−80         0         5        42        72        34         0      0     153
80−90         0         0        12        58        32         2      1     105
90−100        0         0         0        20        18         3      0      41
100−110       0         0         1         2         7         1      0      11
110−120       0         0         0         2         2         1      0       5
120−          0         0         0         0         1         0      0       1
Total         3        54       163       220       107         8      1     556

Intermezzo: Joint, marginal and conditional probability

IBBENS study: joint probability

Weight     −150    150−160   160−170   170−180   180−190   190−200   200−     Total
−50        2/556   12/556     4/556     0/556     0/556     0/556    0/556    18/556
50−60      1/556   25/556    50/556    14/556     0/556     0/556    0/556    90/556
60−70      0/556   12/556    54/556    52/556    13/556     1/556    0/556   132/556
70−80      0/556    5/556    42/556    72/556    34/556     0/556    0/556   153/556
80−90      0/556    0/556    12/556    58/556    32/556     2/556    1/556   105/556
90−100     0/556    0/556     0/556    20/556    18/556     3/556    0/556    41/556
100−110    0/556    0/556     1/556     2/556     7/556     1/556    0/556    11/556
110−120    0/556    0/556     0/556     2/556     2/556     1/556    0/556     5/556
120−       0/556    0/556     0/556     0/556     1/556     0/556    0/556     1/556
Total      3/556   54/556   163/556   220/556   107/556     8/556    1/556         1

Intermezzo: Joint, marginal and conditional probability

IBBENS study: marginal probabilities

(Same table as the joint probabilities above: the marginal probabilities of weight are given by the ‘Total’ column, those of length by the ‘Total’ row, e.g. P(length ∈ 150−160) = 54/556.)

Intermezzo: Joint, marginal and conditional probability

IBBENS study: conditional probabilities

Weight     −150    150−160   160−170   170−180   180−190   190−200   200−    Total
−50                 12/54
50−60      1/90     25/90     50/90     14/90     0/90      0/90     0/90    90/90
                   (25/54)
60−70               12/54
70−80                5/54
80−90                0/54
90−100               0/54
100−110              0/54
110−120              0/54
120−                 0/54
Total               54/54

(Column ‘150−160’ = conditional distribution of weight given length ∈ 150−160; row ‘50−60’ = conditional distribution of length given weight ∈ 50−60.)

Intermezzo: Joint, marginal and conditional density

Two (continuous) random variables X and Y

• Joint density of X and Y: density f(x, y)

• Marginal density of X: density f(x)

• Marginal density of Y: density f(y)

• Conditional density of X given Y=y: density f(x|y)

• Conditional density of Y given X=x: density f(y|x)

Intermezzo: Joint, marginal and conditional density

IBBENS study: joint density

Intermezzo: Joint, marginal and conditional density

IBBENS study: marginal densities

Intermezzo: Joint, marginal and conditional density

IBBENS study: conditional densities

[Figure: conditional density of weight given length; conditional density of length given weight]

6.2 The Gibbs sampler

• Gibbs Sampler: introduced by Geman and Geman (1984) in the context of image-processing for the estimation of the parameters of the Gibbs distribution

• Gelfand and Smith (1990) introduced Gibbs sampling to tackle complex estimation problems in a Bayesian manner

6.2.1 The bivariate Gibbs sampler

Method of Composition:

• p(θ1, θ2 | y) is completely determined by:

◃ marginal p(θ2 | y)

◃ conditional p(θ1 | θ2, y)

• Split-up yields a simple way to sample from joint distribution

Gibbs sampling:

• p(θ1, θ2 | y) is completely determined by:

◃ conditional p(θ2 | θ1, y)

◃ conditional p(θ1 | θ2, y)

• Property yields another simple way to sample from joint distribution:

◃ Take starting values θ1⁰ and θ2⁰ (only one of them is needed)
◃ Given θ1^k and θ2^k at iteration k, generate the (k + 1)-th value according to the iterative scheme:
1. Sample θ1^(k+1) from p(θ1 | θ2^k, y)
2. Sample θ2^(k+1) from p(θ2 | θ1^(k+1), y)

Result of Gibbs sampling:

• Chain of vectors: θ^k = (θ1^k, θ2^k)^T, k = 1, 2, . . .
◦ Consists of dependent elements
◦ Markov property: p(θ^(k+1) | θ^k, θ^(k−1), . . . , y) = p(θ^(k+1) | θ^k, y)

• Chain depends on starting value + initial portion/burn-in part must be discarded

• Under mild conditions: sample from the posterior distribution = target distribution

⇒ From k0 on: summary measures calculated from the chain consistently estimate the true posterior measures

The Gibbs sampler is called a Markov chain Monte Carlo (MCMC) method

Example VI.1: SAP study – Gibbs sampling the posterior with NI priors

• Example IV.5: sampling from the posterior of the normal likelihood based on 250 alp measurements of ‘healthy’ patients, with NI priors for both parameters

• Now using the Gibbs sampler, based on y = 100/√alp

• Determine the two conditional distributions:
1. p(µ | σ², y) = N(µ | ȳ, σ²/n)
2. p(σ² | µ, y) = Inv-χ²(σ² | n, s²µ), with s²µ = (1/n) ∑_(i=1)^n (yi − µ)²

• Iterative procedure: at iteration (k + 1)
1. Sample µ^(k+1) from N(ȳ, (σ²)^k/n)
2. Sample (σ²)^(k+1) from Inv-χ²(n, s²µ(k+1))
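A minimal R sketch of this bivariate Gibbs sampler (the simulated y is a stand-in for the real alp-based data):

  set.seed(1)
  y <- rnorm(250, 7.11, 1.4)                          # stand-in data
  n <- length(y); ybar <- mean(y)
  M <- 1500
  mu <- sigma2 <- numeric(M)
  sigma2[1] <- var(y)                                 # starting value
  for (k in 2:M) {
    mu[k] <- rnorm(1, ybar, sqrt(sigma2[k - 1] / n))  # draw from p(mu | sigma2, y)
    s2mu <- mean((y - mu[k])^2)
    sigma2[k] <- n * s2mu / rchisq(1, df = n)         # draw from Inv-chi2(n, s2mu)
  }
  keep <- 501:M                                       # discard burn-in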

Gibbs sampling:

[Figure: successive draws from the two conditionals in the (µ, σ²)-plane]

◦ Sampling from the conditional density of µ given σ²
◦ Sampling from the conditional density of σ² given µ

Gibbs sampling path and sample from joint posterior:

[Figure: (a) zigzag sampling path in the (µ, σ²)-plane; (b) sample from the joint posterior]

◦ Zigzag pattern in the (µ, σ²)-plane
◦ 1 complete step = 2 substeps (blue = genuine element of the chain)
◦ Burn-in = 500, total chain = 1,500

Posterior distributions:

[Figure: sampled marginal posteriors of (a) µ and (b) σ²]

Solid line = true posterior distribution

Example VI.2: Sampling from a discrete × continuous distribution

• Joint distribution: f(x, y) ∝ (n choose x) y^(x+α−1) (1 − y)^(n−x+β−1)
◦ x a discrete random variable taking values in {0, 1, . . . , n}
◦ y a random variable on the unit interval
◦ α, β > 0 parameters

• Question: what is the marginal distribution f(x)?

Marginal distribution of x:

[Figure: sampled marginal distribution of x]

◦ Solid line = true marginal distribution
◦ Burn-in = 500, total chain = 1,500
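A minimal R sketch of the Gibbs sampler for this example; the two full conditionals are f(x | y) = Bin(n, y) and f(y | x) = Beta(x + α, n − x + β), and the values of n, α, β below are hypothetical:

  set.seed(1)
  n <- 30; alpha <- 2; beta <- 4
  M <- 1500
  x <- numeric(M); yv <- numeric(M)
  yv[1] <- 0.5; x[1] <- rbinom(1, n, yv[1])           # starting values
  for (k in 2:M) {
    x[k]  <- rbinom(1, n, yv[k - 1])                  # draw from f(x | y)
    yv[k] <- rbeta(1, x[k] + alpha, n - x[k] + beta)  # draw from f(y | x)
  }
  prop.table(table(x[501:M]))                         # estimate of the marginal f(x)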

Example VI.3: SAP study – Gibbs sampling the posterior with informative priors

• Example VI.1: now with independent informative priors (semi-conjugate prior)
◦ µ ∼ N(µ0, σ0²)
◦ σ² ∼ Inv-χ²(ν0, τ0²)

• Posterior:

p(µ, σ² | y) ∝ exp[−(µ − µ0)²/(2σ0²)] × (σ²)^(−(ν0/2+1)) exp[−ν0τ0²/(2σ²)] × (1/σⁿ) ∏_(i=1)^n exp[−(yi − µ)²/(2σ²)]

∝ exp[−(µ − µ0)²/(2σ0²)] (σ²)^(−((n+ν0)/2+1)) exp[−ν0τ0²/(2σ²)] ∏_(i=1)^n exp[−(yi − µ)²/(2σ²)]

Conditional distributions:

• Determine the two conditional distributions:
1. p(µ | σ², y) ∝ ∏_(i=1)^n exp[−(yi − µ)²/(2σ²)] exp[−(µ − µ0)²/(2σ0²)] = N(µ̄, σ̄²), with σ̄² = 1/(n/σ² + 1/σ0²) and µ̄ = σ̄² (nȳ/σ² + µ0/σ0²)
2. p(σ² | µ, y) = Inv-χ²(ν0 + n, [∑_(i=1)^n (yi − µ)² + ν0τ0²]/(ν0 + n))

• Iterative procedure: at iteration (k + 1)
1. Sample µ^(k+1) from N(µ̄^k, (σ̄²)^k), i.e. with (σ²)^k plugged into the expressions above
2. Sample (σ²)^(k+1) from Inv-χ²(ν0 + n, [∑_(i=1)^n (yi − µ^(k+1))² + ν0τ0²]/(ν0 + n))

Trace plots:

[Figure: trace plots of (a) µ and (b) σ², 1,500 iterations]

6.2.2 The general Gibbs sampler

Starting position: θ⁰ = (θ1⁰, . . . , θd⁰)^T

Multivariate version of the Gibbs sampler, iteration (k + 1):
1. Sample θ1^(k+1) from p(θ1 | θ2^k, . . . , θd^k, y)
2. Sample θ2^(k+1) from p(θ2 | θ1^(k+1), θ3^k, . . . , θd^k, y)
...
d. Sample θd^(k+1) from p(θd | θ1^(k+1), . . . , θd−1^(k+1), y)

Bayesian Biostatistics - Piracicaba 2014 340 • | k k k k k Full conditional distributions: p(θj θ1, . . . , θ(j−1), θ(j+1), . . . , θ(d−1), θd, y)

• Also called: full conditionals

• Under mild regularity conditions: θ^k, θ^(k+1), . . . ultimately are observations from the posterior distribution

With the help of advanced sampling algorithms (AR, ARS, ARMS, etc) sampling the full conditionals is done based on the prior × likelihood

Example VI.4: British coal mining disasters data

◃ British coal mining disasters data set: number of severe accidents in British coal mines from 1851 to 1962
◃ Decrease in the frequency of disasters from year 40 (+ 1850) onwards?

[Figure: number of disasters per year, 1851–1962]

Statistical model:

• Likelihood: Poisson process with a change point at k

◃ yi ∼ Poisson(θ) for i = 1, . . . , k

◃ yi ∼ Poisson(λ) for i = k + 1, . . . , n (n=112)

• Priors

◃ θ: Gamma(a1, b1), (a1 constant, b1 parameter)

◃ λ: Gamma(a2, b2), (a2 constant, b2 parameter) ◃ k: p(k) = 1/n

◃ b1: Gamma(c1, d1), (c1, d1 constants)

◃ b2: Gamma(c2, d2), (c2, d2 constants)

Full conditionals:

p(θ | y, λ, b1, b2, k) = Gamma(a1 + ∑_(i=1)^k yi, k + b1)
p(λ | y, θ, b1, b2, k) = Gamma(a2 + ∑_(i=k+1)^n yi, n − k + b2)
p(b1 | y, θ, λ, b2, k) = Gamma(a1 + c1, θ + d1)
p(b2 | y, θ, λ, b1, k) = Gamma(a2 + c2, λ + d2)
p(k | y, θ, λ, b1, b2) = π(y | k, θ, λ) / ∑_(j=1)^n π(y | j, θ, λ)

with π(y | k, θ, λ) = exp[k(λ − θ)] (θ/λ)^(∑_(i=1)^k yi)

◦ a1 = a2 = 0.5, c1 = c2 = 0, d1 = d2 = 1
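A minimal R sketch of this Gibbs sampler (it assumes y is the vector of the 112 yearly disaster counts, which are not reproduced here):

  gibbs.cp <- function(y, M = 5000, a1 = 0.5, a2 = 0.5,
                       c1 = 0, c2 = 0, d1 = 1, d2 = 1) {
    n <- length(y); cumy <- cumsum(y)
    out <- matrix(NA, M, 3, dimnames = list(NULL, c("theta", "lambda", "k")))
    k <- n %/% 2; b1 <- b2 <- 1                        # starting values
    for (m in 1:M) {
      theta  <- rgamma(1, a1 + cumy[k], k + b1)
      lambda <- rgamma(1, a2 + sum(y) - cumy[k], n - k + b2)
      b1 <- rgamma(1, a1 + c1, theta + d1)
      b2 <- rgamma(1, a2 + c2, lambda + d2)
      logp <- (1:n) * (lambda - theta) + cumy * (log(theta) - log(lambda))
      k <- sample(1:n, 1, prob = exp(logp - max(logp)))  # full conditional of k
      out[m, ] <- c(theta, lambda, k)
    }
    out
  }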

Posterior distributions:

[Figure: marginal posteriors of θ, λ and k]

◦ Posterior mode of k: 1891
◦ Posterior mean of θ/λ = 3.42, with 95% CI = [2.48, 4.59]

Note:

• In most published analyses of this data set b1 and b2 are given inverse gamma priors. The full conditionals are then also inverse gamma

• The results are almost the same ⇒ our analysis is a sensitivity analysis of the analyses seen in the literature

• Despite the classical full conditionals, the WinBUGS/OpenBUGS sampler for θ and λ are not standard gamma but rather a slice sampler. See Exercise 8.10.

Example VI.5: Osteoporosis study – Using the Gibbs sampler

Bayesian linear regression model with NI priors:

◃ Regression model: tbbmc_i = β0 + β1 bmi_i + εi (i = 1, . . . , n = 234)
◃ Priors: p(β0, β1, σ²) ∝ σ⁻²
◃ Notation: y = (tbbmc1, . . . , tbbmc234)^T, x = (bmi1, . . . , bmi234)^T

Full conditionals:
p(σ² | β0, β1, y) = Inv-χ²(n, s²β)
p(β0 | σ², β1, y) = N(r_β1, σ²/n)
p(β1 | σ², β0, y) = N(r_β0, σ²/(x^T x))
with
s²β = (1/n) ∑ (yi − β0 − β1 xi)²
r_β1 = (1/n) ∑ (yi − β1 xi)
r_β0 = ∑ (yi − β0) xi / (x^T x)

Bayesian Biostatistics - Piracicaba 2014 347 Comparison with Method of Composition:

Method of Composition
Parameter     2.5%     25%      50%      75%      97.5%    Mean     SD
β0            0.57     0.74     0.81     0.89     1.05     0.81     0.12
β1            0.032    0.038    0.040    0.043    0.049    0.040    0.004
σ²            0.069    0.078    0.083    0.088    0.100    0.083    0.008

Gibbs sampler
Parameter     2.5%     25%      50%      75%      97.5%    Mean     SD
β0            0.67     0.77     0.84     0.91     1.10     0.77     0.11
β1            0.030    0.036    0.040    0.042    0.046    0.039    0.0041
σ²            0.069    0.077    0.083    0.088    0.099    0.083    0.0077

◦ Method of Composition = 1,000 independently sampled values
◦ Gibbs sampler: burn-in = 500, total chain = 1,500

Bayesian Biostatistics - Piracicaba 2014 348 Index plot from Method of Composition:

[Figure: index plots of (a) $\beta_1$ and (b) $\sigma^2$ for the 1,000 sampled values]

Bayesian Biostatistics - Piracicaba 2014 349 Trace plot from Gibbs sampler:

[Figure: trace plots of (a) $\beta_1$ and (b) $\sigma^2$ over 1,500 iterations]

Bayesian Biostatistics - Piracicaba 2014 350 Trace versus index plot:

Comparison of index plot with trace plot shows:

• σ2: index plot and trace plot similar ⇒ (almost) independent sampling

• β1: trace plot shows slow mixing ⇒ quite dependent sampling

⇒ Method of Composition and Gibbs sampling: similar posterior measures of σ2

⇒ Method of Composition and Gibbs sampling: less similar posterior measures of β1

Bayesian Biostatistics - Piracicaba 2014 351 Autocorrelation:

◃ Autocorrelation of lag 1: correlation of $\beta_1^{k}$ with $\beta_1^{(k-1)}$ $(k = 1, \ldots)$
◃ Autocorrelation of lag 2: correlation of $\beta_1^{k}$ with $\beta_1^{(k-2)}$ $(k = 2, \ldots)$
. . .
◃ Autocorrelation of lag m: correlation of $\beta_1^{k}$ with $\beta_1^{(k-m)}$ $(k = m, \ldots)$

High autocorrelation:

⇒ burn-in part is larger (it takes longer to forget the initial position)
⇒ remaining part needs to be longer to obtain stable posterior measures
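In R, the lag-m autocorrelations of a chain can be inspected with the base function `acf`; `beta1_chain` is an assumed vector holding the sampled $\beta_1$ values after burn-in:

```r
## Inspecting chain autocorrelation with base R; `beta1_chain` is an assumed
## vector holding the sampled beta1 values after burn-in.
acf(beta1_chain, lag.max = 20)           # plot of autocorrelation versus lag
acf(beta1_chain, plot = FALSE)$acf[2]    # estimated lag-1 autocorrelation
```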

Bayesian Biostatistics - Piracicaba 2014 352 6.2.3 Remarks∗

• Full conditionals determine joint distribution

• Generate joint distribution from full conditionals

• Transition kernel

Bayesian Biostatistics - Piracicaba 2014 353 6.2.4 Review of Gibbs sampling approaches

Sampling the full conditionals is done via different algorithms depending on:

◃ Shape of the full conditional (classical versus general-purpose algorithm)
◃ Preference of the software developer:
◦ SAS® procedures GENMOD, LIFEREG and PHREG: ARMS algorithm
◦ WinBUGS: variety of samplers

Several versions of the basic Gibbs sampler:

◃ Deterministic- or systematic-scan Gibbs sampler: the d dimensions are visited in a fixed order
◃ Block Gibbs sampler: the d dimensions are split into m blocks of parameters and the Gibbs sampler is applied to the blocks

Bayesian Biostatistics - Piracicaba 2014 354 Review of Gibbs sampling approaches – The block Gibbs sampler

Block Gibbs sampler:

• Normal linear regression:
◃ p(σ² | β0, β1, y)
◃ p(β0, β1 | σ², y)

• May considerably speed up convergence, at the expense of more computational time per iteration

• WinBUGS: blocking option on

• SAS® procedure MCMC: allows the user to specify the blocks
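For the normal linear regression case above, the block draw of $(\beta_0, \beta_1)$ given $\sigma^2$ under the flat prior is a bivariate normal around the least-squares estimate; a hedged sketch (the design matrix `X = cbind(1, x)` is an assumption, and the MASS package supplies the multivariate normal draw):

```r
## Hedged sketch of the block draw p(beta0, beta1 | sigma2, y) for normal
## linear regression with the flat prior: bivariate normal around the
## least-squares estimate. `X` is the assumed n x 2 design matrix cbind(1, x);
## a full block Gibbs sampler would alternate this draw with the sigma2 draw.
library(MASS)                                        # for mvrnorm()
block_update_beta <- function(X, y, sigma2) {
  XtX_inv  <- solve(t(X) %*% X)
  beta_hat <- drop(XtX_inv %*% t(X) %*% y)           # least-squares estimate
  mvrnorm(1, mu = beta_hat, Sigma = sigma2 * XtX_inv)
}
```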

Bayesian Biostatistics - Piracicaba 2014 355 6.3 The Metropolis(-Hastings) algorithm

Metropolis-Hastings (MH) algorithm = general Markov chain Monte Carlo technique to sample from the posterior distribution that does not require the full conditionals

• Special case: Metropolis algorithm, proposed by Metropolis et al. in 1953

• General case: Metropolis-Hastings algorithm proposed by Hastings in 1970

• Became popular only after the publication of Gelfand & Smith's paper (1990)

• Further generalization: Reversible Jump MCMC algorithm by Green (1995)

Bayesian Biostatistics - Piracicaba 2014 356 6.3.1 The Metropolis algorithm

Sketch of algorithm:

• New positions are proposed by a proposal density q

• Proposed positions will be:
◃ Accepted:
◦ if the proposed location has a higher posterior probability: with probability 1
◦ otherwise: with probability proportional to the ratio of the posterior probabilities
◃ Rejected: otherwise

• The algorithm again satisfies the Markov property ⇒ MCMC algorithm

• Similarity with AR algorithm

Bayesian Biostatistics - Piracicaba 2014 357 Metropolis algorithm: the chain is at $\theta^{k}$ ⇒ the Metropolis algorithm samples the value $\theta^{(k+1)}$ as follows:

1. Sample a candidate $\tilde{\theta}$ from the symmetric proposal density $q(\tilde{\theta} \mid \theta)$, with $\theta = \theta^{k}$
2. The next value $\theta^{(k+1)}$ will be equal to:
• $\tilde{\theta}$ with probability $\alpha(\theta^{k}, \tilde{\theta})$ (accept proposal),
• $\theta^{k}$ otherwise (reject proposal),
with
$$\alpha(\theta^{k}, \tilde{\theta}) = \min\left(r = \frac{p(\tilde{\theta} \mid y)}{p(\theta^{k} \mid y)},\; 1\right)$$

The function $\alpha(\theta^{k}, \tilde{\theta})$ = probability of a move

Bayesian Biostatistics - Piracicaba 2014 358 The MH algorithm only requires the product of the prior and the likelihood to sample from the posterior

Bayesian Biostatistics - Piracicaba 2014 359 Example VI.7: SAP study – Metropolis algorithm for NI prior case

Settings as in Example VI.1, now apply Metropolis algorithm:

◃ Proposal density: $N(\theta^{k}, \Sigma)$ with $\theta^{k} = (\mu^{k}, (\sigma^{2})^{k})^{T}$ and $\Sigma = \text{diag}(0.03, 0.03)$

[Figure: panels (a) and (b) with sampled positions in the $(\mu, \sigma^2)$-plane]

◦ Jumps to any location in the (µ, σ²)-plane
◦ Burn-in = 500, total chain = 1,500

Bayesian Biostatistics - Piracicaba 2014 360 MH-sampling:

[Figure: successive MH moves in the $(\mu, \sigma^2)$-plane, three panels]

Bayesian Biostatistics - Piracicaba 2014 361 Marginal posterior distributions:

[Figure: marginal posterior distributions of (a) $\mu$ and (b) $\sigma^2$]

◦ Acceptance rate = 40%
◦ Burn-in = 500, total chain = 1,500

Bayesian Biostatistics - Piracicaba 2014 362 Trace plots:

[Figure: trace plots of (a) $\mu$ and (b) $\sigma^2$ for iterations 500-1,500]

◦ Accepted moves = blue color, rejected moves = red color

Bayesian Biostatistics - Piracicaba 2014 363 Second choice of proposal density:

◃ Proposal density: $N(\theta^{k}, \Sigma)$ with $\theta^{k} = (\mu^{k}, (\sigma^{2})^{k})^{T}$ and $\Sigma = \text{diag}(0.001, 0.001)$

[Figure: (a) sampled positions in the $(\mu, \sigma^2)$-plane and (b) marginal posterior of $\sigma^2$]

◦ Acceptance rate = 84%
◦ Poor approximation of the true distribution

Bayesian Biostatistics - Piracicaba 2014 364 Accepted + rejected positions:

[Figure: accepted and rejected positions in the $(\mu, \sigma^2)$-plane for proposal variances 0.03, 0.001 and 0.1]

Bayesian Biostatistics - Piracicaba 2014 365 Problem:

What should be the acceptance rate for a good Metropolis algorithm?

From theoretical work + simulations:

• Acceptance rate: 45% for d = 1 and ≈ 24% for d > 1
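A rough pilot-tuning loop, reusing the `metropolis` sketch above; `log_post` and the starting value c(7, 2) are assumptions, and the scaling factors and tolerance are arbitrary illustrations of aiming at the ~24% target:

```r
## Rough pilot-tuning loop, reusing the `metropolis` sketch above; `log_post`
## and the starting value c(7, 2) are assumptions, and the scaling factors and
## tolerance are arbitrary illustrations of aiming at the ~24% target.
sd_prop <- sqrt(0.03)
for (trial in 1:20) {
  fit <- metropolis(log_post, theta0 = c(7, 2), n_iter = 1000,
                    sd_prop = rep(sd_prop, 2))
  if (abs(fit$accept_rate - 0.24) < 0.05) break
  # too many acceptances -> proposal too timid -> enlarge it (and vice versa)
  sd_prop <- sd_prop * ifelse(fit$accept_rate > 0.24, 1.5, 0.67)
}
```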

Bayesian Biostatistics - Piracicaba 2014 366 6.3.2 The Metropolis-Hastings algorithm

Metropolis-Hastings algorithm: the chain is at $\theta^{k}$ ⇒ the Metropolis-Hastings algorithm samples the value $\theta^{(k+1)}$ as follows:

1. Sample a candidate $\tilde{\theta}$ from the (asymmetric) proposal density $q(\tilde{\theta} \mid \theta)$, with $\theta = \theta^{k}$
2. The next value $\theta^{(k+1)}$ will be equal to:
• $\tilde{\theta}$ with probability $\alpha(\theta^{k}, \tilde{\theta})$ (accept proposal),
• $\theta^{k}$ otherwise (reject proposal),
with
$$\alpha(\theta^{k}, \tilde{\theta}) = \min\left(r = \frac{p(\tilde{\theta} \mid y)\, q(\theta^{k} \mid \tilde{\theta})}{p(\theta^{k} \mid y)\, q(\tilde{\theta} \mid \theta^{k})},\; 1\right)$$

Bayesian Biostatistics - Piracicaba 2014 367 • Reversibility condition: probability of a move from $\theta$ to $\tilde{\theta}$ = probability of a move from $\tilde{\theta}$ to $\theta$

• Reversible chain: chain satisfying reversibility condition

• Example of an asymmetric proposal density: $q(\tilde{\theta} \mid \theta^{k}) \equiv q(\tilde{\theta})$ (Independent MH algorithm)

• WinBUGS makes use of univariate MH algorithm to sample from some non-standard full conditionals

Bayesian Biostatistics - Piracicaba 2014 368 Example VI.8: Sampling a t-distribution using Independent MH algorithm

Target distribution: $t_3(3, 2^2)$-distribution

(a) Independent MH algorithm with proposal density N(3, 4²)
(b) Independent MH algorithm with proposal density N(3, 2²)

[Figure: histograms of the sampled values with the target density overlaid, for proposals (a) N(3, 4²) and (b) N(3, 2²)]
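A sketch of this Independent MH sampler for choice (a), where the acceptance ratio multiplies the target ratio by the reversed proposal ratio (the chain length and starting value are arbitrary choices):

```r
## Sketch of the Independent MH algorithm for the t3(3, 2^2) target with the
## N(3, 4^2) proposal (choice (a) above).
dtarget <- function(x) dt((x - 3) / 2, df = 3) / 2   # location-scale t density
n_iter <- 10000
x <- numeric(n_iter); x[1] <- 3
for (k in 2:n_iter) {
  x_star <- rnorm(1, 3, 4)                 # proposal does not depend on x[k-1]
  # MH ratio: target ratio times the reversed proposal ratio
  r <- (dtarget(x_star) / dtarget(x[k - 1])) *
       (dnorm(x[k - 1], 3, 4) / dnorm(x_star, 3, 4))
  x[k] <- if (runif(1) < r) x_star else x[k - 1]
}
hist(x, breaks = 50, freq = FALSE); curve(dtarget, add = TRUE)
```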

Bayesian Biostatistics - Piracicaba 2014 369 6.3.3 Remarks*

• The Gibbs sampler is a special case of the Metropolis-Hastings algorithm, but the Gibbs sampler is still treated separately

• The transition kernel of the MH-algorithm

• The reversibility condition

• Difference with AR algorithm

Bayesian Biostatistics - Piracicaba 2014 370 6.5 Choice of the sampler

Choice of the sampler depends on a variety of considerations

Bayesian Biostatistics - Piracicaba 2014 371 Example VI.9: Caries study – MCMC approaches for logistic regression

Subset of n = 500 children of the Signal-Tandmobiel® study at the 1st examination:

◃ Research questions:
◦ Do girls have a different risk of developing caries experience (CE) than boys (gender) in the first year of primary school?
◦ Is there an east-west gradient (x-coordinate) in CE?
◃ Bayesian model: logistic regression + N(0, 100²) priors for the regression coefficients
◃ No standard full conditionals
◃ Three algorithms:
◦ Self-written R program: evaluate the full conditionals on a grid + ICDF method (see the sketch below)
◦ WinBUGS program: multivariate MH algorithm (blocking mode on)
◦ SAS® procedure MCMC: Random-Walk MH algorithm
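A hedged sketch of the grid + ICDF step used in the self-written R program: sample one regression coefficient from its non-standard full conditional by evaluating it on a grid and inverting the empirical CDF; the function `log_full_cond` and the grid bounds are assumptions for illustration:

```r
## Hedged sketch of the grid + ICDF step: sample one regression coefficient
## from its non-standard full conditional by evaluating it on a grid and
## inverting the empirical CDF. `log_full_cond` and the grid bounds are
## assumptions for illustration.
sample_grid_icdf <- function(log_full_cond, lower, upper, n_grid = 200) {
  grid <- seq(lower, upper, length.out = n_grid)
  logp <- vapply(grid, log_full_cond, numeric(1))
  p   <- exp(logp - max(logp))             # unnormalized density on the grid
  cdf <- cumsum(p) / sum(p)                # normalized empirical CDF
  u <- runif(1)
  grid[which.max(cdf >= u)]                # first grid point with CDF >= u
}
```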

Bayesian Biostatistics - Piracicaba 2014 372 Program Parameter Mode Mean SD Median MCSE Intercept -0.5900 0.2800 MLE gender -0.0379 0.1810 x-coord 0.0052 0.0017 Intercept -0.5880 0.2840 -0.5860 0.0104 R gender -0.0516 0.1850 -0.0578 0.0071 x-coord 0.0052 0.0017 0.0052 6.621E-5 Intercept -0.5800 0.2810 -0.5730 0.0094 WinBUGS gender -0.0379 0.1770 -0.0324 0.0060 x-coord 0.0052 0.0018 0.0053 5.901E-5 Intercept -0.6530 0.2600 -0.6450 0.0317 SASr gender -0.0319 0.1950 -0.0443 0.0208 x-coord 0.0055 0.0016 0.0055 0.00016

Bayesian Biostatistics - Piracicaba 2014 373 Conclusions:

• Posterior means/medians of the three samplers are close (to the MLE)

• Precision with which the posterior mean was determined (high precision = low MCSE) differs considerably

• The clinical conclusion was the same

⇒ Samplers may have quite a different efficiency

Bayesian Biostatistics - Piracicaba 2014 374 Take home messages

• The two MCMC approaches allow fitting basically any proposed model

• There is no free lunch: computation time can be MUCH longer than with likelihood approaches

• The choice between Gibbs sampling and the Metropolis-Hastings approach depends on computational and practical considerations

Bayesian Biostatistics - Piracicaba 2014 375