Bayesian Biostatistics - Piracicaba 2014

Emmanuel Lesaffre
Interuniversity Institute for Biostatistics and statistical Bioinformatics
KU Leuven & University Hasselt
[email protected]

Departamento de Ciências Exatas — University of Piracicaba
8 to 12 December 2014

Contents

I Basic concepts in Bayesian methods 1

0.1 Preface ...... 2
0.2 Time schedule of the course ...... 7
0.3 Aims of the course ...... 9

1 Modes of statistical inference ...... 10
1.1 The frequentist approach: a critical reflection ...... 11
1.1.1 The classical statistical approach ...... 12
1.1.2 The P-value as a measure of evidence ...... 15
1.1.3 The confidence interval as a measure of evidence ...... 26
1.2 Statistical inference based on the likelihood function ...... 28
1.2.1 The likelihood function ...... 29
1.2.2 The likelihood principles ...... 34
1.3 The Bayesian approach: some basic ideas ...... 43
1.3.1 Introduction ...... 44
1.3.2 Bayes theorem – Discrete version for simple events - 1 ...... 48
1.4 Outlook ...... 54

2 Bayes theorem: computing the posterior distribution ...... 59
2.1 Introduction ...... 60
2.2 Bayes theorem – The binary version ...... 61
2.3 Probability in a Bayesian context ...... 63
2.4 Bayes theorem – The categorical version ...... 66
2.5 Bayes theorem – The continuous version ...... 67
2.6 The binomial case ...... 69
2.7 The Gaussian case ...... 87
2.8 The Poisson case ...... 102
2.9 The prior and posterior of derived parameter ...... 113
2.10 Bayesian versus likelihood approach ...... 116
2.11 Bayesian versus frequentist approach ...... 117
2.12 The different modes of the Bayesian approach ...... 119
2.13 An historical note on the Bayesian approach ...... 120

3 Introduction to Bayesian inference ...... 127
3.1 Introduction ...... 128
3.2 Summarizing the posterior with probabilities ...... 129
3.3 Posterior summary measures ...... 130
3.3.1 Posterior mode, mean, median and SD ...... 131
3.3.2 Credible/credibility interval ...... 135
3.4 Predictive distributions ...... 140
3.4.1 Introduction ...... 141
3.4.2 Posterior predictive distribution: General case ...... 148
3.5 Exchangeability ...... 157
3.6 A normal approximation to the posterior ...... 158
3.7 Numerical techniques to determine the posterior ...... 159
3.7.1 Numerical integration ...... 160
3.7.2 Sampling from the posterior distribution ...... 162
3.7.3 Choice of posterior summary measures ...... 175
3.8 Bayesian hypothesis testing ...... 176
3.8.1 The Bayes factor ...... 183
3.9 Medical literature on Bayesian methods in RCTs ...... 185

4 More than one parameter ...... 196
4.1 Introduction ...... 197
4.2 Joint versus marginal posterior inference ...... 199
4.3 The normal distribution with μ and σ2 unknown ...... 201
4.3.1 No prior knowledge on μ and σ2 is available ...... 202
4.3.2 An historical study is available ...... 214
4.3.3 Expert knowledge is available ...... 218
4.4 Multivariate distributions ...... 220
4.5 Frequentist properties of Bayesian inference ...... 227
4.6 The Method of Composition ...... 228
4.7 Bayesian linear regression models ...... 231
4.7.1 The frequentist approach to linear regression ...... 232
4.7.2 A noninformative Bayesian linear regression model ...... 235
4.7.3 Posterior summary measures for the linear regression model ...... 236
4.7.4 Sampling from the posterior distribution ...... 239
4.8 Bayesian generalized linear models ...... 243
4.8.1 More complex regression models ...... 244

5 Choosing the prior distribution ...... 246
5.1 Introduction ...... 247
5.2 The sequential use of Bayes theorem ...... 248
5.3 Conjugate prior distributions ...... 249
5.3.1 Conjugate priors for univariate distributions ...... 250
5.3.2 Conjugate prior for normal distribution – mean and variance unknown ...... 258
5.3.3 Multivariate data distributions ...... 260
5.3.4 Conditional conjugate and semi-conjugate priors ...... 264
5.3.5 ...... 265
5.4 Noninformative prior distributions ...... 266
5.4.1 Introduction ...... 267
5.4.2 Expressing ignorance ...... 268
5.4.3 General principles to choose noninformative priors ...... 271
5.4.4 Improper prior distributions ...... 273
5.4.5 Weak/vague priors ...... 274
5.5 Informative prior distributions ...... 279
5.5.1 Introduction ...... 280
5.5.2 Data-based prior distributions ...... 285
5.5.3 Elicitation of prior knowledge ...... 286
5.5.4 Archetypal prior distributions ...... 294
5.6 Prior distributions for regression models ...... 302
5.6.1 Normal linear regression ...... 303
5.6.2 Generalized linear models ...... 305
5.7 Modeling priors ...... 306
5.8 Other regression models ...... 311

6 Markov chain Monte Carlo sampling ...... 314
6.1 Introduction ...... 315
6.2 The Gibbs sampler ...... 327
6.2.1 The bivariate Gibbs sampler ...... 328
6.2.2 The general Gibbs sampler ...... 340
6.2.3 Remarks* ...... 353
6.2.4 Review of approaches ...... 354
6.3 The Metropolis(-Hastings) algorithm ...... 356
6.3.1 The Metropolis algorithm ...... 357
6.3.2 The Metropolis-Hastings algorithm ...... 367
6.3.3 Remarks* ...... 370

7 Assessing and improving convergence of the Markov chain ...... 376
7.1 Introduction ...... 377
7.2 Assessing convergence of a Markov chain ...... 378
7.2.1 Definition of convergence for a Markov chain ...... 379
7.2.2 Checking convergence of the Markov chain ...... 381
7.2.3 Graphical approaches to assess convergence ...... 382
7.2.4 Graphical approaches to assess convergence ...... 383
7.2.5 Formal approaches to assess convergence ...... 394
7.2.6 Computing the Monte Carlo standard error ...... 409
7.2.7 Practical experience with the formal diagnostic procedures ...... 412
7.3 Accelerating convergence ...... 414
7.4 Practical guidelines for assessing and accelerating convergence ...... 418
7.5 Data augmentation ...... 419

8 Software ...... 436
8.1 WinBUGS and related software ...... 439
8.1.1 A first analysis ...... 440
8.1.2 Information on samplers ...... 450
8.1.3 Assessing and accelerating convergence ...... 452
8.1.4 Vector and matrix manipulations ...... 457
8.1.5 Working in batch mode ...... 461
8.1.6 Troubleshooting ...... 463
8.1.7 Directed acyclic graphs ...... 464
8.1.8 Add-on modules: GeoBUGS and PKBUGS ...... 468
8.1.9 Related software ...... 469
8.2 Bayesian analysis using SAS ...... 472
8.3 Additional Bayesian software and comparisons ...... 473
8.3.1 Additional Bayesian software ...... 474
8.3.2 Comparison of Bayesian software ...... 475

II Bayesian tools for statistical modeling 476

9 Hierarchical models ...... 477
9.1 The Poisson-gamma hierarchical model ...... 480
9.1.1 Introduction ...... 481
9.1.2 Model specification ...... 485
9.1.3 Posterior distributions ...... 491
9.1.4 Estimating the parameters ...... 495
9.1.5 Posterior predictive distributions ...... 502
9.2 Full versus Empirical Bayesian approach ...... 505
9.3 Gaussian hierarchical models ...... 509
9.3.1 Introduction ...... 510
9.3.2 The Gaussian hierarchical model ...... 511
9.3.3 Estimating the parameters ...... 512
9.4 Mixed models ...... 521
9.4.1 Introduction ...... 522
9.4.2 The linear mixed model ...... 523
9.4.3 The Bayesian generalized linear mixed model (BGLMM) ...... 536

10 Model building and assessment ...... 561
10.1 Introduction ...... 562
10.2 Measures for model selection ...... 564
10.2.1 The Bayes factor ...... 565
10.2.2 Information theoretic measures for model selection ...... 566
10.3 Model checking procedures ...... 584
10.3.1 Introduction ...... 585

III More advanced Bayesian modeling 599

11 Advanced modeling with Bayesian methods ...... 600
11.1 Example 1: Longitudinal profiles as covariates ...... 601
11.2 Example 2: relating caries on deciduous teeth with caries on permanent teeth ...... 612

IV Exercises 614

V Medical papers 644

Are you a Bayesian?

But ...

Part I

Basic concepts in Bayesian methods

0.1 Preface

A significant result on a small trial

 Small sized study with an unexpectedly positive result about a new medication to treat patients with oral cancer
 First reaction (certainly of the drug company) = "great!"
 Past: none of the medications had such a large effect and the new medication is not much different from the standard treatment
 Second reaction (if one is honest) = be cautious
 Then, You are a Bayesian

Mouthwash trial

 Trial that tests whether daily use of a new mouthwash before tooth brushing reduces plaque when compared to using tap water only
 Result: New mouthwash reduces 25% of the plaque with a 95% CI = [10%, 40%]
 From previous trials on similar products: overall reduction in plaque lies between 5% and 15%
 Experts: plaque reduction from a mouthwash does not exceed 30%
 What to conclude?
 Classical frequentist analysis: 25% reduction + 95% CI
 Conclusion ignores what is known from the past on similar products
 Likely conclusion in practice: truth must lie somewhere in-between 5% and 25%

Incomplete information

 Some studies do not have all the required data to tackle the research question
 Example: Determining the prevalence of a disease from a fallible diagnostic test
 Expert knowledge can fill in the gaps
 Bayesian approach is the only way to tackle the problem!

Bayesian approach mimics our natural life where learning is done by combining past and present experience

Most of the material is obtained from

0.2 Time schedule of the course

• Day 1
  Morning: Chapters 1 and 2
  Afternoon:
   ◦ Chapter 3, until Section 3.6
   ◦ Brief practical introduction to R & RStudio

• Day 2
  Morning: refreshing day 1 + Chapter 3, from Section 3.7
  Afternoon:
   ◦ Discussion of clinical papers
   ◦ Computer exercises in R

• Day 3
  Morning: Chapters 4 + 6 + computer exercises in R
  Afternoon: Chapter 8 via interactive WinBUGS computer session

• Day 4
  Morning: selection of topics in Chapters 5 and 7 + WinBUGS exercises
  Afternoon: Chapter 9 + WinBUGS exercises

• Day 5
  Morning: Chapter 10 + (R2)WinBUGS computer exercises
  Afternoon: exercises and advanced Bayesian modeling + wrap up

0.3 Aims of the course

• Understand the Bayesian paradigm
• Understand the use of Bayesian methodology in medical/biological papers
• Be able to build up a (not too complex) WinBUGS program
• Be able to build up a (not too complex) R2WinBUGS program

Chapter 1
Modes of statistical inference

Aims:
 Reflect on the 'classical approach' for statistical inference
 Look at a precursor of Bayesian inference: the likelihood approach
 A first encounter of the Bayesian approach

1.1 The frequentist approach: a critical reflection

• Review of the 'classical' approach to statistical inference

1.1.1 The classical statistical approach

Classical approach:
• Mix of two approaches (Fisher, and Neyman & Pearson)
• Here: based on P-value, significance level, power and confidence interval
• Example: RCT

Example I.1: Toenail RCT

• Randomized, double blind, parallel group, multi-center study (Debacker et al., 1996)
• Two treatments (A: Lamisil and B: Itraconazol) on 2 × 189 patients
• 12 weeks of treatment and 48 weeks of follow-up (FU)
• Significance level α = 0.05
• Sample size to ensure that β ≤ 0.20
• Primary endpoint = negative mycology (negative microscopy & negative culture)
• Here: unaffected nail length at week 48 on the big toenail
• 163 patients treated with A and 171 treated with B

1.1.2 The P-value as a measure of evidence

• A: μ1 & B: μ2
• H0: Δ = μ1 − μ2 = 0
• Completion of study: Δ = 1.38 with tobs = 2.19 in the 0.05 rejection region
• Neyman-Pearson: reject that A and B are equally effective
• Fisher: 2-sided P = 0.030 ⇒ strong evidence against H0

Wrong statement: Result is significant at a 2-sided α of 0.030. This gives the P-value an 'a priori status'.

Use and misuse of the P-value:
• The P-value is not the probability that H0 is (not) true
• The P-value depends on fictive data (Example I.2)
• The P-value depends on the sample space (Examples I.3 and I.4)
• The P-value is not an absolute measure
• The P-value does not take all evidence into account (Example I.5)

The P-value is not the probability that H0 is (not) true

Often the P-value is interpreted in a wrong manner:
• P-value = probability that the observed or a more extreme result occurs under H0
⇒ P-value = surprise index
• Probability has a long-run definition
• P-value ≠ p(H0 | y)
• p(H0 | y) = ...

The P-value depends on fictive data

• P-value = probability that the observed or a more extreme result occurs under H0
⇒ P-value is based not only on the observed result but also on fictive (never observed) data
• Example I.2

Example I.2: Graphical representation of P-value

P-value of RCT (Example I.1)

The P-value depends on the sample space

• P-value = probability that the observed or a more extreme result occurs if H0 is true
⇒ Calculation of the P-value depends on all possible samples (under H0)
• The possible samples are similar in some characteristics to the observed sample (e.g. same sample size)
• Examples I.3 and I.4

Example I.3: Accounting for interim analyses in a RCT

2 identical RCTs except for the number of analyses:
• RCT 1: 4 interim analyses + final analysis
  Correction for multiple testing
  Group sequential trial: Pocock's rule
  Global α = 0.05, nominal significance level = 0.016
• RCT 2: 1 final analysis
  Global α = 0.05, nominal significance level = 0.05
• If both trials run until the end and P = 0.02 for both trials, then for RCT 1: NO significance, for RCT 2: significance

Example I.4: Kaldor et al's case-control study

• Case-control study (Kaldor et al., 1990) to examine the impact of chemotherapy on leukaemia in Hodgkin's survivors
• 149 cases (leukaemia) and 411 controls
• Question: Does chemotherapy induce excess risk of developing solid tumors, leukaemia and/or lymphomas?

Treatment   Controls   Cases
No Chemo    160        11
Chemo       251        138
Total       411        149

• Pearson χ²(1)-test: P = 7.8959 × 10^−13
• Fisher's Exact test: P = 1.487 × 10^−14
• Odds ratio = 7.9971 with a 95% confidence interval = [4.19, 15.25]
• Reason for the difference: the 2 sample spaces are different
  Pearson χ²(1)-test: condition on n
  Fisher's Exact test: condition on the marginal totals

The P-value is not an absolute measure

• A small P-value does not necessarily indicate a large difference between treatments, a strong association, etc.
• Interpretation of a small P-value in a small/large study
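The odds ratio and confidence interval quoted for the Kaldor table can be checked with a few lines of code. This is a side computation, not part of the course slides; the Woolf (log-odds) interval is assumed here because it reproduces the reported [4.19, 15.25]:

```python
from math import exp, log, sqrt

# Kaldor et al. (1990) 2x2 table
#            controls  cases
# No chemo      160      11
# Chemo         251     138
a, b = 138, 251   # cases, controls among chemo-treated
c, d = 11, 160    # cases, controls among non-chemo

or_hat = (a * d) / (b * c)   # odds ratio

# Woolf 95% CI on the log-odds-ratio scale
se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
ci = (exp(log(or_hat) - 1.96 * se), exp(log(or_hat) + 1.96 * se))

print(round(or_hat, 4))                  # 7.9971
print(round(ci[0], 2), round(ci[1], 2))  # 4.19 15.25
```

The two P-values on the slide differ because the Pearson and Fisher tests condition on different quantities, but the effect estimate itself does not depend on that choice.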

The P-value does not take all evidence into account

• Studies are analyzed in isolation, no reference to historical data
• Why not incorporate past information in the current study?

Example I.5: Merseyside registry results

• Subsequent registry study in the UK
• Preliminary results of the Merseyside registry: P = 0.67
• Conclusion: no excess effect of chemotherapy (?)

Treatment   Controls   Cases
No Chemo    3          0
Chemo       3          2
Total       6          2

1.1.3 The confidence interval as a measure of evidence

• 95% confidence interval: expression of uncertainty on the parameter of interest
• Technical definition: in 95 out of 100 studies the true parameter is enclosed
• In each study the confidence interval includes/does not include the true value
• Practical interpretation has a Bayesian nature

Example I.6: 95% confidence interval toenail RCT

• 95% confidence interval for Δ = [0.14, 2.62]
• Interpretation: most likely (with 0.95 probability) Δ lies between 0.14 and 2.62 = a Bayesian interpretation

1.2 Statistical inference based on the likelihood function

• Inference purely on the likelihood function has not been developed into a full-blown statistical approach
• Considered here as a precursor to the Bayesian approach

1.2.1 The likelihood function

• Likelihood was introduced by Fisher in 1922
• Likelihood function = plausibility of the observed data as a function of the parameters of the stochastic model
• Inference based on the likelihood function is QUITE different from inference based on the P-value
• In the likelihood approach, one conditions on the observed data

Example I.7: A surgery experiment

• New but rather complicated surgical technique
• Surgeon operates on n = 12 patients with s = 9 successes
• Notation:
 ◦ Result on ith operation: success yi = 1, failure yi = 0
 ◦ Total experiment: n operations with s successes
 ◦ Sample {y1, ..., yn} ≡ y
 ◦ Probability of success = p(yi = 1) = θ
⇒ Binomial distribution: expresses the probability of s successes out of n:

fθ(s) = (n choose s) θ^s (1 − θ)^(n−s), with s = Σ(i=1..n) yi

• θ fixed & function of s: fθ(s) is a discrete distribution with Σ(s=0..n) fθ(s) = 1
• s fixed & function of θ: ⇒ binomial likelihood function L(θ|s)

Binomial distribution

[Figure: the binomial likelihood L(θ|s) as a function of θ]

◦ The maximum likelihood estimate (MLE) θ̂ maximizes L(θ|s)
◦ Maximizing L(θ|s) is equivalent to maximizing log[L(θ|s)] ≡ ℓ(θ|s)

Example I.7 – Determining the MLE

To determine the MLE, the first derivative of the log-likelihood function is needed:
• ℓ(θ|s) = c + s ln θ + (n − s) ln(1 − θ)
• dℓ(θ|s)/dθ = s/θ − (n − s)/(1 − θ) = 0 ⇒ θ̂ = s/n
• For s = 9 and n = 12 ⇒ θ̂ = 0.75
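The closed-form MLE above can be verified numerically. The sketch below (not part of the course slides; `loglik`, `grid` and `theta_hat` are names introduced here) maximizes the log-likelihood over a grid:

```python
from math import log

# Log-likelihood for Example I.7 (s = 9 successes in n = 12 operations),
# up to the additive constant c
def loglik(theta, s=9, n=12):
    return s * log(theta) + (n - s) * log(1 - theta)

# Grid search over (0, 1); the maximiser should agree with s/n = 0.75
grid = [i / 1000 for i in range(1, 1000)]   # avoid theta = 0 and 1
theta_hat = max(grid, key=loglik)
print(theta_hat)   # 0.75
```

Because ℓ(θ|s) is concave in θ, the grid maximiser lands exactly on the analytic solution whenever s/n is a grid point.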

Bayesian Biostatistics - Piracicaba 2014 32 Bayesian Biostatistics - Piracicaba 2014 33 1.2.2 The likelihood principles 1

Two likelihood principles (LP):

• LP 1:Allevidence, which is obtained from an experiment, about an unknown quantity θ,iscontained in the likelihood function of θ for the given data ⇒ LP 1:Allevidence, which is obtained from an experiment, about an unknown quantity θ,iscontained in the likelihood function of θ for the given data  Standardized likelihood  Interval of evidence

• LP 2: Two likelihood functions for θ contain the same information about θ if they are proportional to each other.

Bayesian Biostatistics - Piracicaba 2014 34 Bayesian Biostatistics - Piracicaba 2014 35

Example I.7 (continued)

Inference on the likelihood function:

• Maximal evidence for θ̂ = 0.75
• Likelihood ratio L(0.5|s)/L(0.75|s) = relative evidence for the 2 hypotheses θ = 0.5 & θ = 0.75 (0.21 ⇒ ??)
• Standardized likelihood: LS(θ|s) ≡ L(θ|s)/L(θ̂|s)
• LS(0.5|s) = 0.21 = test for hypothesis H0 without involving fictive data
• Interval of (≥ 1/2 maximal) evidence
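The standardized likelihood value 0.21 quoted on the slide is easy to reproduce; a minimal sketch (not from the course material; `lik` and `LS_05` are names introduced here):

```python
def lik(theta, s=9, n=12):
    # binomial likelihood kernel; the constant (n choose s) cancels in ratios
    return theta**s * (1 - theta)**(n - s)

theta_hat = 0.75                     # MLE from Example I.7
LS_05 = lik(0.5) / lik(theta_hat)    # standardized likelihood at theta = 0.5
print(round(LS_05, 2))               # 0.21
```

The interval of evidence would collect all θ with LS(θ|s) ≥ 1/2, i.e. the values whose likelihood is at least half the maximal one.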

Likelihood principle 2

LP 2: Two likelihood functions for θ contain the same information about θ if they are proportional to each other

⇒ When the likelihood is proportional under two experimental conditions, then the information about θ must be the same!

Example I.8: Another surgery experiment

 Surgeon 1: (Example I.7) Operates on n = 12 patients, observes s = 9 successes (and 3 failures)
 Surgeon 2: Operates on n patients until k = 3 failures are observed (n = s + k). And, suppose s = 9

 Surgeon 1: s = Σ(i=1..n) yi has a binomial distribution
⇒ binomial likelihood L1(θ|s) = (n choose s) θ^s (1 − θ)^(n−s)
 Surgeon 2: s has a negative binomial (Pascal) distribution
⇒ negative binomial likelihood L2(θ|s) = ((s+k−1) choose s) θ^s (1 − θ)^k

Inference on the likelihood function:

• H0: θ = 0.5 & HA: θ > 0.5
• Surgeon 1: Calculation of the P-value = 0.0730
  p[s ≥ 9 | θ = 0.5] = Σ(s=9..12) (12 choose s) 0.5^s (1 − 0.5)^(12−s)
• Surgeon 2: Calculation of the P-value = 0.0327
  p[s ≥ 9 | θ = 0.5] = Σ(s=9..∞) ((2+s) choose s) 0.5^s (1 − 0.5)^3

LP 2: the 2 experiments give us the same information about θ
Frequentist conclusion ≠ Likelihood conclusion
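The two tail probabilities can be computed exactly: they are 299/4096 ≈ 0.0730 under the binomial sample space and 67/2048 ≈ 0.0327 under the negative binomial one. A minimal sketch (not part of the course slides):

```python
from math import comb

theta0 = 0.5

# Surgeon 1: n = 12 fixed -> binomial sample space, P(s >= 9 | theta0)
p1 = sum(comb(12, s) * theta0**s * (1 - theta0)**(12 - s) for s in range(9, 13))

# Surgeon 2: stop at k = 3 failures -> negative binomial sample space,
# P(s >= 9) = 1 - P(s <= 8), with P(s) = C(s+2, s) theta0^s (1-theta0)^3
p2 = 1.0 - sum(comb(s + 2, s) * theta0**s * (1 - theta0)**3 for s in range(0, 9))

print(round(p1, 4), round(p2, 4))   # 0.073 0.0327
```

Same data, same likelihood kernel θ^9 (1 − θ)^3, yet different P-values: only the hypothetical unobserved samples differ between the two designs.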

Conclusion:

Design aspects (stopping rule) are important in the frequentist context

When likelihoods are proportional, it is not important in the likelihood approach how the data were obtained

1.3 The Bayesian approach: some basic ideas

• Bayesian methodology = topic of the course
• Statistical inference through a different type of "glasses"

1.3.1 Introduction

• Examples I.7 and I.8: combination of information from a similar historical surgical technique could be used in the evaluation of the current technique = Bayesian exercise
• Planning phase III study:
  Comparison new ⇔ old treatment for treating breast cancer
  Background information is incorporated when writing the protocol
  Background information is not incorporated in the statistical analysis
  Suppose a small-scaled study with an unexpectedly positive result (P < 0.01) ⇒ Reaction???
• Medical device trials:
  The effect of a medical device is better understood than that of a drug
  It is very difficult to motivate surgeons to use concurrent controls to compare the new device with the control device
  Can we capitalize on the information of the past to evaluate the performance of the new device? Using only a single-arm trial??

Example I.9: Examples of Bayesian reasoning in daily life

• Tourist example: Prior view on Belgians + visit to Belgium (data) ⇒ posterior view on Belgians
• Marketing example: Launch of a new energy drink on the market
• Medical example: Patients treated for CVA with a thrombolytic agent suffer from SBAs. Historical studies (20% - prior), pilot study (10% - data) ⇒ posterior

Central idea of Bayesian approach:

Combine the likelihood (data) with Your prior knowledge (prior probability) to update the information on the parameter, resulting in a revised probability associated with the parameter (posterior probability)

1.3.2 Bayes theorem – Discrete version for simple events - 1

• A (diseased) & B (positive diagnostic test)
• A^C (not diseased) & B^C (negative diagnostic test)
• p(A, B) = p(A) · p(B | A) = p(B) · p(A | B)
• Bayes theorem = Theorem on Inverse Probability:

p(B | A) = p(A | B) · p(B) / p(A)

Bayes theorem – version II:

p(B | A) = p(A | B) · p(B) / [p(A | B) · p(B) + p(A | B^C) · p(B^C)]

Example I.10: Sensitivity, specificity, prevalence and predictive values

• Folin-Wu blood test: screening test for diabetes (Boston City Hospital)
• B = "diseased", A = "positive diagnostic test"
• Characteristics of the diagnostic test:
  Sensitivity (Se) = p(A | B)
  Specificity (Sp) = p(A^C | B^C)
  Positive predictive value (pred+) = p(B | A)
  Negative predictive value (pred-) = p(B^C | A^C)
  Prevalence (prev) = p(B)
• pred+ calculated from Se, Sp and prev using Bayes theorem

Test    Diabetic   Non-diabetic   Total
+       56         49             105
-       14         461            475
Total   70         510            580

◦ Se = 56/70 = 0.80
◦ Sp = 461/510 = 0.90
◦ prev = 70/580 = 0.12

• Bayes theorem:

p(D+ | T+) = p(T+ | D+) · p(D+) / [p(T+ | D+) · p(D+) + p(T+ | D−) · p(D−)]

• In terms of Se, Sp and prev:

pred+ = Se · prev / [Se · prev + (1 − Sp) · (1 − prev)]

• For p(B) = 0.03 ⇒ pred+ = 0.20 & pred- = 0.99
• For p(B) = 0.30 ⇒ pred+ = ?? & pred- = ??
• Individual prediction: combine prior knowledge (prevalence of diabetes in the population) with the result of the Folin-Wu blood test on the patient to arrive at a revised opinion for the patient
• Folin-Wu blood test: prior (prevalence) = 0.10 & positive test ⇒ posterior = 0.47
• Obvious, but important: it is not possible to find the probability of having a disease based on test results without specifying the disease's prevalence.
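The predictive values above follow directly from Bayes theorem; a minimal sketch (not part of the course slides; `pred_pos` and `pred_neg` are helper names introduced here):

```python
def pred_pos(se, sp, prev):
    # p(D+ | T+) via Bayes theorem
    return se * prev / (se * prev + (1 - sp) * (1 - prev))

def pred_neg(se, sp, prev):
    # p(D- | T-) via Bayes theorem
    return sp * (1 - prev) / (sp * (1 - prev) + (1 - se) * prev)

se, sp = 0.80, 0.90   # Folin-Wu test characteristics

print(round(pred_pos(se, sp, 0.03), 2))   # 0.2
print(round(pred_neg(se, sp, 0.03), 2))   # 0.99
print(round(pred_pos(se, sp, 0.10), 2))   # 0.47
```

For the slide's open question, prev = 0.30 gives pred+ ≈ 0.77 and pred- ≈ 0.91 with the same formulas: the same test yields very different predictive values once the prevalence changes.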

Bayesian Biostatistics - Piracicaba 2014 52 Bayesian Biostatistics - Piracicaba 2014 53 1.4 Outlook Example I.12: Toenail RCT – A Bayesian analysis

• Bayes theorem will be further developed in the next chapter ⇒ such that it becomes useful in statistical practice
• Reanalyze examples such as those seen in this chapter
• Valid question: What can a Bayesian analysis do more than a classical frequentist analysis?
• Six additional chapters are needed to develop useful Bayesian tools
• But it is worth the effort!

Example I.12: Toenail RCT – A Bayesian analysis

Re-analysis of the toenail data using WinBUGS (the most popular Bayesian software)
• Figure 1.4 (a): AUC on the positive x-axis = our posterior belief that Δ is positive (= 0.98)
• Figure 1.4 (b): AUC for the interval [1, ∞) = our posterior belief that μ1/μ2 > 1
• Figure 1.4 (c): incorporation of a skeptical prior (a priori Δ around -0.5); posterior belief that Δ is positive (= 0.95)
• Figure 1.4 (d): incorporation of information on the variance parameters (σ2²/σ1² varies around 2)


True stories ...

 A Bayesian and a frequentist were sentenced to death. When the judge asked what their final wishes were, the Bayesian replied that he wished to teach the frequentist the ultimate lesson. The judge granted his request and then repeated the question to the frequentist. He replied that he wished to get the lesson again and again and again ...

 A Bayesian is walking in the fields, expecting to meet horses. However, suddenly he runs into a donkey. He looks at the animal and continues on his path, concluding that he saw a mule.

Take home messages

• The P-value does not bring the message as it is perceived by many, i.e. it is not the probability that H0 is true or false
• The confidence interval is generally accepted as a better tool for inference, but we interpret it in a Bayesian way
• There are other ways of statistical inference
• Pure likelihood inference = Bayesian inference without a prior
• Bayesian inference builds on likelihood inference by adding what is known of the problem

Chapter 2
Bayes theorem: computing the posterior distribution

Aims:
 Derive the general expression of Bayes theorem
 Exemplify the computations

2.1 Introduction

In this chapter:
• Bayes theorem for binary outcomes, counts and continuous outcomes
• Derivation of the posterior distribution: for the binomial, normal and Poisson case
• A variety of examples

2.2 Bayes theorem – The binary version

◦ D+ ≡ (θ = 1) and D− ≡ (θ = 0) (diabetes)
◦ T+ ≡ (y = 1) and T− ≡ (y = 0) (Folin-Wu test)

p(θ = 1 | y = 1) = p(y = 1 | θ = 1) · p(θ = 1) / [p(y = 1 | θ = 1) · p(θ = 1) + p(y = 1 | θ = 0) · p(θ = 0)]

◦ p(θ = 1), p(θ = 0): prior probabilities
◦ p(y = 1 | θ = 1): likelihood
◦ p(θ = 1 | y = 1): posterior probability
⇒ Now the parameter also has a probability

Bayes theorem

Shorthand notation:

p(θ | y) = p(y | θ) p(θ) / p(y)

where θ can stand for θ = 0 or θ = 1.

2.3 Probability in a Bayesian context

Bayesian probability = expression of Our/Your uncertainty of the parameter value
• Coin tossing: the truth is there, but unknown to us
• Diabetes: from population to individual patient

Probability can have two meanings: limiting proportion (objective) or personal belief (subjective)

Other examples of Bayesian probabilities

Subjective probability varies with individual, in time, etc.
• Tour de France
• FIFA World Cup 2014
• Global warming
• ...

Subjective probability rules

Let A1, A2, ..., AK be mutually exclusive events with total event S. Subjective probability p should be coherent:
• p(Ak) ≥ 0 (k = 1, ..., K)
• p(S) = 1
• p(A^C) = 1 − p(A)
• With B1, B2, ..., BL another set of mutually exclusive events:

p(Ai | Bj) = p(Ai, Bj) / p(Bj)

2.4 Bayes theorem – The categorical version

• A subject can belong to K > 2 classes: θ1, θ2, ..., θK
• y takes L different values: y1, ..., yL, or is continuous
⇒ Bayes theorem for a categorical parameter:

p(θk | y) = p(y | θk) p(θk) / Σ(k=1..K) p(y | θk) p(θk)

2.5 Bayes theorem – The continuous version

• 1-dimensional continuous parameter θ
• i.i.d. sample y = y1, ..., yn
• Joint distribution of the sample = p(y | θ) = Π(i=1..n) p(yi | θ) = likelihood L(θ | y)
• Prior density function p(θ)
• Split up: p(y, θ) = p(y | θ) p(θ) = p(θ | y) p(y)
⇒ Bayes theorem for continuous parameters:

p(θ | y) = L(θ | y) p(θ) / p(y) = L(θ | y) p(θ) / ∫ L(θ | y) p(θ) dθ
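The continuous version of Bayes theorem can be made concrete by discretizing θ on a grid: multiply likelihood by prior pointwise, then divide by the averaged likelihood p(y). A minimal sketch (not from the course slides; the counts and a uniform prior are illustrative assumptions):

```python
from math import comb

# Discretized Bayes theorem: p(theta | y) ∝ L(theta | y) p(theta),
# here for a binomial likelihood with a uniform prior on (0, 1)
n, y = 50, 10          # illustrative counts (ECASS 3 interim data)
m = 10_000             # number of grid points
thetas = [(i + 0.5) / m for i in range(m)]

prior  = [1.0] * m                                               # p(theta)
lik    = [comb(n, y) * t**y * (1 - t)**(n - y) for t in thetas]  # L(theta|y)
unnorm = [l * p for l, p in zip(lik, prior)]

p_y  = sum(unnorm) / m            # averaged likelihood  = integral L*p dtheta
post = [u / p_y for u in unnorm]  # posterior density on the grid

# With a uniform prior the posterior is Beta(y+1, n-y+1), mean (y+1)/(n+2)
post_mean = sum(t * q for t, q in zip(thetas, post)) / m
print(round(sum(post) / m, 6))   # 1.0  (posterior integrates to 1)
print(round(post_mean, 4))       # 0.2115
```

The same grid recipe works for any 1-dimensional prior/likelihood pair, which is why "the posterior involves integration" is rarely a practical obstacle in one dimension.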

• Shorter: p(θ | y) ∝ L(θ | y) p(θ)
• ∫ L(θ | y) p(θ) dθ = averaged likelihood
• Averaged likelihood ⇒ the posterior distribution involves integration
• θ is now a random variable and is described by a probability distribution. All because we express our uncertainty on the parameter
• In the Bayesian approach (as in the likelihood approach), one only looks at the observed data. No fictive data are involved
• One says: In the Bayesian approach, one conditions on the observed data. In other words, the data are fixed and p(y) is a constant!

2.6 The binomial case

Example II.1: Stroke study – Monitoring safety

 Rt-PA: thrombolytic for ischemic stroke
 Historical studies ECASS 1 and ECASS 2: complication SICH
 ECASS 3 study: patients with ischemic stroke (Tx between 3 & 4.5 hours)
 DSMB: monitor SICH in ECASS 3
 Fictive situation:
◦ First interim analysis ECASS 3: 50 rt-PA patients with 10 SICHs
◦ Historical data ECASS 2: 100 rt-PA patients with 8 SICHs
 Estimate the risk for SICH in ECASS 3 ⇒ construct a Bayesian stopping rule


• Frequentist • SICH incidence: θ

• Likelihood • i.i.d. Bernoulli random variables y1,...,yn

• Bayesian - different prior distributions • SICH: yi =1, otherwise yi =0  Prior information is available: from ECASS 2 study  • n | n y − (n−y) y = 1 yi has Bin(n, θ): p(y θ)= θ (1 θ)  Experts express their opinion: subjective prior y  No prior information is available: non-informative prior

• Exemplify mechanics of calculating the posterior distribution using Bayes theorem

Bayesian Biostatistics - Piracicaba 2014 70 Bayesian Biostatistics - Piracicaba 2014 71

Frequentist approach

• MLE θ̂ = y/n = 10/50 = 0.20
• Test hypothesis θ = 0.08 with a binomial test (8% = value of the ECASS 2 study)
• Classical 95% confidence interval = [0.089, 0.31]

Likelihood inference

• MLE θ̂ = 0.20
• No hypothesis test is performed
• 0.95 interval of evidence = [0.09, 0.36]

Bayesian approach: prior obtained from ECASS 2 study

1. Specifying the (ECASS 2) prior distribution
2. Constructing the posterior distribution
3. Characteristics of the posterior distribution
4. Equivalence of prior information and extra data

1. Specifying the (ECASS 2) prior distribution

• ECASS 2 likelihood: L(θ|y0) = (n0 choose y0) θ^y0 (1 − θ)^(n0−y0) (y0 = 8 & n0 = 100)
• ECASS 2 likelihood expresses prior belief on θ, but is not (yet) a prior distribution
• As a function of θ, L(θ|y0) ≠ density (AUC ≠ 1)
• How to standardize? Numerically or analytically?

[Figure: ECASS 2 likelihood as a function of θ]

• Kernel of binomial likelihood θ^y0 (1 − θ)^(n0−y0) ∝ beta density Beta(α0, β0):

  p(θ) = 1/B(α0, β0) θ^(α0−1) (1 − θ)^(β0−1)

  with B(α, β) = Γ(α)Γ(β)/Γ(α + β), Γ(·) the gamma function

• α0 (= 9) ≡ y0 + 1
• β0 (= 100 − 8 + 1 = 93) ≡ n0 − y0 + 1

Some beta densities

[Figure: beta densities for several (α, β) combinations]

2. Constructing the posterior distribution

• Bayes theorem needs:
  ◦ Prior p(θ) (ECASS 2 study)
  ◦ Likelihood L(θ|y) (ECASS 3 interim analysis), y = 10 & n = 50
  ◦ Averaged likelihood ∫ L(θ|y)p(θ)dθ

• Numerator of Bayes theorem:

  L(θ|y) p(θ) = (n choose y)/B(α0, β0) θ^(α0+y−1) (1 − θ)^(β0+n−y−1)

• Denominator of Bayes theorem = averaged likelihood:

  p(y) = ∫ L(θ|y) p(θ) dθ = (n choose y) B(α0 + y, β0 + n − y)/B(α0, β0)

⇒ Posterior distribution = Beta(α, β):

  p(θ|y) = 1/B(α, β) θ^(α−1) (1 − θ)^(β−1), with α = α0 + y and β = β0 + n − y
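The conjugate beta-binomial update with the ECASS numbers can be checked numerically. This is a sketch (not part of the course code), assuming `scipy` is available:

```python
from scipy.stats import beta

# ECASS 2 prior: alpha0 = y0 + 1, beta0 = n0 - y0 + 1
y0, n0 = 8, 100
alpha0, beta0 = y0 + 1, n0 - y0 + 1                 # Beta(9, 93)

# ECASS 3 interim data: y SICHs among n rt-PA patients
y, n = 10, 50

# Conjugate update: beta prior + binomial likelihood -> beta posterior
alpha_post, beta_post = alpha0 + y, beta0 + n - y   # Beta(19, 133)
posterior = beta(alpha_post, beta_post)

print(alpha_post, beta_post, round(posterior.mean(), 3))
```

The same two updating lines apply at any later interim analysis: the current posterior becomes the next prior.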

2. Prior, likelihood & posterior

[Figure: ECASS 2 prior, ECASS 3 interim likelihood and posterior as functions of θ]

3. Characteristics of the posterior distribution

• Posterior = compromise between prior & likelihood
• Posterior mode: θM = n0/(n0 + n) θ̂0 + n/(n0 + n) θ̂
• Shrinkage: θ̂0 ≤ θM ≤ θ̂ (when y0/n0 ≤ y/n)
• Posterior more peaked than prior & likelihood here, but not in general
• Posterior = beta distribution = prior (conjugacy)
• Likelihood dominates the prior for large sample sizes
• NOTE: posterior mode θM = MLE of combined ECASS 2 data & interim data ECASS 3 (!!!), i.e.

  θM = (y0 + y)/(n0 + n)
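The shrinkage identity can be verified in a couple of lines (a sketch using the study totals quoted on the slides):

```python
y0, n0 = 8, 100      # ECASS 2 ("prior data")
y, n = 10, 50        # ECASS 3 interim data
theta0_hat, theta_hat = y0 / n0, y / n        # 0.08 and 0.20

# Posterior mode = weighted average of the two MLEs = MLE of the pooled data
theta_M = (n0 / (n0 + n)) * theta0_hat + (n / (n0 + n)) * theta_hat
print(round(theta_M, 3))    # 0.12, and indeed (y0 + y)/(n0 + n) = 18/150
```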

Example dominance of likelihood

[Figure: prior and posterior virtually coincide with the likelihood, which is based on 5,000 subjects with 800 “successes”]

4. Equivalence of prior information and extra data

• Beta(α0, β0) prior
  ≡ binomial experiment with (α0 − 1) successes in (α0 + β0 − 2) experiments
⇒ prior ≈ extra data added to the observed data set: (α0 − 1) successes and (β0 − 1) failures

Bayesian approach: using a subjective prior

• Suppose the DSMB neurologists ‘believe’ that the SICH incidence is probably more than 5%, but most likely not more than 20%
• If prior belief = ECASS 2 prior density ⇒ posterior inference is the same
• The neurologists could also combine their qualitative prior belief with the ECASS 2 data to construct a prior distribution ⇒ adjust ECASS 2 prior

Example subjective prior

[Figure: subjective prior density for θ]

Bayesian approach: no prior information is available

• Suppose no prior information is available
• Need: a prior distribution that expresses ignorance = noninformative (NI) prior
• For the stroke study: NI prior = p(θ) = I[0,1] = flat prior on [0,1]
• Uniform prior on [0,1] = Beta(1,1)

2.7 The Gaussian case

Example II.2: Dietary study – Monitoring dietary behavior in Belgium

• IBBENS study: dietary survey in Belgium
• Of interest: intake of cholesterol
• Monitoring dietary behavior in Belgium: IBBENS-2 study
• Assume σ is known

Bayesian approach: prior obtained from the IBBENS study

1. Specifying the (IBBENS) prior distribution
2. Constructing the posterior distribution
3. Characteristics of the posterior distribution
4. Equivalence of prior information and extra data

1. Specifying the (IBBENS) prior distribution

• Histogram of the dietary cholesterol of 563 bank employees ≈ normal
• y ∼ N(μ, σ²) with

  f(y) = 1/(√(2π) σ) exp(−(y − μ)²/2σ²)

• Sample y1, ..., yn ⇒ likelihood

  L(μ|y) ∝ exp(−1/(2σ²) Σ (yi − μ)²) ∝ exp(−½ ((μ − ȳ)/(σ/√n))²) ≡ L(μ|ȳ)

Prior and likelihood

• Denote the sample of n0 IBBENS data: y0 ≡ {y0,1, ..., y0,n0} with mean ȳ0
• Likelihood ∝ N(μ0, σ0²), with μ0 ≡ ȳ0 = 328 and σ0 = σ/√n0 = 120.3/√563 = 5.072
• IBBENS prior distribution (σ is known):

  p(μ) = 1/(√(2π) σ0) exp(−½ ((μ − μ0)/σ0)²), with μ0 ≡ ȳ0

2. Constructing the posterior distribution

• IBBENS-2 study: sample mean ȳ with n = 50
  ◦ ȳ = 318 mg/day & s = 119.5 mg/day
  ◦ 95% confidence interval = [284.3, 351.9] mg/day ⇒ wide
• Combine IBBENS prior distribution with IBBENS-2 normal likelihood:
  ◦ IBBENS-2 likelihood: L(μ|ȳ)
  ◦ IBBENS prior density: p(μ) = N(μ0, σ0²)
• Posterior distribution ∝ p(μ)L(μ|ȳ):

  p(μ|y) ≡ p(μ|ȳ) ∝ exp{−½ [((μ − μ0)/σ0)² + ((μ − ȳ)/(σ/√n))²]}

  ◦ Integration constant to obtain a density?
  ◦ Recognize a standard distribution: exponent is a quadratic function of μ
  ◦ Posterior distribution:

    p(μ|y) = N(μ̄, σ̄²), with μ̄ = (μ0/σ0² + (n/σ²)ȳ)/(1/σ0² + n/σ²) and σ̄² = 1/(1/σ0² + n/σ²)

• Here: μ̄ = 327.2 and σ̄ = 4.79
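The precision-weighted update for the normal mean can be reproduced directly. In this sketch the IBBENS-2 sample SD is plugged in for the "known" σ, so the posterior SD comes out near, but not exactly at, the 4.79 reported on the slides:

```python
import math

# IBBENS prior for mu (sigma treated as known)
mu0, sigma0 = 328.0, 120.3 / math.sqrt(563)      # prior SD ~ 5.07
# IBBENS-2 data; the sample SD is used as the "known" sigma (assumption)
ybar, n, sigma = 318.0, 50, 119.5

w0 = 1 / sigma0**2            # prior precision
w1 = n / sigma**2             # sample precision
mu_post = (w0 * mu0 + w1 * ybar) / (w0 + w1)     # weighted average
sd_post = math.sqrt(1 / (w0 + w1))               # posterior SD
print(round(mu_post, 1), round(sd_post, 2))      # ~327.2 and ~4.9
```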

IBBENS-2 posterior distribution:

[Figure: IBBENS prior, IBBENS-2 likelihood and posterior for μ]

3. Characteristics of the posterior distribution

• Posterior distribution: compromise between prior and likelihood
• Posterior mean: weighted average of the prior mean and the sample mean

  μ̄ = w0/(w0 + w1) μ0 + w1/(w0 + w1) ȳ, with w0 = 1/σ0² & w1 = 1/(σ²/n)

• The posterior precision = 1/posterior variance:

  1/σ̄² = w0 + w1

  with w0 = 1/σ0² = prior precision and w1 = 1/(σ²/n) = sample precision

• Posterior is always more peaked than prior and likelihood
• When n → ∞ or σ0 → ∞: p(μ|y) = N(ȳ, σ²/n)
  ⇒ when the sample size increases, the likelihood dominates the prior
• Posterior = normal = prior ⇒ conjugacy

4. Equivalence of prior information and extra data

• Prior variance σ0² = σ² (unit information prior) ⇒ σ̄² = σ²/(n + 1)
  ⇒ prior information = adding one extra observation to the sample
• General: σ0² = σ²/n0, with n0 general ⇒

  μ̄ = n0/(n0 + n) μ0 + n/(n0 + n) ȳ and σ̄² = σ²/(n0 + n)

Bayesian approach: using a subjective prior

Discounted priors:

• Discounted IBBENS prior: increase the IBBENS prior variance from 25 to 100
• Discounted IBBENS prior + shift: increase μ0 = 328 to μ0 = 340

[Figure: discounted prior (left) and discounted prior + shift (right), each with likelihood and posterior for μ]

Bayesian approach: no prior information is available

Non-informative prior:

• Non-informative prior: σ0² → ∞
⇒ Posterior: N(ȳ, σ²/n)

[Figure: flat prior, likelihood and posterior for μ]

2.8 The Poisson case

• Take y ≡ {y1, ..., yn} independent counts ⇒ Poisson distribution
• Poisson(θ):

  p(y|θ) = θ^y e^(−θ)/y!

• Mean and variance = θ
• Poisson likelihood:

  L(θ|y) ≡ ∏ p(yi|θ) = (∏ θ^yi/yi!) e^(−nθ)

Example II.6: Describing caries experience in Flanders

The Signal-Tandmobiel (STM) study:
• Longitudinal oral health study in Flanders
• Annual examinations from 1996 to 2001
• 4468 children (7% of children born in 1989)
• Caries experience measured by the dmft-index (min = 0, max = 20)

[Figure: histogram of the dmft-index]

Frequentist and likelihood calculations

• MLE of θ: θ̂ = ȳ = 2.24
• Likelihood-based 95% confidence interval for θ: [2.1984, 2.2875]

Bayesian approach: prior distribution based on historical data

1. Specifying the prior distribution
2. Constructing the posterior distribution
3. Characteristics of the posterior distribution
4. Equivalence of prior information and extra data

1. Specifying the prior distribution

• Information from the literature:
  ◦ Average dmft-index 4.1 (Liège, 1983) & 1.39 (Gent, 1994)
  ◦ Oral hygiene has improved considerably in Flanders
  ◦ Average dmft-index bounded above by 10

• Candidate for prior: Gamma(α0, β0)

  p(θ) = β0^α0/Γ(α0) θ^(α0−1) e^(−β0 θ)

  ◦ α0 = shape parameter & β0 = inverse scale parameter
  ◦ E(θ) = α0/β0 & var(θ) = α0/β0²

• STM study: α0 = 3 & β0 = 1

Gamma prior for STM study:

[Figure: Gamma(3, 1) prior density for θ]

Some gamma densities

[Figure: gamma densities for several (α0, β0) combinations]

2. Constructing the posterior distribution

• Posterior

  p(θ|y) ∝ e^(−nθ) (∏ θ^yi/yi!) β0^α0/Γ(α0) θ^(α0−1) e^(−β0 θ)
         ∝ θ^((Σ yi + α0) − 1) e^(−(n + β0)θ)

• Recognize the kernel of a Gamma(Σ yi + α0, n + β0) distribution

  ⇒ p(θ|y) = β̄^ᾱ/Γ(ᾱ) θ^(ᾱ−1) e^(−β̄θ)

  with ᾱ = Σ yi + α0 = 9758 + 3 = 9761 and β̄ = n + β0 = 4351 + 1 = 4352

⇒ STM study: effect of the prior is minimal

3. Characteristics of the posterior distribution

• Posterior is a compromise between prior and likelihood
• Posterior mode demonstrates shrinkage
• For the STM study the posterior is more peaked than prior and likelihood, but not in general
• Prior is dominated by the likelihood for a large sample size
• Posterior = gamma = prior ⇒ conjugacy

4. Equivalence of prior information and extra data

• Prior = equivalent to an experiment of size β0 with counts summing up to α0 − 1
• STM study: prior corresponds to an experiment of size 1 with count equal to 2

Bayesian approach: no prior information is available

• Gamma with α0 ≈ 1 and β0 ≈ 0 = non-informative prior

2.9 The prior and posterior of a derived parameter

• If p(θ) is the prior of θ, what is then the corresponding prior for h(θ) = ψ?
  ◦ Same question for the posterior density
  ◦ Example: θ = odds ratio, ψ = log(θ) = log(odds ratio)

• Why do we wish to know this?
  ◦ Prior: prior information on θ and ψ should be the same
  ◦ Posterior: allows to reformulate the conclusion on a different scale

• Solution: apply the transformation rule p̄(ψ) = p(h⁻¹(ψ)) |dh⁻¹(ψ)/dψ|

• Note: the parameter is a random variable!

Example II.4: Stroke study – Posterior distribution of log(θ)

• Probability of ‘success’ is often modeled on the log scale (or logit scale)
• Posterior distribution of ψ = log(θ):

  p(ψ|y) = 1/B(α, β) e^(αψ) (1 − e^ψ)^(β−1)

  with α = 19 and β = 133

Posterior and transformed posterior:

[Figure: posterior of θ and posterior of ψ = log(θ)]

2.10 Bayesian versus likelihood approach

• Bayesian approach satisfies the 1st likelihood principle, in that inference does not depend on never observed results
• Bayesian approach satisfies the 2nd likelihood principle:

  p2(θ|y) = L2(θ|y)p(θ) / ∫ L2(θ|y)p(θ)dθ
          = cL1(θ|y)p(θ) / ∫ cL1(θ|y)p(θ)dθ
          = p1(θ|y)

• In the Bayesian approach the parameter is stochastic
  ⇒ different effect of a transformation h(θ) in the Bayesian and likelihood approaches

2.11 Bayesian versus frequentist approach

• Frequentist approach:
  ◦ θ fixed and data are stochastic
  ◦ Many tests are based on asymptotic arguments
  ◦ Maximization is the key tool
  ◦ Does depend on stopping rules

• Bayesian approach:
  ◦ Condition on the observed data (data fixed), uncertainty about θ (θ stochastic)
  ◦ No asymptotic arguments are needed, all inference depends on the posterior
  ◦ Integration is the key tool
  ◦ Does not depend on stopping rules

• Frequentist and Bayesian approach can give the same numerical output (with different interpretation), but may give quite a different inference
• Frequentist ideas in Bayesian approaches (MCMC)

2.12 The different modes of the Bayesian approach

• Subjectivity ⇔ objectivity
• Subjective (proper) Bayesian ⇔ objective (reference) Bayesian
• Empirical Bayesian
• Decision-theoretic (full) Bayesian: use of a utility function
• 46656 varieties of Bayesians (Good)
• Pragmatic Bayesian = Bayesian ??

2.13 An historical note on the Bayesian approach

• Thomas Bayes was probably born in 1701 and died in 1761
• He was a Presbyterian minister, studied logic and theology at Edinburgh University, and had strong mathematical interests
• Bayes theorem was submitted posthumously by his friend Richard Price in 1763 and was entitled An Essay towards solving a Problem in the Doctrine of Chances
• Up to 1950 Bayes theorem was called the Theorem of Inverse Probability
• The fundament of Bayesian theory was developed by Pierre-Simon Laplace (1749-1827)
• Laplace first assumed an indifference prior; later he relaxed this assumption
• Much opposition: e.g. Poisson, Fisher, Neyman and Pearson, etc.

• Fisher was a strong opponent of Bayesian theory
• Because of his influence ⇒ dramatic negative effect
• Opposed to the use of a flat prior, and to the fact that conclusions change when putting a flat prior on h(θ) rather than on θ
• Some connection between Fisher's (inductive) approach and the Bayesian approach, but much difference with the N&P approach

Proponents of the Bayesian approach:

• de Finetti: exchangeability
• Jeffreys: noninformative prior, Bayes factor
• Savage: theory of subjective and personal probability and statistics
• Lindley: Gaussian hierarchical models
• Geman & Geman: Gibbs sampling
• Gelfand & Smith: introduction of Gibbs sampling into statistics
• Spiegelhalter: (Win)BUGS

Recommendation

The theory that would not die. How Bayes' rule cracked the enigma code, hunted down Russian submarines & emerged triumphant from two centuries of controversy

McGrayne (2011)

Take home messages

• Bayes theorem = model for learning
• Probability in a Bayesian context:
  ◦ data: classical; parameters: expressing what we believe/know
• Bayesian approach = likelihood approach + prior
• Inference:
  ◦ Bayesian: based on the parameter space (posterior distribution); conditions on the observed data, parameter stochastic
  ◦ Classical: based on the sample space (set of possible outcomes); looks at all possible outcomes, parameter fixed

• Prior can come from: historical data or subjective belief
• Prior is equivalent to extra data
• Noninformative prior can mimic classical results
• Posterior = compromise between prior and likelihood
• For a large sample, the likelihood dominates the prior
• Bayesian approach was obstructed by many throughout history
• ... but survived because of a computational trick ... (MCMC)

Chapter 3: Introduction to Bayesian inference

Aims:
• Introduction to basic concepts in Bayesian inference
• Introduction to simple sampling algorithms
• Illustrating that sampling can be a useful alternative to analytical/other numerical techniques to determine the posterior
• Illustrating that Bayesian testing can be quite different from frequentist testing

3.1 Introduction

More specifically we look at:
• Normal approximation of the posterior

3.2 Summarizing the posterior with probabilities

• Direct exploration of the posterior: P(a < θ < b | y) = ∫ab p(θ|y) dθ

3.3 Posterior summary measures

• We now summarize the posterior distribution with some simple measures, similar to what is done when summarizing collected data
• The measures are computed on a (population) distribution

3.3.1 Posterior mode, mean, median, variance and SD

• Posterior mode: θM where the posterior distribution is maximal
• Posterior mean: mean of the posterior distribution, i.e. θ̄ = ∫ θ p(θ|y)dθ
• Posterior median: median θ̃ of the posterior distribution, i.e. 0.5 = ∫θ̃ p(θ|y)dθ
• Posterior variance: variance of the posterior distribution, i.e. σ̄² = ∫ (θ − θ̄)² p(θ|y)dθ
• Posterior standard deviation: square root of the posterior variance, i.e. σ̄
• Note:
  ◦ Only the posterior median of h(θ) can be obtained from the posterior median of θ
  ◦ Only the posterior mode does not require integration
  ◦ Posterior mode = MLE with a flat prior

Example III.2: Stroke study – Posterior summary measures

• Posterior at 1st interim ECASS 3 analysis: Beta(α, β) (α = 19 & β = 133)

• Posterior mode: maximize (α − 1) ln(θ) + (β − 1) ln(1 − θ) wrt θ
  ⇒ θM = (α − 1)/(α + β − 2) = 18/150 = 0.12

• Posterior mean: integrate 1/B(α, β) ∫01 θ θ^(α−1)(1 − θ)^(β−1) dθ
  ⇒ θ̄ = B(α + 1, β)/B(α, β) = α/(α + β) = 19/152 = 0.125

• Posterior median: solve 0.5 = 1/B(α, β) ∫θ̃1 θ^(α−1)(1 − θ)^(β−1) dθ for θ̃
  ⇒ θ̃ = 0.122 (R-function qbeta)

• Posterior variance: calculate also 1/B(α, β) ∫01 θ² θ^(α−1)(1 − θ)^(β−1) dθ
  ⇒ σ̄² = αβ/[(α + β)²(α + β + 1)] = 0.0267²

Graphical representation measures:

[Figure: Beta(19, 133) posterior with mode, mean and median indicated]
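The same summary measures can be reproduced with `scipy` instead of R's `qbeta` (a sketch, not the course's own code):

```python
from scipy.stats import beta

a, b = 19, 133
post = beta(a, b)

mode = (a - 1) / (a + b - 2)          # 18/150 = 0.12
mean = a / (a + b)                    # 19/152 = 0.125
median = post.ppf(0.5)                # counterpart of R's qbeta(0.5, 19, 133)
sd = post.std()
print(round(mode, 3), round(mean, 3), round(median, 3), round(sd, 4))
```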

Posterior summary measures:

• Posterior for μ (based on IBBENS prior): Gaussian, so mode = mean = median
• Posterior mode = mean = median: 327.2 mg/dl
• Posterior variance & SD: σ̄² = 22.99 & σ̄ = 4.79 mg/dl

3.3.2 Credible/credibility interval

• [a, b] = 95% credible interval (CI) for θ if P(a ≤ θ ≤ b | y) = 0.95
• Two types of 95% credible interval:
  ◦ 95% equal tail CI [a, b]: AUC = 0.025 left of a & AUC = 0.025 right of b
  ◦ 95% highest posterior density (HPD) CI [a, b]: [a, b] contains the most plausible values of θ
• Properties:
  ◦ 100(1 − α)% HPD CI = shortest interval with size (1 − α) (Press, 2003)
  ◦ h(HPD CI) ≠ HPD CI, but h(equal-tail CI) = equal-tail CI
  ◦ Symmetric posterior: equal tail CI = HPD CI

Example III.4: Dietary study – Interval estimation of dietary intake

• Posterior = N(μ̄, σ̄²)
• Obvious choice for a 95% CI is [μ̄ − 1.96 σ̄, μ̄ + 1.96 σ̄]
• 95% equal tail CI = 95% HPD interval
• Results IBBENS-2 study:
  ◦ IBBENS prior distribution ⇒ 95% CI = [317.8, 336.6] mg/dl
  ◦ N(328, 10,000) prior ⇒ 95% CI = [285.6, 351.0] mg/dl
  ◦ Classical (frequentist) 95% confidence interval = [284.9, 351.1] mg/dl

Example III.5: Stroke study – Interval estimation of probability of SICH

• Posterior = Beta(19, 133) distribution
• 95% equal tail CI (R-function qbeta) = [0.077, 0.18] (see figure)
• 95% HPD interval = [0.075, 0.18] (see figure)
• Computations HPD interval: use R-function optimize

Equal tail credible interval:

[Figure: Beta(19, 133) posterior with 95% equal tail CI, on the θ and ψ = log(θ) scales]

HPD interval:

[Figure: Beta(19, 133) posterior with 95% HPD interval]

The HPD interval is not invariant to (monotone) transformations

3.4 Predictive distributions

3.4.1 Introduction

• Predictive distribution = distribution of a future observation ỹ after having observed the sample {y1, ..., yn}
• Two assumptions:
  ◦ Future observations are independent of the current observations given θ
  ◦ Future observations have the same distribution (p(y|θ)) as the current observations
• We look at three cases: (a) binomial, (b) Gaussian and (c) Poisson
• We start with a binomial example

Example III.7: Stroke study – Predicting SICH incidence in interim analysis

• Before the 1st interim analysis, but given the pilot data:
  obtain an idea of the number of (future) rt-PA treated patients who will suffer from SICH in a sample of size m = 50
• Distribution of ỹ (given the pilot data)?

Example III.7: Known incidence rate

• MLE of θ (incidence SICH) = 8/100 = 0.08 for the (fictive) ECASS 2 study
  ◦ Assume now that 8% is the true incidence
  ◦ Predictive distribution: Bin(50, 0.08)
  ◦ 95% predictive set: {0, 1, ..., 7} ≈ 94% of the future counts
  ⇒ the observed result of 10 SICH patients out of 50 is extreme
• But 8% is likely not the true value

Binomial predictive distribution Example III.7: Unknown incidence rate

• Uncertainty of θ expressed by posterior (ECASS 2 prior) = Beta(α, β)-distribution with α =9and β =93 W • Distribution (likelihood) of m =50future SICH events is given by Bin(m, θ)

• Distribution of m =50future SICH events without knowing θ is weighted by posterior distribution = posterior predictive distribution (PPD)

• PPD = Beta-binomial distribution BB(m, α, β)    1 α−1 − β−1 |y m y − (m−y) θ (1 θ) p(y )=  θ (1 θ) dθ Y 0 y B(α, β) m B(y + α, m − y + β) = y Ignores uncertainty in θ B(α, β)

Bayesian Biostatistics - Piracicaba 2014 144 Bayesian Biostatistics - Piracicaba 2014 145 Result: Binomial and beta-binomial predictive distribution • BB(m, α, β)showsmore variability than Bin(m, θ)

• 94.4% PPS is {0, 1,...,9}⇒10 SICH patients out of 50 less extreme WW W Y

3.4.2 Posterior predictive distribution: General case

• Central idea: take the posterior uncertainty into account
• Three cases:
  ◦ All mass (AUC ≈ 1) of p(θ|y) at θM ⇒ distribution of ỹ: p(ỹ|θM)
  ◦ All mass at θ¹, ..., θᴷ ⇒ distribution of ỹ: Σk p(ỹ|θᵏ)p(θᵏ|y)
  ◦ General case: posterior predictive distribution (PPD) ⇒ distribution of ỹ:

    p(ỹ|y) = ∫ p(ỹ|θ) p(θ|y) dθ

• PPD expresses what we know about the distribution of the (future) ỹs

Example III.7: Second interim analysis

• The posterior after the 1st interim analysis can be used:
  ◦ To compute the PPD of the number of SICH events in the subsequent m treated rt-PA patients, to be evaluated at the 2nd interim analysis
  ◦ As prior to be combined with the results of the 2nd interim analysis
  ◦ This is an example of the sequential use of Bayes theorem

Example III.6: SAP study – 95% normal range

• Serum alkaline phosphatase (alp) was measured on a prospective set of 250 ‘healthy’ patients by Topal et al (2003)
• Purpose: determine the 95% normal range
• Recall: σ² is known

Example III.6: Frequentist approach

• yi = 100/√alpi (i = 1, ..., 250) ≈ normal distribution N(μ, σ²)
• μ = known:
  ◦ 95% normal range for y: [μ − 1.96 σ, μ + 1.96 σ], with μ = ȳ = 7.11 (σ = 1.4)
  ⇒ 95% normal range for alp = [104.45, 508.95]
• μ = unknown:
  ◦ 95% normal range for y: [ȳ − 1.96 σ √(1 + 1/n), ȳ + 1.96 σ √(1 + 1/n)]
  ⇒ 95% normal range for alp = [104.33, 510.18]

Histogram alp + 95% normal range

[Figure: histogram of alp with the 95% normal range]

Example III.6: Bayesian approach

• PPD: ỹ|y ∼ N(μ̄, σ̄² + σ²)
• 95% normal range for y: [μ̄ − 1.96 √(σ̄² + σ²), μ̄ + 1.96 √(σ̄² + σ²)]
• Prior variance σ0² large ⇒ σ̄² = σ²/n
  ⇒ Bayesian 95% normal range = frequentist 95% normal range
  ⇒ 95% normal range for alp = [104.33, 510.18]
• Same numerical results, BUT the way of dealing with the uncertainty about the true value of the parameter is different

Example III.8: Caries study – PPD for caries experience

• Poisson likelihood + gamma prior ⇒ gamma posterior
• PPD = negative binomial distribution NB(ᾱ, β̄):

  p(ỹ|y) = Γ(ᾱ + ỹ)/(Γ(ᾱ) ỹ!) (β̄/(β̄ + 1))^ᾱ (1/(β̄ + 1))^ỹ

[Figure: negative binomial PPD for the dmft-index]

More applications of PPDs 3.5 Exchangeability

See later: • Exchangeable: • Determining when to stop a trial (see medical literature further) ◦ Order in {y1,y2,...,yn} is not important ◦ y can be exchanged for y without changing the problem/solution • Given the past responses, predict the future responses of a patient i j ◦ Extension of independence • Imputing • Exchangeability = key idea in (Bayesian) statistical inference • Model checking (Chapter 10) • Exchangeability of patients, trials, ...

3.6 A normal approximation to the posterior

• When the sample is large and the statistical model is relatively simple (not many parameters), then often the posterior looks like a normal distribution
• This property is typically used in the analysis of clinical trials
• In the book, we show that the normal approximation can work well for small sample sizes (Example III.10)
• This is a nice property, but not crucial in the Bayesian world, since all we need is the posterior

3.7 Numerical techniques to determine the posterior

• To compute the posterior, the product of likelihood and prior needs to be divided by the integral of that product
• In addition, to compute the posterior mean, median and SD we also need to compute integrals
• Here we see 2 possible ways to obtain, in general, the posterior and posterior summary measures:
  ◦ Numerical integration
  ◦ Sampling from the posterior

3.7.1 Numerical integration

Example III.11: Caries study – Posterior distribution for a lognormal prior

• Take the first 10 children from the STM study
• Assume the dmft-index has a Poisson(θ) distribution with θ = mean dmft-index
• With prior Gamma(3, 1) ⇒ posterior: Gamma(29, 11)
• Replace the gamma prior by a lognormal prior; then the posterior is

  p(θ|y) ∝ θ^(Σ yi − 1) e^(−nθ − (log(θ) − μ0)²/(2σ0²)), (θ > 0)

• Posterior moments cannot be evaluated analytically & AUC not known
• Mid-point approach provides the AUC

Calculation AUC using mid-point approach:

[Figure: mid-point approximation of the area under the unnormalized posterior]

3.7.2 Sampling from the posterior distribution

• Monte Carlo integration: usefulness of the sampling idea
• General purpose sampling algorithms

Monte Carlo integration

• Monte Carlo integration: replace the integral by a Monte Carlo sample {θ¹, ..., θᴷ}
• Approximate p(θ|y) by the sample histogram
• Approximate the posterior mean by:

  ∫ θ p(θ|y) dθ ≈ θ̄ = 1/K Σk θᵏ, for K large

• Classical 95% confidence interval to indicate the precision of the posterior mean:

  [θ̄ − 1.96 s/√K, θ̄ + 1.96 s/√K], with s/√K = Monte Carlo error

• Also 95% credible intervals can be computed (approximated)

Example III.12: Stroke study – Sampling from the posterior distribution

• Posterior for θ = probability of SICH with rt-PA = Beta(19, 133) (Example II.1)
• 5,000 sampled values of θ from the Beta(19, 133) distribution
• Posterior of log(θ): one extra line in the R program
• Sample summary measures ≈ true summary measures
• 95% equal tail CI for θ: [0.0782, 0.182]
• 95% equal tail CI for log(θ): [−2.56, −1.70]
• Approximate 95% HPD interval for θ: [0.0741, 0.179]

Sampling approximation:

[Figure: histograms of the sampled values of θ and log(θ)]

Monte Carlo error (K=50)

[Figure: black box = true posterior mean & red box = sampled posterior mean]

General purpose sampling algorithms

• Many algorithms are available to sample from standard distributions
• Dedicated procedures/general purpose algorithms are needed for non-standard distributions
• An important example: the accept-reject (AR) algorithm, used by e.g. WinBUGS
  ◦ Generate samples from an instrumental distribution
  ◦ Then reject certain generated values to obtain a sample from the posterior
  ◦ Only the product of likelihood and prior is needed

Accept-reject algorithm – 1

• Sampling in two steps:
  ◦ Stage 1: sample θ̃ from q(θ) (proposal distribution)
  ◦ Stage 2: reject θ̃ or not ⇒ sample from p(θ|y) (target)
• Assumption: p(θ|y) ≤ A q(θ), with q = envelope distribution and A = envelope constant

Accept-reject algorithm – 2

• Stage 1: θ̃ & u are drawn independently from q(θ) & U(0, 1)
• Stage 2: accept θ̃ when u ≤ p(θ̃|y)/(A q(θ̃))
• Properties of the AR algorithm:
  ◦ Produces a sample from the posterior
  ◦ Only needs p(y|θ) p(θ)
  ◦ Probability of acceptance = 1/A

[Figure: target posterior and envelope A q(θ)]

Example III.13: Caries study – Sampling from the posterior with a lognormal prior

Accept-reject algorithm

• The lognormal prior part is maximized for log(θ) = μ0
  ⇒ envelope A q(θ) = θ^(Σ yi − 1) e^(−nθ) ∝ Gamma(Σ yi, n) distribution
• Data from Example III.11
• Prior: lognormal distribution with μ0 = log(2) & σ0 = 0.5
• 1000 θ-values sampled: 840 accepted
  ◦ Sampled posterior mean (median) = 2.50 (2.48)
  ◦ Posterior variance = 0.21
  ◦ 95% equal tail CI = [1.66, 3.44]

Sampling approximation

[Figure: histogram of the accepted θ-values]
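This AR sampler can be sketched with `numpy`. The slides list only data summaries, so the assumption n = 10 and Σ yi = 26 is inferred from the Gamma(29, 11) posterior of Example III.11; with the Gamma(Σ yi, n) envelope, the acceptance probability reduces to the lognormal factor, which is at most 1:

```python
import numpy as np

rng = np.random.default_rng(1)        # arbitrary seed

# Data summary (inferred, see lead-in): n = 10 children, sum of counts = 26
sum_y, n = 26, 10
mu0, sigma0 = np.log(2), 0.5          # lognormal prior parameters

K = 20000
cand = rng.gamma(sum_y, 1 / n, size=K)          # stage 1: draw from the envelope
u = rng.uniform(size=K)
accept = u <= np.exp(-(np.log(cand) - mu0) ** 2 / (2 * sigma0 ** 2))
theta = cand[accept]                            # stage 2: accept-reject

print(round(theta.mean(), 2), round(accept.mean(), 2))   # mean ~2.5, ~84% accepted
```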

Adaptive Rejection Sampling algorithms – 1

Adaptive Rejection Sampling (ARS) algorithm:
• Builds the envelope function/density in an adaptive manner
• Builds the squeezing function/density in an adaptive manner
• On the log scale, the envelope and squeezing functions are piecewise linear, with knots at the sampled grid points
• Two special cases:
  ◦ Tangent method of ARS
  ◦ Derivative-free method of ARS

Adaptive Rejection Sampling algorithms – 2

[Figure: tangent ARS and derivative-free ARS envelopes]

Adaptive Rejection Sampling algorithms – 3

Properties of ARS algorithms:
• The envelope function can be made arbitrarily close to the target
• 5 to 10 grid points determine the envelope density
• The squeezing density avoids (many) function evaluations
• Derivative-free ARS is implemented in WinBUGS
• ARMS algorithm: combination with the Metropolis algorithm for non-log-concave distributions, implemented in the Bayesian SAS procedures

3.7.3 Choice of posterior summary measures

• In practice: posterior summary measures are computed with sampling techniques
• Choice driven by the available software:
  ◦ Mean, median and SD, because provided by WinBUGS
  ◦ Mode almost never reported, but useful to compare with the frequentist solution
  ◦ Equal tail CI (WinBUGS) & HPD (CODA and BOA)

3.8 Bayesian hypothesis testing

Two Bayesian tools for hypothesis testing H0: θ = θ0

• Based on a credible interval:
  ◦ Direct use of the credible interval: is θ0 contained in the 100(1 − α)% CI? If not, then reject H0 in a Bayesian way!
  ◦ Popular when many parameters need to be evaluated, e.g. in regression models
  ◦ Contour probability pB: P[p(θ|y) > p(θ0|y)] ≡ (1 − pB)
• Bayes factor: change of the prior odds for H0 due to the data (below)

Example III.14: Cross-over study – Use of CIs in Bayesian hypothesis testing

• 30 patients with systolic hypertension in a cross-over study:
  ◦ Period 1: randomized to A or B
  ◦ Washout period
  ◦ Period 2: switch medication (A to B and B to A)
• θ = P(A better than B) & H0: θ = θ0 (= 0.5)
• Result: 21 patients better off with A than with B
• Testing:
  ◦ Frequentist: 2-sided binomial test (P = 0.043)
  ◦ Bayesian: U(0,1) prior + LBin(21|30, θ) ⇒ Beta(22, 10) posterior (pB = 0.023)

Graphical representation of pB:

[Figure: Beta(22, 10) posterior with the contour probability pB indicated]

Predictive distribution of power and necessary sample size

• Classical power and sample size calculations make use of formulas to compute the power for a given sample size, and the necessary sample size for a chosen power
• However, there is often much uncertainty involved in the calculations, because:
  ◦ Comparison of means: only a rough idea of the common SD is available to us, and there is no agreement on the aimed difference in efficacy
  ◦ Comparison of proportions: only a rough idea of the control rate is available to us, and there is no agreement on the aimed difference in efficacy
• Typically one tries a number of possible values of the required parameters and inspects the variation in the calculated power/sample size
• Alternatively: use a Bayesian approach with priors on the parameters
• Here 2 examples: comparison of means & comparison of proportions


Comparison of means

• Explicit formulas for power given sample size & sample size given power are available:
 ◦ Power given sample size: power = Φ(√(nΔ²/(2σ²)) + z_{α/2})
 ◦ Sample size given power: n = (2σ²/Δ²)(z_β − z_{α/2})²
• Formulas correspond to the Z-test to compare 2 means with:
 ◦ Δ = assumed difference in means
 ◦ σ = assumed common SD in the two populations
 ◦ z_{α/2} = (negative) threshold corresponding to the 2-sided α-level
 ◦ z_β = (positive) threshold corresponding to power = 1 − β

Comparison of means – continued

• Example 6.5 (Spiegelhalter et al., 2004)
 ◦ σ = 1, Δ = 0.5, α = 0.05 ⇒ z_{α/2} = −1.96
 ◦ Power given sample size: n = 63 (sample size in each treatment group)
 ◦ Sample size given power: power = 0.80 ⇒ z_β = 0.84
• Priors: Δ ∼ N(0.5, 0.1) & σ ∼ N(1, 0.3) restricted to [0, ∞)
• Predictive distributions of power and n obtained with WinBUGS (figure)
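The two formulas, and the Monte Carlo version of the predictive power, can be sketched as follows. This is an illustration under stated assumptions: the priors N(0.5, 0.1) and N(1, 0.3) are read as (mean, SD) pairs, and plain NumPy sampling stands in for the WinBUGS run:

```python
import numpy as np
from scipy import stats

z, Phi = stats.norm.ppf, stats.norm.cdf

def power(n, delta, sigma, alpha=0.05):
    # Power of the two-sample Z-test with n subjects per group
    return Phi(np.sqrt(n * delta**2 / (2 * sigma**2)) + z(alpha / 2))

def sample_size(target, delta, sigma, alpha=0.05):
    # n per group for a chosen power
    return 2 * sigma**2 / delta**2 * (z(target) - z(alpha / 2))**2

print(round(float(power(63, 0.5, 1)), 2))   # 0.8
print(round(sample_size(0.80, 0.5, 1)))     # 63

# Predictive distribution of the power: draw (Delta, sigma) from their priors
rng = np.random.default_rng(1)
Delta = rng.normal(0.5, 0.1, 100_000)
sigma = rng.normal(1.0, 0.3, 100_000)
keep = sigma > 0                            # truncate sigma to [0, infinity)
pred_power = power(63, Delta[keep], sigma[keep])
print(round(float(pred_power.mean()), 2))   # expected power under parameter uncertainty
```

The expected power typically drops below the nominal 0.80 once the uncertainty on Δ and σ is taken into account.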

Pure Bayesian ways to determine the necessary sample size

• The above approach combines a Bayesian technique (taking into account the prior uncertainty of the parameter values) with the classical approach of determining the sample size
• Alternative, fully Bayesian approaches for sample size calculation have been suggested, based on the size of the posterior CI
• Note that the calculations require a design prior, i.e. a prior used to express the uncertainty of the parameters at the design stage. This design prior does not need to be the same as the prior used in the analysis

3.8.1 The Bayes factor

• Posterior probability for H0:

  p(H0 | y) = p(y | H0) p(H0) / [p(y | H0) p(H0) + p(y | Ha) p(Ha)]

• Bayes factor BF(y):

  p(H0 | y) / (1 − p(H0 | y)) = [p(y | H0) / p(y | Ha)] × [p(H0) / (1 − p(H0))]

• Bayes factor = the factor that transforms the prior odds for H0 into the posterior odds after observing the data
• Central in Bayesian inference, but not to all Bayesians
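For the cross-over data of Example III.14, the Bayes factor of H0: θ = 0.5 against the U(0,1)-averaged alternative has a closed form; the sketch below (my illustration, not course code) computes it:

```python
from math import comb
from scipy.special import beta as beta_fn

n, y = 30, 21
m0 = comb(n, y) * 0.5**n                      # p(y | H0), theta0 = 0.5
m1 = comb(n, y) * beta_fn(y + 1, n - y + 1)   # p(y | Ha), U(0,1) prior; equals 1/(n+1)
BF = m0 / m1
print(round(BF, 2))       # 0.41: the data lower the prior odds of H0
print(round(1 / BF, 1))   # 2.4 in favour of Ha
```

With BF(y) ≈ 0.41, the reciprocal 1/BF ≈ 2.4 favours Ha only weakly.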


Bayes factor – Jeffreys classification

• Jeffreys classification for favoring H0 and against Ha:
 ◦ Decisive (BF(y) > 100)
 ◦ Very strong (32 < BF(y) ≤ 100)
 ◦ Strong (10 < BF(y) ≤ 32)
 ◦ Substantial (3.2 < BF(y) ≤ 10)
 ◦ ‘Not worth more than a bare mention’ (1 < BF(y) ≤ 3.2)
• Jeffreys classification for favoring Ha and against H0: replace the criteria by their reciprocals
 ◦ Decisive (BF(y) < 1/100), ...

3.9 Medical literature on Bayesian methods in RCTs

The next three papers use the summary measures + techniques introduced above

RCT example 1

Example 1: PL Shah et al., Lancet 378, 2011 (screenshot of the trial publication)


RCT example 2

Example 2: AU Buzdar et al., J Clin Oncol 23(16) (screenshots of the trial publication). Question posed on the slide: ‘What is the likely result at the end?’

RCT example 3

Example 2 (continued): generation of fictive future studies given the results obtained so far. Example 3: Cao et al., Am Heart J (screenshot of the trial publication)


Example 3 (continued): decision table from the paper (slide screenshots)

Take home messages

• The posterior distribution contains all information for statistical inference
• To characterize the posterior, we can use:
 ◦ posterior mode, mean, median, variance, SD
 ◦ credible intervals: equal-tail, HPD
• Integration is key in Bayesian inference
• PPD = Bayesian tool to predict the future
• No need to rely on large samples for Bayesian inference
• Bayesian hypothesis testing can be based on:
 ◦ credible intervals
 ◦ contour probability
 ◦ Bayes factor
• Sampling is a useful tool to replace integration


Chapter 4: More than one parameter

Aims:
 ◦ Moving towards practical applications
 ◦ Illustrating that computations quickly become involved
 ◦ Illustrating that frequentist results can be obtained with Bayesian procedures
 ◦ Illustrating a multivariate (independent) sampling algorithm

4.1 Introduction

• Most statistical models involve more than one parameter to estimate
• Examples:
 ◦ Normal distribution: mean μ and variance σ²
 ◦ Linear regression: regression coefficients β0, β1, ..., βd and residual variance σ²
 ◦ Logistic regression: regression coefficients β0, β1, ..., βd
 ◦ Multinomial distribution: class probabilities θ1, θ2, ..., θd with Σ_{j=1}^d θj = 1
• This requires a prior for all parameters together: it expresses our beliefs about the model parameters
• Aim: derive the posterior for all parameters and their summary measures

• It turns out that in most cases analytical solutions for the posterior will not be possible anymore
• In Chapter 6, we will see that Markov chain Monte Carlo methods are needed for this
• Here, we look at a simple multivariate sampling approach: the Method of Composition

4.2 Joint versus marginal posterior inference

• Bayes theorem:

  p(θ | y) = L(θ | y) p(θ) / ∫ L(θ | y) p(θ) dθ

 ◦ Hence, the same expression as before, but now θ = (θ1, θ2, ..., θd)^T
 ◦ The prior p(θ) is multivariate, but often a prior is given for each parameter separately
 ◦ The posterior p(θ | y) is also multivariate, but we usually look only at the (marginal) posteriors p(θj | y) (j = 1, ..., d)
• We also need for each parameter: posterior mean, median (and sometimes mode), and credible intervals


4.3 The normal distribution with μ and σ² unknown

• Illustration on the normal distribution with μ and σ² unknown
• Application: determining the 95% normal range of alp (continuation of Example III.6)
• We look at three cases (priors):
 ◦ No prior knowledge is available
 ◦ A previous study is available
 ◦ Expert knowledge is available
• But first, a brief theoretical introduction

Acknowledging that μ and σ² are unknown

• Sample y1, ..., yn of independent observations from N(μ, σ²)
• Joint likelihood of (μ, σ²) given y:

  L(μ, σ² | y) = 1/(2πσ²)^{n/2} exp[−(1/(2σ²)) Σ_{i=1}^n (yi − μ)²]

• The posterior is again the product of the likelihood with the prior, divided by the denominator, which involves an integral
• In this case analytical calculations are possible in 2 of the 3 cases

4.3.1 No prior knowledge on μ and σ² is available

• Noninformative joint prior p(μ, σ²) ∝ σ⁻² (μ and σ² a priori independent)
• Posterior distribution:

  p(μ, σ² | y) ∝ (1/σ^{n+2}) exp{−(1/(2σ²)) [(n − 1)s² + n(ȳ − μ)²]}

Justification of the prior distribution

• Most often prior information on several parameters arrives to us for each of the parameters separately and independently ⇒ p(μ, σ²) = p(μ) × p(σ²)
• And, we do not have prior information on μ nor on σ² ⇒ choice of prior distributions: (figure: flat priors for μ and σ)
• The chosen priors are called flat priors


• Motivation:
 ◦ If one is totally ignorant of a location parameter, then it could take any value on the real line with equal prior probability
 ◦ If totally ignorant about the scale of a parameter, then it is as likely to lie in the interval 1–10 as in the interval 10–100. This implies a flat prior on the log scale
• The flat prior p(log(σ)) = c is equivalent to the chosen prior p(σ²) ∝ σ⁻²

Marginal posterior distributions

• Marginal posterior distributions are needed in practice:
 ◦ p(μ | y)
 ◦ p(σ² | y)
• Calculation of the marginal posterior distributions involves integration:

  p(μ | y) = ∫ p(μ, σ² | y) dσ² = ∫ p(μ | σ², y) p(σ² | y) dσ²

• The marginal posterior is a weighted sum of conditional posteriors, with weights = the uncertainty on the other parameter(s)

Conditional & marginal posterior distributions for the normal case

• Conditional posterior for μ: p(μ | σ², y) = N(ȳ, σ²/n)
• Marginal posterior for μ: p(μ | y) = t_{n−1}(ȳ, s²/n)

  ⇒ (μ − ȳ)/(s/√n) ∼ t_{n−1} (μ is the random variable)

• Marginal posterior for σ²: p(σ² | y) ≡ Inv-χ²(n − 1, s²) (scaled inverse chi-squared distribution)

  ⇒ (n − 1)s²/σ² ∼ χ²(n − 1) (σ² is the random variable)

  = special case of IG(α, β), with α = (n − 1)/2, β = (n − 1)s²/2

Some t-densities: (figure: t-densities for several degrees of freedom)


Some inverse-gamma densities: (figure: IG densities for several choices of α and β)

Joint posterior distribution

• Joint posterior = multiplication of the marginal with the conditional posterior:

  p(μ, σ² | y) = p(μ | σ², y) p(σ² | y) = N(ȳ, σ²/n) × Inv-χ²(n − 1, s²)

• Normal-scaled-inverse chi-square distribution = N-Inv-χ²(ȳ, n, (n − 1), s²)

(figure: contour plot of the joint posterior of (μ, σ²))

⇒ A posteriori, μ and σ² are dependent

Posterior summary measures and PPD

For μ:
 ◦ Posterior mean = mode = median = ȳ
 ◦ Posterior variance = (n − 1)s² / (n(n − 3))
 ◦ 95% equal tail credible and HPD interval: [ȳ − t(0.025; n − 1) s/√n, ȳ + t(0.025; n − 1) s/√n]

For σ²:
 ◦ Posterior mean, mode, median, variance and 95% equal tail CI are all analytically available
 ◦ The 95% HPD interval is computed iteratively

PPD:
 ◦ t_{n−1}(ȳ, s²(1 + 1/n))-distribution

Implications of previous results

Frequentist versus Bayesian inference:
 ◦ Numerical results are the same
 ◦ Inference is based on different principles


Example IV.1: SAP study – Noninformative prior

 ◦ Example III.6: the normal range for alp was too narrow
 ◦ Joint posterior distribution = N-Inv-χ² (NI prior + likelihood, see before)
 ◦ Marginal posterior distributions (red curves) for μ and σ of y = 100/√alp (figure)

Normal range for alp:

• PPD for y: t249(7.11, 1.37)-distribution
• 95% normal range for alp = [104.1, 513.2], slightly wider than before
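The normal range can be reproduced from the PPD. The sketch below (my illustration) assumes SciPy and reads 1.37 as the scale — not the variance — of the t-distribution on the y = 100/√alp scale; the back-transformed limits land close to [104.1, 513.2]:

```python
from scipy import stats

# PPD of y = 100/sqrt(alp): t with 249 df, location 7.11, scale 1.37
ppd = stats.t(df=249, loc=7.11, scale=1.37)
y_lo, y_hi = ppd.ppf([0.025, 0.975])

# Back-transform: alp = (100/y)^2 is decreasing in y, so the bounds swap
alp_lo, alp_hi = (100 / y_hi)**2, (100 / y_lo)**2
print(round(alp_lo, 1), round(alp_hi, 1))   # close to the [104.1, 513.2] on the slide
```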

4.3.2 An historical study is available

• The posterior of the historical data can be used as prior to the likelihood of the current data
• Prior = N-Inv-χ²(μ0, κ0, ν0, σ0²)-distribution (from the historical data)
• Posterior = N-Inv-χ²(μ̄, κ̄, ν̄, σ̄²)-distribution (combining the data and the N-Inv-χ² prior)
 ◦ N-Inv-χ² is a conjugate prior
 ◦ Posterior mean in-between the prior mean & the sample mean, but:
 ◦ Posterior precision = prior + sample precision
 ◦ Posterior variance = weighted average of the prior variance, the sample variance and the distance between the prior and sample means
  ⇒ the posterior variance is not necessarily smaller than the prior variance!
• Similar results for the posterior measures and PPD as in the first case

Example IV.2: SAP study – Conjugate prior

• Prior based on a retrospective study (Topal et al., 2003) of 65 ‘healthy’ subjects:
 ◦ Mean (SD) for y = 100/√alp = 5.25 (1.66)
 ◦ Conjugate prior = N-Inv-χ²(5.25, 65, 64, 2.76)
 ◦ Note: mean (SD) of the prospective data: 7.11 (1.4), quite different
• Posterior = N-Inv-χ²(6.72, 315, 314, 2.61):
 ◦ Again shrinkage of the posterior mean towards the prior mean
 ◦ Posterior variance < prior variance and > sample variance
 ◦ Posterior informative variance > NI variance
 ◦ Prior information did not lower the posterior uncertainty; reason: conflict of the likelihood with the prior


Marginal posteriors based on the retro- and prospective data: (figure; red curves = marginal posteriors from the informative prior (historical data))

4.3.3 Expert knowledge is available

• Expert knowledge is available on each parameter separately
 ⇒ joint prior N(μ0, σ0²) × Inv-χ²(ν0, τ0²) ≠ conjugate
• The posterior cannot be derived analytically, but numerical/sampling techniques are available

What now?

Computational problem:
 ◦ The ‘simplest problem’ in classical statistics is already complicated
 ◦ An ad hoc solution is still possible, but not satisfactory
 ◦ There is the need for another approach


4.4 Multivariate distributions

Distributions with a multivariate response:
 ◦ Multivariate normal distribution: generalization of the normal distribution
 ◦ Multivariate Student's t-distribution: generalization of the location-scale t-distribution
 ◦ Multinomial distribution: generalization of the binomial distribution

Multivariate prior distributions:
 ◦ N-Inv-χ²-distribution: prior for N(μ, σ²)
 ◦ Dirichlet distribution: generalization of the beta distribution
 ◦ (Inverse-)Wishart distribution: generalization of the (inverse-)gamma (prior), for covariance matrices (see mixed model chapter)

Example IV.3: Young adult study – Smoking and alcohol drinking

• Study examining life style among young adults

                      Smoking
  Alcohol             No    Yes
  No-Mild            180     41
  Moderate-Heavy     216     64
  Total              396    105

• Of interest: association between smoking & alcohol consumption

Bayesian Biostatistics - Piracicaba 2014 220 Bayesian Biostatistics - Piracicaba 2014 221 Likelihood part: Dirichlet prior:

2×2 = multinomial model Mult(n, θ) Conjugate prior to multinomial distribution = Dirichlet prior Dir(α)   • θ = {θ11,θ12,θ21,θ22 =1− θ11 − θ12 − θ21} and 1= θij 1 α −1 i,j θ ∼ θ ij  B(α) ij • y { } i,j = y11,y12,y21,y22 and n = i,j yij

◦ α = {α11,α12,α21,α22}   ◦ α n! B( )= i,j Γ(αij)/Γ i,j αij θ y11 y12 y11 y22 Mult(n, )= θ11 θ12 θ21 θ22 y11! y12! y21! y22! ⇒ Posterior distribution = Dir(α + y)

• Note: ◦ Dirichlet distribution = extension of beta distribution to higher dimensions ◦ Marginal distributions of a Dirichlet distribution = beta distribution


Measuring association:

• Association between smoking and alcohol consumption via the cross-ratio:

  ψ = θ11 θ22 / (θ12 θ21)

• Needed: p(ψ | y), but difficult to derive analytically
• Alternatively, replace the analytical calculations by a sampling procedure

Analysis of the contingency table:

• Prior distribution: Dir(1, 1, 1, 1)
• Posterior distribution: Dir(180+1, 41+1, 216+1, 64+1)
• Sample of 10,000 generated values for the θ parameters
• 95% equal tail CI for ψ: [0.839, 2.014]
• Equal to the classically obtained estimate
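The sampling procedure above takes only a few lines with NumPy; the seed is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(2014)
# Dir(1,1,1,1) prior + table counts (180, 41, 216, 64) -> Dir(181, 42, 217, 65) posterior
theta = rng.dirichlet([181, 42, 217, 65], size=10_000)

# Cross-ratio psi = theta11*theta22 / (theta12*theta21) for every sampled theta
psi = theta[:, 0] * theta[:, 3] / (theta[:, 1] * theta[:, 2])
lo, hi = np.percentile(psi, [2.5, 97.5])
print(round(lo, 2), round(hi, 2))   # close to the [0.839, 2.014] on the slide
```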

Posterior distributions: (figure: sampled marginal posteriors of the θ parameters and of ψ)

4.5 Frequentist properties of Bayesian inference

• Not of prime interest for a Bayesian to know the sampling properties of estimators
• However, it is important that the Bayesian approach most often gives the right answer
• What is known?
 ◦ Theory: the posterior is normal for a large sample (Bayesian CLT)
 ◦ Simulations: the Bayesian approach may offer alternative interval estimators with better coverage than classical frequentist approaches


4.6 The Method of Composition

• A method to yield a random sample from a multivariate distribution
• Stagewise approach
• Based on the factorization of the joint distribution into a marginal & several conditionals:

  p(θ1, ..., θd | y) = p(θd | y) p(θd−1 | θd, y) ... p(θ1 | θ2, ..., θd, y)

• Sampling approach:
 ◦ Sample θ̃d from p(θd | y)
 ◦ Sample θ̃(d−1) from p(θd−1 | θ̃d, y)
 ◦ ...
 ◦ Sample θ̃1 from p(θ1 | θ̃2, ..., θ̃d, y)

Sampling from the posterior when y ∼ N(μ, σ²), both parameters unknown

• Sample first σ², then, given the sampled value σ̃², sample μ from p(μ | σ̃², y)
• Output for case 1 (no prior knowledge on μ and σ²) on the next page
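A sketch of this two-step sampler on simulated data (the data, seed and sample size are illustrative assumptions, not the alp measurements):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(5, 2, size=50)           # illustrative data
n, ybar, s2 = len(y), y.mean(), y.var(ddof=1)

# Method of Composition under the NI prior p(mu, sigma^2) proportional to sigma^{-2}:
# step 1: sample sigma^2 from Inv-chi^2(n-1, s^2), i.e. (n-1)s^2 / chi^2_{n-1}
sig2 = (n - 1) * s2 / rng.chisquare(n - 1, size=10_000)
# step 2: sample mu | sigma^2 from N(ybar, sigma^2 / n)
mu = rng.normal(ybar, np.sqrt(sig2 / n))

# The sampled mu follow the analytical marginal posterior t_{n-1}(ybar, s^2/n)
print(round(float(mu.mean()), 2))
```

The histogram of the sampled μ values reproduces the t-marginal derived analytically in Section 4.3.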

Bayesian Biostatistics - Piracicaba 2014 228 Bayesian Biostatistics - Piracicaba 2014 229 Sampled posterior distributions: 4.7 Bayesian linear regression models

• Example of a classical multiple linear

• Non-informative Bayesian multiple linear regression analysis:  Non-informative prior for all parameters + ... classical linear regression model σ μ  Analytical results are available + method of composition can be applied

σ μ ]


4.7.1 The frequentist approach to linear regression

Classical regression model: y = Xβ + ε

 ◦ y = n × 1 vector of independent responses
 ◦ X = n × (d + 1) design matrix
 ◦ β = (d + 1) × 1 vector of regression parameters
 ◦ ε = n × 1 vector of random errors ∼ N(0, σ²I)

Likelihood:

  L(β, σ² | y, X) = 1/(2πσ²)^{n/2} exp[−(1/(2σ²)) (y − Xβ)^T (y − Xβ)]

 ◦ MLE = LSE of β: β̂ = (X^T X)⁻¹ X^T y
 ◦ Residual sum of squares: S = (y − Xβ̂)^T (y − Xβ̂)
 ◦ Mean residual sum of squares: s² = S/(n − d − 1)

Example IV.7: Osteoporosis study – a frequentist linear regression analysis

 ◦ Cross-sectional study (Boonen et al., 1996)
 ◦ 245 healthy elderly women in a geriatric hospital
 ◦ Aim: find determinants for osteoporosis
 ◦ Average age of the women = 75 yrs, with a range of 70–90 yrs
 ◦ Marker for osteoporosis = tbbmc (in kg), measured for 234 women
 ◦ Simple linear regression model: regressing tbbmc on bmi
 ◦ Classical frequentist regression analysis:
  • β̂0 = 0.813 (0.12)
  • β̂1 = 0.0404 (0.0043)
  • s² = 0.29, with n − d − 1 = 232
  • corr(β̂0, β̂1) = −0.99

Bayesian Biostatistics - Piracicaba 2014 232 Bayesian Biostatistics - Piracicaba 2014 233 Scatterplot + fitted regression line: 4.7.2 A noninformative Bayesian linear regression model

Bayesian linear regression model = prior information on regression parameters & residual variance + normal regression likelihood

• Noninformative prior for (β,σ2): p(β,σ2) ∝ σ−2

• Notation: omit design matrix X

• Posterior distributions: ! " − p(β,σ2 | y)=N β | β,σ2(XT X) 1 × Inv − χ2(σ2 | n − d − 1,s2) (d+1) ! " − β | 2 y β | β 2 XT X 1 ( ) p( σ , )=N(d+1) ,σ ( ) 2 2 2 2 p(σ | y)=Inv − χ!(σ | n − d − 1,s )" 2 T −1 p(β | y)=Tn−d−1 β | β,s (X X)


4.7.3 Posterior summary measures for the linear regression model

Posterior summary measures of:
 (a) the regression parameters β
 (b) the parameter of residual variability σ²

Multivariate posterior summary measures:
 ◦ Posterior mean (mode) of β = β̂ (MLE = LSE)
 ◦ 100(1 − α)% HPD region
 ◦ Contour probability for H0: β = β0

Univariate posterior summary measures:
 ◦ The marginal posterior mean (mode, median) of βj = MLE (LSE) β̂j
 ◦ 95% HPD interval for βj
 ◦ Marginal posterior mode and mean of σ²
 ◦ 95% HPD interval for σ²

Bayesian Biostatistics - Piracicaba 2014 236 Bayesian Biostatistics - Piracicaba 2014 237 Posterior predictive distribution 4.7.4 Sampling from the posterior distribution

• PPD of y with x: t-distribution • Most posteriors can be sampled via standard sampling algorithms • How to sample? • What about p(β | y) = multivariate t-distribution? How to sample from this  Directly from t-distribution distribution? (R function rmvt in mvtnorm)  Method of Composition • Easy with Method of Composition: Sample in two steps  Sample from p(σ2 | y): scaled inverse chi-squared distribution ⇒ σ2  Sample from p(β | σ2, y) = multivariate normal distribution
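The two-step sampler for the regression case can be sketched as follows; the data are simulated stand-ins chosen to resemble the osteoporosis numbers (intercept ≈ 0.81, slope ≈ 0.04, s² ≈ 0.29), not the real study data:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 234
bmi = rng.uniform(20, 35, n)
tbbmc = 0.81 + 0.04 * bmi + rng.normal(0, 0.54, n)   # sqrt(0.29) ~ 0.54
X = np.column_stack([np.ones(n), bmi])

# Frequentist building blocks of the posterior
XtX_inv = np.linalg.inv(X.T @ X)
bhat = XtX_inv @ X.T @ tbbmc
dof = n - 2
s2 = ((tbbmc - X @ bhat)**2).sum() / dof

# Step 1: sigma^2 ~ Inv-chi^2(dof, s2); step 2: beta | sigma^2 ~ N(bhat, sigma^2 (X'X)^-1)
sig2 = dof * s2 / rng.chisquare(dof, size=5_000)
beta = np.array([rng.multivariate_normal(bhat, v * XtX_inv) for v in sig2])
print(np.round(beta.mean(axis=0), 3))   # close to the LSE bhat
```

Each row of `beta` is one draw from the multivariate t-posterior p(β | y), obtained without ever sampling a t-distribution directly.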


Example IV.8: Osteoporosis study – Sampling with the Method of Composition

• Sample σ̃² from p(σ² | y) = Inv-χ²(σ² | n − d − 1, s²)
• Sample β̃ from p(β | σ̃², y) = N_{(d+1)}(β | β̂, σ̃²(X^T X)⁻¹)
• Sampled mean regression vector = (0.816, 0.0403)
• 95% equal tail CIs: β0: [0.594, 1.040] & β1: [0.0317, 0.0486]
• Contour probability for H0: β = 0: < 0.001
• Marginal posterior of (β0, β1) has a ridge (r(β0, β1) = −0.99)

PPD:

• Distribution of a future observation at bmi = 30
• Sample a future observation ỹ from N(μ̃30, σ̃30²) with:
 ◦ μ̃30 = β̃^T (1, 30)^T
 ◦ σ̃30² = σ̃² [1 + (1, 30)(X^T X)⁻¹(1, 30)^T]
• Sampled mean and standard deviation = 2.033 and 0.282

Posterior distributions: (figure: marginal posteriors of β0 and β1 and their joint ridge)

4.8 Bayesian generalized linear models

Generalized linear model (GLIM): extension of the linear regression model to a wide class of regression models

• Examples:
 ◦ Normal linear regression model: normal distribution for a continuous response, with σ assumed known
 ◦ Poisson regression model: Poisson distribution for a count response, and log(mean) = linear function of the covariates
 ◦ Logistic regression model: Bernoulli distribution for a binary response, and logit of the probability = linear function of the covariates


4.8.1 More complex regression models

• The considered multiparameter models are limited:
 ◦ Weibull distribution for alp?
 ◦ Censored/truncated data?
 ◦ Cox regression?
• Postponed to the MCMC techniques

Take home messages

• Any practical application involves more than one parameter; hence Bayesian inference is immediately multivariate, even with univariate data
• A multivariate prior is needed and a multivariate posterior is obtained, but the marginal posterior is the basis for practical inference
• Nuisance parameters:
 ◦ Bayesian inference: average out the nuisance parameters
 ◦ Classical inference: profile out (maximize over) the nuisance parameters
• Multivariate independent sampling can be done, if the marginals can be computed
• Frequentist properties of Bayesian estimators (with NI priors) are often good

Chapter 5: Choosing the prior distribution

Aims:
 ◦ Review the different principles that lead to a prior distribution
 ◦ Critically review the impact of the subjectivity of prior information

5.1 Introduction

Incorporating prior knowledge:
 ◦ Unique feature of the Bayesian approach
 ◦ But might introduce subjectivity
 ◦ Useful in clinical trials to reduce the sample size

In this chapter we review different kinds of priors:
 ◦ Conjugate
 ◦ Noninformative
 ◦ Informative


5.2 The sequential use of Bayes theorem

• Posterior of the kth experiment = prior for the (k+1)th experiment (sequential surgeries)
• In this way, the Bayesian approach can mimic our human learning process
• Meaning of ‘prior’ in prior distribution:
 ◦ Prior: prior knowledge should be specified independently of the collected data
 ◦ In RCTs: fix the prior distribution in advance

5.3 Conjugate prior distributions

In this section:
• Conjugate priors for univariate & multivariate data distributions
• Conditional conjugate and semi-conjugate distributions
• Hyperpriors
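The sequential mechanism is easy to verify on a conjugate example; the two batches of successes/failures below are hypothetical:

```python
# Two batches of Bernoulli data analysed sequentially with a Beta(1, 1) start
a, b = 1, 1
for successes, failures in [(7, 3), (12, 8)]:     # hypothetical batch counts
    a, b = a + successes, b + failures            # posterior of batch k = prior for batch k+1

print(a, b)   # Beta(20, 12): identical to analysing all 30 observations at once
```

The final Beta(20, 12) is the same whether the data arrive in batches or all at once, which is exactly the learning mechanism the slide describes.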


• In previous chapters, examples were given whereby combination of prior with member Parameter Conjugate prior likelihood, gives posterior of the same type as the prior. UNIVARIATE CASE Discrete distributions

• This property is called conjugacy. Bernoulli Bern(θ) θ Beta(α0, β0)

Binomial Bin(n,θ) θ Beta(α0, β0)

• For an important class of distributions (those that belong to exponential family) Negative Binomial NB(k,θ) θ Beta(α0, β0)

there is a recipe to produce the conjugate prior Poisson Poisson(λ) λ Gamma(α0, β0)

Bayesian Biostatistics - Piracicaba 2014 250 Bayesian Biostatistics - Piracicaba 2014 251

Table of conjugate priors for univariate continuous data distributions

  Exponential family member                   Parameter   Conjugate prior
  UNIVARIATE CASE — Continuous distributions
  Normal, variance fixed: N(μ, σ²), σ² fixed  μ           N(μ0, σ0²)
  Normal, mean fixed: N(μ, σ²), μ fixed       σ²          IG(α0, β0) or Inv-χ²(ν0, τ0²)
  Normal: N(μ, σ²)                            μ, σ²       NIG(μ0, κ0, a0, b0) = N-Inv-χ²(μ0, κ0, ν0, τ0²)
  Exponential Exp(λ)                          λ           Gamma(α0, β0)

Recipe to choose conjugate priors

p(y | θ) ∈ exponential family:

  p(y | θ) = b(y) exp[c(θ)^T t(y) + d(θ)]

 ◦ d(θ), b(y) = scalar functions; c(θ) = (c1(θ), ..., cd(θ))^T (canonical parameter)
 ◦ t(y) = d-dimensional sufficient statistic for θ
 ◦ Examples: binomial distribution, Poisson distribution, normal distribution, etc.

For a random sample y = {y1, ..., yn} of i.i.d. elements:

  p(y | θ) = b(y) exp[c(θ)^T t(y) + n d(θ)]

 ◦ b(y) = Π_{i=1}^n b(yi) & t(y) = Σ_{i=1}^n t(yi)

Recipe to choose conjugate priors – continued

For the exponential family, the class of prior distributions closed under sampling is:

  p(θ | α, β) = k(α, β) exp[c(θ)^T α + β d(θ)]

 ◦ α = (α1, ..., αd)^T and β
 ◦ Normalizing constant: k(α, β) = 1 / ∫ exp[c(θ)^T α + β d(θ)] dθ

Proof of closure:

  p(θ | y) ∝ p(y | θ) p(θ)
           = exp[c(θ)^T t(y) + n d(θ)] exp[c(θ)^T α + β d(θ)]
           = exp[c(θ)^T α* + β* d(θ)], with α* = α + t(y), β* = β + n

• The above rule gives the natural conjugate family
• Enlarging the class of priors by adding extra parameters gives the conjugate family of priors, again closed under sampling (O'Hagan & Forster, 2004)
• The conjugate prior has the same functional form as the likelihood, obtained by replacing the data (t(y) and n) by parameters (α and β)
• A conjugate prior is model-dependent, in fact likelihood-dependent


Practical advantages when using conjugate priors

A (natural) conjugate prior distribution for the exponential family is convenient from several viewpoints:
• mathematical
• numerical
• interpretational (convenience prior):
 ◦ The likelihood of historical data can easily be turned into a conjugate prior. The natural conjugate distribution = equivalent to a fictitious experiment
 ◦ For a natural conjugate prior, the posterior mean = weighted combination of the prior mean and the sample estimate

Example V.2: Dietary study – Normal versus t-prior

• Example II.2: the IBBENS-2 normal likelihood was combined with a N(328, 100) (conjugate) prior distribution
• Replace the normal prior by a t30(328, 100)-prior ⇒ the posterior is practically unchanged, but 3 elegant features of the normal prior are lost:
 ◦ The posterior cannot be determined analytically
 ◦ The posterior is not of the same class as the prior
 ◦ The posterior summary measures are not obvious functions of the prior and sample summary measures

5.3.2 Conjugate prior for the normal distribution – mean and variance unknown

Mean known and variance unknown

• For σ² unknown and μ known: the natural conjugate is the inverse gamma (IG)
• Equivalently: the scaled inverse-χ² distribution (Inv-χ²)

Mean and variance unknown

• N(μ, σ²) with μ and σ² unknown ∈ two-parameter exponential family
• Conjugate = product of a normal prior with an inverse gamma prior
• Notation: NIG(μ0, κ0, a0, b0)


5.3.3 Multivariate data distributions

Priors for two popular multivariate models:
• Multinomial model
• Multivariate normal model

Table of conjugate priors for multivariate data distributions

  Exponential family member                  Parameter   Conjugate prior
  MULTIVARIATE CASE — Discrete distributions
  Multinomial Mult(n, θ)                     θ           Dirichlet(α0)
  MULTIVARIATE CASE — Continuous distributions
  Normal, covariance fixed: N(μ, Σ), Σ fixed μ           N(μ0, Σ0)
  Normal, mean fixed: N(μ, Σ), μ fixed       Σ           IW(Λ0, ν0)
  Normal: N(μ, Σ)                            μ, Σ        NIW(μ0, κ0, ν0, Λ0)

Multinomial model

Mult(n, θ): p(y | θ) = [n! / (y1! y2! ... yk!)] Π_{j=1}^k θj^{yj} ∈ exponential family

Natural conjugate: the Dirichlet(α0) distribution

  p(θ | α0) = [Γ(Σ_{j=1}^k α0j) / Π_{j=1}^k Γ(α0j)] Π_{j=1}^k θj^{α0j − 1}

Properties:
 ◦ Posterior distribution = Dirichlet(α0 + y)
 ◦ Beta distribution = special case of a Dirichlet distribution with k = 2
 ◦ Marginal distributions of the Dirichlet distribution = beta distributions
 ◦ Dirichlet(1, 1, ..., 1) = extension of the classical uniform prior Beta(1, 1)

Multivariate normal model

The p-dimensional multivariate normal distribution:

  p(y1, ..., yn | μ, Σ) = 1/[(2π)^{np/2} |Σ|^{n/2}] exp[−½ Σ_{i=1}^n (yi − μ)^T Σ⁻¹ (yi − μ)]

Conjugates:
 ◦ Σ known and μ unknown: N(μ0, Σ0) for μ
 ◦ Σ unknown and μ known: inverse Wishart distribution IW(Λ0, ν0) for Σ
 ◦ Σ unknown and μ unknown: Normal-inverse Wishart distribution NIW(μ0, κ0, ν0, Λ0) for μ and Σ


5.3.4 Conditional conjugate and semi-conjugate priors

Example: θ = (μ, σ²) for y ∼ N(μ, σ²)

• Conditional conjugate for μ: N(μ0, σ0²)
• Conditional conjugate for σ²: IG(α, β)
• Semi-conjugate prior = product of the conditional conjugates
• Often conjugate priors cannot be used in WinBUGS, but semi-conjugates are popular

5.3.5 Hyperpriors

Conjugate priors are restrictive to represent prior knowledge
⇒ give the parameters of the conjugate prior also a prior

Example:
• Prior: θ ∼ Beta(1, 1)
• Instead: θ ∼ Beta(α, β) with α ∼ Gamma(1, 3), β ∼ Gamma(2, 4)
 ◦ α, β = hyperparameters
 ◦ Gamma(1, 3) × Gamma(2, 4) = hyperprior/hierarchical prior
• Aim: more flexibility in the prior distribution (and useful for Gibbs sampling)
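The marginal prior for θ induced by this hyperprior has no simple closed form, but it is easy to sample. The sketch below assumes the Gamma(1, 3) and Gamma(2, 4) are (shape, rate) pairs — an assumption, since the slide does not fix the parametrization:

```python
import numpy as np

rng = np.random.default_rng(5)
# theta ~ Beta(alpha, beta) with alpha ~ Gamma(1, 3) and beta ~ Gamma(2, 4)
alpha = rng.gamma(shape=1.0, scale=1 / 3, size=100_000)   # rate 3 -> scale 1/3
beta = rng.gamma(shape=2.0, scale=1 / 4, size=100_000)    # rate 4 -> scale 1/4
theta = rng.beta(alpha, beta)
print(round(float(theta.mean()), 2))   # mean of the induced marginal prior for theta
```

A histogram of `theta` shows a far more flexible prior shape than any single Beta(α, β).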

5.4 Noninformative prior distributions

5.4.1 Introduction

Sometimes/often researchers cannot or do not wish to make use of prior knowledge ⇒ the prior should reflect this absence of knowledge

• A prior that expresses no knowledge was (initially) called a noninformative (NI) prior
• Central question: what prior reflects absence of knowledge?
 ◦ A flat prior?
 ◦ A huge amount of research has gone into finding the best NI prior
 ◦ Other terms for NI: non-subjective, objective, default, reference, weak, diffuse, flat, conventional and minimally informative, etc.
• Challenge: make sure that the posterior is a proper distribution!


5.4.2 Expressing ignorance

• Equal prior probabilities = principle of insufficient reason, principle of indifference, Bayes–Laplace postulate
• Unfortunately, a flat prior cannot express ignorance

Ignorance at different scales: (figure: a flat prior on the σ-scale and the prior it induces on the σ²-scale)

Ignorance on the σ-scale is different from ignorance on the σ²-scale

5.4.3 General principles to choose noninformative priors

A lot of research has been spent on the specification of NI priors; the most popular are Jeffreys priors:

• Ignorance cannot be expressed mathematically
• The result of a Bayesian analysis depends on the choice of scale for the flat prior: p(θ) ∝ c or p(h(θ)) ≡ p(ψ) ∝ c
• To preserve conclusions when changing scale: Jeffreys suggested a rule to construct priors based on the invariance principle/rule (conclusions do not change when changing scale)
• Jeffreys rule suggests a way to choose the scale on which to take the flat prior
• A Jeffreys rule also exists for more than one parameter (Jeffreys multi-parameter rule)


Examples of Jeffreys priors

• Binomial model: p(θ) ∝ θ^(−1/2) (1 − θ)^(−1/2) ⇔ flat prior on ψ(θ) ∝ arcsin √θ

• Poisson model: p(λ) ∝ λ^(−1/2) ⇔ flat prior on ψ(λ) = √λ

• Normal model with σ fixed: p(μ) ∝ c

• Normal model with μ fixed: p(σ²) ∝ σ^(−2) ⇔ flat prior on log(σ)

• Normal model with μ and σ² unknown: p(μ, σ²) ∝ σ^(−2), which reproduces some classical frequentist results!

5.4.4 Improper prior distributions

• Many NI priors are improper (= AUC is infinite)

• An improper prior is technically no problem when the posterior is proper

• Example: normal likelihood (μ unknown, σ² known) + flat prior p(μ) ∝ c:

    p(μ | y) = p(y | μ) p(μ) / ∫ p(y | μ) p(μ) dμ
             = p(y | μ) c / ∫ p(y | μ) c dμ
             = 1/(√(2π) σ/√n) exp[ −(1/2) ( (μ − ȳ)/(σ/√n) )² ]

• Complex models: difficult to know when an improper prior yields a proper posterior (e.g. the variance of the level-2 observations in a Gaussian hierarchical model)

• Interpretation of improper priors?
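The binomial Jeffreys-prior claim above can be checked by simulation. The sketch below draws θ from Beta(1/2, 1/2) (the Jeffreys prior for the binomial model) and verifies that arcsin √θ, rescaled to [0, 1], behaves like a uniform variable:

```python
import numpy as np

# Simulation check: if theta ~ Beta(1/2, 1/2), then psi = arcsin(sqrt(theta))
# is uniform on [0, pi/2], i.e. the Jeffreys prior is "flat" on the psi-scale.
rng = np.random.default_rng(0)
theta = rng.beta(0.5, 0.5, size=500_000)
psi = np.arcsin(np.sqrt(theta)) / (np.pi / 2)   # rescaled to [0, 1]

# A Uniform(0, 1) variable has mean 1/2 and variance 1/12.
print(psi.mean(), psi.var())
```

The mean and variance land on the Uniform(0, 1) values, consistent with the flat-prior-on-ψ interpretation.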

5.4.5 Weak/vague priors

• For practical purposes it is sufficient that the prior is locally uniform, also called vague or weak

• Examples for the N(μ, σ²) likelihood:
  ◦ μ: N(0, σ0²) prior with σ0 large
  ◦ σ²: IG(ε, ε) prior with ε small ≈ Jeffreys prior

Locally uniform prior

• Locally uniform: prior ≈ constant on the interval outside which the likelihood ≈ zero

[Figure: likelihood of μ with a locally uniform prior]


Density of log(σ) for σ² (= 1/τ²) ∼ IG(ε, ε): [figure]

Vague priors in software:

• WinBUGS allows only (proper) vague priors (Jeffreys priors are not allowed)
  ◦ mu ∼ dnorm(0.0,1.0E-6): normal prior with variance = 10^6
  ◦ tau2 ∼ dgamma(0.001,0.001): inverse gamma prior for the variance with shape = rate = 10^(−3)

• SAS allows improper priors (Jeffreys priors are allowed)

Density of log(σ) for σ² ∼ IG(ε, ε): [figure]

5.5 Informative prior distributions


5.5.1 Introduction

• In basically all research some prior knowledge is available

• In this section:
  ◦ Formalize the use of historical data as prior information using the power prior
  ◦ Review the use of clinical priors, which are prior distributions based on either historical data or on expert knowledge
  ◦ Priors that are based on formal rules expressing prior skepticism and optimism

• The set of priors representing prior knowledge = subjective or informative priors

• But first, two success stories of how the Bayesian approach helped to find:
  ◦ a crashed plane
  ◦ a lost fisherman on the Atlantic Ocean

Locating a lost plane

 Bayesian methods helped locate an Air France plane in 2011 that had been missing for two years
 June 2009: Air France flight 447 went missing while flying from Rio de Janeiro, Brazil, to Paris, France
 Debris from the Airbus A330 was found floating on the surface of the Atlantic five days later
 After a number of days the debris would have moved with the ocean current, hence finding the black box is not easy
 Existing software (used by the US Coast Guard) did not help
 Senior analyst at Metron, Colleen Keller, relied on Bayesian methods to locate the black box in 2011

Finding a lost fisherman on the Atlantic Ocean

New York Times (30 September 2014)

 "... if not for statisticians, a Long Island fisherman might have died in the Atlantic Ocean after falling off his boat early one morning last summer

 The man owes his life to a once obscure field known as Bayesian statistics - a set of mathematical rules for using new data to continuously update beliefs or existing knowledge

 It is proving especially useful in approaching complex problems, including searches like the one the Coast Guard used in 2013 to find the missing fisherman, John Aldridge

 But the current debate is about how scientists turn data into knowledge, evidence

and predictions. Concern has been growing in recent years that some fields are not doing a very good job at this sort of inference. In 2012, for example, a team at the biotech company Amgen announced that they'd analyzed 53 cancer studies


and found it could not replicate 47 of them

 The Coast Guard has been using Bayesian analysis since the 1970s. The approach lends itself well to problems like searches, which involve a single incident and many different kinds of relevant data, said Lawrence Stone, a statistician for Metron, a scientific consulting firm in Reston, Va., that works with the Coast Guard

[Figure caption (recovered): The Coast Guard, guided by the statistical method of Thomas Bayes, was able to find the missing fisherman John Aldridge.]

5.5.2 Data-based prior distributions

• In previous chapters:
  ◦ Combined historical data with current data assuming identical conditions
  ◦ Discounted the importance of the prior data by increasing the variance

• Generalized by the power prior (Ibrahim and Chen):
  ◦ Likelihood of the historical data: L(θ | y0), based on y0 = {y01, ..., y0n0}
  ◦ Prior of the historical data: p0(θ | c0)
  ◦ Power prior distribution:

      p(θ | y0, a0) ∝ L(θ | y0)^a0 p0(θ | c0)

    with 0 ≤ a0 ≤ 1: a0 = 0 means no accounting, a0 = 1 means fully accounting for the historical data
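For a binomial parameter the power prior stays conjugate, which makes its effect easy to see. The sketch below uses hypothetical counts (y0 = 30/60 historical, y = 12/50 current) and a Beta(1, 1) initial prior; none of these numbers come from the slides:

```python
from scipy.stats import beta

# Power prior sketch for a binomial theta.  With a Beta(alpha0, beta0)
# initial prior, raising the historical binomial likelihood to the power a0
# gives the conjugate power prior Beta(alpha0 + a0*y0, beta0 + a0*(n0 - y0)),
# and the posterior simply adds the current counts.
def posterior(y, n, y0, n0, a0, alpha0=1.0, beta0=1.0):
    a = alpha0 + a0 * y0 + y
    b = beta0 + a0 * (n0 - y0) + (n - y)
    return beta(a, b)

y, n, y0, n0 = 12, 50, 30, 60               # hypothetical data sets
for a0 in (0.0, 0.5, 1.0):                  # no / partial / full accounting
    print(a0, round(posterior(y, n, y0, n0, a0).mean(), 3))
```

As a0 grows, the posterior mean is pulled from the current-data rate (0.24) toward the historical rate (0.50), which is exactly the discounting role of a0.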

5.5.3 Elicitation of prior knowledge

• Elicitation of prior knowledge: turn (qualitative) information from 'experts' into probabilistic language

• Challenges:
  ◦ Most experts have no statistical background
  ◦ What to ask to construct the prior distribution:
    - Prior mode, median, mean and prior 95% CI?
    - Description of the prior: quartiles, mean, SD?
  ◦ Some probability statements are easier to elicit than others

Example V.5: Stroke study – Prior for 1st interim analysis from experts

Prior knowledge on θ (incidence of SICH), elicitation based on:

◦ Most likely value for θ and prior equal-tail 95% CI
◦ Prior belief pk on each of the K intervals Ik ≡ [θ(k−1), θk) covering [0, 1]


Elicitation of prior knowledge – some remarks

• Community and consensus prior: obtained from a community of experts

• Difficulty in eliciting prior information on more than one parameter jointly

• Lack of Bayesian papers based on genuine prior information

Identifiability issues

• With an overspecified model: non-identifiable model

• When an unidentified parameter is given an NI prior, its posterior is also NI

• The Bayesian approach can make parameters estimable, so that the model becomes identifiable

• In the next example, not all parameters can be estimated without extra (prior) information

Example V.6: Cysticercosis study – Estimate prevalence without a gold standard

Experiment:

 868 pigs tested in Zambia with the Ag-ELISA diagnostic test
 496 pigs showed a positive test
 Aim: estimate the prevalence π of cysticercosis among pigs in Zambia
 Only the collapsed (observed) table is available
 Since α and β vary geographically, expert knowledge is needed

Data:

Table of results (cell probabilities and observed totals):

  Test      Disease +     Disease −        Observed
  +         πα            (1 − π)(1 − β)   n+ = 496
  −         π(1 − α)      (1 − π)β         n− = 372
  Total     π             1 − π            n  = 868

If estimates of the sensitivity α and the specificity β are available, then:

    π = (p+ + β − 1) / (α + β − 1)

◦ p+ = n+/n = proportion of subjects with a positive test
◦ α and β = estimated sensitivity and specificity
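The correction formula above is a one-liner; the sketch below applies it to the slide's counts (496/868) with hypothetical plug-in values for α and β, taken roughly as the means of the Beta(21,12) and Beta(32,4) priors mentioned later (≈ 0.64 and ≈ 0.89):

```python
# Correct the apparent prevalence p+ for test error:
#   pi = (p+ + beta - 1) / (alpha + beta - 1)
# alpha = sensitivity, beta = specificity (hypothetical plug-in values here).
def true_prevalence(n_pos, n, alpha, beta):
    p_plus = n_pos / n
    return (p_plus + beta - 1.0) / (alpha + beta - 1.0)

# 496 positives out of 868 pigs; alpha ~ 0.64, beta ~ 0.89 are roughly the
# means of the Beta(21,12) and Beta(32,4) expert priors.
print(round(true_prevalence(496, 868, 0.64, 0.89), 3))
```

Note the sanity check built into the formula: with a perfect test (α = β = 1) it returns p+ itself.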


Prior and posterior:

• Prior distributions on π (p(π)), α (p(α)) and β (p(β)) are needed

• Posterior distribution:

    p(π, α, β | n+, n−) ∝ C(n, n+) [πα + (1 − π)(1 − β)]^(n+) [π(1 − α) + (1 − π)β]^(n−) p(π) p(α) p(β)

• WinBUGS was used

Posterior of π: [figure]

(a) Uniform priors for π, α and β (no prior information)
(b) Beta(21,12) prior for α and Beta(32,4) prior for β (historical data)

5.5.4 Archetypal prior distributions

• Use of prior information in Phase III RCTs is problematic, except for medical device trials (FDA guidance document)
  ⇒ pleas for objective priors in RCTs

• There is a role for subjective priors in interim analyses:
  ◦ Skeptical prior
  ◦ Enthusiastic prior

Example V.7: Skeptical priors in a phase III RCT

Tan et al. (2003):

 Phase III RCT for treating patients with hepatocellular carcinoma
 Standard treatment: surgical resection
 Experimental treatment: surgery + adjuvant radioactive iodine (adjuvant therapy)
 Planning: recruit 120 patients
 Frequentist interim analyses for efficacy were planned:
  ◦ First interim analysis (30 patients): experimental treatment better (P = 0.01 < 0.029 = P-value of the stopping rule)
 But the scientific community was skeptical about adjuvant therapy
  ⇒ a new multicentric trial (300 patients) was set up


Questionnaire: [figure]

Prior to the start of the subsequent trial:

 Pretrial opinions of the 14 clinical investigators were elicited
 The prior distribution of each investigator was constructed by eliciting the prior belief on the treatment effect (adjuvant versus standard) on a grid of intervals
 Average of all priors = community prior
 Average of the priors of the 5 most skeptical investigators = skeptical prior

To exemplify the use of the skeptical prior:

 Combine skeptical prior with interim analysis results of previous trial ⇒ 1-sided contour probability (in 1st interim analysis) = 0.49 ⇒ The first trial would not have been stopped for efficacy

Prior of investigators: [figure]

Skeptical priors: [figure]


A formal skeptical/enthusiastic prior

Formal subjective priors (Spiegelhalter et al., 1994) in the normal case:

• Useful in the context of monitoring clinical trials in a Bayesian manner

• θ = true effect of treatment (A versus B)

• Skeptical normal prior: choose the mean and variance of p(θ) to reflect skepticism

• Enthusiastic normal prior: choose the mean and variance of p(θ) to reflect enthusiasm

• See figure next page & book

Example V.8+9: [figure]

5.6 Prior distributions for regression models

5.6.1 Normal linear regression

Normal linear regression model:

    yi = xiᵀβ + εi,  (i = 1, ..., n)

or in matrix notation:

    y = Xβ + ε


Priors

• Non-informative priors:
  ◦ Popular NI prior: p(β, σ²) ∝ σ^(−2) (Jeffreys multi-parameter rule)
  ◦ WinBUGS: product of independent N(0, σ0²) (σ0 large) + IG(ε, ε) (ε small)

• Conjugate priors:
  ◦ Conjugate NIG prior = N(β0, σ²Σ0) × IG(a0, b0) (or Inv-χ²(ν0, τ0²))

• Historical/expert priors:
  ◦ Prior knowledge on the regression coefficients must be given jointly
  ◦ Elicitation process via distributions at covariate values
  ◦ Most popular: express the prior based on historical data

5.6.2 Generalized linear models

• In practice the choice of NI priors is much the same as with linear models

• But a too large prior variance may not be best for sampling, e.g. in a logistic regression model

• In SAS: the Jeffreys (improper) prior can be chosen

• Conjugate priors are based on fictive historical data:
  ◦ Data augmentation priors & conditional mean priors
  ◦ Not implemented in classical software, but the fictive data can be explicitly added and then standard software can be used

5.7 Modeling priors

Modeling prior: adapt characteristics of the statistical model

• Multicollinearity: an appropriate prior avoids inflation of the estimates of β

• Numerical (separation) problems: an appropriate prior avoids inflation of β

• Constraints on parameters: the constraint can be put in the prior

• Variable selection: the prior can direct the variable search

Multicollinearity

• Multicollinearity: |XᵀX| ≈ 0 ⇒ regression coefficients and standard errors inflated

• Ridge regression:
  ◦ Minimize: (y* − Xβ)ᵀ(y* − Xβ) + λβᵀβ with λ ≥ 0 and y* = y − ȳ1n
  ◦ Estimate: βR(λ) = (XᵀX + λI)^(−1) Xᵀy*

• = Posterior mode of a Bayesian normal linear regression analysis with:
  ◦ Normal ridge prior N(0, τ²I) for β, where τ² = σ²/λ with σ and λ fixed

• Can be easily extended to BGLIM
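The ridge-as-posterior-mode identity can be verified numerically. The sketch below, on simulated data with σ fixed at 1, compares the closed-form ridge estimate with the mode of the corresponding log posterior found by a general-purpose optimizer:

```python
import numpy as np
from scipy.optimize import minimize

# Check: the ridge estimate equals the posterior mode of a normal linear
# model with a N(0, tau^2 I) prior on beta, tau^2 = sigma^2/lambda.
# Data are simulated; the coefficients (1, -0.5, 0.25) are arbitrary.
rng = np.random.default_rng(1)
n, p, lam, sigma2 = 50, 3, 2.0, 1.0
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -0.5, 0.25]) + rng.normal(size=n)
y_star = y - y.mean()                       # centred response y* = y - ybar 1_n

# Closed-form ridge estimate: (X'X + lambda I)^{-1} X'y*
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y_star)

# Posterior mode: minimise the negative log posterior numerically
tau2 = sigma2 / lam
def neg_log_post(b):
    return (np.sum((y_star - X @ b) ** 2) / (2.0 * sigma2)
            + np.sum(b ** 2) / (2.0 * tau2))
beta_map = minimize(neg_log_post, np.zeros(p)).x

print(np.max(np.abs(beta_ridge - beta_map)))   # essentially zero
```

Both routes solve the same penalized normal equations, which is the content of the slide's claim.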


Numerical (separation) problems

• Separation problems in binary regression models: complete separation and quasi-complete separation

• Solution: take a weakly informative prior on the regression coefficients

Constraints on parameters

Signal-Tandmobiel study:

• θk = probability of CE among Flemish children in school year k (k = 1, ..., 6)

• Constraint on the parameters: θ1 ≤ θ2 ≤ ··· ≤ θ6

• Solutions:
  ◦ Prior on θ = (θ1, ..., θ6)ᵀ that maps all θs violating the constraint to zero
  ◦ Neglect the values that are not allowed in the posterior (useful when sampling)

Other modeling priors

• LASSO prior (see Bayesian variable selection)
• ...

5.8 Other regression models

• A great variety of models
• Not considered here: conditional logistic regression model, Cox proportional hazards model, generalized linear mixed effects models
• ...


Take home messages

• Noninformative priors:
  ◦ do not exist, strictly speaking
  ◦ in practice vague priors (e.g. locally uniform) are ok
  ◦ important class of NI priors: Jeffreys priors
  ◦ be careful with improper priors, they might imply an improper posterior

• Often the prior is dominated by the likelihood (data)

• Prior in RCTs: must be specified prior to the trial

• Informative priors:
  ◦ can be based on historical data & expert knowledge (but only useful when representing the viewpoint of a community of experts)
  ◦ are useful in clinical trials to reduce the sample size

• Conjugate priors: convenient mathematically, computationally and from an interpretational viewpoint

• Conditional conjugate priors: heavily used in Gibbs sampling

• Hyperpriors: extend the range of conjugate priors, also important in Gibbs sampling

Chapter 6: Markov chain Monte Carlo sampling

Aims:

 Introduce the sampling approaches that revolutionized the Bayesian approach
 Markov chain Monte Carlo (MCMC) methods:
  1. Gibbs sampler
  2. Metropolis(-Hastings) algorithm

6.1 Introduction

 Solving the posterior distribution analytically is often not feasible due to the difficulty in determining the integration constant
 Computing the integral using numerical integration methods is a practical alternative only if a few parameters are involved
 ⇒ A new computational approach is needed: sampling is the way to go!

MCMC approaches have revolutionized Bayesian methods!


Intermezzo: Joint, marginal and conditional probability

Two (discrete) random variables X and Y:

• Joint probability of X and Y: probability that X = x and Y = y happen together
• Marginal probability of X: probability that X = x happens
• Marginal probability of Y: probability that Y = y happens
• Conditional probability of X given Y = y: probability that X = x happens if Y = y
• Conditional probability of Y given X = x: probability that Y = y happens if X = x

IBBENS study: 563 (556) bank employees in 8 subsidiaries of a Belgian bank participated in a dietary study

[Figure: scatterplot of weight (40-140 kg) versus length (150-200 cm)]

IBBENS study: frequency table

                                Length (cm)
Weight (kg)   <150  150-160  160-170  170-180  180-190  190-200  ≥200   Total
  <50            2       12        4        0        0        0     0      18
  50-60          1       25       50       14        0        0     0      90
  60-70          0       12       54       52       13        1     0     132
  70-80          0        5       42       72       34        0     0     153
  80-90          0        0       12       58       32        2     1     105
  90-100         0        0        0       20       18        3     0      41
  100-110        0        0        1        2        7        1     0      11
  110-120        0        0        0        2        2        1     0       5
  ≥120           0        0        0        0        1        0     0       1
  Total          3       54      163      220      107        8     1     556


IBBENS study: joint and marginal probabilities

Dividing each cell of the frequency table by n = 556 gives the joint probabilities; the row totals give the marginal probabilities of weight and the column totals those of length:

                                Length (cm)
Weight (kg)   <150    150-160  160-170  170-180  180-190  190-200  ≥200    Total
  <50         2/556   12/556    4/556   0/556    0/556    0/556    0/556    18/556
  50-60       1/556   25/556   50/556   14/556   0/556    0/556    0/556    90/556
  60-70       0/556   12/556   54/556   52/556   13/556   1/556    0/556   132/556
  70-80       0/556    5/556   42/556   72/556   34/556   0/556    0/556   153/556
  80-90       0/556    0/556   12/556   58/556   32/556   2/556    1/556   105/556
  90-100      0/556    0/556    0/556   20/556   18/556   3/556    0/556    41/556
  100-110     0/556    0/556    1/556    2/556    7/556   1/556    0/556    11/556
  110-120     0/556    0/556    0/556    2/556    2/556   1/556    0/556     5/556
  ≥120        0/556    0/556    0/556    0/556    1/556   0/556    0/556     1/556
  Total       3/556   54/556  163/556  220/556  107/556   8/556    1/556     1


IBBENS study: conditional probabilities Two (continuous) random variables X and Y

Length • Joint density of X and Y: density f(x, y) Weight −150 150 − 160 160 − 170 170 − 180 180 − 190 190 − 200 200− total −50 12/54 • Marginal density of X: density f(x) 50 − 60 1/90 25/90 25/54 50/90 14/90 0/90 0/90 0/90 90/90 − 60 70 12/54 • Marginal density of Y: density f(y) 70 − 80 5/54 − 80 90 0/54 • Conditional density of X given Y=y: density f(x|y) 90 − 100 0/54 − 100 110 0/54 • Conditional density of Y given X=x: density f(y|x) 110 − 120 0/54 120− 0/54 Total 54/54


IBBENS study: joint density [figure]

IBBENS study: marginal densities [figure]

IBBENS study: conditional densities [figure]

6.2 The Gibbs sampler

• Gibbs sampler: introduced by Geman and Geman (1984) in the context of image processing, for the estimation of the parameters of the Gibbs distribution

• Gelfand and Smith (1990) introduced Gibbs sampling to tackle complex estimation problems in a Bayesian manner


6.2.1 The bivariate Gibbs sampler

Method of Composition:

• p(θ1, θ2 | y) is completely determined by:
  ◦ the marginal p(θ2 | y)
  ◦ the conditional p(θ1 | θ2, y)

• This split-up yields a simple way to sample from the joint distribution

Gibbs sampling:

• p(θ1, θ2 | y) is also completely determined by:
  ◦ the conditional p(θ2 | θ1, y)
  ◦ the conditional p(θ1 | θ2, y)

• This property yields another simple way to sample from the joint distribution:
  ◦ Take starting values θ1^0 and θ2^0 (only one is needed)
  ◦ Given θ1^k and θ2^k at iteration k, generate the (k+1)-th value according to the iterative scheme:
    1. Sample θ1^(k+1) from p(θ1 | θ2^k, y)
    2. Sample θ2^(k+1) from p(θ2 | θ1^(k+1), y)

Example VI.1: SAP study – Gibbs sampling the posterior with NI priors

• Example IV.5: sampling from the posterior distribution of the normal likelihood based on 250 alp measurements of 'healthy' patients, with an NI prior for both parameters

• Now using the Gibbs sampler, based on y = 100/√alp

• Determine the two conditional distributions:
  1. p(μ | σ², y): N(ȳ, σ²/n)
  2. p(σ² | μ, y): Inv-χ²(n, s²μ) with s²μ = (1/n) Σ_{i=1}^n (yi − μ)²

• Iterative procedure: at iteration (k+1)
  1. Sample μ^(k+1) from N(ȳ, (σ²)^k/n)
  2. Sample (σ²)^(k+1) from Inv-χ²(n, s²_{μ^(k+1)})

Result of Gibbs sampling:

• Chain of vectors θ^k = (θ1^k, θ2^k)ᵀ, k = 1, 2, ...
  ◦ consists of dependent elements
  ◦ Markov property: p(θ^(k+1) | θ^k, θ^(k−1), ..., y) = p(θ^(k+1) | θ^k, y)

• The chain depends on the starting value + the initial portion (burn-in part) must be discarded

• Under mild conditions: the chain is a sample from the posterior distribution = target distribution

• ⇒ From k0 on: summary measures calculated from the chain consistently estimate the true posterior measures

⇒ The Gibbs sampler generates a Markov chain, hence it is an MCMC method
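The two-step scheme above is only a few lines of code. The sketch below runs it on simulated data standing in for the (unavailable) 250 transformed alp measurements; a scaled inverse-χ² draw is obtained as n·s²μ divided by a χ²(n) draw:

```python
import numpy as np

# Minimal sketch of the bivariate Gibbs sampler of Example VI.1, with the
# NI prior p(mu, sigma^2) ∝ sigma^{-2}.  The data are simulated stand-ins
# (mean 7.11, SD 1.4 are arbitrary) for the 250 values of 100/sqrt(alp).
rng = np.random.default_rng(2024)
y = rng.normal(loc=7.11, scale=1.4, size=250)
n, ybar = len(y), y.mean()

iters, burnin = 1500, 500
mu, sigma2 = np.empty(iters), np.empty(iters)
mu[0], sigma2[0] = ybar, 1.0                 # starting values

for k in range(1, iters):
    # 1. mu | sigma^2, y ~ N(ybar, sigma^2/n)
    mu[k] = rng.normal(ybar, np.sqrt(sigma2[k - 1] / n))
    # 2. sigma^2 | mu, y ~ scaled-Inv-chi^2(n, s2_mu), s2_mu = mean((y - mu)^2)
    s2_mu = np.mean((y - mu[k]) ** 2)
    sigma2[k] = n * s2_mu / rng.chisquare(n)

mu_post, s2_post = mu[burnin:], sigma2[burnin:]
print(mu_post.mean(), s2_post.mean())
```

The posterior means land near the sample mean and sample variance, as expected with an NI prior and n = 250.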


Gibbs sampling: [figure]

◦ Sampling from the conditional density of μ given σ²
◦ Sampling from the conditional density of σ² given μ
◦ 1 complete step = 2 substeps (blue = genuine element)

Gibbs sampling path and sample from the joint posterior: [figure]

◦ Zigzag pattern in the (μ, σ²)-plane
◦ Burn-in = 500, total chain = 1,500


• ∝ n x+α−1 − (n−x+β−1) Joint distribution: f(x, y) x y (1 y) ◦ x a discrete random variable taking values in {0, 1,...,n} ◦ y a random variable on the unit interval ◦ α, β > 0 parameters

• Question: f(x)? μ σ

Solid line = true posterior distribution
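Example VI.2 above has standard full conditionals, x | y ∼ Binomial(n, y) and y | x ∼ Beta(x + α, n − x + β), and the marginal f(x) is known to be beta-binomial, which gives an exact check on the Gibbs output. A sketch with arbitrary values n = 16, α = 2, β = 4:

```python
import numpy as np
from scipy.stats import betabinom

# Gibbs sampler for f(x, y) ∝ C(n,x) y^(x+a-1) (1-y)^(n-x+b-1), using the
# standard full conditionals x|y ~ Binomial(n, y), y|x ~ Beta(x+a, n-x+b).
# The exact marginal f(x) is beta-binomial(n, a, b), used here as a check.
rng = np.random.default_rng(7)
n, a, b = 16, 2.0, 4.0                  # arbitrary illustration values
iters, burnin = 20_000, 1_000

x = n // 2                              # starting value
xs = np.empty(iters, dtype=int)
for k in range(iters):
    y = rng.beta(x + a, n - x + b)      # y | x
    x = rng.binomial(n, y)              # x | y
    xs[k] = x
xs = xs[burnin:]

exact_mean = betabinom(n, a, b).mean()  # = n*a/(a+b)
print(xs.mean(), exact_mean)
```

The empirical mean of the sampled x-chain matches the exact beta-binomial mean nα/(α+β), confirming that the two conditionals reproduce the joint (and hence the marginal) distribution.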


Marginal distribution: [figure]

◦ Solid line = true marginal distribution
◦ Burn-in = 500, total chain = 1,500

Example VI.3: SAP study – Gibbs sampling the posterior with I priors

• Example VI.1, now with independent informative priors (semi-conjugate prior):
  ◦ μ ∼ N(μ0, σ0²)
  ◦ σ² ∼ Inv-χ²(ν0, τ0²)

• Posterior:

    p(μ, σ² | y) ∝ (1/σ0) exp[ −(μ − μ0)²/(2σ0²) ]
                 × (σ²)^−(ν0/2+1) exp[ −ν0τ0²/(2σ²) ]
                 × (1/σⁿ) exp[ −(1/(2σ²)) Σ_{i=1}^n (yi − μ)² ]

                ∝ exp[ −(μ − μ0)²/(2σ0²) ] (σ²)^−((n+ν0)/2+1) exp[ −(Σ_{i=1}^n (yi − μ)² + ν0τ0²)/(2σ²) ]


• Determine the two conditional distributions:

  1. p(μ | σ², y) ∝ exp[ −(1/(2σ²)) Σ_{i=1}^n (yi − μ)² ] exp[ −(μ − μ0)²/(2σ0²) ] = N(μ̄, σ̄²),
     with σ̄² = 1/(1/σ0² + n/σ²) and μ̄ = σ̄² (μ0/σ0² + nȳ/σ²)

  2. p(σ² | μ, y) = Inv-χ²( ν0 + n, [Σ_{i=1}^n (yi − μ)² + ν0τ0²]/(ν0 + n) )

• Iterative procedure: at iteration (k+1)

  1. Sample μ^(k+1) from N(μ̄^k, (σ̄²)^k), computed from (σ²)^k
  2. Sample (σ²)^(k+1) from Inv-χ²( ν0 + n, [Σ_{i=1}^n (yi − μ^(k+1))² + ν0τ0²]/(ν0 + n) )

Trace plots: [figure]


6.2.2 The general Gibbs sampler

Multivariate version of the Gibbs sampler:

• Starting position: θ^0 = (θ1^0, ..., θd^0)ᵀ

• Iteration (k+1):
  1. Sample θ1^(k+1) from p(θ1 | θ2^k, ..., θ(d−1)^k, θd^k, y)
  2. Sample θ2^(k+1) from p(θ2 | θ1^(k+1), θ3^k, ..., θd^k, y)
  ...
  d. Sample θd^(k+1) from p(θd | θ1^(k+1), ..., θ(d−1)^(k+1), y)

• Full conditional distributions: p(θj | θ1, ..., θ(j−1), θ(j+1), ..., θd, y), also called full conditionals

• Under mild regularity conditions: θ^k, θ^(k+1), ... ultimately are observations from the posterior distribution

• With the help of advanced sampling algorithms (AR, ARS, ARMS, etc.), sampling the full conditionals is done based on the prior × likelihood


 British coal mining disasters data set: # severe accidents in British coal mines • Likelihood: Poisson process with a change point at k from 1851 to 1962  y ∼ Poisson(θ) for i =1,...,k  Decrease in frequency of disasters from year 40 (+ 1850) onwards? i  yi ∼ Poisson(λ) for i = k +1,...,n (n=112)

• Priors

 θ: Gamma(a1,b1), (a1 constant, b1 parameter)

 λ: Gamma(a2,b2), (a2 constant, b2 parameter)

  k: p(k)=1/n

 b1: Gamma(c1,d1), (c1,d1 constants)  b2: Gamma(c2,d2), (c2,d2 constants) _


Full conditionals:

    p(θ | y, λ, b1, b2, k) = Gamma( a1 + Σ_{i=1}^k yi, k + b1 )
    p(λ | y, θ, b1, b2, k) = Gamma( a2 + Σ_{i=k+1}^n yi, n − k + b2 )
    p(b1 | y, θ, λ, b2, k) = Gamma( a1 + c1, θ + d1 )
    p(b2 | y, θ, λ, b1, k) = Gamma( a2 + c2, λ + d2 )
    p(k | y, θ, λ, b1, b2) = π(y | k, θ, λ) / Σ_{j=1}^n π(y | j, θ, λ)

    with π(y | k, θ, λ) = exp[ k(λ − θ) ] (θ/λ)^(Σ_{i=1}^k yi)

Posterior distributions: [figure]

◦ a1 = a2 = 0.5, c1 = c2 = 0, d1 = d2 = 1
◦ Posterior mode of k: 1891
◦ Posterior mean of θ/λ = 3.42 with 95% CI = [2.48, 4.59]
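The full conditionals above translate directly into a Gibbs sampler. Since the actual disaster counts are not reproduced on the slides, the sketch below runs the same sampler on simulated counts with a known change point at year 41 (rates 3.0 before, 1.0 after, chosen to mimic the real series); k is restricted to 1, ..., n−1 so both rates stay defined:

```python
import numpy as np

# Change-point Gibbs sampler using the full conditionals above, run on
# simulated data (the real 1851-1962 series is not on the slides).
rng = np.random.default_rng(3)
n, true_k = 112, 41
y = np.concatenate([rng.poisson(3.0, true_k), rng.poisson(1.0, n - true_k)])

a1 = a2 = 0.5
c1 = c2 = 0.0
d1 = d2 = 1.0
iters, burnin = 5_000, 500
theta, lam, b1, b2, k = 1.0, 1.0, 1.0, 1.0, n // 2
ks = np.empty(iters, dtype=int)
cs = np.cumsum(y)                                 # cumulative sums of y

for it in range(iters):
    theta = rng.gamma(a1 + cs[k - 1], 1.0 / (k + b1))
    lam = rng.gamma(a2 + cs[-1] - cs[k - 1], 1.0 / (n - k + b2))
    b1 = rng.gamma(a1 + c1, 1.0 / (theta + d1))
    b2 = rng.gamma(a2 + c2, 1.0 / (lam + d2))
    # discrete conditional for k via log pi(y | k, theta, lam)
    j = np.arange(1, n)
    logw = j * (lam - theta) + cs[j - 1] * (np.log(theta) - np.log(lam))
    w = np.exp(logw - logw.max())
    k = rng.choice(j, p=w / w.sum())
    ks[it] = k

k_mode = np.bincount(ks[burnin:]).argmax()
print(k_mode)   # posterior mode of k, near the simulated change point 41
```

Note that `numpy`'s `gamma` takes a scale parameter, so the rate parameters of the conditionals are inverted.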

Note:

• In most published analyses of this data set b1 and b2 are given inverse gamma priors; the full conditionals are then also inverse gamma

• The results are almost the same ⇒ our analysis is a sensitivity analysis of the analyses seen in the literature

• Despite the classical full conditionals, the WinBUGS/OpenBUGS samplers for θ and λ are not standard gamma samplers but rather a slice sampler. See Exercise 8.10.

Example VI.5: Osteoporosis study – Using the Gibbs sampler

Bayesian linear regression model with NI priors:

 Regression model: tbbmc_i = β0 + β1 bmi_i + ε_i (i = 1, ..., n = 234)
 Priors: p(β0, β1, σ²) ∝ σ^(−2)
 Notation: y = (tbbmc_1, ..., tbbmc_234)ᵀ, x = (bmi_1, ..., bmi_234)ᵀ

Full conditionals:

    p(σ² | β0, β1, y) = Inv-χ²(n, s²β)
    p(β0 | σ², β1, y) = N(rβ1, σ²/n)
    p(β1 | σ², β0, y) = N(rβ0, σ²/xᵀx)

    with
    s²β = (1/n) Σ_i (yi − β0 − β1 xi)²
    rβ1 = (1/n) Σ_i (yi − β1 xi)
    rβ0 = Σ_i (yi − β0) xi / xᵀx


Comparison with Method of Composition:

Method of Composition
  Parameter    2.5%     25%      50%      75%      97.5%    Mean     SD
  β0           0.57     0.74     0.81     0.89     1.05     0.81     0.12
  β1           0.032    0.038    0.040    0.043    0.049    0.040    0.004
  σ²           0.069    0.078    0.083    0.088    0.100    0.083    0.008

Gibbs sampler
  Parameter    2.5%     25%      50%      75%      97.5%    Mean     SD
  β0           0.67     0.77     0.84     0.91     1.10     0.77     0.11
  β1           0.030    0.036    0.040    0.042    0.046    0.039    0.0041
  σ²           0.069    0.077    0.083    0.088    0.099    0.083    0.0077

◦ Method of Composition = 1,000 independently sampled values
◦ Gibbs sampler: burn-in = 500, total chain = 1,500

Index plot from Method of Composition: [figure]

Trace plot from Gibbs sampler: [figure]

Trace versus index plot:

Comparison of the index plot with the trace plot shows:

• σ²: index plot and trace plot similar ⇒ (almost) independent sampling

• β1: trace plot shows slow mixing ⇒ quite dependent sampling

⇒ Method of Composition and Gibbs sampling: similar posterior measures for σ²

⇒ Method of Composition and Gibbs sampling: less similar posterior measures for β1


Autocorrelation:

 Autocorrelation of lag 1: correlation of β1^k with β1^(k−1) (k = 1, ...)
 Autocorrelation of lag 2: correlation of β1^k with β1^(k−2) (k = 1, ...)
 ...
 Autocorrelation of lag m: correlation of β1^k with β1^(k−m) (k = 1, ...)

High autocorrelation:
⇒ burn-in part is larger ⇒ takes longer to forget the initial position
⇒ remaining part needs to be longer to obtain stable posterior measures

6.2.3 Remarks*

• The full conditionals determine the joint distribution

• Generate the joint distribution from the full conditionals

• Transition kernel
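The lag-m autocorrelation defined above is straightforward to compute from a stored chain. The sketch below checks the estimator on a synthetic AR(1) chain (an arbitrary stand-in for an MCMC chain) whose theoretical lag-m autocorrelation is ρ^m:

```python
import numpy as np

# Lag-m autocorrelation of a chain: correlation between x_k and x_{k-m}.
def autocorr(chain, lag):
    return np.corrcoef(chain[lag:], chain[:-lag])[0, 1]

# Synthetic AR(1) chain x_k = rho*x_{k-1} + eps_k: its theoretical lag-m
# autocorrelation is rho^m, which the estimator should reproduce.
rng = np.random.default_rng(5)
rho, n = 0.8, 100_000
eps = rng.normal(size=n)
chain = np.empty(n)
chain[0] = eps[0]
for k in range(1, n):
    chain[k] = rho * chain[k - 1] + eps[k]

for m in (1, 2, 5):
    print(m, round(autocorr(chain, m), 2))   # approx 0.8, 0.64, 0.33
```

High values of this function at large lags are exactly the "slow mixing" pattern seen in the β1 trace plot above.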

6.2.4 Review of Gibbs sampling approaches

Sampling the full conditionals is done via different algorithms depending on:

 Shape of the full conditional (classical versus general-purpose algorithm)
 Preference of the software developer:
  ◦ SAS procedures GENMOD, LIFEREG and PHREG: ARMS algorithm
  ◦ WinBUGS: variety of samplers

Several versions of the basic Gibbs sampler:

 Deterministic- or systematic-scan Gibbs sampler: d dimensions visited in a fixed order
 Block Gibbs sampler: d dimensions split up into m blocks of parameters and the Gibbs sampler applied to the blocks

Review of Gibbs sampling approaches – The block Gibbs sampler

• Normal linear regression:
  ◦ p(σ² | β0, β1, y)
  ◦ p(β0, β1 | σ², y)

• May speed up convergence considerably, at the expense of more computational time needed at each iteration

• WinBUGS: blocking option on

• SAS procedure MCMC: allows the user to specify the blocks


6.3 The Metropolis(-Hastings) algorithm

• Metropolis-Hastings (MH) algorithm = general Markov chain Monte Carlo technique to sample from the posterior distribution that does not require full conditionals

• Special case: Metropolis algorithm, proposed by Metropolis in 1953

• General case: Metropolis-Hastings algorithm, proposed by Hastings in 1970

• Became popular only after the introduction of Gelfand & Smith's paper (1990)

• Further generalization: Reversible Jump MCMC algorithm by Green (1995)

6.3.1 The Metropolis algorithm

Sketch of the algorithm:

• New positions are proposed by a proposal density q

• Proposed positions will be:
  ◦ Accepted:
    - proposed location has higher posterior probability: with probability 1
    - otherwise: with probability proportional to the ratio of the posterior probabilities
  ◦ Rejected: otherwise

• The algorithm again satisfies the Markov property ⇒ MCMC algorithm

• Similarity with the AR algorithm

Bayesian Biostatistics - Piracicaba 2014 356 Bayesian Biostatistics - Piracicaba 2014 357 Metropolis algorithm: The MH algorithm only requires the product of the prior and the likelihood θk ⇒ θ(k+1) Chain is at Metropolis algorithm samples value as follows: to sample from the posterior   1. Sample a candidate θ from the symmetric proposal density q(θ | θ), with θ = θk 2. The next value θ(k+1) will be equal to:   • θ with probability α(θk, θ) (accept proposal), • θk otherwise (reject proposal), with # $   p(θ | y) α(θk, θ)=min r = , 1 p(θk | y)

 Function α(θk, θ) = probability of a move
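A minimal random-walk Metropolis sampler illustrating the acceptance rule above; the target here is a standard normal (chosen only so the output can be checked exactly), passed as an unnormalized log posterior:

```python
import numpy as np

# Random-walk Metropolis sketch: only an unnormalised log posterior
# (log prior + log likelihood) is needed.  Target: standard normal.
def metropolis(log_post, start, n_iter, step, rng):
    chain = np.empty(n_iter)
    theta = start
    lp = log_post(theta)
    for k in range(n_iter):
        cand = rng.normal(theta, step)        # symmetric proposal q
        lp_cand = log_post(cand)
        # accept with probability min(p(cand|y)/p(theta|y), 1)
        if np.log(rng.uniform()) < lp_cand - lp:
            theta, lp = cand, lp_cand
        chain[k] = theta
    return chain

rng = np.random.default_rng(11)
chain = metropolis(lambda t: -0.5 * t * t, start=5.0,
                   n_iter=50_000, step=2.4, rng=rng)[5_000:]  # drop burn-in
print(round(chain.mean(), 2), round(chain.var(), 2))
```

The step size 2.4 is a deliberate choice: for a one-dimensional normal target it gives an acceptance rate in the neighbourhood of the 45% guideline discussed below.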


Example VI.7: SAP study – Metropolis algorithm for the NI prior case

Settings as in Example VI.1, now applying the Metropolis algorithm:

 Proposal density: N(θ^k, Σ) with θ^k = (μ^k, (σ²)^k)ᵀ and Σ = diag(0.03, 0.03)

MH sampling: [figure]

◦ Jumps to any location in the (μ, σ²)-plane
◦ Burn-in = 500, total chain = 1,500

Marginal posterior distributions: [figure]

◦ Acceptance rate = 40%
◦ Burn-in = 500, total chain = 1,500

Trace plots: [figure]

◦ Accepted moves = blue, rejected moves = red

Second choice of proposal density:

 Proposal density: N(θ^k, Σ) with θ^k = (μ^k, (σ²)^k)ᵀ and Σ = diag(0.001, 0.001)

Accepted + rejected positions: [figure]

◦ Acceptance rate = 84%
◦ Poor approximation of the true distribution

Problem: what should the acceptance rate be for a good Metropolis algorithm?

From theoretical work + simulations:

• Acceptance rate: ≈ 45% for d = 1 and ≈ 24% for d > 1

6.3.2 The Metropolis-Hastings algorithm

Metropolis-Hastings algorithm: the chain is at θ^k ⇒ the MH algorithm samples the value θ^(k+1) as follows:

1. Sample a candidate θ* from the (asymmetric) proposal density q(θ* | θ), with θ = θ^k
2. The next value θ^(k+1) will be equal to:
   • θ* with probability α(θ^k, θ*) (accept proposal)
   • θ^k otherwise (reject proposal)

with

    α(θ^k, θ*) = min( r = [p(θ* | y) q(θ^k | θ*)] / [p(θ^k | y) q(θ* | θ^k)], 1 )


Example VI.8: Sampling a t-distribution using Independent MH algorithm

 Target distribution: t3(3, 2²)-distribution
(a) Independent MH algorithm with proposal density N(3, 4²)
(b) Independent MH algorithm with proposal density N(3, 2²)

• Reversibility condition: probability of a move from θ to θ̃ = probability of a move from θ̃ to θ
• Reversible chain: chain satisfying the reversibility condition

• Example asymmetric proposal density: q(θ̃ | θ^k) ≡ q(θ̃) (Independent MH algorithm)

• WinBUGS makes use of univariate MH algorithm to sample from some non-standard full conditionals
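As a hedged illustration of Example VI.8 (not the course's own code), an independent MH sampler for the t3(3, 2²) target with proposal (a), N(3, 4²), written in Python; the function name `independent_mh` is ours:

```python
import numpy as np

def independent_mh(log_target, log_prop, draw_prop, n_iter, rng=None):
    """Independent MH: q(theta~ | theta) = q(theta~), so the acceptance probability
    becomes alpha = min( p(cand|y) q(theta) / (p(theta|y) q(cand)), 1 )."""
    rng = np.random.default_rng(rng)
    chain = np.empty(n_iter)
    theta = draw_prop(rng)
    for k in range(n_iter):
        cand = draw_prop(rng)
        log_r = log_target(cand) - log_target(theta) + log_prop(theta) - log_prop(cand)
        if np.log(rng.uniform()) < log_r:
            theta = cand
        chain[k] = theta
    return chain

# Target: t3-distribution with location 3 and scale 2 (unnormalized log density)
log_t3 = lambda x: -2.0 * np.log1p(((x - 3.0) / 2.0) ** 2 / 3.0)
# Proposal (a): N(3, 4^2) (unnormalized log density)
log_q = lambda x: -0.5 * ((x - 3.0) / 4.0) ** 2
chain = independent_mh(log_t3, log_q, lambda g: 3.0 + 4.0 * g.standard_normal(), 50000, rng=7)
```

The wide N(3, 4²) proposal covers the target well; the narrow N(3, 2²) proposal of case (b) undersamples the t-tails, which is the point of the example.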

6.3.3 Remarks*

6.5 Choice of the sampler

• The Gibbs sampler is a special case of the Metropolis-Hastings algorithm, but the Gibbs sampler is still treated differently

• The transition kernel of the MH-algorithm

• The reversibility condition

• Difference with the AR algorithm

Choice of the sampler depends on a variety of considerations.


Example VI.9: Caries study – MCMC approaches for logistic regression

Subset of n = 500 children of the Signal-Tandmobiel study at 1st examination:

 Research questions:
◦ Have girls a different risk for developing caries experience (CE) than boys (gender) in the first year of primary school?
◦ Is there an east-west gradient (x-coordinate) in CE?

 Bayesian model: logistic regression + N(0, 100²) priors for the regression coefficients

 No standard full conditionals

 Three algorithms:
◦ Self-written R program: evaluate the full conditionals on a grid + ICDF-method
◦ WinBUGS program: multivariate MH algorithm (blocking mode on)
◦ SAS procedure MCMC: Random-Walk MH algorithm

Program    Parameter   Mean      SD       Median    MCSE
MLE        Intercept   -0.5900   0.2800
           gender      -0.0379   0.1810
           x-coord      0.0052   0.0017
R          Intercept   -0.5880   0.2840   -0.5860   0.0104
           gender      -0.0516   0.1850   -0.0578   0.0071
           x-coord      0.0052   0.0017    0.0052   6.621E-5
WinBUGS    Intercept   -0.5800   0.2810   -0.5730   0.0094
           gender      -0.0379   0.1770   -0.0324   0.0060
           x-coord      0.0052   0.0018    0.0053   5.901E-5
SAS        Intercept   -0.6530   0.2600   -0.6450   0.0317
           gender      -0.0319   0.1950   -0.0443   0.0208
           x-coord      0.0055   0.0016    0.0055   0.00016

(MLE row: maximum likelihood estimate and asymptotic SE)

Conclusions:

• Posterior means/medians of the three samplers are close (to the MLE)

• The precision with which the posterior mean was determined (high precision = low MCSE) differs considerably
⇒ Samplers may have quite a different efficiency

• The clinical conclusion was the same

Take home messages:

• The two MCMC approaches allow fitting basically any proposed model

• There is no free lunch: computation time can be MUCH longer than with likelihood approaches

• The choice between Gibbs sampling and the Metropolis-Hastings approach depends on computational and practical considerations


Chapter 7: Assessing and improving convergence of the Markov chain

7.1 Introduction

• MCMC sampling is powerful, but comes with a cost: dependent sampling + checking convergence is not easy:
◦ Convergence theorems do not tell us when convergence will occur
◦ In this chapter: graphical + formal diagnostics to assess convergence

Questions:
• Are we getting the right answer?
• Can we get the answer quicker?

• Acceleration techniques to speed up the MCMC sampling procedure

• Data augmentation as Bayesian generalization of EM algorithm

7.2 Assessing convergence of a Markov chain

7.2.1 Definition of convergence for a Markov chain

Loose definition:

With an increasing number of iterations (k → ∞), the distribution p_k(θ) of θ^k converges to the target distribution p(θ | y).

In practice, convergence means:

• Histogram of θ^k remains the same along the chain

• Summary statistics of θ^k remain the same along the chain


Approaches:

• Theoretical research:
◦ Establish conditions that ensure convergence, but in general these theoretical results cannot be used in practice

• Two types of practical procedures to check convergence:
◦ Checking stationarity: from which iteration (k0) is the chain sampling from the posterior distribution (assessing the burn-in part of the Markov chain)?
◦ Checking accuracy: verify that the posterior summary measures are computed with the desired accuracy

• Most practical convergence diagnostics (graphical and formal) appeal to the stationarity property of a converged chain

7.2.2 Checking convergence of the Markov chain

Here we look at:

• Convergence diagnostics that are implemented in
◦ WinBUGS
◦ CODA, BOA

• Graphical and formal tests

• Illustrations using the
◦ Osteoporosis study (chain of size 1,500)

• Often done: base the summary measures on the converged part of the chain

7.2.3 Graphical approaches to assess convergence

Trace plot:

• A simple exploration of the trace plot gives a good impression of the convergence

• Univariate trace plots (each parameter separately) or multivariate trace plots (evaluating parameters jointly) are useful

• Multivariate trace plot: monitor the log of the likelihood or the log of the posterior density:
◦ WinBUGS: automatically created variable deviance
◦ Bayesian SAS procedures: default variable LogPost

• Thick pen test: the trace plot appears as a horizontal strip under stationarity

• The trace plot evaluates the mixing rate of the chain

Bayesian Biostatistics - Piracicaba 2014 382 Bayesian Biostatistics - Piracicaba 2014 383

Univariate and multivariate trace plots on osteoporosis regression analysis:

[Figures: trace plots of β and σ]

Autocorrelation plot:

• Autocorrelation of lag m (ρ_m): corr(θ^k, θ^(k+m)) (for k = 1, ...), estimated by
◦ Pearson correlation
◦ Time series approach

[Figure: autocorrelation plots of β and σ for the osteoporosis regression analysis]

• Autocorrelation function (ACF): the function m → ρ_m (m = 0, 1, ...)

• Autocorrelation plot: graphical representation of the ACF; indicates
◦ Mixing rate
◦ Minimum number of iterations for the chain to 'forget' its starting position

• The autocorrelation plot is NOT a convergence diagnostic: except for the burn-in phase, the ACF plot will not change anymore with further sampling
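The empirical ACF of the previous slide can be computed directly. A minimal Python sketch (illustrative only; CODA/BOA provide this in R), checked on an AR(1) chain whose theoretical ACF is known:

```python
import numpy as np

def acf(chain, max_lag):
    """Empirical autocorrelation function: rho_m = corr(theta^k, theta^(k+m))."""
    x = np.asarray(chain) - np.mean(chain)
    var = x @ x
    return np.array([1.0] + [(x[:-m] @ x[m:]) / var for m in range(1, max_lag + 1)])

# AR(1) chain with coefficient 0.8, whose theoretical ACF is rho_m = 0.8^m
g = np.random.default_rng(0)
x = np.empty(50000)
x[0] = 0.0
for k in range(1, x.size):
    x[k] = 0.8 * x[k - 1] + g.standard_normal()
rho = acf(x, 10)
```

A slowly decaying ρ_m signals slow mixing and a large minimum number of iterations before the chain 'forgets' its starting position.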


Running-mean plot:

• Running mean (ergodic mean) θ̄^k: mean of the sampled values up to iteration k
◦ Checks stationarity of p_k(θ) for k > k0 via the mean
◦ Initial variability of the running-mean plot is always relatively high
◦ Stabilizes with increasing k in case of stationarity

[Figure: running-mean plots of β and σ for the osteoporosis regression analysis]

Q-Q plot:

• Stationarity: the distribution of the chain is stable irrespective of the part considered

• Q-Q plot: 1st half of the chain (X-axis) versus 2nd half of the chain (Y-axis)

• Non-stationarity: Q-Q graph deviating from the bisecting line

[Figure: Q-Q plots of β and σ for the osteoporosis regression analysis]


Cross-correlation plot:

• Cross-correlation of θ1 with θ2: correlation between θ1^k and θ2^k (k = 1, ..., n)

• Cross-correlation plot: scatterplot of θ1^k versus θ2^k

• Useful in case of convergence problems to indicate which model parameters are strongly related and thus whether the model is overspecified

[Figure: cross-correlation plot for the osteoporosis regression analysis]

7.2.5 Formal approaches to assess convergence

Here:

• Four formal convergence diagnostics:
◦ Geweke diagnostic: single chain
◦ Heidelberger-Welch diagnostic: single chain
◦ Raftery-Lewis diagnostic: single chain
◦ Brooks-Gelman-Rubin diagnostic: multiple chains

• Illustrations based on the
◦ Osteoporosis study (chain of size 1,500)

Geweke diagnostic:

◦ Acts on a single chain θ^k (k = 1, ..., n)
◦ Checks only stationarity ⇒ looks for k0
◦ Compares means of the (10%) early ⇔ (50%) late part of the chain using a t-like test

• In addition, there is a dynamic version of the test:
◦ Cut off repeatedly x% of the chain, from the beginning
◦ Each time compute the Geweke test

• The test cannot be done in WinBUGS, but should be done in R, accessing the sampled values of the chain produced by WinBUGS


Geweke diagnostic on osteoporosis regression analysis

• Based on chain of 1,500 iterations

• R program geweke.diag (CODA)

• With standard settings of 10% (1st part) & 50% (2nd part): numerical problems

• Early part = 20%: Geweke diagnostic for β1: Z = 0.057 and for σ²: Z = 2.00

• Dynamic version of the Geweke diagnostic (with K = 20), next page; results for (a) β1, (b) σ², (c) log posterior:
◦ β1: the majority of Z-values are outside [-1.96, 1.96] ⇒ non-stationarity
◦ log(posterior): non-stationary
◦ σ²: all values except the first inside [-1.96, 1.96]

BGR diagnostic:

◦ Version 1: Gelman and Rubin ANOVA diagnostic
◦ Version 2: Brooks-Gelman-Rubin interval diagnostic
◦ General term: Brooks-Gelman-Rubin (BGR) diagnostic
◦ Global and dynamic diagnostic
◦ Implementations in WinBUGS, CODA and BOA

For both versions:
◦ Take M widely dispersed starting points θ_m^0 (m = 1, ..., M)
◦ M parallel chains (θ_m^k) (m = 1, ..., M) are run for 2n iterations
◦ The first n iterations are discarded and regarded as burn-in

[Figure: trace plot of multiple chains on osteoporosis regression analysis]


Version 1:

 Within chains: chain mean θ̄_m = (1/n) Σ_{k=1}^{n} θ_m^k (m = 1, ..., M)

 Across chains: overall mean θ̄ = (1/M) Σ_m θ̄_m

 ANOVA idea:
◦ W = (1/M) Σ_{m=1}^{M} s_m², with s_m² = (1/n) Σ_{k=1}^{n} (θ_m^k − θ̄_m)²
◦ B = n/(M−1) Σ_{m=1}^{M} (θ̄_m − θ̄)²

 Two estimates for var(θ^k | y):
◦ W and V̂ ≡ var(θ^k | y) = ((n−1)/n) W + (1/n) B
◦ Under stationarity: W and V̂ are both good estimates; under non-stationarity: W is too small and V̂ too large

 Convergence diagnostic: R = V̂/W

 R̂: the estimated potential scale reduction factor (PSRF)

 R_c = (d+3)/(d+1) R, with d = 2V̂/var̂(V̂), corrected for sampling variability

 A 97.5% upper confidence bound of R_c is added

In practice:

 Continue sampling until R_c < 1.1 or 1.2

 Monitor V̂ and W; both must stabilize

 When converged: take summary measures from the 2nd half of the total chain
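The ANOVA idea translates directly into code. A minimal Python sketch of the basic R = V̂/W (the df-corrected R_c and its 97.5% bound, as provided by CODA/BOA, are omitted); the function name `gelman_rubin` and the test chains are ours:

```python
import numpy as np

def gelman_rubin(chains):
    """Basic R = V / W from M parallel chains (rows), following the ANOVA idea
    above; the df-corrected Rc and its 97.5% bound are omitted in this sketch."""
    chains = np.asarray(chains)
    M, n = chains.shape
    means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=0).mean()       # mean within-chain variance
    B = n * means.var(ddof=1)                   # n/(M-1) * sum_m (mean_m - mean)^2
    V = (n - 1) / n * W + B / n                 # pooled estimate of var(theta | y)
    return V / W

g = np.random.default_rng(42)
converged = g.standard_normal((4, 2000))        # 4 chains sampling the same target
stuck = converged + np.array([[0.0], [0.0], [0.0], [5.0]])  # one chain off target
```

When all chains sample the same distribution, R is close to 1; a chain stuck away from the target inflates B, hence V̂, hence R.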

Version 2:

 Version 1 assumes Gaussian behavior of the sampled parameters: not true for σ²

 Within chains: empirical 100(1−α)% interval: the [100·α/2%, 100·(1−α/2)%] interval of the last n iterations of each chain of length 2n

 Across chains: empirical equal-tail 100(1−α)% interval obtained from the last nM iterations of all chains together

 Interval-based R_I:

R_I = (length of total-chain interval V_I) / (average length of the within-chain intervals W_I)

CODA/BOA: dynamic/graphical version:

• The M chains are divided into batches of length b

• V(s), W(s) and R_c(s) are calculated based upon the 2nd half of the (cumulative) chains of length 2sb, for s = 1, ..., [n/b]

• A 97.5% upper pointwise confidence bound is added


BGR diagnostic in WinBUGS:

• WinBUGS: dynamic version of R_I with α = 0.20
 The chain is divided in cumulative subchains based on iterations 1-100, 1-200, 1-300, etc.
 For each part, compute on the 2nd half of the subchain:
(a) R_I (red)
(b) V_I/max(V_I) (green)
(c) W_I/max(W_I) (blue)
 Three curves are produced, which should be monitored
• When converged: take summary measures from the 2nd half of the total chain

BGR diagnostic on osteoporosis regression analysis:

• Based on chain of 1,500 iterations

• 8 overdispersed starting values from a classical linear regression analysis

• First version (CODA):
◦ β1: R_c = 1.06 with 97.5% upper bound equal to 1.12
◦ σ²: R_c = 1.00 with 97.5% upper bound equal to 1

• Second version (WinBUGS):
◦ β1: R_I dropped quickly below 1.1 with increasing iterations and varied around 1.03 from iteration 600 onwards
◦ σ²: R_I was quickly around 1.003 and the curves stabilized almost immediately

[Figures: Version 1 and Version 2 BGR plots on osteoporosis regression analysis, showing (a) R_I (red), (b) V_I/max(V_I) (green), (c) W_I/max(W_I) (blue)]


Conclusion about convergence of the osteoporosis regression analysis:

• A chain of length 1,500 is inadequate for achieving convergence of the regression parameters

• Convergence was attained with 15,000 iterations

• For accurately estimating certain quantiles, 15,000 iterations were not enough

• With 8 chains the posterior distribution was rather well estimated after 1,500 iterations each

7.2.6 Computing the Monte Carlo standard error

Aim: estimate the Monte Carlo standard error (MCSE) of the posterior mean θ̄

• MCMC algorithms produce dependent random variables
• s/√n (s = posterior SD, n = length of the chain) cannot be used

• Two approaches:
 Time-series approach (SAS MCMC/CODA/BOA)
 Method of batch means (WinBUGS/CODA/BOA)

• Chain must have converged!

• Related to this: effective sample size = the size of the chain if it had been generated by independent sampling
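The method of batch means can be sketched in a few lines of Python (illustrative only; WinBUGS/CODA/BOA implement this for you). The function name `mcse_batch_means` is ours, and the default of 30 batches is an assumption, not a quote from the course:

```python
import numpy as np

def mcse_batch_means(chain, n_batches=30):
    """MCSE of the posterior mean via batch means: cut the chain into batches
    and use the standard deviation of the batch means."""
    b = len(chain) // n_batches
    means = np.asarray(chain)[: b * n_batches].reshape(n_batches, b).mean(axis=1)
    return means.std(ddof=1) / np.sqrt(n_batches)

# For an independent chain the MCSE should agree with s / sqrt(n)
g = np.random.default_rng(3)
iid = g.standard_normal(30000)
mcse = mcse_batch_means(iid)
ess = (iid.std(ddof=1) / mcse) ** 2   # implied effective sample size
```

For a dependent chain the batch means absorb the autocorrelation (provided the batches are long enough), so the MCSE exceeds s/√n and the implied effective sample size falls below n.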

MCSE for osteoporosis regression analysis

• Chain: total size of 40,000 iterations with 20,000 burn-in iterations

• MCSE for β1: ◦ WinBUGS: MCSE = 2.163E-4 (batch means) ◦ SAS: MCSE = 6.270E-4 (time series) ◦ CODA: MCSE = 1.788E-4 (time series) ◦ CODA: MCSE = – (batch means), computational difficulties ◦ ESS: 312 (CODA), 22.9 (SAS)

• Extreme slow convergence with SAS PROC MCMC: stationarity only after 500,000 burn-in iterations + 500,000 iterations


7.2.7 Practical experience with the formal diagnostic procedures It is impossible to prove convergence in practice:

◦ • Geweke diagnostic: Example Geyer (1992) ◦ AZT RCT example ◦ Popular, but dynamic version is needed ◦ Computational problems on some occasions

rs • Gelman and Rubin diagnostics: ◦ Animated discussion whether to use multiple chains to assess convergence ◦ BGR diagnostic should be part of each test for convergence ◦ Problem: how to pick overdispersed and realistic starting values

7.3 Accelerating convergence

Aim of acceleration approaches: lower the autocorrelation in the chain

• Review of approaches:
◦ Choosing better starting values
◦ Transforming the variables: centering + standardisation
◦ Blocking: processing several parameters at the same time
◦ Algorithms that suppress the purely random behavior of the MCMC sample: over-relaxation
◦ Reparameterization of the parameters: write your model in a different manner

• Thinning: looking at part of the chain; it is not an acceleration technique

Choosing better starting values:

• Bad starting values can be observed in the trace plot

• Remedy: take other starting values

• More sophisticated MCMC algorithms may be required


Effect of acceleration on osteoporosis regression analysis Effect of acceleration on osteoporosis regression analysis

β β (a) Block (b) Overrelaxation (a) Original β β (b) Block (c) Centered bmi (d) Thinning (10) (c) Overrelaxation (d) Centered bmi β β ●  β β

7.4 Practical guidelines for assessing and accelerating convergence

 Start with a pilot run to detect quickly some trivial problems; multiple (3-5) chains often highlight problems faster

 Start with a more formal assessment of convergence only when the trace plots show good mixing

 When convergence is slow, use acceleration tricks

 When convergence fails despite acceleration tricks: let the sampler run longer / apply thinning / change software / ... / write your own sampler

 Ensure that the MCSE is at most 5% of the posterior standard deviation

 Many parameters: choose a random selection of relevant parameters to monitor

7.5 Data augmentation

Data augmentation (DA): augment the observed data (y) with fictive data (z)

Aim: the DA technique may simplify the estimation procedure considerably
◦ Classical DA technique: EM algorithm
◦ Fictive data are often called missing data

• The DA technique consists of 2 parts:
Part 1. Assume that missing data z (auxiliary variables) are available to give complete(d) data w = {y, z}
Part 2. Take into account that part z of the (complete) vector w is not available

• Bayesian DA technique (Tanner & Wong, 1987): Bayesian extension of the EM algorithm, replacing maximization with sampling


Bayesian data augmentation technique:

• Sampling from p(θ | y, z) may be easier than from p(θ | y)

• Replace the E-step by: sampling the missing data z from p(z | θ, y)

• Replace the M-step by: sampling θ from p(θ | y, z)

⇒ DA approach: repeatedly sampling from p(θ | y, z) and from p(z | θ, y)

⇒ Block Gibbs sampler

Applications:

• Genuine missing data

• Censored/misclassified observations

• Mixture models

• Hierarchical models

Example VII.6 – Genetic linkage model

 If two factors are linked with a recombination fraction π, the intercrosses Ab/aB × Ab/aB (repulsion) result in the following probabilities (θ = π²):
(a) for AB: 0.5 + θ/4  (b) for Ab: (1 − θ)/4  (c) for aB: (1 − θ)/4  (d) for ab: θ/4

 n subjects classified into (AB, Ab, aB, ab) with frequencies y = (y1, y2, y3, y4):
y ∼ Mult(n, (0.5 + θ/4, (1 − θ)/4, (1 − θ)/4, θ/4))

 Flat prior for θ + the above multinomial likelihood ⇒ posterior
p(θ | y) ∝ (2 + θ)^{y1} (1 − θ)^{y2+y3} θ^{y4}

 Posterior = unimodal function of θ but has no standard expression

 Popular data set introduced by Rao (1973): 197 animals with y = (125, 18, 20, 34)

Likelihood + prior:

Imagine that the 1st cell was split up into 2 cells with
◦ Probabilities 1/2 and θ/4
◦ Corresponding frequencies (y1 − z) and z
◦ (Completed) vector of frequencies: w = (125 − z, z, 18, 20, 34)

 Given z: p(θ | w) = p(θ | y, z) ∝ θ^{z+y4} (1 − θ)^{y2+y3} (completed posterior)

 Given θ: p(z | θ, y) = Bin(y1, θ/(2 + θ))

 These 2 posteriors = full conditionals of the posterior obtained from
◦ Likelihood: Mult(n, (0.5, θ/4, (1 − θ)/4, (1 − θ)/4, θ/4))
◦ Based on frequencies (125 − z, z, 18, 20, 34)
◦ Priors: flat for θ and discrete uniform for z on {0, 1, ..., 125}
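The two full conditionals translate directly into a DA Gibbs sampler. An illustrative Python sketch for Rao's data (the course's own analyses use R/WinBUGS); the function name `linkage_gibbs` is ours:

```python
import numpy as np

def linkage_gibbs(y=(125, 18, 20, 34), n_iter=20000, rng=None):
    """DA Gibbs sampler for the genetic linkage model, alternating
    z | theta, y ~ Bin(y1, theta/(2+theta)) and
    theta | y, z ~ Beta(z + y4 + 1, y2 + y3 + 1) (flat prior on theta)."""
    rng = np.random.default_rng(rng)
    y1, y2, y3, y4 = y
    theta = 0.5
    draws = np.empty(n_iter)
    for k in range(n_iter):
        z = rng.binomial(y1, theta / (2.0 + theta))    # split the first cell
        theta = rng.beta(z + y4 + 1, y2 + y3 + 1)      # completed-data posterior
        draws[k] = theta
    return draws

draws = linkage_gibbs(rng=11)
```

Both conditional draws are standard distributions, which is exactly the simplification that DA buys here: the original posterior (2 + θ)^{y1}(1 − θ)^{y2+y3}θ^{y4} has no standard form.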


Output:

[Figure: trace plot and posterior of θ; 100 iterations, burn-in = 10]

Example VII.7 – Cysticercosis study – Estimating prevalence

Example V.6: no gold standard available
◦ Prior information needed to estimate prevalence (π), SE (α) and SP (β)
◦ Analysis was done with WinBUGS 1.4.3
◦ Here we show how data augmentation can simplify the MCMC procedure

            Disease (true)
Test        +            −                Observed            Total
+           πα           (1 − π)(1 − β)   z1, y − z1          y = 496
−           π(1 − α)     (1 − π)β         z2, (n − y) − z2    n − y = 372
Total       π            (1 − π)                              n = 868

Likelihood + prior:

Priors:
◦ π: Beta(νπ, ηπ)
◦ α: Beta(να, ηα)
◦ β: Beta(νβ, ηβ)

 Likelihood: multinomial with the probabilities in the table

Posterior on the completed data:
∝ π^{z1+z2+νπ−1} (1 − π)^{n−z1−z2+ηπ−1} α^{z1+να−1} (1 − α)^{z2+ηα−1} β^{(n−y)−z2+νβ−1} (1 − β)^{y−z1+ηβ−1}

Full conditionals:

 Observed data: complex
 Completed data: standard

z1 | y, π, α, β ∼ Bin(y, πα / (πα + (1 − π)(1 − β)))
z2 | y, π, α, β ∼ Bin(n − y, π(1 − α) / (π(1 − α) + (1 − π)β))
α | y, z1, z2, να, ηα ∼ Beta(z1 + να, z2 + ηα)
β | y, z1, z2, νβ, ηβ ∼ Beta((n − y) − z2 + νβ, y − z1 + ηβ)
π | y, z1, z2, νπ, ηπ ∼ Beta(z1 + z2 + νπ, n − z1 − z2 + ηπ)

Marginal posterior for the prevalence (given the completed data): Beta(z1 + z2 + νπ, n − z1 − z2 + ηπ)
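Cycling through these five full conditionals gives the Gibbs sampler. An illustrative Python sketch (the example's own analysis is a self-written R program), using the priors reported in the output slide — π: Beta(1, 1), α: Beta(21, 12), β: Beta(32, 4); the function name `cysti_gibbs` is ours:

```python
import numpy as np

def cysti_gibbs(y=496, n=868, n_iter=20000, rng=None):
    """DA Gibbs sampler for prevalence (pi), sensitivity (alpha), specificity
    (beta) without a gold standard; the strong Beta priors on alpha and beta
    keep the chain in the identifiable mode (see the identifiability slide)."""
    rng = np.random.default_rng(rng)
    pi, alpha, beta = 0.5, 0.8, 0.8
    out = np.empty((n_iter, 3))
    for k in range(n_iter):
        p1 = pi * alpha / (pi * alpha + (1 - pi) * (1 - beta))
        z1 = rng.binomial(y, p1)                          # z1 | y, pi, alpha, beta
        p2 = pi * (1 - alpha) / (pi * (1 - alpha) + (1 - pi) * beta)
        z2 = rng.binomial(n - y, p2)                      # z2 | y, pi, alpha, beta
        alpha = rng.beta(z1 + 21, z2 + 12)                # alpha | y, z1, z2
        beta = rng.beta((n - y) - z2 + 32, y - z1 + 4)    # beta | y, z1, z2
        pi = rng.beta(z1 + z2 + 1, n - z1 - z2 + 1)       # pi | y, z1, z2
        out[k] = pi, alpha, beta
    return out

out = cysti_gibbs(rng=5)
```

Every draw is binomial or beta — the payoff of completing the data with (z1, z2), compared with the complex observed-data posterior.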


Output:

Priors: π: Beta(1, 1), α: Beta(21, 12), β: Beta(32, 4)

MCMC program: self-written R program, 100,000 iterations, 10,000 burn-in

Identifiability issue:

• The models based on (π, α, β) and on (1 − π, 1 − α, 1 − β) are the same

• Only the definitions of 'diseased' and 'positive test' are interchanged

• If no strong priors are given: the Markov chain will swap between values in [0, 0.5] and [0.5, 1]

• Solution: restrict π, α and β to [0.5, 1]


Signal-Tandmobiel study: Distribution of emergence of tooth 22 on 500 children

 True distribution of the emergence times not known (annual examinations): ◦ Left-censored: emergence before 1st examination ◦ Right-censored: emergence after last examination ◦ Interval-censored: in-between 2 examinations

⇒ Data: [Li,Ri](i =1,...,n)

2  DA approach: assume true emergence times yi ∼ N(μ, σ )(i =1,...,n) known  Two types of full conditionals:

◦ Model parameters | yi (i =1,...,n) (classical distributions)

◦ yi | [Li,Ri] (truncated normal distributions)
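The DA scheme alternates classical parameter updates with truncated-normal draws for the latent emergence times. An illustrative Python sketch on simulated annual-examination data, assuming the vague prior p(μ, σ²) ∝ 1/σ² (the example itself just says "vague priors"); the simple rejection sampler stands in for the inverse-CDF method:

```python
import numpy as np

def rtruncnorm(mu, sigma, lo, hi, rng):
    # Rejection sampler for N(mu, sigma^2) restricted to [lo, hi]; adequate
    # here because each censoring interval carries non-negligible mass.
    while True:
        v = mu + sigma * rng.standard_normal()
        if lo <= v <= hi:
            return v

def interval_censored_gibbs(L, R, n_iter=1200, rng=None):
    """DA Gibbs sampler for y_i ~ N(mu, sigma^2) observed only as [L_i, R_i]."""
    rng = np.random.default_rng(rng)
    L, R = np.asarray(L, float), np.asarray(R, float)
    n = L.size
    y = (L + R) / 2.0                      # start latent times at the midpoints
    mu, sig2 = y.mean(), y.var() + 0.1
    out = np.empty((n_iter, 2))
    for k in range(n_iter):
        mu = y.mean() + np.sqrt(sig2 / n) * rng.standard_normal()   # mu | y, sig2
        sig2 = np.sum((y - mu) ** 2) / rng.chisquare(n)             # sig2 | y, mu
        sd = np.sqrt(sig2)
        y = np.array([rtruncnorm(mu, sd, L[i], R[i], rng) for i in range(n)])
        out[k] = mu, sig2
    return out

# Simulated annual-examination data: true emergence times N(10.5, 1),
# observed only as the year interval in which each one falls
g = np.random.default_rng(4)
t = 10.5 + g.standard_normal(150)
L = np.floor(t)
R = L + 1.0
out = interval_censored_gibbs(L, R, rng=8)
```

The parameter step conditions on the imputed yi and is a standard normal/scaled-inverse-chi-square update; only the imputation step involves truncation.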


Output:

Priors: vague for μ and σ²

MCMC programs:
◦ Self-written R program
◦ WinBUGS program using I(Li, Ri)

10,000 iterations, 2,000 burn-in

[Figure: results; • = mean prediction, bars = [2.5%, 97.5%] interval]

Example VII.9 – MCMC approaches for a probit regression problem

Example VI.9 showed MCMC approaches for the logistic regression model; here probit regression: P(yi = 1 | xi) = Φ(xi^T β)

 Gibbs sampler is the logical choice

 Making use of the DA technique

 Discretization idea:
◦ zi = xi^T β + εi (i = 1, ..., n)
◦ εi ∼ N(0, 1)
◦ yi = 0 (1) for zi ≤ (>) 0
◦ P(yi = 1 | xi) = P(zi > 0) = 1 − Φ(−xi^T β) = Φ(xi^T β)

Full conditionals:

• β given z: conditional on z = (z1, z2, ..., zn), with X the design matrix of covariates:
◦ β ∼ N(β_LS, (X^T X)^{-1}), with β_LS = (X^T X)^{-1} X^T z
⇒ Bayesian regression analysis with σ² = 1
⇒ Sampling from a multivariate Gaussian distribution

• z given β: full conditional of zi given β and yi:
◦ yi = 0: proportional to φ_{xi^T β, 1}(z) I(z ≤ 0)
◦ yi = 1: proportional to φ_{xi^T β, 1}(z) I(z > 0)
⇒ Sampling from truncated normals

Take home messages:

• MCMC algorithms come with a cost:
 the estimation process often takes (much) longer than classical ML
 it is more difficult to check whether correct estimates are obtained

• Testing convergence:
 strictly speaking, we are never sure
 a handful of tests (graphical + formal) are used
 when many models need to be fit, trace plots are often the only source for testing
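The two full conditionals of Example VII.9 (this is the DA Gibbs sampler of Albert & Chib, 1993) can be sketched in Python on simulated data (illustrative; the course's analyses use R/WinBUGS). The flat prior on β with σ² = 1 follows the slide; the simple rejection step stands in for a proper truncated-normal sampler:

```python
import numpy as np

def rtrunc_std(mean, positive, rng):
    # z ~ N(mean, 1) truncated to (0, inf) if positive, else to (-inf, 0]
    while True:
        v = mean + rng.standard_normal()
        if (v > 0) == positive:
            return v

def probit_da_gibbs(X, y, n_iter=1500, rng=None):
    """DA Gibbs sampler for probit regression: beta | z is multivariate normal,
    z_i | beta, y_i is a truncated N(x_i' beta, 1)."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    chol = np.linalg.cholesky(XtX_inv)
    z = np.where(y == 1, 0.5, -0.5)        # crude start for the latent variables
    out = np.empty((n_iter, p))
    for k in range(n_iter):
        beta_ls = XtX_inv @ (X.T @ z)      # beta | z ~ N(beta_LS, (X'X)^-1)
        beta = beta_ls + chol @ rng.standard_normal(p)
        m = X @ beta                       # z_i | beta, y_i: truncated normal
        z = np.array([rtrunc_std(m[i], bool(y[i]), rng) for i in range(n)])
        out[k] = beta
    return out

# Simulated data with true beta = (0.3, 1.0)
g = np.random.default_rng(10)
x = g.uniform(-1.5, 1.5, size=400)
X = np.column_stack([np.ones(400), x])
y = (X @ np.array([0.3, 1.0]) + g.standard_normal(400) > 0).astype(int)
out = probit_da_gibbs(X, y, rng=12)
post_mean = out[300:].mean(axis=0)
```

The probit link is what makes both conditionals standard; the logistic model of Example VI.9 has no such conjugate latent-variable structure, which is why it needed MH-type samplers.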


Chapter 8: Software

• WinBUGS + MCMC techniques implied a revolution in the use of Bayesian methods

• WinBUGS is the standard software for many Bayesians

• SAS has developed Bayesian software, but it will not be discussed here

• Many R programs for Bayesian methods have been developed, both related to WinBUGS/JAGS/... and stand-alone

 Aims:
 Introduction to the use of WinBUGS (and related software)
 Review of other Bayesian software

8.1 WinBUGS and related software

In this chapter:

• WinBUGS, OpenBUGS + related software

• Other sources:
 WinBUGS manual, WinBUGS Examples manuals I and II
 SAS-STAT manual
 Books: Lawson, Browne and Vidal Rodeiro (2003) & Ntzoufras (2009)

• WinBUGS: Windows version of Bayesian inference Using Gibbs Sampling (BUGS)
◦ Start: 1989 at MRC Biostatistics in Cambridge with BUGS
◦ Final version = 1.4.3

• OpenBUGS = open-source version of WinBUGS

• Both packages freely available

• Additional and related software: CODA, BOA, etc.


8.1.1 A first analysis

Input to WinBUGS: .odc file
◦ Text file: with program, data and initial values for an MCMC run
◦ Compound document containing various types of information: text, tables, formulae, plots, graphs, etc.

To access a .odc file:
◦ Start WinBUGS
◦ Click on File
◦ Click on Open...

WinBUGS programming language:
◦ Similar but not identical to R
◦ See the file chapter 8 osteoporosis.odc

[Screenshot: WinBUGS program on osteoporosis simple linear regression analysis]

Basic components of a WinBUGS program

Model:
• Components of the model:
◦ Blue part: likelihood
◦ Red part: priors
◦ Green part: transformations for monitoring
• WinBUGS expresses variability in terms of the precision (= 1/variance)
• The BUGS language is declarative: the order of the statements does not matter
• WinBUGS does not have if-then-else commands; they are replaced by the step and equals functions

Data:
 List statement
 Different formats (rectangular, list, etc.) for providing data are possible

Initial values:
 List statement: initial values for intercept, slope and residual precision
 No need to specify initial values for all parameters of the model
 For derived parameters, an initial value must not be specified

Bayesian Biostatistics - Piracicaba 2014 442 Bayesian Biostatistics - Piracicaba 2014 443

Obtain posterior samples Specification Tool

• check model: To obtain posterior samples, one requires 3 WinBUGS tools: ◦ Checking syntax of program ◦ (a) Specification Tool Message data loaded in left bottom corner of screen/error message

• load data: (b) Sample Monitor Tool ◦ Message data loaded in left bottom corner of screen/error message (c) Update Tool • compile: ◦ Message data loaded in left bottom corner of screen/error message

• load inits: ◦ button switches off/button gen inits

Sample Monitor Tool:

 Choose Samples...:
◦ Fill in the node names
◦ To end: type in * and click on set

 The term node is used because of the connection with a Directed Acyclic Graph (DAG)

 Dynamic trace plot: click on trace

[Screenshot: WinBUGS program]

Update Tool

 Choose the number of MCMC iterations:
◦ Click on Model in the menu bar and choose Update...
◦ Specify the number of iterations
◦ To start sampling, click on update


Obtain posterior samples (continued):

WinBUGS output on osteoporosis simple linear regression analysis:

Extra via the Update Tool:
 Extra iterations: click again on update and specify the extra number of iterations
 To apply thinning (10%) at generation of the samples: fill in 10 next to thin

Extra via the Sample Monitor Tool:
 To look at the densities: click on density
 To read off the posterior summary measures: click on stats
 To obtain the ACF: click on auto corr
 To apply thinning (10%) when showing the posterior summary: fill in 10 next to thin

8.1.2 Information on samplers

• Blocking: check Blocking options..., Node Info..., Node Tool: methods

• Standard settings of the samplers: check Update options..., Updater options window

• Example: settings of the Metropolis sampler
◦ 4,000 iterations needed to tune the sampler
◦ Acceptance rate between 20% and 40%
◦ Effect of reducing the tuning phase to 750 iterations: next page

Acceptance rate on Ames study:

Top: tuning = 4,000 iterations ⇒ convergence
Bottom: tuning = 750 iterations ⇒ no convergence


8.1.3 Assessing and accelerating convergence

Checking convergence of the Markov chain:

• Within WinBUGS: only the BGR test is available
 Choose starting values in an overdispersed manner
 The number of chains must be specified before compiling
 Choose bgr diag in the Sample Monitor Tool

[Figure: BGR diagnostic on osteoporosis regression analysis]

Checking convergence using CODA/BOA:

• Export to S+/R: connect with CODA/BOA
 Click on coda in the Sample Monitor Tool
 1 index file + x files on the sampled values
 Index file:
beta0 1 1500
beta1 1501 3000
sigma2 3001 4500
 Process the files with CODA/BOA
 See the file chapter 8 osteoporosis BGR-CODA.R

(Simple) tricks to improve convergence (hopefully):

• Center + standardize covariates:
◦ mu[i] <- beta0+beta1*(bmi[i]-mean(bmi[]))/sd(bmi[])
◦ Vector: indicated by "[]"

• Check the effect of standardization: cross-correlation plot

• Over-relaxation: switch on over relax in the Update Tool


[Figure: cross-correlation plot on osteoporosis multiple linear regression analysis]

8.1.4 Vector and matrix manipulations

Example: regress tbbmc on age, weight, length, bmi, strength:

• WinBUGS file: compound document

• See the file chapter 8 osteomultipleregression.odc

 Data collected in matrix x
 Regression coefficients in vector beta
 Scalar product: mu[i] <- inprod(beta[],x[i,])
 Precision matrix: beta[1:6] ~ dmnorm(mu.beta[], prec.beta[,])
 Data section in two parts:
◦ First part: list statement
◦ Second part: rectangular part ended by END
◦ Click on both parts + each time: load data

WinBUGS program on osteoporosis multiple linear regression analysis:

[Screenshot: WinBUGS program]

Vector and matrix manipulations:

 Ridge regression:

c <- 5
for (r in 1:6) { for (s in 1:6){
  prec.beta[r,s] <- equals(r,s)*c
}}

 Zellner's g-inverse:

c <- 1/N
for (r in 1:6) { mu.beta[r] <- 0.0}
for (r in 1:6) { for (s in 1:6){
  prec.beta[r,s] <- inprod(x[,r],x[,s])*tau*c
}}


[Figure: cross-correlation plots on osteoporosis multiple linear regression analysis, vague normal prior versus ridge normal prior]

8.1.5 Working in batch mode

Running WinBUGS in batch mode:

• WinBUGS scripts

• R2WinBUGS: interface between R and WinBUGS

• Example:

osteo.sim <- bugs(data, inits, parameters, "osteo.model.txt",
                  n.chains=8, n.iter=1500,
                  bugs.directory="c:/Program Files/WinBUGS14/",
                  working.directory=NULL, clearWD=FALSE)
print(osteo.sim)
plot(osteo.sim)

Output R2WinBUGS program on osteoporosis multiple linear regression analysis

8.1.6 Troubleshooting

[figure: R2WinBUGS summary plot of the osteoporosis analysis]

• Errors may occur at:
◦ syntax checking
◦ data loading
◦ compilation
◦ providing initial values
◦ running the sampler

• TRAP error

• Common error messages: see the section Tips and Troubleshooting of the WinBUGS manual


8.1.7 Directed acyclic graphs

• A graph: pictorial representation of a statistical model
◦ Node: random variable
◦ Line: dependence between random variables

• Directed Acyclic Graph (DAG)
◦ Arrow added to a line: expresses the dependence of a child on its parent
◦ No cycles permitted

Local independence assumption:

Directed local Markov property/conditional independence assumption:
◦ Each node v is independent of its non-descendants given its parents

Likelihood of DAG:

p(V) = ∏_{v∈V} p(v | parents[v])

Full conditionals of DAG:

p(v | V \ v) = p(v | parents[v]) ∏_{w: v∈parents[w]} p(w | parents[w])
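The DAG factorization can be made concrete with a toy two-node binary graph A → B (the probabilities are made up):

```python
# Toy illustration of the DAG likelihood p(V) = prod_v p(v | parents[v]),
# for a hypothetical two-node binary graph A -> B.
pA = {0: 0.6, 1: 0.4}                       # p(A)
pB_given_A = {0: {0: 0.9, 1: 0.1},          # p(B | A = 0)
              1: {0: 0.2, 1: 0.8}}          # p(B | A = 1)

def joint(a, b):
    # the factorization over the DAG: p(A, B) = p(A) p(B | A)
    return pA[a] * pB_given_A[a][b]

total = sum(joint(a, b) for a in (0, 1) for b in (0, 1))
print(total)   # sums to 1 (up to float rounding)
```

Because every node only conditions on its parents, the product of the local conditionals is automatically a proper joint distribution.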

Doodle option in WinBUGS:

Visualizes a DAG + conditional independencies

• Three types of nodes:
◦ Constants (fixed by design)
◦ Stochastic nodes: variables that are given a distribution (observed = data or unobserved = parameters)
◦ Logical nodes: logical functions of other nodes

• Two types of directed lines:
◦ Single: stochastic relationship
◦ Double: logical relationship

• Plates

Doodle option in WinBUGS on osteoporosis simple linear regression analysis

[figure: doodle of the simple linear regression model]


8.1.8 Add-on modules: GeoBUGS and PKBUGS

8.1.9 Related software

• GeoBUGS
◦ Interface for producing maps of the output from disease mapping and other spatial models
◦ Creates and manipulates adjacency matrices

• PKBugs
◦ Complex population pharmacokinetic/pharmacodynamic (PK/PD) models

OpenBUGS

 Started in 2004 in Helsinki
 Based on the BUGS language
 Current version: 3.2.1 (http://www.openbugs.info/w/)
 Larger class of sampling algorithms
 Improved blocking algorithms
 New functions and distributions added to OpenBUGS
 Allows censoring C(lower, upper) and truncation T(lower, upper)
 More details on samplers by default

JAGS

• Platform independent and written in C++

• Current version: 3.1.0 (http://www-fis.iarc.fr/~martyn/software/jags/)

• New (matrix) functions:
◦ mexp(): matrix exponential
◦ %*%: matrix multiplication
◦ sort(): for sorting elements

• Allows censoring and truncation

Interface with WinBUGS, OpenBUGS and JAGS

• R2WinBUGS

• BRugs and R2OpenBUGS: interface to OpenBUGS

• Interfaces between WinBUGS and SAS, STATA, MATLab

• R2jags: interface between JAGS and R


8.2 Bayesian analysis using SAS

SAS is a versatile package of programs: provides tools for
◦ Setting up data bases
◦ General data handling
◦ Statistical programming
◦ Statistical analyses

SAS has a broad range of statistical procedures
◦ Most procedures support frequentist analyses
◦ Version 9.2: Bayesian options in GENMOD, LIFEREG, PHREG, MI, MIXED
◦ Procedure MCMC: experimental in version 9.2, a genuine procedure from version 9.3
◦ From version 9.3: random statement in PROC MCMC

8.3 Additional Bayesian software and comparisons

8.3.1 Additional Bayesian software

Additional programs (based on MCMC): mainly in R and Matlab
◦ R: check the CRAN website (http://cran.r-project.org/)
◦ Website of the International Society for Bayesian Analysis (http://bayesian.org/)
◦ MLwiN (http://www.bristol.ac.uk/cmm/software/mlwin/)

INLA: no sampling involved
◦ Integrated Nested Laplace Approximations
◦ For Gaussian models

STAN:
◦ Based on Hamiltonian sampling
◦ Faster for complex models, slower for “simple” models

8.3.2 Comparison of Bayesian software

◦ Comparisons are difficult in general
◦ Only comparisons on particular models can be done


Chapter 9

Hierarchical models

Part II

Bayesian tools for statistical modeling

Aims:
 Introduction of one of the most important models in (bio)statistics, but now with a Bayesian flavor
 Comparing frequentist with Bayesian solutions
 Comparing the Empirical Bayes estimate with the Bayesian estimate

• Bayesian hierarchical models (BHMs): for hierarchical/clustered data

• Examples:
◦ Measurements taken repeatedly over time on the same subject
◦ Data with a spatial hierarchy: surfaces on teeth and teeth in a mouth
◦ Multi-center clinical data with patients within centers
◦ Cluster randomized trials where centers are randomized to interventions
◦ Meta-analyses

• BHM: a classical mixed effects model + prior on all parameters

• As in the classical frequentist world: random and fixed effects

In this chapter:

• Introduction to the BHM via the Poisson-gamma model and the Gaussian hierarchical model

• Full Bayesian versus Empirical Bayesian approach

• Bayesian Generalized Linear Mixed Model (BGLMM)

• Choice of non-informative priors

• Examples analyzed with WinBUGS 1.4.3, OpenBUGS 3.2.1 (and Bayesian SAS procedures)


9.1 The Poisson-gamma hierarchical model

9.1.1 Introduction

• Simple Bayesian hierarchical model: Poisson-gamma model

• Example: Lip cancer study

Example IX.1: Lip cancer study – Description

 Mortality from lip cancer among males in the former German Democratic Republic including Berlin (GDR)
 1989: 2,342 deaths from lip cancer among males in 195 regions of the GDR
 Observed # of deaths: yi (i = 1, ..., n = 195)
 Expected # of deaths: ei (i = 1, ..., n = 195)
 SMRi = yi/ei: standardized mortality (morbidity) rate

To visually pinpoint regions with an increased risk of mortality (or morbidity): display the SMRs on a (geographical) map: disease map

• Problem:
 SMRi is an unreliable estimate of θi for a relatively sparsely populated region
 A more stable estimate is provided by the BHM

 SMRi: estimate of the true relative risk (RR) = θi in the ith region

◦ θi =1: risk equal to the overall risk (GDR) for dying from lip cancer

◦ θi > 1: increased risk

◦ θi < 1: decreased risk


Disease map of GDR:

9.1.2 Model specification

• Lip cancer mortality data are clustered: male subjects clustered in regions of GDR

• What assumptions can we make on the regions?

• What is reasonable to assume about yi and θi?

Example IX.2: Lip cancer study – Basic assumptions

• Assumptions on the 1st level:

◦ yi ∼ Poisson(θi ei) + independent (i = 1, ..., n)

• Three possible assumptions about the θi's:

A1: n regions are unique: MLE of θi = SMRi with variance θi/ei

A2: {y1, y2, ..., yn} a simple random sample of the GDR: θ1 = ... = θn = θ = 1

A3: n regions (θi) are related: θi ∼ p(θ | ·) (i = 1, ..., n) (prior of θi)

• Choice between assumptions A1 and A3: subjective arguments

• Assumption A2: test statistically/simulation exercise (next page)

⇒ Choose A3


Check assumption A2:

Model specification

Assumption A3: p(θ | ψ)
◦ θ = {θ1, ..., θn} is exchangeable
◦ Subjects are exchangeable within regions, but not between regions

◦ Exchangeability of θi ⇒ borrowing strength (from the other regions)

What common distribution p(θ | ψ) to take?

◦ Conjugate for the Poisson distribution: θi ∼ Gamma(α, β) (i = 1, ..., n)
◦ A priori mean: α/β, variance: α/β²

To establish the BHM: (α, β) ∼ p(α, β)

9.1.3 Posterior distributions


Bayesian Poisson-gamma model:

Level 1: yi | θi ∼ Poisson(θi ei)
Level 2: θi | α, β ∼ Gamma(α, β)
Prior: (α, β) ∼ p(α, β)

• With y = {y1, ..., yn}, the joint posterior is

p(α, β, θ | y) ∝ ∏_{i=1}^{n} p(yi | θi, α, β) ∏_{i=1}^{n} p(θi | α, β) p(α, β)

• Because of hierarchical independence of the yi (i = 1, ..., n):

p(α, β, θ | y) ∝ ∏_{i=1}^{n} [(θi ei)^{yi}/yi!] exp(−θi ei) ∏_{i=1}^{n} [β^α/Γ(α)] θi^{α−1} e^{−β θi} p(α, β)

• Assume: p(α, β) = p(α) p(β)

• With p(α) = λα exp(−λα α) and p(β) = λβ exp(−λβ β), λα = λβ = 0.1, i.e. Gamma(1, 0.1) priors
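The joint posterior above is easy to evaluate on the log scale. A minimal sketch, assuming toy counts y and expected counts e (not the GDR data) and the λα = λβ = 0.1 hyperpriors of the slide:

```python
import math

# Unnormalized log posterior of the Poisson-gamma model written out above;
# the counts y, expected counts e and parameter values are toy numbers.
def log_post(alpha, beta, theta, y, e, lam_a=0.1, lam_b=0.1):
    lp = -lam_a * alpha - lam_b * beta                # exponential hyperpriors
    for yi, ei, ti in zip(y, e, theta):
        lp += yi * math.log(ti * ei) - ti * ei        # Poisson kernel (y_i! dropped)
        lp += (alpha * math.log(beta) - math.lgamma(alpha)
               + (alpha - 1) * math.log(ti) - beta * ti)  # Gamma(alpha, beta) prior
    return lp

y = [4, 2, 7]
e = [3.1, 2.0, 5.5]
print(log_post(5.0, 4.0, [1.2, 0.9, 1.3], y, e))
```

Constant factors (the yi! terms and the normalizing constant of the posterior) are dropped, as they do not affect MCMC sampling.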


Full conditionals:

p(θi | θ(i), α, β, y) ∝ θi^{yi+α−1} exp[−(ei + β) θi]  (i = 1, ..., n)

p(α | θ, β, y) ∝ [∏_{i=1}^{n} (β θi)]^{α−1} Γ(α)^{−n} exp(−λα α)

p(β | θ, α, y) ∝ β^{nα} exp[−(∑_{i} θi + λβ) β]

with θ = (θ1, ..., θn)ᵀ and θ(i) = θ without θi
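These full conditionals translate directly into a sampler: the θi and β conditionals are gamma densities and can be drawn exactly, while α needs e.g. a random-walk Metropolis step. A toy-data sketch (not the WinBUGS sampler, and the data are made up):

```python
import math
import random

# Toy Gibbs/Metropolis sketch of the full conditionals above.
random.seed(1)
y = [4, 2, 7]
e = [3.1, 2.0, 5.5]
lam_a = lam_b = 0.1
n = len(y)
alpha, beta = 1.0, 1.0
theta = [1.0] * n

def log_cond_alpha(a):
    # log p(alpha | theta, beta, y), up to a constant
    if a <= 0:
        return -math.inf
    return (n * (a * math.log(beta) - math.lgamma(a))
            + (a - 1) * sum(math.log(t) for t in theta) - lam_a * a)

for _ in range(2000):
    # theta_i | rest ~ Gamma(y_i + alpha, rate = e_i + beta)
    theta = [random.gammavariate(y[i] + alpha, 1.0 / (e[i] + beta))
             for i in range(n)]
    # beta | rest ~ Gamma(n*alpha + 1, rate = sum(theta) + lam_b)
    beta = random.gammavariate(n * alpha + 1.0, 1.0 / (sum(theta) + lam_b))
    # alpha | rest: random-walk Metropolis step
    prop = alpha + random.gauss(0.0, 0.5)
    logr = log_cond_alpha(prop) - log_cond_alpha(alpha)
    if logr >= 0 or random.random() < math.exp(logr):
        alpha = prop

print(alpha, beta, theta)
```

Note that `random.gammavariate` takes shape and scale, so the rates above are inverted.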

Poisson-gamma model: DAG

9.1.4 Estimating the parameters

[figure: DAG of the Poisson-gamma model]

Some results:

• p(θi | α, β, yi) = Gamma(α + yi, β + ei)

• Posterior mean of θi given α, β:

E(θi | α, β, yi) = (α + yi)/(β + ei)

• Shrinkage given α, β:

(α + yi)/(β + ei) = Bi (α/β) + (1 − Bi) (yi/ei), with Bi = β/(β + ei) the shrinkage factor

• Also shrinkage for: p(θi | y) = ∫∫ p(θi | α, β, yi) p(α, β | y) dα dβ
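The shrinkage identity can be checked numerically. Here α and β are the EB estimates quoted later in this chapter (5.66 and 4.81), while the (yi, ei) pairs are made up; note how the sparsely populated region (small ei) is pulled strongly toward the prior mean α/β:

```python
# Numerical check of the shrinkage identity above (toy (y_i, e_i) pairs).
alpha, beta = 5.66, 4.81

def posterior_mean(y, e):
    return (alpha + y) / (beta + e)

def shrunk(y, e):
    B = beta / (beta + e)                    # shrinkage factor B_i
    return B * (alpha / beta) + (1 - B) * (y / e)

for y, e in [(4, 3.1), (1, 0.4), (40, 31.0)]:
    print(round(posterior_mean(y, e), 3), round(shrunk(y, e), 3))
```

Both expressions agree exactly, which is the point of the identity: the posterior mean is a precision-weighted compromise between the SMR and the prior mean.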


Example IX.3: Lip cancer study – A WinBUGS analysis

• Marginal posterior distribution of α and β:

p(α, β | y) ∝ ∏_{i=1}^{n} [Γ(yi + α)/(Γ(α) yi!)] [β/(β + ei)]^α [ei/(β + ei)]^{yi} p(α, β)

 WinBUGS program: see chapter 9 lip cancer PG.odc
 Three chains of size 10,000 iterations, burn-in 5,000 iterations
 Convergence checks: (a) WinBUGS: BGR – OK; (b) CODA: Dynamic Geweke no convergence, but rest OK
 Posterior summary measures from WinBUGS:

[table: posterior summary measures of α and β]

WinBUGS program:

[figure: WinBUGS code for the Poisson-gamma model]

Random effects output:

 CODA: HPD intervals of parameters
 Caterpillar plot: estimates of the random effects (first 30 θi) with 95% equal-tail CIs


Shrinkage effect:

Original and fitted map of GDR:


Conclusion?

9.1.5 Posterior predictive distributions

Example IX.5: Lip cancer study – Posterior predictive distributions

Two cases:

1. Future count of males dying from lip cancer in one of the 195 regions
2. Future count of males dying from lip cancer in new regions not considered before

Case 1: Prediction of future counts from a Poisson-gamma model given the observed counts y:

p(ỹi | y) = ∫∫∫ p(ỹi | θi) p(θi | α, β, y) p(α, β | y) dθi dα dβ

with
p(ỹi | θi) = Poisson(ỹi | θi ei)
p(θi | α, β, y) = Gamma(θi | α + yi, β + ei)
p(α, β | y) (above)

• Sampling with WinBUGS/SAS

• WinBUGS: predict[i] ~ dpois(lambda[i])

• SAS: preddist outpred=lout;


Case 2:

p(ỹ | y) = ∫∫∫ p(ỹ | θ̃) p(θ̃ | α, β) p(α, β | y) dθ̃ dα dβ

with
p(ỹ | θ̃) = Poisson(ỹ | θ̃ ẽ)
p(θ̃ | α, β) = Gamma(θ̃ | α, β)
p(α, β | y) (above)

• Sampling with WinBUGS:

theta.new ~ dgamma(alpha,beta)
lambda.new <- theta.new*100
predict.new ~ dpois(lambda.new)

9.2 Full versus Empirical Bayesian approach

• Full Bayesian (FB) analysis: ordinary Bayesian approach

• Empirical Bayesian (EB) analysis: classical frequentist approach to estimate (functions of) the ‘random effects’ (level-2 observations) in a hierarchical context

• EB approach on the lip cancer mortality data:
◦ MMLE of α and β: maximum of p(α, β | y) when p(α) ∝ c, p(β) ∝ c
◦ MMLE of α: α̂_EB, MMLE of β: β̂_EB

• Summary measures for θi: p(θi | α̂_EB, β̂_EB, yi) = Gamma(α̂_EB + yi, β̂_EB + ei)

• Posterior summary measures = Empirical Bayes estimates, because the hyperparameters are estimated from the empirical result
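Case 2 amounts to composition sampling: draw (α, β) from the posterior, then θ̃, then the count. A Python sketch, where the (α, β) pairs are placeholder values standing in for real MCMC output and e_new = 100 mirrors the lambda.new <- theta.new*100 line of the WinBUGS code above:

```python
import random

# Composition sampling for the Case 2 posterior predictive distribution;
# ab_draws are *placeholder* posterior draws, not real MCMC output.
random.seed(2)
ab_draws = [(5.8 + random.gauss(0.0, 0.5), 4.9 + random.gauss(0.0, 0.5))
            for _ in range(2000)]

def rpoisson(lam):
    # count unit-rate exponential arrivals up to lam (valid for any lam > 0)
    k, t = 0, random.expovariate(1.0)
    while t < lam:
        k += 1
        t += random.expovariate(1.0)
    return k

e_new = 100.0
pred = []
for a, b in ab_draws:
    theta_new = random.gammavariate(a, 1.0 / b)   # theta.new ~ dgamma(alpha, beta)
    pred.append(rpoisson(theta_new * e_new))      # predict.new ~ dpois(lambda.new)

print(sum(pred) / len(pred))   # roughly e_new * alpha/beta
```

The predictive spread combines Poisson noise, uncertainty about θ̃ and uncertainty about (α, β), exactly as in the triple integral.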

Example IX.6 – Lip cancer study – Empirical Bayes analysis

• All regions:
◦ EB: α̂_EB = 5.66, β̂_EB = 4.81 ⇒ mean of the θi's = 1.18
◦ FB: α̂_FB = 5.84, β̂_FB = 4.91 ⇒ mean of the θi's = 1.19

• Restricted to 30 regions: comparison of estimates + 95% CIs of the θi's

Bayesians’ criticism on EB: the hyperparameters are fixed

• Classical approach: assessing the sampling variability of α̂_EB and β̂_EB involves bootstrapping

• But bootstrapping is complex here


Comparison of 95% EB and FB intervals for the θi:

9.3 Gaussian hierarchical models

FB=blue, EB=red

9.3.1 Introduction

9.3.2 The Gaussian hierarchical model

• Gaussian hierarchical model: hierarchical model whereby the distribution at each level is Gaussian

• Here a variance component model/random effects model: no covariates involved

• Bayesian (hierarchical) linear model:
◦ All parameters are given a prior distribution
◦ Fundamentals by Lindley and Smith

• Illustration: dietary IBBENS study

• Two-level Bayesian Gaussian hierarchical model:

Level 1: yij | θi, σ² ∼ N(θi, σ²) (j = 1, ..., mi; i = 1, ..., n)
Level 2: θi | μ, σθ² ∼ N(μ, σθ²) (i = 1, ..., n)
Priors: σ² ∼ p(σ²) and (μ, σθ²) ∼ p(μ, σθ²)

 Hierarchical independence
 Hyperparameters: often p(μ, σθ²) = p(μ) p(σθ²)
 Alternative model formulation: θi = μ + αi, with αi ∼ N(0, σθ²)

 Joint posterior:

p(θ, σ², μ, σθ² | y) ∝ ∏_{i=1}^{n} ∏_{j=1}^{mi} N(yij | θi, σ²) ∏_{i=1}^{n} N(θi | μ, σθ²) p(σ²) p(μ, σθ²)


9.3.3 Estimating the parameters

Example IX.7 – Dietary study – Comparison between subsidiaries

Some analytical results are available (when σ² is fixed):

• Similar shrinkage as with the simple normal model N(μ, σθ²) (with fixed σθ²)

• Much shrinkage with a low intra-class correlation:

ICC = σθ² / (σθ² + σ²)

Aim: compare average cholesterol intake between subsidiaries with WinBUGS

• Priors: μ ∼ N(0, 10⁶), σ² ∼ IG(10⁻³, 10⁻³) and σθ ∼ U(0, 100)
◦ 3 chains each of size 10,000, 3 × 5,000 burn-in
◦ BGR diagnostic in WinBUGS: almost immediate convergence
◦ Export to CODA: OK except for the Geweke test (numerical difficulties)

• Posterior summary measures on the last 5,000 iterations
◦ μ = 328.3, σμ = 9.44
◦ σM = 119.5, σθ,M = 18.26
◦ θi: see table (not much variation)
◦ Bi ∈ [0.33, 0.45] (≈ uniform shrinkage)
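Plugging the posterior summaries above (σ ≈ 119.5, σθ ≈ 18.26) into the ICC formula, and into the shrinkage factor Bi = (σ²/mi)/(σθ² + σ²/mi) for subsidiary sizes mi = 51 and 83 taken from the IBBENS table, reproduces the reported low ICC and (approximately) the Bi range:

```python
# ICC and shrinkage for the Gaussian hierarchical model, using the
# posterior summaries reported above.
sigma, sigma_theta = 119.5, 18.26

icc = sigma_theta**2 / (sigma_theta**2 + sigma**2)
print(round(icc, 3))                 # low ICC => much shrinkage

def shrinkage(m):
    # B_i = (sigma^2/m) / (sigma_theta^2 + sigma^2/m) for a subsidiary of size m
    return (sigma**2 / m) / (sigma_theta**2 + sigma**2 / m)

print(round(shrinkage(51), 2), round(shrinkage(83), 2))  # close to the B_i range above
```

Because each subsidiary mean is based on mi observations, the relevant level-1 variance in Bi is σ²/mi rather than σ², so larger subsidiaries are shrunk less.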

DAG:

FB estimation of θi with WinBUGS:

Sub  mi  Mean (SE)     SD     FBW (SD)      FBS (SD)      EML (SE)      EREML (SE)
1    82  301.5 (10.2)  92.1   311.8 (12.6)  312.0 (12.0)  313.8 (10.3)  312.2 (11.3)
2    51  324.7 (17.1)  122.1  326.4 (12.7)  326.6 (12.7)  327.0 (11.7)  326.7 (12.3)
3    71  342.3 (13.6)  114.5  336.8 (11.7)  336.6 (11.7)  335.6 (10.7)  336.4 (11.6)
4    71  332.5 (13.5)  113.9  330.8 (11.6)  330.9 (11.5)  330.6 (10.7)  330.8 (11.6)
5    62  351.5 (19.0)  150.0  341.2 (12.8)  341.3 (12.6)  339.5 (11.1)  340.9 (11.8)
6    69  292.8 (12.8)  106.4  307.3 (14.2)  307.8 (13.5)  310.7 (10.8)  308.4 (11.6)
7    74  337.7 (14.1)  121.3  334.1 (11.4)  334.2 (11.2)  333.4 (10.6)  333.9 (11.5)
8    83  347.1 (14.5)  132.2  340.0 (11.4)  340.0 (11.4)  338.8 (10.2)  340.0 (11.2)

chapter 9 dietary study chol.odc


Clinical trial applications

Meta-analyses

• Hierarchical models have been used in a variety of clinical trial/medical applications:
 Meta-analyses
 Inclusion of historical controls to reduce the sample size
 Dealing with multiplicity

• A Bayesian meta-analysis differs from a classical meta-analysis in that all parameters are given (a priori) uncertainty

• There are advantages in using a Bayesian meta-analysis. One is that no asymptotics need to be considered. This is illustrated in the next meta-analysis.

Beta-blocker meta-analysis of 22 clinical trials

Inclusion of historical controls - RCT example 5

 A meta-analysis on 22 RCTs on beta-blockers for reducing mortality after acute myocardial infarction was performed by Yusuf et al. (1985)
 Originally, a classical meta-analysis was performed; below is a WinBUGS program for a Bayesian meta-analysis
 dc (dt) = number of deaths out of nc (nt) patients treated with the control (beta-blocker) treatment
 WinBUGS program:

model{
  for( i in 1 : Num ) {
    dc[i] ~ dbin(pc[i], nc[i]);  dt[i] ~ dbin(pt[i], nt[i])
    logit(pc[i]) <- nu[i];  logit(pt[i]) <- nu[i] + delta[i]
    nu[i] ~ dnorm(0.0,1.0E-5);  delta[i] ~ dnorm(mu, itau)
  }
  mu ~ dnorm(0.0,1.0E-6);  itau ~ dgamma(0.001,0.001)
  delta.new ~ dnorm(mu, itau)
  tau <- 1 / sqrt(itau)
}


Multiplicity issues

9.4 Mixed models

• Hierarchical models have been suggested to correct for multiplicity in a Bayesian manner

• For example: ◦ Testing treatment effect in many subgroups ◦ Idea: treatment effects in the subgroups have a common distribution with mean to be estimated ◦ The type I error is not directly controlled, but because of shrinkage effect some protection against overoptimistic interpretations is obtained

• In other cases, protection is sought via changing settings guided by simulation studies

9.4.1 Introduction

9.4.2 The linear mixed model

• Bayesian linear mixed model (BLMM): extension of the Bayesian Gaussian hierarchical model

• Bayesian generalized linear mixed model (BGLMM): extension of the BLMM

• Bayesian nonlinear mixed model: extension of the BGLMM

Linear mixed model (LMM): yij is the response for the jth observation on the ith subject

yij = xijᵀ β + zijᵀ bi + εij
yi = Xi β + Zi bi + εi

 yi = (yi1, ..., yimi)ᵀ: mi × 1 vector of responses
 Xi = (xi1, ..., ximi)ᵀ: (mi × (d+1)) design matrix
 β = (β0, β1, ..., βd)ᵀ: (d+1) × 1 vector of fixed effects
 Zi = (zi1, ..., zimi)ᵀ: (mi × q) design matrix of the ith subject
 bi: q × 1 vector of random effects
 εi = (εi1, ..., εimi)ᵀ: mi × 1 vector of measurement errors

Distributional assumptions:

 bi ∼ Nq(0, G), G: (q × q) covariance matrix with (j, k)th element σ_{bj,bk} (j ≠ k) and σ²_{bj} (j = k)
 εi ∼ Nmi(0, Ri), Ri: (mi × mi) covariance matrix, often Ri = σ² I_{mi}
 bi statistically independent of εi (i = 1, ..., n)

Implications:

yi | bi ∼ Nmi(Xi β + Zi bi, Ri)
yi ∼ Nmi(Xi β, Zi G Ziᵀ + Ri)

• LMM popular for analyzing longitudinal studies with irregular time points + Gaussian response

• Covariates: time-independent or time-dependent

• Random intercept model: b0i

• Random intercept + slope model: b0i + b1i tij

• Bayesian linear mixed model (BLMM): all parameters are given a prior distribution
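The marginal covariance Zi G Ziᵀ + Ri can be computed explicitly. The sketch below assumes a hypothetical random intercept + slope design with three visits at t = 0, 1, 2, and fills in G and σ with the posterior estimates quoted later in Example IX.12 (sd_b0 = 2.71, sd_b1 = 0.48, corr = −0.39, σ = 1.78):

```python
# Marginal covariance Var(y_i) = Z_i G Z_i' + sigma^2 I for a hypothetical
# random intercept + slope model with visits at t = 0, 1, 2.
t = [0.0, 1.0, 2.0]
Z = [[1.0, tj] for tj in t]                     # columns: intercept, slope
G = [[2.71**2, -0.39 * 2.71 * 0.48],
     [-0.39 * 2.71 * 0.48, 0.48**2]]
sigma2 = 1.78**2

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

Zt = [list(col) for col in zip(*Z)]
V = matmul(matmul(Z, G), Zt)                    # Z G Z'
for i in range(len(t)):
    V[i][i] += sigma2                           # + sigma^2 * I
print(V)
```

This makes the second implication concrete: the random effects induce within-subject correlation in the marginal distribution of yi even though the εij are independent.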

Bayesian Biostatistics - Piracicaba 2014 524 Bayesian Biostatistics - Piracicaba 2014 525 Distributional assumptions Random intercept & random intercept + slope model



Example IX.11 – Dietary study – Comparison between subsidiaries

• Bayesian linear mixed model:

Level 1: yij | β, bi, σ² ∼ N(xijᵀ β + zijᵀ bi, σ²) (j = 1, ..., mi; i = 1, ..., n)
Level 2: bi | G ∼ Nq(0, G) (i = 1, ..., n)
Priors: σ² ∼ p(σ²), β ∼ p(β) and G ∼ p(G)

 Joint posterior:

p(β, G, σ², b1, ..., bn | y) ∝ ∏_{i=1}^{n} ∏_{j=1}^{mi} p(yij | bi, σ², β) ∏_{i=1}^{n} p(bi | G) p(β) p(G) p(σ²)

Aim: compare average cholesterol intake between subsidiaries, correcting for age and gender

• Model:

yij = β0 + β1 ageij + β2 genderij + b0i + εij
b0i ∼ N(0, σ²_{b0})
εij ∼ N(0, σ²)

 Priors: vague for all parameters

Results:

• WinBUGS program: chapter 9 dietary study chol age gender.odc
◦ Three chains each of 10,000 iterations, 3 × 5,000 burn-in
◦ Rapid convergence

• Posterior summary measures on the last 3 × 5,000 iterations
◦ β1 (SD) = −0.69 (0.57), β2 (SD) = −62.67 (10.67)
◦ With covariates: σM = 116.3, σb0,M = 14.16
◦ Without covariates: σM = 119.5, σb0,M = 18.30
◦ With covariates: ICC = 0.015

Example IX.12 – Toenail RCT – Fitting a BLMM

Aim: compare itraconazol with lamisil on unaffected nail length

◦ Double-blinded multi-centric RCT (36 centers): sportsmen and elderly people treated for toenail dermatophyte onychomycosis
◦ Two oral medications: itraconazol (treat=0) or lamisil (treat=1)
◦ Twelve weeks of treatment
◦ Evaluations at 0, 1, 2, 3, 6, 9 and 12 months
◦ Response: unaffected nail length of the big toenail
◦ Patients: subgroup of 298 patients


Individual profiles:

Model specification:

• Model:

yij = β0 + β1 tij + β2 tij × treati + b0i + b1i tij + εij
bi ∼ N2(0, G)
εij ∼ N(0, σ²)

• Priors:
◦ Vague normal for the fixed effects
◦ G ∼ IW(D, 2) with D = diag(0.1, 0.1)

DAG:

Results:

• WinBUGS program: chapter 9 toenail LMM.odc
◦ Three chains each of 10,000 iterations, 3 × 5,000 burn-in
◦ Rapid convergence

• Posterior summary measures on the last 3 × 5,000 iterations
◦ β1 (SD) = 0.58 (0.043), β2 (SD) = −0.057 (0.058)
◦ σM = 1.78
◦ σb0,M = 2.71
◦ σb1,M = 0.48
◦ corr(b0, b1)M = −0.39

• Frequentist analysis with SAS MIXED: MLEs close to the Bayesian estimates


9.4.3 The Bayesian generalized linear mixed model (BGLMM)

Generalized linear model (GLIM):

 n subjects with responses yi having a distribution in the exponential family
 Examples: binary responses, Poisson counts, gamma responses, Gaussian responses, etc.
 When covariates are involved: logistic/probit regression, Poisson regression, etc.
 But clustering in the data (if present) is ignored
 Regression coefficients have a population-averaged interpretation

Generalized linear mixed model (GLMM):

 Two-level hierarchy, i.e. mi subjects in n clusters with responses yij having a distribution in the exponential family
 Examples: binary responses, Poisson counts, gamma responses, Gaussian responses, etc.
 Correlation of measurements in the same cluster often modelled using random effects
 When covariates are involved: logistic/probit mixed effects regression, Poisson mixed effects regression, etc.
 Regression coefficients have a subject-specific interpretation

Bayesian generalized linear mixed model (BGLMM): GLMM with all parameters given a prior distribution

Example IX.13 – Lip cancer study – Poisson-lognormal model

aff i: percentage of the population engaged in agriculture, forestry and fisheries

Aim: verify how much of the heterogeneity of the SMRi can be explained by aff i

• Model:

yi | μi ∼ Poisson(μi), with μi = θi ei
log(μi/ei) = β0 + β1 aff i + b0i
b0i ∼ N(0, σ²_{b0})

 aff i centered
 Priors:
◦ Independent vague normal for the regression coefficients
◦ σb0 ∼ U(0, 100)

Results:

• WinBUGS program: chapter 9 lip cancer PLNT with AFF.odc
◦ Three chains each of 20,000 iterations, 3 × 10,000 burn-in
◦ Rapid convergence

• Posterior summary measures on the last 3 × 5,000 iterations
◦ β1 (SD) = 2.23 (0.33), 95% equal-tail CI [1.55, 2.93]
◦ With aff: σb0,M = 0.38
◦ Without aff: σb0,M = 0.44
◦ aff explains 25% of the variability of the random intercept

• Analysis also possible with the SAS procedure MCMC


Example IX.14 – Toenail RCT – A Bayesian logistic RI model

Mean profiles:

Aim: compare evolution of onycholysis over time between two treatments

Response: yij: 0 = none or mild, 1 = moderate or severe

 Model:

logit(πij) = β0 + β1 tij + β2 tij × treati + b0i
b0i ∼ N(0, σ²_{b0})

 Priors:
◦ Independent vague normal for the regression coefficients
◦ σb0 ∼ U(0, 100)

Results:

• WinBUGS program: chapter 9 toenail RI BGLMM.odc
◦ Three chains each of 10,000 iterations, 3 × 5,000 burn-in
◦ Rapid convergence

• Posterior summary measures on the last 3 × 5,000 iterations
◦ β0 (SD) = −1.74 (0.34)
◦ β1 (SD) = −0.41 (0.045)
◦ β2 (SD) = −0.17 (0.069)
◦ σ²_{b0,M} = 17.44
◦ ICC = σ²_{b0}/(σ²_{b0} + π²/3) = 0.84

• Frequentist analysis with SAS GLIMMIX: similar results, see program chapter 9 toenail binary GLIMMIX and MCMC.sas

Example IX.15 – Toenail RCT – A Bayesian logistic RI+RS model

Aim: compare evolution of onycholysis over time between the two treatments

Response: yij: 0 = none or mild, 1 = moderate or severe

• Model:

logit(πij) = β0 + β1 tij + β2 tij × treati + b0i + b1i tij

 Similar distributional assumptions and priors as before

 WinBUGS program: chapter 9 toenail RI+RS BGLMM.odc

 Numerical difficulties due to overflow: use of the min and max functions
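The latent-scale ICC above follows from treating the level-1 logistic "residual" variance as π²/3:

```python
import math

# Latent-scale ICC for the logistic random-intercept model: on the logit
# scale the level-1 "residual" variance is pi^2/3, so with the reported
# sigma^2_b0 = 17.44 the ICC of the slide follows directly.
sigma2_b0 = 17.44
icc = sigma2_b0 / (sigma2_b0 + math.pi**2 / 3)
print(round(icc, 2))   # 0.84
```

The value π²/3 is the variance of the standard logistic distribution, which plays the role of σ² in the Gaussian ICC formula.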


Other BGLMMs

Some further extensions

Extension of the BGLMM with a residual term:

g(μij) = xijᵀ β + zijᵀ bi + εij (j = 1, ..., mi; i = 1, ..., n)

Further extensions:

 WinBUGS program: chapter 9 lip cancer PLNT with AFF.odc
◦ Replace the Gaussian assumption by a t-distribution
◦ t-distribution with degrees of freedom as parameter

 Other extensions: write down the full conditionals

See WinBUGS Manuals I and II, and the literature

In practice more models are needed:

• Non-linear mixed effects models (Section 9.4.4)

• Ordinal logistic random effects model

• Longitudinal models with multivariate outcome

• Frailty models

• ...

9.4.6 Estimation of the REs and PPDs

Advantage of Bayesian modelling:

• Much easier to deviate from normality assumptions

• Much easier to fit more complex models

Prediction/estimation:

 Bayesian approach: estimating random effects = estimating fixed effects
 Prediction/estimation of individual curves in longitudinal models: λβᵀ β + λbᵀ bi, with β and bi posterior means, medians or modes

PPD:

 See Poisson-gamma model, but replace θi by bi  Estimation via MCMC


Example IX.19 – Toenail RCT – Exploring the random effects

Software:

 WinBUGS (CODA command) + post-processing with R
 R2WinBUGS

Histograms of estimated RI and RS:


Predicted profiles:

• xT β zT b Predicted evolution for the ith subject: ij + ij i

• Extra WinBUGS commands: For existing subject (id[iobs]): newresp[iobs] ~ dnorm(mean[iobs], tau.eps) For new subject: sample random effect from its distribution

• Computation: Stats table WinBUGS +R, CODA + R, R2WinBUGS • Prediction missing response NA: automatically done


9.4.7 Choice of the level-2 variance prior

• Bayesian linear regression: NI prior for σ² = Jeffreys prior p(σ²) ∝ 1/σ²

• Bayesian Gaussian hierarchical model: NI prior for σθ² = p(σθ²) ∝ 1/σθ² ??
◦ Theoretical + intuitive results: improper posterior
◦ Note: Jeffreys prior from another model, not the Jeffreys prior of the hierarchical model

• Possible solution: IG(ε, ε) with ε small (WinBUGS)?
◦ No: see application

• Solutions:
◦ U(0, c) prior on σθ
◦ Parameter-expanded model, suggested by Gelman (2006)

Example IX.20 – Dietary study* – NI prior for level-2 variance

• Modified dietary study: chapter 9 dietary study chol2.odc
 Prior 1: σθ² ∼ IG(10⁻³, 10⁻³)
 Prior 2: σθ ∼ U(0, 100)
 Prior 3: suggestion of Gelman (2006), gives similar results as with U(0, 100)

• Results:
◦ Posterior distribution of σθ: clear impact of the IG prior
◦ Trace plot of μ: regularly stuck with the IG prior

Effect of choosing the NI prior: posterior distribution of σθ²

Effect of choosing the NI prior: trace plot of μ



Choice of the level-2 variance prior

9.4.8 Propriety of the posterior

Prior for the level-2 covariance matrix?

• Classical choice: inverse Wishart

• Simulation study of Natarajan and Kass (2000): problematic

• Solutions:
◦ 2 dimensions: uniform priors on both σ's and on the correlation
◦ IW(D, 2) with small diagonal elements ≈ uniform priors, but slower convergence with uniform priors

9.4.8 Propriety of the posterior

• Impact of improper (NI) priors on the posterior: theoretical results available
◦ Important for SAS, since it allows improper priors

• Proper NI priors can also have too much impact
◦ Important for WinBUGS
◦ In general: not clear

• Improper priors can yield proper full conditionals

9.4.9 Assessing and accelerating convergence

Checking convergence of the Markov chain of a Bayesian hierarchical model:

• Similar to checking convergence in any other model

• But: large number of parameters ⇒ make a selection

Accelerating convergence of the Markov chain of a Bayesian hierarchical model:

• Use the tricks seen before (centering, standardizing, overrelaxation, etc.)

• Specific tricks for a hierarchical model:
◦ hierarchical centering
◦ (reparameterization by sweeping)
◦ (parameter expansion)

Hierarchical centering:

Uncentered Gaussian hierarchical model:
yij = μ + αi + εij
αi ∼ N(0, σα²)

Centered Gaussian hierarchical model:
yij = θi + εij
θi ∼ N(μ, σα²)

• For mi = m: hierarchical centering implies faster mixing when σα² > σ²/m

• Similar results for multilevel BLMM and BGLMM


Take home messages

• Frequentist solutions are often close to the Bayesian solutions, when the frequentist approach can be applied

• There are specific problems that one has to take care of when fitting Bayesian mixed models

• Hierarchical models assume exchangeability of the level-2 observations, which might be a strong assumption. When satisfied, it allows for better prediction making use of the ‘borrowing-strength’ principle

Chapter 10

Model building and assessment

Aims:
 Look at Bayesian criteria for model choice
 Look at Bayesian techniques that evaluate the chosen model

10.1 Introduction

• Bayesian procedures for model building and model criticism ⇒ select an appropriate model
• In this chapter: model selection from a few good candidate models
• Chapter 11: model selection from a large set of candidate models
• Bayesian model building and criticism is similar to frequentist model building and criticism, except for:
◦ Bayesian model: combination of likelihood and prior ⇒ 2 choices
◦ Bayesian inference: based on MCMC techniques, while frequentist approaches rely on asymptotic inference

In this chapter: explorative tools that check and improve the fitted model:
• Model selection: Deviance Information Criterion (DIC)
• Model criticism: Posterior Predictive Check (PPC)
• For other criteria, see the book


10.2 Measures for model selection

Use of Bayesian criteria for choosing the most appropriate model:
• Briefly: Bayes factor
• Extensively: DIC

10.2.1 The Bayes factor

• In Chapter 3, we have seen that the Bayes factor can be used to test statistical hypotheses in a Bayesian manner
• The Bayes factor can also be used for model selection. In the book: illustration to choose between the binomial, Poisson and negative binomial distribution for the dmft-index
• Computing the Bayes factor is often quite involved, and care must be exercised when it is combined with non-informative priors
• The pseudo-Bayes factor: a variation of the Bayes factor that is practically feasible
• Bayes factors must be computed outside WinBUGS
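What "computing the Bayes factor" amounts to can be seen in a conjugate toy case, where the marginal likelihood p(y | M) is available in closed form and the Bayes factor is a ratio of two such terms. A sketch (Python, with made-up count data; the two "models" are simply two Gamma priors for a Poisson mean, not the book's binomial/Poisson/negative-binomial comparison):

```python
from math import exp, lgamma, log

def log_marg_poisson_gamma(y, a, b):
    """log p(y | M) for y_i ~ Poisson(lam), lam ~ Gamma(a, b) with rate b."""
    n, s = len(y), sum(y)
    return (a * log(b) - lgamma(a) + lgamma(a + s)
            - (a + s) * log(b + n) - sum(lgamma(v + 1) for v in y))

y = [2, 3, 1, 4, 2, 3, 0, 2]                       # made-up counts, mean ~2.1
log_bf = (log_marg_poisson_gamma(y, 3.0, 1.0)      # M1: prior mean 3
          - log_marg_poisson_gamma(y, 30.0, 1.0))  # M2: prior mean 30
print(exp(log_bf))  # BF_12 >> 1: the data support M1 over M2
```

When no closed form exists, the same ratio has to be approximated (e.g. by numerical integration or MCMC-based estimators), which is exactly why the computation is "often quite involved".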

10.2.2 Information theoretic measures for model selection

Most popular frequentist information criteria: AIC and BIC
• Both adjust for model complexity: effective degrees of freedom (≤ number of model parameters)
• Akaike's information criterion (AIC), developed by Akaike (1974):
AIC = −2 log L(θ̂(y) | y) + 2p
• Bayesian Information Criterion (BIC), suggested by Schwarz (1978):
BIC = −2 log L(θ̂(y) | y) + p log(n)
• Deviance Information Criterion (DIC), suggested by Spiegelhalter et al. (2002):
DIC = generalization of AIC to Bayesian models

Practical aspects of AIC, BIC, DIC

• AIC, BIC and DIC adhere to Occam's razor principle
• Practical rule for choosing a model:
◦ Choose the model with the smallest AIC/BIC/DIC
◦ Difference of ≥ 5: substantive evidence
◦ Difference of ≥ 10: strong evidence
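The two frequentist formulas are trivial to code; a small Python sketch with made-up log-likelihood values:

```python
from math import log

def aic(loglik, p):
    # AIC = -2 log L(theta_hat | y) + 2p
    return -2.0 * loglik + 2.0 * p

def bic(loglik, p, n):
    # BIC = -2 log L(theta_hat | y) + p log(n)
    return -2.0 * loglik + p * log(n)

# two hypothetical nested models: a difference of >= 10 counts as strong evidence
m1, m2 = aic(-150.0, 3), aic(-162.0, 2)
print(m1, m2, m2 - m1)  # 306.0 328.0 22.0
```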


Number of parameters ⇔ effective degrees of freedom

What is the effective degrees of freedom ρ of a statistical model?
• For a mixed model, the number of parameters:
 Let p = # fixed effects, v = # variance parameters, q = # random effects
 Marginal model (marginalized over the random effects): p + v
 Conditional model (random effects in the model): p + v + q
• # Free parameters (effective degrees of freedom):
 Marginal model: ρ = p + v
 Conditional model: p + v ≤ ρ ≤ p + v + q

Example X.5+6: Theoretical example – Conditional/marginal likelihood of LMM

Statistical model of Example IX.7:
• Two-level Bayesian Gaussian hierarchical model:
Level 1: yij | θi, σ² ∼ N(θi, σ²)  (j = 1,...,mi; i = 1,...,8)
Level 2: θi | μ, σθ² ∼ N(μ, σθ²)  (i = 1,...,8)
Priors: σ² ∼ IG(10⁻³, 10⁻³), σθ ∼ U(0,100), μ ∼ N(0, 10⁶)
• Conditional likelihood (conditioning on the θi, with θi | μ, σθ² ∼ N(μ, σθ²)):
LC(θ, σ² | y) = ∏i=1..8 ∏j=1..mi N(yij | θi, σ²)
• Effective degrees of freedom here: 1+1 ≤ ρ ≤ 8+1

Bayesian definition of effective degrees of freedom

• Bayesian effective degrees of freedom: pD = D̄(θ) − D(θ̄)
• Bayesian deviance: D(θ) = −2 log p(y | θ) + 2 log f(y)
◦ D̄(θ) = posterior mean of D(θ)
◦ D(θ̄) = D(θ) evaluated at the posterior mean θ̄
◦ f(y): typically the saturated density, but not used in WinBUGS
• Motivation: the more freedom the model has, the more D̄(θ) will deviate from D(θ̄)
• pD = ρ (classical degrees of freedom) for a normal likelihood with a flat prior for μ and a fixed variance

Deviance information criterion

• DIC as a Bayesian model selection criterion:
DIC = D(θ̄) + 2 pD = D̄(θ) + pD
• Both DIC and pD can be calculated from an MCMC run:
◦ θ¹,...,θᴷ = converged Markov chain
◦ D̄(θ) ≈ (1/K) Σk D(θᵏ)
◦ D(θ̄) ≈ D((1/K) Σk θᵏ)
• Practical rule for choosing a model: as with AIC/BIC
• DIC & pD are subject to sampling variability
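The MCMC recipe above takes only a few lines of code. The sketch below (Python, not WinBUGS) uses a normal mean with known variance and an effectively flat prior, sampling directly from the conjugate posterior instead of running a chain; for this case pD should come out close to 1, matching pD = ρ:

```python
import numpy as np

rng = np.random.default_rng(7)
sigma = 2.0
y = rng.normal(5.0, sigma, size=50)

def deviance(theta):
    # D(theta) = -2 log p(y | theta), dropping the saturated-density term
    n = len(y)
    return (n * np.log(2 * np.pi * sigma**2)
            + np.sum((y[:, None] - theta) ** 2, axis=0) / sigma**2)

# with a flat prior, the posterior of the mean is N(ybar, sigma^2/n)
theta = rng.normal(y.mean(), sigma / np.sqrt(len(y)), size=20000)

d_bar = deviance(theta).mean()                  # posterior mean of D(theta)
d_hat = deviance(np.array([theta.mean()]))[0]   # D at the posterior mean
p_d = d_bar - d_hat
dic = d_hat + 2 * p_d                           # equivalently d_bar + p_d
print(p_d)  # close to 1 = one effective parameter
```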


Focus of a statistical analysis

• The focus of the research determines the degrees of freedom: population ⇔ cluster
• Prediction in conditional and marginal likelihood:
◦ Predictive ability of the conditional model: focus on the current clusters (current random effects)
◦ Predictive ability of the marginal model: focus on future clusters (random-effects distribution)
• Example:
◦ Multi-center RCT comparing A with B
◦ Population focus: interest in the overall treatment effect = β
◦ Cluster focus: interest in the treatment effect in center k = β + bk

DIC and pD – Use and pitfalls

Examples to illustrate use and pitfalls when using pD and DIC:
• Use of WinBUGS to compute pD and DIC (Example X.7)
• ρ = pD for a linear regression analysis (Example X.7)
• Difference between conditional and marginal DIC (Example X.9)
• Impact of the variability of the random effects on DIC (Example X.10)
• Impact of the scale of the response on DIC (Example X.10)
• Negative pD (Example X.11)

Example X.8: Growth curve study – Model selection using pD and DIC

Individual profiles: (figure not shown)

• Well-known data set from Potthoff & Roy (1964)
• Dental growth measurements of a distance (mm)

• 11 girls and 16 boys at ages (years) 8, 10, 12, and 14

• Variables are gender (1=female, 0=male) and age
• Gaussian linear mixed models were fit to the longitudinal profiles

• WinBUGS: chapter 10 Potthoff-Roy growthcurves.odc

• Choice evaluated with pD and DIC


Models:

Model M1:
yij = β0 + β1 agej + β2 genderi + β3 genderi×agej + b0i + b1i agej + εij

 yij = distance measurement
 bi = (b0i, b1i)ᵀ random intercept and slope with distribution N((0, 0)ᵀ, G), where

G = [ σb0²        ρ σb0 σb1
      ρ σb0 σb1   σb1²     ]

 εij ∼ N(0, σ0²) for boys, εij ∼ N(0, σ1²) for girls
 The total number of parameters for model M1 = 63: 4 fixed effects, 54 random effects, 3 + 2 variances (RE + ME)

Alternative models:

 Model M2: model M1, but assuming ρ = 0
 Model M3: model M2, but assuming σ0 = σ1
 Model M4: model M1, but assuming σ0 = σ1
 Model M5: model M1, but bi, εij ∼ t3-(scaled) distributions
 Model M6: model M1, but εij ∼ t3-(scaled) distribution
 Model M7: model M1, but bi ∼ t3-(scaled) distribution
 Nested model comparisons: (1) M1, M2, M3; (2) M1, M2, M4; (3) M5, M6, M7

pD and DIC of the models:

Model    Dbar       Dhat       pD        DIC
M1       343.443    308.887    34.556    377.999
M2       344.670    312.216    32.454    377.124
M3       376.519    347.129    29.390    405.909
M4       374.065    342.789    31.276    405.341
M5       328.201    290.650    37.552    365.753
M6       343.834    309.506    34.327    378.161
M7       326.542    288.046    38.047    364.949

Initial and final model:

                  model M1                        model M5
Node     Mean     SD      2.5%     97.5%    Mean     SD      2.5%     97.5%
β0       16.290   1.183   14.010   18.680   17.040   0.987   15.050   18.970
β1        0.789   0.103    0.589    0.989    0.694   0.086    0.526    0.867
β2        1.078   1.453   -1.844    3.883    0.698   1.235   -1.716    3.154
β3       -0.309   0.126   -0.549   -0.058   -0.243   0.106   -0.454   -0.035
σb0       1.786   0.734    0.385    3.381    1.312   0.541    0.341    2.502
σb1       0.139   0.065    0.021    0.280    0.095   0.052    0.007    0.209
ρ        -0.143   0.500   -0.829    0.931   -0.001   0.503   -0.792    0.942
σ0        1.674   0.183    1.362    2.074    1.032   0.154    0.764    1.365
σ1        0.726   0.111    0.540    0.976    0.546   0.101    0.382    0.783


Example X.9: Growth curve study – Conditional and marginal DIC

(Conditional) model M4:
yij | bi ∼ N(μij, σ²)  (j = 1,...,4; i = 1,...,27)
with
μij = β0 + β1 agej + β2 genderi + β3 genderi×agej + b0i + b1i agej
 In addition: bi ∼ N((0, 0)ᵀ, G)
 Deviance: DC(μ, σ²) ≡ DC(β, b, σ²) = −2 Σi Σj log N(yij | μij, σ²)
 Interest (focus): the current 27 bi's
 DIC = conditional DIC
 WinBUGS: pD = 31.282 and DIC = 405.444

Marginal(ized) model M4:
yi ∼ N4(Xi β, Z G Zᵀ + R)  (i = 1,...,27)
with yi = (yi1, yi2, yi3, yi4)ᵀ and

Xi = [ 1   8   genderi    8×genderi
       1  10   genderi   10×genderi
       1  12   genderi   12×genderi
       1  14   genderi   14×genderi ]

Z = [ 1   8
      1  10
      1  12
      1  14 ],   R = σ² I4

 Deviance: DM(β, σ², G) = −2 Σi log N4(yi | Xi β, Z G Zᵀ + R)
 Interest (focus): future bi's
 DIC = marginal DIC
 WinBUGS: pD = 7.072 and DIC = 442.572
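The relation between the two formulations is that the marginal density N4(yi | Xiβ, ZGZᵀ + R) equals the conditional density integrated over bi ∼ N(0, G). A numerical sketch (Python; the parameter values are made up, not the fitted growth-curve estimates) verifies this by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(3)
age = np.array([8.0, 10.0, 12.0, 14.0])
Z = np.column_stack([np.ones(4), age])   # random intercept and slope
X = Z                                    # same design for the fixed part here
beta = np.array([16.0, 0.8])             # made-up fixed effects (no gender term)
G = np.diag([0.25, 0.01])                # made-up covariance of b_i
sigma2 = 1.0
y = np.array([22.0, 24.5, 25.0, 27.5])   # one made-up profile

# closed-form marginal density: y ~ N4(X beta, Z G Z' + R)
V = Z @ G @ Z.T + sigma2 * np.eye(4)
r = y - X @ beta
log_marg = -0.5 * (4 * np.log(2 * np.pi) + np.linalg.slogdet(V)[1]
                   + r @ np.linalg.solve(V, r))

# Monte Carlo: average the conditional density over b ~ N(0, G)
b = rng.multivariate_normal(np.zeros(2), G, size=200_000)
mu = X @ beta + b @ Z.T                  # (200000, 4) conditional means
log_cond = -0.5 * (4 * np.log(2 * np.pi * sigma2)
                   + ((y - mu) ** 2).sum(axis=1) / sigma2)
log_mc = np.log(np.exp(log_cond).mean())
print(log_marg, log_mc)  # the two agree up to Monte Carlo error
```

The conditional deviance only ever evaluates `log_cond` at the sampled bi, which is why its DIC and pD differ from the marginal ones.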

Frequentist analysis of M4: maximization of the marginal likelihood
◦ SAS procedure MIXED: p = 8 and AIC = 443.8
◦ chapter 10 Potthoff-Roy growthcurves.sas

DIC and pD – Additional remarks

Some results on pD and DIC:
• pD and DIC depend on the parameterization of the model
• Several versions of DIC & pD:
◦ DIC in R2WinBUGS (classical definition) is overoptimistic (uses the data twice)
◦ R2jags: pD = var(deviance)/2
◦ rjags: classical DIC & DIC corrected for overoptimism (Plummer, 2008)
◦ Except for (R2)WinBUGS, the computation of DIC can be quite variable


10.3 Model checking procedures

The selected model is not necessarily sensible, nor does it guarantee a good fit to the data.

10.3.1 Introduction

Statistical model evaluation is needed:
1. Checking that inference from the chosen model is reasonable
2. Verifying that the model can reproduce the data
3. Sensitivity analyses by varying certain aspects of the model

Bayesian model evaluation:
 As in the frequentist approach
 Also the prior needs attention
 Primarily based on sampling techniques

Practical procedure

Two procedures are suggested for Bayesian model checking:
• Sensitivity analysis: varying the statistical model & priors
• Posterior predictive checks: generate replicated data from the assumed model, and compare a discrepancy measure based on the replicated and the observed data

10.3.4 Posterior predictive check

Take Example III.8 (PPD for caries experience): (figure not shown)
The observed dmft-distribution is quite different from the PPD


Classical approach:
 We measure the difference between the observed and the predicted distribution via a statistic
 Then use the large-sample distribution of the statistic under H0 to see if the observed value of the statistic is extreme

But:
 A GOF test depends on nuisance parameters
 In the Bayesian approach, we condition on the observed data ⇒ a different route needs to be taken
 Now, take a summary measure of the observed distribution and see how extreme it is compared to the replicated summary measures sampled under the assumed distribution, taking the uncertainty about θ into account

(i) Idea behind posterior predictive check

◦ H0: model M0 holds for the data, i.e. y ∼ p(y | θ)
◦ Sample y = {y1,...,yn} & θ estimated from the data

Take, e.g.:
◦ Goodness-of-fit (GOF) statistic T(y), with a large value = poor fit
◦ Discrepancy measure D(y, θ)

Then:
 Generate replicated samples ỹ = {ỹ1,...,ỹn} from p(y | θ)
 Compare the observed GOF statistic/discrepancy measure with that based on the replicated data
 This gives the proportion of times the discrepancy measure of the observed data is more extreme than that of the replicated data = Bayesian P-value


Computation of the PPP-value for D(y, θ) or T (y): Checking the normality of tbbmc:

• (R2)WinBUGS: six PPCs in chapter 10 osteo study-PPC.R 1. Let θ1,...,θK be a converged Markov chain from p(θ | y) 2. Compute D(y, θk)(k =1,...,K) (for T (y) only once) • and using T (y) (fixing mean and variance):     yk |θk n 3 n 4 3. Sample replicated data ˜ from p(y ) (each of size n) 1 yi − y 1 yi − y Tskew(y)= and Tkurt(y)= − 3 4. Compute D(yk, θk)(k =1,...,K) n s n s    i=1 i=1 1 K yk θk ≥ y θk 5. Estimate pD by pD = i=1 I D(˜ , ) D( , ) K • Skewness and kurtosis using D(y, θ):     n 3 n 4 1 yi − μ 1 yi − μ  When pD < 0.05/0.10 or pD > 0.90/0.95: “bad” fit of model to the data D (y, θ)= and D (y, θ)= − 3, skew n σ kurt n σ  Graphical checks: i=1 i=1 θ T ◦ T (y): Histogram of T (y˜k), (k =1,...,K) with observed T (y) with =(μ, σ) ◦ D(y, θ): X-Y plot of D(y˜k, θk) versus D(y, θk) + 45◦ line


PPP-values:

• pTskew = 0.13 & pTkurt = 0.055
• pDskew = 0.27 & pDkurt = 0.26
⇒ Dskew, Dkurt more conservative than Tskew, Tkurt
• Histogram + scatterplot (next page)

PPP-plots: (figures not shown)

Example X.19 – Checking distributional assumptions (continued)

Checking the normality of the residuals ri = (yi − ŷi)/σ of the regression of tbbmc on bmi:
 (R2)WinBUGS: six PPCs in chapter 10 osteo study2-PPC.R
 Kolmogorov–Smirnov discrepancy measure:
DKS(y, θ) = max i∈{1,...,n} [ Φ(yi | μ, σ) − (i−1)/n, i/n − Φ(yi | μ, σ) ]
 Gap discrepancy measure:
Dgap(y, θ) = max i∈{1,...,(n−1)} ( y(i+1) − y(i) ),
with y(i) the ith ordered value of y
 Proportion of residuals greater than 2 in absolute value:
D0.05(y, θ) = Σi I(|ri| > 2)
None showed deviation from normality

Potthoff & Roy growth curve study: Model M1 evaluated with PPCs

• R program using R2WinBUGS: LMM R2WB Potthoff Roy PPC model M1.R
• Four PPCs were considered:
 PPC 1: sum of squared distances of the data (yij or ỹij) to the local mean value
 Two versions (m = 4):
◦ Version 1: TSS(y) = (1/(n×m)) Σi Σj (yij − ȳi·)²
◦ Version 2: DSS(y, θ) = (1/(n×m)) Σi Σj (yij − μij)²
• PPC 2 & PPC 3: skewness and kurtosis of the residuals (θ = (μ, σ)ᵀ):
 Skewness: Dskew(y, θ) = (1/(n×m)) Σi Σj ((yij − μij)/σ)³
 Kurtosis: Dkurt(y, θ) = (1/(n×m)) Σi Σj ((yij − μij)/σ)⁴ − 3


Potthoff & Roy growth curve study: PPC 1 (figure not shown)
PPP 1 = 0.68

Potthoff & Roy growth curve study: PPCs 2, 3, 4 (figures not shown)
PPP 2 = 0.51, PPP 3 = 0.56, PPP 4 = 0.25

Take home messages

• The Bayesian approach offers the possibility to fit complex models
• But model checking cannot and should not be avoided
• Certain aspects of model checking may take a (very) long time to complete
• The combination of WinBUGS/OpenBUGS with R is important
• Look for R software to do the model checking more quickly

Part III

More advanced Bayesian modeling


Chapter 11
Advanced modeling with Bayesian methods

• MCMC techniques allow us to fit models to complex data
• This is for many statisticians the motivation to make use of the Bayesian approach
• In this final chapter we give 2 examples of more advanced Bayesian modeling:
◦ Example 1: predicting a continuous response based on longitudinal profiles
◦ Example 2: a complex dental model to relate the presence of caries on deciduous teeth to the presence of caries on adjacent permanent teeth

11.1 Example 1: Longitudinal profiles as covariates

Predicting basal metabolic rate

• Study in a suburb of Kuala Lumpur (Malaysia) on normal weight-for-age and height-for-age schoolchildren, performed between 1992 and 1995
• 70 boys (10.9-12.6 years) and 69 girls (10.0-11.6 years)
• Subjects were measured serially every 6 months for 3 years; dropout: 23%
• Measurements: weight, height, (log) lean body mass (lnLBM) and basal metabolic rate (BMR) + ...
• Aim: predict BMR from the anthropometric measurements
• Program commands are in BMR prediction.R

Covariate profiles and outcome histogram: (figures not shown)


• Assessing the relation between anthropometric measurements and BMR is useful:
 Determination of BMR is time consuming; prediction from anthropometric measurements is quicker
 Tool for epidemiological studies to verify whether certain groups of schoolchildren have an unusual BMR
• Prediction in a longitudinal context:
 Based on measurements taken cross-sectionally
 Based on longitudinal measurements: joint modeling (here)

Joint modeling:

• Alternative approaches:
 Classical regression using all anthropometric measurements at all ages: multicollinearity + problems with missing values
 Ridge regression using all anthropometric measurements at all ages: problems with missing values
 Two-stage approach: the sampling error of intercepts and slopes is not taken into account
• Joint modeling: takes the missing values and the sampling error into account
• The Bayesian approach takes the sampling error automatically into account, but is computer intensive

Analysis

• The following longitudinal models were assumed:
 Weight: yij = β01 + β11 ageij + β21 sexi + b01i + b11i ageij + ε1ij
 Height: yij = β02 + β12 ageij + β22 sexi + b02i + b12i ageij + ε2ij
 log(LBM): yij = β03 + β13 ageij + β23 sexi + b03i + b13i ageij + ε3ij
with RI + RS independent between models and of the measurement error:
◦ b1i ∼ N(0, Σb1), b2i ∼ N(0, Σb2), b3i ∼ N(0, Σb3)
◦ ε1ij ∼ N(0, σε1²), ε2ij ∼ N(0, σε2²), ε3ij ∼ N(0, σε3²)
• The following main model for BMR was assumed:
yi = β0 + β1 b01i + β2 b11i + β3 b02i + β4 b12i + β5 b03i + β6 b13i + β7 agei1 + β8 sexi + εi
• 3 chains, 1,500,000 iterations with 500,000 burn-in and thinning = 1,000

Results: Longitudinal models

Param     Mean     SD     2.5%     25%     50%     75%    97.5%   Rhat   n.eff
Weight
β01     -19.17   1.80   -22.65  -20.40  -19.19  -17.98  -15.63    1.00    790
β11       4.80   0.15     4.51    4.69    4.80    4.90    5.09    1.00    750
β21      -3.99   1.04    -6.06   -4.68   -4.01   -3.28   -1.96    1.00   3000
Height
β02      69.10   2.09    65.26   67.68   69.02   70.41   73.52    1.00   3000
β12       6.49   0.17     6.15    6.39    6.50    6.61    6.80    1.00   3000
β22      -4.10   1.16    -6.42   -4.86   -4.08   -3.33   -1.83    1.00   3000
log(LBM)
β03       1.99   0.18     1.66    1.86    1.98    2.12    2.37    1.02    130
β13       0.12   0.01     0.11    0.12    0.12    0.12    0.13    1.00   2200
β23      -0.23   0.25    -0.78   -0.40   -0.23   -0.06    0.26    1.04     72


Results: Main model for BMR Results: DIC and pD

Param Mean SD 2.5% 25% 50% 75% 97.5% Rhat n.eff Model Dbar Dhat pD DIC β0 59.80 97.21 -130.51 -6.10 60.28 124.82 251.50 1.00 3000.00 bmr6 1645.500 1637.260 8.233 1653.730 β1 17.26 8.55 0.10 11.45 17.43 22.85 34.06 1.00 1300.00

β2 375.31 82.15 215.49 319.77 373.65 431.70 534.91 1.00 2200.00 height 2298.300 2056.320 241.984 2540.290 β 10.15 7.41 -4.95 5.31 10.12 15.14 24.41 1.00 2800.00 3 weight 2592.730 2367.990 224.733 2817.460 β4 233.95 83.18 61.67 179.57 235.55 290.52 394.70 1.00 2800.00 lnLBM -2885.930 -3133.980 248.054 -2637.870 β5 119.40 102.76 -92.40 52.80 123.35 186.43 316.20 1.01 230.00 β6 1.49 99.83 -198.05 -67.33 4.06 69.44 188.71 1.01 410.00 total 3650.600 2927.590 723.004 4373.600 β 457.78 10.87 436.60 450.20 457.90 465.10 479.20 1.00 2400.00 7 Dbar = post.mean of -2logL β8 198.36 86.51 34.79 138.75 199.50 257.35 369.60 1.00 1200.00 Dhat = -2LogL at post.mean of stochastic nodes

DICand pD are given for each model: 3 longitudinal models and main model  Averaged relative error: 11%

Results: Random effects structure: presence of correlation

Further modeling:
• Looking for the correct scale in the longitudinal models and the main model
• Possible alternative: model the 4-dimensional response jointly over time
• Involve 23 other anthropometric measurements in the prediction


11.2 Example 2: relating caries on deciduous teeth with caries on permanent teeth

See 2012 presentation JASA paper


Part IV

Exercises

For the exercises you need the following software:
• RStudio + R
• WinBUGS
• OpenBUGS


Pre-course exercises

Software: RStudio + R

Run the following R programs making use of RStudio, to get used to R and to check your software:
• dmft-score and poisson distribution.r
• gamma distributions.r

Exercises Chapter 1

Software: RStudio + R

Exercise 1.1 Plot the binomial likelihood and the log-likelihood of the surgery example. Use chapter 1 lik and llik binomial example.R for this. What happens when x=90 and n=120, or when x=0 and n=12?

Exercise 1.2 Compute the positive predictive value (pred+) of the Folin-Wu blood test for prevalences ranging from 0.05 to 0.50. Plot pred+ as a function of the prevalence.

Bayesian Biostatistics - Piracicaba 2014 616 Bayesian Biostatistics - Piracicaba 2014 617 Exercises Chapter 2 Exercise 2.3 Suppose that in the surgical example of Chapter 1, the 9 successes out of 12 surgeries were obtained as follows: SSFSSSFSSFSSSSS. Start with a uni- form prior on the probability of success, θ, and show that how the posterior changes when the results of the surgeries become available. Then use a Beta(0.5,0.5) as Software: Rstudio + R prior. How do the results change? Finally, take a skeptical prior for the probability of success. Exercise 2.1 In the (fictive) ECASS 2 study on 8 out of the 100 rt-PA treated stroke Use program chapter 2 binomial lik + prior + post2.R and chapter 2 binomial lik patients suffered from SICH. Take a vague prior for θ = probability of suffering from + prior + post3.R. SICH with rt-PA. Derive the posterior distribution with R. Plot the posterior distri- bution, together with the prior and the likelihood. What is the posterior probability Exercise 2.4 Suppose that the prior distribution for the proportion of subjects with that the incidence of SICH is lower than 0.10? an elevated serum cholesterol (>200 mg/dl), say θ, in a population is centered Change program chapter 2 binomial lik + prior + post.R that is used to take the around 0.10 with 95% prior uncertainty (roughly) given by the interval [0.035,0.17]. ECASS 2 study data as prior. So now the ECASS 2 data make up the likelihood. In a study based on 200 randomly selected subjects in that population, 30% suffered from hypercholesteremia. Establish the best fitting beta prior to the specified prior Exercise 2.2 Plot the Poisson likelihood and log-likelihood based on the first 100 sub- information and combine this with the data to arrive at the posterior distribution jects from the dmft data set (dmft.txt). Use program chapter 2 Poisson likelihood of θ. Represent the solutions graphically. What do you conclude? 
dmft.R Use program chapter 2 binomial lik + prior + post cholesterol.R.


Exercise 2.5 Take the first ten subjects in the dmft.txt file. Assume a Poisson distribution for the dmft-index, and combine it first with a Gamma(3,1) prior and then with a Gamma(30,10) prior for θ (the mean of the Poisson distribution). Establish the posterior(s) with R. Then take all dmft-indexes and combine them with the two gamma priors. Repeat the computations. Represent your solutions graphically. What do you conclude?
Use program chapter 2 poisson + gamma dmft.R.

Exercises Chapter 3

Software: RStudio + R

Exercise 3.1 Verify with R the posterior summary measures reported in Examples III.2 and III.5. Try first yourself to write the program. But, if you prefer, you can always look at the website of the book.

Exercise 3.2 Take the first ten subjects in the dmft.txt file, take a Gamma(3,1) prior and compute the PPD of a future count in a sample of size 100 with R. What is the probability that a child will have caries? Use program chapter 3 negative binomial distribution.R.

Exercise 3.3 Take the sequence of successes and failures reported in Exercise 2.3. Compute the PPD for the next 30 results. See program chapter 3 exercise 3-3.R.


Exercise 3.5 The GUSTO-1 study is a mega-sized randomized controlled clinical trial N of N of N of Combined comparing two thrombolytics: streptokinase (SK) and recombinant plasminogen activator (rt-PA) for the treatment of patients with an acute myocardial infarction. Trial Drug Patients Deaths Nonfatal strokes Deaths or Strokes The outcome of the study is 30-day mortality, which is a binary indicator whether GUSTO SK 20,173 1,473 101 1,574 the treated patient died after 30 days or not. The study recruited 41021 acute rt-PA 10,343 652 62 714 infarct patients from 15 countries and 1081 hospitals in the period December 1990 - February 1993. The basic analysis was reported in Gusto (1993) and found a GISSI-2 SK 10,396 929 56 985 statistically significant lower 30-day mortality rate for rt-PA compared with SK. rt-PA 10,372 993 74 1,067 Brophy and Joseph (1995) reanalyzed the GUSTO-1 trial results from a Bayesian ISIS-3 SK 13,780 1,455 75 1,530 point of view, leaving out the subset of patients who were treated with both rt-PA and SK. As prior information the data from two previous studies have been used: rt-PA 13,746 1,418 95 1,513 GISSI-2 and ISIS-3. In the table on next page data from the GUSTO-1 study and the two historical studies are given. The authors compared the results of streptokinase and rt-PA using the difference of the two observed proportions, equal to absolute


Questions:
1. Determine the normal prior for ar based on the data from (a) the GISSI-2 study, (b) the ISIS-3 study and (c) the combined data from the GISSI-2 and ISIS-3 studies.
2. Determine the posterior for ar for the GUSTO-1 study based on the above priors and a noninformative normal prior.
3. Determine the posterior belief in the better performance of streptokinase or rt-PA based on the above derived posterior distributions.
4. Compare the Bayesian analyses to the classical frequentist analyses comparing the proportions in the two treatment arms. What do you conclude?
5. Illustrate your results graphically.
Hint: Adapt the program chapter 3 Gusto.R.

Exercise 3.6 Sample from the PPD of Example III.6, making use of the analytical results and the obtained posterior summary measures for a non-informative normal prior for μ.

Exercise 3.7 In Exercise 2.4, derive analytically the posterior distribution of logit(θ), if you dare. Or ..., compute the posterior distribution of logit(θ) via sampling and compute the mode of this posterior distribution.

Exercise 3.8 Apply the R program chapter 3 Bayesian P-value cross-over.R to compute the contour probability of Example III.14, but with x=17 and n=30.

Exercise 3.9 Berry (Nature Reviews/Drug Discovery, 2006, 5, 27-36) describes an application of Bayesian predictive probabilities in monitoring RCTs. The problem is as follows: after 16 control patients had been treated for breast cancer, 4 showed a complete response at a first DSMB interim analysis, but with an experimental treatment 12 out of 18 patients showed a complete response. It was planned to treat 164 patients in total. Using a Bayesian predictive calculation, it was estimated that the probability of obtaining a statistically significant result at the end was close to 95%. This was the reason for the DSMB to advise stopping the trial. Apply the R program chapter 3 predictive significance.R to compute this probability. In a second step apply the program with different numbers.


Software: Rstudio + R, WinBUGS Software: Rstudio + R Exercise 5.1 Establish a prior for the prevalence of HIV patients in your country. Make use of the internet (or any source that you would like to use) to establish Exercise 4.1 Run the R program chapter 4 regression analysis osteoporosis.R, which your prior. Combine this prior information with the results of a(n hypothetical) is based on the Method of Composition. survey based on 5,000 randomly chosen individuals from your population with 10% HIV-positives? 2 Exercise 4.2 Sample from a t3(μ, σ )-distribution with μ =5and σ =2using the Method of Composition (extract relevant part from previous program). Exercise 5.2 Theoretical question: show that√ the Jeffreys prior for the mean param- eter λ of a Poisson likelihood is given by λ. Exercise 4.3 Sample from a mixture of three Gamma distributions with:

 Mixture weights: π1 =0.15, π2 =0.35 and π3 =0.55 Exercise 5.3 In chapter 5 sceptical and enthusiastic prior.R we apply a skeptical and a enthusiastic prior to interim results comparing to responder rates of two cancer  Parameters gamma distributions: α1 =2.5,β1 =1.75, α2 =3.8,β2 =0.67 and α =6,β =0.3. treatments. Run the program, interpret the results and apply the program to the 3 3 final results.


Exercise 5.4 Example VI.4: What types of priors have been used?

Exercises Chapter 6

Software: RStudio + R

Exercise 6.1 Example VI.1: With chapter 6 gibbs sampling normal distribution.R, determine posterior summary measures from the sampled histograms.

Exercise 6.2 Example VI.2: Use R program chapter 6 beta-binomial Gibbs.R to sam- ple from the distribution f(x).


Exercise 7.3 Apply some the acceleration techniques to the program of Exercise 7.2.

Bayesian Biostatistics - Piracicaba 2014 630 Bayesian Biostatistics - Piracicaba 2014 631

◦ PARITY: the number of times that the cow gave birth The questions/tasks are: ◦ Dichotomized version of PARITY: FIRST= 1 if the cow gave birth only once,  Model the time to insemination using a Weibull distribution, taking censoring FIRST = 0 when the cow gave birth multiple times. into account but ignoring the clustering in the data. Use a vague prior for the ◦ UREUM: Ureum percentage in the milk parameters. Then: ◦ PROTEIN: Protein percentage in the milk ◦ Check the convergence of sampler and the precision of your estimates. ◦ The cows belong to herds (cluster). Some cows are killed before the next insemina- Compute posterior summary statistics for each parameter. tion and are then censored at the time of death. If a cow is not inseminated after  What is the increase of risk of insemination in cows with multiple births (hazard 300 days, it is censored at that time. The data is restricted to the first 15 clusters. ratio)? The time and cluster information are captured in the following variables :  Evaluate the sensitivity of the results on the choice of priors. ◦ TIME: time from birth to insemination (or censoring) ◦ STATUS: = 1 for events, = 0 for censored ◦ CLUSTERID: cluster identifier
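Exercises 7.1, 7.2 and 7.4 all ask to check convergence across several chains. The core of the BGR (Brooks-Gelman-Rubin) diagnostic that the CODA-based R program computes can be sketched as follows; this is an illustrative re-implementation of the basic potential scale reduction factor for one scalar parameter, not the course code.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor R-hat for equal-length chains of one
    scalar parameter (the diagnostic behind the BGR plot in WinBUGS/CODA)."""
    chains = np.asarray(chains)                    # shape (m chains, n draws)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()          # mean within-chain variance
    B = n * chain_means.var(ddof=1)                # between-chain variance
    V_hat = (n - 1) / n * W + B / n                # pooled variance estimate
    return np.sqrt(V_hat / W)

rng = np.random.default_rng(1)
mixed = rng.normal(0.0, 1.0, size=(3, 5000))       # three well-mixed chains
stuck = mixed + np.array([[0.0], [0.0], [5.0]])    # third chain off target
print(gelman_rubin(mixed))   # close to 1: no evidence against convergence
print(gelman_rubin(stuck))   # well above 1.1: the chains disagree
```

Values of R-hat close to 1 (commonly below 1.1) for every parameter are taken as no evidence against convergence.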

Exercises Chapter 8

Software: WinBUGS/OpenBUGS

Exercise 8.1 In the analysis of the osteoporosis data (file chapter 8 osteoporosis.odc), switch off the blocking mode and compare the solution to that obtained when the blocking mode option is turned on.

Exercise 8.2 Adapt the above WinBUGS program (blocking option switched off) to analyze the osteoporosis data and:

◦ perform a simple linear regression analysis with bmi centered
◦ add the quadratic term bmi²
◦ explore how convergence can be improved
◦ give 4 starting values, apply BGR diagnostics, export the chains and explore the posterior distribution in R.

Exercise 8.3 Run WinBUGS with chapter 8 osteomultipleregression.odc. Suppose that we wish to predict, for a new individual 65 years old with length = 165 cm, weight = 70 kg (fill in BMI yourself) and strength variable equal to 96.25, her tbbmc. Perform the prediction and give the 95% CI.

Exercise 8.4 Vary in Exercise 8.3 the distribution of the error term. Check in the WinBUGS manual which distributions are available.

Exercise 8.5 Program Example II.1 in WinBUGS.

Exercise 8.6 Simulate prior distributions with WinBUGS. Take informative and noninformative priors.

Exercise 8.7 In WinBUGS the command y ∼ dnorm(mu,isigma)I(a,b) means that the random variable y is interval censored in the interval [a,b]. WinBUGS does not have a command to specify that the random variable is truncated in that interval. In OpenBUGS the censoring command is C(a,b), while the truncation command is T(a,b). Look into chapter 8 openbugscenstrunc.odc and try out the various models to see the difference between censoring and truncation.

Exercise 8.8 Example VI.4: In chapter 6 coalmine.odc a WinBUGS program is given to analyze the coalmine disaster data. Particular priors have been chosen to simplify sampling. Derive the posterior of θ − λ and θ/λ. Take other priors to perform what is called a sensitivity analysis.

Exercise 8.9 In chapter 5 prevalence Ag Elisa test Cysticercosis.odc you find the program used in Example V.6. Run the program and remove the constraint on the prevalence. What happens?

Exercise 8.10 Check the samplers in the program chapter 6 coalmine.odc. Note the change in samplers in the 3 programs in chapter 6 coalmine samplers.odc.
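The distinction probed in Exercise 8.7 can be made concrete in terms of likelihood contributions: a censored observation contributes the probability mass of its interval (the value itself is latent, and WinBUGS samples it), whereas a truncated observation contributes its density renormalised by the mass inside the interval. A minimal sketch for the normal case, in Python rather than BUGS; the function names are ours, not WinBUGS syntax.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def censored_loglik(mu, sigma, a, b):
    """Log-likelihood contribution of ONE interval-censored N(mu, sigma^2)
    observation: the value is only known to lie in [a, b]  (I(a,b)/C(a,b))."""
    p = normal_cdf((b - mu) / sigma) - normal_cdf((a - mu) / sigma)
    return math.log(p)

def truncated_logpdf(y, mu, sigma, a, b):
    """Log-density of an OBSERVED value y from a normal truncated to [a, b]:
    the ordinary density renormalised by the interval mass  (T(a,b))."""
    z = (y - mu) / sigma
    log_phi = -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))
    mass = normal_cdf((b - mu) / sigma) - normal_cdf((a - mu) / sigma)
    return log_phi - math.log(mass)
```

With a very wide interval the censored contribution tends to log(1) = 0 (the interval is uninformative), while the truncated log-density tends to the ordinary normal log-density.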

Exercises Chapter 9

Software: Rstudio + R, WinBUGS/OpenBUGS, R2WinBUGS

Exercise 9.1 A meta-analysis of 22 clinical trials on beta-blockers for reducing mortality after acute myocardial infarction was performed by Yusuf et al. (‘Beta blockade during and after myocardial infarction: an overview of the randomized trials. Prog Cardiovasc Dis. 1985;27:335-371’). The authors performed a classical analysis. Here the purpose is to perform a Bayesian meta-analysis. In the WinBUGS file chapter 9 beta-blocker.odc two Bayesian meta-analyses are performed. Tasks:

• Use the program in Model 1 (and Data 1, Inits 1/1) to evaluate the overall mean effect of the beta-blockers and the individual effects of each of the beta-blockers. Evaluate also the heterogeneity in the effect of the beta-blockers. Rank the studies according to their baseline risk. Suppose a new study with twice 100 patients is planned to be set up. What is the predictive distribution of the number of deaths in each arm? For this use Data 2 and Inits 1/2.
• Use the program in Model 2 (and Data 2, Inits 2/2) to determine the predictive distribution of the number of deaths in each arm of the new study (with 2x100 patients recruited). What is the difference with the previous analysis? Why is there a difference? Rank the studies according to their baseline risk. Illustrate the shrinkage effect of the estimated treatment effects compared to the observed treatment effects.

Exercise 9.2 Perform a Bayesian meta-analysis using the WinBUGS program in chapter 9 sillero meta-analysis.odc. Perform a sensitivity analysis by varying the prior distributions and the normal distribution of the log odds ratio. Run the program also in OpenBUGS.

Exercise 9.3 In chapter 9 toenail RI BGLMM.odc a logistic random intercept model is given to relate the presence of moderate to severe onycholysis to time and treatment. Run the program and evaluate the convergence of the chain. Vary also the assumed distribution of the random intercept.
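The shrinkage asked for in the last task of Exercise 9.1 has a closed form in the simplest normal-normal approximation to the hierarchical model: given the overall mean μ and between-study variance τ², the conditional posterior mean of each study effect is a precision-weighted average of the observed effect and μ, so imprecise studies are pulled strongly toward the overall mean. A sketch with hypothetical numbers (not the beta-blocker data):

```python
def shrunken_effect(y_i, s2_i, mu, tau2):
    """Conditional posterior mean of theta_i in the normal-normal hierarchical
    model y_i ~ N(theta_i, s2_i), theta_i ~ N(mu, tau2): a precision-weighted
    average of the observed study effect y_i and the overall mean mu."""
    w = (1.0 / s2_i) / (1.0 / s2_i + 1.0 / tau2)   # weight on the observed effect
    return w * y_i + (1.0 - w) * mu

# Hypothetical log odds ratios: a small (imprecise) and a large (precise) study
print(shrunken_effect(-0.8, 0.25, -0.25, 0.04))  # pulled strongly toward mu
print(shrunken_effect(-0.3, 0.01, -0.25, 0.04))  # barely moves
```

Comparing these shrunken values with the observed y_i for each study is exactly the plot the exercise asks for.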


Exercise 9.4 In chapter 9 dietary study-R2WB.R an R2WinBUGS analysis is performed on the IBBENS data. Run this program and perform some diagnostic tests in R to check the convergence of the chain.

Exercise 9.5 Use chapter 9 dietary study chol2.odc to examine the dependence of the posterior of σθ on the choice of ε in the inverse gamma ‘noninformative’ prior IG(ε, ε).

Exercise 9.6 Example IX.12: Check the impact of choosing an inverse Wishart prior for the covariance matrix of the random intercept and slope on the estimation of the model parameters, by varying the parameters of the prior. Compare this solution with the solution obtained from taking a product of three uniform priors on the standard deviation of the random intercept, the standard deviation of the random slope, and the correlation, respectively. Use chapter 9 toenail LMM.odc.

Exercise 9.7 Medical device trials are nowadays often designed and analyzed in a Bayesian way. The Bayesian approach allows including prior information from previous studies. It is important to reflect on the way subjective information is included in a new trial. A too strong prior may influence the current results too heavily and sometimes render the planning of a new study obsolete. A suggested approach is to incorporate the past trials together with the current trial into a hierarchical model. In chapter 9 prior MACE.odc fictive data are given on 6 historical studies on the effect of a cardiovascular device in terms of a 30-day MACE rate (a composite score of major cardiovascular events); see Pennello and Thompson (J Biopharm Stat, 2008, 18:1, 81-115). For the new device the claim is that the probability of 30-day MACE, p1, is less than 0.249. The program evaluates what the prior probability of this claim is based on the historical studies. Run this program and check also, together with the paper (pages 84-87), how this prior claim can be adjusted.
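The sensitivity probed in Exercise 9.5 can be previewed in a conjugate toy version: with a known mean, the IG(ε, ε) prior on a variance updates to IG(ε + n/2, ε + SS/2), and when the variance component is small (few groups, little spread) the posterior mean moves visibly with ε. A sketch with hypothetical n and SS, not the dietary-study values:

```python
def ig_posterior_mean(eps, n, ss):
    """Posterior mean of sigma^2 under the prior sigma^2 ~ IG(eps, eps) with n
    normal observations of known mean and residual sum of squares ss.
    The conjugate posterior is IG(eps + n/2, eps + ss/2); mean = b/(a-1), a > 1."""
    a = eps + n / 2.0
    b = eps + ss / 2.0
    return b / (a - 1.0)

# Few random effects and a small variance component: the answer moves with eps
n, ss = 8, 0.4   # hypothetical values for illustration
for eps in (0.001, 0.01, 0.1, 1.0):
    print(eps, ig_posterior_mean(eps, n, ss))
```

With many observations or a large SS, ε is swamped by the data and the choice hardly matters; this is why the IG(ε, ε) prior is problematic mainly for small variance components.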

Exercises Chapter 10

Software: Rstudio + R, WinBUGS/OpenBUGS

Exercise 10.1 Replay the growth curve analyses in the WinBUGS program chapter 10 Pothoff-Roy growthcurves.odc and evaluate with DIC and pD the well performing models.

Exercise 10.2 In chapter 10 Poisson models on dmft scores.odc several models are fit to the dmft-index of 276 children of the Signal-Tandmobiel study. Evaluate the best performing model. Confirm that the Poisson-gamma model has a different DIC and pD from the negative binomial model. Perform a chi-squared type of posterior-predictive check.

Exercise 10.3 Example X.19: In chapter 10 osteo study-PPC.R the response tbbmc is checked for normality with 6 PPCs. The program uses R2WinBUGS. Run the program and check the results. Perform also the gap test. Apply this approach to test the normality of the residuals when regressing tbbmc on age and bmi.

Exercise 10.4 In chapter 10 caries with residuals.odc a logistic regression model is fitted to the CE data of the Signal-Tandmobiel study of Example VI.9 with some additional covariates, i.e. gender of the child (1 = girl), age of the child, whether or not the child takes sweets to school, and whether or not the child frequently brushed his/her teeth. Apply the program and use the option Compare to produce various plots based on the residuals εi. Are there outliers? Do the plots indicate a special pattern in the residuals? In the same program a Poisson model is also fitted to the dmft-indices. Apply a PPC with the χ²-discrepancy measure to check the Poisson assumption.
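The DIC and pD used throughout these exercises follow Spiegelhalter et al.: pD is the posterior mean deviance minus the deviance evaluated at the posterior mean of the parameters, and DIC is the posterior mean deviance plus pD. A Monte Carlo sketch for the simplest case (normal mean with known variance and simulated data, so pD should come out close to the single free parameter); this is an illustration, not the WinBUGS computation:

```python
import numpy as np

def dic_normal_mean(y, mu_draws, sigma=1.0):
    """DIC for y_i ~ N(mu, sigma^2) with sigma known, given posterior draws of mu.
    pD = mean deviance minus deviance at the posterior mean (Spiegelhalter et al.)."""
    y = np.asarray(y)

    def deviance(mu):
        # -2 * log-likelihood of the normal model
        return np.sum((y - mu) ** 2) / sigma**2 + len(y) * np.log(2 * np.pi * sigma**2)

    d_bar = np.mean([deviance(mu) for mu in mu_draws])   # posterior mean deviance
    p_d = d_bar - deviance(np.mean(mu_draws))            # effective no. of parameters
    return d_bar + p_d, p_d

rng = np.random.default_rng(0)
y = rng.normal(1.0, 1.0, size=50)
# Conjugate posterior for mu under a flat prior: N(ybar, sigma^2/n)
mu_draws = rng.normal(y.mean(), 1.0 / np.sqrt(50), size=10000)
dic, p_d = dic_normal_mean(y, mu_draws)
print(p_d)   # close to 1: one free parameter
```

Lower DIC indicates a better trade-off between fit (the mean deviance) and complexity (pD), which is how the models in Exercises 10.1 and 10.2 are compared.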

Bayesian Biostatistics - Piracicaba 2014 642 Bayesian Biostatistics - Piracicaba 2014 643

Part V

Medical papers

Bayesian methodology is slowly being introduced into clinical trial and epidemiological research. Below we provide some references to medical/epidemiological papers where Bayesian methodology has been used. Methodological papers highlighting the usefulness of Bayesian methods in medical research are:

• Lewis, R.J. and Wears, R.L. An introduction to the Bayesian analysis of clinical trials. Annals of Emergency Medicine, 1993, 22, 8, 1328–1336
• Diamond, G.A. and Kaul, S. Prior convictions: Bayesian approaches to the analysis and interpretation of clinical megatrials. JACC, 2004, 43, 11, 1929–1939
• Gill, C.J., Sabin, L. and Schmid, C.H. Why clinicians are natural Bayesians. BMJ, 2005, 330, 1080–1083
• Berry, D.A. Bayesian clinical trials. Nature Reviews Drug Discovery, 2006, 5, 27–36

Bayesian methods in clinical trials

Planning and analysis: Holmes, D.R., Teirstein, P., Satler, L. et al. Sirolimus-eluting stents vs vascular brachytherapy for in-stent restenosis within bare-metal stents. JAMA, 2006, 295, 11, 1264–1273
Bayesian methods were used for trial design and formal primary analysis, using a Bayesian hierarchical model approach to decrease the sample size.

Planning and analysis: Stone, G.W., Martin, J.L., de Boer, M-J. et al. Effect of supersaturated oxygen delivery on infarct size after percutaneous coronary intervention in acute myocardial infarction. Circulation, 2009, Sept 15, DOI: 10.1161/CIRCINTERVENTIONS.108.840066
Bayesian methods were used for trial design and formal primary analysis, using a Bayesian hierarchical model approach to decrease the sample size.

Planning and analysis: Shah, P.L., Slebos, D-J., Cardosa, P.F.G. et al. Bronchoscopic lung-volume reduction with Exhale airway stents for emphysema (EASE trial): randomised sham-controlled, multicentre trial. The Lancet, 2011, 378, 997–1005
A Bayesian adaptive design was used to set up the study, and the primary efficacy and safety endpoints were analysed using a Bayesian approach.

Monitoring: Cuffe, M.S., Califf, R.M., Adams, K.F. et al. Short-term intravenous milrinone for acute exacerbation of chronic heart failure: A randomized controlled trial. JAMA, 2002, 287, 12, 1541–1547
Evaluation of safety during the study conduct in a Bayesian manner.

Monitoring: The GRIT Study Group. A randomised trial of timed delivery for the compromised preterm fetus: short term outcomes and Bayesian interpretation. BJOG: an International Journal of Obstetrics and Gynaecology, 2003, 110, 27–32
Evaluation of safety during the study conduct in a Bayesian manner.

Sensitivity analysis: Jafar, T.H., Islam, M., Bux, R. et al. Cost-effectiveness of community-based strategies for blood-pressure control in a low-income developing country: Findings from a cluster-randomized, factorial-controlled trial. Circulation, 2011, 124, 1615–1625
Bayesian methods were used to vary parameters when computing the cost-effectiveness.


Planning and analysis: Wilber, D.J., Pappone, C., Neuzil, P. et al. Comparison of antiarrhythmic drug therapy and radiofrequency catheter ablation in patients with paroxysmal atrial fibrillation. JAMA, 2010, 303, 4, 333–340
A Bayesian design was used to set up the study (allowing for stopping for futility), and the primary efficacy endpoint was analysed using a Bayesian approach. In addition, frequentist methods were applied for confirmation.

Planning and analysis: Cannon, C.P., Dansky, H.M., Davidson, M. et al. Design of the DEFINE trial: Determining the efficacy and tolerability of CETP INhibition with AnacEtrapib. Am Heart J, 2009, 158, 4, 513.e3–519.e3; Cannon, C.P., Shah, S., Hayes, M. et al. Safety of anacetrapib in patients with or at high risk for coronary heart disease. NEJM, 2010, 363, 25, 2406–2415
A preplanned Bayesian analysis was specified to evaluate the sparse CV events in the light of previous studies. The other analyses have a frequentist nature.

Bayesian methods in epidemiology

Two relatively simple papers on the application of hierarchical models in epidemiology:

Multilevel model: Macnab, Y.C., Qiu, Z., Gustafson, P. et al. Hierarchical Bayes analysis of multilevel health services data: A Canadian neonatal mortality study. Health Services & Outcomes Research Methodology, 2004, 5, 5–26.

Spatial model: Zhu, L., Gorman, D.M. and Horel, S. Hierarchical Bayesian spatial models for alcohol availability, drug “hot spots” and violent crime. International Journal of Health Geographics, 2006, 5:54, DOI: 10.1186/1476-072X-5-54.

Guidance document: The FDA guidance document for the planning, conduct and analysis of medical device trials is also a good introductory text for the use of Bayesian analysis in clinical trials in general (google ‘FDA medical devices guidance’).
