Applied Bayesian Inference
Prof. Dr. Renate Meyer (1,2)
(1) Institute for Stochastics, Karlsruhe Institute of Technology, Germany
(2) Department of Statistics, University of Auckland, New Zealand

KIT, Winter Semester 2010/2011

1 Introduction
1.1 Course Overview

Overview: Applied Bayesian Inference A
- Bayes' theorem, discrete and continuous
- Conjugate examples: Binomial, Exponential
- Introduction to R
- Simulation-based posterior computation
- Introduction to WinBUGS
- Regression, ANOVA, GLM, hierarchical models, survival analysis, state-space models for time series, copulas
- Basic model checking with WinBUGS
- Convergence diagnostics with CODA

Overview: Applied Bayesian Inference B
- Conjugate examples: Poisson, Normal, Exponential Family
- Specification of prior distributions
- Likelihood Principle
- Multivariate and hierarchical models
- Techniques for posterior computation
- Normal approximation
- Non-iterative simulation
- Markov Chain Monte Carlo
- Bayes factors, model checking and determination
- Decision-theoretic foundations of Bayesian inference

Computing
- R: mostly covered in class
- WinBUGS: completely covered in class
- Other: at your own risk

1.2 Why Bayesian Inference?

Why Bayesian Inference? Or: What is wrong with standard statistical inference?

The two mainstays of standard/classical statistical inference are
- confidence intervals and
- hypothesis tests.
Anything wrong with them?
Example: Newcomb's Speed of Light

Light travels fast, but it is not transmitted instantaneously. Light takes over a second to reach us from the moon and over 10 billion years to reach us from the most distant objects yet observed in the expanding universe. Because radio and radar also travel at the speed of light, an accurate value for that speed is important in communicating with astronauts and orbiting satellites. An accurate value for the speed of light is also important to computer designers because electrical signals travel only at light speed.

The first reasonably accurate measurements of the speed of light were made by Simon Newcomb between July and September 1882. He measured the time in seconds that a light signal took to pass from his laboratory on the Potomac River to a mirror at the base of the Washington Monument and back, a total distance of 7400 m. His first measurement was 24.828 millionths of a second.

Newcomb's Speed of Light: CI

Example 1.1 Let us assume that the individual measurements Xi ~ N(mu, sigma^2) with known measurement variance sigma^2 = 0.005^2. We want to find a 95% confidence interval for mu.

Answer: xbar +/- 1.96 sigma/sqrt(n).

Because (Xbar - mu)/(sigma/sqrt(n)) ~ N(0, 1):

P(-1.96 < (Xbar - mu)/(sigma/sqrt(n)) < 1.96) = 0.95
P(Xbar - 1.96 sigma/sqrt(n) < mu < Xbar + 1.96 sigma/sqrt(n)) = 0.95
P(24.8182 < mu < 24.8378) = 0.95

This means that mu is in this interval with 95% probability. Certainly NOT!

Newcomb's Speed of Light: CI -- The Level of Confidence

After collecting the data and computing the CI, this interval either contains the true mean or it does not. Its coverage probability is not 0.95 but either 0 or 1.

Then where does our 95% confidence come from?
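The interval arithmetic in Example 1.1 is easy to check numerically. A minimal sketch in Python (the course itself uses R); the bounds 24.8182 and 24.8378 quoted above are consistent with xbar = 24.828 (Newcomb's first measurement) and sigma/sqrt(n) = 0.005:

```python
import math

def normal_ci(xbar, sigma, n, z=1.96):
    """95% CI for the mean of N(mu, sigma^2) with known sigma:
    xbar +/- z * sigma / sqrt(n)."""
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Values consistent with the interval quoted on the slide:
# xbar = 24.828 and sigma/sqrt(n) = 0.005.
low, high = normal_ci(24.828, 0.005, 1)
print(round(low, 4), round(high, 4))  # 24.8182 24.8378
```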
Newcomb's Speed of Light: Simulation

Let us do an experiment:
- draw 1000 samples of size 10 each from N(24.828, 0.005^2)
- for each sample, calculate the 95% CI
- check whether the true mu = 24.828 is inside or outside the CI

Figure 1: Coverage over repeated sampling (running proportion of CIs containing the true mean 24.828: 100% over the first eight samples, 88.9% at the ninth, about 94.0% after 100 samples, and 95.2% after all 1000 samples).

Newcomb's Speed of Light: CI

- 952 of the 1000 CIs include the true mean; 48 of the 1000 CIs do not.
- In reality, we don't know the true mean.
- We do not sample repeatedly; we only take one sample and calculate one CI.
- Will this CI contain the true value?
- It either will or will not, but we do not know.
- We take comfort in the fact that the method works 95% of the time in the long run, i.e. the method produces a CI that contains the unknown mean 95% of the time that the method is used.

By contrast, Bayesian confidence intervals, known as credible intervals, do not require this awkward frequentist interpretation. One can make the more natural and direct statement concerning the probability of the unknown parameter falling in this interval. One needs to provide additional structure to make this interpretation possible.

Newcomb's Speed of Light: Hypothesis Test

H0: mu <= mu0 (= 24.828) versus H1: mu > mu0

The P-value is the probability of observing a value of the test statistic more extreme than the actually observed value u_obs if the null hypothesis were true (under repeated sampling).
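The repeated-sampling experiment described above can be sketched in a few lines (a Python illustration using only the standard library; the course itself uses R). It draws 1000 samples of size 10 from N(24.828, 0.005^2), forms the known-sigma 95% CI for each, and counts how often the true mean is covered:

```python
import math
import random
import statistics

random.seed(1)  # arbitrary seed, for reproducibility
mu, sigma, n, reps = 24.828, 0.005, 10, 1000
z = 1.96

covered = 0
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.fmean(sample)
    half = z * sigma / math.sqrt(n)  # sigma is known
    if xbar - half < mu < xbar + half:
        covered += 1

# The long-run coverage is close to 0.95; any single run of 1000
# will deviate by a few percent.
print(covered / reps)
```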
- Test statistic: U = (Xbar - mu0)/(sigma/sqrt(n)) ~ N(0, 1) if mu = mu0.
- Small values of u_obs are consistent with H0, large values favour H1.
- P-value: p = P(U > u_obs | mu = mu0) = 1 - Phi(u_obs).
- If P-value < 0.05 (the usual type I error rate), reject H0.

We can do another thought experiment:
- imagine we take 1000 samples of size 10 from a Normal distribution with mean mu0
- we calculate the P-value for each sample
- it will be smaller than 0.05 in only about 5% of the samples, i.e. in about 50 samples
- we take comfort in the fact that this test works 95% of the time in the long run, i.e. rejects H0, even though H0 is true, in only 5% of the cases that this method is used.

The P-value is the probability that H0 is true. Certainly NOT.

Newcomb's Speed of Light: Hypothesis Test

- The P-value can only offer evidence against the null hypothesis. A large P-value does not offer evidence that H0 is true.
- The P-value cannot be directly interpreted as "weight of evidence" but only as a long-term probability (in a hypothetical repetition of the same experiment) of obtaining data at least as unusual as what was actually observed.
- Most practitioners are tempted to say that the P-value is the probability that H0 is true.
- P-values depend not only on the observed data but also on the sampling probability of certain unobserved data points. This violates the Likelihood Principle.

By contrast, the Bayesian approach to hypothesis testing, due primarily to Jeffreys (1961), is much simpler and avoids the pitfalls of the traditional Neyman-Pearson-based approach. It allows the direct calculation of the probability that a hypothesis is true and thus a direct and straightforward interpretation. Again, as in the case of CIs, we need to add more structure to the underlying probability model.
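Both the P-value computation and the thought experiment above can be sketched numerically (a Python illustration using only the standard library; the course itself uses R). Under H0 the P-value is uniform on (0, 1), so it falls below 0.05 in roughly 5% of samples:

```python
import math
import random
import statistics

def phi(z):
    """Standard normal CDF, built from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_value(xbar, mu0, sigma, n):
    """One-sided P-value for H0: mu <= mu0 vs H1: mu > mu0,
    p = 1 - Phi(u_obs) with u_obs = (xbar - mu0)/(sigma/sqrt(n))."""
    u_obs = (xbar - mu0) / (sigma / math.sqrt(n))
    return 1.0 - phi(u_obs)

# Thought experiment: sample repeatedly under H0 and count rejections.
random.seed(2)  # arbitrary seed, for reproducibility
mu0, sigma, n = 24.828, 0.005, 10
rejections = 0
for _ in range(1000):
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    if p_value(statistics.fmean(sample), mu0, sigma, n) < 0.05:
        rejections += 1
print(rejections)  # about 50 out of 1000
```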
The violation of the Likelihood Principle has serious practical implications, for instance for the analysis of clinical trials, where interim analyses and unexpected drug toxicities often change the original trial design.

1.3 Historical Overview

Historical Overview
- Bayes and Laplace (late 1700's): inverse probability
- probability: statements about observables given assumptions about unknown parameters (deductive)
- inverse probability: statements about unknown parameters given observed data values (inductive)

Inverse Probability
Example: Given x successes in n iid trials with success probability theta,
- P(9 <= X <= 12 | theta) is a deductive probability statement;
- P(a < theta < b | X = 9) is an inductive inverse-probability statement.

Figure 2: From William Jefferys' webpage, Univ. of Texas at Austin.

Thomas Bayes
(b. 1702, London; d. 1761, Tunbridge Wells, Kent)
- Presbyterian minister and mathematician
- Son of one of the first six Nonconformist ministers in England
- Private education (by De Moivre?)
- Ordained as Nonconformist minister and took the position of minister at the Presbyterian Chapel, Tunbridge Wells
- Educated and interested in mathematics, probability and statistics; believed to be the first to use probability inductively; defended the views and philosophy of Sir Isaac Newton against criticism by Bishop Berkeley
- Two papers published while he was still living:
  - Divine Providence and Government is the Happiness of His Creatures (1731)
  - An Introduction to the Doctrine of Fluxions, and a Defense of the

Bayes' Biography
Bellhouse, D.R. (2004) The Reverend Thomas Bayes, FRS: A Biography to Celebrate the Tercentenary of His Birth. Statistical Science 19(1):3-43.

Figure 3: Reverend Thomas Bayes, 1702-1761.