
What is...? series

Supported by sanofi-aventis

John W Stevens BSc, Director, Centre for Bayesian Statistics in Health Economics, University of Sheffield

● Statistical inference concerns unknown parameters that describe certain population characteristics such as the true efficacy of a particular treatment. Inferences are made using data and a statistical model that links the data to the parameters.

● In frequentist statistics, parameters are fixed quantities, whereas in Bayesian statistics the true value of a parameter can be thought of as being a random variable to which we assign a probability distribution, known specifically as the prior distribution, representing prior information.

● A Bayesian analysis synthesises both sample data, expressed as the likelihood function, and the prior distribution, which represents additional information that is available.

● The posterior distribution expresses what is known about a set of parameters based on both the sample data and prior information.

● In frequentist statistics, it is often necessary to rely on large-sample approximations by assuming asymptotic normality. In contrast, Bayesian inferences can be computed exactly, even in highly complex situations.

For further titles in the series, visit: www.whatisseries.co.uk

Date of preparation: April 2009. NPR09/1108

What is Bayesian statistics?

Statistical inference
Statistics involves the collection, analysis and interpretation of data for the purpose of making statements or inferences about one or more physical processes that give rise to the data. Statistical inference concerns unknown parameters that describe certain population characteristics such as the true mean efficacy of a treatment for cancer or the probability of experiencing an adverse event. Inferences are made using data and a statistical model that links the data to the parameters. The statistical model might be very simple such that, for example, the data are normally distributed with some unknown but true population mean, µ say, and known population variance, σ² say, so that our objective is to make inferences about µ through a sample of data. In practice, statistical models are much more complex than this. There are two main and distinct approaches to inference, namely frequentist and Bayesian statistics, although most people, when they first learn about statistics, usually begin with the frequentist approach (also known as the classical approach).

The nature of probability
The fundamental difference between the Bayesian and frequentist approaches to statistical inference is characterised in the way they interpret probability, represent the unknown parameters, acknowledge the use of prior information and make the final inferences.
The frequentist approach to statistics considers probability as a limiting long-run frequency. For example, the toss of a fair die an infinite number of times would result in the numbers one to six arising with equal frequency, and the probability of any particular event – 1/6 – is the long-run frequency relative to the number of tosses of the die. It is clear then that in frequentist statistics probability applies only to events that are (at least in principle) repeatable. In contrast, the Bayesian approach regards probability as a measure of the degree of personal belief about the value of an unknown parameter. Therefore, it is possible to ascribe probability to any event or proposition about which we are uncertain, including those that are not repeatable, such as the probability that the Bank of England Monetary Policy Committee will reduce interest rates at their next meeting.

Prior information
A random variable can be thought of as a variable that takes on a set of values with specified probability. In frequentist statistics, parameters are not repeatable random things but are fixed (albeit unknown) quantities, which means that they cannot be considered as random variables. In contrast, in Bayesian statistics anything about which we are uncertain, including the true value of a parameter, can be thought of as being a random variable to which we can assign a probability distribution, known specifically as the prior distribution, which represents prior information.
A fundamental feature of the Bayesian approach to statistics is the use of prior information in addition to the (sample) data. A proper Bayesian analysis will always incorporate genuine prior information, which will help to strengthen inferences about the true value of the parameter and ensure that any relevant information about it is not wasted.
In general, the argument against the use of prior information is that it is intrinsically subjective and therefore has no place in science. Of particular concern is the fact that an unscrupulous analyst can concoct any desired result by the creative specification of prior distributions for the parameters in the model. However, the potential for manipulation is not unique to Bayesian statistics. The scientific community1 and regulatory agencies2 have developed sophisticated safeguards and guidance to avoid conscious or unconscious biases. An example is the use of double-blind,


Figure 1. The Bayesian method: prior information and the data are combined through Bayes' theorem to give the posterior distribution.

randomised, controlled trials for the rigorous comparison of interventions. This, and similar requirements for the (statistical analysis) protocol to be established before a trial begins, are necessary to obviate the potential for manipulation or bias that already exists in the use of frequentist statistics. A serious Bayesian statistician will spend reasonable, and sometimes substantial, effort to develop probability distributions that genuinely represent prior beliefs. The process should be formal and transparent so that the basis for the resulting probability distribution is understandable and justifiable.

The Bayesian method
A Bayesian analysis synthesises two sources of information about the unknown parameters of interest. The first of these is the sample data, expressed formally by the likelihood function. The second is the prior distribution, which represents additional (external) information that is available to the investigator (Figure 1). Whereas the likelihood function is also fundamental to frequentist statistics, the prior distribution is used only in the Bayesian approach. If we represent the data by the symbol D and denote the set of unknown parameters by θ, then the likelihood function is f(D|θ); the probability of observing the data D being conditional on the values of the parameter θ. If we further represent the prior distribution for θ as π(θ), giving the probability that θ takes any particular value based on whatever additional information might be available to the investigator, then, with the application of Bayes' theorem,3 an elementary result of probability theory named after the Reverend Thomas Bayes, we synthesise these two sources of information through the equation:

Equation 1. p(θ|D) ∝ f(D|θ)π(θ)

The proportionality symbol ∝ expresses the fact that the product of the likelihood function and the prior distribution on the right-hand side of Equation 1 must be scaled to integrate to one over the range of plausible values of θ for it to be a proper probability distribution. The scaled product, p(θ|D), is then called the posterior distribution for θ (given the data), and expresses what is now known about θ based on both the sample data and prior information (Figure 2).

Figure 2. Example of a triplot: the prior distribution, the data/likelihood and the posterior distribution for θ plotted together.
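Equation 1 can be evaluated numerically once a prior and a likelihood are specified. As a minimal sketch (using hypothetical numbers: a Beta(2, 2) prior for a proportion θ and 7 successes observed in 10 trials), the product of likelihood and prior is computed on a grid and then scaled to integrate to one:

```python
# Hypothetical example: theta is an unknown proportion.
# Prior: Beta(2, 2); data D: 7 successes in 10 Bernoulli trials.
a, b = 2.0, 2.0
successes, trials = 7, 10

def prior(theta):
    """pi(theta), up to a normalising constant."""
    return theta ** (a - 1) * (1 - theta) ** (b - 1)

def likelihood(theta):
    """f(D | theta): the binomial likelihood kernel."""
    return theta ** successes * (1 - theta) ** (trials - successes)

# Evaluate the product f(D|theta) * pi(theta) on a fine grid over (0, 1).
n = 10_000
grid = [(i + 0.5) / n for i in range(n)]
unnormalised = [likelihood(t) * prior(t) for t in grid]

# Scale so the posterior integrates to one (the proportionality in Equation 1).
total = sum(unnormalised) / n          # midpoint-rule approximation of the integral
posterior = [u / total for u in unnormalised]

# Posterior mean; the exact conjugate answer here is Beta(9, 5), mean 9/14.
post_mean = sum(t * p for t, p in zip(grid, posterior)) / n
print(round(post_mean, 3))             # -> 0.643
```

In this particular case the grid is unnecessary, because the Beta prior is conjugate to the binomial likelihood and the posterior is available in closed form; the sketch is only meant to show the mechanics of Equation 1.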


The posterior distribution for θ is a weighted compromise between the prior information and the sample data. In particular, if for some value of θ the likelihood in the right-hand side of Equation 1 is small, so that the data suggest that this value of θ is implausible, then the posterior distribution will also give small probability to this θ value. Similarly, if for some value of θ the prior distribution in the right-hand side of Equation 1 is small, so that the prior information suggests that this value of θ is implausible, then, again, the posterior distribution will also give small probability to this θ value. In general, the posterior probability will be high for some θ only when both information sources support that value. The simple and intuitive nature of Bayes' theorem as a mechanism for synthesising information and updating personal beliefs about unknown parameters is an attractive feature of the Bayesian method.

The nature of inference
Classical inference is usually based on unbiased estimators defined to have expected value equal to the parameter being estimated, a significance test of some null hypothesis and confidence intervals. As with frequentist probability, such inferences are justified through long-run repetition of the data. However, they do not – although it may appear that they do – make direct statements about parameters. For example, consider the statement, 'we reject the null hypothesis at the 5% level of significance'. This means that if we were to repeat the experiment a large number of times then on 5% of occasions when the null hypothesis is true we will reject it. However, nothing is stated about this particular occasion. In contrast, the Bayesian approach allows direct probability statements to be made about the truth of the null hypothesis on the basis of this sample of data (as a personal degree of belief). In fact, many other probabilistic statements about parameters can be made from the posterior distribution, and it is of particular value to simply plot the (posterior) distribution of the parameter of interest given the data (Figure 2).

Advantages of the Bayesian approach
The arguments that are made in favour of the Bayesian approach are that it offers more intuitive and meaningful inferences, that it gives the ability to tackle more complex problems and that it allows the incorporation of prior information in addition to the data.
The Bayesian approach enables direct probability statements to be made about parameters of interest, whereas frequentist methods make indirect inferences by considering the data and more extreme but unobserved situations conditional on the null hypothesis being true; that is, p-values. Also, classical 100(1–α)% confidence intervals do not – although they appear to – provide an interval estimate containing the true value of the parameter with probability 100(1–α)%. To understand this it is important to recognise that once a confidence interval has been generated it either does or does not contain the true value of the parameter; we say instead that the confidence interval has a 100(1–α)% coverage coefficient. In contrast, the Bayesian approach allows the construction of interval estimates, known as credible intervals, which do have a probabilistic interpretation.
Statistical modelling can often generate quite complex problems, and these can quickly become difficult to deal with, or to construct exact test statistics for, using a frequentist approach. Often it is necessary to rely on large-sample approximations by assuming asymptotic normality. In contrast, Bayesian inferences can be computed exactly, even in highly complex situations. A simple example is the estimation of the probability of survival beyond one year for patients given a new treatment for cancer. Suppose that with standard treatment 40% of patients survive beyond one year. Prior information suggests that the new treatment improves survival. An expert investigator gives a prior estimate of 45% and expresses her uncertainty as a standard deviation of 7%. We define the prior distribution to be Beta(22.28, 27.23) to give the required mean and standard deviation. After treating 70 patients, 34 (48.6%) survive. The posterior distribution is Beta(56.28, 63.23).



The posterior mean is 47.1%, which is a compromise between the prior estimate and the sample estimate. Further examples of varying complexity can be found on The BUGS Project website.4
The use of prior information in addition to sample data is fundamental to the Bayesian approach. Prior information of some degree almost always exists and can make important contributions to strengthening inferences about unknown parameters and/or to reducing sample sizes.

The main 'controversies'
To avoid having to use informative prior distributions while still being able to use Bayesian tools, some authors have suggested using non-informative prior distributions to represent a state of prior ignorance.5 In so doing the analyst obtains some of the benefits of a Bayesian approach, particularly that the results are presented in the intuitive way in which one would like to make inferences. Another way to represent prior information is to specify sceptical prior distributions for the parameters in the model. In this context, the prior distribution is specified in such a way that it automatically favours the standard treatment. Such a proposal is tempting, particularly in a regulatory framework where rigorous standards and safeguards are demanded. However, both ideas suffer from serious objections, and both fail to exploit the full potential of the Bayesian approach. In addition, there is no unique way to implement either idea, and hence subjectivity is not removed. For all but the simplest of situations, there is no general agreement regarding non-informative prior distributions, and the posterior answers depend on the (subjective) way in which the model is specified. There can be even less agreement over what constitutes a 'sceptical' prior distribution.
Of course, the prior distribution in a Bayesian analysis is not the only place where subjective judgements are in danger of entering the analysis. Any statistical model, whether formulated for a frequentist or a Bayesian analysis, is a matter of subjective judgement, and it is commonplace that different statisticians make different choices. Furthermore, the choice of which estimator, significance test or confidence interval to employ is a subjective matter in frequentist statistics. Such subjectivity is reduced, but not removed, by the requirement that the basic form of the analysis should be prespecified. There is no such choice to make in Bayesian statistics, as once the posterior distribution has been obtained there is a unique (objective) answer to any properly specified question about the parameters.

Application of the Bayesian approach
In principle, the posterior distribution is always available, although in realistically complex problems it cannot be represented analytically. This presented a barrier to the implementation of the Bayesian approach until the development of numerical methods and powerful computers during the late 20th century. Now, posterior distributions can be constructed for highly complex problems using Markov chain Monte Carlo (MCMC) simulation. MCMC involves simulating a sample from the (joint) posterior distribution of the unknown parameters using one of three main algorithms: the Metropolis–Hastings algorithm, Gibbs sampling and slice sampling. With a sufficiently large sample, we are able to numerically generate the whole distribution, from which we can make any inferences of interest. (Suggested further reading on MCMC can be found at the end of this document.) WinBUGS is a software package that implements MCMC algorithms without the analyst having to write their own sampling algorithms and is able to analyse highly complex problems.6

References
1. Moher D, Schulz KF, Altman DG. The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomised trials. Lancet 2001; 357: 1191–1194.
2. European Medicines Agency. ICH Topic E 9: Statistical Principles for Clinical Trials (CPMP/ICH/363/96). www.emea.europa.eu/pdfs/human/ich/036396en.pdf (last accessed 22 January 2009)
3. Senn S. Dicing with Death: Chance, Risk and Health. Cambridge: Cambridge University Press, 2003.
4. The BUGS Project. www.mrc-bsu.cam.ac.uk/bugs/ (last accessed 22 January 2009)
5. Briggs AH. A Bayesian approach to stochastic cost-effectiveness analysis. Health Econ 1999; 8: 257–261.
6. Lunn DJ, Thomas A, Best N, Spiegelhalter D. WinBUGS – a Bayesian modelling framework: concepts, structure, and extensibility. Statistics and Computing 2000; 10: 325–337.
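The MCMC idea described under 'Application of the Bayesian approach' can be made concrete with a toy random-walk Metropolis–Hastings sampler (a sketch only, not how WinBUGS is implemented), drawing from the Beta(56.28, 63.23) posterior of the survival example, whose mean we already know is about 47.1%:

```python
import math
import random

random.seed(1)
A, B = 56.28, 63.23    # posterior parameters from the survival example

def log_post(theta):
    """Log posterior density, up to an additive constant."""
    if not 0.0 < theta < 1.0:
        return float("-inf")            # zero density outside (0, 1)
    return (A - 1) * math.log(theta) + (B - 1) * math.log(1 - theta)

theta = 0.5                             # arbitrary starting value
draws = []
for _ in range(20_000):
    proposal = theta + random.gauss(0.0, 0.05)     # random-walk proposal
    # Metropolis acceptance: accept with probability min(1, density ratio).
    if math.log(random.random()) < log_post(proposal) - log_post(theta):
        theta = proposal
    draws.append(theta)

kept = draws[2_000:]                    # discard burn-in
mcmc_mean = sum(kept) / len(kept)
print(round(mcmc_mean, 2))              # close to 0.47
```

Note that the acceptance step only needs the density up to a constant, which is exactly the form supplied by Equation 1; Gibbs sampling and slice sampling are alternative ways of generating a chain with the same stationary distribution.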


Further reading
1. Bolstad WM. Introduction to Bayesian Statistics. Chichester: Wiley, 2004.
2. Gilks WR, Richardson S, Spiegelhalter DJ. Markov Chain Monte Carlo in Practice. London: Chapman & Hall, 1996.
3. Lee PM. Bayesian Statistics: An Introduction, 3rd edn. London: Arnold, 2004.
4. O'Hagan A, Forster J. Kendall's Advanced Theory of Statistics: Volume 2B: Bayesian Inference, 2nd edn. Oxford: Oxford University Press, 2004.
5. Spiegelhalter DJ, Abrams KR, Myles JP. Bayesian Approaches to Clinical Trials and Health-Care Evaluation. Chichester: Wiley, 2004.

This publication, along with the others in the series, is available on the internet at www.whatisseries.co.uk
The data, opinions and statements appearing in the article(s) herein are those of the contributor(s) concerned. Accordingly, the sponsor and publisher, and their respective employees, officers and agents, accept no liability for the consequences of any such inaccurate or misleading data, opinion or statement.

Published by Hayward Medical Communications, a division of Hayward Group Ltd. Supported by sanofi-aventis. Copyright © 2009 Hayward Group Ltd. All rights reserved.
