
Faculty of Life Sciences

Frequentist and Bayesian statistics

Claus Ekstrøm E-mail: [email protected]

Outline

1 Frequentists and Bayesians
  • What is a probability?
  • Interpretation of results / inference
2 Comparisons
3 Markov chain Monte Carlo

Slide 2— PhD (Aug 23rd 2011) — Frequentist and Bayesian statistics

What is a probability? Two schools in statistics: frequentists and Bayesians.

Slide 3— PhD (Aug 23rd 2011) — Frequentist and Bayesian statistics

Frequentist school

School of Jerzy Neyman, Egon Pearson and Ronald Fisher.

Slide 4— PhD (Aug 23rd 2011) — Frequentist and Bayesian statistics

Bayesian school

“School” of Thomas Bayes

P(H|D) = P(D|H) · P(H) / ∫ P(D|H) · P(H) dH

Slide 5— PhD (Aug 23rd 2011) — Frequentist and Bayesian statistics

Frequentists

Frequentists talk about probabilities in relation to experiments with a random component. The relative frequency of an event A is defined as

P(A) = (number of outcomes consistent with A) / (number of experiments)

The probability of the event A is the limiting relative frequency.

[Figure: relative frequency of A plotted against the number of experiments n, settling down as n grows]
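A minimal sketch of the limiting relative frequency idea (the coin-toss setup is an assumption, not from the slides):

set.seed(1)
n <- 100
tosses <- rbinom(n, size = 1, prob = 0.5)       # fair coin tosses
rel.freq <- cumsum(tosses) / seq_len(n)         # running relative frequency of heads
plot(seq_len(n), rel.freq, type = "l", ylim = c(0, 1),
     xlab = "n", ylab = "Relative frequency")
abline(h = 0.5, lty = 2)                        # the limiting value P(heads) = 0.5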

Slide 6— PhD (Aug 23rd 2011) — Frequentist and Bayesian statistics

Frequentists — 2

The definition restricts the things we can assign probabilities to: What is the probability of there being life on Mars 100 billion years ago? We assume that there is an unknown but fixed underlying parameter, θ, for a population (e.g., the mean height of Danish men). Random variation (environmental factors, measurement errors, ...) means that each observation does not result in the true value.

Slide 7— PhD (Aug 23rd 2011) — Frequentist and Bayesian statistics

The meta-experiment idea Frequentists think of meta-experiments and consider the current dataset as a single realization from all possible datasets.

167.2 cm 175.5 cm 187.7 cm 182.0 cm

Slide 8— PhD (Aug 23rd 2011) — Frequentist and Bayesian statistics

Confidence intervals

Thus a frequentist believes that a population mean is real, but unknown and unknowable, and can only be estimated from the data. Knowing the distribution of the sample mean, he constructs a confidence interval, centered at the sample mean.
• Either the true mean is in the interval or it is not. We cannot say there is a 95% probability (a long-run fraction of intervals having this characteristic) that the true mean is in this particular interval, because it is either already in it, or it is not.
• Reason: the true mean is a fixed value, which does not have a distribution.
• The sample mean does have a distribution! We must therefore use statements like: "95% of similar intervals would contain the true mean, if each interval were constructed from a different random sample like this one."

Slide 9— PhD (Aug 23rd 2011) — Frequentist and Bayesian statistics

Maximum likelihood

How will the frequentist estimate the parameter? Answer: maximum likelihood.

Basic idea
Our best estimate of the parameter(s) is the one(s) that makes our observed data most likely. We know what we have observed so far (our data). Our best "guess" is therefore to select the parameters that make our observations most likely.

Binomial distribution:

P(Y = y) = (n choose y) p^y (1 − p)^(n−y)
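A minimal sketch of the maximum likelihood idea for this binomial model (the counts y = 7, n = 10 are hypothetical, not from the slides):

y <- 7; n <- 10
loglik <- function(p) dbinom(y, size = n, prob = p, log = TRUE)   # binomial log-likelihood
fit <- optimize(loglik, interval = c(0, 1), maximum = TRUE)       # maximize over p
fit$maximum        # close to the analytic MLE y/n = 0.7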

Slide 10— PhD (Aug 23rd 2011) — Frequentist and Bayesian statistics

Bayesians

Each investigator is entitled to his/her personal belief ... the prior information. There are no fixed values for parameters, but a distribution. Thumb tack pin pointing down: all distributions are subjective, and yours is as good as mine. We can still talk about the mean, but it is the mean of my distribution. In many cases one tries to circumvent the subjectivity by using vague priors.

[Figure: a subjective prior distribution for Theta on (0, 1)]

Slide 11— PhD (Aug 23rd 2011) — Frequentist and Bayesian statistics

Credibility intervals

Bayesians have an altogether different world-view. They say that only the data are real. The population mean is an abstraction, and as such some values are more believable than others based on the data and their prior beliefs. The Bayesian constructs a credibility interval, centered near the sample mean, but tempered by "prior" beliefs concerning the mean. Now the Bayesian can say what the frequentist cannot: "There is a 95% probability (degree of believability) that this interval contains the mean."

Slide 12— PhD (Aug 23rd 2011) — Frequentist and Bayesian statistics

Comparison

             Advantages                                       Disadvantages
Frequentist  Objective; calculations                          Confidence intervals (not quite the desired)
Bayesian     Credibility intervals (usually the desired);     Subjective; calculations
             complex models

Slide 13— PhD (Aug 23rd 2011) — Frequentist and Bayesian statistics

In summary

• A frequentist is a person whose long-run ambition is to be wrong 5% of the time.
• A Bayesian is one who, vaguely expecting a horse, and catching a glimpse of a donkey, strongly believes he has seen a mule.

A frequentist uses impeccable logic to answer the wrong question, while a Bayesian answers the right question by making assumptions that nobody can fully believe in. (P. G. Hamer)

Slide 14— PhD (Aug 23rd 2011) — Frequentist and Bayesian statistics

Jury duty

Slide 15— PhD (Aug 23rd 2011) — Frequentist and Bayesian statistics

Example: speed of light

What is the speed of light in vacuum “really”? Results (m/s) 299792459.2 299792460.0 299792456.3 299792458.1 299792459.5

Slide 16— PhD (Aug 23rd 2011) — Frequentist and Bayesian statistics

Example: the frequentist solution

The average of our observations is an estimate of the true, fixed (but unknown) speed of light, θ̂ = 299792458.6. Conclusion: If we were to repeat this sequence of 5 measurements a large number of times, approximately 95% of my estimators would be within 1.83 m/s of the true speed of light. However, on this particular occasion, where I have already calculated my statistic, I have no clue how close I actually am to the true value, but I feel comfortable that I am doing okay because of certain properties that my estimator has on repeated use.
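A minimal R sketch of the interval behind the 1.83 m/s statement, using the five measurements above (the one-sample t interval is an assumption about how the slide's number was obtained):

y <- c(299792459.2, 299792460.0, 299792456.3, 299792458.1, 299792459.5)
mean(y)                 # 299792458.6, the estimate on the slide
t.test(y)$conf.int      # 95% confidence interval, roughly mean(y) +/- 1.83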

Slide 17— PhD (Aug 23rd 2011) — Frequentist and Bayesian statistics

Example: Bayesian solution

The observations are a fixed realization from the underlying distribution of the true speed of light.
1. "Guess" what the distribution of the speed of light is (the prior distribution).
2. Use Bayes' theorem to modify/update the prior distribution based on the observed data.
3. The modified distribution is called the posterior distribution.
The posterior distribution holds the information about the true speed of light, and this distribution is entirely subjective.

Slide 18— PhD (Aug 23rd 2011) — Frequentist and Bayesian statistics

Markov Chain Monte Carlo

Having a likelihood does not necessarily make it easy to work with. In Bayesian statistics the posterior distribution contains all relevant information about the parameters. Statistical inference is often calculated from summaries (integrals)

J = ∫ L(x) dx

However, these evaluations are not necessarily easy.

Slide 19— PhD (Aug 23rd 2011) — Frequentist and Bayesian statistics

Bayesian modelling, Markov Chain Monte Carlo, Graphical Models

Søren Højsgaard Department of Mathematical Sciences Aalborg University, Denmark August 13, 2012

Contents

1 Bayesian modeling

2 Inference

3 Bayesian models based on DAGs
  3.1 Example: Independent samples
  3.2 Example: Linear regression
  3.3 Example: Random regression model

4 Computations using Monte Carlo methods
  4.1 Rejection sampling
  4.2 Example: Rejection sampling
  4.3 Sampling importance resampling (SIR)*
  4.4 Markov Chain Monte Carlo methods
  4.5 The Metropolis–Hastings algorithm
  4.6 Special cases
  4.7 Example: Metropolis–Hastings algorithm
  4.8 Single component Metropolis–Hastings
  4.9 Gibbs sampler*
  4.10 Sampling in high dimensions – problems

5 Conditional independence

1 Bayesian modeling

• In a Bayesian setting, parameters are treated as random quantities on equal footing with the random variables.

• The joint distribution of a parameter (vector) θ and data (vector) y is specified through a prior distribution π(θ) for θ and a conditional distribution p(y | θ) of data for a fixed value of θ.

• This leads to the joint distribution p(y, θ) = p(y | θ)π(θ)

• The prior distribution π(θ) represents our knowledge (or uncertainty) about θ before data have been observed.

• After observing data y, the posterior distribution π∗(θ) of θ is obtained by conditioning on data, which gives

  π∗(θ) = p(θ|y) = p(y|θ)π(θ) / p(y) ∝ L(θ)π(θ)

  where L(θ) = p(y | θ) is the likelihood and the marginal density p(y) = ∫ p(y | θ)π(θ)dθ is the normalizing constant.

2 Inference

• With respect to inference, we might be interested in the posterior mean of some function g(θ):

  E(g(θ)|π∗) = ∫ g(θ)π∗(θ)dθ

• However, usually π∗(θ) can not be found analytically because the normalizing constant p(y) = ∫ p(y | θ)π(θ)dθ is intractable.

• In such cases one will often resort to sampling based methods: If we can draw samples θ(1), . . . , θ(N) from π∗(θ) we can do just as well:

  E(g(θ)|π∗) ≈ (1/N) Σ_i g(θ(i))

  The question is then how to draw samples from π∗(θ) where π∗(θ) is only known up to the normalizing constant.

• In some cases simple Monte Carlo methods will do (e.g. rejection sampling). • Most often, however, we have to use Markov Chain Monte Carlo Methods (e.g. the Metropolis–Hastings algorithm)
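To make the sample-average approximation concrete, a minimal sketch (the Beta posterior and the choice g(θ) = θ² are assumptions, not from the notes):

set.seed(1)
N <- 100000
theta <- rbeta(N, 3, 5)        # pretend these are draws from the posterior pi*(theta)
mean(theta^2)                  # Monte Carlo estimate of E(g(theta) | y) with g(theta) = theta^2
3 * 4 / (8 * 9)                # exact value for Beta(3, 5): a(a+1)/((a+b)(a+b+1)) = 1/6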

3 Bayesian models based on DAGs

The graph is a directed acyclic graph (DAG) where the nodes represent random quantities.


Figure 1: Directed acyclic graph. Nodes represent random quantities.

A joint distribution for x = (xA, xS, xT , xL, xB, xE, xX , xD) can be specified as a product of conditional distributions,

p(x) = p(xA)p(xT|xA)p(xS)p(xL|xS)p(xB|xS)p(xE|xT, xL)p(xX|xE)p(xD|xE, xB)

• Notice: The specification has the form

  p(x) = Π_v p(xv | xpa(v))

  where pa(v) denotes the parents of v in the directed acyclic graph.

• Hence, we define a complex multivariate distribution by multiplying conditional uni- variate densities.

• Notice also that we use x here as a generic symbol for a random quantity rather than using y to represent data and θ to represent parameters.

• This makes sense in a Bayesian setting where there is no conceptual difference between parameters and data.

3.1 Example: Independent samples

Joint distribution:

p(x1, . . . , x5, θ) = π(θ) Π_i p(xi | θ)

For example we may have: xi | θ ∼ N(θ, 1) and θ ∼ N(0, 1).
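A minimal R sketch of the exact posterior in this model (the simulated data are an assumption); for xi | θ ∼ N(θ, 1) and θ ∼ N(0, 1), the posterior is θ | x ∼ N(n x̄/(n + 1), 1/(n + 1)):

set.seed(1)
x <- rnorm(5, mean = 2, sd = 1)      # 5 hypothetical observations
n <- length(x)
post.mean <- n * mean(x) / (n + 1)   # conjugate normal-normal update
post.var  <- 1 / (n + 1)
c(post.mean, post.var)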

Figure 2: Left: Representation of a Bayesian model for simple sampling. The picture to the right represents the same, but the plate allows a more compact representation.

Figure 3: Graphical representations of a traditional linear regression model with unknown intercept α, slope β, and variance σ². In the representation to the left, the means µi have been represented explicitly.

3.2 Example: Linear regression

Regression model:

Yi ∼ N(µi, σ²),  µi = α + βxi,  i = 1, . . . , N

In this example, the parameters are θ = (α, β, σ). To complete the model specification we therefore need to specify a prior π(θ).

The xi’s are just explanatory variables and the µi’s are deterministic functions of their parents.

3.3 Example: Random regression model

Weights have been measured weekly for 30 rats over 5 weeks. Observations yij are weights of rat i at age xj. Random regression model:

yij ∼ N(αi + βi(xj − x̄), σc²)

Figure 4: Graphical representation of a random coefficient regression model for the growth of rats.

and

αi ∼ N(αc, σα²),  βi ∼ N(βc, σβ²)

4 Computations using Monte Carlo methods

Consider a random variable (vector) X with density / probability mass function p(x). We shall call p(x) the target distribution (from which we want to sample). In many real world applications

• we can not directly draw samples from p.

• p is only known up to a constant of proportionality; that is

p(x) = k(x)/c

where k() is known and the normalizing constant c is unknown.

We reserve h(x) for a proposal distribution which is a distribution from which we can draw samples.

4.1 Rejection sampling

Let p(x) = k(x)/c be a density where k() is known and c is unknown. Let h(x) be a proposal distribution from which we can draw samples. Suppose we can find a constant M such that k(x) < Mh(x) for all x. The algorithm is then

1. Draw a sample x ∼ h(·). Draw u ∼ U(0, 1).

2. Set α = k(x)/(M h(x)).

3. If u < α, accept x.

The accepted values x1, . . . , xN are a random sample from p(·). Notice:

• It is tricky to choose a good proposal distribution h(). It should have support at least as large as p() and preferably heavier tails than p().

• It is desirable to choose M as small as possible. In practice this is difficult so one tends to take a conservative (large) choice of M whereby only few proposed values are accepted. Thus it is difficult to make rejection sampling efficient.

4.2 Example: Rejection sampling

k <- function(x, a=.4, b=.08){ exp(a*(x-a)^2 - b*x^4) }
x <- seq(-4, 4, 0.1)
plot(x, k(x), type='l')

[Figure: the unnormalized target k(x) plotted over [-4, 4]]

We take a uniform distribution on [−4, 4] (with density 1/8 = 0.125) as proposal distribution.

h <- function(x){ rep.int(0.125, length(x)) }
N <- 100000
x.h <- runif(N, -4, 4)
u <- runif(N)
## The choice of M is critical
M1 <- round(max(k(x)/h(x))) + 1
M2 <- 100*round(max(k(x)/h(x))) + 1
M3 <- 1000*round(max(k(x)/h(x))) + 1
acc1 <- u < k(x.h)/(M1*h(x.h))
acc2 <- u < k(x.h)/(M2*h(x.h))
acc3 <- u < k(x.h)/(M3*h(x.h))
## Fraction of accepted values
sum(acc1)/N

[1] 0.31316

sum(acc2)/N

[1] 0.00336

sum(acc3)/N

[1] 0.00038

x.acc1 <- x.h[acc1]
x.acc2 <- x.h[acc2]
x.acc3 <- x.h[acc3]

par(mfrow=c(2,2), mar=c(2,2,1,1))
plot(x, k(x), type='l')
barplot(table(round(x.acc1,1))/length(x.acc1))
barplot(table(round(x.acc2,1))/length(x.acc2))
barplot(table(round(x.acc3,1))/length(x.acc3))

[Figure: the target k(x) and histograms of the accepted samples for M1, M2 and M3]

Choice of proposal h() and of M is critical.

4.3 Sampling importance resampling (SIR)*

Even when M is not readily available, we may generate approximate samples from p as follows.

1. Draw samples x^1, . . . , x^N ∼ h(x).

2. Calculate importance weights wi = p(x^i)/h(x^i).

3. Normalize the weights as qi = wi / Σ_j wj.

4. Resample from {x^1, . . . , x^N}, where x^i is drawn with probability qi.

These last samples are approximately samples from p. This scheme also works if p is only known up to proportionality, essentially because the normalizing constant cancels out in step 3 above. Notice that those samples from h which "fit best to p" are the ones most likely to appear in the resample. But it is also worth noticing that if h is a poor approximation to p, then the "best samples from h" are not necessarily good samples in the sense of resembling p.
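A minimal sketch of SIR in R, reusing the unnormalized target k() from the rejection sampling example above (the sample sizes are arbitrary):

k <- function(x, a = .4, b = .08) exp(a * (x - a)^2 - b * x^4)
set.seed(1)
N <- 100000
x.h <- runif(N, -4, 4)                 # proposals from h = Uniform(-4, 4)
w   <- k(x.h) / dunif(x.h, -4, 4)      # importance weights; unnormalized target is fine
q   <- w / sum(w)                      # normalized weights
x.sir <- sample(x.h, size = 10000, replace = TRUE, prob = q)   # resample
hist(x.sir, breaks = 50)               # approximately distributed as p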

4.4 Markov Chain Monte Carlo methods

A drawback of the rejection algorithm is that it is difficult to suggest a proposal distribution h which leads to an efficient algorithm, and it is difficult to find M. A way around this problem is to let the proposed values depend on the last accepted value: if x0 is a "likely" value from p, then so, probably, is a proposed value x which is "close" to x0. Hence the proposal distribution will now be conditional on the last accepted value and have the form h(x|x0). This leads to schemes (described below) for drawing samples x1, . . . , xN, and these samples will, under certain conditions, form an ergodic Markov chain with p(x) as its stationary distribution. Hence, the expected value of any function of x can be calculated approximately as

∫ g(x)p(x)dx ≈ (1/N) Σ_i g(x^i).

4.5 The Metropolis–Hastings algorithm

Given an accepted value xt−1:

1. Draw x ∼ h(·|x^(t−1)). Draw u ∼ U(0, 1).

2. Calculate the acceptance probability α = min( 1, [p(x) h(x^(t−1)|x)] / [p(x^(t−1)) h(x|x^(t−1))] ).

3. If u < α set x^t = x; else set x^t = x^(t−1).

After a burn–in period the samples x1, x2,... will be samples from p(·). Notice:

• The samples x1, x2,... will be correlated.

• The algorithm also works if p is only known up to proportionality (because the normalizing constant cancels when calculating the acceptance probability).

• We must be able to both sample from h() and evaluate the density.

4.6 Special cases

Metropolis algorithm (a special case of the Metropolis–Hastings algorithm) The proposal distribution is symmetrical, i.e. h(x|x0) = h(x0|x) for all pairs (x, x0). Hence the acceptance probability is α = min(1, p(x)/p(x^(t−1))).

Random–walk Metropolis A popular choice for the proposal in a Metropolis algorithm is h(x|x0) = g(x − x0) where g is symmetric, e.g.

x = x0 + e,  e ∼ g

Example: x = x0 + e with e ∼ N(0, σ²)

The independence sampler (a special case of the Metropolis–Hastings algorithm) The proposal h(x|x0) = h(x) does not depend on x0. The acceptance probability becomes α = min( 1, [p(x) h(x^(t−1))] / [p(x^(t−1)) h(x)] ). For this sampler to work well, h should be a good approximation to p.

4.7 Example: Metropolis–Hastings algorithm

x.acc4 <- x.acc5 <- rep.int(NA, N)
u <- runif(N)
std <- 1   ## Spread of proposal
xc <- 0; acc.count <- 0
for (ii in 1:N){
  xp <- rnorm(1, mean=xc, sd=std)
  alpha <- min(1, (k(xp)/k(xc)) * (dnorm(xc, mean=xp, sd=std)/dnorm(xp, mean=xc, sd=std)))
  acc.count <- acc.count + (u[ii] < alpha)
  x.acc4[ii] <- xc <- ifelse(u[ii] < alpha, xp, xc)
}
## Fraction of accepted *new* proposals
acc.count/N

[1] 0.72907

std <- .01   ## Spread of proposal
xc <- 0; acc.count <- 0
for (ii in 1:N){
  xp <- rnorm(1, mean=xc, sd=std)
  alpha <- min(1, (k(xp)/k(xc)) * (dnorm(xc, mean=xp, sd=std)/dnorm(xp, mean=xc, sd=std)))
  acc.count <- acc.count + (u[ii] < alpha)
  x.acc5[ii] <- xc <- ifelse(u[ii] < alpha, xp, xc)
}
## Fraction of accepted *new* proposals
acc.count/N

[1] 0.99685

par(mfrow=c(2,2), mar=c(2,2,1,1))
plot(x, k(x), type='l')
barplot(table(round(x.acc4,1))/length(x.acc4))
barplot(table(round(x.acc5,1))/length(x.acc5))

[Figure: the target k(x) and histograms of the Metropolis–Hastings samples for the two proposal spreads]

4.8 Single component Metropolis–Hastings

Suppose x is a vector. Instead of updating the entire x it is often more convenient and computationally efficient to update x in blocks.

We partition x into blocks, for example x = (x1, x2, x3).

Suppose that we have a sample x^(t−1) = (x1^(t−1), x2^(t−1), x3^(t−1)) and that x1 has already been updated to x1^t in the current iteration. The task is to update x2.

To do so we specify a proposal distribution h2 from which we can sample candidate values for x2:

1. Draw x2 ∼ h2(·|x1^t, x2^(t−1), x3^(t−1)). Draw u ∼ U(0, 1).

2. Calculate the acceptance probability
   α = min( 1, [p(x2|x1^t, x3^(t−1)) h2(x2^(t−1)|x1^t, x2, x3^(t−1))] / [p(x2^(t−1)|x1^t, x3^(t−1)) h2(x2|x1^t, x2^(t−1), x3^(t−1))] )

3. If u < α set x2^t = x2; else set x2^t = x2^(t−1).

4.9 Gibbs sampler*

The Gibbs sampler is a special case of single component Metropolis–Hastings. The proposal distribution h2(x2|x1^t, x2^(t−1), x3^(t−1)) for updating x2 is

p(x2|x1^t, x3^(t−1))

Hence for the Gibbs sampler, (i) proposed values are always accepted, but (ii) it is required that we can sample from the conditionals p(xi|x−i). One version of the algorithm is as follows. Suppose a sample x^k = (x1^k, x2^k, x3^k) is available.

1. Sample x1^(k+1) ∼ p(x1|x2^k, x3^k)

2. Sample x2^(k+1) ∼ p(x2|x1^(k+1), x3^k)

3. Sample x3^(k+1) ∼ p(x3|x1^(k+1), x2^(k+1))

4. Set x^(k+1) = (x1^(k+1), x2^(k+1), x3^(k+1))

The sequence x1, x2,... then consists of (correlated) samples from p(x).
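A minimal sketch of a Gibbs sampler in R for a target that is not taken from the notes: a bivariate normal with correlation ρ, whose full conditionals are available in closed form (x1 | x2 ∼ N(ρ x2, 1 − ρ²) and vice versa):

set.seed(1)
N <- 10000; rho <- 0.8
x1 <- x2 <- numeric(N)
for (ii in 2:N) {
  x1[ii] <- rnorm(1, mean = rho * x2[ii - 1], sd = sqrt(1 - rho^2))  # sample x1 | x2
  x2[ii] <- rnorm(1, mean = rho * x1[ii],     sd = sqrt(1 - rho^2))  # sample x2 | x1
}
cor(x1, x2)   # close to rho after a burn-in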

4.10 Sampling in high dimensions – problems

Suppose we want to draw samples from p(x) where x = (x1, x2, . . . xK ). In principle we do sampling by Metropolis–Hastings, but in practice there can be a problem:

Suppose K = 80 and that each component xi is discrete with 10 levels. Then the total state space for x will have 10^80 configurations. This is a huge number; it is of the same order as estimates of the number of atoms in the universe! Hence we can not even build a computer large enough to store the entire joint distribution p(x).

To proceed we must utilize that p(x) may have a structure that allows us to avoid the "curse of dimensionality". This structure is defined in terms of conditional independence.

5 Conditional independence

Let X,Y,Z be random variables. Write f() for a generic density or probability mass function. We say that X and Y are conditionally independent given Z if

f(x, y | z) = f(x | z)f(y | z).

We write this as X ⊥⊥ Y | Z. An equivalent characterization is that f(x, y, z) factorizes as

f(x, y, z) = g(x, z)h(y, z)

that is, as a product of two functions g() and h(), where g() does not depend on y and h() does not depend on x. This is known as the factorization criterion. A third characterization is that p(x|y, z) = p(x|z), that is, the conditional distribution of X given Y, Z does not depend on Y.

[Figure: the DAG from Figure 1, with nodes xA, xS, xT, xL, xB, xE, xX, xD]

Consider sampling xB given all other variables. Group terms in the factorization:

p(x) = [ p(xA)p(xT|xA)p(xS)p(xL|xS)p(xE|xT, xL)p(xX|xE) ] [ p(xB|xS)p(xD|xE, xB) ]
     = g(xA, xT, xS, xL, xX, xE) k(xB, xS, xD, xE)

so xB ⊥⊥ xA, xT, xL, xX | xS, xD, xE

Put differently, xB depends directly on its parents, its children and its children's other parents (the so-called Markov blanket). Once these variables are known, all other variables are irrelevant. We have

p(xB|"the rest") = k(xB, xS, xD, xE) / ∫ k(xB, xS, xD, xE)dxB ∝ k(xB, xS, xD, xE)

So to sample xB we only need a small part of the model, namely k(xB, xS, xD, xE). Thereby we will not be caught by the dimensionality problems. This is why sampling based methods can be made to work in large scale problems.

L5: Normal and multiparameter models
Tuesday 14th August 2012, afternoon

Lyle Gurrin

Bayesian Data Analysis 13 – 17 August 2012, Copenhagen

Analysis for the normal distribution: Unknown mean, known variance Normal model underlies much statistical modelling. We start with the simplest case, assuming the variance is known:

1. Just one data point. 2. General case of a “sample” of data with many data points.

Normal and multiparameter models 79/ 258

Likelihood of one data point

Consider a single observation y from a normal distribution with mean θ and variance σ², with σ² known. The sampling distribution is:

p(y|θ) = (1/(√(2π)σ)) exp(−(y − θ)²/(2σ²)).   (12)

Normal and multiparameter models 80/ 258

Conjugate prior and posterior distributions

This likelihood is the exponential of a quadratic form in θ, so the conjugate prior distribution must have the same form; parameterize this family of conjugate densities as

p(θ) ∝ exp(−(θ − μ0)²/(2τ0²));   (13)

i.e. θ ∼ N(μ0, τ0²), with hyperparameters μ0 and τ0².

For now we assume μ0 and τ0 to be known.

Normal and multiparameter models 81/ 258

Posterior distribution

From the conjugate form of the prior distribution, the posterior distribution for θ is also normal:

p(θ|y) ∝ exp( −(1/2)[ (y − θ)²/σ² + (θ − μ0)²/τ0² ] ).   (14)

Some algebra is required, however, to reveal its form (recall that in the posterior distribution everything except θ is regarded as constant).

Normal and multiparameter models 82/ 258

Parameters of the posterior distribution

Algebraic rearrangement gives

p(θ|y) ∝ exp( −(θ − μ1)²/(2τ1²) ),   (15)

that is, the posterior distribution θ|y is N(μ1, τ1²), where

μ1 = (μ0/τ0² + y/σ²) / (1/τ0² + 1/σ²)   and   1/τ1² = 1/τ0² + 1/σ².   (16)

Normal and multiparameter models 83/ 258 Precisions of the prior and posterior distributions In manipulating normal distributions, the inverse of the variance or precision plays a prominent role. For normal data and normal prior distribution, each with known precision, we have

1/τ1² = 1/τ0² + 1/σ²

posterior precision = prior precision + data precision.

Normal and multiparameter models 84/ 258
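A minimal R sketch of the precision-weighted update in equation (16) (the numeric inputs below are hypothetical):

post.normal <- function(y, sigma, mu0, tau0) {
  prec1 <- 1 / tau0^2 + 1 / sigma^2                 # posterior precision
  mu1   <- (mu0 / tau0^2 + y / sigma^2) / prec1     # posterior mean (precision-weighted average)
  c(mean = mu1, sd = sqrt(1 / prec1))
}
post.normal(y = 3, sigma = 1, mu0 = 0, tau0 = 2)    # hypothetical observation and prior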

Interpreting the posterior mean, μ1

There are several ways of interpreting the form of the posterior mean μ1. In equation (16):

μ1 = (μ0/τ0² + y/σ²) / (1/τ0² + 1/σ²)

posterior mean = weighted average of the prior mean and the observed value y, with weights proportional to the precisions.

Normal and multiparameter models 85/ 258

Interpreting the posterior mean, μ1

Alternatively, μ1 = prior mean "adjusted" toward the observed y:

μ1 = μ0 + (y − μ0) τ0²/(σ² + τ0²),   (17)

or μ1 = data "shrunk" toward the prior mean:

μ1 = y − (y − μ0) σ²/(σ² + τ0²).   (18)

In both cases, the posterior mean μ1 is a compromise between the prior mean and the observed value.

Normal and multiparameter models 86/ 258

Interpretation of μ1 for extreme cases

In the extreme cases, the posterior mean μ1 equals the prior mean or the observed value y.

μ1 = μ0 if y = μ0 or τ0² = 0;

μ1 = y if y = μ0 or σ² = 0.

What is the correct interpretation for each scenario?

Normal and multiparameter models 87/ 258

Normal model with multiple observations

The normal model with a single observation can easily be extended to the more realistic situation where we have a sample of independent and identically distributed observations y = (y1, . . . , yn). We can proceed formally, from

p(θ|y) ∝ p(θ)p(y|θ) = p(θ) Π_{i=1}^{n} p(yi|θ)

where p(yi|θ) = N(yi|θ, σ²), with algebra similar to that above. The posterior distribution depends on y only through the sample mean, ȳ = (1/n) Σ_i yi, which is a sufficient statistic in this model.

Normal and multiparameter models 88/ 258

Normal model via the sample mean

In fact, since ȳ|θ, σ² ∼ N(θ, σ²/n), we can apply the results for a single normal observation: p(θ|y1, . . . , yn) = p(θ|ȳ) = N(θ|μn, τn²), where

μn = (μ0/τ0² + n ȳ/σ²) / (1/τ0² + n/σ²)

and

1/τn² = 1/τ0² + n/σ².

Normal and multiparameter models 89/ 258

Limits for large n and large τ0²

The prior precision, 1/τ0², and the data precision, n/σ², play equivalent roles; if n is large, the posterior distribution is largely determined by σ² and the sample value ȳ. As τ0² → ∞ with n fixed, or as n → ∞ with τ0² fixed, we have:

p(θ|y) ≈ N(θ|ȳ, σ²/n)   (19)

Normal and multiparameter models 90/ 258

Limits for large n and large τ0²

A prior distribution with large τ0², and thus low precision, captures prior beliefs that are diffuse over the range of θ where the likelihood is substantial. Compare the well-known result of classical statistics:

ȳ|θ, σ² ∼ N(θ, σ²/n)   (20)

which leads to the use of

ȳ ± 1.96 σ/√n   (21)

as a 95% confidence interval for θ. The Bayesian approach gives the same result for a noninformative prior.

Normal and multiparameter models 91/ 258

Multiparameter models: Introduction The reality of applied statistics: there are always several (maybe many) unknown parameters! BUT the interest usually lies in only a few of these (parameters of interest) while others are regarded as nuisance parameters for which we have no interest in making inferences but which are required in order to construct a realistic model. At this point the simple conceptual framework of the Bayesian approach reveals its principal advantage over other forms of inference.

Normal and multiparameter models 92/ 258 The Bayesian approach The Bayesian approach is clear: Obtain the joint posterior distribution of all unknowns, then integrate over the nuisance parameters to leave the marginal posterior distribution for the parameters of interest. Alternatively using simulation, draw samples from the entire joint posterior distribution (even this may be computationally difficult), look at the parameters of interest and ignore the rest.

Normal and multiparameter models 93/ 258

Averaging over nuisance parameters

To begin exploring the ideas of joint and marginal distributions, suppose that θ has two parts, so θ = (θ1, θ2). We are interested only in θ1, with θ2 considered a "nuisance" parameter. For example:

y|μ, σ² ∼ N(μ, σ²),

with both μ (= θ1) and σ² (= θ2) unknown. Interest usually focusses on μ.

Normal and multiparameter models 94/ 258

Averaging over nuisance parameters

AIM: To obtain the conditional distribution p(θ1|y) of the parameters of interest θ1. This can be derived from joint posterior density,

p(θ1,θ2|y) ∝ p(y|θ1,θ2)p(θ1,θ2),

by averaging or integrating over θ2:

p(θ1|y) = ∫ p(θ1, θ2|y) dθ2.

Normal and multiparameter models 95/ 258 Factoring the joint posterior Alternatively, the joint posterior density can be factored to yield: 

p(θ1|y) = ∫ p(θ1|θ2, y) p(θ2|y) dθ2,   (22)

showing the posterior distribution p(θ1|y) as a mixture of the conditional posterior distributions given the nuisance parameter θ2, where p(θ2|y) is a weighting function for the different possible values of θ2.

Normal and multiparameter models 96/ 258

Mixtures of conditionals

• The weights depend on the posterior density of θ2, so on a combination of evidence from the data and the prior model.
• What if θ2 is known to have a particular value?

The averaging over nuisance parameters can be interpreted very generally: θ2 can be categorical (discrete) and may take only a few possible values representing, for example, different sub-models.

Normal and multiparameter models 97/ 258

A strategy for computation We rarely evaluate integral (22) explicitly, but it suggests an important strategy for constructing and computing with multiparameter models, using simulation:

1. Draw θ2 from its marginal posterior distribution. 2. Draw θ1 from conditional posterior distribution, given the drawn value of θ2.

Normal and multiparameter models 98/ 258 Conditional simulation In this way the integration in (22) is performed indirectly.

In fact we can alter step 1 to draw θ2 from its conditional posterior distribution given θ1. Iterating the procedure will ultimately generate samples from the marginal posterior distribution of both θ1 and θ2. This is the much vaunted Gibbs sampler.

Normal and multiparameter models 99/ 258

Multiparameter models: Normal mean & variance

Consider a vector y of n i.i.d. univariate observations from a N(μ, σ²) distribution. We begin by analysing the model under the convenient assumption of a noninformative prior distribution (which is easily extended to informative priors). Following the discussion on noninformative priors, we assume prior independence of the location and scale parameters and take p(μ, σ²) to be uniform on (μ, log σ): p(μ, σ²) ∝ (σ²)^(−1).

Normal and multiparameter models 100/ 258

The joint posterior distribution, p(μ, σ²|y)

Under the improper prior distribution the joint posterior distribution is proportional to the likelihood × the factor 1/σ²:

p(μ, σ²|y) ∝ σ^(−n−2) exp( −(1/(2σ²)) Σ_{i=1}^{n} (yi − μ)² )
           = σ^(−n−2) exp( −(1/(2σ²)) [ Σ_{i=1}^{n} (yi − ȳ)² + n(ȳ − μ)² ] )
           = σ^(−n−2) exp( −(1/(2σ²)) [ (n − 1)s² + n(ȳ − μ)² ] ),   (23)

where s² = (1/(n − 1)) Σ_{i=1}^{n} (yi − ȳ)² is the sample variance of the yi's. The sufficient statistics are s² and ȳ.

Normal and multiparameter models 101/ 258

The conditional posterior dist'n, p(μ|σ², y)

We can factor the joint posterior density by considering first the conditional distribution p(μ|σ², y), and then the marginal p(σ²|y). We can use a previous result for the mean μ of a normal distribution with known variance:

μ|σ², y ∼ N(ȳ, σ²/n).   (24)

Normal and multiparameter models 102/ 258

The marginal posterior distribution, p(σ²|y)

This requires averaging the joint distribution (23) over μ, that is, evaluating the simple normal integral

∫ exp( −(n/(2σ²))(ȳ − μ)² ) dμ = √(2πσ²/n);

thus,

p(σ²|y) ∝ (σ²)^(−(n+1)/2) exp( −(n − 1)s²/(2σ²) ),   (25)

which is a scaled inverse-χ² density:

σ²|y ∼ Inv-χ²(n − 1, s²).   (26)

Normal and multiparameter models 103/ 258

Product of conditional and marginal We have therefore factored (23) as the product of conditional and marginal densities

p(μ, σ²|y) = p(μ|σ², y) p(σ²|y).

Normal and multiparameter models 104/ 258

Parallel between Bayes & frequentist results

As with the one-parameter normal results, there is a remarkable parallel with sampling theory:

Bayes:        (n − 1)s²/σ² | y ∼ χ²_(n−1)
Frequentist:  (n − 1)s²/σ² | μ, σ² ∼ χ²_(n−1)

Conditional on the values of μ and σ², the sampling distribution of the appropriately scaled sufficient statistic (n − 1)s²/σ² is chi-squared with n − 1 d.f.

Normal and multiparameter models 105/ 258

Analytic form of the marginal posterior distribution of μ

μ is typically the estimand of interest, so the ultimate objective of the Bayesian analysis is the marginal posterior distribution of μ. This can be obtained by integrating σ² out of the joint posterior distribution. It is easily done by simulation: first draw σ² from (26), then draw μ from (24). The posterior distribution of μ can be thought of as a mixture of normal distributions mixed over the scaled inverse chi-squared distribution for the variance, a rare case where analytic results are available.
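A minimal R sketch of this simulation scheme (the data vector is hypothetical); a scaled Inv-χ²(ν, s²) draw is ν s² divided by a χ²_ν draw:

set.seed(1)
y <- c(4.2, 5.1, 3.8, 4.9, 5.4, 4.4)                 # hypothetical data
n <- length(y); ybar <- mean(y); s2 <- var(y)
N <- 10000
sigma2 <- (n - 1) * s2 / rchisq(N, df = n - 1)        # draws of sigma^2 from (26)
mu     <- rnorm(N, mean = ybar, sd = sqrt(sigma2 / n))# draws of mu from (24), given sigma^2
quantile(mu, c(.025, .5, .975))                       # matches the t_{n-1}(ybar, s^2/n) result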

Normal and multiparameter models 106/ 258

Performing the integration

We start by integrating the joint posterior density over σ²:

p(μ|y) = ∫₀^∞ p(μ, σ²|y) dσ²

This can be evaluated using the substitution

z = A/(2σ²),  where A = (n − 1)s² + n(μ − ȳ)².

Normal and multiparameter models 107/ 258

Marginal posterior distribution of μ

We recognise (!) that the result is an unnormalized gamma integral:

p(μ|y) ∝ A^(−n/2) ∫₀^∞ z^((n−2)/2) exp(−z) dz
       ∝ [ (n − 1)s² + n(μ − ȳ)² ]^(−n/2)
       ∝ [ 1 + n(μ − ȳ)²/((n − 1)s²) ]^(−n/2).

This is the t_(n−1)(ȳ, s²/n) density.

Normal and multiparameter models 108/ 258

Marginal posterior distribution of μ

Equivalently, under the noninformative uniform prior distribution on (μ, log σ), the posterior distribution of μ satisfies

(μ − ȳ)/(s/√n) | y ∼ t_(n−1),

where t_(n−1) is the standard Student-t density (location 0, scale 1) with n − 1 degrees of freedom.

Normal and multiparameter models 109/ 258

Comparing the sampling theory

Again it is useful to compare with the sampling theory: under the sampling distribution p(y|μ, σ²),

(ȳ − μ)/(s/√n) | μ, σ² ∼ t_(n−1).

The ratio (ȳ − μ)/(s/√n) is called a pivotal quantity: its sampling distribution does not depend on the nuisance parameter σ², and its posterior distribution does not depend on the data (this helps in sampling-theory inference by eliminating difficulties associated with the nuisance parameter σ²).

Normal and multiparameter models 110/ 258

L6: Multiparameter models example: Bioassay experiment
Tuesday 14th August 2012, afternoon

Lyle Gurrin

Bayesian Data Analysis 13 – 17 August 2012, Copenhagen

Multiparameter models Few multiparameter sampling models allow explicit calculation of the posterior distribution. Data analysis for such models is usually achieved with simulation (especially MCMC methods). We will illustrate with a nonconjugate model for data from a bioassay experiment using a two-parameter generalised linear model.

Multiparameter models example: Bioassay experiment 111/ 258

Scientific problem

In drug development, acute toxicity tests are performed in animals. Various dose levels of the compound are administered to batches of animals. The animals' responses are typically characterised by a binary outcome: alive or dead, tumour or no tumour, response or no response, etc.

Multiparameter models example: Bioassay experiment 112/ 258 Data Structure Such an experiment gives rise to data of the form

(xi, ni, yi);  i = 1, . . . , k   (27)

where
xi is the i-th dose level (i = 1, . . . , k),
ni is the number of animals given the i-th dose level,
yi is the number of animals with a positive outcome (tumour, death, response).

Multiparameter models example: Bioassay experiment 113/ 258

Example Data

For the example data, twenty animals were tested, five at each of four dose levels.

Dose, xi (log g/ml)   Number of animals, ni   Number of deaths, yi
      −0.863                    5                       0
      −0.296                    5                       1
      −0.053                    5                       3
       0.727                    5                       5

Racine A, Grieve AP, Fluhler H, Smith AFM (1986). Bayesian methods in practice: experiences in the pharmaceutical industry (with discussion). Applied Statistics 35, 93-150.

Multiparameter models example: Bioassay experiment 114/ 258

Sampling model at each dose level

Within dosage level i: the animals are assumed to be exchangeable (there is no information to distinguish among them). We model the outcomes as independent given the same probability of death θi, which leads to the familiar binomial sampling model:

yi|θi ∼ Bin(ni, θi)   (28)

Multiparameter models example: Bioassay experiment 115/ 258 Setting up a model across dose levels Modelling the response at several dosage levels requires a relationship between the θi’s and xi’s.

We start by assuming that each θi is an independent parameter. We relax this assumption tomorrow when we develop hierarchical models.

There are many possibilities for relating the θi’s to the xi’s, but a popular and reasonable choice is a logistic regression model:

logit(θi)=log(θi/(1 − θi)) = α + βxi (29)

Multiparameter models example: Bioassay experiment 116/ 258

Setting up a model across dose levels

We present an analysis based on a prior distribution for (α, β) that is independent and locally uniform in the two parameters, that is, p(α, β) ∝ 1, so an improper "noninformative" distribution. We need to check that the posterior distribution is proper (details not shown).

Multiparameter models example: Bioassay experiment 117/ 258

Describing the posterior distribution

The form of the posterior distribution:

p(α, β|y) ∝ p(α, β) p(y|α, β)
          ∝ Π_{i=1}^{k} [ e^(α+βxi) / (1 + e^(α+βxi)) ]^(yi) [ 1 / (1 + e^(α+βxi)) ]^(ni−yi)

One approach would be to use a normal approximation centered at the posterior mode (α̃ = 0.87, β̃ = 7.91). This is similar to the classical approach of obtaining maximum likelihood estimates (e.g. by running glm in R). Asymptotic standard errors can be obtained via ML theory.

Multiparameter models example: Bioassay experiment 118/ 258

Bioassay graph

[Figure: contour plot of the posterior density of the parameters (alpha, beta)]

Multiparameter models example: Bioassay experiment 119/ 258

Discrete approx. to the post. density (1)

We illustrate computing the joint posterior distribution for (α, β) at a grid of points in 2 dimensions:

1. We begin with a rough estimate of the parameters.

• Since logit(E(yi/ni)) = α + βxi, we obtain rough estimates of α and β using a linear regression of logit(yi/ni) on xi.
• Set y1 = 0.5, y4 = 4.5 to enable the calculation.
• α̂ = 0.1, β̂ = 2.9 (standard errors 0.3 and 0.5).

Multiparameter models example: Bioassay experiment 120/ 258

Discrete approx. to the post. density (2)

2. Evaluate the posterior on a 200 × 200 grid; use the range [−5, 10] × [−10, 40].
3. Use R to produce a contour plot (lines of equal posterior density).
4. Renormalize on the grid so that Σ_α Σ_β p(α, β|y) = 1 (i.e., create a discrete approximation to the posterior).
5. Sample from the marginal distribution of one parameter, p(α|y) = Σ_β p(α, β|y).

Multiparameter models example: Bioassay experiment 121/ 258

Discrete approx. to the post. density (3)

6. Sample from the conditional distribution of the second parameter, p(β|α, y).
7. We can improve the sampling slightly by drawing from a linear interpolation between grid points (a rough R sketch of steps 2-6 is given below).

Alternative: exact posterior using advanced computation (methods covered later)
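A minimal R sketch of the grid computation referred to above (a coarser 100 × 100 grid is used here to keep it fast; everything else follows the data and grid ranges given on the slides):

x <- c(-0.863, -0.296, -0.053, 0.727)
n <- c(5, 5, 5, 5)
y <- c(0, 1, 3, 5)
alpha <- seq(-5, 10, length = 100)
beta  <- seq(-10, 40, length = 100)
logpost <- function(a, b) {                       # log posterior up to a constant (flat prior)
  theta <- plogis(a + b * x)                      # inverse logit
  sum(dbinom(y, size = n, prob = theta, log = TRUE))
}
lp <- outer(alpha, beta, Vectorize(logpost))
post <- exp(lp - max(lp)); post <- post / sum(post)   # renormalize on the grid
contour(alpha, beta, post, xlab = "alpha", ylab = "beta")
set.seed(1)
idx <- sample(length(post), 1000, replace = TRUE, prob = post)  # draws from the discrete approx.
a.draw <- alpha[row(post)[idx]]; b.draw <- beta[col(post)[idx]]
mean(b.draw > 0)                                  # Pr(beta > 0 | y)
quantile(-a.draw / b.draw, c(.025, .5, .975))     # posterior summary of LD50 = -alpha/beta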

Multiparameter models example: Bioassay experiment 122/ 258

Posterior inference

Quantities of interest:

• the parameters (α, β);
• LD50 = the dose at which Pr(death) is 0.5, i.e. LD50 = −α/β.
  – This is meaningless if β ≤ 0 (substance not harmful).
  – We perform inference in two steps: (i) Pr(β > 0|y); (ii) the posterior distribution of LD50 conditional on β > 0.

Multiparameter models example: Bioassay experiment 123/ 258

Results

We take 1000 simulation draws of (α, β) from the grid (a different posterior sample than the results in the book). Note that β > 0 for all 1000 draws.

Summary of the posterior distribution (posterior quantiles):

        2.5%    25%    50%    75%   97.5%
α       −0.6    0.6    1.3    2.0     4.1
β        3.5    7.5   11.0   15.2    26.0
LD50   −0.28  −0.16  −0.11  −0.06    0.12

Multiparameter models example: Bioassay experiment 124/ 258

Lessons from simple examples

The lack of multiparameter models with explicit posterior distributions is not necessarily a barrier to analysis. We can use simulation, maybe after replacing sophisticated models with hierarchical or conditional models (possibly invoking a normal approximation in some cases).

Multiparameter models example: Bioassay experiment 125/ 258

The four steps of Bayesian inference

1. Write the likelihood p(y|θ).
2. Generate the posterior as p(θ|y) ∝ p(θ)p(y|θ), by including well formulated information in p(θ) or else using p(θ) = constant.
3. Get crude estimates for θ as a starting point or for comparison.
4. Draw simulations θ^1, θ^2, . . . , θ^L (summaries for inference) and predictions ỹ^1, ỹ^2, . . . , ỹ^K for each θ^l.

Multiparameter models example: Bioassay experiment 126/ 258

Laplace approximations and the INLA approach – an alternative to MCMC

Søren Højsgaard Department of Mathematical Sciences Aalborg University, Denmark August 13, 2012

Contents

1 Introduction

2 The Laplace approximation
  2.1 Example: Approximating a χ²_k distribution
  2.2 Summary

3 The INLA approach
  3.1 Linear regression with INLA
  3.2 Random regression model with INLA

4 Summary

1 Introduction

Recall the setting in a Bayesian analysis: We wish to find the posterior distribution of θ given data,

π∗(θ) = p(y|θ)π(θ) / p(y)

where p(y) = ∫ p(y|θ)π(θ)dθ is the normalizing constant, which is usually intractable to calculate.

• MCMC methods can be difficult to work with in practice.

• An alternative approach to obtaining an approximation to the posterior is by the Laplace approximation.

• The Laplace approximation is implemented in the INLA program for a wide range of models.

• We shall just outline the ideas behind Laplace approximations and show a few ex- amples of using INLA. No detailed treatment!

2 The Laplace approximation

• The setting is that we have a density p(x) = k(x)/c which is only known up to proportionality. That is, k(x) is known but c is unknown (but given as c = ∫ k(x)dx).

• Example: k(x) = x^(k/2−1) exp(−x/2). Can we then find an approximate value for c? Can we find a distribution p̃(x) which is a good approximation to p(x)?

• In fact k(x) is the central part of the density of a χ²_k distribution, and the normalizing constant is c = 2^(k/2) Γ(k/2). But let us assume that we do not know that.

• We approximate p(x) by a normal distribution, so we need to find the mean and variance of this approximating distribution.

• Idea: The log of a normal density is a quadratic function so we approximate log k(x) by a quadratic function.

We get

log k(x) ≈ log k(x∗) + Dx log k(x∗)(x − x∗) + (1/2) D²xx log k(x∗)(x − x∗)²

As x∗ we choose the mode of k(x) (assume that there is only one). In this case Dx log k(x∗) = 0, so we get

log k(x) ≈ log k(x∗) + (1/2) D²xx log k(x∗)(x − x∗)²

Let σ² = −1 / D²xx log k(x∗). Inserting this gives

log k(x) ≈ log k(x∗) − (x − x∗)²/(2σ²)

So log k(x) has the form of the log of a normal N(x∗, σ²) density. This means that p(x) is approximated by a N(x∗, σ²) density. Now exponentiate and integrate:

∫ exp(log k(x)) dx ≈ exp(log k(x∗)) ∫ exp(−(x − x∗)²/(2σ²)) dx

c = ∫ k(x)dx ≈ exp(log k(x∗)) √(2πσ²)

Put differently,

p(x) = k(x)/c ≈ [ k(x∗) exp(−(x − x∗)²/(2σ²)) ] / [ k(x∗) √(2πσ²) ] = exp(−(x − x∗)²/(2σ²)) / √(2πσ²)

2.1 Example: Approximating a χ²_k distribution

Suppose we were only given that p(x) = k(x)/c with k(x) = x^(k/2−1) exp(−x/2).
We have log k(x) = (k/2 − 1) log x − x/2 and

Dx log k(x) = (k/2 − 1)x^(−1) − 1/2,    D²xx log k(x) = −(k/2 − 1)x^(−2)

Solving Dx log k(x) = 0 gives x∗ = k − 2.
Inserting x∗ = k − 2 gives σ² = −1 / D²xx log k(x∗) = 2(k − 2).
Hence p(x) ≈ p̃(x) = N(k − 2, 2(k − 2)).

Notice: If X ∼ χ²_k then E(X) = k and Var(X) = 2k. Moreover, X has the same distribution as a sum of k independent squared N(0, 1) variables, so by the central limit theorem X is approximately N(k, 2k). Hence the result above is not too surprising.

R> library(gplots)
R> par(mfrow=c(2,2), mar=c(2,2,1,1))
R> laplace <- function(kk){
+    xx <- seq(0, kk+3*sqrt(2*kk), .1)
+    dd1 <- dchisq(xx, df=kk)
+    dd2 <- dnorm(xx, mean=kk-2, sd=sqrt(2*(kk-2)))
+    dd3 <- dnorm(xx, mean=kk, sd=sqrt(2*kk))
+    ylim <- c(0, max(c(dd1,dd2,dd3)*1.1))
+    plot(xx, dd1, type='l', ylim=ylim, lwd=2, main=sprintf("k=%s",kk))
+    lines(xx, dd2, col=2, lty=2, lwd=2)
+    lines(xx, dd3, col=3, lty=3, lwd=2)
+    smartlegend(x = 'right', y = 'top',
+      legend = c("p(x)", "N(k-2,2(k-2))", "N(k,2k)"), col = 1:3, lty = 1:3)
+  }
R> laplace(kk=3)
R> laplace(kk=6)
R> laplace(kk=10)
R> laplace(kk=20)

[Figure: the χ²_k density p(x) compared with the Laplace approximation N(k − 2, 2(k − 2)) and the CLT approximation N(k, 2k), for k = 3, 6, 10 and 20]

2.2 Summary If k(x) can be evaluated easily (not always the case), the scheme above can be automated:

1. Use a numerical optimizer to find the mode of k(x).

2. Use a numerical device to calculate the Hessian matrix.

In a Bayesian setting we have k(θ) = p(y|θ)π(θ) and the approach above is – in principle – directly applicable.
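A minimal R sketch of this automation, reusing the χ²_k example with k = 6 (the use of optim() here is an illustration of the scheme, not the INLA implementation):

k <- function(x, kk = 6) x^(kk/2 - 1) * exp(-x / 2)      # unnormalized chi-square(6) density
fit <- optim(par = 1, fn = function(x) -log(k(x)), method = "L-BFGS-B",
             lower = 1e-6, hessian = TRUE)               # numerical mode and Hessian
x.star   <- fit$par                                      # mode, here k - 2 = 4
sigma2   <- 1 / fit$hessian[1, 1]                        # minus inverse second derivative of log k
c.approx <- k(x.star) * sqrt(2 * pi * sigma2)            # approximate normalizing constant
c(x.star, sigma2, c.approx, 2^(6/2) * gamma(6/2))        # compare with the exact c = 2^(k/2) Gamma(k/2)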

3 The INLA approach

• The difficult part arises when there are latent variables: In a Bayesian setting, if y = (yo, yl) where yo is observed and yl is latent (missing), then the likelihood is obtained by integration:

  L(θ) = p(yo|θ) = ∫ p(yo, yl|θ) dyl

• INLA (integrated nested Laplace approximation) handles a quite large class of latent variable models. There is a price though: INLA only produces marginal posteriors.

3.1 Linear regression with INLA Paired measurements of speed and stopping distance.

R> head(cars)

  speed dist speed2
1     4    2     16
2     4   10     16
3     7    4     49
4     7   22     49
5     8   16     64
6     9   10     81

Physical theory suggests that dist ∝ speed².

R> plot(dist~speed, data=cars)
R> lines(fitted(lm(dist~I(speed^2),data=cars))~speed, data=cars)

5 ● 120 100 ●

● ● ● 80 ● ● ● ● ● ● 60 dist ● ● ● ● ● ● ● ● ● ● ● ● 40 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

20 ● ● ● ● ● ● ● ● 0

5 10 15 20 25

speed

6 R> m1 <- lm(dist~speed, data=cars) R> summary(m1)

Call:
lm(formula = dist ~ speed, data = cars)

Residuals:
    Min      1Q  Median      3Q     Max
-29.069  -9.525  -2.272   9.215  43.201

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -17.5791     6.7584  -2.601   0.0123 *
speed         3.9324     0.4155   9.464 1.49e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 15.38 on 48 degrees of freedom
Multiple R-squared: 0.6511, Adjusted R-squared: 0.6438
F-statistic: 89.57 on 1 and 48 DF, p-value: 1.49e-12

R> library(INLA)
R> m2 <- inla(dist~speed, data=cars)
R> summary(m2)

Call:
"inla(formula = dist ~ speed, data = cars)"

Time used:
 Pre-processing   Running inla   Post-processing   Total
 0.10800600       0.11300707     0.04600191        0.26701498

Fixed effects:
                   mean        sd  0.025quant   0.5quant  0.975quant  kld
(Intercept) -17.568865  6.669401  -30.716658  -17.56905   -4.414572    0
speed         3.931745  0.410035    3.123373    3.93175    4.740422    0

The model has no random effects

Model hyperparameters:
                                            mean        sd  0.025quant  0.5quant  0.975quant
Precision for the Gaussian observations 0.0043932 0.0008806   0.0028843 0.0043217   0.0063242

Expected number of effective parameters(std dev): 2.00(2.803e-05)
Number of equivalent replicates : 25.00

Marginal Likelihood: -220.91
Warning: Interpret the marginal likelihood with care if the prior model is improper.

Parameter estimates with lm() and inla() approaches are similar here. One thing to notice is the random noise

R> summary(m1)$sigma^2

[1] 236.5317

R> summary(m2)$hyperpar[1,1]

[1] 0.004393164

In lm() the normal distribution is parameterized as N(·, σ²) with σ² being the variance. In inla(), the normal distribution is parameterized as N(·, τ) with τ being the precision, i.e. τ = 1/σ².

R> 1/summary(m1)$sigma^2

[1] 0.004227763

R> m1 <- lm(dist~speed+I(speed^2), data=cars)
R> cars <- transform(cars, speed2=speed^2)
R> m2 <- inla(dist~speed+speed2, data=cars)
R> coef(summary(m1))

              Estimate   Std. Error    t value   Pr(>|t|)
(Intercept)  2.4701378  14.81716473  0.1667079  0.8683151
speed        0.9132876   2.03422044  0.4489620  0.6555224
I(speed^2)   0.0999593   0.06596821  1.5152647  0.1364024

R> summary(m2)$fixed

                  mean           sd    0.025quant   0.5quant  0.975quant  kld
(Intercept)  2.4957912  14.58413029  -26.25568570  2.4953382  31.2618287    0
speed        0.9096222   2.00190544   -3.03722660  0.9096534   4.8579391    0
speed2       0.1000757   0.06492574   -0.02792013  0.1000737   0.2281365    0

3.2 Random regression model with INLA

R> fetal <- read.csv("http://BendixCarstensen.com/Bayes/Cph-2012/data/fetal.csv", header=TRUE)
R> library(lattice)
R> xyplot(hc~tga, groups=id, data=fetal, type='l', lty=2)

8 400

350

300

250 hc

200

150

100

14 16 18 20 tga

R> library(lme4) R> library(INLA) R> ## Random intercept + random slope (uncorrelated) R> ## lmer() version R> mm1 <- lmer( sqrt(hc) ~ tga + (1|id) + (tga-1|id), data=fetal ) R> ## inla() version R> fetal <- transform(fetal, tgac=tga) R> im1 <- inla(sqrt(hc)~tga + f(id)+f(tgac), data=fetal)

4 Summary

• MCMC methodology is applicable to a very large class of statistical models,

9 • But: Getting MCMC based methods up and running may take a long time. Once they are up and running it takes time to do the analysis.

• Then, if we are not happy with the model, we change the model and start over again...

• For a more narrow (but still rich) class of models, INLA provides an alternative.

• INLA is based on integrated Laplace approximations. Model specification with INLA is relatively easy. Model fitting is fast.

• With INLA, there is a price to pay for these benefits: We only get the marginal posteriors of each parameter; there is no (direct) possibility of getting the joint posterior. That is, if θ = (θ1, θ2) is two dimensional, then we obtain the posteriors π∗(θ1) and π∗(θ2), but we do not get π∗(θ1, θ2).

• Moreover, the documentation of INLA is somewhat rough but will surely improve over the years to come. The URL to follow is http://www.r-inla.org/

L8: Assessing Convergence
Wednesday 15th August 2012, morning

Lyle Gurrin

Bayesian Data Analysis 13 – 17 August 2012, Copenhagen

Inference from iterative simulation Basic method of inference from iterative simulation: Use the collection of all simulated draws from the posterior distribution p(θ|y) to summarise the posterior density and to compute quantiles, moments etc. Posterior predictive simulations of unobserved outcomes y˜ can be obtained by simulation conditional on the drawn values of θ. Inference using iterative simulation draws does, however, require care...

Assessing convergence 128/ 258

Difficulties with iterative simulation 1. Too few iterations generate simulations that are not representative of the target distribution. Even at convergence the early iterations are still influenced by the starting values. 2. Within-sequence correlation: Inference based on simulations from correlated draws will be less precise than those using the same number of independent draws.

Assessing convergence 129/ 258 Within-sequence correlation Serial correlation in the simulations is not necessarily a problem since:

 At convergence the draws are all identically distributed as p(θ|y).  We ignore the order of simulation draws for summary and inference.

But correlation causes inefficiencies, reducing the effective number of simulation draws.

Assessing convergence 130/ 258

Within-sequence correlation Should we thin the sequences by keeping every kth simulation draw and discarding the rest? Useful to skip iterations in problems with a large number of parameters (to save computer storage) or built-in serial correlation due to restricted jumping/proposal distributions. Thinned sequences treated in the same way for summary and inference.

Assessing convergence 131/ 258

Challenges of iterative simulation The Markov chain must be started somewhere, and initial values must be selected for the unknown parameters. In theory the choice of initial values will have no influence on the eventual samples from the Markov chain. In practice convergence will be improved and numerical problems avoided if reasonable initial values can be chosen.

Assessing convergence 132/ 258 Diagnosing convergence It is generally accepted that the only way to diagnose convergence is to

1. Run multiple chains from a diverse set of initial parameter values.
2. Use formal diagnostics to check whether the chains, up to expected chance variability, come from the same equilibrium distribution, which is assumed to be the posterior of interest.

Assessing convergence 133/ 258

Diagnosing convergence

Checking whether the values sampled from a Markov chain (possibly with many dimensions) have converged to the equilibrium distribution is not straightforward. Lack of convergence might be diagnosed simply by observing erratic behaviour of the sampled values... but a steady trajectory does not necessarily mean that the chain is sampling from the correct posterior distribution: is it stuck in a particular area of the parameter space? Is this a result of the choice of initial values?

Assessing convergence 134/ 258

Handling iterative simulations A strategy for inference from iterative simulations:

1. Simulate multiple sequences with starting points dispersed throughout the sample space.
2. Monitor the convergence of all quantities of interest by comparing variation between and within simulated sequences until these are almost equal.
3. If no convergence, then alter the algorithm.
4. Discard a burn-in (and/or thin) the simulation sequences prior to inference.

Assessing convergence 135/ 258

Discarding early iterations as a burn-in

Discarding early iterations, known as burn-in, can reduce the effect of the starting distribution. Simulated values of θ^t, for large enough t, should be close to the target distribution p(θ|y). Depending on the context, different burn-in fractions can be appropriate. For any reasonable number of simulations, discarding the first half is a conservative choice.

Assessing convergence 136/ 258

Formally assessing convergence For overdispersed starting points, the within-sequence variation will be much less than the between sequence variation. Once the sequences have mixed, the two variance components will be almost equal.

Assessing convergence 137/ 258

[Figures (slides 138-142): scatter plots of theta1 against theta2 showing several simulated sequences, started from overdispersed points, gradually mixing over the target distribution]

Assessing convergence 138-142/ 258

Monitoring convergence using multiple chains

• Run several sequences in parallel.
• Calculate two estimates of the standard deviation (SD) of each component of (θ|y):
  – an underestimate from the SD within each sequence;
  – an overestimate from the SD of the mixture of sequences.
• Calculate the potential scale reduction factor:

R̂ = [mixture-of-sequences estimate of SD(θ|y)] / [within-sequence estimate of SD(θ|y)]

Assessing convergence 143/ 258

• Initially R̂ is large (use overdispersed starting points).
• At convergence, R̂ = 1 (each sequence has made a complete tour).
• Monitor R̂ for all parameters and quantities of interest; stop the simulations when they are all near 1 (e.g. below 1.2).
• At approximate convergence, simulation noise ("MCMC error") is minor compared to the posterior uncertainty about θ.

Assessing convergence 144/ 258

Monitoring scalar estimands

Monitor each scalar estimand and other scalar quantities of interest separately. We may also monitor the log of the posterior density (which is computed as part of the Metropolis algorithm). Since assessing convergence is based on means and variances, it is sensible to transform scalar estimands to be approximately normally distributed.

Assessing convergence 145/ 258

Monitoring convergence of each scalar estimand Suppose we've simulated m parallel sequences or chains, each of length n (after discarding the burn-in). For each scalar estimand ψ we label the simulation draws as ψij (i = 1, 2, …, n; j = 1, 2, …, m), and we compute B and W, the between- and within-sequence variances:

Assessing convergence 146/ 258

Between- and within-sequence variation Between-sequence variation:

B = (n/(m − 1)) Σ_{j=1}^m (ψ̄.j − ψ̄..)², where ψ̄.j = (1/n) Σ_{i=1}^n ψij and ψ̄.. = (1/m) Σ_{j=1}^m ψ̄.j

Within-sequence variation:

W = (1/m) Σ_{j=1}^m sj², where sj² = (1/(n − 1)) Σ_{i=1}^n (ψij − ψ̄.j)²

Assessing convergence 147/ 258 Estimating the marginal posterior variance We can estimate var(ψ|y), the marginal posterior variance of the estimand using a weighted average of B and W :

var⁺(ψ|y) = ((n − 1)/n) W + (1/n) B

This overestimates the posterior variance assuming an overdispersed starting distribution, but is unbiased under stationarity (start with the target distribution) or in the limit as n → ∞.

Assessing convergence 148/ 258

Estimating the marginal posterior variance For finite n, the within-sequence variance will be an underestimate of var(ψ|y) because individual sequences will not have ranged over the target distribution and will be less variable. In the limit the expected value of W approaches var(ψ|y).

Assessing convergence 149/ 258

The scale reduction factor We monitor convergence of the iterative simulation by estimating the factor R̂ by which the scale of the current distribution for ψ might be reduced in the limit as the number of iterations n → ∞:

R̂ = sqrt( var⁺(ψ|y) / W ),

which declines to 1 as n →∞. If R and hence the potential scale reduction is high, further simulations may improve inference about the target distribution of the estimand ψ.
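As a hedged illustration (not part of the course material), the calculation above can be written in a few lines of R; the function assumes sims is an n × m matrix holding the n retained draws of one scalar estimand ψ from each of m chains.

# Potential scale reduction factor for one scalar estimand.
# sims: n x m matrix (rows = iterations after burn-in, columns = chains)
rhat <- function(sims) {
  n <- nrow(sims); m <- ncol(sims)
  psi.bar.j <- colMeans(sims)               # per-chain means
  B <- n * var(psi.bar.j)                   # between-sequence variance
  W <- mean(apply(sims, 2, var))            # within-sequence variance
  var.plus <- (n - 1) / n * W + B / n       # estimate of var(psi | y)
  sqrt(var.plus / W)                        # R-hat
}

In practice the same diagnostic is usually obtained from the coda package (gelman.diag()), which is how JAGS output is typically monitored.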

Assessing convergence 150/ 258 The scale reduction factor It is straightforward to calculate R̂ for all scalar estimands of interest, and it can be accessed in JAGS using routines in CODA. The condition of R̂ being "near" 1 depends on the problem at hand; values below 1.1 are usually acceptable. This avoids the need to examine time-series graphs etc. Simulations may still be far from convergence if some areas of the target distribution were not well captured by the starting values and are "hard to reach".

Assessing convergence 151/ 258

The effective sample size If the mn simulation draws were truly independent, then the between-sequence variance B would be an unbiased estimate of var(ψ|y), and we would have mn independent simulations from the m chains. If simulations of ψ within each sequence are autocorrelated, B will be larger (in expectation) than var(ψ|y). Define the effective number of independent draws as:

n_eff = mn · var⁺(ψ|y) / B
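Continuing the illustrative sketch above, the effective number of independent draws follows directly from B and var⁺(ψ|y):

# Effective sample size for one scalar estimand, from the same n x m matrix of draws.
n.eff <- function(sims) {
  n <- nrow(sims); m <- ncol(sims)
  B <- n * var(colMeans(sims))
  W <- mean(apply(sims, 2, var))
  var.plus <- (n - 1) / n * W + B / n
  m * n * var.plus / B                      # n_eff = m * n * var+ / B
}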

Assessing convergence 152/ 258 L9: Hierarchical models Wednesday 15th August 2012, afternoon

Lyle Gurrin

Bayesian Data Analysis 13 – 17 August 2012, Copenhagen Bioassay example continued Let's return to the simple bioassay example: A single (α, β) may be inadequate to fit a combined data set (several experiments). Imagine repeated bioassays with the same compound, where the (αj, βj) are the parameters from the different bioassays.

Separate unrelated (αj,βj) are likely to “overfit” data (only 4 points in each data set). Information about the parameters of one bioassay can be obtained from others’ data.

Hierarchical models 153/ 258

Hierarchical models A natural prior distribution arises by assuming the (αj,βj)’s are a sample from a common population distribution. We’d be better off estimating the parameters governing the population distribution of (αj,βj) rather than each (αj,βj) separately.

Hierarchical models 154/ 258

Hierarchical models To do this we introduce new parameters that govern the population distribution, called hyperparameters. The distribution of observed outcomes are conditional on parameters which themselves have a probability specification, known as a hierarchical or multilevel model.

Hierarchical models 155/ 258 Specifying a hierarchical model Joint posterior distribution for the parameters (θ) and hyperparameters (φ):

p(θ, φ|y) ∝ p(θ, φ) p(y|θ, φ)
          ∝ p(θ, φ) p(y|θ)        (y independent of φ given θ)
          ∝ p(φ) p(θ|φ) p(y|θ),

Computation is often carried out in two steps: 1. Inference for θ as if we knew φ using the posterior conditional distribution p(θ|y, φ). 2. Inference for φ based on posterior marginal distribution p(φ|y).

Hierarchical models 156/ 258

Specifying a hierarchical model The model is specified in nested stages:

• p(y|θ) = the sampling distribution of the data.
• p(θ|φ) = the population distribution for θ given φ.
• p(φ) = the prior distribution for φ.
• More levels are possible.

The hyperprior distribution at the highest level is often chosen to be noninformative.

Hierarchical models 157/ 258

Example 1: Meta-analysis of clinical trials Reference: Gelman et al., 5.6 & 19.4 Spiegelhalter et al., 3.17 Meta-analysis aims to summarise and integrate findings from research studies addressing the same scientific question. It involves combining information from several parallel data sources, so is closely connected to hierarchical modelling. There are well known frequentist methods too. We’ll reinforce some of the concepts of hierarchical modelling in a meta-analysis of clinical trials data.

Hierarchical models 158/ 258 MgSO4 for MI Our data come from 8 randomised controlled clinical trials, each with two groups of heart attack patients receiving (or not) intravenous magnesium sulfate. The frequency of mortality due to heart attack range from 1% to 17%, and samples sizes from less than 50 to more than 2300.

Hierarchical models 159/ 258

Trial          Magnesium group      Control group
               deaths  patients     deaths  patients
Morton              1        40          2        36
Rasmussen           9       135         23       135
Smith               2       200          7       200
Abraham             1        48          1        46
Feldstedt          10       150          8       148
Schechter           1        59          9        56
Ceremuzynski        1        25          3        23
LIMIT-2            90      1159        118      1157

Hierarchical models 160/ 258

Normal approximation to the likelihood Here we use the results of one analytic approach to produce a point estimate of the log odds ratio and an asymptotic standard error that can be regarded as approximately a normal mean and standard deviation. We use the notation yj and σj² to be consistent with the earlier lecture.

Hierarchical models 161/ 258 More data for the MgSO4 example

Trial          Magnesium group      Control group       Estimated      Estimated
               deaths  patients     deaths  patients     log(OR) yj     SD sj
Morton              1        40          2        36        −0.83          1.25
Rasmussen           9       135         23       135        −1.06          0.41
Smith               2       200          7       200        −1.28          0.81
Abraham             1        48          1        46        −0.04          1.43
Feldstedt          10       150          8       148         0.22          0.49
Schechter           1        59          9        56        −2.41          1.07
Ceremuzynski        1        25          3        23        −1.28          1.19
LIMIT-2            90      1159        118      1157        −0.30          0.15

Hierarchical models 162/ 258

A hierarchical model: Stage 1 The first stage of the hierarchical model assumes that:

yj | θj, σj² ∼ N(θj, σj²),   (30)

The simplification of known variances is reasonable with large sample sizes (but see the online examples that use the “true” binomial sampling distribution).

Hierarchical models 163/ 258

Possible assumptions about the θj's

1. Studies are identical replications, so θj = μ for all j (no heterogeneity)... or
2. No comparability between studies, so that each study provides no information about the other (complete heterogeneity)... or
3. Studies are exchangeable but not identical or completely unrelated, so a compromise between 1 and 2.

Hierarchical models 164/ 258 A hierarchical model: Stage 2 The second stage of the hierarchical model assumes that the trial means θj are exchangeable with a normal distribution

θj ∼ N(μ, τ²).   (31)

Hierarchical models 165/ 258

A hierarchical model: Stage 2 If μ and τ² are fixed and known, then the conditional posterior distributions of the θj's are independent, and

θj | μ, τ, y ∼ N(θ̂j, Vj), where

θ̂j = ( yj/σj² + μ/τ² ) / ( 1/σj² + 1/τ² )   and   Vj = 1 / ( 1/σj² + 1/τ² ).

Note that the posterior mean is a precision-weighted average of the prior population mean and the observed yj representing the treatment effect in the jth group. Hierarchical models 166/ 258

Posterior distn’s for the θj’s given y, μ, τ

The expression for the posterior distribution of θj can be rearranged as

θj | yj ∼ N( Bj μ + (1 − Bj) yj, (1 − Bj) σj² )   (32)

where Bj = σj²/(σj² + τ²) is the weight given to the prior mean. Ignoring data from the other trials is equivalent to setting τ² = ∞, that is, Bj = 0. The classical pooled result arises from τ² → 0, that is, Bj = 1.
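To make the shrinkage concrete, here is a hedged R sketch using the yj and sj values from the table above; the values of μ and τ are plugged in purely for illustration (they are not the estimates derived later in the lecture).

y   <- c(-0.83, -1.06, -1.28, -0.04, 0.22, -2.41, -1.28, -0.30)   # log(OR) estimates
s   <- c( 1.25,  0.41,  0.81,  1.43, 0.49,  1.07,  1.19,  0.15)   # their SDs
mu  <- -0.5    # illustrative population mean
tau <-  0.35   # illustrative between-trial SD
B     <- s^2 / (s^2 + tau^2)        # weight given to the prior mean
theta <- B * mu + (1 - B) * y       # posterior means of the theta_j
V     <- (1 - B) * s^2              # posterior variances
round(cbind(B, theta, V), 2)

Small trials (large sj) are shrunk heavily towards μ, while a large trial such as LIMIT-2 (sj = 0.15) is shrunk very little.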

Hierarchical models 167/ 258 Conditional prior distribution for μ A uniform conditional prior distribution p(μ|τ)=1 for μ leads to the following posterior distribution:

μ | τ, y ∼ N(μ̂, Vμ)

where μ̂ is the precision-weighted average of the yj values, and Vμ⁻¹ is the total precision:

μ̂ = ( Σ_{j=1}^J yj/(σj² + τ²) ) / ( Σ_{j=1}^J 1/(σj² + τ²) )   and   Vμ⁻¹ = Σ_{j=1}^J 1/(σj² + τ²).

τ² → 0 gives the classical result where Bj = 1.

Hierarchical models 168/ 258

The exchangeable model and shrinkage The exchangeable model therefore leads to narrower posterior intervals for the θj's than the "independence" model, but they are shrunk towards the prior mean response. The degree of shrinkage depends on the variability between studies, measured by τ², and the precision of the estimate of the treatment effect from the individual trial, measured by σj².

Hierarchical models 169/ 258

The full hierarchical model The hierarchical model is completed by specifying a prior distribution for τ - we’ll use the noninformative prior p(τ)=1. Nevertheless, p(τ|y) is a complicated function of τ:

p(τ|y) ∝ ( Π_{j=1}^J N(yj | μ̂, σj² + τ²) ) / N(μ̂ | μ̂, Vμ)
       ∝ Vμ^{1/2} Π_{j=1}^J (σj² + τ²)^{−1/2} exp( −(yj − μ̂)² / (2(σj² + τ²)) )

Hierarchical models 170/ 258 The profile likelihood for τ A tractable alternative to the marginal posterior distribution is the profile likelihood for τ, derived by replacing μ in the joint likelihood for μ and τ by its conditional maximum likelihood estimate μ̂(τ) given the value of τ. This summarises the support for different values of τ and is easily evaluated as

Π_{j=1}^J N(yj | μ̂(τ), σj² + τ²).
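A hedged sketch of evaluating this profile likelihood on a grid of τ values in R, reusing the yj and sj vectors above (the grid and the cut-off line are illustrative choices):

profile.loglik <- function(tau, y, s) {
  v     <- s^2 + tau^2
  muhat <- sum(y / v) / sum(1 / v)            # conditional MLE of mu given tau
  sum(dnorm(y, muhat, sqrt(v), log = TRUE))   # sum_j log N(yj | muhat(tau), sj^2 + tau^2)
}
tau.grid <- seq(0, 2, by = 0.01)
pl <- sapply(tau.grid, profile.loglik, y = y, s = s)
plot(tau.grid, pl - max(pl), type = "l",
     xlab = "tau", ylab = "Profile log(likelihood)")
abline(h = -qchisq(0.95, 1) / 2, lty = 2)     # roughly the -1.96^2/2 cut-off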

Hierarchical models 171/ 258

[Figure: profile log(likelihood) for τ (0 ≤ τ ≤ 2), shown together with the trial-specific log(OR) estimates and the overall estimate.]

Hierarchical models 172/ 258

Estimates of τ The maximum likelihood estimate is τ̂ = 0, although values of τ with a profile log(likelihood) above −1.96²/2 ≈ −2 might be considered as being reasonably supported by the data. τ̂ = 0 would not appear to be a robust choice as an estimate since non-zero values of τ, which are well supported by the data, can have a strong influence on the conclusions. We shall assume, for illustration, the method-of-moments estimator τ̂ = 0.35.

Hierarchical models 173/ 258 Results of the meta-analysis

Trial          Magnesium group      Control group       Estimated     Estimated   Shrinkage
               deaths  patients     deaths  patients     log(OR) yk    SD sk       Bk
Morton              1        40          2        36       −0.83         1.25        0.90
Rasmussen           9       135         23       135       −1.06         0.41        0.50
Smith               2       200          7       200       −1.28         0.81        0.80
Abraham             1        48          1        46       −0.04         1.43        0.92
Feldstedt          10       150          8       148        0.22         0.49        0.59
Schechter           1        59          9        56       −2.41         1.07        0.87
Ceremuzynski        1        25          3        23       −1.28         1.19        0.89
LIMIT-2            90      1159        118      1157       −0.30         0.15        0.11

Hierarchical models 174/ 258

[Figure: forest plot of mortality log(OR) (−3 to 1) for trials 1_Morton to 8_LIMIT-2 and the overall estimate (9_Overall), showing both fixed-effect and random-effect (shrunk) estimates; values to the left favour magnesium, values to the right favour placebo.]

Hierarchical models 175/ 258

Example 2: Statistical measures of fetal growth The fetal origins hypothesis proposes that fetal adaptation to an adverse intrauterine environment programs permanent physiological change. It is now acknowledged that there is an inverse relationship between low birthweight and subsequent elevated blood pressure (Huxley R. Lancet (2002)). We'll look at quantifying fetal growth using statistical summary measures derived from serial fetal biometric data.

Hierarchical models 176/ 258 Western Australian Pregnancy Cohort Study Subjects received routine ultrasound examination at 18 weeks gestation. Additional ultrasound examination at 24, 28, 34, 38 weeks gestation as part of a randomised trial of the safety of repeated antenatal ultrasound.

Hierarchical models 177/ 258

Criteria for growth data These data exclude multiples, infants born at less than 37 weeks gestation, or those with maternal or fetal disease. Required agreement within 7 days between gestational age based on the last menstrual period and the ultrasound examination at 18 weeks. 3450 ultrasound measurements of five fetal dimensions (BPD, OFD, HC, AC, FL) on 707 fetuses. We'll look at HC = head circumference.

Hierarchical models 178/ 258

Statistical modelling Yij is the measured head circumference for the i-th fetus at the j-th timepoint, tij = 18, 24, 28, 34, 38 weeks. The number of measurements on an individual fetus varies from 1 to 7. The aim is to model the relationship between head circumference Yij and gestational age tij.

Hierarchical models 179/ 258 Head Circumference

[Figure: head circumference (mm, 100–400) against gestational age (weeks, 15–45) for all fetuses, with the 10th, 50th and 90th centile curves.]

Hierarchical models 180/ 258

Modelling strategy We follow the methodology in Royston P. Stat. Med. (1995): Transform both sides of the regression equation to establish an approximately linear relationship between the transformed outcome Yij^(λ) and the transformed timescale g(tij). The longitudinal design suggests the use of a linear mixed model.

Hierarchical models 181/ 258

Transformation of Yij Use the familiar Box-Cox transformation:

Yij^(λ) = (Yij^λ − 1)/λ   if λ ≠ 0
        = log(Yij)        if λ = 0

We account for gestational age when choosing λ by fitting a cubic polynomial in time tij.

Hierarchical models 182/ 258 Transformed tij We assume that Yij^(λ) is linear in a second-degree fractional polynomial in tij (Royston P, Altman DG Appl. Stat. (1994)).

g(tij) = ξ0 + ξ1 tij^(p1) + ξ2 tij^(p2)

tij^(p1) = tij^p1     if p1 ≠ 0
         = log(tij)   if p1 = 0

If p1 = p2 then let tij^(p2) = tij^(p1) log(tij).

Hierarchical models 183/ 258

Fractional polynomials in tij

So g(tij) = ξ0 + ξ1 tij^(p1) + ξ2 tij^(p2). Select p1, p2 from {−3, −2, −1, −1/2, 0, 1/2, 1, 2, 3}. Use a grid search to find the p1, p2 providing the best fit to Yij^(λ). Estimate ξ1 and ξ2 using maximum likelihood, with separate intercepts for each subject.

Let Xij = tij^(p1) + (ξ2/ξ1) tij^(p2).

Hierarchical models 184/ 258

Transformations for head circumference For head circumference we find that λˆ =0.56 ≈ 0.50, equivalent to the square root transformation. We use a quadratic transformation of gestational age:

Xij = tij − 0.0116638 tij².
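A two-line sketch of the two transformations (the gestational ages and head circumferences below are made-up illustrative values; the constant 0.0116638 is the one quoted above):

t  <- c(18, 24, 28, 34, 38)          # gestational age (weeks)
hc <- c(155, 225, 270, 310, 330)     # illustrative head circumferences (mm)
Y  <- sqrt(hc)                       # lambda-hat = 0.56, rounded to the square root
X  <- t - 0.0116638 * t^2            # quadratic transformation of gestational age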

Hierarchical models 185/ 258 Simple linear model The simplest linear model would be:

Yij = β0 + β1Xij + εij,

where εij ∼ N(0, σε²).

Hierarchical models 186/ 258

Mixed linear model We can extend this to a mixed linear model by allowing subject-specific intercepts and gradients:

Yij =(β0 + u0i)+(β1 + u1i)Xij + εij,

where εij ∼ N(0, σε²) and

(u0i, u1i)′ ∼ N2( (0, 0)′, [ σ0²  σ01 ; σ01  σ1² ] )

with cov(εij,ui)=0.

Hierarchical models 187/ 258

References centiles The models can be used to derive reference centiles:

var(Yij) = var(u0i) + 2 cov(u0i, u1i) Xij + var(u1i) Xij² + var(εij)
         = σ0² + 2 σ01 Xij + σ1² Xij² + σε²

The variance of Yij (square root of head circumference) is quadratic in gestational age.
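As a hedged sketch, the fitted mean and this variance function can be turned into reference centiles and back-transformed to the mm scale by squaring; every parameter value below is an illustrative placeholder, not a fitted value from the study.

beta0 <- 2.0;  beta1 <- 1.0                   # fixed-effect intercept and slope (illustrative)
sig0sq <- 0.5; sig1sq <- 0.1; sig01 <- 0      # random-effect (co)variances (illustrative)
sigesq <- 0.2                                 # residual variance (illustrative)
X <- seq(14, 22, by = 0.5)                    # transformed gestational age
mu.Y  <- beta0 + beta1 * X                    # mean of sqrt(head circumference)
var.Y <- sig0sq + 2 * sig01 * X + sig1sq * X^2 + sigesq
centile <- function(p) (mu.Y + qnorm(p) * sqrt(var.Y))^2   # back to mm by squaring
matplot(X, cbind(centile(0.1), centile(0.5), centile(0.9)), type = "l",
        lty = c(2, 1, 2), xlab = "Transformed gestational age",
        ylab = "Head circumference (mm)")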

Hierarchical models 188/ 258 Head Circumference

[Figure: head circumference (mm, 100–400) against gestational age (weeks, 15–45) with the fitted 1st, 10th, 50th, 90th and 99th reference centiles.]

Hierarchical models 189/ 258

Measure of fetal growth We can also use the models to derive measures of growth or change in size: Yi1^(λ) and Yi2^(λ) are bivariate normal, so Yi2^(λ) given Yi1^(λ) is univariate normal. The "conditional Z-score" is

Z_{2|1} = ( Yi2^(λ) − E(Yi2^(λ) | Yi1^(λ)) ) / sqrt( var(Yi2^(λ) | Yi1^(λ)) )

We used measures at 38 weeks gestation conditional on the value at 18 weeks gestation, relating these to birthweight and subsequent blood pressure in childhood.

Hierarchical models 190/ 258

[Figures, slides 191–198: serial measurements for two example fetuses, shown as head circumference (mm) against gestational age (weeks) and as SQRT head circumference against transformed gestational age, with the subject-specific and population fitted lines and the 95% population limits.]

Hierarchical models 191–198/ 258

Fitting the model in JAGS The subject-specific index runs from i = 1 to 707.

u[i,1:2] ~ dmnorm((0,0), Omega.beta[,])

u[i,1] (= u0i) and u[i,2] (= u1i) are multivariate normally distributed.

mu.beta[1] is the fixed effects intercept β0. mu.beta[2] is the fixed effects gradient β1. Omega.beta[,] is the inverse of the random effects variance-covariance matrix.

Hierarchical models 199/ 258

Fitting the model in JAGS The observation-specific index runs from j = 1 to 3097.

mu[j] <- (mu.beta[1] + u[id[j],1]) + (mu.beta[2] + u[id[j],2])*X[j]

The observation-specific mean mu[j] uses id[j] as the first index of the random effect array u[1:707,1:2] to pick up the id number of the jth observation and select the correct row of the random effect array. The final step adds the error term to the mean: Y[j] ~ dnorm(mu[j], tau.e)

Hierarchical models 200/ 258 Prior distributions for model parameters We use the “standard” noninformative prior distributions for the fixed effects β0, β1 and σe. We use a Wishart prior distribution for the precision matrix Omega.beta, which requires

• A degrees-of-freedom parameter (we use the smallest possible value, 2, the rank of the precision matrix).
• A scale matrix representing our prior guess at the order of magnitude of the covariance matrix of the random effects (we assume σ0² = 0.5, σ1² = 0.1 and σ01 = 0).
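Pulling the pieces quoted on the last few slides together, a minimal sketch of how such a model could be run from R with rjags is given below. The simulated data, the prior constants and all names other than mu.beta, Omega.beta, tau.e, u, id, X and Y are illustrative assumptions, not the course's actual code or data.

library(rjags)   # rjags also loads coda

## simulated stand-in data (the real analysis has 707 subjects and 3097 observations)
set.seed(1)
n.subj <- 50; n.obs <- 4
id <- rep(1:n.subj, each = n.obs)
X  <- rnorm(length(id))
u.true <- cbind(rnorm(n.subj, 0, 0.7), rnorm(n.subj, 0, 0.3))
Y  <- (15 + u.true[id, 1]) + (0.5 + u.true[id, 2]) * X + rnorm(length(id), 0, 0.4)

model.string <- "
model {
  for (j in 1:N) {
    mu[j] <- (mu.beta[1] + u[id[j],1]) + (mu.beta[2] + u[id[j],2]) * X[j]
    Y[j] ~ dnorm(mu[j], tau.e)
  }
  for (i in 1:I) {
    u[i,1:2] ~ dmnorm(zero[1:2], Omega.beta[1:2,1:2])   # random intercept and slope
  }
  mu.beta[1] ~ dnorm(0, 1.0E-6)                          # vague priors for fixed effects
  mu.beta[2] ~ dnorm(0, 1.0E-6)
  tau.e ~ dgamma(0.001, 0.001)                           # residual precision
  Omega.beta[1:2,1:2] ~ dwish(R[1:2,1:2], 2)             # Wishart prior, df = 2
}
"

data.list <- list(Y = Y, X = X, id = id, N = length(Y), I = n.subj,
                  zero = c(0, 0), R = diag(c(0.5, 0.1)))   # prior scale guess as above
jm <- jags.model(textConnection(model.string), data = data.list, n.chains = 3)
update(jm, 1000)                                           # burn-in
post <- coda.samples(jm, c("mu.beta", "tau.e"), n.iter = 5000)
gelman.diag(post)                                          # R-hat for the monitored nodes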

Hierarchical models 201/ 258 L10: Analysis twin data and GLMMs Thursday 16th August 2012, morning

Lyle Gurrin

Bayesian Data Analysis 13 – 17 August 2012, Copenhagen

This lecture

• A model for paired data.
• Extend this to a genetic model for twin data.
• Fitting these models in BUGS.
• An introduction to the example on mammographic (breast) density.

Analysis of twin data and GLMMs 202/ 258 Paired data Paired data provide an excellent example of Bayesian methods in BUGS since these data are

• the simplest type of correlated data structure;
• naturally represented as a hierarchical model.

We begin with the basic model to capture correlation in paired data, and then extend this to accommodate different correlation in monozygous (MZ) and dizygous (DZ) twin pairs.

Analysis of twin data and GLMMs 203/ 258

Notation for paired data Suppose we have a continuously valued outcome y measured on each individual of n twin pairs. Let yij denote the measurement of the j-th individual in the i-th twin pair, where j = 1, 2 and i = 1, 2, …, n. We do not consider additional measurements on exposure variables at this stage.

Analysis of twin data and GLMMs 204/ 258

A model for a single pair We assume that, for the i-th pair, yi1 and yi2 have common mean

E(yi1)=E(yi2)=ai,

where (for now) ai is assumed to be fixed. It is also assumed that the two measurements have common variance

var(yi1) = var(yi2) = σe².

Conditional on the value of ai, yi1 and yi2 are uncorrelated: cov(yi1,yi2)=0.

Analysis of twin data and GLMMs 205/ 258 A model for a single pair We can write this model as

yi1 = ai + εi1
yi2 = ai + εi2

where var(εi1) = var(εi2) = σe² and cov(εi1, εi2) = 0, with an implicit assumption that

εij ∼ N(0, σe²).

Analysis of twin data and GLMMs 206/ 258

A hierarchical model for paired data We can extend this simple structure for paired data to a hierarchical model by assuming a normal population distribution for the pair-specific means ai

ai ∼ N(μ, σa²).

Analysis of twin data and GLMMs 207/ 258

A hierarchical model for paired data This is an example of the hierarchical normal-normal model studied in lectures earlier. The sampling model is (for j =1, 2)

yij | ai, σe² ∼ N(ai, σe²)

The pair-specific mean model is (for j =1, 2)

ai | μ, σa² ∼ N(μ, σa²)

Analysis of twin data and GLMMs 208/ 258 Unconditional mean and variance of yij We can use the iterative expectation and variance formulae:

E(yij) = E(E(yij|ai)) = E(ai) = μ.
var(yij) = var(E(yij|ai)) + E(var(yij|ai)) = var(ai) + E(σe²) = σa² + σe².

So yij ∼ N(μ, σa² + σe²).

Analysis of twin data and GLMMs 209/ 258

Bivariate distribution of (yi1,yi2)

In fact the joint distribution of yi1 and yi2 is bivariate normal:

(yi1, yi2)′ ∼ N2( (μ, μ)′, [ σa² + σe²   σa² ; σa²   σa² + σe² ] )

BUGS does allow vector nodes to have a multivariate normal (MVN) distribution, but it requires careful specification of the parameters defining the MVN distribution.

Analysis of twin data and GLMMs 210/ 258

Introduction to twin data Measurements on twins are a special case of paired data and provide a natural matched design. Outcomes (continuous or binary) are possibly influenced by both genetic and environmental factors. There has been extensive development of statistical methods to deal with quantitative traits, concordance/discordance, affected sib-pairs etc. Studying basic models for twins provides a good introduction to analysing family data.

Analysis of twin data and GLMMs 211/ 258 Introduction to twin data We will now extend the simple paired model to accommodate monozygous (MZ) and dizygous (DZ) pairs. MZ twins are “identical” and share all of their genes. DZ twins are siblings and share on average half of their genes (same relationship as siblings like brother and sister).

Analysis of twin data and GLMMs 212/ 258

Within-pair covariation in twin data If a quantitative trait (continuous outcome variable) is under the influence of shared genetic factors then we expect the within-pair covariation to be smaller in DZ pairs than in MZ pairs:

cov_DZ(yi1, yi2) = ρ_DZ:MZ · cov_MZ(yi1, yi2)

So ρDZ:MZ is the ratio of covariances between DZ and MZ pairs. We assume that var(yij) is the same in MZ and DZ twins and so ρDZ:MZ is also the ratio of the within-pair correlation in DZ and MZ pairs.

Analysis of twin data and GLMMs 213/ 258

Interpretation of ρDZ:MZ

• If ρ_DZ:MZ = 1 then the outcome is no more correlated between individuals in MZ pairs than in DZ pairs, so no evidence of genetic influence.
• If ρ_DZ:MZ = 1/2 then we have an additive genetic model, also known as the "classical twin model".
• If 0 < ρ_DZ:MZ < 1 and ρ_DZ:MZ ≠ 1/2 then any genetic model will need to be non-additive (e.g. gene-gene interaction) or incorporate a contribution to variation from shared environment.

Analysis of twin data and GLMMs 214/ 258 Specifying the additive model in BUGS Recall our original model:

yi1 = ai + εi1 yi2 = ai + εi2

where

var(εij) = σe²,   cov(εi1, εi2) = 0,   ai ∼ N(μ, σa²)

Analysis of twin data and GLMMs 215/ 258

Random effect sharing in MZ and DZ pairs

Now instead of just one random effect ai per pair, extend the model to incorporate three random effects per pair, ai1, ai2 and ai3, all i.i.d. N(μ, σa²). Regardless of zygosity, both individuals in a twin pair share the first random effect ai1. Individuals in MZ pairs share the second random effect ai2.

For DZ pairs, one member of the pair receives ai2 and the other member receives ai3. So DZ pairs share less than MZ pairs. Scaling appropriately by ρ = ρ_DZ:MZ we have...

Analysis of twin data and GLMMs 216/ 258

Model equations For MZ pairs

yi1 = √ρ ai1 + √(1 − ρ) ai2 + εi1
yi2 = √ρ ai1 + √(1 − ρ) ai2 + εi2

For DZ pairs

yi1 = √ρ ai1 + √(1 − ρ) ai2 + εi1
yi2 = √ρ ai1 + √(1 − ρ) ai3 + εi2

Analysis of twin data and GLMMs 217/ 258 Covariance and variance for MZ and DZ pairs

Since MZ pairs share both ai1 and ai2, the within-pair covariance of yi1 and yi2 is (√ρ)² σa² + (√(1 − ρ))² σa² = σa².

DZ pairs share only ai1, so the corresponding within-pair covariance is (√ρ)² σa² = ρ σa². The variance of yij is, however, σa² + σe² for both MZ and DZ pairs.
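A quick simulation check of this algebra (illustrative parameter values; the random effects are centred at zero here for simplicity):

set.seed(42)
n <- 1e5; rho <- 0.5; sig.a <- 1; sig.e <- 0.5
a1 <- rnorm(n, 0, sig.a); a2 <- rnorm(n, 0, sig.a); a3 <- rnorm(n, 0, sig.a)
## MZ pairs: both members share a1 and a2
y1.mz <- sqrt(rho) * a1 + sqrt(1 - rho) * a2 + rnorm(n, 0, sig.e)
y2.mz <- sqrt(rho) * a1 + sqrt(1 - rho) * a2 + rnorm(n, 0, sig.e)
## DZ pairs: the second member gets a3 instead of a2
y1.dz <- sqrt(rho) * a1 + sqrt(1 - rho) * a2 + rnorm(n, 0, sig.e)
y2.dz <- sqrt(rho) * a1 + sqrt(1 - rho) * a3 + rnorm(n, 0, sig.e)
c(cov.MZ = cov(y1.mz, y2.mz),   # approx. sigma_a^2             = 1
  cov.DZ = cov(y1.dz, y2.dz),   # approx. rho * sigma_a^2       = 0.5
  var    = var(y1.mz))          # approx. sigma_a^2 + sigma_e^2 = 1.25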

Analysis of twin data and GLMMs 218/ 258

Model equations Clearly the model for MZ pairs

yi1 = √ρ ai1 + √(1 − ρ) ai2 + εi1
yi2 = √ρ ai1 + √(1 − ρ) ai2 + εi2

is identical to the original model

yi1 = ai1 + εi1
yi2 = ai1 + εi2

but computations in BUGS are easier if both members of the pair share some random effects but not others if they are DZ.

Analysis of twin data and GLMMs 219/ 258

Application: Mammographic density Female twin pairs (571 MZ, 380 DZ) aged 40 to 70 were recruited in Australia and North America. The outcome is percent mammographic density, the ratio of dense tissue to non-dense tissue from the mammographic scan. Age-adjusted mammographic density is a risk factor for breast cancer. The correlation for percent density was 0.63 in MZ pairs and 0.27 in DZ pairs, adjusted for age and location. Other risk factors for breast density: height, weight, reproductive, diet.

Analysis of twin data and GLMMs 220/ 258

Percent breast density and age at mammogram
[Figure: percent breast density (%, 0–100) against age at mammogram (40–70 years).]

Analysis of twin data and GLMMs 221/ 258

Generalised Linear Mixed Models

Analysis of twin data and GLMMs 222/ 258

Melbourne Sexual Health Clinic (MSHC)

23 sexual health physicians diagnosing each patient with either:
• pelvic inflammatory disease (PID)
• genital warts

Are there differences between physicians in the proportion diagnosed with PID or warts?

consults  PID  warts
      80    1      1
     816   41     46
     726   12     37
    2891   38    137
      79    4      4
    1876   34     73
     469    8     27
    1124   13     76
     210   10      6
     539    8     28
    1950   22    101
    1697   24     86
     811   13     56
     908   52     48
     944   19     65
     832   10     33
    1482   10     62
     456    0     20
     420    0      8
    1258    3     58
    1101    1     22
     109    0      3
    1006    2     62

Analysis of twin data and GLMMs 223/ 258 Diagnosis frequency in MSHC

The proportion of patients diagnosed varies:
• PID: 0% – 5.73%; 1.49% (weighted), 1.72% (unweighted)
• Warts: 1.25% – 6.91%; 4.86% (weighted), 4.59% (unweighted)

Are the warts percentages "better spread" than the PID percentages (most less than 2% but four are around 5%)?

consults  PID (%)  warts (%)
      80     1.25       1.25
     816     5.02       5.64
     726     1.65       5.10
    2891     1.31       4.74
      79     5.06       5.06
    1876     1.81       3.89
     469     1.71       5.76
    1124     1.16       6.76
     210     4.76       2.86
     539     1.48       5.19
    1950     1.13       5.18
    1697     1.41       5.07
     811     1.60       6.91
     908     5.73       5.29
     944     2.01       6.89
     832     1.20       3.97
    1482     0.67       4.18
     456     0.00       4.39
     420     0.00       1.90
    1258     0.24       4.61
    1101     0.09       2.00
     109     0.00       2.75
    1006     0.20       6.16

Analysis of twin data and GLMMs 224/ 258

Data Structure For each outcome (PID or warts) the data are of the form

(ni,yi); i =1,...,23

where

ni is the number of consultations (patients seen) by physician i. yi is the number of patients diagnosed with the condition.

Analysis of twin data and GLMMs 225/ 258

Sampling model for each physician For each physician, the patients (i.e. their outcomes) are assumed to be exchangeable (there is no information to distinguish one patient from another). We model the outcomes within-physician as independent given a physician-specific probability of diagnosis θi, which leads to the familiar binomial sampling model:

yi|θi ∼ Bin(ni,θi)

Analysis of twin data and GLMMs 226/ 258 Setting up a model across physicians

Typical assumption is that each θi is an independent parameter, that is, a fixed effect. We can re-express this model using logistic regression:

logit(θi)=log(θi/(1 − θi)) = αi

where αi is the physician-specific log odds of diagnosis with the condition.

Analysis of twin data and GLMMs 227/ 258

Estimating the fixed effects MLE: Estimates are:

θ̂i = yi/ni,   α̂i = log( yi/(ni − yi) )

Bayes: Estimate θ as the posterior mean using some (beta?) prior distribution (see Lecture 1).

An even more extreme assumption is that αi = α for some common log-odds of diagnosis α. But this provides no way of quantifying variability in the frequency of diagnosis - is there a compromise?

Analysis of twin data and GLMMs 228/ 258

A hierarchical model Replace independence with exchangeability - allow the αi to be drawn from a "population" distribution of diagnosis frequencies: αi ∼ N(μ, τ²).

Then assume the yi are conditionally independent binomial random variables given the αi:

yi | αi ∼ Bin(ni, expit(αi)), where expit(αi) = logit⁻¹(αi) = exp(αi)/(1 + exp(αi)) = θi.

Analysis of twin data and GLMMs 229/ 258 Logistic-normal GLMM

This logistic-normal model, where the αi’s are given a distribution, and so are random rather than fixed effects, is an example of a generalised linear mixed model (GLMM). The model is easily implemented in BUGS to generate posterior distributions for αi’s and hyperparameters μ and τ. This simple model is available in Stata and SAS but without much flexibility to extend to random coefficients.
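For comparison, a hedged sketch of a non-Bayesian counterpart of this logistic-normal model, fitted with lme4::glmer using the consults and warts counts from the table above (the data-frame and variable names are assumptions for the illustration):

library(lme4)
consults <- c(80, 816, 726, 2891, 79, 1876, 469, 1124, 210, 539, 1950, 1697,
              811, 908, 944, 832, 1482, 456, 420, 1258, 1101, 109, 1006)
warts    <- c(1, 46, 37, 137, 4, 73, 27, 76, 6, 28, 101, 86,
              56, 48, 65, 33, 62, 20, 8, 58, 22, 3, 62)
d <- data.frame(doctor = factor(seq_along(consults)), consults, warts)
fit <- glmer(cbind(warts, consults - warts) ~ 1 + (1 | doctor),
             family = binomial, data = d)
summary(fit)   # the fixed intercept plays the role of mu, the random-effect SD that of tau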

Analysis of twin data and GLMMs 230/ 258

Fixed and random effects for warts

[Figure: for each doctor, the fixed-effect and random-effect (shrunk) estimates of the proportion diagnosed with warts, with 95% CIs and the overall mean (x-axis 0.00–0.12).]

Analysis of twin data and GLMMs 231/ 258

Fixed and random effects for PID

[Figure: for each doctor, the fixed-effect and random-effect (shrunk) estimates of the proportion diagnosed with PID, with 95% CIs and the overall mean (x-axis 0.00–0.12).]

Analysis of twin data and GLMMs 232/ 258 Summary output from BUGS Warts:

        mean    sd   2.5%    50%  97.5%  Rhat  n.eff
mu     -3.02  0.08  -3.19  -3.02  -2.87     1   5700
tau2    0.10  0.06   0.03   0.09   0.24     1  15000
tau     0.31  0.08   0.18   0.30   0.49     1  15000

PID:

        mean    sd   2.5%    50%  97.5%  Rhat  n.eff
mu     -4.57  0.32  -5.27  -4.55  -3.98     1   1000
tau2    1.67  0.84   0.65   1.47   3.82     1  15000
tau     1.26  0.29   0.81   1.21   1.95     1  15000

Analysis of twin data and GLMMs 233/ 258

Summary output from BUGS PID after removing those physicians with PID frequency > 2.5%:

        mean    sd   2.5%    50%  97.5%  Rhat  n.eff
mu     -4.86  0.27  -5.50  -4.83  -4.39     1   4500
tau2    1.07  0.70   0.29   0.90   2.93     1   4100
tau     0.99  0.30   0.54   0.95   1.71     1   4100

Analysis of twin data and GLMMs 234/ 258

Posterior densities for τ

[Figure: posterior densities of τ for the warts and PID models (τ from 0 to 3, density 0–5; N = 15000, bandwidth = 0.01019).]

Analysis of twin data and GLMMs 235/ 258 L11: Penalised loss functions for model comparison and the DIC Friday 17th August 2012, morning

Lyle Gurrin

Bayesian Data Analysis 13 – 17 August 2012, Copenhagen

The need to compare models Model choice is an important part of data analysis. This lecture: Present the Deviance Information Criterion (DIC) and relate it to a cross-validation procedure for model checking. When should we use the DIC?

Penalised loss functions for model comparison and the DIC 236/ 258

Posterior predictive checking One approach to Bayesian model checking is based on hypothetical replicates of the same process that generated the data. In posterior predictive checking (Chapter 6 of BDA), replicate datasets are simulated using draws from the posterior distribution of the model parameters. The adequacy of the model is assessed by the faithfulness with which the replicates reproduce key features of the original data.

Penalised loss functions for model comparison and the DIC 237/ 258 Two independent datasets Suppose that we have training data Z and test data Y = {Y1,Y2,...,Yn}. We assess model adequacy using a loss function L(Y, Z) which measures the ability to make prediction of Y from Z. Suitable loss functions are derived from scoring rules which measure the utility of a probabilistic forecast of Y represented by a probability distribution p(y).

Penalised loss functions for model comparison and the DIC 238/ 258

The log scoring rule One sensible scoring rule is the log scoring rule

A log{p(y)} + B(y)

for essentially arbitrary constant A and function B(.) of the data Y.

Penalised loss functions for model comparison and the DIC 239/ 258

Parametric models Consider the situation where all candidate models share a common vector of parameters θ (called the focus) and differ only in the (prior) structure for θ. Assume also that Y and Z are conditionally independent given θ, so that p(Y|θ, Z) = p(Y|θ). The log scoring rule then becomes the log likelihood of the data Y as a function of θ, or equivalently the deviance −2 log{p(Y|θ)}.

Penalised loss functions for model comparison and the DIC 240/ 258 Loss functions Two suggested loss functions are the “plug-in” deviance

Lp(Y, Z) = −2 log[ p{Y | θ̄(Z)} ]

where θ̄(Z) = E(θ|Z), and the expected deviance

Le(Y, Z) = −2 ∫ log{p(Y|θ)} p(θ|Z) dθ

where the expectation is taken over the posterior distribution of θ given Z with Y considered fixed.

Penalised loss functions for model comparison and the DIC 241/ 258

Loss functions Both loss functions are derived from the deviance, but there are important differences between Lp and Le.

• The plug-in deviance is sensitive to reparametrisation but the expected deviance is co-ordinate free.
• The plug-in deviance gives equal loss to all models that yield the same posterior expectation of θ, regardless of precision.
• The expected deviance is a function of the full posterior distribution of θ given Z so takes precision into account.

Penalised loss functions for model comparison and the DIC 242/ 258

The problem How to proceed when there are no training data? An obvious idea is to use the test data to estimate θ and assess the fit of the model, that is, re-use the data to create the loss function L(Y, Y). But this is optimistic. “In-sample” prediction will always do better than “out-of-sample” prediction.

Penalised loss functions for model comparison and the DIC 243/ 258 Quantifying the optimism We can get an idea of the degree of optimism for loss functions that decompose into a sum of contributions from each Yi (e.g. independence)

L(Y, Z) = Σ_{i=1}^n L(Yi, Z)

We can gauge the optimism by comparing L(Yi, Y) with L(Yi, Y−i) where Y−i is the data with Yi removed.

Penalised loss functions for model comparison and the DIC 244/ 258

Quantifying the optimism

The expected decrease in loss from using L(Yi, Y) in place of L(Yi, Y−i) is

p_opt_i = E{ L(Yi, Y−i) − L(Yi, Y) | Y−i }

which is the optimism of L(Yi, Y). The loss function

L(Yi, Y) + p_opt_i

has the same expectation given Y−i as the cross-validation loss L(Yi, Y−i) so is equivalent for an observer who has not seen Yi.

Penalised loss functions for model comparison and the DIC 245/ 258

Penalised loss function

The same argument applies to each Yi in turn. Proposal: Use the sum of the penalised loss functions L(Y, Y) + p_opt to assess model accuracy, where

p_opt = Σ_{i=1}^n p_opt_i

is the cost of using the data twice.

Penalised loss functions for model comparison and the DIC 246/ 258 Deviance Information Criterion (DIC) The DIC is defined as

DIC = D̄ + pD

where D̄ = E(D|Y) is a measure of model fit and pD is the "effective number of parameters", a measure of model complexity defined by

pD = D̄ − D{θ̄(Y)}
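A toy sketch of how D̄, pD and DIC are assembled, for a normal-mean model with known variance; the data and the "posterior draws" below are simulated purely for illustration (this is not the course's example).

set.seed(1)
y <- rnorm(20, mean = 1, sd = 1)
theta.draws <- rnorm(5000, mean = mean(y), sd = 1 / sqrt(length(y)))  # posterior draws
deviance <- function(theta) -2 * sum(dnorm(y, theta, 1, log = TRUE))
dev.samples <- sapply(theta.draws, deviance)
Dbar <- mean(dev.samples)                    # posterior mean deviance (fit)
pD   <- Dbar - deviance(mean(theta.draws))   # effective number of parameters
DIC  <- Dbar + pD
c(Dbar = Dbar, pD = pD, DIC = DIC)           # pD should be close to 1 here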

Penalised loss functions for model comparison and the DIC 247/ 258

The effective number of parameters Using previous notation, D = Le(Y, Y) and p D{θ(Y)} = L (Y, Y),sopD canbewrittenas

e p pD = L (Y, Y) − L (Y, Y)

pD can be decomposed into the sum of individual contributions n n e p pD = pD = L (Yi, Y) − L (Yi, Y) i=1 i i=1

Penalised loss functions for model comparison and the DIC 248/ 258

The normal linear model We can do the algebra explicitly for the normal linear model that assumes the Yi are scalar. Both the expected and plug-in deviance have a penalty term like pD_i/(1 − pD_i). But the large-sample behaviour depends on the dimension of θ.

Penalised loss functions for model comparison and the DIC 249/ 258 The normal linear model θ p O n−1 If the dimension of is fixed then Di is ( ) so the penalised losses can be written as

−1 D + kpD + O(n )

where k =1for the plug-in deviance and k =2for the expected deviance. So the penalised plug-in deviance is the same as the DIC for regular linear models with scalar outcomes.

Penalised loss functions for model comparison and the DIC 250/ 258

Example: DIC in hierarchical models In random effects models, however, it is quite common for the dimension of θ to increase with n. The behaviour of the penalised plug-in deviance may be different from DIC. We illustrate this with the normal-normal hierarchical model, also known as the one-way random-effects analysis of variance (ANOVA).

Penalised loss functions for model comparison and the DIC 251/ 258

The Normal-normal hierarchical model The two-level hierarchical model assumes that:

Yi | θi ∼ N(θi, σi²),   θi | μ, τ ∼ N(μ, τ²)

where the variances σ1², σ2², …, σn² are fixed and μ and τ are given noninformative priors.

Penalised loss functions for model comparison and the DIC 252/ 258 The Normal-normal hierarchical model Assume candidate models are indexed by τ,andthe deviance is defined as  n 2 D(θ)= [(yi − θi)/σi] i=1 It can be shown that the contribution to the effective number of parameters from observation i is ρ − ρ p ρ i(1 i) Di = i + n j=1ρj

2 2 2 where ρi = τ /(τ + σi ) is the intra-class correlation coefficient.

Penalised loss functions for model comparison and the DIC 253/ 258

In the limit as τ² → 0 There are two limiting situations. In the limit as τ² → 0, the ANOVA model tends to a pooled model where all observations have the same prior mean μ. In this limit, pD = 1 and both the DIC and penalised plug-in deviance are equal to

Σ_{j=1}^n [ (Yj − Ȳ)/σj ]² + 2   (33)

where

Ȳ = ( Σ_{j=1}^n Yj/σj² ) / ( Σ_{j=1}^n 1/σj² ).   (34)

Penalised loss functions for model comparison and the DIC 254/ 258

In the limit as τ² → ∞ In the limit as τ² → ∞, the ANOVA model tends to a saturated fixed-effects model, in which Y−i contains no information about the mean of Yi. In this limit, pD = n and DIC = 2n, but the penalised plug-in deviance tends to infinity. So there is strong disagreement between DIC and the penalised plug-in deviance in this case.

Penalised loss functions for model comparison and the DIC 255/ 258 Conclusion For linear models with scalar outcomes, DIC is a good approximation to the penalised plug-in p  i deviance whenever Di 1 for all , which implies pD  n.

So pD/n may be used as an indicator of the validity of DIC in such models.

Penalised loss functions for model comparison and the DIC 256/ 258

Further reading Plummer M. (2008). Penalized loss functions for Bayesian model comparison. Biostatistics, 1–17. Spiegelhalter D, Best N, Carlin B, van der Linde A. (2002). Bayesian measures of model complexity and fit (with discussion). Journal of the Royal Statistical Society, Series B 64, 583–639. The WinBUGS website, which includes slides from the "IceBUGS" workshop in 2006.

Penalised loss functions for model comparison and the DIC 257/ 258 L12: Comparing methods of measurement in Stata, SAS, R and BUGS Friday 17th August 2012, afternoon

Bendix Carstensen

Bayesian Data Analysis 13 – 17 August 2012, Copenhagen Comparing methods of measurement in Stata, SAS, R and BUGS Comparing methods of measurement using the MethComp package in R...

Comparing methods of measurement in Stata, SAS, R and BUGS 258/ 258 Method Comparison studies

Bendix Carstensen Steno Diabetes Center, Denmark & Department of Biostatistics, University of Copenhagen [email protected] http:\BendixCarstensen.com August 2012 PDAwBuR

Two methods for measuring fat content in human milk:

[Figure: Trig against Gerber measurements of fat content (both 1–6). The relationship looks like y1 = a + b·y2.]

1/ 33 Two methods — one measurement by each How large is the difference between a measurement with method 1 and one with method 2 on a (randomly chosen) person?

Di = y2i − y1i,   D̄,   s.d.(D)

"Limits of agreement:"

D̄ ± 2 × s.d.(D)

95% prediction interval for the difference between a measurement by method 1 and one by method 2. [?, ?]
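A two-line R sketch of this calculation, with made-up measurements y1 and y2 (one per person by each method):

y1 <- c(4.5, 5.1, 3.9, 6.2, 5.5)
y2 <- c(4.7, 5.0, 4.2, 6.4, 5.3)
D  <- y2 - y1
mean(D) + c(-2, 2) * sd(D)    # limits of agreement: D-bar +/- 2 s.d.(D)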

2/ 33

Limits of agreement:

Plot differences (Di) against averages (Ai).

[Figure: Trig − Gerber against (Trig + Gerber)/2, with horizontal lines at −0.17, −0.00 and 0.17 (the limits of agreement and the mean difference).]

3/ 33

Model in “Limits of agreement” Methods m =1,...,M, applied to i =1,...,I individuals:

ymi = αm + μi + emi,   emi ∼ N(0, σm²)   (measurement error)

• Two-way analysis of variance model, with unequal variances in columns.
• Different variances are not identifiable without replicate measurements for M = 2 because the variances cannot be separated.

4/ 33 A more general model

y1i = α1 + β1 μi + e1i,   e1i ∼ N(0, σ1²)
y2i = α2 + β2 μi + e2i,   e2i ∼ N(0, σ2²)

• Work out the prediction of y1 given an observation of y2 in terms of these parameters.
• Work out how differences relate to averages in terms of these parameters.
• ... including the variances for deriving prediction intervals.

5/ 33

Conversion equation with prediction limits

[Figure: capillary blood against venous plasma (both 4–14) with the conversion line and prediction limits.]

6/ 33

Replicate measurements Fat data; exchangeable replicates:

item  repl   KL   SL
   1     1  4.5  4.9
   1     2  4.4  5.0
   1     3  4.7  4.8
   3     1  6.4  6.5
   3     2  6.2  6.4
   3     3  6.5  6.1

Oximetry data; linked replicates:

item  repl    CO  pulse
   1     1  78.0     71
   1     2  76.4     72
   1     3  77.2     73
   2     1  68.7     68
   2     2  67.6     67
   2     3  68.3     68

Linked or exchangeable replicates!

7/ 33 Extension of the simple model: exchangeable replicates

ymir = αm + μi + cmi + emir
s.d.(cmi) = τm — "matrix" effect
s.d.(emir) = σm — measurement error

• Replicates within (m, i) are needed to separate τ and σ.
• Even with replicates, the separate τs are only estimable if M > 2.
• Still assumes that the difference between methods is constant.
• Assumes exchangeability of replicates.

8/ 33

Extension of the model: linked replicates

ymir = αm + μi + air + cmi + emir
s.d.(air) = ω — between replicates
s.d.(cmi) = τm — "matrix" effect
s.d.(emir) = σm — measurement error

• Still assumes that the difference between methods is constant.
• Replicates are linked between methods: air is common across methods, i.e. the first replicate on a person is made under similar conditions for all methods (i.e. at a specific day or the like).

9/ 33

Extension with non-constant bias

ymir = αm + βm μi + random effects

There is now a scaling between the methods. Methods do not measure on the same scale — the relative scaling is estimated; between methods 1 and 2 the scale is β2/β1. Consequence: Multiplication of all measurements on one method by a fixed number does not change the results of the analysis: the corresponding β is multiplied by the same factor, as are the variance components for this method.

10/ 33 Variance components Two-way interactions:

ymir = αm + βm(μi + air + cmi)+emir

The random effects cmi and emir have variances specific for each method.

But air does not depend on m — must be scaled to each of the methods by the corresponding βm.

Implies that ω = s.d.(air) is irrelevant — the scale is arbitrary. The relevant quantities are βmω —the between replicate variation within item as measured on the mth scale.

11/ 33

Predicting method 2 from method 1

y10r = α1 + β1(μ0 + a0r + c10) + e10r
y20r = α2 + β2(μ0 + a0r + c20) + e20r
⇓
y20r = α2 + (β2/β1)(y10r − α1 − e10r) + β2(−c10 + c20) + e20r

The random effects have expectation 0, so:

E(y20 | y10) = ŷ20 = α2 + (β2/β1)(y10 − α1)

12/ 33

y20r = α2 + (β2/β1)(y10r − α1 − e10r) + β2(−c10 + c20) + e20r

var(ŷ20 | y10) = (β2/β1)²(β1²τ1² + σ1²) + (β2²τ2² + σ2²)

The slope of the prediction line from method 1 to method 2 is β2/β1. The width of the prediction interval is:

2 × 2 × sqrt( (β2/β1)²(β1²τ1² + σ1²) + (β2²τ2² + σ2²) )

13/ 33

If we do the prediction the other way round (y1|y2) we get the same relationship, i.e. a line with the inverse slope, β1/β2. The width of the prediction interval in this direction is (by permutation of indices):

2 × 2 × sqrt( (β1²τ1² + σ1²) + (β1/β2)²(β2²τ2² + σ2²) )
= 2 × 2 × (β1/β2) sqrt( (β2/β1)²(β1²τ1² + σ1²) + (β2²τ2² + σ2²) )

i.e. if we draw the prediction limits as straight lines they can be used both ways.

14/ 33

Conversion equation with prediction limits

[Figure: capillary blood against venous plasma (both 4–14) with the conversion line and prediction limits.]

15/ 33

[Figure: pulse against CO (both 20–100) with conversion lines and prediction limits: pulse = 2.11 + 0.94·CO (SD 6.00); CO = −2.25 + 1.06·pulse (SD 6.39).]

16/ 33

Variance components

ymir = αm + βm(μi + air + cmi) + emir

The total variance of a measurement is: βm²ω² + βm²τm² + σm²

These are the variance components returned by AltReg or MCmcmc using print.MCmcmc and shown by post.MCmcmc.

17/ 33

Backtransformation for plotting

prpulse  <- seq(20,100,1)
lprpulse <- log( prpulse / (100-prpulse) )
lprCO    <- ARoxt["CO",2] + ARoxt["CO",4]*lprpulse
lprCOlo  <- ARoxt["CO",2] + ARoxt["CO",4]*lprpulse - 2*sd.CO.pred
lprCOhi  <- ARoxt["CO",2] + ARoxt["CO",4]*lprpulse + 2*sd.CO.pred
prCO <- 100/(1+exp(-cbind( lprCO, lprCOlo, lprCOhi )))
prCO[nrow(prCO),] <- 100

But this is not necessary; it is implemented in plot.MethComp:

plot( ARoxt, pl.type="conv" )

18/ 33

[Figure: pulse against CO (both 0–100), as drawn by plot.MethComp, with conversion lines and prediction limits: pulse = 2.03 + 0.94·CO (SD 6.01); CO = −2.16 + 1.06·pulse (SD 6.38).]

19/ 33

prpulse <- cbind( prpulse, prpulse, prpulse )
with( to.wide(ox),
      plot( (CO+pulse)/2, CO-pulse, pch=16,
            ylim=c(-40,40), xlim=c(20,100), xaxs="i", yaxs="i" ) )
abline( h=-4:4*10, v=2:10*10, col=gray(0.8) )
matlines( (prCO+prpulse)/2, prCO-prpulse,
          lwd=c(3,1,1), col="blue", lty=1 )

But this is not necessary; it is implemented in plot.MethComp:

plot( ARoxt, pl.type="BA" )

20/ 33

[Figure: Bland–Altman plot of pulse − CO (−40 to 40) against (CO + pulse)/2 (0–100), with the converted prediction limits: pulse = 2.03 + 0.94·CO (SD 6.01); CO = −2.16 + 1.06·pulse (SD 6.38).]

21/ 33

Implementation in JAGS

ymir = αm + βm(μi + air + cmi) + emir

Non-linear hierarchical model: implement in JAGS.

• The model is symmetrical in methods.
• The mean is overparametrized.
• Choose a prior (and hence posterior!) for the μs with finite support.
• Keeps the chains nicely in place.

This is the philosophy in the function MCmcmc.

22/ 33 Results from fitting the model

The posterior dist’n of (αm,βm,μi) is singular.

But the relevant translation quantities are identifiable:

α2|1 = α2 − α1β2/β1

β2|1 = β2/β1

So are the variance components.

Posterior medians used to devise prediction equations with limits.

23/ 33

The MethComp package for R Implemented model:

ymir = αm + βm(μi + air + cmi)+emir

• Replicates required.
• R2WinBUGS, BRugs or JAGS is required.
• Dataframe with variables meth, item, repl and y (a Meth object).
• The function MCmcmc writes a BUGS program, initial values and data to files.
• Runs BUGS and sucks the results back into R, and gives a nice overview of the conversion equations.

24/ 33

Example output: Oximetry

> library(MethComp)
Loading required package: nlme
> data(ox)
> ox <- Meth(ox)
The following variables from the dataframe "ox" are used as the Meth variables:
meth: meth
item: item
repl: repl
y: y
        #Replicates
Method  1  2  3   #Items  #Obs: 354  Values:  min   med   max
CO      1  4 56       61        177           22.2  78.6  93.5
pulse   1  4 56       61        177           24.0  75.0  94.0
> system.time( MCox <- MCmcmc( ox, n.iter=10000 ) )
   user  system elapsed
 115.07    0.31  118.62
> system.time( Jox <- MCmcmc( ox, n.iter=10000, program="jags"
   user  system elapsed
  82.26    0.04   82.47

25/ 33 > Jox

Conversion between methods:
              alpha   beta  sd.pred  int(t-f)  slope(t-f)  sd(t-f)
To:    From:
CO     CO     0.000  1.000    2.486     0.000       0.000    2.486
       pulse -5.386  1.108    5.026    -5.110       0.102    4.769
pulse  CO     4.862  0.903    4.540     5.110      -0.102    4.772
       pulse  0.000  1.000    5.986     0.000       0.000    5.986

Variance components (sd):
        s.d.
Method    IxR    MxI    res
CO      3.797  3.242  1.758
pulse   3.410  2.919  4.233

26/ 33

Variance components with 95 % cred.int.:
 method        CO                      pulse
 qnt          50%    2.5%   97.5%      50%    2.5%   97.5%
 SD IxR     3.797   3.078   4.525    3.410   2.795   4.097
    MxI     3.242   2.418   4.342    2.919   2.167   3.935
    res     1.758   0.446   2.733    4.233   3.559   4.917
    tot     5.326   4.676   6.151    6.219   5.569   6.940

Mean parameters with 95 % cred.int.:
                      50%     2.5%    97.5%  P(>0/1)
 alpha[pulse.CO]    4.863   -2.831   12.202    0.877
 alpha[CO.pulse]   -5.384  -15.227    2.828    0.123
 beta[pulse.CO]     0.903    0.807    1.003    0.029
 beta[CO.pulse]     1.108    0.997    1.239    0.971

27/ 33

Transformed variance components

> Jox <- MCmcmc( ox, n.iter=10000, program="jags" )
> MethComp( Jox )$VarComp
        s.d.
Method       IxR      MxI      res
CO      3.832491 3.199204 1.670209
pulse   3.422194 2.862554 4.266671

> tJox <- MCmcmc( ox, n.iter=10000, program="jags", Transform="p
> MethComp(tJox)$VarComp
        s.d.
Method        IxR       MxI       res
CO      0.2575410 0.1811183 0.1243838
pulse   0.2232714 0.1565227 0.2031203

28/ 33 Transformation If the data do not exhibit:

• a linear relationship between methods
• constant variation across the range of measurements

— transform by some function, e.g. logit, and then do the analysis. Report on the original scale.

29/ 33

[Figure: CO against pulse (both 0–100) with conversion lines: CO = −6.00 + 1.12·pulse (5.07); pulse = 5.38 + 0.90·CO (4.55).]

30/ 33

[Figure: the same CO against pulse plot shown without the conversion equations.]

31/ 33

[Figure: Bland–Altman plot of CO − pulse (−40 to 40) against (CO + pulse)/2 (0–100), with CO = −6.00 + 1.12·pulse (5.07), pulse = 5.38 + 0.90·CO (4.55) and CO − pulse = −5.68 + 0.11·(CO + pulse)/2 (4.80).]

32/ 33

[Figure: the same Bland–Altman plot shown without the conversion equations.]

33/ 33 Method Comparison studies

Bendix Carstensen Steno Diabetes Center, Denmark & Department of Biostatistics, University of Copenhagen [email protected] http:\BendixCarstensen.com August 2012 PDAwBuR

Two methods for measuring fat content in human milk:

●● The ● ● ● ● relationship

● ●

Tr ig looks like: ● ● ●●● ● ● ●●● ●● ● ● y1 = a + by2 ●● ● ● ● ● ● ●● ● ● ● ● ● 123456 ● ●

123456 Gerber

1/ 33

Two methods — one measurement by each How large is the difference between a measurement with method 1 and one with method 2 on a (randomly chosen) person?

Di = y2i − y1i, D,¯ s.d.(D)

“Limits of agreement:”

D¯ ± 2 × s.d.(D)

95% prediction interval for the difference between a measurement by method 1 and one by method 2. [?, ?]

2/ 33 Limits of agreement:

● 0.17 Plot ● ● ● ● ● ● ● ● ● differences ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● −0.00 D ● ● ● ●● ● ( i) versus ● ● ● ● ● ● ● ● Trig − Gerber Trig ● averages −0.17 ● A ● ( i). −0.4 −0.2 0.0 0.2 0.4

123456 ( Trig + Gerber ) / 2

3/ 33

Model in “Limits of agreement” Methods m =1,...,M, applied to i =1,...,I individuals:

ymi = αm + μi + emi 2 emi ∼N(0,σm) measurement error

 Two-way analysis of variance model, with unequal variances in columns.  Different variances are not identifiable without replicate measurements for M =2because the variances cannot be separated.

4/ 33

A more general model

2 y1i = α1 + β1μi + e1i,e1i ∼N(0,σ1) 2 y2i = α2 + β2μi + e2i,e2i ∼N(0,σ2)

 Work out the prediction of y1 given an observation of y2 in terms of these parameters.  Work out how differences relate to averages in terms of these parameters.  . . . including the variances for deriving prediction intervals.

5/ 33 A more general model

2 y1i = α1 + β1μi + e1i,e1i ∼N(0,σ1) 2 y2i = α2 + β2μi + e2i,e2i ∼N(0,σ2)

 Work out the prediction of y1 given an observation of y2 in terms of these parameters.  Work out how differences relate to averages in terms of these parameters.  . . . including the variances for deriving prediction intervals.

5/ 33

A more general model

2 y1i = α1 + β1μi + e1i,e1i ∼N(0,σ1) 2 y2i = α2 + β2μi + e2i,e2i ∼N(0,σ2)

 Work out the prediction of y1 given an observation of y2 in terms of these parameters.  Work out how differences relate to averages in terms of these parameters.  . . . including the variances for deriving prediction intervals.

5/ 33

Conversion equation with prediction limits

14

12 ●

●● 10 ● ● ● ● ● ● ● ● ● ● 8 ● Capillary blood ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● 6 ● ● ●● ● ● ● ●

● 4

468101214 Venous plasma

6/ 33 Conversion equation with prediction limits

14

12 ●

●● 10 ●● ● ● ● ● ● ● ● ● ● 8 ● Capillary blood ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● 6 ● ● ●● ● ● ● ●

● 4

468101214 Venous plasma

6/ 33

Conversion equation with prediction limits

14

12 ●

●● 10 ● ● ● ● ● ● ● ● ● ● ● 8 ● Capillary blood ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● 6 ● ● ●● ● ● ● ●

● 4

468101214 Venous plasma

6/ 33

Conversion equation with prediction limits

14

12 ●

●● 10 ●● ● ● ● ● ● ● ● ● ● ● 8 ● Capillary blood ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● 6 ● ● ●● ● ● ● ●

● 4

468101214 Venous plasma

6/ 33 Replicate measurements Fat data; exchangeable replicates: item repl KL SL 1 1 4.5 4.9 1 2 4.4 5.0 1 3 4.7 4.8 3 1 6.4 6.5 3 2 6.2 6.4 3 3 6.5 6.1

Oximetry data; linked replicates: item repl CO pulse 1 1 78.0 71 1 2 76.4 72 1 3 77.2 73 2 1 68.7 68 2 2 67.6 67 2 3 68.3 68

Linked or exchangeable replicates!

7/ 33

Extension of the simple model: exchangeable replicates

ymir = αm + μi + cmi + emir s.d.(cmi)=τm — “matrix”-effect s.d.(emir)=σm — measurement error

 Replicates within (m, i) are needed to separate τ and σ.  Even with replicates, the separate τs are only estimable if M>2.  Still assumes that the difference between methods is constant.  Assumes exchangeability of replicates.

8/ 33

Extension of the model: linked replicates ymir = αm + μi + air + cmi + emir s.d.(air)=ω — between replicates s.d.(cmi)=τm — “matrix”-effect s.d.(emir)=σm — measurement error

 Still assumes that the difference between methods is constant.  Replicates are linked between methods: air is common across methods, i.e. the first replicate on a person is made under similar conditions for all methods (i.e. at a specific day or the like).

9/ 33 Extension with non-constant bias

ymir = αm + βmμi + random effects There is now a scaling between the methods. Methods do not measure on the same scale — the relative scaling is estimated, between method 1 and 2 the scale is β2/β1. Consequence: Multiplication of all measurements on one method by a fixed number does not change results of analysis: The corresponding β is multiplied by the same factor as is the variance components for this method.

10/ 33

Variance components

Two-way interactions:

y_mir = α_m + β_m(μ_i + a_ir + c_mi) + e_mir

The random effects c_mi and e_mir have variances specific to each method.

But a_ir does not depend on m — it must be scaled to each of the methods by the corresponding β_m.

This implies that ω = s.d.(a_ir) is irrelevant — its scale is arbitrary. The relevant quantities are β_mω, the between-replicate variation within item as measured on the mth method's scale.

11/ 33

Predicting method 2 from method 1

y_10r = α_1 + β_1(μ_0 + a_0r + c_10) + e_10r
y_20r = α_2 + β_2(μ_0 + a_0r + c_20) + e_20r
          ⇓
y_20r = α_2 + (β_2/β_1)(y_10r − α_1 − e_10r) + β_2(−c_10 + c_20) + e_20r

The random effects have expectation 0, so:

E(y_20 | y_10) = ŷ_20 = α_2 + (β_2/β_1)(y_10 − α_1)

12/ 33

y_20r = α_2 + (β_2/β_1)(y_10r − α_1 − e_10r) + β_2(−c_10 + c_20) + e_20r

var(ŷ_20 | y_10) = (β_2/β_1)²(β_1²τ_1² + σ_1²) + (β_2²τ_2² + σ_2²)

The slope of the prediction line from method 1 to method 2 is β_2/β_1. The width of the prediction interval is:

  2 × 2 × √( (β_2/β_1)²(β_1²τ_1² + σ_1²) + (β_2²τ_2² + σ_2²) )
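A small R sketch of these formulas (hypothetical parameter values, not estimates from the slides), drawing the conversion line from method 1 to method 2 with approximate 95% prediction limits:

alpha <- c( 0.5, -0.2 )    # hypothetical alpha_1, alpha_2
beta  <- c( 1.00, 0.95 )   # hypothetical beta_1, beta_2
tau   <- c( 0.30, 0.25 )   # hypothetical tau_1, tau_2
sigma <- c( 0.20, 0.35 )   # hypothetical sigma_1, sigma_2

y1 <- seq( 4, 14, 0.1 )
y2.hat  <- alpha[2] + (beta[2]/beta[1]) * ( y1 - alpha[1] )
sd.pred <- sqrt( (beta[2]/beta[1])^2 * (beta[1]^2*tau[1]^2 + sigma[1]^2) +
                 (beta[2]^2*tau[2]^2  + sigma[2]^2) )
matplot( y1, cbind( y2.hat, y2.hat - 2*sd.pred, y2.hat + 2*sd.pred ),
         type="l", lty=c(1,2,2), col="blue",
         xlab="Method 1", ylab="Method 2" )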

13/ 33


If we do the prediction the other way round (y_1 | y_2) we get the same relationship, i.e. a line with the inverse slope, β_1/β_2. The width of the prediction interval in this direction is (by permutation of indices):

  2 × 2 × √( (β_1/β_2)²(β_2²τ_2² + σ_2²) + (β_1²τ_1² + σ_1²) )
    = (β_1/β_2) × 2 × 2 × √( (β_2/β_1)²(β_1²τ_1² + σ_1²) + (β_2²τ_2² + σ_2²) )

i.e. if we draw the prediction limits as straight lines they can be used both ways.

14/ 33

Conversion equation with prediction limits

[Scatter plot: Capillary blood versus Venous plasma (both 4–14), with the conversion line and prediction limits.]

15/ 33

[Scatter plot of pulse versus CO (both 20–100), with conversion lines and prediction limits:
  pulse = 2.11 + 0.94 CO  ( 6.00 )
  CO = −2.25 + 1.06 pulse  ( 6.39 )]

16/ 33

Variance components

y_mir = α_m + β_m(μ_i + a_ir + c_mi) + e_mir

The total variance of a measurement is β_m²ω² + β_m²τ_m² + σ_m², i.e. a total standard deviation of

  √( β_m²ω² + β_m²τ_m² + σ_m² )

These are the variance components returned by AltReg or MCmcmc (using print.MCmcmc) and shown by post.MCmcmc.
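A quick numerical check (using the CO variance components from the output later in these slides): the total s.d. is the square root of the sum of the squared components.

sqrt( 3.797^2 + 3.242^2 + 1.758^2 )   # about 5.29, cf. the reported "tot" of 5.326
# (posterior medians of the components and of the total are summarized separately,
#  so they need not combine exactly)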

17/ 33

Backtransformation for plotting

prpulse  <- seq(20,100,1)                    # grid of pulse oximetry values (%)
lprpulse <- log( prpulse / (100-prpulse) )   # logit of the percentages
lprCO    <- ARoxt["CO",2] + ARoxt["CO",4]*lprpulse                  # predicted CO, logit scale
lprCOlo  <- ARoxt["CO",2] + ARoxt["CO",4]*lprpulse - 2*sd.CO.pred   # lower prediction limit
lprCOhi  <- ARoxt["CO",2] + ARoxt["CO",4]*lprpulse + 2*sd.CO.pred   # upper prediction limit
prCO <- 100/(1+exp(-cbind( lprCO, lprCOlo, lprCOhi )))              # back to the % scale
prCO[nrow(prCO),] <- 100                     # fix the upper endpoint at 100%

But this is not necessary; it is implemented in plot.MethComp:

plot( ARoxt, pl.type="conv" )

18/ 33

[Scatter plot of pulse versus CO (both 0–100), with conversion lines and prediction limits:
  pulse = 2.03 + 0.94 CO  ( 6.01 )
  CO = −2.16 + 1.06 pulse  ( 6.38 )]

19/ 33

Transformation to a Bland-Altman plot

Just convert to the differences versus the averages:

prpulse <- cbind( prpulse, prpulse, prpulse )   # same x-values for all three curves
with( to.wide(ox),
      plot( (CO+pulse)/2, CO-pulse, pch=16,
            ylim=c(-40,40), xlim=c(20,100), xaxs="i", yaxs="i" ) )
abline( h=-4:4*10, v=2:10*10, col=gray(0.8) )   # background grid
matlines( (prCO+prpulse)/2, prCO-prpulse,       # conversion line and prediction limits
          lwd=c(3,1,1), col="blue", lty=1 )     #   on the difference-average scale

But this is not necessary; it is implemented in plot.MethComp:

plot( ARoxt, pl.type="BA" )

20/ 33

[Bland-Altman style plot of pulse − CO versus (CO + pulse)/2 (0–100, differences within ±40), with conversion lines and prediction limits and the relations:
  pulse = 2.03 + 0.94 CO  ( 6.01 )
  CO = −2.16 + 1.06 pulse  ( 6.38 )]

21/ 33

Implementation in JAGS

y_mir = α_m + β_m(μ_i + a_ir + c_mi) + e_mir

Non-linear hierarchical model: implement in JAGS.

• The model is symmetrical in methods.
• The mean is overparametrized.
• Choose a prior (and hence posterior!) for the μs with finite support.
• This keeps the chains nicely in place.

This is the philosophy in the function MCmcmc.

22/ 33

Results from fitting the model

The posterior distribution of (α_m, β_m, μ_i) is singular.

But the relevant translation quantities are identifiable:

α_2|1 = α_2 − α_1 β_2/β_1

β_2|1 = β_2/β_1

So are the variance components.

Posterior medians used to devise prediction equations with limits.
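A hedged illustration of this reparametrization (hypothetical posterior medians, not the slides' output):

alpha <- c( 10.2, 8.7 )    # hypothetical posterior medians of alpha_1, alpha_2
beta  <- c( 1.05, 0.98 )   # hypothetical posterior medians of beta_1, beta_2
beta.2.1  <- beta[2] / beta[1]                 # slope for converting method 1 to 2
alpha.2.1 <- alpha[2] - alpha[1] * beta.2.1    # intercept for converting method 1 to 2
c( alpha.2.1, beta.2.1 )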

23/ 33

The MethComp package for R

Implemented model:

y_mir = α_m + β_m(μ_i + a_ir + c_mi) + e_mir

• Replicates required.
• R2WinBUGS, BRugs or JAGS is required.
• Data frame with variables meth, item, repl and y (a Meth object).
• The function MCmcmc writes a BUGS program, initial values and data to files.
• It runs BUGS, reads the results back into R, and gives a nice overview of the conversion equations.

24/ 33

Example output: Oximetry

> library(MethComp)
Loading required package: nlme
> data(ox)
> ox <- Meth(ox)
The following variables from the dataframe "ox" are used as the Meth variables:
meth: meth
item: item
repl: repl
y: y

             #Replicates
Method  1  2  3 #Items #Obs: 354 Values:  min  med  max
  CO    1  4 56     61        177         22.2 78.6 93.5
 pulse  1  4 56     61        177         24.0 75.0 94.0

> system.time( MCox <- MCmcmc( ox, n.iter=10000 ) )
   user  system elapsed
 115.07    0.31  118.62
> system.time( Jox <- MCmcmc( ox, n.iter=10000, program="jags" ) )
   user  system elapsed
  82.26    0.04   82.47

25/ 33 > Jox

Conversion between methods:
              alpha  beta sd.pred int(t-f) slope(t-f) sd(t-f)
 To:   From:
 CO    CO     0.000 1.000   2.486    0.000      0.000   2.486
       pulse -5.386 1.108   5.026   -5.110      0.102   4.769
 pulse CO     4.862 0.903   4.540    5.110     -0.102   4.772
       pulse  0.000 1.000   5.986    0.000      0.000   5.986

Variance components (sd):
        s.d.
Method    IxR   MxI   res
 CO     3.797 3.242 1.758
 pulse  3.410 2.919 4.233

26/ 33

Variance components with 95 % cred.int.:
method      CO                    pulse
qnt        50%   2.5%  97.5%      50%   2.5%  97.5%
SD
 IxR     3.797  3.078  4.525    3.410  2.795  4.097
 MxI     3.242  2.418  4.342    2.919  2.167  3.935
 res     1.758  0.446  2.733    4.233  3.559  4.917
 tot     5.326  4.676  6.151    6.219  5.569  6.940

Mean parameters with 95 % cred.int.:
                     50%     2.5%   97.5% P(>0/1)
alpha[pulse.CO]    4.863   -2.831  12.202   0.877
alpha[CO.pulse]   -5.384  -15.227   2.828   0.123
 beta[pulse.CO]    0.903    0.807   1.003   0.029
 beta[CO.pulse]    1.108    0.997   1.239   0.971

27/ 33

Transformed variance components

> Jox <- MCmcmc( ox, n.iter=10000, program="jags" )
> MethComp( Jox )$VarComp
        s.d.
Method       IxR      MxI      res
 CO     3.832491 3.199204 1.670209
 pulse  3.422194 2.862554 4.266671

> tJox <- MCmcmc( ox, n.iter=10000, program="jags", Transform="p
> MethComp( tJox )$VarComp
        s.d.
Method        IxR       MxI       res
 CO     0.2575410 0.1811183 0.1243838
 pulse  0.2232714 0.1565227 0.2031203

28/ 33

Transformation

If the data do not exhibit:

• a linear relationship between methods
• constant variation across the range of measurements

— transform by some function, e.g. the logit, and then do the analysis. Report on the original scale.
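A minimal sketch of such a transformation for percentage data, mirroring the back-transformation code earlier in these slides (the function names are illustrative, not MethComp functions):

to.pct.logit   <- function(p) log( p/(100-p) )   # % scale -> logit scale
from.pct.logit <- function(l) 100/(1+exp(-l))    # logit scale -> % scale
from.pct.logit( to.pct.logit( c(25, 50, 75) ) )  # returns 25 50 75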

29/ 33

[Scatter plot of CO versus pulse (both 0–100), with conversion lines and prediction limits:
  CO = −6.00 + 1.12 pulse  ( 5.07 )
  pulse = 5.38 + 0.90 CO  ( 4.55 )]

30/ 33

[The same scatter plot of CO versus pulse, shown without the printed equations.]

31/ 33

[Bland-Altman style plot of CO − pulse versus (CO + pulse)/2 (0–100, differences within ±40), with the relations:
  CO = −6.00 + 1.12 pulse  ( 5.07 )
  pulse = 5.38 + 0.90 CO  ( 4.55 )
  CO − pulse = −5.68 + 0.11 (CO + pulse)/2  ( 4.80 )]

32/ 33

[The same Bland-Altman style plot of CO − pulse versus (CO + pulse)/2, shown without the printed equations.]

33/ 33