Bayesian Alternatives for Common Null-Hypothesis Significance Tests
Total Pages: 16
File Type: PDF, Size: 1020 KB
Recommended publications
Computing Bayes Factors to Measure Evidence from Experiments: An Extension of the BIC Approximation (arXiv:1803.00360v2 [stat.CO], 2 Mar 2018)
Computing Bayes factors to measure evidence from experiments: An extension of the BIC approximation
Thomas J. Faulkenberry, Tarleton State University

Bayesian inference affords scientists powerful tools for testing hypotheses. One of these tools is the Bayes factor, which indexes the extent to which support for one hypothesis over another is updated after seeing the data. Part of the hesitance to adopt this approach may stem from an unfamiliarity with the computational tools necessary for computing Bayes factors. Previous work has shown that closed-form approximations of Bayes factors are relatively easy to obtain for between-groups methods, such as an analysis of variance or t-test. In this paper, I extend this approximation to develop a formula for the Bayes factor that directly uses information that is typically reported for ANOVAs (e.g., the F ratio and degrees of freedom). After giving two examples of its use, I report the results of simulations which show that even with minimal input, this approximate Bayes factor produces similar results to existing software solutions. Note: to appear in Biometrical Letters.

I. INTRODUCTION
Hypothesis testing is the primary tool for statistical inference across much of the biological and behavioral sciences. As such, most scientists are trained in classical null hypothesis significance testing (NHST). The scenario for testing a hypothesis is likely familiar to most readers of this journal.

A. The Bayes factor
Bayesian inference is a method of measurement that is based on the computation of P(H | D), which is called the posterior probability of a hypothesis H, given data D. Bayes' theorem casts this probability as ...
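To give a rough sense of what such a formula looks like in practice, here is a minimal Python sketch. It assumes the standard BIC approximation for nested linear models (Wagenmakers, 2007), under which BF01 = sqrt(n^df1 * (1 + F*df1/df2)^(-n)); the exact form developed in the paper should be checked against the source.

```python
import math

def approx_bf01(f_ratio, df1, df2, n):
    """Approximate Bayes factor BF01 (evidence for the null over the
    alternative) from ANOVA summaries, via the BIC approximation.
    Assumption: BF01 = sqrt(n^df1 * (1 + F*df1/df2)^(-n)), where df1 and
    df2 are numerator and denominator degrees of freedom and n is the
    total number of observations."""
    # Work in log space to avoid overflow for large n.
    log_bf01 = 0.5 * (df1 * math.log(n) - n * math.log1p(f_ratio * df1 / df2))
    return math.exp(log_bf01)

# Hypothetical report: F(1, 38) = 4.2 with n = 40 observations.
print(approx_bf01(4.2, 1, 38, 40))  # BF01 < 1 means the data favor H1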
The Bayesian Approach to Statistics
THE BAYESIAN APPROACH TO STATISTICS
ANTHONY O'HAGAN

INTRODUCTION
By far the most widely taught and used statistical methods in practice are those of the frequentist school. The ideas of frequentist inference, as set out in Chapter 5 of this book, rest on the frequency definition of probability (Chapter 2), and were developed in the first half of the 20th century. This chapter concerns a radically different approach to statistics, the Bayesian approach, which depends instead on the subjective definition of probability (Chapter 3). In some respects, Bayesian methods are older than frequentist ones, having been the basis of very early statistical reasoning as far back as the 18th century. Bayesian statistics as it is now understood, however, dates back to the 1950s, with subsequent development in the second half of the 20th century. Over that time, the Bayesian approach has steadily gained ground, and is now recognized as a legitimate alternative to the frequentist approach.

This chapter is organized into three sections. ... the true nature of scientific reasoning. The final section addresses various features of modern Bayesian methods that provide some explanation for the rapid increase in their adoption since the 1980s.

BAYESIAN INFERENCE
We first present the basic procedures of Bayesian inference.

Bayes's Theorem and the Nature of Learning
Bayesian inference is a process of learning from data. To give substance to this statement, we need to identify who is doing the learning and what they are learning about.

Terms and Notation
The person doing the learning is an individual ...
Scalable and Robust Bayesian Inference Via the Median Posterior
Scalable and Robust Bayesian Inference via the Median Posterior
CS 584: Big Data Analytics. Material adapted from David Dunson's talk (http://bayesian.org/sites/default/files/Dunson.pdf) & Lizhen Lin's ICML talk (http://techtalks.tv/talks/scalable-and-robust-bayesian-inference-via-the-median-posterior/61140/)

Big Data Analytics
• Large (big N) and complex (big P with interactions) data are collected routinely
• Both speed & generality of data analysis methods are important
• Bayesian approaches offer an attractive general approach for modeling the complexity of big data
• Computational intractability of posterior sampling is a major impediment to application of flexible Bayesian methods

Existing Frequentist Approaches: The Positives
• Optimization-based approaches, such as ADMM or glmnet, are currently most popular for analyzing big data
• General and computationally efficient
• Used orders of magnitude more than Bayes methods
• Can exploit distributed & cloud computing platforms
• Can borrow some advantages of Bayes methods through penalties and regularization

Existing Frequentist Approaches: The Drawbacks
• Such optimization-based methods do not provide measure of uncertainty
• Uncertainty quantification is crucial for most applications
• Scalable penalization methods focus primarily on convex optimization — greatly limits scope and puts ceiling on performance
• For non-convex problems and data with complex structure, existing optimization algorithms can fail badly
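To make the median-posterior idea concrete, here is a hedged toy sketch in Python: it splits contaminated data into subsets, computes a conjugate subset posterior on each, and combines them with a simple median. The actual M-posterior takes the geometric median of subset posteriors in a reproducing kernel Hilbert space; the scalar median below is only the one-dimensional intuition, and all numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 10,000 draws from N(2, 1), with the first 100 replaced by
# gross outliers (they will land together in the first subset).
data = rng.normal(2.0, 1.0, size=10_000)
data[:100] = 50.0

def subset_posterior_mean(x, sigma2=1.0, tau2=10.0):
    """Posterior mean of mu for N(mu, sigma2) data under a conjugate
    N(0, tau2) prior, computed on one data subset."""
    n = len(x)
    precision = 1.0 / tau2 + n / sigma2
    return (n / sigma2) * x.mean() / precision

m = 20                               # number of subsets
subsets = np.array_split(data, m)    # sequential split: outliers stay together
estimates = [subset_posterior_mean(s) for s in subsets]

print("mean of subset posteriors:  ", round(float(np.mean(estimates)), 3))
print("median of subset posteriors:", round(float(np.median(estimates)), 3))
```

The mean of the subset estimates is dragged toward the outliers, while the median ignores the single corrupted subset, which is exactly the robustness property the talk emphasizes.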
1 Estimation and Beyond in the Bayes Universe
ISyE8843A, Brani Vidakovic, Handout 7

1 Estimation and Beyond in the Bayes Universe.

1.1 Estimation

No Bayes estimate can be unbiased, but Bayesians are not upset! No Bayes estimate with respect to the squared error loss can be unbiased, except in a trivial case when its Bayes risk is 0. Suppose that for a proper prior $\pi$ the Bayes estimator $\delta_\pi(X)$ is unbiased,
$$(\forall \theta)\quad E^{X|\theta}\,\delta_\pi(X) = \theta.$$
This implies that the Bayes risk is 0. The Bayes risk of $\delta_\pi(X)$ can be calculated as repeated expectation in two ways,
$$r(\pi, \delta_\pi) = E^{\theta} E^{X|\theta} (\theta - \delta_\pi(X))^2 = E^{X} E^{\theta|X} (\theta - \delta_\pi(X))^2.$$
Thus, conveniently choosing either $E^{\theta} E^{X|\theta}$ or $E^{X} E^{\theta|X}$ and using the properties of conditional expectation, we have
$$\begin{aligned}
r(\pi, \delta_\pi) &= E^{\theta} E^{X|\theta} \theta^2 - E^{\theta} E^{X|\theta} \theta\,\delta_\pi(X) - E^{X} E^{\theta|X} \theta\,\delta_\pi(X) + E^{X} E^{\theta|X} \delta_\pi(X)^2 \\
&= E^{\theta} E^{X|\theta} \theta^2 - E^{\theta} \theta\,[E^{X|\theta} \delta_\pi(X)] - E^{X} \delta_\pi(X)\, E^{\theta|X} \theta + E^{X} E^{\theta|X} \delta_\pi(X)^2 \\
&= E^{\theta} \theta^2 - E^{\theta} \theta \cdot \theta - E^{X} \delta_\pi(X)\,\delta_\pi(X) + E^{X} \delta_\pi(X)^2 = 0.
\end{aligned}$$
Bayesians are not upset. To check for its unbiasedness, the Bayes estimator is averaged with respect to the model measure $(X|\theta)$, and one of the Bayesian commandments is: Thou shall not average with respect to sample space, unless you have Bayesian design in mind. Even frequentists agree that insisting on unbiasedness can lead to bad estimators, and that in their quest to minimize the risk by trading off between variance and bias-squared, a small dosage of bias can help.
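A quick numerical companion to the result above (a sketch under an assumed Beta-Binomial setup, not from the handout): with a uniform Beta(1, 1) prior on θ and Binomial(n, θ) data, the Bayes estimator under squared error loss is the posterior mean δ(X) = (X + 1)/(n + 2), and its bias vanishes only at θ = 1/2.

```python
# E[delta(X) | theta] = (n*theta + 1) / (n + 2) by linearity of expectation,
# so the posterior-mean estimator is biased for every theta != 1/2,
# consistent with the handout's general result.
n = 10
for theta in (0.2, 0.5, 0.8):
    bias = (n * theta + 1) / (n + 2) - theta
    print(f"theta = {theta}: bias = {bias:+.4f}")
```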
Introduction to Bayesian Inference and Modeling Edps 590BAY
Introduction to Bayesian Inference and Modeling, Edps 590BAY
Carolyn J. Anderson, Department of Educational Psychology
© Board of Trustees, University of Illinois, Fall 2019

Overview
◮ What is Bayes theorem
◮ Why Bayesian analysis
◮ What is probability?
◮ Basic steps
◮ A little example
◮ History (not all of the 705+ people that influenced development of the Bayesian approach)
◮ In-class work with probabilities
Depending on the book that you select for this course, read either Gelman et al. Chapter 1 or Kruschke Chapters 1 & 2.

Main References for Course
Throughout the course, I will take material from:
◮ Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A., & Rubin, D.B. (2014). Bayesian Data Analysis, 3rd Edition. Boca Raton, FL: CRC/Taylor & Francis.**
◮ Hoff, P.D. (2009). A First Course in Bayesian Statistical Methods. NY: Springer.**
◮ McElreath, R.M. (2016). Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Boca Raton, FL: CRC/Taylor & Francis.
◮ Kruschke, J.K. (2015). Doing Bayesian Data Analysis: A Tutorial with JAGS and Stan. NY: Academic Press.**
** There are e-versions of these from the UofI library. There is a version of McElreath, but I couldn't get it from the UofI e-collection.

Bayes Theorem
A whole semester on this?
$$p(\theta \mid y) = \frac{p(y \mid \theta)\,p(\theta)}{p(y)}$$
where
◮ y is data, a sample from some population. ...
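To preview what that formula does in practice, here is a minimal grid-approximation sketch in Python (a generic beta-binomial illustration, not the course's own example): the prior is reweighted by the likelihood and renormalized by p(y).

```python
import numpy as np

theta = np.linspace(0, 1, 101)                 # grid of candidate theta values
dx = theta[1] - theta[0]
prior = np.ones_like(theta)                    # flat prior p(theta)
y, n = 7, 10                                   # data: 7 successes in 10 trials
likelihood = theta**y * (1 - theta)**(n - y)   # p(y | theta), up to a constant

posterior = prior * likelihood                 # numerator of Bayes theorem
posterior /= posterior.sum() * dx              # divide by p(y) to normalize

print("posterior mean:", float(np.sum(theta * posterior) * dx))  # about 0.667
```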
Statistical Inference: Paradigms and Controversies in Historic Perspective
Jostein Lillestøl, NHH 2014

Statistical inference: Paradigms and controversies in historic perspective

1. Five paradigms
We will cover the following five lines of thought:
1. Early Bayesian inference and its revival. Inverse probability – Non-informative priors – "Objective" Bayes. Bayes (1763), Laplace (1774), Jeffreys (1931), Bernardo (1975)
2. Fisherian inference. Evidence oriented – Likelihood – Fisher information – Necessity. Fisher (1921 and later)
3. Neyman–Pearson inference. Action oriented – Frequentist/Sample space – Objective. Neyman (1933, 1937), Pearson (1933), Wald (1939), Lehmann (1950 and later)
4. Neo-Bayesian inference. Coherent decisions – Subjective/personal. De Finetti (1937), Savage (1951), Lindley (1953)
5. Likelihood inference. Evidence based – Likelihood profiles – Likelihood ratios. Barnard (1949), Birnbaum (1962), Edwards (1972)

Classical inference as it has been practiced since the 1950s is really none of these in its pure form. It is more like a pragmatic mix of 2 and 3, in particular with respect to testing of significance, pretending to be both action and evidence oriented, which is hard to fulfill in a consistent manner. To keep our minds on track we do not single this out as a separate paradigm, but will discuss it at the end.

A main concern through the history of statistical inference has been to establish a sound scientific framework for the analysis of sampled data. Concepts were initially often vague and disputed, but even after their clarification, various schools of thought have at times been in strong opposition to each other. When we try to describe the approaches here, we will use the notions of today. All five paradigms of statistical inference are based on modeling the observed data $x$ given some parameter or "state of the world" $\theta$, which essentially corresponds to stating the conditional distribution $f(x \mid \theta)$ (or making some assumptions about it).
Hierarchical Models & Bayesian Model Selection
Hierarchical Models & Bayesian Model Selection
Geoffrey Roeder, Departments of Computer Science and Statistics, University of British Columbia, Jan. 20, 2016
Please report any typos or errors to geoff[email protected]

Outline
1 Hierarchical Bayesian Modelling: coin toss redux (point estimates for θ); hierarchical models; application to clinical study
2 Bayesian Model Selection: introduction; Bayes factors; shortcut for marginal likelihood in conjugate case

Coin toss: point estimates for θ
Probability model: consider the experiment of tossing a coin n times. Each toss results in heads with probability θ and tails with probability 1 − θ. Let Y be a random variable denoting the number of observed heads in n coin tosses. Then, we can model Y ∼ Bin(n, θ), with probability mass function
$$p(Y = y \mid \theta) = \binom{n}{y}\,\theta^{y}(1-\theta)^{n-y} \qquad (1)$$
We want to estimate the parameter θ.
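Two standard point estimates for θ drop out of (1); the Python sketch below is a generic illustration (the Beta(2, 2) prior and the data are assumptions, not values from the slides).

```python
# Point estimates for theta after observing y heads in n tosses.
y, n = 7, 10

# Maximum likelihood: maximize (1) in theta, giving theta_hat = y / n.
theta_mle = y / n

# MAP estimate: with a Beta(a, b) prior the posterior is
# Beta(a + y, b + n - y), whose mode is (a + y - 1) / (a + b + n - 2).
a, b = 2, 2
theta_map = (a + y - 1) / (a + b + n - 2)

print(f"MLE: {theta_mle:.3f}   MAP under Beta(2,2): {theta_map:.3f}")
```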
Bayes Factors in Practice Author(S): Robert E
Bayes Factors in Practice
Author(s): Robert E. Kass
Source: Journal of the Royal Statistical Society, Series D (The Statistician), Vol. 42, No. 5, Special Issue: Conference on Practical Bayesian Statistics, 1992 (2) (1993), pp. 551-560
Published by: Blackwell Publishing for the Royal Statistical Society
Stable URL: http://www.jstor.org/stable/2348679
Accessed: 10/08/2010 18:09
Bayes Factor Consistency
Bayes Factor Consistency
Siddhartha Chib, John M. Olin School of Business, Washington University in St. Louis
Todd A. Kuffner, Department of Mathematics, Washington University in St. Louis
July 1, 2016

Abstract: Good large sample performance is typically a minimum requirement of any model selection criterion. This article focuses on the consistency property of the Bayes factor, a commonly used model comparison tool, which has experienced a recent surge of attention in the literature. We thoroughly review existing results. As there exists such a wide variety of settings to be considered, e.g. parametric vs. nonparametric, nested vs. non-nested, etc., we adopt the view that a unified framework has didactic value. Using the basic marginal likelihood identity of Chib (1995), we study Bayes factor asymptotics by decomposing the natural logarithm of the ratio of marginal likelihoods into three components. These are, respectively, log ratios of likelihoods, prior densities, and posterior densities. This yields an interpretation of the log ratio of posteriors as a penalty term, and emphasizes that to understand Bayes factor consistency, the prior support conditions driving posterior consistency in each respective model under comparison should be contrasted in terms of the rates of posterior contraction they imply.

Keywords: Bayes factor; consistency; marginal likelihood; asymptotics; model selection; nonparametric Bayes; semiparametric regression.

1 Introduction
Bayes factors have long held a special place in the Bayesian inferential paradigm, being the criterion of choice in model comparison problems for such Bayesian stalwarts as Jeffreys, Good, Jaynes, and others. An excellent introduction to Bayes factors is given by Kass & Raftery (1995).
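Written out, the decomposition the abstract describes takes the following form (a sketch assuming the standard statement of Chib's 1995 identity, with θ* any point in the support of each model's posterior):

```latex
% Chib's basic marginal likelihood identity:
%   m(y) = f(y | theta*) pi(theta*) / pi(theta* | y).
% Applying it to models M1 and M2 and taking logs gives the three components:
\[
  \log B_{12} \;=\; \log\frac{m_1(y)}{m_2(y)}
  \;=\; \underbrace{\log\frac{f_1(y \mid \theta_1^{*})}{f_2(y \mid \theta_2^{*})}}_{\text{log likelihood ratio}}
  \;+\; \underbrace{\log\frac{\pi_1(\theta_1^{*})}{\pi_2(\theta_2^{*})}}_{\text{log prior ratio}}
  \;-\; \underbrace{\log\frac{\pi_1(\theta_1^{*} \mid y)}{\pi_2(\theta_2^{*} \mid y)}}_{\text{log posterior ratio (penalty)}}
\]
```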
Bayesian Inference for Median of the Lognormal Distribution
Journal of Modern Applied Statistical Methods, Volume 15, Issue 2, Article 32 (11-1-2016)

Bayesian Inference for Median of the Lognormal Distribution
K. Aruna Rao, SDM Degree College, Ujire, India, [email protected]
Juliet Gratia D'Cunha, Mangalore University, Mangalagangothri, India, [email protected]

Recommended Citation: Rao, K. Aruna and D'Cunha, Juliet Gratia (2016). "Bayesian Inference for Median of the Lognormal Distribution," Journal of Modern Applied Statistical Methods: Vol. 15, Iss. 2, Article 32. DOI: 10.22237/jmasm/1478003400. Available at: http://digitalcommons.wayne.edu/jmasm/vol15/iss2/32

Acknowledgements: The second author would like to thank the Government of India, Ministry of Science and Technology, Department of Science and Technology, New Delhi, for sponsoring her with an INSPIRE fellowship, which enabled her to carry out the research program she has undertaken. She is much honored to be the recipient of this award.

Journal of Modern Applied Statistical Methods, November 2016, Vol. 15, No. 2, 526-535. Copyright © 2016 JMASM, Inc.
9 Bayesian Inference
9 Bayesian inference
[Portrait: Thomas Bayes, 1702–1761]

9.1 Subjective probability
This is probability regarded as degree of belief. A subjective probability of an event A is assessed as p if you are prepared to stake £pM to win £M and equally prepared to accept a stake of £pM to win £M. In other words, the bet is fair and you are assumed to behave rationally.

9.1.1 Kolmogorov's axioms
How does subjective probability fit in with the fundamental axioms? Let $\mathcal{A}$ be the set of all subsets of a countable sample space $\Omega$. Then
(i) $P(A) \ge 0$ for every $A \in \mathcal{A}$;
(ii) $P(\Omega) = 1$;
(iii) if $\{A_\lambda : \lambda \in \Lambda\}$ is a countable set of mutually exclusive events belonging to $\mathcal{A}$, then
$$P\Big(\bigcup_{\lambda \in \Lambda} A_\lambda\Big) = \sum_{\lambda \in \Lambda} P(A_\lambda).$$
Obviously the subjective interpretation has no difficulty in conforming with (i) and (ii); (iii) is slightly less obvious. Suppose we have two events A and B such that $A \cap B = \emptyset$. Consider a stake of £$p_A$M to win £M if A occurs and a stake £$p_B$M to win £M if B occurs. The total stake for bets on A or B occurring is £$p_A$M + £$p_B$M to win £M if A or B occurs. Thus we have £$(p_A + p_B)$M to win £M, and so
$$P(A \cup B) = P(A) + P(B).$$

9.1.2 Conditional probability
Define $p_B$, $p_{AB}$, $p_{A|B}$ such that £$p_B$M is the fair stake for £M if B occurs; £$p_{AB}$M is the fair stake for £M if A and B occur; £$p_{A|B}$M is the fair stake for £M if A occurs given B has occurred, otherwise the bet is off.
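The excerpt cuts off here. For completeness, the standard conclusion of this betting argument (a textbook sketch, not the source's own continuation) is that coherence of the three stakes forces the multiplication rule:

```latex
% Coherence: staking p_B M on B, and then p_{A|B} of the resulting M on
% A given B, must be equivalent to staking p_{AB} M directly on "A and B",
% hence
\[
  p_{AB} = p_{A \mid B}\, p_{B}
  \quad\Longrightarrow\quad
  P(A \mid B) = \frac{P(A \cap B)}{P(B)} \qquad (P(B) > 0).
\]
```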
Bayesian Inference
Bayesian Inference
Thomas Nichols, with thanks to Lee Harrison

Topics: Bayesian segmentation and normalisation; spatial priors on activation extent; posterior probability maps (PPMs); dynamic causal modelling.

Attention to Motion Paradigm
[Figure: activations in SPC, V3A and V5+ for the Attention – No attention contrast; Büchel & Friston 1997, Cereb. Cortex; Büchel et al. 1998, Brain]
Conditions: fixation only; observe static dots (+ photic, V1); observe moving dots (+ motion, V5); task on moving dots (+ attention, V5 + parietal cortex).

Dynamic Causal Models
Model 1 (forward): attentional modulation of V1→V5. Model 2 (backward): attentional modulation of SPC→V5. Inputs: photic, motion, attention; regions: V1, V5, SPC. Bayesian model selection: which model is optimal?

Responses to Uncertainty
Paradigm: stimuli are a sequence of randomly sampled discrete events (40 trials); long-term vs short-term memory. Model: a simple computational model of an observer's response to uncertainty based on the number of past events (extent of memory). Question: which regions are best explained by the short-term vs long-term memory model?

Overview
• Introductory remarks
• Some probability densities/distributions
• Probabilistic (generative) models
• Bayesian inference
• A simple example – Bayesian linear regression
• SPM applications: segmentation, dynamic causal modelling, spatial models of fMRI time series
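Since the overview closes with "a simple example – Bayesian linear regression", here is a minimal conjugate sketch in Python (a generic illustration with assumed prior and noise precisions, not code from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: y = 0.5 + 2x + noise, noise standard deviation 0.2.
n = 50
x = rng.uniform(-1, 1, n)
y = 0.5 + 2.0 * x + rng.normal(0, 0.2, n)

X = np.column_stack([np.ones(n), x])   # design matrix with intercept column
alpha = 1.0                            # prior precision: w ~ N(0, alpha^-1 I)
beta = 1.0 / 0.2**2                    # noise precision (assumed known)

# Conjugate posterior over the weights: w | y ~ N(m, S) with
#   S = (alpha I + beta X^T X)^{-1},   m = beta S X^T y
S = np.linalg.inv(alpha * np.eye(2) + beta * X.T @ X)
m = beta * S @ X.T @ y

print("posterior mean [intercept, slope]:", np.round(m, 3))
print("posterior standard deviations:   ", np.round(np.sqrt(np.diag(S)), 3))
```

Unlike a least-squares fit alone, the posterior covariance S carries the uncertainty quantification that the slides contrast with purely optimization-based approaches.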