1 Introduction to R and Bayes Programming
Total Page:16
File Type:pdf, Size:1020Kb
Recommended publications
-
Statistical Methods for Data Science, Lecture 5 Interval Estimates; Comparing Systems
Statistical Methods for Data Science, Lecture 5: Interval estimates; comparing systems
Richard Johansson, November 18, 2018

Statistical inference: overview
● estimate the value of some parameter (last lecture):
  – what is the error rate of my drug test?
● determine some interval that is very likely to contain the true value of the parameter (today):
  – interval estimate for the error rate
● test some hypothesis about the parameter (today):
  – is the error rate significantly different from 0.03?
  – are users significantly more satisfied with web page A than with web page B?

"Recipes"
● in this lecture, we'll look at a few "recipes" that you'll use in the assignment:
  – interval estimate for a proportion ("heads probability")
  – comparing a proportion to a specified value
  – comparing two proportions
● additionally, we'll see the standard method to compute an interval estimate for the mean of a normal
● I will also post some pointers to additional tests
● remember to check that the preconditions are satisfied: what kind of experiment? what assumptions about the data?

Overview
● interval estimates
● significance testing for the accuracy
● comparing two classifiers
● p-value fishing

Interval estimates
● if we get some estimate by ML, can we say something about how reliable that estimate is?
● informally, an interval estimate for the parameter p is an interval I = [p_low, p_high] so that the true value of the parameter is "likely" to be contained in I
● for instance: with 95% probability, the error rate of the spam filter is in the interval [0.05, 0.08]

Frequentists and Bayesians again.
-
Naïve Bayes Classifier
CS 60050 Machine Learning: Naïve Bayes Classifier
Some slides taken from course materials of Tan, Steinbach, Kumar

Bayes Classifier
● A probabilistic framework for solving classification problems
● Approach for modeling probabilistic relationships between the attribute set and the class variable
  – May not be possible to certainly predict class label of a test record even if it has identical attributes to some training records
  – Reason: noisy data or presence of certain factors that are not included in the analysis

Probability Basics
● P(A = a, C = c): joint probability that random variables A and C will take values a and c respectively
● P(A = a | C = c): conditional probability that A will take the value a, given that C has taken value c

$$P(C \mid A) = \frac{P(A, C)}{P(A)}, \qquad P(A \mid C) = \frac{P(A, C)}{P(C)}$$

Bayes Theorem
● Bayes theorem:

$$P(C \mid A) = \frac{P(A \mid C)\, P(C)}{P(A)}$$

● P(C) known as the prior probability
● P(C | A) known as the posterior probability

Example of Bayes Theorem
● Given:
  – A doctor knows that meningitis causes stiff neck 50% of the time
  – Prior probability of any patient having meningitis is 1/50,000
  – Prior probability of any patient having stiff neck is 1/20
● If a patient has stiff neck, what's the probability he/she has meningitis?

$$P(M \mid S) = \frac{P(S \mid M)\, P(M)}{P(S)} = \frac{0.5 \times 1/50000}{1/20} = 0.0002$$

Bayesian Classifiers
● Consider each attribute and class label as random variables
● Given a record with attributes (A1, A2, …, An)
  – Goal is to predict class C
  – Specifically, we want to find the value of C that maximizes P(C | A1, A2, …, An)
● Can we estimate
-
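The meningitis calculation in the excerpt above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `posterior` and the variable names are assumptions introduced here, not part of the original slides; the three input probabilities are the ones quoted in the example.

```python
from fractions import Fraction


def posterior(likelihood, prior, evidence):
    """Bayes' theorem: P(C | A) = P(A | C) * P(C) / P(A)."""
    return likelihood * prior / evidence


# Numbers quoted in the meningitis example (exact fractions avoid rounding).
p_s_given_m = Fraction(1, 2)      # P(S | M): stiff neck given meningitis
p_m = Fraction(1, 50000)          # P(M): prior probability of meningitis
p_s = Fraction(1, 20)             # P(S): prior probability of stiff neck

p_m_given_s = posterior(p_s_given_m, p_m, p_s)
print(p_m_given_s, float(p_m_given_s))  # 1/5000 0.0002
```

Exact fractions are used so the result matches the slide's 0.0002 without floating-point noise.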
The Bayesian Approach to Statistics
THE BAYESIAN APPROACH TO STATISTICS
ANTHONY O’HAGAN

INTRODUCTION
By far the most widely taught and used statistical methods in practice are those of the frequentist school. The ideas of frequentist inference, as set out in Chapter 5 of this book, rest on the frequency definition of probability (Chapter 2), and were developed in the first half of the 20th century. This chapter concerns a radically different approach to statistics, the Bayesian approach, which depends instead on the subjective definition of probability (Chapter 3). In some respects, Bayesian methods are older than frequentist ones, having been the basis of very early statistical reasoning as far back as the 18th century. Bayesian statistics as it is now understood, however, dates back to the 1950s, with subsequent development in the second half of the 20th century. Over that time, the Bayesian approach has steadily gained ground, and is now recognized as a legitimate alternative to the frequentist approach.
This chapter is organized into three sections. ... the true nature of scientific reasoning. The final section addresses various features of modern Bayesian methods that provide some explanation for the rapid increase in their adoption since the 1980s.

BAYESIAN INFERENCE
We first present the basic procedures of Bayesian inference.

Bayes’s Theorem and the Nature of Learning
Bayesian inference is a process of learning from data. To give substance to this statement, we need to identify who is doing the learning and what they are learning about.

Terms and Notation
The person doing the learning is an individual
-
LINEAR REGRESSION MODELS Likelihood and Reference Bayesian
LINEAR REGRESSION MODELS: Likelihood and Reference Bayesian Analysis
Mike West, May 3, 1999

1 Straight Line Regression Models

We begin with the simple straight line regression model
$$y_i = \alpha + \beta x_i + \epsilon_i$$
where the design points $x_i$ are fixed in advance, and the measurement/sampling errors $\epsilon_i$ are independent and normally distributed, $\epsilon_i \sim N(0, \sigma^2)$ for each $i = 1, \ldots, n$. In this context, we have looked at general modelling questions, data and the fitting of least squares estimates of $\alpha$ and $\beta$. Now we turn to more formal likelihood and Bayesian inference.

1.1 Likelihood and MLEs

The formal parametric inference problem is a multi-parameter problem: we require inferences on the three parameters $(\alpha, \beta, \sigma^2)$. The likelihood function has a simple enough form, as we now show. Throughout, we do not indicate the design points in conditioning statements, though they are implicitly conditioned upon. Write $Y = \{y_1, \ldots, y_n\}$ and $X = \{x_1, \ldots, x_n\}$. Given $X$ and the model parameters, each $y_i$ is the corresponding zero-mean normal random quantity $\epsilon_i$ plus the term $\alpha + \beta x_i$, so that $y_i$ is normal with this term as its mean and variance $\sigma^2$. Also, since the $\epsilon_i$ are independent, so are the $y_i$. Thus
$$(y_i \mid \alpha, \beta, \sigma^2) \sim N(\alpha + \beta x_i, \sigma^2)$$
with conditional density function
$$p(y_i \mid \alpha, \beta, \sigma^2) = \exp\{-(y_i - \alpha - \beta x_i)^2 / (2\sigma^2)\} / (2\pi\sigma^2)^{1/2}$$
for each $i$. Also, by independence, the joint density function is
$$p(Y \mid \alpha, \beta, \sigma^2) = \prod_{i=1}^{n} p(y_i \mid \alpha, \beta, \sigma^2).$$
Given the observed response values $Y$ this provides the likelihood function for the three parameters: the joint density above evaluated at the observed and hence now fixed values of $Y$, now viewed as a function of $(\alpha, \beta, \sigma^2)$ as they vary across possible parameter values.
-
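As a rough illustration of the likelihood described in this excerpt, the sketch below evaluates the log of the joint normal density for a straight-line model with intercept `alpha`, slope `beta`, and error variance `sigma2`, and computes the maximum likelihood estimates. The symbol names, the function names, and the four data points are assumptions made for this example; the excerpt itself gives no data. The estimates use the standard result that, under normal errors, the MLEs of intercept and slope coincide with least squares and the MLE of the variance is the residual sum of squares divided by n.

```python
import math


def log_likelihood(alpha, beta, sigma2, xs, ys):
    """Log of the joint normal density of ys given xs and the parameters."""
    n = len(xs)
    rss = sum((y - alpha - beta * x) ** 2 for x, y in zip(xs, ys))
    return -0.5 * n * math.log(2 * math.pi * sigma2) - rss / (2 * sigma2)


def mle(xs, ys):
    """Maximum likelihood estimates (alpha, beta, sigma2) for the line."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    beta = sxy / sxx                    # least squares slope
    alpha = ybar - beta * xbar          # least squares intercept
    sigma2 = sum((y - alpha - beta * x) ** 2 for x, y in zip(xs, ys)) / n
    return alpha, beta, sigma2


# Hypothetical data, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]

alpha_hat, beta_hat, sigma2_hat = mle(xs, ys)
ll_hat = log_likelihood(alpha_hat, beta_hat, sigma2_hat, xs, ys)
print(alpha_hat, beta_hat, sigma2_hat, ll_hat)
```

Moving any of the three parameters away from the estimates should lower the log-likelihood, which is a quick sanity check on the formulas.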
Points for Discussion
Bayesian Analysis in Medicine EPIB-677
Points for Discussion

1. Precisely what information does a p-value provide?
2. What is the correct (and incorrect) way to interpret a confidence interval?
3. What is Bayes Theorem? How does it operate?
4. Summary of above three points: What exactly does one learn about a parameter of interest after having done a frequentist analysis? Bayesian analysis?
5. Example 8 from the book chapter by Jim Berger.
6. Example 13 from the book chapter by Jim Berger.
7. Example 17 from the book chapter by Jim Berger.
8. Examples where there is a choice between a binomial or a negative binomial likelihood, found in the paper by Berger and Berry.
9. Problems with specifying prior distributions.
10. In what types of epidemiology data analysis situations are Bayesian methods particularly useful?
11. What does decision analysis add to a Bayesian analysis?

1. Precisely what information does a p-value provide?
Recall the definition of a p-value: The probability of observing a test statistic as extreme as or more extreme than the observed value, assuming that the null hypothesis is true.

2. What is the correct (and incorrect) way to interpret a confidence interval?
Does a 95% confidence interval provide a 95% probability region for the true parameter value? If not, what is the correct interpretation? In practice, it is usually helpful to consider the following graphic:

Figure 1: How to interpret confidence intervals and/or credible regions. Depending on where the confidence/credible interval lies in relation to a region of clinical equivalence, different conclusions can be drawn.
-
The Bayesian New Statistics: Hypothesis Testing, Estimation, Meta-Analysis, and Power Analysis from a Bayesian Perspective
In Press, Psychonomic Bulletin & Review. Version of November 15, 2016.
Published version at http://dx.doi.org/10.3758/s13423-016-1221-4
View only at http://rdcu.be/o6hd
Initial submission: 2015-May-13; Editor action 1: 2015-Aug-23; Revision 1 submitted: 2016-Apr-16; Editor action 2: 2016-Oct-12; Revision 2 submitted: 2016-Nov-15; Accepted: 2016-Dec-16; Corrected proofs submitted: 2017-Jan-16; Published online: 2017-Feb-08

The Bayesian New Statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective
John K. Kruschke and Torrin M. Liddell, Indiana University, Bloomington, USA

In the practice of data analysis, there is a conceptual distinction between hypothesis testing, on the one hand, and estimation with quantified uncertainty, on the other hand. Among frequentists in psychology a shift of emphasis from hypothesis testing to estimation has been dubbed “the New Statistics” (Cumming, 2014). A second conceptual distinction is between frequentist methods and Bayesian methods. Our main goal in this article is to explain how Bayesian methods achieve the goals of the New Statistics better than frequentist methods. The article reviews frequentist and Bayesian approaches to hypothesis testing and to estimation with confidence or credible intervals. The article also describes Bayesian approaches to meta-analysis, randomized controlled trials, and power analysis.

Keywords: Null hypothesis significance testing, Bayesian inference, Bayes factor, confidence interval, credible interval, highest density interval, region of practical equivalence, meta-analysis, power analysis, effect size, randomized controlled trial.

The New Statistics emphasizes a shift of emphasis away from null hypothesis significance testing (NHST) to “estimation based on effect sizes, confidence intervals, and meta-analysis” (Cumming, 2014, p.
... emphasis. Bayesian methods provide a coherent framework for hypothesis testing, so when null hypothesis testing is the crux of the research then Bayesian null hypothesis testing should
-
More on Bayesian Methods: Part II
More on Bayesian Methods: Part II
Rebecca C. Steorts, Bayesian Methods and Modern Statistics: STA 360/601, Lecture 5

Today's menu
● Confidence intervals
● Credible Intervals
● Example

Confidence intervals vs credible intervals
A confidence interval for an unknown (fixed) parameter θ is an interval of numbers that we believe is likely to contain the true value of θ. Intervals are important because they provide us with an idea of how well we can estimate θ.

Confidence intervals vs credible intervals
● A confidence interval is constructed to contain θ a percentage of the time, say 95%.
● Suppose our confidence level is 95% and our interval is (L, U). Then we are 95% confident that the true value of θ is contained in (L, U) in the long run.
● In the long run means that this would occur nearly 95% of the time if we repeated our study millions and millions of times.

Common Misconceptions in Statistical Inference
● A confidence interval is a statement about θ (a population parameter).
● It is not a statement about the sample.
● It is also not a statement about individual subjects in the population.

Common Misconceptions in Statistical Inference
● Let a 95% confidence interval for the average amount of television watched by Americans be (2.69, 6.04) hours.
● This doesn't mean we can say that 95% of all Americans watch between 2.69 and 6.04 hours of television.
● We also cannot say that 95% of Americans in the sample watch between 2.69 and 6.04 hours of television.
-
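The "long run" reading of a confidence interval in this excerpt can be illustrated with a small simulation. The sketch below is an assumption-laden example, not something taken from the lecture: it assumes normally distributed data with a known standard deviation `sigma`, uses the textbook interval "sample mean ± 1.96·sigma/√n", and picks arbitrary values for the true mean `theta`, the sample size, and the number of repeated studies.

```python
import math
import random


def coverage(theta=5.0, sigma=2.0, n=25, reps=2000, z=1.96, seed=0):
    """Fraction of repeated studies whose 95% interval contains theta."""
    rng = random.Random(seed)                # fixed seed for reproducibility
    half_width = z * sigma / math.sqrt(n)    # known-sigma interval half-width
    hits = 0
    for _ in range(reps):
        sample_mean = sum(rng.gauss(theta, sigma) for _ in range(n)) / n
        if sample_mean - half_width <= theta <= sample_mean + half_width:
            hits += 1
    return hits / reps


rate = coverage()
print(f"empirical coverage: {rate:.3f}")
```

The printed proportion should be close to 0.95. Each individual interval either contains `theta` or does not; the 95% describes the procedure across repetitions, which is the point the slides make.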
On the Frequentist Coverage of Bayesian Credible Intervals for Lower Bounded Means
Electronic Journal of Statistics, Vol. 2 (2008) 1028–1042, ISSN: 1935-7524, DOI: 10.1214/08-EJS292
arXiv:0809.1027v2 [math.ST] 13 Nov 2008

On the frequentist coverage of Bayesian credible intervals for lower bounded means

Éric Marchand, Département de mathématiques, Université de Sherbrooke, Sherbrooke, Qc, CANADA, J1K 2R1, e-mail: [email protected]
William E. Strawderman, Department of Statistics, Rutgers University, 501 Hill Center, Busch Campus, Piscataway, N.J., USA 08854-8019, e-mail: [email protected]
Keven Bosa, Statistics Canada - Statistique Canada, 100, promenade Tunney’s Pasture, Ottawa, On, CANADA, K1A 0T6, e-mail: [email protected]
Aziz Lmoudden, Département de mathématiques, Université de Sherbrooke, Sherbrooke, Qc, CANADA, J1K 2R1, e-mail: [email protected]

Abstract: For estimating a lower bounded location or mean parameter for a symmetric and logconcave density, we investigate the frequentist performance of the 100(1 − α)% Bayesian HPD credible set associated with priors which are truncations of flat priors onto the restricted parameter space. Various new properties are obtained. Namely, we identify precisely where the minimum coverage is obtained and we show that this minimum coverage is bounded between $1 - \frac{3\alpha}{2}$ and $1 - \frac{3\alpha}{2} + \frac{\alpha^2}{1+\alpha}$; with the lower bound $1 - \frac{3\alpha}{2}$ improving (for α ≤ 1/3) on the previously established ([9]; [8]) lower bound $\frac{1-\alpha}{1+\alpha}$. Several illustrative examples are given.

AMS 2000 subject classifications: 62F10, 62F30, 62C10, 62C15, 35Q15, 45B05, 42A99.
Keywords and phrases: Bayesian credible sets, restricted parameter space, confidence intervals, frequentist coverage probability, logconcavity.
-
Random Forest Classifiers
Random Forest Classifiers
Carlo Tomasi

1 Classification Trees

A classification tree represents the probability space P of posterior probabilities p(y | x) of label given feature by a recursive partition of the feature space X, where each partition is performed by a test on the feature x called a split rule. To each set of the partition is assigned a posterior probability distribution, and p(y | x) for a feature x ∈ X is then defined as the probability distribution associated with the set of the partition that contains x. A popular split rule called a 1-rule partitions a set S ⊆ X × Y into the two sets

$$L = \{(x, y) \in S \mid x_d \le t\} \quad \text{and} \quad R = \{(x, y) \in S \mid x_d > t\} \qquad (1)$$

where $x_d$ is the d-th component of x and t is a real number. This note considers only binary classification trees with 1-rule splits, and the word “binary” is omitted in what follows.

Concretely, the split rules are placed on the interior nodes of a binary tree and the probability distributions are on the leaves. The tree τ can be defined recursively as either a single (leaf) node with values of the posterior probability p(y | x) collected in a vector τ.p or an (interior) node with a split function with parameters τ.d and τ.t that returns either the left or the right descendant (τ.L or τ.R) of τ depending on the outcome of the split. A classification tree τ then takes a new feature x, looks up its posterior in the tree by following splits in τ down to a leaf, and returns the MAP estimate, as summarized in Algorithm 1.
-
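The lookup described in this excerpt (follow 1-rule splits down to a leaf, then return the most probable label) can be sketched in Python as follows. Algorithm 1 itself is not reproduced in the excerpt, so this is a reading of the prose rather than the author's pseudocode; the `Node` class, the `classify` function, and the tiny example tree are assumptions introduced here, with field names `p`, `d`, `t`, `L`, `R` mirroring the notation in the text.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """Leaf if p is set (posterior over labels); otherwise an interior split."""
    p: Optional[Sequence[float]] = None   # posterior probabilities at a leaf
    d: int = 0                            # index of the feature component tested
    t: float = 0.0                        # split threshold
    L: Optional["Node"] = None            # descendant taken when x[d] <= t
    R: Optional["Node"] = None            # descendant taken when x[d] > t


def classify(tau: Node, x: Sequence[float]) -> int:
    """Follow splits from the root to a leaf and return the MAP label index."""
    while tau.p is None:
        tau = tau.L if x[tau.d] <= tau.t else tau.R
    return max(range(len(tau.p)), key=lambda y: tau.p[y])


# Hypothetical two-level tree over 2-D features with labels 0 and 1.
tree = Node(
    d=0, t=1.0,
    L=Node(p=[0.9, 0.1]),
    R=Node(d=1, t=0.5, L=Node(p=[0.4, 0.6]), R=Node(p=[0.2, 0.8])),
)

print(classify(tree, [0.5, 3.0]))  # 0
print(classify(tree, [2.0, 0.1]))  # 1
```

An iterative loop is used instead of recursion purely for brevity; the recursive definition of τ in the text maps onto either form.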
Bayesian Inference for Median of the Lognormal Distribution K
Journal of Modern Applied Statistical Methods, Volume 15, Issue 2, Article 32 (11-1-2016); November 2016, Vol. 15, No. 2, 526-535. Copyright © 2016 JMASM, Inc.

Bayesian Inference for Median of the Lognormal Distribution
K. Aruna Rao, SDM Degree College, Ujire, India, [email protected]
Juliet Gratia D'Cunha, Mangalore University, Mangalagangothri, India, [email protected]

Part of the Applied Statistics Commons, Social and Behavioral Sciences Commons, and the Statistical Theory Commons.

Recommended Citation: Rao, K. Aruna and D'Cunha, Juliet Gratia (2016) "Bayesian Inference for Median of the Lognormal Distribution," Journal of Modern Applied Statistical Methods: Vol. 15 : Iss. 2 , Article 32. DOI: 10.22237/jmasm/1478003400. Available at: http://digitalcommons.wayne.edu/jmasm/vol15/iss2/32

This Regular Article is brought to you for free and open access by the Open Access Journals at DigitalCommons@WayneState. It has been accepted for inclusion in Journal of Modern Applied Statistical Methods by an authorized editor of DigitalCommons@WayneState.

Cover Page Footnote (Acknowledgements): The second author would like to thank Government of India, Ministry of Science and Technology, Department of Science and Technology, New Delhi, for sponsoring her with an INSPIRE fellowship, which enables her to carry out the research program which she has undertaken. She is much honored to be the recipient of this award.
-
Bayestest Interval — Interval Hypothesis Testing
Title (stata.com)
bayestest interval - Interval hypothesis testing

Contents: Description, Quick start, Menu, Syntax, Options, Remarks and examples, Stored results, Methods and formulas, Reference, Also see

Description
bayestest interval performs interval hypothesis tests for model parameters and functions of model parameters using current Bayesian estimation results. bayestest interval reports mean estimates, standard deviations, and MCMC standard errors of posterior probabilities associated with an interval hypothesis.

Quick start
Posterior probability of the hypothesis that 45 < {y:_cons} < 50
    bayestest interval {y:_cons}, lower(45) upper(50)
As above, but skip every 5 observations from the full MCMC sample
    bayestest interval {y:_cons}, lower(45) upper(50) skip(5)
Posterior probability of a hypothesis about a function of model parameter {y:x1}
    bayestest interval (OR:exp({y:x1})), lower(1.1) upper(1.5)
Posterior probability of hypotheses 45 < {y:_cons} < 50 and 0 < {var} < 10 tested independently
    bayestest interval ({y:_cons}, lower(45) upper(50)) ///
        ({var}, lower(0) upper(10))
As above, but tested jointly
    bayestest interval (({y:_cons}, lower(45) upper(50)) ///
        ({var}, lower(0) upper(10)), joint)
Posterior probability of the hypothesis {mean} = 2 for discrete parameter {mean}
    bayestest interval ({mean}==2)
Posterior probability of the interval hypothesis 0 ≤ {mean} ≤ 4
    bayestest interval {mean}, lower(0, inclusive) upper(4, inclusive)
Posterior probability that the first observation of the first simulated outcome is positive (after bayesmh)
    bayespredict {_ysim},
-
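Conceptually, the posterior probability of an interval hypothesis can be estimated from posterior draws as the share of draws that fall inside the interval. The Python sketch below illustrates only that idea; it is not Stata's implementation, it does not compute MCMC standard errors, and the function name `interval_probability` and the list of draws are hypothetical values invented for the example.

```python
from statistics import mean, pstdev


def interval_probability(draws, lower, upper, inclusive=False):
    """Mean and standard deviation of the indicator that a draw is in the interval."""
    if inclusive:
        inside = [1.0 if lower <= d <= upper else 0.0 for d in draws]
    else:
        inside = [1.0 if lower < d < upper else 0.0 for d in draws]
    return mean(inside), pstdev(inside)


# Hypothetical posterior draws of a single parameter (not real MCMC output).
draws = [44.0, 46.5, 47.2, 49.9, 50.0, 51.3, 45.0, 48.8]

prob_open, sd_open = interval_probability(draws, 45, 50)
prob_closed, sd_closed = interval_probability(draws, 45, 50, inclusive=True)
print(prob_open, prob_closed)  # 0.5 0.75
```

The open and closed versions differ here only because two of the made-up draws sit exactly on the bounds; with draws of a continuous parameter the distinction rarely matters, while for a discrete parameter it can.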
Bayesian Inference
Bayesian Inference
Thomas Nichols, with thanks to Lee Harrison

[Figure panels: Bayesian segmentation and normalisation; Spatial priors on activation extent; Posterior probability maps (PPMs); Dynamic Causal Modelling]

Attention to Motion
Paradigm:
● fixation only
● observe static dots: + photic (V1)
● observe moving dots: + motion (V5)
● task on moving dots: + attention (V5 + parietal cortex)
Results: SPC, V3A, V5+ (Attention – No attention). Büchel & Friston 1997, Cereb. Cortex; Büchel et al. 1998, Brain.

Attention to Motion: Dynamic Causal Models
● Model 1 (forward): attentional modulation of V1→V5
● Model 2 (backward): attentional modulation of SPC→V5
[Figure: network diagrams for both models with inputs Photic, Motion and Attention and regions V1, V5 and SPC]
Bayesian model selection: Which model is optimal?

Responses to Uncertainty
● Long term memory
● Short term memory
Paradigm:
● Stimuli: sequence of randomly sampled discrete events
● Model: simple computational model of an observer's response to uncertainty based on the number of past events (extent of memory)
● Question: which regions are best explained by short / long term memory model?
[Figure: trial sequence, 40 trials]

Overview
● Introductory remarks
● Some probability densities/distributions
● Probabilistic (generative) models
● Bayesian inference
● A simple example: Bayesian linear regression
● SPM applications
  – Segmentation
  – Dynamic causal modeling
  – Spatial models of fMRI time series

Probability distributions and densities
[Figure: probability distributions, k=2]