Bayesian Inference
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
DSA 5403 Bayesian Statistics: an Advancing Introduction 16 Units – Each Unit a Week’S Work
DSA 5403 Bayesian Statistics: An Advancing Introduction 16 units – each unit a week’s work. The following are the contents of the course divided into chapters of the book Doing Bayesian Data Analysis . by John Kruschke. The course is structured around the above book but will be embellished with more theoretical content as needed. The book can be obtained from the library : http://www.sciencedirect.com/science/book/9780124058880 Topics: This is divided into three parts From the book: “Part I The basics: models, probability, Bayes' rule and r: Introduction: credibility, models, and parameters; The R programming language; What is this stuff called probability?; Bayes' rule Part II All the fundamentals applied to inferring a binomial probability: Inferring a binomial probability via exact mathematical analysis; Markov chain Monte C arlo; J AG S ; Hierarchical models; Model comparison and hierarchical modeling; Null hypothesis significance testing; Bayesian approaches to testing a point ("Null") hypothesis; G oals, power, and sample size; S tan Part III The generalized linear model: Overview of the generalized linear model; Metric-predicted variable on one or two groups; Metric predicted variable with one metric predictor; Metric predicted variable with multiple metric predictors; Metric predicted variable with one nominal predictor; Metric predicted variable with multiple nominal predictors; Dichotomous predicted variable; Nominal predicted variable; Ordinal predicted variable; C ount predicted variable; Tools in the trunk” Part 1: The Basics: Models, Probability, Bayes’ Rule and R Unit 1 Chapter 1, 2 Credibility, Models, and Parameters, • Derivation of Bayes’ rule • Re-allocation of probability • The steps of Bayesian data analysis • Classical use of Bayes’ rule o Testing – false positives etc o Definitions of prior, likelihood, evidence and posterior Unit 2 Chapter 3 The R language • Get the software • Variables types • Loading data • Functions • Plots Unit 3 Chapter 4 Probability distributions. -
A History of Evidence in Medical Decisions: from the Diagnostic Sign to Bayesian Inference
BRIEF REPORTS A History of Evidence in Medical Decisions: From the Diagnostic Sign to Bayesian Inference Dennis J. Mazur, MD, PhD Bayesian inference in medical decision making is a concept through the development of probability theory based on con- that has a long history with 3 essential developments: 1) the siderations of games of chance, and ending with the work of recognition of the need for data (demonstrable scientific Jakob Bernoulli, Laplace, and others, we will examine how evidence), 2) the development of probability, and 3) the Bayesian inference developed. Key words: evidence; games development of inverse probability. Beginning with the of chance; history of evidence; inference; probability. (Med demonstrative evidence of the physician’s sign, continuing Decis Making 2012;32:227–231) here are many senses of the term evidence in the Hacking argues that when humans began to examine Thistory of medicine other than scientifically things pointing beyond themselves to other things, gathered (collected) evidence. Historically, evidence we are examining a new type of evidence to support was first based on testimony and opinion by those the truth or falsehood of a scientific proposition respected for medical expertise. independent of testimony or opinion. This form of evidence can also be used to challenge the data themselves in supporting the truth or falsehood of SIGN AS EVIDENCE IN MEDICINE a proposition. In this treatment of the evidence contained in the Ian Hacking singles out the physician’s sign as the 1 physician’s diagnostic sign, Hacking is examining first example of demonstrable scientific evidence. evidence as a phenomenon of the ‘‘low sciences’’ Once the physician’s sign was recognized as of the Renaissance, which included alchemy, astrol- a form of evidence, numbers could be attached to ogy, geology, and medicine.1 The low sciences were these signs. -
Assessing Fairness with Unlabeled Data and Bayesian Inference
Can I Trust My Fairness Metric? Assessing Fairness with Unlabeled Data and Bayesian Inference Disi Ji1 Padhraic Smyth1 Mark Steyvers2 1Department of Computer Science 2Department of Cognitive Sciences University of California, Irvine [email protected] [email protected] [email protected] Abstract We investigate the problem of reliably assessing group fairness when labeled examples are few but unlabeled examples are plentiful. We propose a general Bayesian framework that can augment labeled data with unlabeled data to produce more accurate and lower-variance estimates compared to methods based on labeled data alone. Our approach estimates calibrated scores for unlabeled examples in each group using a hierarchical latent variable model conditioned on labeled examples. This in turn allows for inference of posterior distributions with associated notions of uncertainty for a variety of group fairness metrics. We demonstrate that our approach leads to significant and consistent reductions in estimation error across multiple well-known fairness datasets, sensitive attributes, and predictive models. The results show the benefits of using both unlabeled data and Bayesian inference in terms of assessing whether a prediction model is fair or not. 1 Introduction Machine learning models are increasingly used to make important decisions about individuals. At the same time it has become apparent that these models are susceptible to producing systematically biased decisions with respect to sensitive attributes such as gender, ethnicity, and age [Angwin et al., 2017, Berk et al., 2018, Corbett-Davies and Goel, 2018, Chen et al., 2019, Beutel et al., 2019]. This has led to a significant amount of recent work in machine learning addressing these issues, including research on both (i) definitions of fairness in a machine learning context (e.g., Dwork et al. -
Bayes and the Law
Bayes and the Law Norman Fenton, Martin Neil and Daniel Berger [email protected] January 2016 This is a pre-publication version of the following article: Fenton N.E, Neil M, Berger D, “Bayes and the Law”, Annual Review of Statistics and Its Application, Volume 3, 2016, doi: 10.1146/annurev-statistics-041715-033428 Posted with permission from the Annual Review of Statistics and Its Application, Volume 3 (c) 2016 by Annual Reviews, http://www.annualreviews.org. Abstract Although the last forty years has seen considerable growth in the use of statistics in legal proceedings, it is primarily classical statistical methods rather than Bayesian methods that have been used. Yet the Bayesian approach avoids many of the problems of classical statistics and is also well suited to a broader range of problems. This paper reviews the potential and actual use of Bayes in the law and explains the main reasons for its lack of impact on legal practice. These include misconceptions by the legal community about Bayes’ theorem, over-reliance on the use of the likelihood ratio and the lack of adoption of modern computational methods. We argue that Bayesian Networks (BNs), which automatically produce the necessary Bayesian calculations, provide an opportunity to address most concerns about using Bayes in the law. Keywords: Bayes, Bayesian networks, statistics in court, legal arguments 1 1 Introduction The use of statistics in legal proceedings (both criminal and civil) has a long, but not terribly well distinguished, history that has been very well documented in (Finkelstein, 2009; Gastwirth, 2000; Kadane, 2008; Koehler, 1992; Vosk and Emery, 2014). -
Bayesian Inference Chapter 4: Regression and Hierarchical Models
Bayesian Inference Chapter 4: Regression and Hierarchical Models Conchi Aus´ınand Mike Wiper Department of Statistics Universidad Carlos III de Madrid Master in Business Administration and Quantitative Methods Master in Mathematical Engineering Conchi Aus´ınand Mike Wiper Regression and hierarchical models Masters Programmes 1 / 35 Objective AFM Smith Dennis Lindley We analyze the Bayesian approach to fitting normal and generalized linear models and introduce the Bayesian hierarchical modeling approach. Also, we study the modeling and forecasting of time series. Conchi Aus´ınand Mike Wiper Regression and hierarchical models Masters Programmes 2 / 35 Contents 1 Normal linear models 1.1. ANOVA model 1.2. Simple linear regression model 2 Generalized linear models 3 Hierarchical models 4 Dynamic models Conchi Aus´ınand Mike Wiper Regression and hierarchical models Masters Programmes 3 / 35 Normal linear models A normal linear model is of the following form: y = Xθ + ; 0 where y = (y1;:::; yn) is the observed data, X is a known n × k matrix, called 0 the design matrix, θ = (θ1; : : : ; θk ) is the parameter set and follows a multivariate normal distribution. Usually, it is assumed that: 1 ∼ N 0 ; I : k φ k A simple example of normal linear model is the simple linear regression model T 1 1 ::: 1 where X = and θ = (α; β)T . x1 x2 ::: xn Conchi Aus´ınand Mike Wiper Regression and hierarchical models Masters Programmes 4 / 35 Normal linear models Consider a normal linear model, y = Xθ + . A conjugate prior distribution is a normal-gamma distribution: -
Estimating the Accuracy of Jury Verdicts
Institute for Policy Research Northwestern University Working Paper Series WP-06-05 Estimating the Accuracy of Jury Verdicts Bruce D. Spencer Faculty Fellow, Institute for Policy Research Professor of Statistics Northwestern University Version date: April 17, 2006; rev. May 4, 2007 Forthcoming in Journal of Empirical Legal Studies 2040 Sheridan Rd. ! Evanston, IL 60208-4100 ! Tel: 847-491-3395 Fax: 847-491-9916 www.northwestern.edu/ipr, ! [email protected] Abstract Average accuracy of jury verdicts for a set of cases can be studied empirically and systematically even when the correct verdict cannot be known. The key is to obtain a second rating of the verdict, for example the judge’s, as in the recent study of criminal cases in the U.S. by the National Center for State Courts (NCSC). That study, like the famous Kalven-Zeisel study, showed only modest judge-jury agreement. Simple estimates of jury accuracy can be developed from the judge-jury agreement rate; the judge’s verdict is not taken as the gold standard. Although the estimates of accuracy are subject to error, under plausible conditions they tend to overestimate the average accuracy of jury verdicts. The jury verdict was estimated to be accurate in no more than 87% of the NCSC cases (which, however, should not be regarded as a representative sample with respect to jury accuracy). More refined estimates, including false conviction and false acquittal rates, are developed with models using stronger assumptions. For example, the conditional probability that the jury incorrectly convicts given that the defendant truly was not guilty (a “type I error”) was estimated at 0.25, with an estimated standard error (s.e.) of 0.07, the conditional probability that a jury incorrectly acquits given that the defendant truly was guilty (“type II error”) was estimated at 0.14 (s.e. -
The Interpretation of Probability: Still an Open Issue? 1
philosophies Article The Interpretation of Probability: Still an Open Issue? 1 Maria Carla Galavotti Department of Philosophy and Communication, University of Bologna, Via Zamboni 38, 40126 Bologna, Italy; [email protected] Received: 19 July 2017; Accepted: 19 August 2017; Published: 29 August 2017 Abstract: Probability as understood today, namely as a quantitative notion expressible by means of a function ranging in the interval between 0–1, took shape in the mid-17th century, and presents both a mathematical and a philosophical aspect. Of these two sides, the second is by far the most controversial, and fuels a heated debate, still ongoing. After a short historical sketch of the birth and developments of probability, its major interpretations are outlined, by referring to the work of their most prominent representatives. The final section addresses the question of whether any of such interpretations can presently be considered predominant, which is answered in the negative. Keywords: probability; classical theory; frequentism; logicism; subjectivism; propensity 1. A Long Story Made Short Probability, taken as a quantitative notion whose value ranges in the interval between 0 and 1, emerged around the middle of the 17th century thanks to the work of two leading French mathematicians: Blaise Pascal and Pierre Fermat. According to a well-known anecdote: “a problem about games of chance proposed to an austere Jansenist by a man of the world was the origin of the calculus of probabilities”2. The ‘man of the world’ was the French gentleman Chevalier de Méré, a conspicuous figure at the court of Louis XIV, who asked Pascal—the ‘austere Jansenist’—the solution to some questions regarding gambling, such as how many dice tosses are needed to have a fair chance to obtain a double-six, or how the players should divide the stakes if a game is interrupted. -
Bayesian Versus Frequentist Statistics for Uncertainty Analysis
- 1 - Disentangling Classical and Bayesian Approaches to Uncertainty Analysis Robin Willink1 and Rod White2 1email: [email protected] 2Measurement Standards Laboratory PO Box 31310, Lower Hutt 5040 New Zealand email(corresponding author): [email protected] Abstract Since the 1980s, we have seen a gradual shift in the uncertainty analyses recommended in the metrological literature, principally Metrologia, and in the BIPM’s guidance documents; the Guide to the Expression of Uncertainty in Measurement (GUM) and its two supplements. The shift has seen the BIPM’s recommendations change from a purely classical or frequentist analysis to a purely Bayesian analysis. Despite this drift, most metrologists continue to use the predominantly frequentist approach of the GUM and wonder what the differences are, why there are such bitter disputes about the two approaches, and should I change? The primary purpose of this note is to inform metrologists of the differences between the frequentist and Bayesian approaches and the consequences of those differences. It is often claimed that a Bayesian approach is philosophically consistent and is able to tackle problems beyond the reach of classical statistics. However, while the philosophical consistency of the of Bayesian analyses may be more aesthetically pleasing, the value to science of any statistical analysis is in the long-term success rates and on this point, classical methods perform well and Bayesian analyses can perform poorly. Thus an important secondary purpose of this note is to highlight some of the weaknesses of the Bayesian approach. We argue that moving away from well-established, easily- taught frequentist methods that perform well, to computationally expensive and numerically inferior Bayesian analyses recommended by the GUM supplements is ill-advised. -
Practical Statistics for Particle Physics Lecture 1 AEPS2018, Quy Nhon, Vietnam
Practical Statistics for Particle Physics Lecture 1 AEPS2018, Quy Nhon, Vietnam Roger Barlow The University of Huddersfield August 2018 Roger Barlow ( Huddersfield) Statistics for Particle Physics August 2018 1 / 34 Lecture 1: The Basics 1 Probability What is it? Frequentist Probability Conditional Probability and Bayes' Theorem Bayesian Probability 2 Probability distributions and their properties Expectation Values Binomial, Poisson and Gaussian 3 Hypothesis testing Roger Barlow ( Huddersfield) Statistics for Particle Physics August 2018 2 / 34 Question: What is Probability? Typical exam question Q1 Explain what is meant by the Probability PA of an event A [1] Roger Barlow ( Huddersfield) Statistics for Particle Physics August 2018 3 / 34 Four possible answers PA is number obeying certain mathematical rules. PA is a property of A that determines how often A happens For N trials in which A occurs NA times, PA is the limit of NA=N for large N PA is my belief that A will happen, measurable by seeing what odds I will accept in a bet. Roger Barlow ( Huddersfield) Statistics for Particle Physics August 2018 4 / 34 Mathematical Kolmogorov Axioms: For all A ⊂ S PA ≥ 0 PS = 1 P(A[B) = PA + PB if A \ B = ϕ and A; B ⊂ S From these simple axioms a complete and complicated structure can be − ≤ erected. E.g. show PA = 1 PA, and show PA 1.... But!!! This says nothing about what PA actually means. Kolmogorov had frequentist probability in mind, but these axioms apply to any definition. Roger Barlow ( Huddersfield) Statistics for Particle Physics August 2018 5 / 34 Classical or Real probability Evolved during the 18th-19th century Developed (Pascal, Laplace and others) to serve the gambling industry. -
Bayesian Inference in Linguistic Analysis Stephany Palmer 1
Bayesian Inference in Linguistic Analysis Stephany Palmer 1 | Introduction: Besides the dominant theory of probability, Frequentist inference, there exists a different interpretation of probability. This other interpretation, Bayesian inference, factors in prior knowledge of the situation of interest and lends itself to allow strongly worded conclusions from the sample data. Although having not been widely used until recently due to technological advancements relieving the computationally heavy nature of Bayesian probability, it can be found in linguistic studies amongst others to update or sustain long-held hypotheses. 2 | Data/Information Analysis: Before performing a study, one should ask themself what they wish to get out of it. After collecting the data, what does one do with it? You can summarize your sample with descriptive statistics like mean, variance, and standard deviation. However, if you want to use the sample data to make inferences about the population, that involves inferential statistics. In inferential statistics, probability is used to make conclusions about the situation of interest. 3 | Frequentist vs Bayesian: The classical and dominant school of thought is frequentist probability, which is what I have been taught throughout my years learning statistics, and I assume is the same throughout the American education system. In taking a sample, one is essentially trying to estimate an unknown parameter. When interpreting the result through a confidence interval (CI), it is not allowed to interpret that the actual parameter lies within the CI, so you have to circumvent and say that “we are 95% confident that the actual value of the unknown parameter lies within the CI, that in infinite repeated trials, 95% of the CI’s will contain the true value, and 5% will not.” Bayesian probability says that only the data is real, and as the unknown parameter is abstract, thus some potential values of it are more plausible than others. -
Statistical Modelling
Statistical Modelling Dave Woods and Antony Overstall (Chapters 1–2 closely based on original notes by Anthony Davison and Jon Forster) c 2016 StatisticalModelling ............................... ......................... 0 1. Model Selection 1 Overview............................................ .................. 2 Basic Ideas 3 Whymodel?.......................................... .................. 4 Criteriaformodelselection............................ ...................... 5 Motivation......................................... .................... 6 Setting ............................................ ................... 9 Logisticregression................................... .................... 10 Nodalinvolvement................................... .................... 11 Loglikelihood...................................... .................... 14 Wrongmodel.......................................... ................ 15 Out-of-sampleprediction ............................. ..................... 17 Informationcriteria................................... ................... 18 Nodalinvolvement................................... .................... 20 Theoreticalaspects.................................. .................... 21 PropertiesofAIC,NIC,BIC............................... ................. 22 Linear Model 23 Variableselection ................................... .................... 24 Stepwisemethods.................................... ................... 25 Nuclearpowerstationdata............................ .................... -
Introduction to Bayesian Estimation
Introduction to Bayesian Estimation March 2, 2016 The Plan 1. What is Bayesian statistics? How is it different from frequentist methods? 2. 4 Bayesian Principles 3. Overview of main concepts in Bayesian analysis Main reading: Ch.1 in Gary Koop's Bayesian Econometrics Additional material: Christian P. Robert's The Bayesian Choice: From decision theoretic foundations to Computational Implementation Gelman, Carlin, Stern, Dunson, Wehtari and Rubin's Bayesian Data Analysis Probability and statistics; What's the difference? Probability is a branch of mathematics I There is little disagreement about whether the theorems follow from the axioms Statistics is an inversion problem: What is a good probabilistic description of the world, given the observed outcomes? I There is some disagreement about how we interpret data/observations and how we make inference about unobservable parameters Why probabilistic models? Is the world characterized by randomness? I Is the weather random? I Is a coin flip random? I ECB interest rates? It is difficult to say with certainty whether something is \truly" random. Two schools of statistics What is the meaning of probability, randomness and uncertainty? Two main schools of thought: I The classical (or frequentist) view is that probability corresponds to the frequency of occurrence in repeated experiments I The Bayesian view is that probabilities are statements about our state of knowledge, i.e. a subjective view. The difference has implications for how we interpret estimated statistical models and there is no general