Bayesian Modeling, Inference and Prediction


David Draper
Department of Applied Mathematics and Statistics
University of California, Santa Cruz
[email protected]
www.ams.ucsc.edu/~draper/

Draft 6 (December 2005): Many sections still quite rough. Comments welcome.

© 2005 David Draper. All rights reserved. No part of this book may be reprinted, reproduced, or utilized in any form or by any electronic, mechanical or other means, now known or hereafter invented, including photocopying and recording, or by an information storage or retrieval system, without permission in writing from the author.

This book was typeset by the author using a PostScript-based phototypesetter (© Adobe Systems, Inc.). The figures were generated in PostScript using the R data analysis language (R Project, 2005), and were directly incorporated into the typeset document. The text was formatted using the LaTeX language (Lamport, 1994), a version of TeX (Knuth, 1984).

To Andrea, from whom I've learned so much

Contents

Preface

1 Background and basics
  1.1 Quantification of uncertainty
  1.2 Sequential learning; Bayes' Theorem
  1.3 Bayesian decision theory
  1.4 Problems

2 Exchangeability and conjugate modeling
  2.1 Quantification of uncertainty about observables. Binary outcomes
  2.2 Review of frequentist modeling
    2.2.1 The likelihood function
    2.2.2 Calibrating the MLE
  2.3 Bayesian modeling
    2.3.1 Exchangeability
    2.3.2 Hierarchical (mixture) modeling
    2.3.3 The simplest mixture model
    2.3.4 Conditional independence
    2.3.5 Prior, posterior, and predictive distributions
    2.3.6 Conjugate analysis. Comparison with frequentist modeling
  2.4 Integer-valued outcomes
  2.5 Continuous outcomes
  2.6 Appendix on hypothesis testing
  2.7 Problems

3 Simulation-based computation
  3.1 IID sampling
    3.1.1 Rejection sampling
  3.2 Markov chains
    3.2.1 The Monte Carlo and MCMC datasets
  3.3 The Metropolis algorithm
    3.3.1 Metropolis-Hastings sampling
    3.3.2 Gibbs sampling
    3.3.3 Why the Metropolis algorithm works
    3.3.4 Directed acyclic graphs
    3.3.5 Practical MCMC monitoring and convergence diagnostics
    3.3.6 MCMC accuracy
  3.4 Problems

4 Bayesian model specification
  4.1 Hierarchical model selection: An example with count data
    4.1.1 Poisson fixed-effects modeling
    4.1.2 Additive and multiplicative treatment effects
    4.1.3 Random-effects Poisson regression modeling
  4.2 Bayesian model choice
    4.2.1 Data-analytic model specification
    4.2.2 Model selection as a decision problem
    4.2.3 The log score and the deviance information criterion
    4.2.4 Connections with Bayes factors
    4.2.5 When is a model good enough?
  4.3 Problems

5 Hierarchical models for combining information
  5.1 The role of scientific context in formulating hierarchical models
    5.1.1 Maximum likelihood and empirical Bayes
    5.1.2 Incorporating study-level covariates
  5.2 Problems

6 Bayesian nonparametric modeling
  6.1 de Finetti's representation theorem revisited
  6.2 The Dirichlet process
  6.3 Dirichlet process mixture modeling
  6.4 Pólya trees
  6.5 Problems

References

Index

Preface

Santa Cruz, California
David Draper
December 2005

Chapter 1

Background and basics

1.1 Quantification of uncertainty

Case study: Diagnostic screening for HIV. Widespread screening for HIV has been proposed by some people in some countries (e.g., the U.S.). Two blood tests that screen for HIV are available: ELISA, which is relatively inexpensive (roughly $20) and fairly accurate; and Western Blot (WB), which is considerably more accurate but costs quite a bit more (about $100).

Let's say¹ I'm a physician, and a new patient comes to me with symptoms that suggest the patient (male, say) may be HIV positive. Questions:

• Is it appropriate to use the language of probability to quantify my uncertainty about the proposition A = {this patient is HIV positive}?

• If so, what kinds of probability are appropriate, and how would I assess P(A) in each case?

• What strategy (e.g., ELISA, WB, both?) should I employ to decrease my uncertainty about A? If I decide to run a screening test, how should my uncertainty be updated in light of the test results?

¹ As will become clear, the Bayesian approach to probability and statistics is explicit about the role of personal judgment in uncertainty assessment. To make this clear I'll write in the first person in this book, but as you read I encourage you to constantly imagine yourself in the position of the person referred to as "I" and to think along with that person in quantifying uncertainty and making choices in the face of it.
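The updating in the last question is the subject of Section 1.2. As a preview, here is a minimal sketch in R of how Bayes' Theorem would revise P(A) after a positive ELISA result; the prevalence, sensitivity, and specificity values below are purely illustrative assumptions of mine, not figures from this case study:

    # Bayes' Theorem update for a diagnostic test (illustrative values only)
    prior       <- 0.01   # assumed P(A): prevalence in this patient's subpopulation
    sensitivity <- 0.95   # assumed P(ELISA positive | HIV positive)
    specificity <- 0.98   # assumed P(ELISA negative | HIV negative)

    # P(ELISA positive), by the Law of Total Probability
    p_pos <- sensitivity * prior + (1 - specificity) * (1 - prior)

    # P(A | ELISA positive), by Bayes' Theorem
    posterior <- sensitivity * prior / p_pos
    round(posterior, 3)   # about 0.324

Note how a fairly accurate test can still leave substantial uncertainty when the prior prevalence is low; this is one reason the choice of subpopulation, taken up below, matters so much.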
Statistics might be defined as the study of uncertainty: how to measure it, and what to do about it; and probability as the part of mathematics (and philosophy) devoted to the quantification of uncertainty. The systematic study of probability is fairly recent in the history of ideas, dating back to about 1650 (e.g., Hacking, 1975). In the last 350 years three main ways to define probability have arisen (e.g., Oakes, 1990):

• Classical: Enumerate the elemental outcomes (EOs) (the fundamental possibilities in the process under study) in a way that makes them equipossible on, e.g., symmetry grounds, and compute P_C(A) = n_A / n, the ratio of n_A = (number of EOs favorable to A) to n = (total number of EOs).

• Frequentist: Restrict attention to attributes A of events: phenomena that are inherently (and independently) repeatable under "identical" conditions; define P_F(A) = the limiting value of the relative frequency with which A occurs in the repetitions.

• Personal, or "Subjective", or Bayesian: I imagine betting with someone about the truth of proposition A, and ask myself what odds O (in favor of A) I would need to give or receive in order that I judge the bet fair; then (for me) P_B(A) = O / (1 + O).

Other approaches not covered here include logical (e.g., Jeffreys, 1961) and fiducial (Fisher, 1935) probability.
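To make the betting definition concrete, here is a small R sketch (an illustration of mine, not part of the text) of the odds-to-probability mapping P_B(A) = O/(1 + O) and its inverse:

    # Convert fair betting odds O (in favor of A) into a personal probability
    odds_to_prob <- function(O) O / (1 + O)

    # ...and convert a probability back into odds in favor of A
    prob_to_odds <- function(p) p / (1 - p)

    odds_to_prob(3)      # judging 3-to-1 odds fair commits me to P_B(A) = 0.75
    prob_to_odds(0.75)   # and P_B(A) = 0.75 corresponds to odds of 3

The mapping is one-to-one, so eliciting odds and eliciting probabilities are interchangeable ways of quantifying the same personal uncertainty.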
Each of these probability definitions has general advantages and disadvantages:

• Classical

  – Plus: Simple, when applicable (e.g., idealized coin-tossing, drawing colored balls from urns, choosing cards from a well-shuffled deck, and so on).

  – Minus: The only way to define "equipossible" without a circular appeal to probability is through the principle of insufficient reason (I judge EOs equipossible if I have no grounds, empirical, logical, or symmetrical, for favoring one over another), but this can lead to paradoxes (e.g., assertion of equal uncertainty is not invariant to the choice of scale on which it's asserted).

• Frequentist

  – Plus: Mathematics relatively tractable.

  – Minus: Only applies to inherently repeatable events; e.g., from the vantage point of (say) 2005, P_F(the Republicans will win the White House again in 2008) is (strictly speaking) undefined.

• Bayesian

  – Plus: All forms of uncertainty are in principle quantifiable with this approach.

  – Minus: There's no guarantee that the answer I get by asking myself about betting odds will retrospectively be seen by me or others as "good" (but how should the quality of an uncertainty assessment itself be assessed?).

Returning to P(A) = P(this patient is HIV-positive), data are available from medical journals on the prevalence of HIV-positivity in various subsets of 𝒫 = {all humans} (e.g., it's higher in gay people and lower in women). All three probabilistic approaches require me to use my judgment to identify the recognizable subpopulation 𝒫_{this patient} (Fisher, 1956; Draper et al., 1993): this is the smallest subset of 𝒫 to which this patient belongs for which the HIV prevalence differs from that in the rest of 𝒫 by an amount I judge as large enough to matter in a practical sense. The idea is that within 𝒫_{this patient} I regard HIV prevalence as close enough to constant that the differences aren't worth bothering over, but the differences between HIV prevalence in 𝒫_{this patient} and its complement matter to me.

Here 𝒫_{this patient} might consist (for me) of everybody who matches this patient on gender, age (category, e.g., 25-29), and sexual orientation. NB This is a modeling choice based on judgment; different reasonable people might make different choices.

As a classicist I would then (a) use this definition to establish equipossibility within 𝒫_{this patient}, (b) count n_A = (number of HIV-positive people in 𝒫_{this patient}) and n = (total number of people in 𝒫_{this patient}), and (c) compute P_C(A) = n_A / n.

As a frequentist I would (a) equate P(A) to P(a person chosen at random with replacement (independent identically distributed (IID) sampling) from 𝒫_{this patient} is HIV-positive), (b) imagine repeating this random sampling indefinitely, and (c) conclude that the limiting value of the relative frequency of HIV-positivity in these repetitions would also be P_F(A) = n_A / n.

NB Strictly speaking I'm not allowed in the frequentist approach to talk about P(this patient is HIV-positive); he either is or he isn't. In the frequentist paradigm I can only talk about the process of sampling people like him from 𝒫_{this patient}.
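The frequentist limiting relative frequency is easy to mimic by simulation. Here is a minimal R sketch, with an assumed (purely illustrative) HIV prevalence of 2% in 𝒫_{this patient}, showing the relative frequency n_A/n settling down as the number of IID draws grows:

    # IID sampling from P_{this patient}, assuming prevalence 0.02
    set.seed(1)
    prevalence <- 0.02
    n <- 10^(2:6)                    # increasing numbers of draws
    rel_freq <- sapply(n, function(m)
      mean(rbinom(m, size = 1, prob = prevalence)))
    cbind(n, rel_freq)               # relative frequencies approach 0.02

Each row is one finite approximation to the limit that defines P_F(A); no finite simulation can deliver the limit itself, which is exactly the idealization the frequentist definition relies on.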