Stan User's Guide

Total Pages: 16

File Type: pdf, Size: 1020 KB

Stan User's Guide, Version 2.19
Stan Development Team

Contents

About this Book

Part 1. Example Models

1. Regression Models
   1.1 Linear Regression
   1.2 The QR Reparameterization
   1.3 Priors for Coefficients and Scales
   1.4 Robust Noise Models
   1.5 Logistic and Probit Regression
   1.6 Multi-Logit Regression
   1.7 Parameterizing Centered Vectors
   1.8 Ordered Logistic and Probit Regression
   1.9 Hierarchical Logistic Regression
   1.10 Hierarchical Priors
   1.11 Item-Response Theory Models
   1.12 Priors for Identifiability
   1.13 Multivariate Priors for Hierarchical Models
   1.14 Prediction, Forecasting, and Backcasting
   1.15 Multivariate Outcomes
   1.16 Applications of Pseudorandom Number Generation
2. Time-Series Models
   2.1 Autoregressive Models
   2.2 Modeling Temporal Heteroscedasticity
   2.3 Moving Average Models
   2.4 Autoregressive Moving Average Models
   2.5 Stochastic Volatility Models
   2.6 Hidden Markov Models
3. Missing Data and Partially Known Parameters
   3.1 Missing Data
   3.2 Partially Known Parameters
   3.3 Sliced Missing Data
   3.4 Loading matrix for factor analysis
   3.5 Missing Multivariate Data
4. Truncated or Censored Data
   4.1 Truncated Distributions
   4.2 Truncated Data
   4.3 Censored Data
5. Finite Mixtures
   5.1 Relation to Clustering
   5.2 Latent Discrete Parameterization
   5.3 Summing out the Responsibility Parameter
   5.4 Vectorizing Mixtures
   5.5 Inferences Supported by Mixtures
   5.6 Zero-Inflated and Hurdle Models
   5.7 Priors and Effective Data Size in Mixture Models
6. Measurement Error and Meta-Analysis
   6.1 Bayesian Measurement Error Model
   6.2 Meta-Analysis
7. Latent Discrete Parameters
   7.1 The Benefits of Marginalization
   7.2 Change Point Models
   7.3 Mark-Recapture Models
   7.4 Data Coding and Diagnostic Accuracy Models
8. Sparse and Ragged Data Structures
   8.1 Sparse Data Structures
   8.2 Ragged Data Structures
9. Clustering Models
   9.1 Relation to Finite Mixture Models
   9.2 Soft K-Means
   9.3 The Difficulty of Bayesian Inference for Clustering
   9.4 Naive Bayes Classification and Clustering
   9.5 Latent Dirichlet Allocation
10. Gaussian Processes
   10.1 Gaussian Process Regression
   10.2 Simulating from a Gaussian Process
   10.3 Fitting a Gaussian Process
11. Directions, Rotations, and Hyperspheres
   11.1 Unit Vectors
   11.2 Circles, Spheres, and Hyperspheres
   11.3 Transforming to Unconstrained Parameters
   11.4 Unit Vectors and Rotations
   11.5 Circular Representations of Days and Years
12. Solving Algebraic Equations
   12.1 Example: System of Nonlinear Algebraic Equations
   12.2 Coding an Algebraic System
   12.3 Calling the Algebraic Solver
   12.4 Control Parameters for the Algebraic Solver
13. Ordinary Differential Equations
   13.1 Example: Simple Harmonic Oscillator
   13.2 Coding an ODE System
   13.3 Solving a System of Linear ODEs using a Matrix Exponential
   13.4 Measurement Error Models
   13.5 Stiff ODEs
   13.6 Control Parameters for ODE Solving
14. Computing One Dimensional Integrals
   14.1 Calling the Integrator
   14.2 Integrator Convergence

Part 2. Programming Techniques

15. Floating Point Arithmetic
   15.1 Floating-point representations
   15.2 Literals: decimal and scientific notation
   15.3 Arithmetic Precision
   15.4 Comparing floating-point numbers
16. Matrices, Vectors, and Arrays
   16.1 Basic Motivation
   16.2 Fixed Sizes and Indexing out of Bounds
   16.3 Data Type and Indexing Efficiency
   16.4 Memory Locality
   16.5 Converting among Matrix, Vector, and Array Types
   16.6 Aliasing in Stan Containers
17. Multiple Indexing and Range Indexing
   17.1 Multiple Indexing
   17.2 Slicing with Range Indexes
   17.3 Multiple Indexing on the Left of Assignments
   17.4 Multiple Indexes with Vectors and Matrices
   17.5 Matrices with Parameters and Constants
18. User-Defined Functions
   18.1 Basic Functions
   18.2 Functions as Statements
   18.3 Functions Accessing the Log Probability Accumulator
   18.4 Functions Acting as Random Number Generators
   18.5 User-Defined Probability Functions
   18.6 Overloading Functions
   18.7 Documenting Functions
   18.8 Summary of Function Types
   18.9 Recursive Functions
   18.10 Truncated Random Number Generation
19. Custom Probability Functions
   19.1 Examples
20. Problematic Posteriors
   20.1 Collinearity of Predictors in Regressions
   20.2 Label Switching in Mixture Models
   20.3 Component Collapsing in Mixture Models
   20.4 Posteriors with Unbounded Densities
   20.5 Posteriors with Unbounded Parameters
   20.6 Uniform Posteriors
   20.7 Sampling Difficulties with Problematic Priors
21. Reparameterization and Change of Variables
   21.1 Theoretical and Practical Background
   21.2 Reparameterizations
   21.3 Changes of Variables
   21.4 Vectors with Varying Bounds
22. Efficiency Tuning
   22.1 What is Efficiency?
   22.2 Efficiency for Probabilistic Models and Algorithms
   22.3 Statistical vs. Computational Efficiency
   22.4 Model Conditioning and Curvature
   22.5 Well-Specified Models
   22.6 Avoiding Validation
   22.7 Reparameterization
   22.8 Vectorization
   22.9 Exploiting Sufficient Statistics
   22.10 Aggregating Common Subexpressions
   22.11 Exploiting Conjugacy
   22.12 Standardizing Predictors and Outputs
   22.13 Using Map-Reduce
23. Map-Reduce
   23.1 Overview of Map-Reduce
   23.2 Map Function
   23.3 Example: Mapping Logistic Regression
   23.4 Example: Hierarchical Logistic Regression
   23.5 Ragged Inputs and Outputs

Appendices

24. Stan Program Style Guide
   24.1 Choose a Consistent Style
   24.2 Line Length
   24.3 File Extensions
   24.4 Variable Naming
   24.5 Local Variable Scope
   24.6 Parentheses and Brackets
   24.7 Conditionals
   24.8 Functions
   24.9 White Space
25. Transitioning from BUGS
   25.1 Some Differences in How BUGS and Stan Work
   25.2 Some Differences in the Modeling Languages
   25.3 Some Differences in the Statistical Models that are Allowed
   25.4 Some Differences when Running from R
   25.5 The Stan Community

References

About this Book

This book is the official user's guide for Stan. It provides example models and programming techniques for coding statistical models in Stan.

How to use this book

Part 1 gives Stan code and discussions for several important classes of models. Part 2 discusses various general Stan programming techniques that are not tied to any particular model. The appendices provide a style guide and advice for users of BUGS and JAGS.
We recommend working through this book using the textbooks Bayesian Data Analysis and Statistical Rethinking: A Bayesian Course with Examples in R and Stan as references on the concepts, and using the Stan Reference Manual when necessary to clarify programming issues. Further resources are given at the end of the introductory chapter.

Additional Stan manuals and guides

In addition to this user's guide, there are two reference manuals for the Stan language and algorithms. The Stan Reference Manual specifies the Stan programming language and inference algorithms. The Stan Functions Reference specifies the functions built into the Stan programming language. There is also a separate installation and getting started guide for each of the Stan interfaces (R, Python, Julia, Stata, MATLAB, Mathematica, and command line).

Web resources

Stan is an open-source software project, resources for which are hosted on various web sites:

• The Stan Web Site organizes all of the resources for the Stan project for users and developers. It contains links to the official Stan releases, source code, installation instructions, and full documentation, including the latest version of this manual, the user's guide and the getting started guide for each interface, tutorials, case studies, and reference materials for developers.
• The Stan Forums provide structured message boards for questions, discussion, and announcements related to Stan for both users and developers.
• The Stan GitHub Organization hosts all of Stan's code, documentation, wikis, and web site, as well as the issue trackers for bug reports and feature requests and interactive code review for pull requests.

Acknowledgements

The Stan project could not exist without developers, users, and funding. Stan is a highly collaborative project. The individual contributions of the Stan developers to code are tracked through GitHub, and their contributions to the design conversation are tracked in the wikis and forums. Users have made extensive contributions to documentation in the way of case studies, tutorials, and even books. They have also reported numerous bugs in both the code and documentation.

Stan has been funded through grants for Stan and its developers, through in-kind donations in the form of companies contributing developer time to Stan and individuals contributing their own time to Stan, and through donations to the open-source scientific software non-profit NumFOCUS. For details of direct funding for the project, see the web site and project pages of the Stan developers.

Copyright, Trademark, and Licensing

This book is copyright 2011–2019, Stan Development Team and their assignees. The text content is distributed under the CC BY-ND 4.0 license.
Recommended publications
  • Assessing Fairness with Unlabeled Data and Bayesian Inference
Can I Trust My Fairness Metric? Assessing Fairness with Unlabeled Data and Bayesian Inference
Disi Ji and Padhraic Smyth (Department of Computer Science) and Mark Steyvers (Department of Cognitive Sciences), University of California, Irvine

Abstract
We investigate the problem of reliably assessing group fairness when labeled examples are few but unlabeled examples are plentiful. We propose a general Bayesian framework that can augment labeled data with unlabeled data to produce more accurate and lower-variance estimates compared to methods based on labeled data alone. Our approach estimates calibrated scores for unlabeled examples in each group using a hierarchical latent variable model conditioned on labeled examples. This in turn allows for inference of posterior distributions with associated notions of uncertainty for a variety of group fairness metrics. We demonstrate that our approach leads to significant and consistent reductions in estimation error across multiple well-known fairness datasets, sensitive attributes, and predictive models. The results show the benefits of using both unlabeled data and Bayesian inference in terms of assessing whether a prediction model is fair or not.

1 Introduction
Machine learning models are increasingly used to make important decisions about individuals. At the same time it has become apparent that these models are susceptible to producing systematically biased decisions with respect to sensitive attributes such as gender, ethnicity, and age [Angwin et al., 2017, Berk et al., 2018, Corbett-Davies and Goel, 2018, Chen et al., 2019, Beutel et al., 2019]. This has led to a significant amount of recent work in machine learning addressing these issues, including research on both (i) definitions of fairness in a machine learning context (e.g., Dwork et al.
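As a deliberately simplified sketch of the kind of inference the paper targets, the Stan program below (my own illustration, not the authors' hierarchical labeled-plus-unlabeled model; the names G, n, y and the Beta(2, 2) prior are assumptions) gives posterior uncertainty for a single fairness metric, the accuracy gap between two groups, using labeled examples only:

```stan
// Hypothetical sketch: posterior uncertainty for a group fairness metric
// (accuracy gap between two groups) from a small labeled sample.
// Not the paper's model: labeled data only, independent beta priors.
data {
  int<lower=2> G;                      // number of groups
  int<lower=0> n[G];                   // labeled examples per group
  int<lower=0> y[G];                   // correct predictions per group
}
parameters {
  vector<lower=0, upper=1>[G] theta;   // per-group accuracy
}
model {
  theta ~ beta(2, 2);                  // weakly informative prior
  y ~ binomial(n, theta);              // labeled-data likelihood
}
generated quantities {
  real accuracy_gap = theta[1] - theta[2];   // fairness metric of interest
}
```

With only a handful of labeled examples per group, the posterior for accuracy_gap is wide; reducing that uncertainty by also modeling unlabeled examples is exactly the contribution the abstract describes.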
  • Bayesian Inference Chapter 4: Regression and Hierarchical Models
Bayesian Inference, Chapter 4: Regression and Hierarchical Models
Conchi Ausín and Mike Wiper, Department of Statistics, Universidad Carlos III de Madrid
Master in Business Administration and Quantitative Methods / Master in Mathematical Engineering

Objective
[Slide photos: A. F. M. Smith, Dennis Lindley]
We analyze the Bayesian approach to fitting normal and generalized linear models and introduce the Bayesian hierarchical modeling approach. Also, we study the modeling and forecasting of time series.

Contents
1 Normal linear models (1.1 ANOVA model; 1.2 Simple linear regression model)
2 Generalized linear models
3 Hierarchical models
4 Dynamic models

Normal linear models
A normal linear model is of the following form:
\[ y = X\theta + \epsilon, \]
where y = (y_1, ..., y_n)' is the observed data, X is a known n × k matrix, called the design matrix, θ = (θ_1, ..., θ_k)' is the parameter set, and ε follows a multivariate normal distribution. Usually, it is assumed that:
\[ \epsilon \sim \mathcal{N}\!\left(\mathbf{0}, \tfrac{1}{\phi}\,\mathbf{I}\right). \]
A simple example of a normal linear model is the simple linear regression model, where
\[ X^{\mathsf{T}} = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{pmatrix} \quad\text{and}\quad \theta = (\alpha, \beta)^{\mathsf{T}}. \]

Consider a normal linear model, y = Xθ + ε. A conjugate prior distribution is a normal-gamma distribution:
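Incidentally, the simple linear regression special case above maps almost directly onto the linear regression program in Section 1.1 of the Stan User's Guide hosted on this page; a minimal version (with improper flat priors left implicit rather than the conjugate normal-gamma prior the slides go on to discuss) looks like this:

```stan
// Simple linear regression: y_n = alpha + beta * x_n + epsilon_n,
// epsilon_n ~ normal(0, sigma); priors are left implicit (flat).
data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  y ~ normal(alpha + beta * x, sigma);
}
```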
  • Chapter 7 Assessing and Improving Convergence of the Markov Chain
Chapter 7: Assessing and improving convergence of the Markov chain
Bayesian Biostatistics - Piracicaba 2014

Questions:
• Are we getting the right answer?
• Can we get the answer quicker?

7.1 Introduction
• MCMC sampling is powerful, but comes with a cost: dependent sampling + checking convergence is not easy:
  ◦ Convergence theorems do not tell us when convergence will occur
  ◦ In this chapter: graphical + formal diagnostics to assess convergence
• Acceleration techniques to speed up the MCMC sampling procedure
• Data augmentation as a Bayesian generalization of the EM algorithm

7.2 Assessing convergence of a Markov chain

7.2.1 Definition of convergence for a Markov chain
Loose definition: with an increasing number of iterations (k → ∞), the distribution of θ^(k), p_k(θ), converges to the target distribution p(θ | y).
In practice, convergence means:
• The histogram of θ^(k) remains the same along the chain
• Summary statistics of θ^(k) remain the same along the chain

Approaches
• Theoretical research
  ◦ Establish conditions that ensure convergence, but in general these theoretical results cannot be used in practice
• Two types of practical procedures to check convergence:
  ◦ Checking stationarity: from which iteration (k_0) is the chain sampling from the posterior distribution (assessing the burn-in part of the Markov chain)
  ◦ Checking accuracy: verify that the posterior summary measures are computed with the desired accuracy
• Most
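One widely used formal check of stationarity (standard background; the chapter itself may emphasize different diagnostics) is the Gelman-Rubin potential scale reduction factor, computed from M chains of N retained draws each, with between-chain variance B and within-chain variance W:

\[
B = \frac{N}{M-1} \sum_{m=1}^{M} \bigl(\bar\theta_{\cdot m} - \bar\theta_{\cdot\cdot}\bigr)^2, \qquad
W = \frac{1}{M} \sum_{m=1}^{M} s_m^2,
\]
\[
\widehat{\mathrm{var}}^{+}(\theta \mid y) = \frac{N-1}{N}\, W + \frac{1}{N}\, B, \qquad
\hat{R} = \sqrt{\frac{\widehat{\mathrm{var}}^{+}(\theta \mid y)}{W}},
\]

where θ̄_{·m} and s_m² are the mean and variance of the draws within chain m; values of R̂ close to 1 for all parameters are consistent with the chains having reached the same stationary distribution.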
  • Stan: a Probabilistic Programming Language
JSS Journal of Statistical Software, January 2017, Volume 76, Issue 1. doi: 10.18637/jss.v076.i01

Stan: A Probabilistic Programming Language
Bob Carpenter (Columbia University), Andrew Gelman (Columbia University), Matthew D. Hoffman (Adobe Creative Technologies Lab), Daniel Lee (Columbia University), Ben Goodrich (Columbia University), Michael Betancourt (Columbia University), Marcus A. Brubaker (York University), Jiqiang Guo (NPD Group), Peter Li (Columbia University), Allen Riddell (Indiana University)

Abstract
Stan is a probabilistic programming language for specifying statistical models. A Stan program imperatively defines a log probability function over parameters conditioned on specified data and constants. As of version 2.14.0, Stan provides full Bayesian inference for continuous-variable models through Markov chain Monte Carlo methods such as the No-U-Turn sampler, an adaptive form of Hamiltonian Monte Carlo sampling. Penalized maximum likelihood estimates are calculated using optimization methods such as the limited memory Broyden-Fletcher-Goldfarb-Shanno algorithm.
Stan is also a platform for computing log densities and their gradients and Hessians, which can be used in alternative algorithms such as variational Bayes, expectation propagation, and marginal inference using approximate integration. To this end, Stan is set up so that the densities, gradients, and Hessians, along with intermediate quantities of the algorithm such as acceptance probabilities, are easily accessible.
Stan can be called from the command line using the cmdstan package, through R using the rstan package, and through Python using the pystan package. All three interfaces support sampling and optimization-based inference with diagnostics and posterior analysis. rstan and pystan also provide access to log probabilities, gradients, Hessians, parameter transforms, and specialized plotting.
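As a minimal illustration of the statement that a Stan program defines a log probability function over parameters conditioned on data, the following complete program (illustrative only, not drawn from the paper) specifies a Bernoulli model with a uniform prior:

```stan
// Minimal Stan program: posterior for a Bernoulli success probability.
data {
  int<lower=0> N;
  int<lower=0, upper=1> y[N];   // binary outcomes
}
parameters {
  real<lower=0, upper=1> theta; // success probability
}
model {
  theta ~ beta(1, 1);           // uniform prior on [0, 1]
  y ~ bernoulli(theta);         // likelihood; increments the log density
}
```

Compiling this program and sampling from it through CmdStan, RStan, or PyStan runs NUTS over theta and returns posterior draws for analysis.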
  • Getting Started in OpenBUGS / WinBUGS
Practical 1: Getting started in OpenBUGS / WinBUGS
An Introduction to Using WinBUGS for Cost-Effectiveness Analyses in Health Economics
Dr. Christian Asseburg, Centre for Health Economics, University of York, UK
2007-03-12, Linköping

● Brief comparison WinBUGS / OpenBUGS
● Practical
  – Opening OpenBUGS
  – Entering a model and data
  – Some error messages
  – Starting the sampler
  – Checking sampling performance
  – Retrieving the posterior summaries

WinBUGS / OpenBUGS
● WinBUGS was developed at the MRC Biostatistics unit in Cambridge. Free download, but registration required for a licence. No fee and no warranty.
● OpenBUGS is the current development of WinBUGS after its source code was released to the public. Download is free, no registration, GNU GPL licence.
● There are no major differences between the latest WinBUGS (1.4.1) and OpenBUGS (2.2.0) releases.
● Minor differences include:
  – OpenBUGS is occasionally a bit slower
  – WinBUGS requires Microsoft Windows OS
  – OpenBUGS error messages are sometimes more informative
● The examples in these slides use OpenBUGS.

Practical 1: Target
● Start OpenBUGS
● Code the example from health economics from the earlier presentation
● Run the sampler and obtain posterior summaries

Starting OpenBUGS
● You can download OpenBUGS from http://mathstat.helsinki.fi/openbugs/
● Start the program.
  • BUGS Code for Item Response Theory
JSS Journal of Statistical Software, August 2010, Volume 36, Code Snippet 1. http://www.jstatsoft.org/

BUGS Code for Item Response Theory
S. McKay Curtis, University of Washington

Abstract
I present BUGS code to fit common models from item response theory (IRT), such as the two parameter logistic model, three parameter logistic model, graded response model, generalized partial credit model, testlet model, and generalized testlet models. I demonstrate how the code in this article can easily be extended to fit more complicated IRT models, when the data at hand require a more sophisticated approach. Specifically, I describe modifications to the BUGS code that accommodate longitudinal item response data.

Keywords: education, psychometrics, latent variable model, measurement model, Bayesian inference, Markov chain Monte Carlo, longitudinal data.

1. Introduction
In this paper, I present BUGS (Gilks, Thomas, and Spiegelhalter 1994) code to fit several models from item response theory (IRT). Several different software packages are available for fitting IRT models. These programs include packages from Scientific Software International (du Toit 2003), such as PARSCALE (Muraki and Bock 2005), BILOG-MG (Zimowski, Muraki, Mislevy, and Bock 2005), MULTILOG (Thissen, Chen, and Bock 2003), and TESTFACT (Wood, Wilson, Gibbons, Schilling, Muraki, and Bock 2003). The Comprehensive R Archive Network (CRAN) task view "Psychometric Models and Methods" (Mair and Hatzinger 2010) contains a description of many different R packages that can be used to fit IRT models in the R computing environment (R Development Core Team 2010). Among these R packages are ltm (Rizopoulos 2006) and gpcm (Johnson 2007), which contain several functions to fit IRT models using marginal maximum likelihood methods, and eRm (Mair and Hatzinger 2007), which contains functions to fit several variations of the Rasch model (Fischer and Molenaar 1995).
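For readers unfamiliar with the models named above, the two-parameter logistic (2PL) model gives the probability that person i answers item j correctly as (standard IRT notation, stated as background rather than quoted from the article):

\[
\Pr(y_{ij} = 1 \mid \theta_i, a_j, b_j) = \operatorname{logit}^{-1}\!\bigl(a_j(\theta_i - b_j)\bigr),
\]

where θ_i is the person's latent ability, a_j the item discrimination, and b_j the item difficulty; the three-parameter logistic model adds a lower-asymptote (guessing) parameter c_j, giving c_j + (1 - c_j) logit⁻¹(a_j(θ_i - b_j)).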
  • Stan: a Probabilistic Programming Language
JSS Journal of Statistical Software, MMMMMM YYYY, Volume VV, Issue II. http://www.jstatsoft.org/

Stan: A Probabilistic Programming Language
Bob Carpenter (Columbia University), Andrew Gelman (Columbia University), Matt Hoffman (Adobe Research), Daniel Lee (Columbia University), Ben Goodrich (Columbia University), Michael Betancourt (University of Warwick), Marcus A. Brubaker (University of Toronto, Scarborough), Jiqiang Guo (NPD Group), Peter Li (Columbia University), Allen Riddell (Dartmouth College)

Abstract
Stan is a probabilistic programming language for specifying statistical models. A Stan program imperatively defines a log probability function over parameters conditioned on specified data and constants. As of version 2.2.0, Stan provides full Bayesian inference for continuous-variable models through Markov chain Monte Carlo methods such as the No-U-Turn sampler, an adaptive form of Hamiltonian Monte Carlo sampling. Penalized maximum likelihood estimates are calculated using optimization methods such as the Broyden-Fletcher-Goldfarb-Shanno algorithm.
Stan is also a platform for computing log densities and their gradients and Hessians, which can be used in alternative algorithms such as variational Bayes, expectation propagation, and marginal inference using approximate integration. To this end, Stan is set up so that the densities, gradients, and Hessians, along with intermediate quantities of the algorithm such as acceptance probabilities, are easily accessible.
Stan can be called from the command line, through R using the RStan package, or through Python using the PyStan package. All three interfaces support sampling and optimization-based inference. RStan and PyStan also provide access to log probabilities, gradients, Hessians, and data I/O.

Keywords: probabilistic program, Bayesian inference, algorithmic differentiation, Stan.
  • Installing BUGS and the R to BUGS Interface
File = E:\bugs\installing.bugs.jags.docm. John Miyamoto.

Installing BUGS and the R to BUGS Interface

Caveat: I am a Windows user so these notes are focused on Windows 7 installations. I will include what I know about the Mac and Linux versions of these programs but I cannot swear to the accuracy of my comments.

Contents (Ctrl-left click on a link to jump to the corresponding section):
1. Brief Overview
2. Installing OpenBUGS
3. Installing WinBUGS (Windows only)
4. OpenBUGS versus WinBUGS
5. Installing JAGS
6. Installing R packages that are used with OpenBUGS, WinBUGS, and JAGS
7. Running BRugs on a 32-bit or 64-bit Windows computer
8. Hints for Mac and Linux Users
9. References

1. Brief Overview

BUGS stands for Bayesian Inference Under Gibbs Sampling. The BUGS program serves two critical functions in Bayesian statistics. First, given appropriate inputs, it computes the posterior distribution over model parameters; this is critical in any Bayesian statistical analysis. Second, it allows the user to compute a Bayesian analysis without requiring extensive knowledge of the mathematical analysis and computer programming required for the analysis. The user does need to understand the statistical model or models that are being analyzed, the assumptions that are made about model parameters including their prior distributions, and the structure of the data, but the BUGS program relieves the user of the necessity of creating an algorithm to sample from the posterior distribution and the necessity of writing the computer program that computes this algorithm.
  • AdvancedHMC.jl: A Robust, Modular and Efficient Implementation of Advanced HMC Algorithms
2nd Symposium on Advances in Approximate Bayesian Inference, 2019, 1–10

AdvancedHMC.jl: A robust, modular and efficient implementation of advanced HMC algorithms
Kai Xu (University of Edinburgh), Hong Ge (University of Cambridge), Will Tebbutt (University of Cambridge), Mohamed Tarek (UNSW Canberra), Martin Trapp (Graz University of Technology), Zoubin Ghahramani (University of Cambridge & Uber AI Labs)

Abstract
Stan's Hamiltonian Monte Carlo (HMC) has demonstrated remarkable sampling robustness and efficiency in a wide range of Bayesian inference problems through carefully crafted adaption schemes to the celebrated No-U-Turn sampler (NUTS) algorithm. It is challenging to implement these adaption schemes robustly in practice, hindering wider adoption amongst practitioners who are not directly working with the Stan modelling language. AdvancedHMC.jl (AHMC) contributes a modular, well-tested, standalone implementation of NUTS that recovers and extends Stan's NUTS algorithm. AHMC is written in Julia, a modern high-level language for scientific computing, benefiting from optional hardware acceleration and interoperability with a wealth of existing software written in both Julia and other languages, such as Python. Efficacy is demonstrated empirically by comparison with Stan through a third-party Markov chain Monte Carlo benchmarking suite.

1. Introduction
Hamiltonian Monte Carlo (HMC) is an efficient Markov chain Monte Carlo (MCMC) algorithm which avoids random walks by simulating Hamiltonian dynamics to make proposals (Duane et al., 1987; Neal et al., 2011). Due to the statistical efficiency of HMC, it has been widely applied to fields including physics (Duane et al., 1987), differential equations (Kramer et al., 2014), social science (Jackman, 2009) and Bayesian inference (e.g.
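As background on the Hamiltonian dynamics mentioned above (standard HMC material, not specific to AdvancedHMC.jl), one leapfrog step of size ε for position θ and momentum p, with mass matrix M and target log density log π(θ), is

\[
p_{1/2} = p + \tfrac{\varepsilon}{2}\, \nabla_\theta \log \pi(\theta), \qquad
\theta' = \theta + \varepsilon\, M^{-1} p_{1/2}, \qquad
p' = p_{1/2} + \tfrac{\varepsilon}{2}\, \nabla_\theta \log \pi(\theta'),
\]

with the resulting proposal accepted or rejected according to the change in the Hamiltonian H(θ, p) = -log π(θ) + ½ pᵀ M⁻¹ p; NUTS builds trajectories of such steps, and the adaptation schemes the abstract refers to tune ε and M during warmup.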
  • The Bayesian Lasso
The Bayesian Lasso
Trevor Park and George Casella, University of Florida, Gainesville, Florida, USA

Summary. The Lasso estimate for linear regression parameters can be interpreted as a Bayesian posterior mode estimate when the priors on the regression parameters are independent double-exponential (Laplace) distributions. This posterior can also be accessed through a Gibbs sampler using conjugate normal priors for the regression parameters, with independent exponential hyperpriors on their variances. This leads to tractable full conditional distributions through a connection with the inverse Gaussian distribution. Although the Bayesian Lasso does not automatically perform variable selection, it does provide standard errors and Bayesian credible intervals that can guide variable selection. Moreover, the structure of the hierarchical model provides both Bayesian and likelihood methods for selecting the Lasso parameter. The methods described here can also be extended to other Lasso-related estimation methods like bridge regression and robust variants.

Keywords: Gibbs sampler, inverse Gaussian, linear regression, empirical Bayes, penalised regression, hierarchical models, scale mixture of normals

1. Introduction
The Lasso of Tibshirani (1996) is a method for simultaneous shrinkage and model selection in regression problems. It is most commonly applied to the linear regression model
\[ y = \mu \mathbf{1}_n + X\beta + \epsilon, \]
where y is the n × 1 vector of responses, μ is the overall mean, X is the n × p matrix of standardised regressors, β = (β_1, ..., β_p)^T is the vector of regression coefficients to be estimated, and ε is the n × 1 vector of independent and identically distributed normal errors with mean 0 and unknown variance σ². The estimate of μ is taken as the average ȳ of the responses, and the Lasso estimate of β minimises the sum of the squared residuals, subject to a given bound t on its L1 norm.
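The Gibbs sampler sketched in the summary relies on the standard scale-mixture-of-normals representation of the double-exponential prior (textbook background stated here for context, not quoted from the paper):

\[
\frac{\lambda}{2}\, e^{-\lambda |\beta_j|}
= \int_0^\infty \frac{1}{\sqrt{2\pi \tau_j^2}}\, e^{-\beta_j^2/(2\tau_j^2)}
\;\frac{\lambda^2}{2}\, e^{-\lambda^2 \tau_j^2 / 2}\, d\tau_j^2,
\]

so placing conditionally normal priors β_j | τ_j² ∼ N(0, τ_j²) with independent exponential hyperpriors on the τ_j² reproduces independent Laplace priors marginally, which is what makes the full conditional distributions tractable.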
  • WinBUGS Lectures x4.pdf
Introduction to Bayesian Analysis and WinBUGS

Lecture 1. Introduction to Bayesian Monte Carlo methods in WinBUGS

Summary
1. Probability as a means of representing uncertainty
2. Bayesian direct probability statements about parameters
3. Probability distributions
4. Monte Carlo simulation
5. Implementation in WinBUGS (and DoodleBUGS) - Demo
6. Directed graphs for representing probability models
7. Examples

How did it all start?
In 1763, Reverend Thomas Bayes of Tunbridge Wells wrote on the problem. In modern language: given r ∼ Binomial(θ, n), what is Pr(θ₁ < θ < θ₂ | r, n)?

Basic idea: direct expression of uncertainty about unknown parameters, e.g. "There is an 89% probability that the absolute increase in major bleeds is less than 10 percent with low-dose PLT transfusions" (Tinmouth et al, Transfusion, 2004).
[Figure: distribution of the % absolute increase in major bleeds]

Why a direct probability distribution?
1. Tells us what we want: what are plausible values for the parameter of interest?
2. No P-values: just calculate relevant tail areas.
3. No (difficult to interpret) confidence intervals: just report, say, the central area that contains 95% of the distribution.

Inference on proportions
What is a reasonable form for a prior distribution for a proportion? θ ∼ Beta[a, b] represents a beta distribution with properties:
\[
p(\theta \mid a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\,\theta^{a-1}(1-\theta)^{b-1}, \qquad \theta \in (0, 1),
\qquad
E(\theta \mid a, b) = \frac{a}{a+b}.
\]
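The beta prior is conjugate to the binomial likelihood, so the question posed above has a closed-form answer; this is standard conjugacy background rather than a quote from the slides:

\[
\theta \sim \mathrm{Beta}(a, b), \quad r \mid \theta \sim \mathrm{Binomial}(n, \theta)
\;\;\Longrightarrow\;\;
\theta \mid r, n \sim \mathrm{Beta}(a + r,\; b + n - r),
\]

so Pr(θ₁ < θ < θ₂ | r, n) is the difference of the Beta(a + r, b + n - r) distribution function evaluated at θ₂ and θ₁.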
  • Posterior Propriety and Admissibility of Hyperpriors in Normal Hierarchical Models
The Annals of Statistics, 2005, Vol. 33, No. 2, 606–646. DOI: 10.1214/009053605000000075. © Institute of Mathematical Statistics, 2005. arXiv:math/0505605v1 [math.ST] 27 May 2005.

Posterior Propriety and Admissibility of Hyperpriors in Normal Hierarchical Models
By James O. Berger, William Strawderman and Dejun Tang
Duke University and SAMSI, Rutgers University and Novartis Pharmaceuticals

Hierarchical modeling is wonderful and here to stay, but hyperparameter priors are often chosen in a casual fashion. Unfortunately, as the number of hyperparameters grows, the effects of casual choices can multiply, leading to considerably inferior performance. As an extreme, but not uncommon, example, use of the wrong hyperparameter priors can even lead to impropriety of the posterior.
For exchangeable hierarchical multivariate normal models, we first determine when a standard class of hierarchical priors results in proper or improper posteriors. We next determine which elements of this class lead to admissible estimators of the mean under quadratic loss; such considerations provide one useful guideline for choice among hierarchical priors. Finally, computational issues with the resulting posterior distributions are addressed.

1. Introduction.
1.1. The model and the problems. Consider the block multivariate normal situation (sometimes called the "matrix of means problem") specified by the following hierarchical Bayesian model:
\[
(1) \qquad X \sim N_p(\theta, I), \qquad \theta \sim N_p(B, \Sigma_\pi),
\]
where
\[
X_{p \times 1} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_m \end{pmatrix}, \qquad
\theta_{p \times 1} = \begin{pmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_m \end{pmatrix},
\]

Received February 2004; revised July 2004. Supported by NSF Grants DMS-98-02261 and DMS-01-03265.
AMS 2000 subject classifications. Primary 62C15; secondary 62F15.
Key words and phrases. Covariance matrix, quadratic loss, frequentist risk, posterior impropriety, objective priors, Markov chain Monte Carlo.