Stan User's Guide

Version 2.19
Stan Development Team

Contents

About this Book

Part 1. Example Models

1. Regression Models
  1.1 Linear Regression
  1.2 The QR Reparameterization
  1.3 Priors for Coefficients and Scales
  1.4 Robust Noise Models
  1.5 Logistic and Probit Regression
  1.6 Multi-Logit Regression
  1.7 Parameterizing Centered Vectors
  1.8 Ordered Logistic and Probit Regression
  1.9 Hierarchical Logistic Regression
  1.10 Hierarchical Priors
  1.11 Item-Response Theory Models
  1.12 Priors for Identifiability
  1.13 Multivariate Priors for Hierarchical Models
  1.14 Prediction, Forecasting, and Backcasting
  1.15 Multivariate Outcomes
  1.16 Applications of Pseudorandom Number Generation
2. Time-Series Models
  2.1 Autoregressive Models
  2.2 Modeling Temporal Heteroscedasticity
  2.3 Moving Average Models
  2.4 Autoregressive Moving Average Models
  2.5 Stochastic Volatility Models
  2.6 Hidden Markov Models
3. Missing Data and Partially Known Parameters
  3.1 Missing Data
  3.2 Partially Known Parameters
  3.3 Sliced Missing Data
  3.4 Loading matrix for factor analysis
  3.5 Missing Multivariate Data
4. Truncated or Censored Data
  4.1 Truncated Distributions
  4.2 Truncated Data
  4.3 Censored Data
5. Finite Mixtures
  5.1 Relation to Clustering
  5.2 Latent Discrete Parameterization
  5.3 Summing out the Responsibility Parameter
  5.4 Vectorizing Mixtures
  5.5 Inferences Supported by Mixtures
  5.6 Zero-Inflated and Hurdle Models
  5.7 Priors and Effective Data Size in Mixture Models
6. Measurement Error and Meta-Analysis
  6.1 Bayesian Measurement Error Model
  6.2 Meta-Analysis
7. Latent Discrete Parameters
  7.1 The Benefits of Marginalization
  7.2 Change Point Models
  7.3 Mark-Recapture Models
  7.4 Data Coding and Diagnostic Accuracy Models
8. Sparse and Ragged Data Structures
  8.1 Sparse Data Structures
  8.2 Ragged Data Structures
9. Clustering Models
  9.1 Relation to Finite Mixture Models
  9.2 Soft K-Means
  9.3 The Difficulty of Bayesian Inference for Clustering
  9.4 Naive Bayes Classification and Clustering
  9.5 Latent Dirichlet Allocation
10. Gaussian Processes
  10.1 Gaussian Process Regression
  10.2 Simulating from a Gaussian Process
  10.3 Fitting a Gaussian Process
11. Directions, Rotations, and Hyperspheres
  11.1 Unit Vectors
  11.2 Circles, Spheres, and Hyperspheres
  11.3 Transforming to Unconstrained Parameters
  11.4 Unit Vectors and Rotations
  11.5 Circular Representations of Days and Years
12. Solving Algebraic Equations
  12.1 Example: System of Nonlinear Algebraic Equations
  12.2 Coding an Algebraic System
  12.3 Calling the Algebraic Solver
  12.4 Control Parameters for the Algebraic Solver
13. Ordinary Differential Equations
  13.1 Example: Simple Harmonic Oscillator
  13.2 Coding an ODE System
  13.3 Solving a System of Linear ODEs using a Matrix Exponential
  13.4 Measurement Error Models
  13.5 Stiff ODEs
  13.6 Control Parameters for ODE Solving
14. Computing One Dimensional Integrals
  14.1 Calling the Integrator
  14.2 Integrator Convergence

Part 2. Programming Techniques

15. Floating Point Arithmetic
  15.1 Floating-point representations
  15.2 Literals: decimal and scientific notation
  15.3 Arithmetic Precision
  15.4 Comparing floating-point numbers
16. Matrices, Vectors, and Arrays
  16.1 Basic Motivation
  16.2 Fixed Sizes and Indexing out of Bounds
  16.3 Data Type and Indexing Efficiency
  16.4 Memory Locality
  16.5 Converting among Matrix, Vector, and Array Types
  16.6 Aliasing in Stan Containers
17. Multiple Indexing and Range Indexing
  17.1 Multiple Indexing
  17.2 Slicing with Range Indexes
  17.3 Multiple Indexing on the Left of Assignments
  17.4 Multiple Indexes with Vectors and Matrices
  17.5 Matrices with Parameters and Constants
18. User-Defined Functions
  18.1 Basic Functions
  18.2 Functions as Statements
  18.3 Functions Accessing the Log Probability Accumulator
  18.4 Functions Acting as Random Number Generators
  18.5 User-Defined Probability Functions
  18.6 Overloading Functions
  18.7 Documenting Functions
  18.8 Summary of Function Types
  18.9 Recursive Functions
  18.10 Truncated Random Number Generation
19. Custom Probability Functions
  19.1 Examples
20. Problematic Posteriors
  20.1 Collinearity of Predictors in Regressions
  20.2 Label Switching in Mixture Models
  20.3 Component Collapsing in Mixture Models
  20.4 Posteriors with Unbounded Densities
  20.5 Posteriors with Unbounded Parameters
  20.6 Uniform Posteriors
  20.7 Sampling Difficulties with Problematic Priors
21. Reparameterization and Change of Variables
  21.1 Theoretical and Practical Background
  21.2 Reparameterizations
  21.3 Changes of Variables
  21.4 Vectors with Varying Bounds
22. Efficiency Tuning
  22.1 What is Efficiency?
  22.2 Efficiency for Probabilistic Models and Algorithms
  22.3 Statistical vs. Computational Efficiency
  22.4 Model Conditioning and Curvature
  22.5 Well-Specified Models
  22.6 Avoiding Validation
  22.7 Reparameterization
  22.8 Vectorization
  22.9 Exploiting Sufficient Statistics
  22.10 Aggregating Common Subexpressions
  22.11 Exploiting Conjugacy
  22.12 Standardizing Predictors and Outputs
  22.13 Using Map-Reduce
23. Map-Reduce
  23.1 Overview of Map-Reduce
  23.2 Map Function
  23.3 Example: Mapping Logistic Regression
  23.4 Example: Hierarchical Logistic Regression
  23.5 Ragged Inputs and Outputs

Appendices

24. Stan Program Style Guide
  24.1 Choose a Consistent Style
  24.2 Line Length
  24.3 File Extensions
  24.4 Variable Naming
  24.5 Local Variable Scope
  24.6 Parentheses and Brackets
  24.7 Conditionals
  24.8 Functions
  24.9 White Space
25. Transitioning from BUGS
  25.1 Some Differences in How BUGS and Stan Work
  25.2 Some Differences in the Modeling Languages
  25.3 Some Differences in the Statistical Models that are Allowed
  25.4 Some Differences when Running from R
  25.5 The Stan Community

References

About this Book

This book is the official user’s guide for Stan. It provides example models and programming techniques for coding statistical models in Stan.

How to use this book

Part 1 gives Stan code and discussions for several important classes of models. Part 2 discusses various general Stan programming techniques that are not tied to any particular model. The appendices provide a style guide and advice for users of BUGS and JAGS.

We recommend working through this book using the textbooks Bayesian Data Analysis and Statistical Rethinking: A Bayesian Course with Examples in R and Stan as references on the concepts, and using the Stan Reference Manual when necessary to clarify programming issues. Further resources are given at the end of the introductory chapter.

Additional Stan manuals and guides

In addition to this user’s guide, there are two reference manuals for the Stan language and algorithms. The Stan Reference Manual specifies the Stan programming language and inference algorithms. The Stan Functions Reference specifies the functions built into the Stan programming language.
There is also a separate installation and getting started guide for each of the Stan interfaces (R, Python, Julia, Stata, MATLAB, Mathematica, and command line).

Web resources

Stan is an open-source software project, resources for which are hosted on various web sites:

• The Stan Web Site organizes all of the resources for the Stan project for users and developers. It contains links to the official Stan releases, source code, installation instructions, and full documentation, including the latest version of this manual, the user’s guide and the getting started guide for each interface, tutorials, case studies, and reference materials for developers.
• The Stan Forums provide structured message boards for questions, discussion, and announcements related to Stan for both users and developers.
• The Stan GitHub Organization hosts all of Stan’s code, documentation, wikis, and web site, as well as the issue trackers for bug reports and feature requests and interactive code review for pull requests.

Acknowledgements

The Stan project could not exist without developers, users, and funding. Stan is a highly collaborative project. The individual contributions of the Stan developers to code are tracked through GitHub, and their contributions to the design conversation are tracked in the wikis and forums.

Users have made extensive contributions to documentation in the way of case studies, tutorials, and even books. They have also reported numerous bugs in both the code and documentation.

Stan has been funded through grants for Stan and its developers, through in-kind donations in the form of companies contributing developer time to Stan and individuals contributing their own time to Stan, and through donations to the open-source scientific software non-profit NumFOCUS. For details of direct funding for the project, see the web site and project pages of the Stan developers.

Copyright, Trademark, and Licensing

This book is copyright 2011–2019, Stan Development Team and their assignees.
The text content is distributed under the CC-BY ND 4.0 license.
