Stan User’s Guide
Version 2.27
Stan Development Team

Contents

Overview

Part 1. Example Models

1. Regression Models
  1.1 Linear regression
  1.2 The QR reparameterization
  1.3 Priors for coefficients and scales
  1.4 Robust noise models
  1.5 Logistic and probit regression
  1.6 Multi-logit regression
  1.7 Parameterizing centered vectors
  1.8 Ordered logistic and probit regression
  1.9 Hierarchical logistic regression
  1.10 Hierarchical priors
  1.11 Item-response theory models
  1.12 Priors for identifiability
  1.13 Multivariate priors for hierarchical models
  1.14 Prediction, forecasting, and backcasting
  1.15 Multivariate outcomes
  1.16 Applications of pseudorandom number generation
2. Time-Series Models
  2.1 Autoregressive models
  2.2 Modeling temporal heteroscedasticity
  2.3 Moving average models
  2.4 Autoregressive moving average models
  2.5 Stochastic volatility models
  2.6 Hidden Markov models
3. Missing Data and Partially Known Parameters
  3.1 Missing data
  3.2 Partially known parameters
  3.3 Sliced missing data
  3.4 Loading matrix for factor analysis
  3.5 Missing multivariate data
4. Truncated or Censored Data
  4.1 Truncated distributions
  4.2 Truncated data
  4.3 Censored data
5. Finite Mixtures
  5.1 Relation to clustering
  5.2 Latent discrete parameterization
  5.3 Summing out the responsibility parameter
  5.4 Vectorizing mixtures
  5.5 Inferences supported by mixtures
  5.6 Zero-inflated and hurdle models
  5.7 Priors and effective data size in mixture models
6. Measurement Error and Meta-Analysis
  6.1 Bayesian measurement error model
  6.2 Meta-analysis
7. Latent Discrete Parameters
  7.1 The benefits of marginalization
  7.2 Change point models
  7.3 Mark-recapture models
  7.4 Data coding and diagnostic accuracy models
8. Sparse and Ragged Data Structures
  8.1 Sparse data structures
  8.2 Ragged data structures
9. Clustering Models
  9.1 Relation to finite mixture models
  9.2 Soft K-means
  9.3 The difficulty of Bayesian inference for clustering
  9.4 Naive Bayes classification and clustering
  9.5 Latent Dirichlet allocation
10. Gaussian Processes
  10.1 Gaussian process regression
  10.2 Simulating from a Gaussian process
  10.3 Fitting a Gaussian process
11. Directions, Rotations, and Hyperspheres
  11.1 Unit vectors
  11.2 Circles, spheres, and hyperspheres
  11.3 Transforming to unconstrained parameters
  11.4 Unit vectors and rotations
  11.5 Circular representations of days and years
12. Solving Algebraic Equations
  12.1 Example: system of nonlinear algebraic equations
  12.2 Coding an algebraic system
  12.3 Calling the algebraic solver
  12.4 Control parameters for the algebraic solver
13. Ordinary Differential Equations
  13.1 Notation
  13.2 Example: simple harmonic oscillator
  13.3 Coding the ODE system function
  13.4 Measurement error models
  13.5 Stiff ODEs
  13.6 Control parameters for ODE solving
  13.7 Adjoint ODE solver
  13.8 Solving a system of linear ODEs using a matrix exponential
14. Computing One Dimensional Integrals
  14.1 Calling the integrator
  14.2 Integrator convergence

Part 2. Programming Techniques

15. Floating Point Arithmetic
  15.1 Floating-point representations
  15.2 Literals: decimal and scientific notation
  15.3 Arithmetic precision
  15.4 Log sum of exponentials
  15.5 Comparing floating-point numbers
16. Matrices, Vectors, and Arrays
  16.1 Basic motivation
  16.2 Fixed sizes and indexing out of bounds
  16.3 Data type and indexing efficiency
  16.4 Memory locality
  16.5 Converting among matrix, vector, and array types
  16.6 Aliasing in Stan containers
17. Multiple Indexing and Range Indexing
  17.1 Multiple indexing
  17.2 Slicing with range indexes
  17.3 Multiple indexing on the left of assignments
  17.4 Multiple indexes with vectors and matrices
  17.5 Matrices with parameters and constants
18. User-Defined Functions
  18.1 Basic functions
  18.2 Functions as statements
  18.3 Functions accessing the log probability accumulator
  18.4 Functions acting as random number generators
  18.5 User-defined probability functions
  18.6 Overloading functions
  18.7 Documenting functions
  18.8 Summary of function types
  18.9 Recursive functions
  18.10 Truncated random number generation
19. Custom Probability Functions
  19.1 Examples
20. Proportionality Constants
  20.1 Dropping Proportionality Constants
  20.2 Keeping Proportionality Constants
  20.3 User-defined Distributions
  20.4 Limitations on Using _lupdf and _lupmf Functions
21. Problematic Posteriors
  21.1 Collinearity of predictors in regressions
  21.2 Label switching in mixture models
  21.3 Component collapsing in mixture models
  21.4 Posteriors with unbounded densities
  21.5 Posteriors with unbounded parameters
  21.6 Uniform posteriors
  21.7 Sampling difficulties with problematic priors
22. Reparameterization and Change of Variables
  22.1 Theoretical and practical background
  22.2 Reparameterizations
  22.3 Changes of variables
  22.4 Vectors with varying bounds
23. Efficiency Tuning
  23.1 What is efficiency?
  23.2 Efficiency for probabilistic models and algorithms
  23.3 Statistical vs. computational efficiency
  23.4 Model conditioning and curvature
  23.5 Well-specified models
  23.6 Avoiding validation
  23.7 Reparameterization
  23.8 Vectorization
  23.9 Exploiting sufficient statistics
  23.10 Aggregating common subexpressions
  23.11 Exploiting conjugacy
  23.12 Standardizing predictors and outputs
  23.13 Using map-reduce
24. Parallelization
  24.1 Reduce-sum
  24.2 Map-rect
  24.3 OpenCL

Part 3. Posterior Inference & Model Checking

25. Posterior Predictive Sampling
  25.1 Posterior predictive distribution
  25.2 Computing the posterior predictive distribution
  25.3 Sampling from the posterior predictive distribution
  25.4 Posterior predictive simulation in Stan
  25.5 Posterior prediction for regressions
  25.6 Estimating event probabilities
  25.7 Stand-alone generated quantities and ongoing prediction
26. Simulation-Based Calibration
  26.1 Bayes is calibrated by construction
  26.2 Simulation-based calibration
  26.3 SBC in Stan
  26.4 Testing uniformity
  26.5 Examples of simulation-based calibration
27. Posterior and Prior Predictive Checks
  27.1 Simulating from the posterior predictive distribution
  27.2 Plotting multiples
  27.3 Posterior “p-values”
  27.4 Prior predictive checks
  27.5 Example of prior predictive checks
  27.6 Mixed predictive replication for hierarchical models
  27.7 Joint model representation
28. Held-Out Evaluation and Cross-Validation
  28.1 Evaluating posterior predictive densities
  28.2 Estimation error
  28.3 Cross-validation
29. Poststratification
  29.1 Some examples
  29.2 Bayesian poststratification
  29.3 Poststratification in Stan
  29.4 Regression and poststratification
  29.5 Multilevel regression and poststratification
  29.6 Coding MRP in Stan
  29.7 Adding group-level predictors
30. Decision Analysis
  30.1 Outline of decision analysis
  30.2 Example decision analysis
  30.3 Continuous choices
31. The Bootstrap and Bagging
  31.1 The bootstrap
  31.2 Coding the bootstrap in Stan
  31.3 Error statistics from the bootstrap
  31.4 Bagging
  31.5 Bayesian bootstrap and bagging

Appendices

32. Stan Program Style Guide
  32.1 Choose a consistent style
  32.2 Line length
  32.3 File extensions
  32.4 Variable naming
  32.5 Local variable scope
  32.6 Parentheses and brackets
  32.7 Conditionals
  32.8 Functions
  32.9 White space
33. Transitioning from BUGS
  33.1 Some differences in how BUGS and Stan work
  33.2 Some differences in the modeling languages
  33.3 Some differences in the statistical models that are allowed
  33.4 Some differences when running from R
  33.5 The Stan community

References

Overview

About this user’s guide

This is the official user’s guide for Stan. It provides example models and programming techniques for coding statistical models in Stan.

• Part 1 gives Stan code and discussions for several important classes of models; a minimal example appears at the end of this overview.
• Part 2 discusses various general Stan programming techniques that are not tied to any particular model.
• Part 3 introduces algorithms for calibration and model checking that require multiple runs of Stan.
• The appendices provide a style guide and advice for users of BUGS and JAGS.

In addition to this user’s guide, there are two reference manuals for the Stan language and algorithms. The Stan Reference Manual specifies the Stan programming language and inference algorithms. The Stan Functions Reference specifies the functions built into the Stan programming language. There is also a separate installation and getting started guide for each of the Stan interfaces (R, Python, Julia, Stata, MATLAB, Mathematica, and command line).

We recommend working through this guide with the textbooks Bayesian Data Analysis and Statistical Rethinking: A Bayesian Course with Examples in R and Stan as references on the concepts, and using the Stan Reference Manual when necessary to clarify programming issues.
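To give a sense of the programs Part 1 develops, here is a minimal sketch of the simple linear regression model with which Chapter 1 opens: a single predictor x, outcome y, intercept alpha, slope beta, and residual scale sigma. The chapter itself goes further, discussing priors, the QR reparameterization, and robust noise models.

data {
  int<lower=0> N;       // number of observations
  vector[N] x;          // predictor
  vector[N] y;          // outcome
}
parameters {
  real alpha;           // intercept
  real beta;            // slope
  real<lower=0> sigma;  // residual standard deviation
}
model {
  y ~ normal(alpha + beta * x, sigma);  // vectorized likelihood
}

The vectorized sampling statement in the model block is equivalent to looping y[n] ~ normal(alpha + beta * x[n], sigma) over all n, but faster; the vectorization chapter in Part 2 explains why.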