Stan Modeling Language
Stan Modeling Language User's Guide and Reference Manual
Stan Development Team
Stan Version 2.12.0
Tuesday 6th September, 2016
mc-stan.org

Stan Development Team (2016) Stan Modeling Language: User's Guide and Reference Manual. Version 2.12.0.

Copyright © 2011–2016, Stan Development Team.

This document is distributed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). For full details, see
https://creativecommons.org/licenses/by/4.0/legalcode

The Stan logo is distributed under the Creative Commons Attribution-NoDerivatives 4.0 International License (CC BY-ND 4.0). For full details, see
https://creativecommons.org/licenses/by-nd/4.0/legalcode

Stan Development Team

Currently Active Developers

This is the list of current developers in order of joining the development team (see the next section for former development team members).

• Andrew Gelman (Columbia University)
  chief of staff, chief of marketing, chief of fundraising, chief of modeling, max marginal likelihood, expectation propagation, posterior analysis, RStan, RStanARM, Stan

• Bob Carpenter (Columbia University)
  language design, parsing, code generation, autodiff, templating, ODEs, probability functions, constraint transforms, manual, web design / maintenance, fundraising, support, training, Stan, Stan Math, CmdStan

• Matt Hoffman (Adobe Creative Technologies Lab)
  NUTS, adaptation, autodiff, memory management, (re)parameterization, optimization, C++

• Daniel Lee (Columbia University)
  chief of engineering, CmdStan, builds, continuous integration, testing, templates, ODEs, autodiff, posterior analysis, probability functions, error handling, refactoring, training, Stan, Stan Math, CmdStan, RStan, PyStan

• Ben Goodrich (Columbia University)
  RStan, multivariate probability functions, matrix algebra, (re)parameterization, constraint transforms, modeling, R, C++, training, Stan, Stan Math, RStan, RStanARM

• Michael Betancourt (University of Warwick)
  chief of smooth manifolds, MCMC, Riemannian HMC, geometry, analysis and measure theory, ODEs, CmdStan, CDFs, autodiff, transformations, refactoring, modeling, adaptation, variational inference, logos, web design, training, Stan, Stan Math, CmdStan

• Marcus Brubaker (University of Toronto, Scarborough)
  chief of optimization, code efficiency, matrix algebra, multivariate distributions, C++, Stan, Stan Math, CmdStan

• Jiqiang Guo (NPD Group)
  modeling, Stan, RStan

• Allen Riddell (Dartmouth College)
  PyStan

• Marco Inacio (University of São Paulo/UFSCar)
  functions and distributions, Stan, Stan Math

• Jeffrey Arnold (University of Washington)
  emacs mode, pretty printing, manual

• Rob J. Goedman (Consultant, La Jolla, California)
  parsing, Stan, Stan.jl

• Brian Lau (CNRS, Paris)
  MatlabStan

• Mitzi Morris (Lucidworks)
  parsing, testing, JSON, Stan, Stan Math

• Rob Trangucci (Columbia University)
  max marginal likelihood, multilevel modeling and GLMs, poststratification, autodiff, template metaprogramming, training, Stan, Stan Math, RStan

• Jonah Sol Gabry (Columbia University)
  RStan, RStanARM, ShinyStan, multilevel modeling, model comparison (LOO), model visualization, training

• Alp Kucukelbir (Columbia University)
  variational inference, Stan, CmdStan

• Robert L. Grant (St. George's, University of London and Kingston University)
  StataStan

• Dustin Tran (Harvard University)
  variational inference, Stan, CmdStan

• Krzysztof Sakrejda (University of Massachusetts, Amherst)
  sparse matrix operations, Stan, Stan Math

• Aki Vehtari (Aalto University)
  GPs, model comparison (WAIC, LOO), modeling, algorithms, MatlabStan

• Rayleigh Lei (University of Michigan)
  vectorization, ADVI, RStan, Stan, Stan Math

• Sebastian Weber (Novartis Pharma)
  ODE solvers, Stan, Stan Math

• Charles Margossian (Metrum LLC)
  ODE solvers, Stan, Stan Math

Development Team Alumni

These are developers who have made important contributions in the past, but are no longer contributing actively.

• Michael Malecki (Crunch.io, YouGov plc)
  original design, modeling, logos, R

• Peter Li (Columbia University)
  RNGs, higher-order autodiff, ensemble sampling, Metropolis, example models, C++

• Yuanjun Guo (Columbia University)
  dense mass matrix estimation, C++

Contents

Preface
Acknowledgements

I Introduction
1. Overview

II Programming Techniques
2. Model Building as Software Development
3. Data Types
4. Containers: Arrays, Vectors, and Matrices
5. Multiple Indexing and Range Indexing
6. Regression Models
7. Time-Series Models
8. Missing Data & Partially Known Parameters
9. Truncated or Censored Data
10. Finite Mixtures
11. Measurement Error and Meta-Analysis
12. Latent Discrete Parameters
13. Sparse and Ragged Data Structures
14. Clustering Models
15. Gaussian Processes
16. Directions, Rotations, and Hyperspheres
17. Reparameterization & Change of Variables
18. Custom Probability Functions
19. User-Defined Functions
20. Solving Differential Equations
21. Problematic Posteriors
22. Optimizing Stan Code for Efficiency
23. Reproducibility

III Modeling Language Reference
24. Execution of a Stan Program
25. Data Types and Variable Declarations
26. Expressions
27. Statements
28. User-Defined Functions
29. Program Blocks
30. Modeling Language Syntax

IV Built-In Functions
31. Void Functions
32. Integer-Valued Basic Functions
33. Real-Valued Basic Functions
34. Array Operations
35. Matrix Operations
36. Sparse Matrix Operations
37. Mixed Operations
38. Ordinary Differential Equation Solvers

V Discrete Distributions
39. Conventions for Probability Functions
40. Binary Distributions
41. Bounded Discrete Distributions
42. Unbounded Discrete Distributions
43. Multivariate Discrete Distributions

VI Continuous Distributions
44. Unbounded Continuous Distributions
45. Positive Continuous Distributions
46. Non-negative Continuous Distributions
47. Positive Lower-Bounded Probabilities
48. Continuous Distributions on [0, 1]
49. Circular Distributions
50. Bounded Continuous Probabilities
51. Distributions over Unbounded Vectors
52. Simplex Distributions
53. Correlation Matrix Distributions
54. Covariance Matrix Distributions

VII Additional Topics
55. Point Estimation
56. Bayesian Data Analysis
57. Markov Chain Monte Carlo Sampling
58. Transformations of Constrained Variables
59. Variational Inference

VIII Algorithms & Implementations
60. Hamiltonian Monte Carlo Sampling
61. Optimization Algorithms
62. Variational Inference
63. Diagnostic Mode

IX Software Process
64. Software Development Lifecycle

X Contributed Modules
65. Contributed Modules

Appendices
A. Licensing
B. Stan for Users of BUGS
C. Stan Program Style Guide
D. Warning and Error Messages
E. Deprecated Features
F. Mathematical Functions

Bibliography
Index

Preface

Why Stan?

We did not set out to build Stan as it currently exists. We set out to apply full Bayesian inference to the sort of multilevel generalized linear models discussed in Part II of Gelman and Hill (2007). These models are structured with grouped and interacted predictors at multiple levels, hierarchical covariance priors, nonconjugate coefficient priors, latent effects as in item-response models, and varying output link functions and distributions.

The models we wanted to fit turned out to be a challenge for current general-purpose software. A direct encoding in BUGS or JAGS can grind these tools to a halt. Matt Schofield found his multilevel time-series regression of climate on tree-ring measurements wasn't converging after hundreds of thousands of iterations.

Initially, Aleks Jakulin spent some time working on extending the Gibbs sampler in the Hierarchical Bayesian Compiler (Daumé, 2007), which, as its name suggests, is compiled rather than interpreted. But even an efficient and scalable implementation does not solve the underlying problem that Gibbs sampling does not fare well with highly correlated posteriors. We finally realized we needed a better sampler, not a more efficient implementation.

We briefly considered trying to tune proposals for a random-walk Metropolis-Hastings sampler, but that seemed too problem specific and not even necessarily possible without some kind of adaptation rather than tuning of the proposals.

The Path to Stan

At the same time, we were starting to hear more and more about Hamiltonian Monte Carlo (HMC) and its ability to overcome some of the problems inherent in Gibbs sampling. Matt Schofield managed to fit the tree-ring data using a hand-coded implementation of HMC, finding it converged in a few hundred iterations.

HMC appeared promising but was also problematic in that the Hamiltonian dynamics simulation requires the gradient of the log posterior. Although it is possible to compute these gradients by hand, doing so is tedious and error prone. That's when we discovered reverse-mode algorithmic differentiation, which lets you write down a templated C++ function for the log posterior and automatically compute a proper analytic gradient up to machine precision accuracy in only a few multiples of the cost to evaluate the log probability function itself. We explored existing algorithmic differentiation packages with open licenses such as rad (Gay, 2005) and its repackaging in the Sacado module of the Trilinos toolkit and the CppAD package in the coin-or toolkit. But neither package supported very many special functions
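To make the reverse-mode idea concrete, the following is a minimal sketch of the templated-function pattern described above. It assumes the Stan Math library's headers are available on the include path; the function normal_log_density, its arguments, and the constants are illustrative, not code from the manual.

    #include <stan/math.hpp>  // reverse-mode autodiff type stan::math::var
    #include <cmath>
    #include <iostream>

    // A templated log density: with T = double it computes only a value;
    // with T = stan::math::var it also records the expression graph
    // needed for the reverse (gradient) sweep.
    template <typename T>
    T normal_log_density(const T& mu, double y, double sigma) {
      T z = (y - mu) / sigma;
      return -0.5 * z * z - std::log(sigma) - 0.5 * std::log(2.0 * M_PI);
    }

    int main() {
      stan::math::var mu = 1.3;  // independent variable
      stan::math::var lp = normal_log_density(mu, 2.0, 0.5);
      lp.grad();                 // one reverse sweep propagates adjoints
      std::cout << "lp        = " << lp.val() << "\n"
                << "d lp/d mu = " << mu.adj() << "\n";
      return 0;
    }

Under these assumptions, the single reverse sweep yields the analytic derivative d lp/d mu = (y - mu)/sigma^2, here 2.8, to machine precision, while the same function instantiated at T = double costs only the value computation.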