Stan Modeling Language

Stan Modeling Language: User’s Guide and Reference Manual

Stan Development Team
Stan Version 2.6.2
Saturday 14th March, 2015
http://mc-stan.org/

Stan Development Team. 2015. Stan Modeling Language: User’s Guide and Reference Manual. Version 2.6.2.

Copyright © 2011–2015, Stan Development Team.

This document is distributed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). For full details, see
https://creativecommons.org/licenses/by/4.0/legalcode

Stan Development Team

Currently Active Developers

The list order is the order of joining the development team.

• Andrew Gelman (Columbia University)
  chief of staff, chief of marketing, chief of fundraising, chief of modeling, chief of training, max marginal likelihood, expectation propagation, posterior analysis, R, founder
• Bob Carpenter (Columbia University)
  language design, parsing, code generation, autodiff, templating, ODEs, probability functions, constraint transforms, manual, web design / maintenance, fundraising, support, training, C++, founder
• Matt Hoffman (Adobe Creative Technologies Lab)
  NUTS, adaptation, autodiff, memory management, (re)parameterization, C++, founder
• Daniel Lee (Columbia University)
  chief of engineering, CmdStan (founder), builds, continuous integration, testing, templates, ODEs, autodiff, posterior analysis, probability functions, error handling, refactoring, C++, training
• Ben Goodrich (Columbia University)
  RStan, multivariate probability functions, matrix algebra, (re)parameterization, constraint transforms, modeling, R, C++, training
• Michael Betancourt (University of Warwick)
  chief of smooth manifolds, MCMC, Riemannian HMC, geometry, analysis and measure theory, ODEs, CmdStan, CDFs, autodiff, transformations, refactoring, modeling, variational inference, logos, web design, C++, training
• Marcus Brubaker (University of Toronto, Scarborough)
  optimization routines, code efficiency, matrix algebra, multivariate distributions, C++
• Jiqiang Guo (NPD Group)
  RStan (founder), C++, Rcpp, R
• Peter Li (Columbia University)
  RNGs, higher-order autodiff, ensemble sampling, Metropolis, example models, C++
• Allen Riddell (Dartmouth College)
  PyStan (founder), C++, Python
• Marco Inacio (University of São Paulo)
  functions and distributions, C++
• Jeffrey Arnold (University of Washington)
  emacs mode, pretty printing, manual, emacs
• Rob J. Goedman (D3Consulting b.v.)
  parsing, Stan.jl, C++, Julia
• Brian Lau (CNRS, Paris)
  MatlabStan, MATLAB
• Mitzi Morris (Lucidworks)
  parsing, testing, C++
• Rob Trangucci (Columbia University)
  max marginal likelihood, multilevel modeling and poststratification, template metaprogramming, training, C++, R
• Jonah Sol Gabry (Columbia University)
  shinyStan (founder), R

Development Team Alumni

These are developers who have made important contributions in the past, but are no longer contributing actively.

• Michael Malecki (Crunch.io, YouGov plc)
  original design, modeling, logos, R
• Yuanjun Gao (Columbia University)
  dense mass matrix estimation, C++

Contents

Preface
Acknowledgements

I Introduction
  1. Overview

II Programming Techniques
  2. Model Building as Software Development
  3. Data Types
  4. Containers: Arrays, Vectors, and Matrices
  5. Regression Models
  6. Time-Series Models
  7. Missing Data & Partially Known Parameters
  8. Truncated or Censored Data
  9. Finite Mixtures
  10. Measurement Error and Meta-Analysis
  11. Latent Discrete Parameters
  12. Sparse and Ragged Data Structures
  13. Clustering Models
  14. Gaussian Processes
  15. Reparameterization & Change of Variables
  16. Custom Probability Functions
  17. User-Defined Functions
  18. Solving Differential Equations
  19. Problematic Posteriors
  20. Optimizing Stan Code
  21. Reproducibility

III Modeling Language Reference
  22. Execution of a Stan Program
  23. Data Types and Variable Declarations
  24. Expressions
  25. Statements
  26. User-Defined Functions
  27. Program Blocks
  28. Modeling Language Syntax

IV Built-In Functions
  29. Vectorization
  30. Void Functions
  31. Integer-Valued Basic Functions
  32. Real-Valued Basic Functions
  33. Array Operations
  34. Matrix Operations
  35. Mixed Operations

V Discrete Distributions
  36. Conventions for Probability Functions
  37. Binary Distributions
  38. Bounded Discrete Distributions
  39. Unbounded Discrete Distributions
  40. Multivariate Discrete Distributions

VI Continuous Distributions
  41. Unbounded Continuous Distributions
  42. Positive Continuous Distributions
  43. Non-negative Continuous Distributions
  44. Positive Lower-Bounded Probabilities
  45. Continuous Distributions on [0, 1]
  46. Circular Distributions
  47. Bounded Continuous Probabilities
  48. Distributions over Unbounded Vectors
  49. Simplex Distributions
  50. Correlation Matrix Distributions
  51. Covariance Matrix Distributions

VII Additional Topics
  52. Point Estimation
  53. Bayesian Data Analysis
  54. Markov Chain Monte Carlo Sampling
  55. Transformations of Constrained Variables

VIII Algorithms & Implementations
  56. Hamiltonian Monte Carlo Sampling
  57. Optimization Algorithms
  58. Diagnostic Mode

IX Software Process
  59. Software Development Lifecycle

X Contributed Modules
  60. Contributed Modules

Appendices
  A. Licensing
  B. Stan for Users of BUGS
  C. Stan Program Style Guide
  D. Warning and Error Messages
  E. Mathematical Functions

Bibliography
Index

Preface

Why Stan?

We¹ did not set out to build Stan as it currently exists. We set out to apply full Bayesian inference to the sort of multilevel generalized linear models discussed in Part II of Gelman and Hill (2007).
These models are structured with grouped and interacted predictors at multiple levels, hierarchical covariance priors, nonconjugate coefficient priors, latent effects as in item-response models, and varying output link functions and distributions.

The models we wanted to fit turned out to be a challenge for current general-purpose software. A direct encoding in BUGS or JAGS can grind these tools to a halt. Matt Schofield found his multilevel time-series regression of climate on tree-ring measurements wasn’t converging after hundreds of thousands of iterations.

Initially, Aleks Jakulin spent some time working on extending the Gibbs sampler in the Hierarchical Bayesian Compiler (Daumé, 2007), which, as its name suggests, is compiled rather than interpreted. But even an efficient and scalable implementation does not solve the underlying problem that Gibbs sampling does not fare well with highly correlated posteriors. We finally realized we needed a better sampler, not a more efficient implementation.

We briefly considered trying to tune proposals for a random-walk Metropolis-Hastings sampler, but that seemed too problem specific and not even necessarily possible without some kind of adaptation rather than tuning of the proposals.

¹ In Fall 2010, the “we” consisted of Andrew Gelman and his crew of Ph.D. students (Wei Wang and Vince Dorie), postdocs (Ben Goodrich, Matt Hoffman and Michael Malecki), and research staff (Bob Carpenter and Daniel Lee). Previous postdocs whose work directly influenced Stan included Matt Schofield, Kenny Shirley, and Aleks Jakulin. Jiqiang Guo joined as a postdoc in Fall 2011. Marcus Brubaker, a computer science postdoc at Toyota Technological Institute at Chicago, joined the development team in early 2012. Michael Betancourt, a physics Ph.D. about to start a postdoc at University College London, joined the development team in late 2012 after months of providing useful feedback on geometry and debugging samplers at our meetings. Yuanjun Gao, a statistics graduate student at Columbia, and Peter Li, an undergraduate student at Columbia, joined the development team in the Fall semester of 2012. Allen Riddell joined the development team in Fall of 2013 and is currently maintaining PyStan. In the summer of 2014, Marco Inacio (University of São Paulo), Mitzi Morris (independent contractor), and Jeffrey Arnold (University of Washington) joined the development team.

The Path to Stan

We were at the same time starting to hear more and more about Hamiltonian Monte Carlo (HMC) and its ability to overcome some of the problems inherent in Gibbs sampling. Matt Schofield managed to fit the tree-ring data using a hand-coded implementation of HMC, finding it converged in a few hundred iterations.

HMC appeared promising but was also problematic in that the Hamiltonian dynamics simulation requires the gradient of the log posterior. Although it’s possible to derive the gradient by hand, doing so is very tedious and error prone. That’s when we discovered reverse-mode algorithmic differentiation, which lets you write down a templated C++ function for the log posterior and automatically compute a proper analytic gradient, accurate up to machine precision, at a cost of only a few multiples of evaluating the log probability function itself. We explored existing algorithmic differentiation packages with open licenses such as rad (Gay, 2005) and its repackaging in the Sacado module of the Trilinos toolkit and the CppAD package in the coin-or toolkit. But neither package supported very many special functions (e.g., probability functions, log gamma, inverse logit) or linear algebra operations (e.g., Cholesky decomposition), and neither was easily and modularly extensible. So we built our own reverse-mode algorithmic differentiation package. But once we’d built
