Manipulating and Summarizing Posterior Simulations Using Random Variable Objects

Stat Comput (2007) 17: 235–244
DOI 10.1007/s11222-007-9020-4

Jouni Kerman · Andrew Gelman

Received: 16 November 2005 / Accepted: 7 June 2007 / Published online: 14 July 2007
© Springer Science+Business Media, LLC 2007

J. Kerman: Statistical Methodology, Novartis Pharma AG, 4002 Basel, Switzerland
A. Gelman: Department of Statistics, Columbia University, New York, USA

Abstract

Practical Bayesian data analysis involves manipulating and summarizing simulations from the posterior distribution of the unknown parameters. By manipulation we mean computing posterior distributions of functions of the unknowns, and generating posterior predictive distributions. The results need to be summarized both numerically and graphically.

We introduce, and implement in R, an object-oriented programming paradigm based on a random variable object type that is implicitly represented by simulations. This makes it possible to define vector and array objects that may contain both random and deterministic quantities, and syntax rules that allow these objects to be treated like any numeric vectors or arrays, providing a solution to various problems encountered in Bayesian computing involving posterior simulations.

We illustrate the use of this new programming environment with examples of Bayesian computing, demonstrating missing-value imputation, nonlinear summary of regression predictions, and posterior predictive checking.

Keywords Bayesian inference · Bayesian data analysis · Object-oriented programming · Posterior simulation · Random variable objects

1 Introduction

In practical Bayesian data analysis, inferences are drawn from an $L \times k$ matrix of simulations representing $L$ draws from the posterior distribution of a vector of $k$ parameters. This matrix is typically obtained by a computer program implementing a Gibbs sampling scheme or other Markov chain Monte Carlo (MCMC) process, for example using WinBUGS (Lunn et al. 2000) and the R package R2WinBUGS (Sturtz et al. 2005). Once the matrix of simulations from the posterior density of the parameters is available, we may use it to draw inferences about any function of the parameters.

In the Bayesian paradigm, unknown quantities have probability distributions and are thus random variables. Observed values are just realizations of random variables, and constants may be thought of as random variables with point mass distributions. In mathematical notation, we deal with objects that are random variables, but in practice these objects are approximated by vectors of numbers, that is, simulations. Consequently, when programming for manipulating simulations of unknown quantities, we must write code to manipulate arrays of numeric constants.

Arrays of simulations are cumbersome objects to work with. Functions that work with vectors will not in general work with matrices, so special versions of the functions need to be written to accommodate matrices of simulations as arguments. For example, a scalar-valued random variable becomes a vector of simulations, and a random vector becomes a matrix of simulations.

This raises the question of why our computing environment is not equipped to handle objects that correspond directly to the mathematical random variables. Do we really have to deal with arrays of simulations? Do we gain anything if we try to introduce such an object class in our programming environment?

We demonstrate how an interactive programming environment that has a random variable (and random array) data type makes programming involving simulations considerably more intuitive and powerful. This is especially true for Bayesian data analysis. Along with new possibilities, the introduction of such a data type raises some new questions. For example, what is the mean of a random vector of length $n$? Is it the distribution of the arithmetic average, a scalar quantity, or is it the expectation of the individual components, a vector of $n$ constants? If we apply a comparison operator such as ">" to two random variables, what kind of object is created? What does a scatterplot of a random vector look like? How should we plot a histogram of a random vector of length $n$?

Common programming languages are not equipped to handle random variable objects by default, not even R (R Development Core Team 2004), which is especially suited for statistical computing. However, we can create random variables in object-oriented programming languages by introducing a new class of objects. Manipulating simulation matrices is of course possible using software that is already available, but an intuitive programming environment that allows us to formulate problems in terms of random variable objects instead of arrays of numbers makes statistical problems easier to express in program code and hence also easier to debug.

1.1 A new programming environment

We have written a working prototype of a random-variable-enabled programming environment in R, which is an interactive, fully programmable, object-oriented computing environment originally intended for data analysis. R is especially convenient in vector and matrix manipulation, random variable generation, graphics, and common programming. We suspect that our ideas could also be implemented in other statistical environments such as Xlisp-Stat (Tierney 1990) or Quail (Oldford 1998).

In R, numeric data objects are stored as vectors, that is, in objects that may contain several components. These vectors, if of suitable length, may then have their dimension attributes set to make them appear as matrices and arrays. The vectors may contain numbers (numerical constants) and symbols such as Inf ($\infty$) and the missing value indicator NA. Alternatively, vectors can contain character strings or logical values (TRUE, FALSE).

Our implementation extends the definition of a vector or array, allowing any component of a numeric array to be replaced by an object that contains a number of simulations from some distribution. Internally, a random vector is represented by a list of vectors of simulations, but the user sees them as a single vector, and is also able to manipulate it as such without thinking of the individual simulation draws and such details as how many draws are included per random scalar. Random variables and vectors are thus integrated transparently into the programming environment.

There are no new syntax rules to be learned: built-in numeric functions work directly with random vectors, returning new random vectors. Most user-defined numeric functions that manipulate vectors will also work with these objects directly without any modification.

2 Manipulating posterior simulations

Once the model has been fit and posterior simulations for the unknown parameter vector, say $\theta$, obtained from a model-fitting program, the Bayesian data analyst typically needs to carry out all or some of the following tasks:

1. Posterior interval and point estimates of the components of $\theta$, such as means, medians, 50%, 80%, and 95% posterior intervals and the standard deviation, which summarizes the uncertainty in $\theta$.
2. Posterior interval and point estimates of functions of $\theta$. For example, if $\theta$ is a vector of length 50 consisting of some measures for all fifty U.S. states, we may be interested in the distribution of the mean of the fifty random quantities, $\frac{1}{50} \sum_{i=1}^{50} \theta_i$.
3. Graphical summaries of the quantities mentioned above, for example plots that show point estimates and intervals.
4. Posterior probability statements such as $\Pr(\theta_1 > \theta_2 \mid y)$.
5. Histograms and density estimates of components of $\theta$.
6. Scatterplots and contour plots showing the joint posterior distribution of two-dimensional random quantities.
7. Simulations from the posterior predictive distribution of future data $\tilde{y}$.
8. Bayesian p-values and graphical data discrepancy checks, using functions of parameters $\theta$, replicated data $y^{\mathrm{rep}}$, and observed data $y$.

To implement these tasks as computer programs, they must be reinterpreted in terms of posterior simulations. A scalar random variable, say $\theta_1$, is represented internally by a numerical column vector of $L$ simulations:

$$\theta_1 = \left(\theta_1^{(1)}, \theta_1^{(2)}, \ldots, \theta_1^{(L)}\right)^T.$$

The number of simulations $L$ is typically a value such as 200 or 1,000 (Gelman et al. 2003, pp. 277–278). We refer to $\theta_1^{(\ell)}$, $\ell = 1, \ldots, L$, as a vector of simulations.

Let $k$ be a positive integer. A random vector $\theta = (\theta_1, \ldots, \theta_k)$, being by definition a $k$-tuple of random variables, is represented internally by $k$ vectors of simulations. These $k$ column vectors form an $L \times k$ matrix of simulations

$$
\theta =
\begin{pmatrix}
\theta_1^{(1)} & \theta_2^{(1)} & \cdots & \theta_k^{(1)} \\
\theta_1^{(2)} & \theta_2^{(2)} & \cdots & \theta_k^{(2)} \\
\theta_1^{(3)} & \theta_2^{(3)} & \cdots & \theta_k^{(3)} \\
\vdots & \vdots & \ddots & \vdots \\
\theta_1^{(L)} & \theta_2^{(L)} & \cdots & \theta_k^{(L)}
\end{pmatrix}.
$$

Each row $\theta^{(\ell)} = (\theta_1^{(\ell)}, \ldots, \theta_k^{(\ell)})$ of the matrix represents a random draw from the joint distribution of $\theta$. The components of $\theta^{(\ell)}$ may be dependent or independent.

2.1 The currently-standard approach

The usual approach now is to use loops and vector-matrix computations to manipulate the matrix of posterior simulations. This approach is general but awkward and far from [...]

[...] example find the distribution of the determinant of $\theta$ by applying the determinant function to each of the $L$ two-by-two matrices of simulations. Again, this requires a loop or the application of apply. For example, in R, the multiplication of two random matrices, $A = \Theta\Sigma$, is accomplished by

    A <- array(NA, c(L, k, m))   # Allocate an L*k*m array
    for (i in 1:L) {
      A[i,,] <- Theta[i,,] %*% Sigma[i,,]
    }

where k is the number of rows in Theta[i,,] and m is the number of columns in Sigma[i,,].

These are examples of functions that are applied rowwise to a matrix of simulations, yielding an array with the first dimension of size $L$.
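
The simulation-matrix style of computation described above can be sketched in R as follows. This is only an illustration under stated assumptions: the matrix `theta` is filled with independent normal draws as a stand-in for real posterior simulations, and the names `post_mean`, `post_int`, `theta_bar`, `p_gt`, `Theta` and `det_sims`, as well as the values of `L` and `k`, are hypothetical choices that do not come from the paper.

```r
set.seed(1)
L <- 1000   # number of simulation draws (assumed value)
k <- 4      # number of parameters (assumed value)

# Stand-in for an L x k matrix of posterior simulations. A real analysis
# would take this from an MCMC fit rather than from rnorm().
theta <- matrix(rnorm(L * k, mean = rep(1:k, each = L)), nrow = L, ncol = k)

# Task 1: columnwise point estimates and 95% posterior intervals.
post_mean <- colMeans(theta)
post_int  <- apply(theta, 2, quantile, probs = c(0.025, 0.975))

# Task 2: a function of theta, here the mean of the k components,
# computed rowwise so the result is again a vector of L simulations.
theta_bar <- rowMeans(theta)

# Task 4: Pr(theta_1 > theta_2 | y), estimated by the proportion of
# draws in which the inequality holds.
p_gt <- mean(theta[, 1] > theta[, 2])

# Determinant of a random 2 x 2 matrix: view each draw as a 2 x 2
# matrix and apply det() over the first (simulation) dimension.
Theta <- array(theta, dim = c(L, 2, 2))
det_sims <- apply(Theta, 1, det)
```

Every quantity that is itself random (`theta_bar`, `det_sims`) comes back as a plain vector of length `L`, so the analyst has to keep track of which dimension indexes the simulations. This bookkeeping is what a random variable object type is meant to hide.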
