
Stat Comput (2007) 17: 235–244 DOI 10.1007/s11222-007-9020-4

Manipulating and summarizing posterior simulations using objects

Jouni Kerman · Andrew Gelman

Received: 16 November 2005 / Accepted: 7 June 2007 / Published online: 14 July 2007 © Springer Science+Business Media, LLC 2007

Abstract Practical Bayesian analysis involves manip- 1 Introduction ulating and summarizing simulations from the posterior dis- tribution of the unknown parameters. By manipulation we In practical Bayesian data analysis, inferences are drawn × computing posterior distributions of functions of the from an L k matrix of simulations representing L draws unknowns, and generating posterior predictive distributions. from the posterior distribution of a vector of k parameters. The results need to be summarized both numerically and This matrix is typically obtained by a computer program im- plementing a Gibbs scheme or other graphically. Monte Carlo (MCMC) process, for example using Win- We introduce, and implement in R, an object-oriented BUGS (Lunn et al. 2000) and the R package R2WinBUGS programming paradigm based on a random variable ob- (Sturtz et al. 2005). Once the matrix of simulations from the ject type that is implicitly represented by simulations. This posterior density of the parameters is available, we may use makes it possible to define vector and array objects that may it to draw inferences about any function of the parameters. contain both random and deterministic quantities, and syn- In the Bayesian paradigm, unknown quantities have prob- tax rules that allow to treat these objects like any numeric ability distributions and are thus random variables. Ob- vectors or arrays, providing a solution to various problems served values are just realizations of random variables, and encountered in Bayesian computing involving posterior sim- constants may be thought of as random variables with point ulations. mass distributions. In mathematical notation, we deal with We illustrate the use of this new programming environ- objects that are random variables, but in practice these ob- ment with examples of Bayesian computing, demonstrating jects are approximated by vectors of numbers, that is, simu- missing-value imputation, nonlinear summary of regression lations. 
Consequently, when programming for manipulating predictions, and posterior predictive checking. simulations of unknown quantities, we must write code to manipulate arrays of numeric constants. Arrays of simulations are cumbersome objects to work · · Keywords Bayesian data analysis with. Functions that work with vectors will not in general · · Object-oriented programming Posterior simulation work with matrices, so special versions of the functions need Random variable objects to be written to accommodate matrices of simulations as ar- guments. For example, a scalar-valued random variable be- comes a vector of simulations, and a random vector becomes J. Kerman () a matrix of simulations. Statistical Methodology, Novartis Pharma AG, 4002 Basel, This gives rise to a question why our computing envi- Switzerland ronment is not equipped to handle objects that correspond e-mail: [email protected] directly to the mathematical random variables. Do we re- ally have to deal with arrays of simulations? Do we gain A. Gelman Department of , Columbia University, New York, USA anything if we try to introduce such an object class in our e-mail: [email protected] programming environment? 236 Stat Comput (2007) 17: 235–244 We demonstrate how an interactive programming envi- random scalar. Random variables and vectors are thus inte- ronment that has a random variable (and random array) data grated transparently into the programming environment. type makes programming involving simulations consider- There are no new syntax rules to be learned: built-in nu- ably more intuitive and powerful. This is especially true for meric functions work directly with random vectors, return- Bayesian data analysis. Along with new possibilities, intro- ing new random vectors. 
Most user-defined numeric func- duction of such a data type raises some new questions, for tions that manipulate vectors will also work with these ob- example, what is a mean of a random vector of length n?Is jects directly without any modification. it the distribution of the arithmetic average, a scalar quan- tity, or is it the expectation of the individual components, a vector of n constants? If we apply a comparison operator such as “>” to two random variables, what kind of an object 2 Manipulating posterior simulations is created? What does a scatterplot of a random vector look like? How should we plot a of a random vector of length n? Once the model has been fit and posterior simulations for the Common programming languages are not equipped to unknown parameter vector, say θ, obtained from a model- handle random variable objects by default, not even R fitting program, the Bayesian data analyst typically needs to (R Development Core Team 2004), which is especially compute all or some of the following tasks: suited for statistical computing. However, we can create ran- 1. Posterior interval and point estimates of the components dom variables in object-oriented programming languages by of θ, such as , , 50%, 80%, and 95% pos- introducing a new class of objects. Manipulating simulation terior intervals and the which summa- matrices is of course possible using software that is already available, but an intuitive programming environment that al- rizes the uncertainty in θ. lows us to formulate problems in terms of random variable 2. Posterior interval and point estimates of functions of θ. objects instead of arrays of numbers makes statistical prob- For example, if θ is a vector of length 50 consisting of lems easier to express in program code and hence also easier some measures for all fifty U.S. states, we may be inter- to debug. ested in the distribution of the mean of the fifty random 1 50 quantities, 50 i=1 θi . 1.1 A new programming environment 3. 
Graphical summaries of the quantities mentioned above, for example plots that show point estimates and intervals. We have written a working prototype of a random-variable- 4. Posterior statements such as Pr(θ1 >θ2|y). enabled programming environment in R, which is an inter- 5. and density estimates of components of θ. active, fully programmable, object-oriented computing envi- 6. Scatterplots and contourplots showing the joint posterior ronment originally intended for data analysis. R is especially distribution of two-dimensional random quantities. convenient in vector and matrix manipulation, random vari- 7. Simulations from the posterior predictive distribution of able generation, graphics, and common programming. We future data y. suspect that our ideas could also be implemented in other 8. Bayesian p-values and graphical data discrepancy checks, statistical environments such as Xlisp-Stat (Tierney 1990) using functions of parameters θ, replicated data yrep, and or Quail (Oldford 1998). In R, numeric data objects are stored as vectors, that is, observed data y. in objects that may contain several components. These vec- To implement these tasks as computer programs, they tors, if of suitable length, may then have their dimension must be reinterpreted in terms of posterior simulations. attributes set to make them appear as matrices and arrays. A scalar random variable, say θ1, is represented internally The vectors may contain numbers (numerical constants) and by a numerical column vector of L simulations: symbols such as Inf (∞) and the missing value indicator NA. Alternatively, vectors can contain character strings or θ = (θ (1),θ(2),...,θ(L))T. logical values (TRUE, FALSE). 1 1 1 1 Our implementation extends the definition of a vector or array, allowing any component of a numeric array to be re- The number of simulations L is typically a value such as placed by an object that contains a number of simulations 200 or 1,000 (Gelman et al. 2003, pp. 277–278). 
We refer to () = from some distribution. Internally, a random vector is rep- y1 ,  1,...,L,asavector of simulations. resented by a list of vectors of simulations, but the user Let k be a positive integer. A random vector θ = sees them as a single vector, and is also able to manipu- (θ1,...,θk) being by definition an k-tuple of random vari- late it as such without thinking of the individual simulation ables, is represented internally by k vectors of simulations. draws and such details as how many draws are included per These k column vectors form an L × k matrix of simulations Stat Comput (2007) 17: 235–244 237 ⎛ (1) (1) ··· (1) ⎞ θ1 θ2 θk example find the distribution of the determinant of θ by ap- ⎜ (2) (2) (2) ⎟ ⎜ θ θ ··· θ ⎟ plying the determinant function to each of the L two-by-two ⎜ 1 2 k ⎟ = ⎜ θ (3) θ (3) ··· θ (3) ⎟ matrices of simulations. Again, this requires a loop or the  ⎜ 1 2 k ⎟ . ⎝ . . . . ⎠ application of apply. For example, in R, the multiplication ...... of two random matrices A =  is accomplished by, (L) (L) ··· (L) θ1 θ2 θk A <- array (NA,c(L,k,m)) () = () () # Allocate an L*k*m matrix Each row θ (θ1 ,...,θk ) of the matrix  represents a random draw from the joint distribution of θ. The compo- for (i in 1:L) { nents of θ () may be dependent or independent. A[i,,] <- Theta[i,,] %*% Sigma[i,,] } 2.1 The currently-standard approach where k is the number of rows in Theta[i, ,] and m is the number of columns in Sigma[i, ,]. The usual approach now is to use loops and vector-matrix These are examples of functions that are applied rowwise computations to manipulate the matrix  of posterior simu- to a matrix of simulations, yielding an array with the first lations. This approach is general but awkward and far from dimension of size L. transparent, as we now discuss. 
Posterior probability statements such as Pr(θ >θ |y) are Typically, depending on the problem at hand, we name 1 2 computed by taking the proportion of the simulations which the simulation vectors and matrices according to the random () () satisfy the condition “θ1 >θ2 ,” that is, by computing variables in our model such as β and σ , and manipulate sev- 1 L L i=1 1{ () ()}, where 1A is an indicator function of an eral of these array objects in our programs. θ1 >θ2 Means, medians, quantiles, and standard deviations of event A.InR,theta[,1]>theta[,2] yields a logical scalar components are obtained by applying the correspond- vector of length L, that is, a vector of TRUE and FALSE val- ing numerical function such as mean, mean, quantile, ues. Since R coerces the TRUE values into ones and FALSE sd to each of the columns of a matrix of simulations. This values into zeros whenever necessary, we can compute the typically requires a loop or application of a special function probability value by mean(theta[,1]>theta[,2]). such as apply in R, which implements the loop more effi- Simulations from the posterior predictive distribution of ciently. The intervals and standard deviations summarize un- the n-dimensional data vector y are computed by gener- certainty in the parameter. These are examples of quantities ating random variates distributed according to the model that are computed by applying a summary function “colum- of y, given the posterior simulations of the unknown para- ∼ 2 nwise” to a matrix of simulations. For example, in R, the meters. For example, suppose that y N(Xβ,σ ) where X × posterior mean of θ is computed by mean(theta[,1]). is an n k matrix of observed covariates and β is an un- 1 = Here theta is an L × k matrix and theta[,1] refers to known (random) vector β (β1,...,βk) and σ is an un- the first column of the matrix. known scalar-valued variable. 
If we have posterior simula- () () Distributions of functions of random variables are ob- tions for β (β is a k-vector, so β is also a k-vector of () tained by applying the function to each row in the matrix simulations) and for the scalar σ , the posterior predictive rep of simulations. The resulting object is a vector or a ma- distribution of y, called the distribution y is rep() trix with L rows: it represents thus the distribution of the represented by simulations such that y is an n-vector function applied to a random variable. Again, this requires generated so that that a loop is written or that a function such as apply yrep() ∼ N(Xβ(),σ()). is used. This time the function is applied, to the rows of the matrix of simulations. For example, if we have a vec- In R, this step is implemented by the program, tor of unknowns θ = (θ ,...,θ ) that is represented by an 1 50 y.rep <- matrix (NA, L, n) L × 50 matrix of simulations, the distribution of the arith- for (i in 1:L) { metic mean of the components of θ is represented by the L y.rep[i,] <- rnorm(nrow(X), numbers 1 50 θ (),  = 1,...,L. 50 i=1 i mean=X %*% beta[i,], sd=sigma[i]) For a simpler example, we may be interested in the distri- } bution of θ1 + θ2. This is represented by a vector of length L () + () with components θ1 θ2 . In R, this would be computed Similarly, we can generate posterior predictive distribu- by theta[,1]+theta[,2], adding the two vectors to- tions of the unknown parameters, θ rep, if desired. gether, yielding a new vector of length L. Replicated data distributions yrep are required when com- Random matrices can be thought of random vectors with puting Bayesian p-values and posterior predictive checks. a dimension attribute. If θ is a 2 × 2 random matrix, it is rep- These can be used to assess the fit of the model. A Bayesian resented by an L × 2 × 2 matrix of simulations. 
We can for p-value is computed by the tail-area probability 238 Stat Comput (2007) 17: 235–244 rep ≥ () () Pr(T (y ,θ) T(y,θ)) θ1 /θ2 . Currently we are must keep in mind that the ob- ject theta is not a random vector object but a matrix of for some appropriate test function T . y is here the numbers, and to compute the ratio, we will have to write rep observed data vector and y is the replicated vector, The theta[,1]/theta[,2], obtaining a vector of numbers estimated p-value is the proportion of the L simulations for (simulations), not a random scalar object. The awkward sub- which the test quantity equals or exceeds its realized value: script indexing notation is confusing, and forces us to work that is, for which in a lower level of abstraction: we would rather like to refer to the variable θ as theta[1], that is, think about the in- T(yrep(),θ()) ≥ T(y,θ()) 1 dices as a indices referring to random variables and not as for all  = 1,...,L(Gelman et al. 2003, pp. 162–163). a indices referring to dimensions of a matrix consisting of To compute this in R is straightforward if T is a func- simulations. tion that accepts vectors of simulations and knows how to Further, to express the multiplication of two random repeat the calculation over a given vector for each simula- matrices  and , we would like to write Theta %*% tion 1,...,L. For most scalar-valued numerical functions Sigma and not a loop such as the one shown above. The this poses no great problem in R; we can plug in a vector of result of such an expression should be a random matrix ob- simulations and expect a vector of simulations back. How- ject. 
ever, if T is a numeric R function that acts on a vector, such While writing loops and using other programming struc- as the standard deviation function, it will not be able to ac- tures and specialized functions are certainly not difficult cept a matrix of simulations and repeat the computation au- things to do, nevertheless it takes our mind away from the tomatically for each of the L simulations. Again a loop must essence of our work: Bayesian data analysis. Program code be written or the apply function applied. intended to implement manipulation of posterior simula- tions while it does not resemble mathematical notation is 2.2 Toward a more natural programming environment but an attempt to emulate such notation. As Bayesian data analysts and programmers, we think of As we have seen, thanks to the convenient vectorized pro- constants as special cases of random variables: they are just gramming language of R, most of the simplest program- variables having point-mass distributions and represented by ming tasks in posterior simulation manipulation are not too a single simulation draw. We do not wish to treat constants difficult to implement. Some of them take only one line to and random variables as two separate incompatible classes write, although they do not look intuitive: the distribution of objects. Our programming environment should be able of the standard deviation of a random vector θ would be to treat both constants and random variable objects equally: computed by applying the sample standard deviation func- whatever one can do with constants one should be able to do tion to the rows of a matrix of simulations theta by exe- with random variables. More specifically, any function that cuting apply(theta,1,sd), which returns a vector of accepts a numeric vector (or matrix) should be able to ac- simulations of length L. 
Ideally, we would like to write cept a random vector (or matrix) instead, producing a new “sd(theta)” where theta is not a matrix of simula- random variable object as the result. Therefore, for instance, tions, but rather a vector of random variable objects. The to compute the distribution of the sample standard deviation result should be a scalar random variable object, containing statistic of a random vector θ, one should be able to write the simulations computed by apply(theta,1,sd). sd(theta) and obtain a scalar-valued random variable— that summarizes the uncertainty in the distribution of the However, the main motivation here is not only further 1 k simplification of code, but the need to work in a more nat- function sd, that is, k−1 i=1 θi . ural programming environment, where objects to be manip- The very fact that constants are just very special random ulated are random variables that are in the same conceptual variables (with point-mass distributions) hints at us that the level as the random variables in mathematical notation. Ma- syntax involving random variable objects should be no dif- nipulation of individual simulations should be kept out of ferent from that involving regular numeric variables. If we the view, in the background, and the user should be able to compute a function of a random vector or matrix, we obtain focus on telling the computer what function of the random a random vector or matrix; if we “let the of the com- variables—not that of the simulations—to compute. ponents go to zero,” the computation should be equivalent to For another example, suppose that we wish to compute a computation with numeric vectors or matrices, returning a the distribution of the ratio of two random scalars, say (constant) numeric result. θ1/θ2. 
We would like to have a random vector object theta available, of some fixed length k, and be able to write Imputation In a programming environment that treats nu- r <- theta[1]/theta[2] to obtain another random meric constants and random variables as equals, it should variable object r whose simulations consist of the ratios be natural that random variable objects can be embedded Stat Comput (2007) 17: 235–244 239 in vectors that contain constants. The resulting vector is a can plot for example 50% and 80% intervals on top of each “mixed” random vector object, where constants appear as other. A scatterplot of a pair of a vectors of random vari- if they were random variables with zero variance. Constant ables (x, y) would then amount to several clouds of points, components behave as if they consisted of L draws from a but a vector of constants x and random variables y would be degenerate distribution, although only one number is stored displayed as a series of vertical uncertainty intervals. in memory. This would make possible the straightforward imputation Other functions Many random-variable specific functions of missing values with random variables. Since such mixed need to be written, for example a function returning means, vectors are random variable objects, they should be accepted quantiles, medians, and standard deviations of each compo- as arguments by any numeric function. nent of a random vector or matrix.

Generating replications Without random variable objects, Distinguishing rowwise and columnwise operations It is rep generating replications y is done by repeated calls to a important to realize that the “means of the simulations of the random variate generating function (in R, the loop is exe- components of a random vector” are quite different from the cuted automatically if vectors are given as arguments); see “mean of a random vector.” The former is a “columnwise” the example above. Naturally, these functions are designed operation on a matrix of simulations, returning constants, to work only with numbers, so we need to write new func- and the latter is a rowwise operation, returning a vector or tions that can take random vectors and arrays as arguments a matrix of simulations. In other words, the columnwise op- and return new random variable objects containing newly erations summarize the uncertainty of each scalar random generated simulations. variable by a number, but the rowwise operations are func- Let us return to the example above where the matrix tions of the random vectors themselves, returning a new ran- rep rep() ∼ of simulations of y was made by generating y dom vector or matrix. () () N(Xβ ,σ ). By introducing a new “normal random vari- If theta is a random vector object, the function call able generating function,” call it rvnorm, we will be able mean(theta) must return a random variable object, since to obtain the random variable object representing the distri- mean is defined to take a vector and return a scalar. Giv- rep bution of y by the function call, ing mean a random vector must return a random scalar, y.rep <- rvnorm(mean=X %*% beta, which then represents the distribution of the random variable 1 k θ . Internally the result of mean(theta) must sd=sigma) k i=1 i contain the simulations 1 k θ () for  = 1,...,L. 
where both the mean and standard deviation parameters are k i=1 i The same logic goes for all numeric functions such as the random variable objects. The mean is a vector and standard variance (var), (cov), and any user-defined nu- deviation is a scalar, applying to all components of the mean. meric function. “ in, randomness out.” In some For non-normal models we will need to write other random cases the output may of course be a constant, but as ex- variable generating functions that draw from other distribu- tions. plained before, they are just special cases of random vari- ables. However, if we want to return numerical summaries of Graphical summaries Graphical summaries for random the simulations such as the posterior means and standard variable objects, such as interval plots, histograms, scat- terplots and contourplots of simulations need to be imple- deviations, we need to write new functions. For example, rvmean(theta) mented by writing new, special functions that accept random a function would return the posterior rvsd(theta) variable objects. means for each component of θ; the stan- rvmedian(theta) In many cases we may extend existing methods. Take dard deviations, the medians, and so for example plot(x,y), which plots a scatterplot of forth. two numeric vectors. A scatterplot of two random scalars (x, y) should produce a cloud of points representing draws (simulations) from the joint distribution of (x, y).Ifthe 3 Implementation x-coordinate of a pair of scalars (x, y) is a constant and the y-coordinate is a random variable object, we would ex- Taking the above ideas into consideration, we have written pect to see an uncertainty interval of y as a vertical line. a working prototype of a random-variable enabled program- Similarly if y is constant but x is random, we expect to see ming environment within R. This collection of functions is a horizontal uncertainty interval. 
If we want we could plot implemented as an R “package” which extends the function- the or the mean as a point on top of the interval. ality of R. Once the package is loaded and enabled, random Moreover, using colors or varying thickness of the line, we variable objects can be generated and used as arguments in 240 Stat Comput (2007) 17: 235–244 almost any numerical functions. A collection of random- and then use this model to predict the final exam scores of variable specific functions that implement graphical and nu- the other five students. We use a noninformative prior on merical summaries and functions that generate new random (β, log(σ )),orp(β,σ) ∝ 1/σ . variable objects from various distributions are available. A random variable object is implemented internally as a 4.1.1 Posterior predictive distribution list of vectors of simulations. The simulations are not kept in matrices but each “column” is kept separate for conve- Our goal is to impute the missing values with draws from the nience; one vector of simulations corresponds to one com- posterior predictive distribution of y. We do this by simulat- ponent of a random vector or a matrix. Since random ma- ing β = (β1,β2) and σ from their joint posterior distribution trices are just random vectors with dimension attribute, and and then generating the missing elements of y from the nor- random scalars are just random vectors of length 1, in the mal model. following whatever is referred to as “random vectors” will Assume that we have obtained the classical estimates ˆ also apply to random scalars and random matrices. (β,σ)ˆ along with the unscaled Vβ using Applying arithmetic operations and elementary functions the standard linear fit function lm in R. The posterior distri- on a random vector object produces a new random vector bution of σ is then object that consists of a list of appropriate vectors of sim- 2 ulations. 
If a plain numeric vector object is imputed as a σ |x,y ∼ˆσ · (n − 2)/z, where z ∼ χ (n − 2). random scalar or a vector, the resulting object is a new ran- dom vector with the remaining numeric components being In our programming environment, this mathematical for- treated as “vectors of simulations of length 1.” mula translates to the statement, In an interactive computing environment, we may type z <- rvchisq(df=n-2) the name of an object (variable) on the console and view sigma <- sigma.hat*sqrt((n-2)/z) immediately its value. It is obvious what we expect to see rvchisq when typing the name of a numerical vector or an array, but The function returns a random variable object how should a random variable object be displayed? Since a consisting of L independent and identically distributed random variable object consists of independent draws from simulations from the chi-square distribution. The quanti- sigma.hat n a distribution, it is natural to view quantiles and such nu- ties and statement are fixed scalars. Since sigma z merical summaries as the mean and standard deviation of is a constant divided by a random variable ,it the simulations. The mean of the simulations is an estimate is a random variable object. we treat this variable in our of the expectation of the random variable. program as if it were the actual random variable of the The most often used summaries can be viewed most con- mathematical model. The posterior distribution of β is | ∼ ˆ 2| 2 veniently by entering the name of the random vector on the β σ,x,y N(β,Vβ σ x,y,σ ), which can be simulated console; the default printing method returns the mean, stan- by, dard deviation, and the 1%, 2.5%, 25%, 50%, 75%, 97.5%, beta <- rvnorm(mean=beta.hat, and 99% quantiles. Most functions can be adapted easily to var=V.beta*sigma^2) accept random variable objects as arguments. 
where rvnorm (“normal random variable”) is a function The best way to illustrate how the new programming en- returning a vector of (multivariate) Gaussian random vari- vironment works is to give some examples. ables, with each component consisting of simulations. The argument var accepts a variance matrix, which in this case depends on the random variable sigma. Thus we are ac- 4 Examples tually drawing from a distribution of β, averaging over the uncertainty of σ . 4.1 Prediction and multiple imputation The length of beta is determined by the arguments: in this case beta has the same length as the mean vector, We illustrate the use of our programming environment with beta.hat. a simple example of regression prediction using posterior Our implementation imitates the corresponding mathe- simulations. Suppose we have a class with 15 students of matical expressions as much as possible; however, an equiv- which all have taken the midterm exam but only 10 have alent program written in plain R could look like, taken the final. We shall fit a model to the ten students with complete data, predicting final exam scores sigma <- array(NA,L) # Allocate a vector y from midterm exam scores x, beta <- array(NA,c(L,2)) # Allocate a matrix 2 yi|β1,β2,xi,σ ∼ N(β1 + β2xi,σ ) for (sim in 1:L) { Stat Comput (2007) 17: 235–244 241 sigma[sim] <- sigma.hat*sqrt((n-2)/ rchisq(1,n-2)) beta[sim,] <- mvrnorm(1, beta.hat, V.beta*sigma[sim]^2) } The “traditional” way of writing R code is longer and less robust: besides having to implement a looping structure, one must constantly mind the dimensions of the simula- tion matrices and the slightly awkward index notation where beta[sim,] refers to the simulation number sim from the joint distribution of (β1,β2,β3); in contrast, beta[,1] is the vector of L simulations for β1. According to the model, y is normal with mean Xβ and standard deviation σ . 
The predictions for the missing y val- ues are obtained by the natural statement, y.pred <- rvnorm(mean=beta[1] +beta[2]*x[is.na(y)], sd=sigma) where rvnorm returns independent and identically distrib- uted normal variables, but here the two arguments, the mean Fig. 1 Predicting the final examination scores: uncertainty intervals and standard deviation, are both random variables; thus for the five predicted final exam scores. This is a scatterplot of the vec- we are drawing y from its marginal distribution, averag- tor pair (x, y) where the components x are all constant and y includes ten constant and five random components. This is done by a generic ing over the uncertainty of both σ and β. x[is.na(y)] function call “plot(x,y)”; the five pairs (xi ,yi ) where yi is random simply picks the covariates for the missing five students. are automatically drawn as intervals (the 50% intervals are shown as Since beta[1] and beta[2] are scalar-valued random solid vertical lines and the 95% intervals as dotted vertical lines.) The variables, the mean will be a vector of the same length as observed pairs (xi ,yi ) are shown as circles. x[is.na(y)], that is, five. The resulting vector y.pred is also a vector of length five. This particular step would be impossible in standard R: Point estimates for the estimated parameters are obtained each component y.pred is internally represented by a ma- using predefined functions, for example, the posterior mean trix of possibly thousands of rows of simulations, but the (expectation) for β is given by rvmean(beta), and the left-hand side of the assignment, y[is.na(y)], is a nu- medians of β1,β2 by rvmedian(beta). meric vector of length 5. 
The distributions of the predicted values are quickly sum- The predictions can be plotted along with the observed marized by typing the name of the variable on the console: (x, y) pairs using the command plot(x, y) which shows the determinate values as points and the random values as > y.pred name mean sd 1% 2.5% 25% 50% 75% 97.5% 99% sims intervals. [1] Alice 68.9 19.5 ( 17.8 28.7 56.9 69.2 81.0 106 116 ) 1000 [2] Bob 73.6 17.6 ( 28.3 38.8 63.0 73.5 84.1 108 118 ) 1000 4.1.3 Computing functions of random vectors [3] Cecil 65.7 20.7 ( 17.3 22.7 52.2 66.0 79.1 105 115 ) 1000 [4] Dave 70.3 17.5 ( 24.4 34.4 59.5 70.7 81.5 103 109 ) 1000 A function of the random variables is obtained as easily as [5] Ellen 74.8 18.0 ( 34.0 40.9 64.3 73.8 85.2 113 122 ) 1000 a function of constants. For example, the distribution of the mean score of the class is mean(y), 4.1.2 Imputing the predicted values > mean(y) mean sd 1% 2.5% 25% 50% 75% 97.5% 99% sims Our object-oriented framework allows for combining con- [1] 72.8 3.39 ( 64.2 66 70.7 72.8 74.8 79.2 81 ) 1000 stants with random variables into a single vector or array. The “mean” here is not the expectation of the random Thus we can impute the random variables into the original vector y, but the distribution of the arithmetic average of the vector y which used to hold only constants and missing val- 15 components of the vector y. ues. This is done by the R statement, y[is.na(y)] <- y.pred 4.1.4 Computing probabilities of events which replaces the missing values (indicated by the symbol In R, if we use comparison operators with numeric vectors NA) by their corresponding predictions. we obtain true/false values; using them with random vectors 242 Stat Comput (2007) 17: 235–244 we obtain distributions of indicators of events. 
The probability of such an event (or, equivalently, the expectation of the corresponding random variable) is computed using the function Pr. For example, the probability that the average is more than 80 points is given by Pr(mean(y) > 80), which comes to 0.04 in this set of simulations.

4.2 Regression forecasting

We illustrate the power and convenience of random-variable computations with a nonlinear function of regression predictions. We shall first fit a classical regression, then use simulation objects to summarize the inferences and obtain predictions. As we shall see, it is then easy to work directly with the random vector objects to get nonlinear predictions.

Our data consist of the Democratic party's share of the two-party vote in legislative elections in California in the years 1982, 1984, and 1986: v82, v84, v86, respectively. (Missing values were imputed the value 0.5, and vote proportions of uncontested elections (zeros and ones) were imputed 0.25 and 0.75, respectively. This is a simplified version of a standard model for state legislative elections; Gelman and King 1994.) These are all vectors of length n = 80, corresponding to the 80 election districts. We also included predictors that indicate the incumbent party in the district; the values for years 1984 and 1986 were simply computed from the previous election results:

    p^t_i = 1 if v^{t-2}_i > 0.5,  p^t_i = -1 if v^{t-2}_i < 0.5,  for t in {84, 86}.

The data frame is

    V = ( p^88_1  v^86_1  p^86_1  v^84_1  p^84_1  v^82_1
          p^88_2  v^86_2  p^86_2  v^84_2  p^84_2  v^82_2
          ...
          p^88_n  v^86_n  p^86_n  v^84_n  p^84_n  v^82_n ).

The model for the proportion of the Democratic vote has two predictors: the outcome in the previous election and the incumbent party:

    v^86_i | θ ∼ N(β1 + β2 v^84_i + β3 p^86_i, σ²),  i = 1, ..., n,

where θ = (n, V, β, σ) and β = (β1, β2, β3).

4.2.1 Posterior predictive distribution

We fit the linear regression model and obtain the least-squares estimates β̂ = (β̂1, β̂2, β̂3), the classical unbiased variance estimate σ̂², and the (unscaled) covariance matrix Vβ. Again using a noninformative prior p(β, σ) ∝ 1/σ, the posterior distribution of σ is, just like in our data imputation example, σ | V ∼ σ̂ · √((n − 3)/z), where z ∼ χ²(n − 3), while the posterior distribution of β, given σ, is β | σ, V ∼ N(β̂, Vβ σ²). The next step is to generate posterior simulations for the parameters β and σ. These are generated by statements identical to those in the previous example.

We now predict the outcome for the following election year, 1988. The posterior predictive distribution is

    v^pred88 | θ, y ∼ N(β1 + β2 v86 + β3 p88, σ²),

which is a normal random vector of length n = 80. The indicator of incumbency in 1988, p88, is the indicator of the event {v86 > 0.5}.

    mu <- beta[1] + beta[2]*v.86 + beta[3]*p.88
    v.pred.88 <- rvnorm(mean=mu, sd=sigma)

4.2.2 Making forecasts

The posterior predictive distribution can be used to compute various forecasts. For example, the predicted average Democratic district vote in 1988 is v̄^pred88 = (1/n) Σ_{i=1}^n v^pred88_i, that is, the arithmetic average of the predicted values v.pred.88:

    avg.pred.88 <- mean(v.pred.88)

Any linear or nonlinear function of the predictions is obtained in a straightforward way. For example, the proportion of seats obtained by the Democrats is s^pred88 = (1/n) Σ_{i=1}^n 1{v^pred88_i > 0.5}, which translates to

    s.pred.88 <- mean(v.pred.88 > 0.5)

4.3 Posterior predictive checking

Posterior predictive checking is a Bayesian model validation technique where we draw simulations from the posterior predictive distribution of the observed data y and compare them to the observed values (Gelman et al. 2003, Chap. 6). Discrepancies between the observed and simulated values indicate possible model deficiency.

4.3.1 Drawing replicated data

We illustrate with the election example, where the posterior predictive distribution is the hypothetical distribution of the observations in year 1986; that is, a draw from the posterior predictive distribution of y given the (original) predictors of 1984. These simulations are called replications and are obtained by,

    mu <- beta[1] + beta[2]*v.84 + beta[3]*p.86
    v.rep.86 <- rvnorm(mean=mu, sd=sigma)

where mu and sigma are random variables representing the posterior distributions of μ and σ.
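The posterior draws of β and σ that feed mu and sigma above, and the forecasts computed from them, can be sketched in the same simulation-matrix style. The fragment below is again Python/NumPy rather than the paper's R environment, and β̂, σ̂², Vβ, and the district data are invented stand-ins for the least-squares output, not the paper's actual fit.

```python
import numpy as np

rng = np.random.default_rng(1)
L, n, k = 1000, 80, 3            # simulations, districts, coefficients

# Invented stand-ins for the classical regression output:
beta_hat = np.array([0.10, 0.75, 0.05])
sigma_hat = 0.07
V_beta = np.diag([4e-3, 1e-3, 1e-4])   # unscaled covariance matrix

# sigma | V ~ sigma_hat * sqrt((n - 3) / z),  z ~ chi^2(n - 3):
z = rng.chisquare(n - k, size=L)
sigma = sigma_hat * np.sqrt((n - k) / z)

# beta | sigma, V ~ N(beta_hat, V_beta * sigma^2), one row per simulation:
chol = np.linalg.cholesky(V_beta)
beta = beta_hat + sigma[:, None] * (rng.standard_normal((L, k)) @ chol.T)

# Invented 1986 district data and the 1988 incumbency indicator:
v86 = rng.uniform(0.2, 0.8, size=n)
p88 = np.where(v86 > 0.5, 1.0, -1.0)

# Posterior predictive draws, as in v.pred.88 <- rvnorm(mean=mu, sd=sigma):
mu = beta[:, [0]] + beta[:, [1]] * v86 + beta[:, [2]] * p88   # L x n
v_pred_88 = rng.normal(mu, sigma[:, None])

# Forecasts computed row by row, yielding distributions rather than numbers:
avg_pred_88 = v_pred_88.mean(axis=1)          # mean(v.pred.88)
s_pred_88 = (v_pred_88 > 0.5).mean(axis=1)    # mean(v.pred.88 > 0.5)
```

Note that the nonlinear seat-share forecast costs nothing extra: the comparison v_pred_88 > 0.5 is itself an L × n matrix of indicator simulations, and averaging each row gives the posterior distribution of the proportion of Democratic seats.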
In general, we may consider the linear regression model y | X, β, σ ∼ N(Xβ, σ²), where X is the predictor matrix, β is the k-vector of coefficients, and σ is the standard deviation of y. If we include the relevant predictors in a matrix X, then we can obtain either predictions or replications by the statement,

    rvnorm(mean=X %*% beta, sd=sigma)

with a different prediction matrix X. The above statement returns a normally distributed random vector given a predictor matrix X, the coefficient vector beta, and the standard deviation sigma; these parameters may contain both constant and random components. The replications can be obtained by the function call,

    v.rep.86 <- rvnorm(mean=X.84 %*% beta, sd=sigma)

where beta and sigma contain the posterior simulations of β and σ, and X.84 is the predictor matrix of the year 1984 vote proportions and incumbency indicators, along with the constant predictor column. (Here X.84 <- cbind(1, v.84, p.86); the standard R function cbind returns a matrix with the given columns.)

4.3.2 Computing test quantities

A test quantity, or "discrepancy measure", is a scalar summary of parameters and data that is used as a standard when comparing data to predictive simulations (Gelman et al. 2003). It is straightforward to compute test quantities from the replicated data: one needs to write a function T(y, θ), where y is a vector that has a distribution similar to that of the observed values, and θ is the vector containing any given predictors and parameter values. Arithmetic operations and elementary numeric functions also work with random variables, so in practice almost any R function that accepts a numerical vector y of length n can be used to compute the distribution of T(y^rep, θ).

In the regression example, one suitable test quantity could be the number of "switches" that occurred in the n = 80 districts of California between consecutive years, that is, the number of districts where the incumbent party was defeated:

    T_switch(v, p) = Σ_{i=1}^n |1{v_i > 0.5} − p_i|.

The corresponding R code is

    T.switch <- function (v, p) sum(abs((v > 0.5) - p))

This code works with constant vectors v and p, but also with random vectors.

Lack of fit of the data with respect to the posterior predictive distribution can be expressed by the Bayesian p-value, defined as the probability that the replicated test quantity is at least as extreme as the observed test quantity (equivalently, the expectation of the corresponding indicator variable):

    p-value := Pr{T_switch(v^rep86, p^86) ≥ T_switch(v^86, p^86)}.

The corresponding program code is again obvious:

    p.value <- Pr(T.switch(v.rep.86, p.86) >= T.switch(v.86, p.86))

The p-value comes to 0.61 in this set of simulations. T_switch(v^rep86, p^86) is summarized by

       mean   sd ( 1% 2.5% 25% 50% 75% 97.5% 99% ) sims
    [1] 4.2 1.89 (  0    1   3   4   5     8   9 ) 1000

5 Discussion

5.1 Summary of advantages

From the Bayesian viewpoint, it is natural to expect the computing environment to accept inputs as either constants or random variables: we essentially treat all quantities as random variables, constants being just given realizations of them. The absence of a random variable data type forces us to write code that somehow emulates the existence of such a type, as shown in the examples above. Unless we develop a common framework that actually implements this data type, we will end up writing similar code over and over again. This results in longer, more complicated program code that is more likely to contain errors. With random vector objects, our program code becomes compact, easy to read, and easy to debug, since in many cases it resembles mathematical notation.

To summarize the advantages:

1. Transparency. Program code written for numerical vectors can often be used for mixed vectors without modification. No looping structures emulating simulation-by-simulation computation need to be written.
2. Flexibility and robustness. Program code works with both random and mixed input parameters. There is no need to write separate code for generating predictions and replications; the code accepts different (random or constant) types of input.
3. Intuitive appeal. Program code resembles mathematical notation more closely than "traditional" computer programs do; ugly technical details are hidden from the user's view.
4. Improved readability. Short, compact expressions are more readable and easier to understand than traditional code with looping structures and awkward matrix-indexing notation. This is especially helpful when one needs to interpret code written by someone else.
5. Productivity gain. Compact, intuitive program code is relatively easy to write, debug, and maintain. We can concentrate on Bayesian data analysis and program in a more natural syntax.

All of these advantages stem from the conceptual advantage of thinking in terms of random variable objects rather than just in terms of arrays of simulations.

5.2 Disadvantages

Despite a host of obvious advantages, the object-oriented approach has an obvious disadvantage compared to the traditional way of programming: we do not have the opportunity to optimize the code generating the random variates for the specific task.

For example, the multiplication of an indicator (usually obtained by applying a logical operation such as ">") by another random variable is implemented by first drawing the L simulations for the indicator, then drawing the L simulations for the other random variable, and finally multiplying the simulations componentwise. Optimized code would be able to skip the computation of the expression for the second variable whenever the indicator produced a zero. Customized code would also be able to combine nested function calls, whereas in the object-oriented method the computer must evaluate function calls one by one.

However, we feel that in most cases the gain in speed would be negligible compared to the effort needed to write, edit, and debug optimized code. Truly speed-critical applications require machine-compiled code written, e.g., in C or Fortran. If possible, such optimization should be done at the system level so that end users need not worry about such technical details.

5.3 Conclusion

Computing functions of simulations and drawing graphs of them involves manipulating arrays of numbers by writing code that imitates the manipulation of objects that behave like random variables. By introducing a special object class representing random variables and arrays, we can create a programming environment that makes manipulating simulations intuitive and effective.

Our programming environment proves to be especially helpful for Bayesian data analysis, where routinely considering all uncertain quantities as random variables is natural. Although researchers need to understand fully the underlying mechanism of manipulating simulations, we believe that writing code to emulate the manipulation of simulation-based random variable objects is a wasted effort if an application that simplifies the programming syntax is available. Our approach should help to make writing, reading, and understanding programs more efficient.

Simulation-based random variable objects can also be used in conjunction with semi-analytical methods for Bayesian inference ("Rao-Blackwellization"; see Gelfand and Smith 1990), such as those used in computing Bayes factors (e.g., Chib 1995; Chib and Jeliazkov 2001) and inferences for very small probabilities (e.g., Gelman et al. 1998). In addition, random variable objects can be used for direct probability calculations (Kerman 2005).

Just as vector and matrix computations allow the user to operate at a higher level of abstraction (compare, for example, code in Fortran 77 to Fortran 90), we believe that random variable objects have the potential to facilitate a more "statistical" framework for computing in the presence of uncertainty.

Acknowledgements We thank the National Science Foundation for partial support of this research.

References

Chib, S.: Marginal likelihood from the Gibbs output. J. Am. Stat. Assoc. 90, 1313–1321 (1995)
Chib, S., Jeliazkov, I.: Marginal likelihood from the Metropolis–Hastings output. J. Am. Stat. Assoc. 96, 270–281 (2001)
Gelfand, A.E., Smith, A.F.M.: Sampling-based approaches to calculating marginal densities. J. Am. Stat. Assoc. 85, 398–409 (1990)
Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B.: Bayesian Data Analysis, 2nd edn. Chapman & Hall/CRC, London (2003)
Gelman, A., King, G.: A unified method of evaluating electoral systems and redistricting plans. Am. J. Political Sci. 38, 514–554 (1994)
Gelman, A., King, G., Boscardin, W.J.: Estimating the probability of events that have never occurred: when is your vote decisive? J. Am. Stat. Assoc. 93, 1–9 (1998)
Kerman, J.: Using random variable objects to compute probability simulations. Technical Report, Department of Statistics, Columbia University (2005)
Lunn, D.J., Thomas, A., Best, N., Spiegelhalter, D.: WinBUGS—a Bayesian modelling framework: concepts, structure, and extensibility. Stat. Comput. 10, 325–337 (2000)
Oldford, R.W.: The Quail project: a current overview. Invited paper, 30th Symposium on the Interface, Minneapolis (1998). http://www.stats.uwaterloo.ca/~rwoldfor/papers/Interface1998/paper.pdf
R Development Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna (2004)
Sturtz, S., Ligges, U., Gelman, A.: R2WinBUGS: a package for running WinBUGS from R. J. Stat. Softw. 12(3), 1–16 (2005). ISSN 1548-7660
Tierney, L.: LISP-STAT: An Object-Oriented Environment for Statistical Computing and Dynamic Graphics. Wiley, New York (1990)