
Parameterization and Bayesian Modeling

Andrew GELMAN

Progress in statistical computation often leads to advances in statistical modeling. For example, it is surprisingly common that an existing model is reparameterized, solely for computational purposes, but then this new configuration motivates a new family of models that is useful in applied statistics. One reason why this phenomenon may not have been noticed in statistics is that reparameterizations do not change the likelihood. In a Bayesian framework, however, a transformation of the parameters typically suggests a new family of prior distributions. We discuss examples in censored and truncated data, mixture modeling, multivariate imputation, stochastic processes, and multilevel models.

KEY WORDS: Censored data; Data augmentation; Gibbs sampler; Hierarchical model; Missing-data imputation; Parameter expansion; Prior distribution; Truncated data.

1. INTRODUCTION

Progress in statistical computation often leads to advances in statistical modeling. We explore this idea in the context of data and parameter augmentation, techniques in which latent data or parameters are added to a model. Data and parameter augmentation are methods for reparameterizing a model, not changing its description of data, but allowing computation to proceed more easily, quickly, or reliably. In a Bayesian context, however, these latent data and parameters can often be given substantive interpretations in a way that expands the model's practical utility.

1.1 Data Augmentation and Parameter Expansion in Likelihood and Bayesian Inference

Data augmentation (Tanner and Wong 1987) refers to a family of computational methods that typically add new latent data that are partially identified by the data. By "partially identified," we mean that there is some information about these new variables, but as sample size increases, the amount of information about each variable does not increase. Examples in this article include censored data, latent mixture indicators, and latent continuous variables for discrete regressions. Data augmentation is designed to allow simulation-based computations to be performed more simply on the larger space of "complete data," by analogy to the workings of the EM algorithm for maximum likelihood (Dempster, Laird, and Rubin 1977).
Parameter expansion (Liu, Rubin, and Wu 1998) typically adds new parameters that are nonidentified; in Bayesian terms, if they have improper prior distributions (as they typically do), then they have improper posterior distributions. An important example is replacing a parameter θ by a product, φψ, so that inference can be obtained about the product but not about either individual parameter. Parameter expansion can be viewed as part of a larger perspective on iterative simulation (see van Dyk and Meng 2001; Liu 2003), but our focus here is on its construction of nonidentifiable parameters as a computational aid.

Both data augmentation and parameter expansion are exciting tools for increasing the simplicity and speed of computations. In a likelihood-inference framework, that is all they can be: computational tools. By design, these methods do not change the likelihood; they only change its parameterization. The same goes for simpler computational methods, such as standardization of predictors in regression models and rotations to speed Gibbs samplers (e.g., Hills and Smith 1992; Boscardin 1996; Roberts and Sahu 1997).

From a Bayesian perspective, however, new parameterizations can lead to new prior distributions and thus new models. One way in which this often occurs is if the prior distribution for a parameter is conditionally conjugate (i.e., conjugate in the conditional posterior distribution), given the data and all other parameters in the model. In Gibbs sampler computation, conditional conjugacy can allow more efficient computation of posterior moments using "Rao-Blackwellization" (see Gelfand and Smith 1990). This technique is also useful for performing inferences on latent parameters in mixture models, conditional on convergence of the simulations for the hyperparameters. As we discuss in Section 5, parameter expansion leads to new families of conditionally conjugate models.

Once again, there is an analogy in simpler settings. For example, in classical regression, applying a linear transformation to regression predictors has no effect on the predictions. But in a Bayesian regression with a hierarchical prior distribution, rescaling and other linear transformations can pull parameters closer together so that shrinkage is more effective, as we discuss in Section 5.1.

1.2 Model Expansion for Substantive or Computational Reasons

The usual reason for expanding a model is substantive: to better capture an underlying model of interest, to better fit existing data, or both. Interesting statistical issues arise when balancing these goals, and Bayesian inference with proper prior distributions can resolve the potential nonidentifiability problems that can arise.

A Bayesian model can be expanded by adding parameters, or a set of candidate models can be bridged using discrete model averaging or continuous model expansion. Recent treatments of these approaches from a Bayesian perspective have been presented by Madigan and Raftery (1994), Hoeting, Madigan, Raftery, and Volinsky (1999), Draper (1995), and Gelman, Huang, van Dyk, and Boscardin (2003, secs. 6.6 and 6.7). In the context of data fitting, Green and Richardson (1997) showed how Bayesian model mixing can be used to perform the equivalent of nonparametric density estimation and regression.

Another important form of model expansion is for sensitivity to potential nonignorability in data collection (see Rubin 1976; Little and Rubin 1987).

Andrew Gelman is Professor, Department of Statistics, Columbia University, New York, NY 10027 (E-mail: [email protected]). The author thanks Xiao-Li Meng and several reviewers for helpful discussions and the U.S. National Science Foundation for financial support.

The additional parameters in these models cannot be identified but are varied to explore the sensitivity of inferences to assumptions about selection (see Diggle and Kenward 1994; Kenward 1998; Rotnitzky, Robins, and Scharfstein 1998; Troxel, Ma, and Heitjan 2003). Nandaram and Choi (2002) argued that continuous model expansion with an informative prior distribution is appropriate for modeling potential nonignorability in nonresponse, and showed how the variation in nonignorability can be estimated from a hierarchical data structure (see also Little and Gelman 1998).

In this article, we do not further consider these substantive examples of model expansion, but rather discuss several classes of models for which theoretically or computationally motivated model expansion has unexpectedly led to new insights or new classes of models. All of these examples feature new parameters or model structures that could be considered purely mathematical constructions but gain new life when given direct interpretations. Where possible, we illustrate with applications from our own research so that we have more certainty in our claims about the original motivation and ultimate uses of the reparameterizations and model expansions.

2. TRUNCATED AND CENSORED DATA

It is well understood that the censored-data model is like the truncated-data model, but with additional information: with censoring, certain specific measurements are missing. Here we further explore the connection between the two models.

2.1 Truncated and Censored Data Models

We work in the context of a simple example (see Gelman et al. 2003, sec. 7.8). A random sample of N animals are weighed on a digital scale. The scale is accurate, but it does not give a reading for objects that weigh more than 200 pounds. Of the N animals, n = 91 are successfully weighed; their weights are y_1, ..., y_91. We assume that the weights of the animals in the population are normally distributed with mean µ and standard deviation σ.

In the "truncated data" scenario, N is unknown, and the posterior distribution of the unknown parameters µ and σ given the data is

    p(\mu, \sigma \mid y) \propto p(\mu, \sigma) \left[ 1 - \Phi\left( \frac{\mu - 200}{\sigma} \right) \right]^{-91} \prod_{i=1}^{91} \mathrm{N}(y_i \mid \mu, \sigma^2).    (1)

In the "censored data" scenario, N is known, and the posterior distribution is

    p(\mu, \sigma \mid y, N) \propto p(\mu, \sigma) \left[ \Phi\left( \frac{\mu - 200}{\sigma} \right) \right]^{N-91} \prod_{i=1}^{91} \mathrm{N}(y_i \mid \mu, \sigma^2).    (2)

By using p(µ, σ) rather than p(µ, σ | N), we are assuming that N provides no direct information about µ and σ.

2.2 Modeling Truncated Data as Censored but With an Unknown Number of Censored Data Points

Now suppose that N is unknown. We can consider two options for modeling the data:

1. Using the truncated-data model (1)
2. Using the censored-data model (2), treating the original sample size N as missing data.

The second option requires a probability distribution for N. The complete posterior distribution of the observed data y and missing data N is then

    p(\mu, \sigma, N \mid y) \propto p(N)\, p(\mu, \sigma) \binom{N}{91} \left[ \Phi\left( \frac{\mu - 200}{\sigma} \right) \right]^{N-91} \prod_{i=1}^{91} \mathrm{N}(y_i \mid \mu, \sigma^2).

We can obtain the marginal posterior density of (µ, σ) by summing over N:

    p(\mu, \sigma \mid y) \propto p(\mu, \sigma) \prod_{i=1}^{91} \mathrm{N}(y_i \mid \mu, \sigma^2) \sum_{N=91}^{\infty} p(N) \binom{N}{91} \left[ \Phi\left( \frac{\mu - 200}{\sigma} \right) \right]^{N-91}.    (3)

It turns out that if p(N) ∝ 1/N, then the expression inside the summation in (3) has the form of a negative binomial density with θ = N − 91, α = 91, and β = 1/Φ((µ − 200)/σ) − 1. The summation is thus proportional to [1 − Φ((µ − 200)/σ)]^{−91}, so that for this particular choice of noninformative prior distribution, the entire expression (3) becomes proportional to the simple truncated-data posterior density (1). Meng and Zaslavsky (2002) discussed other properties of the 1/N prior distribution and the negative binomial model.
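This equivalence is easy to verify numerically. The following is a minimal sketch in Python (assuming numpy and scipy; the small illustrative dataset stands in for the 91 observed weights): it evaluates the logarithms of (1) and (3) under a flat prior on (µ, σ) and confirms that they differ only by a constant.

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln, logsumexp

y = np.array([150.0, 162.0, 171.0, 180.0, 188.0])  # illustrative observed weights, all < 200
n = len(y)

def log_post_truncated(mu, sigma):
    """Log of the truncated-data posterior (1), up to a constant, flat prior on (mu, sigma)."""
    return (stats.norm.logpdf(y, mu, sigma).sum()
            - n * stats.norm.logcdf(200, mu, sigma))  # logcdf = log[1 - Phi((mu-200)/sigma)]

def log_post_censored(mu, sigma, N_max=5000):
    """Log of (3): the censored-data model with p(N) proportional to 1/N summed out."""
    N = np.arange(n, N_max)
    log_choose = gammaln(N + 1) - gammaln(n + 1) - gammaln(N - n + 1)
    log_tail = (N - n) * stats.norm.logsf(200, mu, sigma)  # logsf = log Phi((mu-200)/sigma)
    return (stats.norm.logpdf(y, mu, sigma).sum()
            + logsumexp(-np.log(N) + log_choose + log_tail))

for mu, sigma in [(165.0, 15.0), (175.0, 25.0)]:
    print(log_post_censored(mu, sigma) - log_post_truncated(mu, sigma))
# Both printed differences equal -log(n): the two posteriors agree up to a constant.
```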
It seems completely sensible that if we add a parameter N to the model and average it out, then we should return to the original model (1). What is odd, however, is that this works only with one particular prior distribution, p(N) ∝ 1/N. This seems almost to cast doubt on the original truncated-data model, in that it has this hidden assumption about N. Expressing the truncated-data model in censored-data form adds generality, but it introduces a sensitivity to the prior distribution of the new parameter N.

2.3 Connection to the Themes of This Article

Model (1) is the basic posterior distribution for truncated data. Going to the censored-data formulation is a model expansion; N is an additional piece of information that is not needed in the model but allows it to be analyzed using a different method (in this case the censored-data model). In examples more complicated than the normal, this model expansion may be computationally useful, because it removes the integral in the denominator of the truncated-data likelihood.

However, once we consider the model expansion, it reveals the original truncated-data likelihood as just one possibility in a class of models. Depending on the information available in any particular problem, it could make sense to use different prior distributions for N and thus different truncated-data models. It is hard to return to the original "state of innocence" in which N did not need to be modeled.

A similar modeling issue arises in contingency table models (see, e.g., Fienberg 1977), where the multinomial model for counts, (y_1, ..., y_J) ∼ multinomial(n; π_1, ..., π_J), can be derived from the Poisson model, y_j ∼ Poisson(λ_j), by conditioning on the total n = Σ_j y_j, which has a Poisson(Σ_j λ_j) distribution, and setting π_j = λ_j / Σ_k λ_k for each j. However, other models with the same conditional distributions are possible, corresponding to different marginal models for the total n.
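This Poisson-to-multinomial reduction can be checked directly from the probability mass functions. A minimal sketch in Python (assuming numpy and scipy; the rates and the table of counts are arbitrary):

```python
import numpy as np
from scipy import stats

lam = np.array([2.0, 5.0, 3.0])   # arbitrary Poisson rates lambda_j
y = np.array([1, 4, 2])           # one possible table of counts
n = y.sum()

log_joint = stats.poisson.logpmf(y, lam).sum()  # p(y) under independent Poissons
log_total = stats.poisson.logpmf(n, lam.sum())  # p(n): the total is Poisson(sum of lambda)
log_conditional = log_joint - log_total         # p(y | n)
log_multinomial = stats.multinomial.logpmf(y, n=n, p=lam / lam.sum())
print(np.isclose(log_conditional, log_multinomial))  # True
```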
3. LATENT VARIABLES

By definition, latent variables are constructions that add unknowns to the model without changing the likelihood of the observed data. Here we consider two common ways in which latent variables are constructed: (1) discrete labeling of components in a finite mixture model and (2) hypothesizing continuous unobserved data underlying discrete-data models, such as logistic regression. We illustrate with examples from our research in voting and public opinion.

3.1 Latent Discrete Variables in Mixture Models

Figure 1 shows a histogram of the Democratic party's share of the vote in about 400 elections to the U.S. House of Representatives in 1988. In earlier work (Gelman and King 1990), we modeled similar data from many other elections to study the properties of the electoral system under various assumptions about electoral swings. Traditionally this problem had been studied by fitting a normal distribution to district vote proportions and then shifting this distribution to the left or right to simulate alternative hypothetical election outcomes (see Kendall and Stuart 1950; Gudgin and Taylor 1979). There had been discussion in the political science literature of more general models (e.g., King and Browning 1987), and data such as that shown in Figure 1 persuaded us that mixture distributions might be appropriate. We set up a model for these sorts of election data using a mixture of three normal distributions, with two large components corresponding to the two modes and one lower and broader component to pick up outlying districts that would not be well captured by the two main components.

[Figure 1. Histogram of Democratic Share of the Two-Party Vote in Congressional Elections in 1988. Only districts that were contested by both major parties are shown here.]

For electoral data u_i (defined as the logit of the Democratic share of the two-party vote in district i, excluding uncontested districts), we fit an error model, u_i ∼ N(α_i, σ²), where α_i represented the "usual vote" in district i (on the logit scale) that would persist in future elections or under counterfactual scenarios. (The variance σ² was estimated from variation in district votes between successive elections.) We continued by modeling the usual votes with a mixture distribution,

    p(\alpha_i) = \sum_{j=1}^{3} \lambda_j\, \mathrm{N}(\alpha_i \mid \mu_j, \tau_j^2).

It is unstable to estimate mixture parameters by maximum likelihood (see, e.g., Titterington, Smith, and Makov 1985; Gelman 2003), a problem compounded by the fact that the data α_i are observed only indirectly, and so we regularized the estimates of the λ_j, µ_j, and τ_j by assigning informative prior distributions. The prior corresponded to a hump at about 40% Democratic vote with standard deviation 10%, another hump symmetrically located at about 60% Democratic vote, and a third hump, centered at 50% and much wider, to catch the outliers that are not close to either major mode. The prior distributions were given enough uncertainty that the variances for the major humps could differ, and the combined distribution could be either unimodal or bimodal, depending on the data. The model was completed with a Dirichlet(19, 19, 4) distribution on (λ_1, λ_2, λ_3), which allowed the major modes to differ a bit in mass but constrained the third mode to its designated minor role of picking up outliers.

We fit the model using data augmentation, following Tanner and Wong (1987). The data augmentation scheme alternately updated the model parameters and the latent mixture indicators for each district. The mixture model fit well and did a good job estimating the distribution of the "usual vote" parameters α_i, which in turn was useful in answering questions of political interest, such as what proportion of districts we would expect to switch parties if the national vote were to shift by x%.
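In outline, the indicator step of this data augmentation scheme can be sketched as follows (Python, assuming numpy and scipy; the function and variable names are ours, and the conditionally conjugate updates for the component means and variances are omitted):

```python
import numpy as np
from scipy import stats

def update_indicators(alpha, lam, mu, tau, rng):
    """Draw each district's latent component label z_i from its
    conditional posterior given the current mixture parameters."""
    # log p(z_i = j | alpha_i, lambda, mu, tau), up to a constant, for all i and j
    logp = np.log(lam) + stats.norm.logpdf(alpha[:, None], mu, tau)
    p = np.exp(logp - logp.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    return np.array([rng.choice(len(lam), p=pi) for pi in p])

def update_weights(z, rng, prior=(19.0, 19.0, 4.0)):
    """Conjugate update: lambda | z ~ Dirichlet(prior + component counts)."""
    counts = np.bincount(z, minlength=len(prior))
    return rng.dirichlet(np.asarray(prior) + counts)
```

Alternating these draws with conjugate updates of the (µ_j, τ_j) and the latent α_i gives the full sampler.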
The next step was to take the mixture indicators seriously. What does it mean that some elections are in the "Republican hump," some are in the "Democratic hump," and others are in the group of "extras"? We realized that there was a key piece of information not included in our model that took on three possible values: incumbency. An election could have a Republican incumbent (i.e., the existing Congress member could be a Republican running for reelection), a Democratic incumbent, or an open seat (no incumbent running).

Conditional on incumbency, the election data were fit reasonably well by unimodal distributions, as shown by Figure 2. The individual distributions are not quite normal, which is perhaps an interesting feature, but it is certainly an improvement to go from a three-component mixture model to a regression-type model conditional on a predictor that takes on three values. The model with incumbency gives more accurate predictions and makes more political sense (see Gelman and King 1994). It came about because we were trying to make sense of a mixture model: not to estimate the latent categories, but merely to better fit the distribution of the data (as was done using more advanced computational techniques by Green and Richardson 1997).

[Figure 2. Histogram of Democratic Share of the Two-Party Vote in Congressional Elections in 1988, in Districts With (a) Republican Incumbents, (b) Democratic Incumbents, and (c) Open Seats. Combined, the three distributions yield the bimodal distribution in Figure 1.]

3.2 Latent Continuous Variables in Logistic Regression

We now consider the opposite sort of model: discrete data usefully expressed in terms of a latent continuous variable. It is well known in statistical computation that logistic or probit regression can be defined as linear regression with a latent continuous outcome (see Finney 1947; Ashford and Sowden 1970). More recently, this parameterization has been applied to the Gibbs sampler for binary regression with probit and Student-t links (see Albert and Chib 1993; Liu 2004). We give an example to illustrate how the latent variables, which might have been viewed as a computational convenience, can take on a life of their own.

In earlier work (Gelman and King 1993) we described a study of opinion trends in a series of political polls during the 1988 Presidential election campaign. In our analysis we focused on the discrete binary outcome of whether a survey respondent favored George Bush or Michael Dukakis for President, at one point fitting a series of logistic regressions. (The article also briefly considered the "no opinion" responses.)

Although the latent-variable formulation was not needed in our modeling, it provides some useful insights. For example, Figure 3 shows some opinion trends among different subgroups of the population of potential voters at three key periods: the Democratic convention, the Republican convention, and the final 40 days of the campaign. The most notable patterns occur during the two conventions; at each there is a strong swing toward the nominee of the party throwing the convention. The swings occur among almost all subgroups but most strongly among groups of Independents. Similar patterns were found by Hillygus and Jackman (2003) in a study of the 2000 election.

The pattern of Independents showing the greatest swing was at first a surprise to us from a political perspective. We had expected that the strongest swings during each convention would be within the party holding the convention, thus greater movements among Democrats during the Democratic convention and among Republicans during theirs.

However, from a latent-variable perspective, the patterns make sense. Think of each voter as having a continuous variable that is positive for a Bush supporter and negative for a Dukakis supporter, and suppose that campaign events have additive effects on the latent variable. Then Independents will tend to be near 0 on the continuous preference scale (we in fact learn this from the logistic regressions: party identification is a strong predictor of vote preference) and thus will be more likely to be shifted in their preference by an event, such as a convention, that is small but uniform in direction.

[Figure 3. Changes in Public Opinion During Three Periods in the 1988 Presidential Election Campaign: (a) the Democratic Convention, (b) the Republican Convention, and (c) the Final 40 Days of the Campaign. In each graph each symbol represents a different subset of the adult population: circles represent subsets of Democrats (e.g., white Democrats, college-educated Democrats, Democrats between 30 and 44), triangles represent subsets of Independents, and squares represent Republicans. Party labels are self-defined from survey responses. In each graph the change in support for Bush during the period is plotted versus the final support for Bush at the time of the election. The different groups tend to move together, but the largest changes tend to be among the Independents.]

In contrast, the shifts of the last 40 days were not consistently in favor of a single party, and so groups of Independents were not so likely to show large swings.

A natural model for the preference y_it of an individual voter i at time t is then

    y_{it} = \begin{cases} \text{Bush supporter} & \text{if } z_{it} > 0 \\ \text{Dukakis supporter} & \text{if } z_{it} \le 0, \end{cases} \qquad z_{it} = (X\beta)_i + \delta_t + \epsilon_{it},

where the individual predictors X contain such information as party identification. The national swings, controlled by the continuous time series δ_t, will then have the largest effects on individuals i for whom (Xβ)_i is near 0. The model can be further elaborated with individual random effects and time-varying predictors.

We are not claiming here that latent variables are necessary for quantitative or political understanding of these opinion trends. Rather, we are pointing out that once the latent variables have been set up, they take on direct political interpretations far beyond what might have been expected had they been treated only as computational devices. In this case the connection could be made more completely by including survey responses on "feeling thermometers," that is, questions such as, "Rate George Bush on a 1–10 scale, with 1 being most negative and 10 being most positive."
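The additive-shift explanation can be reproduced with a tiny probit calculation (a sketch; the latent-scale group means and the size of the convention shift are invented for illustration):

```python
import numpy as np
from scipy import stats

delta = 0.3  # an additive shift on the latent scale (e.g., a convention bounce)
# Baseline latent means: Democrats well below 0, Independents near 0,
# Republicans well above 0 (illustrative values, not fit to data).
for label, xb in [("Democrats", -1.5), ("Independents", 0.0), ("Republicans", 1.5)]:
    before = stats.norm.sf(0, loc=xb)          # P(z > 0): support for Bush before
    after = stats.norm.sf(0, loc=xb + delta)   # after the uniform shift
    print(f"{label:13s} swing = {after - before:+.3f}")
# The same latent shift moves Independents' support probability the most,
# because the normal density is largest near the threshold at 0.
```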
” effects) andlog-linear structures for discretedistributions (Raghunathanand Grizzle 1995), or generalizewith the t dis- 4.MUL TIVARIATEMISSING–DA TAIMPUTATION tribution(Liu 1995). One may argue that having a jointdistri- butionin the imputation is less important than incorporating 4.1 IterativeUnivariate Imputations informationfrom othervariables and unique features of the VanBuuren,Boshuizen, and Knook(1999) and Raghunathan, dataset(e.g., zero/ nonzerofeatures in income components, Lepkowski,Solenberger, and V anHoewyk(2001) have formal- bounds,skip patterns, nonlinearity, interactions). Conditional izediterative for imputingmultivariate missing data modelingallows enormous  exibilityin dealing with practical usingthe following method. The algorithm starts by imputing problems.In applied work, we havenever been able to Ž tthe simpleguessed values for allof the missing data. Then each jointmodels to a realdataset without making drastic simpliŽ - variable,one at atime,is modeled by aregressionconditional cations. onallof theother variables in the model. The regression is Žt Thuswe aresuggesting the use of anewclass of models— andused to impute random values from thepredictive distribu- inconsistentconditional distributions— that were initiallymoti- tionfor allof themissing data for thatparticular variable. The vatedby computational and analytical convenience. However, algorithmthen cycles through the variables and updates them asin the other examples of this article, once we acceptthese iterativelyin Gibbssampler fashion until approximate conver- newmodels, they take on theirown substantive interpretations. gence. This“ orderedpseudo-Gibbs sampler” (Heckerman, 4.3 ExampleFrom SurveyImputation Chickering,Meek, Rounthwaite, and Kadie 2001) is a Markov Weconsideran example in which we foundthat structural chainalgorithm, and ifthevalues of theparametersare recorded featuresof the conditional models can affect the distributions attheend of eachloop of iterations,then they will converge to oftheimputations in ways that are not always obvious. In the adistribution.This distribution may be inconsistentwith vari- New YorkCity Social Indicators Survey (GarŽ nkel and Meyers ousof theconditional distributions used in its implementation, 1999),it was necessaryto imputemissing responses for family butin general there will be convergence to something (assum- incomeconditional on demographicsand information, such as ingthat the usual conditions hold for convergenceof aMarkov whetheror not anyone in the family received government wel- chainto a uniquenondegenerate stationary distribution). fare beneŽts. Conversely, if the“ welfare beneŽts” indicator is For thespecial case in which each regression model includes missing,then family income is clearlya usefulpredictor. (W e allof theother variables with no interactionsand is linearwith ignorehere the complicating factor that the survey asked about normalerrors, thenthis is aGibbssampler, in which the missing severaldifferent sources of income, and these questions had dif- dataare imputed based on an underlying multivariate normal ferentpatterns of nonresponse.) modelthat is encodedas a setof interlockinglinear regressions. Tosimplify,suppose that we areimputing a continuousin- Infact, the iterative imputation approach is more general, comevariable y1 anda binaryindicator y2 for welfare beneŽts, Žrst becauseit allows nonnormal models (e.g., logistic re- conditionalon aset X offully-observedcovariates. 
4.2 Inconsistent Conditional Distributions

But there is a potential problem with this computationally convenient imputation scheme: It is so flexible in its specifications that its interlocking conditional distributions will not in general be compatible. That is, there is no implicit joint distribution underlying the imputation models.

This flexibility leading to incompatibility (Arnold, Castillo, and Sarabia 1999) is a theoretical flaw, but in earlier work (Gelman and Raghunathan 2001) we argued that in practice it can be useful in modeling data structures that could not be easily fit by more standard models. Separate regressions often make more sense than joint models that assume normality and hope for the best (Gelman, King, and Liu 1998), mix normality with completely unstructured discrete distributions (Schafer 1997), mix normality (with random effects) and log-linear structures for discrete distributions (Raghunathan and Grizzle 1995), or generalize with the t distribution (Liu 1995). One may argue that having a joint distribution in the imputation is less important than incorporating information from other variables and unique features of the dataset (e.g., zero/nonzero features in income components, bounds, skip patterns, nonlinearity, interactions). Conditional modeling allows enormous flexibility in dealing with practical problems. In applied work, we have never been able to fit the joint models to a real dataset without making drastic simplifications.

Thus we are suggesting the use of a new class of models, inconsistent conditional distributions, that were initially motivated by computational and analytical convenience. However, as in the other examples of this article, once we accept these new models, they take on their own substantive interpretations.

4.3 Example From Survey Imputation

We consider an example in which we found that structural features of the conditional models can affect the distributions of the imputations in ways that are not always obvious. In the New York City Social Indicators Survey (Garfinkel and Meyers 1999), it was necessary to impute missing responses for family income conditional on demographics and information such as whether or not anyone in the family received government welfare benefits. Conversely, if the "welfare benefits" indicator is missing, then family income is clearly a useful predictor. (We ignore here the complicating factor that the survey asked about several different sources of income, and these questions had different patterns of nonresponse.)

To simplify, suppose that we are imputing a continuous income variable y_1 and a binary indicator y_2 for welfare benefits, conditional on a set X of fully observed covariates. We can consider two natural approaches. Perhaps the simplest is a direct model where, for example, p(y_1 | y_2, X) is a normal distribution (perhaps a regression model on y_2, X, and the interactions of y_2 and X) and p(y_2 | y_1, X) is a logistic regression on y_1, X, and the interactions of y_1 and X. (For simplicity, we ignore the issues of nonnegativity and possible zero values of y_1.)

A more elaborate and perhaps more appealing model uses hidden variables. Let z_2 be a latent continuous variable, defined so that

    y_2 = \begin{cases} 1 & \text{if } z_2 \ge 0 \\ 0 & \text{if } z_2 < 0. \end{cases}    (4)

We can then model p(y_1, z_2 | X) as a joint normal distribution (i.e., a multivariate regression). Compared with the direct model, this latent-variable approach has the advantage of a consistent joint distribution. And once inference for (y_1, z_2) has been obtained, we can directly infer about y_2 using (4). In addition, this model has the conceptual appeal that z_2 can be interpreted as some sort of continuous "proclivity" for welfare that is activated only if it exceeds a certain threshold. In fact, the relationship between z_2 and y_2 can be made stochastic if such a model would appear more realistic.

So the latent-variable model is better (except for possible computational difficulties), right? Not necessarily. A perhaps-disagreeable byproduct of the latent model is that, because of the joint normality, the distributions of income among the welfare and nonwelfare groups, that is, the distributions p(y_1 | y_2 = 1, X) and p(y_1 | y_2 = 0, X), must substantially overlap. In contrast, the direct model allows the overlap to be large or small, depending on the data. Thus the class of inconsistent models, introduced for computational reasons, is more general than previously considered models in ways that are relevant for modeling multivariate data.

4.4 Potential Consequences of Incompatibility

In the multivariate missing-data problem, we are going beyond Bayesian analysis to computation from an inconsistent model, but to the extent that inferences are being summarized by simulations, these can be considered approximate Bayes. The non-Bayesianity should show up as incoherence of inferences, which suggests that estimates of the extent of the problem, and approaches to correcting it, could be obtained by data partitioning, that is, combining analyses from different subsets of the data, another statistical method that has been motivated by computational reasons (see Chopin 2002; Ridgeway and Madigan 2003; Gelman and Huang 2003).

5. MULTILEVEL REGRESSION MODELS

5.1 Rescaling Regression Predictors

When linear transformations are applied to predictors in a classical regression or generalized linear model, it is for reasons of computational efficiency or interpretive convenience. Two familiar examples are the expression of indicators in an ANOVA setting in terms of orthogonal contrasts and the standardization of continuous predictors to have mean 0 and variance 1. These and other linear transformations can make regression coefficients easier to understand, but they have no effect on the predictive inferences from the model. From an inferential standpoint, linear transformations do nothing in classical regression, which is one reason why statistical theory has focused on nonlinear transformations (see Atkinson 1985; Carroll and Ruppert 1988). In classical inference, reparameterizations such as centering affect the likelihood for the model parameters but do not affect predictive inference.

In contrast, even simple linear transformations can substantively change inferences in hierarchical Bayesian regressions. For a simple example, consider a regression predicting students' grades from a set of pretests, in which the coefficients β_j for the pretests j are given a common N(µ, τ²) prior distribution, where µ and τ are hyperparameters estimated from the data. Further suppose that the pretests were originally on several different scales and that each pretest has been rescaled to the range 0–100. If the tests measure similar abilities, then it would make sense for the coefficients for the rescaled test scores to be similar. The Bayesian model will then perform more shrinkage on the new scale and lead to more accurate predictions compared with the untransformed model. This is an example of a reparameterization that has no effect on the likelihood but is potentially important in Bayesian inference.
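A small numerical illustration of this non-invariance (a sketch assuming numpy, with the hyperparameters held fixed at invented values rather than estimated, and a zero prior mean for simplicity):

```python
import numpy as np

rng = np.random.default_rng(0)
n, J = 50, 4
X = rng.normal(size=(n, J)) * np.array([1.0, 10.0, 100.0, 1000.0])  # very different scales
beta_true = np.array([5.0, 0.5, 0.05, 0.005])  # similar effects on a common scale
y = X @ beta_true + rng.normal(0, 1, size=n)

def posterior_mean(X, y, tau=1.0, sigma=1.0):
    """Posterior mean of beta under beta_j ~ N(0, tau^2), fixed hyperparameters."""
    A = X.T @ X / sigma**2 + np.eye(X.shape[1]) / tau**2
    return np.linalg.solve(A, X.T @ y / sigma**2)

b_raw = posterior_mean(X, y)                    # predictors on their original scales
Xs = X / X.std(axis=0)                          # predictors standardized
b_std = posterior_mean(Xs, y)
print(X @ b_raw - Xs @ b_std)  # nonzero: rescaling changes the fitted predictions
# Under classical least squares, the two sets of predictions would be identical.
```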
5.2 Parameter Expansion

A more elaborate example of linear reparameterization for hierarchical regression is the parameter-expanded EM algorithm proposed by Liu et al. (1998), which has since been adapted to Gibbs sampler and Metropolis algorithms (Liu and Wu 1999; van Dyk and Meng 2001; Liu 2003; Gelman et al. 2003). We briefly review the PX-Gibbs model here and then describe how it has motivated model expansions.

5.2.1 Hierarchical Model With Potentially Slow Convergence. The starting point is the hierarchical model expressed as a regression with coefficients in M exchangeable batches,

    y = \sum_{m=1}^{M} X^{(m)} \beta^{(m)} + \text{error},

where X^{(m)} is the mth submatrix of predictors and β^{(m)} is the mth subvector of regression coefficients. The J_m coefficients in each subvector β^{(m)} have exchangeable prior distributions,

    \beta_j^{(m)} \sim \mathrm{N}(0, \sigma_m^2), \quad \text{for } j = 1, \dots, J_m.

More generally, the distributions could have nonzero means; we use the simpler form here for ease in exploring the key issues. Even so, this model is quite general and includes models with several batches of random effects; for example, a model for replicated two-way data can have row effects, column effects, and two-way interactions. In a mixed-effects model, the coefficients for the fixed effects can be collected into a batch with standard deviation σ_m set to infinity. For the random effects, the σ_m parameters will be estimated from data.

For this class of hierarchical models, Gibbs sampler calculations can get stuck, as follows. If a particular standard deviation parameter σ_m happens to be started near 0, then in the updating stage, the batch of coefficients in β^{(m)} will be shrunk toward 0. Then at the next update, σ_m will be estimated near 0, and so on. Eventually, the simulation will spiral out of this trap, but under some conditions this can take a long time (Gelman et al. 2003).
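In outline, one sweep of this Gibbs sampler might be coded as follows (a Python sketch under the stated model, assuming numpy, a known data standard deviation sigma_y, a uniform prior on each σ_m, and batches with J_m ≥ 2; the names are ours):

```python
import numpy as np

def gibbs_sweep(y, X_batches, beta, sigma2, sigma_y, rng):
    """One sweep of the standard Gibbs sampler for the hierarchical model."""
    for m, X in enumerate(X_batches):
        resid = y - sum(Xk @ bk for k, (Xk, bk) in
                        enumerate(zip(X_batches, beta)) if k != m)
        J = X.shape[1]
        # beta^(m) | rest: normal, shrunk toward 0 by the current sigma2[m]
        A = X.T @ X / sigma_y**2 + np.eye(J) / sigma2[m]
        b_hat = np.linalg.solve(A, X.T @ resid / sigma_y**2)
        beta[m] = rng.multivariate_normal(b_hat, np.linalg.inv(A))
        # sigma2[m] | rest: scaled inverse-chi^2 (uniform prior on sigma_m, J >= 2)
        sigma2[m] = np.sum(beta[m]**2) / rng.chisquare(J - 1)
        # The trap: if sigma2[m] is near 0, beta[m] is drawn near 0, which in
        # turn keeps sigma2[m] near 0 on the next sweep.
    return beta, sigma2
```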

5.2.2 Parameter Expansion to Speed Computation. The slow convergence can be fixed using the following parameter expansion scheme, adapted from a similar algorithm for the EM algorithm given by Liu et al. (1998). In the parameter-expanded model, each component m of the regression model is multiplied by a new parameter α_m,

    y = \sum_{m=1}^{M} \alpha_m X^{(m)} \beta^{(m)} + \text{error},    (5)

and the parameters α_m are given uniform prior distributions on (−∞, ∞). This new model is equivalent to the original model but with a new parameterization: |α_m| σ_m in the new model maps to σ_m in the original formulation, and each new α_m β_j^{(m)} maps to the old β_j^{(m)}. The parameters α_m and σ_m are not jointly identified, but we can simply run the Gibbs sampler on the expanded set of parameters (α, β, σ) and transform back to the original scale to get inferences under the original model.

In the Gibbs sampler for the new model, the components of β and σ² are updated as before, using normal and inverse chi-squared distributions, and the new parameter vector α is updated using a normal distribution. This extra step stops the algorithm from getting stuck near 0 (Gelman, Huang, van Dyk, and Boscardin 2004; Liu and Wu 1999).

5.2.3 New Model Families Implied by Parameter Expansion. By giving the new parameters α_m uniform prior distributions, we can perform more efficient computations for the hierarchical model. The parameter α has no meaning of its own and is used just as a multiplier to create the parameters β and σ of the original model.

But now suppose that we take the parameters α_m seriously, giving them informative prior distributions. What does that get us?

First, the variance parameter σ_m² in the old model has been split into α_m² σ_m² in the new model, which expands the family of conditionally conjugate distributions from inverse chi-squared to products of inverse chi-squared with squared normal distributions.

Second, and potentially more important, we can interpret the parameters α_m themselves in the context of the multiplicative model (5). For each m, X^{(m)} β^{(m)} represents a "factor" formed by a linear combination of the J_m individual predictors, and α_m represents the importance of that factor.

For the factor-analytic interpretation of the model to be useful, we need informative prior distributions on the parameters α or β. Otherwise, the model is nonidentified, and we cannot distinguish between the importance α_m of a factor and the coefficients β_j^{(m)} of its individual components. This indeterminacy can be resolved by, for example, constraining Σ_j β_j^{(m)} = 1 for each m, thus giving each factor the form of a weighted average, or by setting a "soft constraint" in the form of a proper prior distribution on the multiplicative parameters α_m.
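The extra Gibbs step implied by model (5) is itself an ordinary regression update. A sketch (Python, assuming numpy, flat priors on the α_m, and a known error standard deviation; function names are ours), including the map back to the original parameterization:

```python
import numpy as np

def update_alpha(y, X_batches, beta, sigma_y, rng):
    """Conditional on the beta's, (5) is a regression of y on the M constructed
    predictors w_m = X^(m) beta^(m); with flat priors, alpha | rest is normal."""
    W = np.column_stack([X @ b for X, b in zip(X_batches, beta)])
    A = W.T @ W / sigma_y**2
    mean = np.linalg.solve(A, W.T @ y / sigma_y**2)
    return rng.multivariate_normal(mean, np.linalg.inv(A))

def to_original_scale(alpha, beta, sigma):
    """Map back: beta^orig = alpha_m * beta^(m), sigma_m^orig = |alpha_m| * sigma_m."""
    return ([a * b for a, b in zip(alpha, beta)],
            np.abs(alpha) * np.asarray(sigma))
```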
We illustrate with the problem of forecasting presidential elections by state. A forecasting model based on 12 recent national elections would have 600 "data points" (state-level elections) and could then potentially include many state-level predictors measuring such factors as economic performance, incumbency, and popularity (see, e.g., Rosenstone 1983; Campbell 1992). However, at the national level there are really only 12 observations, and so one must be parsimonious with national-level predictors. In practice, this means performing some previous data analysis to pick a single economic predictor, a single popularity predictor, and maybe one or two other predictors based on incumbency and political ideology (Gelman and King 1993; Campbell, Ali, and Jalalzai 2003).

A more general approach to including national predictors would use the parameter-expanded model. For example, suppose that we wish to include five measures of the national economy (e.g., change in per-capita GDP, change in unemployment). We could first standardize them (see Sec. 5.1) and then include them in the model exchangeably in a batch m, as β_1^{(m)}, ..., β_5^{(m)}, each with a N(1/5, σ_m²) prior distribution. This prior distribution bridges the gap between the two extremes of simply using the average of the five measures as a predictor (that would be σ_m = 0) and including them as five independent predictors (σ_m = ∞). The multiplicative form of model (5) allows us to set up the full model using conditionally conjugate distributions. For example, we could predict the election outcome in year t in state s within region r(s) as

    y_{st} = X_{st}^{(0)} \beta^{(0)} + \alpha_1 \sum_{j=1}^{5} X_{jt}^{(1)} \beta_j^{(1)} + \alpha_2 \gamma_t + \alpha_3 \delta_{r(s),t} + \epsilon_{st},

where X^{(0)} is the matrix of state × year-level predictors, X^{(1)} is the matrix of year-level predictors, and γ, δ, and ε are national, regional, and statewide error terms. In this model the auxiliary parameters α_2 and α_3 exist for purely computational reasons, and they can be given noninformative prior distributions with the understanding that we are interested only in the products α_2 γ_t and α_3 δ_{r(s),t}. More interestingly, α_1 serves both a computational and a modeling role: with the β_j^{(1)} parameters having a common N(1/5, σ_m²) prior distribution, α_1 has the interpretation of the overall coefficient for the economic predictors. The parameters α and β are conditionally conjugate and form a more general model than would be possible using marginally conjugate families with the usual linear-model parameterization.

Both the parameter-expanded computations and the factor-analytic model work with generalized linear models as well. They connect somewhat to the recent approach of West (2003) of using linear combinations to include more predictors in a model than data points. The approach of West (2003) is more completely data based, whereas the class of models that we have developed here is structured using prior information (e.g., knowing which predictors correspond to economic conditions, which correspond to presidential popularity, and so forth).

6. ITERATIVE ALGORITHMS AND TIME PROCESSES

Spatial models have motivated developments in iterative simulation algorithms from Metropolis, Rosenbluth, Rosenbluth, Teller, and Teller (1953) to Geman and Geman (1984) and beyond. It has been pointed out that iterative simulation has the role of transforming an intractable spatial process into a tractable space-time process. Such developments as hybrid sampling (Duane, Kennedy, Pendleton, and Roweth 1987; Neal 1994) take this idea even further by connecting the space-time process to a physical model of particles with "momentum" as well as "position" in simulation space. In simulations with multiple sequences, algorithms have been developed in which the "particles" from the different sequences can interact.

Perhaps insight can be gained by taking this time-series process seriously. How would this be done? We first must distinguish between two phases of the iterative simulation. First, there is the initial mixing of the sequences, moving from their starting points to the target distribution.
Second, the simulations Albert,J. H.,and Chib, S. (1993),“ Bayesian Analysisof Binary and Poly- movearound within the target distribution. chotomousResponse Data,” Journalof theAmerican Statistical Association , 88,669– 679. TheŽ rst stagecould be identiŽed with“ learning,”or infor- Arnold,B. C.,Castillo,E., andSarabia, J. M.(1999), ConditionalSpeciŽ cation mationpropagating from themodelstatement to the inferences. ofStatistical Models ,New York:Springer-V erlag. Thiscould be relevantin a settingin which new information is Ashford,J. R.,andSowden, R. R.(1970),“ Multi-Variate ProbitAnalysis,” Bio- metrics,26,535– 546. comingin or in which a processis being tracked that is itself Atkinson,A. C.(1985), Plots,Transformations, and Regression ,Oxford,U.K.: changing(Gilks and Berzuini 2001). In that case the current OxfordUniversity Press. stageof the simulation could represent some current state of Boscardin,W .J.(1996),“ Bayesian Analysisfor Some Hierarchical Lin- ear Models,”unpublished doctoral thesis, University of California-Berkeley, knowledge.Another scenario could be a Žxedtruth, but with Dept.of Statistics. actorswho are learning about it bygatheringinformation. This Campbell,J. E.(1992),“ Forecastingthe Presidential V otein the States,” Amer- mightbe of interest to economists. For example,if solvinga icanJournal of PoliticalScience ,36,386– 407. Campbell,J. E.,Ali,S., andJalalzai, F.(2003), “ Forecastingthe Presidential particularregression problem requires 10 ;000Gibbs sampler Votein the States, 1948– 2000,” technical report, University of Buffalo, Dept. iterations,then perhaps it is aproblemthat in realtime cannot ofPolitical Science. besolved immediately by economicactors. In this sort of appli- Carroll,R. J.,andRuppert, D. (1988), Transformationand Weighting in Re- gression,London:Chapman & Hall. cation,the (or otheriterative algorithm) should Chopin,N. (2002),“ ASequentialParticle FilterMethod for Static Models,” map,at least approximately, to areal-timelearning process. Biometrika,89,539– 552. Thesecond stage of aniterative algorithm— awalk through Dempster, A.P.,Laird,N. M.,andRubin, D. B.(1977),“ MaximumLikelihood FromIncomplete Data viathe EM Algorithm”(with discussion), Journal of thetarget distribution— could have a directinterpretation if un- theRoyal Statistical Society ,Ser.B, 39,1– 38. certainty aboutparameters in a modelcould be mapped onto Diggle,P .,andKenward, M. G.(1994), “ InformativeDrop-out in Longitudinal variability overtime. For example,when in studying exposures Data Analysis,” Journalof theRoyal Statistical Society ,Ser.B, 43,49– 73. Draper, D.(1995),“ Assessment andPropagation of ModelUncertainty” (with toradon gas, we founda highlevel of uncertaintyin home radon discussion), Journalof the Royal Statistical Society ,Ser.B, 57,45– 97. levels,even given some geographic information (Lin, Gelman, Duane,S., Kennedy, A. D.,Pendleton,B. J.,and Roweth, D. (1987),“ Hybrid Price,and Krantz 1999). It is alsoknown that radon levels vary MonteCarlo,” Physics Letters ,Ser.B, 195,216– 222. Fienberg,S. E.(1977), TheAnalysis of Cross-ClassiŽ ed Categorical Data , overtime (both on weeklyand annual scales). Perhaps it could Cambridge,MA: MIT Press. bepossible to setup aniterativesampling algorithm so that the Finney,D. J.(1947),“ TheEstimation From Individual Records of theRelation- jumpsin updating home radon levels correspond to temporal shipBetween Dose andQuantal Response,” Biometrika,34,320– 334. GarŽnkel, I., andMeyers, M. 
Garfinkel, I., and Meyers, M. K. (1999), "A Tale of Many Cities: The New York City Social Indicators Survey," Columbia University, School of Social Work.
Gelfand, A. E., and Smith, A. F. M. (1990), "Sampling-Based Approaches to Calculating Marginal Densities," Journal of the American Statistical Association, 85, 398–409.
Gelman, A. (2003), "A Bayesian Formulation of Exploratory Data Analysis and Goodness-of-Fit Testing," International Statistical Review, 71, 369–382.
Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (1995), Bayesian Data Analysis, London: Chapman & Hall.
Gelman, A., and Huang, Z. (2003), "Sampling for Bayesian Computation With Large Datasets," technical report, Columbia University, Dept. of Statistics.
Gelman, A., Huang, Z., van Dyk, D. A., and Boscardin, W. J. (2004), "Transformed and Parameter-Expanded Gibbs Samplers for Hierarchical Linear and Generalized Linear Models," technical report, Columbia University, Dept. of Statistics.
Gelman, A., and King, G. (1990), "Estimating the Electoral Consequences of Legislative Redistricting," Journal of the American Statistical Association, 85, 274–282.
Gelman, A., and King, G. (1993), "Why Are American Presidential Election Campaign Polls So Variable When Votes Are So Predictable?" British Journal of Political Science, 23, 409–451.
Gelman, A., and King, G. (1994), "A Unified Model for Evaluating Electoral Systems and Redistricting Plans," American Journal of Political Science, 38, 514–554.
Gelman, A., King, G., and Liu, C. (1998), "Not Asked and Not Answered: Multiple Imputation for Multiple Surveys" (with discussion and rejoinder), Journal of the American Statistical Association, 93, 846–874.
Gelman, A., and Raghunathan, T. E. (2001), "Using Conditional Distributions for Missing-Data Imputation," discussion of "Conditionally Specified Distributions," by Arnold et al., Statistical Science, 3, 268–269.
Geman, S., and Geman, D. (1984), "Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images," IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721–741.
Gilks, W. R., and Berzuini, C. (2001), "Following a Moving Target: Monte Carlo Inference for Dynamic Bayesian Models," Journal of the Royal Statistical Society, Ser. B, 63, 127–146.

Green, P. J., and Richardson, S. (1997), "On Bayesian Analysis of Mixtures With an Unknown Number of Components" (with discussion), Journal of the Royal Statistical Society, Ser. B, 59, 731–792.
Gudgin, G., and Taylor, P. J. (1979), Seats, Votes, and the Spatial Organisation of Elections, London: Pion.
Heckerman, D., Chickering, D. M., Meek, C., Rounthwaite, R., and Kadie, C. (2001), "Dependency Networks for Inference, Collaborative Filtering, and Data Visualization," Journal of Machine Learning Research, 1, 49–75.
Hills, S. E., and Smith, A. F. M. (1992), "Parameterization Issues in Bayesian Inference" (with discussion), in Bayesian Statistics 4, eds. J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, Oxford, U.K.: Oxford University Press, pp. 227–246.
Hillygus, D. S., and Jackman, S. (2003), "Voter Decision Making in Election 2000: Campaign Effects, Partisan Activation, and the Clinton Legacy," American Journal of Political Science, 47, 583–596.
Hoeting, J., Madigan, D., Raftery, A. E., and Volinsky, C. (1999), "Bayesian Model Averaging," Statistical Science, 14, 382–401.
Kendall, M. G., and Stuart, A. (1950), "The Law of the Cubic Proportion in Election Results," British Journal of Sociology, 1, 193–196.
Kenward, M. G. (1998), "Selection Models for Repeated Measurements With Non-Random Dropout: An Illustration of Sensitivity," Statistics in Medicine, 17, 2723–2732.
King, G., and Browning, R. X. (1987), "Democratic Representation and Partisan Bias in Congressional Elections," American Political Science Review, 81, 1251–1276.
Lin, C. Y., Gelman, A., Price, P. N., and Krantz, D. H. (1999), "Analysis of Local Decisions Using Hierarchical Modeling, Applied to Home Radon Measurement and Remediation" (with discussion), Statistical Science, 14, 305–337.
Little, R. J. A., and Rubin, D. B. (1987), Statistical Analysis With Missing Data, New York: Wiley.
Little, T. C., and Gelman, A. (1998), "Modeling Differential Nonresponse in Sample Surveys," Sankhyā, 60, 101–126.
Liu, C. (1995), "Missing Data Imputation Using the Multivariate t Distribution," Journal of Multivariate Analysis, 48, 198–206.
Liu, C. (2003), "Alternating Subspace-Spanning Resampling to Accelerate Markov Chain Monte Carlo Simulation," Journal of the American Statistical Association, 98, 110–117.
Liu, C. (2004), "Robit Regression: A Simple Robust Alternative to Logit and Probit," in Missing Data and Bayesian Methods in Practice.
Liu, C., Rubin, D. B., and Wu, Y. N. (1998), "Parameter Expansion to Accelerate EM: The PX–EM Algorithm," Biometrika, 85, 755–770.
Liu, J., and Wu, Y. N. (1999), "Parameter Expansion for Data Augmentation," Journal of the American Statistical Association, 94, 1264–1274.
Madigan, D., and Raftery, A. E. (1994), "Model Selection and Accounting for Model Uncertainty in Graphical Models Using Occam's Window," Journal of the American Statistical Association, 89, 1535–1546.
Meng, X. L., and Zaslavsky, A. M. (2002), "Single Observation Unbiased Priors," The Annals of Statistics, 30, 1345–1375.
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E. (1953), "Equation of State Calculations by Fast Computing Machines," Journal of Chemical Physics, 21, 1087–1092.
Nandaram, B., and Choi, J. W. (2002), "Hierarchical Bayesian Nonresponse Models for Binary Data From Small Areas With Uncertainty About Ignorability," Journal of the American Statistical Association, 97, 381–388.
Neal, R. M. (1994), "An Improved Acceptance Procedure for the Hybrid Monte Carlo Algorithm," Journal of Computational Physics, 111, 194–203.
Raghunathan, T. E., and Grizzle, J. E. (1995), "A Split Questionnaire Survey Design," Journal of the American Statistical Association, 90, 55–63.
Raghunathan, T. E., Lepkowski, J. E., Solenberger, P. W., and Van Hoewyk, J. H. (2001), "A Multivariate Technique for Multiply Imputing Missing Values Using a Sequence of Regression Models," Survey Methodology, 27, 85–95.
Raghunathan, T. E., Solenberger, P. W., and Van Hoewyk, J. (2002), "IVEware (SAS software for missing-data imputation)," available at www.isr.umich.edu/src/smp/ive/.
Ridgeway, G., and Madigan, D. (2003), "A Sequential Monte Carlo Method for Bayesian Analysis of Massive Datasets," Journal of Data Mining and Knowledge Discovery, 7, 301–319.
Roberts, G. O., and Sahu, S. K. (1997), "Updating Schemes, Correlation Structure, Blocking and Parameterization for the Gibbs Sampler," Journal of the Royal Statistical Society, Ser. B, 59, 291–317.
Rosenstone, S. J. (1983), Forecasting Presidential Elections, New Haven, CT: Yale University Press.
Rotnitzky, A., Robins, J. M., and Scharfstein, D. O. (1998), "Semiparametric Regression for Repeated Outcomes With Nonignorable Nonresponse," Journal of the American Statistical Association, 94, 1096–1146.
Rubin, D. B. (1976), "Inference and Missing Data," Biometrika, 63, 581–592.
Schafer, J. L. (1997), Analysis of Incomplete Multivariate Data, London: Chapman & Hall.
Spiegelhalter, D., Thomas, A., Best, N., Gilks, W., and Lunn, D. (1994, 2003), "BUGS: Bayesian Inference Using Gibbs Sampling," MRC Biostatistics Unit, Cambridge, U.K., available at www.mrc-bsu.cam.ac.uk/bugs/.
Tanner, M. A., and Wong, W. H. (1987), "The Calculation of Posterior Distributions by Data Augmentation" (with discussion), Journal of the American Statistical Association, 82, 528–550.
Titterington, D. M., Smith, A. F. M., and Makov, U. E. (1985), Statistical Analysis of Finite Mixture Distributions, New York: Wiley.
Troxel, A., Ma, G., and Heitjan, D. F. (2003), "An Index of Local Sensitivity to Nonignorability," Statistica Sinica, to appear.
Van Buuren, S., Boshuizen, H. C., and Knook, D. L. (1999), "Multiple Imputation of Missing Blood Pressure Covariates in Survival Analysis," Statistics in Medicine, 18, 681–694.
Van Buuren, S., and Oudshoom, C. G. M. (2000), "MICE: Multivariate Imputation by Chained Equations (S software for missing-data imputation)," available at web.inter.nl.net/users/S.van.Buuren/mi/.
van Dyk, D. A., and Meng, X. L. (2001), "The Art of Data Augmentation" (with discussion), Journal of Computational and Graphical Statistics, 10, 1–111.
West, M. (2003), "Bayesian Factor Regression Models in the 'Large p, Small n' Paradigm," in Bayesian Statistics 7, eds. S. Bayarri, J. O. Berger, J. M. Bernardo, A. P. Dawid, D. Heckerman, A. F. M. Smith, and M. West, Oxford, U.K.: Oxford University Press.