
Statistical Science 2015, Vol. 30, No. 2, 242–257
DOI: 10.1214/14-STS510
© Institute of Mathematical Statistics, 2015
arXiv:1308.6780v3 [stat.ME] 18 Aug 2015

Approximate Bayesian Model Selection with the Deviance Statistic

Leonhard Held, Daniel Sabanés Bové and Isaac Gravestock

Leonhard Held is Professor and Isaac Gravestock is Ph.D. Student, Department of Biostatistics, Institute of Epidemiology, Biostatistics and Prevention, University of Zurich, Hirschengraben 84, 8001 Zurich, Switzerland (e-mail: [email protected]; [email protected]). Daniel Sabanés Bové is Biostatistician at F. Hoffmann-La Roche Ltd, 4070 Basel, Switzerland (e-mail: daniel.sabanes [email protected]).

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in Statistical Science, 2015, Vol. 30, No. 2, 242–257. This reprint differs from the original in pagination and typographic detail.

Abstract. Bayesian model selection poses two main challenges: the specification of parameter priors for all models, and the computation of the resulting Bayes factors between models. There is now a large literature on automatic and objective parameter priors in the linear model. One important class are g-priors, which were recently extended from linear to generalized linear models (GLMs). We show that the resulting Bayes factors can be approximated by test-based Bayes factors (Johnson [Scand. J. Stat. 35 (2008) 354–368]) using the deviance statistics of the models. To estimate the hyperparameter g, we propose empirical and fully Bayes approaches and link the former to minimum Bayes factors and shrinkage estimates from the literature. Furthermore, we describe how to approximate the corresponding posterior distribution of the regression coefficients based on the standard GLM output. We illustrate the approach with the development of a clinical prediction model for 30-day survival in the GUSTO-I trial using logistic regression.

Key words and phrases: Bayes factor, deviance, generalized linear model, g-prior, model selection, shrinkage.

1. INTRODUCTION

The problem of model and variable selection is pervasive in statistical practice. For example, it is central for the development of clinical prediction models [Steyerberg (2009)]. For illustration, we consider the GUSTO-I trial, a large randomized study for the comparison of four different treatments in over 40,000 acute myocardial infarction patients [Lee et al. (1995)]. We study a publicly available subgroup from the Western region of the USA with n = 2188 patients and prognosis of the binary endpoint 30-day survival [Steyerberg (2009)]. In order to develop a clinical prediction model for this endpoint, we focus our analysis on the assessment of the effects of the 17 covariates listed in Table 1 in a logistic regression model (see the sketch after Table 1).

Table 1
Description of the variables in the GUSTO-I data set

Variable   Description
y          Death within 30 days after acute myocardial infarction (Yes = 1, No = 0)
x1         Gender (Female = 1, Male = 0)
x2         Age [years]
x3         Killip class (4 categories)
x4         Diabetes (Yes = 1, No = 0)
x5         Hypotension (Yes = 1, No = 0)
x6         Tachycardia (Yes = 1, No = 0)
x7         Anterior infarct location (Yes = 1, No = 0)
x8         Previous myocardial infarction (Yes = 1, No = 0)
x9         Height [cm]
x10        Weight [kg]
x11        Hypertension history (Yes = 1, No = 0)
x12        Smoking (3 categories: Never/Ex/Current)
x13        Hypercholesterolaemia (Yes = 1, No = 0)
x14        Previous angina pectoris (Yes = 1, No = 0)
x15        Family history of myocardial infarctions (Yes = 1, No = 0)
x16        ST elevation on ECG: Number of leads (0–11)
x17        Time to relief of chest pain more than 1 hour (Yes = 1, No = 0)
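To fix ideas, the basic model fit can be sketched as follows. This is a hypothetical illustration, not code from the paper: the file name gusto_west.csv and the column names y, x1, ..., x17 are assumptions mirroring Table 1, and statsmodels is merely one convenient library that reports the deviance directly.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical file holding the Western-region GUSTO-I subset (n = 2188),
# with columns named as in Table 1. The two categorical covariates
# (x3: Killip class, x12: smoking) are dummy-coded before fitting.
data = pd.read_csv("gusto_west.csv")
data = pd.get_dummies(data, columns=["x3", "x12"], drop_first=True)

y = data["y"]                                              # 30-day death indicator
X = sm.add_constant(data.drop(columns="y")).astype(float)  # intercept + covariates

full = sm.GLM(y, X, family=sm.families.Binomial()).fit()
null = sm.GLM(y, np.ones((len(y), 1)), family=sm.families.Binomial()).fit()

# Deviance statistic of the full model against the intercept-only model:
z = null.deviance - full.deviance
print(f"deviance statistic z = {z:.1f} on {full.df_model:.0f} df")
```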
There is now a large literature on automatic and objective Bayesian model selection, which unburdens the statistician from manually eliciting the parameter priors for all models in the absence of substantive prior information [see, e.g., Berger and Pericchi (2001)]. However, such objective Bayesian methodology is currently limited to the linear model [e.g., Bayarri et al. (2012)], where the g-prior on the regression coefficients is the standard choice [Liang et al. (2008)]. For non-Gaussian regression, there are computational and conceptual problems, and one solution to these problems is the use of test-based Bayes factors [Johnson (2005)]. Consider a classical scenario with a null model nested within a more general alternative model. Traditionally, the use of Bayes factors requires the specification of proper prior distributions on all unknown parameters of the alternative model which are not shared by the null model. In contrast, Johnson (2005) defines Bayes factors using the distribution of a suitable test statistic under the null and alternative models, effectively replacing the data with the test statistic. This approach eliminates the necessity to define prior distributions on model parameters and leads to simple closed-form expressions for χ²-, F-, t- and z-statistics.

The Johnson (2005) approach is extended in Johnson (2008) to the likelihood ratio test statistic and thus, if applied to generalized linear regression models (GLMs), to the deviance statistic [Nelder and Wedderburn (1972)]; see the sketch at the end of this section. This is explored further in Hu and Johnson (2009), where Markov chain Monte Carlo (MCMC) is used to develop a Bayesian variable selection algorithm for logistic regression. However, the factor g in the implicit g-prior is treated as fixed, and estimation of the regression coefficients is not discussed. We fill this gap and extend the work by Hu and Johnson (2009), combining g-prior methodology for the linear model with Bayesian model selection based on the deviance. This enables us to apply empirical [George and Foster (2000)] and fully Bayesian [Cui and George (2008)] approaches for estimating the hyperparameter g to GLMs. By linking g-priors to the theory on shrinkage estimates of regression coefficients [Copas (1983, 1997)], we finally obtain a unified framework for objective Bayesian model selection and parameter inference for GLMs.

The paper is structured as follows. In Section 2 we review the g-prior in the linear and generalized linear model, and show that this prior choice is implicit in the application of test-based Bayes factors computed from the deviance statistic. In Section 3 we describe how the hyperparameter g influences model selection and parameter inference, and introduce empirical and fully Bayesian inference for it. Using empirical Bayes to estimate g, we are able to analytically quantify the accuracy of test-based Bayes factors in the linear model. Connections to the literature on minimum Bayes factors and shrinkage of regression coefficients are outlined. In Section 4 we apply the methodology in order to build a logistic regression model for predicting 30-day survival in the GUSTO-I trial, and compare our methodology with selected alternatives in a bootstrap study. In Section 5 we summarize our findings and sketch possible extensions.
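As a preview of the machinery developed in Section 2, the following minimal sketch (ours, not from the article) evaluates a deviance-based, test-based Bayes factor against the null model. It assumes the closed form TBF = (g + 1)^(−d/2) exp{g/(g + 1) · z/2} for a deviance statistic z on d degrees of freedom with fixed g, which is our reading of the Johnson (2008) construction; the helper g_empirical_bayes simply maximizes this expression over g.

```python
import numpy as np

def tbf(z, d, g):
    """Test-based Bayes factor of a model against the null model, from the
    deviance statistic z (asymptotically chi^2_d under the null) with d
    additional parameters and fixed g > 0."""
    return (g + 1.0) ** (-d / 2.0) * np.exp(0.5 * z * g / (g + 1.0))

def g_empirical_bayes(z, d):
    """Value of g maximizing tbf(z, d, g); equals zero when z <= d."""
    return max(z / d - 1.0, 0.0)

# Example: a model adding d = 3 parameters that lowers the deviance by z = 25.3
z, d = 25.3, 3
print(tbf(z, d, g=100.0), tbf(z, d, g=g_empirical_bayes(z, d)))
```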
2. OBJECTIVE BAYESIAN MODEL SELECTION IN REGRESSION

Consider a generic regression model M with linear predictor η = α + x⊤β, from which we assume that the outcome y = (y_1, ..., y_n) was generated. We collect the intercept α, the regression coefficients vector β, and possible additional parameters (e.g., the residual variance in a linear model) in θ ∈ Θ. Specific candidate models M_j, j ∈ J, differ with respect to the content and the dimension of the covariate vector x, and hence β, so each model M_j defines its own parameter vector θ_j with likelihood function p(y | θ_j, M_j).

Through optimizing this likelihood, we obtain the maximum likelihood estimate (MLE) θ̂_j of θ_j. For Bayesian inference a prior distribution with density p(θ_j | M_j) is assigned to the parameter vector θ_j to obtain the posterior density p(θ_j | y, M_j) ∝ p(y | θ_j, M_j) p(θ_j | M_j). This forms the basis to compute the posterior mean E(θ_j | y, M_j) and other suitable characteristics of the posterior distribution. The marginal likelihood

p(y | M_j) = ∫_{Θ_j} p(y | θ_j, M_j) p(θ_j | M_j) dθ_j

is the key ingredient to transform prior model probabilities Pr(M_j), j ∈ J, to posterior model probabilities

(1) Pr(M_j | y) = p(y | M_j) Pr(M_j) / Σ_{k ∈ J} p(y | M_k) Pr(M_k) = DBF_{j,0} Pr(M_j) / Σ_{k ∈ J} DBF_{k,0} Pr(M_k),

where DBF_{j,0} = p(y | M_j) / p(y | M_0) denotes the (data-based) Bayes factor of M_j against a fixed reference model M_0.

2.1.1 Gaussian linear model. Consider the Gaussian linear model M_j: y_i ~ N(α + x_{ij}⊤ β_j, σ²) with intercept α, regression coefficients vector β_j and variance σ², and collect all parameters in θ_j = (α, β_j⊤, σ²)⊤. Here N(µ, σ²) denotes the univariate Gaussian density with mean µ and variance σ², and x_{ij} = (x_{i1}, ..., x_{i d_j})⊤ is the covariate vector for observation i = 1, ..., n. Using the n × d_j full-rank design matrix X_j = (x_{1j}, ..., x_{nj})⊤, the likelihood obtained from n independent observations is

(2) p(y | θ_j, M_j) = N_n(y | α1 + X_j β_j, σ² I),

with 1 and I denoting the all-ones vector and identity matrix of dimension n, respectively. We assume that the covariates have been centered around 0, that is, X_j⊤ 1 = 0. Here and in the following, 0 denotes the zero vector of length d_j.

Zellner's g-prior [Zellner (1986)] fixes a constant g > 0 and specifies the Gaussian prior

(3) β_j | σ², M_j ~ N_{d_j}(0, g σ² (X_j⊤ X_j)^{−1})

for the regression coefficients β_j, conditional on σ². This prior can be interpreted as a posterior distribution, if α is fixed and a locally uniform prior for β_j is combined with an imaginary outcome y_0 = α1 from the linear model (2), with σ² replaced by gσ².
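As a small illustration of (2) and (3) (our sketch, not part of the article): conditional on σ², the g-prior is conjugate to the centered Gaussian likelihood, and standard calculations give β_j | y, σ² ~ N(g/(g+1) β̂_j, σ² g/(g+1) (X_j⊤ X_j)^{−1}), where β̂_j is the least-squares estimate. The posterior mean therefore shrinks the MLE by the factor g/(g+1), which is the shrinkage connection mentioned in the Introduction.

```python
import numpy as np

def g_prior_posterior(X, y, g):
    """Conditional posterior of the regression coefficients under Zellner's
    g-prior (3) for the centered Gaussian linear model (2), given sigma^2.
    Returns the shrunken posterior mean and the posterior covariance up to
    the factor sigma^2 (i.e., Cov = sigma^2 * cov_unit)."""
    Xc = X - X.mean(axis=0)              # center the covariates: Xc^T 1 = 0
    beta_hat = np.linalg.solve(Xc.T @ Xc, Xc.T @ y)  # least-squares estimate
    shrink = g / (g + 1.0)               # shrinkage factor g / (g + 1)
    return shrink * beta_hat, shrink * np.linalg.inv(Xc.T @ Xc)

# Example with simulated data and the unit-information-style choice g = n:
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=100)
post_mean, cov_unit = g_prior_posterior(X, y, g=100)
print(post_mean)
```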