
Marginal likelihood estimation via power posteriors

N. Friel†, University of Glasgow, UK, and A. N. Pettitt, Queensland University of Technology, Australia

Summary. Model choice plays an increasingly important role in statistics. From a Bayesian perspective a crucial goal is to compute the marginal likelihood of the data for a given model. This is however typically a difficult task, since it amounts to integrating over all model parameters. The aim of this paper is to illustrate how this may be achieved using ideas from thermodynamic integration or path sampling. We show how the marginal likelihood can be computed via MCMC methods on modified posterior distributions for each model. This then allows Bayes factors or posterior model probabilities to be calculated. We show that this approach requires very little tuning and is straightforward to implement. The new method is illustrated in a variety of challenging statistical settings.

Keywords: REGRESSION; LONGITUDINAL DATA; MULTINOMIAL; HIDDEN MARKOV MODEL; MODEL CHOICE

1. Introduction

Suppose data y is assumed to have been generated by one of M models indexed by the set {1, 2, . . . , M}. An important goal of Bayesian model selection is to calculate p(k|y) – the posterior model probability for model k. Here the aim may be to obtain a single most probable model, or indeed a subset of likely models, a posteriori. Alternatively, posterior model probabilities may be synthesised from all competing models to calculate some quantity of interest common to all models, using model averaging (Hoeting et al 2001). But before we can make inference for p(k|y) we first need to fully specify the posterior distribution over the joint model and parameter space. We denote by θk the parameters specific to model k, and by θ the collection of all model parameters. Specifying prior distributions for within model parameters p(θk|k) and for model indicators p(k), together with a likelihood L(y|θk, k) within model k, allows us to proceed by examining the posterior distribution

\[ p(\theta_k, k \mid y) \propto L(y \mid \theta_k, k)\, p(\theta_k \mid k)\, p(k). \qquad (1) \]
Across model strategies proceed by sampling from this joint posterior distribution of model indicators and parameters. The reversible jump MCMC algorithm (Green 1995) is a popular approach for this situation. Other across model search strategies include those of Godsill (2001) and Carlin and Chib (1995). By contrast, within model searches examine the posterior distribution within model k separately for each k. Here the within model posterior appears as

\[ p(\theta_k \mid y, k) \propto L(y \mid \theta_k, k)\, p(\theta_k \mid k), \]
where the constant of proportionality, often termed the marginal likelihood or integrated likelihood for model k, is written as

\[ p(y \mid k) = \int_{\theta_k} L(y \mid \theta_k, k)\, p(\theta_k \mid k)\, d\theta_k. \qquad (2) \]
This is in general a difficult integral to compute, possibly involving high dimensional within model parameters θk. However if we were able to do so, then we would readily be able to make statements about posterior model probabilities, using Bayes theorem,
\[ p(k \mid y) = \frac{p(y \mid k)\, p(k)}{\sum_{k=1}^{M} p(y \mid k)\, p(k)}. \]
†Address for correspondence: Department of Statistics, University of Glasgow, Glasgow, G12 8QQ, UK. E-mail: [email protected]

The marginal likelihoods can be used to compare two models by computing Bayes factors,

\[ B_{12} = \frac{p(y \mid k=1)}{p(y \mid k=2)}, \]
without the need to specify prior model probabilities. Note that the Bayes factor B12 gives the evidence provided by the data in favour of model 1 compared to model 2. It can also be seen that

\[ B_{12} = \frac{p(k=1 \mid y)}{p(k=2 \mid y)}\, \frac{p(k=2)}{p(k=1)}. \]

In other words, the Bayes factor is the ratio of the posterior odds of model 1 to its prior odds. Note that an improper prior distribution p(θk|k) leads necessarily to an improper marginal likelihood, which in turn implies that the Bayes factor is not well-defined. To circumvent the difficulty of using improper priors for model comparison, O'Hagan (1995) introduced an approximate method termed the fractional Bayes factor. Here an approximate (proper) marginal likelihood is defined by the ratio
\[ \frac{\int_{\theta_k} L(y \mid \theta_k, k)\, p(\theta_k \mid k)\, d\theta_k}{\int_{\theta_k} \{L(y \mid \theta_k, k)\}^{a}\, p(\theta_k \mid k)\, d\theta_k}, \]
since any impropriety in the prior for θk cancels above and below. Other approaches to this problem include the intrinsic Bayes factor (Berger and Pericchi 1996). In this paper we concentrate on the case where the prior distributions for the model parameters are proper.

Various methods have been proposed in the literature to estimate the marginal likelihood (2). For example, Chib (1995) estimates the marginal likelihood p(y|k) using output from a Gibbs sampler for (θk|y, k). It relies however on a block updating approach for θk, which is clearly not always possible. This work was then extended in Chib and Jeliazkov (2001), where output from a Metropolis-Hastings algorithm for the posterior (θk|y, k) can be used to estimate the marginal likelihood. Annealed importance sampling (Neal 2001) estimates the marginal likelihood using ideas from importance sampling. Here an independent sample from the posterior is generated by defining a sequence of distributions, indexed by a temperature parameter t, from the prior through to the posterior. Importantly, the collection of importance weights can be used to estimate the marginal likelihood.

In this paper we propose a new method to compute the marginal likelihood based on samples from a distribution proportional to the likelihood raised to a power t times the prior, which we term the power posterior. This method was inspired by ideas from path sampling or thermodynamic integration (Gelman and Meng 1998). We find that the marginal likelihood p(y|k) can be expressed as an integral with respect to t, from 0 to 1, of the expected deviance for model k, where the expectation is taken with respect to the power posterior at power t. We argue that this method requires very little tuning or bookkeeping. It is easy to implement, requiring only minor modification to computer code which samples from the posterior distribution.

The remainder of the paper is organised as follows. Section 2 reviews different approaches to across and within model searches. Section 3 introduces the new method for estimating the marginal likelihood, based on sampling from the so-called power posterior distribution. Here we also outline how this method can be implemented in practice, and give some guidance on the sensitivity of the estimate of the marginal likelihood (or Bayes factor) to the diffusivity of the prior on the model parameters. Three illustrations of how the new method performs in practice are given in Section 4. In particular, complex modelling situations are illustrated which preclude other methods of marginal likelihood estimation that rely on block updating of model parameters. We conclude with some final remarks in Section 5.

2. Review of methods to compute Bayes factors

There are numerous methods and techniques available to estimate (2). Generally speaking two approaches are possible – an across model search or a within model search. The former approach, in an MCMC setting, involves generating a single Markov chain which traverses the joint model and parameter space (1). A popular choice is the reversible jump sampler (Green 1995). Godsill (2001) retains aspects of the reversible jump sampler, but considers the case where parameters are shared between different models, as occurs for example in nested models. Other choices include (Stephens 2000), (Carlin and Chib 1995) and (Dellaportas, Forster and Ntzoufras 2001). Within model search essentially aims to estimate the marginal likelihood (2) for each model k separately, and then if desired uses this information to form Bayes factors; see (Chib 1995), (Chib and Jeliazkov 2001). Neal (2001) combines aspects of simulated annealing and importance sampling to provide a method of gathering an independent sample from a posterior distribution of interest, but importantly also to estimate the marginal likelihood. Bridge sampling (Meng and Wong 1996) offers the possibility of estimating the Bayes factor by linking the two posterior distributions with a bridge function. Bartolucci and Mira (2003) use this approach, although it is based on an across model reversible jump sampler. Within model approaches are disadvantageous when the cardinality of the model space is large. However, as noted by Green (2003), the ideal situation for a within model approach is one where the models are all reasonably heterogeneous. In effect, this is the case where it is difficult to choose proposal distributions when jumping between models – and indeed the situation where parameters across models of the same dimension have different interpretations. This short review is not intended to be exhaustive. A more complete picture can be found in the substantial reviews of Sisson (2005) and Han and Carlin (2001).

2.1. Reversible jump MCMC

Reversible jump MCMC (Green 1995) offers the potential to carry out inference for all unknown parameters (1) in the joint model and parameter space in a single logical framework. A crucial innovation in the seminal paper by Green (1995) was to illustrate that detailed balance could be achieved for general state spaces. In particular this extends the Metropolis-Hastings algorithm to variable dimension state spaces of the type (θk, k). To implement the algorithm, a proposed move from (θk, k) to (θl, l) proceeds by generating a random vector u from a distribution g and setting (θl, l) = fkl((θk, k), u). Similarly, to move from (θl, l) to (θk, k) requires random numbers u* following some distribution g*, and setting (θk, k) = flk((θl, l), u*), for some deterministic function flk. However it is important that the transformation fkl from (θk, k) to (θl, l) is a bijection with an invertible differential. A necessary condition for this to apply is that the so-called 'dimension matching' condition holds, that is, dim(θk) + dim(u) = dim(θl) + dim(u*). In this case, the probability of accepting such a move appears as

\[ \min\left\{1,\; \frac{p(\theta_l, l \mid y)\, p(l \to k)\, g^{*}(u^{*})}{p(\theta_k, k \mid y)\, p(k \to l)\, g(u)}\, |J| \right\}, \]
where p(k → l) is the probability of proposing a move from model k to model l, and J is the Jacobian resulting from the transformation from ((θk, k), u) to ((θl, l), u*). In practice, this may be simplified slightly by not insisting on stochastic moves in both directions, so that, for example, dim(u*) = 0, whence the term g*(·) disappears from the numerator above. Finally, for the case of nested models, a possible move type is θk+1 = (θk, u), in which case the Jacobian term equals 1. In some respects RJMCMC is difficult to use in practice. The main drawback would appear to be the problem of model mixing across dimensions. Typically this is a result of the difficulty in choosing a suitable 'jump' proposal distribution. Here it is unclear how to reasonably centre and scale the distribution to increase the chance of the move being accepted. However recent work by Brooks et al (2003) has tackled this problem to some extent.
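To make these mechanics concrete, the following is a minimal sketch (in Python, and not the implementation used in this paper) of a reversible jump sampler for a toy pair of nested Gaussian models: under model 1 the mean is fixed at zero, while model 2 has an unknown mean θ with a N(0, v) prior. The birth move 1 → 2 sets θ = u with u drawn from a proposal g, so the Jacobian is 1 and dim(u*) = 0 for the reverse move; all data, prior and tuning values are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
y = rng.normal(0.3, 1.0, size=20)          # synthetic data (illustrative)
v = 10.0                                   # prior variance for theta under model 2

def log_lik(theta):                        # y_i ~ N(theta, 1); model 1 fixes theta = 0
    return stats.norm.logpdf(y, loc=theta, scale=1.0).sum()

def log_prior_theta(theta):                # theta ~ N(0, v) under model 2
    return stats.norm.logpdf(theta, 0.0, np.sqrt(v))

g = stats.norm(0.0, 1.0)                   # proposal g for the dimension-matching variable u

k, theta, visits = 1, None, {1: 0, 2: 0}
for it in range(50_000):
    if k == 1:
        # birth move 1 -> 2: theta = u, |J| = 1; equal model priors and move
        # probabilities cancel in the acceptance ratio
        u = g.rvs(random_state=rng)
        log_alpha = (log_lik(u) + log_prior_theta(u)) - (log_lik(0.0) + g.logpdf(u))
        if np.log(rng.uniform()) < log_alpha:
            k, theta = 2, u
    else:
        # death move 2 -> 1: drop theta, dim(u*) = 0
        log_alpha = (log_lik(0.0) + g.logpdf(theta)) - (log_lik(theta) + log_prior_theta(theta))
        if np.log(rng.uniform()) < log_alpha:
            k, theta = 1, None
        else:
            # within-model random walk Metropolis update of theta
            prop = theta + 0.3 * rng.normal()
            if np.log(rng.uniform()) < (log_lik(prop) + log_prior_theta(prop)
                                        - log_lik(theta) - log_prior_theta(theta)):
                theta = prop
    visits[k] += 1

print("posterior model probabilities:", {m: visits[m] / sum(visits.values()) for m in visits})
```

The relative frequency of visits to each model estimates p(k|y); in more realistic settings the difficulty lies in choosing g so that trans-dimensional moves are accepted often enough.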

2.2. Chib's method

An important method of marginal likelihood estimation within each model is that of Chib (1995). This method follows from noticing that for any parameter configuration θ*, Bayes' rule implies that the marginal likelihood of the data y for model k satisfies

\[ p(y) = \frac{L(y \mid \theta^{*})\, p(\theta^{*})}{p(\theta^{*} \mid y)}. \]

Here and for the remainder of this article, for ease of notation, we remove reference to the model indicator k, except where this would be ambiguous. Each factor on the right hand side above can be calculated immediately, with the exception of the posterior probability p(θ*|y). Typically θ* would be chosen as a point of high posterior probability to increase the numerical accuracy of the estimate. Chib illustrated that this probability can be estimated via the Gibbs output, provided θ* can be partitioned into n blocks {θi*}, say, where the full conditional of each block is amenable to Gibbs sampling. It is clear that
\[ p(\theta^{*} \mid y) = \prod_{i=1}^{n} p(\theta_i^{*} \mid \theta_{i-1}^{*}, \ldots, \theta_1^{*}, y). \]
Now each factor p(θj*|θj−1*, . . . , θ1*, y) can be estimated from the Gibbs output by integrating out the parameters θj+1, . . . , θn:

\[ \hat p(\theta_j^{*} \mid \theta_{j-1}^{*}, \ldots, \theta_1^{*}, y) = \frac{1}{I}\sum_{i=1}^{I} p(\theta_j^{*} \mid \theta_n^{(i)}, \ldots, \theta_{j+1}^{(i)}, \theta_{j-1}^{*}, \ldots, \theta_1^{*}, y), \qquad (3) \]
where the index i indicates iterations of the Markov chain at stationarity. Further, the normalising constant of each block must be known exactly in order for the full conditional probabilities to be estimated. Chib and Jeliazkov (2001) extended this methodology to the case where (3) can be estimated using Metropolis-Hastings output, employing an identity based solely on the Metropolis-Hastings acceptance probabilities, which does not require the normalising constant of p(θ*|y). However implementing both methods relies on a judicious partitioning of the parameter θ*, in addition to a considerable amount of bookkeeping. Clearly both methods increase in computational complexity as the dimension of θk increases.
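As an illustration of the estimator (3), the following sketch applies Chib's method to a semi-conjugate normal model with two Gibbs blocks, μ and σ². For this two-block case the ordinate p(σ²*|μ*, y) is available exactly, while p(μ*|y) is Rao-Blackwellised over the Gibbs draws of σ². The model, priors and all numerical settings here are illustrative and not taken from the paper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.normal(1.0, 2.0, size=50)                  # synthetic data (illustrative)
n, ybar = y.size, y.mean()
m0, v0, a0, b0 = 0.0, 100.0, 3.0, 2.0              # mu ~ N(m0, v0), sigma2 ~ IG(shape a0, scale b0)

def mu_full_cond(sig2):                            # mu | sigma2, y is normal
    vn = 1.0 / (n / sig2 + 1.0 / v0)
    return vn * (n * ybar / sig2 + m0 / v0), vn

# Gibbs sampler for (mu, sigma2)
G, mu, sig2 = 20_000, ybar, y.var()
mus, sig2s = np.empty(G), np.empty(G)
for g in range(G):
    mn, vn = mu_full_cond(sig2)
    mu = rng.normal(mn, np.sqrt(vn))
    sig2 = stats.invgamma.rvs(a0 + n / 2, scale=b0 + 0.5 * np.sum((y - mu) ** 2), random_state=rng)
    mus[g], sig2s[g] = mu, sig2

mu_star, sig2_star = mus.mean(), sig2s.mean()      # a point of high posterior density

# p(mu* | y): Rao-Blackwellised average of full conditional ordinates, as in (3)
mn_all, vn_all = mu_full_cond(sig2s)
log_p_mu = np.log(np.mean(stats.norm.pdf(mu_star, mn_all, np.sqrt(vn_all))))

# p(sigma2* | mu*, y): available in closed form for this two-block model
log_p_sig2 = stats.invgamma.logpdf(sig2_star, a0 + n / 2,
                                   scale=b0 + 0.5 * np.sum((y - mu_star) ** 2))

log_lik = stats.norm.logpdf(y, mu_star, np.sqrt(sig2_star)).sum()
log_prior = (stats.norm.logpdf(mu_star, m0, np.sqrt(v0))
             + stats.invgamma.logpdf(sig2_star, a0, scale=b0))

print("Chib estimate of log p(y):", log_lik + log_prior - log_p_mu - log_p_sig2)
```

With more than two blocks, each additional ordinate requires a reduced Gibbs run with the earlier blocks held fixed at their starred values, which is the bookkeeping referred to above.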

2.3. Annealed importance sampling

Estimating the marginal likelihood using ideas from importance sampling is also possible, as illustrated in (Neal 2001). The idea is to define a sequence of distributions, starting from one for which it is possible to gather samples, for example the prior distribution, and ending at a target distribution from which you would like to sample. Neal (2001) defines this sequence as

\[ p_{t_i}(\theta_k \mid y) \propto p(\theta_k)^{1-t_i}\, p(\theta_k \mid y)^{t_i}, \]

where 0 = t0 < t1 < · · · < tn = 1. Thus p_{t_0} and p_{t_n} correspond to the prior and posterior distribution respectively. The algorithm begins by sampling a point x_{t_0} from the prior, p_{t_0}. At the ith step, a point x_{t_{i+1}} is generated from p_{t_{i+1}} via a Markov chain transition kernel applied at x_{t_i}, for example via Gibbs or Metropolis-Hastings updating. The final step n yields a point x_{t_n} from the posterior. Repeating this scheme N times yields an independent sample x^{(1)}, . . . , x^{(N)} from the posterior. In effect, the distribution p_{t_i} is an importance distribution for p_{t_{i+1}}. An important by-product of this scheme is that the collection of importance weights

\[ w^{(i)} = \frac{p_{t_{n-1}}(x_{t_{n-1}})}{p_{t_n}(x_{t_n})}\, \frac{p_{t_{n-2}}(x_{t_{n-2}})}{p_{t_{n-1}}(x_{t_{n-1}})} \cdots \frac{p_{t_0}(x_{t_0})}{p_{t_1}(x_{t_1})} \]
are such that
\[ p(y) = \frac{\sum_{i=1}^{N} w^{(i)}}{N}. \]
That is, the marginal likelihood obtains as the average of the importance weights.
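The sketch below implements annealed importance sampling for a toy conjugate normal-mean model, written (as is standard in Neal's formulation) in terms of the unnormalised tempered densities {L(y|θ)}^t p(θ) – the same path from prior to posterior used later in this paper – so that the average importance weight estimates p(y) directly. The exact log marginal likelihood is available for this model and is printed for comparison; the temperature ladder, proposal scale and data are all illustrative choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
y = rng.normal(1.0, 1.0, size=30)                  # y_i ~ N(theta, 1), theta ~ N(m, v)
n, ybar, m, v = y.size, y.mean(), 0.0, 5.0

def log_lik(theta):
    return stats.norm.logpdf(y, theta, 1.0).sum()

ts = np.linspace(0.0, 1.0, 51)                     # temperature ladder t_0 = 0 < ... < t_n = 1
N = 500                                            # number of annealing runs
log_w = np.zeros(N)

for i in range(N):
    theta = rng.normal(m, np.sqrt(v))              # draw from the prior (t = 0)
    for t_prev, t_next in zip(ts[:-1], ts[1:]):
        log_w[i] += (t_next - t_prev) * log_lik(theta)   # incremental importance weight
        # Metropolis move leaving the tempered density {L}^t_next * prior invariant
        prop = theta + 0.5 * rng.normal()
        log_a = (t_next * log_lik(prop) + stats.norm.logpdf(prop, m, np.sqrt(v))
                 - t_next * log_lik(theta) - stats.norm.logpdf(theta, m, np.sqrt(v)))
        if np.log(rng.uniform()) < log_a:
            theta = prop

# log of the average weight, computed stably
log_py_ais = np.logaddexp.reduce(log_w) - np.log(N)

# exact log marginal likelihood for this conjugate model (ybar ~ N(m, v + 1/n))
log_py_exact = (stats.norm.logpdf(ybar, m, np.sqrt(v + 1.0 / n))
                - 0.5 * (n - 1) * np.log(2 * np.pi)
                - 0.5 * np.sum((y - ybar) ** 2) - 0.5 * np.log(n))
print(log_py_ais, log_py_exact)
```

The weights are accumulated on the log scale here purely for numerical convenience; the estimator itself still averages the weights on the original scale, in contrast to the method introduced in Section 3.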

3. Marginal likelihoods and power posteriors

Here we introduce a new approach to estimating the integrated likelihood based on ideas of thermodynamic integration or path sampling (Gelman and Meng 1998). Consider introducing an auxiliary variable (or temperature schedule) T(t), where T : [0, 1] → [0, 1] is defined such that T(0) = 0 and T(1) = 1. For simplicity we assume that T(t) = t. Consider the power posterior defined as

\[ p_t(\theta_k \mid y) \propto \{L(y \mid \theta_k)\}^{t}\, p(\theta_k). \qquad (4) \]

Now, define
\[ z(y \mid t) = \int_{\theta_k} \{L(y \mid \theta_k)\}^{t}\, p(\theta_k)\, d\theta_k. \]

By construction, z(y|t = 0) is the integral of the prior for θk, which equals 1. Further, z(y|t = 1) is the marginal likelihood of the data. Here we assume of course that z(y|t) < ∞ for all t ∈ [0, 1]. Now ideas from path sampling (Gelman and Meng 1998) can be used to calculate the integral of interest z(y|t = 1). The following identity is crucial to the problem in hand:

\[ \log p(y) = \log\left\{\frac{z(y \mid t=1)}{z(y \mid t=0)}\right\} = \int_0^1 E_{\theta_k \mid y, t}\,\log\{L(y \mid \theta_k)\}\; dt. \qquad (5) \]
Thus the log of the marginal likelihood results as the integral of the mean deviance, where the expectation is taken with respect to the power posterior (4) at temperature t, as t moves from 0 to 1. The identity (5) can be derived easily as follows:
\[ \frac{d}{dt}\log\{z(y \mid t)\} = \frac{1}{z(y \mid t)}\frac{d}{dt} z(y \mid t) \]

\[
\begin{aligned}
&= \frac{1}{z(y \mid t)} \frac{d}{dt} \int_{\theta_k} \{L(y \mid \theta_k)\}^{t}\, p(\theta_k)\, d\theta_k \\
&= \frac{1}{z(y \mid t)} \int_{\theta_k} \{L(y \mid \theta_k)\}^{t} \log\{L(y \mid \theta_k)\}\, p(\theta_k)\, d\theta_k \\
&= \int_{\theta_k} \frac{\{L(y \mid \theta_k)\}^{t}\, p(\theta_k)}{z(y \mid t)} \log\{L(y \mid \theta_k)\}\, d\theta_k \\
&= E_{\theta_k \mid y, t}\,\log\{L(y \mid \theta_k)\}.
\end{aligned}
\]
Equation (5) now follows by integrating with respect to t. This approach shares some analogies with annealed importance sampling, outlined in Section 2.3; however, here we estimate the marginal likelihood on the log scale, ensuring increased numerical stability. Further, our method estimates log(p(y)) using expectations, again aiding numerical stability. It is interesting to note that the ratio z(y|t = 1)/z(y|t = a), where 0 < a < 1, is precisely the approximate marginal likelihood which defines the fractional Bayes factor of O'Hagan (1995), and so it can be estimated in the same manner by restricting the integral in (5) to the range [a, 1].

3.1. Estimating the marginal likelihood

Note that the identity for the marginal likelihood (5) can be considered as a double integral, integrating over the power parameter t and the model parameters θk. The joint distribution of θk and t can be written as
\[ p(\theta_k, t \mid y) = p(\theta_k \mid t, y)\, p(t) = \frac{\{L(y \mid \theta_k)\}^{t}\, p(\theta_k)}{z(y \mid t)}\, p(t). \]

Now the full conditional distribution of θk looks like,

\[ p(\theta_k \mid y, t) \propto \{L(y \mid \theta_k)\}^{t}\, p(\theta_k), \]
while if we assume that p(t) ∝ z(y|t), then also

\[ p(t \mid \theta_k, y) \propto \{L(y \mid \theta_k)\}^{t}\, p(\theta_k). \]

Now a sample {(θk^{(1)}, t1), . . . , (θk^{(n)}, tn)} gathered from p(θk, t|y) can be used to calculate (5) by ordering the ti's, calculating log{L(y|θk^{(i)})} at each sampled point, and estimating the integral via quadrature. All of this hinges on the assumption that p(t) ∝ z(y|t). It is our experience that z(y|t) varies by orders of magnitude with t. This is not surprising since by construction z(y|t = 0) = 1, while the marginal likelihood, z(y|t = 1), could be quite large depending on the problem in hand. Thus, using a single chain, values of t close to zero would tend not to be sampled with high frequency, leading to poor estimation of p(y). As a more direct approach we suggest discretising the integral (5) over t ∈ [0, 1], running separate chains for each t and sampling from the power posterior to estimate the mean deviance E_{θk|y,t} log L(y|θk). Numerical integration over t, using for example a trapezoidal rule, then yields an estimate of the marginal likelihood. For example, choosing a discretisation 0 = t0 < t1 < · · · < tn−1 < tn = 1 leads to the approximation

\[ \log p(y) \approx \sum_{i=0}^{n-1} (t_{i+1} - t_i)\, \frac{E_{\theta_k \mid y, t_{i+1}}\log L(y \mid \theta_k) + E_{\theta_k \mid y, t_i}\log L(y \mid \theta_k)}{2}. \qquad (6) \]
Note that Monte Carlo standard errors for each E_{θk|y,ti} log L(y|θk) can be pieced together to give an overall Monte Carlo standard error for log p(y). It should be apparent that if it is possible to sample from the posterior distribution of the model parameters, then it should often be possible to sample from the power posterior, and hence to compute the marginal likelihood. In particular, if the likelihood contribution follows an exponential family model, then raising the likelihood to a power t amounts to multiplying the exponent by t. We therefore expect that in many cases, if the posterior is amenable to Gibbs sampling, then so too is the power posterior. In terms of computation, modifying existing MCMC code which samples from the posterior of the model parameters is trivial. Essentially all that is needed is an extra iteration loop over the temperature parameter t, calculating the expected deviance under the power posterior at each temperature.
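A minimal sketch of this procedure, for a toy model where the answer can be checked, might look as follows: one random walk Metropolis chain is run at each temperature of a schedule of the type t_i = (i/K)^c, the mean deviance is recorded at each temperature, and the trapezoidal approximation (6) is applied. The model (y_i ~ N(θ, 1) with θ ~ N(m, v)), the schedule and the chain lengths are all illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
y = rng.normal(0.5, 1.0, size=25)                      # y_i ~ N(theta, 1), theta ~ N(m, v)
n, ybar, m, v = y.size, y.mean(), 0.0, 10.0

def log_lik(theta):
    return stats.norm.logpdf(y, theta, 1.0).sum()

def log_prior(theta):
    return stats.norm.logpdf(theta, m, np.sqrt(v))

# temperature schedule t_i = (i/K)^c, concentrated near t = 0 (see Section 3.2)
K, c = 20, 5
ts = (np.arange(K + 1) / K) ** c

mean_dev = []
for t in ts:
    theta, draws = m, []
    for it in range(5_000):                            # random walk Metropolis on the power posterior
        prop = theta + 0.8 * rng.normal()
        log_a = (t * log_lik(prop) + log_prior(prop)) - (t * log_lik(theta) + log_prior(theta))
        if np.log(rng.uniform()) < log_a:
            theta = prop
        if it >= 1_000:                                # discard burn-in
            draws.append(log_lik(theta))
    mean_dev.append(np.mean(draws))                    # estimate of E_{theta|y,t} log L(y|theta)

log_py_pp = np.trapz(mean_dev, ts)                     # trapezoidal rule (6)

# exact log marginal likelihood for this conjugate model, for comparison
log_py_exact = (stats.norm.logpdf(ybar, m, np.sqrt(v + 1.0 / n))
                - 0.5 * (n - 1) * np.log(2 * np.pi)
                - 0.5 * np.sum((y - ybar) ** 2) - 0.5 * np.log(n))
print(log_py_pp, log_py_exact)
```

The only change relative to a standard posterior sampler is the factor t multiplying the log likelihood in the acceptance ratio and the outer loop over temperatures.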

3.2. Sensitivity of p(y|k) to the prior

It is well understood that the Bayes factor is sensitive to the choice of prior for the model parameters. Here we outline, for a simple example, how this impacts on the estimate of the marginal likelihood via samples from the power posterior, as outlined above. Consider the simple situation where the data y = {yi} are independent and normally distributed with mean θ and unit variance. Assuming θ ∼ N(m, v) a priori leads to a power posterior θ|y, t ∼ N(mt, vt), where
\[ m_t = \frac{n t \bar y + m/v}{n t + 1/v} \quad \text{and} \quad v_t = \frac{1}{n t + 1/v}. \]
It is straightforward to show that

\[ E_{\theta \mid y, t}\,\log L(y \mid \theta) = -\frac{n}{2}\log 2\pi - \frac{1}{2}\sum_{i=1}^{n}(y_i - \bar y)^2 - \frac{n}{2}\,\frac{(m - \bar y)^2}{(v n t + 1)^2} - \frac{n}{2}\,\frac{1}{(n t + 1/v)}. \qquad (7) \]
Recall that the log of the marginal likelihood is obtained by integrating (7) with respect to t over t ∈ [0, 1]. Consider the situation when t = 0. In this case the final term on the right hand side of (7) equals −nv/2. Clearly as v → ∞, Eθ|y,t=0 log L(y|θ) diverges to −∞ at the same rate. This illustrates the sensitivity of the numerical stability of the estimate of the marginal likelihood to the prior specification. Figure 1 plots Eθ|y,t log L(y|θ) against t for v = 10, 5, 1 (for illustrative purposes the first two terms on the right hand side take the constant value −1, and ȳ = 0, m = 0 and n = 10). It is our experience that the behaviour of the mean deviance with t, as outlined in Figure 1, is typical in more complex settings, even when the prior parameters are quite uninformative. Bearing this in mind, we see that the choice of spacing for the ti's in (6) is important. For example, a temperature schedule of the type ti = ai^c, where ai = i/n is an equal spacing of the n points in the interval [0, 1] and c > 1 is a constant, ensures that the ti's are chosen with high frequency close to t = 0. Prescribing the collection of ti's in this way should improve the efficiency of the estimate of p(y).
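To see this numerically, the expected deviance (7) can be evaluated in closed form and integrated over t for different prior variances, using the same illustrative values as Figure 1 (ȳ = 0, m = 0, n = 10, with the first two terms replaced by the constant −1). The short sketch below also varies the exponent c in the schedule t_i = (i/n)^c, to illustrate the effect of concentrating the t_i's near zero when v is large.

```python
import numpy as np

n, ybar, m = 10, 0.0, 0.0                       # illustrative values used for Figure 1

def expected_deviance(t, v, const=-1.0):
    """Final two terms of (7) plus the constant standing in for the first two terms."""
    return const - 0.5 * n * (m - ybar) ** 2 / (v * n * t + 1) ** 2 - 0.5 * n / (n * t + 1.0 / v)

for v in (10.0, 5.0, 1.0):
    for c in (1, 3, 5):                         # c = 1 is an equal spacing; c > 1 concentrates near 0
        ts = (np.arange(0, 101) / 100.0) ** c
        est = np.trapz(expected_deviance(ts, v), ts)
        print(f"v = {v:4.1f}, c = {c}: trapezoidal integral of (7) over t = {est:8.3f}")
```

For the larger prior variances the curve is steepest near t = 0, and the trapezoidal estimate changes noticeably with c, which is the practical motivation for the power-law schedules used in the examples of Section 4.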


Fig. 1. Expected deviance (7) under the distribution θ|y, t, plotted against t for prior variance equal to 10, 5, 1. As v increases, so too does the rate at which the mean deviance changes with t.

i   yi    xi    zi      i   yi    xi    zi      i   yi    xi    zi
1   3040  29.2  25.4    15  2250  27.5  23.8    29  1670  22.1  21.3
2   2470  24.7  22.2    16  2650  25.6  25.3    30  3310  29.2  28.5
3   3610  32.3  32.2    17  4970  34.5  34.2    31  3450  30.1  29.2
4   3480  31.3  31.0    18  2620  26.2  25.7    32  3600  31.4  31.4
5   3810  31.5  30.9    19  2900  26.7  26.4    33  2850  26.7  25.9
6   2330  24.5  23.9    20  1670  21.1  20.0    34  1590  22.1  21.4
7   1800  19.9  19.2    21  2540  24.1  23.9    35  3770  30.3  29.8
8   3110  27.3  27.2    22  3840  30.7  30.7    36  3850  32.0  30.6
9   3670  32.3  29.0    23  3800  32.7  32.6    37  2480  23.2  22.6
10  2310  24.0  23.9    24  4600  32.6  32.5    38  3570  30.3  30.3
11  4360  33.8  33.2    25  1900  22.1  20.8    39  2620  29.9  23.8
12  1880  21.5  21.0    26  2530  25.3  23.1    40  1890  20.8  18.4
13  3670  32.2  29.0    27  2920  30.8  29.8    41  3030  33.2  29.4
14  1740  22.5  22.0    28  4990  38.9  38.1    42  3030  28.2  28.2

Table 1. Radiata pine dataset. yi: Maximum compression strength parallel to the grain, xi: Density, zi: Resin-adjusted density.

4. Examples

4.1. Linear regression - non-nested models

The dataset in Table 1 was taken from Williams (1959). The data describe the maximum compression strength parallel to the grain yi, the density xi, and the resin-adjusted density zi for 42 specimens of radiata pine. This dataset has been examined in (Han and Carlin 2001), (Carlin and Chib 1995) and (Bartolucci and Scaccia 2004), where several methods of estimating the Bayes factor between two non-nested competing models were compared. The competing models are as follows:

\[ M_1 : \; y_i = \alpha + \beta(x_i - \bar x) + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma^2), \]
\[ M_2 : \; y_i = \gamma + \delta(z_i - \bar z) + \eta_i, \qquad \eta_i \sim N(0, \tau^2). \]

The following prior specification was used (identical to that in the papers cited immediately above): a N((3000, 185)^T, diag(10^6, 10^4)) prior for both (α, β)^T and (γ, δ)^T. An IG(3, (2 · 300^2)^{−1}) prior was chosen for σ^2 and τ^2, where IG(a, b) is an inverse gamma distribution with density
\[ f(x) = \frac{1}{\Gamma(a)\, b^{a}\, x^{a+1}} \exp\left(-\frac{1}{bx}\right). \]
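As an indication of how little needs to change relative to a standard Gibbs sampler, the sketch below runs a Gibbs sampler on the power posterior of model M1 under the priors just described: raising the Gaussian likelihood to the power t simply multiplies the data contributions to the full conditional precisions, shapes and scales by t. The temperature schedule, chain lengths and starting values are illustrative choices (the schedule actually used here, of the type ti = ai^5, is described below); repeating the same calculation with the resin-adjusted density z in place of x gives log p(y|M2), and hence an estimate of B21.

```python
import numpy as np
from scipy import stats

# Maximum compression strength y and density x for the 42 specimens in Table 1
y = np.array([3040, 2470, 3610, 3480, 3810, 2330, 1800, 3110, 3670, 2310, 4360, 1880, 3670, 1740,
              2250, 2650, 4970, 2620, 2900, 1670, 2540, 3840, 3800, 4600, 1900, 2530, 2920, 4990,
              1670, 3310, 3450, 3600, 2850, 1590, 3770, 3850, 2480, 3570, 2620, 1890, 3030, 3030],
             dtype=float)
x = np.array([29.2, 24.7, 32.3, 31.3, 31.5, 24.5, 19.9, 27.3, 32.3, 24.0, 33.8, 21.5, 32.2, 22.5,
              27.5, 25.6, 34.5, 26.2, 26.7, 21.1, 24.1, 30.7, 32.7, 32.6, 22.1, 25.3, 30.8, 38.9,
              22.1, 29.2, 30.1, 31.4, 26.7, 22.1, 30.3, 32.0, 23.2, 30.3, 29.9, 20.8, 33.2, 28.2])

rng = np.random.default_rng(5)
n = y.size
X = np.column_stack([np.ones(n), x - x.mean()])        # design matrix for model M1
mu0 = np.array([3000.0, 185.0])                        # prior mean for (alpha, beta)
V0inv = np.diag([1e-6, 1e-4])                          # inverse of diag(10^6, 10^4)
a0, s0 = 3.0, 2 * 300.0 ** 2                           # IG(3, (2*300^2)^{-1}): shape 3, scale 2*300^2

def log_lik(beta, sig2):
    return stats.norm.logpdf(y, X @ beta, np.sqrt(sig2)).sum()

ts = np.concatenate([[0.0], (np.arange(1, 11) / 10.0) ** 5])   # illustrative schedule of type t_i = a_i^5
mean_dev = []
for t in ts:
    beta, sig2, dev = mu0.copy(), 300.0 ** 2, []
    for it in range(10_000):
        # (alpha, beta) | sigma^2, t, y : normal, with the data precision scaled by t
        Vn = np.linalg.inv(t * X.T @ X / sig2 + V0inv)
        mn = Vn @ (t * X.T @ y / sig2 + V0inv @ mu0)
        beta = rng.multivariate_normal(mn, Vn)
        # sigma^2 | (alpha, beta), t, y : inverse gamma, with shape and scale contributions scaled by t
        resid = y - X @ beta
        sig2 = stats.invgamma.rvs(a0 + t * n / 2, scale=s0 + t * resid @ resid / 2, random_state=rng)
        if it >= 2_000:                                 # discard burn-in
            dev.append(log_lik(beta, sig2))
    mean_dev.append(np.mean(dev))                       # E_{theta|y,t} log L(y|theta)

print("estimate of log p(y | M1):", np.trapz(mean_dev, ts))   # trapezoidal rule (6)
```

Note the very large negative mean deviances at temperatures near zero, a direct consequence of the diffuse priors, exactly as discussed in Section 3.2.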

                  Power posterior   RJMCMC    RJ corrected
Mean              4853.5            5280.6    4869.4
Standard error    94.2              1622.4    97.8
Relative error    1.8%              34.5%     2.0%

Table 2. Linear regression models. Estimates of means, standard errors and relative errors of B21 using the method of power posteriors and RJMCMC. 'RJMCMC' entries correspond to prior model probabilities p(k = 1) = p(k = 2) = 0.5, while 'RJ corrected' corresponds to p(k = 1) = 0.9995 and p(k = 2) = 0.0005.

Green and O'Hagan (1998) found, for the given prior specification, by numerical integration that the Bayes factor B21 = 4862. The aim in this straightforward situation is to see what statistical efficiency can be achieved by using the power posterior method to compute the Bayes factor, compared with using RJMCMC. Here we calculated each marginal likelihood using a temperature schedule of the type ti = ai^5, where the ai's correspond to 10 equally spaced points in [0, 1]. In total 30,000 iterations were used to estimate each marginal likelihood. Finally the algorithm was run for 100 independent chains, each yielding an estimate B̂21,i for i = 1, . . . , 100. To implement RJMCMC we specified p(k = 1) = p(k = 2) = 0.5. To allow for a fair comparison, the reversible jump sampler was run for 60,000 iterations. Within model parameters were updated via Gibbs sampling. Across model moves were proposed by simply setting (α, β, σ) = (γ, δ, τ), resulting in the Jacobian term taking the value 1. The following quantities appear in Table 2:

\[ \text{Mean} = \frac{1}{100}\sum_{i=1}^{100} \hat B_{21,i}, \]
\[ \text{Standard error} = \sqrt{\frac{1}{100}\sum_{i=1}^{100} \left(\hat B_{21,i} - \text{Mean}\right)^2}, \]
\[ \text{Relative error} = \frac{1}{B_{21}}\sqrt{\frac{1}{100}\sum_{i=1}^{100} \left(\hat B_{21,i} - B_{21}\right)^2}. \]
As can be seen from the results in Table 2, RJMCMC fared poorly. This is simply because the reversible jump sampler does not mix well and so does not visit model 1 very often, leading to a poor posterior estimate of p(k = 1|y). Running the reversible jump sampler with the prior model probability strongly weighted towards model 1, with p(k = 1) = 0.9995 and p(k = 2) = 0.0005, leads to estimates of B21 with similar efficiency to that of the power posterior method, and the power posterior has marginally smaller relative error than the 'RJ corrected' method.

4.2. Categorical longitudinal models

For this example we re-visit an analysis of a categorical longitudinal dataset presented in (Pettitt et al 2005). This example concerns a large social survey of immigrants to Australia. Data is recorded for each subject on three separate occasions. The response variable of interest is employment status, which comprises 3 categories – "employed", "unemployed" or "non-participant". The "non-participant" category refers to subjects who are, for example, students, retired or pensioners. The response variable is modelled as a multinomial random variable. Here we are concerned with fitting Bayesian hierarchical models to the data including both fixed and random effects on employment status. Specifically we assume that the response yisj of individual i at time s belongs to employment category j with probability pisj. Note we have used s to index time rather than k as in (Pettitt et al 2005). Thus yisj is a binary random variable, and further we assume that yis· = {yis1, yis2, yis3} has a multinomial distribution

\[ y_{is\cdot} \sim \text{Multinomial}(p_{is\cdot},\, n_{is}), \]

Fixed effects model k = 1
ti          1      0.6561  0.4096  0.2401  0.1296  0.0625  0.0256  0.0081  0.0016  0
Mean Dev.  -2421  -2425   -2433   -2446   -2475   -2547   -2728   -3421   -6670   -66965.28
MC se       0.196  0.293   0.526   0.803   1.325   4.003   5.564   26.120  95.330  536.931

Random effects model k = 2
ti          1      0.6561  0.4096  0.2401  0.1296  0.0625  0.0256  0.0081  0.0016  0
Mean Dev.  -1783  -2215   -2428   -2449   -2483   -2562   -2813   -3862   -7218   -55013.28
MC se       3.525  4.663   0.690   0.775   1.753   3.712   10.140  27.510  96.360  551.865

Table 3. Categorical longitudinal model. Expected deviances (and Monte Carlo standard errors) for the power posterior at temperature ti for models k = 1, 2.

where n_{is} = Σ_j y_{isj} = 1. The next level of the hierarchy relates the binary probabilities pisj to fixed and random covariate effects,
\[ p_{isj} = \frac{\mu_{isj}}{\sum_j \mu_{isj}}, \]
where
\[ \log(\mu_{isj}) = X_{is}^{T}\beta_j + \alpha_{ij}, \]
and where Xis is a vector of covariates, βj are fixed effects, and αij is a random effect reflecting time constant unobserved heterogeneity. Choosing "employed" as a reference state and setting β1 and αi1 to zero allows the model to be identified. In order to maintain invariance with respect to which state is chosen as the reference state, we take the random effect (αi2, αi3) to have a multivariate normal distribution with mean 0 and variance-covariance matrix Σ. We write the posterior distribution as

\[ p(\beta, \alpha, \Sigma \mid y) \propto L(y \mid \beta, \alpha, \Sigma)\, p(\alpha \mid \Sigma)\, p(\Sigma)\, p(\beta), \]
assuming a priori independence between β and Σ. The prior distribution for each element of βj was set to a zero-mean normal distribution with variance 16 (except for the parameters corresponding to age^2, which were given more precise priors). The random effects terms (αi2, αi3) were assigned a bivariate zero-mean normal distribution, where the elements of the variance-covariance matrix Σ were given priors Σ11 ∼ Uniform(0, 10) and Σ22 ∼ Uniform(0, 10), and the correlation coefficient was assigned a Uniform(−1, 1) prior. Here it is possible to update beliefs about all unknown parameters from their full conditional distributions using Gibbs sampling. Here our interest is in calculating marginal likelihoods for

\[ \text{Model } k = 1: \quad \log(\mu_{isj}) = X_{is}^{T}\beta_j, \]

\[ \text{Model } k = 2: \quad \log(\mu_{isj}) = X_{is}^{T}\beta_j + \alpha_{ij}. \]

Model 1 is essentially a fixed effects model, where the regression effect βj, for employment state j, remains constant for each individual at each of the time points s. A random effects term, αij, is however included in model 2, accounting for variability between individuals at a given state j. Indeed other plausible models are also possible, modelling for example variability in time between individuals; again the reader is referred to (Pettitt et al 2005) for more details. For this illustration we have used a randomly selected sample of size 1000 from the complete case data (n = 3234) analysed in (Pettitt et al 2005). For this example, collecting samples from the power posteriors is possible using the WinBUGS software (Spiegelhalter, Thomas and Best 1998). To implement (6) we chose a temperature schedule ti = ai^4, where the ai's are 10 equally spaced points in the interval [0.0016, 1], together with the end point t = 0. Within each temperature ti, 2,000 samples were collected from the stationary distributions pti(β|y, k = 1) and pti(β, α, Σ|y, k = 2). Table 3 summarises the output. Applying the trapezoidal rule (6) yields log p(y|k = 1) = −2528.72 and log p(y|k = 2) = −2358.15, with associated Monte Carlo standard errors of 0.64 and 1.34 respectively. This leads to the strong conclusion that a random effects model is much more probable, which is qualitatively similar to the conclusion presented in (Pettitt et al 2005).
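As a check on the arithmetic, the trapezoidal rule (6) can be applied directly to the tabulated expected deviances. Using the fixed effects row of Table 3 recovers the value log p(y|k = 1) ≈ −2528.7 quoted above; the random effects row can be treated in exactly the same way.

```python
import numpy as np

# temperatures and expected deviances for the fixed effects model (Table 3), ordered from t = 0 to t = 1
t = np.array([0.0, 0.0016, 0.0081, 0.0256, 0.0625, 0.1296, 0.2401, 0.4096, 0.6561, 1.0])
mean_dev = np.array([-66965.28, -6670.0, -3421.0, -2728.0, -2547.0,
                     -2475.0, -2446.0, -2433.0, -2425.0, -2421.0])

log_py = np.trapz(mean_dev, t)      # trapezoidal rule (6)
print(log_py)                       # approximately -2528.7
```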

4.3. Hidden Markov random field models

Markov random fields (MRFs) are often used to model binary spatially dependent data – the autologistic model (Besag 1974) is a popular choice. Here the joint distribution of x = {xi : i = 1, 2, . . . , n}, taking values in {−1, +1} on a regular lattice, is defined as

\[ p(x \mid \beta) \propto \exp\Big\{\beta_0 \sum_i x_i + \beta_1 \sum_{i \sim j} x_i x_j\Big\}, \qquad (8) \]
conditional on parameters β = (β0, β1). Positive values of β0 encourage the xi to take the value +1, while positive values of β1 encourage homogeneous regions of +1's or −1's. The notation i ∼ j denotes that xi and xj are neighbours. For this example we examine two models, defined via their neighbourhood structure:

Model k = 1: A first order neighbourhood, where each point xi has as neighbours the four nearest adjacent points.

Model k = 2: A second order neighbourhood structure, where in addition to the first order neighbours, the four nearest diagonal points also belong to the neighbourhood.

Both neighbourhood structures are modified along the edges of the lattice. MRF models are difficult to handle in practice, due to the computational burden of calculating the constant of proportionality in (8), c(β) say. A hidden MRF y arises when an MRF x is corrupted by some noise process. The underlying MRF is essentially hidden, and appears as parameters in the model. Typically it is assumed that, conditional on x, the yi's are independent, which gives the likelihood:

\[ L(y \mid x, \theta) = \prod_{i=1}^{n} p(y_i \mid x_i, \theta), \]
for some parameters θ. Once prior distributions p(β) and p(θ) are specified for β and θ respectively, a complete Bayesian analysis proceeds by making inference on the posterior distribution

\[ p(x, \beta, \theta \mid y, k) \propto L(y \mid x, \theta)\, p(x \mid \beta, k)\, p(\beta)\, p(\theta). \]

It is relatively straightforward to sample from the full conditional distribution of each of x and θ. Sampling from the full conditional distribution of β is more problematic, due to the difficulty of calculating the normalising constant of the MRF, c(β). However, provided the number of rows or columns is not greater than 20, for a reasonable number of the other dimension, the forward recursion method presented in (Reeves and Pettitt 2004) can be used to calculate c(β). For a more complete description of the problem of Bayesian estimation of hidden MRFs, the reader is referred to (Friel et al 2005). For this example, gene expression levels were measured for 34 genes in a cluster of 38 neighbouring genes on the Streptomyces coelicolor genome for 10 time points. The cluster of 38 neighbouring genes under study is responsible for the production of calcium-dependent antibiotics. We define the observations on a 38 × 10 regular lattice, where the log expression level ysg corresponds to the gth gene at time point s. Figure 2 displays the data y, indicating gene locations for which there is no data. Here we assume that the data y mask an MRF process x, where the states (−1, +1) correspond to 'up-regulation' and 'down-regulation' respectively. We assume that the MRF process follows a first order neighbourhood structure (k = 1), or a second order neighbourhood structure (k = 2). Finally we assume that the distribution of y given x is modelled as independent Gaussian noise with state specific mean µ(xsg) and common variance σ, and set θ = (µ, σ). Wit and McClure (2004) show that normality of log expression levels is a reasonable assumption for similar experimental setups. It is straightforward to handle the missing data – in the full conditional distribution of the latent process x, the likelihood term needs to be modified slightly to allow for the fact that 4 of the columns of x are not supported by any data. A flat normal prior was chosen for each of the β parameters. The prior distribution for µ was uniform on the set {(µ(−1), µ(+1)) | −2 ≤ µ(−1) ≤ 2, µ(−1) ≤ µ(+1) ≤ 2}.
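To give a flavour of the computations involved, the sketch below shows a single-site Gibbs sweep for the latent field x under the power posterior: the autologistic term (8) plays the role of a prior and is left untouched, while only the Gaussian noise likelihood is raised to the power t, and sites in columns with no data simply contribute no likelihood term. The first order neighbourhood, parameter values and synthetic lattice below are illustrative simplifications of the model described above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

def gibbs_sweep_x(x, y, beta0, beta1, mu, sigma, t, observed):
    """One sweep of single-site updates for the latent field x under the power posterior."""
    R, C = x.shape
    for r in range(R):
        for c in range(C):
            nb = 0.0                                   # sum over first order neighbours
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < R and 0 <= cc < C:
                    nb += x[rr, cc]
            lp = []
            for s in (+1, -1):
                val = beta0 * s + beta1 * s * nb       # MRF contribution (acts as the prior)
                if observed[r, c]:                     # tempered Gaussian likelihood contribution
                    val += t * stats.norm.logpdf(y[r, c], mu[s], sigma)
                lp.append(val)
            p_plus = 1.0 / (1.0 + np.exp(lp[1] - lp[0]))
            x[r, c] = 1.0 if rng.uniform() < p_plus else -1.0
    return x

# illustrative use on a small synthetic lattice
x = rng.choice([-1.0, 1.0], size=(38, 10))
y = rng.normal(0.0, 1.0, size=(38, 10))
observed = np.ones((38, 10), dtype=bool)               # mark missing columns False in practice
mu = {+1: 0.5, -1: -0.5}
x = gibbs_sweep_x(x, y, beta0=0.0, beta1=0.4, mu=mu, sigma=1.0, t=0.5, observed=observed)
```

The updates for µ and σ under the power posterior are modified in the same way, whereas the update for β is unchanged apart from the current value of x, since the tempering applies only to the noise likelihood.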


Fig. 2. Log expression levels of 34 genes on the Streptomyces genome for 10 consecutive time points. The x-axis labels indicate the columns with missing data.

First order model k = 1
ti          1       0.6243  0.3660  0.1975  0.0953  0.0390  0.0123  0.0024  0.0002  0
Mean Dev.  -251.7  -253.0  -252.6  -248.5  -259.4  -282.2  -352.5  -365.3  -681.0  -1607.6
MC se       0.040   0.033   0.061   0.083   0.183   0.377   0.854   1.163   6.754   27.240

Second order model k = 2
ti          1       0.6243  0.3660  0.1975  0.0953  0.0390  0.0123  0.0024  0.0002  0
Mean Dev.  -252.6  -252.6  -253.1  -247.5  -256.7  -279.7  -321.2  -345.5  -475.8  -1248.6
MC se       0.029   0.016   0.024   0.089   0.182   0.450   0.639   1.228   3.646   17.676

Table 4. Hidden Markov random field models. Expected deviances (and Monte Carlo standard errors) for the power posterior at temperature ti for models k = 1, 2.

The values −2 and +2 represent approximate minimum and maximum values which are found in similar datasets. Corresponding to these values, a gamma prior with mean 2 and variance 4 was specified for σ. Note that Friel and Wit (2005) present a more complete analysis of a similar dataset. Here we chose a temperature schedule ti = ai^4, where the ai's are 10 equally spaced points in the interval [0, 1]. Within each temperature ti, 5,000 samples were collected from the stationary distribution pti(x, β, µ|y, k), for k = 1, 2. Table 4 summarises the output. Applying the trapezoidal rule (6) yields log p(y|k = 1) = −256.93 and log p(y|k = 2) = −255.61, with associated Monte Carlo standard errors of 0.0013 and 0.0010, respectively. Thus the second order model is deemed more probable a posteriori.

5. Concluding remarks

We have introduced a new method of estimating the marginal likelihood for complex hierarchical Bayesian models which involves a minimal amount of change to commonly used algorithms which compute the posterior distributions of unknown parameters. We have illustrated the technique for three examples. The first was a simple regression example, where the prior model probabilities needed tuning in order for RJMCMC to estimate the Bayes factor well. The second example involved a random effects model for multinomial data and demonstrated the ease of computing the marginal likelihood with a standard software package such as WinBUGS. Here the results demonstrated the overwhelming difference in marginal likelihoods for the two models considered, a similar situation to the first example where RJMCMC with default equally weighted prior model probabilities would perform very poorly. The third example involved a complex hidden Markov structure and the results demonstrated a difference in terms of marginal likelihood between the two models. In terms of approximating the mean deviance, the second example is far more challenging than the third, with the former displaying characteristics resulting from the use of a vague prior, namely large negative values of the mean deviance for t near 0. However the Monte Carlo standard errors of the marginal likelihoods are nevertheless reasonably small for this example.

Computation of the marginal likelihood requires a proper prior. The sensitivity of the value of the marginal likelihood to the choice of prior can be readily investigated using our method. Various approaches have been proposed for the case where the prior is improper. As we mentioned above, the fractional Bayes factor is straightforwardly computed as a by-product of the marginal likelihood. For those seeking such approximations our method provides a straightforward solution. Our choice of quadrature rule and use of simulation resources can be improved, but we have not followed up that matter here. Nevertheless, our computational approach provides estimates with tolerably small standard errors.

In conclusion, we have illustrated a method of computing the marginal likelihood which is straightforward to implement and can be used for complex models.

Acknowledgements Both authors were supported by the Australian Research Council. The authors wish to kindly acknowledge Thu Tran for her assistance with computational aspects of this work. Nial Friel wishes to acknowledge the School of Mathematical Sciences, QUT for its hospitality during June 2005.

References

Bartolucci, F. and A. Mira (2003), Efficient estimate of Bayes factors from reversible jump output. Technical report, Università dell'Insubria, Dipartimento di Economia

Bartolucci, F. and L. Scaccia (2004), A new approach for estimating the Bayes factor. Technical report, Università di Perugia

Berger, J. O. and L. R. Pericchi (1996), The intrinsic Bayes factor for linear models. In J. M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith (eds.), Bayesian Statistics, vol. 5, pp. 25–44, Oxford, Oxford University Press

Besag, J. E. (1974), Spatial interaction and the statistical analysis of lattice systems (with discussion). Journal of the Royal Statistical Society, Series B 36, 192–236

Brooks, S. P., P. Giudici and G. O. Roberts (2003), Efficient construction of reversible jump Markov chain Monte Carlo proposal distributions (with discussion). Journal of the Royal Statistical Society, Series B 65(1), 3–57

Carlin, B. P. and S. Chib (1995), Bayesian model choice via Markov chain Monte Carlo. Journal of the Royal Statistical Society, Series B 57, 473–484

Chib, S. (1995), Marginal likelihood from the Gibbs output. Journal of the American Statistical Association 90, 1313–1321

Chib, S. and I. Jeliazkov (2001), Marginal likelihood from the Metropolis-Hastings output. Journal of the American Statistical Association 96, 270–281

Dellaportas, P., J. J. Forster and I. Ntzoufras (2001), On Bayesian model and variable selection using MCMC. Statistics and Computing 12, 27–36

Dryden, I. L., M. R. Scarr and C. C. Taylor (2003), Bayesian texture segmentation of weed and crop images using reversible jump Markov chain Monte Carlo methods. Applied Statistics 52(1), 31–50

Friel, N., A. N. Pettitt, R. Reeves and E. Wit (2005), Bayesian inference in hidden Markov random fields for binary data defined on large lattices. Technical report, University of Glasgow

Friel, N. and E. Wit (2005), Markov random field model of gene interactions on the M. Tuberculosis genome. Technical report, University of Glasgow, Department of Statistics

Gelman, A. and X.-L. Meng (1998), Simulating normalizing constants: from importance sampling to bridge sampling to path sampling. Statistical Science 13, 163–185

Godsill, S. J. (2001), On the relationship between Markov chain Monte Carlo methods for model uncertainty. Journal of Computational and Graphical Statistics 10, 230–248

Green, P. J. (1995), Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82, 711–732

Green, P. J. (2003), Trans-dimensional Markov chain Monte Carlo. In P. J. Green, N. L. Hjort and S. Richardson (eds.), Highly Structured Stochastic Systems, Oxford University Press, Oxford

Green, P. J. and A. O'Hagan (1998), Model choice with MCMC on product spaces without using pseudo-priors. Technical Report 98-13, University of Nottingham

Green, P. J. and S. Richardson (2002), Hidden Markov models and disease mapping. Journal of the American Statistical Association 97, 1055–1070

Han, C. and B. P. Carlin (2001), Markov chain Monte Carlo methods for computing Bayes factors: A comparative review. Journal of the American Statistical Association 96(455), 1122–1132

Hoeting, J. A., D. Madigan, A. E. Raftery and C. T. Volinsky (2001), Bayesian model averaging: A tutorial. Statistical Science 14(4), 382–417

Meng, X.-L. and W. Wong (1996), Simulating ratios of normalizing constants via a simple identity: a theoretical exploration. Statistica Sinica 6, 831–860

Neal, R. M. (2001), Annealed importance sampling. Statistics and Computing 11, 125–139

O'Hagan, A. (1995), Fractional Bayes factors for model comparison (with discussion). Journal of the Royal Statistical Society, Series B 57, 99–138

Pettitt, A. N., T. T. Tran, M. A. Haynes and J. L. Hay (2005), A Bayesian hierarchical model for categorical longitudinal data from a social survey of immigrants. Journal of the Royal Statistical Society, Series A (to appear)

Reeves, R. and A. N. Pettitt (2004), Efficient recursions for general factorisable models. Biometrika 91(3), 751–757

Sisson, S. A. (2005), Trans-dimensional Markov chains: A decade of progress and future perspectives. Journal of the American Statistical Association (to appear)

Spiegelhalter, D., A. Thomas and N. Best (1998), WinBUGS: Bayesian inference using Gibbs sampling, Manual version 1.2. Imperial College, London and Medical Research Council Biostatistics Unit, Cambridge

Stephens, M. (2000), Bayesian analysis of mixture models with an unknown number of components – an alternative to reversible jump methods. Annals of Statistics 28, 40–74

Williams, E. (1959), Regression Analysis. Wiley

Wit, E. and J. McClure (2004), Statistics for Microarrays: Design, Analysis and Inference. Wiley, Chichester