Bayesian Generalized Linear Model for Over and Under Dispersed Counts
Alan Huang¹ and Andy Sang Il Kim²
¹University of Queensland, Nanyang Technological University; ²University of Technology Sydney

Abstract. Bayesian models that can handle both over and under dispersed counts are rare in the literature, perhaps because full probability distributions for dispersed counts are rather difficult to construct. This note takes a first look at Bayesian Conway-Maxwell-Poisson generalized linear models that can handle both over and under dispersion yet retain the parsimony and interpretability of classical count regression models. The focus is on providing an explicit demonstration of Bayesian regression inferences for dispersed counts via a Metropolis-Hastings algorithm. We illustrate the approach on two data analysis examples and demonstrate some favourable frequentist properties via a simulation study.

Key words and phrases: Bayesian generalized linear model, Count data, Overdispersion, Underdispersion, Conway-Maxwell-Poisson.

1. INTRODUCTION

Count data often exhibit dispersion relative to a classical Poisson model. For overdispersed counts, where the conditional variance is larger than the conditional mean, models based on the negative binomial, Poisson inverse-Gaussian and other Poisson rate mixture distributions have been well-covered in the literature from both frequentist and Bayesian points of view (e.g., Willmot, 1987; McCullagh & Nelder, 1989; Gelman & Hill, 2007). For underdispersed counts, where the conditional variance is smaller than the conditional mean, the options are far more limited due to the difficulty in constructing full probability distributions that can handle underdispersion. A recent survey of a handful of existing models can be found in Sellers & Morris (2017).
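As a quick numerical illustration of these definitions (our own sketch, not part of the paper), the dispersion index (sample variance divided by sample mean) sits near 1 for Poisson counts, above 1 for a Poisson rate mixture, and below 1 for binomial-type counts; all names and parameter values below are arbitrary choices:

```python
import math
import random

random.seed(1)

def dispersion_index(xs):
    """Sample variance over sample mean: ~1 Poisson, >1 over-, <1 underdispersed."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return v / m

def rpois(mu):
    """Poisson draw by cdf inversion (adequate for moderate mu)."""
    u, k = random.random(), 0
    p = c = math.exp(-mu)
    while u > c and k < 1000:  # guard against float-tail edge cases
        k += 1
        p *= mu / k
        c += p
    return k

n = 20000
equi = [rpois(4.0) for _ in range(n)]                             # Poisson: index ~ 1
over = [rpois(4.0 * random.expovariate(1.0)) for _ in range(n)]   # rate mixture: index > 1
under = [sum(random.random() < 0.4 for _ in range(10)) for _ in range(n)]  # Binomial(10, 0.4): index < 1
```

The rate mixture is a Poisson-exponential compound (negative-binomial-like), and the binomial counts have variance np(1-p) below their mean np, matching the two regimes described above.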
The lack of options is particularly detrimental from a Bayesian point of view because a full probability distribution for the data is required for the application of Bayes' theorem to determine the posterior distribution and subsequent inferences.

arXiv:1910.06008v2 [stat.ME] 25 Oct 2019

This note takes a first step towards filling this gap in the literature by presenting a Bayesian framework for the regression modelling of counts that can (i) handle both over and under dispersion, and (ii) retain the same level of parsimony and interpretability as classical count models. We do this by building on a recent reparametrisation of the Conway-Maxwell-Poisson (CMP) distribution that allows the mean to be modelled directly (Huang, 2017), so that simple and interpretable models, such as a log-linear model, can be constructed for both over and under dispersed counts. This places underdispersed counts on the same footing as equidispersed and overdispersed counts, which are readily handled by the familiar Poisson and negative binomial models, respectively.

School of Mathematics and Physics, University of Queensland, St Lucia, QLD, Australia 4072 (e-mail: [email protected]).
School of Mathematical and Physical Sciences, University of Technology Sydney, Ultimo, NSW, Australia 2007 (e-mail: [email protected]).

More specifically, the mean-parametrized CMP distribution with mean µ > 0 and dispersion ν ≥ 0 is characterized by the probability mass function

\[
P(Y = y \mid \mu, \nu) = \frac{\lambda(\mu, \nu)^y}{(y!)^\nu} \, \frac{1}{Z(\lambda(\mu, \nu), \nu)}, \qquad y = 0, 1, 2, \ldots, \tag{1.1}
\]

where the rate $\lambda(\mu, \nu)$ is a function of µ and ν given by the solution to

\[
0 = \sum_{y=0}^{\infty} (y - \mu) \, \frac{\lambda^y}{(y!)^\nu},
\]

and $Z(\lambda, \nu) = \sum_{y=0}^{\infty} \lambda^y / (y!)^\nu$ is a normalizing function. It is easy to show that ν < 1 implies overdispersion and ν > 1 implies underdispersion relative to a Poisson distribution of the same mean. When ν = 1, the distribution coincides with the Poisson distribution.
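Because λ(µ, ν) and Z(λ, ν) have no closed form, they must be computed numerically. The sketch below is our own illustration, not the authors' implementation; the function name, truncation point and bisection depth are arbitrary choices. It solves the mean constraint for λ by bisection and truncates the normalizing series:

```python
import math

def cmp_mu_pmf(y, mu, nu, max_y=150):
    """Pmf of the mean-parametrized CMP distribution in (1.1).

    The rate lambda is found by bisection on the mean constraint
    0 = sum_k (k - mu) * lambda^k / (k!)^nu, and the normalizer
    Z(lambda, nu) is a truncated series (max_y + 1 terms).
    """
    def residual(lam):
        # Mean constraint, with terms computed on the log scale to avoid overflow.
        return sum((k - mu) * math.exp(k * math.log(lam) - nu * math.lgamma(k + 1))
                   for k in range(max_y + 1))

    # The residual is -mu < 0 as lam -> 0 and eventually positive,
    # so first bracket the root, then bisect.
    lo, hi = 1e-10, 1.0
    while residual(hi) < 0:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if residual(mid) < 0 else (lo, mid)
    lam = 0.5 * (lo + hi)

    # log Z via the log-sum-exp trick, then the pmf itself.
    log_terms = [k * math.log(lam) - nu * math.lgamma(k + 1) for k in range(max_y + 1)]
    tmax = max(log_terms)
    log_Z = tmax + math.log(sum(math.exp(t - tmax) for t in log_terms))
    return math.exp(y * math.log(lam) - nu * math.lgamma(y + 1) - log_Z)
```

At ν = 1 this reproduces the Poisson pmf with λ = µ, and for ν ≠ 1 the resulting pmf still has mean µ by construction, which is precisely what makes the mean-parametrization convenient for regression.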
We write CMPµ(µ, ν) for the mean-parametrized CMP distribution (1.1) to distinguish it from the standard CMP distribution of Shmueli et al. (2005). CMPµ distributions are particularly useful for modelling dispersed counts because they retain all the attractive properties of standard CMP distributions whilst being able to model the mean directly (Huang, 2017; Andrianatos, 2017). They form two-parameter exponential families, and for fixed or given values of the dispersion they become one-parameter exponential families, making them immediately adaptable to regression modelling via the generalized linear model (GLM; McCullagh & Nelder, 1989) framework. They also form a continuous bridge between other classical models, passing through the geometric and Poisson distributions as special cases, with the Bernoulli distribution as a limiting case. Moreover, unlike the generalized Poisson (Consul & Famoye, 1992), quasi-Poisson (Wedderburn, 1974), and the recently proposed Extended Poisson-Tweedie (Bonat et al., 2018) models, CMPµ distributions always correspond to full probability models for any choice of model parameters, covering both over and under dispersion. This makes them particularly suitable for Bayesian modelling and inferences.

From a Bayesian point of view, Kadane et al. (2006) explored the use of conjugate priors for the standard CMP distribution in the independent and identically distributed case. While this can be easily extended to mean-parametrized CMP distributions (see Online Supplement), there are three practical limitations of using the conjugate prior. First, the subsequent posterior is known only up to a normalizing constant that has no closed form, so computing it requires either numerical integration, Markov chain Monte Carlo (MCMC) sampling, or some other approximation method. There is therefore no practical advantage to using the conjugate prior over any other prior.
Second, appropriate specification of the hyperparameters can be rather opaque due to the unfamiliar form of the conjugate prior. Kadane et al. (2006) offer an applet to translate prior information into an appropriate elicitation of hyperparameters, but a simpler and more interpretable prior can mitigate this issue altogether. Finally, the conjugacy property only holds for intercept-only models, and the extension to the regression case, in which the mean of each observation may depend on a set of covariates, is not at all immediate. This is in fact true of most Bayesian generalized linear models: for example, while the Gamma distribution is conjugate for the Poisson distribution for intercept-only models, there is no extension to the Poisson regression case (Hoff, 2009, page 173). Instead, multivariate normal priors are placed on the regression coefficients for interpretability and parsimony (as implemented in the popular MCMCpack package in R by Martin et al. (2011), for example).

To the best of our knowledge, this note is the first to consider a Bayesian framework for the regression modelling of both over and under dispersed counts that circumvents these limitations, yet retains the parsimony and interpretability of familiar count models such as the log-linear Poisson and negative binomial models. This work is part of a larger ongoing project that looks at efficient Bayesian inferences for dispersed counts in hierarchical models.

2. BAYESIAN CMP GENERALIZED LINEAR MODEL FOR DISPERSED COUNTS

Suppose we have independent observation pairs $(y_1, X_1), (y_2, X_2), \ldots$, where each $y_i$ is a count response and each $X_i \in \mathbb{R}^p$ is a corresponding set of covariates.
A CMPµ generalized linear model for the data that can handle both over and under dispersion can be specified via

\[
y_i \mid X_i \stackrel{\text{ind.}}{\sim} \mathrm{CMP}_\mu\!\left(\mu(X_i^\top \beta), \nu\right), \qquad i = 1, 2, \ldots, \tag{2.1}
\]

where $\beta \in \mathbb{R}^p$ is a vector of regression coefficients, ν ≥ 0 is a dispersion parameter, and (in a slight abuse of notation) µ(·) is a user-specified mean-model or inverse-link function. For this note, we focus on the popular log-linear model,

\[
E(y \mid X) = \mu(X^\top \beta) = \exp(X^\top \beta),
\]

so that each component of β can be interpreted as the expected change in the mean response (on the log scale) for a unit increase in the corresponding component of X. Of course, other link functions can be considered, as with any GLM. Indeed, the key advantage of the CMPµ distribution is that it is directly parametrized via the mean, so that simple, easily interpretable mean-models can be considered. In contrast, standard CMP distributions (Shmueli et al., 2005; Lord et al., 2008; Sellers & Shmueli, 2010) and their variants (e.g., Guikema & Coffelt, 2008) model either a latent rate parameter or some power transformation of it, which cannot be interpreted as the mean.

For a Bayesian model specification, we need a prior distribution on the model parameters β and ν. In this note, we focus on easily interpretable prior specifications, such as

\[
\beta \sim N(\mu_\beta, \Sigma_\beta) \quad \text{and} \quad \nu \sim \text{Log-Normal}(\mu_\nu, \sigma_\nu^2), \tag{2.2}
\]

so that the hyperparameters $(\mu_\beta, \Sigma_\beta)$ and $(\mu_\nu, \sigma_\nu^2)$ have clear interpretations as prior means and variances. This makes it easy to translate prior beliefs about the data-generating process into sensible specification of hyperparameter values. In contrast, the conjugate prior of Kadane et al. (2006) has no clear interpretation, making elicitation of hyperparameter values rather opaque. Of course, the prior distribution in (2.2) can be replaced with any user-specified prior, with the proposed method in this note being applicable for any prior specification.
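To illustrate how the hyperparameters in (2.2) map onto prior beliefs, the short simulation below (our own sketch; all hyperparameter values are hypothetical) draws from the priors for an intercept-only model and reads off the implied baseline mean count and dispersion:

```python
import math
import random

random.seed(0)

# Hypothetical hyperparameters: prior belief that the baseline mean count
# exp(beta_0) is around exp(1) ~ 2.7, and that the dispersion nu is centred
# near 1 (equidispersion) with moderate uncertainty.
mu_beta, sigma_beta = 1.0, 0.5   # N(mu_beta, sigma_beta^2) prior on beta_0
mu_nu, sigma_nu = 0.0, 0.5       # Log-Normal(mu_nu, sigma_nu^2) prior on nu

n = 100000
beta0 = [random.gauss(mu_beta, sigma_beta) for _ in range(n)]
nu = [math.exp(random.gauss(mu_nu, sigma_nu)) for _ in range(n)]

prior_median_mean = math.exp(sorted(beta0)[n // 2])  # implied baseline mean count
prior_median_nu = sorted(nu)[n // 2]                 # implied dispersion
```

Widening sigma_beta or sigma_nu correspondingly widens the implied prior on the baseline mean or on ν, which is the sense in which prior beliefs translate directly into hyperparameter values.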
Note that taking the prior variances in (2.2) to be arbitrarily large leads to improper flat priors, $p(\beta), p(\nu) \propto 1$, which we consider in Section 4.3 and in the Online Supplement. When the normal and log-normal priors (2.2) are used, the joint posterior of the parameters β and ν given the observed data $y \equiv (y_1, \ldots, y_n)$ has the form

\[
p(\beta, \nu \mid y, X) \propto \prod_{i=1}^{n} \left\{ \lambda\!\left(\exp(X_i^\top \beta), \nu\right)^{y_i} (y_i!)^{-\nu} \, Z\!\left(\lambda(\exp(X_i^\top \beta), \nu), \nu\right)^{-1} \right\}
\times \exp\!\left( -\tfrac{1}{2} (\beta - \mu_\beta)^\top \Sigma_\beta^{-1} (\beta - \mu_\beta) \right)
\times \nu^{-1} \exp\!\left( -\frac{(\log \nu - \mu_\nu)^2}{2\sigma_\nu^2} \right), \tag{2.3}
\]

which is known up to a normalizing constant.
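The Metropolis-Hastings demonstration referred to in the abstract can be sketched as follows for an intercept-only model (a minimal illustration under our own choices of proposal scale, series truncation, and default hyperparameters; the authors' actual implementation may differ). A random walk on (β₀, log ν) targets the unnormalized posterior (2.3), with the log-normal prior on ν expressed as a normal density on log ν:

```python
import math
import random

def _log_terms(lam, nu, max_y=100):
    # log of lambda^k / (k!)^nu for k = 0..max_y
    return [k * math.log(lam) - nu * math.lgamma(k + 1) for k in range(max_y + 1)]

def _solve_lambda(mu, nu):
    # Bisection on the mean constraint 0 = sum_k (k - mu) lambda^k / (k!)^nu.
    def resid(lam):
        return sum((k - mu) * math.exp(t) for k, t in enumerate(_log_terms(lam, nu)))
    lo, hi = 1e-10, 1.0
    while resid(hi) < 0:  # bracket the root, then bisect
        hi *= 2.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if resid(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)

def log_posterior(beta0, log_nu, y, mu_b=0.0, s_b=10.0, mu_n=0.0, s_n=1.0):
    # Unnormalized log of (2.3) for an intercept-only log-linear model,
    # parametrized in (beta0, log nu): the normal density on log nu is the
    # log-normal prior on nu written on the log scale.
    mu, nu = math.exp(beta0), math.exp(log_nu)
    lam = _solve_lambda(mu, nu)
    terms = _log_terms(lam, nu)
    tmax = max(terms)
    log_Z = tmax + math.log(sum(math.exp(t - tmax) for t in terms))
    ll = sum(k * math.log(lam) - nu * math.lgamma(k + 1) - log_Z for k in y)
    lp = -0.5 * ((beta0 - mu_b) / s_b) ** 2 - 0.5 * ((log_nu - mu_n) / s_n) ** 2
    return ll + lp

def metropolis(y, n_iter=500, step=0.15, seed=7):
    # Random-walk Metropolis-Hastings over (beta0, log nu).
    rng = random.Random(seed)
    b = math.log(max(sum(y) / len(y), 0.5))  # start at the sample-mean estimate
    ln = 0.0                                 # start at nu = 1 (equidispersion)
    cur = log_posterior(b, ln, y)
    draws = []
    for _ in range(n_iter):
        b_p, ln_p = b + rng.gauss(0, step), ln + rng.gauss(0, step)
        prop = log_posterior(b_p, ln_p, y)
        if prop > cur or rng.random() < math.exp(prop - cur):  # accept/reject
            b, ln, cur = b_p, ln_p, prop
        draws.append((b, ln))
    return draws
```

For the full regression case, β₀ becomes a vector β with µᵢ = exp(Xᵢ⊤β) and the proposal is multivariate; the structure of the sampler is unchanged.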