Lognormal and Gamma Mixed Negative Binomial Regression
Total Page:16
File Type:pdf, Size:1020Kb
Lognormal and Gamma Mixed Negative Binomial Regression Mingyuan Zhou [email protected] Lingbo Li [email protected] David Dunson [email protected] Lawrence Carin [email protected] Duke University, Durham NC 27708, USA Abstract & Nelder, 1989; Long, 1997; Cameron & Trivedi, 1998; In regression analysis of counts, a lack of Agresti, 2002; Winkelmann, 2008). Regression models simple and efficient algorithms for poste- for counts are usually nonlinear and have to take into rior computation has made Bayesian ap- consideration the specific properties of counts, includ- proaches appear unattractive and thus un- ing discreteness and nonnegativity, and often charac- derdeveloped. We propose a lognormal and terized by overdispersion (variance greater than the gamma mixed negative binomial (NB) regres- mean). In addition, we may wish to impose a sparse sion model for counts, and present efficient prior in the regression coefficients for counts, which is closed-form Bayesian inference; unlike con- demonstrated to be beneficial for regression analysis ventional Poisson models, the proposed ap- of both Gaussian and binary data (Tipping, 2001). proach has two free parameters to include two Count data are commonly modeled with the Poisson different kinds of random effects, and allows distribution y ∼ Pois(λ), whose mean and variance the incorporation of prior information, such are both equal to λ. Due to heterogeneity (difference as sparsity in the regression coefficients. By between individuals) and contagion (dependence be- placing a gamma distribution prior on the NB tween the occurrence of events), the varance is often dispersion parameter r, and connecting a log- much larger than the mean, making the Poisson as- normal distribution prior with the logit of the sumption restrictive. By placing a gamma distribution NB probability parameter p, efficient Gibbs prior with shape r and scale p=(1 − p) on λ, a negative sampling and variational Bayes inference are binomial (NB) distribution y ∼ NB(r; p) can be gener- both developed. The closed-form updates 1 ated as f (y) = R Pois(y; λ)Gamma λ; r; p dλ = are obtained by exploiting conditional con- Y 0 1−p Γ(r+y) r y jugacy via both a compound Poisson repre- y!Γ(r) (1 − p) p , where Γ(·) denotes the gamma func- sentation and a Polya-Gamma distribution tion, r is the nonnegative dispersion parameter and p is based data augmentation approach. The pro- a probability parameter. Therefore, the NB distribu- posed Bayesian inference can be implemented tion is also known as the gamma-Poisson distribution. routinely, while being easily generalizable to It has a variance rp=(1 − p)2 larger than the mean more complex settings involving multivariate rp=(1−p), and thus it is usually favored over the Pois- dependence structures. The algorithms are son distribution for modeling overdispersed counts. illustrated using real examples. The regression analysis of counts is commonly per- formed under the Poisson or NB likelihoods, whose 1. Introduction parameters are usually estimated by finding the max- imum of the nonlinear log likelihood (Long, 1997; In numerous scientific studies, the response variable Cameron & Trivedi, 1998; Agresti, 2002; Winkelmann, is a count y = 0; 1; 2; ··· , which we wish to ex- 2008). The maximum likelihood estimator (MLE), T plain with a set of covariates x = [1; x1; ··· ; xP ] as however, only provides a point estimate and does not −1 T T E[yjx] = g (x β), where β = [β0; ··· ; βP ] are the allow the incorporation of prior information, such as regression coefficients and g is the canonical link func- sparsity in the regression coefficients. In addition, the tion in generalized linear models (GLMs) (McCullagh MLE of the NB dispersion parameter r often lacks ro- bustness and may be severely biased or even fail to th Appearing in Proceedings of the 29 International Confer- converge if the sample size is small, the mean is small ence on Machine Learning, Edinburgh, Scotland, UK, 2012. Copyright 2012 by the author(s)/owner(s). or if r is large (Saha & Paul, 2005; Lloyd-Smith, 2007). Lognormal and Gamma Mixed Negative Binomial Regression Compared to the MLE, Bayesian approaches are able (Winkelmann, 2008). To model overdispersed counts, to model the uncertainty of estimation and to incor- the Poisson regression model can be modified as porate prior information. In regression analysis of y ∼ Pois(λ ); λ = exp(xT β) (2) counts, however, the lack of simple and efficient algo- i i i i i rithms for posterior computation has seriously limited where i is a nonnegative multiplicative random-effect routine applications of Bayesian approaches, making term to model individual heterogeneity (Winkelmann, Bayesian analysis of counts appear unattractive and 2008). Using both the law of total expectation and the thus underdeveloped. For instance, for the NB dis- law of total variance, it can be shown that persion parameter r, the only available closed-form T E[yijxi] = exp(xi β)E[i] (3) Bayesian solution relies on approximating the ratio of two gamma functions using a polynomial expan- Var[i] 2 Var[yijxi] = E[yijxi] + 2 E [yijxi]: (4) sion (Bradlow et al., 2002); and for the regression co- E [i] efficients β, Bayesian solutions usually involve com- Thus Var[y jx ] ≥ [y jx ] and we obtain a regression putationally intensive Metropolis-Hastings algorithms, i i E i i model for overdispersed counts. We show below that since the conjugate prior for β is not known under the both the gamma and lognormal distributions can be Poisson and NB likelihoods (Chib et al., 1998; Chib & used as the nonnegative prior on . Winkelmann, 2001; Winkelmann, 2008). i In this paper we propose a lognormal and gamma 2.1. The Negative Binomial Regression Model mixed NB regression model for counts, with default The NB regression model (Long, 1997; Cameron & Bayesian analysis presented based on two novel data Trivedi, 1998; Winkelmann, 2008; Hilbe, 2007) is con- augmentation approaches. Specifically, we show that structed by placing a gamma prior on i as the gamma distribution is the conjugate prior to the rr r−1 −ri NB dispersion parameter r, under the compound Pois- i ∼ Gamma(r; 1=r) = i e (5) son representation, with efficient Gibbs sampling and Γ(r) variational Bayes (VB) inference derived by exploiting −1 where E[i] = 1 and Var[i] = r . Marginalizing conditional conjugacy. Further we show that a log- out i in (2), we have a NB distribution parame- normal prior can be connected to the logit of the NB T terized by mean µi = exp(xi β) and inverse disper- probability parameter p, with efficient Gibbs sampling sion parameter φ (the reciprocal of r) as fY (yi) = −1 and VB inference developed for the regression coeffi- −1 −1 φ yi Γ(φ +yi) φ µi 2 −1 −1 −1 , thus cients β and the lognormal variance parameter σ , by yi!Γ(φ ) φ +µi φ +µi generalizing a Polya-Gamma distribution based data T augmentation approach in Polson & Scott(2011). The E[yijxi] = exp(xi β) (6) 2 proposed Bayesian inference can be implemented rou- Var[yijxi] = E[yijxi] + φE [yijxi]: (7) tinely, while being easily generalizable to more com- plex settings involving multivariate dependence struc- The MLEs of β and φ can be found numerically with tures. We illustrate the algorithm with real examples the Newton-Raphson method (Lawless, 1987). on univariate count analysis and count regression, and demonstrate the advantages of the proposed Bayesian 2.2. The Lognormal-Poisson Regression Model approaches over conventional count models. A lognormal-Poisson regression model (Breslow, 1984; Long, 1997; Agresti, 2002; Winkelmann, 2008) can be 2. Regression Models for Counts constructed by placing a lognormal prior on i as The most basic regression model for counts is the Pois- 2 i ∼ ln N (0; σ ) (8) son regression model (Long, 1997; Cameron & Trivedi, 1998; Winkelmann, 2008), which can be expressed as σ2=2 σ2 σ2 where E[i] = e and Var[i] = e e − 1 . Us- T yi ∼ Pois(λi); λi = exp(xi β) (1) ing (3) and (4), we have T where xi = [1; xi1; ··· ; xiP ] is the covariate vector for T 2 E[yijxi] = exp(xi β + σ =2) (9) sample i. The Newton-Raphson method can be used σ2 2 to iteratively find the MLE of β (Long, 1997). A seri- Var[yijxi] = E[yijxi] + e − 1 E [yijxi]: (10) ous constraint of the Poisson regression model is that it assumes equal-dispersion, i:e:, E[yijxi] = Var[yijxi] = Compared to the NB model, there is no analytical form T exp(xi β): In practice, however, count data are of- for the distribution of yi if i is marginalized out and ten overdispersed, due to heterogeneity and contagion the MLE is less straightforward to calculate, making Lognormal and Gamma Mixed Negative Binomial Regression it less commonly used. However, Winkelmann(2008) 3.1. Model Properties and Model Comparison suggests to reevaluate the lognormal-Poisson model, Using the laws of total expectation and total variance since it is appealing in theory and may fit the data and the moments of the NB distribution, we have better. The inverse Gaussian distribution prior can T 2 E[yijxi] = Ei [E[yijxi; i]]= exp(xi β + σ =2 + ln r) (16) also be placed on i to construct a heavier-tailed al- Var[y jx ]= [Var[y jx ; ]]+Var [ [y jx ; ]] ternative to the NB model (Dean et al., 1989), whose i i Ei i i i i E i i i σ2 −1 2 density functions are shown to be virtually identical = E[yijxi] + e (1 + r ) − 1 E [yijxi]: (17) to the lognormal-Poisson model (Winkelmann, 2008). We define the quasi-dispersion κ as the coefficient as- sociated with the mean quadratic term in the variance.