
Lecture 5: Overdispersion in Logistic Regression

Claudia Czado

TU München

© Claudia Czado, TU Munich, ZFS/IMS Göttingen 2004

Overview

• Definition of overdispersion

• Detection of overdispersion

• Modeling of overdispersion

Overdispersion in logistic regression (Collett (2003), Chapter 6)

Logistic model: Y_i ∼ bin(n_i, p_i) independent

p_i = exp(x_i^t β) / (1 + exp(x_i^t β))

⇒ E(Y_i) = n_i p_i,  Var(Y_i) = n_i p_i (1 − p_i)

If one assumes that p_i is correctly modeled, but the observed variance is larger or smaller than the variance n_i p_i(1 − p_i) expected under the logistic model, one speaks of under- or overdispersion. In applications one usually observes only overdispersion, so we concentrate on modeling overdispersion.

How to detect overdispersion?

If the logistic model is correct, the residual deviance D is asymptotically χ²_{n−p} distributed. Therefore D > n − p = E(χ²_{n−p}) can indicate overdispersion.

Warning: D > n − p can also be the result of

- missing covariates and/or interaction terms;

- neglect of nonlinear effects;

- a wrong link function;

- the existence of large outliers;

- or small n_i.

One has to exclude these reasons through EDA and regression diagnostics.
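As an illustration of this kind of detection (my own simulation sketch, not from the lecture; it assumes numpy and, for simplicity, compares a Pearson-type statistic using the true p_i, so the reference value is the number of groups g rather than g − p):

```python
import numpy as np

rng = np.random.default_rng(0)
g, n, p, phi = 200, 20, 0.3, 0.1

# Binomial counts (no overdispersion)
y_bin = rng.binomial(n, p, size=g)

# Beta-binomial counts: v_i ~ Beta(a, b) with E(v_i) = p, Var(v_i) = phi*p*(1-p)
s = 1.0 / phi - 1.0                   # a + b, since Var(v) = p(1-p)/(a+b+1)
v = rng.beta(p * s, (1 - p) * s, size=g)
y_over = rng.binomial(n, v)

def pearson_chi2(y, n, p):
    # X^2 = sum_i (y_i - n p)^2 / (n p (1 - p)); with the true p, E(X^2) = g
    return float(np.sum((y - n * p) ** 2 / (n * p * (1 - p))))

X2_bin = pearson_chi2(y_bin, n, p)    # close to g = 200
X2_over = pearson_chi2(y_over, n, p)  # inflated by roughly 1 + (n-1)*phi = 2.9
```

The overdispersed statistic exceeds its reference value by roughly the factor 1 + (n − 1)φ, which is exactly the pattern the heterogeneity-factor estimate later in the lecture exploits.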

Residual deviance for binary logistic models

Collett (2003) shows that the residual deviance for binary logistic models can be written as

D = −2 Σ_{i=1}^n [ p̂_i ln( p̂_i / (1 − p̂_i) ) + ln(1 − p̂_i) ],

where p̂_i = exp(x_i^t β̂) / (1 + exp(x_i^t β̂)). This is independent of Y_i and therefore not useful for assessing goodness of fit.

Need to group data to use residual deviance as goodness of fit measure.
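Collett's identity can be checked numerically; the sketch below (my own, assuming numpy) fits a binary logistic model by Newton-Raphson and compares the usual residual deviance with the form above, which uses only the fitted probabilities:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])          # intercept + one covariate
p_true = 1 / (1 + np.exp(-(0.5 + 1.0 * x)))
y = rng.binomial(1, p_true).astype(float)

# Newton-Raphson iterations for the binary logistic MLE
beta = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-(X @ beta)))
    W = p * (1 - p)
    beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))

p_hat = 1 / (1 + np.exp(-(X @ beta)))

# Usual residual deviance for binary data
D_usual = -2 * np.sum(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))
# Collett's form: depends on p_hat only, not on y
D_collett = -2 * np.sum(p_hat * np.log(p_hat / (1 - p_hat)) + np.log(1 - p_hat))
# The two agree at the MLE
```

The agreement holds because the score equations Σ (y_i − p̂_i) x_i = 0 make the y-dependent term Σ y_i logit(p̂_i) equal to Σ p̂_i logit(p̂_i) at the MLE.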

Reasons for overdispersion

Overdispersion can be explained by

- variation among the success probabilities or

- correlation between the binary responses

Both explanations amount to the same thing, since variation leads to correlation and vice versa. But for interpretation one explanation may be more natural than the other.

Variation among the success probabilities

If groups of experimental units are observed under the same conditions, the success probabilities may vary from group to group.

Example: The default probabilities of a group of creditors under the same conditions can vary from bank to bank. Reasons for this can be unmeasured or imprecisely measured covariates that make the groups differ with respect to their default probabilities.

Correlation among binary responses

Let Y_i = Σ_{j=1}^{n_i} R_ij, where R_ij = 1 for a success and 0 otherwise, P(R_ij = 1) = p_i.

⇒ Var(Y_i) = Σ_{j=1}^{n_i} Var(R_ij) + Σ_{j=1}^{n_i} Σ_{k≠j} Cov(R_ij, R_ik)

= n_i p_i(1 − p_i) + (covariance terms ≠ 0)

≠ n_i p_i(1 − p_i) = binomial variance

Y_i does not have a binomial distribution.

Examples:

- same patient is observed over time

- all units are from the same family or litter (cluster effects)

Modeling of variability among success probabilities

Williams (1982)

Y_i = number of successes in n_i trials with random success probability v_i, i = 1, . . . , n

Assume E(v_i) = p_i, Var(v_i) = φ p_i(1 − p_i), with φ ≥ 0 an unknown scale parameter.

Note: Var(v_i) = 0 if p_i = 0 or 1. v_i ∈ (0, 1) is an unobserved (latent) random variable.

Conditional expectation and variance of Y_i:

E(Y_i | v_i) = n_i v_i,  Var(Y_i | v_i) = n_i v_i (1 − v_i)

Since E(Y) = E_X(E(Y|X)) and Var(Y) = E_X(Var(Y|X)) + Var_X(E(Y|X)), the unconditional expectation and variance are

E(Y_i) = E_{v_i}(E(Y_i|v_i)) = E_{v_i}(n_i v_i) = n_i p_i

Var(Y_i) = E_{v_i}(n_i v_i (1 − v_i)) + Var_{v_i}(n_i v_i)

= n_i [E_{v_i}(v_i) − E_{v_i}(v_i²)] + n_i² φ p_i(1 − p_i)

= n_i (p_i − φ p_i(1 − p_i) − p_i²) + n_i² φ p_i(1 − p_i)

= n_i p_i (1 − p_i) [1 + (n_i − 1) φ]
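This variance formula can be verified by simulation; a minimal sketch (my own, assuming numpy), using a Beta distribution for v_i with the required first two moments:

```python
import numpy as np

rng = np.random.default_rng(3)
reps, n, p, phi = 400_000, 15, 0.35, 0.08

# v ~ Beta(a, b) chosen so that E(v) = p and Var(v) = phi * p * (1 - p)
s = 1.0 / phi - 1.0
v = rng.beta(p * s, (1 - p) * s, size=reps)
Y = rng.binomial(n, v)

var_emp = Y.var()
var_theory = n * p * (1 - p) * (1 + (n - 1) * phi)  # n p (1-p) [1 + (n-1) phi]
```

The empirical variance matches n p(1 − p)[1 + (n − 1)φ] rather than the plain binomial value n p(1 − p).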

Remarks

- φ = 0 ⇒ no overdispersion

- φ > 0 ⇒ overdispersion if n_i > 1

- n_i = 1 (Bernoulli data) ⇒ no information about φ is available; this model is not useful

Modelling of correlation among the binary responses

Y_i = Σ_{j=1}^{n_i} R_ij, where R_ij = 1 for a success and 0 otherwise, P(R_ij = 1) = p_i

⇒ E(Y_i) = n_i p_i, but now Cor(R_ij, R_ik) = δ for k ≠ j

⇒ Cov(R_ij, R_ik) = δ √(Var(R_ij) Var(R_ik)) = δ p_i(1 − p_i)

⇒ Var(Y_i) = Σ_{j=1}^{n_i} Var(R_ij) + Σ_{j=1}^{n_i} Σ_{k≠j} Cov(R_ij, R_ik)

= n_i p_i(1 − p_i) + n_i(n_i − 1) δ p_i(1 − p_i)

= n_i p_i(1 − p_i) [1 + (n_i − 1) δ]
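The same inflation factor appears when the correlation is generated directly; in this sketch (my own construction, assuming numpy) each R_ij copies a shared Bernoulli draw with probability √δ, which yields Corr(R_ij, R_ik) = δ:

```python
import numpy as np

rng = np.random.default_rng(2)
reps, n, p, delta = 200_000, 10, 0.4, 0.25

# Exchangeable correlated Bernoullis: R_ij copies a shared draw B_i with
# probability sqrt(delta), otherwise is an independent Bern(p); this gives
# Cov(R_ij, R_ik) = delta * p * (1 - p), i.e. Corr = delta.
B = rng.binomial(1, p, size=(reps, 1))
fresh = rng.binomial(1, p, size=(reps, n))
copy = rng.random(size=(reps, n)) < np.sqrt(delta)
R = np.where(copy, B, fresh)
Y = R.sum(axis=1)

var_emp = Y.var()
var_binom = n * p * (1 - p)                     # 2.4
var_theory = var_binom * (1 + (n - 1) * delta)  # 2.4 * 3.25 = 7.8
```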

Remarks

- δ = 0 ⇒ no overdispersion

- δ > 0 ⇒ overdispersion if n_i > 1; δ < 0 ⇒ underdispersion.

- Since we need 1 + (n_i − 1)δ > 0, δ cannot be too small. For n_i → ∞ this forces δ ≥ 0.

- The unconditional mean and variance are the same for both approaches if δ ≥ 0, therefore we cannot distinguish between the two approaches on the basis of the first two moments.

Estimation of φ

Y_i | v_i ∼ bin(n_i, v_i), E(v_i) = p_i, Var(v_i) = φ p_i(1 − p_i), i = 1, . . . , g

Special case: n_i = n ∀ i

Var(Y_i) = n p_i(1 − p_i) [1 + (n − 1)φ], where σ² := 1 + (n − 1)φ is called the heterogeneity factor. One can show that

E(χ²) ≈ (g − p)[1 + (n − 1)φ] = (g − p)σ²,

where p = number of parameters in the largest model to be considered and

χ² = Σ_{i=1}^g (y_i − n p̂_i)² / (n p̂_i(1 − p̂_i)).

⇒ σ̂² = χ²/(g − p) ⇒ φ̂ = (σ̂² − 1)/(n − 1)

Estimation of β remains the same
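A sketch of this moment estimator (my own helper names, assuming numpy; for brevity the fitted p̂_i are replaced by the true p on simulated data):

```python
import numpy as np

def estimate_phi(y, n, p_hat, n_params):
    """Moment estimator of phi for grouped data with equal n_i = n:
    sigma2_hat = chi2 / (g - p), phi_hat = (sigma2_hat - 1) / (n - 1)."""
    g = len(y)
    chi2 = np.sum((y - n * p_hat) ** 2 / (n * p_hat * (1 - p_hat)))
    sigma2_hat = chi2 / (g - n_params)   # heterogeneity factor estimate
    phi_hat = (sigma2_hat - 1) / (n - 1)
    return sigma2_hat, phi_hat

# Check on simulated beta-binomial data with known phi = 0.1
rng = np.random.default_rng(4)
g, n, p, phi = 2000, 20, 0.3, 0.1
s = 1.0 / phi - 1.0
v = rng.beta(p * s, (1 - p) * s, size=g)
y = rng.binomial(n, v)
sigma2_hat, phi_hat = estimate_phi(y, n, np.full(g, p), n_params=1)
# sigma2_hat should be near 1 + (n-1)*phi = 2.9, phi_hat near 0.1
```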

Analysis of deviance when variability among the success probabilities is present

model   df    deviance   covariates
1       ν1    D1         x_i1, . . . , x_iν1
2       ν2    D2         x_i1, . . . , x_iν1, x_i(ν1+1), . . . , x_iν2
0       ν0    D0         x_i1, . . . , x_iν0

For Y_i | v_i ∼ bin(n_i, v_i), i = 1, . . . , g. Since E(χ²) ≈ σ²(g − p) we expect

χ² ∼a σ² χ²_{g−p}  and  D ∼a χ² ∼a σ² χ²_{g−p}.

Statistic and asymptotic distribution:

[(D1 − D2)/(ν2 − ν1)] / [D0/ν0] ∼a [χ²_{ν2−ν1}/(ν2 − ν1)] / [χ²_{ν0}/ν0] ∼a F_{ν2−ν1, ν0}

→ no change to the ordinary case (σ² cancels)
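The F statistic itself is simple arithmetic; a small helper (the function name is my own):

```python
def deviance_f_statistic(D1, nu1, D2, nu2, D0, nu0):
    """F = [(D1 - D2)/(nu2 - nu1)] / (D0/nu0); compare with F_{nu2-nu1, nu0}.
    The unknown sigma^2 cancels from numerator and denominator."""
    return ((D1 - D2) / (nu2 - nu1)) / (D0 / nu0)

# Example with made-up deviances: D1 = 50 (nu1 = 3), D2 = 40 (nu2 = 5), D0 = 30 (nu0 = 10)
F = deviance_f_statistic(50.0, 3, 40.0, 5, 30.0, 10)  # -> ((50-40)/2)/(30/10) = 5/3
```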

Estimated standard errors in overdispersed models

ŝe(β̂_j) = σ̂ · ŝe_0(β̂_j), where

ŝe_0(β̂_j) = estimated standard error in the model without overdispersion.

This holds since Var(Y_i) = σ² n_i p_i(1 − p_i), and in both cases we have E(Y_i) = n_i p_i.

Beta-binomial models

v_i = latent success probability ∈ (0, 1), v_i ∼ Beta(a_i, b_i)

f(v_i) = v_i^{a_i−1} (1 − v_i)^{b_i−1} / B(a_i, b_i), a_i, b_i > 0 (density)

B(a, b) = ∫_0^1 x^{a−1} (1 − x)^{b−1} dx (Beta function)

E(v_i) = a_i/(a_i + b_i) =: p_i

Var(v_i) = a_i b_i / [(a_i + b_i)² (a_i + b_i + 1)] = p_i(1 − p_i)/(a_i + b_i + 1) = p_i(1 − p_i) τ_i, where τ_i := 1/(a_i + b_i + 1).

If a_i > 1, b_i > 1 ∀i we have unimodality and Var(v_i) < p_i(1 − p_i)/3. If τ_i = τ, the beta-binomial model is equivalent to the model with variability among success probabilities with φ = τ < 1/3 (⇒ more restrictive).

(Marginal) likelihood

L(β) = Π_{i=1}^n ∫_0^1 f(y_i | v_i) f(v_i) dv_i

= Π_{i=1}^n ∫_0^1 (n_i choose y_i) v_i^{y_i} (1 − v_i)^{n_i − y_i} · v_i^{a_i−1} (1 − v_i)^{b_i−1} / B(a_i, b_i) dv_i

= Π_{i=1}^n (n_i choose y_i) B(y_i + a_i, n_i − y_i + b_i) / B(a_i, b_i),

where p_i = exp(x_i^t β)/(1 + exp(x_i^t β)) = a_i/(a_i + b_i). This needs to be maximized to determine the MLE of β.

Remark: no standard software exists
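Although standard software was lacking at the time, the marginal likelihood above is straightforward to code directly; a minimal sketch using only the Python standard library (the function names and the common-τ parameterization carried over from the previous slide are my own choices):

```python
from math import exp, lgamma

def log_beta(a, b):
    # log B(a, b) via log-gamma: B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_comb(n, y):
    # log of the binomial coefficient (n choose y)
    return lgamma(n + 1) - lgamma(y + 1) - lgamma(n - y + 1)

def betabinom_loglik(beta0, beta1, x, y, n, tau):
    """Marginal beta-binomial log-likelihood with a single covariate,
    parameterized through p_i = logit^{-1}(beta0 + beta1 x_i) and a
    common tau = 1/(a_i + b_i + 1)."""
    s = 1.0 / tau - 1.0                  # a_i + b_i
    ll = 0.0
    for xi, yi, ni in zip(x, y, n):
        p = 1.0 / (1.0 + exp(-(beta0 + beta1 * xi)))
        a, b = p * s, (1.0 - p) * s
        ll += (log_comb(ni, yi)
               + log_beta(yi + a, ni - yi + b) - log_beta(a, b))
    return ll
```

This log-likelihood can then be handed to any general-purpose numerical optimizer to obtain the MLE of (β, τ).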

Random effects in logistic regression

Let v_i = latent success probability with E(v_i) = p_i, and

log( v_i / (1 − v_i) ) = x_i^t β + δ_i, with δ_i a “random effect”.

δ_i captures missing or imprecisely measured covariates. When an intercept is included we can assume E(δ_i) = 0. Further assume the δ_i are i.i.d. with Var(δ_i) = σ_δ².

Let Z_i be i.i.d. with E(Z_i) = 0 and Var(Z_i) = 1.

⇒ δ_i =_D γ Z_i with γ = σ_δ ≥ 0

Therefore

log( v_i / (1 − v_i) ) = x_i^t β + γ Z_i.

Remark: this model can also be used for binary regression data.

Estimation in logistic regression with random effects

If Z_i ∼ N(0, 1) i.i.d., the joint likelihood for β, γ, Z is given by

L(β, γ, Z) = Π_{i=1}^n (n_i choose y_i) v_i^{y_i} (1 − v_i)^{n_i − y_i}

= Π_{i=1}^n (n_i choose y_i) [exp{x_i^t β + γZ_i}]^{y_i} / [1 + exp{x_i^t β + γZ_i}]^{n_i},

which involves p + 1 + n parameters. Too many parameters, therefore maximize the marginal likelihood

L(β, γ) := ∫_{R^n} L(β, γ, Z) f(Z) dZ

= Π_{i=1}^n ∫_{−∞}^{∞} (n_i choose y_i) [exp{x_i^t β + γZ_i}]^{y_i} / [1 + exp{x_i^t β + γZ_i}]^{n_i} · (1/√(2π)) e^{−Z_i²/2} dZ_i

°c (Claudia Czado, TU Munich) ZFS/IMS G¨ottingen2004 – 19 – This can only be determined numerically. One approach is to use a Gauss- Hermite approximation given by

∞ Z Xm −u2 f(u)e du ≈ cjf(sj) −∞ j=1 for known cj and sj (see tables in Abramowitz and Stegun (1972)). m ≈ 20 is often sufficient.
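A quick check of the Gauss-Hermite rule on a known integral (assuming numpy; modern libraries compute the nodes and weights, so the tables are no longer needed):

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

# Nodes s_j and weights c_j for the m-point Gauss-Hermite rule
m = 20
s, c = hermgauss(m)

# Known test integral: int exp(-u^2) cos(u) du = sqrt(pi) * exp(-1/4)
approx = np.sum(c * np.cos(s))
exact = np.sqrt(np.pi) * np.exp(-0.25)
```

For the marginal likelihood above, substitute Z_i = √2 u in each factor, so that the integral becomes (1/√π) Σ_j c_j f(√2 s_j).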

Remarks for using random effects

- no standard software for the maximization

- one can also use a non-normal random effect

- extensions to several random effects are possible; maximization over high-dimensional integrals might require Markov Chain Monte Carlo (MCMC) methods

- random effects might be correlated in time or space when time series or spatial data are considered.

References

Abramowitz, M. and I. A. Stegun (1972). Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. 10th printing, with corrections. John Wiley & Sons.

Collett, D. (2003). Modelling Binary Data (2nd edition). London: Chapman & Hall.

Williams, D. A. (1982). Extra-binomial variation in logistic linear models. Applied Statistics 31, 144–148.
