CHAPTER 12 Generalized Linear Models and Poisson Regression
The Model
Generalized linear models
• Extensions of traditional linear models (e.g., the logistic regression model)
• Allow
  (i) the population mean to depend on a linear predictor through a nonlinear link function
  (ii) the response distribution to be any member of the exponential family
• Three building blocks:
  - Responses $y_1, y_2, \ldots, y_n$ following the same distribution from the exponential family
  - Parameters $\beta$ and regressors $x_1, x_2, \ldots, x_p$
  - A monotone link function: $g(\mu_i) = x_i'\beta$
Examples
• Standard linear regression model
  - $g(\mu) = \mu$
  - $y \sim$ Normal
• Logistic regression model
  - $g(\mu) = \ln\dfrac{\mu}{1-\mu}$
  - $y \sim$ Bernoulli (or binomial)
• Poisson regression model
  - $g(\mu) = \ln\mu$
  - $y \sim$ Poisson
Poisson regression model
• Used when the response represents count data
• Examples: the number of
  - daily equipment failures
  - weekly traffic fatalities
  - monthly insurance claims
  - ...
• Poisson distribution:
$$P(Y = y) = \frac{\mu^y}{y!}\, e^{-\mu}, \qquad y = 0, 1, 2, \ldots$$
• $E(Y) = V(Y) = \mu > 0$
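As a numerical sanity check on the distribution above, the sketch below (with an arbitrary illustrative $\mu = 3$) evaluates the Poisson probabilities and confirms that they sum to one and that the mean and variance both equal $\mu$:

```python
import math

def poisson_pmf(y, mu):
    """P(Y = y) = mu^y * exp(-mu) / y! for y = 0, 1, 2, ..."""
    return mu ** y * math.exp(-mu) / math.factorial(y)

mu = 3.0  # illustrative value, not from any data set
# Truncate the infinite sums far out in the tail; for mu = 3 the
# mass beyond y = 100 is negligible.
ys = range(100)
total = sum(poisson_pmf(y, mu) for y in ys)          # should be ~1
mean = sum(y * poisson_pmf(y, mu) for y in ys)       # E(Y) = mu
var = sum((y - mean) ** 2 * poisson_pmf(y, mu) for y in ys)  # V(Y) = mu
print(round(total, 6), round(mean, 6), round(var, 6))
```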
Poisson regression model (cont.)
• Link function:
$$g(\mu) = \ln\mu = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$$
• Then
$$\mu = \exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p) > 0$$
• Interpretation: a one-unit change in $x_1$ changes the mean by
$$100\,\frac{\exp[\beta_0 + \beta_1(x_1+1) + \beta_2 x_2 + \cdots + \beta_p x_p] - \exp[\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p]}{\exp[\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p]}$$
$$= 100[\exp(\beta_1) - 1] \text{ percent}$$
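A quick numerical check of this interpretation, using arbitrary illustrative coefficient values (not taken from any fitted model):

```python
import math

# Illustrative coefficients and covariate values (invented for the example)
b0, b1, b2 = 0.5, 0.3, -0.2
x1, x2 = 2.0, 1.0

mu_before = math.exp(b0 + b1 * x1 + b2 * x2)
mu_after = math.exp(b0 + b1 * (x1 + 1) + b2 * x2)

# Percent change in the mean from a one-unit increase in x1 ...
pct_direct = 100 * (mu_after - mu_before) / mu_before
# ... matches 100[exp(b1) - 1], regardless of the values of x1 and x2
pct_formula = 100 * (math.exp(b1) - 1)
print(round(pct_direct, 4), round(pct_formula, 4))
```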
Estimation of the Parameters in the Poisson Regression Model
Maximum likelihood estimation
• Likelihood function:
$$L(\beta \mid y_1, y_2, \ldots, y_n) = \prod_{i=1}^{n} \frac{\mu_i^{y_i}}{y_i!} \exp(-\mu_i)$$
• Log-likelihood function:
$$\ln L(\beta \mid y_1, y_2, \ldots, y_n) = c + \sum_{i=1}^{n} y_i \ln \mu_i - \sum_{i=1}^{n} \mu_i,$$
where $c = -\sum \ln(y_i!)$
• Use the Newton-Raphson method
Maximum likelihood estimation (cont.)
• Derivative of the log-likelihood with respect to $\mu_i$:
$$\frac{\partial \ln L}{\partial \mu_i} = \frac{y_i}{\mu_i} - 1 = \frac{y_i - \mu_i}{\mu_i}$$
• Note: $\partial \mu_i / \partial \beta = \mu_i x_i$
• Hence,
$$\frac{\partial \ln L}{\partial \beta} = \sum_{i=1}^{n} \frac{\partial \ln L}{\partial \mu_i}\,\frac{\partial \mu_i}{\partial \beta} = \sum_{i=1}^{n} \frac{y_i - \mu_i}{\mu_i}\,\mu_i x_i = \sum_{i=1}^{n} (y_i - \mu_i)\, x_i$$
• Maximum likelihood score equations:
$$X'(y - \mu) = 0$$
Newton-Raphson procedure

• First derivative of the negative log-likelihood:
$$g = -\frac{\partial \ln L}{\partial \beta} = -\sum_{i=1}^{n} (y_i - \mu_i)\, x_i$$
• Second derivative:
$$-\frac{\partial^2 \ln L}{\partial \beta_j\, \partial \beta_{j^*}} = -\frac{\partial}{\partial \beta_j}\left\{\sum_{i=1}^{n} (y_i - \mu_i)\, x_{ij^*}\right\} = \sum_{i=1}^{n} \mu_i\, x_{ij}\, x_{ij^*}$$
• Hessian matrix:
$$G = \sum_{i=1}^{n} \mu_i\, x_i x_i'$$
• Newton-Raphson iteration:
$$\beta^* = \tilde\beta - [G(\tilde\beta)]^{-1} g(\tilde\beta)$$
Iteratively reweighted least squares (IRLS) algorithm
• Iteratively computed (working) response:
$$z_i = x_i'\tilde\beta + \frac{1}{\tilde\mu_i}(y_i - \tilde\mu_i)$$
• Weighted linear regression of $z_i$ on $x_i$ with weights $w_i = \tilde\mu_i$
• Equivalent to the Newton-Raphson iteration
Iteratively reweighted least squares (IRLS) algorithm (cont.)
• Weighted least squares (WLS) estimate:
$$\hat\beta^{\text{WLS}} = \left[\sum_{i=1}^{n} w_i x_i x_i'\right]^{-1} \left[\sum_{i=1}^{n} w_i x_i z_i\right] = \left[\sum_{i=1}^{n} \tilde\mu_i x_i x_i'\right]^{-1} \left[\sum_{i=1}^{n} \tilde\mu_i x_i z_i\right]$$
with
$$\sum_{i=1}^{n} \tilde\mu_i x_i z_i = \sum_{i=1}^{n} \tilde\mu_i x_i \left[x_i'\tilde\beta + \frac{1}{\tilde\mu_i}(y_i - \tilde\mu_i)\right] = \left[\sum_{i=1}^{n} \tilde\mu_i x_i x_i'\right]\tilde\beta + \sum_{i=1}^{n} (y_i - \tilde\mu_i)\, x_i = G\tilde\beta - g$$
• Hence,
$$\hat\beta^{\text{WLS}} = G^{-1}(G\tilde\beta - g) = \tilde\beta - G^{-1}g$$
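The IRLS update can be sketched directly in a few lines. This is a minimal illustration on synthetic data (no convergence check or step-halving), not production code:

```python
import numpy as np

def poisson_irls(X, y, n_iter=25):
    """Fit a Poisson regression by IRLS.

    X: (n, p+1) design matrix including an intercept column.
    y: (n,) vector of counts.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)           # current fitted means
        z = X @ beta + (y - mu) / mu    # working response z_i
        XtW = X.T * mu                  # weights w_i = mu_i
        # Weighted least squares step: (X'WX)^{-1} X'Wz
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    return beta

# Small synthetic example (data are illustrative)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
true_beta = np.array([1.0, 0.5])
y = rng.poisson(np.exp(X @ true_beta))
print(poisson_irls(X, y))
```

Because each weighted least squares solve reproduces the Newton-Raphson step $\tilde\beta - G^{-1}g$, the loop converges to the maximum likelihood estimate, at which the score equations $X'(y - \hat\mu) = 0$ hold.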
Inference in the Poisson Regression Model
Likelihood ratio tests
• LR test statistic:
$$T = 2\ln\frac{L(\text{full})}{L(\text{restricted})} = 2\{\ln L(\text{full}) - \ln L(\text{restricted})\}$$
• Under $H_0$, $T \sim \chi^2(\text{df})$, where df = the number of independent constraints
• Reject $H_0$ if $T > \chi^2(1-\alpha, \text{df})$
• Can be used to test the significance of
  - an individual coefficient (a partial test)
  - two or more coefficients simultaneously
  - all coefficients (test of overall regression)
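As an illustration, the sketch below tests $H_0: \beta_1 = 0$ on synthetic data by fitting the full and intercept-only models with a minimal IRLS helper (redefined here so the sketch is self-contained) and comparing $T$ to the tabulated critical value $\chi^2(0.95, 1) = 3.841$; the data and coefficients are invented for the example:

```python
import numpy as np

def poisson_loglik(X, y, beta):
    """Log-likelihood up to the constant c = -sum(log(y_i!)),
    which cancels in the LR statistic."""
    mu = np.exp(X @ beta)
    return np.sum(y * np.log(mu) - mu)

def poisson_irls(X, y, n_iter=25):
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        z = X @ beta + (y - mu) / mu
        XtW = X.T * mu
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    return beta

rng = np.random.default_rng(1)
x1 = rng.normal(size=400)
X_full = np.column_stack([np.ones(400), x1])
y = rng.poisson(np.exp(0.8 + 0.4 * x1))

# Full model vs. restricted model (one constraint: beta_1 = 0)
beta_full = poisson_irls(X_full, y)
beta_restr = poisson_irls(X_full[:, :1], y)
T = 2 * (poisson_loglik(X_full, y, beta_full)
         - poisson_loglik(X_full[:, :1], y, beta_restr))
# df = 1; chi^2(0.95, 1) = 3.841 from standard tables
print(T > 3.841)
```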
Standard errors of the maximum likelihood estimates and Wald tests
• Estimate of the covariance matrix:
$$\hat V(\hat\beta) \cong G^{-1} = \left[\sum_{i=1}^{n} \hat\mu_i x_i x_i'\right]^{-1}$$
• Wald 95% confidence intervals for $\beta_j$:
$$\hat\beta_j \pm (1.96)\, \text{s.e.}(\hat\beta_j)$$
Standard errors of the estimated mean

• Estimate of the $i$th mean: $\hat\mu_i = \exp(x_i'\hat\beta)$
• Taylor series expansion:
$$\hat\mu_i \cong \mu_i + (\hat\beta - \beta)' \left.\frac{\partial \hat\mu_i}{\partial \hat\beta}\right|_{\hat\beta = \beta} = \mu_i + \mu_i x_i'(\hat\beta - \beta)$$
• Estimated variance of $\hat\mu_i$:
$$\hat V(\hat\mu_i) \cong \hat\mu_i^2\, x_i'\, \hat V(\hat\beta)\, x_i = \hat\mu_i^2\, x_i' \left[\sum_{j=1}^{n} \hat\mu_j x_j x_j'\right]^{-1} x_i$$
• Approximate 95% confidence intervals for $\mu_i$:
$$\hat\mu_i \pm (1.96)\, \hat\mu_i \sqrt{x_i' \left[\sum_{j=1}^{n} \hat\mu_j x_j x_j'\right]^{-1} x_i}$$
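Putting the last two slides together, a sketch (synthetic data, minimal IRLS helper redefined for self-containment) that computes the estimated covariance matrix, Wald intervals for the coefficients, and an approximate interval for one fitted mean:

```python
import numpy as np

def poisson_irls(X, y, n_iter=25):
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        z = X @ beta + (y - mu) / mu
        XtW = X.T * mu
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    return beta

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(300), rng.normal(size=300)])
y = rng.poisson(np.exp(X @ np.array([1.0, 0.5])))

beta_hat = poisson_irls(X, y)
mu_hat = np.exp(X @ beta_hat)

# V(beta_hat) ~= G^{-1} = [sum mu_i x_i x_i']^{-1}
G = (X.T * mu_hat) @ X
cov = np.linalg.inv(G)
se = np.sqrt(np.diag(cov))

# 95% Wald intervals for the coefficients
ci_beta = np.column_stack([beta_hat - 1.96 * se, beta_hat + 1.96 * se])

# Approximate 95% interval for the first fitted mean
x0 = X[0]
se_mu0 = mu_hat[0] * np.sqrt(x0 @ cov @ x0)
ci_mu0 = (mu_hat[0] - 1.96 * se_mu0, mu_hat[0] + 1.96 * se_mu0)
print(ci_beta, ci_mu0)
```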
Deviance

• Saturated model:
  - the $i$th log-likelihood contribution, $y_i \ln(\mu_i) - \mu_i$, is maximized at $\mu_i = y_i$
  - log-likelihood function: $c + \sum_{i=1}^{n} [y_i \ln(y_i) - y_i]$
• Deviance:
$$D = 2\{\ln L(\text{saturated}) - \ln L(\text{parameterized})\} = 2\left\{\sum_{i=1}^{n} [y_i \ln(y_i) - y_i] - \sum_{i=1}^{n} [y_i \ln(\hat\mu_i) - \hat\mu_i]\right\}$$
$$= 2\sum_{i=1}^{n} \left[y_i \ln\frac{y_i}{\hat\mu_i} - (y_i - \hat\mu_i)\right] = 2\sum_{i=1}^{n} y_i \ln\frac{y_i}{\hat\mu_i},$$
since $\sum_{i=1}^{n} (y_i - \hat\mu_i) = 0$ as long as an intercept is included in the model.
Goodness of fit

• Second-order Taylor series expansion of $y \ln\frac{y}{\mu}$ around $y = \mu$:
$$y \ln\frac{y}{\mu} \cong (y - \mu) + \frac{1}{2\mu}(y - \mu)^2$$
• Pearson chi-square statistic:
$$D \cong 2\sum_{i=1}^{n} \left[(y_i - \hat\mu_i) + \frac{1}{2\hat\mu_i}(y_i - \hat\mu_i)^2 - (y_i - \hat\mu_i)\right] = \sum_{i=1}^{n} \frac{(y_i - \hat\mu_i)^2}{\hat\mu_i} = \chi^2$$
• Goodness of fit: either
  - $D$ (or $\chi^2$) $> \chi^2(1-\alpha,\, n-p-1)$, or
  - a standardized deviance or standardized Pearson chi-square statistic, $\dfrac{D}{n-p-1}$ or $\dfrac{\chi^2}{n-p-1}$, much larger than 1
  $\Rightarrow$ question the adequacy of the model
Residuals
• Deviance residuals:
$$d_i = \text{sign}(y_i - \hat\mu_i)\sqrt{2\left[y_i \ln\frac{y_i}{\hat\mu_i} - (y_i - \hat\mu_i)\right]}$$
• Pearson residuals:
$$r_i = \frac{y_i - \hat\mu_i}{\sqrt{\hat\mu_i}}$$
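The deviance, the Pearson chi-square statistic, and both sets of residuals can be computed together. The sketch below (synthetic data, minimal IRLS helper redefined for self-containment) handles the $y_i = 0$ limit of $y_i \ln(y_i/\hat\mu_i)$, which is 0:

```python
import numpy as np

def poisson_irls(X, y, n_iter=25):
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        z = X @ beta + (y - mu) / mu
        XtW = X.T * mu
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    return beta

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = rng.poisson(np.exp(X @ np.array([1.2, 0.3])))

mu_hat = np.exp(X @ poisson_irls(X, y))

# y_i ln(y_i / mu_i), taking the y = 0 limit as 0
term = np.where(y > 0, y * np.log(np.where(y > 0, y, 1.0) / mu_hat), 0.0)
# per-observation deviance contribution (clamp tiny negative rounding error)
inner = np.maximum(term - (y - mu_hat), 0.0)

deviance = 2 * np.sum(inner)
pearson = np.sum((y - mu_hat) ** 2 / mu_hat)

# Deviance and Pearson residuals; sum(d_i^2) = D, sum(r_i^2) = chi^2
d = np.sign(y - mu_hat) * np.sqrt(2 * inner)
r = (y - mu_hat) / np.sqrt(mu_hat)
print(round(deviance, 3), round(pearson, 3))
```

Note that the residuals recover the two goodness-of-fit statistics of the previous slide, and that $\sum (y_i - \hat\mu_i) = 0$ holds because the model includes an intercept.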