
CHAPTER 12 Generalized Linear Models and Poisson Regression

The Model

Generalized linear models

• Extensions of traditional linear models (e.g., the normal linear regression model)
• Allow
  (i) the population mean to depend on a linear predictor through a nonlinear link function
  (ii) the response distribution to be any member of the exponential family
• Three building blocks:

  – responses y1, y2, ..., yn following the same distribution from the exponential family
  – coefficients β and regressors x1, x2, ..., xp, combined in the linear predictor xi'β
  – a monotone link function: g(µi) = xi'β

Examples

• Standard linear model
  – g(µ) = µ (identity link)
  – y ∼ Normal
• Logistic regression model
  – g(µ) = ln[µ/(1 − µ)]
  – y ∼ Bernoulli (or binomial)
• Poisson regression model
  – g(µ) = ln µ
  – y ∼ Poisson

Poisson regression model

• Appropriate when the response represents a count
• Examples: the number of
  – daily equipment failures
  – weekly traffic fatalities
  – monthly insurance claims
  – ...
• Probability mass function:

  P(Y = y) = (µ^y / y!) e^{−µ},  y = 0, 1, 2, ...

• E(Y) = V(Y) = µ > 0 (checked numerically in the sketch below)
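As a quick plausibility check, the following sketch evaluates the pmf and verifies the mean-variance identity numerically (it assumes numpy and scipy are available; µ = 3.5 is an arbitrary illustration value, not from the slides):

```python
import math
import numpy as np
from scipy import stats

mu = 3.5                # arbitrary illustration value
y = np.arange(0, 40)    # truncation point; tail mass beyond 39 is negligible
pmf = np.array([math.exp(-mu) * mu**k / math.factorial(k) for k in y])

# Agrees with scipy's implementation of the same pmf
assert np.allclose(pmf, stats.poisson.pmf(y, mu))

# E(Y) = V(Y) = mu, up to truncation error
mean = np.sum(y * pmf)
var = np.sum((y - mean) ** 2 * pmf)
print(mean, var)  # both close to 3.5
```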

Poisson regression model (cont.)

• Link function:

g(µ) = ln µ = β0 + β1x1 + ··· + βpxp

• Then µ = exp(β0 + β1x1 + ··· + βpxp) > 0

• Interpretation: a unit change in x1 changes the mean by

  100 · {exp[β0 + β1(x1 + 1) + β2x2 + ··· + βpxp] − exp[β0 + β1x1 + β2x2 + ··· + βpxp]} / exp[β0 + β1x1 + β2x2 + ··· + βpxp]

  = 100[exp(β1) − 1] percent (illustrated numerically below)
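A one-line numeric illustration (β1 = 0.25 is a hypothetical coefficient value):

```python
import numpy as np

beta1 = 0.25  # hypothetical coefficient value
print(100 * (np.exp(beta1) - 1))  # ~28.4: a unit increase in x1 raises the mean by about 28.4%
```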

Estimation of the Parameters in the Poisson Regression Model

Maximum likelihood estimation

• Likelihood function:

  L(β | y1, y2, ..., yn) = ∏_{i=1}^n (µi^{yi} / yi!) exp(−µi)

• Log-likelihood function:

  ln L(β | y1, y2, ..., yn) = c + Σ_{i=1}^n yi ln µi − Σ_{i=1}^n µi,  where c = −Σ_{i=1}^n ln(yi!)

• Use the Newton-Raphson method (a code sketch of the log-likelihood follows below)
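A minimal sketch of this log-likelihood in Python, assuming a design matrix X whose ith row is xi' (with a leading column of ones for the intercept):

```python
import numpy as np
from scipy.special import gammaln

def poisson_loglik(beta, X, y):
    """ln L(beta | y) = c + sum_i y_i ln(mu_i) - sum_i mu_i, with mu_i = exp(x_i' beta)."""
    eta = X @ beta               # linear predictor x_i' beta
    mu = np.exp(eta)             # inverse of the log link
    c = -np.sum(gammaln(y + 1))  # c = -sum_i ln(y_i!)
    return c + np.sum(y * eta) - np.sum(mu)
```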

Maximum likelihood estimation (cont.)

• Derivative with respect to µi:

  ∂ ln L/∂µi = yi/µi − 1 = (yi − µi)/µi

• Note: ∂µi/∂β = µi xi, since µi = exp(xi'β)
• Hence, by the chain rule,

  ∂ ln L/∂β = Σ_{i=1}^n (∂ ln L/∂µi)(∂µi/∂β) = Σ_{i=1}^n [(yi − µi)/µi] µi xi = Σ_{i=1}^n (yi − µi) xi

• Maximum likelihood score equations:

  X'(y − µ) = 0

Newton-Raphson procedure

• First derivative of the negative log-likelihood:

  g = −∂ ln L/∂β = −Σ_{i=1}^n (yi − µi) xi

• Second derivatives:

  −∂² ln L/(∂βj ∂βj∗) = −∂/∂βj [Σ_{i=1}^n (yi − µi) xij∗] = Σ_{i=1}^n µi xij xij∗

• Hessian matrix:

  G = Σ_{i=1}^n µi xi xi'

• Newton-Raphson iteration (sketched in code below):

  β∗ = β̃ − [G(β̃)]^{−1} g(β̃)
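A minimal Newton-Raphson sketch following the g and G above (same assumptions on X as before; the zero starting value is a common but arbitrary choice):

```python
import numpy as np

def poisson_newton_raphson(X, y, max_iter=25, tol=1e-10):
    beta = np.zeros(X.shape[1])      # arbitrary starting value
    for _ in range(max_iter):
        mu = np.exp(X @ beta)
        g = -X.T @ (y - mu)          # gradient of the negative log-likelihood
        G = X.T @ (mu[:, None] * X)  # Hessian: sum_i mu_i x_i x_i'
        step = np.linalg.solve(G, g)
        beta = beta - step           # beta* = beta~ - G^{-1} g
        if np.max(np.abs(step)) < tol:
            break
    return beta
```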

Iteratively reweighted least squares (IRLS) algorithm

• Iteratively computed working response:

  zi = xi'β̃ + (yi − µ̃i)/µ̃i

• Weighted linear regression of zi on xi with weights wi = µ̃i
• Equivalent to the Newton-Raphson iteration, as shown next

Iteratively reweighted least squares (IRLS) algorithm (cont.)

• Weighted least squares (WLS) estimate:

  β̂_WLS = [Σ_{i=1}^n wi xi xi']^{−1} [Σ_{i=1}^n wi xi zi] = [Σ_{i=1}^n µ̃i xi xi']^{−1} [Σ_{i=1}^n µ̃i xi zi],

  with

  Σ_{i=1}^n µ̃i xi zi = Σ_{i=1}^n µ̃i xi [xi'β̃ + (yi − µ̃i)/µ̃i] = [Σ_{i=1}^n µ̃i xi xi'] β̃ + Σ_{i=1}^n (yi − µ̃i) xi = Gβ̃ − g

• Hence,

  β̂_WLS = G^{−1}(Gβ̃ − g) = β̃ − G^{−1}g,

  which is exactly the Newton-Raphson update (see the sketch below)
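The same update written as an explicit weighted least-squares step; iterating it to convergence reproduces the Newton-Raphson estimate above (a sketch under the same assumptions on X):

```python
import numpy as np

def irls_step(X, y, beta_tilde):
    mu = np.exp(X @ beta_tilde)
    z = X @ beta_tilde + (y - mu) / mu  # working response z_i
    w = mu                              # weights w_i = mu~_i
    XtW = X.T * w                       # columns of X' scaled by the weights
    return np.linalg.solve(XtW @ X, XtW @ z)  # WLS estimate

# Iterate until successive estimates agree, e.g.:
# beta = np.zeros(X.shape[1])
# for _ in range(25):
#     beta = irls_step(X, y, beta)
```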

Inference in the Poisson Regression Model

Likelihood ratio tests

• LR test statistic:

  T = 2 ln [L(full)/L(restricted)] = 2{ln L(full) − ln L(restricted)}

• Under H0, T ∼ χ²(df), where df = the number of independent constraints
• Reject H0 if T > χ²(1 − α, df)
• Can be used to test the significance of (see the sketch after this list)
  – an individual coefficient (a partial test)
  – two or more coefficients simultaneously
  – all coefficients (a test of overall regression)
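A hedged sketch of the LR test using statsmodels (an assumed dependency, not referenced in the slides; X_full and X_restricted are design matrices that differ by the constrained columns):

```python
import statsmodels.api as sm
from scipy import stats

def lr_test(y, X_full, X_restricted):
    ll_full = sm.GLM(y, X_full, family=sm.families.Poisson()).fit().llf
    ll_restr = sm.GLM(y, X_restricted, family=sm.families.Poisson()).fit().llf
    T = 2 * (ll_full - ll_restr)
    df = X_full.shape[1] - X_restricted.shape[1]  # number of independent constraints
    return T, stats.chi2.sf(T, df)  # reject H0 when the p-value is below alpha
```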

Standard errors of the maximum likelihood estimates and Wald tests

• Estimate of the covariance matrix:

  V(β̂) ≅ G^{−1} = [Σ_{i=1}^n µ̂i xi xi']^{−1}

• Wald confidence intervals for βj (computed in the sketch below):

  β̂j ± (1.96) s.e.(β̂j)
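Continuing the Newton-Raphson sketch, standard errors and Wald intervals fall out of G^{−1} (illustration code, not library output):

```python
import numpy as np

def wald_intervals(X, beta_hat, z=1.96):
    mu = np.exp(X @ beta_hat)
    cov = np.linalg.inv(X.T @ (mu[:, None] * X))  # V(beta_hat) ~ G^{-1}
    se = np.sqrt(np.diag(cov))                    # s.e.(beta_hat_j)
    return beta_hat - z * se, beta_hat + z * se
```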

Standard errors of the estimated mean

• Estimate of the ith mean: µ̂i = exp(xi'β̂)
• Taylor series expansion:

  µ̂i ≅ µi + (β̂ − β)' [∂µ̂i/∂β̂]_{β̂=β} = µi + µi xi'(β̂ − β)

• Estimated variance of µ̂i:

  V(µ̂i) ≅ µi² xi' V(β̂) xi ≅ µ̂i² xi' [Σ_{i=1}^n µ̂i xi xi']^{−1} xi

• Approximate 95% confidence intervals for µi (see the delta-method sketch below):

  µ̂i ± (1.96) µ̂i √(xi' [Σ_{i=1}^n µ̂i xi xi']^{−1} xi)
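A delta-method sketch matching the interval formula above (same assumptions as the previous snippets):

```python
import numpy as np

def mean_intervals(X, beta_hat, z=1.96):
    mu = np.exp(X @ beta_hat)
    cov = np.linalg.inv(X.T @ (mu[:, None] * X))  # [sum_i mu_hat_i x_i x_i']^{-1}
    quad = np.einsum("ij,jk,ik->i", X, cov, X)    # x_i' cov x_i, one value per i
    half = z * mu * np.sqrt(quad)
    return mu - half, mu + half
```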

Deviance

• Saturated model:
  – the ith log-likelihood contribution, yi ln(µi) − µi, is maximized at µi = yi
  – log-likelihood function: c + Σ_{i=1}^n [yi ln(yi) − yi]
• Deviance (computed directly in the sketch below):

  D = 2{ln L(saturated) − ln L(parameterized)}
    = 2{Σ_{i=1}^n [yi ln(yi) − yi] − Σ_{i=1}^n [yi ln(µ̂i) − µ̂i]}
    = 2 Σ_{i=1}^n [yi ln(yi/µ̂i) − (yi − µ̂i)] = 2 Σ_{i=1}^n yi ln(yi/µ̂i),

  since Σ_{i=1}^n (yi − µ̂i) = 0 as long as an intercept is included in the model
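Computing D directly from its definition (a sketch; terms with yi = 0 contribute 0 to yi ln(yi/µ̂i) and are guarded accordingly):

```python
import numpy as np

def poisson_deviance(y, mu_hat):
    # y_i * ln(y_i / mu_hat_i), with the y_i = 0 terms taken as 0
    log_term = np.where(y > 0, y * np.log(np.where(y > 0, y, 1) / mu_hat), 0.0)
    return 2 * np.sum(log_term - (y - mu_hat))
```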

Goodness of fit

• Second-order Taylor series expansion of y ln(y/µ) around y = µ:

  y ln(y/µ) ≅ (y − µ) + (1/(2µ))(y − µ)²

• Pearson chi-square statistic:

  D ≅ Σ_{i=1}^n 2[(yi − µ̂i) + (1/(2µ̂i))(yi − µ̂i)² − (yi − µ̂i)] = Σ_{i=1}^n (yi − µ̂i)²/µ̂i = χ²

• Goodness-of-fit check (a numeric version follows below): if
  – D (or χ²) > χ²(1 − α, n − p − 1), or
  – the standardized deviance or standardized Pearson chi-square statistic, D/(n − p − 1) or χ²/(n − p − 1), is ≫ 1,

  ⇒ question the adequacy of the model
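A small goodness-of-fit check against the chi-square reference distribution (a sketch; here p is the number of regressors excluding the intercept, matching the n − p − 1 degrees of freedom above):

```python
import numpy as np
from scipy import stats

def gof_check(y, mu_hat, p, alpha=0.05):
    chi2_stat = np.sum((y - mu_hat) ** 2 / mu_hat)  # Pearson chi-square
    df = len(y) - p - 1
    critical = stats.chi2.ppf(1 - alpha, df)
    return chi2_stat, critical, chi2_stat / df  # standardized value >> 1 is suspect
```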

Residuals

• Deviance residuals:

  di = sign(yi − µ̂i) √{2[yi ln(yi/µ̂i) − (yi − µ̂i)]}

• Pearson residuals:

  ri = (yi − µ̂i)/√µ̂i

• Both are computed in the sketch below
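Both residual types in a few lines (a sketch; the yi = 0 guard is the same as in the deviance computation):

```python
import numpy as np

def poisson_residuals(y, mu_hat):
    log_term = np.where(y > 0, y * np.log(np.where(y > 0, y, 1) / mu_hat), 0.0)
    d = np.sign(y - mu_hat) * np.sqrt(2 * (log_term - (y - mu_hat)))  # deviance residuals
    r = (y - mu_hat) / np.sqrt(mu_hat)                                # Pearson residuals
    return d, r
```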
