CHAPTER 12 Generalized Linear Models and Poisson Regression
The Model
Generalized linear models
• Extensions of traditional linear models (e.g., the logistic regression model)
• Allow
  (i) the population mean to depend on a linear predictor through a nonlinear link function
  (ii) the response distribution to be any member of the exponential family
• Three building blocks:
  - Responses $y_1, y_2, \ldots, y_n$ following the same distribution from the exponential family
  - Parameters $\beta$ and regressors $x_1, x_2, \ldots, x_p$
  - A monotone link function: $g(\mu_i) = x_i'\beta$
Examples
• Standard linear regression model
  - $g(\mu) = \mu$
  - $y \sim$ Normal
• Logistic regression model
  - $g(\mu) = \ln\dfrac{\mu}{1-\mu}$
  - $y \sim$ Bernoulli (or binomial)
• Poisson regression model
  - $g(\mu) = \ln\mu$
  - $y \sim$ Poisson
Poisson regression model
• Used when the response represents count data
• Examples: the number of
  - daily equipment failures
  - weekly traffic fatalities
  - monthly insurance claims
  - ...
• Poisson distribution:
$$P(Y = y) = \frac{\mu^y}{y!}\, e^{-\mu}, \qquad y = 0, 1, 2, \ldots$$
• $E(Y) = V(Y) = \mu > 0$
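As a numerical sanity check on the distribution above, the sketch below (with an arbitrary illustrative $\mu = 3$) evaluates the Poisson probabilities and confirms that they sum to one and that the mean and variance both equal $\mu$:

```python
import math

def poisson_pmf(y, mu):
    """P(Y = y) = mu^y * exp(-mu) / y! for y = 0, 1, 2, ..."""
    return mu ** y * math.exp(-mu) / math.factorial(y)

mu = 3.0  # illustrative value, not from any data set
# Truncate the infinite sums far out in the tail; for mu = 3 the
# mass beyond y = 100 is negligible.
ys = range(100)
total = sum(poisson_pmf(y, mu) for y in ys)          # should be ~1
mean = sum(y * poisson_pmf(y, mu) for y in ys)       # E(Y) = mu
var = sum((y - mean) ** 2 * poisson_pmf(y, mu) for y in ys)  # V(Y) = mu
print(round(total, 6), round(mean, 6), round(var, 6))
```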
Poisson regression model (cont.)
• Link function:
$$g(\mu) = \ln\mu = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$$
• Then
$$\mu = \exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p) > 0$$
• Interpretation: a one-unit change in $x_1$ changes the mean by
$$100\,\frac{\exp[\beta_0 + \beta_1(x_1+1) + \beta_2 x_2 + \cdots + \beta_p x_p] - \exp[\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p]}{\exp[\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p]}$$
$$= 100[\exp(\beta_1) - 1] \text{ percent}$$
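A quick numerical check of this interpretation, using arbitrary illustrative coefficient values (not taken from any fitted model):

```python
import math

# Illustrative coefficients and covariate values (invented for the example)
b0, b1, b2 = 0.5, 0.3, -0.2
x1, x2 = 2.0, 1.0

mu_before = math.exp(b0 + b1 * x1 + b2 * x2)
mu_after = math.exp(b0 + b1 * (x1 + 1) + b2 * x2)

# Percent change in the mean from a one-unit increase in x1 ...
pct_direct = 100 * (mu_after - mu_before) / mu_before
# ... matches 100[exp(b1) - 1], regardless of the values of x1 and x2
pct_formula = 100 * (math.exp(b1) - 1)
print(round(pct_direct, 4), round(pct_formula, 4))
```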
Estimation of the Parameters in the Poisson Regression Model
Maximum likelihood estimation
• Likelihood function:
$$L(\beta \mid y_1, y_2, \ldots, y_n) = \prod_{i=1}^{n} \frac{\mu_i^{y_i}}{y_i!} \exp(-\mu_i)$$
• Log-likelihood function:
$$\ln L(\beta \mid y_1, y_2, \ldots, y_n) = c + \sum_{i=1}^{n} y_i \ln \mu_i - \sum_{i=1}^{n} \mu_i,$$
where $c = -\sum \ln(y_i!)$
• Use the Newton-Raphson method
Maximum likelihood estimation (cont.)
• Derivative of the log-likelihood with respect to $\mu_i$:
$$\frac{\partial \ln L}{\partial \mu_i} = \frac{y_i}{\mu_i} - 1 = \frac{y_i - \mu_i}{\mu_i}$$
• Note: $\partial \mu_i / \partial \beta = \mu_i x_i$
• Hence,
$$\frac{\partial \ln L}{\partial \beta} = \sum_{i=1}^{n} \frac{\partial \ln L}{\partial \mu_i}\,\frac{\partial \mu_i}{\partial \beta} = \sum_{i=1}^{n} \frac{y_i - \mu_i}{\mu_i}\,\mu_i x_i = \sum_{i=1}^{n} (y_i - \mu_i)\, x_i$$
• Maximum likelihood score equations:
$$X'(y - \mu) = 0$$
Newton-Raphson procedure

• First derivative of the negative log-likelihood:
$$g = -\frac{\partial \ln L}{\partial \beta} = -\sum_{i=1}^{n} (y_i - \mu_i)\, x_i$$
• Second derivative:
$$-\frac{\partial^2 \ln L}{\partial \beta_j\, \partial \beta_{j^*}} = -\frac{\partial}{\partial \beta_j}\left\{\sum_{i=1}^{n} (y_i - \mu_i)\, x_{ij^*}\right\} = \sum_{i=1}^{n} \mu_i\, x_{ij}\, x_{ij^*}$$
• Hessian matrix:
$$G = \sum_{i=1}^{n} \mu_i\, x_i x_i'$$
• Newton-Raphson iteration:
$$\beta^* = \tilde\beta - [G(\tilde\beta)]^{-1} g(\tilde\beta)$$
Iteratively reweighted least squares (IRLS) algorithm
• Iteratively computed (working) response:
$$z_i = x_i'\tilde\beta + \frac{1}{\tilde\mu_i}(y_i - \tilde\mu_i)$$
• Weighted linear regression of $z_i$ on $x_i$ with weights $w_i = \tilde\mu_i$
• Equivalent to the Newton-Raphson iteration
Iteratively reweighted least squares (IRLS) algorithm (cont.)
• Weighted least squares (WLS) estimate:
$$\hat\beta^{\text{WLS}} = \left[\sum_{i=1}^{n} w_i x_i x_i'\right]^{-1} \left[\sum_{i=1}^{n} w_i x_i z_i\right] = \left[\sum_{i=1}^{n} \tilde\mu_i x_i x_i'\right]^{-1} \left[\sum_{i=1}^{n} \tilde\mu_i x_i z_i\right]$$
with
$$\sum_{i=1}^{n} \tilde\mu_i x_i z_i = \sum_{i=1}^{n} \tilde\mu_i x_i \left[x_i'\tilde\beta + \frac{1}{\tilde\mu_i}(y_i - \tilde\mu_i)\right] = \left[\sum_{i=1}^{n} \tilde\mu_i x_i x_i'\right]\tilde\beta + \sum_{i=1}^{n} (y_i - \tilde\mu_i)\, x_i = G\tilde\beta - g$$
• Hence,
$$\hat\beta^{\text{WLS}} = G^{-1}(G\tilde\beta - g) = \tilde\beta - G^{-1}g$$
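The IRLS update can be sketched directly in a few lines. This is a minimal illustration on synthetic data (no convergence check or step-halving), not production code:

```python
import numpy as np

def poisson_irls(X, y, n_iter=25):
    """Fit a Poisson regression by IRLS.

    X: (n, p+1) design matrix including an intercept column.
    y: (n,) vector of counts.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)           # current fitted means
        z = X @ beta + (y - mu) / mu    # working response z_i
        XtW = X.T * mu                  # weights w_i = mu_i
        # Weighted least squares step: (X'WX)^{-1} X'Wz
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    return beta

# Small synthetic example (data are illustrative)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
true_beta = np.array([1.0, 0.5])
y = rng.poisson(np.exp(X @ true_beta))
print(poisson_irls(X, y))
```

Because each weighted least squares solve reproduces the Newton-Raphson step $\tilde\beta - G^{-1}g$, the loop converges to the maximum likelihood estimate, at which the score equations $X'(y - \hat\mu) = 0$ hold.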
Inference in the Poisson Regression Model
Likelihood ratio tests
• LR test statistic:
$$T = 2\ln\frac{L(\text{full})}{L(\text{restricted})} = 2\{\ln L(\text{full}) - \ln L(\text{restricted})\}$$
• Under $H_0$, $T \sim \chi^2(\text{df})$, where df = the number of independent constraints
• Reject $H_0$ if $T > \chi^2(1-\alpha, \text{df})$
• Can be used to test the significance of
  - an individual coefficient (a partial test)
  - two or more coefficients simultaneously
  - all coefficients (test of overall regression)
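As an illustration, the sketch below tests $H_0: \beta_1 = 0$ on synthetic data by fitting the full and intercept-only models with a minimal IRLS helper (redefined here so the sketch is self-contained) and comparing $T$ to the tabulated critical value $\chi^2(0.95, 1) = 3.841$; the data and coefficients are invented for the example:

```python
import numpy as np

def poisson_loglik(X, y, beta):
    """Log-likelihood up to the constant c = -sum(log(y_i!)),
    which cancels in the LR statistic."""
    mu = np.exp(X @ beta)
    return np.sum(y * np.log(mu) - mu)

def poisson_irls(X, y, n_iter=25):
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        z = X @ beta + (y - mu) / mu
        XtW = X.T * mu
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    return beta

rng = np.random.default_rng(1)
x1 = rng.normal(size=400)
X_full = np.column_stack([np.ones(400), x1])
y = rng.poisson(np.exp(0.8 + 0.4 * x1))

# Full model vs. restricted model (one constraint: beta_1 = 0)
beta_full = poisson_irls(X_full, y)
beta_restr = poisson_irls(X_full[:, :1], y)
T = 2 * (poisson_loglik(X_full, y, beta_full)
         - poisson_loglik(X_full[:, :1], y, beta_restr))
# df = 1; chi^2(0.95, 1) = 3.841 from standard tables
print(T > 3.841)
```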
Standard errors of the maximum likelihood estimates and Wald tests
• Estimate of the covariance matrix:
$$\hat V(\hat\beta) \cong G^{-1} = \left[\sum_{i=1}^{n} \hat\mu_i x_i x_i'\right]^{-1}$$
• Wald 95% confidence intervals for $\beta_j$:
$$\hat\beta_j \pm (1.96)\, \text{s.e.}(\hat\beta_j)$$
Standard errors of the estimated mean

• Estimate of the $i$th mean: $\hat\mu_i = \exp(x_i'\hat\beta)$
• Taylor series expansion:
$$\hat\mu_i \cong \mu_i + (\hat\beta - \beta)' \left.\frac{\partial \hat\mu_i}{\partial \hat\beta}\right|_{\hat\beta = \beta} = \mu_i + \mu_i x_i'(\hat\beta - \beta)$$
• Estimated variance of $\hat\mu_i$:
$$\hat V(\hat\mu_i) \cong \hat\mu_i^2\, x_i'\, \hat V(\hat\beta)\, x_i = \hat\mu_i^2\, x_i' \left[\sum_{j=1}^{n} \hat\mu_j x_j x_j'\right]^{-1} x_i$$
• Approximate 95% confidence intervals for $\mu_i$:
$$\hat\mu_i \pm (1.96)\, \hat\mu_i \sqrt{x_i' \left[\sum_{j=1}^{n} \hat\mu_j x_j x_j'\right]^{-1} x_i}$$
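Putting the last two slides together, a sketch (synthetic data, minimal IRLS helper redefined for self-containment) that computes the estimated covariance matrix, Wald intervals for the coefficients, and an approximate interval for one fitted mean:

```python
import numpy as np

def poisson_irls(X, y, n_iter=25):
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        z = X @ beta + (y - mu) / mu
        XtW = X.T * mu
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    return beta

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(300), rng.normal(size=300)])
y = rng.poisson(np.exp(X @ np.array([1.0, 0.5])))

beta_hat = poisson_irls(X, y)
mu_hat = np.exp(X @ beta_hat)

# V(beta_hat) ~= G^{-1} = [sum mu_i x_i x_i']^{-1}
G = (X.T * mu_hat) @ X
cov = np.linalg.inv(G)
se = np.sqrt(np.diag(cov))

# 95% Wald intervals for the coefficients
ci_beta = np.column_stack([beta_hat - 1.96 * se, beta_hat + 1.96 * se])

# Approximate 95% interval for the first fitted mean
x0 = X[0]
se_mu0 = mu_hat[0] * np.sqrt(x0 @ cov @ x0)
ci_mu0 = (mu_hat[0] - 1.96 * se_mu0, mu_hat[0] + 1.96 * se_mu0)
print(ci_beta, ci_mu0)
```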
Deviance

• Saturated model:
  - the $i$th log-likelihood contribution, $y_i \ln(\mu_i) - \mu_i$, is maximized at $\mu_i = y_i$
  - log-likelihood function: $c + \sum_{i=1}^{n} [y_i \ln(y_i) - y_i]$
• Deviance:
$$D = 2\{\ln L(\text{saturated}) - \ln L(\text{parameterized})\} = 2\left\{\sum_{i=1}^{n} [y_i \ln(y_i) - y_i] - \sum_{i=1}^{n} [y_i \ln(\hat\mu_i) - \hat\mu_i]\right\}$$
$$= 2\sum_{i=1}^{n} \left[y_i \ln\frac{y_i}{\hat\mu_i} - (y_i - \hat\mu_i)\right] = 2\sum_{i=1}^{n} y_i \ln\frac{y_i}{\hat\mu_i},$$
since $\sum_{i=1}^{n} (y_i - \hat\mu_i) = 0$ as long as an intercept is included in the model.
Goodness of fit

• Second-order Taylor series expansion of $y \ln\frac{y}{\mu}$ around $y = \mu$:
$$y \ln\frac{y}{\mu} \cong (y - \mu) + \frac{1}{2\mu}(y - \mu)^2$$
• Pearson chi-square statistic:
$$D \cong 2\sum_{i=1}^{n} \left[(y_i - \hat\mu_i) + \frac{1}{2\hat\mu_i}(y_i - \hat\mu_i)^2 - (y_i - \hat\mu_i)\right] = \sum_{i=1}^{n} \frac{(y_i - \hat\mu_i)^2}{\hat\mu_i} = \chi^2$$
• Goodness of fit: either
  - $D$ (or $\chi^2$) $> \chi^2(1-\alpha,\, n-p-1)$, or
  - a standardized deviance or standardized Pearson chi-square statistic, $\dfrac{D}{n-p-1}$ or $\dfrac{\chi^2}{n-p-1}$, much larger than 1
  $\Rightarrow$ question the adequacy of the model
Residuals
• Deviance residuals:
$$d_i = \text{sign}(y_i - \hat\mu_i)\sqrt{2\left[y_i \ln\frac{y_i}{\hat\mu_i} - (y_i - \hat\mu_i)\right]}$$
• Pearson residuals:
$$r_i = \frac{y_i - \hat\mu_i}{\sqrt{\hat\mu_i}}$$
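The deviance, the Pearson chi-square statistic, and both sets of residuals can be computed together. The sketch below (synthetic data, minimal IRLS helper redefined for self-containment) handles the $y_i = 0$ limit of $y_i \ln(y_i/\hat\mu_i)$, which is 0:

```python
import numpy as np

def poisson_irls(X, y, n_iter=25):
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        z = X @ beta + (y - mu) / mu
        XtW = X.T * mu
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    return beta

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = rng.poisson(np.exp(X @ np.array([1.2, 0.3])))

mu_hat = np.exp(X @ poisson_irls(X, y))

# y_i ln(y_i / mu_i), taking the y = 0 limit as 0
term = np.where(y > 0, y * np.log(np.where(y > 0, y, 1.0) / mu_hat), 0.0)
# per-observation deviance contribution (clamp tiny negative rounding error)
inner = np.maximum(term - (y - mu_hat), 0.0)

deviance = 2 * np.sum(inner)
pearson = np.sum((y - mu_hat) ** 2 / mu_hat)

# Deviance and Pearson residuals; sum(d_i^2) = D, sum(r_i^2) = chi^2
d = np.sign(y - mu_hat) * np.sqrt(2 * inner)
r = (y - mu_hat) / np.sqrt(mu_hat)
print(round(deviance, 3), round(pearson, 3))
```

Note that the residuals recover the two goodness-of-fit statistics of the previous slide, and that $\sum (y_i - \hat\mu_i) = 0$ holds because the model includes an intercept.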