Chapter 5 Building Models with Many Predictors

I 5.1 Model selection
I 5.2 Model checking (Deviance, Residuals)
I 5.3 Watch out for “sparse” categorical data

Example (Horseshoe Crabs)

Y = whether female crab has satellites (1 = yes, 0 = no).

Explanatory variables:
I Weight
I Width
I Color (ML, M, MD, D) w/ dummy vars c1, c2, c3
I Spine condition (3 categories) w/ dummy vars s1, s2

Consider model for crabs:

logit(P(Y = 1)) = α + β1 c1 + β2 c2 + β3 c3 + β4 s1 + β5 s2 + β6 weight + β7 width
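The output on the next slide comes from a fit of this form. A minimal sketch of such a fit follows; the file name, data-frame name, and column names are assumptions for illustration, not taken from the notes.

# Sketch only: fit the crab model with factor dummies for color and spine.
# "crab.dat", crabs, y, color, spine, weight, width are assumed names.
crabs <- read.table("crab.dat", header = TRUE)
crabs$color <- factor(crabs$color)   # 4 categories -> dummy vars c1, c2, c3
crabs$spine <- factor(crabs$spine)   # 3 categories -> dummy vars s1, s2
crab.fit <- glm(y ~ color + spine + weight + width,
                family = binomial, data = crabs)
summary(crab.fit)   # Wald z tests, null/residual deviances, AIC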


Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -8.06501    3.92855  -2.053   0.0401 *
C2          -0.10290    0.78259  -0.131   0.8954
C3          -0.48886    0.85312  -0.573   0.5666
C4          -1.60867    0.93553  -1.720   0.0855 .
S2          -0.09598    0.70337  -0.136   0.8915
S3           0.40029    0.50270   0.796   0.4259
Weight       0.82578    0.70383   1.173   0.2407
Width        0.26313    0.19530   1.347   0.1779
---
(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 225.76  on 172  degrees of freedom
Residual deviance: 185.20  on 165  degrees of freedom
AIC: 201.2

None of the terms is significant in the Wald tests, but...

I Residual deviance: 185.20 is the deviance of the fitted model
  Ha: logit(P(Y = 1)) = α + β1 c1 + β2 c2 + β3 c3 + β4 s1 + β5 s2 + β6 weight + β7 width.
I Null deviance: 225.76 is the deviance under the model
  H0: logit(P(Y = 1)) = α,
  i.e., H0: β1 = β2 = ··· = β7 = 0 in the model under Ha, which means none of the predictors has an effect.
I LR = −2(L0 − L1) = diff. of deviances = 225.76 − 185.20 = 40.56, df = 7, P-value < 0.0001.
  Strong evidence that at least one predictor has an effect.
I But NONE of the terms is significant in the Wald tests. Why?
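The likelihood-ratio test above can be reproduced directly from the two reported deviances; a minimal sketch (crab.fit is the assumed fit sketched earlier):

# LR statistic = null deviance - residual deviance; df = difference of the dfs
LR <- 225.76 - 185.20                            # 40.56
pchisq(LR, df = 172 - 165, lower.tail = FALSE)   # P-value < 0.0001
# equivalently, from the fitted glm object:
pchisq(crab.fit$null.deviance - crab.fit$deviance,
       df = crab.fit$df.null - crab.fit$df.residual, lower.tail = FALSE)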

Multicollinearity

Multicollinearity, which means “strong correlations among predictors”, causes troubles in linear models and GLMs.
E.g., Corr(weight, width) = 0.89.

[Figure: scatterplot of Weight (kg) against Width (cm) for the female crabs, showing a strong linear relationship.]

Recall βi is the partial effect of xi on the response, controlling for the other variables in the model.
Sufficient to pick one of Weight and Width for a model.

Backward Elimination

1. Start with a complex model (e.g., including all predictors and interactions).
2. Drop the “least significant” (i.e., largest P-value) variable among the highest-order terms.
   I Cannot remove a main effect term w/o removing its higher-order interactions.
   I Cannot remove a single dummy var. of a categorical predictor w/ > 2 categories.
3. Refit the model.
4. Continue until all variables left are “significant”.

I Other automatic model selection procedures: forward selection, stepwise.

Akaike Information Criterion (AIC)

Akaike information criterion (AIC) is a model selection criterion that selects the model minimizing

AIC = −2(maximized log-likelihood) + 2(num. of parameters).

I AIC prefers simple models (few parameters) with good fit.
I AIC can be used to compare models where neither is a special case of the other, e.g., binomial models w/ diff. link functions.
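In R, AIC values can be extracted from fitted glm objects and compared directly, even for non-nested models; a minimal sketch using the assumed crab fit from earlier and a smaller competing model:

# Smaller AIC is better; AIC() accepts several fitted models at once.
crab.fit2 <- glm(y ~ weight, family = binomial, data = crabs)   # assumed names
AIC(crab.fit, crab.fit2)
# AIC from its definition, matching the value reported by AIC():
-2 * as.numeric(logLik(crab.fit)) + 2 * length(coef(crab.fit))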

Example (Mouse Muscle Tension, Revisited)

We demonstrate the backward elimination procedure for the Mouse Muscle Tension data.

> mouse.muscle = read.table("mousemuscle.dat",header=T)
> mouse.muscle
     W M D tension.high tension.low
1 High 1 1            3           3
2 High 1 2           21          10
3 High 2 1           23          41
4 High 2 2           11          21
5  Low 1 1           22          45
6  Low 1 2           32          23
7  Low 2 1            4           6
8  Low 2 2           12          22

> attach(mouse.muscle)
> T = cbind(tension.high, tension.low)   # response
> M = as.factor(M)                       # Muscle Type
> D = as.factor(D)                       # Drug

Backward elimination starts from the most complex model (the 3-way model), and then checks the significance of the highest-order term (the 3-way interaction).

> glm3 = glm(T ~ W*M*D, family=binomial)
> glm2 = glm(T ~ W*M + M*D + W*D, family=binomial)
> anova(glm2, glm3, test="Chisq")
Analysis of Deviance Table

Model 1: T ~ W * M + M * D + W * D
Model 2: T ~ W * M * D
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1         1      0.111
2         0      0.000  1    0.111    0.739

The 3-way interaction is not significant.

An alternative way to check significance:

> drop1(glm3, test="Chisq")
Single term deletions

Model:
T ~ W * M * D
       Df Deviance    AIC   LRT Pr(>Chi)
<none>      0.000 46.117
W:M:D   1   0.111 44.228 0.111    0.739

Only the 3-way interaction is shown in the output of drop1 because drop1 drops one term at a time; the other lower-order terms (W, M, D, W*M, M*D, W*D) cannot be dropped if the 3-way interaction is in the model.


After eliminating the insignificant 3-way interaction, we consider the model with all 2-way interactions.

> glm2 = glm(T ~ W*M + M*D + W*D, family=binomial)
> drop1(glm2, test="Chisq")
Single term deletions

Model:
T ~ W * M + M * D + W * D
       Df Deviance    AIC     LRT Pr(>Chi)
<none>    0.11100 44.228
W:M     1 1.05827 43.175 0.94727   0.3304
M:D     1 2.80985 44.926 2.69886   0.1004
W:D     1 0.11952 42.236 0.00852   0.9264

Among the highest-order terms (the 2-way interactions), W:D has the largest P-value and hence is the least significant, so W:D is eliminated.

After eliminating W:D, we fit the model

W*M + M*D = W + M + D + W*M + M*D

> glm2a = glm(T ~ W*M + M*D, family=binomial)
> drop1(glm2a, test="Chisq")
Single term deletions

Model:
T ~ W * M + M * D
       Df Deviance    AIC    LRT Pr(>Chi)
<none>     0.1195 42.236
W:M     1  1.0596 41.176 0.9401  0.33225
M:D     1  4.6644 44.781 4.5449  0.03302 *

This time, W:M is eliminated, for it has the largest P-value among the two-way interaction terms.

After eliminating W:M, we fit the model W + M*D.
Note that W is still in the model, as we only eliminated W:M from the model W*M + M*D.

> # glm2b = glm(T ~ M*D, family=binomial)   # not this one!
> glm2b = glm(T ~ W + M*D, family=binomial)
> drop1(glm2b, test="Chisq")
Single term deletions

Model:
T ~ W + M * D
       Df Deviance    AIC    LRT Pr(>Chi)
<none>     1.0596 41.176
W       1  1.5289 39.646 0.4693  0.49332
M:D     1  5.3106 43.427 4.2510  0.03923 *

Though W is of lower order than M:D, W is not a component of M:D. The model is still hierarchical if we drop W and keep M:D.

Then we check the model M*D, as M:D is significant. We cannot eliminate further, or the model would not be hierarchical.

> glm2c = glm(T ~ M*D, family=binomial)
> drop1(glm2c, test="Chisq")
Single term deletions

Model:
T ~ M * D
       Df Deviance    AIC   LRT Pr(>Chi)
<none>     1.5289 39.646
M:D     1  7.6979 43.814 6.169    0.013 *

The model selected by the backward elimination procedure is M*D. This model also has the smallest AIC value, 39.646, among all models considered.


Backward Elimination in R

The R function step() can do the backward elimination procedure we’ve just done automatically.

> step(glm3, test="Chisq")
Start:  AIC=46.12
T ~ W * M * D
        Df Deviance    AIC   LRT Pr(>Chi)
- W:M:D  1    0.111 44.228 0.111    0.739
<none>        0.000 46.117

Step:  AIC=44.23
T ~ W + M + D + W:M + W:D + M:D
        Df Deviance    AIC     LRT Pr(>Chi)
- W:D    1  0.11952 42.236 0.00852   0.9264
- W:M    1  1.05827 43.175 0.94727   0.3304
<none>      0.11100 44.228
- M:D    1  2.80985 44.926 2.69886   0.1004

Step:  AIC=42.24
T ~ W + M + D + W:M + M:D
        Df Deviance    AIC    LRT Pr(>Chi)
- W:M    1   1.0596 41.176 0.9401  0.33225
<none>       0.1195 42.236
- M:D    1   4.6644 44.781 4.5449  0.03302 *

Step:  AIC=41.18
T ~ W + M + D + M:D
       Df Deviance    AIC    LRT Pr(>Chi)
- W     1   1.5289 39.646 0.4693  0.49332
<none>      1.0596 41.176
- M:D   1   5.3106 43.427 4.2510  0.03923 *

Step:  AIC=39.65
T ~ M + D + M:D
       Df Deviance    AIC   LRT Pr(>Chi)
<none>     1.5289 39.646
- M:D   1  7.6979 43.814 6.169    0.013 *

Call:  glm(formula = T ~ M + D + M:D, family = binomial)

Coefficients:
(Intercept)          M2          D2       M2:D2
   -0.65233     0.09801     1.12611    -1.19750

Degrees of Freedom: 7 Total (i.e. Null);  4 Residual
Null Deviance:     19.02
Residual Deviance: 1.529   AIC: 39.65

Forward Selection in R

The R function step() can also do forward selection, which starts with a model with only an intercept (~1); the most significant variable is added at each step, until none of the remaining variables is “significant” when added to the model.

To run forward selection, you’ll need to specify the “scope” of the search.

> step(glm(T ~ 1, family=binomial), scope=~W*M*D,
       direction="forward", test="Chisq")
Start:  AIC=51.14
T ~ 1
       Df Deviance    AIC    LRT Pr(>Chi)
+ D     1   12.460 46.577 6.5586  0.01044 *
+ M     1   13.579 47.695 5.4405  0.01967 *
<none>      19.019 51.136
+ W     1   18.957 53.073 0.0626  0.80251

Step:  AIC=46.58
T ~ D
       Df Deviance    AIC    LRT Pr(>Chi)
+ M     1   7.6979 43.814 4.7627  0.02908 *
<none>     12.4605 46.577
+ W     1  12.2889 48.406 0.1716  0.67871

Step:  AIC=43.81
T ~ D + M
       Df Deviance    AIC    LRT Pr(>Chi)
+ M:D   1   1.5289 39.646 6.1690   0.0130 *
+ W     1   5.3106 43.427 2.3872   0.1223
<none>      7.6979 43.814

Step:  AIC=39.65
T ~ D + M + D:M
       Df Deviance    AIC     LRT Pr(>Chi)
<none>      1.5289 39.646
+ W     1   1.0596 41.176 0.46928   0.4933

Call:  glm(formula = T ~ D + M + D:M, family = binomial)

Coefficients:
(Intercept)          D2          M2       D2:M2
   -0.65233     1.12611     0.09801    -1.19750

Degrees of Freedom: 7 Total (i.e. Null);  4 Residual
Null Deviance:     19.02
Residual Deviance: 1.529   AIC: 39.65

Both backward elimination and forward selection choose the model M + D + M*D:

logit(πijk) = α + βi^M + βj^D + βij^MD

            Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.65233    0.24664  -2.645 0.008174 **
M2           0.09801    0.34518   0.284 0.776445
D2           1.12611    0.33167   3.395 0.000686 ***
M2:D2       -1.19750    0.48482  -2.470 0.013512 *

The fitted coefficients are

α̂ = −0.652,  β̂1^M = 0,  β̂2^M = 0.098,  β̂1^D = 0,  β̂2^D = 1.126,
β̂11^MD = 0,  β̂12^MD = 0,  β̂21^MD = 0,  β̂22^MD = −1.198.

For Type 1 muscle, the odds of lowering muscle tension for Drug 2 are estimated to be e^(β̂2^D) = e^1.126 ≈ 3.0 times the odds for Drug 1. For Type 2 muscle, the odds ratio is only e^(β̂2^D + β̂22^MD) = e^(1.126 − 1.198) ≈ 0.93.

5.1.1 How Many Predictors Can You Use?

I One published simulation study suggests > 10 outcomes of each type (S or F) per “predictor” (count dummy variables for factors).
  Example: n = 1000, (Y = 1) 30 times, (Y = 0) 970 times.
  The model should contain ≤ 30/10 = 3 predictors.
  Example: n = 173 crabs, (Y = 1) 111 crabs, (Y = 0) 62 crabs.
  Use ≤ 62/10 ≈ 6 predictors.
I Can further check fit with residuals (for grouped data), influence measures, cross validation.

5.2 MODEL CHECKING and the Deviance

I 5.2.1 Likelihood-Ratio Model Comparison Tests
  I introduced in the handouts for Chapter 4 already
I 5.2.2 Goodness of Fit and the Deviance
I 5.2.4 Residuals for Logit Models

Binomial response data are usually of the following form:

               condition of the trials     number      number
               (explanatory variables)     of trials   of successes
Condition 1    x11  x12  ...  x1k           n1          y1
Condition 2    x21  x22  ...  x2k           n2          y2
...            ...                          ...         ...
Condition N    xN1  xN2  ...  xNk           nN          yN

where y1, y2, ..., yN are independent and

yi ∼ Binomial(ni, π(xi)),   where xi = (xi1, xi2, ..., xik).

E.g., the data of fatal falls in the handouts for Chapter 3 are of this form.

floor    total   fatal
level    falls   falls
x        nx      yx
1        37      2
2        54      6
3        46      8
4        38      13
5        32      10
6        11      10
7        2       1

Back to the Example of Fatal Falls

Which model fits the data the best?

[Figure: fatality rate vs. floor level, with the fitted curves of the binomial regression models with linear, logit, probit, and complementary log-log links.]

Likelihood Revisit

A way to choose models is to compare their max. (log-)likelihoods.

likelihood:       Πi [π̂(xi)]^yi [1 − π̂(xi)]^(ni − yi)
log-likelihood:   Σi {yi log π̂(xi) + (ni − yi) log[1 − π̂(xi)]}

where π̂(x) are the model fitted probabilities. E.g., for a probit model with a single predictor x,

π̂(x) = Φ(α̂ + β̂x).

Maximized log-likelihoods of four models of the fatal falls data:

Model                    Maximized Log-Likelihood
linear                   −102.4135
logit                    −101.1594
probit                   −101.2476
complementary log-log    −101.0744

The complementary log-log model has the largest log-likelihood. Is it the best?

Upper Bound of Maximized (Log-)Likelihood

Regardless of the functional form of π(xi), the likelihood and log-likelihood must be of the form

likelihood:       Πi [π(xi)]^yi [1 − π(xi)]^(ni − yi)
log-likelihood:   Σi {yi log π(xi) + (ni − yi) log[1 − π(xi)]}

Since yi log π(xi) + (ni − yi) log[1 − π(xi)] is the log-likelihood for a single observation yi ∼ Binomial(ni, π(xi)), which reaches its max when π(xi) equals its MLE yi/ni, we know

yi log π̂(xi) + (ni − yi) log[1 − π̂(xi)] ≤ yi log(yi/ni) + (ni − yi) log((ni − yi)/ni).

So

the maximized log-likelihood of any model
   = Σi {yi log π̂(xi) + (ni − yi) log[1 − π̂(xi)]}
   ≤ Σi {yi log(yi/ni) + (ni − yi) log((ni − yi)/ni)}

For the data of fatal falls, this upper bound for the maximized log-likelihood is

  2 log(2/37) + (37 − 2) log((37 − 2)/37)
+ 6 log(6/54) + (54 − 6) log((54 − 6)/54)
+ ···
+ 1 log(1/2) + (2 − 1) log((2 − 1)/2)
= −96.89521

Model                    Maximized Log-Likelihood
linear                   −102.4135
logit                    −101.1594
probit                   −101.2476
complementary log-log    −101.0744
upper bound              −96.8952
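The upper bound is just the log-likelihood with each π(xi) replaced by yi/ni, so it is easy to verify numerically; a minimal sketch for the fatal falls data (falls.dat and its columns fatal, live are the ones read in on a later slide):

# upper bound = log-likelihood kernel evaluated at pi_i = y_i/n_i
ff <- read.table("falls.dat", header = TRUE)
y <- ff$fatal
n <- ff$fatal + ff$live
sum(y * log(y/n) + (n - y) * log((n - y)/n))   # -96.89521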

Deviance

The deviance of a model is −2 times the difference between its maximized log-likelihood and the upper bound.

Deviance = −2(max. log-likelihood − upper bound)
         = −2 [ Σi {yi log π̂(xi) + (ni − yi) log[1 − π̂(xi)]}
                − Σi {yi log(yi/ni) + (ni − yi) log((ni − yi)/ni)} ]
         = 2 Σi { yi log[ yi / (ni π̂(xi)) ] + (ni − yi) log[ (ni − yi) / (ni (1 − π̂(xi))) ] }
         = 2 Σ (observed) log(observed/fitted)
         = G²

For the logistic model of the fatal falls data,

floor    observed       fitted         observed     fitted
level    fatal count    fatal count    live count   live count
1        2              2.06           35           34.94
2        6              5.52           48           48.48
3        8              8.31           38           37.69
4        13             11.36          25           26.64
5        10             14.47          22           17.53
6        10             6.76           1            4.24
7        1              1.51           1            0.49

Deviance = 2 [ 2 log(2/2.06) + 35 log(35/34.94)
             + 6 log(6/5.52) + 48 log(48/48.48)
             + ···
             + 1 log(1/1.51) + 1 log(1/0.49) ]
         ≈ 8.5283

> ff = read.table("falls.dat",h=T)
> ff.logit = glm(cbind(fatal,live) ~ floor,
                 family = binomial(link="logit"), data=ff)
> summary(ff.logit)

Call:
glm(formula = cbind(fatal, live) ~ floor, family = binomial(link = "logit"),
    data = ff)

Deviance Residuals:
       1         2         3         4         5         6         7
-0.04171   0.21116  -0.11936   0.57263  -1.61351   2.22062  -0.77799

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -3.4920     0.5009  -6.971 3.14e-12 ***
floor         0.6600     0.1253   5.267 1.38e-07 ***
---
(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 42.0319  on 6  degrees of freedom
Residual deviance:  8.5283  on 5  degrees of freedom
AIC: 33.451

Number of Fisher Scoring iterations: 4

The Saturated Model

The upper bound for maximized log-likelihoods is itself the maximized log-likelihood for a model: the saturated model.

The saturated model is the most complex model possible for the data. It has a separate parameter πi = π(xi) for each (ni, yi) and fits the data perfectly:

π̂i = yi / ni.

Example (Fatal Falls). The saturated model has a separate parameter πi for each floor level i = 1, 2, 3, ..., 7.
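A quick numerical check of both points: the residual deviance 8.5283 equals G² computed from the observed and fitted counts, and a fit with a separate parameter per floor level (the saturated model) has deviance essentially 0. A minimal sketch:

# G^2 = 2 * sum over all cells of observed * log(observed / fitted)
n <- ff$fatal + ff$live
fitted.fatal <- n * fitted(ff.logit)    # fitted fatal counts
fitted.live  <- n - fitted.fatal        # fitted live counts
2 * sum(ff$fatal * log(ff$fatal / fitted.fatal) +
        ff$live  * log(ff$live  / fitted.live))    # 8.5283
# the saturated model: one parameter per floor level, deviance ~ 0
glm(cbind(fatal, live) ~ factor(floor), family = binomial, data = ff)$deviance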


The Saturated Model

I number of parameters in the saturated model = number of observations in the data
I If the number of parameters in a model is the same as the number of observations, then this model is usually the saturated model.

  Example (Mouse Muscle Tension). The saturated model is the 3-way interaction model, for it has 8 parameters, the same as the number of observations (the 8 rows of the mouse.muscle data listed earlier).

I Deviance for the saturated model = 0

> glm3 = glm(cbind(tension.high,tension.low) ~ W*M*D,
+            family=binomial, data=mouse.muscle)
> summary(glm3)

Call:
glm(formula = cbind(tension.high, tension.low) ~ W * M * D,
    family = binomial, data = mouse.muscle)

Deviance Residuals:
[1]  0  0  0  0  0  0  0  0

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.9743     3.4157  -0.285    0.775
WLow         -2.3438     3.8528  -0.608    0.543
M             0.2324     1.7956   0.129    0.897
D             1.5524     1.8611   0.834    0.404
WLow:M        1.3243     2.3163   0.572    0.568
WLow:D        0.7400     2.1398   0.346    0.729
M:D          -0.8105     1.0103  -0.802    0.422
WLow:M:D     -0.4360     1.3071  -0.334    0.739

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1.9019e+01  on 7  degrees of freedom
Residual deviance: 1.1324e-14  on 0  degrees of freedom
AIC: 46.117

Number of Fisher Scoring iterations: 3

Goodness of Fit and the Deviance

For a model M of interest, let LM denote its maximized log-likelihood. As the upper bound for maximized log-likelihoods is itself the maximized log-likelihood for the saturated model, LS, the deviance of the model M equals

Deviance = −2[LM − (upper bound)] = −2(LM − LS),

which is the likelihood ratio test statistic comparing

H0: Model M   v.s.   Ha: saturated model.

Deviance has an approx. chi-squared distribution w/

df = (# of parameters in saturated model) − (# of parameters in Model M)
   = (# of observations) − (# of parameters in Model M)

However, this approx. is good only when all observations (ni, yi) have large ni.

Goodness of Fit and the Deviance

I Large deviance indicates lack of fit.
I Small deviance means the model fits nearly as well as the best possible model.

Goodness of fit tests for the four models of the fatal falls data:

Model                    Deviance   d.f.   P-value
linear (identity)           11.04      5    0.0507
probit                       8.70      5    0.1214
logit                        8.53      5    0.1294
complementary log-log        8.36      5    0.1376

Goodness-of-fit tests show the 3 binomial models w/ logit, probit, and complementary log-log links fit the data nearly as well as each other, and their fits are a bit better than the model w/ identity link.

Example (Mouse Muscle Tension)

For the mouse muscle tension data, the saturated model is the 3-way interaction model, so the goodness of fit test of a model simply compares the model with the 3-way interaction model.

> glm3 = glm(cbind(tension.high,tension.low) ~ W*M*D,
             family=binomial, data=mouse.muscle)
> glm2 = glm(cbind(tension.high,tension.low) ~ M*D,
             family=binomial, data=mouse.muscle)
> anova(glm2, glm3, test="Chisq")
Analysis of Deviance Table

Model 1: cbind(tension.high, tension.low) ~ M * D
Model 2: cbind(tension.high, tension.low) ~ W * M * D
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1         4     1.5289
2         0     0.0000  4   1.5289   0.8215
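Because the saturated model has deviance 0, the same P-value can be read directly off the residual deviance of glm2 (as fitted above); a minimal sketch:

# goodness-of-fit P-value from the residual deviance 1.5289 on 4 df
pchisq(glm2$deviance, df = glm2$df.residual, lower.tail = FALSE)   # approx. 0.8215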

Goodness-of-Fit Based on Pearson’s Chi-Squared

One can also use Pearson’s Chi-Squared statistic

X² = Σi [ (yi − ni π̂(xi))² / (ni π̂(xi)) + (ni − yi − ni(1 − π̂(xi)))² / (ni(1 − π̂(xi))) ]
   = Σ (observed − fitted)² / fitted

to do the goodness-of-fit test comparing

H0: Model M   v.s.   Ha: saturated model.

X² is different from Deviance, but it has an approx. chi-squared distribution w/ the same d.f. as Deviance.
Like deviance, the approx. for X² is good only when all observations (ni, yi) have large ni.

Grouped Data v.s. Ungrouped Data

Although the ML estimates of parameters are the same for grouped or ungrouped data, the deviances are different.

For ungrouped data, ni = 1 for all i and yi = 0 or 1, so

LS = Σi {yi log(yi/ni) + (ni − yi) log((ni − yi)/ni)}
   = Σi {yi log(yi) + (1 − yi) log(1 − yi)} = 0

and hence

Deviance = −2(LM − LS) = −2 LM.
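In R, X² can be computed as the sum of squared Pearson residuals of a fitted model; a minimal sketch using the grouped-data fit ff.logit from the fatal falls example (the resulting value of X² is not reported in the notes):

# Pearson chi-squared statistic and its goodness-of-fit P-value
X2 <- sum(residuals(ff.logit, type = "pearson")^2)
X2
pchisq(X2, df = ff.logit$df.residual, lower.tail = FALSE)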


Grouped Data v.s. Ungrouped Data

> ff = read.table("falls.dat", header=T)        # Grouped data
> ff.ug = read.table("fallsUG.dat", header=T)   # Ungrouped data
> ff.logit = glm(cbind(fatal,live) ~ floor, family=binomial, data=ff)
> ffug.logit = glm((outcome == "fatal") ~ floor, family=binomial, data=ff.ug)

> ff.logit$coef
(Intercept)       floor
 -3.4920438   0.6600324
> ffug.logit$coef    # same estimated coefficients
(Intercept)       floor
 -3.4920437   0.6600324

> ff.logit$deviance
[1] 8.52832
> ffug.logit$deviance    # different deviances
[1] 202.3187

> ff.logit$df.residual   # different df for deviances
[1] 5
> ffug.logit$df.residual
[1] 218

Grouped Data, Ungrouped Data, Continuous Predictors

I Only deviance computed from grouped data can be used to do the goodness of fit test. Deviances computed from ungrouped data do not have an approx. chi-squared dist.
I Continuous predictors usually have too many levels (e.g., Width in the horseshoe crabs data), so that deviances of models w/ such predictors do not have an approx. chi-squared dist. if the number of observations at each level is too small.
I Even though deviances may not have an approx. chi-squared dist., the difference of the deviances of two models is often approx. chi-squared.
  One can safely use the diff. of deviances to do a likelihood ratio test for model comparison no matter what.
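The last point can be illustrated with the two fits above: the drop in deviance when floor is added is identical whether it is computed from the grouped or the ungrouped data. A minimal sketch (the intercept-only refits are assumptions, not shown in the notes):

# LR statistic for H0: no floor effect, computed from both versions of the data
ff.null   <- glm(cbind(fatal, live) ~ 1, family = binomial, data = ff)
ffug.null <- glm((outcome == "fatal") ~ 1, family = binomial, data = ff.ug)
ff.null$deviance   - ff.logit$deviance     # same difference ...
ffug.null$deviance - ffug.logit$deviance   # ... from either form of the data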

Summary for Deviance

For a Model M of interest,

Deviance = −2(LM − LS)
         = 2 Σi { yi log[ yi / (ni π̂(xi)) ] + (ni − yi) log[ (ni − yi) / (ni (1 − π̂(xi))) ] }
         = 2 Σ (observed) log(observed/fitted)
         = G²

where
  LM = max. log-likelihood for Model M
  LS = max. log-likelihood for the saturated model
     = the upper bound for the max. log-likelihood of ANY model

I Deviance can be used to do the goodness-of-fit test.
I Useful for grouped data only.

Residuals for Binomial Response Models (not limited to logistic models)

When a goodness-of-fit test suggests a GLM fits poorly, residuals can highlight where the fit is poor.

Pearson Residual:   ei = (yi − ni π̂i) / sqrt(ni π̂i (1 − π̂i))

Standardized (Pearson) Residual:   ri = ei / sqrt(1 − hi)

I hi = leverage of the observation i (details are skipped). The greater an observation’s leverage, the greater its influence on the model fit.
I Note Σi ei² = X² (Pearson chi-square).
I When the model holds and the ni π̂i are large, ei is approx. N(0, ν) but ν < 1, and ri is approx. N(0, 1).
  |ri| > 2 or 3 means lack of fit.
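A minimal sketch checking the Pearson residual formula against residuals(..., type = "pearson"), for the grouped fatal falls fit ff.logit:

# e_i = (y_i - n_i * pihat_i) / sqrt(n_i * pihat_i * (1 - pihat_i))
n     <- ff$fatal + ff$live
pihat <- fitted(ff.logit)
e <- (ff$fatal - n * pihat) / sqrt(n * pihat * (1 - pihat))
cbind(by.formula = e, by.R = residuals(ff.logit, type = "pearson"))   # identical columns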

Deviance Residuals for Binomial Response Models (not limited to logistic models)

The deviance residual is defined as

di = sign(yi − μ̂i) sqrt( 2 [ yi log(yi/μ̂i) + (ni − yi) log((ni − yi)/(ni − μ̂i)) ] )

where μ̂i = ni π̂(xi).

Standardized deviance residual = di / sqrt(1 − hi), where hi is the leverage.

I Observe that Deviance = Σi di².
I When the model holds and the ni π̂i are large, di is approx. N(0, ν) but ν < 1; should use the standardized di.
I Useful for grouped data only.

Example (Berkeley Graduate Admissions)

                     Men                               Women
        Number      Number      Percent     Number     Number      Percent
Dept    Admitted    Rejected    Admitted    Admitted   Rejected    Admitted
A            512         313        62%          89         19         82%
B            353         207        63%          17          8         68%
C            120         205        37%         202        391         34%
D            138         279        33%         131        244         35%
E             53         138        28%          94        299         24%
F             22         351         6%          24        317          7%

> UCB = read.table("UCBadmissions.dat",h=T)
> UCB
   Gender Dept Admitted Rejected
1    Male    A      512      313
2    Male    B      353      207
3    Male    C      120      205
4    Male    D      138      279
5    Male    E       53      138
6    Male    F       22      351
7  Female    A       89       19
8  Female    B       17        8
9  Female    C      202      391
10 Female    D      131      244
11 Female    E       94      299
12 Female    F       24      317

Let’s first fit a model with only the main effects of Department and Gender, but no interactions.

> UCB.fit1 = glm(cbind(Admitted,Rejected) ~ Dept + Gender,
                 family=binomial, data=UCB)
> summary(UCB.fit1)

Deviance Residuals:
      1        2        3        4        5        6        7        8
-1.2487  -0.0560   1.2533   0.0826   1.2205  -0.2076   3.7189   0.2706
      9       10       11       12
-0.9243  -0.0858  -0.8509   0.2052

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.68192    0.09911   6.880 5.97e-12 ***
DeptB       -0.04340    0.10984  -0.395    0.693
DeptC       -1.26260    0.10663 -11.841  < 2e-16 ***
DeptD       -1.29461    0.10582 -12.234  < 2e-16 ***
DeptE       -1.73931    0.12611 -13.792  < 2e-16 ***
DeptF       -3.30648    0.16998 -19.452  < 2e-16 ***
GenderMale  -0.09987    0.08085  -1.235    0.217
---
(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 877.056  on 11  degrees of freedom
Residual deviance:  20.204  on  5  degrees of freedom
AIC: 103.14

Number of Fisher Scoring iterations: 4

LRT indicates a strong Dept effect, but little Gender effect (P-value ≈ 0.22) ⇒ little evidence of gender bias in UCB graduate admissions.

> drop1(UCB.fit1, test="Chisq")
Single term deletions

Model:
cbind(Admitted, Rejected) ~ Dept + Gender
       Df Deviance    AIC    LRT Pr(>Chi)
<none>      20.20 103.14
Dept    5   783.61 856.55 763.40   <2e-16 ***
Gender  1    21.74 102.68   1.53   0.2159

However, the goodness of fit test shows the main effect model fits poorly. The Deviance = 20.204 can be obtained from the summary output, or from the command below.

> UCB.fit1$deviance
[1] 20.20428

The P-value for the goodness of fit test ≈ 0.00114 is computed as follows.

> pchisq(20.204, df=5, lower.tail=F)
[1] 0.001144215

Apparently there is a gender×dept interaction (because the saturated model is the two-way interaction model).

R function residuals() gives deviance residuals by default, and Pearson residuals with option type="pearson".

> residuals(UCB.fit1)    # deviance residuals
          1           2           3           4           5           6
-1.24867404 -0.05600850  1.25333751  0.08256736  1.22051370 -0.20756402
          7           8           9          10          11          12
 3.71892028  0.27060804 -0.92433979 -0.08577122 -0.85093316  0.20517793
> residuals(UCB.fit1, type="pearson")    # Pearson residuals
          1           2           3           4           5           6
-1.25380765 -0.05602052  1.26287232  0.08260773  1.24151319 -0.20620096
          7           8           9          10          11          12
 3.51866744  0.26895159 -0.92077831 -0.08573167 -0.84403319  0.20648081

By default, R function rstandard() gives standardized deviance residuals.

> rstandard(UCB.fit1)
         1          2          3          4          5          6
-4.0107986 -0.2796622  1.8666312  0.1411928  1.6058628 -0.3046444
         7          8          9         10         11         12
 4.2564872  0.2814450 -1.8881065 -0.1413270 -1.6468462  0.3007342

With option type="pearson", rstandard() gives standardized Pearson residuals.

> rstandard(UCB.fit1, type="pearson")
         1          2          3          4          5          6
-4.0272880 -0.2797222  1.8808316  0.1412619  1.6334924 -0.3026439
         7          8          9         10         11         12
 4.0272880  0.2797222 -1.8808316 -0.1412619 -1.6334924  0.3026439


> pearson.res = round(residuals(UCB.fit1, type="pearson"), 2)
> std.res = round(rstandard(UCB.fit1, type="pearson"), 2)
> cbind(UCB, pearson.res, std.res)
   Gender Dept Admitted Rejected pearson.res std.res
1    Male    A      512      313       -1.25   -4.03  <--
2    Male    B      353      207       -0.06   -0.28
3    Male    C      120      205        1.26    1.88
4    Male    D      138      279        0.08    0.14
5    Male    E       53      138        1.24    1.63
6    Male    F       22      351       -0.21   -0.30
7  Female    A       89       19        3.52    4.03  <--
8  Female    B       17        8        0.27    0.28
9  Female    C      202      391       -0.92   -1.88
10 Female    D      131      244       -0.09   -0.14
11 Female    E       94      299       -0.84   -1.63
12 Female    F       24      317        0.21    0.30

Standardized residuals suggest Dept. A as the main source of lack of fit (ri = −4.03 and 4.03), while the Pearson residuals fail to catch the lack of fit of the first observation (Gender = Male, Dept = A).

Leaving out Dept. A, the model with Dept main effects and Gender main effects fits well (Deviance = 2.556, df = 4, P-value ≈ 0.63).

> UCB.fit2 = glm(cbind(Admitted,Rejected) ~ Dept + Gender,
                 family=binomial, data=UCB, subset=(Dept != "A"))
> UCB.fit2$deviance
[1] 2.556429
> UCB.fit2$df.residual
[1] 4
> pchisq(2.556429, df=4, lower.tail=F)
[1] 0.6345606

Knowing the main effect model fits the data well when leaving out Dept. A, we can use it to do inference.

LRT shows the gender effect is not significant (P-value = 0.72), meaning little evidence of gender bias in UCB graduate admissions in Dept. B, C, D, E, F.

> drop1(UCB.fit2, test="Chisq")
Single term deletions

Model:
cbind(Admitted, Rejected) ~ Dept + Gender
       Df Deviance    AIC    LRT Pr(>Chi)
<none>       2.56  71.79
Dept    4   500.85 562.08 498.29   <2e-16 ***
Gender  1     2.68  69.92   0.13   0.7236

Conclusion:
I In Dept. A, women are more likely to be admitted.
I In Dept. B-F, no significant diff. in admission rates of men and women.

However, if we ignore Dept, the Gender effect is significant but in the opposite direction: the odds of admission for men are e^0.61 = 1.84 times the odds for women (95% CI for the odds ratio is 1.625 to 2.087). Men are more likely to be admitted. Why?

> UCB.fit3 = glm(cbind(Admitted,Rejected) ~ Gender,
                 family=binomial, data=UCB)
> UCB.fit3$coef
(Intercept)  GenderMale
 -0.8304864   0.6103524
> exp(confint(UCB.fit3))
                2.5 %    97.5 %
(Intercept) 0.3942898 0.4811371
GenderMale  1.6249557 2.0874993

I In Dept. A, the odds of admission for men are (512×19)/(313×89) = 0.35 times the odds for women.

  Dept A    Admitted   Rejected
  Male           512        313
  Female          89         19

I This is an example of Simpson’s paradox.

Sparse Data

Caution: Parameter estimates in logistic regression can be infinite.

Example 1:
         S    F
X = 1    8    2
X = 2   10    0

Model:
log(Pr(S)/Pr(F)) = α + βx

e^β̂ = odds-ratio = (8 × 0)/(2 × 10) = 0
β̂ = log-odds-ratio = −∞

Empty cells in multi-way contingency tables can cause infinite estimates.
Software may not realize this, and gives a finite estimate!
I A large number of Fisher Scoring iterations is a warning sign.
I Large values of SEs for coefficients are also warning signs.

> S = c(8,10)
> F = c(2,0)
> X = c(1,2)
> glm1 = glm(cbind(S,F) ~ X, family = binomial)
> summary(glm1)

Call:
glm(formula = cbind(S, F) ~ X, family = binomial)

Deviance Residuals:
[1]  0  0

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -22.35   54605.92       0        1
X              23.73   54605.92       0        1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2.9953e+00  on 1  degrees of freedom
Residual deviance: 2.4675e-10  on 0  degrees of freedom
AIC: 6.3947

Number of Fisher Scoring iterations: 22

Infinite estimates exist when the x-values where y = 1 can be “separated” from the x-values where y = 0.

Example 2:
> X = c(0,1,2,3,4,5,6,7)
> Y = c(0,0,0,0,1,1,1,1)

Model:
logit(Pr(Y = 1)) = α + βx

What does the plot of Y against X look like?

> glm2 = glm(Y ~ X, family = binomial)
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred
> summary(glm2)

Call:
glm(formula = Y ~ X, family = binomial)

Deviance Residuals:
       Min          1Q      Median          3Q         Max
-1.504e-05  -2.110e-08   0.000e+00   2.110e-08   1.504e-05

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -160.3   285119.4  -0.001        1
X               45.8    80643.9   0.001        1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1.1090e+01  on 7  degrees of freedom
Residual deviance: 4.5253e-10  on 6  degrees of freedom
AIC: 4

Number of Fisher Scoring iterations: 25
