Log-Linear Models: Poisson Regression Y H Chan
Total Page:16
File Type:pdf, Size:1020Kb
Basic Statistics For Doctors Singapore Med J 2005; 46(8) : 377 CME Article Biostatistics 306. Log-linear models: poisson regression Y H Chan Log-linear models are used to determine whether Table Ic. Chi-square test. there are any significant relationships in multiway Chi-square tests contingency tables that have three or more categorical Value df Asymp Exact Exact variables and/or to determine if the distribution of sig. sig. sig. the counts among the cells of a table can be explained (2-sided) (2-sided) (1-sided) by a simpler, underlying structure (restricted model). Pearson chi-square 3.407b 1 .065 The saturated model contains all the variables being Continuity correctiona 2.904 1 .088 analysed and all possible interactions between the Likelihood ratio 3.417 1 .065 variables. Let us use a simple 2X2 cross-tabulation (over-eating Fisher’s exact test .068 .044 versus over-weight, Table Ia) to illustrate the log-linear Linear-by-linear 3.390 1 .066 model analysis. Table Ib shows the SPSS data structure association and their association could easily be assessed using No. of valid cases 200 (1) the chi-square test (test of independence). Table Ic a. Computed only for a 2x2 table. shows that there is no association (phew!), p=0.065 b. 0 cells (.0%) have expected count less than 5. The minimum and Table Id shows the corresponding risk estimates. expected count is 47.52. Table Ia. Over-eating x over-weight. Table Id. Risk estimate table. Over-eating * over-weight cross-tabulation Risk estimate Over-weight 95% confidence interval Yes No Total Value Lower Upper Over-eating Yes Count 58 41 99 Odds ratio for 1.691 .966 2.960 % within 55.8% 42.7% 49.5% over-eating (yes/no) over-weight For cohort over-weight 1.286 .982 1.685 No Count 46 55 101 = yes % within 44.2% 57.3% 50.5% For cohort over-weight .761 .567 1.021 over-weight = no Total Count 104 96 200 No. of valid cases 200 Yong Loo Lin School of Medicine % within 100.0% 100.0% 100.0% National University over-weight of Singapore Block MD11 We shall use the log-linear model analysis for the Clinical Research above 2X2 table. Centre #02-02 Table Ib. SPSS data structure for over-eating x over-weight. 10 Medical Drive Before running the analysis for the log-linear Singapore 117597 Over-eating Over-weight Count model, we have to “weight cases” using the variable Y H Chan, PhD Head Yes Yes 58 Count first. Go to Data, Weight Cases to get Biostatistics Unit Yes No 41 Template I. Check on the “Weight cases by” and input Correspondence to: “Count” to the Frequency Variable option. Dr Y H Chan No Yes 46 Tel: (65) 6874 3698 Fax: (65) 6778 5743 No No 55 Email: medcyh@ nus.edu.sg Coding Yes = 1 & No = 2. Singapore Med J 2005; 46(8) : 378 Template I. Declaring “count” as the “Weight cases by”. Template IV. Display options. Go to Analyze, Loglinear, General to get Template II. Put Over-weight and Over-eating into the Factors option (a maximum of 10 categorical variables could be included). Template II. Declaring only categorical variables. Check the Estimates box. The following options are available in the Saved folder (Template V). Leave them unchecked. Template V. Save options. Leave the “Distribution of Cell Counts” as Poisson, then click on the Model folder, and see Template III. The Saturated model gives all possible interactions between the categorical variables. In this case, the model will be Over-weight + Over-eating + The model information and goodness-of-fit Over-eating X Over-weight. statistics will be automatically displayed. SPSS output – Saturated Model (only relevant Template III. Defining the saturated model. tables shown) Table II shows the goodness-of-fit test, which will always result in a chi-square value of 0 because the saturated model will fully explain all the relationships among the variables. Table II. Goodness-of-fit test. Goodness-of-fit testsa,b Value df Sig. Likelihood ratio .000 0 . Click on the Options folder in Template II to get Pearson chi-square .000 0 . Template IV. a. Model: Poisson. b. Design: Constant + over_weight + over_eating + over_weight* over_eating. Singapore Med J 2005; 46(8) : 379 Table III. Saturated model – parameter estimates. Parameter estimatesb,c 95% confidence interval Parameter Estimate Std. error Z Sig. Lower bound Upper bound Constant 4.016 .134 29.922 .000 3.753 4.279 [over_weight = 1.00] -.177 .199 -.890 .373 -.567 .213 [over_weight = 2.00] 0a ..... [over_eating = 1.00] -.291 .205 -1.417 .157 -.693 .112 [over_eating = 2.00] 0a ..... [over_weight = 1.00]* .525 .284 1.831 .067 -.037 1.077 [over_eating = 1.00] [over_weight = 1.00]* 0a ..... [over_eating = 2.00] [over_weight = 2.00]* 0a ..... [over_eating = 1.00] [over_weight = 2.00]* 0a ..... [over_eating = 2.00] a. This parameter is set to zero because it is redundant. b. Model: Poisson c. Design: Constant + over_weight + over_eating + over_weight * over_eating Table III shows the parameter estimates of the Template VI. Defining the restricted over-weight + over-eating model. saturated model. Taking the exponential (exp) of the estimate gives the odds ratio. We are particularly interested in the interaction term [over_weight = 1.00] * [over_eating = 1.00] which assesses the association between the 2 variables. This interaction’s estimate is 0.525 and exp (0.525) = 1.691 with a p-value of 0.067 – which is exactly the same results obtained using Chi-square test (Tables Ic & Id). The main effect ([over_weight = 1.00] and [over_eating = 1.00]) tests on the null hypothesis that the subjects are distributed evenly over the levels of each variable. Here we have both variables quite evenly distributed (over-weight: 52% vs 48% and In Template IV, check on the Residuals and over-eating: 49.5% vs 50.5%, Table Ib), thus Frequencies options, and clear all the plot options. p>0.05 for both main effects. SPSS outputs – Restricted model: Over-weight + The standardised form (Z) can be used to assess Over-eating. which variables/interactions in the model are the most or least important to explain the data. The Table IVa. Goodness-of-fit test: Over-weight + Over-eating. higher the absolute of Z, the more “important”. a,b If our interest is to determine relationships, Goodness-of-fit tests we can stop here. But if we want to develop a simpler Value df Sig. model, then the next simpler (restricted) model Likelihood ratio 3.417 1 .065 will be Over-weight + Over-eating (ignoring their Pearson chi-square 3.407 1 .065 interaction, since the 2 variables are independent). a. To define this Over-weight + Over-eating restricted Model: Poisson. b. model, click on the custom button in Template III. Design: Constant + over_eating + over_weight. Put Over-weight and Over-eating to the Terms in Model option (Template VI). Singapore Med J 2005; 46(8) : 380 Table IVb. Residual analysis for Over-weight + Over-eating. Cell counts and residualsa,b Observed Expected Over- Over- Standardised Adjusted weight eating Count % Count % Residual residual residual Deviance Yes Yes 58 29.0% 51.480 25.7% 6.520 .909 1.843 .890 No 46 23.0% 52.520 26.3% -6.520 -.900 -1.843 -.919 No Yes 41 20.5% 47.520 23.8% -6.520 -.946 -1.843 -.969 No 55 27.5% 48.480 24.2% 6.520 .936 1.843 .917 a. Model: Poisson. b. Design: Constant + over_eating + over_weight. The goodness-of-fit test (Table IVa) compares Table Vb shows the SPSS structure. whether this restricted model (Over-weight + Over-eating) is an adequate fit to the data. We want Table Vb. SPSS data structure for Over-weight, Over-eating the p-value (sig) to be >0.05. In this case, we have and Gender. p=0.065 which means that this restricted model is Over-eating Over-weight Gender Count adequate to fit the data. Yes Yes Male 44 Residual analysis helps us to spot outlier cells, where the restricted model is not fitting well. The Yes No Male 23 Residual is the difference of the expected frequencies No Yes Male 26 and the observed cell frequencies. The smaller No No Male 23 the residual, the better the model is working for Yes Yes Female 14 that cell. The Standardized residuals (normalised Yes No Female 18 against the mean and standard deviation) should have values <1.96 for a good fit. The Adjusted No Yes Female 20 (Studentized) residuals penalise for the fact that No No Female 32 large expected values tend to have larger residuals. Coding: Yes = 1 & No = 2. Male = 1 & Female = 2. Cells with the largest adjusted residuals show where the model is working least well. The Studentized We can start by constructing the saturated model deviance residuals (Deviance) are a more accurate and then remove the non-significant terms, or start version of adjusted residuals. from the basic main effects model (without interaction If we decide that over-weight is a response terms) and then build up. Let us use the latter. variable and over-eating is the independent, a logistic Table Vc shows the goodness-of-fit for the restricted regression (taking into account of other covariates) model of Over-weight + Over-eating + Gender (main (2) could be performed . effects only). The p-value is <0.05, which shows that But if both are dependent variables (I over-eat thus this model is not adequate to explain the data I am over-weight or I am over-weight thus I over-eat), then a logistic model will not be appropriate.