Advanced Regression Methods Symposium on Updates on Clinical Research Methodology March 18, 2013
Lloyd Mancl, PhD
Oral Health Sciences
University of Washington

Charles Spiekerman, PhD
Oral Health Sciences
University of Washington

Outline
• Introduce common regression methods for different types of outcomes
  – Logistic regression
  – Multiple linear regression
  – Cox proportional hazards regression
  – Poisson or log-linear regression
• Uses of regression:
  – Adjust for confounding
  – Assess for effect modification or interaction
  – Account for non-independent outcomes

Uses for Multiple regression analysis
• Used to adjust for confounding
  – In observational studies, groups of interest can differ on other variables that may be related to the outcome.
  – In an RCT, randomization may not result in balanced groups.
• Used to assess simultaneously the associations for several explanatory variables, as well as interactions between variables
  – In observational studies, you may be interested in how several explanatory variables are related to an outcome.
  – In an RCT, we are often interested in testing whether the treatment effect is modified by another variable (i.e., interaction or moderation).
  – In designed experiments, we typically test for interactions between the different study factors.
• Used to develop a prediction equation
  – Not commonly used for this purpose; not covered in this workshop.

Multiple regression analysis
Models the association between one outcome variable, Y, and multiple variables of interest, X1, X2, …, Xk.
Multiple Linear regression model
Y = α + β1X1 + β2X2 + … + βkXk + random error

Generalized Linear Model
G(Y) ~ α + β1X1 + β2X2 + … + βkXk
(basically, a more complicated version of the outcome, Y, is related to a linear combination of the variables of interest)

Common Multiple Regression Methods

Outcome variable           Regression method                     Regression results
Quantitative/continuous    Linear regression                     Slopes & difference between means
Binary (2 categories)      Logistic regression                   Odds ratio
Count or count rate        Poisson or log-linear regression      Relative risk or rate ratio
Time to an event           Cox proportional hazards regression   Hazard ratio
Ordinal (>2 categories)    Ordinal logistic regression           Odds ratio
Nominal (>2 categories)    Multinomial logistic regression       Odds ratio

Regression method depends on the outcome
• Continuous or quantitative outcome – linear regression
  – Amount of attachment loss (mm)
  – Change in dmfs
• Binary outcome – logistic regression
  – Any new decay
  – Incident TMD
• Time to event outcome – Cox proportional hazards regression
  – Time to tooth loss
  – Time to pulp cap failure
• Count outcome – Poisson or log-linear regression
  – Number of new caries
  – Rate of new caries

Example: heart disease and periodontitis
• NHANES II – observational study that examined a large number of participants at a baseline visit and followed them for over 10 years to ascertain morbid events.
• We are interested in assessing the association between periodontal disease evaluated at baseline and the occurrence of heart disease (CHD) within 10 years of study entry.

Logistic regression
In this analysis the outcome variable, CHD incidence, is a binary variable, so a regression method we could employ is logistic regression.

CHD risk by exposure group

                 Healthy gums   Periodontal disease   Odds Ratio   95% Conf. Int.
CHD incidence    4.9%           13.5%                 3.0          (2.5, 3.7)

Logistic regression uses the Odds Ratio as an estimate of association between your independent variable and the outcome. Here the odds ratio is (0.135/0.865) / (0.049/0.951) ≈ 3.0.

Confounding
• Confounding occurs when there is a third variable that is strongly related to both the dependent and the independent variable.
• This can bias an estimate of association.
[Diagram: nodes X1, X2, and Y; the association in question is marked "?"]
• With respect to Periodontitis and CHD, an obvious potential confounder is Age.
• We can adjust for the potential confounding effects by entering Age into the logistic regression model as an additional independent variable.

CHD incidence by periodontal disease

Logistic Regression with Periodontal disease as the only independent variable

Independent Variable       Odds Ratio   95% Conf. Int.
Healthy Gums               1            -
Periodontal Disease        3.0          (2.5, 3.7)

Logistic Regression model with Age added

Independent Variable       Odds Ratio   95% Conf. Int.
Healthy Gums               1            -
Periodontal Disease        1.6          (1.3, 2.0)
Age (10 year increment)    2.1          (1.9, 2.2)

• By controlling for Age, the estimated association of Periodontal disease with CHD is weaker (odds ratio 1.6 rather than 3.0).
• The association is still statistically significant (confidence interval does not contain 1).
• The Age estimate indicates 2.1 times higher odds of CHD for each additional 10 years of Age.

Effect Modification / Interaction
• An interaction is when the association or relationship between an explanatory variable and the outcome variable depends on the value of another explanatory variable.
• Also called effect modification or moderation.
• In extreme cases, an interaction may completely reverse the relationship between the explanatory variable and outcome.
• More commonly, the effect is stronger (or weaker) depending on the value of another explanatory variable.
• Stratification can be used to identify an interaction.
• Regression can be used to test for an interaction by adding an interaction term to the model, which is the product of two explanatory variables.

Chewing Gum Study
• Subjects randomly assigned to 3 different chewing gums
• Outcome was continuous: change in DMFS
• Linear regression used to compare the 3 groups, adjusting for baseline DMFS

               DMFS Change     Baseline DMFS
Group    n     Mean (SD)       Mean (SD)
A        25    -0.72 (5.37)    4.68 (1.02)
B        35    -0.83 (3.57)    3.77 (0.55)
C        40     2.63 (3.80)    3.67 (0.57)

Linear regression results

Coefficient        Estimate   Standard Error   P-value
Intercept           1.30      0.90             .15
Group B (vs A)     -0.50      1.01             .62
Group C (vs A)      2.91      0.98             .004
Baseline DMFS      -0.43      0.10             <.001

Group main effect, p-value <.001
• Model estimates a constant group difference
• DMFS change 2.91 greater for Group C than Group A
• DMFS change 0.50 less for Group B than Group A

[Figure: Scatterplot of change in DMFS versus baseline DMFS for the treatment groups (A, B, C); x-axis: Baseline DMFS; y-axis: Change in DMFS]

• Group x baseline DMFS interaction added to the linear regression model to test if group differences are affected by baseline DMFS

Coefficient                 Estimate   Standard Error   P-value
Intercept                    2.96      0.97             .003
Group B (vs A)              -1.53      1.33             .25
Group C (vs A)              -0.79      1.26             .53
Baseline DMFS               -0.79      0.14             <.001
Group B x Baseline DMFS      0.19      0.23             .42
Group C x Baseline DMFS      0.91      0.21             <.001

• Group x Baseline DMFS interaction, p-value <.001
• Difference between Group C and A increases with baseline DMFS (estimated difference = -0.79 + 0.91 x baseline DMFS, e.g., about 3.8 at a baseline DMFS of 5)

Effect Modification / Interaction
• Test for interactions after all main effects are included in the regression model.
• Typically, only assess for two-way interactions.
• Usually only test interactions when at least one of the variables has a significant main effect.
• (Exceptions for designed experiments involving a small number of factors, where all possible interactions may be assessed).
• A lower significance level (e.g., p<0.01) may be used if testing a large number of interactions, to control type I error due to multiple comparisons.

Survival Analysis: Time to event data
• In some studies the outcome of interest is the time until an event
  – Time to implant failure
  – Time to tooth loss
  – Time to death
• Analyses of this type of data are commonly called “Survival Analysis”

Censored events
• In most time to event studies a non-trivial portion of the events will not be observed because they don’t occur during the period of observation.
• These unobserved events are considered “censored”.
• For the censored events we don’t know the actual time until the event, but we do know that the time until event is at least as great as the time until the patient was last seen.
• Survival analysis uses the information in both the complete observations and the censored observations, rather than discarding the censored ones.

Censored events
[Figure: one timeline per patient from enrollment date to event date, on a time axis running from start of enrollment to end of study]
• The 2nd, 4th and 5th patients have censored times to event.
• For the 1st, 3rd and 6th patients we know the exact times to event.

Periodontal disease and tooth loss
• One hundred periodontal patients under maintenance care*.
• Interest in assessing factors associated with tooth loss.
• Outcome is time to loss of tooth.
• We will look at oral hygiene, patient age, and smoking.
*M. McGuire & M. Nunn, J Periodontology, 1996; 67:666-674.
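One way to see how censored and complete observations are used together is the product-limit (Kaplan-Meier) estimate of the probability of remaining event-free over time. The sketch below is only illustrative: the helper name `kaplan_meier` and the six follow-up times are made up for this example (they are not the McGuire & Nunn data), chosen so that the 2nd, 4th and 5th patients are censored.

```python
from collections import Counter


def kaplan_meier(times, observed):
    """Product-limit estimate of the survival probability.

    times:    follow-up time for each subject
    observed: True if the event was seen at that time, False if censored
    Returns a list of (event_time, estimated survival probability) pairs.
    """
    # Count observed events at each distinct event time.
    events = Counter(t for t, seen in zip(times, observed) if seen)
    survival = 1.0
    curve = []
    for t in sorted(events):
        # Everyone still under observation at time t is at risk,
        # including subjects whose follow-up is censored at t or later.
        at_risk = sum(1 for u in times if u >= t)
        survival *= 1 - events[t] / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up times in years for six patients;
# patients 2, 4 and 5 are censored (event not observed).
times = [2, 5, 6, 8, 9, 10]
observed = [True, False, True, False, False, True]

for t, s in kaplan_meier(times, observed):
    print(f"S({t}) = {s:.3f}")
```

With these made-up times the estimate steps down to 5/6 at year 2, 0.625 at year 6, and 0 at year 10. A censored patient never causes a step, but still counts in the at-risk denominator until follow-up ends.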
Kaplan-Meier survival plots
[Figure: three Kaplan-Meier plots of survival probability (y-axis, 0.92 to 1.00) versus time in years (x-axis, 0 to 15). Panels: Hygiene (good or fair vs. poor), Age (less than 50 vs. 50 or older), Smoking (not smoker vs. smoker)]
Kaplan-Meier plots present estimates of the survival function, S(t).
S(t) = Probability of surviving to time t

Cox proportional hazards regression
• If some simplifying assumptions hold, then one can compare survival probabilities using a regression framework.