Generalized Estimating Equations Generalized Estimating Equations Second Edition
Total Page:16
File Type:pdf, Size:1020Kb
Statistics Second Edition Generalized Estimating Equations Generalized Estimating Equations Second Edition Generalized Estimating Equations, Second Edition updates the best-selling previous edition, which has been the standard text on the subject since it was published a decade ago. Combining theory Generalized and application, the text provides readers with a comprehensive discussion of GEE and related models. Numerous examples are employed throughout the text, along with the software code used to create, run, and evaluate the models being examined. Stata is used as the primary software for running and displaying modeling output; Estimating associated R code is also given to allow R users to replicate Stata examples. Specific examples of SAS usage are provided in the final chapter as well as on the book’s website. This second edition incorporates comments and suggestions from a variety of sources, including the Statistics.com course on longitudinal Equations and panel models taught by the authors. Other enhancements include an examination of GEE marginal effects; a more thorough presentation of hypothesis testing and diagnostics, covering competing hierarchical models; and a more detailed examination of previously discussed Second Edition subjects. Along with doubling the number of end-of-chapter exercises, this edition expands discussion of various models associated with GEE, such as penalized GEE, cumulative and multinomial GEE, survey GEE, and quasi-least squares regression. It also offers a thoroughly new presentation of model selection procedures, including the introduction James W. Hardin of an extension to the QIC measure that is applicable for choosing Hardin Hilbe among working correlation structures. Joseph M. Hilbe K13819 K13819_Cover.indd 1 11/5/12 11:55 AM i \K13819" | 2012/11/1 | 9:02 i i i Generalized Estimating Equations Second Edition i i i i i \K13819" | 2012/11/1 | 9:02 i i i i i i i i \K13819" | 2012/11/1 | 9:02 i i i Generalized Estimating Equations Second Edition James W. Hardin University of South Carolina, USA Joseph M. Hilbe Jet Propulsion Laboratory, California Institute of Technology, USA and Arizona State University, USA i i i i CRC Press Taylor & Francis Group 6000 Broken Sound Parkway NW, Suite 300 Boca Raton, FL 33487-2742 © 2013 by Taylor & Francis Group, LLC CRC Press is an imprint of Taylor & Francis Group, an Informa business No claim to original U.S. Government works Version Date: 20121207 International Standard Book Number-13: 978-1-4398-8114-9 (eBook - PDF) This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint. Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmit- ted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers. For permission to photocopy or use material electronically from this work, please access www.copyright. com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged. Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe. Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com ✐ ✐ “K13819” — 2012/11/1 — 9:11 ✐ ✐ To our wives, Mariaelena Castro-Hardin and Cheryl Lynn Hilbe, and our children, Taylor Antonio Hardin, Conner Diego Hardin, Heather Lynn Hilbe, Michael Joseph Hilbe, and Mitchell Jon Hilbe. ✐ ✐ ✐ ✐ ✐ ✐ “K13819” — 2012/11/1 — 9:11 ✐ ✐ ✐ ✐ ✐ ✐ ✐ ✐ “K13819” — 2012/11/1 — 9:11 ✐ ✐ Contents Preface xi 1 Introduction 1 1.1 Notational conventions and acronyms 2 1.2 A short review of generalized linear models 3 1.2.1 A brief history of GLMs 3 1.2.1.1 GLMs as likelihood-based models 4 1.2.1.2 GLMs and correlated data 4 1.2.2 GLMs and overdispersed data 5 1.2.2.1 Scaling standard errors 6 1.2.2.2 The modified sandwich variance estimator 6 1.2.3 The basics of GLMs 7 1.2.4 Link and variance functions 9 1.2.5 Algorithms 10 1.3 Software 12 1.3.1 R 13 1.3.2 SAS 15 1.3.3 Stata 16 1.3.4 SUDAAN 16 1.4 Exercises 17 2 Model Construction and Estimating Equations 19 2.1 Independent data 19 2.1.1 Optimization 20 2.1.2 The FIML estimating equation for linear regression 20 2.1.3 The FIML estimating equation for Poisson regression 23 2.1.4 The FIML estimating equation for Bernoulli regression 24 2.1.5 The LIML estimating equation for GLMs 26 2.1.6 The LIMQL estimating equation for GLMs 29 2.2 Estimating the variance of the estimates 30 2.2.1 Model-based variance 30 2.2.2 Empirical variance 31 2.2.3 Pooled variance 33 2.3 Panel data 34 2.3.1 Pooled estimators 35 vii ✐ ✐ ✐ ✐ ✐ ✐ “K13819” — 2012/11/1 — 9:11 ✐ ✐ viii CONTENTS 2.3.2 Fixed-effects and random-effects models 37 2.3.2.1 Unconditional fixed-effects models 37 2.3.2.2 Conditional fixed-effects models 39 2.3.2.3 Random-effects models 46 2.3.3 Population-averaged and subject-specific models 53 2.4 Estimation 54 2.5 Summary 55 2.6 Exercises 55 2.7 R code for selected output 57 3 Generalized Estimating Equations 59 3.1 Population-averaged (PA) and subject-specific (SS) models 59 3.2 The PA-GEE for GLMs 61 3.2.1 Parameterizing the working correlation matrix 62 3.2.1.1 Exchangeable correlation 63 3.2.1.2 Autoregressive correlation 71 3.2.1.3 Stationary correlation 74 3.2.1.4 Nonstationary correlation 76 3.2.1.5 Unstructured correlation 78 3.2.1.6 Fixed correlation 79 3.2.1.7 Free specification 79 3.2.2 Estimating the scale variance (dispersion parameter) 82 3.2.2.1 Independence models 83 3.2.2.2 Exchangeable models 88 3.2.3 Estimating the PA-GEE model 91 3.2.4 The robust variance estimate 92 3.2.5 A historical footnote 96 3.2.6 Convergence of the estimation routine 97 3.2.7 ALR: Estimating correlations for binomial models 97 3.2.8 Quasi-least squares 101 3.2.9 Summary 102 3.3 The SS-GEE for GLMs 104 3.3.1 Single random-effects 105 3.3.2 Multiple random-effects 107 3.3.3 Applications of the SS-GEE 108 3.3.4 Estimating the SS-GEE model 111 3.3.5 Summary 113 3.4 The GEE2 for GLMs 113 3.5 GEEs for extensions of GLMs 114 3.5.1 Multinomial logistic GEE regression 115 3.5.2 Proportional odds GEE regression 116 3.5.3 Penalized GEE models 121 3.5.4 Cox proportional hazards GEE models 122 3.6 Further developments and applications 123 3.6.1 ThePA-GEE for GLMs with measurementerror 123 ✐ ✐ ✐ ✐ ✐ ✐ “K13819” — 2012/11/1 — 9:11 ✐ ✐ CONTENTS ix 3.6.2 The PA-EGEE for GLMs 129 3.6.3 The PA-REGEE for GLMs 132 3.6.4 Quadratic inference function for marginal GLMs 135 3.7 Missing data 137 3.8 Choosing an appropriate model 146 3.9 Marginal effects 148 3.9.1 Marginal effects at the means 149 3.9.2 Average marginal effects 150 3.10 Summary 151 3.11 Exercises 153 3.12 R code for selected output 156 4 Residuals, Diagnostics, and Testing 161 4.1 Criterion measures 163 4.1.1 Choosing the best correlation structure 163 4.1.2 Alternatives to the original QIC 166 4.1.3 Choosing the best subset of covariates 169 4.2 Analysis of residuals 171 4.2.1 A nonparametric test of the randomness of residuals 171 4.2.2 Graphical assessment 172 4.2.3 Quasivariance functions for PA-GEE models 184 4.3 Deletion diagnostics 188 4.3.1 Influence measures 189 4.3.2 Leverage measures 194 4.4 Goodness of fit (population-averaged models) 195 4.4.1 Proportional reduction in variation 195 4.4.2 Concordance correlation 195 4.4.3 A χ2 goodness of fit test for PA-GEE binomial models 197 4.5 Testing coefficients in the PA-GEE model 200 4.5.1 Likelihood ratio tests 201 4.5.2 Wald tests 203 4.5.3 Score tests 205 4.6 Assessing the MCAR assumption of PA-GEE models 206 4.7 Summary 209 4.8 Exercises 210 5 Programs and Datasets 213 5.1 Programs 213 5.1.1 Fitting PA-GEE models in Stata 214 5.1.2 Fitting PA-GEE models in SAS 215 5.1.3 Fitting PA-GEE models in R 216 5.1.4 Fitting ALR models in SAS 217 5.1.5 Fitting PA-GEE models in SUDAAN 218 5.1.6 Calculating QIC(P) in Stata 219 5.1.7 Calculating QIC(HH) in Stata 220 ✐ ✐ ✐ ✐ ✐ ✐ “K13819” — 2012/11/1 — 9:11 ✐ ✐ x CONTENTS 5.1.8 Calculating QICu in Stata 221 5.1.9 Graphing the residual runs test in R 222 5.1.10 Using the fixed correlation structure in Stata 223 5.1.11 Fitting quasivariance PA-GEE models in R 224 5.1.12 Fitting GLMs in R 225 5.1.13 Fitting FE models in R using the GAMLSS package 226 5.1.14 Fitting RE models in R using the LME4 package 226 5.2 Datasets 227 5.2.1 Wheeze data 227 5.2.2 Ship accident data 229 5.2.3 Progabide data 231 5.2.4 Simulated logistic data 237 5.2.5 Simulated user-specified correlated data 245 5.2.6 Simulated measurement error data for the PA-GEE 247 References 251 Author index 257 Subject index 259 ✐ ✐ ✐ ✐ ✐ ✐ “K13819” — 2012/11/1 — 9:11 ✐ ✐ Preface Second Edition We are pleased to offer this second edition to Generalized Estimating Equa- tions.