Introduction to the Generalized Estimating Equations and its Applications in Small Cluster Randomized Trials
Fan Li
BIOSTAT 900 Seminar
November 11, 2016
Overview

Outline:
- Background
- The Generalized Estimating Equations (GEEs)
- Improved small-sample inference
- Take-home message

Key questions: How do GEEs work? How can we improve small-sample inference, especially in cluster randomized trial (CRT) applications?
Cluster Randomized Trials (CRTs)

- Randomize clusters of subjects rather than independent subjects (convenience, ethics, contamination, etc.)
- The intervention is administered at the cluster level; outcomes are measured at the individual level
- Outcomes from subjects within the same cluster exhibit greater correlation than those from different clusters
- The intraclass correlation coefficient (ICC) typically ranges from 0.01 to 0.1
- Interest lies in evaluating the intervention effect
- Often a small number of clusters with large cluster sizes
The Stop Colorectal Cancer (STOP CRC) Study

- Studies the effect of an intervention to improve cancer screening
- 26 health clinics ($n = 26$) are allocated to either usual care or intervention (1-to-1 ratio)
- Usual care: opportunistic/occasional screening
- Intervention: an automated program for mailing testing kits with instructions to patients
- The clinics contain variable numbers of patients ($m_i$): min 461 / median 1426 / max 3302
- Primary outcome: a patient-level binary outcome ($y_{ij}$), the completion status of the screening test within a year of study initiation
- Baseline clinic- and individual-level covariates are available
- Inference: what is the estimand?
The Estimand from Conditional Models

- $y_i = (y_{i1}, \ldots, y_{im_i})^T$: the collection of outcomes from clinic $i$
- $X_i$: "design" matrix of cluster $i$ (including intercept, treatment variable, and baseline covariates)
- The generalized linear mixed model
  $$g(E(y_i \mid X_i, b_i)) = X_i\beta + 1_{m_i} b_i$$
  - $g(\cdot)$: a smooth, invertible link function
  - $\beta$: the regression coefficients (including the intervention effect)
  - $b_i \sim N(0, \sigma_b^2)$: Gaussian random effects
- The estimand, defined by a component of $\beta$, typically has a cluster-specific (conditional) interpretation
- $y_{ij} \mid x_{ij}, b_i$ is assumed to follow an exponential family model (likelihood)
The Estimand from Marginal Models

- Recall the basics of GLMs
- Now let $y_{ij} \mid x_{ij}$ follow an exponential family model with mean $\mu_{ij} = E(y_{ij} \mid x_{ij})$ and variance $\nu_{ij} = h(\mu_{ij})/\phi$
  - $h(\cdot)$: the mean-variance relationship
  - $\phi$: dispersion parameter
- Let $\mu_i = E(y_i \mid X_i) = (\mu_{i1}, \ldots, \mu_{im_i})^T$
- Use $g(\mu_i) = X_i\beta$, allowing for non-zero covariance among the components of $y_i$; $\beta$ parameterizes a marginal intervention effect (marginal with respect to what?)
- Population-average intervention effect: a more straightforward interpretation
- To make inferences on $\beta$, how do we describe the correlation between components of $y_i$?
Generalized Estimating Equations (GEEs)

- Let $R_i(\alpha)$ be the $m_i \times m_i$ "working" correlation matrix of $y_i$, with $\alpha$ an unknown nuisance parameter; then the "working" covariance is $V_i = \mathrm{var}(y_i \mid X_i) = A_i^{1/2} R_i(\alpha) A_i^{1/2}/\phi$, where $A_i = \mathrm{diag}(h(\mu_{i1}), \ldots, h(\mu_{im_i}))$
- The GEEs are defined as
  $$\sum_{i=1}^n U_i(\beta, \phi, \alpha) = \sum_{i=1}^n D_i^T V_i^{-1}(y_i - \mu_i) = 0,$$
  where $D_i = \partial\mu_i/\partial\beta^T$
- Only the first two moments of $y_{ij}$ are assumed: a quasi-likelihood score equation (Wedderburn, 1974)
- The efficient score equation from a semiparametric restricted moment model (Tsiatis, 2006)
- Estimation proceeds by iterative algorithms (Newton-Raphson)
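As a concrete illustration, the following minimal numpy sketch evaluates the GEE score above for a logistic marginal model under an independence working correlation; the data and all names are illustrative, not part of the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three toy clusters of size 4 with a cluster-level treatment (illustrative)
X_list = [np.column_stack([np.ones(4), np.full(4, t)]) for t in (0, 1, 0)]
y_list = [rng.integers(0, 2, size=4).astype(float) for _ in X_list]

def gee_score(beta, X_list, y_list):
    """GEE score sum_i D_i' V_i^{-1} (y_i - mu_i) for a logit link with
    an independence working correlation and phi = 1."""
    U = np.zeros(len(beta))
    for X, y in zip(X_list, y_list):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))   # marginal mean, logit link
        A = np.diag(mu * (1.0 - mu))           # variance function h(mu)
        D = A @ X                              # d mu / d beta'
        V = A                                  # working covariance (independence)
        U += D.T @ np.linalg.solve(V, y - mu)
    return U

U0 = gee_score(np.zeros(2), X_list, y_list)
print(U0)
```

With the canonical link and an independence working structure, the score reduces to $X^T(y - \mu)$, which is a quick sanity check on the sketch.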
Dealing with Nuisances

- The working covariance $V_i$ contains the nuisance parameters $\alpha$ and $\phi$
- It is possible to "profile out" these nuisances within the iterative procedure
- Moment-based estimators for the nuisances: given a current estimate $\hat\beta$, the Pearson residuals are $\hat r_{ij} = (y_{ij} - \hat\mu_{ij})/\hat\nu_{ij}^{1/2}$, which are typically used to estimate
  $$\hat\phi = \sum_{i=1}^n \sum_{j=1}^{m_i} \hat r_{ij}^2/(N - p),$$
  where $N = \sum_{i=1}^n m_i$ and $p$ is the dimension of $\beta$
- What about $\alpha$?
Dealing with Nuisances - Cont'd

Choices of the working correlation structure:

- If $R_i = I$, the independence structure: no nuisances involved
- If $\mathrm{corr}(y_{ij}, y_{ij'}) = \alpha$ for $j \neq j'$, we end up with the exchangeable structure. The nuisance can be estimated by
  $$\hat\alpha = \hat\phi \sum_{i=1}^n \sum_{j > j'} \hat r_{ij}\hat r_{ij'} \Big/ \left\{\sum_{i=1}^n m_i(m_i - 1)/2 - p\right\}$$
- If $R_i$ is assumed unstructured, then (loosely speaking)
  $$\hat R_i(\hat\alpha) = \frac{\hat\phi}{n} \sum_{i=1}^n \hat A_i^{-1/2}(y_i - \hat\mu_i)(y_i - \hat\mu_i)^T \hat A_i^{-1/2}$$
- Other types of correlation structures are available
- In CRT applications, the exchangeable structure is often assumed
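The moment estimators for $\phi$ and the exchangeable $\alpha$ can be evaluated directly; below is a small numpy sketch that follows the slide formulas literally, with made-up Pearson residuals (all values illustrative; note a negative estimate of $\alpha$ can occur with toy residuals).

```python
import numpy as np

# Toy Pearson residuals for two clusters (illustrative values); p = dim(beta)
r_list = [np.array([0.5, -1.2, 0.8]), np.array([1.1, -0.3, -0.6, 0.9])]
p = 2
N = sum(len(r) for r in r_list)                 # total number of observations

# Dispersion: phi_hat = sum_ij r_ij^2 / (N - p)
phi_hat = sum((r ** 2).sum() for r in r_list) / (N - p)

# Exchangeable alpha_hat: sum of within-cluster cross-products r_ij r_ij' (j > j'),
# computed via the identity sum_{j>j'} r_j r_j' = ((sum r)^2 - sum r^2) / 2
cross = sum((r.sum() ** 2 - (r ** 2).sum()) / 2.0 for r in r_list)
pairs = sum(len(r) * (len(r) - 1) // 2 for r in r_list)
alpha_hat = phi_hat * cross / (pairs - p)
print(phi_hat, alpha_hat)
```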
Modified Newton's Algorithm

- Initialize $\hat\beta$ → compute $\hat\phi(\hat\beta)$ and $\hat\alpha(\hat\beta, \hat\phi(\hat\beta))$ → update $\hat\beta$ by Newton's method → repeat the last two steps until convergence
- Essentially, we solve for $\beta$ iteratively from
  $$0 = \sum_{i=1}^n U_i\big(\beta, \hat\alpha(\beta, \hat\phi(\beta))\big) = \sum_{i=1}^n D_i^T(\beta)\, V_i^{-1}\big(\beta, \hat\alpha(\beta, \hat\phi(\beta))\big)\,(y_i - \mu_i(\beta)),$$
  with a working assumption on $R_i(\alpha)$
- Why are these efforts worthwhile? It turns out that, under mild assumptions, the final solution $\hat\beta$ is consistent even if $R_i(\alpha)$ is misspecified
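A compact end-to-end sketch of this alternating iteration, for a logistic GEE with an exchangeable working correlation on simulated data with a cluster-level treatment (all data and names illustrative; the clipping of alpha is just a guard for the toy example, not part of the algorithm):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy CRT: 10 clusters of size 8, cluster-level binary treatment (illustrative)
m, n, p = 8, 10, 2
trt = np.repeat([0, 1], n // 2)
X_list = [np.column_stack([np.ones(m), np.full(m, float(t))]) for t in trt]
y_list = [rng.binomial(1, 0.4 + 0.2 * t, size=m).astype(float) for t in trt]

beta = np.zeros(p)
for _ in range(200):
    # Step 1: moment updates of the nuisances given the current beta
    mu_list = [1.0 / (1.0 + np.exp(-X @ beta)) for X in X_list]
    r_list = [(y - mu) / np.sqrt(mu * (1 - mu)) for y, mu in zip(y_list, mu_list)]
    phi = sum((r ** 2).sum() for r in r_list) / (n * m - p)
    cross = sum((r.sum() ** 2 - (r ** 2).sum()) / 2.0 for r in r_list)
    alpha = phi * cross / (n * m * (m - 1) / 2 - p)
    alpha = min(max(alpha, 0.0), 0.95)        # guard: keep R(alpha) positive definite
    R = (1 - alpha) * np.eye(m) + alpha * np.ones((m, m))
    # Step 2: one Fisher-scoring (Newton-type) step on the GEE score
    U, J = np.zeros(p), np.zeros((p, p))
    for X, y, mu in zip(X_list, y_list, mu_list):
        v = mu * (1 - mu)                     # h(mu) for binary outcomes
        D = v[:, None] * X                    # d mu / d beta^T
        V = np.sqrt(v)[:, None] * R * np.sqrt(v)[None, :]
        Vinv = np.linalg.inv(V)
        U += D.T @ Vinv @ (y - mu)
        J += D.T @ Vinv @ D
    step = np.linalg.solve(J, U)
    beta = beta + step
    if np.max(np.abs(step)) < 1e-10:
        break

print("beta_hat:", beta, "alpha_hat:", alpha)
```

With equal cluster sizes and a purely cluster-level covariate, the exchangeable GEE solution matches the arm-level sample proportions exactly (the working correlation cancels), which is a useful check on the sketch.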
Asymptotics

- (A.1) Sufficient moments of the components of $X_i$ and $y_i$ exist
- (A.2) $\hat\phi$ is root-n consistent given $\beta$
- (A.3) $\hat\alpha$ is root-n consistent given $\beta$ and $\phi$
- (A.4) $|\partial\hat\alpha(\beta, \phi)/\partial\phi| = O_p(1)$

Under (A.1)-(A.4), $\hat\beta$ is consistent for the truth $\beta_0$ and asymptotically normal with the sandwich covariance matrix
$$V_{\text{sand}} = \left(\sum_{i=1}^n D_i^T V_i^{-1} D_i\right)^{-1} \left(\sum_{i=1}^n D_i^T V_i^{-1} \mathrm{cov}(y_i) V_i^{-1} D_i\right) \left(\sum_{i=1}^n D_i^T V_i^{-1} D_i\right)^{-1}$$

- (A.2)-(A.4) are usually fulfilled by the moment-based estimators for the nuisances
- If $R_i(\alpha)$ is correctly specified, $V_{\text{sand}}$ equals the model-based variance $\left(\sum_{i=1}^n D_i^T V_i^{-1} D_i\right)^{-1}$
- $V_{\text{sand}}$ does not depend on the nuisances as long as (A.2) and (A.3) hold; it is known as the robust or empirical variance estimator
Asymptotics - Cont'd

- The proof is centered on the classical theory of unbiased estimating equations (van der Vaart, 1998), which simply uses a Taylor expansion
- The key is to realize that $\hat\beta$ is asymptotically linear, with
  $$\sqrt{n}(\hat\beta - \beta_0) = \left(\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^n D_i^T V_i^{-1} D_i\right)^{-1} \frac{1}{\sqrt{n}}\sum_{i=1}^n U_i(\beta_0) + o_p(1)$$
- $V_{\text{sand}}$ then comes from a simple application of the CLT
- Plug-in estimator $\hat V_{\text{sand}}$:
  - replace $\mathrm{cov}(y_i)$ by $\hat e_i\hat e_i^T$ with $\hat e_i = y_i - \hat\mu_i$
  - replace $\beta$ by $\hat\beta$
- Rule of thumb: need at least $n = 50$ clusters for the asymptotics to work
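For a linear marginal model with an independence working structure, $\hat\beta$ is just OLS and the plug-in sandwich $\hat V_{\text{sand}}$ has a closed form; a minimal numpy sketch on simulated clustered data (all names and numbers illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy clustered data: identity link + independence working structure,
# so beta_hat is OLS and the plug-in sandwich is in closed form
n, m = 6, 4
X_list = [np.column_stack([np.ones(m), rng.normal(size=m)]) for _ in range(n)]
y_list = [X @ np.array([1.0, 0.5]) + rng.normal() + 0.5 * rng.normal(size=m)
          for X in X_list]                  # shared draw induces within-cluster correlation

Xall, yall = np.vstack(X_list), np.concatenate(y_list)
bread = np.linalg.inv(Xall.T @ Xall)        # (sum_i D_i' V_i^{-1} D_i)^{-1}
beta_hat = bread @ Xall.T @ yall

meat = np.zeros((2, 2))
for X, y in zip(X_list, y_list):
    e_hat = y - X @ beta_hat                # cluster residual vector
    meat += X.T @ np.outer(e_hat, e_hat) @ X  # plug-in for D_i' V_i^{-1} cov(y_i) V_i^{-1} D_i
V_sand = bread @ meat @ bread
print(np.diag(V_sand))
```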
Small Sample Performance

- Recall that STOP CRC has only 26 clinics ($n = 26$)
- The plug-in estimator uses the residuals $\hat e_i$ to estimate $\mathrm{cov}(y_i)$
- These residuals tend to be too small when $n$ is small
- $\hat V_{\text{sand}}$ would therefore be expected to underestimate the covariance of $\hat\beta$
- How do we correct for this bias?
Resampling

- An immediate solution is resampling
- One could use cluster bootstrapping (sampling clinics with replacement) or a delete-s jackknife covariance estimator
- Practical concerns: computation; a sufficient number of bootstrap replicates; the optimal s in jackknifing
- Corrections without a closed form are difficult to translate into practice
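A sketch of the cluster (clinic-level) bootstrap for a simple cluster-summary effect estimate, resampling clinics with replacement within each arm; the data are simulated and all numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy cluster-level summaries: 26 clinics, 1:1 allocation, screening rates
trt = np.repeat([0, 1], 13)
rates = rng.normal(0.4 + 0.1 * trt, 0.05)

def effect(t, r):
    # difference in arm-level mean screening rates
    return r[t == 1].mean() - r[t == 0].mean()

ctrl, trted = np.where(trt == 0)[0], np.where(trt == 1)[0]
B = 2000
boot = np.empty(B)
for b in range(B):
    # resample clinics with replacement, separately within each arm
    idx = np.concatenate([rng.choice(ctrl, size=ctrl.size),
                          rng.choice(trted, size=trted.size)])
    boot[b] = effect(trt[idx], rates[idx])
se_boot = boot.std(ddof=1)
print("bootstrap SE:", se_boot)
```

In a real analysis each bootstrap replicate would refit the GEE on the resampled clinics, which is exactly the computational burden the slide flags.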
Deriving $V_{\text{sand,MD}}$

- A bias-corrected covariance estimator was proposed by Mancl and DeRouen (2001)
- Idea: reduce the bias of the residual estimator $\hat e_i\hat e_i^T$
- Let $e_i = e_i(\beta) = y_i - \mu_i$; for each $i$ we can write
  $$\hat e_i \approx e_i + \frac{\partial e_i}{\partial\beta^T}(\hat\beta - \beta) = e_i - D_i(\hat\beta - \beta),$$
  where we recall that $D_i = \partial\mu_i/\partial\beta^T = -\partial e_i/\partial\beta^T$
- The second moment of $\hat e_i$:
  $$E(\hat e_i\hat e_i^T) \approx \mathrm{cov}(y_i) - E\big[e_i(\hat\beta - \beta)^T D_i^T\big] - E\big[D_i(\hat\beta - \beta)e_i^T\big] + E\big[D_i(\hat\beta - \beta)(\hat\beta - \beta)^T D_i^T\big]$$
Deriving $V_{\text{sand,MD}}$

- Recall the asymptotic linearity of $\hat\beta$, so we have
  $$\hat\beta - \beta_0 \approx \left(\sum_{i=1}^n D_i^T V_i^{-1} D_i\right)^{-1} \sum_{i=1}^n D_i^T V_i^{-1} e_i$$
- Define $H_{il} = D_i\left(\sum_{k=1}^n D_k^T V_k^{-1} D_k\right)^{-1} D_l^T V_l^{-1}$
  - $H_{ii}$ is the $i$th block diagonal element of a projection matrix: the leverage of the $i$th cluster/clinic
  - The entries of $H_{il}$ lie between zero and one, and are usually close to zero
- Further, since $E(e_ie_l^T) = 0$ for $i \neq l$, we have
  $$E\big[D_i(\hat\beta - \beta)e_i^T\big] = H_{ii}\mathrm{cov}(y_i), \qquad E\big[e_i(\hat\beta - \beta)^T D_i^T\big] = \mathrm{cov}(y_i)H_{ii}^T,$$
  $$E\big[D_i(\hat\beta - \beta)(\hat\beta - \beta)^T D_i^T\big] = \sum_{l=1}^n H_{il}\mathrm{cov}(y_l)H_{il}^T$$
Deriving $V_{\text{sand,MD}}$

- Summing up these terms, we get
  $$E(\hat e_i\hat e_i^T) \approx (I_i - H_{ii})\mathrm{cov}(y_i)(I_i - H_{ii})^T + \sum_{l\neq i} H_{il}\mathrm{cov}(y_l)H_{il}^T \approx (I_i - H_{ii})\mathrm{cov}(y_i)(I_i - H_{ii})^T$$
- $I_i$ is the identity matrix of dimension $m_i$; the latter term is assumed small because the $H_{il}$ are close to zero
- Hence $\mathrm{cov}(y_i) \approx (I_i - H_{ii})^{-1}\hat e_i\hat e_i^T(I_i - H_{ii}^T)^{-1}$
- Consequently, the MD bias-corrected robust sandwich variance estimator takes the form
  $$\hat V_{\text{sand,MD}} = \left(\sum_{i=1}^n D_i^T V_i^{-1} D_i\right)^{-1}\left(\sum_{i=1}^n D_i^T V_i^{-1}(I_i - H_{ii})^{-1}\hat e_i\hat e_i^T(I_i - H_{ii}^T)^{-1} V_i^{-1} D_i\right)\left(\sum_{i=1}^n D_i^T V_i^{-1} D_i\right)^{-1}$$
- This inflates $\hat V_{\text{sand}}$
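In the linear, independence-working-structure case the MD correction is easy to compute directly; a numpy sketch comparing it with the uncorrected plug-in sandwich (simulated data; all names illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 6, 4
X_list = [np.column_stack([np.ones(m), rng.normal(size=m)]) for _ in range(n)]
y_list = [X @ np.array([1.0, 0.5]) + rng.normal() + 0.5 * rng.normal(size=m)
          for X in X_list]                      # shared draw induces within-cluster correlation

Xall, yall = np.vstack(X_list), np.concatenate(y_list)
bread = np.linalg.inv(Xall.T @ Xall)            # here D_i = X_i and V_i = I
beta_hat = bread @ Xall.T @ yall

meat, meat_md = np.zeros((2, 2)), np.zeros((2, 2))
for X, y in zip(X_list, y_list):
    e = y - X @ beta_hat
    Hii = X @ bread @ X.T                       # cluster leverage H_ii
    e_md = np.linalg.solve(np.eye(m) - Hii, e)  # (I - H_ii)^{-1} e_hat
    meat += X.T @ np.outer(e, e) @ X
    meat_md += X.T @ np.outer(e_md, e_md) @ X
V_sand = bread @ meat @ bread
V_md = bread @ meat_md @ bread
print(np.diag(V_sand), np.diag(V_md))
```

Because $H_{ii}$ is symmetric here, $(I - H_{ii})^{-1}$ has eigenvalues at least one, so the adjusted residuals are never shorter than the raw ones.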
Possible Improvement

- In theory, the MD bias correction can be improved by incorporating the off-diagonal blocks $H_{il}$
- Specifically, write the stacked residuals $\hat e = (\hat e_1^T, \ldots, \hat e_n^T)^T$ and the projection matrix $H = D\left(\sum_{i=1}^n D_i^T V_i^{-1} D_i\right)^{-1} D^T V^{-1}$, where
  - $D = (D_1^T, \ldots, D_n^T)^T$
  - $V = \mathrm{block\ diag}(V_1, \ldots, V_n)$
  - $y = (y_1^T, \ldots, y_n^T)^T$
- We can show $E(\hat e\hat e^T) = (I - H)\mathrm{cov}(y)(I - H)^T$, which may promise a more accurate correction
- Any practical issues? Numeric problems arise due to the near-singularity of $I - H$
Heuristics for $V_{\text{sand,KC}}$

- Kauermann and Carroll (2001) proposed an alternative correction by extending the bias-corrected sandwich estimator for linear regression
- Consider the heteroscedastic regression model $Y_i = x_i^T\beta + e_i$ with $e_i \sim N(0, \sigma_i^2)$, and suppose the parameter of interest is a scalar $\theta = c^T\beta$
- Write $X$ for the design matrix, $H = X(X^TX)^{-1}X^T$ for the projection matrix, and define $a_i = c^T(X^TX)^{-1}x_i$
- The sandwich estimator
  $$\hat V_{\text{sand}} = c^T(X^TX)^{-1}\left(\sum_i x_ix_i^T\hat e_i^2\right)(X^TX)^{-1}c := \sum_i a_i^2\hat e_i^2$$
  consistently estimates the variance of the least-squares projection $\hat\theta = c^T\hat\beta$, which is $\sigma^2\sum_i a_i^2 = \sigma^2 c^T(X^TX)^{-1}c$ under homoscedasticity
Heuristics for $V_{\text{sand,KC}}$

- However, under homoscedasticity where $\sigma_i^2 = \sigma^2$, since $\mathrm{cov}(\hat e) = \sigma^2(I - H)$ the expectation is
  $$E(\hat e_i^2) = \sigma^2(1 - h_{ii}),$$
  where $h_{ii}$ is the leverage of observation $i$ (between 0 and 1)
- Indeed,
  $$E(\hat V_{\text{sand}}) = \sigma^2\sum_i a_i^2 - \underbrace{\sigma^2\sum_i a_i^2 h_{ii}}_{\text{bias term}}$$
- A simple fix is to replace $\hat e_i$ in $\hat V_{\text{sand}}$ by the leverage-adjusted residuals $\tilde e_i = (1 - h_{ii})^{-1/2}\hat e_i$
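The downward bias and its fix can be checked numerically: with $\mathrm{cov}(\hat e) = \sigma^2(I - H)$, the identity $E(\hat e_i^2) = \sigma^2(1 - h_{ii})$ is exact. A small numpy sketch (illustrative design matrix and coefficients):

```python
import numpy as np

rng = np.random.default_rng(3)
n_obs = 12
X = np.column_stack([np.ones(n_obs), rng.normal(size=n_obs)])
H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)                              # leverages h_ii

# Under homoscedasticity, cov(e_hat) = sigma^2 (I - H)(I - H)^T = sigma^2 (I - H)
# by idempotence, so E[e_hat_i^2] = sigma^2 (1 - h_ii): residuals shrink toward zero.
sigma2 = 2.0
cov_ehat = sigma2 * (np.eye(n_obs) - H) @ (np.eye(n_obs) - H).T

# Leverage-adjusted (HC2-style) residuals remove this downward bias
y = X @ np.array([1.0, -0.5]) + rng.normal(scale=np.sqrt(sigma2), size=n_obs)
e_hat = y - H @ y
e_tilde = e_hat / np.sqrt(1 - h)
```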
Heuristics for $V_{\text{sand,KC}}$

- The bias of the GEE sandwich estimator likewise stems from the bias of the plug-in estimator $\hat e_i\hat e_i^T$ for $\mathrm{cov}(y_i)$
- A similar fix is to use the cluster-leverage-adjusted residuals $\tilde e_i = (I_i - H_{ii})^{-1/2}\hat e_i$
- The resulting KC bias-corrected sandwich estimator is
  $$\hat V_{\text{sand,KC}} = \left(\sum_{i=1}^n D_i^T V_i^{-1} D_i\right)^{-1}\left(\sum_{i=1}^n D_i^T V_i^{-1}(I_i - H_{ii})^{-1/2}\hat e_i\hat e_i^T(I_i - H_{ii}^T)^{-1/2} V_i^{-1} D_i\right)\left(\sum_{i=1}^n D_i^T V_i^{-1} D_i\right)^{-1}$$
- It turns out that $\hat V_{\text{sand,KC}}$ removes the first-order bias of $\hat V_{\text{sand}}$ if $R_i(\alpha)$ is correctly specified, and it even works well under misspecification
- The corrected variance lies between $\hat V_{\text{sand}}$ and $\hat V_{\text{sand,MD}}$
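In the same linear, independence-working-structure setting, $H_{ii}$ is symmetric, so $(I - H_{ii})^{-1/2}$ can be taken via an eigendecomposition; a sketch comparing the KC correction with the uncorrected sandwich (simulated data; all names illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy clustered data: 6 clusters of size 4 (illustrative)
n, m = 6, 4
X_list = [np.column_stack([np.ones(m), rng.normal(size=m)]) for _ in range(n)]
y_list = [X @ np.array([1.0, 0.5]) + rng.normal() + 0.5 * rng.normal(size=m)
          for X in X_list]

Xall, yall = np.vstack(X_list), np.concatenate(y_list)
bread = np.linalg.inv(Xall.T @ Xall)        # identity link, V_i = I: D_i = X_i
beta_hat = bread @ Xall.T @ yall

def inv_sqrt_sym(M):
    """M^{-1/2} for a symmetric positive definite M, via eigendecomposition."""
    w, Q = np.linalg.eigh(M)
    return Q @ np.diag(1.0 / np.sqrt(w)) @ Q.T

meat, meat_kc = np.zeros((2, 2)), np.zeros((2, 2))
for X, y in zip(X_list, y_list):
    e = y - X @ beta_hat
    Hii = X @ bread @ X.T                   # cluster leverage (symmetric here)
    e_tilde = inv_sqrt_sym(np.eye(m) - Hii) @ e   # (I - H_ii)^{-1/2} e_hat
    meat += X.T @ np.outer(e, e) @ X
    meat_kc += X.T @ np.outer(e_tilde, e_tilde) @ X
V_sand = bread @ meat @ bread
V_kc = bread @ meat_kc @ bread
print(np.diag(V_sand), np.diag(V_kc))
```

The half-power adjustment inflates residuals less aggressively than the MD full-power adjustment, consistent with the slide's remark that KC sits between the two.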
Revisit STOP CRC

- How do we test the intervention effect for STOP CRC with $n = 26$? What decisions should we make?
- Two additional complications:
  - Large variation in clinic sizes: the coefficient of variation is cv = 0.485
  - Wald t-test or Wald z-test?
- These decisions are evaluated by simulation studies
Revisit STOP CRC

- A well-described simulation study (Li and Redden, 2014):
  - Simulate correlated binary outcomes from a marginal beta-binomial model
  - The model is parameterized by the ICC, set to the commonly reported values 0.01 and 0.05
  - Assume 10, 20, and 30 clusters/clinics, with average cluster sizes ranging from 10 to 150
- Main findings:
  - The Wald t-test with $\hat V_{\text{sand,KC}}$ remains valid as long as cv < 0.6
  - The Wald z-test with $\hat V_{\text{sand,MD}}$ is only valid when $n \geq 20$, while the Wald t-test is conservative when $n \leq 20$
  - When cv > 0.6, a different bias-corrected variance by Fay and Graubard (2001) with the Wald t-test is recommended
- For STOP CRC, both $\hat V_{\text{sand,KC}}$ (t-test) and $\hat V_{\text{sand,MD}}$ (z-test) are worth pursuing
Take-home Message

- What are GEEs? What is the general rule of thumb for their asymptotics?
- Intuition for the bias-corrected sandwich estimators and the rule of thumb for CRT applications
- The above simulations dispense with covariate adjustment; would that impact the recommendations?