
CHAPTER 14 ST 762, M. DAVIDIAN

14 Estimating equation methods for population-averaged models

14.1 Introduction

In this chapter, we focus on methods for estimation of the parameters in a population-averaged marginal model of the general form discussed in Chapter 13. In particular, we assume that the pairs (Y_i, x_i), i = 1, ..., m, are independent, where each Y_i is (n_i × 1), and we consider the general mean-covariance model

E(Y_i | x_i) = f_i(x_i, β),   var(Y_i | x_i) = V_i(β, ξ, x_i)   (n_i × n_i),   (14.1)

where f_i(x_i, β) is the (n_i × 1) vector with jth element f(x_ij, β). The covariance model V_i(β, ξ, x_i) is taken to have the form

var(Y_i | x_i) = V_i(β, ξ, x_i) = T_i^{1/2}(β, θ, x_i) Γ_i(α, x_i) T_i^{1/2}(β, θ, x_i),   (14.2)

where T_i(β, θ, x_i) is the diagonal matrix whose diagonal elements are the models for var(Y_ij | x_i), e.g., involving a function var(Y_ij | x_i) = σ² g²(β, θ, x_ij) depending on possibly unknown variance parameters θ (as in the previous chapter, we will generally absorb σ² into θ for brevity and just refer to θ as all the variance parameters). The (n_i × n_i) matrix

Γ_i(α, x_i) is a correlation matrix that generally depends on x_i only through the within-individual times or other conditions t_ij at which the observations in Y_i are taken. Here, α is a vector of unknown correlation parameters.
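To make (14.2) concrete, here is a small numerical sketch of assembling V_i from its pieces. The power-of-the-mean variance function and the AR(1)-type working correlation used below are hypothetical choices for illustration only, not part of the general model.

```python
import numpy as np

def cov_model(f_i, theta, t_i, alpha, sigma2=1.0):
    """Assemble V_i = T_i^{1/2} Gamma_i T_i^{1/2} as in (14.2).

    f_i   : (n_i,) fitted means f(x_ij, beta)
    theta : power parameter in the (hypothetical) variance function
            g(beta, theta, x_ij) = f(x_ij, beta)**theta
    t_i   : (n_i,) observation times, used by the working correlation
    alpha : AR(1)-type parameter, Gamma_jk = alpha**|t_j - t_k| (hypothetical)
    """
    g = f_i ** theta                        # g(beta, theta, x_ij)
    T_half = np.diag(np.sqrt(sigma2) * g)   # T_i^{1/2}, diagonal
    lags = np.abs(np.subtract.outer(t_i, t_i))
    Gamma = alpha ** lags                   # working correlation matrix
    return T_half @ Gamma @ T_half

f_i = np.array([1.0, 2.0, 4.0])
V = cov_model(f_i, theta=0.5, t_i=np.array([0.0, 1.0, 2.0]), alpha=0.4, sigma2=2.0)
# Diagonal is var(Y_ij | x_i) = sigma^2 g^2; off-diagonals carry the correlation.
```

The diagonal of V reproduces the marginal variances, while each off-diagonal entry is the product of the two standard deviations and the working correlation at the corresponding time lag.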

The vector of variance and correlation parameters ξ = (θ^T, α^T)^T may be entirely unknown, or it may be that only α is unknown, as in models where the form of the variance function is entirely specified.

As discussed in Chapter 13, the correlation model Γi(α, xi) is likely not to be correct. Rather, it is specified as a “working” model that hopefully captures some of the main features of the overall pattern of correlation. This is acknowledged when inference is carried out under model (14.1) with assumed covariance structure (14.2); we will discuss this explicitly in Section 14.5.

Model (14.1) may be viewed as a multivariate analog to the univariate mean-variance models discussed in Chapters 2–12. Thus, it should come as no surprise that inferential strategies for (14.1) exploit some of the same ideas as in the univariate case. In particular, estimation of β and ξ is typically carried out by solution of linear or quadratic estimating equations that are similar in spirit to those used for univariate response. Of necessity, these equations are more complicated in the multivariate setting, as we will see in Sections 14.2, 14.3, and 14.4, although the basic principles are the same.


The terminology generalized estimating equations (GEEs), first coined by Liang and Zeger (1986), has come to refer broadly to the body of techniques for inference for β and ξ based on solution of appropriate estimating equations.

• As suggested by the title of Liang and Zeger (1986), “Longitudinal analysis using generalized linear models,” this paper cast the idea of posing estimating equations for multivariate response in the context where the response is collected longitudinally for each experimental unit.

• The development was also restricted to responses such as binary data, counts, and so on, and thus to mean models f and models for var(Y_ij | x_ij) that are of the generalized linear model type. However, this restriction is unnecessary; solving estimating equations for any model of the form (14.1) is feasible more generally.

This restriction does explain why, in much of the literature, there are no unknown variance parameters θ, and interest focuses exclusively on estimating correlation parameters α. For example, for binary response, the model for var(Y_ij | x_ij) might be taken to be the usual model f(x_ij, β){1 − f(x_ij, β)} and is assumed to be correctly specified. A “working” correlation model might be postulated depending on unknown α, so that only α remains to be estimated.

• In our development here, we will allow the possibility that the model V i(β, ξ, xi) may involve both unknown variance parameters θ and unknown correlation parameters α. We will note simplifications that would occur in the case where the variance function does not depend on any unknown parameters θ.

There are numerous references that cover the types of estimating equations we are about to discuss. Some of the key references are Prentice (1988), Zhao and Prentice (1990), Prentice and Zhao (1991), and Liang, Zeger, and Qaqish (1992). See also Section 7.5 and Chapter 8 of Diggle, Liang, and Zeger (1995), Chapter 9 of Vonesh and Chinchilli (1997), and Chapter 3 of Fitzmaurice et al. (2009).

14.2 Linear estimating equations for β

Just as with univariate response, it is natural to start by considering the normal likelihood as a basis to derive an estimation method for β in (14.1). Thus, analogous to the case of “known weights” in the univariate case, assume that the matrices var(Y_i | x_i) = V_i, say, are known. Under these conditions, assuming that the Y_i | x_i are normally distributed, the normal loglikelihood has the form

log L = −(1/2) Σ_{i=1}^m [ log |V_i| + {Y_i − f_i(x_i, β)}^T V_i^{−1} {Y_i − f_i(x_i, β)} ].   (14.3)

Writing

X_i(β) = [ f_β^T(x_i1, β) ; … ; f_β^T(x_in_i, β) ]   (n_i × p),

the matrix whose jth row is the transpose of the gradient f_β(x_ij, β) = ∂/∂β f(x_ij, β), taking derivatives of (14.3) with respect to the (p × 1) vector β, and setting the result equal to zero yields

Σ_{i=1}^m X_i^T(β) V_i^{−1} {Y_i − f_i(x_i, β)} = 0,   (14.4)

which follows by using the following standard matrix differentiation results.

• For the quadratic form q = x^T A x with A symmetric, ∂q/∂x = 2Ax. Note that this is a vector.

• The chain rule gives ∂q/∂β = (∂x/∂β)(∂q/∂x). Note that if both x and β are vectors, then ∂x/∂β is a matrix. (See Section 2.4 for a refresher.)

Section 1.4 of Vonesh and Chinchilli (1997) is an excellent source for many useful and sometimes difficult-to-find results on matrices and matrix differentiation.

The equation (14.4) has the form of a multivariate analog to the usual, univariate WLS equation, where, here, the response and mean are now vectors, and the “weights” V_i^{−1} and “gradient” X_i(β) are matrices.

In fact, we may write (14.4) in a way that makes it clear that there is really no fundamental difference in the general forms in the univariate and multivariate cases. Let Y = (Y_1^T, …, Y_m^T)^T be the vector of length N = Σ_{i=1}^m n_i (total number of observations). Define

f(β) = {f_1^T(x_1, β), …, f_m^T(x_m, β)}^T   (N × 1),

V = block diag(V 1,..., V m) (N × N), and

X(β) = [ X_1(β) ; … ; X_m(β) ]   (N × p),

the matrices X_i(β) stacked one atop the other. Note that E(Y | x̃) = f(β) and var(Y | x̃) = V, where we use conditioning on “x̃” as shorthand to denote that conditioning for each component Y_i is with respect to x_i. Note further that we may rewrite (14.4) as

X^T(β) V^{−1} {Y − f(β)} = 0.   (14.5)
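When the mean is linear in β, say f(β) = Xβ, equation (14.5) has the closed-form GLS solution β̂ = (X^T V^{−1} X)^{−1} X^T V^{−1} Y, which gives a quick numerical check on the stacked representation. The simulated design below is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stacked design X (N x p), block-diagonal V (N x N), response Y, as in (14.5).
m, n_i, p = 50, 3, 2
X = rng.normal(size=(m * n_i, p))
beta_true = np.array([1.0, -0.5])
Vi = 0.5 * np.eye(n_i) + 0.5            # one (n_i x n_i) block, exchangeable
V = np.kron(np.eye(m), Vi)              # block diag(V_1, ..., V_m)
Y = X @ beta_true + rng.multivariate_normal(np.zeros(m * n_i), V)

# For f(beta) = X beta, X^T V^{-1} (Y - X beta) = 0 has the explicit solution:
Vinv = np.linalg.inv(V)
beta_hat = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ Y)

# Check: beta_hat makes the estimating function (14.5) exactly zero.
score = X.T @ Vinv @ (Y - X @ beta_hat)
```

For a nonlinear mean, no closed form exists and the equation must be solved iteratively, as discussed below.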


This has the same form as the matrix representation of the usual WLS estimating equations in the univariate case, with the exception that the “weight matrix” V^{−1} in (14.5) is not necessarily diagonal. This is of no real consequence; the form of the equations is exactly that of WLS.

In fact, note that the summands for i = 1,...,m in the equation written in the form of (14.4) are independent. Moreover, regardless of whether the (constant) matrices V i are actually equal to the true var(Y i|xi), the expectation of a summand, conditional on xi, is clearly equal to zero.

• Thus, the estimating equation is unbiased. We would thus expect estimation of β by solving (14.4) to lead to a consistent estimator.

Of course, if the matrices V i are not known but depend on the parameters β and ξ, following the analogy to the linear case, it is natural to consider replacing them by the postulated covariance model in (14.1). This leads to the linear estimating equation

Σ_{i=1}^m X_i^T(β) V_i^{−1}(β, ξ, x_i) {Y_i − f_i(x_i, β)} = 0,   (14.6)

which may be rewritten, defining V(β, ξ) = block diag{V_1(β, ξ, x_1), …, V_m(β, ξ, x_m)}, as

X^T(β) V^{−1}(β, ξ) {Y − f(β)} = 0.   (14.7)

As in the case where the covariance matrix is known, the summands in (14.6) are independent across i. Hence, we expect that this estimating equation is also unbiased and would lead to a consistent estimator for β. We will discuss this in more detail in Section 14.5.

IN FACT: If the model for the covariance matrix var(Y | x̃) is correct (we of course assume that the model for the mean is correct), note that (14.7) has exactly the form of the optimal linear estimating equation for β as in Section 9.6, except for the detail that the weight matrix V^{−1}(β, ξ) is not diagonal.

• As we will see in Section 14.5, a similar “folklore” result as for the univariate case obtains for the multivariate generalization of GLS that we discuss momentarily. Thus, the same argument used in Section 9.6 to show asymptotic “optimality” among all linear equations in the univariate case would be applicable to the solution to (14.7).

The main point we emphasize now is that we are led to estimating equation (14.6), or equivalently (14.7), by applying the same ideas to the multivariate case that we did in the univariate setting. The equations have the same form as those for generalized least squares (GLS) in the univariate case.


Carrying the analogy further, an obvious strategy for solving (14.4) is suggested, via a three-step algorithm:

(i) Estimate β by β̂^{(0)} and set k = 0. A natural choice for β̂^{(0)} would be the OLS estimator treating all the elements of Y as if they were mutually independent with the same conditional variance; i.e., replacing V in (14.5) by an (N × N) identity matrix. That the OLS estimator is consistent under these conditions follows from the arguments in Section 14.5.

(ii) Estimate ξ (somehow) by ξ̂^{(k)} and form estimated “weight matrices”

V_i^{−1}(β̂^{(k)}, ξ̂^{(k)}, x_i).

(iii) Re-estimate β by solving in β

Σ_{i=1}^m X_i^T(β) V_i^{−1}(β̂^{(k)}, ξ̂^{(k)}, x_i) {Y_i − f_i(x_i, β)} = 0

to obtain β̂^{(k+1)}. Set k = k + 1 and return to (ii).

Iterating C = ∞ times would correspond to solving (14.6) jointly with another equation for ξ.
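The three-step scheme can be sketched in a few lines. In this toy version the mean model f(x, β) = β₁ exp(β₂ x) and the exchangeable V_i are hypothetical choices, and V_i is treated as known so that step (ii) is trivial; step (iii) is solved by Gauss-Newton iteration.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x, beta):                     # hypothetical mean model f(x_ij, beta)
    return beta[0] * np.exp(beta[1] * x)

def fgrad(x, beta):                 # rows of X_i(beta): partials of f w.r.t. beta
    e = np.exp(beta[1] * x)
    return np.column_stack([e, beta[0] * x * e])

m, n_i = 200, 4
x = rng.uniform(0.0, 1.0, size=(m, n_i))
beta_true = np.array([2.0, 1.5])
Vi = 0.04 * (0.7 * np.eye(n_i) + 0.3)          # "known" working covariance
Vinv = np.linalg.inv(Vi)
Y = np.array([f(xi, beta_true) + rng.multivariate_normal(np.zeros(n_i), Vi)
              for xi in x])

# Step (i): a crude starting value; step (iii): iterate the GLS update.
beta = np.array([1.8, 1.3])
for _ in range(25):
    num = np.zeros(2)
    den = np.zeros((2, 2))
    for xi, yi in zip(x, Y):
        Xi = fgrad(xi, beta)                   # (n_i x p) gradient matrix
        r = yi - f(xi, beta)
        num += Xi.T @ Vinv @ r                 # summand of equation (14.6)
        den += Xi.T @ Vinv @ Xi
    beta = beta + np.linalg.solve(den, num)    # Gauss-Newton step
```

With V_i unknown, step (ii) would re-estimate ξ between updates, using, e.g., the moment-based or quadratic estimating equation approaches discussed in this chapter.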

IMPLEMENTING STEP (ii): By analogy to the univariate case, a natural approach to estimating ξ would be to use a quadratic estimating equation. We will discuss this in Section 14.3 shortly.

In the early papers on GEEs in the biostatistical literature by Liang and Zeger, estimation of ξ was advocated based on simple, moment-based approaches. Actually, as this early work was in the context of generalized linear model-type problems, this only involved estimation of the correlation parameters α, as in this setting the variance function does not depend on unknown parameters (except perhaps a scale parameter σ²).

For example, consider the exponential correlation model given in (13.28), reparameterized here as

corr(Y_ij, Y_ij′) = α^{|t_ij − t_ij′|},

where Y_ij and Y_ij′ are observed at “times” t_ij and t_ij′. Under this model, Γ_i(α, x_i) depends on the scalar parameter α. Assume that var(Y_ij | x_ij) = σ² g²(β, θ, x_ij) with θ known. Given an estimate β̂^{(k)} at the kth iteration of the above algorithm, the weighted residuals

wr_ij = {Y_ij − f(x_ij, β̂^{(k)})}/g(β̂^{(k)}, θ, x_ij)

have approximate mean 0 and satisfy

E(wr_ij wr_ij′ | x_i) ≈ σ² α^{|t_ij − t_ij′|}.


Taking logarithms of both sides of this expression yields the approximate relationship

log(wr_ij wr_ij′) ≈ log σ² + |t_ij − t_ij′| log α;

thus, the suggestion was to form all pairs of lagged residuals for each i, pool them together, and estimate log α by simple linear regression of the log(wr_ij wr_ij′) on the |t_ij − t_ij′|. The resulting estimator may be exponentiated to yield an estimator for α.
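The mechanics of this regression are easy to demonstrate. The sketch below feeds the regression the exact population moments σ²α^{|lag|} rather than noisy residual products, so it recovers α exactly; with real residuals the products are noisy and can be zero or negative, and such pairs must be dropped (or otherwise handled) before logs are taken. The time grid and parameter values are hypothetical.

```python
import numpy as np

# Under the exponential working correlation, E(wr_ij wr_ij') ~ sigma^2 *
# alpha**|t_ij - t_ij'|, so the log of the (population) product is linear
# in the lag: log sigma^2 + lag * log alpha.
sigma2, alpha = 1.3, 0.6
t = np.array([0.0, 1.0, 2.0, 4.0, 7.0])
lags, moments = [], []
for j in range(len(t)):
    for k in range(j + 1, len(t)):
        d = abs(t[j] - t[k])
        lags.append(d)
        moments.append(sigma2 * alpha ** d)     # E(wr_ij wr_ik | x_i), exact

slope, intercept = np.polyfit(lags, np.log(moments), 1)
alpha_hat = np.exp(slope)                       # exponentiate the slope
sigma2_hat = np.exp(intercept)
```

Because the inputs here are exactly log-linear in the lag, the regression returns the generating values; with actual residual products, the same two lines give the moment-based estimate described above.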

Given our experience in the univariate case, it seems likely that a potentially more efficient way of estimating α could be identified. Moreover, this approach does not seem to take account of the fact that if observations from the same individual are correlated, then residuals are likely to be so, too.

In any event, following estimation of α in the general model with a scale parameter σ² (with no other unknown variance parameters), the obvious multivariate analog for estimation of σ² is

σ̂² = N^{−1} Σ_{i=1}^m {Y_i − f_i(x_i, β̂^{(k)})}^T {T_i^{1/2}(β̂^{(k)}, θ, x_i) Γ_i(α̂^{(k)}, x_i) T_i^{1/2}(β̂^{(k)}, θ, x_i)}^{−1} {Y_i − f_i(x_i, β̂^{(k)})}.   (14.8)

Of course, N might be replaced by N − p. In the next section, we will see that an estimator of this form when a scale parameter is in the model arises naturally from a quadratic estimating equation approach, as is not unexpected.
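In code, (14.8) is a single pooled sum. The sketch below, with hypothetical inputs, checks the sanity case Γ_i = I and g ≡ 1, in which the estimator reduces to the average squared residual.

```python
import numpy as np

def sigma2_hat(resids, g_list, Gamma_list):
    """Pooled scale estimator (14.8):
    N^{-1} sum_i r_i' {T_i^{1/2} Gamma_i T_i^{1/2}}^{-1} r_i,
    with T_i^{1/2} = diag(g_ij); sigma^2 itself is factored out."""
    total, N = 0.0, 0
    for r, g, Gam in zip(resids, g_list, Gamma_list):
        Thalf = np.diag(g)
        W = Thalf @ Gam @ Thalf
        total += r @ np.linalg.solve(W, r)
        N += len(r)
    return total / N

# Special case: Gamma_i = I and g = 1 reduces to the mean squared residual.
resids = [np.array([1.0, -2.0]), np.array([0.5, 0.5, 3.0])]
est = sigma2_hat(resids,
                 [np.ones(2), np.ones(3)],
                 [np.eye(2), np.eye(3)])
```

Here est = (1 + 4 + 0.25 + 0.25 + 9)/5; replacing N by N − p inside the function is the obvious degrees-of-freedom variant.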

14.3 Quadratic estimating equations for variance and correlation parameters

To develop a class of quadratic estimating equations for ξ, it is natural, as we did in the univariate case, to start with the normal loglikelihood. We thus first consider the form of the loglikelihood under the assumption that the conditional distributions of Y i given xi are normal to deduce the form of an estimating equation and then consider generalizations. As before, for simplicity, we suppress explicit mention of a possible scale parameter, absorbing it into θ for convenience.

The normal loglikelihood for model (14.1) is

log L = −(1/2) Σ_{i=1}^m [ log |V_i(β, ξ, x_i)| + {Y_i − f_i(x_i, β)}^T V_i^{−1}(β, ξ, x_i) {Y_i − f_i(x_i, β)} ].   (14.9)

Assume that θ is (q × 1) and α is (s × 1).

Consider differentiation of (14.9) with respect to the kth (scalar) element of ξ, ξ_k, say, k = 1, …, q + s. This is straightforward using the following well-known matrix differentiation results. Suppose here that V(ξ) is an (n × n) nonsingular matrix depending on a vector ξ.


• If ξ_k is the kth element of ξ, then ∂V(ξ)/∂ξ_k is the (n × n) matrix whose (ℓ, p) element is the partial derivative of the (ℓ, p) element of V(ξ) with respect to ξ_k.

• ∂/∂ξ_k {log |V(ξ)|} = tr[ V^{−1}(ξ) {∂V(ξ)/∂ξ_k} ].

• ∂/∂ξ_k V^{−1}(ξ) = −V^{−1}(ξ) {∂V(ξ)/∂ξ_k} V^{−1}(ξ).

• For the quadratic form q = x^T V(ξ) x, ∂q/∂ξ_k = x^T {∂V(ξ)/∂ξ_k} x. Thus, from the previous result,

∂/∂ξ_k {x^T V^{−1}(ξ) x} = −x^T V^{−1}(ξ) {∂V(ξ)/∂ξ_k} V^{−1}(ξ) x.

Applying these results to (14.9), treating β as fixed, and setting each partial derivative of (14.9) equal to zero, we obtain the following set of (q + s) estimating equations.

(1/2) Σ_{i=1}^m [ {Y_i − f_i(x_i, β)}^T V_i^{−1}(β, ξ, x_i) {∂V_i(β, ξ, x_i)/∂ξ_k} V_i^{−1}(β, ξ, x_i) {Y_i − f_i(x_i, β)}

− tr{ V_i^{−1}(β, ξ, x_i) ∂V_i(β, ξ, x_i)/∂ξ_k } ] = 0,   k = 1, …, q + s.   (14.10)

As with (14.6), the summands for i = 1,...,m are independent. Moreover, analogous to the univariate case, if V i(β, ξ, xi) is correctly specified, then, at the true values, the conditional expectation of a summand is zero. This may be seen by using the following result:

• If x is a random vector with mean zero and covariance matrix V, and A is a square matrix, then E(x^T A x) = tr{E(xx^T) A} = tr(V A) = tr(A V).

Using this, we have (assuming expectation is under the parameter values β and ξ),

E[ {Y_i − f_i(x_i, β)}^T V_i^{−1}(β, ξ, x_i) {∂V_i(β, ξ, x_i)/∂ξ_k} V_i^{−1}(β, ξ, x_i) {Y_i − f_i(x_i, β)} | x_i ]

= tr[ V_i^{−1}(β, ξ, x_i) {∂V_i(β, ξ, x_i)/∂ξ_k} V_i^{−1}(β, ξ, x_i) V_i(β, ξ, x_i) ]

= tr[ V_i^{−1}(β, ξ, x_i) ∂V_i(β, ξ, x_i)/∂ξ_k ],

whence unbiasedness of (14.10) follows. Note, of course, that if V_i(β, ξ, x_i) were incorrectly specified, the equation would not be unbiased.
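The trace identity driving this calculation, E(x^T A x) = tr{A E(xx^T)}, holds exactly for any empirical distribution, since x^T A x = tr(A xx^T). The following sketch verifies it by averaging over a finite set of vectors.

```python
import numpy as np

rng = np.random.default_rng(2)

# Check E(x' A x) = tr{A E(xx')} exactly, for x uniform over a finite set.
n, nsamp = 4, 50
xs = rng.normal(size=(nsamp, n))
A = rng.normal(size=(n, n))
A = A + A.T                          # symmetric, as in the quadratic-form results

lhs = np.mean([x @ A @ x for x in xs])     # E(x' A x) under this empirical law
second_moment = xs.T @ xs / nsamp          # E(xx')
rhs = np.trace(A @ second_moment)          # tr(A V) = tr(V A)
```

The agreement is exact up to floating-point error, with no Monte Carlo approximation involved, because the identity is algebraic rather than distributional.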

JOINT ESTIMATING EQUATIONS: Analogous to the univariate case, iterating the three-step algorithm with C = ∞ would correspond to jointly solving in β and ξ the (p + q + s)-dimensional system of estimating equations given as follows.


Σ_{i=1}^m X_i^T(β) V_i^{−1}(β, ξ, x_i) {Y_i − f_i(x_i, β)} = 0   (14.11)

(1/2) Σ_{i=1}^m [ {Y_i − f_i(x_i, β)}^T V_i^{−1}(β, ξ, x_i) {∂V_i(β, ξ, x_i)/∂ξ_k} V_i^{−1}(β, ξ, x_i) {Y_i − f_i(x_i, β)}

− tr{ V_i^{−1}(β, ξ, x_i) ∂V_i(β, ξ, x_i)/∂ξ_k } ] = 0,   k = 1, …, q + s.   (14.12)

SCALE PARAMETER: When the covariance model includes a scale parameter σ², say, the equation in (14.12) corresponding to differentiation with respect to σ² takes a simple form. Specifically, if we write var(Y_i | x_i), in a slight abuse of notation, as σ² V_i(β, θ, α, x_i), with ξ = (θ^T, α^T, σ²)^T, then with ξ_{q+s} = σ²,

∂/∂ξ_{q+s} {σ² V_i(β, θ, α, x_i)} = V_i(β, θ, α, x_i),

so that (14.12) in the case k = q + s reduces to

Σ_{i=1}^m [ σ^{−4} {Y_i − f_i(x_i, β)}^T V_i^{−1}(β, θ, α, x_i) {Y_i − f_i(x_i, β)} − n_i σ^{−2} ] = 0,

which is easily seen to lead to an expression for σ̂² of the form in (14.8).

The multiplicative factor of (1/2) in the equations for ξ in (14.12) could be disregarded, but we maintain it for now, as it proves important for the developments of Section 14.6, the motivation for which we now discuss.

ALTERNATIVE FORM OF THE QUADRATIC ESTIMATING EQUATION FOR ξ: Recall that, in the univariate case, we expressed the summand in joint estimating equations in the form

{ gradient of mean function } × { covariance matrix }^{−1} × { response − mean }.   (14.13)

From this representation, we were immediately able to identify generalizations of the equations and determine the “optimal” equation under a set of assumptions on the moments.

• For example, in the case where quadratic equations were motivated by normality, this represen- tation allowed us to replace normal moments by those corresponding to the assumptions.

It is in fact possible to represent estimating equations such as (14.11) and (14.12) in the form (14.13), although it is a little more involved than in the univariate case.

• Writing estimating equations in the form (14.13) has become the standard way to represent such equations and was first made popular in papers by Prentice (1988), Zhao and Prentice (1990), and Prentice and Zhao (1991).


We now consider how this might be accomplished. For definiteness, we first focus only on the quadratic equation for estimating ξ given in (14.12). Rather than try to show that (14.12) may be written in this alternative form directly, we instead start with the idea that one would want to write an equation in the form (14.13) and demonstrate how this would be done. The equivalence between this approach and (14.12) derived from normality is presented in Section 14.6.

We will consider β fixed for now, and consider estimation of ξ. If we are interested in estimating the elements of ξ, which describe an entire covariance structure (variances and correlations, hence variances and covariances), we must consider the variances of each element of a data vector and all pairwise associations among elements of a data vector. Of course, if there are no unknown variance parameters, we need only consider the associations.

In particular, for Y_i = (Y_i1, …, Y_in_i)^T, if we let v_ijk(β, ξ) be the (j, k) element of V_i(β, ξ, x_i) (which is of course equal to the (k, j) element by symmetry), then we know that, if V_i(β, ξ, x_i) is specified correctly,

E[{Y_ij − f(x_ij, β)}² | x_i] = v_ijj(β, ξ),

E[{Y_ij − f(x_ij, β)}{Y_ik − f(x_ik, β)} | x_i] = v_ijk(β, ξ).

In the univariate case, the quadratic estimating equation was based on treating the squared deviations as the “response” and equating them to their “mean,” the variance function. By analogy, it is natural to think that the quadratic equation in the multivariate case should involve a “response” that includes both squared deviations equated to the variance functions and crossproducts of deviations equated to their means (the covariances).

• By symmetry, for Y_i (n_i × 1), there are n_i squared deviations (one for each entry of Y_i) and n_i(n_i − 1)/2 distinct crossproducts (i.e., the number of covariances), for a total of n_i(n_i + 1)/2 distinct terms.

Explicitly, the n_i(n_i + 1)/2 distinct terms are the n_i squared deviations {Y_i1 − f(x_i1, β)}², …, {Y_in_i − f(x_in_i, β)}² and the n_i(n_i − 1)/2 crossproduct terms

{Y_i1 − f(x_i1, β)}{Y_i2 − f(x_i2, β)}, {Y_i1 − f(x_i1, β)}{Y_i3 − f(x_i3, β)}, …, {Y_i,n_i−1 − f(x_i,n_i−1, β)}{Y_in_i − f(x_in_i, β)}.

• Thus, the “response” and “mean” vectors here should be of length n_i(n_i + 1)/2, assuming that the model contains unknown variance parameters. If it does not, then only the n_i(n_i − 1)/2 crossproduct terms would be required.


• Of course, as the quadratic estimating equation (14.12) depends on the quadratic form in {Y_i − f_i(x_i, β)}, it also depends on squared deviations and crossproducts.

To formalize this, define

u_ijk = {Y_ij − f(x_ij, β)}{Y_ik − f(x_ik, β)},   (14.14)

where the notation suppresses the dependence on β for brevity. Then we may collect the distinct u_ijk defined in (14.14) in a vector of length n_i(n_i + 1)/2 (with unknown variance parameters) or n_i(n_i − 1)/2 (with no unknown variance parameters) in some order.

In the former case, we will define

u_i = (u_i11, u_i12, u_i13, …, u_i1n_i, u_i22, u_i23, …, u_i,n_i−1,n_i−1, u_i,n_i−1,n_i, u_in_in_i)^T.

• Recall that for an (n × r) matrix A, vec(A) is defined as the (nr × 1) vector consisting of the r columns of A stacked in the order 1, …, r.

• If furthermore A is (n × n) and symmetric, then vec(A) contains redundant entries. The vech(·) operator yields the column vector containing all the distinct entries of A by stacking the lower-triangular elements; e.g., for n = 3,

A = [ a11 a12 a13 ; a12 a22 a23 ; a13 a23 a33 ]   and   vech(A) = (a11, a12, a13, a22, a23, a33)^T.

• Thus, our convention here is to use the order imposed by the definition

u_i = vech[ {Y_i − f_i(x_i, β)}{Y_i − f_i(x_i, β)}^T ].

If there are no unknown variance parameters in ξ, ui would be defined by deleting the squared components.
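The vech ordering used here is simple to code; u_i of (14.14) is then just vech applied to the residual outer product. A minimal sketch:

```python
import numpy as np

def vech(A):
    """Stack the on-and-below-diagonal entries of A column by column."""
    n = A.shape[0]
    return np.array([A[i, j] for j in range(n) for i in range(j, n)])

A = np.array([[11., 12., 13.],
              [12., 22., 23.],
              [13., 23., 33.]])
# Order matches the text: (a11, a12, a13, a22, a23, a33)'.

r = np.array([1.0, 2.0, -1.0])        # a residual vector Y_i - f_i(x_i, beta)
u_i = vech(np.outer(r, r))            # the n_i(n_i+1)/2 distinct u_ijk of (14.14)
```

Dropping the rows of vech that correspond to diagonal entries gives the n_i(n_i − 1)/2-vector used when there are no unknown variance parameters.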

We may define a corresponding vector

v_i(β, ξ) = {v_i11(β, ξ), v_i12(β, ξ), v_i13(β, ξ), …, v_i22(β, ξ), v_i23(β, ξ), …, v_in_in_i(β, ξ)}^T;

i.e., v_i(β, ξ) = vech{V_i(β, ξ, x_i)}. If u_i contains no squared components (i.e., there are no unknown variance parameters), then the corresponding elements of v_i would be deleted.


Then we have that the “response” vector ui satisfies

E(u_i | x_i) = v_i(β, ξ),

so that v_i(β, ξ) is the “mean” vector.

It is important to recognize in reading the literature that there are variations on the construction we describe here. For example, some authors instead base the equations on the “response” vech(Y_i Y_i^T), and/or may “stack” things in a different order.

Treating vi(β, ξ) as the mean vector and as a function of ξ, we may define the “gradient matrix of the mean” to be

E_i(β, ξ) = ∂/∂ξ v_i(β, ξ).

E_i(β, ξ) will have n_i(n_i + 1)/2 or n_i(n_i − 1)/2 rows, depending on the form of V_i(β, ξ, x_i), and (q + s) columns.

Let the “covariance matrix” of ui be

var(u_i | x_i) = Z_i(β, ξ), say. A little thought about the form of this matrix makes it clear that it is fairly complex. In particular, in order to specify this matrix, we would have to be willing to make assumptions about quantities of the general form

cov(u_ijk, u_iℓp | x_i) = E(u_ijk u_iℓp | x_i) − E(u_ijk | x_i) E(u_iℓp | x_i).   (14.15)

To investigate what this entails and to make the analogy to the univariate case transparent, consider the particular model for var(Y_i | x_i) that involves a scale parameter σ² and additional variance parameters θ such that

var(Y_ij | x_ij) = σ² g²(β, θ, x_ij)

for a variance function g and correlation parameters α, where ξ = (θ^T, α^T, σ²)^T. Define

ǫ_ij = {Y_ij − f(x_ij, β)}/{σ g(β, θ, x_ij)}.

Then, of course, E(ǫ_ij | x_i) = 0 and var(ǫ_ij | x_i) = 1, where expectation here and subsequently is under the parameter values β and ξ. Clearly, the elements of ǫ_i = (ǫ_i1, …, ǫ_in_i)^T are correlated, with correlation matrix equal to Γ_i(α, x_i) (assuming, as we are, that this matrix is correctly specified).

Under this model, for j, k = 1,...,ni,

u_ijj = σ² g²(β, θ, x_ij) ǫ_ij²,   u_ijk = σ² g(β, θ, x_ij) g(β, θ, x_ik) ǫ_ij ǫ_ik.


Thus, of course,

v_ijj = σ² g²(β, θ, x_ij) E(ǫ_ij² | x_i) = σ² g²(β, θ, x_ij),

v_ijk = σ² g(β, θ, x_ij) g(β, θ, x_ik) E(ǫ_ij ǫ_ik | x_i) = σ² g(β, θ, x_ij) g(β, θ, x_ik) corr(ǫ_ij, ǫ_ik | x_i),

where corr(ǫ_ij, ǫ_ik | x_i) is the (j, k) element of the correlation matrix Γ_i(α, x_i).

In general, using the shorthand notation

g_ij = g(β, θ, x_ij),

we may rewrite (14.15) in terms of the ǫ_ij as

cov(u_ijk, u_iℓp | x_i) = σ⁴ g_ij g_ik g_iℓ g_ip {E(ǫ_ij ǫ_ik ǫ_iℓ ǫ_ip | x_i) − E(ǫ_ij ǫ_ik | x_i) E(ǫ_iℓ ǫ_ip | x_i)}.   (14.16)

The representation (14.16) highlights how much more complicated the multivariate case is relative to the univariate! Recall that in the univariate case, the “response” corresponding to a single independent observation is the scalar squared deviation, so that in constructing the “covariance matrix” we needed only to be concerned about the variance of a squared deviation, which boils down to concern about the variance of the square of a standardized error ǫ and hence the need to make an assumption about excess kurtosis.

Here, in contrast, the “response” corresponding to a single independent observation (response vector) is itself a complex vector, so we must be concerned with numerous variances and covariances of the general form (14.16).

To illustrate the complexity, note some special cases of (14.16):

cov(u_ijj, u_ijj | x_i) = var(u_ijj | x_i) = σ⁴ g_ij⁴ var(ǫ_ij² | x_i),

cov(u_ijk, u_ijk | x_i) = var(u_ijk | x_i) = σ⁴ g_ij² g_ik² [ E(ǫ_ij² ǫ_ik² | x_i) − {E(ǫ_ij ǫ_ik | x_i)}² ],

cov(u_ijj, u_iℓℓ | x_i) = σ⁴ g_ij² g_iℓ² {E(ǫ_ij² ǫ_iℓ² | x_i) − 1},

cov(u_ijk, u_ijp | x_i) = σ⁴ g_ij² g_ik g_ip {E(ǫ_ij² ǫ_ik ǫ_ip | x_i) − E(ǫ_ij ǫ_ik | x_i) E(ǫ_ij ǫ_ip | x_i)},

cov(u_ijj, u_ijp | x_i) = σ⁴ g_ij³ g_ip {E(ǫ_ij³ ǫ_ip | x_i) − E(ǫ_ij ǫ_ip | x_i)}.

Of course, (14.16) represents the general case for j ≠ k ≠ ℓ ≠ p.

RESULT: In order to specify the “covariance matrix” Z_i(β, ξ), we must be prepared to make assumptions about numerous higher moments involving the elements of Y_i, or equivalently ǫ_i, up to four-way associations, i.e., E(ǫ_ij ǫ_ik ǫ_iℓ ǫ_ip | x_i).


Note that if we have ni = 1, so that Y i is a scalar, all of this reduces to the situation first discussed in Chapter 5.

ESTIMATING EQUATION: Putting aside for the moment the troublesome issue of specifying Zi(β, ξ), the preceding developments suggest the following approach. Given some assumption on Zi(β, ξ) (thus, some assumption on higher moments of the ǫij), to estimate ξ (treating β fixed for now), one would solve an estimating equation of the form

Σ_{i=1}^m E_i^T(β, ξ) Z_i^{−1}(β, ξ) {u_i − v_i(β, ξ)} = 0.   (14.17)

This, of course, is a (q + s)-dimensional system of equations in (q + s) unknowns.

• This is a quadratic estimating equation in a much more general sense than in the univariate case. A convenient way to think of this is that the squared deviations and crossproducts that make up the “response” here are components of a quadratic form.

• By analogy to the univariate case, it seems likely that the equation will be “optimal” for estimating ξ if Z_i(β, ξ) is chosen correctly, so that var(u_i | x_i) = Z_i(β, ξ).

• In step (ii) of the three-step algorithm, one could implement this by replacing β by β̂^{(k)} everywhere, including in u_i, where the dependence on β has been suppressed here.

• It is straightforward to verify that, in the univariate case ni = 1, so that there is no parameter α, the estimating equation (14.17) reduces to the general quadratic equation for variance parameters in the second “row” of Equation (10.2).

There are two issues to be resolved:

• Clearly, getting Z_i(β, ξ) correct involves major moment assumptions. The chance that we would be able to specify all of the relevant moments correctly seems slim, if not nil. What, then, is a practical strategy for specifying Z_i(β, ξ)?

• How does (14.17) compare to the equation in the form (14.12)? The latter equation may be thought of as a multivariate generalization of “pseudolikelihood” (PL), as it is based on the normal theory likelihood. Intuition and the analogy to the univariate case suggest that choosing Z_i(β, ξ) to correspond to the matrix that would obtain if the Y_i | x_i were multivariate normal should yield (14.12). Is this in fact true?


SPECIFYING Zi(β, ξ): The difficulty associated with specifying Zi(β, ξ) with confidence is widely acknowledged. Thus, the standard approach is to face up to the difficulty and, instead of attempting to “get it right,” adopt a “working assumption” that is likely incorrect but might at least capture some of the predominant features of associations among the elements of ui. For instance, it might be better than simply taking Zi(β, ξ) to be an identity matrix for each i (i.e., no “weighting”).

• A “working assumption” involves adopting, for the purposes of deducing a form for Z_i(β, ξ), a distributional specification for Y_i | x_i, or at least for aspects of the distribution.

• As discussed in Section 14.5, standard errors for the estimator of β may be adjusted to take into account that the working assumption may be incorrect.

Some popular working assumptions are as follows:

(i) Independence working assumption. Take Z_i(β, ξ) to be the covariance matrix for u_i | x_i that would be obtained if all the Y_ij (equivalently, the ǫ_ij) were assumed to be independent across j.

It may be deduced from the general expressions above that this implies that Z_i(β, ξ) will be a diagonal matrix with diagonal elements var(u_ijk | x_i) = σ⁴ g_ij² g_ik² (j ≠ k) and var(u_ijj | x_i) = σ⁴ g_ij⁴ var(ǫ_ij² | x_i). The analyst must specify var(ǫ_ij² | x_i) but no other higher moments of the ǫ_ij.

(ii) Gaussian working assumption. Take Z_i(β, ξ) to be the covariance matrix for u_i | x_i that would be obtained by assuming that the distribution of Y_i | x_i is normal with the first two moments correctly specified according to the assumed mean-covariance model (14.1). Equivalently, assume that ǫ_i | x_i is normal with mean 0 and covariance matrix Γ_i(α, x_i).

It may be shown under this condition that

cov(u_ijk, u_iℓp | x_i) = v_ijℓ v_ikp + v_ijp v_ikℓ = σ⁴ g_ij g_ik g_iℓ g_ip {E(ǫ_ij ǫ_iℓ | x_i) E(ǫ_ik ǫ_ip | x_i) + E(ǫ_ij ǫ_ip | x_i) E(ǫ_ik ǫ_iℓ | x_i)}.   (14.18)

The entries of Zi(β, ξ) may then be determined from the simplified relationship (14.18).

Note that, as E(ǫ_ij ǫ_ik | x_i) is equal to the conditional correlation between ǫ_ij and ǫ_ik, from (14.18) all of the needed entries of Z_i(β, ξ) depend only on the assumed correlation model Γ_i(α, x_i). Moreover, (14.18) also yields

var(u_ijj | x_i) = 2σ⁴ g_ij⁴,

where the “2” is as expected.
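Under the Gaussian working assumption, every entry of Z_i follows from V_i via (14.18). The sketch below builds Z_i for a hypothetical V_i in the vech ordering and checks the properties noted in the text: symmetry, var(u_ijj | x_i) = 2 v_ijj², and positive semidefiniteness (as befits a genuine covariance matrix).

```python
import numpy as np

def gaussian_Z(V):
    """Z_i under the Gaussian working assumption, via (14.18):
    cov(u_ijk, u_ilp) = v_ijl v_ikp + v_ijp v_ikl, entries in vech order."""
    n = V.shape[0]
    pairs = [(i, j) for j in range(n) for i in range(j, n)]  # vech ordering
    d = len(pairs)
    Z = np.empty((d, d))
    for a, (j, k) in enumerate(pairs):
        for b, (l, p) in enumerate(pairs):
            Z[a, b] = V[j, l] * V[k, p] + V[j, p] * V[k, l]
    return Z

V = np.array([[2.0, 0.8, 0.4],    # a hypothetical positive definite V_i
              [0.8, 1.5, 0.6],
              [0.4, 0.6, 1.0]])
Z = gaussian_Z(V)
```

Note that nothing beyond V_i itself, hence nothing beyond the assumed variance and correlation models, is needed to fill in Z_i under this working assumption.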


Although these choices may indeed be misspecifications, the hope is that they will produce estimators “closer to” being “optimal” than simply ignoring the pattern of association among the elements of u_i altogether.

RELATIONSHIP TO PSEUDOLIKELIHOOD: As we have mentioned, the “PL” estimating equation (14.12), deduced from the normality assumption, i.e.,

(1/2) Σ_{i=1}^m [ {Y_i − f_i(x_i, β)}^T V_i^{−1}(β, ξ, x_i) {∂V_i(β, ξ, x_i)/∂ξ_k} V_i^{−1}(β, ξ, x_i) {Y_i − f_i(x_i, β)}

− tr{ V_i^{−1}(β, ξ, x_i) ∂V_i(β, ξ, x_i)/∂ξ_k } ] = 0,   k = 1, …, q + s,   (14.19)

is quadratic, as it depends on a quadratic form in {Y_i − f_i(x_i, β)}. A quadratic form may of course be written as a linear combination of squared deviations and crossproducts; e.g.,

{Y_i − f_i(x_i, β)}^T V_i^{−1}(β, ξ, x_i) {Y_i − f_i(x_i, β)} = Σ_{j=1}^{n_i} Σ_{k=1}^{n_i} {Y_ij − f(x_ij, β)}{Y_ik − f(x_ik, β)} v^{ijk}(β, ξ),

where v^{ijk}(β, ξ) is the (j, k) element of V_i^{−1}(β, ξ, x_i). That is, the relevant quadratic form in the summand of the PL equation (14.19) is a linear combination of the elements of u_i.
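The expansion of the quadratic form into crossproducts weighted by the entries of V_i^{−1} is a one-line numerical check; the inputs below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)

# r' V^{-1} r  =  sum_{j,k} r_j r_k v^{jk},  with v^{jk} the (j,k) entry of
# V^{-1}: the PL quadratic form is a linear combination of the u_ijk.
n = 4
B = rng.normal(size=(n, n))
V = B @ B.T + n * np.eye(n)     # a positive definite "covariance" matrix
r = rng.normal(size=n)          # residual vector Y_i - f_i(x_i, beta)

Vinv = np.linalg.inv(V)
qform = r @ Vinv @ r
expansion = sum(r[j] * r[k] * Vinv[j, k] for j in range(n) for k in range(n))
```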

• Of course, the summands of the estimating equation in (14.17),

Σ_{i=1}^m E_i^T(β, ξ) Z_i^{−1}(β, ξ) {u_i − v_i(β, ξ)} = 0,   (14.20)

are also linear combinations of the elements of u_i.

• If the distributions of Y_i | x_i really are normal, and we choose Z_i(β, ξ) according to the Gaussian working assumption, then, by analogy to the developments in previous chapters (e.g., Section 9.6), (14.20) must be the (asymptotically) “optimal” quadratic estimating equation, as the “weight matrix” Z_i(β, ξ) is correctly specified.

• But the quadratic PL equation (14.19) is also the normal theory ML estimating equation for ξ. Thus, it is also (asymptotically) “optimal” if the data really are normally distributed. Moreover, it is also quadratic.

• Both equations cannot be the “optimal” quadratic equation and be different; hence, intuition suggests that estimating equations (14.19) and (14.20) must be the same.

It turns out that it is possible to show analytically the equivalence between the quadratic PL equation deduced from normality, (14.19), and the equation (14.20) with Z_i(β, ξ) chosen according to the Gaussian working assumption. We defer demonstration of this to Section 14.6, where the argument is carried out in detail.


RESULT: Using a (quadratic) equation of the form (14.20) to estimate ξ, along with the linear equation for β, it is clear that iteration of the three-step algorithm on page 375 with C = ∞ solves the (p + q + s)-dimensional system of equations

\[
\sum_{i=1}^{m}
\begin{pmatrix} X_i(\beta) & 0 \\ 0 & E_i(\beta,\xi) \end{pmatrix}^T
\begin{pmatrix} V_i(\beta,\xi,x_i) & 0 \\ 0 & Z_i(\beta,\xi) \end{pmatrix}^{-1}
\begin{pmatrix} Y_i - f_i(x_i,\beta) \\ u_i - v_i(\beta,\xi) \end{pmatrix} = 0. \tag{14.21}
\]
Compare these equations to, for example, the univariate equations (6.12) (which incorporate the univariate version of the “Gaussian working assumption” to yield the “2σ^4 g_j^4” term). It is straightforward to observe that, with n_i = 1 for all i and the Gaussian working assumption, (14.21) reduces to (6.12).

Continuing the analogy, writing α̃ = (β^T, ξ^T)^T to represent all the unknown parameters being estimated, it is clear that we can write (14.21) in the form

\[
\sum_{i=1}^{m} D_i^T(\tilde\alpha)\,\tilde V_i^{-1}(\tilde\alpha)\{s_i(\tilde\alpha) - m_i(\tilde\alpha)\} = 0, \tag{14.22}
\]
where

\[
D_i(\tilde\alpha) = \begin{pmatrix} X_i(\beta) & 0 \\ 0 & E_i(\beta,\xi) \end{pmatrix} \quad \{n_i + n_i(n_i+1)/2\} \times (p + q + s),
\]
\[
\tilde V_i(\tilde\alpha) = \begin{pmatrix} V_i(\beta,\xi,x_i) & 0 \\ 0 & Z_i(\beta,\xi) \end{pmatrix} \quad \{n_i + n_i(n_i+1)/2\} \times \{n_i + n_i(n_i+1)/2\},
\]
\[
s_i(\tilde\alpha) = \begin{pmatrix} Y_i \\ u_i(\beta,\xi) \end{pmatrix}, \qquad m_i(\tilde\alpha) = \begin{pmatrix} f_i(x_i,\beta) \\ v_i(\beta,\xi) \end{pmatrix} \quad \{n_i + n_i(n_i+1)/2\} \times 1.
\]
Once the equations are written in the form (14.22), it is clear that, by analogy to the developments in Section 6.4, (14.22) may be solved by a Gauss-Newton updating scheme.

• Because the dimension of α˜ may be large, the full Gauss-Newton updating scheme based on (14.22) may be unwieldy in practice.

• But, as the equations “separate,” just as in the univariate case, one may carry out the update in two steps: for given ξ, update β by solving the β equation (first p rows) and then, for given β, solve the ξ equation (last q + s rows). This, of course, is exactly what is done in steps (ii) and (iii) of the three-step algorithm.
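To make the separation concrete, here is a minimal numpy sketch of the two-step iteration for a hypothetical toy problem; all specifics (the loglinear mean, the working variance σ²f², working independence so that V_i is diagonal and common to all subjects, and the simple moment estimator standing in for the quadratic ξ equation) are illustrative assumptions, not part of the development above:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: m subjects observed at the same n_i = 4 times t, with
# mean f(t, beta) = exp(beta1 + beta2 t) and working variance sigma^2 f^2.
m, t = 200, np.array([1.0, 2.0, 3.0, 4.0])
beta_true = np.array([0.5, -0.2])
f = lambda b: np.exp(b[0] + b[1] * t)
Y = f(beta_true) * rng.gamma(shape=4.0, scale=0.25, size=(m, 4))  # E(Y) = f

beta, sigma2 = np.zeros(2), 1.0
for _ in range(50):
    # (ii) update beta for fixed sigma2: one Gauss-Newton step on the linear
    #      estimating equation sum_i X_i' V_i^{-1} {Y_i - f_i} = 0
    fi = f(beta)
    X = np.column_stack([fi, fi * t])   # X_i = d f_i / d beta, shared by all i
    w = 1.0 / (sigma2 * fi ** 2)        # diagonal of V_i^{-1}
    rbar = (Y - fi).mean(axis=0)        # average residual vector over subjects
    beta = beta + np.linalg.solve((X.T * w) @ X, (X.T * w) @ rbar)
    # (iii) update sigma2 for fixed beta: a simple moment estimator standing in
    #       for the quadratic estimating equation for the variance parameter
    sigma2 = np.mean((Y - f(beta)) ** 2 / f(beta) ** 2)
```

Iterating (ii) and (iii) to convergence in this way is exactly the separated solution of the stacked system (14.22) for this toy model.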


14.4 Quadratic estimating equations for β

Just as in the univariate case, it is natural in the multivariate context to think about trying to increase efficiency for estimation of β by extracting information about β from the covariance matrix V i(β, ξ).

As we saw in the univariate case (Chapter 5), one way to obtain a quadratic estimating equation for β and to motivate a general class of such equations is to take the normal loglikelihood as a starting point.

Taking the normal loglikelihood (14.9) as a starting point and differentiating with respect to β, it is straightforward to deduce, by the same matrix differentiation operations leading to the linear equation (14.4) and the quadratic equation (14.19) for ξ, that the resulting equation for β is
\[
\sum_{i=1}^{m}\bigg[ X_i^T(\beta) V_i^{-1}(\beta,\xi,x_i)\{Y_i - f_i(x_i,\beta)\}
+ \tfrac{1}{2}\Big(\!\Big( \{Y_i - f_i(x_i,\beta)\}^T V_i^{-1}(\beta,\xi,x_i)\{\partial/\partial\beta_\ell\, V_i(\beta,\xi,x_i)\}V_i^{-1}(\beta,\xi,x_i)\{Y_i - f_i(x_i,\beta)\} - \mathrm{tr}\big\{V_i^{-1}(\beta,\xi,x_i)\,\partial/\partial\beta_\ell V_i(\beta,\xi,x_i)\big\} \Big)\!\Big)\bigg] = 0 \quad (p \times 1), \tag{14.23}
\]
where the double parentheses indicate p terms of the form inside them stacked for ℓ = 1,...,p. Of course, underlying estimating equation (14.23) is the assumption of normality. This equation would be solved jointly with the quadratic equation (14.19) for ξ under the normal theory ML approach.

Alternatively, following the same reasoning as in the previous section for estimation of ξ, it should be clear that a more general quadratic equation is possible. In particular, define vi(β, ξ) and ui as before, and let

\[
B_i(\beta,\xi) = \partial/\partial\beta\; v_i(\beta,\xi) \quad \{n_i(n_i+1)/2 \times p\},
\]
so that B_i(β, ξ) is the “gradient matrix of the mean” E(u_i|x_i) under the model. Then consider the joint estimating equations for β and ξ given by
\[
\sum_{i=1}^{m}
\begin{pmatrix} X_i^T(\beta) & B_i^T(\beta,\xi) \\ 0 & E_i^T(\beta,\xi) \end{pmatrix}
\begin{pmatrix} V_i(\beta,\xi,x_i) & 0 \\ 0 & Z_i(\beta,\xi) \end{pmatrix}^{-1}
\begin{pmatrix} Y_i - f_i(x_i,\beta) \\ u_i - v_i(\beta,\xi) \end{pmatrix} = 0. \tag{14.24}
\]

• Clearly, because of the presence of the “gradient matrix” Bi(β, ξ) in the first p rows of (14.24), this leads to an estimating equation for β that is quadratic.

• If Z_i(β, ξ) were chosen according to the Gaussian working assumption, then, by the same reasoning as at the end of Section 14.3, the resulting equation corresponding to the first p rows of (14.24) should be the “optimal” such equation if normality really holds and, thus, intuitively, should be identical to the normal theory ML equation (14.23). That this is indeed the case follows from arguments like those in Section 14.6.


Note from (14.24) that the off-block-diagonal elements of the “covariance matrix”
\[
\tilde V_i(\beta,\xi) = \begin{pmatrix} V_i(\beta,\xi,x_i) & 0 \\ 0 & Z_i(\beta,\xi) \end{pmatrix}
\]
are all equal to zero. Now, it is easy to show that, under normality of Y_i|x_i, we have
\[
\mathrm{cov}(Y_i, u_i|x_i) = 0.
\]
So, in fact, this observation reinforces the notion that (14.24) with the Gaussian working assumption is the optimal joint equation if normality really holds, as in this case Ṽ_i(β, ξ) is exactly the covariance matrix of the “response” s_i = (Y_i^T, u_i^T)^T.

By analogy to the univariate case, if we believe that the distribution of Y_i|x_i is not normal, we could in principle arrive at a more general equation by specifying the covariance matrix of the “response” s_i = (Y_i^T, u_i^T)^T to embody more realistic assumptions about the necessary moments of s_i. If we specify

\[
\mathrm{var}(u_i|x_i) = Z_i(\beta,\xi) \quad \text{and} \quad \mathrm{cov}(Y_i, u_i|x_i) = C_i(\beta,\xi),
\]
say, we would arrive at the estimating equation
\[
\sum_{i=1}^{m}
\begin{pmatrix} X_i^T(\beta) & B_i^T(\beta,\xi) \\ 0 & E_i^T(\beta,\xi) \end{pmatrix}
\begin{pmatrix} V_i(\beta,\xi,x_i) & C_i(\beta,\xi) \\ C_i^T(\beta,\xi) & Z_i(\beta,\xi) \end{pmatrix}^{-1}
\begin{pmatrix} Y_i - f_i(x_i,\beta) \\ u_i - v_i(\beta,\xi) \end{pmatrix} = 0. \tag{14.25}
\]

• Provided that the assumptions for Zi(β, ξ) and Ci(β, ξ) were correct, we would expect to have constructed the “optimal” such quadratic equation.

• In order to do this, we must not only feel confident in the moment assumptions on var(u_i|x_i) embodied in Z_i(β, ξ). We must also make moment assumptions corresponding to C_i(β, ξ), which involve specifying terms of the form
\[
\mathrm{cov}(Y_{ij}, u_{ik\ell}|x_i) = \sigma^3 g_{ij} g_{ik} g_{i\ell}\, E(\epsilon_{ij}\epsilon_{ik}\epsilon_{i\ell}|x_i).
\]
That is, we need to be willing to specify not only the E(ε_{ij}³|x_i) as in the univariate case, but also the “three-way associations.”

• Of course, the chance that we would be able to specify the matrices Z_i(β, ξ) and C_i(β, ξ) completely correctly in practice is slim. Indeed, specifying V_i(β, ξ, x_i) correctly in itself is difficult enough, as we have already discussed!

Putting this issue aside for the moment, it should be clear that implementation of estimation via solution of equations of the general form in (14.25) may also be carried out via a Gauss-Newton updating scheme, redefining the matrices Di(β, ξ) and V˜ i(β, ξ) in the obvious way.


• Note, however, that it is no longer possible to separate estimation of β from estimation of ξ as in the linear estimating equation case; solution of the equation for β here, even for fixed ξ, is considerably more complicated.

WORKING ASSUMPTIONS: Obviously, as noted above, it is well recognized that correct specification of all of the necessary moments involved in an equation like (14.25) is pretty hopeless in practice. Thus, just as with the quadratic equation for estimation of ξ alone, the practical strategy is to make a “working assumption” about the entire matrix Ṽ_i(β, ξ). This of course involves an assumption on the matrix Z_i(β, ξ) as before, along with one on the matrix C_i(β, ξ).

Popular working assumptions for V˜ i(β, ξ) in this context are as follows, analogous to the previous discussion.

• Independence working assumption. Pretending that the elements of Y_i are mutually independent leads to the choice for Z_i(β, ξ) discussed previously on page 384 and C_i(β, ξ) = 0.

• Gaussian working assumption. Pretending that the distribution of Y_i|x_i is normal leads to the choice for Z_i(β, ξ) given in (14.18) and C_i(β, ξ) = 0.

TERMINOLOGY: Equations like those in (14.21) and (14.25) have come to be referred to as generalized estimating equations of specific types:

• Equations of the form (14.21), which involve solving a linear estimating equation for β jointly with a quadratic one for ξ, have been called GEE-1.

• Equations of the form (14.25), which involve solving a quadratic estimating equation for β jointly with a quadratic one for ξ, have been called GEE-2.

• This terminology was evidently coined in a paper by Liang, Zeger, and Qaqish (1992).

• The unqualified term “GEE” is used both to refer to the general approach of specifying estimating equations for mean-covariance models of the form (14.1) and to the particular case where the linear equation for β in (14.6) is solved along with simple moment-type equations for the elements of ξ.

Some authors insist that the term “GEE” also embodies the notion that “working assumptions” are involved, including those on the correlation matrix of Y_i|x_i, that are likely to be incorrect. These authors thus also imply when they use the term GEE that standard errors for the estimator for β must be “corrected” to account for the possibility that the assumptions may be incorrect.


We will discuss the basis for this in the context of linear estimating equations in the next section. Other authors use “GEE” more loosely and generally.

REMARKS: It should be clear from the preceding development that the same issues involving trade-offs between linear and quadratic estimating equations for β that we discussed for the univariate case extend to the multivariate setting. In fact, because the problem here is more complex, the implications may be even more profound.

• There is a choice between linear and quadratic equations.

• The linear equation is clearly unbiased even if the covariance model V_i(β, ξ, x_i) is misspecified. Such misspecification is more likely in the multivariate case, as the analyst must model not only variance but also correlation structure. The latter is probably more difficult, so most of the focus on misspecification in this context is on incorrect modeling of the correlation matrix Γ_i(α, x_i).

• The quadratic equation for β is obviously also unbiased as long as the covariance model V_i(β, ξ, x_i) is correctly specified. Thus, if the analyst is confident in the form of this matrix, this equation offers an alternative to the linear equation that may increase efficiency. Of course, the optimal such equation would require the analyst to also specify Z_i(β, ξ) and C_i(β, ξ) correctly. If this is the case (unlikely in practice), then the resulting estimator for β will be more efficient than even the optimal linear estimator. If these matrices are not correct, although the estimating equation would remain unbiased, whether or not it offers an improvement in efficiency over the linear equation is no longer clear (by analogy to the results in Chapter 10).

• Of course, the scary thing about quadratic equations for β is that they will not be unbiased if V_i(β, ξ, x_i) is not correctly specified, leading to potentially inconsistent estimation of β.

In the univariate case, all that is required is correct modeling of variance, something in which the analyst may often have great confidence. In the multivariate case, the analyst is generally much less confident, as described above. Thus, the potential for increased efficiency may be greatly offset by fear of inconsistent estimation in the multivariate case.

Thus, it is generally agreed that “GEE-1” estimation based on solving equations of the form (14.21) is the “safer” choice for routine use. It is unusual to see quadratic estimating equations for β used in practice for multivariate response, although they are discussed extensively in the literature because of the theoretical potential for increased efficiency.


14.5 The “folklore” theorem and “robust” covariance matrix

Not surprisingly, the theoretical results for estimation of β via solution of equations of the form (14.21) and (14.25) parallel those for the univariate case. Because there are two “sample sizes” here, m = number of individuals and ni = number of observations on individual i, it is important to qualify this statement by stating explicitly the asymptotic framework in which it is valid:

• In the literature on multivariate response, the usual perspective is that the m individuals have been sampled from a population of interest. From the population-averaged modeling perspective, we model the first two moments of the relevant conditional distribution in order to learn about features of this population.

The natural asymptotic framework under these conditions is one in which m → ∞, so we learn more about the population as m increases (as in the univariate cases discussed earlier), while the n_i are regarded as fixed.

• Regarding n_i as fixed often makes practical sense. In many real situations, the number of observations on each of the experimental units may be rather small; recall the wheezing example discussed in Section 13.4, where at most four measurements were available on each of 300 children.

• Sometimes, however, there may be many observations on each unit. In this case, it may be possible to treat not only m but also the n_i as tending to ∞. This requires slightly more delicacy in specifying how this might happen.

• In fact, a scenario that has sparked a good deal of recent interest is that in which m is not that large but, rather, the n_i are. Clearly, if inference on the population of units is of interest, small m is a liability, as it limits the information available on the population. In this situation, recent work has focused on “small sample corrections” to improve the reliability of inference, e.g., Mancl and DeRouen (2001). Still other work is devoted to a different asymptotic framework in which the n_i → ∞ while m stays fixed; the scope of relevance for this is clearly different from that above.

• In many situations, the n_i may be dictated by design. But note that in some contexts the n_i may in fact themselves be a response; e.g., in the developmental toxicology setting in Example 1.7, the size of the litter may well change in response to increasing toxicity. In everything we have done so far, we have not highlighted this issue; rather, we have implicitly conditioned on the n_i. We will not delve into this issue further here, but be aware that, if circumstances dictate, the n_i themselves may in fact contain important information on scientific questions of interest, and a modeling strategy that does not condition on them may make more sense.


Here, we will focus on the framework in which m → ∞ while the n_i remain fixed.

It should be clear that it would be possible to carry out large sample theory arguments for the estimators for β and ξ in model (14.1) that would be obtained from both (14.21) and (14.25). These arguments would be analogous to those for linear and quadratic equations for β coupled with solution of an additional estimating equation for ξ, as discussed in Chapters 9 and 10.

In this section, we will confine our attention to the properties of estimators for β found by solving the linear GEE estimating equation in (14.21) along with estimation of covariance parameters by solution of an additional equation (e.g., a quadratic equation). The famous results we derive are analogous to the “folklore theorem” for GLS estimators in the univariate case, as we now demonstrate.

As the argument allowing for misspecification of the covariance matrix V i(β, ξ, xi) includes the case where this matrix is in fact correctly specified, we present only the more general argument under misspecification. This argument is analogous to that for GLS in the univariate case with misspecified variance function, given in Section 9.3; as we saw in that case, the results when in fact the variance function is given correctly, presented in Section 9.2, were really just a special case of these.

To formalize the argument, suppose that, although the conditional mean model E(Y i|xi) = f i(xi, β) is correctly specified, the covariance model is misspecified as

\[
\mathrm{var}(Y_i|x_i) = U_i(\beta,\gamma,x_i), \tag{14.26}
\]
where it is most likely that the model depends on β through the components of the mean vector, and γ is a vector of variance and correlation parameters specifying the chosen model. As we have discussed, most often the misspecification would be in the correlation matrix. If there are no additional variance parameters (i.e., the chosen variance function is known as a function of β), then γ would represent correlation parameters in the misspecified correlation model. More generally, if the model additionally contains unknown variance parameters, some elements of γ may be the variance parameters in a correctly or incorrectly specified variance function.

Suppose that the true form of the covariance matrix is, for some V i,

\[
\mathrm{var}(Y_i|x_i) = V_i(\beta_0,\xi_0,x_i), \tag{14.27}
\]
where β_0 and ξ_0 are the true values of these parameters. Define
\[
\delta_i = V_i^{-1/2}(\beta_0,\xi_0,x_i)\{Y_i - f_i(x_i,\beta_0)\},
\]
where V_i^{1/2}(β_0, ξ_0, x_i) is a symmetric square root of V_i(β_0, ξ_0, x_i) and V_i^{−1/2}(β_0, ξ_0, x_i) its inverse, such that V_i^{−1/2}(β_0, ξ_0, x_i) V_i^{1/2}(β_0, ξ_0, x_i) = V_i^{1/2}(β_0, ξ_0, x_i) V_i^{−1/2}(β_0, ξ_0, x_i) = I_{n_i}.
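A symmetric square root of a positive definite matrix may be computed from its eigendecomposition; a small numpy sketch (with a randomly generated positive definite matrix standing in for V_i) verifying the properties just stated:

```python
import numpy as np

rng = np.random.default_rng(2)
ni = 4
A = rng.standard_normal((ni, ni))
V = A @ A.T + ni * np.eye(ni)   # positive definite stand-in for V_i(beta0, xi0, x_i)

# Symmetric square root via the eigendecomposition V = P diag(lam) P'
lam, P = np.linalg.eigh(V)
V_half = P @ np.diag(np.sqrt(lam)) @ P.T           # V^{1/2}, symmetric
V_neghalf = P @ np.diag(1.0 / np.sqrt(lam)) @ P.T  # its inverse, V^{-1/2}

assert np.allclose(V_half @ V_half, V)              # (V^{1/2})^2 = V
assert np.allclose(V_neghalf @ V_half, np.eye(ni))  # V^{-1/2} V^{1/2} = I
assert np.allclose(V_half, V_half.T)                # symmetry
```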


Then it is clear that
\[
E(\delta_i|x_i) = 0, \qquad \mathrm{var}(\delta_i|x_i) = I_{n_i}.
\]

Assuming the incorrect model (14.26), by analogy to the discussion in Section 9.3, suppose there is a value γ* such that, if we estimate γ by solving an appropriate estimating equation, the resulting estimator γ̂ satisfies
\[
m^{1/2}(\hat\gamma - \gamma^*) = O_p(1).
\]

As in the univariate case, γ̂ is likely to have been obtained by solving this estimating equation jointly with the linear GEE equation in (14.21). That is, the resulting estimator β̂ solves
\[
\sum_{i=1}^{m} X_i^T(\hat\beta)\, U_i^{-1}(\hat\beta^*, \hat\gamma, x_i)\{Y_i - f_i(x_i,\hat\beta)\} = 0. \tag{14.28}
\]
Analogous to the arguments in Sections 9.2 and 9.3, we denote the estimator for β in the “weights” as β̂* to allow the possibility of a finite number of iterations of a three-step algorithm and to keep track of its influence.

• Clearly, even with the covariance matrix misspecified, the estimating equation in (14.28) is unbiased, so that β̂ should be consistent under regularity conditions. We also assume, as in the univariate case, that m^{1/2}(β̂* − β_0) = O_p(1).

Expanding (14.28) in a Taylor series in (β̂^T, β̂*^T, γ̂^T)^T about (β_0^T, β_0^T, γ*^T)^T, we obtain

\[
0 \approx C_m^* + (A_{m1}^* + A_{m2}^*)\, m^{1/2}(\hat\beta - \beta_0) + D_m^*\, m^{1/2}(\hat\beta^* - \beta_0) + E_m^*\, m^{1/2}(\hat\gamma - \gamma^*), \tag{14.29}
\]
where
\[
C_m^* = m^{-1/2}\sum_{i=1}^{m} X_i^T(\beta_0)\, U_i^{-1}(\beta_0,\gamma^*,x_i)\, V_i^{1/2}(\beta_0,\xi_0,x_i)\,\delta_i,
\]
\[
A_{m1}^* = m^{-1}\sum_{i=1}^{m} \{\partial/\partial\beta\, X_i^T(\beta_0)\}\, U_i^{-1}(\beta_0,\gamma^*,x_i)\, V_i^{1/2}(\beta_0,\xi_0,x_i)\,\delta_i,
\]
\[
A_{m2}^* = -m^{-1}\sum_{i=1}^{m} X_i^T(\beta_0)\, U_i^{-1}(\beta_0,\gamma^*,x_i)\, X_i(\beta_0),
\]
\[
D_m^* = m^{-1}\sum_{i=1}^{m} X_i^T(\beta_0)\, \{\partial/\partial\beta\, U_i^{-1}(\beta_0,\gamma^*,x_i)\}\, V_i^{1/2}(\beta_0,\xi_0,x_i)\,\delta_i,
\]
\[
E_m^* = m^{-1}\sum_{i=1}^{m} X_i^T(\beta_0)\, \{\partial/\partial\gamma\, U_i^{-1}(\beta_0,\gamma^*,x_i)\}\, V_i^{1/2}(\beta_0,\xi_0,x_i)\,\delta_i.
\]

Clearly, as m → ∞, A_{m1}*, D_m*, and E_m* all converge in probability to 0; thus, as expected from the univariate case, the effects of β̂* and γ̂ used to form the “weight” matrices are negligible.


We thus have, applying these results to (14.29) and rearranging,
\[
m^{1/2}(\hat\beta - \beta_0) \approx -A_{m2}^{*-1} C_m^*.
\]
Writing X_i = X_i(β_0), U_i = U_i(β_0, γ*, x_i), and V_i = V_i(β_0, ξ_0, x_i), we have that
\[
A_{m2}^* \xrightarrow{p} -A, \qquad A = \lim_{m\to\infty} m^{-1}\sum_{i=1}^{m} X_i^T U_i^{-1} X_i,
\]
and
\[
C_m^* \xrightarrow{L} \mathcal{N}(0, B), \qquad B = \lim_{m\to\infty} m^{-1}\sum_{i=1}^{m} X_i^T U_i^{-1} V_i U_i^{-1} X_i.
\]
Combining, we obtain
\[
m^{1/2}(\hat\beta - \beta_0) \xrightarrow{L} \mathcal{N}(0, A^{-1} B A^{-1}),
\]
so that
\[
\hat\beta \;\dot\sim\; \mathcal{N}\left[\beta_0,\; \Big(\sum_{i=1}^{m} X_i^T U_i^{-1} X_i\Big)^{-1} \Big(\sum_{i=1}^{m} X_i^T U_i^{-1} V_i U_i^{-1} X_i\Big) \Big(\sum_{i=1}^{m} X_i^T U_i^{-1} X_i\Big)^{-1}\right]. \tag{14.30}
\]
REMARKS:

• Obviously, if in fact the true model (14.27) and the assumed model (14.26) coincide, so that the covariance structure has been correctly specified, then U_i = V_i, and the covariance matrix in (14.30) reduces to
\[
\Big(\sum_{i=1}^{m} X_i^T V_i^{-1} X_i\Big)^{-1},
\]
which, adopting the definitions on page 373, may be written as
\[
(X^T V^{-1} X)^{-1}.
\]
This, of course, is of the identical form to the corresponding covariance matrix in the univariate case.

• Similarly, defining U analogously to V, the covariance matrix in (14.30) may be written as
\[
(X^T U^{-1} X)^{-1} (X^T U^{-1} V U^{-1} X) (X^T U^{-1} X)^{-1}.
\]

Thus, note that the comparison between the large sample covariance matrices that arise when var(Y_i|x_i) is correctly and incorrectly specified is identical to that in the univariate case, and the same discussion applies.

• In particular, misspecification of the covariance matrix of the response, var(Y i|xi), will result in a loss of efficiency relative to modeling this moment correctly.


CAUTION: This argument requires that m^{1/2}(γ̂ − γ*) = O_p(1) for some γ*. Thus, for example, if we solve the linear estimating equation for β along with the quadratic equation for the covariance parameters, where the model is misspecified, we assume that such a γ* exists.

Recall that it is most likely that misspecification will be in the correlation matrix, so that this is the source of the difficulty. In this case, the components of γ in the misspecified model that are correlation parameters have no real meaning.

Crowder (1995) points out that there may be situations where, if there are no unknown variance parameters, the estimator γ̂ of the correlation parameters for the chosen “working” correlation matrix may not have the well-defined limit in probability required for the above argument to go through! There are configurations where the true correlation matrix is such that, for certain “working” assumptions that are incorrect, γ* may not exist. Details are beyond the scope of our treatment here; however, the overall message is that there is no general large sample argument to guarantee the above “folklore” result; we must assume that there exists a γ* such that γ̂ converges in probability to γ*.

QUADRATIC ESTIMATING EQUATIONS: Arguments similar to those in Chapter 10 may be used to deduce the general form of the approximate joint covariance matrix of estimators for β and ξ using quadratic equations (14.25) under different working assumptions. We do not present this here.

ROBUST SANDWICH COVARIANCE MATRIX: Although modeling the covariance matrix correctly is of course desirable because of the potential for increased precision in estimation of β, as we have discussed previously, it may be difficult to have confidence that this is the case. In particular, correct modeling of the marginal conditional correlation structure is likely to be difficult, so that the specified model is ordinarily viewed in most applications as only a “working” assumption.

Given this, it is usually unreasonable to construct estimated measures of uncertainty such as standard errors or confidence intervals on the basis of the “correct” folklore result. Instead, it is assumed upfront that the specified model is incorrect, and the estimated standard errors are constructed using the general result (14.30).

More specifically, analogous to the discussion in Section 9.4, the large sample approximate covariance matrix we would like to estimate to use as a basis for standard errors is

\[
(X^T U^{-1} X)^{-1} (X^T U^{-1} V U^{-1} X) (X^T U^{-1} X)^{-1}.
\]


This involves the matrix (X^T U^{−1} X)^{−1} that one would construct under the assumption that U_i(β, γ, x_i) is the correct model. Of course, m times this matrix may be estimated by
\[
\hat A^{-1} = \Big\{ m^{-1}\sum_{i=1}^{m} X_i^T(\hat\beta)\, U_i^{-1}(\hat\beta, \hat\gamma, x_i)\, X_i(\hat\beta) \Big\}^{-1}. \tag{14.31}
\]

Defining
\[
\hat R_i = \{Y_i - f_i(x_i,\hat\beta)\}\{Y_i - f_i(x_i,\hat\beta)\}^T,
\]
m^{−1} times the “middle” matrix may be estimated by
\[
\hat B = m^{-1}\sum_{i=1}^{m} X_i^T(\hat\beta)\, U_i^{-1}(\hat\beta,\hat\gamma,x_i)\, \hat R_i\, U_i^{-1}(\hat\beta,\hat\gamma,x_i)\, X_i(\hat\beta). \tag{14.32}
\]

The estimated matrices (14.31) and (14.32) may be combined to obtain an estimator for the covariance matrix of β̂, under the assumption that var(Y_i|x_i) may be incorrectly specified, as
\[
m^{-1} \hat A^{-1} \hat B \hat A^{-1}. \tag{14.33}
\]

(Note that the “m” and “m^{−1}” terms cancel.) The routine use of (14.33) in this context was suggested by Liang and Zeger (1986). Software packages that implement this type of “GEE-1” estimation generally calculate standard errors based on (14.33) by default or allow the user to request this calculation. It is generally considered prudent in practice to use these “robust sandwich” standard errors to protect against the likely misspecification of correlation structure.
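The calculation in (14.31)–(14.33) is purely mechanical once β̂, γ̂, and the working covariance model are in hand. A minimal numpy sketch; the function name and the list-based interface are hypothetical conveniences, not taken from any particular package:

```python
import numpy as np

def robust_sandwich(X_list, U_list, resid_list):
    """Robust (sandwich) covariance estimate for beta-hat, following (14.31)-(14.33).

    X_list[i]     : (n_i x p) gradient matrix X_i evaluated at beta-hat
    U_list[i]     : (n_i x n_i) working covariance U_i(beta-hat, gamma-hat, x_i)
    resid_list[i] : (n_i,) residual vector Y_i - f_i(x_i, beta-hat)
    """
    m = len(X_list)
    p = X_list[0].shape[1]
    A = np.zeros((p, p))   # m^{-1} sum_i X_i' U_i^{-1} X_i
    B = np.zeros((p, p))   # m^{-1} sum_i X_i' U_i^{-1} R_i U_i^{-1} X_i, R_i = r_i r_i'
    for X, U, r in zip(X_list, U_list, resid_list):
        UinvX = np.linalg.solve(U, X)
        A += X.T @ UinvX / m
        g = UinvX.T @ r                # X_i' U_i^{-1} r_i
        B += np.outer(g, g) / m
    Ainv = np.linalg.inv(A)
    return Ainv @ B @ Ainv / m         # (14.33); the m's cancel as noted above
```

When the working covariance is correct, this estimate should agree closely with the “naive” matrix (Σ X_i^T U_i^{−1} X_i)^{−1}; when it is not, the middle term supplies the protection.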

14.6 Equivalence of “pseudolikelihood” and “GEE” estimating equations for covariance parameters

On page 385, we noted that it is possible to show formally the equivalence between the quadratic PL equation for ξ deduced from normality, (14.19), given by
\[
\frac{1}{2}\sum_{i=1}^{m}\Big[\{Y_i - f_i(x_i,\beta)\}^T V_i^{-1}(\beta,\xi,x_i)\{\partial/\partial\xi_k\, V_i(\beta,\xi,x_i)\}V_i^{-1}(\beta,\xi,x_i)\{Y_i - f_i(x_i,\beta)\} - \mathrm{tr}\big\{V_i^{-1}(\beta,\xi,x_i)\,\partial/\partial\xi_k V_i(\beta,\xi,x_i)\big\}\Big] = 0, \quad k = 1,\ldots,q+s, \tag{14.34}
\]
and the alternative form (14.20) in the case where the matrix Z_i(β, ξ) is chosen according to the Gaussian working assumption, namely,
\[
\sum_{i=1}^{m} E_i^T(\beta,\xi)\, Z_i^{-1}(\beta,\xi)\{u_i - v_i(\beta,\xi)\} = 0. \tag{14.35}
\]
Here,
\[
u_i = \mathrm{vech}\big[\{Y_i - f_i(x_i,\beta)\}\{Y_i - f_i(x_i,\beta)\}^T\big].
\]

In practice, squared terms are deleted if the model contains no unknown variance parameters. We will not note this explicitly in the following argument.

It suffices to show that (14.34) and (14.35) are in fact the same estimating equation under the conditions above; specifically, it suffices to show that their kth rows coincide. The kth row of (14.34) may be written, using the identity for quadratic forms x^T A x = tr(A x x^T), as

\[
\frac{1}{2}\sum_{i=1}^{m} \mathrm{tr}\Big[\{\partial/\partial\xi_k V_i(\beta,\xi,x_i)\}V_i^{-1}(\beta,\xi,x_i)\{Y_i - f_i(x_i,\beta)\}\{Y_i - f_i(x_i,\beta)\}^T V_i^{-1}(\beta,\xi,x_i) - \{\partial/\partial\xi_k V_i(\beta,\xi,x_i)\}V_i^{-1}(\beta,\xi,x_i)\Big] = 0. \tag{14.36}
\]
Noting that E_i^T(β, ξ) has kth row {∂/∂ξ_k v_i(β, ξ)}^T, we may write the kth row of (14.35) as

\[
\sum_{i=1}^{m} \{\partial/\partial\xi_k\, v_i(\beta,\xi)\}^T Z_i^{-1}(\beta,\xi)\{u_i - v_i(\beta,\xi)\} = 0. \tag{14.37}
\]
We may show the result by showing that the ith summand in (14.36) is equal to that in (14.37).

The following results are available in many advanced texts on matrix algebra, such as Chapter 16 of Harville (1997), or in Appendix 4.A of Fuller (1987). For matrices A (a × a), B, C, D,

(i) tr(AB) = {vec(A)}^T{vec(B^T)} = {vec(A^T)}^T{vec(B)}.

(ii) tr(ABD^T C^T) = {vec(A)}^T (B ⊗ C) vec(D), where “⊗” represents the Kronecker product.

(iii) For A symmetric, there is a relationship between vec(A) and vech(A). In particular, there exists a unique matrix Φ of dimension {a² × a(a + 1)/2} such that
\[
\mathrm{vec}(A) = \Phi\, \mathrm{vech}(A).
\]
Clearly, Φ is unique and of full column rank, as there is only one way to write the distinct elements of A in a full, redundant vector.

There also exist many (not unique) linear transformations of vec(A) into vech(A). It should be clear that there cannot be a unique such transformation. Consider a transformation matrix Ψ of dimension {a(a + 1)/2 × a²} such that
\[
\mathrm{vech}(A) = \Psi\, \mathrm{vec}(A).
\]

One particular choice of Ψ is the Moore-Penrose generalized inverse of Φ,
\[
\Psi = (\Phi^T \Phi)^{-1} \Phi^T.
\]

Fuller (1987, page 383) gives the actual form of Φ.
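The matrix Φ in (iii), often called the duplication matrix, and the particular choice Ψ = (Φ^TΦ)^{−1}Φ^T are easy to construct directly. A small numpy sketch (the helper names are ours) verifying both identities:

```python
import numpy as np

def vech(A):
    """Half-vectorization: stack the lower triangle of A column by column."""
    a = A.shape[0]
    return np.concatenate([A[j:, j] for j in range(a)])

def duplication(a):
    """The unique Phi with vec(A) = Phi vech(A) for symmetric (a x a) A."""
    Phi = np.zeros((a * a, a * (a + 1) // 2))
    col = 0
    for j in range(a):
        for i in range(j, a):
            Phi[j * a + i, col] = 1.0   # position of A[i, j] in vec(A)...
            Phi[i * a + j, col] = 1.0   # ...and of its mirror A[j, i]
            col += 1
    return Phi

a = 3
rng = np.random.default_rng(4)
S = rng.standard_normal((a, a)); S = S + S.T       # symmetric test matrix
Phi = duplication(a)
Psi = np.linalg.solve(Phi.T @ Phi, Phi.T)          # (Phi'Phi)^{-1} Phi'

assert np.allclose(Phi @ vech(S), S.flatten(order="F"))  # vec(A) = Phi vech(A)
assert np.allclose(Psi @ S.flatten(order="F"), vech(S))  # vech(A) = Psi vec(A)
```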


Because we are addressing equivalency under normality assumptions, take Y_i|x_i to be normally distributed for the purposes of the argument. Under these conditions, it is possible to show [see, for example, Fuller (1987, Lemma 4.A.1)] that
\[
\mathrm{var}(u_i|x_i) = 2\Psi\{V_i(\beta,\xi,x_i) \otimes V_i(\beta,\xi,x_i)\}\Psi^T. \tag{14.38}
\]

In fact, (14.38) is a compact way of expressing (14.18). Result 4.A.3.1 of Fuller (1987, page 385) then yields that
\[
\{\mathrm{var}(u_i|x_i)\}^{-1} = \frac{1}{2}\Phi^T\{V_i^{-1}(\beta,\xi,x_i) \otimes V_i^{-1}(\beta,\xi,x_i)\}\Phi. \tag{14.39}
\]
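Results (14.38) and (14.39) can be checked numerically; the following numpy sketch builds Φ and Ψ as above for a random positive definite matrix standing in for V_i and verifies that the two expressions are indeed inverses of one another:

```python
import numpy as np

def duplication(a):
    # Phi with vec(A) = Phi vech(A) for symmetric A (see result (iii))
    Phi = np.zeros((a * a, a * (a + 1) // 2))
    col = 0
    for j in range(a):
        for i in range(j, a):
            Phi[j * a + i, col] = 1.0
            Phi[i * a + j, col] = 1.0
            col += 1
    return Phi

a = 3
rng = np.random.default_rng(5)
M = rng.standard_normal((a, a))
V = M @ M.T + a * np.eye(a)                    # positive definite stand-in for V_i
Phi = duplication(a)
Psi = np.linalg.solve(Phi.T @ Phi, Phi.T)      # Moore-Penrose inverse of Phi

var_u = 2.0 * Psi @ np.kron(V, V) @ Psi.T      # right side of (14.38)
inv_claimed = 0.5 * Phi.T @ np.kron(np.linalg.inv(V), np.linalg.inv(V)) @ Phi  # (14.39)

assert np.allclose(var_u @ inv_claimed, np.eye(a * (a + 1) // 2))
```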

Armed with these results, we are now in a position to show the desired correspondence. For brevity, we will suppress the arguments of all matrices and vectors. The estimating equation in (14.36) has two parts:
\[
\frac{1}{2}\mathrm{tr}\{(\partial/\partial\xi_k V_i)V_i^{-1}(Y_i - f_i)(Y_i - f_i)^T V_i^{-1}\} \tag{14.40}
\]
and
\[
-\frac{1}{2}\mathrm{tr}\{(\partial/\partial\xi_k V_i)V_i^{-1}\}. \tag{14.41}
\]

Consider (14.40). By result (ii) on page 397, identifying A = ∂/∂ξ_k V_i, B = V_i^{−1}, D = (Y_i − f_i)(Y_i − f_i)^T, and C = V_i^{−1}, we have
\[
\begin{aligned}
\frac{1}{2}\mathrm{tr}\{(\partial/\partial\xi_k V_i)V_i^{-1}(Y_i - f_i)(Y_i - f_i)^T V_i^{-1}\}
&= \frac{1}{2}\{\mathrm{vec}(\partial/\partial\xi_k V_i)\}^T (V_i^{-1} \otimes V_i^{-1})\,\mathrm{vec}\{(Y_i - f_i)(Y_i - f_i)^T\} \\
&= \frac{1}{2}\{\Phi\,\mathrm{vech}(\partial/\partial\xi_k V_i)\}^T (V_i^{-1} \otimes V_i^{-1})\Phi\, u_i \\
&= \{\mathrm{vech}(\partial/\partial\xi_k V_i)\}^T \Big\{\frac{1}{2}\Phi^T (V_i^{-1} \otimes V_i^{-1})\Phi\Big\} u_i,
\end{aligned} \tag{14.42}
\]
using the definition of u_i.

Now from the definition of v_i, we have
\[
\mathrm{vech}(\partial/\partial\xi_k V_i) = \partial/\partial\xi_k\, \mathrm{vech}(V_i) = \partial/\partial\xi_k\, v_i.
\]

Moreover, the “middle” term in braces in (14.42), by (14.39), equals Z_i^{−1}, as we are doing these calculations under normality. Substituting these developments into (14.42) yields
\[
\{\partial/\partial\xi_k v_i\}^T Z_i^{-1} u_i. \tag{14.43}
\]

Now consider (14.41).


Applying result (ii) on page 397 gives
\[
\begin{aligned}
-\frac{1}{2}\{\mathrm{vec}(\partial/\partial\xi_k V_i)\}^T (V_i^{-1} \otimes V_i^{-1})\,\mathrm{vec}(V_i)
&= -\frac{1}{2}\{\Phi\,\mathrm{vech}(\partial/\partial\xi_k V_i)\}^T (V_i^{-1} \otimes V_i^{-1})\Phi\,\mathrm{vech}(V_i) \\
&= -\{\mathrm{vech}(\partial/\partial\xi_k V_i)\}^T \Big\{\frac{1}{2}\Phi^T (V_i^{-1} \otimes V_i^{-1})\Phi\Big\} v_i \\
&= -\{\partial/\partial\xi_k v_i\}^T Z_i^{-1} v_i.
\end{aligned} \tag{14.44}
\]

Combining (14.43) and (14.44), we obtain that the kth row of the PL summand in (14.36) is in fact equal to the kth row of the GEE summand in (14.37), namely
\[
\{\partial/\partial\xi_k v_i\}^T Z_i^{-1} (u_i - v_i),
\]
as desired.
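The equivalence just demonstrated can also be confirmed numerically for a single summand. A sketch with a hypothetical one-parameter covariance model V(ξ) = ξΓ for fixed Γ, so that ∂V/∂ξ = Γ:

```python
import numpy as np

def vech(A):
    a = A.shape[0]
    return np.concatenate([A[j:, j] for j in range(a)])

def duplication(a):
    Phi = np.zeros((a * a, a * (a + 1) // 2))
    col = 0
    for j in range(a):
        for i in range(j, a):
            Phi[j * a + i, col] = 1.0
            Phi[i * a + j, col] = 1.0
            col += 1
    return Phi

rng = np.random.default_rng(6)
ni = 3
M = rng.standard_normal((ni, ni))
Gamma = M @ M.T + ni * np.eye(ni)     # fixed positive definite Gamma
xi = 1.7
V, dV = xi * Gamma, Gamma             # V(xi) = xi * Gamma, dV/dxi = Gamma
Vinv = np.linalg.inv(V)
r = rng.standard_normal(ni)           # plays the role of Y_i - f_i(x_i, beta)

# kth row of the PL equation (14.36), single summand
pl = 0.5 * (r @ Vinv @ dV @ Vinv @ r - np.trace(Vinv @ dV))

# kth row of the GEE form (14.37): {dv_i/dxi}' Z_i^{-1} (u_i - v_i), with
# Z_i^{-1} given by the Gaussian working assumption, as in (14.39)
Phi = duplication(ni)
Zinv = 0.5 * Phi.T @ np.kron(Vinv, Vinv) @ Phi
gee = vech(dV) @ Zinv @ (vech(np.outer(r, r)) - vech(V))

assert np.isclose(pl, gee)
```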

Of course, it is in fact possible to carry out the argument in the reverse direction, starting from (14.37).

Note that the same type of argument may be applied to the second term in the quadratic estimating equation for β, so that in fact the joint normal ML equations may be written in the “GEE-2” form with the Gaussian working assumption employed.

14.7 Implementation in SAS and R

Software for fitting marginal, population-average models using the GEE-1 and GEE-2 approaches is available. However, most of the popular programs have some limitations. In particular, they implement only the linear estimating equation for β, allowing only unknown correlation parameters in the covariance model (i.e., no unknown variance parameters except for a scale parameter σ²). They also implement the simple moment methods advocated by Liang and Zeger (1986) rather than the quadratic estimating equation methods used in the GEE-1 approach to estimate these parameters. We demonstrate use of SAS proc genmod and the R/Splus function gee() below to carry out this type of analysis.


Implementation of the GEE-1 approach may be carried out in SAS using either of two macros that were originally designed for another purpose. In particular, the glimmix and nlinmix macros were originally targeted for fitting of generalized linear and nonlinear subject-specific models with random effects (generalized linear mixed effects models and nonlinear mixed effects models), respectively, by an approximate technique; this is discussed in Chapter 15. It turns out, as discussed in Chapters 14 and 15 of Littell et al. (2006), that these macros may also be used to fit marginal models in the manner we have discussed. In particular, in this approach, the linear estimating equation for β is solved jointly with the quadratic estimating equation for the covariance parameters under the Gaussian working assumption. As with proc genmod and gee(), only correlation parameters may be estimated; it is not possible to estimate unknown parameters θ in a variance function (except for a scale parameter σ2). SAS proc mixed, which invokes normal theory maximum likelihood for very general linear models, including those for multivariate response, is used to estimate the covariance parameters, where a linear approximation to the nonlinear mean model is used. Iteration between estimation of β and estimation of the covariance parameters continues until convergence. Here, we will demonstrate using the nlinmix macro, which allows for more general mean models than does the glimmix macro.

This sort of analysis may also be carried out using the SAS procedure proc glimmix, not to be confused with the macro, and available in SAS version 9.2 and later. This procedure is meant mainly for fitting generalized linear mixed effects models, discussed in Chapter 15, but can be used to fit marginal models, too.

To illustrate the use of all of these packages, we consider a famous data set first reported by Thall and Vail (1990). A clinical trial was conducted in which 59 people with epilepsy suffering from simple or partial seizures were assigned at random to receive either the anti-epileptic drug progabide or an inert substance (a placebo), in addition to a standard chemotherapy regimen all were taking. Because each individual might be prone to different rates of experiencing seizures, the investigators first tried to get a sense of this by recording the number of seizures suffered by each subject over the 8-week period prior to the start of administration of the assigned treatment. It is common in such studies to record such baseline measurements, so that the effect of treatment for each subject may be measured relative to how that subject behaved before treatment. Following the commencement of treatment, the number of seizures for each subject was counted for each of four consecutive two-week periods. The age of each subject at the start of the study was also recorded, as it was suspected that the age of the subject might be associated with the effect of the treatment somehow. The goal of the analysis was to determine whether or not administration of progabide results in fewer seizure episodes on average.

PAGE 400 CHAPTER 14 ST 762, M. DAVIDIAN

Table 14.1: Seizure counts for 5 subjects assigned to placebo (0) and 5 subjects assigned to progabide (1).

                      Period
   Subject    1    2    3    4   Trt   Baseline   Age
         1    5    3    3    3     0         11    31
         2    3    5    3    3     0         11    30
         3    2    4    0    5     0          6    25
         4    4    4    1    4     0          8    36
         5    7   18    9   21     0         66    22
        29   11   14    9    8     1         76    18
        30    8    7    9    4     1         38    32
        31    0    4    3    0     1         19    20
        32    3    6    1    3     1         10    30
        33    2    6    7    4     1         19    18

The data for the first 5 subjects in each treatment group are summarized in Table 14.1. The response, number of seizures, is in the form of a count, for which the Poisson distribution might be considered an appropriate model.

Many authors who have considered this data set have chosen to treat the baseline seizure count as a covariate rather than as one of the repeated measurements of the response (at time 0); Thall and Vail did this in their original report of analysis of these data. This is partly because the baseline count was collected over 8 weeks, while the post-treatment counts were collected over two weeks. In reality, however, the baseline count is itself a response, so arguably it should be treated as one of the repeated responses, suitably scaled to put it on a two-week basis.

Although this would be a better way to view these data, we follow the original source and fit the model that they did. Thus, we regard the data as comprising repeated measurements (ni = 4 for all i) on each of m = 59 subjects. Because subjects are likely heterogeneous in their underlying propensities for seizures, we expect the marginal distribution of the counts to exhibit overdispersion. Let Yij be the seizure count for subject i at his/her visit to the clinic for the jth period following initiation of treatment, where the corresponding times of measurement are biweekly visits coded as tij = 1, 2, 3, 4 for j = 1,..., 4 = ni. Let δi be the treatment indicator for the ith patient,

δi = 0 for placebo subjects

= 1 for progabide subjects.

Following Thall and Vail (1990), for subject i, let ai be the logarithm of age and bi be the logarithm of the number of baseline seizures experienced in the eight-week pre-treatment period divided by 4 (to place the baseline count on the same basis as the post-treatment counts taken at two-week intervals).


Summary of the means at each time point post-treatment shows that they basically do not change until the last visit period. Accordingly, we consider the model that Thall and Vail did that accommodates this. Define t4ij = 0 if observation j for subject i is prior to the 4th and last visit and t4ij = 1 if observation j is the last one (4th visit). We define xi = (bi, ai, δi, t4i1,...,t4ini)T.

A popular model in these circumstances for the mean number of seizures would be a loglinear model.

Consider the particular such model where the conditional mean for Yij is taken to depend only on xij = (bi, ai, δi, t4ij)T, given by

E(Yij|xi) = exp(β1 + β2bi + β3ai + β4δi + β5t4ij + β6biδi) = f(xij, β), β = (β1,...,β6)T. (14.45)

A natural model for the variance is

var(Yij|xi) = σ2 f(xij, β), (14.46)

where σ2 allows for the possibility of overdispersion.
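For concreteness, (14.45) and (14.46) are easy to code directly. The sketch below (function names hypothetical) evaluates the loglinear mean and the overdispersed Poisson-type variance for a single observation with covariates (logbase, logage, trt, visit4):

```python
import numpy as np

def f_mean(xij, beta):
    # Loglinear mean model (14.45); xij = (b_i, a_i, delta_i, t4_ij)
    b, a, d, t4 = xij
    eta = (beta[0] + beta[1]*b + beta[2]*a + beta[3]*d
           + beta[4]*t4 + beta[5]*b*d)
    return np.exp(eta)

def var_yij(xij, beta, sigma2):
    # Variance model (14.46): sigma^2 times the mean, so sigma^2 > 1
    # represents overdispersion relative to the Poisson
    return sigma2 * f_mean(xij, beta)
```

Note that the last term of the linear predictor is the baseline-by-treatment interaction b_i delta_i, so the effect of baseline seizure rate on the mean is allowed to differ between the two treatment groups.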

In the following programs, for illustration, we consider three “working” correlation assumptions:

• Completely unstructured, so that the correlation pattern is assumed to have no particular form

• Compound symmetry or exchangeability

• Autoregressive of order 1 (as the visits are equally-spaced in time).
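Each of these working assumptions corresponds to a simple matrix form for Γi(α, xi). As a sketch (hypothetical helper; proc genmod and gee() construct these matrices internally from the TYPE= and corstr= specifications):

```python
import numpy as np

def working_corr(n, kind, alpha=None):
    """Working correlation matrix Gamma_i for n equally-spaced observations.

    kind = 'cs'  : compound symmetry/exchangeable, common correlation alpha
    kind = 'ar1' : AR(1), correlation alpha**|j-k| between visits j and k
    kind = 'un'  : unstructured; alpha is the full user-supplied matrix
    """
    if kind == "cs":
        return (1.0 - alpha) * np.eye(n) + alpha * np.ones((n, n))
    if kind == "ar1":
        j, k = np.indices((n, n))
        return alpha ** np.abs(j - k).astype(float)
    if kind == "un":
        return np.asarray(alpha, dtype=float)
    raise ValueError("unknown working correlation type: %s" % kind)
```

For example, an AR(1) lag-1 estimate of 0.4661, as in the fits below, implies a lag-2 correlation of 0.4661^2, approximately 0.217, which is what appears in the printed working correlation matrix.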

Full information on SAS proc genmod and proc glimmix is available in the SAS/STAT documentation, and a discussion of the nlinmix macro may be found in Chapter 15 of Littell et al. (2006); some minimal documentation and examples for nlinmix are also given on the Technical Support pages at http://www.sas.com. Less detailed information on the gee() function is available in R through the help() function (i.e., type help(gee) for a basic accounting of its syntax and operation).

The following programs do not have accompanying detailed descriptions, but they are fairly well documented internally. See the above references for more details and further options. It is important to note that proc genmod, gee(), and proc glimmix (as well as the glimmix macro, which we do not demonstrate here) impose limitations on the form of the mean model, as it must be specified through the standard "link functions" used in popular generalized linear models. This requires that the mean be a function of β through a linear predictor. Similarly, the form of the variance function is restricted to one of the standard generalized linear model variance functions, up to a scale factor. There are ways around these restrictions, but they require specialized programming on the part of the user. The nlinmix macro allows more general mean and variance functions to be specified by the user.


We discuss the glimmix and nlinmix macros and proc glimmix more in Chapter 15.

PROGRAM 14.1 Implementing the linear estimating equation with simple moment methods using SAS proc genmod.

The following program calls proc genmod several times, the first three calls fitting the mean-variance model in (14.45) and (14.46) with the three different “working” correlation models noted above.

PROGRAM STATEMENTS:

/******************************************************************
  Fit a loglinear regression model to the epileptic seizure data
  first reported in a paper by Thall and Vail (1990). These are
  count data, so we use the Poisson mean/variance assumptions.
  This model is fitted with different working correlation matrices.
******************************************************************/

options ls=80 ps=59 nodate; run;

/******************************************************************
  The data look like (first 8 records on first 2 subjects)

  104 5 1 0 11 31
  104 3 2 0 11 31
  104 3 3 0 11 31
  104 3 4 0 11 31
  106 3 1 0 11 30
  106 5 2 0 11 30
  106 3 3 0 11 30
  106 3 4 0 11 30
  .
  .
  .

  column 1  subject
  column 2  number of seizures
  column 3  visit (1--4 biweekly visits)
  column 4  =0 if placebo, =1 if progabide
  column 5  baseline number of seizures in 8 weeks prior to study
  column 6  age
******************************************************************/

data seizure; infile 'seize.dat';
  input subject seize visit trt base age;
run;

/*****************************************************************
  Fit the loglinear regression model using PROC GENMOD and three
  different working correlation matrix assumptions:

  - unstructured
  - compound symmetry (exchangeable)
  - AR(1)

  Subject 207 has what appear to be very unusual data -- for this
  subject, both baseline and study-period numbers of seizures are
  huge, much larger than any other subject. We leave this subject
  in for our analysis; in some published analyses, this subject is
  deleted. See Diggle, Heagerty, Liang, and Zeger (2002) and Thall
  and Vail (1990) for more on this subject.

  We fit a model exactly like the one in Thall and Vail (1990).
  Define

  - logbase = log(base/4)
  - logage  = log(age)
  - trt

  Here, visit4 is an indicator of the last visit and basetrt is the
  "interaction" term for logbase and trt, allowing the possibility
  that how baseline seizures affect the response may be different
  for treated and placebo subjects. Visit4 allows that the mean


  response changed over the study period, but not gradually;
  rather, the change only showed up toward the end of the study.

  The DIST=POISSON option in the model statement specifies that
  the Poisson requirement that mean = variance be used. The
  LINK=LOG option asks for the loglinear model. Other LINK=
  choices are available.

  The REPEATED statement specifies the "working" correlation
  structure to be assumed. The CORRW option in the REPEATED
  statement prints out the estimated working correlation matrix
  under the assumption given in the TYPE= option. The MODELSE
  option specifies that the standard errors printed for the
  elements of betahat are based on assuming the correlation matrix
  is correctly specified. By default, the ones based on the
  "robust" version of the covariance matrix are printed, too.

  The dispersion parameter is estimated rather than being held
  fixed at 1 -- this allows for the possibility of
  "overdispersion".

  The V6CORR option asks that the estimator be computed using the
  method discussed in the notes.
******************************************************************/

data seizure; set seizure;
*  if subject=207 then delete;
  logbase=log(base/4);
  logage=log(age);
  basetrt=logbase*trt;
  if visit<4 then visit4=0;
  if visit=4 then visit4=1;
run;

title "UNSTRUCTURED CORRELATION";
proc genmod data=seizure;
  class subject;
  model seize = logbase logage trt visit4 basetrt / dist=poisson link=log;
  repeated subject=subject / type=un corrw modelse v6corr;
run;

title "EXCHANGEABLE (COMPOUND SYMMETRY) CORRELATION";
proc genmod data=seizure;
  class subject;
  model seize = logbase logage trt visit4 basetrt / dist=poisson link=log;
  repeated subject=subject / type=cs corrw modelse v6corr;
run;

title "AR(1) CORRELATION";
proc genmod data=seizure;
  class subject;
  model seize = logbase logage trt visit4 basetrt / dist=poisson link=log;
  repeated subject=subject / type=ar(1) corrw modelse v6corr;
run;

OUTPUT:

UNSTRUCTURED CORRELATION 1 The GENMOD Procedure Model Information Data Set WORK.SEIZURE Distribution Poisson Link Function Log Dependent Variable seize

Number of Observations Read 236 Number of Observations Used 236

Class Level Information Class Levels Values subject 59 101 102 103 104 106 107 108 110 111 112 113 114 116 117 118 121 122 123 124 126 128 129 130 135 137 139 141 143 145 147 201 202 203 204 205 206 207 208 209 210 211 213 214 215 217 218 219 220 221 222 225 226 227 228 230 232 234 236 238


Parameter Information Parameter Effect Prm1 Intercept Prm2 logbase Prm3 logage Prm4 trt Prm5 visit4 Prm6 basetrt

Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 230 869.3236 3.7797 Scaled Deviance 230 869.3236 3.7797 Pearson Chi-Square 230 1014.7112 4.4118 Scaled Pearson X2 230 1014.7112 4.4118 Log Likelihood 2994.1327

Algorithm converged.

UNSTRUCTURED CORRELATION 2 The GENMOD Procedure Analysis Of Initial Parameter Estimates Standard Wald 95% Chi- Parameter DF Estimate Error Confidence Limits Square Pr > ChiSq Intercept 1 -2.7576 0.4075 -3.5562 -1.9590 45.80 <.0001 logbase 1 0.9495 0.0436 0.8641 1.0349 475.11 <.0001 logage 1 0.8971 0.1164 0.6688 1.1253 59.35 <.0001 trt 1 -1.3411 0.1567 -1.6483 -1.0339 73.21 <.0001 visit4 1 -0.1611 0.0546 -0.2681 -0.0541 8.71 0.0032 basetrt 1 0.5622 0.0635 0.4378 0.6867 78.40 <.0001 Scale 0 1.0000 0.0000 1.0000 1.0000 NOTE: The scale parameter was held fixed.

GEE Model Information Correlation Structure Unstructured Subject Effect subject (59 levels) Number of Clusters 59 Correlation Matrix Dimension 4 Maximum Cluster Size 4 Minimum Cluster Size 4

Algorithm converged.

Working Correlation Matrix Col1 Col2 Col3 Col4 Row1 1.0000 0.2846 0.2608 0.1564 Row2 0.2846 1.0000 0.6520 0.3480 Row3 0.2608 0.6520 1.0000 0.4618 Row4 0.1564 0.3480 0.4618 1.0000

Analysis Of GEE Parameter Estimates Empirical Standard Error Estimates Standard 95% Confidence Parameter Estimate Error Limits Z Pr > |Z| Intercept -3.0705 0.9449 -4.9225 -1.2184 -3.25 0.0012 logbase 0.9371 0.0929 0.7550 1.1192 10.09 <.0001 logage 1.0009 0.2737 0.4644 1.5374 3.66 0.0003 trt -1.4974 0.4223 -2.3251 -0.6697 -3.55 0.0004 visit4 -0.1560 0.0783 -0.3095 -0.0025 -1.99 0.0464 basetrt 0.6281 0.1699 0.2952 0.9610 3.70 0.0002 UNSTRUCTURED CORRELATION 3 The GENMOD Procedure Analysis Of GEE Parameter Estimates Model-Based Standard Error Estimates Standard 95% Confidence


Parameter Estimate Error Limits Z Pr > |Z| Intercept -3.0705 1.2019 -5.4261 -0.7148 -2.55 0.0106 logbase 0.9371 0.1275 0.6873 1.1869 7.35 <.0001 logage 1.0009 0.3433 0.3281 1.6737 2.92 0.0035 trt -1.4974 0.4617 -2.4023 -0.5925 -3.24 0.0012 visit4 -0.1560 0.0938 -0.3398 0.0278 -1.66 0.0963 basetrt 0.6281 0.1866 0.2624 0.9937 3.37 0.0008 Scale 2.0836 . . . . . NOTE: The scale parameter for GEE estimation was computed as the square root of the normalized Pearson’s chi-square. EXCHANGEABLE (COMPOUND SYMMETRY) CORRELATION 4 The GENMOD Procedure Model Information Data Set WORK.SEIZURE Distribution Poisson Link Function Log Dependent Variable seize

Number of Observations Read 236 Number of Observations Used 236

Class Level Information Class Levels Values subject 59 101 102 103 104 106 107 108 110 111 112 113 114 116 117 118 121 122 123 124 126 128 129 130 135 137 139 141 143 145 147 201 202 203 204 205 206 207 208 209 210 211 213 214 215 217 218 219 220 221 222 225 226 227 228 230 232 234 236 238

Parameter Information Parameter Effect Prm1 Intercept Prm2 logbase Prm3 logage Prm4 trt Prm5 visit4 Prm6 basetrt

Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 230 869.3236 3.7797 Scaled Deviance 230 869.3236 3.7797 Pearson Chi-Square 230 1014.7112 4.4118 Scaled Pearson X2 230 1014.7112 4.4118 Log Likelihood 2994.1327

Algorithm converged.

EXCHANGEABLE (COMPOUND SYMMETRY) CORRELATION 5 The GENMOD Procedure Analysis Of Initial Parameter Estimates Standard Wald 95% Chi- Parameter DF Estimate Error Confidence Limits Square Pr > ChiSq Intercept 1 -2.7576 0.4075 -3.5562 -1.9590 45.80 <.0001 logbase 1 0.9495 0.0436 0.8641 1.0349 475.11 <.0001 logage 1 0.8971 0.1164 0.6688 1.1253 59.35 <.0001 trt 1 -1.3411 0.1567 -1.6483 -1.0339 73.21 <.0001 visit4 1 -0.1611 0.0546 -0.2681 -0.0541 8.71 0.0032 basetrt 1 0.5622 0.0635 0.4378 0.6867 78.40 <.0001 Scale 0 1.0000 0.0000 1.0000 1.0000 NOTE: The scale parameter was held fixed.

GEE Model Information Correlation Structure Exchangeable Subject Effect subject (59 levels) Number of Clusters 59


Correlation Matrix Dimension 4 Maximum Cluster Size 4 Minimum Cluster Size 4

Algorithm converged.

Working Correlation Matrix Col1 Col2 Col3 Col4 Row1 1.0000 0.3582 0.3582 0.3582 Row2 0.3582 1.0000 0.3582 0.3582 Row3 0.3582 0.3582 1.0000 0.3582 Row4 0.3582 0.3582 0.3582 1.0000

Exchangeable Working Correlation Correlation 0.3582063731

Analysis Of GEE Parameter Estimates Empirical Standard Error Estimates Standard 95% Confidence Parameter Estimate Error Limits Z Pr > |Z| Intercept -2.7939 0.9561 -4.6678 -0.9199 -2.92 0.0035 logbase 0.9504 0.0987 0.7569 1.1439 9.63 <.0001 logage 0.9066 0.2772 0.3633 1.4499 3.27 0.0011 trt -1.3386 0.4296 -2.1805 -0.4967 -3.12 0.0018 EXCHANGEABLE (COMPOUND SYMMETRY) CORRELATION 6 The GENMOD Procedure Analysis Of GEE Parameter Estimates Empirical Standard Error Estimates Standard 95% Confidence Parameter Estimate Error Limits Z Pr > |Z| visit4 -0.1611 0.0656 -0.2896 -0.0325 -2.46 0.0140 basetrt 0.5633 0.1749 0.2205 0.9060 3.22 0.0013

Analysis Of GEE Parameter Estimates Model-Based Standard Error Estimates Standard 95% Confidence Parameter Estimate Error Limits Z Pr > |Z| Intercept -2.7939 1.2158 -5.1767 -0.4110 -2.30 0.0216 logbase 0.9504 0.1301 0.6954 1.2055 7.30 <.0001 logage 0.9066 0.3475 0.2256 1.5876 2.61 0.0091 trt -1.3386 0.4678 -2.2554 -0.4218 -2.86 0.0042 visit4 -0.1611 0.0908 -0.3391 0.0169 -1.77 0.0761 basetrt 0.5633 0.1895 0.1919 0.9347 2.97 0.0030 Scale 2.0742 . . . . . NOTE: The scale parameter for GEE estimation was computed as the square root of the normalized Pearson’s chi-square. AR(1) CORRELATION 7 The GENMOD Procedure Model Information Data Set WORK.SEIZURE Distribution Poisson Link Function Log Dependent Variable seize

Number of Observations Read 236 Number of Observations Used 236

Class Level Information Class Levels Values subject 59 101 102 103 104 106 107 108 110 111 112 113 114 116 117 118 121 122 123 124 126 128 129 130 135 137 139 141 143 145 147 201 202 203 204 205 206 207 208 209 210 211 213 214 215 217 218 219 220 221 222 225 226 227 228 230 232 234 236 238


Parameter Information Parameter Effect Prm1 Intercept Prm2 logbase Prm3 logage Prm4 trt Prm5 visit4 Prm6 basetrt

Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 230 869.3236 3.7797 Scaled Deviance 230 869.3236 3.7797 Pearson Chi-Square 230 1014.7112 4.4118 Scaled Pearson X2 230 1014.7112 4.4118 Log Likelihood 2994.1327

Algorithm converged. AR(1) CORRELATION 8 The GENMOD Procedure Analysis Of Initial Parameter Estimates Standard Wald 95% Chi- Parameter DF Estimate Error Confidence Limits Square Pr > ChiSq Intercept 1 -2.7576 0.4075 -3.5562 -1.9590 45.80 <.0001 logbase 1 0.9495 0.0436 0.8641 1.0349 475.11 <.0001 logage 1 0.8971 0.1164 0.6688 1.1253 59.35 <.0001 trt 1 -1.3411 0.1567 -1.6483 -1.0339 73.21 <.0001 visit4 1 -0.1611 0.0546 -0.2681 -0.0541 8.71 0.0032 basetrt 1 0.5622 0.0635 0.4378 0.6867 78.40 <.0001 Scale 0 1.0000 0.0000 1.0000 1.0000 NOTE: The scale parameter was held fixed.

GEE Model Information Correlation Structure AR(1) Subject Effect subject (59 levels) Number of Clusters 59 Correlation Matrix Dimension 4 Maximum Cluster Size 4 Minimum Cluster Size 4

Algorithm converged.

Working Correlation Matrix Col1 Col2 Col3 Col4 Row1 1.0000 0.4661 0.2173 0.1013 Row2 0.4661 1.0000 0.4661 0.2173 Row3 0.2173 0.4661 1.0000 0.4661 Row4 0.1013 0.2173 0.4661 1.0000

Analysis Of GEE Parameter Estimates Empirical Standard Error Estimates Standard 95% Confidence Parameter Estimate Error Limits Z Pr > |Z| Intercept -3.0526 0.9385 -4.8920 -1.2132 -3.25 0.0011 logbase 0.9445 0.0928 0.7627 1.1263 10.18 <.0001 logage 0.9908 0.2737 0.4543 1.5272 3.62 0.0003 trt -1.4828 0.4168 -2.2998 -0.6659 -3.56 0.0004 visit4 -0.1552 0.0892 -0.3301 0.0197 -1.74 0.0820 basetrt 0.6193 0.1692 0.2876 0.9509 3.66 0.0003 AR(1) CORRELATION 9 The GENMOD Procedure Analysis Of GEE Parameter Estimates Model-Based Standard Error Estimates


Standard 95% Confidence Parameter Estimate Error Limits Z Pr > |Z| Intercept -3.0526 1.1818 -5.3690 -0.7362 -2.58 0.0098 logbase 0.9445 0.1254 0.6988 1.1902 7.53 <.0001 logage 0.9908 0.3375 0.3292 1.6523 2.94 0.0033 trt -1.4828 0.4545 -2.3737 -0.5920 -3.26 0.0011 visit4 -0.1552 0.0950 -0.3414 0.0311 -1.63 0.1025 basetrt 0.6193 0.1835 0.2595 0.9790 3.37 0.0007 Scale 2.0854 . . . . . NOTE: The scale parameter for GEE estimation was computed as the square root of the normalized Pearson’s chi-square.
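The "Scale" value reported at the end of each of these fits can be reproduced from the note in the output: it is the square root of the Pearson chi-square statistic divided by its degrees of freedom. A sketch (hypothetical helper name), for the Poisson-type variance function used here:

```python
import numpy as np

def genmod_scale(y, mu, p):
    """Square root of the normalized Pearson chi-square, as described
    in the proc genmod output note; variance function = mean here."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    pearson = np.sum((y - mu) ** 2 / mu)
    return np.sqrt(pearson / (y.size - p))
```

For orientation, the initial (independence) fit above has Pearson chi-square 1014.7112 on 230 degrees of freedom, giving a scale of about 2.10; the GEE fits report slightly different values (about 2.07 to 2.09) because they are computed from residuals at the GEE estimates of β.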

PROGRAM 14.2 Implementing the linear estimating equation with simple moment methods using R gee().

The following program calls the function gee() three times to fit the mean-variance model in (14.45) and (14.46) with the three different "working" correlation models noted above. Note that there are some slight differences in the results from those obtained from SAS proc genmod, even though the two programs are supposed to be carrying out the same calculations. These are likely due to differences in the two implementations.

PROGRAM STATEMENTS:

# Fit gee with linear estimating equation for beta and
# moment methods as in Liang and Zeger (1986) for correlation
# parameters using the R function gee()

# load the gee library
library(gee)

# output data set
outfile <- "gee_seizure.Rout"

# read in the data set
thedata <- matrix(scan("seize.dat"),ncol=6,byrow=T)
subj <- thedata[,1]
seize <- thedata[,2]
visit <- thedata[,3]
trt <- thedata[,4]
base <- thedata[,5]
age <- thedata[,6]

# log transform baseline and age
logbase <- log(base/4)
logage <- log(age)

# other variables that could be used in more complicated models
basetrt <- logbase*trt
visit4 <- as.numeric(visit==4)

# calls to gee() function
unfit <- gee(seize ~ logbase+logage+trt+visit4+basetrt,id=subj,family=poisson,
   corstr="unstructured")
cat("UNSTRUCTURED WORKING CORRELATION","\n","\n",file=outfile,append=F)
sink(file=outfile,append=T)
print(summary(unfit))
sink()
cat("\n","\n",file=outfile,append=T)

csfit <- gee(seize ~ logbase+logage+trt+visit4+basetrt,id=subj,family=poisson,
   corstr="exchangeable")
cat("EXCHANGEABLE WORKING CORRELATION","\n","\n",file=outfile,append=T)

sink(file=outfile,append=T)
print(summary(csfit))
sink()
cat("\n","\n",file=outfile,append=T)

# note demonstration of use of user-supplied starting values here
ar1fit <- gee(seize ~ logbase+logage+trt+visit4+basetrt,id=subj,family=poisson,
   corstr="AR-M",Mv=1,b=c(-2.0,1,1,-1,-0.1,0.5))
cat("AR(1) WORKING CORRELATION","\n","\n",file=outfile,append=T)
sink(file=outfile,append=T)
print(summary(ar1fit))
sink()

OUTPUT:

UNSTRUCTURED WORKING CORRELATION

GEE: GENERALIZED LINEAR MODELS FOR DEPENDENT DATA gee S-function, version 4.13 modified 98/01/27 (1998) Model: Link: Logarithm Variance to Mean Relation: Poisson Correlation Structure: Unstructured Call: gee(formula = seize ~ logbase + logage + trt + visit4 + basetrt, id = subj, family = poisson, corstr = "unstructured") Summary of Residuals: Min 1Q Median 3Q Max -14.1473158 -2.9156034 -0.5648066 1.6966299 59.7187817

Coefficients: Estimate Naive S.E. Naive z Robust S.E. Robust z (Intercept) -3.0703916 1.21744843 -2.521989 0.94492946 -3.249334 logbase 0.9370925 0.12911640 7.257734 0.09289341 10.087825 logage 1.0008969 0.34772442 2.878420 0.27372135 3.656627 trt -1.4973787 0.46765660 -3.201877 0.42230224 -3.545751 visit4 -0.1559873 0.09499776 -1.642010 0.07831289 -1.991847 basetrt 0.6280492 0.18899638 3.323076 0.16985140 3.697639 Estimated Scale Parameter: 4.45469 Number of Iterations: 4 Working Correlation [,1] [,2] [,3] [,4] [1,] 1.0000000 0.2846098 0.2608401 0.1563989 [2,] 0.2846098 1.0000000 0.6519718 0.3479656 [3,] 0.2608401 0.6519718 1.0000000 0.4617796 [4,] 0.1563989 0.3479656 0.4617796 1.0000000

EXCHANGEABLE WORKING CORRELATION

GEE: GENERALIZED LINEAR MODELS FOR DEPENDENT DATA gee S-function, version 4.13 modified 98/01/27 (1998) Model: Link: Logarithm Variance to Mean Relation: Poisson Correlation Structure: Exchangeable Call: gee(formula = seize ~ logbase + logage + trt + visit4 + basetrt, id = subj, family = poisson, corstr = "exchangeable") Summary of Residuals: Min 1Q Median 3Q Max -14.2804674 -2.9024753 -0.4940558 1.7757319 59.8699344

Coefficients: Estimate Naive S.E. Naive z Robust S.E. Robust z (Intercept) -2.7933674 1.22877062 -2.273303 0.95596337 -2.922044 logbase 0.9504074 0.13152324 7.226156 0.09869915 9.629338 logage 0.9064447 0.35118728 2.581086 0.27715935 3.270482 trt -1.3386276 0.47277353 -2.831435 0.42951137 -3.116629 visit4 -0.1610871 0.09219968 -1.747155 0.06558188 -2.456275 basetrt 0.5632558 0.19152488 2.940902 0.17485489 3.221276


Estimated Scale Parameter: 4.414352 Number of Iterations: 2 Working Correlation [,1] [,2] [,3] [,4] [1,] 1.0000000 0.3551212 0.3551212 0.3551212 [2,] 0.3551212 1.0000000 0.3551212 0.3551212 [3,] 0.3551212 0.3551212 1.0000000 0.3551212 [4,] 0.3551212 0.3551212 0.3551212 1.0000000

AR(1) WORKING CORRELATION

GEE: GENERALIZED LINEAR MODELS FOR DEPENDENT DATA gee S-function, version 4.13 modified 98/01/27 (1998) Model: Link: Logarithm Variance to Mean Relation: Poisson Correlation Structure: AR-M , M = 1 Call: gee(formula = seize ~ logbase + logage + trt + visit4 + basetrt, id = subj, b = c(-2, 1, 1, -1, -0.1, 0.5), family = poisson, corstr = "AR-M", Mv = 1) Summary of Residuals: Min 1Q Median 3Q Max -14.1147621 -2.9201518 -0.5479669 1.6786673 59.6736888

Coefficients: Estimate Naive S.E. Naive z Robust S.E. Robust z (Intercept) -3.0525967 1.19717709 -2.549829 0.93848186 -3.252697 logbase 0.9444762 0.12699946 7.436852 0.09276876 10.180973 logage 0.9907881 0.34191424 2.897768 0.27370217 3.619950 trt -1.4828350 0.46042681 -3.220566 0.41682111 -3.557485 visit4 -0.1551820 0.09626332 -1.612057 0.08922172 -1.739285 basetrt 0.6192620 0.18592401 3.330727 0.16921026 3.659719 Estimated Scale Parameter: 4.462371 Number of Iterations: 5 Working Correlation [,1] [,2] [,3] [,4] [1,] 1.0000000 0.4661711 0.2173155 0.1013062 [2,] 0.4661711 1.0000000 0.4661711 0.2173155 [3,] 0.2173155 0.4661711 1.0000000 0.4661711 [4,] 0.1013062 0.2173155 0.4661711 1.0000000
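The "Naive" and "Robust" standard errors in this output correspond to the model-based and the empirical (sandwich) covariance estimates for the estimator of β. A sketch of the sandwich computation (hypothetical helper names; quantities are evaluated at the final estimates):

```python
import numpy as np

def sandwich_cov(D_list, Vinv_list, resid_list):
    """Empirical (robust) covariance estimate A^{-1} B A^{-1}, where
    A = sum_i D_i' Vinv_i D_i            (model-based, "naive" piece)
    B = sum_i D_i' Vinv_i r_i r_i' Vinv_i D_i,  r_i = y_i - f_i(betahat)."""
    p = D_list[0].shape[1]
    A = np.zeros((p, p))
    B = np.zeros((p, p))
    for D, W, r in zip(D_list, Vinv_list, resid_list):
        U = D.T @ W          # (p, n_i)
        A += U @ D
        u = U @ r            # (p,)
        B += np.outer(u, u)
    A_inv = np.linalg.inv(A)
    return A_inv @ B @ A_inv
```

When the working covariance model is correctly specified the two versions agree asymptotically; the robust version remains valid when the working correlation is wrong, which is why the "Robust S.E." column is usually the one reported.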

PROGRAM 14.3 Implementing the linear estimating equation with the quadratic estimating equation for the correlation parameters using SAS macro nlinmix.

The following program invokes the macro nlinmix to implement the three "working" correlation assumptions. We use the most recent version of the macro available on the SAS technical support web site. Comparison of the results to those from Programs 14.1 and 14.2 shows that use of the more sophisticated quadratic estimating equation with the Gaussian working assumption to estimate the correlation parameters in each case yields similar, but different, estimates for β.
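The "Gaussian working assumption" means that the covariance parameters are updated by optimizing a normal-theory objective in the residuals rather than by simple moment formulas. As a sketch (hypothetical helper name), the objective minimized over the covariance parameters at each iteration, with the residuals held fixed, is of the form:

```python
import numpy as np

def neg2_gaussian_objective(resid_list, V_list):
    """-2 x Gaussian log-likelihood (up to an additive constant) of the
    residuals r_i = y_i - f_i(betahat) under working covariances V_i:
    sum_i { log det V_i + r_i' V_i^{-1} r_i }."""
    total = 0.0
    for r, V in zip(resid_list, V_list):
        sign, logdet = np.linalg.slogdet(V)
        total += logdet + r @ np.linalg.solve(V, r)
    return total
```

In the nlinmix implementation, proc mixed carries out this minimization for the working covariance structure named on the repeated statement, using the residuals from the current linearization.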


The nlinmix macro requires starting values for β, unlike proc genmod and gee(), which use standard methods for deriving starting values in generalized linear models to obtain these internally, as they restrict attention to such models. Starting values here may be obtained in the usual ways we have discussed for nonlinear models, perhaps by “pooling” the data from all m units together and treating them as mutually independent. For mean-variance models of the generalized linear model type, starting values can be obtained from a preliminary fit (e.g., using proc genmod) of the model assuming all observations are independent, which requires no starting values from the user.
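As a sketch of the preliminary independence fit just described, here is a standard Poisson iteratively reweighted least squares (IRLS) computation treating all observations as mutually independent (hypothetical helper name; proc genmod performs an equivalent calculation internally):

```python
import numpy as np

def poisson_irls(X, y, maxit=100, tol=1e-10):
    """Poisson regression with log link, all observations treated as
    independent; the resulting fit supplies starting values for beta."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(maxit):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu          # working response
        W = mu                           # IRLS weights: var = mean
        beta_new = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

For an intercept-only model this converges to the log of the sample mean, which is a quick way to verify the iteration.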

The user should be aware that the .log file generated when SAS is run contains a log of the intermediate calculations over each iteration. This may be examined in the event convergence is difficult to achieve. For brevity, this file is not reproduced for the example here.

PROGRAM STATEMENTS:

/* Fit gee with linear estimating equation for beta and quadratic equation for covariance parameters with gaussian working assumption using macro NLINMIX */

/* This program uses the newest version of the macro available
   on the sas web site */

options ps=59 ls=80 nodate; run;

/* Include the SAS macro code -- the user should of course
   substitute the appropriate path */

%inc "nlinmix_macro.sas";

/* Read in the data -- Seizure data from Thall and Vail (1990)

   The data look like (first 8 records on first 2 subjects)

   104 5 1 0 11 31
   104 3 2 0 11 31
   104 3 3 0 11 31
   104 3 4 0 11 31
   106 3 1 0 11 30
   106 5 2 0 11 30
   106 3 3 0 11 30
   106 3 4 0 11 30
   .
   .
   .

   column 1  subject
   column 2  number of seizures
   column 3  visit (1--4 biweekly visits)
   column 4  =0 if placebo, =1 if progabide
   column 5  baseline number of seizures in 8 weeks prior to study
   column 6  age
*/

data seizure; infile 'seize.dat';
  input subject seize visit trt base age;
run;

data seizure; set seizure;
*  if subject=207 then delete;
  logbase=log(base/4);
  logage=log(age);
  basetrt=logbase*trt;
  if visit<4 then visit4=0;
  if visit=4 then visit4=1;
run;

title "Seizure data of Thall and Vail (1990)";
/*proc print; run;*/


/* Invoke the NLINMIX macro

   - use the variable name "predv" to define the mean function
   - if you do not wish to include analytical derivatives (d_b1
     through d_b6 here) SAS will calculate them automatically
   - the parms statement gives starting values
   - the stmts portion basically contains elements of a call to
     proc mixed with the linearized version of the mean model. We
     use the repeated statement to specify the working correlation
     structure
   - the procopt statement allows us to specify "empirical" which
     will produce the robust sandwich standard errors
*/

title2 "Unstructured working correlation";
%nlinmix(data=seizure,
  model=%str(
    predv = exp(b1+b2*logbase+b3*logage+b4*trt+b5*visit4+b6*basetrt);
  ),
  derivs=%str(
    d_b1 = exp(b1+b2*logbase+b3*logage+b4*trt+b5*visit4+b6*basetrt);
    d_b2 = logbase*exp(b1+b2*logbase+b3*logage+b4*trt+b5*visit4+b6*basetrt);
    d_b3 = logage*exp(b1+b2*logbase+b3*logage+b4*trt+b5*visit4+b6*basetrt);
    d_b4 = trt*exp(b1+b2*logbase+b3*logage+b4*trt+b5*visit4+b6*basetrt);
    d_b5 = visit4*exp(b1+b2*logbase+b3*logage+b4*trt+b5*visit4+b6*basetrt);
    d_b6 = basetrt*exp(b1+b2*logbase+b3*logage+b4*trt+b5*visit4+b6*basetrt);
    wt=1/predv;
  ),
  parms=%str(b1=-2.0 b2=1 b3=1 b4=-1 b5=-0.1 b6=0.5),
  stmts=%str(
    class subject;
    model pseudo_seize = d_b1 d_b2 d_b3 d_b4 d_b5 d_b6 / noint notest solution;
    repeated / subject=subject type=un rcorr;
    weight wt;
  ),
  expand=zero,
  procopt=%str(empirical method=ml)
)

title2 "Exchangeable working correlation";
%nlinmix(data=seizure,
  model=%str(
    predv = exp(b1+b2*logbase+b3*logage+b4*trt+b5*visit4+b6*basetrt);
  ),
  derivs=%str(
    d_b1 = exp(b1+b2*logbase+b3*logage+b4*trt+b5*visit4+b6*basetrt);
    d_b2 = logbase*exp(b1+b2*logbase+b3*logage+b4*trt+b5*visit4+b6*basetrt);
    d_b3 = logage*exp(b1+b2*logbase+b3*logage+b4*trt+b5*visit4+b6*basetrt);
    d_b4 = trt*exp(b1+b2*logbase+b3*logage+b4*trt+b5*visit4+b6*basetrt);
    d_b5 = visit4*exp(b1+b2*logbase+b3*logage+b4*trt+b5*visit4+b6*basetrt);
    d_b6 = basetrt*exp(b1+b2*logbase+b3*logage+b4*trt+b5*visit4+b6*basetrt);
    wt=1/predv;
  ),
  parms=%str(b1=-2.0 b2=1 b3=1 b4=-1 b5=-0.1 b6=0.5),
  stmts=%str(
    class subject;
    model pseudo_seize = d_b1 d_b2 d_b3 d_b4 d_b5 d_b6 / noint notest solution;
    repeated / subject=subject type=cs rcorr;
    weight wt;
  ),
  expand=zero,
  procopt=%str(empirical method=ml)
)

title2 "AR(1) working correlation matrix";

%nlinmix(data=seizure,
  model=%str(
    predv = exp(b1+b2*logbase+b3*logage+b4*trt+b5*visit4+b6*basetrt);
  ),
  derivs=%str(
    d_b1 = exp(b1+b2*logbase+b3*logage+b4*trt+b5*visit4+b6*basetrt);
    d_b2 = logbase*exp(b1+b2*logbase+b3*logage+b4*trt+b5*visit4+b6*basetrt);
    d_b3 = logage*exp(b1+b2*logbase+b3*logage+b4*trt+b5*visit4+b6*basetrt);
    d_b4 = trt*exp(b1+b2*logbase+b3*logage+b4*trt+b5*visit4+b6*basetrt);
    d_b5 = visit4*exp(b1+b2*logbase+b3*logage+b4*trt+b5*visit4+b6*basetrt);
    d_b6 = basetrt*exp(b1+b2*logbase+b3*logage+b4*trt+b5*visit4+b6*basetrt);
    wt=1/predv;
  ),
  parms=%str(b1=-2.0 b2=1 b3=1 b4=-1 b5=-0.1 b6=0.5),
  stmts=%str(
    class subject;
    model pseudo_seize = d_b1 d_b2 d_b3 d_b4 d_b5 d_b6 / noint notest solution;
    repeated / subject=subject type=ar(1) rcorr;
    weight wt;
  ),
  expand=zero,
  procopt=%str(empirical method=ml)
)

OUTPUT:

Seizure data of Thall and Vail (1990) 1 Unstructured working correlation The Mixed Procedure Model Information Data Set WORK._NLINMIX Dependent Variable pseudo_seize Weight Variable wt Covariance Structure Unstructured Subject Effect subject Estimation Method ML Residual Variance Method None Fixed Effects SE Method Empirical Degrees of Freedom Method Between-Within

Class Level Information Class Levels Values subject 59 101 102 103 104 106 107 108 110 111 112 113 114 116 117 118 121 122 123 124 126 128 129 130 135 137 139 141 143 145 147 201 202 203 204 205 206 207 208 209 210 211 213 214 215 217 218 219 220 221 222 225 226 227 228 230 232 234 236 238

Dimensions Covariance Parameters 10 Columns in X 6 Columns in Z 0 Subjects 59 Max Obs Per Subject 4

Number of Observations Number of Observations Read 236 Number of Observations Used 236 Number of Observations Not Used 0

Iteration History Iteration Evaluations -2 Log Like Criterion 0 1 1412.07968151 1 2 1342.07974235 0.00000424 2 1 1342.07780093 0.00000000 Seizure data of Thall and Vail (1990) 2 Unstructured working correlation The Mixed Procedure Convergence criteria met.

Estimated R Correlation Matrix for subject 101/Weighted by wt


Row Col1 Col2 Col3 Col4 1 1.0000 0.3250 0.2295 0.2584 2 0.3250 1.0000 0.5006 0.4962 3 0.2295 0.5006 1.0000 0.4947 4 0.2584 0.4962 0.4947 1.0000

Covariance Parameter Estimates Cov Parm Subject Estimate UN(1,1) subject 3.1138 UN(2,1) subject 1.2302 UN(2,2) subject 4.6019 UN(3,1) subject 1.1119 UN(3,2) subject 2.9477 UN(3,3) subject 7.5349 UN(4,1) subject 0.6923 UN(4,2) subject 1.6164 UN(4,3) subject 2.0618 UN(4,4) subject 2.3058

Fit Statistics -2 Log Likelihood 1342.1 AIC (smaller is better) 1374.1 AICC (smaller is better) 1376.6 BIC (smaller is better) 1407.3

Null Model Likelihood Ratio Test DF Chi-Square Pr > ChiSq 9 70.00 <.0001

Solution for Fixed Effects Standard Effect Estimate Error DF t Value Pr > |t| d_b1 -3.1773 0.9421 59 -3.37 0.0013 d_b2 0.9219 0.08483 59 10.87 <.0001 d_b3 1.0506 0.2775 59 3.79 0.0004 Seizure data of Thall and Vail (1990) 3 Unstructured working correlation

The Mixed Procedure Solution for Fixed Effects Standard Effect Estimate Error DF t Value Pr > |t| d_b4 -1.6754 0.4193 59 -4.00 0.0002 d_b5 -0.1633 0.06303 59 -2.59 0.0120 d_b6 0.6890 0.1693 59 4.07 0.0001 Seizure data of Thall and Vail (1990) 4 Exchangeable working correlation The Mixed Procedure Model Information Data Set WORK._NLINMIX Dependent Variable pseudo_seize Weight Variable wt Covariance Structure Compound Symmetry Subject Effect subject Estimation Method ML Residual Variance Method Profile Fixed Effects SE Method Empirical Degrees of Freedom Method Between-Within

Class Level Information
    Class      Levels    Values
    subject        59    101 102 103 104 106 107 108 110 111 112 113 114 116 117
                         118 121 122 123 124 126 128 129 130 135 137 139 141 143
                         145 147 201 202 203 204 205 206 207 208 209 210 211 213
                         214 215 217 218 219 220 221 222 225 226 227 228 230 232
                         234 236 238

Dimensions
    Covariance Parameters         2
    Columns in X                  6
    Columns in Z                  0
    Subjects                     59
    Max Obs Per Subject           4

Number of Observations
    Number of Observations Read          236
    Number of Observations Used          236
    Number of Observations Not Used        0

Iteration History
    Iteration    Evaluations      -2 Log Like       Criterion
            0              1    1412.13918577
            1              2    1376.70189404      0.00000000

Seizure data of Thall and Vail (1990)                          5
Exchangeable working correlation

The Mixed Procedure
Convergence criteria met.

Estimated R Correlation Matrix for subject 101/Weighted by wt
    Row      Col1      Col2      Col3      Col4
      1    1.0000    0.3582    0.3582    0.3582
      2    0.3582    1.0000    0.3582    0.3582
      3    0.3582    0.3582    1.0000    0.3582
      4    0.3582    0.3582    0.3582    1.0000

Covariance Parameter Estimates
    Cov Parm    Subject    Estimate
    CS          subject      1.5411
    Residual                 2.7611

Fit Statistics
    -2 Log Likelihood            1376.7
    AIC (smaller is better)      1392.7
    AICC (smaller is better)     1393.3
    BIC (smaller is better)      1409.3

Null Model Likelihood Ratio Test
    DF    Chi-Square    Pr > ChiSq
     1         35.44        <.0001

Solution for Fixed Effects
                          Standard
    Effect    Estimate       Error     DF    t Value    Pr > |t|
    d_b1       -2.7939      0.9561    171      -2.92      0.0039
    d_b2        0.9504     0.09873    171       9.63      <.0001
    d_b3        0.9066      0.2772    171       3.27      0.0013
    d_b4       -1.3386      0.4296    171      -3.12      0.0021
    d_b5       -0.1611     0.06558    171      -2.46      0.0150
    d_b6        0.5633      0.1749    171       3.22      0.0015

Seizure data of Thall and Vail (1990)                          6
AR(1) working correlation matrix


The Mixed Procedure

Model Information
    Data Set                     WORK._NLINMIX
    Dependent Variable           pseudo_seize
    Weight Variable              wt
    Covariance Structure         Autoregressive
    Subject Effect               subject
    Estimation Method            ML
    Residual Variance Method     Profile
    Fixed Effects SE Method      Empirical
    Degrees of Freedom Method    Between-Within

Class Level Information
    Class      Levels    Values
    subject        59    101 102 103 104 106 107 108 110 111 112 113 114 116 117
                         118 121 122 123 124 126 128 129 130 135 137 139 141 143
                         145 147 201 202 203 204 205 206 207 208 209 210 211 213
                         214 215 217 218 219 220 221 222 225 226 227 228 230 232
                         234 236 238

Dimensions
    Covariance Parameters         2
    Columns in X                  6
    Columns in Z                  0
    Subjects                     59
    Max Obs Per Subject           4

Number of Observations
    Number of Observations Read          236
    Number of Observations Used          236
    Number of Observations Not Used        0

Iteration History
    Iteration    Evaluations      -2 Log Like       Criterion
            0              1    1411.84577442
            1              2    1378.41267102      0.00001238
            2              1    1378.40681354      0.00000000

Seizure data of Thall and Vail (1990)                          7
AR(1) working correlation matrix

The Mixed Procedure
Convergence criteria met.

Estimated R Correlation Matrix for subject 101/Weighted by wt
    Row      Col1      Col2      Col3      Col4
      1    1.0000    0.3757    0.1411   0.05302
      2    0.3757    1.0000    0.3757    0.1411
      3    0.1411    0.3757    1.0000    0.3757
      4   0.05302    0.1411    0.3757    1.0000

Covariance Parameter Estimates
    Cov Parm    Subject    Estimate
    AR(1)       subject      0.3757
    Residual                 4.2126

Fit Statistics
    -2 Log Likelihood            1378.4
    AIC (smaller is better)      1394.4
    AICC (smaller is better)     1395.0
    BIC (smaller is better)      1411.0


Null Model Likelihood Ratio Test
    DF    Chi-Square    Pr > ChiSq
     1         33.44        <.0001

Solution for Fixed Effects
                          Standard
    Effect    Estimate       Error     DF    t Value    Pr > |t|
    d_b1       -2.9791      0.9375    171      -3.18      0.0018
    d_b2        0.9454     0.09338    171      10.12      <.0001
    d_b3        0.9677      0.2733    171       3.54      0.0005
    d_b4       -1.4493      0.4183    171      -3.46      0.0007
    d_b5       -0.1566     0.08350    171      -1.88      0.0624
    d_b6        0.6057      0.1701    171       3.56      0.0005

PROGRAM 14.4 Implementing the linear estimating equation with the quadratic estimating equation for the correlation parameters using SAS proc glimmix.

The following program calls proc glimmix to implement the three “working” correlation assumptions.

PROGRAM STATEMENTS:

/*****************************************************************
  Fit the loglinear regression model using PROC GLIMMIX and three
  different working correlation matrix assumptions:

  - unstructured
  - compound symmetry (exchangeable)
  - AR(1)

  Subject 207 has what appear to be very unusual data -- for this
  subject, both baseline and study-period numbers of seizures are
  huge, much larger than for any other subject.  We leave this
  subject in for our analysis; in some published analyses, this
  subject is deleted.  See Diggle, Heagerty, Liang, and Zeger (2002)
  and Thall and Vail (1990) for more on this subject.

  We fit a modified model exactly like the one in Thall and Vail
  (1990).  Define

  - logbase = log(base/4)
  - logage = log(age)
  - trt

  Here, visit4 is an indicator of the last visit, and basetrt is the
  "interaction" term for logbase and trt, allowing the possibility
  that how baseline seizures affect the response may be different
  for treated and placebo subjects.  Visit4 allows that the mean
  response changed over the study period, but not gradually; rather,
  the change only showed up toward the end of the study.

  The DIST=POISSON option in the model statement specifies that the
  Poisson requirement that mean = variance be used.  The LINK=LOG
  option asks for the loglinear model.  Other LINK= choices are
  available.

  PROC GLIMMIX is really meant for fitting generalized linear mixed
  effects models, discussed in Chapter 15.  But it can also fit
  marginal models with no random effects, which we are demonstrating
  here.  To carry this out, the syntax for specifying the "working"
  correlation matrix is different from that in PROC GENMOD.  Instead
  of using a REPEATED statement, one uses a RANDOM statement with
  the RESIDUAL option and a CLASS variable that identifies ordered
  time points, as demonstrated below.  If no ordered variable is
  needed, as in the compound symmetric case, one can use the keyword
  _RESIDUAL_ instead.

  The EMPIRICAL option asks for the "robust sandwich" standard
  errors to be computed.
******************************************************************/

data seizure; set seizure;
*  if subject=207 then delete;
  logbase=log(base/4);
  logage=log(age);
  basetrt=logbase*trt;
  if visit<4 then visit4=0;
  if visit=4 then visit4=1;
run;

title "UNSTRUCTURED CORRELATION";
proc glimmix data=seizure empirical;
  class subject visit;
  model seize = logbase logage trt visit4 basetrt / solution
        dist = poisson link = log;
  random visit / subject=subject type=un residual;
run;

title "EXCHANGEABLE (COMPOUND SYMMETRY) CORRELATION";
proc glimmix data=seizure empirical;
  class subject visit;
  model seize = logbase logage trt visit4 basetrt / solution
        dist = poisson link = log;
  random _residual_ / subject=subject type=cs;
run;

title "AR(1) CORRELATION";
proc glimmix data=seizure empirical;
  class subject visit;
  model seize = logbase logage trt visit4 basetrt / solution
        dist = poisson link = log;
  random visit / subject=subject type=ar(1) residual;
run;
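The EMPIRICAL option requests the robust "sandwich" covariance matrix for the fixed effects estimator, of the form A⁻¹BA⁻¹ with "bread" A and "meat" B summed over subjects. As a rough sketch of this computation (in Python rather than SAS; the function name `sandwich_cov` and the argument layout are ours, not part of any procedure), with D_i the gradient of the mean for subject i, V_i the working covariance, and r_i the residual vector:

```python
import numpy as np

def sandwich_cov(D_list, Vinv_list, resid_list):
    """Robust 'sandwich' covariance for a GEE-type estimator of beta:
    A^{-1} B A^{-1}, where A = sum_i D_i' V_i^{-1} D_i (the 'bread') and
    B = sum_i (D_i' V_i^{-1} r_i)(D_i' V_i^{-1} r_i)' (the 'meat')."""
    p = D_list[0].shape[1]
    A = np.zeros((p, p))
    B = np.zeros((p, p))
    for D, Vinv, r in zip(D_list, Vinv_list, resid_list):
        A += D.T @ Vinv @ D        # subject i's contribution to the bread
        u = D.T @ Vinv @ r         # subject i's estimating function
        B += np.outer(u, u)        # subject i's contribution to the meat
    Ainv = np.linalg.inv(A)
    return Ainv @ B @ Ainv
```

The point of the sandwich form is that B is estimated from the observed per-subject residuals, so the resulting standard errors remain valid even when the working covariance V_i is misspecified.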

OUTPUT:

UNSTRUCTURED CORRELATION                                       1

The GLIMMIX Procedure

Model Information
    Data Set                      WORK.SEIZURE
    Response Variable             seize
    Response Distribution         Poisson
    Link Function                 Log
    Variance Function             Default
    Variance Matrix Blocked By    subject
    Estimation Technique          Residual PL
    Degrees of Freedom Method     Between-Within
    Fixed Effects SE Adjustment   Sandwich - Classical

Class Level Information
    Class      Levels    Values
    subject        59    101 102 103 104 106 107 108 110 111 112 113 114 116 117
                         118 121 122 123 124 126 128 129 130 135 137 139 141 143
                         145 147 201 202 203 204 205 206 207 208 209 210 211 213
                         214 215 217 218 219 220 221 222 225 226 227 228 230 232
                         234 236 238
    visit           4    1 2 3 4

Number of Observations Read         236
Number of Observations Used         236

Dimensions
    R-side Cov. Parameters       10
    Columns in X                  6
    Columns in Z per Subject      0
    Subjects (Blocks in V)       59
    Max Obs per Subject           4

Optimization Information
    Optimization Technique        Dual Quasi-Newton
    Parameters in Optimization    10
    Lower Boundaries              4
    Upper Boundaries              0
    Fixed Effects                 Profiled
    Starting From                 Data


UNSTRUCTURED CORRELATION                                       2

The GLIMMIX Procedure

Iteration History
    Iteration  Restarts  Subiterations    Objective Function      Change    Max Gradient
            0         0            200          -400.5062683  1.97318315         0.01183
            1         0             21          -2.831790507  0.81798083        1.322E-6
            2         0             14          314.85673342  0.67956463        0.003809
            3         0             13           502.5702045  0.43641336        0.000402
            4         0             13           560.9177951  0.15423072        0.000076
            5         0             12          567.30403756  0.02115574        2.582E-6
            6         0             11          567.75133255  0.00189222         0.00058
            7         0             10          567.79921326  0.00019732        0.000421
            8         0              2          567.80490214  0.00001064        0.000062
            9         0              1          567.80485352  0.00000584        0.000084
           10         0              1          567.80484791  0.00000407        0.000038
           11         0              1          567.80490418  0.00000288        0.000066
           12         0              1          567.80487398  0.00000203        0.000027
           13         0              1          567.80492617  0.00000194         0.00007
           14         0              1          567.80491018  0.00000137        0.000023
           15         0              1          567.80494572  0.00000096        0.000055
           16         0              1          567.80493754  0.00000113         0.00002
           17         0              1          567.80496256  0.00000084        0.000052
           18         0              1          567.80495371  0.00000103        0.000017
           19         0              1          567.80497536  0.00000071        0.000044

Did not converge.

Covariance Parameter Estimates
                                        Standard
    Cov Parm    Subject    Estimate       Error
    UN(1,1)     subject      3.2901           .
    UN(2,1)     subject      1.4062           .
    UN(2,2)     subject      4.7779           .
    UN(3,1)     subject      1.2880           .
    UN(3,2)     subject      3.1237           .
    UN(3,3)     subject      7.7110           .
    UN(4,1)     subject      0.8362           .
    UN(4,2)     subject      1.7602           .
    UN(4,3)     subject      2.2056           .
    UN(4,4)     subject      2.4561           .

EXCHANGEABLE (COMPOUND SYMMETRY) CORRELATION                   3

The GLIMMIX Procedure

Model Information
    Data Set                      WORK.SEIZURE
    Response Variable             seize
    Response Distribution         Poisson
    Link Function                 Log
    Variance Function             Default
    Variance Matrix Blocked By    subject
    Estimation Technique          Residual PL
    Degrees of Freedom Method     Between-Within
    Fixed Effects SE Adjustment   Sandwich - Classical

Class Level Information
    Class      Levels    Values
    subject        59    101 102 103 104 106 107 108 110 111 112 113 114 116 117
                         118 121 122 123 124 126 128 129 130 135 137 139 141 143
                         145 147 201 202 203 204 205 206 207 208 209 210 211 213
                         214 215 217 218 219 220 221 222 225 226 227 228 230 232
                         234 236 238
    visit           4    1 2 3 4

Number of Observations Read         236
Number of Observations Used         236

Dimensions
    R-side Cov. Parameters        2
    Columns in X                  6
    Columns in Z per Subject      0
    Subjects (Blocks in V)       59
    Max Obs per Subject           4

Optimization Information
    Optimization Technique        Dual Quasi-Newton
    Parameters in Optimization    1
    Lower Boundaries              0
    Upper Boundaries              0
    Fixed Effects                 Profiled
    Residual Variance             Profiled
    Starting From                 Data

EXCHANGEABLE (COMPOUND SYMMETRY) CORRELATION                   4

The GLIMMIX Procedure

Iteration History
    Iteration  Restarts  Subiterations    Objective Function      Change    Max Gradient
            0         0              8           -380.084293  1.98333815        8.044E-7
            1         0              3          23.359461727  0.29058491        2.362E-7
            2         0              3          348.55736658  0.29793806        5.187E-7
            3         0              2          537.65162916  0.13664934        0.000121
            4         0              2          590.48253951  0.01386331        3.234E-7
            5         0              2          593.93524845  0.00036166        6.545E-9
            6         0              1          593.95496556  0.00001317        1.445E-8
            7         0              0            593.954995  0.00000000        2.777E-6

Convergence criterion (PCONV=1.11022E-8) satisfied.

Fit Statistics
    -2 Res Log Pseudo-Likelihood     593.95
    Generalized Chi-Square           638.79
    Gener. Chi-Square / DF             2.78

Covariance Parameter Estimates
                                        Standard
    Cov Parm    Subject    Estimate       Error
    CS          subject      1.7430      0.4747
    Residual                 2.7774      0.2961

Solutions for Fixed Effects
                             Standard
    Effect       Estimate       Error     DF    t Value    Pr > |t|
    Intercept     -2.7984      0.9578     54      -2.92      0.0051
    logbase        0.9505     0.09902     54       9.60      <.0001
    logage         0.9078      0.2776     54       3.27      0.0019
    trt           -1.3383      0.4301     54      -3.11      0.0030
    visit4        -0.1611     0.06558    176      -2.46      0.0150
    basetrt        0.5634      0.1750     54       3.22      0.0022

Type III Tests of Fixed Effects
               Num    Den
    Effect      DF     DF    F Value    Pr > F
    logbase      1     54      92.15    <.0001
    logage       1     54      10.70    0.0019
    trt          1     54       9.68    0.0030
    visit4       1    176       6.03    0.0150

EXCHANGEABLE (COMPOUND SYMMETRY) CORRELATION                   5

The GLIMMIX Procedure

Type III Tests of Fixed Effects
               Num    Den
    Effect      DF     DF    F Value    Pr > F
    basetrt      1     54      10.36    0.0022

AR(1) CORRELATION                                              6

The GLIMMIX Procedure


Model Information
    Data Set                      WORK.SEIZURE
    Response Variable             seize
    Response Distribution         Poisson
    Link Function                 Log
    Variance Function             Default
    Variance Matrix Blocked By    subject
    Estimation Technique          Residual PL
    Degrees of Freedom Method     Between-Within
    Fixed Effects SE Adjustment   Sandwich - Classical

Class Level Information
    Class      Levels    Values
    subject        59    101 102 103 104 106 107 108 110 111 112 113 114 116 117
                         118 121 122 123 124 126 128 129 130 135 137 139 141 143
                         145 147 201 202 203 204 205 206 207 208 209 210 211 213
                         214 215 217 218 219 220 221 222 225 226 227 228 230 232
                         234 236 238
    visit           4    1 2 3 4

Number of Observations Read         236
Number of Observations Used         236

Dimensions
    R-side Cov. Parameters        2
    Columns in X                  6
    Columns in Z per Subject      0
    Subjects (Blocks in V)       59
    Max Obs per Subject           4

Optimization Information
    Optimization Technique        Dual Quasi-Newton
    Parameters in Optimization    1
    Lower Boundaries              1
    Upper Boundaries              1
    Fixed Effects                 Profiled
    Residual Variance             Profiled
    Starting From                 Data

AR(1) CORRELATION                                              7

The GLIMMIX Procedure

Iteration History
    Iteration  Restarts  Subiterations    Objective Function      Change    Max Gradient
            0         0              6          -380.0697849  0.84458225            1E-6
            1         0              3          23.534226501  0.44096715         1.82E-6
            2         0              3          349.72039778  0.38294694        1.426E-7
            3         0              2          541.49567685  0.15364034         1.08E-6
            4         0              1          596.67703013  0.01401373        6.843E-6
            5         0              1          600.49102646  0.00028385        8.146E-6
            6         0              1          600.51244016  0.00000326        1.485E-8
            7         0              0          600.51247746  0.00000000        8.225E-7

Convergence criterion (PCONV=1.11022E-8) satisfied.

Fit Statistics
    -2 Res Log Pseudo-Likelihood      600.51
    Generalized Chi-Square           1007.17
    Gener. Chi-Square / DF              4.38

Covariance Parameter Estimates
                                        Standard
    Cov Parm    Subject    Estimate       Error
    AR(1)       subject      0.3936     0.06067
    Residual                 4.3790      0.4533

Solutions for Fixed Effects
                             Standard
    Effect       Estimate       Error     DF    t Value    Pr > |t|
    Intercept     -2.9928      0.9376     54      -3.19      0.0024
    logbase        0.9452     0.09325     54      10.14      <.0001
    logage         0.9721      0.2734     54       3.56      0.0008
    trt           -1.4557      0.4180     54      -3.48      0.0010
    visit4        -0.1564     0.08461    176      -1.85      0.0663
    basetrt        0.6083      0.1699     54       3.58      0.0007

Type III Tests of Fixed Effects
               Num    Den
    Effect      DF     DF    F Value    Pr > F
    logbase      1     54     102.74    <.0001
    logage       1     54      12.65    0.0008
    trt          1     54      12.13    0.0010
    visit4       1    176       3.41    0.0663

AR(1) CORRELATION                                              8

The GLIMMIX Procedure

Type III Tests of Fixed Effects
               Num    Den
    Effect      DF     DF    F Value    Pr > F
    basetrt      1     54      12.81    0.0007

Note that the fit assuming an unstructured working correlation did not converge; apparently, the number of covariance parameters, coupled with the nonlinearity of the mean function, causes difficulty for the optimization in proc glimmix.
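The contrast in the number of parameters each working structure carries is easy to verify. As a quick sketch (in Python rather than SAS; the helper names `n_cov_parms` and `ar1_corr` are ours), with n = 4 observations per subject the unstructured model has n(n+1)/2 = 10 covariance parameters, while compound symmetry and AR(1) each have only 2 (one correlation parameter plus the residual variance), matching the "Covariance Parameters" entries in the Dimensions tables above:

```python
import numpy as np

def n_cov_parms(structure, n):
    """Number of covariance parameters for a working structure with
    n observations per subject (variance parameters included)."""
    if structure == "un":              # unstructured: all variances and covariances
        return n * (n + 1) // 2
    if structure in ("cs", "ar1"):     # one correlation parameter + residual variance
        return 2
    raise ValueError(structure)

def ar1_corr(rho, n):
    """AR(1) working correlation matrix: (j,k) element is rho**|j-k|."""
    j = np.arange(n)
    return rho ** np.abs(np.subtract.outer(j, j))

print(n_cov_parms("un", 4))   # 10
print(n_cov_parms("cs", 4))   # 2
print(ar1_corr(0.3757, 4))    # approximately the fitted AR(1) matrix shown earlier
```

With rho = 0.3757, `ar1_corr` reproduces (up to rounding of the reported estimate) the fitted AR(1) correlation matrix printed in the output above.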

For the compound symmetric fit, the fitted correlation matrix may be obtained from the Covariance Parameter Estimates section; the common correlation parameter estimate is 1.7430/(1.7430 + 2.7774) = 0.3856, and the estimate of the variance σ² is 1.7430 + 2.7774 = 4.52, with square root 2.126. These may be compared with the values from nlinmix of 0.3582 and 1.5411 + 2.7611 = 4.302 (square root 2.074). For the AR(1) fit, the estimates of α and σ² may be read directly from the default output as 0.3936 and 4.379; the square root of the estimate of σ² is 2.093. The corresponding estimates from nlinmix are 0.3757 and 4.2126 (square root 2.052). Overall, the estimates from nlinmix and proc glimmix compare reasonably well to each other and to those obtained using the simpler method of moments estimation of covariance parameters in proc genmod and gee().
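The compound-symmetry arithmetic above can be written out explicitly. As a small sketch (in Python; the function name `cs_to_corr_var` is ours), the reported CS and Residual covariance parameters convert to the implied common correlation and total variance as follows:

```python
def cs_to_corr_var(cs, residual):
    """Convert compound-symmetry covariance parameters to the implied
    common correlation and total variance sigma^2."""
    total_var = cs + residual       # sigma^2 = CS + Residual
    corr = cs / total_var           # common correlation = CS / (CS + Residual)
    return corr, total_var

# proc glimmix fit: CS = 1.7430, Residual = 2.7774
corr_g, var_g = cs_to_corr_var(1.7430, 2.7774)
print(round(corr_g, 4), round(var_g, 4))   # 0.3856 4.5204

# nlinmix fit: CS = 1.5411, Residual = 2.7611
corr_n, var_n = cs_to_corr_var(1.5411, 2.7611)
print(round(corr_n, 4), round(var_n, 4))   # 0.3582 4.3022
```

These are exactly the values 0.3856 and 0.3582 quoted in the comparison above.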
