
Chapter 10. Generalized Linear Models

Abstract. This chapter extends the linear models introduced in Part I and the binary dependent variable model in Chapter 9 to the generalized linear model formulation. Generalized linear models, often known by the acronym GLM, represent an important class of models that have found extensive use in practice. In addition to the normal and Bernoulli distributions, these models include the binomial, Poisson and Gamma families as distributions for dependent variables. Section 10.1 begins this chapter with a review of homogeneous GLM models, that is, GLM models that do not incorporate heterogeneity. The Section 10.2 example reinforces this review. Section 10.3 then describes marginal models and generalized estimating equations, a widely applied framework for incorporating heterogeneity. Then, Sections 10.4 and 10.5 allow for heterogeneity by modeling subject-specific quantities as random and fixed effects, respectively. Section 10.6 ties together fixed and random effects under the umbrella of Bayesian inference.

10.1 Homogeneous models

This section introduces the generalized linear model (GLM); a more extensive treatment may be found in the classic work by McCullagh and Nelder (1989G). The GLM framework generalizes linear models in the following sense. Linear model theory provides a platform for choosing appropriate linear combinations of explanatory variables to predict a response. In Chapter 9, we saw how to use nonlinear functions of these linear combinations to provide better predictors, at least for responses with Bernoulli (binary) outcomes. In GLM, we widen the class of distributions to allow us to handle other types of non-normal outcomes. This broad class includes as special cases the normal, Bernoulli and Poisson distributions. As we will see, the normal distribution corresponds to linear models and the Bernoulli to the Chapter 9 binary response models. To motivate GLM, we focus on the Poisson distribution; this distribution allows us to readily model count data. Our treatment follows the structure introduced in Chapter 9; we first introduce the homogeneous version of the GLM framework, so that this section does not use any subject-specific parameters nor does it introduce terms to account for serial correlation. Subsequent sections will introduce techniques for handling heterogeneity in longitudinal and panel data. The section begins with an introduction of the response distribution in Subsection 10.1.1 and then shows how to link the distribution's parameters to regression variables in Subsection 10.1.2. Subsection 10.1.3 then describes estimation principles.

10.1.1 Linear exponential families of distributions

This chapter considers the linear exponential family of distributions of the form

p(y, θ, φ) = exp( (yθ − b(θ))/φ + S(y, φ) ).   (10.1)

Here, y is a dependent variable and θ is the parameter of interest. The quantity φ is a scale parameter that we often will assume is known. The term b(θ) depends only on the parameter θ, not the dependent variable. The term S(y, φ) is a function of the dependent variable and the scale parameter, not the parameter θ. The dependent variable y may be discrete, continuous or a mixture. Thus, p(.) may be interpreted to be a density or mass function, depending on the application. Table 10A.1 provides several examples, including the normal, binomial and Poisson distributions. To illustrate, consider a normal distribution with a probability density function of the form

f(y, µ, σ2) = (2πσ2)−1/2 exp( −(y − µ)2/(2σ2) ) = exp( (yµ − µ2/2)/σ2 − y2/(2σ2) − ln(2πσ2)/2 ).

With the choices θ = µ, φ = σ2, b(θ) = θ2/2 and S(y, φ) = −( y2/φ + ln(2πφ) )/2, we see that the normal probability density function can be expressed as in equation (10.1).
For the function in equation (10.1), some straightforward calculations show that
• E y = b′(θ) and
• Var y = φ b″(θ).

For reference, these calculations appear in Appendix 10A.1. To illustrate, in the context of the normal distribution example above, it is easy to check that E y = b′(θ) = θ = µ and Var y = φ b″(θ) = σ2, as anticipated.
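To make these identities concrete, the following sketch checks them by simulation for the Poisson member of the family, for which b(θ) = eθ and φ = 1, so that b′(θ) = b″(θ) = eθ; the parameter value and sample size are illustrative assumptions.

```python
# Simulation check of E y = b'(theta) and Var y = phi * b''(theta)
# for the Poisson distribution, where b(theta) = exp(theta), phi = 1.
import numpy as np

rng = np.random.default_rng(seed=0)
theta = 1.3                               # illustrative canonical parameter
mean = np.exp(theta)                      # b'(theta)
y = rng.poisson(lam=mean, size=1_000_000)

print(y.mean(), mean)                     # sample mean vs b'(theta)
print(y.var(), mean)                      # sample variance vs phi * b''(theta)
```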

10.1.2 Link functions

In regression modeling situations, the distribution of yit varies by observation through the subscripts “it.” Specifically, we allow the distribution's parameters to vary by observation through the notation θit and φit. For our applications, the variation of the scale parameter is due to known weight factors. Specifically, when the scale parameter varies by observation, it is according to φit = φ/wit, that is, a constant divided by a known weight wit. With the relation Var yit = φit b″(θit) = φ b″(θit)/wit, we have that a larger weight implies a smaller variance, other things being equal.
In regression situations, we wish to understand the impact of a linear combination of explanatory variables on the distribution. In the GLM context, it is customary to call ηit = xit′ β the systematic component of the model. This systematic component is related to the mean through the expression

ηit = g(µit).   (10.2)

Here, g(.) is known as the link function. As we saw in the prior subsection, we can express the mean of yit as E yit = µit = b′(θit). Thus, equation (10.2) serves to “link” the systematic component to the parameter θit. It is possible to use the identity function for g(.) so that µit = xit′ β. Indeed, this is the usual case in linear regression. However, linear combinations of explanatory variables, xit′ β, may vary between negative and positive infinity whereas means are often restricted to a smaller range. For example, Poisson means vary between zero and infinity. The link function serves to map the domain of the mean function onto the whole real line.

Special case: Links for the Bernoulli distribution
Bernoulli means are probabilities and thus vary between zero and one. For this case, it is useful to choose a link function that maps the unit interval (0,1) onto the whole real line. The following are three important examples of link functions for the Bernoulli distribution:
• Logit: g(µ) = logit(µ) = ln( µ/(1 − µ) ).
• Probit: g(µ) = Φ-1(µ), where Φ-1 is the inverse of the standard normal distribution function.
• Complementary log-log: g(µ) = ln( −ln(1 − µ) ).
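As a quick numerical illustration, the following sketch evaluates the three links at a few means in (0,1); the scipy dependency and the chosen values are assumptions for illustration.

```python
# The three Bernoulli links above, each mapping a mean in (0,1)
# onto the whole real line.
import numpy as np
from scipy.stats import norm

def logit(mu):
    return np.log(mu / (1.0 - mu))

def probit(mu):
    return norm.ppf(mu)                   # inverse standard normal cdf

def cloglog(mu):
    return np.log(-np.log(1.0 - mu))      # complementary log-log

mu = np.array([0.05, 0.50, 0.95])
for g in (logit, probit, cloglog):
    print(g.__name__, np.round(g(mu), 3))
```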


This example demonstrates that there may be several link functions that are suitable for a particular distribution. To help with the selection, an intuitively appealing case occurs when the systematic component equals the parameter of interest (η = θ). To see this, first recall that η = g(µ) and µ = b′(θ), dropping the “it” subscripts for the moment. Then, it is easy to see that if g-1 = b′, then η = g(b′(θ)) = θ. The choice of g that is the inverse of b′ is called the canonical link. Table 10.1 shows the mean function and corresponding canonical link for several important distributions.

Table 10.1 Mean functions and canonical links for selected distributions
Distribution   Mean function (b′(θ))   Canonical link (g(θ))
Normal         θ                       θ
Bernoulli      eθ/(1 + eθ)             logit(θ)
Poisson        eθ                      ln θ
Gamma          −1/θ                    −1/θ

10.1.3 Estimation

This section presents maximum likelihood, the customary form of estimation. To provide intuition, we begin with the simpler case of canonical links and then extend the results to more general links.

Maximum likelihood estimation for canonical links From equation (10.1) and the independence among observations, the log-likelihood is

ln p(y) = Σit [ (yit θit − b(θit))/φit + S(yit, φit) ].   (10.3)

Recall that for canonical links, we have equality between the distribution’s parameter and the systematic component, so that θit = ηit = xit′ β. Thus, the log-likelihood is

ln p(y) = Σit [ (yit xit′ β − b(xit′ β))/φit + S(yit, φit) ].   (10.4)

Taking the partial derivative with respect to β yields the score function

∂/∂β ln p(y) = Σit xit ( yit − b′(xit′ β) )/φit .

Because µit = b′(θit) = b′( xit′ β) and φit = φ /wit, we can solve for the maximum likelihood estimators of β, bMLE, through the “normal equations”

0 = Σit wit xit (yit − µit).   (10.5)

One reason for the widespread use of GLM methods is that the maximum likelihood estimators can be computed quickly through a technique known as iteratively reweighted least squares, described in Appendix C.3. Note that, like ordinary linear regression normal equations, we do not need to consider estimation of the variance scale parameter φ at this stage. That is, we can first compute bMLE and then estimate φ.
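To fix ideas, here is a minimal sketch of iteratively reweighted least squares for a canonical-link Poisson model, solving the normal equations (10.5) with unit weights wit = 1; the data arrays and convergence settings are illustrative assumptions, not a production implementation.

```python
# Iteratively reweighted least squares for a log-link Poisson GLM.
# Each pass solves a weighted least squares problem with weights
# b''(x'b) = exp(x'b) and adjusted response z = x'b + (y - mu)/mu.
import numpy as np

def irls_poisson(X, y, tol=1e-8, max_iter=50):
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = np.exp(X @ beta)             # b'(x'beta)
        W = mu                            # IRLS weights, b''(x'beta)
        z = X @ beta + (y - mu) / mu      # adjusted dependent variable
        beta_new = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
        if np.max(np.abs(beta_new - beta)) < tol:
            break
        beta = beta_new
    return beta_new
```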

Maximum likelihood estimation for general links

For general links, we no longer assume the relationship θit = xit′ β but assume that β is related to θit through the relations µit = b′(θit) and xit′ β = g(µit). Using equation (10.3), we have that the jth element of the score function is

∂/∂βj ln p(y) = Σit (∂θit/∂βj) (yit − µit)/φit ,

because b′(θit) = µit . Now, use the chain rule and the relation Var yit = φit b′′(θit) to get

∂µit/∂βj = ∂b′(θit)/∂βj = b″(θit) ∂θit/∂βj = (Var yit / φit) ∂θit/∂βj .

Thus, we have (∂θit/∂βj)(1/φit) = (∂µit/∂βj)(1/Var yit). This yields

∂/∂βj ln p(y) = Σit (∂µit/∂βj) (Var yit)−1 (yit − µit),

which is known as the generalized estimating equations form. This is the topic of Section 10.3.

Standard errors for regression coefficient estimators
As described in Appendix C.2, maximum likelihood estimators are consistent and asymptotically normally distributed under broad conditions. To illustrate, consider the maximum likelihood estimator determined from equation (10.4) using the canonical link. The asymptotic variance-covariance matrix of bMLE is the inverse of the information matrix that, from equation (C.4), is

Σit (xit xit′/φit) b″(xit′ bMLE).   (10.6)

The square root of the jth diagonal element of the inverse of this matrix yields the standard error for the jth element of bMLE, which we denote as se(bj,MLE). Extensions to general links are similar.

Overdispersion
An important feature of several members of the linear exponential family of distributions, such as the Bernoulli and the Poisson distributions, is that the variance is determined by the mean. In contrast, the normal distribution has a separate parameter for the variance, or dispersion. When fitting models to data with binary or count dependent variables, it is common to observe that the variance exceeds that anticipated by the fit of the mean parameters. This phenomenon is known as overdispersion. Several alternative probabilistic models may be available to explain this phenomenon, depending on the application at hand. See Section 10.3 for an example and McCullagh and Nelder (1989G) for a more detailed inventory.
Although arriving at a satisfactory probabilistic model is the most desirable route, in many situations analysts are content to postulate an approximate model through the relation

Var yit = σ2 φ b″(xit′ β) / wit.

The scale parameter φ is specified through the choice of the distribution whereas the scale parameter σ2 allows for extra variability. For example, Table 10A.1 shows that by specifying either the Bernoulli or Poisson distribution, we have φ = 1. Although the scale parameter σ2 allows for extra variability, it may also accommodate situations in which the variability is smaller than specified by the distributional form (although this situation is less common). Finally, note that for some distributions, such as the normal distribution, the extra term is already incorporated in the φ parameter and thus serves no useful purpose.
When the additional scale parameter σ2 is included, it is customary to estimate it by Pearson's chi-square statistic divided by the error degrees of freedom. That is,

σ̂2 = (1/(N − K)) Σit wit (yit − b′(xit′ bMLE))2 / ( φ b″(xit′ bMLE) ).
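A sketch of this estimator for a Poisson fit (so φ = 1 and b″(xit′ bMLE) = exp(xit′ bMLE)) follows; the inputs are assumed to come from a routine such as the IRLS sketch above.

```python
# Pearson chi-square estimate of the dispersion parameter sigma^2:
# sum of w * (y - mu)^2 / (phi * b''(x'b)) over observations,
# divided by the error degrees of freedom N - K.
import numpy as np

def pearson_dispersion(X, y, b_mle, w=None):
    mu = np.exp(X @ b_mle)                # b'(x'b_mle) for the Poisson
    w = np.ones_like(y, dtype=float) if w is None else w
    n_obs, k = X.shape
    return float(np.sum(w * (y - mu) ** 2 / mu) / (n_obs - k))
```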

10.2 Example: Tort filings

There is a widespread belief that, in the United States, contentious parties have become increasingly willing to go to the judicial system to settle disputes. This is particularly true when one party is from the insurance industry, an industry designed to spread risk among individuals. Litigation in the insurance industry arises from two types of disagreement among parties, breach of faith and tort. A breach of faith is a failure by a party to the contract to perform according to its terms. A tort action is a civil wrong, other than breach of contract, for which the court will provide a remedy in the form of action for damages. A civil wrong may include malice, wantonness, oppression or capricious behavior by a party. Generally, large damages can be collected for tort actions because the award may be large enough to “sting” the guilty party. Because large insurance companies are viewed as having “deep pockets,” these awards can be quite large.
The response that we consider is the number of filings, NUMFILE, of tort actions against insurance companies (y). For each of six years, 1984-1989, the data were obtained from 19 states, with two observations unavailable, for a total of N = 112 observations. The issue is to understand ways in which state legal, economic and demographic characteristics affect the number of filings. Table 10.2 describes these characteristics. More extensive motivation is provided in Lee (1994O) and Lee, Browne and Schmit (1994O).

Table 10.2 State Characteristics
Dependent Variable
NUMFILE     Number of filings of tort actions against insurance companies.
State Legal Characteristics
JSLIAB      An indicator of joint and several liability reform.
COLLRULE    An indicator of collateral source reform.
CAPS        An indicator of caps on non-economic reform.
PUNITIVE    An indicator of limits on punitive damages.
State Economic and Demographic Characteristics
POP         The state population, in millions.
POPLAWYR    The population per lawyer.
VEHCMILE    Number of automobile miles per mile of road, in thousands.
POPDENSY    Number of people per ten square miles of land.
WCMPMAX     Maximum workers' compensation weekly benefit.
URBAN       Percentage of population living in urban areas.
UNEMPLOY    State unemployment rate, in percentages.
Source: An Empirical Study of the Effects of Tort Reforms on the Rate of Tort Filings, unpublished Ph.D. Dissertation, Han-Duck Lee, University of Wisconsin (1994O).

Tables 10.3 and 10.4 summarize the state legal, economic and demographic characteristics. To illustrate, in Table 10.3 we see that 23.2 percent of the 112 state-year observations were under limits (caps) on non-economic reform. Those observations not under limits on non-economic reforms had a larger average number of filings. The correlations in Table 10.4 show that several of the economic and demographic variables appear to be related to the number of filings. In particular, we note that the number of filings is highly related to the state population.

Table 10.3 Averages with Explanatory Indicator Variables
                                      JSLIAB   COLLRULE   CAPS     PUNITIVE
Average Explanatory Variable          0.491    0.304      0.232    0.321
Average NUMFILE
  When Explanatory Variable = 0       15,530   20,727     24,682   17,693
  When Explanatory Variable = 1       25,967   20,027     6,727    26,469

Table 10.4 Summary Statistics for Other Variables
Variable    Mean     Median   Minimum   Maximum   Standard    Correlation
                                                  deviation   with NUMFILE
NUMFILE     20514    9085     512       137455    29039        1.0
POP         6.7      3.4      0.5       29.0      7.2          0.902
POPLAWYR    377.3    382.5    211.0     537.0     75.7        -0.378
VEHCMILE    654.8    510.5    63.0      1899.0    515.4        0.518
POPDENSY    168.2    63.9     0.9       1043.0    243.9        0.368
WCMPMAX     350.0    319.0    203.0     1140.0    151.7       -0.265
URBAN       69.4     78.9     18.9      100.0     24.8         0.550
UNEMPLOY    6.2      6.0      2.6       10.8      1.6          0.008

[Figure 10.1 here: a multiple time series plot of NUMFILE (vertical axis, 0 to 140,000) against YEAR, 1984-1989.]

Figure 10.1 Multiple time series plot of NUMFILE.


Figure 10.1 is a multiple time series plot of the number of filings. The state with the largest number of filings is California. This plot shows the state level heterogeneity of filings.
When data represent counts such as the number of tort filings, it is customary to consider the Poisson model to represent the distribution of responses. From theory, it is known that sums of independent Poisson random variables have a Poisson distribution. Thus, if λ1, …, λn represent independent Poisson random variables, each with parameter E λi = µi, then λ1 + … + λn is Poisson distributed with parameter nµ, where µ = (µ1 + … + µn)/n. We assume that yit is Poisson distributed with parameter POPit exp(xit′ β), where POPit is the population of the ith state at time t. To account for this “known” relationship with population, we assume that the (natural) logarithmic population is one of the explanatory variables, yet has a known regression coefficient equal to one. In GLM terminology, such a variable is known as an offset. Thus, our Poisson parameter for yit is

exp(ln POPit + xit,1 β1 + … + xit,K βK) = exp(ln POPit + xit′ β) = POPit exp(xit′ β).
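In a package such as statsmodels, the offset enters directly as a term with known coefficient one. The following hedged sketch fits a homogeneous Poisson model on synthetic data patterned after this example; the variable names and data-generating settings are assumptions, not the chapter's data.

```python
# Homogeneous Poisson regression with ln(POP) as an offset,
# using synthetic data patterned after the tort filings example.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(seed=0)
n_obs = 112
df = pd.DataFrame({
    "POP": rng.uniform(0.5, 29.0, n_obs),        # population, millions
    "UNEMPLOY": rng.uniform(2.6, 10.8, n_obs),   # unemployment rate
})
eta = np.log(df["POP"]) + 0.09 * df["UNEMPLOY"] + 6.0
df["NUMFILE"] = rng.poisson(np.exp(eta))

X = sm.add_constant(df[["UNEMPLOY"]])
fit = sm.GLM(df["NUMFILE"], X, family=sm.families.Poisson(),
             offset=np.log(df["POP"])).fit()     # known coefficient of one
print(fit.params)
```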

An alternative approach is to use the average number of tort filings as the response and assume approximate normality. This was the approach taken by Lee et al. (1994O); the reader has an opportunity to practice this approach in Exercises 2.18 and 3.12. Note that in the Poisson model above the expectation of the average response is E(yit / POPit) = exp(xit′ β) whereas the variance is Var(yit / POPit) = exp(xit′ β) / POPit. Thus, to make these two approaches compatible, one must use weighted regression, using estimated reciprocal variances as the weights.
Table 10.5 summarizes the fit of three Poisson models. With the basic homogeneous Poisson model, all explanatory variables turn out to be statistically significant, as evidenced by the small p-values. However, the Poisson model assumes that the variance equals the mean; this is often a restrictive assumption for empirical work. Thus, Table 10.5 also summarizes a homogeneous Poisson model with an estimated scale parameter, to account for potential overdispersion. Table 10.5 emphasizes that although the regression coefficient estimates do not change with the introduction of the scale parameter, estimated standard errors and thus p-values do change. Many variables, such as CAPS, turn out to be statistically insignificant predictors of the number of filings when a more flexible model for the variance is introduced. Subsequent sections will introduce models of the state level heterogeneity. Although not as important for this data set, it is still easy to examine temporal heterogeneity in this context through year binary (dummy) variables. The goodness of fit statistics, the deviance and Pearson chi-square, favor including the time categorical variables (see Appendix C.8 for definitions of these goodness of fit statistics).

Table 10.5 Tort Filings Model Coefficient Estimates
Based on N = 112 observations from n = 19 states and T = 6 years. Logarithmic population is used as an offset.
                    Homogeneous          Model with estimated   Model with scale parameter
                    Poisson model        scale parameter        and time categorical variables
Variable            Estimate  p-value    Estimate  p-value      Estimate  p-value
Intercept           -7.943    <.0001     -7.943    <.0001
POPLAWYR/1000        2.163    <.0001      2.163    0.0002        2.123    0.0004
VEHCMILE/1000        0.862    <.0001      0.862    <.0001        0.856    <.0001
POPDENSY/1000        0.392    <.0001      0.392    0.0038        0.384    0.0067
WCMPMAX/1000        -0.802    <.0001     -0.802    0.1226       -0.826    0.1523
URBAN/1000           0.892    <.0001      0.892    0.8187        0.977    0.8059
UNEMPLOY             0.087    <.0001      0.087    0.0005        0.086    0.0024
JSLIAB               0.177    <.0001      0.177    0.0292        0.130    0.2705
COLLRULE            -0.030    <.0001     -0.030    0.7444       -0.023    0.8053
CAPS                -0.032    <.0001     -0.032    0.7457       -0.056    0.6008
PUNITIVE             0.030    <.0001      0.030    0.6623        0.053    0.4986
Scale                1.000                35.857                 36.383
Deviance             118,309.0            118,309.0              115,496.4
Pearson Chi-Square   129,855.7            129,855.7              127,073.9

10.3 Marginal models and GEE

Marginal models underpin a widely applied framework for handling heterogeneity in longitudinal data. As introduced in Section 9.4, marginal models are semiparametric. Specifically, we model the first and second moments as functions of unknown parameters without specifying a full parametric distribution for the responses. For these models, parameters can be estimated using extensions of the method of moments estimation technique known as generalized estimating equations, or GEE.

Marginal models
To re-introduce marginal models, again use β to be a K × 1 vector of parameters associated with the explanatory variables xit. Further, let τ be a q × 1 vector of variance component parameters. Assume that the mean of each response is

E yit = µit = µit(β , τ),

where µit is a known function. Further, denote the variance of each response as Var yit = vit = vit(β, τ), where vit is a known function. To complete the second moment specification, we need assumptions on the covariances among responses. We assume that observations among different subjects are independent, and thus have zero covariance. For observations within a subject, we assume that the correlation between two observations within the same subject is a known function. Specifically, we will denote

corr(yir , yis ) = ρrs(µir , µis , τ),


where ρrs(.) is a known function. As in Chapter 3, marginal models use correlations among observations within a subject to represent heterogeneity, that is, the tendency for observations from the same subject to be related.

Special case – Generalized linear model with canonical links
As we saw in Section 10.1, in the context of the generalized linear model with canonical links, we have µit = E yit = b′(xit′ β) and vit = Var yit = φ b″(xit′ β) / wit. Here, b(.) is a known function that depends on the choice of the distribution of responses. There is only one variance component, so q = 1 and τ = φ. For the case of independent observations among subjects discussed in Section 10.1, the correlation function ρrs is 1 if r = s and 0 otherwise. For this example, note that the mean function depends on β but does not depend on the variance component τ.

Special case – Error components model
For the error components model introduced in Section 3.1, we may use a generalized linear model with a normal distribution for responses, so that b(θ) = θ2/2. Thus, from the above example, we have µit = E yit = xit′ β and vit = Var yit = φ = σα2 + σε2. Unlike the above example, observations within subjects are not independent but have an exchangeable correlation structure. Specifically, in Section 3.1 we saw that, for r ≠ s, we have

corr(yir, yis) = ρ = σα2 / (σα2 + σε2).

With this specification, the vector of variance components is τ = (φ, σα2)′, where φ = σα2 + σε2.

In Section 10.1, the parametric form of the first two moments was a consequence of the assumption of an exponential family distribution. With marginal models, the parametric form is now a basic assumption. We now give matrix notation to express these assumptions more compactly. To this end, define µi(β, τ) = µi = (µi1, µi2, …, µiTi)′ to be the vector of means for the ith subject. Let Vi be the Ti × Ti variance matrix, Var yi, where the rsth element of Vi is given by

Cov(yir, yis) = corr(yir, yis) (vir vis)1/2 .

As in Section 9.4, for estimation purposes we will also require the K × Ti gradient matrix,

Gµ(β, τ) = ( ∂µi1/∂β  ⋯  ∂µiTi/∂β ).   (10.7)

Generalized estimating equations
Marginal models provide sufficient structure to allow for a method of moments estimation procedure called generalized estimating equations (GEE), a type of generalized method of moments (GMM) estimation. As with generalized least squares, we first describe this estimation method assuming the variance component parameters are known. We then describe extensions to incorporate variance component estimation. The general GEE procedure is described in Appendix C.6; recall that an introduction was provided in Section 9.4.
Assuming that the variance components in τ are known, the GEE estimator of β is the solution of

0K = Σi Gµ(b) Vi(b)−1 (yi − µi(b)),  summing over i = 1, …, n.   (10.8)

These are the generalized estimating equations. We will denote this solution as bEE. Under mild regularity conditions, this estimator is consistent and asymptotically normal with variance-covariance matrix

Var bEE = ( Σi Gµ(b) Vi(b)−1 Gµ(b)′ )−1 .   (10.9)

Special case – Generalized linear model with canonical links - Continued

Because µit = b′(xit′ β), it is easy to see that ∂µit/∂β = xit b″(xit′ β) = wit xit vit / φ. Thus,

Gµ(β, τ) = (1/φ) ( wi1 xi1 vi1  ⋯  wiTi xiTi viTi )

is our K × Ti matrix of derivatives. Assuming independence among observations within a subject, we have Vi = diag(vi1, …, viTi). Thus, we may express the generalized estimating equations as

0K = Σi Gµ(b) Vi(b)−1 (yi − µi(b)) = (1/φ) Σi Σt wit xit (yit − µit(b)).

This yields the same solution as the maximum likelihood “normal equations” in equation (10.5). Thus, the GEE estimators are equal to the maximum likelihood estimators for this special case. Note that this solution does not depend on knowledge of the variance component, φ.

GEEs with unknown variance components
To determine GEE estimates, the first task is to determine an initial estimator of β, say b0,EE. To illustrate, one might use the GLM model with independence among observations, as above, to get an initial estimator. Next, we use the initial estimator b0,EE to compute residuals and determine an initial estimator of the variance components, say τ0,EE. Then, at the (n+1)st stage, recursively:
1. Use τn,EE and the solution of the equation

0K = Σi Gµ(b, τn,EE) Vi(b, τn,EE)−1 (yi − µi(b, τn,EE))

to determine an updated estimator of β, say bn+1,EE.
2. Use the residuals {yit − µit(bn+1,EE, τn,EE)} to determine an updated estimator of τ, say τn+1,EE.
3. Repeat steps 1 and 2 until convergence.

Alternatively, for the first stage, we may update estimators using a one-step procedure such as a Newton-Raphson iteration. As another example, the statistical package SAS uses a Fisher scoring type update of the form:

bn+1,EE = bn,EE + ( Σi Gµ(bn,EE) Vi(bn,EE)−1 Gµ(bn,EE)′ )−1 ( Σi Gµ(bn,EE) Vi(bn,EE)−1 (yi − µi(bn,EE)) ).
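A sketch of one such Fisher scoring update appears below; the callables grad, V and mu, which return Gµ(b), the working covariance Vi(b) and the fitted means µi(b) for subject i, are illustrative assumptions.

```python
# One Fisher scoring update for the GEE estimator: b_new = b +
# (sum_i G V^{-1} G')^{-1} (sum_i G V^{-1} (y_i - mu_i)), following
# the display above.
import numpy as np

def gee_update(b, y_list, grad, V, mu):
    K = len(b)
    info = np.zeros((K, K))               # sum_i G V^{-1} G'
    score = np.zeros(K)                   # sum_i G V^{-1} (y_i - mu_i)
    for i, y_i in enumerate(y_list):
        G = grad(i, b)                    # K x T_i gradient matrix
        V_inv = np.linalg.inv(V(i, b))    # T_i x T_i working covariance
        info += G @ V_inv @ G.T
        score += G @ V_inv @ (y_i - mu(i, b))
    return b + np.linalg.solve(info, score)
```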


For GEEs in more complex problems with unknown variance components, Prentice (1988B) suggests using a second estimating equation of the form:

Σi ( ∂E yi*/∂τ )′ Wi−1 ( yi* − E yi* ) = 0.

Here, yi* is a vector of squares and cross-products of observations within a subject, of the form

yi* = ( yi1 yi2, yi1 yi3, …, yi1 yiTi, …, yi12, yi22, …, yiTi2 )′.

For the variance of yi*, Diggle et al. (2002S) suggest using the identity matrix for Wi with most discrete data. However, for binary responses, they note that the last Ti observations are redundant because yit2 = yit. These should be ignored; Diggle et al. recommend using

Wi = diag( Var(yi1 yi2), …, Var(yi,Ti−1 yiTi) ).

Robust estimation of standard errors
As discussed in Sections 2.5.3 and 3.4 for linear models, one must be concerned with unsuspected serial correlation and heteroscedasticity. This is particularly true with marginal models, where the specification is based only on the first two moments and not a full probability distribution. Because the specification of the correlation structure may be suspect, these correlations are often called working correlations. Standard errors that rely on the correlation structure specification from the relation in equation (10.9) are known as model-based standard errors. In contrast, empirical standard errors may be calculated using the following estimator of the asymptotic variance of bEE:

( Σi Gµ Vi−1 Gµ′ )−1 ( Σi Gµ Vi−1 (yi − µi)(yi − µi)′ Vi−1 Gµ′ ) ( Σi Gµ Vi−1 Gµ′ )−1 .   (10.10)

Specifically, the standard error of the jth component of bEE, se(bj,EE), is defined to be the square root of the jth diagonal element of the variance-covariance matrix in the above display.
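The following sketch computes the empirical variance estimator in equation (10.10) under the same illustrative conventions as the update sketch above.

```python
# Empirical ("sandwich") covariance matrix of b_EE, equation (10.10):
# bread^{-1} @ meat @ bread^{-1}, with bread = sum_i G V^{-1} G' and
# meat = sum_i G V^{-1} r r' V^{-1} G', where r = y_i - mu_i.
import numpy as np

def robust_se(b, y_list, grad, V, mu):
    K = len(b)
    bread = np.zeros((K, K))
    meat = np.zeros((K, K))
    for i, y_i in enumerate(y_list):
        G = grad(i, b)
        V_inv = np.linalg.inv(V(i, b))
        r = y_i - mu(i, b)
        bread += G @ V_inv @ G.T
        meat += G @ V_inv @ np.outer(r, r) @ V_inv @ G.T
    bread_inv = np.linalg.inv(bread)
    cov = bread_inv @ meat @ bread_inv
    return np.sqrt(np.diag(cov))          # empirical standard errors
```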

Example: Tort filings – continued
To illustrate marginal models and GEE estimators, we return to the Section 10.2 tort filings example. The fitted models, which appear in Table 10.6, were estimated using the SAS procedure GENMOD. Assuming an independent working correlation, we arrive at the same parameter estimators as in Table 10.5, under the homogeneous Poisson model with an estimated scale parameter. Further, although not displayed, the fits provide the same model-based standard errors. Compared to the empirical standard errors, we see that almost all empirical standard errors are larger than the corresponding model-based standard errors (the exception is CAPS).
To test the robustness of this model fit, we fit the same model with an autoregressive model of order one, AR(1), working correlation. This model fit also appears in Table 10.6. Because we use the working correlation as a weight matrix, many of the coefficients have different values than the corresponding fit with the independent working correlation matrix. Note that the asterisk indicates when an estimate exceeds twice the empirical standard error; this gives a rough idea of the statistical significance of the variable. Under the AR(1) working correlation, variables that are statistically significant using empirical standard errors are also statistically significant using model-based standard errors. (Again, CAPS is a borderline exception.) Comparing the independent to the AR(1) working correlation results, we see that VEHCMILE, POPDENSY, UNEMPLOY and JSLIAB are consistently statistically significant whereas POPLAWYR, WCMPMAX, URBAN, COLLRULE and PUNITIVE are consistently statistically insignificant. However, the judgment of statistical significance depends on the working correlation for CAPS.

Table 10.6 Comparison of GEE Estimators
All models use an estimated scale parameter. Logarithmic population is used as an offset.
                 Independent Working Correlation       AR(1) Working Correlation
Parameter        Estimate   Empirical   Model-Based    Estimate   Empirical   Model-Based
                            Standard    Standard                  Standard    Standard
                            Error       Error                     Error       Error
Intercept        -7.943*    0.612       0.435          -7.840*    0.870       0.806
POPLAWYR/1000     2.163     1.101       0.589           2.231     1.306       0.996
VEHCMILE/1000     0.862*    0.265       0.120           0.748*    0.166       0.180
POPDENSY/1000     0.392*    0.175       0.135           0.400*    0.181       0.223
WCMPMAX/1000     -0.802     0.895       0.519          -0.764     0.664       0.506
URBAN/1000        0.892     5.367       3.891           3.508     7.251       7.130
UNEMPLOY          0.087*    0.042       0.025           0.048*    0.018       0.021
JSLIAB            0.177*    0.089       0.081           0.139*    0.020       0.049
COLLRULE         -0.030     0.120       0.091          -0.014     0.079       0.065
CAPS             -0.032     0.098       0.098           0.142*    0.068       0.066
PUNITIVE          0.030     0.125       0.068          -0.043     0.054       0.049
Scale            35.857                                35.857
AR(1) Coefficient                                       0.854
The asterisk (*) indicates that the estimate is more than twice the empirical standard error, in absolute value.
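A hedged analogue of these fits, using statsmodels' GEE implementation in place of SAS PROC GENMOD and extending the synthetic data frame from the earlier offset sketch with state and year identifiers, might look as follows.

```python
# Poisson GEE fits with independent and AR(1) working correlations;
# empirical ("robust") standard errors are the statsmodels default.
import numpy as np
import statsmodels.api as sm

df["STATE"] = np.repeat(np.arange(19), 6)[:112]   # 19 states, 6 years
df["YEAR"] = np.tile(np.arange(1984, 1990), 19)[:112]

for cov in (sm.cov_struct.Independence(), sm.cov_struct.Autoregressive()):
    model = sm.GEE(df["NUMFILE"], sm.add_constant(df[["UNEMPLOY"]]),
                   groups=df["STATE"], time=df["YEAR"],
                   family=sm.families.Poisson(),
                   offset=np.log(df["POP"]), cov_struct=cov)
    fit = model.fit()
    print(type(cov).__name__, fit.params.values, fit.bse.values)
```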

10.4 Random effects models

An important method for accounting for heterogeneity in longitudinal and panel data is through a random effects formulation. As in Sections 3.1 and 3.3.1 for linear models, this model is easy to introduce and interpret in the following hierarchical fashion:
• Stage 1. Subject effects {αi} are a random sample from a distribution that is known up to a vector of parameters.
• Stage 2. Conditional on {αi}, the responses {yi1, yi2, …, yiTi} are a random sample from a GLM with systematic component ηit = zit′ αi + xit′ β.

Additional motivation and issues regarding random effects were introduced in Chapter 3 for linear models; they also pertain to generalized linear models. Thus, the model introduced above is a generalization of:
• The Chapter 3 linear random effects model. Here, we used a normal distribution for the GLM component.
• The Section 9.2 binary dependent variables model with random effects. Here, we used a Bernoulli distribution for the GLM component. (In Section 9.2, we also focused on the case zit = 1.)

By assuming a distribution for the heterogeneity components αi, there is typically a much smaller number of parameters to be estimated when compared to the fixed effects panel data models that will be described in Section 10.5. The maximum likelihood method of estimation is available, although it is computationally intensive. To use this method, the customary practice is to assume normally distributed random effects. Other estimation methods are also available in the literature. For example, the EM (for expectation-maximization) algorithm for estimation is described in Diggle et al. (2002S) and McCulloch and Searle (2001G).


Random effects likelihood
To develop the likelihood, we first note that from the second sampling stage, conditional on αi, the likelihood for the ith subject at the tth observation is

p(yit; β | αi) = exp( (yit θit − b(θit))/φ + S(yit, φ) ),

where b′(θit) = E (yit | αi) and ηit = zit′ αi + xit′ β = g(E (yit | αi) ). Because of the independence among responses within a subject conditional on αi, the conditional likelihood for the ith subject is

p(yi; β | αi) = exp( Σt [ (yit θit − b(θit))/φ + S(yit, φ) ] ).

Taking expectations over αi yields the (unconditional) likelihood. To see this explicitly, we use the canonical link so that θit = ηit. Thus, the (unconditional) likelihood for the ith subject is

p(yi; β) = exp( Σt S(yit, φ) ) ∫ exp( Σt ( yit (zit′ a + xit′ β) − b(zit′ a + xit′ β) )/φ ) dFα(a).   (10.11)

Here, Fα(.) represents the distribution of αi, which we will assume to be multivariate normal with mean zero and variance-covariance matrix D. Consistent with the notation in Chapter 3, let τ denote the vector of parameters associated with scaling in stage 2, φ, and the stage 1 parameters in the matrix D. With this notation, we may write the total log-likelihood as

ln p(y, β, τ) = Σi ln p(yi, β).

From equation (10.11), we see that evaluating the log-likelihood requires numerical integration and thus is more difficult to compute than likelihoods for homogeneous models.

Special case – Poisson distribution

To illustrate, assume q = 1 and zit = 1, so that only intercepts αi vary by subject. Assuming a Poisson distribution for the conditional responses, we have φ = 1, b(a) = ea, and S(y, φ) = −ln(y!). Thus, from equation (10.11), the log-likelihood for the ith subject is

ln p(yi, β) = −Σt ln(yit!) + ln ∫ exp( Σt ( yit (a + xit′ β) − exp(a + xit′ β) ) ) fα(a) da
            = −Σt ln(yit!) + Σt yit xit′ β + ln ∫ exp( Σt ( yit a − exp(a + xit′ β) ) ) fα(a) da,

where fα(.) is the probability density function of αi. As before, evaluating and maximizing the log-likelihood requires numerical integration.
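One standard way to carry out this one-dimensional integration is Gauss-Hermite quadrature. The sketch below evaluates the subject-i log-likelihood for normally distributed intercepts αi ~ N(0, σ2); the inputs and node count are illustrative assumptions.

```python
# Gauss-Hermite evaluation of ln p(y_i, beta) for the Poisson random
# intercept model: integrate exp(sum_t [y_it (a + x'b) - exp(a + x'b)])
# against the N(0, sigma^2) density of the random intercept a.
import numpy as np
from numpy.polynomial.hermite import hermgauss
from scipy.special import gammaln

def loglik_subject(y_i, X_i, beta, sigma, n_nodes=30):
    nodes, weights = hermgauss(n_nodes)
    a = np.sqrt(2.0) * sigma * nodes          # transformed abscissas
    eta = X_i @ beta                          # T_i systematic components
    lin = eta[:, None] + a[None, :]           # T_i x n_nodes grid
    s = np.sum(y_i[:, None] * lin - np.exp(lin), axis=0)
    integral = np.sum(weights * np.exp(s)) / np.sqrt(np.pi)
    return np.log(integral) - np.sum(gammaln(y_i + 1))   # minus sum ln(y!)
```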


Serial correlation and overdispersion
The random effects GLM model introduces terms that account for heterogeneity in the data. However, the model also introduces certain forms of serial correlation and overdispersion that may or may not be evident in the data. Recall that in Chapter 3 we saw that permitting subject-specific effects, αi, to be random induces serial correlation in the responses {yi1, yi2, …, yiTi}. This is also true for nonlinear GLM models, as shown in the following example.

Special case – Poisson distribution – continued
To illustrate, we use the assumptions of the preceding Poisson special case and recall that, for a canonical link, we have E(yit | αi) = b′(θit) = b′(ηit) = b′(αi + xit′ β). For the Poisson distribution, we have b′(a) = ea, so that

µit = E yit = E( E(yit | αi) ) = E b′(αi + xit′ β) = exp(xit′ β) E eα.

Here, we have dropped the subscript on αi because the distribution is identical over i. To see the serial correlation, we examine the covariance between two observations, for example, yi1 and yi2. By the conditional independence, we have

Cov(yi1, yi2) = E( Cov(yi1, yi2 | αi) ) + Cov( E(yi1 | αi), E(yi2 | αi) )
             = Cov( b′(αi + xi1′ β), b′(αi + xi2′ β) ) = Cov( eα exp(xi1′ β), eα exp(xi2′ β) )
             = exp( (xi1 + xi2)′ β ) Var eα.

This covariance is always nonnegative, indicating that we can anticipate positive serial correlation using this model. Similar calculations show that

Var yit = E( Var(yit | αi) ) + Var( E(yit | αi) )
        = E φ b″(αi + xit′ β) + Var( b′(αi + xit′ β) ) = E eα exp(xit′ β) + Var exp(αi + xit′ β)
        = µit + exp(2 xit′ β) Var eα.

Thus, the variance always exceeds the mean. Compared to the usual Poisson models that require equality between the mean and the variance, the random effects specification induces a larger variance. This is a specific example of the phenomenon known as overdispersion.
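A quick simulation, with illustrative settings, makes the overdispersion visible: the sample variance clearly exceeds the sample mean.

```python
# Poisson responses with a normal random intercept: the marginal
# variance mu + exp(2 x'b) * Var(e^alpha) exceeds the marginal mean.
import numpy as np

rng = np.random.default_rng(seed=0)
xb, sigma, n = 0.5, 0.8, 200_000          # illustrative settings
alpha = rng.normal(0.0, sigma, size=n)
y = rng.poisson(np.exp(alpha + xb))

print("sample mean:", y.mean())           # approx exp(xb) * E e^alpha
print("sample variance:", y.var())        # strictly larger than the mean
```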

Example: Tort filings – continued
To illustrate the random effects estimators, we return to the Section 10.2 tort filings example. Table 10.7 summarizes a random effects model that was fit using the SAS statistical procedure NLMIXED. For comparison, the Table 10.5 fits from the homogeneous Poisson model with an estimated scale parameter are included in Table 10.7. The random effects model assumes a conditionally Poisson distributed response, with a scalar heterogeneity parameter that has a normal distribution.


Table 10.7 Tort Filings Model Coefficient Estimates – Random Effects
Logarithmic population is used as an offset.
                  Homogeneous Model with       Random Effects Model
                  estimated scale parameter
Variable          Estimate    p-value          Estimate    p-value
Intercept         -7.943      <.0001           -2.753      <.0001
POPLAWYR/1000      2.163      0.0002           -2.694      <.0001
VEHCMILE/1000      0.862      <.0001           -0.183      0.0004
POPDENSY/1000      0.392      0.0038            9.547      <.0001
WCMPMAX/1000      -0.802      0.1226           -1.900      <.0001
URBAN/1000         0.892      0.8187          -47.820      <.0001
UNEMPLOY           0.087      0.0005            0.035      <.0001
JSLIAB             0.177      0.0292            0.546      0.3695
COLLRULE          -0.030      0.7444           -1.031      0.1984
CAPS              -0.032      0.7457            0.391      0.5598
PUNITIVE           0.030      0.6623            0.110      0.8921
State Variance                                  2.711
-2 Log Likelihood  119,576                     15,623

Computational considerations
As is evident from equation (10.11), maximum likelihood estimation of regression coefficients requires one or more q-dimensional numerical integrations, for each subject and each iteration of an optimization routine. As we have seen in our Chapter 9 and 10 examples, this computational complexity is manageable for random intercept models where q = 1. According to McCulloch and Searle (2001G), this direct method is also available for applications with q = 2 or 3; however, for higher-order models, such as those with crossed random effects, alternative approaches are necessary.
We have already mentioned the EM (expectation-maximization) algorithm in Chapter 9 as one alternative; see McCulloch and Searle (2001G) or Diggle et al. (2002S) for more details. Another alternative is to use simulation techniques. McCulloch and Searle (2001G) summarize a Monte Carlo Newton-Raphson approach for approximating the score function, a simulated maximum likelihood approach for approximating the integrated likelihood and a more efficient, sequential approach to simulation.
The most widely used set of alternatives is based on Taylor-series expansions, generally about the link function or the integrated likelihood. There are several justifications for this set of alternatives. One is that a Taylor-series is used to produce adjusted variables that follow an approximate linear (mixed effects) model. (Appendix C.3 describes this adjustment in the linear case.) Another justification is that these methods are determined through a “penalized” quasi-likelihood, where there is a so-called “penalty” term for the random effects. This set of alternatives is the basis for the SAS macro GLM800.sas and, for example, the S-plus (a statistical package) procedure nlme (for nonlinear mixed effects). The disadvantage of this set of alternatives is that the methods do not work well for distributions that are “far” from normality, such as Bernoulli distributions (Lin and Breslow, 1996B). The advantage is that the approximation procedures work well even for a relatively large number of random effects. We refer the reader to McCulloch and Searle (2001G) for further discussion.
We also note that generalized linear models can be expressed as special cases of nonlinear regression models. Here, by “nonlinear,” we mean that the regression function need not be a linear function of the predictors but can be expressed as a nonlinear function of the form f(xit, β). The exponential family of distributions provides a special case of nonlinear regression functions. This is relevant to computing considerations because many computational routines have been developed for nonlinear regression and can be used directly in the GLM context. This is also true of extensions to mixed effects models, as suggested by the reference to the nlme procedure above. For discussions of nonlinear regression models with random effects, we refer the reader to Davidian and Giltinan (1995S), Vonesh and Chinchilli (1997S) and Pinheiro and Bates (2000S).

10.5 Fixed effects models

As we have seen, another method for accounting for heterogeneity in longitudinal and panel data is through a fixed effects formulation. Specifically, to incorporate heterogeneity, we allow for a systematic component of the form ηit = zit′ αi + xit′ β. Thus, this section extends the Section 9.3 discussion, where we only permitted varying intercepts αi. However, many of the same computational difficulties arise. Specifically, it will turn out that maximum likelihood estimators are generally inconsistent. Conditional maximum likelihood estimators are consistent but difficult to compute for many distributional forms. The two exceptions are the normal and Poisson families, where we show that it is easy to compute conditional maximum likelihood estimators. Because of the computational difficulties, we restrict consideration in this section to canonical links.

10.5.1 Maximum likelihood estimation for canonical links

Assume a canonical link so that θit = ηit = zit′ αi + xit′ β. Thus, with equation (10.3), we have the log-likelihood

ln p(y) = Σit [ ( yit (zit′ αi + xit′ β) − b(zit′ αi + xit′ β) )/φ + S(yit, φ) ].   (10.12)

To determine maximum likelihood estimators of αi and β, we take derivatives of ln p(y), set the derivatives equal to zero and solve for the roots of these equations. Taking the partial derivative with respect to αi yields

0 = Σt zit ( yit − b′(zit′ αi + xit′ β) )/φ = Σt zit (yit − µit)/φ,

because µit = b′(θit) = b′( zit′ αi + xit′ β). Taking the partial derivative with respect to β yields

0 = Σit xit ( yit − b′(zit′ αi + xit′ β) )/φ = Σit xit (yit − µit)/φ.

Thus, we can solve for the maximum likelihood estimators of αi and β through the “normal equations”

0 = Σt zit (yit - µit) and 0 = Σit xit (yit - µit). (10.13)

This is a special case of the method of moments. Unfortunately, as we have seen in Section 9.3, this procedure may produce inconsistent estimates of β. The difficulty is that the number of parameter estimators, q × n + K, grows with the number of subjects, n. Thus, the usual asymptotic theorems that ensure our distributional approximations are no longer valid.

Example: Tort filings – continued
To illustrate the fixed effects estimators, we return to the Section 10.2 tort filings example. Table 10.8 summarizes the fits of the fixed effects models. For comparison, the Table 10.5 fits from the homogeneous Poisson model with an estimated scale parameter are included in Table 10.8.

Table 10.8 Tort Filings Model Coefficient Estimates – Fixed Effects
All models have an estimated scale parameter. Logarithmic population is used as an offset.
                   Homogeneous Model     Model with state        Model with state and time
                                         categorical variable    categorical variables
Variable           Estimate  p-value     Estimate  p-value       Estimate  p-value
Intercept          -7.943    <.0001
POPLAWYR/1000       2.163    0.0002       0.788    0.5893        -0.428    0.7869
VEHCMILE/1000       0.862    <.0001       0.093    0.7465         0.303    0.3140
POPDENSY/1000       0.392    0.0038       4.351    0.2565         3.123    0.4385
WCMPMAX/1000       -0.802    0.1226       0.546    0.3791         1.150    0.0805
URBAN/1000          0.892    0.8187     -33.941    0.3567       -31.910    0.4080
UNEMPLOY            0.087    0.0005       0.028    0.1784         0.014    0.5002
JSLIAB              0.177    0.0292       0.131    0.0065         0.120    0.0592
COLLRULE           -0.030    0.7444      -0.024    0.6853        -0.016    0.7734
CAPS               -0.032    0.7457       0.079    0.2053         0.040    0.5264
PUNITIVE            0.030    0.6623      -0.022    0.6377         0.039    0.4719
Scale              35.857                16.779                  16.315
Deviance           118,309.0             22,463.4                19,834.2
Pearson Chi-Square 129,855.7             23,366.1                20,763.0

10.5.2 Conditional maximum likelihood estimation for canonical links
Using equation (10.12), we see that certain parameters depend on the responses only through certain summary statistics. Specifically, using the factorization theorem described in Appendix 10A.2, we have that the statistics Σt yit zit are sufficient for αi and that the statistics Σit yit xit are sufficient for β. This convenient property of canonical links is not available for other choices of links. Recall that sufficiency means that the distribution of the responses will not depend on a parameter when conditioned on the corresponding sufficient statistic. Specifically, we now consider the likelihood of the data conditional on {Σt yit zit, i = 1, …, n}, so that the conditional likelihood will not depend on {αi}. By maximizing this conditional likelihood, we achieve consistent estimators of β.

To this end, let Si be the random variable representing Σt zit yit and let sumi be the realization of Σt zit yit. The conditional likelihood of the responses is

Πi [ p(yi1, αi, β) p(yi2, αi, β) ⋯ p(yiTi, αi, β) / pSi(sumi) ],   (10.14)

where pSi(sumi) is the probability density (or mass) function of Si evaluated at sumi. This likelihood does not depend on {αi}, only on β. Thus, when evaluating it, we can take αi to be a zero vector without loss of generality. Under broad conditions, maximizing equation (10.14) with respect to β yields root-n consistent estimators; see, for example, McCullagh and Nelder (1989G).
Still, as in Section 9.3, for most parametric families, it is difficult to compute the distribution of Si. Clearly, the normal distribution is one exception, because if the responses are normal, then the distribution of Si is also normal. The following subsection describes another important application where the computation is feasible under conditions likely to be encountered in applied data analysis.

10.5.3 Poisson distribution
This subsection considers Poisson distributed data, a widely used distribution for count responses. To illustrate, we consider the canonical link with q = 1 and zit = 1, so that θit = αi + xit′ β. The Poisson distribution is given by

p(yit, αi, β) = (µit)yit exp(−µit) / yit!

with µit = b′(θit) = exp(αi + xit′ β). Conversely, because ln µit = αi + xit′ β, we have that the logarithmic mean is a linear combination of explanatory variables. This is the basis of the so-called “log-linear” model.

As noted in Appendix 10A.2, Σt yit is a sufficient statistic for αi. Further, it is easy to check that the distribution of Σt yit turns out to be Poisson, with mean Σt exp(αi + xit′ β). In the subsequent development, we will use the ratio of means,

πit(β) = µit / Σt µit = exp(xit′ β) / Σt exp(xit′ β),   (10.15)

that does not depend on αi. Now, using equation (10.14), the conditional likelihood for the ith subject is

p(yi1, αi, β) p(yi2, αi, β) ⋯ p(yiTi, αi, β) / Prob(Si = Σt yit)
  = [ (µi1)yi1 ⋯ (µiTi)yiTi exp(−(µi1 + ⋯ + µiTi)) / (yi1! ⋯ yiTi!) ] / [ (Σt µit)Σt yit exp(−Σt µit) / (Σt yit)! ]
  = ( (Σt yit)! / (yi1! ⋯ yiTi!) ) Πt πit(β)yit ,

where πit(β) is given in equation (10.15). This is a multinomial distribution. Thus, the joint distribution of {yi1, …, yiTi} given Σt yit is multinomial. Using equation (10.14), the conditional likelihood is

     y ! n  ∑ it     t  yit  CL =  π it ()β  . ∏∏y ! y ! it=1  i1 L iTi      Taking partial derivatives of the log conditional likelihood yields ∂ ∂     ln CL = ∑ yit lnπ it ()β = ∑∑yit xit − xirπ ir ()β  . ∂β it ∂β it r 

Thus, the conditional maximum likelihood estimate, bCMLE, is the solution of

Σi Σt yit ( xit − Σr xir πir(bCMLE) ) = 0.
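The following sketch solves this estimating equation by direct maximization of the multinomial conditional log-likelihood; the list-of-arrays data layout and optimizer choice are illustrative assumptions.

```python
# Conditional maximum likelihood for the fixed effects Poisson model:
# maximize sum_it y_it ln pi_it(beta), which does not involve alpha_i.
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

def neg_cond_loglik(beta, X_list, y_list):
    total = 0.0
    for X_i, y_i in zip(X_list, y_list):
        eta = X_i @ beta
        log_pi = eta - logsumexp(eta)     # ln pi_it(beta), equation (10.15)
        total += np.sum(y_i * log_pi)
    return -total

def fit_cmle(X_list, y_list):
    k = X_list[0].shape[1]
    res = minimize(neg_cond_loglik, np.zeros(k), args=(X_list, y_list))
    return res.x                          # b_CMLE
```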

Example – Air quality regulation
Becker and Henderson (2000E) investigated the effects of air quality regulations on firm decisions concerning plant locations and births in four major polluting industries: industrial organic chemicals, metal containers, miscellaneous plastic products and wood furniture. They focused on air quality regulation of ground-level ozone, a major component of smog. With the 1977 amendments to the Clean Air Act, each county in the United States is classified as being either in or out of attainment of the national standards for each of several pollutants, including ozone. This binary variable is the primary variable of interest. Becker and Henderson examined the birth of plants in a county, using attainment status and several control variables, to evaluate hypotheses concerning the location shift from non-attainment to attainment areas.
They used plant and industry data from the Longitudinal Research Database, developed by the US Census Bureau, to derive the number of new physical plants, “births,” by county for T = 6 five-year time periods, from 1963 to 1992. The control variables consist of time dummies, county manufacturing wages and county scale, as measured by all other manufacturing employment in the county. Becker and Henderson employed a conditional likelihood Poisson model; in this way, they could allow for fixed effects that account for time-constant unmeasured county effects. They found that non-attainment status reduces the expected number of births in a county by 25-45 percent, depending on the industry. This was a statistically significant finding that corroborated a main research hypothesis, that there has been a shift in births from non-attainment to attainment counties.

10.6 Bayesian inference

In Section 4.4, Bayesian inference was introduced in the context of the normal linear model. This section provides an introduction in a more general context that is suitable for handling generalized linear models. Only the highlights will be presented; we refer to Gelman et al. (2004G) for a more detailed treatment.


Begin by recalling Bayes' rule,

p(parameters | data) = p(data | parameters) × p(parameters) / p(data),

where
• p(parameters) is the distribution of the parameters, known as the prior distribution.
• p(data | parameters) is the sampling distribution. In a frequentist context, it is used for making inferences about the parameters and is known as the likelihood.
• p(parameters | data) is the distribution of the parameters having observed the data, known as the posterior distribution.
• p(data) is the marginal distribution of the data. It is generally obtained by integrating (or summing) the joint distribution of data and parameters over parameter values. This is often the difficult step in Bayesian inference.

In a regression context, we have two types of data, the response variable y and the set of explanatory variables X. Let θ and ψ denote the sets of parameters that describe the sampling distributions of y and X, respectively. Moreover, assume that θ and ψ are independent. Then, using Bayes' rule, the posterior distribution is

p(θ, ψ | y, X) = p(y, X | θ, ψ) × p(θ, ψ) / p(y, X) = p(y | X, θ, ψ) × p(θ) × p(X | θ, ψ) × p(ψ) / p(y, X)
               ∝ { p(y | X, θ) × p(θ) } × { p(X | ψ) × p(ψ) }
               ∝ p(θ | y, X) × p(ψ | X).

Here, the symbol “∝” means “is proportional to.” Thus, the joint posterior distribution of the parameters can be factored into two pieces, one for the responses and one for the explanatory variables. Assuming no dependencies between θ and ψ, there is no loss of information in the traditional regression setting by ignoring the distributions associated with the explanatory variables. By the “traditional regression setting,” we mean that one essentially treats the explanatory variables as non-stochastic.
Most statistical inference can be accomplished readily having computed the posterior. With this entire distribution, summarizing likely values of the parameters through confidence intervals or unlikely values through hypothesis tests is straightforward. Bayesian methods are also especially suitable for forecasting. In the regression context, suppose we wish to summarize the distribution of a set of new responses, ynew, given new explanatory variables, Xnew, and previously observed data y and X. This distribution, p(ynew | Xnew, y, X), is a type of predictive distribution. We have that

p(ynew | y, X, Xnew) = ∫ p(ynew, θ | y, X, Xnew) dθ = ∫ p(ynew | θ, y, X, Xnew) p(θ | y, X, Xnew) dθ.

This assumes that the parameters θ are continuous. Here, p(ynew | θ, y, X, Xnew) is the sampling distribution of ynew and p(θ | y, X, Xnew) = p(θ | y, X) is the posterior distribution (assuming that values of the new explanatory variables are independent of θ). Thus, the predictive distribution can be computed as a weighted average of the sampling distribution, where the weights are given by the posterior.
A difficult aspect of Bayesian inference can be the assignment of priors. Classical assignments of priors are generally either non-informative or conjugate. “Non-informative” priors are distributions that are designed to interject the least amount of information possible. Two

important types of non-informative priors are uniform (also known as “flat”) priors and the “Jeffreys prior.” A uniform prior is simply a constant value; thus, no value is more likely than any other. A drawback of this type of prior is that it is not invariant under transformation of the parameters. To illustrate, consider the normal linear regression model, so that y | X, β, σ2 ~ N(Xβ, σ2I). A widely used non-informative prior is a flat prior on (β, log σ2), so that the joint distribution of (β, σ2) turns out to be proportional to σ-2. Thus, although uniform in log σ2, this prior gives heavier weight to small values of σ2. A Jeffreys prior is one that is invariant under transformation. Jeffreys priors are complex in the case of multidimensional parameters and thus we will not consider them further here.
Conjugacy of a prior is actually a property that depends on both the prior and the sampling distribution. When the prior and posterior distributions come from the same family of distributions, the prior is known as a conjugate prior. Appendix 10A.3 gives several examples of the more commonly used conjugate priors.
For longitudinal and panel data, it is convenient to formulate Bayesian models in three stages: one for the parameters, one for the data and one stage for the latent variables used to represent the heterogeneity. Thus, extending Section 7.3.1, we have:
Stage 0. (Prior distribution) Draw a realization of a set of parameters from a population. The parameters consist of regression coefficients β and variance components τ.
Stage 1. (Heterogeneity effects distribution) Conditional on the parameters from stage 0, draw a random sample of n subjects from a population. The vector of subject-specific effects αi is associated with the ith subject.
Stage 2. (Conditional sampling distribution) Conditional on αi, β and τ, draw realizations of {yit, zit, xit}, for t = 1, …, Ti for the ith subject. Summarize these draws as {yi, Zi, Xi}.

A common method of analysis is to combine stages 0 and 1 and to treat β* = (β′, α1′, …, αn′)′ as the regression parameters of interest. Also common is to use a normal prior distribution for this set of regression parameters. This is the conjugate prior when the sampling distribution is normal. When the sampling distribution is from the GLM family (but not normal), there is no general recipe for conjugate priors. A normal prior is useful because it is flexible and computationally convenient. To be specific, we consider:
Stage 1*. (Prior distribution) Assume that

β* = (β′, α1′, …, αn′)′ ~ N( µβ*, Σβ* ),

where the prior mean µβ* = (E β′, 0′)′ and the prior variance-covariance matrix Σβ* is block diagonal with upper-left block Σβ and lower-right block D ⊗ In. Thus, both β and (α1, …, αn) are normally distributed.
Stage 2. (Conditional sampling distribution) Conditional on β* and {Z, X}, the {yit} are independent and the distribution of yit is from a generalized linear model (GLM) family with parameter ηit = zit′ αi + xit′ β.

For some types of GLM families (such as the normal), an additional scale parameter is used that is typically included in the prior distribution specification. With this specification, in principle one simply applies Bayes’ rule to determine the posterior distribution of β* given the data {y, X, Z}. However, as a practical matter, this is difficult to do without conjugate priors. Specifically, to compute the marginal distribution of the data, one must use numerical integration to remove parameter distributions; this is computationally intensive for many problems of interest. To circumvent this difficult, modern 10-22 / Chapter 10. Generalized Linear Models

Bayesian analysis regularly employs simulation techniques known as Markov chain Monte Carlo (MCMC) methods and an especially important special case, the Gibbs sampler. MCMC methods produce simulated values of the posterior distribution and are available in many statistical packages, including the shareware favored by many Bayesian analysts, BUGS/WINBUGS (available at http://www.mrc-bsu.cam.ac.uk/bugs/). There are many specialized treatments that discuss the theory and applications of this approach; we refer the reader to Gelman et al. (2004G), Gill (2002G) and Congdon (2003G). For some applications, the interest is in the full joint posterior distribution of the global regression parameters and the heterogeneity effects, β, α1, …, αn. For other applications, the interest is in the posterior distribution of the global regression coefficients, β, alone. In this case, one integrates out the heterogeneity effects from the joint posterior distribution of β*.
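To convey the flavor of MCMC without reference to any particular package, here is a minimal random-walk Metropolis sampler for the posterior of a single logistic regression coefficient under a flat prior. The data, the proposal scale of 0.3 and the burn-in length are all hypothetical choices, not recommendations.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: binary responses with a single explanatory variable.
x = rng.normal(size=200)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.8 * x)))

def log_posterior(beta):
    # Flat prior, so the log posterior equals the Bernoulli log likelihood.
    eta = beta * x
    return np.sum(y * eta - np.log1p(np.exp(eta)))

beta, draws = 0.0, []
for _ in range(5000):
    proposal = beta + 0.3 * rng.normal()       # random-walk proposal
    if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(beta):
        beta = proposal                        # Metropolis accept step
    draws.append(beta)

print(np.mean(draws[1000:]))                   # posterior mean after burn-in
```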

Example – Respiratory infections
Zeger and Karim (1991S) introduced Bayesian inference for GLM longitudinal data. Specifically, Zeger and Karim considered the stage 2 GLM conditional sampling distribution and, for the heterogeneity distribution, assumed that the αi are conditionally i.i.d. N(0, D). They allowed for a general form for the prior distribution. To illustrate this set-up, Zeger and Karim examined infectious disease data on n = 250 Indonesian children who were observed for up to T = 6 consecutive quarters, for a total of N = 1,200 observations. The goal was to assess determinants of a binary variable that indicates the presence of a respiratory disease. They used normally distributed random intercepts (q = 1 and zit = 1) with a logistic conditional sampling distribution.

Example – Patents
Chib, Greenberg and Winkelman (1998E) discussed parameterizations of Bayesian Poisson models. They considered a conditional Poisson model with a canonical link, so that yit | αi, β ~ Poisson(θit), where θit = E(yit | αi, β) = exp(zit′αi + xit′β). Unlike prior work, they did not assume that the heterogeneity effects have zero mean but instead used αi ~ N(ηα, D). With this specification, they did not use the usual convention of assuming that zit is a subset of xit. This, they argued, leads to more stable algorithms for computing posterior distributions. To complete the specification, Chib et al. used the prior distributions β ~ N(µβ, Σβ), ηα ~ N(η0, Ση) and D-1 ~ Wishart.

The Wishart distribution is a multivariate extension of the chi-square distribution; see, for example, Anderson (1958G). To illustrate, Chib et al. used patent data first considered by Hausman, Hall and Griliches (1984E). These data include the number of patents received by n = 642 firms over T = 5 years, 1975-1979. The explanatory variables included the logarithm of research and development (R&D) expenditures, as well as its 1, 2 and 3 year lags, and time dummy variables. Chib et al. used a variable intercept and a variable slope for the logarithmic R&D expenditures.


Further reading
McCullagh and Nelder (1989G) provide a more extensive introduction to (homogeneous) generalized linear models. Conditional likelihood testing in connection with exponential families was suggested by Rasch (1961EP) and asymptotic properties were developed by Andersen (1970S), motivated in part by the “presence of infinitely many nuisance parameters” of Neyman and Scott (1948E). Panel data conditional likelihood estimation was introduced by Chamberlain (1980E) for binary logit models and by Hausman, Hall and Griliches (1984E) for Poisson (as well as negative binomial) count models. Three excellent sources for further discussions of nonlinear mixed effects models are Davidian and Giltinan (1995S), Vonesh and Chinchilli (1997S) and Pinheiro and Bates (2000S). For additional discussions on computing aspects, we refer to McCulloch and Searle (2001G) and Pinheiro and Bates (2000S).



Appendix 10A. Exponential families of distributions

The distribution of the random variable y may be discrete, continuous or a mixture. Thus, p(.) in equation (10.1) may be interpreted to be a density or mass function, depending on the application. Table 10A.1 provides several examples, including the normal, binomial and Poisson distributions.

Table 10A.1 Selected distributions of the one-parameter exponential family

General: parameters θ, φ; density or mass function p(y, θ, φ) = exp[(yθ − b(θ))/φ + S(y, φ)]; mean E y = b′(θ); variance Var y = φ b″(θ).

Normal: parameters µ, σ2; density f(y) = (σ√(2π))−1 exp[−(y − µ)2/(2σ2)]; components θ = µ, φ = σ2, b(θ) = θ2/2, S(y, φ) = −[y2/(2φ) + ln(2πφ)/2]; mean b′(θ) = θ = µ; variance φ b″(θ) = φ = σ2.

Binomial: parameter π (n known); mass function (n choose y) πy (1 − π)n−y; components θ = ln[π/(1 − π)], φ = 1, b(θ) = n ln(1 + eθ), S(y, φ) = ln(n choose y); mean b′(θ) = n eθ/(1 + eθ) = nπ; variance φ b″(θ) = n eθ/(1 + eθ)2 = nπ(1 − π).

Poisson: parameter λ; mass function λy exp(−λ)/y!; components θ = ln λ, φ = 1, b(θ) = eθ, S(y, φ) = −ln(y!); mean b′(θ) = eθ = λ; variance φ b″(θ) = eθ = λ.

Gamma: parameters α, β; density [βα/Γ(α)] yα−1 exp(−yβ); components θ = −β/α, φ = 1/α, b(θ) = −ln(−θ), S(y, φ) = (φ−1 − 1) ln y − φ−1 ln φ − ln Γ(φ−1); mean b′(θ) = −1/θ = α/β; variance φ b″(θ) = φ/θ2 = α/β2.

10A.1 Moment Generating Function To assess the moments of exponential families, it is convenient to work with the moment generating function. For simplicity, we assume that the random variable y is continuous. Define the moment generating function

$$
\begin{aligned}
M(s) = \operatorname{E} e^{sy} &= \int \exp\left( sy + \frac{y\theta - b(\theta)}{\phi} + S(y,\phi) \right) dy \\
&= \exp\left( \frac{b(\theta + s\phi) - b(\theta)}{\phi} \right) \int \exp\left( \frac{y(\theta + s\phi) - b(\theta + s\phi)}{\phi} + S(y,\phi) \right) dy \\
&= \exp\left( \frac{b(\theta + s\phi) - b(\theta)}{\phi} \right) \int \exp\left( \frac{y\theta^* - b(\theta^*)}{\phi} + S(y,\phi) \right) dy
= \exp\left( \frac{b(\theta + s\phi) - b(\theta)}{\phi} \right),
\end{aligned}
$$
where θ* = θ + sφ; the integral on the right equals one because the integrand is a density of the form (10.1).

With this expression, we can generate the moments. Thus, for the mean, we have
$$
\operatorname{E} y = M'(0)
= \left. \frac{\partial}{\partial s} \exp\left( \frac{b(\theta + s\phi) - b(\theta)}{\phi} \right) \right|_{s=0}
= \left. b'(\theta + s\phi) \exp\left( \frac{b(\theta + s\phi) - b(\theta)}{\phi} \right) \right|_{s=0}
= b'(\theta).
$$

Similarly, for the second moment, we have

$$
\begin{aligned}
M''(s) &= \frac{\partial}{\partial s} \left[ b'(\theta + s\phi) \exp\left( \frac{b(\theta + s\phi) - b(\theta)}{\phi} \right) \right] \\
&= \phi\, b''(\theta + s\phi) \exp\left( \frac{b(\theta + s\phi) - b(\theta)}{\phi} \right)
+ \left( b'(\theta + s\phi) \right)^2 \exp\left( \frac{b(\theta + s\phi) - b(\theta)}{\phi} \right).
\end{aligned}
$$

This yields $\operatorname{E} y^2 = M''(0) = \phi\, b''(\theta) + \left( b'(\theta) \right)^2$ and hence $\operatorname{Var} y = \operatorname{E} y^2 - \left( \operatorname{E} y \right)^2 = \phi\, b''(\theta)$.
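These moment formulas are easy to verify numerically. The following sketch checks them for the gamma family using the Table 10A.1 components; the parameter values are arbitrary.

```python
import numpy as np

# Check E y = b'(theta) and Var y = phi * b''(theta) for the gamma family,
# where b(theta) = -ln(-theta), theta = -beta/alpha and phi = 1/alpha.
alpha_, beta_ = 3.0, 2.0
theta, phi = -beta_ / alpha_, 1.0 / alpha_

b1 = -1.0 / theta            # b'(theta), which should equal alpha/beta
b2 = 1.0 / theta**2          # b''(theta)

rng = np.random.default_rng(3)
y = rng.gamma(shape=alpha_, scale=1.0 / beta_, size=1_000_000)

print(b1, y.mean())          # both approximately 1.5
print(phi * b2, y.var())     # both approximately 0.75
```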

10A.2 Sufficiency In complex situations, it is convenient to be able to decompose the likelihood into several pieces that can be analyzed separately. To accomplish this, the concept of sufficiency is useful. A statistic T(y1, …, yn) = T(y) is sufficient for a parameter θ if the distribution of y1, …, yn conditional on T(y) does not depend on θ. When checking whether or not a statistic is sufficient, an important result is the factorization theorem. This result indicates, under certain regularity conditions, that a statistic T(y) is sufficient for θ if and only if the density (mass) function of y can be decomposed into the product of two components

p(y, θ) = p1(T(y), θ) p2(y). (10A.1)

Here, the first portion, p1, may depend on θ but depends on the data only through the sufficient statistic T(y). The second portion, p2, may depend on the data but does not depend on the parameter θ. See, for example, Bickel and Doksum (1977G). To illustrate, if {y1, …, yn} are independent and follow the distribution in equation (10.1), then the joint distribution is
$$
p(\mathbf{y}, \theta, \phi) = \exp\left( \frac{\theta \sum_{i=1}^n y_i - n\, b(\theta)}{\phi} + \sum_{i=1}^n S(y_i, \phi) \right).
$$
Thus, with $p_2(\mathbf{y}) = \exp\left( \sum_{i=1}^n S(y_i, \phi) \right)$ and $p_1(T(\mathbf{y}), \theta) = \exp\left( \frac{\theta\, T(\mathbf{y}) - n\, b(\theta)}{\phi} \right)$, the statistic $T(\mathbf{y}) = \sum_{i=1}^n y_i$ is sufficient for θ.
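For instance, in the Poisson case (θ = ln λ, φ = 1), the factorization is immediate:
$$
\prod_{i=1}^n \frac{\lambda^{y_i} e^{-\lambda}}{y_i!}
= \underbrace{\lambda^{\sum_{i=1}^n y_i}\, e^{-n\lambda}}_{p_1(T(\mathbf{y}),\, \lambda)}
\times \underbrace{\prod_{i=1}^n \frac{1}{y_i!}}_{p_2(\mathbf{y})},
$$
so that the sum of the observations is again sufficient.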

10A.3 Conjugate Distributions Assume that the parameter θ is random with distribution π(θ, τ), where τ is a vector of parameters that describe the distribution of θ. In Bayesian models, the distribution π is known as the prior and reflects our belief or information about θ. The likelihood p(y, θ) = p(y |θ) is a probability conditional on θ. The distribution of θ with knowledge of the random variables, π(θ, τ| y), is called the posterior distribution. For a given likelihood distribution, priors and posteriors that come from the same parametric family are known as conjugate families of distributions. For a linear exponential likelihood, there exists a natural conjugate family. For the likelihood in equation (10.1), define the prior distribution

π(θ, τ) = C exp (θ a1(τ) - b(θ) a2(τ) ), (10A.2)

where C is a normalizing constant. Here, a1(τ) and a2(τ) are functions of the parameters τ. The joint distribution of y and θ is given by p(y, θ) = p(y | θ) π(θ, τ). Using Bayes’ theorem, the posterior distribution is

$$
\pi(\theta, \boldsymbol\tau \mid y) = C_1 \exp\left( \theta \left( a_1(\boldsymbol\tau) + \frac{y}{\phi} \right) - b(\theta) \left( a_2(\boldsymbol\tau) + \frac{1}{\phi} \right) \right),
$$
where C1 is a normalizing constant. Thus, we see that π(θ, τ | y) has the same form as π(θ, τ).

Special case 10A.1 Normal-Normal Model
Consider a normal likelihood in equation (10.1), so that b(θ) = θ2/2. Thus, with equation (10A.2), we have
$$
\pi(\theta, \boldsymbol\tau) = C \exp\left( a_1(\boldsymbol\tau)\,\theta - a_2(\boldsymbol\tau)\,\frac{\theta^2}{2} \right)
= C_1(\boldsymbol\tau) \exp\left( -\frac{a_2(\boldsymbol\tau)}{2} \left( \theta - \frac{a_1(\boldsymbol\tau)}{a_2(\boldsymbol\tau)} \right)^2 \right).
$$
Thus, the prior distribution of θ is normal with mean a1(τ)/a2(τ) and variance (a2(τ))-1. The posterior distribution of θ given y is normal with mean (a1(τ) + y/φ)/(a2(τ) + φ-1) and variance (a2(τ) + φ-1)-1.
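To connect this with the familiar normal updating formulas, write the prior as θ ~ N(µ0, σ02), so that a1(τ) = µ0/σ02 and a2(τ) = 1/σ02; with φ = σ2, the posterior mean becomes
$$
\frac{a_1(\boldsymbol\tau) + y/\phi}{a_2(\boldsymbol\tau) + \phi^{-1}} = \frac{\mu_0/\sigma_0^2 + y/\sigma^2}{1/\sigma_0^2 + 1/\sigma^2},
$$
the usual precision-weighted average of the prior mean and the observation.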

Special case 10A.2 Binomial-Beta Model
Consider a binomial likelihood in equation (10.1), so that b(θ) = n ln(1 + eθ). Thus, with equation (10A.2), we have
$$
\pi(\theta, \boldsymbol\tau) = C \exp\left( a_1(\boldsymbol\tau)\,\theta - n\, a_2(\boldsymbol\tau) \ln\left( 1 + e^\theta \right) \right)
= C \left( \frac{e^\theta}{1 + e^\theta} \right)^{a_1(\boldsymbol\tau)} \left( 1 + e^\theta \right)^{-\left( n a_2(\boldsymbol\tau) - a_1(\boldsymbol\tau) \right)}.
$$
Thus, the prior of the binomial parameter eθ/(1 + eθ), the inverse logit of θ, is a beta distribution with parameters a1(τ) and n a2(τ) − a1(τ). The posterior of eθ/(1 + eθ) is a beta distribution with parameters a1(τ) + y/φ and n(a2(τ) + φ-1) − (a1(τ) + y/φ).

Special case 10A.3 Poisson-Gamma Model
Consider a Poisson likelihood in equation (10.1), so that b(θ) = eθ. Thus, with equation (10A.2), we have
$$
\pi(\theta, \boldsymbol\tau) = C \exp\left( a_1(\boldsymbol\tau)\,\theta - a_2(\boldsymbol\tau)\, e^\theta \right)
= C \left( e^\theta \right)^{a_1(\boldsymbol\tau)} \exp\left( -a_2(\boldsymbol\tau)\, e^\theta \right).
$$
Thus, the prior of eθ is a Gamma distribution with parameters a1(τ) and a2(τ). The posterior of eθ is a Gamma distribution with parameters a1(τ) + y/φ and a2(τ) + φ-1.
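In the usual shape/rate notation, with φ = 1 (the Poisson case in Table 10A.1) and λ = eθ, this is the familiar conjugate update for Poisson data:
$$
\lambda \sim \text{Gamma}(a_1,\, a_2) \quad \Longrightarrow \quad \lambda \mid y \sim \text{Gamma}(a_1 + y,\; a_2 + 1),
$$
and, for n independent Poisson observations, λ | y1, …, yn ~ Gamma(a1 + Σ yi, a2 + n).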

10A.4 Marginal Distributions
In many conjugate families, the marginal distribution of the random variable y turns out to be from a well-known parametric family. To see this, consider the prior distribution in equation (10A.2). Because densities integrate (sum, in the discrete case) to one, we may express the constant C as
$$
C^{-1} = C(a_1, a_2)^{-1} = \int \exp\left( \theta a_1 - b(\theta) a_2 \right) d\theta,
$$
a function of the parameters a1 = a1(τ) and a2 = a2(τ). With this and equation (10A.2), the marginal distribution of y is
$$
g(y) = \int p(y, \theta)\, \pi(\theta)\, d\theta
= C(a_1, a_2) \exp\left( S(y, \phi) \right) \int \exp\left( \theta \left( \frac{y}{\phi} + a_1 \right) - b(\theta) \left( \frac{1}{\phi} + a_2 \right) \right) d\theta
= \frac{C(a_1, a_2) \exp\left( S(y, \phi) \right)}{C\!\left( a_1 + \frac{y}{\phi},\; a_2 + \frac{1}{\phi} \right)}. \tag{10A.3}
$$
It is of interest to consider several special cases.

Special case 10A.1 Normal-Normal Model - Continued
Using Table 10A.1, we have $\exp\left( S(y,\phi) \right) = \frac{1}{\sqrt{2\pi\phi}} \exp\left( -\frac{y^2}{2\phi} \right)$. Straightforward calculations show that
$$
C(a_1, a_2) = \sqrt{\frac{a_2}{2\pi}} \exp\left( -\frac{a_1^2}{2 a_2} \right).
$$
Thus, from equation (10A.3), the marginal distribution is
$$
g(y) = \frac{ \sqrt{a_2/(2\pi)}\, \exp\left( -a_1^2/(2a_2) \right) \cdot (2\pi\phi)^{-1/2} \exp\left( -y^2/(2\phi) \right) }
{ \sqrt{\left( a_2 + \phi^{-1} \right)/(2\pi)}\, \exp\left( -\left( a_1 + y/\phi \right)^2 / \left( 2 \left( a_2 + \phi^{-1} \right) \right) \right) }
= \frac{1}{\sqrt{2\pi\left( \phi + a_2^{-1} \right)}} \exp\left( -\frac{\left( y - a_1/a_2 \right)^2}{2\left( \phi + a_2^{-1} \right)} \right).
$$
Thus, y is normally distributed with mean a1/a2 and variance φ + a2-1.

Special case 10A.2 Binomial-Beta Model - Continued
Using Table 10A.1, we have $\exp\left( S(y,\phi) \right) = \binom{n}{y}$ and φ = 1. Further, straightforward calculations show that
$$
C(a_1, a_2) = \frac{\Gamma(n a_2)}{\Gamma(a_1)\, \Gamma(n a_2 - a_1)} = \frac{1}{B\left( a_1,\, n a_2 - a_1 \right)},
$$
where B(·,·) is called the beta function. Thus, from equation (10A.3), we have
$$
g(y) = \binom{n}{y} \frac{B\!\left( a_1 + y,\; n(a_2 + 1) - (a_1 + y) \right)}{B\!\left( a_1,\; n a_2 - a_1 \right)}.
$$

This is called the beta-binomial distribution.

Special case 10A.3 Poisson-Gamma Model - Continued
Using Table 10A.1, we have exp(S(y, φ)) = 1/y! = 1/Γ(y + 1) and φ = 1. Further, straightforward calculations show that $C(a_1, a_2) = \frac{(a_2)^{a_1}}{\Gamma(a_1)}$. Thus, from equation (10A.3), we have
$$
g(y) = \frac{ (a_2)^{a_1} / \left( \Gamma(a_1)\, \Gamma(y + 1) \right) }{ (a_2 + 1)^{a_1 + y} / \Gamma(a_1 + y) }
= \frac{\Gamma(a_1 + y)}{\Gamma(a_1)\, \Gamma(y + 1)} \frac{(a_2)^{a_1}}{(a_2 + 1)^{a_1 + y}}
= \frac{\Gamma(a_1 + y)}{\Gamma(a_1)\, \Gamma(y + 1)} \left( \frac{a_2}{a_2 + 1} \right)^{a_1} \left( \frac{1}{a_2 + 1} \right)^{y}.
$$
This is a negative binomial distribution.
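A quick simulation confirms this marginal. The sketch below draws λ from the gamma prior, then y from the conditional Poisson distribution, and compares the empirical frequencies with the negative binomial mass function with size a1 and success probability a2/(a2 + 1); the values of a1 and a2 are arbitrary.

```python
import numpy as np
from scipy.stats import nbinom

# Marginal of y when lambda ~ Gamma(shape a1, rate a2) and
# y | lambda ~ Poisson(lambda): negative binomial (a1, a2/(a2 + 1)).
a1, a2 = 2.0, 1.5
rng = np.random.default_rng(4)

lam = rng.gamma(shape=a1, scale=1.0 / a2, size=1_000_000)
y = rng.poisson(lam)

for k in range(5):
    print(k, (y == k).mean(), nbinom.pmf(k, a1, a2 / (a2 + 1)))
```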

10. Exercises and Extensions

Poisson and Binomials
10.1 Consider yi that has a Poisson distribution with mean parameter µi, i = 1, 2, and suppose that y1 is independent of y2.
a. Show that y1 + y2 has a Poisson distribution with mean parameter µ1 + µ2.
b. Show that y1 given y1 + y2 has a Binomial distribution. Determine the parameters of the Binomial distribution.
c. Suppose that you tell the computer to evaluate the likelihood of (y1, y2) assuming that y1 is independent of y2 and that each has a Poisson distribution, with mean parameters µ1 and µ2 = 1 − µ1.
 i. Evaluate the likelihood of (y1, y2).
 ii. However, the data are such that y1 is binary and y2 = 1 − y1. Evaluate the likelihood that the computer is evaluating.
 iii. Show that this equals the likelihood assuming y1 has a Bernoulli distribution with mean µ1, up to a proportionality constant. Determine the constant.

d. Consider a longitudinal data set {yit, xit} where yit are binary with mean pit = π(xit′ β), where π(.) is known. You would like to evaluate the likelihood but can only do so in terms of a package using Poisson distributions. Using the part (c) result, explain how to do so.