Flexible Modelling of Serial Correlation in GLMM

Mariangela Sciandra
Dipartimento di Scienze Statistiche e Matematiche “Silvio Vianelli”, Università degli Studi di Palermo
e-mail: [email protected]

Keywords: longitudinal data, GLMM, serial correlation, fractional polynomials

1. Introduction

Mixed models have rapidly become a very useful tool for modelling longitudinal data. Thanks to their flexibility, they are particularly suited to handling complex situations, especially when extensions such as Generalized Linear or Nonlinear Mixed Models are used. However, existing work on Generalized Linear Mixed Models (GLMMs) assumes that the correlation between measurements on the same observational unit is modelled only by the random effects, while conditionally upon them the repeated measurements are assumed to be independent.

Yet part of an individual's observed profile may be a response to time-varying stochastic processes operating within that unit. This type of random variation results in a correlation between pairs of measurements on the same subject, called serial correlation, which is usually a decreasing function of the time separation between the measurements. So, in the presence of serially correlated observations, the covariance structure should be specified through two components simultaneously: the covariance induced by the random effects, which is typically non-stationary, and a “residual” component which accounts for deviations of the observations from the individual profiles. Standard GLMMs are not able to deal with serially correlated observations: by assuming conditional independence they ignore correlation at the residual level, which results in invalid inferences for the parameters in the mean structure of the model.
In this work we propose a general approach that extends to GLMMs the procedure proposed by Lesaffre et al. (2000) in the context of Linear Mixed Models (LMMs). The GLMM covariance structure is modelled flexibly as a combination of a parametric model for the random-effects part and a flexible model for the serial correlation component based on Fractional Polynomials.

2. GLMMs for serially correlated observations

Let $Y_{ij}$ be the $j$th measurement for subject $i$, $i = 1,\dots,m$, $j = 1,\dots,n_i$, and let $Y_i$ be the $n_i$-dimensional vector of all measurements taken on subject $i$. A GLMM assumes that, conditionally on the $q$-dimensional random effects $b_i$, the elements $Y_{ij}$ of $Y_i$ are independent. The basis for the development of a Generalized Linear Mixed Model with autocorrelation is the decomposition

$$Y_{ij} = \mu_{ij} + \epsilon_{ij} = h(x_{ij}^T \beta + z_{ij}^T b_i) + \epsilon_{ij}$$

where $h(\cdot)$ is the inverse of the link function $g(\cdot)$, and the error terms have an appropriate distribution with variance $Var(Y_{ij} \mid b_i) = \phi\, v(\mu_{ij})$, with $v(\cdot)$ the usual variance function in the exponential family.

Using a Penalized Quasi-Likelihood (PQL) approach to estimation, the GLMM is fitted by defining at each step a working variate $Y_i^*$ as a Taylor expansion of the response around the conditional mean, evaluated at the current estimates $\hat\beta$ and $\hat b_i$ of the fixed and random effects, respectively. The working variate $Y_i^*$ of the specified GLMM is known (Schall, 1991) to be

$$Y_i^* \equiv g(\hat\mu_i) + g'(\hat\mu_i)(Y_i - \hat\mu_i) = X_i \hat\beta + Z_i \hat b_i + g'(\hat\mu_i)(Y_i - \hat\mu_i) \approx X_i \beta + Z_i b_i + \epsilon_i^*$$

where $\epsilon_i^* = g'(\hat\mu_i)\,\epsilon_i$, which also has mean zero for all $i = 1, 2, \dots, m$. Then $E[Y^* \mid b] = X\beta + Zb$ and $var(Y^* \mid b) = \Delta \Sigma \Delta^T$, where $\Delta$ is the diagonal matrix with diagonal entries $g'(\hat\mu_i)$ and $\Sigma = var(Y - \hat\mu)$.
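The working variate defined above can be illustrated numerically. The following is a minimal sketch, assuming a logit link (so that $g'(\mu) = 1/[\mu(1-\mu)]$); the function name `working_variate` and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np

def working_variate(y, X, Z, beta_hat, b_hat):
    """Return y*, mu_hat and g'(mu_hat) for one subject.

    Assumes a logit link; y* = eta_hat + g'(mu_hat) * (y - mu_hat).
    """
    eta_hat = X @ beta_hat + Z @ b_hat            # linear predictor at current estimates
    mu_hat = 1.0 / (1.0 + np.exp(-eta_hat))       # h(eta): inverse logit
    g_prime = 1.0 / (mu_hat * (1.0 - mu_hat))     # derivative of the logit link
    y_star = eta_hat + g_prime * (y - mu_hat)     # first-order Taylor expansion
    return y_star, mu_hat, g_prime

# Hypothetical subject: four binary responses, intercept and time as
# fixed effects, a random intercept, and all current estimates at zero.
y = np.array([1.0, 0.0, 1.0, 0.0])
X = np.column_stack([np.ones(4), np.arange(4.0)])
Z = np.ones((4, 1))
beta_hat = np.zeros(2)
b_hat = np.zeros(1)

y_star, mu_hat, g_prime = working_variate(y, X, Z, beta_hat, b_hat)
Delta = np.diag(g_prime)   # diagonal matrix with entries g'(mu_hat)
```

In an iterative scheme of the kind described by Schall (1991), `y_star` would be passed to an LMM fit, and the resulting estimates would be used to recompute it at the next step.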
We can express the variance $\Sigma$ so that

$$Var(\epsilon_i) = \Phi^{1/2} A_i^{1/2} H_i A_i^{1/2} \Phi^{1/2} \qquad (1)$$

where $\Phi$ is a diagonal matrix with the overdispersion parameters along the diagonal, $H_i$ is the correlation matrix and $A_i$ is a diagonal matrix containing the variances following from the generalized linear model specification of $Y_{ij}$ given the random effects $b_i$. Using equation (1) we have the following expression for the marginal variance-covariance matrix:

$$var(Y^*) = V = Z D Z^T + \Phi^{1/2} A_i^{1/2} H_i A_i^{1/2} \Phi^{1/2}$$

where $D$ denotes the covariance matrix of the random effects. That is, the working variate $Y^*$ takes the form of a weighted LMM with diagonal weight matrix $\hat W = A_{\hat\mu}^{-1} [g'(\hat\mu)]^{-2}$. An iterative algorithm like that proposed by Schall (1991), in which an LMM is fitted to get estimates of $\beta$ and $b$, is then used.

This estimation algorithm allows us either to introduce autocorrelation at the level of the linear predictor, by modelling the pseudo-variable $Y_i^*$ at each step through an LMM, or to define an autocorrelation function on the residuals evaluated at each step of the iterative process. In this way we can take advantage of the well-established correlation structures for LMMs. In particular, our proposal consists in modelling the serial correlation part using Fractional Polynomials (Royston and Altman, 1994), which allow the autocorrelation function $g(\cdot)$ in $H_i$ (here $g$ denotes the serial correlation function, not the link) to have a parametric form flexible enough to assume several shapes.

This approach is very general and applies to all link functions. Yet it is important to underline that the estimate of the autocorrelation parameter obtained in this way has to be considered only as an approximation of the underlying correlation structure of the observed data, given by the correlation of its working variate.

References

Lesaffre E., Todem D., Verbeke G. (2000) Flexible modelling of the covariance matrix in a linear random effects model, Biometrical Journal, 42, 807–822.
Royston P., Altman D. G. (1994) Regression using fractional polynomials of continuous covariates: parsimonious parametric modelling, Applied Statistics, 43, 429–468.
Schall R. (1991) Estimation in generalized linear models with random effects, Biometrika, 78, 719–727.
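
As a numerical illustration of equation (1) and of the marginal covariance of the working variate, the sketch below builds $H_i$ from a serial correlation function of the time lag and assembles $V$ for a single subject. It rests on assumptions that are not stated in the paper: the correlation is taken to be $\exp(-\varphi u^{p})$ with a single fractional-polynomial power $p = 0.5$ and $\varphi > 0$ (a choice that yields a valid correlation matrix), the response is binary with a logit link, and all names and numerical values are hypothetical.

```python
import numpy as np

def fp_serial_correlation(times, phi, power):
    """Correlation matrix H with entries exp(-phi * |t_j - t_k|**power).

    Assumed form: a first-degree fractional polynomial in the lag on the
    log-correlation scale, with phi > 0 and 0 < power <= 2.
    """
    lags = np.abs(times[:, None] - times[None, :])
    return np.exp(-phi * lags ** power)

def marginal_covariance(Z, D, a_var, H, disp):
    """V = Z D Z^T + Phi^(1/2) A^(1/2) H A^(1/2) Phi^(1/2) for one subject."""
    s = np.sqrt(disp * a_var)                  # diagonal of Phi^(1/2) A^(1/2)
    return Z @ D @ Z.T + (s[:, None] * H) * s[None, :]

# Hypothetical subject observed at four unequally spaced times.
times = np.array([0.0, 1.0, 2.0, 4.0])
Z = np.column_stack([np.ones(4), times])       # random intercept and slope
D = np.array([[1.0, 0.1],
              [0.1, 0.5]])                     # random-effects covariance
mu_hat = np.array([0.3, 0.4, 0.5, 0.6])        # current fitted means
a_var = mu_hat * (1.0 - mu_hat)                # binomial variance function
disp = np.ones(4)                              # no overdispersion

H = fp_serial_correlation(times, phi=0.8, power=0.5)
V = marginal_covariance(Z, D, a_var, H, disp)

# PQL weights W = A^(-1) [g'(mu)]^(-2); for the logit link this is mu(1 - mu).
g_prime = 1.0 / (mu_hat * (1.0 - mu_hat))
W = np.diag(1.0 / (a_var * g_prime ** 2))
```

Other powers from the usual fractional-polynomial set could be tried in the same way, but not every choice gives a positive definite $H$, so the result should be checked (for example through its eigenvalues) before it is used in a fit.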
