

Short Course — Applied Linear and Nonlinear Mixed Models*

Introduction

Mixed-effect models (or simply, “mixed models”) are like classical (“fixed-effects”) statistical models, except that some of the parameters describing group effects or covariate effects are replaced by random variables, or random effects.

• Thus, the model has both parameters, also known as “fixed effects”, and random effects; hence the model has mixed effects.

• Random effects can be thought of as random versions of parameters. So, in some sense, a mixed model has both fixed and random parameters.

– This can be a useful way to think about it, but it’s really not quite right and can lead to confusion.

– The word “parameter” means a fixed, unknown constant, so it is really something of an oxymoron to say “random parameter.”

– As we’ll see, the distinction between a parameter and a random effect goes well beyond vocabulary.

* Temple-Inland Forest Products, Inc., Jan. 17–18, 2005

Random effects arise when the observations being analyzed are heterogeneous, and can be thought of as belonging to several groups or clusters.

• This happens when there is one observation per experimental unit (tree, patient, plot, animal) and the experimental units occur or are measured in different locations, at different time points, from different sires or genetic strains, etc.

• Random effects also often arise when repeated measurements of each experimental unit are taken.

– E.g., several observations are taken through time of the height of 100 trees. The repeated height measurements are grouped or clustered by tree.

The use of random effects in linear models leads to linear mixed models (LMMs).

• LMMs are not new. Some of the simplest and most familiar examples of this class are very old.

• However, until recently, software and statistical methods for inference were not well-developed enough to handle the general case.

– Thus, only recently has the full flexibility and power of this class of models been realized.

Some Simple LMMs:

The one-way random effects model — Railway Rails:

(See Pinheiro and Bates, §1.1.) The data displayed below are from an experiment conducted to measure longitudinal (lengthwise) stress in railway rails. Six rails were chosen at random and tested three times each by measuring the time it took for a certain type of ultrasonic wave to travel the length of the rail.

[Figure: dotplot of zero-force travel time (nanoseconds, roughly 40–100) by rail; rails ordered by mean travel time: 2, 5, 1, 6, 3, 4.]

Clearly, these data are grouped, or clustered, by rail. This clustering has two closely related implications:

1. (within-cluster correlation) we should expect that observations from the same rail will be more similar to one another than observations from different rails; and

2. (between-cluster heterogeneity) we should expect that the response will vary from rail to rail in addition to varying from one measurement to the next.

• These ideas are really flip-sides of the same coin.

Although it is fairly obvious that clustering by rail must be incorporated in the modeling of these data somehow, we first consider a naive approach.

The primary interest here is in measuring the mean travel time. Therefore, we might naively consider the model

y_ij = µ + e_ij, i = 1, ..., 6, j = 1, ..., 3,

where y_ij is the travel time for the j-th trial on the i-th rail, and we assume e_11, ..., e_63 are iid N(0, σ²).

• Here, the notation “iid ∼ N(0, σ²)” means “are independent, identically distributed random variables, each with a normal distribution with mean 0 and (constant) variance σ².”

• In addition, µ is the mean travel time, which we wish to estimate. Its maximum likelihood (ML)/ordinary least squares (OLS) estimate is the grand sample mean of all observations in the data set: ȳ·· = 66.5.

• The mean square error (MSE) is s² = 23.645², which estimates the error variance σ².

However, an examination of the residuals from this model, plotted separately by rail, reveals the inadequacy of the model:

[Figure: boxplots of raw residuals (roughly −40 to 20) by rail for the simple mean model; rails ordered 2, 5, 1, 6, 3, 4.]

Clearly, the mean response is changing from rail to rail. Therefore, we consider a one-way ANOVA model:

y_ij = µ + α_i + e_ij. (*)

Here, µ is a grand mean across the rails included in the experiment, and α_i is an effect up or down from the grand mean specific to the i-th rail.

Alternatively, we could define µ_i = µ + α_i as the mean response for the i-th rail and reparameterize this model as

y_ij = µ_i + e_ij.

The OLS estimates of the parameters of this model are µ̂_i = ȳ_i·, giving (µ̂_1, ..., µ̂_6) = (54.00, 31.67, 84.67, 96.00, 50.00, 82.67) and s² = 4.02². The residual plot looks much better:

[Figure: boxplots of raw residuals (roughly −6 to 6) by rail for the one-way fixed effects model; rails ordered 2, 5, 1, 6, 3, 4.]

However, there are still drawbacks to this one-way fixed effects model:

• It only models the specific sample of rails used in the experiment, while the main interest is in the population of rails from which these rails were drawn.

• It does not produce an estimate of the rail-to-rail variability in travel time, which is a quantity of significant interest in the study.

• The number of parameters increases linearly with the number of rails used in the experiment.

These deficiencies are overcome by the one-way random effects model.

To motivate this model, consider again the one-way fixed effects model. Model (*) can be written as

y_ij = µ + (µ_i − µ) + e_ij,

where, under the usual constraint Σ_i α_i = 0, (µ_i − µ) = α_i has mean 0 when averaged over the groups (rails).

The one-way random effects model replaces the fixed parameter (µ_i − µ) with a random effect b_i, a random quantity specific to the i-th rail, which is assumed to have mean 0 and an unknown variance σ_b². This yields the model

y_ij = µ + b_i + e_ij, (**)

where b_1, ..., b_6 are independent random variables, each with mean 0 and variance σ_b². Often the b_i's are assumed normal, and they are usually assumed independent of the e_ij's. Thus we have

iid 2 iid 2 b1,...,ba ∼ N(0,σb ), independent of e11 ...,ean ∼ N(0,σ ), where a is the number of rails, n the number of observations on the ith rail.

• Note that now the interpretation of µ changes from the mean over the 6 rails included in the experiment (fixed effects model) to the mean over the population of all rails from which the six rails were sampled.

• In addition, we don’t estimate µ_i, the mean response for rail i, which is not of interest. Instead we estimate the population mean µ and the variance from rail to rail in the population, σ_b².

– That is, our scope of inference has changed from the six rails included in the study to the population of rails from which those six rails were drawn.

In addition:

• we can estimate rail-to-rail variability σ_b²; and

• the number of parameters no longer increases with the number of rails tested in the experiment.

– The parameters in the fixed-effect model were the grand mean µ, the rail-specific effects α_1, ..., α_a, and the error variance σ².

– In the random effects model, the only parameters are µ, σ², and σ_b².

σ_b² quantifies heterogeneity from rail to rail, which is one consequence of having observations that are grouped or clustered by rail, but what about within-rail correlation?

Unlike a purely fixed-effect model, the one-way random effects model does not assume that all of the responses are independent. Instead, it implies that observations that share the same random effect are correlated.

• E.g., for two observations from the i-th rail, y_i1 and y_i3, say, the model implies y_i1 = µ + b_i + e_i1 and y_i3 = µ + b_i + e_i3.

That is, yi1 and yi3 share the random effect bi, and are therefore correlated.

Why?

Because one can easily show that

var(y_ij) = σ_b² + σ²,
cov(y_ij, y_ij′) = σ_b², for j ≠ j′,
corr(y_ij, y_ij′) = ρ ≡ σ_b²/(σ_b² + σ²), for j ≠ j′, and
cov(y_ij, y_i′j′) = 0, for i ≠ i′.

That is, if we stack up all of the observations from the i-th rail (the observations that share the random effect b_i) as y_i = (y_i1, ..., y_in)^T, then

\mathrm{var}(y_i) = (\sigma_b^2 + \sigma^2) \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix} \qquad (\dagger)

and groups of observations from different rails (those that do not share random effects) are independent.

• The variance-covariance structure given by (†) has a special name: compound symmetry. This means that

– observations from the same rail all have constant variance equal to σ² + σ_b², and

– all pairs of observations from the same rail have constant correlation equal to ρ = σ_b²/(σ² + σ_b²).

– ρ, the correlation between any two observations from the same rail, is called the intraclass correlation coefficient.

• In addition, because the total variance of any observation, var(y_ij) = σ_b² + σ², is the sum of two terms, σ_b² and σ² are called variance components.
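• As a concrete illustration (a sketch, not from the course scripts), the short R code below builds the compound-symmetric matrix (†) for n = 3 observations per rail, using hypothetical variance component values:

# Build the compound-symmetric var-cov matrix (†) for n = 3.
# The variance components here are hypothetical, for illustration only.
sigma2.b <- 25                              # between-rail variance (hypothetical)
sigma2.e <- 16                              # within-rail error variance (hypothetical)
n <- 3
rho <- sigma2.b / (sigma2.b + sigma2.e)     # intraclass correlation
R <- matrix(rho, n, n); diag(R) <- 1        # 1's on the diagonal, rho elsewhere
V <- (sigma2.b + sigma2.e) * R              # var-cov matrix of y_i, as in (†)
V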

• Both fixed-effects and random-effects versions of the one-way model are fit to these data in intro.R (a sketch follows below).

• For (*), the fixed-effect version of the one-way model, we obtain µ̂ = 66.5 with a standard error of 0.948.

• For (**), the random-effect version of the one-way model, we obtain µ̂ = 66.5 with a standard error of 10.17.

– The standard error is larger in the random-effects model, because this model has a larger scope of inference.

– That is, the two models are estimating different µ’s: the fixed effect model is estimating the grand mean for the six rails in the study; the random effects model is estimating the grand mean for all possible rails.

– It makes sense that we would be much less certain of (i.e., there would be more error in) our estimate of the latter quantity, especially if there is a lot of rail-to-rail variability.

• The usual method of moments/anova/REML estimates of the variance components in model (**) are σ̂_b² = 24.8² and σ̂² = 4.02², so here there is much more between-rail variability than within-rail variability.
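• The course script intro.R is not reproduced here, but a minimal sketch of the two fits follows. It assumes the Rail data frame shipped with the nlme package (variables travel and Rail) matches the data shown above:

# Sketch of the one-way fixed- and random-effects fits to the rail data.
library(nlme)
data(Rail)                                         # travel times, grouped by rail
rail.lm  <- lm(travel ~ Rail - 1, data = Rail)     # one-way fixed effects, model (*)
rail.lme <- lme(travel ~ 1, data = Rail, random = ~ 1 | Rail)  # model (**)
coef(rail.lm)              # per-rail mean estimates
summary(rail.lme)$tTable   # muhat = 66.5 with the larger standard error
VarCorr(rail.lme)          # REML variance component estimates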

The randomized complete block model — Stool Example:

In the last example, the data were grouped by rail and we were interested in only one treatment (there was only one experimental condition under which the travel time along the rail was measured).

Often, several treatments are of interest and the data are grouped. In a randomized complete block design (RCBD), each of a treatments are observed in each of n blocks.

As an example, consider the data displayed below. These data come from an experiment to compare the ergonomics of four different stool designs. n = 9 subjects were asked to sit in each of a = 4 stools. The response measured was the amount of effort required to stand up.

[Figure: effort required to arise (Borg scale, roughly 8–14) for stool types T1–T4, one row of points per subject (subjects 1–9).]

Here, subjects form the blocks and we have a complete set of treatments observed in each block (each subject tests each stool). Thus we have a RCBD.

Let y_ij be the response for the j-th stool type tested by the i-th subject.

The classical fixed effects model for the RCBD assumes

y_ij = µ + α_j + β_i + e_ij, i = 1, ..., n, j = 1, ..., a,
     = µ_j + β_i + e_ij,

where e_11, ..., e_na are iid N(0, σ²).

• Here, µ_j is the mean response for the j-th stool type, which can be broken apart into a grand mean µ and a stool type effect α_j. β_i is a fixed subject effect.

• Again, the scope of inference for this model is the set of 9 subjects used in this experiment.

• If we wish to generalize to the population from which the 9 subjects were drawn, we would consider the subject effects to be random.

The RCBD model with random block effects is

y_ij = µ_j + b_i + e_ij, where

b_1, ..., b_n iid ∼ N(0, σ_b²), independent of e_11, ..., e_na iid ∼ N(0, σ²).

• Since the µ_j's are fixed and the b_i's are random, this is a mixed model.

The variance-covariance structure here is quite similar to that in the one- way random effects model.

Again, the model implies that any two observations that share a random effect (i.e., any two observations from the same block) are correlated.

In fact, the same compound symmetry structure holds. In particular, if y_i = (y_i1, ..., y_ia)^T is the vector of observations from the i-th block, then as in the last example,

\mathrm{var}(y_i) = (\sigma_b^2 + \sigma^2) \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix} \qquad (\dagger)

• All pairs of observations from the same block have correlation ρ = σ_b²/(σ_b² + σ²);

• all pairs of observations from different blocks are independent; and

• all observations have variance σ² + σ_b² (two components: the within-block and between-block variances).

• The RCBD model is fit to these data in intro.R. First block effects are treated as fixed, then random.

It is often stated that whether block effects are assumed random or fixed does not affect the analysis of the RCBD.

This is not completely true.

It is true that whether or not blocks are treated as random does not affect the ANOVA F test for treatments. Either way we test for equal treatment means with the test statistic F = MS_Trt/MSE. However, there are important differences in the analyses under the two models. These differences affect inferences on treatment means.

For instance, the variance of a treatment mean is

\mathrm{var}(\bar y_{\cdot j}) = \begin{cases} \sigma^2/n & \text{for fixed block effects,} \\ (\sigma_b^2 + \sigma^2)/n & \text{for random block effects.} \end{cases}

Substituting the usual method of moments/anova estimators for σ² and σ_b² leads to a standard error of

\mathrm{s.e.}(\bar y_{\cdot j}) = \sqrt{\widehat{\mathrm{var}}(\bar y_{\cdot j})} = \begin{cases} \sqrt{\{MS_{\mathrm{Blocks}} + (a-1)MSE\}/(na)} & \text{for random block effects,} \\ \sqrt{MSE/n} & \text{for fixed block effects,} \end{cases}

where a is the number of treatments.

• Again, the standard error of a treatment mean is larger in the random effects model, because the scope of inference is broader.

– For these data, s.e.(µ̂_j) = .367 in the fixed block effects model, and s.e.(µ̂_j) = .576 in the random block effects model.

• For these data, the estimated between- and within-subject variance components are σ̂_b² = 1.33² and σ̂² = 1.10².

– This means that the estimated correlation between any pair of observations on the same subject is ρ̂ = σ̂_b²/(σ̂² + σ̂_b²) = 1.33²/(1.10² + 1.33²) = 0.59.
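• As a quick arithmetic check (a sketch, using only the estimates reported above with n = 9 subjects):

# Recompute the two standard errors of a treatment mean from the
# reported variance component estimates.
sig2.b <- 1.33^2; sig2.e <- 1.10^2; n <- 9
sqrt(sig2.e / n)              # fixed block effects:  about 0.367
sqrt((sig2.b + sig2.e) / n)   # random block effects: about 0.576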

A Split-plot model — Grass Example:

A split-plot experimental design is one in which two sizes of experimental unit are used. The larger experimental unit, known as the whole plot, is randomized to some experimental design (a RCBD, say).

The whole plot is then subdivided into smaller units, known as split plots, which are assigned to a second experimental design within each whole plot.

Example: A study of the effects of three bacterial inoculation treatments and two cultivar types on grass yield was conducted as follows.

• Four fields, or blocks, were divided in half, and the two cultivars (A1 and A2) were assigned at random to be grown in the two halves of each field.

• Then each half-field (the whole plot) was divided into three sub-units or split-plots, and the three inoculation treatments (B1, B2, and B3) were randomly assigned to the three split-plots in each whole plot.

The resulting design and data are as follows:

              Block 1            Block 2            Block 3            Block 4
              A1       A2        A2       A1        A2       A1        A1       A2
  Split 1   B2 29.7  B3 34.4   B3 36.4  B1 28.9   B2 29.1  B1 28.6   B1 26.7  B3 30.7
  Split 2   B1 27.4  B1 29.4   B2 32.4  B2 28.7   B1 27.2  B3 32.9   B3 31.8  B2 28.6
  Split 3   B3 34.5  B2 32.5   B1 28.7  B3 33.4   B3 32.6  B2 29.7   B2 28.9  B1 26.8

• Here it was easier to randomize the planting of the two cultivars to a few large units (the whole plots) than to many small units (the split plots).

– Convenience is the motivation for this design.

• Here the 8 columns within the four rectangles are the whole plots, and cultivar is the whole plot factor.

• The 24 smaller squares within the columns are the split plots and inoculation type is the split plot factor.

The Data: y_ijk, i = 1, ..., a (levels of the whole-plot factor), j = 1, ..., n (blocks), k = 1, ..., b (levels of the split-plot factor).

• That is, y_ijk is the response for the i-th cultivar in the j-th block treated with the k-th inoculation type.

Model: y_ijk = µ + α_i + τ_j + b_ij + β_k + (αβ)_ik + e_ijk, where

α_i = effect of the i-th cultivar
τ_j = effect of the j-th block (treated here as fixed)
β_k = effect of the k-th inoculation treatment
(αβ)_ik = interaction between cultivars and inoculations

In addition,

the b_ij's are iid N(0, σ_b²), independent of the e_ijk's, which are iid N(0, σ²).

• The b_ij's are sometimes described as “whole plot error terms”. In a sense that is what a random effect is: an additional error term in the model.

The b_ij's are random effects for each whole plot (one for each half-field). They account for:

• heterogeneity from one whole plot to the next (quantified by σ_b²);

• correlation among the three split-plots within a given whole plot.

Again, the variance-covariance structure in this model is compound symmetric. The model implies

\mathrm{var}(y_{ij}) = (\sigma_b^2 + \sigma^2) \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix}

where y_ij = (y_ij1, y_ij2, ..., y_ijb)^T (the vector of all observations on the (i, j)-th whole plot).

This means:

• all pairs of observations from the same whole-plot have correlation ρ = σ_b²/(σ_b² + σ²);

• all pairs of observations from different whole-plots are independent; and

• all observations have variance σ² + σ_b² (two components: within-whole-plot and between-whole-plot variances).

• The split plot model is fit to these data with the lme() function in S-PLUS/R in intro.R (a sketch follows below).

• Note that here we’ve treated blocks as fixed. Later, we’ll return to this example and model block effects as random.
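• A hedged sketch of that lme() call (intro.R itself is not reproduced; a data frame grass with columns yield, block, cult, and inoc is assumed, with block effects kept fixed as above):

# Sketch: split-plot model with random whole-plot effects.
library(nlme)
grass$wplot <- with(grass, interaction(block, cult))  # one level per half-field
grass.lme <- lme(yield ~ cult * inoc + block,         # block treated as fixed
                 data = grass, random = ~ 1 | wplot)
anova(grass.lme)   # whole-plot factor (cult) tested with 3 denominator d.f.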

The Experimental Unit, Pseudoreplication, D.F., & Balance

• The split-plot design involves two different experimental units: the whole plot and the split plot.

• Whole plots are randomly assigned to the whole plot factor.

– E.g., half-fields were randomized to the two cultivars.

• There are many fewer whole plot experimental units than there are observations (which equals the number of split plots in the experiment).

– Only 8 half-fields, but 24 observations in the grass experiment.

• With respect to cultivar, then, 8 experimental units are randomized. So degrees of freedom for testing cultivar are based on a sample size of 8.

– At the whole plot level, we have a RCBD design with two treatments (cultivars), four blocks, so error d.f. for testing cultivars is the error d.f. in a RCBD of this size. Namely: (2-1)(4-1)=3.

• With respect to cultivar, the measurements on the three split plots in each whole plot are pseudoreplicates (or subsamples).

– That is, they are not independently randomized to cultivars and thus provide no additional d.f. (information) regarding cultivar effects.

In some sense, modeling whole plots with random effects:

— identifies the appropriate error term for the whole plot factor;

— identifies the appropriate d.f. (amount of relevant information in the data/design) for testing the whole plot factor (cultivar); and

— identifies which units are true experimental units and which are pseu- doreplicates with respect to each experimental factor.

• If a purely fixed-effect model is used in the split-plot design* then the usual MSE and DFE based upon the eijk error term will lead to incorrect inferences on the whole plot factor.

– See model grass.lm1 in intro.R.

• Correct inferences on whole plot factors can sometimes be obtained from a fixed-effects analysis, but

– you have to really know what you’re doing, especially in complex situations like split-split-plot models, etc.; and

– the design has to be balanced (rare!).

• So, the use of random effects is also motivated by

– use of multiple sizes of experimental units with distinct randomizations;

– presence of pseudoreplication; and

– imbalance.

• Mixed effects models handle these complications much more “automatically” than fixed-effects models and, consequently, avoid incorrect inferences to which fixed-effects models are prone in these situations.

* this would be done by modeling variability among whole plots with a fixed cult*block interaction effect

A More Complex Example — PMRC Site Preparation Study:

• Study of various site preparation and intensive management regimes on the growth of slash pine.

• Involved 191 0.2 ha plots nested within 16 sites in lower coastal plain of GA and FL.

• Data consist of repeated plot-level measurements of hd = dominant height (m), ba = basal area (m²/ha), tph (100's of trees per ha), derived volume (total volume outside bark in m³/ha), and other variables at ages 2, 5, 8, 11, 14, 17, and 20 years.

• At each site, plots were randomized to eleven treatments consisting of a subset of the 2⁵ = 32 combinations of five two-level (absent/present) treatment factors:

A = Chop, site prep. w/ a single pass of a rolling drum chopper; B = Fert, fertilizer following the first, 12th and 17th growing seasons; C = Burn, broadcast burn of site prior to planting; D = Bed, a double pass bedding of the site; E = Herb, veg. control with chemical herbicide.

Here is a plot of the data. Each panel represents a site, and each panel contains tph-over-time profiles for each plot on that site.

[Figure: trees/hectare (100s of trees) vs. age (yrs); a separate profile for each plot, graphed in separate panels by site. Each panel is a site.]

• These data are grouped, or clustered, by site.

– We would expect heterogeneity from site to site.
– We would expect correlation among plots within the same site.

• Data are also grouped by plot, since we have repeated measures through time on each plot.

– Again, we would expect plots to be heterogeneous.
– Expect stronger correlation among observations from the same plot than observations from different plots.

• In addition, we’d like to make inferences about the population of plantation sites for which these sites are representative, not just these sites alone.

• Also would like to be able to generalize to the population from which these plots are drawn.

• Hence, it makes sense to model sites with random site effects, plots with random plot effects.

– Plots are nested within sites. This would be an example of a multilevel mixed model.

• In addition, plots are randomized to treatments, then repeated mea- sures through time are taken on each plot.

– With respect to treatments, plots are the experimental unit, but the measurement unit occurs at a finer scale: times within plots.

• These time-specific measurements are a bit like measurements on split plots.

– However, in a split-plot example, observations from the same whole plot are correlated due to shared characteristics of that whole plot. These are captured by whole plot random effects.

– In a repeated measures context, observations through time from the same unit are correlated due to shared characteristics of that unit and are subject to serial correlation (observations taken close together in time more similar than observations taken far apart in time).

– Thus, in a repeated measures context, we may want random effects and serial correlation built into our model.

• We’ll soon see how multilevel random effects, serial correlation, and other features can be handled in the general form of the LMM.

Fixed vs. random effects: The effects in the model account for variability in the response across levels of treatment and design factors. The decision as to whether fixed effects or random effects should be used depends upon what the appropriate scope of generalization is.

• If it is appropriate to think of the levels of a factor as randomly drawn from, or otherwise representative of, a population to which we’d like to generalize, then random effects are suitable.

– Design or grouping factors are usually more appropriately modeled with random effects.

– E.g., blocks (sections of land) in an agricultural experiment, days when an experiment is conducted over several days, lab technician when measurements are taken by several technicians, subjects in a study, locations or sites along a river when we desire to generalize to the entire river.

• If, however, the specific levels of the factor are of interest in and of themselves then fixed effects are more appropriate.

– Treatment factors are usually more appropriately modeled with fixed effects.

– E.g., in studies to compare drugs, amounts of fertilizer, hybrids of corn, teaching techniques, and measurement devices, these factors are most appropriately modeled with fixed effects.

• A good litmus test for whether the level of some factor should be treated as fixed is to ask whether it would be of broad interest to report a mean for that level. For example, if I’m conducting an experiment in which each of four different classes of third grade students are taught with each of three methods of instruction (e.g., in a crossover design), then it will be of broad interest to report the mean response (level of learning, say) for a particular method of instruction, but not for a particular classroom of third graders.

– Here, fixed effects are appropriate for instruction method, ran- dom effects for class.

Preliminaries/Background

• In order to really understand the LMM, we need to study it in its vector/matrix form. So, we need to discuss/review random vectors and the multivariate normal distribution.

• Also need to review the classical linear model (CLM) before generalizing to the LMM.

• Estimation in the CLM is based on least squares, but in the LMM, maximum likelihood (ML) estimation is used. Therefore, we need to cover/review the basic ideas of ML estimation.

Random Vectors:

Random Vector: A vector whose elements are random variables. E.g., y = (y_1, y_2, ..., y_n)^T,

where y_1, y_2, ..., y_n are each random variables.

• Random vectors we will be concerned with:

– A vector containing the response variable measured on n units in the sample: y = (y_1, ..., y_n)^T.
– A vector of error terms in a model for y: e = (e_1, ..., e_n)^T.
– A vector of random effects: b = (b_1, b_2, ..., b_q)^T.

Expected Value: The expected value (population mean) of a random vector is the vector of expected values, often denoted µ. For y (n × 1),

E(y) = (E(y_1), E(y_2), ..., E(y_n))^T ≡ (µ_1, µ_2, ..., µ_n)^T = µ.

(Population) Variance-Covariance Matrix: For a random vector y = (y_1, y_2, ..., y_n)^T (n × 1) with mean µ = (µ_1, µ_2, ..., µ_n)^T, the matrix

E[(y-\mu)(y-\mu)^T] = \begin{pmatrix} \mathrm{var}(y_1) & \mathrm{cov}(y_1,y_2) & \cdots & \mathrm{cov}(y_1,y_n) \\ \mathrm{cov}(y_2,y_1) & \mathrm{var}(y_2) & \cdots & \mathrm{cov}(y_2,y_n) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{cov}(y_n,y_1) & \mathrm{cov}(y_n,y_2) & \cdots & \mathrm{var}(y_n) \end{pmatrix} \equiv \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_{nn} \end{pmatrix}

is called the variance-covariance matrix of y and is denoted var(y).

(Population) Correlation Matrix: For a random vector y (n × 1), the population correlation matrix is the matrix of correlations among the elements of y:

\mathrm{corr}(y) = \begin{pmatrix} 1 & \mathrm{corr}(y_1,y_2) & \cdots & \mathrm{corr}(y_1,y_n) \\ \mathrm{corr}(y_2,y_1) & 1 & \cdots & \mathrm{corr}(y_2,y_n) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{corr}(y_n,y_1) & \mathrm{corr}(y_n,y_2) & \cdots & 1 \end{pmatrix}

• Recall: for random variables y_i and y_j,

corr(y_i, y_j) = cov(y_i, y_j) / √(var(y_i) var(y_j))

measures the amount of linear association between y_i and y_j.

• Correlation matrices are symmetric.

Properties of expected value, variance:

Let x, y be random vectors of the same dimension, and let C and c be a matrix and vector, respectively, of constants. Then

1. E(y + c) = E(y) + c.

2. E(x + y) = E(x) + E(y).

3. E(Cy) = C E(y).

4. var(y + c) = var(y).

5. var(y + x) = var(y) + var(x) + cov(y, x) + cov(x, y), where the covariance terms are 0 if x and y are independent.

6. var(Cy) = C var(y) C^T.
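Property 6 is easy to verify numerically; the R sketch below uses a simulated sample, with Σ and C chosen arbitrarily for illustration:

# Numerical illustration of property 6: var(Cy) = C var(y) C^T.
set.seed(1)
Sigma <- matrix(c(2, 1, 0,
                  1, 3, 1,
                  0, 1, 1), nrow = 3)            # illustrative var-cov matrix
Y <- matrix(rnorm(10000 * 3), ncol = 3) %*% chol(Sigma)  # rows are draws of y
C <- matrix(c(1, 0,  1,
              0, 1, -1), nrow = 2, byrow = TRUE) # illustrative constant matrix
var(Y %*% t(C))          # sample var-cov matrix of Cy
C %*% var(Y) %*% t(C)    # C var(y) C^T -- agrees exactly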

Multivariate normal distribution:

• The multivariate normal distribution is to a random vector as the univariate (usual) normal distribution is to a random variable.

– It is the version of the normal distribution appropriate to the joint distribution of several random variables (collected and stacked as a vector) rather than a single random variable.

• Recall that we write y ∼ N(µ, σ²) to signify that the univariate r.v. y has the normal distribution with mean µ and variance σ².

– Means that y has probability density function (p.d.f.)

f_Y(y) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(\frac{-(y-\mu)^2}{2\sigma^2}\right)

– Meaning: for two values y_1 < y_2, the probability that y falls between y_1 and y_2 is the area under the p.d.f. between y_1 and y_2.

• We write y ∼ N_n(µ, Σ) to denote that y follows the n-dimensional multivariate normal distribution with mean µ and variance-covariance matrix Σ.

• E.g., for a bivariate random vector y = (y_1, y_2)^T ∼ N_2(µ, Σ), the p.d.f. of y maps out a bell over the (y_1, y_2) plane, centered at µ, with spread described by Σ.

• Recall for y ∼ N(µ, σ²) the p.d.f. of y is

f(y) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left\{-\frac{1}{2}\frac{(y-\mu)^2}{\sigma^2}\right\},

• In the multivariate case, for y ∼ N_n(µ, Σ), the p.d.f. of y is

f(y) = \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}} \exp\left\{-\frac{1}{2}(y-\mu)^T\Sigma^{-1}(y-\mu)\right\}.

– Here |Σ| denotes the determinant of the var-cov matrix Σ.
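The multivariate normal p.d.f. can be coded directly from this formula; the sketch below (an illustration, not from the course scripts) checks it against dnorm() in the n = 1 case:

# Direct implementation of the N_n(mu, Sigma) density given above.
dmvn <- function(y, mu, Sigma) {
  n <- length(y)
  d <- y - mu
  as.numeric(exp(-0.5 * t(d) %*% solve(Sigma) %*% d) /
             ((2 * pi)^(n / 2) * sqrt(det(Sigma))))
}
# Sanity check in the univariate case (sigma^2 = 4):
dmvn(0.3, mu = 0, Sigma = matrix(4))
dnorm(0.3, mean = 0, sd = 2)    # same value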

Review of Classical (Fixed-Effects) Linear Model

Assume we observe a sample of independent pairs, (y_1, x_1), ..., (y_n, x_n), where y_i is a response variable and x_i = (x_i1, ..., x_ip)^T is a p × 1 vector of explanatory variables.

The classical linear model can be written

y_i = β_1 x_i1 + ··· + β_p x_ip + e_i, i = 1, ..., n,
    = x_i^T β + e_i,

where e_1, ..., e_n are iid N(0, σ²).

Equivalently, we can stack these n equations and write the model as follows:

\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix} \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_p \end{pmatrix} + \begin{pmatrix} e_1 \\ \vdots \\ e_n \end{pmatrix}

or y = Xβ + e.

• Our assumptions on e_1, ..., e_n can be equivalently restated as e ∼ N_n(0, σ²I_n).

• Since y = Xβ + e and e ∼ N_n(0, σ²I_n), it follows that y is multivariate normal too: y ∼ N_n(Xβ, σ²I_n).

• The var-cov matrix for y is

\sigma^2 I_n = \begin{pmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{pmatrix}

• ⇒ the y_i's are uncorrelated and have constant variance σ².

• Therefore, in the CLM, y is assumed to have a multivariate normal joint p.d.f.

Estimation of β and σ²:

Maximum likelihood estimation:

In general, the likelihood function is just the probability density function, but thought of as a function of the parameters rather than of the data.

• Interpretation: likelihood function quantifies how likely the data are for a given value of the parameters.

• The idea behind maximum likelihood estimation is to find the values of β and σ² under which the data are most likely.

– That is, we find the β and σ² that maximize the likelihood function, or equivalently, the loglikelihood function, for the value of y actually observed.

– These values are the maximum likelihood estimates (MLEs) of the parameters.

For the CLM, the loglikelihood is

\ell(\beta, \sigma^2; y) = \underbrace{-\frac{n}{2}\log(2\pi)}_{\text{a constant}} \ \underbrace{- \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}(y - X\beta)^T(y - X\beta)}_{\text{kernel of } \ell}.

Notice that maximizing ℓ(β, σ²; y) with respect to β is equivalent to maximizing the third term:

−(1/(2σ²)) (y − Xβ)^T(y − Xβ),

which is equivalent to minimizing

(y − Xβ)^T(y − Xβ) = Σ_{i=1}^n (y_i − x_i^T β)² (Least-Squares Criterion). (*)

• (y − Xβ)^T(y − Xβ) is the squared distance between y and its mean, Xβ.

– The parameter estimate β̂ minimizes this distance.

– That is, β̂ gives the estimated mean Xβ̂ that is closest to y.

• So, the estimators of β given by ML and (ordinary) least squares (OLS) coincide.

– For β in the CLM: ML = OLS and, if X is of full rank (model is not overparameterized), then

β̂ = (X^T X)^{-1} X^T y.
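The closed-form estimator is easy to compute directly; a sketch with simulated (purely illustrative) data, checked against lm():

# Computing betahat = (X'X)^{-1} X'y directly and comparing with lm().
set.seed(2)
n <- 20
X <- cbind(1, runif(n), runif(n))                  # n x p design matrix (p = 3)
y <- drop(X %*% c(1, 2, -1) + rnorm(n, sd = 0.5))  # simulated response
beta.hat <- solve(crossprod(X), crossprod(X, y))   # solves (X'X) b = X'y
cbind(direct = beta.hat, lm = coef(lm(y ~ X - 1))) # identical columns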

Estimation of σ²:

• Setting the partial derivative of ℓ with respect to σ² to 0 and solving leads to the MLE of σ²:

σ̂²_ML = (1/n)(y − Xβ̂)^T(y − Xβ̂) = (1/n) Σ_i (y_i − x_i^T β̂)² = SSE/n.

• Problem: This estimator is biased for σ².

• This bias can be easily fixed, which leads to the generally preferred estimator:

σ̂² = (1/(n − p))(y − Xβ̂)^T(y − Xβ̂) = SSE/(n − p) = SSE/df_E = MSE.

• Note that the MLE of σ² is biased, and this is due to using the wrong value for df_E (the divisor for SSE).

– df_E = n − p is the information in the data left for estimating σ² after having estimated β_1, ..., β_p.

– Because σ̂²_ML uses n rather than n − p, it is often said that the MLE of σ² fails to account for d.f. used (or lost) in estimating β.

• MSE, the preferred estimator of σ², is an example of what is known as a restricted ML (REML) estimator.

– As we’ll see, REML is the preferred method of estimating variance components in LMMs. This method simply generalizes using σ̂² = MSE rather than σ̂²_ML in the CLM.

Example — Volume of Cherry Trees:

For 31 black cherry trees the following measurements were obtained:

V = Volume of usable wood (cubic feet)
H = Height of tree (feet)
D = Diameter at breast height (inches)

Goal: Predict usable wood volume from diameter and height.

• See S-PLUS script, backgrnd.R.

• Here, we first consider a simple multiple regression model, cherry.lm1, for these data:

V_i = β_0 + β_1 H_i + β_2 D_i + e_i, i = 1, ..., 31.

• Initial plots of V against both explanatory variables, D and H, look linear, so this model may be reasonable.

• cherry.lm1 gives a high R² of .941 and most residual plots look pretty good. However, the plot of residuals vs. diameter looks “U”-shaped, so we consider some other models for these data.
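• backgrnd.R is not reproduced here, but a minimal stand-in for cherry.lm1 can be run with R's built-in trees data, which records the same 31 black cherry trees (with diameter stored as Girth); exact numbers may differ slightly from those quoted above:

# A minimal stand-in for cherry.lm1 using R's built-in `trees` data
# (Volume in cubic feet, Height in feet, Girth = diameter in inches).
cherry.lm1 <- lm(Volume ~ Height + Girth, data = trees)
summary(cherry.lm1)$r.squared          # compare with the R^2 quoted above
plot(trees$Girth, resid(cherry.lm1))   # look for the "U" shape in diameter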

Inference in the CLM:

Under the basic assumptions of the CLM (independence, homoscedasticity, normality), β̂, the ML/OLS estimator of β, has distribution

β̂ ∼ N(β, σ²(X^T X)^{-1}).

That is,

• β̂ is unbiased for β;
• β̂ has var-cov matrix σ²(X^T X)^{-1};
• β̂_j has standard error s.e.(β̂_j) = √(MSE [(X^T X)^{-1}]_jj);
• β̂ is normally distributed.

• It can also be shown that β̂ is the optimal estimator (BLUE, UMVUE).

These properties lead to a number of normal-theory methods of inference:

1. t tests and confidence intervals for an individual regression coefficient β_j, based on

(β̂_j − β_j) / s.e.(β̂_j) ∼ t(n − p),

the t distribution with n − p d.f.

– A 100(1 − α)% CI for β_j is given by β̂_j ± t_{1−α/2}(n − p) s.e.(β̂_j).

– For an α-level test of H_0: β_j = β_0 versus H_1: β_j ≠ β_0, we use the rule: reject H_0 if

|β̂_j − β_0| > t_{1−α/2}(n − p) s.e.(β̂_j).

– Tests of H_0: β_j = 0 for each β_j are given by the summary() function in S-PLUS/R.

2. More generally, inference on linear combinations of the β_j's of the form c^T β (e.g., contrasts), based on the t distribution:

(c^T β̂ − c^T β) / √(MSE c^T(X^T X)^{-1}c) ∼ t(n − p).

– E.g., a 100(1 − α)% C.I. for the expected response at a given value of the vector of explanatory variables x_0 is given by

x_0^T β̂ ± t_{1−α/2}(n − p) √(MSE x_0^T(X^T X)^{-1}x_0).

– A 100(1 − α)% prediction interval for the response on a new subject with vector of explanatory variables x_0 is given by

x_0^T β̂ ± t_{1−α/2}(n − p) √(MSE(1 + x_0^T(X^T X)^{-1}x_0)).

– Confidence intervals for fitted and predicted values are given by the predict() function in S-PLUS/R (see the sketch following this list).

3. Inference on the entire vector β is based on the fact that

(β̂ − β)^T(X^T X)(β̂ − β) / (p MSE) ∼ F(p, n − p),

the F distribution with p and n − p d.f.

– E.g., we can test any hypothesis of the form H0 : Aβ = c where A is a k×p matrix of constants (e.g., contrast coefficients) with an F test. The appropriate test has rejection rule: reject if

F = (Aβ̂ − c)^T {A(X^T X)^{-1}A^T}^{-1}(Aβ̂ − c) / (k MSE) > F_{1−α}(k, n − p).

4. The fit of nested models can be compared via an F test comparing their MSE’s.

– Accomplished with the anova() function in S-PLUS/R.
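As a short sketch of items 1–2 in R (using the trees-based cherry fit assumed above; the new tree x0 is hypothetical):

# CIs for coefficients and intervals for a new observation via predict().
cherry.lm1 <- lm(Volume ~ Height + Girth, data = trees)
confint(cherry.lm1)                      # t-based CIs for each coefficient
x0 <- data.frame(Height = 76, Girth = 13)                    # hypothetical tree
predict(cherry.lm1, newdata = x0, interval = "confidence")   # CI for mean response
predict(cherry.lm1, newdata = x0, interval = "prediction")   # prediction interval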

Clustered Data:

Clustered data are data that are collected on subjects/animals/trees/units which are heterogeneous, falling into natural groupings, or clusters, based upon characteristics of the units themselves or the experimental design, but not on the basis of treatments or interventions.

• The most common example of clustered data is repeated measures data.

• By repeated measures, people typically mean data consisting of mul- tiple measurements of essentially the same variable on a given subject or unit of observation.

– Repeated measurements are typically taken through time, but can be at different spatial locations, or can arise from multiple measuring devices, observers, etc.

– When repeated measures are taken through time, the terms longitudinal data and panel data are roughly synonymous.

• We’ll use the more generic term clustered data to refer to any of these situations.

– Clustered data also include data from split-plot designs, crossover designs, hierarchical designs, and designs with pseudoreplication/subsampling.

Advantages of longitudinal/clustered data:

• Allow study of individual patterns of change — i.e., growth.

• Economize on experimental units.

• Heterogeneous experimental units are often better representative of the population to which we’d like to generalize.

• Each subject/unit can “serve as his or her own control”.

– E.g., in a split-plot experiment or crossover design, comparisons between treatments can be done within the same subject.
– In a longitudinal study, comparisons of time effects (growth) can be made within a subject rather than between subjects.

– Between-unit heterogeneity can be eliminated when assessing treatment or time effects. This leads to more power/efficiency (think paired t-test versus two-sample t-test).

Disadvantages:

• Correlation, multiple sources of heterogeneity in the data.

– Makes statistical methods harder to understand, implement.

– LMMs flexible enough to deal with these features.

• Imbalance, incompleteness in data more common.

– This can be hard for some statistical methods, especially if data are not missing at random.

– LMMs handle unbalanced data relatively easily, well.

Linear Mixed Models (LMMs)

• We will present the LMM for clustered data. It can be presented and used in a somewhat more general context, but most applications are to clustered data and this is a simpler case to discuss/understand.

Examples revisited:

Example 1, One-way random effects model — Rails

• Recall that we had three observations on each of 6 rails.

Model: y_ij = µ + b_i + e_ij, i = 1, ..., 6, j = 1, ..., 3, where

y_ij = response from the j-th measurement on the i-th rail
µ = grand mean response across the population of all rails
b_i = random effect for the i-th rail
e_ij = error term

• Data are clustered by rail.

Model for all data from the i-th rail can be written in vector/matrix form:

\begin{pmatrix} y_{i1} \\ y_{i2} \\ y_{i3} \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \mu + \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} b_i + \begin{pmatrix} e_{i1} \\ e_{i2} \\ e_{i3} \end{pmatrix}

or y_i = X_i β + Z_i b_i + e_i.

Example 2, RCBD model — Stools

• Recall that we had n = 9 subjects, each of whom tested all a =4 stool designs under study.

Model: y_ij = µ_j + b_i + e_ij, i = 1, ..., n, j = 1, ..., a, where

y_ij = response from the j-th stool tested by the i-th subject
µ_j = mean response for stool type j across the population of all subjects
b_i = random effect for the i-th subject
e_ij = error term

• Data are clustered by subject.

Model for all data from the i-th subject can be written in vector/matrix form:

\begin{pmatrix} y_{i1} \\ y_{i2} \\ y_{i3} \\ y_{i4} \end{pmatrix} = \begin{pmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \end{pmatrix} \begin{pmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \\ \mu_4 \end{pmatrix} + \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} b_i + \begin{pmatrix} e_{i1} \\ e_{i2} \\ e_{i3} \\ e_{i4} \end{pmatrix}

or y_i = X_i β + Z_i b_i + e_i.

Example 3, Split-plot model — Grass

• Recall that we had 8 whole plots (half-fields) randomized to a RCBD, and then each was split into 3 split-plots, which were randomized to the 3 inoculation types.

Model: y_ijk = µ + α_i + β_k + (αβ)_ik + τ_j + b_ij + e_ijk, where y_ijk is the response from the split-plot assigned to the k-th inoculation type within the (i, j)-th whole plot (which is assigned to the i-th cultivar in the j-th block).

In addition,

µ = grand mean
α_i = i-th cultivar effect (fixed)
β_k = k-th inoculation type effect (fixed)
(αβ)_ik = cultivar × inoculation interaction effect (fixed)
τ_j = j-th block effect (treated as fixed, but could be random)
b_ij = effect for the (i, j)-th whole plot (random)
e_ijk = error term (random)

• Data are clustered by whole plot.

Model for all data from the (i, j)-th whole plot can be written in vector/matrix form:

\begin{pmatrix} y_{ij1} \\ y_{ij2} \\ y_{ij3} \end{pmatrix} = \begin{pmatrix} 1&1&1&0&0&1&0&0&1 \\ 1&1&0&1&0&0&1&0&1 \\ 1&1&0&0&1&0&0&1&1 \end{pmatrix} \begin{pmatrix} \mu \\ \alpha_i \\ \beta_1 \\ \beta_2 \\ \beta_3 \\ (\alpha\beta)_{i1} \\ (\alpha\beta)_{i2} \\ (\alpha\beta)_{i3} \\ \tau_j \end{pmatrix} + \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} b_{ij} + \begin{pmatrix} e_{ij1} \\ e_{ij2} \\ e_{ij3} \end{pmatrix}

or y_ij = X_ij β + Z_ij b_ij + e_ij.

The Linear Mixed Model for Clustered Data:

• Notice that all 3 of the previous examples have the same form.

• They are all examples of LMMs with a single (univariate) random effect: a random cluster-specific intercept.

Suppose we have data on n clusters, where y_i = (y_i1, ..., y_{i,t_i})^T are the t_i observations available on the i-th cluster, i = 1, ..., n.

Then the LMM with a random cluster-specific intercept is given (in general) by y_i = X_i β + Z_i b_i + e_i, i = 1, ..., n, where X_i is a t_i × p design matrix for the fixed effects β, Z_i is a t_i × 1 vector of ones, and e_i is a vector of error terms.

If you’re not comfortable with the vector/matrix representation, another way to write it is

y_ij = β_1 x_1ij + β_2 x_2ij + ··· + β_p x_pij + z_ij b_i + e_ij,

where z_ij = 1; the β terms are the fixed part and z_ij b_i is the random part.

Assumptions:

— cluster effects: the b_i's are independent, normal with variance (variance component) σ_b².

— error terms: the e_ij's are independent, normal with variance (variance component) σ².

• We will relax both the assumption of independence and that of constant variance (homoscedasticity) later.

— bi’s and eij’s assumed independent of each other.

• Often, it makes sense to have more than one random effect in the model. To motivate this, let’s consider another example.

Example — Microfibril Angle in Loblolly Pine

• Whole-disk cross-sectional microfibril angle (MFA) was measured at 1.4, 4.6, 7.6, 10.7 and 13.7 meters up the stem of 59 trees, sampled from four physiographic regions.

– Regions (no. trees) were Atlantic Coastal Plain (24), Piedmont (17), Gulf Coastal Plain (9), and Hilly Coastal Plain (9).

A plot of the data:

[Figure: whole-disk cross-sectional microfibril angle (deg) vs. height on stem (m), one panel per region (Atlantic, Gulf, Hilly, Piedmont).]

• Here we have 4 or 5 repeated measures on each tree.

– Repeated measures not through time, but through space, up the stem of the tree.

• Any reasonable model would account for

– heterogeneity between individual trees;

– correlation among observations on the same tree; and

– dependence of MFA on height at which it is measured.

• From the plots it is clear that MFA decreases with height.

– For simplicity, suppose it decreases linearly with height (it doesn’t, but let’s keep things easy).

Let y_ijk be the MFA on the j-th tree in the i-th region, measured at the k-th height.

Then a reasonable model might be

y_ijk = µ_i + β height_ijk + b_ij + e_ijk,

where µ_i + β height_ijk is the fixed part and

µ_i = mean response for the i-th region
β = slope for the linear effect of height on MFA
b_ij = random effect for the (i, j)-th tree
e_ijk = error term for the height-specific measurements

• Fixed part of model says that MFA decreases linearly in height, with an intercept that depends on region.

– I.e., mean MFA is different from one region to next.

• Random effects (the b_ij's) say that the intercept varies from tree to tree within region.

Rather than just random tree-specific intercepts, suppose we believe that the slope (linear effect of height on MFA) also varies from subject to subject.

• This leads to a random intercept and slope model:

y_ijk = (µ_i + b_1ij) + (β + b_2ij) height_ijk + e_ijk,

where (µ_i + b_1ij) is the intercept and (β + b_2ij) the slope; i.e.,

y_ijk = µ_i + β height_ijk + b_1ij + b_2ij height_ijk + e_ijk.

• Now there are two random effects, b_1ij and b_2ij, or a bivariate random effect: b_ij = (b_1ij, b_2ij)^T.

– No reason to expect that an individual tree’s effect on the intercept would be independent of that same tree’s effect on the slope.

– So, we would assume b1ij and b2ij are correlated (probably negatively).

The model can be written as

\begin{pmatrix} y_{ij1} \\ \vdots \\ y_{ij5} \end{pmatrix} = \begin{pmatrix} 1 & \mathrm{height}_{ij1} \\ \vdots & \vdots \\ 1 & \mathrm{height}_{ij5} \end{pmatrix} \begin{pmatrix} \mu_i \\ \beta \end{pmatrix} + \begin{pmatrix} 1 & \mathrm{height}_{ij1} \\ \vdots & \vdots \\ 1 & \mathrm{height}_{ij5} \end{pmatrix} \begin{pmatrix} b_{1ij} \\ b_{2ij} \end{pmatrix} + \begin{pmatrix} e_{ij1} \\ \vdots \\ e_{ij5} \end{pmatrix}

or y_ij = X_ij β + Z_ij b_ij + e_ij.

So, the LMM in general may have > 1 random effect, which leads us to the general form of the model:

y_i = X_i β + Z_i b_i + e_i, i = 1, ..., n, where

X_i = design matrix for the fixed effects
β = p × 1 vector of fixed effects (parameters)
Z_i = design matrix for the random effects
b_i = q × 1 vector of random effects
e_i = vector of error terms

If you’re not comfortable with the vector/matrix representation, another way to write it is

y_ij = β_1 x_1ij + β_2 x_2ij + ··· + β_p x_pij + z_1ij b_1i + ··· + z_qij b_qi + e_ij,

where the β terms are the fixed part and the z·b terms the random part.

Assumptions:

— cluster effects: the b_i's are normal and independent from cluster to cluster.

— We allow b_1i, ..., b_qi (random effects from the same cluster, e.g., a random intercept and slope) to be correlated, with var-cov matrix D:

the b_i's are iid N_q(0, D).

— error terms: the e_ij's are independent, normal with variance (variance component) σ². That is, the e_i's are iid N_{t_i}(0, σ²I).

• We will relax both the assumption of independence and that of constant variance (homoscedasticity) later.

— bi’s and eij’s assumed independent of each other.

Example — Microfibril Angle in Loblolly Pine (Continued)

Recall the original random-intercept (only) model:

y_ijk = µ_i + β height_ijk + b_ij + e_ijk.

• This model is fit with the lme() function in LMM.R:

> mfa.lme1 <- lme(mfa ~ regname + diskht - 1, data=mfa, random= ~1|tree)
> summary(mfa.lme1)
Linear mixed-effects model fit by REML
 Data: mfa
       AIC      BIC    logLik
  1501.148 1526.311 -743.5738

Random effects:
 Formula: ~1 | tree
        (Intercept) Residual
StdDev:    1.795762 3.347371

Fixed effects: mfa ~ regname + diskht - 1
                    Value Std.Error  DF  t-value p-value
regnameAtlantic 20.267993 0.5996115  55 33.80187       0
regnameGulf     18.266745 0.8800006  55 20.75765       0
regnameHilly    18.425222 0.8719278  55 21.13159       0
regnamePiedmont 20.948914 0.6741677  55 31.07374       0
diskht          -0.116966 0.0147071 215 -7.95304       0
 Correlation:
                rgnmAt rgnmGl rgnmHl rgnmPd
regnameGulf      0.192
regnameHilly     0.219  0.117
regnamePiedmont  0.323  0.172  0.196
diskht          -0.601 -0.320 -0.365 -0.537

Standardized Within-Group Residuals:
       Min         Q1        Med        Q3       Max
-1.7505407 -0.6968633 -0.1144518 0.5237712 3.9879323

Number of Observations: 274
Number of Groups: 59

The random intercept and random slope model was

y_ijk = µ_i + β height_ijk + b_1ij + b_2ij height_ijk + e_ijk.

• This model can be fit with lme() too, but an easy way to refit a model with a slight change is via update():

> mfa.lme2 <- update(mfa.lme1, random= ~diskht|tree)
> summary(mfa.lme2)
Linear mixed-effects model fit by REML
 Data: mfa
       AIC      BIC   logLik
  1504.586 1536.938 -743.293

Random effects:
 Formula: ~diskht | tree
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev     Corr
(Intercept) 2.21069507 (Intr)
diskht      0.02712458 -0.678
Residual    3.31192368

Fixed effects: mfa ~ regname + diskht - 1
                    Value Std.Error  DF  t-value p-value
regnameAtlantic 20.237540 0.6221012  55 32.53095       0
regnameGulf     18.312019 0.9048791  55 20.23698       0
regnameHilly    18.449352 0.8925792  55 20.66971       0
regnamePiedmont 20.950184 0.6944940  55 30.16611       0
diskht          -0.116822 0.0149822 215 -7.79740       0
 Correlation:
                rgnmAt rgnmGl rgnmHl rgnmPd
regnameGulf      0.215
regnameHilly     0.248  0.132
regnamePiedmont  0.364  0.193  0.223
diskht          -0.636 -0.338 -0.390 -0.573

Standardized Within-Group Residuals:
       Min         Q1        Med        Q3       Max
-1.7613139 -0.7114237 -0.1082732 0.5346069 3.7535692

Number of Observations: 274
Number of Groups: 59

Questions:

• The models were fit by REML. What does that mean?
• Which model is better?
• How do we know if the model assumptions are met (diagnostics)?
• How do we predict MFA at a given height for a given tree? For the population of all trees from a given region?

Estimation and Inference in the LMM:

Estimation:

• In the classical linear model, the usual method of estimation is ordinary least squares.

• However, we saw that if we assume normal errors, then OLS gives the same estimates of β as maximum likelihood (ML) estimation.

In the LMM, there are fixed effects β, but also parameters related to the distribution of the random effects (e.g., variance components such as σ_b²) as well as parameters related to the error terms (e.g., the error variance σ²).

• Least-squares doesn’t provide a framework for estimation and inference for all of these parameters, so ML and related likelihood-based methods (i.e., restricted maximum likelihood, or REML) are generally preferred.

ML: recall that ML proceeds by finding the parameters that maximize the loglikelihood, or joint p.d.f., of the data.

• Finds the parameter values under which the observed data are most likely.

• Since the LMM assumes that the errors are normal, the random effects are normal, and the response y is linearly related to the errors and random effects via y = Xβ + Zb + e, it’s not hard to show that the LMM implies that the response vector y is normal too.

– That is, it’s easy to show that the observations from different clusters are independent, with

y_i ∼ N(X_i β, V_i), where V_i = Z_i D Z_i^T + σ²I.

• ⇒ the joint p.d.f. of the data is multivariate normal.
• ⇒ the loglikelihood is the log of a multivariate normal p.d.f.

– This loglikelihood is easy to write down, but requires iterative algorithm to maximize.

• Implemented optionally in lme() with the method="ML" option.

REML: Recall from the classical linear model that the MLE of σ² was biased.

• Did not adjust for d.f. lost in estimating β (fixed effects).

• Instead we used MSE as the preferred estimator of σ².

REML was developed as a general likelihood-based methodology that would be applicable to all LMMs, but which would

• take account of d.f. lost in estimation of β to produce less biased estimates of variance-covariance parameters (e.g., variance components) than ML;

• generalize the old, well-known, unbiased estimators in those simple cases of the LMM where such estimators are known;

– e.g., REML yields MSE as its estimator of σ² in the CLM.

• REML is based upon maximizing the restricted loglikelihood

– Can be thought of as that portion of the loglikelihood that doesn’t depend on β.

• Like ML estimation, REML requires an iterative algorithm to produce estimates.

• REML is the default estimation method for the lme() function and PROC MIXED in SAS.

• It’s generally regarded as the preferred method of estimation for LMMs.

– However, some aspects of inference are easier with ML, so sometimes competing models are fit and compared with ML, and then the “best” model is refit with REML at the end.
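– In lme() this workflow is just a matter of the method argument; a sketch, using the MFA model from LMM.R (data frame mfa assumed available):

# Fit with ML for model comparison, then refit the chosen model by REML.
library(nlme)
mfa.ml <- lme(mfa ~ regname + diskht - 1, data = mfa,
              random = ~ 1 | tree, method = "ML")
mfa.reml <- update(mfa.ml, method = "REML")   # REML is the default method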

Inference on Fixed Effects:

• Remember, the framework for estimation and inference in the LMM is ML or REML, not least-squares as in the CLM.

The standard methods of inference in a likelihood-based framework are Wald tests and likelihood ratio tests (LRTs).

• LRTs and Wald tests are based upon asymptotic theory. That is, they provide methods that hold exactly when the sample size goes to infinity and only approximately for finite sample sizes.

• LRTs are useful for comparing nested models.

– Shouldn’t be used for comparing random-effect structures/variance-covariance structures.

– Shouldn’t be used with REML, only ML.

• Wald tests are useful for testing linear hypotheses (e.g., contrasts) on fixed effects.

– Wald tests yield approximate z and chi-square tests.

– These tests can be improved as t and F tests to produce better inferences in small samples.

Wald Tests:

It can be shown that the approximate (i.e., large sample) distribution of the (restricted) ML estimator β̂ in the LMM is

\hat\beta \sim N\left(\beta,\ \underbrace{\left(\sum_{i=1}^n X_i^T V_i^{-1} X_i\right)^{-1}}_{=\,\mathrm{var}(\hat\beta)}\right), \qquad (\clubsuit)

where V_i = Z_i D Z_i^T + σ²I.

• In practice var(βˆ) is estimated by plugging in final (restricted) ML estimates obtained from fitting the model.

• Standard errors of β̂_j, the j-th component of β̂, are obtained as the square root of the j-th diagonal element of v̂ar(β̂).

The distributional result (♣) leads to the general Wald test on β.

In particular, for H_0: Aβ = c at level α, where A is k × p, we reject H_0 if

(Aβ̂ − c)^T {A [v̂ar(β̂)] A^T}^{-1} (Aβ̂ − c) > χ²_{1−α}(k),

where χ²_{1−α}(k) is the upper α critical value of a chi-square distribution on k d.f.

• As a special case, an approximate z test of H_0: β_j = 0 versus H_1: β_j ≠ 0 rejects H_0 if

|β̂_j| / s.e.(β̂_j) > z_{1−α/2},

where z_{1−α/2} is the (1 − α/2) quantile of a standard normal distribution.

• In addition, an approximate 100(1 − α)% CI for β_j is given by

β̂_j ± z_{1−α/2} s.e.(β̂_j).
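– For example, a large-sample Wald 95% CI for the height (diskht) coefficient can be computed from the fitted model mfa.lme1 (assumed available, as in LMM.R); a sketch:

# Large-sample Wald 95% CI for the diskht coefficient in mfa.lme1.
b  <- fixef(mfa.lme1)
se <- sqrt(diag(vcov(mfa.lme1)))
b["diskht"] + c(-1, 1) * qnorm(0.975) * se["diskht"]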

These Wald tests can be improved in small samples by using the t and F distributions in place of the z and χ².

• An approximate F test of H_0: Aβ = c is based on the test statistic

F = (Aβ̂ − c)^T {A v̂ar(β̂) A^T}^{-1} (Aβ̂ − c) / k, which is approximately ∼ F(k, ν).

• In addition, H_0: β_j = 0 can be tested via the test statistic β̂_j / s.e.(β̂_j), which is approximately ∼ t(ν).

What is the appropriate choice for the denominator d.f. ν in these tests?

• This is a question which is difficult to answer, in general.

• Pinheiro and Bates’ lme() function uses the “containment” method.

– This method produces the right answers in simple cases such as a split-plot model where those answers are known.

– This approach can give non-optimal answers in non-standard examples of the LMM, but it tends to work pretty well overall.

– Same method (essentially) is implemented in SAS’s PROC MIXED with the ddfm=contain option (which is the default).

– However, this is one place where PROC MIXED is superior to lme() because other, better approaches are implemented. In particular, the “Kenward-Roger” (ddfm=kr) method works well much more generally than the containment method.

• Approximate t and F tests are implemented in lme() in the summary() function and in the anova() function when the Terms or L options are specified.

Example — Microfibril Angle in Loblolly Pine (Continued)

• The random-intercept model is fit as mfa.lme1 in LMM.R:

y_ijk = µ_i + β height_ijk + b_ij + e_ijk.

• The hypothesis of equal means/intercepts is

H_0: µ_1 = µ_2 = µ_3 = µ_4,

which can be written as

H_0: Aβ = 0,

where

A\beta = \begin{pmatrix} 1 & -1 & 0 & 0 & 0 \\ 0 & 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & -1 & 0 \end{pmatrix} \begin{pmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \\ \mu_4 \\ \beta \end{pmatrix} = \begin{pmatrix} \mu_1 - \mu_2 \\ \mu_2 - \mu_3 \\ \mu_3 - \mu_4 \end{pmatrix}.

• Can be tested with the anova function:

> A <- matrix( c(1,-1,0,0,0, 0,1,-1,0,0, 0,0,1,-1,0), nrow=3, byrow=T)
> A
     [,1] [,2] [,3] [,4] [,5]
[1,]    1   -1    0    0    0
[2,]    0    1   -1    0    0
[3,]    0    0    1   -1    0
> fixef(mfa.lme1)
regnameAtlantic     regnameGulf    regnameHilly regnamePiedmont          diskht
     20.2679929      18.2667446      18.4252221      20.9489136      -0.1169664
> anova(mfa.lme1, L=A, type="marginal")   # use marginal, or "Type 3" SSs
F-test for linear combination(s)
  regnameAtlantic regnameGulf regnameHilly regnamePiedmont
1               1          -1            0               0
2               0           1           -1               0
3               0           0            1              -1
  numDF denDF  F-value p-value
1     3    55 3.679848  0.0173

• Alternatively, the same test can be obtained by generating the anova table for the equivalent model

y_ijk = µ + α_i + β height_ijk + b_ij + e_ijk,

which can be done as follows:

> mfa.lme1a <- lme(mfa ~ regname + diskht, data=mfa, random= ~1|tree)
> anova( mfa.lme1a, type="marginal")
            numDF denDF   F-value p-value
(Intercept)     1   214 1142.5667  <.0001
regname         3    55    3.6798  0.0173
diskht          1   214   63.2508  <.0001

• t-based confidence intervals for the parameters can be obtained with the intervals() function:

> intervals(mfa.lme1)
Approximate 95% confidence intervals

 Fixed effects:
                     lower        est.       upper
regnameAtlantic 19.0663446  20.2679929 21.46964124
regnameGulf     16.5031840  18.2667446 20.03030520
regnameHilly    16.6778398  18.4252221 20.17260436
regnamePiedmont 19.5978513  20.9489136 22.29997592
diskht          -0.1459550  -0.1169664 -0.08797776

LRTs:

In fairly broad generality, for models with a parametric likelihood function L(γ) depending on a parameter γ, nested models can be tested by examining the ratio

λ ≡ L(γ̂) / L(γ̂_0),

where

γ̂_0 = MLE under H_0 (under the null, or partial, model)
γ̂ = MLE under H_A (under the alternative, or full, model)

• Logic: If the observed data are much less likely under the simple than under the complex model, then λ will be large, and we should choose the complex one.

• If the models explain the data equally well, then λ ≈ 1 and we prefer H_0, the simpler model.

• We reject H0 for large values of λ, or equivalently, for large values of log(λ).

Asymptotic version of the test: Reject the partial model in favor of the full model if

2{log L(γ̂) − log L(γ̂_0)} > χ²_{1−α}(ν),

where

ν = (# parameters estimated under full model) − (# parameters estimated under partial model)
  = number of restrictions imposed by H_0.

• LRTs generalize the F test for nested models in the CLM.

• LRTs implemented in lme software via the anova() function.

• The maximized loglikelihood for any fitted model is given by the logLik() function.

• Important: LRTs should never be performed for two models with different fixed effect specifications when using REML, only with ML!

Example — Microfibril Angle in Loblolly Pine (Continued)

• Suppose we believe that MFA changes in a quadratic way with height. Then we might consider the model

y_ijk = µ_i + β_1 height_ijk + β_2 height²_ijk + b_ij + e_ijk.

• If we fit this model and the linear in height model with ML, then a LRT can be done to test the quadratic effect in height.

> mfa.lme3.ML <- lme(mfa ~ regname + diskht + I(diskht^2) -1 ,
+                    data=mfa, random= ~1|tree, method="ML")        # full model
> mfa.lme1.ML <- update(mfa.lme3.ML, fixed= ~ regname + diskht -1 ) # partial model
>
> anova(mfa.lme3.ML, mfa.lme1.ML)   # do LRT of diskht^2
            Model df      AIC      BIC    logLik   Test  L.Ratio p-value
mfa.lme3.ML     1  8 1322.110 1351.015 -653.0552
mfa.lme1.ML     2  7 1498.380 1523.672 -742.1899 1 vs 2 178.2694  <.0001
> 2*(logLik(mfa.lme3.ML)[1] - logLik(mfa.lme1.ML)[1])
[1] 178.2694
> summary(mfa.lme3.ML)$tTable   # gives Wald-based t-test on diskht^2,
>                               # alternative to LRT
                      Value    Std.Error  DF   t-value      p-value
regnameAtlantic 25.29239285 0.6096669291  55  41.48559 3.437954e-43
regnameGulf     23.70407853 0.8796914640  55  26.94590 2.269282e-33
regnameHilly    23.61188490 0.8702167111  55  27.13334 1.591075e-33
regnamePiedmont 25.94446296 0.6800771466  55  38.14929 2.969921e-41
diskht          -0.74916125 0.0395495622 214 -18.94234 1.212177e-47
I(diskht^2)      0.01308780 0.0007938345 214  16.48682 5.789352e-40
> anova(mfa.lme3.ML, Terms=3)   # Wald F test, just square of previous
F-test for: I(diskht^2)
  numDF denDF  F-value p-value
1     1   214 271.8151  <.0001

• Wald F test obtained above from the anova() function using the Terms option.

• Using either the LRT or the Wald F test, it's clear that the quadratic term in height is necessary.

Which test do I use?

Recommendation: use Wald-based F tests for inference on fixed effects.

Inference on Random Effects, Var-Cov Structure:

Inference on the variance-covariance structure (e.g., variance components, serial correlation parameters, heteroscedasticity parameters) is complicated by a number of theoretical and technical difficulties.

• For example, we may want to test H0 : σ²_b = 0, i.e., that the variance component associated with some random effect is zero.

– Under H0 the corresponding random effect is zero.

– Complications are caused by σ²_b being constrained to be ≥ 0 and by the fact that the null places σ²_b at the boundary of its set of possible values.

• In principle LRTs still apply, but the reference distribution is not necessarily χ², and is difficult to determine.

• Wald tests apply as well, but can perform extremely poorly unless sample size is very large.

Instead, a simpler but somewhat less formal approach to inference on the var-cov structure is via model selection criteria.

• The two most common model selection criteria are AIC and BIC.

– Both are based on the maximized value of the loglikelihood or restricted loglikelihood.

– Idea is to choose model under which data are most likely, but each criterion imposes a penalty for model complexity (lack of parsimony).

– Penalties differ for AIC, BIC.

– Hard to say which is better; BIC tends to lead to simpler models, AIC is slightly more commonly used.

• Use: choose one or the other criterion, then choose model which minimizes that criterion.
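A minimal sketch of this use (fm.a and fm.b are hypothetical fitted lme objects with identical fixed-effects specifications):

AIC(fm.a, fm.b)    # smaller is better
BIC(fm.a, fm.b)
anova(fm.a, fm.b)  # reports AIC, BIC, and logLik side by side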

Example — Microfibril Angle in Loblolly Pine (Continued)

• We’ve now concluded that the following quadratic (in height) model for MFA is better than the linear one. Model:

y_ijk = µ_i + β1 height_ijk + β2 height²_ijk + b_ij + e_ijk   (mfa.lme3)

• We now consider whether the linear and quadratic effects of height vary from subject to subject. That is, we consider models

y_ijk = µ_i + β1 height_ijk + β2 height²_ijk + b_1ij + b_2ij height_ijk + e_ijk   (mfa.lme4)

y_ijk = µ_i + β1 height_ijk + β2 height²_ijk + b_1ij + b_2ij height_ijk + b_3ij height²_ijk + e_ijk   (mfa.lme5)

• These models differ in their random effects structure.

• Can't test model mfa.lme4 vs. mfa.lme3 or mfa.lme5 vs. mfa.lme3 with an LRT or Wald test.

– Instead use AIC or BIC to choose best model. Smallest AIC, say, wins:

> mfa.lme3 <- update(mfa.lme3.ML, method="REML")
> mfa.lme4 <- update(mfa.lme3, random= ~diskht|tree)
> mfa.lme5 <- update(mfa.lme3, random= ~diskht+I(diskht^2)|tree)
> anova(mfa.lme3, mfa.lme4, mfa.lme5)
         Model df      AIC      BIC    logLik   Test  L.Ratio p-value
mfa.lme3     1  8 1338.214 1366.942 -661.1069
mfa.lme4     2 10 1329.896 1365.805 -654.9477 1 vs 2 12.31831  0.0021
mfa.lme5     3 13 1318.841 1365.524 -646.4205 2 vs 3 17.05449  0.0007

• Conclusion: according to both AIC and BIC, model mfa.lme5 is the best model considered so far.

• LRTs and the p-values given here should not be used in this context.

Prediction of Random Effects:

To keep things relatively simple, let’s return to the random intercept (only) version of the quadratic-in-height model for MFA:

y_ijk = µ_i + β1 height_ijk + β2 height²_ijk + b_ij + e_ijk   (mfa.lme3)

• Mean response in the model is the fixed part:

E(y_ijk) = µ_i + β1 height_ijk + β2 height²_ijk   (†)

• Describes the average behavior over population of all trees from which the trees in the study were drawn.

–(†) is estimated by plugging in the parameter estimates.

• The model also “localizes” to the individual tree level. The (i, j)th tree behaves somewhat differently than average. That tree's mean MFA is described by

µ_i + β1 height_ijk + β2 height²_ijk + b_ij   (‡)

– Since b_ij is random, the quantity above is random.

– Makes sense, because it describes a particular tree's responses, where that tree is random (randomly drawn from a population).

– (‡) is predicted by plugging in parameter estimates and predictions of the b_ij's.

Terminology:

— We estimate fixed, unknown constants (parameters), or quantities depending only on parameters, like (†).

— We predict unknown random variables (random effects), or quantities involving random effects, like (‡).

The preferred method of predicting the random effects is to use estimated best linear unbiased predictors (BLUPs).

These predictions of the (unobserved) random effects are based upon

— the response vector y (observed)

— the distribution of the random effects according to the model (e.g., iid ∼ N(0, σ²_b)) (estimated from the fitted model).

— the joint distribution of y and the random effects (estimated from fitted model).

• Estimated BLUPs of the random effects given by the ranef() function in S-PLUS/R.

• E.g., for model mfa.lme3, these predictions given by

> ranef(mfa.lme3)
    (Intercept)
1  -1.643437203
2  -0.481270493
...
51 -0.908065397
52  0.900232600
53 -0.965476440
54  0.880095610
55 -0.090215557
56 -0.009334064
57  0.541355263
58 -0.145192338
59 -0.203399677

• Estimates of (†) can be obtained from the predict() function by specifying level=0 (population level).

– Can think of this as plugging in 0 for random effects in fitted model equation.

– Appropriate for 1) estimating the mean for population of all trees; or for 2) predicting the response for a tree whose random effect is unknown (a new tree).

• Predictions of (‡) can be obtained from the predict() function by specifying level=1 (first level or tree level, in this case).

• A nice way to plot the data, the population-level estimates, and the tree-level predictions is via augPred(); a sketch of these calls appears after the plot below.

– Using this function for the Hilly region only (trees 51–59) produces the following plot:

[Figure: “Data from Hilly Region only”: lattice plot of MFA vs. height for trees 51–59, one panel per tree, showing the data with the population-level (fixed) and tree-level fitted curves; x-axis: Height on stem (m); y-axis: Whole-disk cross-sectional microfibril angle (deg).]
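A sketch of the calls just described, using model mfa.lme3 from the text (newdat is a hypothetical data frame containing regname, diskht, and tree):

pop.est  <- predict(mfa.lme3, newdata = newdat, level = 0)  # estimates of (†)
tree.prd <- predict(mfa.lme3, newdata = newdat, level = 1)  # predictions of (‡)
plot(augPred(mfa.lme3, level = 0:1))  # data + population- and tree-level curves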

Model Diagnostics:

As in classical linear models, residual plots are the work-horse of model diagnostics.

In a LMM, however, we need to think a bit more carefully about what we mean by “residuals”.

• Residuals can be based on the difference between a response and its estimated mean: y_ij − µ̂_ij, where µ̂_ij is the estimated mean.

– These are level 0 (population-level) residuals

• Alternatively, residuals can be based on the difference between a response from the ith cluster and its cluster-specific predicted value:

y_ij − (µ̂_ij + b̂_i), where µ̂_ij + b̂_i is the predicted value.

– These are level 1 (cluster-level) residuals

• Typically, we are interested in the cluster-level residuals.

– Residuals are extracted via the residuals() (or resid() for short) function. The option type controls the residual type (raw, Pearson, etc.).

– Fitted values are extracted with the fitted() function.

• The general form of the command for residual plots is

plot(fitted.model, y ~ x, options)

• In LMM.R, we produce various standard residual plots.
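A sketch of a few such plots for model mfa.lme3 (the exact set produced in LMM.R may differ):

plot(mfa.lme3, resid(., type = "p") ~ fitted(.) | regname, abline = 0)
qqnorm(mfa.lme3, ~ resid(., type = "p"))  # normality of within-tree errors
qqnorm(mfa.lme3, ~ ranef(.))              # normality of the random effects
plot(mfa.lme3, mfa ~ fitted(.))           # observed vs. fitted values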

• The resulting conclusion is that the model is misspecified. It appears that

– the shape of the MFA vs height function depends on region.

– model is often underpredicting mfa at breast height.

• After considering various alternatives, a much improved model is one where we treat diskht as a factor, and use a standard two-way layout model with random tree effects:

y_ijk = µ + α_i + β_k + (αβ)_ik + b_ij + e_ijk   (mfa.lme8)

where
    µ = grand mean
    α_i = region effects
    β_k = disk height effects
    (αβ)_ik = disk height × region interactions
    b_ij = random tree effect
    e_ijk = within-tree error term

Extensions of LMMs

Accommodating Heteroscedasticity (Non-constant Variance):

In the LMM as we’ve presented it so far, the error terms are assumed to have constant variance.

• I.e., if we write the model as

y_i = X_i β + Z_i b_i + e_i

then we assume

    var(e_i) = σ²I =
        [ σ²  0  ⋯  0  ]
        [ 0   σ² ⋯  0  ]
        [ ⋮   ⋮   ⋱  ⋮  ]
        [ 0   0  ⋯  σ² ]

– Constant variance among errors (also implies the response has constant variance).

Such an assumption is often unrealistic and can be violated both empirically and theoretically.

• A common example is variance which increases with the magnitude of the response.

– E.g., heights of tall trees more variable than those of short.

• Another example is where the error variance differs across groups.

– Intensively managed trees less variable than natural stands.

• Another possibility is that variability depends upon a covariate.

– E.g., variability in tree heights decreases with increasing site quality (site index, or a soil quality measure).

Such non-constant variance can be accommodated by modeling the variance as a function of covariates, factors (groups), and/or the mean response, in addition to one or more unknown parameters.

• In particular, the lme software allows the error variance to be of the form

    var(e_ij) = σ² g²(v_i, δ)

where

v_i = a vector of one or more variance covariates,
δ = a vector of unknown variance parameters to be estimated,
g²(·) = a known variance function.

• We allow v_i, the variance covariates, to include µ_i = E(y_i), the mean response.

– This requires a more complex fitting algorithm and a bit different theory than the standard ML, REML theory that applies when the variance doesn't depend on the mean.

– However, from the user's perspective, variance depending on the mean causes no complication and is extremely useful.

Variance Functions Available in the lme/nlme Software:

• Variance functions in the nlme software are described in §5.2.1 in Pinheiro and Bates (2000). Here, we give only brief descriptions.

1. varFixed. The varFixed variance function is g²(v_i) = v_i. That is,

var(e_i) = σ² v_i.

– Says that the error variance is proportional to the value of a covariate.

– This is the traditional form.

2. varIdent. This variance specification corresponds to different variances at each level of some stratification (grouping) variable s.

3. varPower. This generalizes the varFixed function so that the error variance can be a to-be-estimated power of the magnitude of a variance covariate:

var(e_i) = σ²|v_i|^(2δ), so that g²(v_i, δ) = |v_i|^(2δ).

The power is taken to be 2δ rather than δ so that s.d.(e_i) = σ|v_i|^δ.

A very useful specification is to take the variance covariate to be the mean response. That is,

var(e_i) = σ²|µ_i|^(2δ)

4. varComb. Finally, the varComb class allows the other variance classes to be combined so that the variance function of the model is a product of two or more component variance functions.
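Sketches of how each class is passed to lme() through the weights argument (y, x, and grp are illustrative names, not variables from the MFA data):

lme(y ~ x, random = ~1|tree, weights = varFixed(~ x))            # var prop. to x
lme(y ~ x, random = ~1|tree, weights = varIdent(form = ~1|grp))  # per-stratum variances
lme(y ~ x, random = ~1|tree, weights = varPower(form = ~ x))     # sigma^2 * |x|^(2*delta)
lme(y ~ x, random = ~1|tree,
    weights = varComb(varIdent(form = ~1|grp), varPower(form = ~ x)))  # product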

Example — Microfibril Angle in Loblolly Pine (Continued)

We return to the two-way layout model with random tree effects:

yijk = µ + αi + βk +(αβ)ik + bij + eijk (mfa.lme8)

• The residuals vs. fitted values plot shows increasing variance as the fitted values increase.

• Appears that variance increases with the mean response, not systematically with a covariate such as height.

• Can allow variance to be proportional to a power of the mean as follows:

> mfa.lme9 <- update(mfa.lme8, weights=varPower(form= ~fitted(.)))
> anova(mfa.lme9, mfa.lme8)
         Model df      AIC      BIC    logLik   Test  L.Ratio p-value
mfa.lme9     1 23 1205.206 1286.564 -579.6028
mfa.lme8     2 22 1261.743 1339.564 -608.8712 1 vs 2 58.53691  <.0001
> summary(mfa.lme9)
Linear mixed-effects model fit by REML
 Data: mfa
       AIC      BIC    logLik
  1205.206 1286.564 -579.6028

Random effects:
 Formula: ~1 | tree
        (Intercept)   Residual
StdDev:    1.987213 0.00575837

Variance function:
 Structure: Power of variance covariate
 Formula: ~fitted(.)
 Parameter estimates:
   power
2.018668

• Fitted variance model says:

s.d.(e_i) = σ̂|µ̂_i|^δ̂ = 0.00575837 |µ̂_i|^2.018668

• According to AIC and BIC, the heteroscedastic model mfa.lme9 is a big improvement over mfa.lme8.

• Residual plots look much better too.

Accommodating Serial Correlation:

In the LMM as we’ve presented it so far, the error terms are assumed to be uncorrelated.

• Only correlation among the responses due to shared random effects.

– Accounts for shared characteristics.

– Ignores serial correlation.

• Serial correlation (autocorrelation) is temporally or spatially structured.

– Typically, observations close together in time/space are more similar than observations far apart.

• We can accommodate serial correlation by relaxing our independent errors assumption.

• Instead, we model the correlation among error terms as a function of time lag, spatial distance, and unknown parameters. The lme/nlme software allows a correlation model of the form

corr(e_ij, e_ik) = h{d(p_ij, p_ik), ρ}

where
    ρ = a vector of correlation parameters,
    h(·) = a known correlation function,
    p_ij, p_ik = position variables corresponding to observations y_ij, y_ik,
    d(·, ·) = a known distance function.

• The correlation function h(·) is assumed continuous in ρ, returning values in [−1, +1]. In addition, h(0, ρ) = 1, so that observations that are 0 distance apart (identical observations) are perfectly correlated.

Correlation Structures Available in the lme/nlme Software:

• Correlation structures in the nlme software are described in §5.3 in Pinheiro and Bates (2000).

• There are also several spatial correlation structures.

• Here, we give brief descriptions of the serial and general correlation structures.

Serial Correlation Structures:

1. corAR1. Autoregressive of order 1.

• Appropriate for observations taken at evenly spaced time points.

• E.g., for e_i = (e_i1, e_i2, ..., e_it)^T, taken at times 1, 2, ..., t, the model says

    corr(e_ij, e_ik) = ρ^|j−k|

– E.g., for t = 5 the model implies

    corr(e_i) =
        [ 1   ρ   ρ²  ρ³  ρ⁴ ]
        [ ρ   1   ρ   ρ²  ρ³ ]
        [ ρ²  ρ   1   ρ   ρ² ]
        [ ρ³  ρ²  ρ   1   ρ  ]
        [ ρ⁴  ρ³  ρ²  ρ   1  ]

2. corCAR1. This correlation structure is a continuous-time version of an AR(1) correlation structure. The specification is the same as in corAR1, but now the covariate indexing time can take any non- negative non-repeated value and we restrict φ ≥ 0.

3. corARMA. This correlation structure corresponds to an ARMA(p, q) model. AR(p) and MA(q) models can be specified with this function, but keep in mind that the corAR1 specification is more efficient than specifying corARMA with p = 1 and q = 0.

General Correlation Structures:

1. corCompSymm. Compound symmetry. In this structure,

    corr(e_ij, e_ik) = 1 if j = k, and ρ if j ≠ k.

• Same correlation structure as implied by a random cluster-specific intercept with independent errors (e.g., split-plot model).

2. corSymm. Specifies a completely general correlation structure with a separate parameter for every non-redundant correlation.

• E.g., for e_i = (e_i1, e_i2, e_i3, e_i4, e_i5)^T,

    corr(e_i) =
        [ 1    ρ1   ρ2   ρ3   ρ4  ]
        [ ρ1   1    ρ5   ρ6   ρ7  ]
        [ ρ2   ρ5   1    ρ8   ρ9  ]
        [ ρ3   ρ6   ρ8   1    ρ10 ]
        [ ρ4   ρ7   ρ9   ρ10  1   ]
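Sketches of how these structures are requested through the correlation argument (fit0, time, and tree are hypothetical/illustrative names):

update(fit0, correlation = corAR1(form = ~1 | tree))       # AR(1), integer lags
update(fit0, correlation = corCAR1(form = ~ time | tree))  # continuous-time AR(1)
update(fit0, correlation = corARMA(p = 1, q = 1, form = ~1 | tree))
update(fit0, correlation = corCompSymm(form = ~1 | tree))  # compound symmetry
update(fit0, correlation = corSymm(form = ~1 | tree))      # general correlation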

Q: How do we choose a correlation structure?

• Hard question.

• In a single dimension (e.g., time, position on the bole), AR(1) models are often sufficient.

If we are willing to consider other ARMA models, two tools that are useful in selecting the right ARMA model are the sample autocorrelation function (ACF) and the sample partial autocorrelation function (PACF).

• The ACF is produced in the lme/nlme software with the ACF() function. The PACF is harder to obtain.

• AR(p) models have PACFs that are non-zero for lags ≤ p and 0 for lags > p. Therefore, we can look at the magnitude of the sample PACF to try to identify the order of an AR process that will fit the data. The number of “significant” partial autocorrelations is a good guess at the order of an appropriate AR process.

• MA(q) models have ACFs that are nonzero for lags ≤ q and 0 for lags >q. Again, we can look at the sample ACF to choose q.

• Simpler approach: trial and error until ACF looks good.
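A sketch of both tools, assuming the MFA model mfa.lme9 (using base R's pacf() on a single tree's residual series is a crude workaround, and tree "1" is an illustrative label):

plot(ACF(mfa.lme9, resType = "normalized"), alpha = 0.05)  # sample ACF from nlme
r <- resid(mfa.lme9, type = "normalized")
pacf(r[mfa$tree == "1"])  # per-tree PACF; short series, so interpret cautiously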

Example — Microfibril Angle in Loblolly Pine (Continued)

• In the MFA example, the following code produces the ACF plot below:

> plot(ACF(mfa.lme9,resType="n", form=~1|tree),alpha=.05)

[Figure: empirical autocorrelation function of the normalized residuals from mfa.lme9, lags 0–4, with 95% bounds; x-axis: Lag; y-axis: Autocorrelation.]

• The plot doesn't look great, but it isn't terrible either.

• Can try a continuous-time AR(1) correlation structure as follows:

> mfa.lme10 <- update(mfa.lme9, corr=corCAR1(form= ~diskht|tree))
> anova(mfa.lme10, mfa.lme9)
          Model df      AIC      BIC    logLik   Test   L.Ratio p-value
mfa.lme10     1 24 1206.418 1291.314 -579.2088
mfa.lme9      2 23 1205.206 1286.564 -579.6028 1 vs 2 0.7879748  0.3747

• Doesn’t help. No strong evidence of serial correlation here and it probably can be safely ignored.

Multilevel Models

Sometimes data are clustered at two or more nested levels. E.g.,

• Educational data: students’ test scores are clustered by class within school, school within school district.

• Multisite clinical trials with repeated measures: observations are clustered by patient within clinic, and by clinic within the overall study.

• Forestry data: repeated measures on trees clustered by tree within plot, plot within stand, etc.

Example — Microfibril Angle in Loblolly Pine (Continued)

When introducing these data, I lied and simplified the description of the sampling design.

• In reality, several stands were sampled within each physiographic region, and then three trees per stand were sampled.

– Data clustered by tree within stand, and by stand within region.

– May want to account for stand-level heterogeneity, tree-level heterogeneity separately, with nested random effects.

Now let y_ijkℓ be the response at the ℓth height, for the kth tree, in the jth stand, in the ith region.

Multilevel extension of our two-way anova with random tree-specific intercepts:

y_ijkℓ = µ + α_i + β_ℓ + (αβ)_iℓ + b_ij + b_ijk + e_ijkℓ   (mfa2.lme2)

• This model is fit as mfa2.lme2 in extend.R. But first we must construct a two-level groupedData object mfa2:

mfa2 <- groupedData(mfa ~ diskht | stand/tree, data=mfa,
          labels=list(x="Height on stem",
                      y="Whole-disk cross-sectional microfibril angle"),
          units=list(x="(m)", y="(deg)"), order.groups=F)

• Now refit model mfa.lme9 to the new data set (call it mfa2.lme1), and then update by adding a stand-level random intercept:

> mfa2.lme2 <- update(mfa2.lme1, random=list(stand= ~1, tree= ~1))
> anova(mfa2.lme1, mfa2.lme2)
          Model df      AIC      BIC    logLik   Test  L.Ratio p-value
mfa2.lme1     1 23 1205.206 1286.564 -579.6028
mfa2.lme2     2 24 1198.543 1283.439 -575.2714 1 vs 2 8.662672  0.0032
> summary(mfa2.lme2)
Linear mixed-effects model fit by REML
 Data: mfa2
       AIC      BIC    logLik
  1198.543 1283.439 -575.2714

Random effects:
 Formula: ~1 | stand
        (Intercept)
StdDev:    1.586422

 Formula: ~1 | tree %in% stand
        (Intercept)    Residual
StdDev:    1.401302 0.007036924

Variance function:
 Structure: Power of variance covariate
 Formula: ~fitted(.)
 Parameter estimates:
   power
1.948697

• According to AIC, BIC, two-level model fits better.

• The estimated variance component from stand to stand is 1.586².

• The estimated variance component between trees within stands is 1.401².

Nonlinear Mixed Effects Models

A Motivating Example — Circumference of Orange Trees

The data in the table below are the circumferences of five orange trees over time.

                         Tree No.
Time (days)      1      2      3      4      5
    118         30     33     30     32     30
    484         58     69     51     62     49
    664         87    111     75    112     81
   1004        115    156    108    167    125
   1231        120    172    115    179    142
   1372        142    203    139    209    174
   1582        145    203    140    214    177

A plot of the data, with observations from the same tree connected, appears below.

[Figure: “Orange Tree Data w/ NLS Fit”: observed growth curves for Trees 1–5 and the single fitted logistic curve; x-axis: Time since December 31, 1968 (days), 500–1500; y-axis: Trunk circumference (mm), 0–200.]

Also displayed in this plot is the fitted curve from a logistic function fit with NLS.

That is, if we let y_ij = circumference of the ith tree at age t_ij, i = 1, ..., 5, j = 1, ..., 7, then the fitted model is

y_ij = θ1 / (1 + exp[−(t_ij − θ2)/θ3]) + e_ij   (m1Oran.gnls)

where {e_ij} iid ∼ N(0, σ²).
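Presumably the fit was obtained along these lines (a sketch using the built-in Orange data and the self-starting logistic SSlogis(); the parameter names th2 and th3 are chosen to match the text):

library(nlme)
m1Oran.gnls <- gnls(circumference ~ SSlogis(age, Asym, th2, th3), data = Orange)
summary(m1Oran.gnls)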

• Clearly, model (m1Oran.gnls) is inadequate.

Fitted curve goes through the center of the combined data from all trees, but growth curves of individual trees are poorly estimated.

• Because the growth curves of the different trees spread out as the trees get older, this misspecification will show up as a cone-shaped residuals vs. fitted values plot, suggesting heteroskedasticity.

• But heteroskedasticity is not the problem: it is only (or at least mainly) between-tree variability that is increasing over time. The within-tree error variance looks to be homoskedastic.

Another problem with m1Oran.gnls: it treats observations as independent. Two obvious potential sources of correlation in these data:

1. Clustering. The data are grouped, or clustered, by tree. Observations from the same tree should share characteristics of that tree which make them similar, or correlated.

– Minimized by very homogeneous groups.

2. Serial Dependence. Observations close together in time will tend to be correlated more highly than observations far apart.

– Often reduced by long lags between measurements, and/or homogeneous environmental conditions through time.

The first of these sources almost certainly affects the orange tree data and the second may as well.

To deal with these problems, we could fit different parameters to each tree, or perhaps it would be sufficient to just fit different asymptotes, or different rate parameters, to each tree.

E.g., the model with 5 separate asymptote parameters, one for each tree, is:

    y_ij = θ_1i / (1 + exp[−(t_ij − θ2)/θ3]) + e_ij   (m2Oran.gnls)

where we assume corr(e_i) = C(ρ), where C is an assumed form for the within-group correlation matrix, depending on an unknown parameter ρ.

• In m2Oran.gnls, C(ρ) = I, but we could fit an AR(1) or other correlation structure.

While this approach is clearly an improvement over (m1Oran.gnls), it has some disadvantages:

A. # of parameters grows with sample size. In (m2Oran.gnls) we've introduced a distinct fixed asymptote parameter for each tree. Therefore, if we had measured 500 trees, our model would have 502 regression parameters.

Having the number of parameters increase with the sample size introduces a number of problems:

• Theoretical: in ML and LS estimation, asymptotic arguments establishing consistency, optimality break down.

• Computational: Difficult to optimize a criterion of estimation with respect to many parameters.

• Interpretation: We have 500 separate asymptotes and no single parameter describing the average limit of growth. Do we really care what the limit of growth was for tree #391?

• Conceptual: θ_1i is the asymptote parameter for tree i. That is, it's the fixed theoretical population constant for the limit of growth for tree i. But what's the population? And why is the asymptote of tree i a fixed constant? Wasn't tree i randomly selected from a population of trees? If so, the asymptote of this randomly drawn tree should be regarded as a random variable, not a parameter.

• Scope of inference: Results apply to sample at hand, not to population from which the trees were drawn.

B. Correlation structure. The correlation structure in model (m2Oran.gnls) accounts for within-tree correlation by modelling source 2 (serial correlation) but not source 1 (grouping correlation). It is often difficult and unnecessary to model both sources, but for short series, modelling 2 is often harder than modelling 1.

That is, it is often not easy to fit an ARMA model to the within-group observations through time. This can be so because of:

• Short series.

• Non-stationary series.

• Unbalanced/missing data and/or irregular or continuous time indexing.

• Having many fixed, cluster-specific parameters to better fit the data from each cluster (tree, plot, etc.) in nonlinear growth curve models is (essentially) the approach taken in the self-referencing functions popular in forest biometrics.

– I’m not a fan.

An alternative: A nonlinear mixed-effects model (NLMM) for the orange tree data.

Again, our fixed effects nonlinear model (m2Oran.gnls) with 5 separate tree-specific asymptotes is

y_ij = θ_1i / (1 + exp[−(t_ij − θ2)/θ3]) + e_ij   (m2Oran.gnls)

Using an ANOVA-type parameterization for θ_1i we can write θ_1i = θ̄1 + τ_i, where Σ_{i=1}^{5} τ_i = 0. Here θ̄1 is the average or typical θ1-value (asymptote) and τ_i is the ith tree effect.

Under this parameterization, model (m2Oran.gnls) becomes

y_ij = (θ̄1 + τ_i) / (1 + exp[−(t_ij − θ2)/θ3]) + e_ij,   Σ_{i=1}^{5} τ_i = 0.

In the ordinary model, the θ’s and the τ’s are all considered to be fixed unknown parameters, a.k.a. fixed effects.

In the NLMM, we consider the τ_i's to be random variables, or random effects. τ_i is the deviation from θ̄1 of the asymptote of the ith tree; it is considered to be random because the tree itself is a randomly selected representative element of the population to which we want to generalize.

Changing symbols from τ_i to b_i, the model becomes

y_ij = (θ1 + b_i) / (1 + exp[−(t_ij − θ2)/θ3]) + e_ij,   (†)

where b_1, ..., b_5 iid ∼ N(0, σ²_b) and {e_ij} iid ∼ N(0, σ²).

Here we've also dropped the bar from θ̄1.

• Now the asymptote for the ith tree is θ1 + b_i, a random variable because b_i is a random variable. The asymptote for the typical tree is θ1 (when b_i = 0).

• If we write θ_1i ≡ θ1 + b_i, then we have that the 5 asymptotes are randomly distributed around θ1: θ_11, ..., θ_15 iid ∼ N(θ1, σ²_b).

Fitting Model (†):

The fact that the random effects {b_i} enter into the NLMM (†) nonlinearly complicates the methodology and theory of NLMMs substantially compared to ordinary NLMs and LMMs.

• To focus on the motivation, interpretation, and basic ideas of NLMMs, we temporarily skip this material and just assume that the nlme() function in S-PLUS can fit an NLMM with a “good” method.

• See the R script NLMM.R, where we analyze these data with NLMMs.

• In this script, we first fit models (m1Oran.gnls) and (m2Oran.gnls). We then fit the NLMM (†) as m1Oran.nlme using the nlme() func- tion.

• Notice that the NLMM (m1Oran.nlme) has estimated regression parameters θ̂ similar to the estimated regression parameters in the fixed-effects model that fit a mean curve to all the data, ignoring tree effects (m1Oran.gnls):

> m1Oran.nlme <- nlme(circumference ~ SSlogis(age,Asym,th2,th3), data=Orange,
+     fixed= Asym+th2+th3~1, random= Asym~1, start=coef(m1Oran.gnls))
>
> fixef(m1Oran.nlme)   # compare fixed effect estimates
    Asym      th2      th3
191.0499 722.5590 344.1681
> coef(m1Oran.gnls)
    Asym      th2      th3
192.6876 728.7564 353.5337
> summary(m1Oran.nlme)
Nonlinear mixed-effects model fit by maximum likelihood
  Model: circumference ~ SSlogis(age, Asym, th2, th3)
 Data: Orange
       AIC      BIC    logLik
  273.1691 280.9459 -131.5846

Random effects:
 Formula: Asym ~ 1 | Tree
            Asym Residual
StdDev: 31.48255 7.846255

• Variability in the asymptotes from tree to tree is captured through b_i, which is assumed normal, mean 0, with estimated variance σ̂²_b = (31.48)². The error variance is estimated to be σ̂² = (7.85)².

> AIC(m1Oran.nlme, m1Oran.gnls, m2Oran.gnls)
            df      AIC
m1Oran.nlme  5 273.1691
m1Oran.gnls  4 324.7974
m2Oran.gnls  8 254.1040

• The NLMM m1Oran.nlme has AIC=273.2, BIC=280.9 for 5 estimated parameters: θ1, θ2, θ3, σ²_b, σ². This compares with AIC=324.8, BIC=331.0 for the 4-parameter model (m1Oran.gnls) and AIC=254.1, BIC=266.5 for the 8-parameter model (m2Oran.gnls).

• So, the addition of random effects in the asymptote of (m1Oran.gnls) only costs us 1 df and results in a vast improvement in fit.

• Fit is even better when fitting separate asymptotes to each tree (m2Oran.gnls), but that shouldn’t be surprising.

– In (m1Oran.nlme) we save on df in comparison to (m2Oran.gnls) by making a parametric assumption on the distribution of the random effects: that they’re normal with only an unknown variance to be estimated.

– In contrast, model (m2Oran.gnls) doesn't make any assumption about the tree-to-tree variability in asymptotes; it separately estimates each asymptote.

However:

– the problems with many cluster-specific parameters cited above; and
– the advantage would go away if we had more trees in the data set. Then the penalty for lack of parsimony would increase, and model m2Oran.gnls would have higher AIC, BIC than m1Oran.nlme.

• Of course the residuals of model (m1Oran.gnls) looked terrible because the individual trees were poorly fit by the average curve. The residuals of models (m2Oran.gnls) and (m1Oran.nlme) look about equally good.

Our fitted model for tree i at time t_ij is

    ŷ_ij = (θ̂1 + b̂_i) / (1 + exp[−(t_ij − θ̂2)/θ̂3])

• The b̂_i's aren't estimated parameters of the model. They're predicted quantities based on the fitted model, the data, and the assumption that b_1, ..., b_5 iid ∼ N(0, σ²_b).

The b̂_i's are as follows (note these aren't sorted by tree #):

> ranef(m1Oran.nlme)
        Asym
3 -37.000247
1 -29.403585
5  -5.179485
2  31.565006
4  40.018311

Therefore, the predicted circumference of tree 1, say, at time t_1j is given by

    ŷ_1j = (191.0 − 29.4) / (1 + exp[−(t_1j − 722.6)/344.2])
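The same prediction can be obtained from predict() (the ages in newdat are illustrative):

newdat <- data.frame(age = c(500, 1000, 1500), Tree = "1")
predict(m1Oran.nlme, newdata = newdat, level = 1)  # tree-1 curve, uses b1-hat
predict(m1Oran.nlme, newdata = newdat, level = 0)  # population curve (b = 0)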

• Plugging the b̂_i's into the fitted model equation yields the pink curves (tree-level), and plugging in b_i = 0 yields the blue curves (population-level) in the following plot, obtained from the augPred() function:

[Figure: augPred() plot for m1Oran.nlme, one panel per tree (3, 1, 5, 2, 4), showing the data, the tree-level curves, and the common fixed (population-level) curve; x-axis: Time since December 31, 1968 (days); y-axis: Trunk circumference (mm).]

• Finally, we examine the ACFs for models (m2Oran.gnls) and (m1Oran.nlme).

[Figure: “ACF, Model m2Oran.gnls”: residual autocorrelation by lag (0–6).]
[Figure: “ACF, Model m1Oran.nlme”: residual autocorrelation by lag (0–6).]

• The ACFs of these two models, (m2Oran.gnls) and (†), are similar — the two models “account for” the residual correlation structure similarly and adequately here.

The NLME Model Formulation

• We consider a single level of clustering, before tackling the multilevel case.

Formulation for Single Level Data:

Let y_ij denote the jth observation (e.g., through time) on the ith cluster (e.g., subject, tree, plot), where we have n clusters and t_i observations in the ith cluster. Let w_ij be a vector of covariates corresponding to response y_ij.

The general form of the NLMM for this situation is

y_ij = f(θ_ij, w_ij) + e_ij,   i = 1, ..., n,  j = 1, ..., t_i,
θ_ij = X_ij β + Z_ij b_i,                                        (∗)

b_1, ..., b_n iid ∼ N(0, D),   {e_ij} iid ∼ N(0, σ²)

where

β = a p × 1 vector of fixed effects,
b_i = a q × 1 vector of cluster-specific random effects with var-cov matrix D,
X_ij = a model/design matrix for β,
Z_ij = a model/design matrix for b_i.

• Note that we assume homoscedastic, uncorrelated errors for now, but this can be relaxed as in the LMM.

Model (*) can be equivalently expressed in matrix form as

y_i = f_i(θ_i, w_i) + e_i,
θ_i = X_i β + Z_i b_i,        (∗∗)

for i = 1, ..., n, where

y_i = (y_i1, ..., y_it_i)^T,   θ_i = (θ_i1, ..., θ_it_i)^T,   e_i = (e_i1, ..., e_it_i)^T,
f_i(θ_i, w_i) = (f(θ_i1, w_i1), ..., f(θ_it_i, w_it_i))^T,

and w_i, X_i, and Z_i are formed by stacking the w_ij, X_ij, and Z_ij, j = 1, ..., t_i, row-wise.

We assume

b_1, ..., b_n iid ∼ N_q(0, D),   e_i iid ∼ N_{t_i}(0, σ²I),

and the random effects {b_i} are independent of the errors {e_i}.

Example — Orange Tree Data

To illustrate the model formulation, we write model (†)=(m1Oran.nlme) that we used for these data in the form (**). Model (†) can be written as

y_ij = θ_1ij / (1 + exp[−(t_ij − θ_2ij)/θ_3ij]) + e_ij,

where

    θ_ij = (θ_1ij, θ_2ij, θ_3ij)^T = X_ij β + Z_ij b_i,

with X_ij = I (the 3 × 3 identity), β = (β1, β2, β3)^T, and Z_ij = (1, 0, 0)^T, so that θ_1ij = β1 + b_1i, θ_2ij = β2, θ_3ij = β3. Here b_i is a scalar, so q = 1, and

b_1, ..., b_n iid ∼ N(0, σ²_b) (so D = σ²_b), and {e_ij} iid ∼ N(0, σ²).

A Multilevel Example — TPH in the PMRC Site Preparation Study:

• Study of various site preparation and intensive management regimes on the growth of slash pine.

• Involved 191 0.2 ha plots nested within 16 sites in lower coastal plain of GA and FL.

• Data consist of tph (100’s of trees per ha), site index, soil type, and treatment variables (herb, fert, chop, etc.) at ages 2, 5, 8, 11, 14, 17, and 20 years.

A plot of the data:

[Figure: “Separate profiles for each plot, graphed separately by site”: tph trajectories over time for each plot, one panel per site; x-axis: Age (yrs), 5–20; y-axis: Trees/Hectare (100s of trees), 5–15.]

(Each panel is a site)

• Because the profiles of tph over time appear to be sigmoidal (‘S’-shaped) for at least some plots, we consider the four-parameter logistic model for these data.

• The function SSfpl() in the lme/nlme software implements this function in the following form:

y_ijk = θ_1ij + (θ_2ij − θ_1ij) / (1 + exp{(θ_3ij − Age_ijk)/θ_4ij}) + e_ijk   (♥)

where

θ_1ijk = left (upper) asymptote (mixed effect)
θ_2ijk = right (lower) asymptote (mixed effect)
θ_3ijk = age at inflection point (mixed effect)
θ_4ijk = scale parameter determining how quickly the response reaches the lower asymptote (mixed)
y_ijk = tph at the kth age, for the jth plot in the ith site
e_ijk = error term

The θ_ijk's are mixed effects. Each one can be modeled in terms of explanatory variables, parameters, and random plot and random site effects.

• E.g., we might believe that the lower asymptote θ_2ijk depends on whether or not the site was fertilized.

– In that case we might set

θ_2ijk = β20 + β21 Fert_ij   (fixed effects only)

• However, we might also believe that there is variability from plot to plot and variability from site to site in the lower asymptote.

– In that case, we would consider modeling the lower asymptote as

θ_2ijk = β20 + β21 Fert_ij + b_2i + b_2ij   (mixed effects)

where the b_2i's are site-specific random effects and the b_2ij's are plot-specific random effects.

• The other θ_ijk's (the basic “parameters” of the nonlinear model) can each be modeled this way as well; a sketch of the corresponding nlme() syntax follows.
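A sketch of how such sub-models are declared in nlme(): here the lower asymptote th2 depends on a hypothetical fertilization indicator Fert and gets site- and plot-level random intercepts (the start values are illustrative):

nlme(tph100 ~ SSfpl(age, th1, th2, th3, th4), data = siteprep.tph,
     fixed  = list(th1 ~ 1, th2 ~ Fert, th3 ~ 1, th4 ~ 1),  # theta2 = b20 + b21*Fert
     random = list(site = th2 ~ 1, plot = th2 ~ 1),         # b_2i and b_2ij
     start  = c(10, 8, 0, 10, 1))  # one start value per fixed effect, in order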

• E.g., suppose that:

– the upper asymptote depends upon whether the site was fertilized, and varies across sites and plots;

– the lower asymptote depends upon whether the site was bedded and whether it was burned, and varies across sites;

– the inflection point depends upon site index;

– and the scale parameter is constant.

• Then an NLMM to describe such a situation would be

y_ijk = θ_1ij + (θ_2ij − θ_1ij) / (1 + exp{(θ_3ij − Age_ijk)/θ_4ij}) + e_ijk

      = (β10 + β11 Fert_ij + b_1i + b_1ij)
        + [(β20 + β21 Bed_ij + β22 Burn_ij + b_2i) − (β10 + β11 Fert_ij + b_1i + b_1ij)]
          / (1 + exp{((β30 + β31 SI_ij) − Age_ijk)/β40}) + e_ijk

Note that multivariate random effects at a given level occur more naturally and often in NLMMs than LMMs.

• E.g., here we have random site effects on both the upper and lower asymptotes. That is, the site effects are bivariate:

    b_i = (b_1i, b_2i)^T

• It's natural to expect that if a site has a unique effect on the lower asymptote and a unique effect on the upper asymptote, then those effects are probably related. Therefore, we typically assume multivariate random effects are correlated. E.g.,

    {b_i} iid ∼ N(0, D), where D = [ σ11  σ12 ]
                                   [ σ12  σ22 ]      (∗)

• There are several functions built into the lme/nlme software that can specify various var-cov structures on multivariate random effects in the model.

– pdDiag() specifies uncorrelated (independent) random effects.

– pdSymm() specifies general covariance among random effects (as above in (*)).

– pdBlocked() specifies block-diagonal covariance among random effects.
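Sketches of the corresponding random specifications (m0 is a hypothetical fit with site-level random effects on th1 and th2):

update(m0, random = list(site = pdDiag(th1 + th2 ~ 1)))   # independent effects
update(m0, random = list(site = pdSymm(th1 + th2 ~ 1)))   # general D, as in (*)
update(m0, random = list(site = pdBlocked(list(th1 ~ 1, th2 ~ 1))))  # block-diagonal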

Example — TPH (continued):

In NLMM.R we consider model (♥) for the tph data. This involves

— building mixed models for the “basic” parameters of the logistic function,

— modeling the variance-covariance structures for the random effects, and

— modeling the error variance-covariance structure to allow heteroscedasticity and serial correlation, if necessary.

• First, we create a two-level groupedData object siteprep.tph to contain the data. The data are clustered by site (level 1) and plot within site (level 2).

– Level 0 refers to the population level (corresponding to random effects equal to their mean, 0).

• Next, we try to fit the basic model with θ's not depending on covariates and no serial correlation or heteroscedasticity in the error terms. We do assume random plot and site effects in θ1, θ2, and θ3:

m1tph.fpl <- nlme(tph100 ~ SSfpl(age, th1, th2, th3, th4), data=siteprep.tph,
    fixed=list(th1~1, th2~1, th3~1, th4~1),
    random=list(site=pdDiag(th1+th2+th3~1), plot=pdDiag(th1+th2+th3~1)),
    start=c(10,8,10,1))

• The model here is

y_ijk = θ_1ij + (θ_2ij − θ_1ij) / (1 + exp{(θ_3ij − Age_ijk)/θ_4ij}) + e_ijk

      = (β10 + b_1i + b_1ij)
        + [(β20 + b_2i + b_2ij) − (β10 + b_1i + b_1ij)]
          / (1 + exp{((β30 + b_3i + b_3ij) − Age_ijk)/β40}) + e_ijk

– The pdDiag() specification in the random option requests that the random site effects in the model be independent, and likewise for the random plot effects. I.e., for site effects:

    var(b_i) = var((b_1i, b_2i, b_3i)^T) =
        [ σ11^(1)    0         0      ]
        [ 0        σ22^(1)     0      ]
        [ 0          0       σ33^(1)  ]

and for plot effects

    var(b_ij) = var((b_1ij, b_2ij, b_3ij)^T) =
        [ σ11^(2)    0         0      ]
        [ 0        σ22^(2)     0      ]
        [ 0          0       σ33^(2)  ]

– Assuming independent random effects that operate on the same level of clustering (e.g., site) is usually not realistic. However, it may be necessary, at least when fitting the model initially, to obtain convergence.

• Here is a summary of the model fit:

> summary(m1tph.fpl)
Nonlinear mixed-effects model fit by maximum likelihood
  Model: tph100 ~ SSfpl(age, th1, th2, th3, th4)
 Data: siteprep.tph
       AIC      BIC    logLik
  2151.744 2207.161 -1064.872

Random effects:
 Formula: list(th1 ~ 1, th2 ~ 1, th3 ~ 1)
 Level: site
 Structure: Diagonal
             th1     th2      th3
StdDev: 0.855265 1.86076 2.941078

 Formula: list(th1 ~ 1, th2 ~ 1, th3 ~ 1)
 Level: plot %in% site
 Structure: Diagonal
             th1      th2      th3  Residual
StdDev: 1.424189 1.262397 1.378790 0.3068705

Fixed effects: list(th1 ~ 1, th2 ~ 1, th3 ~ 1, th4 ~ 1)
        Value Std.Error  DF  t-value p-value
th1 11.969625 0.2395291 945 49.97149       0
th2 10.646728 0.4759046 945 22.37156       0
th3 12.675188 0.7959850 945 15.92390       0
th4  2.020528 0.1151651 945 17.54462       0

Number of Observations: 1139
Number of Groups:
           site plot %in% site
             16            191

• Variance components from all site and plot effects appear to be large, so we don’t consider dropping any of these effects (yet).

• A plot of the observed and fitted values (obtained with augPred()) indicates how well the model fits, and reveals its nested structure:

[Figure: augPred() plot of observed tph and the fitted curves at the population (fixed), site, and plot levels, one panel per plot, for a subset of the plots in sites 1 and 4; x-axis: Age (yrs); y-axis: Trees/Hectare (100s of trees).]

• Only a subset of the data is plotted here.

• At this stage we might consider relaxing the assumption of uncorrelated (independent) random site and plot effects. However, the data and model are big enough here that this causes lots of convergence problems.

• Instead, we consider whether the asymptotes and other parameters depend on covariates (e.g., treatment indicators, site index, soil variables, etc.).

– These are all plot-level measurements.

• One useful way to determine which parameters may depend upon which covariates is to graph the predicted plot-specific random effects for each parameter against potential covariates (a sketch of this appears below).

– It's as though we're building a separate little linear model for each mixed-effect parameter (each θ), and the predicted plot-level random effects are the residuals from those models.

• The most obvious covariate on which one of the parameters depends is initial trees per hectare (itph100, measured at age 2). Clearly, we’d expect the left (upper) asymptote to be highly dependent on this variable.

– This is apparent from the plot of the θ_1ijk random effects vs. itph100.
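A sketch of this diagnostic for model m1tph.fpl (augFrame = TRUE appends cluster-level summaries of the covariates to the predicted random effects):

re.plot <- ranef(m1tph.fpl, level = 2, augFrame = TRUE)  # plot-level effects
plot(re.plot, form = ~ itph100)  # each predicted random effect vs. initial tph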

• Therefore, we add itph100 to the model, yielding m2tph.fpl, which fits much better than m1tph.fpl.

• Continuing in this manner we build up the mixed effects until arriving at the specifications (m5tph.fpl):

θ_1ijk = β10 + β11 itph_ij + (fixed trtmnt effects) + b_1ij + b_1i
θ_2ijk = β20 + (fixed trtmnt effects) + b_2ij + b_2i
θ_3ijk = β30 + (fixed trtmnt effects) + b_3ij + b_3i
θ_4ijk = β40

No additional fixed effects appear necessary at this point from the plots of random effects and residuals vs. explanatory variables, so we now consider the assumptions on the error var-cov structure:

• A plot of the residuals vs. fitted values reveals no obvious pattern of non-constant variance.

• However, a plot of the residuals versus site index (si) shows a slight increasing variance pattern.

• Therefore, in model m6tph.fpl, we add heteroskedasticity of the form

var(e_ijk) = σ² si^(2δ),  where σ̂ = 0.0041, δ̂ = 1.49

> m6tph.fpl <- update(m5tph.fpl, weights=varPower(form= ~si),
+                     start=fixef(m5tph.fpl))
> summary(m6tph.fpl)
Random effects:
 Formula: list(th1 ~ 1, th2 ~ 1, th3 ~ 1)
 Level: site
 Structure: Diagonal
        th1.(Intercept) th2.(Intercept) th3.(Intercept)
StdDev:       0.1559281        1.849531        2.303681

 Formula: list(th1 ~ 1, th2 ~ 1, th3 ~ 1)
 Level: plot %in% site
 Structure: Diagonal
        th1.(Intercept) th2.(Intercept) th3.(Intercept)    Residual
StdDev:       0.5110908        1.203069        1.403265 0.004075045

Variance function:
 Structure: Power of variance covariate
 Formula: ~si
 Parameter estimates:
   power
1.494952

• This improves the model (decreases AIC) substantially.

Now consider serial correlation:

• The ACF plot of m6tph.fpl shows clear evidence of negative autocorrelation among the residuals.

– To deal with this, we consider adding an AR(1) autocorrelation structure to the model (done in two steps, models m7tph.fpl, and m8tph.fpl).

– However, this does not improve the model according to AIC, BIC and has almost no impact on the ACF plot.

– Whenever modeling the autocorrelation has virtually no effect on the ACF plot, we should be suspicious that the problem is not autocorrelation per se, but misspecification of the mean, which manifests as autocorrelated residuals.

– To investigate this, a plot of the residuals by age can help:

[Figure: standardized residuals from m6tph.fpl plotted against Age (yrs), showing a wavy pattern over time.]

• The plot reveals a slight mean misspecification in the model, which results in a wavy shape to the residuals over time. This leads to the large negative autocorrelation at lag 1 found in the ACF plot.

– To remove this autocorrelation, it would probably be necessary to consider another nonlinear form for tph over time which decreases more rapidly from the upper asymptote. For now, however, we satisfy ourselves with the logistic model and live with the apparent autocorrelation.

• Finally, I fit a number of other models in an attempt to simplify the random effect structures (not all shown).

– In particular, in model m9tph.fpl we find that dropping the plot-level random effect in θ_2ijk decreases the AIC (improves the fit).

– Therefore, the “final” model here is m9tph.fpl although, undoubtedly, we could do some further tweaking.
