
Recommendations for the Primary Analysis of Continuous Endpoints in Longitudinal Clinical Trials

Craig H. Mallinckrodt, PhD
Research Advisor, Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, Indiana

Peter W. Lane, MA, CStat
Director of Consultancy and Training, Research Statistics Unit, GlaxoSmithKline, Harlow, United Kingdom

Dan Schnell, PhD
Section Head, Pharmaceutical Statistics, Procter & Gamble Pharmaceuticals, Mason, Ohio

Yahong Peng, PhD
Senior Biometrician, Clinical Biostatistics, Merck Research Lab, Upper Gwynedd, Pennsylvania

James P. Mancuso, PhD
Associate Director, Statistics, Inc, Groton, Connecticut

This position paper summarizes relevant theory and current practice regarding the analysis of longitudinal clinical trials intended to support regulatory approval of medicinal products, and it reviews published research regarding methods for handling missing data. It is one strand of the PhRMA initiative to improve efficiency of late-stage clinical research and gives recommendations from a cross-industry team. We concentrate specifically on continuous response measures analyzed using a linear model, when the goal is to estimate and test treatment differences at a given time point. Traditionally, the primary analysis of such trials handled missing data by simple imputation using the last, or baseline, observation carried forward method (LOCF, BOCF) followed by analysis of (co)variance at the chosen time point. However, the general statistical and scientific community has moved away from these simple methods in favor of joint analysis of data from all time points based on a multivariate model (eg, of a mixed-effects type). One such newer method, a likelihood-based mixed-effects model repeated measures (MMRM) approach, has received considerable attention in the clinical trials literature. We discuss specific concerns raised by regulatory agencies with regard to MMRM and review published evidence comparing LOCF and MMRM in terms of validity, bias, power, and type I error. Our main conclusion is that the mixed model approach is more efficient and reliable as a method of primary analysis, and should be preferred to the inherently biased and statistically invalid simple imputation approaches. We also summarize other methods of handling missing data that are useful as sensitivity analyses for assessing the potential effect of data missing not at random.

Key Words
Missing data; Longitudinal data; Primary analysis; Clinical trials

Correspondence Address
Craig Mallinckrodt, , Lilly Corporate Center, Indianapolis, IN 46285 (email: [email protected]).

INTRODUCTION

In longitudinal trials, efficacy is often assessed in terms of treatment differences at a specific time point, usually the last time at which observations are planned while patients are under treatment. A major difficulty in analyses of such trials is missing data at the chosen time point, often due to patients withdrawing (or dropping out) from treatment. Inference from the results of a trial can be complicated by the method used to handle the missing data because the inference may depend on the method and its assumptions.

Historically, the simple imputation method, called last observation carried forward (LOCF), has been used for the primary efficacy analysis of clinical trials supporting registration of new medicines (1). This approach is simple to carry out and is generally regarded as conservative in that it tends to under- rather than overestimate treatment effects. Although the appropriateness of LOCF hinges on strong assumptions, it is also generally regarded as less biased than an analysis of completing subjects only, potentially counteracting bias caused by differential timing, rates, and reasons for dropout in the various treatment arms.

Over the past 20 years, statistical methodology and software have been developed that allow for the routine use of alternative approaches with less restrictive assumptions than LOCF. These methods are based on analyzing the observations made at all time points. One such longitudinal approach, which has been extensively studied in regulatory settings, uses a model referred to as multivariate, or mixed, and is increasingly denoted in the literature by the abbreviation MMRM (mixed model for repeated measures) (2–14).

Drug Information Journal, Vol. 42, pp. 303–319, 2008 • 0092-8615/2008. Printed in the USA. All rights reserved. Copyright © 2008 Drug Information Association, Inc. Submitted for Publication: August 6, 2007. Accepted for Publication: November 29, 2007.

The MMRM method is from the broader class of direct-likelihood analyses and makes use of fully and partially observed data sequences from individual patients by estimating the covariance between data from different time points (1). As is described in an upcoming section, it is often useful to implement MMRM using an unstructured approach to modeling both the treatment-by-time means and the (co)variances, leading to what is essentially a multivariate normal model wherein treatment group means at the primary time point are adjusted to reflect both the actually observed data and the projected outcomes from the patients with missing data (see, eg, articles by Cnaan et al. [15], Molenberghs and colleagues [5], and Molenberghs and Kenward [1]). Other methods, such as multiple imputation, are also the result of advances in methodology and software but have not been studied as extensively as MMRM in regulatory settings.

Given the strong theoretical and empirical evidence favoring MMRM over LOCF, it is not surprising that use of LOCF as the primary analysis has been questioned by statisticians and clinicians in academic, industry, and regulatory settings. However, regulatory agencies frequently require that primary analyses of efficacy use LOCF. For example, Dr. Linda Yau surveyed statisticians working in phases 2 and 3 from a wide range of therapeutic areas, including neuroscience, antivirals, respiratory, gastrointestinal, urology, and cardiovascular. In her presentation at the DIA Conference in Philadelphia, June 2006, Dr. Yau noted that LOCF was almost universally preferred by regulatory agencies as the primary analysis. However, there was generally no objection to using more recent methods such as MMRM for primary analyses in phase 1, nor for trials on medical devices or diagnostic tests. In addition, plans for some vaccine trials in phase 2 have included MMRM or multiple imputation as the primary analysis.

In our experience, decisions regarding choice of the primary analysis have been hampered by misunderstandings of concepts, some of which stem from inconsistency in terminology. This, in turn, has led to misunderstandings regarding the implications of the research comparing LOCF and MMRM. Additional difficulties may have arisen from differences in the perspectives of pharmaceutical companies and regulators, either real or perceived.

The purpose of this article is to capitalize on the diverse experience of researchers at a number of pharmaceutical companies in order to (1) clarify terminology and concepts regarding use of MMRM and LOCF in regulatory settings, (2) address specific concerns raised by regulatory agencies regarding use of MMRM as the primary analysis, and (3) make specific recommendations for analysis of data from confirmatory longitudinal clinical trials with continuous endpoints.

Regarding our perspective on the choice of primary analysis, this article is the consensus of an expert working team from the Efficiency in Clinical Trials Initiative of the Pharmaceutical Research and Manufacturers of America (PhRMA). We believe there is a compelling public health need to develop drugs using the best possible scientific methods in all disciplines in order to meet patient needs with better and more affordable medicines. We believe regulators share this perspective, as evidenced by the various Critical Path initiatives. Hopefully, this article will help drug developers and regulators achieve their common goal.

TERMINOLOGY AND CONCEPTS REGARDING USE OF MMRM AND LOCF IN REGULATORY SETTINGS

MISSING DATA TERMINOLOGY AND CONCEPTS

In order to understand the potential impact of missing data, the process (ie, mechanisms) leading to the missingness must be considered. The following taxonomy of missing-data mechanisms is now common in the statistical literature (16).

Data are considered missing completely at random (MCAR) if, conditional upon the independent variables in the analytic model, the missingness does not depend on either the observed or unobserved outcomes of the variable being analyzed ($Y$). Data are missing at random (MAR) if,

conditional upon the independent variables in the analytic model, the missingness depends on the observed outcomes of the variable being analyzed ($Y_{\mathrm{obs}}$) but does not depend on the unobserved outcomes of the variable being analyzed ($Y_{\mathrm{miss}}$). Data are missing not at random (MNAR) if, conditional upon the independent variables in the analytic model, the missingness depends on the unobserved outcomes of the variable being analyzed.

Several key points arise from these definitions. First, the characterization of the missingness mechanism does not rest on the data alone; it involves both the data and the model used to analyze the data. Consequently, missingness that might be MNAR given one model could be MAR or MCAR given another. In addition, since the relationship between the dependent variable and missingness is a key factor in the missingness mechanism, the mechanism may vary from one outcome to the next within the same data set. Together, these consequences imply that statements about the missingness mechanism without reference to the analytic model and the specific variable being analyzed are problematic to interpret. They also imply that broad statements regarding missingness and validity of particular analytic methods across specific disease states are unwarranted.

Moreover, terms such as ignorable missingness or informative censoring can be even more problematic to interpret. For example, in the case of likelihood-based estimation, if the parameters defining the measurement process are independent of the parameters defining the missingness process (sometimes referred to as the separability or distinctness condition), the missingness is ignorable if it arises from an MCAR or MAR process but is nonignorable if it arises from an MNAR process (17). In this context, ignorable means the missing-data mechanism can be ignored because unbiased parameter estimates can be obtained from the observed data. Hence, if missing data are described as ignorable or nonignorable, this must be done with reference to both the estimation method and the analytic model. For example, given a certain model, missing data arising from an MAR mechanism might be ignorable if parameters were estimated via maximum likelihood but would not be ignorable if parameters were estimated via a frequentist method that assumes MCAR (18).

These subtleties can be easy to overlook in practice, leading to misunderstandings about missing data and their consequences. For example, when dropout rates differ by treatment group, then it can be said that dropout is not random. But it would be incorrect to conclude that the missingness mechanism giving rise to the dropout is MNAR and that analyses assuming MCAR or MAR would be invalid. Although dropout is not completely random in the simplest sense, if dropout depends only on treatment, and treatment is included in the analytic model, the mechanism giving rise to the dropout would be MCAR. Some authors, such as Little (19), distinguish between pure MCAR (missingness depends on nothing at all) and covariate-dependent MCAR. The previous example could therefore also be described as being covariate-dependent MCAR.

CONCEPTS AND CHARACTERIZATIONS OF LAST OBSERVATION CARRIED FORWARD

Although this section focuses on LOCF, many of the points also apply to baseline observation carried forward (BOCF). LOCF is not itself an analytic approach, but rather a method for imputing missing values. Therefore, the appropriateness of an analysis using LOCF depends on both the assumptions of LOCF and the assumptions of the method used to analyze the data. When assessing LOCF mean change via analysis of variance (ANOVA), the key assumptions are that missing data arise from an MCAR mechanism and that for subjects with missing endpoint observations, their responses at the endpoint would have been the same as their last observed values.

The following example, using the hypothetical data in Table 1, illustrates the handling of missing data via LOCF: For patient 3, the last observed value, 19, is used in the computation of the mean change to endpoint for treatment group 1; and for patient 6, the last observed value, 20, is used in the computation of the mean


change to endpoint for treatment group 2. The analysis does not distinguish between the actually observed data and the imputed data.

TABLE 1. Hypothetical Data Used to Illustrate How Various Methods Handle Missing Data

                                        Week
Patient  Treatment  Baseline    1    2    3    4    5    6
   1         1         22      20   18   19   14   12   10
   2         1         22      21   18   11   12   11    6
   3         1         22      22   21   20   19    *    *
   4         2         20      20   20   20   19   21   22
   5         2         21      22   22   23   23   25   26
   6         2         18      19   20    *    *    *    *

*Missing values due to patient dropout.

Even when the assumptions for LOCF hold, it must also be recognized that because LOCF is a single-imputation method, the uncertainty of imputation is not taken into account. Therefore, the analysis will, in essence, think more data exist than is actually the case (17). This well-known limitation of LOCF results in systematic underestimation of the standard errors (7,8,20).

CONCEPTS AND CHARACTERIZATIONS OF MMRM

Likelihood-based mixed-effects models offer a general framework from which to develop longitudinal analyses under the MAR assumption (15,17). Laird and Ware (21) introduced the general linear mixed-effects model to be any model that satisfies

$$
\begin{aligned}
Y_i &= X_i \beta + Z_i b_i + \varepsilon_i \\
b_i &\sim N(0, D) \\
\varepsilon_i &\sim N(0, \Sigma_i) \\
b_1, \ldots, b_n,\ & \varepsilon_1, \ldots, \varepsilon_n \ \text{independent}
\end{aligned}
\tag{1}
$$

where $Y_i$ is the $n_i$-dimensional response vector for subject $i$; $\beta$ is the $p$-dimensional vector of fixed effects; $b_i$ is the $q$-dimensional vector of random (subject-specific) effects; $X_i$ and $Z_i$ are $(n_i \times p)$- and $(n_i \times q)$-dimensional matrices relating the observations to the fixed and random effects, respectively; $\varepsilon_i$ is the $n_i$-dimensional vector of residuals; $D$ is a general $(q \times q)$-dimensional covariance matrix with $(i,j)$ element $d_{ij} = d_{ji}$; and $\Sigma_i$ is a general $(n_i \times n_i)$-dimensional covariance matrix (usually the same for all $i$). It follows from this model that, marginally,

$$
Y_i \sim N(X_i \beta, V) \quad \text{and} \quad V = Z_i D Z_i^{\prime} + \Sigma_i
$$

A key general feature of mixed-effects models is that they include fixed and random effects, whereas ANOVA models include only fixed effects (apart from the residuals). In clinical trials, the subject-specific (random) effects are seldom the focus. Rather, the trials are typically designed to assess differences in fixed effects, most notably treatment effects. However, accounting for the random effects is important in order to make the most appropriate inferences regarding the fixed effects. Indeed, not doing so would typically affect the precision of estimates and result in incorrect inferences.

A simple formulation of the general linear mixed model (Eq. 1) can be implemented in which the random effects are not explicitly modeled, but rather are included as part of the marginal covariance matrix $V$, just defined, leading then to what could alternatively be described as a multivariate normal model. Modeling the random effects as part of the within-patient error correlation structure is the feature that distinguishes MMRM from other implementations of mixed-effects models.

The following example, using the hypothetical data in Table 1, illustrates the handling of missing data via an MMRM analysis: Information


from the observed outcomes is used via the within-patient correlation structure to provide information about the unobserved outcomes, but missing data are not explicitly imputed. Specifically, patient 3 had been doing worse than the average of patients in treatment group 1. Means for treatment group 1 at visits 5 and 6 are adjusted to reflect that had patient 3 stayed in the trial, her observations at visits 5 and 6 would likely have been worse than the treatment group average. But the analysis predicts that patient 3 would have had some additional improvement because the other patients in group 1 all improved. Patient 6 had also been doing marginally worse than the average of patients in his group (treatment group 2). Means for treatment group 2 at visits 3–6 are adjusted to reflect that had patient 6 remained in the trial, his observations would likely have continued to worsen at a rate slightly greater than the treatment group average.

The magnitudes of these adjustments are determined mathematically from the data. Additional details can be found elsewhere (15,17,21). Although these details go beyond the scope of this article, the basic principle is easily appreciated. A mixed-effects analysis uses all the available data ($Y_{\mathrm{obs}}$) to compensate for the data missing on a particular patient, whereas LOCF uses only one data point. Again, using the hypothetical data in Table 1, in dealing with the missing data for patient 3, a mixed-effects analysis considers data from visits 1–4 on patient 3 as well as all the data from patients 1 and 2. In contrast, LOCF uses only the visit 4 value from patient 3, assuming that visit 6 will be the same as visit 4, even though that was not the case for any patient whose data were observed.

SPECIFIC CONCERNS RAISED BY REGULATORY AGENCIES REGARDING USE OF MMRM AS THE PRIMARY ANALYSIS

It is widely recognized that the restrictive assumptions for ANOVA with LOCF seldom hold (1,17). It has also been clearly established that when data fail to conform to these assumptions, use of LOCF can lead to biased estimates of treatment effects, biased tests of the null hypothesis of no treatment effect, underestimates of standard errors, inflated type I error, and coverage probabilities that may be far from the nominal level (1,2,4–8,11,12,17,23–31).

The assumption of MAR is often reasonable because, particularly in longitudinal studies wherein the evolution of treatment effects is assessed by design over time, the observed data and the models used to analyze them can explain much of the missingness (16,17). This point may be especially relevant in well-controlled studies such as clinical trials, in which extensive efforts are made to observe all the outcomes and the factors that influence them while patients are following protocol-defined procedures (32). Hence, longitudinal clinical trials by their very design aim to reduce the amount of MNAR data (missingness explained by unobserved responses), thereby increasing the plausibility of MAR. Further, it is evident that MAR is always more plausible than MCAR because MAR is always valid if MCAR is valid, and MAR can be valid in cases when MCAR is not.

Despite the advantages of MAR methods, LOCF is still favored for use as the primary analysis in many therapeutic areas. The following sections address the concerns cited by regulatory agencies when considering MMRM for the primary analysis, along with responses to those concerns.

LOCF IS CONSERVATIVE

One of the reasons often cited for the continued widespread use of LOCF is that the potential biases in LOCF lead to a conservative analysis. In this context, conservative is typically thought of as underestimating the magnitude of the treatment effect.

It is intuitively obvious that LOCF yields conservative estimates of within-group changes in many scenarios. However, interpretations of treatment effects are based on between-group comparisons. Results from empirical studies (3,7,8,11,12,17,22,23) have clearly shown that conservative behavior of LOCF in regard to between-group comparisons is far from guaranteed and, in fact, is in some scenarios unlikely.
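The between-group arithmetic behind an LOCF analysis can be made concrete with the hypothetical data in Table 1. The following Python sketch is an illustration only and is not part of any analysis reported in the cited studies: the names `TABLE_1`, `locf_endpoint`, and `mean_change_by_group` are hypothetical helpers introduced for this example, and the completers-only contrast is included purely as a point of comparison. The sketch carries the last observed value forward for patients 3 and 6 and then forms the difference between treatment groups in mean change from baseline to week 6.

```python
from statistics import mean

# Hypothetical Table 1 data: (patient, treatment, baseline, [week 1..week 6]).
# None marks a value missing because the patient dropped out.
TABLE_1 = [
    (1, 1, 22, [20, 18, 19, 14, 12, 10]),
    (2, 1, 22, [21, 18, 11, 12, 11, 6]),
    (3, 1, 22, [22, 21, 20, 19, None, None]),
    (4, 2, 20, [20, 20, 20, 19, 21, 22]),
    (5, 2, 21, [22, 22, 23, 23, 25, 26]),
    (6, 2, 18, [19, 20, None, None, None, None]),
]


def locf_endpoint(baseline, weeks):
    """Endpoint value under LOCF: the last observed post-baseline value.

    Falls back to the baseline value if nothing was observed after baseline
    (an assumption made for this sketch).
    """
    last = baseline
    for value in weeks:
        if value is not None:
            last = value
    return last


def mean_change_by_group(rows, endpoint):
    """Mean change from baseline to endpoint for each treatment group.

    `endpoint` maps (baseline, weeks) to an endpoint value, or to None when
    the patient should be left out of the computation.
    """
    changes = {}
    for _patient, treatment, baseline, weeks in rows:
        value = endpoint(baseline, weeks)
        if value is not None:
            changes.setdefault(treatment, []).append(value - baseline)
    return {treatment: mean(vals) for treatment, vals in changes.items()}


# LOCF: imputed endpoints are treated exactly like observed ones.
locf = mean_change_by_group(TABLE_1, locf_endpoint)

# Completers only: patients without a week 6 value are excluded.
completers = mean_change_by_group(TABLE_1, lambda baseline, weeks: weeks[-1])

# Between-group contrast: treatment 1 minus treatment 2.
locf_contrast = locf[1] - locf[2]
completers_contrast = completers[1] - completers[2]

print(f"LOCF mean changes: {locf}, contrast: {locf_contrast:.2f}")
print(f"Completers mean changes: {completers}, contrast: {completers_contrast:.2f}")
```

On these six hypothetical patients the LOCF contrast is about -13.3 and the completers-only contrast is -17.5. With so few patients this says nothing about the direction of bias in general; it only shows that the carried-forward values enter the between-group contrast exactly as if they had been observed at week 6.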


The potential for anticonservative behavior of LOCF has also been confirmed via analytic proof (5,27).

These are not just theoretical concerns. In a summary of all the outcomes from all the placebo-controlled clinical trials included in a new drug application, LOCF yielded a smaller P value than MMRM in 34% of the 202 comparisons (13).

When LOCF underestimates the superiority of the superior treatment, it necessarily underestimates the inferiority of the inferior treatment. Thus, such a bias would be anticonservative in noninferiority testing and in a superiority test wherein the test agent is inferior to the control. For safety outcomes, underestimating the magnitude of a treatment effect is certainly not conservative.

The magnitude and direction of the bias from LOCF depend on the average attenuation from flattening out the mean profile in the treatment group compared with the control group. This in turn depends on the rate and timing of the missing data and the rate of change in the trajectory that is being attenuated. It has been shown that the bias from LOCF may involve many factors and can be complex (5). Whether or not the bias from LOCF leads to conservative estimates of treatment benefits is yet another question that further depends on the disease state and scenario. For example, the same direction of bias might be conservative if, on average, patients improve, but would be anticonservative if the treatment goal were the delay of worsening or maintenance of effect (9,12). Hence, it is difficult to anticipate the effects of bias from LOCF in practical situations. However, the general tendencies of LOCF for scenarios in which the overall tendency is for improvement (progressive improvement) include the following:

1. Overestimation of a drug’s advantage when dropout is higher or earlier in the comparator, and underestimation of its advantage when dropout is lower or later in the comparator
2. Overestimation of a drug’s advantage when the advantage is maximum at intermediate time points, and underestimation of its advantage when the advantage increases over time
3. A greater effect of bias on inferences regarding existence of a treatment effect when a drug’s advantage is small and when sample sizes are large

For scenarios in which the overall tendency is for worsening (progressive impairment), the biases in (1) and (2) are reversed.

Additionally, if a method yielded biased estimates of treatment effects when treatment differences truly existed, then when the true treatment difference was zero, bias would necessarily lead to nonzero estimates of treatment differences and inflation of type I error. Moreover, consider Alzheimer disease, wherein the therapeutic aim is to delay or slow deterioration of mental status (as compared to situations such as depression in which the goal is to improve the condition). If a treatment is in truth no more effective than placebo, but a patient drops out early in the treatment arm, carrying the last observation forward assumes that the patient had no further deterioration in condition.

A similar bias can occur in so-called maintenance studies. For example, in a weight maintenance study, patients who lose a substantial amount of body weight through nonpharmacological means begin drug therapy with the goal of maintaining the initial weight loss (33). Any patient who drops out early is likely to have regained little weight. Therefore, applying LOCF in this scenario assumes the patient maintained the weight loss, even though that patient was observed for only a short time.

It is reasonable also to question whether conservative analytic approaches are in the best interest of patients. Of course inflation of type I error is never good, but is it necessary to have “extra protection” against type I error when it comes at the expense of losing power, that is, inflated type II error? Moreover, independent of missing-data concerns, using a method that includes only the first and last observation is inherently inefficient. Given the unmet medical needs and the rising costs of health care, the need for new medicines, better medicines, and more affordable medicines is clear. Many factors may influence the success of drug development; however, the reliance on a method such as LOCF


(or BOCF) with known inflation of type I and type II error is an obvious suspect. Therefore, trading reduced power for more than the needed protection against type I error is not in the public interest.

Rather than defining conservatism as underestimating the magnitude of the treatment effect, what if conservatism were defined as being dependable? In the context of a statistical analysis, this might be defined as yielding the type I and type II error rates that were expected. With this dependability, clinical development of individual drugs could be more predictable, while preserving public safety against ineffective drugs. So, by this definition, MMRM is clearly more conservative than LOCF.

EFFECT OF MNAR DATA

As we have previously noted, MMRM assumes data are MAR, which is often a reasonable assumption in longitudinal clinical trials and always at least as plausible as MCAR. However, the possibility of MNAR data can never be ruled out. Therefore, it is not surprising that when MMRM has been proposed as the primary analysis, regulators have asked about the impact of MNAR data on the MMRM results. The more relevant question is, what is the impact of MNAR data on MMRM compared with their impact on LOCF? Research on the comparative impact of MNAR data on MMRM and LOCF is summarized in this section.

Mallinckrodt and colleagues (7,8,11) compared MMRM with LOCF ANOVA in a series of simulation studies with MNAR missingness. The first study included scenarios in which there was a true difference between treatments in mean change from baseline to endpoint. The second study focused on type I error rates by simulating scenarios in which the difference between treatments in mean change from baseline to endpoint was zero. In both studies, comparisons were made in data before introducing missingness (complete data) and in the same data sets after eliminating data via an MNAR mechanism. In analyses of complete data, MMRM and LOCF yielded identical results. Estimates of treatment effects were not biased, and standard errors accurately reflected the uncertainty in the data; however, there were important differences in results between the methods in analyses of incomplete data.

In the study in which there were treatment differences at endpoint, the MMRM estimates were closer to the true value than estimates from LOCF in every scenario simulated. Standard errors from MMRM accurately reflected the uncertainty of the estimates, whereas standard errors from LOCF underestimated uncertainty. Pooled across all scenarios, confidence interval coverage (percentage of confidence intervals containing the true value) was 94% and 87% for MMRM and LOCF, respectively, compared with the expected coverage rate of 95%. Notably, LOCF overestimated the treatment effect in some scenarios, typically when there was higher dropout in the inferior (eg, placebo) group.

In the type I error rate study, pooled across all scenarios with missing data, the type I error rates for MMRM and LOCF were 5.9% and 10.4%, respectively, compared with the expected rate of 5%. Type I error rates in the 32 scenarios ranged from 5.0% to 7.2% for MMRM and from 4.4% to 36% for LOCF.

The third study (11) included a factorial arrangement of scenarios, with four patterns of mean change over time and three true correlation structures (autoregressive, compound symmetry, and unstructured). The mean change patterns included two scenarios in which the null hypothesis of no difference between treatments in mean change from baseline to endpoint was true and another two scenarios in which it was false. Data from each scenario were analyzed using MMRM with each correlation structure and with LOCF. The intent in using these correlation structures was not to advocate their use, but to use very different structures to assess how MMRM would compare with LOCF under extreme conditions of misfitting the correlation structure.

In most cases, the type I error rates from LOCF were greater than or equal to those from any of the corresponding MMRM analyses, indicating that even egregious misfitting of the correlation structure with MMRM was typically less deleterious than using LOCF. Importantly, the use of an unstructured covariance matrix in MMRM, regardless of the form of the true covariance matrix, yielded superior control of type I error compared with LOCF in every scenario investigated. In pooled results from scenarios under the true null hypothesis, the type I error rate from MMRM using an unstructured correlation matrix was 6.2% compared with 9.8% for LOCF.

When the true treatment difference was large and dropout rate was higher in the superior arm, MMRM with an unstructured covariance matrix produced an average estimated treatment difference of 12.6 compared with 9.1 from LOCF and the true value of 12. The average power from MMRM was 75% compared to 59% for LOCF and 81% for both methods in the complete data.

In contrast, when the true treatment difference was small and dropout rate was greater in the inferior arm, MMRM with an unstructured covariance matrix produced an average estimated treatment difference of 2.9 compared to 5.2 for LOCF and a true value of 4. The average power from MMRM was 10% compared with 22% from LOCF and 17% with both methods in complete data. This apparent increase in power with LOCF, despite an overall 35% dropout rate, was driven by the bias in its estimates of treatment effect.

Lane (3) conducted simulation studies based on six actual clinical trial data sets. For each trial, multiple sets of data were generated from multivariate normal distributions, with means and covariances set to the estimates obtained from the actual data. Observations were removed from the simulated data to give missing values in accordance with an MNAR mechanism using three different models for probability of dropout depending on the next observation (treated as unobserved). Both equal and differential dropout rates between treatments were investigated.

LOCF led to misinterpretation of results when the dropout mechanism was not MCAR (MAR or MNAR), particularly in cases with differential dropout rates. In contrast, MMRM led to misinterpretation only in cases in which data were MNAR with substantial differential dropout. In the majority of comparisons under MNAR data, MMRM resulted in bias that was less than or equal to that obtained with LOCF. Of 63 comparisons, 42 resulted in at least 10% less bias with MMRM, 10 were about the same, and 11 showed more bias with MMRM than with LOCF. In 6 of the 11 cases in which LOCF was less biased, the treatment difference (on which percentage bias was based) was very small.

Additionally, the perturbations in power caused by MMRM tended to be less than those for LOCF and less subject to extreme differences from the nominal values. Use of MMRM rarely caused a difference in power greater than 20%, whereas use of LOCF caused such a difference in nearly half of the simulations conducted.

Across these studies, the magnitude of bias produced by MNAR data was smaller with MMRM than with LOCF, and MMRM provided more robust control of type I and II error rates than LOCF. Furthermore, in actual clinical trial data, MMRM yielded results similar to those of a selection model (MNAR) approach, and it was determined that MMRM was an appropriate primary analysis for these data (5,14).

DETERMINING AND DEFINING APPROPRIATE MODELING CHOICES FOR MMRM

Regulators have noted that using MMRM entails more explicit modeling choices than using LOCF. While true, the rather modest increase in complexity of MMRM has not been a hindrance in implementation.

When determining a suitable model for a study collecting longitudinal data, it is important to realize that no universally accepted “best” model can be prespecified for the data eventually obtained. However, the main characteristics of the data will be driven by the design of the study. And, to a large degree, an appropriate MMRM model follows logically from the design of the study and thus can be adequately prespecified.

Three important characteristics to consider when specifying a model for data from longitudinal clinical trials are the random effects, the

Recommendations for Primary Analysis STATISTICS 311

correlations between the repeated measurements (within-patient errors), and the time trends.

As noted in the earlier section “Terminology and Concepts Regarding Use of MMRM and LOCF in Regulatory Settings,” the feature distinguishing MMRM from other mixed-effects analyses is the modeling of the random effects as part of the within-patient error correlation structure. Handling the random effects in this manner simplifies the analysis while having no (or very little) impact on inferences of treatment effects.

Clinical trials often have a common schedule of measurements for all patients, with a large number of patients and a relatively small number of measurement occasions. With such a data structure, MMRM can be implemented using a full multivariate model, featuring an unstructured modeling of time and correlation (10). If the number of patients relative to the number of measurement occasions is not large, more parsimonious approaches are easily implemented. For example, time trends could be modeled using linear and quadratic effects, and some structured form of the V matrix could be fit to the within-patient correlations.

However, the functional form of the longitudinal trends can be difficult to anticipate, and in particular, linear time trends may not adequately describe the response profiles. A parsimonious model using a structured form of the time trends could be more powerful than an unstructured model, but it could also be a poor fit. Therefore, in many scenarios, an unstructured modeling of time and the treatment-by-time interaction provides an assumption-free approach, does not require estimation of an inordinate number of parameters, and can be depended upon to yield a useful result: attributes well suited to the primary analysis.

It also is worth noting that an MMRM analysis using the full multivariate approach (unstructured modeling of time, treatment-by-time, and within-patient errors) for analyses of complete data (no missing observations) yields the same inference about the endpoint as an analysis of that endpoint by itself. That is, with fully unstructured treatment-by-time effects (and within-patient errors), MMRM and LOCF yield identical treatment contrasts if no data are missing.

An unstructured modeling of within-patient correlations also removes one layer of assumptions and often provides the best fit to the data. However, overly general correlation structures can lead to an analysis that fails to converge. Although failure to converge often results from improperly preparing the data (eg, two observations on the same patient at the same time point, or poor choice of options in software), a priori specification of the primary analysis must have flexibility to allow alternative models to be fit if an analysis fails to converge because the prespecified correlation structure is too general.

Several approaches can be taken to ensure convergence. First, every attempt should be made to ensure convergence is obtained from a given correlation structure. For example, convergence can be enhanced by using software features such as the inputting of starting values for parameter estimates, or the use in the initial rounds (but not final rounds) of iteration algorithms such as Fisher’s scoring rather than the Newton-Raphson algorithm, which is the default algorithm in many software packages. Rescaling the data is also an option. If outcomes and covariates are made to fall in ranges in the order of magnitude of unity, interpretations and conclusions will not be changed; but avoiding manipulation of large or small numbers from a numerical analysis perspective reduces the risk of ill-conditioned matrices, and ultimately, overflow or underflow. In addition, the protocol can envision one of several model-fitting approaches. One could simply specify a set of structures to be fit, and use as the primary analysis the one yielding the best fit as assessed by standard model-fitting criteria. However, if one does not want to build models from the same data from which hypotheses are to be tested, a series of structures could be specified in a fixed sequence, and the first correlation structure to yield convergence would be considered the primary analysis. For example, unstructured could be specified as the structure for the primary analysis; but if it failed to converge, a series of ever more parsimonious structures appropriate to the situation at hand could be fit until one converges, which would then be considered the primary analysis.

Although these approaches have always yielded a converged analysis in our experience, it is reasonable to wonder what effect the true correlation structure in the data, and the method of modeling the correlation structure, have on results. One study (11) assessed the effect of correlation structure and how it is modeled on type I error rates and power, and compared results from MMRM with LOCF. Results of this study are detailed in the earlier section, “Effect of MNAR Data.” When the correct correlation structure was fit, MMRM provided better control of type I error and power than LOCF. Although misfitting the correlation structure in MMRM inflated type I error and altered power, even egregious misfitting of the structure was typically less deleterious than using LOCF. In fact, simply using an unstructured model in MMRM yielded better control of type I error than LOCF in every scenario tested.

Therefore, MMRM provides flexibility for modeling the within-patient correlation structure, does so in a manner that can be specified a priori, ensures that analysts following those specifications will independently arrive at exactly the same result, and even in worst-case scenarios provides estimates of treatment effects with less bias than LOCF. All of these attributes are well suited to the primary analysis.

Another aspect of time trends that must be considered is in relation to the covariates represented by β in the model (Eq. 1). The treatment effect is clearly the most crucial in pharmaceutical clinical trials, and the interaction of this effect with time has been discussed previously. Other covariates are often also included in the model, and these must also be considered. For example, the effect of a baseline observation may be included, as subjects’ responses may be considered dependent on their condition at the start of the trial. In this case, too, it is usually preferable to allow a full interaction of the covariate with time, for if not, a restriction is being imposed that the dependence of response on the baseline measure is the same at all time points. Alternatively, both the baseline and postbaseline measures can be treated as response variables under the assumption that the baseline means are the same across treatment groups to reflect randomization (34). For other covariates, such as age and gender, it may be considered appropriate to include no interaction with time because the effects can be taken to be constant; but such decisions need to be taken and explained at the stage of planning the analysis.

The following example illustrates one way to specify a priori all the details of an MMRM analysis such that independent analysts will arrive at exactly the same results. This particular wording specifies the full multivariate approach, with an unstructured modeling of treatment effects over time and within-patient error correlations.

Mean changes from baseline will be analyzed using a restricted maximum likelihood (REML)-based repeated measures approach. Analyses will include the fixed, categorical effects of treatment, investigative site, visit, and treatment-by-visit interaction, as well as the continuous, fixed covariates of baseline score and baseline score-by-visit interaction. An unstructured (co)variance structure will be used to model the within-patient errors. If this analysis fails to converge, the following structures will be tested: (insert a list of structures appropriate for the specific application). The (co)variance structure converging to the best fit, as determined by Akaike’s information criterion, will be used as the primary analysis. The Kenward-Roger approximation will be used to estimate denominator degrees of freedom. Significance tests will be based on least-squares means using a two-sided α=.05 (two-sided 95% confidence intervals). Analyses will be implemented using (insert software package). The primary treatment comparisons will be the contrast between treatments at the endpoint visit.

Note that the primary analysis could be based on contrasts at time points other than endpoint, or could be based on the treatment main effects.
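The two prespecified fallback rules discussed in this section (take the first structure in a fixed sequence that converges; or use unstructured if it converges and otherwise the converged alternative with the best Akaike’s information criterion) can be sketched in code. This is a minimal illustration under stated assumptions, not the authors’ implementation: `FitResult`, `first_converging`, `unstructured_else_best_aic`, the structure names, and the canned AIC values are all hypothetical, and `fake_fit` merely stands in for a call to whatever mixed-model software the protocol names.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class FitResult:
    """Outcome of fitting one covariance structure (hypothetical container)."""
    structure: str
    converged: bool
    aic: float  # smaller is better; not meaningful when converged is False


def first_converging(fit: Callable[[str], FitResult],
                     sequence: Sequence[str]) -> Optional[FitResult]:
    """Fixed-sequence rule: return the first structure that converges."""
    for name in sequence:
        result = fit(name)
        if result.converged:
            return result
    return None  # nothing in the prespecified sequence converged


def unstructured_else_best_aic(fit: Callable[[str], FitResult],
                               fallbacks: Sequence[str],
                               primary: str = "unstructured") -> Optional[FitResult]:
    """Use `primary` if it converges; otherwise pick the converged
    fallback structure with the smallest AIC."""
    result = fit(primary)
    if result.converged:
        return result
    converged = [r for r in map(fit, fallbacks) if r.converged]
    return min(converged, key=lambda r: r.aic) if converged else None


# Canned outcomes for illustration only; a real fit would call the
# mixed-model software. All names and numbers below are made up.
_CANNED = {
    "unstructured": FitResult("unstructured", False, float("nan")),
    "heterogeneous_toeplitz": FitResult("heterogeneous_toeplitz", True, 1012.4),
    "heterogeneous_ar1": FitResult("heterogeneous_ar1", True, 1009.8),
    "compound_symmetry": FitResult("compound_symmetry", True, 1031.0),
}


def fake_fit(name: str) -> FitResult:
    """Stand-in for a model fit: look up the canned outcome."""
    return _CANNED[name]


ORDER = ["unstructured", "heterogeneous_toeplitz",
         "heterogeneous_ar1", "compound_symmetry"]

by_sequence = first_converging(fake_fit, ORDER)
by_aic = unstructured_else_best_aic(fake_fit, ORDER[1:])
print(by_sequence.structure)
print(by_aic.structure)
```

The two rules can select different structures from the same fits, which is one reason the choice between them needs to be written into the protocol in advance.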


LOCF JUSTIFIED AS A FACTUAL, COMPOSITE, OR EFFECTIVENESS ENDPOINT

Literally taken, the acronym LOCF implies imputation of missing values in a longitudinal context. An alternative interpretation of LOCF is commonly used that might be better termed LO (last observation) or LAV (last available value). In this approach, results are not interpreted as imputations of missing data with changes assessed at a specific time point, but rather as the change that was actually seen at last observation regardless of when it was observed (9). When LOCF is used in this manner, it is said to estimate a factual outcome in that it estimates what was actually (factually) observed at the last assessment, regardless of when that observation was made. In this same context, MMRM is said to be estimating a counterfactual outcome in that it estimates the effect that would have been observed had patients stayed in the trial, contrary to the fact that some patients dropped out (27).

The use of LOCF in the factual context stems from its intuitive appeal as a pragmatic measure of effectiveness, a composite of efficacy, safety, and tolerability (9,12,27). However, the fact that it is easy to understand and calculate an LOCF value should not be confused with its yielding a meaningful measure. If one were to objectively seek a factual or all-encompassing assessment, it seems unlikely that one would arrive at LOCF.

First, the primary purpose of confirmatory clinical trials is typically to delineate causal differences between drug and placebo (or between drugs), not to mimic actual clinical practice. It is unreasonable to assume that doctors and patients make the same decisions regarding continuation of therapy in a double-blind trial, in which they are unsure about whether the patient is taking drug or placebo, as they would make in actual practice, when the drug and its properties are well known. Therefore, the rates and reasons for dropout within the strictly controlled conditions of a confirmatory clinical trial are unlikely to mimic what would happen in general use. If effectiveness were the primary objective, the best place to assess it would be in a general medical (ie, naturalistic) setting. When causal effects are the primary objective, the gold standard design is a double-blind, randomized clinical trial. Hence, using LOCF in a factual context is inconsistent with the design and primary objective of confirmatory clinical trials.

Furthermore, the rate and timing of dropout do not necessarily reflect the true benefit and risk of the drug. While LOCF can in some situations yield smaller estimates of treatment differences when patients drop out due to adverse events, the reduction is not necessarily proportional to the safety risk (9). For example, consider the following two patients in an 8-week trial: patient A dropped out after week 7 because of a dramatically prolonged QT interval; patient B dropped out during week 1 with nausea. The impact on estimates of mean change resulting from patient A’s dropout was small because the last observation was close to the trial’s endpoint, whereas the impact from patient B’s dropout was severe because (in many disease states) little improvement results from one week of treatment. However, patient A developed a potentially life-threatening condition, whereas the nausea experienced by patient B early in the trial is typically transitory and often resolves with continued therapy and no long-term consequences. This nonproportional penalty to individual patients from LOCF may cause misleading inferences regarding the merits of a treatment.

As an even more extreme, but common, example, consider the Alzheimer disease scenario noted in the earlier section, “LOCF Is Conservative,” in which the therapeutic aim is to delay or slow deterioration of mental status. Using the last observation from a patient who dropped out early from the treatment arm due to an adverse event (AE) would actually reward the drug for the AE, as the patient would appear to have had no further deterioration in condition.

A related point is that an LOCF result used in this manner does not correspond to a population parameter that can be prespecified. It is essentially a composite whose components (efficacy, safety, tolerability) have unknown, or at least random, weights. Hence, using LOCF in a hypothesis-testing setting violates the fundamental approach of statistics wherein we attempt to make inference about population parameters. If a composite measure of efficacy, safety, and tolerability is of primary interest, then it would be better to have a prespecified measure that can capture these facets uniformly for each patient, such as an a priori–defined clinical utility index.

It is sobering to recognize that an LOCF result may be manipulated by design factors and the behavior of investigative site staff who encourage prolonged participation, possibly making the drug look better, but of course not altering the inherent risk-benefit of the drug. This may take place, for example, when an extension period is added to the randomized part of a clinical trial. Minimizing dropout is widely accepted as good scientific practice. However, the concern here is that the amount of dropout should not directly change the measure of interest, parameter being estimated, or the hypothesis that is being tested.

Therefore, while it is true that MMRM estimates a hypothetical parameter in that not all patients stay on to the specific time points at which mean changes are estimated, the use of LOCF in the composite or factual context is also fraught with many problems. Importantly, the hypothetical nature of the adjusted means from MMRM is not in practice a hindrance to interpretation. One can take the efficacy results from MMRM and combine them with the various safety and tolerability results in an ad hoc manner, as has traditionally been done, or in a formal clinical utility index to assess the overall benefit-risk of the drug.

In fact, rather than viewing the hypotheses tested by LOCF and MMRM as factual and counterfactual, one might view the hypothesis tested by MMRM as what is expected when patients take the drug as directed, whereas LOCF tests what is expected when the drug is taken as observed. Both are useful; the key is to match the hypothesis with the stage of development and design of clinical trial. The hypothesis tested by MMRM is aligned with confirmatory clinical trials utilizing double-blind, randomized designs, whereas the hypothesis tested by LOCF is best evaluated in naturalistic settings.

It is also important to recognize that an endpoint analysis of any type is able to provide only a small part of the overall picture, and that the entire longitudinal treatment profile should be considered in order to address such questions as “How soon until I feel better?” or “How soon until I feel well?” Longitudinal methods such as MMRM are ideally suited to provide such information from the same analysis as that which produces the endpoint contrast.

HANDLING NONIGNORABLE MISSINGNESS (MNAR)

Although the assumption of MAR is often reasonable in clinical trials, the possibility of data missing not at random (MNAR) is difficult to rule out. Therefore, analyses valid under MNAR are needed. Analyses in the MNAR framework try in some manner to model or otherwise take into account the missingness. Although reasons for (early) discontinuation are routinely collected in clinical trials, they may not reveal much about the missing-data mechanism, and modeling or incorporating information about the missingness into the data analysis may not be straightforward.

The obvious but fundamental problem is that we do not have the missing data, so we cannot definitively know its characteristics; we can only make assumptions. Conclusions from MNAR analyses are therefore conditional on the appropriateness of the assumed model. While dependence on assumptions is not unique to MNAR analyses, a unique feature with MNAR analyses is that (some of) the assumptions are not testable (35) because we do not have the missing data about which the assumptions are made (36).

Importantly, the consequences of model misspecification are more severe with MNAR methods than with other (eg, MAR) methods (19,36–49). Hence, no individual MNAR analysis can be considered definitive. Not surprisingly then, many statistical methodologies have been proposed to analyze data in the MNAR setting. General classes of MNAR methods have arisen from different factorizations of the likelihood functions for the joint distribution of the outcome variable and the indicator variable for whether or not a data point is observed. Factorization in this context means that the hypothetical “full” data are split into two parts: the actually observed part and the missing part, which are often described as the measurement process and the missingness process, respectively.

The selection model framework (16,18,41) describes the full data likelihood as the product of the marginal density of the measurement process and the density of the missingness process conditional on the outcomes. Conceptually, a selection model as typically implemented can be thought of as a multivariate analysis. The first outcome variable is the same outcome being analyzed as in an MAR analysis, typically a mean change analysis. The second variable is the indicator variable for dropout, often analyzed via logistic regression. Selection models have been formulated in parametric (41) and semiparametric (42) frameworks.

Pattern-mixture models (43,44) are based on factorization of the full data likelihood as the product of the measurement process conditional on the dropout pattern and the marginal density of the missingness process. Conceptually, pattern-mixture models as typically implemented assess the outcome variable separately for different groups (patterns), often defined by time of dropout, and then combine results across groups for final inference.

A third approach, the shared-parameter model (19,45–49), is similar to selection models in that it jointly models the measurement and dropout processes. Shared-parameter models assume that a certain parameter, typically a random effect, influences both the outcome variable and dropout, such that conditional upon this parameter, the measurement and dropout processes are independent.

The conceptual similarity between these different approaches is that they go beyond ignorability by adding something to the analysis to account for the dropout. Another strategy is to add ancillary variables to the analysis of the outcome variable of interest in order to explain the dropout. The basic idea is that data are MAR if, conditional upon the variables in the model, missingness does not depend on the unobserved outcomes of the variable being analyzed. Therefore, if additional (ancillary) variables that help explain missingness are added to the model, MAR can be valid; whereas if the additional variables were not included, the data would be MNAR.

Collins and colleagues (50) state that multiple imputation (MI), originally proposed by Little and Rubin (16) as an MAR method, is well suited to improving the performance of the missing-data procedure through the use of ancillary variables. They also note that ancillary variables can be included in likelihood-based analyses (such as MMRM). This could be done by adding the ancillary variable either as a covariate or as an additional response to create a multivariate analysis. However, the complexity of multivariate analyses and the features of most of the commercially available software make it easier to use ancillary variables via multiple imputation. Liu and Gould (6) and Lipkovich et al. (51) provided implementations of MI with ancillary variables (including AE information) in clinical trial contexts. In addition, MI has the added advantage that with separate steps for imputation and analysis, ancillary variables that are postbaseline, time-varying covariates (possibly influenced by treatment) can be included in the imputation step to account for missingness but then not included in the analysis step to avoid confounding with the treatment effects, as might be the case in a likelihood-based analysis.

Although methods to test for the existence and impact of outlier (influential) observations have been around for decades, new methods have been developed for use in MNAR analyses. To this end, interest has grown in local influence approaches (14,52–57), which are often associated with selection models. Local influence provides an objective approach to identifying and examining the impact of influential data points and patients on various aspects of the analysis, including the missing-data mechanisms and treatment effects. Shen and colleagues (14) provide a case study of a longitudinal depression trial showing how local influence can be helpful in conducting sensitivity analysis.
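For reference, the three likelihood factorizations described in this section (selection, pattern-mixture, and shared-parameter models) can be written compactly. The notation is an assumption introduced here for illustration and is not taken from the article: $y$ denotes the full outcome vector, $r$ the vector of missingness indicators, $b$ a shared random effect, and $\theta$ and $\psi$ the parameters of the measurement and missingness processes.

```latex
\begin{align*}
\text{Selection model:} \quad
  & f(y, r \mid \theta, \psi) = f(y \mid \theta)\, f(r \mid y, \psi) \\
\text{Pattern-mixture model:} \quad
  & f(y, r \mid \theta, \psi) = f(y \mid r, \theta)\, f(r \mid \psi) \\
\text{Shared-parameter model:} \quad
  & f(y, r \mid \theta, \psi) = \int f(y \mid b, \theta)\, f(r \mid b, \psi)\, f(b)\, db
\end{align*}
```

In the selection-model form, MAR corresponds to $f(r \mid y, \psi)$ depending on $y$ only through its observed components; with distinct $\theta$ and $\psi$, that term then drops out of likelihood-based inference about $\theta$, which is the sense in which the missingness is ignorable.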


This is by no means an exhaustive list of all the available methods for analyses under MNAR, but rather a brief overview of the fundamental underpinnings that fostered development of many of the methods. See, for example, the study by Ibrahim and colleagues (58) for comparisons of common approaches for ignorable and nonignorable missing-data mechanisms, including maximum likelihood, multiple imputation, fully Bayesian, and semiparametric weighted estimation equations.

Developing an appropriate strategy for analysis under MNAR begins with recognizing that these methods are heavily assumption driven and that the assumptions are not testable. Therefore, no single MNAR approach can be considered definitive. Consequently, a useful and common approach is to fit several MNAR models or methods utilizing different assumptions regarding the data distribution and missingness within a sensitivity analysis framework, thereby allowing assessment of robustness of results to the various assumptions.

RECOMMENDATIONS

Having discussed theoretical and practical considerations, we now turn to specific recommendations.

Our first recommendation is a natural consequence of the inability of any statistical analysis to recoup the loss of information due to missing data. Therefore, whatever methods are to be employed in analysis, they should not detract from efforts to plan a trial that minimizes dropout. In addition, detailed records of the reasons for missing data and data on potential covariates that might help further characterize the probability of dropout should be obtained.

Regarding the primary analysis for confirmatory longitudinal clinical trials, conclusive evidence has demonstrated the need to abandon the simple, ad hoc methods such as LOCF and BOCF. Given that the possibility of MNAR data can never be ruled out, one might be tempted to shift the primary analytic approach to that of an MNAR method. However, MNAR methods are sensitive to untestable assumptions. Also, from a practical standpoint, many MNAR methods require customized programs, may suffer numerical convergence problems, or may be complicated by weakly identified or underidentified models. Therefore, MNAR methods are not well suited for the primary analyses in confirmatory clinical trials wherein a dependable, prespecified method is needed. We conclude, as have others (5,14), that the proper framework for use of MNAR approaches for confirmatory clinical trials is that of sensitivity analyses.

MAR is the most appropriate framework for the primary analysis in confirmatory trials because this assumption is often reasonable and certainly more plausible than MCAR. Use of MAR is further supported in that the consequences of departures from MAR can be evaluated via sensitivity analyses, and MAR methods are often robust to departures from MAR.

Likelihood-based repeated measures approaches, such as MMRM, provide a flexible framework under the MAR assumption from which analyses can be tailored to the specific situation at hand. Flexibility in modeling treatment effects over time and the within-patient error correlation structure is particularly useful in this regard, making MMRM a widely useful analysis in drug development. Specifically, MMRM is an appropriate choice for the primary analysis in many longitudinal confirmatory clinical trials, especially those scenarios in which LOCF has been an acceptable primary analysis in the past. The historical precedent for LOCF makes it a likely choice to include as a sensitivity analysis. However, MNAR-based analyses should be the primary basis upon which sensitivity is assessed because MNAR analyses focus on the assumption key to validity of MMRM, whereas discrepancies between an LOCF and MMRM result could arise for many reasons unrelated to the validity of MMRM.

Our specific recommendation of MMRM for an MAR-based primary analysis needs to be considered in light of the mission of our working group. Our aim was to (a) clarify terminology and concepts regarding use of MMRM and LOCF in regulatory settings, (b) address specific concerns raised by regulatory agencies regarding use of MMRM as the primary analysis, and
(c) make specific recommendations for analysis of data from longitudinal clinical trials.

This degree of focus has the advantage of facilitating a detailed and thorough discussion. However, it has the limitation of not fully considering other analyses. Some readers will wonder why we did not include in our comparisons with LOCF Bayesian analyses, multiple imputation, or weighted generalized estimating equations. All of these methods are intrinsically similar in their underlying assumptions (MAR) and have extensive literature supporting their application. As such, these methods ought to borrow strength from each other, rather than engage in mutual competition.

Our focus on MMRM was driven by extensive experience with this method in the specific situation of relevance: confirmatory clinical trials. The other MAR approaches have not been studied and used as extensively as MMRM in this regard. Therefore, we focused on the area wherein practical experience was greatest so that our recommendations could be implemented immediately and with minimal ambiguity.

Acknowledgments: We benefited from review of the draft article by three prominent academics in the field of missing data and longitudinal analyses, and we thank them for their thoughtful comments and suggestions: Rod Little (University of Michigan School of Public Health), Geert Molenberghs (Hasselt University Centre for Statistics), and Daniel Scharfstein (Johns Hopkins Bloomberg School of Public Health). We are also grateful for review and advice from several colleagues in the pharmaceutical industry: Bruce Binkowitz (Merck), Argyha Chattopadhyay (Johnson & Johnson), David Keller (Pfizer), Frank Liu (Merck), Kaifeng Lu (Merck), Edmund Luo (Merck), Akiko Okamoto (Johnson & Johnson), and James Roger (GSK). Although we incorporated many of the suggestions made by these reviewers, the individuals mentioned above were not asked to specifically endorse the recommendations made in this article.

REFERENCES

1. Molenberghs G, Kenward MG. Missing Data in Clinical Studies. Chichester: John Wiley; 2007.
2. Gadbury GL, Coffey CS, Allison DB. Modern statistical methods for handling missing repeated measurements in obesity trials: beyond LOCF. Obes Rev. 2003;4:175–184.
3. Lane PW. Handling drop-out in longitudinal clinical trials: a comparison of the LOCF and MMRM approaches. Pharm Stat. (early view) 2007;DOI: 10.1002/pst.267.
4. Leon AC, Mallinckrodt CH, Chuang-Stein C, Archibald DG, Archer GE, Chartier K. Attrition in randomized controlled clinical trials: methodological issues in psychopharmacology. Biol Psychiatry. 2006;59:1001–1005.
5. Molenberghs G, Thijs H, Jansen I, et al. Analyzing incomplete longitudinal clinical trial data. Biostatistics. 2004;5:445–464.
6. Liu G, Gould AL. Comparison of alternative strategies for analysis of longitudinal trials with dropouts. J Biopharm Stat. 2002;12:207–226.
7. Mallinckrodt CH, Clark WS, David SR. Accounting for dropout bias using mixed-effects models. J Biopharm Stat. 2001;11(1–2):9–21.
8. Mallinckrodt CH, Clark WS, David SR. Type I error rates from mixed effects model repeated measures versus fixed effects ANOVA with missing values imputed via last observation carried forward. Drug Inf J. 2001;35:1215–1225.
9. Mallinckrodt CH, Sanger TM, Dube S, et al. Assessing and interpreting treatment effects in longitudinal clinical trials with missing data. Biol Psychiatry. 2003;53:754–760.
10. Mallinckrodt CH, Clark SW, Carroll RJ, Molenberghs G. Assessing response profiles from incomplete longitudinal clinical trial data under regulatory considerations. J Biopharm Stat. 2003;13:179–190.
11. Mallinckrodt CH, Kaiser CJ, Watkin JG, Molenberghs G, Carroll RJ. The effect of correlation structure on treatment contrasts estimated from incomplete clinical trial data with likelihood-based repeated measures compared with last observation carried forward ANOVA. Clin Trials. 2004;1:477–489.
12. Mallinckrodt CH, Kaiser CJ, Watkin JG, Detke MJ, Molenberghs G, Carroll RJ. Type I error rates from likelihood-based repeated measures analyses of incomplete longitudinal data. Pharm Stat. 2004;3:171–186.
13. Mallinckrodt CH, Raskin J, Wohlreich MM, Watkin JG, Detke MJ. The efficacy of duloxetine: a comprehensive summary of results from MMRM and LOCF_ANOVA in eight clinical trials. BMC Psychiatry. 2004;4:26.


14. Shen S, Beunckens C, Mallinckrodt C, Molenberghs G. A local influence sensitivity analysis for incomplete longitudinal depression data. J Biopharm Stat. 2006;16:365–384.
15. Cnaan A, Laird NM, Slasor P. Using the general linear mixed model to analyse unbalanced repeated measures and longitudinal data. Stat Med. 1997;16:2349–2380.
16. Little R, Rubin D. Statistical Analysis With Missing Data. New York: John Wiley; 1987.
17. Verbeke G, Molenberghs G. Linear Mixed Models for Longitudinal Data. New York: Springer; 2000.
18. Rubin DB. Inference and missing data. Biometrika. 1976;63:581–592.
19. Little RJA. Modeling the drop-out mechanism in repeated measures studies. J Am Stat Assoc. 1995;90:1112–1121.
20. Lavori PW. Clinical trials in psychiatry: should protocol deviation censor patient data? Neuropsychopharmacology. 1992;6:39–48.
21. Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics. 1982;38:963–974.
22. Little R, Yau L. Intent-to-treat analysis for longitudinal studies with drop-outs. Biometrics. 1996;52:1324–1333.
23. Gibbons RD, Hedeker D, Elkin I, et al. Some conceptual and statistical issues in analysis of longitudinal psychiatric data. Application to the NIMH Treatment of Depression Collaborative Research Program dataset. Arch Gen Psychiatry. 1993;50:739–750.
24. Heyting A, Tolboom JT, Essers JG. Statistical handling of drop-outs in longitudinal clinical trials. Stat Med. 1992;11:2043–2061.
25. Lavori PW, Dawson R, Shera D. A multiple imputation strategy for clinical trials with truncation of patient data. Stat Med. 1995;14:1913–1925.
26. Siddiqui O, Ali MW. A comparison of the random-effects pattern mixture model with last-observation-carried-forward (LOCF) analysis in longitudinal clinical trials with dropouts. J Biopharm Stat. 1998;8:545–563.
27. Shao J, Zhong B. Last observation carry-forward and last observation analysis. Stat Med. 2003;22:2429–2441.
28. Carpenter J, Kenward M, Evans S, White I. Last observation carry-forward and last observation analysis. Letter to the Editor. Stat Med. 2004;23:3241–3244.
29. Cook RJ, Zeng L, Yi GY. Marginal analysis of incomplete longitudinal binary data: a cautionary note on LOCF imputation. Biometrics. 2004;60:820–828.
30. Molenberghs G, Verbeke G. Models for Discrete Longitudinal Data. New York: Springer; 2005.
31. Beunckens C, Molenberghs G, Kenward MG. Direct likelihood analysis versus simple forms of imputation for missing data in randomized clinical trials. Clin Trials. 2005;2:379–386.
32. Rubin DB, Stern HS, Vehovar V. Handling “don’t know” survey responses: the case of the Slovenian plebiscite. J Am Stat Assoc. 1995;90:822–828.
33. Hill JO, Hauptman J, Anderson JW, Fujioka K, O’Neil PM, Smith DK, Zavoral JH, Aronne LJ. Orlistat, a lipase inhibitor, for weight maintenance after conventional dieting: a 1-yr study. Am J Clin Nutr. 1999;69:1108–1116.
34. Liang K, Zeger S. Longitudinal data analysis of continuous and discrete responses for pre-post designs. Sankhya: Indian J Stat. 2000;62(Series B):134–148.
35. Molenberghs G, Kenward MG, Lesaffre E. The analysis of longitudinal ordinal data with non-random dropout. Biometrika. 1997;84:33–44.
36. Laird NM. Discussion to Diggle PJ, Kenward MG. Informative dropout in longitudinal data analysis. Appl Stat. 1994;43:84.
37. Rubin DB. Discussion to Diggle PJ, Kenward MG. Informative dropout in longitudinal data analysis. Appl Stat. 1994;43:80–82.
38. Copas JB, Li HG. Inference for non-random samples (with discussion). J Royal Stat Soc B. 1997;59:55–96.
39. Draper D. Assessment and propagation of model uncertainty (with discussion). J Royal Stat Soc B. 1995;57:45–97.
40. Kenward MG. Selection models for repeated measurements with non-random dropout: an illustration of sensitivity. Stat Med. 1998;17:2723–2732.
41. Diggle PJ, Kenward MG. Informative dropout in longitudinal data analysis (with discussion). Appl Stat. 1994;43:49–93.
42. Rotnitzky A, Robins JM, Scharfstein DO. Semiparametric regression for repeated outcomes with nonignorable nonresponse. J Am Stat Assoc. 1998;93:1321–1339.
43. Little RJA. Pattern-mixture models for multivariate incomplete data. J Am Stat Assoc. 1993;88:125–134.
44. Little RJA. A class of pattern-mixture models for normal incomplete data. Biometrika. 1994;81:471–483.


45. Wu MC, Carroll RJ. Estimation and comparison of changes in the presence of informative right censoring by modeling the censoring process. Biometrics. 1988;44:175–188.
46. Ten Have TR, Kunselman AR, Pulkstenis EP, Landis JR. Mixed effects logistic regression models for longitudinal binary response data with informative drop-out. Biometrics. 1998;54:367–383.
47. Wu MC, Bailey KR. Estimation and comparison of changes in the presence of informative right censoring: conditional linear model. Biometrics. 1989;45:939–955.
48. Mori M, Woodworth GG, Woolson RF. Application of empirical Bayes inference to estimation of rate of change in the presence of informative right censoring. Stat Med. 1992;11:621–631.
49. Follmann D, Wu M. An approximate generalized linear model with random effects for informative missing data. Biometrics. 1995;51:151–168.
50. Collins LM, Schafer JL, Kam CM. A comparison of inclusive and restrictive strategies in modern missing data procedures. Psychol Methods. 2001;6:330–351.
51. Lipkovich I, Duan Y, Ahmed S. Multiple imputation compared with restricted pseudo-likelihood and generalized estimating equations for analysis of binary repeated measures in clinical studies. Pharm Stat. 2005;4:267–285.
52. Zhu HT, Lee SY. Local influence for incomplete-data models. J Royal Stat Soc B. 2001;63:111–126.
53. Verbeke G, Molenberghs G, Thijs H, Lesaffre E, Kenward MG. Sensitivity analysis for nonrandom dropout: a local influence approach. Biometrics. 2001;57:7–14.
54. Thijs H, Molenberghs G, Verbeke G. The milk protein trial: influence analysis of the dropout process. Biometric J. 2000;42:617–646.
55. Molenberghs G, Verbeke G, Thijs H, Lesaffre E, Kenward M. Mastitis in dairy cattle: local influence to assess sensitivity of the dropout process. Comput Stat Data Anal. 2001;37:93–113.
56. Troxel AB, Ma G, Heitjan DF. An index of local sensitivity to nonignorability. Statistica Sinica. 2004;14:1221–1237.
57. Ma G, Troxel AB, Heitjan DF. An index of local sensitivity to nonignorable drop-out in longitudinal modeling. Stat Med. 2005;24:2129–2150.
58. Ibrahim JG, Chen MH, Lipsitz SR, Herring AH. Missing-data methods for generalized linear models: a comparative review. J Am Stat Assoc. 2005;100:332–346.

Craig Mallinckrodt has disclosed that he is a stock shareholder in Eli Lilly and Co. Peter W. Lane has disclosed that he is a stock shareholder in and has received grants/research support from GlaxoSmithKline. Dan Schnell has disclosed that he is a stock shareholder in Procter & Gamble Co. Yahong Peng and James P. Mancuso report no relationships to disclose.

Drug Information Journal