THE USE OF POST-INTERVENTION DATA FROM WAITLIST CONTROLS TO IMPROVE ESTIMATION OF TREATMENT EFFECT IN LONGITUDINAL RANDOMIZED CONTROLLED TRIALS
DISSERTATION
Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the
Graduate School of The Ohio State University
By
Kimberly A. Walters, B.S.
*****
The Ohio State University
2008
Dissertation Committee:
Joseph S. Verducci, Adviser
Haikady N. Nagaraja
William I. Notz
Approved by
Adviser
Graduate Program in Biostatistics
© Copyright by
Kimberly A. Walters
2008
ABSTRACT
In medicine and public health research, the randomized delayed-intervention controlled trial (RDICT), also known as a wait-listed or stepped wedge design, is commonly used to study overt, slow-acting treatments in comparison to a control condition over time. Ten RDICT designs are specified as generalizations of the motivating example, a longitudinal psychology study of a psychoeducational intervention for children with bipolar disorder. These designs vary according to number of observation occasions, time between observations, and length of delay before the control group receives treatment.
Two estimators of fixed effects in separate linear mixed effects (LME) models, $\hat{\theta}_2$ and $\hat{\theta}_1$, are proposed to measure treatment effect based on data from an RDICT design. The LME models have a piecewise linear mean structure, allowing phases for treatment, placebo, and leveling-off effects. The treatment effect is traditionally conceptualized as the difference in slopes between the immediate treatment (IT) and pre-intervention control groups, which we call $\theta_1$.
Alternately, in an RDICT design, the treatment effect can be the change in slope post-intervention in the delayed-treatment (DT) control group, called $\theta_0$. The full model, which allows these treatment effects to differ, produces the standard estimator, $\hat{\theta}_1$. A reduced model, nested within the full one, forces the inter and intra treatment effects to be identical and produces the novel estimator, $\hat{\theta}_2$.
A simulation study was conducted to observe the relative efficiency of $\hat{\theta}_2$ to $\hat{\theta}_1$ as it varies over the 10 RDICT designs and 8 scenarios, which differ in size of treatment effect, intraclass correlation, and sample allocation to DT group.
The best-performing and recommended RDICT design, called H2.5 with a DT:IT allocation ratio of 2:1, achieved a relative efficiency of 1.3 when the group-specific treatment effects are identical. The H2.5 design has the longest overall calendar duration of the 10 designs considered and is an extension of the design used in the motivating example study of childhood mood disorders.
This is dedicated to everyone suffering from invisible disabilities.
∗ ∗ ∗
“This spiritualist, this statistician, what are you anyway?” - Thomas Pynchon
ACKNOWLEDGMENTS
The writing of this dissertation is the culmination of a long journey, beginning with a few tentative steps, made possible with the belief and nurturing of my mother Gerda, sister Michelle, and father Robyn. Along the way, I found support and understanding from a very special group of women very nearly in my shoes, each on her own adventure of transformation and achievement. Equally essential to my success were the many dear friends who made my path gentler by holding my hand and bolstering my spirits, especially John and Kylene. I am also indebted to my health care providers Drs. Mary Kiacz, Lee Cohen and Kitty Soldano, who cared for my whole wellness; my employer-mentors Drs. Mary Fristad, Ram Tiwari, Tom Bishop and Chris Holloman, who gave me a chance as patient role models; and my eternally supportive adviser Dr. Joe Verducci, who showed me the business of ethical statistical practice and helped me negotiate my invisible wall.
VITA
November 14, 1974 ...... Born Maine, USA
1996 ...... B.S. Chemistry Rensselaer Polytechnic Institute
2002-2004 ...... Graduate Teaching Assistant OSU Statistics
2004-2006 ...... Graduate Research Assistant OSU Psychiatry
2006-2008 ...... Statistical Consultant OSU Statistical Consulting Service
2007 ...... Cancer Research Training Fellow National Cancer Institute
PUBLICATIONS
Research Publications
Y. Li; R. Tiwari; K. Walters; J. Zou “A Weighted-Least-Squares Estimation Approach to Comparing Trends in Age-Adjusted Cancer Rates Across Overlapping Regions”. Statistics in Medicine, in submission.
M. Fristad; J. Verducci; K. Walters; M. Young “The Impact of Multi-Family Psychoeducation Groups (MFPG) in Treating Children Aged 8-12 with Mood Disorders”. Archives of General Psychiatry, in revision.
FIELDS OF STUDY
Major Field: Biostatistics
Studies in:
Longitudinal Data: Prof. Joseph Verducci
Behavioral Interventions: Prof. Mary Fristad
TABLE OF CONTENTS
Abstract
Dedication
Acknowledgments
Vita
List of Tables
List of Figures
Chapters:
1. Introduction
    1.1 Domain
    1.2 Contribution to the Field
    1.3 Motivation
        1.3.1 Motivating Example
        1.3.2 RCT: The Standard Design
        1.3.3 RDICT: The Modified Design
        1.3.4 Self or Other as Control
        1.3.5 Overt Interventions
    1.4 Treatment Effect
        1.4.1 Study Objectives
        1.4.2 Definition
        1.4.3 Theory and Concept
        1.4.4 Operationalization
    1.5 Research Questions
        1.5.1 Model Assumptions
        1.5.2 Design Issues
2. Literature Review
    2.1 Introduction
    2.2 Repeated Measures
    2.3 Cross-sectional versus Longitudinal Studies
    2.4 Types of Treatment Effects
        2.4.1 The Effect in Cause and Effect
        2.4.2 Characteristics of Effect
        2.4.3 Characteristics of Condition
    2.5 Quasi-Experimental Designs
        2.5.1 No Controls
        2.5.2 Switching Replications
    2.6 Experimental Designs
        2.6.1 Cross-Over Studies
        2.6.2 Randomized Controlled Trials
        2.6.3 Longitudinal Designs
    2.7 “Classic” Wait-listed Design
    2.8 Models
        2.8.1 Independent Observations
        2.8.2 Correlated Data
        2.8.3 Marginal and Mixed Models
        2.8.4 Linear Mixed Models
    2.9 Estimation
        2.9.1 Likelihood Functions
        2.9.2 REML
    2.10 Inference
        2.10.1 Likelihood Ratio Test
        2.10.2 Wald Test
    2.11 Evaluation of Estimators
        2.11.1 A Sample Size Calculation
    2.12 Missing Data
    2.13 Summary
3. Models and Data
    3.1 Introduction
    3.2 MFPG: Study Description
        3.2.1 Types of Condition and Intervention
        3.2.2 Study Design
    3.3 Data Structure
        3.3.1 Outcome and Design
        3.3.2 Exploratory Data Analysis
        3.3.3 Missingness
    3.4 Rationale for Model
        3.4.1 Conceptual Motivation: Dynamic Modeling
        3.4.2 Linearization Phases
        3.4.3 Elements of the Model
        3.4.4 Choice of Random Effects
    3.5 Model for MFPG Data
        3.5.1 Time Convention
        3.5.2 Delayed Treatment Group
        3.5.3 RCT: Both Groups Without DI
        3.5.4 An Entrance-Centered Model for RDICT
        3.5.5 A Treatment-Centered Model for RDICT
        3.5.6 Inference
    3.6 Narrowing the Universe of Models
        3.6.1 Models: Mean Structure
        3.6.2 Models: Relative Treatment and Placebo Effects
        3.6.3 Models: Variance Components
4. Simulation Study
    4.1 Introduction
    4.2 Objective
    4.3 Notation
    4.4 Narrowing the Universe of Designs
        4.4.1 Observation Times: Resolution h of Unit Time
        4.4.2 Length of Delay d in Waitlist Control Group
        4.4.3 Summary of Constraints and Simulation Designs
    4.5 Protocol
        4.5.1 Software
        4.5.2 Scenarios
        4.5.3 Starting Seeds for Random Number Generation
        4.5.4 Level of Dependence Between Simulated Datasets
        4.5.5 Scenarios: Sample Size n
        4.5.6 Scenarios: Size of Treatment Effect λ
        4.5.7 Scenarios: Covariance Structure ρ
        4.5.8 Number of Simulations M
        4.5.9 Results Stored From Each Run
        4.5.10 Summary Measures of Performance
        4.5.11 Criteria for Comparison
    4.6 Results
        4.6.1 Results for MFPG Design and Extension
        4.6.2 Theoretical Standard Error
        4.6.3 Model-Based versus Empirical Standard Errors
5. Conclusions
    5.1 Distinguishing Features of the 10 Designs
    5.2 Standard Errors over the Designs
        5.2.1 Intra Estimator
        5.2.2 Inter Estimator
        5.2.3 Combined Estimator
    5.3 Relative Efficiencies over the Designs
        5.3.1 Balanced Sample Allocation
        5.3.2 Unbalanced Sample Allocation
        5.3.3 Other Allocation Plans
    5.4 Unanswered Questions
    5.5 Recommendations
Bibliography
LIST OF TABLES
Table
3.1 Within-Subject Correlation in MFPG
3.2 Attrition Rates in MFPG
4.1 Notation for Time Variables
4.2 Notation for Simulation Parameters
4.3 Summary of Simulation Designs
4.4 Possible Design Observation Times
4.5 Results for MFPG Design
4.6 Results for Enhanced MFPG Design
LIST OF FIGURES
Figure
1.1 Interpretations of Treatment Effect
1.2 Conceptual Plot of Treatment Effect
3.1 Average Response Behavior by Treatment Group for Entire Sample
3.2 Example Response Behavior in Immediate Treatment Group
3.3 Example Response Behavior in Delayed Treatment Group
3.4 Between-Time Scatter Plots for Detrended Data
3.5 Response Behavior by Missingness Patterns
3.6 Conceptual Plot of Response with Delayed Intervention
3.7 Conceptual Plot of Linearized Response with Delayed Intervention
3.8 Average Response Behavior by Treatment Group for Treated Subset
3.9 Simulation Mean Profiles
4.1 Treatment Period Divided Into Thirds
4.2 Treatment Period Divided Into Halves
4.3 Illustration of RDICT Designs Considered
4.4 Theoretical Standard Error for $\hat{\theta}_{?}$
4.5 Theoretical Standard Error for $\hat{\theta}_2$
4.6 Theoretical, Model-Based, and Empirical Standard Errors
4.7 Theoretical, Model-Based, and Empirical Relative Efficiencies
5.1 Design Properties: Calendar Duration
5.2 Simulation Results: Relative Efficiencies
5.3 Theoretical Standard Errors Over Full Range of Group Allocation
5.4 Theoretical Relative Efficiencies Over Full Range of Group Allocation
5.5 Comparison of Theoretical Relative Efficiencies Over Full Range of Group Allocation
CHAPTER 1
INTRODUCTION
1.1 Domain
This dissertation addresses longitudinal studies of overt, slow-acting behavioral-type interventions in human populations with chronic pathology, where the number of subjects n well exceeds the number of time points m ($n \gg m$), with probable attrition and possible spontaneous remission. The statistical advantages of a wait-listed design, called the randomized delayed-intervention controlled trial (RDICT), are considered.
1.2 Contribution to the Field
While post-intervention observations for the control group are generally disregarded in a wait-listed design [2], the present research investigates possible statistical advantages of using these data to estimate the treatment effect (TE).
1.3 Motivation
In medical studies involving human subjects, investigators may wish to offer a proposed treatment to all participants for ethical and practical reasons. In standard clinical trials, treatment is withheld from a subset of the study participants, who form a control group for comparison and who receive a placebo or the best-available treatment.
If treatment is given to the control group after a delay, as in the randomized delayed-intervention controlled trial (RDICT), both groups experience the effect of treatment, sooner or later. The motivation for the current research is to take advantage of this fact in estimation of treatment effect for such a wait-listed design.
1.3.1 Motivating Example
The motivating example is the Multi-Family Psychoeducation Group (MFPG) study [8], conducted by Dr. Mary Fristad between 2002 and 2006 at The Ohio State University (OSU) Medical Center, of n = 165 children diagnosed with bipolar mood disorder. The participants were observed m = 4 times over 18 months after random assignment to receive treatment at baseline or after a delay of one year. This way, all the severely impaired children enrolled in the study were offered a potentially beneficial intervention, and participants in the control group had an incentive to return for follow-up.
1.3.2 RCT: The Standard Design
The gold standard [7, 10] for evaluating evidence-based treatments in medical and public health research is the randomized controlled trial (RCT). The main ingredient of an RCT is the random assignment of subjects to either a treatment or control group. The presence of such a control group enables investigators to attribute any difference between the two groups, the so-called treatment effect, to the intervention.
A desirable feature of longitudinal, as opposed to cross-sectional, studies is repeated observations on the same subject over time, which facilitates measurement of change. Longitudinal RCTs, also known as parallel designs, are superior to observational studies or quasi-experiments in establishing a causal relationship between an intervention and a health benefit [18].
1.3.3 RDICT: The Modified Design
The randomized delayed-intervention controlled trial (RDICT), also called the wait-listed or stepped wedge design, is an extension of the standard longitudinal RCT. It is illustrated by the example study of children with bipolar disorder [8], presented in more detail hereafter. The only difference from the standard design is that subjects assigned to the control group receive treatment after a fixed, pre-specified delay period. There are clear ethical and practical advantages to the RDICT study design. It is the aim of this research to explore the statistical advantages of RDICT and provide guidelines for its use in practice.
1.3.4 Self or Other as Control
In crossover designs or no-control quasi-experiments [21], a person is compared to him- or herself under different conditions, often using a pre-post difference score.
The control arm of the RDICT stands on its own as a so-called interrupted time series design and allows this type of self-as-control analysis. In this case, the outcome trajectories for a subject are compared prior to and following the delayed intervention, and each participant serves as his or her own control. On the other hand, in the standard RCT design, the treatment group is compared to a wholly distinct "other": a control group that is initially equivalent by virtue of randomization.
1.3.5 Overt Interventions
The delayed-intervention design is ideal for measuring the treatment effect of an overt intervention for an undesirable medical pathology or social condition. Overt treatments are those to which participants cannot be blinded, such as psychological or physical therapy, surgical operation, or community education, all of which are obvious to the participant.
With the standard RCT design, since the treatment delivery is overt to the participant, control group members may become demoralized and drop out, knowing they will never receive intervention. On the other hand, the RDICT design is appealing for this sort of intervention since all the subjects receive treatment at some point during the course of the study.
Even when participants in a study cannot be masked to their treatment group membership, it is crucial that researchers performing outcome evaluation are nevertheless blinded, as is usual in the double-blinded study design.
1.4 Treatment Effect
Since the main objective of most medical intervention studies is to measure the effect of a treatment, it is of primary concern to define and operationalize a meaningful concept of treatment effect (TE).
1.4.1 Study Objectives
The research goal in both study designs is generally to estimate the effect of treatment, requiring a comparison between the treatment and control groups. The usual aim of a randomized controlled trial is two-fold: first, to determine whether there is a treatment effect via hypothesis testing, and, second, to estimate the size of that effect via point estimation. When using linear models, a common test of a nonzero treatment effect is the likelihood ratio test, which compares a full model including the treatment effect to a reduced model with no such effect. The power of this test equals the probability of rejecting the null hypothesis of no treatment effect given a certain nonzero effect.
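The likelihood ratio comparison described above can be illustrated with a minimal sketch. It assumes independent Gaussian errors and ordinary least squares, which is a simplification: it ignores the within-subject correlation that the mixed models of this dissertation are designed to handle. The function names, the toy data, and the parameter values are hypothetical and serve only to show the full-versus-reduced logic.

```python
import math
import random

def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def rss(X, y):
    """Residual sum of squares from a least-squares fit of y on the columns of X."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(X, y))

def lrt_slope_difference(times, groups, y):
    """Likelihood ratio test of 'no difference in slopes'; returns (statistic, p-value)."""
    full = [[1.0, t, g * t] for t, g in zip(times, groups)]   # common intercept, group-specific slope
    reduced = [[1.0, t] for t in times]                       # common intercept, common slope
    stat = len(y) * math.log(rss(reduced, y) / rss(full, y))  # -2 log likelihood ratio (Gaussian, ML variance)
    stat = max(stat, 0.0)                                     # guard against rounding below zero
    p_value = math.erfc(math.sqrt(stat / 2.0))                # chi-square upper tail with 1 degree of freedom
    return stat, p_value

# Hypothetical toy data: 60 subjects, 4 occasions, treated slope steeper by 2 units.
random.seed(1)
times, groups, y = [], [], []
for subject in range(60):
    g = subject % 2  # 1 = treatment, 0 = control (illustrative coding)
    for t in (0.0, 0.5, 1.0, 1.5):
        times.append(t)
        groups.append(g)
        y.append(10.0 - 1.0 * t - 2.0 * g * t + random.gauss(0.0, 1.0))

stat, p_value = lrt_slope_difference(times, groups, y)
print(f"LRT statistic = {stat:.2f}, p-value = {p_value:.3g}")
```

Repeating the last step over many simulated datasets and recording the fraction of rejections would give an empirical estimate of power under the chosen effect size.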
1.4.2 Definition
One way to formalize treatment effect in a longitudinal study is to compare the rates of change, i.e., slopes or angles, in an outcome measurement between the treatment and control groups. Alternately, treatment effect can be conceptualized as a comparison of the extents of ultimate improvement rather than the rates of getting there, as shown in Figure 1.2. These are equivalent if the time required to achieve full effect of treatment is set to 1 unit, as is done in models to follow. The relationships between slopes, angles, and absolute differences are illustrated in Figure 1.1.
The concept of a treatment effect naturally decomposes into two main ingredients.
The first ingredient is an observed change in the main outcome over time, anticipated to be in the direction of improvement. The second ingredient is the attribution of this change to the treatment, as isolated from other environmental or random influences.
Whether the effect is seen as additive or multiplicative depends on whether the change over time is modeled as a linear or nonlinear process. In either case, the second ingredient above defines treatment effect as the change in the treated group above and beyond the change in the controls. This definition requires a comparison between the two groups.
[Figure 1.1: line plot of Symptom Severity against Time over one unit of time, showing the CONTROL profile (slope γ) and the TREATMENT profile (slope γ + θ), with angles φ and ψ and the annotations λE = tan(φ), E = tan(φ + ψ), and (1 − λ)E.]
Figure 1.1: Interplay of Various Conceptualizations of Treatment Effect (TE). The green and red lines represent the mean profiles for the treatment and control groups, respectively. E is the overall improvement in the treatment group over the TE period, which is a single unit of time by definition, implying a group-specific slope of γ + θ = −E. The placebo effect (PE) is represented by the angle φ or by the slope in the control group, γ = −λE. TE can be represented by the angle ψ or the difference in group slopes, θ = −(1 − λ)E.
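The relationships stated in the caption of Figure 1.1 can be collected in one place. The following is a sketch that assumes E > 0, a unit-length time axis, and angles measured downward from the horizontal; the expression for ψ is derived here from the two tangent relations and is not quoted from the original.

```latex
\begin{align*}
\gamma &= -\lambda E = -\tan\phi, \\
\gamma + \theta &= -E = -\tan(\phi + \psi), \\
\theta &= (\gamma + \theta) - \gamma = -(1 - \lambda)E, \\
\psi &= \arctan(E) - \arctan(\lambda E).
\end{align*}
```

For instance, with the hypothetical values E = 1 and λ = 0.25, the slopes are γ = −0.25 and θ = −0.75, while ψ = 45° − arctan(0.25) ≈ 31°. The angle ψ is therefore not simply arctan((1 − λ)E) ≈ 37°, which is why the slope and angle representations are related but not interchangeable.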
In the presence of waitlist controls, it may be possible and even desirable to incorporate the post-intervention outcomes of the waitlist controls into the estimate of treatment effect. The feasibility of this incorporation depends on how similarly the two groups respond following intervention. Since the control group may experience a placebo effect, there is not as much room for improvement on the outcome scale as in the primary treatment group. In the ideal case considered here, the delayed intervention group will improve at the same rate as the primary group and level off at the same destination outcome value, as shown later in Figure 3.6.
1.4.3 Theory and Concept
In physical experiments, researchers impose conditions on a system and then measure variables of interest. As with medical research, effects of the experimental conditions may be instantaneous or gradual, temporary or permanent. One factor that distinguishes human experiments from physical ones is the well-known placebo effect. The conscious expectation of change by a study participant may have an effect, in the absence of any other treatment. Mere recruitment into and participation in a study can incite the placebo effect. Such is the power of the human mind!
Ideally, to determine treatment effect, one would employ a time machine. A particular subject would be given a proposed treatment and observed sufficiently over time from delivery. Then, going back in time to the point of delivery, the subject would be observed with treatment withheld. That way, all the internal and external influences would be identical, with the exception of treatment. Any difference would be clearly the effect of treatment.
Even in this ideal case, there would remain the question of whether treatment effect varies according to individual characteristics. Often, the aim is to determine the average effect over an entire population. Random sampling from that population is desirable to approach the true overall mean in estimation.
Often in clinical trials, the population of interest is all persons with a particular diagnostic criterion. A simple random sample of this population is nearly impossible, since a complete sampling frame includes those with undiagnosed illness. In addition, some diseases make participation in a study very difficult. Convenience samples are common in practice and introduce questions of generalizability.
Lacking a time machine, the best feasible alternative is to sample two groups which equally represent the population of interest and let only one group receive intervention. The other group represents what would have happened to the first group without intervention, standing in for the time machine trip and approximating the impossible counterfactual. Given large enough sample sizes, random allocation of recruited participants to the two groups achieves the goal that they equally represent the population.
This research is concerned with overt treatments, which cannot be concealed from, or faked for, the subjects. In this case, the second, or control, group will not experience the same placebo effect as it would when taking a sugar pill in ignorance of treatment group status. An effect is still possible, however, and cannot be ruled out in any case. Controls remain a necessity.
1.4.4 Operationalization
For a linear model with first-order time as the principal covariate, the treatment effect can be operationalized as the difference in slopes, either between two groups or at a change-point within a group. In a nonlinear setting, with non-Gaussian outcomes or a nonlinear relationship with time, a difference or ratio between group parameters is established, ideally with a convenient interpretation. As seen in Figure 1.2, the treatment effect may simply be the difference in endpoints between a control and treatment group.
[Figure 1.2: line plot of Symptom Severity against Time, showing the CONTROL and TREATMENT profiles, with the TREATMENT EFFECT marked between them.]
Figure 1.2: Possible Conceptualization of Treatment Effect. The green and red lines represent hypothetical continuous mean profiles for the treatment and control groups, respectively, over time from entry into a randomized controlled trial. The treatment effect (TE) may be defined as the ultimate difference in outcomes between the groups following stabilization of placebo and treatment effects. Alternately, TE can be a function of the rates to reach these endpoints in outcome.
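The two slope-based operationalizations in this section can be written schematically as follows. This is only a sketch: the intercept β0, the group indicator g (taken here as 1 for immediate treatment and 0 for control), and the positive-part notation are illustrative assumptions, and the leveling-off phases of the piecewise models developed later are omitted.

```latex
\begin{align*}
\text{between groups:}\quad
  & \mathbb{E}[Y \mid t, g] = \beta_0 + \gamma\, t + \theta_1\, g\, t,
  \qquad g \in \{0, 1\}, \\
\text{within a group:}\quad
  & \mathbb{E}[Y \mid t] = \beta_0 + \gamma\, t + \theta_0\, (t - d)_{+},
  \qquad (t - d)_{+} = \max(0,\, t - d).
\end{align*}
```

In the first line the treatment effect is the difference in slopes between the two groups; in the second it is the change in slope at the change-point d, the time at which the delayed intervention is delivered.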
1.5 Research Questions
The research questions for this dissertation concern a comparison of two possible treatment effect estimators, $\hat{\theta}_1$ and $\hat{\theta}_2$, both of which are functions of data observed in a randomized delayed-intervention controlled trial (RDICT). The first estimator, $\hat{\theta}_1$, represents the standard between-group comparison, while the second, $\hat{\theta}_2$, incorporates the post-intervention data from the control group, thus combining between- and within-group comparisons into a single estimator.
This comparison of estimators translates into a practical question for medical researchers. Is there a statistical advantage, beyond the ethical and practical advantages, to utilizing the RDICT design instead of the standard RCT design when studying behavioral interventions? Since the answer may depend on model and design parameters, simulation studies were conducted to examine various scenarios.
The relative efficiency of the two estimators was the primary evaluation criterion for comparison. In addition, power and mean squared error (MSE) were considered.
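As a minimal sketch of how these criteria might be computed from simulation output, the code below takes two lists of estimates and returns their empirical mean squared errors and relative efficiency. The orientation of the ratio (variance of the standard estimator over variance of the new one, so that values above 1 favor the new estimator) is an assumption, and the draws are arbitrary stand-ins rather than results from this study.

```python
import random
import statistics

def mse(estimates, truth):
    """Empirical mean squared error of a list of estimates around the true value."""
    return sum((e - truth) ** 2 for e in estimates) / len(estimates)

def relative_efficiency(est_new, est_standard):
    """Ratio of empirical variances; values above 1 favor est_new (assumed orientation)."""
    return statistics.variance(est_standard) / statistics.variance(est_new)

# Arbitrary stand-in draws, NOT study results: both unbiased, the second less variable.
random.seed(2008)
theta = -1.0  # hypothetical true treatment effect
theta1_hat = [random.gauss(theta, 0.40) for _ in range(5000)]
theta2_hat = [random.gauss(theta, 0.35) for _ in range(5000)]

re = relative_efficiency(theta2_hat, theta1_hat)
print(f"relative efficiency = {re:.2f}")
print(f"MSE(theta1_hat) = {mse(theta1_hat, theta):.3f}")
print(f"MSE(theta2_hat) = {mse(theta2_hat, theta):.3f}")
```

For unbiased estimators the mean squared error equals the variance, so the two criteria agree; they diverge only when one estimator is biased, for example if the reduced model is fit when the group-specific effects actually differ.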
1.5.1 Model Assumptions
Within the linear model developed here, two model features were varied over the scenarios for simulation. The first, controlled by λ, was the size of the treatment effect relative to the size of the placebo effect. The second, the correlation coefficient ρ, determined the relative sizes of between- and within-subject variation for the exchangeable covariance structure.
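The role of ρ can be made concrete with a short sketch of an exchangeable covariance matrix. It assumes the usual random-intercept decomposition, in which the total variance is the sum of a between-subject and a within-subject component; the function names and the variance values are hypothetical.

```python
def exchangeable_cov(m, sigma2_between, sigma2_within):
    """m x m exchangeable covariance: equal variances, equal covariances."""
    total = sigma2_between + sigma2_within
    return [[total if i == j else sigma2_between for j in range(m)] for i in range(m)]

def intraclass_correlation(sigma2_between, sigma2_within):
    """Share of total variance attributable to differences between subjects."""
    return sigma2_between / (sigma2_between + sigma2_within)

# Hypothetical variance components for m = 4 observation occasions.
cov = exchangeable_cov(4, sigma2_between=3.0, sigma2_within=1.0)
rho = intraclass_correlation(3.0, 1.0)

for row in cov:
    print(row)
print("rho =", rho)
```

Every off-diagonal entry divided by the common diagonal entry equals ρ, so a single parameter fixes the correlation between any two occasions on the same subject.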
1.5.2 Design Issues
Important features of the RDICT design were controlled: the number m and spacing h of time points, the delay d for the control group, and the sample sizes nk for each group k = 0, 1. It is of interest whether a particular design accentuates any benefit of the RDICT-specific estimator, $\hat{\theta}_2$.
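A small sketch shows how m, h, and d jointly determine a design. It uses the motivating example (m = 4 occasions over 18 months, delay of one year) and assumes equally spaced occasions, which gives h = 6 months; the equal spacing and the function names are assumptions made for illustration.

```python
def observation_times(m, h):
    """Equally spaced observation times starting at study entry (time 0)."""
    return [j * h for j in range(m)]

def time_on_treatment(times, delay):
    """Elapsed time since treatment began, for a group treated after the given delay."""
    return [max(0, t - delay) for t in times]

times = observation_times(m=4, h=6)                 # months since entry
immediate = time_on_treatment(times, delay=0)       # immediate-treatment group
delayed = time_on_treatment(times, delay=12)        # waitlist control group, d = 12 months

print("times:    ", times)
print("immediate:", immediate)
print("delayed:  ", delayed)
```

Under these assumptions the waitlist group contributes only one observation taken after its treatment has begun, which suggests why designs with more occasions, finer spacing, or a shorter delay might change how much the post-intervention control data can add.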
CHAPTER 2
LITERATURE REVIEW
2.1 Introduction
The purpose of this chapter is to provide an overview of the history of longitudinal methodology, including modeling and design considerations, with randomized controlled trials as a starting point. A review of literature from the fields of statistics and medicine, as well as social research including psychology, provides the foundation for the current research question. The choices of design and model depend on the context of the medical disorder and type of intervention.
2.2 Repeated Measures
As Paul Albert stated in his 1999 article [1], there are three reasons to measure the same experimental unit over time. The first reason is to estimate measurement error or instrument consistency. It is unavoidable that time elapses between subsequent measurements since they cannot be made simultaneously. The second reason is for data safety monitoring purposes. As seen in recent years, notably in a large-scale hormone replacement therapy trial, studies are subject to early termination when clear benefit or detriment is identified. The third reason, relevant to this research, is to evaluate change over time, such as the gradual effect of a slow-acting treatment, including maintenance or loss of effect during followup.
In considering the broad problem of measuring and explaining change, Ian Plewis [17] classified change models into linear, nonlinear, and stepwise, as a function of time, age, group membership, or other covariates.
2.3 Cross-sectional versus Longitudinal Studies
Cross-sectional observations are taken at a single point in history and capture an association between variables in a particular slice of time. Longitudinal observations are taken over time, generally on the same experimental units, although there are epidemiological studies that have repeated cross-sectional observations. Longitudinal studies allow investigators to track subjects over time and establish any effect after an intervention.
Diggle et al. [6] demonstrated that cross-sectional associations may disguise longitudinal effects in the presence of cohort differences. Suppose βC is the slope for the cross-sectional regression relationship between the outcome and predictor variables over the sample at time 1 and that βL is the longitudinal slope for the change in outcome within each individual as the predictor changes over time.
When cross-sectional studies are used to draw conclusions about development over time, the assumption that βC = βL is made. This assumption is only applicable when there are little or no cohort effects, for example, when different birth cohorts pass through the same outcome levels at comparable ages.
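One common way to write the distinction between βC and βL, consistent with the definitions above, is sketched below. The subject index i, occasion index j, predictor x, intercept β0, and error term ε are notation introduced here for illustration.

```latex
\begin{align*}
Y_{ij} &= \beta_0 + \beta_C\, x_{i1} + \beta_L\,(x_{ij} - x_{i1}) + \varepsilon_{ij}, \\
Y_{ij} - Y_{i1} &= \beta_L\,(x_{ij} - x_{i1}) + (\varepsilon_{ij} - \varepsilon_{i1}).
\end{align*}
```

The second line follows by subtracting the first line evaluated at j = 1. It shows that the longitudinal slope is identified from within-subject changes alone, whereas a purely cross-sectional analysis at time 1 can estimate only βC. When βC = βL the first line collapses to the single-slope model with mean β0 + β x_{ij}, which is the assumption made when cross-sectional data are used to describe development over time.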
When two groups are evaluated at different points in time, their differences can reflect a cohort effect, especially with young patients who are still developing. This is relevant to the question of whether delivery time affects treatment effect in wait-listed designs.
Relevant to the present research, Diggle et al. [6] noted that “each person can be thought of as serving as his or her own control” in a longitudinal study. This is similar to the idea of using the baseline, or pre, observation as a covariate in analysis of covariance (ANCOVA) with pre-post data.
A persistent question with repeated measures concerns the relative amounts of variation between subjects and within subjects. Longitudinal studies become more advantageous as the within-subject variation increases relative to the between-subject variation [6].
2.4 Types of Treatment Effects
There are different scenarios requiring treatment, and additionally there are different ways in which the effect of a treatment may manifest. In order to estimate treatment effects, it is a useful exercise to systematically categorize their types to distinguish appropriate methods.
2.4.1 The Effect in Cause and Effect
The effect of treatment is defined relative to the natural progression of the disease condition in the absence of intervention, but perhaps in the presence of a placebo.
Shadish, Cook, and Campbell [21] state that “a central task for all cause-probing research is to create reasonable approximations to this physically impossible counterfactual.” The counterfactual in a longitudinal study is what would have happened to subjects had they not been exposed to treatment. For example, it is desirable to know the “treatment-free estimate of rate of change per time interval” when studying “spontaneous linear changes.”
2.4.2 Characteristics of Effect
Shadish et al. [21] categorize effects by their form, permanence, and immediacy. Although they present this taxonomy in the context of a so-called interrupted time series, it has more general application.
The forms of treatment effect may include level, trend, variance, and cyclical properties. The first two forms correspond to intercept and slope, respectively, in a simple linear model of time. Under random assignment to treatment groups, no intercept differences are expected at baseline. The specific subform corresponding to a treatment effect in trend or drift depends heavily on the model assumed for the mean structure, whether linear in time or not.
An effect is labeled according to its permanence as continuous if the effect is persistent or discontinuous if it dissipates over time. Continuous effects may be present only when an active treatment is present, such as blood pressure medicine or birth control pills. A continuous effect may also involve a simple state space, as with a vasectomy, which effects an essentially irreversible switch from fertile to sterile.
A continuous effect may require sustained treatment, such as daily medication for a chronic condition, or may be the result of a one-time exposure, such as a childhood vaccine or educational program.
On the other hand, the desired effect may be discontinuous, requiring only a finite course of treatment, such as antibiotics to cure a bacterial infection, reestablishing a natural and self-sustaining equilibrium of biological flora. Lastly, the discontinuous efficacy of a treatment may peak and then diminish after some time, such as for some vaccinations.
An effect is called, according to its immediacy, immediate or delayed depending on the time period between introduction and effect of a treatment. Immediate effects are perhaps simplest to observe since no followup is required. Delayed effects may complicate study in that experimental units must be monitored over time, the amount of delay may be unknown or inconsistent, and humans are harder to keep track of than laboratory mice.
There are some practical limitations to combinations of these characteristics. For example, with form and permanence, a continuous trend effect most likely has a limit, namely at zero symptoms or in a normal range for healthy humans. A discontinuous effect may result from removal of treatment, ceiling or floor effects, or a need for a booster treatment.
2.4.3 Characteristics of Condition
An undesirable health or social condition may be classified according to its permanence, just as with the effect taxonomy. The condition may be acute and temporary, like a rash or cold, or chronic and potentially progressive, such as pain or dementia.
There is an interplay between effect permanence and condition permanence. A temporary condition only demands a discontinuous effect. A chronic condition requires a continuous effect, either through sustained active exposure to a treatment or via a permanent intervention. Whether treatment is continued or discontinued depends on whether the condition being treated is chronic or acute and whether a permanent cure is possible.
2.5 Quasi-Experimental Designs
Studies without randomization, without comparison groups, or with nonequivalent control groups are called quasi-experimental.
2.5.1 No Controls
In Shadish et al.’s repeated-treatment design in Chapter 4 [21], treatment is delivered, removed, and then reintroduced to a single group of participants at a later occasion. This design is practical only with discontinuous effects. It is of note that the authors consider the “most interpretable outcome of this design” to be the case where the treatment effect is similar on both exposures to the treatment and in the opposite direction from the change upon removal.
2.5.2 Switching Replications
In Shadish et al.’s switching replications design in Chapter 5 [21], the investigator initially gives treatment to one of two nonequivalent groups and “administers treatment at a later date to the group that initially served as a no-treatment control.”
The effect in the second group, a modified replication, may not be identical to that in the first due to the different context. The permanence of the treatment effect in the first group determines whether this group can be used as a control for the second.
The authors claim “the design is still useful even if the initial treatment continues to have an impact, especially if the control group catches up to the treatment group once the control group receives treatment.”
2.6 Experimental Designs
Studies utilizing randomization to treatment groups are called experimental.
2.6.1 Cross-Over Studies
Cross-over designs utilize randomization and expose subjects to multiple treatments in succession. They are useful in studying chronic conditions and temporary interventions. One main challenge with cross-over studies is the carryover effects between one period of treatment and the next. They are not appropriate for interventions with a long-lasting or permanent effect.
2.6.2 Randomized Controlled Trials
Randomization or random allocation of study participants to treatment status allows practical and statistical assumptions regarding the equivalence, comparability, and independence of the groups in order to reduce bias [18]. Controls approximate the desired counterfactual and establish the natural disease history or placebo effects in the context of the study. The outcome data for the control and treatment groups are compared to establish an effect for the novel proposed intervention.
The randomized controlled trial (RCT) came into popular use after World War II as the power of the placebo effect was widely recognized around 1955 [10]. According to Kaptchuk, “the placebo became the emblem for all the healing occurring in the disguised ‘no-treatment’ arm of an RCT,” including “nature taking its course; regression to the mean; routine medical and nursing care; regimens such as rest, diet, exercise, and relaxation; easing of anxiety by diagnosis and treatment; the patient-doctor relationship; classic conditioning and learnt behaviours; the expectation of relief and the imagination; and the will and belief of both the patient and the practitioner.”
2.6.3 Longitudinal Designs
In discussing the practical problems with longitudinal designs, Shadish et al. [21] note “it is not always ethical to withhold treatments from participants for long periods of time, and the use of longitudinal observations on no-treatment or wait-list control-group participants is rare because such participants often simply obtain treatment elsewhere.”
2.7 “Classic” Wait-listed Design
C. Hendricks Brown et al. [2] propose an extension to the wait-listed design in their 2006 article on youth suicide prevention trials. They state that “data from the second phase cannot be used to assess intervention impact because there is no control group to compare over that time period.” It is taken for granted that delayed treatment is given solely to satisfy the community partners.
This declaration is provocative in the context of this research, which aims to utilize the second phase data, i.e., post-intervention observations in the wait-listed
(or delayed intervention) group. The challenge is to make a case that these data need not be discarded and ignored. Rather, the so-called second phase data may provide a statistical advantage to the RDICT (also called wait-listed) design.
The wait-listed design is a special case of the stepped wedge trial design as discussed by Celia Brown and Richard Lilford [3]. The stepped wedge design allows more than two randomized groups, each corresponding to a later time for introduction of treatment.
The RDICT design essentially contains two experiments, a within-subjects design and a between-subjects design. The latter, standard RCT, compares the immediate treatment (IT) and delayed treatment (DT) groups. The former quasi-experiment compares the DT group to itself before and after intervention. In an overview of behavioral experimental design, the German psychologist Joachim Krauth [11] warns against the former, conceding “one cannot rule out that within-subjects designs may produce the same results as between-subjects designs if certain assumptions are valid, which are difficult to check”.
Krauth continues, “If we keep in mind that, in general, only appropriately performed between-subjects designs admit a causal interpretation, it is obvious which results can be trusted when using both kinds of designs for the same problem.” The
RDICT design may present an opportunity for both results to be observed in harmony within a single study to strengthen the evidence for or against a treatment effect.
2.8 Models
The linear mixed model established by Laird and Ware in 1982 [12] is built upon a history of more simple models presented here.
2.8.1 Independent Observations
In both longitudinal and cross-sectional studies, a basic modeling approach is the linear model (2.1), which regresses a response variable for each of $N$ units on one or many ($p$) covariates, resulting in the ordinary least squares (OLS) estimate, $\hat{\beta}_{OLS} = (X^T X)^{-1} X^T y$, which minimizes the residual sum of squares, $\mathrm{RSS} = (y - X\hat{\beta})^T (y - X\hat{\beta})$, without any distributional assumptions. In this model, $y$ is the outcome vector, $X$ is the design matrix, $\beta = (\beta_1, \ldots, \beta_p)^T$ are the regression parameters, and $\epsilon$ are the random measurement errors.

$$\underset{N \times 1}{y} = \underset{N \times p}{X}\,\underset{p \times 1}{\beta} + \underset{N \times 1}{\epsilon}, \tag{2.1}$$

$$\underset{N \times 1}{y} = [y_1^T, \ldots, y_n^T]^T, \qquad \underset{N \times p}{X} = [X_1^T, \ldots, X_n^T]^T. \tag{2.2}$$
During this notation development, in order to anticipate extension of this simple model, suppose the $N$ units are partitioned into $n$ clusters of size $m_i$ so that $N = \sum_{i=1}^{n} m_i$. In longitudinal medical trials, the clustering mechanism reflects $i = 1, \ldots, n$ human subjects observed at $j = 1, \ldots, m_i$ times each, where often $m_i \equiv m = N/n$. An effort will be made to use this notation throughout this document.
The individual-level outcomes $y_i = (y_{i1}, \ldots, y_{im})^T$ and covariates $\underset{m \times p}{X_i} = [x_{i1}, \ldots, x_{im}]^T$ are the building blocks of the entire-study variables (2.2), where $x_{ij}$ is the $p \times 1$ vector of covariates for subject $i$ at time $j$. The error term $\epsilon$ is similarly built from the individual errors $\epsilon_i = (\epsilon_{i1}, \ldots, \epsilon_{im})^T$.
Under the assumption of independent error terms with distribution $\epsilon \sim \mathrm{MVN}(0, \sigma^2 I)$, the OLS estimator $\hat{\beta}_{OLS}$ is also the maximum likelihood estimator (MLE). This assumption implies $Y_i \overset{iid}{\sim} \mathrm{MVN}(X_i\beta, \sigma^2 I)$.
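As a small illustration of the OLS estimate, the sketch below fits a straight line in time to one hypothetical outcome profile. With a design matrix whose columns are a constant and the observation time, $(X^T X)^{-1} X^T y$ reduces to the familiar closed-form intercept and slope. The function names and the data values are illustrative assumptions and are not taken from any study.

```python
def ols_line(t, y):
    """Closed-form OLS for y = b0 + b1 * t, i.e. (X'X)^{-1} X'y with X = [1, t]."""
    n = len(t)
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    return intercept, slope


def rss(t, y, intercept, slope):
    """Residual sum of squares (y - Xb)'(y - Xb) for a given line."""
    return sum((yi - intercept - slope * ti) ** 2 for ti, yi in zip(t, y))


# Hypothetical profile: four equally spaced occasions, made-up outcome values.
times = [0, 1, 2, 3]
outcome = [30.0, 27.0, 21.0, 20.0]
b0, b1 = ols_line(times, outcome)
print("intercept:", b0, "slope:", b1, "RSS:", rss(times, outcome, b0, b1))
```

Any other choice of intercept and slope gives a larger residual sum of squares for these data, which is the defining property of the OLS solution.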
2.8.2 Correlated Data
Longitudinal studies vary considerably according to the relative values n and m as well as the variable types. Cases where there are many observations (large m) for
few subjects (n) may inspire a time series or functional data approach. Link functions are used in generalized linear models for discrete outcomes, and survival analysis can be employed for outcomes that indicate an event occurrence.
The main challenge of longitudinal data is the nonzero correlation of outcome variables within a subject, and the OLS solution is hence insufficient since it assumes independence of the measurement error terms in $\epsilon$. The impact of this within-subject correlation, generally in the positive direction, depends on whether the comparison of interest compares multiple groups at the same time or different times within an individual.
A general rule of thumb [6] is that positive within-subject correlation increases the variance in estimation of differences between groups and decreases it for comparisons within a subject. Clearly, increases in variance for an estimator diminish the power of any corresponding tests, requiring larger sample sizes. The intuition behind this general rule is that highly correlated observations within a subject leave the remaining bulk of variation between the subjects, making it harder to summarize groups of individuals.
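The rule of thumb can be made concrete with two textbook variance formulas. The sketch below assumes an exchangeable correlation $\rho$ and a common variance $\sigma^2$, and compares (a) the variance of a difference between two group means, where each subject contributes the average of $m$ observations, with (b) the variance of the mean change between two occasions within the same subjects. The function names and the numerical inputs are hypothetical.

```python
def var_between_groups(sigma2, rho, n, m):
    """Variance of the difference between two independent group means,
    each averaging m exchangeably correlated observations on n subjects."""
    return 2.0 * sigma2 * (1.0 + (m - 1) * rho) / (n * m)


def var_within_subject_change(sigma2, rho, n):
    """Variance of the mean change between two occasions over n subjects."""
    return 2.0 * sigma2 * (1.0 - rho) / n


# Hypothetical inputs: variance 100, 50 subjects per group, 4 occasions.
for rho in (0.0, 0.2, 0.4, 0.6):
    between = var_between_groups(100.0, rho, n=50, m=4)
    within = var_within_subject_change(100.0, rho, n=50)
    print(f"rho={rho:.1f}  between-group: {between:.2f}  within-subject: {within:.2f}")
```

As $\rho$ grows, the first variance increases and the second decreases, in line with the rule stated above.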
The standard linear model (2.1) is extended to the longitudinal context in

$$y_i = X_i\beta + \epsilon_i \quad \text{for independent units } i = 1, \ldots, n, \tag{2.3}$$

where $Y_i \overset{ind}{\sim} \mathrm{MVN}(X_i\beta, V_i)$, the variance matrices $V_i$ are a function of the covariance parameter $\alpha$, and a common simplifying assumption dictates $V_i \equiv \sigma^2 V_0$.
The weighted least squares solution, $\hat{\beta}_{WLS}$, minimizes the $\mathrm{RSS} = (y - X\hat{\beta})^T W (y - X\hat{\beta})$, where $W$ is some symmetric weight matrix. The choice of $W$ is an important part of estimating robust (or empirical) standard errors for $\hat{\beta}$, and $W^{-1}$ is called the working variance matrix.
For short evenly-spaced time sequences with minimal missingness, estimation of $\beta$ and of $\mathrm{Var}(\hat{\beta})$ via weighted least squares (WLS) is robust to misspecification of the covariance structure. Unless inference regarding $\alpha$ is desired, simple correlation assumptions suffice [6].
Combining the outcomes of distinct subjects in (2.3) by vertically stacking entries of $Y_i$ and $X_i$ to compose $Y$ and $X$, respectively,

$$Y \sim \mathrm{MVN}(X\beta, H \equiv \sigma^2 V), \tag{2.4}$$

where the unscaled covariance matrix $H \equiv I_{n \times n} \otimes V_i$, and the scaled covariance matrix $V \equiv I_{n \times n} \otimes V_0$ under the common individual covariance ($V_i \equiv \sigma^2 V_0$) assumption with $\sigma^2$ as the common within-subject variance.
The general and two specific weighted least squares estimates for $\beta$ are

$$\hat{\beta}_{WLS} = (X^T W X)^{-1} X^T W y$$

$$\hat{\beta}_{OLS} = (X^T X)^{-1} X^T y$$

$$\hat{\beta}_{GLS} = (X^T V^{-1} X)^{-1} X^T V^{-1} y,$$

where the weight matrices are the identity in $\hat{\beta}_{OLS}$ and the inverted, scaled variance matrix in $\hat{\beta}_{GLS}$, the generalized least squares (GLS) estimator. Note that $\hat{\beta}_{OLS}$ is the same OLS estimate referred to as $\hat{\beta}$ above.
The variance of the WLS estimator is

$$R_W \equiv \mathrm{Var}(\hat{\beta}_{WLS}) = \{(X^T W X)^{-1} X^T W\}\, H\, \{W X (X^T W X)^{-1}\}$$

and reduces in the special GLS case to the formulation in equation (2.5).

$$\mathrm{Var}(\hat{\beta}_{GLS}) = (X^T H^{-1} X)^{-1} \tag{2.5}$$
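The reduction to (2.5) can be verified directly by setting $W = V^{-1}$ and using $H = \sigma^2 V$ from (2.4). The following short derivation is a sketch that assumes $V$ is symmetric and invertible and that $X$ has full column rank:

```latex
\begin{align*}
R_W &= (X^T V^{-1} X)^{-1} X^T V^{-1} \, (\sigma^2 V) \, V^{-1} X (X^T V^{-1} X)^{-1} \\
    &= \sigma^2 (X^T V^{-1} X)^{-1} (X^T V^{-1} X) (X^T V^{-1} X)^{-1} \\
    &= \sigma^2 (X^T V^{-1} X)^{-1}
     = (X^T H^{-1} X)^{-1}.
\end{align*}
```

The last equality uses $H^{-1} = \sigma^{-2} V^{-1}$.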
Diggle et al. [6] compare the efficiency of $\hat{\beta}_{GLS}$ relative to $\hat{\beta}_{OLS}$ in the cases of compound symmetry, exponential correlation, and a crossover design. The more $X_i$ and $V_i$ vary between subjects, the less suitable is the OLS estimate.
The various approaches to longitudinal data analysis differ primarily in their handling of $V$, the variance of the outcome random vector. Embedded in the general model above are several assumptions that may not be true in some study designs, e.g., an equal number ($m$) of observations per subject, identical within-subject covariance ($V_i = \sigma^2 V_0$), and between-subject independence (the block diagonal form of $V$).
2.8.3 Marginal and Mixed Models
The two main linear models employed in longitudinal data analysis are marginal models and linear mixed effects (LME) models. They differ in distribution and correlation structure assumptions, estimation methods, parameter magnitude and interpretation, and separate versus joint modeling of the linear predictor ($X_i\beta$) and variance components.
The primary goals of most longitudinal analyses are consistent and efficient estimators for the regression parameter ($\beta$) and variance ($V$) as well as robust estimates of their corresponding standard errors for the purpose of valid inference. In addition, the analysis should address potential bias introduced by missing observations, a particular concern in longitudinal trials. The choice of model necessarily depends on the aims of the study, specifically its hypotheses and their scope.
Marginal models are quite flexible in modeling the structure of the variation ($V$) in the response variable while LME models assume particular structures to this variation by invoking and conditioning on a latent random variable by subject. In some cases, these models for $V$ are equivalent in some sense. For example, a random intercept LME model imposes the same structure to $V$ as does the marginal model with an exchangeable correlation. The difference is that, with LME, the variation is partitioned into two parts, and inference is made conditional on one of these.
In marginal models, the population average of the response variable is regressed over a selection of covariates of interest. The models essentially integrate over all other attributes of the entire population for each level of the explanatory variable.
The parameters in a marginal model reflect the effect of the chosen covariates on the population average, which is often the interest in epidemiological or other public health applications.
Diggle, Heagerty, Liang, and Zeger [6] categorize longitudinal methods that do not collapse repeated measurements into a single summary statistic into three types.
The marginal analysis models estimate the parameters determining the population means, β, and variances, α, separately. Random effect models assume the regression coefficients vary randomly in the population. Transition models allow the distribution of an observation at a particular measurement occasion to depend on the previous outcome values and current covariates.
Diggle et al. [6] suggest that the choice between marginal and linear mixed effects models may depend on the relative sizes of variation between and within subjects.
When there is large variability among subjects, within-subject comparisons are more precise than between-subject (or group) contrasts, which may benefit from LME models. If, on the other hand, there is little variation between subjects, marginal models may be appropriate.
Marginal models answer questions regarding the trend of a population mean over
time rather than the evolution of individual profiles. In the case of an outcome with
a Gaussian distribution and a linear contrast of interest, these are equivalent since
expectation is a linear operator.
2.8.4 Linear Mixed Models
In linear mixed effects models, a latent variable based on some unobservable covariates is assumed to distinguish the heterogeneity of the population. The model parameters then explain the regression of the outcome on the observed covariates of interest, within an individual stratum of the population sharing the same latent property. The marginal models, on the other hand, overlook this sometimes inexplicable heterogeneity by averaging it out over the entire population.
Marginal models are used when investigators are interested only in the effect of population-specific covariates on an entire population. LME models assess covariate effects within unidentified subsets of individuals who share an unmeasured or unmeasurable property. The hierarchical nature of mixed models makes them analogous to a Bayesian approach [5].
In the LME model (2.6), the random variation in (2.3) is divided into individual random effects and measurement error. The $Z_i$ are generally a submatrix of the full design matrix $X_i$, and $q \le p$, where $q$ and $p$ are the numbers of random ($b_i$) and fixed ($\beta$) effects, respectively. It is commonly assumed that $\Sigma_i = \sigma^2 I_m$, or that the observations within a subject are independent given $b_i$.

$$y_i = X_i\beta + Z_i b_i + \epsilon_i, \tag{2.6}$$

where

$$b_i \overset{iid}{\sim} N(0, D) \quad \text{independent of} \quad \epsilon_i \overset{iid}{\sim} N(0, \Sigma_i).$$

The distributional assumptions in (2.6) imply $Y_i \overset{ind}{\sim} \mathrm{MVN}(X_i\beta, V_i \equiv Z_i D Z_i^T + \Sigma_i)$.
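To see what the implied covariance $Z_i D Z_i^T + \Sigma_i$ looks like in the simplest case, the sketch below builds it for a random intercept model with $\Sigma_i = \sigma^2 I_m$ and converts it to a correlation matrix. Every off-diagonal entry equals $D/(D + \sigma^2)$, which is the exchangeable structure. The variance values and function names are hypothetical and chosen only for illustration.

```python
def marginal_covariance(Z, D, sigma2):
    """Return V = Z D Z' + sigma2 * I for one subject, using lists of lists."""
    m, q = len(Z), len(Z[0])
    V = [[0.0] * m for _ in range(m)]
    for r in range(m):
        for c in range(m):
            V[r][c] = sum(Z[r][a] * D[a][b] * Z[c][b]
                          for a in range(q) for b in range(q))
            if r == c:
                V[r][c] += sigma2  # independent measurement error on the diagonal
    return V


def correlation(V):
    """Convert a covariance matrix to the corresponding correlation matrix."""
    m = len(V)
    return [[V[r][c] / (V[r][r] * V[c][c]) ** 0.5 for c in range(m)]
            for r in range(m)]


# Random intercept only: Z is a column of ones and D is a 1 x 1 variance.
Z = [[1.0], [1.0], [1.0], [1.0]]
D = [[40.0]]     # hypothetical between-subject (intercept) variance
sigma2 = 60.0    # hypothetical measurement error variance
R = correlation(marginal_covariance(Z, D, sigma2))
for row in R:
    print([round(value, 3) for value in row])
```

Adding a random slope (a second column in `Z` holding the observation times and a 2 x 2 matrix `D`) would make the implied variances and correlations depend on time, so the exchangeable form is specific to the random intercept case.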
2.9 Estimation
Of primary concern in statistical analysis is to produce “good” estimates of the mean structure parameters. Often, one or several parameters are of key interest and have an interpretable meaning with regard to the research objective, e.g., determining the effect of a proposed treatment. Maximum likelihood methods are presented here for mean and covariance parameters in linear mixed models.
2.9.1 Likelihood Functions
Under assumption (2.4), the likelihood function depends on the parameters ($\beta$, $V_0$, $\sigma^2$). Substituting $\hat{\beta}$, as a function of $V_0$, for $\beta$ allows for the solution $\hat{\sigma}^2 = \mathrm{RSS}(V_0)/(nm)$ and reduction via profiling of the log likelihood function to

$$l(V_0) = -\frac{n}{2}\{m \log \mathrm{RSS}(V_0) + \log(|V_0|)\},$$

where $\mathrm{RSS}(V_0) = \sum_i (y_i - X_i\hat{\beta})^T V_0^{-1} (y_i - X_i\hat{\beta})$. Demidenko [5] derived a considerably simplified variance-profiled log likelihood function for a balanced observation random intercept linear mixed model.
Often, assumptions about the mean structure affect the saturation of $X$. A fully saturated model estimates a separate mean for each possible covariate condition. Diggle et al. [6] note that misspecification of $X$ will bias the estimates of $\sigma^2$ and $V_0$. REML, restricted maximum likelihood, circumvents this bias.
In marginal models, it is often necessary to model $\beta$ and $\alpha$ separately. When $V_i(\alpha)$ is known, the likelihood of $\beta$ is maximized by the standard MLE (2.7), which maximizes (2.8), where $\theta = (\beta^T, \alpha^T)^T$. An alternate approach, restricted maximum likelihood (REML), is often employed in order to estimate the variance components parameter $\alpha$.

$$\hat{\beta} = \left(\sum_{i=1}^{n} X_i^T V_i^{-1} X_i\right)^{-1} \sum_{i=1}^{n} X_i^T V_i^{-1} y_i \tag{2.7}$$

$$L_{ML}(\theta) = \prod_{i=1}^{n} (2\pi)^{-m/2}\, |V_i|^{-1/2} \exp\left\{-\frac{1}{2}(y_i - X_i\beta)^T V_i^{-1}(\alpha)\, (y_i - X_i\beta)\right\} \tag{2.8}$$
2.9.2 REML
Restricted maximum likelihood estimation (REML) transforms the response vector into $U_{(nm-p) \times 1} = A^T Y$, called error contrasts, which have a distribution depending only on the variance components and not on the fixed effects. The matrix $A$ has $(nm - p)$ linearly independent columns also mutually orthogonal to the columns of design matrix $X$. The resulting restricted likelihood (2.9) depends only on $\alpha$. Alternatively, the REML (joint) likelihood function (2.10) can be solved simultaneously for $\theta$ to get REML estimators for $\beta$ and $\alpha$. In practice, MLE and REML estimators differ drastically only in models including a large number ($p$) of explanatory variables.

$$L(\alpha) \propto \left|\sum_{i=1}^{n} X_i^T V_i^{-1} X_i\right|^{-1/2} L_{ML}(\hat{\beta}(\alpha), \alpha) \tag{2.9}$$

$$L_{REML}(\theta) = \left|\sum_{i=1}^{n} X_i^T V_i^{-1} X_i\right|^{-1/2} L_{ML}(\theta) \tag{2.10}$$
The likelihoods are optimized using iterative methods, such as procedures based on Newton-Raphson. The parameter space is $\Theta = \Theta_\beta \times \Theta_\alpha$, where $\alpha$ determines $V_i$ and $\theta$ is an element of the space. Note that both $D$ and $\Sigma_i$ must be positive semi-definite in the mixed effects model, resulting in a more restrictive parameter space than for the marginal model, where this requirement applies only to $V_i$.
REML estimation decomposes the outcome vector $Y$ into two parts, one depending only on $\beta$ and the other only on $\alpha$. For example, $Y$ can be broken into its orthogonal projection onto the column-space of $X$ and the OLS residual vector.
The REML procedure results in $\tilde{\sigma}^2 = \mathrm{RSS}(V_0)/(nm - p)$ and a new reduced log likelihood of

$$l^*(V_0) = l(V_0) - \frac{1}{2}\log(|X^T V^{-1} X|).$$

The difference between ML and REML estimates is most pronounced when $p$ is large or $V$ is near-singular.
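The effect of the REML divisor on the variance estimate is easy to tabulate. The sketch below compares RSS/(nm) with RSS/(nm − p) for a fixed residual sum of squares; the RSS value and the values of p are hypothetical, while n = 165 and m = 4 are borrowed from the motivating study purely for scale.

```python
def sigma2_ml(rss, n, m):
    """ML estimate of the variance scale: RSS / (n * m)."""
    return rss / (n * m)


def sigma2_reml(rss, n, m, p):
    """REML estimate of the variance scale: RSS / (n * m - p)."""
    return rss / (n * m - p)


rss_value = 66000.0  # hypothetical residual sum of squares
n, m = 165, 4
for p in (2, 6, 20, 100):
    ml = sigma2_ml(rss_value, n, m)
    reml = sigma2_reml(rss_value, n, m, p)
    print(f"p={p:3d}  ML: {ml:.2f}  REML: {reml:.2f}  ratio: {reml / ml:.3f}")
```

The ratio of the two estimates is nm/(nm − p), so the estimates are nearly identical for a handful of fixed effects and separate only as p becomes an appreciable fraction of nm.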
2.10 Inference
Hypothesis testing for linear mixed models is achieved using various tests, including likelihood ratio tests (LRT) and Wald tests.
2.10.1 Likelihood Ratio Test
The likelihood ratio test (LRT) statistic for comparing a reduced model nested within a full model equals $-2 \log(L_{reduced}/L_{full})$. Under the null hypothesis of the reduced model, the LRT statistic follows a Chi-squared distribution with degrees of freedom equal to that lost in the reduction between models.
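A minimal sketch of the LRT calculation follows. It takes the maximized log likelihoods of the two nested models as given (the values used here are hypothetical) and evaluates the Chi-squared tail probability with closed forms that are exact for 1 or 2 degrees of freedom, so that only the standard library is needed. The function names are illustrative.

```python
import math


def lrt_statistic(loglik_reduced, loglik_full):
    """-2 log(L_reduced / L_full), computed from the two log likelihoods."""
    return -2.0 * (loglik_reduced - loglik_full)


def chi2_upper_tail(x, df):
    """P(chi-squared_df > x) using closed forms available for df = 1 or 2."""
    if x < 0:
        raise ValueError("the statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise NotImplementedError("only df = 1 and df = 2 are handled in this sketch")


# Hypothetical maximized log likelihoods; the reduced model drops one parameter.
stat = lrt_statistic(loglik_reduced=-1502.3, loglik_full=-1500.1)
p_value = chi2_upper_tail(stat, df=1)
print("LRT statistic:", round(stat, 3), "p-value:", round(p_value, 4))
```

For comparisons involving fixed effects, both log likelihoods would normally come from ML rather than REML fits, since REML likelihoods are not comparable across different mean structures.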
2.10.2 Wald Test
Inferences involving the fixed effects involve the hypotheses (2.11) and asymptotic distribution of a test statistic (2.12), where $K_{s \times p}$ is full-rank and $Q_n^2 \xrightarrow{d} \chi^2_s(0)$ as $n \to \infty$ under $H_0$. This Wald test underestimates the standard error of $\hat{\beta}$ by ignoring variability introduced by estimating $\alpha$. Verbeke and Molenberghs [24] discuss approximate tests (t and F) to compensate for this bias.

$$H_0: K\beta = 0 \qquad H_a: K\beta \ne 0 \tag{2.11}$$

$$Q_n^2 = [K(\hat{\beta} - \beta)]^T \left[K \left(\sum_{i=1}^{n} X_i^T V_i^{-1}(\hat{\alpha}) X_i\right)^{-1} K^T\right]^{-1} [K(\hat{\beta} - \beta)] \tag{2.12}$$

The so-called sandwich (or empirical variance) estimator (2.13) for $\mathrm{Var}(\hat{\beta})$ has been shown to be robust to misspecification of the correlation structure but not to incorrect linear predictors of the mean or missing data, where $r_i = y_i - X_i\hat{\beta}$ and $\hat{V}_i = r_i r_i^T$.

$$\mathrm{Var}(\hat{\beta}) = \left(\sum_{i=1}^{n} X_i^T V_i^{-1} X_i\right)^{-1} \left(\sum_{i=1}^{n} X_i^T V_i^{-1} [r_i r_i^T] V_i^{-1} X_i\right) \left(\sum_{i=1}^{n} X_i^T V_i^{-1} X_i\right)^{-1} \tag{2.13}$$
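For a single contrast (s = 1), the statistic in (2.12) collapses to a squared estimate divided by its variance. The sketch below evaluates it for a hypothetical three-parameter fit, taking the estimated covariance matrix of the fixed effects as given; it could equally be a model-based or a sandwich estimate. All numbers and names are illustrative assumptions.

```python
import math


def wald_single_contrast(k, beta_hat, cov_beta):
    """Wald test of H0: k'beta = 0 for one contrast vector k.

    Returns the contrast estimate, its variance, the statistic Q^2, and the
    p-value from the Chi-squared distribution with 1 degree of freedom."""
    p = len(k)
    estimate = sum(k[a] * beta_hat[a] for a in range(p))
    variance = sum(k[a] * cov_beta[a][b] * k[b]
                   for a in range(p) for b in range(p))
    q2 = estimate ** 2 / variance
    p_value = math.erfc(math.sqrt(q2 / 2.0))
    return estimate, variance, q2, p_value


# Hypothetical fit: intercept, control slope, and treatment-by-time slope difference.
beta_hat = [30.0, -1.0, -3.0]
cov_beta = [[4.00, 0.00, 0.00],
            [0.00, 0.81, 0.00],
            [0.00, 0.00, 1.44]]
k = [0.0, 0.0, 1.0]  # isolates the slope difference
estimate, variance, q2, p_value = wald_single_contrast(k, beta_hat, cov_beta)
print("estimate:", estimate, "Q^2:", round(q2, 3), "p-value:", round(p_value, 4))
```

Because the covariance matrix is treated as known, this sketch inherits the downward bias in the standard error noted above; the approximate t and F tests would replace the Chi-squared reference distribution.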
2.11 Evaluation of Estimators
In order to compare the two estimators for the RDICT design, it is necessary to establish a metric for the performance of the estimator. Besides minimizing bias,
increasing efficiency and maximizing power to detect practically significant treatment effects are desirable. In addition, confidence interval coverage and mean squared error (MSE) are meaningful assessment measures for a particular method [4].
There are a multitude of criteria used to evaluate the strength of a design. Tu et
al. [22] derive the power function for linear mixed models based on the asymptotic
variance of the parameter estimator. The power function is used by other authors as
well [14, 9].
Winkens et al. [25] use the c-optimality criterion of minimizing $\mathrm{Var}(c^T\hat{\beta})$ where the contrast vector $c$ isolates the parameter of interest. Several authors [20, 9] have used a variance inflation factor (VIF) or “design effect”, which depends on the intraclass correlation coefficient (ICC), $\rho$. Ouwens et al. [15] utilize the D-optimality criterion of minimizing $\det\{\mathrm{Var}(\hat{\beta})\}$.
Winkens et al. [25] further utilize the c-optimality criterion to introduce the relative efficiency of two designs, which is the ratio of c-optimal variances.
2.11.1 A Sample Size Calculation
Although I devise and employ a crude sample size calculation for the simulation studies in section 4.5.5, a more elaborate one from the literature is presented below.
The per-group sample size required [6] to detect a between-group difference of $d$, in the slopes for the fixed and common covariate $x_{m \times 1}$, between two groups with power $P = 1 - Q$ and type I error $\alpha$ is

$$n = \frac{2(z_\alpha + z_Q)^2 \sigma^2}{d^2} \cdot \frac{(1 - \rho)}{m s_x^2},$$

where $s_x^2 = \mathrm{Var}(x_{m \times 1})$ and $\rho$ is the correlation between any two distinct observations within a subject. Note that $n$ decreases as $\rho$ increases with this statistical question.
The second factor above is

$$\left[(X_i^T R_i^{-1} X_i)^{-1}\right]_{2,2}$$

for exchangeable correlation and can be generalized, where $R_i = \mathrm{Corr}(y_i)$.
In a problem where group differences are of primary interest, change over time may be disregarded by collapsing observations within a subject to an average. The sample size then increases with ρ, since the variance of an average increases with the covariances between the terms being summed. Diggle et al. [6] demonstrate this reversal with continuous and binary outcomes.
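The reversal can be illustrated numerically. The sketch below codes the slope-difference sample size formula given above and, for contrast, a companion formula for a difference in subject averages that I derive from the variance of an average under exchangeable correlation, $\sigma^2\{1 + (m-1)\rho\}/m$; that second formula is my assumption of the form intended, not a quotation from the source. The quantiles are taken as one-sided for $\alpha$, which is also an assumption, and all input values are hypothetical.

```python
from statistics import NormalDist


def _squared_quantile_sum(alpha, power):
    """(z_alpha + z_Q)^2 with upper-tail normal quantiles and Q = 1 - power."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    z_q = NormalDist().inv_cdf(power)
    return (z_alpha + z_q) ** 2


def n_for_slope_difference(d, sigma2, rho, x, alpha=0.05, power=0.80):
    """Per-group n to detect a slope difference d over the common times x."""
    m = len(x)
    x_bar = sum(x) / m
    s2x = sum((xj - x_bar) ** 2 for xj in x) / m  # divisor m is an assumption
    z2 = _squared_quantile_sum(alpha, power)
    return 2.0 * z2 * sigma2 * (1.0 - rho) / (m * s2x * d ** 2)


def n_for_mean_difference(d, sigma2, rho, m, alpha=0.05, power=0.80):
    """Per-group n to detect a difference d in subject averages over m times."""
    z2 = _squared_quantile_sum(alpha, power)
    return 2.0 * z2 * sigma2 * (1.0 + (m - 1) * rho) / (m * d ** 2)


# Hypothetical inputs: four equally spaced occasions and variance 100.
occasions = [0, 1, 2, 3]
for rho in (0.2, 0.4, 0.6):
    n_slope = n_for_slope_difference(2.0, 100.0, rho, occasions)
    n_mean = n_for_mean_difference(5.0, 100.0, rho, len(occasions))
    print(f"rho={rho:.1f}  slope comparison: {n_slope:.1f}  mean comparison: {n_mean:.1f}")
```

In practice the returned values would be rounded up to the next whole subject and inflated for anticipated attrition.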
2.12 Missing Data
Missing data mechanisms are categorized as missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR). The ignorability of these mechanisms depends on the method of estimation used. Two likelihood based approaches to modeling missing data are selection and pattern mixture models, which differ in the factorization of a joint likelihood for the response vector and missingness pattern.
2.13 Summary
The linear mixed effects (LME) model is a flexible and useful tool for fitting longitudinal data from a randomized controlled trial. In the next chapter, an LME model is fit to the data from the motivating example and generalized for the simulation study in the following chapter.
CHAPTER 3
MODELS AND DATA
3.1 Introduction
In this chapter, the analysis of the MFPG dataset is presented to motivate a general model framework for the simulation study in the next chapter.
3.2 MFPG: Study Description
The NIMH-funded Multi-Family Psychoeducation Group (MFPG) study [8], conducted by Dr. Mary Fristad of The Ohio State University (OSU), was a randomized controlled trial to assess the effect of a psychoeducational intervention by following the mood severity of n = 165 children at m = 4 evenly-spaced times over 18 months, as seen in Figure 3.1. To qualify for inclusion in MFPG, the children aged 8-11 years were diagnosed with either bipolar or major depressive disorder at baseline. In each case, a primary informant also participated in interviews and treatment. Families were enrolled quarterly in 11 cycles of 15 families.
3.2.1 Types of Condition and Intervention
The intervention consisted of 8 weekly group sessions for the children and caregivers run by trained psychologists. The session sizes varied from 2 to 8 participants, depending on attrition for each enrollment cycle. Of the 15 families enrolled in each cycle, 7 were randomly assigned to receive treatment immediately (k = 1) and the remaining 8 were scheduled to receive treatment after 1 year as waitlist controls (k = 0), so that overall n0 = 87 and n1 = 78.
[Figure 3.1 plot: mean MSI (y-axis, 15 to 35) against Time (x-axis, 0 to 3) for the IMMEDIATE and DELAYED groups.]
Figure 3.1: Mean Outcome Evolution by Treatment Group for Entire (Intent-to-Treat) Sample. The green diamonds and red squares represent the means over time for mood severity index (MSI) for the immediate treatment (IT) and delayed treatment (DT) groups, respectively. The error bars represent point-wise 95% confidence intervals for the group mean at each observation occasion. Note that intervention was delivered at observation times 0 and 2, respectively, for the IT and DT groups.
According to the taxonomy developed in section 2.4, the condition of a mood disorder is generally chronic and potentially progressive, getting worse over time without treatment. A chronic condition requires a continuous treatment effect to maintain healthy symptom ranges. The form of the treatment effect is basically a trend from pathology toward normal followed by leveling-off, possibly in the fashion of a sigmoid curve.
The psychoeducational intervention is permanent and irreversible, since a person cannot undo exposure to a course of learning. The effect of this type of intervention is gradual, in that it takes some time for the full TE to develop, and it is continuous, although like any knowledge, it may wear off in the long run. Refresher courses may be beneficial.
[Figure 3.2 plot: panel title "Primary Treatment"; one panel per participant showing MSI (y-axis, 0 to 60) against Time (x-axis, 0 to 3).]
Figure 3.2: Sample Outcome Profiles in Immediate Treatment Group. Observed individual profiles in mood severity index (MSI) are shown in separate panels for a random subset of participants from the immediate treatment group. Lines that do not extend for the full time indicate a loss to followup. Note that intervention for this group occurred at observation time 0.
[Figure 3.3 plot: panel title "Delayed Intervention"; one panel per participant showing MSI (y-axis, 0 to 60) against Time (x-axis, 0 to 3).]
Figure 3.3: Sample Outcome Profiles in Delayed Treatment Group. Observed individual profiles in mood severity index (MSI) are shown in separate panels for a random subset of participants from the delayed treatment group. Lines that do not extend for the full time indicate a loss to followup. Note that intervention for this group occurred at observation time 2.
3.2.2 Study Design
The particular randomized delayed-intervention controlled trial (RDICT) design
chosen specified observation of a main outcome, or indicator of disease severity, for
each individual at m = 4 equally-spaced timepoints, indexed by j = 0, 1, 2, 3. The
delay, d, for this RDICT design was 2 observation timepoints.
Approximately half of the participants, n1 = 78, with group indicator k = 1, were selected via randomization to receive the overt treatment immediately following their baseline measurements, j = 0. The remaining half of study participants, n0 = 87, with k = 0, acted as controls until they received the delayed intervention following the third timepoint, j = d ≡ 2.
Since the control group was observed following treatment, specifically at the fourth time corresponding to j = 3, observations taken on this group may or may not contribute useful information in the estimation of treatment effect.
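The timing of exposure in this design can be laid out explicitly. The sketch below uses one hypothetical coding, the number of observation periods elapsed since a group's intervention, to show which observations are post-intervention in each arm; it is an illustration of the design and not necessarily the covariate coding used in the analysis. The function name and the `delay` argument are assumptions.

```python
def periods_since_treatment(j, k, delay=2):
    """Observation periods elapsed since intervention at timepoint j.

    k = 1 is the immediate treatment group (treated after j = 0);
    k = 0 is the delayed treatment group (treated after j = delay)."""
    start = 0 if k == 1 else delay
    return max(0, j - start)


timepoints = range(4)  # j = 0, 1, 2, 3
design = {k: [periods_since_treatment(j, k) for j in timepoints] for k in (1, 0)}
for k, exposure in design.items():
    label = "immediate" if k == 1 else "delayed"
    print(f"k={k} ({label}): {exposure}")
```

Under this coding the delayed group contributes exactly one post-intervention observation, at j = 3, which is the observation whose usefulness is at issue.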
3.3 Data Structure
The continuous outcome variable for MFPG was mood severity index (MSI), which has a range of 0 to 133. The MSI range for a healthy child is generally 0 to 10, while a score above 35 signifies severe pathology. The approximate ranges 10 to 20 and 20 to 35 in MSI represent the symptom-only and diagnosis categories, respectively. The outcome of interest, MSI, combined scores from instruments evaluating symptoms of depression and mania in the child, as reported by the caregiver.
3.3.1 Outcome and Design
For each of the n = 165 study participants, the MSI values were assessed via psychological interviews at each of m = 4 observation occasions, evenly spaced by six-month intervals. This study is technically a group or cluster randomized trial
(GRT), although that feature is disregarded in the present research.
3.3.2 Exploratory Data Analysis
Observed individual profiles over time for a random subset of participants by treatment group are shown in Figures 3.2 and 3.3. The sample mean evolution by treatment group with pointwise confidence intervals is demonstrated in Figure 3.1. To assess the extent of intra-subject dependence, the sample correlations calculated from pairwise complete detrended data are presented in Table 3.1 and Figure 3.4. There is fairly strong and stable correlation that does not diminish over time, suggesting an exchangeable correlation structure.
[Figure 3.4 plot: scatterplot matrix titled "Detrended Outcomes" with panels for Time 0 through Time 3 and axes from −20 to 40.]
Figure 3.4: Within-Subject Correlation for Detrended MFPG Data. The group- and time-specific means were subtracted from the observed mood severity index (MSI) outcomes. Histograms and pair-wise scatter plots are shown along and below the main diagonal, respectively. The sample correlations are displayed above the main diagonal in font sizes proportional to their magnitude.
         Time 0   Time 1   Time 2   Time 3
Time 0   1        0.31     0.39     0.36
Time 1            1        0.49     0.51
Time 2                     1        0.48
Time 3                              1

Table 3.1: Sample Correlations for Pairwise Complete Detrended MFPG Data. These values were used to assess the suitability of the random intercept model as well as the strength of the intraclass correlation.
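As an informal check on the exchangeability suggested by Table 3.1, the sketch below averages the six off-diagonal correlations overall and by time lag. Only the correlation values come from the table; the variable and function names are illustrative and are not part of the original analysis.

```python
# Off-diagonal sample correlations copied from Table 3.1,
# keyed by the pair of observation times (earlier, later).
table_3_1 = {
    (0, 1): 0.31, (0, 2): 0.39, (0, 3): 0.36,
    (1, 2): 0.49, (1, 3): 0.51,
    (2, 3): 0.48,
}


def mean_correlation_by_lag(corrs):
    """Average the correlations that share the same time lag (later - earlier)."""
    by_lag = {}
    for (earlier, later), r in corrs.items():
        by_lag.setdefault(later - earlier, []).append(r)
    return {lag: sum(vals) / len(vals) for lag, vals in sorted(by_lag.items())}


# A single common value is what an exchangeable structure would assume.
common_correlation = sum(table_3_1.values()) / len(table_3_1)
lag_means = mean_correlation_by_lag(table_3_1)

print(f"common correlation: {common_correlation:.3f}")
for lag, value in lag_means.items():
    print(f"lag {lag}: {value:.3f}")
```

If the lag means fell steadily as the lag grew, a decaying structure such as first-order autoregressive would be the more natural candidate; roughly level lag means are what an exchangeable structure would produce.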
3.3.3 Missingness
A major consideration in any longitudinal study is the effect of missing observations on the validity of any models used or inferences made. Since the dropout patterns in this study were essentially only monotone, missingness is summarized by the attrition shown in Table 3.2. Overall, 28% (22 of 78) of the immediate treatment and 39% (34 of 87) of the delayed treatment participants were lost to followup.
Group             Time 0    Time 1    Time 2    Time 3   Total
Immediate (k=1)   78 (8)    70 (9)    61 (5)    56       78
Delayed (k=0)     87 (13)   74 (13)   61 (8)    53       87

Table 3.2: Number of Participants Remaining by Time and Treatment Group. For each time point, the number of subjects remaining in the MFPG study is listed by treatment group. In parentheses is the number of dropouts following that time period.
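The attrition figures quoted in the text can be recomputed directly from Table 3.2. The following is a minimal sketch; the counts are taken from the table, while the names `remaining` and `attrition_summary` are illustrative.

```python
# Participants remaining at Times 0..3, copied from Table 3.2.
remaining = {
    "Immediate (k=1)": [78, 70, 61, 56],
    "Delayed (k=0)": [87, 74, 61, 53],
}


def attrition_summary(counts):
    """Return per-interval dropouts, total lost, and proportion lost."""
    dropouts = [before - after for before, after in zip(counts, counts[1:])]
    total_lost = counts[0] - counts[-1]
    return dropouts, total_lost, total_lost / counts[0]


summaries = {group: attrition_summary(counts) for group, counts in remaining.items()}
for group, (dropouts, lost, proportion) in summaries.items():
    print(f"{group}: dropouts {dropouts}, lost {lost} ({proportion:.0%})")
```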
The response behavior was compared for these four dropout patterns in Figure 3.5 to see if data were missing completely at random (MCAR), i.e., if the outcome response was similar regardless of last interview time. It is especially clear that, in the delayed treatment (DT) group, the MSI for participants leaving the study differed greatly according to the time of their last interview. In particular, the 13 DT group participants who dropped out immediately after the second observation (j = 1) had much lower mean MSI scores at that time than did those who remained in the study longer.

[Figure 3.5 image: two panels, Immediate Treatment and Delayed Treatment; Average MSI (20 to 40) versus Time (0 to 3).]
Figure 3.5: Outcome Profiles by Last Observation Time and Treatment Group. The observed mean profiles in mood severity index (MSI) are shown for the 4 monotone missing data patterns in each group, immediate and delayed treatment. The single dot represents the participants lost to followup after observation time 0. The solid, dashed, and dotted lines respectively represent those who dropped out after time 1 or 2 and study completers.

3.4 Rationale for Model
With m = 4 timepoints and 2 treatment groups, the 8 different means could be modeled free of constraint, as in a repeated measures analysis of variance (RM-ANOVA). On the other hand, certain segments may be fit to a straight line. The piecewise linear mean structure is motivated by a smooth dynamic model curve, since the outcome levels are presumably changing continuously over time. The variance components are motivated by conjecture and the observed sample correlations. Given the sample correlations in Table 3.1, an exchangeable correlation structure seems appropriate, since the off-diagonal values, varying from .31 to .51, do not clearly decrease with time lag separation.
3.4.1 Conceptual Motivation: Dynamic Modeling
An alternative to the piecewise linear model of outcome response, the perhaps more realistic dynamic model represents the outcome as a continuous function over time with parameters according to times of treatment seeking and commencement of intervention, as shown in Figure 3.6. It is proposed that individuals suffering from a psychiatric or other medical problem may seek treatment during a crisis or peak in the evolution of their symptoms.
By this reasoning, if participants are recruited as they seek treatment, the disease symptom trajectory would reach a maximum with slope of zero at entry into the study. After this point, there is a natural decline in symptoms away from the crisis maximum, which may be called spontaneous remission, even in the absence of treatment. The hypothesized treatment effect is reflected in a more rapid symptom decline in the treatment group.
[Figure 3.6 image: Symptom Severity versus Time (0 to 3), with vertical lines marking PRIMARY TREATMENT and DELAYED INTERVENTION.]
Figure 3.6: Possible Conceptualization of Delayed Intervention Response. The green line represents a hypothetical continuous response in the immediate treatment (IT) group. The IT response gradually improves after receiving intervention (indicated by vertical line at time 0) following a sigmoid curve and then levels off. The red line represents a hypothetical dynamic response in the delayed treatment (DT) group. The DT response initially improves due to placebo effect and levels off prior to treatment at time 2 (indicated by vertical line). Following intervention, the DT response gradually decreases, experiencing the treatment effect, and levels off at the same final outcome value as the IT group.
The rapid improvement in the disorder symptomology cannot be sustained indefinitely, since at some point the level will presumably enter the healthy range of a person with no diagnosis. One possible model feature to capture this attenuation of treatment effect would specify a changepoint, or change in concavity or deceleration, in the symptom trajectory as it approaches a stable level on the outcome scale. It is reasonable to dictate this changepoint at a specified elapsed time from intervention.

[Figure 3.7 image: Symptom Severity versus Time (0 to 3), with observation points overlaid on the curves of Figure 3.6 and vertical lines marking PRIMARY TREATMENT and DELAYED INTERVENTION.]
Figure 3.7: Possible Conceptualization of Linearized Delayed Intervention Response. The dotted lines represent the hypothetical dynamic model from Figure 3.6. No matter the underlying model, observation points may capture an approximately linear profile, as illustrated.
If the model in Figure 3.6 were explored using the m = 4 RDICT design in MFPG,
a mean profile such as that shown in Figure 3.7 may result, which is not at all unlike
the actual mean profile for the treated subset (n = 116) of MFPG data in Figure 3.8.
3.4.2 Linearization Phases
It is sometimes useful to “break up the curvilinear growth trajectories into separate linear components ... to compare growth rates during two different periods.” [19] The piecewise linear form of the model divides the time scale into four distinct linear phases using three changepoints. The phases are 1) placebo effect, 2) placebo leveling, 3) treatment effect, 4) treatment leveling.

[Figure 3.8 image: Mean MSI (15 to 35) versus Time (0 to 3) for the IMMEDIATE and DELAYED groups.]
Figure 3.8: Mean Outcome Evolution by Treatment Group for Treated Subset. The green diamonds and red squares represent the means in mood severity index (MSI) over time for the immediate treatment (IT) and delayed treatment (DT) groups, respectively. The error bars represent the point-wise 95% confidence intervals for the mean at each observation occasion. Note that intervention was delivered at observation times 0 and 2, respectively, for the IT and DT groups.

A key feature of a model is hTE, the duration of phase 3. An educated guess at hTE provides guidance to researchers in designing a study. It is the length of time required to observe full or near-full treatment impact, prior to the leveling off in phase 4.
The first two phases potentially occur after entry into the study but in the absence
of and prior to treatment. Phase 1 is observed only in the delayed-intervention control
group. Phase 2 is observed depending on whether the delay for intervention exceeds
the time to placebo leveling or not. None of the designs, real or proposed, presented
in this research investigate phase 2 of the model.
The last two phases occur following intervention, with or without a delay. Phase
3, observed for all subjects under the RDICT design, represents the initial treatment
impact. The final phase (phase 4) is a leveling off, attenuation, or loss of the initial
treatment impact, which may or may not be captured by the study design depending
on the length of followup.
The changepoint between phases 1 and 2 occurs when the spontaneous remission
or placebo effect wears off. The second changepoint, between phases 2 and 3, occurs
at the administration of treatment, marking the beginning of treatment impact. The
final changepoint, between phases 3 and 4, occurs once the treatment has done its
work and participants either stabilize, continue slight improvement, or regress into
pathology.
3.4.3 Elements of the Model
For the MFPG data, based on the mean profiles in Figure 3.1, the immediate
treatment group experiences phases 3 and 4, treatment effect and leveling, while the
control group experiences phases 1 and 3, placebo and treatment effects. It is assumed
that the delay, d = 2 time periods, for intervention in the latter group was not long
enough for the placebo effect to wear off, so the second phase of the model was not
invoked by the MFPG design.
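The bookkeeping of which phases a given design can capture may be sketched as follows. This is an illustration under a simplifying assumption that is not stated in the original: a phase counts as observed when at least one observation interval falls inside it. The function name, its arguments, and the `placebo_leveling` parameter are hypothetical.

```python
def phases_observed(m, h, d=0, placebo_leveling=None):
    """List the model phases (1-4) that a group's observation schedule can capture.

    m: number of observation occasions (j = 0, ..., m - 1)
    h: occasions from intervention to the treatment-leveling changepoint
    d: delay before intervention, in occasions (0 for immediate treatment)
    placebo_leveling: occasion at which the placebo effect is assumed to wear
        off, or None if it is assumed not to wear off before treatment
    """
    last = m - 1
    phases = []
    if d > 0:
        phases.append(1)  # placebo effect before treatment
        if placebo_leveling is not None and placebo_leveling < d:
            phases.append(2)  # placebo leveling before treatment
    if last > d:
        phases.append(3)  # treatment effect
    if last > d + h:
        phases.append(4)  # treatment leveling
    return phases


# MFPG design: m = 4 occasions, h = 2, and delay d = 2 for the control group.
immediate = phases_observed(m=4, h=2, d=0)
delayed = phases_observed(m=4, h=2, d=2)
print("immediate:", immediate)
print("delayed:", delayed)
```

With the MFPG settings this reproduces the assignment described above: phases 3 and 4 for the immediate treatment group, and phases 1 and 3 for the control group.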
Both treatment groups are thus piecewise linear with a hinge at j = 2 or 12
months. For the control group, the trajectory was assumed to be linear in two pieces,
one before and one after treatment delivery at 12 months, after a delay of d = 2 obser-
vation times. Any improvement before the treatment may be considered spontaneous
or placebo-like.
The immediate treatment group could potentially follow a straight line over the
entire duration of the study, 18 months, but this is unlikely. At some point, it would
be expected that the outcome would level off after maximum treatment effect is
exhausted. In MFPG, the attenuation or leveling-off changepoint was after hTE = 12
months or h = 2 timepoints. This may have been anticipated but was more likely determined based on preliminary or pilot study results.
An additional possible constraint to this piecewise linear model with 4 distinct slopes requires the post-treatment slopes (prior to stabilization) to be equal, resulting in a model with only 3 slopes. This constraint is key to the present research, which asks whether treatment effect varies with the timing of delivery. If not, the delayed intervention in RDICT may provide useful information in estimation of the TE.
3.4.4 Choice of Random Effects
The natural grouping variable for random effects in MFPG is the individual and perhaps the small group sessions formed for treatment delivery. The repeated time
measurements, level 1, are nested within participant, level 2, and participants are
nested within session, level 3. Considered for inclusion in the linear mixed effects
model were a random intercept and slope for participant as well as a random intercept
by session. Decisions regarding inclusion of a random effect are made via comparison
of estimated standard deviations for those effects. Venables and Ripley in Chapter
10 [23] also suggest a comparison of models with and without random effects.
3.5 Model for MFPG Data
Consider symptom severity data of the form $y_{ijk}$, indexed by treatment group $k = 0, 1$, participant number $i = 1, \ldots, n_k$, and occasion $j = 0, \ldots, m-1$. These data represent $m$ repeated observations (over time) on $n_k$ subjects in each of the groups $k$. The $k = 1$ group is the immediate treatment group, while $k = 0$ references the control group.
3.5.1 Time Convention
The time variable, $t_j$ or $t_{jk}$, is indexed by the measurement occasion $j \in 0:(m-1)$ or, if necessary for distinction, by both the occasion and the group $k \in \{0, 1\}$. So far, there are two time scales for the study: the calendar time of days and months, and the observation time integers $j \in 0:(m-1)$, which map onto the former time scale. There are a few special integers, the hinge $h \in 1:(m-1)$ and the delay $d \in 1:(m-2)$, which lie on the latter time scale.
To standardize the statistical modeling, a third time scale was constructed. In subsequent modeling, the scale of the time variable, $t_j$ or $t_{jk}$, is such that a single unit of time equals the treatment effect time period, which was $h_{TE} = 12$ months in MFPG. In this design, the unit of time is divided into halves, or $h = 2$ time periods, by observation points that are $h_{TE}/h = 6$ months apart, so that $h_{TE}$ calendar time elapses between observations $j = 0$ and $j = h$.

Setting time at study entrance to $t_0 \equiv 0$ for both groups, as in so-called “entrance-centered” models, then dictates $t_h \equiv 1$ by the convention above. For MFPG, this forces the $m = 4$ time points $\mathbf{t} = (t_j : j \in 0:3)$ to be $(0, \tfrac{1}{2}, 1, \tfrac{3}{2})$.
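The time convention above amounts to dividing the occasion index by the hinge. The sketch below shows the mapping for the MFPG values; the function names are illustrative, and exact fractions are used only to keep the halves readable.

```python
from fractions import Fraction


def standardized_times(m, h):
    """Entrance-centered times t_j = j / h, so that t_0 = 0 and t_h = 1."""
    return [Fraction(j, h) for j in range(m)]


def calendar_months(times, h_te_months):
    """Map standardized times back to calendar months since study entrance."""
    return [int(t * h_te_months) for t in times]


t = standardized_times(m=4, h=2)
months = calendar_months(t, h_te_months=12)
print([str(x) for x in t])
print(months)
```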
3.5.2 Delayed Treatment Group
The waitlist controls form their own quasi-experiment, what Shadish et al. [21] called an interrupted time series design. A simple model for the $n_0 = 87$ participants in this group is shown below (3.1) for $i = 1, \ldots, n_0$, $j = 0, \ldots, 3$, assuming $b_{i0} \overset{iid}{\sim} N(0, \tau^2) \overset{ind}{\&} \epsilon_{ij0} \overset{iid}{\sim} N(0, \sigma^2)$.

\[ y_{ij0} = b_{i0} + \alpha_0 + \gamma_1 t_j + \theta_0 (t_j - t_d)^{+} + \epsilon_{ij0} \tag{3.1} \]
This random intercept model is essentially segmented linear with a changepoint after $d = 2$ observations. The pre-intervention slope $\gamma_1$ reflects any placebo effect. The random intercepts $b_{i0}$ reflect individual variation, while the $\epsilon_{ij0}$ reflect measurement error, for example, the psychometric properties of the instrument. Without debating the external validity of $\theta_0$ as a measure of treatment effect (TE), we fit this model to obtain the following estimates: $\hat{\theta}_0 = -6.97\ (5.72)$, $\hat{\sigma} = 12.14$, and $\hat{\tau} = 11.71$.
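To make the hinge term concrete, the sketch below builds the fixed-effects row for each observation time in the delayed treatment group, reading $(x)^{+}$ as the positive part $\max(x, 0)$. It also computes the within-subject correlation that a random intercept model implies, $\tau^2 / (\tau^2 + \sigma^2)$, from the reported estimates. The function names are illustrative, and the code does not reproduce the original model fitting.

```python
def positive_part(x):
    """(x)^+ = max(x, 0), the hinge used for the post-treatment slope."""
    return max(x, 0.0)


def dt_design_row(t_j, t_d):
    """Row [1, t_j, (t_j - t_d)^+] multiplying (alpha_0, gamma_1, theta_0)."""
    return [1.0, t_j, positive_part(t_j - t_d)]


def intraclass_correlation(tau, sigma):
    """Correlation between two observations on one subject under a random intercept."""
    return tau ** 2 / (tau ** 2 + sigma ** 2)


# Standardized MFPG times; treatment is delivered at t_d = 1 (occasion d = 2).
times = [0.0, 0.5, 1.0, 1.5]
design = [dt_design_row(t, t_d=1.0) for t in times]

# Reported estimates for model (3.1).
icc = intraclass_correlation(tau=11.71, sigma=12.14)

for row in design:
    print(row)
print(f"implied intraclass correlation: {icc:.3f}")
```

The implied correlation can be set beside the sample correlations in Table 3.1 as an informal check on the random intercept structure.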
3.5.3 RCT: Both Groups Without DI
A model incorporating data from both groups but ignoring the delayed treatment (DT) is presented below (3.2) for $i = 1, \ldots, n_k$, $j = 0, \ldots, 2$, $k = 0, 1$, assuming $b_{ik} \overset{iid}{\sim} N(0, \tau^2) \overset{ind}{\&} \epsilon_{ijk} \overset{iid}{\sim} N(0, \sigma^2)$. Note that the last ($j = 3$) observation is dropped since this is the post-treatment observation for the control group. It is also dropped for the immediate treatment group since the last time point is post-attenuation and hence does not inform directly regarding the TE (but would increase the degrees of freedom).

\[ y_{ijk} = b_{ik} + \alpha_k + (\gamma_1 + I_{[k=1]}\theta_1) t_j + \epsilon_{ijk} \tag{3.2} \]

shown separately by group as

\[ y_{ij0} = b_{i0} + \alpha_0 + (\gamma_1) t_j + \epsilon_{ij0} \]
\[ y_{ij1} = b_{i1} + \alpha_1 + (\gamma_1 + \theta_1) t_j + \epsilon_{ij1} \]

We fit this RCT model, ignoring the post-DI observation time, to obtain the following estimates: $\hat{\theta}_1 = -6.55\ (3.10)$, $\hat{\sigma} = 12.50$, and $\hat{\tau} = 9.95$.
3.5.4 An Entrance-Centered Model for RDICT
An entrance-centered model, where $t_0 \equiv 0$ and $t_h \equiv 1$ with $h = 2$ (for the halving of $h_{TE}$) and $d = 2$, is an extension of (3.2) and shown separately by group in the models below for $i = 1, \ldots, n_k$, $j = 0, \ldots, 3$, $k = 0, 1$. This model encompasses the full dataset, combining the quasi-experiment and RCT approaches to treatment effect estimation.

\[ y_{ij0} = b_{i0} + \alpha_0 + (\gamma_1) t_j + \theta_0 (t_j - t_d)^{+} + \epsilon_{ij0} \]
\[ y_{ij1} = b_{i1} + \alpha_1 + (\gamma_1 + \theta_1) t_j + \gamma_2 (t_j - t_h)^{+} + \epsilon_{ij1} \]

We fit this EC-Full model to obtain the following estimates: $\hat{\theta}_1 = -6.41\ (3.05)$, $\hat{\theta}_0 = -6.99\ (5.79)$, $\hat{\sigma} = 12.30$, and $\hat{\tau} = 10.38$. This model is “full” in the sense that it still estimates the two measures of treatment effect separately. In order to force these to be a single parameter, the time variables were shifted in the treatment-centered model that follows. The treatment-centered model is also helpful when $d \neq h$.
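The fixed-effects mean of the EC-Full model can be evaluated group by group as sketched below. The values of `theta0` and `theta1` are the reported estimates; the intercepts `alpha`, the placebo slope `gamma1`, and the leveling term `gamma2` are hypothetical placeholders chosen only so that the example runs, since their estimates are not reported in this section.

```python
def positive_part(x):
    """(x)^+ = max(x, 0)."""
    return max(x, 0.0)


def ec_full_mean(t_j, k, alpha, gamma1, gamma2, theta0, theta1, t_d=1.0, t_h=1.0):
    """Fixed-effects mean of the EC-Full model for group k at standardized time t_j."""
    if k == 0:
        # Delayed treatment: placebo slope, then an added slope after t_d.
        return alpha[0] + gamma1 * t_j + theta0 * positive_part(t_j - t_d)
    # Immediate treatment: treated slope, then a change in slope after t_h.
    return alpha[1] + (gamma1 + theta1) * t_j + gamma2 * positive_part(t_j - t_h)


params = dict(
    alpha=(30.0, 30.0),  # hypothetical intercepts
    gamma1=-2.0,         # hypothetical placebo slope
    gamma2=5.0,          # hypothetical leveling term
    theta0=-6.99,        # reported estimate
    theta1=-6.41,        # reported estimate
)
times = [0.0, 0.5, 1.0, 1.5]
profile_dt = [ec_full_mean(t, 0, **params) for t in times]
profile_it = [ec_full_mean(t, 1, **params) for t in times]
print("delayed:  ", [round(v, 3) for v in profile_dt])
print("immediate:", [round(v, 3) for v in profile_it])
```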
3.5.5 A Treatment-Centered Model for RDICT
In order to constrain the parameters of the full entrance-centered (EC-Full) model so that there is a single parameter $\theta$ for the treatment effect in both groups, the times for the delayed treatment (DT) group were shifted down by $t_d$ so that $t_{d0} \equiv 0$. For MFPG, $d = 2$ and $t_d = 1$, so the new time values for the DT group were $\mathbf{t}_0 = (t_{jk} : j \in \{0, \ldots, 3\}, k = 0) = (-1, -\tfrac{1}{2}, 0, \tfrac{1}{2})$.

The reduced treatment-centered (TC-Reduced) family of linear mixed effects (LME) models with a random intercept has

\[ y_{ijk} = b_{ik} + \alpha_k + \gamma_1 t_{jk} + \theta_2 (t_{jk})^{+} + \gamma_2 (t_{jk} - t_h)^{+} + \epsilon_{ijk} \]

assuming

\[ b_{ik} \overset{iid}{\sim} N(0, \tau^2) \overset{ind}{\&} \epsilon_{ijk} \overset{iid}{\sim} N(0, \sigma^2); \quad i = 1, \ldots, n_k; \quad j = 0, \ldots, m-1; \quad k = 0, 1. \]

We fit this TC-Reduced model to obtain the following estimates for the MFPG dataset: $\hat{\theta}_2 = -6.41\ (3.05)$, $\hat{\sigma} = 12.29$, and $\hat{\tau} = 10.39$.
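The effect of the time shift can be illustrated numerically. In the sketch below, shifting the DT times by $t_d$ makes the same hinge $(t_{jk})^{+}$ switch on at treatment delivery in both groups, so the post-treatment slope is $\gamma_1 + \theta_2$ in each. The value of `theta2` is the reported estimate; the intercepts, `gamma1`, and `gamma2` are hypothetical placeholders, and the function names are illustrative.

```python
def positive_part(x):
    """(x)^+ = max(x, 0)."""
    return max(x, 0.0)


def treatment_centered_times(times, k, t_d):
    """Shift the delayed group (k = 0) so that its treatment time becomes 0."""
    return [t - t_d for t in times] if k == 0 else list(times)


def tc_reduced_mean(t_jk, alpha_k, gamma1, theta2, gamma2, t_h=1.0):
    """Fixed-effects mean of the TC-Reduced model at treatment-centered time t_jk."""
    return (alpha_k + gamma1 * t_jk + theta2 * positive_part(t_jk)
            + gamma2 * positive_part(t_jk - t_h))


def slope(means, times, a, b):
    """Slope of the mean profile between positions a and b."""
    return (means[b] - means[a]) / (times[b] - times[a])


entrance_times = [0.0, 0.5, 1.0, 1.5]
dt_times = treatment_centered_times(entrance_times, k=0, t_d=1.0)
it_times = treatment_centered_times(entrance_times, k=1, t_d=1.0)

gamma1, gamma2 = -2.0, 5.0  # hypothetical placeholders
theta2 = -6.41              # reported estimate
dt_mean = [tc_reduced_mean(t, 28.0, gamma1, theta2, gamma2) for t in dt_times]
it_mean = [tc_reduced_mean(t, 30.0, gamma1, theta2, gamma2) for t in it_times]

dt_pre = slope(dt_mean, dt_times, 0, 1)   # before treatment in the DT group
dt_post = slope(dt_mean, dt_times, 2, 3)  # first interval after DT treatment
it_post = slope(it_mean, it_times, 0, 1)  # first interval after IT treatment
print(dt_times, dt_pre, dt_post, it_post)
```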
In matrix notation, the TC-Reduced model family has

\[ \mathbf{y}_{ik} = \mathbf{X}_{ik}\boldsymbol{\beta} + \mathbf{z}_{ik} b_{ik} + \boldsymbol{\epsilon}_{ik}, \]
where the parameter is