
EVALUATING ALTERNATIVE STRATEGIES FOR STRATIFICATION FACTORS IN RANDOMIZED CONTROLLED TRIAL ANALYSIS

by

Stuart Alan Gansky

Department of Biostatistics, University of North Carolina

Institute of Statistics Mimeo Series No. 2168T

August 1996

A Paper submitted to the faculty of the University of North Carolina at Chapel Hill in partial fulfillment of the requirements for the degree of Doctor of Public Health in the Department of Biostatistics

Chapel Hill

1996

Approved by:

Advisor

Reader

Reader


Stuart Alan Gansky (under the direction of Gary Grove Koch)

ABSTRACT

In randomized controlled trials, there is traditionally a set of strata, and within each stratum patients are randomized to a treatment condition. Stratified designs are often used to address concerns about study generalizability to traditionally underrepresented patient subgroups. For many studies, the stratification applied during design is much more extensive than can be fully incorporated in analyses. One potential strategy for such situations is to ignore one or more stratification factors; another is to account for one or more factors with nonparametric adjustment. Consequences of ignoring strata or adjusting strata as covariates, relative to fully stratified procedures, are not completely known and deserve further investigation, since trialists frequently perform analyses not fully accounting for stratification. This research evaluates properties of strategies that ignore strata or adjust strata as covariates using both design- and model-based methods, relative to fully stratified analyses, in terms of validity and power. Derivations, simulations and actual examples illustrate the performance in various scenarios. Sparse strata situations are examined; particular emphasis is on multicenter trials with not all treatments assigned within all centers (i.e. incomplete centers). The situation concerning many small centers (one or two patients) is explored in depth with modelling and minimal assumption methods. Using a crossover trial paradigm, estimators are developed with a priori weights to incorporate incomplete center data into the usual complete data estimator. Various patient accrual mechanisms are considered, including one involving selection bias. Power and bias are assessed for methods combining data from complete and incomplete centers. Finally, these methods are extended to multicenter trials with more than two patients per center; bias is assessed for the extensions as well.

ACKNOWLEDGMENTS

I cannot thank Dr. Gary Koch enough for his guidance, support, and motivation. His students appreciate his patience, persistence, dedication, expertise and advice more than he can imagine. Moreover, I would like to thank Drs. Kant Bangdiwala, Harry Guess, Ronnie Homer, and Dana Quade not only for their constructive comments, suggestions, and insights concerning this dissertation, but also for their instruction and guidance in coursework and projects. In addition, I am very thankful to Amy Cutrell for kindly providing the herpes zoster data. Moreover, I sincerely appreciate the financial support from the United States National Institute of Environmental Health Science Traineeship, Veterans Affairs Office of Academic Affairs Health Services Research Predoctoral Fellowship, and UNC Biometric Consulting Laboratory Graduate Research Assistantship which have generously funded my graduate education and research. Also, Drs. Jerome Wilson, Jane Weintraub, and Joanne Harrell have presented me the opportunities to work on interesting, relevant public health problems, for which I am grateful. My colleagues and friends in the Biometric Consulting Laboratory, formerly "The Trailer", have helped me a great deal over the years in a challenging, yet relaxed learning atmosphere. Dr. Craig Turnbull deserves much credit for introducing me to biostatistics. Thanks also go to Mark Pinzur who showed me that "numbers are your friends" and to Dr. Ed Dwyer who taught me that you can never edit too much. I am extremely thankful to my parents Rose and Jack Gansky, who instilled in me a penchant for learning and who encouraged me by saying I could do anything if I set my mind to it. Finally, my most sincere thanks go to my loving and supportive wife Karen Leder Gansky for her continual help and motivation, as well as her boundless understanding and patience.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS

Chapter

1. Introduction
1.1. General Issues
1.2. Stratified Randomization
1.3. Restricted Randomization
1.3.1. Permuted-Block Randomization
1.3.2. Minimization/Dynamic Allocation
1.3.3. Adaptive Randomization
1.4. Usual Analysis Approaches
1.4.1. Ignoring Stratification Factor(s)
1.4.2. Covariance Adjusting Stratification Factor(s)
1.5. Examples
1.5.1. Example 1: Analgesics and Pain Relief
1.5.2. Example 2: Respiratory Ailment
1.5.3. Example 3: Multicenter Herpes Zoster Study
1.5.4. Conceptual Examples
1.6. Purpose

2. A Review of Some Statistical Methods for Randomized Controlled Trial Analyses
2.1. Introduction
2.2. Design-based Methods
2.2.1. Fisher's Exact Test
2.2.2. Extended Mantel-Haenszel Test
2.2.3. Stratified Randomization Analysis of Covariance
2.3. Model-based Methods
2.3.1. Logistic Regression
2.3.2. Ordinal Logistic Regression
2.3.3. Two Sample t-Test
2.3.4. Paired t-Test
2.3.5. General Linear Univariate Model
2.3.6. Random/Mixed Effects Model
2.3.7. Weighted Least Squares
2.3.8. Generalized
2.3.9. Survey Data Regression

3. Unstratified Analyses for Stratified Designs
3.1. Introduction
3.2. Assessments with Example RCTs
3.2.1. Example 1: Dental Pain Trial
3.2.2. Example 2: Respiratory Ailment Study
3.3. Simulations
3.3.1. Normally Distributed Response
3.3.2. Non-Normally Distributed Response
3.3.3. Simulation Results
3.4. Summary and Conclusions

4. A Random Effects Model for Incomplete Pairs in Multicenter Trials
4.1. Introduction
4.2. General model
4.3. Random effects model
4.4. Example: Multicenter Herpes Zoster Study
4.5. Summary and Conclusions

5. Finite Population Framework for Incomplete Pairs in Multicenter Trials
5.1. Introduction
5.2. Fixed Patient Accrual and Random Treatment Allocation
5.3. Random Accrual Independent of Treatment Allocation
5.4. Accrual Dependent on Allocation: Selection Bias
5.4.1. General Selection Bias
5.4.2. Selection Bias in Treatment Sequences
5.4.3. Notes on Selection Bias
5.5. Summary and Conclusions

6. A Random Effects Model for Incomplete Blocks in Multicenter Trials
6.1. Introduction
6.2. General Model
6.3. Random Effects Model
6.3.1. Three Patient Maximum per Center
6.3.2. Four Patient Maximum per Center
6.4. Summary and Conclusions

7. Finite Population Framework for Incomplete Blocks in Multicenter Trials
7.1. Introduction
7.2. Fixed Patient Accrual and Random Treatment Allocation
7.3. Random Accrual Independent of Treatment Allocation
7.4. Notes on Combining Centers
7.5. Summary and Conclusions

8. Summary and Future Research
8.1. Summary
8.2. Recommendations
8.3. Future Research
8.3.1. Unequal Treatment Allocation in Multicenter Trials
8.3.2. Stratified Incomplete Multicenter Extensions
8.3.3. Alternative Multicenter Designs
8.3.4. Other Extensions

REFERENCES

LIST OF TABLES

Table

3.1 P-values for testing each active treatment group versus placebo for summary measures and hourly dental pain relief with centers as strata
3.2 P-values for testing each active treatment group versus placebo for summary measures and hourly dental pain relief with centers × permuted blocks as strata
3.3 P-values for testing treatment differences for ordinal respiratory measures with centers as strata
3.4 Comparison of normal and non-normal distributions
3.5 Distributional parameter values and measures of central tendency and dispersion
3.6 Empirical bias estimates based on 2100 simulations: Rejection rate ()
3.7 Empirical power estimates based on 1000 normal simulations: Rejection rate (standard error)
4.1 WLS results for 7 day standard and test treatments
4.2 WLS results for 7 and 14 day test treatments
5.1 Assumptions needed for unbiasedness with fixed accrual
5.2 Unbiasedness assumptions with random accrual independent of allocation
5.3 Large sample unbiasedness with random accrual dependent on allocation (general selection bias)
5.4 Unbiasedness with random accrual dependent on allocation (treatment allocation selection bias)
7.1 Assumptions needed for unbiased treatment difference with fixed accrual
7.2 Assumptions for unbiased treatment difference with random accrual

LIST OF FIGURES

Figure

2.1 Cross-tabulation of the hth stratum
3.1 Normal distribution simulations (2 treatments and 2 strata)
3.2 Non-normal distribution simulations (2 treatments and 2 strata)
4.1 Multicenter data structure
4.2 as a function of ratio of singles to pairs (fap), intraclass correlation ($\rho_I$), and weights (w) with equal treatment allocation (fp = la = 1) and 100 centers (H = 100)
4.3 Variance as a function of treatment imbalance (fp), intraclass correlation ($\rho_I$), and weights (w) for lap = 1 or ~ and 100 centers (H = 100)
4.4 Ratio of variance from optimal weights to variance from a priori weights based on proportional to number of centers for various intraclass correlations ($\rho_I$) and 100 centers (H = 100)
6.1 Extended multicenter data structure
7.1 Sample sizes for center sequences and center sizes
8.1 Sample alternative design 1
8.2 Sample alternative design 2

LIST OF ABBREVIATIONS

ANCOVA analysis of covariance
ANOVA analysis of variance
BLUP best linear unbiased prediction
BSD big stick design
df or d.f. degree(s) of freedom
EMH extended Mantel-Haenszel
GEE generalized estimating equation
GLM
GLUM general linear univariate model
GOF goodness of fit
LTFR less than full rank
MCAR missing completely at random
MIXMOD
OLS
P placebo
RANCOVA rank analysis of covariance
RCT randomized controlled trial
REML restricted maximum likelihood
SDR survey data regression
SH standard - high dose
SPID sum of pain intensity differences
SL standard - low dose
STRANCOVA stratified randomization analysis of covariance
TH test - high dose
TL test - low dose
TOTGONE total pain half gone
TOTPAR total pain relief
TSL Taylor series linearization
WLS weighted least squares

Chapter 1 Introduction

1.1. General Issues

Randomized controlled trials (RCTs) typically involve strata, and within each stratum patients are randomized to a treatment condition. Each person has data collected for background characteristics and outcome variables at one or more time points and/or sites. Proper analysis should be based on the actual study design. A basic tenet in statistics is to "analyze as randomized". R. A. Fisher advocated analyzing "strictly in accordance with the structure imposed by the design of the experiment" (p. 54, 1966). Others have recommended that in RCTs "the data analysis performed at the end of the study should reflect the randomization process actually performed," (Friedman et al, p. 55, 1985) to properly draw statistical inferences based on the permutation principle. Yet for many studies, the stratification applied during design is much more extensive than can be fully incorporated in analyses. Moreover, this issue is likely to become more critical as future studies incorporate additional stratification for demographic variables, such as age, race, and gender, so their conclusions have better generalizability to underrepresented groups as per Food and Drug Administration, National Institutes of Health, and Department of Health and Human Services' policy statements (Wermerling and Selwicz, 1993; Schmucker and Vesell, 1993). The main dilemma from further stratification is fewer patients per stratum and even some strata without all treatments represented, so analyses fully adjusting for stratification become inefficient. One potential strategy for such situations is to ignore one or more stratification factors; another is to account for one or more factors with covariance adjustment.

Generally, all analyses can be approached either from a randomization basis or a modelling basis. The randomization- or design-based analyses make no assumptions about the origin of the patients and use a randomization-related probability structure, e.g. Fisher's exact tests, Wilcoxon rank sum (Mann-Whitney U) tests, Mantel-Haenszel tests and their extensions, and rank analyses of covariance (RANCOVAs). Conditional permutation tests consider the responses and covariates as fixed constants, condition on the number of patients assigned to each treatment group, and permute the treatment assignments all possible ways (to generate the null distribution) to evaluate the probability of finding the realized values relative to the null distribution (e.g. Hollander and Wolfe, Chpt. 10, §3, 1973; Randles and Wolfe, Chpt. 11, 1991). (Note the previous phrase "all possible ways" means all possible ways in which treatments could have actually been assigned under the utilized randomization plan; thus if a restricted, rather than complete, randomization was used, then the randomization analysis must take that into account.) Since patients are randomly assigned to treatments in RCTs, randomization-based analyses ensure appropriate size (Type I error). (Of course even randomization-based analyses assume that trials are conducted properly; e.g. blinding is maintained, data quality is assured, and any random error is not associated with the outcome or with errors of other observations.)
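As a rough illustration of the conditional permutation test described above, the following Python sketch enumerates every allocation that keeps the number of treated patients fixed (within each stratum, when strata are supplied) and reports the proportion of allocations whose mean difference is at least as extreme as the observed one. The function names, the mean-difference statistic, and the two-sided criterion are illustrative assumptions rather than the procedures used in this research, and full enumeration is only practical for small trials.

```python
from itertools import combinations, product

def mean_diff(y, treated):
    """Difference in mean response: treated patients minus controls."""
    t = [y[i] for i in treated]
    c = [y[i] for i in range(len(y)) if i not in treated]
    return sum(t) / len(t) - sum(c) / len(c)

def permutation_p_value(y, treated, strata=None):
    """Exact two-sided p-value; assignments are re-drawn within strata if given."""
    n = len(y)
    strata = strata if strata is not None else [0] * n  # one stratum by default
    treated = set(treated)
    # For each stratum, list every way to choose the same number of treated patients.
    per_stratum = []
    for s in sorted(set(strata)):
        members = [i for i in range(n) if strata[i] == s]
        k = sum(1 for i in members if i in treated)
        per_stratum.append(list(combinations(members, k)))
    observed = abs(mean_diff(y, treated))
    extreme = total = 0
    for choice in product(*per_stratum):
        alloc = {i for part in choice for i in part}
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean_diff(y, alloc)) >= observed - 1e-12:
            extreme += 1
    return extreme / total
```

Passing the strata actually used in the randomization restricts the reference set to allocations that could have occurred under the plan; omitting them corresponds to an analysis that ignores the stratification.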

The modelling approach assumes an underlying superpopulation from which these people are drawn by some mechanism and posits an analytic framework attempting to reflect the actual sampling. Methods making these assumptions include t-tests, analyses of variance (ANOVAs), analyses of covariance (ANCOVAs) and linear, logistic, ordered logistic, or proportional hazards regressions. Clinical trials typically involve volunteer subjects meeting entry criteria (convenience or fortuitous sampling) and clinics or centers chosen for their staff, patients, location and/or experience (judgment or purposive sampling) (Koch and Gillings, 1983). Thus, model-based analyses may require more caution, since RCTs involve convenience and judgment samples, not probability samples from a well-defined target population. Non-statistical justifications are required for the modelling approach and for the generalizability of its results (e.g. Kuritz et al, 1988; Meinert, p. 206, 1986). Senn (1991) even suggests inference can only be made concerning the sample studied.

Consequently authors advocate using design- and model-based methods as complements (Koch et al, 1980; Koch and Gillings, 1983; Koch and Edwards, 1988; Landis et al, 1988; Koch et al, 1990). Lachin (1988a) characterizes " in a ... as a two-step process": first assess treatment differences in these patients, then "ascertain the extent to which the observed results can be applied to the reference patient population, ... invok[ing] the concept of a population model ... [which] cannot be done with any statistical formalism." He depicts these randomization and invoked population models for a clinical trial in a figure to help compare and contrast these approaches.

1.2. Stratified Randomization

Most RCTs are designed with some stratification, either for logistical or scientific reasons, resulting in randomized-block designs. Examples of logistical strata include center in multicenter studies, as well as sequential-block (i.e. stratifying on entry time) in permuted-block designs, which reduce the possibility and extent of treatment allocation imbalances. Disease severity, prior treatment regimen, gender and age are common clinical stratification variables used to achieve within-stratum treatment allocation balance.

Stratified randomization is used for various reasons: better generalizability, balanced treatment allocation within a stratum, periodic treatment allocation balance (as in sequential- or permuted-blocks), easier randomization implementation (as in multicenter designs), and increased precision. Using demographics as strata for randomization can improve validity by increasing generalizability to a more representative target population; e.g. both genders, multiple races, and various age groups. Balancing treatment assignments for stratification factor levels enables subgroup assessment within strata; e.g. randomizing by stratifying on previous treatment modality (any vs. none) or baseline disease severity (mild vs. moderate vs. severe) would tend to balance treatment assignments within each level and would allow tests of treatment effects within each subgroup. Similarly, using permuted-blocks would not only limit the amount of treatment imbalance at any stopping point in the study, but would also limit the imbalance of any (unmeasured) baseline trend during accrual, since patients would be fairly balanced between treatment groups regardless of the baseline trend. Sometimes, stratified randomization eases problems in multicenter studies (e.g. Oddone et al, 1993), such as certain centers withdrawing from the study or adhering to the study protocol differently, so that despite these dilemmas, patients are nearly equally split amongst treatments. Finally, stratified randomization can reduce within-stratum variability (increase intraclass correlation) and increase precision, since stratifying analyses aggregate these reduced-variance within-stratum treatment effects. Statisticians generally advocate using only strata known to increase precision in designs.

From a design standpoint, strata only improve precision when they are correlated with the outcome(s) of interest (Cochran, pp. 101-103, 1977; Friedman et al, pp. 56-59, 1985; Pocock, pp. 80-87, 1983; Meinert, pp. 93-94, 1986; Gehan and Lemak, p. 139, 1994). However, investigators sometimes desire within-stratum balance for factors that may have unknown correlation with response (e.g. center in multicenter studies as in Ory et al, 1993) or may sometimes desire stratification on a number of prognostic and demographic factors. So, designs may incorporate many strata, even though precision gains in the variance of a stratified estimator from increasing the number of strata eventually reach a point of diminishing returns (Cochran, pp. 132-134, 1977). Too many strata can increase treatment imbalance, which can lead to loss of precision. In addition, with moderate sample sizes and many strata, the error degrees of freedom may decrease so much that power is reduced. Furthermore, post-stratification for important factors achieves almost as much efficiency as stratified randomization in large trials, but does not encumber analyses by requiring inclusion of stratified randomization factors later found to be unimportant (Friedman et al, p. 58, 1985). A drawback of highly stratified randomization in small trials is possibly sparse data in each stratum, with empty cross-classified cells as the extreme. This essentially corresponds to fully stratified analyses discarding patients in those strata, which can negate one of randomization's most important advantages: the tendency to balance treatment allocation across variables measured as well as those not measured. For example, Pocock and Simon (1975) discuss a stratified permuted-block randomized trial with two trichotomous and three dichotomous stratification factors resulting in 72 cross-classified strata (3 × 3 × 2 × 2 × 2 = 72); incorporating the 19 centers increases the number of cross-classified strata to 1368 (72 × 19). After one year of enrollment, 21 of the 72 strata (excluding center) had incomplete permuted-blocks, while 49 of the other 51 strata were completely empty. Moreover, designs sometimes yield treatment balance within each cross-classified stratum for cosmetic purposes only, since completely stratified analyses are never intended.

1.3. Restricted Randomization

A danger of unrestricted randomization, such as within-stratum randomization, especially with small or moderate sample sizes (i.e. less than 200 (Lachin, 1988b)), is the chance of allocation imbalance; i.e. an unequal number of patients in each (within-stratum) treatment group. The drawback from imbalance is loss of efficiency; moreover, investigators usually desire balance for cosmetic purposes. Thus, contemporary trialists frequently use a restricted randomization procedure to achieve better balance and prevent major imbalances (Simon, 1979; Friedman et al, p. 57, 1985). Since randomization-based procedures should evaluate the observed result against all possible allocations given the randomization scheme, probability levels from simple or complete randomization are not necessarily the same as those from restricted randomization. Some restricted randomizations involve deterministic assignments, which can increase selection bias as measured by the expected number of correct treatment assignments guessed by an optimally guessing clinician (Blackwell and Hodges, 1957). However, in well-run double-blind studies - particularly in multicenter studies (Soares and Wu, 1983) - this should not be an issue.

1.3.1. Permuted-Block Randomization

One common tactic to promote balance is permuted-block randomization: treatments are assigned in blocks of size B which have balanced allocation; i.e. B / J patients allotted to each of J treatments within each block, where B is an integer multiple of J. Two treatments have $\binom{B}{B/2}$ possible block assignment sequences and J treatments have

$$\prod_{j=1}^{J} \binom{B-(j-1)B/J}{B/J} = \frac{B!}{[(B/J)!]^{J}}$$

possible sequences. Since each sequence is equiprobable, each has a selection probability equal to the inverse of the number of sequences. An incomplete block would result in imbalance by at most B / J in at most J - 1 treatments; e.g. allocating 3 treatments in blocks of size 12 could cause, at worst, 2 treatments to have imbalances of 4 patients each. In addition to promoting balance at each point in a study with ongoing enrollment, permuted-block randomization also has a "more subtle advantage": balancing (observed or unobserved) time-dependent covariates, such as measurement technique (Fleiss, p. 50, 1986). As mentioned above, a disadvantage of this method is less straightforward analysis than under complete randomization; proper analysis requires block stratification.
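The counting rule and the allocation scheme in this subsection can be sketched in a few lines of Python. This is a minimal illustration under assumed names (`n_block_sequences`, `permuted_block_schedule`), not code from any of the cited trials; it counts the balanced sequences in one block as B! / [(B/J)!]^J and builds a schedule by shuffling successive balanced blocks.

```python
import random
from math import factorial

def n_block_sequences(block_size, n_treatments):
    """Number of balanced sequences in one block: B! / ((B/J)!)^J."""
    if block_size % n_treatments != 0:
        raise ValueError("block size must be a multiple of the number of treatments")
    per_arm = block_size // n_treatments
    return factorial(block_size) // factorial(per_arm) ** n_treatments

def permuted_block_schedule(n_patients, block_size, treatments, rng):
    """Assign patients in successive balanced blocks, each shuffled uniformly."""
    per_arm = block_size // len(treatments)
    schedule = []
    while len(schedule) < n_patients:
        block = [t for t in treatments for _ in range(per_arm)]
        rng.shuffle(block)  # every balanced ordering of the block is equally likely
        schedule.extend(block)
    # Truncating leaves a final incomplete block, which may be unbalanced.
    return schedule[:n_patients]
```

For instance, `n_block_sequences(12, 3)` counts the sequences available for the three-treatment, block-size-12 example in the text, and `permuted_block_schedule(30, 12, ["A", "B", "C"], random.Random(0))` produces a schedule whose last block is incomplete.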

Matts and Lachin (1988) discuss randomization-based analyses for unstratified RCTs with complete permuted-blocks of equal size. They address the misconception that ignoring blocks always provides a conservative test (Friedman et al, 1985) by showing that blocked versus unblocked analyses depend on the intraclass correlation (Koch, 1983) for blocks. The intraclass correlation ($\rho_I$) is related to the block size B and the number of blocks H. Letting $h = 1, 2, \ldots, H$ index blocks and $i = 1, 2, \ldots, B$ index subjects in the hth block, $\rho_I$ corresponds to the extent to which the $B(B-1)$ pairs of subject responses $(y_{hi}, y_{hi'})$ in the hth block deviate in the same direction from the mean response ($\mu$):

$$\rho_I = \frac{1}{HB(B-1)v} \sum_{h=1}^{H} \sum_{i \neq i'} (y_{hi} - \mu)(y_{hi'} - \mu) \quad \text{and} \quad \frac{-1}{B-1} \le \rho_I < 1,$$

where $\mu = \frac{1}{HB} \sum_{h} \sum_{i} y_{hi}$ and $v = \frac{1}{HB} \sum_{h} \sum_{i} (y_{hi} - \mu)^2$.

Usually in RCTs, $\rho_I$ for blocks will be positive, resulting in unblocked tests being conservative, but inefficient; however sometimes $\rho_I$ can be negative, causing tests that ignore blocking to be anti-conservative (Matts and Lachin, 1988). Though negative $\rho_I$ is more likely with small block sizes, Example 1 (presented below) with blocks of size 20 has a negative $\rho_I$ for pain relief at 3 hours post-medicating.
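To make the intraclass correlation for blocks concrete, the sketch below computes it for H blocks of equal size B: the within-block cross-products of deviations from the overall mean, summed over ordered pairs of distinct subjects and divided by HB(B - 1) times the overall variance. The function name and the list-of-lists input format are assumptions made for illustration only.

```python
def intraclass_correlation(blocks):
    """blocks: list of H lists, each holding the B responses observed in one block."""
    H = len(blocks)
    B = len(blocks[0])
    if any(len(b) != B for b in blocks):
        raise ValueError("all blocks must have the same size")
    values = [y for b in blocks for y in b]
    mu = sum(values) / (H * B)                       # overall mean response
    v = sum((y - mu) ** 2 for y in values) / (H * B)  # overall variance (divisor HB)
    cross = 0.0
    for b in blocks:
        d = [y - mu for y in b]
        # Sum over ordered pairs i != i' equals (sum of d)^2 minus sum of d^2.
        cross += sum(d) ** 2 - sum(x * x for x in d)
    return cross / (H * B * (B - 1) * v)
```

Blocks whose members lie on the same side of the overall mean give a positive value, while blocks whose members straddle the mean give a negative one, which is the case in which ignoring blocks can be anti-conservative.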

Another potential problem is greater selection bias risk, since clinic staff may attempt to guess assignments. In an unblinded trial, staff could know with certainty before actual assignment at most the last B / J assignments and at least the last assignment in each block; e.g. as above, with B = 12 and J = 3 (labelled A, B, and C): a sequence could allot one treatment to the last consecutive B / J assignments (e.g. AAAABBBBCCCC) or could apportion treatments more mixed (e.g. ABCCBABCABAC). (In the first sequence the final four assignments, CCCC, are known with certainty once AAAABBBB has been allotted; in the second only the final assignment, C, is known.) Possible selection bias can be reduced, though not eliminated, by varying block sizes. However, most RCTs incorporate blinding whenever possible, thereby removing possible selection bias. Meinert recommends never divulging block sizes before completing patient enrollment (pp. 95-96, 1986). Using masked, varied block sizes and masked sequences eliminates selection bias (Soares and Wu, 1983; Matts and Lachin, 1988).

1.3.2. Minimization/Dynamic Allocation

Several authors have suggested designs that minimize the imbalances across the stratification factors. For example, Taves (1974) and Pocock and Simon (1975) suggested an approach that minimizes treatment allocation imbalances for marginal stratification factor levels, which is similar to sample survey approaches to multiple stratification with small samples (Cochran, pp. 124-126, 1977) since they both use marginal approaches. Allocations are made after assessing the sum of potential marginal imbalances that would result from assigning the patient with those particular stratification factor characteristics to each treatment. This procedure assigns the treatment that would reduce the current stratification imbalances the most with increased probability under dynamic allocation (Pocock and Simon) or with certainty (probability one) under minimization (Taves). So with two stratification factors, the potential imbalance for each factor is calculated and the treatment allotment that would reduce the larger of the two imbalances would be made. A major drawback with these methods is that they are often deterministic; thus, they do not always provide a basis to invoke the permutation principle and they can introduce more selection bias than stochastic methods. Another disadvantage is that they balance the margins only; imbalances in the cross-classified cells can still occur as shown by the following simple example (Signorini et al, 1993) depicting the number of patients assigned to two treatments (A,B). Overall and marginal treatment totals are balanced, but allocations for factor combinations are imbalanced.

                 Hospital
Gender      1        2        Total
Male        8,2      2,8      10,10
Female      2,8      8,2      10,10
Total       10,10    10,10    20,20

Pocock and Simon suggest using a full factorial design to balance interactions, but this can cause problems when there are multiple factors.
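The marginal approach described in this subsection can be sketched as follows. This is a simplified illustration, not the exact rule of Taves or of Pocock and Simon: the imbalance measure (the range of marginal treatment counts, summed over the patient's factor levels), the random tie-breaking, and the names `minimization_assign`, `counts`, and `p_best` are all assumptions introduced here.

```python
import random

def minimization_assign(counts, patient, treatments, rng, p_best=1.0):
    """Pick a treatment for `patient`, a dict mapping factor -> level.

    counts[(factor, level)][treatment] holds the marginal totals so far.
    p_best=1.0 mimics deterministic minimization; p_best<1 adds a random
    element in the spirit of dynamic allocation.
    """
    def total_imbalance(candidate):
        total = 0
        for factor, level in patient.items():
            cell = counts.setdefault((factor, level), {t: 0 for t in treatments})
            after = [cell[t] + (t == candidate) for t in treatments]
            total += max(after) - min(after)  # range used as the imbalance measure
        return total

    scores = {t: total_imbalance(t) for t in treatments}
    best = min(scores.values())
    preferred = [t for t in treatments if scores[t] == best]
    others = [t for t in treatments if scores[t] != best]
    if others and rng.random() >= p_best:
        choice = rng.choice(others)      # occasionally assign a non-minimizing arm
    else:
        choice = rng.choice(preferred)   # ties among minimizing arms broken at random
    for factor, level in patient.items():
        counts[(factor, level)][choice] += 1
    return choice
```

Because only the marginal totals are tracked, nothing in this rule prevents the kind of cross-classified imbalance shown in the hospital-by-gender example.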

Soares and Wu (1983) modify the minimization idea to be stochastic. In their "Big Stick Design" (BSD), they modify Zelen's design (1974) to assign treatments randomly unless the imbalance would exceed some a priori level; in that case an alternative allotment (the "big stick") reduces the imbalance. This alternative assignment can be deterministic or stochastic (like biased-coin design described below). The advantages of the BSD are balance, reduced chance of selection bias, and relative simplicity (compared to adaptive randomization designs). The main disadvantage is that randomization/permutation tests would require more complicated construction, although large sample approximations, simulations and bootstrap methods are available and should be reasonable as long as baseline severity does not change during patient accrual (Mehta et al, 1988; Wei et al, 1989).
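A minimal sketch of the big stick idea for two treatments, assuming the deterministic form of the alternative assignment: a fair coin is used until the running imbalance reaches a preset limit, at which point the lagging treatment is assigned. The function name and the `max_imbalance` parameter are hypothetical labels for illustration.

```python
import random

def big_stick_sequence(n_patients, max_imbalance, rng):
    """Assign 'A' or 'B' by fair coin unless the imbalance limit has been reached."""
    sequence = []
    diff = 0  # (number assigned A) - (number assigned B)
    for _ in range(n_patients):
        if diff >= max_imbalance:
            arm = "B"  # the "big stick": force the lagging treatment
        elif diff <= -max_imbalance:
            arm = "A"
        else:
            arm = rng.choice("AB")  # ordinary complete randomization
        sequence.append(arm)
        diff += 1 if arm == "A" else -1
    return sequence
```

Only the forced assignments are predictable, so the scope for guessing is limited to moments when the imbalance sits at the limit.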

1.3.3. Adaptive Randomization

More complex restricted randomization methods have been developed to provide better balance with less selection bias possibility. Such adaptive procedures include urn (Wei, 1977) and biased-coin designs (Efron, 1971). The basic premise of adaptive biased-coin randomization (Wei, 1977) is as follows: if the trial is balanced at the time of randomizing a patient, allot treatment to the patient with equal probability; but if treatments are unbalanced at randomization time, assign the less abundant treatment to the patient with greater than equal probability. Biased-coin designs use a constant probability for imbalance, but urn designs base the probability on the extent of imbalance. These designs have the advantages of producing more balanced smaller trials with less predictability and selection bias, but complicate both the allocation process and the corresponding design-based analysis. The urn design has smaller potential selection bias than permuted-block and biased-coin designs (Wei and Lachin, 1988). Under the urn and biased-coin designs, subjects are not assigned treatments with equal probability, so the usual randomization-based analyses do not necessarily apply. Enumerating all the possible ways treatments could have been allotted entails fewer actual possibilities (some sequences have zero probability), but involves variable assignment probabilities; instead of merely calculating the proportion of possible allocations with a test statistic at least as rare as that in the actual realization, under adaptive designs one must also evaluate the assignment probability for each treatment allocation in each possible assignment sequence. Recent algorithms and software (Mehta et al, 1988; Hollander and Pena, 1988; Wei et al, 1989) make computation of randomization tests for moderate sized studies more feasible.
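The contrast between a constant bias and a bias that depends on the extent of imbalance can be sketched for two treatments as below. The bias value p = 2/3 and the urn parameterization (start with `alpha` balls per treatment, add `beta` balls of the opposite type after each assignment) are illustrative assumptions, and all function names are hypothetical.

```python
import random

def biased_coin_prob(n_a, n_b, p=2/3):
    """Probability of assigning A: constant bias p toward the lagging arm."""
    if n_a == n_b:
        return 0.5
    return p if n_a < n_b else 1 - p

def urn_prob(n_a, n_b, alpha=1, beta=1):
    """Probability of assigning A under an urn scheme; it rises with A's deficit."""
    return (alpha + beta * n_b) / (2 * alpha + beta * (n_a + n_b))

def allocate(n_patients, prob_a, rng):
    """Sequentially assign 'A' or 'B' using the supplied probability rule."""
    n_a = n_b = 0
    sequence = []
    for _ in range(n_patients):
        if rng.random() < prob_a(n_a, n_b):
            sequence.append("A")
            n_a += 1
        else:
            sequence.append("B")
            n_b += 1
    return sequence
```

A randomization test under either rule would weight each candidate sequence by the product of these assignment probabilities rather than treating all sequences as equally likely.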

1.4. Usual Analysis Approaches

1.4.1. Ignoring Stratification Factor(s)

Many statisticians advocate that studies implementing within-stratum randomization should use stratified analyses, such as extended Mantel-Haenszel (EMH) tests. Often in practice, investigators use one of two alternatives to stratified procedures for analysis: ignoring the stratification or adjusting for it as a covariate. This may be partly due to ignorance, especially with the advent of user-friendly statistical software and publishing pressures to include statistical analyses in applied disciplines. Researchers may inadvertently, rather than maliciously, use the observational unit instead of the experimental unit as the unit of analysis. This may lead investigators to draw erroneous conclusions by accident. In particular, with binary response, ignoring matching can propagate bias and decrease power (Breslow and Day, 1980; Kleinbaum et al, 1982).

Other times, analysts may purposely ignore (some) design aspects; e.g. Meinert (Q56, p. 205, 1986) recommends an initial analysis ignoring stratification. A design component may be ignored for better efficiency due to the aforementioned sparse strata scenario. Sometimes an analyst implicitly or explicitly believes a design factor, such as randomization block, is not explaining any variation in the error term. Ignoring the stratified randomization is often used for the successive permuted blocks within which treatments are usually assigned to avoid major allocation imbalances. In addition to being part of the actual allocation scheme, which a design-based analysis should incorporate, these time-dependent permuted blocks may account for unforeseen variation at enrollment, such as differential baseline severity (VACURG, 1967; Byar et al, 1976; Friedman et al, p. 55, 1985; Eisenhauer et al, 1994), seasonality, changes in standard care or evaluation techniques, or a learning effect for the clinical staff as the study progresses (Greenberg, 1951); Matts and McHugh call this "chronologic bias" (1978). Coulter (p. 53, 1991) warns "when a trial is extended unduly, due to inadequate patient accrual, there is a risk that the disease itself, or the patients, will have changed, so that those at the end of the trial" differ from those at the beginning.

A paradox, however, can arise: although power may be gained by stratifying on positively correlated permuted blocks, stratification with these blocks results in stratum sizes equal to the block size, or smaller with unfilled blocks; hence, inefficiency can result (e.g. Schoenfeld and Tsiatis, 1987). Often in multicenter trials, small within-center sample sizes make stratified methods inefficient or make efficient analyses difficult to specify. Small within-stratum sample sizes can lead to within-stratum allocation imbalance, which also can inflate the variance in stratified methods. Thus, analyses frequently ignore the original stratified randomization. Moreover, community health trials, which randomize clusters (e.g. schools or towns) to different interventions but gather data at the person level, sometimes use analyses ignoring stratified randomization. Methods such as Fisher's exact tests, Wilcoxon rank sum (Mann-Whitney U) tests, 1-Way ANOVAs (two sample t-tests), ANCOVAs adjusting for non-stratification factors, and linear, logistic and ordered logistic regression could be utilized to assess treatment differences ignoring stratification. Ignoring a design aspect implicitly assumes its variation is random. So, ignoring some stratification factor(s) does not necessarily imply ignorance of basic statistics.

1.4.2. Covariance Adjusting Stratification Factor(s)

Other times, analyses addressing the stratified design utilize covariance adjustment for (some) stratification factors instead of actual stratification adjustment. The rationale for covariance adjustment lies in the fact that (fully) stratified analyses may be inefficient due to small within-stratum sample sizes; additionally, one may wish to covary a design factor in order to actually estimate a stratification effect since it may be of interest in its own right. Moreover, a covariance adjustment can improve efficiency by accounting for variation due to stratification differences, in a main effect sense. For example, in multicenter trials, differences due to regional or practice variation can decrease power to detect treatment effects by inflating the variance; adjusting for these differences can reduce the variance and increase the power. Covariance adjustment can also reduce bias from minor treatment group imbalances in one or more covariates related to the response(s). Further, covariance adjustment may show the extent to which treatment effects are (not) explained by other factors (Koch et al, 1982). Nonparametric covariance adjustment (e.g. RANCOVA) (Quade, 1967, 1982), stratified nonparametric covariance adjustment (Koch et al, 1982, 1990), Higher-Way ANOVA, ANCOVA, and linear, logistic and ordered logistic regression are techniques which can account for the strata as one or more covariates.

1.5. Examples

1.5.1. Example 1: Analgesics and Pain Relief

A dental pain relief RCT, discussed in Gansky et al (1994), serves as one useful example. In a two center, parallel groups, randomized, double-blind study of analgesics, 258 patients, suffering from dental pain following oral surgery, were given one of five randomly assigned medications: placebo (P), low dose (200mg) standard drug (SL), high dose (400mg) standard drug (SH), low dose (50mg) test drug (TL), or high dose (100mg) test drug (TH). In each center, treatments were assigned in permuted blocks of size 20 to promote balance. Subjects were split nearly equally between the treatments and centers.

Patients recorded ordinal pain intensity and pain relief in analgesia diaries at 1/2, 1, 2, 3, 4, 5, 6, 7 and 8 hours after medicating. Typically, acute pain trials (e.g. Mehlisch et al, 1990) use summary efficacy measures over the time period, i.e. the sum of pain intensity differences (SPID) from baseline, total pain relief (TOTPAR) and the total extent of half relieved pain (TOTGONE), which are (nearly continuous) weighted sums of ordinal measures (Gansky et al, 1994).

1.5.2. Example 2: Respiratory Ailment

The sample data from Koch et al (1988) is another interesting example; it involves a two-center, multivisit clinical trial comparing two treatments in 111 patients with chronic asthma, a respiratory disorder. Randomization was stratified on center: patients at each center were assigned to active treatment or placebo in sequential permuted blocks of size 6. Gender and age were recorded at baseline. Respiratory status was measured at baseline and each of four visits with a five point ordinal scale: terrible (0) to excellent (4).

1.5.3. Example 3: Multicenter Herpes Zoster Study

This example concerns a multicenter, double-blind, randomized parallel group trial of three antiviral medication regimens for herpes zoster (shingles) in 1141 immunocompetent adults at least fifty years of age. One of three treatment regimens (Standard (Active Control) for 7 days, Test for 7 days, or Test for 14 days) was assigned to patients within 107 study sites in sequential blocks of size six. Centers enrolled between one and forty-six patients, meaning that some had incomplete assignment; i.e. not all three treatment groups were assigned. Herpes zoster is characterized by acute herpetic rash, fever, and neuralgia (e.g. Nikkels and Pierard, 1994; Wood, 1994; Beutner et al, 1995). Response was the number of days (2-7, 10, 14, or 21) until cessation of new lesion formation or no new increase in lesion size. Roughly 3% of the patients did not respond by the end of the study (Day 21); i.e. they were right censored. Although the protocol called for all patients to be enrolled within 48 hours of onset, many patients were enrolled after that time; thus, time since onset can be used as a post-stratification variable.

1.5.4. Conceptual Examples

In cluster-randomized studies such as community intervention trials, designs are often complex. For example, in a recent controversial dental RCT (Bell et al, 1982; Klein et al, 1985), a city was chosen from each of ten region (Southeast, Northeast, Central, Southwest, Northwest) by fluoridated water (yes, no) strata; within each stratum schools were randomly assigned to one of five treatment regimens or the control group; data were collected for each participating child. Similarly, in the Cardiovascular Health in Children (CHIC) study in North Carolina (Harrell et al, 1996), three schools within six region (mountain, central, coastal) by locale (rural, urban) strata were assigned to one of two interventions or a control condition; longitudinal data were measured at the child level.

Another situation arises when the initially randomized unit has more than one subunit and the subunits are further randomized internally, as in a split-plot design. One such potential dental RCT example could be: one agent such as a mouth rinse randomized to patients and an additional treatment such as sealants randomized to sites within people, such as jaws, quadrants, teeth or surfaces. Or in a dermatological RCT, oral agents could be randomized at the patient level and topical agents randomized at the body part level within each patient. Another instance could be cluster randomization in community health-promotion, disease-prevention trials: interventions are randomized to clusters of people, such as schools, factories, communities, or primary care physicians (e.g. Ory et al, 1993), and additional interventions are randomized to individuals within those clusters.

1.6. Purpose

Currently, stratified RCTs are often assessed without analysis procedures accounting for the full stratification. The consequences of ignoring strata or adjusting strata as covariates, relative to fully stratified procedures, are not completely known and deserve further investigation, since trialists often perform analyses not fully accounting for stratification.

This research evaluates the properties of strategies that ignore strata or adjust strata as covariates using both design- and model-based methods, relative to fully stratified analyses, in terms of validity and power. Derivations, simulations and actual examples illustrate the performance for continuous outcomes. Sparse strata situations are examined, with particular emphasis on multicenter trials with not all treatments assigned within all centers (i.e. incomplete centers). The case of many centers with one or two patients is studied in depth, including the situation in which selection bias exists. Power and bias are examined when combining estimates from complete and incomplete centers for this leading case using both random effects models and randomization-based models. Finally, methods extending this leading case to multicenter trials with more than one or two patients per center are developed.

Chapter 2

A Review of Some Statistical Methods for Randomized Controlled Trial Analyses

2.1. Introduction

In general, clinical trials gather data during one or more times from subjects assigned to treatment groups within strata. These data consist of stratification variables, response(s) and possibly covariates which may be predictive of the response(s). Generally, the vector of R responses for the ith subject from the jth group in the hth cross-classified stratum at the tth time point is denoted as

$$y_{hijt} = [y_{hijt1}, y_{hijt2}, \ldots, y_{hijtR}]',$$

where $i = 1, 2, \ldots, n_{hj}$ indexes the subjects in the jth group and hth cross-classified stratum, $j = 1, 2, \ldots, J$ indexes the groups, $h = 1, 2, \ldots, H$ indexes the cross-classified strata, and $t = 1, 2, \ldots, T$ indexes the time points. Similarly, the vector of C (possibly time varying) covariates measured for the ith subject from the hth cross-classified stratum and jth group at the tth time point is denoted $x_{hijt} = [x_{hijt1}, x_{hijt2}, \ldots, x_{hijtC}]'$. In examples and scenarios presented herein the number of groups, J, usually equals two. The number of subjects in the jth group of the hth stratum is $n_{hj}$.

The rth response at the tth time point can be addressed univariately. The $y_{hijt}$ vectors can be rewritten as $y_{tr}$ vectors of the rth response at the tth time point with dimension (N x 1), so

$$y_{tr} = [y_{111tr}, \ldots, y_{1 n_{11} 1tr}, \ldots, y_{11Jtr}, \ldots, y_{1 n_{1J} Jtr}, \ldots, y_{H11tr}, \ldots, y_{H n_{H1} 1tr}, \ldots, y_{H1Jtr}, \ldots, y_{H n_{HJ} Jtr}]',$$

where $N = \sum_{h=1}^{H} \sum_{j=1}^{J} n_{hj}$ is the total sample size for the study.

Although the discussion in this chapter focuses on methods for clinical trials with parallel groups designs, adaptations of these methods for trials with two treatment, two period crossover designs are also discussed. In classical crossover RCTs each patient is assigned to one of two sequences of two treatment regimens over two periods so treatment comparisons can be made within each person, thereby reducing variance (person-to-person variability). Sequence group assessments focus on differences between the two periods for each person to test treatment effect and on sums across the two periods to test carryover. (The presentation of crossover methods to test sequence group differences in this chapter assumes the usual requirements for crossover studies apply; e.g. sufficient washout between periods, so carryover effect of the first period treatment persisting into the next period can be assumed null. Methods to assess carryover are reviewed only briefly here. In the presence of carryover, the first period responses can be used to assess the treatment effect; additional endeavors to examine treatment effect in the presence of carryover are not discussed here.) Detailed descriptions of assumptions and methods for crossover trials can be found in Jones and Kenward (1989), Senn (1993) and Tudor and Koch (1994).

2.2. Design-based Methods

2.2.1. Fisher's Exact Test

In general, the exact significance of a statistic for the actual data realization can be evaluated as a permutation test by permuting the possible data groupings and noting the resulting statistic for each. Conditional permutation tests assess this after conditioning on some property, such as marginal sample sizes. For example, Fisher's exact test for a 2 x 2 table fixes (conditions on) the marginal totals. (The randomization of patients to treatment groups in clinical trials fixes one margin, and the null hypothesis of no difference in the other factor, i.e. response, fixes the other margin.) This induces a hypergeometric distribution for the individual cell counts and allows evaluation of all the possible tables with the same marginals; this compares the observed table to the possible ways subjects with these fixed responses and marginal totals could have been assigned to treatment. Then, the significance level is found by summing the probabilities of all possible tables (with the same marginals) that are no more probable than the actually realized table. Although Fisher's exact tests can be one or two tailed, henceforth only two tailed tests will be considered. Fisher's exact tests can be extended to tables larger than 2 x 2 since conditioning on marginal totals induces a multivariate hypergeometric distribution for the cell counts for larger dimensions.
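The two tailed calculation for a single 2 x 2 table can be sketched in a few lines of Python. This is only an illustration of the enumeration described above, not code from the original work; the function name `fisher_exact_two_sided` and the small relative tolerance used to treat equally probable tables as ties are assumptions made here.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two tailed Fisher's exact p-value for the 2 x 2 table [[a, b], [c, d]].

    Hypothetical helper: sums the hypergeometric probabilities of all tables
    with the observed margins that are no more probable than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability that the (1,1) cell equals x
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)   # smallest (1,1) count allowed by the margins
    highest = min(row1, col1)      # largest (1,1) count allowed by the margins
    p_obs = prob(a)
    # The relative tolerance (an assumption) keeps floating point ties in the sum
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

p_value = fisher_exact_two_sided(3, 1, 1, 3)
```

For the table [[3, 1], [1, 3]] the possible (1,1) counts 0 to 4 have probabilities 1/70, 16/70, 36/70, 16/70 and 1/70, so the two tailed p-value is 34/70.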

In two treatment, two period crossover trials with two treatment sequences and dichotomous responses, a Gart test modification (1969) of the Fisher's exact test compares the discordant response pairs (i.e. patients whose responses change from one period to the other) for the two sequences. Concordant response pairs (i.e. patients whose responses are the same for both periods) are discarded or ignored. The differences between periods can be compared without discarding concordances with a test detailed in the next subsection.

2.2.2. Extended Mantel-Haenszel Test

For randomized trials, minimal assumption methods (Koch and Edwards, 1988; Koch et al, 1990) can be used as primary significance tests of efficacy. Cross-tabulations for assessing the relationship of a J level factor with a K level response, while adjusting across the H strata formed by other factors, can be constructed. The adjustment factors form H strata, where H is usually equal to the product of the number of levels of each factor, e.g. with two dichotomous factors, such as center and baseline severity, four strata are formed. Figure 2.1 depicts the cross-tabulation of the hth such stratum, where $n_{hjk}$ is the number of subjects in the hth stratum having the jth level of the factor for comparison (for example treatment) and the kth level of the response. (A subscripted "$\cdot$" indicates a summation over that particular term, e.g. $n_{h\cdot\cdot}$ is the total number of subjects in the hth stratum.) Moreover, under the null hypothesis, $H_0$, of no factor difference in response for each patient, marginal totals are fixed and a product multivariate hypergeometric distribution applies:

$$\Pr(\{n_{hjk}\} \mid H_0) = \prod_{h=1}^{H} \frac{\prod_{j=1}^{J} n_{hj\cdot}! \; \prod_{k=1}^{K} n_{h\cdot k}!}{n_{h\cdot\cdot}! \; \prod_{j=1}^{J} \prod_{k=1}^{K} n_{hjk}!}.$$

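Figure 2.1 Cross-tabulation of the hth stratum

                       Response Category
Factor Category    1          2          ...    K          Total
1                  n_{h11}    n_{h12}    ...    n_{h1K}    n_{h1.}
2                  n_{h21}    n_{h22}    ...    n_{h2K}    n_{h2.}
...                ...        ...        ...    ...        ...
J                  n_{hJ1}    n_{hJ2}    ...    n_{hJK}    n_{hJ.}
Total              n_{h.1}    n_{h.2}    ...    n_{h.K}    n_{h..}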

For the hth stratum, the marginal proportions are $p_{hj\cdot} = n_{hj\cdot}/n_{h\cdot\cdot}$ for the jth row and $p_{h\cdot k} = n_{h\cdot k}/n_{h\cdot\cdot}$ for the kth column. Then, let $p_{h*\cdot} = [p_{h1\cdot}, p_{h2\cdot}, \ldots, p_{hJ\cdot}]'$ be the (J x 1) vector of row marginal proportions and $p_{h\cdot *} = [p_{h\cdot 1}, p_{h\cdot 2}, \ldots, p_{h\cdot K}]'$ be the (K x 1) vector of column marginal proportions. Furthermore, let $n_{hj*} = [n_{hj1}, n_{hj2}, \ldots, n_{hjK}]'$ be the (K x 1) vector of observed counts for the jth factor level; stack them for $n_h = [n_{h1*}', n_{h2*}', \ldots, n_{hJ*}']'$, the (JK x 1) compound vector of observed counts for the hth stratum.

Then under the hypergeometric distribution, the expected value is

$$E\{n_h \mid H_0\} = m_h = n_{h\cdot\cdot}\,(p_{h*\cdot} \otimes p_{h\cdot *}),$$

where $\otimes$ is the right Kronecker product, and the variance is

$$V\{n_h \mid H_0\} = V_h = \frac{n_{h\cdot\cdot}^2}{n_{h\cdot\cdot} - 1}\,(D_{p_{h*\cdot}} - p_{h*\cdot}\,p_{h*\cdot}') \otimes (D_{p_{h\cdot *}} - p_{h\cdot *}\,p_{h\cdot *}'),$$

which is (JK x JK), where $D_a$ is a diagonal matrix with elements of the vector a on the main diagonal. Then the residual vector $r_h = n_h - m_h$ is weighted with appropriate scores, $w_h = A r_h$, where A has JK columns so its dimensions conform to premultiplying the residual vector; the choice of A is discussed further below. These weighted residuals are aggregated across strata, $W = \sum_{h=1}^{H} w_h$, and used with the variance of this sum of weighted residuals, $V_W = \sum_{h=1}^{H} A V_h A'$. Under the null hypothesis, $E\{w_h \mid H_0\} = 0$ and $V\{w_h \mid H_0\} = A V_h A'$, so a scalar test statistic with a quadratic form is produced: $Q_{EMH} = W' V_W^{-1} W$. For large $n_{\cdot j \cdot}$, by the central limit theorem, W is distributed approximately multivariate normal, so the extended Mantel-Haenszel (EMH) statistic (Mantel and Haenszel, 1959; Mantel, 1963; Kuritz et al, 1988), $Q_{EMH}$, has an approximate chi-square distribution with degrees of freedom (d.f.) equal to the rank of A ($\le JK$). The chi-square approximation for the classical Mantel-Haenszel procedure for a set of 2 x 2 tables can be used when the frequencies satisfy the Mantel-Fleiss criterion (Fleiss, p. 175, 1981):

$$\min\left\{ \left[ \sum_{h=1}^{H} m_{h11} - \sum_{h=1}^{H} (n_{h11})_L \right], \left[ \sum_{h=1}^{H} (n_{h11})_U - \sum_{h=1}^{H} m_{h11} \right] \right\} \ge 5,$$

where $(n_{h11})_L$ and $(n_{h11})_U$ are the smallest and largest values of $n_{h11}$ possible given the fixed margins of the hth table.

With smaller samples, exact methods such as StatXact (Cytel Software, 1995) can be used. For responses with more than two levels, the row totals $n_{\cdot j \cdot}$ need to be moderately large (e.g. > 30). To test the homogeneity of odds ratios in sets of 2 x 2 tables, the likelihood ratio test, Breslow-Day test or Zelen exact test can be used. With sets of larger dimension tables, the pseudohomogeneity statistic (Koch and Edwards, 1988) assesses similarity of effect across strata.

The scores A are chosen to reflect both the alternative hypothesis of interest and the scalings of the factor and response measures. Scalings of measures increase in hierarchy: nominal, ordinal, interval or ratio; as with any statistical method, tests designed for a particular scale can be used for measures with at least that scale of hierarchy, e.g. ordinal data methods can be applied to interval data. With nominal scalings for both the factor and response, a ((J - 1)(K - 1) x JK) matrix of general weights which removes one row and one column would be appropriate; for example

$$A_G = [I_{J-1}, 0_{J-1}] \otimes [I_{K-1}, 0_{K-1}],$$

where $I_{J-1}$ is a ((J - 1) x (J - 1)) identity matrix and $0_{J-1}$ is a ((J - 1) x 1) vector of zeros, excludes the last row and column for all strata. (Note that this EMH test with $A_G$ is invariant to the choice of the excluded row and column; usually either the first or last are chosen for convenience.) This general EMH test has (rank of $A_G$) = (J - 1)(K - 1) degrees of freedom for the chi-square approximation.

With ordered levels of either factor or response and nominal levels of the other, a block diagonal weight matrix composed of monotone scores, $a_{hk}$, is appropriate to assess trends of the nominal measure groupings. With an ordinal response and nominal factor, which is more common than the reverse, a ((J - 1) x JK) matrix can compare the mean scores of any J - 1 of the J nominal groups for a location shift; specifically

$$A_M = [I_{J-1}, 0_{J-1}] \otimes a_h', \quad \text{where } a_h' = [a_{h1}, a_{h2}, \ldots, a_{hK}],$$

so that $A_M$ is ((J - 1) x JK), $[I_{J-1}, 0_{J-1}]$ is ((J - 1) x J) and $a_h'$ is (1 x K). As with the general EMH case, the mean score EMH test is invariant to the choice of the row excluded. (With an ordinal factor and nominal response, $A_M$ is ((K - 1) x JK), formed by a ((K - 1) x (K - 1)) identity matrix with $0_{K-1}$, and the monotone score blocks have dimension (1 x J).) The chi-square approximation for the mean score EMH test has degrees of freedom equal to the number of nominal groupings minus 1, i.e. J - 1 for ordinal response (or K - 1 for ordinal factor).
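For two groups (J = 2) the mean score EMH statistic reduces to a ratio of scalars, which makes the construction above easy to trace. The following Python sketch is an illustration under stated assumptions rather than the author's software: the function name `mean_score_emh` is hypothetical, the same response scores are used in every stratum, and strata containing only one of the two groups are skipped because their residual and variance contributions are both zero.

```python
def mean_score_emh(tables, scores):
    """Mean score EMH statistic (1 d.f.) for J = 2 groups across strata.

    tables: list of strata, each [[n_h11, ..., n_h1K], [n_h21, ..., n_h2K]]
    scores: response scores a_1, ..., a_K (assumed identical in all strata)
    """
    numerator = 0.0   # sum over strata of the weighted residual for group 1
    variance = 0.0    # sum over strata of its null variance
    for row1, row2 in tables:
        n1, n2 = sum(row1), sum(row2)
        n = n1 + n2
        if n1 == 0 or n2 == 0:
            continue  # only one group present: no information on the comparison
        col = [x + y for x, y in zip(row1, row2)]            # column margins
        mu = sum(a * c for a, c in zip(scores, col)) / n     # null mean score
        v = sum((a - mu) ** 2 * c for a, c in zip(scores, col)) / n
        observed = sum(a * x for a, x in zip(scores, row1))  # group 1 score total
        numerator += observed - n1 * mu
        variance += n1 * n2 * v / (n - 1)
    return numerator ** 2 / variance  # raises ZeroDivisionError if no stratum is informative

# With K = 2 and scores (1, 0) this is the classical Mantel-Haenszel statistic
q_single = mean_score_emh([[[3, 1], [1, 3]]], [1, 0])
q_two_strata = mean_score_emh([[[3, 1], [1, 3]], [[3, 1], [1, 3]]], [1, 0])
```

With one stratum the example table gives Q = 1.75, which is (N - 1)/N times its Pearson chi-square of 2.0; repeating the same table in two strata doubles both the residual sum and the variance, giving Q = 3.5.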

With both factor and response ordered, compound row vectors of monotone ordinal scores for rows (factor levels), $c_{hj}$, and columns (response levels), $a_{hk}$, are appropriate to assess factor trends in the mean response, i.e. $A_C = c_h' \otimes a_h'$, where $c_h' = [c_{h1}, c_{h2}, \ldots, c_{hJ}]$, so that $A_C$ is (1 x JK), $c_h'$ is (1 x J) and $a_h'$ is (1 x K). (Although separate row and column scores are shown, often in practice the same type of monotone scores is used for rows and columns, particularly when using standard software.) The correlation EMH test has 1 degree of freedom for the chi-square approximation. (Note that with rank scores for both the factor and response, $\sqrt{Q_{EMH}/(N - 1)}$ estimates the Spearman rank correlation coefficient.) With two factor levels (e.g. placebo and active treatments) using 0-1 scores, the mean score EMH statistic is actually equivalent to the correlation EMH statistic. The choice of the actual monotone ordinal scores, $c_{hj}$ and $a_{hk}$, is somewhat open to interpretation, although certain scores are recommended for particular situations (Graubard and Korn, 1987; Koch et al, 1985). Binary, integer, rank, midrank (also known as ridit, an abbreviation of "Relative to an Identified Distribution" due to Bross, 1958), standardized midrank (modified ridit), logrank and probit (van der Waerden, Blom or Tukey) scores are among the possible choices for monotone scores.

With one stratum, two groups and a dichotomous response, $Q_{EMH}$ is proportional to the usual Pearson chi-square test, with a proportionality constant $(N - 1)/N$. The chi-square approximation of the Wilcoxon rank sum test (1945) (or its Mann-Whitney U variant (1947)) is a special case of $Q_{EMH}$ using rank scores with midranks for ties when H = 1 and J = 2; instead if J > 2, the mean score $Q_{EMH}$ simplifies to the chi-square approximation of the Kruskal-Wallis test (1952). Similarly, $Q_{EMH}$ using ranks with midranks for ties reduces to the chi-square approximation of Friedman's randomized block test (1937) (with one observation per treatment per block, i.e. no replicates) or van Elteren's test (1960) (with replicates). Moreover, $Q_{EMH}$ using logranks is equivalent to the chi-square approximation of the Mantel-Cox test (Koch and Edwards, 1988).
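The proportionality to the Pearson chi-square for a single 2 x 2 table can be checked directly. The short derivation below is a worked illustration in the notation of this section with the stratum subscript dropped (H = 1, so $n_{\cdot\cdot} = N$); the symbol $Q_P$ for the Pearson statistic is introduced here for convenience.

```latex
\begin{align*}
m_{11} &= \frac{n_{1\cdot}\, n_{\cdot 1}}{N}, \qquad
v_{11} = \frac{n_{1\cdot}\, n_{2\cdot}\, n_{\cdot 1}\, n_{\cdot 2}}{N^{2}(N-1)}, \qquad
n_{11} - m_{11} = \frac{n_{11} n_{22} - n_{12} n_{21}}{N}, \\[4pt]
Q_{EMH} &= \frac{(n_{11} - m_{11})^{2}}{v_{11}}
        = \frac{(N-1)\,(n_{11} n_{22} - n_{12} n_{21})^{2}}
               {n_{1\cdot}\, n_{2\cdot}\, n_{\cdot 1}\, n_{\cdot 2}}
        = \frac{N-1}{N}\, Q_{P},
\qquad
Q_{P} = \frac{N\,(n_{11} n_{22} - n_{12} n_{21})^{2}}
             {n_{1\cdot}\, n_{2\cdot}\, n_{\cdot 1}\, n_{\cdot 2}} .
\end{align*}
```

The factor $(N-1)/N$ arises because the hypergeometric variance divides by $N - 1$ where the Pearson statistic effectively divides by $N$.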

In two treatment, two period crossover RCTs, tests are applied to compare the ranked period differences (after pooling all patients) with midranks for ties, and so $Q_{EMH}$ becomes a Wilcoxon rank sum test. Similarly, the ranked period sums of sequences can be compared with a Wilcoxon rank sum test for carryover (Tudor and Koch, 1994). With binary response, H = 1 and J = 2, and the scores [-1, 0, 1] for period differences (or scores [0, 1, 2] for period sums), sequences can be compared by $Q_{EMH}$ as a trend test (Tudor and Koch, 1994).

2.2.3. Stratified Randomization Analysis of Covariance

Stratified randomization analysis of covariance (STRANCOVA) (Koch et al, 1982, 1990) combines Quade's rank analysis of covariance (RANCOVA) (1967, 1982), involving nonparametric covariance adjustment of pooled treatment groups which were randomly assigned, with Mantel-Haenszel methods to test the adjusted residuals for the treatment groups across strata. For J = 2, the treatment difference is denoted $d_t = \bar{y}_{\cdot 2tr} - \bar{y}_{\cdot 1tr}$ and the covariate difference between treatments is $d_c = \bar{x}_{\cdot 2tc} - \bar{x}_{\cdot 1tc}$, where $\bar{y}_{\cdot jtr}$ is the mean of the rth response at the tth time and $\bar{x}_{\cdot jtc}$ is the mean of the cth covariate at the tth time of the jth treatment group. Under randomization, $E\{d_c\} = 0$, since there is no expected difference in the proportions of subjects allocated to each treatment within each stratum for prerandomization covariates. Under the null hypothesis of no treatment difference, $E\{d_t \mid H_0\} = 0$. The covariance matrix of $d_t$ and $d_c$ is

$$\mathrm{Var}\{d_t, d_c\} = V = \begin{bmatrix} v_t & v_{t,c} \\ v_{t,c} & v_c \end{bmatrix} = \frac{N}{n_{\cdot 1} n_{\cdot 2} (N - 1)} \sum_{h=1}^{H} \sum_{j=1}^{2} \sum_{i=1}^{n_{hj}} \begin{bmatrix} (y_{hijtr} - \bar{y}_{h\cdot tr})^2 & (y_{hijtr} - \bar{y}_{h\cdot tr})(x_{hijtc} - \bar{x}_{h\cdot tc}) \\ (y_{hijtr} - \bar{y}_{h\cdot tr})(x_{hijtc} - \bar{x}_{h\cdot tc}) & (x_{hijtc} - \bar{x}_{h\cdot tc})^2 \end{bmatrix},$$

where $\bar{y}_{h\cdot tr}$ is the mean in the hth stratum for the rth response at the tth time and $\bar{x}_{h\cdot tc}$ is the mean in the hth stratum for the cth covariate at the tth time. Then the test statistic $Q = [d_t \; d_c]\, V^{-1} [d_t \; d_c]'$ is approximately chi-square with 2 d.f. and can be rewritten as

$$Q = \left\{ d_t - \frac{v_{t,c}}{v_c} d_c \right\}^2 \{ v_t (1 - R_{xy}^2) \}^{-1} + \frac{d_c^2}{v_c} = Q(d_t \mid d_c) + Q(d_c),$$

where $R_{xy} = v_{t,c}/(v_t v_c)^{1/2}$ and $Q(d_t \mid d_c)$ is the 1 d.f. test comparing treatments given covariate similarity, which is tested with $Q(d_c)$, also a 1 d.f. approximate chi-square test. The weighted least squares model $E[d_t \; d_c]' = [1 \; 0]' \beta$ has its estimate $\hat{\beta} = d_t - \frac{v_{t,c}}{v_c} d_c$ with $\mathrm{Var}(\hat{\beta}) = v_t (1 - R_{xy}^2)$. So $Q(d_t \mid d_c) = \hat{\beta}^2 / \mathrm{Var}(\hat{\beta}) = (\bar{z}_{\cdot 2tr} - \bar{z}_{\cdot 1tr})^2 / v_z$, where

$$z_{hijtr} = y_{hijtr} - \bar{y}_{h\cdot tr} - (x_{hijtc} - \bar{x}_{h\cdot tc}) \left\{ \sum_{h=1}^{H} \sum_{j=1}^{2} \sum_{i=1}^{n_{hj}} (y_{hijtr} - \bar{y}_{h\cdot tr})(x_{hijtc} - \bar{x}_{h\cdot tc}) \right\} \left\{ \sum_{h=1}^{H} \sum_{j=1}^{2} \sum_{i=1}^{n_{hj}} (x_{hijtc} - \bar{x}_{h\cdot tc})^2 \right\}^{-1}$$

is the covariate adjusted residual for the ith subject, jth treatment, hth stratum and tth time, and

$$v_z = \frac{N}{n_{\cdot 1} n_{\cdot 2} (N - 1)} \sum_{h=1}^{H} \sum_{j=1}^{2} \sum_{i=1}^{n_{hj}} z_{hijtr}^2,$$

since $\bar{z}_{h\cdot tr} = 0$. An EMH test with integer scores compares the residuals for the treatments. When $y_{hijtr}$ and $x_{hijtc}$ are within-stratum ranks across treatment groups, this is the stratified version of a special case of Quade's RANCOVA (when Quade's weight function uses the sums of squares and cross products matrix based on ranks). Other scores, such as midranks, as discussed in the previous subsection can be applied.
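The decomposition of the 2 d.f. statistic into a covariate-adjusted treatment test and a covariate balance test can be illustrated numerically. The Python sketch below is a minimal illustration, assuming one response, one covariate, two treatment groups and raw values (not ranks) as scores; the function name `strancova_components` and the toy data are hypothetical and are not taken from the examples in Chapter 1.

```python
def strancova_components(data):
    """data: list of (stratum, group, y, x) tuples with group in {1, 2}.

    Returns Q (2 d.f.), Q(d_t | d_c) and Q(d_c) for a single covariate.
    """
    N = len(data)
    n1 = sum(1 for _, g, _, _ in data if g == 1)
    n2 = N - n1
    # Stratum means pooled over treatment groups
    totals = {}
    for h, _, y, x in data:
        cnt, sy, sx = totals.get(h, (0, 0.0, 0.0))
        totals[h] = (cnt + 1, sy + y, sx + x)
    ybar = {h: sy / cnt for h, (cnt, sy, sx) in totals.items()}
    xbar = {h: sx / cnt for h, (cnt, sy, sx) in totals.items()}
    # Within-stratum sums of squares and cross products
    syy = sum((y - ybar[h]) ** 2 for h, _, y, _ in data)
    sxx = sum((x - xbar[h]) ** 2 for h, _, _, x in data)
    sxy = sum((y - ybar[h]) * (x - xbar[h]) for h, _, y, x in data)
    k = N / (n1 * n2 * (N - 1))
    v_t, v_c, v_tc = k * syy, k * sxx, k * sxy
    # Differences between treatment group means (group 2 minus group 1)
    d_t = (sum(y for _, g, y, _ in data if g == 2) / n2
           - sum(y for _, g, y, _ in data if g == 1) / n1)
    d_c = (sum(x for _, g, _, x in data if g == 2) / n2
           - sum(x for _, g, _, x in data if g == 1) / n1)
    # Quadratic form [d_t d_c] V^{-1} [d_t d_c]' via the 2 x 2 inverse
    det = v_t * v_c - v_tc ** 2
    q_total = (v_c * d_t ** 2 - 2 * v_tc * d_t * d_c + v_t * d_c ** 2) / det
    beta_hat = d_t - (v_tc / v_c) * d_c
    r2 = v_tc ** 2 / (v_t * v_c)
    q_adjusted = beta_hat ** 2 / (v_t * (1 - r2))
    q_covariate = d_c ** 2 / v_c
    return q_total, q_adjusted, q_covariate

# Hypothetical toy data: two strata, two subjects per group in each
toy = [(1, 1, 3.0, 1.0), (1, 1, 4.0, 2.0), (1, 2, 6.0, 2.0), (1, 2, 5.0, 3.0),
       (2, 1, 2.0, 0.0), (2, 1, 3.0, 2.0), (2, 2, 5.0, 1.0), (2, 2, 7.0, 3.0)]
q_total, q_adjusted, q_covariate = strancova_components(toy)
```

Whatever data are supplied (provided the covariate varies within strata and is not perfectly correlated with the response), the three returned values satisfy Q = Q(d_t | d_c) + Q(d_c) up to rounding, which is the algebraic identity stated above.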

2.3. Model-based Methods

2.3.1. Logistic Regression

If the subjects are viewed as representative of the respective subpopulations in a stratified process from a larger population, then with a dichotomous response, a product binomial distribution applies at the tth time for the rth response. In terms of Figure 2.1, J = K = 2, so

$$\Pr(\{n_{hjk}\}) = \prod_{h=1}^{H} \prod_{j=1}^{2} \frac{n_{hj\cdot}!}{\prod_{k=1}^{2} n_{hjk}!} \, \pi_{hj}^{n_{hj1}} (1 - \pi_{hj})^{n_{hj2}}.$$

A model of interest is the parallel line logistic model for the dichotomous response:

$$\pi_{hj} = \{1 + \exp[-(\eta + x_{hijt}' \beta_t)]\}^{-1} = \frac{\exp(\eta + x_{hijt}' \beta_t)}{1 + \exp(\eta + x_{hijt}' \beta_t)},$$

where $\pi_{hj}$ is the probability of the rth response at time t for the jth group in stratum h; $\eta$ is the intercept parameter, which is a cell for the reference subpopulation; $\beta_t$ is the vector of regression parameters for the tth time where the elements of the vector are typically increments for the corresponding predictors in $x_{hijt}'$, the row vector of the predictor variables at time t for the ith patient in the (hj)th subpopulation. Exponentiating the elements of $\beta_t$ yields estimates of the odds ratios. Then, for all $0 < \pi_{hj} < 1$, $\mathrm{logit}\{\pi_{hj}\} = \eta + x_{hijt}' \beta_t$. The logit of $\pi_{hj}$ is the logarithm of the odds of the rth response at the tth time to nonresponse for the (hj)th subpopulation. The parameters can be estimated using maximum likelihood methods. The model goodness of fit (GOF) can be assessed with the Pearson chi-square, likelihood ratio chi-square and the Rao score statistics.
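The link between the logit scale, the probability scale and odds ratios can be made concrete with a few lines of Python. This sketch only evaluates the model equation for chosen parameter values; it does not fit anything, and the intercept and treatment coefficient below are hypothetical numbers picked for illustration.

```python
from math import exp, log

def logit(p):
    """Log odds of a probability p with 0 < p < 1."""
    return log(p / (1.0 - p))

def inverse_logit(eta):
    """First form of the model: {1 + exp(-eta)}^{-1}."""
    return 1.0 / (1.0 + exp(-eta))

def inverse_logit_alt(eta):
    """Second, algebraically identical form: exp(eta) / {1 + exp(eta)}."""
    return exp(eta) / (1.0 + exp(eta))

# Hypothetical parameter values (not estimates from any trial in this paper)
intercept = -0.5        # logit for the reference subpopulation
beta_treatment = 0.8    # increment in the logit for active treatment

p_reference = inverse_logit(intercept)
p_active = inverse_logit(intercept + beta_treatment)

# The ratio of the two odds equals exp(beta_treatment)
odds_ratio = (p_active / (1 - p_active)) / (p_reference / (1 - p_reference))
```

Because the model is linear on the logit scale, the odds ratio for treatment is exp(0.8) regardless of the intercept, which is why exponentiated coefficients are reported as odds ratios.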

2.3.2. Ordinal Logistic Regression

The model in the previous section can be extended to ordered response with K levels in which subjects are again viewed as a stratified simple random sample process from a larger population. Then, a product multinomial distribution applies at the tth time for the rth response, so that now in terms of Figure 2.1 with J = 2,

$$\Pr(\{n_{hjk}\}) = \prod_{h=1}^{H} \prod_{j=1}^{2} n_{hj\cdot}! \prod_{k=1}^{K} \frac{\pi_{hjk}^{n_{hjk}}}{n_{hjk}!}.$$

The proportional odds model for the K - 1 cumulative logits (indexed by k' = 1, 2, ..., K - 1) for ordinal response is of interest:

$$\log\left\{ \frac{\sum_{k=1}^{k'} \pi_{hjk}}{\sum_{k=k'+1}^{K} \pi_{hjk}} \right\} = \eta_{k'} + x_{hijt}' \beta_t,$$

which is the logarithm of the odds of a response in categories 1 through k' to a response in a category above k' for the (hj)th subpopulation. The parameters can be estimated using maximum likelihood methods. The proportional odds assumption can be evaluated with a score test, while the model GOF can be assessed with the Pearson chi-square, likelihood ratio chi-square and the Rao score statistics.

2.3.3. Two Sample t-Test

For parallel groups trials, the means of the rth response at the tth time for each treatment can be compared with a two sample t-test calculated as

$$t_{tr} = (\bar{y}_{\cdot 1tr} - \bar{y}_{\cdot 2tr}) \Big/ \sqrt{s_t^2 \, (n_{\cdot 1}^{-1} + n_{\cdot 2}^{-1})},$$

where

$$s_t^2 = \frac{1}{N - 2} \left\{ \sum_{h=1}^{H} \sum_{i=1}^{n_{h1}} (y_{hi1tr} - \bar{y}_{\cdot 1tr})^2 + \sum_{h=1}^{H} \sum_{i=1}^{n_{h2}} (y_{hi2tr} - \bar{y}_{\cdot 2tr})^2 \right\}$$

is the pooled variance estimate. This is distributed as a t distribution with N - 2 d.f. if $y_{hi1tr}$ and $y_{hi2tr}$ are independent and normally distributed. The two sample t-test is equivalent to the general linear univariate model discussed in Subsection 2.3.5 with a design matrix X that codes the two treatment groups, i.e. a one-way analysis of variance.
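A direct transcription of the pooled-variance statistic into Python follows. It is a small illustration with made-up numbers; the function name `two_sample_t` is hypothetical, and the responses are passed as two flat lists because the statistic ignores the strata.

```python
from math import sqrt

def two_sample_t(y1, y2):
    """Pooled-variance two sample t statistic and its degrees of freedom."""
    n1, n2 = len(y1), len(y2)
    mean1, mean2 = sum(y1) / n1, sum(y2) / n2
    # Pooled within-group sum of squares divided by N - 2
    ss_within = (sum((y - mean1) ** 2 for y in y1)
                 + sum((y - mean2) ** 2 for y in y2))
    pooled_var = ss_within / (n1 + n2 - 2)
    t = (mean1 - mean2) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Made-up responses for two treatment groups
t_stat, df = two_sample_t([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Here the group means are 2 and 4, the pooled variance is 10/4 = 2.5, and the statistic is -2/sqrt(2.5 x 2/3), about -1.549, on 4 d.f.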

2.3.4. Paired t-Test

In two period, two treatment crossover trials, the difference in the rth response between treatments for each patient, denoted $D_{hir}$, is used for the within-patient test of no treatment effect. Additionally, the sum of the rth response across the periods for each patient can be used for the test of no carryover effect. The Student's t statistic for the difference, $t_r = \bar{D}_r \big/ \sqrt{s_D^2 / N}$, where $s_D^2 = \frac{1}{N - 1} \sum_{h} \sum_{i} (D_{hir} - \bar{D}_r)^2$, has a t distribution with N - 1 d.f. if the $D_{hir}$ are normally distributed. (The sums are used in a similar manner.)
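The within-patient statistic can be computed from the list of differences alone, as in this brief Python sketch. The function name `paired_t` and the four differences are hypothetical, and the sketch covers only the simple paired comparison described above, not period or sequence adjustments.

```python
from math import sqrt

def paired_t(differences):
    """Student's t statistic for within-patient differences and its d.f."""
    n = len(differences)
    mean_d = sum(differences) / n
    var_d = sum((d - mean_d) ** 2 for d in differences) / (n - 1)
    return mean_d / sqrt(var_d / n), n - 1

# Made-up treatment differences for four patients
t_stat, df = paired_t([1.0, 2.0, 3.0, 2.0])
```

The mean difference is 2 and the variance of the differences is 2/3, so the statistic is 2/sqrt(1/6), about 4.899, on 3 d.f.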

2.3.5. General Linear Univariate Model

The general linear univariate model (GLUM) methodology as described in various texts (e.g. Searle, 1971) provides an eloquent unifying summary of the well-known univariate analysis of variance (ANOVA), analysis of covariance (ANCOVA), and linear regression. The GLUM for the rth response at the tth time point can be expressed as

$$y_{tr} = X \beta_t + \varepsilon_t,$$

where X is a matrix of fixed known constants, $\beta_t$ is a vector of fixed unknown constant parameters, and $\varepsilon_t$ is a vector of unobserved errors for the tth time point. The error vector is assumed to be distributed multivariate normal with mean 0 and variance $\sigma_t^2 I$. Other ANOVA assumptions include independence of observations, homogeneity of variance, linearity of the parameters and the existence of finite second moments.

One method for expressing X, called "classical" coding, includes a column of ones and, for each classification measured, indicators for each of the groups. It is also known as "less than full rank" (LTFR) coding, since X is overspecified and singular due to linearly dependent columns, i.e. the first column is equal to the sum of the next J columns. This coding is particularly good for unbalanced data, i.e. unequal $n_{\cdot j}$.

When X (and hence X'X) is LTFR, the normal equations $X'X \beta = X' y_{tr}$ can be solved using the method of generalized inverses. (If MGM = M then G is known as a nonunique generalized inverse of M, denoted as $M^-$.) So the solution, requiring the use of a generalized inverse, depends on the particular generalized inverse chosen. Thus least squares estimates of the classification effects on the response can be found by $\hat{\beta} = (X'X)^- X' y_{tr}$. These possibly biased estimates are distributed singular multivariate normal with mean $(X'X)^- (X'X) \beta$ and variance $\sigma_t^2 (X'X)^-$. The predicted response vector given the classification effects can be produced by $\hat{y}_{tr} = X \hat{\beta}$, which is distributed singular multivariate normal with mean $X\beta$ and variance $\sigma_t^2 H$, where $H = X (X'X)^- X'$ is called the "Hat" or "Projection" matrix since $\hat{y}_{tr} = H y_{tr}$. Although the estimates using generalized inverses are biased, the hypothesis tests they generate for estimable sources of variation are invariant to the choice of generalized inverse.

The total variation about the average response can be partitioned into the sums of squares attributed to the classifications (regression) and the sums of squares attributed to error, SSTOT = SSREG + SSE, and can be computed by

$$SSTOT = y_{tr}' y_{tr} - \tfrac{1}{N}\, y_{tr}' (1_N 1_N') y_{tr} = y_{tr}' \left[ I - \tfrac{1}{N} (1_N 1_N') \right] y_{tr},$$

which has N - 1 degrees of freedom (d.f.). The amount of variation of the response about a constant due to the classifications can be calculated as

$$SSREG = \hat{\beta}' X' y_{tr} - \tfrac{1}{N}\, y_{tr}' (1_N 1_N') y_{tr} = y_{tr}' \left[ H - \tfrac{1}{N} (1_N 1_N') \right] y_{tr},$$

which has d.f. ($df_{REG}$) equal to the rank of X minus 1. The remaining variation, contributed by error, is found by

$$SSE = (y_{tr} - \hat{y}_{tr})' (y_{tr} - \hat{y}_{tr}) = y_{tr}' y_{tr} - \hat{\beta}' X' y_{tr} = y_{tr}' [I - H] y_{tr},$$

where the error d.f. ($df_E$) are equal to the total sample size minus the rank of X.

Estimable functions, a class of linear combinations of the parameter estimates ($\hat{\beta}$), have unique estimates even with generalized inverses. Such linear combinations can be devised to test certain aspects of the grouping classifications using a contrast matrix C. For estimable functions, $C\hat{\beta}$ is unique and normally distributed with mean $C\beta$ and variance $\sigma_t^2\, C (X'X)^- C'$. Testing $C\beta = 0$ is accomplished with

$$SSH = (C\hat{\beta})' [C (X'X)^- C']^{-1} C\hat{\beta},$$

and noting that under the null hypothesis ($C\beta = 0$), the ratio $\dfrac{SSH / df_H}{SSE / df_E}$ follows the $F_{df_H, df_E}$ distribution, where the hypothesis d.f. ($df_H$) is the rank of C. Individual parameters which are estimable can be tested for zero to assess if the characteristic explains significant variation of the responses. These F tests use the partial sums of squares, also known as added last or Type III sums of squares.
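For a one-way classification the projection $H y_{tr}$ is simply the vector of group means, so the sums of squares above can be computed without any matrix algebra. The Python sketch below is an illustration under that simplifying assumption; the function name `one_way_anova` and the data are hypothetical.

```python
def one_way_anova(groups):
    """Sums of squares and F ratio for a one-way classification.

    groups: list of lists of responses, one inner list per group.
    """
    all_y = [y for g in groups for y in g]
    n = len(all_y)
    grand_mean = sum(all_y) / n
    group_means = [sum(g) / len(g) for g in groups]
    ss_tot = sum((y - grand_mean) ** 2 for y in all_y)
    # Variation of the fitted values (group means) about the grand mean
    ss_reg = sum(len(g) * (m - grand_mean) ** 2
                 for g, m in zip(groups, group_means))
    # Residual variation about the fitted values
    ss_e = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    df_h = len(groups) - 1     # rank of X minus 1
    df_e = n - len(groups)     # sample size minus rank of X
    f_ratio = (ss_reg / df_h) / (ss_e / df_e)
    return ss_tot, ss_reg, ss_e, f_ratio

# Made-up responses for two treatment groups
ss_tot, ss_reg, ss_e, f_ratio = one_way_anova([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]])
```

With these data SSTOT = 16 = SSREG + SSE = 6 + 10 and F = 2.4 on (1, 4) d.f., which is the square of the pooled two sample t statistic (about -1.549) for the same two groups, consistent with the equivalence noted for J = 2.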

In the first two examples stated in Chapter 1, one-way ANOVAs use treatment as a two level factor (equivalent to two sample t-tests when J = 2); two-way ANOVAs use treatment and center each as two level factors or treatment as a two level factor and center × block as a 14 level factor; three-way ANOVAs use treatment and center each as two level factors and baseline as a five level factor.

GLUM methods can compare allocation sequences of patient period differences (to assess treatment) and period sums (to assess carryover) in crossover designs. GLUMs with intercepts only (no treatment effect) are extensions of paired t-tests which can allow adjustment for other factors (e.g. covariates). GLUMs with intercepts, treatment, period, sequence and subject within sequence effects can model the actual responses in crossover RCTs.

2.3.6. Random/Mixed Effects Model

The previous GLUM applies to fixed effects models, which have predictors with preset levels. For example, treatment has J levels, gender has two levels, and block has B levels. However, there are many situations in which there are an infinite or very large number of levels in reality and a subset are chosen for study; for example, centers or clinics used in studies are chosen from a larger group of centers, participating patients are chosen from a larger patient base, time points observed are chosen from an infinite temporal continuum, and doses in multiple dose studies are chosen from a range of potential doses. Thus, the levels for these factors are random, not fixed, quantities. A model involving a fixed effect and a random effect is called a mixed model.

A mixed model (McLean et al, 1991) for the ith person in the hth stratum for the data structure specified in Section 2.1. is

$$y_{hi} = X_{hi}\beta + M_{hi}\zeta_{hi} + e_{hi},$$

where

$y_{hi}$ is the vector of responses for up to $T_{hi}$ times for the ith subject in the hth center,

$X_{hi}$ is the fixed effects design matrix; e.g. containing separate intercepts and slopes (corresponding to times) for each treatment for the ith subject in the hth center,

$\beta$ is the vector of fixed primary parameters corresponding to the effects in $X_{hi}$,

$M_{hi}$ is the block diagonal, random effects design matrix; e.g. containing subject-specific intercept and slope for the ith subject in the hth center,

$\zeta_{hi}$ is the vector of random effect parameters from the effects in $M_{hi}$ for the ith subject in the hth center (for example, $\zeta_{hi1}$ could be the random intercept deviation and $\zeta_{hi2}$ the random slope deviation from the population regression line), and

$e_{hi}$ is the random vector of unobservable within-subject error terms for the ith subject in the hth center.

Mixed models can be fit using restricted maximum likelihood (REML) estimation (Laird and Ware, 1982; McLean et al, 1991) to provide best linear unbiased prediction (BLUP).

Thus, for example, with observations over time for each patient, one can think of each patient as having a random regression line which deviates from the population regression line. The design matrices for the above mixed model could be expressed as

$$\mathrm{Ess}(X) = I_2 \otimes \begin{bmatrix} 1 & t_1 \\ \vdots & \vdots \\ 1 & t_{T_{hi}} \end{bmatrix} \quad \text{and} \quad M = I_N \otimes \begin{bmatrix} 1 & t_1 \\ \vdots & \vdots \\ 1 & t_{T_{hi}} \end{bmatrix} \quad \text{and} \quad \mathrm{Var}(\zeta_{hi}) = \Delta,$$

where Ess(·) is the essence function which retains unique rows of the matrix. This model has $E(y) = X\beta$ and $\mathrm{Var}(y) = M\Delta M' + \sigma^2 I$. The fixed effects design matrix here has a cell mean structure, i.e. a separate intercept and time factor for each of the two treatment groups; it could have additional fixed effects, as well. The common covariance matrix of the random effects is

$$\Delta = \begin{bmatrix} \Delta_{11} & \Delta_{12} \\ \Delta_{12} & \Delta_{22} \end{bmatrix},$$

which has linear structure, where $\Delta_{11}$ is the variance for the random intercept, $\Delta_{22}$ is the variance of the random slope, and $\Delta_{12}$ is the covariance between the random intercept and slope; $\mathrm{Var}(e_{hi}) = \sigma^2 V_{hi}$ is the covariance matrix of the random deviations about the random regression line for the ith subject in the hth center, and $\sigma^2$ is an unknown scalar within-subject error variance parameter (same for all subjects). This model has linear covariance structure, since $\Delta$ does. This model allows each patient to have observations at differing times and each to have a different number of observations ($T_{hi}$). (If patients missed any planned visits or observations, however, these methods assume the data are missing completely at random - MCAR - in the terms of Little and Rubin (1987).) The treatment effect can be assessed with the secondary parameter $\theta = C'\beta - \theta_0$ where $\beta' = [\alpha_1 \ \beta_1 \ \alpha_2 \ \beta_2]$, $\theta_0 = 0$ and $C' = [1 \ 0 \ -1 \ 0]$.

Another example is with a random center effect and a univariate response for each patient within the centers. In this case, the random effects design matrix has a random intercept for each center; the fixed effects matrix could have a cell mean structure with a separate intercept, as well as additional fixed effects for each treatment group.
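To make the random regression line structure concrete, the sketch below builds the marginal covariance $M_{hi}\Delta M_{hi}' + \sigma^2 I$ for a single subject. The visit times and the values of $\Delta$ and $\sigma^2$ are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical visit times for one subject (T_hi = 4, unequally spaced)
times = np.array([0.0, 1.0, 2.0, 4.0])

# Random effects design: subject-specific intercept and slope
M_hi = np.column_stack([np.ones_like(times), times])

# Hypothetical covariance of (random intercept, random slope) and error variance
Delta = np.array([[4.0, 0.5],
                  [0.5, 1.0]])
sigma2 = 2.0

# Marginal covariance of the subject's response vector
V_hi = M_hi @ Delta @ M_hi.T + sigma2 * np.eye(times.size)

# Variance at time t is Delta11 + 2 t Delta12 + t^2 Delta22 + sigma^2
var_at_4 = Delta[0, 0] + 2 * 4.0 * Delta[0, 1] + 16.0 * Delta[1, 1] + sigma2
```

With a random slope the marginal variance grows with time, which is one reason subjects observed at different times still share a common, low-dimensional covariance model.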

Furthermore, mixed models can be used for crossover trials by specifying separate random treatment effects for each patient. The treatment effect can be interpreted as the difference in response for a patient whose treatment regimen changes. Although mixed models for crossover data can accommodate missing data, the data would have to be MCAR; i.e. data would have to be missing for reasons unrelated to current or previous response, such as logistical or administrative problems, lost forms or laboratory samples, or patients moving out of the area.

2.3.7. Weighted Least Squares

Ordinary least squares (OLS), a type of GLUM, can be generalized to weighted least squares (WLS) to allow (Grizzle et al, 1969; Koch et al, 1982), when the number of categorical predictors is small. WLS methods assume only cross-tabulation sample sizes sufficient for the mean responses to be asymptotically multivariate normal and the classification group data to be considered representative of the populations they portray, similar to stratified simple random samples.

Altering Figure 2.1 slightly to form a contingency table with $H^* = (H \times J)$ subpopulations on the vertical axis and $K$ response levels on the horizontal one and letting $h^*$ index the subpopulations, gives the $n_{h^*k}$ with product multinomial distributions in this situation,

$$\Pr(\{n_{h^*k}\}) = \prod_{h^*=1}^{H^*}\left[n_{h^*\cdot}!\prod_{k=1}^{K}\left\{\frac{\pi_{h^*k}^{\,n_{h^*k}}}{n_{h^*k}!}\right\}\right],$$

where $\pi_{h^*k} = E\{n_{h^*k}/n_{h^*\cdot}\}$ is the probability that a randomly selected patient from the $h^*$th subpopulation has the kth response and $\sum_{k=1}^{K}\pi_{h^*k} = 1$.

Now using the conventions from Landis and Koch (1979), let $m_{h^*} = n_{h^*}/n_{h^*\cdot}$, a $(K \times 1)$ vector, be the observed sample proportions from the $h^*$th subpopulation. Further, stack the $m_{h^*}$ to form the $(H^*K \times 1)$ compound vector $m$, which is the unbiased maximum likelihood estimator for the compound parameter vector $\pi$, shaped in the same manner. The covariance matrix can be consistently estimated by the $(H^*K \times H^*K)$ block diagonal matrix $V(m)$ which has the $H^*$ $(K \times K)$ submatrices

$$V_{h^*}(m_{h^*}) = (D_{m_{h^*}} - m_{h^*}m_{h^*}')/n_{h^*\cdot}$$

on the main diagonal, where $D_{m_{h^*}}$ is a $(K \times K)$ diagonal matrix with the elements of $m_{h^*}$ on the diagonal. Each $V_{h^*}(m_{h^*})$ is the multinomial covariance matrix for the $h^*$th subpopulation. Then let $F = F(m)$ be a matrix of $f$ functions of $m$ of interest, with a consistent, asymptotically nonsingular, $(f \times f)$ covariance matrix estimator $V_F = HV(m)H'$, where $H = \left.\frac{\partial F(x)}{\partial x}\right|_{x=m}$ has dimensions $(f \times H^*K)$. The asymptotic expected value of $F$ is $E_A\{F(m)\} = F(\pi) = X\beta$, where $X$ is a full rank design matrix. The parameter estimate vector $\hat{\beta} = (X'V_F^{-1}X)^{-1}X'V_F^{-1}F$, a best asymptotically normal (BAN) estimator, has its covariance matrix consistently estimated as $V_{\hat{\beta}} = (X'V_F^{-1}X)^{-1}$. The residual Wald statistic $Q = (F - X\hat{\beta})'V_F^{-1}(F - X\hat{\beta})$, which has the approximate chi-square distribution with $(f - \mathrm{Rank}(X))$ d.f., is used to assess goodness of fit (GOF).

Linear combinations $C\hat{\beta}$ can be formed with the result being asymptotically normally distributed with mean $C\beta$ and covariance matrix $C(X'V_F^{-1}X)^{-1}C'$. Testing $C\beta = 0$ is accomplished with a Wald statistic $(C\hat{\beta})'[C(X'V_F^{-1}X)^{-1}C']^{-1}(C\hat{\beta})$, which is distributed approximately chi-square with (rank of $C$) d.f. Predicted values can be generated using $\hat{F} = X\hat{\beta} = X(X'V_F^{-1}X)^{-1}X'V_F^{-1}F$ and its covariance matrix can be estimated consistently by $V_{\hat{F}} = XV_{\hat{\beta}}X' = X(X'V_F^{-1}X)^{-1}X'$.

When the model fits well and subpopulation sample sizes are adequate, $\hat{F}$ and its estimated covariance matrix $V_{\hat{F}}$ are better estimates than $F$ and $V_F$, the sample estimates.

Of particular interest is the linear transform $F(m) = Lm$, where $L$ is the $(H^* \times H^*K)$ block diagonal matrix of $[1 \ 2 \ \cdots \ K]$ row vectors. This produces $\bar{y}$, the vector of means of integer scores for responses in each subpopulation, with covariance matrix $V_{\bar{y}} = LV(m)L'$. Estimates of the parameters ($\hat{\beta}$), its covariance matrix ($V_{\hat{\beta}}$), predicted values ($\hat{\bar{y}}$), and their covariance matrix ($V_{\hat{\bar{y}}}$) can be found as described previously.
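The WLS steps for mean integer scores can be sketched as follows. The cell counts, the main-effects design matrix, and all variable names are hypothetical and serve only to illustrate the computations described above for $H^* = 4$ subpopulations (2 strata × 2 treatments) and $K = 3$ ordered response levels.

```python
import numpy as np

# Hypothetical counts n_{h*k}: rows are subpopulations, columns are response levels
counts = np.array([[10, 20, 30],    # stratum 1, placebo
                   [ 5, 20, 35],    # stratum 1, active
                   [20, 25, 15],    # stratum 2, placebo
                   [12, 24, 24]],   # stratum 2, active
                  dtype=float)
Hs, K = counts.shape
n = counts.sum(axis=1)
m = counts / n[:, None]             # sample proportions m_{h*}

# Block diagonal multinomial covariance V(m)
V = np.zeros((Hs * K, Hs * K))
for h in range(Hs):
    V[h*K:(h+1)*K, h*K:(h+1)*K] = (np.diag(m[h]) - np.outer(m[h], m[h])) / n[h]

# Mean integer scores F = L m, with L block diagonal of [1 2 ... K]
L = np.kron(np.eye(Hs), np.arange(1, K + 1)[None, :])
F = L @ m.ravel()
V_F = L @ V @ L.T

# Main-effects model: intercept, treatment, stratum (one residual d.f.)
X = np.array([[1, 0, 0],
              [1, 1, 0],
              [1, 0, 1],
              [1, 1, 1]], dtype=float)
V_F_inv = np.linalg.inv(V_F)
V_beta = np.linalg.inv(X.T @ V_F_inv @ X)
beta = V_beta @ X.T @ V_F_inv @ F

# Residual (goodness-of-fit) Wald statistic and Wald test of the treatment effect
resid = F - X @ beta
Q_gof = resid @ V_F_inv @ resid                  # f - Rank(X) = 1 d.f.
C = np.array([[0.0, 1.0, 0.0]])
Cb = C @ beta
Q_trt = Cb @ np.linalg.solve(C @ V_beta @ C.T, Cb)   # 1 d.f.
```

Both statistics would be referred to chi-square distributions with the stated d.f.; the sketch stops short of computing p-values to avoid depending on a particular statistics library.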

2.3.8. Generalized Estimating Equations

Quasi-likelihood theory for distributions gives rise to generalized linear models (McCullagh and Nelder, 1989) which assume a known transformation of the response's marginal expectation is a linear function of the covariates and that the variance is a known function of its expected value, except for a possibly unknown scale parameter. In other words, the actual distribution of the response is not specified, just the relationship between the mean response and covariates, as well as between the mean and variance. Thus, normal and non-normal responses can be modelled. Generalized estimating equation (GEE) methods (Zeger and Liang, 1986; Liang and Zeger, 1986) use this quasi-likelihood approach along with a specified "working" correlation matrix for correlated or clustered observations in an iterative fashion to provide consistent population average (marginal) regression parameter and variance estimates.

Using a compound response vector as described in Section 2.1., let $t^* = 1, 2, \ldots, T^*_{i^*}$ index the observations for the $t = 1, 2, \ldots, T_{i^*}$ times and $r = 1, 2, \ldots, R$ responses of the $i^*$th subject in the hth stratum belonging to the jth group, where $i^* = 1, 2, \ldots, N$ indexes the $i = 1, 2, \ldots, n_{hj}$ subjects from $h = 1, 2, \ldots, H$ strata and $j = 1, 2, \ldots, J$ groups, where $N = \sum_h\sum_j n_{hj}$. (The notation for the remainder of this chapter uses $N$ (the number of patients) as the number of clusters, although later chapters use $H$ clusters and have patients within clusters.) Then, the expected value of the $t^*$th observation for the $i^*$th subject can be written as $E[y_{i^*t^*}] = \mu_{i^*t^*} = \mathcal{H}[x_{i^*t^*}'\beta]$ and the variance can be written as a function of the expected value, $\mathrm{Var}[y_{i^*t^*}] = v_{i^*t^*} = W[\mu_{i^*t^*}]/\phi$, where the inverse of $\mathcal{H}$ is called the link function and $\phi$ is a scale parameter.

Denote $P(\alpha)$ as the "working" correlation matrix, characterized by the unknown nuisance parameter vector $\alpha$. Then, the working covariance matrix is

$$V_{i^*} = D_{W_{i^*}}^{1/2}P(\alpha)D_{W_{i^*}}^{1/2}/\phi,$$

where $D_{W_{i^*}}$ is a $T^*_{i^*} \times T^*_{i^*}$ diagonal matrix with $W[\mu_{i^*t^*}]$ as the diagonal elements. The GEEs (quasi-score equations) are

$$\sum_{i^*=1}^{N}G_{i^*}(\beta, \alpha) = \sum_{i^*=1}^{N}H_{i^*}'V_{i^*}^{-1}r_{i^*} = 0,$$

where $H_{i^*} = \partial\mu_{i^*}/\partial\beta$ transforms the observations to the parameter space and $r_{i^*} = y_{i^*} - \mu_{i^*}$; $\mu_{i^*}$ is the vector of the $\mu_{i^*t^*}$ for the $i^*$th subject. The vector of consistent parameter estimates $\hat{\beta}$ is the solution of $\sum_{i^*=1}^{N}G_{i^*}[\beta, \hat{\alpha}\{\beta, \hat{\phi}(\beta)\}] = 0$, where $\hat{\alpha}$ and $\hat{\phi}$ are consistent estimates of $\alpha$ and $\phi$, respectively. Furthermore, $\sqrt{N}(\hat{\beta} - \beta)$ is distributed asymptotically multivariate normal with a "robust" covariance matrix of

$$V_G = \lim_{N\to\infty} N\left\{\sum_{i^*=1}^{N}H_{i^*}'V_{i^*}^{-1}H_{i^*}\right\}^{-1}\left\{\sum_{i^*=1}^{N}H_{i^*}'V_{i^*}^{-1}\mathrm{Cov}(y_{i^*})V_{i^*}^{-1}H_{i^*}\right\}\left\{\sum_{i^*=1}^{N}H_{i^*}'V_{i^*}^{-1}H_{i^*}\right\}^{-1},$$

where $\mathrm{Cov}(y_{i^*})$ is the true, not the assumed, covariance matrix. In practice for estimation of $V_G$, $\mathrm{Cov}(y_{i^*})$ is replaced with an estimate $r_{i^*}r_{i^*}'$ whereby $\beta$ was replaced by $\hat{\beta}$ for each $\mu_{i^*t^*}$ comprising $\mu_{i^*}$. The GEEs are solved using iteratively reweighted least squares and standardized residuals and iterated until convergence is achieved.

2.3.9. Survey Data Regression

Survey data regression methods (SDR) (Binder, 1983; Shah et al, 1977; Shah and LaVange, 1982; Shah et al, 1993; Bieler and Williams, 1995) involve modifying usual OLS methods by changing the variance estimation to account for correlation of data from the same patient. Regularly- or irregularly-timed repeated measures data from a clinical trial can be interpreted as analogous to cluster samples with each person as a sampling unit and each time point as a within-cluster element. The covariance matrix of the parameter estimates is approximated with Taylor series linearization (TSL).

The OLS regression model and the solution of the normal equations are altered, using weighted sums of squares and cross-products (Horvitz-Thompson estimators) for $X'X$ and $X'Y_{tr}$. The counterpart to $X'X$ is

$$\sum_{i^*=1}^{N}\sum_{t^*=1}^{T^*_{i^*}}x_{i^*t^*}x_{i^*t^*}'w_{i^*} = K_1,$$

while the counterpart to $X'Y_{tr}$ is

$$\sum_{i^*=1}^{N}\sum_{t^*=1}^{T^*_{i^*}}x_{i^*t^*}y_{i^*t^*}w_{i^*} = K_2,$$

where $w_{i^*}$ is proportional to the inverse overall sampling selection probability for the $i^*$th subject; thus the SDR parameter estimator is $\hat{\beta} = K_1^{-1}K_2$. Let the linearized variate vector for the $t^*$th observation of the $i^*$th subject be $z_{i^*t^*} = K_1^{-1}\{x_{i^*t^*}r_{i^*t^*}w_{i^*}\}$, where $r_{i^*t^*} = (y_{i^*t^*} - x_{i^*t^*}'\hat{\beta})$ is the adjusted residual; then summing them across observations in a within-subject manner provides $z_{i^*} = \sum_{t^*=1}^{T^*_{i^*}}z_{i^*t^*}$ and then averaging them across subjects provides $\bar{z} = \frac{1}{N}\sum_{i^*=1}^{N}z_{i^*}$; the Taylor series variance estimator of $\hat{\beta}$ is written as

$$\hat{V}(\hat{\beta}) = \frac{N}{N-1}\sum_{i^*=1}^{N}(z_{i^*} - \bar{z})(z_{i^*} - \bar{z})',$$

since the $z_{i^*}$ vectors of within-subject residual aggregates are independent as long as subjects are independent.

Although in RCTs each subject has an equal sampling weight (inverse probability of being sampled) so that the same parameter estimates as OLS are obtained, covariance computations involve within-subject sums of adjusted residuals which account for the aforementioned sampling design. Thus, each subject is viewed as a cluster (primary sampling unit) and each observation is treated as a within-cluster observational unit.

GEE methods with an independence working correlation structure have been shown (Bieler and Williams, 1995) to be asymptotically equivalent to SDR methods; GEE uses the number of clusters in the variance estimation, while SDR uses the number of clusters minus one. SDR methods may be less efficient than GEE methods when the working correlation structure has non-independence correctly specified.
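The relationship between the two variance estimators can be seen in a small sketch for the identity link with equal weights. The repeated measures data below are simulated and purely hypothetical; the code computes the GEE "robust" (sandwich) covariance under an independence working correlation and the Taylor series linearization estimator, which in this setting differ only by the factor $N/(N-1)$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical repeated measures: N subjects (clusters), T observations each
N, T = 40, 3
subject = np.repeat(np.arange(N), T)
trt = np.repeat(rng.integers(0, 2, N), T).astype(float)
time = np.tile(np.arange(T, dtype=float), N)
subj_effect = np.repeat(rng.standard_normal(N), T)   # induces within-subject correlation
y = 1.0 + 0.5 * trt + 0.3 * time + subj_effect + rng.standard_normal(N * T)

# OLS fit (equal sampling weights, w = 1)
X = np.column_stack([np.ones(N * T), trt, time])
K1_inv = np.linalg.inv(X.T @ X)
beta = K1_inv @ X.T @ y
r = y - X @ beta

# Within-subject sums of x * residual
scores = np.zeros((N, X.shape[1]))
np.add.at(scores, subject, X * r[:, None])

# GEE robust covariance with independence working correlation
V_gee = K1_inv @ (scores.T @ scores) @ K1_inv

# SDR: linearized variates z_i = K1^{-1} * (subject score sum), then N/(N-1) scaling
Z = scores @ K1_inv              # K1_inv is symmetric, so each row is z_i'
Zc = Z - Z.mean(axis=0)
V_sdr = N / (N - 1) * (Zc.T @ Zc)
```

Because the OLS normal equations force the subject score sums to add to zero, the mean linearized variate vanishes and `V_sdr` equals `N / (N - 1)` times `V_gee` in this equal-weight case.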

GEE and SDR methods can be utilized for crossover designs (Zeger and Liang, 1992) much as mixed models can. GEE and SDR methods, however, can be applied to categorical responses (e.g. binary, nominal, ordinal, counts) with patients as the clusters and measures at each period for each treatment as the observations. Period differences (for treatment effect) and period sums (for carryover effect) can be compared for allocation sequences allowing missing data as specified for mixed models.

Chapter 3

Unstratified Analyses for Stratified Designs

3.1. Introduction

Analysts often wish to perform analyses which do not fully account for the stratified study designs, for reasons described in Chapter 1. For example, the design may be too complex to provide efficient analyses under full stratification, or the investigators may wish to estimate the differences in treatment effects among subgroups. Although these analyses frequently arise in practice, the ramifications of such actions are not fully known. Thus, the properties of strategies ignoring strata or adjusting for strata as covariates with both design- and model-based methods were evaluated relative to fully stratified analyses in terms of bias (size) and power. Actual randomized controlled trials and Monte Carlo simulations were used for these assessments.

3.2. Assessments with Example RCTs

3.2.1. Example 1: Dental Pain Trial

Assessments of these analysis methods in the actual RCTs described in Subsections 1.5.1. and 1.5.2. follow. In the two center, five group dental pain relief RCT of analgesics (Gansky et al, 1994), patients recorded ordinal pain and relief 0.5, 1, 2, 3, 4, 5, 6, 7 and 8 hours after medicating. As in most acute pain trials, analyses used summary efficacy measures (SPID, TOTPAR, and TOTGONE), which are nearly continuous weighted sums of ordered outcomes. These analyses separately compare each active dose (SL, SH, TL, TH) to placebo (P) for SPID, TOTPAR, TOTGONE and ordinal hourly pain relief ratings through 3 hours post-medicating, using centers or centers × permuted

Table 3.1 P-values for testing each active treatment group versus placebo for summary measures and hourly dental pain relief with centers as strata

Response       Trt Grp   Stratify   Ignore                 Adjust as Covariate
Measure        vs P      EMH        Rank Sum   ANOVA       RANCOVA    2-Way ANOVA
SPID           SL        ≤ .001     ≤ .001     ≤ .001      ≤ .001     ≤ .001
(N=258)        SH        ≤ .001     ≤ .001     ≤ .001      ≤ .001     ≤ .001
               TL        .003       .002       .006        ≤ .001     .003
               TH        ≤ .001     ≤ .001     ≤ .001      ≤ .001     ≤ .001
TOTPAR         SL        ≤ .001     ≤ .001     .001        ≤ .001     ≤ .001
(N=258)        SH        ≤ .001     ≤ .001     ≤ .001      ≤ .001     ≤ .001
               TL        ≤ .001     ≤ .001     .002        ≤ .001     ≤ .001
               TH        ≤ .001     ≤ .001     ≤ .001      ≤ .001     ≤ .001
TOTGONE        SL        ≤ .001     .003       .003        ≤ .001     .001
(N=258)        SH        ≤ .001     ≤ .001     ≤ .001      ≤ .001     ≤ .001
               TL        .003       .004       .005        .003       .002
               TH        ≤ .001     ≤ .001     ≤ .001      ≤ .001     ≤ .001
½ hr Relief    SL        .005       .011       .010        .005       .004
(N=258)        SH        .012       .021       .020        .021       .012
               TL        .210       .233       .235        .215       .212
               TH        .003       .006       .006        .002       .002
1 hr Relief    SL        ≤ .001     .002       .002        ≤ .001     ≤ .001
(N=258)        SH        ≤ .001     ≤ .001     ≤ .001      ≤ .001     ≤ .001
               TL        ≤ .001     .002       .001        ≤ .001     ≤ .001
               TH        ≤ .001     ≤ .001     ≤ .001      ≤ .001     ≤ .001
2 hr Relief    SL        ≤ .001     ≤ .001     ≤ .001      ≤ .001     ≤ .001
(N=164)        SH        ≤ .001     ≤ .001     ≤ .001      ≤ .001     ≤ .001
               TL        ≤ .001     ≤ .001     ≤ .001      ≤ .001     ≤ .001
               TH        ≤ .001     ≤ .001     ≤ .001      ≤ .001     ≤ .001
3 hr Relief    SL        .012       .005       .004        .003       .006
(N=127)        SH        .002       ≤ .001     .001        ≤ .001     .001
               TL        .005       ≤ .001     .001        .002       .001
               TH        ≤ .001     ≤ .001     ≤ .001      ≤ .001     ≤ .001

EMH=extended Mantel-Haenszel with integer scores; ANOVA=analysis of variance; RANCOVA=rank analysis of covariance (centers)
P=placebo; SL=standard low dose; SH=standard high dose; TL=test low dose; TH=test high dose
Non-significant tests at the α = .05 level are displayed in boldface

Table 3.2 P-values for testing each active treatment group versus placebo for summary measures and hourly dental pain relief with centers × permuted blocks as strata

Response       Trt Grp   Stratify   Ignore                 Adjust as Covariate
Measure        vs P      EMH        Rank Sum   ANOVA       RANCOVA    2-Way ANOVA
SPID           SL        ≤ .001     ≤ .001     ≤ .001      ≤ .001     ≤ .001
(N=258)        SH        ≤ .001     ≤ .001     ≤ .001      ≤ .001     ≤ .001
               TL        .005       .002       .006        .002       .004
               TH        ≤ .001     ≤ .001     ≤ .001      ≤ .001     ≤ .001
TOTPAR         SL        ≤ .001     ≤ .001     .001        ≤ .001     ≤ .001
(N=258)        SH        ≤ .001     ≤ .001     ≤ .001      ≤ .001     ≤ .001
               TL        ≤ .001     ≤ .001     .002        ≤ .001     ≤ .001
               TH        ≤ .001     ≤ .001     ≤ .001      ≤ .001     ≤ .001
TOTGONE        SL        ≤ .001     .003       .003        ≤ .001     .001
(N=258)        SH        ≤ .001     ≤ .001     ≤ .001      ≤ .001     ≤ .001
               TL        .003       .004       .005        .002       .003
               TH        ≤ .001     ≤ .001     ≤ .001      ≤ .001     ≤ .001
½ hr Relief    SL        .006       .011       .010        .008       .005
(N=258)        SH        .012       .021       .020        .016       .011
               TL        .217       .233       .235        .328       .220
               TH        .003       .006       .007        .004       .002
1 hr Relief    SL        ≤ .001     .002       .002        ≤ .001     .001
(N=258)        SH        ≤ .001     ≤ .001     ≤ .001      ≤ .001     ≤ .001
               TL        ≤ .001     .002       .001        ≤ .001     .001
               TH        ≤ .001     ≤ .001     ≤ .001      ≤ .001     ≤ .001
2 hr Relief    SL        .003       ≤ .001     ≤ .001      ≤ .001     .002
(N=164)        SH        ≤ .001     ≤ .001     ≤ .001      ≤ .001     ≤ .001
               TL        ≤ .001     ≤ .001     ≤ .001      ≤ .001     ≤ .001
               TH        ≤ .001     ≤ .001     ≤ .001      ≤ .001     ≤ .001
3 hr Relief    SL        .016       .005       .004        .007       .137
(N=127)        SH        .006       ≤ .001     .001        .003       .002
               TL        .068       ≤ .001     .001        .005       .028
               TH        .007       ≤ .001     ≤ .001      ≤ .001     ≤ .001

EMH=extended Mantel-Haenszel with integer scores; ANOVA=analysis of variance; RANCOVA=rank analysis of covariance (centers × blocks)
P=placebo; SL=standard low dose; SH=standard high dose; TL=test low dose; TH=test high dose
Non-significant tests at the α = .05 level are displayed in boldface

blocks as the stratification factor. Tables 3.1 and 3.2 show similar results for the five analysis procedures. These tests, which are confirmatory in nature for this study, are almost all significant, as expected. Non-significant tests at the α = 0.05 level are shown in boldface. Pain relief ½ hour post-medicating was not significantly different from placebo for the low dose of the test drug (TL) with any of the five statistical techniques, using either centers or centers × permuted blocks as the strata. Using centers × permuted blocks as the strata in this finite sample does not appear to improve efficiency, as might have been expected (Matts and Lachin, 1988). For 31 of 84 (37%) tests using centers × blocks as strata, p-values increase from those using center as strata; only 3 of the 84 tests have p-values decrease. In particular, when using centers × blocks at 3 hours post-medicating, the fully stratified EMH test for the low dose of test medication (TL) and the two-way ANOVA for the low dose of standard (SL) are not significantly different from placebo; including permuted block actually reduces efficiency via the paradox described in Subsection 1.4.1., due to small within-stratum sample sizes but not much from missing data imbalance in this case. Negative intrablock correlation, which is uncommon, can be no more extreme than -.053 with blocks of size 20 (Koch, 1983; Matts and Lachin, 1988). In this example, the within-center intraclass correlation was higher than that within strata formed by centers × blocks. In fact, intraclass correlation of centers × blocks for pain relief at 3 hours post-medicating was negative. Imbalance was not a big concern here, however; one of fourteen centers × blocks strata had any imbalance (60%-40% split when comparing P to SH and P to TL).

3.2.2. Example 2: Respiratory Ailment Study

Another example, the multivisit respiratory study (Koch et al, 1990), was analyzed as well. Analyses of these ordinal responses at each time point were performed with center as the stratification factor; supplemental analyses adjusted for baseline, since this

Table 3.3 P-values for testing treatment differences for ordinal respiratory measures with centers as strata

Baseline      Statistical                  Visit
Adjustment    Procedure                    1        2         3         4
No            Stratify:  EMH               .053     ≤ .001    ≤ .001    .015
              Ignore:    Rank Sum          .061     ≤ .001    ≤ .001    .017
                         ANOVA             .060     ≤ .001    ≤ .001    .016
              Adjust:    RANCOVA           .052     ≤ .001    ≤ .001    .015
                         2-Way ANOVA       .052     ≤ .001    .001      .015
Yes           Stratify:  EMH               .009     ≤ .001    ≤ .001    .007
              Ignore:    EMH               .015     ≤ .001    ≤ .001    .010
                         2-Way ANOVA       .014     ≤ .001    ≤ .001    .009
              Adjust:    RANCOVA           .015     ≤ .001    ≤ .001    .010
                         3-Way ANOVA       .015     ≤ .001    ≤ .001    .010

EMH=extended Mantel-Haenszel with integer scores; ANOVA=analysis of variance; RANCOVA=rank analysis of covariance (center)
Non-significant tests at the α = .05 level are displayed in boldface

was an important covariate. Treatment effects of ordinal visit measures were assessed stratifying on center with EMH tests, ignoring center with rank sum tests and 1-Way ANOVAs, and adjusting center as a covariate with RANCOVAs and 2-Way ANOVAs. Since the baseline measure had a strong relationship with responses (Koch et al, 1990), analyses adjusting for baseline were also performed: stratifying on center and baseline with EMH tests, ignoring center but adjusting for baseline with EMH tests and 2-Way ANOVAs, and adjusting center and baseline as covariates with RANCOVAs and 3-Way ANOVAs. Results are presented in Table 3.3. The unadjusted and adjusted EMH tests for each visit, using integer scores stratifying on center, agree with results from Koch et al (1990), who performed analyses adjusting for centers as strata and baseline as a covariable. Unadjusted analyses ignoring center are not as efficient as those stratifying on or adjusting for center, but results for the five methods are similar. Baseline adjustments increased the efficiency of the analyses, since they account for variation due to baseline differences. After adjusting for baseline, analyses ignoring center produce essentially the same results as analyses adjusting center as a covariate. These results were very similar to those from fully stratified analyses.

3.3. Simulations

To explore these relationships in more depth, simulations were utilized to evaluate power and bias of the various methods. Simulations calculated the rejection rate: the percent of simulations that concluded a significant treatment difference at α = 0.05. These rejection rates were used as measures of empirical size and empirical power. These simulations were performed in the presence of several stratum differences (none, small, moderate, or high). The performances of five statistical methods were examined. The extended Mantel-Haenszel test with integer scores (Subsection 2.2.2.) fully accounts for the stratification in the design, so it is considered appropriate for stratified designs. The nonparametric Wilcoxon rank sum test (discussed in Subsection 2.2.2.) and the parametric two sample t-test (Subsection 2.3.3.), or equivalent one-way analysis of variance (Subsection 2.3.5.), are methods which ignore the stratification factors in the design. The nonparametric, partly stratified randomization analysis of covariance (Subsection 2.2.3.) and the parametric two-way analysis of variance (Subsection 2.3.5.) are methods which can be used to treat the stratification factor as a covariate via adjustment. Both normally distributed and non-normally distributed responses were considered.

3.3.1. Normally Distributed Response

Monte Carlo simulations for one stratification factor and various stratum effects (none, small, moderate, or high) with two strata, each of size 30, were performed. Figure 3.1 shows the Monte Carlo simulation steps for two treatments and two strata with continuous, normally distributed responses, $y_{hij}$, for the ith subject in the jth treatment group of the hth stratum, where h = 1, 2; i = 1, ..., $n_{hj}$; and j = 1 (placebo), 2 (active).

Random standard normal deviates ($z_{hij}$) were generated and multiplied by the standard deviation specified for each stratum of size 30; random assignment to treatment within each stratum and evaluation of the methods followed. Next they were shifted by the treatment ($d_t$) and stratum ($d_s$) differences, where appropriate. Only the initial random seed was specified; SAS's CALL RANNOR (SAS, 1990) output seeds were stored and used for subsequent simulations and randomizations. Thus, random deviates and random allocations were generated as if reading consecutive random numbers from a table after choosing a starting point.
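The data generation and evaluation steps can be sketched in Python as follows. This is an illustrative re-implementation under stated assumptions, not the SAS code used for the dissertation: it uses NumPy's generator rather than CALL RANNOR, evaluates only the two parametric methods (one-way ANOVA ignoring strata and two-way ANOVA adjusting for strata), and hard-codes approximate two-sided 5% critical values of the t distribution for 58 and 57 d.f.

```python
import numpy as np

# Approximate two-sided 5% critical values of Student's t (assumed constants)
T_CRIT_58 = 2.0017   # one-way ANOVA: 60 - 2 d.f.
T_CRIT_57 = 2.0025   # two-way ANOVA: 60 - 3 d.f.

def simulate_once(rng, d_s, d_t, mu=5.0, sigma=10.0, n_hj=15):
    """One replicate: 2 strata x 2 treatments, n_hj subjects per cell."""
    stratum = np.repeat([0, 1], 2 * n_hj)
    # Random assignment to treatment within each stratum
    trt = np.concatenate([rng.permutation(np.repeat([0, 1], n_hj)) for _ in range(2)])
    z = rng.standard_normal(4 * n_hj)
    # Scale by s.d., shift by mean, then add stratum and treatment differences
    y = sigma * z + mu + d_s * stratum + d_t * trt
    return y, trt, stratum

def t_for_treatment(y, columns):
    """t statistic for the treatment coefficient (first column after the intercept)."""
    X = np.column_stack([np.ones_like(y)] + columns)
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    mse = resid @ resid / (len(y) - X.shape[1])
    return beta[1] / np.sqrt(mse * xtx_inv[1, 1])

def rejection_rates(d_s, d_t, reps=2000, seed=1996):
    """Rejection rates for (ignoring strata, adjusting for strata)."""
    rng = np.random.default_rng(seed)
    ignore = adjust = 0
    for _ in range(reps):
        y, trt, stratum = simulate_once(rng, d_s, d_t)
        ignore += abs(t_for_treatment(y, [trt])) > T_CRIT_58
        adjust += abs(t_for_treatment(y, [trt, stratum])) > T_CRIT_57
    return ignore / reps, adjust / reps
```

For example, `rejection_rates(15, 0.0)` estimates empirical size under a high stratum effect, and `rejection_rates(0, 7.23)` estimates power at the 80% target with no stratum effect.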

Simulation evaluations were repeated 1000 times (step i. 20 times and step ii. 50 times) for stable estimates of power to evaluate non-null treatment differences with known powers of 50% and 80%. (Thus, power would be estimated with standard errors less than 1.6%.) For bias assessments relative to nominal Type I error rate (0.05) in the situation with no treatment effect (null case), the simulations were repeated 2100 times (step i. 30 times and step ii. 70 times) to obtain stable estimates, since 1000 simulations produced

Figure 3.1 Normal distribution simulations (2 treatments and 2 strata)

Step

i. Choose $n_{h\cdot}$ random standard normal deviates, $z_{hij} \sim N(0, 1)$

ii. Randomly assign to treatment within-stratum

iii. Multiply by s.d. (b); shift by mean (a): $y_{hij} = \sigma z_{hij} + \mu$

     Shift stratum 2 by stratum effect: $y_{2ij} = y_{2ij} + d_s$

     Shift treatment 2 by treatment effect: $y_{hi2} = y_{hi2} + d_t$

     Thus $y_{1i1} \sim N(\mu, \sigma^2)$

          $y_{1i2} \sim N(\mu + d_t, \sigma^2)$

          $y_{2i1} \sim N(\mu + d_s, \sigma^2)$

          $y_{2i2} \sim N(\mu + d_s + d_t, \sigma^2)$

iv. Evaluate methods for each repetition

In these simulations: h = 1, 2; i = 1, ..., $n_{hj}$; j = 1 (placebo), 2 (active); $n_{hj} = 15 \ \forall \ h$ and $j$; $a = \mu = 5$; $b^2 = \sigma^2 = 100$; $z_\alpha = 1.96$; $\alpha = 0.05$; $d_s \in \{0, 5, 10, 15\}$; $d_t \in \{0, 5.06, 7.23\}$

larger than desired standard errors. (Note: others have used 1000 simulations for both power and bias assessments of unstratified nonparametric covariance procedures without explicitly discussing standard errors [Conover and Iman, 1982; Stephenson and Jacobson, 1988]). For size, 2100 simulations would generate estimates with standard errors near 0.5%. Standard errors of rejection rates are calculated as the square root of the following product: the estimated rejection rate multiplied by its complement and divided by one less than the number of simulations. These simulations used $\mu = 5$; $\sigma^2 = 100$; $d_s \in \{0, 5, 10, 15\}$; and $d_t \in \{0$ (null case), 5.06 (50% power), 7.23 (80% power)$\}$. Non-null treatment differences, $d_t$, were found by solving $d_t = (z_\alpha + z_\beta)\sigma/\sqrt{n_{\cdot j}/2}$, where $\alpha = 0.05$, power $= 1 - \beta$, and $z_\alpha$ and $z_\beta$ are the αth and βth quantiles of the standard normal distribution, respectively.
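The stated treatment differences and standard errors can be reproduced with a few lines. This sketch assumes that $z_\alpha = 1.96$ is the two-sided critical value, that $z_\beta$ is the standard normal quantile at the target power, and that $n_{\cdot j} = 30$ patients per treatment group (15 in each of two strata); the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def treatment_difference(power, alpha=0.05, sigma=10.0, n_per_treatment=30):
    """d_t = (z_alpha + z_beta) * sigma / sqrt(n_.j / 2), normal approximation."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = std_normal.inv_cdf(power)            # 0 for 50% power
    return (z_alpha + z_beta) * sigma / sqrt(n_per_treatment / 2)

def rejection_rate_se(rate, n_sims):
    """Square root of rate * (1 - rate) / (n_sims - 1)."""
    return sqrt(rate * (1 - rate) / (n_sims - 1))

d_50 = treatment_difference(0.50)
d_80 = treatment_difference(0.80)
se_power = rejection_rate_se(0.50, 1000)   # worst case for power estimates
se_size = rejection_rate_se(0.05, 2100)    # size estimates near the nominal level
```

Rounded to two decimals, `d_50` and `d_80` agree with the 5.06 and 7.23 used in the simulations, `se_power` stays below 1.6%, and `se_size` is close to 0.5%.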

3.3.2. Non-Normally Distributed Response

To assess bias in situations in which the continuous scaled response is non-normal, several distributions were examined. Using non-normal continuous distributions enabled investigating the implications of not performing fully stratified analyses when an assumption of model-based methods (i.e. normality) is not met. Non-normal distribution functions and properties are displayed along with those of the normal distribution in Table 3.4. Note that the Cauchy distribution has undefined mean and variance, so median and interquartile range are presented instead.

The non-normal distributions used in Conover and Iman (1982) and Stephenson and Jacobson (1988) have central tendency and dispersion measures that correspond poorly with their normal distribution counterparts. The particular non-normal distribution transforms (i.e. choices of a and b, the shift and scale parameters) for these simulations were derived to have central tendency and dispersion measures matching those of the above specified normal distributions; a non-normal distribution transform matches either

Figure 3.2 Non-normal distribution simulations (2 treatments and 2 strata)

Step

i. Choose random standard deviate vector $z \sim D(0, 1)$, of size $(Rn_{h\cdot} \times 1)$, for R replicates of distribution D

ii. Divide vector into strata and treatment groups

iii. Transform by a and b as in Table 3.5; for example, in the Cauchy case $y = bz + a1_{Rn_{h\cdot}}$

iv. Evaluate methods for each repetition

In these simulations: h = 1, 2; i = 1, ..., $n_{hj}$; j = 1 (placebo), 2 (active); $n_{hj} = 15 \ \forall \ h$ and $j$; a and b are specified in Table 3.5; $z_\alpha = 1.96$; $\alpha = 0.05$; $d_s = 0$; $d_t = 0$


Table 3.4 Comparison of normal and non-normal distributions

Distribution              pdf f(y)                                                Central Tendency     Dispersion
Normal N(a, b^2)          $\frac{1}{\sqrt{2\pi}\,b}e^{-(y-a)^2/2b^2}$             $a$                  $b^2$
Log-normal LN(a, b^2)     $\frac{1}{y\sqrt{2\pi}\,b}e^{-(\ln(y)-a)^2/2b^2}$       $e^{a+b^2/2}$        $e^{2a+b^2}(e^{b^2}-1)$
Cauchy C(a, b)            $b/\{\pi(b^2 + (y-a)^2)\}$                              $a$                  $2b$
Uniform U(a, b)           $(b-a)^{-1}$                                            $(a+b)/2$            $(b-a)^2/12$

pdf = probability density function
Central tendency is the expected value (mean) except for Cauchy, which is the median; dispersion is the variance except for Cauchy, which is the interquartile range.

Table 3.5 Distributional parameter values and measures of central tendency and dispersion

                                    Conover & Iman (1982)                         Simulations
Distribution   Transformation       a    b    E(y)          V(y)                  a           b              E(y)     V(y)     Median   IQR
Normal         y = bz + a           0    2    0             4                     5           10             5        100      5        13.49
Log-normal     y = e^(bz + a)       0    2    e^2 = 7.39    e^4(e^4 - 1) = 2926   (ln 5)/2    sqrt(ln 5)     5        100      2.24     4.28
                                                                                  ln 5        sqrt(2.70)     19.27    5141     5        13.49
Cauchy         y = bz + a           0    1    0 (m)         2 (q)                 5           6.75           -        -        5        13.49
Uniform        y = (b - a)z + a     0    4    2             4/3                   -12         22             5        96 1/3   5        17
                                                                                  -8.49       18.49          5        60 2/3   5        13.49

z is a standard distribution (usually a = 0 and b = 1); IQR = interquartile range
E(y) and V(y) are undefined for Cauchy; m = median (not mean); q = interquartile range (not variance)

the normal mean and variance or the normal median and interquartile range, but not both at the same time, due to the shape of the non-normal distributions. The measures of central tendency and dispersion are shown in Table 3.5 for the particular non-normal distributions used in Conover and Iman (1982) and in these simulations. Thus, simulations more directly comparable to the normal response were performed than in previous examinations; both moment-based and quantile-based transformations for log-normal and uniform distributions were used. Figure 3.2 shows the procedures used to simulate non-normal responses by transforming random standard deviates ($z_{hij}$). For these simulations, the CALL RANNOR function in SAS was again utilized, albeit somewhat differently than for the normal distribution. Instead of choosing random deviates 30 times and randomly assigning them 70 times for a total of 2100 replicates as with normal data in the null case, these simulations randomly chose 2100 replicates with samples of size 60 for each distribution type (e.g. uniform) as one vector in the SAS/IML matrix language; the vector was partitioned into the required number of groups without loss of generality, since the data are already ordered randomly. These simulations used balanced (i.e. equal sized) treatment groups and strata. Finally, the five statistical methods were evaluated and empirical power and bias were calculated.
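The matching of transform parameters to the normal reference distribution can be worked through directly from the formulas in Table 3.4. The sketch below is an illustration under the assumption that the reference is N(5, 100); it derives the log-normal parameters under moment and quantile matching, and the Cauchy and uniform parameters under quantile matching. The variable names are hypothetical.

```python
from math import asinh, exp, log, sqrt
from statistics import NormalDist

# Reference normal distribution: mean 5, variance 100
mean, var = 5.0, 100.0
z75 = NormalDist().inv_cdf(0.75)
median = mean
iqr = 2 * z75 * sqrt(var)                 # interquartile range of N(5, 100)

# Log-normal, moment-based: e^(a + b^2/2) = mean and mean^2 (e^(b^2) - 1) = var
b2_moment = log(1 + var / mean**2)        # equals ln 5 here
lognormal_moment = (log(mean) - b2_moment / 2, sqrt(b2_moment))

# Log-normal, quantile-based: e^a = median and 2 e^a sinh(z75 * b) = iqr
b_quantile = asinh(iqr / (2 * median)) / z75
lognormal_quantile = (log(median), b_quantile)
lognormal_quantile_mean = exp(lognormal_quantile[0] + b_quantile**2 / 2)

# Cauchy: median a, interquartile range 2b
cauchy = (median, iqr / 2)

# Uniform, quantile-based: midpoint = median and (b - a)/2 = iqr
uniform_quantile = (median - iqr, median + iqr)
```

The quantile-matched log-normal keeps the median at 5 but has a much larger mean, which illustrates why a single transform cannot match both sets of measures at once.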

3.3.3. Simulation Results

Empirical size (bias) estimates for the normal distribution are displayed in Table 3.6. Simulations with continuous normally distributed outcomes have shown that all these procedures have Type I error close to the nominal significance level (0.05); although the point estimates for no stratum effect are all slightly above 0.05, they are all within two standard errors of the nominal level. As stratum effect increases, the methods ignoring stratification become overly conservative with point estimates smaller than 0.05. Stratifying and adjusting methods seem unaffected by increasing stratum effects since

Table 3.6 Empirical bias estimates based on 2100 simulations: Rejection rate (standard error)

                                                  Stratum Effect
Case         Procedure                None          Small         Moderate      High
Normal       Stratify: EMH            .055 (.005)   .040 (.004)   .056 (.005)   .051 (.005)
             Ignore:   Rank Sum       .055 (.005)   .036 (.004)   .031 (.004)   .012 (.002)
                       ANOVA          .056 (.005)   .037 (.004)   .033 (.004)   .012 (.002)
             Adjust:   RANCOVA        .056 (.005)   .044 (.004)   .057 (.005)   .053 (.005)
                       2-Way ANOVA    .056 (.005)   .043 (.004)   .056 (.005)   .052 (.005)

                                                  Non-normal Distribution
                                      Log-normal                  Cauchy        Uniform
Case         Procedure                Moment        Quantile                    Moment        Quantile
Non-normal   Stratify: EMH            .033 (.004)   .026 (.003)   .015 (.003)   .043 (.004)   .043 (.004)
             Ignore:   Rank Sum       .032 (.004)   .028 (.004)   .016 (.003)   .043 (.004)   .043 (.004)
                       ANOVA          .034 (.004)   .028 (.004)   .016 (.003)   .044 (.004)   .044 (.004)
             Adjust:   RANCOVA        .035 (.004)   .029 (.004)   .016 (.003)   .045 (.005)   .045 (.005)
                       2-Way ANOVA    .034 (.004)   .029 (.004)   .016 (.003)   .045 (.005)   .045 (.005)

(Estimates + 2 s.e.) < specified power are displayed in boldface. EMH = extended Mantel-Haenszel with integer scores; ANOVA = analysis of variance; RANCOVA = rank analysis of covariance.

empirical size does not decrease as stratum effect increases. These simulations indicate that increased size (i.e. bias) does not result from performing unstratified analyses for stratified designs.

Simulated empirical estimates for power, shown in Table 3.7, are close to the corresponding 50% and 80% powers generated from treatment differences unless the stratum effect is large. With no or small stratum effects, the point estimates are mostly below specified power but they are within two standard errors of the specified power. Stratified analyses and analyses adjusting strata as covariates have estimates within two standard errors of the specified power when the stratum effect is moderate. Methods completely ignoring strata are underpowered in the presence of at least moderate stratum effects. With large stratum effects in the 80% target power case, even fully stratified analyses are overly conservative. No method produces estimates of power within two standard errors for a large stratum effect in the 80% power case. Empirical power estimates from simulations may be slightly lower than target powers because simulation treatment effects were found with the normal approximation instead of the non-central t distribution. Methods ignoring strata are particularly underpowered for both 50% and 80% targets. Thus, when there are no among strata differences, accounting for strata is not too important; however, as the stratum effect increases, the procedures which completely ignore stratification become quite inefficient.

Simulations for bias with non-normal continuously distributed data (Table 3.6) showed no increase in bias; all empirical sizes are less than the nominal alpha. However, the log-normal and Cauchy distributions have size estimates not within two standard errors of the nominal level. All five methods yield similar results.
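The "within two standard errors" comparisons above can be illustrated with a short sketch. It assumes the tabulated standard errors are the usual binomial standard errors of a rejection rate, which is consistent with the values in Tables 3.6 and 3.7 but is an assumption here; the function names are hypothetical.

```python
import math


def rejection_rate_se(rate, n_sim):
    """Binomial standard error of an empirical rejection rate from n_sim replicates."""
    return math.sqrt(rate * (1.0 - rate) / n_sim)


def within_two_se(rate, target, n_sim):
    """True when the target (nominal size or power) lies within two s.e. of the estimate."""
    return abs(rate - target) <= 2.0 * rejection_rate_se(rate, n_sim)


# Size: EMH with no stratum effect versus Rank Sum with a high stratum effect (2100 replicates)
emh_ok = within_two_se(0.055, 0.05, 2100)
rank_sum_ok = within_two_se(0.012, 0.05, 2100)

# Power: Rank Sum with a high stratum effect against the 50% target (1000 replicates)
rank_sum_power_ok = within_two_se(0.306, 0.50, 1000)
```

With these inputs the first comparison holds and the other two do not, in line with the description of the methods that ignore strata.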

Table 3.7 Empirical power estimates based on 1000 normal simulations: Rejection rate (standard error)

                                                  Stratum Effect
Case         Procedure                None          Small         Moderate      High
50% Power    Stratify: EMH            .499 (.016)   .499 (.016)   .494 (.016)   .522 (.016)
             Ignore:   Rank Sum       .499 (.016)   .476 (.016)   .386 (.015)   .306 (.015)
                       ANOVA          .499 (.016)   .480 (.016)   .393 (.015)   .308 (.015)
             Adjust:   RANCOVA        .499 (.016)   .507 (.016)   .502 (.016)   .529 (.016)
                       2-Way ANOVA    .499 (.016)   .503 (.016)   .501 (.016)   .528 (.016)

80% Power    Stratify: EMH            .778 (.013)   .791 (.013)   .778 (.013)   .770 (.013)
             Ignore:   Rank Sum       .779 (.013)   .775 (.013)   .707 (.014)   .584 (.016)
                       ANOVA          .781 (.013)   .780 (.013)   .711 (.014)   .588 (.016)
             Adjust:   RANCOVA        .785 (.013)   .795 (.013)   .780 (.013)   .773 (.013)
                       2-Way ANOVA    .783 (.013)   .793 (.013)   .779 (.013)   .771 (.013)

(Estimates + 2 s.e.) < specified power are displayed in boldface. EMH = extended Mantel-Haenszel with integer scores; ANOVA = analysis of variance; RANCOVA = rank analysis of covariance.

3.4. Summary and Conclusions

Although analyses of these RCTs show that procedures fully stratifying, ignoring strata, and adjusting strata as covariates produce similar results, other studies have indicated problems arising from nonnull stratum effects, particularly time-varying ones (e.g. VACSG, 1967; Eisenhauer et al, 1994). Although with dichotomous (binary) response, unconditional analysis of highly stratified or matched data has been shown to result in biased estimates due to estimating many nuisance parameters (Breslow and Day, §7.1, 1980; Kleinbaum, Kupper, Morgenstern, pp. 387-388, 440-442, 1982), these simulations with continuous data show that fully stratifying, ignoring strata, and adjusting strata as covariates yield similar properties when stratum effects are null or small: no evidence of systematic bias, as well as power near specified levels. Non-normally distributed continuous responses showed no increase in bias, but sometimes a reduced size of the test. As stratum effects get large, power diminishes, especially in procedures completely ignoring strata. Thus, ignoring strata completely and adjusting strata as covariates are reasonable approaches when strata are not associated with continuous outcomes. Since completely ignoring strata loses efficiency as stratum effects increase, adjusting strata as covariates may be a better overall approach when stratification analysis is not desired or not feasible. With continuous outcomes, stratification or covariance adjustments both appear reasonable statistical procedures; choice lies in the investigators' preference to conduct comparisons either holding a factor constant or adjusting to mean levels.

Chapter 4. A Random Effects Model for Incomplete Pairs in Multicenter Trials

4.1. Introduction

Some multicenter clinical trials include a large number of centers with each having a relatively small number of patients. Usually, these multicenter studies have within-center randomization, which poses a potential methodologic dilemma since some centers may have no patients assigned to some treatments (i.e. incomplete designs). To study this problem, one can consider the situation with two treatments and many centers, each with either two patients (one per treatment) or one patient. This data structure is the same as a matched pairs design with missing data in the centers with only one patient (called singleton or unpaired centers). Such data would usually be missing completely at random (MCAR) (Little and Rubin, 1987) since missingness is only related to the presumably random fact that the center had one patient instead of two; missingness would be non-random only if the center did not enroll the second patient because of some circumstance related to the first patient's baseline status or response outcome (called "chronologic bias" by Matts and McHugh (1978)). In each center with two treatments, the first treatment assignment is random, while the second is for the other treatment (i.e., deterministic). For convenience, refer to the pattern of within-center treatment assignment as the accrual "order".

Various authors have addressed related problems. Lachenbruch and Myers (1983) interpreted continuous data for matched pairs with possibly missing response for one member as a randomized block design for which analysis is via a general linear model with fixed effects for pairs and treatments. They concluded that data from the unmatched pairs do not add any additional information to the paired data since treatment differences are estimated on a within-blocks basis (i.e., the data for the unmatched pairs only contribute to estimation of pair effects in this fixed effects general linear model); the same situation applies to binary data when using McNemar's test (Koch and Edwards, 1988). Ekbohm (1976, 1981) investigated testing the equality of two correlated means with incomplete data, extending work for missing data in one variate (Mehta and Gurland, 1969). He noted that a mixed model with fixed effects for center and treatment leads to the complete paired test (as in Lachenbruch and Myers), discarding the unmatched pairs. He then examined modified maximum likelihood estimators (MLEs) and simple difference between means (ignoring accrual/allocation order) estimators (proposed by Lin and Stivers, 1974) with normally distributed responses, assuming data are MCAR, and concluded through simulations that those simple estimators of differences between means are appropriate when the correlation is low, while the modified MLEs appear better with unknown or moderate to large correlation, especially with small, unequal sample sizes. Bhoj in a series of papers (1978, 1984, 1989, 1991) proposed a convex combination of the usual paired t-test and two sample t-test via transformations for incomplete bivariate normal data with exchangeable correlation structure. A weighted estimator utilizing this concept will be presented in Section 4.3. Bhoj compared his proposed statistic with the modified MLE statistics of Lin and Stivers (1974) and Hamdan et al (1978), as well as the simple difference between means (ignoring accrual order) statistic of Lin and Stivers (1974) and an earlier statistic he proposed (Bhoj, 1989), via some simulations. The Lin and Stivers modified MLE may be anticonservative in the presence of heteroscedasticity. Both Bhoj statistics and the Hamdan modified MLE seem to have good power and proper size. Little and Rubin (Chapter 8, 1987) provided an example of a variance components (random or mixed effects) model which handles unobserved block effects as missing data with the Expectation-Maximization (EM) algorithm.

Generalized estimating equation (GEE) methods (Liang and Zeger, 1986; Zeger and Liang, 1986; Liang, Zeger and Qaqish, 1992) can be applied using population averaged models in this situation for continuous, count, binary, and ordinal data (Davies, 1994; Lipsitz et al, 1994), as long as centers constitute a large random sample from some population of centers and the unenrolled second patients' data in the unpaired centers are MCAR. An exchangeable working correlation structure for GEE methods seems quite reasonable since patients within a center can be thought of as interchangeable under a null hypothesis of no treatment difference. However, there are some questions about the applicability of design, sample size, and MCAR assumptions for GEE methods in this scenario; centers are usually chosen by convenience, not at random; patients within centers are not necessarily chosen (or necessarily volunteering) to participate (from the population of patients with the same condition) at random.

This situation is also similar to a two treatment, two period crossover study with missing data in the second period. Patel (1985) examined crossover trials with missing data in the second period. He assumed a bivariate normal distribution for responses of patients and that the first period responses are identically distributed, regardless of missingness in the second period. He proposed a method using incomplete data and compared it to the usual test with only the complete data. He concluded via simulations that tests based on the estimator incorporating the incomplete data had appropriate size and better power than using only the complete data. Jones and Kenward (1989) suggested an ad hoc procedure using a weighted estimator based on the inverse of the variance to combine the complete and incomplete data. Cook (1995, 1996) uses a weighted estimator of complete and incomplete sequences for interim analyses of crossover studies. Section 4.3 investigates a related weighted procedure using weighted least squares for the vector of sequence by accrual means and its covariance matrix. Grieve (1995) used a Bayesian approach with improper prior distributions and a uniform covariance structure; he concluded, through derivations and an example, that incorporating the missing data only adds information with a nonnull carryover effect. This conclusion would correspond to a nonnull accrual × treatment interaction in the multicenter RCT setting with incomplete treatment assignment.

4.2. General Model

Consider the parallel groups design for a randomized controlled trial (RCT) with two treatments, many centers and one or two patients per center. Let $h = 1, 2, \ldots, H$ index the centers, $i = 1,2$ index the accrual order of patients in the $h$th center, $j = A,B$ index treatment assignment, $k = 1,2$ index the subjects within the $h$th center, and $Y_{hijk}$ denote the response for the $k$th patient in the $h$th center entering the study in the $i$th order and assigned to the $j$th treatment. Then, a general model with nested subject effects for this situation is:

$$
\begin{aligned}
Y_{hijk} &= \mu + \psi_h + \pi_i + \tau_j + \alpha_{k(h)} + (\psi\pi)_{hi} + (\psi\tau)_{hj} + (\pi\tau)_{ij} + (\pi\alpha)_{ik(h)} + (\tau\alpha)_{jk(h)} + (\psi\pi\tau)_{hij} + (\pi\tau\alpha)_{ijk(h)} + \varepsilon_{hijk} \\
&= \mu_{hijk} + \varepsilon_{hijk}
\end{aligned}
$$

where the $\varepsilon_{hijk}$ have $E(\varepsilon_{hijk}) = 0$, $\mathrm{Var}(\varepsilon_{hijk}) = \sigma_e^2$, and $\mathrm{Cov}(\varepsilon_{h1jk}, \varepsilon_{h2j'k'}) = \rho_e\sigma_e^2$ where $j \ne j'$, $k \ne k'$. The $\varepsilon_{hijk}$ are random observational errors corresponding to measurement variability or inherent biological stochastic variability in the phenomena under observation (i.e. potentially multiple observations on the same patient under the same conditions can vary). In this model, $\mu$ is the overall mean, $\psi_h$ are the center effects, $\pi_i$ are the accrual (period/order) effects, $\tau_j$ are the treatment effects, $\alpha_{k(h)}$ are the subject effects, $(\cdot\cdot)_{\cdot\cdot}$ are the pairwise interactions, and $(\cdots)_{\cdots}$ are the three-way interactions. Various restrictions apply to this model.

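This model can be re-written according to a Fisher decomposition:

$$
\begin{aligned}
E(Y_{hijk}) ={}& \bar\mu_{h\cdot\cdot\cdot} + (\bar\mu_{hi\cdot\cdot} - \bar\mu_{h\cdot\cdot\cdot}) + (\bar\mu_{h\cdot j\cdot} - \bar\mu_{h\cdot\cdot\cdot}) + (\bar\mu_{h\cdot\cdot k} - \bar\mu_{h\cdot\cdot\cdot}) \\
&+ (\bar\mu_{hi\cdot k} - \bar\mu_{hi\cdot\cdot} - \bar\mu_{h\cdot\cdot k} + \bar\mu_{h\cdot\cdot\cdot}) + (\bar\mu_{h\cdot jk} - \bar\mu_{h\cdot j\cdot} - \bar\mu_{h\cdot\cdot k} + \bar\mu_{h\cdot\cdot\cdot}) \\
&+ (\bar\mu_{hij\cdot} - \bar\mu_{hi\cdot\cdot} - \bar\mu_{h\cdot j\cdot} + \bar\mu_{h\cdot\cdot\cdot}) \\
&+ (\mu_{hijk} + \bar\mu_{hi\cdot\cdot} + \bar\mu_{h\cdot j\cdot} + \bar\mu_{h\cdot\cdot k} - \bar\mu_{hi\cdot k} - \bar\mu_{h\cdot jk} - \bar\mu_{hij\cdot} - \bar\mu_{h\cdot\cdot\cdot}),
\end{aligned}
$$

where the terms correspond to the center mean, the center accrual (period/order) effect, center treatment effect, patient effect within-center, accrual × subject within-center interaction, treatment × subject within-center interaction, center accrual × treatment interaction, and the order × treatment × subject within-center interaction, respectively. The decomposition can be developed further into:

$$
\begin{aligned}
E(Y_{hijk}) ={}& \bar\mu_{\cdot\cdot\cdot\cdot} + (\bar\mu_{h\cdot\cdot\cdot} - \bar\mu_{\cdot\cdot\cdot\cdot}) + (\bar\mu_{\cdot i\cdot\cdot} - \bar\mu_{\cdot\cdot\cdot\cdot}) + (\bar\mu_{\cdot\cdot j\cdot} - \bar\mu_{\cdot\cdot\cdot\cdot}) + (\bar\mu_{h\cdot\cdot k} - \bar\mu_{h\cdot\cdot\cdot}) \\
&+ (\bar\mu_{hi\cdot\cdot} - \bar\mu_{h\cdot\cdot\cdot} - \bar\mu_{\cdot i\cdot\cdot} + \bar\mu_{\cdot\cdot\cdot\cdot}) + (\bar\mu_{h\cdot j\cdot} - \bar\mu_{h\cdot\cdot\cdot} - \bar\mu_{\cdot\cdot j\cdot} + \bar\mu_{\cdot\cdot\cdot\cdot}) + (\bar\mu_{\cdot ij\cdot} - \bar\mu_{\cdot i\cdot\cdot} - \bar\mu_{\cdot\cdot j\cdot} + \bar\mu_{\cdot\cdot\cdot\cdot}) \\
&+ (\bar\mu_{hi\cdot k} - \bar\mu_{hi\cdot\cdot} - \bar\mu_{h\cdot\cdot k} + \bar\mu_{h\cdot\cdot\cdot}) + (\bar\mu_{h\cdot jk} - \bar\mu_{h\cdot j\cdot} - \bar\mu_{h\cdot\cdot k} + \bar\mu_{h\cdot\cdot\cdot}) \\
&+ (\bar\mu_{hij\cdot} - \bar\mu_{hi\cdot\cdot} - \bar\mu_{h\cdot j\cdot} - \bar\mu_{\cdot ij\cdot} + \bar\mu_{h\cdot\cdot\cdot} + \bar\mu_{\cdot i\cdot\cdot} + \bar\mu_{\cdot\cdot j\cdot} - \bar\mu_{\cdot\cdot\cdot\cdot}) \\
&+ (\mu_{hijk} - \bar\mu_{hi\cdot k} - \bar\mu_{h\cdot jk} - \bar\mu_{hij\cdot} + \bar\mu_{h\cdot\cdot k} + \bar\mu_{hi\cdot\cdot} + \bar\mu_{h\cdot j\cdot} - \bar\mu_{h\cdot\cdot\cdot}),
\end{aligned}
$$

which corresponds to the overall mean, center effect, accrual (period/order) effect, treatment effect, subject effect within-center, center × accrual interaction, center × treatment interaction, accrual × treatment interaction, accrual × subject within-center interaction, treatment × subject within-center interaction, center × accrual × treatment interaction, and the order × treatment × subject within-center interaction, respectively; the $\bar\mu$ terms are the means of the $\mu$ terms across the indices replaced with dots. For this model, patients are nested within-center and the nested subject effect is neither summed over centers nor interacted with the center effect. This nested model for patients assumes a patient can enter either first or second.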

Usually, patients arrive without being randomized to arrival order, i.e. patients are randomized to treatment as they arrive, so patient effects are not separable from (i.e., are aliased with) order effects on a within-center basis; i.e., they essentially correspond to order × center interactions. Thus, the following general model for this situation is of more interest and may be more realistic. The general nested model with subject effects will not be considered further as all effects involving $k$ are henceforth assumed null. Thus, consider the general model:

$$
\begin{aligned}
Y_{hij} &= \mu + \psi_h + \pi_i + \tau_j + (\psi\pi)_{hi} + (\psi\tau)_{hj} + (\pi\tau)_{ij} + (\psi\pi\tau)_{hij} + \varepsilon_{hij} \qquad (4.1) \\
&= \mu_{hij} + \varepsilon_{hij}
\end{aligned}
$$

where the $\varepsilon_{hij}$ have $E(\varepsilon_{hij}) = 0$, $V(\varepsilon_{hij}) = \sigma_e^2$, and $\mathrm{Cov}(\varepsilon_{h1j}, \varepsilon_{h2j'}) = \rho_e\sigma_e^2$ for $j \ne j'$.

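In this model, $\mu$ is the overall mean, $\psi_h$ are the center effects, $\pi_i$ are the accrual (period/order) effects, $\tau_j$ are the treatment effects, $(\cdot\cdot)_{\cdot\cdot}$ are the pairwise interactions, and $(\cdots)_{\cdots}$ is the three-way interaction. The same restrictions apply to this model as for the nested model, except the terms including the subject effects do not apply. The following means are of interest:

$$\bar Y_{\cdot ij} = \frac{1}{H}\sum_h Y_{hij} = \mu + \pi_i + \tau_j + \frac{1}{H}\sum_h \varepsilon_{hij} \qquad (4.2)$$

and

$$\bar Y_{\cdot\cdot j} = \frac{1}{2H}\sum_h\sum_i Y_{hij} = \mu + \tau_j + \frac{1}{2H}\sum_h\sum_i \varepsilon_{hij}. \qquad (4.3)$$

This general model in equation (4.1) can be decomposed as follows:

$$E(Y_{hij}) = \bar\mu_{h\cdot\cdot} + (\bar\mu_{hi\cdot} - \bar\mu_{h\cdot\cdot}) + (\bar\mu_{h\cdot j} - \bar\mu_{h\cdot\cdot}) + (\mu_{hij} - \bar\mu_{hi\cdot} - \bar\mu_{h\cdot j} + \bar\mu_{h\cdot\cdot}),$$

where the terms correspond to the center mean, the center accrual (period/order) effect, center treatment effect, and the center treatment × accrual interaction, respectively. The decomposition can be rewritten as:

$$
\begin{aligned}
E(Y_{hij}) ={}& \bar\mu_{\cdot\cdot\cdot} + (\bar\mu_{h\cdot\cdot} - \bar\mu_{\cdot\cdot\cdot}) + (\bar\mu_{\cdot i\cdot} - \bar\mu_{\cdot\cdot\cdot}) + (\bar\mu_{\cdot\cdot j} - \bar\mu_{\cdot\cdot\cdot}) \\
&+ (\bar\mu_{hi\cdot} - \bar\mu_{h\cdot\cdot} - \bar\mu_{\cdot i\cdot} + \bar\mu_{\cdot\cdot\cdot}) + (\bar\mu_{h\cdot j} - \bar\mu_{h\cdot\cdot} - \bar\mu_{\cdot\cdot j} + \bar\mu_{\cdot\cdot\cdot}) + (\bar\mu_{\cdot ij} - \bar\mu_{\cdot i\cdot} - \bar\mu_{\cdot\cdot j} + \bar\mu_{\cdot\cdot\cdot}) \\
&+ (\mu_{hij} - \bar\mu_{hi\cdot} - \bar\mu_{h\cdot j} - \bar\mu_{\cdot ij} + \bar\mu_{h\cdot\cdot} + \bar\mu_{\cdot i\cdot} + \bar\mu_{\cdot\cdot j} - \bar\mu_{\cdot\cdot\cdot}),
\end{aligned}
$$

which corresponds to the overall mean, center effect, accrual (period/order) effect, treatment effect, center × accrual interaction, center × treatment interaction, accrual × treatment interaction, and the center × treatment × accrual interaction, respectively.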

(Note that only one combination of treatment and order is observed for each patient in each center, which limits estimability. In this regard, the center x treatment x accrual interaction - and the last term in all the previous decompositions - is usually viewed as a random residual.)
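The decomposition of the cell means into an overall mean, main effects, and interactions is an algebraic identity, which can be checked numerically. The following Python sketch uses arbitrary illustrative cell means for a fully crossed center × accrual × treatment layout (an assumption made only for the check, since in practice only one treatment and order combination is observed per patient); all names are hypothetical.

```python
import itertools
import random

random.seed(1)
H, I, J = 4, 2, 2  # centers, accrual orders, treatments (small illustrative sizes)

# Arbitrary cell means mu_hij for every (center, order, treatment) combination
mu = {cell: random.gauss(0.0, 1.0)
      for cell in itertools.product(range(H), range(I), range(J))}


def m(h=None, i=None, j=None):
    """Mean of mu over every index left as None (the 'dotted' indices)."""
    values = [v for (hh, ii, jj), v in mu.items()
              if (h is None or hh == h)
              and (i is None or ii == i)
              and (j is None or jj == j)]
    return sum(values) / len(values)


def decomposition(h, i, j):
    """The eight terms of the rewritten decomposition of E(Y_hij)."""
    g = m()
    return [
        g,                                           # overall mean
        m(h=h) - g,                                  # center effect
        m(i=i) - g,                                  # accrual (period/order) effect
        m(j=j) - g,                                  # treatment effect
        m(h=h, i=i) - m(h=h) - m(i=i) + g,           # center x accrual
        m(h=h, j=j) - m(h=h) - m(j=j) + g,           # center x treatment
        m(i=i, j=j) - m(i=i) - m(j=j) + g,           # accrual x treatment
        mu[h, i, j] - m(h=h, i=i) - m(h=h, j=j) - m(i=i, j=j)
        + m(h=h) + m(i=i) + m(j=j) - g,              # center x treatment x accrual
    ]


# Largest discrepancy between each cell mean and the sum of its eight terms
max_error = max(abs(sum(decomposition(*cell)) - value) for cell, value in mu.items())
```

The terms sum back to each cell mean up to floating-point error, whatever values are placed in `mu`.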

The general model in equation (4.1) reduces to the finite population model considered in Chapter 5 when $\sigma_e^2 = \rho_e = 0$ so that the $Y_{hij}$ are fixed constants; i.e., the $\varepsilon_{hij} = 0$ when there is no inherent variability or measurement error in the response. The random effects model considered in Section 4.3 applies when

$$(\psi\pi)_{hi} = (\psi\tau)_{hj} = (\pi\tau)_{ij} = (\psi\pi\tau)_{hij} = 0 \quad \text{for all } h, i, j,$$

under the assumption of no center × accrual (order) interaction, no center × treatment interaction, no accrual × treatment interaction, and no center × treatment × accrual order interaction.

At each center, for the general model in equation (4.1), the overall null hypothesis ($H_0$) of no treatment effect for each and every patient has $\tau_j = (\pi\tau)_{ij} = (\psi\tau)_{hj} = (\psi\pi\tau)_{hij} = 0$. A case of interest allows $\varepsilon_{hij}$ to be random with $\rho_e = 0$, $\infty > \sigma_e^2 > 0$ and $(\psi\pi)_{hi} = 0$ in addition to the null conditions, under the assumption of no interactions at all. Thus, $E(Y_{hij} \mid H_0) = \mu_{hij} = \mu + \psi_h + \pi_i$ and $\mathrm{Var}(Y_{hij} \mid H_0) = \sigma_e^2$ for all $h$, $i$, and $j$.

4.3. Random Effects Model

One can consider a RCT with many strata, some of which do not have all the potential treatments assigned to subjects. The chance of such an occurrence increases as within-stratum sample size decreases. This situation could occur in a study with many stratification factors other than center through their resulting cross-classification; however, a trial with many centers is illustrated below.

Consider the scenario of a RCT with two treatments (A,B) assigned in H centers or clinics (where H is large). Assume a model similar to equation (4.1) with an overall mean

($\mu$), a random center effect ($\psi_h$), an accrual effect ($\pi_i$), a treatment effect ($\tau_j$), and random error ($\varepsilon_{hij}$) for the response of the $i$th patient in center $h$ assigned to treatment $j$:

$$Y_{hij} = \mu + \psi_h + \pi_i + \tau_j + \varepsilon_{hij}, \qquad (4.4)$$

where random errors and random center effects are independent and identically distributed (i.i.d.) according to normal distributions; i.e. $\varepsilon_{hij} \sim N(0, \sigma_e^2)$, $\psi_h \sim N(0, \sigma_\psi^2)$, and $\varepsilon_{hij}$ and $\psi_h$ are independent ($\perp$) for all ($\forall$) $h = 1,\ldots,H$, $i = 1,2$, and $j = A,B$. (With cross-classified stratification factors other than center, the stratum effect probably is not random, whereas center could be random.) Suppose $N = 2n_{PA} + 2n_{PB} + n_{SA} + n_{SB}$ patients were randomly assigned to treatment as shown in the following figure: $n_P$ centers have two patients each ($n_{PA}$ have the first patient assigned to A and $n_{PB}$ have the first patient assigned to B) and the other $n_S$ centers have only one patient ($n_{SA}$ of the centers assigned their only patient to treatment A and the other $n_{SB}$ to B). (Without loss of generality, the nominally scaled center index can be rearranged to produce the pattern shown in Figure 4.1.) Let $f_S = n_{SB}/n_{SA}$ denote the ratio of singleton/unpaired centers with treatment B only to those with A only, $f_P = n_{PB}/n_{PA}$ denote the ratio of paired centers assigning treatment B to the first patient to those assigning treatment A to the first patient, and $\bar n_S = n_S/2$ be the average number of centers with only A or B, so $n_S = 2\bar n_S = n_{SA} + n_{SB}$ is the total number of centers assigning only one of two treatments.

Figure 4.1 Multicenter data structure

              Treatment
    A                          B
Y_{1,1A}                   Y_{1,2B}
Y_{2,1A}                   Y_{2,2B}
   ...                        ...
Y_{n_PA,1A}                Y_{n_PA,2B}
Y_{n_PA+1,2A}              Y_{n_PA+1,1B}
Y_{n_PA+2,2A}              Y_{n_PA+2,1B}
   ...                        ...
Y_{n_P,2A}                 Y_{n_P,1B}
Y_{n_P+1,1A}                  -
Y_{n_P+2,1A}                  -
   ...
Y_{n_P+n_SA,1A}               -
    -                      Y_{n_P+n_SA+1,1B}
    -                      Y_{n_P+n_SA+2,1B}
                              ...
    -                      Y_{H,1B}

where Y_{hij} = (hij)th response
h = 1, ..., H indexes centers
i = 1,2 indexes accrual order
j = A,B indexes treatment assignment

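These data can be viewed in two parts: the centers with paired patients and the centers with only one patient. In the $n_P$ centers with patient pairs, one can base the estimate for the treatment difference on aggregating the within-center treatment differences:

$$
\begin{aligned}
\bar d_P &= \sum_{h=1}^{n_P} (Y_{hiA} - Y_{hi'B})/2n_{Pj^*} \qquad (4.5) \\
&= \sum_{h=1}^{n_{PA}} (Y_{h1A} - Y_{h2B})/2n_{PA} + \sum_{h=n_{PA}+1}^{n_P} (Y_{h2A} - Y_{h1B})/2n_{PB} \\
&= (\bar Y_{P1A} - \bar Y_{P2B})/2 + (\bar Y_{P2A} - \bar Y_{P1B})/2,
\end{aligned}
$$

where

$$
n_{Pj^*} = \begin{cases} n_{PA} & \text{if } 1 \le h \le n_{PA} \\ n_{PB} & \text{if } n_{PA} < h \le n_P \end{cases}
\;=\; n_{PA}\,1\{(n_{PA} - h) \ge 0\} + n_{PB}\,1\{(h - n_{PA}) > 0\},
$$

where $1\{\}$ is the indicator function or Boolean operator; $\bar Y_{Pij}$ is the mean of the subjects assigned to treatment $j$ in the $i$th order in the $n_{Pj}$ paired centers. This estimator has expected value $E(\bar d_P) = \tau_A - \tau_B$ and variance

$$
\begin{aligned}
\mathrm{Var}(\bar d_P) &= \sigma_e^2(n_{PA}^{-1} + n_{PB}^{-1})/2 = \sigma_e^2 n_P/(2n_{PA}n_{PB}) = \sigma_e^2(1 + f_P)^2/(2n_P f_P) \\
&= \sigma^2(1 - \rho_I)n_P/(2n_{PA}n_{PB}) = \sigma^2(1 - \rho_I)(f_P^{-1} + 2 + f_P)/2n_P,
\end{aligned}
$$

where $f_P = n_{PB}/n_{PA}$, $\sigma^2 = \sigma_e^2 + \sigma_\psi^2$, and the intracenter correlation is $\rho_I = \sigma_\psi^2/(\sigma_e^2 + \sigma_\psi^2)$. If $n_{PA} = n_{PB} = \bar n_P = n_P/2$ then $\bar d_P = \sum_{h=1}^{n_P} (Y_{hiA} - Y_{hi'B})/n_P$, which has the same expected value and variance $2\sigma_e^2/n_P$.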

In the $n_S = n_{SA} + n_{SB}$ singleton or unpaired centers (centers with one patient), one can base the treatment effect estimate on the difference between the mean for the patients with treatment A and the mean for those with treatment B:

$$\bar d_S = \bar Y_{S1A} - \bar Y_{S1B}, \qquad (4.6)$$

which also has expected value $E(\bar d_S) = \tau_A - \tau_B$. However its variance is

$$
\begin{aligned}
\mathrm{Var}(\bar d_S) &= (\sigma_e^2 + \sigma_\psi^2)(n_{SA}^{-1} + n_{SB}^{-1}) = (\sigma_e^2 + \sigma_\psi^2)(1 + f_S)^2/(n_S f_S) \\
&= \sigma^2(f_S^{-1} + 2 + f_S)/n_S = \sigma^2(f_S^{-1} + 2 + f_S)/n_P f_{SP},
\end{aligned}
$$

where $\bar Y_{S1j}$ is the mean of the $n_{Sj}$ unpaired centers with only assignments to treatment $j$, $f_j = n_j/n_P$ for $j = A,B$, and $f_{SP} = n_S/n_P$. If $n_{SA} = n_{SB} = \bar n_S = n_S/2$ then $E(\bar d_S) = \tau_A - \tau_B$, but $\mathrm{Var}(\bar d_S) = 2(\sigma_e^2 + \sigma_\psi^2)/\bar n_S = 4(\sigma_e^2 + \sigma_\psi^2)/n_S$. The two estimators can be combined in various ways using Mantel-Haenszel or weighted least squares (WLS) methods:

$$\bar d = w\bar d_P + (1 - w)\bar d_S. \qquad (4.7)$$
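The paired, singleton, and combined estimators can be illustrated on simulated data. The following Python sketch assumes the random effects model of equation (4.4) with the overall mean set to zero (it cancels from every difference); the function names, default parameter values, and seed are illustrative assumptions rather than anything used in this research.

```python
import random
from statistics import mean


def simulate(n_pa, n_pb, n_sa, n_sb, tau_a, tau_b, pi1=0.0, pi2=0.0,
             sd_center=1.0, sd_error=1.0, seed=0):
    """Draw Y = psi_h + pi_i + tau_j + error for each type of center."""
    rng = random.Random(seed)
    tau = {"A": tau_a, "B": tau_b}
    pi = {1: pi1, 2: pi2}

    def response(psi, order, trt):
        return psi + pi[order] + tau[trt] + rng.gauss(0.0, sd_error)

    pairs_ab, pairs_ba, single_a, single_b = [], [], [], []
    for _ in range(n_pa):                      # first patient on A, second on B
        psi = rng.gauss(0.0, sd_center)
        pairs_ab.append((response(psi, 1, "A"), response(psi, 2, "B")))
    for _ in range(n_pb):                      # first patient on B, second on A
        psi = rng.gauss(0.0, sd_center)
        pairs_ba.append((response(psi, 1, "B"), response(psi, 2, "A")))
    for _ in range(n_sa):                      # singleton centers assigned A
        single_a.append(response(rng.gauss(0.0, sd_center), 1, "A"))
    for _ in range(n_sb):                      # singleton centers assigned B
        single_b.append(response(rng.gauss(0.0, sd_center), 1, "B"))
    return pairs_ab, pairs_ba, single_a, single_b


def combined_estimate(pairs_ab, pairs_ba, single_a, single_b, w):
    """Return (d_P, d_S, d) for a prespecified weight w between 0 and 1."""
    d_ab = mean(y_a - y_b for y_a, y_b in pairs_ab)   # mean of Y_h1A - Y_h2B
    d_ba = mean(y_a - y_b for y_b, y_a in pairs_ba)   # mean of Y_h2A - Y_h1B
    d_p = (d_ab + d_ba) / 2.0                         # accrual effects cancel here
    d_s = mean(single_a) - mean(single_b)
    return d_p, d_s, w * d_p + (1.0 - w) * d_s


data = simulate(30, 30, 20, 20, tau_a=2.0, tau_b=0.5, pi2=1.0)
d_p, d_s, d = combined_estimate(*data, w=0.75)
```

Averaging the two accrual orders with equal weight is what removes the accrual effect from the paired estimate even when the numbers of centers in each order differ.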

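Thus, for fixed $w$ the combined estimator has expected value $E(\bar d) = \tau_A - \tau_B$ and variance

$$
\begin{aligned}
\mathrm{Var}(\bar d) &= w^2\sigma_e^2(n_{PA}^{-1} + n_{PB}^{-1})/2 + (1 - w)^2(\sigma_e^2 + \sigma_\psi^2)(n_{SA}^{-1} + n_{SB}^{-1}) \\
&= w^2\sigma_e^2 n_P/(2n_{PA}n_{PB}) + (1 - w)^2(\sigma_e^2 + \sigma_\psi^2)n_S/(n_{SA}n_{SB}) \\
&= \left[w^2(1 - \rho_I)n_P/(2n_{PA}n_{PB}) + (1 - w)^2 n_S/(n_{SA}n_{SB})\right]\sigma^2 = K\sigma^2 \qquad (4.8) \\
&= \left[w^2(1 - \rho_I)(f_P^{-1} + 2 + f_P)/2 + (1 - w)^2(f_S^{-1} + 2 + f_S)/f_{SP}\right](\sigma^2/n_P).
\end{aligned}
$$

If $n_{PA} = n_{PB}$ then

$$
\begin{aligned}
\mathrm{Var}(\bar d) &= 2w^2\sigma_e^2/n_P + (1 - w)^2(\sigma_e^2 + \sigma_\psi^2)(n_{SA}^{-1} + n_{SB}^{-1}) \\
&= \left[2w^2(1 - \rho_I)/n_P + (1 - w)^2(1 + f_S)^2/(2\bar n_S f_S)\right]\sigma^2 \\
&= \left[2w^2(1 - \rho_I)/n_P + (1 - w)^2/n_{SA} + (1 - w)^2/n_{SB}\right]\sigma^2 \\
&= \left[2w^2(1 - \rho_I) + (1 - w)^2(f_S^{-1} + 2 + f_S)/f_{SP}\right](\sigma^2/n_P).
\end{aligned}
$$

Instead, if $n_{SA} = n_{SB}$ then

$$
\begin{aligned}
\mathrm{Var}(\bar d) &= w^2\sigma_e^2(n_{PA}^{-1} + n_{PB}^{-1})/2 + 4(1 - w)^2(\sigma_e^2 + \sigma_\psi^2)/n_S \\
&= \left[w^2(1 - \rho_I)n_P/(2n_{PA}n_{PB}) + 4(1 - w)^2/n_S\right]\sigma^2 \\
&= \left[w^2(1 - \rho_I)(f_P^{-1} + 2 + f_P)/2 + 4(1 - w)^2/f_{SP}\right](\sigma^2/n_P).
\end{aligned}
$$

If both $n_{PA} = n_{PB}$ and $n_{SA} = n_{SB}$ then

$$
\begin{aligned}
\mathrm{Var}(\bar d) &= 2w^2\sigma_e^2/n_P + 4(1 - w)^2(\sigma_e^2 + \sigma_\psi^2)/n_S \\
&= \left[2w^2(1 - \rho_I)/n_P + 4(1 - w)^2/n_S\right]\sigma^2 \\
&= \left[2w^2(1 - \rho_I) + 4(1 - w)^2/f_{SP}\right](\sigma^2/n_P).
\end{aligned}
$$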
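The equivalence of the two ways of writing the variance factor K in equation (4.8), in terms of center counts and in terms of the ratios, can be checked numerically. The following Python sketch uses hypothetical function names and arbitrary illustrative center counts.

```python
def variance_factor(w, rho, n_pa, n_pb, n_sa, n_sb):
    """K in Var(d) = K * sigma^2, written with the center counts."""
    paired = (1.0 - rho) * (1.0 / n_pa + 1.0 / n_pb) / 2.0
    single = 1.0 / n_sa + 1.0 / n_sb
    return w ** 2 * paired + (1.0 - w) ** 2 * single


def variance_factor_ratios(w, rho, n_p, f_p, f_s, f_sp):
    """The same K written with the ratios f_P, f_S and f_SP."""
    bracket = (w ** 2 * (1.0 - rho) * (1.0 / f_p + 2.0 + f_p) / 2.0
               + (1.0 - w) ** 2 * (1.0 / f_s + 2.0 + f_s) / f_sp)
    return bracket / n_p


# Illustrative counts: 20 + 30 paired centers, 10 + 40 singleton centers
k_counts = variance_factor(0.7, 0.3, n_pa=20, n_pb=30, n_sa=10, n_sb=40)
k_ratios = variance_factor_ratios(0.7, 0.3, n_p=50, f_p=30 / 20, f_s=40 / 10, f_sp=50 / 50)
```

Both calls return the same value, and with balanced counts the factor reduces to the last special case above.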

Various weights, $0 \le w \le 1$, can be chosen; e.g. $w = 1$ uses only the centers with pairs (reducing to complete paired data); $w = \tfrac{1}{2}$ weights $\bar d_P$ and $\bar d_S$ equally; $w = 0$ uses only the singleton centers (reducing to a two sample problem); $w = n_P/(n_P + \bar n_S) = (1 + f_{SP}/2)^{-1}$ weights $\bar d_P$ by the proportion of all patients in centers with pairs; $w = n_P/(n_P + n_S) = n_P/H = (1 + f_{SP})^{-1}$ weights $\bar d_P$ by the proportion of all centers with pairs; or $w = (n_P + cn_S)/(n_P + n_S) = (1 + cf_{SP})/(1 + f_{SP})$, where $0 \le c \le 1$, weights $\bar d_P$ by those aforementioned proportions, but with more emphasis on paired centers as $c$ increases. WLS with homogeneous variance uses weights based on inverse variances, i.e.

$$
w = \frac{2/\{(1 - \rho_I)(n_{PA}^{-1} + n_{PB}^{-1})\}}{2/\{(1 - \rho_I)(n_{PA}^{-1} + n_{PB}^{-1})\} + 1/(n_{SA}^{-1} + n_{SB}^{-1})}.
$$

With fixed $\rho_I$, $\sigma^2$, $n_P$, $n_S$, $f_P$ and $f_S$ (or equivalently fixed $\sigma_e^2$, $\sigma_\psi^2$, $n_{PA}$, $n_{PB}$, $n_{SA}$, and $n_{SB}$), $V(\bar d)$ is minimized relative to $w$ by setting $\partial V(\bar d)/\partial w = 0$, solving for $w$, and noting $\partial^2 V(\bar d)/\partial w^2$ is positive:

$$
w_{\min} = \frac{(n_{SA}^{-1} + n_{SB}^{-1})}{(1 - \rho_I)(n_{PA}^{-1} + n_{PB}^{-1})/2 + (n_{SA}^{-1} + n_{SB}^{-1})}.
$$

Thus, $\mathrm{Var}(\bar d \mid w_{\min}) = \left\{\frac{2}{(1 - \rho_I)}(n_{PA}^{-1} + n_{PB}^{-1})^{-1} + (n_{SA}^{-1} + n_{SB}^{-1})^{-1}\right\}^{-1}\sigma^2$, for $n_{gj} > 0$ and $\rho_I < 1$, where $g = P, S$ and $j = A,B$.
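That the inverse-variance weight minimizes the variance factor K can be checked with a simple grid search. The following is an illustrative Python sketch under the variance expression in equation (4.8); the function names, center counts, and grid are assumptions made for the check.

```python
def k_factor(w, rho, n_pa, n_pb, n_sa, n_sb):
    """K in Var(d) = K * sigma^2 for weight w and intracenter correlation rho."""
    paired = (1.0 - rho) * (1.0 / n_pa + 1.0 / n_pb) / 2.0
    single = 1.0 / n_sa + 1.0 / n_sb
    return w ** 2 * paired + (1.0 - w) ** 2 * single


def optimal_weight(rho, n_pa, n_pb, n_sa, n_sb):
    """Weight on the paired estimator that minimizes K (requires rho < 1)."""
    paired = (1.0 - rho) * (1.0 / n_pa + 1.0 / n_pb) / 2.0
    single = 1.0 / n_sa + 1.0 / n_sb
    return single / (paired + single)


def minimum_k(rho, n_pa, n_pb, n_sa, n_sb):
    """K at the optimal weight: the inverse of the summed inverse variance factors."""
    paired = (1.0 - rho) * (1.0 / n_pa + 1.0 / n_pb) / 2.0
    single = 1.0 / n_sa + 1.0 / n_sb
    return 1.0 / (1.0 / paired + 1.0 / single)


counts = dict(n_pa=20, n_pb=30, n_sa=10, n_sb=40)   # illustrative center counts
rho = 0.3
w_min = optimal_weight(rho, **counts)

# Grid search over w in [0, 1] as an independent check of the closed form
grid = [step / 1000.0 for step in range(1001)]
w_grid = min(grid, key=lambda w: k_factor(w, rho, **counts))
```

The grid minimizer agrees with `w_min` to within the grid spacing, and `k_factor(w_min, ...)` equals `minimum_k(...)`.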

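If $n_{PA} = n_{PB}$ then $f_P = 1$; so

$$
\begin{aligned}
w_{\min} &= \frac{(n_{SA}^{-1} + n_{SB}^{-1})}{2(1 - \rho_I)/n_P + (n_{SA}^{-1} + n_{SB}^{-1})} = 1 - \frac{2(1 - \rho_I)}{2(1 - \rho_I) + n_P(n_{SA}^{-1} + n_{SB}^{-1})} \\
&= \frac{(1 + f_S)^2/(n_S f_S)}{2(1 - \rho_I)/n_P + (1 + f_S)^2/(n_S f_S)} = \frac{(1 + f_S)^2/(f_{SP}f_S)}{2(1 - \rho_I) + (1 + f_S)^2/(f_{SP}f_S)} \\
&= \frac{(1 + f_S)^2}{2f_{SP}f_S(1 - \rho_I) + (1 + f_S)^2}.
\end{aligned}
$$

Thus, $\mathrm{Var}(\bar d \mid w_{\min}) = \left\{\frac{n_P}{(1 - \rho_I)} + 2(n_{SA}^{-1} + n_{SB}^{-1})^{-1}\right\}^{-1} 2\sigma^2$, for $n_{Sj} > 0$ and $\rho_I < 1$, where $j = A,B$.

Instead, if $n_{SA} = n_{SB}$ then $f_S = 1$; so

$$
w_{\min} = \frac{8/n_S}{(1 - \rho_I)(n_{PA}^{-1} + n_{PB}^{-1}) + 8/n_S}
$$

and $\mathrm{Var}(\bar d \mid w_{\min}) = \left\{\frac{8}{(1 - \rho_I)}(n_{PA}^{-1} + n_{PB}^{-1})^{-1} + n_S\right\}^{-1} 4\sigma^2$, for $n_{Pj} > 0$ and $\rho_I < 1$, where $j = A,B$.

If both $n_{PA} = n_{PB}$ and $n_{SA} = n_{SB}$, which is reasonable since $E(n_{PA}) = E(n_{PB})$ and $E(n_{SA}) = E(n_{SB})$, then

$$
\begin{aligned}
w_{\min} &= \frac{2/n_S}{(1 - \rho_I)/n_P + 2/n_S} = \frac{2n_P/n_S}{(1 - \rho_I) + 2n_P/n_S} = \frac{2n_P}{(1 - \rho_I)n_S + 2n_P} \\
&= \frac{2/f_{SP}}{(1 - \rho_I) + 2/f_{SP}} = \frac{2}{f_{SP}(1 - \rho_I) + 2} = \left(1 + f_{SP}(1 - \rho_I)/2\right)^{-1}.
\end{aligned}
$$

Then, $\mathrm{Var}(\bar d \mid w_{\min}) = \left\{\frac{2n_P}{(1 - \rho_I)} + n_S\right\}^{-1} 4\sigma^2$, for $\rho_I < 1$. If twice as many centers have only one patient as compared to those centers with two patients (i.e. $f_{SP} = 2$), then $w_{\min} = 1/(2 - \rho_I)$. When $\rho_I = 0$, this implies equal weights of $\bar d_P$ and $\bar d_S$. Instead, if $x\%$ of the centers are singletons (i.e. $(100 - x)\%$ are paired centers), where $x = 100n_S/H$, then $f_{SP} = \frac{x}{100 - x}$ and $w_{\min} = \frac{(200 - 2x)/x}{(200 - x)/x - \rho_I}$. For example, with 20% singletons, $H = 5n_S$, $f_{SP} = .25$, and $.80 \le w_{\min} = \frac{8}{9 - \rho_I} \le 1$.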
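The balanced-allocation special case can be tabulated directly. The following Python sketch evaluates the optimal weight for a given percentage of singleton centers, assuming equal allocation in both the paired and singleton centers; the function names are hypothetical.

```python
def singles_to_pairs_ratio(percent_singletons):
    """f_SP = x / (100 - x) when x% of the centers are singletons (x < 100)."""
    return percent_singletons / (100.0 - percent_singletons)


def optimal_weight_balanced(f_sp, rho):
    """w_min = (1 + f_SP * (1 - rho) / 2)^(-1) under equal allocation."""
    return 1.0 / (1.0 + f_sp * (1.0 - rho) / 2.0)


f_sp_20 = singles_to_pairs_ratio(20.0)             # 20% singleton centers
weights = {rho: optimal_weight_balanced(f_sp_20, rho) for rho in (-1.0, 0.0, 0.5, 1.0)}
```

For 20% singletons the weight rises from .80 at a correlation of -1 to 1 at a correlation of 1, matching the bounds stated above.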

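In general, as the intracenter correlation ($\rho_I$) increases, $\mathrm{Var}(\bar d)$ decreases. As the intracenter correlation ($\rho_I$) approaches one or the ratio of singleton to paired centers ($f_{SP}$) approaches zero, $w_{\min}$ approaches one, meaning that the combined test statistic relies more on the paired centers. If $\rho_I = 0$ then

$$
w_{\min} = \frac{2(1 + f_S)^2/(n_S f_S)}{(1 + f_P)^2/(n_P f_P) + 2(1 + f_S)^2/(n_S f_S)} = \frac{2f_P(1 + f_S)^2}{f_{SP}f_S(1 + f_P)^2 + 2f_P(1 + f_S)^2}.
$$

Thus, $\mathrm{Var}(\bar d \mid w_{\min}) = \left\{2(n_{PA}^{-1} + n_{PB}^{-1})^{-1} + (n_{SA}^{-1} + n_{SB}^{-1})^{-1}\right\}^{-1}\sigma^2$, for $n_{gj} > 0$, where $g = P, S$ and $j = A,B$.

If, in addition, $n_{PA} = n_{PB}$ then

$$
\begin{aligned}
w_{\min} &= \frac{(n_{SA}^{-1} + n_{SB}^{-1})}{2/n_P + (n_{SA}^{-1} + n_{SB}^{-1})} = 1 - \frac{2}{2 + n_P(n_{SA}^{-1} + n_{SB}^{-1})} \\
&= \frac{(1 + f_S)^2/(n_S f_S)}{2/n_P + (1 + f_S)^2/(n_S f_S)} = \frac{(1 + f_S)^2}{2f_{SP}f_S + (1 + f_S)^2}.
\end{aligned}
$$

So, $\mathrm{Var}(\bar d \mid w_{\min}) = \left\{n_P + 2(n_{SA}^{-1} + n_{SB}^{-1})^{-1}\right\}^{-1} 2\sigma^2$, for $n_{Sj} > 0$, where $j = A,B$. Instead, if $\rho_I = 0$ and $n_{SA} = n_{SB}$ then

$$
\begin{aligned}
w_{\min} &= \frac{8/n_S}{(n_{PA}^{-1} + n_{PB}^{-1}) + 8/n_S} = 1 - \frac{(n_{PA}^{-1} + n_{PB}^{-1})}{(n_{PA}^{-1} + n_{PB}^{-1}) + 8/n_S} \\
&= \frac{8/n_S}{(1 + f_P)^2/(n_P f_P) + 8/n_S} = \frac{8f_P}{f_{SP}(1 + f_P)^2 + 8f_P}
\end{aligned}
$$

and $\mathrm{Var}(\bar d \mid w_{\min}) = \left\{8(n_{PA}^{-1} + n_{PB}^{-1})^{-1} + n_S\right\}^{-1} 4\sigma^2$, for $n_{Pj} > 0$, where $j = A,B$.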

If $\rho_I = 0$, $n_{PA} = n_{PB}$ and $n_{SA} = n_{SB}$, then

$$w_{\min} = \frac{2n_P}{2n_P + n_S},$$

which is the proportion of patients in centers with pairs out of all patients ($N = n_S + 2n_P$). So, $\mathrm{Var}(\bar d \mid w_{\min}) = 4\sigma^2/(2n_P + n_S) = 4\sigma^2/N$.

Figure 4.2 shows the $w$ that minimizes $K = w^2(1 - \rho_I)n_P/(2n_{PA}n_{PB}) + (1 - w)^2 n_S/(n_{SA}n_{SB})$ for various ratios of unpaired/singleton centers to paired centers ($f_{SP}$ ranging from 0 to 15) (with $n_{PA} = n_{PB}$ and $n_{SA} = n_{SB}$) as intraclass correlation ($\rho_I$) varies from $-1$ to 1 with 100 total centers. This shows the $w$ that minimizes the variance of the weighted estimator since $\mathrm{Var}(\bar d) = K\sigma^2$. Also, vertical reference lines indicate the lower range of $w$ proportional to the number of centers (the line farther left) and proportional to the number of patients. Note that $w = 1$ minimizes the variance regardless of $f_{SP}$ only when $\rho_I = 1$. With $f_{SP} > 0$ and modest correlations expected within centers, $w < 1$ minimizes the variance. Similarly, Figure 4.3 shows the $w$ minimizing $K$ and thus the variance. In these graphs, two values of $f_{SP}$ are used (one of them $f_{SP} = 1$) and $f_P$ varies up to 1 with $f_S = 1/f_P$. As before $H = 100$ and intraclass correlation varies from $-1$ to 1. Notice that even with twice as many patients assigned to one treatment in the paired or unpaired centers the minimum $w$ does not change much. In practice $w$ should be specified a priori. The intraclass correlation, $\rho_I$, could be estimated a posteriori as an evaluation of the prespecified $w$. Usually one chooses $w$ between one and three quarters of the possible range (between proportional to centers or subjects and 1) based on the sample sizes and the correlation.

Figure 4.2 Variance as a function of ratio of singles to pairs ($f_{SP}$), intraclass correlation ($\rho$), and weights ($w$) with equal treatment allocation ($f_P = f_S = 1$) and 100 centers ($H = 100$)

Figure 4.3 Variance as a function of treatment imbalance ($f$), intraclass correlation ($\rho$), and weights ($w$) for two values of $f_{SP}$ and 100 centers ($H = 100$)

Figure 4.4 Ratio of variance from optimal weights to variance from a priori weights based on proportional to number of centers for various intraclass correlations ($\rho$) and 100 centers ($H = 100$)

Figure 4.4 shows the ratio of variance from a priori weights (proportional to the number of centers for no more than moderate intraclass correlation ($\rho_I < .5$) and half way between one and proportional to the number of centers for larger intraclass correlation ($\rho_I \ge .5$)) to variance from optimal variance reducing weights. The relatively flat plot indicates using these a priori weights does not result in much loss in efficiency unless intraclass correlation is quite large ($\rho_I > .9$). When the ratio of singleton to paired centers ($f_{SP}$) increases above 10, the efficiency of the a priori weights decreases somewhat for $\rho_I = .6$, $.7$, and $.8$.

With sufficiently large $n_{PA}$, $n_{PB}$, $n_{SA}$ and $n_{SB}$ (i.e. $> 25$, so that $H > 100$ and $N \geq 150$), the vector of means

\[ \mathbf{y} = [\bar{y}_{P1A},\ \bar{y}_{P2B},\ \bar{y}_{P1B},\ \bar{y}_{P2A},\ \bar{y}_{S1A},\ \bar{y}_{S1B}]' \tag{4.9} \]

where $\bar{y}_{Pij}$ is the mean of the subjects assigned to treatment $j$ in the $i$th order in the $n_{Pj}$ paired centers and $\bar{y}_{S1j}$ is the mean of the subjects assigned to treatment $j$ in the $n_{Sj}$ singleton/unpaired centers, is approximately normally distributed with covariance matrix $\operatorname{Var}(\mathbf{y})$, estimated by the estimator $V_y$. Thus, methods such as weighted least squares (WLS), generalized estimating equations (GEEs), mixed models (MIXMODs), and generalized linear models (GLMs) could be used to model the means, depending on the scale of response and the assumptions one is comfortable making. WLS allows a small number of categorical covariates or additional indirect adjustments via randomization analysis of covariance (RANCOVA), while GEE methods, MIXMODs, and GLMs allow continuous covariates and effect modifiers. The design matrix

\[ X = \begin{bmatrix} 1 & 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 & 0 & 1 \\ 1 & 0 & 0 & 1 & 0 & 0 \end{bmatrix} \tag{4.10} \]

corresponds to the parameters $\beta = [\mu,\ \tau_A,\ \pi_2,\ \psi,\ \tau_A\pi_2,\ \tau_A\psi]'$ for the mean for first patients assigned treatment B in complete centers, the difference between treatments (A - B), the accrual difference ("period two" effect), the singleton center effect, and the two interactions of the treatment difference with accrual effect and with singleton center effect, respectively. Removing the last two effects (treatment interactions) from the model, without affecting the fit, is desired. (If these terms are significant, their variation could potentially be explained by expanding the data structure with additional strata and adding other factors to the model.) One can potentially reduce the model further by removing the $\pi_2$ and $\psi$ terms as well.

With the general covariance matrix specified below, weighted least squares could be used to perform analyses. Under this general covariance structure

\[ \operatorname{Var}(d) = \frac{w^2}{4}\left[ n_{PA}^{-1}\left(\sigma_{1A}^2 + \sigma_{2B}^2 - 2\rho_{I,A:B}\,\sigma_{1A}\sigma_{2B}\right) + n_{PB}^{-1}\left(\sigma_{1B}^2 + \sigma_{2A}^2 - 2\rho_{I,B:A}\,\sigma_{1B}\sigma_{2A}\right) \right] + (1-w)^2\left( n_{SA}^{-1}\sigma_{1A}^2 + n_{SB}^{-1}\sigma_{1B}^2 \right), \]

which reduces to (4.8) assuming exchangeability (compound symmetry). This covariance structure could possibly be simplified; for instance, the paired centers may have exchangeable (compound symmetric) structure for each accrual order separately (i.e. $\sigma_{1A}^2 = \sigma_{2B}^2$ and $\sigma_{1B}^2 = \sigma_{2A}^2$) or the same exchangeable structure for each order (i.e. $\sigma_{1A}^2 = \sigma_{2B}^2 = \sigma_{1B}^2 = \sigma_{2A}^2$ and $\rho_{I,A:B} = \rho_{I,B:A}$).
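As a worked check of this reduction, the following sketch substitutes a common variance $\sigma^2$ and a common intraclass correlation $\rho_I$ (the symbols used for the exchangeable matrix in this section) into the general expression for $\operatorname{Var}(d)$. Equation (4.8) is not reproduced here, so the result should be read as agreeing with it only up to algebraic rearrangement.

```latex
\begin{aligned}
\operatorname{Var}(d)
 &= \frac{w^2}{4}\left[\frac{2\sigma^2(1-\rho_I)}{n_{PA}}
      + \frac{2\sigma^2(1-\rho_I)}{n_{PB}}\right]
    + (1-w)^2\left(\frac{\sigma^2}{n_{SA}} + \frac{\sigma^2}{n_{SB}}\right) \\
 &= \sigma^2\left[\frac{w^2(1-\rho_I)}{2}
      \left(\frac{1}{n_{PA}} + \frac{1}{n_{PB}}\right)
    + (1-w)^2\left(\frac{1}{n_{SA}} + \frac{1}{n_{SB}}\right)\right].
\end{aligned}
```

The paired term shrinks as $\rho_I$ grows, which is why larger intraclass correlation shifts weight toward the paired component.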

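\[ \operatorname{Var}(\mathbf{y}) = \begin{bmatrix} \dfrac{1}{n_{PA}}\begin{bmatrix} \sigma_{1A}^2 & \rho_{I,A:B}\sigma_{1A}\sigma_{2B} \\ \rho_{I,A:B}\sigma_{1A}\sigma_{2B} & \sigma_{2B}^2 \end{bmatrix} & & & \\ & \dfrac{1}{n_{PB}}\begin{bmatrix} \sigma_{1B}^2 & \rho_{I,B:A}\sigma_{1B}\sigma_{2A} \\ \rho_{I,B:A}\sigma_{1B}\sigma_{2A} & \sigma_{2A}^2 \end{bmatrix} & & \\ & & \dfrac{1}{n_{SA}}\sigma_{1A}^2 & \\ & & & \dfrac{1}{n_{SB}}\sigma_{1B}^2 \end{bmatrix} \]

and $\operatorname{Cov}(\bar{y}_{P1j}, \bar{y}_{P2j'}) = \rho_{I,j:j'}\,\sigma_{1j}\sigma_{2j'}/n_{Pj}$ is estimated by

\[ \frac{1}{(n_{Pj}-1)\,n_{Pj}} \sum_h (y_{h1j} - \bar{y}_{P1j})(y_{h2j'} - \bar{y}_{P2j'}); \]

while in singleton (unpaired) centers $\operatorname{Var}(\bar{y}_{S1j}) = \sigma_{1j}^2/n_{Sj}$ is estimated by

\[ \frac{1}{(n_{Sj}-1)\,n_{Sj}} \sum_h (y_{h1j} - \bar{y}_{S1j})^2. \]

In all cases, $\sigma_{ij}^2 = \sigma_e^2 + \sigma_{c,ij}^2$ for $i = 1, 2$ and $j = A, B$. So, the exchangeable covariance matrix is written as:

\[ \operatorname{Var}(\mathbf{y}) = \sigma^2 \begin{bmatrix} \dfrac{1}{n_{PA}}\begin{bmatrix} 1 & \rho_I \\ \rho_I & 1 \end{bmatrix} & & & \\ & \dfrac{1}{n_{PB}}\begin{bmatrix} 1 & \rho_I \\ \rho_I & 1 \end{bmatrix} & & \\ & & \dfrac{1}{n_{SA}} & \\ & & & \dfrac{1}{n_{SB}} \end{bmatrix} \]

where $\sigma^2 = \sigma_e^2 + \sigma_c^2$ is the common variance with error and center components and $\rho_I$ is the common intraclass correlation. Many methods (e.g. random effects models and some GEE models) utilize an exchangeable correlation structure. Since the exchangeable structure uses fewer parameters, these models have somewhat less stringent sample size requirements than those with more general covariance structures. The random effects model in equation (4.2) actually imposes these exchangeable restrictions; GEE methods impose such restrictions as well, but produce estimated variances robust to misspecified correlation structures. The validity of the exchangeable covariance structure can be assessed.

WLS methods estimate the primary parameters via $\hat{\beta} = (X'V_y^{-1}X)^{-1}X'V_y^{-1}\mathbf{y}$ and its variance matrix as $V_{\hat\beta} = (X'V_y^{-1}X)^{-1}$, where $V_y$ is the estimated covariance matrix. Secondary parameters can be estimated through linear combinations $C\hat{\beta}$, which have corresponding approximately chi-square statistics $(C\hat\beta)'(C V_{\hat\beta} C')^{-1}(C\hat\beta)$.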
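The WLS formulas above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the software used for the analyses: the function names `wls_fit` and `wald_statistic`, the synthetic parameter vector, and the diagonal covariance matrix are all assumptions made for the example; only the design matrix follows (4.10).

```python
import numpy as np

def wls_fit(X, y, V):
    """Return beta_hat = (X'V^-1 X)^-1 X'V^-1 y and its covariance (X'V^-1 X)^-1."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    V = np.asarray(V, dtype=float)
    Vinv_X = np.linalg.solve(V, X)            # V^-1 X without forming V^-1
    Vinv_y = np.linalg.solve(V, y)            # V^-1 y
    cov_beta = np.linalg.inv(X.T @ Vinv_X)    # (X'V^-1 X)^-1
    beta = cov_beta @ (X.T @ Vinv_y)
    return beta, cov_beta

def wald_statistic(C, beta, cov_beta):
    """Wald statistic (C b)'(C V_b C')^-1 (C b) and its degrees of freedom."""
    C = np.atleast_2d(np.asarray(C, dtype=float))
    est = C @ beta
    q = float(est @ np.linalg.solve(C @ cov_beta @ C.T, est))
    return q, C.shape[0]

# Design matrix (4.10); rows ordered P1A, P2B, P1B, P2A, S1A, S1B.
X = np.array([[1, 1, 0, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 0, 0, 0, 0, 0],
              [1, 1, 1, 0, 1, 0],
              [1, 1, 0, 1, 0, 1],
              [1, 0, 0, 1, 0, 0]], dtype=float)

# Hypothetical parameters with no treatment interactions (illustration only).
beta_true = np.array([3.0, 1.0, 0.0, 0.5, 0.0, 0.0])
y = X @ beta_true
V = np.diag([0.1, 0.1, 0.1, 0.1, 1.0, 1.0])   # assumed covariance of the means

beta_hat, cov_beta = wls_fit(X, y, V)
# Joint 2 df test of the two treatment interactions (last two parameters).
C = np.array([[0, 0, 0, 0, 1, 0],
              [0, 0, 0, 0, 0, 1]])
q, df = wald_statistic(C, beta_hat, cov_beta)
```

Because the full model is saturated (six means, six parameters), the fit reproduces the means exactly; the covariance matrix matters for the estimates only once columns are removed.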

4.4. Example: Multicenter Herpes Zoster Study

The trial described in Subsection 1.5.3 can be analyzed using these proposed methods to compare two treatments at a time. Since two treatments have blocks of size four, only the first assignment for each treatment will be considered for this example, in order to form pairs (blocks of size two). Treatments given for 7 days (standard versus test) are assessed, as well as test therapies (7 day versus 14 day). Since approximately three percent of the data are censored for the response (number of days until no new lesions), the censored observations will be set to the maximum number of days (i.e. 21) for illustration purposes. Results from analyses follow.

For the seven day treatments (standard versus test), 85% of the pairs are complete, so $f_{sp} = 33/194 = .17$ with 33 singletons and 194 pairs. The usual, complete data analyses do not show differences (paired t-test p=.26 and MIXMOD treatment effect p=.31), but the paired data weighted least squares (WLS) model for mean number of days with a general covariance structure showed a significant treatment difference. The model based on the design matrix

\[ \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 \end{bmatrix} \]

reduced to one with only an intercept and treatment effect (i.e. the first two columns), showing a mean estimate of 3.25 days for the standard treatment and an extra .99 days for the test treatment with a standard error of .34 days, yielding a p-value of .0036. However, analyses can use the vector of estimated paired and singleton means (4.9) and its covariance matrix:

\[ \mathbf{y} = \begin{bmatrix} 3.9247 \\ 3.1935 \\ 3.2772 \\ 4.7327 \\ 4.0000 \\ 4.2000 \end{bmatrix}, \qquad V_y = \begin{bmatrix} .1460 & .005013 & & & & \\ .005013 & .0697 & & & & \\ & & .0512 & -.003383 & & \\ & & -.003383 & .2251 & & \\ & & & & 1.0062 & \\ & & & & & 1.4240 \end{bmatrix} \]

which are compatible with a general covariance WLS model containing an intercept and a treatment effect, since interactions with treatment were not significant and patient accrual and center effects were not significant either, as displayed in Table 4.1. This final reduced model estimated the standard treatment mean as 3.27 days and an additional .96 days for the test treatment with a standard error of .33 days, corresponding to p=.0038. Even though the test statistic and p-value are not smaller than in the paired data only analysis, the standard error of the treatment effect was reduced, showing an advantage to using the combined data. Note that the estimated covariance matrix suggests heteroscedasticity incompatible with exchangeability.

Test drug treatments at two durations (7 and 14 days) are compared using the first patient in the allocation block assigned each. Over four fifths (84%) of these pairs are complete, so $f_{sp} = 38/194 = .20$. The usual, complete pair only analyses do not show any treatment difference among the 194 pairs (paired t-test p=.90 and MIXMOD treatment effect p=.90). However, as shown in Table 4.2, WLS analysis of the mean number of days for complete pairs shows a significant treatment difference: the 14 day mean is 3.80 days and the difference for the 7 day treatment is estimated as .64 fewer days with a standard error of .27, which corresponds to a p-value of .0160. Using the vector of estimated means and covariance matrix for the combined paired and singleton data

\[ \mathbf{y} = \begin{bmatrix} 3.4111 \\ 4.1222 \\ 3.6731 \\ 3.1250 \\ 4.3889 \\ 3.8500 \end{bmatrix}, \qquad V_y = \begin{bmatrix} .0921 & .0362 & & & & \\ .0362 & .1916 & & & & \\ & & .0970 & .001965 & & \\ & & .001965 & .0151 & & \\ & & & & 1.0009 & \\ & & & & & 0.9314 \end{bmatrix} \]

with WLS also results in a reduced model, since interactions with treatment were nonsignificant, as were effects for singleton centers and second patients (accrual). This yielded a mean estimate of 3.81 days for the 14 day treatment and .63 fewer days for the 7 day treatment with a standard error of .26 and a p-value of .0147. So, incorporating the singleton/unpaired data increases the precision, and the types of centers appear relatively homogeneous.
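The combined-data results for the 7 day standard versus test comparison can be checked from the reported means and covariance matrix. The sketch below is an illustration, not the original analysis code: the helper `wls` is a hypothetical name, the labeling A = test and B = standard is inferred from the intercept in Table 4.1, and the p-value uses a standard normal reference for the estimate divided by its standard error (equivalent to a 1 df chi-square).

```python
import math
import numpy as np

# Reported means, ordered as in (4.9): P1A, P2B, P1B, P2A, S1A, S1B.
# Assumption: A = test treatment, B = standard treatment.
y = np.array([3.9247, 3.1935, 3.2772, 4.7327, 4.0000, 4.2000])

# Reported block-diagonal covariance matrix of the means.
V = np.zeros((6, 6))
V[0:2, 0:2] = [[0.1460, 0.005013], [0.005013, 0.0697]]
V[2:4, 2:4] = [[0.0512, -0.003383], [-0.003383, 0.2251]]
V[4, 4] = 1.0062
V[5, 5] = 1.4240

# Design matrix (4.10): intercept, treatment, 2nd patient, singleton center,
# treatment x patient, treatment x center.
X_full = np.array([[1, 1, 0, 0, 0, 0],
                   [1, 0, 1, 0, 0, 0],
                   [1, 0, 0, 0, 0, 0],
                   [1, 1, 1, 0, 1, 0],
                   [1, 1, 0, 1, 0, 1],
                   [1, 0, 0, 1, 0, 0]], dtype=float)

def wls(X, y, V):
    """Weighted least squares estimates and standard errors."""
    W = np.linalg.inv(V)
    cov = np.linalg.inv(X.T @ W @ X)
    beta = cov @ X.T @ W @ y
    return beta, np.sqrt(np.diag(cov))

beta_full, se_full = wls(X_full, y, V)             # "Full" model, combined data
beta_final, se_final = wls(X_full[:, :2], y, V)    # intercept + treatment only

z = beta_final[1] / se_final[1]
p_treatment = math.erfc(abs(z) / math.sqrt(2))     # two-sided normal p-value

print(np.round(beta_final, 4), np.round(se_final, 4), round(p_treatment, 4))
```

Under these assumptions the final model gives roughly 3.27 for the intercept and 0.96 (standard error 0.33) for the treatment effect, in line with the "Final, Combined" rows of Table 4.1.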

Table 4.1 WLS results for 7 day standard and test treatments

                                  Paired Only                  Combined
Model    Parameter           Estimate   SE       p        Estimate   SE       p
Full     Intercept            3.2772    0.2263   <.0001    3.2772    0.2263   <.0001
         Test Treatment       0.6475    0.4440   .1448     0.6475    0.4440   .1448
         2nd Patient         -0.0837    0.3476   .8098    -0.0837    0.3476   .8098
         Singleton Center                                  0.9228    1.2146   .4474
         Treatment x Pt       0.8916    0.7037   .2051     0.8916    0.7037   .2051
         Treatment x Ctr                                  -0.8475    1.6209   .6011

Reduced  Intercept            3.1911    0.2158   <.0001    3.2063    0.2149   <.0001
         Test Treatment       1.0055    0.3426   .0033     0.9499    0.3346   .0045
         2nd Patient          0.1369    0.3009   .6492     0.1336    0.3009   .6571
         Singleton Center                                  0.3199    0.7990   .6889

Final    Intercept            3.2506    0.1717   <.0001    3.2698    0.1699   <.0001
         Test Treatment       0.9949    0.3418   .0036     0.9562    0.3304   .0038

Table 4.2 WLS results for 7 and 14 day test treatments

                                  Paired Only                  Combined
Model    Parameter           Estimate   SE       p        Estimate   SE       p
Full     Intercept            3.6731    0.3114   <.0001    3.6731    0.3114   <.0001
         7 Day Treatment     -0.2620    0.4348   .5468    -0.2620    0.4348   .5468
         2nd Patient          0.4491    0.5372   .4031     0.4491    0.5372   .4031
         Singleton Center                                  0.1769    1.0141   .8615
         Treatment x Pt      -0.7353    0.6871   .2846    -0.7353    0.6871   .2846
         Treatment x Ctr                                   0.8009    1.4565   .5824

Reduced  Intercept            3.8272    0.2761   <.0001    3.7988    0.2739   <.0001
         7 Day Treatment     -0.6159    0.2823   .0291    -0.5701    0.2766   .0393
         2nd Patient         -0.0597    0.2498   .8111    -0.0735    0.2492   .7680
         Singleton Center                                  0.5857    0.7277   .4209

Final    Intercept            3.7999    0.2514   <.0001    3.8064    0.2433   <.0001
         7 Day Treatment     -0.6389    0.2653   .0160    -0.6296    0.2581   .0147

4.5. Summary and Conclusions

With incomplete paired treatment allocation in multicenter RCTs, some analysts advise only using complete pairs (i.e. discarding centers with incomplete assignments). This chapter introduced a paired estimator adjusting for accrual order differences within center, much like a crossover trial. The weighted approach combining paired within-center treatment differences with unpaired/singleton between-center treatment differences incorporates adjustment for accrual order and makes use of all the data. With small to moderate positive within-center correlations (as might be seen in RCTs), optimal (variance minimizing) weights reduce the emphasis according to the amount of unpaired/singleton centers or patients.

Chapter 5

Finite Population Framework for Incomplete Pairs in Multicenter Trials

5.1. Introduction

Consider the same situation as in Chapter 4: a RCT of two treatments with many centers and either one or two patients within each center. Minimal assumption methods which draw conclusions about the particular sample randomized could have more validity in this scenario than the model-based methods examined in Chapter 4. Bouza (1983) addressed the situation of incomplete paired data from a survey sampling perspective with a finite population model. While RCTs are not probability samples or simple random samples from populations of known size but actually are convenience samples with random allocation to treatments, many analytic approaches, for example random effects models, assume the patients are representative of some population of interest through nonstatistical arguments. Although Bouza's weighted approach with derivation of optimal weights is useful for multicenter RCTs with incomplete assignment, he does not incorporate the order of the pair in the complete assignments. He addressed nonresponse by subsampling the missing data, which is not feasible in RCTs due to ethical considerations (i.e., patient participation is voluntary).

Wei (1983) developed a class of rank tests for interchangeability of two correlated, possibly incomplete, responses, which is similar in spirit to the aforementioned weighted methods combining paired and unpaired data. After ranking the pooled responses, he combines two independent components, a paired quantity and an unpaired one, with weights based on the proportion of paired data (e.g. the unpaired data component uses a rank sum type of test for location shift hypotheses). Permutation distributions and their asymptotic estimators are given for complete randomization.

Thus, clinical trials involving two treatments with one or two patients in each of many centers can be examined under a finite population framework with the indicator function method, as in Cornfield (1944). Letting $h = 1, 2, \ldots, H$ index the centers, $i = 1, 2$ index accrual order of patients in the $h$th center, and $j = A, B$ index the treatments, consider $y_{hij}$, the response of the $i$th patient (i.e., the patient in the $i$th accrual order) in the $h$th center assigned to the $j$th treatment, as a fixed constant for each patient. Two indicator functions, one as a recruitment index ($L_h$) and the other as a treatment assignment index ($U_h$), are used to develop a randomization model:

\[ L_h = \begin{cases} 1 & \text{if center } h \text{ has 2 patients (paired)} \\ 0 & \text{if center } h \text{ has 1 patient (singleton/unpaired)} \end{cases} \]

and

\[ U_h = \begin{cases} 1 & \text{if patient 1 in center } h \text{ is assigned treatment A} \\ 0 & \text{if patient 1 in center } h \text{ is assigned treatment B} \end{cases} = \begin{cases} 1 & \text{if center } h \text{ assigned treatments in order A:B} \\ 0 & \text{if center } h \text{ assigned treatments in order B:A.} \end{cases} \]

$L_h$ and $U_h$ can be independent components of the study design. Sometimes the $L_h$ are fixed by the study design, while other times they are considered consequences of the study. Usually the $L_h$ and $U_h$ are assumed to be statistically independent. When the $L_h$ are not independent of the $U_h$, compatibility of unpaired/singleton and paired centers is a general concern; specifically, selection bias poses a problem. Regardless of the relationship between $L_h$ and $U_h$, the $U_h$ are evaluated after fixing the $L_h$, either via design or conditional expectations. The following indicator sums relate to sample sizes.

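\[ \sum_h L_h = n_P, \qquad \sum_h (1 - L_h) = n_S, \]

\[ \sum_h (1 - L_h)U_h = n_{SA}, \qquad \sum_h (1 - L_h)(1 - U_h) = n_{SB}, \]

\[ \sum_h L_h U_h = n_{PA}, \qquad \sum_h L_h (1 - U_h) = n_{PB}, \]

\[ \sum_h U_h = n_A, \qquad \sum_h (1 - U_h) = n_B. \]

The following figure summarizes the two indicator functions and sample sizes.

                 $L_h = 0$     $L_h = 1$     Total
$U_h = 0$        $n_{SB}$      $n_{PB}$      $n_B$
$U_h = 1$        $n_{SA}$      $n_{PA}$      $n_A$
Total            $n_S$         $n_P$         $H$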
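The indicator sums translate directly into code. The following is a minimal sketch in which the function name `group_sizes` and the example indicator vectors are hypothetical; it simply evaluates the eight sums defined above for a given set of centers.

```python
def group_sizes(L, U):
    """Evaluate the indicator sums for accrual (L) and allocation (U) indices.

    L[h] = 1 for a paired center, 0 for a singleton center.
    U[h] = 1 if the first patient in center h is assigned A, 0 if assigned B.
    """
    if len(L) != len(U):
        raise ValueError("L and U must have one entry per center")
    pairs = list(zip(L, U))
    sizes = {
        "n_PA": sum(l * u for l, u in pairs),
        "n_PB": sum(l * (1 - u) for l, u in pairs),
        "n_SA": sum((1 - l) * u for l, u in pairs),
        "n_SB": sum((1 - l) * (1 - u) for l, u in pairs),
    }
    sizes["n_P"] = sizes["n_PA"] + sizes["n_PB"]   # sum of L_h
    sizes["n_S"] = sizes["n_SA"] + sizes["n_SB"]   # sum of (1 - L_h)
    sizes["n_A"] = sizes["n_PA"] + sizes["n_SA"]   # sum of U_h
    sizes["n_B"] = sizes["n_PB"] + sizes["n_SB"]   # sum of (1 - U_h)
    sizes["H"] = len(pairs)
    return sizes

# Hypothetical trial with eight centers.
L = [1, 1, 0, 1, 0, 0, 1, 1]
U = [1, 0, 1, 1, 0, 1, 0, 0]
sizes = group_sizes(L, U)
```

The margins of the two-by-two layout are recovered as sums of the four cell counts, which is a quick consistency check on any data set.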

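Define the following group sums:

\[ S_{P1A} = \sum_h L_h U_h\, y_{h1A}, \qquad S_{P2B} = \sum_h L_h U_h\, y_{h2B}, \]

\[ S_{P1B} = \sum_h L_h (1 - U_h)\, y_{h1B}, \qquad S_{P2A} = \sum_h L_h (1 - U_h)\, y_{h2A}, \]

\[ S_{S1A} = \sum_h (1 - L_h) U_h\, y_{h1A}, \qquad S_{S1B} = \sum_h (1 - L_h)(1 - U_h)\, y_{h1B}, \]

where $S_{gij}$ is the sum of the $g$th group ($g = P$ for paired centers; $g = S$ for singleton/unpaired centers) assigned the $j$th treatment in the $i$th order. Then, the estimated treatment differences accounting for the allocation order (equations 4.5 and 4.6) from the previous chapter would be computed as follows.

For paired centers,

\[ d_P = \frac{\sum_h L_h U_h (y_{h1A} - y_{h2B})}{2\sum_h L_h U_h} + \frac{\sum_h L_h (1 - U_h)(y_{h2A} - y_{h1B})}{2\sum_h L_h (1 - U_h)} \tag{5.1} \]

and for singleton/unpaired centers

\[ d_S = \frac{\sum_h (1 - L_h) U_h\, y_{h1A}}{\sum_h (1 - L_h) U_h} - \frac{\sum_h (1 - L_h)(1 - U_h)\, y_{h1B}}{\sum_h (1 - L_h)(1 - U_h)}. \tag{5.2} \]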

As in Section 4.3, an overall treatment difference estimator incorporating the accrual order is a weighted sum of ratio estimators with weights $w$: $d = w\,d_P + (1 - w)\,d_S$ (equation 4.7). As in the previous chapter, $w$ can be chosen from various options. This weighted estimator can be rewritten as

\[ d = \tfrac{w}{2}\{\bar{y}_{P1A} - \bar{y}_{P2B} + \bar{y}_{P2A} - \bar{y}_{P1B}\} + (1 - w)\{\bar{y}_{S1A} - \bar{y}_{S1B}\} \]

\[ \phantom{d} = \tfrac{w}{2}\{\bar{y}_{P1A} - \bar{y}_{P1B}\} + \tfrac{w}{2}\{\bar{y}_{P2A} - \bar{y}_{P2B}\} + (1 - w)\{\bar{y}_{S1A} - \bar{y}_{S1B}\}, \]

which generalizes to

\[ d = w_1\{\bar{y}_{P1A} - \bar{y}_{P1B}\} + w_2\{\bar{y}_{P2A} - \bar{y}_{P2B}\} + (1 - w_1 - w_2)\{\bar{y}_{S1A} - \bar{y}_{S1B}\}, \]

using separate weights for each accrual order ($w_1$ for first patients and $w_2$ for second patients) in centers with two patients. In the presence of accrual x treatment interaction, the weight $w_2 = 0$ would use only the first patients.

Alternatively, instead of weighting paired and unpaired/singleton centers together as in (4.7), a weighted estimator variant combines separate components for each accrual order; i.e. all first patients and all second patients together:

\[ d_a = w^*\left\{\frac{S_{S1A} + S_{P1A}}{n_A} - \frac{S_{S1B} + S_{P1B}}{n_B}\right\} + (1 - w^*)\left(\frac{S_{P2A}}{n_{PB}} - \frac{S_{P2B}}{n_{PA}}\right). \]

The components of this estimator are dependent, however, according to the intracenter correlation for the paired centers. In situations with selection bias determining the number of patients enrolled in centers, so that accrual x treatment is nonnull, $w^* = 1$ would produce an unbiased estimate of the treatment effect, as in analyzing the first period of a two period, two treatment crossover study in the presence of carryover.

An alternate, heuristic estimator is the simple difference of treatment means (ignoring allocation order)

\[ d_* = \frac{S_{S1A} + S_{P\cdot A}}{H - \sum_h (1 - L_h)(1 - U_h)} - \frac{S_{S1B} + S_{P\cdot B}}{H - \sum_h (1 - L_h)U_h} = \frac{S_{S1A} + S_{P\cdot A}}{H - n_{SB}} - \frac{S_{S1B} + S_{P\cdot B}}{H - n_{SA}}, \tag{5.3} \]

where $S_{P\cdot j} = S_{P1j} + S_{P2j}$, which directly uses the sums defined earlier, instead of combining separate estimators for the paired and singleton/unpaired center treatment differences.
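A short sketch may help fix the bookkeeping for the paired estimator (5.1), the singleton estimator (5.2), the weighted estimator (4.7), and the heuristic estimator (5.3). The function name `estimators`, the tuple layout for a center, and the data are hypothetical; the code assumes every one of the four groups contains at least one center.

```python
def estimators(centers, w):
    """Compute d_P (5.1), d_S (5.2), d (4.7) and the heuristic d_* (5.3).

    centers: list of (L, U, y1, y2); y2 is None for singleton centers (L == 0).
    With U == 1 the first patient received A (and the second, if any, B).
    """
    S = {k: 0.0 for k in ("P1A", "P2B", "P1B", "P2A", "S1A", "S1B")}
    n = {k: 0 for k in ("PA", "PB", "SA", "SB")}
    for L, U, y1, y2 in centers:
        if L == 1 and U == 1:        # paired center, order A:B
            n["PA"] += 1
            S["P1A"] += y1
            S["P2B"] += y2
        elif L == 1:                 # paired center, order B:A
            n["PB"] += 1
            S["P1B"] += y1
            S["P2A"] += y2
        elif U == 1:                 # singleton assigned A
            n["SA"] += 1
            S["S1A"] += y1
        else:                        # singleton assigned B
            n["SB"] += 1
            S["S1B"] += y1
    d_P = 0.5 * ((S["P1A"] - S["P2B"]) / n["PA"]
                 + (S["P2A"] - S["P1B"]) / n["PB"])
    d_S = S["S1A"] / n["SA"] - S["S1B"] / n["SB"]
    d = w * d_P + (1 - w) * d_S
    n_P = n["PA"] + n["PB"]
    d_star = ((S["S1A"] + S["P1A"] + S["P2A"]) / (n_P + n["SA"])
              - (S["S1B"] + S["P1B"] + S["P2B"]) / (n_P + n["SB"]))
    return d_P, d_S, d, d_star

# Hypothetical data: four paired centers and three singleton centers.
centers = [(1, 1, 5.0, 3.0), (1, 1, 6.0, 4.0),
           (1, 0, 2.0, 6.0), (1, 0, 4.0, 4.0),
           (0, 1, 7.0, None), (0, 0, 3.0, None), (0, 0, 5.0, None)]
d_P, d_S, d, d_star = estimators(centers, w=0.8)
```

For these data the paired and singleton estimates are 2.0 and 3.0, the weighted estimate with $w = .8$ is 2.2, and the heuristic estimate is 2.1; the heuristic value differs because it pools all patients on each treatment before differencing.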

For these estimators, the nonnull scenario is evaluated, along with two null hypothesis scenarios, to assess bias under various conditions through the above indicator functions. The nonnull hypothesis $H_A$ is that patients assigned to one treatment have a greater preponderance of larger responses than patients assigned to the other treatment. ($H_A$ concerns possible assignments; actual assignments would correspond to the observable tendencies for $y_{hiA} > y_{h'iB}$ and $y_{hiA} > y_{hi'B}$, or vice versa, where $h \neq h'$ and $i \neq i'$, since patients in the same center cannot actually be assigned different treatments in the same accrual period.) The general null hypothesis, $H_{0g}$: $y_{hiA} = y_{hiB} = y_{hi}$ (or alternatively $H_{0g}$: $\tau_A - \tau_B + (\pi\tau)_{iA} - (\pi\tau)_{iB} + (\psi\tau)_{hA} - (\psi\tau)_{hB} + (\psi\pi\tau)_{hiA} - (\psi\pi\tau)_{hiB} = 0$), is that the response for the $i$th patient in the $h$th center is equivalent for the two possible treatment assignments. The average null hypothesis, $H_{0a}$: $\sum_h\sum_i y_{hiA} = \sum_h\sum_i y_{hiB}$ (or alternatively $H_{0a}$: $\tau_A = \tau_B$), is that the responses are equivalent for the two possible treatment assignments after aggregating over center and accrual order.

5.2. Fixed Patient Accrual and Random Treatment Allocation

The first conditions are that the patient accrual indices ($L_h$) are fixed constants by design and that the allocation indices ($U_h$) are independent and identically distributed (i.i.d.) according to a Bernoulli distribution with probability one-half, denoted $U_h \overset{iid}{\sim} \text{Bernoulli}(\tfrac{1}{2})$. So, the sum of the allocation indices across centers (the number of patients assigned to treatment A) is distributed binomially with probability one-half, denoted $\sum_h U_h \sim \text{Binomial}(H, \tfrac{1}{2})$. Although $n_P$ and $n_S$ are fixed since the $L_h$ are fixed, each particular center has its first patient assigned to treatment A or B at random, making $n_{PA}$, $n_{PB}$, $n_{SA}$, and $n_{SB}$ random as well. Although $E(U_h) = 1/2$ unconditionally, the following conditional expectations apply:

\[ E(U_h \mid L_h = 1, n_{PA}, n_{PB}) = \frac{n_{PA}}{n_P}, \qquad E(U_h \mid L_h = 0, n_{SA}, n_{SB}) = \frac{n_{SA}}{n_S}. \]

These results follow from $n_{PA}$ and $n_{SA}$ each being distributed $\text{Binomial}(n_g, \tfrac{1}{2})$, where $g = P$ for paired or $g = S$ for unpaired/singleton; conditioning on either of them induces a hypergeometric distribution for the treatment allocation indicator ($U_h$). Further, the unconditional expected values of the group totals are $E(n_{SA}) = n_S/2 = E(n_{SB})$ and $E(n_{PA}) = n_P/2 = E(n_{PB})$.

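Under the nonnull scenario, $H_A$, that the $y_{hij}$ are distinct for each treatment group, the estimators are evaluated. The following expectations of sums apply:

Sum ($S_{gij}$)   Fixed $n_{PA}, n_{PB}, n_{SA}, n_{SB}$:               Random $n_{PA}, n_{PB}, n_{SA}, n_{SB}$:
                  $E(S_{gij} \mid H_A, n_{gA}, n_{gB})$                 $E(S_{gij} \mid H_A)$
$S_{P1A}$         $\frac{n_{PA}}{n_P}\sum_h L_h y_{h1A}$                $\frac{1}{2}\sum_h L_h y_{h1A}$
$S_{P2A}$         $\frac{n_{PB}}{n_P}\sum_h L_h y_{h2A}$                $\frac{1}{2}\sum_h L_h y_{h2A}$
$S_{P1B}$         $\frac{n_{PB}}{n_P}\sum_h L_h y_{h1B}$                $\frac{1}{2}\sum_h L_h y_{h1B}$
$S_{P2B}$         $\frac{n_{PA}}{n_P}\sum_h L_h y_{h2B}$                $\frac{1}{2}\sum_h L_h y_{h2B}$
$S_{S1A}$         $\frac{n_{SA}}{n_S}\sum_h (1 - L_h) y_{h1A}$          $\frac{1}{2}\sum_h (1 - L_h) y_{h1A}$
$S_{S1B}$         $\frac{n_{SB}}{n_S}\sum_h (1 - L_h) y_{h1B}$          $\frac{1}{2}\sum_h (1 - L_h) y_{h1B}$

where $g = S$ for singleton/unpaired centers and $g = P$ for paired centers; $j = A, B$.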

The estimators for paired (5.1) and singleton/unpaired (5.2) centers have the following expectations.

\[ E(d_P \mid L_h, H_A) = \frac{1}{2} E_{n_{PA}, n_{PB}}\left\{ \frac{E(S_{P1A} \mid L_h, n_{PA}, n_{PB}) - E(S_{P2B} \mid L_h, n_{PA}, n_{PB})}{n_{PA}} + \frac{E(S_{P2A} \mid L_h, n_{PA}, n_{PB}) - E(S_{P1B} \mid L_h, n_{PA}, n_{PB})}{n_{PB}} \right\} \]

\[ = \frac{1}{2n_P}\left(\sum_h\sum_i L_h y_{hiA} - \sum_h\sum_i L_h y_{hiB}\right) \]

\[ = \bar{y}_{P\cdot A} - \bar{y}_{P\cdot B} = \mu_{P\cdot A} - \mu_{P\cdot B} = \tau_A - \tau_B + (\psi\tau)_{PA} - (\psi\tau)_{PB}, \]

where $(\psi\tau)_{Pj} = n_P^{-1}\sum_h L_h (\psi\tau)_{hj}$. If there is no center x treatment effect or if the center x treatment effect "averages out" in the paired centers (as in the parametrization restrictions across all centers), then $E(d_P \mid L_h, H_A) = \tau_A - \tau_B$, which shows the estimator is unbiased.

Alternatively, if one can assume the $L_h$ are random by design or are essentially random phenomena with fixed $n_P$ and $n_S$, then

\[ E(d_P \mid H_A, n_P, n_S) = \frac{1}{2n_P}\sum_h\sum_i (y_{hiA} - y_{hiB})\, E(L_h \mid n_P, n_S) = \frac{1}{2H}\left\{\sum_h\sum_i y_{hiA} - \sum_h\sum_i y_{hiB}\right\} = \tau_A - \tau_B, \]

which is unbiased.

Under the global null hypothesis ($H_{0g}$), responses are equivalent for each treatment assignment ($y_{hiA} = y_{hiB} = y_{hi}$); thus, the paired estimator is unbiased under the general null without simplifying assumptions: $E(d_P \mid L_h, H_{0g}) = 0$. However, under the average null hypothesis ($H_{0a}$: $\tau_A = \tau_B$), the paired estimator is unbiased (i.e. $E\{d_P \mid L_h, H_{0a}\} = 0$) only if the center x treatment interaction is null (i.e. $(\psi\tau)_{PA} = (\psi\tau)_{PB}$) or if the $L_h$ are actually random. Table 5.1 summarizes the assumptions needed for unbiasedness for each estimator and hypothesis for this section.
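The conditional expectation of the paired estimator can be verified by brute force in a small finite population. The sketch below is an illustration under stated assumptions: the potential outcomes are invented integers, the function names are hypothetical, and the calculation conditions on a fixed $n_{PA}$ so that every choice of which paired centers receive order A:B is equally likely (the hypergeometric argument used above). Exact rational arithmetic avoids rounding questions.

```python
from fractions import Fraction
from itertools import combinations

def expected_d_P(paired, n_PA):
    """Average d_P over all equally likely sets of n_PA centers given order A:B.

    paired: list of potential outcomes (y1A, y1B, y2A, y2B), one per paired center.
    Requires 1 <= n_PA <= len(paired) - 1 so both orders occur.
    """
    n_P = len(paired)
    n_PB = n_P - n_PA
    total, count = Fraction(0), 0
    for chosen in combinations(range(n_P), n_PA):
        ab = set(chosen)
        # Order A:B centers contribute y_h1A - y_h2B.
        s_ab = sum(paired[h][0] - paired[h][3] for h in ab)
        # Order B:A centers contribute y_h2A - y_h1B.
        s_ba = sum(paired[h][2] - paired[h][1]
                   for h in range(n_P) if h not in ab)
        total += Fraction(1, 2) * (Fraction(s_ab, n_PA) + Fraction(s_ba, n_PB))
        count += 1
    return total / count

def finite_population_target(paired):
    """(1 / 2n_P) * sum over centers and orders of (y_hiA - y_hiB)."""
    n_P = len(paired)
    diff = sum(y1A + y2A - y1B - y2B for y1A, y1B, y2A, y2B in paired)
    return Fraction(diff, 2 * n_P)

# Hypothetical potential outcomes for four paired centers.
paired = [(5, 3, 6, 2), (4, 4, 7, 1), (9, 2, 3, 3), (1, 0, 2, 5)]
target = finite_population_target(paired)
averages = [expected_d_P(paired, k) for k in (1, 2, 3)]
```

Every value in `averages` equals `target`, so the conditional expectation does not depend on the realized split between A:B and B:A orders; what remains in the target is the paired-center mean difference, including any center x treatment component.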

For the singleton/unpaired centers, the expected values are as follows.

\[ E(d_S \mid L_h, H_A) = E_{n_{SA}, n_{SB}}\left\{ \frac{E(S_{S1A} \mid L_h, n_{SA}, n_{SB})}{n_{SA}} - \frac{E(S_{S1B} \mid L_h, n_{SA}, n_{SB})}{n_{SB}} \right\} \]

\[ = \frac{1}{n_S}\left(\sum_h (1 - L_h) y_{h1A} - \sum_h (1 - L_h) y_{h1B}\right) \]

\[ = \bar{y}_{S1A} - \bar{y}_{S1B} = \mu_{S1A} - \mu_{S1B} \]

\[ = \tau_A - \tau_B + (\pi\tau)_{1A} - (\pi\tau)_{1B} + (\psi\tau)_{SA} - (\psi\tau)_{SB} + (\psi\pi\tau)_{S1A} - (\psi\pi\tau)_{S1B}. \]

Assuming no center x treatment and no three-way interactions (or at least that the model restrictions across all centers also hold in the unpaired centers; i.e. $(\psi\tau)_{Sj} = \sum_h (1 - L_h)(\psi\tau)_{hj} = 0$ and $(\psi\pi\tau)_{S1j} = \sum_h (1 - L_h)(\psi\pi\tau)_{h1j} = 0$), as well as no accrual order x treatment interaction, $E(d_S \mid H_A) = \bar{y}_{\cdot A} - \bar{y}_{\cdot B} = \tau_A - \tau_B$, as in (4.3), showing unbiasedness under certain restrictions.

Again, an alternative scenario (random $L_h$; fixed $n_P$ and $n_S$) can be considered.

\[ E(d_S \mid H_A, n_P, n_S) = \frac{1}{n_S}\left\{\sum_h y_{h1A}\, E(1 - L_h \mid n_P, n_S) - \sum_h y_{h1B}\, E(1 - L_h \mid n_P, n_S)\right\} \]

\[ = \frac{1}{n_S}\,\frac{n_S}{H}\left\{\sum_h y_{h1A} - \sum_h y_{h1B}\right\} = \frac{1}{H}\left\{\sum_h y_{h1A} - \sum_h y_{h1B}\right\} \]

\[ = \tau_A - \tau_B + (\pi\tau)_{1A} - (\pi\tau)_{1B}, \]

which shows unbiased $d_S$ under $H_A$ with random $L_h$ and null accrual x treatment effects. Under the global/general null hypothesis ($H_{0g}$),

\[ \tau_A - \tau_B + (\pi\tau)_{1A} - (\pi\tau)_{1B} + (\psi\tau)_{hA} - (\psi\tau)_{hB} + (\psi\pi\tau)_{h1A} - (\psi\pi\tau)_{h1B} = 0; \]

thus, $E(d_S \mid L_h, H_{0g}) = 0$, indicating unbiasedness. Under the average null hypothesis ($H_{0a}$), $\tau_A = \tau_B$, so the same assumptions as for $H_A$ are required for unbiasedness: no accrual x treatment interaction and either random $L_h$ or both no center x treatment and no three-way interactions.

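So with (4.7) for the overall weighted estimator and fixed $w$,

\[ E(d \mid L_h, H_A) = w\,E(d_P \mid L_h, H_A) + (1 - w)\,E(d_S \mid L_h, H_A) \]

\[ = \tau_A - \tau_B + w\{(\psi\tau)_{PA} - (\psi\tau)_{PB}\} + (1 - w)\{(\pi\tau)_{1A} - (\pi\tau)_{1B} + (\psi\tau)_{SA} - (\psi\tau)_{SB} + (\psi\pi\tau)_{S1A} - (\psi\pi\tau)_{S1B}\}. \]

Assuming interactions involving centers sum to zero separately for paired and for unpaired/singleton centers, as well as no accrual interaction with treatment in unpaired/singleton centers, $E(d \mid L_h, H_A) = \tau_A - \tau_B$; the overall weighted mean difference estimator is unbiased under these assumptions.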

Table 5.1 Assumptions needed for unbiasedness with fixed accrual

Statistic   Hypothesis        Required assumptions for unbiasedness (u)
$d_P$       $H_A$/$H_{0a}$    no center x treatment (a) or $L_h$ essentially random
            $H_{0g}$          none
$d_S$       $H_A$/$H_{0a}$    no interactions (i) involving treatment or (no accrual x treatment & $L_h$ essentially random)
            $H_{0g}$          none
$d$         $H_A$/$H_{0a}$    no interactions (i) involving treatment or (no accrual x treatment & $L_h$ essentially random)
            $H_{0g}$          none
$d_*$       $H_A$/$H_{0a}$    no accrual effect; no accrual x treatment; equal allocation in unpaired centers or $L_h$ essentially random
            $H_{0g}$          (no accrual effect, no center effect & no accrual x center) or equal allocation in unpaired centers

(u) $E(\cdot \mid H_A) = \tau_A - \tau_B$; $E(\cdot \mid H_{0g}) = 0$; $E(\cdot \mid H_{0a}) = 0$ are unbiased
(a) or interactions involving center "average out" within those centers as model restrictions do in all centers together
(i) no accrual x treatment, no (a) center x treatment & no (a) 3-way interaction

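Under the general null hypothesis ($H_{0g}$), $E(d \mid H_{0g}) = 0$, showing unbiasedness without additional assumptions; under the average null ($H_{0a}$) with no accrual x treatment, no center x treatment, and no three-way interaction, $E(d \mid H_{0a}) = 0$, showing unbiasedness with the same restrictions as for $H_A$.

Then, the heuristic simple mean estimator (5.3) has the expectation, with $\mathbf{n} = (n_{PA}, n_{PB}, n_{SA}, n_{SB})$:

\[ E(d_* \mid L_h, H_A) = E_{\mathbf{n}}\left\{ \frac{E\{S_{S1A} \mid L_h, H_A, \mathbf{n}\} + E\{S_{P\cdot A} \mid L_h, H_A, \mathbf{n}\}}{n_P + n_{SA}} - \frac{E\{S_{S1B} \mid L_h, H_A, \mathbf{n}\} + E\{S_{P\cdot B} \mid L_h, H_A, \mathbf{n}\}}{n_P + n_{SB}} \right\} \]

\[ = E_{n_{SA}, n_{SB}}\left\{ \frac{\frac{n_{SA}}{n_S}\sum_h (1 - L_h) y_{h1A} + \frac{1}{2}\sum_h\sum_i L_h y_{hiA}}{n_P + n_{SA}} - \frac{\frac{n_{SB}}{n_S}\sum_h (1 - L_h) y_{h1B} + \frac{1}{2}\sum_h\sum_i L_h y_{hiB}}{n_P + n_{SB}} \right\} \]

(since $E(n_{PA}) = E(n_{PB}) = n_P/2$). If accrual (period/order) and accrual x treatment effects are null:

\[ \sum_h y_{h1j} = \sum_h y_{h2j} \quad \text{for } j = A, B \tag{5.4} \]

or equivalently $\bar{y}_{1j} = \bar{y}_{2j}$ or, in terms of (4.1), $\pi_1 + (\pi\tau)_{1j} = \pi_2 + (\pi\tau)_{2j}$, then

\[ E(d_* \mid L_h, H_A) = \frac{1}{2} E_{n_{SA}, n_{SB}}\left\{ \frac{1}{n_S}\left\{ \frac{n_{SA}}{n_P + n_{SA}}\sum_h\sum_i (1 - L_h) y_{hiA} - \frac{n_{SB}}{n_P + n_{SB}}\sum_h\sum_i (1 - L_h) y_{hiB} \right\} + \frac{1}{n_P + n_{SA}}\sum_h\sum_i L_h y_{hiA} - \frac{1}{n_P + n_{SB}}\sum_h\sum_i L_h y_{hiB} \right\}. \]

If, in addition, $n_{SA} = n_{SB} = n_S/2$, then $E(d_* \mid H_A) = \frac{1}{2N}\sum_h\sum_i (1 + L_h)(y_{hiA} - y_{hiB})$ and so, in terms of (4.1), $E(d_* \mid H_A) = \tau_A - \tau_B + \frac{n_P}{N}[(\psi\tau)_{PA} - (\psi\tau)_{PB}]$ due to model restrictions, where $N = H + n_P$ is the total number of patients. Thus, the heuristic estimator would be unbiased with no accrual and accrual x treatment effects and no center x treatment interaction (or at least center x treatment "averaging out" in paired centers), as well as equal allocation in unpaired centers.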

Alternatively, via the earlier argument concerning random $L_h$ with fixed $n_P$ and $n_S$,

\[ E(d_* \mid H_A) = E_{n_{SA}, n_{SB}}\left\{ \frac{n_{SA}\bar{y}_{\cdot A} + n_P\bar{y}_{\cdot A}}{n_P + n_{SA}} - \frac{n_{SB}\bar{y}_{\cdot B} + n_P\bar{y}_{\cdot B}}{n_P + n_{SB}} \right\} = \bar{y}_{\cdot A} - \bar{y}_{\cdot B} = \tau_A - \tau_B, \]

showing with (4.3) that the heuristic estimator is unbiased under no accrual effect and no accrual x treatment, as well as either random $L_h$ or equal treatment allocation in singletons.

If treatments in singleton centers are balanced (i.e. $n_{SA} = n_{SB} = n_S/2$), then the expectation without assuming (5.4) is

\[ E(d_* \mid H_A) = \frac{1}{2n_P + n_S}\left( \sum_h (1 - L_h)(y_{h1A} - y_{h1B}) + \sum_h\sum_i L_h (y_{hiA} - y_{hiB}) \right) \]

\[ = \frac{1}{2n_P + n_S}\left\{ n_P(\bar{y}_{P1A} + \bar{y}_{P2A} - \bar{y}_{P1B} - \bar{y}_{P2B}) + n_S(\bar{y}_{S1A} - \bar{y}_{S1B}) \right\} \]

\[ = \frac{1}{2n_P + n_S}\left\{ 2n_P(\mu_{P\cdot A} - \mu_{P\cdot B}) + n_S(\mu_{S1A} - \mu_{S1B}) \right\} \]

\[ = \tau_A - \tau_B + \frac{1}{2n_P + n_S}\Big\{ 2n_P\{(\psi\tau)_{PA} - (\psi\tau)_{PB}\} + n_S\{(\pi\tau)_{1A} - (\pi\tau)_{1B} + (\psi\tau)_{SA} - (\psi\tau)_{SB} + (\psi\pi\tau)_{S1A} - (\psi\pi\tau)_{S1B}\} \Big\}. \]

Thus, the no accrual x treatment assumption (i.e. $(\pi\tau)_{1A} = (\pi\tau)_{1B}$), as well as the no center x treatment and no three-way interactions (or at least that the model restrictions across all centers also hold in the paired and unpaired centers separately), is needed in addition to equal allocation in singletons for this estimator to be unbiased under the nonnull hypothesis.

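Under the global/general null hypothesis ($H_{0g}$),

\[ E(d_* \mid H_{0g}) = E_{n_{SA}, n_{SB}}\left\{ \frac{n_P\bar{y}_{P\cdot} + n_{SA}\bar{y}_{S1}}{n_P + n_{SA}} - \frac{n_P\bar{y}_{P\cdot} + n_{SB}\bar{y}_{S1}}{n_P + n_{SB}} \right\} = \{\bar{y}_{S1} - \bar{y}_{P\cdot}\}\, E_{n_{SA}, n_{SB}}\left\{ \frac{n_P(n_{SA} - n_{SB})}{(n_P + n_{SA})(n_P + n_{SB})} \right\} \]

\[ = 0, \quad \text{if } n_{SA} = n_{SB} \text{ or } \bar{y}_{S1} = \bar{y}_{P\cdot} \text{ (i.e. } \psi_S - \psi_P + \pi_1 + (\psi\pi)_{S1} = 0); \]

where $\bar{y}_{S1} = \frac{1}{n_S}\sum_h (1 - L_h) y_{h1}$ and $\bar{y}_{P\cdot} = \frac{1}{2n_P}\sum_h\sum_i L_h y_{hi}$.

So under the general null, either equal treatment allocation in the unpaired/singleton centers yields unbiasedness or null center, accrual, and accrual x center effects give unbiasedness. Under the average null hypothesis ($H_{0a}$), $\tau_A = \tau_B$, so the same assumptions as for $H_A$ are required for unbiasedness: no accrual x treatment interaction and either random $L_h$ or both no center x treatment and no three-way interactions.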
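The general-null result for the heuristic estimator can also be checked exactly in a toy population. In the sketch below the responses are invented integers and the function names are hypothetical. Paired centers receive order A:B or B:A with probability one-half each, singleton allocation is uniform given $n_{SA}$, and the enumerated mean of $d_*$ is compared with the closed form above for a fixed $(n_{SA}, n_{SB})$.

```python
from fractions import Fraction
from itertools import combinations, product

def expected_d_star_null(paired, single, n_SA):
    """Exact mean of d_* under H_0g for fixed accrual and a given n_SA.

    paired: list of (y1, y2) responses, identical under either treatment.
    single: list of y1 responses for singleton centers.
    """
    n_P, n_S = len(paired), len(single)
    n_SB = n_S - n_SA
    total, count = Fraction(0), 0
    for orders in product((0, 1), repeat=n_P):          # U_h for paired centers
        # U = 1: patient 1 gets A and patient 2 gets B; U = 0: the reverse.
        pA = sum(y1 if u else y2 for (y1, y2), u in zip(paired, orders))
        pB = sum(y2 if u else y1 for (y1, y2), u in zip(paired, orders))
        for a_set in combinations(range(n_S), n_SA):    # singletons assigned A
            sA = sum(single[h] for h in a_set)
            sB = sum(single) - sA
            total += (Fraction(sA + pA, n_P + n_SA)
                      - Fraction(sB + pB, n_P + n_SB))
            count += 1
    return total / count

def closed_form_null(paired, single, n_SA):
    """(ybar_S1 - ybar_P.) * n_P (n_SA - n_SB) / ((n_P + n_SA)(n_P + n_SB))."""
    n_P, n_S = len(paired), len(single)
    n_SB = n_S - n_SA
    ybar_S1 = Fraction(sum(single), n_S)
    ybar_P = Fraction(sum(y1 + y2 for y1, y2 in paired), 2 * n_P)
    return (ybar_S1 - ybar_P) * Fraction(n_P * (n_SA - n_SB),
                                         (n_P + n_SA) * (n_P + n_SB))

# Hypothetical responses: singleton centers tend to have larger values.
paired = [(2, 4), (3, 5), (6, 2)]
single = [10, 12, 8, 14]
bias_unequal = expected_d_star_null(paired, single, n_SA=3)
bias_equal = expected_d_star_null(paired, single, n_SA=2)
```

With three of the four singletons on treatment A the heuristic estimator has expectation 11/6 even though no treatment effect exists, while equal singleton allocation gives zero, matching the two conditions stated for unbiasedness.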

5.3. Random Accrual Independent of Treatment Allocation

The next situation involves both random accrual and random treatment allocation with these two components being independent (denoted $\perp$). This reflects designs which randomly assign the number of patients a particular center will enroll. Suppose $L_h \overset{iid}{\sim} \text{Bernoulli}(\gamma)$, $U_h \overset{iid}{\sim} \text{Bernoulli}(\tfrac{1}{2})$, and $L_h \perp U_h$ for all $h$, so the sum of center indicators, which is the number of paired centers, is $n_P = \sum_h L_h \sim \text{Binomial}(H, \gamma)$ (i.e. $n_P$ is random, along with $n_S$). (Again, i.i.d. means independent and identically distributed.)

Although unconditionally $E(U_h) = 1/2$, the following conditional expectations apply:

\[ E(L_h U_h \mid \mathbf{n}) = \frac{n_{PA}}{H}, \quad E(L_h (1 - U_h) \mid \mathbf{n}) = \frac{n_{PB}}{H}, \quad E\{(1 - L_h) U_h \mid \mathbf{n}\} = \frac{n_{SA}}{H}, \quad E\{(1 - L_h)(1 - U_h) \mid \mathbf{n}\} = \frac{n_{SB}}{H}, \]

where $\mathbf{n} = (n_{PA}, n_{PB}, n_{SA}, n_{SB})$, which are distributed multinomially; conditioning on these group totals induces a product hypergeometric distribution for the indicator functions. Further, the group totals are binomial: $n_{PA} \sim \text{Binomial}(H, \gamma/2)$ and $n_{SA} \sim \text{Binomial}(H, (1 - \gamma)/2)$. So, the unconditional expected values of the group totals are $E(n_{SA}) = H(1 - \gamma)/2 = E(n_{SB})$ and $E(n_{PA}) = H\gamma/2 = E(n_{PB})$.

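The following expectations of sums apply under the nonnull $H_A$:

Sum ($S_{gij}$)   Fixed $n_{PA}, n_{PB}, n_{SA}, n_{SB}$:    Random $n_{PA}, n_{PB}, n_{SA}, n_{SB}$:
                  $E(S_{gij} \mid H_A, \mathbf{n})$          $E(S_{gij} \mid H_A)$
$S_{P1A}$         $\frac{n_{PA}}{H}\sum_h y_{h1A}$           $\frac{\gamma}{2}\sum_h y_{h1A}$
$S_{P2A}$         $\frac{n_{PB}}{H}\sum_h y_{h2A}$           $\frac{\gamma}{2}\sum_h y_{h2A}$
$S_{P1B}$         $\frac{n_{PB}}{H}\sum_h y_{h1B}$           $\frac{\gamma}{2}\sum_h y_{h1B}$
$S_{P2B}$         $\frac{n_{PA}}{H}\sum_h y_{h2B}$           $\frac{\gamma}{2}\sum_h y_{h2B}$
$S_{S1A}$         $\frac{n_{SA}}{H}\sum_h y_{h1A}$           $\frac{1 - \gamma}{2}\sum_h y_{h1A}$
$S_{S1B}$         $\frac{n_{SB}}{H}\sum_h y_{h1B}$           $\frac{1 - \gamma}{2}\sum_h y_{h1B}$

where $g = S$ for singleton/unpaired centers and $g = P$ for paired centers; $j = A, B$.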

The estimators for paired (5.1) and singleton/unpaired (5.2) centers have the following expectations:

\[ E(d_P \mid H_A) = E_{\mathbf{n}}\{E(d_P \mid H_A, \mathbf{n})\} = \frac{1}{2} E_{\mathbf{n}}\left\{ \frac{E(S_{P1A} \mid H_A, \mathbf{n}) - E(S_{P2B} \mid H_A, \mathbf{n})}{n_{PA}} + \frac{E(S_{P2A} \mid H_A, \mathbf{n}) - E(S_{P1B} \mid H_A, \mathbf{n})}{n_{PB}} \right\} \]

\[ = \left\{\sum_h (y_{h1A} - y_{h2B}) + \sum_h (y_{h2A} - y_{h1B})\right\} / 2H \]

\[ = \frac{1}{2H}\left(\sum_h\sum_i y_{hiA} - \sum_h\sum_i y_{hiB}\right) = \tau_A - \tau_B, \]

which is unbiased.

Under both the general (global) ($H_{0g}$) and average ($H_{0a}$) null hypotheses, the paired estimator is unbiased in this scenario with independent random accrual and allocation: $E(d_P \mid H_{0g}) = E(d_P \mid H_{0a}) = 0$, since $\tau_A - \tau_B = 0$ for both. Table 5.2 summarizes unbiasedness assumptions for estimators and hypotheses for random accrual independent of random allocation. Additionally, the expected value for the unpaired/singleton centers is:

\[ E(d_S \mid H_A) = E_{\mathbf{n}}\{E(d_S \mid H_A, \mathbf{n})\} = E_{\mathbf{n}}\left\{ \frac{E(S_{S1A} \mid H_A, \mathbf{n})}{n_{SA}} - \frac{E(S_{S1B} \mid H_A, \mathbf{n})}{n_{SB}} \right\} \]

\[ = \frac{1}{H}\left(\sum_h y_{h1A} - \sum_h y_{h1B}\right) = \bar{y}_{1A} - \bar{y}_{1B} = \tau_A - \tau_B + (\pi\tau)_{1A} - (\pi\tau)_{1B}. \]

Assuming a null accrual x treatment interaction (i.e. $(\pi\tau)_{1A} = (\pi\tau)_{1B}$), then $E(d_S \mid H_A) = \tau_A - \tau_B$, showing unbiasedness under no accrual x treatment interaction. Under the general/global null hypothesis ($H_{0g}$), both the treatment effect and accrual x treatment interaction are null (as in Section 5.2), so the singleton/unpaired estimator is unbiased. Under the second null hypothesis, the average null ($H_{0a}$), however, the same assumption as for the nonnull hypothesis is needed for unbiasedness; namely, the accrual x treatment interaction must be assumed null.

With fixed $w$, the expected value of (4.7) is

\[ E(d \mid H_A) = w\,E(d_P \mid H_A) + (1 - w)\,E(d_S \mid H_A) = \tau_A - \tau_B + (1 - w)\{(\pi\tau)_{1A} - (\pi\tau)_{1B}\}. \]

Assuming no accrual x treatment interaction, $E(d \mid H_A) = \tau_A - \tau_B$, showing the overall weighted estimator of the difference of means is unbiased under $(\pi\tau)_{1A} = (\pi\tau)_{1B}$.

The first or general null hypothesis ($H_{0g}$) has the weighted estimator being unbiased without an added assumption. Under the average null hypothesis ($H_{0a}$), the weighted estimator is unbiased assuming no accrual x treatment interaction.

Table 5.2 Unbiasedness assumptions with random accrual independent of allocation

Statistic   Hypothesis                 Required assumptions for unbiasedness (u)
$d_P$       $H_A$/$H_{0g}$/$H_{0a}$    none
$d_S$       $H_A$/$H_{0a}$             no accrual x treatment interaction
            $H_{0g}$                   none
$d$         $H_A$/$H_{0a}$             no accrual x treatment interaction
            $H_{0g}$                   none
$d_*$       $H_A$/$H_{0a}$             no accrual effect or equal group allocation (n); no accrual x treatment
            $H_{0g}$                   no accrual effect or equal group allocation (n)

(u) $E(\cdot \mid H_A) = \tau_A - \tau_B$; $E(\cdot \mid H_{0g}) = 0$; $E(\cdot \mid H_{0a}) = 0$ are unbiased
(n) $n_{PA} = n_{PB}$ and $n_{SA} = n_{SB}$

The heuristic simple mean estimator (5.3) ignoring accrual order has the expectation:

E(d.IHA ) = Ezt- {E(d.IHA,!!~)} = E , {E{SSlAIHA'!!:~}+E{SP'AIHA'!!:~} _ E{SSlBIHA,!!:~}+E{Sp'BIHA'!!:~} } n· np+nSA np+nSB

nSAL YhlA +npAL YhlA +nPBLYh2A nSBLYhlB+npBLYhlB+nPAL Yh2B } =E, h h h h h h n· { H(np+nSA) H(np+nSB)'

Assuming (5.4) (i.e. no accrual (period/order) or accrual x treatment effects), the expect-

ation reduces to _ _, {(nSA+nPA+nPB)~~YhlA (nSB+nPB+nPA)f~YhlB} E(d.IHA )- En. 2H(np+nSA) 2H(np+nSB)

= 2k {EEYhiA - EEYhiB} = TA -TB, showing the heuristic estima- h i h i tor is unbiased under assumption (5.4).

If only group totals are equal (i.e. npA = npB = n p/2 and nSA = nSB = n s /2),

E(d.IHA )-= En~{H(2::+ns) (~YhlA - ~YhlB) + H(2::+ns) (~~YhiA - ~~YhiB)}'

Thus, the no accrual x treatment assumption is needed for this estimator to be unbiased

under the nonnull hypothesis even with equal treatment allocation within paired and

unpaired centers.

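Under the general null hypothesis ($H_{0g}$), the heuristic estimator has the expectation

\[
E(d_* \mid H_{0g}) = (\bar{Y}_{1\cdot} - \bar{Y}_{2\cdot})\, E_{\underline{n}}\left\{ \frac{n_{PA}(n_{PA} + n_{SA}) - n_{PB}(n_{PB} + n_{SB})}{(n_P + n_{SA})(n_P + n_{SB})} \right\},
\]

where $\bar{Y}_{i\cdot} = \frac{1}{H}\sum_h Y_{hi} = \mu + \pi_i$. This expectation is 0 if $\bar{Y}_{1\cdot} - \bar{Y}_{2\cdot} = 0$ or if

\[
E_{\underline{n}}\left\{ \frac{n_{PA}(n_{PA} + n_{SA}) - n_{PB}(n_{PB} + n_{SB})}{(n_P + n_{SA})(n_P + n_{SB})} \right\} = 0,
\]

showing that the heuristic estimator is unbiased under reasonable assumptions: no accrual/order effect ($\pi_1 = \pi_2$) or equal group totals ($n_{PA} = n_{PB}$ and $n_{SA} = n_{SB}$). Unbiasedness of the heuristic estimator applies under certain assumptions:

• since $E(n_{PA}) = E(n_{PB})$, $E(n_{SA}) = E(n_{SB})$, and with large $H$, by Taylor series linearization $E_{\underline{n}}\left\{ \frac{n_{PA}(n_{PA} + n_{SA}) - n_{PB}(n_{PB} + n_{SB})}{(n_P + n_{SA})(n_P + n_{SB})} \right\} = 0$;

• if there is no accrual (period/order) effect then $\bar{Y}_{1\cdot} = \bar{Y}_{2\cdot}$ or, equivalently, $\pi_1 = \pi_2$.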
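The allocation-dependent factor that multiplies the accrual difference under the general null hypothesis can be evaluated directly for any set of group totals. The following is a minimal sketch; the function name `null_bias_factor` and the example counts are illustrative assumptions, not part of the original analysis.

```python
def null_bias_factor(n_pa: int, n_pb: int, n_sa: int, n_sb: int) -> float:
    """Factor multiplying the accrual difference in E(d_* | H_0g, n).

    n_pa, n_pb: paired centers with A:B and B:A allocation.
    n_sa, n_sb: singleton/unpaired centers assigning only A or only B.
    """
    n_p = n_pa + n_pb
    numerator = n_pa * (n_pa + n_sa) - n_pb * (n_pb + n_sb)
    denominator = (n_p + n_sa) * (n_p + n_sb)
    return numerator / denominator


# Equal group totals: the factor vanishes, so no accrual effect can bias d_*.
balanced = null_bias_factor(10, 10, 5, 5)

# Unequal totals (hypothetical counts): the factor is nonzero, and swapping
# the treatment labels changes only its sign.
unbalanced = null_bias_factor(12, 8, 6, 4)
swapped = null_bias_factor(8, 12, 4, 6)
```

Because the factor only changes sign when the A and B counts are exchanged, positive and negative values offset each other on average whenever the allocation mechanism treats the two treatments symmetrically, which is consistent with the large-sample argument in the first bullet.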

Under the average null hypothesis ($H_{0a}$), the same assumptions as in the nonnull case are required for unbiasedness of the heuristic estimator: no accrual x treatment interaction along with either equal treatment group allocation within paired and within singleton/unpaired centers separately or no accrual effect.

5.4. Accrual Dependent on Allocation: Selection Bias

The next case extends the previous exploration to the situation in which the accrual function and the allocation function are both random but not independent ($L_h \not\perp U_h$). One noteworthy example of this situation is selection bias, in which patients are allocated to treatments differentially. Investigators enrolling patients (intentionally or unintentionally) based on some characteristic of previous patients' responses or of current patients' underlying conditions can cause imbalances in treatment allocation (Blackwell and Hodges, 1957). The enrollment mechanism of a trial can introduce bias if it is not consistently applied to all potential participants in a standardized protocol (Wei and Cowan, 1988).

Suppose that in a particular trial, for example, centers that have first patients assigned to treatment A with increased response $Y_{h1A}$ (corresponding to, say, improved response) have decreased enrollment of second patients who would have been assigned to treatment B. Thus, the probability of a center enrolling two patients if the first patient was assigned A is less than the probability of a center having two patients if the first patient was assigned treatment B (i.e. $\Pr\{L_h = 1 \mid U_h = 1\} = \gamma_{hA} < \Pr\{L_h = 1 \mid U_h = 0\} = \gamma_{hB}$), so that the number of paired centers with A:B allocation is less than the number of paired centers with B:A allocation (i.e. $n_{PA} < n_{PB}$) and the number of unpaired/singleton centers assigning only A is greater than the number assigning only B (i.e. $n_{SA} > n_{SB}$). Thus, with no treatment effect ($\tau_A = \tau_B$), these equalities of means could hold: $\bar{Y}_{S1B} = \bar{Y}_{P1B} = \bar{Y}_{P2A}$; but, for example, $\bar{Y}_{S1A} > \bar{Y}_{S1B}$ and $\bar{Y}_{S1A} > \bar{Y}_{P1A}$ could apply for group means because of the propensity for larger values of $Y_{h1A}$ to be in the singleton/unpaired centers. In this example, the accrual x treatment and center x treatment effects would be nonnull. Thus, one may not be able to assume these interactions are null in the presence of selection bias.

5.4.1. General Selection Bias

The scenario in Section 5.3 can be generalized for the above situation to allow the accrual and treatment allocation indices to be dependent (i.e. $L_h \not\perp U_h$): $L_h \sim \text{Bernoulli}(\gamma_h)$, but the $L_h$ are not independent and identically distributed (i.e. each $L_h$ has its own probability parameter $\gamma_h$), and $U_h \overset{iid}{\sim} \text{Bernoulli}(1/2)$, as before. Selection bias can then be modelled separately for each treatment and each center. As before, group allocation totals ($n_{PA}$, $n_{PB}$, $n_{SA}$, and $n_{SB}$) are random and $\Pr\{U_h = 1\} = 1/2$, unconditionally. Define the following terms:

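\[
\begin{aligned}
\Pr\{L_h = 1 \mid U_h = 1\} &= \gamma_{hA}, & \Pr\{L_h = 0 \mid U_h = 1\} &= 1 - \gamma_{hA}, \\
\Pr\{L_h = 1 \mid U_h = 0\} &= \gamma_{hB}, & \Pr\{L_h = 0 \mid U_h = 0\} &= 1 - \gamma_{hB}, \\
\gamma_h &= \gamma_{hA}/2 + \gamma_{hB}/2 = (\gamma_{hA} + \gamma_{hB})/2, & 1 - \gamma_h &= \{2 - (\gamma_{hA} + \gamma_{hB})\}/2.
\end{aligned}
\]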

The $\gamma_{hj}$, which are probabilities of two patients in a center given the treatment allocation sequence, could be examined with a linear model $\gamma_{hj} = \xi + \eta_h + \theta_j$ or a logistic model.

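With no selection bias $\Pr\{U_h = 1 \mid L_h = 1\} = 1/2$, i.e. equal probability of treatment allocation regardless of the number of patients in the center. More generally, using Bayes Theorem, the above conditional probabilities can be rewritten conditioning on the number of patients in each center:

\[
\begin{aligned}
\Pr\{U_h = 1 \mid L_h = 1\} &= \frac{\gamma_{hA}}{\gamma_{hA} + \gamma_{hB}} = \frac{\gamma_{hA}}{2\gamma_h}, \\
\Pr\{U_h = 0 \mid L_h = 1\} &= \frac{\gamma_{hB}}{\gamma_{hA} + \gamma_{hB}} = \frac{\gamma_{hB}}{2\gamma_h}, \\
\Pr\{U_h = 1 \mid L_h = 0\} &= \frac{1 - \gamma_{hA}}{2 - (\gamma_{hA} + \gamma_{hB})} = \frac{(1 - \gamma_{hA})/2}{1 - \gamma_h}, \\
\Pr\{U_h = 0 \mid L_h = 0\} &= \frac{1 - \gamma_{hB}}{2 - (\gamma_{hA} + \gamma_{hB})} = \frac{(1 - \gamma_{hB})/2}{1 - \gamma_h}.
\end{aligned}
\]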
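As a numerical illustration of this Bayes step, the sketch below converts the pairing probabilities for one center into allocation probabilities given the number of patients enrolled. The function name and the example values 0.4 and 0.8 are hypothetical; the only inputs taken from the text are the two conditional probabilities and the unconditional allocation probability of 1/2.

```python
def allocation_given_accrual(gamma_a: float, gamma_b: float) -> dict:
    """Pr{U_h = u | L_h = l} for one center, assuming Pr{U_h = 1} = 1/2.

    gamma_a = Pr{L_h = 1 | U_h = 1} (first patient assigned A),
    gamma_b = Pr{L_h = 1 | U_h = 0} (first patient assigned B);
    both must lie strictly between 0 and 1.
    Keys are (first treatment, L_h).
    """
    gamma = (gamma_a + gamma_b) / 2.0          # marginal Pr{L_h = 1}
    return {
        ("A", 1): gamma_a / (2.0 * gamma),
        ("B", 1): gamma_b / (2.0 * gamma),
        ("A", 0): (1.0 - gamma_a) / (2.0 * (1.0 - gamma)),
        ("B", 0): (1.0 - gamma_b) / (2.0 * (1.0 - gamma)),
    }


# Hypothetical center in which a second patient is less likely after A than after B.
probs = allocation_given_accrual(0.4, 0.8)
```

With these values a paired center started with A only one time in three, while a singleton center started with A three times in four, which is the imbalance described in the selection bias example above.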

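Then under the nonnull hypothesis $H_A$, the following expectations of sums $S_{gij}$ apply.

\[
\begin{aligned}
E(S_{P1A} \mid L_h) &= \sum_h \frac{\gamma_{hA}}{\gamma_{hA} + \gamma_{hB}} L_h Y_{h1A}, & E(S_{P1A}) &= \frac{1}{2}\sum_h \gamma_{hA} Y_{h1A}, \\
E(S_{P2A} \mid L_h) &= \sum_h \frac{\gamma_{hB}}{\gamma_{hA} + \gamma_{hB}} L_h Y_{h2A}, & E(S_{P2A}) &= \frac{1}{2}\sum_h \gamma_{hB} Y_{h2A}, \\
E(S_{P1B} \mid L_h) &= \sum_h \frac{\gamma_{hB}}{\gamma_{hA} + \gamma_{hB}} L_h Y_{h1B}, & E(S_{P1B}) &= \frac{1}{2}\sum_h \gamma_{hB} Y_{h1B}, \\
E(S_{P2B} \mid L_h) &= \sum_h \frac{\gamma_{hA}}{\gamma_{hA} + \gamma_{hB}} L_h Y_{h2B}, & E(S_{P2B}) &= \frac{1}{2}\sum_h \gamma_{hA} Y_{h2B}, \\
E(S_{S1A} \mid L_h) &= \sum_h \frac{1 - \gamma_{hA}}{2 - (\gamma_{hA} + \gamma_{hB})} (1 - L_h) Y_{h1A}, & E(S_{S1A}) &= \frac{1}{2}\sum_h (1 - \gamma_{hA}) Y_{h1A}, \\
E(S_{S1B} \mid L_h) &= \sum_h \frac{1 - \gamma_{hB}}{2 - (\gamma_{hA} + \gamma_{hB})} (1 - L_h) Y_{h1B}, & E(S_{S1B}) &= \frac{1}{2}\sum_h (1 - \gamma_{hB}) Y_{h1B},
\end{aligned}
\]

where $g = S$ for singleton/unpaired centers and $g = P$ for paired centers; $j = A, B$.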

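The group allocation totals have the following expectations:

\[
E(n_{PA}) = \frac{1}{2}\sum_h \gamma_{hA}, \quad E(n_{PB}) = \frac{1}{2}\sum_h \gamma_{hB}, \quad E(n_{SA}) = \frac{1}{2}\sum_h (1 - \gamma_{hA}), \quad E(n_{SB}) = \frac{1}{2}\sum_h (1 - \gamma_{hB}).
\]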

Then, expectations of the separate estimators for the paired (5.1) and singleton/unpaired (5.2) centers under the nonnull hypothesis cannot be assessed as in the previous two sections because the $L_h$ are not i.i.d. Instead, in large samples, expectations can be approximated with Taylor series expansions. The approximations are shown below.

For the paired estimator,

\[
\begin{aligned}
E(d_p \mid H_A) &\approx \frac{\sum_h \gamma_{hA}(Y_{h1A} - Y_{h2B})}{2\sum_h \gamma_{hA}} + \frac{\sum_h \gamma_{hB}(Y_{h2A} - Y_{h1B})}{2\sum_h \gamma_{hB}} \\
&= \tau_A - \tau_B + \frac{\sum_h \gamma_{hA}\{(\psi\pi)_{h1} - (\psi\pi)_{h2} + (\psi\tau)_{hA} - (\psi\tau)_{hB} + (\psi\pi\tau)_{h1A} - (\psi\pi\tau)_{h2B}\}}{2\sum_h \gamma_{hA}} \\
&\quad + \frac{\sum_h \gamma_{hB}\{(\psi\pi)_{h2} - (\psi\pi)_{h1} + (\psi\tau)_{hA} - (\psi\tau)_{hB} + (\psi\pi\tau)_{h2A} - (\psi\pi\tau)_{h1B}\}}{2\sum_h \gamma_{hB}},
\end{aligned}
\]

which is unbiased only with no center x accrual, no center x treatment and no three-way interactions. If $\gamma_{hj} = \gamma_h$ for each $j$ (or equivalently with a linear model $\gamma_{hj} = \xi + \eta_h$ for the conditional probabilities), then the probability a center has two patients is actually only dependent on center characteristics, such as size, reputation, or case mix; so

\[
E(d_p \mid H_A, \gamma_{hj} = \gamma_h) \approx \tau_A - \tau_B + \frac{\sum_h \gamma_h\{(\psi\tau)_{hA} - (\psi\tau)_{hB}\}}{\sum_h \gamma_h},
\]

which is unbiased if the center x treatment interaction is null. However, if $\gamma_{hj} = \gamma_j$ for all $h$ (or equivalently $\gamma_{hj} = \xi + \theta_j$), then the probability a center has two patients only depends on the treatment assigned to the first patient; so the paired estimator is unbiased in large samples.
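The large-sample approximation for the paired estimator can be checked numerically. The sketch below is an illustration under stated assumptions: the function names, the four hypothetical centers, and all effect sizes are invented for the example, and the overall mean is omitted because it cancels in every within-center difference.

```python
def paired_expectation(y1a, y2b, y2a, y1b, gamma_a, gamma_b):
    """Large-sample approximation to E(d_p | H_A); each argument is a list over centers h."""
    ab = sum(g * (u - v) for g, u, v in zip(gamma_a, y1a, y2b)) / (2 * sum(gamma_a))
    ba = sum(g * (u - v) for g, u, v in zip(gamma_b, y2a, y1b)) / (2 * sum(gamma_b))
    return ab + ba


def potential_responses(psi, pi, tau, psi_tau):
    """Y_hij = psi_h + pi_i + tau_j + (psi tau)_hj for i in {1, 2} and j in {A, B}."""
    return {
        (i, j): [p + pi[i] + tau[j] + psi_tau[j][h] for h, p in enumerate(psi)]
        for i in (1, 2)
        for j in ("A", "B")
    }


psi = [-3.0, -1.0, 1.0, 3.0]      # hypothetical center effects (sum to zero)
pi = {1: 0.5, 2: -0.5}            # hypothetical accrual effects
tau = {"A": 2.0, "B": 0.0}        # hypothetical treatment effects: tau_A - tau_B = 2
gamma_h = [0.9, 0.1, 0.9, 0.1]    # pairing probabilities depending on center only

# No center x treatment interaction: the approximation returns tau_A - tau_B.
no_int = {"A": [0.0] * 4, "B": [0.0] * 4}
y = potential_responses(psi, pi, tau, no_int)
unbiased = paired_expectation(y[1, "A"], y[2, "B"], y[2, "A"], y[1, "B"], gamma_h, gamma_h)

# Center x treatment interaction aligned with the pairing probabilities: biased.
c = [1.0, -1.0, 1.0, -1.0]        # (psi tau)_hA = c_h and (psi tau)_hB = -c_h
with_int = {"A": c, "B": [-v for v in c]}
y = potential_responses(psi, pi, tau, with_int)
biased = paired_expectation(y[1, "A"], y[2, "B"], y[2, "A"], y[1, "B"], gamma_h, gamma_h)
```

In this example `unbiased` equals 2 up to rounding, while `biased` is 3.6: the excess of 1.6 is the weighted center x treatment term $\sum_h \gamma_h\{(\psi\tau)_{hA} - (\psi\tau)_{hB}\} / \sum_h \gamma_h$ from the expression for the case $\gamma_{hj} = \gamma_h$.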

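Under the general null hypothesis the paired estimator has the expectation

\[
E(d_p \mid H_{0g}) \approx \frac{\sum_h \gamma_{hA}\{(\psi\pi)_{h1} - (\psi\pi)_{h2}\}}{2\sum_h \gamma_{hA}} + \frac{\sum_h \gamma_{hB}\{(\psi\pi)_{h2} - (\psi\pi)_{h1}\}}{2\sum_h \gamma_{hB}},
\]

which is unbiased if there is no center x accrual interaction or if either $\gamma_{hj} = \gamma_j$ for all $h$ (or equivalently $\gamma_{hj} = \xi + \theta_j$) or $\gamma_{hj} = \gamma_h$ for each $j$ (or equivalently $\gamma_{hj} = \xi + \eta_h$). Under the average null hypothesis, the same assumptions as the nonnull hypothesis are needed for unbiasedness. Assumptions required for unbiased approximate expected values for all estimators and hypotheses are summarized in Table 5.3.

Now, the estimator for unpaired/singleton centers has the large sample expectation:

\[
\begin{aligned}
E(d_s \mid H_A) &= E\left\{ \frac{S_{S1A}}{n_{SA}} - \frac{S_{S1B}}{n_{SB}} \right\} \\
&\approx \frac{\frac{1}{2}\sum_h (1 - \gamma_{hA}) Y_{h1A}}{\frac{1}{2}\sum_h (1 - \gamma_{hA})} - \frac{\frac{1}{2}\sum_h (1 - \gamma_{hB}) Y_{h1B}}{\frac{1}{2}\sum_h (1 - \gamma_{hB})} \\
&= \frac{\sum_h (1 - \gamma_{hA}) Y_{h1A}}{\sum_h (1 - \gamma_{hA})} - \frac{\sum_h (1 - \gamma_{hB}) Y_{h1B}}{\sum_h (1 - \gamma_{hB})} \\
&= \tau_A - \tau_B + (\pi\tau)_{1A} - (\pi\tau)_{1B} + \frac{\sum_h (1 - \gamma_{hA})\{\psi_h + (\psi\pi)_{h1} + (\psi\tau)_{hA} + (\psi\pi\tau)_{h1A}\}}{\sum_h (1 - \gamma_{hA})} \\
&\quad - \frac{\sum_h (1 - \gamma_{hB})\{\psi_h + (\psi\pi)_{h1} + (\psi\tau)_{hB} + (\psi\pi\tau)_{h1B}\}}{\sum_h (1 - \gamma_{hB})},
\end{aligned}
\]

which would be unbiased assuming no center effect along with no accrual x treatment, no center x accrual, no center x treatment and no three-way interactions. If $\gamma_{hj} = \gamma_h$ for each $j$, then the large sample expectation would be unbiased if pairwise and three-way interactions involving treatment are null. Furthermore, if $\gamma_{hj} = \gamma_j$ for all $h$ then the unpaired/singleton estimator would be unbiased in large samples if the accrual x treatment interaction is actually null. Under the general null hypothesis ($H_{0g}$),

\[
E(d_s \mid H_{0g}) \approx \frac{\sum_h (1 - \gamma_{hA})\{\psi_h + (\psi\pi)_{h1}\}}{\sum_h (1 - \gamma_{hA})} - \frac{\sum_h (1 - \gamma_{hB})\{\psi_h + (\psi\pi)_{h1}\}}{\sum_h (1 - \gamma_{hB})},
\]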

Table 5.3 Large sample unbiasedness with random accrual dependent on allocation (general selection bias)

Statistic   Hypothesis           Required assumptions for unbiasedness^u

$d_p$       $H_A$/$H_{0a}$       no interactions involving center^i or $\gamma_{hj} = \gamma_j\ \forall h$
            $H_{0g}$             no center x treatment or $\gamma_{hj} = \gamma_h\ \forall j$ or $\gamma_{hj} = \gamma_j\ \forall h$

$d_s$       $H_A$/$H_{0a}$       (no interactions and no center effect) or ($\gamma_{hj} = \gamma_h\ \forall j$ and no interactions^ii involving treatment) or ($\gamma_{hj} = \gamma_j\ \forall h$ and no accrual x treatment)
            $H_{0g}$             (no center effect and no center x accrual) or $\gamma_{hj} = \gamma_h\ \forall j$ or $\gamma_{hj} = \gamma_j\ \forall h$

$d$         $H_A$/$H_{0a}$       (no interactions and no (unpaired) center effect) or ($\gamma_{hj} = \gamma_h\ \forall j$ and no interactions^ii involving treatment) or ($\gamma_{hj} = \gamma_j\ \forall h$ and no accrual x treatment)
            $H_{0g}$             (no (unpaired) center effect and no center x accrual) or $\gamma_{hj} = \gamma_h\ \forall j$ or $\gamma_{hj} = \gamma_j\ \forall h$

$d_*$       $H_A$/$H_{0a}$       (no accrual, no center & no interactions) or ($\gamma_{hj} = \gamma_h\ \forall j$ and no interactions^ii involving treatment) or ($\gamma_{hj} = \gamma_j\ \forall h$ and no accrual x treatment)
            $H_{0g}$             (no accrual effect, no center effect and no center x accrual) or $\gamma_{hj} = \gamma_h\ \forall j$ or ($\gamma_{hj} = \gamma_j\ \forall h$ and no accrual effect)

^u $E(\cdot \mid H_A) = \tau_A - \tau_B$; $E(\cdot \mid H_{0g}) = 0$; $E(\cdot \mid H_{0a}) = 0$ are unbiased

^i no center x accrual, no center x treatment & no 3-way interactions

^ii no center x treatment, no accrual x treatment & no 3-way interactions

so, the singleton/unpaired estimator is unbiased if there is no center effect and no center x accrual interaction or if either $\gamma_{hj} = \gamma_j$ for all $h$ or $\gamma_{hj} = \gamma_h$ for each $j$. Under the average null hypothesis ($H_{0a}$), the same requirements as in the nonnull case apply.

Then, with (4.7) and fixed $w$, the expected value for the weighted estimator is

\[
\begin{aligned}
E(d \mid H_A) &= E(w d_p) + E[(1 - w) d_s] = w E(d_p) + (1 - w) E(d_s) \\
&\approx \tau_A - \tau_B + (1 - w)[(\pi\tau)_{1A} - (\pi\tau)_{1B}] \\
&\quad + \frac{w \sum_h \gamma_{hA}\{(\psi\pi)_{h1} - (\psi\pi)_{h2} + (\psi\tau)_{hA} - (\psi\tau)_{hB} + (\psi\pi\tau)_{h1A} - (\psi\pi\tau)_{h2B}\}}{2\sum_h \gamma_{hA}} \\
&\quad + \frac{w \sum_h \gamma_{hB}\{(\psi\pi)_{h2} - (\psi\pi)_{h1} + (\psi\tau)_{hA} - (\psi\tau)_{hB} + (\psi\pi\tau)_{h2A} - (\psi\pi\tau)_{h1B}\}}{2\sum_h \gamma_{hB}} \\
&\quad + \frac{(1 - w)\sum_h (1 - \gamma_{hA})\{\psi_h + (\psi\pi)_{h1} + (\psi\tau)_{hA} + (\psi\pi\tau)_{h1A}\}}{\sum_h (1 - \gamma_{hA})} \\
&\quad - \frac{(1 - w)\sum_h (1 - \gamma_{hB})\{\psi_h + (\psi\pi)_{h1} + (\psi\tau)_{hB} + (\psi\pi\tau)_{h1B}\}}{\sum_h (1 - \gamma_{hB})},
\end{aligned}
\]

showing that the overall difference of means estimator would be unbiased for $H_A$ only in the absence of pairwise and three-way interactions as well as center effect in the singleton/unpaired centers. If $\gamma_{hj} = \gamma_h$ for each $j$, then the weighted estimator has large sample unbiasedness if center x treatment is null and in the unpaired/singleton centers the accrual x treatment and the three-way interactions are null. If $\gamma_{hj} = \gamma_j$ for all $h$, the weighted estimator would be unbiased, in the large sample sense, with null accrual x treatment interaction in the unpaired/singleton centers.

Under the general null hypothesis ($H_{0g}$), the weighted estimator is unbiased with no center x accrual interaction and no center effect in the singleton/unpaired centers or if either $\gamma_{hj} = \gamma_j$ for all $h$ or $\gamma_{hj} = \gamma_h$ for all $j$. With the average null hypothesis, however, the pairwise and three-way interactions again pose a problem with selection bias.

Additionally, expectations of the heuristic estimator (5.3) can be derived, as shown.

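\[
\begin{aligned}
E(d_* \mid H_A) &= E\left\{ \frac{S_{S1A} + S_{P1A} + S_{P2A}}{n_P + n_{SA}} - \frac{S_{S1B} + S_{P1B} + S_{P2B}}{n_P + n_{SB}} \right\} \\
&\approx \frac{\frac{1}{2}\sum_h (1 - \gamma_{hA}) Y_{h1A} + \frac{1}{2}\sum_h \gamma_{hA} Y_{h1A} + \frac{1}{2}\sum_h \gamma_{hB} Y_{h2A}}{\frac{1}{2}\sum_h (1 - \gamma_{hA}) + \frac{1}{2}\sum_h \gamma_{hA} + \frac{1}{2}\sum_h \gamma_{hB}} - \frac{\frac{1}{2}\sum_h (1 - \gamma_{hB}) Y_{h1B} + \frac{1}{2}\sum_h \gamma_{hB} Y_{h1B} + \frac{1}{2}\sum_h \gamma_{hA} Y_{h2B}}{\frac{1}{2}\sum_h (1 - \gamma_{hB}) + \frac{1}{2}\sum_h \gamma_{hB} + \frac{1}{2}\sum_h \gamma_{hA}} \\
&= \frac{\sum_h Y_{h1A} + \sum_h \gamma_{hB} Y_{h2A}}{H + \sum_h \gamma_{hB}} - \frac{\sum_h Y_{h1B} + \sum_h \gamma_{hA} Y_{h2B}}{H + \sum_h \gamma_{hA}} \\
&= \tau_A - \tau_B + (\pi_1 - \pi_2)\left( \frac{H\sum_h (\gamma_{hA} - \gamma_{hB})}{(H + \sum_h \gamma_{hB})(H + \sum_h \gamma_{hA})} \right) \\
&\quad + \frac{H(\pi\tau)_{1A}}{H + \sum_h \gamma_{hB}} - \frac{H(\pi\tau)_{1B}}{H + \sum_h \gamma_{hA}} + \frac{(\pi\tau)_{2A}\sum_h \gamma_{hB}}{H + \sum_h \gamma_{hB}} - \frac{(\pi\tau)_{2B}\sum_h \gamma_{hA}}{H + \sum_h \gamma_{hA}} \\
&\quad + \frac{\sum_h \gamma_{hB}\{\psi_h + (\psi\pi)_{h2} + (\psi\tau)_{hA} + (\psi\pi\tau)_{h2A}\}}{H + \sum_h \gamma_{hB}} - \frac{\sum_h \gamma_{hA}\{\psi_h + (\psi\pi)_{h2} + (\psi\tau)_{hB} + (\psi\pi\tau)_{h2B}\}}{H + \sum_h \gamma_{hA}},
\end{aligned}
\]

so the heuristic estimator would only be unbiased under the nonnull hypothesis with no accrual/order effect, no center effect, and no pairwise or three-way interactions. If $\gamma_{hj} = \gamma_h$ for each $j$ then the no treatment x accrual, no center x treatment and no three-way interactions assumptions are still needed for unbiasedness. If $\gamma_{hj} = \gamma_j$ for all $h$ then null accrual effect and accrual x treatment assumptions yield unbiasedness. Under the general null hypothesis ($H_{0g}$),

\[
E(d_* \mid H_{0g}) \approx (\pi_1 - \pi_2)\, \frac{H\sum_h (\gamma_{hA} - \gamma_{hB})}{(H + \sum_h \gamma_{hB})(H + \sum_h \gamma_{hA})} + \frac{\sum_h \gamma_{hB}\{\psi_h + (\psi\pi)_{h2}\}}{H + \sum_h \gamma_{hB}} - \frac{\sum_h \gamma_{hA}\{\psi_h + (\psi\pi)_{h2}\}}{H + \sum_h \gamma_{hA}},
\]

so the heuristic estimator would be unbiased with no accrual/order effect (i.e. $\pi_1 = \pi_2$), no center effect and no center x accrual interaction; alternatively, with $\gamma_{hj} = \gamma_h$ for each $j$; or with $\gamma_{hj} = \gamma_j$ for all $h$ and no accrual effect. Under the average null hypothesis ($H_{0a}$), the same assumptions as the nonnull hypothesis apply.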

5.4.2. Selection Bias in Treatment Sequences

A selection bias model directed more towards the treatment effect allows the selection bias to be different for each treatment sequence (i.e. A:B versus B:A) but assumes it is the same across the centers with the same sequence, which is a simplification of the general selection bias model. So in terms of the previous model, $\gamma_{hj} = \gamma_j$ for all $h$; thus, the $L_h$ have the same probability parameter, so $L_h \overset{iid}{\sim} \text{Bernoulli}(\bar{\gamma})$ and still $U_h \overset{iid}{\sim} \text{Bernoulli}(1/2)$, but with $L_h \not\perp U_h$ for each $h$. Then, selection bias can be modelled separately for each treatment allocation sequence. Again, the group allocation totals ($n_{PA}$, $n_{PB}$, $n_{SA}$, and $n_{SB}$) are random quantities which are jointly distributed multinomially with parameters $\gamma_A/2$, $\gamma_B/2$, $(1 - \gamma_A)/2$, $(1 - \gamma_B)/2$, respectively.

Unconditionally, $\Pr\{U_h = 1\} = 1/2$. Define the following terms:

\[
\begin{aligned}
\Pr\{L_h = 1 \mid U_h = 1\} &= \gamma_A, & \Pr\{L_h = 0 \mid U_h = 1\} &= 1 - \gamma_A, \\
\Pr\{L_h = 1 \mid U_h = 0\} &= \gamma_B, & \Pr\{L_h = 0 \mid U_h = 0\} &= 1 - \gamma_B, \\
\bar{\gamma} &= \gamma_A/2 + \gamma_B/2 = (\gamma_A + \gamma_B)/2, & 1 - \bar{\gamma} &= \{2 - (\gamma_A + \gamma_B)\}/2.
\end{aligned}
\]

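Again, using Bayes Theorem, the above conditional probabilities can be rewritten conditioning on the number of patients in the center:

\[
\begin{aligned}
\Pr\{U_h = 1 \mid L_h = 1\} &= \frac{\gamma_A}{\gamma_A + \gamma_B} = \frac{\gamma_A}{2\bar{\gamma}}, \\
\Pr\{U_h = 0 \mid L_h = 1\} &= \frac{\gamma_B}{\gamma_A + \gamma_B} = \frac{\gamma_B}{2\bar{\gamma}}, \\
\Pr\{U_h = 1 \mid L_h = 0\} &= \frac{1 - \gamma_A}{2 - (\gamma_A + \gamma_B)} = \frac{(1 - \gamma_A)/2}{1 - \bar{\gamma}}, \\
\Pr\{U_h = 0 \mid L_h = 0\} &= \frac{1 - \gamma_B}{2 - (\gamma_A + \gamma_B)} = \frac{(1 - \gamma_B)/2}{1 - \bar{\gamma}}.
\end{aligned}
\]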

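Then, the following expectations, conditioning on number of patients and totals, apply:

\[
\begin{aligned}
E\{U_h = 1 \mid L_h = 1, n_{PA}, n_{PB}\} &= \frac{n_{PA}}{n_P}, & E\{U_h = 0 \mid L_h = 1, n_{PA}, n_{PB}\} &= \frac{n_{PB}}{n_P}, \\
E\{U_h = 1 \mid L_h = 0, n_{SA}, n_{SB}\} &= \frac{n_{SA}}{n_S}, & E\{U_h = 0 \mid L_h = 0, n_{SA}, n_{SB}\} &= \frac{n_{SB}}{n_S}, \\
E\{L_h = 1 \mid n_P\} &= \frac{n_P}{H}, & E\{L_h = 0 \mid n_S\} &= \frac{n_S}{H}.
\end{aligned}
\]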

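Next, under the nonnull hypothesis $H_A$, the following expectations of sums $S_{gij}$ apply.

\[
\begin{array}{llll}
\text{Sum} & E(S_{gij} \mid L_h, \underline{n}) & E(S_{gij} \mid L_h) & E(S_{gij}) \\
S_{P1A} & \frac{n_{PA}}{n_P}\sum_h L_h Y_{h1A} & \frac{\gamma_A}{\gamma_A + \gamma_B}\sum_h L_h Y_{h1A} & \frac{\gamma_A}{2}\sum_h Y_{h1A} \\
S_{P2A} & \frac{n_{PB}}{n_P}\sum_h L_h Y_{h2A} & \frac{\gamma_B}{\gamma_A + \gamma_B}\sum_h L_h Y_{h2A} & \frac{\gamma_B}{2}\sum_h Y_{h2A} \\
S_{P1B} & \frac{n_{PB}}{n_P}\sum_h L_h Y_{h1B} & \frac{\gamma_B}{\gamma_A + \gamma_B}\sum_h L_h Y_{h1B} & \frac{\gamma_B}{2}\sum_h Y_{h1B} \\
S_{P2B} & \frac{n_{PA}}{n_P}\sum_h L_h Y_{h2B} & \frac{\gamma_A}{\gamma_A + \gamma_B}\sum_h L_h Y_{h2B} & \frac{\gamma_A}{2}\sum_h Y_{h2B} \\
S_{S1A} & \frac{n_{SA}}{n_S}\sum_h (1 - L_h) Y_{h1A} & \frac{1 - \gamma_A}{2 - (\gamma_A + \gamma_B)}\sum_h (1 - L_h) Y_{h1A} & \frac{1 - \gamma_A}{2}\sum_h Y_{h1A} \\
S_{S1B} & \frac{n_{SB}}{n_S}\sum_h (1 - L_h) Y_{h1B} & \frac{1 - \gamma_B}{2 - (\gamma_A + \gamma_B)}\sum_h (1 - L_h) Y_{h1B} & \frac{1 - \gamma_B}{2}\sum_h Y_{h1B}
\end{array}
\]

where $g = S$ for singleton/unpaired centers and $g = P$ for paired centers; $j = A, B$.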

Then, expectations of the separate estimators for the paired (5.1) and singleton/unpaired (5.2) centers under the nonnull hypothesis are as follows.

\[
\begin{aligned}
E(d_p \mid H_A) &= \frac{1}{2} E_{L_h \mid n_P}\left\{ E_{n_{PA}, n_{PB}}\left[ \frac{E(S_{P1A} \mid L_h, n_{PA}, n_{PB}) - E(S_{P2B} \mid L_h, n_{PA}, n_{PB})}{n_{PA}} \right.\right. \\
&\qquad\qquad \left.\left. + \frac{E(S_{P2A} \mid L_h, n_{PA}, n_{PB}) - E(S_{P1B} \mid L_h, n_{PA}, n_{PB})}{n_{PB}} \right] \right\} \\
&= \frac{1}{2n_P}\sum_h (Y_{h1A} - Y_{h2B} + Y_{h2A} - Y_{h1B})\, E\{L_h \mid n_P\} \\
&= \frac{1}{2H}\left( \sum_h\sum_i Y_{hiA} - \sum_h\sum_i Y_{hiB} \right) = \tau_A - \tau_B,
\end{aligned}
\]

which is unbiased even in the presence of selection bias. Under both the general and average null hypotheses, the paired estimator is unbiased despite selection bias. Table 5.4 tabulates the assumptions needed for unbiasedness of the expected values for all estimators and hypotheses.

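And

\[
\begin{aligned}
E(d_s \mid H_A) &= E_{L_h \mid n_S}\left\{ E_{n_{SA}, n_{SB}}\left[ \frac{E\{S_{S1A} \mid L_h, n_{SA}, n_{SB}\}}{n_{SA}} - \frac{E\{S_{S1B} \mid L_h, n_{SA}, n_{SB}\}}{n_{SB}} \right] \right\} \\
&= \frac{1}{n_S}\sum_h (Y_{h1A} - Y_{h1B})\, E\{1 - L_h \mid n_S\} = \frac{1}{H}\left\{ \sum_h Y_{h1A} - \sum_h Y_{h1B} \right\} \\
&= \tau_A - \tau_B + (\pi\tau)_{1A} - (\pi\tau)_{1B},
\end{aligned}
\]

which would be unbiased assuming no accrual x treatment interaction. Selection bias could result from a nonnull accrual x treatment interaction (i.e. $(\pi\tau)_{1A} \neq (\pi\tau)_{1B}$). Under the general null hypothesis ($H_{0g}$), however, the singleton/unpaired estimator is unbiased.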

Under the average null hypothesis ($H_{0a}$), accrual x treatment interaction implies that the singleton estimator is not unbiased. Then, with (4.7) and fixed $w$, the expected value for the weighted estimator is

\[
\begin{aligned}
E(d \mid H_A) &= E(w d_p) + E[(1 - w) d_s] = w E(d_p) + (1 - w) E(d_s) \\
&= \tau_A - \tau_B + (1 - w)[(\pi\tau)_{1A} - (\pi\tau)_{1B}],
\end{aligned}
\]

showing that the overall difference of means estimator would be unbiased for $H_A$ only in the absence of accrual x treatment interaction, but that interaction can be nonnull with selection bias.

Under the general null hypothesis ($H_{0g}$), the expected value of the weighted estimator is zero; i.e. it is unbiased. With the average null hypothesis, however, the accrual x treatment interaction again poses a problem with selection bias.

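Additionally, expectations of the heuristic estimator (5.3) can be derived, as shown.

\[
\begin{aligned}
E(d_* \mid H_A) &= E_{L_h \mid \underline{n}}\left\{ E_{\underline{n}}\left\{ \frac{E(S_{S1A} \mid L_h, \underline{n}) + E(S_{P1A} \mid L_h, \underline{n}) + E(S_{P2A} \mid L_h, \underline{n})}{n_P + n_{SA}} \right.\right. \\
&\qquad\qquad \left.\left. - \frac{E(S_{S1B} \mid L_h, \underline{n}) + E(S_{P1B} \mid L_h, \underline{n}) + E(S_{P2B} \mid L_h, \underline{n})}{n_P + n_{SB}} \right\} \right\} \\
&= E_{\underline{n} \mid n_P + n_{SA}}\left\{ \frac{1}{n_P + n_{SA}}\left[ \frac{n_{SA}}{n_S}\sum_h (1 - L_h) Y_{h1A} + \frac{n_{PA}}{n_P}\sum_h L_h Y_{h1A} + \frac{n_{PB}}{n_P}\sum_h L_h Y_{h2A} \right] \right\} \\
&\quad - E_{\underline{n} \mid n_P + n_{SB}}\left\{ \frac{1}{n_P + n_{SB}}\left[ \frac{n_{SB}}{n_S}\sum_h (1 - L_h) Y_{h1B} + \frac{n_{PB}}{n_P}\sum_h L_h Y_{h1B} + \frac{n_{PA}}{n_P}\sum_h L_h Y_{h2B} \right] \right\}
\end{aligned}
\]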

Table 5.4 Unbiasedness with random accrual dependent on allocation (treatment allocation selection bias)

Statistic   Hypothesis                      Required assumptions for unbiasedness^u

$d_p$       $H_A$/$H_{0g}$/$H_{0a}$         none

$d_s$       $H_A$/$H_{0a}$                  no accrual x treatment interaction
            $H_{0g}$                        none

$d$         $H_A$/$H_{0a}$                  no accrual x treatment interaction
            $H_{0g}$                        none

$d_*$       $H_A$/$H_{0a}$                  no accrual effect & no accrual x treatment interaction
            $H_{0g}$                        no accrual effect

^u $E(\cdot \mid H_A) = \tau_A - \tau_B$; $E(\cdot \mid H_{0g}) = 0$; $E(\cdot \mid H_{0a}) = 0$ are unbiased

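\[
\begin{aligned}
&= \frac{2}{H(1 + \gamma_B)}\left[ \frac{1 - \gamma_A}{2}\sum_h Y_{h1A} + \frac{\gamma_A}{2}\sum_h Y_{h1A} + \frac{\gamma_B}{2}\sum_h Y_{h2A} \right] \\
&\quad - \frac{2}{H(1 + \gamma_A)}\left[ \frac{1 - \gamma_B}{2}\sum_h Y_{h1B} + \frac{\gamma_B}{2}\sum_h Y_{h1B} + \frac{\gamma_A}{2}\sum_h Y_{h2B} \right] \\
&= \frac{1}{H(1 + \gamma_B)}\left[ \sum_h Y_{h1A} + \gamma_B \sum_h Y_{h2A} \right] - \frac{1}{H(1 + \gamma_A)}\left[ \sum_h Y_{h1B} + \gamma_A \sum_h Y_{h2B} \right] \\
&= \frac{1}{1 + \gamma_B}\left[ (\pi_1 + \tau_A + (\pi\tau)_{1A}) + \gamma_B(\pi_2 + \tau_A + (\pi\tau)_{2A}) \right] \\
&\quad - \frac{1}{1 + \gamma_A}\left[ (\pi_1 + \tau_B + (\pi\tau)_{1B}) + \gamma_A(\pi_2 + \tau_B + (\pi\tau)_{2B}) \right] \\
&= \tau_A - \tau_B + (\pi_1 - \pi_2)\frac{\gamma_A - \gamma_B}{(1 + \gamma_A)(1 + \gamma_B)} + \frac{(\pi\tau)_{1A} + \gamma_B(\pi\tau)_{2A}}{1 + \gamma_B} - \frac{(\pi\tau)_{1B} + \gamma_A(\pi\tau)_{2B}}{1 + \gamma_A},
\end{aligned}
\]

so the heuristic estimator would only be unbiased under the nonnull hypothesis with no accrual/order effect and no accrual x treatment interaction. The expectation requires the same assumptions for unbiasedness as above even if it is conditional on the four-fold multinomial counts instead.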

Under the general null hypothesis ($H_{0g}$),

\[
\begin{aligned}
E(d_* \mid H_{0g}) &= \frac{\pi_1 + \gamma_B \pi_2}{1 + \gamma_B} - \frac{\pi_1 + \gamma_A \pi_2}{1 + \gamma_A} \\
&= (\pi_1 - \pi_2)\frac{\gamma_A - \gamma_B}{(1 + \gamma_A)(1 + \gamma_B)} \\
&= 0, \quad \text{if } \pi_1 - \pi_2 = 0,
\end{aligned}
\]

showing that the heuristic estimator would be unbiased with no accrual/order effect (i.e. $\pi_1 = \pi_2$). Under the average null hypothesis ($H_{0a}$), as with the nonnull hypothesis, the accrual effect and accrual x treatment interaction need to be null for the heuristic estimator to be unbiased.
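The size of the accrual-driven bias in the heuristic estimator under treatment-sequence selection bias can be illustrated numerically. The sketch below assumes no accrual x treatment interaction; the function names and every parameter value are hypothetical choices made for the example.

```python
def heuristic_expectation(pi1, pi2, tau_a, tau_b, gamma_a, gamma_b):
    """Large-sample E(d_*) as a difference of weighted treatment-group means.

    Assumes gamma_hj = gamma_j for all centers and no accrual x treatment
    interaction; the overall mean is omitted because it cancels.
    """
    mean_a = ((pi1 + tau_a) + gamma_b * (pi2 + tau_a)) / (1 + gamma_b)
    mean_b = ((pi1 + tau_b) + gamma_a * (pi2 + tau_b)) / (1 + gamma_a)
    return mean_a - mean_b


def closed_form(pi1, pi2, tau_a, tau_b, gamma_a, gamma_b):
    """tau_A - tau_B plus the accrual term (pi_1 - pi_2)(gamma_A - gamma_B) / ((1 + gamma_A)(1 + gamma_B))."""
    bias = (pi1 - pi2) * (gamma_a - gamma_b) / ((1 + gamma_a) * (1 + gamma_b))
    return (tau_a - tau_b) + bias


# Hypothetical values: accrual effect of 1, treatment effect of 2, and a second
# patient enrolled less often after A (0.4) than after B (0.8).
args = dict(pi1=1.0, pi2=0.0, tau_a=2.0, tau_b=0.0, gamma_a=0.4, gamma_b=0.8)
direct = heuristic_expectation(**args)
formula = closed_form(**args)

# With no accrual effect the bias term disappears, whatever the gammas are.
no_accrual = heuristic_expectation(0.5, 0.5, 2.0, 0.0, 0.4, 0.8)
```

Both routes give about 1.84 rather than the true difference of 2, while setting $\pi_1 = \pi_2$ recovers the treatment difference, which matches the row for $d_*$ in Table 5.4.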

5.4.3. Notes on Selection Bias

Actual multicenter trials with missing data can be assessed for indications of selection bias. Assumptions can be examined, but never completely verified. Association between recruitment indices ($L_h$) and treatment assignment indices ($U_h$) for the $H$ centers provides evidence of selection bias, though lack of association does not necessarily mean lack of selection bias. A Fisher's exact test can assess the independence of $\{U_h\}$ and $\{L_h\}$ in a 2 x 2 table. (With a moderate number of centers, the chi-square approximation can be used.) Comparing $n_{PA}$ versus the binomial distribution with parameters $H$ and 1/2 can check the association between center treatment sequence and number of centers.

Moreover, the correlation between the recruitment indices ($L_h$) and the responses for the first patient in each center ($Y_{h1j}$), separately for each treatment group, should be near zero and similar for each treatment group; the correlation accounting for treatment also should be near zero. Any nonzero correlation between response and recruitment would be worrisome and indicative of differential recruitment in centers, based upon the response of the first subject. Further, absolute values of the correlations of the random allocation indices ($U_h$) with the first patients' responses ($Y_{h1j}$) and with the second patients' responses ($Y_{h2j}$) should be similar; otherwise selection bias might be suspected. Finally, methods used to assess carryover effect (period x treatment interaction) in crossover trials (Jones and Kenward, pp. 22-28, 1989; Senn, pp. 44-54, 1993) can be applied. In this situation accrual x treatment, which is analogous to carryover effect, can be examined with the sums of responses within centers ($\sum_h (Y_{h1A} + Y_{h2B})$ versus $\sum_h (Y_{h1B} + Y_{h2A})$) for each sequence ($U_h$). Graphical methods for carryover effect (Jones and Kenward, pp. 43-45, 1989) can be used to examine accrual x treatment interaction as well. With one or two patients per center, however, the power to detect the accrual x treatment interaction is usually not very high, just as with carryover in crossover RCTs. More powerful extensions to situations with more than one or two patients per center are discussed in the next chapter.
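The Fisher's exact test mentioned above can be sketched with the standard library alone. In the example, rows are the first treatment assigned ($U_h$) and columns are paired versus singleton status ($L_h$); the function name, the choice of the "probabilities no larger than the observed table" two-sided rule, and the counts are assumptions made for illustration.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2 x 2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the observed
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # The small relative tolerance keeps ties with the observed table in the sum.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical trial with 20 centers:
#                 paired   singleton
# first on A        3          7
# first on B        8          2
p_value = fisher_exact_two_sided(3, 7, 8, 2)
```

A small p-value here would point to association between allocation sequence and pairing, which is the pattern expected when enrollment of second patients depends on the first assignment; as the text notes, a large p-value does not rule selection bias out.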

5.5. Summary and Conclusions

This chapter investigates bias for several estimators for incomplete pairs of patients within a fixed number of centers ($H$). Although the heuristic estimator is attractive for its simplicity, it pools responses for each treatment group ignoring accrual order. Under the nonnull ($H_A$: $Y_{hiA} \neq Y_{hiB}$), general null ($H_{0g}$: $Y_{hiA} = Y_{hiB} = Y_{hi}$), and average null

assumption is not needed for the general null hypothesis. With selection bias, however, these assumptions may not hold.

The weighted estimator, which combines the paired and unpaired/singleton estimators with a priori weights, accounts for accrual order and makes use of more information than simply discarding data from unpaired/singleton centers. (Since the paired and singleton/unpaired estimators are components of the weighted estimator, only the weighted estimator will be summarized here.) Under both the average null and nonnull hypotheses, the weighted estimator is unbiased with fewer assumptions than the heuristic estimator with either fixed or random recruitment; with selection bias, however, even these reasonable assumptions are no longer tenable. (The paired estimator is unbiased under the nonnull and average null hypotheses even with selection bias; the weighted estimator is biased for these hypotheses because it includes the unpaired/singleton data.) Thus, missing data in the form of incomplete treatment block allocation from such trials should be examined for patterns of missingness reflecting selection bias. Under the general null hypothesis, the weighted estimator is unbiased with either fixed or random recruitment indices ($L_h$), even in the presence of selection bias.

Chapter 6

A Random Effects Model for Incomplete Blocks in Multicenter Trials

6.1. Introduction

Chapters 4 and 5 concern multicenter clinical trials composed of a large number of centers each with one or two patients in order to study methods in depth, although most multicenter RCTs include some centers with more than two patients. This chapter extends those methods and findings for two treatments to trials comprised of many centers but with more than one or two patients per center. The problem of centers with no patients assigned to some treatments (i.e. incomplete designs) is again considered.

This situation can be viewed as being analogous to missing data in crossover trials with two treatments and more than two periods (i.e. higher order designs). Higher order crossover designs with two treatments have been examined as ways to assess carryover with more power (e.g. Jones and Kenward, Chapter 4, 1989; Matthews, 1994), since carryover can be estimated within-subjects, which reduces variance as in estimating treatment effect. The crossover trial sequences which lead to minimum variance unbiased estimators (MVUEs) of treatment and carryover effects and, hence, optimal designs for a given number of periods are: AA, BB, AB, BA for 2 periods; ABB, BAA for 3 periods; AABB, BBAA, ABBA, BAAB for 4 periods; ABBAA, BAABB for 5 periods; and ABBAAB, BAABBA, AABBBA, BBAAAB for 6 periods (Ebbutt, 1984; Jones and Kenward, 1989). (Notice that these designs include sequences, such as ABB and BAA, that are known as complements, reciprocals, mirrors, pairs or duals.) Although these sequences yield optimal designs for crossover trials with balanced complete sequence allocation and nonnull carryover in a minimum variance sense, they are not optimal in all situations; for example ABA and BAB are as efficient as ABB and BAA with null carryover and ABA and BAB are actually optimal with autoregressive errors (Matthews, 1994). Moreover, the MVUE yielding designs (Ebbutt, 1984; Jones and Kenward, 1989) may not be 'optimal' when considering more than minimum variance for multicenter studies with incomplete blocks (sequences). For example, with two treatments and three periods, the design producing MVUEs has only two sequences (ABB and BAA) which can lead to selection bias since investigators would know that the second and third patients would always receive the same treatment and might enroll third patients based on second patients' responses. Thus, researchers may be willing to lose some efficiency in incomplete multicenter RCTs by including additional sequences in order to reduce the threat of selection bias. Generally, incomplete multicenter RCTs can be viewed as multiperiod, two treatment crossover trials where the maximum within-center sample size corresponds to the number of periods.

In particular, this chapter considers three and four period designs, corresponding to centers with up to three or four patients. The sequences for three patients under consideration are ABB, BAA, ABA, and BAB, while those for four patients are ABBA, BAAB, ABAB, and BABA. The sequences AAB and BBA for three and AABB and BBAA for four patients could be added to the above list but they would be poor choices for centers only enrolling two patients since then both patients would be assigned the same treatments; thus, they are not considered further, although methods using them (in addition to, or instead of, certain sequences) could be developed. Although this chapter uses equal sequence allocation, the sequences could be assigned with unequal allocation, e.g. for three patients per center ABB and BAA could be assigned more than ABA and BAB to increase efficiency. This chapter extends the methods in Chapter 4 to accommodate additional patients per center.
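The screening rule described above, that a sequence is a poor choice if a center enrolling only two patients would give both the same treatment, can be sketched as follows. The function names are hypothetical, and the filter implements only that single criterion, not the efficiency considerations that narrow the four-patient list further.

```python
from itertools import product


def complement(seq: str) -> str:
    """Swap A and B to obtain the dual (mirror) sequence, e.g. ABB -> BAA."""
    return seq.translate(str.maketrans("AB", "BA"))


def candidate_sequences(periods: int) -> list:
    """Two-treatment sequences whose first two patients receive different treatments."""
    all_seqs = ("".join(s) for s in product("AB", repeat=periods))
    return [s for s in all_seqs if s[0] != s[1]]


three = candidate_sequences(3)   # ['ABA', 'ABB', 'BAA', 'BAB']
four = candidate_sequences(4)    # eight sequences, including ABBA, BAAB, ABAB and BABA

# Every retained sequence keeps its dual in the list, so equal allocation to
# complementary pairs remains possible.
duals_retained = all(complement(s) in three for s in three)
```

For three patients the rule returns exactly the four sequences considered in this chapter; for four patients it returns eight, of which the chapter uses ABBA, BAAB, ABAB, and BABA.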

6.2. General Model

Using similar notation as in Chapter 4, consider the parallel groups design for a randomized controlled trial (RCT) with two treatments, many centers and up to $I$ patients per center. Let $h = 1, 2, \ldots, H$ index the centers, $i = 1, 2, \ldots, I$ index the accrual order of patients in the $h$th center, $j = A, B$ index treatment assignment, and $Y_{hij}$ denote the response for the patient entering the study in the $i$th order and assigned to the $j$th treatment in the $h$th center. The number of sequences, $K$, can be at most $2^I$, but is usually much less for 'optimality' reasons discussed in Section 6.1. Then, a general conceptual model for this situation is:

\[
\begin{aligned}
Y_{hij} &= \mu + \psi_h + \pi_i + \tau_j + (\psi\pi)_{hi} + (\psi\tau)_{hj} + (\pi\tau)_{ij} + (\psi\pi\tau)_{hij} + \epsilon_{hij} \qquad (6.1) \\
&= \mu_{hij} + \epsilon_{hij},
\end{aligned}
\]

where the $\epsilon_{hij}$ have $E(\epsilon_{hij}) = 0$, $V(\epsilon_{hij}) = \sigma_e^2$, and $\mathrm{Cov}(\epsilon_{hij}, \epsilon_{hi'j'}) = \rho_e \sigma_e^2$ for $j \neq j'$ and $i \neq i'$. In this model, $\mu$ is the overall mean, $\psi_h$ are the center effects, $\pi_i$ are the accrual (period/order) effects, $\tau_j$ are the treatment effects, $(\cdot\cdot)_{\cdot\cdot}$ are the pairwise interactions, and $(\cdot\cdot\cdot)_{\cdot\cdot\cdot}$ is the three-way interaction. The $\epsilon_{hij}$ are random observational errors. Again, various restrictions apply to the model:

\[
\sum_h (\psi\pi\tau)_{hij} = \sum_i (\psi\pi\tau)_{hij} = \sum_j (\psi\pi\tau)_{hij} = 0.
\]

The following means are of interest:

\[
\bar{Y}_{ij} = \frac{1}{H}\sum_h Y_{hij} = \mu + \pi_i + \tau_j + \frac{1}{H}\sum_h \epsilon_{hij} \qquad (6.2)
\]

(6.3)

The general model in equation (6.1) has problems, however, because some interactions are confounded (aliased) even with more than two patients per center. Thus, a model related to (5.4) in Chapter 5 is explored instead. Interactions involving treatment can be examined implicitly in the random effects model in the next section.

The general model in equation (6.1) reduces to the random effects model considered in Section 6.3 when $(\psi\pi)_{hi} = (\pi\tau)_{ij} = (\psi\pi\tau)_{hij} = (\psi\tau)_{hj} = 0$ for all $h, i, j$ (i.e. assuming no interactions).

At each center, for the general model in equation (6.1), the overall null hypothesis ($H_0$) of no treatment effect for each patient has $\tau_j = (\pi\tau)_{ij} = (\psi\tau)_{hj} = (\psi\pi\tau)_{hij} = 0$. A case of interest allows $\epsilon_{hij}$ to be random with $\rho_e = 0$, $\infty > \sigma_e^2 > 0$ and $(\psi\pi)_{hi} = 0$ in addition to the null conditions (no interactions). Thus, $E(Y_{hij} \mid H_0) = \mu_{hij} = \mu + \psi_h + \pi_i$ and $\mathrm{Var}(Y_{hij} \mid H_0) = \sigma_e^2$ for all $h$, $i$, and $j$.

6.3. Random Effects Model

Again, consider the scenario of an RCT with two treatments (A, B) assigned in $H$ centers (where $H$ is large). Assume a model similar to equation (6.1) with an overall mean ($\mu$), a random center effect ($\psi_h$), an accrual effect ($\pi_i$), a treatment effect ($\tau_j$), and random error ($\epsilon_{hij}$) for the response of the $i$th patient assigned to treatment $j$ in center $h$:

\[
Y_{hij} = \mu + \psi_h + \pi_i + \tau_j + \epsilon_{hij}, \qquad (6.4)
\]

where random errors and random center effects are independent and identically distributed (i.i.d.) according to normal distributions; i.e. $\epsilon_{hij} \overset{iid}{\sim} N(0, \sigma_{ij}^2)$, $\psi_h \overset{iid}{\sim} N(0, \sigma_\psi^2)$, and $\epsilon_{hij}$ and $\psi_h$ are independent for all $h = 1, \ldots, H$, $i = 1, 2, \ldots, I$, and $j = A, B$.
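To make the structure of model (6.4) concrete, the following sketch generates responses for a single center. It is an illustration only: the function name, the argument layout, and the use of one common error standard deviation `sigma_e` (rather than a separate $\sigma_{ij}$ for each accrual order and treatment) are simplifying assumptions.

```python
import random


def simulate_center(sequence, mu, pi, tau, sigma_psi, sigma_e, rng):
    """Responses for one center under Y_hij = mu + psi_h + pi_i + tau_j + e_hij.

    sequence: treatments in accrual order for the patients actually enrolled,
              e.g. "ABB" for a complete center or "AB" for a paired center.
    pi:       accrual effects indexed by accrual order (0-based list).
    tau:      treatment effects keyed by "A" and "B".
    """
    psi_h = rng.gauss(0.0, sigma_psi)  # one random center effect shared by all its patients
    return [
        mu + psi_h + pi[i] + tau[j] + rng.gauss(0.0, sigma_e)
        for i, j in enumerate(sequence)
    ]


rng = random.Random(2168)  # arbitrary seed for reproducibility
complete = simulate_center("ABB", 10.0, [0.0, 1.0, 2.0], {"A": 1.0, "B": 0.0}, 2.0, 1.0, rng)
paired = simulate_center("AB", 10.0, [0.0, 1.0, 2.0], {"A": 1.0, "B": 0.0}, 2.0, 1.0, rng)
```

Because `psi_h` is drawn once per center, responses within a center are positively correlated, which is the feature the within-center contrasts in the following subsections exploit.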

6.3.1. Three Patient Maximum per Center

Now, suppose up to three patients are assigned to treatments in each center (i.e. $I = 3$). A departure from notation in earlier chapters is needed to differentiate treatment sequences assigned to centers. Let $n_{jj'j'}$ denote the number of centers with treatment $j$ assigned to the first patient and treatment $j' \neq j$ assigned to the second and third patients, where $j = A, B$. Furthermore, let $n_{jj'\tilde{j}'}$ be the number of centers that allocated $j$ to the first patient and $j'$ to the second and that would have allocated $j'$ to the third patient if a third enrolled; similarly, let $n_{j\tilde{j}'\tilde{j}'}$ be the number of centers that allocated $j$ to the first patient and that would have allocated $j'$ to the second and third patients if they were enrolled. Here a tilde, written as a lowercase letter in specific sequences, marks a treatment that was scheduled but never assigned. Similar notation is used for each sequence. So, for example, $n_{ABB}$ is the number of centers with three patients, assigned to treatments A, B, and B in that order; $n_{ABb}$ is the number of centers with two patients assigned to treatments A and B in that order, which would have assigned B to a third patient; and $n_{Abb}$ is the number of centers with one patient allocated to treatment A, which would have given the next two patients B.

Further, let $n_{jj'*}$ denote the number of centers with two patients assigned to $j$ and $j'$ in that order regardless of the treatment that third patients would have received (i.e. $n_{jj'*} = n_{jj'\tilde{j}} + n_{jj'\tilde{j}'}$). Similarly let $n_{j**}$ denote the number of centers with one patient who was allotted treatment $j$ regardless of the treatments other patients in the centers would have received. (The methods in this section could be applied to the $n_{jj'\tilde{j}}$, $n_{jj'\tilde{j}'}$, $n_{j\tilde{j}'\tilde{j}}$, and $n_{j\tilde{j}'\tilde{j}'}$ directly if enough centers of each type were in the study; the method below, however, utilizes the $n_{jj'*}$ and $n_{j**}$.) Thus,

\[
N = 3n_{ABB} + 3n_{BAA} + 3n_{ABA} + 3n_{BAB} + 2n_{AB*} + 2n_{BA*} + n_{A**} + n_{B**}
\]

total patients were randomly assigned to treatment as shown in the following figure: $n_t$ 'tripled' (i.e. complete) centers have three patients each, $n_p$ 'paired' centers have two patients each, and the other $n_s$ 'singleton' centers have only one patient. (Without loss of generality, the nominally scaled centers can be rearranged to produce the pattern shown in Figure 6.1.) Let $\gamma_{st} = n_s/n_t$ denote the ratio of singleton/unpaired centers to tripled centers, and $\gamma_{pt} = n_p/n_t$ denote the ratio of paired centers to tripled centers.

Viewing these data in three parts as in Chapter 4 becomes unnecessarily cumbersome. Methods for analyzing three period, two treatment crossover trials (Hafner et al., 1988) could be utilized, but they use contrasts to produce within-subject estimates and

Figure 6.1 Extended multicenter data structure

Centers ($h$)                                                   Patient 1       Patient 2       Patient 3

$1, \ldots, n_{ABB}$                                            $Y_{h1A1}$      $Y_{h2B1}$      $Y_{h3B1}$
$n_{ABB}+1, \ldots, n_{ABB}+n_{BAA}$                            $Y_{h1B2}$      $Y_{h2A2}$      $Y_{h3A2}$
$n_{ABB}+n_{BAA}+1, \ldots, n_{ABB}+n_{BAA}+n_{ABA}$            $Y_{h1A3}$      $Y_{h2B3}$      $Y_{h3A3}$
$n_t-n_{BAB}+1, \ldots, n_t$                                    $Y_{h1B4}$      $Y_{h2A4}$      $Y_{h3B4}$
$n_t+1, \ldots, n_t+n_{AB*}$                                    $Y_{h1A*}$      $Y_{h2B*}$      -
$n_t+n_{AB*}+1, \ldots, n_t+n_p$                                $Y_{h1B*}$      $Y_{h2A*}$      -
$n_t+n_p+1, \ldots, H-n_{B**}$                                  $Y_{h1A*}$      -               -
$H-n_{B**}+1, \ldots, H$                                        $Y_{h1B*}$      -               -

where $Y_{hijk}$ is the $(hijk)$th response;
$h = 1, \ldots, H$ indexes centers;
$i = 1, 2, 3$ indexes accrual order;
$j = A, B$ indexes treatment;
$k = 1, \ldots, 4$ indexes center sequence.

would essentially discard the incomplete centers in these data. Thus, the approach at the end of Chapter 4 is emphasized here. Define the vector of means as

\[
\bar{y} = [\bar{y}_t', \bar{y}_p', \bar{y}_s']', \qquad (6.5)
\]

where

\[
\begin{aligned}
\bar{y}_t &= [\bar{Y}_{T1A1}, \bar{Y}_{T2B1}, \bar{Y}_{T3B1}, \bar{Y}_{T1B2}, \bar{Y}_{T2A2}, \bar{Y}_{T3A2}, \bar{Y}_{T1A3}, \bar{Y}_{T2B3}, \bar{Y}_{T3A3}, \bar{Y}_{T1B4}, \bar{Y}_{T2A4}, \bar{Y}_{T3B4}]', \\
\bar{y}_p &= [\bar{Y}_{P1A*}, \bar{Y}_{P2B*}, \bar{Y}_{P1B*}, \bar{Y}_{P2A*}]', \\
\bar{y}_s &= [\bar{Y}_{S1A*}, \bar{Y}_{S1B*}]',
\end{aligned}
\]

$\bar{Y}_{Tijk}$ is the mean of the subjects assigned to treatment $j$ in the $i$th order in complete (tripled) centers allocated sequence $k$, $\bar{Y}_{Pijk}$ is the mean of the subjects assigned to treatment $j$ in the $i$th order in paired centers allotted sequence $k$, and $\bar{Y}_{S1jk}$ is the mean of the subjects assigned to treatment $j$ in singleton/unpaired centers allocated sequence $k$.

that H ~ 200 and N > 450 for the most general covariance structure, but H > 60 for simpler covariance structures), $\bar{y}$ is approximately normally distributed with covariance matrix $\mathrm{Var}(\bar{y})$, estimated by $V_y$. As in Chapter 4, WLS, GEE, MIXMOD, and GLM methods can be used to model the means, depending on the scale of the responses and on the relevance of the assumptions. Instead of a reference cell model as in Chapter 4, an identity design matrix ($I_{18}$) with a contrast matrix $C_t$ generates nine treatment effect estimates as a random effects model:

$$C_t\beta = \begin{bmatrix}
\tau_{T1A1-T1B2} \\ \tau_{T2A2-T2B1} \\ \tau_{T3A2-T3B1} \\
\tau_{T1A3-T1B4} \\ \tau_{T2A4-T2B3} \\ \tau_{T3A3-T3B4} \\
\tau_{P1A*-P1B*} \\ \tau_{P2A*-P2B*} \\
\tau_{S1A*-S1B*}
\end{bmatrix}, \qquad (6.6)$$

where each row of the $9 \times 18$ contrast matrix $C_t$ subtracts the treatment B mean from the treatment A mean at the same accrual order within a complementary sequence pair. This corresponds to secondary parameters for the treatment differences for each complementary sequence pair or dual, each accrual order, and each number of patients. Homogeneity of these treatment differences can be assessed as additional (tertiary) parameters with contrasts as shown later in this subsection.
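The contrast matrix behind (6.6) can be sketched in a few lines. This is only an illustration: the label strings, the `pairs` list, and the helper `build_ct` are names introduced here, and the element ordering assumes the ordering of the means given in (6.5).

```python
import numpy as np

# Order of the 18 means in (6.5): tripled (T), paired (P), singleton (S).
labels = ["T1A1", "T2B1", "T3B1", "T1B2", "T2A2", "T3A2",
          "T1A3", "T2B3", "T3A3", "T1B4", "T2A4", "T3B4",
          "P1A*", "P2B*", "P1B*", "P2A*", "S1A*", "S1B*"]

# Each secondary parameter in (6.6) is (mean under A) - (mean under B)
# at the same accrual order within a complementary sequence pair.
pairs = [("T1A1", "T1B2"), ("T2A2", "T2B1"), ("T3A2", "T3B1"),
         ("T1A3", "T1B4"), ("T2A4", "T2B3"), ("T3A3", "T3B4"),
         ("P1A*", "P1B*"), ("P2A*", "P2B*"), ("S1A*", "S1B*")]


def build_ct(labels, pairs):
    """Return the contrast matrix with +1 on the A mean and -1 on the B mean."""
    ct = np.zeros((len(pairs), len(labels)))
    for row, (a_label, b_label) in enumerate(pairs):
        ct[row, labels.index(a_label)] = 1.0
        ct[row, labels.index(b_label)] = -1.0
    return ct


C_t = build_ct(labels, pairs)
```

With the identity design matrix, the estimate of $\beta$ is the vector of means itself, so `C_t @ y_bar` gives the nine treatment difference estimates.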

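WLS and MIXMOD analyses can utilize a general structure as specified below:

$$\mathrm{Var}(\bar{y}) = \mathrm{diag}\!\left(
\tfrac{1}{n_{ABB}}V_{A:B:B},\ \tfrac{1}{n_{BAA}}V_{B:A:A},\ \tfrac{1}{n_{ABA}}V_{A:B:A},\ \tfrac{1}{n_{BAB}}V_{B:A:B},\
\tfrac{1}{n_{AB*}}V_{A:B:*},\ \tfrac{1}{n_{BA*}}V_{B:A:*},\
\tfrac{1}{n_{A**}}\sigma^2_{1A},\ \tfrac{1}{n_{B**}}\sigma^2_{1B}
\right),$$

a block-diagonal matrix in which the submatrices are

$$V_{j:j':j'} = \begin{bmatrix}
\sigma^2_{1j} & \rho_{I,j:j':j'}\,\sigma_{1j}\sigma_{2j'} & \rho_{I,j:j':j'}\,\sigma_{1j}\sigma_{3j'} \\
\cdot & \sigma^2_{2j'} & \rho_{I,j:j':j'}\,\sigma_{2j'}\sigma_{3j'} \\
\cdot & \cdot & \sigma^2_{3j'}
\end{bmatrix},$$

with similar submatrices for other sequences for $j = A, B$. In tripled centers

$$\mathrm{Var}(\bar{y}_{Tijk}) = \frac{\sigma^2_{ij}}{n_{jj'j'}} \doteq \frac{1}{(n_{jj'j'}-1)\,n_{jj'j'}} \sum_h (y_{hij} - \bar{y}_{Tijk})^2,$$

$$\mathrm{Cov}(\bar{y}_{Tijk}, \bar{y}_{Ti'j'k}) = \frac{\rho_{I,j:j':j'}\,\sigma_{ij}\sigma_{i'j'}}{n_{jj'j'}} \doteq \frac{1}{(n_{jj'j'}-1)\,n_{jj'j'}} \sum_h (y_{hij} - \bar{y}_{Tijk})(y_{hi'j'} - \bar{y}_{Ti'j'k}),$$

where "$\doteq$" means "is estimated by"; similarly, in paired centers

$$\mathrm{Var}(\bar{y}_{Pij*}) = \frac{\sigma^2_{ij}}{n_{jj'*}} \doteq \frac{1}{(n_{jj'*}-1)\,n_{jj'*}} \sum_h (y_{hij} - \bar{y}_{Pij*})^2,$$

while in singleton (unpaired) centers

$$\mathrm{Var}(\bar{y}_{S1j*}) = \frac{\sigma^2_{1j}}{n_{j**}} \doteq \frac{1}{(n_{j**}-1)\,n_{j**}} \sum_h (y_{h1j} - \bar{y}_{S1j*})^2.$$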

Again, similar expressions apply for other sequences. In all cases, $\sigma^2_{ij} = \sigma^2_c + \sigma^2_{e,ij}$ for $i = 1, 2, 3$ and $j = A, B$. This covariance structure could possibly be simplified; for instance, the variance might depend only on accrual order (i.e. $\sigma^2_{iA} = \sigma^2_{iB} = \sigma^2_i$) or only on the treatment group assigned (i.e. $\sigma^2_{1j} = \sigma^2_{2j} = \sigma^2_{3j} = \sigma^2_j$); the covariance could have the same exchangeable structure (i.e. $\sigma^2_{ij} = \sigma^2$ and $\rho_{I,j:j':j'} = \rho_I$). So, the exchangeable covariance matrix is:

$$\mathrm{Var}(\bar{y}) = \sigma^2\,\mathrm{diag}\!\left(
\tfrac{1}{n_{ABB}}P_3,\ \tfrac{1}{n_{BAA}}P_3,\ \tfrac{1}{n_{ABA}}P_3,\ \tfrac{1}{n_{BAB}}P_3,\
\tfrac{1}{n_{AB*}}P_2,\ \tfrac{1}{n_{BA*}}P_2,\
\tfrac{1}{n_{A**}},\ \tfrac{1}{n_{B**}}
\right),$$

where $\sigma^2 = \sigma^2_c + \sigma^2_e$ is the common variance with error and center components and $P_I$ is an $I \times I$ matrix with ones on the diagonal and $\rho_I$ (the common intraclass correlation) off the diagonal. Many methods (e.g. random effects models such as (6.4) and some GEE models) assume exchangeable correlation. Since fewer parameters are estimated under exchangeability, sample sizes can be somewhat smaller than with more general covariance structures. The fit of the exchangeable covariance structure can be assessed via WLS or MIXMOD procedures.
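As a rough illustration of the exchangeable structure, the block-diagonal covariance matrix can be assembled numerically. The function names and the example counts below are hypothetical, and the sketch makes the simplifying assumption that one intraclass correlation `rho` is shared by the $2 \times 2$ and $3 \times 3$ blocks.

```python
import numpy as np


def exchangeable_corr(size, rho):
    """size x size matrix with ones on the diagonal and rho off the diagonal."""
    return (1.0 - rho) * np.eye(size) + rho * np.ones((size, size))


def block_diag(blocks):
    """Place square blocks along the diagonal of a zero matrix."""
    total = sum(b.shape[0] for b in blocks)
    out = np.zeros((total, total))
    pos = 0
    for b in blocks:
        k = b.shape[0]
        out[pos:pos + k, pos:pos + k] = b
        pos += k
    return out


def exchangeable_var(sigma2, rho, n_triples, n_pairs, n_singles):
    """Covariance of the 18 means: P_3/n, P_2/n and 1/n blocks scaled by sigma2."""
    blocks = [exchangeable_corr(3, rho) / n for n in n_triples]
    blocks += [exchangeable_corr(2, rho) / n for n in n_pairs]
    blocks += [np.array([[1.0 / n]]) for n in n_singles]
    return sigma2 * block_diag(blocks)


# Hypothetical group sizes: four tripled, two paired and two singleton groups.
V = exchangeable_var(sigma2=4.0, rho=0.25,
                     n_triples=[20, 20, 20, 20],
                     n_pairs=[10, 10],
                     n_singles=[5, 5])
```

Means from different center groups are uncorrelated, so every entry outside the diagonal blocks is zero.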

Homogeneity of the nine treatment difference estimates can be examined with WLS methods. In addition, variation among these estimates can be modelled. The primary parameters are estimated using WLS as $\hat{\beta} = (X'V_y^{-1}X)^{-1}X'V_y^{-1}\bar{y}$ and the variance matrix as $V_{\hat{\beta}} = (X'V_y^{-1}X)^{-1}$, where $V_y$ is the estimated covariance matrix. Secondary parameters can be estimated through linear combinations $C\hat{\beta}$, which have the corresponding approximately chi-square Wald test statistic $(C\hat{\beta})'[C(X'V_y^{-1}X)^{-1}C']^{-1}(C\hat{\beta})$. The contrasts of interest shown below can be used as premultipliers of the above secondary parameter estimates ($C_t\hat{\beta}$) to test the overall treatment difference and interactions with treatment. $C_{avg\,trt}$ tests for an overall or average treatment effect. $C_{trt \times size}$ tests the treatment × center size interaction as characterized by variations in treatment difference estimates depending on the number of patients per center (one, two or three). The treatment × accrual interaction, as characterized by changes in treatment difference estimates according to the entry of patients (first, second or third) in centers, is tested with $C_{trt \times acc}$. The treatment × sequence pair/dual interaction among complete centers (i.e. ABB/BAA versus ABA/BAB), assessed with $C_{trt \times seq}$, examines random sequence assignment. The three-way interaction involving treatment, accrual, and aspects of size and sequence pair can be tested with $C_{3\text{-}way}$. The first row tests treatment × accrual × size, while the other two test treatment × accrual × sequence. Such contrasts can also be used to estimate the effects and interactions. Treatment difference estimates can be pooled where appropriate.

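$$C_{avg\,trt} = \tfrac{1}{9}\,[\,1\ \ 1\ \ 1 \mid 1\ \ 1\ \ 1 \mid 1\ \ 1 \mid 1\,] \qquad (6.7)$$

$$C_{trt \times size} = \tfrac{1}{6}\begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & -3 & -3 & 0 \\ 1 & 1 & 1 & 1 & 1 & 1 & 0 & 0 & -6 \end{bmatrix}$$

$$C_{trt \times acc} = \tfrac{1}{6}\begin{bmatrix} 2 & -2 & 0 & 2 & -2 & 0 & 2 & -2 & 0 \\ 0 & 3 & -3 & 0 & 3 & -3 & 0 & 0 & 0 \end{bmatrix}$$

$$C_{trt \times seq} = \tfrac{1}{3}\,[\,1\ \ 1\ \ 1 \mid -1\ \ -1\ \ -1 \mid 0\ \ 0 \mid 0\,]$$

$$C_{3\text{-}way} \propto \begin{bmatrix} 1 & -1 & 0 & 1 & -1 & 0 & -2 & 2 & 0 \\ 1 & -1 & 0 & -1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & -1 & 0 & -1 & 1 & 0 & 0 & 0 \end{bmatrix}$$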
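The WLS estimator and Wald statistic described above translate directly into matrix code. The following is a minimal sketch rather than the analysis software referred to in the text: `wls_fit` and `wald_statistic` are names chosen here, and the two-mean example data are invented purely to make the arithmetic easy to follow.

```python
import numpy as np


def wls_fit(X, V_y, y_bar):
    """Weighted least squares: b = (X'V^-1 X)^-1 X'V^-1 y, V_b = (X'V^-1 X)^-1."""
    V_inv = np.linalg.inv(V_y)
    V_b = np.linalg.inv(X.T @ V_inv @ X)
    b = V_b @ X.T @ V_inv @ y_bar
    return b, V_b


def wald_statistic(C, b, V_b):
    """Wald statistic (Cb)'[C V_b C']^-1 (Cb) for the linear hypothesis Cb = 0."""
    Cb = C @ b
    return float(Cb @ np.linalg.solve(C @ V_b @ C.T, Cb))


# Toy example: two means, identity design matrix, one difference contrast.
X = np.eye(2)
V_y = np.diag([0.5, 0.5])
y_bar = np.array([3.0, 1.0])
C = np.array([[1.0, -1.0]])

b, V_b = wls_fit(X, V_y, y_bar)
Q = wald_statistic(C, b, V_b)
```

Under the null hypothesis the statistic is referred to a chi-square distribution with degrees of freedom equal to the rank of `C` (one in this toy case).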

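This identity matrix model ($I_{18}$) can be simplified. One simpler model of interest involves separate intercepts for the four complete sequences (tripled centers), intercepts for the two paired sequences, intercepts for the two singleton sequences, an overall treatment effect, and two predictors for accrual effect, corresponding to a fixed effects model. Thus, the design matrix would be specified as:

$$X_{18 \times 11} = \begin{bmatrix}
I_4 \otimes 1_{3,1} & 0_{12,2} & 0_{12,2} & t_t & a_{23} \\
0_{4,4} & I_2 \otimes 1_{2,1} & 0_{4,2} & t_p & a_2 \\
0_{2,4} & 0_{2,2} & I_2 & t_s & 0_{2,2}
\end{bmatrix}, \qquad (6.8)$$

where $t_t' = [\,0\ 1\ 1 \mid 1\ 0\ 0 \mid 0\ 1\ 0 \mid 1\ 0\ 1\,]$, $t_p' = [\,0\ 1 \mid 1\ 0\,]$, $t_s' = [\,0\ 1\,]$, $a_{23} = 1_{4,1} \otimes [\,0_{2,1}\ \ I_2\,]'$, $a_2 = 1_{2,1} \otimes [\,t_s\ \ 0_{2,1}\,]$, $0_{r,c}$ is a matrix of zeros with $r$ rows and $c$ columns, and $1_{r,c}$ is a matrix of ones with $r$ rows and $c$ columns. This design matrix produces the parameter vector

$$\beta = [\,\mu_{ABB}\ \ \mu_{BAA}\ \ \mu_{ABA}\ \ \mu_{BAB}\ \ \mu_{AB*}\ \ \mu_{BA*}\ \ \mu_{A**}\ \ \mu_{B**}\ \ \tau_B\ \ \pi_2\ \ \pi_3\,]',$$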

where the eight intercepts are means for the groups based on number of patients per center and sequence and the last three parameters are increments attributed to treatment B and second and third accrual order, respectively. This model assumes homogeneity of treatment and accrual. Equivalence among the complete sequences can be examined with the contrast $C_{seq} = [\,1_{3,1}\ \ -I_3\ \ 0_{3,7}\,]$. Similarly, homogeneity of means across the number of patients per center can be tested with $C_{size} = [\,\tfrac{1}{4}1_{2,4}\ \ -\tfrac{1}{2}I_2 \otimes 1_{1,2}\ \ 0_{2,3}\,]$.

Another model involves three intercepts (based on center size) along with increments for treatment, accrual order, and sequence pair among complete sequences. Thus, the corresponding design matrix is:

$$X_{18 \times 7} = \begin{bmatrix}
1_{12,1} & 0_{12,1} & 0_{12,1} & t_t & a_{23} & t_s \otimes 1_{6,1} \\
0_{4,1} & 1_{4,1} & 0_{4,1} & t_p & a_2 & 0_{4,1} \\
0_{2,1} & 0_{2,1} & 1_{2,1} & t_s & 0_{2,2} & 0_{2,1}
\end{bmatrix}, \qquad (6.9)$$

which has the corresponding parameter vector $\beta = [\,\mu_t\ \ \mu_p\ \ \mu_s\ \ \tau_B\ \ \pi_2\ \ \pi_3\ \ \lambda_{ABA/BAB}\,]'$, where $\mu_t$, $\mu_p$, and $\mu_s$ are intercepts referring to first patients assigned treatment A for triples, pairs and singletons, $\tau_B$ is the increment due to treatment B, $\pi_i$ is the increment for the $i$th patient, and $\lambda_{ABA/BAB}$ is the increment for sequence pair ABA/BAB among complete sequences (i.e. triples). Equivalence among the centers of different sizes can be tested with $C_{size} = [\,1_{2,1}\ \ -I_2\ \ 0_{2,4}\,]$. The last column of $X$ can potentially be removed along with the last row of $\beta$ since the sequences are randomly assigned to centers and therefore not expected to differ among complete centers.

If the test involving $C_{size}$ supports homogeneity of the intercepts, then a model with a single common intercept can be used. This model treats the centers as random and uses:

$$X_{18 \times 4} = \begin{bmatrix}
1_{12,1} & t_t & a_{23} \\
1_{4,1} & t_p & a_2 \\
1_{2,1} & t_s & 0_{2,2}
\end{bmatrix} \quad \text{and} \quad \beta = [\,\mu\ \ \tau_B\ \ \pi_2\ \ \pi_3\,]'. \qquad (6.10)$$

The overall homogeneity of accrual can be assessed with $C_{acc} = [\,0_{2,2}\ \ I_2\,]$, while the equivalence of the second and third patients can be tested with the single-row contrast $[\,0\ \ 0\ \ 1\ \ -1\,]$.

6.3.2. Four Patient Maximum per Center

Next, allow an additional patient to enter each center, so up to four patients are assigned to treatments in each center (i.e. $I = 4$). The same notation as in the previous subsection is used with the addition of one extra patient; e.g. $n_{jj'j'j}$ denotes the number of centers with treatment $j$ assigned to the first and fourth patients and treatment $j' \neq j$ assigned to the second and third patients, where $j = A, B$. Thus,

$$N = 4n_{ABBA} + 4n_{BAAB} + 4n_{ABAB} + 4n_{BABA} + 3n_{ABBa} + 3n_{BAAb} + 3n_{ABAb} + 3n_{BABa} + 2n_{AB**} + 2n_{BA**} + n_{A***} + n_{B***}$$

total patients were randomly assigned to treatment with $n_q$ 'quadrupled' (i.e. complete) centers having four patients each, $n_t$ 'tripled' centers having three patients each, $n_p$ 'paired' centers having two patients each, and the other $n_s$ 'singleton' centers having only one patient. Let $f_{sq} = n_s/n_q$ denote the ratio of singleton/unpaired centers to quadrupled centers, $f_{pq} = n_p/n_q$ denote the ratio of paired centers to quadrupled centers, and $f_{tq} = n_t/n_q$ denote the ratio of tripled centers to quadrupled centers.

As noted in the previous subsection, viewing these data in four parts as in Chapter 4 complicates the scenario unnecessarily. This subsection extends the previous subsection's methods. Again, strategies for analyzing multiperiod two treatment crossover trials (Jones and Kenward, 1989) utilize within-subject contrasts, so with incomplete centers WLS methods are used as above.

Denote the vector of means as $\bar{y} = [\bar{y}_q',\ \bar{y}_t',\ \bar{y}_p',\ \bar{y}_s']'$, where $\bar{y}_t$, $\bar{y}_p$, and $\bar{y}_s$ are defined in the previous subsection and $\bar{y}_q' = [\bar{y}_{q12}',\ \bar{y}_{q34}']$ is comprised of two vectors of dual/pair sequences:

$$\bar{y}_{q12} = [\bar{y}_{Q1A1}, \bar{y}_{Q2B1}, \bar{y}_{Q3B1}, \bar{y}_{Q4A1}, \bar{y}_{Q1B2}, \bar{y}_{Q2A2}, \bar{y}_{Q3A2}, \bar{y}_{Q4B2}]' \quad \text{and}$$

$$\bar{y}_{q34} = [\bar{y}_{Q1A3}, \bar{y}_{Q2B3}, \bar{y}_{Q3A3}, \bar{y}_{Q4B3}, \bar{y}_{Q1B4}, \bar{y}_{Q2A4}, \bar{y}_{Q3B4}, \bar{y}_{Q4A4}]',$$

where $\bar{y}_{Qijk}$ is the mean of the subjects assigned to treatment $j$ in the $i$th order in complete (quadrupled) centers allocated sequence $k$, $\bar{y}_{Tijk}$ is the mean of the subjects assigned to treatment $j$ in the $i$th order in tripled centers assigned sequence $k$, $\bar{y}_{Pijk}$ is the mean of the subjects assigned to treatment $j$ in the $i$th order in paired centers allotted sequence $k$, and $\bar{y}_{S1jk}$ is the mean of the subjects assigned to treatment $j$ in singleton/unpaired centers allocated sequence $k$. With sufficiently large $n_{ABBA}$, $n_{BAAB}$, $n_{ABAB}$, $n_{BABA}$, $n_{ABBa}$, $n_{BAAb}$, $n_{ABAb}$, $n_{BABa}$, $n_{AB**}$, $n_{BA**}$, $n_{A***}$, and $n_{B***}$, $\bar{y}$ is approximately normally distributed with covariance matrix $\mathrm{Var}(\bar{y})$, estimated by $V_y$. Again, WLS, GEE, MIXMOD, and GLM methods can model the means depending on the scale of the responses and on the relevance of the assumptions. As in the previous subsection, an identity design matrix ($I_{34}$) can be used along with a contrast matrix $C_t$ to generate seventeen treatment effect estimates,

so the resulting $C_t\beta$ with up to four patients per center is the vector containing the values

$$[\,\tau_{Q1A1-Q1B2}\ \ \tau_{Q2A2-Q2B1}\ \ \tau_{Q3A2-Q3B1}\ \ \tau_{Q4A1-Q4B2} \mid \tau_{Q1A3-Q1B4}\ \ \tau_{Q2A4-Q2B3}\ \ \tau_{Q3A3-Q3B4}\ \ \tau_{Q4A4-Q4B3}\,]'$$

appended by the $C_t\beta$ of (6.6) from the previous subsection. As before, this corresponds to secondary parameters for the treatment differences for each complementary sequence pair or dual, each accrual order, and each number of patients. Homogeneity of these treatment differences also can be assessed as additional (tertiary) parameters with contrasts as shown later in this subsection.

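WLS and MIXMOD analyses can utilize a general structure extended from the previous subsection: $\mathrm{Var}(\bar{y}) = D_{v_a}$, which is a block-diagonal matrix with the elements of $v_a = [\,v_q'\ \ v_t'\ \ v_p'\ \ v_s'\,]'$ on the diagonal, where

$$v_q = [\,\tfrac{1}{n_{ABBA}}V_{A:B:B:A}\ \ \tfrac{1}{n_{BAAB}}V_{B:A:A:B}\ \ \tfrac{1}{n_{ABAB}}V_{A:B:A:B}\ \ \tfrac{1}{n_{BABA}}V_{B:A:B:A}\,]',$$

$$v_t = [\,\tfrac{1}{n_{ABBa}}V_{A:B:B:a}\ \ \tfrac{1}{n_{BAAb}}V_{B:A:A:b}\ \ \tfrac{1}{n_{ABAb}}V_{A:B:A:b}\ \ \tfrac{1}{n_{BABa}}V_{B:A:B:a}\,]',$$

$$v_p = [\,\tfrac{1}{n_{AB**}}V_{A:B:*:*}\ \ \tfrac{1}{n_{BA**}}V_{B:A:*:*}\,]', \quad \text{and} \quad v_s = [\,\tfrac{1}{n_{A***}}\sigma^2_{1A}\ \ \tfrac{1}{n_{B***}}\sigma^2_{1B}\,]'.$$

These vectors are composed of the quadrupled-center submatrices

$$V_{j:j':j':j} = \begin{bmatrix}
\sigma^2_{1j} & \rho_{I,j:j':j':j}\,\sigma_{1j}\sigma_{2j'} & \rho_{I,j:j':j':j}\,\sigma_{1j}\sigma_{3j'} & \rho_{I,j:j':j':j}\,\sigma_{1j}\sigma_{4j} \\
\cdot & \sigma^2_{2j'} & \rho_{I,j:j':j':j}\,\sigma_{2j'}\sigma_{3j'} & \rho_{I,j:j':j':j}\,\sigma_{2j'}\sigma_{4j} \\
\cdot & \cdot & \sigma^2_{3j'} & \rho_{I,j:j':j':j}\,\sigma_{3j'}\sigma_{4j} \\
\cdot & \cdot & \cdot & \sigma^2_{4j}
\end{bmatrix},$$

together with the tripled-center submatrices (fourth treatment in lower case, e.g. $V_{A:B:B:a}$), each equal to $E_3 V E_3'$ for the corresponding quadrupled submatrix $V$, and the paired-center submatrices (e.g. $V_{A:B:*:*}$), each equal to $E_2 V E_2'$, where $E_3 = [\,I_3\ \ 0_{3,1}\,]$ and the quadrupled-center correlation is replaced with the tripled-center correlation; $E_2 = [\,I_2\ \ 0_{2,2}\,]$ and the quadrupled-center correlation is replaced with the paired-center correlation $\rho_{I,j:j':*:*}$. In quadrupled centers

$$\mathrm{Var}(\bar{y}_{Qijk}) = \frac{\sigma^2_{ij}}{n_{jj'j'j}} \doteq \frac{1}{(n_{jj'j'j}-1)\,n_{jj'j'j}} \sum_h (y_{hij} - \bar{y}_{Qijk})^2,$$

$$\mathrm{Cov}(\bar{y}_{Qijk}, \bar{y}_{Qi'j'k}) = \frac{\rho_{I,j:j':j':j}\,\sigma_{ij}\sigma_{i'j'}}{n_{jj'j'j}} \doteq \frac{1}{(n_{jj'j'j}-1)\,n_{jj'j'j}} \sum_h (y_{hij} - \bar{y}_{Qijk})(y_{hi'j'} - \bar{y}_{Qi'j'k}),$$

where "$\doteq$" means "is estimated by". Likewise, in tripled centers the same expressions apply with $\bar{y}_{Tijk}$ in place of $\bar{y}_{Qijk}$, the tripled-center correlation in place of $\rho_{I,j:j':j':j}$, and the number of tripled centers in the sequence (e.g. $n_{ABBa}$) in place of $n_{jj'j'j}$; similarly, in paired centers

$$\mathrm{Var}(\bar{y}_{Pij*}) = \frac{\sigma^2_{ij}}{n_{jj'**}} \doteq \frac{1}{(n_{jj'**}-1)\,n_{jj'**}} \sum_h (y_{hij} - \bar{y}_{Pij*})^2,$$

while in singleton (unpaired) centers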

$$\mathrm{Var}(\bar{y}_{S1j*}) = \frac{\sigma^2_{1j}}{n_{j***}} \doteq \frac{1}{(n_{j***}-1)\,n_{j***}} \sum_h (y_{h1j} - \bar{y}_{S1j*})^2.$$

In all cases, $\sigma^2_{ij} = \sigma^2_c + \sigma^2_{e,ij}$ for $i = 1, \ldots, 4$ and $j = A, B$. Potentially, this covariance structure can be simplified. The variance, for example, might depend only on accrual order (i.e. $\sigma^2_{iA} = \sigma^2_{iB} = \sigma^2_i$) or only on the treatment group assigned (i.e. $\sigma^2_{ij} = \sigma^2_{i'j} = \sigma^2_j$). Alternatively, the covariance could have an exchangeable structure (i.e. $\sigma^2_{ij} = \sigma^2$ and the correlations all equal $\rho_I$). So, the exchangeable covariance matrix is $\mathrm{Var}(\bar{y}) = D_{v_a}$, where

$$v_a = \sigma^2 \left[\ P_4\!\left[\tfrac{1}{n_{ABBA}}\ \tfrac{1}{n_{BAAB}}\ \tfrac{1}{n_{ABAB}}\ \tfrac{1}{n_{BABA}}\right]\ \ P_3\!\left[\tfrac{1}{n_{ABBa}}\ \tfrac{1}{n_{BAAb}}\ \tfrac{1}{n_{ABAb}}\ \tfrac{1}{n_{BABa}}\right]\ \ P_2\!\left[\tfrac{1}{n_{AB**}}\ \tfrac{1}{n_{BA**}}\right]\ \ \left[\tfrac{1}{n_{A***}}\ \tfrac{1}{n_{B***}}\right]\ \right]',$$

$\sigma^2 = \sigma^2_c + \sigma^2_e$ is the common variance with error and center components, and $P_I$ is an $I \times I$ matrix with ones on the diagonal and $\rho_I$ (the common intraclass correlation) off the diagonal. Many methods (e.g. random effects models such as (6.4) and some GEE models) assume exchangeable correlation. Since fewer parameters are estimated under exchangeability, sample sizes can be somewhat smaller than with more general covariance structures. The exchangeable covariance structure can be examined with WLS or MIXMOD procedures.

Homogeneity of the seventeen treatment difference estimates can be examined with WLS methods and variation among estimates can be modelled as in the previous subsection. As with the previous subsection, the contrasts shown below can be used as premultipliers of the above secondary parameter estimates ($C_t\hat{\beta}$) to test the treatment difference and interactions of various other factors with treatment. $C_{avg\,trt}$ tests for an overall/average treatment effect across the seventeen treatment difference estimates.

$C_{trt \times size}$ tests the treatment × size interaction as characterized by variations in treatment differences depending on the number of patients per center (one, two, three, or four). The treatment × accrual interaction, as characterized by changes in treatment difference estimates according to the entry of patients (first, second, third, or fourth) in centers, is tested with $C_{trt \times acc}$. The treatment × sequence pair/dual interaction (i.e. ABBA/BAAB versus ABAB/BABA), which can be assessed with $C_{trt \times cseq}$ among complete centers and $C_{trt \times seq}$ among both quadrupled and tripled centers, examines random sequence assignment. The three-way interactions of treatment × accrual × center size, treatment × accrual × sequence pair, and treatment × size × sequence pair can be tested with contrast matrices as specified. Additionally, these contrasts can estimate the effects and interactions. Homogeneous treatment differences can be combined.

In each contrast the seventeen columns are ordered as eight quadrupled, six tripled, two paired, and one singleton treatment difference:

$$C_{avg\,trt} = \tfrac{1}{17}\,[\,1\ 1\ 1\ 1 \mid 1\ 1\ 1\ 1 \mid 1\ 1\ 1 \mid 1\ 1\ 1 \mid 1\ 1 \mid 1\,]$$

$$C_{trt \times size} = \tfrac{1}{24}\begin{bmatrix}
3&3&3&3&3&3&3&3&-4&-4&-4&-4&-4&-4&0&0&0 \\
3&3&3&3&3&3&3&3&0&0&0&0&0&0&-12&-12&0 \\
3&3&3&3&3&3&3&3&0&0&0&0&0&0&0&0&-24
\end{bmatrix}$$

$$C_{trt \times acc} = \tfrac{1}{20}\begin{bmatrix}
4&-4&0&0&4&-4&0&0&4&-4&0&4&-4&0&4&-4&0 \\
0&5&-5&0&0&5&-5&0&0&5&-5&0&5&-5&0&0&0 \\
0&0&10&-10&0&0&10&-10&0&0&0&0&0&0&0&0&0
\end{bmatrix}$$

$$C_{trt \times cseq} = \tfrac{1}{4}\,[\,1\ 1\ 1\ 1 \mid -1\ -1\ -1\ -1 \mid 0\ 0\ 0 \mid 0\ 0\ 0 \mid 0\ 0 \mid 0\,]$$

$$C_{trt \times seq} = \tfrac{1}{7}\,[\,1\ 1\ 1\ 1 \mid -1\ -1\ -1\ -1 \mid 1\ 1\ 1 \mid -1\ -1\ -1 \mid 0\ 0 \mid 0\,]$$

$$C_{trt \times acc \times size} \propto \begin{bmatrix}
3&-3&0&0&3&-3&0&0&-3&3&0&-3&3&0&0&0&0 \\
0&3&-3&0&0&3&-3&0&0&-3&3&0&-3&3&0&0&0 \\
4&-4&0&0&4&-4&0&0&0&0&0&0&0&0&-8&8&0
\end{bmatrix}$$

$$C_{trt \times acc \times seq} \propto \begin{bmatrix}
1&-1&0&0&-1&1&0&0&1&-1&0&-1&1&0&0&0&0 \\
0&1&-1&0&0&-1&1&0&0&1&-1&0&-1&1&0&0&0 \\
0&0&2&-2&0&0&-2&2&0&0&0&0&0&0&0&0&0
\end{bmatrix}$$

$$C_{trt \times size \times seq} \propto [\,1\ 1\ 1\ 1 \mid -1\ -1\ -1\ -1 \mid -1\ -1\ -1 \mid 1\ 1\ 1 \mid 0\ 0 \mid 0\,]$$

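Simpler models can be fitted as in the previous subsection. For example, the fixed effects model design matrix would be expanded as

$$X_{34 \times 16} = \begin{bmatrix}
I_4 \otimes 1_{4,1} & 0_{16,4} & 0_{16,2} & 0_{16,2} & t_q & a_{234} \\
0_{12,4} & I_4 \otimes 1_{3,1} & 0_{12,2} & 0_{12,2} & t_t & [\,a_{23}\ \ 0_{12,1}\,] \\
0_{4,4} & 0_{4,4} & I_2 \otimes 1_{2,1} & 0_{4,2} & t_p & [\,a_2\ \ 0_{4,1}\,] \\
0_{2,4} & 0_{2,4} & 0_{2,2} & I_2 & t_s & 0_{2,3}
\end{bmatrix},$$

where $t_q' = [\,0\ 1\ 1\ 0 \mid 1\ 0\ 0\ 1 \mid 0\ 1\ 0\ 1 \mid 1\ 0\ 1\ 0\,]$ and $a_{234} = 1_{4,1} \otimes [\,0_{3,1}\ \ I_3\,]'$. This design matrix produces the parameter vector

$$\beta = [\,\mu_{ABBA}\ \ \mu_{BAAB}\ \ \mu_{ABAB}\ \ \mu_{BABA}\ \ \mu_{ABBa}\ \ \mu_{BAAb}\ \ \mu_{ABAb}\ \ \mu_{BABa}\ \ \mu_{AB**}\ \ \mu_{BA**}\ \ \mu_{A***}\ \ \mu_{B***}\ \ \tau_B\ \ \pi_2\ \ \pi_3\ \ \pi_4\,]',$$

where the twelve intercepts are means for the groups based on number of patients per center and sequence and the last four parameters are increments attributed to treatment B and second, third, and fourth accrual order, respectively. This model assumes homogeneity of treatment and accrual. Equivalence among the complete sequences can be examined with the contrast $C_{cseq} = [\,1_{3,1}\ \ -I_3\ \ 0_{3,12}\,]$ and among the sequences of both quadrupled and tripled centers with $C_{seq} = [\,1_{3,1}\ \ -I_3\ \ 1_{3,1}\ \ -I_3\ \ 0_{3,8}\,]$. Also, homogeneity of means across center size can be tested with

$$C_{size} = \left[\ \tfrac{1}{4}1_{3,4}\ \ \ -\begin{bmatrix} \tfrac{1}{4}1_{1,4} & 0_{1,4} \\ 0_{2,4} & \tfrac{1}{2}I_2 \otimes 1_{1,2} \end{bmatrix}\ \ \ 0_{3,4}\ \right].$$

Another model involves four intercepts (based on center size) along with increments for treatment, accrual order, and sequence pair:

$$X_{34 \times 9} = \begin{bmatrix}
1_{16,1} & 0_{16,1} & 0_{16,1} & 0_{16,1} & t_q & a_{234} & t_s \otimes 1_{8,1} \\
0_{12,1} & 1_{12,1} & 0_{12,1} & 0_{12,1} & t_t & [\,a_{23}\ \ 0_{12,1}\,] & t_s \otimes 1_{6,1} \\
0_{4,1} & 0_{4,1} & 1_{4,1} & 0_{4,1} & t_p & [\,a_2\ \ 0_{4,1}\,] & 0_{4,1} \\
0_{2,1} & 0_{2,1} & 0_{2,1} & 1_{2,1} & t_s & 0_{2,3} & 0_{2,1}
\end{bmatrix},$$

which has the corresponding parameter vector $\beta = [\,\mu_q\ \ \mu_t\ \ \mu_p\ \ \mu_s\ \ \tau_B\ \ \pi_2\ \ \pi_3\ \ \pi_4\ \ \lambda_{ABAB/BABA}\,]'$, where $\mu_q$, $\mu_t$, $\mu_p$, and $\mu_s$ are intercepts referring to first patients assigned treatment A for quadruples, triples, pairs and singletons, $\tau_B$ is the increment due to treatment B, $\pi_i$ is the increment for the $i$th patient, and $\lambda_{ABAB/BABA}$ is the increment for sequence pair ABAB/BABA among sequences (i.e. quadruples and triples). Equivalence among center sizes can be tested with $C_{size} = [\,1_{3,1}\ \ -I_3\ \ 0_{3,5}\,]$. The last column of $X$ can potentially be removed along with the last row of $\beta$ since the sequences are randomly assigned to centers and therefore not expected to differ among complete centers.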

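Again, if the test involving $C_{size}$ supports homogeneity of the intercepts, then a model with a single common intercept can be used. This model treats the centers as random and uses:

$$X_{34 \times 5} = \begin{bmatrix}
1_{16,1} & t_q & a_{234} \\
1_{12,1} & t_t & [\,a_{23}\ \ 0_{12,1}\,] \\
1_{4,1} & t_p & [\,a_2\ \ 0_{4,1}\,] \\
1_{2,1} & t_s & 0_{2,3}
\end{bmatrix} \quad \text{and} \quad \beta = [\,\mu\ \ \tau_B\ \ \pi_2\ \ \pi_3\ \ \pi_4\,]'.$$

The overall homogeneity of accrual can be assessed with $C_{acc} = [\,0_{3,2}\ \ I_3\,]$.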

6.4. Summary and Conclusions

With incomplete treatment allocation blocks in multicenter RCTs, some analysts advise using only complete blocks (i.e. discarding centers with incomplete assignments). This chapter extended the modelling approach of Chapter 4, adjusting for accrual order differences within center, much like a crossover trial. The weighted least squares approach, combining complete within-center treatment differences with partially complete within-center treatment differences along with unpaired/singleton between-center treatment differences, incorporates adjustment for accrual order and makes use of all the observed data. This chapter showed methods for a maximum of three or four patients per center; these can be applied with various methodologies depending on the covariance structure assumed. These methods can be extended to larger centers as long as a sufficient number of centers are in each sequence by center size group.

Chapter 7

Finite Population Framework for Incomplete Blocks in Multicenter Trials

7.1. Introduction

Under a finite population framework as developed in Chapter 5, consider the extension covered in Chapter 6: an RCT of two treatments with many centers and more than one or two patients per center. Many analysis strategies, such as the random effects models in Chapter 6, argue that patients represent some population of interest through hypothetical (as opposed to statistical) reasoning. Since these minimal assumption methods draw inferences about the particular patients randomized, they could have better validity for this situation than the model-based methods examined in Chapter 6.

Thus, RCTs of two treatments with more than one or two patients in each of many centers can be examined under a finite population framework extending the indicator function method (Cornfield, 1944) of Chapter 5. Let $h = 1, 2, \ldots, H$ index the centers, $i = 1, 2, \ldots, I$ index accrual order of patients in the $h$th center which assigned $g$ patients to treatment in sequence $k$, $j = A, B$ index the treatments, $g = 1, 2, \ldots, I$ index the number of patients in the $h$th center, and $k = 1, 2, \ldots, K$ index the sequence of treatments assigned to patients in the $h$th center. Consider $y_{hijk}$, the response of the $i$th patient (i.e., the patient in the $i$th accrual order) assigned to the $j$th treatment in the $h$th center which assigned its patients to treatments in sequence $k$, as a fixed constant for each patient. (Choice of sequences is discussed in the previous chapter.) Indicator functions, as in Chapter 5, can be used to denote recruitment and treatment assignment; however, with more than one or two patients per center, more than one indicator for each is needed.

Recruitment indices ($L_{hg}$) and center treatment sequence assignment indices ($U_{hk}$) are used to develop a randomization model:

$$L_{hg} = \begin{cases} 1 & \text{if center } h \text{ assigned } g \text{ patients to treatments} \\ 0 & \text{otherwise} \end{cases}$$

and

$$U_{hk} = \begin{cases} 1 & \text{if patients in center } h \text{ were assigned treatments in sequence } k \\ 0 & \text{otherwise.} \end{cases}$$

These $\{L_{hg}\}$ and $\{U_{hk}\}$ are each mutually exclusive and exhaustive so that each of the $H$ centers has exactly one $L_{hg}$ and one $U_{hk}$ equal to one; i.e. $\sum_g L_{hg} = \sum_k U_{hk} = 1$. For the extensions in this chapter, assume $L_{hg}$ and $U_{hk}$ are statistically independent components of the study design. As in Chapter 5, sometimes the $L_{hg}$ are fixed by the study design, while other times they are considered consequences of the study. As before, the $U_{hk}$ are evaluated after fixing the $L_{hg}$, either via design or conditional expectations.

As in Subsection 6.3.1, consider a trial with a maximum of three patients per center (i.e. $I = 3$) and patients assigned within centers according to one of four (i.e. $K = 4$) randomly allocated sequences: ABB, BAA, ABA, or BAB. (Note that $g = 1, 2, 3$ and $g = S, P, T$ are used to indicate the same notion: number of patients per center.) Then the following indicator sums relate to sample sizes:

$$n_s = \sum_h L_{h1}, \quad n_p = \sum_h L_{h2}, \quad n_t = \sum_h L_{h3},$$

with $\sum_h U_{h1}$, $\sum_h U_{h2}$, $\sum_h U_{h3}$, and $\sum_h U_{h4}$ giving the numbers of centers allocated sequences ABB, BAA, ABA, and BAB, respectively.

So, $n_t$ centers have three patients, $n_p$ have pairs (two), and $n_s$ have singles (one). Also, $n_{ABB}$ centers would have assigned their patients in the sequence ABB regardless of the number of patients (one, two or three) actually enrolled, $n_{BAA}$ to BAA regardless of number enrolled, $n_{ABA}$ to ABA regardless of number enrolled, and $n_{BAB}$ to BAB regardless of number enrolled. For the number of centers assigning $g$ patients in the $k$th sequence, additional sums, as in Chapter 6, are needed. Figure 7.1 below summarizes the sample sizes in a contingency table when the corresponding indicator functions equal one.

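Figure 7.1 Sample sizes for center sequences and center sizes

$$
\begin{array}{l|ccc|c}
 & L_{h1} & L_{h2} & L_{h3} & \text{Total} \\
\hline
U_{h1} & n_{Abb} & n_{ABb} & n_{ABB} & n_{ABB} \\
U_{h2} & n_{Baa} & n_{BAa} & n_{BAA} & n_{BAA} \\
U_{h3} & n_{Aba} & n_{ABa} & n_{ABA} & n_{ABA} \\
U_{h4} & n_{Bab} & n_{BAb} & n_{BAB} & n_{BAB} \\
\hline
\text{Total} & n_s & n_p & n_t & H
\end{array}
$$

(As in Chapter 6, the upper case treatment group letters indicate patients actually enrolled, while the lower case letters indicate treatments that would have been assigned to patients had they been enrolled.) As in Chapter 6, let $n_{AB*} = n_{ABa} + n_{ABb}$, $n_{BA*} = n_{BAa} + n_{BAb}$, $n_{A**} = n_{Aba} + n_{Abb}$, and $n_{B**} = n_{Baa} + n_{Bab}$, so that the study is comprised of $H = n_t + n_p + n_s$ centers and $N = 3n_t + 2n_p + n_s$ patients.

Define the following group sums: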

$$S_{T1A1} = \sum_h L_{h3}U_{h1}\,y_{h1A1}, \qquad S_{T2B1} = \sum_h L_{h3}U_{h1}\,y_{h2B1},$$

$$S_{T2A2} = \sum_h L_{h3}U_{h2}\,y_{h2A2}, \qquad S_{T2B3} = \sum_h L_{h3}U_{h3}\,y_{h2B3},$$

$$\vdots$$

$$S_{S1B*} = S_{S1B2} + S_{S1B4} = \sum_h L_{h1}U_{h2}\,y_{h1B2} + \sum_h L_{h1}U_{h4}\,y_{h1B4},$$

where $S_{gijk}$ is the sum of the $g$th group ($g = T$ for tripled centers; $g = P$ for paired centers; $g = S$ for singleton/unpaired centers) assigned the $j$th treatment in the $i$th order from sequence $k$. Then, the vector of means (equation (6.5)) can be calculated from the sums as $\bar{y}_{gijk} = S_{gijk}/n_{jj'j'}$ (from Figure 7.1) and the vector of differences among means (equation (6.6) from the previous chapter) can be computed.

For models of the means and differences among means, the nonnull scenario is evaluated, along with two null hypothesis scenarios, to assess bias under various conditions through the above indicator functions. As in Chapter 5, the nonnull hypothesis $H_A$ is that patients assigned to one treatment have a greater preponderance of larger responses than patients assigned the other treatment. The general null hypothesis, $H_{0g}$: $y_{hiAk} = y_{hiBk} = y_{hi}$, is that the response for the $i$th patient in the $h$th center is equivalent for the two possible treatments. The average null hypothesis, $H_{0a}$: $\sum_h \sum_i y_{hiAk} = \sum_h \sum_i y_{hiBk}$ (or alternatively $H_{0a}$: $\tau_A = \tau_B$), is that the responses are equivalent for the two possible treatment group assignments after aggregating over center and accrual order.
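The indicator bookkeeping can be made concrete with a small numerical sketch. Everything in it is invented for illustration: the six centers, their responses, and the helper name `group_mean` are assumptions, not data or notation from the thesis.

```python
import numpy as np

SEQUENCES = ["ABB", "BAA", "ABA", "BAB"]  # k = 1, ..., 4

# Hypothetical centers: (patients enrolled g, sequence index k, responses in accrual order).
centers = [
    (3, 1, [5.0, 7.0, 6.0]),
    (3, 1, [4.0, 8.0, 9.0]),
    (3, 2, [6.0, 5.0, 4.0]),
    (2, 1, [3.0, 6.0]),
    (2, 4, [7.0, 2.0]),
    (1, 3, [5.0]),
]

H = len(centers)
L = np.zeros((H, 3))  # L[h, g-1] = 1 if center h enrolled g patients
U = np.zeros((H, 4))  # U[h, k-1] = 1 if center h was allocated sequence k
for h, (g, k, _) in enumerate(centers):
    L[h, g - 1] = 1.0
    U[h, k - 1] = 1.0

# Cross-classification as in Figure 7.1: rows are sequences, columns are center sizes.
counts = U.T @ L


def group_mean(g, i, k):
    """Mean response of the i-th patient over centers of size g allocated sequence k."""
    y_i = np.array([resp[i - 1] if len(resp) >= i else 0.0 for _, _, resp in centers])
    weight = L[:, g - 1] * U[:, k - 1]   # product of indicators selects the group
    return (weight @ y_i) / weight.sum()  # group sum divided by group size


mean_T1A1 = group_mean(g=3, i=1, k=1)  # first patients in tripled ABB centers
```

The treatment received by patient `i` in sequence `k` is `SEQUENCES[k - 1][i - 1]`, so the treatment index need not be stored separately.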

7.2. Fixed Patient Accrual and Random Treatment Allocation

The first conditions are that the patient accrual indices ($L_{hg}$) are fixed constants by design and that the allocation indices ($U_{hk}$) are distributed multinomially with equal probabilities, i.e. each with probability one-quarter. (As noted in the previous chapter, unequal allocation probabilities could be used instead.) So, with equal allotment, the sum of the allocation indices across centers (the number of centers assigned to each sequence) is distributed binomially with probability $\tfrac{1}{4}$, which is denoted $\sum_h U_{hk} \sim \mathrm{Binomial}(H, \tfrac{1}{4})$. Although $n_t$, $n_p$, and $n_s$ are fixed since the $L_{hg}$ are fixed, each particular center has its patients assigned to a treatment sequence at random.

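Although $E(U_{hk}) = 1/4$ unconditionally, the following conditional expectations apply. For tripled centers,

$$E(U_{h1} \mid L_{h3}, n_{ABB}, n_{BAA}, n_{ABA}, n_{BAB}) = \frac{n_{ABB}}{n_t}, \qquad E(U_{h2} \mid L_{h3}, n_{ABB}, n_{BAA}, n_{ABA}, n_{BAB}) = \frac{n_{BAA}}{n_t},$$

$$E(U_{h3} \mid L_{h3}, n_{ABB}, n_{BAA}, n_{ABA}, n_{BAB}) = \frac{n_{ABA}}{n_t}, \qquad E(U_{h4} \mid L_{h3}, n_{ABB}, n_{BAA}, n_{ABA}, n_{BAB}) = \frac{n_{BAB}}{n_t};$$

for paired centers,

$$E(U_{h1} \mid L_{h2}, n_{ABb}, n_{BAa}, n_{ABa}, n_{BAb}) = \frac{n_{ABb}}{n_p}, \qquad E(U_{h2} \mid L_{h2}, n_{ABb}, n_{BAa}, n_{ABa}, n_{BAb}) = \frac{n_{BAa}}{n_p},$$

$$E(U_{h3} \mid L_{h2}, n_{ABb}, n_{BAa}, n_{ABa}, n_{BAb}) = \frac{n_{ABa}}{n_p}, \qquad E(U_{h4} \mid L_{h2}, n_{ABb}, n_{BAa}, n_{ABa}, n_{BAb}) = \frac{n_{BAb}}{n_p};$$

and for singleton centers,

$$E(U_{h1} \mid L_{h1}, n_{Abb}, n_{Baa}, n_{Aba}, n_{Bab}) = \frac{n_{Abb}}{n_s}, \qquad E(U_{h2} \mid L_{h1}, n_{Abb}, n_{Baa}, n_{Aba}, n_{Bab}) = \frac{n_{Baa}}{n_s},$$

$$E(U_{h3} \mid L_{h1}, n_{Abb}, n_{Baa}, n_{Aba}, n_{Bab}) = \frac{n_{Aba}}{n_s}, \qquad E(U_{h4} \mid L_{h1}, n_{Abb}, n_{Baa}, n_{Aba}, n_{Bab}) = \frac{n_{Bab}}{n_s}.$$

Conditioning on a set of tripled, paired or single numbers of centers induces a hypergeometric distribution for the treatment allocation indicators ($U_{hk}$).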
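These conditional expectations can be checked by brute force for a small case. The sketch below assumes that, given the group totals, every distinct arrangement of sequence labels over the centers of one size is equally likely; the function name and the example totals are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def conditional_expectation(group_counts, center=0):
    """E(U_hk | group totals) for one center, by enumerating equally likely allocations."""
    labels = [seq for seq, n in group_counts.items() for _ in range(n)]
    arrangements = set(permutations(labels))  # distinct allocations of labels to centers
    total = len(arrangements)
    return {seq: Fraction(sum(a[center] == seq for a in arrangements), total)
            for seq in group_counts}


# Five tripled centers with hypothetical totals n_ABB = 2, n_BAA = n_ABA = n_BAB = 1.
expect = conditional_expectation({"ABB": 2, "BAA": 1, "ABA": 1, "BAB": 1})
```

Each expectation equals the group total divided by the number of tripled centers (here 2/5 for ABB and 1/5 for the others), and the result is the same for any choice of `center`.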

Under the nonnull scenario, $H_A$, that the $y_{hijk}$ are distinct for each treatment group, the estimators are evaluated. Expectations of the means (the vector $\beta$) in equation (6.5) can be expressed as follows. For each of the twelve tripled-center means (T1A1, T2B1, T3B1, T1B2, T2A2, T3A2, T1A3, T2B3, T3A3, T1B4, T2A4, T3B4),

$$E(\bar{y}_{Tijk} \mid L_{h3}) = \frac{1}{n_t}\sum_h L_{h3}\,y_{hijk};$$

for the paired-center means,

$$E(\bar{y}_{P1A*} \mid L_{h2}) = \frac{1}{2n_p}\sum_h L_{h2}(y_{h1A1} + y_{h1A3}), \qquad E(\bar{y}_{P2B*} \mid L_{h2}) = \frac{1}{2n_p}\sum_h L_{h2}(y_{h2B1} + y_{h2B3}),$$

$$E(\bar{y}_{P1B*} \mid L_{h2}) = \frac{1}{2n_p}\sum_h L_{h2}(y_{h1B2} + y_{h1B4}), \qquad E(\bar{y}_{P2A*} \mid L_{h2}) = \frac{1}{2n_p}\sum_h L_{h2}(y_{h2A2} + y_{h2A4});$$

and for the singleton-center means,

$$E(\bar{y}_{S1A*} \mid L_{h1}) = \frac{1}{2n_s}\sum_h L_{h1}(y_{h1A1} + y_{h1A3}), \qquad E(\bar{y}_{S1B*} \mid L_{h1}) = \frac{1}{2n_s}\sum_h L_{h1}(y_{h1B2} + y_{h1B4}).$$

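Then, the expectation of the differences between means from the $i$th patients in the centers assigned dual/complementary sequences, from equation (6.6), can be found as:

$$E(C_t\beta \mid L_{hg}) = \begin{bmatrix}
\frac{1}{n_t}\sum_h L_{h3}(y_{h1A1} - y_{h1B2}) \\
\frac{1}{n_t}\sum_h L_{h3}(y_{h2A2} - y_{h2B1}) \\
\frac{1}{n_t}\sum_h L_{h3}(y_{h3A2} - y_{h3B1}) \\
\frac{1}{n_t}\sum_h L_{h3}(y_{h1A3} - y_{h1B4}) \\
\frac{1}{n_t}\sum_h L_{h3}(y_{h2A4} - y_{h2B3}) \\
\frac{1}{n_t}\sum_h L_{h3}(y_{h3A3} - y_{h3B4}) \\
\frac{1}{2n_p}\sum_h L_{h2}(y_{h1A1} + y_{h1A3} - y_{h1B2} - y_{h1B4}) \\
\frac{1}{2n_p}\sum_h L_{h2}(y_{h2A2} + y_{h2A4} - y_{h2B1} - y_{h2B3}) \\
\frac{1}{2n_s}\sum_h L_{h1}(y_{h1A1} + y_{h1A3} - y_{h1B2} - y_{h1B4})
\end{bmatrix}.$$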

Thus, in terms of the model in (4.1), the expectation can be rewritten as:

$$E(C_t\beta \mid L_{hg}) = \begin{bmatrix}
\frac{1}{n_t}\sum_h L_{h3}\,\delta_{h1} \\
\frac{1}{n_t}\sum_h L_{h3}\,\delta_{h2} \\
\frac{1}{n_t}\sum_h L_{h3}\,\delta_{h3} \\
\frac{1}{n_t}\sum_h L_{h3}\,\delta_{h1} \\
\frac{1}{n_t}\sum_h L_{h3}\,\delta_{h2} \\
\frac{1}{n_t}\sum_h L_{h3}\,\delta_{h3} \\
\frac{1}{n_p}\sum_h L_{h2}\,\delta_{h1} \\
\frac{1}{n_p}\sum_h L_{h2}\,\delta_{h2} \\
\frac{1}{n_s}\sum_h L_{h1}\,\delta_{h1}
\end{bmatrix},$$

writing $\delta_{hi} = \tau_A - \tau_B + (\pi\tau)_{iA} - (\pi\tau)_{iB} + (\psi\tau)_{hA} - (\psi\tau)_{hB} + (\psi\pi\tau)_{hiA} - (\psi\pi\tau)_{hiB}$ for the treatment difference of the $i$th patient in the $h$th center.

Then, the contrast for average treatment difference (6.7) has the conditional expectation:

$$\begin{aligned}
E(C_{avg\,trt}C_t\beta \mid H_A, L_{hg}) = \tau_A - \tau_B + \tfrac{1}{9}\Big\{ & 2(\pi\tau)_{1A} + (\pi\tau)_{2A} - 2(\pi\tau)_{1B} - (\pi\tau)_{2B} \\
& + \tfrac{6}{n_t}\sum_h L_{h3}\big((\psi\tau)_{hA} - (\psi\tau)_{hB}\big) \\
& + \tfrac{2}{n_p}\sum_h L_{h2}\big((\psi\tau)_{hA} - (\psi\tau)_{hB}\big) \\
& + \tfrac{1}{n_s}\sum_h L_{h1}\big((\psi\tau)_{hA} - (\psi\tau)_{hB}\big) \\
& + \tfrac{1}{n_p}\sum_h L_{h2}\big((\psi\pi\tau)_{h1A} + (\psi\pi\tau)_{h2A} - (\psi\pi\tau)_{h1B} - (\psi\pi\tau)_{h2B}\big) \\
& + \tfrac{1}{n_s}\sum_h L_{h1}\big((\psi\pi\tau)_{h1A} - (\psi\pi\tau)_{h1B}\big)\Big\},
\end{aligned}$$

which is biased unless accrual × treatment is null, center × treatment is null (or the effects sum to zero within singles, pairs and triples as they do overall), and the three-way interaction is null. However, under the general null hypothesis ($H_{0g}$) the overall or average treatment contrast of the means is unbiased; the average null ($H_{0a}$) has the same requirements as the nonnull hypothesis.

Alternatively, if one can assume the $L_{hg}$ are random by design or are essentially random phenomena with fixed $n_t$, $n_p$ and $n_s$, then

$$E(C_{avg\,trt}C_t\beta \mid H_A, L_{hg}, n_t, n_p, n_s) = \tau_A - \tau_B + \tfrac{2}{9}\{(\pi\tau)_{1A} - (\pi\tau)_{1B}\} + \tfrac{1}{9}\{(\pi\tau)_{2A} - (\pi\tau)_{2B}\},$$

which is unbiased assuming accrual × treatment is null. Again, under the general null hypothesis ($H_{0g}$) responses are equivalent for each treatment assignment ($y_{hiA} = y_{hiB} = y_{hi}$); thus, the average treatment contrast is unbiased under the general null without simplifying assumptions. However, under the average null hypothesis ($H_{0a}$: $\tau_A = \tau_B$), it would be unbiased only if the accrual × treatment interaction is null. Table 7.1 summarizes the assumptions needed for unbiasedness for each hypothesis in this section.
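The coefficients on the accrual × treatment terms can be traced by counting accrual orders among the nine treatment differences: four come from first patients, three from second patients, and two from third patients. The short derivation below assumes the usual sum-to-zero restriction $\sum_i (\pi\tau)_{ij} = 0$ on the interaction terms, which is an assumption about the parameterization of model (4.1) rather than something restated in this section.

```latex
\begin{align*}
\tfrac{1}{9}\bigl\{4(\pi\tau)_{1A} + 3(\pi\tau)_{2A} + 2(\pi\tau)_{3A}\bigr\}
  &= \tfrac{1}{9}\bigl\{4(\pi\tau)_{1A} + 3(\pi\tau)_{2A}
     - 2\bigl[(\pi\tau)_{1A} + (\pi\tau)_{2A}\bigr]\bigr\} \\
  &= \tfrac{2}{9}(\pi\tau)_{1A} + \tfrac{1}{9}(\pi\tau)_{2A}
\end{align*}
```

The same steps applied to treatment B give the subtracted terms, so the accrual × treatment contribution vanishes only when that interaction is null.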

Treatment estimators from the reduced models (with design matrices (6.8)-(6.10)), considered in the previous chapter, can also be evaluated in the finite population framework. (The fixed effects model (6.8) has an estimator for the treatment difference in terms of $\tau_B$ from the $\beta$ vector, so results in this chapter have had the sign reversed for comparability.) From the appropriate row in $(X'X)^{-1}X'$, the linear combination of the means used to estimate the fixed effects model treatment estimator is the same as $\tfrac{1}{22}\,[\,4\ 2\ 2 \mid 2\ 4\ 2 \mid 3\ 3 \mid 0\,]\,C_t\beta$. Thus, the expected value of the parameter from the model is

$$\begin{aligned}
E(C_{fixed}C_t\beta \mid H_A, L_{hg}) = \tau_A - \tau_B & + \tfrac{5}{22}\{(\pi\tau)_{1A} + (\pi\tau)_{2A} - (\pi\tau)_{1B} - (\pi\tau)_{2B}\} \\
& + \tfrac{16}{22n_t}\sum_h L_{h3}\big((\psi\tau)_{hA} - (\psi\tau)_{hB}\big) \\
& + \tfrac{6}{22n_p}\sum_h L_{h2}\big((\psi\tau)_{hA} - (\psi\tau)_{hB}\big) \\
& + \tfrac{2}{22n_t}\sum_h L_{h3}\big((\psi\pi\tau)_{h1A} + (\psi\pi\tau)_{h2A} - (\psi\pi\tau)_{h1B} - (\psi\pi\tau)_{h2B}\big) \\
& + \tfrac{3}{22n_p}\sum_h L_{h2}\big((\psi\pi\tau)_{h1A} + (\psi\pi\tau)_{h2A} - (\psi\pi\tau)_{h1B} - (\psi\pi\tau)_{h2B}\big),
\end{aligned}$$

which needs null accrual × treatment, null center × treatment (or zero sum within triples and pairs as they do overall), and null three-way interaction; the same is true for the

average null ($H_{0a}$). Under the general null hypothesis ($H_{0g}$) the fixed effects model estimator is unbiased.

Table 7.1 Assumptions needed for unbiased treatment difference with fixed accrual

Estimator                      Hypothesis      Required assumptions for unbiasedness (u)
-----------------------------  --------------  ------------------------------------------------------
Average treatment contrast     H_A, H_0a       no accrual x treatment;
(6.7)                                          no (a) interactions involving center (center x treatment
                                               and 3-way interaction), or L_h essentially random
                               H_0g            none
Fixed effects model (6.8)      H_A, H_0a       no accrual x treatment;
                                               no (a) interactions involving center (center x treatment
                                               and 3-way interaction), or L_h essentially random
                               H_0g            none

(u) $E(\cdot \mid H_A) = \tau_A - \tau_B$; $E(\cdot \mid H_{0g}) = 0$; $E(\cdot \mid H_{0a}) = 0$ are unbiased.
(a) or interactions involving center "average out" within those centers as model restrictions do in all centers together.

Again, an alternative can be examined if the $L_{hg}$ can be interpreted as random with fixed $n_t$, $n_p$ and $n_s$. In this scenario, the effects involving center no longer remain:

$$E(C_{fixed}C_t\beta \mid H_A, L_{hg}, n_t, n_p, n_s) = \tau_A - \tau_B + \tfrac{5}{22}\{(\pi\tau)_{1A} + (\pi\tau)_{2A} - (\pi\tau)_{1B} - (\pi\tau)_{2B}\},$$

which is slightly more biased than $E(C_{avg\,trt}C_t\beta \mid H_A, L_{hg}, n_t, n_p, n_s)$ when accrual × treatment is nonnull.

The other reduced models from the previous chapter, (6.9) and (6.10), use the same linear combination of means as the contrast for average treatment ($C_{avg\,trt}$). Thus they have the same amount of bias in the estimation of the treatment effect. Refer to Table 7.1 for a summary of the assumptions for unbiasedness.
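The bias coefficients for the two estimators can be compared with exact arithmetic. The sketch assumes the sum-to-zero restriction on the accrual × treatment terms (so the third-order term is replaced by minus the sum of the first two) and reads the fixed effects weights as 4, 2, 2, 2, 4, 2, 3, 3, 0 over 22; `ACCRUAL` and `accrual_bias_coefficients` are names introduced here.

```python
from fractions import Fraction

# Accrual order (1, 2 or 3) of each of the nine treatment differences in (6.6).
ACCRUAL = [1, 2, 3, 1, 2, 3, 1, 2, 1]


def accrual_bias_coefficients(weights):
    """Coefficients on the order-1 and order-2 interaction differences.

    Assumes the order-3 term equals minus the sum of the order-1 and order-2 terms.
    """
    total = sum(weights)
    by_order = {i: sum(w for w, a in zip(weights, ACCRUAL) if a == i) for i in (1, 2, 3)}
    return (Fraction(by_order[1] - by_order[3], total),
            Fraction(by_order[2] - by_order[3], total))


avg = accrual_bias_coefficients([1] * 9)                        # average treatment contrast
fixed = accrual_bias_coefficients([4, 2, 2, 2, 4, 2, 3, 3, 0])  # fixed effects model weights
```

If the two interaction differences were equal, the combined coefficient would be 1/3 for the average contrast and 5/11 for the fixed effects estimator, consistent with the fixed effects estimator carrying somewhat more accrual × treatment bias.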

7.3. Random Accrual Independent of Treatment Allocation

Next, consider both accrual and treatment allocation as random and statistically independent (denoted $\perp$). Designs randomly assigning the numbers of patients that centers enroll fall into this situation. Suppose the $L_{hg}$ are multinomial, each with probability $\gamma_g$, and the $U_{hk}$ are multinomial with probability based on the allocation ratio and the number of sequences; i.e. each of four equally allocated sequences has probability 1/4. In addition, $L_{hg} \perp U_{hk}$ for all $h$, $g$, and $k$. Although unconditionally $E(U_{hk}) = 1/4$, the following conditional expectations apply:

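$$
\begin{aligned}
E(L_{h3}U_{h1} \mid n') &= n_{ABB}/H, & E(L_{h3}U_{h2} \mid n') &= n_{BAA}/H, & E(L_{h3}U_{h3} \mid n') &= n_{ABA}/H, & E(L_{h3}U_{h4} \mid n') &= n_{BAB}/H, \\
E(L_{h2}U_{h1} \mid n') &= n_{ABb}/H, & E(L_{h2}U_{h2} \mid n') &= n_{BAa}/H, & E(L_{h2}U_{h3} \mid n') &= n_{ABa}/H, & E(L_{h2}U_{h4} \mid n') &= n_{BAb}/H, \\
E(L_{h1}U_{h1} \mid n') &= n_{Abb}/H, & E(L_{h1}U_{h2} \mid n') &= n_{Baa}/H, & E(L_{h1}U_{h3} \mid n') &= n_{Aba}/H, & E(L_{h1}U_{h4} \mid n') &= n_{Bab}/H,
\end{aligned}
$$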

where $n'$ denotes the vector of group totals, which are distributed multinomially; conditioning on these group totals induces a product hypergeometric distribution for the product of indicator functions. Further, the group totals are each binomial with probability $\gamma_g/4$. So, the unconditional expected values of the group totals are:

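$$
\begin{aligned}
E(n_{ABB}) = E(n_{BAA}) = E(n_{ABA}) = E(n_{BAB}) &= H\gamma_3/4, \\
E(n_{ABb}) = E(n_{BAa}) = E(n_{ABa}) = E(n_{BAb}) &= H\gamma_2/4, \text{ and} \\
E(n_{Abb}) = E(n_{Baa}) = E(n_{Aba}) = E(n_{Bab}) &= H\gamma_1/4.
\end{aligned}
$$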
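As a numerical illustration of these expectations, the following minimal sketch computes $H\gamma_g/4$ for each accrual group and compares it with the average of simulated cell counts under independent accrual and allocation. The function names, the values of $H$ and $\gamma_g$, the number of replicates, and the seed are all illustrative assumptions rather than quantities taken from the trials discussed here.

```python
import random

def expected_group_totals(H, gamma):
    """Expected number of centers in one (accrual group, sequence) cell.

    gamma maps accrual group g (1, 2 or 3 patients) to its probability;
    each of the four sequences is assumed equally likely (probability 1/4).
    """
    return {g: H * p / 4 for g, p in gamma.items()}

def simulate_group_totals(H, gamma, n_rep=2000, seed=1):
    """Average simulated count of centers with sequence k = 1 in each accrual group."""
    rng = random.Random(seed)
    groups = list(gamma)
    weights = [gamma[g] for g in groups]
    totals = {g: 0 for g in groups}
    for _ in range(n_rep):
        for _ in range(H):
            g = rng.choices(groups, weights=weights)[0]  # accrual group of this center
            k = rng.randrange(4)                         # sequence index, drawn independently of g
            if k == 0:
                totals[g] += 1
    return {g: totals[g] / n_rep for g in groups}

# Hypothetical inputs: 40 centers and accrual probabilities for 1, 2 and 3 patients.
H = 40
gamma = {1: 0.2, 2: 0.3, 3: 0.5}
expected = expected_group_totals(H, gamma)
simulated = simulate_group_totals(H, gamma)
```

With these assumed inputs the expected totals are 2, 3 and 5 centers per sequence for the one-, two- and three-patient groups, and the simulated averages should lie close to those values.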

The following expectations of means apply under the nonnull HA by taking expectations conditional on n' and then unconditioning :

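$$
\begin{aligned}
E(\bar{Y}_{T1A1}) &= \tfrac{1}{H}\sum_h Y_{h1A1}, & E(\bar{Y}_{T2B1}) &= \tfrac{1}{H}\sum_h Y_{h2B1}, & E(\bar{Y}_{T3B1}) &= \tfrac{1}{H}\sum_h Y_{h3B1}, \\
E(\bar{Y}_{T1B2}) &= \tfrac{1}{H}\sum_h Y_{h1B2}, & E(\bar{Y}_{T2A2}) &= \tfrac{1}{H}\sum_h Y_{h2A2}, & E(\bar{Y}_{T3A2}) &= \tfrac{1}{H}\sum_h Y_{h3A2}, \\
E(\bar{Y}_{T1A3}) &= \tfrac{1}{H}\sum_h Y_{h1A3}, & E(\bar{Y}_{T2B3}) &= \tfrac{1}{H}\sum_h Y_{h2B3}, & E(\bar{Y}_{T3A3}) &= \tfrac{1}{H}\sum_h Y_{h3A3}, \\
E(\bar{Y}_{T1B4}) &= \tfrac{1}{H}\sum_h Y_{h1B4}, & E(\bar{Y}_{T2A4}) &= \tfrac{1}{H}\sum_h Y_{h2A4}, & E(\bar{Y}_{T3B4}) &= \tfrac{1}{H}\sum_h Y_{h3B4}, \\
E(\bar{Y}_{P1A\cdot}) &= \tfrac{1}{2H}\sum_h (Y_{h1A1} + Y_{h1A3}), & E(\bar{Y}_{P2B\cdot}) &= \tfrac{1}{2H}\sum_h (Y_{h2B1} + Y_{h2B3}), \\
E(\bar{Y}_{P1B\cdot}) &= \tfrac{1}{2H}\sum_h (Y_{h1B2} + Y_{h1B4}), & E(\bar{Y}_{P2A\cdot}) &= \tfrac{1}{2H}\sum_h (Y_{h2A2} + Y_{h2A4}), \\
E(\bar{Y}_{S1A\cdot}) &= \tfrac{1}{2H}\sum_h (Y_{h1A1} + Y_{h1A3}), & E(\bar{Y}_{S1B\cdot}) &= \tfrac{1}{2H}\sum_h (Y_{h1B2} + Y_{h1B4}).
\end{aligned}
$$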

Then, expectations for differences between means of ith patients in centers assigned dual/complementary sequences, from equation (6.5), are:

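$$
E(C_t\beta) =
\begin{bmatrix}
\tfrac{1}{H}\sum_h (Y_{h1A1} - Y_{h1B2}) \\
\tfrac{1}{H}\sum_h (Y_{h2A2} - Y_{h2B1}) \\
\tfrac{1}{H}\sum_h (Y_{h3A2} - Y_{h3B1}) \\
\tfrac{1}{H}\sum_h (Y_{h1A3} - Y_{h1B4}) \\
\tfrac{1}{H}\sum_h (Y_{h2A4} - Y_{h2B3}) \\
\tfrac{1}{H}\sum_h (Y_{h3A3} - Y_{h3B4}) \\
\tfrac{1}{2H}\sum_h (Y_{h1A1} + Y_{h1A3} - Y_{h1B2} - Y_{h1B4}) \\
\tfrac{1}{2H}\sum_h (Y_{h2A2} + Y_{h2A4} - Y_{h2B1} - Y_{h2B3}) \\
\tfrac{1}{2H}\sum_h (Y_{h1A1} + Y_{h1A3} - Y_{h1B2} - Y_{h1B4})
\end{bmatrix}
$$

Thus, in terms of the model in (4.1), the expectation can be rewritten as: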

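$$
E(C_t\beta) =
\begin{bmatrix}
\tau_A - \tau_B + (\pi\tau)_{1A} - (\pi\tau)_{1B} \\
\tau_A - \tau_B + (\pi\tau)_{2A} - (\pi\tau)_{2B} \\
\tau_A - \tau_B + (\pi\tau)_{3A} - (\pi\tau)_{3B} \\
\tau_A - \tau_B + (\pi\tau)_{1A} - (\pi\tau)_{1B} \\
\tau_A - \tau_B + (\pi\tau)_{2A} - (\pi\tau)_{2B} \\
\tau_A - \tau_B + (\pi\tau)_{3A} - (\pi\tau)_{3B} \\
\tau_A - \tau_B + (\pi\tau)_{1A} - (\pi\tau)_{1B} \\
\tau_A - \tau_B + (\pi\tau)_{2A} - (\pi\tau)_{2B} \\
\tau_A - \tau_B + (\pi\tau)_{1A} - (\pi\tau)_{1B}
\end{bmatrix}
$$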

Then, the contrast for average treatment difference (6.7) has the expectation:

E(CavgtrtCt/,IHA) = TA - TB + ~{(7rT)IA - (7rThB} + ~{(1TT)2A - (1TT)2B}, which is unbiased if accrual x treatment is null. Under the general null hypothesis (Hog) the average treatment contrast ofthe mean differences is unbiased. Under the second null

(the average null, $H_{0a}$), the no accrual x treatment assumption is required for unbiasedness. Table 7.2 summarizes the assumptions needed for unbiasedness for each hypothesis in this section.

Table 7.2 Assumptions for unbiased treatment difference with random accrual

Estimator                    Hypothesis            Required assumptions for unbiasedness^u

$C_{avgtrt}$                 $H_A$, $H_{0a}$       no accrual x treatment
                             $H_{0g}$              none

Fixed effects model (6.8)    $H_A$, $H_{0a}$       no accrual x treatment
                             $H_{0g}$              none

Chapter 6 treatment estimators resulting from design matrices (6.8)-(6.10), can again be examined in this finite population framework. (Note: these models have estimators for treatment difference expressed in terms of TB from the (3 vector, so the results here have had the sign reversed for comparability.) From the proper row in (X'xt1 X', the linear combination of means estimating the fixed effects model treatment estimator is equivalent to 2~(4 2 21 2 4213 310]'Ct {3. Thus, the expected value ofthe treatment difference parameter from the model is

E(Cfi~edCt{31 HA ) = 7A - 7B + ;2 {(1r7)lA - (1r7)lB + (1r7)2A - (1r7)2B}. SO, once again, the accrual x treatment interaction in the tripled and paired centers must be null for the estimator to be unbiased; the same is true for the average null (Hoa). Under the general null hypothesis (Hog) the fixed effects model estimator is unbiased without additional assumptions. With nonnull accrual x treatment interaction, this estimator is slightly more biased than E(CavgtrtCt{31 HA ); this extra bias with nonnull 23 accrual x treatment is 1~8 {(1rT)lA - (7rT)lB} + 198 {(7rT)2A - (7rT)2B}' The other Chapter 6 reduced models, (6.9) and (6.10), have the same linear combination of means as the average treatment contrast (Cavgtrt) already evaluated.

Thus they have the same amount of bias in the estimation of the treatment effect. Refer to Table 7.2, which summarizes assumptions needed for unbiasedness. These investigations could be undertaken for trials with a maximum of four patients per center as well.

7.3. Notes on Combining Centers

In practice, multicenter trials with incomplete allocation blocks can be assessed for center homogeneity and indications of selection bias. Although assumptions cannot be verified, they can be examined. Association between recruitment indices ($L_{hg}$) and treatment assignment indices ($U_{hk}$) for the $H$ centers provides evidence of selection bias, though lack of association does not necessarily mean lack of selection bias. Chi-square tests (with at least a moderate number of centers) can assess the independence of $U_{hk}$ and $L_{hg}$. A trend test for the $L_{hg}$ can be performed to examine a pattern related to the number of patients per center. Comparing the number of centers assigned to a sequence (e.g. with $I = 3$, $n_{ABB}$) versus the binomial distribution with parameters H and k also checks the association between center treatment sequence and number of centers.
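The chi-square check of independence between recruitment and allocation indices can be sketched as follows. This is a minimal illustration that computes the Pearson statistic by hand for a table of centers cross-classified by accrual group and assigned sequence; the function name and the counts are hypothetical, and the sketch assumes every row and column total is positive.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a two-way table.

    table[g][k] is the number of centers in accrual group g assigned sequence k.
    Assumes all row and column totals are positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical counts: rows are 1-, 2- and 3-patient centers; columns are the four sequences.
counts = [
    [5, 6, 4, 5],
    [7, 8, 8, 7],
    [12, 11, 13, 14],
]
stat, df = chi_square_independence(counts)
```

The statistic would then be referred to a chi-square distribution with `df` degrees of freedom (six for a 3 x 4 table, with a 5% critical value of about 12.59). As noted above, a small statistic does not rule out selection bias.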

Moreover, the correlation between each recruitment index ($L_{hg}$) and the responses for the first patient in each center ($Y_{h1jk}$), separately for each treatment group, should be near zero and similar for each treatment group; the partial correlation accounting for treatment also should be near zero. The second patients' responses can be assessed with recruitment indices, except $L_{h1}$, similarly. Nonzero correlation between response and recruitment would generate concern and potentially indicate differential center recruitment, i.e. recruitment based upon the response of earlier subjects. Further, absolute values of the correlations of the random allocation indices ($U_{hk}$) with the first, second, and third patients' responses separately should be similar; otherwise selection bias might be suspected.
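The correlation checks described above can be sketched in a few lines. The example below computes, separately for each treatment group, the Pearson correlation between a recruitment index and the first patient's response; the helper names and the data are hypothetical, and the recruitment index is assumed to be a 0/1 indicator that the center enrolled three patients.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_by_treatment(records):
    """Correlate a recruitment index with first-patient response within each treatment.

    records holds (treatment, recruitment_index, response) tuples, one per center.
    """
    result = {}
    for trt in sorted({r[0] for r in records}):
        index = [r[1] for r in records if r[0] == trt]
        response = [r[2] for r in records if r[0] == trt]
        result[trt] = pearson(index, response)
    return result

# Hypothetical data: recruitment_index is 1 if the center enrolled three patients, else 0.
records = [
    ("A", 1, 5.1), ("A", 0, 4.8), ("A", 1, 4.6), ("A", 0, 5.0),
    ("B", 1, 3.9), ("B", 0, 4.2), ("B", 1, 4.4), ("B", 0, 4.0),
]
correlations = correlation_by_treatment(records)
```

Correlations that are far from zero, or that differ markedly between the two treatment groups, would be the pattern that raises concern about differential recruitment.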

Finally, methods to assess carryover effect (period x treatment interaction) in crossover trials (Jones and Kenward, 1989; Senn, 1993) can be applied. In this situation accrual x treatment, which is analogous to carryover effect, can be examined with the sums of responses within centers for each sequence. Graphical methods for carryover effect (Jones and Kenward, 1989) can also be used to examine accrual x treatment interaction. Just as in crossover trials, however, multicenter studies are designed to have adequate power to detect treatment differences. So, like carryover effect in crossover studies, the power to assess accrual x treatment interactions is usually not very high.

7.4. Summary and Conclusions

This chapter investigates bias for models for incomplete blocks of patients within a fixed number of centers ($H$). Treatment estimators are unbiased under the usual general null hypothesis. However, under the average null hypothesis and the nonnull hypothesis the estimators are biased unless various interactions are assumed null. Estimators with random recruitment indices have less bias than those with fixed ones. Moreover, random effects models appear to have less bias than fixed effects models. These assessments can be extended to more than three patients per center. Missing data in the form of incomplete treatment block allocation from such trials should be examined for patterns of missingness reflecting selection bias. Under the general null hypothesis, the model based estimators are unbiased with either fixed or random recruitment indices ($L_{hg}$).

Chapter 8

Summary and Future Research

8.1. Summary

Randomized controlled clinical trials are considered by many to be the "gold standard" in health research. Stratified designs in RCTs, which are common for logistical reasons as well as statistical considerations, are perhaps being used even more to meet requirements for better generalizability. However, researchers often do not wish to perform analyses fully accounting for design stratification; for example, some researchers may consider sequential blocks and study centers as nuisance variables, while others might want to estimate subgroup differences (e.g. gender differences) or treatment effect differences among subgroups. This research has proposed methods for some such situations and examined their implications in terms of bias and power.

Chapter 1 reviews various randomized design methods and issues and describes analysis strategies for stratified designs: fully accounting for strata, ignoring strata, and covariance adjustment of strata. It also introduces actual and conceptual examples. Chapter 2 provides a review of design- and model-based analysis methods for categorical and continuous responses in parallel and crossover studies.

The three analysis strategies for stratified designs (stratify, ignore, and adjust) are compared in terms of bias and power in Chapter 3. Various stratification effects and treatment effects are evaluated. Normally and non-normally distributed responses are considered. Simulations and example analyses show that the three procedures have similar bias and power with null or small stratum effects. As stratum effects increase, power decreases, especially if ignoring the strata. Thus, adjusting strata as covariates seems to be a good choice when fully stratified analyses are undesirable.

Analyses of trials with center as a stratification design factor are examined in Chapters 4 through 7. Chapters 4 and 5 develop methods for trials with many centers and one or two patients per center, which can be viewed like a crossover trial with missing data in the second period. A random effects model approach, accounting for accrual order within centers, utilizes a weighted estimator of paired and unpaired centers in Chapter 4 for better power than using paired centers only. Potential bias of this method is assessed with a finite population approach in Chapter 5. The weighted approach with a priori weights incorporating accrual order requires fewer assumptions for unbiased treatment difference estimates than a simple difference estimator with either fixed or random recruitment. The centers should be examined for compatibility as outlined in Chapter 5. However, the proposed estimator tests the null hypothesis without bias for either fixed or random recruitment, even in the presence of selection bias.

Chapters 6 and 7 extend the methods of Chapters 4 and 5 to trials with more than two patients per center. These extensions utilize a vector containing means for each within-center sequence by number of patients combination. Chapter 6 provides various models for this vector of means which can be analyzed with mixed models, weighted least squares, or generalized estimating equations to include data from incomplete centers.

Chapter 7 assesses these models for bias and finds that with random recruitment the accrual x treatment interaction must be assumed null for unbiased treatment difference estimates. Also, random effects models have slightly less bias than fixed effects models. However, the null hypothesis has unbiased evaluation under both fixed and random recruitment.

8.2. Recommendations

Analyses ignoring factors used as strata in the design implicitly assume that variation due to the strata is random. This is often untrue, even for pure design strata such as sequential or permuted blocks, since they are time-varying and other factors related to response may be time-varying too. Thus, adjusting strata as covariates seems to be a good choice when fully stratified analyses of continuous responses are not desired, since covariance adjusted analyses do not appear biased or inefficient; caution is advised for dichotomous outcomes, especially for matched pairs as in case-control studies (Breslow and Day, §7.1, 1980).

Multicenter trials which have some incomplete within-center treatment allocation often have the incomplete centers discarded for analyses. The proposed methods incorporate the incomplete data with a weighted combination of complete data, which results in increased power to detect treatment differences. Assumptions needed for these methods to be unbiased are enumerated in Chapters 5 and 7 using a finite population approach; with random recruitment, null accrual order x treatment interaction must be assumed for unbiasedness of the treatment difference. However, the null hypothesis of no treatment difference assessed with this weighted approach is unbiased without extra assumptions. Thus, the incomplete centers should be included for more power in multicenter trial analyses, but homogeneity of centers should be examined, particularly for centers accruing fewer patients than others.
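The idea of combining complete and incomplete centers with an a priori weight can be illustrated with a deliberately simplified sketch. It is not the Chapter 4 estimator: it ignores accrual order and the random effects structure, and the function name, the data, and the weight are all hypothetical.

```python
def weighted_treatment_difference(paired_diffs, singles_a, singles_b, weight_paired):
    """Combine a paired-center estimate and a single-center estimate of A minus B.

    paired_diffs: within-center differences (A response minus B response).
    singles_a, singles_b: responses from centers with one patient, by treatment.
    weight_paired: a priori weight in [0, 1] given to the paired-center estimate.
    """
    if not 0.0 <= weight_paired <= 1.0:
        raise ValueError("weight_paired must lie between 0 and 1")
    paired_estimate = sum(paired_diffs) / len(paired_diffs)
    single_estimate = sum(singles_a) / len(singles_a) - sum(singles_b) / len(singles_b)
    return weight_paired * paired_estimate + (1.0 - weight_paired) * single_estimate

# Hypothetical data and weight.
estimate = weighted_treatment_difference(
    paired_diffs=[1.0, 3.0],
    singles_a=[5.0, 7.0],
    singles_b=[4.0, 4.0],
    weight_paired=0.75,
)
```

Because the weight is fixed in advance rather than estimated from the data, the combined estimator remains a linear combination of the two component estimates.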

8.3. Future Research

8.3.1. Unequal Allocation in Multicenter Trials

At various points in the dissertation, the concept of unequal treatment or sequence allocation has been introduced. Section 6.1 discusses optimal sequences and design efficiency, while Section 7.2 mentions that unequal probabilities can be used for the random treatment allocation indices. Methods need to be extended for such scenarios. In unequal allocation, for example, the two sequences yielding more efficient designs could be assigned to centers more frequently, and the two sequences included to reduce the chance of selection bias associated with enrolling third patients could be assigned less frequently; e.g. the optimal sequences ABB and BAA could be assigned twice as often as ABA and BAB, which would be included to prevent bias. In addition to extending the methods to accommodate unequal allotment, the performance gains of such schemes versus equal allocation should be evaluated in terms of power and selection bias.
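The 2:2:1:1 example can be made concrete with a short sketch that converts allocation weights to probabilities and draws a sequence for each center. The names `SEQUENCE_WEIGHTS`, `allocation_probabilities`, and `assign_sequences` are illustrative assumptions, and simple independent draws are used here rather than any restricted randomization scheme.

```python
import random
from fractions import Fraction

# ABB and BAA are assigned twice as often as ABA and BAB.
SEQUENCE_WEIGHTS = {"ABB": 2, "BAA": 2, "ABA": 1, "BAB": 1}

def allocation_probabilities(weights):
    """Convert integer allocation weights to exact probabilities."""
    total = sum(weights.values())
    return {seq: Fraction(w, total) for seq, w in weights.items()}

def assign_sequences(n_centers, weights, seed=None):
    """Draw one sequence per center, independently, with the given weights."""
    rng = random.Random(seed)
    sequences = list(weights)
    return rng.choices(sequences, weights=[weights[s] for s in sequences], k=n_centers)

probabilities = allocation_probabilities(SEQUENCE_WEIGHTS)
assignments = assign_sequences(12, SEQUENCE_WEIGHTS, seed=0)
```

Under these weights the efficient sequences each have probability 1/3 and the protective sequences each have probability 1/6.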

8.3.2. Stratified Incomplete Multicenter Extensions

The methods in Chapters 4 through 7 could be extended to multicenter trials which have an additional stratification design variable. With many centers each with one or two patients, as in Chapters 4 and 5, and a dichotomous stratification factor in addition to center, the various types of centers can be combined using weights. For instance, centers with one stratum empty (i.e. pairs in one stratum and none in the other stratum and singles in one stratum and none in the other) could have the same methods as in Chapters 4 and 5 applied directly. Paired observations in both strata and pairs in one stratum but singles in the other could be handled with straightforward extension of those methods. Centers with singles in both strata could be interpreted as a two sample problem; alternatively, just the singles assigned the same treatment in the same center could be handled this way and the singles assigned opposite treatments in the same center could be considered paired (ignoring the other stratification factor). A weighted estimator can combine each of these estimates for an overall estimate. This approach, as well as other alternatives, needs to be elaborated and then evaluated.

8.3.3. Alternative Multicenter Designs

Various restricted randomization designs including permuted-block, dynamic allocation/minimization, and adaptive randomization (including biased-coin) have been reviewed (Subsections 1.3.1.-1.3.3.). Alternative stratified designs counter-balancing assignments but analyzed in a crossover design style could be examined. Several scenarios (Figures 8.1 and 8.2) are sketched for further investigation.

Example 1 concerns three hypothetical stratification factors: $H_c$ centers, gender, and race (white/other). Assuming equal enrollment rates for the $H$ crossclassified demographic groups in each of the $H_c$ centers, randomly assign one of the (~) treatment sequences (e.g. with $H = 2 \times 2 = 4$ and $J = 2$: ABBA, BAAB, ABAB, BABA, AABB, BBAA) to the first patient from each crossclassified group; then randomly assign one of the remaining sequences (i.e. sample without replacement) to the second from each of $H$ groups; continue until all (~) sequences are assigned; repeat, if more than (~) patients are enrolled in each center. Data from such a design could be viewed as a crossover study: "period" is the cross-classified stratification factors (gender x race) and "carryover" is the treatment x "period" interaction (e.g. treatment x gender and treatment x race).
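The assignment scheme of Example 1 can be sketched for the case of four crossclassified groups and two treatments, where the six balanced sequences listed above are sampled without replacement. The function names are hypothetical, and the sketch assumes that a "balanced" sequence is one placing treatment A in exactly half of the groups.

```python
import random
from itertools import combinations

def balanced_sequences(n_groups=4):
    """All sequences over n_groups cells with A in exactly half of the cells."""
    sequences = []
    for a_positions in combinations(range(n_groups), n_groups // 2):
        sequences.append(
            "".join("A" if i in a_positions else "B" for i in range(n_groups))
        )
    return sequences

def design_for_center(n_patients, n_groups=4, seed=None):
    """Assign sequences without replacement to successive patients in each group.

    The k-th entry gives the treatments for the k-th patient enrolled in each
    crossclassified group; the pool is refilled once all sequences are used.
    """
    rng = random.Random(seed)
    assignments = []
    pool = []
    for _ in range(n_patients):
        if not pool:
            pool = balanced_sequences(n_groups)
            rng.shuffle(pool)
        assignments.append(pool.pop())
    return assignments

all_sequences = balanced_sequences()
center_design = design_for_center(6, seed=3)
```

For six patients per group, `center_design` is a random ordering of the six sequences, so each sequence is used exactly once within the center.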

The second scenario, Example 2, contains three hypothetical stratification factors: $H_c$ centers, gender, and age (younger/older). Assuming equal enrollment rates for the $H$ crossclassified demographic groups in each center, randomly assign one of (~) treatment sequences to the first patient from each crossclassified group and assign the complementary treatment sequence to the second patient from each group; then randomly assign one of the remaining sequences to the third from each group and assign its complement to the fourth; continue until all (~) sequences are assigned; repeat, if necessary. Essentially, (~)/2 sequences are randomly allotted. Data from such a design could be viewed as a crossover study, as before, with the additional restrictions for the assignment in complementary sequence pairs.
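The complementary-pair restriction of Example 2 can be sketched in the same setting of four groups and two treatments. The function names are hypothetical, and the sketch assumes that within each complementary pair the sequence given first is chosen at random.

```python
import random
from itertools import combinations

def complement(sequence):
    """Swap A and B in a treatment sequence."""
    return sequence.translate(str.maketrans("AB", "BA"))

def complementary_design(n_groups=4, seed=None):
    """Assign sequences so that consecutive patients receive complementary sequences."""
    rng = random.Random(seed)
    sequences = [
        "".join("A" if i in a_positions else "B" for i in range(n_groups))
        for a_positions in combinations(range(n_groups), n_groups // 2)
    ]
    # One representative per complementary pair: the member starting with A.
    representatives = [s for s in sequences if s[0] == "A"]
    rng.shuffle(representatives)
    design = []
    for rep in representatives:
        first = rng.choice([rep, complement(rep)])
        design.extend([first, complement(first)])
    return design

paired_design = complementary_design(seed=2)
```

Only the order of the three complementary pairs and the leading member of each pair are random, which corresponds to randomly allotting half of the sequences.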

8.3.4. Other Extensions

The evaluations of ignoring strata and adjusting strata as covariates as alternatives to fully stratified analyses in Chapter 3 can be extended. Multiple stratification factors comprised of more strata with various sample sizes can be considered. The implications of these alternatives for subgroup analyses (treatment x covariate) and secondary covariates, particularly relating to multiple comparisons, can be examined. Moreover, the role covariate measurement error plays in these alternatives can be assessed through adapting sample survey methods (Koch, 1973) for measurement error.

Various extensions for methods in multicenter trials can be developed. For example, these methods have all concerned two treatment groups, so the methods could be extended to three groups and generalized to $J$ groups, if desired. Additionally, the multicenter trial scenario with incomplete pairs can be studied as being performed until $n_p$ paired (complete) centers are enrolled; this corresponds to a negative binomial problem of center enrollment with random $H$ and $n_s$ and fixed $n_p$. Moreover, other schemes for pooling centers can be proposed and evaluated.

Figure 8.1 Sample alternative design 1

                                      Gender x Race
Center   Patient #   Sequence #     FW   FO   MW   MO
  1          1           3           A    B    B    A
  1          2           1           A    B    A    B
  1          3           4           B    A    B    A
  1          4           2           B    A    A    B
  1          5           5           B    B    A    A
  1          6           6           A    A    B    B

 H_c         1
 H_c         2
 H_c         3
 H_c         4
 H_c         5
 H_c         6

Figure 8.2 Sample alternative design 2

                                      Gender x Age
Center   Patient #   Sequence #     FY   FO   MY   MO
  1          1           1           A    B    B    A
  1          2           2           B    A    A    B
  1          3           4           B    A    B    A
  1          4           3           A    B    A    B
  1          5           6           B    B    A    A
  1          6           5           A    A    B    B

 H_c         1
 H_c         2
 H_c         3
 H_c         4
 H_c         5
 H_c         6

REFERENCES

Bell RM, Klein SP, Bohannan HM, Graves RC, Disney JA (1982), Results of baseline exams in the national preventive dentistry demonstration program, R-2862-RWJ, Santa Monica, CA: The Rand Corporation.

Beutner KR, Friedman DJ, Forszpaniak C, Anderson PL, Wood MJ (1995), Valaciclovir compared with acyclovir for improved therapy for herpes zoster in immunocompetent adults, Antimicrobial Agents and Chemotherapy 39:1546-53.

Bhoj DS (1978), Testing equality of means of correlated variates with missing observations on both responses, Biometrika 65:225-228.

Bhoj DS (1984), On difference of means of correlated variates with incomplete data on both responses, J Statist Comput Simulat 19:275-289.

Bhoj DS (1989), On comparing correlated means in the presence of incomplete data, Biometrical J 31:279-288.

Bhoj DS (1991), Testing equality of means in the presence of correlation and missing data, Biometrical J 33:63-72.

Bieler GS, Williams RL (1995), techniques in quantal response teratology and developmental toxicity studies, Biometrics 51:764-776.

Binder DA (1983), On the variances of asymptotically normal estimators from complex surveys, Int Statist Rev 51:279-292.

Blackwell D, Hodges JL, Jr (1957), Design for the control of selection bias, Ann Math Stat 28:449-460.

Bouza CN (1983), Estimation of a difference in finite populations with missing observations, Biometrical J 25:123-128.

Breslow NE, Day NE (1980), Statistical Methods in Cancer Research, Volume 1: The Analysis of Case-Control Studies, Lyon, France: International Agency for Research on Cancer.

Bross IDJ (1958), How to use ridit analysis, Biometrics 14:18-38.

Byar DP, Simon RM, Friedewald WT, Schlesselman JJ, DeMets DL, Ellenberg JH, Gail MH, Ware JH (1976), Randomized clinical trials - Perspectives on some recent ideas, N Engl J Med 295:74-80.

Cochran WG (1977), Sampling Techniques, 3rd edn., New York: John Wiley & Sons.

Conover WJ, Iman RL (1982), Analysis of covariance using the rank transformation, Biometrics 38:715-724.

Cook RJ (1995), Interim analyses in 2 x 2 crossover trials, Biometrics 51:932-945.

Cook RJ, Willan AR (1996), Design considerations in crossover trials with a single interim analysis and serial patient entry, Biometrics 52:732-739.

Cornfield J (1944), On samples from finite populations, J Am Statist Assoc 39:236-239.

Coulter HL (1991), The Controlled Clinical Trial: An Analysis, Washington, DC: Center for Empirical Medicine.

Cytel Software Corp. (1995), StatXact 3 for Windows: Statistical Software for Exact Nonparametric Inference, User Manual, Cambridge, MA: Cytel Software Corp.

Davies GM (1994), Application of Sample to Repeated Measures Data Structures in Dentistry, Unpublished doctoral dissertation, Mimeo Series No. 2128T, University of North Carolina at Chapel Hill: Institute of Statistics.

Ebbutt AF (1984), Three period crossover designs for two treatments, Biometrics 40:219-224.

Efron B (1971), Forcing a sequential experiment to be balanced, Biometrika 58:403-417.

Eisenhauer EA, ten Bokkel Huinink WW, Swenerton KD, Gianni L, Myles J, van der Burg ME, Kerr I, Vermorken JB, Buser K, Colombo N, Bacon M, Santabarbara P, Onetto N, Winograd B, Canetta R (1994), European-Canadian randomized trial of paclitaxel in relapsed ovarian cancer: high versus low dose and long versus short infusion, J Clin Oncol 12:2654-2666.

Ekbohm G (1976), On comparing means in the paired case with incomplete data on both responses, Biometrika 63:299-304.

Ekbohm G (1981), On testing equality of means in the paired case with incomplete data on both responses, Biometrical J 23:251-259.

Fisher RA (1966), Design of Experiments, 8th edn., London: Hafner Publishing Co.

Fleiss JL (1981), Statistical Methods for Rates and Proportions, New York: John Wiley & Sons.

Fleiss JL (1986), The Design and Analysis of Clinical , New York: John Wiley & Sons.

Friedman LM, Furberg CD, DeMets DL (1985), Fundamentals of Clinical Trials, 2nd edn., Littleton, MA: PSG Publishing Company, Inc.

Friedman M (1937), The use of ranks to avoid the assumption of normality implicit in the analysis of variance, J Am Statist Assoc 32:675-699.

Gansky SA, Koch GG, Wilson J (1994), Statistical evaluation of relationships between analgesic dose and ordered ratings of pain relief over an eight-hour period, J Biopharm Statist 4:233-264.

Gart JJ (1969), An exact test for comparing matched proportions in crossover designs, Biometrika 56:75-80.

Gehan EA, Lemak NA (1994), Statistics in Medical Research: Developments in Clinical Trials, New York: Plenum Publishing Corporation.

Graubard BI, Korn EL (1987), Choice of column scores for testing independence in ordered 2 x K contingency tables, Biometrics 43:471-476.

Greenberg BG (1951), Why randomize? Biometrics 7:309-322.

Grieve AP (1995), Extending a Bayesian analysis of the two-period crossover to accommodate missing data, Biometrika 82:277-286.

Grizzle JE, Starmer CF, Koch GG (1969), The analysis of categorical data by linear models, Biometrics 25:489-504.

Hafner KB, Koch GG, Canada AT (1988), Some analysis strategies for three-period changeover designs with two treatments, Statist Med 7:471-481.

Hamdan MA, Khuri AI, Crews SL (1978), A test for equality of means of two correlated normal variates with missing data on both responses, Biometrical J 20:667-674.

Harrell JS, McMurray RG, Bangdiwala SI, Frauman AC, Gansky SA, Bradley CB (1996), The effects of a school-based intervention to reduce cardiovascular disease risk factors in elementary school children: The Cardiovascular Health in Children (CHIC) Study, J Pediatrics 128:797-805.

Hollander M, Pena E (1988), Nonparametric tests under restricted treatment-assignment rules, J Am Statist Assoc 83:1144-1151.

Hollander M, Wolfe DA (1973), Nonparametric Statistical Methods, New York: John Wiley & Sons.

Jones B, Kenward MG (1989), Design and Analysis of Cross-Over Trials, New York: Chapman & Hall.

Klein SP, Bohannan HM, Bell RM, Disney JA, Foch CB, Graves RC (1985), The cost and effectiveness of school-based preventive dental care, R-3203-RWJ, Santa Monica, CA: The Rand Corporation.

Kleinbaum DG, Kupper LL, Morgenstern H (1982), Epidemiologic Research: Principles and Quantitative Methods, New York: Van Nostrand Reinhold.

Koch GG (1973), An alternative approach to multivariate response error models for sample survey data with applications to estimators involving subclass means, J Am Statist Assoc 68:906-913.

Koch GG (1983), Intraclass correlation, in: Encyclopedia of Statistical Sciences Volume 4 (Eds. NL Johnson, S Kotz), New York: Wiley, pp. 212-217.

Koch GG, Amara IA, Davis GW, Gillings DB (1982), A review of some statistical methods for covariance analysis of categorical data, Biometrics 38:563-595.

Koch GG, Carr GJ, Amara IA, Stokes ME and Uryniak TJ (1990), Categorical data analysis, in: Statistical Methodology in the Pharmaceutical Sciences (Ed. DA Berry), New York: Marcel Dekker, pp. 389-473.

Koch GG, Edwards SE (1988), Clinical efficacy trials with categorical data, in: Biopharmaceutical Statistics for Drug Development (Ed. KE Peace), New York: Marcel Dekker, pp. 403-457.

Koch GG, Gillings DB (1983), Inference, design based vs. model based, in: Encyclopedia of Statistical Sciences Volume 4 (Eds. NL Johnson, S Kotz), New York: Wiley, pp. 84-88.

Koch GG, Gillings DB, Stokes ME (1980), Biostatistical implications of design, sampling, and measurement to health science data analysis, Ann Rev Public Health 1:163-225.

Koch GG, Imrey PB, Singer JM, Atkinson SS and Stokes ME (1985), Analysis of Categorical Data, Montreal: Les Presses de l'Universite de Montreal.

Kruskal WH, Wallis WA (1952), Use of ranks in one-criterion variance analysis, J Am Statist Assoc 47:584-621.

Kuritz SJ, Landis JR, Koch GG (1988), A general overview of Mantel-Haenszel methods: Applications and recent developments, Ann Rev Public Health 9:123-60.

Lachenbruch PA, Myers MG (1983), Unmatched observations in matched pairs analysis, Am Statist 37:317-319.

Lachin JM (1988a), Statistical properties of randomization in clinical trials, Controlled Clin Trials 9:289-311.

Lachin JM (1988b), Properties of simple randomization in clinical trials, Controlled Clin Trials 9:312-326.

Laird NM, Ware JH (1982), Random-effects models for longitudinal data, Biometrics 38:963-974.

Landis JR, Koch GG (1979), The analysis of categorical data in longitudinal studies of behavioral development, in: Longitudinal Methodology in the Study of Behavior and Development (Eds. JR Nesselroade, PB Baltes), New York: Academic Press, pp. 233-261.

Landis JR, Miller ME, Davis CS, Koch GG (1988), Some general methods for the analysis of categorical data in longitudinal studies, Statist Med 7:109-137.

Liang KY, Zeger SL (1986), Longitudinal data analysis using generalized linear models, Biometrika 73:13-22.

Liang KY, Zeger SL, Qaqish B (1992), Multivariate regression analyses for categorical data, JRSS B 54:3-40.

Lin PE, Stivers LE (1974), On difference of means with incomplete data, Biometrika 61:325-334.

Lipsitz SR, Kim K, Zhao L (1994), Analysis of repeated categorical data using generalized estimating equations, Statist Med 13:1149-1163.

Little RJA, Rubin DB (1987), Statistical Analysis with Missing Data, New York: John Wiley & Sons.

Mann HB, Whitney DR (1947), On a test of whether one of two random variables is stochastically larger than the other, Ann Math Statist 18:50-60.

Mantel N (1963), Chi-square tests with one degree of freedom: Extensions of the Mantel-Haenszel procedure, J Am Statist Assoc 58:690-700.

Mantel N, Haenszel W (1959), Statistical aspects of the analysis of data from retrospective studies of disease, J Nat Cancer Inst 22:719-748.

Matthews JNS (1994), Multi-period crossover trials, Statist Meth in Med Res 3:383-405.

Matts JP, Lachin JM (1988), Properties of permuted-block randomization in clinical trials, Controlled Clin Trials 9:327-344.

Matts JP, McHugh RB (1978), Analysis of accrual randomized clinical trials with balanced groups in strata, J Chron Dis 31:725-740.

McCullagh P, Nelder JA (1989), Generalised Linear Models, 2nd edn., New York: Chapman & Hall.

McLean RA, Sanders WL, Stroup WW (1991), A unified approach to mixed linear models, Am Statist 45:54-64.

Mehlisch DR, Sollecito WA, Helfrick JF, Leibold DG, Markowitz R, Schow CE Jr, Schultz R, Waite DE (1990), Multicenter clinical trial of ibuprofen and acetaminophen in the treatment of postoperative dental pain, J Am Dent Assoc 121:257-263.

Mehta CR, Patel NR, Wei LJ (1988), Constructing exact significance tests with restricted randomization rules, Biometrika 75:295-302.

Mehta JS, Gurland J (1969), Some properties and an application of a statistic arising in testing correlation, Annals of Math Statist 40:1736-1745.

Meinert CL (1986), Clinical Trials: Design, Conduct, and Analysis, New York: Oxford University Press.

Nikkels AF, Pierard GE (1994), Recognition and treatment of shingles, Drugs 48:528-548.

Oddone EZ, Cowper P, Hamilton JD, Matchar DB, Hartigan P, Samsa G, Simberkoff M, Feussner JR (1993), Cost effectiveness analysis of early zidovudine treatment of HIV infected patients, Brit Med J 307:1322-1325.

Ory MG, Schechtman KB, Miller JP, Hadley EC, Fiatarone MA, Province MA, Arfken CL, Morgan D, Weiss S, Kaplan M (1993), Frailty and injuries in later life: the FICSIT trials, J Am Geriatr Soc 41:283-296.

Patel HI (1985), Analysis of incomplete data in a two-period crossover design with reference to clinical trials, Biometrika 72:411-418.

Pocock SJ (1983), Clinical Trials: A Practical Approach, New York: John Wiley & Sons.

Pocock SJ, Simon R (1975), Sequential treatment assignment with balancing for prognostic factors in the controlled clinical trial, Biometrics 31:103-115.

Quade D (1967), Rank analysis of covariance, J Am Statist Assoc 62:1187-1200.