Interval Estimation for Drop-The-Losers Designs
Biometrika (2010), 97, 2, pp. 405–418. doi: 10.1093/biomet/asq003
© 2010 Biometrika Trust. Advance Access publication 14 April 2010. Printed in Great Britain.

BY SAMUEL S. WU
Department of Epidemiology and Health Policy Research, University of Florida, Gainesville, Florida 32610, U.S.A.
[email protected]

WEIZHEN WANG
Department of Mathematics and Statistics, Wright State University, Dayton, Ohio 45435, U.S.A.
[email protected]

AND MARK C. K. YANG
Department of Statistics, University of Florida, Gainesville, Florida 32610, U.S.A.
[email protected]

SUMMARY

In the first stage of a two-stage, drop-the-losers design, a candidate for the best treatment is selected. At the second stage, additional observations are collected to decide whether the candidate is actually better than the control. The design also allows the investigator to stop the trial for ethical reasons at the end of the first stage if there is already strong evidence of futility or superiority. Two types of tests have recently been developed, one based on the combined means and the other based on the combined p-values, but corresponding interval estimators are unavailable except in special cases. The problem is that, in most cases, the interval estimators depend on the mean configuration of all treatments in the first stage, which is unknown. In this paper, we prove a basic stochastic ordering lemma that enables us to bridge the gap between hypothesis testing and interval estimation. The proposed confidence intervals achieve the nominal confidence level in certain special cases. Simulations show that decisions based on our intervals are usually more powerful than those based on existing methods.

Some key words: Adaptive design; Clinical trial; Drop-the-losers design; p-value combination; Two-stage test.
1. INTRODUCTION

The purpose of this paper is to construct confidence limits for the mean difference between the selected treatment and the control in a two-stage, drop-the-losers clinical trial. As the name implies, when there are k treatments to be compared with a control, we wish to pick the most promising treatment using a two-stage design, in which we drop all other treatments after the first stage and confirm that the selected treatment is indeed better than the control in the second stage. The investigator can also stop the trial for ethical reasons at the end of the first stage if there is already strong evidence for or against the null hypothesis that no treatment is better than the control. Multiple interim looks may be incorporated after the treatment selection. In drug testing, the selection stage corresponds to a Phase II trial and the confirmation stage corresponds to a Phase III study. Traditionally, these two trials have been conducted separately, with the Phase III study ignoring all the Phase II information except the selection. A seamless Phase II/III design combines the data from both stages to make the final decision.

Since the primary goal of most clinical trials is to make sure that the selected treatment is indeed better than the control, most two-stage designs are investigated using hypothesis testing. Thall et al. (1988) considered the two-stage, drop-the-losers strategy for dichotomous responses with an early futility stop, but no early rejection. Bischoff & Miller (2005) constructed an optimal test based on the likelihood ratio principle for k = 2, but an extension of their result to k > 2 was not explored. Sampson & Sill (2005) proposed a uniformly most powerful conditional test for general k, conditioned on the first-stage results. Unfortunately, this test is not an unconditionally uniformly most powerful test.
Wang (2006) constructed a rigorous level-α test under the assumption that all treatments are not inferior to the control.

The methods in the four papers mentioned so far use the likelihood ratio principle on data from the two stages; i.e. the decision is based on combined means. Alternatively, one could use p-values from the two stages to integrate information. Popular methods of combining p-values include Fisher's product of p-values (Bauer & Köhne, 1994; Bretz et al., 2006), weighted linear p-value combination (Chang, 2007) and weighted linear p-value combination under the inverse transformation of the normal distribution function (Lehmacher & Wassmer, 1999). In contrast to likelihood-based methods, the techniques that combine p-values do not exploit all of the available information and are therefore less efficient (Tsiatis & Mehta, 2003; Jennison & Turnbull, 2006). On the other hand, the p-value methods offer more flexibility. For example, the statistical methods that lead to the p-values at the two different stages do not have to be the same, and the second-stage design may depend on all first-stage data.

Once we have confirmed that the candidate treatment is indeed better than the control, it is important from a practical standpoint to quantify the difference. In some cases, an interval estimator may be obtained as a by-product of the hypothesis-testing procedure. For example, Liu et al. (2002) describe how this can be accomplished in a situation where there exists a pivotal quantity for a univariate parameter. Unfortunately, the parameter of interest in our problem is random due to selection. There seems no clear way to extend the method of Liu et al. (2002) in this case. To be more specific, inverting a test in our context would require the least favourable configuration for the coverage probability of the interval estimator.
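To make the two combination rules mentioned above concrete, here is a minimal Python sketch. The function names are ours, and the equal weights in the inverse-normal rule are an illustrative choice, not a prescription from any of the cited papers.

```python
from math import exp, log, sqrt
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution

def fisher_combination(p1, p2):
    """Fisher's product rule: under H0, -2*log(p1*p2) ~ chi-squared, 4 df.
    With 4 df the survival function has the closed form exp(-x/2)*(1 + x/2)."""
    x = -2.0 * log(p1 * p2)
    return exp(-x / 2.0) * (1.0 + x / 2.0)

def inverse_normal_combination(p1, p2, w1=sqrt(0.5), w2=sqrt(0.5)):
    """Weighted inverse-normal rule: combine stage-wise z-scores with
    prespecified weights satisfying w1**2 + w2**2 = 1."""
    z = w1 * _N.inv_cdf(1.0 - p1) + w2 * _N.inv_cdf(1.0 - p2)
    return 1.0 - _N.cdf(z)
```

Both rules reject at level α when the combined p-value falls below α; for instance, two stage-wise p-values of 0.05 yield a Fisher combined p-value of about 0.017.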
Lehmacher & Wassmer (1999) obtained a confidence interval for the k = 1 case, and Brannath et al. (2006) provided a comprehensive discussion of confidence interval construction under various conditions, such as designs with upper and lower bounds on the second-stage sample size, but they also considered only the k = 1 case. Bischoff & Miller (2005) considered the k = 2 case but did not provide an interval estimator. Sampson & Sill (2005) were able to convert their conditional test into a confidence interval because, by conditioning on the first-stage data, the final decision is a one-stage problem. Since the Sampson and Sill method seems to be the only confidence interval for the general k-arm drop-the-losers design, we use it as the standard reference for comparison.

In this paper, we develop interval estimators for the mean difference between the selected treatment and the control. Our methods work for both types of testing; i.e. both mean combination and p-value combination methods. While the unknown mean configuration prevents us from constructing the tightest possible lower confidence limit, we are able to prove that our interval estimators do achieve the nominal confidence level in certain special cases. Moreover, our simulation study shows that the new intervals usually perform better than Sampson and Sill's intervals.

2. BASIC FORMULATION AND DECISION RULES

2·1. Basic formulation

Let the response of the jth patient from the ith treatment at stage t be X_{tij}, with

    X_{tij} = μ_i + ε_{tij}   (j = 1, ..., n_{ti}; i = 0, 1, ..., k; t = 1, 2),

where μ_i stands for the ith treatment mean, with i = 0 for the control, and the ε_{tij} ~ N(0, σ²) are independent errors. Furthermore, we assume that the larger the mean the better.
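As a concrete illustration of this sampling model, the following Python sketch draws one stage of data; the means, sample sizes, and σ below are hypothetical values chosen only for illustration.

```python
import random

def simulate_stage(means, n, sigma, rng):
    """Draw responses X_tij = mu_i + eps_tij with eps_tij ~ N(0, sigma^2)
    for each arm i; index 0 is the control. Returns one list per arm."""
    return [[means[i] + rng.gauss(0.0, sigma) for _ in range(n[i])]
            for i in range(len(means))]

rng = random.Random(2010)
mu = [0.0, 0.2, 0.5, 0.3]   # control plus k = 3 treatment means (hypothetical)
n1 = [20, 20, 20, 20]       # first-stage sample sizes n_1i
stage1 = simulate_stage(mu, n1, sigma=1.0, rng=rng)
```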
Let X̄_{ti} and S²_{ti} be the sample mean and sample variance from the ith treatment at stage t. From the first stage, let S²_1 be the pooled estimate of the common variance as defined in Appendix 1, and let

    Y_i = (X̄_{1i} − X̄_{10}) / a_i

be the standardized difference between X̄_{1i} and X̄_{10} for i = 1, ..., k, where a_i = (1/n_{1i} + 1/n_{10})^{1/2}. Let τ be the selected treatment; i.e. τ = i if Y_i = max_{j=1,...,k} Y_j. We assume that the n_{2i} (i = 0, ..., k) are prespecified functions of S_1 for the mean combination method, and functions of both S_1 and the X̄_{1i} (i = 0, ..., k) for the p-value combination method; hence, they are known at the end of the first stage. Also, only the selected treatment and the control are continued after the first stage; i.e. the second-stage data from the dropped treatments are not observed.

Since the selected treatment τ for the second stage is random, we are actually estimating the difference between the means of an unprespecified treatment and the control, Δ = μ_τ − μ_0. It is clear that for continuous errors ε_{tij}, τ and μ_τ are uniquely defined with probability one.

2·2. The mean combination decision rule

The difference between the sample means of the selected treatment and the control, based on the data from both stages, is

    D = (n_{1τ} X̄_{1τ} + n_{2τ} X̄_{2τ}) / (n_{1τ} + n_{2τ}) − (n_{10} X̄_{10} + n_{20} X̄_{20}) / (n_{10} + n_{20}).

When the sample sizes n_{2i} (i = 1, ..., k) are different, the distribution of the usual sample variance depends on τ; consequently, it depends on the X̄_{1i} (i = 1, ..., k). Therefore, we elect to use a modified variance estimate S² defined in Appendix 1. In most practical situations, the n_{2i} are all equal and S² is the usual variance estimator.
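The selection rule and the combined-mean difference D translate directly into code. The sketch below assumes one list of per-arm sample means per stage, with index 0 the control; the helper names are ours.

```python
from math import sqrt

def select_treatment(xbar1, n1):
    """Standardized differences Y_i = (Xbar_1i - Xbar_10)/a_i with
    a_i = (1/n_1i + 1/n_10)^{1/2}; tau is the argmax over i = 1,...,k."""
    y = {}
    for i in range(1, len(xbar1)):
        a_i = sqrt(1.0 / n1[i] + 1.0 / n1[0])
        y[i] = (xbar1[i] - xbar1[0]) / a_i
    tau = max(y, key=y.get)
    return tau, y

def combined_difference(xbar1, xbar2, n1, n2, tau):
    """D: stage-combined mean of arm tau minus that of the control;
    only entries 0 and tau of the stage-2 lists are ever used."""
    def pooled(i):
        return (n1[i] * xbar1[i] + n2[i] * xbar2[i]) / (n1[i] + n2[i])
    return pooled(tau) - pooled(0)
```

For example, with equal sample sizes and first-stage means (0.0, 0.1, 0.4, 0.2) for the control and three treatments, arm 2 is selected, since all the a_i coincide and its mean difference from the control is largest.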
Let b_τ = {1/(n_{1τ} + n_{2τ}) + 1/(n_{10} + n_{20})}^{1/2} and T = D/b_τ. For a given set (d_0, d_1, d_2) such that d_0 ≤ d_1, our decision rule is:

    at stage 1:  if Y_τ/S_1 < d_0, stop for futility;
                 if Y_τ/S_1 ≥ d_1, stop for superiority of treatment τ;
                 if d_0 ≤ Y_τ/S_1 < d_1, continue to stage 2;
    at stage 2:  if T/S < d_2, claim futility;                          (1)
                 if T/S ≥ d_2, claim superiority of treatment τ.
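Rule (1) can be sketched as a small Python function; the default thresholds below are placeholders for illustration, not values derived in the paper.

```python
def two_stage_decision(y_tau, s1, t=None, s=None, d=(0.0, 2.5, 2.0)):
    """Apply decision rule (1) with thresholds d = (d0, d1, d2), d0 <= d1.
    Stage 1 compares Y_tau/S1 with (d0, d1); if the trial continues and
    second-stage statistics are supplied, stage 2 compares T/S with d2."""
    d0, d1, d2 = d
    z1 = y_tau / s1
    if z1 < d0:
        return 'stop for futility'
    if z1 >= d1:
        return 'stop for superiority'
    if t is None:
        return 'continue to stage 2'
    return 'claim superiority' if t / s >= d2 else 'claim futility'
```

With the placeholder thresholds, a negative standardized difference stops for futility at stage 1, a value of 3 stops for superiority, and intermediate values defer the claim to the second-stage statistic T/S.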