Pubh 8482: Sequential Analysis
Course Introduction

Joseph S. Koopmeiners

Division of Biostatistics, University of Minnesota

Week 1

Who am I?

• Joe Koopmeiners
• Assistant Professor in Division of Biostatistics
• PhD - University of Washington, 2009
• Research Interests:
  • Sequential and adaptive methods for translational cancer research
  • Statistical evaluation of diagnostic tests
• Collaborative Projects:
  • Tobacco cessation
  • MRI as a diagnostic tool for prostate cancer
  • Statistical support for early phase clinical trials

Who are you?

• Name
• Year in program (program, if not in biostatistics)
• Advisor, if known

Pre-requisites

• Stat 8101-8102
• Students must be comfortable with the multivariate normal distribution

Text Books

There are no required texts for this course, but lecture notes will draw heavily from:
• Jennison, C. and Turnbull, B. (1999) Group Sequential Methods with Applications to Clinical Trials, Boca Raton: CRC Press. ISBN 0849303168
• Berry, S.M., Carlin, B.P., Lee, J.J. and Müller, P. (2010) Bayesian Adaptive Methods for Clinical Trials, Boca Raton: CRC Press. ISBN 1439825483

Text Books

Other useful textbooks on reserve in the biostat reading room include:
• Whitehead, J. (1997) The Design and Analysis of Sequential Clinical Trials, 2nd Ed., New York: John Wiley & Sons. ISBN 0471975508
• Proschan, M. A., Lan, K.K.G. and Wittes, J.T. (2006) Statistical Monitoring of Clinical Trials: A Unified Approach, New York: Springer. ISBN 0387300597
• Yin, G. (2012) Clinical Trial Design: Bayesian and Frequentist Adaptive Methods, Hoboken: John Wiley & Sons. ISBN 0470581719

Evaluation and Grades

• Homework - 40%
• Mid-term exam - 30%
• Final project - 30%

Homework

• About 5 homework assignments
• You will have at least 2 weeks to work on the assignments
• Homeworks will be due on Tuesdays

Mid-term Exam

• Take-home mid-term about halfway through the course
• Will cover what I consider “classic” group sequential methodology
• You will have one week to complete the exam

Class Project

• Pick a specific topic related to sequential and adaptive clinical trials
• Written report and 20 - 30 minute presentation
• Discuss statistical challenges specific to your topic
• Literature review of current research
• Identify open research areas

Office Hours

• Monday 3:00 p.m. - 4:00 p.m.
• Thursday 1:00 - 2:00 p.m.

Course Website

• Course Website

http://www.biostat.umn.edu/ josephk/courses/pubh8482 fall2012/

• Linked from my faculty webpage

Course Title

• Current Title:
  • Sequential Analysis
• More Accurate Title:
  • Sequential and Adaptive Methods for Clinical Trials

What is a clinical trial?

• Clinical Trial: A controlled experiment to test the safety or efficacy of a treatment or intervention
• Usually randomized, although this is not always the case, especially in phases 1 and 2

Fixed-sample Design

• Most studies utilize fixed-sample designs
• Fixed-sample design: collect a pre-specified number of subjects and test our hypothesis
  • Randomize 200 patients to either a novel treatment or placebo and compare survival using a log-rank test

Sequential Design

• An alternate approach is a sequential design
• Sequential design: sequentially monitor the primary endpoint and continue to enroll subjects based on the interim results
  • Randomize an initial cohort of patients to a novel treatment or placebo and compare survival using a log-rank test
  • Determine if more patients should be enrolled based on pre-specified stopping rule

Continuous Monitoring

• Early sequential methods focused on continuous monitoring
  • Evaluating the endpoint after each new experimental unit
  • for bombs during WWII
• These methods are not practical in the setting of clinical trials

Group Sequential Methods

• Sequential clinical trials generally rely on group sequential methodology
• Group sequential methods: interim analyses are completed at pre-specified intervals throughout the study
  • Randomize the first 50 patients to either a novel treatment or placebo and compare survival using a log-rank test
  • Determine if more patients should be enrolled based on a pre-specified stopping rule
  • Re-evaluate the endpoint after every 50 patients until a total of 200 have been enrolled

Adaptive Design

• An adaptive design is a design that uses accumulating data from the ongoing trial to modify certain aspects of the study
  • Sample size
  • Treatment dose
  • Randomization ratio
  • Study arms

Sequential vs. Adaptive Designs

• There is no clear distinction between what constitutes a sequential and what constitutes an adaptive design
• Both rely on interim analyses to modify the design
• Sequential designs generally only modify the sample size (by stopping early), while the term adaptive design is used to describe designs with broader modifications
• Both face similar statistical challenges

Sequential and Adaptive Designs: Challenges

• Clinical trials are designed to achieve desired operating characteristics
  • Type-I error
  • Power
• Sequential and adaptive methods alter the operating characteristics of the study
• Challenge: Incorporate sequential and adaptive methods while maintaining the desired operating characteristics
• Goal: Show that sequential and adaptive methods have the same type-I error rate and power as a fixed-sample design but a smaller sample size or another desirable property

Phases of Drug Development

• Phase 1: Safety trials
• Phase 2: Efficacy trials
• Phase 3: Confirmatory trials
• Phase 4: Post-marketing surveillance

Phase 1 Clinical Trials

• First-in-human trials
• Primary objective is to evaluate safety
• Efficacy is a secondary concern
• Small trials: 10 - 50 subjects

Phase 1 Trials Example

Phase 1 clinical trial in oncology
• Estimate the probability of dose limiting toxicity (DLT)
• We assume that as dose increases, the probability of DLT and the probability of efficacy will also increase
• Maximum tolerated dose (MTD): Highest dose with probability of DLT less than some pre-specified cut-off (usually 0.2 or 0.33)

Dose Escalation

• We would like to do a randomized study
  • Too dangerous for first-in-human studies
• Instead, we complete non-randomized dose escalation studies
  • Patients receive progressively higher doses until MTD has been identified
• Adaptive designs are used to guide dose escalation

Patients vs. Healthy Volunteers

• Healthy volunteers are used in other settings where toxicities are less severe
• Phase 1 oncology trials include patients for whom standard treatments have failed
• Added goal of treating patients with an efficacious dose
• Adaptive designs are used to treat patients at dose levels that are more likely to be efficacious

Phase 2 Clinical Trials

• Goal of Phase 2: evaluate the efficacy of a novel therapeutic agent
• Surrogate endpoints are often used in place of hard endpoints
  • Phase 2 oncology trial: tumor response instead of overall survival
• Continue to evaluate safety of new drug

Stopping for futility

• The majority of novel therapeutic agents will not be adequately efficacious
• Clinical trials for ineffective treatments are expensive and fail to provide adequate care for study subjects
• Early termination for futility allows ineffective treatments to be abandoned if initial estimates of treatment efficacy are not promising

Dropping Study Arms

• The optimal dose/treatment schedule is unlikely to be known after Phase 1
• We might run a multi-arm study to investigate multiple dose levels/treatment schedules and adaptively drop arms to save time/money/etc.

Safety Monitoring

• The safety profile of a new drug continues to be evaluated in Phase 2
• Dual goals of answering scientific question/protecting study subjects
• Sequential stopping rules are used to monitor the rate of adverse events

Personalized Medicine

• Personalized medicine refers to customized treatment decisions based on patient characteristics such as genetic or other information
• A phase 2 trial could be designed to investigate the effectiveness of a new treatment in different subpopulations
• We might design a study to adaptively assign subjects to one of several treatments or drop subgroups for which the drug is not effective

Phase 3 Clinical Trials

• Final, confirmatory trial for new therapeutic agent
• Much larger than phase 2
• Hard endpoints
  • Overall survival instead of clinical response

Sequential Monitoring in Phase 3

• Most common setting for sequential monitoring
• Stopping rules set in advance to allow early termination for efficacy or futility
  • Get new treatments onto the market faster
  • Save time and money when treatments are not promising

Adaptive Randomization

• Typically, subjects are randomized to treatment or control using a fixed randomization ratio
• Alternately, we could change the randomization ratio over time so that more subjects are assigned to “better” treatment
• Better outcomes for study subjects

Sample Size Re-estimation

• Sample size is often calculated based on nuisance parameters for which there is little information
• Incorrectly specified nuisance parameters can lead to under-powered studies
• Re-estimate sample size using updated estimates of nuisance parameters at interim analyses

Sample Size Re-estimation

• What if the true effect size is smaller than we anticipated?
• We can re-estimate the sample size using an updated estimate of the effect size at interim analyses
• This is controversial

In Summary

Motivations for using sequential/adaptive designs can be grouped into the following:
• Ethical
• Economic
• Administrative

Ethical

• Minimize the number of subjects treated with ineffective treatments
• Make new treatments available to the public more quickly
• Protect study subjects

Economic

• Save money on the trial by terminating early
• Early termination allows company to profit from new drug sooner

Administrative

• Evaluate composition of study population
• Determine if study procedures are being followed correctly
• Check model assumptions

Course Objectives

• Students will be familiar with standard group sequential methodology
• Students will be exposed to adaptive methods in clinical trials
• Students will understand the challenges of applying sequential and adaptive methods to clinical trials
• Students will understand the advantages and disadvantages of sequential and adaptive designs

Course Outline

• Week 1: Course Introduction
• Weeks 2 and 3: Sequential testing of Normal Random Variables
• Week 4: Brownian Motion and Asymptotically Normal tests
• Week 5: Estimation after a sequential trial
• Week 6: Confidence intervals and p-values
• Week 7: Bayesian Sequential methods
• Weeks 8 and 9: Adaptive methods for Phase 1 clinical trials
• Weeks 10 and 11: Adaptive methods for Phase 2 clinical trials
• Weeks 12 and 13: Adaptive methods for Phase 3 clinical trials
• Week 14: Student Presentations

Frequentist vs. Bayesian

It is worth pointing out...
• I don’t consider myself a Frequentist or a Bayesian
• I am comfortable with both and do what I think is best for the specific problem
• Both approaches will be discussed in this course
• In general,
  • The first half of the class will focus on sequential designs and be more Frequentist
  • The second half of the class will focus on adaptive designs and be more Bayesian

Comparing Normal Means with Known Variance

• Let $X_1, X_2, \ldots, X_n$ be i.i.d. $N(\mu_x, \sigma^2)$
• Let $Y_1, Y_2, \ldots, Y_n$ be i.i.d. $N(\mu_y, \sigma^2)$
• $\sigma^2$ known

Null hypothesis

Consider a two-sided test of

\[ H_0 : \mu_x = \mu_y \quad \text{vs.} \quad H_a : \mu_x \neq \mu_y \]

Fixed-sample test

• Collect $n$ subjects in each group
• Test the null hypothesis using the following test statistic

\[ Z_n = \frac{\bar{X}_n - \bar{Y}_n}{\sqrt{2\sigma^2 / n}} \]

• Under the null, $Z_n \sim N(0, 1)$
• Reject if $|Z_n| > Z_{1-\alpha/2}$
• Results in type-1 error rate of $\alpha$

Power

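Power $(1 - \beta)$ for the fixed-sample test can be calculated from the following formula:

\[ 1 - \beta = 1 - \Phi\left( Z_{1-\alpha/2} - \sqrt{n}\, \frac{\delta}{\sqrt{2\sigma^2}} \right) \]

where $\delta = \mu_x - \mu_y$ is the difference in means under the alternative hypothesis. With $\delta = 0.5\sigma$ and $\alpha = 0.05$,
• $n = 50$ results in $1 - \beta = 0.70$
• $n = 100$ results in $1 - \beta = 0.94$

Operating Characteristics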

The fixed-sample design has:
• Total sample size of $2n$
• Type-1 error equal to $\alpha$
• Power for $\delta = 0.5\sigma$ and $\alpha = 0.05$:
  • 0.70 for $n = 50$
  • 0.94 for $n = 100$

Group Sequential Design
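Before turning to the sequential design, the fixed-sample power values quoted above can be checked numerically. The following is a minimal sketch rather than code from the course: the function name `fixed_sample_power` and its parameters are hypothetical, it assumes $\sigma = 1$ so that $\delta = 0.5\sigma$ becomes `delta=0.5`, and it relies only on `statistics.NormalDist` from the Python standard library for $\Phi$ and its quantiles.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: cdf is Phi, inv_cdf gives quantiles


def fixed_sample_power(n, delta, sigma=1.0, alpha=0.05):
    """Approximate power of the two-sided fixed-sample Z-test, n subjects per group.

    Evaluates 1 - Phi(z_{1-alpha/2} - sqrt(n) * delta / sqrt(2 * sigma^2)),
    which ignores the very small chance of rejecting in the wrong direction.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    drift = sqrt(n) * delta / sqrt(2 * sigma**2)
    return 1 - STD_NORMAL.cdf(z_crit - drift)


# Hypothetical usage: the two sample sizes discussed on the slide.
for n in (50, 100):
    print(n, round(fixed_sample_power(n, delta=0.5), 3))
```

Under these assumptions the two calls should land close to the 0.70 and 0.94 reported for $n = 50$ and $n = 100$.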

• We could also consider a group sequential design
• Ideally, we would find a sequential design with
  • The same type-1 error rate and power as the fixed-sample design
  • Smaller expected sample size

One Interim Analysis

• Consider the simplest case of one interim analysis:
  • Collect an initial sample of $n_1$ subjects in each group
  • Test the null hypothesis using $Z_{n_1}$, defined analogously to $Z_n$
  • Reject if $|Z_{n_1}| > Z_{1-\alpha/2}$
  • Otherwise, collect $n - n_1$ additional subjects in each group and test the null hypothesis using $Z_n$

Operating Characteristics
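The one-interim-analysis procedure just described can be sketched in code. This is an illustration under stated assumptions, not the course's implementation: the names `z_statistic` and `two_stage_test` are hypothetical, the data are plain Python lists ordered by enrollment, $\sigma$ is treated as known, and both looks use the same unadjusted critical value, exactly as on the slide.

```python
from math import sqrt
from statistics import NormalDist, fmean


def z_statistic(x, y, sigma):
    """Z = (mean(x) - mean(y)) / sqrt(2 * sigma^2 / m) for equal group sizes m."""
    m = len(x)
    return (fmean(x) - fmean(y)) / sqrt(2 * sigma**2 / m)


def two_stage_test(x, y, n1, sigma, alpha=0.05):
    """Return (reject, subjects_used_per_group) for the one-interim-look design."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Interim analysis: only the first n1 subjects in each group.
    if abs(z_statistic(x[:n1], y[:n1], sigma)) > z_crit:
        return True, n1
    # No early rejection: analyse all n subjects in each group.
    return abs(z_statistic(x, y, sigma)) > z_crit, len(x)
```

For example, `two_stage_test([3.0] * 4, [0.0] * 4, n1=2, sigma=1.0)` stops at the interim analysis, because the interim statistic equals 3 and exceeds the critical value.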

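• Evaluate the operating characteristics of the sequential design described in the previous slide:
  • Type-1 error rate
  • Power
  • Expected sample size
• How?
• Consider the joint distribution of $(Z_{n_1}, Z_n)$

Joint Distribution of $(\bar{X}_{n_1} - \bar{Y}_{n_1}, \bar{X}_{n-n_1} - \bar{Y}_{n-n_1})$

The joint distribution of $(\bar{X}_{n_1} - \bar{Y}_{n_1}, \bar{X}_{n-n_1} - \bar{Y}_{n-n_1})$ is

\[ \begin{pmatrix} \bar{X}_{n_1} - \bar{Y}_{n_1} \\ \bar{X}_{n-n_1} - \bar{Y}_{n-n_1} \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_x - \mu_y \\ \mu_x - \mu_y \end{pmatrix}, \begin{pmatrix} \frac{2\sigma^2}{n_1} & 0 \\ 0 & \frac{2\sigma^2}{n - n_1} \end{pmatrix} \right) \]

Joint Distribution of $(\bar{X}_{n_1} - \bar{Y}_{n_1}, \bar{X}_n - \bar{Y}_n)$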

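It is immediate from the last slide that the joint distribution of $(\bar{X}_{n_1} - \bar{Y}_{n_1}, \bar{X}_n - \bar{Y}_n)$ is

\[ \begin{pmatrix} \bar{X}_{n_1} - \bar{Y}_{n_1} \\ \bar{X}_n - \bar{Y}_n \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_x - \mu_y \\ \mu_x - \mu_y \end{pmatrix}, \begin{pmatrix} \frac{2\sigma^2}{n_1} & \frac{2\sigma^2}{n} \\ \frac{2\sigma^2}{n} & \frac{2\sigma^2}{n} \end{pmatrix} \right) \]

Definitions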
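The step described as immediate can be spelled out. The short derivation below assumes, as the notation suggests, that $\bar{X}_{n-n_1}$ and $\bar{Y}_{n-n_1}$ are the means of the $n - n_1$ subjects per group enrolled after the first $n_1$, so that the two stage-wise differences are independent:

```latex
\begin{align*}
\bar{X}_n - \bar{Y}_n
  &= \frac{n_1}{n}\left(\bar{X}_{n_1} - \bar{Y}_{n_1}\right)
   + \frac{n - n_1}{n}\left(\bar{X}_{n-n_1} - \bar{Y}_{n-n_1}\right), \\
\operatorname{Var}\left(\bar{X}_n - \bar{Y}_n\right)
  &= \frac{n_1^2}{n^2}\cdot\frac{2\sigma^2}{n_1}
   + \frac{(n - n_1)^2}{n^2}\cdot\frac{2\sigma^2}{n - n_1}
   = \frac{2\sigma^2}{n}, \\
\operatorname{Cov}\left(\bar{X}_{n_1} - \bar{Y}_{n_1},\; \bar{X}_n - \bar{Y}_n\right)
  &= \frac{n_1}{n}\operatorname{Var}\left(\bar{X}_{n_1} - \bar{Y}_{n_1}\right)
   = \frac{n_1}{n}\cdot\frac{2\sigma^2}{n_1}
   = \frac{2\sigma^2}{n}.
\end{align*}
```

The covariance equals the variance of the final estimate, which is why the off-diagonal entries match the lower-right entry of the covariance matrix.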

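Define
• $\hat{\delta}_{n_1} = \bar{X}_{n_1} - \bar{Y}_{n_1}$
• $\hat{\delta}_n = \bar{X}_n - \bar{Y}_n$
• $\delta = \mu_x - \mu_y$
• $I_{n_1} = \frac{n_1}{2\sigma^2}$ be the information for $\hat{\delta}_{n_1}$
• $I_n = \frac{n}{2\sigma^2}$ be the information for $\hat{\delta}_n$

Joint Distribution of $(\bar{X}_{n_1} - \bar{Y}_{n_1}, \bar{X}_n - \bar{Y}_n)$

The joint distribution of $(\bar{X}_{n_1} - \bar{Y}_{n_1}, \bar{X}_n - \bar{Y}_n)$ can be re-written as:

\[ \begin{pmatrix} \hat{\delta}_{n_1} \\ \hat{\delta}_n \end{pmatrix} \sim N\left( \begin{pmatrix} \delta \\ \delta \end{pmatrix}, \begin{pmatrix} I_{n_1}^{-1} & I_n^{-1} \\ I_n^{-1} & I_n^{-1} \end{pmatrix} \right) \]

Joint Distribution of $(\hat{\delta}_{n_1}, \hat{\delta}_n)$

That is, $(\hat{\delta}_{n_1}, \hat{\delta}_n)$ follows a bivariate normal distribution with:
• $\hat{\delta}_{n_1} \sim N(\delta, I_{n_1}^{-1})$
• $\hat{\delta}_n \sim N(\delta, I_n^{-1})$
• $\operatorname{Cov}(\hat{\delta}_{n_1}, \hat{\delta}_n) = I_n^{-1}$

A note on the joint distribution of $\hat{\delta}_{n_1}$ and $\hat{\delta}_n$

• Many commonly used estimators follow a similar joint distribution as $\hat{\delta}_{n_1}$ and $\hat{\delta}_n$ asymptotically
• This allows us to develop sequential methodology in a common framework that can be broadly applied
• We will discuss this further in the future

Joint Distribution of $(Z_{n_1}, Z_n)$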

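Under our new notation:
• $Z_{n_1} = \hat{\delta}_{n_1} \sqrt{I_{n_1}}$
• $Z_n = \hat{\delta}_n \sqrt{I_n}$

which results in the following joint distribution for $(Z_{n_1}, Z_n)$

\[ \begin{pmatrix} Z_{n_1} \\ Z_n \end{pmatrix} \sim N\left( \begin{pmatrix} \delta\sqrt{I_{n_1}} \\ \delta\sqrt{I_n} \end{pmatrix}, \begin{pmatrix} 1 & \sqrt{I_{n_1}/I_n} \\ \sqrt{I_{n_1}/I_n} & 1 \end{pmatrix} \right) \]

Finding the type-I error rate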

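We can find the type-I error rate by integrating over the joint distribution of $(Z_{n_1}, Z_n)$

\[ \begin{aligned} \text{type-I error rate} &= 1 - P\left( |Z_{n_1}| < Z_{1-\alpha/2} \text{ and } |Z_n| < Z_{1-\alpha/2} \mid \delta = 0 \right) \\ &= 1 - \int_{Z_{\alpha/2}}^{Z_{1-\alpha/2}} \int_{Z_{\alpha/2}}^{Z_{1-\alpha/2}} f(z_{n_1}, z_n \mid \delta = 0)\, dz_n\, dz_{n_1} \end{aligned} \]

Two chances to make a type-1 error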

Alternately (and perhaps more instructively), we can consider the two ways to make a type-1 error
• Incorrectly reject the null hypothesis at the interim analysis
• Incorrectly reject the null hypothesis at study completion given that you did not stop at the interim analysis
• The type-1 error rate is the probability of incorrectly rejecting at the interim analysis plus the probability of continuing past the interim analysis and then incorrectly rejecting at study completion

Probability of making a type-1 error at the interim analysis

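It is straightforward to calculate the probability of making a type-1 error at the interim analysis

\[ \begin{aligned} P\left( |Z_{n_1}| > Z_{1-\alpha/2} \right) &= 1 - \int_{Z_{\alpha/2}}^{Z_{1-\alpha/2}} \int_{-\infty}^{\infty} f(z_{n_1}, z_n \mid \delta = 0)\, dz_n\, dz_{n_1} \\ &= 1 - \int_{Z_{\alpha/2}}^{Z_{1-\alpha/2}} f(z_{n_1} \mid \delta = 0)\, dz_{n_1} \\ &= \alpha \end{aligned} \]

Probability of making a type-1 error at study completion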

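We can calculate the probability of making a type-1 error at study completion by multiplying the conditional probability of a type-I error at study completion, given that the trial reached full enrollment, by the probability of reaching full enrollment
• Let $C$ be an indicator function taking the value 1 if $|Z_{n_1}| < Z_{1-\alpha/2}$
• The probability of making a type-1 error at study completion is

\[ (1 - \alpha) \left[ 1 - \int_{Z_{\alpha/2}}^{Z_{1-\alpha/2}} f(z_n \mid C = 1, \delta = 0)\, dz_n \right] \]

What is $f(z_n \mid C = 1, \delta = 0)$?

• Marginally, $Z_n$ follows a normal distribution: $Z_n \sim N(\delta\sqrt{I_n}, 1)$
• Allowing for early termination alters the distribution of $Z_n$ conditional on $C = 1$
• What is $f(z_n \mid C = 1, \delta = 0)$?

Density of $Z_n$ conditional on $C = 1$

\[ f(z_n \mid C = 1, \delta) = \frac{ \Phi\left( \frac{\sqrt{I_n}\, Z_{1-\alpha/2} - \sqrt{I_{n_1}}\, z_n}{\sqrt{I_n - I_{n_1}}} \right) - \Phi\left( \frac{\sqrt{I_n}\, Z_{\alpha/2} - \sqrt{I_{n_1}}\, z_n}{\sqrt{I_n - I_{n_1}}} \right) }{ \Phi\left( Z_{1-\alpha/2} - \delta\sqrt{I_{n_1}} \right) - \Phi\left( Z_{\alpha/2} - \delta\sqrt{I_{n_1}} \right) } \cdot \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}\left( z_n - \delta\sqrt{I_n} \right)^2} \]

Setting $\delta = 0$ gives the density under the null hypothesis.

• $f(z_n \mid C = 1, \delta = 0)$ is equal to $f(z_n)$ multiplied by a factor to account for the possibility of early termination
• $f(z_n \mid C = 1, \delta)$ depends on $\delta$, $\alpha$, $I_n$ and $I_{n_1}$
• The most important factor is $I_{n_1}/I_n$, the ratio of information at the interim analysis to the information at study completion

Density of $Z_n$ conditional on $C = 1$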

[Figure: density $f(z_n \mid C = 1, \delta = 0)$ plotted against $z_n$ from −3 to 3, in four panels for $I_{n_1}/I_n$ = 0.25, 0.50, 0.75 and 0.95]

$f(z_n \mid C = 1, \delta = 0)$ as a function of $I_{n_1}/I_n$

• $f(z_n \mid C = 1, \delta = 0)$ has lighter tails than $f(z_n)$
• The mass in the tails decreases as $I_{n_1}/I_n$ increases
• The probability of making a type-1 error at study completion decreases as $I_{n_1}/I_n$ increases

What is the type-1 error rate?

• Type-1 error depends on $I_{n_1}/I_n$
  • $I_{n_1}/I_n = 0.25$: type-1 error equals 0.091
  • $I_{n_1}/I_n = 0.50$: type-1 error equals 0.083
  • $I_{n_1}/I_n = 0.75$: type-1 error equals 0.073

Correcting the type-1 error rate
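The inflated type-1 error rates listed above follow from the two-chances decomposition and the conditional density of $Z_n$ given $C = 1$. The sketch below is one way to check them numerically and is not taken from the course materials: the function names are hypothetical, `ratio` stands for $I_{n_1}/I_n$ (strictly between 0 and 1), and the integral is approximated with a hand-written composite Simpson rule using only the Python standard library.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution


def conditional_density(z_n, ratio, alpha=0.05):
    """f(z_n | C = 1, delta = 0), with ratio = I_{n1} / I_n."""
    c = PHI.inv_cdf(1 - alpha / 2)
    rho = sqrt(ratio)        # correlation between Z_{n1} and Z_n
    scale = sqrt(1 - ratio)  # conditional sd of Z_{n1} given Z_n
    p_continue_given_zn = (PHI.cdf((c - rho * z_n) / scale)
                           - PHI.cdf((-c - rho * z_n) / scale))
    # Under the null, P(C = 1) = 1 - alpha.
    return p_continue_given_zn * PHI.pdf(z_n) / (1 - alpha)


def simpson(f, a, b, steps=2000):
    """Composite Simpson rule; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def unadjusted_type1_error(ratio, alpha=0.05):
    """alpha + (1 - alpha) * P(|Z_n| > z_{1-alpha/2} | C = 1, delta = 0)."""
    c = PHI.inv_cdf(1 - alpha / 2)
    inside = simpson(lambda z: conditional_density(z, ratio, alpha), -c, c)
    return alpha + (1 - alpha) * (1 - inside)


for ratio in (0.25, 0.50, 0.75):
    print(ratio, round(unadjusted_type1_error(ratio), 3))
```

The printed values should be close to the type-1 error rates on the slide, and they decrease as the interim look moves later in the trial.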

• The type-1 error rate is inflated regardless of $I_{n_1}/I_n$
• How do we correct the type-1 error rate?
• The simplest approach is to find $\alpha^*$ such that the overall type-1 error rate equals $\alpha$
• These are known as Pocock stopping boundaries

Corrected stopping boundaries

• $\alpha^*$ for various values of $I_{n_1}/I_n$
  • $I_{n_1}/I_n = 0.25$: $\alpha^* = 0.027$
  • $I_{n_1}/I_n = 0.50$: $\alpha^* = 0.029$
  • $I_{n_1}/I_n = 0.75$: $\alpha^* = 0.033$

Type-1 error summary
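The corrected level $\alpha^*$ can be found with a one-dimensional root search. The sketch below uses bisection, which is an illustrative choice and not necessarily how the values above were produced; the function names are hypothetical, `ratio` stands for $I_{n_1}/I_n$, and the bivariate normal probability is approximated with a midpoint rule over $z_{n_1}$.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution


def overall_type1_error(nominal_alpha, ratio, steps=1000):
    """Overall type-1 error when both looks reject for |Z| > z_{1 - nominal_alpha/2}.

    Computed as 1 - P(|Z_{n1}| < c, |Z_n| < c | delta = 0), integrating over z_{n1}.
    """
    c = PHI.inv_cdf(1 - nominal_alpha / 2)
    rho, scale = sqrt(ratio), sqrt(1 - ratio)
    h = 2 * c / steps
    both_inside = 0.0
    for i in range(steps):
        z1 = -c + (i + 0.5) * h
        # Given Z_{n1} = z1 and delta = 0, Z_n ~ N(rho * z1, 1 - ratio).
        p_zn_inside = (PHI.cdf((c - rho * z1) / scale)
                       - PHI.cdf((-c - rho * z1) / scale))
        both_inside += PHI.pdf(z1) * p_zn_inside * h
    return 1 - both_inside


def adjusted_alpha(ratio, target=0.05, tol=1e-6):
    """Bisection for alpha* such that the overall type-1 error equals target."""
    # target / 2 is always conservative (Bonferroni); target itself is too liberal.
    lo, hi = target / 2, target
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if overall_type1_error(mid, ratio) > target:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


for ratio in (0.25, 0.50, 0.75):
    print(ratio, round(adjusted_alpha(ratio), 3))
```

Bisection works here because the overall error rate increases monotonically with the nominal level used at each look.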

• Interim looks inflate the type-1 error rate
• The amount of inflation depends on the proportion of information available at the interim analysis
• We can correct the type-1 error rate by altering the stopping boundaries

Power

• Recall that for $\delta = 0.5\sigma$ and $\alpha = 0.05$:
  • 0.70 power with $n = 50$
  • 0.94 power with $n = 100$
• How does altering the stopping boundaries impact power?

Power

• For $I_{n_1}/I_n = 0.50$, $\alpha^* = 0.029$ results in a type-1 error of 0.05
• Power to detect $\delta = 0.5\sigma$ for $n = 50$ is 0.66
• Power to detect $\delta = 0.5\sigma$ for $n = 100$ is 0.92

Power

• Power has decreased. Why?
  • More stringent criteria for rejecting the null hypothesis
• We have to increase the maximum sample size to assure adequate power
• Keeping $I_{n_1}/I_n = 0.50$
  • $n = 56$ results in power of 0.70
  • $n = 110$ results in power of 0.94

Power as function of $I_{n_1}/I_n$
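The power of the two-stage design, for any information ratio, is one minus the probability that both test statistics stay inside the boundaries under the alternative. The sketch below is a numerical illustration under stated assumptions rather than the method used for the slides: `two_stage_power` is a hypothetical name, `ratio` stands for $I_{n_1}/I_n = n_1/n$, $\sigma$ defaults to 1, the nominal level defaults to the $\alpha^* = 0.029$ quoted above, and the integral over $z_{n_1}$ uses a midpoint rule.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution


def two_stage_power(n, delta, ratio=0.5, nominal_alpha=0.029, sigma=1.0, steps=1000):
    """P(reject at either look) with n subjects per group at the final analysis."""
    c = PHI.inv_cdf(1 - nominal_alpha / 2)
    info_n = n / (2 * sigma**2)
    mean_1 = delta * sqrt(ratio * info_n)  # E[Z_{n1}] = delta * sqrt(I_{n1})
    mean_n = delta * sqrt(info_n)          # E[Z_n]    = delta * sqrt(I_n)
    rho, scale = sqrt(ratio), sqrt(1 - ratio)
    h = 2 * c / steps
    accept_both = 0.0
    for i in range(steps):
        z1 = -c + (i + 0.5) * h
        # Z_n given Z_{n1} = z1 is normal with this mean and variance 1 - ratio.
        cond_mean = mean_n + rho * (z1 - mean_1)
        p_zn_inside = (PHI.cdf((c - cond_mean) / scale)
                       - PHI.cdf((-c - cond_mean) / scale))
        accept_both += PHI.pdf(z1 - mean_1) * p_zn_inside * h
    return 1 - accept_both


for n in (50, 56, 100, 110):
    print(n, round(two_stage_power(n, delta=0.5), 2))
```

Calling the function with `delta=0.0` returns the overall type-1 error instead of the power, which gives a quick sanity check on the boundary.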

• Power increases as $I_{n_1}/I_n$ increases
• For $\delta = 0.5$ and $n = 100$
  • $I_{n_1}/I_n = 0.25$ results in a power of 0.91
  • $I_{n_1}/I_n = 0.75$ results in a power of 0.93

Maximum sample size as function of $I_{n_1}/I_n$

• The sample size inflation factor is also related to $I_{n_1}/I_n$
• For $\delta = 0.5$ and overall type-1 error of 0.05
  • $I_{n_1}/I_n = 0.25$ requires a maximum sample size of 113 to maintain power = 0.94
  • $I_{n_1}/I_n = 0.75$ requires a maximum sample size of 106 to maintain power = 0.94

Summary

• We have developed a two-stage design with the same type-I error and power as a fixed-sample design
  • Changed $\alpha$ to $\alpha^*$ to maintain type-I error rate
  • Increased maximum sample size to maintain power
• What is the benefit?

Sample Size

• Sample size in a sequential study is a random variable
• The sample size is either $n_1$ or $n$ depending on $Z_{n_1}$
• We evaluate the benefit of a sequential design by considering the expected sample size

Expected Sample Size

The expected sample size is calculated as

\[ E(SS) = n_1 \left( 1 - P(C = 1) \right) + n \cdot P(C = 1) \]

where $C$ is an indicator that the trial reached full enrollment

\[ P(C = 1) = \int_{Z_{\alpha/2}}^{Z_{1-\alpha/2}} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}\left( z_{n_1} - \delta\sqrt{I_{n_1}} \right)^2} dz_{n_1} \]

Note that the expected sample size depends on $\delta$ and will be different for the null and alternative hypotheses

Expected Sample Size

• For our two-stage design, we have
  • Overall type-1 error rate equal to 0.05
  • Power of 0.94
  • Maximum sample size of 110
  • $I_{n_1}/I_n = 0.5$
• What is the expected sample size?
  • Under the null, E(SS) = 108
  • Under the alternative of $\delta = 0.5$, E(SS) = 73

Expected Sample Size as function of $I_{n_1}/I_n$
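The expected sample sizes quoted above for the design with maximum sample size 110 and $I_{n_1}/I_n = 0.5$ can be reproduced directly from the formula for E(SS). This is a minimal sketch with hypothetical names: it assumes $\sigma = 1$, $n_1 = 55$ subjects per group at the interim analysis, and boundaries based on the nominal level $\alpha^* = 0.029$.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution


def expected_sample_size(n, n1, delta, nominal_alpha, sigma=1.0):
    """E(SS) = n1 * (1 - P(C = 1)) + n * P(C = 1), per group."""
    c = PHI.inv_cdf(1 - nominal_alpha / 2)
    drift = delta * sqrt(n1 / (2 * sigma**2))  # delta * sqrt(I_{n1})
    # P(C = 1): the interim statistic stays inside (-c, c).
    p_continue = PHI.cdf(c - drift) - PHI.cdf(-c - drift)
    return n1 * (1 - p_continue) + n * p_continue


for delta in (0.0, 0.5):
    ess = expected_sample_size(n=110, n1=55, delta=delta, nominal_alpha=0.029)
    print(delta, round(ess, 1))
```

With these inputs the results should round to about 108 under the null and 73 under the alternative, in line with the slide.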

• Expected sample size is also related to $I_{n_1}/I_n$
• Assuming overall $\alpha$ equal to 0.05 and power equal to 0.94
• For $I_{n_1}/I_n = 0.25$
  • Under the null, E(SS) = 110
  • Under the alternative of $\delta = 0.5$, E(SS) = 82
• For $I_{n_1}/I_n = 0.75$
  • Under the null, E(SS) = 105
  • Under the alternative of $\delta = 0.5$, E(SS) = 83

Comparison

• Fixed-sample design
  • $\alpha = 0.05$
  • Power = 0.94
  • N = 100
• Group sequential design
  • $\alpha = 0.05$
  • Power = 0.94
  • E(N) = 108 under null
  • E(N) = 73 under alternative
• Which is the better design?
• Is the slight increase in sample size under the null (i.e. the worst case scenario) worth a substantial reduction under the alternative?

Summary

• Adding interim analyses increases the type-I error rate
• This can be fixed by changing the stopping boundaries
• Correcting the stopping boundaries results in a decrease in power
• Increase the maximum sample size to achieve the desired power
• Sample size is now stochastic
• Sequential designs result in dramatic reductions in the expected sample size in some cases