SMARTs: Part I

Eric B. Laber

Department of Statistics, North Carolina State University

April 2019 SAMSI

Warm up part I: quiz!

• Discuss with your stat buddy:
  - What is a clinical trial? What’s the best way to quantify the shame you should feel if you don’t know?
  - What is a power calculation?
  - What are common complicating statistical issues associated with clinical trials?
• True or false
  - Sequential Hierarchically Assigned Randomization Trials are the gold standard design for estimation and evaluation of treatment regimes.
  - Susan Murphy, who authored several seminal papers on sequential clinical trial design, is affectionately known as ‘Smurphy’ to her friends and colleagues.
  - The legend of Santa Claus may be partially based on Siberian shamans consuming psychedelic mushrooms with reindeer.¹

¹ https://www.npr.org/2010/12/24/132260025/did-shrooms-send-santa-and-his-reindeer-flying

Starting easy

Instead of having “answers” on a math test, they should just call them “impressions,” and if you got a different “impression,” so what, can’t we all be brothers? – Pythagoras

Precision medicine

• “The right treatment for the right patient at the right time.” –Mantra of precision medicine advocates
• Widely recognized that best clinical care requires treatment decisions tailored to individual patient characteristics
• Improve patient outcomes, reduce cost and patient burden

Precision medicine background

• Patient heterogeneity
  - Demographic
  - Physiological
  - Medical history/comorbidities
  - Genetic/genomic factors
  - Environment
  - ...
• Clinicians tailor therapy to individual patient characteristics
  - Evolution of health status
  - Individual patient preference
  - Local availability
  - Cost
  - ...

Precision medicine background cont’d

• Clinical decision making
  - Synthesis of available information
  - Expert judgment
  - Treatment guidelines
• Precision medicine
  - Data-driven, aka evidence-based
  - Seeks to inform, not dictate, decision making

Treatment regimes

• Formalize clinical decision making via a sequence of decision rules
  - One rule per stage of clinical intervention
  - Maps current patient information to a recommended treatment
• Optimal regime maximizes the expected value of some cumulative clinical outcome if applied to the population of interest

Ex. Treatment regime: mHealth for PTSD in cancer patients (PI: S. Smith)

First-stage decision rule
  If distress ≥ 3 then: Cancer Distress Coach (CDC)
  Else if PTSD symptom score ≥ 20 then: CDC
  Else: usual care

Second-stage decision rule
  If responder then: continue first-stage treatment
  Else if using CDC and PTSD change ≥ 3 then: add mCoaching
  Else if using CDC and distress ≥ 4 then: add FaceTime CBT
  Else: FaceTime CBT only
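Each decision rule is just a function from the current history to a treatment, so the regime above can be written as two ordinary functions. This is a minimal sketch under some assumptions. The stage-2 `distress` argument is taken to be the updated distress score. "Add X" is read as X given alongside CDC. The function and argument names are hypothetical and do not come from the study.

```python
# Hypothetical encoding of the two-stage mHealth-for-PTSD regime above.
# Thresholds and treatment names follow the slide; function/argument names are illustrative.

CDC = "Cancer Distress Coach"
USUAL_CARE = "usual care"


def first_stage_rule(distress: float, ptsd_score: float) -> str:
    """Stage-1 rule: map baseline information to a recommended treatment."""
    if distress >= 3:
        return CDC
    if ptsd_score >= 20:
        return CDC
    return USUAL_CARE


def second_stage_rule(responder: bool, first_txt: str,
                      ptsd_change: float, distress: float) -> str:
    """Stage-2 rule: uses the updated history, including the stage-1 treatment."""
    if responder:
        return "continue " + first_txt
    if first_txt == CDC and ptsd_change >= 3:
        return CDC + " + mCoaching"
    if first_txt == CDC and distress >= 4:
        return CDC + " + FaceTime CBT"
    return "FaceTime CBT only"


if __name__ == "__main__":
    a1 = first_stage_rule(distress=2, ptsd_score=25)
    a2 = second_stage_rule(responder=False, first_txt=a1, ptsd_change=1, distress=5)
    print(a1, "->", a2)  # Cancer Distress Coach -> Cancer Distress Coach + FaceTime CBT
```

Writing the rules this way makes explicit that the second rule depends on the first-stage treatment, which is part of the patient history at stage 2.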

Key ingredients

• Critical decision points
  - Opportunities to change course of treatment
  - Fixed in calendar time or outcome driven
• Patient characteristics
  - Up-to-date history
  - Personal preferences
• Treatment options
  - Depend on time and patient history
  - May also depend on cost, resource availability, etc.

Data sources

• Observational studies
  - Cohort study, e.g., Framingham
  - EHR data
• Randomized clinical trials
  - K-arm randomized trial
  - Sequential Multiple Assignment Randomized Trials
  - Micro-randomized trials

SMARTs

• Sequential Multiple Assignment Randomized Trials (SMARTs)
  - Gold standard randomized trial design for evaluating treatment sequences (seminal paper: Murphy 2005 SIM)
  - Basic idea: randomize treatment assignment at critical decision points where there is equipoise
• Motivation for SMARTs
  - Avoid causal issues associated with observational longitudinal data
  - Efficiently compare partial and full treatment sequences
  - Estimate optimal treatment regimes
  - Better mimic clinical practice

Ex. SMART: mHealth for PTSD

[Figure: SMART schematic. Initial randomization (R) to Treatment A (Distress Coach) or Treatment B (Standard Care), followed by response assessment. Responders continue (Distress Coach, or follow-up only under standard care); non-responders are re-randomized (R) to Treatment AA (add mCoaching) or Treatment AB (FaceTime CBT), or to Treatment BA (DC + mCoaching) or Treatment BB (FaceTime CBT).]

Ex. SMART: mHealth for PTSD cont’d

• Additional trial details
  - Response status assessed at 4 weeks
  - Response criterion: PTSD symptoms exceed threshold
  - Primary outcome: PTSD symptoms

Ex. SMART: ADHD (PI: Pelham)

[Figure: SMART schematic. Initial randomization (R) to Treatment A (low-intensity BMOD) or Treatment B (low-intensity MEDS), followed by response assessment. Non-responders are re-randomized (R): under A to Treatment AA (augment with MEDS) or Treatment AB (intensify BMOD); under B to Treatment BA (augment with BMOD) or Treatment BB (intensify MEDS).]

Ex. SMART: ADHD (PI: Pelham) cont’d

• Additional trial details
  - Response status assessed each month
  - Response criterion: teacher-reported classroom performance
  - Primary outcomes: parent- and teacher-reported outcomes, academic assessments, rule violations

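Ex. SMART: Zika (PI: S. Becker-Dreps)

[Figure: SMART schematic. Initial randomization (R) to Treatment (passive messaging + insecticide + condoms) or Active control (insecticide + condoms), followed by response assessment. Responders continue with no change; non-responders are re-randomized (R) to Intensify (add active messaging) or Augment (in-home visits).]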

Ex. SMART: Zika cont’d

• Additional trial details
  - Response status assessed at first trimester clinic visit
  - Response criterion: patient-reported compliance
  - Primary outcome: Zika infection at full term

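Ongoing: Trial design for children with epilepsy

[Figure: trial schematic. After a run-in period (Treatment 0), adherence is assessed. Patients without high adherence are randomized (R) to Treatment 1 (E+ADR+IAF) or Treatment 2 (E+ADR). Treatment 1 non-responders are re-randomized (R) to continue or to Treatment 3 (E+ADR+IAF+PS). Patients with high adherence receive no further treatment.]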

Ongoing: Trial design for children with epilepsy

• Multiple outcomes
  - Adherence at 8, 14, and 20 months
  - Seizures in months 8-14
  - QOL in month 14
  - Healthcare utilization in months 8-20

Randomization

• Three embedded regimes
  - A: E+ADR+IAF and add PS if non-response
  - B: E+ADR+IAF and continue if non-response
  - C: E+ADR and continue until end of study
• Block-permuted design among embedded treatment regimes
  - Balance within strata (base adherence x age x severity)

Strata   Block 1   Block 2   Block 3   ···   Block J
1        ACB       CBA       ACB       ···   CBA
2        ABC       CBA       BCA       ···   CBA
3        BAC       BAC       CAB       ···   ABC
...
8        ABC       ACB       CAB       ···   CBA
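A table like the one above can be generated by shuffling one block of the three regimes at a time, independently within each stratum. This is a minimal sketch under stated assumptions. Each of the three stratification factors is taken to be dichotomized, which gives the 8 strata. The "low"/"high" labels, the seed, and all function names are hypothetical.

```python
import itertools
import random

REGIMES = ("A", "B", "C")  # embedded regimes A, B, C from the slide


def strata_labels():
    """Eight strata, assuming each factor (adherence, age, severity) is dichotomized."""
    return list(itertools.product(("low", "high"), repeat=3))


def permuted_blocks(n_blocks, rng):
    """n_blocks blocks, each a random permutation of all three regimes."""
    blocks = []
    for _ in range(n_blocks):
        block = list(REGIMES)
        rng.shuffle(block)
        blocks.append("".join(block))
    return blocks


def randomization_table(n_blocks, seed=2019):
    """One independent sequence of permuted blocks per stratum (seeded for reproducibility)."""
    rng = random.Random(seed)
    return {stratum: permuted_blocks(n_blocks, rng) for stratum in strata_labels()}


def assignment_list(table, stratum):
    """Flatten a stratum's blocks: the k-th patient enrolled in the stratum gets entry k."""
    return list("".join(table[stratum]))


if __name__ == "__main__":
    table = randomization_table(n_blocks=4)
    for stratum, blocks in table.items():
        print(stratum, " ".join(blocks))
```

Because every block contains each regime exactly once, the three embedded regimes are exactly balanced within a stratum after each completed block. The whole schedule is fixed at baseline, which is what makes this approach easy to run on standard randomization platforms.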

Common concerns with SMARTs

• Many design choices ⇒ unwieldy
• Splitting data ⇒ loss of power
• Involves subgroup analyses ⇒ complicated inference

We shall see that these concerns are mostly unfounded.

Warm-up part II: toy study

• Prompt: researchers considering a two-stage SMART to evaluate two candidate first-stage treatments and two salvage therapies for non-responders. Responders will all receive the same maintenance therapy.
• Sketch a SMART assuming
  - First-stage txts: (i) new active txt and (ii) std care
  - Non-responders: (i) salvage 1 and (ii) salvage 2
  - Responders: maintenance
  - Add’l details
    - Response assessed at four weeks
    - Justified by clinical application of std of care

Warm-up part II: toy study cont’d

[Figure: two-stage SMART. Initial randomization (R) to Treatment 0 (new active treatment) or Treatment 1 (standard of care), followed by response assessment. Responders receive Treatment 2 (maintenance); non-responders are re-randomized (R) to Treatment 3 (Salvage 1) or Treatment 4 (Salvage 2).]

A slight line or fold in time

Warm-up part II: toy study cont’d

• Suppose that the optimal duration of the new treatment is unknown and deemed of primary interest
• Want to compare waiting 4 and 8 weeks before assessing response under the new treatment
  - All other aspects are the same
  - Draw your design!

Warm-up part II: toy study cont’d

[Figure: initial randomization (R) to Treatment 0 (new active treatment, response assessed at 4 weeks), Treatment 1 (new active treatment, response assessed at 8 weeks), or Treatment 2 (standard of care, response assessed at 4 weeks). In each arm, responders receive Treatment 3 (maintenance) and non-responders are re-randomized (R) to Treatment 4 (Salvage 1) or Treatment 5 (Salvage 2).]

Thoughts? Feelings? Don’t forget. I care about you.

Warm-up part II: toy study cont’d

• Suppose that the researchers determine that the evaluation of salvage therapies under the standard of care arm is of less interest than the comparison of response rates across the new treatment at 4 and 8 weeks and standard of care at 4 weeks.
• Draw it!

Warm-up part II: toy study cont’d

[Figure: initial randomization (R) to Treatment 1 (new active treatment, response assessed at 4 weeks), Treatment 2 (new active treatment, response assessed at 8 weeks), or Treatment 3 (active control). Under the new treatment, responders receive Treatment 4 (maintenance) and non-responders are re-randomized (R) to Treatment 5 (Salvage 1) or Treatment 6 (Salvage 2).]

Warm-up part II: will it ever end?!

• Suppose that researchers decide that in clinical practice they would like to assess response under the new treatment at four weeks and then have the option to recommend a salvage therapy or to recommend staying on the new treatment for another 4 weeks (8 total) before re-assessing.
• Draw it!

Warm-up part II: toy study cont’d

[Figure: initial randomization (R) to Treatment 1 (new active treatment given for 4 weeks) or Treatment 3 (active control). Under Treatment 1, responders receive Treatment 4 (maintenance) and non-responders are randomized (R) to Treatment 5 (Salvage 1), Treatment 6 (Salvage 2), or Treatment 2 (new active treatment given an additional 4 weeks). After Treatment 2, responders receive maintenance and non-responders are randomized (R) to Salvage 1 or Salvage 2.]

Warm-up part II: wrap-up

• Toy example illustrates how science drives design
• In practice, this process can take weeks or more
  - Cannot test every question in a single design ⇒ prioritize
  - Allowing changing treatment options, dosages, and response criteria can lead to many more permutations

Choosing treatment options

• If feasible treatments are well established by clinical science, no further consideration is necessary
• Common cases where the treatment is unknown²
  - Optimal treatment sequencing unknown
  - Subgroup needing expensive treatment unknown
  - Optimal dosage unknown

² Mathematically, these are going to end up looking very similar. However, clinical and intervention scientists often see these as being quite different.

Design pattern: switch away from the loser

• Try something; if it doesn’t work, then try something else
• Intervention scientists have a set of candidate treatments that they’ll apply in sequence until one works
• Goal: identify the best sequence of treatments
  - Primary analysis may be one-sequence-fits-all and secondary analysis individualized to patient characteristics
  - Target may be non-responders to standard treatment

Design pattern: switch away from the loser ex.

[Figure: KIU trial schematic. Burn-in Treatment 0 (Queer Sex Ed), followed by response assessment; non-responders are randomized (R) to Treatment 1 (Keep it Up!) or Treatment 2 (attention control). Further response assessments and randomizations lead to Treatment 3 (Keep it Up! booster), Treatment 4 (control booster), Treatment 5 (Keep it Up! booster and YMHP), Treatment 6 (YMHP), or Treatment 7 (no further treatment).]

KIU trial discussion

• Ongoing trial running in the US and PR
  - Safe sexual practices via sex ed for adolescent MSM
  - Reach subjects in stigmatized populations
  - Scalable to large public health scale
• Burn-in treatment: Queer Sex Ed (QSE, std treatment) followed by treatment changes for non-responders
• Interesting additional features
  - Not all subjects have had sexual debut
  - Response rates to QSE unknown ⇒ complicates power

Design pattern: switch away from loser generic

[Figure: generic switch-away-from-the-loser schematic with three treatments (0, 1, 2). Patients are randomized (R) to an initial treatment and responders receive no further treatment. Non-responders are randomized (R) between the two remaining treatments, and non-responders to the second treatment receive the last remaining treatment.]
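With three candidate treatments, every ordering of the treatments is one embedded "switch away from the loser" sequence. This minimal sketch lists those sequences. It also computes the probability that a patient who never responds is assigned a particular full sequence. It assumes uniform randomization among the remaining treatments at every switch, which the slide does not specify. The function names are hypothetical.

```python
import math
from fractions import Fraction
from itertools import permutations

TREATMENTS = (0, 1, 2)


def embedded_sequences(treatments=TREATMENTS):
    """Each ordering is one embedded sequence: start on the first treatment,
    switch to the second on non-response, then to the third."""
    return list(permutations(treatments))


def prob_follows_full_sequence(treatments=TREATMENTS):
    """Probability that a never-responder is assigned a given full sequence,
    assuming uniform randomization among the remaining treatments at each
    switch (the final switch is forced): 1/k * 1/(k-1) * ... * 1 = 1/k!."""
    return Fraction(1, math.factorial(len(treatments)))


if __name__ == "__main__":
    for seq in embedded_sequences():
        print(" -> ".join(map(str, seq)))
    print(prob_follows_full_sequence())  # 1/6
```

For three treatments this gives 3! = 6 embedded sequences. A never-responder follows any particular one with probability 1/6, so under these assumptions such a patient would carry an inverse weight of 6 in the IPW estimators used later for comparing regimes.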

Design pattern: stepped care

• Goal: cost-effective txt regime that gives expensive/intensive txts only if, when, and to whom they are needed
• Step-up: start on cheap txt, escalate as needed
• Step-down: start on expensive txt, de-escalate when possible

Design pattern: step-up generic

[Figure: generic step-up schematics, panels (a) and (b). First-stage inexpensive treatments (Inexpensive treatment I, II) are followed by response assessment. Responders continue; non-responders are randomized (R) among expensive treatments (Expensive treatment I to IV).]

Design pattern: step-down generic

[Figure: generic step-down schematics, panels (a) and (b).]

Design pattern: step-down ex.

[Figure: AllyQuest SMART. Initial randomization (R) to Treatment 0 (AllyQuest+), Treatment 1 (AllyQuest), or Treatment 2 (Control). AllyQuest+ non-responders continue AllyQuest+, and AllyQuest+ responders are randomized (R) to continue AllyQuest+ or step down to AllyQuest. AllyQuest non-responders step up to AllyQuest+, and AllyQuest responders continue AllyQuest.]

AQ discussion

• Ongoing trial at UNC
• Goal: medication adherence among HIV+ adolescents
  - Mechanism: gamification and removal of stigma
• mHealth and eHealth increasingly used for precision medicine in situ, i.e., when and where interventions are needed

Design pattern: dose adjustments

• Multicomponent/multimodal treatments common in cancer
• Ex. UF colorectal cancer screening trial
  - Individualize message encouraging FIT screening
  - Ten binary factors and three ternary (three-level) factors
  - Must balance message overload with engagement
• Primary binary factor of interest: virtual human vs. text

Design pattern: dose adjustments generic

[Figure: generic dose-adjustment SMART with four binary components (CBT, GT, MS-A, MS-B). Patients are randomized (R) to one of the 16 on/off (0/1) combinations, followed by response assessment. Responders continue the initial treatment; non-responders are re-randomized (R) among the 16 combinations.]

Design pattern: dose adjustment cont’d

• All possible level combinations: $2^L$ with $L$ binary factors
• Full factorial: assign all possible level combinations
  - Identify all interactions among factors
  - Requires potentially enormous sample size
• Fractional factorial: subset of possible level combinations
  - Fewer unique ‘treatments’ ⇒ smaller sample size
  - Some interactions cannot be estimated (aliased)
  - Efficient fractional factorial designs for SMARTs not well-studied, especially for estimation of optimal treatment regimes
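To make the full vs. fractional contrast concrete, this sketch enumerates the $2^4 = 16$ combinations of the four generic components. It then keeps a standard textbook half fraction with defining relation I = CBT·GT·MS-A·MS-B. That half-fraction construction is a choice made here for illustration; the slides do not prescribe a particular fraction. The function names are hypothetical.

```python
from itertools import product

FACTORS = ("CBT", "GT", "MS-A", "MS-B")  # components from the generic figure


def full_factorial(factors=FACTORS):
    """All 2**L on/off (0/1) combinations of L binary factors."""
    return [dict(zip(factors, levels))
            for levels in product((0, 1), repeat=len(factors))]


def half_fraction(factors=FACTORS):
    """2**(L-1) runs with defining relation I = (product of all factors):
    keep runs whose +/-1-coded levels multiply to +1."""
    runs = []
    for run in full_factorial(factors):
        sign = 1
        for level in run.values():
            sign *= 1 if level == 1 else -1  # code 0 -> -1, 1 -> +1
        if sign == 1:
            runs.append(run)
    return runs


if __name__ == "__main__":
    print(len(full_factorial()), len(half_fraction()))  # 16 8
```

The half fraction halves the number of distinct first-stage "treatments" to 8. The price is aliasing: each main effect is confounded with a three-way interaction, and two-factor interactions are confounded in pairs (e.g., CBT×GT with MS-A×MS-B).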

Warm up: quiz!

• Consider the AllyQuest design shown below. Suppose that researchers want to compare the mean outcomes under the embedded regimes (e1) give AllyQuest+ throughout the entire follow-up period, and (e2) assign AllyQuest initially and AllyQuest+ to non-responders. Let ρ denote the probability of response under AllyQuest and ρ+ the probability of response under AllyQuest+. If initial treatments are randomized equally and there are n patients enrolled in the trial, what is the expected number of patients who will be consistent with (e1) and with (e2)?

[Figure: AllyQuest step-down design. Treatment 0 (AllyQuest+), Treatment 1 (AllyQuest), Treatment 2 (Control); AllyQuest+ non-responders continue AllyQuest+ and AllyQuest+ responders are randomized (R) to continue or step down to AllyQuest; AllyQuest non-responders step up to AllyQuest+ and AllyQuest responders continue AllyQuest.]

Warm up: quiz! notes
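One way to check an answer to the quiz is to write the expected counts as a function of the design quantities. This is a sketch under assumptions the quiz leaves open. With k equally randomized initial arms, (e1) has expected count (n/k){(1 − ρ+) + ρ+ p}, where p is the probability that an AllyQuest+ responder is randomized to continue (p = 1/2 here by assumption). (e2) has expected count n/k. Whether Control counts as an initial treatment is not stated, so k is left as a parameter. The function name is hypothetical.

```python
def expected_consistent(n, rho, rho_plus, n_initial_arms, p_continue=0.5):
    """Expected numbers of patients consistent with (e1) and (e2).

    Assumptions (hypothetical; not stated on the slide):
    - n patients split equally over n_initial_arms first-stage arms;
    - AllyQuest+ non-responders continue AllyQuest+;
    - AllyQuest+ responders are re-randomized, continuing with prob p_continue;
    - AllyQuest non-responders step up to AllyQuest+, responders continue AllyQuest.
    """
    per_arm = n / n_initial_arms
    # (e1) AllyQuest+ throughout: all AQ+ non-responders, plus AQ+ responders
    # who are randomized to continue AllyQuest+.
    e1 = per_arm * ((1 - rho_plus) + rho_plus * p_continue)
    # (e2) AllyQuest, then AllyQuest+ for non-responders: non-responders step up
    # and responders stay on AllyQuest, so every AllyQuest starter is consistent.
    e2 = per_arm * ((1 - rho) + rho)
    return e1, e2


if __name__ == "__main__":
    # Three initial arms (AllyQuest+, AllyQuest, Control) and 300 patients.
    print(expected_consistent(300, rho=0.3, rho_plus=0.4, n_initial_arms=3))
```

Note that ρ cancels for (e2). Every patient who starts on AllyQuest is consistent with (e2) whether or not they respond, while only part of the AllyQuest+ responders remain consistent with (e1).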

Stratification

• Two-minute review
  - Randomization ⇒ balance of prognostic factors on average across treatment conditions
  - In small samples, severe imbalance can occur by chance; stratifying randomization by key prognostic factors can ensure balance on these variables and reduce variance
• In a SMART one can stratify each randomization separately, but this is generally logistically complicated
  - Requires re-randomization on the fly
  - Not conceptually complicated, but implementation is non-trivial
• Easier alternative: stratified randomization to the embedded regimes
  - Done at baseline with existing software/platforms
  - Analytically equivalent to sequential randomization

Intermission

If you ever teach a yodeling class, probably the hardest thing is to keep the students from just trying to yodel right off. You see, we build to that. –Emil Whilhem Richterich

Warm-up quiz

• Explain to your stat buddy:
  - What is inverse probability weighting and where was it first used?
  - What are the standard inputs/procedures for sizing a clinical trial?
  - What is the efficiency-ethics trade-off?
• True or false
  - IPWE of value can be expressed as MLE in some gen models
  - Cohen’s d was Lyor Cohen’s proposed name for Def Jam Records
  - Trying to find out the weight of the largest domesticated cat in the world is unbelievably frustrating.

Setup and notation

• Consider a two-stage SMART with finite treatments
  - Generalizes trivially to multistage txts
  - May adapt coding of txts as it suits us
  - In some designs part of the patient data is structurally missing
• Trial will generate data $\{(X_{1,i}, A_{1,i}, X_{2,i}, A_{2,i}, Y_i)\}_{i=1}^{n}$
  - $X_t \in \mathbb{R}^{p_t}$ patient info at time $t$
  - $A_t \in \mathcal{A}_t = \{1, \ldots, K_t\}$ txt at time $t$
  - $Y \in \mathbb{R}$ outcome coded so that higher is better
• History: $H_1 = X_1$ and $H_2 = (X_1^\intercal, A_1, X_2^\intercal)^\intercal$

Sizing for first-stage response rate

[Figure: PCST SMART. Initial randomization (R) to Treatment 0 (PCST-Full) or Treatment 1 (PCST-Brief), followed by response assessment. Responders and non-responders are re-randomized (R) among second-stage options including Treatment 2 (PCST-Full maintenance), Treatment 3 (no further treatment), Treatment 4 (PCST-Plus), Treatment 5 (PCST-Brief maintenance), and Treatment 0 (PCST-Full).]

Sizing for first-stage response rate

[Figure: first stage of the same design: randomization (R) to Treatment 0 (PCST-Full) or Treatment 1 (PCST-Brief), each followed by response assessment.]

Sizing for first-stage response rate cont’d

• Let $R = R(H_2) \in \{0, 1\}$ be the indicator of response
• Let $a_1, a_1' \in \mathcal{A}_1$ be two distinct initial txts
• Potential outcome $R^*(a_1) = R\{H_2^*(a_1)\}$; test
$$H_0: \mathbb{E}R^*(a_1) = \mathbb{E}R^*(a_1')$$
against a two-sided alternative
$$H_1: \mathbb{E}R^*(a_1) \neq \mathbb{E}R^*(a_1')$$

Sizing for first-stage response rate cont’d

• Suppose we want to ensure sufficient power when $|\mathbb{E}R^*(a_1) - \mathbb{E}R^*(a_1')| > \delta$, for a clinically relevant difference $\delta > 0$
• Inverse probability weighted estimator
$$\hat{p}_{a_1,n} = \frac{\mathbb{P}_n\left\{ R\, 1(A_1 = a_1) / P(A_1 \mid H_1) \right\}}{\mathbb{P}_n\left\{ 1(A_1 = a_1) / P(A_1 \mid H_1) \right\}}$$
• Test statistic $T_n = \hat{p}_{a_1,n} - \hat{p}_{a_1',n}$; reject when $|T_n|$ is ‘large’

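Distribution of test statistic

• Define the estimated asymptotic variance
$$\hat{\sigma}^2_{a_1,a_1',n} = \mathbb{P}_n\left\{ \frac{(R - \hat{p}_{a_1,n})^2\, 1(A_1 = a_1)}{P(A_1 \mid H_1)^2} \right\} + \mathbb{P}_n\left\{ \frac{(R - \hat{p}_{a_1',n})^2\, 1(A_1 = a_1')}{P(A_1 \mid H_1)^2} \right\},$$
then under $H_0$ it follows from the CLT and Slutsky’s theorem that $\sqrt{n}\, T_n / \hat{\sigma}_{a_1,a_1',n} \rightsquigarrow \mathrm{Normal}(0, 1)$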
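The estimator $\hat{p}_{a_1,n}$, the plug-in variance, and the standardized statistic translate directly into code. This is a minimal sketch in plain Python. It assumes the propensities $P(A_1 \mid H_1)$ are known by design and are supplied for each patient's received treatment. The function names and the six-patient toy data are made up for illustration.

```python
import math


def ipw_rate(r, a1, prop, arm):
    """Normalized IPW estimate of the response rate under initial treatment `arm`.

    r    : 0/1 response indicators R_i
    a1   : assigned initial treatments A_{1,i}
    prop : P(A_1 = A_{1,i} | H_{1,i}), known by design
    """
    num = sum(ri / pi for ri, ai, pi in zip(r, a1, prop) if ai == arm)
    den = sum(1 / pi for ai, pi in zip(a1, prop) if ai == arm)
    return num / den  # ratio of empirical means = ratio of sums


def ipw_variance(r, a1, prop, arm, arm_prime):
    """Plug-in estimate of the asymptotic variance of sqrt(n) * T_n."""
    p = ipw_rate(r, a1, prop, arm)
    p_prime = ipw_rate(r, a1, prop, arm_prime)
    total = 0.0
    for ri, ai, pi in zip(r, a1, prop):
        if ai == arm:
            total += (ri - p) ** 2 / pi ** 2
        elif ai == arm_prime:
            total += (ri - p_prime) ** 2 / pi ** 2
    return total / len(r)  # empirical mean over all n patients


def z_statistic(r, a1, prop, arm, arm_prime):
    """sqrt(n) * T_n / sigma_hat, approximately Normal(0, 1) under H0."""
    t_n = ipw_rate(r, a1, prop, arm) - ipw_rate(r, a1, prop, arm_prime)
    return math.sqrt(len(r)) * t_n / math.sqrt(ipw_variance(r, a1, prop, arm, arm_prime))


if __name__ == "__main__":
    r = [1, 0, 1, 1, 0, 0]
    a1 = [0, 0, 0, 1, 1, 1]
    prop = [0.5] * 6  # 1:1 randomization
    print(ipw_rate(r, a1, prop, 0), ipw_variance(r, a1, prop, 0, 1), z_statistic(r, a1, prop, 0, 1))
```

The two variance terms can simply be added because no patient receives both $a_1$ and $a_1'$. The indicators are disjoint, so the covariance term vanishes.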

Quick reminder of asymptotics for IPWEs

Power for testing first-stage response

• Let $\sigma^2_{a_1,a_1'}$ be the population analog of $\hat{\sigma}^2_{a_1,a_1',n}$
• Rejecting when $\sqrt{n}\,|T_n| / \hat{\sigma}_{a_1,a_1',n} > z_{1-\alpha/2}$ has type I error of no more than $\alpha + o(1)$ and power of at least
$$\Phi\left(-z_{1-\alpha/2} + \sqrt{n}\,\delta/\sigma_{a_1,a_1'}\right) + \Phi\left(-z_{1-\alpha/2} - \sqrt{n}\,\delta/\sigma_{a_1,a_1'}\right) + o(1),$$
provided $|\mathbb{E}R^*(a_1) - \mathbb{E}R^*(a_1')| \ge \delta$
• Pick the smallest integer $n$ such that the above expression exceeds the target power $\beta$, using an elicited value of $\sigma_{a_1,a_1'}$
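The sizing recipe fits in a few lines. This sketch drops the $o(1)$ term and searches for the smallest integer $n$ whose power bound exceeds the target. To elicit $\sigma_{a_1,a_1'}$, it assumes the first-stage randomization probabilities do not depend on $H_1$. Under that assumption each variance term reduces to $p(1-p)/P(A_1 = a_1)$, where $p$ is an elicited response rate. The function names and the example rates are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: cdf() and inv_cdf()


def power(n, delta, sigma, alpha=0.05):
    """Power lower bound from the slide, dropping the o(1) term."""
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = math.sqrt(n) * delta / sigma
    return STD_NORMAL.cdf(-z + shift) + STD_NORMAL.cdf(-z - shift)


def sigma_response(p, p_prime, prob, prob_prime):
    """Elicited sigma_{a1,a1'} when P(A1 = a1 | H1) = prob and
    P(A1 = a1' | H1) = prob_prime do not depend on H1 (assumption):
    each term E{(R - p)^2 1(A1 = a1)} / prob^2 reduces to p (1 - p) / prob."""
    return math.sqrt(p * (1 - p) / prob + p_prime * (1 - p_prime) / prob_prime)


def smallest_n(delta, sigma, target_power=0.8, alpha=0.05):
    """Smallest integer n whose power bound exceeds the target power."""
    n = 1
    while power(n, delta, sigma, alpha) <= target_power:
        n += 1
    return n


if __name__ == "__main__":
    sigma = sigma_response(p=0.50, p_prime=0.35, prob=0.5, prob_prime=0.5)
    print(smallest_n(delta=0.15, sigma=sigma))  # 334
```

With 1:1 randomization, hypothetical response rates of 0.50 and 0.35, δ = 0.15, α = 0.05, and target power 0.8, the search returns n = 334 in total. A linear search is used for clarity; the closed-form approximation $n \approx (z_{1-\alpha/2} + z_{\beta})^2 \sigma^2 / \delta^2$ gives nearly the same answer.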

Power for testing first-stage response: discussion

• Illustrates key features of sample size calculations
  - Planned analyses, e.g., a hypothesis test
  - Required operating characteristics for analyses, e.g., power
  - External factors, e.g., clinically meaningful difference, level of test, variance, etc.
• Many sample size procedures of interest in SMARTs (and other designs) follow the same basic template
  - Identify primary analyses
  - Decide performance guarantees
  - Estimate requisite sample size
  - (Often) additional calculations and/or simulations to evaluate finite-sample performance or the impact of assumption violations

Power comparing two fixed regimes

• Let $\pi$ and $\pi'$ denote fixed and non-overlapping regimes, i.e., $\pi_1(h_1) \neq \pi_1'(h_1)$ for all $h_1$
• Let $V(\pi) = \mathbb{E}Y^*(\pi)$; test
$$H_0: V(\pi) = V(\pi')$$
against a two-sided alternative
$$H_1: V(\pi) \neq V(\pi')$$

With your stat buddy: brainstorm a sample size procedure

Power comparing two fixed regimes cont’d

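• Suppose we want to ensure sufficient power when $|V(\pi) - V(\pi')| > \delta$, for a clinically relevant difference $\delta > 0$
• Use the inverse probability weighted estimator
$$\hat{V}_n(\pi) = \frac{\mathbb{P}_n\left[ Y\, 1\{A_1 = \pi_1(H_1)\}\, 1\{A_2 = \pi_2(H_2)\} / \{P(A_1 \mid H_1) P(A_2 \mid H_2)\} \right]}{\mathbb{P}_n\left[ 1\{A_1 = \pi_1(H_1)\}\, 1\{A_2 = \pi_2(H_2)\} / \{P(A_1 \mid H_1) P(A_2 \mid H_2)\} \right]}$$
• Test statistic $T_n = \hat{V}_n(\pi) - \hat{V}_n(\pi')$; reject when $|T_n|$ is ‘large’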

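Power comparing two fixed regimes cont’d

• Use the same sort of asymptotic arguments as for the response rate to derive a plug-in estimator of the asymptotic variance
$$\hat{\sigma}^2_{\pi,\pi',n} = \mathbb{P}_n\left[ \frac{\{Y - \hat{V}_n(\pi)\}^2\, 1\{A_1 = \pi_1(H_1)\}\, 1\{A_2 = \pi_2(H_2)\}}{\{P(A_1 \mid H_1) P(A_2 \mid H_2)\}^2} \right] + \mathbb{P}_n\left[ \frac{\{Y - \hat{V}_n(\pi')\}^2\, 1\{A_1 = \pi_1'(H_1)\}\, 1\{A_2 = \pi_2'(H_2)\}}{\{P(A_1 \mid H_1) P(A_2 \mid H_2)\}^2} \right],$$
then under $H_0$ it follows that $\sqrt{n}\, T_n / \hat{\sigma}_{\pi,\pi',n} \rightsquigarrow \mathrm{Normal}(0, 1)$, which we can leverage for sample size calculations as in the preceding example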

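Power for comparing two fixed regimes

• Let $\sigma^2_{\pi,\pi'}$ be the population analog of $\hat{\sigma}^2_{\pi,\pi',n}$
• Rejecting when $\sqrt{n}\,|T_n| / \hat{\sigma}_{\pi,\pi',n} > z_{1-\alpha/2}$ has type I error of no more than $\alpha + o(1)$ and power of at least
$$\Phi\left(-z_{1-\alpha/2} + \sqrt{n}\,\delta/\sigma_{\pi,\pi'}\right) + \Phi\left(-z_{1-\alpha/2} - \sqrt{n}\,\delta/\sigma_{\pi,\pi'}\right) + o(1),$$
provided $|V(\pi) - V(\pi')| \ge \delta$
• Pick the smallest integer $n$ such that the above expression exceeds the target power $\beta$, using an elicited value of $\sigma_{\pi,\pi'}$

Concluding remarks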

• Many estimands of interest are asymptotically normal under the null and thus amenable to straightforward power calculations
• Common criteria for sizing a SMART
  - Comparison of response rates
  - Comparison of non-overlapping regimes, e.g., most intensive vs. least intensive
  - Comparison of embedded regimes
  - First-stage treatments marginalizing over remaining treatment stages (WARNING!)
  - First-stage treatment maximizing over remaining treatment stages (WARNING!)

Concluding remarks cont’d

• Simple comparisons are widely used
  - Correspond to primary scientific question
  - Give reviewers confidence (esp. when SMARTs were new)
  - Belief that sizing for other comparisons is too complicated
  - More complicated analyses left to secondary analysis
• Modifications to SMARTs
  - Adaptive randomization
  - Early termination
  - Platform trials
  - ···
