SMARTs: Part I
Eric B. Laber
Department of Statistics, North Carolina State University
April 2019 SAMSI

Warm up part I: quiz!
- Discuss with your stat buddy:
  - What is a clinical trial? What’s the best way to quantify the shame you should feel if you don’t know?
  - What is a power calculation?
  - What are common complicating statistical issues associated with clinical trials?
- True or false
  - Sequential Hierarchically Assigned Randomization Trials are the gold standard design for estimation and evaluation of treatment regimes.
  - Susan Murphy, who authored several seminal papers on sequential clinical trial design, is affectionately known as ‘Smurphy’ to her friends and colleagues.
  - The legend of Santa Claus may be partially based on Siberian shamans consuming psychedelic mushrooms with reindeer. [1]

[1] https://www.npr.org/2010/12/24/132260025/did-shrooms-send-santa-and-his-reindeer-flying

Starting easy
Instead of having “answers” on a math test, they should just call them “impressions,” and if you got a different “impression,” so what, can’t we all be brothers? – Pythagoras
Precision medicine
- “The right treatment for the right patient at the right time.” –Mantra of precision medicine advocates
  - Widely recognized that best clinical care requires treatment decisions tailored to individual patient characteristics
  - Improve patient outcomes, reduce cost and patient burden
Precision medicine background
- Patient heterogeneity
  - Demographic
  - Physiological
  - Medical history/comorbidities
  - Genetic/genomic factors
  - Environment
  - ...
- Clinicians tailor therapy to individual patient characteristics
  - Evolution of health status
  - Patient individual preference
  - Local availability
  - Cost
  - ...

Precision medicine background cont’d
- Clinical decision making
  - Synthesis of available information
  - Expert judgment
  - Treatment guidelines
- Precision medicine
  - Data-driven, aka evidence-based
  - Seeks to inform, not dictate, decision making
Treatment regimes
- Formalize clinical decision making via a sequence of decision rules
  - One rule per stage of clinical intervention
  - Each rule maps current patient info to a recommended treatment
- Optimal regime maximizes the mean of some cumulative clinical outcome if applied to the population of interest
Ex. Treatment regime: mHealth for PTSD in cancer patients (PI S. Smith)
First stage decision rule
  If distress ≥ 3 then: Cancer Distress Coach (CDC)
  Else if PTSD symptom score ≥ 20 then: CDC
  Else: usual care
Second stage decision rule
  If responder then: continue first stage treatment
  Else if using CDC and PTSD change ≥ 3 then: add mCoaching
  Else if using CDC and distress ≥ 4 then: add FaceTime CBT
  Else: FaceTime CBT only
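A regime of this kind is just a pair of functions from patient information to a treatment label. The following is a minimal sketch of the two rules on this slide; the function names, argument names, and returned strings are illustrative assumptions, not part of the trial protocol.

```python
def first_stage_rule(distress: float, ptsd_score: float) -> str:
    """First-stage rule: CDC if distress >= 3 or PTSD symptom score >= 20."""
    if distress >= 3:
        return "CDC"
    if ptsd_score >= 20:
        return "CDC"
    return "usual care"


def second_stage_rule(responder: bool, first_treatment: str,
                      ptsd_change: float, distress: float) -> str:
    """Second-stage rule; labels for the combined treatments are hypothetical."""
    if responder:
        return first_treatment              # continue first-stage treatment
    if first_treatment == "CDC" and ptsd_change >= 3:
        return "CDC + mCoaching"            # add mCoaching
    if first_treatment == "CDC" and distress >= 4:
        return "CDC + FaceTime CBT"         # add FaceTime CBT
    return "FaceTime CBT only"


# A hypothetical patient: moderate distress, non-responder on CDC.
a1 = first_stage_rule(distress=4, ptsd_score=12)
a2 = second_stage_rule(responder=False, first_treatment=a1,
                       ptsd_change=1, distress=4)
```

Writing the rules as functions makes the dependence of the second-stage recommendation on the first-stage treatment explicit.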
Key ingredients
- Critical decision points
  - Opportunities to change course of treatment
  - Fixed in calendar time or outcome driven
- Patient characteristics
  - Up-to-date history
  - Personal preferences
- Treatment options
  - Depend on time and patient history
  - May also depend on cost, resource availability, etc.
Data sources
- Observational studies
  - Cohort study, e.g., Framingham
  - EHR data
- Randomized clinical trials
  - K-arm randomized trial
  - Sequential Multiple Assignment Randomized Trials
  - Micro-randomized trials
SMARTs
- Sequential Multiple Assignment Randomized Trials (SMARTs)
  - Gold standard randomized trial design for evaluating treatment sequences (seminal paper: Murphy 2005 SIM)
  - Basic idea: randomize treatment assignment at critical decision points where there is equipoise
- Motivation for SMARTs
  - Avoid causal issues associated with observational longitudinal data
  - Efficiently compare partial and full treatment sequences
  - Estimate optimal treatment regimes
  - Better mimic clinical practice
Ex. SMART: mHealth for PTSD
[Figure: SMART schematic. Initial randomization (R) to Treatment A (Distress Coach) or Treatment B (Standard Care), followed by a response assessment. Second-stage node labels: continue Distress Coach, Treatment AA (add mCoaching), Treatment AB (FaceTime CBT), continue follow-up only, Treatment BA (DC + mCoaching), Treatment BB (FaceTime CBT).]
Ex. SMART: mHealth for PTSD cont’d
- Additional trial details
  - Response status assessed at 4 weeks
  - Response criterion: PTSD symptoms exceed threshold
  - Primary outcome: PTSD symptoms
Ex. SMART: ADHD (PI: Pelham)
[Figure: SMART schematic. Initial randomization (R) to Treatment A (Low Intensity BMOD) or Treatment B (Low Intensity MEDS), followed by a response assessment. Non-responders are re-randomized (R): under A to Treatment AA (Augment with MEDS) or Treatment AB (Intensify BMOD); under B to Treatment BA (Augment with BMOD) or Treatment BB (Intensify MEDS).]
Ex. SMART: ADHD (PI: Pelham) cont’d
- Additional trial details
  - Response status assessed each month
  - Response criterion: teacher-reported classroom performance
  - Primary outcomes: parent- and teacher-reported outcomes, academic assessments, rule violations
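Ex. SMART: Zika (PI: S. Becker-Dreps)
[Figure: SMART schematic. Initial randomization (R) to Treatment (Passive messaging + Insecticide + Condoms) or Active control (Insecticide + Condoms). In the treatment arm, response is assessed: responders continue with no change; non-responders are re-randomized (R) to Intensify (add active messaging) or Augment (in-home visits).]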
Ex. SMART: Zika cont’d
- Additional trial details
  - Response status assessed at first trimester clinic visit
  - Response criterion: patient-reported compliance
  - Primary outcome: Zika infection at full term
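Ongoing: Trial design for children with epilepsy
[Figure: trial schematic. Treatment 0 (run-in period) followed by a "High adherence?" assessment, with "No further treatment" on the Yes branch and randomization (R) on the No branch. Node labels: Treatment 1 (E+ADR+IAF), Treatment 2 (E+ADR), Treatment 3 (E+ADR+IAF+PS). Under Treatment 1, a "Response?" assessment leads to Continue or to a second randomization (R).]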
Ongoing: Trial design for children with epilepsy
- Multiple outcomes
  - Adherence at 8, 14, and 20 months
  - Seizures in months 8-14
  - QOL in month 14
  - Healthcare utilization months 8-20
Randomization
- Three embedded regimes
  - A: E+ADR+IAF and add PS if non-response
  - B: E+ADR+IAF and continue if non-response
  - C: E+ADR and continue until end of study
- Block-permuted design among embedded treatment regimes
  - Balance within strata (base adherence x age x severity)

Strata   Block 1   Block 2   Block 3   ···   Block J
1        ACB       CBA       ACB       ···   CBA
2        ABC       CBA       BCA       ···   CBA
3        BAC       BAC       CAB       ···   ABC
...
8        ABC       ACB       CAB       ···   CBA
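An allocation table like the one above can be generated by shuffling the three regime labels independently for each block within each stratum. The sketch below is illustrative only: the function names, the seed, and the choice of four blocks are assumptions, and a real trial would use validated randomization software.

```python
import random

REGIMES = ["A", "B", "C"]  # the three embedded regimes named on the slide


def permuted_blocks(n_blocks: int, rng: random.Random) -> list[str]:
    """Return n_blocks random orderings of the regimes, e.g. ['ACB', 'CBA']."""
    blocks = []
    for _ in range(n_blocks):
        order = REGIMES[:]          # copy so the shuffle leaves REGIMES intact
        rng.shuffle(order)
        blocks.append("".join(order))
    return blocks


def allocation_table(n_strata: int, n_blocks: int, seed: int = 0) -> dict[int, list[str]]:
    """One independent sequence of permuted blocks per stratum."""
    rng = random.Random(seed)
    return {s: permuted_blocks(n_blocks, rng) for s in range(1, n_strata + 1)}


# Eight strata as in the table; four blocks is a hypothetical choice for J.
table = allocation_table(n_strata=8, n_blocks=4)
for stratum, blocks in table.items():
    print(stratum, " ".join(blocks))
```

Within a stratum, patients are assigned in enrollment order to the letters of successive blocks, so each regime appears exactly once per block of three.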
Common concerns with SMARTs
- Many design choices ⇒ unwieldy
- Splitting data ⇒ loss of power
- Involves subgroup analyses ⇒ complicated inference
We shall see that these concerns are mostly unfounded.
Warm-up part II: toy study
- Prompt: researchers are considering a two-stage SMART to evaluate two candidate first-stage treatments and two salvage therapies for non-responders. Responders will all receive the same maintenance therapy.
- Sketch a SMART assuming
  - First stage txts: (i) new active txt and (ii) std care
  - Non-responders: (i) salvage 1 and (ii) salvage 2
  - Responders: maintenance
  - Add’l details
    - Response assessed at four weeks
      - Justified by clinical application of std of care
Warm-up part II: toy study cont’d
[Figure: SMART schematic. Initial randomization (R) to Treatment 0 (New Active Treatment) or Treatment 1 (Standard of Care), followed by a response assessment. Responders receive Treatment 2 (Maintenance); non-responders are re-randomized (R) to Treatment 3 (Salvage 1) or Treatment 4 (Salvage 2).]

A slight line or fold in time
Warm-up part II: toy study cont’d
- Suppose that the optimal duration of the new treatment is unknown and deemed of primary interest
- Want to compare waiting 4 and 8 weeks before assessing response under the new treatment
  - All other aspects are the same
  - Draw your design!
Warm-up part II: toy study cont’d
[Figure: SMART schematic. Initial randomization (R) to Treatment 0 (New Active Treatment, assess response at 4 weeks), Treatment 1 (New Active Treatment, assess response at 8 weeks), or Treatment 2 (Standard of Care, assess response at 4 weeks). In each arm, responders receive Treatment 3 (Maintenance); non-responders are re-randomized (R) to Treatment 4 (Salvage 1) or Treatment 5 (Salvage 2).]
Thoughts? Feelings? Don’t forget. I care about you.

Warm-up part II: toy study cont’d
- Suppose that the researchers determine that the evaluation of salvage therapies under the standard of care arm is of less interest than the comparison of response rates across the new treatment at 4 and 8 weeks and standard of care at 4 weeks.
- Draw it!
Warm-up part II: toy study cont’d
[Figure: SMART schematic. Initial randomization (R) to Treatment 1 (New Active Treatment, assess response at 4 weeks), Treatment 2 (New Active Treatment, assess response at 8 weeks), or Treatment 3 (Active control). In the two new-treatment arms, responders receive Treatment 4 (Maintenance); non-responders are re-randomized (R) to Treatment 5 (Salvage 1) or Treatment 6 (Salvage 2).]
Warm-up part II: will it ever end?!
- Suppose that researchers decide that in clinical practice they would like to assess response under the new treatment at four weeks and then have the option to recommend a salvage therapy or to recommend staying on the new treatment for another 4 weeks (8 total) before re-assessing.
- Draw it!
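Warm-up part II: toy study cont’d
[Figure: SMART schematic. Initial randomization (R) to Treatment 1 (New Active Treatment given for 4 weeks) or Treatment 3 (Active control). Under Treatment 1, responders at 4 weeks receive Treatment 4 (Maintenance); non-responders are randomized (R) among Treatment 5 (Salvage 1), Treatment 6 (Salvage 2), and Treatment 2 (New Active Treatment given an additional 4 weeks). After Treatment 2, response is re-assessed: responders receive Treatment 4 (Maintenance); non-responders are re-randomized (R) to Treatment 5 (Salvage 1) or Treatment 6 (Salvage 2).]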
Warm-up part II: wrap-up
- Toy example illustrates how science drives design
- In practice, this process can take weeks or more
  - Cannot test every question in a single design ⇒ prioritize
  - Allowing treatment options, dosages, and response criteria to change can lead to many more permutations
Choosing treatment options
- If feasible treatments are well-established by clinical science, no further consideration is necessary
- Common cases where treatment is unknown [2]
  - Optimal treatment sequencing unknown
  - Subgroup needing expensive treatment unknown
  - Optimal dosage unknown

[2] Mathematically, these are going to end up looking very similar. However, clinical and intervention scientists often see these as being quite different.

Design pattern: switch away from the loser
- Try something; if it doesn’t work, then try something else
- Intervention scientists have a set of candidate treatments that they’ll apply in sequence until one works
- Goal: identify best sequence of treatments
  - Primary analysis may be one-sequence-fits-all and secondary analysis individualized to patient characteristics
  - Target may be non-responders to standard treatment
Design pattern: switch away from the loser ex.
[Figure: SMART schematic for the Keep it Up! (KIU) trial. Starts with Treatment 0 (Queer Sex Ed), followed by response assessments and randomizations (R). Node labels: Keep it Up!, Attention control, Keep it Up! booster, Control booster, Keep it Up! booster and YMHP, YMHP, No further treatment.]

KIU trial discussion
- Ongoing trial running in the US and PR
  - Safe sexual practices via sex ed for adolescent MSM
  - Reach subjects in stigmatized populations
  - Scalable to large public health scale
- Burn-in treatment: Queer Sex Ed (QSE, std treatment) followed by treatment changes for non-responders
- Interesting additional features
  - Not all subjects have had sexual debut
  - Response rates to QSE unknown ⇒ complicates power
Design pattern: switch away from loser generic
[Figure: generic schematic with three candidate treatments (Treatment 0, 1, 2). Patients are randomized (R) to an initial treatment; after each "Response?" assessment, responders receive no further treatment and non-responders switch (with randomization, R, where two untried options remain) to a treatment not yet tried, so that every ordering of the three treatments appears.]
Design pattern: stepped care
- Goal: cost-effective txt regime that gives expensive/intensive txts only if, when, and to whom they are needed
- Step-up: start on cheap txt, escalate as needed
- Step-down: start on expensive txt, de-escalate when possible
Design pattern: step-up generic
[Figure: two generic stepped-care schematics, panels (a) and (b). Each begins with randomization (R) to an initial treatment, followed by a "Response?" assessment and either continuation or re-randomization (R). Node labels include Inexpensive treatment I and II, Expensive treatment I to IV, and Continue.]

Design pattern: step-down generic
[Figure: the same panels (a) and (b) as on the step-up slide.]
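Design pattern: step-down ex.
[Figure: SMART schematic for the AllyQuest trial. Initial randomization (R) to Treatment 0 (AllyQuest+) or Treatment 1 (AllyQuest), followed by a response assessment. Second-stage node labels under AllyQuest+: Continue AllyQuest+ (non-responders), and for responders Continue AllyQuest+ or Step-down to AllyQuest (R). Second-stage node labels under AllyQuest: Step-up to AllyQuest+ (non-responders), and for responders Continue AllyQuest or Treatment 2 (Control).]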
AQ discussion
- Ongoing trial at UNC
- Goal: medication adherence among HIV+ adolescents
  - Mechanism: gamification and removal of stigma
- mHealth and eHealth increasingly used for precision medicine in situ, i.e., when and where interventions are needed
Design pattern: dose adjustments
- Multicomponent/multimodal treatments common in cancer
- Ex. UF colorectal cancer screening trial
  - Individualize message encouraging FIT screening
  - Ten binary factors and three ternary (three-level) factors
  - Must balance message overload with engagement
  - Primary binary factor of interest: virtual human vs. text
Design pattern: dose adjustments generic
[Figure: SMART schematic with four binary components (CBT, GT, MS-A, MS-B). Patients are randomized (R) among on/off combinations of the four components, listed as 0/1 rows; after a "Response?" assessment, responders continue initial treatment and non-responders are re-randomized (R) among component combinations.]

Design pattern: dose adjustment cont’d
- All possible level combinations: $2^L$ with $L$ binary factors
  - Full factorial: assign all possible level combinations
    - Identify all interactions among factors
    - Requires potentially enormous sample size
  - Fractional factorial: subset of possible level combinations
    - Fewer unique ‘treatments’ ⇒ smaller sample size
    - Some interactions cannot be estimated (aliased)
    - Efficient fractional factorial designs for SMARTs not well-studied, especially for estimation of optimal txt regimes
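To make the full versus fractional distinction concrete, the sketch below enumerates the full factorial for the four binary components named in the generic dose-adjustment figure and then extracts one standard half fraction. The particular fraction (keep runs with an even number of components switched on) is an illustrative choice, not a design taken from any trial discussed here.

```python
from itertools import product

FACTORS = ["CBT", "GT", "MS-A", "MS-B"]  # binary components from the generic figure

# Full factorial: every 0/1 combination, 2**L runs for L binary factors.
full = list(product([0, 1], repeat=len(FACTORS)))

# Half fraction (illustrative): keep runs with an even number of components on.
# In +/-1 coding this is the fraction with defining relation I = ABCD, so each
# main effect is aliased with a three-factor interaction and the two-factor
# interactions are aliased with each other in pairs.
half = [run for run in full if sum(run) % 2 == 0]

print(len(full), "runs in the full factorial")
print(len(half), "runs in the half fraction")
for run in half:
    print(dict(zip(FACTORS, run)))
```

Each component is still on in exactly half of the retained runs, so main effects remain estimable with balanced comparisons while the number of distinct treatments is halved.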
Warm up: quiz!
- Consider the AllyQuest design shown below. Suppose that researchers want to compare the mean outcomes under the embedded regimes: (e1) give AllyQuest+ throughout the entire follow-up period, and (e2) assign AllyQuest initially and AllyQuest+ to non-responders. Let ρ denote the probability of response under AllyQuest and ρ+ the probability of response under AllyQuest+. If initial treatments are randomized equally and there are n patients enrolled in the trial, what is the expected number of patients who will be consistent with (e1) and (e2)?
[Figure: AllyQuest SMART schematic, as on the step-down example slide. Initial randomization (R) to Treatment 0 (AllyQuest+) or Treatment 1 (AllyQuest), followed by a response assessment. Second-stage node labels: Continue AllyQuest+, Step-down to AllyQuest, Step-up to AllyQuest+, Continue AllyQuest, Treatment 2 (Control).]
Warm up: quiz! notes
Stratification
- Two-minute review
  - Randomization ⇒ balance of prognostic factors on average across treatment conditions
  - In small samples, severe imbalance can occur by chance; stratifying randomization by key prognostic factors can ensure balance on these variables and reduce variance
- In a SMART one can stratify each randomization separately, but this is generally logistically complicated
  - Requires re-randomization on the fly
  - Not conceptually complicated but implementation non-trivial
- Easier alt: stratified randomization to the embedded regimes
  - Done at baseline with existing software/platforms
  - Analytically equivalent to sequential randomization
Intermission
If you ever teach a yodeling class, probably the hardest thing is to keep the students from just trying to yodel right off. You see, we build to that. –Emil Whilhem Richterich
Warm-up quiz
- Explain to your stat buddy:
  - What is inverse probability weighting and where was it first used?
  - What are the standard inputs/procedures for sizing a clinical trial?
  - What is the efficiency-ethics trade-off?
- True or false
  - IPWE of value can be expressed as MLE in some generative models
  - Cohen’s d was Lyor Cohen’s proposed name for Def Jam Records
  - Trying to find out the weight of the largest domesticated cat in the world is unbelievably frustrating.
Setup and notation
- Consider two-stage SMART with finite treatments
  - Generalize trivially to multistage txts
  - May adapt coding of txts as it suits us
  - In some designs part of patient data structurally missing
- Trial will generate data $\{(X_{1,i}, A_{1,i}, X_{2,i}, A_{2,i}, Y_i)\}_{i=1}^{n}$
  - $X_t \in \mathbb{R}^{p_t}$: patient info at time $t$
  - $A_t \in \mathcal{A}_t = \{1, \ldots, K_t\}$: txt at time $t$
  - $Y \in \mathbb{R}$: outcome coded so that higher is better
- History: $H_1 = X_1$ and $H_2 = (X_1^\top, A_1, X_2^\top)^\top$
Sizing for first-stage response rate
[Figure: SMART schematic. Initial randomization (R) to Treatment 0 (PCST-Full) or Treatment 1 (PCST-Brief), followed by a response assessment and second-stage randomizations (R). Second-stage node labels: Treatment 2 (PCST-Full maintenance), Treatment 3 (No further treatment), Treatment 4 (PCST-Plus), Treatment 5 (PCST-Brief maintenance). A second view of the figure shows only the first stage: R to PCST-Full or PCST-Brief, then "Response?".]
Sizing for first-stage response rate cont’d
- Let $R = R(H_2) \in \{0, 1\}$ be the indicator of response
- Let $a_1, a_1' \in \mathcal{A}_1$ be two distinct initial txts
- Potential outcome $R^*(a_1) = R\{H_2^*(a_1)\}$, test
$$H_0 : E R^*(a_1) = E R^*(a_1')$$
against a two-sided alternative
$$H_1 : E R^*(a_1) \neq E R^*(a_1')$$
Sizing for first-stage response rate cont’d
- Suppose we want to ensure sufficient power when $|E R^*(a_1) - E R^*(a_1')| > \delta$, for clinically relevant diff $\delta > 0$
- Inverse probability weighted estimator
$$\hat p_{a_1,n} = \frac{\mathbb{P}_n\left\{ \dfrac{R\, 1(A_1 = a_1)}{P(A_1 \mid H_1)} \right\}}{\mathbb{P}_n\left\{ \dfrac{1(A_1 = a_1)}{P(A_1 \mid H_1)} \right\}},$$
test statistic $T_n = \hat p_{a_1,n} - \hat p_{a_1',n}$, reject when $|T_n|$ ‘large’
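The ratio-form estimator above is a weighted average of the response indicators among patients who received the treatment of interest. A minimal sketch follows; the function name, argument layout, and the eight-patient data set are hypothetical and chosen only for illustration.

```python
def ipw_response_rate(R, A1, prop, a1):
    """Ratio-form IPW estimate of the response rate under first-stage txt a1.

    R    : 0/1 response indicators
    A1   : first-stage treatments actually assigned
    prop : P(A1_i | H1_i), the probability of the treatment actually received
    """
    num = sum(r / p for r, a, p in zip(R, A1, prop) if a == a1)
    den = sum(1 / p for a, p in zip(A1, prop) if a == a1)
    return num / den


# Hypothetical data: 8 patients, equal randomization between txts 0 and 1.
R = [1, 0, 1, 1, 0, 0, 1, 0]
A1 = [0, 0, 0, 0, 1, 1, 1, 1]
prop = [0.5] * 8

p_hat_0 = ipw_response_rate(R, A1, prop, 0)
p_hat_1 = ipw_response_rate(R, A1, prop, 1)
T_n = p_hat_0 - p_hat_1  # test statistic: difference in estimated response rates
```

With constant randomization probabilities the weights cancel and the estimator reduces to the observed response proportion in each arm.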
Sampling distribution of test statistic
- Define the estimated asymptotic variance
$$\hat\sigma^2_{a_1,a_1',n} = \mathbb{P}_n \frac{(R - \hat p_{a_1,n})^2\, 1(A_1 = a_1)}{P(A_1 \mid H_1)^2} + \mathbb{P}_n \frac{(R - \hat p_{a_1',n})^2\, 1(A_1 = a_1')}{P(A_1 \mid H_1)^2},$$
then under $H_0$ it follows from CLT + Slutsky that $\sqrt{n}\, T_n / \hat\sigma_{a_1,a_1',n} \rightsquigarrow \mathrm{Normal}(0, 1)$
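The standardized statistic can be computed directly from trial data. The sketch below uses the usual influence-function form of the variance, with the squared randomization probability in the denominator; that form, the function names, and the toy data are assumptions made for illustration.

```python
from math import sqrt


def ipw_rate(R, A1, prop, a):
    """Ratio-form IPW estimate of the response rate under first-stage txt a."""
    num = sum(r / p for r, x, p in zip(R, A1, prop) if x == a)
    den = sum(1 / p for x, p in zip(A1, prop) if x == a)
    return num / den


def z_statistic(R, A1, prop, a, a_prime):
    """sqrt(n) * T_n / sigma_hat for comparing txts a and a_prime."""
    n = len(R)
    var = 0.0
    for arm in (a, a_prime):
        p_hat = ipw_rate(R, A1, prop, arm)
        # Empirical mean of squared, inverse-probability-weighted residuals.
        var += sum((r - p_hat) ** 2 / p ** 2
                   for r, x, p in zip(R, A1, prop) if x == arm) / n
    t_n = ipw_rate(R, A1, prop, a) - ipw_rate(R, A1, prop, a_prime)
    return sqrt(n) * t_n / sqrt(var)


# Hypothetical data: 8 patients, equal randomization between txts 0 and 1.
R = [1, 0, 1, 1, 0, 0, 1, 0]
A1 = [0, 0, 0, 0, 1, 1, 1, 1]
prop = [0.5] * 8
z = z_statistic(R, A1, prop, 0, 1)
```

When both arms have the same size and randomization probability 1/2, this reduces to the familiar two-sample z statistic for a difference in proportions with unpooled variance.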
Quick reminder of asymptotics for IPWEs
Power for testing first-stage response
- Let $\sigma^2_{a_1,a_1'}$ be popn analog of $\hat\sigma^2_{a_1,a_1',n}$
- Rejecting when $\sqrt{n}\,|T_n| / \hat\sigma_{a_1,a_1',n} > z_{1-\alpha/2}$ has type I error of no more than $\alpha + o(1)$ and power of at least
$$\Phi\left(-z_{1-\alpha/2} + \sqrt{n}\,\delta/\sigma_{a_1,a_1'}\right) + \Phi\left(-z_{1-\alpha/2} - \sqrt{n}\,\delta/\sigma_{a_1,a_1'}\right) + o(1),$$
provided $|E R^*(a_1) - E R^*(a_1')| \ge \delta$
- Pick smallest integer $n$ such that the above expression exceeds the target power $\beta$, using an elicited value of $\sigma_{a_1,a_1'}$
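The sample size rule above can be evaluated numerically by increasing n until the power bound reaches the target. This is a minimal sketch that ignores the o(1) terms; the function names and the input values (δ = 0.2, σ = 1.0, target power 0.8, α = 0.05) are hypothetical stand-ins for elicited quantities.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: cdf is Phi, inv_cdf gives quantiles


def power_bound(n, delta, sigma, alpha=0.05):
    """Large-sample lower bound on power of the two-sided level-alpha test."""
    z = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = (n ** 0.5) * delta / sigma
    return _STD_NORMAL.cdf(-z + shift) + _STD_NORMAL.cdf(-z - shift)


def smallest_n(delta, sigma, target_power=0.8, alpha=0.05):
    """Smallest integer n whose power bound meets the target."""
    n = 1
    while power_bound(n, delta, sigma, alpha) < target_power:
        n += 1
    return n


# Hypothetical elicited inputs.
n_required = smallest_n(delta=0.2, sigma=1.0)
print(n_required, power_bound(n_required, delta=0.2, sigma=1.0))
```

The linear search is adequate here because the bound is increasing in n; in practice one would follow up with simulations to check finite-sample behavior.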
Power for testing first-stage response: discussion
- Illustrates key features of sample size calculations
  - Planned analyses, e.g., a hypothesis test
  - Required operating characteristics for analyses, e.g., power
  - External factors, e.g., clinically meaningful difference, level of test, variance, etc.
- Many sample size procedures of interest in SMARTs (and other designs) follow the same basic template
  - Identify primary analyses
  - Decide performance guarantees
  - Estimate requisite sample size
  - (Often) additional calculations and/or simulations to evaluate finite-sample performance or the impact of assumption violations
Power comparing two fixed regimes
- Let $\pi$ and $\pi'$ denote fixed and non-overlapping regimes, i.e., $\pi_1(h_1) \neq \pi_1'(h_1)$ for all $h_1$
- Let $V(\pi) = E Y^*(\pi)$, test
$$H_0 : V(\pi) = V(\pi')$$
against a two-sided alternative
$$H_1 : V(\pi) \neq V(\pi')$$
With your stat buddy: brainstorm a sample size procedure
Power comparing two fixed regimes cont’d
- Suppose we want to ensure sufficient power when $|V(\pi) - V(\pi')| > \delta$, for clinically relevant diff $\delta > 0$
- Use inverse probability weighted estimator
$$\hat V_n(\pi) = \frac{\mathbb{P}_n\left[ \dfrac{Y\, 1\{A_1 = \pi_1(H_1)\}\, 1\{A_2 = \pi_2(H_2)\}}{P(A_1 \mid H_1)\, P(A_2 \mid H_2)} \right]}{\mathbb{P}_n\left[ \dfrac{1\{A_1 = \pi_1(H_1)\}\, 1\{A_2 = \pi_2(H_2)\}}{P(A_1 \mid H_1)\, P(A_2 \mid H_2)} \right]},$$
test statistic $T_n = \hat V_n(\pi) - \hat V_n(\pi')$, reject when $|T_n|$ ‘large’
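In code, the value estimator is a weighted average of outcomes over patients whose observed treatments agree with the regime at both stages. The sketch below assumes each patient is stored as a dictionary with hypothetical keys (`h1`, `a1`, `h2`, `a2`, `y`, `p1`, `p2`); the regime and the four-patient data set are made up for illustration.

```python
def ipw_value(data, pi1, pi2):
    """Ratio-form IPW estimate of V(pi) for the regime pi = (pi1, pi2).

    Each row holds histories h1 and h2, treatments a1 and a2, outcome y, and
    the randomization probabilities p1 = P(A1 | H1) and p2 = P(A2 | H2).
    """
    num = den = 0.0
    for row in data:
        consistent = (row["a1"] == pi1(row["h1"])
                      and row["a2"] == pi2(row["h2"]))
        if consistent:
            w = 1.0 / (row["p1"] * row["p2"])
            num += w * row["y"]
            den += w
    return num / den


# Hypothetical regime: start on txt 1; responders stay on 1, non-responders get 2.
pi1 = lambda h1: 1
pi2 = lambda h2: 1 if h2["responder"] else 2

# Hypothetical data: responders are not re-randomized (p2 = 1.0).
data = [
    {"h1": {}, "a1": 1, "h2": {"responder": True},  "a2": 1, "y": 10, "p1": 0.5, "p2": 1.0},
    {"h1": {}, "a1": 1, "h2": {"responder": False}, "a2": 2, "y": 6,  "p1": 0.5, "p2": 0.5},
    {"h1": {}, "a1": 1, "h2": {"responder": False}, "a2": 3, "y": 4,  "p1": 0.5, "p2": 0.5},
    {"h1": {}, "a1": 2, "h2": {"responder": True},  "a2": 1, "y": 5,  "p1": 0.5, "p2": 1.0},
]
v_hat = ipw_value(data, pi1, pi2)
```

Non-responders who were re-randomized carry a larger weight than responders, which compensates for the fact that only a fraction of them ended up consistent with the regime.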
Power comparing two fixed regimes cont’d
- Use same sorts of asymptotic arguments as for the response rate to derive plug-in estimator of asy variance
$$\hat\sigma^2_{\pi,\pi',n} = \mathbb{P}_n \frac{\{Y - \hat V_n(\pi)\}^2\, 1\{A_1 = \pi_1(H_1)\}\, 1\{A_2 = \pi_2(H_2)\}}{\{P(A_1 \mid H_1)\, P(A_2 \mid H_2)\}^2} + \mathbb{P}_n \frac{\{Y - \hat V_n(\pi')\}^2\, 1\{A_1 = \pi_1'(H_1)\}\, 1\{A_2 = \pi_2'(H_2)\}}{\{P(A_1 \mid H_1)\, P(A_2 \mid H_2)\}^2},$$
then under $H_0$ it follows that $\sqrt{n}\, T_n / \hat\sigma_{\pi,\pi',n} \rightsquigarrow \mathrm{Normal}(0, 1)$, which we can leverage for sample size calculations as in the preceding example
Power for comparing two fixed regimes
- Let $\sigma^2_{\pi,\pi'}$ be popn analog of $\hat\sigma^2_{\pi,\pi',n}$
- Rejecting when $\sqrt{n}\,|T_n| / \hat\sigma_{\pi,\pi',n} > z_{1-\alpha/2}$ has type I error of no more than $\alpha + o(1)$ and power of at least