Within-Subject Clinical Trials: Introduction to New Methods and Statistical Models
To RCT or Not to RCT: That Is the Question

June 22, 2017
Within-Subject Clinical Trials: Introduction to New Methods and Statistical Models
To RCT or not to RCT: That is the Question
Donald E. Stull, PhD
Head, Data Analytics and Design Strategy
RTI Health Solutions

2 Agenda
• Background: Are RCTs the only acceptable/respectable approach for establishing treatment efficacy or cause-effect?
  – RCTs = multi-country, multi-center, randomized, double-blind, placebo-controlled clinical trial
• Brief (Read: NOT comprehensive) presentation of some “issues” with RCTs:
  – The Good,
  – The Bad, and
  – The Juggly
• Some alternative approaches:
  – (Bayesian) adaptive trials
  – Within-Subject Clinical Trials (WSCTs)
• Brief discussion of analytic approaches/software for dealing with intensive data

3 RCTs: The Good
• “The Gold Standard” (?)
• Internal validity
• Randomization
• Blinding
• Control over comparisons
• Manipulation of key variables

4 RCTs: The Bad
• Ethics
• External validity
• Cost
• Covariate imbalance
• Investigator discretion

5 RCTs: The Juggly
• Juggling the Good and the Bad:
  – RCTs are often a balance between cost, external/internal validity, accepting (choosing?) no direct head-to-head comparison, etc., etc.

6 Alternative Approaches to Understanding Change and Treatment Effects
“Many new drugs are expensive, and in some countries drug budgets are growing faster than other health care sectors…The key questions are: how much better are the new drugs than the old ones, how much more does it cost to obtain the additional benefits, and does the extra cost represent value for the money.” (Henry and Hill, BMJ, 1995)
• Does answering these key questions always require RCTs?
Henry D, Hill S. Comparing treatments. BMJ. 1995 May 20;310(6990):1279.

7 The Randomized Controlled Trial: gold standard, or merely standard?
“Because every study design may have problems in particular applications, studies should be evaluated by appropriate criteria, and not primarily according to the simplistic RCT/non-RCT dichotomy promoted by some prominent advocates of the evidence-based medicine movement and by the research evaluation guidelines based on its principles.” (Grossman & Mackenzie, 2005)
Grossman J, Mackenzie FJ. The randomized controlled trial: gold standard, or merely standard? Perspect Biol Med. 2005 Autumn;48(4):516-34.

8 Alternative Approaches to Understanding Change and Treatment Effects
• (Bayesian) Adaptive trials
• Mixture models for heterogeneous data
• What if you want to test a treatment for an ultra-rare disease?
• What if you need a Go/No-Go decision?
• Are there study designs that can handle these challenges without undertaking an RCT?
We will focus on within-subject clinical trials as an approach to address many of these challenges.

9 Within-Subject Clinical Trials: Complementary Alternatives to RCTs
Ty A. Ridenour, PhD, MPE
Developmental Behavior Epidemiologist
Behavioral Health Epidemiology
RTI International

10 Objective of RCTs
Meta-efficacy:
[Figure; axis labels: Conventional MDI Better | Insulin Pump Better]
Figure 1. Effect sizes for parallel design studies. Studies are presented in increasing order of chronology from the bottom, with primary authors’ names along the left side of the graph. *Mean effect size. Bars denote the 95% CIs of the mean. Mean effect size for the 11 studies was d = 0.95.
Weissberg-Benchell, Antisdel-Lomaglio, et al. Insulin pump therapy: a meta-analysis. Diabetes Care. 2003;26:1079-1087.

11 Objective of RCTs
Range:
[Figure; axis labels: Conventional MDI Better | Insulin Pump Better]
Figure 1 (same figure and caption as slide 10).
Weissberg-Benchell, Antisdel-Lomaglio, et al. Insulin pump therapy: a meta-analysis. Diabetes Care. 2003;26:1079-1087.

12 Clinician’s Dilemma
• Must base patient’s treatment on population
• Ecological fallacy (Robinson, 1950)
• Ergodicity theorem (Birkhoff, 1931)
• Simpson’s paradox (Simpson, 1951)

13 Needs for Within-Subject Clinical Trials
• Small population or sample
  – Pilot studies
  – Rare or newly discovered diseases
  – Genetic microtrials
• In-the-field research required
• Little funding
• Patients have study exclusion criteria
• Intervention mechanisms / processes

14 Part 1: Within-Subject Experimental Designs
Overall Goal of Designs: Eliminate alternative explanations
• Multiple Baseline Design
[Figure panels: Results support Treatment | Results don’t support Treatment]
AllPsych; //allpsych.com/researchmethods/multiplebaselines/#.Vd30PvlVhBe; Kazdin, Single-case research designs. Oxford U Press. 2011.

15 Problem with Visual Inspection
Real life data are messy
[Figure: Patient D; y-axis: Glucose mg/dL]
Ridenour, Pineo et al. Toward idiographic research in prevention science: Demonstration of three techniques for rigorous small sample research. Prevent Sci 2013;14:267-278.

16 Part 1: Within-Subject Experimental Designs
Overall Goal of Designs: Eliminate alternative explanations
• Multiple Baseline Design (figure and sources as on slide 14)

17 Part 2: Hierarchical Modeling
Hierarchical linear model
• Level 1: time series observations within-person
• Level 2: aggregates for individuals / sample
$$Y_{it} = \beta_0 + u_{0i} + \beta_1(\text{Time}_{it}) + u_{1i}(\text{Time}_{it}) + \beta_2\,\text{Intx}_{it} + \beta_3(\text{Intx}_{it} \times \text{Time}_{it}) + e_{it}$$
  – Intercept terms: $\beta_0 + u_{0i}$
  – Slope terms: $\beta_1(\text{Time}_{it}) + u_{1i}(\text{Time}_{it})$
  – Differences between phases (control, treatment 1, treatment 2): $\beta_2\,\text{Intx}_{it} + \beta_3(\text{Intx}_{it} \times \text{Time}_{it})$

18 Levels of WSCT Models
[Figure panels: Patient A, Patient B, Patient C, Patient D; y-axis: Glucose mg/dL]
Ridenour, Pineo et al. Toward idiographic research in prevention science: Demonstration of three techniques for rigorous small sample research. Prevent Sci 2013;14:267-278.

19 Hierarchical Model Components
[Figure panels: Patient A, Patient B, Patient C, Patient D; y-axis: Glucose mg/dL; annotations: Intercepts, Slopes]

20 Implications for Pharmaceuticals and Medical Devices
• WSCTs can help tease out potential “period effects” that may confound our understanding of the effects of an intervention
  – i.e., something occurs that affects the responses of all participants at a particular time
• With standard analytic methods (e.g., HLM/MLM), we can examine responses across many assessment points and identify the “step functions” indicating when an intervention had an effect
• Small numbers of participants are offset by many observations per participant, providing confidence in results

21 Illustration 1: Small Sample Pilot Study
Schedule, first block (weekly columns, 8/1/05 through 12/26/05):
Patient A: ss ss GG GG GG GG GG GG GG GG GG GG GG GG GG …
Patient B: ss ss ss ss ss ss ss GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG …
Patient C: …
Patient D: …
Schedule, second block (weekly columns, 10/2/06 through 4/30/07):
Patient A: …
Patient B: …
Patient C: … ss ss ss ss ss ss GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG
Patient D: … ss ss ss ss ss ss sG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG GG G
Ridenour, Pineo et al. Toward idiographic research in prevention science: Demonstration of three techniques for rigorous small sample research. Prevent Sci 2013;14:267-278.

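To make the hierarchical model from Part 2 concrete, the sketch below evaluates its equation for a single patient under a staggered (multiple baseline) start. It is an illustration only, not the analysis used in the cited study: the class and function names, the coefficient values, and the start week are all hypothetical, and it assumes a single treatment phase coded as a 0/1 indicator (the slides allow for more than one treatment phase).

```python
from dataclasses import dataclass


@dataclass
class FixedEffects:
    b0: float  # sample-average intercept (beta_0)
    b1: float  # sample-average slope over time (beta_1)
    b2: float  # shift in level during treatment (beta_2)
    b3: float  # change in slope during treatment (beta_3)


@dataclass
class PatientEffects:
    u0: float  # this patient's deviation from the average intercept (u_0i)
    u1: float  # this patient's deviation from the average slope (u_1i)


def phase_indicator(week: int, start_week: int) -> int:
    """Intx: 0 during baseline, 1 from the week treatment starts."""
    return 1 if week >= start_week else 0


def predict(fixed: FixedEffects, patient: PatientEffects,
            week: int, start_week: int) -> float:
    """Expected outcome for one patient-week (the model without e_it)."""
    intx = phase_indicator(week, start_week)
    intercept = fixed.b0 + patient.u0
    slope = fixed.b1 + patient.u1
    return intercept + slope * week + fixed.b2 * intx + fixed.b3 * intx * week


# Hypothetical values, chosen only to make the arithmetic easy to follow.
fixed = FixedEffects(b0=200.0, b1=-0.5, b2=-50.0, b3=-1.0)
patient = PatientEffects(u0=10.0, u1=0.25)

# Hypothetical staggered start: this patient begins treatment in week 4.
trajectory = [predict(fixed, patient, week, start_week=4) for week in range(8)]
for week, value in enumerate(trajectory):
    print(week, value)
```

The drop between the last baseline week and the first treatment week is the kind of “step function” described on slide 20. Estimating the coefficients from data would require a mixed-model routine, which this sketch deliberately leaves out.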
22 Detailed Heterogeneity in Outcomes

                 Aggregated Times   7:30am           11:30am          4:30pm           8:30pm
Entire Sample    -49.4 (9.2)        -35.9 (9.8)      -43.3* (194.2)   -59.4 (9.7)      -59.1* (277.9)
Patient A        -40.9 (10.7)       0.2* (11.1)      1.8* (24.4)      -50.4 (20.2)     -104.2 (19.4)
Patient B        -107.9 (11.8)      -32.2 (8.8)      -117.3 (23.0)    -156.3 (19.3)    -122.2 (17.0)
Patient C        -22.6* (15.3)      11.5* (27.5)     -66.6 (26.8)     -35.5* (25.4)    3.0* (27.7)
Patient D        -24.6 (10.1)       -112.1 (16.0)    26.3* (17.6)     43.5 (17.7)      -57.3 (24.3)

Note: * Change in glucose was NS (p>.01). Parenthetical values are 95% confidence intervals. The orange cell presents preliminary efficacy. Green cells present “impact” of treatment per patient.
Ridenour, Pineo et al. Toward idiographic research in prevention science: Demonstration of three techniques for rigorous small sample research. Prevent Sci 2013;14:267-278.

23 Implications for Pharmaceuticals and Medical Devices
• RCTs compare differences in mean effects between treatment arms
• Variability around mean effects within treatment arms (heterogeneity of treatment effects) can “wash out” differences between treatment arms or obscure an understanding of when treatment is best administered (as in this example)
• Examining what is occurring within patients can lend important insights into these effects and may be informative about individual responses
• Examining this individual variability fits practically and philosophically with personalized medicine

24 Illustration 2: Rigorous Testing in Small Population
• Immunosuppressive drugs prevent rejection of organ
• 40-60% of patients lapse from treatment regimen
• 15-25% of noncompliance due to high cost
• Inclusion criteria: age >18; 6 months post-transplant for liver or 3 months for kidney; 3+ trough concentrations before and after switch (stable dosing)
• N=103 (48 liver, 55 kidney); observations = 746 trough concentrations (ng/mL)
• No organ rejections, no appreciable changes in liver / kidney function
Momper, Ridenour, et al. The impact of conversion from Prograf to generic tacrolimus in liver and kidney transplant recipients with stable graft function.
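
As a rough illustration of the within-subject logic in Illustration 2, the sketch below compares each patient's mean trough concentration after a switch with that patient's own mean before it, then summarizes the changes across patients. It is not the analysis reported in the cited study: the function names and the trough values are hypothetical, and the interval uses a normal approximation that would be too narrow for a sample as small as the one shown. Only the "3+ trough concentrations before and after" requirement is taken from the slide.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def within_patient_change(before, after):
    """Mean trough after the switch minus the mean before it (ng/mL)."""
    if len(before) < 3 or len(after) < 3:
        raise ValueError("need 3+ trough concentrations in each phase")
    return mean(after) - mean(before)


def summarize_changes(changes, level=0.95):
    """Average change across patients with a normal-approximation interval."""
    avg = mean(changes)
    se = stdev(changes) / sqrt(len(changes))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return avg, (avg - z * se, avg + z * se)


# Hypothetical trough concentrations (ng/mL): (before switch, after switch).
patients = {
    "P1": ([6.0, 7.0, 8.0], [5.0, 6.0, 7.0]),
    "P2": ([5.0, 5.0, 5.0], [5.0, 5.0, 5.0]),
    "P3": ([4.0, 5.0, 6.0], [5.0, 6.0, 7.0]),
}

# Each patient serves as their own control.
changes = [within_patient_change(b, a) for b, a in patients.values()]
avg, (low, high) = summarize_changes(changes)
print(changes, avg, (low, high))
```

Keeping the per-patient changes, rather than only the average, preserves the individual-level heterogeneity that slide 23 argues can be lost in between-arm comparisons.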
