Strengths and Pitfalls of Randomized vs. Observational Analyses of Treatment Effects

Matthew T. Roe, MD, MHS, Professor of Medicine, DCRI Fellowship Program Director, Duke/DCRI
Laine Thomas, PhD, Assistant Professor, Department of Biostatistics and Bioinformatics, Duke/DCRI

Finding the “Truth”: Clinical Trials versus Observational Data


Estimating Treatment Effects Using Traditional Randomized Data

Matthew T. Roe, MD, MHS

Attributes of Traditional Randomized Clinical Trials

▪ Randomization levels the playing field, yielding comparable populations and avoiding selection bias
▪ Well-practiced standards for blinding of treatment allocation and GCP adherence for trial conduct
▪ Study drugs are stored under controlled conditions, distributed and accounted for by study sites, and monitored closely for compliance
▪ Controlled experiment with frequent participant contact to ascertain longitudinal follow-up and medication changes (to the randomized study drug as well as to concomitant medications)

Guiding Principles to Define Quality in Traditional Trials

▪ Right Patient:
  – Have we enrolled the right participants according to the protocol with adequate consent?
▪ Right Intervention:
  – Did participants receive the assigned treatment, and did they stay on the treatment?
▪ Right Primary/Secondary Outcomes:
  – Was there complete ascertainment of primary and secondary data?
▪ Right Safety Outcomes:
  – Was there complete ascertainment of primary and secondary safety data?
▪ Right Study Conduct:
  – Were there any major GCP-related issues?

Challenges Facing Traditional Randomized Clinical Trials

• Limited pool of trial site investigators, who may not be representative of typical clinicians treating patients in routine practice
• Further compounded by limited participation of patients with the disease state of interest from trial sites

Site Enrollment Metrics from Traditional Trials

Median patients/site/month (Q1, Q3) in traditional trials:
  Lipid lowering:       0.2 (0.1, 0.3)
  Atrial fibrillation:  0.4 (0.2, 0.7)
  ACS 1:                0.2 (0.1, 0.5)
  ACS 2:                0.3 (0.2, 0.5)
  Diabetes:             0.6 (0.3, 1.0)
  ACS 3:                0.6 (0.3, 1.0)

Challenges Facing Traditional Randomized Clinical Trials

Early discontinuation rates of randomized study drug in trials remain high and differ across geographic regions

Challenges Facing Traditional Randomized Clinical Trials

Regional differences in patient characteristics and trial results

CV death, MI, or stroke by geographic region (KM % at month 12):
  Asia / Australia (n = 1714):                 ticagrelor 11.4 vs. clopidogrel 14.8; HR 0.80 (95% CI 0.61, 1.04)
  Central America / South America (n = 1237):  ticagrelor 15.2 vs. clopidogrel 17.9; HR 0.86 (95% CI 0.65, 1.13)
  Europe / Middle East / Africa (n = 13859):   ticagrelor 8.8 vs. clopidogrel 11.0;  HR 0.80 (95% CI 0.72, 0.90)
  North America (n = 1814):                    ticagrelor 11.9 vs. clopidogrel 9.6;  HR 1.25 (95% CI 0.93, 1.67)
  Treatment-by-region interaction p-values: 0.045, 0.01

[Forest plot: hazard ratio axis 0.5, 1.0, 2.0; left of 1.0 favors ticagrelor, right favors clopidogrel]

Identification of “Eligible” and “Ineligible” Patients in the TOPCAT Trial

Discordant total event rates

Assessing Treatment Effects with Traditional Clinical Trials

▪ Highly controlled environment with well-practiced quality standards
▪ Randomization and blinding limit bias, allowing the trial hypothesis to be rigorously tested and treatment effects to be ascertained
▪ Investigators and trial participants may not be representative of routine practice, so results may not be generalizable
▪ Several challenges continue to potentially confound results from traditional clinical trials

Estimating Treatment Effects Using Observational Data

Laine E. Thomas, PhD

We’ve always needed observational studies

Once upon a time…
▪ 1983: Feinstein A. An Additional Basic Science for Clinical Medicine: II. The Limitations of Randomized Trials. Annals of Internal Medicine 1983; 99: 544-550.

▪ 2017: Frieden TR. Evidence for Health Decision Making – Beyond Randomized Controlled Trials. The New England Journal of Medicine 2017; 377: 465-75.

Why?

Can we believe observational comparisons? Two DCRI examples:

▪ Atrial fibrillation
  – Warfarin prevents stroke but is known to increase bleeding risk
  – Outcome: ISTH major bleeding
  – Trial results: elevation is approximately 2-fold
  – Observational data: ORBIT (DCRI) and GARFIELD (Thrombosis Research Institute) registries

▪ Prevention of CV disease
  – Statins reduce the risk of cardiovascular events
  – Outcome: MI/stroke/CV death
  – Trial results: reduction is approximately 25%
  – Observational data: Framingham Heart Study Offspring Cohort

Failure?

Treatment effects estimated in observational data disagree with clinical trials.

                        Clinical trial HR    Observational *adjusted HR
  Warfarin on bleeding   ≈ 2.00               0.99 (CI: 0.81-1.19)
  Statins on CV events   ≈ 0.76               0.91 (CI: 0.61-1.34)

* Adjustment via inverse propensity weighting (IPW)
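For concreteness, here is a minimal sketch of how such IPW adjustment is typically constructed. This is illustrative code, not the code behind the estimates above; the DataFrame and column names are hypothetical.

```python
# Illustrative sketch of inverse propensity weighting (IPW); not the code
# behind the estimates above. DataFrame and column names are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression

def ipw_weights(df: pd.DataFrame, treatment: str, covariates: list) -> pd.Series:
    """Fit a propensity model P(treated | covariates) and return IPW weights."""
    X = df[covariates]
    t = df[treatment].astype(int)
    ps = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]
    # Weight = 1/ps for treated patients, 1/(1 - ps) for untreated patients.
    w = t / ps + (1 - t) / (1 - ps)
    return w.rename("ipw")

# The weights would then feed a weighted outcome model (for example, a weighted
# Cox model for ISTH major bleeding) to obtain an adjusted hazard ratio.
```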

Is the analysis too confounded?

Success!

Treatment effects estimated in observational data agree with clinical trials.

                         Clinical trial HR    Observational *adjusted HR
  Warfarin on bleeding¥   ≈ 2.00               1.49 (CI: 1.03, 2.18)
  Statins on CV events*   ≈ 0.76               0.74 (CI: 0.56, 0.99)

Same treatment, same outcome, same adjustment variables! Same*/similar¥ data sources. What changed?

Fundamental principles for causal inference

Principles (assessed for the “failure” and “success” analyses):
  1. No unmeasured confounding
  2. Interventions and endpoints are well-defined/measured
  3. Measured confounders are balanced (after IPW / propensity adjustment)
  4. New user design (follow-up begins immediately after treatment initiation)
  5. Equipoise (clinical uncertainty remains, at least to a degree)

Do we measure all of the important differences between treated and untreated patients?


Fundamental principles for causal inference

▪ The same measured variables were successfully balanced (made comparable between treated and untreated patients)
▪ Confounding does not explain the difference in results
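As an illustration of what “checking balance” means in practice, here is a minimal sketch of a standardized-mean-difference table before and after weighting. The code is illustrative only; column names are hypothetical, and the weights are assumed to come from an IPW/propensity step like the one sketched earlier.

```python
# Illustrative balance check: absolute standardized mean differences (SMDs)
# for each covariate, before and after weighting. Column names are
# hypothetical; weights are assumed to come from the IPW sketch above.
import numpy as np
import pandas as pd

def smd(df, treatment, covariate, weights=None):
    """Absolute standardized mean difference of one covariate between arms."""
    w = pd.Series(1.0, index=df.index) if weights is None else weights
    t = df[treatment].astype(bool)
    x = df[covariate].astype(float)
    m1 = np.average(x[t], weights=w[t])
    m0 = np.average(x[~t], weights=w[~t])
    v1 = np.average((x[t] - m1) ** 2, weights=w[t])
    v0 = np.average((x[~t] - m0) ** 2, weights=w[~t])
    return abs(m1 - m0) / np.sqrt((v1 + v0) / 2)

def balance_table(df, treatment, covariates, weights):
    return pd.DataFrame(
        {
            "smd_unweighted": [smd(df, treatment, c) for c in covariates],
            "smd_weighted": [smd(df, treatment, c, weights) for c in covariates],
        },
        index=covariates,
    )

# Rule of thumb: weighted SMDs below about 0.1 suggest the measured covariates
# are adequately balanced, which is what both analyses achieved here.
```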


Fundamental principles for causal inference

▪ This is where the analyses start to depart
▪ The unsuccessful analyses evaluated “current users”


Fundamental principles for causal inference

▪ All patients in ORBIT/GARFIELD were reasonably eligible for warfarin
▪ Not all patients in Framingham were reasonable candidates for statins
  – The analyses differed in eligibility criteria


Fundamental principles for causal inference

▪ Clinical trials have to meet all of these criteria! Not just randomization.
▪ Observational studies can meet all of these criteria. Do they …?

Estimating Treatment Effects Using Pragmatic Clinical Trial Data

Matthew T. Roe, MD, MHS

Leveraging Real-World Evidence for Pragmatic Trials

▪ Types of real-world data (RWD)
  – EHR
  – Claims
  – Registries
  – mHealth

▪ Efficient linking and analysis of RWD
  – Standardized informed consent guidelines, data use agreements, and data governance practices
  – Common data models and data quality standards
  – Well-characterized methodologies

Image: FDA (2013) Strengthening Our National System for Medical Device Postmarket Surveillance: Updates and Next Steps.

Traditional vs. Pragmatic Trials

  Traditional                            Pragmatic
  Ideal population                       Routine population
  Narrow, selective investigator pool    Larger pool of investigators
  Multiple, diverse countries            Single or few countries
  Ideal/perfect care                     Usual care
  Blinding                               Un-blinded
  Placebo                                Active control
  “Stand alone” data collection          Centralized data collection (EHR, claims, PRO)

Common linking factor for both types of trials is randomization

Quality Principles for Pragmatic Clinical Trials

1. Is a relevant question being addressed?
2. Protocol that is clear, practical, and focused
3. Adequate number of events to answer the question with confidence
4. Conducted in a routine practice setting to make results generalizable
5. Participants have proper randomization to study treatments
6. Reasonable assurance that patients receive (and stay on) randomized treatment
7. Reasonably complete follow-up and ascertainment of primary outcomes (and other key outcomes assessed)
8. Plan for ongoing measurement, feedback, and improvement of quality measures during trial conduct
9. Safeguards against bias in determining clinically relevant outcomes
10. Protection of the rights of trial participants

– Berdan LG, et al. 2016

ADAPTABLE Study Design

Patients with known ASCVD + ≥ 1 “enrichment factor”

Patients identified by Clinical Data Research Networks (CDRNs) through EHR searches using a computable phenotype that classifies inclusion/exclusion criteria

Patients provided with trial information and link to e-consent on a web portal;† Randomized, open-label treatment assignment provided directly to patient

ASA 81 mg QD ASA 325 mg QD

Electronic patient follow-up: Every 3 or 6 months Supplemented with searches of EHR/CDM/claims data

Duration: enrollment over 24 months; maximum follow-up of 30 months
Primary endpoint: composite of all-cause mortality, hospitalization for MI, or hospitalization for stroke
Primary safety endpoint: hospitalization for major bleeding
† Participants without internet access will be consented and followed via a parallel system.
ClinicalTrials.gov: NCT02697916

Site Enrollment Metrics from ADAPTABLE

CDRN         Site               Site Activated   Started Enrollment   Total Enrolled   Enrollment Rate/Month
Mid-South    Vanderbilt         4/18/2016        April                1,135            63.1
Mid-South    Duke               11/9/2016        November             653              59.4
Mid-South    UNC                1/13/2017        April                238              39.7
OneFlorida   U of Florida       11/1/2016        November             313              28.5
PaTH         UPMC               7/18/2016        August               370              26.4
GPC          Marshfield Clinic  11/1/2016        February             181              22.6
NYC-CDRN     Montefiore         11/9/2016        November             230              20.9
CAPriCORN    U of Chicago       2/16/2017        February             152              19.0
REACHnet     Ochsner            4/18/2016        April                316              17.6
PaTH         Penn State         9/23/2016        October              201              16.8
PaTH         Utah               9/23/2016        October              202              16.8
NYC-CDRN     Weill Cornell      3/8/2017         March                117              16.7
GPC          MCW                11/9/2016        January              125              13.9

Despite impressive enrollment rates, only 4% of approached patients enroll in the trial

VALIDATE

• 25 of 29 PCI centers in Sweden participated in the trial
• 47.8% (6006 of 12,561) of all patients in Sweden presenting at enrolling hospitals with an initial diagnosis of STEMI or NSTEMI planned for PCI were randomized
• Of all patients potentially eligible for enrollment, 70.0% (6006 of 8585) were randomized

Baseline characteristics and clinical presentation (10/2016):
                              Bivalirudin   Heparin   Screened, not randomized*
  N                           3004          3002      6555
  STEMI (%)                   50.0          50.0      35.4
  Male sex (%)                74.2          72.5      69.5
  Age (years), median         68.0          68.0      71.0
  BMI, median                 26.8          26.9      26.9
  Weight <60 kg (%)           4.6           5.2       7.2
  Current smoker (%)          23.8          23.7      20.8
  Diabetes (%)                16.3          16.9      24.9
  Hypertension (%)            51.8          51.6      62.2
  Hyperlipidemia (%)          31.7          31.2      44.1
  Previous MI (%)             16.3          16.1      27.9
  Previous PCI (%)            15.2          14.2      21.9
  Previous CABG (%)           5.1           4.7       10.0
  Previous stroke (%)         3.8           4.2       7.1
  CPR before arrival (%)      0.9           0.7       1.3
  Killip class II-IV (%)      3.6           2.9       7.9
  *All patients with an initial diagnosis of STEMI or NSTEMI who were not included.

Accuracy and Generalizability of Assessing Treatment Effects with Pragmatic Trials

▪ Randomization in PCTs minimizes the selection and treatment biases that occur commonly in routine practice
  – Non-blinded treatment allocation may bias treatment compliance, endpoint ascertainment, and patient retention
▪ Large-scale recruitment of participants from a more diverse pool of investigators in PCTs may result in more representative patient populations, but many patients are still not enrolled
▪ There is synergism between traditional and pragmatic trials for assessing treatment effects, so “hybrid” trials may be the most promising path forward

Designing observational studies that emulate clinical trials

Laine E. Thomas, PhD

Attributes of clinical trials

▪ No unmeasured confounding: measure variables that matter to treatment selection and outcomes
▪ Define endpoints and interventions carefully
▪ Use appropriate adjustment techniques
  – Such as propensity score methods
  – Check balance: did adjustment work?
▪ New user design
▪ Equipoise
▪ Sample size
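For reference, the standard definitions behind the propensity-score items in this checklist (the notation is added here for reference, not taken from the slides):

```latex
% Propensity score and its balancing property (standard definitions;
% notation added here, not from the slides).
\[
  e(X) \;=\; \Pr(T = 1 \mid X)
  \qquad \text{(probability of treatment given measured covariates } X\text{)}
\]
\[
  T \,\perp\!\!\!\perp\, X \mid e(X)
  \qquad \text{(balance: within levels of } e(X)\text{, covariates are comparable across arms)}
\]
\[
  \{Y(1), Y(0)\} \,\perp\!\!\!\perp\, T \mid X
  \;\;\Longrightarrow\;\;
  \{Y(1), Y(0)\} \,\perp\!\!\!\perp\, T \mid e(X)
  \qquad \text{(if there is no unmeasured confounding)}
\]
```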

Prevalent User Design

[Schematic: patient timelines on calendar time relative to the registry start. Many patients initiate treatment (Tx), and some bleed, before the registry opens; only patients still on treatment at the registry start contribute to the “treated” group, and follow-up begins at the registry start rather than at treatment initiation.]

What kind of treated patients are left? The type who don’t bleed.

Selection bias in prevalent user design

                        Clinical trial HR    Observational *adjusted HR
  Warfarin on bleeding   ≈ 2.00               0.99 (CI: 0.81-1.19)

* Adjustment via inverse propensity weighting (IPW)

Selection bias is not addressed by our typical adjustment for confounders; our adjustment methods don’t (can’t) do much about it. The variables required for adjustment are not the usual confounders, but all of the unobserved causes of bleeding!
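To make the mechanism concrete, here is a small simulation of my own (not from the slides; every parameter is hypothetical) showing how a prevalent user comparison can hide a true 2-fold bleeding hazard while a new user comparison preserves it:

```python
# Hypothetical simulation: initiators with high unmeasured bleeding risk tend
# to bleed and stop warfarin before the registry opens, so the "current users"
# observed at registry start are selected to be bleed-tolerant.
import numpy as np

rng = np.random.default_rng(0)
n = 400_000
frailty = rng.gamma(shape=0.5, scale=2.0, size=n)   # unmeasured bleeding propensity (mean 1)
initiator = rng.random(n) < 0.5                     # started warfarin 3 years before registry
TRUE_HR, BASE_RATE, PRE_YEARS, FOLLOW_UP = 2.0, 0.10, 3.0, 2.0

# Pre-registry period: initiators who bleed stop warfarin before the registry opens.
pre_bleed_time = rng.exponential(1.0 / (BASE_RATE * frailty * TRUE_HR))
current_user = initiator & (pre_bleed_time > PRE_YEARS)

def incidence_rate_ratio(group, on_treatment):
    """Crude event-rate ratio (group vs. rest) over the follow-up window."""
    rate = BASE_RATE * frailty * np.where(on_treatment, TRUE_HR, 1.0)
    time_to_bleed = rng.exponential(1.0 / rate)
    event = time_to_bleed < FOLLOW_UP
    person_time = np.minimum(time_to_bleed, FOLLOW_UP)
    arm_rate = lambda g: event[g].sum() / person_time[g].sum()
    return arm_rate(group) / arm_rate(~group)

# Prevalent user design: current users vs. everyone else, from registry start.
print("prevalent user IRR:", round(incidence_rate_ratio(current_user, current_user), 2))
# New user design: initiators vs. never-users, followed from initiation.
print("new user IRR:", round(incidence_rate_ratio(initiator, initiator), 2))
# With these hypothetical parameters, the new user comparison stays well above 1
# (somewhat attenuated by the unmeasured frailty), while the prevalent user
# comparison is pulled toward or below 1 -- the pattern in the warfarin row above.
```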

Famous example in the Nurses’ Health Study

Hormone replacement on coronary heart disease:
  NHS, *adjusted HR, prevalent user design:      0.67 (CI: 0.54-0.85)
  NHS, *adjusted HR, new user design:            1.42 (CI: 0.92-2.20)
  Women’s Health Initiative, randomized trial:   1.68 (CI: 1.15-2.45)

Hernán MA, Alonso A, Logan R, Grodstein F, Michels KB, Willett WC, Manson JE, Robins JM. Observational studies analyzed like randomized experiments: an application to postmenopausal hormone therapy and coronary heart disease. Epidemiology 2008; 19(6): 766-779.

New User Design

▪ Why not?
  – New user design has too few patients
  – Adjustment variables need to be collected longitudinally (prior to initiation)
▪ Solutions
  – Big QI registries, if they collect longitudinal treatments
  – EHR / claims / large consortia
▪ Methods
  – Sequential stratification
  – Risk-set matching
  – Etc.

Methods: New User Design

[Schematic: risk-set matching. When a patient initiates treatment (Tx), they are paired with a comparator who has not yet initiated treatment and is still event-free at that time; each pair is then followed forward from the shared time since eligibility. A comparator may initiate treatment later (Tx?).]
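A minimal sketch of greedy 1:1 risk-set matching along these lines (illustrative only, not code from the studies cited below; column names such as `init_time`, `event_time`, and `risk_score` are hypothetical, with `risk_score` standing in for a propensity or risk score available at the matching time):

```python
# Illustrative greedy 1:1 risk-set matching for a new user design.
import numpy as np
import pandas as pd

def risk_set_match(df, caliper=0.1):
    """Match each initiator, at initiation, to an event-free not-yet-treated patient."""
    used, pairs = set(), []
    initiators = df[df["init_time"].notna()].sort_values("init_time")
    for _, trt in initiators.iterrows():
        t0 = trt["init_time"]
        risk_set = df[
            (df["event_time"] > t0)                              # still event-free at t0
            & (df["init_time"].isna() | (df["init_time"] > t0))  # not yet treated at t0
            & (df["id"] != trt["id"])
            & ~df["id"].isin(used)
        ]
        if risk_set.empty:
            continue
        dist = (risk_set["risk_score"] - trt["risk_score"]).abs()
        best = dist.idxmin()
        if dist.loc[best] <= caliper:
            used.add(risk_set.loc[best, "id"])
            pairs.append({"treated_id": trt["id"],
                          "control_id": risk_set.loc[best, "id"],
                          "match_time": t0})
    return pd.DataFrame(pairs)

# Tiny hypothetical cohort; follow-up for each pair starts at match_time.
cohort = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "init_time": [2.0, np.nan, 5.0, np.nan, np.nan, 7.0],  # months to initiation (NaN = never)
    "event_time": [9.0, 6.0, 12.0, 10.0, 8.0, 11.0],       # months to event or censoring
    "risk_score": [0.42, 0.40, 0.55, 0.57, 0.30, 0.35],
})
print(risk_set_match(cohort))
```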

Examples: New User Design

▪ ORBIT-AF (Registry)

Allen LA, Fonarow GC, Simon DN, Thomas LE, Marzec LN, Pokorney SD, Gersh BJ, Go AS, Hylek EM, Kowey PR, Mahaffey KW, Chang P, Peterson ED, Piccini JP, ORBIT-AF Investigators. Digoxin use and subsequent outcomes among patients in a contemporary atrial fibrillation cohort. J Am Coll Cardiol. 2015 Jun 30; 65(25): 2691-8.

Pokorney SD, Kim S, Thomas L, Fonarow GC, Kowey PR, Gersh BJ, Mahaffey KW, Peterson ED, Piccini JP; Outcomes Registry for Better Informed Treatment of Atrial Fibrillation (ORBIT-AF) Investigators. Cardioversion and subsequent quality of life and natural history of atrial fibrillation. Am Heart J. 2017 Mar; 185: 59-66.

▪ ARISTOTLE (Clinical Trial)

Lopes RD, Rordorf R, De Ferrari GM, Leonardi S, Thomas L, Wojdyla DM, Ridefelt P, Lawrence JH, De Caterina R, Vinereanu D, Hanna M, Flaker G, Al-Khatib SM, Hohnloser SH, Alexander JH, Granger CB, Wallentin L. Digoxin and Mortality in Patients with Atrial Fibrillation (submitted).

Equipoise – Does it matter in Observational Research too?

In whom would it be ethical to conduct a clinical trial?

Extended Generalizability: an Advantage of Observational Research

• Over what dimension?
  • Geographically representative
  • SES …
  • Clinical risk factors

• How broad?
• Even when treatment is known?
• Maybe we don’t trust current practice? Need more evidence?

• There is very little evidence in the tails – regardless of whether we want it
• High variance
• Bias?

Considering Equipoise

▪ There may be a tradeoff when we ask observational data to answer “harder” questions where the data are sparse and biases are strong
▪ Solutions
  – Keep answering “stretch” questions where needed
  – Check: how far are we “stretching” the available data to answer something that is poorly informed by them?
▪ Methods
  – Check the propensity distribution, then refine the population scientifically
  – Target different populations
    • Average treatment effect among the treated (ATT)
    • Overlap weighting (ATO)

Methods: Considering Equipoise
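A minimal sketch of what “targeting different populations” looks like once a propensity score has been fitted (illustrative code, not the authors’ method; `ps` and `t` are assumed arrays of propensity scores and 0/1 treatment indicators):

```python
# Illustrative weights for different target populations, plus a crude
# check of the propensity distribution. `ps` and `t` are assumed inputs.
import numpy as np

def causal_weights(ps: np.ndarray, t: np.ndarray, estimand: str) -> np.ndarray:
    t = t.astype(float)
    if estimand == "ATE":   # classic IPW: weight everyone to the full population
        return t / ps + (1 - t) / (1 - ps)
    if estimand == "ATT":   # treated kept as-is; controls reweighted to resemble the treated
        return t + (1 - t) * ps / (1 - ps)
    if estimand == "ATO":   # overlap weights: emphasize patients in clinical equipoise
        return t * (1 - ps) + (1 - t) * ps
    raise ValueError(f"unknown estimand: {estimand}")

def check_overlap(ps: np.ndarray, t: np.ndarray) -> None:
    """Crude look at the propensity distributions; poor overlap signals weak equipoise."""
    pct = [1, 25, 50, 75, 99]
    print("treated propensity percentiles:", np.percentile(ps[t == 1], pct).round(2))
    print("control propensity percentiles:", np.percentile(ps[t == 0], pct).round(2))
```

Overlap (ATO) weights down-weight patients whose propensity is near 0 or 1, which is one way of restricting attention to the region of equipoise rather than “stretching” into the tails.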

Design rather than disclaimer

Disclaimer: "As with all observational treatment comparisons, we cannot rule out the possibility that associations are biased by unmeasured confounding."
▪ Appropriately cautious
▪ Should not eclipse the potential to do good causal inference in observational data
▪ Clinical trials have a lot of strengths beyond randomization
▪ Observational research can emulate those strengths

Discussion

Moderated by Kevin Anstrom, PhD, Director, Data Solutions for Clinical Trials; Associate Director, Biostatistics, DCRI; Associate Professor of Biostatistics and Bioinformatics, Duke