Shaping the Future of Drug Development

Transparent machine learning

Alind Gupta
Real-World and Advanced Analytics

Machine learning
• Algorithms that learn from data + evaluation criteria
• Key challenge: identify the (unobservable) data-generating distribution from a sample

Research areas
• Prediction
• Knowledge discovery
• Anomaly detection
• Summarization
• Optimal decision-making

Figure: linear regression vs. neural network fitted to the same (X, Y) data. Which model better approximates the data distribution P(X,Y)?
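The figure's question can be made concrete by scoring candidate models on fresh data from the same distribution. The sketch below is a minimal illustration, not the method from the talk: it assumes a hypothetical data-generating process (y = sin(3x) plus noise), and it uses a degree-12 polynomial as a stand-in for a high-capacity model such as a neural network. Held-out mean squared error only measures how well each model captures E[Y | X], which is one aspect of P(X,Y).

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    """Draw n (x, y) pairs from a hypothetical data-generating distribution:
    x ~ Uniform(-1, 1), y = sin(3x) + Gaussian noise (sd 0.1)."""
    x = rng.uniform(-1.0, 1.0, n)
    return x, np.sin(3.0 * x) + rng.normal(0.0, 0.1, n)

def held_out_mse(degree, x_train, y_train, x_test, y_test):
    """Least-squares polynomial fit on the training sample, scored on held-out data."""
    coefs = np.polyfit(x_train, y_train, degree)
    return float(np.mean((np.polyval(coefs, x_test) - y_test) ** 2))

x_tr, y_tr = sample(30)      # the small sample we actually observe
x_te, y_te = sample(5000)    # a large fresh sample standing in for the true distribution

# Degree 1 = linear regression; degree 12 = hypothetical high-capacity stand-in.
results = {d: held_out_mse(d, x_tr, y_tr, x_te, y_te) for d in (1, 5, 12)}
for degree, mse in results.items():
    print(f"degree {degree:2d}: held-out MSE = {mse:.4f}")
```

The straight line cannot represent the curvature, while a very flexible fit to only 30 points may chase noise; comparing on held-out data is what reveals which one better approximates the underlying distribution.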

4/28/2020 Cytel Inc. 2

Potential applications in

Shah, P., et al. (2019). Artificial intelligence and machine learning in clinical development: a translational perspective. NPJ Digital Medicine, 2(1), 1-5.

Black-box machine learning
• High-capacity, low-interpretability models (e.g. deep neural networks)
• Problems:
  • Unknown biases and limitations
  • Inability to audit decision-making
  • Difficult to troubleshoot
  • May not engender trust among users and regulators

Figure: input → black-box model → prediction

Transparency is important

EU GDPR
• Individual's “Right to explanation” about automated decisions

FDA discussion paper (proposed Good Machine Learning Practices, GMLP)
• “[A]ppropriate validation, transparency” to assure “safety and effectiveness”
• Focus on validation with “clinicians in the loop” where necessary

https://www.fda.gov/files/medical%20devices/published/US-FDA-Artificial-Intelligence-and-Machine-Learning-Discussion-Paper.pdf

Limitations of black boxes

Adversarial attacks
• What patterns are black-box models really representing?
Figure: image classified as “panda” (57%) + adversarial noise = image classified as “gibbon” (99%)

IBM Watson for Oncology
• “Overpromising and under-delivery”

COMPAS
• As good as random people on the internet at predicting recidivism

Strickland, E. (2019). IBM Watson, heal thyself: How IBM overpromised and underdelivered on AI health care. IEEE Spectrum, 56(4), 24-31.
Dressel, J., & Farid, H. (2018). The accuracy, fairness, and limits of predicting recidivism. Science Advances, 4(1), eaao5580.
Buolamwini, J., & Gebru, T. (2018). Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on Fairness, Accountability and Transparency (pp. 77-91).
Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z. B., & Swami, A. (2017). Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM Asia Conference on Computer and Communications Security (pp. 506-519).
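The panda/gibbon image is the well-known example produced with the fast gradient sign method (FGSM) of Goodfellow et al.; the black-box attacks of Papernot et al. craft such perturbations on a substitute model and rely on them transferring to the target. A minimal sketch of FGSM itself, assuming a hypothetical logistic-regression classifier with made-up weights in place of an image model:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps):
    """Fast gradient sign method against a logistic-regression classifier.

    For binary cross-entropy, the gradient of the loss with respect to the
    input is (p - y) * w, where p = sigmoid(w.x + b). Stepping eps along its
    sign increases the loss, i.e. pushes the prediction away from label y."""
    p = sigmoid(x @ w + b)
    return x + eps * np.sign((p - y) * w)

rng = np.random.default_rng(1)
w = rng.normal(size=100)     # hypothetical "trained" weights, 100 input features
b = 0.0
x = 0.02 * np.sign(w)        # a clean input the model assigns to class 1
x_adv = fgsm(x, 1.0, w, b, eps=0.05)

p_clean = sigmoid(x @ w + b)
p_adv = sigmoid(x_adv @ w + b)
print(f"P(class 1): clean {p_clean:.2f}, adversarial {p_adv:.2f}")
print("largest per-feature change:", np.max(np.abs(x_adv - x)))
```

No single feature moves by more than eps, yet the prediction flips, because many small changes add up in w·x. This is the kind of behaviour that is hard to anticipate without insight into what a black-box model represents.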

Bayesian networks

A transparent and flexible machine learning method

Key idea
• Performing computations on a directed acyclic graph (DAG)

Figure: DAG structure and DAG parametrization (maximum likelihood or Bayesian estimation), informed by data and a subject-matter expert
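Once the DAG structure is fixed, the joint distribution factorises into one conditional probability table (CPT) per node, and each table can be estimated from counts. A minimal sketch, assuming a hypothetical three-node DAG and made-up binary records (not the model from the talk): `alpha = 0` gives the maximum-likelihood estimate, and `alpha > 0` adds pseudo-counts, i.e. a simple Bayesian estimate under a symmetric Dirichlet prior.

```python
from collections import Counter
from itertools import product

# Hypothetical DAG (not the one from the talk): stage -> response <- treatment
parents = {"stage": [], "treatment": [], "response": ["stage", "treatment"]}
states = {"stage": [0, 1], "treatment": [0, 1], "response": [0, 1]}

# Hypothetical records as (stage, treatment, response)
rows = [(0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 1),
        (1, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
data = [dict(zip(["stage", "treatment", "response"], r)) for r in rows]

def fit_cpts(data, parents, states, alpha=0.0):
    """Estimate P(var | parents) for every node from counts.

    alpha = 0 gives maximum likelihood; alpha > 0 adds alpha pseudo-counts per
    state (a symmetric Dirichlet prior, i.e. a simple Bayesian estimate)."""
    cpts = {}
    for var, pa in parents.items():
        joint = Counter((tuple(r[p] for p in pa), r[var]) for r in data)
        marg = Counter(tuple(r[p] for p in pa) for r in data)
        table = {}
        for pa_val in product(*(states[p] for p in pa)):
            denom = marg[pa_val] + alpha * len(states[var])
            for x in states[var]:
                # Unseen parent configurations (denom == 0) fall back to uniform.
                table[(pa_val, x)] = ((joint[(pa_val, x)] + alpha) / denom
                                      if denom else 1.0 / len(states[var]))
        cpts[var] = table
    return cpts

def joint_prob(assignment, cpts, parents):
    """DAG factorisation: P(x_1, ..., x_n) = prod_i P(x_i | parents(x_i))."""
    prob = 1.0
    for var, pa in parents.items():
        prob *= cpts[var][(tuple(assignment[p] for p in pa), assignment[var])]
    return prob

mle = fit_cpts(data, parents, states)
bayes = fit_cpts(data, parents, states, alpha=1.0)
print(joint_prob({"stage": 0, "treatment": 1, "response": 1}, mle, parents))  # 0.25
print(bayes["response"][((1, 0), 1)])  # (0 + 1) / (2 + 2) = 0.25
```

Every number in the fitted model is a readable conditional probability attached to one edge set of the graph, which is where the transparency of the approach comes from.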

Applications of graphical models

• Risk prediction
• Causal inference
• Computer vision
• NLP
• Gene networks

Use case: Immunotherapy for advanced cancer

Challenges
• High individual-level heterogeneity in response to treatment
• Subsets of patients show durable responses or severe adverse events
• Short follow-up
• Multiple outcomes of interest

Uses for machine learning
• Identifying predictors of response
• Long-term predictions for health economic evaluations in HTA (better than curve-fitting?)
• Informing future trial design and surrogate endpoints
• Patient simulation with time-varying interventions

Multivariate prediction model
• Based on individual patient data (IPD) from an RCT
• 3 outcomes over 3 years

DAG learning
• MLE + bootstrapping + model averaging
• Constrained edge orientation based on causal tiers
• Comparison to known/expected relationships

Figure: learned DAG, including a risk score node (X)
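The slide names the ingredients of DAG learning without details. A minimal sketch of how bootstrapping, model averaging and causal tiers can fit together, assuming a deliberately crude stand-in learner (thresholded Pearson correlation, which is not the MLE-based structure learning used in the talk). Edges that would point from a later tier (e.g. an outcome) to an earlier one are reversed, and edges found in at least half of the bootstrap resamples form the averaged model. Variable names, tiers, the threshold and the simulated data are all hypothetical.

```python
import itertools
import numpy as np

def allowed(parent, child, tiers):
    """Tier constraint: an edge may not point from a later tier to an earlier one."""
    return tiers[parent] <= tiers[child]

def learn_edges(data, names, tiers, threshold=0.3):
    """Stand-in structure learner (hypothetical, for illustration only):
    link two variables when |Pearson r| exceeds threshold, then orient the
    edge so it respects the causal tiers; within a tier, use column order."""
    corr = np.corrcoef(data, rowvar=False)
    edges = set()
    for i, j in itertools.combinations(range(len(names)), 2):
        if abs(corr[i, j]) > threshold:
            a, b = names[i], names[j]
            edges.add((a, b) if allowed(a, b, tiers) else (b, a))
    return edges

def bootstrap_average(data, names, tiers, n_boot=200, min_freq=0.5, seed=0):
    """Model averaging: keep edges found in at least min_freq of bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    counts = {}
    for _ in range(n_boot):
        resample = data[rng.integers(0, n, size=n)]   # rows drawn with replacement
        for edge in learn_edges(resample, names, tiers):
            counts[edge] = counts.get(edge, 0) + 1
    return {e: c / n_boot for e, c in counts.items() if c / n_boot >= min_freq}

# Simulated data: biomarker drives outcome; baseline_score is unrelated.
rng = np.random.default_rng(42)
biomarker = rng.normal(size=300)
outcome = biomarker + rng.normal(scale=0.5, size=300)
baseline = rng.normal(size=300)
data = np.column_stack([outcome, biomarker, baseline])
names = ["outcome", "biomarker", "baseline_score"]
tiers = {"baseline_score": 0, "biomarker": 1, "outcome": 2}

result = bootstrap_average(data, names, tiers)
print(result)  # edge -> fraction of bootstrap resamples containing it
```

In practice the stand-in would be replaced by a proper score-based or constraint-based learner with the tier rules supplied as forbidden edges; the bootstrap frequencies then act as a measure of confidence in each edge.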

Classification performance

Figure panels: Outcome 1; results for other outcomes

External validation using real-world data (RWD)
• To assess generalizability and limitations
• Problem: what to do about covariates missing from the RWD?
  • HRQoL is highly prognostic in the RCT but not recorded in the RWD

Figure panels: all variables; common subset; real-world data. Annotation: good real-world generalizability
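The slide leaves the missing-covariate question open. One property of Bayesian networks that bears on it (and one reason the conclusion lists missing-data problems) is that a covariate absent from new data can be summed out using the network's own conditional distributions, so no separate model has to be refitted. A minimal sketch, assuming a hypothetical three-node network with made-up probabilities:

```python
# Hypothetical discrete network: stage -> hrqol, stage -> response, hrqol -> response.
# All probabilities below are made up for illustration.
p_hrqol_given_stage = {
    "early": {"good": 0.7, "poor": 0.3},
    "late": {"good": 0.4, "poor": 0.6},
}
p_response_given = {  # P(response = yes | stage, hrqol)
    ("early", "good"): 0.5, ("early", "poor"): 0.3,
    ("late", "good"): 0.3, ("late", "poor"): 0.1,
}

def p_response(stage, hrqol=None):
    """P(response = yes | evidence). If HRQoL was not recorded (as in the RWD),
    marginalise it out using the network's own P(hrqol | stage):
    P(response | stage) = sum_h P(h | stage) * P(response | stage, h)."""
    if hrqol is not None:
        return p_response_given[(stage, hrqol)]
    return sum(p_h * p_response_given[(stage, h)]
               for h, p_h in p_hrqol_given_stage[stage].items())

print(p_response("early", "good"))  # RCT-style patient with HRQoL observed
print(p_response("early"))          # RWD-style patient, HRQoL missing
```

This assumes the RCT-derived P(hrqol | stage) also holds in the real-world population; if other observed variables were children of HRQoL, the sum would use the posterior P(hrqol | all evidence) instead.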

External validation by key subgroups

Figure panels: Subgroup A (moderate real-world generalizability); Subgroup B (good real-world generalizability)

Prognostic variables by treatment group

Figure panels: Treatment A; Treatment B. Variables ordered by increasing prognostic value (axis: prognostic value), with a differentially prognostic variable highlighted
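The slide does not say how prognostic value was measured. One possible measure for discrete variables is the mutual information between a variable and the outcome, computed separately in each arm; a variable is then differentially prognostic when the two values differ markedly. A minimal sketch, assuming made-up (biomarker level, response) pairs for two hypothetical arms:

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Empirical mutual information, in nats, between a discrete prognostic
    variable X and a discrete outcome Y, from a list of (x, y) observations:
    I(X; Y) = sum_{x,y} p(x, y) * log(p(x, y) / (p(x) p(y)))."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    # p(x,y) / (p(x) p(y)) = (c/n) / ((px/n)(py/n)) = c * n / (px * py)
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())

# Hypothetical (biomarker level, response) pairs in each treatment arm.
arm_a = [(0, 0), (0, 0), (1, 1), (1, 1)]   # biomarker tracks response
arm_b = [(0, 0), (0, 1), (1, 0), (1, 1)]   # biomarker unrelated to response

mi_a, mi_b = mutual_information(arm_a), mutual_information(arm_b)
print(f"arm A: {mi_a:.3f} nats, arm B: {mi_b:.3f} nats")
```

Empirical mutual information is biased upwards in small samples, so with real trial data the per-arm values would need uncertainty intervals (for example from the same bootstrap used for DAG learning) before calling a variable differentially prognostic.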

Communication

Dynamic Bayesian networks

Extending Bayesian networks for time modelling

Predicting trends in time

Challenges
• Extrapolation in time
• Time-varying covariates
• How prognostic are changes in variables?
• Time-varying interventions

Figure panels: initial distribution; time replication
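The two panels correspond to the two parts of a dynamic Bayesian network: a distribution for the first time slice, and a transition slice that is replicated across time. A minimal sketch, assuming a single hypothetical discrete variable with an absorbing death state (a full DBN would factorise each slice over several variables according to its DAG) and made-up monthly transition probabilities. Passing the transition as a function of time allows a time-varying intervention, and the survival curve is read off as 1 minus the probability of being dead.

```python
import numpy as np

STATES = ["low", "high", "dead"]  # hypothetical biomarker levels plus absorbing death

p0 = np.array([0.7, 0.3, 0.0])    # initial-slice distribution (hypothetical)
T_on = np.array([                  # transition slice while on treatment (hypothetical)
    [0.90, 0.07, 0.03],
    [0.25, 0.65, 0.10],
    [0.00, 0.00, 1.00],            # death is absorbing
])
T_off = np.array([                 # transition slice after stopping treatment (hypothetical)
    [0.80, 0.15, 0.05],
    [0.10, 0.75, 0.15],
    [0.00, 0.00, 1.00],
])

def unroll(p0, transition, n_steps):
    """Replicate the transition slice over time under a first-order Markov
    assumption: p_{t+1} = p_t @ transition(t). Passing a function of t lets the
    transition change with a time-varying intervention."""
    dist = [p0]
    for t in range(n_steps):
        dist.append(dist[-1] @ transition(t))
    return np.array(dist)

# Monthly slices over 3 years; treatment stops after month 12 (a hypothetical schedule).
dist = unroll(p0, lambda t: T_on if t < 12 else T_off, 36)
survival = 1.0 - dist[:, STATES.index("dead")]
print(np.round(survival[[0, 12, 24, 36]], 3))
```

Because the same slice is reused at every step, the model can be run past the end of follow-up, which is exactly where extrapolation assumptions (such as the Markov assumption) matter most.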

Survival curves

Figure: predicted survival curves (x-axis: time)

Prediction performance from baseline

Figure panels: Treatment group A; Treatment group B. Annotation: plateau?

Prognostic value of changes

Figure: outcome (survival vs. death) from month T to month T+1 by biomarker transition, grouped into high, intermediate and low levels. Transitions shown: Biomarker A high → high, med → med, high → med, med → low; Biomarker B low → med, high → low
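In a time-sliced model, a change in a biomarker can be made prognostic by giving the outcome at month T+1 two parents: the biomarker level at month T and the level at month T+1. A minimal sketch of estimating that conditional table from counts, assuming made-up patient records (the levels, records and the one-month outcome window are all hypothetical):

```python
from collections import Counter

# Hypothetical records: (biomarker level at month T, level at month T+1, died at T+1)
records = [
    ("high", "high", False), ("high", "high", False),
    ("high", "med", True),   ("high", "med", False),
    ("med", "low", True),    ("med", "low", True),
    ("med", "med", False),   ("med", "med", True),
]

def death_risk_by_change(records):
    """Empirical P(death at T+1 | level at T, level at T+1): the change enters
    the model as a pair of parents spanning two adjacent time slices."""
    totals = Counter((before, after) for before, after, _ in records)
    deaths = Counter((before, after) for before, after, died in records if died)
    return {change: deaths[change] / n for change, n in totals.items()}

risks = death_risk_by_change(records)
for (before, after), risk in risks.items():
    print(f"{before} -> {after}: {risk:.2f}")
```

Comparing, for example, "med → med" with "med → low" separates the prognostic value of the change from that of the current level alone.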

Future directions

• Relaxing the Markov assumption
• Latent variable models
• Adding outcomes
  • PFS, ORR, TFS
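A first-order Markov model lets month T+1 depend only on month T. One standard way to relax this (not necessarily the one planned here; latent variable models, also listed, are another) is a second-order model, which can be rewritten as a first-order chain over pairs of consecutive states so that the same unrolling machinery still applies. A minimal sketch, assuming a hypothetical binary biomarker and made-up transition probabilities:

```python
from itertools import product
import numpy as np

STATES = [0, 1]  # hypothetical binary biomarker state (0 = low, 1 = high)

# Hypothetical second-order transitions: P2[a][b][c] = P(X_{t+1}=c | X_{t-1}=a, X_t=b)
P2 = [[[0.9, 0.1], [0.4, 0.6]],
      [[0.7, 0.3], [0.2, 0.8]]]

def pair_chain(P2, states):
    """Rewrite a second-order chain as a first-order chain on pair states
    (X_{t-1}, X_t): pair (a, b) moves to (b, c) with probability P2[a][b][c]."""
    pairs = list(product(states, repeat=2))
    index = {pair: i for i, pair in enumerate(pairs)}
    T = np.zeros((len(pairs), len(pairs)))
    for a, b in pairs:
        for c in states:
            T[index[(a, b)], index[(b, c)]] = P2[a][b][c]
    return pairs, T

pairs, T = pair_chain(P2, STATES)
print(pairs)
print(T)
```

The cost is a larger state space (the square of the original for second order), which is one reason latent variable models are an attractive alternative.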

Conclusion

• Bayesian networks are transparent, interpretable models
• Useful for multivariate prediction
• Useful for missing-data problems and small datasets
• Useful as time models for dynamic processes
