Transparent Machine Learning
Shaping the Future of Drug Development
Transparent machine learning
Alind Gupta
Real-World and Advanced Analytics
Cytel Inc., 4/28/2020

Machine learning
• Algorithms that learn from data + evaluation criteria
• Key challenge: identify the (unobservable) data-generating distribution from a sample
Research areas
• Prediction
• Knowledge discovery
• Anomaly detection
• Summarization
• Optimal decision-making
[Figure: linear regression and neural network fits of Y against X. Which model better approximates the data distribution P(X,Y)?]

Potential applications in clinical research
Shah, P., et al. (2019). Artificial intelligence and machine learning in clinical development: a translational perspective. NPJ Digital Medicine, 2(1), 1-5.

Black-box machine learning
• High capacity, low interpretability models (e.g. deep neural networks)
• Problems:
  • Biases and limitations?
  • Inability to audit decision-making
  • Difficult to troubleshoot
  • May not engender trust in users, regulators
[Figure: Input → Black box model (?) → Prediction]

Transparency is important
EU GDPR
• Individual’s “Right to explanation” about automated decisions
FDA guidance for Good Machine Learning Practices (GMLP)
• “[A]ppropriate validation, transparency” to assure “safety and effectiveness”
• Focus on validation with “clinicians in the loop” where necessary
https://www.fda.gov/files/medical%20devices/published/US-FDA-Artificial-Intelligence-and-Machine-Learning-Discussion-Paper.pdf

Limitations of black boxes
Adversarial attacks
• What patterns are black box models really representing?
IBM Watson for Oncology
• “Overpromising and under-delivery”
COMPAS
• As good as random people on the internet at predicting recidivism
[Figure: adversarial example. Panda (57%) + perturbation = Gibbon (99%)]
Strickland, E. (2019). IBM Watson, heal thyself: How IBM overpromised and underdelivered on AI health care. IEEE Spectrum, 56(4), 24-31.
Dressel, J., & Farid, H. (2018). The accuracy, fairness, and limits of predicting recidivism.
Science Advances, 4(1), eaao5580.
Buolamwini, J., & Gebru, T. (2018, January). Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on fairness, accountability and transparency (pp. 77-91).
Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z. B., & Swami, A. (2017, April). Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia conference on computer and communications security (pp. 506-519).

Bayesian networks
A transparent and flexible machine learning method

Key idea
Performing computations on a Directed Acyclic Graph (DAG)
[Figure labels: Data; Subject-matter expert; DAG structure; DAG parametrization (maximum likelihood, Bayesian estimation)]

Applications of graphical models
• Risk prediction
• Causal inference
• Bayesian inference
• Computer vision
• NLP
• Gene networks

Use case: Immunotherapy for advanced cancer
Challenges
• High individual-level heterogeneity in response to treatment
• Subsets show durable response, severe adverse events
• Short follow-up
• Multiple outcomes of interest
Uses for machine learning
• Identifying predictors of response
• Long-term predictions for health economic evaluations for HTA (better than curve-fitting?)
• Informing future trial design, surrogate endpoints
• Patient simulation with time-varying interventions

Multivariate prediction model
• Based on IPD from RCT
• 3 outcomes over 3 years
DAG learning
• MLE + bootstrapping + model averaging
• Constrained edge orientation based on causal tiers
• Comparison to known/expected relationships
[Figure labels: Risk score; X]

Classification performance
[Figure: classification results for Outcome 1 and for the other outcomes]

External validation using RWD
• To assess generalizability and limitations
• Problem: what to do about missing covariates?
• HRQoL is highly prognostic in RCT but not present in RWD
[Figure: panels "All variables", "Common subset", "Real-world data"; annotation: good real-world generalizability]

External validation by key subgroups
[Figure: panels "Subgroup A" and "Subgroup B"; annotations: "Moderate real-world generalizability", "Good real-world generalizability"]

Prognostic variables by treatment group
[Figure: panels "Treatment A" and "Treatment B"; axis: prognostic value; variables ordered by increasing prognostic value; annotation: differentially prognostic variable]

Communication

Dynamic Bayesian networks
Extending Bayesian networks for time modelling

Predicting trends in time
Challenges
• Extrapolation in time
• Time-varying covariates
• How prognostic are changes in variables?
• Time-varying interventions
[Figure labels: Initial distribution; Time replication]

Survival curves
[Figure: survival curves; x-axis: Time]

Prediction performance from baseline
[Figure: panels "Treatment group A" and "Treatment group B"; annotation: "Plateau?"]

Prognostic value of changes
[Figure: outcomes "Survival" and "Death" from Month T to Month T+1; legend: high, intermediate and low levels; transitions shown: Biomarker A high → high, Biomarker A med → med, Biomarker B low → med, Biomarker A high → med, Biomarker B high → low, Biomarker A med → low]

Future directions
• Relaxing the Markov assumption
• Latent variable models
• Adding outcomes
  • PFS, ORR, TFS

Conclusion
• Bayesian networks are transparent, interpretable models
• Useful for multivariate prediction
• Useful for missing data problems + small data
• Useful as time models for dynamic processes
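The Bayesian network ideas in the slides (a DAG, parametrized from data, then queried even when a covariate is missing) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and does not come from the talk: the four binary variables, the DAG in `PARENTS`, the toy records, and the helper names `fit_cpts`, `joint` and `query`. The parameters are posterior means under a symmetric Dirichlet prior (`alpha`), which approaches the maximum likelihood estimate as `alpha` goes to 0, and inference is brute-force enumeration, which is only practical for very small networks.

```python
from collections import Counter
from itertools import product

# Hypothetical DAG, NOT from the slides: each node maps to its list of parents.
PARENTS = {
    "treatment": [],
    "biomarker": [],
    "response": ["treatment", "biomarker"],
    "survival": ["response"],
}
STATES = [0, 1]  # every variable is binary in this toy example


def fit_cpts(records, parents=PARENTS, alpha=1.0):
    """Estimate P(node | parents) by smoothed counting over complete records.

    alpha > 0 gives the posterior mean under a symmetric Dirichlet prior;
    as alpha -> 0 this approaches the maximum likelihood estimate.
    """
    cpts = {}
    for node, pa in parents.items():
        counts = Counter((tuple(r[p] for p in pa), r[node]) for r in records)
        cpts[node] = {}
        for pa_vals in product(STATES, repeat=len(pa)):
            total = sum(counts[(pa_vals, s)] for s in STATES) + alpha * len(STATES)
            cpts[node][pa_vals] = {
                s: (counts[(pa_vals, s)] + alpha) / total for s in STATES
            }
    return cpts


def joint(assignment, cpts, parents=PARENTS):
    """Joint probability of a full assignment: product of local conditionals."""
    p = 1.0
    for node, pa in parents.items():
        p *= cpts[node][tuple(assignment[x] for x in pa)][assignment[node]]
    return p


def query(target, evidence, cpts, parents=PARENTS):
    """P(target | evidence); variables absent from evidence are summed out."""
    hidden = [v for v in parents if v != target and v not in evidence]
    dist = {}
    for t in STATES:
        dist[t] = sum(
            joint({**evidence, target: t, **dict(zip(hidden, vals))}, cpts, parents)
            for vals in product(STATES, repeat=len(hidden))
        )
    z = sum(dist.values())
    return {t: p / z for t, p in dist.items()}


# Made-up toy records, purely illustrative (not trial or real-world data).
records = (
    [{"treatment": 1, "biomarker": 1, "response": 1, "survival": 1}] * 8
    + [{"treatment": 1, "biomarker": 0, "response": 0, "survival": 0}] * 4
    + [{"treatment": 0, "biomarker": 1, "response": 1, "survival": 1}] * 3
    + [{"treatment": 0, "biomarker": 0, "response": 0, "survival": 0}] * 6
    + [{"treatment": 1, "biomarker": 0, "response": 1, "survival": 1}] * 2
    + [{"treatment": 0, "biomarker": 1, "response": 0, "survival": 1}] * 2
)

cpts = fit_cpts(records)
# All covariates observed.
full = query("survival", {"treatment": 1, "biomarker": 1}, cpts)
# Biomarker missing: it is marginalized out instead of being imputed.
partial = query("survival", {"treatment": 1}, cpts)
print(full, partial)
```

Each entry of `cpts` is a small table that can be read directly, which is one sense in which such a model is transparent. The `partial` query shows how a covariate that is absent from an external dataset can be summed out rather than forcing the record to be dropped.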