Machine Learning for Personalised Healthcare: Advances and Challenges
Danielle Belgrave Researcher Healthcare Machine Learning
@DaniCMBelg
My Research
[Figure: time series for Patients 1 to 4; x-axis: Time (hours), 0 to 24]
Latent Variable Modelling
Longitudinal Data Analysis
Missing Data
[Figure: graph over variables X, Y and Z]
Patient-Centric Approach
Causality
Multidisciplinary
Top 8 Challenges: DS in Healthcare
1. Address some of the technical challenges facing the machine learning community in doing impactful healthcare research
2. Presentation of current solutions
3. Steps for future research
Challenge #1: Estimating Treatment Effects
[Figure: randomised controlled trial. The patient group (population) is split into 2 groups by random allocation, Intervention and Control, and outcomes for both groups are measured. Legend: cured / still diseased]
Supervised Learning
Intervention vs Control (legend: cured / not cured)
p1 = proportion in the intervention group who are cured
p2 = proportion in the control group who are cured
H0: $p_1 - p_2 = 0$ vs H1: $p_1 - p_2 \neq 0$
Mean: $p_1 - p_2$
Variance: $\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}$
Test Statistic: $Z = \frac{p_1 - p_2}{\sqrt{p(1-p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$, where $n_1$ and $n_2$ are the group sizes and $p$ is the pooled proportion cured across both groups
Assume well-labelled groups
Machine recognises a new example
Classification, regression
Challenge #2: Heterogeneous Populations
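The trial comparison of H0: p1 - p2 = 0 gives a single population-level answer, which is a useful baseline before turning to heterogeneity. Below is a minimal Python sketch of that two-proportion z-test, assuming the pooled-proportion form of the statistic. The function name and the counts (60 of 100 cured versus 45 of 100) are hypothetical and only for illustration.

```python
import math

def two_proportion_z_test(cured_1, n_1, cured_2, n_2):
    """Two-sided z-test of H0: p1 - p2 = 0 using the pooled proportion."""
    p1 = cured_1 / n_1                      # proportion cured, intervention
    p2 = cured_2 / n_2                      # proportion cured, control
    p = (cured_1 + cured_2) / (n_1 + n_2)   # pooled proportion under H0
    se = math.sqrt(p * (1 - p) * (1 / n_1 + 1 / n_2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p1 - p2, z, p_value

# Hypothetical counts, not data from any study.
diff, z, p_value = two_proportion_z_test(60, 100, 45, 100)
print(f"difference={diff:.3f}, z={z:.3f}, p-value={p_value:.4f}")
```

The result is one average effect for the whole patient group; it says nothing about which patients benefit.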
[Figure: patient group; same diagnosis, same prescription]
Understanding Heterogeneity
Same diagnosis, same prescription, but within the patient group:
• Drug NOT toxic and beneficial
• Drug toxic but NOT beneficial
Supervised Learning is not Enough
[Figure: patient group split into cured and not cured]
Supervised Learning: assume well-labelled groups; the machine recognises a new example. Methods: classification, regression.
Unsupervised Learning: infer patterns and discover underlying data structure from a dataset without reference to labelled outcomes. Methods: clustering, latent variable modelling.
Endotype Discovery to Understand Heterogeneity
To identify subgroups of complex disease risk
Treatment outcome explained by distinctive underlying mechanism
Foundation of Stratified Medicine
Seeking better-targeted interventions
Accounting for Heterogeneity
Individualised Healthcare Outcomes
• Probabilistic Graphical Models
• Reinforcement Learning
• Deep Learning
Understanding Heterogeneity Through Probabilistic Graphical Modelling
Identifying Heterogeneous Patient Groups
Parsimonious description of the data inferred from what is observed
Probabilistic Programming
Probabilistic reasoning system:
• The probabilistic model expresses general knowledge about a situation
• The evidence contains specific information about a situation
• The queries express the things that will help you make a decision
• The inference algorithm uses the model to answer queries given evidence
• The answers to queries are framed as probabilities of different outcomes
[Figure: the basic components of a probabilistic reasoning system: probabilistic model, evidence, queries, inference algorithm, answer]
Adapted from Pfeffer, Avi. "Practical probabilistic programming." International Conference on Inductive Logic Programming. Springer Berlin Heidelberg, 2010.
Eco-systems for Probabilistic Programming
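Whatever framework is used, the model / evidence / query / answer loop of a probabilistic reasoning system can be illustrated without any library. The sketch below is plain Python and is not Infer.NET, Edward, Pyro or Stan code. The two-state model, the variable names and all probabilities are hypothetical, and exact Bayes' rule stands in for a general inference algorithm.

```python
# Model: general knowledge about the situation (hypothetical numbers).
prior = {"disease": 0.01, "healthy": 0.99}               # P(state)
likelihood_positive = {"disease": 0.95, "healthy": 0.05}  # P(test positive | state)

def infer(prior, likelihood):
    """Inference algorithm: exact Bayes' rule over a discrete latent state."""
    joint = {state: prior[state] * likelihood[state] for state in prior}
    evidence_probability = sum(joint.values())
    return {state: joint[state] / evidence_probability for state in joint}

# Evidence: this patient's test came back positive.
# Query: what is the probability of each state given that evidence?
answer = infer(prior, likelihood_positive)
print(answer)
```

The answer is a probability for each outcome rather than a single label, which is the property the frameworks listed here provide at scale.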
Frameworks: Infer.NET, Edward, Pyro, Stan
[Figure: Infer.NET inference engine. Labels: probabilistic program; observed values (data, priors); Infer.NET compiler; algorithm; C# compiler; algorithm execution; probability distributions]
Understanding Heterogeneity Through Probabilistic Graphical Modelling: Motivating Examples
Example 1: Heterogeneity in Asthma
Asthma
Phenotypes: Observable Manifestations of Disease
[Figure: word cloud of asthma phenotypes. Labels: Asthma Medication; Exacerbations; Allergy; Wheeze; Asthma Symptoms; Poor Lung Function; Don't Respond to treatment; Respond to treatment; Grow out of Asthma in Childhood; Late Asthma; Severity]
Endotypes? Different Diseases With Different Causes
Causal Mechanisms of Asthma and Allergy
Manchester: longitudinal birth cohort, ~2000 children
Bristol: longitudinal birth cohort, ~10000 children
Progression of allergy: Eczema -> Asthma -> Rhinitis
Symptoms Causally Linked
Prevention strategy: target children with eczema to reduce progression to asthma and rhinitis
Hidden Markov Model: “Allergic March”
[Figure: three separate chains, one each for eczema, wheeze and rhinitis. Each chain has its own class (Eczema Class, Wheeze Class, Rhinitis Class) and a symptom state at ages 1, 3, 5, 8 and 11. Children (n=9801). Latent Class Disease Profile. Logo: Manchester Asthma and Allergy Study]
Longitudinal Latent Disease Profile
[Figure: a single class (Class = 1,…,k) with one latent state at each of ages 1, 3, 5, 8 and 11; eczema, wheeze and rhinitis are observed at each age. Children (n=9801). Latent Class Disease Profile]
Disaggregating Symptom Heterogeneity
The “Allergic March” reflects patterns at the population level, rather than the natural covariance of symptoms within individuals’ life courses
Developmental profiles of eczema, wheeze and rhinitis are heterogeneous
Danielle CM Belgrave, Raquel Granell, Angela Simpson, John Guiver, Christopher Bishop, Iain Buchan, A. John Henderson, and Adnan Custovic. Developmental Profiles of Eczema, Wheeze, and Rhinitis: Two Population-Based Birth Cohort Studies. PLoS Medicine, 2014.
Example 2: Heterogeneity in Complex Chronic Diseases
Scleroderma
Aim: To predict a function of time that models the future trajectory of a single target clinical marker tracking a disease process of interest.
Heterogeneity in Lung Function
Schulam, Peter, and Suchi Saria. "Integrative analysis using coupled latent variable models for individualizing prognoses." The Journal of Machine Learning Research 17.1 (2016): 8244-8278.
Probabilistic Models for Individualised Care
Schulam and Saria (2016), as cited above.
Reinforcement Learning in Healthcare: Motivating Examples
Individualised Treatment Effects
Task of Optimising Sequential Decision-Making
[Figure: tree of sequential treatment decisions (Mechanical ventilation? Sedate? Vasopressors?). One path shows the observed decisions and response; responses on the other branches are unobserved]
Gottesman, O., Johansson, F., Komorowski, M., Faisal, A., Sontag, D., Doshi-Velez, F., & Celi, L. A. (2019). Guidelines for reinforcement learning in healthcare. Nature Medicine, 25(1), 16-18.
Reinforcement Learning in Healthcare
What action maximises the reward?
Action (A) - State (S) - Reward (R) - Policy (π) - Value (V)
Value (V): expected long-term return of the current state s under policy π, written $V^\pi(s)$
Q-value or action-value (Q): long-term return of the current state s, taking action a under policy π, written $Q^\pi(s, a)$
$V^\pi(s) = E[R_{t+1} + \lambda V^\pi(S_{t+1}) \mid S_t = s]$
$Q^\pi(s, a) = E[r_{t+1} + \lambda r_{t+2} + \lambda^2 r_{t+3} + \dots \mid s, a]$
$\phantom{Q^\pi(s, a)} = E[r + \lambda Q^\pi(s', a') \mid s, a]$
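The recursive form of Qπ can be evaluated numerically by substituting the current estimate into the right-hand side until it settles. Below is a minimal Python sketch of this iterative policy evaluation. The two-state problem, its transition probabilities and rewards, and the "always treat" policy are hypothetical and chosen only so the result can be checked by hand.

```python
LAM = 0.9  # discount factor, written λ in the equations above

# Hypothetical toy problem: state 0 = "unwell", state 1 = "recovered" (absorbing).
# Actions: 0 = "wait", 1 = "treat".
# TRANSITIONS[state][action] is a list of (probability, next_state, reward).
TRANSITIONS = {
    0: {0: [(1.0, 0, 0.0)],
        1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 1, 0.0)],
        1: [(1.0, 1, 0.0)]},
}
# POLICY[state][action] is the probability of choosing that action.
POLICY = {0: {0: 0.0, 1: 1.0}, 1: {0: 1.0, 1: 0.0}}

def evaluate_policy(transitions, policy, lam, sweeps=500):
    """Apply Q(s,a) = E[r + lam * Q(s',a')] for a fixed number of sweeps."""
    q = {s: {a: 0.0 for a in actions} for s, actions in transitions.items()}
    for _ in range(sweeps):
        # V(s) is the policy-weighted average of Q(s, a).
        v = {s: sum(policy[s][a] * q[s][a] for a in q[s]) for s in q}
        q = {
            s: {a: sum(p * (r + lam * v[s2]) for p, s2, r in outcomes)
                for a, outcomes in actions.items()}
            for s, actions in transitions.items()
        }
    v = {s: sum(policy[s][a] * q[s][a] for a in q[s]) for s in q}
    return q, v

q_pi, v_pi = evaluate_policy(TRANSITIONS, POLICY, LAM)
print(q_pi, v_pi)
```

By hand, V(0) = 0.8 + 0.2 × 0.9 × V(0), so V(0) = 0.8 / 0.82, which the loop reproduces.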
$Q^*(s, a) = E[r + \lambda \max_{a'} Q^*(s', a') \mid s, a]$
Successful Applications of RL in Healthcare
1. RL applied to optimizing antiretroviral therapy in HIV. Parbhoo, S., Bogojeska, J., Zazzi, M., Roth, V. & Doshi-Velez, F. AMIA Summits on Translational Science Proceedings 2017, 239 (2017).
2. RL applied to tailoring antiepilepsy drugs for seizure control. Guez, A., Vincent, R. D., Avoli, M. & Pineau, J. Treatment of epilepsy via batch-mode reinforcement learning. In Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence 1671–1678 (AAAI, 2008).
3. RL applied to interventions in ICU. Komorowski, M., Celi, L. A., Badawi, O., Gordon, A. & Faisal, A. Nat. Med. 24, 1716–1720 (2018).
Downfalls of RL in Healthcare
Confounding factors: estimated effects of medication may be diminished if severely ill or high-risk patients are given a particular drug
Effective sample size is larger if learned policies are close to the clinician policies, so RL is better suited to refining existing practices than to discovering new treatment approaches
If the rewards we are trying to maximise are too simplistic, will the system behave as expected in real life?
Deep Learning
Exceptionally effective at learning patterns of data
Utilizes learning algorithms that derive meaning out of data by using a hierarchy of multiple layers that mimic the neural networks of our brain
If you provide a system with tons of information, it begins to understand it and respond in useful ways
Challenge #3: Heterogeneous Data Types
Data “In-the-Wild”
The richer the data types, the more we need other methods
Meta-challenges to Estimating Treatment Effect
• Heterogeneous Patient Populations
• Generalisability
• Causal Discovery
• Data “In-the-Wild”
• Missing Data
• Interpretability
Challenge #4: Causality
Causal Reasoning
The questions that motivate most studies in the health, social and behavioral sciences are not associational but causal in nature.
Before an association is assessed for the possibility that it is causal, other explanations such as chance, bias and confounding have to be excluded
Causal questions require some knowledge of the data-generating process; they cannot be computed from the data alone, nor from the distributions governing the data
Aim: to infer dynamics of beliefs under changing conditions, for example, changes induced by treatments or external interventions. Pearl, Judea. "Causal inference in statistics: An overview." Statistics Surveys 3 (2009): 96-146.
Efficacy and mechanism evaluation: Causal Framework for Personalising Health
[Figure: causal diagram with nodes Random Allocation, Mediator, Outcomes, Predictive biomarker (moderator), Prognostic biomarker (risk factor) and U]
Example: Personalisation of Cancer Treatment
[Figure: causal diagram for cancer treatment with nodes Treatment, Tumor Size, Genetic Marker, Outcome (Survival), Prognostic biomarker (risk factor) and U]
Bradford-Hill Principles of Causality
Plausibility: Does causation make sense?
Consistency: Cause associated with disease in different populations and studies
Temporality: Cause precedes disease
Strength: Cause strongly associated with disease
Specificity: Does the cause lead to a specific effect?
Dose-Response: Greater exposure to cause, higher the risk of disease
What can Machine Learning Offer
The Deconfounder
• Capture dependency structure using a latent variable model
• Assumption 1: the fitted latent-variable model is a good model of the assigned causes
  • Testable
• Assumption 2: there are no unobserved single-cause confounders, that is, variables that affect one cause and the potential outcome function
  • Not testable, but weaker than an assumption of no unobserved confounders
Challenge #5: Generalizability
Sparsity in Health Data
A major challenge for truly generalizable and scalable AI in healthcare is maximizing information utility for public health impact when that information (observational or clinical-context data) is sparse:
• Missing data
• Inadequately sampled data
• Data that does not represent the diversity of a population
Generalisability: training datasets that are representative of the diversity of the population as well as the heterogeneity of health conditions.
Transfer learning has the potential to:
• Maximise utility of available data
• Improve a model's ability to generalise
Transfer Learning for Data Sparsity
Good quality healthcare data is expensive and very often sparse
Aim: maximizing information by using multiple data sources
Challenge: feature mismatch. Features in different datasets may vary
Challenge: distribution mismatch. Patient populations differ across hospitals
GAN architectures to efficiently enlarge the dataset
Better predictive models than if we simply used the target dataset
Jinsung Yoon, James Jordon and Mihaela van der Schaar. “RadialGAN: Leveraging multiple datasets to improve target-specific predictive models using Generative Adversarial Networks.” arXiv preprint arXiv:1802.06403 (2018).
RadialGAN
Transfer Learning for Data Sparsity
$Z$: latent space
$X^{(i)} \times Y$: the $i$th domain
$G_i$, $F_i$, $D_i$: decoder, encoder and discriminator of the $i$th domain
The $i$th domain is translated to the $j$th domain via $Z$ using $F_i$ and $G_j$
Yoon, Jordon and van der Schaar (2018), as cited above.
Challenge #6: Missing Data
Missing Data
Missing Completely At Random (MCAR): the probability of data being missing does not depend on the observed or unobserved data
e.g. $\mathrm{logit}(p_{it}) = \theta_0$
Missing At Random (MAR): the probability of data being missing does not depend on the unobserved data, conditional on the observed data
e.g. Children with missing wheeze data have better lung function
e.g. $\mathrm{logit}(p_{it}) = \theta_0 + \theta_1 t_i$ or $\mathrm{logit}(p_{it}) = \theta_0 + \theta_2 y_0$
Missing Not At Random (MNAR): the probability of data being missing does depend on the unobserved data, conditional on the observed data
e.g. Children with missing lung function have better lung function
e.g. $\mathrm{logit}(p_{it}) = \theta_0 + \theta_3 y_{it}$
Mason, Alexina, Nicky Best, Sylvia Richardson, and Ian Plewis. "Strategy for modelling non-random missing data mechanisms in observational studies using Bayesian methods." Journal of Official Statistics (2010).
Alexina Mason. “Bayesian methods for modelling non-random missing data mechanisms in longitudinal studies.” PhD Thesis (2009).
Missing Completely at Random
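The three logit models for the probability of missingness can be made concrete with a small simulation. This is a minimal Python sketch, not the Bayesian models of Mason et al.: the coefficient values, the correlation of 0.7 between baseline y0 and outcome y, and the function names are all assumptions made for illustration. It reports the complete-case mean of y (true mean 0) under each mechanism.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def complete_case_mean(mechanism, n=20000, seed=0):
    """Mean of y among individuals whose y is observed (true mean is 0)."""
    rng = random.Random(seed)
    theta0, theta2, theta3 = -1.0, 1.0, 1.0  # hypothetical coefficients
    observed = []
    for _ in range(n):
        y0 = rng.gauss(0.0, 1.0)  # baseline value, always observed
        y = 0.7 * y0 + math.sqrt(1 - 0.7 ** 2) * rng.gauss(0.0, 1.0)
        if mechanism == "MCAR":
            logit_p = theta0                 # depends on nothing
        elif mechanism == "MAR":
            logit_p = theta0 + theta2 * y0   # depends on observed data only
        elif mechanism == "MNAR":
            logit_p = theta0 + theta3 * y    # depends on the value itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        missing = rng.random() < sigmoid(logit_p)
        if not missing:
            observed.append(y)
    return sum(observed) / len(observed)

means = {m: complete_case_mean(m) for m in ("MCAR", "MAR", "MNAR")}
print(means)
```

In this sketch the complete-case mean stays near zero under MCAR and is pulled downwards under MAR and MNAR, because individuals with higher values are more likely to be missing. Under MAR the bias could in principle be corrected using y0; under MNAR the missingness mechanism itself has to be modelled.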
[Figure: graphical model for MCAR with nodes β, θ, µ_i, x_i, σ², y_i, p_i and m_i inside a plate over individual i; two parts labelled Model of Interest and Model of Missingness]
Missing at Random
[Figure: graphical model for MAR with the same nodes, plate and labelled parts]
Missing Not at Random
[Figure: graphical model for MNAR with the same nodes, plate and labelled parts]
Challenge #7: Interpretability
Challenge #8: Fairness
“People will only use technology they trust.”
Brad Smith, President & Chief Legal Officer, Microsoft Corporation
Problem Specific Challenge
Tutorial: 21 fairness definitions and their politics. Arvind Narayanan. Presented at the Conference on Fairness, Accountability, and Transparency, Feb 23 2018.
Computer scientists and statisticians have devised numerous mathematical criteria to define what it means for a classifier or a model to be fair. The proliferation of these definitions represents an attempt to make technical sense of the complex, shifting social understanding of fairness. Thus, these definitions are laden with values and politics, and seemingly technical discussions about mathematical definitions in fact implicate weighty normative questions. A core component of these technical discussions has been the discovery of trade-offs between different (mathematical) notions of fairness; these trade-offs deserve attention beyond the technical community.
https://towardsdatascience.com/a-tutorial-on-fairness-in-machine-learning-3ff8ba1040cb
Making AI work in healthcare
Design AI to Earn Trust
• Fairness
• Reliability & Safety
• Privacy & Security
• Inclusiveness
• Transparency
• Accountability
Reliability & Safety
Evaluate training data
Test extensively (and enable a user feedback loop)
Monitor ongoing performance
Design for unexpected circumstances—including nefarious attacks
Human in the loop
Privacy & Security
Existing privacy laws (e.g. the General Data Protection Regulation) apply
Provide transparency about data collection and use, and good controls so people can make choices about their data
Design systems to protect against bad actors
Use de-identification techniques to promote both privacy and security
Inclusiveness
Inclusive design practices to address potential barriers that could unintentionally exclude people
Enhance opportunities for people with disabilities
Build trust through contextual interaction
EQ in addition to IQ
Transparency
People should understand how decisions were made
Provide contextual explanations
Make it easier to raise awareness of potential bias, errors and unintended outcomes
Ultimate Challenge: Defining the Trade-offs between the Challenges
• Heterogeneous Patient Populations
• Generalisability
• Causal Discovery
• Accuracy
• Data “In-the-Wild”
• Missing Data
• Interpretability
• Fairness
Healthcare ML at MSR Cambridge
Javier Alvarez-Valle, Danielle Belgrave, Chris Bishop, Laurence Bourn, Isabel Chien, David Carter, Stephanie Hyland, Richard Lowe, Pratik Ghosh, Hannah Murfet, Jay Nanavati, Aditya Nori, Kenton O’Hara, Konstantina Palla, Tim Regan, Anton Schwaighofer, Jan Steumer, Kenji Takeda, Ivan Tarapov, Anja Thieme, Sebastian Tschiatschek, Stefan Wijnen, Ted Meeds, Richard Turner
Thank You!
[Logo: Manchester Asthma and Allergy Study]
Danielle Belgrave [email protected]
@DaniCMBelg