The Curing AI for Precision Medicine
Hoifung Poon
1 Medicine Today Is Imprecise
Top 20 drugs: 80% non-responders
Wasted: 1/3 of health spending, $750 billion / year
2 Disruption 1: Big Data
2009 → 2013: 40% → 93%
3 Disruption 2: Pay-for-Performance
Goal: 75% by 2020
4 Vemurafenib on BRAF-V600 Melanoma
[Figure panels: Before Treatment, 15 Weeks, 23 Weeks]
6 Why Haven’t We Solved Precision Medicine?
High-Throughput Data: … ATTCGGATATTTAAGGC … … ATTCGGGTATTTAAGCC …
Discovery
Bottleneck #1: Knowledge
Bottleneck #2: Reasoning
AI is the key to overcoming these bottlenecks
7 Use Case: Molecular Tumor Board
www.ucsf.edu/news/2014/11/120451/bridging-gap-precision-medicine
8 Use Case: Molecular Tumor Board
Problem: Hard to scale
U.S. 2015: 1.6 million new cases, 600K deaths
902 cancer hospitals
Memorial Sloan Kettering 2016:
  Sequence: Tens of thousands
  Board can review: A few hundred
Wanted: Decision support for cancer precision medicine
9 First-Generation Molecular Tumor Board
Knowledge bottleneck
E.g., given a tumor sequence, determine:
  What genes and mutations are important
  What drugs might be applicable
Can do manually but hard to scale
10 Next-Generation Molecular Tumor Board
Reasoning bottleneck
E.g., personalize drug combinations
Can’t do manually, ever
11 Big Medical Data Decision Support Precision Medicine
Machine Reading
Predict Drug Combo
12 13 PubMed
26 million abstracts
Two new abstracts every minute
Adds over one million every year
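As a quick sanity check, the two rates quoted above agree with each other: two abstracts per minute works out to a little over one million per year. The variable names are illustrative, and a 365-day year is assumed.

```python
# Check that "two new abstracts every minute" matches
# "over one million every year" (365-day year assumed).
ABSTRACTS_PER_MINUTE = 2
MINUTES_PER_YEAR = 60 * 24 * 365

abstracts_per_year = ABSTRACTS_PER_MINUTE * MINUTES_PER_YEAR
print(f"{abstracts_per_year:,} abstracts per year")  # 1,051,200 abstracts per year
```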
14 Machine Reading
PMID: 123
… VDR+ binds to SMAD3 to form …
PMID: 456
… JUN expression is induced by SMAD3/4
… ……
Knowledge Base
15 Machine Reading
Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 envelope protein of human immunodeficiency virus type 1 ...
Entities:
  p70(S6)-kinase   GENE
  IL-10            GENE
  gp41             GENE
  human monocyte   CELL
Events:
  Involvement    REGULATION   Theme: up-regulation; Cause: activation
  up-regulation  REGULATION   Theme: IL-10; Cause: gp41; Site: human monocyte
  activation     REGULATION   Theme: p70(S6)-kinase
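The annotations for this sentence form a nested structure: the Involvement event takes other events as its Theme and Cause. A minimal sketch of one way to represent that in code follows. The class names, the `depth` helper, and the exact argument attachments are assumptions read off the sentence, not an API from the talk.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Entity:
    text: str
    type: str  # e.g. "GENE" or "CELL"


@dataclass
class Event:
    trigger: str
    type: str  # e.g. "REGULATION"
    # Each role (Theme, Cause, Site) points to an entity or to another event.
    args: Dict[str, Union[Entity, "Event"]] = field(default_factory=dict)


def depth(node: Union[Entity, Event]) -> int:
    """Nesting depth: 0 for an entity, 1 + deepest argument for an event."""
    if isinstance(node, Entity):
        return 0
    return 1 + max((depth(a) for a in node.args.values()), default=0)


p70 = Entity("p70(S6)-kinase", "GENE")
il10 = Entity("IL-10", "GENE")
gp41 = Entity("gp41", "GENE")
monocyte = Entity("human monocyte", "CELL")

# Assumed attachments, following the sentence's wording.
activation = Event("activation", "REGULATION", {"Theme": p70})
up_regulation = Event(
    "up-regulation", "REGULATION",
    {"Theme": il10, "Cause": gp41, "Site": monocyte},
)
involvement = Event(
    "Involvement", "REGULATION",
    {"Theme": up_regulation, "Cause": activation},
)

print(depth(involvement))  # 2
```

Flat (subject, relation, object) triples cannot hold this directly, because two of the arguments are themselves events.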
18 Long Tail of Variations
TP53 inhibits BCL2.
Tumor suppressor P53 down-regulates the activity of BCL-2 proteins.
BCL2 transcription is suppressed by P53 expression.
The inhibition of B-cell CLL/Lymphoma 2 expression by TP53 …
……
negative regulation: 532 inhibited, 252 inhibition, 218 inhibit, 207 blocked, 175 inhibits, 157 decreased, 156 reduced, 112 suppressed, 108 decrease, 86 inhibitor, 81 Inhibition, 68 inhibitors, 67 abolished, 66 suppress, 65 block, 63 prevented, 48 suppression, 47 blocks, 44 inhibiting, 42 loss, 39 impaired, 38 reduction, 32 down-regulated, 29 abrogated, 27 prevents, 27 attenuated, 26 repression, 26 decreases, 26 down-regulation, 25 diminished, 25 downregulated, 25 suppresses, 22 interfere, 21 absence, 21 repress ……
19 Machine Reading
Prior work
  Focused on Newswire / Web
  Popular entities and facts
  Redundancy: Simple methods often suffice
High-value verticals
  Healthcare, finance, law, etc.
  Little redundancy: Rare entities and facts
  Novel challenges require sophisticated NLP
20 Precision Medicine Machine Reading
Challenges                  Advances
Annotation Bottleneck       Distant Supervision
Complex Knowledge           Grounded Semantic Parsing
Reasoning                   Neural Embedding
Beyond Sentence Boundary    Distant Supervision with Discourse Modeling
21 Free Lunch: Existing KB
NCI Pathway KB
Regulation   Theme   Cause
Positive     A2M     FOXO1
Positive     ABCB1   TP53
Negative     BCL2    TP53
…            …       …
TP53 inhibits BCL2.
Tumor suppressor P53 down-regulates the activity of BCL-2 proteins.
BCL2 transcription is suppressed by P53 expression.
The inhibition of B-cell CLL/Lymphoma 2 expression by TP53 …
……
Distant Supervision: treat sentences mentioning both genes of a KB triple (e.g., Negative, BCL2, TP53) as noisy training examples.
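The labeling idea can be sketched in a few lines: any sentence that mentions both genes of a KB triple becomes a (noisy) training example for that triple, with no manual annotation. This is a toy illustration, not the pipeline from the talk. The alias table and the plain substring matching are hypothetical stand-ins for real gene-name normalization.

```python
# Toy distant supervision: label sentences using an existing KB.
KB = [
    ("Positive", "A2M", "FOXO1"),
    ("Positive", "ABCB1", "TP53"),
    ("Negative", "BCL2", "TP53"),
]

# Hypothetical alias table (lowercase); a real system needs a proper normalizer.
ALIASES = {
    "TP53": ["tp53", "p53"],
    "BCL2": ["bcl2", "bcl-2", "b-cell cll/lymphoma 2"],
    "A2M": ["a2m"],
    "FOXO1": ["foxo1"],
    "ABCB1": ["abcb1"],
}

SENTENCES = [
    "TP53 inhibits BCL2.",
    "Tumor suppressor P53 down-regulates the activity of BCL-2 proteins.",
    "BCL2 transcription is suppressed by P53 expression.",
    "The inhibition of B-cell CLL/Lymphoma 2 expression by TP53",
]


def mentions(sentence: str, gene: str) -> bool:
    """True if any known alias of `gene` occurs in the sentence."""
    lowered = sentence.lower()
    return any(alias in lowered for alias in ALIASES[gene])


def distant_labels(sentences, kb):
    """Pair each sentence with every KB triple whose two genes it mentions."""
    examples = []
    for sentence in sentences:
        for regulation, theme, cause in kb:
            if mentions(sentence, theme) and mentions(sentence, cause):
                examples.append((sentence, (regulation, theme, cause)))
    return examples


examples = distant_labels(SENTENCES, KB)
for sentence, triple in examples:
    print(triple, "<-", sentence)
```

All four sentences pick up the (Negative, BCL2, TP53) label despite their different wording. Co-occurrence does not guarantee that a sentence actually expresses the relation, which is why the labels are only noisy supervision.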
26 Genetic Pathways
PubMed-scale extraction
  15,000 genes, 1.5 million unique regulations
  Compare w. NCI: 10,000 unique regulations
Applications
  UCSC Genome Browser, MSR Interactions Track
  U. Wisconsin breast cancer study
  Etc.
Poon, Toutanova, Quirk, “Distant Supervision for Cancer Pathway Extraction from Text”. PSB-15.
27 Complex Knowledge
Outperform 19 out of 24 supervised participants in GENIA Shared Task
Parikh, Poon, Toutanova. “Grounded Semantic Parsing for Complex Knowledge Extraction”, NAACL-15.
28 Cross-Sentence, N-ary Relations
The deletion mutation on exon-19 of EGFR gene was present in 16 patients, while the L858E point mutation on exon-21 was noted in 10.
All patients were treated with gefitinib and showed a partial response.
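In the example above, the drug appears in one sentence while the gene and mutation appear in the previous one, so a single-sentence extractor never sees all three together. A minimal sketch of cross-sentence candidate generation follows. The mini-lexicons, the substring matching, and the sliding window are simplifying assumptions, not the discourse model of Quirk & Poon.

```python
from itertools import product

# Hypothetical mini-lexicons; a real system would use entity taggers.
DRUGS = {"gefitinib"}
GENES = {"EGFR"}
MUTATIONS = {"L858E"}

SENTENCES = [
    "The deletion mutation on exon-19 of EGFR gene was present in 16 patients, "
    "while the L858E point mutation on exon-21 was noted in 10.",
    "All patients were treated with gefitinib and showed a partial response.",
]


def find(lexicon, text):
    """Return lexicon entries that occur in the text (plain substring match)."""
    return {term for term in lexicon if term in text}


def candidates(sentences, window):
    """(drug, gene, mutation) triples co-occurring within `window` consecutive sentences."""
    found = set()
    for start in range(len(sentences)):
        span = " ".join(sentences[start:start + window])
        found.update(
            product(find(DRUGS, span), find(GENES, span), find(MUTATIONS, span))
        )
    return found


print(candidates(SENTENCES, window=1))  # set(): no single sentence has all three
print(candidates(SENTENCES, window=2))  # {('gefitinib', 'EGFR', 'L858E')}
```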
29 Drug-Gene Interactions
Distant supervision w. discourse modeling
Orders of magnitude increase: 162 → 79,952
No need for annotated examples
Quirk & Poon, “Distant Supervision for Relation Extraction beyond the Sentence Boundary”, arxiv.org/abs/1609.04873.
30 Reasoning: Neural Embedding
Embed gene network
Entity / Relation: (v1, v2, …, vk)
Relation triple
Toutanova et al., “Compositional Learning of Embeddings for Relation Paths in Knowledge Bases and Text”, ACL-16.
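To make the embedding idea concrete: each entity and relation gets a k-dimensional vector, and a candidate triple is scored from those vectors. The sketch below uses a bilinear-diagonal score as one common choice. The scoring function, the relation name, and the toy vectors are all assumptions for illustration, not values from the talk.

```python
from typing import Dict, List

Vector = List[float]

# Toy 3-dimensional embeddings; the values are made up for illustration.
ENTITY: Dict[str, Vector] = {
    "TP53": [1.0, 0.0, 1.0],
    "BCL2": [1.0, 1.0, 0.0],
    "A2M": [0.0, 1.0, 0.0],
}
RELATION: Dict[str, Vector] = {
    "inhibits": [2.0, -1.0, 0.5],
}


def score(head: str, relation: str, tail: str) -> float:
    """Bilinear-diagonal score: sum_i head_i * relation_i * tail_i."""
    h, r, t = ENTITY[head], RELATION[relation], ENTITY[tail]
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


print(score("TP53", "inhibits", "BCL2"))  # 2.0
print(score("TP53", "inhibits", "A2M"))   # 0.0
```

In a trained model the vectors would be learned so that triples present in the knowledge base or text score higher than corrupted ones; here they are fixed by hand.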
31 Big Medical Data Decision Support Precision Medicine
Predict Drug Combo
32 Drug Combination
Problem: What combos to try?
Cancer drug: 250+ approved, 1200+ developing
Pairwise: 719,400; three-way: 287,280,400
Wanted: Prioritize drug combos in silico
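The pairwise and three-way figures match the number of unordered combinations drawn from the 1,200 drugs in development, which can be checked directly (the variable name is illustrative):

```python
from math import comb

developing = 1200  # "1200+ developing" on the slide

print(f"{comb(developing, 2):,}")  # 719,400
print(f"{comb(developing, 3):,}")  # 287,280,400
```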
[Figure axes: Drug 1, Drug 2]
34 35 BeatAML
Targeted drugs: 149
Pairs: 11,026
Tested: 102; Unknown: 10,924
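The counts above fit together: 149 drugs give 11,026 unordered pairs, and removing the 102 tested pairs leaves 10,924 to prioritize. A small sketch of enumerating that candidate set follows. The drug identifiers are placeholders, and which pairs count as tested is arbitrary here, since the slide gives only the totals.

```python
from itertools import combinations

# Placeholder drug identifiers; the slide gives only the counts.
drugs = [f"drug_{i}" for i in range(149)]
all_pairs = set(combinations(drugs, 2))

# Arbitrarily treat the first 102 pairs (in sorted order) as the tested ones.
tested = set(sorted(all_pairs)[:102])
unknown = all_pairs - tested

print(len(all_pairs), len(tested), len(unknown))  # 11026 102 10924
```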
36 Machine Learning
Evaluation: Cell kill + Synergy
Learning: Ranking loss
Features
  Panomics: Gene expression, …
  Pharmacology: Drug targets
  Network knowledge: TP53 inhibits BCL2, …
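The slide names only a "ranking loss" without giving its form. One plausible instance is a pairwise hinge loss over a linear scorer, sketched below: for every two tested combinations where one has the better measured outcome, the model is penalized unless it scores that one higher by a margin. The function names, the margin, and the toy data are assumptions.

```python
from typing import List, Sequence, Tuple


def linear_score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Score one drug pair as a weighted sum of its features."""
    return sum(w * x for w, x in zip(weights, features))


def pairwise_ranking_loss(
    weights: Sequence[float],
    examples: List[Tuple[Sequence[float], float]],
    margin: float = 1.0,
) -> float:
    """Mean hinge loss over example pairs ordered by their measured outcome.

    `examples` holds (features, outcome) for each tested drug combination.
    """
    total, count = 0.0, 0
    for feats_a, y_a in examples:
        for feats_b, y_b in examples:
            if y_a > y_b:  # a should be ranked above b
                gap = linear_score(weights, feats_a) - linear_score(weights, feats_b)
                total += max(0.0, margin - gap)
                count += 1
    return total / count if count else 0.0


# Made-up features and outcomes, for illustration only.
examples = [([1.0, 0.0], 0.9), ([0.0, 1.0], 0.2), ([1.0, 1.0], 0.5)]
print(pairwise_ranking_loss([2.0, -1.0], examples))  # 0.0
print(pairwise_ranking_loss([0.0, 0.0], examples))   # 1.0
```

A ranking objective suits this setting because the goal is to prioritize which untested combinations to try first, not to predict exact outcome values.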
37 Interpretable Model
Composite metric
Feature                          Weight
BCL2 and MAPK3 (moderate)        0.0442
MAP2K1 or MAPK10 (moderate)      0.0402
BCL2 or MAPK3 (moderate)         0.0325
CSNK1E or PLK4 (high)            0.0311
MAP2K7 and MAPK7 (moderate)      0.0301
AKT3 and MAP2K1 (high)           0.0293
NEK2 or PLK1 (moderate)          0.0286
PSMB1 or PSMB2 (moderate)        0.0267
MAPK9 and STK11 (high)           0.0263
MAPK1 and MAPK13 (moderate)      0.0263
…                                …
BIRC5 or PLK4 (moderate)         -0.0321
MAP2K2 or MAPK14 (high)          -0.0324
AKT3 and MAPK8 (moderate)        -0.0336
STK10 and STK33 (high)           -0.0337
BCL2 or MAPK8 (moderate)         -0.0343
EGFR and MAPK3 (moderate)        -0.036
MAPK10 and MAPK3 (moderate)      -0.0381
MAP2K1 and MAPK10 (moderate)     -0.0395
BCL2 or MAPK1 (high)             -0.0442
BCL2 or MAPK8 (high)             -0.0507

AA metric
Feature                          Weight
BCL2 or MAPK3 (moderate)         0.041
AKT2 and BCL2 (high)             0.033
MAP2K1 or MAPK10 (moderate)      0.033
BCL2 and MAPK3 (moderate)        0.031
MAP2K5 and MAPK4 (high)          0.031
BCL2 and MAPK1 (high)            0.030
MAPK4 and SRC (high)             0.029
MAP2K2 or MAPK10 (moderate)      0.028
AKT3 and MAP2K1 (high)           0.027
BCL2 and MAPK3 (high)            0.027
…                                …
CSNK1D or PLK4 (moderate)        -0.026
MAP2K1 or MAPK13 (moderate)      -0.026
STK10 and STK33 (high)           -0.027
AKT3 and MAPK8 (moderate)        -0.028
MAPK3 and SRC (high)             -0.028
BIRC5 or PLK4 (moderate)         -0.029
MAP2K1 and MAPK10 (moderate)     -0.031
PLK1 and TAOK1 (moderate)        -0.032
MAP2K2 or MAPK14 (high)          -0.034
AURKB or PLK4 (moderate)         -0.034
38 Interpretable Model
Hanover: BCL2 + MEK
Impending trial on Venetoclax / Trametinib
39 Project Hanover
40 Knowledge: Machine Reading
  Can be done manually, need automation to scale
  E.g., PubMed search
Reasoning: Predictive Analytics
  Can’t be done manually, need automation to enable
  E.g., personalize drug combinations
41 http://hanover.azurewebsites.net
Collaborators
Knight: Brian Druker, Jeff Tyner
Chicago: Andrey Rzhetsky
Wisconsin: Mark Craven
Microsoft Research: Chris Quirk, Kristina Toutanova, Scott Yih, Bill Bolosky, David Heckerman, Lucy Vanderwende, Ravi Pandya
Interns: Maxim Grechkin, Ankur Parikh, Victoria Lin, Sheng Wang, Stephen Mayhew, Daniel Fried, Violet Peng, Hao Cheng