The Curing AI for Precision Medicine

Hoifung Poon

1 Medicine Today Is Imprecise

Top 20 drugs: 80% non-responders

Wasted: 1/3 of health spending, $750 billion / year

2 Disruption 1: Big Data

2009 → 2013: 40% → 93%

3 Disruption 2: Pay-for-Performance

Goal: 75% by 2020

4 Vemurafenib on BRAF-V600 Melanoma

[Image panels: Before Treatment | 15 Weeks | 23 Weeks]

6 Why Haven’t We Solved Precision Medicine?

High-Throughput Data

… ATTCGGATATTTAAGGC …
… ATTCGGGTATTTAAGCC …

Discovery

Bottleneck #1: Knowledge

Bottleneck #2: Reasoning

AI is the key to overcoming these bottlenecks

7 Use Case: Molecular Tumor Board

8 Use Case: Molecular Tumor Board

www.ucsf.edu/news/2014/11/120451/bridging-gap-precision-medicine

- Problem: Hard to scale
- U.S. 2015: 1.6 million new cases, 600K deaths
- 902 cancer hospitals
- Memorial Sloan Kettering 2016:
  - Sequence: Tens of thousands
  - Board can review: A few hundred

Wanted: Decision support for cancer precision medicine

9 First-Generation Molecular Tumor Board

- Knowledge bottleneck
- E.g., given a tumor sequence, determine:
  - What genes and mutations are important
  - What drugs might be applicable
- Can do manually but hard to scale

10 Next-Generation Molecular Tumor Board

- Reasoning bottleneck
- E.g., personalize drug combinations
- Can’t do manually, ever

11 Big Medical Data Decision Support Precision Medicine

Machine Reading

Predict Drug Combo

13 PubMed

- 26 million abstracts
- Two new abstracts every minute
- Adds over one million every year
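The two growth figures on this slide agree with each other, which a quick back-of-the-envelope check shows. This is only a sketch; the function name is made up for illustration and a 365-day year is assumed.

```python
# Rough consistency check of the PubMed growth figures quoted on the slide.
MINUTES_PER_YEAR = 60 * 24 * 365  # assumes a 365-day year


def abstracts_per_year(per_minute):
    """Convert a per-minute arrival rate into a yearly total."""
    return per_minute * MINUTES_PER_YEAR


# Two new abstracts per minute is a little over one million per year.
print(abstracts_per_year(2))  # 1051200
```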

14 Machine Reading

PMID: 123
… VDR+ binds to SMAD3 to form …

PMID: 456
… JUN expression is induced by SMAD3/4 …

… ……

Knowledge Base

15 Machine Reading

Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 envelope protein of human immunodeficiency virus type 1 ...

16 Machine Reading

Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 envelope protein of human immunodeficiency virus type 1 ...

human gp41 p70(S6)-kinase IL-10 monocyte GENE GENE CELL

17 Machine Reading

Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 envelope protein of human immunodeficiency virus type 1 ...

Involvement [REGULATION]
  Theme: up-regulation [REGULATION]
    Theme: IL-10 [GENE]
    Cause: gp41 [GENE]
    Site: human monocyte [CELL]
  Cause: activation [REGULATION]
    Theme: p70(S6)-kinase [GENE]

18 Long Tail of Variations

TP53 inhibits BCL2.
Tumor suppressor P53 down-regulates the activity of BCL-2 proteins.
BCL2 transcription is suppressed by P53 expression.
The inhibition of B-cell CLL/Lymphoma 2 expression by TP53 …
……

negative regulation: 532 inhibited, 252 inhibition, 218 inhibit, 207 blocked, 175 inhibits, 157 decreased, 156 reduced, 112 suppressed, 108 decrease, 86 inhibitor, 81 Inhibition, 68 inhibitors, 67 abolished, 66 suppress, 65 block, 63 prevented, 48 suppression, 47 blocks, 44 inhibiting, 42 loss, 39 impaired, 38 reduction, 32 down-regulated, 29 abrogated, 27 prevents, 27 attenuated, 26 repression, 26 decreases, 26 down-regulation, 25 diminished, 25 downregulated, 25 suppresses, 22 interfere, 21 absence, 21 repress ……

19 Machine Reading

- Prior work
  - Focused on Newswire / Web
  - Popular entities and facts
  - Redundancy
  - Simple methods often suffice
- High-value verticals
  - Healthcare, finance, law, etc.
  - Little redundancy: Rare entities and facts
  - Novel challenges require sophisticated NLP

20 Precision Medicine Machine Reading

Challenges                  Advances
Annotation Bottleneck       Distant Supervision
Complex Knowledge           Grounded Semantic Parsing
Reasoning                   Neural Embedding
Beyond Sentence Boundary    Distant Supervision with Discourse Modeling

21 Free Lunch: Existing KB

NCI Pathway KB

Regulation  Theme  Cause
Positive    A2M    FOXO1
Positive    ABCB1  TP53
Negative    BCL2   TP53
…           …      …

TP53 inhibits BCL2. Tumor suppressor P53 down-regulates the activity of BCL-2 proteins. BCL2 transcription is suppressed by P53 expression. The inhibition of B-cell CLL/Lymphoma 2 expression by TP53 … ……

25 Free Lunch: Existing KB

Distant Supervision: each KB row, e.g., (Negative, BCL2, TP53), is matched against sentences that mention both genes, such as “TP53 inhibits BCL2.” or “BCL2 transcription is suppressed by P53 expression.” The matched sentences serve as training examples, so no manual annotation is needed.
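The labeling idea behind distant supervision can be sketched in a few lines. This is a toy illustration, not the pipeline from the cited PSB-15 paper: the `SYNONYMS` table and the helper names are hypothetical, and matching is plain string search rather than real entity recognition.

```python
import re

# KB rows (regulation, theme, cause), copied from the slide.
KB = [
    ("Positive", "A2M", "FOXO1"),
    ("Positive", "ABCB1", "TP53"),
    ("Negative", "BCL2", "TP53"),
]

# Hypothetical synonym table; a real system would use a gene lexicon.
SYNONYMS = {
    "TP53": ["TP53", "P53"],
    "BCL2": ["BCL2", "BCL-2", "B-cell CLL/Lymphoma 2"],
}


def mentions(gene, sentence):
    """True if any known surface form of `gene` occurs as a whole token."""
    for name in SYNONYMS.get(gene, [gene]):
        pattern = r"(?<![A-Za-z0-9])" + re.escape(name) + r"(?![A-Za-z0-9])"
        if re.search(pattern, sentence):
            return True
    return False


def distant_labels(sentences, kb=KB):
    """Pair each sentence with every KB row whose theme and cause both appear."""
    examples = []
    for sentence in sentences:
        for regulation, theme, cause in kb:
            if mentions(theme, sentence) and mentions(cause, sentence):
                examples.append((sentence, theme, cause, regulation))
    return examples


if __name__ == "__main__":
    text = [
        "TP53 inhibits BCL2.",
        "Tumor suppressor P53 down-regulates the activity of BCL-2 proteins.",
        "BCL2 transcription is suppressed by P53 expression.",
    ]
    for example in distant_labels(text):
        print(example)
```

Labels obtained this way are noisy, since a sentence can mention both genes without stating the relation, so a learner trained on them has to tolerate some wrong labels.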

26 Genetic Pathways

- PubMed-scale extraction
  - 15,000 genes, 1.5 million unique regulations
  - Compare w. NCI: 10,000 unique regulations
- Applications
  - UCSC Browser, MSR Interactions Track
  - U. Wisconsin breast cancer study
  - Etc.

Poon, Toutanova, Quirk, “Distant Supervision for Cancer Pathway Extraction from Text”. PSB-15.

27 Complex Knowledge

Outperforms 19 out of 24 supervised participants in the GENIA Shared Task

Parikh, Poon, Toutanova. “Grounded Semantic Parsing for Complex Knowledge Extraction”, NAACL-15.

28 Cross-Sentence, N-ary Relations

The mutation on exon-19 of EGFR gene was present in 16 patients, while the L858E point mutation on exon-21 was noted in 10.

All patients were treated with gefitinib and showed a partial response.

29 Drug-Gene Interactions

- Distant supervision w. discourse modeling
- Orders of magnitude increase: 162 → 79,952
- No need for annotated examples

Quirk & Poon, “Distant Supervision for Relation Extraction beyond the Sentence Boundary”, arxiv.org/abs/1609.04873.
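To see why looking beyond the sentence boundary matters, consider candidate generation over a sliding window of sentences. The sketch below is an assumption-laden toy: `candidate_triples` is a hypothetical helper, entities are found by substring match, and the cited paper's discourse modeling is not reproduced here.

```python
from itertools import product


def candidate_triples(sentences, drugs, genes, mutations, window=2):
    """Collect (drug, gene, mutation) candidates whose mentions co-occur
    within `window` consecutive sentences (naive substring matching)."""
    found = set()
    for start in range(len(sentences)):
        text = " ".join(sentences[start:start + window])
        d = [x for x in drugs if x in text]
        g = [x for x in genes if x in text]
        m = [x for x in mutations if x in text]
        found.update(product(d, g, m))
    return sorted(found)


if __name__ == "__main__":
    doc = [
        "The mutation on exon-19 of EGFR gene was present in 16 patients, "
        "while the L858E point mutation on exon-21 was noted in 10.",
        "All patients were treated with gefitinib and showed a partial response.",
    ]
    # Single-sentence extraction finds nothing; a two-sentence window finds one candidate.
    print(candidate_triples(doc, ["gefitinib"], ["EGFR"], ["L858E"], window=1))
    print(candidate_triples(doc, ["gefitinib"], ["EGFR"], ["L858E"], window=2))
```

With `window=1` the drug never shares a sentence with the gene and mutation, so the relation is invisible to a sentence-level extractor.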

30 Reasoning: Neural Embedding

- Embed gene network
- Entity / Relation → (v1, v2, …, vk)
- Relation triple → Score
- Distant supervision: Known relations score higher
- Increased recall by 20 points on held-out data

Toutanova et al., “Representing Text for Joint Embedding of Text and Knowledge Bases”, EMNLP-15.

Toutanova et al., “Compositional Learning of Embeddings for Relation Paths in Knowledge Bases and Text”, ACL-16.
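A minimal sketch of the "relation triple → score" idea follows. It assumes a DistMult-style trilinear score and a margin objective, chosen here only for illustration; the cited papers should be consulted for the models actually used. All vectors are invented toy values.

```python
def triple_score(head, relation, tail):
    """DistMult-style score: sum over dimensions of head * relation * tail."""
    return sum(h * r * t for h, r, t in zip(head, relation, tail))


def margin_loss(known_score, corrupted_score, margin=1.0):
    """Zero once the known triple outscores the corrupted one by `margin`."""
    return max(0.0, margin - known_score + corrupted_score)


if __name__ == "__main__":
    # Toy 2-dimensional embeddings (illustrative values only).
    tp53 = [1.0, 0.0]
    bcl2 = [1.0, 1.0]
    a2m = [0.0, 1.0]
    inhibits = [2.0, 1.0]

    known = triple_score(tp53, inhibits, bcl2)      # triple present in the KB
    corrupted = triple_score(tp53, inhibits, a2m)   # tail swapped at random
    print(known, corrupted, margin_loss(known, corrupted))
```

Training pushes known triples above corrupted ones, which is one way to realize "known relations score higher".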

31 Big Medical Data Decision Support Precision Medicine

Predict Drug Combo

32 Drug Combination

- Problem: What combos to try?
  - Cancer drug: 250+ approved, 1200+ developing
  - Pairwise: 719,400; three-way: 287,280,400
- Wanted: Prioritize drug combos in silico
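The combination counts on this slide are binomial coefficients over the 1,200 drugs in development, which can be checked directly. A small sketch, assuming Python 3.8+ for `math.comb`:

```python
from math import comb

N_DEVELOPING = 1200  # "1200+ developing" from the slide

pairwise = comb(N_DEVELOPING, 2)   # unordered pairs
three_way = comb(N_DEVELOPING, 3)  # unordered triples

print(f"{pairwise:,}")   # 719,400
print(f"{three_way:,}")  # 287,280,400
```

Each additional drug in the combination multiplies the search space by a few hundred, which is why exhaustive lab testing does not scale.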

33 Drug Combination

[Figure: axes labeled Drug 1 and Drug 2]

35 BeatAML

- Targeted drugs: 149
- Pairs: 11,026
- Tested: 102; Unknown: 10,924
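The BeatAML numbers follow from enumerating all unordered pairs of the 149 drugs and setting aside the tested ones. In the sketch below the drug names and the choice of which pairs count as tested are placeholders, since the slide gives only the totals.

```python
from itertools import combinations

# Placeholder drug identifiers; the real panel is not listed on the slide.
drugs = [f"drug_{i}" for i in range(149)]

all_pairs = list(combinations(drugs, 2))

# Placeholder: pretend the first 102 pairs are the ones tested in the lab.
tested = set(all_pairs[:102])
untested = [pair for pair in all_pairs if pair not in tested]

print(len(all_pairs), len(untested))  # 11026 10924
```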

36 Machine Learning

- Evaluation: Cell kill + Synergy
- Learning: Ranking loss
- Features
  - Panomics: …
  - Pharmacology: Drug targets
  - Network knowledge: TP53 inhibits BCL2, …
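The slide names a ranking loss without giving its form. One common choice, assumed here purely for illustration, is a pairwise hinge loss over a linear model with binary features: a combination observed to work better should outscore a worse one by a margin. Feature names below are hypothetical.

```python
def linear_score(weights, features):
    """Score a drug combo as the sum of weights of its active binary features."""
    return sum(weights.get(f, 0.0) for f in features)


def pairwise_hinge(weights, better, worse, margin=1.0):
    """Loss is zero once `better` outscores `worse` by at least `margin`."""
    return max(0.0, margin - linear_score(weights, better) + linear_score(weights, worse))


def sgd_step(weights, better, worse, lr=0.1, margin=1.0):
    """One subgradient step on the hinge loss (updates `weights` in place)."""
    if pairwise_hinge(weights, better, worse, margin) > 0.0:
        for f in better:
            weights[f] = weights.get(f, 0.0) + lr
        for f in worse:
            weights[f] = weights.get(f, 0.0) - lr


if __name__ == "__main__":
    w = {}
    better = {"feature_a", "feature_shared"}  # hypothetical features of the better combo
    worse = {"feature_b", "feature_shared"}   # hypothetical features of the worse combo
    print(pairwise_hinge(w, better, worse))   # 1.0 before any update
    sgd_step(w, better, worse)
    print(pairwise_hinge(w, better, worse))   # smaller after one step
```

A linear model of this kind keeps one weight per feature, which is what makes a weight table like the one on the following slide readable.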

37 Interpretable Model

Composite metric                        AA metric
Feature                       Weight    Feature                       Weight
BCL2 and MAPK3 (moderate)      0.0442   BCL2 or MAPK3 (moderate)       0.041
MAP2K1 or MAPK10 (moderate)    0.0402   AKT2 and BCL2 (high)           0.033
BCL2 or MAPK3 (moderate)       0.0325   MAP2K1 or MAPK10 (moderate)    0.033
CSNK1E or PLK4 (high)          0.0311   BCL2 and MAPK3 (moderate)      0.031
MAP2K7 and MAPK7 (moderate)    0.0301   MAP2K5 and MAPK4 (high)        0.031
AKT3 and MAP2K1 (high)         0.0293   BCL2 and MAPK1 (high)          0.030
NEK2 or PLK1 (moderate)        0.0286   MAPK4 and SRC (high)           0.029
PSMB1 or PSMB2 (moderate)      0.0267   MAP2K2 or MAPK10 (moderate)    0.028
MAPK9 and STK11 (high)         0.0263   AKT3 and MAP2K1 (high)         0.027
MAPK1 and MAPK13 (moderate)    0.0263   BCL2 and MAPK3 (high)          0.027
…                              …        …                              …
BIRC5 or PLK4 (moderate)      -0.0321   CSNK1D or PLK4 (moderate)     -0.026
MAP2K2 or MAPK14 (high)       -0.0324   MAP2K1 or MAPK13 (moderate)   -0.026
AKT3 and MAPK8 (moderate)     -0.0336   STK10 and STK33 (high)        -0.027
STK10 and STK33 (high)        -0.0337   AKT3 and MAPK8 (moderate)     -0.028
BCL2 or MAPK8 (moderate)      -0.0343   MAPK3 and SRC (high)          -0.028
EGFR and MAPK3 (moderate)     -0.036    BIRC5 or PLK4 (moderate)      -0.029
MAPK10 and MAPK3 (moderate)   -0.0381   MAP2K1 and MAPK10 (moderate)  -0.031
MAP2K1 and MAPK10 (moderate)  -0.0395   PLK1 and TAOK1 (moderate)     -0.032
BCL2 or MAPK1 (high)          -0.0442   MAP2K2 or MAPK14 (high)       -0.034
BCL2 or MAPK8 (high)          -0.0507   AURKB or PLK4 (moderate)      -0.034

38 Interpretable Model

Callouts on the feature-weight table above:

Hanover: BCL2 + MEK

Impending trial on Venetoclax / …

39 Project Hanover

Knowledge                        Reasoning
Machine Reading                  Predictive Analytics
Can be done manually,            Can’t be done manually,
need automation to scale         need automation to enable
E.g., PubMed search              E.g., personalize drug combinations

41 Collaborators

http://hanover.azurewebsites.net

- Knight: Brian Druker, Jeff Tyner
- Chicago: Andrey Rzhetsky
- Wisconsin: Mark Craven
- Microsoft Research: Chris Quirk, Kristina Toutanova, Scott Yih, Bill Bolosky, David Heckerman, Lucy Vanderwende, Ravi Pandya
- Interns: Maxim Grechkin, Ankur Parikh, Victoria Lin, Sheng Wang, Stephen Mayhew, Daniel Fried, Violet Peng, Hao Cheng