Coupling AI and network biology

Generate insights for disease understanding and target identification

Alexandr Ivliev, Director of
Cheng Fang, Scientific Consultant
03.06.2020

Agenda
1. Introduction
2. Coupling AI and network biology
3. High-quality biological networks in microbiome for target ID
4. Key takeaways

Introduction

© 2020 Clarivate

Genomic revolution of the early 2000s
From individual to understanding entire

Artificial intelligence: a new, ongoing revolution
Revolutionizing industries

Computer vision
• Self-driving cars
• Face recognition
• And more

Natural language processing
• Machine translation
• Speech analysis
• And more

Reinforcement learning
• Chess, go, computer games
• Robotics
• And more

Artificial intelligence is a hot field
“Deep learning” first appeared in the Gartner Hype Cycle for Emerging Technologies in 2017

Deep learning is a big field

https://www.asimovinstitute.org/neural-network-zoo/

and image processing

Esteva et al, Nat Med, 2019

Text mining, e.g. electronic health records

Esteva et al, Nat Med, 2019

Applications in genomics

Esteva et al, Nat Med, 2019

Networks are how biology works

• Disease mechanism understanding
• Target ID

[Figure: biological network highlighting a drug target and disease genes. Network by Martin Grandjean]

Can biological networks be coupled with deep neural networks to enable disease mechanism understanding and target ID?

Target

What approaches are you taking to understand disease mechanisms and identify novel drug targets?
a. Literature searches
b. data analysis
c. Small-scale lab experiments
d. Classical machine learning
e. Deep learning
f. Other or inapplicable

Coupling AI with network biology to enable disease understanding and target ID

Networks are quite different from texts and images
Problem: graphs are structurally very different from inputs in other AI solutions

• Text and sequences have linear 1D structure
• Images have 2D grid structure
• Networks are more complex

CGT TTA GAA

“To be or not to be, that is the question”

Images and texts work fine as input for neural networks

[Figure: numeric input values (0.123, 0.224, 0.851, …) are fed into a neural network to produce a prediction]

Solution 1: Generating biological network node embeddings using random walks

What is an “embedding”?

• Embedding = dense vector that captures important information from the input
• Embeddings can be learned automatically as opposed to feature engineering
• “Similar” objects have close embeddings
• It’s easy to use embeddings as input into AI techniques

[Figure: word2vec maps sequences into a numeric space; example words shown: queen, king, bagel]
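The statement that “similar” objects have close embeddings can be made concrete with cosine similarity. The sketch below uses invented three-dimensional vectors for the example words on the slide; the numbers are assumptions for illustration, not output of a trained word2vec model.

```python
import math

# Toy 3-dimensional embeddings; the values are invented for illustration
# and are not taken from any trained word2vec model.
embeddings = {
    "king": [0.90, 0.80, 0.10],
    "queen": [0.85, 0.90, 0.15],
    "bagel": [0.10, 0.05, 0.95],
}


def cosine_similarity(u, v):
    """Return the cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


sim_king_queen = cosine_similarity(embeddings["king"], embeddings["queen"])
sim_king_bagel = cosine_similarity(embeddings["king"], embeddings["bagel"])
print(f"king vs queen: {sim_king_queen:.3f}")
print(f"king vs bagel: {sim_king_bagel:.3f}")
```

With these toy vectors, the king/queen pair scores much higher than the king/bagel pair, which is the property a downstream model relies on when it takes embeddings as input.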

Generating node embeddings

[Figure: each node of a biological network (Node 1, Node 2, …, Node 3,816) is mapped to a point in a numeric space]

[Figure: node2vec pipeline: biological network -> sequences -> (word2vec) -> numeric space]

Generating node embeddings using random walks
General idea

[Figure: example graph with nodes 1, 2, 3, 4, 6, 7, 8]

Examples of random walks starting from node 3 with four steps:
3 -> 4 -> 6 -> 7 -> 8
3 -> 2 -> 3 -> 1 -> 3
3 -> 4 -> 6 -> 3 -> 4

Random walk sequences:
3 4 6 7 8
3 2 3 1 3
3 4 6 3 4

The sequences are passed to a text embedding model, which outputs node graph embeddings:
Node 1: -0.01822536, 0.14636423, 0.023379749 …
Node 2: 0.10925472, 0.00750885, -0.019593006 …
…
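A minimal sketch of the walk-generation step follows. The edge list is an assumption: it was chosen so that the three example walks above are possible, but the slide's full edge set is not recoverable. The function names are hypothetical.

```python
import random

# Hypothetical undirected edge list, chosen to be consistent with the
# example walks above; the slide's actual edge set is an assumption.
EDGES = [(1, 3), (2, 3), (3, 4), (3, 6), (4, 6), (6, 7), (7, 8)]


def build_adjacency(edges):
    """Build an adjacency list for an undirected graph."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    return adjacency


def random_walk(adjacency, start, num_steps, rng):
    """Take num_steps uniform random steps from start; return visited nodes."""
    walk = [start]
    for _ in range(num_steps):
        walk.append(rng.choice(adjacency[walk[-1]]))
    return walk


adjacency = build_adjacency(EDGES)
rng = random.Random(0)  # fixed seed so the sketch is reproducible
walks = [random_walk(adjacency, 3, 4, rng) for _ in range(3)]

# Each walk becomes a "sentence" of node ids for a text embedding model.
sentences = [" ".join(str(node) for node in walk) for walk in walks]
for sentence in sentences:
    print(sentence)
```

The resulting sentences can then be handed to any word embedding trainer, treating node ids as words.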

Random walk variant example: Node2Vec

Walk behavior lies on a scale between breadth-first and depth-first exploration, controlled by two hyperparameters. Depending on where a walk sits on that scale, either nodes in “local communities” or nodes having alike “structural roles” end up more similar to each other.
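The sketch below illustrates how two hyperparameters can bias a walk, following the return parameter p and in-out parameter q described in the node2vec paper by Grover and Leskovec. The graph and the function names are hypothetical, and this is a simplified illustration rather than the reference implementation.

```python
import random

# Hypothetical small undirected graph, for illustration only.
ADJACENCY = {
    1: [3],
    2: [3],
    3: [1, 2, 4, 6],
    4: [3, 6],
    6: [3, 4, 7],
    7: [6, 8],
    8: [7],
}


def step_weights(adjacency, previous, current, p, q):
    """Unnormalized node2vec-style weights for each neighbor of `current`."""
    weights = []
    for candidate in adjacency[current]:
        if candidate == previous:
            weights.append(1.0 / p)  # return to the node we came from
        elif candidate in adjacency[previous]:
            weights.append(1.0)  # stay within one hop of the previous node
        else:
            weights.append(1.0 / q)  # move further away from the previous node
    return weights


def biased_walk(adjacency, start, num_steps, p, q, rng):
    """Second-order random walk; the first step is uniform."""
    walk = [start, rng.choice(adjacency[start])]
    while len(walk) < num_steps + 1:
        previous, current = walk[-2], walk[-1]
        weights = step_weights(adjacency, previous, current, p, q)
        walk.append(rng.choices(adjacency[current], weights=weights)[0])
    return walk


rng = random.Random(0)
outward_walk = biased_walk(ADJACENCY, 3, 4, p=1.0, q=0.5, rng=rng)  # q < 1
local_walk = biased_walk(ADJACENCY, 3, 4, p=1.0, q=2.0, rng=rng)  # q > 1
print(outward_walk)
print(local_walk)
```

Setting q below 1 makes steps away from the previous node more likely (depth-first-like behavior), while q above 1 keeps the walk near its previous position (breadth-first-like behavior).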

node2vec: Scalable Feature Learning for Networks. Grover, A., & Leskovec, J. (2016).

Coupling biological networks with deep neural networks to enable disease understanding and target ID

[Figure: DEGs and a training set of known targets, together with node embeddings from random walks, yield novel predicted targets]

Challenges with simple random walks
How do we capture those non-trivial similarities?

• Nodes 1 and 8 are “similar” as they have the same attributes, but they will be far from each other in random walks
• Node 9 is unreachable from any other node, yet it’s “similar” to node 6

Attributes, e.g.:
• up-regulated = yes
• protein class = kinase
• known target = yes

Nodes, e.g.:
• protein 1
• protein 2
• protein 3

How to incorporate attributes into graph embeddings
Gat2Vec

Generate random walks on each graph independently, and supply both sets of sequences into word embeddings learning.

Random walk => node ids “words”

From structural graph (the graph without attributes):
1 -> 3 -> 2 => “1 3 2”

From attributes graph (the bipartite attributes graph):
1 -> a -> 8 -> b -> 2 => “1 8 2”
• Nodes 1 and 8 are close in bipartite graph
• Node 9 is connected to other nodes on

Caveats:
- Attributes can be only discrete
- Cannot use complex attributes, like SMILES, amino-acid sequences, etc.
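The attribute-graph walk can be sketched as follows: hop from a node to one of its attributes, then to a node sharing that attribute, and record only the node ids. The attribute assignments below are assumptions chosen to match the example walk 1 -> a -> 8 -> b -> 2, and the function names are hypothetical; this is not the gat2vec reference code.

```python
import random

# Hypothetical node-to-attribute assignments, consistent with the example
# walk 1 -> a -> 8 -> b -> 2; everything else here is an assumption.
NODE_ATTRIBUTES = {
    1: ["a"],
    8: ["a", "b"],
    2: ["b"],
}


def invert(node_attributes):
    """Map each attribute to the list of nodes that carry it."""
    attribute_nodes = {}
    for node, attributes in node_attributes.items():
        for attribute in attributes:
            attribute_nodes.setdefault(attribute, []).append(node)
    return attribute_nodes


def attribute_walk(node_attributes, attribute_nodes, start, num_hops, rng):
    """Walk node -> attribute -> node ..., keeping only the node ids."""
    walk = [start]
    for _ in range(num_hops):
        attribute = rng.choice(node_attributes[walk[-1]])
        walk.append(rng.choice(attribute_nodes[attribute]))
    return walk


attribute_nodes = invert(NODE_ATTRIBUTES)
rng = random.Random(0)
walk = attribute_walk(NODE_ATTRIBUTES, attribute_nodes, 1, 2, rng)
sentence = " ".join(str(node) for node in walk)
print(sentence)
```

Sentences produced this way would be pooled with the sentences from structural walks before training the word embedding model, so that nodes sharing attributes co-occur even when they are far apart in the structural graph.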

gat2vec: representation learning for attributed graphs. Sheikh, N., Kefato, Z., & Montresor, A. (2019). Computing, 101(3), 187–209.

Example application
Target prioritization using gat2vec - GuiltyTargets

Best ROC AUC for different diseases ≈0.92-0.94

Inputs:
• Protein-protein interaction network: STRING, HIPPIE
• RNASeq from different cohorts (MSBB, MayoRNASeq, ROSMAP, etc.): discrete differential expression
• Known targets for the disease: Open Targets, Therapeutic Targets Database

Workflow: annotated protein-protein interaction network -> features obtained using Gat2Vec -> positive-unlabeled learning -> rank candidate targets
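For readers less familiar with the metric, the sketch below shows how a ranking of candidate targets can be scored with ROC AUC, computed as the fraction of (known target, other candidate) pairs that the ranking orders correctly. The protein names, scores, and labels are invented for illustration and are not results from GuiltyTargets.

```python
def roc_auc(scores, labels):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative count as half a correct pair.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    correct = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                correct += 1.0
            elif pos == neg:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical model scores per candidate; higher means "more target-like".
candidate_scores = {
    "protein 1": 0.91,
    "protein 2": 0.15,
    "protein 3": 0.78,
    "protein 4": 0.40,
    "protein 5": 0.62,
}
known_targets = {"protein 1", "protein 5"}  # hypothetical labels

ranking = sorted(candidate_scores, key=candidate_scores.get, reverse=True)
scores = [candidate_scores[name] for name in ranking]
labels = [1 if name in known_targets else 0 for name in ranking]
auc = roc_auc(scores, labels)
print(ranking)
print(f"ROC AUC: {auc:.3f}")
```

A value of 0.5 corresponds to a random ranking and 1.0 to a ranking that places every known target above every other candidate.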

GuiltyTargets: Prioritization of Novel Therapeutic Targets with Deep Network Representation Learning. Muslu, Ö., Hoyt, C. T., Hofmann-Apitius, M., & Fröhlich, H. (2019). BioRxiv, 521161.

Coupling biological networks with deep neural networks to enable disease understanding and target ID

[Figure: node attributes (kinases, GWAS hits, DEGs) and a training set of known targets, together with node embeddings from random walks in (1) the structural graph and (2) the attribute graph, yield novel predicted targets]

Solution 2: Building artificial neural nets to structurally reflect a biological network of interest

Graph neural networks

[Figure: example inputs to a graph neural net: molecule, physical, text (“The little cat looks lovely.”)]

Graph neural networks for target ID – one approach
A node’s neighborhood in the biological network defines a computational graph

Features on node A and features on node C, e.g.:
• protein class
• druggability
• genetic link
• differential expression

The features are combined by any differentiable function that aggregates multiple vectors into one, e.g.:
• A + C
• A – 0.34 * C
• A * (A + 1.4 * C)

The beauty is: it’s all a differentiable computational graph that can be optimized using backpropagation.
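One message-passing step of this kind can be sketched in a few lines. The network, the feature values, and the two scalar weights below are assumptions for illustration; in a trained graph neural network the weights would typically be matrices learned by backpropagation rather than fixed scalars.

```python
# Hypothetical biological network and node features. The feature order
# (protein class, druggability, genetic link, differential expression)
# and all values are assumptions for illustration.
NEIGHBORS = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
FEATURES = {
    "A": [1.0, 0.0, 1.0, 0.5],
    "B": [0.0, 1.0, 0.0, -0.2],
    "C": [1.0, 1.0, 0.0, 0.8],
}


def mean_vectors(vectors):
    """Element-wise mean: aggregates multiple vectors into one."""
    return [sum(values) / len(vectors) for values in zip(*vectors)]


def message_passing_layer(features, neighbors, w_self, w_neighbor):
    """Combine each node's features with the mean of its neighbors' features."""
    updated = {}
    for node, own in features.items():
        aggregated = mean_vectors([features[n] for n in neighbors[node]])
        combined = [w_self * a + w_neighbor * b for a, b in zip(own, aggregated)]
        updated[node] = [max(0.0, x) for x in combined]  # ReLU non-linearity
    return updated


# Fixed scalar weights stand in for learnable parameters in this sketch.
hidden = message_passing_layer(FEATURES, NEIGHBORS, w_self=1.0, w_neighbor=0.5)
for node, vector in hidden.items():
    print(node, [round(x, 3) for x in vector])
```

Because every operation here (weighted sum, mean, ReLU) is differentiable almost everywhere, stacking such layers gives each node a computational graph that mirrors its neighborhood in the network.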

“Deep Learning for Network Biology -- snap.stanford.edu/deepnetbio-ismb -- ISMB 2018”

Graph neural network for target ID
• Every node has its own unique computational graph defined by the biological network structure
• These computational graphs are neural networks that can be trained using standard AI techniques

[Figure: biological network with nodes A, B, C, D, E, F; computational graph = artificial neural network for each node]

Example: Decagon algorithm
Modeling polypharmacy side effects with graph convolutional networks

Modeling polypharmacy side effects with graph convolutional networks. Zitnik, M., Agrawal, M., & Leskovec, J. (2018).

These methods open the doors to coupling AI with biological networks

Targets

Indications

Mechanisms

Key challenges

Garbage in – garbage out
• The need for high-quality networks
• And large high-quality training sets

Knowledge bias
• It’s hard to predict completely unknown from the known

Model interpretation
• Opening the “black box”

How optimistic are you that AI will transform pharma R&D in 5 years?
a. It'll revolutionize research
b. It'll yield incremental advances
c. It won't solve any of the major challenges
d. Other


Curating high-quality networks in microbiome for target ID

High-quality biological networks in microbiome for target ID

“The human microbiome and why the solution for all disease lies within our own gut” Nov 2017

Why should we care about the microbiome?
Its importance has long been recognized (first described 1700 years ago!) and applied in medical practice as FMT (fecal microbiota transplant). Fecal transplantation is performed as a treatment for recurrent C. difficile colitis infection (CDI). C. difficile colitis, a complication of antibiotic therapy, may be associated with diarrhea, abdominal cramping and sometimes fever.

Adverse effects are poorly understood.

Why should we care about the microbiome: a new era

The microbiome is implicated in health and disease:
• IBD and Crohn’s disease
• Obesity and diabetes
• Immune function and malfunction
• Autoimmune diseases
• Cardiovascular diseases
• Neurological diseases
• Oncology

Cani, P. Nat Rev Gastroenterol Hepatol 14, 321–322 (2017).

Active drug development
• 781 drugs and biologics under development
• 73 in clinical trials

Source: Cortellis Drug Discovery Intelligence

Active drug development
• The vast majority of the microbiome drugs in clinical trials have no specific mechanisms
• Sodium oligo-mannurarate, extracted from algae, was developed at Shanghai Green Valley to treat mild to moderate AD.
• Sibofimloc is an inhibitor of type 1 fimbrial adhesin from E. coli. It is in Phase I for IBD.

Source: Cortellis Drug Discovery Intelligence

Mechanism of action matters in drug development

[Figure: drug development pipeline]
Stages: Drug Discovery -> Preclinical -> Clinical Trial -> Regulatory Review -> Marketing
Milestones: IND Submitted, NDA Submitted, APPROVAL
Attrition: 5,000-10,000 compounds -> ~250 compounds -> <5 compounds
Clinical phases:
• PHASE I: 20 – 100 volunteers
• PHASE II: 100 – 500 patients
• PHASE III: 1,000 – 5,000 patients
Other labels: Translational, API Synthesis, Scale-up to MFG, Manufacturing, Precision Medicine, Post-market Surveillance (Ph. IV), Safety, Safety/Efficacy, Dosing, Adverse Events

Among 640 novel therapeutics of Phase 3 clinical trials (1998-2008), 344 (54%) failed in clinical development, 230 (36%) were approved by the US Food and Drug Administration (FDA), and 66 (10%) were approved in other countries but not by the FDA. Most products failed due to inadequate efficacy (n = 195; 57%), while 59 (17%) failed because of safety concerns and 74 (22%) failed due to commercial reasons.
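The percentages quoted above use two different denominators: outcomes are relative to all 640 therapeutics, while failure reasons are relative to the 344 failures. The short check below recomputes them from the counts in the paragraph; the variable names are illustrative only.

```python
# Counts copied from the paragraph above (Hwang et al., 2016).
total = 640
outcomes = {
    "failed": 344,
    "approved by FDA": 230,
    "approved elsewhere only": 66,
}
failure_reasons = {
    "inadequate efficacy": 195,
    "safety concerns": 59,
    "commercial reasons": 74,
}

# Outcome percentages are relative to all 640 therapeutics.
outcome_pct = {k: round(100 * v / total) for k, v in outcomes.items()}

# Failure-reason percentages are relative to the 344 failures.
failed = outcomes["failed"]
reason_pct = {k: round(100 * v / failed) for k, v in failure_reasons.items()}

# Failures not covered by the three listed reasons.
not_broken_down = failed - sum(failure_reasons.values())

print(outcome_pct)
print(reason_pct)
print(not_broken_down)
```

The three listed reasons account for 328 of the 344 failures; the remaining 16 are not broken down in the text above.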

Hwang et al. Dec. 2016 JAMA Internal Medicine

How do we leverage AI to understand MoA and identify new targets in microbiome?

Novel predicted targets

Understanding biology is critical for target ID

• A microbe-host interaction network could be used to:
– Uniquely identify potential microbial effectors that target distinct host nodes or interfere with endogenous host interactions
– Determine how mutations on either host or microbial proteins affect the interaction
– Delineate pathogenic mechanisms and thereby help maximize beneficial therapeutics

Types of microbe-host interactions

• Microbe-host protein-protein interactions ¹
• MAMP (microbe-associated molecular pattern) – host protein-protein interactions
• Microbial metabolite – host protein interactions ²
• Microbe-microbe interactions (protein-protein or protein-metabolite)

¹ approx. 16,000 publications
² approx. 10,000 publications

Microbiome publications are growing fast
[Figure: microbiome publications over time]

Source: Clarivate Analytics Web of Science, using title search terms (human microbiota, human microbiome, microbiome, human microbial, human microbes, or gut ecology).

A database to capture microbial-host interactions is needed for better understanding the biology

Ideal literature curation workflow
Manual curation ensures high quality

1. Define project: define curation template, inclusion/exclusion criteria and prioritization strategy
2. Construct search strings: find relevant articles for review
3. Review and prioritize abstracts: manual review and prioritization based on inclusion/exclusion criteria
4. Acquire data and articles: experience in biomedical literature monitoring
5. Annotate and curate articles: controlled vocabularies and public database IDs
6. QC and format for delivery: knowledge in development of biological databases

How is a database like this constructed?

A solution for reconstruction, data management and integration

[Figure: architecture of the Database of Microbiome-Host Interactions (DoMI)]

Literature curation and database construction
• Inputs: articles and data from proprietary data sources and public data sources
• Curator: annotates, enriches data, quality control
• Administrator: design, development, maintenance

User interface
• User query
• Summary: metabolite-host interactions, microbe-microbe interactions, and more
• Interaction networks: table of interactions, access to related articles, and more

Example interactome reconstruction

[Figure: example interactome spanning MICROBIOME and HOST. Data-type labels: MetaG, MetaB, RNA-Seq, Metabolite, Taxonomy, KO, BGC. Node labels: Butyrate, LPS, TlpA, FHA, ACT, COG, GPR109A, TLR4, MYD88, TRAF6, TRAM, TBK1, IKKE, IRF3, NFKB, IFNB, IL6, IL10, IL12, NOS2, CD11B, CD18, CASP3, CASP7. Legend: activation, inhibition, protein.]

The microbial-host interaction database will help leverage AI

[Figure: node2vec pipeline applied to the database: host-microbiome network -> sequences -> (word2vec) -> numeric space]

Novel predicted targets

Key takeaways

• AI is promising significant advances in the data-rich biomedical field
• Biological networks are different from common AI inputs, but approaches have emerged to feed biological networks into AI techniques
• Manual curation remains important for creating high-quality biological networks and training sets for AI
• Time will show how much transformation versus incremental progress AI will bring into pharma R&D

Q&A

Interested in learning more about Clarivate’s drug discovery consulting services? Visit our website to learn more.

Alexandr Ivliev [email protected] Cheng Fang [email protected]

© 2020 Clarivate. All rights reserved. Republication or redistribution of Clarivate content, including by framing or similar means, is prohibited without the prior written consent of Clarivate. Clarivate and its logo, as well as all other trademarks used herein are trademarks of their respective owners and used under license.