Coupling AI and Network Biology
Coupling AI and network biology
Generate insights for disease understanding and target identification
Alexandr Ivliev, Director of Bioinformatics
Cheng Fang, Scientific Consultant
03.06.2020

Agenda
1. Introduction
2. Coupling AI and network biology
3. High-quality biological networks in microbiome for target ID
4. Key takeaways

Introduction
© 2020 Clarivate

Genomic revolution of the early 2000s
From individual genes to understanding the entire genome

Artificial intelligence: new ongoing revolution
Revolutionizing industries
Computer vision
• Self-driving cars
• Face recognition
• And more
Natural language processing
• Machine translation
• Speech analysis
• And more
Reinforcement learning
• Chess, Go, computer games
• Robotics
• And more

Artificial intelligence is a hot field
“Deep learning” first appeared in the Gartner Hype Cycle for Emerging Technologies in 2017

Deep learning is a big field
https://www.asimovinstitute.org/neural-network-zoo/

Computer vision and image processing
Esteva et al, Nat Med, 2019

Text mining, e.g. electronic health records
Esteva et al, Nat Med, 2019

Applications in genomics
Esteva et al, Nat Med, 2019

Networks are how biology works
• Disease mechanism understanding
• Target ID
[Figure: network with “Drug target” and “Disease genes” labeled. Network by Martin Grandjean]

Can biological networks be coupled with deep neural networks to enable disease mechanism understanding and target ID?

What approaches are you taking to understand disease mechanisms and identify novel drug targets?
a. Literature searches
b. OMICs data analysis
c. Small scale lab experiments
d. Classical machine learning
e. Deep learning
f. Other or inapplicable

Coupling AI with network biology to enable disease understanding and target ID

Networks are quite different from texts and images
Problem: graphs are structurally very different from inputs in other AI solutions
• Text and sequences have linear 1D structure, e.g. CGT TTA GAA or “To be or not to be, that is the question”
• Images have 2D grid structure
• Networks are more complex

Images and texts work fine as input for neural networks
[Figure: numeric input values fed into a neural network that outputs a prediction]

Solution 1: Generating biological network node embeddings using random walks

What is an “embedding”?
• Embedding = dense vector that captures important information from the input
• Embeddings can be learned automatically, as opposed to feature engineering
• “Similar” objects have close embeddings
• It’s easy to use embeddings as input into AI techniques
[Figure: word2vec maps sequences to a numeric space; example words: “queen”, “king”, “bagel”]

Generating node embeddings
[Figure: nodes of a biological network (Node 1, Node 2, ..., Node 3,816) mapped to a numeric space]
[Figure: node2vec turns the biological network into sequences; word2vec maps the sequences to a numeric space]

Generating node embeddings using random walks
General idea
[Figure: example graph with nodes 1, 2, 3, 4, 6, 7, 8]
Examples of random walks starting from node 3 with four steps:
3 -> 4 -> 6 -> 7 -> 8
3 -> 2 -> 3 -> 1 -> 3
3 -> 4 -> 6 -> 3 -> 4
Random walk sequences -> text embedding model -> node embeddings
Node 1: -0.01822536, 0.14636423, 0.023379749 …
Node 2: 0.10925472, 0.00750885, -0.019593006 …
…

Random walk variant example: Node2Vec
It’s a scale of behavior, between breadth-first and depth-first exploration, controlled by two hyperparameters
• Nodes in “local communities” are more similar to each other
• Nodes having alike “structural roles” are more similar to each other
node2vec: Scalable Feature Learning for Networks. Grover, A., & Leskovec, J. (2016).

Coupling biological networks with deep neural networks to enable disease understanding and target ID
[Figure: DEGs + node embeddings from random walks + training set of known targets -> novel predicted targets]

Challenges with simple random walks
How do we capture those non-trivial similarities?
• Nodes 1 and 8 are “similar” as they have the same attributes, but they will be far from each other in random walks
• Node 9 is unreachable from any other node, yet it’s “similar” to node 6
Attributes, e.g.:
• up-regulated = yes
• protein class = kinase
• known target = yes
Nodes, e.g.:
• protein 1
• protein 2
• protein 3

How to incorporate attributes into graph embeddings
Gat2Vec
Generate random walks on each graph independently, and supply both sets of sequences into word embeddings learning
Random walk => node ids “words”
From structural graph: 1 -> 3 -> 2 => “1 3 2”
From attributes graph: 1 -> a -> 8 -> b -> 2 => “1 8 2”
[Figure: structural graph without attributes (nodes 1, 2, 3) and bipartite attributes graph (nodes 1, 8, 2; attributes a, b)]
• Nodes 1 and 8 are close in the bipartite graph
• Node 9 is connected to other nodes on the bipartite graph
Caveats:
- Attributes can only be discrete
- Cannot use complex attributes, like SMILES, amino-acid sequences, etc.
gat2vec: representation learning for attributed graphs. Sheikh, N., Kefato, Z., & Montresor, A. (2019). Computing, 101(3), 187–209.

Example application
Target prioritization using gat2vec - GuiltyTargets
Best ROC AUC for different diseases ≈ 0.92-0.94
Inputs:
- Protein-protein interaction network (STRING, HIPPIE)
- RNASeq from different cohorts (MSBB, MayoRNASeq, ROSMAP, etc.), used as discrete differential gene expression
- Known targets for the disease (Open Targets, Therapeutic Targets Database)
Workflow: annotated protein-protein interaction network -> features obtained using Gat2Vec -> positive-unlabeled learning -> rank candidate targets
GuiltyTargets: Prioritization of Novel Therapeutic Targets with Deep Network Representation Learning. Muslu, Ö., Hoyt, C. T., Hofmann-Apitius, M., & Fröhlich, H. (2019). BioRxiv, 521161.

Coupling biological networks with deep neural networks to enable disease understanding and target ID
[Figure: node attributes (kinases, GWAS hits, DEGs) + node embeddings from random walks in (1) structural graph and (2) attribute graph + training set of known targets -> novel predicted targets]

Solution 2: Building artificial neural nets to structurally reflect a biological network of interest

Graph neural networks
[Figure: example inputs to a graph neural net: molecule, physical path, text (“The little cat looks lovely.”)]

Graph neural networks for target ID – one approach
A node’s neighborhood defines a computational graph
• Features on node A and features on node C, e.g.: protein class, druggability, genetic link, differential expression
• Combined by any differentiable function that aggregates multiple vectors into one, e.g.: A + C; A – 0.34 * C; A * (A + 1.4 * C)
[Figure: biological network]
The beauty is: it’s all a differentiable computational graph that can be optimized using backpropagation.
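
The aggregation idea on the slide above can be sketched in a few lines of plain Python. This is a toy illustration, not the method from the cited tutorial: the three-node graph, the feature values, and the scalar weights w_self and w_neigh are all made up, and mean pooling followed by ReLU is just one common choice of differentiable aggregator. For node A, whose only neighbor is C, the update reduces to relu(A - 0.34 * C), echoing one of the example functions on the slide.

```python
# Toy sketch of neighborhood aggregation in a graph neural network.
# The graph, feature values, and weights below are hypothetical.

# Undirected toy network: node -> list of neighbors.
NEIGHBORS = {
    "A": ["C"],
    "B": ["C"],
    "C": ["A", "B"],
}

# Two made-up features per node, e.g. [druggability, differential expression].
FEATURES = {
    "A": [1.0, 0.0],
    "B": [0.0, 2.0],
    "C": [0.5, 0.5],
}


def mean_vectors(vectors):
    """Element-wise mean: aggregates multiple vectors into one."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def relu(vector):
    """Element-wise max(0, x)."""
    return [max(0.0, x) for x in vector]


def aggregate_step(features, neighbors, w_self, w_neigh):
    """One round of message passing with scalar weights.

    new_h[v] = relu(w_self * h[v] + w_neigh * mean(h[u] for u in N(v)))
    """
    updated = {}
    for node, own in features.items():
        pooled = mean_vectors([features[u] for u in neighbors[node]])
        combined = [w_self * x + w_neigh * m for x, m in zip(own, pooled)]
        updated[node] = relu(combined)
    return updated


hidden = aggregate_step(FEATURES, NEIGHBORS, w_self=1.0, w_neigh=-0.34)
for node in sorted(hidden):
    print(node, hidden[node])
```

In practice the weights would be matrices learned by backpropagation in a framework such as PyTorch or JAX, and several such steps would be stacked so that each node sees a wider neighborhood.
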
“Deep Learning for Network Biology -- snap.stanford.edu/deepnetbio-ismb -- ISMB 2018”

Graph neural network for target ID
• Every node has its own unique computational graph defined by the biological network structure
• These computational graphs are neural networks that can be trained using standard AI techniques
[Figure: biological network with nodes A, B, C, D, E, F; computational graph = artificial neural network for each node]
“Deep Learning for Network Biology -- snap.stanford.edu/deepnetbio-ismb -- ISMB 2018”

Example: Decagon algorithm
Modeling polypharmacy side effects with graph convolutional networks. Zitnik, M., Agrawal, M., & Leskovec, J. (2018).

These methods open the doors to coupling AI with biological networks
• Targets
• Indications
• Mechanisms

Key challenges
Garbage in – garbage out
• The need for high-quality networks
• And large high-quality training sets
Knowledge bias
• It’s hard to predict the completely unknown from the known
Model interpretation
• Opening the “black box”

How optimistic are you that AI will transform pharma R&D in 5 years?
a. It'll revolutionize research
b. It'll yield incremental advances
c. It won't solve any of the major challenges
d. Other

Key challenges
Curating high-quality networks in microbiome for target ID

High-quality biological networks in microbiome for target ID

“The human microbiome and why the solution for all disease lies within our own gut” Nov 2017

Why should we care about the microbiome?
Its importance has long been recognized (first described 1700 years ago!) and it is used in medical practice: FMT (fecal microbiota transplant). Fecal transplantation is performed as a treatment for recurrent C. difficile colitis infection (CDI). C. difficile colitis, a complication of antibiotic therapy, may be associated with diarrhea, abdominal cramping and sometimes fever. Adverse effects are poorly understood.

Why should we care about the microbiome: a new era
The microbiome is implicated in health and disease:
• IBD and Crohn’s disease
• Obesity & diabetes
• Immune function & malfunction