Title Page Probabilistic Latent Factor Models For
Total Page:16
File Type:pdf, Size:1020Kb
PROBABILISTIC LATENT FACTOR MODELS TITLE PAGE FOR TRANSFORMATIVE DRUG DISCOVERY by Murat Can Cobanoglu BS, Sabanci University, 2008 MS, Sabanci University, 2010 Submitted to the Graduate Faculty of School of Medicine in partial fulfillment of the requirements for the degree of Doctor of Philosophy University of Pittsburgh 2015 UNIVERSITY OF PITTSBURGH SCHOOL OF MEDICINE This dissertation was presented by Murat Can Cobanoglu It was defended on June 9th, 2015 and approved by Ziv Bar-Joseph, PhD, Associate Professor, CMU Machine Learning Department Zoltán N. Oltvai, MD, Associate Professor, Department of Pathology Gary Silverman, MD, PhD, Professor, Pediatrics, Cell Biology and Physiology Andrew Stern, PhD, Associate Professor, University of Pittsburgh Drug Discovery Institute Dissertation Co-Advisor: Ivet Bahar, PhD, Distinguished Professor, Department of Computational and Systems Biology Dissertation Co-Advisor: D. Lansing Taylor, PhD, Professor, Director, University of Pittsburgh Drug Discovery Institute ii PROBABILISTIC LATENT FACTOR MODELS FOR TRANSFORMATIVE DRUG DISCOVERY Murat Can Cobanoglu, M.S. University of Pittsburgh, 2015 Copyright © by Murat Can Cobanoglu 2015 iii ABSTRACT: The cost of discovering a new drug has doubled every 9 years since the 1950s. This can change by using machine learning to guide experimentation. The idea I have developed over the course of my PhD is that using latent factor modeling (LFM) of the drug-target interaction network, we can guide drug repurposable efforts to achieve transformative improvements. By better characterizing the drug-target interaction network, it is possible to use currently approved drugs to achieve therapies for diseases that currently are not optimally treated. These drugs might be directly used through repurposing, or they can serve as a starting point for new drug discovery efforts where they are optimized through medicinal chemistry methods. To achieve this goal, I have developed LFM-based techniques applicable to existing databases of drug-target interaction networks. Specifically, I have started out by establishing that probabilistic matrix factorization (PMF; one type of LFM algorithm) can be used as descriptors by showing they capture therapeutic function similarities that state-of-the-art 3D chemical similarity methods could not capture. Then I have shown that PMF can effectively predict unknown drug-target interactions. Furthermore, I have used newly developed computational techniques for discovering repurposable drugs for two diseases, α1 antitrypsin (1-AT) deficiency (ATD) and Huntington’s disease (HD) leading to successful discoveries in both. For ATD, two sets of data generated by the David Perlmutter and Gary Silverman laboratories have been used as input to deduce potential targets and repurposable drugs: (i) a high throughput screening data from a genome- wide RNAi knockdown in a C. elegans model for studying ATZ (Z-allele of 1-AT), and (ii) data from Prestwick library screen for the same model. We have predicted that the antidiabetic drug glibenclamide would be beneficial against ATZ aggregation, and data collected to date in Mus musculus models are promising. We have worked on HD with the Robert Friedlander lab, iv by examining the potential drugs and implicated pathways for 15 neuroprotective (repurposable) drugs that they have identified in a two-stage screening study. Based on LFM-based analysis of the targets of these drugs, we have developed a number of hypotheses to be tested. Among them, the antihypertensive drug sodium nitroprusside appears to be effective against HD based on neuronal cell death inhibition experiments that were conducted at the University of Pittsburgh Drug Discovery Institute as well as the Friedlander lab. Finally, we have built a web server, named BalestraWeb, for facilitating the use of PMF in repurposable drug identification by the broader community. BalestraWeb enables users to extract information on known and potential targets (or drugs) for any approved drug (or target), simply by entering the name of the query drug (or target). I have also laid out the framework for developing an integrated resource for quantitative systems pharmacology, Balestra toolkit (BalestraTK), which would take advantage of existing databases such as STITCH, UniProt, and PubChem. Collectively, our results provide firm evidence for the potential utility of machine learning techniques for assisting in drug discovery. v TABLE OF CONTENTS TITLE PAGE ................................................................................................................................. I ABSTRACT ................................................................................................................................. IV PREFACE ................................................................................................................................... XV 1.0 INTRODUCTION ........................................................................................................ 1 1.1 BACKGROUND ON COMPUTATIONAL METHODS ................................ 4 1.1.1 Ligand-Centric Approaches ........................................................................... 7 1.1.2 Integrative Approaches ................................................................................... 8 1.1.3 Holistic Approaches ....................................................................................... 14 1.1.4 Target-Centric Approaches .......................................................................... 19 1.2 BIOMEDICAL BACKGROUND .................................................................... 21 1.2.1 α-1 Antitrypsin Deficiency ............................................................................ 21 1.2.2 Huntington’s Disease ..................................................................................... 24 1.3 SCOPE OF CONTRIBUTION ......................................................................... 25 1.4 SPECIFIC AIMS ............................................................................................... 29 1.5 SUMMARY OF FINDINGS ............................................................................. 31 2.0 LATENT FACTOR MODELING BASED ANALYSIS OF DRUG TARGET INTERACTIONS ........................................................................................................................ 34 2.1 METHODOLOGY ............................................................................................ 34 vi 2.1.1 Problem Definition ........................................................................................ 34 2.1.2 Dataset ............................................................................................................ 35 2.1.3 Probabilistic Matrix Factorization (PMF) .................................................. 35 2.1.4 Methodology for Active Learning On Drug-Target Interactions Using PMF ......................................................................................................................... 38 2.2 RESULTS ........................................................................................................... 42 2.2.1 Descriptive Power of LFM ............................................................................ 42 2.2.2 Predictive Power of LFM .............................................................................. 50 2.2.3 LFM Based Predictive Active Learning on Drug-Target Interactions .... 56 2.3 EFFICIENT & ONLINE LATENT FACTOR MODEL BASED DRUG- TARGET INTERACTION PREDICTIONS ................................................................... 61 2.4 METHODOLOGY FOR BUILDING EFFECTIVE MODEL-AVERAGED LATENT FACTOR BASED DRUG-TARGET INTERACTION PREDICTION MODELS ............................................................................................................................. 66 2.5 BALESTRATK: PYTHON TOOLKIT FOR DRUG TARGET INTERACTION DATA ACCESS AND INTEGRATION ............................................. 71 3.0 COMPUTATIONAL DISCOVERY OF THERAPEUTIC AGENTS AGAINST ALPHA-1 ANTITRYPSIN DEFICIENCY (ATD) .................................................................. 74 3.1 METHODOLOGY FOR DRUG REPURPOSING BASED ON MODEL ORGANISM GENE KNOCKDOWN DATA .................................................................. 74 3.1.1 Whole Genome RNAi Knockdown Screen .................................................. 75 3.1.2 The Computational Methodology for Analyzing Whole Genome Knockdown Data ........................................................................................................ 76 vii 3.1.3 Integration of Drug-Target Interaction and Drug Approval Status from Multiple Sources ......................................................................................................... 83 3.1.4 Mapping Between H. sapiens and C. elegans Targets ................................ 83 3.1.5 Identification of Repurposable Drugs .......................................................... 84 3.2 GLIBENCLAMIDE AS A NOVEL REPURPOSABLE CANDIDATE AGAINST ATD .................................................................................................................. 84 3.3 ADDITIONAL REPURPOSABLE CANDIDATES AGAINST ATD .......... 85 3.4 METHODOLOGY FOR HIGH CONTENT SCREENING DATA ANALYSIS AND HIT DIVERSIFICATION .................................................................. 94 3.4.1 Chemical-Based Active Diversification ....................................................... 95 3.4.2 Target-Based Active Diversification ...........................................................