Predicting Protein Targets for Drug-Like Compounds Using Transcriptomics

UCSF UC San Francisco Previously Published Works Title Predicting protein targets for drug-like compounds using transcriptomics. Permalink https://escholarship.org/uc/item/79q812t3 Journal PLoS computational biology, 14(12) ISSN 1553-734X Authors Pabon, Nicolas A Xia, Yan Estabrooks, Samuel K et al. Publication Date 2018-12-07 DOI 10.1371/journal.pcbi.1006651 Peer reviewed eScholarship.org Powered by the California Digital Library University of California RESEARCH ARTICLE Predicting protein targets for drug-like compounds using transcriptomics 1☯ 2☯ 3 4 Nicolas A. PabonID , Yan Xia , Samuel K. Estabrooks , Zhaofeng Ye , Amanda 5 5 5 6 K. Herbrand , Evelyn SuÈ û , Ricardo M. BiondiID , Victoria A. Assimon , Jason 6 3 1 2 E. Gestwicki , Jeffrey L. Brodsky , Carlos J. CamachoID *, Ziv Bar-JosephID * 1 Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America, 2 Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America, 3 Department of Biological Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America, 4 School of Medicine, Tsinghua a1111111111 University, Beijing, China, 5 Department of Internal Medicine I, UniversitaÈtsklinikum Frankfurt, Frankfurt, a1111111111 Germany, 6 Department of Pharmaceutical Chemistry, University of California, San Francisco, San a1111111111 Francisco, California, United States of America a1111111111 a1111111111 ☯ These authors contributed equally to this work. * [email protected] (CJC); [email protected] (ZBJ) Abstract OPEN ACCESS An expanded chemical space is essential for improved identification of small molecules for Citation: Pabon NA, Xia Y, Estabrooks SK, Ye Z, Herbrand AK, SuÈû E, et al. (2018) Predicting emerging therapeutic targets. However, the identification of targets for novel compounds is protein targets for drug-like compounds using biased towards the synthesis of known scaffolds that bind familiar protein families, limiting transcriptomics. PLoS Comput Biol 14(12): the exploration of chemical space. To change this paradigm, we validated a new pipeline e1006651. https://doi.org/10.1371/journal. that identifies small molecule-protein interactions and works even for compounds lacking pcbi.1006651 similarity to known drugs. Based on differential mRNA profiles in multiple cell types exposed Editor: Avner Schlessinger, Icahn School of to drugs and in which gene knockdowns (KD) were conducted, we showed that drugs induce Medicine at Mount Sinai, UNITED STATES gene regulatory networks that correlate with those produced after silencing protein-coding Received: May 16, 2018 genes. Next, we applied supervised machine learning to exploit drug-KD signature correla- Accepted: November 13, 2018 tions and enriched our predictions using an orthogonal structure-based screen. As a proof- Published: December 7, 2018 of-principle for this regimen, top-10/top-100 target prediction accuracies of 26% and 41%, Copyright: © 2018 Pabon et al. This is an open respectively, were achieved on a validation of set 152 FDA-approved drugs and 3104 poten- access article distributed under the terms of the tial targets. We then predicted targets for 1680 compounds and validated chemical interac- Creative Commons Attribution License, which tors with four targets that have proven difficult to chemically modulate, including non- permits unrestricted use, distribution, and covalent inhibitors of HRAS and KRAS. Importantly, drug-target interactions manifest as reproduction in any medium, provided the original author and source are credited. gene expression correlations between drug treatment and both target gene KD and KD of genes that act up- or down-stream of the target, even for relatively weak binders. These cor- Data Availability Statement: All predictions and code are open source and available at the relations provide new insights on the cellular response of disrupting protein interactions and supporting website http://sb.cs.cmu.edu/Target2/. highlight the complex genetic phenotypes of drug treatment. With further refinement, our Funding: This work was supported in part by the U. pipeline may accelerate the identification and development of novel chemical classes by S. National Institutes of Health (Grants screening compound-target interactions. 1U54HL127624 to ZBJ, DK079307 to JLB, and R01GM097082 to CJC) and by the U.S. National Science Foundation (Grants DBI-1356505 to ZBJ and 1247842, to NAP). ZY acknowledges support from the Tsinghua-Pittsburgh Joint Program. JLB and CJC acknowledges support from the Cystic PLOS Computational Biology | https://doi.org/10.1371/journal.pcbi.1006651 December 7, 2018 1 / 24 In silico identification of drug-protein partners Fibrosis Foundation (BRODSK18G0). The funders had no role in study design, data collection and Author summary analysis, decision to publish, or preparation of the manuscript. Bioactive compounds often disrupt cellular gene expression in ways that are difficult to predict. While the correlation between a cellular response after treatment with a small Competing interests: The authors have declared that no competing interests exist. molecule and the knockdown of its target protein should be simple to establish, in practice this goal has been difficult to achieve. The main challenges are that data are noisy, drugs are not intended to be active in all cell types, and signals from a bona fide target(s) may be obscured by correlations with knockdowns of other proteins in the same pathway(s). Here, we find that a random forest classification model can detect meaningful correla- tional patterns when gene expression profiles after compound treatment and gene knockdowns in four or more cell lines are compared. When this approach is enriched by a structure-based screen, novel drug-target interactions can be predicted. We then validated new ligand-protein interactions for four difficult targets. Although the initial compounds are not especially potent in vitro, they are capable of disrupting their target pathway in the cell to an extent that generates a significant and characteristic gene expression profile. Col- lectively, our studies provide insight on cell-level transcriptomic responses to pharmaceutical intervention and the use of these patterns for target identification. In addition, the method provides a novel drug discovery pipeline to test chemistries without a priori knowledge of their target(s). Introduction Most research programs focus on a subset of roughly 10% of human proteins, and this bias has a profound effect on drug discovery, as exemplified by studies on protein kinases [1±3]. The origin for this relatively limited exploration of the human interactome and the resulting lack of novel drugs for emerging `genomic-era' targets has been traced back to the availability of small molecular weight probes for only a narrow set of familiar protein families [1]. To break this vicious cycle, a new approach is needed that goes beyond known targets and old scaffolds and benefits from the vast amount of information we possess on gene expression, protein interactions, protein structures, and the genetic basis of disease. The current target-centric paradigm relies on high-throughput in vitro screens of large compound libraries against a single protein [4]. This approach has been effective for kinases, GPCRs, and proteases, but has produced meager yields for new targets such as protein-protein interactions, which require chemotypes absent in most compound libraries [5, 6]. Moreover, these in vitro biochemical screens often cannot provide any context regarding drug activity in the cell, multi-target effects, or toxicity [7, 8]. On the other hand, the goal of leveraging new chemistries requires a compound-centric approach that would test compounds directly on thousands of potential targets. In practice, this is undertaken in cell-based phenotypic assays, but it is often unclear how to identify potential molecular targets in these experiments [9±11]. Understanding how cells respond when specific interactions are disrupted is not only essential for target identification but also for developing therapies that might restore perturbed disease networks to their native states. Compound-centric computational approaches are now commonly applied to predict drugÐ target interactions by leveraging existing data. However, many of these methods extrapolate from known chemistry, structural homology, and/or functionally related compounds, and excel in target prediction only when the query compound is chemically or functionally similar to known drugs [12±17]. Other structure-based methods, such as molecular docking, can evaluate novel chemistries but are limited by the availability of protein structures [18±20], inadequate PLOS Computational Biology | https://doi.org/10.1371/journal.pcbi.1006651 December 7, 2018 2 / 24 In silico identification of drug-protein partners scoring functions, and excessive computing times, which render structure-based methods ill- suited for genome-wide virtual screens [21]. More recently, a new paradigm to predict molecular interactions using cellular gene expression profiles has emerged [22±24]. Previous work showed that distinct inhibitors of the same protein target produce similar transcriptional responses [25]. Other studies predicted second- ary pathways affected by chemical inhibitors by identifying genes that, when deleted, diminish the transcriptomic signature of drug-treated

Predicting Protein Targets for Drug-Like Compounds Using Transcriptomics

Snps) Distant from Xenobiotic Response Elements Can Modulate Aryl Hydrocarbon Receptor Function: SNP-Dependent CYP1A1 Induction S

Table 1. Swine Proteins Identified As Differentially Expressed at 24Dpi in OURT 88/3 Infected Animals

Gene Set Size Count Z-Score P-Value Q-Value List of Genes

Polyubiquitin Gene Ubb Is Required for Upregulation of Piwi Protein Level During Mouse Testis Development

A Genomic Analysis of Rat Proteases and Protease Inhibitors

Supplementary Table 1: Genes Located on Chromosome 18P11-18Q23, an Area Significantly Linked to TMPRSS2-ERG Fusion

Isolation and Identification of Proteins from Swine Sperm Chromatin and Nuclear Matrix

Integrated Network Analysis Identifying Potential Novel Drug Candidates

Proteomic Analyses of Human Sperm Cells: Understanding the Role of Proteins and Molecular Pathways Aﬀecting Male Reproductive Health

Sensitive and Frequent Identification of High Avidity Neo-Epitope Specific CD8+ T Cells in Immunotherapy-Naive Ovarian Cancer

The Unique Functions of Tissue-Specific Proteasomes

Neonatal-Onset Autoinflammation and Immunodeficiency Caused by Heterozygous Missense Mutation of the Proteasome Subunit Β-Type 9