Published OnlineFirst March 17, 2017; DOI: 10.1158/0008-5472.CAN-17-0096

Cancer Therapeutics, Targets, and Chemical Biology Research

Integrative Cancer Pharmacogenomics to Infer Large-Scale Drug Taxonomy

Nehme El-Hachem1,2, Deena M.A. Gendoo3,4, Laleh Soltan Ghoraie3,4, Zhaleh Safikhani3,4, Petr Smirnov3, Christina Chung5, Kenan Deng5, Ailsa Fang5, Erin Birkwood6, Chantal Ho5, Ruth Isserlin5, Gary D. Bader5,7,8, Anna Goldenberg5,9, and Benjamin Haibe-Kains3,4,5,10

1Integrative Computational Systems Biology, Institut de Recherches Cliniques de Montréal, Montreal, Quebec, Canada. 2Department of Biomedical Sciences, Université de Montréal, Montreal, Quebec, Canada. 3Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada. 4Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada. 5Department of Computer Science, University of Toronto, Toronto, Ontario, Canada. 6School of Computer Science, McGill University, Montreal, Quebec, Canada. 7The Donnelly Centre, Toronto, Ontario, Canada. 8The Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Ontario, Canada. 9Hospital for Sick Children, Toronto, Ontario, Canada. 10Ontario Institute of Cancer Research, Toronto, Ontario, Canada.

Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).

N. El-Hachem, D.M.A. Gendoo, and L.S. Ghoraie are the co-first authors of this article.

Corresponding Author: Benjamin Haibe-Kains, University Health Network, Princess Margaret Cancer Centre Research Tower, 11-310, 101 College Street, Toronto, Ontario M5G1L7, Canada. Phone 416-581-8626; E-mail: [email protected]

doi: 10.1158/0008-5472.CAN-17-0096

©2017 American Association for Cancer Research.

Abstract

Identification of drug targets and mechanism of action (MoA) for new and uncharacterized anticancer drugs is important for optimization of treatment efficacy. Current MoA prediction largely relies on prior information including side effects, therapeutic indication, and chemoinformatics. Such information is not transferable or applicable for newly identified, previously uncharacterized small molecules. Therefore, a shift in the paradigm of MoA predictions is necessary toward development of unbiased approaches that can elucidate drug relationships and efficiently classify new compounds with basic input data. We propose here a new integrative computational pharmacogenomic approach, referred to as Drug Network Fusion (DNF), to infer scalable drug taxonomies that rely only on basic drug characteristics toward elucidating drug–drug relationships. DNF is the first framework to integrate drug structural information, high-throughput drug perturbation, and drug sensitivity profiles, enabling drug classification of new experimental compounds with minimal prior information. DNF taxonomy succeeded in identifying pertinent and novel drug–drug relationships, making it suitable for investigating experimental drugs with potential new targets or MoA. The scalability of DNF facilitated identification of key drug relationships across different drug categories, providing a flexible tool for potential clinical applications in precision medicine. Our results support DNF as a valuable resource to the cancer research community by providing new hypotheses on compound MoA and potential insights for drug repurposing. Cancer Res; 77(11); 3057–69. ©2017 AACR.

Introduction

Continuous growth and ongoing deployment of large-scale pharmacogenomic datasets have opened new avenues of research for the prediction of biochemical interactions of small molecules with their respective targets and therapeutic effects, also referred to as drug mechanisms of action (MoA). Development of computational methods to predict MoA of new compounds has been an active field of research in the past decade (1). Despite major advancements in this field, key challenges still remain in the (i) design of classification approaches that rely on minimal drug characteristics to classify drugs, and (ii) selection and integration of complementary datasets to best characterize drugs' effects on biological systems.

The notion of a "minimalist" approach to represent similarities among drug compounds has been extensively explored, with varying results. Several computational strategies have solely relied on chemical structure similarity to infer drug–target interactions (2, 3), based on the assumption that structurally similar drugs share similar targets, and ultimately, similar pharmacologic and biological activity (4). However, sole dependence on chemical structure information fails to consider drug-induced genomic and phenotypic perturbations, which directly connect with biological pathways and molecular disease mechanisms (5, 6). Seminal approaches by Iorio and colleagues (7) and Iskar and colleagues (8) leveraged drug-induced transcriptional profiles from Connectivity Map (CMAP; ref. 9) toward identification of drug–drug similarities and MoA solely based on gene expression profiles (7). The major limitation of CMAP, however, is the lack of global scope, as only 1,309 drugs are characterized across 5 cancer cell lines (9). Other methods have integrated prior knowledge such as adverse effects annotations (10, 11), and recent approaches showed that integrating multiple layers of information had improved Anatomical Therapeutic Classification System (ATC) prediction for FDA-approved drugs (12). While these initiatives constitute major advances toward characterizing drug MoA, comparing the consistency of such efforts toward prediction of new, uncharacterized small molecules remains a challenge.

Computational approaches designed to characterize drug MoA have yet to capitalize on newly generated high-throughput data types. The published CMAP has recently been superseded by the L1000 dataset from the NIH Library of Integrated Network-based Cellular Signatures (LINCS) consortium, which contains over 1.8 million gene expression profiles spanning 20,413 chemical perturbations. A recent study of the LINCS data showed that structural similarity is significantly associated with similar transcriptional changes (6). While the L1000 dataset provides an unprecedented compendium of transcriptomic drug data, its integration with other pharmacogenomics data types has not been explored extensively.

The advent of high-throughput in vitro drug screening promises to provide additional insights into drug MoA. The pioneering initiative of the NCI60 panel provided an assembly of tumor cell lines that have been treated against a panel of over 100,000 small molecules (13), and served as the first large-scale resource enabling identification of lineage-selective small-molecule sensitivities (13). However, its relatively small number of cancer cell lines (n = 59) restricted the relevance of these data for the prediction of drug MoA. To address this issue, the Cancer Therapeutics Response Portal (CTRP) initiative screened 860 cancer cell lines against a set of 481 small-molecule compounds (14), which makes it the largest repository of in vitro drug sensitivity measurements to date. Individual assessment of these in vitro sensitivity datasets has highlighted their relevance for inference of MoA of approved and experimental compounds. It remains to be demonstrated, however, whether integration of drug sensitivity data with other drug-related data, such as drug structures and drug-induced transcriptional signatures, can be used to systematically infer drug MoA.

To efficiently harness these recent high-throughput datasets, we have developed a scalable approach that maximizes com-

of more than 60 million unique structures. Tanimoto similarity measures (19) between drugs were calculated by first parsing annotated SMILES strings for existing drugs through the parse.smiles function of the rcdk package (version 3.3.2). Extended connectivity fingerprints (hash-based fingerprints, default length 1,024) across all drugs were subsequently calculated using the rcdk::get.fingerprints function (20).

Drug perturbation signatures. We obtained transcriptional profiles of cancer cell lines treated with drugs from the L1000 dataset recently released by the Broad Institute, which contains over 1.8 million gene expression profiles of 1,000 "landmark" genes across 20,413 drugs. We used our PharmacoGx package (version 1.3.4; ref. 21) to compute signatures for the effect of drug concentration on the transcriptional state of a cell, using a linear regression model adjusted for treatment duration, cell line identity, and batch to identify the genes whose expression is significantly perturbed by drug treatment:

$$G = \beta_0 + \beta_i C_i + \beta_t T + \beta_d D + \beta_b B$$

where
$G$ = molecular feature expression (gene).
$C_i$ = concentration of the compound applied.
$T$ = cell line identity.
$D$ = experiment duration.
$B$ = experimental batch.
$\beta$ = regression coefficients.

The strength of the feature response is quantified by $\beta_i$. The transcriptional changes induced by drugs on cancer cell lines are subsequently referred to throughout the text as drug perturbation signatures. Similarity between estimated coefficients of drug perturbation signatures was computed using the Pearson correlation coefficient, with the assumption that drugs similarly perturbing the same set of genes might have similar