Discovery of Drug Mode of Action and Drug Repositioning from Transcriptional Responses
Total Page:16
File Type:pdf, Size:1020Kb
Discovery of drug mode of action and drug repositioning from transcriptional responses Francesco Iorioa,b, Roberta Bosottic, Emanuela Scacheric, Vincenzo Belcastroa, Pratibha Mithbaokara, Rosa Ferrieroa, Loredana Murinob, Roberto Tagliaferrib, Nicola Brunetti-Pierria,d, Antonella Isacchic,1, and Diego di Bernardoa,e,1 aTeleThon Institute of Genetics and Medicine, Naples, Italy; cDepartment of Biotechnology, Nerviano Medical Sciences, Milan, Italy; eDepartment of Systems and Computer Science, “Federico II” University of Naples, Naples, Italy; dDepartment of Pediatrics, “Federico II” University of Naples, Naples, Italy; and bDepartment of Mathematics and Computer Science, University of Salerno, Salerno, Italy Edited by Charles R. Cantor, Sequenom, Inc., San Diego, CA, and approved July 2, 2010 (received for review January 5, 2010) A bottleneck in drug discovery is the identification of the molecular tivity Map” (11, 12). These compound-specific expression profiles targets of a compound (mode of action, MoA) and of its off-target can be queried with a gene signature to recover a subset of effects. Previous approaches to elucidate drug MoA include analy- compounds connected to the signature of interest. A compound sis of chemical structures, transcriptional responses following is selected if genes in the signature are significantly modulated in treatment, and text mining. Methods based on transcriptional the compound-specific transcriptional response. If a gene signa- responses require the least amount of information and can be ture for a new compound is available, it is then possible to search quickly applied to new compounds. Available methods are ineffi- the collection of transcriptional responses with that signature to cient and are not able to support network pharmacology. We de- identify well-characterized drugs, which behave similarly, and veloped an automatic and robust approach that exploits similarity thus infer the MoA of the new compound. The problems affecting in gene expression profiles following drug treatment, across multi- gene signature-based methods are in the choice of the subset of ple cell lines and dosages, to predict similarities in drug effect and genes composing the signature, and in the proper handling of MoA. We constructed a “drug network” of 1,302 nodes (drugs) and multiple expression profiles obtained by treating different cell 41,047 edges (indicating similarities between pair of drugs). We lines, with the same compound. A wrong selection of genes in applied network theory, partitioning drugs into groups of densely the signature will lead to capture similarities in the experimental BIOPHYSICS AND interconnected nodes (i.e., communities). These communities are settings (i.e., same cell line) rather than in the drug MoAs COMPUTATIONAL BIOLOGY significantly enriched for compounds with similar MoA, or acting (SI Methods). Because of these limitations, the analysis of the on the same pathway, and can be used to identify the compound- transcriptional response to a new compound is usually performed targeted biological pathways. New compounds can be integrated by mapping the most differentially expressed genes, following into the network to predict their therapeutic and off-target effects. compound treatment, onto known biological pathways, in order Using this network, we correctly predicted the MoA for nine antic- to detect the most perturbed pathways. Such attempts are met ancer compounds, and we were able to discover an unreported ef- with limited success due to the complexity of “backtracking” fect for a well-known drug. We verified an unexpected similarity expression changes to primary causes (i.e., molecular targets). between cyclin-dependent kinase 2 inhibitors and Topoisomerase Inspired by these considerations, we developed a general ap- Fasudil inhibitors. We discovered that (a Rho-kinase inhibitor) proach, with a matched online tool, to identify and classify the “ ” might be repositioned as an enhancer of cellular autophagy, po- pathway targeted by a compound and its MoA. We computed tentially applicable to several neurodegenerative disorders. Our for each drug a “consensus” synthetic transcriptional response approach was implemented in a tool (Mode of Action by NeTwoRk summarizing the transcriptional effect of the drug across multiple Analysis, MANTRA, http://mantra.tigem.it). treatments on different cell lines and/or at different dosages. We then constructed a “drug network” (DN) in which two drugs are computational drug discovery ∣ drug repurposing ∣ systems biology ∣ connected to each other if their consensus responses are similar chemotherapy according to a similarity measure that we developed (drug dis- tance). We divided the DN into interconnected modules termed dentifying molecular pathways targeted by a compound (drug “communities” and “rich clubs” (19). By analyzing these modules, effects), and the specific compound-substrate interactions (drug I we were able to capture similarities and differences in pharma- mode of action—MoA), is of paramount importance for the de- cological effects and MoAs; we were able to predict MoA of an- velopment of new drugs, and also for new clinical applications of ticancer compounds still being studied and to discover previously already existing drugs (1–3). Systems biology approaches are unreported MoAs for well-known drugs. naturally suited to capture the complexity of drug activity in cells We developed a Web-based tool to explore the DN and query it (4–6). Prediction of drug MoA has been attempted by using gene for classification of previously undescribed compounds (Mode of expression profiles following drug treatment (7–13), by compar- Action by Network Analysis—MANTRA, http://mantra.tigem.it). ing side-effect similarities (14), by text-mining literature (15), or by applying chemoinformatic tools to search for small molecules similarities (16, 17). Most of these approaches are applicable only Author contributions: F.I., N.B.-P., A.I., and D.d.B. designed research; F.I., R.B., E.S., P.M., to well-characterized molecules (e.g., when the structure is avail- N.B.-P., A.I., and D.d.B. performed research; F.I., V.B., and R.T. contributed new able, or side effects are documented). On the other hand, expres- reagents/analytic tools; F.I., R.B., E.S., V.B., P.M., R.F., L.M., A.I., and D.d.B. analyzed data; and F.I., R.B., E.S., N.B.-P., A.I., and D.d.B. wrote the paper. sion profile-based methods are the most general ones, because they do not require any prior information on the compound being The authors declare no conflict of interest. analyzed. Among the most promising approaches are the ones This article is a PNAS Direct Submission. based on “gene signatures” (11, 12), i.e., subset of genes whose Data deposition: The microarray data reported in this paper have been deposited in the Gene Expression Omnibus (GEO) database, www.ncbi.nlm.nih.gov/geo (accession differential expression can be used as a marker of the activity of a no. GSE18552). given pathway, disease or compound. Gene signatures can be 1To whom correspondence may be addressed. E-mail: [email protected] used to discover “connections” among drugs, pathways, and dis- or [email protected]. eases (8, 11, 12, 18) using a large collection of transcriptional This article contains supporting information online at www.pnas.org/lookup/suppl/ responses following compound treatments, such as the “Connec- doi:10.1073/pnas.1000138107/-/DCSupplemental and at http://mantra.tigem.it. www.pnas.org/cgi/doi/10.1073/pnas.1000138107 PNAS Early Edition ∣ 1of6 Downloaded by guest on September 24, 2021 Results regulated). The size of these optimal signatures was heuristically Drug Network and Communities. We quantified the degree of simi- determined as described in SI Methods. larity in the transcriptional responses among drugs. To this end, We then checked if the genes in the optimal gene signature of we exploited a repository of transcriptional responses to com- the first compound ranked consistently at the top/bottom of the pounds: the Connectivity Map (cMap) (11, 12) containing 6,100 PRL of the second compound, and vice versa, using the Gene Set genome-wide expression profiles obtained by treatment of five Enrichment Analysis (GSEA) (22). We computed the GSEA en- different human cell lines at different dosages with a set of richment score of the optimal gene signature of compound A in 1,309 different molecules. We represented the similarity between the PRL of compound B, and vice versa. We then combined the two drugs as a “distance” and computed it as summarized in two scores to obtain a single value quantifying the distance be- Fig. 1A: For each compound, we considered all the transcrip- tween compound A and B (SI Methods). The smaller the distance, tional responses following treatments, across different cell lines the more similar the two compounds are. We computed the dis- and/or at different concentrations. Each transcriptional response tance for each pair of the 1,309 compounds in the cMap dataset for a total of 856,086 pairwise comparisons. We then considered was represented as a list of genes ranked according to their dif- each compound as a node in a network and connected two nodes ferential expression. We then computed a single “synthetic” with a weighted edge (where the weight is proportional to their ranked list of genes, the Prototype Ranked List (PRL), by mer- distance), if their distance was below a significant threshold value ging all the ranked lists referring to the same compound. In order (Fig. 1B). Drugs that were not connected to any other compound to equally weight the contribution of each of the cell lines to the by at least one edge were excluded from the DN (SI Methods). drug PRL, rank merging was achieved with a procedure (detailed The resulting DN has a giant connected component with 1,302 in SI Methods) based on a hierarchical majority-voting scheme, nodes (i.e., drugs) out of 1,309 and 41,047 edges, corresponding where genes consistently overexpressed/down-regulated across to 5% of a fully connected network with the same number of the ranked lists are moved at the top/bottom of the PRL (18).