bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 PharmOmics: A Species- and Tissue-specific Drug Signature Database and Online Tool 2 for Drug Repurposing 3 4 Yen-Wei Chen1,2,6, Graciel Diamante1,2,6, Jessica Ding1,3,6, Thien Xuan Nghiem1, Jessica 5 Yang1, Sung-min Ha1, Peter Cohn1, Douglas Arneson1,4, Montgomery Blencowe1,3, Jennifer 6 Garcia2, Nima Zaghari2, Paul Patel2, and Xia Yang1,2,3,4,5 * 7 8 1Department of Integrative Biology and Physiology, University of California, Los Angeles, 9 CA 10 2Interdepartmental Program of Molecular Toxicology, University of California, Los Angeles, 11 Los Angeles, CA 12 3Interdepartmental Program of Molecular, Cellular, & Integrative Physiology, Los Angeles, 13 Los Angeles, CA 14 4 Interdepartmental Program of Bioinformatics, University of California, Los Angeles, Los 15 Angeles, CA 16 5Institute for Quantitative and Computational Biosciences, University of California, Los 17 Angeles, CA 18 6These authors contributed equally. 19 20 Correspondence: 21 *Xia Yang, Ph.D. 22 Dept. Integrative Biology and Physiology, UCLA 23 Email: [email protected] 24

1 bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

25 Abstract 26 Drug development has been hampered by a high failure rate in clinical trials due to efficacy 27 or safety issues not predicted by preclinical studies in model systems. A key contributor is 28 our incomplete understanding of drug functions across organ systems and species. Therefore, 29 elucidating species- and tissue-specific actions of drugs can provide systems level insights 30 into therapeutic efficacy, potential adverse effects, and interspecies differences that are 31 necessary for more effective translational medicine. Here, we present a comprehensive drug 32 knowledgebase and analytical tool, PharmOmics, comprised of genomic footprints of drugs 33 in individual tissues from human, mouse, and rat transcriptome data from GEO, 34 ArrayExpress, TG-GATEs, and DrugMatrix. Using multi-species and multi-tissue 35 expression signatures as molecular indicators of drug functions, we developed gene network- 36 based approaches for drug repositioning. We demonstrate the potential of PharmOmics to 37 predict drugs for new disease indications and validated two predicted drugs for non-alcoholic 38 fatty liver disease in mice. We also examined the potential of PharmOmics to identify drugs 39 related to hepatoxicity and nephrotoxicity. By combining tissue- and species-specific in vivo 40 drug signatures with biological networks, PharmOmics serves as a complementary tool to 41 support drug characterization. 42 43 Key words: PharmOmics, network medicine, tissue specificity, cross-species comparison, 44 drug repositioning, adverse drug reactions, drug toxicity 45

2 bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

46 Background 47 Drug development has been challenging and costly over the past decades due to the high 48 failure rate in clinical trials (1). Most drugs with excellent efficacy and safety profiles in 49 preclinical studies often encounter suboptimal efficacy or safety concerns in humans (2). 50 This translational gap is likely attributable to our incomplete understanding of the systems 51 level activities of drugs in individual tissues and organ systems (3) as well as the differences 52 between humans and model systems (4). 53 54 Drug activities can be captured by gene expression patterns, commonly referred to as gene 55 signatures. By measuring how a pharmacological agent affects the gene signature of a cell or 56 tissue type in a particular species, we can infer the cell- or tissue-specific biological pathways 57 involved in therapeutic processes or toxicological responses. This concept has prompted drug 58 repositioning studies and provided important predictions for repurposing approved drugs for 59 new disease indications (5–10). Similarly, gene signatures can reveal mechanisms underlying 60 adverse drug reactions (ADRs) and be leveraged to predict ADRs as previously shown for 61 liver and kidney toxicity (11–13). 62 63 A drug may affect different molecular processes between tissues, providing treatment effects 64 in the desired target tissue(s) but causing toxicity or ADRs in other tissues. Therefore, tissue- 65 specific drug signatures will offer a more systematic understanding of drug actions in vivo. In 66 addition, rodent models have been commonly used in toxicology and preclinical studies, yet 67 species-specific effects of drugs have been observed (14) and underlie the lack of efficacy or 68 unexpected ADRs of certain drugs when used in humans (15). Therefore, understanding the 69 species-specific molecular effects of drugs is of high biological importance. A detailed 70 species- and tissue-specific drug genomic signature database will significantly improve our 71 understanding of the molecular networks affected by drugs and facilitate network-based drug 72 discovery and ADR prediction for translational medicine. 73 74 The potential of using gene signatures to facilitate target and toxicity identification has led to 75 several major efforts in characterizing genomic signatures related to drug treatment (8,16–18). 76 However, none of the existing platforms offer comprehensive cross-tissue and cross-species 77 in vivo assessments of drug activities to allow predictions of drug effects on individual tissues 78 and to help assess the translational potential of a drug based on consistencies or discrepancies 79 between species. For instance, the comparative toxicogenomics database (CTD), a literature- 80 based resource curating chemical-to-gene/ associations as well as chemical-to-disease 3 bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

81 and gene/protein-to-disease connections (16), lacks the cellular and tissue context of the 82 curated interactions. More systematic, data-driven databases like CMAP (8) and LINC1000 83 (17) focus on characterizing and cataloging the genomic footprints of more than ten thousand 84 chemicals using in vitro cell lines (primarily cancer cell lines) to offer global views of 85 molecular responses to drugs in individual cellular systems. However, these in vitro cell-line 86 based gene signatures may not always capture in vivo tissue-specificity of drug activities. To 87 move into in vivo systems, large toxicogenomics databases like TG-GATEs (19) and 88 DrugMatrix from the National Toxicology Programs of the National Institute of 89 Environmental Health Sciences (https://ntp.niehs.nih.gov/drugmatrix/index.html) have 90 become available to provide unbiased transcriptome assessment for heart, muscle, liver, and 91 kidney systems. However, information about other organ systems is limited. Efforts to 92 analyze publicly deposited transcriptomic datasets in GEO (20) and ArrayExpress (21), 93 which have broader tissue coverage, for individual drugs have been described (18), but 94 systematic annotation and integration of species- and tissue-specific effects of drugs have not 95 been achieved. 96 97 Here, we present a database that contains 13,382 rat, human, and mouse transcriptomic 98 datasets across >20 tissues covering 941 drugs. We evaluated the tissue- and species- 99 specificity of drug signatures as well as the performance of these signatures in gene network- 100 based drug repositioning, toxicity prediction, target identification, and comparisons of 101 molecular activities between tissues and species. The drug signatures are hosted on an 102 interactive web server, PharmOmics, to enable public access to drug signatures and 103 integrative analyses for drug repositioning 104 (http://mergeomics.research.idre.ucla.edu/runpharmomics.php). 105 106 Methods 107 Curation of tissue- and species-specific drug transcriptomic datasets 108 As illustrated in Figure 1, we compiled a list of clinically relevant drugs, including 766 FDA 109 approved drugs from Kyoto Encyclopedia of and Genomes (KEGG) (16), which 110 overlapped with drugs from the US Food and Drug Administration (FDA), European Medical 111 Agency, and Japanese Pharmaceuticals and Medical Devices Agency, with an additional 175 112 chemicals from TG-GATEs (19) and DrugMatrix 113 (https://ntp.niehs.nih.gov/drugmatrix/index.html). The compiled drug list was queried against 114 GEO, ArrayExpress, TG-GATEs, and DrugMatrix to identify datasets as of January 2018. 115 Duplicated datasets between data repositories were removed. We developed a semi- 4 bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

116 automated pipeline combining automated search with manual checking to identify relevant 117 datasets for drug treatment. The automated process first extracts datasets containing drug 118 generic names or abbreviations and then inspects the potential datasets for availability of both 119 drug treatment and control labels in the constituent samples. We also manually checked the 120 recorded labels identified by the automated process to validate the labels and remove 121 potential false detections. Only datasets with n>=3/group in both drug treatment and control 122 groups were included in our downstream analyses. Although a larger sample size is desired, 123 the majority (78.7%) of drug transcriptome datasets have n=3/group, 20.9% datasets have 124 n=2/group, and <1% datasets have n>3/group (Supplementary Table 1). 125 126 Obtaining drug treatment signatures stratified by species and tissues 127 Species and tissue labels were retrieved based on the metadata of each dataset. Tissue names 128 were standardized based on the Brenda Tissue Ontology (22). We implemented a search 129 procedure to climb the ontology tree structure to determine the suitable tissue annotations. 130 This was done by first building a priority list of widely used tissues/organs in toxicological 131 research using the Brenda Tissue Ontology tree system. Tissue/organ priority order was set to 132 "kidney", "liver", "pancreas", "breast", "ovary", "adipose tissue", "cardiovascular system", 133 "nervous system", "respiratory system", "urogenital system", "immune system", 134 "hematopoietic system", "skeletal system", "integument" (endothelial and skin tissue), 135 "connective tissue", "muscular system", "gland", "gastrointestinal system", and "viscus" 136 (other non-classified tissue). Tissue terms relevant to each of these tissues or organs were 137 curated from the ontology tree into a tissue/organ ontology table. Next, we looked up terms 138 from our tissue/organ ontology table in the Cell/Organ/Tissue column of the metadata in each 139 transcriptomic dataset. If a tissue/organ term was not found, we searched the title and 140 summary columns of the metadata as well to retrieve additional information. When the search 141 returned multiple tissue terms (for example, breast cancer cell line may be categorized as 142 both epithelial and breast organ), we used the term with the highest priority as described 143 above. We prioritized the tissue terms based on the relevance to toxicology to make tissue 144 assignments unique for each dataset to reduce ambiguity. Manual checking was conducted to 145 confirm the tissue annotation for each dataset. 146 147 For each gene expression dataset, normalized data were retrieved, and quantile distribution of 148 the values were assessed. When a dataset was not normally distributed, log2-transformation 149 using GEO2R (20) was applied. To identify differentially expressed genes (DEGs) 150 representing drug signatures, two different strategies were used. First, the widely used DEG 5 bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

151 analysis method LIMMA (23) was applied to obtain dose and time segregated signatures 152 under FDR < 0.05. To overcome the low sample size issue and obtain “consensus” drug 153 signatures for a drug/chemical, LIMMA was also applied to datasets where multiple doses 154 and treatment durations were tested, and treatment effect were derived by combining 155 dose/time experiments for the same drug/chemical in each study. Second, we leveraged 156 different studies for the same drugs or chemicals in the same tissue and species to derive 157 meta-analysis signatures. To address heterogeneity in study design, platforms, sample size, 158 and normalization methods across different data repositories, we applied the characteristic 159 direction method from the GeoDE package (24) to derive consistent DEGs for each drug 160 across different data sources. GeoDE was designed to accommodate heterogenous datasets 161 that have differing parameters and outputs between treatment and control groups. It uses a 162 “characteristic direction” measure to identify biologically relevant genes and pathways. The 163 normalized characteristic directions for all genes were then transformed into a non-parametric 164 rank representation. Subsequently, gene ranks of a particular drug from the same tissue/organ 165 system and the same organism were aggregated across datasets using the Robust Rank 166 Aggregation method (25), a statistically controlled process to identify drug DEGs within each 167 tissue for each species. Robust Rank Aggregation provides a non-parametric meta-analysis 168 across different ranked lists to obtain commonly shared genes across datasets, which avoids 169 statistical issues associated with heterogeneous datasets. It computes a null distribution based 170 on randomized gene ranks and then compares the null distribution with the empirical gene 171 ranks to obtain a p-value for each gene. The robust rank aggregation process was done for the 172 upregulated and downregulated genes separately to obtain DEGs for both directions under 173 Bonferroni-adjusted p-value < 0.01, a cutoff implemented in the Robust Rank Aggregation 174 algorithm. To obtain species-level signatures for each drug, we further aggregated DEGs 175 across different organs tested for each drug within each species. 176 177 Pathway analysis of individual drug signatures was conducted using Enrichr (26) by 178 intersecting each signature with pathways or gene sets from KEGG (27) and 179 biological process (GOBP) terms (28). In addition, pathway enrichment analysis based on 180 network topology analysis (29) was conducted using ROntoTools (30). Pathways at false 181 discovery rate (FDR) < 0.05 were considered significant in both methods. 182 183 We curated 14,366 drug signatures segregated by treatment dosage and duration, tissue, and 184 species, covering 719 drugs and chemicals, among which 544 are FDA approved. In addition, 185 our meta signatures is a consensus of 4,349 signatures covering 551 drugs across treatment 6 bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

186 regimens. In total, the entire database is based on 13,382 rat, human, and mouse 187 transcriptomic datasets across >20 tissue or organ systems across 941 drugs and chemicals 188 from GEO, ArrayExpress, DrugMatrix, and TG-GATEs to derive drug signatures. These rat, 189 human, and mouse datasets cover >20 tissue or organ systems. The toxicogenomics databases 190 TG-GATEs and DrugMatrix mainly contain liver and kidney datasets from rats, while public 191 data repositories GEO and ArrayExpress contain datasets with broader tissue and species 192 coverage (Figure 2A). Overall, the rat datasets are mainly from liver and kidney whereas 193 human and mouse datasets also contained signatures from other tissues and organs such as 194 breast and the nervous system (Figure 2B). There is also a species bias between the data 195 repositories; GEO covered more mouse and human datasets, TG-GATEs mainly has human 196 and rat datasets, and DrugMatrix curated more rat datasets (Figure 2C). 197 198 Comparison of PharmOmics with existing drug signature platforms 199 To assess the degree of agreement in drug signatures between the PharmOmics database and 200 existing platforms, we compared PharmOmics with the CREEDS (18) and L1000FWD (31) 201 databases, for which drug signatures are accessible (Supplementary methods). As shown in 202 Supplementary Figure 1, both the PharmOmics dose/time-segregated signatures and the 203 meta signatures showed better concordance with the two existing platforms than the 204 agreement between CREEDS and L1000FWD, as reflected by higher overlap fold 205 enrichment score and lower statistical p values. The three platforms have differences in the 206 datasets and analytical strategies and therefore are complementary. Due to the lack of full 207 access to CMAP signatures, we were not able to systematically compare PharmOmics against 208 CMAP. 209 210 Web server implementation of PharmOmics 211 To allow easy data access and use of PharmOmics, we have created a freely accessible web 212 tool deployed on the same Apache server used to host Mergeomics (32), a computational 213 pipeline for integrative analysis of multi-omics datasets to derive disease-associated 214 pathways, networks, and network regulators (http://mergeomics.research.idre.ucla.edu). 215 216 The PharmOmics web server features three functions (Figure 3). First, it allows queries for 217 species- and tissue-stratified drug signatures and pathways for both the dose/time-segregated 218 and meta signatures. Details of statistical methods (e.g, LIMMA vs characteristic direction), 219 signature type (dose/time-segregated vs meta), and datasets used are annotated. The drug 220 query also includes a function for DEG and pathway signature comparisons between user- 7 bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

221 selected species and tissues which can be visualized and downloaded. Second, it features a 222 network drug repositioning tool that is based on the connectivity of drug signatures in 223 PharmOmics to user input genes such as a disease signature. This tool requires a list of genes 224 and a gene network that can be chosen from our preloaded gene regulatory networks if 225 relevant or a custom upload (see Applications below for details in implementation). In the 226 output, Z-score and p-value results of network repositioning are displayed and available for 227 download. In addition, we list the overlapping genes between drug signatures in the given 228 network and the input genes, the drug genes with direct connections to input genes through 229 one-edge extension, and input genes with one-edge connections to drug genes in the 230 downloadable results file. The output page also provides network visualization which details 231 the genes affected by a drug and their overlap with and direct connections to user input genes 232 using Cytoscape.js. The network nodes and edges files are also available for download and 233 can be used on Cytoscape Desktop. Figure 4 shows the web interface of the input submission 234 form (Figure 4A) and results display of the network repositioning tool using a sample liver 235 network and a sample hyperlipidemia gene set (Figure 4B). Lastly, the web server offers a 236 gene overlap-based drug repositioning tool that assesses direct overlap between drug gene 237 signatures and user input genes. Gene overlap-based drug repositioning requires a single list 238 of genes or separate lists of upregulated and downregulated genes and outputs the Jaccard 239 score, odds ratio, Fisher’s exact test p-value, within-species rank, and gene overlaps for drugs 240 showing matching genes with the input genes. This gene overlap-based approach is similar to 241 what was implemented in other drug repositioning tools, but the network-based repositioning 242 approach is unique to PharmOmics. 243 244 Experimental methods for NAFLD drug validation 245 Seven-week old C57BL/6 male mice were purchased from the Jackson Laboratory (Bar 246 Harbor, ME). After acclimation the animals were randomly assigned to four experimental 247 groups (n=7-9/group) on different diets/treatments: regular chow diet (Control) (Lab Rodent 248 Diet 5053, St. Louis, MO), high fat high sucrose (HFHS) diet (Research Diets-D12266B, 249 New Brunswick, NJ) to induce hepatic steatosis, a key NAFLD phenotype, HFHS diet with 250 fluvastatin treatment (NAFLD + Flu), and HFHS diet with aspirin treatment (NAFLD + Asp). 251 The target intake concentrations of fluvastatin and aspirin were 15mg/kg and 80 mg/kg, 252 respectively, which were chosen based on doses used in previous studies that did not show 253 toxicity (33,34). These experimental diets were then administered for 10 weeks. The average 254 fluvastatin intake was 14.98 mg/kg/day, and the average aspirin intake was 79.67 mg/kg/day. 255 8 bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

256 During drug treatment, metabolic phenotypes such as body weight, body fat and lean mass 257 composition were monitored weekly. Fat and lean mass were measured with Nuclear 258 Magnetic Resonance (NMR) Bruker minispec series mq10 machine (Bruker BioSpin, 259 Freemont, CA). For metabolic phenotypes measured at multiple time points (body weight 260 gain and adiposity), differences between groups were analyzed using a 2-way ANOVA 261 followed by Sidak’s multiple comparisons test. At the end of treatment, livers from all groups 262 were weighed, flash frozen, and stored at −80°C until lipid analysis. Hepatic lipids were 263 extracted using the Folch method as previously described (35). The lipid extracts were 264 analyzed by the UCLA GTM Mouse Transfer Core for triglyceride (TG), total cholesterol 265 (TC), unesterified cholesterol (UC), and phospholipids (PL) levels by colorimetric assay from 266 Sigma (St. Louis, MO) according to the manufacturer’s instructions. All animal experiments 267 were approved by the UCLA Animal Research Committee. 268 269 270 Results 271 Evaluating the ability of PharmOmics to extract drug targets and target pathways 272 It remains unclear whether drug DEGs reflect drug targets. To evaluate this possibility, we 273 retrieved known targets for the drugs included in PharmOmics from the DrugBank database 274 (36) and used three different methods to evaluate the potential of DEGs for drug target 275 identification. The first method assessed direct gene overlaps between known drug targets 276 and DEG signatures. The second assessed overlaps between known drug target pathways and 277 drug DEG pathways from pathway enrichment analysis. The last method was based on 278 whether known drug targets were within the close neighborhood of drug DEGs in molecular 279 networks, including the STRING network (37) and tissue-specific Bayesian networks (BNs) 280 (Supplementary methods). For drugs with multiple dose and time regimens, only the 281 signature with the best performance was used in these analyses. 282 283 The drug target recovery rates using PharmOmics drug DEGs for gene overlap, pathway 284 overlap, STRING network overlap, and liver BN overlap with liver DEGs were 22%, 59.1%, 285 41.7%, and 60.2%, respectively, and were significantly higher than the rates using random 286 genes (Supplementary Table 2). Although these rates are low, gene overlap drug target 287 recovery rate using PharmOmics signatures was higher than using CMAP (14%) and L1000 288 (17%) signatures, and drug target recovery was improved by pathway and network 289 approaches. Notably, matching the tissue between DEGs and network improved the target 290 detection rate. However, we note that while the pathway- and network-based approaches 9 bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

291 increased the detection rate for true drug targets, the number of false positives was also 292 increased. Overall, our results show that although PharmOmics has certain value in drug 293 target and pathway retrieval as shown by better performance than random genes and other 294 platforms, the retrieval rate is low. These results suggest that DEGs do not recover direct 295 drug targets well but more likely reflect target-related pathways, and caution should be taken 296 when using DEGs for target identification. 297 298 Utility of PharmOmics drug signatures in retrieving known therapeutic drugs for various 299 diseases 300 We next evaluate the ability of PharmOmics drug signatures to identify drugs for diseases 301 based on overlaps or network connectivity in gene signatures matched by tissue. We 302 hypothesized that if a drug is useful for treating a disease, the drug signatures and disease 303 signatures likely target similar pathways and therefore have direct gene overlaps or connect 304 extensively in gene networks. For gene overlap-based drug repositioning, we calculate the 305 Jaccard score, gene overlap fold enrichment, and Fisher’s exact test p values as the overlap 306 measurements. For network-based drug repositioning, we used a network proximity 307 measurement between drug and diseases genes which was previously applied to protein 308 interaction networks and known drug targets (5) (Supplementary methods). Here, we used 309 tissue-specific BNs and tested the mean shortest distance between drug DEGs and disease 310 genes. 311 312 The performance of PharmOmics drug repositioning was assessed using hyperlipidemia as 313 the first test case, as multiple known drugs are available as positive controls. Since 314 hyperlipidemia is most relevant to LDL and liver tissue, we retrieved LDL causal genes and 315 pathways in liver tissue based on LDL GWAS and liver genetic regulation of gene expression 316 using Mergeomics (Supplementary methods) (38), a method that can extract causal genes, 317 pathways, and networks for diseases (39,40). In addition to retrieving disease genes from 318 GWAS, a hyperlipidemia signature from CTD (16) was also used as an alternative source. 319 For each drug with different dose and treatment durations, the signature with the highest 320 overlap with the disease signature was used to represent the drug. Gene overlap- and 321 network-based methods using dose/time-segregated signatures had similar overall 322 performance (~90% AUC) in identification of antihyperlipidemic drugs (Figure 5A, 5B), and 323 the dose/time-segregated signatures performed better than the meta signatures when using 324 network-based repositioning (Figure 5C-5D). When compared to other platforms, 325 PharmOmics was able to retrieve higher prediction rankings for the known drugs (Table 1) 10 bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

326 and better AUC (Figure 5C-5D) than CMAP and L1000 and higher balanced accuracy 327 (Supplementary Table 3) than CREEDS, CMAP, and L1000. These results support the 328 capacity of the PharmOmics platform as a drug repositioning tool. 329 330 We also examined the network overlap patterns of the top drugs consistently retrieved by the 331 PharmOmics platform, lovastatin (ranked top 1% in both PharmOmics and CREEDS) and 332 oxymetholone (ranked top 1% in PharmOmics and ranked as 15% in CREEDS). Both drugs 333 targeted lipid metabolism genes (e.g. Sqle and Hmgcr) and PPAR pathways in the 334 hyperlipidemia network (Figure 5E, 5F), but more lovastatin DEGs connected to disease 335 genes compared to oxymetholone DEGs. These results support the utility of a network-based 336 drug repositioning approach that does not require the direct retrieval of a known drug target 337 or direct overlap of drug DEGs with disease genes. 338 339 We further evaluated the performance of PharmOmics in retrieving known drugs for other 340 diseases. Using CTD disease signatures for hyperuricemia, we found network-based 341 repositioning obtained 90% AUC (p=0.009) for detection of anti-hyperuricemia drugs, 342 whereas the gene overlap-based method did not yield a significant AUC (prediction ranks in 343 Supplementary Table 4). We also queried hepatitis signatures and achieved 83% AUC (p< 344 0.001) using the network method and 79% AUC (p<0.001) using the gene overlap method in 345 retrieving non-steroid anti-inflammatory agents (prediction ranks in Supplementary Table 346 5). Finally, using diabetes signatures, PharmOmics was able to predict PPAR gamma agonist 347 drugs (79% AUC, p=0.04), but not sulfonylurea drugs which act on the pancreatic islet to 348 enhance insulin release (prediction ranks in Supplementary Table 6). We note that the 349 paucity of drug signatures in diabetes relevant tissues/cells such as the islets and the digestive 350 system likely explains why sulfonylurea drugs are harder to retrieve. Overall, these various 351 test cases using known therapeutic drugs as positive controls support the utility of network- 352 based drug repositioning for select diseases when drug signatures from the appropriate tissues 353 are used. 354 355 Use of PharmOmics to predict drugs for NAFLD 356 After establishing the performance of PharmOmics in drug repositioning using the case 357 studies above, we applied PharmOmics to predict potential drugs for NAFLD, for which 358 there is currently no approved drugs. Using NAFLD steatosis signatures from a published 359 study (40) and the CTD database (16), we predicted PPAR alpha agonists (clofibrate, 360 fenofibrate, bezafibrate, and gemfibrozil), HMG-CoA reductase inhibitors (lovastatin, 11 bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

361 fluvastatin, and simvastatin), and a PPAR gamma agonist (rosiglitazone) among the top 10% 362 of the drug candidates (Supplementary Table 7). PPAR agonists have been supported as 363 potential drugs for NAFLD (41–52). Statins have shown efficacy in animal models (34,53), 364 although clinical results are controversial (54,55). Additional predicted drugs included aspirin, 365 which was recently reported to be associated with reducing liver fibrosis progression (56). 366 367 In vivo validation of drug repositioning predictions for NAFLD 368 Next, we sought to experimentally validate the ability of two top ranked drugs by 369 PharmOmics, fluvastatin and aspirin, to mitigate liver steatosis as predicted by PharmOmics 370 and assess the accuracy of repositioning ranks. Compared to other platforms 371 (Supplementary Table 7), fluvastatin was ranked high consistently in PharmOmics (top 5%), 372 CMAP (top 1% in all cells combined, 20% in HEPG2), CREEDS (20%), and L1000 (top 1% 373 in all cells combined, 55% in HEPG2). In comparison, aspirin was ranked higher in 374 PharmOmics (top 5%) compared to CREEDS (30%) and CMAP (35%) and was not 375 documented in L1000. Therefore, these predictions are relatively unique to PharmOmics. 376 377 Comparison between the mice in HFHS group (NAFLD) and the chow group (Control) 378 confirmed HFHS induced NAFLD phenotypes including increased body weight, adiposity, 379 and hepatic steatosis (Supplementary Figure 2A and 2B). Comparison of the fluvastatin 380 and aspirin treated groups with the NAFLD group revealed significant drug effects on body 381 weight gain for both fluvastatin (p<0.0001; Figure 6A) and aspirin (p<0.0001; Figure 6B). 382 The adiposity phenotype (fat and lean mass ratio) also showed significant drug effects from 383 both fluvastatin (p<0.0001; Figure 6C) and aspirin (p=0.0157; Figure 6D). 384 385 There was no significant difference in total liver weight among the groups (Supplementary 386 Figure 2C for Control and NAFLD group comparison; Supplementary Figure 3A and 3B 387 for NAFLD and drug group comparisons). As expected, the HFHS group had significantly 388 elevated levels of liver TG compared to controls, without changes in other lipids measured 389 such as TC, UC, and PL (Supplementary Figure 2D). In the drug treatment groups, both 390 fluvastatin (p=0.0044) and aspirin (p=0.0023) induced significant decreases in hepatic TG 391 compared to the NAFLD group, without any effect on TC, UC, and PL (Figure 5E-5F). 392 393 We further investigated whether the effects of the drugs on NAFLD phenotypes were 394 confounded by food and water intake. No effect of food intake was observed in the NAFLD + 395 Flu group; however, there was a significant decrease in food intake in the NAFLD + Asp 12 bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

396 group (Supplementary Figure 3C-3D). No effect on water intake was found for both groups 397 (Supplementary Figure 3E-3F). We next adjusted for food intake in the NAFLD phenotypic 398 analysis for body weight gain, adiposity, and TG levels using linear regression. After the 399 adjustment, the significant effects of fluvastatin on NAFLD phenotypes remained (body 400 weight gain p=0.0306; adiposity p=0.0022; hepatic TG p=0.0190). For aspirin, the significant 401 effects on adiposity (p=0.0479) and hepatic TG (p=0.0372) remained, but the effect on body 402 weight gain was no longer significant (p=0.0559). Overall, food/water intake did not have 403 major influence on treatment effects on NAFLD observed for both drugs. 404 405 Our experimental validation experiments support the efficacy of both fluvastatin and aspirin 406 in mitigating NAFLD. The effects of fluvastatin were stronger than that of aspirin and 407 visualization of the network overlaps between NAFLD signatures and drug signatures 408 revealed more extensive disease network connections for fluvastatin than for aspirin (Figure 409 6G-6H), supporting their repositioning ranks and potential mechanisms of action. The 410 signatures of the two drugs connected to pathways involved in NAFLD such as PPAR 411 signaling pathways, fatty acid and steroid biosynthesis (Figure 6G-6H). 412 413 Utility of PharmOmics drug signatures in predicting and understanding hepatotoxicity 414 We further explored the potential of coupling PharmOmics drug signatures and tissue 415 networks to predict liver toxicity, a major type of ADR for which both toxicity signatures and 416 orthogonal ADR documentations from various independent databases are available for 417 performance evaluation. We used the chemical-induced liver injury signature containing 435 418 genes from CTD to match with PharmOmics drug signatures through liver gene networks. 419 We then used both the histological severity from TG-GATEs and the independent FDA drug- 420 induced liver disease (DILI) categories (“most”, “less” – moderate/mild DILI adverse 421 reactions compared to the “most” category, and “no” DILI concern) as in silico independent 422 validation of the drugs predicted by PharmOmics that match with the CTD liver toxicity 423 signature. 424 425 First, we examined the relationship between the matching scores of PharmOmics signatures 426 and the histological severity grading based on TG-GATEs. Both the network-based and gene 427 overlap-based scores from PharmOmics increased with higher histological severity defined 428 by TG-GATEs (Figure 7A). Next, we examined the dose-dependent effects across the TG- 429 GATEs histological severity categories as well as the three FDA DILI categories. Our results 430 indicated that severe histological grading occurred mainly at higher drug doses within both 13 bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

431 the “less” and “most” DILI concern categories (Figure 7B). Analysis of the relationship 432 between dose/time-segregated signatures and network-based PharmOmics scores indicated 433 that drug treatment at higher doses had higher network matching ranks in PharmOmics and 434 more severe DILI (Figure 7C). In addition, we tested the performance of PharmOmics in 435 predicting hepatotoxic drugs from the FDA DILI drug database. PharmOmics dose/time- 436 segregated signatures resulted in higher performance (67% AUC) compared to the meta 437 signatures (60% AUC) and the other platforms tested such as CREEDS, CMAP, and L1000 438 (AUC 50-53%; Figure 7D; Supplementary Figure 4). Top drug predictions based on the 439 complete hepatotoxicity signatures were wy-14643 (experimental drug with severe 440 histological finding in TG-GATEs), dexamethasone (moderate DILI concern category in 441 FDA and moderate histological finding in TG-GATEs), phenobarbital (moderate DILI 442 concern), indomethacin (“most” DILI concern), and fenofibrate (moderate DILI concern). 443 444 Since CTD curated a large number of genes (435 genes) related to chemical induced liver 445 injury, we hypothesized that this large network could be divided into subnetworks indicative 446 of different mechanisms towards liver toxicity, which could improve toxicity prediction for 447 drugs with different mechanisms. We first examined network overlapping patterns of the top 448 5 predicted drugs by using the CTD liver injury genes (Figure 7E) and found consistent 449 targeting of gene subnetworks across top predictions. We then applied the Louvain clustering 450 method to divide the liver injury network into subnetworks and filtered subnetworks with less 451 than 10 genes to reduce uncertainty. These different subnetworks showed varying abilities in 452 identifying drugs with DILI concerns (Supplementary Table 8). The best performing 453 subnetwork showed improved AUC compared to the whole network (75% vs 67%; Figure 454 7D). Further scrutinization of key genes documented in CTD signatures of the top performing 455 subnetwork revealed that the antioxidant gene GSR, the phase 2 drug metabolizer NAT2, and 456 the inflammatory response gene IRAK1 showed the best predictability (Supplementary 457 Table 8). These results suggest that the network-based toxicity prediction approach may help 458 prioritize predictive genes, pathways, and subnetworks related to hepatotoxicity. 459 460 Utility of PharmOmics drug signatures in predicting and understanding nephrotoxicity 461 We also examined the performance of PharmOmics in predicting nephrotoxicity, another 462 ADR for which both toxicity signature and drug ADR documentations are available from 463 independent sources to help validate performance. 464

14 bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

465 Nephrotoxicity signatures were curated from CTD using either the chronic kidney disease 466 signature (CKD, 56 genes) or acute kidney injury signature (AKI, 120 genes), which were 467 matched with PharmOmics drug signatures to predict drugs matching CKD or AKI signatures. 468 The PharmOmics predictions were then validated using kidney histological severity 469 documented by TG-GATEs or nephrotoxicity defined by DrugBank. There were 13 shared 470 genes between CKD and AKI signatures including several inflammatory factors TNF, 471 TGFB1, NFKB1, and IL6. We found that unlike AKI signatures, using CKD signatures 472 against PharmOmics drug signatures resulted in network matching scores (not gene overlap 473 scores) that agreed well with histological severity documented in TG-GATEs (Figure 8A). 474 Therefore, we focused on using CKD signatures in downstream network-based analyses. We 475 found that PharmOmics drug signatures of higher doses predicted more drugs with severe or 476 moderate kidney histology categorized by TG-GATEs as well as drugs with nephrotoxicity as 477 defined by DrugBank (Figure 8B). However, when examining the relationship between 478 PharmOmics network scores across doses and DrugBank nephrotoxicity categories (non- 479 nephrotoxic or nephrotoxic), the network scores did not show a significant dose-dependent 480 relationship (Figure 8C). This is in contrast to the dose-dependent relationship observed for 481 hepatotoxicity analysis (Figure 7C). The weaker performance of PharmOmics in 482 nephrotoxicity prediction could be due to the smaller number of kidney drug datasets (~1k) 483 compared to liver drug datasets (~5k) based on data availability. 484 485 Finally, we assessed the performance of PharmOmics and other tools in identifying 486 DrugBank nephrotoxic agents. PharmOmics dose/time-segregated (64% AUC, p=0.001) and 487 meta databases (61% AUC, p=0.028) both showed a significant performance (Figure 8D), 488 whereas from the other tools evaluated only CMAP (63% AUC, p<0.001) showed a 489 significant performance (L1000 43% and CREEDS 56% AUC, non-significant). The top 5 490 nephrotoxic drugs predicted by PharmOmics were dexamethasone (potential CKD alleviating 491 agent) (57,58), naproxen (documented nephrotoxic drug in DrugBank), cholecalciferol 492 (potential CKD alleviating agent) (59), beta-estradiol (potential alleviating agent in women) 493 (60,61), and ibuprofen (documented nephrotoxic drug in DrugBank). We also examined the 494 gene overlap patterns of the top drugs with the CKD gene network (Figure 8E) and found 495 sparse overlap, which again is in contrast to the top hepatotoxicity drugs (Figure 7E). 496 497 Overall, our assessment of the application of PharmOmics in toxicity or ADR prediction 498 supports its potential value but also emphasizes that PharmOmics drug signatures may have 499 differing performance in different use cases. Several factors, including the toxicity signatures 15 bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

500 used (e.g., CKD signature performed better than AKI signature for nephrotoxicity prediction), 501 ADR/toxicity annotation (e.g., TG-GATEs or DrugBank), and signature matching method 502 (network-based approach better than gene overlap approach) can all significantly affect the 503 results. We also note that our network approach does not differentiate toxicity-inducing drugs 504 from toxicity-mitigating drugs since it is based on network connectivity and not the 505 directionality of gene signatures. 506 507 Utility of meta-analysis signatures to understand tissue and species specificity 508 We used meta signatures, which reflect the dose-independent, consistent genes affected by 509 drugs across studies in the same tissue or species, to evaluate tissue and species specificity of 510 drugs by analyzing the overlap in gene signatures for each drug across different tissues and 511 species and visualized the results using UpSetR (62). As shown in Figure 9A, the overlap 512 rate in the DEGs of the same drug between tissues and organs is usually less than 5%, 513 indicating a high variability in DEGs between tissues. As an example, we examined 514 atorvastatin, a HMGCR (β-Hydroxy β-methylglutaryl-CoA ) inhibitor, which has 515 well understood mechanisms and has been broadly tested in different tissues under the human 516 species label. We found that two DEGs (TSC22D3, THBS1) were shared across tissues 517 (Figure 9B). These genes are involved in and inflammation, suggesting 518 these processes are common targets of atorvastatin across tissues. Among the pathways 519 shared across tissues, immune related pathways were shared between blood cells and liver 520 cells but not in prostate cells from the urogenital system (Figure 9C, Supplementary Table 521 9). Pathway analysis indicated that steroid synthesis and drug metabolism pathways were 522 altered primarily in liver, which is expected as the known target of statin drugs is HMGCR, 523 the rate limiting enzyme in cholesterol biosynthesis in liver. Blood monocyte DEGs indicated 524 changes in inflammation related pathways, while GPCR ligand binding were altered 525 in prostate cancer cells. The tissue specificity of drug meta signatures revealed through our 526 analysis supports tissue-specific therapeutic responses and side effects and emphasizes the 527 need for comprehensive inclusion of drug signatures from different tissue systems as 528 implemented in the PharmOmics framework. 529 530 We also found evidence for high species specificity. As shown in Figure 9D, the pair-wise 531 overlaps in DEGs between species for the same drug is generally lower than 5%. Here we 532 chose PPAR gamma receptor agonist rosiglitazone as an example because this drug has 533 datasets across human, rat, and mouse in PharmOmics, and its mode of actions is well- 534 studied. As shown in Figure 9E and 9F, nine genes (CPT1C, AKR1B1, VNN1, ACSM3, 16 bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

535 CD36, CPT1A, PDK4, ZNF669, ADH1C) and several pathways (PPAR signaling and fatty 536 acid, triacylglycerol, and ketone body metabolism) were consistently identified from liver 537 DEGs across species (Supplementary Table 10), reflecting the major species-independent 538 pharmacological effects of rosiglitazone. Bile acid related genes were altered in rat datasets, 539 whereas retinol metabolism and adipocytokine pathways were altered in human datasets. The 540 species-specific differences identified highlight the importance of understanding the 541 physiological differences among model systems to facilitate drug design with better 542 translational potential. Our cross-species comparative studies also emphasize the need to 543 investigate drugs in multiple species, as only 21% of the unique drug-tissue pairs (236 out of 544 1144) from PharmOmics meta signatures have data from two or more species. 545 546 Discussion 547 Here we present PharmOmics, a publicly available drug signature database along with an 548 open-access web interface for accessing and utilizing the signatures for various applications. 549 PharmOmics utilizes published drug-related transcriptomic datasets across multiple data 550 repositories and provides unique tissue-, species-, and dose/time-stratified gene signatures 551 that are more reflective of in vivo activities of drugs. We also developed a unique framework 552 for drug repositioning based on tissue-specific gene network models. We examined the 553 potential applications of PharmOmics for various utilities including drug repurposing, 554 toxicity prediction, target identification, and comparisons of molecular activities between 555 tissues and species. We also carried out in silico performance assessments across drug 556 signature databases and in vivo mouse experiments to validate our network-based drug 557 predictions for NAFLD. 558 559 Compared to the well-established CMAP and LINC1000 platforms, PharmOmics focuses 560 more on in vivo settings and likely captures more physiologically relevant drug signatures to 561 improve drug repositioning performance. Compared to a previous crowdsourcing effort 562 which also utilizes publicly available drug datasets (18), our PharmOmics platform included 563 more curated databases (TGGATEs + drugMatrix Affymetrix + drugMatrix Codelink 564 datasets compared to only drugMatrix Codelink datasets from CREEDS) and involved 565 systematic tissue, species, and treatment regimen stratification to facilitate drug repositioning. 566 Our platform is also the only tool utilizing a gene network framework rather than direct gene 567 overlap approach. 568

17 bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

569 The use of tissue annotation with Brenda Tissue Ontology helps normalize organ labels and 570 improves comparability of datasets. The unique tissue- and species-specific analyses 571 implemented in PharmOmics allows for comprehensive molecular insight into the actions of 572 drug molecules in individual tissues and species. Our results support that different species 573 have unique drug responses in addition to shared features; therefore, drug responses obtained 574 in animal models require caution when translating to humans. This notion agrees with the 575 long-observed high failure rate of drug development that has primarily relied on preclinical 576 animal models and argues for greater consideration and understanding of inter-species 577 differences in drug actions. 578 579 In addition to tissue and species stratification, we also provide detailed dose/time-segregated 580 drug signatures, which can help better understand the dose- and time-dependent effects of 581 drugs through gene signature and pathway comparisons offered through our web server. By 582 contrast, the meta-analysis signatures capture the consistent genes and pathways across 583 treatment regimens, which likely represent core, dose/time-independent mechanisms, and 584 help address the sample size issue of individual datasets since the majority of drug treatment 585 datasets have n<=3/group. Dose/time-segregated signatures performed better than meta 586 signatures for both drug repositioning and toxicity prediction. However, meta signatures 587 showed better performance than CMAP, LINC1000, and CREEDs (Figure 5, 7, 8), and can 588 also significantly shorten the computation time in network-based repositioning applications. 589 For instance, computation using 1251 human meta signatures can be completed in 40 minutes, 590 whereas using ~14,000 dose/time-segregated signatures can take 4 hours. These estimates 591 will vary depending on input data size and server load. 592 593 Previous drug repositioning studies support the utility of a protein network-based approach 594 for drug repositioning. Here we show that combining the drug transcriptomic signatures in 595 PharmOmics with tissue-specific gene regulatory networks and gene signatures of diseases 596 can retrieve known therapeutic drugs, predict potential therapeutic avenues, and predict tissue 597 toxicity. Compared to other platforms, the use of tissue- and species-specific drug signatures 598 along with network biology is a unique strength of PharmOmics, which enables drug 599 prioritization based on network proximity rather than direct gene overlaps. We demonstrate 600 in various applications that network-based analysis had a superior performance to that of 601 gene overlap-based analysis. Moreover, the tissue-specific network connections between 602 drugs and diseases or toxicity offer molecular and mechanistic insights into the therapeutic or 603 toxic effects of drugs. For instance, fluvastatin showed different NAFLD overlapping 18 bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

604 patterns compared to aspirin, which inferred differences in disease repositioning depending 605 on different drug mechanisms. 606 607 In general, gene signatures of drugs reflect cascades of downstream events after drug 608 administration. The initial drug target(s) may or may not be captured by drug DEGs due to 609 the lack of dynamic information in the DEGs. Therefore, we explored if PharmOmics 610 signatures as well as signatures from other platforms can be used to retrieve drug targets 611 through integration with pathway or network information. Our results show that DEGs may 612 help inform on the pathways affected by the drugs but retrieving the direct targets can be 613 difficult. We caution the use of drug DEGs from any drug signature platform for direct target 614 identification. 615 616 There are several limitations in this study. First, our computational pipeline may not be able 617 to identify all of the drug datasets from GEO and ArrayExpress database. Variations in 618 annotations of drug names, sample size, definition of treatment vs control groups, and 619 tissue/cell line labeling across datasets make it challenging to design a fully automated 620 pipeline to curate drug signatures. It is therefore crucial for GEO and ArrayExpress 621 repositories to offer clear definitions and instructions for metadata generation in order to 622 standardize terms across datasets to facilitate future data reuse. Secondly, the coverage of 623 tissue, species, and treatment regimens across drugs is unbalanced, preventing a thorough 624 comparison across tissues, species, dosages, and treatment windows. We will continue to 625 refine the pipeline and update our PharmOmics database periodically to include more 626 datasets as they become available to increase the coverage of datasets and drug signatures. 627 Thirdly, the sample sizes for drug treatment studies tend to be small (majority with 628 n=3/group or less), which limit the statistical power and reliability of the drug signatures 629 when individual studies were analyzed. This is an intrinsic limitation of existing drug studies 630 and highlights the need for systematic efforts to increase sample sizes in such studies. To 631 mitigate this concern and reduce the reliance on individual studies, we implemented a meta- 632 analysis strategy to aggregate drug signatures from individual studies and derive meta 633 signatures. However, this strategy removes dosage- and time-dependent effects. We offer 634 both options in our database to mitigate sample size concerns through meta-analysis and 635 retain dose and time regimen information through dose/time-segregated analysis. Fourth, our 636 network-based applications are currently limited in the coverage of high-quality tissue 637 specific regulatory networks and computational power. We will continue to expand and 638 improve the tissue networks and computing environment in our web server. Lastly, 19 bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

639 systematic validation efforts are needed to substantiate the value of our platform. We utilized 640 both in silico performance assessments and in vivo experiments to validate our predictions in 641 limited settings. We mainly focused on liver related diseases with well-documented drugs 642 and disease signatures (hyperlipidemia, hyperuricemia, diabetes, and liver/kidney toxicity) to 643 benchmark the utilities of PharmOmics and experimentally validated two drugs predicted for 644 NAFLD. As with the other existing platforms such as CMAP and LINC1000, future 645 application studies and community-based validation efforts are necessary to assess the value 646 of PharmOmics. 647 648 Conclusion 649 We have established a new drug signature database, PharmOmics, across different species 650 and tissues, which captures the systems level in vivo activities of drug molecules. In addition, 651 we demonstrate the possible means to integrate these signatures with network biology to 652 address drug repositioning needs for disease treatment and to predict and characterize liver 653 and kidney injury. PharmOmics has the potential to complement other available drug 654 signature databases to accelerate drug development and toxicology research. Our 655 PharmOmics database and pipeline will be updated periodically to include newly available 656 datasets to increase the coverage of the drug signatures across tissues and species. It should 657 be noted that we aim to position PharmOmics as a data-driven compensatory tool in 658 hypothesis generation. Integration with known drug characteristics to select drug candidates 659 and design follow up experiments are still essential. 660 661 List of abbreviations 662 ADR adverse drug reactions 663 CTD comparative toxicogenomics database 664 KEGG Kyoto Encyclopedia of Genes and Genomes 665 DEG differential expressed genes 666 FDR false discovery rate 667 wKDA weighted key driver analysis 668 NAFLD non-alcoholic fatty liver disease 669 LDL low-density lipoprotein cholesterol 670 GWAS genome-wide association study 671 BN Bayesian gene regulatory network 672 ROC Receiver operating characteristic 673 HMGCR β-Hydroxy β-methylglutaryl-CoA receptor 20 bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

674 PPAR Peroxisome proliferator-activated receptor 675 GPCR G-protein coupled receptor 676 677 Declaration 678 Ethics approval and consent to participate 679 Not applicable. 680 Consent for publication 681 Not applicable. 682 Availability of data and materials 683 Indexed dataset catalog, pre-computed gene signatures and pre-computed pathway 684 enrichments for individual drugs are deposited to and accessible through the PharmOmics 685 web server (http://mergeomics.research.idre.ucla.edu/runpharmomics.php). We also 686 implemented functions for same-tissue between-species comparison and same-species 687 between-tissue comparison. Direct download of select drug signatures is also enabled. In 688 addition, network-based drug repositioning analysis and gene overlap-based drug 689 repositioning analysis using all drug signatures are available at 690 http://mergeomics.research.idre.ucla.edu/runpharmomics.php. 691 Competing interests 692 The authors declare that they have no competing interests. 693 Funding 694 YC was supported by UCLA Eureka fellowship and Burroughs Wellcome Fund Inter-school 695 Training Program in Chronic Diseases. DA was supported by NIH-NCI National Cancer 696 Institute (T32CA201160), UCLA dissertation year fellowship and UCLA Hyde fellowship. 697 XY was funded by NIH DK104363 and DK117850. GD was supported by the National 698 Institute of Environmental Health Sciences (T32ES015457) and the American Diabetes 699 Association Postdoctoral Fellowship (1-19-PDF-007-R). 700 Authors' contributions 701 YC curated and analyzed data, constructed database, and designed and conducted application 702 studies. GD, JY, and PC conducted validation experiments. JD, TXN, DH, and MB designed 703 and implemented the PharmOmics web server. DA provided support in data curation and 704 analysis. GA, JG, NZ, and PP assisted with data curation. YC, GD, JD, and XY wrote the 705 manuscript. XY designed and supervised the research. All authors contributed to manuscript 706 editing. 707 Acknowledgements

21 bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

708 We thank Dr. Patrick Allard at UCLA for constructive input on the manuscript. We thank Dr. 709 Avi Ma’ayan at Icahn School of Medicine at Mount Sinai for providing L1000FWD and 710 CREEDS data for comparison. 711 References: 712 1. Morgan S, Grootendorst P, Lexchin J, Cunningham C, Greyson D. The cost of drug 713 development: a systematic review. Health Policy. 2011 Apr;100(1):4–17. 714 2. Yu HWH. Bridging the translational gap: collaborative drug development and 715 dispelling the stigma of commercialization. Drug Discov Today. 2016;21(2):299–305. 716 3. Lin Z, Will Y. Evaluation of Drugs With Specific Organ Toxicities in Organ-Specific 717 Cell Lines. Toxicol Sci. 2011 Dec;126(1):114–27. 718 4. Denayer T, Stöhr T, Van Roy M. Animal models in translational medicine: Validation 719 and prediction. New Horizons Transl Med. 2014;2(1):5–11. 720 5. Cheng F, Desai RJ, Handy DE, Wang R, Schneeweiss S, Barabási A-L, et al. Network- 721 based approach to prediction and population-based validation of in silico drug 722 repurposing. Nat Commun [Internet]. 2018;9:2691. Available from: 723 http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6043492/ 724 6. Guney E, Menche J, Vidal M, Barábasi A-L. Network-based in silico drug efficacy 725 screening. Nat Commun [Internet]. 2016;7:10331. Available from: 726 http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4740350/ 727 7. Barabasi AL, Gulbahce N, Loscalzo J. Network medicine: a network-based approach 728 to human disease. Nat Rev Genet. 2010/12/18. 2011;12(1):56–68. 729 8. Subramanian A, Narayan R, Corsello SM, Peck DD, Natoli TE, Lu X, et al. A Next 730 Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles. Cell. 731 2017/12/02. 2017;171(6):1437-1452.e17. 732 9. Hall CJ, Wicker SM, Chien A-T, Tromp A, Lawrence LM, Sun X, et al. Repositioning 733 drugs for inflammatory disease – fishing for new anti-inflammatory agents. Dis Model 734 &amp; Mech [Internet]. 2014;7(9):1069. Available from: 735 http://dmm.biologists.org/content/7/9/1069.abstract 736 10. Corbett A, Pickett J, Burns A, Corcoran J, Dunnett SB, Edison P, et al. Drug 737 repositioning for Alzheimer's disease. Nat Rev Drug Discov [Internet]. 738 2012;11:833. Available from: http://dx.doi.org/10.1038/nrd3869 739 11. Godoy P, Bolt HM. Toxicogenomic-based approaches predicting liver toxicity in vitro. 740 Arch Toxicol [Internet]. 2012;86(8):1163–4. Available from: 741 https://doi.org/10.1007/s00204-012-0892-5 742 12. Low Y, Uehara T, Minowa Y, Yamada H, Ohno Y, Urushidani T, et al. Predicting 22 bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

743 Drug-Induced Hepatotoxicity Using QSAR and Toxicogenomics Approaches. Chem 744 Res Toxicol [Internet]. 2011;24(8):1251–62. Available from: 745 https://doi.org/10.1021/tx200148a 746 13. AbdulHameed MD, Ippolito DL, Stallings JD, Wallqvist A. Mining kidney 747 toxicogenomic data by using gene co-expression modules. BMC Genomics. 748 2016/10/12. 2016;17(1):790. 749 14. Toutain P-L, Ferran A, Bousquet-Melou A. Species differences in pharmacokinetics 750 and pharmacodynamics. Handb Exp Pharmacol. 2010;(199):19–48. 751 15. Wong CH, Siah KW, Lo AW. Estimation of clinical trial success rates and related 752 parameters. 2018 Jan; 753 16. Davis AP, Grondin CJ, Johnson RJ, Sciaky D, King BL, McMorran R, et al. The 754 Comparative Toxicogenomics Database: update 2017. Nucleic Acids Res [Internet]. 755 2017;45(Database issue):D972–8. Available from: 756 http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5210612/ 757 17. Wang Z, Clark NR, Ma’ayan A. Drug-induced adverse events prediction with the 758 LINCS L1000 data. Bioinformatics [Internet]. 2016;32(15):2338–45. Available from: 759 http://dx.doi.org/10.1093/bioinformatics/btw168 760 18. Wang Z, Monteiro CD, Jagodnik KM, Fernandez NF, Gundersen GW, Rouillard AD, 761 et al. Extraction and analysis of signatures from the Gene Expression Omnibus by the 762 crowd. Nat Commun [Internet]. 2016;7:12846. Available from: 763 http://dx.doi.org/10.1038/ncomms12846 764 19. Igarashi Y, Nakatsu N, Yamashita T, Ono A, Ohno Y, Urushidani T, et al. Open TG- 765 GATEs: a large-scale toxicogenomics database. Nucleic Acids Res. 2014/10/15. 766 2015;43(Database issue):D921-7. 767 20. Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, et al. NCBI 768 GEO: archive for functional genomics data sets—update. Nucleic Acids Res [Internet]. 769 2013;41(D1):D991–5. Available from: http://dx.doi.org/10.1093/nar/gks1193 770 21. Kolesnikov N, Hastings E, Keays M, Melnichuk O, Tang YA, Williams E, et al. 771 ArrayExpress update—simplifying data submissions. Nucleic Acids Res [Internet]. 772 2015;43(D1):D1113–6. Available from: http://dx.doi.org/10.1093/nar/gku1057 773 22. Gremse M, Chang A, Schomburg I, Grote A, Scheer M, Ebeling C, et al. The 774 BRENDA Tissue Ontology (BTO): the first all-integrating ontology of all organisms 775 for enzyme sources. Nucleic Acids Res [Internet]. 2011;39(Database issue):D507–13. 776 Available from: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3013802/ 777 23. Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. limma powers 23 bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

778 differential expression analyses for RNA-sequencing and microarray studies. Nucleic 779 Acids Res. 2015 Apr;43(7):e47–e47. 780 24. Clark NR, Hu KS, Feldmann AS, Kou Y, Chen EY, Duan Q, et al. The characteristic 781 direction: a geometrical approach to identify differentially expressed genes. BMC 782 Bioinformatics [Internet]. 2014;15:79. Available from: 783 https://www.ncbi.nlm.nih.gov/pubmed/24650281 784 25. Kolde R, Laur S, Adler P, Vilo J. Robust rank aggregation for gene list integration and 785 meta-analysis. Bioinformatics [Internet]. 2012;28(4):573–80. Available from: 786 https://www.ncbi.nlm.nih.gov/pubmed/22247279 787 26. Kuleshov M V, Jones MR, Rouillard AD, Fernandez NF, Duan Q, Wang Z, et al. 788 Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. 789 Nucleic Acids Res [Internet]. 2016;44(Web Server issue):W90–7. Available from: 790 http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4987924/ 791 27. Kanehisa M, Furumichi M, Tanabe M, Sato Y, Morishima K. KEGG: new 792 perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res [Internet]. 793 2017;45(Database issue):D353–61. Available from: 794 http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5210567/ 795 28. Expansion of the Gene Ontology knowledgebase and resources. Nucleic Acids Res. 796 2016/12/03. 2017;45(D1):D331-d338. 797 29. Draghici S, Khatri P, Tarca AL, Amin K, Done A, Voichita C, et al. A systems biology 798 approach for pathway level analysis. Genome Res. 2007/09/04. 2007 799 Oct;17(10):1537–45. 800 30. Voichita C, Ansari S, Draghici S. ROntoTools: R Onto-Tools suite. 2019. 801 31. Wang Z, Lachmann A, Keenan AB, Ma’ayan A. L1000FWD: fireworks visualization 802 of drug-induced transcriptomic signatures. Bioinformatics [Internet]. 2018 Jun 803 15;34(12):2150–2. Available from: http://dx.doi.org/10.1093/bioinformatics/bty060 804 32. Shu L, Zhao Y, Kurt Z, Byars SG, Tukiainen T, Kettunen J, et al. Mergeomics: 805 multidimensional data integration to identify pathogenic perturbations to biological 806 systems. BMC Genomics [Internet]. 2016;17(1):874. Available from: 807 http://www.ncbi.nlm.nih.gov/pubmed/27814671 808 33. Zhu K, Hu M, Yuan B, Liu J-X, Liu Y. Aspirin attenuates spontaneous recurrent 809 seizures in the chronically epileptic mice. Neurol Res. 2017 Aug;39(8):744–57. 810 34. Park HS, Jang JE, Ko MS, Woo SH, Kim BJ, Kim HS, et al. Statins Increase 811 Mitochondrial and Peroxisomal Fatty Acid Oxidation in the Liver and Prevent Non- 812 Alcoholic Steatohepatitis in Mice. Diabetes Metab J. 2016/04/21. 2016 Oct;40(5):376– 24 bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

813 85. 814 35. FOLCH J, LEES M, SLOANE STANLEY GH. A simple method for the isolation and 815 purification of total lipides from animal tissues. J Biol Chem. 1957 May;226(1):497– 816 509. 817 36. Wishart DS, Feunang YD, Guo AC, Lo EJ, Marcu A, Grant JR, et al. DrugBank 5.0: a 818 major update to the DrugBank database for 2018. Nucleic Acids Res. 2017/11/11. 819 2018;46(D1):D1074-d1082. 820 37. Szklarczyk D, Morris JH, Cook H, Kuhn M, Wyder S, Simonovic M, et al. The 821 STRING database in 2017: quality-controlled protein–protein association networks, 822 made broadly accessible. Nucleic Acids Res [Internet]. 2017;45(Database 823 issue):D362–8. Available from: 824 http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5210637/ 825 38. Shu L, Zhao Y, Kurt Z, Byars SG, Tukiainen T, Kettunen J, et al. Mergeomics: 826 multidimensional data integration to identify pathogenic perturbations to biological 827 systems. BMC Genomics [Internet]. 2016;17(1):874. Available from: 828 https://doi.org/10.1186/s12864-016-3198-9 829 39. Shu L, Chan KHK, Zhang G, Huan T, Kurt Z, Zhao Y, et al. Shared genetic regulatory 830 networks for cardiovascular disease and type 2 diabetes in multiple populations of 831 diverse ethnicities in the United States. PLoS Genet. 2017/09/29. 832 2017;13(9):e1007040. 833 40. Chella Krishnan K, Kurt Z, Barrere-Cain R, Sabir S, Das A, Floyd R, et al. Integration 834 of Multi-omics Data from Mouse Diversity Panel Highlights Mitochondrial 835 Dysfunction in Non-alcoholic Fatty Liver Disease. Cell Syst. 2018/01/24. 836 2018;6(1):103-115.e7. 837 41. Liss KHH, Finck BN. PPARs and nonalcoholic fatty liver disease. Biochimie. 2017 838 May;136:65–74. 839 42. Abd El-Haleim EA, Bahgat AK, Saleh S. Resveratrol and fenofibrate ameliorate 840 fructose-induced nonalcoholic steatohepatitis by modulation of genes expression. 841 World J Gastroenterol. 2016 Mar;22(10):2931–48. 842 43. Ratziu V, Charlotte F, Bernhardt C, Giral P, Halbron M, Lenaour G, et al. Long-term 843 efficacy of rosiglitazone in nonalcoholic steatohepatitis: results of the fatty liver 844 improvement by rosiglitazone therapy (FLIRT 2) extension trial. Hepatology. 2010 845 Feb;51(2):445–53. 846 44. Ratziu V, Giral P, Jacqueminet S, Charlotte F, Hartemann-Heurtier A, Serfaty L, et al. 847 Rosiglitazone for nonalcoholic steatohepatitis: one-year results of the randomized 25 bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

848 placebo-controlled Fatty Liver Improvement with Rosiglitazone Therapy (FLIRT) 849 Trial. Gastroenterology. 2008 Jul;135(1):100–10. 850 45. Zhang N, Lu Y, Shen X, Bao Y, Cheng J, Chen L, et al. Fenofibrate treatment 851 attenuated chronic endoplasmic reticulum stress in the liver of nonalcoholic fatty liver 852 disease mice. Pharmacology. 2015;95(3–4):173–80. 853 46. Laurin J, Lindor KD, Crippin JS, Gossard A, Gores GJ, Ludwig J, et al. 854 Ursodeoxycholic acid or clofibrate in the treatment of non-alcohol-induced 855 steatohepatitis: a pilot study. Hepatology. 1996 Jun;23(6):1464–7. 856 47. Fernandez-Miranda C, Perez-Carreras M, Colina F, Lopez-Alonso G, Vargas C, Solis- 857 Herruzo JA. A pilot trial of fenofibrate for the treatment of non-alcoholic fatty liver 858 disease. Dig Liver Dis. 2008 Mar;40(3):200–5. 859 48. Hertz R, Bar-Tana J. Peroxisome proliferator-activated receptor (PPAR) alpha 860 activation and its consequences in humans. Toxicol Lett. 1998 Dec;102–103:85–90. 861 49. Kawaguchi K, Sakaida I, Tsuchiya M, Omori K, Takami T, Okita K. Pioglitazone 862 prevents hepatic steatosis, fibrosis, and enzyme-altered lesions in rat liver cirrhosis 863 induced by a choline-deficient L-amino acid-defined diet. Biochem Biophys Res 864 Commun. 2004 Feb;315(1):187–95. 865 50. Nan Y-M, Fu N, Wu W-J, Liang B-L, Wang R-Q, Zhao S-X, et al. Rosiglitazone 866 prevents nutritional fibrosis and steatohepatitis in mice. Scand J Gastroenterol. 867 2009;44(3):358–65. 868 51. Neuschwander-Tetri BA, Brunt EM, Wehmeier KR, Oliver D, Bacon BR. Improved 869 nonalcoholic steatohepatitis after 48 weeks of treatment with the PPAR-gamma ligand 870 rosiglitazone. Hepatology. 2003 Oct;38(4):1008–17. 871 52. Torres DM, Jones FJ, Shaw JC, Williams CD, Ward JA, Harrison SA. Rosiglitazone 872 versus rosiglitazone and metformin versus rosiglitazone and losartan in the treatment 873 of nonalcoholic steatohepatitis in humans: a 12-month randomized, prospective, open- 874 label trial. Hepatology. 2011 Nov;54(5):1631–9. 875 53. Bravo M, Raurell I, Hide D, Fernández-Iglesias A, Gil M, Barberá A, et al. Restoration 876 of liver sinusoidal cell phenotypes by statins improves portal hypertension and 877 histology in rats with NASH. Sci Rep. 2019;9(1):20183. 878 54. Pastori D, Polimeni L, Baratta F, Pani A, Del Ben M, Angelico F. The efficacy and 879 safety of statins for the treatment of non-alcoholic fatty liver disease. Dig Liver Dis. 880 2015;47(1):4–11. 881 55. Sigler MA, Congdon L, Edwards KL. An Evidence-Based Review of Statin Use in 882 Patients With Nonalcoholic Fatty Liver Disease. Clin Med Insights Gastroenterol. 26 bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

883 2018 Jul;11:1179552218787502–1179552218787502. 884 56. Simon TG, Henson J, Osganian S, Masia R, Chan AT, Chung RT, et al. Daily Aspirin 885 Use Associated With Reduced Risk For Fibrosis Progression In Patients With 886 Nonalcoholic Fatty Liver Disease. Clin Gastroenterol Hepatol. 2019 May; 887 57. Moonen L, Geryl H, D’Haese PC, Vervaet BA. Short-term dexamethasone treatment 888 transiently, but not permanently, attenuates fibrosis after acute-to-chronic kidney 889 injury. BMC Nephrol. 2018;19(1):343. 890 58. Tsunoda R, Usui J, Hoshino J, Fujii T, Suzuki S, Takaichi K, et al. Corticosteroids 891 pulse therapy and oral corticosteroids therapy for IgA nephropathy patients with 892 advanced chronic kidney disease: results of a multicenter, large-scale, long-term 893 observational cohort study. BMC Nephrol. 2018 Sep;19(1):222. 894 59. Westerberg P-A, Sterner G, Ljunggren Ö, Isaksson E, Elvarson F, Dezfoolian H, et al. 895 High doses of cholecalciferol alleviate the progression of hyperparathyroidism in 896 patients with CKD Stages 3–4: results of a 12-week double-blind, randomized, 897 controlled study. Nephrol Dial Transplant. 2018 Mar;33(3):466–71. 898 60. Mankhey RW, Bhatti F, Maric C. 17β-Estradiol replacement improves renal function 899 and pathology associated with diabetic nephropathy. Am J Physiol Physiol. 2005 900 Feb;288(2):F399–405. 901 61. Suzuki H, Kondo K. Chronic kidney disease in postmenopausal women. Hypertens 902 Res. 2012;35(2):142–7. 903 62. Conway JR, Lex A, Gehlenborg N. UpSetR: an R package for the visualization of 904 intersecting sets and their properties. Bioinformatics [Internet]. 2017;33(18):2938–40. 905 Available from: http://dx.doi.org/10.1093/bioinformatics/btx364 906

27 certified bypeerreview)istheauthor/funder,whohasgrantedbioRxivalicensetodisplaypreprintinperpetuity.Itmadeavailableunder bioRxiv preprint

907 Tables doi:

908 Table 1. Prediction percentile of FDA approved antihyperlipidemic drug based on hyperlipidemia signatures from MergeOmics (MO) pipeline and https://doi.org/10.1101/837773 909 CTD database across different platforms tested. HEPG2 results from both L1000 and CMAP were retrieved for tissue specificity comparison. PharmOmics PharmOmics PharmOmics Platform dose/time seg dose/time seg CREEDS CMAP CMAP HEPG2 L1000 L1000 HEPG2 meta network Jaccard Disease gene MO CTD MO CTD MO CTD MO CTD MO CTD MO CTD MO CTD MO CTD signature Atorvastatin 0.951 0.794 0.981 0.957 0.498 0.316 0.989 0.82 0.913 0.164 0.414 0.31 0.962 0.668 0.405 0.307

Bezafibrate 0.856 0.995 0.901 0.982 0.981 0.932 0.571 0.95 0.332 0.561 0.439 0.915 0.394 0.755 NA NA a CC-BY-NC-ND 4.0Internationallicense ; Cerivastatin 0.989 0.848 0.995 0.962 0.798 0.719 0.986 0.836 0.879 0.516 NA NA 0.967 0.761 NA NA this versionpostedMarch30,2021.

Clofibrate 0.965 0.97 0.802 0.927 0.951 0.992 0.737 0.986 0.196 0.291 0.153 0.433 0.31 0.615 NA NA

Clofibric acid 0.93 0.58 0.949 0.892 NA NA NA NA NA NA NA NA NA NA NA NA

Fenofibrate 0.984 0.986 0.908 0.883 0.954 0.954 0.797 0.943 0.121 0.108 0.229 0.201 NA NA NA NA

Fluvastatin 1 0.997 1.000 0.924 0.97 0.985 1 0.815 0.905 0.963 0.807 0.118 0.958 0.514 0.513 0.327

Gemfibrozil 0.992 0.962 0.984 0.873 0.787 0.844 0.9 0.712 0.677 0.612 NA NA 0.363 0.591 NA NA

Lovastatin 0.995 0.984 0.986 0.986 0.905 0.43 0.993 0.632 0.972 0.084 0.528 0.346 0.992 0.979 0.415 0.765

Nafenopin 0.726 0.943 0.472 0.864 NA NA 0.431 0.712 NA NA NA NA NA NA NA NA

Niacin 0.192 0.873 0.821 0.309 0.137 0.711 0.719 0.343 0.671 0.171 0.606 0.069 0.107 0.307 NA NA

Pravastatin 0.894 0.339 0.911 0.862 NA NA 0.979 0.854 0.829 0.669 0.727 0.934 0.592 0.717 NA NA The copyrightholderforthispreprint(whichwasnot

. Simvastatin 0.949 0.935 0.856 0.992 0.916 0.909 0.996 0.9 0.972 0.951 0.844 0.573 0.987 0.843 0.595 0.425

Ciprofibrate NA NA NA NA NA NA NA NA 0.685 0.998 0.84 0.288 0.292 0.272 NA NA

Ezetimibe NA NA NA NA NA NA NA NA 0.905 0.982 0.514 0.757 0.657 0.269 0.366 0.101

Probucol NA NA NA NA NA NA NA NA 0.552 0.115 0.021 0.696 0.018 0.529 NA NA

Rosuvastatin NA NA NA NA NA NA NA NA 0.913 0.056 0.855 0.238 0.905 0.464 NA NA

Median 0.951 0.943 0.911 0.924 0.911 0.876 0.94 0.828 0.829 0.516 0.528 0.346 0.624 0.603 0.415 0.327

Mean 0.879 0.862 0.890 0.878 0.79 0.779 0.841 0.792 0.701 0.483 0.537 0.452 0.607 0.592 0.459 0.385

28 bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

910 Figure legends 911 Figure 1. PharmOmics data processing pipeline. FDA approved drugs based on KEGG 912 database were searched against GEO, ArrayExpress, TG-GATEs, and DrugMatrix data 913 repositories. Additional experimental drugs and chemicals from TG-GATEs and DrugMatrix 914 were also included. Only datasets with drug treatment and control samples were retrieved. 915 Datasets were first annotated with tissue and species information, followed by retrieval of 916 dose/time-segregated or meta-analysis drug signatures using two different methods. 917 Dose/time-segregated signatures were retrieved from individual datasets using LIMMA. 918 Meta signatures across datasets of the same drugs were obtained by first applying GeoDE to 919 obtain a ranked gene list for each treatment experiment, followed by meta-analysis using the 920 Robust Rank Aggregation method. These signatures were used to conduct drug repositioning 921 analysis and hepatotoxicity/nephrotoxicity prediction based on direct gene overlaps or a gene 922 network-based approach. 923 924 Figure 2. Summary of available datasets based on data sources, tissues, and species. Y- 925 axis indicates unique dataset counts, and X-axis indicates (A) tissue and data resources, (B) 926 tissue and species, and (C) data resources and species. 927 928 Figure 3. PharmOmics web server. The web server hosts drug signature and pathway 929 queries, between-tissue and between-species drug signature comparisons, and network-based 930 and gene overlap-based drug repositioning. Users are able to query, download, and perform 931 drug repositioning using all species- and tissue-specific meta and dose/time-segregated 932 signatures. Interactive results tables and network visualizations are displayed on the website 933 and available for download. 934 935 Figure 4. User interface of network drug repositioning web tool using sample 936 hyperlipidemia gene set and sample mouse Bayesian gene regulatory network. (A) 937 Inputs to network drug repositioning includes i) signature type to query (meta-analyzed, 938 dose/time-segregated with top 500 genes per signature, or dose/time-segregated with all 939 genes), ii) network (custom upload or select a sample network), iii) species (relating to the 940 species of the network being used), and iv) genes. In this case we choose dose/time- 941 segregated signatures using top 500 genes, a sample liver network, mouse/rat species, and the 942 sample hyperlipidemia gene set (loaded from ‘Add sample genes’). If human gene symbols 943 are provided with the ‘Mouse/Rat’ species selection, the genes will be converted to mouse/rat 944 symbols. (B) After the job is complete, the results file is displayed on the website and 29 bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

945 available for download. A subset of the drug network containing the drug genes that are first 946 neighbors to input genes and all input genes can be visualized using the “Display Network” 947 button which will load an interactive display of the subnetwork topology. The oxymetholone 948 drug signature in rat liver is a top hit, and the drug network is shown on the right. Additional 949 data in the downloadable results file include the genes that are both a drug gene and an input 950 gene in the network, drug genes that are directly connected (first neighbor) to input genes, 951 and input genes directly connected to drug genes. 952 953 Figure 5. Drug repositioning for hyperlipidemia. AUC plots for network-based 954 repositioning and gene overlap-based repositioning in identifying anti-hyperlipidemia drugs 955 against other drugs using (A) Mergeomics hyperlipidemia signature or (B) CTD 956 hyperlipidemia signature. Comparison of drug repositioning performance between 957 PharmOmics network-based approach with CREEDS (using the “combined score” generated 958 by the enrichment analysis tool implemented in Enrichr), L1000, and CMAP query system 959 using (C) Mergeomics hyperlipidemia signature and (D) CTD hyperlipidemia signature. For 960 drugs with multiple datasets with different doses and treatment times, only the best 961 performing signature was used. (E) Drug-disease subnetwork of Mergeomics hyperlipidemia 962 signature (red) and lovastatin signature (blue) showing first neighbor (direct) connections. (F) 963 Drug-disease subnetwork Mergeomics hyperlipidemia signature (red) and oxymetholone 964 signature (blue) showing first neighbor connections. Wilcoxon signed rank test was used to 965 calculate significance between gene overlap/network z-scores between groups. *, **, *** 966 indicates p < 0.05, p < 0.01 and p < 0.001 repectively. 967 968 Figure 6. In vivo validation of predicted drugs Fluvastatin and Aspirin on preventing 969 NAFLD phenotypes in C57BL/6J mice. (A and B) Time course of body weight gain in 970 mice treated with fluvastatin (A) and aspirin (B) over 10 weeks. (C and D) Time course of fat 971 mass and muscle mass ratio (adiposity) in mice treated with fluvastatin (C) and aspirin (D) 972 over 10 weeks. (A-D) Data were analyzed by two-way ANOVA followed by Sidak post-hoc 973 analysis to examine treatment effects at individual time points. P value < 0.05 was considered 974 significant and is denoted by an asterisk (*). (E and F) Quantification of lipids in the liver of 975 mice on fluvastatin (E) and aspirin (F) treatment for 10 weeks. Triglyceride (TG), Total 976 Cholesterol (TC), Unesterified Cholesterol (UC), Phospholipid (PL). (D and E) Data were 977 analyzed using two-sided Student's t-test. P value < 0.05 was considered significant and is 978 denoted by an asterisk (*). Sample size n=7-9/group. High fat high sucrose (HFHS) 979 group (NAFLD); HFHS with fluvastatin (NAFLD + Flu); HFHS with aspirin (NAFLD + 30 bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

980 Asp). (G-H) Gene network view of fluvastatin gene signatures overlapping with NAFLD 981 disease signatures (G) Gene network view of aspirin gene signatures overlapping with 982 NAFLD disease signatures (H). 983 984 Figure 7. Utility of PharmOmics drug signatures in hepatotoxicity prediction based on 985 matching between PharmOmics drug signatures and hepatotoxicity signatures of drug 986 induced liver injury (DILI) curated from comparative toxicogenomics database (CTD). 987 (A) Boxplots of Jaccard score-based hepatotoxicity ranking (left) and network-based 988 hepatoxicity ranking (right) by PharmOmics, across four categories of liver injury 989 histological severity defined by the independent TG-GATEs database (x-axis). PharmOmics 990 hepatotoxicity scores are higher for more severe liver injury categories. (B) PharmOmics 991 hepatotoxicity prediction scores based on gene signatures of higher drug doses correspond to 992 more severe liver injury categories defined by TG-GATES across three DILI concern 993 categories (“no”, “less”, “most”) defined by FDA. (C) Boxplots of network-based 994 hepatoxicity scores show increased scores at higher doses across three FDA DILI concern 995 categories. (D) ROC curves comparing PharmOmics with other tools in predicting 996 hepatotoxic drugs from the FDA DILI drug database. For PharmOmics, three sets of tests 997 were performed, where dose/time-segregated drug signatures, meta signatures, or a 998 hepatotoxicity subnetwork was used. (E) Liver hepatotoxicity network based on CTD 999 hepatotoxicity genes and its overlap with drug signatures of 4 of the top 5 predicted drugs by 1000 PharmOmics which had >50 signature genes. Phenobarbital was among the top 5 drugs but 1001 was not included in the figure due to its small DEG size. Colors of the network nodes denote 1002 the different drugs targeting the genes. The top 3 predictive subnetworks are depicted in red 1003 (D). ANOVA test followed by post-hoc analysis was used for statistics in A and C. *, **, *** 1004 indicates p < 0.05, p < 0.01 and p < 0.001 respectively. Boxplots show interquartile range 1005 (IQR) and median values (line inside the box). IQR was defined as between 25th (Q1) and 1006 75th (Q3) percentile. The upper and lower bars indicate the points within Q3 + 1.5*IQR and 1007 Q1 – 1.5*IQR, respectively. 1008 1009 Figure 8. Utility of PharmOmics drug signatures in nephrotoxicity prediction based on 1010 matching between PharmOmics drug signatures and nephrotoxicity signatures of 1011 chemical induced acute kidney injury (AKI) or chronic kidney disease (CKD) from 1012 comparative toxicogenomics database (CTD). (A) Boxplots of Jaccard score-based 1013 nephrotoxicity ranking (left) and network-based nephrotoxicity ranking (right) by 1014 PharmOmics, based on matching with AKI (top) or CKD (bottom) nephrotoxicity genes from 31 bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1015 CTD, across four categories of kidney histological severity defined by the independent TG- 1016 GATEs database. Network-based nephrotoxicity prediction by PharmOmics showed a 1017 positive relationship between nephrotoxicity scores by PharmOmics and kidney histology 1018 severity defined by TG-GATEs. (B) PharmOmics nephrotoxicity prediction scores based on 1019 gene signatures of higher drug doses correspond to more severe kidney injury categories 1020 defined by TG-GATES, segregated by nephrotoxic labels defined by DrugBank. (C) Boxplot 1021 of network-based nephrotoxicity scores, using CKD nephrotoxicity genes against 1022 PharmOmics drug signatures, did not show significant dose-dependent trend in non- 1023 nephrotoxic drugs or nephrotoxic drugs defined by DrugBank. The low dose treatment group 1024 for the nephrotoxic drugs did not contain drug signatures with more than 10 genes and 1025 therefore the scores were not plotted. (D) ROC curve comparing the performance of 1026 PharmOmics with other tools in predicting nephrotoxic agents in DrugBank. For 1027 PharmOmics, two sets of tests were performed, where either dose/time-segregated drug 1028 signatures or meta signatures was used. (E) Kidney nephrotoxicity network based on CTD 1029 nephrotoxicity genes and the network overlap with drug signatures of top 5 drugs predicted 1030 by PharmOmics. Colors of the network nodes denote the various drugs targeting the genes. 1031 ANOVA test with post-hoc analysis was used for statistics in A and C. *, **, *** indicates p 1032 < 0.05, p < 0.01 and p < 0.001 respectively. Boxplots show interquartile range (IQR) and 1033 median values (line inside the box), with IQR defined as between 25th (Q1) and 75th (Q3) 1034 percentile. 1035 1036 Figure 9. Cross-tissue and cross-species comparisons of drug signatures in 1037 PharmOmics. (A) Distribution of drug signature overlap percentages between tissue pairs in 1038 matching species from PharmOmics. Arrow points to the pairs of tissues for drugs with high 1039 overlap in gene signatures. (B) Upset plot of cross-tissue comparison for atorvastatin 1040 signatures genes. Y-axis indicates number of genes. (C) Upset plot of cross-tissue comparison 1041 for pathways enriched in atorvastatin signatures. Y-axis indicates number of pathways. (D) 1042 Distribution of drug signature overlap percentages between pairs of species for matching 1043 tissues from PharmOmics. Arrow points to the species pair with high gene signature overlap 1044 for a matching drug. (E) Upset plot of cross-species comparison for rosiglitazone liver gene 1045 signatures. (F) Upset plot of cross-species comparison for pathways enriched in rosiglitazone 1046 liver signatures. Pairs of tissues with shared drug signature genes or pathways are connected 1047 with black vertical lines in the bottom portion of the Upset plots. 1048

32 bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1049 Figures 1050 Figure 1.

1051 bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1052 Figure 2.

1053

34 bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1054 Figure 3.

1055 1056 bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1057 Figure 4. A

B

Input gene

Drug gene 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 36 bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1073 Figure 5.

A B 1.0 1.0 90% *** 89% ***

0.8 89% *** 0.8 87% ***

0.6 0.6

0.4 0.4 True positive rate positive True True positive rate positive True 0.2 0.2 Network Network Gene overlap Gene overlap 0.0 0.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 False positive rate False positive rate

C D 1.0 1.0 89%*** 87%***

0.8 85% 0.8 *** 80% 73%** *** 69%* 0.6 0.6 70%**

59% 0.4 PharmOmics dose/ 0.4 PharmOmics dose/ e time segregated

time segregated r True positive rate positive True True positive rate positive True PharmOmics meta PharmOmics meta 61%* 49% 0.2 CREEDS 0.2 CREEDS L1000 L1000 0.0 CMAP 0.0 CMAP 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 False positive rate False positive rate E F

Fatty acid Peroxisome, PPAR signaling, Insulin signaling, metabolism and Fatty acid Fatty acid Fatty acid AMPK, Insulin elongation, elongation, biosynthesis, and PPAR Glycerolipid Cholesterol Pyruvate signaling metabolism metabolism metabolism

Steroid and Steroid and terpenoid terpenoid backbone backbone biosynthesis, biosynthesis, Propanoate Glycolysis Drug gene signature metabolism Disease gene signature Oxymetholone 1074 Lovastatin 1075 1076 1077 1078 1079 37 bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1080 Figure 6. A B 20 20 Time x Drug = 0.0002 Time x Drug = 0.0842 Time < 0.0001 Time < 0.0001 * * * * * * 15 Drug < 0.0001 15 Drug < 0.0001 * * * * 10 10

Weight gain gain Weight (g) 5

Weight gain (g) 5 NAFLD NAFLD NAFLD + Flu NAFLD + Asp 0 0 012345678910 012345678910 Weeks Weeks C D. 0.5 0.5 Time x Drug = 0.0068 Time x Drug = 0.9646 Time < 0.0001 * * * Time < 0.0001 0.4 Drug < 0.0001 0.4 * * Drug = 0.0157 0.3 * 0.3

0.2 0.2 Adiposity Adiposity

0.1 0.1 NAFLD NAFLD NAFLD + Flu NAFLD + Asp 0.0 0.0 012345678910 012345678910 E Weeks F Weeks

100 100 80 * NAFLD 80 * NAFLD NAFLD + Flu 60 60 NAFLD + Asp 40 40 20 20 5 5 4 4 3 3 2 2 1

Liver lipid (mg/g tissue) (mg/g lipid Liver 1 Liver Lipid (mg/g tissue) 0 0 TG TC UC PL TG TC UC PL G H

Drug gene signature

Disease gene signature 1081 1082 38 bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1083 Figure 7.

1084 1085 1086 1087 1088 39 bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1089 Figure 8. A B 1.00 1.00

e DrugBank nephrotoxic label l i t n

e 0.75 0.75 c r

e Non−nephrotoxic Nephrotoxic p

e r 0.50 1.00 o 0.50 c s

d r

a 0.25 c AKI c 0.25 a 0.75 J Network score percentile TGGATEs 0.00 minimal slight moderate severe minimal slight moderate severe histological histology severity histology severity severity 1.2 0.50 1.00 *** severe

e ***

l moderate i

t ***

n 0.9 * slight e

c 0.75 minimal r

e 0.25 p

e

r 0.6 o

c 0.50 s

d r

a 0.3 0.00 c CKD c

a 0.25 Control Low Middle High Control Low Middle High J Network score percentile score Network Dose level 0.0 Dose Level minimal slight moderate severe minimal slight moderate severe histology severity histology severity

CDDrugBank nephrotoxic label Non−nephrotoxic Nephrotoxic 1.00 e l i t n e

c 0.75 r e p

e r o

c 0.50 s

k r o w t

e 0.25 Network score percentile N

0.00 Low Middle High Low Middle High DoseDose Levellevel E P2ry13 9030625A04Rik Ndufa11 Socs6 H2-Eb2 Eea1 1600010M07Rik Itgb2 Eif5a 2310076G05Rik Cybb Plg Alox5ap BC052328 Cd38 Cox8a Tmem100 Mrc1 Cd44 Lyz2 Crp Itih3 1600027J07Rik Cyba Cyp2d22 Fgl2 Slc15a3 Tlr9 Atp5a1 Tcn2 AI429214 Cxcl10 Ces3a Mat1a Cntn1 Ccl7 Fyb 1190003J15Rik Tnfsf13b Dmp1 A830009L08Rik Lgmn Ccl2 Ngp Pthr1 Ncf4 Lonrf1 Cd48 Gimap8 Tmem184b Hamp Ttc39b Gria3 C3 Akr1c6 Serpinc1 Cyp2a12 Msr1 Vcam1 Krtap11-1 Cyp2c50 Hpx Slc34a3 Tusc3 March8 Rasl10b Cxcl1 Vmn2r-ps129 Tns1 Ddit4 Ces3 Apoa5 Arid5a Mug1 Aga Cyp3a11 Tdo2 D3Bwg0562e Rai2 Ppara 2810001A02Rik Hbegf Art2a-ps Adarb2 Gcm1 Nap1l5 Alb Cyp2c29 Tmem97 Rab38 Per1 Bik Atf3 Cyp2c37 Pon1 Mpv17l 4930539E08Rik Rnf44 Gaa Tmco6 Hpd Svopl Serpine1 Cpt1b Mug2 MMT00043382 Mep1b Egr2 Itga2b Nell2 C730036D15Rik Serpina3g 2210415I11Rik Fgb Slc38a4 Tat Ace Hsd11b1 Cyp2c39 H2-Q10 Apaf1 Ptger4Sema3g MMS010144669 Acy3 Gys2 C8a Slc13a3 Osteopontin_RBM_log_sex_W10 Slc22a7 Fmo3 Sele Wt1 Cyp24a1 Prr13 Cat Aass Ptpro Crb2 F2 Cyp4f14 Vps8 Gsto1 Ocstamp P2ry4 Ldhd Cyp2j5 Gsta3 Rcn1 Pparg Igfbp6 Alox15 Hsd17b2 Rdh7 Ngf Nox4 4930458L03Rik Retnla Cyp2c67 Tmco3 Mir342 Mgam Slc5a2 Sncg Irf5 Mbl1 Slc22a2 9530053A07Rik Slc47a1 Car3 Cyp2d13 Sall3 Retn F10 Ncf1 Itih1 Serpina1b Spp2 Murc Mast1 Spc25 Serpinb1a F13b Cfd Dusp27 Fam26d Cidec Cyp2c69 Mttp 4933406P04Rik Mcc mCT114231 A530050D06Rik XM_359153 Cidea Ebf2 Hid1 Orm1 Cyp2c40 Fabp1 Cyp2b9 Acaca Bex4 EG668346 Ube2o Cpt1a BC021381 Hk2 MMT00046357 Cyp2c68Ube2ql1 Hmgcs2 Apob Asgr1 2210414L08Rik Nat8l BC054059 Otop1 Adipoq XM_122526 MMT00038636 Acacb Scgb3a2 MMT00004002 Pphln1 Cyp2g1 Ldlr NM_020568 F12 Erlin1 Acot1 Cyp2d10 MMS010183492 Fasn Dyrk1b Mlxipl Plin Angptl3 Actn2 Kcnk3 Lipa Hp Prr7 Cyp3a44 Bhlhb2 Cfh Aacs Fabp4 1200009F10Rik Gm38426 Mmp27 Postn Sorcs2 Ucp1 Slc22a3 Gulo Srebf2 0610043K17Rik 2900060K15Rik MMT00054030 Hamp2 Ell3 Zp3r Thrsp 2900062L11Rik Aoc3 MMT00059796 Ptprf Gstt1 Pglyrp2 Mtss1 Clec14a Pnldc1 Col1a1 XM_134904 BB320549 Usp26 Acsm2 MMT00046076 Kcnc3 Gtf2a1l Igf1 Avpr1b Dnm1 Tfap2d XM_138349 ENSMUST00000068602 Fbxw21 Gdf5 Slc5a9 Olfr566 Omt2b Hyal5 BC042782 Hyi Xkr8 Spp1 Polr3b Mpp1 Wdr49 MMT00001755 Amdhd1 4933402J07Rik MMT00015104 Gja6 Dok2 2810422O20Rik 1100001E04Rik Adra1b Cyp2c54 MMT00020913 Gm1322 NM_009899 Hpcal4 Car15 Guca2b Padi4 1700041C02Rik 1500002O10Rik Olfr1426 Ces1b Arg1 Prss35 Npvf Mrps15 Ahsg Tmem158 Dusp1 Olfr75-ps1 Krt18 Cd14 Agtr2 Gja1 Lcn2 Thrap3 OTTMUSG00000008561 Ect2 3110049J23Rik Olfr1023 Olfr1107 Olfr1316 Frzb Hpdl Cfhr1 Ptgs2 Dfnb59 Plin5 Btnl10 Slc2a6 Atg4c Fbxo48 Il6 Tmsb10 Cpxm2 Cgref1 Klk4 Zdhhc24 Foxq1 4930554I06Rik MMT00073846 Opn1mw Socs2 Ndnf Camk1d Sema4a Hmgcr D330045A20Rik Keg1 Gem Slc38a2 A730054J21Rik Crlf1 Myo5a Cyp2c66 Olfr727 Mir547 Asxl3 Il3ra Mcmdc2 Timp1 Spsb3 Chadl Dsg1c Adipor1 Gclc Cts3 Macrod2 Rnf24 Cox6a2 Slc7a13 Spin2 Il2rb Bsx Dsg4 Ogdh Dgkz C130026L21Rik Chst7 Kat2b Spag9 Ctsm Sult1c2 Cbr1 Usp38 5730409E04Rik Mt1 Matr3 Tbx21 Vwa2 Mthfr Gucy1b3 Cxcr6 Hacl1 Rbck1 Itgbl1 MMT00014797 Olfml1 Rprd2 Pdcd1 Cyp2c65 Itgam Mt2 Baat Abca1 Tnnc1 Tbc1d10c Olfr613 Pcolce2 Hoxd12 Cyp2c38 Chst9 Il15 Krtap14 AI931714 Cntnap2 Olfr987 Grid1 Nrsn2 Alg8 Cxcl2 Picalm Cgn Hmox1 Olfr651 Il19 AV306604 Cyp2b10 Cryaa Foxb1 EG268795 Zfp385b Cyp2b13 Rad54l Tomt Tlr4 Acat1 Irx3 Ppfia4 Klk1b5 Olfr1030 4930448I18Rik Star Zfp14 Cyp2b23 Dnmt3b B3gnt2 Gm15987 Ccr4 Zfp605 Nlgn2 Scarb1 Fra10ac1 Srebf1 Rasl11b Tnf Robo1 Mir24-1 Cyp2b19 Ace3 Mtmr7 Abcg1 Cpt1c Akr1b7 Epo G3bp1 Panx3 Derl2 Vstm2l Tsnaxip1 Cpne2 Muc1 Vasn

Ibuprofen 1090 Dexamethasone Naproxen Cholecalciferol Beta-estradiol 1091 1092 40 bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1093 Figure 9.

A D

120 500 Fludrocortisone_Rat kidney vs liver 100 400 Etidronate_Rat cardiovascular vs muscular tissue 80 Dexamethasone_liver Rat vs Mouse 300 Losatran Rat cardiovascular vs muscular tissue 60 200 40 Frequency Frequency

100 20

0 0 0.00 0.05 0.10 0.15 0.20 0.25 0.00 0.05 0.10 0.15 gene overlap percentage gene overlap percentage

BE 400 360 1000 941 344

300 CPT1C, AKR1B1, VNN1, ACSM3, CD36, 750 CPT1A, PDK4, ZNF669, ADH1C 226 TSC22D3, THBS1 200 500 478 Gene Size Gene Size Gene 259 100 250 173

16 12 4 2 24 10 9 0 0 hematopoietic system Human liver Mouse urogenital system Rat 300400 100200 0 1200 900 600300 0 Set Size Set Size

Steroid biosynthesis C PPAR signaling F Terpenoid backbone biosynthesis 100 129 Jak-Stat signaling 89 Tol like receptor signaling 75 PPAR signaling pathway 100 92 inflammatory response Fatty acid, triacylglycerol metabolism chemotaxis ketone body metabolism TNF signaling 50

50 GPCR ligand binding Pathway SizePathway Pathway SizePathway 25 14 15 11 9 5 1 3 0 0 Urogenital system Human hematopoeitic system Mouse liver Rat 100150 50 0 306090 0 Set Size Set Size 1094

41 bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

GEO FDA approved drugs ArrayExpress Brenda Tissue experimental drugs OpenTG-GATEs Ontology chemicals DrugMatrix

Normalized Target chemicals Tissue Species microarray/RNAseq (total 941) annotation annotation datasets (total 13382) Limma characteristic direction/ dose/time segregated analysis Robust rank aggregation

Dose and time segregated Meta drug signatures drug signatures (total 4349) (total 14326)

Up Down Up Down Up Down Up Down Gene A Gene X Gene A'' Gene X'' Gene A Gene X Gene A'' Gene X'' Gene B Gene Y Gene B'' Gene Y'' Gene B Gene Y Gene B'' Gene Y'' Gene C Gene Z Gene C'' Gene Z'' Gene C Gene Z Gene C'' Gene Z'' Gene D Gene W Gene D'' Gene W'' Gene D Gene W Gene D'' Gene W''

Application 1 Application 2 Gene regulatory Drug repositioning ADR prediction network integration Drug-targeted disease Drug-induced injury predictions: predictions: Drug Disease: ex) Hyperlipidemia, NAFLD disease genes drug genes Experimental validations A 400 ArrayExpress drugMatrix 300 GEO TG−GATEs 200

Unique chemical count 100 Slgosc b_r_qcr amslr 0 eye liver ovary breast kidney pancreas integument primary cell adipose tissue skeletal system skeletal immune system immune nervous system endocrine gland muscular system muscular connective tissue connective urogenital system respiratory system embryonic structure hematopoietic system cardiovascular system cardiovascular gastrointestinal system B 510 270 Human 240 Mouse 210 180 Rat 150 120 90 60

Unique chemical count 30 0 bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. eye liver ovary breast kidney pancreas integument primary cell adipose tissue skeletal system skeletal immune system immune nervous system endocrine gland muscular system muscular connective tissue connective urogenital system respiratory system embryonic structure hematopoietic system cardiovascular system cardiovascular gastrointestinal system C 650

ArrayExpress 200 drugMatrix GEO 175 TG−GATEs 150 125 100 75

Unique chemical count 50 25 0 Rat Mouse Human User

Select drug, Input genes Input genes Inputs species, and tissue and network

Drug DEG and Overlap drug Network drug pathway signature repositioning repositioning Applications query and species/tissue comparison

Data repository Species- and tissue- Sample gene specific signatures regulatory (meta and dose- networks segregated)

Drug signature query Overlap drug Network drug repositioning repositioning Drug Network Pathway Regulation Visualization Results Results bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The Drugcopyright 1 holder for this preprint (which was not Outputscertified by peer review) is the author/funder, who has granted bioRxivDrug a 1license to display the preprint in perpetuity. It is made available under Gene Regulation aCC-BY-NC-ND 4.0 International license. Drug 2 Drug 2 ......

User

Select drug, Input genes Input genes Inputs species, and tissue and network

Drug DEG and Overlap drug Network drug pathway signature repositioning repositioning Applications query and species/tissue comparison

Data repository Species- and tissue- Sample gene specific signatures regulatory (meta and dose- networks segregated)

Drug signature query Overlap drug Network drug repositioning repositioning Drug Network Pathway Regulation Visualization Results Results Drug 1 Outputs Drug 1 Gene Regulation Drug 2 Drug 2 ...... A

bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

B

Input gene

Drug gene A B 1.0 1.0 1.0 1.0 90% *** 90% *** 89% *** 89% ***

0.8 0.8 89% *** 89% *** 0.8 0.8 87% *** 87% ***

0.6 0.6 0.6 0.6

0.4 0.4 0.4 0.4 True positive rate positive True rate positive True True positive rate positive True 0.2 0.2 0.2 0.2 Network Network Network Network Gene overlapGene overlap Gene overlapGene overlap 0.0 0.0 0.0 0.0 0.0 0.20.0 0.4 0.2 0.6 0.4 0.8 0.6 1.0 0.8 1.00.0 0.20.0 0.4 0.2 0.6 0.4 0.8 0.6 1.0 0.8 1.0 False positiveFalse rate positive rate False positiveFalse rate positive rate

C D 1.0 1.0 1.0 1.0 89%*** 89%*** 87%*** 87%*** 0.8 0.885% 85% 0.8 0.8 *** *** 80% 80% 73%** 73%** *** *** 69%* 69%* 0.6 0.6 0.6 0.6 70%** 70%** 59% 0.4 0.4 PharmOmicsPharmOmics dose/ dose/0.4 0.4 59% PharmOmicsPharmOmics dose/ dose/ time segregated time segregatedtime segregated time segregated r e True positive rate positive True rate positive True True positive rate positive True PharmOmicsPharmOmics meta meta PharmOmicsPharmOmics meta meta 61%* 61%* 49% 49% 0.2 0.2 CREEDS CREEDS 0.2 0.2 CREEDS CREEDS L1000 L1000 L1000 L1000 0.0 0.0 CMAP CMAP 0.0 0.0 CMAP CMAP 0.0 0.20.0 0.4 0.2 0.6 0.4 0.8 0.6 1.0 0.8 1.00.0 0.20.0 0.4 0.2 0.6 0.4 0.8 0.6 1.0 0.8 1.0

bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 InternationalFalse license. positiveFalse rate positive rate False positiveFalse rate positive rate E F

Fatty acid Peroxisome, PPAR signaling, Insulin signaling, metabolism and Fatty acid Fatty acid Fatty acid AMPK, Insulin elongation, elongation, biosynthesis, and PPAR Glycerolipid Cholesterol Pyruvate signaling metabolism metabolism metabolism

Steroid and Steroid and terpenoid terpenoid backbone backbone biosynthesis, biosynthesis, Propanoate Glycolysis Drug gene signature metabolism Disease gene signature Oxymetholone Lovastatin A B 20 20 Time x Drug = 0.0002 Time x Drug = 0.0842 Time < 0.0001 Time < 0.0001 * * * * * * 15 Drug < 0.0001 15 Drug < 0.0001 * * * * 10 10

Weight gain (g) 5

Weight gain (g) 5 NAFLD NAFLD NAFLD + Flu NAFLD + Asp 0 0 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 Weeks Weeks C D. 0.5 0.5 Time x Drug = 0.0068 Time x Drug = 0.9646 Time < 0.0001 * * * Time < 0.0001 0.4 Drug < 0.0001 0.4 * * Drug = 0.0157 0.3 * 0.3

0.2 0.2 Adiposity Adiposity

0.1 0.1 NAFLD NAFLD NAFLD + Flu NAFLD + Asp 0.0 0.0 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 E Weeks F Weeks

100 100 80 * NAFLD 80 * NAFLD NAFLD + Flu 60 60 NAFLD + Asp 40 40 20 20 5 5 4 4 3 3 2 2 1

Liver lipid (mg/g tissue) 1 Liver Lipid (mg/g tissue)

bioRxiv preprint doi: https://doi.org/10.1101/8377730 ; this version posted March 30, 2021. The copyright holder for this preprint (which was not 0 certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under TG aCC-BY-NC-ND 4.0 InternationalTC license. UC PL TG TC UC PL G H

Drug gene signature

Disease gene signature bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. A B 1.00 1.00 ● DrugBank nephrotoxic label Kidney 0.75 0.75 Non−nephrotoxic Nephrotoxic

0.50 1.00 0.50

0.25 AKI 0.25 0.75 Jaccard score percentile Jaccard Network score percentile TGGATEs 0.00 minimal slight moderate severe minimal slight moderate severe histological histology severity histology severity severity 1.2 0.50 1.00 *** severe *** moderate *** 0.9 * slight 0.75 minimal 0.25 0.6 0.50

0.3 CKD 0.00 0.25 Control Low Middle High Control Low Middle High Jaccard score percentile Jaccard Network score percentile Dose level 0.0 Dose Level minimal slight moderate severe minimal slight moderate severe Liverhistology severity histology severity No DILI−Concern Less DILI−Concern Most DILI−Concern C Kidney DrugBank nephrotoxic* label *** D ** *** *** 1.00 Non−nephrotoxic Nephrotoxic ** 1.00

0.75 0.75 DILI

category● ● No DILI−Concern

● ● 0.500.50 Non● −nephrotoxicLess DILI−Concern ● NephrotoxicMost DILI−Concern

● ● ● 0.25 ● Network score percentile

● Network score percentile 0.25

0.00 0.00 Low Middle High Low Middle High Low Middle High DoseDoseLow Levellevel Middle High Low Middle High Dose level E P2ry13 9030625A04Rik Ndufa11 Socs6 H2-Eb2 Eea1 1600010M07Rik Eif5a Itgb2 2310076G05Rik Cybb Plg Alox5ap BC052328 Cd38 Cox8a Tmem100 Mrc1 Cd44 Lyz2 Crp Itih3 1600027J07Rik Cyba Cyp2d22 Slc15a3 Tlr9 Fgl2 Atp5a1 Tcn2 AI429214 Cxcl10 Ces3a Mat1a Cntn1 Ccl7 Fyb 1190003J15Rik Tnfsf13b Dmp1 A830009L08Rik Lgmn Ccl2 Ngp Pthr1 Ncf4 Lonrf1 Cd48 Gimap8 Tmem184b Hamp Ttc39b Gria3 C3 Akr1c6 Serpinc1 Cyp2a12 Msr1 Vcam1 Krtap11-1 Cyp2c50 Hpx Slc34a3 Tusc3 March8 Rasl10b Cxcl1 Vmn2r-ps129 Tns1 Ddit4 Ces3 Apoa5 Arid5a Mug1 Aga Cyp3a11 Tdo2 D3Bwg0562e Rai2 2810001A02Rik Hbegf Art2a-ps Adarb2 Ppara Gcm1 Nap1l5 Alb Cyp2c29 Tmem97 Rab38 Bik Atf3 Per1 Pon1 Mpv17l 4930539E08Rik Rnf44 Gaa Tmco6 Hpd Cyp2c37 Svopl Serpine1 Cpt1b Mug2 MMT00043382 Mep1b Egr2 Itga2b Nell2 C730036D15Rik Serpina3g 2210415I11Rik Fgb Slc38a4 Tat Ace Cyp2c39 H2-Q10 Apaf1 Sema3g Hsd11b1 MMS010144669 Acy3 Ptger4 C8a Gys2 Slc13a3 Osteopontin_RBM_log_sex_W10 Slc22a7 Fmo3 Sele Wt1 Cyp24a1 Prr13 Cat Aass Ptpro Crb2 F2 Cyp4f14 Vps8 Gsto1 Ocstamp P2ry4 Ldhd Cyp2j5 Pparg Gsta3 Rcn1 Igfbp6 Alox15 Rdh7 Hsd17b2 Ngf Mir342 Nox4 4930458L03Rik Retnla Cyp2c67 Irf5 Tmco3 Mgam Slc5a2 Sncg Mbl1 Slc22a2 Slc47a1 Car3 Cyp2d13 9530053A07Rik Sall3 Retn F10 Ncf1 bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not Itih1 Serpina1b certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under Mast1 Serpinb1a F13b Spp2 Cfd Dusp27 Murc Fam26d aCC-BY-NC-ND 4.0 International license. Spc25 Cidec Cyp2c69 Mttp 4933406P04Rik Mcc mCT114231 A530050D06Rik XM_359153 Cidea Ebf2 Hid1 Orm1 Cyp2c40 Fabp1 Acaca Bex4 EG668346 Cyp2b9 Ube2o Cpt1a BC021381 Hk2 MMT00046357 Ube2ql1 Asgr1 2210414L08Rik Nat8l Cyp2c68 Hmgcs2 Apob BC054059 Otop1 Adipoq XM_122526 MMT00038636 Acacb Scgb3a2 MMT00004002 Pphln1 Cyp2g1 Ldlr NM_020568 Erlin1 Acot1 Cyp2d10 MMS010183492 Fasn F12 Dyrk1b Mlxipl Plin Angptl3 Actn2 Kcnk3 Lipa Hp Prr7 Cyp3a44 Bhlhb2 Cfh Aacs Fabp4 1200009F10Rik Gm38426 Mmp27 Postn Sorcs2 Ucp1 Slc22a3 Gulo Srebf2 0610043K17Rik 2900060K15Rik MMT00054030 Hamp2 Ell3 Zp3r Thrsp 2900062L11Rik Aoc3 Gstt1 Mtss1 Clec14aMMT00059796 Pnldc1 XM_134904 Ptprf BB320549 Pglyrp2 Col1a1 Acsm2 Usp26 MMT00046076 Kcnc3 Gtf2a1l Igf1 Avpr1b Dnm1 Tfap2d XM_138349 ENSMUST00000068602 Fbxw21 Gdf5 Slc5a9 Olfr566 Omt2b Hyal5 BC042782 Hyi Xkr8 Mpp1 Wdr49 Spp1 MMT00001755 Amdhd1 Polr3b 4933402J07Rik MMT00015104 Gja6 Dok2 2810422O20Rik 1100001E04Rik Adra1b Cyp2c54 MMT00020913 Gm1322 NM_009899 Hpcal4 Car15 Guca2b Padi4 1700041C02Rik 1500002O10Rik Olfr1426 Arg1 Prss35 Npvf Mrps15 Ces1b Ahsg Tmem158 Dusp1 Olfr75-ps1 Krt18 Cd14 Agtr2 Gja1 Lcn2 Thrap3 OTTMUSG00000008561 Ect2 3110049J23Rik Olfr1023 Olfr1107 Olfr1316 Frzb Cfhr1 Ptgs2 Hpdl Dfnb59 Plin5 Slc2a6 Atg4c Fbxo48 Btnl10 Il6 Tmsb10 Cpxm2 Cgref1 Klk4 Zdhhc24 Foxq1 4930554I06Rik MMT00073846 Opn1mw Socs2 Ndnf Camk1d Sema4a Hmgcr Gem A730054J21Rik Crlf1 D330045A20Rik Keg1 Myo5a Slc38a2 Olfr727 Asxl3 Il3ra Cyp2c66Mcmdc2 Timp1 Mir547 Spsb3 Chadl Dsg1c Macrod2 Adipor1 Rnf24 Gclc Cts3 Cox6a2 Slc7a13 Spin2 Il2rb Bsx Dsg4 Ogdh Dgkz C130026L21Rik Chst7 Kat2b Spag9 Ctsm Sult1c2 Cbr1 Usp38 5730409E04Rik Mt1 Matr3 Tbx21 Vwa2 Mthfr Gucy1b3 Cxcr6 Hacl1 Rbck1 Itgbl1 MMT00014797 Olfml1 Rprd2 Pdcd1 Cyp2c65 Baat Itgam Tnnc1 Mt2 Abca1 Tbc1d10c Olfr613 Pcolce2 Hoxd12 Cyp2c38 Chst9 Il15 Krtap14 AI931714 Cntnap2 Olfr987 Grid1 Nrsn2 Alg8 Cxcl2 Hmox1 Picalm Cgn Olfr651 Il19 AV306604 Cyp2b10 Cryaa Foxb1 EG268795 Zfp385b Cyp2b13 Rad54l Tomt Tlr4 Acat1 Irx3 Ppfia4 Klk1b5 Olfr1030 4930448I18Rik Star Zfp14 Cyp2b23 B3gnt2 Gm15987 Ccr4 Dnmt3b Zfp605 Nlgn2 Scarb1 Fra10ac1 Srebf1 Rasl11b Tnf Robo1 Mir24-1 Cyp2b19 Ace3 Mtmr7 Abcg1 Cpt1c Akr1b7 Epo G3bp1 Panx3 Derl2 Vstm2l Tsnaxip1 Cpne2 Muc1 Vasn

Dexamethasone Naproxen Cholecalciferol Beta-estradiol Ibuprofen A D

120 500 Fludrocortisone_Rat kidney vs liver 100 400 Etidronate_Rat cardiovascular vs muscular tissue 80 Dexamethasone_liver Rat vs Mouse 300 Losatran Rat cardiovascular vs muscular tissue 60 200 40 Frequency Frequency

100 20

0 0 0.00 0.05 0.10 0.15 0.20 0.25 0.00 0.05 0.10 0.15 gene overlap percentage gene overlap percentage

B E 400 360 1000 941 344 CPT1C, AKR1B1, VNN1, ACSM3, CD36, 300 750 CPT1A, PDK4, ZNF669, ADH1C 226 TSC22D3, THBS1 200 500 478

bioRxiv preprint doi: https://doi.org/10.1101/837773; this version posted March 30, 2021. The copyright holder for this preprint (which was not Gene Size certified by peer review) is the author/funder,Gene Size who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 259 100 250 173

16 12 4 2 24 10 9 0 0 hematopoietic system ● ● ● ● Human ● ● ● ● liver ● ● ● ● Mouse ● ● ● ● urogenital system ● ● ● ● Rat ● ● ● ● 0100200300400 03006009001200 Set Size Set Size

Steroid biosynthesis C PPAR signaling F Terpenoid backbone biosynthesis 100 129 Jak-Stat signaling 89 Tol like receptor signaling 75 PPAR signaling pathway 100 92 inflammatory response Fatty acid, triacylglycerol metabolism chemotaxis ketone body metabolism TNF signaling 50

50 GPCR ligand binding Pathway Size Pathway Pathway Size Pathway 25 14 15 11 9 5 1 3 0 0 Urogenital system ● Human ● ● ● hematopoeitic system ● ● ● Mouse ● ● ● liver ● ● Rat ● ● ● ● 050100150 0306090 Set Size Set Size