
SUPPLEMENTARY FIGURE 1

[Figure: study design schematic. Inputs: drug structure (canonical SMILES), drug sensitivity (drug sensitivity profiles; NCI60 and CTRPv2 datasets), and drug perturbation (cell line gene expression profiles before and after drug treatment; L1000 dataset). Processing steps shown: parse SMILES and calculate extended fingerprints to obtain Tanimoto measures; compute association between gene expression and drug dose response; compute effect of drug concentrations on cell profiles using linear regression. The resulting single-layer drug taxonomies (drug similarity matrices of chemical structures, sensitivity profiles, and gene expression) are combined by Similarity Network Fusion into the Drug Network Fusion (DNF) taxonomy. Benchmark datasets (ATC drug classification, drug-target interactions) are reduced to the common subset of drugs and used to validate drug mode of action with ROC and PR curves. Affinity propagation clustering identifies drug communities sharing a common MoA (Community 1, Community 2, Community 3).]
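The structure layer in this schematic turns fingerprints into a drug-drug similarity matrix of Tanimoto measures. The following is a minimal sketch of that step only, assuming each fingerprint is already available as a set of "on" bit indices; the fingerprint generation itself is not shown, and the drug names, bit sets, and function names are hypothetical.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0  # convention assumed here for two empty fingerprints
    return len(a & b) / len(a | b)


def similarity_matrix(fingerprints: dict) -> dict:
    """Build a symmetric drug-drug similarity matrix as nested dicts."""
    drugs = sorted(fingerprints)
    sim = {d: {d: 1.0} for d in drugs}  # a drug is identical to itself
    for x, y in combinations(drugs, 2):
        s = tanimoto(fingerprints[x], fingerprints[y])
        sim[x][y] = s
        sim[y][x] = s
    return sim


# Hypothetical fingerprints, given as indices of set bits.
toy_fingerprints = {
    "drug_A": frozenset({1, 4, 7, 9}),
    "drug_B": frozenset({1, 4, 8}),
    "drug_C": frozenset({2, 3}),
}
matrix = similarity_matrix(toy_fingerprints)
```

In this toy example drug_A and drug_B share two of five distinct bits, so their similarity is 0.4, while drug_A and drug_C share none.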

Supplementary Figure 1: Overview of the study design. Drug sensitivity profiles from the NCI60 and the CTRPv2 datasets, along with drug perturbation and drug structure data from the L1000 dataset, are first parsed into drug-drug similarity matrices that represent single-dataset drug taxonomies. Two DNF taxonomies are generated, using the drug sensitivity taxonomy from either the NCI60 or the CTRPv2 dataset. DNF taxonomies and single-dataset taxonomies are tested against benchmark datasets containing ATC drug classification and drug-target information, to validate their efficacy in predicting drug MoA. Additional clustering is conducted on DNF taxonomies to identify drug communities sharing a MoA.

SUPPLEMENTARY FIGURE 2

[Figure: overlap diagrams of drugs between datasets, with PubChem (60 million SMILES) as the structure source. L1000 and NCI60: 20088 L1000 only, 238 shared, 49700 NCI60 only. L1000 and CTRPv2: 20087 L1000 only, 239 shared, 242 CTRPv2 only. Number of drugs matching benchmarks: NCI60, 86 (drug targets) and 72 (ATC classes); CTRPv2, 141 (drug targets) and 51 (ATC classes).]

Supplementary Figure 2: Overlap of drug annotations across the L1000 and the NCI60 and CTRPv2 sensitivity datasets. Also indicated is the number of drugs from each DNF matrix that overlap with the drug target and ATC benchmarks.

SUPPLEMENTARY FIGURE 3

[Figure: validation schematic. The benchmark datasets (ATC drug classification, drug-target interactions) and the drug taxonomy under evaluation are reduced to their common subset of drugs. The drug taxonomy is converted into a continuous vector of drug-drug pairs, with scores assigned from the taxonomy (example values: Drug A-Drug B 0.86, Drug A-Drug C 0.54, Drug A-Drug D 0.79, Drug B-Drug C 0.3). The benchmark is converted into a binary vector of drug-drug pairs, with a score of 1 if a drug pair shares a target and 0 otherwise (example values: Drug A-Drug B 1, Drug A-Drug C 0, Drug A-Drug D 1, Drug B-Drug C 1). DNF is assessed at different levels of false positive rate (FPR) by calculating the area under the ROC curve (AUC; true positive rate against false positive rate) and Precision-Recall (PR) curves (precision against recall).]
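The conversion shown in this schematic can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: the first four pair scores and labels are the example values from the schematic, while the remaining two scores, the target names (T1, T2, T9), drug E, and all function names are hypothetical. The AUROC is computed with the rank-based (Mann-Whitney) formulation, which equals the area under the ROC curve; the source does not state which software was used.

```python
from itertools import combinations


def symmetric(pair_scores):
    """Expand {(a, b): score} into a symmetric nested-dict similarity matrix."""
    sim = {}
    for (a, b), s in pair_scores.items():
        sim.setdefault(a, {})[b] = s
        sim.setdefault(b, {})[a] = s
    return sim


def pair_vectors(similarity, targets):
    """Return (pairs, scores, labels) over drugs present in both inputs."""
    drugs = sorted(set(similarity) & set(targets))  # common subset of drugs
    pairs, scores, labels = [], [], []
    for a, b in combinations(drugs, 2):
        pairs.append((a, b))
        scores.append(similarity[a][b])                     # continuous vector
        labels.append(1 if targets[a] & targets[b] else 0)  # binary vector
    return pairs, scores, labels


def auroc(scores, labels):
    """Probability that a positive pair outscores a negative pair (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# First four scores follow the schematic; B-D and C-D are hypothetical.
taxonomy = symmetric({
    ("A", "B"): 0.86, ("A", "C"): 0.54, ("A", "D"): 0.79,
    ("B", "C"): 0.30, ("B", "D"): 0.65, ("C", "D"): 0.20,
})
# Hypothetical target sets chosen to reproduce the schematic's binary labels.
# Drug E has no taxonomy entry, so it is dropped by the common-subset step.
benchmark = {"A": {"T1"}, "B": {"T1", "T2"}, "C": {"T2"}, "D": {"T1"}, "E": {"T9"}}

pairs, scores, labels = pair_vectors(taxonomy, benchmark)
auc = auroc(scores, labels)
```

For the ATC benchmark the same function applies if each drug's set holds its ATC classes instead of its targets.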

Supplementary Figure 3: Schematic representation of the validation of the DNF and single data type analyses against drug benchmarks. Drug taxonomies are converted into a continuous vector of drug-drug pairs. Benchmark datasets are converted into binary vectors, whereby a given drug-drug pair is assigned a value of ‘1’ if the drugs share a common drug target or ATC classification, and ‘0’ otherwise. Vectors are compared using AUROC and AUPRC.

SUPPLEMENTARY FIGURE 4

[Figure: Spearman correlation (axis from 0.0 to 1.0) for the comparisons Struct vs Pert, Struct vs Sens, Pert vs Sens, DNF vs Pert, DNF vs Sens, and DNF vs Struct. Panel A: CTRPv2. Panel B: NCI60.]

Supplementary Figure 4: Complementarity of drug information across drug taxonomies. Spearman correlation between all pairs of single-layer similarity matrices (drug structure, drug perturbation, drug sensitivity) are depicted. Correlations between the integrative drug taxonomy (DNF) and each of the single-layer similarity matrices are also shown. Data are shown for both (A) drug taxonomy using CTRPv2 and (B) drug taxonomy using the NCI60 sensitivity datasets.

SUPPLEMENTARY FIGURE 5

[Figure: ROC curves (true positive rate against false positive rate; panel A, NCI60 - Targets; panel C, NCI60 - ATC) and PR curves (precision against recall; panels B and D). Values from the figure legends:]

Taxonomy      AUROC (A, targets)  AUPRC (B, targets)  AUROC (C, ATC)  AUPRC (D, ATC)
Integration   0.876               0.552               0.853           0.492
Structure     0.801               0.426               0.772           0.404
Sensitivity   0.801               0.434               0.685           0.278
Perturbation  0.615               0.152               0.615           0.242
rand          0.5                 0.048               0.5             0.095

Supplementary Figure 5: Validation of single-dataset and DNF taxonomies against drug benchmark datasets, based on DNF generated using NCI60. ROC and PR curves are shown for each of the taxonomies, tested against ATC annotations and drug-target information from ChEMBL or internal benchmarks. A diagonal (grey) representing the null case (AUROC = 0.5) is drawn for reference, and a grey curve is also drawn to map random (rand) cases for the PR curves. (A) ROC curve for NCI60 against drug targets. (B) PR curve against drug targets. (C) ROC curve for NCI60 against ATC drug classifications. (D) PR curve against ATC drug classifications.

SUPPLEMENTARY FIGURE 6

[Figure: network view of the NCI60-based DNF taxonomy, with nodes labelled by drug name (for example afatinib, erlotinib, gefitinib, and lapatinib; digoxin, ouabain, and gitoxigenin; camptothecin, irinotecan, topotecan, etoposide, and teniposide). Highlighted communities: C2, C5, C14, C20, C32, C42, C45, C48.]

Supplementary Figure 6: Community of 53 exemplar drugs of the DNF taxonomy generated using NCI60. Communities sharing similar MoA and proximity in the network are highlighted, with the community number indicated.

SUPPLEMENTARY FIGURE 7

[Figure: two enrichment heatmaps (panels A and B) with colour scales in -log10 FDR. Communities are labelled by community number (between C4 and C49). Panel A is annotated with drug targets, including dihydrofolate reductase, DNA polymerase alpha catalytic subunit, DNA topoisomerase 1, DNA topoisomerase I (mitochondrial), DNA topoisomerase 2, thymidylate synthase, glucocorticoid receptor, estrogen receptor, tubulin beta chain, nuclear receptor subfamily 1 group I member 2, mast/stem cell growth factor receptor Kit, vascular endothelial growth factor receptor 2, fibroblast growth factor receptor 3, and breakpoint cluster region protein. Panel B is annotated with ATC classes: L01XE, L01AA, L01CA, L01DB, D10AA, H02AB, L01CB, C10AA, L01BB, P01BE, L01CD, S01CB, C05AA, S01BA, D07XB, D07AB, R01AD, A01AC, L01BA.]

Supplementary Figure 7: Enrichment of drug communities of the DNF taxonomy generated using NCI60. A total of 53 communities were tested for enrichment against drug target annotations from DrugBank and ATC annotations from ChEMBL. (A) Enrichment of communities for drug target annotations, with -log10 FDR values indicated in the heatmap, which has been reduced to show significantly enriched communities. Communities are labelled by community number as determined by the APC algorithm. (B) Enrichment of communities for ATC classes, with -log10 FDR values indicated in the heatmap, which has been reduced to show significantly enriched communities. Communities are labelled by community number as determined by the APC algorithm.

SUPPLEMENTARY FIGURE 8

[Figure: histograms of community sizes. Panel A: CTRPv2. Panel B: NCI60. x-axis: number of drugs in communities; y-axis: number of communities.]

Supplementary Figure 8: Distribution of drug community sizes of the DNF taxonomy. (A) A total of 53 communities from DNF (using CTRPv2 sensitivity data) were tested for enrichment against drug target annotations from the CTRPv2 data and ATC annotations from ChEMBL (see methods). (B) A total of 53 communities from DNF (using NCI60 sensitivity data) were tested for enrichment against drug target annotations from DrugBank and ATC annotations from ChEMBL.

SUPPLEMENTARY FIGURE 9

[Figure: ROC curves (left; true positive rate against false positive rate) and PR curves (right; precision against recall) for the CTRPv2-based and NCI60-based taxonomies, each tested against the ATC and drug target benchmarks. Values from the figure legends:]

Dataset  Benchmark     Method       AUROC  AUPRC
CTRPv2   ATC           Integration  0.801  0.558
CTRPv2   ATC           IorioPGX     0.632  0.401
CTRPv2   ATC           Iskar        0.522  0.256
CTRPv2   ATC           SuperPred    0.677  0.37
CTRPv2   ATC           rand         0.5    0.212
CTRPv2   Drug targets  Integration  0.887  0.413
CTRPv2   Drug targets  IorioPGX     0.602  0.074
CTRPv2   Drug targets  Iskar        0.649  0.149
CTRPv2   Drug targets  DrugERank    0.614  0.077
CTRPv2   Drug targets  rand         0.5    0.036
NCI60    ATC           Integration  0.853  0.492
NCI60    ATC           IorioPGX     0.619  0.253
NCI60    ATC           Iskar        0.541  0.158
NCI60    ATC           SuperPred    0.725  0.334
NCI60    ATC           rand         0.5    0.095
NCI60    Drug targets  Integration  0.876  0.552
NCI60    Drug targets  IorioPGX     0.593  0.134
NCI60    Drug targets  Iskar        0.694  0.247
NCI60    Drug targets  DrugERank    0.749  0.303
NCI60    Drug targets  rand         0.5    0.048

Supplementary Figure 9: Comparison of the DNF and single-layer drug taxonomies with comparable drug prediction methods (SuperPred, DrugE-Rank, Iorio, Iskar). ROC and PR curves are depicted to indicate the performance of the taxonomies and comparative methods against drug target and ATC benchmarks. This analysis is conducted for both DNF taxonomies (based on CTRPv2 or NCI60 data, shown in blue), and their associated similarity networks. ROC curves are on the left, and PR curves on the right.
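The PR comparisons in these figures depend on how rare positive drug-drug pairs are, which is why a random reference is reported alongside each AUPRC. The sketch below illustrates the idea under stated assumptions: it uses average precision as one common estimator of the area under the PR curve (the source does not say which estimator was used), and it takes the fraction of positive pairs as the random baseline, since that is the expected precision of a random ranking. The toy scores, labels, and function names are hypothetical.

```python
def average_precision(scores, labels):
    """Average precision: mean of the precision values at each positive pair.

    Pairs are ranked by decreasing score; tied scores keep their input order.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive pair")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


def random_baseline(labels):
    """Expected precision of a random ranking: the fraction of positive pairs."""
    return sum(labels) / len(labels)


# Hypothetical drug-drug pair scores and benchmark labels.
toy_scores = [0.86, 0.54, 0.79, 0.30, 0.65, 0.20]
toy_labels = [1, 0, 1, 1, 1, 0]

ap = average_precision(toy_scores, toy_labels)
baseline = random_baseline(toy_labels)
```

Because the baseline changes with the benchmark and the drug subset, AUPRC values are best read relative to the rand value reported for the same panel rather than compared across panels.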