Supplementary Figures
SUPPLEMENTARY FIGURE 1

[Figure: workflow schematic. Inputs: Drug Structure (canonical SMILES, L1000 dataset), Drug Sensitivity (drug sensitivity profiles, NCI60 dataset, CTRPv2 dataset), Drug Perturbation (cell line gene expression profiles before and after drug treatment, L1000 dataset). Processing steps: parse SMILES and calculate extended fingerprints to obtain Tanimoto measures; compute association between gene expression and drug dose response; compute effect of drug concentrations on cell profiles using linear regression. Single-layer drug taxonomies: drug similarity matrices of chemical structures, sensitivity profiles, and gene expression. Integration: Similarity Network Fusion across single taxonomies, giving the Drug Network Fusion (DNF) taxonomy. Validation of drug mode of action: benchmark datasets (ATC drug classification, drug-target interactions) reduced to the common subset of drugs; ROC curves (true positive rate vs. false positive rate) and PR curves (precision vs. recall); Affinity Propagation Clustering to assess drug communities sharing a common MOA (Community 1, Community 2, Community 3).]

Supplementary Figure 1: Overview of the study design. Drug sensitivity profiles from the NCI60 and the CTRPv2 datasets, along with drug perturbation and drug structure data from the L1000 dataset, are first parsed into drug-drug similarity matrices that represent single-dataset drug taxonomies. Two DNF taxonomies are generated using the drug sensitivity taxonomy from either the NCI60 or CTRPv2 datasets. DNF taxonomies and single-dataset taxonomies are tested against benchmarked datasets containing ATC drug classification and drug-target information, to validate their efficacy in predicting drug MoA. Additional clustering is conducted on DNF taxonomies to identify drug communities sharing a MoA.
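The structure layer in the overview above rests on Tanimoto similarity between drug fingerprints. The following is a minimal sketch of that measure, not the authors' implementation. It assumes each fingerprint is represented as the set of its "on" bit indices; the drug names and bit sets are invented for illustration, and in practice the fingerprints would come from a cheminformatics toolkit.

```python
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen for this sketch when both fingerprints are empty
    return len(a & b) / union


def similarity_matrix(fingerprints):
    """Build a symmetric drug-drug similarity matrix as nested dicts."""
    names = sorted(fingerprints)
    sim = {n: {m: 1.0 if n == m else 0.0 for m in names} for n in names}
    for n, m in combinations(names, 2):
        s = tanimoto(fingerprints[n], fingerprints[m])
        sim[n][m] = sim[m][n] = s
    return sim


# Hypothetical fingerprints (illustrative only, not taken from any dataset).
toy_fps = {"drug_a": {1, 4, 7, 9}, "drug_b": {1, 4, 8}, "drug_c": {2, 3}}
toy_sim = similarity_matrix(toy_fps)
```

Here `drug_a` and `drug_b` share 2 of 5 distinct bits, so their similarity is 0.4, while `drug_a` and `drug_c` share none. The same matrix shape (drugs by drugs, values in [0, 1]) is what the other single-layer taxonomies would need in order to be fused.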
SUPPLEMENTARY FIGURE 2

[Figure: Venn diagrams of drug overlap, with PubChem (60 million SMILES) shown as the source of structures.
L1000 and NCI60: 20088 drugs in L1000 only, 238 shared, 49700 in NCI60 only. Matching benchmarks: 86 drugs (drug targets), 72 drugs (ATC classes).
L1000 and CTRPv2: 20087 drugs in L1000 only, 239 shared, 242 in CTRPv2 only. Matching benchmarks: 141 drugs (drug targets), 51 drugs (ATC classes).]

Supplementary Figure 2: Overlap of drug annotations across the L1000 dataset and the NCI60 and CTRPv2 sensitivity datasets. Also indicated is the number of drugs from each DNF matrix that overlap with the drug target and ATC benchmarks.

SUPPLEMENTARY FIGURE 3

[Figure: validation schematic. Benchmark datasets (ATC drug classification, drug-target interactions) and the drug taxonomy under evaluation are reduced to the common subset of drugs. The drug taxonomy is converted into a continuous vector of drug-drug pairs by assigning each pair its score from the taxonomy (example: Drug A-Drug B 0.86, Drug A-Drug C 0.54, Drug A-Drug D 0.79, Drug B-Drug C 0.3, …). The benchmark is converted into a binary vector of drug-drug pairs by assigning a score of 1 if the drug pair shares a target and 0 otherwise (example: Drug A-Drug B 1, Drug A-Drug C 0, Drug A-Drug D 1, Drug B-Drug C 1, …). DNF is assessed at different levels of false positive rate (FPR) by calculating the area under the curve (AUC) and precision-recall (PR) curves. Axes: true positive rate vs. false positive rate; precision vs. recall.]

Supplementary Figure 3: Schematic representation of the validation of the DNF and single data type analyses against drug benchmarks. Drug taxonomies are converted into a continuous vector of drug-drug pairs. Benchmark datasets are converted into binary vectors, whereby a given drug-drug pair is assigned a value of ‘1’ if the drugs share a common drug target or ATC classification, and ‘0’ otherwise. Vectors are compared using AUROC and AUPRC.
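The validation scheme in Supplementary Figure 3 can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' code: the function names, the dictionary layout, and the target labels `t1` and `t2` are hypothetical, and the AUROC is computed with the rank-based (Mann-Whitney) formulation rather than by tracing the curve. The four pair scores are the example values shown in the figure; the pair involving drug `E` is invented to show the reduction to the common subset of drugs.

```python
def to_pair_vectors(taxonomy_scores, targets):
    """Align taxonomy scores with binary benchmark labels over shared drug pairs.

    taxonomy_scores: {(drug1, drug2): similarity score}
    targets: {drug: set of annotated targets (or ATC classes)}
    """
    # Keep only pairs where both drugs are present in the benchmark.
    pairs = sorted(p for p in taxonomy_scores if p[0] in targets and p[1] in targets)
    scores = [taxonomy_scores[p] for p in pairs]
    # Label is 1 when the two drugs share at least one annotation, else 0.
    labels = [1 if targets[a] & targets[b] else 0 for a, b in pairs]
    return pairs, scores, labels


def auroc(scores, labels):
    """AUROC as the probability that a positive pair outscores a negative pair."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    # Ties between a positive and a negative pair count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


taxonomy = {("A", "B"): 0.86, ("A", "C"): 0.54, ("A", "D"): 0.79,
            ("B", "C"): 0.3, ("A", "E"): 0.9}  # drug E is absent from the benchmark
benchmark = {"A": {"t1"}, "B": {"t1", "t2"}, "C": {"t2"}, "D": {"t1"}}

pairs, scores, labels = to_pair_vectors(taxonomy, benchmark)
auc = auroc(scores, labels)
```

With these toy values the labels are `[1, 0, 1, 1]`, matching the binary vector in the figure, and two of the three positive pairs outscore the single negative pair, giving an AUROC of 2/3. The quadratic pair comparison is fine for a sketch; for vectors with many thousands of drug pairs a sort-based or library implementation would be the practical choice.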
SUPPLEMENTARY FIGURE 4

[Figure: bar plots, panels (A) CTRPv2 and (B) NCI60. Y-axis: Spearman correlation (0.0 to 1.0). Bars: Struct vs Sens, Struct vs Pert, Pert vs Sens, DNF vs Struct, DNF vs Sens, DNF vs Pert.]

Supplementary Figure 4: Complementarity of drug information across drug taxonomies. Spearman correlations between all pairs of single-layer similarity matrices (drug structure, drug perturbation, drug sensitivity) are depicted. Correlations between the integrative drug taxonomy (DNF) and each of the single-layer similarity matrices are also shown. Data are shown for (A) the drug taxonomy using the CTRPv2 sensitivity dataset and (B) the drug taxonomy using the NCI60 sensitivity dataset.

SUPPLEMENTARY FIGURE 5

[Figure: four panels. (A) NCI60 - Targets, ROC curve and (C) NCI60 - ATC, ROC curve, with axes true positive rate vs. false positive rate; (B) and (D) the corresponding PR curves, with axes precision vs. recall. Legend values:
(A) AUROC, drug targets: Integration = 0.876, Structure = 0.801, Sensitivity = 0.801, Perturbation = 0.615, rand = 0.5
(C) AUROC, ATC: Integration = 0.853, Structure = 0.772, Sensitivity = 0.685, Perturbation = 0.615, rand = 0.5
(B) AUPRC, drug targets: Integration = 0.552, Structure = 0.426, Sensitivity = 0.434, Perturbation = 0.152, rand = 0.048
(D) AUPRC, ATC: Integration = 0.492, Structure = 0.404, Sensitivity = 0.278, Perturbation = 0.242, rand = 0.095]

Supplementary Figure 5: Validation of single-dataset and DNF taxonomies against drug benchmark datasets, based on DNF generated using NCI60. ROC and PR curves are shown for each of the taxonomies, tested against ATC annotations and drug-target information from ChEMBL or internal benchmarks. A diagonal (grey) representing the null case (AUROC = 0.5) is drawn for reference, and a grey curve is also drawn to map random (rand) cases for the PR curves. (A) ROC curve for NCI60 against drug targets. (B) PR curve against drug targets. (C) ROC curve for NCI60 against ATC drug classifications. (D) PR curve against ATC drug classifications.

SUPPLEMENTARY FIGURE 6

[Figure: network of exemplar drugs, with nodes labelled by drug name (for example AFATINIB, ERLOTINIB, GEFITINIB, DIGOXIN, OUABAIN, ETOPOSIDE, TOPOTECAN). Highlighted communities: C2, C5, C14, C20, C32, C42, C45, C48.]

Supplementary Figure 6: Community of 53 exemplar drugs of the DNF taxonomy generated using NCI60. Communities sharing similar MoA and proximity in the network are highlighted, with the community number indicated.
SUPPLEMENTARY FIGURE 7

[Figure: two enrichment heat maps, colour scale -log10 FDR.
(A) Drug targets, scale 0 to 4. Communities: C49, C9, C12, C13, C14, C17, C18, C20, C21, C24, C25, C30, C33, C34, C38, C41, C42, C44, C45, C48. Targets: Serine/threonine-protein kinase B-raf; Tyrosine-protein kinase Lyn; Breakpoint cluster region protein; Fibroblast growth factor receptor 3; Serine/threonine-protein kinase mTOR; Receptor-type tyrosine-protein kinase FLT3; Vascular endothelial growth factor receptor 2; Receptor tyrosine-protein kinase erbB-2; Tyrosine-protein kinase Lck; Proto-oncogene tyrosine-protein kinase Src; Microtubule-associated protein tau; Microtubule-associated protein 2; Microtubule-associated protein 4; Apoptosis regulator Bcl-2; Tubulin beta-1 chain; DNA (cytosine-5)-methyltransferase 1; DNA topoisomerase I, mitochondrial; DNA topoisomerase 1; Tubulin beta-4B chain; Platelet-derived growth factor receptor beta; Tyrosine-protein kinase ABL1; Platelet-derived growth factor receptor alpha; Macrophage colony-stimulating factor 1 receptor; Mast/stem cell growth factor receptor Kit; Nuclear receptor subfamily 1 group I member 2; Estrogen receptor; Progesterone receptor; Sodium/potassium-transporting ATPase subunit alpha-1; Tubulin beta chain; Glucocorticoid receptor; Epidermal growth factor receptor; Thymidylate synthase; DNA topoisomerase 2-alpha; DNA polymerase alpha catalytic subunit; Ribonucleoside-diphosphate reductase large subunit; 3-hydroxy-3-methylglutaryl-coenzyme A reductase; Dihydrofolate reductase; Lanosterol 14-alpha demethylase.
(B) ATC classes, scale 0 to 2.5. Communities: C33, C38, C41, C45, C49, C4, C9, C10, C12, C18, C20, C31. ATC classes: L01BA, A01AC, R01AD, D07AB, D07XB, S01BA, C05AA, S01CB, L01CD, P01BE, L01BB, C10AA, L01CB, H02AB, D10AA, L01DB, L01CA, L01AA, L01XE.]

Supplementary Figure 7: Enrichment of drug communities of the DNF taxonomy generated using NCI60. A total of 53 communities were tested for enrichment against drug target annotations from DrugBank and ATC annotations from ChEMBL. (A) Enrichment of communities for drug target annotations, with -log10 values indicated in the heat map, which has been reduced to show significantly enriched communities. Communities are labelled by community number as determined by the APC algorithm. (B) Enrichment of communities for ATC classes, with -log10 values indicated in the heat map, which has been reduced to show significantly enriched communities. Communities are labelled by community number as determined by the APC algorithm.

SUPPLEMENTARY FIGURE 8

[Figure: histograms, panels (A) CTRPv2 and (B) NCI60. X-axis: number of drugs in communities. Y-axis: number of communities.]

Supplementary Figure 8: Distribution of drug community sizes of the DNF taxonomy.
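The heat maps in Supplementary Figure 7 report -log10 FDR values for community enrichment. The source does not state which statistical test or multiple-testing correction produced them, so the sketch below makes two explicit assumptions: a one-sided hypergeometric test per community-annotation pair, and Benjamini-Hochberg adjustment across tests. All function names and toy numbers are hypothetical.

```python
from math import comb, log10


def hypergeom_upper_tail(k, community_size, annotated, total):
    """P(X >= k) when a community of `community_size` drugs is drawn from
    `total` drugs, of which `annotated` carry the annotation.
    ASSUMPTION: hypergeometric test; the source does not name the test used."""
    upper = min(community_size, annotated)
    denom = comb(total, community_size)
    numer = sum(comb(annotated, i) * comb(total - annotated, community_size - i)
                for i in range(k, upper + 1))
    return numer / denom


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order.
    ASSUMPTION: BH is used here as one common way to obtain an FDR."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy example: 20 drugs, 4 annotated with a target, and a community of 5
# drugs containing 3 of the annotated ones.
p_toy = hypergeom_upper_tail(k=3, community_size=5, annotated=4, total=20)
fdr_toy = benjamini_hochberg([p_toy, 0.2, 0.5])
neg_log10_fdr = [-log10(q) for q in fdr_toy]
```

In the toy example the enrichment p-value is 496/15504 (about 0.032). After adjustment over the three tests, the values in `neg_log10_fdr` are the kind of quantity a heat map cell would display, with larger values indicating stronger enrichment.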