Supporting Information

Relating Essential Proteins To Drug Side-Effects Using Canonical Component Analysis: A Structure-Based Approach

Tianyun Liu and Russ B Altman*

Affiliations:

Tianyun Liu Department of Genetics, Stanford University Email: [email protected]

Russ B Altman Department of Genetics and Department of Bioengineering, Stanford University Email: [email protected]

1 Figure S1. We calculated the PF-affinity scores of celecoxib against 563 proteins in Essential Protein Dataset. Of the 563 proteins, we have found experimental IC50 recorded in IC50 Assay Dataset for six unique proteins, shown as color squares in the PDF plot of all 563 PF-affinity scores. The bottom panel shows a clear correlation between the PF-affinity scores and log (IC50) values. It suggests that the predicted PF- affinity score can be an indicator of binding affinity of celecoxib to essential proteins.

2 Figure S2. Performance of SVD-CCA We have created two subsets of the side-effect data from SIDER: SE-subset-50 includes side-effects that are only observed in fewer than 108 (50%) of the 216 drugs. SE- subset-20 includes side-effects that are only observed in fewer than 43 (20%) of the 216 drugs. We evaluated performance by using leave-one-out cross validations.

3 Figure S3-A. Characteristics of side-effect data from Yamanish’s publication We have downloaded side-effect data from Yamanishi’s published work [reference #2 in the manuscript]. This figure shows the histogram of the number of associated drugs (percentage of the 674 drugs) for each of the 1343 side-effects.

4 Figure S3-B. The drug molecular information data (known drug targets from KEGG) is sparse for the 674 drugs published by Yamanishi [REF]. The drug target profile includes a total of 9180 drug-protein interactions, involving 674 drugs and 2080 unique proteins. The average number of interactions per drug is 13 (of a total of 2080 proteins).

2000 s t e g r

a 1500 t

g u r d

1000 f o

r e

b 500 m u (from 2080 targets)2080 (from n 0 0 100 200 300 400 500 600 700 674 drugs published by Yamanishi et al

5 Figure S4-A. Performance of SVD-CCA on the dataset from Yamanishi’s publication The figure shows the AUC (under the ROC curve) scores for side-effect predictions for individual drugs in the leave-one-out cross validation. We compared the performance of SVD-CCA when using different sets of CCs. The average performance is not improved when the number of CCs used in predicting side-effects increases. This suggests that information stored in the first canonical component plays the dominant role.

6 Figure S4-B. We calculated the frequency of side-effect observed in side-effect data from Yamanishi’s publication. We then plotted the weights assigned to each of the 1343 side-effects in the first canonical component (CC#1) against the frequency of side- effects. It shows that the weight assigned in CC#1 correlates well with the frequency of side-effect.

7

Figure S5. The attribute weights from SVD-CCA process is sparse. We are also interested in the biological interpretability of the outputs to understand the relationship between off-targets and side-effects. Because the weights (alpha and beta) from SVD-CCA satisfy the following:

‖where‖ ≤ 1; p‖ and‖ q≤ are 1; ‖ the‖ length/ ≤ of 1; ‖ and‖ / ≤ 1 the CCs are considered as sparse components. This figures shows the sparsity of the weights from CC#3.

8 Figure S6. Statistics of the novel associations identified by SVD-CCA. The binding sites on essential proteins are ranked by the total number of associations to drugs and side-effects. The number of the predicted drug binding for the 50 essential proteins ranges from 2 to 33. These 50 proteins belong to the group of low promiscuity.

9 Figure S7-A. Predicted relationships of drug, protein and side-effects for NSAIDs. The five NSAIDs in our study are celcecoxib, valdecoxib, fenoprofen, naproxen and ibuprofen. They cause 38 types of significant side-effects. However, only eleven of them are shared by two or three NSAIDs. Another eleven side-effects that are unique to celecoxib; six to valdecoxib, five to naproxen, three to ibuprofen and one to fenoprofen. This is consistent to distribution of the essential proteins bound identified for the five NSAIDs, which are often unique to one specific drug, with only PPARG shared by celecoxib and fenoprofen; HIF1AN(transcription factor) shared by fenoprofen and ibuprofen.

10

11 Figure S7-B. Predicted relationships of drug, protein and side-effects for antidepressants. We also analyzed the relationships extracted for six antidepressants: imipramine, sertraline, desipramine, amitriptyline, clomipramine and fluoxetine. They cause 51 types of side-effects, of which 19 are unique to fluoxetine only. Another 24 side-effects are shared by two or three antidepressants. Of the twelve significant off-targets, seven are shared by two or more drugs.

12

13 Table S1. The most promiscuous drugs For each of the 216 drugs, we estimated its potential of binding to the 563 proteins by calculating the off-target scores. Using an off-target score cutoff of -2.0, we counted the number of off-targets of a given drug. The most promiscuous drugs (with number of off- targets >200) are: troglitazone, dasatinib, sorafeib, aliskiren, imatinib and nilotinib.

DrugBank ID and name Number of known Number of predicted targets targets DB01076 Atorvastatin 3 108 DB01132 Pioglitazone 1 110 DB00625 Efavirenz 1 111 DB00650 Leucovorin 1 117 DB00675 Tamoxifen 2 118 DB00563 Methotrexate 1 121 DB01264 Darunavir 1 122 DB00458 Imipramine 14 124 DB00642 Pemetrexed 4 126 DB00701 Amprenavir 1 127 DB00503 Ritonavir 1 128 DB01151 Desipramine 14 134 DB00410 Mupirocin 1 135 DB00932 Tipranavir 1 138 DB01242 Clomipramine 7 138 DB00220 Nelfinavir 1 140 DB00615 Rifabutin 5 140 DB01072 Atazanavir 1 144 DB00300 Tenofovir 0 147 DB00472 Fluoxetine 1 154 DB01200 Bromocriptine 18 163 DB00224 Indinavir 1 165 DB01232 Saquinavir 1 167 DB01201 Rifapentine 1 172 DB01268 Sunitinib 8 173 DB00317 Gefitinib 1 189 DB00199 Erythromycin 2 193 DB01157 Trimetrexate 1 194 DB00197 Troglitazone 6 204 DB01254 Dasatinib 10 208 DB00398 Sorafenib 7 235 DB01258 Aliskiren 1 235 DB00619 Imatinib 9 241 DB04868 Nilotinib 2 260

14 Table S2. The most promiscuous targets For each of the 563 proteins, we estimated its potential of binding to the 216 drugs by calculating the off-target scores. Using an off-target score cutoff of -2.0, we counted the number of drugs of a given protein. The most promiscuous targets are listed as below.

ID Protein name KEGG_PATHWAY P16118 6-phosphofructo-2-/fructose- hsa00051:Fructose and mannose metabolism, 2,6-biphosphatase 1 Q16875 6-phosphofructo-2-kinase/fructose- hsa00051:Fructose and mannose metabolism, 2,6-biphosphatase 3 P10415 B-cell CLL/lymphoma 2 hsa04210:Apoptosis,hsa04510:Focal adhesion,hsa04722:Neurotrophin signaling pathway,hsa05014:Amyotrophic lateral sclerosis (ALS),hsa05200:Pathways in cancer,hsa05210:Colorectal cancer,hsa05215:Prostate cancer,hsa05222:Small cell lung cancer, Q06187 Bruton agammaglobulinemia hsa04662:B cell receptor signaling pathway,hsa04664:Fc epsilon RI signaling pathway,hsa05340:Primary immunodeficiency, P26358 DNA (cytosine-5-)- hsa00270:Cysteine and methionine metabolism, methyltransferase 1 Q09472 E1A binding protein p300 hsa04110:Cell cycle,hsa04310:Wnt signaling pathway,hsa04330:Notch signaling pathway,hsa04350:TGF-beta signaling pathway,hsa04520:Adherens junction,hsa04630:Jak- STAT signaling pathway,hsa04720:Long-term potentiation,hsa04916:Melanogenesis,hsa05016:Hunting ton's disease,hsa05200:Pathways in cancer,hsa05211:Renal cell carcinoma,hsa05215:Prostate cancer, Q08881 IL2-inducible T-cell kinase hsa04062:Chemokine signaling pathway,hsa04660:T cell receptor signaling pathway,hsa04670:Leukocyte transendothelial migration, O60674 Janus kinase 2 hsa04062:Chemokine signaling pathway,hsa04630:Jak- STAT signaling pathway,hsa04920:Adipocytokine signaling pathway, Q9UHD2 TANK-binding kinase 1 hsa04620:Toll-like receptor signaling pathway,hsa04622:RIG-I-like receptor signaling pathway,hsa04623:Cytosolic DNA-sensing pathway, P00568 1 hsa00230:Purine metabolism, P27144 adenylate kinase 3-like 2; hsa00230:Purine metabolism, adenylate kinase 3-like 1 P15121 aldo-keto reductase family 1, hsa00040:Pentose and glucuronate member B1 (aldose reductase) interconversions,hsa00051:Fructose and mannose metabolism,hsa00052:Galactose metabolism,hsa00561:Glycerolipid metabolism,hsa00620:Pyruvate metabolism, O14727 apoptotic peptidase activating hsa04115:p53 signaling factor 1 pathway,hsa04210:Apoptosis,hsa05010:Alzheimer's disease,hsa05012:Parkinson's disease,hsa05014:Amyotrophic lateral sclerosis (ALS),hsa05016:Huntington's disease,hsa05222:Small cell lung cancer, P00519 c-abl oncogene 1, receptor tyrosine hsa04012:ErbB signaling pathway,hsa04110:Cell kinase cycle,hsa04360:Axon guidance,hsa04722:Neurotrophin signaling pathway,hsa05130:Pathogenic Escherichia coli infection,hsa05200:Pathways in cancer,hsa05220:Chronic myeloid leukemia,hsa05416:Viral myocarditis, P25116 coagulation factor II (thrombin) hsa04020:Calcium signaling

15 receptor pathway,hsa04080:Neuroactive ligand-receptor interaction,hsa04144:Endocytosis,hsa04610:Complemen t and coagulation cascades,hsa04810:Regulation of actin cytoskeleton, P05093 cytochrome P450, family 17, hsa00140:Steroid hormone biosynthesis, subfamily A, polypeptide 1 P00533 epidermal growth factor receptor hsa04010:MAPK signaling pathway,hsa04012:ErbB (erythroblastic leukemia viral (v- signaling pathway,hsa04020:Calcium signaling erb-b) oncogene homolog, avian) pathway,hsa04060:Cytokine-cytokine receptor interaction,hsa04144:Endocytosis,hsa04320:Dorso- ventral axis formation,hsa04510:Focal adhesion,hsa04520:Adherens junction,hsa04540:Gap junction,hsa04810:Regulation of actin cytoskeleton,hsa04912:GnRH signaling pathway,hsa05120:Epithelial cell signaling in Helicobacter pylori infection,hsa05200:Pathways in cancer,hsa05210:Colorectal cancer,hsa05212:Pancreatic cancer,hsa05213:Endometrial cancer,hsa05214:Glioma,hsa05215:Prostate cancer,hsa05218:Melanoma,hsa05219:Bladder cancer,hsa05223:Non-small cell lung cancer, P05413 fatty acid binding protein 3, muscle hsa03320:PPAR signaling pathway, and heart (mammary-derived growth inhibitor) P49327 fatty acid synthase hsa00061:Fatty acid biosynthesis,hsa04910:Insulin signaling pathway, P51570 1 hsa00052:Galactose metabolism,hsa00520:Amino sugar and nucleotide sugar metabolism, P09211 glutathione S- pi 1 hsa00480:Glutathione metabolism,hsa00980:Metabolism of xenobiotics by cytochrome P450,hsa00982:Drug metabolism,hsa05200:Pathways in cancer,hsa05215:Prostate cancer, P49841 glycogen synthase kinase 3 beta hsa04012:ErbB signaling pathway,hsa04062:Chemokine signaling pathway,hsa04110:Cell cycle,hsa04310:Wnt signaling pathway,hsa04340:Hedgehog signaling pathway,hsa04360:Axon guidance,hsa04510:Focal adhesion,hsa04660:T cell receptor signaling pathway,hsa04662:B cell receptor signaling pathway,hsa04722:Neurotrophin signaling pathway,hsa04910:Insulin signaling pathway,hsa04916:Melanogenesis,hsa05010:Alzheimer' s disease,hsa05200:Pathways in cancer,hsa05210:Colorectal cancer,hsa05213:Endometrial cancer,hsa05215:Prostate cancer,hsa05217:Basal cell carcinoma, P42345 mechanistic target of rapamycin hsa04012:ErbB signaling pathway,hsa04150:mTOR (serine/threonine kinase) signaling pathway,hsa04910:Insulin signaling pathway,hsa04920:Adipocytokine signaling pathway,hsa04930:Type II diabetes mellitus,hsa05200:Pathways in cancer,hsa05214:Glioma,hsa05215:Prostate cancer,hsa05221:Acute myeloid leukemia, P08581 met proto-oncogene (hepatocyte hsa04060:Cytokine-cytokine receptor growth factor receptor) interaction,hsa04144:Endocytosis,hsa04360:Axon guidance,hsa04510:Focal adhesion,hsa04520:Adherens junction,hsa05120:Epithelial cell signaling in Helicobacter pylori infection,hsa05200:Pathways in cancer,hsa05210:Colorectal cancer,hsa05211:Renal cell carcinoma,hsa05218:Melanoma, Q15759 mitogen-activated 11 hsa04010:MAPK signaling pathway,hsa04370:VEGF signaling pathway,hsa04620:Toll-like receptor signaling

16 pathway,hsa04621:NOD-like receptor signaling pathway,hsa04622:RIG-I-like receptor signaling pathway,hsa04660:T cell receptor signaling pathway,hsa04664:Fc epsilon RI signaling pathway,hsa04670:Leukocyte transendothelial migration,hsa04722:Neurotrophin signaling pathway,hsa04912:GnRH signaling pathway,hsa04914:Progesterone-mediated oocyte maturation,hsa05014:Amyotrophic lateral sclerosis (ALS),hsa05120:Epithelial cell signaling in Helicobacter pylori infection, P04150 nuclear receptor subfamily 3, group hsa04080:Neuroactive ligand-receptor interaction, C, member 1 (glucocorticoid receptor) O00482 nuclear receptor subfamily 5, group hsa04950:Maturity onset diabetes of the young, A, member 2 P41146 opiate receptor-like 1 hsa04080:Neuroactive ligand-receptor interaction, Q9NQU5 p21 protein (Cdc42/Rac)-activated hsa04012:ErbB signaling pathway,hsa04360:Axon kinase 6 guidance,hsa04510:Focal adhesion,hsa04660:T cell receptor signaling pathway,hsa04810:Regulation of actin cytoskeleton,hsa05211:Renal cell carcinoma, P37231 peroxisome proliferator-activated hsa03320:PPAR signaling receptor gamma pathway,hsa05016:Huntington's disease,hsa05200:Pathways in cancer,hsa05216:Thyroid cancer, P35558 phosphoenolpyruvate hsa00010:Glycolysis / carboxykinase 1 (soluble) Gluconeogenesis,hsa00020:Citrate cycle (TCA cycle),hsa00620:Pyruvate metabolism,hsa03320:PPAR signaling pathway,hsa04910:Insulin signaling pathway,hsa04920:Adipocytokine signaling pathway, P06401 progesterone receptor hsa04114:Oocyte meiosis,hsa04914:Progesterone- mediated oocyte maturation, P61586 ras homolog family, member hsa04062:Chemokine signaling A pathway,hsa04270:Vascular smooth muscle contraction,hsa04310:Wnt signaling pathway,hsa04350:TGF-beta signaling pathway,hsa04360:Axon guidance,hsa04510:Focal adhesion,hsa04520:Adherens junction,hsa04530:Tight junction,hsa04660:T cell receptor signaling pathway,hsa04670:Leukocyte transendothelial migration,hsa04722:Neurotrophin signaling pathway,hsa04810:Regulation of actin cytoskeleton,hsa05130:Pathogenic Escherichia coli infection,hsa05200:Pathways in cancer, P63000 ras-related C3 botulinum toxin hsa04010:MAPK signaling substrate 1 (rho family, small GTP pathway,hsa04062:Chemokine signaling binding protein Rac1) pathway,hsa04310:Wnt signaling pathway,hsa04360:Axon guidance,hsa04370:VEGF signaling pathway,hsa04510:Focal adhesion,hsa04520:Adherens junction,hsa04620:Toll- like receptor signaling pathway,hsa04650:Natural killer cell mediated cytotoxicity,hsa04662:B cell receptor signaling pathway,hsa04664:Fc epsilon RI signaling pathway,hsa04666:Fc gamma R-mediated phagocytosis,hsa04670:Leukocyte transendothelial migration,hsa04722:Neurotrophin signaling pathway,hsa04810:Regulation of actin cytoskeleton,hsa05014:Amyotrophic lateral sclerosis (ALS),hsa05120:Epithelial cell signaling in Helicobacter pylori infection,hsa05200:Pathways in cancer,hsa05210:Colorectal cancer,hsa05211:Renal cell carcinoma,hsa05212:Pancreatic cancer,hsa05416:Viral

17 myocarditis, P10276 retinoic acid receptor, alpha hsa05200:Pathways in cancer,hsa05221:Acute myeloid leukemia, P19793 retinoid X receptor, alpha hsa03320:PPAR signaling pathway,hsa04920:Adipocytokine signaling pathway,hsa05200:Pathways in cancer,hsa05216:Thyroid cancer,hsa05222:Small cell lung cancer,hsa05223:Non-small cell lung cancer, P23443 ribosomal protein S6 kinase, hsa04012:ErbB signaling pathway,hsa04150:mTOR 70kDa, polypeptide 1 signaling pathway,hsa04350:TGF-beta signaling pathway,hsa04666:Fc gamma R-mediated phagocytosis,hsa04910:Insulin signaling pathway,hsa05221:Acute myeloid leukemia, O00141 serum/glucocorticoid regulated hsa04960:Aldosterone-regulated sodium reabsorption, kinase 1 Q9NYA1 kinase 1 hsa00600:Sphingolipid metabolism,hsa04020:Calcium signaling pathway,hsa04370:VEGF signaling pathway,hsa04666:Fc gamma R-mediated phagocytosis, P21453 sphingosine-1-phosphate receptor hsa04080:Neuroactive ligand-receptor interaction, 1 O14717 tRNA aspartic acid hsa00270:Cysteine and methionine metabolism, methyltransferase 1 P36897 transforming growth factor, beta hsa04010:MAPK signaling pathway,hsa04060:Cytokine- receptor 1 cytokine receptor interaction,hsa04144:Endocytosis,hsa04350:TGF-beta signaling pathway,hsa04520:Adherens junction,hsa05200:Pathways in cancer,hsa05210:Colorectal cancer,hsa05212:Pancreatic cancer,hsa05220:Chronic myeloid leukemia, P01116 v-Ki-ras2 Kirsten rat sarcoma viral hsa04010:MAPK signaling pathway,hsa04012:ErbB oncogene homolog signaling pathway,hsa04062:Chemokine signaling pathway,hsa04320:Dorso-ventral axis formation,hsa04360:Axon guidance,hsa04370:VEGF signaling pathway,hsa04530:Tight junction,hsa04540:Gap junction,hsa04650:Natural killer cell mediated cytotoxicity,hsa04660:T cell receptor signaling pathway,hsa04662:B cell receptor signaling pathway,hsa04664:Fc epsilon RI signaling pathway,hsa04720:Long-term potentiation,hsa04722:Neurotrophin signaling pathway,hsa04730:Long-term depression,hsa04810:Regulation of actin cytoskeleton,hsa04910:Insulin signaling pathway,hsa04912:GnRH signaling pathway,hsa04914:Progesterone-mediated oocyte maturation,hsa04916:Melanogenesis,hsa04960:Aldoster one-regulated sodium reabsorption,hsa05200:Pathways in cancer,hsa05210:Colorectal cancer,hsa05211:Renal cell carcinoma,hsa05212:Pancreatic cancer,hsa05213:Endometrial cancer,hsa05214:Glioma,hsa05215:Prostate cancer,hsa05216:Thyroid cancer,hsa05218:Melanoma,hsa05219:Bladder cancer,hsa05220:Chronic myeloid leukemia,hsa05221:Acute myeloid leukemia,hsa05223:Non-small cell lung cancer, P31749 v-akt murine thymoma viral hsa04010:MAPK signaling pathway,hsa04012:ErbB oncogene homolog 1 signaling pathway,hsa04062:Chemokine signaling pathway,hsa04150:mTOR signaling pathway,hsa04210:Apoptosis,hsa04370:VEGF signaling pathway,hsa04510:Focal adhesion,hsa04530:Tight

18 junction,hsa04620:Toll-like receptor signaling pathway,hsa04630:Jak-STAT signaling pathway,hsa04660:T cell receptor signaling pathway,hsa04662:B cell receptor signaling pathway,hsa04664:Fc epsilon RI signaling pathway,hsa04666:Fc gamma R-mediated phagocytosis,hsa04722:Neurotrophin signaling pathway,hsa04910:Insulin signaling pathway,hsa04914:Progesterone-mediated oocyte maturation,hsa04920:Adipocytokine signaling pathway,hsa05200:Pathways in cancer,hsa05210:Colorectal cancer,hsa05211:Renal cell carcinoma,hsa05212:Pancreatic cancer,hsa05213:Endometrial cancer,hsa05214:Glioma,hsa05215:Prostate cancer,hsa05218:Melanoma,hsa05220:Chronic myeloid leukemia,hsa05221:Acute myeloid leukemia,hsa05222:Small cell lung cancer,hsa05223:Non-small cell lung cancer, Q15303 v-erb-a erythroblastic leukemia viral hsa04012:ErbB signaling pathway,hsa04020:Calcium oncogene homolog 4 signaling pathway,hsa04144:Endocytosis, P04626 v-erb-b2 erythroblastic leukemia hsa04012:ErbB signaling pathway,hsa04020:Calcium viral oncogene homolog 2, signaling pathway,hsa04510:Focal neuro/glioblastoma derived adhesion,hsa04520:Adherens oncogene homolog junction,hsa05200:Pathways in cancer,hsa05212:Pancreatic cancer,hsa05213:Endometrial cancer,hsa05215:Prostate cancer,hsa05219:Bladder cancer,hsa05223:Non-small cell lung cancer,

19 Table S3. Statistics of the novel associations identified by SVD-CCA. The binding sites on essential proteins are ranked by the total number of associations to drugs and side- effects. The is named by its PDB identifier followed by the ligand ID co- crystallized in that structure. The number of drugs and that of side-effectes associated with this binding site are listed in column 3 and column 4. Some structures are complexes with two proteins and their names are listed in column 5 and 6. The 2768 specific associations involve 520 unique pairs of drug and essential proteins. Only four of these pairs are known drug target pairs found in DrugBank. They are peroxisome proliferator-activated receptor gamma (PPAR):telmisartan, PPAR:rosiglitazone, PPAR:pioglitazone, and retinoic acid receptor RXR-alpha:bexarotene.

Binding Sites #Associations #Drugs # Side- Protein name effects 3gn8_DEX 257 22 48 Glucocorticoid receptor 2 Nuclear receptor coactivator 2 3e00_REA 257 33 42 Retinoic acid receptor RXR- Peroxisome proliferator- alpha activated receptor gamma 1sr7_MOF 252 20 53 Progesterone receptor 3k22_JZS 225 20 47 Glucocorticoid receptor Transcriptional Intermediary Factor 2 2hk5_1BM 135 26 32 Tyrosine-protein kinase HCK 3jpt_GFC 123 33 23 DNA beta 3ugc_046 111 24 25 Tyrosine-protein kinase JAK2 3oll_EST 104 6 30 Estrogen receptor beta Nuclear receptor coactivator 1 3kmr_EQN 104 6 30 Retinoic acid receptor alpha Nuclear receptor coactivator 1 3bej_MUF 102 31 16 Bile acid receptor Nuclear receptor coactivator 1 2p54_735 87 11 38 Peroxisome proliferator- Nuclear receptor activated receptor alpha coactivator 1 3b0t_MCZ 85 11 26 Vitamin D3 receptor 3tx7_P6L 64 6 21 Nuclear receptor subfamily 5 Catenin beta-1 group A member 2 1hy7_MBS 63 16 18 Stromelysin-1/matrix metalloproteinase-3 3l0l_HC3 58 16 12 Nuclear receptor ROR- SCR2-2 gamma 1xiu_REA 55 8 18 RXR-like protein Nuclear receptor coactivator 1 2bu7_TF3 52 15 15 Pyruvate dehydrogenase kinase isoenzyme 2 1h2k_OGA 52 11 17 Factor inhibiting HIF1 Hypoxia-inducible factor 1 alpha 3v2y_ML5 51 10 13 Sphingosine 1-phosphate receptor 1 1d2s_DHT 49 8 24 Sex hormone binding globulin 3l0e_G58 44 14 13 Oxysterols receptor LXR- Nuclear receptor beta coactivator 2 2hzq_STR 43 6 22 Apolipoprotein D 4don_3PF 38 15 11 Bromodomain-containing protein 4 3v5q_0F4 33 9 12 NT-3 growth factor receptor 3swz_TOK 26 8 11 Steroid 17-alpha- hydroxylase/17

20 3gen_B43 23 14 6 Tyrosine-protein kinase BTK 2isp_GGH 23 7 15 Polymerase (DNA directed) 4hcu_13L 21 6 15 Tyrosine-protein kinase ITK/TSK 3vw9_HPJ 21 11 15 Lactoylglutathione 2y6d_TQJ 21 7 12 MATRILYSIN 4h75_M3L 19 8 12 Spindlin-1 Histone H3 3ew8_B3N 19 6 11 Histone deacetylase 8 3u5s_FAD 18 8 11 FAD-linked sulfhydryl oxidase ALR 4iqr_MYR 17 9 9 Hepatocyte nuclear factor 4- alpha 1mqb_ANP 15 3 9 Ephrin type-A receptor 2 3p1f_3PF 12 10 6 CREB-binding protein 3vhe_42Q 11 3 9 Vascular endothelial growth factor receptor 2 2q3y_1CA 11 1 11 Ancestral Corticiod Receptor Nuclear receptor 0B2 4alg_1GH 10 7 5 Bromodomain-containing protein 2 3feg_HC7 9 6 6 Choline/ethanolamine kinase 3s92_JQ1 8 2 7 Bromodomain-containing protein 3 3tjm_7FA 7 5 5 Fatty acid synthase 1us0_NDP 7 1 6 Aldose reductase 4flp_JQ1 6 4 5 Bromodomain testis-specific protein 2fo0_P16 4 2 4 Proto-oncogene tyrosine- protein kinase ABL1 (1B ISOFORM) 2e0a_ANP 4 3 3 Pyruvate dehydrogenase kinase isozyme 4 3sxs_PP2 3 1 3 Cytoplasmic tyrosine-protein kinase BMX 3lzb_ITI 3 1 3 Epidermal growth factor receptor 1z6t_ADP 3 2 2 Apoptotic protease activating factor 1 2ocf_EST 2 2 2 Estrogen receptor alpha Fibronectin

21 Table S4. We have extracted 2768 sets of relationships between drug, protein and side- effect by analyzing significantly related attributes extracted from SVD-CCA. The 2768 relationships involve 99 drugs, 53 and 77 side-effects, resulting 520 unique pairs of drug and protein, 826 unique pairs of gene and side-effect. A total of 23 unique pairs of drug and protein are found in our IC50-Assay-Dataset. The IC50 values of the 23 pairs show high-affinity binding (IC50 <12uM) between the drugs and the proteins, corresponding to their predicted scores. The 23 pairs span in 120 sets of relationships of drug, protein and side-effect.

Drug Name DrugBank ID Target Predicted Observed Observed side- UniProt Score IC50(nM) effects TROGLITAZONE DB00197 Q15596 -2.543 300 C0151576 DIGOXIN DB00390 Q15596 -8.345 2000 C0018799 DIGOXIN DB00390 Q15596 -8.345 2000 C0022116 DIGOXIN DB00390 Q15596 -8.345 2000 C0042514 PROGESTERONE DB00396 Q15596 -3.789 37 C0019572 PROGESTERONE DB00396 Q15596 -3.789 37 C0035204 PROGESTERONE DB00396 Q15596 -3.789 37 C0149745 PROGESTERONE DB00396 Q15596 -3.789 37 C0333641 PROGESTERONE DB00396 Q15596 -3.789 37 C0851341 ROSIGLITAZONE DB00412 P37231 -10.331 3.8 C0020433 ROSIGLITAZONE DB00412 P37231 -10.331 3.8 C0022116 ROSIGLITAZONE DB00412 P37231 -10.331 3.8 C0032226 ROSIGLITAZONE DB00412 P37231 -10.331 3.8 C0151744 ROSIGLITAZONE DB00412 P37231 -10.331 3.8 C0234632 RALOXIFENE DB00481 Q15596 -3.531 0.3 C0574067 RALOXIFENE DB00481 Q15596 -3.531 0.3 C0574067 TESTOSTERONE DB00624 Q15596 -4.245 2.7 C0017168 TESTOSTERONE DB00624 Q15596 -4.245 2.7 C0019572 TESTOSTERONE DB00624 Q15596 -4.245 2.7 C0035204 TESTOSTERONE DB00624 Q15596 -4.245 2.7 C0333641 TAMOXIFEN DB00675 Q15596 -4.358 8.4 C0022408 TAMOXIFEN DB00675 Q15596 -4.358 8.4 C0022408 TAMOXIFEN DB00675 Q15596 -4.358 8.4 C0086543 TAMOXIFEN DB00675 Q15596 -4.358 8.4 C0086543 TAMOXIFEN DB00675 Q15596 -4.358 8.4 C0311468 TAMOXIFEN DB00675 Q15596 -4.358 8.4 C0311468 TAMOXIFEN DB00675 Q15596 -4.358 8.4 C0574067 TAMOXIFEN DB00675 Q15596 -4.358 8.4 C0574067 TELMISARTAN DB00966 P37231 -9.822 3400 C0003864 TELMISARTAN DB00966 P37231 -9.822 3400 C0017168 TELMISARTAN DB00966 P37231 -9.822 3400 C0022408 TELMISARTAN DB00966 P37231 -9.822 3400 C0025222 TELMISARTAN DB00966 P37231 -9.822 3400 C0027121 TELMISARTAN DB00966 P37231 -9.822 3400 C0035204 TELMISARTAN DB00966 P37231 -9.822 3400 C0151576 TELMISARTAN DB00966 P37231 -9.822 3400 C0857121 TELMISARTAN DB00966 Q15596 -3.611 12200 C0017168 TELMISARTAN DB00966 Q15596 -3.611 12200 C0022408 TELMISARTAN DB00966 Q15596 -3.611 12200 C0035204 BICALUTAMIDE DB01128 Q15596 -6.638 25 C0019572 BICALUTAMIDE DB01128 Q15596 -6.638 25 C0019572 BICALUTAMIDE DB01128 Q15596 -6.638 25 C0086543 BICALUTAMIDE DB01128 Q15596 -6.638 25 C0086543 BICALUTAMIDE DB01128 Q15596 -6.638 25 C0311468 BICALUTAMIDE DB01128 Q15596 -6.638 25 C0311468 BICALUTAMIDE DB01128 Q15596 -6.638 25 C0574067 BICALUTAMIDE DB01128 Q15596 -6.638 25 C0574067

22 PIOGLITAZONE DB01132 P37231 -6.163 140 C0022116 PIOGLITAZONE DB01132 P37231 -6.163 140 C0151576 PIOGLITAZONE DB01132 P37231 -6.163 140 C0151744 DEXAMETHASONE DB01234 Q15596 -10.244 22 C0019572 DEXAMETHASONE DB01234 Q15596 -10.244 22 C0019572 DEXAMETHASONE DB01234 Q15596 -10.244 22 C0022408 DEXAMETHASONE DB01234 Q15596 -10.244 22 C0022408 DEXAMETHASONE DB01234 Q15596 -10.244 22 C0086543 DEXAMETHASONE DB01234 Q15596 -10.244 22 C0086543 DEXAMETHASONE DB01234 Q15596 -10.244 22 C0333641 DEXAMETHASONE DB01234 Q15596 -10.244 22 C0333641 DASATINIB DB01254 Q06187 -6.455 1.3 C0027121

23 Methods of SVD-CCA

Proof of the maximizing can be achieved by setting , when S 1 is the largest singular value.α X Yβ = ; =

We have = +

= +

= [, . . , ] … … and:

, , = 0, = 0, = 1, = 1 = 1 = 1 then

|| = + + + = + ;

|| = + ; Therefore can be written as: α X Yβ

( ) α X Yβ = ( + ) ∙ [, . . , , … , …] … + …

=[ + + , … + + ] … … +

0 … =[ 0 … , … 0, , 0 … ] … 0 … = (1) +

24 Below we prove that choosing (supposing is the max among all ) will maximize = ; = +

( − ) + − > 0 Then

+ − 2 + + − 2 > 0

+ + + > 2 + 2

Therefore:

≥ + According to equation (1)

≥ α X Yβ

25