Supporting Information: Eﬃcient Multi-Task Chemogenomics for Drug Speciﬁcity Prediction

Supporting Information: Efficient Multi-task chemogenomics for drug specificity prediction Benoit Playe,†,¶ Chloé-Agathe Azencott,†,¶ and Véronique Stoven∗,†,¶ †Mines ParisTech, PSL Research University, Centre for Computational Biology, 35 Rue Saint-Honoré,F-77305 Fontainebleau Cedex, France ‡Institut Curie F-75248 Paris, France E-mail: [email protected] Supplementary Figure F1 1.00 ROC-AUC 0.95 AUPR 0.90 0.85 Scores 0.80 0.75 S1 S2 S3 S4 Figure 1: Scores of MT SVM chemogenomics on S1 − S4 settings obtained with the 5-fold CV scheme. 1 Supplementary Figure F2 1.00 1.00 ROC-AUC ROC-AUC 0.98 0.95 AUPR AUPR 0.96 0.90 0.94 0.92 Scores 0.85 Scores 0.90 0.80 0.88 S1 S2 S3 Nested-5foldCV 5foldCV Figure 2: Scores of MT Kernel Ridge Figure 3: Scores of MT Kernel Ridge regression on S1/2/34 in nested- regression on S1 depending on CV 5foldCV scheme . Supplementary Figure F3 1.0 ROC-AUC 0.9 AUPR 0.8 Scores 0.7 0.6 S1' S2' S3' S4' Figure 4: Scores of MT SVM method on S1’, S2’, S3’ and S4’ datasets with nested-5-fold CV 2 Supplementary Figure F4 1.00 ROC-AUC 0.98 AUPR 0.96 0.94 Scores 0.92 0.90 Nested 5-fold CV LOO CV 5-fold CV Figure 5: Scores of the MT method on S1 depending on CV scheme. Overall, all CV schemes provide high prediction performance on this dataset, in the range of 0.93-0.94 in AUC and AUPR. The Nested 5-fold CV leads to performance very close to those of 5-fold CV, showing that on the S1 dataset, 5-fold CV did not suffer from overestimation of the performance due to data over-fitting. LOO CV leads to slightly better results, although very close to those of the other CV schemes. In general, the LOO CV scheme is expected to provide better results because the model is trained on more data points than 5-fold CV. Again, this problem seems to be limited here, since the performance of LOO CV does not differ much from that of Nested 5-fold CV. Supplementary Tables Table 1: scores corresponding to 5-fold cross-validated MT on S1–S4 Setting AUC AUPR S1 93.4 ± 0.11 93.4 ± 0.19 S2 86.3 ± 0.32 87.4 ± 0.35 S3 89.2 ± 0.58 89.9 ± 0.46 S4 81.0 ± 2.08 82.2 ± 2.40 3 Table 2: scores corresponding to MT on S1 depending on the CV scheme CV scheme AUC AUPR Nested-5-fold CV 93.3 ± 0.11 93.4 ± 0.19 5-fold CV 93.3 ± 0.12 93.3 ± 0.20 LOO-CV 94.3 ± 0.17 94.2 ± 0.29 Table 3: scores corresponding to Nested 5-fold cross-validated MT on S1’–S4’ Dataset AUC AUPR S1’ 69.3 ± 5.16 79.8 ± 1.82 S2’ 69.4 ± 0.68 66.9 ± 0.53 S3’ 66.3 ± 0.66 62.4 ± 0.95 S4’ 46.3 ± 2.33 34.1 ± 0.95 Table 4: values corresponding to LOO-CV ligand-based ST and MT-intra on S1 model/nb neg1 2 5 10 50 full in intra task ligand- 91.02 ± 91.39 ± 92.2 ± 92.84 ± 93.43 ± 91.62 ± based 0.26 0.32 0.23 0.2 0.14 0.18 ST KronSVM 94.19 ± 94.72 ± 95.19 ± 95.49 ± 95.59 ± 95.29 ± 0.13 0.18 0.18 0.12 0.09 0.04 Table 5: values corresponding to LOO-CV NN-MT on S1 nb pos/nb neg in NN intra task 1 2 5 10 0 95.49 ± 0.12 95.49 ± 0.12 95.49 ± 0.12 95.49 ± 0.12 1 95.79 ± 0.1 95.82 ± 0.1 95.78 ± 0.1 95.69 ± 0.13 5 96.09 ± 0.15 96.05 ± 0.16 95.87 ± 0.13 95.7 ± 0.11 10 96.2 ± 0.13 96.11 ± 0.16 95.93 ± 0.12 95.7 ± 0.09 50 96.18 ± 0.09 96.18 ± 0.07 96.0 ± 0.08 95.75 ± 0.09 Table 6: values corresponding to LOO-CV RN-MT on S1 nb pos/nb neg in RN extra task 1 2 5 10 0 95.49 ± 0.12 95.49 ± 0.12 95.49 ± 0.12 95.49 ± 0.12 1 95.65 ± 0.14 95.63 ± 0.14 95.59 ± 0.14 95.54 ± 0.14 5 95.89 ± 0.15 95.83 ± 0.15 95.69 ± 0.14 95.52 ± 0.14 10 96.02 ± 0.16 95.93 ± 0.16 95.72 ± 0.15 95.48 ± 0.14 50 95.86 ± 0.17 95.76 ± 0.16 95.55 ± 0.14 95.33 ± 0.14 4 Table 7: values corresponding to LOO-CV MT-intra on S1 with similarity constraint on intra-task pairs θ 1 2 10 50 20 66.37 ± 1.18 66.9 ± 0.57 67.19 ± 0.72 67.34 ± 0.35 30 70.41 ± 0.85 70.87 ± 0.61 71.65 ± 0.54 71.63 ± 0.51 50 72.3 ± 0.52 72.52 ± 0.71 73.79 ± 0.54 73.78 ± 0.59 80 75.55 ± 0.35 76.46 ± 0.43 77.83 ± 0.51 77.49 ± 0.26 Table 8: values corresponding to LOO-CV ligand-based ST on S1 with similarity constraint on intra-task pairs centile/ratio 1 2 10 50 20 66.0 ± 1.08 65.84 ± 1.11 69.62 ± 0.63 72.24 ± 0.36 30 64.86 ± 0.64 65.87 ± 0.42 70.04 ± 0.9 72.3 ± 0.74 50 67.11 ± 0.81 67.35 ± 0.79 71.28 ± 0.81 73.09 ± 0.6 80 69.13 ± 1.02 70.18 ± 0.33 74.09 ± 0.85 75.77 ± 1.24 Table 9: values corresponding to LOO-CV NN-MT on S1 with similarity constraint on intra- task pairs (θ = 20) nb pos/nb neg ratio in NN extra task 1 2 5 0 66.37 ± 1.18 66.37 ± 1.18 66.37 ± 1.18 1 85.64 ± 0.83 84.61 ± 0.73 82.89 ± 0.55 5 85.99 ± 0.76 84.81 ± 0.37 83.15 ± 0.5 10 84.65 ± 0.86 83.09 ± 0.36 81.69 ± 0.52 50 77.83 ± 0.58 77.11 ± 0.39 77.49 ± 0.3 Table 10: values corresponding to LOO-CV NN-MT on S1 with similarity constraint on intra-task pairs (θ = 80) nb pos/nb neg ratio in NN extra task 1 2 5 0 75.55 ± 0.35 75.55 ± 0.35 75.55 ± 0.35 1 86.12 ± 0.34 85.76 ± 0.36 84.87 ± 0.19 5 86.98 ± 0.37 86.52 ± 0.42 87.51 ± 0.22 10 86.82 ± 0.41 86.02 ± 0.44 87.24 ± 0.44 50 82.72 ± 0.49 82.73 ± 0.42 81.66 ± 0.27 5 Table 11: values corresponding to LOO-CV RN-MT on S1 with similarity constraint on intra-task pairs (θ = 20) nb pos/nb neg ratio in RN extra task 1 2 5 0 66.37 ± 1.18 66.37 ± 1.18 66.37 ± 1.18 1 63.83 ± 1.19 63.83 ± 0.81 66.06 ± 0.97 5 65.77 ± 0.89 65.98 ± 0.47 67.31 ± 0.35 10 67.55 ± 1.07 67.06 ± 1.05 65.65 ± 0.4 50 69.64 ± 0.98 70.19 ± 0.88 70.37 ± 0.56 Table 12: values corresponding to LOO-CV RN-MT on S1 with similarity constraint on intra-task pairs (θ = 80) nb pos/nb neg ratio in RN extra task 1 2 5 0 75.55 ± 0.35 75.55 ± 0.35 75.55 ± 0.35 1 72.15 ± 0.25 72.35 ± 0.19 75.51 ± 0.51 5 74.57 ± 0.38 75.73 ± 0.32 78.46 ± 0.36 10 75.78 ± 0.24 75.1 ± 0.63 77.43 ± 0.35 50 76.94 ± 0.44 75.91 ± 0.22 75.31 ± 0.27 Table 13: values corresponding to LOO-CV MT-intra , NN-MT, RN-MT on S1 with similarity constraint on intra-task pairs model/centile 20 30 50 80 MT-intra 66.37 ± 1.18 70.41 ± 0.85 72.3 ± 0.52 75.55 ± 0.35 NN-MT 84.65 ± 0.86 85.68 ± 0.52 86.24 ± 0.45 86.82 ± 0.41 RN-MT 67.55 ± 1.07 70.55 ± 0.66 72.36 ± 0.36 75.78 ± 0.24 Table 14: values corresponding to LOO-CV NN-MT on S1 with similarity constraint on intra- and extra-task pairs (θ = 20) nb pos/nb neg ratio in NN extra task 1 2 5 1 63.39 ± 0.82 64.15 ± 1.3 65.87 ± 0.5 5 64.63 ± 0.84 64.98 ± 0.56 66.02 ± 0.24 10 65.64 ± 0.87 65.61 ± 0.7 64.53 ± 0.85 50 64.88 ± 0.67 64.24 ± 0.82 63.04 ± 0.27 6 Table 15: values corresponding to LOO-CV NN-MT on S1 with similarity constraint on intra- and extra-task pairs (θ = 80) nb pos/nb neg ratio in NN extra task 1 2 5 1 72.14 ± 0.33 71.91 ± 0.37 75.65 ± 0.34 5 73.35 ± 0.09 75.5 ± 0.27 77.58 ± 0.31 10 73.88 ± 0.35 73.83 ± 0.62 75.9 ± 0.59 50 72.49 ± 0.49 72.15 ± 0.54 70.72 ± 0.67 Table 16: values corresponding to LOO-CV RN-MT on S1 with similarity constraint on intra- and extra-task pairs (θ = 20) nb pos/nb neg ratio in RN extra task 1 2 5 1 63.0 ± 0.55 64.24 ± 0.34 65.99 ± 0.64 5 65.68 ± 0.15 64.53 ± 0.92 67.27 ± 1.1 10 66.01 ± 0.74 64.39 ± 1.24 62.99 ± 0.8 50 66.45 ± 0.66 65.75 ± 0.47 62.23 ± 0.66 Table 17: values corresponding to LOO-CV RN-MT on S1 with similarity constraint on intra- and extra-task pairs (θ = 80) nb pos/nb neg ratio in RN extra task 1 2 5 1 72.43 ± 0.11 71.99 ± 0.44 75.48 ± 0.36 5 73.92 ± 0.15 75.48 ± 0.33 78.1 ± 0.28 10 74.5 ± 0.18 73.77 ± 0.57 76.7 ± 0.34 50 73.17 ± 0.27 71.74 ± 0.54 69.14 ± 0.44 Table 18: GPCR dataset: values corresponding to LOO-CV NN-MT with family’s hierarchy based kernel nb pos/nb neg 1 2 5 10 20 ratio in NN extra task 0 96.3 ± 0.42 96.3 ± 0.42 96.3 ± 0.42 96.3 ± 0.42 96.3 ± 0.42 1 96.43±0.34 96.55 ± 0.3 96.77±0.28 96.9 ± 0.28 96.98±0.26 10 96.46 ± 0.2 96.65±0.22 96.84±0.18 97.01±0.18 97.05±0.19 50 96.32±0.35 96.5 ± 0.25 96.66±0.21 96.63±0.17 96.65±0.22 100 95.31±0.27 95.88±0.26 96.57±0.22 96.74±0.19 96.56±0.21 7 Table 19: GPCR dataset: values corresponding to LOO-CV NN-MT with sequence based kernel nb pos/nb neg 1 2 5 10 20 ratio in NN extra task 0 92.68±0.23 92.68±0.23 92.68±0.23 92.68±0.23 92.68±0.23 1 92.92±0.19 93.22±0.22 93.66±0.21 94.06±0.18 94.32±0.19 10 93.55±0.24 93.97±0.24 94.6 ± 0.27 95.14±0.26 95.47±0.37 50 93.04±0.36 93.69±0.27 95.05±0.29 95.67±0.34 95.88±0.41 100 92.25±0.26 93.65±0.29 94.91±0.29 95.54±0.34 95.83±0.39 Table 20: GPCR dataset: values corresponding to LOO-CV RN-MT with family’s hierarchy based kernel nb pos/nb neg 1 2 5 10 20 ratio in RN extra task 0 96.3 ± 0.42 96.3 ± 0.42 96.3 ± 0.42 96.3 ± 0.42 96.3 ± 0.42 1 96.31 ± 0.4 96.32±0.39 96.33±0.39 96.34±0.38 96.37 ± 0.4 10 96.21±0.34 96.23±0.35 96.26±0.35 96.31±0.36 96.52±0.34 50 96.04 ± 0.3 96.19±0.33 96.47±0.39 96.65 ± 0.4 96.79±0.39 100 95.72±0.31 96.08±0.36 96.57±0.45 96.8 ± 0.51 96.99±0.34 Table 21: GPCR dataset: values corresponding to LOO-CV RN-MT with sequence based kernel nb pos/nb neg 1 2 5 10 20 ratio in RN extra task 0 92.68±0.23 92.68±0.23 92.68±0.23 92.68±0.23 92.68±0.23 1 93.23±0.15 93.19±0.13 93.17±0.13 93.14±0.14 93.08±0.16 10 93.8 ± 0.23 93.86±0.19 94.03±0.28 94.13±0.22 94.24±0.22 50 94.08 ± 0.1 94.43±0.11 94.73±0.21 94.76±0.25 95.18±0.27 100 92.76±0.12 93.84±0.13 94.85 ± 0.2 94.9 ± 0.3 95.47±0.27 8 Table 22: Ion Channel dataset: values corresponding to LOO-CV NN-MT with family’s hierarchy based kernel nb pos/nb neg 1 2 5 10 20 ratio in NN extra task 0 96.96±0.25 96.96±0.25 96.96±0.25 96.96±0.25 96.96±0.25 1 97.04±0.27 97.05±0.27 97.1 ± 0.27 97.14±0.26 97.18±0.27 10 97.22±0.22 97.29 ± 0.2 97.38±0.19 97.45±0.16 97.45±0.18 50 96.82±0.15 97.0 ± 0.16 97.28±0.15 97.42±0.15 97.41±0.12 100 96.54±0.19 96.79±0.16 97.24±0.13 97.35 ± 0.1 97.36±0.13 Table 23: Ion Channel dataset: values corresponding to LOO-CV NN-MT with sequence based kernel nb pos/nb neg 1 2 5 10 20 ratio in NN extra

Load more