Electronic Supplementary Material (ESI) for Physical Chemistry Chemical Physics. This journal is © the Owner Societies 2019
Supplementary Information
Development and Application of a Comprehensive Machine Learning Program for Predicting Molecular Biochemical and Pharmacological Properties
Hwanho Choi, Hongsuk Kang, Kee-Choo Chung, and Hwangseo Park*
Sejong Battery Institute, Sejong University, 209 Neungdong-ro, Kwangjin-gu, Seoul 05006, South Korea
*Author to whom correspondence should be addressed.
Chemical structures of the molecules in the pKi (thrombin), logD, PCaco-2, and CLint datasets.
Linear correlation diagrams between the experimental and calculated logD values of 53 SAMPL5 molecules in the 5-fold external cross-validation.
Linear correlation diagrams between the experimental and calculated logCLint values of 37 drug molecules in the 5-fold external cross-validation.
Table S1. Chemical structures of 88 thrombin inhibitors. Indicated in red are the molecules in the training set.
[Structure drawings of compounds 01 to 88; images not reproduced.]
Table S2. Chemical structures of 53 SAMPL5 molecules.
SAMPL5_002 SAMPL5_003 SAMPL5_004
SAMPL5_005 SAMPL5_006 SAMPL5_007
SAMPL5_010 SAMPL5_011 SAMPL5_013
SAMPL5_015 SAMPL5_017 SAMPL5_019
SAMPL5_020 SAMPL5_021 SAMPL5_024
SAMPL5_026 SAMPL5_027 SAMPL5_033
SAMPL5_037 SAMPL5_042 SAMPL5_044
SAMPL5_045 SAMPL5_046 SAMPL5_047
SAMPL5_048 SAMPL5_049 SAMPL5_050
SAMPL5_055 SAMPL5_056 SAMPL5_058
SAMPL5_059 SAMPL5_060 SAMPL5_061
SAMPL5_063 SAMPL5_065 SAMPL5_067
SAMPL5_068 SAMPL5_069 SAMPL5_070
SAMPL5_071 SAMPL5_072 SAMPL5_074
SAMPL5_075 SAMPL5_080 SAMPL5_081
SAMPL5_082 SAMPL5_083 SAMPL5_084
SAMPL5_085 SAMPL5_086 SAMPL5_088
SAMPL5_090 SAMPL5_092
Table S3. Chemical structures of 38 drug molecules in the Caco-2 dataset. Indicated in red are the molecules in the training set.
Phenytoin Acyclovir Alprenolol
Atenolol Caffeine Chlorpromazine
Clonidine Desipramine Diazepam
Ganciclovir Indomethacin Labetalol
Metoprolol Pindolol Propranolol
Terbutaline Timolol Dexamethasone
Corticosterone Hydrocortisone Estradiol
Sucrose Progesterone Aminophenazone
Testosterone Mannitol Phencyclidine
Zidovudine Nadolol Nicotine
Bremazocine Scopolamine Sulfasalazine
Meloxicam Warfarin
Table S4. Chemical structures of 37 drug molecules in the CLint dataset.
Phenytoin Metoprolol Amitriptyline
Omeprazole Diltiazem Clozapine
Alprazolam Zolpidem Imipramine
Tenoxicam Methohexital Chlorpromazine
Methoxsalen Propranolol Diclofenac
Nicardipine Prednisone Verapamil
Midazolam Diazepam Triazolam
Quinidine Ibuprofen Diphenhydramine
Fluvastatin Tolbutamide Desipramine
Propafenone Dexamethasone Amobarbital
Hexobarbital Phenacetin Nilvadipine
Lorcainide FK1052 FK480
Tenidap
Figure S1. Linear correlation diagrams between the experimental and calculated logD values of the molecules in the training (black) and test (red) sets.
Fold #1
Compounds in the training set: 002, 003, 004, 006, 007, 010, 011, 015, 017, 019, 021, 024, 026, 027, 037, 042, 045, 046, 048, 049, 050, 055, 056, 058, 059, 060, 063, 065, 070, 072, 074, 075, 081, 082, 083, 084, 088, 090, 092
Compounds in the test set: 005, 013, 020, 033, 044, 047, 061, 067, 068, 069, 071, 080, 085, 086
Figure S2. Linear correlation diagrams between the experimental and calculated logD values of the molecules in the training (black) and test (red) sets.
Fold #2
Compounds in the training set: 003, 005, 006, 007, 010, 011, 015, 017, 019, 020, 024, 027, 033, 037, 042, 044, 045, 048, 049, 055, 058, 059, 060, 061, 065, 067, 068, 069, 072, 074, 075, 080, 082, 083, 084, 085, 088, 090, 092
Compounds in the test set: 002, 004, 013, 021, 026, 046, 047, 050, 056, 063, 070, 071, 081, 086
Figure S3. Linear correlation diagrams between the experimental and calculated logD values of the molecules in the training (black) and test (red) sets.
Fold #3
Compounds in the training set: 002, 003, 004, 005, 006, 007, 010, 011, 013, 015, 017, 020, 021, 024, 026, 027, 033, 037, 044, 046, 047, 048, 049, 050, 055, 058, 059, 063, 065, 069, 071, 080, 081, 082, 083, 084, 085, 088, 090
Compounds in the test set: 019, 042, 045, 056, 060, 061, 067, 068, 070, 072, 074, 075, 086, 092
Figure S4. Linear correlation diagrams between the experimental and calculated logD values of the molecules in the training (black) and test (red) sets.
Fold #4
Compounds in the training set: 002, 003, 006, 007, 010, 011, 013, 015, 017, 019, 021, 026, 027, 033, 037, 042, 044, 045, 046, 047, 055, 056, 058, 059, 060, 061, 063, 068, 070, 071, 072, 074, 080, 082, 083, 085, 086, 088, 090
Compounds in the test set: 004, 005, 020, 024, 048, 049, 050, 065, 067, 069, 075, 081, 084, 092
Figure S5. Linear correlation diagrams between the experimental and calculated logD values of the molecules in the training (black) and test (red) sets.
Fold #5
Compounds in the training set: 002, 003, 005, 006, 007, 010, 013, 015, 017, 019, 020, 024, 026, 027, 033, 037, 042, 044, 045, 048, 049, 056, 059, 061, 063, 065, 068, 069, 070, 071, 074, 075, 080, 082, 083, 084, 085, 090, 092
Compounds in the test set: 004, 011, 021, 046, 047, 050, 055, 058, 060, 067, 072, 081, 086, 088
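The fold compositions listed above can be sanity-checked with a few lines of code. The following is a minimal sketch, not part of the original workflow: it takes the Fold #5 compound IDs as listed (SAMPL5_ prefix omitted) and confirms that the training and test sets do not overlap and together account for all 53 molecules. The helper name `check_split` is hypothetical and introduced only for illustration.

```python
# Fold #5 compound IDs as listed for the logD dataset (SAMPL5_ prefix omitted).
train = (
    "002 003 005 006 007 010 013 015 017 019 020 024 026 027 033 037 042 044 "
    "045 048 049 056 059 061 063 065 068 069 070 071 074 075 080 082 083 084 "
    "085 090 092"
).split()
test = "004 011 021 046 047 050 055 058 060 067 072 081 086 088".split()


def check_split(train_ids, test_ids):
    """Return (n_train, n_test, overlap) for one train/test split.

    `check_split` is a hypothetical helper, not a function from the paper.
    """
    train_set, test_set = set(train_ids), set(test_ids)
    # A duplicated ID within one list would silently shrink the set size.
    if len(train_set) != len(train_ids) or len(test_set) != len(test_ids):
        raise ValueError("duplicate compound ID within a list")
    return len(train_set), len(test_set), sorted(train_set & test_set)


n_train, n_test, overlap = check_split(train, test)
print(n_train, n_test, overlap)  # 39 14 []
```

The check is applied to one fold at a time because the listed test sets are not mutually exclusive across folds; compound 086, for example, appears in the test sets of Folds #1, #2, #3, and #5.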
Figure S6. Linear correlation diagrams between the experimental and calculated logCL values of the molecules in the training (black) and test (red) sets.
Fold #1
Compounds in the training set: phenytoin, amitriptyline, omeprazole, clozapine, zolpidem, tenoxicam, methohexital, chlorpromazine, methoxsalen, propranolol, diclofenac, nicardipine, prednisone, verapamil, midazolam, diazepam, triazolam, quinidine, ibuprofen, diphenhydramine, fluvastatin, tolbutamide, desipramine, amobarbital, phenacetin, FK1052, lorcainide, tenidap
Compounds in the test set: metoprolol, diltiazem, alprazolam, imipramine, propafenone, dexamethasone, hexobarbital, nilvadipine, FK480
Figure S7. Linear correlation diagrams between the experimental and calculated logCL values of the molecules in the training (black) and test (red) sets.
Fold #2
Compounds in the training set: metoprolol, omeprazole, diltiazem, clozapine, alprazolam, imipramine, tenoxicam, methohexital, methoxsalen, propranolol, diclofenac, nicardipine, verapamil, midazolam, diazepam, triazolam, quinidine, ibuprofen, fluvastatin, desipramine, propafenone, dexamethasone, amobarbital, hexobarbital, phenacetin, nilvadipine, FK480, lorcainide
Compounds in the test set: phenytoin, amitriptyline, zolpidem, chlorpromazine, prednisone, diphenhydramine, tolbutamide, FK1052, tenidap
Figure S8. Linear correlation diagrams between the experimental and calculated logCL values of the molecules in the training (black) and test (red) sets.
Fold #3
Compounds in the training set: phenytoin, amitriptyline, omeprazole, diltiazem, alprazolam, zolpidem, imipramine, tenoxicam, methoxsalen, propranolol, nicardipine, prednisone, verapamil, midazolam, diazepam, triazolam, quinidine, ibuprofen, diphenhydramine, fluvastatin, tolbutamide, desipramine, propafenone, amobarbital, phenacetin, nilvadipine, FK1052, FK480
Compounds in the test set: metoprolol, clozapine, methohexital, chlorpromazine, diclofenac, dexamethasone, hexobarbital, lorcainide, tenidap
Figure S9. Linear correlation diagrams between the experimental and calculated logCL values of the molecules in the training (black) and test (red) sets.
Fold #4
Compounds in the training set: metoprolol, amitriptyline, omeprazole, clozapine, alprazolam, zolpidem, tenoxicam, methohexital, chlorpromazine, methoxsalen, propranolol, diclofenac, nicardipine, prednisone, verapamil, diazepam, triazolam, quinidine, ibuprofen, fluvastatin, desipramine, propafenone, dexamethasone, amobarbital, phenacetin, nilvadipine, FK1052, tenidap
Compounds in the test set: phenytoin, diltiazem, imipramine, midazolam, diphenhydramine, tolbutamide, hexobarbital, FK480, lorcainide
Figure S10. Linear correlation diagrams between the experimental and calculated logCL values of the molecules in the training (black) and test (red) sets.
Fold #5
Compounds in the training set: phenytoin, metoprolol, amitriptyline, omeprazole, diltiazem, zolpidem, imipramine, tenoxicam, methohexital, chlorpromazine, methoxsalen, diclofenac, prednisone, midazolam, triazolam, quinidine, diphenhydramine, fluvastatin, tolbutamide, desipramine, dexamethasone, amobarbital, hexobarbital, nilvadipine, FK1052, FK480, lorcainide, tenidap
Compounds in the test set: clozapine, alprazolam, propranolol, nicardipine, verapamil, diazepam, ibuprofen, propafenone, phenacetin
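For readers who wish to summarize a correlation diagram of this kind numerically, the sketch below computes the Pearson correlation coefficient and the root-mean-square error between experimental and calculated values. This supplement does not state which statistics accompany the diagrams, so the choice of these two measures is an assumption, and the numbers in the example are hypothetical placeholders rather than data from this work.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rmse(x, y):
    """Root-mean-square error between two equal-length sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Hypothetical placeholder values for illustration only; NOT results from this work.
experimental = [1.0, 2.0, 3.0, 4.0]
calculated = [1.1, 1.9, 3.2, 3.8]

print(f"r = {pearson_r(experimental, calculated):.3f}")
print(f"RMSE = {rmse(experimental, calculated):.3f}")
```

Evaluating the two functions separately on the training and test compounds of each fold would give one pair of values per set, matching the black and red points of a diagram.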