<<

Electronic Supplementary Material (ESI) for Physical Chemistry Chemical Physics. This journal is © the Owner Societies 2019

Supplementary Information

Development and Application of a Comprehensive Machine Learning Program for Predicting Molecular Biochemical and Pharmacological Properties

Hwanho Choi, Hongsuk Kang, Kee-Choo Chung, and Hwangseo Park*

Sejong Battery Institute, Sejong University, 209 Neungdong-ro, Kwangjin-gu, Seoul 05006, South Korea

*Author to whom correspondence should may be addressed.

[email protected]

Chemical structures of the molecules in pKi (thrombin), logD, PCaco-2, S2-S16

and CLint datasets.

Linear correlation diagrams between the experimental and calculated S17-S21 logD values of 53 SAMPL5 molecules in the 5-fold external cross- validation.

Linear correlation diagrams between the experimental and calculated S22-S26 logCLint values of 37 drug molecules in the 5-fold external cross- validation.

S1 Table S1. Chemical structures of 88 thrombin inhibitors. Indicated in red are the molecules in the training set.

01 02 03

04 05 06

07 08 09

10 11 12

13 14 15

S2 16 17 18

19 20 21

22 23 24

25 26 27

28 29 30

31 32 33

S3 34 35 36

37 38 39

40 41 42

43 44 45

46 47 48

49 50 51

S4 52 53 54

55 56 57

58 59 60

61 62 63

64 65 66

67 68 69

S5 70 71 72

73 74 75

76 77 78

79 80 81

82 83 84

85 86 87

S6 88

S7 Table S2. Chemical structures of 53 SAMPL5 molecules

SAMPL5_002 SAMPL5_003 SAMPL5_004

SAMPL5_005 SAMPL5_006 SAMPL5_007

SAMPL5_010 SAMPL5_011 SAMPL5_013

SAMPL5_015 SAMPL5_017 SAMPL5_019

SAMPL5_020 SAMPL5_021 SAMPL5_024

SAMPL5_026 SAMPL5_027 SAMPL5_033

S8 SAMPL5_037 SAMPL5_042 SAMPL5_044

SAMPL5_045 SAMPL5_046 SAMPL5_047

SAMPL5_048 SAMPL5_049 SAMPL5_050

SAMPL5_055 SAMPL5_056 SAMPL5_058

SAMPL5_059 SAMPL5_060 SAMPL5_061

SAMPL5_063 SAMPL5_065 SAMPL5_067

S9 SAMPL5_068 SAMPL5_069 SAMPL5_070

SAMPL5_071 SAMPL5_072 SAMPL5_074

SAMPL5_075 SAMPL5_080 SAMPL5_081

SAMPL5_082 SAMPL5_083 SAMPL5_084

SAMPL5_085 SAMPL5_086 SAMPL5_088

SAMPL5_090 SAMPL5_092

S10 Table S3. Chemical structures of 38 drug molecules in Caco-2 dataset. Indicated in red are the molecules in the training set.

Salicylic acid

Phenytoin Acyclovir Alprenolol

Atenolol Caffeine

Clonidine

Ganciclovir Indomethacin

S11

Terbutaline Dexamethasone

Corticosterone Hydrocortisone Estradiol

Sucrose

Testosterone Mannitol

Zidovudine Nicotine

S12 Bremazocine Sulfasalazine

Meloxicam Warfarin

S13 Table S4. Chemical structures of 37 drug molecules in CLint dataset

Phenytoin Metoprolol Amitryptyline

Omeprazole Diltiazem

Alprazolam

Tenoxicam Methohexital Chlorpromazine

Methoxsalen Propranolol

Nicardipine Prednisone Verapamil

S14 Diazepam

Quinidine

Fluvastatin Tolbutamide Desipramine

Propafenone Dexamethasone

Hexobarbital Nilvadipine

Lorcainide FK1052 FK480

S15 Tenidap

S16 Figure S1. Linear correlation diagrams between the experimental and calculated results for logD values of the molecules in training (black) and test set (red).

Fold #1

Compounds in the training set: 002, 003, 004, 006, 007, 010, 011, 015, 017, 019, 021, 024, 026, 027, 037, 042, 045, 046, 048, 049, 050, 055, 056, 058, 059, 060, 063, 065, 070, 072, 074, 075, 081, 082, 083, 084, 088, 090, 092 Compounds in the test set: 005, 013, 020, 033, 044, 047, 061, 067, 068, 069, 071, 080, 085, 086

S17 Figure S2. Linear correlation diagrams between the experimental and calculated results for LogD values of the molecules in training (black) and test set (red).

Fold #2

Compounds in the training set: 003, 005, 006, 007, 010, 011, 015, 017, 019, 020, 024, 027, 033, 037, 042, 044, 045, 048, 049, 055, 058, 059, 060, 061, 065, 067, 068, 069, 072, 074, 075, 080, 082, 083, 084, 085, 088, 090, 092 Compounds in the test set: 002, 004, 013, 021, 026, 046, 047, 050, 056, 063, 070, 071, 081, 086

S18 Figure S3. Linear correlation diagrams between the experimental and calculated results for LogD values of the molecules in training (black) and test set (red).

Fold #3

Compounds in the training set: 002, 003, 004, 005, 006, 007, 010, 011, 013, 015, 017, 020, 021, 024, 026, 027, 033, 037, 044, 046, 047, 048, 049, 050, 055, 058, 059, 063, 065, 069, 071, 080, 081, 082, 083, 084, 085, 088, 090 Compounds in the test set: 019, 042, 045, 056, 060, 061, 067, 068, 070, 072, 074, 075, 086, 092

S19 Figure S4. Linear correlation diagrams between the experimental and calculated results for LogD values of the molecules in training (black) and test set (red).

Fold #4

Compounds in the training set: 002, 003, 006, 007, 010, 011, 013, 015, 017, 019, 021, 026, 027, 033, 037, 042, 044, 045, 046, 047, 055, 056, 058, 059, 060, 061, 063, 068, 070, 071, 072, 074, 080, 082, 083, 085, 086, 088, 090 Compounds in the test set: 004, 005, 020, 024, 048, 049, 050, 065, 067, 069, 075, 081, 084, 092

S20 Figure S5. Linear correlation diagrams between the experimental and calculated results for LogD values of the molecules in training (black) and test set (red).

Fold #5

Compounds in the training set: 002, 003, 005, 006, 007, 010, 013, 015, 017, 019, 020, 024, 026, 027, 033, 037, 042, 044, 045, 048, 049, 056, 059, 061, 063, 065, 068, 069, 070, 071, 074, 075, 080, 082, 083, 084, 085, 090, 092 Compounds in the test set: 004, 011, 021, 046, 047, 050, 055, 058, 060, 067, 072, 081, 086, 088,

S21 Figure S6. Linear correlation diagrams between the experimental and calculated results for logCL values of the molecules in training (black) and test set (red).

Fold #1

Compounds in the training set: phenytoin, amitryptyline, omeprazole, clozapine, zolpidem, , methohexital, chlorpromazine, methoxsalen, propranolol, diclofenac, nicardipine, prednisone, verapamil, midazolam, diazepam, triazolam, , ibuprofen, diphenhydramine, fluvastatin, tolbutamide, desipramine, amobarbital, phenacetin, fk1052, lorcainide, tenidap Compounds in the test set: metoprolol, diltiazem, , imipramine, , dexamethasone, , nilvadipine, fk480

S22 Figure S7. Linear correlation diagrams between the experimental and calculated results for logCL values of the molecules in training (black) and test set (red).

Fold #2

Compounds in the training set: metoprolol, omeprazole, diltiazem, clozapine, alprazolam, imipramine, tenoxicam, methohexital, methoxsalen, propranolol, diclofenac, nicardipine, verapamil, midazolam, diazepam, triazolam, quinidine, ibuprofen, fluvastatin, desipramine, propafenone, dexamethasone, amobarbital, hexobarbital, phenacetin, nilvadipine, fk480, lorcainide Compounds in the test set: phenytoin, amitryptyline, zolpidem, chlorpromazine, prednisone, diphenhydramine, tolbutamide, fk1052, tenidap

S23 Figure S8. Linear correlation diagrams between the experimental and calculated results for logCL values of the molecules in training (black) and test set (red).

Fold #3

Compounds in the training set: phenytoin, amitryptyline, omeprazole, diltiazem, alprazolam, zolpidem, imipramine, tenoxicam, methoxsalen, propranolol, nicardipine, prednisone, verapamil, midazolam, diazepam, triazolam, quinidine, ibuprofen, diphenhydramine, fluvastatin, tolbutamide, desipramine, propafenone, amobarbital, phenacetin, nilvadipine, fk1052, fk480 Compounds in the test set: metoprolol, clozapine, methohexital, chlorpromazine, diclofenac, dexamethasone, hexobarbital, lorcainide, tenidap

S24 Figure S9. Linear correlation diagrams between the experimental and calculated results for logCL values of the molecules in training (black) and test set (red).

Fold #4

Compounds in the training set: metoprolol, amitryptyline, omeprazole, clozapine, alprazolam, zolpidem, tenoxicam, methohexital, chlorpromazine, methoxsalen, propranolol, diclofenac, nicardipine, prednisone, verapamil, diazepam, triazolam, quinidine, ibuprofen, fluvastatin, desipramine, propafenone, dexamethasone, amobarbital, phenacetin, nilvadipine, fk1052, tenidap Compounds in the test set: phenytoin, diltiazem, imipramine, midazolam, diphenhydramine, tolbutamide, hexobarbital, fk480, lorcainide

S25 Figure S10. Linear correlation diagrams between the experimental and calculated results for logCL values of the molecules in training (black) and test set (red).

Fold #5

Compounds in the training set: phenytoin, metoprolol, amitryptyline, omeprazole, diltiazem, zolpidem, imipramine, tenoxicam, methohexital, chlorpromazine, methoxsalen, diclofenac, prednisone, midazolam, triazolam, quinidine, diphenhydramine, fluvastatin, tolbutamide, desipramine, dexamethasone, amobarbital, hexobarbital, nilvadipine, fk1052, fk480, lorcainide, tenidap Compounds in the test set: clozapine, alprazolam, propranolol, nicardipine, verapamil, diazepam, ibuprofen, propafenone, phenacetin

S26