<<

Predictions of Intrinsic Aqueous Solubility of Crystalline Drug-like Molecules

James McDonagh Supervisors Dr John Mitchell and Dr Tanja van Mourik

1 Introduction Solution model Results Further models Current/future work Overview • Introduction. • Solution model – Our solubility prediction model. • Results – The performance of the solubility predictions. • Further models – Work following on from the initial model. • Current/future work – Where we are currently and where we are going.

2 Introduction Solution model Results Further models Current/future work Why is solubility prediction important?

• Crucial factor to control bioavailability of drug candidates. • Critical component in determining the environmental impact of pesticides. • Accurate in silico predictions of solubility can save time and money.

3 Introduction Solution model Results Further models Current/future work Why should we care about solubility ? • New drug candidates are often more insoluble than their predecessors. • Formulation of these drugs often involve less pleasant administration methods.

• Certain pesticides can cause extensive damage and potentially enter the water cycle. 4 Introduction Solution model Results Further models Current/future work

Existing Theoretical Approaches

• So far, although theoretical methods have shown promise, they have not matched the accuracy of QSPR. Theoretical methods do have the advantage of being physically tractable. • Industry also requires high through put methods. QSPR models are generally much faster than computational chemistry models. • There are many theoretical models to make solubility predictions.

5 Introduction Solution model Results Further models Current/future work Our Methodologies • We have decomposed the solution (solu) free energy prediction in to two distinct steps. – Sublimation (sub) – Hydration (hyd) • We have applied a range of methodologies to each step. • Methodologies include simulation, QM calculation and machine learning.

6 Introduction Solution model Results Further models Current/future work Thermodynamic cycle Gaseous Sub = sublimation Hyd = hydration 1 mol/L Solu = solution

+1.89 kcal/mol

1 atm

ΔGᵒsub ΔGᵒhyd ΔG*sub ΔG*hyd

ΔGsolu

7 Crystalline Solution Introduction Solution model Results Further models Current/future work Sublimation : Predictions by DMACRYS Gaseous • DMACRYS a periodic lattice simulation program. • Electrostatics, from distributed multipoles. • Buckingham potential to ΔG sub account for repulsion and dispersion. • Calculates lattice energy and crystal entropy from phonon modes. • Gas phase contributions calculated in Gaussian 09. 8 Solid Introduction Solution model Results Further models Current/future work Hydration: Solvation models Gaseous

• Continuum solvent, solvation model based on density (SMD). ΔG hyd • An integral equation theory (IET) of molecular liquids methodology the Reference Interaction Site Model (RISM).

Dilute solution 9 Introduction Solution model Results Further models Current/future work Hydration: RISM

• Combines features of explicit and implicit solvent models. • Solvent density is modelled, but no explicit molecular coordinates or dynamics.

~45 CPU mins per compound 10 Introduction Solution model Results Further models Current/future work Hydration: RISM • We use the 3D RISM variation. • We employ the Kovalenko-Hirata (KH) closure and Gaussian fluctuation free energy functional. • We also employ the universal correction (UC). 3퐷푅퐼푆푀−퐾퐻/푈퐶 3퐷푅퐼푆푀 ∆퐺ℎ푦푑 = ∆퐺ℎ푦푑 + 푎 휌푉 + 푏 • The methodology we used will be referred to as 3DRISM-KH/UC.

11 David, S. Palmer., Andrey I Frolov, Ekaterina L Ratkova and Maxim V Fedorov,. Journal of Physics: Condensed Matter 2010, 22 (49), 492101.

Introduction Solution model Results Further models Current/future work Solution Free Energy • The sum of our predictions of ΔG sub and ΔG hyd produce a ΔG solu prediction. • These methods were carried ΔG sub ΔG hyd out for 25 chemically diverse drug-like molecules. • Chemical accuracy ~4 kJ/mol or ~ 1 LogS unit. • Useful predictions are within ΔG solu the standard deviation (SD) of the experimental values.

12 Introduction Solution model Results Further models Current/future work Data set selection

4-Aminobenzoic acid Trimethoprim Benzoic acid • The thermodynamically Allopurinol AMBNAC04 AMXBPM10 BENZAC02 ALOPUR most stable polymorph was selected • 10 had available Benzamide experimental sublimation BZAMID02 COCAIN10 COYRUD11 Danthron DAHNQU06 and hydration free energies. (in red).

Ibuprofen Estrone Acetaminophen IBPRAC01 EPHPMO ESTRON14 HXACAN04

Nicotinic acid 1-Naphthol NICOAC02 Fluconazole Isoproturon Nitrofurantoin NAPHOL01 IVUQOF JODTUR01 LABJON01 NDNHCL01

Mefenamic acid XYANAC Niflumic acid Pindolol Pteridine Pyrene 13 NIFLUM10 PINDOL PTERID11 PYRENE07 SALIAC SIKLIH01 Introduction Solution model Results Further models Current/future work Sublimation free energy predictions

• Validation of the sublimation free energy prediction. • B3LYP/6-31G(d,p) multipole. • FIT repulsion dispersion potential. • Correlation coefficient (R) 0.87. • RMSE 5.66kJ/mol.

14 Introduction Solution model Results Further models Current/future work Hydration Free Energy Predictions

• Validation of the hydration free energy predictions.

3DRISM-KH/UC • SMD HF/6-31G(d,p). • Both have strong R values 0.93 RISM, 0.97 SMD. • RISM has a significantly higher RMSE 4.85kJ/mol RISM, 2.91kJ/mol SMD. SMD (HF)

15 Introduction Solution model Results Further models Current/future work Solution Free Energy Predictions DMACRYS + • The full 25 molecule set is 3DRISM-KH/UC compared to experiment. • Chemical accuracy ~ 1logS unit. Experimental SD 1.79 LogS. • Reasonable correlation R 0.85 RISM, R 0.84 SMD. DMACRYS+ • RISM method provides best RMSE, SMD (HF) RMSE of 1.45LogS RISM, RMSE of 2.03LogS SMD. • SMD outliers Niflumic Acid and Pteridine. Palmer, D. S.; McDonagh, J. L.; Mitchell, J. B. O.; van Mourik, T.; Fedorov, M. V., Journal of Chemical Theory and Computation 2012. 16 Introduction Solution model Results Further models Current/future work To sum up • From these results we concluded it was possible to make predictions of a reasonable accuracy. • In our methodology a larger portion of error could be attributed to the sublimation free energy prediction. • Larger datasets were required to fully validate the methodology.

17 Introduction Solution model Results Further models Current/future work Further models

• We took two approaches to follow up this work: – Parameterisation and machine learning approaches to predict ΔG solution. – Systematic theoretical improvements in Sublimation free energy predictions. (work currently ongoing).

18 Introduction Solution model Results Further models Current/future work Informatics and machine learning

• We selected a dataset of 100 molecules. • Calculated descriptors using the chemistry development kit (CDK). • We followed our previously laid out theoretical methodology for the 100 molecules. • We combined descriptors and theoretically calculated energies.

19 Introduction Solution model Results Further models Current/future work Descriptors • SMILES are input into CDK. • Structural and some predicted properties are output for use as descriptors. • These cheminformatics CC(=O)Oc1ccccc1C(=O)O descriptors were used as part of the input for Descriptor Value Molecular Weight 280 the machine learning Molecular Formula C9H8O4 methods. XLogP 1.24 Freely rotatable bonds 3 H bond acceptor 4 ….. ….. 20 …… …… Introduction Solution model Results Further models Current/future work Computational Chemistry Calculations • DMACRYS –B3LYP/ 6-31G(d,p), FIT potential. • SMD with HF/6-31G(d,p) • The same level of theory was used in the gas phase as the solution phase. • SMD selected over RISM as it provided a better correlation in the previous work. 21

Introduction Solution model Results Further models Current/future work Machine Learning Models • Random Forest (RF) – A forest of decision trees.

22 Introduction Solution model Results Further models Current/future work • Support Vector Machines (SVM) – Classification by projection into a higher space and separation by a hyperplane.

23 Introduction Solution model Results Further models Current/future work

• Partial least squares Regression (PLS) – can be considered as classification by deflation.

푦 = 푋푏 + 휖푡

1 0.5 5 50 1 0.7 7 70 1 0.9 9 2

24 Introduction Solution model Results Further models Current/future work Work flow/Experimental design

RF Internal cross validation

Prepare and input External 10 Descriptors and SVM fold cross Theoretically Internal cross calculated validation validation energies

PLS Internal cross validation

25 Introduction Solution model Results Further models Current/future work Results • Results of the purely LogS experiment Vs DMACRYS-SMD HF/6-31G(d,p) predictions theoretical prediction. 6.00 • Our results are 4.00

2.00 correlative.

0.00 -10.00 -8.00 -6.00 -4.00 -2.00 0.00 2.00 4.00 • Standard linear

-2.00

-4.00 regression is a poor -6.00 fitting model. -8.00 R = 0.57 R² = 0.32 Predicted LogS Predicted • Chemical accuracy -10.00 RMSE = 2.95 -12.00 ~1logS unit. Experimental LogS • Experimental SD 1.71 LogS units.

26 Introduction Solution model Results Further models Current/future work Descriptors only • Results of prediction Descriptors only RMSE exclusively using the CDK descriptors.

RMSE • All machine learning methods perform better than theory alone. • Red bars show the SD and Predictive model PLS RF SVM mean result. • Boxes represent 75% of the

predictions. Dark blue line

2 shows the median. R PLS RF SVM Theory Mean RMSE 1.174(±0.08) 1.134(±0.03) 1.132(±0.03) 2.95 Descriptors only R2 Mean R2 0.56(±0.03) 0.56(±0.03) 0.56(±0.03) 0.32 Predictive model PLS RF SVM 27 Introduction Solution model Results Further models Current/future work Combined model Combined model RMSE • The model contains HF/6-31G(d,p) energies and descriptors. RMSE • All show improvement over pure theory. • Result are similar to PLS RF SVM Predictive model those of the descriptors alone. • Experimental SD 1.71

LogS units.

2 R PLS RF SVM Theory Average RMSE 1.110(± 0.04) 1.107(±0.03) 1.111(±0.04) 2.95 Combined model R2 Average R2 0.594( ±0.04) 0.583(±0.04) 0.576(±0.04) 0.32 28 PLS RF SVM Predictive model Introduction Solution model Results Further models Current/future work Summary • The information from theoretical calculations at this level has a minor impact but does improve accuracy and correlation of the results. • The descriptors already hold much of the information. • Further exploration of models of this type could allow us to find information not held in the descriptors that is accessible by chemical calculation.

29 Introduction Solution model Results Further models Current/future work Exploration of sublimation free energy prediction • The largest source of error in the initial method was the sublimation free energy prediction. • We have a dataset of 60 molecules. • We made sublimation free energy predictions using this dataset with our previously outlined method. • We look to DFT methods to provide improved predictions.

30 Introduction Solution model Results Further models Current/future work Periodic DFT • We are using Periodic DFT to make sublimation free energy predictions. • We are exploring dispersion corrections. • We will look at the accuracy of prediction of the components of the free energy.

31 Introduction Solution model Results Further models Current/future work Free Energy of Sublimation

• From our initial Experimental Vs Predicted ΔG sub methodology. 120.00 • A poor correlation. 100.00

• Significant RMSE. 80.00 • Outliers hold significant 60.00 leverage.

ΔG sub Predicted subΔG 40.00 • All outliers contain NO2 R² = 0.39 groups (in red) which 20.00 are known to be RMSE 16.6 0.00 difficult to represent 0.00 20.00 40.00 60.00 80.00 100.00 120.00 accurately in force ΔG sub Experimental

fields. 32 Introduction Solution model Results Further models Current/future work Summary • We have explored a purely theoretical methodology for predictions of solvation free energy. • We have expanded from this to produce a combined computational chemistry - cheminformatics methodology. • We have begun exploration of sublimation free energy, due to the large error it contributed in our original methodology.

33 Introduction Solution model Results Further models Current/future work Thanks for your attention

Thanks to the members of room 150 School Many thanks to collaborators, co- of Chemistry University of St Andrews: workers and funders. • Dr David Palmer (RISM) • Professor Maxim Fedorov (RISM) • Ms Neetika Nath (Machine Learning) Dr Lazaros Rosanna Neetika Dr Luna Mavridis Alderson Nath De Ferrari • Dr Luna De Ferrari • Dr Herbert Früchtl • Professor Michael Bühl

Luke Leo Dr Ludovic Ava Sih-Yu Crawford Holroyd Castro Chen Many thanks to my Supervisors

34 Dr John Mitchell Dr Tanja van Mourik

35

36 Current Preliminary Results

Enthalpy of Sublimation Experimental Vs Predicted ΔH sub 170.00

• Using DMACRYS – 160.00

B3LYP multipoles FIT 150.00

1 potential and Gaussian - 140.00

09. 130.00

H H kJ sub mol Δ • 48 molecules. 120.00

110.00 Predicted

• A fair correlation but 100.00 R² = 0.5613 significant RMSE. 90.00 RMSE 15.20 80.00 80.00 90.00 100.00 110.00 120.00 130.00 140.00 150.00 160.00 Experimental ΔH sub kJ mol-1

37 Entropy of Sublimation

• Crystal entropy Experimental Vs Predicted TΔS sub calculated in DMACRYS, 66.00 gas phase in Gaussian 64.00

62.00

1

09. - • No meaningful 60.00

correlation. 58.00

S S kJ predictedmol

Δ T 56.00 • Significant RMSE.

54.00 R² = 0.0412 • 48 molecules RMSE 13.55 52.00 0 20 40 60 80 100 TΔS experimental kJ mol-1

38 Additional notes

• Buckingham potential • Thermodynamics 퐸 = 퐵푒푥푝 −퐶푟 − 퐴푟−6 푙푘 푙푘 푙푘 • ΔHsub = -Ulatt +2RT • Universal correction • ΔSsub = (Sr+St) – Scrys ∆퐺3퐷푅퐼푆푀−푈퐶 = ℎ푦푑 • ΔGsub = ΔHsub - TΔSsub ∆퐺3퐷푅퐼푆푀 + 푎 휌푉 + 푏 ℎ푦푑 • ΔGhyd = Esol – Egas • LogS • ΔGsolu = ΔGsub + ΔGhyd ΔG • ∆퐺퐺퐹 = sol 1 −푅푇 퐾 푇 푁푠표푙푣푒푛푡 휌 푐 푟 − 푐 푟 ℎ 푟 푑푟 =log(S) 퐵 훼=1 훼 푅3 훼 2 훼 훼 ln (10) ΔGsolu = −푅푇푙푛푆 • 1logS unit = 5.71 in terms of ΔG

표 표 표 ∆퐺푠표푙 = ∆퐺푠푢푏 + ∆퐺ℎ푦푑 = −푅푇푙푛(푆0푣푚) Molar volume of the crystal Vm Intrinsic solubility So 39

40