COSMO-RS: From Quantum Chemistry to Cheminformatics

[Figure: Binary mixture of 1-butanol (1) and n-heptane (2) at 50 °C. ln γ versus x1, the mole fraction of 1-butanol; calculated and experimental values for 1-butanol and n-heptane.]

Andreas Klamt
COSMOlogic GmbH & Co. KG, Leverkusen, Germany
and Inst. of Physical Chemistry, University of Regensburg, Germany

Thermophysical data prediction methods

[Diagram: map of thermophysical data prediction methods. Landmarks: gas-phase quantum chemistry and the horizon of gas-phase methods; quantum chemistry with dielectric solvation models like PCM or COSMO; MD/MC force-field simulations (simple, well explored solvents; soft biomatter); group contribution methods (UNIFAC, CLOGP, LOGKOW, fingerprints, etc.; fitted parameters: CLOGP ~1500, UNIFAC ~5000 + 50% gaps); water, alkanes, solid phase, gas phase; the latitudes of solvation and the horizon of COSMO-RS.]

Dielectric Continuum Solvation Models (CSM)

The solute molecule is embedded in a dielectric continuum, with self-consistent inclusion of the solvent polarisation (screening charges) into the MO calculation (SCRF).

- Born 1920, Kirkwood 1934, Onsager 1936
- Rivail, Rinaldi et al.
- Katritzky, Zerner et al.
- Cramer, Truhlar et al. (AMSOL)
- Tomasi et al. (PCM)
- Orozco et al.
- Klamt, Schüürmann (COSMO), e.g. DMol3/COSMO and others

COSMO = COnductor-like Screening Model, just a (clever) variant of dielectric CSMs.

Density Functional Theory (DFT) is the appropriate level of QC! COSMO is almost as fast as the gas phase!
- programs: DMol3, Turbomole, Gaussian98_release2001
- up to 25 atoms: < 24 h on a LINUX PC
- empirical finding: cavity radii should be about 1.2 vdW radii
- promising results for water, alkanes, and a few other solvents

But CSMs are basically wrong and give a poor, macroscopic description of the solvent!

Why are Continuum Solvation Models wrong for polar molecules in polar solvents?

Polar solvent:
- discrete permanent dipoles
- mainly reorientational polarizability
- linear response requires E_reor << kT
- typically E_reor ~ 8 kcal/mol !!!
- no linear response, no homogeneity
- no similarity with dielectric theory

Non-polar solvent:
- only electronic polarizability
- homogeneously distributed
- linear response up to very high fields
- dielectric continuum theory should be reasonably applicable

How to come to the latitudes of solvation?

[Diagram: map of solvation methods. The state of ideal screening (home of COSMOlogic) is connected by COSMO-RS, the bridge of symmetry, to the latitudes of solvation (water, acetone, alkanes). Other landmarks: quantum chemistry with dielectric solvation models like COSMO or PCM, MD/MC simulations, QM/MM, Car-Parrinello, group contribution methods (UNIFAC, CLOGP, LOGKOW, etc.), the horizon of gas-phase methods, the horizon of COSMO-RS, solid state, and the gas phase as the native home of computational chemistry.]

COSMO-RS:
1) Put molecules into a 'virtual' conductor (DFT/COSMO).
2) Compress the ensemble to approximately the right density.
3) Remove the conductor on molecular contact areas (stepwise) and ask for the energetic costs of each step.

In this way the molecular interactions reduce to pair interactions of surfaces!

[Diagram: contacts between surface segments with screening charge densities σ and σ'. Labels: (1) ideal contact, (2) electrostatic misfit, (3) specific interactions (hydrogen bond, σ >> 0, σ' << 0).]

$$G_{misfit}(\sigma,\sigma') = a_{eff}\,\frac{\alpha'}{2}\,(\sigma+\sigma')^2$$

$$G_{hb}(\sigma,\sigma') = a_{eff}\,c_{hb}(T)\,\min\{0,\ \sigma\sigma' + \sigma_{hb}^2\}$$
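The two pair-interaction terms above can be written as small functions. This is a minimal sketch: the function names are hypothetical, and the parameter values (`A_EFF`, `ALPHA_PRIME`, `C_HB`, `SIGMA_HB`) are illustrative placeholders chosen only to give sensible magnitudes, not the fitted COSMO-RS parameters.

```python
def misfit_energy(sigma, sigma_p, a_eff, alpha_prime):
    """Electrostatic misfit energy for a contact of segments sigma and sigma'."""
    return a_eff * 0.5 * alpha_prime * (sigma + sigma_p) ** 2


def hbond_energy(sigma, sigma_p, a_eff, c_hb, sigma_hb):
    """Hydrogen-bond energy: nonzero only if sigma * sigma' + sigma_hb**2 < 0."""
    return a_eff * c_hb * min(0.0, sigma * sigma_p + sigma_hb ** 2)


# Placeholder parameters (assumptions for illustration, not fitted values).
A_EFF = 7.0           # effective contact area, Å^2
ALPHA_PRIME = 1300.0  # misfit coefficient, kcal mol^-1 Å^2 e^-2
C_HB = 8000.0         # hydrogen-bond coefficient, kcal mol^-1 Å^2 e^-2
SIGMA_HB = 0.008      # hydrogen-bond threshold, e/Å^2

# An ideal contact (sigma' = -sigma) costs no misfit energy.
print(misfit_energy(0.010, -0.010, A_EFF, ALPHA_PRIME))
# Two weakly polar segments do not form a hydrogen bond in this model.
print(hbond_energy(0.005, -0.005, A_EFF, C_HB, SIGMA_HB))
# A strongly positive and a strongly negative segment gain hydrogen-bond energy.
print(hbond_energy(0.015, -0.015, A_EFF, C_HB, SIGMA_HB))
```

With σ in e/Å² and the coefficients in kcal mol⁻¹ Å² e⁻², both functions return energies in kcal/mol.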

For an efficient statistical thermodynamics reduce the ensemble of molecules to an ensemble of pair-wise interacting surface segments !

[Figure: σ-profile of water. p^water(σ) (amount of surface) versus σ [e/Å²], from -0.020 to 0.020.]

Screening charge distribution on the molecular surface reduces to the "σ-profile".

A. Klamt, J. Phys. Chem., 99 (1995) 2224
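Reducing a surface charge distribution to a σ-profile amounts to an area-weighted histogram over σ. The sketch below illustrates this with NumPy; the function name, the bin settings and the four example segments are assumptions made for illustration, and any averaging of σ over neighbouring segments that a production code may apply beforehand is left out.

```python
import numpy as np


def sigma_profile(sigmas, areas, sigma_min=-0.03, sigma_max=0.03, n_bins=60):
    """Area-weighted histogram of segment charge densities.

    sigmas: screening charge density of each surface segment [e/Å^2]
    areas:  area of each surface segment [Å^2]
    Returns the bin centres and the amount of surface per bin.
    """
    sigmas = np.asarray(sigmas, dtype=float)
    areas = np.asarray(areas, dtype=float)
    edges = np.linspace(sigma_min, sigma_max, n_bins + 1)
    profile, _ = np.histogram(sigmas, bins=edges, weights=areas)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, profile


# Hypothetical segments: (sigma, area) pairs invented for this example.
centers, profile = sigma_profile([-0.012, 0.0, 0.001, 0.015],
                                 [2.0, 5.0, 3.0, 1.5])
print(profile.sum())  # total surface area that fell inside the sigma range
```

Segments outside `[sigma_min, sigma_max]` are silently dropped by the histogram, so the range should cover all occurring σ values.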

Reducing the ensemble of molecules to an ensemble of pair-wise interacting surface segments is the same approximation as in UNIFAC.

[Figure: σ-profiles p^X(σ) of water, methanol, acetone, benzene, chloroform and hexane versus σ [e/Å²], from -0.020 to 0.020.]

Why do acetone and chloroform like each other so much?

[Figure: σ-profiles p^X(σ) of acetone and chloroform versus σ [e/Å²], from -0.020 to 0.020.]

[Figure: ln γ versus mole fraction of acetone (1) in the acetone/chloroform mixture. Calculated values for acetone and chloroform compared with the experiments of Rabinovich et al. and Apelblat et al.]

Because their σ-profiles are almost complementary!

Statistical Thermodynamics

• Replace the ensemble of interacting molecules by an ensemble S of interacting pairs of surface segments.
• Ensemble S is fully characterized by its σ-profile p_S(σ). (p_S(σ) of mixtures is additive! -> no problem with mixtures!)
• The chemical potential of a surface segment with charge density σ is exactly(!) described by:

$$\mu_S(\sigma) = -kT \ln \int d\sigma'\, p_S(\sigma') \exp\left\{ -\frac{E_{int}(\sigma,\sigma') - \mu_S(\sigma')}{kT} \right\}$$

σ-potential: affinity of solvent S for specific polarity σ

Chemical potential of solute X in S:

$$\mu_S^X = \int d\sigma\, p^X(\sigma)\, \mu_S(\sigma) - \lambda kT \ln A_S$$

Combinatorial contribution (γ^X_{S,comb}): solvent size effects

activity coefficients → arbitrary liquid-liquid equilibria
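The implicit equation for μ_S(σ) can be solved by fixed-point iteration on a discretised σ grid. The following is a sketch under stated assumptions, not the COSMOtherm implementation: the grid, the Gaussian test profile, the misfit-only interaction energy and its prefactor are all invented for illustration, and the iteration uses simple damping (averaging old and new values) for stability.

```python
import numpy as np

KT = 0.593  # kcal/mol, roughly kT near 298 K


def sigma_potential(p_s, e_int, kt=KT, damping=0.5, tol=1e-10, max_iter=5000):
    """Solve mu(s) = -kT ln sum_s' p(s') exp(-(E(s,s') - mu(s'))/kT) by iteration."""
    p = np.asarray(p_s, dtype=float)
    p = p / p.sum()                       # normalised sigma-profile
    with np.errstate(divide="ignore"):
        log_p = np.log(p)                 # -inf where the profile is empty
    mu = np.zeros_like(p)
    for _ in range(max_iter):
        arg = log_p[None, :] + (mu[None, :] - e_int) / kt
        shift = arg.max(axis=1)           # log-sum-exp shift avoids overflow
        mu_new = -kt * (shift + np.log(np.exp(arg - shift[:, None]).sum(axis=1)))
        if np.max(np.abs(mu_new - mu)) < tol:
            return mu_new
        mu = damping * mu + (1.0 - damping) * mu_new
    return mu


# Illustrative input (assumptions): 41-point grid, Gaussian profile, misfit only.
grid = np.linspace(-0.02, 0.02, 41)                    # e/Å^2
p = np.exp(-0.5 * (grid / 0.006) ** 2)                 # hypothetical solvent profile
E = 0.5 * 7.0 * 1300.0 * (grid[:, None] + grid[None, :]) ** 2  # kcal/mol

mu = sigma_potential(p, E)

# Residual part of the chemical potential of a solute with profile p_x
# (area per grid point); the combinatorial term is omitted in this sketch.
p_x = 10.0 * p / p.sum()
mu_solute = float(np.sum(p_x * mu))
```

Because p_S(σ) of a mixture is additive, a mixture is handled by passing the mole-fraction-weighted sum of the component profiles as `p_s`.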

[Figure: σ-profiles and σ-potentials of representative liquids. Upper panel: σ-profiles p^X(σ) of water, methanol, acetone, benzene, chloroform and hexane. Lower panel: σ-potentials μ^X(σ) [kJ/(mol Å²)] of the same liquids versus σ [e/Å²], from -0.020 to 0.020; marked regions: hydrophobicity, affinity for HB-donors, affinity for HB-acceptors.]

Results of parametrization based on DFT (DMol3: BP91, DNP basis)
- 650 data
- 17 parameters
- rms = 0.41 kcal/mol
A. Klamt, V. Jonas, J. Lohrenz, T. Bürger, J. Phys. Chem. A, 102, 5074 (1998)
meanwhile: COSMOtherm5.0 with Turbomole BP91/TZVP, rms = 0.36 kcal/mol

[Figure: residuals of the parametrization by compound class (alkanes, alkenes, alkynes, alcohols, ethers, carbonyls, esters, aryls, diverse, amines, amides, N-aryls, nitriles, nitro, chloro, water). Panels: a) ΔG_hydr (in kcal/mol), b) log P_vapor (in bar), c) log K_Octanol/Water, d) log K_Hexane/Water, e) log K_Benzene/Water, f) log K_Ether/Water.]

Applications to Phase Diagrams and Azeotropes

[Figure: Binary mixture of 1-butanol (1) and n-heptane (2) at 50 °C. ln γ versus x1, the mole fraction of 1-butanol; calculated and experimental values for both components.]

[Figure: Binary mixture of 1-butanol (1) and water at 60 °C. y versus x, calculated and experiment, with miscibility gap.]

[Figure: Binary mixture of butanol (1) and heptane (2) at 50 °C. y versus x, calculated and experiment.]

[Figure: Binary mixture of ethanol (1) and benzene (2) at 25 °C. y versus x, calculated and experiment.]

November 2002: COSMOtherm wins the VLE prediction contest of the Nat. Inst. of Standards (NIST) and the American Inst. of Chem. Engineers (AICHE).

Flow Chart of COSMO-RS

DFT/COSMO part:
- Chemical structure
- Quantum chemical calculation with COSMO (full optimization)
- Ideally screened molecule: energy + screening charge distribution on the surface
- Database of COSMO-files (incl. all common solvents)

COSMOtherm part:
- σ-profiles of compounds (example plot: vanillin, water, acetone, and other compounds; screening charge density [e/Å²] from -0.02 to 0.02)
- σ-profile of mixture
- Fast Statistical Thermodynamics
- σ-potential of mixture
- Equilibrium data: activity coefficients, vapor pressure, solubility, partition coefficients
- Phase diagrams (example plot: binary mixture of butanol and water at 60 °C, y versus x, calculated and experiment, with miscibility gap)

How to come to the latitudes of solvation?

[Diagram: map of solvation methods, with COSMO-RS as the bridge from the state of ideal screening (home of COSMOlogic) to the latitudes of solvation; gas phase as the native home of computational chemistry.]

Extension of COSMOtherm to multi-conformations

Unfortunately, many molecules have more than one relevant conformation.

COSMOtherm can treat a compound as a set of several conformers:
- each conformer needs a COSMO calculation
- the conformational population is treated consistently according to the total free energy of the conformers (by an external self-consistency loop)

Conformational effects for glycerol

Lowest COSMO conformer:
- all 3 donors are bound in one 6-ring and two 5-rings; also the least polar conformer
- 39% in octane, 9% in acetone

2nd COSMO conformer:
- E_cosmo = +0.37 kcal/mol, E_diel = +2 kcal/mol
- 1 free donor, two bound in one 6-ring and one 5-ring
- 16% in octane, 8% in acetone

7th COSMO conformer:
- E_cosmo = +1.3 kcal/mol, E_diel = +3.3 kcal/mol
- 2 free donors, one bound in a strong 6-ring
- (represents ~4 similar conformations)
- 2% in octane, 41% in acetone

Partition coefficient between acetone and octane:
- logK_AO = -3.3 (lowest conformer)
- logK_AO = -4.0 (conformer ensemble)
- difference of 0.7 log-units ≈ 1 kcal/mol

[Figure: σ-profiles of glycerol conformers (glycerol4_cosmoc01, glycerol2_cosmoc02, glycerol3_cosmoc04, glycerol1_cosmoc05, glycerol0_cosmoc03, glycerol3_cosmoc05, glycerol3_cosmoc03) and of h2o, versus σ from -0.02 to 0.02.]

Conclusions:
- Conformational effects can be important for the detailed understanding of phase equilibria.
- In most cases one conformation dominates in all phases.
- Effects are especially large for molecules with sub-optimal intramolecular HBs in solvents having strong HB acceptors, but a deficit of HB-donors.
- Tautomers can be considered as a kind of conformers.
- Unfortunately the DFT level of QC is not always reliable regarding the energy differences between conformers and even more between tautomers. Energy corrections may be required.

"Conformational analysis of cyclic acidic α-amino acids in aqueous solution - an evaluation of different continuum hydration models." by Peter Aadal Nielsen, Per-Ola Norrby, Jerzy W. Jaroszewski, and Tommy Liljefors (private comm., Ph.D. thesis)

Method         Solvent Model   rms (kJ/mol)   rms, 4 points (kJ/mol)   Max Dev (kJ/mol)
AM1            SM5.4A           4.6            5.6                      9.2
PM3            SM5.4P          13.6           16.2                     20.5
AM1            SM2.1            7.4            9.0                     16.7
HF/6-31+G*     C-PCM            3.1            3.8                      5.9
HF/6-31+G*     PB-SCRF          4.7            5.8                      8.8
AMBER*         GB/SA           13.2           16.2                     24.3
MMFF           GB/SA           18.5           19.9                     31.4
BP-DFT/TZVP    COSMO-RS         2.2            2.6                      4.8

COSMO-RS was evaluated as a blind test !!!
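The weighting of conformers by their total free energy is a Boltzmann average. The sketch below shows only that single step; the function names and the example free energies are hypothetical, and the external self-consistency loop mentioned for COSMOtherm (needed when the conformer populations feed back into the solvent's σ-profile) is not reproduced. A second helper checks the stated conversion of 0.7 log-units to roughly 1 kcal/mol.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def conformer_populations(free_energies, temperature=298.15):
    """Boltzmann populations from total free energies in kcal/mol."""
    kt = R_KCAL * temperature
    g_min = min(free_energies)            # shift for numerical stability
    weights = [math.exp(-(g - g_min) / kt) for g in free_energies]
    total = sum(weights)
    return [w / total for w in weights]


def log_units_to_kcal(delta_log, temperature=298.15):
    """Free-energy difference corresponding to a difference in log10 K."""
    return delta_log * R_KCAL * temperature * math.log(10.0)


# Hypothetical total free energies of three conformers in some solvent.
populations = conformer_populations([0.0, 0.5, 1.2])
print([round(x, 3) for x in populations])

# 0.7 log-units at room temperature, in kcal/mol.
print(round(log_units_to_kcal(0.7), 2))
```

Because the free energies include the solvent-dependent chemical potential, the same set of conformers can yield very different populations in octane and in acetone, as in the glycerol example.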

Water Solubility log(x_H2O) calculated with COSMOtherm

Dataset taken from Jorgensen and Duffy (BOSS)

[Figure: experimental versus calculated log(x_H2O), from -13 to 1. R² = 0.90, rms = 0.66, n = 150. Symbols distinguish ΔG_fus < 0, ΔG_fus > 0, the McFarland test set and questionable data.]

$$\log S_S^X = \left(\mu_X^X - \mu_S^X + \min(0,\ \Delta G_{fus})\right) / 1.365$$

$$\Delta G_{fus}^X = 0.54\,\mu_{water}^X - 0.18\,N_{ringatom}^X + 0.0029\,\mathrm{volume}$$

Stable model: No changes required for pesticides!

A. Klamt, F. Eckert, M. Hornig, M. Beck, and T. Bürger: J. Comp. Chem. 23, 275-281 (2002)

COSMOtherm prediction of drug solubility in diverse solvents
(blind test performed with Merck & Co., Inc., Rahway, NJ, USA)

All predictions are relative to ethanol.

Solvents: water, 1-propanol, 2-propanol, DMF, ethyl acetate, methanol, triethylamine, heptane, toluene, chlorobenzene, acetone, ethanol, acetonitrile, butanol

Ionic Free Energies of Hydration by COSMOtherm-Ion-Extension

[Figure: free energy of hydration [kcal/mol] for ions, calculated versus experiment, from -120 to -50.]

Applications of COSMOtherm to Ionic Liquids

[Figure: ln(gamma_inf) calc. / exp. (T = 314/333 K) in 4-methyl-n-butylpyridinium BF4. COSMOtherm (pure prediction) versus exp., from -2.0 to 4.0. Lit: Andreas Heintz, Dmitry V. Kulikov, Sergey P. Verevkin, J. Chem. Eng. Data 2001, 46, 1526-1529.]

[Figure: log(Partition) for H2O / 1-butyl-3-methyl-imidazolium(+) PF6(-). COSMOtherm versus exp., from 0 to 6; non-aromatic and aromatic compounds. exp.: J.G. Huddleston, University of Alabama.]

COSMOtherm appears to work well for Ionic Liquids.

COSMOtherm first principle pKa prediction
(A. Klamt, et al., J. Phys. Chem. A, Nov. 2003)

pKa = 0.59 ΔG_diss/(RT ln10) + 0.88
N = 60, R² = 0.978, rms = 0.49

latest results for bases (pKb):
- similar rms
- slope between 0.59 and 0.71

[Figure: pKa_exp (0 to 18) versus ΔG_diss (0 to 40). Series: alcohols, carboxylic acids, inorganic acids, subst. phenols, N-acids (uracils, imines) and others; linear fit over all. Compounds: formic acid, acetic acid, chloroacetic acid, dichloroacetic acid, trichloroacetic acid, n-pentanoic acid, 2,2-dimethylpropanoic acid, benzoic acid, oxalic acid, maleic acid, fumaric acid, carbonic acid, phenol, pentachlorophenol, ethanol, 2,2,2-trichloroethanol, hypochlorous acid, hypobromous acid, hypoiodous acid, nitrous acid, sulfurous acid, phosphoric acid, boric acid, 5-fluorouracil, 5-nitrouracil, cis-5-formyluracil, trans-5-formyluracil, thymine, uracil.]

σ-Moment Approach

[Figure: σ-potentials of water, acetone and hexane versus σ [e/Å²], from -0.020 to 0.020.]

$$\mu_S(\sigma) \cong \sum_{i=-2}^{m} c_S^i\, f_i(\sigma) \quad \text{with} \quad f_i(\sigma) = \sigma^i \ \text{for}\ i \ge 0$$

and

$$f_{-2/-1}(\sigma) = f_{acc/don}(\sigma) \cong \begin{cases} 0 & \text{if } \pm\sigma < \sigma_{hb} \\ \mp\sigma + \sigma_{hb} & \text{if } \pm\sigma > \sigma_{hb} \end{cases}$$

Now the chemical potential of a solute X in this matrix S is:

$$\mu_S^X = \int p^X(\sigma)\, \mu_S(\sigma)\, d\sigma \cong \int p^X(\sigma) \sum_{i=-2}^{m} c_S^i f_i(\sigma)\, d\sigma = \sum_{i=-2}^{m} c_S^i M_i^X$$

with

$$M_i^X = \int p^X(\sigma)\, f_i(\sigma)\, d\sigma \quad \text{(σ-moments of solute X)}$$

The coefficients c_S^i can now be derived from experimental (log.) partition data by linear regression.
=> σ-moments are excellent QSAR descriptors for the general partition behaviour of molecules.

"The solvent space is approximately 5-dimensional!"
Zissimos, et al.: 'A comparison between the two general sets of linear free energy descriptors of Abraham and Klamt', J. Chem. Inf. Comput. Sci., 42, 1320-1331 (2002)

=> σ-moment models for ADME properties such as logBB, intestinal absorption, logHSA, …

σ-moment logBB regression
logBB = 0.0046 area - 0.017 sig2 - 0.0029 sig3 + 0.19
n = 103, r² = 0.71, rms = 0.40
data from: "Modeling Blood-Brain Barrier Partitioning Using Topological Structure Descriptors", Rose, Hall, Hall, and Kier, MDL-Whitepaper, 2003
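The σ-moments of a solute can be computed directly from a discretised σ-profile. The sketch below is illustrative only. It assumes the profile is given as surface area per grid point, so that the integral becomes a sum. It takes the hydrogen-bond functions as ∓σ + σ_hb beyond the threshold, which is this sketch's reading of the sign convention. The default σ_hb is a placeholder value. The function names, the dictionary keys and the example profile are hypothetical.

```python
import numpy as np


def sigma_moments(sigma, p_x, sigma_hb=0.01, max_order=3):
    """Sigma-moments M_i = sum over grid of p_x(sigma) * f_i(sigma).

    sigma: grid of charge densities [e/Å^2]
    p_x:   solute surface area per grid point [Å^2]
    """
    sigma = np.asarray(sigma, dtype=float)
    p_x = np.asarray(p_x, dtype=float)
    # Polynomial moments: M0 is the total area, M1 the (negative) total charge.
    moments = {f"M{i}": float(np.sum(p_x * sigma ** i))
               for i in range(max_order + 1)}
    # Hydrogen-bond functions (sign convention is an assumption of this sketch).
    f_acc = np.where(sigma > sigma_hb, -sigma + sigma_hb, 0.0)
    f_don = np.where(-sigma > sigma_hb, sigma + sigma_hb, 0.0)
    moments["Macc"] = float(np.sum(p_x * f_acc))
    moments["Mdon"] = float(np.sum(p_x * f_don))
    return moments


def chemical_potential(coefficients, moments):
    """Linear sigma-moment model: mu = sum_i c_i * M_i."""
    return sum(c * moments[name] for name, c in coefficients.items())


# Hypothetical three-point profile, for illustration only.
m = sigma_moments([-0.015, 0.0, 0.015], [1.0, 4.0, 2.0])
print(m)
```

In a regression setting the moments of many solutes form the design matrix, and the solvent coefficients c_S^i are the fitted parameters, e.g. via `numpy.linalg.lstsq`.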

[Figure: logBB, exp. versus calc., from -2.0 to 1.5. Series: minimum_COSMO_conf. and CORINA_optimized.]

σ-moment logKHSA regression
logKHSA = 0.0081 area - 0.016 sig2 - 0.013 sig3 + 0.145 sigHacc + 0.88
n = 82, r² = 0.69, rms = 0.33
data from: Kier, Hall, Hall, MDL-Whitepaper, 2002

[Figure: logK(HSA) [exp.] versus logK(HSA) [calc.], from -1.5 to 1.5.]

COSMO-RS for Percentage Intestinal Absorption (PIA)
Klamt, Diedenhofen, Connolly*, Jones* (submitted)
*) GlaxoSmithKline

log K_IA = 0.0040 M0 - 0.0053 M2 - 0.0024 M3 - 0.113 Macc - 0.117 Mdon + 1.37
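Evaluating the regression above is a single linear expression. The sketch below simply encodes the stated coefficients; the function name and the example descriptor values are hypothetical, and the units and scaling of the σ-moments expected by the regression are not given here, so the numbers are for illustration only.

```python
def log_k_ia(m0, m2, m3, m_acc, m_don):
    """log K_IA from sigma-moments, using the regression coefficients above."""
    return (0.0040 * m0 - 0.0053 * m2 - 0.0024 * m3
            - 0.113 * m_acc - 0.117 * m_don + 1.37)


# Hypothetical descriptor values for two compounds that differ only in their
# hydrogen-bond donor moment.
weak_donor = log_k_ia(m0=300.0, m2=80.0, m3=10.0, m_acc=2.0, m_don=0.5)
strong_donor = log_k_ia(m0=300.0, m2=80.0, m3=10.0, m_acc=2.0, m_don=4.0)
print(weak_donor, strong_donor)
```

With these coefficients, larger acceptor and donor moments lower log K_IA, while a larger surface area (M0) raises it.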

- training set: n = 38, rms = 12.5%
- high quality test set: n = 107, rms = 12.8%
- questionable test set: n = 24, rms = 22%

[Figure: PIA exp. versus PIA calculated by COSMO-KIA, both from 0 to 100.]

Prediction of Soil Sorption
Journal of Environmental Toxicology and Chemistry, in print

[Figure: log Koc, exp. versus COSMO-KOC, from -1 to 7. Training set rms = 0.63, test set rms = 0.72, with a linear fit to the training set.]

Adsorption to Activated Carbon

[Figure: ln[He(theor.)] versus ln[He(exp.)], from -10 to 15. Fluid phase (23 adsorptives) and gas phase (15 adsorptives); linear fit to the fluid-phase data: y = 0.9277x + 0.1379, R² = 0.9281.]
[Mehler, Peukert (TU München), Klamt; to be published]

Free Energies relevant for Reactions

[Diagram: free energy along a reaction path from the sum of educts through the transition state (TS) to the sum of products, in solvent1, solvent2 and solvent3. ΔG_activation ⇒ kinetic constant; ΔG_react ⇒ equilibrium constant.]

Localisation of the transition state is often complicated. In this work gas-phase TS have been localised using techniques provided in Gaussian98 (DFT: B3LYP, 6-31G*), followed by single-point DFT/COSMO calculations with TURBOMOLE (BP91-TZVP). DFT is not reliable for TS energies, but the solvent shifts should be reliable.

Calculation of the solvent dependence of ΔG_react is straightforward with COSMO-RS. Successful applications have been reported by industrial users (Dr. Franke, Degussa AG; Dr. Lohrenz, Bayer AG).

COSMOmic: Simulation of molecules in micelles and membranes

Concept:
- define layers of the membrane (shells of the micelle)
- get the probability to find a certain atom of the surfactant in each layer (e.g. from MD)
- convert this into a σ-profile p(σ,r) for each layer r using the COSMO-file of the surfactant
- use COSMOtherm to calculate µ(σ,r), considering each layer as a liquid mixture
- now calculate the chemical potential of a solute X in a certain position and orientation by summing the chemical potentials of its segments in the respective layers
- sample the chemical potentials for all positions and orientations of X
- construct a total partition sum and get the probability to find the solute in a certain depth and orientation
- also get the average volume expansion in each layer
- get a kind of micelle-water or membrane-water partition coefficient

The tool COSMOmic facilitates all the previous steps together with COSMOtherm.
Perspective: self-consistent treatment of new surfactants; CMC prediction

COSMOfrag: A fast shortcut of COSMOtherm suited for HTS-ADME prediction
1) large database of precalculated drug-like compounds (about 45000)
2) for a new compound find the most similar fragments in the database
3) compose the COSMO surface from surface fragments (write a meta-file)
4) do usual COSMOtherm: solubility, partition properties

advantages:
- about 1 sec. per compound!
- you can add your typical inhouse structures to the database
- simple refinement of calculations

COSMOfrag ports COSMO-RS to Cheminformatics!

COSMOfrag: statistics and examples

[Figure: Prediction of soil sorption coefficients with COSMOfrag. Exp. data versus COSMOfrag, from 0 to 6. Training set rms = 0.72 (0.63), test set rms = 0.81 (0.72).]

[Figure: Water solubility with COSMOfrag. log(x_H2O) [exp.] versus COSMOfrag log(x_H2O) [meta], from -10 to 0. Dataset of Jorgensen and Duffy, rms: 0.71 (0.66).]

Ligand-Receptor Binding

[Image: mouth of the retinal binding pocket.]

[Figure: σ-profiles of the binding pocket of bacteriorhodopsin and of retinal, versus σ from -0.03 to 0.03.]

Meanwhile we can approximately treat enzymes and receptor pockets. The goal is to describe ligand-receptor binding (incl. desolvation) based on COSMO polarization charge densities σ.

COSMOsim: bio-isoster search based on σ-profiles
examples by Dr. M. Thormann, Morphochem AG

If the physiological distribution and the drug-receptor binding are governed by the COSMO σ-profiles, it is reasonable to use these for drug-similarity searching:

- search for molecules with maximum similarity of σ-profiles in order to find molecules with similar interactions, but different chemistry

- search is only based on surface polarity (σ) and not on structure

=> scaffold hopping

- either search over the full COSMO-files of the COSMOfrag-DB (48000 compounds)
- or screen millions of candidate compounds using the COSMOfrag method
- refine your search by explicit COSMO calculations on the most similar ~500 compounds
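A similarity search over σ-profiles can be sketched as follows. The similarity measure actually used by COSMOsim is not specified at this point, so the standard continuous Tanimoto coefficient is used here as a stand-in; the function names, the five-point profiles and the compound labels are invented for illustration.

```python
import numpy as np


def tanimoto_similarity(p_a, p_b):
    """Continuous Tanimoto coefficient of two sigma-profiles on the same grid."""
    a = np.asarray(p_a, dtype=float)
    b = np.asarray(p_b, dtype=float)
    ab = float(np.dot(a, b))
    denom = float(np.dot(a, a) + np.dot(b, b)) - ab
    return ab / denom if denom > 0.0 else 0.0


def rank_by_similarity(query, database):
    """Return (name, similarity) pairs sorted from most to least similar."""
    scores = {name: tanimoto_similarity(query, profile)
              for name, profile in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical profiles on a common five-point sigma grid.
query = [0.0, 2.0, 6.0, 2.0, 1.0]
database = {
    "compound_1": [0.0, 2.0, 6.0, 2.0, 1.0],   # identical to the query
    "compound_2": [0.0, 1.0, 7.0, 3.0, 0.0],   # similar polarity pattern
    "compound_3": [5.0, 0.0, 0.0, 0.0, 0.0],   # no overlap with the query
}
for name, score in rank_by_similarity(query, database):
    print(name, round(score, 3))
```

Because the comparison uses only the σ-profile, two compounds with different scaffolds but similar surface polarity receive a high score, which is the basis of the scaffold hopping described above.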

Lit: M. Thormann, A. Klamt, M. Hornig and M. Almstetter, "COSMOsim: Bioisosteric Similarity Based on COSMO-RS σ-Profiles”, J. Chem. Inf. Model. 46, (2006).

A. Bender, A. Klamt, K. Wichmann, M. Thormann, and R.C. Glen, "Molecular Similarity Searching Using COSMO Screening Charges (COSMO/3PP)", in M.R. Berthold et al. (Eds.): CompLife 2005, LNBI 3695, pp. 175-185, Springer, Berlin Heidelberg 2005.

Example 1: propionic acid

SMILES           Key         Rank   Similarity
CCC(=O)O         ZFQCMUCKI    0     1
OC(=O)C=C        ITPZMBCLI    1     0.8169
CCCC(=O)O        IAVMXKDKI    2     0.7996
CC=CC(=O)O       RGQGEAHMI    3     0.791
CC(=C)C(=O)O     WCMTTAFLI    4     0.765
CC=CC(=O)O       VGZSDPDLI    5     0.7584
CC(C)C(=O)O      DGWQYNDKI    6     0.7487
OCC1CO1          SDLNNSMIA    7     0.7269
CC(O)C#N         HTYYARCJZ    8     0.7233
Oc1nnns1         NBAKLRQLI    9     0.7171
CC(O)C(=O)O      WOJBMNDKV   10     0.7109
CC(=O)O          CZWYICCKI   11     0.7052
Clc1nnn[nH]1     JMAKWZALI   12     0.7041
CC(=NO)C         EZHYEWAJI   13     0.6983
OCCC(=O)O        FFBMJKDKI   14     0.6978
CC(=O)C=NO       HOMSZUGLI   15     0.6919
Oc1csnn1         UMBRJEKLI   16     0.6885
OC(=O)C1CCC1     CUOCJIGKI   17     0.6817
OCCS             HLKLSJLHI   18     0.6804
CC1CC1C(=O)O     GXSEIQGKP   19     0.6767

[Figure: "propionic acid similars": σ-profiles of hits p0, p7, p8, p9, p12, p13 and p15, versus σ from -0.03 to 0.03, with structure images of p7, p8, p9, p12, p13 and p15.]

Example 2: Metabotropic Glutamate Receptor Ligands

Synthesis and Pharmacology of Metabotropic Glutamate Receptor Ligands
Grube-Jörgensen et al., ISMC 2004 P239
Drugs of the Future 2004 (29) Suppl. A: XVIIIth Symposium on MEDICINAL CHEMISTRY

[Figure: σ-profiles of compounds a, b, c and d, versus σ from -0.02 to 0.02.]

Tanimotoprime coefficients for the COSMOsim matrix:

      A      B      C      D      a      b      c      d
A   1.000  0.711  0.666  0.697  0.396  0.440  0.459  0.488
B   0.711  1.000  0.852  0.835  0.406  0.487  0.459  0.530
C   0.666  0.852  1.000  0.857  0.378  0.461  0.455  0.507
D   0.697  0.835  0.857  1.000  0.357  0.437  0.403  0.492
a   0.396  0.406  0.378  0.357  1.000  0.665  0.679  0.642
b   0.440  0.487  0.461  0.437  0.665  1.000  0.742  0.792
c   0.459  0.459  0.455  0.403  0.679  0.742  1.000  0.700
d   0.488  0.530  0.507  0.492  0.642  0.792  0.700  1.000

Glu (A), ibotenic acid (B), and thioibotenic acid (C) are known mGluR agonists. D is novel and also shows mGluR agonist activity, with mGluR subtype specificity most similar to that of C. The query of d against our inhouse database containing > 2,000,000 sigma profiles, employing the Tanimotoprime coefficient, retrieves b at rank 3 with a similarity of 0.792.
(M. Thormann, Morphochem, 2004)

COSMO-RS: From Quantum Chemistry to Cheminformatics

• The quantum-chemically derived surface polarization charge densities σ provide a novel and very rich description of molecular interactions in liquid and pseudo-liquid phases, combining electrostatics, hydrogen bonding and "hydrophobic interactions" in one picture.

• COSMO-RS provides a novel, extremely fast and efficient way to do thermodynamics based on σ-profiles.

• drug solubility and many important ADME properties can be calculated with COSMO-RS

• Quantum chemical DFT/COSMO calculations are reasonably feasible for a few hundred or thousand drug-like molecules.

• COSMOfrag derives approximate σ-profiles for drug-like compounds in a second.

• COSMOsim enables drug-similarity screening based on σ-profiles.

------
Outlook: Ligand-receptor binding based on σ-profiles

Hope you enjoyed the trip to the latitudes of solvation!

[Diagram: map of solvation methods, with COSMO-RS as the bridge from the state of ideal screening (home of COSMOlogic) to the latitudes of solvation.]

For references see: www.cosmologic.de
or read my book (Elsevier, 2005):
COSMO-RS: From Quantum Chemistry to Fluid Phase Thermodynamics and Drug Design

We are looking for an excellent cheminformatics expert to join our team!

COSMO-RS for Drug-Design and -Development
• water solubility of drugs
• solvent screening: relative solubilities of drugs in various solvents and mixtures
• partition behaviour between almost arbitrary phases (blood-brain, intestinal absorption, BCF, ...)
• pKa prediction
• of partition coefficients and solubility as surface properties
• one descriptor (σ) for entire interactions
  - electrostatics
  - hydrogen bonding
  - lipophilicity/hydrophobicity
  => useful property for MFA
- chemical potential of surfaces in solution (morphology of drugs)
• identification of binding sites from σ-hotspots
• surface integral scoring function for , including desolvation
- extension to membrane and micelle partitioning

------
COSMOfrag: σ-profiles built from similar fragments out of a 30000 compound database bring COSMO-RS into the range of 5 sec./compound => applicable to HTS

Ideas for drug-receptor binding with COSMOtherm

-we need the σ-profile of the receptor once (QM/MM? not yet solved)

- we simply have the σ-profile of the ligands (even from COSMOfrag)

Idea 1: generate scoring function from COSMO-RS surface interaction model Idea 2: consider receptor pocket as a kind of pseudo-liquid (overestimated receptor flexibility, but may be interesting)

Both simply include desolvation

Sigma profiles of Enzymes calculated with linear-scaling AM1/COSMO (MOZYME in MOPAC2002)

Some common features:
• Large charge distribution in the region around σ = 0.
• Carbonyl oxygen between 0.01 and 0.02.
• Charged side chains in the outer regions (σ < -0.02 and σ > 0.02)

[Figure: σ-profiles of bacteriorhodopsin, barnase, isomerase, BPTI, crambin, papain and HIV-1 protease, versus σ from -0.03 to 0.03.]

Bacteriorhodopsin and Retinal

[Image: mouth of the retinal binding pocket.]

[Figure: sigma profiles of the binding pocket of bacteriorhodopsin and of retinal, versus σ from -0.03 to 0.03.]

A few shots of the binding pocket

Amino Acids: Sigma profiles on two computational levels

[Figure: σ-profiles of alanine, glutamic acid and histidine at the AM1 and BP/SVP levels, versus σ from -0.03 to 0.03; marked features: -COOH, N lone pair.]