Evaluation of Log P, Pka, and Log D Predictions from the SAMPL7 Blind Challenge
Total Page:16
File Type:pdf, Size:1020Kb
Journal of Computer-Aided Molecular Design (2021) 35:771–802 https://doi.org/10.1007/s10822-021-00397-3 Evaluation of log P, pKa, and log D predictions from the SAMPL7 blind challenge Teresa Danielle Bergazin1 · Nicolas Tielker6 · Yingying Zhang3 · Junjun Mao4 · M. R. Gunner3,4 · Karol Francisco5 · Carlo Ballatore5 · Stefan M. Kast6 · David L. Mobley1,2 Received: 21 April 2021 / Accepted: 5 June 2021 / Published online: 24 June 2021 © The Author(s) 2021 Abstract The Statistical Assessment of Modeling of Proteins and Ligands (SAMPL) challenges focuses the computational modeling community on areas in need of improvement for rational drug design. The SAMPL7 physical property challenge dealt with prediction of octanol-water partition coefcients and pKa for 22 compounds. The dataset was composed of a series of N-acylsulfonamides and related bioisosteres. 17 research groups participated in the log P challenge, submitting 33 blind submissions total. For the pKa challenge, 7 diferent groups participated, submitting 9 blind submissions in total. Overall, the accuracy of octanol-water log P predictions in the SAMPL7 challenge was lower than octanol-water log P predictions in SAMPL6, likely due to a more diverse dataset. Compared to the SAMPL6 pKa challenge, accuracy remains unchanged in SAMPL7. Interestingly, here, though macroscopic pKa values were often predicted with reasonable accuracy, there was dramatically more disagreement among participants as to which microscopic transitions produced these values (with methods often disagreeing even as to the sign of the free energy change associated with certain transitions), indicating far more work needs to be done on pKa prediction methods. Keywords log P · SAMPL · Free energy calculations · pKa Abbreviations SAMPL Statistical Assessment of the Modeling of Pro- teins and Ligands log P log10 of the organic solvent-water partition coef- fcient ( Kow ) of neutral species D Disclaimers The content is solely the responsibility of the authors log log10 of organic solvent-water distribution coef- and does not necessarily represent the ofcial views of the fcient ( Dow) National Institutes of Health. pKa −log10 of the acid dissociation equilibrium constant * David L. Mobley [email protected] SEM Standard error of the mean RMSE Root mean squared error 1 Department of Pharmaceutical Sciences, University MAE Mean absolute error of California, Irvine, Irvine, CA 92697, USA Kendall’s rank correlation coefcient (Tau) 2 Department of Chemistry, University of California, Irvine, R2 Coefcient of determination (R-Squared) Irvine, CA 92697, USA QM Quantum mechanics 3 Department of Physics, The Graduate Center, City University MM Molecular mechanics of New York, New York 10016, USA DL Database lookup 4 Department of Physics, City College of New York, LFER Linear free energy relationship New York 10031, USA QSPR Quantitative structure-property relationship 5 Skaggs School of Pharmacy and Pharmaceutical ML Machine learning Sciences, University of California, San Diego, Ja Jolla, LEC Linear empirical correction CA 92093-0756, USA 6 Physikalische Chemie III, Technische Universität Dortmund, Otto-Hahn-Str. 4a, 44227 Dortmund, Germany Vol.:(0123456789)1 3 772 Journal of Computer-Aided Molecular Design (2021) 35:771–802 Introduction on specifc areas in need of improvement, SAMPL helps drive progress in computational modeling. Computational modeling aims to enable molecular design, Here, we report on a SAMPL7 physical property chal- property prediction, prediction of biomolecular interac- lenge that focused on octanol-water partition coefcients tions, and provide a detailed understanding of chemical (log P) and pKa for the series of molecules shown in and biological mechanisms. Methods for making these Fig. 1. The pKa of a molecule, or the negative logarithm types of predictions can sufer from poor or unpredictable of the acid-base dissociation constant, is related to the performance, thus hindering their predictive power. With- equilibrium constant for the dissociation of a particular out a large scale evaluation of methods, it can be difcult acid into its conjugate base and a free proton. The pKa to know what method would yield the most accurate pre- also corresponds to the pH at which the corresponding dictions for a system of interest. Large scale comparative acid and its conjugate base each are populated equally in evaluations of methods are rare and difcult to perform solution. Given that the pKa corresponds to a transition because no individual group has expertise in or access to between specifc protonation states, a given molecule may all relevant methods. Thus, methodological studies typi- have multiple pKa values. cally focus on introducing new methods, without extensive The pKa is an important physical property to take into comparisons to other methods. account in drug development. The pKa value is used to The Statistical Assessment of Modeling of Proteins indicate the strength of an acid. A lower pKa value indi- and Ligands (SAMPL) challenges tackle modeling areas cates a stronger acid, indicating the acid more fully disso- in need of improvement, focusing the community on one ciates in water. Molecules with multiple ionizable centers accuracy-limiting problem at a time. In SAMPL chal- have multiple pKa values, and knowledge of the pKa of lenges, participants predict a target property such as solva- each of the ionizable moieties allows for the percentage tion free energy, given a target set of molecules. Then the of ionised/neutral species to be calculated at a given pH corresponding experimental data remains inaccessible to (if activity coefcients are known/assumed). pKa plays a the public until the challenge ofcially closes. By focusing particularly important role in drug development because the ionization state of molecules at physiological pH O O O O O O O O O O O O O O O O O S S N S S O S S N N N N H H H H H N H SM25 SM26 SM27 SM28 SM29 SM30 O S S S O O O O O O O O O O S S S S S S O O O O N N N N N N S S H H H H N N H H SM31 SM32 SM33 SM34 SM35 SM36 O O O O O O O O O O O S N O O S S S O N S O S O O O O O O S NH NH N N S S S H N N N N H H H SM37 SM38 SM39 SM40 SM41 SM42 O O O O O O O O N N N N O S N N S N S N S N NH NH NH N NH N N SM43 SM44 SM45 SM46 Fig. 1 Structures of the 22 molecules used for the SAMPL7 physical ment, except for compounds SM27, SM28, SM30-SM34, SM36- property blind prediction challenge. Log of the partition coefcient SM39 which had log D7.4 values determined via shake-fask assay. between n-octanol and water was determined via potentiometric titra- PAMPA assay data includes efective permeability, membrane reten- tions using a Sirius T3 instrument. pKa values were determined by tion, and log of the apparent permeability coefcient. Permeabilities potentiometric titrations using a Sirius T3 instrument. Log of the dis- for compounds SM33, SM35, and SM39 were not determined. Com- tribution coefcient between n-octanol and aqueous bufer at pH 7.4 pounds SM35, SM36 and SM37 are single cis confguration isomers. were determined via potentiometric titrations using a Sirius T3 instru- All other compounds are not chiral 1 3 Journal of Computer-Aided Molecular Design (2021) 35:771–802 773 can have important ramifcations in terms of drug-target drug-target and of-target interactions through hydropho- interactions (e.g., ionic interactions) and/or by infuencing bic interactions, and relatively high lipophilicity results in other key determinants of drug absorption, distribution, reduced aqueous solubility and increased likelihood of meta- metabolism and excretion (ADME) [1], such as lipophilic- bolic instability [5]. ity, solubility, membrane permeability and plasma protein Prediction of partitioning and distribution has some rel- binding [2]. evance to drug distribution. Particularly, partitioning and Accurate pKa predictions play a critical role in molecu- distribution experiments involve a biphasic system with lar design and discovery as well since pKa comes up in so separated aqueous and organic phases, such as water and many contexts. For example, inaccurate protonation state octanol, so such experiments have some of the features predictions impair the accuracy of predicted distribution of the interface between blood or cytoplasm and the cell coefcients such as those from free energy calculations. membrane [6, 7] and thus improved predictive power for Similarly, binding calculations can be afected by a change partitioning and distribution may pay of with an improved in protonation state [3]. If a ligand in a protein-ligand sys- understanding of such in vivo events. tem has a diferent protonation state in the binding pocket Methods to predict log P/log D may also use (and test) compared to when the molecule is in the aqueous phase, some of the same techniques which can be applied to bind- then this needs to be taken into account in the thermo- ing predictions. Both types of calculations can use solvation dynamic cycle when computing protein-ligand binding free energies and partitioning between environments (though afnities. this could be avoided by computing the transfer free energy). Multiprotic molecules, and those with multiple tauto- Such solute partitioning models are simple test systems for meric states, have two types of pKa, microscopic and mac- the transfer free energy of a molecule to a hydrophobic roscopic. The microscopic pKa applies to a specifc transi- environment of a protein binding pocket, without having to tion or equilibrium between microstates, i.e. for a transition account for additional specifc interactions which are present between a specifc tautomer at one formal charge and that in biomolecular binding sites.