Assessing the Accuracy of Octanol-Water Partition
bioRxiv preprint doi: https://doi.org/10.1101/2020.01.20.913178; this version posted January 21, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Assessing the accuracy of octanol-water partition coefficient predictions in the SAMPL6 Part II log P Challenge

Mehtap Işık (ORCID: 0000-0002-6789-952X)¹*,²†, Teresa Danielle Bergazin (ORCID: 0000-0002-0573-6178)³†, Thomas Fox (ORCID: 0000-0002-1054-4701)⁵, Andrea Rizzi (ORCID: 0000-0001-7693-2013)¹,⁶, John D. Chodera (ORCID: 0000-0003-0542-119X)¹, David L. Mobley (ORCID: 0000-0002-1083-5533)³,⁴

¹Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, United States; ²Tri-Institutional PhD Program in Chemical Biology, Weill Cornell Graduate School of Medical Sciences, Cornell University, New York, NY 10065, United States; ³Department of Pharmaceutical Sciences, University of California, Irvine, Irvine, California 92697, United States; ⁴Department of Chemistry, University of California, Irvine, Irvine, California 92697, United States; ⁵Computational Chemistry, Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co KG, 88397 Biberach, Germany; ⁶Tri-Institutional Training Program in Computational Biology and Medicine, New York, NY 10065

*For correspondence: [email protected] (MI)
†These authors contributed equally to this work

Abstract
The SAMPL Challenges aim to focus the biomolecular and physical modeling community on issues that limit the accuracy of predictive modeling of protein-ligand binding for rational drug design.
In the SAMPL5 log D Challenge, designed to benchmark the accuracy of methods for predicting drug-like small molecule transfer free energies from aqueous to nonpolar phases, participants found it difficult to make accurate predictions due to the complexity of protonation state issues. In the SAMPL6 log P Challenge, we asked participants to make blind predictions of the octanol-water partition coefficients of neutral species of 11 compounds and assessed how well these methods performed absent the complication of protonation state effects. This challenge builds on the SAMPL6 pKa Challenge, which asked participants to predict pKa values of a superset of the compounds considered in this log P challenge. Blind prediction sets of 91 prediction methods were collected from 27 research groups, spanning a variety of quantum mechanics (QM) or molecular mechanics (MM)-based physical methods, knowledge-based empirical methods, and mixed approaches. There was a 50% increase in the number of participating groups and a 20% increase in the number of submissions compared to the SAMPL5 log D Challenge. Overall, the accuracy of octanol-water log P predictions in the SAMPL6 Challenge was higher than that of cyclohexane-water log D predictions in SAMPL5, likely because modeling only the neutral species was necessary for log P and several categories of method benefited from the vast amounts of experimental octanol-water log P data. There were many highly accurate methods: 10 diverse methods achieved RMSE less than 0.5 log P units. These included QM-based methods, empirical methods, and mixed methods with physical modeling supported with empirical corrections. A comparison of physical modeling methods showed that QM-based methods outperformed MM-based methods. The average RMSEs of the five most accurate MM-based, QM-based, empirical, and mixed approach methods were 0.92 ± 0.13, 0.48 ± 0.06, 0.47 ± 0.05, and 0.50 ± 0.06, respectively.
0.1 Keywords

octanol-water partition coefficient ⋅ log P ⋅ blind prediction challenge ⋅ SAMPL ⋅ free energy calculations ⋅ solvation modeling

0.2 Abbreviations

SAMPL    Statistical Assessment of the Modeling of Proteins and Ligands
log P    log10 of the organic solvent-water partition coefficient (Kow) of neutral species
log D    log10 of organic solvent-water distribution coefficient (Dow)
pKa    -log10 of the acid dissociation equilibrium constant
SEM    Standard error of the mean
RMSE    Root mean squared error
MAE    Mean absolute error
τ    Kendall's rank correlation coefficient (Tau)
R²    Coefficient of determination (R-Squared)
QM    Quantum Mechanics
MM    Molecular Mechanics

1 Introduction

The development of computational biomolecular modeling methodologies is motivated by the goal of enabling quantitative molecular design, prediction of properties and biomolecular interactions, and achieving a detailed understanding of mechanisms (chemical and biological) via computational predictions. While many approaches are available for making such predictions, methods often suffer from poor or unpredictable performance, ultimately limiting their predictive power. It is often difficult to know which method would give the most accurate predictions for a target system without extensive evaluation of methods. However, such extensive comparative evaluations are infrequent and difficult to perform, partly because no single group has expertise in or access to all relevant methods and also because of the scarcity of blind experimental data sets that would allow prospective evaluations.
In addition, many publications that report method comparisons for a target system construct these studies with the intention of highlighting the success of a method being developed.

The SAMPL (Statistical Assessment of the Modeling of Proteins and Ligands) Challenges [http://samplchallenges.github.io] provide a forum to test and compare methods with the following goals:
1. Determine prospective predictive power rather than accuracy in retrospective tests.
2. Allow a head-to-head comparison of a wide variety of methods on the same data.

Regular SAMPL challenges focus attention on modeling areas that need improvement, and sometimes revisit key test systems, providing a crowdsourcing mechanism to drive progress. Systems are carefully selected to create challenges of gradually increasing complexity, spanning from prediction objectives that are tractable to those understood to be slightly beyond the capabilities of contemporary methods. So far, the most frequent SAMPL challenges have focused on solvation and binding systems. Iterated blind prediction challenges have played a key role in driving innovations in the prediction of physical properties and binding. Here we report on a SAMPL6 log P Challenge on octanol-water partition coefficients, treating molecules resembling fragments of kinase inhibitors. This is a follow-on to the earlier SAMPL6 pKa Challenge, which included the same compounds.

The partition coefficient describes the equilibrium concentration ratio of the neutral state of a substance between two phases:

\[ \log P = \log_{10} K_{ow} = \log_{10} \frac{[\text{unionized solute}]_{\text{octanol}}}{[\text{unionized solute}]_{\text{water}}} \tag{1} \]

The log P challenge examines how well we model the transfer free energy of molecules between different solvent environments in the absence of any complications coming from predicting protonation states and pKa values.
Assessing log P prediction accuracy also allows evaluating methods for modeling protein-ligand affinities in terms of how well they capture solvation effects.

1.1 SAMPL Challenge History and Motivation

The SAMPL blind challenges aim to focus the field of quantitative biomolecular modeling on major issues that limit the accuracy of protein-ligand binding prediction. Companion exercises such as the Drug Design Data Resource (D3R) blind challenges aim to assess the current accuracy of biomolecular modeling methods in predicting bound ligand poses and affinities on real drug discovery project data. D3R blind challenges serve as a reliable barometer of current accuracy. However, due to the conflation of multiple accuracy-limiting problems in these complex test systems, it is difficult to derive clear insights into how to make further progress towards better accuracy.

Instead, SAMPL seeks to isolate and focus attention on individual accuracy-limiting issues. We aim to field blind challenges just at the limit of tractability in order to identify underlying sources of error and help overcome these challenges. Working on similar model systems or the same target with new blinded datasets in multiple iterations of prediction challenges maximizes our ability to learn from successes and failures. Often, these challenges focus on physical properties of high relevance to drug discovery in their own right, such as partition or distribution coefficients critical to the development of potent, selective, and bioavailable compounds.
[Figure 1 schematic. Panels: SAMPL5 log D Challenge (cyclohexane + water); SAMPL6 pKa Challenge (water); SAMPL6 log P Challenge (octanol + water).]

Figure 1. The desire to deconvolute the distinct sources of error contributing to the large errors observed in the SAMPL5 log D challenge motivated the separation of pKa and log P challenges in SAMPL6. The SAMPL6 pKa and log P challenges aim to evaluate protonation state predictions of small molecules in water and transfer free energy predictions between two solvents, isolating these prediction problems.

The partition coefficient (log P) and the distribution coefficient (log D) are driven by the free energy of transfer from an aqueous to a nonpolar phase.
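To make the link between log P and the transfer free energy concrete, the following is a minimal sketch based on the standard thermodynamic relation log P = -ΔG_transfer / (RT ln 10), where ΔG_transfer is the free energy of moving the neutral solute from water to octanol. The function names, the choice of kcal/mol units, and the default temperature of 298.15 K are illustrative assumptions and are not taken from the challenge protocol.

```python
import math

# Molar gas constant in kcal/(mol*K); the unit choice is an assumption for this sketch.
R_KCAL_PER_MOL_K = 1.98720425864083e-3


def logp_from_transfer_free_energy(dg_transfer_kcal, temperature_k=298.15):
    """Convert a water-to-octanol transfer free energy (kcal/mol) to log P.

    A negative transfer free energy (solute prefers octanol) gives a positive log P.
    """
    return -dg_transfer_kcal / (R_KCAL_PER_MOL_K * temperature_k * math.log(10))


def transfer_free_energy_from_logp(logp, temperature_k=298.15):
    """Inverse relation: log P to water-to-octanol transfer free energy (kcal/mol)."""
    return -logp * R_KCAL_PER_MOL_K * temperature_k * math.log(10)


def logp_from_concentrations(conc_octanol, conc_water):
    """log P as log10 of the neutral-species concentration ratio (octanol over water)."""
    return math.log10(conc_octanol / conc_water)


if __name__ == "__main__":
    # Hypothetical input values, chosen only to illustrate the conversion.
    print(logp_from_transfer_free_energy(-2.0))
    print(transfer_free_energy_from_logp(1.0))
    print(logp_from_concentrations(100.0, 1.0))
```

At 298.15 K, RT ln 10 is about 1.36 kcal/mol, so under these assumptions an error of 1 kcal/mol in a computed transfer free energy corresponds to roughly 0.73 log P units.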