Overview of the SAMPL5 Host–Guest Challenge: Are We Doing Better?

J Comput Aided Mol Des
DOI 10.1007/s10822-016-9974-4

Jian Yin (1) • Niel M. Henriksen (1) • David R. Slochower (1) • Michael R. Shirts (2) • Michael W. Chiu (3) • David L. Mobley (4) • Michael K. Gilson (1)

Received: 24 June 2016 / Accepted: 14 September 2016
© Springer International Publishing Switzerland 2016

Corresponding author: Michael K. Gilson ([email protected])

1 Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093, USA
2 Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, CO 80309, USA
3 Qualcomm Institute, University of California, San Diego, La Jolla, CA 92093, USA
4 Departments of Pharmaceutical Sciences and Chemistry, University of California Irvine, Irvine, CA 92697, USA

Electronic supplementary material: The online version of this article (doi:10.1007/s10822-016-9974-4) contains supplementary material, which is available to authorized users.

Abstract

The ability to computationally predict protein–small molecule binding affinities with high accuracy would accelerate drug discovery and reduce its cost by eliminating rounds of trial-and-error synthesis and experimental evaluation of candidate ligands. As academic and industrial groups work toward this capability, there is an ongoing need for datasets that can be used to rigorously test new computational methods. Although protein–ligand data are clearly important for this purpose, their size and complexity make it difficult to obtain well-converged results and to troubleshoot computational methods. Host–guest systems offer a valuable alternative class of test cases, as they exemplify noncovalent molecular recognition but are far smaller and simpler. As a consequence, host–guest systems have been part of the prior two rounds of SAMPL prediction exercises, and they also figure in the present SAMPL5 round. In addition to being blinded, and thus avoiding biases that may arise in retrospective studies, the SAMPL challenges have the merit of focusing multiple researchers on a common set of molecular systems, so that methods may be compared and ideas exchanged. The present paper provides an overview of the host–guest component of SAMPL5, which centers on three different hosts, two octa-acids and a glycoluril-based molecular clip, and two different sets of guest molecules, in aqueous solution. A range of methods were applied, including electronic structure calculations with implicit solvent models; methods that combine empirical force fields with implicit solvent models; and explicit solvent free energy simulations. The most reliable methods tend to fall in the last of these classes, consistent with results in prior SAMPL rounds, but the level of accuracy is still below that sought for reliable computer-aided drug design. Advances in force field accuracy, modeling of protonation equilibria, electronic structure methods, and solvent models hold promise for future improvements.

Keywords: Host–guest · Molecular recognition · Computer-aided drug design · Blind challenge · Binding affinity

Introduction

Structure-based computer-aided drug design (CADD) methodologies are widely used to assist in the discovery of small molecule ligands for proteins of known three-dimensional structure [1–3]. Docking and scoring methods can assist with qualitative hit identification and optimization [4–6], and explicit solvent free energy methods [7–10] are beginning to show promise as an at least semi-quantitative tool to identify promising variants on a defined chemical scaffold [11–14]. However, despite numerous efforts to improve the reliability of CADD by going beyond docking and scoring methods, ligand design still includes a large component of experimental trial and error, and the reasons why CADD methods are often not predictive are unclear. Although likely sources of substantial systematic error are well known, such as inaccuracy in the energy models used and uncertainty in protonation and tautomer states, it is difficult, and perhaps impossible, to analyze systematic errors in any detail, because incomplete conformational sampling of proteins adds large, ill-characterized random error.

As a consequence, host–guest systems [15–25] are finding increasing application as substitutes for protein–ligand systems in the evaluation of computational methods of predicting binding affinities [26–28]. A host is a compound much smaller than a protein but still large enough to have a cavity or cleft into which a guest molecule can bind by non-covalent forces. Host–guest systems can be identified that highlight various issues in protein–ligand binding, including receptor flexibility, solvation, hydrogen bonding, the hydrophobic effect, tautomerization and ionization. Because host molecules tend to be more rigid and always have far fewer degrees of freedom than proteins, random error due to inadequate or uncertain conformational sampling can be dramatically reduced, allowing a tight focus on other sources of error. Additionally, host–guest systems arguably represent a minimalist threshold test for methods of estimating binding affinities, as it is improbable that a method which does not work for such simple systems could succeed for more complex proteins.

Accordingly, host–guest systems have been included in rounds 3, 4, and now 5 of the Statistical Assessment of the Modeling of Proteins and Ligands (SAMPL) project, a community-wide prediction challenge to evaluate computational methods related to CADD [29–32]. The SAMPL project has traditionally posed challenges involving not only binding affinities but also simpler physical properties, such as hydration free energies of small molecules, and, in the present SAMPL5, distribution coefficients of drug-like molecules between water and cyclohexane. Importantly, SAMPL is a blinded challenge, which means that the unpublished experimental measurements are withheld from participants until the predictions have been made and submitted. This approach avoids the risk, in retrospective computational studies, of adjusting parameters or protocols to yield agreement with the known data, leading to results which appear promising but are not in fact reflective of how the method will perform on new data. In addition, SAMPL challenges facilitate comparisons among methods, because all participants address the same problems, and the consistency of the procedures offers the possibility of comparing results from one challenge to the next, in order to at least begin to track the state of the art.

The most recent challenge, SAMPL5, included 22 host–guest systems (Fig. 1), which attracted 54 sets of predictions from seven research groups. Here, we provide an overview of this challenge and the results. (Note that many participants also have provided individual papers on their host–guest predictions, most in this same special issue, and that additional papers address the distribution coefficient challenge that also was part of SAMPL5.) The present paper is organized as follows. We first introduce the design of the current SAMPL challenge, including descriptions of the host–guest systems and measurements, information on how the challenge was organized, and the nature of the submissions. We then analyze the performance of the various computational methods, using a number of different error metrics, and compare the results with each other and with those from prior SAMPL host–guest challenges.

Methods

Structures of Host–Guest Systems and Experimental Measurements

The SAMPL5 host–guest challenge involves three host molecules, which were synthesized and studied in the laboratories of Prof. Bruce Gibb and Prof. Lyle Isaacs, who kindly allowed the experimental data to be included in the SAMPL5 challenge before being published. The first two hosts, OAH [33] and OAMe, from the Gibb laboratory, are also known as octa-acid (OA) and tetra-endo-methyl octa-acid (TEMOA) [34, 35]. The third, CBClip [36], was developed in the Isaacs laboratory. Representative 3D structures, along with 2D drawings of their respective SAMPL5 guest molecules, are shown in Fig. 1.

Host OAH was used in the SAMPL4 challenge [31], but with a different set of guests. One end of it has a wide opening to a bowl-shaped binding site, while the other end has a narrow opening that is too small to admit most guests. The bowl's opening is rimmed by four carboxylic acids, and another four carboxylic groups extend into solution from the closed end. The carboxylic groups were added to promote solubility and are not thought to interact closely with any of the guests. Host OAMe is identical to OAH, except for the addition of four methyl groups to the aromatic rings at the rim of the portal. The common guest molecules of OAH and OAMe, OA-G1–OA-G6, were chosen based on chemical diversity, solubility, and an expectation that they would exhibit significant binding to these hosts.

Host CBClip is an acyclic molecular clip that is chemically related to the cucurbiturils used in previous SAMPL projects [30, 31]. It consists of two glycoluril units, each with an aromatic sidewall, and four sulfonate solubilizing groups. Ten molecules, CBC-G1–CBC-G10, were chosen as guests of CBClip, with the aim of attaining a wide range of affinities.

The experimental binding data for all three sets of host–guest systems are listed in Table 1. [...] studied by ITC. The NMR experiments were carried out in [...]

Fig. 1 Structures of hosts OAH, OAMe, and CBClip and their guest molecules. OAH and OAMe are also known as OA and TEMOA, respectively. All host molecules are shown in two perspectives. Silver: carbon; blue: nitrogen; red: oxygen; yellow: sulfur. Non-polar hydrogen atoms were omitted for clarity. OA-G1–OA-G6 are the common guest molecules for OAH and OAMe, and CBC-G1–CBC-G10 are guests for CBClip. Protonation states of all host and guest molecules shown in the figure were suggested by the organizers based on the expected pKas and the experimental pH values.
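
As an illustration of how predicted and measured affinities can be compared in a challenge of this kind, the sketch below converts an association constant to a standard binding free energy through the textbook relation ΔG° = −RT ln Kₐ and computes three common error metrics. It is a minimal sketch under stated assumptions, not the organizers' analysis: the text above does not specify which metrics were used, the function names are hypothetical, and the default temperature of 298.15 K is an assumption.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def binding_free_energy(ka_per_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) from an association constant.

    Assumes Ka is given in 1/M (1 M standard state). The default
    temperature is an assumption, not a value taken from the paper.
    """
    if ka_per_molar <= 0:
        raise ValueError("association constant must be positive")
    return -R_KCAL * temperature_k * math.log(ka_per_molar)


def _check_lengths(pred, expt):
    if len(pred) != len(expt) or len(pred) == 0:
        raise ValueError("need two non-empty sequences of equal length")


def rmse(pred, expt):
    """Root-mean-square error between predictions and experiment."""
    _check_lengths(pred, expt)
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(pred, expt)) / len(pred))


def mean_signed_error(pred, expt):
    """Average of (prediction - experiment); reveals systematic offsets."""
    _check_lengths(pred, expt)
    return sum(p - e for p, e in zip(pred, expt)) / len(pred)


def pearson_r(pred, expt):
    """Pearson correlation coefficient; measures linear agreement in ranking."""
    _check_lengths(pred, expt)
    mean_p = sum(pred) / len(pred)
    mean_e = sum(expt) / len(expt)
    cov = sum((p - mean_p) * (e - mean_e) for p, e in zip(pred, expt))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_e = sum((e - mean_e) ** 2 for e in expt)
    if var_p == 0 or var_e == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_p * var_e)


if __name__ == "__main__":
    # Made-up numbers for illustration only; these are NOT SAMPL5 data.
    experiment = [binding_free_energy(ka) for ka in (1.0e4, 2.5e5, 3.0e6)]
    prediction = [-6.1, -8.0, -9.9]
    print("RMSE (kcal/mol):", round(rmse(prediction, experiment), 2))
    print("MSE  (kcal/mol):", round(mean_signed_error(prediction, experiment), 2))
    print("Pearson R:", round(pearson_r(prediction, experiment), 3))
```

Reporting a signed error alongside RMSE and a correlation coefficient separates a systematic offset (for example, uniformly overestimated affinities) from scatter and from ranking quality, which a single metric can hide.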
