Guidelines for the Analysis of Free Energy Calculations

Journal of Computer-Aided Molecular Design manuscript No. (will be inserted by the editor) Guidelines for the analysis of free energy calculations Pavel V. Klimovich, Michael R. Shirts, and David L. Mobley Received: date / Accepted: date Abstract Free energy calculations based on molecular dy- Python tool also handles output from multiple types of free namics (MD) simulations show considerable promise for ap- energy calculations, including expanded ensemble and Hamil- plications ranging from drug discovery to prediction of phys- tonian replica exchange, as well as standard fixed ensemble ical properties and structure-function studies. But these cal- calculations. We also survey a range of statistical and graph- culations are still difficult and tedious to analyze, and best ical ways of assessing the quality of the data and free energy practices for analysis are not well defined or propagated. Es- estimates, and provide prototypes of these in our tool. We sentially, each group analyzing these calculations needs to hope these tools and discussion will serve as a foundation decide how to conduct the analysis and, usually, develop its for more standardization of and agreement on best practices own analysis tools. Here, we review and recommend best for analysis of free energy calculations. practices for analysis yielding reliable free energies from molecular simulations. Additionally, we provide a Python Keywords hydration free energy · transfer free energy · tool, alchemical-analysis.py, freely available on free energy calculation · analysis tool · binding free energy · GitHub at https://github.com/choderalab/pymbar-examplesalchemical , that implements the analysis practices reviewed here for sev- eral reference simulation packages, which can be adapted to handle data from other packages. Both this review and the 1 Introduction tool covers analysis of alchemical calculations generally, including free energy estimates via both thermodynamic inte- 1.1 Free energy calculations assist drug discovery gration and free energy perturbation-based estimators. Our Complex chemical and biological systems pose a key chal- P. Klimovich · D. Mobley lenge for modern molecular and computational science. We Department of Pharmaceutical Sciences and Department of Chemistry seek computational models which can provide quantitative 147 Bison Modular predictions, not just qualitative insight. Researchers seek to University of California, Irvine answer questions such as “how much?”, “how big?”, “how Irvine, CA 92697 tight?” and so on, and increasingly apply physically-detailed Tel.: 949-824-6383 Fax: 949-824-2949 computation to help answer these questions. Models seek to E-mail: [email protected] mimic or simulate the processes in question, helping reveal M. R. Shirts and provide new understanding of mechanisms and phenom- Department of Chemical Engineering ena which might be challenging or impossible to probe ex- University of Virginia perimentally [20, 23, 38, 43]. Charlottesville, VA 22094 Free energy calculations [11, 7, 25, 9, 18] provide a good D. Mobley example of a computational technique which provides a quan- Department of Chemistry University of New Orleans titative answer to a specific question – in this case, “what is 2000 Lakeshore Drive the free energy difference between the two thermodynamic New Orleans, LA 70148 end states of the system?” This question arises, for example, in drug discovery [16] where drugs need to be ranked 2 Klimovich et al. 1 and 2. In such cases, we can instead compute free energy differences between a series of intermediate states which ∆Ghydration do have sufficient overlap, leading from state 1 to state 2. These intermediate states are typically artificial, unphysical 1 2 states constructed to link the physical states of interest, and form part of a thermodynamic cycle (see Fig. 1) linking the gas water two end states of interest. For most free energy calculations ∆Gahhihilation ∆Gannihilation relevant to binding, solvation, and solubility, this alternate, unphysical pathway involves effectively deleting and/or in- serting some atoms, while possibly also making parameter changes to those and other atoms. ∆G=0 1.3 The thermodynamic cycle depicts alternate paths 3 4 between the end states Fig. 1 The thermodynamic cycle for a standard hydration free energy calculation. Here, the blue background represents water and the The thermodynamic cycle for standard hydration free en- clear background represents gas. The goal is to find the free energy ergy calculations is comprised of the four legs joining the difference ∆Ghydration between the two states: end state 1 (upper left) representing the solute in the gas phase, and end state 2 (upper four states of interest (Fig. 1): (2) a molecule interacting right) depicting the solvated molecule. The ∆Ghydration is found with a box of water, (1) the same molecule present alone as a sum of the free energy changes between the end states 1 and 2 in the gas phase, (4) the molecule in water but not interact- and the intermediate alchemical states, intermediate state 3 (lower left) ing with the surrounding water, and (3) the non-interacting and intermediate state 4 (lower right), introduced along the alternative pathway 1 ! 3 ! 4 ! 2, which may include additional nonphysi- molecule again alone in the gas phase. We thus compute the cal states interpolating between the legs 1 ! 3 3 ! 4, and 4 ! 2. solvation free energy by modifying the solute molecule in The black-and-white appearance of the solute molecule, at bottom, in- each of its environments. To do so, we compute the free dicates the solute is in a state where it has no non-bonded interactions energy of turning off the solute’s non-bonded interactions with its environment, and possibly also no internal non-bonded interactions as well. These scenarios are referred to as decoupling and annihi- with its environment (called decoupling) or turning off both lation, respectively, as discussed in the text. In the case of decoupling, internal non-bonded interactions and interactions with the the transformation pathway reduces to the single leg 2 ! 4, and the environment (called annihilation) (Fig. 1)1. Specifically, we hydration free energy is the negative of the free energy for this leg. In compute the free energies associated with Figure 1, 1 ! 3 this case of annihilation, which also modifies internal solute interactions, both legs are necessary. and 2 ! 4, turning off the molecule’s interactions in gas phase and in water, respectively. Most commonly, both of these transformations are carried out by first scaling (usu- by, among other criteria, their binding affinity [9] to a tar- ally linearly) the solute charges to zero and then turning off get protein. To use free energy calculations to answer this the solute’s Lennard-Jones (LJ) interactions (usually via the question, one must first build a model of the system, and “soft-core” scheme [2]). These transformations are done in then identify the end states between which the free energy a series of steps by introducing a parameter λ which mod- difference is to be computed. ulates the potential energy of the system, so that as λ goes from 0 to 1 the potential energy transitions between that of the initial state and that of the target final state. Simulations 1.2 Free energy calculations begin with a definition of the are then run at a set of different λ values connecting the two end states states. In other words, each of the two transformation path- ways is subdivided into a variety of individual steps, where The thermodynamic end states (for example, Fig. 1 states 1 Annihilation and decoupling can be thought to differ primarily in 1 and 2) are the key starting point in free energy calcula- how they handle charge and Lennard-Jones parameters. Specifically, tions. In principle, free energy differences between the end annihilation involves actually setting solute partial charges to zero, states can be computed simply from simulations conducted while decoupling involves turning off charge interactions with envi- in one or both states [11]. But in practice, this is typically not ronment. Likewise, annihilation involves actually setting the Lennard- Jones parameters to zero, while decoupling involves turning off inter- possible for biomolecular systems on reasonable timescales. actions with environment. Our current explanation is specific to the To compute accurate free energy differences between states, more general case, annihilation, but in case of decoupling no gas trans- their phase space integrals must have sufficient overlap which formation 1 ! 3 is needed and the overall transformation reduces to the single leg 2 ! 4, i.e. the hydration free energy change is found as in practice is attainable only when both states are extremely water the negative of ∆Gdecoupling, with the possible exception of an anayt- similar. When this is not the case, it can be impossible to di- ical standard state correction depending on the experimental reference rectly compute the free energy difference between end states state employed. Free energy analysis 3 1.4 The interactions can be decoupled λCoulomb Changes in electrostatic interactions are often separated from changes in LJ interactions to avoid inaccuracy in the free en- 1.0 ergy estimate and sampling challenges. Specifically, if electrostatic interactions are retained while an atom’s LJ inter- 0.8 actions are being removed, the associated charge becomes more and more exposed, and can create huge electrostatic 0.6 forces leading to large and expensive-to-converge

Guidelines for the Analysis of Free Energy Calculations

BOSS, Version 4.6 Biochemical and Organic Simulation System User’S Manual for UNIX, Linux, and Windows

Macromodel Quick Start Guide

Solvation Free Energy of Protonated Lysine: Molecular Dynamics Study

Towards Accurate Free Energy Calculations in Ligand Protein-Binding Studies

Extending the Scope of Alchemical Perturbation Methods for Ligand Binding Free Energy Calculations

Computing Free Energies with PLUMED 2.0

Article Titlle Iitmomleicullrr Sitoullrtmns Frmo Ynnroitics Rny Oeicarnitsos Lm Icmopullrtmnrl Rssrns Mf Iitmlm⪪Iticrl Rictiitln

DICE: a Monte Carlo Code for Molecular Simulation Including the Conﬁgurational Bias Monte Carlo Method Henrique M

Absolute Free Energy of Binding Calculations for Macrophage

Absolute Binding Free Energy Calculations for Highly Flexible Protein MDM2 and Its Inhibitors

Free Energy Calculations in Action: Theory, Applications and Challenges of Solvation Free Energies

Free Energy Simulations of Complex Biological Systems at Constant Ph