
pubs.acs.org/JCTC Article

Implicit Solvents for the Polarizable Atomic Multipole AMOEBA Force Field

Rae A. Corrigan,# Guowei Qi,# Andrew C. Thiel, Jack R. Lynn, Brandon D. Walker, Thomas L. Casavant, Louis Lagardere, Jean-Philip Piquemal, Jay W. Ponder, Pengyu Ren, and Michael J. Schnieders*

Cite This: J. Chem. Theory Comput. 2021, 17, 2323−2341


ABSTRACT: Computational design, ab initio protein/RNA folding, and protein−ligand screening can be too computationally demanding for explicit treatment of solvent. For these applications, implicit solvent offers a compelling alternative, which we describe here for the polarizable atomic multipole AMOEBA force field based on three treatments of continuum electrostatics: numerical solutions to the nonlinear and linearized versions of the Poisson−Boltzmann equation (PBE), the domain-decomposition conductor-like screening model (ddCOSMO) approximation to the PBE, and the analytic generalized Kirkwood (GK) approximation. The continuum electrostatics models are combined with a nonpolar estimator based on novel cavitation and dispersion terms. Electrostatic model parameters are numerically optimized using a least-squares style target function based on a library of 103 small-molecule free energy differences. Mean signed errors for the adaptive Poisson−Boltzmann solver (APBS), ddCOSMO, and GK models are 0.05, 0.00, and 0.00 kcal/mol, respectively, while the mean unsigned errors are 0.70, 0.63, and 0.58 kcal/mol, respectively. Validation of the electrostatic response of the resulting implicit solvents, which are available in the Tinker (or Tinker-HP), OpenMM, and Force Field X software packages, is based on comparisons to explicit solvent simulations for a series of proteins and nucleic acids. Overall, the emergence of performative implicit solvent models for polarizable force fields opens the door to their use for folding and design applications.

Received: December 11, 2020
Published: March 26, 2021

■ INTRODUCTION

Solvation plays a key role in accurately portraying the natural processes of molecules in vitro and in vivo.1 Hydrophilic and hydrophobic interactions govern protein folding2 and impact molecular recognition.3,4 For these reasons, solvent must be considered during computational protein design and optimization,5−9 RNA folding,10,11 and biocatalyst design.12 While explicit solvent often provides a more complete depiction of solvation effects on molecular interactions, its use can become impractical for biomolecular folding and design applications. To help alleviate this computational expense, implicit solvation models have been developed.13,14

Implicit solvents are designed to replicate explicit solvent while treating water as a continuum to avoid the cost of calculating the interactions of thousands of individual water molecules. The total implicit solvent potential of mean force $\Delta W_{hydration}(X)$ as a function of atomic coordinates X can be formulated as a sum of cavitation, dispersion, and electrostatic contributions

$$\Delta W_{hydration}(X) = \Delta W_{cav}(X) + \Delta W_{disp}(X) + \Delta W_{elec}(X) \tag{1}$$

where $\Delta W_{cav}$ is the unfavorable cost of forming a solute-shaped cavity within solvent, $\Delta W_{disp}$ is the favorable contribution of including solute−solvent van der Waals interactions, and $\Delta W_{elec}$ captures the difference between charging the molecule in solvent and in vacuum environments13,15,16 (Figure 1). Collectively, cavitation and dispersion are termed the nonpolar contribution17−21 to solvation free energy, while the electrostatic term is referred to as the polar contribution.4,22−27 For the latter, previous widely used implicit solvents for biomolecules include approaches based on numerical solutions to the Poisson−Boltzmann equation (PBE)26,28−31 and the analytic generalized Born (GB) approximation.32−40 The majority are built upon fixed partial charge force fields that maintain constant dipole moments across vacuum and aqueous environments. On the other hand, the family of implicit solvents described here is parameterized for use with the polarizable atomic multipole

© 2021 American Chemical Society https://doi.org/10.1021/acs.jctc.0c01286

Figure 1. Total implicit solvent potential of mean force $\Delta W_{hydration}(X)$ can be formulated using a thermodynamic cycle composed of five steps. Step 1: the solute is decharged in vacuum, $-U_{elec}^{vacuum}(X)$. Step 2: dispersion interactions are removed between the solute and the surrounding medium, which has no energetic cost in vacuum. Step 3: a solute-shaped cavity is formed in water, $\Delta W_{cav}(X)$, which is unfavorable and proportional to the solvent-excluded volume for small solutes. Step 4: favorable solute−solvent dispersion interactions are added, $\Delta W_{disp}(X)$. Step 5: the solute is charged in water, $U_{elec}^{water}(X)$, to yield an overall electrostatic contribution of $\Delta W_{elec}(X) = U_{elec}^{water}(X) - U_{elec}^{vacuum}(X)$.

AMOEBA force field.41,42 These models combine intramolecular solute polarization with the electrostatic response of the continuum solvent via a self-consistent reaction field (SCRF) that leverages numerical solutions to the PBE43,44 or the much faster analytic generalized Kirkwood (GK) approximation45 (GK extends GB to polarizable multipoles using work by Kirkwood46). In principle, polarizable biomolecular charge distributions (i.e., induced dipoles for the AMOEBA model) are then able to respond to both low-dielectric (e.g., benzene or carbon tetrachloride) and high-dielectric (e.g., methanol or water) environments.

Efforts to combine polarizable biomolecular force fields with implicit solvents began in the 2000s with the introduction of the polarizable force field (PFF) and its initial application to protein−ligand interactions.47 The PFF defines solute electrostatics using permanent atomic multipoles (through dipole order) and induced dipoles, while the PBE was solved using a finite element mesh.48 A more recent example combined a Drude oscillator force field49,50 with numerical solutions of the PBE.51 Application of this model to pKa prediction52 showed superior accuracy relative to the additive CHARMM36 force field,53,54 although it increased the computational cost. A second recent example combined the bond capacity (BC) polarization model with both the generalized Born (GB) model and the conductor-like polarizable continuum model (C-PCM).55 For the BC−GB model, NVE molecular dynamics (MD) was shown to conserve energy at a modest increase in cost of only 15% relative to vacuum.56 Cooper et al. have developed an electrostatics solver called PyGBe, which employs a tree-code accelerated boundary-element formulation. PyGBe specifically addresses the multisurface problems common to boundary-element method (BEM) solvers and is able to achieve directly comparable accuracy to the adaptive Poisson−Boltzmann solver (APBS) with increased speed.57,58 Although beyond our focus on implicit solvents for biomolecular polarizable force fields, there is a large body of work dedicated to quantum mechanical SCRF implicit solvents,59−61 including the polarizable continuum model (PCM),62,63 the solvation model (SM) series,64−66 and conductor-like screening models (COSMOs).67,68

Here, we describe the theory, implementation, and parametrization of implicit solvents compatible with the polarizable AMOEBA force field. We describe a nonpolar estimator consisting of novel cavitation and dispersion terms, which is combined with electrostatic contributions based on solving the PBE numerically with the adaptive Poisson−Boltzmann solver (APBS),43 the domain-decomposition COSMO (ddCOSMO) approach,44 and the analytic generalized Kirkwood (GK) model.45 Model parameters are fit to experimental solvation free energy differences for a set of 103 small molecules. The resulting implicit solvent hydration free energy differences are compared to those obtained previously using explicit solvent AMOEBA free energy simulations. Furthermore, the electrostatic response of the resulting models is validated for a series of proteins and nucleic acids in continuum water compared to both explicit solvent AMOEBA simulations and to widely used fixed-charge force fields. Finally, the relative computational speed of the models is compared.

■ METHODS

AMOEBA Parameterization Using PolType2. The PolType2 protocol was used to generate AMOEBA small-molecule parameters, beginning from an initial optimization at the MP2/6-31G* level of theory. Ab initio quantum mechanics (QM) calculations were performed using Gaussian 09. All molecular mechanics (MM) force field-based calculations needed for parameterization were performed using the Tinker 8 Software.69 Valence parameters were taken from the small-molecule parameter database in PolType2. Atomic multipole moments were initially assigned from the QM electron density calculated at the MP2/6-311G** level via Stone’s distributed multipole analysis.70 Further optimization of permanent multipoles was performed using the Tinker Potential program to fit the electrostatic potential around each molecule to a QM electron density at the MP2/6-311++G(2d,2p) level. All small-molecule AMOEBA parameter files (Tinker “prm” format) are available as Supporting Information.

Small-Molecule Data Set. A test set of 103 small molecules was used to parameterize the implicit solvent models. The experimental solvation free energy differences for neutral compounds were taken from the FreeSolv Database, version 0.51,71,72 unless otherwise indicated.73,74 Experimental hydration free energy differences for charged compounds ($\Delta G_{solv}^{ion}$) were calculated using eq 2

$$\Delta G_{solv}^{ion} = \Delta G_{solv}^{neutral} \pm \Delta G_{gas}^{basicity} \pm \Delta G_{solv}^{H^+} \mp 2.303\,RT\,\mathrm{p}K_a \tag{2}$$

where the upper signs are used for cations, and the lower signs for anions. $\Delta G_{solv}^{neutral}$ is the solvation free energy difference of the neutral molecule, $\Delta G_{gas}^{basicity}$ is the gas-phase basicity from NIST,75,76 $\Delta G_{solv}^{H^+}$ is the solvation free energy difference of a proton, R is the universal gas constant in kcal/(mol K), T is the temperature in Kelvin, and pKa is the negative decimal logarithm of the acid dissociation constant from Stewart.77 For phosphate and guanidinium compounds, experimental values for the neutral solvation free energy difference and/or gas-phase basicity were not available. Due to their importance in fitting electrostatic implicit solvent parameters for proteins (i.e., arginine) and nucleic acids (i.e., the phosphate backbone), target solvation free energy differences for these compounds were calculated from AMOEBA explicit solvent solvation free energy simulations.

The value for the solvation free energy difference of the proton used here (−254.22 kcal/mol) was calculated as the sum of potassium ion solvation free energy (−74.32 kcal/mol78) and the experimental free energy of transfer K+ → H+ (−179.90 kcal/mol79). Using an intrinsic value for proton solvation, we avoid implicitly fitting to the interface potential (i.e., the potential created by preferential orientation of water molecules at the vacuum−liquid interface; see Figure S1, Supporting Information), while the preferential orientation of water around uncharged solute cavities (i.e., the cavity potential) is included. The potential energy contribution due to a charged molecule crossing the vacuum−liquid interface can be added to implicit solvent free energy differences in the same manner as for periodic explicit solvent simulations.80 By excluding the interface potential from implicit solvent parameterization, ensemble averages, conformational distributions, and free energy differences from implicit solvent and periodic boundary explicit solvent simulations are directly comparable.78,81

Of the molecules tested, 91 were neutral and 12 were charged. Charged compounds were chosen based on their chemical similarity to charged groups in biomolecules. All starting structures were obtained from PubChem and parameterized for AMOEBA using PolType2 as described above,81 then energy-minimized in vacuum. We see close agreement between the vacuum dipole moments from single point MP2/6-311G**(2d,2p) calculations and those from the AMOEBA small-molecule parameters, shown in Figure 2.

Figure 2. Total dipole moments for the parameterization set of 103 molecules from QM using MP2/6-311G**(2d,2p) are compared to those of the resulting AMOEBA models (red dashed line: Y = 0.9986·X + 0.0011; R2 = 0.9999).

Cavitation Free Energy. The Lum−Chandler−Weeks theory of hydrophobicity predicts contrasting behavior for the cavitation free energy of small and large solutes.1,82−84 At all length scales, the driving force for phase separation is proportional to the solute volume, while the cost to form an interface is proportional to the surface area. These competing factors are exhibited in a cross-over in the dependence of the cavitation free energy change between volume scaling for small solutes and surface area scaling for large solutes, which, for a spherical cavity, occurs at a radius of approximately 10 Å.

$$\Delta W_{cav}(r) \propto \begin{cases} \text{volume}, & r \le \sim 10\ \text{Å} \\ \text{surface area}, & r > \sim 10\ \text{Å} \end{cases} \tag{3}$$

For spherical solutes with a radius below this threshold, water molecules are generally able to form a network surrounding the solute that maintains a complement of hydrogen bonds similar to bulk water (i.e., ∼4 per water). As the solute size increases toward mimicking a flat liquid−vacuum interface, each water molecule on average sacrifices a single hydrogen bond (i.e., one hydrogen from each water molecule is directed toward vacuum giving rise to a phase potential). For solutes with varying shapes, such as biomolecules, the cavitation cost is proportional to neither the volume nor the surface area but rather to some local mixture of the two regimes. For example, the cost to form a cavity scales more with volume character for an extended chain than for a compact spherical conformation, where both conformations have equal surface areas. One can imagine protein conformations with both extended loops and large compact regions, suggesting that cavitation terms that do not consider local conformation are clearly an approximation. It is beyond the scope of the current work to develop a general functional form for the cavitation free energy of a biomolecular solute of arbitrary size and shape, although approaches that adjust the effective surface tension (ST) based on local curvature are promising.85 Fortunately, as small-molecule cavitation free energies are in the volume-scaling regime, the magnitude of the cavitation term for the AMOEBA implicit solvents is not expected to change for the small-molecule parameterization discussed here, even if an improved cavitation model for larger biomolecules is defined in the future.

An effective radius for a nonspherical solute conformation X can be determined from the calculation of either the solvent-excluded volume SEV(X)

$$r_{SEV}(X) = \sqrt[3]{\frac{3 \times \mathrm{SEV}(X)}{4\pi}} \tag{4}$$

or the solvent-accessible surface area SASA(X)

$$r_{SASA}(X) = \sqrt{\frac{\mathrm{SASA}(X)}{4\pi}} \tag{5}$$

using algorithms developed by Connolly86−88 and implemented in Tinker69 and Force Field X.89 As first described by Richards, SEV and SASA are defined by rolling a spherical probe (i.e., with a radius of 1.4 Å to approximate water) around the surface of a molecule.90 The cavitation free energy of an (approximately) spherical solute can then be described by a piecewise continuous function of its effective radius

$$\Delta W_{sphere}(r) = \begin{cases} \lambda \times \mathrm{SEV}(X), & r \le \chi \\ \gamma \times \mathrm{SASA}(X), & r > \chi \end{cases} \tag{6}$$

where in the volume-scaling regime, cavitation free energy is defined by the product of SEV and solvent pressure (SP) denoted by λ (kcal/(mol Å3)); in the surface-area-scaling regime, cavitation free energy is defined by the product of SASA and surface tension (ST) denoted by γ (kcal/(mol Å2)). For our model, SP was assessed using two explicit solvent simulation approaches that are in general agreement. The first approach assumes that the SEV and SASA cavitation free energies are equal at the cross-over point χ for a spherical solute, which yields the relationship λ = (3·γ)/χ. This defines SP to be 0.031 kcal/(mol Å3), using the experimental surface tension of water (0.103 kcal/(mol Å2)) and an approximate cross-over point of 10 Å from fixed-charge simulations.1 The second approach leverages explicit solvent free energy perturbation simulations91 using the AMOEBA water model92 and 39 AMOEBA small molecules as described elsewhere,41 which resulted in a mean SP of 0.0334 kcal/(mol Å3). Using the relationship between the experimental surface tension of water and the latter SP, the volume-to-surface-area cross-over radius is 9.251 Å. Both SP estimates are within 0.003 kcal/(mol Å3) of each other, and both define cross-over radii that differ by less than 0.75 Å. For the current model, the latter SP of 0.0334 kcal/(mol Å3) and the cross-over radius of 9.251 Å were chosen due to their consistency with the AMOEBA model.

The simple definition in eq 6 for the transition between the volume-scaling and surface-area-scaling regimes is not useful for molecular dynamics simulations or optimization algorithms because it lacks continuous first and second derivatives. To address this, it is possible to introduce a simple multiplicative switch $s_v(r)$ to smoothly turn off the volume term and a second switch $s_{sa}(r)$ to smoothly turn on the surface area term. Each switch acts over a window of length w = 7 Å centered on the cross-over point χ, such that the switch begins at b = χ − w/2 and ends at e = χ + w/2 to give

$$\Delta W_{switch}(r) = \begin{cases} \lambda \times \mathrm{SEV}(X), & r \le b \\ \lambda \times \mathrm{SEV}(X) \times s_v(r) + \gamma \times \mathrm{SASA}(X) \times s_{sa}(r), & b < r \le e \\ \gamma \times \mathrm{SASA}(X), & e < r \end{cases} \tag{7}$$

The volume-scaling switch $s_v(r)$ is a fifth-order polynomial whose six coefficients are uniquely determined by constraining its value at b to $s_v(b) = 1$ and at e to $s_v(e) = 0$, as well as constraining first and second derivatives at b and e to be zero. This gives

$$s_v(r) = c_0 + c_1 r + c_2 r^2 + c_3 r^3 + c_4 r^4 + c_5 r^5 \tag{8}$$

where

$$\begin{aligned} c_0 &= e^3 (e^2 - 5eb + 10b^2)/d \\ c_1 &= -30 e^2 b^2 / d \\ c_2 &= 30 (e^2 b + e b^2)/d \\ c_3 &= -10 (e^2 + 4eb + b^2)/d \\ c_4 &= 15 (e + b)/d \\ c_5 &= -6/d \\ d &= (e - b)^5 \end{aligned} \tag{9}$$

The surface area switch in this symmetric case is

$$s_{sa}(r) = 1 - s_v(r) \tag{10}$$

The behavior of the cavitation free energy using a symmetric switch showed a modest peak at the cross-over point, which is removed by shifting the center of the switching region for the SA term to larger effective radius values by a small offset o = 0.2 Å. This gives the final functional form used here

$$\Delta W_{cav}(X) = \begin{cases} \lambda \times \mathrm{SEV}(X), & r \le b \\ \lambda \times \mathrm{SEV}(X) \times s_v(r), & b < r \le e \\ 0, & e < r \end{cases} + \begin{cases} 0, & r \le b + o \\ \gamma \times \mathrm{SASA}(X) \times s_{sa}(r - o), & b + o < r \le e + o \\ \gamma \times \mathrm{SASA}(X), & e + o < r \end{cases} \tag{11}$$

where the surface area switch is now slightly offset from the volume switch. The smooth behavior of $\Delta W_{cav}(X)$ as a function of the effective radius is shown in Figure 3. In this work, the effective radius of the molecule is determined using SASA (eq 5), rather than SEV (eq 4), due to the former being faster to compute for large solutes (i.e., SEV does not need to be computed for large biomolecules). The gradient of SEV and SASA with respect to atomic coordinates has been described previously.93−95

Dispersion Free Energy. The pairwise dispersion energy for the AMOEBA model is given by a buffered 14-7 potential97

$$U_{14\text{-}7}(r_{ij}) = \varepsilon_{ij} \left( \frac{1.07\, r_{0,ij}}{r_{ij} + 0.07\, r_{0,ij}} \right)^{7} \left( \frac{1.12\, r_{0,ij}^{7}}{r_{ij}^{7} + 0.12\, r_{0,ij}^{7}} - 2 \right) \tag{12}$$

where $r_{ij}$ is the separation distance between atoms i and j, $\varepsilon_{ij}$ is the well depth, and $r_{0,ij}$ is the minimum energy separation distance.41 This can be used to define a purely repulsive Weeks−Chandler−Andersen (WCA) potential98,99 as

$$U_{rep}(r_{ij}) = \begin{cases} U_{14\text{-}7}(r_{ij}) + \varepsilon_{ij}, & r_{ij} < r_{0,ij} \\ 0, & r_{ij} \ge r_{0,ij} \end{cases} \tag{13}$$

which is shown in Figure 4.

Work by Gallicchio, Kubo, and Levy (GKL) demonstrated that the free energy of adding dispersion interactions to the WCA repulsive potential, thereby restoring full van der Waals interactions between solute and solvent, is nearly equal to the change in the solute−solvent interaction energy for a series of small solutes studied using free energy perturbation (FEP)100

$$\Delta W_{disp} \approx \langle U_{14\text{-}7} \rangle - \langle U_{rep} \rangle \tag{14}$$
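As a rough illustration of the switched cavitation term in eqs 8−11, the following Python sketch evaluates it for an idealized sphere. It assumes SEV = 4πr³/3 and SASA = 4πr² for a single effective radius r, a simplification made here for illustration only (the text obtains SEV and SASA from Connolly's algorithms). The function names are hypothetical and are not taken from Tinker or Force Field X; the constants λ, γ, χ, w, and o are the values quoted in the Cavitation Free Energy section.

```python
import math

LAMBDA = 0.0334  # solvent pressure, kcal/(mol Å^3), value quoted in the text
GAMMA = 0.103    # surface tension of water, kcal/(mol Å^2), value quoted in the text
CHI = 9.251      # volume-to-surface-area cross-over radius, Å
WINDOW = 7.0     # switching window length w, Å
OFFSET = 0.2     # offset o of the surface-area switch, Å


def switch_coefficients(b, e):
    """Quintic coefficients c0..c5 of eq 9 for a switch from 1 at b to 0 at e."""
    d = (e - b) ** 5
    return (
        e**3 * (e**2 - 5.0 * e * b + 10.0 * b**2) / d,
        -30.0 * e**2 * b**2 / d,
        30.0 * (e**2 * b + e * b**2) / d,
        -10.0 * (e**2 + 4.0 * e * b + b**2) / d,
        15.0 * (e + b) / d,
        -6.0 / d,
    )


def s_v(r, b, e):
    """Volume switch of eq 8, held at 1 below b and at 0 above e."""
    if r <= b:
        return 1.0
    if r >= e:
        return 0.0
    return sum(c * r**k for k, c in enumerate(switch_coefficients(b, e)))


def cavitation_sphere(r):
    """Eq 11 for an idealized sphere of effective radius r (illustrative assumption)."""
    b = CHI - WINDOW / 2.0
    e = CHI + WINDOW / 2.0
    sev = 4.0 / 3.0 * math.pi * r**3   # assumed sphere volume
    sasa = 4.0 * math.pi * r**2        # assumed sphere surface area
    volume_term = LAMBDA * sev * s_v(r, b, e)
    # s_sa(r - o) = 1 - s_v(r - o): the surface-area switch is shifted by the offset
    surface_term = GAMMA * sasa * (1.0 - s_v(r - OFFSET, b, e))
    return volume_term + surface_term
```

Below b the result reduces to λ × SEV, and above e + o it reduces to γ × SASA; in between, the two quintic switches blend the regimes with continuous first and second derivatives.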


The GKL result in eq 14 led to their suggestion of a dispersion free energy estimator based on Born radii, such that the dispersion free energy of the solute is

$$\Delta W_{GKL} = \sum_{i=1}^{n} \frac{-16\pi \rho_w \varepsilon_{iw} \sigma_{iw}^{6}}{3 R_i^{3}} \tag{15}$$

where $\rho_w$ is the number density of water (0.033428 per Å3), $\varepsilon_{iw}$ and $\sigma_{iw}$ are the well depth and sigma value of the interaction of atom i with the TIP3P water oxygen, respectively, n is the number of solute atoms, and $R_i$ is the Born radius.37,73 In effect, the term acts like a tail correction, assuming solvent to be a continuum outside the solute and integrating the $1/r^6$ attractive portion of a 6-12 Lennard-Jones potential. In the limit of a spherical solute, the use of the Born radii in eq 15 is exact; however, for other geometries, it is an approximation.

The goal for the dispersion free energy model is to build on the insights described above by removing the Born radii from the GKL model given in eq 15 and instead integrating the true WCA attractive potential (Figure 4) outside of the solute cavity for each atom.

$$U_{WCA}(r_{ij}) = U_{14\text{-}7}(r_{ij}) - U_{rep}(r_{ij}) = \begin{cases} -\varepsilon_{ij}, & r_{ij} < r_{0,ij} \\ U_{14\text{-}7}(r_{ij}), & r_{ij} \ge r_{0,ij} \end{cases} \tag{16}$$

We present an analytic approach based on the Hawkins−Cramer−Truhlar (HCT) pairwise integration method also used for GK.34,101 Due to the use of the buffered 14-7 potential by AMOEBA, the underlying pairwise integration machinery needs to account for the constant portion of the WCA potential for $r < r_{0,io}$ (where in this case $r_{0,io}$ is the minimum energy separation for solute atom i with an AMOEBA water oxygen) and integrate both $1/r^7$ and $1/r^{14}$ for $r > r_{0,io}$. The general analytic form for the dispersion free energy, $\Delta W_{disp}(X)$, of a solute with coordinates X is given by

$$\Delta W_{disp}(X) = \rho_w \sum_{i=1}^{n} \int_{R_0}^{\infty} \int_{0}^{\pi} \int_{0}^{2\pi} U_{WCA}(r)\, S(r, \theta, \phi, X, R)\, \sin\theta\; r^2\, d\phi\, d\theta\, dr \tag{17}$$

where the solvent indicator function S is unity if the point (r, θ, ϕ) is located within the solvent but zero otherwise, $\rho_w$ is the number density of water, and R are the AMOEBA minimum energy separation distance values ($r_{0,ij}$) for each atom. The radial integral for atom i with continuum water oxygen begins from half their combined $r_{0,ij}$ (in this case, $r_{0,io}$ for atom i with water oxygen) value plus an offset (d = 1.056 Å), which is one of two free parameters in the model. The beginning of the radial integral is defined as $R_0 = r_{0,ij}/2 + d$. The second free parameter is a scale factor (s = 0.75) that accounts for the overlapping volumes of neighboring atoms during evaluation of the dispersion integral over solute atoms. Both parameters, which appear below in eq 18 for $\Delta W_{disp}(X)$, were fit against dispersion (eq 18) measured from explicit solvent simulations as described in the Supporting Information (Table S1). For a single water oxygen atom, the behavior of $\Delta W_{disp}$ is shown in Figure 5A, and the dispersion interactions of explicit water atoms (oxygen and hydrogen) with continuum water oxygen and hydrogen are shown in Figure 5B,C.

After performing the two angular integrals in eq 17, inverting the integration domain and applying the HCT pairwise approximation34 gives

$$\Delta W_{disp}(X) = \rho_w \sum_{i=1}^{n} \left[ U_{tail,water}(d) - 4\pi \sum_{j \ne i} \int_{L}^{U} U^{*}_{WCA,water}(r)\, H(r, r_{ij}, \rho_j)\, r^2\, dr \right] \tag{18}$$

Figure 3. Surface tension is not constant for small solutes but increases approximately linearly until the effective radius of the solute increases to beyond ∼10 Å. For large (flat) solutes, surface tension asymptotes toward the experimental value for a water-vapor interface of 0.103 kcal/(mol Å2).96 This length scale dependence can be approximately captured by a cavitation free energy difference that switches between using SEV and SASA via either a simple, nondifferentiable form ($\Delta W_{sphere}(r)$ given by eq 6, black dashed line) or the smooth form used in this work ($\Delta W_{cav}(r)$ given by eq 11, green line). The asymptotic surface tension of $\Delta W_{cav}(r)$ can be reduced relative to the experimental value (e.g., to 0.08 kcal/(mol Å2), dotted blue line) to capture cavitation for solutes with a large effective radius but which are more highly curved than a simple sphere (e.g., a DNA double helix, RNA molecule, or protein).

Figure 4. Pairwise buffered 14-7 potential ($U_{14\text{-}7}$) can be decomposed into purely repulsive ($U_{rep}$, eq 13) and attractive ($U_{WCA}$, eq 16) contributions, which are plotted for the AMOEBA water oxygen atom ($r_{0,ij}$ = 3.405 Å, $\varepsilon_{ij}$ = 0.11 kcal/mol). The cavitation free energy ($\Delta W_{cav}$) represents the process of growing in the repulsive potential ($U_{rep}$) for solute atoms. Subsequently, the dispersion free energy ($\Delta W_{disp}$) models the process of adding the attractive $U_{WCA}$ solute−solvent interactions to recover the full $U_{14\text{-}7}$ potential in the context of an uncharged solute.
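The decomposition described in the Figure 4 caption can be written out directly from eqs 12, 13, and 16. The short Python sketch below is illustrative only: the function names are hypothetical rather than taken from any of the named software packages, and the default parameters are the AMOEBA water oxygen values quoted in the caption.

```python
def u_14_7(r, eps=0.11, r0=3.405):
    """Buffered 14-7 pair energy of eq 12 (kcal/mol, distances in Å)."""
    scale = (1.07 * r0 / (r + 0.07 * r0)) ** 7
    well = 1.12 * r0**7 / (r**7 + 0.12 * r0**7) - 2.0
    return eps * scale * well


def u_rep(r, eps=0.11, r0=3.405):
    """Purely repulsive WCA part of eq 13: shifted up by eps inside r0, zero outside."""
    return u_14_7(r, eps, r0) + eps if r < r0 else 0.0


def u_wca(r, eps=0.11, r0=3.405):
    """Attractive WCA part of eq 16: constant -eps inside r0, full 14-7 outside."""
    return -eps if r < r0 else u_14_7(r, eps, r0)
```

By construction, `u_rep(r) + u_wca(r)` recovers `u_14_7(r)` at every separation, and `u_14_7(r0)` equals the negative well depth, which is what lets cavitation and dispersion be treated as two sequential steps.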

Figure 5. (A) Dispersion free energy differences ($\Delta W_{disp}$, solid black curve) are given by the integral of the attractive WCA potential ($U_{WCA}$, dotted blue curve) over solvent for the interaction of two AMOEBA water oxygen atoms. The derivative of $\Delta W_{disp}$ with respect to $R_0$ (dashed green curve) shows a maximum slightly beyond the minimum energy separation distance (vertical red line) due to the volume element $4\pi r^2\,dr$ increasing more quickly than $U_{WCA}$ approaches zero just beyond $r_{0,ij}$. (B) Dispersion interactions of an explicit water oxygen atom with continuum water oxygen (also plotted in (A)) and hydrogen. The interaction of oxygen with continuum hydrogen is multiplied by 2 (two hydrogen atoms per water molecule). (C) Same as (B) but for two explicit water hydrogen atoms interacting with continuum water.
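The behavior described for panel A can be checked numerically for the simplest case of an isolated atom, where the solvent indicator function is unity everywhere beyond the lower limit. The sketch below is an illustration under that assumption, not the analytic scheme used in the paper: it integrates the buffered attractive WCA potential with a plain trapezoid rule, truncates the integral at an arbitrary 60 Å, and uses hypothetical function names. The derivative with respect to the lower limit is simply minus the integrand at that limit.

```python
import math

RHO_W = 0.033428  # number density of water, per Å^3 (from the text)
EPS = 0.11        # AMOEBA water oxygen well depth, kcal/mol (Figure 4)
R0_MIN = 3.405    # AMOEBA water oxygen minimum energy separation, Å (Figure 4)


def u_wca(r, eps=EPS, r0=R0_MIN):
    """Attractive WCA potential: -eps inside r0, buffered 14-7 outside."""
    if r < r0:
        return -eps
    scale = (1.07 * r0 / (r + 0.07 * r0)) ** 7
    return eps * scale * (1.12 * r0**7 / (r**7 + 0.12 * r0**7) - 2.0)


def shell_integrand(r):
    """rho_w * U_WCA(r) * 4 pi r^2, the radial integrand for an isolated atom."""
    return RHO_W * u_wca(r) * 4.0 * math.pi * r**2


def isolated_atom_dispersion(lower, upper=60.0, steps=20000):
    """Trapezoid estimate of the dispersion integral from `lower` to `upper` (Å).

    The finite upper bound is an assumption of this sketch; the neglected tail
    decays as 1/r^4.
    """
    h = (upper - lower) / steps
    total = 0.5 * (shell_integrand(lower) + shell_integrand(upper))
    for k in range(1, steps):
        total += shell_integrand(lower + k * h)
    return total * h


def derivative_wrt_lower_limit(lower):
    """d(Delta W_disp)/d(lower limit) = -integrand evaluated at the lower limit."""
    return -shell_integrand(lower)


# Scan the lower limit to locate the maximum of the derivative.
radii = [3.0 + 0.001 * k for k in range(2001)]
peak_radius = max(radii, key=derivative_wrt_lower_limit)
```

With these inputs, `peak_radius` falls a little beyond `R0_MIN`, consistent with the caption's explanation that the growing volume element briefly outpaces the decay of the attractive potential.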

In eq 18, H is the fraction of the area of the current spherical integration shell of radius r that is covered by atom j located a distance $r_{ij}$ from atom i and whose radius is scaled to $\rho_j = sR_j$, and H is given by (eq 12 in Hawkins et al.34)

$$H(r, r_{ij}, \rho_j) = \frac{1}{2} - \frac{r^2 + r_{ij}^2 - \rho_j^2}{4\, r\, r_{ij}} \tag{19}$$

The integrated WCA potential $U^{*}_{WCA}(r_{ij})$ uses a simplified form of the buffered 14-7 for the interaction of solute atoms with water (i.e., the buffering constants are set to zero)

$$U^{*}_{WCA}(r_{ij}, \varepsilon, r_{0,ij}) = \begin{cases} -\varepsilon, & r_{ij} < r_{0,ij} \\ \varepsilon\, r_{0,ij}^{7} \left( \dfrac{r_{0,ij}^{7}}{r_{ij}^{14}} - \dfrac{2}{r_{ij}^{7}} \right), & r_{ij} \ge r_{0,ij} \end{cases} \tag{20}$$

Fortunately, the difference between this (unbuffered) $U^{*}_{WCA}$ potential and the buffered 14-7 form is negligible for separations greater than the minimum energy distance. The analytic tail correction based on eq 20 is given by

$$U_{tail}(d, \varepsilon, r_{0,ij}) = \int_{R_0 = r_{0,ij}/2 + d}^{\infty} U^{*}_{WCA}(r, \varepsilon, r_{0,ij})\, 4\pi r^2\, dr = \begin{cases} -\dfrac{4}{3}\pi\varepsilon \left( r_{0,ij}^{3} - R_0^{3} \right) - \dfrac{18}{11}\pi\varepsilon\, r_{0,ij}^{3}, & R_0 < r_{0,ij} \\ \pi\varepsilon\, r_{0,ij}^{7} \left( \dfrac{4\, r_{0,ij}^{7}}{11 R_0^{11}} - \dfrac{2}{R_0^{4}} \right), & R_0 \ge r_{0,ij} \end{cases} \tag{21}$$

and the total tail correction for the interaction of atom i with water is then given by

$$U_{tail,water}(d) = U_{tail}(d, \varepsilon_{io}, r_{0,io}) + 2 \times U_{tail}(d, \varepsilon_{ih}, r_{0,ih}) \tag{22}$$

where the well depths ($\varepsilon_{io}$, $\varepsilon_{ih}$) and minimum energy distances ($r_{0,io}$, $r_{0,ih}$) are based on the AMOEBA mixing rules for atom i with the AMOEBA water model.92 The final piece to this model is the solution to the integral in eq 18, which uses integration bounds shown in Figure 6.

If integration of the WCA dispersion begins inside the minimum energy distance $R_0 < r_{0,ij}$, then a contribution of


Figure 6. Illustration of the integration limits for the dispersion free energy based on solute van der Waals parameters and separation distance. (A) The minimum separation distance $r_{0,ij}$ for atoms i and j is based on AMOEBA mixing rules and used to determine the beginning of the WCA dispersion integral $R_0$. When $R_0 < r_{0,ij}$, integration for the constant portion of the WCA potential begins at either $R_0$ or $r_{ij} - \rho_j$, whichever is larger, and ends at $r_{0,ij}$ or $r_{ij} + \rho_j$, whichever is smaller. If $R_0 > r_{0,ij}$, then only the variable portion of the WCA potential factors into the dispersion free energy. (B) If atom j is completely engulfed by the sphere defined by $R_0$, no solvent is blocked and no dispersion energy must be removed. (C) When the two atoms overlap or are close together such that $r_{ij} - \rho_j < R_0$, integration of the attractive WCA potential begins at $R_0$ and ends at $r_{ij} + \rho_j$ (the furthest edge of atom j). (D) When atom j is outside the beginning of the integration, integration of the variable portion of the WCA potential begins at $r_{ij} - \rho_j$ (the closest edge of atom j) and ends at $r_{ij} + \rho_j$.

$$I_{\varepsilon}(L, U, \varepsilon) = -\varepsilon \int_{L}^{U} H(r, r_{ij}, \rho_j)\, r^2\, dr = \left[ \frac{\varepsilon\, r^2 \left( 3r^2 - 8 r_{ij} r + 6 r_{ij}^2 - 6\rho_j^2 \right)}{48\, r_{ij}} \right]_{L}^{U} \tag{23}$$

is included. The lower limit L is $R_0$ or $r_{ij} - \rho_j$, whichever is greater. The upper limit U of this integral is $r_{0,ij}$ or $r_{ij} + \rho_j$, whichever is smaller. If $r_{ij} + \rho_j$ is greater than $r_{0,ij}$, the integration of the repulsive contribution outside $r_{0,ij}$ is given by

$$I_{14}(L, U, \varepsilon, r_{0,ij}) = \int_{L}^{U} \frac{\varepsilon\, r_{0,ij}^{14}}{r^{12}}\, H(r, r_{ij}, \rho_j)\, dr = \left[ \frac{\varepsilon\, r_{0,ij}^{14} \left( -120\, r_{ij} r + 66 r^2 + 55 r_{ij}^2 - 55 \rho_j^2 \right)}{2640\, r_{ij}\, r^{12}} \right]_{L}^{U} \tag{24}$$

and the attractive contribution by

$$I_{7}(L, U, \varepsilon, r_{0,ij}) = -\int_{L}^{U} \frac{2\varepsilon\, r_{0,ij}^{7}}{r^{5}}\, H(r, r_{ij}, \rho_j)\, dr = \left[ \frac{-2\varepsilon\, r_{0,ij}^{7} \left( -15\, r_{ij} r + 10 r^2 + 6 r_{ij}^2 - 6 \rho_j^2 \right)}{120\, r_{ij}\, r^{5}} \right]_{L}^{U} \tag{25}$$

where the upper limit is always $r_{ij} + \rho_j$. As before, the lower limit L is $R_0$ or $r_{ij} - \rho_j$, whichever is greater, unless this result is inside the minimum energy distance $r_{0,ij}$. In this case, a contribution up to $r_{0,ij}$ has already been included from eq 23 and L takes the value $r_{0,ij}$. The distances and parameters used to define integration limits for the WCA potential are shown in Figure 6.

Electrostatic Free Energy. The continuum electrostatics contribution to solvation free energy of a small molecule can be determined by solving the nonlinear PBE (NPBE) or the linearized PBE (LPBE) form shown here

$$\nabla \cdot [\varepsilon(\mathbf{r}) \nabla \phi(\mathbf{r})] - \bar{\kappa}^2(\mathbf{r})\, \phi(\mathbf{r}) = -4\pi \rho(\mathbf{r}) \tag{26}$$

for all r in a domain Ω, where ε(r) is the dielectric constant, ϕ(r) is the electrostatic potential, $\bar{\kappa}^2(\mathbf{r})$ is the modified Debye−Hückel screening factor, and ρ(r) is the solute charge density. For polarizable force fields, the solute charge density ρ(r) responds to the reaction field of the solvent, and thus eq 26 is solved repeatedly during iterations of an SCRF solver (e.g., Jacobi over-relaxation (SOR),92 conjugate gradient (CG) methods,102,103 the Jacobi algorithm coupled to direct inversion in the iterative subspace (JI/DIIS),102 and an optimized perturbation theory (OPT) method104). This work compares three distinct continuum electrostatics models: numerical solutions to both the NPBE and LPBE using the adaptive Poisson−Boltzmann solver (APBS),31 a domain-decomposition solution of the conductor-like screening model (ddCOSMO),44,105−107 and the analytic generalized Kirkwood (GK) theory.45

Adaptive Poisson−Boltzmann Solver. APBS determines the solution to the PBE using parallelized finite difference multigrid and finite element algebraic multigrid numerical methods. Finite difference methods subdivide the domain in which the PBE is to be solved using Taylor expansions to model the differential operators in each subdomain as difference matrices and solving them via linear algebra techniques. The final algebraic equations obtained by this discretization can be solved via a multilevel solver: iteration is used to reach solutions at varying resolutions, where long-range errors in the iterations are allowed to converge on coarser grid spacings before using a finer grid for the final solution. Though it provides one of the more accurate numerical solutions to the PBE, APBS can become computationally expensive for larger domains and finer final multigrid spacings. The combination of Tinker108 with APBS to support the AMOEBA force field has been described previously,43 including convergence of the SCRF and calculation of atomic forces for the LPBE. APBS was run in Tinker using a grid of $129^3$ points and a probe of radius 0.0 Å to


define a van der Waals solute cavity. The average grid length for the small-molecule test set was 0.105 Å. APBS was run using multiple Debye−Hückel boundary conditions, a water dielectric constant of 78.3, and a solute dielectric constant of 1.0. The APBS parallel multigrid (PMG) solver26,109,110 was used for all calculations, as it is currently the only solver within APBS available for use with AMOEBA via Tinker.

Figure 7. (A) Fit of GK self-energies to perfect PB self-energies (Y = 1.040X + 0.048, R² = 0.996). (B) Fit of GK cross-term energies to perfect PB cross-term energies (Y = 1.059X − 0.089, R² = 0.994). Both self and aggregate cross-term energies are reported for 1424 atoms.

Domain-Decomposition Conductor-Like Screening Model. The ddCOSMO electrostatics model44,105−107 treats the solvent as an infinite conductor surrounding a solute-shaped cavity Ω, determined by a union of spheres (one sphere per atom of solute)

$$\Omega = \bigcup_{j=1}^{M} \Omega_j(R_j, \mathbf{r}_j) \tag{27}$$

The electrostatic interactions are calculated by integrating the charge density ρ of the solute molecule multiplied by the reaction potential W of the conductor over the molecular cavity

$$E_s = \frac{1}{2} f(\varepsilon) \int_{\Omega} \rho(\mathbf{r})\, W(\mathbf{r})\, d\mathbf{r} \tag{28}$$

where f(ε) is a scaling factor used to adjust for the nonconductor nature of the solvent based on its dielectric constant ε. The scaling factor f(ε) is defined as

$$f(\varepsilon) = \frac{\varepsilon - 1}{\varepsilon + x} \tag{29}$$

where x is an empirical constant. In the original derivation of COSMO, x was set to 0.5.111 This value was later updated to use x = 0.5 for neutral molecules and x = 0 for ionic molecules.112,113 The value of x used here was 0 in all cases, such that parameterized electrostatic radii described below are implicitly optimized for transferability between small molecules and charged biomolecules. The value of W in eq 28 is obtained from the solution of the following boundary value problem

$$\begin{cases} -\Delta W(\mathbf{r}) = 0, & \mathbf{r} \in \Omega \\ W(\mathbf{s}) = -\Phi(\mathbf{s}), & \mathbf{s} \in \Gamma \end{cases} \tag{30}$$

where Φ is the solute's electrostatic potential in vacuum and Γ is the boundary of the cavity. ddCOSMO uses Schwarz's domain-decomposition method to solve this boundary value problem by splitting it into a series of smaller problems, each defined on a single spherical domain. This decomposition allowed for the ddCOSMO implementation for AMOEBA (including convergence of the SCRF and calculation of atomic forces44) to be parallelized and is available in the Tinker-HP package,114 which is part of the Tinker 8 distribution.

Generalized Kirkwood. GK is an analytic approximation to the PBE that simplifies to the generalized Born (GB) model in the absence of permanent multipoles and induced dipoles (e.g., for fixed partial charge force fields). The GB electrostatic energy115 (equivalent to the GK monopole term $G_{\mathrm{GK}}^{(0)}$) is given by

$$\Delta G_{\mathrm{GB}} = G_{\mathrm{GK}}^{(0)} = -\frac{1}{2}\left(\frac{1}{\varepsilon_h} - \frac{1}{\varepsilon_s}\right) \sum_{i,j} \frac{q_i q_j}{f_{\mathrm{GB}}} \tag{31}$$

where ε_s is the permittivity of the solvent, ε_h is the permittivity of a homogeneous reference state, q_i and q_j are partial charges, and the empirical generalizing function f_GB is given by

$$f_{\mathrm{GB}} = \sqrt{r_{ij}^{2} + a_i a_j f_{ij}} \tag{32}$$

where r_ij is the distance between sites i and j, effective "Born radii" a_i and a_j are given by an integral over solvent,116−118

$$\frac{1}{a_i} = \left(\frac{3}{4\pi} \int_{\mathrm{ex}} \frac{1}{r^{6}}\, dV\right)^{1/3} \tag{33}$$

and f_ij is given by

$$f_{ij} = e^{-r_{ij}^{2} / (c_{\mathrm{GK}}\, a_i a_j)} \tag{34}$$


where c_GK is a tuning parameter that typically ranges from 2 to 4. GK extends GB methods to polarizable atomic multipole charge distributions using Kirkwood's analytic solution to the electrostatic component of solvation free energy for an arbitrary (i.e., multipolar) charge distribution.46 For example, the GK interaction between two permanent dipoles is expressed as

$$G_{\mathrm{GK}}^{(1)} = \frac{1}{2}\, \frac{2(\varepsilon_h - \varepsilon_s)}{\varepsilon_h (2\varepsilon_s + \varepsilon_h)} \sum_{i,j} u_{i,\alpha} \left[\frac{\delta_{\alpha\beta}}{f_{\mathrm{GB}}^{3}} - \frac{3 r_\alpha r_\beta \left(1 - f_{ij}/c_{\mathrm{GK}}\right)}{f_{\mathrm{GB}}^{5}}\right] u_{j,\beta} \tag{35}$$

where u_i and u_j are permanent dipole vectors and the subscripts α and β indicate the use of the Einstein summation convention. GK interaction tensors up to the quadrupole−quadrupole order, as well as their inclusion in the AMOEBA SCRF calculation and calculation of atomic forces, were described previously.45

Figure 8. Comparison of experimental and APBS solvation free energy differences using either AMOEBA van der Waals Rmin radii to describe the solute−solvent boundary (A) or using fit radii (B). When using AMOEBA Rmin radii, the linear regression gave Y = 0.9683·X + 0.3215 with R² = 0.9702. When using fit radii, the linear regression gave Y = 1.0080·X + 0.1416 with R² = 0.9979.

Perfect self and cross-term energies were calculated with APBS for each of the 1424 atoms in our set of 103 test molecules. The results were used to fit a unitless generalizing constant in the cross-term (c_GK = 2.455) and a unitless scale factor (c_HCT = 0.72) that avoids overestimation of descreening due to atomic overlaps when computing Born radii.119 An initial fit based on small-molecule self-energies produced a scale factor of 0.77; however, testing with larger macromolecules led to excessive descreening. For this reason, the scale factor was reduced to 0.72. PB self and cross-term energies were calculated using Tinker with an APBS grid of 129³ points and a van der Waals definition of the solute cavity using AMOEBA force field radii. Since Born radii are used in the calculation of cross-term energies, the HCT scale factor was chosen before finalizing the generalizing constant by minimizing the mean signed error (MSE) between PB and GK self-energy values. The final HCT scale factor of 0.72 gave an MSE of −0.10, mean unsigned error (MUE) of 0.29, and root mean square error (RMSE) of 0.87 (Figure 7A). The slightly negative MSE is compensated for implicitly during the fitting of base solute radii described below.

Testing with generalizing constants (c_GK) between 2 and 4 showed little change in the MSE between PB and GK cross-term energies. The final value of 2.455 was chosen for consistency with prior work45 and gave an MSE of 0.07, MUE of 0.35, and RMSE of 1.17 (Figure 7B). In both cases, GK energies are strongly correlated with PB energies, with R² values of 0.996 and 0.994, respectively.

Target Function for Electrostatic Radii Optimization. Electrostatic radii for 41 atom types were fit using the following target function

$$E(\mathbf{P}) = W_{\mathrm{MUE}} \sum_{i=1}^{n} \left(\Delta G_i^{\mathrm{Expt}} - \Delta G_i^{\mathrm{Model}}(\mathbf{P})\right)^{2} + W_{\mathrm{MSE}} \left(\sum_{i=1}^{n} \Delta G_i^{\mathrm{Expt}} - \sum_{i=1}^{n} \Delta G_i^{\mathrm{Model}}(\mathbf{P})\right)^{2} + W_{\mathrm{Regularization}} \sum_{i=1}^{N_{\mathrm{radii}}} \left(R_i^{\mathrm{Model}} - R_i^{\mathrm{vdW}}\right)^{2} \tag{36}$$

where the first term favors minimizing the unsigned error between experimental and model solvation free energy over n molecules, the second term favors minimizing the overall signed error, and the final term penalizes electrostatic radii that deviate from the AMOEBA force field definition of minimum energy van der Waals separation (Rmin). The optimization was performed using an L-BFGS minimizer for each of the APBS, ddCOSMO, and GK models. The optimization was seeded with electrostatic radii based on AMOEBA van der Waals Rmin values, and after trial and error, all three optimization weights were set to 1.0.

■ RESULTS

Small-Molecule Hydration Free Energy. The nonpolar portion of the model consists of an unfavorable cavitation free


Figure 9. Comparison of experimental and ddCOSMO solvation free energy differences using either AMOEBA van der Waals Rmin radii to describe the solute−solvent boundary (A) or using fit radii (B). When using AMOEBA Rmin radii, the linear regression gave Y = 1.0017·X − 0.1465 with R² = 0.9738. When using fit radii, the linear regression gave Y = 1.0001·X + 0.0015 with R² = 0.9981.

Figure 10. Comparison of experimental and GK solvation free energy differences using either AMOEBA van der Waals Rmin radii to describe the solute−solvent boundary (A) or using fit radii (B). When using AMOEBA Rmin radii, the linear regression gave Y = 0.9245·X + 0.0360 with R² = 0.9567. When using fit radii, the linear regression gave Y = 0.9999·X − 0.0046 with R² = 0.9987.

energy term and a favorable dispersion free energy term. One tunable parameter, solvent pressure, was used in calculating cavitation, and two tunable parameters, a dispersion offset and an atomic overlap scale factor, were used in calculating dispersion. The solvent pressure of 0.0334 kcal/(mol Å³) for cavitation was chosen based on previous testing. The dispersion integral offset (d = 1.056 Å) and the unitless HCT dispersion atomic overlap scale factor (s = 0.75) were chosen to match dispersion values from solute−solvent enthalpy simulations in explicit water (Table S1, Supporting Information).

For the electrostatic portion of the model, a total of 41 solute radii classes were optimized using L-BFGS minimization as described above. Solvation free energy difference values from the FreeSolv database71,72 were used as experimental targets. Each

radii class was determined based on SMARTS strings that were automatically generated for each atom type by PolType2. A total of 78 unique SMARTS strings were generated for the test set of 103 molecules. These 78 SMARTS strings were then reduced into 41 groups based on element, chemical environment, and electrostatic radii sizes from an initial optimization using all SMARTS strings under GK electrostatics (Table S2, Supporting Information). The use of fewer parameters helps to avoid overfitting and improves generalizability. Optimization was performed using the 41 radii classes to parameterize the APBS, ddCOSMO, and GK electrostatic models. Parameterized radii deviated from original AMOEBA van der Waals radii by an average of 9.3% for APBS, 9.9% for ddCOSMO, and 14.9% for GK. The quality of the resulting implicit solvent model for small molecules using APBS, ddCOSMO, and GK is shown in Figures 8−10, respectively, and full data is available in Tables S3−S8, Supporting Information.

The data shown in Figures 8−10 is summarized in Tables 1A and 1B, which give the root mean square error (RMSE) and mean signed error (MSE) between experimental and computed solvation free energy differences. Overall, parameterized solute radii resulted in RMSE/MUE/MSE values of 1.00/0.70/0.05, 0.92/0.63/0.00, and 0.75/0.58/0.00 kcal/mol for the APBS, ddCOSMO, and GK models, respectively (Table 1B).

Table 1A. RMSE/MSE Values for Test Molecules Using AMOEBA van der Waals Radii Are Given by Functional Group Categories and Overall (kcal/mol)

                                 AMOEBA radii
functional group         N      APBS          ddCOSMO       GK
alkanes                  18     0.50/−0.24    0.71/−0.47    1.16/−0.89
alcohols and phenols     16     1.48/+1.27    2.02/−0.19    2.40/+2.09
amines                   12     2.51/+2.40    2.36/+2.17    3.04/+1.10
amides                   8      2.25/+1.98    1.28/+0.40    3.18/+2.63
nitrogen heterocyclic    8      2.02/+0.99    1.93/+0.14    2.33/+1.60
arenes                   5      0.77/−0.57    0.95/−0.56    1.35/−0.54
ethers                   5      0.86/+0.80    0.74/+0.49    1.45/+1.41
oxanes and oxines        4      1.05/+0.79    0.58/−0.16    2.29/+1.75
thiols                   4      1.68/−1.63    1.25/−1.16    1.93/−1.85
carboxylic acids         3      1.31/+1.18    0.68/+0.38    2.58/+2.39
sulfides                 3      1.20/−1.17    0.61/−0.48    1.49/−1.46
aldehydes                2      0.61/+0.58    0.22/−0.16    1.52/+1.52
other                    3      1.55/+0.49    3.01/−1.38    13.04/−6.04
total neutrals           91     1.58/+0.76    1.58/+0.09    3.22/+0.62
charged                  12     9.90/+0.01    9.13/−2.12    9.80/+2.80
total                    103    3.69/+0.67    3.45/−0.17    4.51/+0.87

Table 1B. RMSE/MSE Values for Test Molecules Using Parameterized Electrostatic Radii Are Given by Functional Group Categories and Overall (kcal/mol)

                                 fit solute radii
functional group         N      APBS          ddCOSMO       GK
alkanes                  18     0.49/−0.24    0.42/−0.20    0.49/−0.17
alcohols and phenols     16     1.01/+0.50    1.31/+0.65    1.00/+0.56
amines                   12     1.49/+0.92    1.37/+0.06    0.92/−0.20
amides                   8      0.66/+0.13    0.66/+0.13    0.76/−0.12
nitrogen heterocyclic    8      1.31/+0.56    0.50/+0.07    0.88/+0.27
arenes                   5      0.50/−0.44    1.00/−0.61    0.80/−0.20
ethers                   5      0.23/−0.03    0.35/−0.11    0.57/−0.36
oxanes and oxines        4      1.16/−0.33    0.58/−0.36    1.05/+0.07
thiols                   4      0.65/−0.49    0.38/−0.26    0.32/−0.09
carboxylic acids         3      0.09/−0.08    0.11/−0.01    0.24/+0.00
sulfides                 3      0.36/−0.33    0.19/+0.09    0.32/−0.29
aldehydes                2      0.48/−0.38    0.39/−0.39    0.11/−0.03
other                    3      1.96/−0.06    0.88/−0.56    0.51/−0.31
total neutrals           91     0.97/+0.14    0.87/+0.01    0.76/+0.00
charged                  12     1.18/−0.59    1.25/−0.07    0.73/−0.03
total                    103    1.00/+0.05    0.92/+0.00    0.75/+0.00

Comparison to Explicit Solvent Free Energy Differences. Although the implicit solvents described here were fit to experimental data, a direct comparison to AMOEBA explicit solvent hydration free energy differences helps illuminate if the continuum models are either overfit or if they exhibit relatively higher errors. A subset of 26 neutral small molecules used to parameterize the implicit solvent model is compared to available data from a recent AMOEBA explicit solvent study120 in Table 2. The explicit solvent gave an RMSE of 0.70 kcal/mol compared to the experiment, while implicit solvents using APBS, ddCOSMO, and GK electrostatics gave RMSEs of 0.91, 0.65, and 0.63 kcal/mol, respectively. The concordance between the RMSEs for the explicit and implicit hydration free energy differences supports the conclusion that the continuum models are neither clearly overfit nor of worse quality than what is observed for AMOEBA solutes in the explicit solvent.

Validation Simulations on Proteins, DNA, and RNA. Nine nucleic acids (≤24 nucleotides) and nine proteins (≤129 residues) of modest size were used to test the electrostatic energy and polarization response of the implicit solvents. Of the nucleic acids, seven were RNA and two were DNA. Starting structures for all 18 validation set molecules are shown in Figure 11 and were obtained from the Protein Data Bank (PDB). In the case of NMR ensembles, the first conformer was used. For explicit solvent simulations, each molecule was solvated in an explicit water box with neutralizing sodium or chloride ions. With the validation set molecule fixed, minimization to an RMS gradient of 0.1 kcal/(mol Å) was performed on each system to allow for relaxation of the water and ions.

Electrostatic energies were calculated for each validation set molecule with van der Waals and fit radii using all three electrostatics models. When using van der Waals radii (Figure 12A), the mean GK electrostatic energy was 1.65% more positive than APBS (R² = 0.9999), while that for ddCOSMO was 6.71% more negative (R² = 0.9990). The relatively large disparity between ddCOSMO and APBS when using identical van der Waals radii suggests that the COSMO approximation is not entirely ameliorated by its empirical scaling function. When using fit radii (Figure 12B), the mean GK electrostatic energy was 0.01% more negative than APBS (R² = 0.9991), while that for ddCOSMO was 0.11% more positive (R² = 0.9991). This shows that the COSMO approximation is compensated for to a large degree implicitly during the fitting of electrostatic radii. The slight reduction in R² between APBS and GK electrostatic energies moving from van der Waals radii to fit radii suggests that the use of a single HCT overlap scale factor is not perfectly transferable across the range of atomic sizes found in proteins and nucleic acids. The optimization of element-specific HCT scale factors for GK electrostatics will be explored in future work that also focuses on handling interstitial spaces too small to accommodate water molecules (i.e., calculation of Born radii based on integration over a molecular volume rather than a van der Waals volume).


Table 2. Comparison of Solvation Free Energy Differences in AMOEBA Explicit and Implicit Solvents to Experimental Solvation Free Energy Differences (All Values in kcal/mol)

                               explicit solvent      ΔG implicit                  signed error
molecule            ΔGexpt    ΔG       sq. error    APBS     COSMO    GK         APBS    COSMO   GK
isopropanol         −4.74     −4.21    0.28         −3.77    −3.66    −3.92      0.97    1.08    0.82
hydrogen sulfide    −0.70     −0.41    0.08         −1.06    −0.77    −0.80      −0.36   −0.07   −0.10
p-cresol            −6.13     −5.6     0.28         −6.22    −5.37    −4.99      −0.09   0.76    1.14
dimethylsulfide     −1.61     −1.85    0.06         −2.11    −1.56    −1.99      −0.50   0.05    −0.38
phenol              −6.60     −5.05    2.40         −6.23    −5.48    −5.49      0.37    1.12    1.11
benzene             −0.90     −1.23    0.11         −1.30    −1.70    −1.54      −0.40   −0.80   −0.64
ethanol             −5.00     −4.69    0.10         −4.09    −3.89    −3.57      0.91    1.11    1.43
ethane              1.83      1.73     0.01         2.31     2.10     2.54       0.48    0.27    0.71
n-butane            2.10      1.11     0.98         2.09     2.21     1.80       −0.01   0.11    −0.30
methylamine         −4.55     −5.46    0.83         −4.00    −4.53    −4.71      0.55    0.02    −0.16
dimethylamine       −4.29     −3.04    1.56         −2.72    −4.03    −5.48      1.57    0.26    −1.19
trimethylamine      −3.20     −2.09    1.23         −3.34    −3.09    −1.78      −0.14   0.11    1.42
propane             2.00      1.69     0.10         2.24     2.28     2.39       0.24    0.28    0.39
methane             2.00      1.73     0.07         2.43     2.39     1.89       0.43    0.39    −0.11
methanol            −5.10     −4.79    0.10         −3.55    −3.51    −3.40      1.55    1.59    1.70
n-propanol          −4.85     −4.85    0.00         −4.33    −4.53    −4.23      0.52    0.32    0.62
toluene             −0.90     −1.53    0.40         −1.41    −1.60    −1.12      −0.51   −0.70   −0.22
ethylbenzene        −0.79     −0.8     0.00         −1.21    −1.41    −0.87      −0.42   −0.62   −0.08
n-methylacetamide   −10.00    −8.66    1.80         −9.50    −9.09    −9.05      0.50    0.91    0.95
water               −6.30     −5.86    0.19         −5.36    −6.34    −6.27      0.94    −0.04   0.03
acetic acid         −6.69     −5.63    1.12         −6.56    −6.75    −6.37      0.13    −0.06   0.32
methylethylsulfide  −1.50     −1.98    0.23         −1.64    −1.19    −1.89      −0.14   0.31    −0.39
imidazole           −9.63     −10.25   0.38         −9.43    −9.79    −9.36      0.20    −0.16   0.27
acetamide           −9.70     −9.3     0.16         −10.78   −10.22   −10.46     −1.08   −0.52   −0.76
ethylamine          −4.50     −4.33    0.03         −4.14    −4.97    −4.28      0.36    −0.47   0.22
pyrrolidine         −5.48     −4.88    0.36         −2.31    −4.61    −4.24      3.17    0.87    1.24

statistic   explicit   APBS    COSMO   GK
MSE         0.19       0.36    0.24    0.31
MUE         0.57       0.64    0.50    0.64
RMSE        0.70       0.91    0.65    0.63
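The MSE, MUE, and RMSE summary values in Table 2 follow directly from the signed-error columns. As a minimal sketch in plain Python, the APBS summary can be recomputed from the 26 APBS signed errors transcribed from the table; the function name `error_statistics` is illustrative and not part of any package named in the text.

```python
import math

# APBS signed errors (implicit minus experiment, kcal/mol), transcribed from Table 2
# in row order from isopropanol to pyrrolidine.
apbs_signed_errors = [
    0.97, -0.36, -0.09, -0.50, 0.37, -0.40, 0.91, 0.48, -0.01, 0.55,
    1.57, -0.14, 0.24, 0.43, 1.55, 0.52, -0.51, -0.42, 0.50, 0.94,
    0.13, -0.14, 0.20, -1.08, 0.36, 3.17,
]


def error_statistics(errors):
    """Return (mean signed error, mean unsigned error, root mean square error)."""
    n = len(errors)
    mse = sum(errors) / n
    mue = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return mse, mue, rmse


mse, mue, rmse = error_statistics(apbs_signed_errors)
print(f"MSE {mse:.2f}  MUE {mue:.2f}  RMSE {rmse:.2f}")
# prints "MSE 0.36  MUE 0.64  RMSE 0.91"
```

Because the RMSE weights large deviations more heavily than the MUE, a single outlier such as pyrrolidine (3.17 kcal/mol) raises the APBS RMSE well above its MUE.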

Dipole moment magnitudes were calculated for each validation set molecule in vacuum, explicit solvent, and implicit solvent. Prior to computing explicit solvent dipole moments, a series of short molecular dynamics (MD) simulations were used to equilibrate the system. Ensemble average dipole moment values were then calculated from 1 ns MD simulations with the biomolecule fixed (i.e., solvent degrees of freedom were converged). For detailed simulation conditions, see Table S10, Supporting Information. In addition, dipole moment magnitudes were calculated for the validation set using three fixed-charge force fields: AMBER ff99SB,121 OPLS-AA/L,122 and CHARMM22/CMAP.123,124 Tinker version 8.8.1 (August 2020) did not include nucleic acid force field parameters for OPLS-AA/L or CHARMM22/CMAP; therefore, only AMOEBA and AMBER ff99SB dipole moment magnitudes are reported for nucleic acids. Dipole moment magnitudes calculated using available force fields for nucleic acids and proteins are presented in Figure 13. All fixed-charge dipole moment magnitudes are plotted against ensemble average AMOEBA explicit solvent dipole moment magnitudes, as well as AMOEBA vacuum dipole moments. AMBER ff99SB, OPLS-AA/L, and CHARMM22/CMAP dipole moment magnitudes had R² values of 0.987, 0.979, and 0.982, respectively, when compared to AMOEBA vacuum dipole moment magnitudes, and R² values of 0.998, 0.996, and 0.996, respectively, when compared to AMOEBA explicit solvent dipole moment magnitudes. The better agreement of the fixed-charge force fields with AMOEBA in explicit (or implicit) solvent, relative to the worse agreement with AMOEBA in vacuum, is consistent with fixed-charge biomolecular force fields being prepolarized for aqueous simulations. This also demonstrates that the AMOEBA electrostatics model produces molecular dipole moments that are consistent with previous generation force fields.

AMOEBA implicit solvent dipole moment magnitudes were calculated using each of the three parameterized electrostatics models and compared to those from explicit solvent ensemble averages, as shown in Figure 14. All three electrostatics models achieved near-perfect correlation with explicit solvent values, with R² values of 0.999 for each of the APBS, ddCOSMO, and GK models. Notably, each AMOEBA implicit solvent electrostatics model produces dipole moments that agree more closely with AMOEBA in explicit solvent than does any fixed-charge force field. This supports the conclusion that if the AMOEBA biomolecular dipole moments are closer to reality than those from any of the fixed-charge models, then AMOEBA simulations in implicit solvent are in some ways more realistic than fixed-charge simulations in explicit solvent. The relative merits of fixed-charge explicit water simulations compared to polarizable implicit solvent simulations will be further explored in the future. In Figure 14, dipole moment magnitudes for the APBS electrostatics model were calculated without implicitly adding ions. A full comparison of dipole moment magnitudes and electrostatic energy components for all validation set molecules is available in Tables S11−S13, Supporting Information. This includes dipole moment magnitudes


Figure 11. Validation set includes nine nucleic acids and nine proteins. The nucleic acid set can be further broken down into sets of four RNA helices (2JXQ, 1F5G, 1MIS, and 2L8F), three RNA hairpins (1ZIH, 2KOC, and 1SZY), and two DNA double helices (1D20 and 2HKB). Additional information on individual molecules and simulation conditions can be found in Table S10, Supporting Information. Coordinate files for each molecule (Tinker “XYZ” format) are provided in the Supporting Information.

calculated in APBS using the nonlinear and linearized forms of the PBE with 150 mM salt. Overall, these results suggest the electrostatics models scale up well from initial optimization against small-molecule hydration free energy differences to applications to larger biomolecules.

Figure 12. APBS energy values for the biomolecular validation set are plotted vs ddCOSMO and GK for both van der Waals radii and fit radii. (A) APBS electrostatics energy using van der Waals radii vs ddCOSMO (Y = 1.0523X, R² = 0.9990) and GK (Y = 0.9854X, R² = 0.9999). (B) APBS PBE electrostatics energy using fit radii vs ddCOSMO (Y = 0.9884X, R² = 0.9991) and GK (Y = 0.9833X, R² = 0.9991). Full data is available in Tables S11 and S13, Supporting Information.

The performances of the APBS, ddCOSMO, and GK electrostatics models implemented in Tinker were compared by timing an energy and gradient calculation on a single CPU core (Intel Xeon CPU E5-2680 v4 at 2.40 GHz). Calculations were performed for one nucleic acid (1ZIH) and one protein


Figure 13. Comparison of dipole moment magnitudes for fixed-charge force fields and AMOEBA across environments. (A) AMOEBA vacuum dipole moment magnitudes vs those for fixed-charge force fields (AMBER ff99SB R2, 0.987; OPLS-AA/L R2, 0.979; CHARMM22/CMAP R2, 0.982). (B) AMOEBA explicit solvent dipole moment magnitudes vs those for fixed-charge force fields (AMBER ff99SB R2, 0.998; OPLS-AA/L R2, 0.996; CHARMM22/CMAP R2, 0.996). Dashed lines at x = y are a guide to the eye.

(1VII) from the validation set. Results in Table 3 show that APBS is the costliest model, while GK is currently the most efficient. Timings for the same systems using AMOEBA/GK electrostatics implemented within FFX-OpenMM and executing on a single GPU (NVIDIA GeForce RTX 2080 Ti) were also collected. The latter performance (less than 0.005 s per time step) is consistent with the molecular dynamics performance of ∼20 ns/day using a conservative 1 fs integration scheme, which opens the door to tuning the AMOEBA/GK continuum model using extensive simulations of proteins and nucleic acids. As of this writing, the APBS and ddCOSMO electrostatics models are not yet available for use on GPUs.

Figure 14. Comparison of dipole moment magnitudes for AMOEBA in explicit solvent vs vacuum, PB, ddCOSMO, and GK environments for the validation set (vacuum R²: 0.990, PB/ddCOSMO/GK R²: 0.999 in each case). The dashed line at x = y is a guide to the eye.

Table 3. Performance of the APBS, ddCOSMO, and GK Electrostatics Models Implemented in Tinker on a Single CPU Core and for the GK Model Using FFX-OpenMM on a Single GPU

            calculation of the energy and gradient (s)
            Tinker (1 CPU thread)              FFX-OpenMM (1 GPU)
molecule    APBS      ddCOSMO    GK            GK
1ZIH        48.804    11.285     0.250         0.0042
1VII        53.772    21.245     0.533         0.0047

■ CONCLUSIONS

Implicit solvent models were developed for use with the polarizable AMOEBA force field. Novel cavitation and dispersion nonpolar terms were designed to replicate explicit solvent free energy differences using only three free parameters: a single cavitation parameter to describe solvent pressure for small cavities and two dispersion parameters (one to define the beginning of the dispersion integral and a second to account for atomic overlaps during integration). Based on these nonpolar terms, the solute−solvent electrostatic boundary (i.e., atomic radii) was optimized for three continuum electrostatics models, APBS, ddCOSMO, and GK, using numerical optimization against experimental small-molecule solvation free energy differences. Overall, the APBS, ddCOSMO, and GK models produced mean unsigned errors of 0.70, 0.63, and 0.58 kcal/mol, respectively, compared to the experiment. All three implicit solvent models produced hydration free energy difference RMSEs within 0.2 kcal/mol of AMOEBA explicit solvent solvation free energy difference simulations for a collection of 26 small molecules (Table 2). This supports the conclusion that the implicit solvent models presented here are of similar quality to that of explicit solvent for hydration free energy differences and are not clearly overfit to the test data (i.e., overfitting might be suggested by implicit solvent RMSEs that are artificially much lower than those achieved by explicit solvent simulations).

Each small molecule used to parameterize the implicit solvent model fell within the volume-scaling regime of the cavitation model, such that the contribution to solvation was calculated using SEV. For larger proteins or nucleic acids, the cavitation free energy of the model falls within the surface-area-scaling regime. A future goal is to account for local molecular curvature to promote transferability of the cavitation free energy to biomolecules of complex shapes. The dispersion model

integrates the WCA attractive potential for each atom in the solute. This analytic, pairwise approach is well equipped to handle nonspherical solutes, which adds physical detail to the previously described Born radii-based dispersion model.73 Both the cavitation and dispersion models described here are currently limited by their lack of treatment of interstitial spaces, which is elaborated on below.

To optimize the agreement of GK self-energies with the calculated perfect PB multipolar self-energies, it may be beneficial to use separate HCT scaling factors for each chemical element instead of a single parameter, as was done here. The precedent for this split is given in the original description of the HCT pairwise descreening approximation, where the scale factor magnitude generally decreases with increasing atomic size.34 Additionally, the agreement between GK cross-term energies and calculated perfect PB multipolar cross-term energies might be improved using separate generalizing function constants for monopoles, dipoles, and quadrupoles, rather than a single constant. The physical motivation is that the electrostatic potential is of a longer range for monopoles than for dipoles. Therefore, the transition between the Born ion regime (or the Kirkwood multipole regime) and the Coulomb regime, which is tuned by the constant in the generalizing function (eq 34), could in principle be optimized for each multipole order separately. For simplicity, this work used a single HCT scale factor (0.72) and a single generalizing constant (2.455) for GK.

At the length scale of small molecules, continuum electrostatics is known to be sensitive to the definition of the solute−solvent boundary,125−129 and thus optimization of electrostatic radii is required to implicitly account for physical details like solute−water hydrogen bonding. Overall, the quality of the resulting models using fit solute radii for PB (RMSE 1.0, MSE 0.1), ddCOSMO (RMSE 0.9, MSE 0.0), and GK (RMSE 0.8, MSE 0.0) is comparable to that of the recent Drude/PB implicit solvent (RMSE 0.8, MSE 0.0).51 The fit radii reproduce experimental solvation free energy differences better than the original van der Waals radii, which gave RMS errors of 3.69, 3.45, and 4.51 kcal/mol for APBS, ddCOSMO, and GK, respectively (Table 1A). Additionally, it may be beneficial to consider using optimized GK (or ddCOSMO) electrostatic radii as a starting point for electrostatics calculations in quantum mechanical continuum solvents.61,130

Dipole moment calculations using each AMOEBA implicit solvent for 18 protein and nucleic acid biomolecules show nearly exact agreement with explicit solvent dipole moments computed by averaging over solvent degrees of freedom (Figure 14). This suggests that all three models (APBS, ddCOSMO, and GK) successfully reproduce the polarization response observed in explicit water simulations at the resolution of overall biomolecules. Future work will focus on molecular dynamics simulations of biomolecules in implicit solvent compared to explicit solvent to assess stability and the agreement of conformational ensembles. Furthermore, although the implicit solvent models discussed here have been developed for use with the AMOEBA polarizable force field, their support for polarizable atomic multipole electrostatics should permit adaptation to emerging models such as AMOEBA+ and HIPPO.131−133

An important limitation of the current models is their focus on the use of a van der Waals description of the solute for cavitation, dispersion, and electrostatic contributions, rather than a molecular surface.134,135 The approximation of a van der Waals description is modest for small solutes but becomes problematic as the molecular size and complexity increase (e.g., for biomolecules). For example, a simple van der Waals surface does not account for interstitial spaces (i.e., spaces between biomolecular residues or domains where water molecules cannot fit), and thereby allows for continuum water access to spaces not accessible to explicit water. Favorable hydration effects of continuum water in interstitial spaces promote swelling of biomolecules and oppose hydrophobic compaction forces. For this reason, future work must incorporate methods that have been proposed to account for interstitial spaces118,135−137 into the AMOEBA family of implicit solvents.

■ ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jctc.0c01286.

Description of the electrostatic potential experienced by an empty cavity under three different simulation conditions due to the preferential orientation of water at boundaries; implicit solvent dispersion term fit to dispersion free energy differences calculated for a set of small molecules in explicit solvent; initial (van der Waals) and optimized solute diameters for APBS, ddCOSMO, and GK; solvation free energy difference components using AMOEBA van der Waals radii; solvation free energy differences using AMOEBA van der Waals radii; signed and unsigned solvation free energy difference errors using AMOEBA van der Waals radii; solvation free energy difference components using fit electrostatic radii; solvation free energy differences using fit electrostatic radii; signed and unsigned solvation free energy difference errors using fit electrostatic radii; free energy differences for phosphate and guanidinium compounds from explicit solvent simulations; simulation conditions for validation set molecules with explicit solvent; electrostatic component of the solvation free energy and dipole moments for validation set molecules using van der Waals radii and fit radii; dipole moment magnitudes from fixed charge force fields (PDF)

AMOEBA parameters (Tinker "prm" format) and coordinate files (Tinker "XYZ" format) for all small molecules used in this work and coordinate files for all protein and nucleic acid validation molecules (ZIP)

■ AUTHOR INFORMATION

Corresponding Author

Michael J. Schnieders − Roy J Carver Department of Biomedical Engineering, University of Iowa, Iowa City, Iowa 52242, United States; Department of Biochemistry, University of Iowa, Iowa City, Iowa 52242, United States; orcid.org/0000-0003-1260-4592; Email: michael-schnieders@uiowa.edu

Authors

Rae A. Corrigan − Roy J Carver Department of Biomedical Engineering, University of Iowa, Iowa City, Iowa 52242, United States; orcid.org/0000-0003-3628-9347

Guowei Qi − Department of Biochemistry, University of Iowa, Iowa City, Iowa 52242, United States

Andrew C. Thiel − Roy J Carver Department of Biomedical Engineering, University of Iowa, Iowa City, Iowa 52242, United States


Jack R. Lynn − Roy J Carver Department of Biomedical Engineering, University of Iowa, Iowa City, Iowa 52242, United States
Brandon D. Walker − Department of Biomedical Engineering, University of Texas in Austin, Austin, Texas 78712, United States
Thomas L. Casavant − Roy J Carver Department of Biomedical Engineering, University of Iowa, Iowa City, Iowa 52242, United States
Louis Lagardere − Department of Chemistry, Sorbonne Université, F-75005 Paris, France
Jean-Philip Piquemal − Department of Chemistry, Sorbonne Université, F-75005 Paris, France; orcid.org/0000-0001-6615-9426
Jay W. Ponder − Department of Chemistry, Washington University in St. Louis, St. Louis, Missouri 63130, United States; orcid.org/0000-0001-5450-9230
Pengyu Ren − Department of Biomedical Engineering, University of Texas in Austin, Austin, Texas 78712, United States; orcid.org/0000-0002-5613-1910

Complete contact information is available at: https://pubs.acs.org/10.1021/acs.jctc.0c01286

Author Contributions
#R.A.C. and G.Q. contributed equally.

Notes
The authors declare no competing financial interest.

■ ACKNOWLEDGMENTS
All computations were performed on The University of Iowa Argon cluster with support and guidance from Danny Tang, Joe Hetrick, Glenn Johnson, and John Saxton. M.J.S. was supported by NIH R01DK110023, NIH R01DC012049, the Donald D. Harrington Fellows Program and NSF CHE-1751688. J.W.P. and P.R. were supported by R01GM106137 and R01GM114237. J.-P.P. and L.L. were supported by the European Research Council (ERC) under the European Union’s Horizon 2020 Research and Innovation Program No 810367. R.A.C. and A.C.T. were supported by the NSF Graduate Research Fellowship Program NSF DGE-1945994, and G.Q. was supported by the Goldwater Foundation and the Iowa Center for Research by Undergraduates (ICRU).

■ REFERENCES
(1) Chandler, D. Interfaces and the driving force of hydrophobic assembly. Nature 2005, 437, 640−647.
(2) Dill, K. A.; MacCallum, J. L. The Protein-Folding Problem, 50 Years On. Science 2012, 338, 1042−1046.
(3) Zhang, Q. C.; Petrey, D.; Deng, L.; Qiang, L.; Shi, Y.; Thu, C. A.; Bisikirska, B.; Lefebvre, C.; Accili, D.; Hunter, T.; Maniatis, T.; Califano, A.; Honig, B. Structure-based prediction of protein-protein interactions on a genome-wide scale. Nature 2012, 490, 556−560.
(4) Ren, P.; Chun, J.; Thomas, D. G.; Schnieders, M. J.; Marucho, M.; Zhang, J.; Baker, N. A. Biomolecular electrostatics and solvation: a computational perspective. Q. Rev. Biophys. 2012, 45, 427−491.
(5) LuCore, S. D.; Litman, J. M.; Powers, K. T.; Gao, S.; Lynn, A. M.; Tollefson, W. T. A.; Fenn, T. D.; Washington, M. T.; Schnieders, M. J. Dead-end elimination with a polarizable force field repacks PCNA structures. Biophys. J. 2015, 109, 816−826.
(6) Huang, P.-S.; Boyken, S. E.; Baker, D. The coming of age of de novo protein design. Nature 2016, 537, 320.
(7) Hallen, M. A.; Martin, J. W.; Ojewole, A.; Jou, J. D.; Lowegard, A. U.; Frenkel, M. S.; Gainza, P.; Nisonoff, H. M.; Mukund, A.; Wang, S.; Holt, G. T.; Zhou, D.; Dowd, E.; Donald, B. R. OSPREY 3.0: Open-source protein redesign for you, with powerful new features. J. Comput. Chem. 2018, 39, 2494−2507.
(8) Villa, F.; Panel, N.; Chen, X. Y.; Simonson, T. Adaptive landscape flattening in sequence space for the computational design of protein:peptide binding. J. Chem. Phys. 2018, 149, No. 072302.
(9) Tollefson, M. R.; Litman, J. M.; Qi, G.; O’Connell, C. E.; Wipfler, M. J.; Marini, R. J.; Bernabe, H. V.; Tollefson, W. T. A.; Braun, T. A.; Casavant, T. L.; Smith, R. J. H.; Schnieders, M. J. Structural insights into hearing loss genetics from polarizable protein repacking. Biophys. J. 2019, 117, 602−612.
(10) Cheatham, T. E.; Case, D. A. Twenty-five years of nucleic acid simulations. Biopolymers 2013, 99, 969−977.
(11) Bergonzo, C.; Henriksen, N. M.; Roe, D. R.; Cheatham, T. E. Highly sampled tetranucleotide and tetraloop motifs enable evaluation of common RNA force fields. RNA 2015, 21, 1578−1590.
(12) Musil, M.; Konegger, H.; Hong, J.; Bednar, D.; Damborsky, J. Computational Design of Stable and Soluble Biocatalysts. ACS Catal. 2019, 9, 1033−1054.
(13) Roux, B.; Simonson, T. Implicit solvent models. Biophys. Chem. 1999, 78, 1−20.
(14) Onufriev, A. V.; Izadi, S. Water models for biomolecular simulations. Wiley Interdiscip. Rev.: Comput. Mol. Sci. 2018, 8, No. e1347.
(15) Tan, C.; Tan, Y. H.; Luo, R. Implicit nonpolar solvent models. J. Phys. Chem. B 2007, 111, 12263−12274.
(16) Decherchi, S.; Masetti, M.; Vyalov, I.; Rocchia, W. Implicit solvent methods for free energy estimation. Eur. J. Med. Chem. 2015, 91, 27−42.
(17) Levy, R. M.; Zhang, L. Y.; Gallicchio, E.; Felts, A. K. On the nonpolar hydration free energy of proteins: Surface area and continuum solvent models for the solute-solvent interaction energy. J. Am. Chem. Soc. 2003, 125, 9523−9530.
(18) Wagoner, J. A.; Baker, N. A. Assessing implicit models for nonpolar mean solvation forces: the importance of dispersion and volume terms. Proc. Natl. Acad. Sci. U.S.A. 2006, 103, 8331−8336.
(19) Chen, Z.; Zhao, S.; Chun, J.; Thomas, D. G.; Baker, N. A.; Bates, P. W.; Wei, G. W. Variational approach for nonpolar solvation analysis. J. Chem. Phys. 2012, 137, No. 084101.
(20) Harris, R. C.; Pettitt, B. M. Effects of geometry and chemistry on hydrophobic solvation. Proc. Natl. Acad. Sci. U.S.A. 2014, 111, 14681−14686.
(21) Michael, E.; Polydorides, S.; Simonson, T.; Archontis, G. Simple models for nonpolar solvation: parameterization and testing. J. Comput. Chem. 2017, 38, 2509−2519.
(22) Gilson, M. K.; Honig, B. H. Calculation of electrostatic potentials in an enzyme active-site. Nature 1987, 330, 84−86.
(23) Gilson, M. K.; Honig, B. Calculation of the total electrostatic energy of a macromolecular system - solvation energies, binding-energies, and conformational-analysis. Proteins: Struct., Funct., Genet. 1988, 4, 7−18.
(24) Jeancharles, A.; Nicholls, A.; Sharp, K.; Honig, B.; Tempczyk, A.; Hendrickson, T. F.; Still, W. C. Electrostatic contributions to solvation energies - comparison of free-energy perturbation and continuum calculations. J. Am. Chem. Soc. 1991, 113, 1454−1455.
(25) Honig, B.; Nicholls, A. Classical electrostatics in biology and chemistry. Science 1995, 268, 1144−1149.
(26) Baker, N. A.; Sept, D.; Joseph, S.; Holst, M. J.; McCammon, J. A. Electrostatics of nanosystems: application to microtubules and the ribosome. Proc. Natl. Acad. Sci. U.S.A. 2001, 98, 10037−10041.
(27) Simonson, T. Macromolecular electrostatics: continuum models and their growing pains. Curr. Opin. Struct. Biol. 2001, 11, 243−252.
(28) Kollman, P. A.; Massova, I.; Reyes, C.; Kuhn, B.; Huo, S. H.; Chong, L.; Lee, M.; Lee, T.; Duan, Y.; Wang, W.; Donini, O.; Cieplak, P.; Srinivasan, J.; Case, D. A.; Cheatham, T. E. Calculating structures and free energies of complex molecules: Combining molecular mechanics and continuum models. Acc. Chem. Res. 2000, 33, 889−897.
(29) Tan, Y. H.; Luo, R. Continuum treatment of electronic polarization effect. J. Chem. Phys. 2007, 126, No. 094103.


(30) Tan, Y. H.; Tan, C. H.; Wang, J.; Luo, R. Continuum polarizable force field within the Poisson-Boltzmann framework. J. Phys. Chem. B 2008, 112, 7675−7688.
(31) Jurrus, E.; Engel, D.; Star, K.; Monson, K.; Brandi, J.; Felberg, L. E.; Brookes, D. H.; Wilson, L.; Chen, J.; Liles, K.; Chun, M.; Li, P.; Gohara, D. W.; Dolinsky, T.; Konecny, R.; Koes, D. R.; Nielsen, J. E.; Head-Gordon, T.; Geng, W.; Krasny, R.; Wei, G.-W.; Holst, M. J.; McCammon, J. A.; Baker, N. A. Improvements to the APBS biomolecular solvation software suite. Protein Sci. 2018, 27, 112−128.
(32) Rashin, A. A.; Honig, B. Reevaluation of the Born model of ion hydration. J. Phys. Chem. A 1985, 89, 5588−5593.
(33) Roux, B.; Yu, H. A.; Karplus, M. Molecular-basis for the Born model of ion solvation. J. Phys. Chem. B 1990, 94, 4683−4688.
(34) Hawkins, G. D.; Cramer, C. J.; Truhlar, D. G. Pairwise solute descreening of solute charges from a dielectric medium. Chem. Phys. Lett. 1995, 246, 122−129.
(35) Qiu, D.; Shenkin, P. S.; Hollinger, F. P.; Still, W. C. The GB/SA continuum model for solvation: a fast analytical method for the calculation of approximate Born radii. J. Phys. Chem. A 1997, 101, 3005−3014.
(36) Nina, M.; Beglov, D.; Roux, B. Atomic radii for continuum electrostatics calculations based on molecular dynamics free energy simulations. J. Phys. Chem. B 1997, 101, 5239−5248.
(37) Gallicchio, E.; Levy, R. M. AGBNP: An analytic implicit solvent model suitable for molecular dynamics simulations and high-resolution modeling. J. Comput. Chem. 2004, 25, 479−499.
(38) Gallicchio, E.; Paris, K.; Levy, R. M. The AGBNP2 implicit solvation model. J. Chem. Theory Comput. 2009, 5, 2544−2564.
(39) Mukhopadhyay, A.; Aguilar, B. H.; Tolokh, I. S.; Onufriev, A. V. Introducing charge hydration asymmetry into the generalized Born model. J. Chem. Theory Comput. 2014, 10, 1788−1794.
(40) Onufriev, A. V.; Case, D. A. Generalized Born implicit solvent models for biomolecules. Annu. Rev. Biophys. 2019, 48, 275−296.
(41) Ren, P.; Wu, C.; Ponder, J. W. Polarizable atomic multipole-based molecular mechanics for organic molecules. J. Chem. Theory Comput. 2011, 7, 3143−3161.
(42) Zhang, C.; Lu, C.; Jing, Z.; Wu, C.; Piquemal, J.-P.; Ponder, J. W.; Ren, P. AMOEBA polarizable atomic multipole force field for nucleic acids. J. Chem. Theory Comput. 2018, 14, 2084−2108.
(43) Schnieders, M. J.; Baker, N. A.; Ren, P. Y.; Ponder, J. W. Polarizable atomic multipole solutes in a Poisson-Boltzmann continuum. J. Chem. Phys. 2007, 126, No. 124114.
(44) Lipparini, F.; Lagardère, L.; Raynaud, C.; Stamm, B.; Cancès, E.; Mennucci, B.; Schnieders, M.; Ren, P.; Maday, Y.; Piquemal, J.-P. Polarizable Molecular Dynamics in a Polarizable Continuum Solvent. J. Chem. Theory Comput. 2015, 11, 623−634.
(45) Schnieders, M. J.; Ponder, J. W. Polarizable atomic multipole solutes in a generalized Kirkwood continuum. J. Chem. Theory Comput. 2007, 3, 2083−2097.
(46) Kirkwood, J. G. Theory of solutions of molecules containing widely separated charges with special application to zwitterions. J. Chem. Phys. 1934, 2, 351−361.
(47) Maple, J. R.; Cao, Y. X.; Damm, W. G.; Halgren, T. A.; Kaminski, G. A.; Zhang, L. Y.; Friesner, R. A. A polarizable force field and continuum solvation methodology for modeling of protein-ligand interactions. J. Chem. Theory Comput. 2005, 1, 694−715.
(48) Cortis, C. M.; Friesner, R. A. Numerical solution of the Poisson-Boltzmann equation using tetrahedral finite-element meshes. J. Comput. Chem. 1997, 18, 1591−1608.
(49) Lemkul, J. A.; Huang, J.; Roux, B.; MacKerell, A. D. An Empirical Polarizable Force Field Based on the Classical Drude Oscillator Model: Development History and Recent Applications. Chem. Rev. 2016, 116, 4983−5013.
(50) Lemkul, J. A.; MacKerell, A. D., Jr. Polarizable force field for RNA based on the classical drude oscillator. J. Comput. Chem. 2018, 39, 2624−2646.
(51) Aleksandrov, A.; Lin, F. Y.; Roux, B.; MacKerell, A. D. Combining the polarizable Drude force field with a continuum electrostatic Poisson-Boltzmann implicit solvation model. J. Comput. Chem. 2018, 39, 1707−1719.
(52) Aleksandrov, A.; Roux, B.; MacKerell, A. D. pKa Calculations with the Polarizable Drude Force Field and Poisson−Boltzmann Solvation Model. J. Chem. Theory Comput. 2020, 16, 4655−4668.
(53) MacKerell, A. D.; Bashford, D.; Bellott, M.; Dunbrack, R. L.; Evanseck, J. D.; Field, M. J.; Fischer, S.; Gao, J.; Guo, H.; Ha, S.; Joseph-McCarthy, D.; Kuchnir, L.; Kuczera, K.; Lau, F. T. K.; Mattos, C.; Michnick, S.; Ngo, T.; Nguyen, D. T.; Prodhom, B.; Reiher, W. E.; Roux, B.; Schlenkrich, M.; Smith, J. C.; Stote, R.; Straub, J.; Watanabe, M.; Wiorkiewicz-Kuczera, J.; Yin, D.; Karplus, M. All-atom empirical potential for molecular modeling and dynamics studies of proteins. J. Phys. Chem. B 1998, 102, 3586−3616.
(54) Best, R. B.; Zhu, X.; Shim, J.; Lopes, P. E. M.; Mittal, J.; Feig, M.; MacKerell, A. D. Optimization of the additive CHARMM all-atom protein force field targeting improved sampling of the backbone phi, psi and side-chain chi(1) and chi(2) dihedral angles. J. Chem. Theory Comput. 2012, 8, 3257−3273.
(55) Poier, P. P.; Jensen, F. Including implicit solvation in the bond capacity polarization model. J. Chem. Phys. 2019, 151, No. 114118.
(56) Poier, P. P.; Jensen, F. Polarizable charges in a generalized Born reaction potential. J. Chem. Phys. 2020, 153, No. 024111.
(57) Cooper, C. D. A boundary-integral approach for the poisson−boltzmann equation with polarizable force fields. J. Comput. Chem. 2019, 40, 1680−1692.
(58) Cooper, C. D.; Bardhan, J. P.; Barba, L. A. A biomolecular electrostatics solver using Python, GPUs and boundary elements that can handle solvent-filled cavities and Stern layers. Comput. Phys. Commun. 2014, 185, 720−729.
(59) Cramer, C. J.; Truhlar, D. G. Implicit solvation models: Equilibria, structure, spectra, and dynamics. Chem. Rev. 1999, 99, 2161−2200.
(60) Tomasi, J. Thirty years of continuum solvation chemistry: a review, and prospects for the near future. Theor. Chem. Acc. 2004, 112, 184−203.
(61) Tomasi, J.; Mennucci, B.; Cammi, R. Quantum mechanical continuum solvation models. Chem. Rev. 2005, 105, 2999−3093.
(62) Miertuš, S.; Scrocco, E.; Tomasi, J. Electrostatic interaction of a solute with a continuum. A direct utilizaion of ab initio molecular potentials for the prevision of solvent effects. Chem. Phys. 1981, 55, 117−129.
(63) Cancès, E.; Mennucci, B.; Tomasi, J. A new integral equation formalism for the polarizable continuum model: Theoretical background and applications to isotropic and anisotropic dielectrics. J. Chem. Phys. 1997, 107, 3032−3041.
(64) Cramer, C. J.; Truhlar, D. G. An SCF solvation model for the hydrophobic effect and absolute free energies of aqueous solvation. Science 1992, 256, 213−217.
(65) Kelly, C. P.; Cramer, C. J.; Truhlar, D. G. SM6: A density functional theory continuum solvation model for calculating aqueous solvation free energies of neutrals, ions, and solute-water clusters. J. Chem. Theory Comput. 2005, 1, 1133−1152.
(66) Marenich, A. V.; Cramer, C. J.; Truhlar, D. G. Universal solvation model based on solute electron density and on a continuum model of the solvent defined by the bulk dielectric constant and atomic surface tensions. J. Phys. Chem. B 2009, 113, 6378−6396.
(67) Klamt, A.; Schuurmann, G. COSMO - a new approach to dielectric screening in solvents with explicit expressions for the screening energy and its gradient. J. Chem. Soc., Perkin Trans. 2 1993, 2, 799−805.
(68) Klamt, A. Conductor-like screening model for real solvents - a new approach to the quantitative calculation of solvation phenomena. J. Phys. Chem. C 1995, 99, 2224−2235.
(69) Rackers, J. A.; Wang, Z.; Lu, C.; Laury, M. L.; Lagardère, L.; Schnieders, M. J.; Piquemal, J.-P.; Ren, P.; Ponder, J. W. Tinker 8: Software Tools for Molecular Design. J. Chem. Theory Comput. 2018, 14, 5273−5289.
(70) Stone, A. J. Distributed multipole analysis: Stability for large basis sets. J. Chem. Theory Comput. 2005, 1, 1128−1132.


(71) Mobley, D. L.; Guthrie, J. P. FreeSolv: a database of experimental and calculated hydration free energies, with input files. J. Comput.-Aided Mol. Des. 2014, 28, 711−720.
(72) Matos, G. D. R.; Kyu, D. Y.; Loeffler, H. H.; Chodera, J. D.; Shirts, M. R.; Mobley, D. L. Approaches for calculating solvation free energies and enthalpies demonstrated with an update of the FreeSolv database. J. Chem. Eng. Data 2017, 62, 1559−1569.
(73) Gallicchio, E.; Zhang, L. Y.; Levy, R. M. The SGB/NP hydration free energy model based on the surface Generalized Born solvent reaction field and novel nonpolar hydration free energy estimators. J. Comput. Chem. 2002, 23, 517−529.
(74) Rizzo, R. C.; Aynechi, T.; Case, D. A.; Kuntz, I. D. Estimation of absolute free energies of hydration using continuum methods: accuracy of partial charge models and optimization of nonpolar contributions. J. Chem. Theory Comput. 2006, 2, 128−139.
(75) Bartmess, J. E. Negative Ion Energetics Data. NIST Chemistry WebBook; National Institute of Standards and Technology: Gaithersburg, MD, p 20899.
(76) Hunter, E. P.; Lias, S. G. Proton Affinity Evaluation. NIST Chemistry WebBook; National Institute of Standards and Technology: Gaithersburg, MD, p 20899.
(77) Stewart, R. The Proton: Applications to Organic Chemistry; Academic Press, Inc.: New York, 1985; Vol. 46.
(78) Wang, Z. Polarizable Force Field Development, and Applications to Conformational Sampling and Free Energy Calculation. Dissertation, Washington University in St. Louis: St. Louis, MO, 2018.
(79) Fawcett, W. R. Thermodynamic parameters for the solvation of monatomic ions in water. J. Phys. Chem. B 1999, 103, 11181−11185.
(80) Ashbaugh, H. S. Convergence of molecular and macroscopic continuum descriptions of ion hydration. J. Phys. Chem. B 2000, 104, 7235−7238.
(81) Wu, J. C.; Chattree, G.; Ren, P. Y. Automation of AMOEBA polarizable force field parameterization for small molecules. Theor. Chem. Acc. 2012, 131, No. 1138.
(82) Huang, D. M.; Chandler, D. Temperature and length scale dependence of hydrophobic effects and their possible implications for protein folding. Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 8324−8327.
(83) Huang, D. M.; Geissler, P. L.; Chandler, D. Scaling of hydrophobic solvation free energies. J. Phys. Chem. B 2001, 105, 6704−6709.
(84) Huang, D. M.; Chandler, D. The hydrophobic effect and the influence of solute-solvent attractions. J. Phys. Chem. B 2002, 106, 2047−2053.
(85) Chen, J. H.; Brooks, C. L. Critical importance of length-scale dependence in implicit modeling of hydrophobic interactions. J. Am. Chem. Soc. 2007, 129, 2444.
(86) Connolly, M. L. Analytical molecular surface calculation. J. Appl. Crystallogr. 1983, 16, 548−558.
(87) Connolly, M. L. Computation of molecular volume. J. Am. Chem. Soc. 1985, 107, 1118−1124.
(88) Connolly, M. L. The molecular-surface package. J. Mol. Graphics 1993, 11, 139−143.
(89) Schnieders, M. J. Force Field X, version 1.0; University of Iowa, 2021. https://ffx.biochem.uiowa.edu.
(90) Richards, F. M. Areas, volumes, packing and protein structure. Annu. Rev. Biophys. Bioeng. 1977, 6, 151−176.
(91) Schnieders, M. J. The Theory and Effect of Solvation on Biomolecules. Dissertation, Washington University in St. Louis: St. Louis, MO, 2007.
(92) Ren, P.; Ponder, J. W. Polarizable atomic multipole water model for molecular mechanics simulation. J. Phys. Chem. B 2003, 107, 5933−5947.
(93) Richmond, T. J. Solvent accessible surface area and excluded volume in proteins: Analytical equations for overlapping spheres and implications for the hydrophobic effect. J. Mol. Biol. 1984, 178, 63−89.
(94) Kundrot, C. E.; Ponder, J. W.; Richards, F. M. Algorithms for calculating excluded volume and its derivatives as a function of molecule-conformation and their use in energy minimization. J. Comput. Chem. 1991, 12, 402−409.
(95) Wesson, L.; Eisenberg, D. Atomic solvation parameters applied to molecular-dynamics of proteins in solution. Protein Sci. 1992, 1, 227−235.
(96) C.R.C. Handbook of Chemistry and Physics, 58th ed.; Weast, R. C., Ed.; CRC Press, 1977.
(97) Halgren, T. A. The representation of van der Waals (vdW) interactions in molecular mechanics force fields: potential form, combination rules, and vdW parameters. J. Am. Chem. Soc. 1992, 114, 7827−7843.
(98) Weeks, J. D.; Chandler, D.; Andersen, H. C. Role of repulsive forces in determining the equilibrium structure of simple liquids. J. Chem. Phys. 1971, 54, 5237−5247.
(99) Chandler, D.; Weeks, J. D.; Andersen, H. C. Van der Waals picture of liquids, solids, and phase transformations. Science 1983, 220, 787−794.
(100) Gallicchio, E.; Kubo, M. M.; Levy, R. M. Enthalpy-entropy and cavity decomposition of alkane hydration free energies: Numerical results and implications for theories of hydrophobic solvation. J. Phys. Chem. B 2000, 104, 6271−6285.
(101) Hawkins, G. D.; Cramer, C. J.; Truhlar, D. G. Parametrized models of aqueous free energies of solvation based on pairwise descreening of solute atomic charges from a dielectric medium. J. Phys. Chem. D 1996, 100, 19824−19839.
(102) Lipparini, F.; Lagardère, L.; Stamm, B.; Cancès, E.; Schnieders, M.; Ren, P.; Maday, Y.; Piquemal, J.-P. Scalable Evaluation of Polarization Energy and Associated Forces in Polarizable Molecular Dynamics: I. Toward Massively Parallel Direct Space Computations. J. Chem. Theory Comput. 2014, 10, 1638−1651.
(103) Aviat, F.; Levitt, A.; Stamm, B.; Maday, Y.; Ren, P.; Ponder, J. W.; Lagardère, L.; Piquemal, J.-P. Truncated Conjugate Gradient: An Optimal Strategy for the Analytical Evaluation of the Many-Body Polarization Energy and Forces in Molecular Simulations. J. Chem. Theory Comput. 2017, 13, 180−190.
(104) Simmonett, A. C.; Pickard, F. C., IV; Ponder, J. W.; Brooks, B. R. An empirical extrapolation scheme for efficient treatment of induced dipoles. J. Chem. Phys. 2016, 145, No. 164101.
(105) Lipparini, F.; Lagardere, L.; Scalmani, G.; Stamm, B.; Cances, E.; Maday, Y.; Piquemal, J. P.; Frisch, M. J.; Mennucci, B. Quantum calculations in solution for large to very large molecules: a new linear scaling QM/continuum approach. J. Phys. Chem. Lett. 2014, 5, 953−958.
(106) Lipparini, F.; Scalmani, G.; Lagardere, L.; Stamm, B.; Cances, E.; Maday, Y.; Piquemal, J. P.; Frisch, M. J.; Mennucci, B. Quantum, classical, and hybrid QM/MM calculations in solution: general implementation of the ddCOSMO linear scaling strategy. J. Chem. Phys. 2014, 141, No. 184108.
(107) Stamm, B.; Lagardere, L.; Scalmani, G.; Gatto, P.; Cances, E.; Piquemal, J. P.; Maday, Y.; Mennucci, B.; Lipparini, F. How to make continuum solvation incredibly fast in a few simple steps: a practical guide to the domain decomposition paradigm for the conductor-like screening model. Int. J. Quantum Chem. 2019, 119, No. e25669.
(108) Rackers, J. A.; Wang, Z.; Lu, C.; Laury, M. L.; Lagardere, L.; Schnieders, M. J.; Piquemal, J.-P.; Ren, P.; Ponder, J. W. Tinker 8: software tools for molecular design. J. Chem. Theory Comput. 2018, 5273.
(109) Holst, M.; Saied, F. Multigrid solution of the Poisson-Boltzmann equation. J. Comput. Chem. 1993, 14, 105−113.
(110) Holst, M. J.; Saied, F. Numerical solution of the nonlinear Poisson-Boltzmann equation: developing more robust and efficient methods. J. Comput. Chem. 1995, 16, 337−364.
(111) Klamt, A.; Moya, C.; Palomar, J. A comprehensive comparison of the IEFPCM and SS(V)PE continuum solvation methods with the COSMO approach. J. Chem. Theory Comput. 2015, 11, 4220−4225.
(112) Baldridge, K.; Klamt, A. First principles implementation of solvent effects without outlying charge error. J. Chem. Phys. 1997, 106, 6622−6633.
(113) Cossi, M.; Rega, N.; Scalmani, G.; Barone, V. Energies, structures, and electronic properties of molecules in solution with the C-PCM solvation model. J. Comput. Chem. 2003, 24, 669−681.


(114) Lagardère, L.; Jolly, L. H.; Lipparini, F.; Aviat, F.; Stamm, B.; Jing, Z. F. F.; Harger, M.; Torabifard, H.; Cisneros, G. A.; Schnieders, M. J.; Gresh, N.; Maday, Y.; Ren, P. Y. Y.; Ponder, J. W.; Piquemal, J. P. Tinker-HP: a massively parallel molecular dynamics package for multiscale simulations of large complex systems with advanced point dipole polarizable force fields. Chem. Sci. 2018, 9, 956−972.
(115) Bashford, D.; Case, D. A. Generalized Born models of macromolecular solvation effects. Annu. Rev. Phys. Chem. 2000, 51, 129−152.
(116) Grycuk, T. Deficiency of the Coulomb-field approximation in the Generalized Born model: An improved formula for Born radii evaluation. J. Chem. Phys. 2003, 119, 4817−4826.
(117) Mongan, J.; Svrcek-Seiler, W. A.; Onufriev, A. Analysis of integral expressions for effective Born radii. J. Chem. Phys. 2007, 127, No. 185101.
(118) Aguilar, B.; Shadrach, R.; Onufriev, A. V. Reducing the secondary structure bias in the generalized Born model via R6 effective radii. J. Chem. Theory Comput. 2010, 6, 3613−3630.
(119) Onufriev, A.; Case, D. A.; Bashford, D. Effective Born radii in the Generalized Born approximation: The importance of being perfect. J. Comput. Chem. 2002, 23, 1297−1304.
(120) Ponder, J.; Wu, C.; Ren, P.; Pande, V.; Chodera, J.; Schnieders, M.; Haque, I.; Mobley, D.; Lambrecht, D.; DiStasio, R.; Head-Gordon, M.; Clark, G.; Johnson, M.; Head-Gordon, T. Current status of the AMOEBA polarizable force field. J. Phys. Chem. B 2010, 114, 2549−2564.
(121) Hornak, V.; Abel, R.; Okur, A.; Strockbine, B.; Roitberg, A.; Simmerling, C. Comparison of multiple amber force fields and development of improved protein backbone parameters. Proteins 2006, 65, 712−725.
(122) Kaminski, G. A.; Friesner, R. A.; Tirado-Rives, J.; Jorgensen, W. L. Evaluation and reparametrization of the OPLS-AA force field for proteins via comparison with accurate quantum chemical calculations on peptides. J. Phys. Chem. B 2001, 105, 6474−6487.
(123) Foloppe, N.; MacKerell, A. D. Conformational properties of the deoxyribose and ribose moieties of nucleic acids: a quantum mechanical study. J. Phys. Chem. B 1998, 102, 6669−6678.
(124) Buck, M.; Bouguet-Bonnet, S.; Pastor, R. W.; MacKerell, A. D. Importance of the CMAP correction to the CHARMM22 protein force field: dynamics of hen lysozyme. Biophys. J. 2006, 90, L36−L38.
(125) Nina, M.; Im, W.; Roux, B. Optimized atomic radii for protein continuum electrostatics solvation forces. Biophys. Chem. 1999, 78, 89−96.
(126) Swanson, J. M. J.; Adcock, S. A.; McCammon, J. A. Optimized radii for Poisson-Boltzmann calculations with the AMBER force field. J. Chem. Theory Comput. 2005, 1, 484−493.
(127) Swanson, J. M. J.; Wagoner, J. A.; Baker, N. A.; McCammon, J. A. Optimizing the Poisson dielectric boundary with explicit solvent forces and energies: lessons learned with atom-centered dielectric functions. J. Chem. Theory Comput. 2007, 3, 170−183.
(128) Green, D. F. Optimized parameters for continuum solvation calculations with carbohydrates. J. Phys. Chem. B 2008, 112, 5238−5249.
(129) Yamagishi, J.; Okimoto, N.; Morimoto, G.; Taiji, M. A New Set of Atomic Radii for Accurate Estimation of Solvation Free Energy by Poisson-Boltzmann Solvent Model. J. Comput. Chem. 2014, 35, 2132−2139.
(130) Bernales, V. S.; Marenich, A. V.; Contreras, R.; Cramer, C. J.; Truhlar, D. G. Quantum mechanical continuum solvation models for ionic liquids. J. Phys. Chem. B 2012, 116, 9122−9129.
(131) Liu, C. W.; Piquemal, J. P.; Ren, P. Y. AMOEBA plus classical potential for modeling molecular interactions. J. Chem. Theory Comput. 2019, 15, 4122−4139.
(132) Rackers, J. A.; Ponder, J. W. Classical Pauli repulsion: an anisotropic, atomic multipole model. J. Chem. Phys. 2019, 150, No. 084104.
(133) Liu, C.; Piquemal, J.-P.; Ren, P. Implementation of geometry-dependent charge flux into the polarizable AMOEBA+ potential. J. Phys. Chem. Lett. 2020, 11, 419−426.
(134) Swanson, J. M. J.; Mongan, J.; McCammon, J. A. Limitations of atom-centered dielectric functions in implicit solvent models. J. Phys. Chem. B 2005, 109, 14769−14772.
(135) Mongan, J.; Simmerling, C.; McCammon, J. A.; Case, D. A.; Onufriev, A. Generalized Born model with a simple, robust molecular volume correction. J. Chem. Theory Comput. 2007, 3, 156−169.
(136) Onufriev, A.; Bashford, D.; Case, D. Exploring protein native states and large-scale conformational changes with a modified generalized born model. Proteins 2004, 55, 383−394.
(137) Nguyen, H.; Roe, D. R.; Simmerling, C. Improved generalized Born solvent model parameters for protein simulations. J. Chem. Theory Comput. 2013, 9, 2020−2034.

Supporting Information: Implicit Solvents for the Polarizable Atomic Multipole AMOEBA Force Field
Rae A. Corrigan, Guowei Qi, Andrew C. Thiel, Jack R. Lynn, Brandon D. Walker, Thomas L. Casavant, Louis Lagardere, Jean-Philip Piquemal, Jay W. Ponder, Pengyu Ren, Michael J. Schnieders

Supplementary Figure S1. Description of the electrostatic potential experienced by an empty cavity under three different simulation conditions due to the preferential orientation of water at boundaries. Panels: A. Continuum (Φ = 0); B. Explicit Water, Periodic Boundary (Φ = ΦCavity); C. Experiment or Explicit Water Droplet (Φ = ΦCavity + ΦInterface).

A. An empty cavity has no electrostatic potential under continuum electrostatics (i.e. for the PBE with the potential at the boundary of the domain set to zero).

B. An empty cavity experiences a positive potential (ΦCavity) for explicit water simulations under periodic boundary conditions (i.e. PME) due to preferential orientation of water around the empty cavity.

C. In addition to the cavity potential, an empty cavity under experimental conditions (or for an explicit water droplet simulation embedded in a vacuum) experiences a potential due to preferential orientation of water at the vacuum-liquid interface (ΦInterface). Neither molecular mechanics force fields nor continuum solvents should implicitly include the effect of the interface potential ΦInterface when fitting against experimental hydration free energy differences for charged molecules. However, continuum solvents should implicitly include the cavity potential ΦCavity during fitting of solute atomic radii, such that continuum ensembles are comparable to PME explicit water ensembles.

Supplementary Table S1. The implicit solvent dispersion term was fit to dispersion free energy differences calculated for a set of small molecules in explicit solvent. The model uses an integral offset (d = 1.056 Å) that pushes the start of the dispersion integral further into the solvent. The R² for explicit vs implicit dispersion values is 0.987 (all values in kcal/mol). SE and UE denote the signed and unsigned errors of the implicit value relative to the explicit value.

Molecule              ΔWdisp Explicit  ΔWdisp Implicit  SE      UE
acetic acid            -8.40            -7.89            0.51   0.51
formic acid            -6.10            -5.25            0.85   0.85
ethanol                -7.90            -7.48            0.42   0.42
isopropanol            -9.50            -9.46            0.04   0.04
methanol               -5.60            -4.89            0.71   0.71
propanol               -9.80            -9.71            0.09   0.09
acetaldehyde           -7.40            -6.64            0.76   0.76
formaldehyde           -5.00            -4.33            0.67   0.67
butane                -10.70           -10.92           -0.22   0.22
methane                -4.40            -3.64            0.76   0.76
acetamide              -9.00            -8.53            0.47   0.47
dimethylacetamide     -12.50           -12.91           -0.41   0.41
dimethylformamide     -10.90           -10.96           -0.06   0.06
N-methylacetamide     -11.20           -10.92            0.28   0.28
N-methylformamide      -9.10            -8.71            0.39   0.39
propionamide          -10.70           -10.75           -0.05   0.05
ammonia                -3.40            -2.93            0.47   0.47
dimethylamine          -8.30            -8.10            0.20   0.20
ethylamine             -8.20            -8.04            0.16   0.16
methylamine            -6.10            -5.66            0.44   0.44
propylamine           -10.00           -10.22           -0.22   0.22
trimethylamine         -9.80           -10.26           -0.46   0.46
benzene               -11.70           -11.80           -0.10   0.10
p-cresol              -14.10           -14.76           -0.66   0.66
ethylbenzene          -14.70           -15.80           -1.10   1.10
phenol                -12.50           -12.80           -0.30   0.30
toluene               -13.30           -13.86           -0.56   0.56
ethylimidazole        -13.60           -14.07           -0.47   0.47
imidazole             -10.30            -9.95            0.35   0.35
ethylindole           -17.90           -20.42           -2.52   2.52
N-methylpyrrolidine   -11.90           -13.46           -1.56   1.56
pyrrolidine           -10.60           -11.54           -0.94   0.94
dimethylsulfide       -10.10            -9.68            0.42   0.42
ethanethiol            -9.70            -9.39            0.31   0.31
methyl ethyl sulfide  -12.00           -11.93            0.07   0.07
methanethiol           -7.70            -7.06            0.64   0.64
water                  -2.50            -2.07            0.43   0.43
Mean                                                     0.00   0.52
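The error columns of Table S1 can be reproduced with a short calculation. The sketch below is illustrative only and is not taken from the authors' software: it copies five rows of the table, takes the signed error as implicit minus explicit (the convention implied by the tabulated values), and reports the mean signed error, mean unsigned error, and RMSE for that subset. The function name `error_summary` and the choice of rows are assumptions made for the example.

```python
import math

# (molecule, explicit, implicit) dispersion free energies in kcal/mol,
# copied from five rows of Supplementary Table S1 (illustrative subset).
rows = [
    ("acetic acid", -8.40, -7.89),
    ("methanol", -5.60, -4.89),
    ("butane", -10.70, -10.92),
    ("benzene", -11.70, -11.80),
    ("water", -2.50, -2.07),
]


def error_summary(rows):
    """Return per-molecule signed errors plus MSE, MUE, and RMSE.

    Signed error is taken as implicit - explicit, which matches the
    SE column of Table S1 (e.g. -7.89 - (-8.40) = 0.51).
    """
    signed = [implicit - explicit for _, explicit, implicit in rows]
    n = len(signed)
    return {
        "signed": signed,
        "mse": sum(signed) / n,                    # mean signed error
        "mue": sum(abs(e) for e in signed) / n,    # mean unsigned error
        "rmse": math.sqrt(sum(e * e for e in signed) / n),
    }


summary = error_summary(rows)
for (name, _, _), se in zip(rows, summary["signed"]):
    print(f"{name:12s} SE = {se:+.2f}  UE = {abs(se):.2f}")
print(f"MSE = {summary['mse']:.3f}  MUE = {summary['mue']:.3f}  "
      f"RMSE = {summary['rmse']:.3f} (kcal/mol, five-row subset)")
```

Because only five rows are used, the summary values differ from the full-table means reported in the last row of Table S1; passing all 37 rows to the same function gives the full-table statistics.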


Supplementary Table S2. Initial (van der Waals) and optimized solute diameters for APBS, ddCOSMO, and GK. All diameter values are given in Angstroms (Å). The APBS, ddCOSMO, and GK optimized diameters deviate from the original van der Waals values by an average of 9.3%, 9.9%, and 14.9%, respectively.

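| Element | SMARTS Group | vdW | APBS | ddCOSMO | GK |
|---|---|---|---|---|---|
| H | [#1]([#8]) | 2.665 | 3.247 | 2.822 | 2.031 |
| H | [#1](c(n)) | 3.000 | 2.906 | 2.602 | 2.573 |
| H | [#1]([CH3](N(C(=O)))) | 2.930 | 2.289 | 2.003 | 2.626 |
| H | [#1][#6;D3] | 2.583 | | | |
| H | [#1]([O;D2;H2]) | 2.655 | 2.574 | 2.758 | 2.905 |
| H | [#1]([OH1](C=O)) | 2.655 | | | |
| H | [#1]([N;D3]([C]([#1]))) | 2.700 | | | |
| H | [#1]([N+;D3]) | 2.700 | 3.649 | 3.404 | 2.989 |
| H | [#1][C;D4][OH] | 2.870 | | | |
| H | [#1]([N;D4]) | 2.480 | | | |
| H | [#1](C=O) | 2.920 | 3.035 | 2.686 | 3.315 |
| H | [#1]([CH2]([N])) | 2.960 | | | |
| H | [#1][#16;D2] | 2.770 | | | |
| H | [#1]([CH4]) | 2.900 | 2.996 | 2.928 | 3.356 |
| H | [#1][CH3][CH3] | 2.960 | | | |
| H | [#1]([CH3]([CH2](S))) | 2.980 | | | |
| H | [#1](c) | 2.980 | 3.002 | 2.635 | 3.376 |
| H | [#1][O;D2][#6] | 2.655 | | | |
| H | [#1]([C][S;D2]) | 2.870 | 2.653 | 3.392 | 2.904 |
| H | [#1]([CH3]([O]([CH2,CH3,c]))) | 2.890 | | | |
| H | [#1]([N](C=O)) | 2.590 | | | |
| H | [#1](N(~C(c))) | 2.590 | | | |
| H | [#1]([N;D3]) | 2.700 | 2.909 | 2.836 | 3.521 |
| H | [#1]([CH1;D4]) | 2.980 | | | |
| H | [#1]([CH3](C=O)) | 2.980 | | | |
| H | [#1]([CH2]([CH3])) | 2.980 | 3.021 | 3.885 | 4.050 |
| H | [#1]([CH2]) | 2.980 | | | |
| H | [#1][C;D4][N,c;D3] | 2.880 | 3.056 | 2.996 | 4.032 |
| H | [#1]([CH2]([CH2]([N]))) | 2.960 | 3.991 | 4.144 | 3.893 |
| H | [#1][CH3] | 2.960 | 3.143 | 3.374 | 4.144 |
| C | [cH1][n+] | 3.650 | | | |
| C | c(n) | 3.780 | 3.859 | 3.766 | 3.845 |
| C | c([cH0]) | 3.800 | | | |
| C | [CH0;D4] | 3.600 | | | |
| C | [CH1]([CH3])([CH3])([OH1]) | 3.650 | 3.580 | 3.623 | 3.600 |
| C | [CH3]([OH1]) | 3.760 | | | |
| C | [#6;D3](=[#8]) | 3.820 | 4.267 | 3.776 | 3.833 |
| C | [CH3]([CH2](S)) | 3.820 | | | |
| C | [CH4] | 3.780 | 3.411 | 3.545 | 3.842 |
| C | c | 3.800 | | | |
| C | c(c([cH0])) | 3.800 | | | |
| C | c(c(c([cH0]))) | 3.800 | | | |
| C | ccccncc | 3.800 | 3.829 | 3.893 | 4.508 |
| C | cccnccc | 3.800 | | | |
| C | ccncccc | 3.800 | | | |
| C | cnccccc | 3.800 | | | |
| C | [CH3] | 3.820 | 3.506 | 3.309 | 4.536 |
| C | [#6;D3]([*])([*])(=[*]) | 3.800 | 2.930 | 4.506 | 4.212 |
| C | [cH0] | 3.800 | 3.691 | 3.766 | 4.426 |
| C | [CH1] | 3.650 | | | |
| C | C(S) | 3.780 | 3.397 | 3.255 | 4.738 |
| C | [#6;D3](=[#8-]) | 3.820 | | | |
| C | [C](=O)([N,c]) | 3.820 | | | |
| C | [#6] | 3.820 | 3.303 | 2.854 | 4.902 |
| N | n | 3.710 | 3.602 | 4.034 | 3.461 |
| N | [N]([C]([#1])) | 3.710 | 2.884 | 2.990 | 3.218 |
| N | [NH3] | 3.710 | 3.524 | 3.297 | 3.372 |
| N | [N+] | 3.710 | 2.226 | 4.700 | 3.259 |
| N | [#7+](=[#6](-[#7](-[H])-[H])-[#7](-[H])-[H])(-[H])-[H] | 3.710 | | | |
| N | N(~C(c)) | 3.710 | 3.710 | 3.498 | 3.731 |
| N | [#7;D3] | 3.710 | 3.424 | 3.491 | 4.268 |
| N | [N](C=O) | 3.710 | | | |
| O | [#8;D1]=[#6] | 3.300 | | | |
| O | [O](=C) | 3.300 | | | |
| O | O(=C(c)) | 3.300 | 2.984 | 3.356 | 2.962 |
| O | [#8;D1]=[#15] | 3.360 | | | |
| O | [#8;D2](~[#15]) | 3.450 | | | |
| O | [#8;D1][*] | 3.400 | 2.801 | 3.128 | 2.981 |
| O | [O;D2][#6] | 3.405 | 3.168 | 3.134 | 3.100 |
| O | [O;D2;H2] | 3.405 | 3.124 | 2.906 | 2.795 |
| O | [OD1]~C~[OD1] | 3.450 | 3.039 | 3.156 | 3.206 |
| O | [OH1](C=O) | 3.405 | 3.399 | 2.819 | 2.934 |
| O | [O;D1](~S) | 3.510 | 3.711 | 3.787 | 4.355 |
| F | [#9] | 3.220 | 4.114 | 4.166 | 3.282 |
| P | [#15] | 4.450 | 4.870 | 4.353 | 4.082 |
| S | [SH2] | 4.005 | 4.066 | 3.800 | 4.136 |
| S | [#16] | 4.005 | 4.435 | 4.194 | 4.543 |
| S | [SH]2 | 4.005 | | | |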
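The caption of Table S2 summarizes how far the fitted diameters move from their van der Waals starting values. The sketch below shows one plausible way to compute such a figure, assuming "average deviation" means the mean absolute percent deviation over the fitted SMARTS groups; that definition, the function names, and the three-row subset are illustrative assumptions, so the output is not expected to match the 9.3%, 9.9%, and 14.9% reported for the full table.

```python
# Three rows copied from Table S2: (SMARTS, vdW diameter, fitted diameters), in Angstroms.
rows = [
    ("[#1]([#8])", 2.665, {"APBS": 3.247, "ddCOSMO": 2.822, "GK": 2.031}),
    ("c(n)", 3.780, {"APBS": 3.859, "ddCOSMO": 3.766, "GK": 3.845}),
    ("[#9]", 3.220, {"APBS": 4.114, "ddCOSMO": 4.166, "GK": 3.282}),
]


def percent_deviation(vdw, fitted):
    """Absolute deviation of a fitted diameter from its vdW value, in percent."""
    return abs(fitted - vdw) / vdw * 100.0


def mean_percent_deviation(table_rows, model):
    """Mean absolute percent deviation for one electrostatics model (assumed metric)."""
    values = [percent_deviation(vdw, fits[model]) for _, vdw, fits in table_rows]
    return sum(values) / len(values)


for model in ("APBS", "ddCOSMO", "GK"):
    print(f"{model}: {mean_percent_deviation(rows, model):.1f}% (subset only)")
```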


Supplementary Table S3. Solvation Free Energy Difference Components Using AMOEBA van der Waals Radii. Solvation free energy difference components (kcal/mol) based on using AMOEBA van der Waals radii for APBS, ddCOSMO and GK (i.e. prior to fitting electrostatic radii). Non-polar terms were optimized using GK and used with all three electrostatics models. A solvent pressure of 0.0334 kcal/mol/Å³, a dispersion offset of 1.056 Å, and a descreening scale factor of 0.75 were used in the calculation of all nonpolar energies.

| Molecule | ΔWcav | ΔWdisp | ΔWapolar | ΔUelec APBS | ΔUelec ddCOSMO | ΔUelec GK |
|---|---|---|---|---|---|---|
| methane | 6.49 | -3.64 | 2.86 | -0.37 | -0.38 | -1.27 |
| ethane | 9.06 | -6.42 | 2.65 | -0.29 | -0.37 | -0.53 |
| propane | 11.27 | -8.72 | 2.55 | -0.33 | -0.46 | -0.54 |
| isobutane | 13.22 | -10.56 | 2.66 | -0.59 | -0.80 | -1.83 |
| butane | 13.44 | -10.92 | 2.52 | -0.44 | -0.54 | -1.87 |
| isopentane | 15.08 | -12.40 | 2.68 | -0.51 | -0.68 | -0.79 |
| neopentane | 14.88 | -12.29 | 2.59 | -0.61 | -0.81 | -1.33 |
| pentane | 15.59 | -13.08 | 2.51 | -0.43 | -0.65 | -0.65 |
| cyclopentane | 13.84 | -12.17 | 1.67 | -0.39 | -0.82 | -0.90 |
| methylcyclopentane | 15.79 | -13.85 | 1.94 | -0.65 | -0.91 | -2.77 |
| 2,3-dimethylbutane | 16.77 | -13.92 | 2.85 | -0.67 | -0.85 | -1.05 |
| neohexane | 16.69 | -14.02 | 2.67 | -0.64 | -0.90 | -1.55 |
| hexane | 17.76 | -15.24 | 2.52 | -0.60 | -0.87 | -1.73 |
| cyclohexane | 15.56 | -13.85 | 1.71 | -0.33 | -0.62 | -0.22 |
| methylcyclohexane | 17.46 | -15.47 | 1.99 | -0.45 | -0.78 | -0.56 |
| heptane | 19.92 | -17.40 | 2.52 | -0.52 | -0.81 | -0.87 |
| octane | 22.08 | -19.55 | 2.53 | -0.57 | -0.92 | -0.99 |
| decane | 26.40 | -23.86 | 2.54 | -0.64 | -1.09 | -1.29 |
| methanol | 7.43 | -4.89 | 2.54 | -5.42 | -6.26 | -4.60 |
| ethanol | 9.95 | -7.48 | 2.47 | -5.57 | -6.75 | -4.43 |
| isopropanol | 11.95 | -9.46 | 2.49 | -5.42 | -6.36 | -5.16 |
| propanol | 12.05 | -9.71 | 2.34 | -5.89 | -7.34 | -4.83 |
| butanol | 14.22 | -11.89 | 2.33 | -5.83 | -7.25 | -4.78 |
| butan-2-ol | 14.07 | -11.58 | 2.49 | -5.15 | -5.77 | -4.19 |
| pentanol | 16.38 | -14.06 | 2.32 | -5.94 | -7.49 | -4.99 |
| pentan-2-ol | 16.22 | -13.72 | 2.49 | -5.02 | -6.07 | -4.84 |
| hexanol | 18.52 | -16.21 | 2.31 | -5.99 | -7.55 | -5.19 |
| 3-methylbutan-1-ol | 15.87 | -13.37 | 2.50 | -6.04 | -7.46 | -5.24 |
| cyclopentanol | 14.41 | -12.80 | 1.62 | -4.77 | -5.17 | -7.28 |
| ethyleneglycol | 10.51 | -8.44 | 2.07 | -9.33 | -10.94 | -8.92 |
| glucose | 19.80 | -18.29 | 1.51 | -26.76 | -34.14 | -21.88 |
| phenol | 13.91 | -12.80 | 1.10 | -6.80 | -6.79 | -5.64 |
| 2-fluorophenol | 14.23 | -13.48 | 0.75 | -6.05 | -6.87 | -5.89 |
| 4-fluorophenol | 14.29 | -13.50 | 0.80 | -7.00 | -8.13 | -5.95 |
| ammonia | 5.54 | -2.93 | 2.61 | -5.44 | -5.20 | -7.65 |
| methylamine | 8.05 | -5.66 | 2.39 | -4.46 | -4.81 | -5.15 |
| ethylamine | 10.45 | -8.04 | 2.41 | -4.43 | -5.06 | -4.47 |
| propylamine | 12.60 | -10.22 | 2.38 | -4.36 | -4.96 | -4.52 |
| dimethylamine | 10.36 | -8.10 | 2.26 | -4.17 | -4.52 | -11.05 |
| diethylamine | 15.12 | -12.58 | 2.54 | -2.55 | -2.42 | -3.41 |
| diisopropylamine | 18.53 | -15.75 | 2.78 | -3.78 | -3.23 | -2.76 |
| dibutylamine | 24.05 | -21.21 | 2.84 | -3.56 | -3.66 | -4.11 |
| trimethylamine | 12.36 | -10.26 | 2.10 | -2.85 | -2.50 | -2.51 |
| triethylamine | 18.72 | -15.92 | 2.80 | -2.81 | -3.02 | -1.94 |
| morpholine | 15.07 | -12.48 | 2.59 | -7.66 | -8.88 | -8.14 |
| benzenamine | 14.54 | -13.33 | 1.20 | -5.62 | -6.27 | -11.59 |
| acetamide | 10.33 | -8.53 | 1.80 | -11.18 | -13.11 | -11.13 |
| N-methylacetamide | 12.76 | -10.92 | 1.83 | -9.14 | -10.80 | -6.88 |
| dimethylacetamide | 14.55 | -12.91 | 1.64 | -7.08 | -9.01 | -6.80 |
| N-methylformamide | 10.40 | -8.71 | 1.69 | -9.19 | -10.54 | -7.41 |
| dimethylformamide | 12.52 | -10.96 | 1.56 | -7.09 | -8.75 | -7.19 |
| propionamide | 12.46 | -10.75 | 1.70 | -10.35 | -11.97 | -10.81 |
| benzamide | 16.28 | -15.86 | 0.42 | -10.45 | -11.96 | -10.41 |
| N,N,4-trimethylbenzamide | 22.43 | -21.79 | 0.63 | -7.11 | -8.08 | -5.81 |
| pyrrole | 11.38 | -10.24 | 1.15 | -6.87 | -6.20 | -6.60 |
| pyrrolidine | 13.50 | -11.54 | 1.97 | -3.58 | -4.31 | -5.44 |
| pyridine | 12.68 | -11.59 | 1.09 | -5.13 | -6.01 | -3.57 |
| N-methylpyrrolidine | 15.58 | -13.46 | 2.12 | -3.23 | -3.30 | -6.12 |
| N-methyl-2-pyridone | 15.46 | -14.58 | 0.88 | -8.44 | -10.18 | -8.53 |
| 3-methylindole | 18.23 | -18.21 | 0.01 | -7.26 | -6.23 | -6.42 |
| imidazole | 10.84 | -9.95 | 0.88 | -10.55 | -12.39 | -7.42 |
| 5-fluorouracil | 13.72 | -13.89 | -0.17 | -16.35 | -19.56 | -12.41 |
| benzene | 13.18 | -11.80 | 1.38 | -2.77 | -2.68 | -3.43 |
| toluene | 15.17 | -13.86 | 1.32 | -2.98 | -2.62 | -3.11 |
| p-cresol | 15.90 | -14.76 | 1.14 | -6.99 | -6.92 | -5.42 |
| ethylbenzene | 17.32 | -15.80 | 1.53 | -2.91 | -2.68 | -3.07 |
| fluorobenzene | 13.54 | -12.58 | 0.96 | -3.05 | -3.74 | -3.50 |
| methoxymethane | 9.85 | -7.57 | 2.28 | -3.53 | -3.82 | -2.47 |
| methoxyethane | 12.30 | -9.92 | 2.38 | -3.50 | -3.82 | -2.77 |
| ethoxyethane | 14.72 | -12.21 | 2.51 | -3.30 | -3.65 | -2.60 |
| 1,2-dimethoxyethane | 15.54 | -13.35 | 2.19 | -5.78 | -5.68 | -5.75 |
| 1-methoxypropane | 14.48 | -12.14 | 2.34 | -3.68 | -4.40 | -3.18 |
| tetrahydrofuran | 12.72 | -11.33 | 1.39 | -3.58 | -4.88 | -2.36 |
| 2-methyltetrahydrofuran | 14.83 | -13.09 | 1.74 | -3.94 | -5.07 | -2.50 |
| tetrahydropyran | 14.56 | -13.10 | 1.46 | -3.38 | -4.10 | -1.80 |
| 1,4-dioxane | 13.55 | -12.30 | 1.25 | -6.73 | -7.36 | -7.12 |
| methanethiol | 8.69 | -7.06 | 1.63 | -4.13 | -3.62 | -4.24 |
| ethanethiol | 11.06 | -9.39 | 1.67 | -4.13 | -3.56 | -4.21 |
| propanethiol | 13.25 | -11.55 | 1.71 | -5.10 | -4.70 | -5.48 |
| butanethiol | 15.39 | -13.74 | 1.65 | -4.26 | -3.85 | -4.57 |
| acetic acid | 9.85 | -7.89 | 1.96 | -7.26 | -7.79 | -6.02 |
| propionic acid | 12.03 | -10.14 | 1.89 | -7.11 | -7.90 | -5.81 |
| butyric acid | 14.20 | -12.33 | 1.87 | -7.30 | -8.40 | -6.21 |
| dimethylsulfide | 10.87 | -9.68 | 1.19 | -4.28 | -3.73 | -4.59 |
| hydrogen sulfide | 6.28 | -4.14 | 2.14 | -3.71 | -2.85 | -3.93 |
| methyl ethyl sulfide | 13.25 | -11.93 | 1.33 | -4.00 | -3.32 | -4.34 |
| acetaldehyde | 9.02 | -6.64 | 2.38 | -5.10 | -5.90 | -4.29 |
| formaldehyde | 6.61 | -4.33 | 2.28 | -4.65 | -5.34 | -3.58 |
| dimethylsulfoxide | 12.20 | -10.43 | 1.77 | -11.17 | -15.12 | -33.37 |
| triethylphosphate | 24.37 | -23.15 | 1.22 | -9.66 | -11.07 | -7.78 |
| water | 4.80 | -2.07 | 2.73 | -6.52 | -6.76 | -5.79 |
| acetate (-) | 9.82 | -7.78 | 2.04 | -78.02 | -79.72 | -73.80 |
| propanoate (-) | 11.97 | -10.03 | 1.94 | -75.98 | -80.05 | -70.39 |
| methyl hydrogen phosphate (-) | 13.05 | -12.98 | 0.07 | -70.89 | -74.97 | -69.06 |
| dimethylphosphate (-) | 15.33 | -15.03 | 0.30 | -69.54 | -73.95 | -72.89 |
| methylammonium (+) | 8.15 | -5.67 | 2.48 | -74.82 | -76.83 | -72.19 |
| dimethylammonium (+) | 10.64 | -8.25 | 2.39 | -66.53 | -69.05 | -66.98 |
| trimethylammonium (+) | 12.76 | -10.51 | 2.26 | -58.40 | -60.50 | -59.36 |
| butylammonium (+) | 14.77 | -12.36 | 2.40 | -71.28 | -75.76 | -67.68 |
| guanidinium (+) | 10.42 | -8.56 | 1.86 | -62.76 | -62.56 | -57.12 |
| methylguanidinium (+) | 12.67 | -10.79 | 1.89 | -59.09 | -58.58 | -53.90 |
| imidazolium (+) | 10.28 | -9.76 | 0.52 | -62.98 | -63.09 | -57.28 |
| 4-methylimidazolium (+) | 12.83 | -11.95 | 0.89 | -59.78 | -60.57 | -55.98 |


Supplementary Table S4. Solvation Free Energy Differences Using AMOEBA van der Waals Radii. Experimental and calculated solvation free energy differences (kcal/mol) using AMOEBA van der Waals radii for electrostatic energies. The APBS, ddCOSMO and GK values are the sum of cavitation, dispersion and electrostatic contributions given in Table S3. Experimental solvation free energy differences (ΔGexp) are from the FreeSolv database (version 0.51) unless otherwise noted.

| Molecule | ΔGexp | APBS | ddCOSMO | GK |
|---|---|---|---|---|
| methane | 2.00 | 2.48 | 2.48 | 1.58 |
| ethane | 1.83 | 2.35 | 2.27 | 2.12 |
| propane | 2.00 | 2.21 | 2.09 | 2.01 |
| isobutane | 2.30 | 2.07 | 1.86 | 0.83 |
| butane | 2.10 | 2.09 | 1.98 | 0.65 |
| isopentane | 2.38 | 2.17 | 2.00 | 1.89 |
| neopentane | 2.51 | 1.99 | 1.78 | 1.26 |
| pentane | 2.30 | 2.08 | 1.86 | 1.86 |
| cyclopentane | 1.20 | 1.28 | 0.85 | 0.77 |
| methylcyclopentane | 1.59 | 1.29 | 1.03 | -0.83 |
| 2,3-dimethylbutane | 2.34 | 2.18 | 2.00 | 1.80 |
| neohexane | 2.51 | 2.03 | 1.77 | 1.12 |
| hexane | 2.48 | 1.92 | 1.65 | 0.79 |
| cyclohexane | 1.23 | 1.39 | 1.10 | 1.49 |
| methylcyclohexane | 1.70 | 1.54 | 1.21 | 1.44 |
| heptane | 2.67 | 2.00 | 1.72 | 1.66 |
| octane | 2.88 | 1.96 | 1.61 | 1.54 |
| decane | 3.16 | 1.89 | 1.45 | 1.24 |
| methanol | -5.10 | -2.88 | -3.71 | -2.05 |
| ethanol | -5.00 | -3.10 | -4.28 | -1.96 |
| isopropanol | -4.74 | -2.93 | -3.87 | -2.66 |
| propanol | -4.85 | -3.56 | -5.00 | -2.50 |
| butanol | -4.72 | -3.51 | -4.92 | -2.45 |
| butan-2-ol | -4.62 | -2.66 | -3.28 | -1.70 |
| pentanol | -4.57 | -3.62 | -5.17 | -2.67 |
| pentan-2-ol | -4.39 | -2.52 | -3.57 | -2.34 |
| hexanol | -4.40 | -3.68 | -5.23 | -2.87 |
| 3-methylbutan-1-ol | -4.42 | -3.53 | -4.96 | -2.74 |
| cyclopentanol | -5.49 | -3.15 | -3.56 | -5.67 |
| ethyleneglycol | -9.30 | -7.26 | -8.87 | -6.85 |
| glucose | -25.47 | -25.25 | -32.63 | -20.37 |
| phenol | -6.60 | -5.70 | -5.69 | -4.54 |
| 2-fluorophenol | -5.29 | -5.30 | -6.12 | -5.14 |
| 4-fluorophenol | -6.19 | -6.20 | -7.33 | -5.15 |
| ammonia | -4.29 | -2.83 | -2.59 | -5.04 |
| methylamine | -4.55 | -2.07 | -2.42 | -2.75 |
| ethylamine | -4.50 | -2.03 | -2.65 | -2.06 |
| propylamine | -4.39 | -1.98 | -2.58 | -2.14 |
| dimethylamine | -4.29 | -1.91 | -2.26 | -8.79 |
| diethylamine | -4.07 | -0.01 | 0.12 | -0.87 |
| diisopropylamine | -3.22 | -1.01 | -0.46 | 0.02 |
| dibutylamine | -3.24 | -0.72 | -0.82 | -1.27 |
| trimethylamine | -3.20 | -0.75 | -0.40 | -0.41 |
| triethylamine | -3.22 | -0.01 | -0.22 | 0.87 |
| morpholine | -7.17 | -5.07 | -6.29 | -5.55 |
| benzenamine | -5.49 | -4.41 | -5.07 | -10.38 |
| acetamide | -9.71 | -9.38 | -11.32 | -9.34 |
| N-methylacetamide | -10.00 | -7.31 | -8.97 | -5.05 |
| dimethylacetamide | -8.50 (a) | -5.44 | -7.37 | -5.16 |
| N-methylformamide | -10.00 (a) | -7.49 | -8.84 | -5.72 |
| dimethylformamide | -7.81 | -5.53 | -7.19 | -5.63 |
| propionamide | -9.40 | -8.65 | -10.27 | -9.10 |
| benzamide | -11.00 | -10.03 | -11.55 | -9.99 |
| N,N,4-trimethylbenzamide | -9.76 | -6.48 | -7.45 | -5.18 |
| pyrrole | -4.78 | -5.73 | -5.06 | -5.45 |
| pyrrolidine | -5.48 | -1.61 | -2.34 | -3.47 |
| pyridine | -4.69 | -4.04 | -4.92 | -2.48 |
| N-methylpyrrolidine | -3.98 (a) | -1.11 | -1.19 | -4.00 |
| N-methyl-2-pyridone | -10.00 (a) | -7.56 | -9.30 | -7.65 |
| 3-methylindole | -5.88 | -7.25 | -6.21 | -6.41 |
| imidazole | -9.63 | -9.66 | -11.50 | -6.54 |
| 5-fluorouracil | -16.92 | -16.52 | -19.73 | -12.58 |
| benzene | -0.90 | -1.38 | -1.30 | -2.05 |
| toluene | -0.90 | -1.66 | -1.30 | -1.79 |
| p-cresol | -6.13 | -5.85 | -5.78 | -4.28 |
| ethylbenzene | -0.79 | -1.38 | -1.15 | -1.55 |
| fluorobenzene | -0.80 | -2.10 | -2.78 | -2.54 |
| methoxymethane | -1.91 | -1.25 | -1.54 | -0.19 |
| methoxyethane | -2.10 | -1.13 | -1.45 | -0.39 |
| ethoxyethane | -1.59 | -0.79 | -1.14 | -0.09 |
| 1,2-dimethoxyethane | -4.84 | -3.59 | -3.49 | -3.56 |
| 1-methoxypropane | -1.66 | -1.34 | -2.05 | -0.84 |
| tetrahydrofuran | -3.47 | -2.20 | -3.50 | -0.98 |
| 2-methyltetrahydrofuran | -3.30 | -2.20 | -3.33 | -0.76 |
| tetrahydropyran | -3.12 | -1.93 | -2.64 | -0.35 |
| 1,4-dioxane | -5.06 | -5.48 | -6.12 | -5.88 |
| methanethiol | -1.20 | -2.50 | -1.99 | -2.61 |
| ethanethiol | -1.14 | -2.46 | -1.88 | -2.53 |
| propanethiol | -1.10 | -3.40 | -2.99 | -3.78 |
| butanethiol | -0.99 (b) | -2.61 | -2.21 | -2.93 |
| acetic acid | -6.69 | -5.30 | -5.84 | -4.06 |
| propionic acid | -6.46 | -5.22 | -6.01 | -3.92 |
| butyric acid | -6.35 | -5.43 | -6.52 | -4.33 |
| dimethylsulfide | -1.61 (b) | -3.09 | -2.54 | -3.40 |
| hydrogen sulfide | -0.70 | -1.57 | -0.71 | -1.79 |
| methyl ethyl sulfide | -1.50 (b) | -2.67 | -1.99 | -3.01 |
| acetaldehyde | -3.50 | -2.73 | -3.52 | -1.92 |
| formaldehyde | -2.75 | -2.37 | -3.05 | -1.30 |
| dimethylsulfoxide | -9.28 | -9.40 | -13.35 | -31.60 |
| triethylphosphate | -7.50 | -8.43 | -9.85 | -6.55 |
| water | -6.30 | -3.78 | -4.02 | -3.06 |
| acetate (-) | -87.76 | -75.98 | -77.69 | -71.76 |
| propanoate (-) | -85.94 | -74.03 | -78.11 | -68.45 |
| methyl hydrogen phosphate (-) | -85.63 (c) | -70.31 | -74.39 | -68.48 |
| dimethylphosphate (-) | -85.28 (c) | -69.15 | -73.55 | -72.50 |
| methylammonium (+) | -66.60 | -72.34 | -74.35 | -69.71 |
| dimethylammonium (+) | -58.87 | -64.15 | -66.67 | -64.59 |
| trimethylammonium (+) | -51.28 | -56.14 | -58.25 | -57.10 |
| butylammonium (+) | -61.15 | -68.87 | -73.35 | -65.28 |
| guanidinium (+) | -52.31 (c) | -60.90 | -60.70 | -55.26 |
| methylguanidinium (+) | -47.18 (c) | -57.21 | -56.69 | -52.01 |
| imidazolium (+) | -56.02 | -62.46 | -62.57 | -56.76 |
| 4-methylimidazolium (+) | -52.57 | -58.89 | -59.68 | -55.10 |

(a) Gallicchio, E.; Zhang, L. Y.; Levy, R. M., The SGB/NP hydration free energy model based on the surface generalized Born solvent reaction field and novel nonpolar hydration free energy estimators. J. Comput. Chem. 2002, 23 (5), 517-529.
(b) Rizzo, Robert C., et al. Estimation of Absolute Free Energies of Hydration Using Continuum Methods: Accuracy of Partial Charge Models and Optimization of Nonpolar Contributions. Journal of Chemical Theory and Computation 2006, 2 (1), 128-39.
(c) Determined using BAR with AMOEBA explicit solvent, see Supplementary Table S9.
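Each calculated entry in Table S4 is the sum of the cavitation, dispersion, and electrostatic components listed in Table S3. The following sketch checks that relationship for two molecules copied from the tables; the dictionary layout, the function name, and the 0.02 kcal/mol tolerance (chosen to absorb rounding of the tabulated components) are assumptions made for illustration.

```python
# Components from Table S3 (kcal/mol): cavitation, dispersion, and per-model electrostatics.
components = {
    "methanol": {"cav": 7.43, "disp": -4.89,
                 "elec": {"APBS": -5.42, "ddCOSMO": -6.26, "GK": -4.60}},
    "acetate (-)": {"cav": 9.82, "disp": -7.78,
                    "elec": {"APBS": -78.02, "ddCOSMO": -79.72, "GK": -73.80}},
}

# Totals reported in Table S4 (kcal/mol).
reported = {
    "methanol": {"APBS": -2.88, "ddCOSMO": -3.71, "GK": -2.05},
    "acetate (-)": {"APBS": -75.98, "ddCOSMO": -77.69, "GK": -71.76},
}

TOLERANCE = 0.02  # assumed allowance for rounding of the two-decimal components


def solvation_total(cav, disp, elec):
    """Total solvation free energy difference as the sum of its three components."""
    return cav + disp + elec


max_gap = 0.0
for name, parts in components.items():
    for model, elec in parts["elec"].items():
        total = solvation_total(parts["cav"], parts["disp"], elec)
        gap = abs(total - reported[name][model])
        max_gap = max(max_gap, gap)
        print(f"{name:12s} {model:8s} sum={total:8.2f} table={reported[name][model]:8.2f}")

print("all within tolerance:", max_gap < TOLERANCE)
```

Small differences of about 0.01 kcal/mol between the recomputed sums and the tabulated totals arise because the published components are already rounded to two decimals.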


Supplementary Table S5. Signed and Unsigned Solvation Free Energy Difference Errors Using AMOEBA van der Waals Radii. Signed and unsigned errors (kcal/mol), relative to experiment, based on solvation free energy differences given in Table S4.

| Molecule | SE APBS | SE ddCOSMO | SE GK | UE APBS | UE ddCOSMO | UE GK |
|---|---|---|---|---|---|---|
| methane | 0.48 | 0.48 | -0.42 | 0.48 | 0.48 | 0.42 |
| ethane | 0.52 | 0.44 | 0.29 | 0.52 | 0.44 | 0.29 |
| propane | 0.21 | 0.09 | 0.01 | 0.21 | 0.09 | 0.01 |
| isobutane | -0.23 | -0.44 | -1.47 | 0.23 | 0.44 | 1.47 |
| butane | -0.01 | -0.12 | -1.45 | 0.01 | 0.12 | 1.45 |
| isopentane | -0.21 | -0.38 | -0.49 | 0.21 | 0.38 | 0.49 |
| neopentane | -0.52 | -0.73 | -1.25 | 0.52 | 0.73 | 1.25 |
| pentane | -0.22 | -0.44 | -0.44 | 0.22 | 0.44 | 0.44 |
| cyclopentane | 0.08 | -0.35 | -0.43 | 0.08 | 0.35 | 0.43 |
| methylcyclopentane | -0.30 | -0.56 | -2.42 | 0.30 | 0.56 | 2.42 |
| 2,3-dimethylbutane | -0.16 | -0.34 | -0.54 | 0.16 | 0.34 | 0.54 |
| neohexane | -0.48 | -0.74 | -1.39 | 0.48 | 0.74 | 1.39 |
| hexane | -0.56 | -0.83 | -1.69 | 0.56 | 0.83 | 1.69 |
| cyclohexane | 0.16 | -0.13 | 0.26 | 0.16 | 0.13 | 0.26 |
| methylcyclohexane | -0.16 | -0.49 | -0.26 | 0.16 | 0.49 | 0.26 |
| heptane | -0.67 | -0.95 | -1.01 | 0.67 | 0.95 | 1.01 |
| octane | -0.92 | -1.27 | -1.34 | 0.92 | 1.27 | 1.34 |
| decane | -1.27 | -1.71 | -1.92 | 1.27 | 1.71 | 1.92 |
| methanol | 2.22 | 1.39 | 3.05 | 2.22 | 1.39 | 3.05 |
| ethanol | 1.90 | 0.72 | 3.04 | 1.90 | 0.72 | 3.04 |
| isopropanol | 1.81 | 0.87 | 2.08 | 1.81 | 0.87 | 2.08 |
| propanol | 1.29 | -0.15 | 2.35 | 1.29 | 0.15 | 2.35 |
| butanol | 1.21 | -0.20 | 2.27 | 1.21 | 0.20 | 2.27 |
| butan-2-ol | 1.96 | 1.34 | 2.92 | 1.96 | 1.34 | 2.92 |
| pentanol | 0.95 | -0.60 | 1.90 | 0.95 | 0.60 | 1.90 |
| pentan-2-ol | 1.87 | 0.82 | 2.05 | 1.87 | 0.82 | 2.05 |
| hexanol | 0.72 | -0.83 | 1.53 | 0.72 | 0.83 | 1.53 |
| 3-methylbutan-1-ol | 0.89 | -0.54 | 1.68 | 0.89 | 0.54 | 1.68 |
| cyclopentanol | 2.34 | 1.93 | -0.18 | 2.34 | 1.93 | 0.18 |
| ethyleneglycol | 2.04 | 0.43 | 2.45 | 2.04 | 0.43 | 2.45 |
| glucose | 0.22 | -7.16 | 5.10 | 0.22 | 7.16 | 5.10 |
| phenol | 0.90 | 0.91 | 2.06 | 0.90 | 0.91 | 2.06 |
| 2-fluorophenol | -0.01 | -0.83 | 0.15 | 0.01 | 0.83 | 0.15 |
| 4-fluorophenol | -0.01 | -1.14 | 1.04 | 0.01 | 1.14 | 1.04 |
| ammonia | 1.46 | 1.70 | -0.75 | 1.46 | 1.70 | 0.75 |
| methylamine | 2.48 | 2.13 | 1.80 | 2.48 | 2.13 | 1.80 |
| ethylamine | 2.47 | 1.85 | 2.44 | 2.47 | 1.85 | 2.44 |
| propylamine | 2.41 | 1.81 | 2.25 | 2.41 | 1.81 | 2.25 |
| dimethylamine | 2.38 | 2.03 | -4.50 | 2.38 | 2.03 | 4.50 |
| diethylamine | 4.06 | 4.19 | 3.20 | 4.06 | 4.19 | 3.20 |
| diisopropylamine | 2.21 | 2.76 | 3.24 | 2.21 | 2.76 | 3.24 |
| dibutylamine | 2.52 | 2.42 | 1.97 | 2.52 | 2.42 | 1.97 |
| trimethylamine | 2.45 | 2.80 | 2.79 | 2.45 | 2.80 | 2.79 |
| triethylamine | 3.21 | 3.00 | 4.09 | 3.21 | 3.00 | 4.09 |
| morpholine | 2.10 | 0.88 | 1.62 | 2.10 | 0.88 | 1.62 |
| benzenamine | 1.08 | 0.42 | -4.89 | 1.08 | 0.42 | 4.89 |
| acetamide | 0.33 | -1.61 | 0.37 | 0.33 | 1.61 | 0.37 |
| N-methylacetamide | 2.69 | 1.03 | 4.95 | 2.69 | 1.03 | 4.95 |
| dimethylacetamide | 3.06 | 1.13 | 3.34 | 3.06 | 1.13 | 3.34 |
| N-methylformamide | 2.51 | 1.16 | 4.28 | 2.51 | 1.16 | 4.28 |
| dimethylformamide | 2.28 | 0.62 | 2.18 | 2.28 | 0.62 | 2.18 |
| propionamide | 0.75 | -0.87 | 0.30 | 0.75 | 0.87 | 0.30 |
| benzamide | 0.97 | -0.55 | 1.01 | 0.97 | 0.55 | 1.01 |
| N,N,4-trimethylbenzamide | 3.28 | 2.31 | 4.58 | 3.28 | 2.31 | 4.58 |
| pyrrole | -0.95 | -0.28 | -0.67 | 0.95 | 0.28 | 0.67 |
| pyrrolidine | 3.87 | 3.14 | 2.01 | 3.87 | 3.14 | 2.01 |
| pyridine | 0.65 | -0.23 | 2.21 | 0.65 | 0.23 | 2.21 |
| N-methylpyrrolidine | 2.87 | 2.79 | -0.02 | 2.87 | 2.79 | 0.02 |
| N-methyl-2-pyridone | 2.44 | 0.70 | 2.35 | 2.44 | 0.70 | 2.35 |
| 3-methylindole | -1.37 | -0.33 | -0.53 | 1.37 | 0.33 | 0.53 |
| imidazole | -0.03 | -1.87 | 3.09 | 0.03 | 1.87 | 3.09 |
| 5-fluorouracil | 0.40 | -2.81 | 4.34 | 0.40 | 2.81 | 4.34 |
| benzene | -0.48 | -0.40 | -1.15 | 0.48 | 0.40 | 1.15 |
| toluene | -0.76 | -0.40 | -0.89 | 0.76 | 0.40 | 0.89 |
| p-cresol | 0.28 | 0.35 | 1.85 | 0.28 | 0.35 | 1.85 |
| ethylbenzene | -0.59 | -0.36 | -0.76 | 0.59 | 0.36 | 0.76 |
| fluorobenzene | -1.30 | -1.98 | -1.74 | 1.30 | 1.98 | 1.74 |
| methoxymethane | 0.66 | 0.37 | 1.72 | 0.66 | 0.37 | 1.72 |
| methoxyethane | 0.97 | 0.65 | 1.71 | 0.97 | 0.65 | 1.71 |
| ethoxyethane | 0.80 | 0.45 | 1.50 | 0.80 | 0.45 | 1.50 |
| 1,2-dimethoxyethane | 1.25 | 1.35 | 1.28 | 1.25 | 1.35 | 1.28 |
| 1-methoxypropane | 0.32 | -0.39 | 0.82 | 0.32 | 0.39 | 0.82 |
| tetrahydrofuran | 1.27 | -0.03 | 2.49 | 1.27 | 0.03 | 2.49 |
| 2-methyltetrahydrofuran | 1.10 | -0.03 | 2.54 | 1.10 | 0.03 | 2.54 |
| tetrahydropyran | 1.19 | 0.48 | 2.77 | 1.19 | 0.48 | 2.77 |
| 1,4-dioxane | -0.42 | -1.06 | -0.82 | 0.42 | 1.06 | 0.82 |
| methanethiol | -1.30 | -0.79 | -1.41 | 1.30 | 0.79 | 1.41 |
| ethanethiol | -1.32 | -0.74 | -1.39 | 1.32 | 0.74 | 1.39 |
| propanethiol | -2.30 | -1.89 | -2.68 | 2.30 | 1.89 | 2.68 |
| butanethiol | -1.62 | -1.22 | -1.94 | 1.62 | 1.22 | 1.94 |
| acetic acid | 1.39 | 0.85 | 2.63 | 1.39 | 0.85 | 2.63 |
| propionic acid | 1.24 | 0.45 | 2.54 | 1.24 | 0.45 | 2.54 |
| butyric acid | 0.92 | -0.17 | 2.02 | 0.92 | 0.17 | 2.02 |
| dimethylsulfide | -1.48 | -0.93 | -1.79 | 1.48 | 0.93 | 1.79 |
| hydrogen sulfide | -0.87 | -0.01 | -1.09 | 0.87 | 0.01 | 1.09 |
| methyl ethyl sulfide | -1.17 | -0.49 | -1.51 | 1.17 | 0.49 | 1.51 |
| acetaldehyde | 0.77 | -0.02 | 1.58 | 0.77 | 0.02 | 1.58 |
| formaldehyde | 0.38 | -0.30 | 1.45 | 0.38 | 0.30 | 1.45 |
| dimethylsulfoxide | -0.12 | -4.07 | -22.32 | 0.12 | 4.07 | 22.32 |
| triethylphosphate | -0.93 | -2.35 | 0.95 | 0.93 | 2.35 | 0.95 |
| water | 2.52 | 2.28 | 3.24 | 2.52 | 2.28 | 3.24 |
| acetate (-) | 11.78 | 10.07 | 16.00 | 11.78 | 10.07 | 16.00 |
| propanoate (-) | 11.91 | 7.83 | 17.49 | 11.91 | 7.83 | 17.49 |
| methyl hydrogen phosphate (-) | 15.32 | 11.24 | 17.15 | 15.32 | 11.24 | 17.15 |
| dimethylphosphate (-) | 16.13 | 11.73 | 12.78 | 16.13 | 11.73 | 12.78 |
| methylammonium (+) | -5.74 | -7.75 | -3.11 | 5.74 | 7.75 | 3.11 |
| dimethylammonium (+) | -5.28 | -7.80 | -5.72 | 5.28 | 7.80 | 5.72 |
| trimethylammonium (+) | -4.86 | -6.97 | -5.82 | 4.86 | 6.97 | 5.82 |
| butylammonium (+) | -7.72 | -12.20 | -4.13 | 7.72 | 12.20 | 4.13 |
| guanidinium (+) | -8.59 | -8.39 | -2.95 | 8.59 | 8.39 | 2.95 |
| methylguanidinium (+) | -10.03 | -9.51 | -4.83 | 10.03 | 9.51 | 4.83 |
| imidazolium (+) | -6.44 | -6.55 | -0.74 | 6.44 | 6.55 | 0.74 |
| 4-methylimidazolium (+) | -6.32 | -7.11 | -2.53 | 6.32 | 7.11 | 2.53 |
| All | 0.67 | -0.17 | 0.87 | 2.18 | 2.04 | 2.76 |
| Neutrals | 0.76 | 0.09 | 0.62 | 1.26 | 1.13 | 2.10 |


Supplementary Table S6. Solvation Free Energy Difference Components Using Fit Electrostatic Radii. Solvation free energy difference components (kcal/mol) using optimized electrostatic radii for APBS, ddCOSMO and GK. Non-polar terms are identical to those given in Table S3.

| Molecule | ΔWcav | ΔWdisp | ΔWapolar | ΔUelec APBS | ΔUelec ddCOSMO | ΔUelec GK |
|---|---|---|---|---|---|---|
| methane | 6.49 | -3.64 | 2.86 | -0.43 | -0.46 | -0.97 |
| ethane | 9.06 | -6.42 | 2.65 | -0.34 | -0.55 | -0.10 |
| propane | 11.27 | -8.72 | 2.55 | -0.31 | -0.27 | -0.16 |
| isobutane | 13.22 | -10.56 | 2.66 | -0.60 | -0.59 | -0.73 |
| butane | 13.44 | -10.92 | 2.52 | -0.43 | -0.32 | -0.72 |
| isopentane | 15.08 | -12.40 | 2.68 | -0.48 | -0.40 | -0.32 |
| neopentane | 14.88 | -12.29 | 2.59 | -0.56 | -0.55 | -0.75 |
| pentane | 15.59 | -13.08 | 2.51 | -0.41 | -0.34 | -0.14 |
| cyclopentane | 13.84 | -12.17 | 1.67 | -0.46 | -0.27 | -0.23 |
| methylcyclopentane | 15.79 | -13.85 | 1.94 | -0.74 | -0.42 | -1.14 |
| 2,3-dimethylbutane | 16.77 | -13.92 | 2.85 | -0.62 | -0.63 | -0.41 |
| neohexane | 16.69 | -14.02 | 2.67 | -0.57 | -0.53 | -0.89 |
| hexane | 17.76 | -15.24 | 2.52 | -0.63 | -0.40 | -0.53 |
| cyclohexane | 15.56 | -13.85 | 1.71 | -0.37 | -0.82 | 0.04 |
| methylcyclohexane | 17.46 | -15.47 | 1.99 | -0.48 | -0.63 | -0.15 |
| heptane | 19.92 | -17.40 | 2.52 | -0.50 | -0.38 | -0.18 |
| octane | 22.08 | -19.55 | 2.53 | -0.55 | -0.42 | -0.19 |
| decane | 26.40 | -23.86 | 2.54 | -0.64 | -0.46 | -0.26 |
| methanol | 7.43 | -4.89 | 2.54 | -6.09 | -6.06 | -5.94 |
| ethanol | 9.95 | -7.48 | 2.47 | -6.56 | -6.36 | -6.04 |
| isopropanol | 11.95 | -9.46 | 2.49 | -6.26 | -6.16 | -6.41 |
| propanol | 12.05 | -9.71 | 2.34 | -6.66 | -6.87 | -6.57 |
| butanol | 14.22 | -11.89 | 2.33 | -6.51 | -6.73 | -6.56 |
| butan-2-ol | 14.07 | -11.58 | 2.49 | -5.88 | -5.30 | -5.18 |
| pentanol | 16.38 | -14.06 | 2.32 | -6.64 | -6.89 | -6.66 |
| pentan-2-ol | 16.22 | -13.72 | 2.49 | -5.80 | -5.37 | -6.10 |
| hexanol | 18.52 | -16.21 | 2.31 | -6.67 | -6.89 | -6.72 |
| 3-methylbutan-1-ol | 15.87 | -13.37 | 2.50 | -6.80 | -7.01 | -6.92 |
| cyclopentanol | 14.41 | -12.80 | 1.62 | -5.52 | -4.37 | -6.29 |
| ethyleneglycol | 10.51 | -8.44 | 2.07 | -10.33 | -10.20 | -10.99 |
| glucose | 19.80 | -18.29 | 1.51 | -29.21 | -29.52 | -28.59 |
| phenol | 13.91 | -12.80 | 1.10 | -7.33 | -6.58 | -6.59 |
| 2-fluorophenol | 14.23 | -13.48 | 0.75 | -5.91 | -5.66 | -6.51 |
| 4-fluorophenol | 14.29 | -13.50 | 0.80 | -7.05 | -6.99 | -6.28 |
| ammonia | 5.54 | -2.93 | 2.61 | -5.83 | -6.92 | -6.88 |
| methylamine | 8.05 | -5.66 | 2.39 | -6.40 | -6.93 | -7.10 |
| ethylamine | 10.45 | -8.04 | 2.41 | -6.55 | -7.37 | -6.69 |
| propylamine | 12.60 | -10.22 | 2.38 | -6.28 | -7.09 | -6.65 |
| dimethylamine | 10.36 | -8.10 | 2.26 | -4.98 | -6.29 | -7.74 |
| diethylamine | 15.12 | -12.58 | 2.54 | -3.16 | -3.89 | -7.24 |
| diisopropylamine | 18.53 | -15.75 | 2.78 | -4.29 | -4.22 | -6.41 |
| dibutylamine | 24.05 | -21.21 | 2.84 | -4.01 | -5.27 | -7.26 |
| trimethylamine | 12.36 | -10.26 | 2.10 | -5.44 | -5.19 | -3.89 |
| triethylamine | 18.72 | -15.92 | 2.80 | -7.37 | -6.89 | -4.73 |
| morpholine | 15.07 | -12.48 | 2.59 | -9.63 | -12.94 | -11.50 |
| benzenamine | 14.54 | -13.33 | 1.20 | -5.55 | -6.77 | -6.81 |
| acetamide | 10.33 | -8.53 | 1.80 | -12.57 | -12.01 | -12.25 |
| N-methylacetamide | 12.76 | -10.92 | 1.83 | -11.33 | -10.92 | -10.88 |
| dimethylacetamide | 14.55 | -12.91 | 1.64 | -9.91 | -10.49 | -11.06 |
| N-methylformamide | 10.40 | -8.71 | 1.69 | -11.06 | -10.91 | -10.65 |
| dimethylformamide | 12.52 | -10.96 | 1.56 | -9.69 | -10.34 | -10.31 |
| propionamide | 12.46 | -10.75 | 1.70 | -11.58 | -10.37 | -11.59 |
| benzamide | 16.28 | -15.86 | 0.42 | -10.78 | -11.30 | -11.46 |
| N,N,4-trimethylbenzamide | 22.43 | -21.79 | 0.63 | -9.46 | -10.04 | -10.19 |
| pyrrole | 11.38 | -10.24 | 1.15 | -6.16 | -5.93 | -6.91 |
| pyrrolidine | 13.50 | -11.54 | 1.97 | -4.27 | -6.57 | -6.20 |
| pyridine | 12.68 | -11.59 | 1.09 | -5.46 | -5.63 | -4.99 |
| N-methylpyrrolidine | 15.58 | -13.46 | 2.12 | -4.55 | -6.50 | -5.39 |
| N-methyl-2-pyridone | 15.46 | -14.58 | 0.88 | -10.30 | -10.09 | -11.98 |
| 3-methylindole | 18.23 | -18.21 | 0.01 | -6.42 | -6.02 | -5.74 |
| imidazole | 10.84 | -9.95 | 0.88 | -10.31 | -10.68 | -10.25 |
| 5-fluorouracil | 13.72 | -13.89 | -0.17 | -17.36 | -17.35 | -15.64 |
| benzene | 13.18 | -11.80 | 1.38 | -2.68 | -3.09 | -2.93 |
| toluene | 15.17 | -13.86 | 1.32 | -2.72 | -2.91 | -2.44 |
| p-cresol | 15.90 | -14.76 | 1.14 | -7.36 | -6.51 | -6.13 |
| ethylbenzene | 17.32 | -15.80 | 1.53 | -2.74 | -2.94 | -2.40 |
| fluorobenzene | 13.54 | -12.58 | 0.96 | -2.57 | -3.46 | -2.95 |
| methoxymethane | 9.85 | -7.57 | 2.28 | -4.10 | -4.48 | -4.21 |
| methoxyethane | 12.30 | -9.92 | 2.38 | -4.23 | -4.46 | -4.48 |
| ethoxyethane | 14.72 | -12.21 | 2.51 | -4.08 | -4.18 | -4.06 |
| 1,2-dimethoxyethane | 15.54 | -13.35 | 2.19 | -7.10 | -6.61 | -8.04 |
| 1-methoxypropane | 14.48 | -12.14 | 2.34 | -4.42 | -4.60 | -4.79 |
| tetrahydrofuran | 12.72 | -11.33 | 1.39 | -4.41 | -5.32 | -4.48 |
| 2-methyltetrahydrofuran | 14.83 | -13.09 | 1.74 | -4.91 | -5.80 | -4.83 |
| tetrahydropyran | 14.56 | -13.10 | 1.46 | -4.22 | -4.15 | -3.28 |
| 1,4-dioxane | 13.55 | -12.30 | 1.25 | -8.56 | -6.93 | -7.91 |
| methanethiol | 8.69 | -7.06 | 1.63 | -3.04 | -2.82 | -2.69 |
| ethanethiol | 11.06 | -9.39 | 1.67 | -2.94 | -2.88 | -2.69 |
| propanethiol | 13.25 | -11.55 | 1.71 | -4.02 | -3.51 | -3.42 |
| butanethiol | 15.39 | -13.74 | 1.65 | -3.06 | -2.94 | -2.66 |
| acetic acid | 9.85 | -7.89 | 1.96 | -8.52 | -8.71 | -8.32 |
| propionic acid | 12.03 | -10.14 | 1.89 | -8.36 | -8.21 | -8.23 |
| butyric acid | 14.20 | -12.33 | 1.87 | -8.57 | -8.34 | -8.67 |
| dimethylsulfide | 10.87 | -9.68 | 1.19 | -3.30 | -2.75 | -3.18 |
| hydrogen sulfide | 6.28 | -4.14 | 2.14 | -3.20 | -2.91 | -2.94 |
| methyl ethyl sulfide | 13.25 | -11.93 | 1.33 | -2.97 | -2.52 | -3.22 |
| acetaldehyde | 9.02 | -6.64 | 2.38 | -6.56 | -6.22 | -6.02 |
| formaldehyde | 6.61 | -4.33 | 2.28 | -5.12 | -5.47 | -4.96 |
| dimethylsulfoxide | 12.20 | -10.43 | 1.77 | -9.38 | -11.16 | -11.92 |
| triethylphosphate | 24.37 | -23.15 | 1.22 | -11.52 | -10.24 | -8.82 |
| water | 4.80 | -2.07 | 2.73 | -8.10 | -9.07 | -9.00 |
| acetate (-) | 9.82 | -7.78 | 2.04 | -90.48 | -88.72 | -90.40 |
| propanoate (-) | 11.97 | -10.03 | 1.94 | -87.61 | -89.04 | -87.38 |
| methyl hydrogen phosphate (-) | 13.05 | -12.98 | 0.07 | -86.49 | -86.36 | -86.18 |
| dimethylphosphate (-) | 15.33 | -15.03 | 0.30 | -85.42 | -85.66 | -85.99 |
| methylammonium (+) | 8.15 | -5.67 | 2.48 | -68.79 | -66.41 | -67.33 |
| dimethylammonium (+) | 10.64 | -8.25 | 2.39 | -61.75 | -60.61 | -60.93 |
| trimethylammonium (+) | 12.76 | -10.51 | 2.26 | -55.27 | -54.34 | -54.87 |
| butylammonium (+) | 14.77 | -12.36 | 2.40 | -64.40 | -65.22 | -64.08 |
| guanidinium (+) | 10.42 | -8.56 | 1.86 | -53.35 | -53.37 | -53.65 |
| methylguanidinium (+) | 12.67 | -10.79 | 1.89 | -52.25 | -51.15 | -49.41 |
| imidazolium (+) | 10.28 | -9.76 | 0.52 | -57.13 | -56.30 | -56.66 |
| 4-methylimidazolium (+) | 12.83 | -11.95 | 0.89 | -54.31 | -53.83 | -53.75 |


Supplementary Table S7. Solvation Free Energy Differences Using Fit Electrostatic Radii. Experimental and calculated solvation free energy differences (kcal/mol) using fit electrostatic radii for APBS, ddCOSMO, and GK. Experimental solvation free energy differences are from the FreeSolv database (version 0.51) unless otherwise noted and are identical to those given in Table S4.

| Molecule | ΔGexp | APBS | ddCOSMO | GK |
|---|---|---|---|---|
| methane | 2.00 | 2.43 | 2.39 | 1.89 |
| ethane | 1.83 | 2.31 | 2.10 | 2.54 |
| propane | 2.00 | 2.24 | 2.28 | 2.39 |
| isobutane | 2.30 | 2.06 | 2.07 | 1.92 |
| butane | 2.10 | 2.09 | 2.21 | 1.80 |
| isopentane | 2.38 | 2.20 | 2.28 | 2.37 |
| neopentane | 2.51 | 2.03 | 2.04 | 1.84 |
| pentane | 2.30 | 2.10 | 2.17 | 2.37 |
| cyclopentane | 1.20 | 1.21 | 1.40 | 1.43 |
| methylcyclopentane | 1.59 | 1.20 | 1.52 | 0.80 |
| 2,3-dimethylbutane | 2.34 | 2.23 | 2.22 | 2.44 |
| neohexane | 2.51 | 2.10 | 2.14 | 1.77 |
| hexane | 2.48 | 1.89 | 2.12 | 1.99 |
| cyclohexane | 1.23 | 1.35 | 0.89 | 1.76 |
| methylcyclohexane | 1.70 | 1.52 | 1.36 | 1.84 |
| heptane | 2.67 | 2.02 | 2.15 | 2.34 |
| octane | 2.88 | 1.97 | 2.11 | 2.33 |
| decane | 3.16 | 1.90 | 2.08 | 2.27 |
| methanol | -5.10 | -3.55 | -3.51 | -3.40 |
| ethanol | -5.00 | -4.09 | -3.89 | -3.57 |
| isopropanol | -4.74 | -3.77 | -3.66 | -3.92 |
| propanol | -4.85 | -4.33 | -4.53 | -4.23 |
| butanol | -4.72 | -4.18 | -4.40 | -4.23 |
| butan-2-ol | -4.62 | -3.38 | -2.80 | -2.69 |
| pentanol | -4.57 | -4.32 | -4.57 | -4.34 |
| pentan-2-ol | -4.39 | -3.31 | -2.87 | -3.60 |
| hexanol | -4.40 | -4.36 | -4.58 | -4.40 |
| 3-methylbutan-1-ol | -4.42 | -4.29 | -4.50 | -4.41 |
| cyclopentanol | -5.49 | -3.90 | -2.75 | -4.67 |
| ethyleneglycol | -9.30 | -8.26 | -8.14 | -8.92 |
| glucose | -25.47 | -27.70 | -28.01 | -27.07 |
| phenol | -6.60 | -6.23 | -5.48 | -5.49 |
| 2-fluorophenol | -5.29 | -5.16 | -4.91 | -5.76 |
| 4-fluorophenol | -6.19 | -6.26 | -6.19 | -5.49 |
| ammonia | -4.29 | -3.21 | -4.31 | -4.27 |
| methylamine | -4.55 | -4.00 | -4.53 | -4.71 |
| ethylamine | -4.50 | -4.14 | -4.97 | -4.28 |
| propylamine | -4.39 | -3.90 | -4.71 | -4.27 |
| dimethylamine | -4.29 | -2.72 | -4.03 | -5.48 |
| diethylamine | -4.07 | -0.62 | -1.35 | -4.70 |
| diisopropylamine | -3.22 | -1.52 | -1.45 | -3.63 |
| dibutylamine | -3.24 | -1.17 | -2.42 | -4.41 |
| trimethylamine | -3.20 | -3.34 | -3.09 | -1.78 |
| triethylamine | -3.22 | -4.56 | -4.09 | -1.93 |
| morpholine | -7.17 | -7.04 | -10.35 | -8.90 |
| benzenamine | -5.49 | -4.35 | -5.57 | -5.61 |
| acetamide | -9.71 | -10.78 | -10.22 | -10.46 |
| N-methylacetamide | -10.00 | -9.50 | -9.09 | -9.05 |
| dimethylacetamide | -8.50 (a) | -8.28 | -8.85 | -9.42 |
| N-methylformamide | -10.00 (a) | -9.36 | -9.21 | -8.95 |
| dimethylformamide | -7.81 | -8.13 | -8.78 | -8.75 |
| propionamide | -9.40 | -9.88 | -8.66 | -9.89 |
| benzamide | -11.00 | -10.36 | -10.88 | -11.04 |
| N,N,4-trimethylbenzamide | -9.76 | -8.83 | -9.41 | -9.55 |
| pyrrole | -4.78 | -5.02 | -4.78 | -5.76 |
| pyrrolidine | -5.48 | -2.31 | -4.61 | -4.24 |
| pyridine | -4.69 | -4.37 | -4.54 | -3.90 |
| N-methylpyrrolidine | -3.98 (a) | -2.44 | -4.39 | -3.27 |
| N-methyl-2-pyridone | -10.00 (a) | -9.42 | -9.21 | -11.10 |
| 3-methylindole | -5.88 | -6.40 | -6.00 | -5.73 |
| imidazole | -9.63 | -9.43 | -9.79 | -9.36 |
| 5-fluorouracil | -16.92 | -17.53 | -17.52 | -15.81 |
| benzene | -0.90 | -1.30 | -1.70 | -1.54 |
| toluene | -0.90 | -1.41 | -1.60 | -1.12 |
| p-cresol | -6.13 | -6.22 | -5.37 | -4.99 |
| ethylbenzene | -0.79 | -1.21 | -1.41 | -0.87 |
| fluorobenzene | -0.80 | -1.61 | -2.50 | -1.99 |
| methoxymethane | -1.91 | -1.82 | -2.20 | -1.93 |
| methoxyethane | -2.10 | -1.85 | -2.08 | -2.11 |
| ethoxyethane | -1.59 | -1.56 | -1.66 | -1.55 |
| 1,2-dimethoxyethane | -4.84 | -4.91 | -4.42 | -5.85 |
| 1-methoxypropane | -1.66 | -2.08 | -2.26 | -2.45 |
| tetrahydrofuran | -3.47 | -3.03 | -3.94 | -3.09 |
| 2-methyltetrahydrofuran | -3.30 | -3.17 | -4.06 | -3.09 |
| tetrahydropyran | -3.12 | -2.77 | -2.70 | -1.83 |
| 1,4-dioxane | -5.06 | -7.31 | -5.68 | -6.67 |
| methanethiol | -1.20 | -1.41 | -1.18 | -1.05 |
| ethanethiol | -1.14 | -1.27 | -1.20 | -1.02 |
| propanethiol | -1.10 | -2.31 | -1.80 | -1.71 |
| butanethiol | -0.99 (b) | -1.41 | -1.29 | -1.02 |
| acetic acid | -6.69 | -6.56 | -6.75 | -6.37 |
| propionic acid | -6.46 | -6.47 | -6.32 | -6.34 |
| butyric acid | -6.35 | -6.70 | -6.46 | -6.80 |
| dimethylsulfide | -1.61 (b) | -2.11 | -1.56 | -1.99 |
| hydrogen sulfide | -0.70 | -1.06 | -0.77 | -0.80 |
| methyl ethyl sulfide | -1.50 (b) | -1.64 | -1.19 | -1.89 |
| acetaldehyde | -3.50 | -4.18 | -3.84 | -3.64 |
| formaldehyde | -2.75 | -2.84 | -3.19 | -2.68 |
| dimethylsulfoxide | -9.28 | -7.61 | -9.39 | -10.15 |
| triethylphosphate | -7.50 | -10.29 | -9.02 | -7.60 |
| water | -6.30 | -5.36 | -6.34 | -6.27 |
| acetate (-) | -87.76 | -88.44 | -86.68 | -88.36 |
| propanoate (-) | -85.94 | -85.67 | -87.10 | -85.44 |
| methyl hydrogen phosphate (-) | -85.63 (c) | -85.91 | -85.79 | -85.60 |
| dimethylphosphate (-) | -85.28 (c) | -85.03 | -85.26 | -85.59 |
| methylammonium (+) | -66.60 | -66.31 | -63.93 | -64.85 |
| dimethylammonium (+) | -58.87 | -59.36 | -58.22 | -58.55 |
| trimethylammonium (+) | -51.28 | -53.02 | -52.09 | -52.62 |
| butylammonium (+) | -61.15 | -61.99 | -62.81 | -61.67 |
| guanidinium (+) | -52.31 (c) | -51.50 | -51.51 | -51.79 |
| methylguanidinium (+) | -47.18 (c) | -50.36 | -49.27 | -47.53 |
| imidazolium (+) | -56.02 | -56.61 | -55.78 | -56.14 |
| 4-methylimidazolium (+) | -52.57 | -53.43 | -52.95 | -52.87 |

(a) Gallicchio, E.; Zhang, L. Y.; Levy, R. M., The SGB/NP hydration free energy model based on the surface generalized Born solvent reaction field and novel nonpolar hydration free energy estimators. J. Comput. Chem. 2002, 23 (5), 517-529.
(b) Rizzo, Robert C., et al. Estimation of Absolute Free Energies of Hydration Using Continuum Methods: Accuracy of Partial Charge Models and Optimization of Nonpolar Contributions. Journal of Chemical Theory and Computation 2006, 2 (1), 128-39.
(c) Determined using BAR with AMOEBA explicit solvent, see Supplementary Table S9.


Supplementary Table S8. Signed and Unsigned Solvation Free Energy Difference Errors Using Fit Electrostatic Radii. Signed and unsigned errors (kcal/mol), relative to experiment, based on solvation free energy differences given in Table S7. SE = signed error; UE = unsigned error.

| Molecule | SE APBS | SE ddCOSMO | SE GK | UE APBS | UE ddCOSMO | UE GK |
| --- | --- | --- | --- | --- | --- | --- |
| methane | 0.43 | 0.39 | -0.11 | 0.43 | 0.39 | 0.11 |
| ethane | 0.48 | 0.27 | 0.71 | 0.48 | 0.27 | 0.71 |
| propane | 0.24 | 0.28 | 0.39 | 0.24 | 0.28 | 0.39 |
| isobutane | -0.24 | -0.23 | -0.38 | 0.24 | 0.23 | 0.38 |
| butane | -0.01 | 0.11 | -0.30 | 0.01 | 0.11 | 0.30 |
| isopentane | -0.18 | -0.10 | -0.01 | 0.18 | 0.10 | 0.01 |
| neopentane | -0.48 | -0.47 | -0.67 | 0.48 | 0.47 | 0.67 |
| pentane | -0.20 | -0.13 | 0.07 | 0.20 | 0.13 | 0.07 |
| cyclopentane | 0.01 | 0.20 | 0.23 | 0.01 | 0.20 | 0.23 |
| methylcyclopentane | -0.39 | -0.07 | -0.79 | 0.39 | 0.07 | 0.79 |
| 2,3-dimethylbutane | -0.11 | -0.12 | 0.10 | 0.11 | 0.12 | 0.10 |
| neohexane | -0.41 | -0.37 | -0.74 | 0.41 | 0.37 | 0.74 |
| hexane | -0.59 | -0.36 | -0.49 | 0.59 | 0.36 | 0.49 |
| cyclohexane | 0.12 | -0.34 | 0.53 | 0.12 | 0.34 | 0.53 |
| methylcyclohexane | -0.18 | -0.34 | 0.14 | 0.18 | 0.34 | 0.14 |
| heptane | -0.65 | -0.52 | -0.33 | 0.65 | 0.52 | 0.33 |
| octane | -0.91 | -0.77 | -0.55 | 0.91 | 0.77 | 0.55 |
| decane | -1.26 | -1.08 | -0.89 | 1.26 | 1.08 | 0.89 |
| methanol | 1.55 | 1.59 | 1.70 | 1.55 | 1.59 | 1.70 |
| ethanol | 0.91 | 1.11 | 1.43 | 0.91 | 1.11 | 1.43 |
| isopropanol | 0.97 | 1.08 | 0.82 | 0.97 | 1.08 | 0.82 |
| propanol | 0.52 | 0.32 | 0.62 | 0.52 | 0.32 | 0.62 |
| butanol | 0.54 | 0.32 | 0.49 | 0.54 | 0.32 | 0.49 |
| butan-2-ol | 1.24 | 1.82 | 1.93 | 1.24 | 1.82 | 1.93 |
| pentanol | 0.25 | 0.00 | 0.23 | 0.25 | 0.00 | 0.23 |
| pentan-2-ol | 1.08 | 1.52 | 0.79 | 1.08 | 1.52 | 0.79 |
| hexanol | 0.04 | -0.18 | 0.00 | 0.04 | 0.18 | 0.00 |
| 3-methylbutan-1-ol | 0.13 | -0.08 | 0.01 | 0.13 | 0.08 | 0.01 |
| cyclopentanol | 1.59 | 2.74 | 0.82 | 1.59 | 2.74 | 0.82 |
| ethyleneglycol | 1.04 | 1.16 | 0.38 | 1.04 | 1.16 | 0.38 |
| glucose | -2.23 | -2.54 | -1.60 | 2.23 | 2.54 | 1.60 |
| phenol | 0.37 | 1.12 | 1.11 | 0.37 | 1.12 | 1.11 |
| 2-fluorophenol | 0.13 | 0.38 | -0.47 | 0.13 | 0.38 | 0.47 |
| 4-fluorophenol | -0.07 | 0.00 | 0.70 | 0.07 | 0.00 | 0.70 |
| ammonia | 1.08 | -0.02 | 0.02 | 1.08 | 0.02 | 0.02 |
| methylamine | 0.55 | 0.02 | -0.16 | 0.55 | 0.02 | 0.16 |
| ethylamine | 0.36 | -0.47 | 0.22 | 0.36 | 0.47 | 0.22 |
| propylamine | 0.49 | -0.32 | 0.12 | 0.49 | 0.32 | 0.12 |
| dimethylamine | 1.57 | 0.26 | -1.19 | 1.57 | 0.26 | 1.19 |
| diethylamine | 3.45 | 2.72 | -0.63 | 3.45 | 2.72 | 0.63 |
| diisopropylamine | 1.70 | 1.77 | -0.41 | 1.70 | 1.77 | 0.41 |
| dibutylamine | 2.07 | 0.82 | -1.17 | 2.07 | 0.82 | 1.17 |
| trimethylamine | -0.14 | 0.11 | 1.42 | 0.14 | 0.11 | 1.42 |
| triethylamine | -1.34 | -0.87 | 1.29 | 1.34 | 0.87 | 1.29 |
| morpholine | 0.13 | -3.18 | -1.73 | 0.13 | 3.18 | 1.73 |
| benzenamine | 1.14 | -0.08 | -0.12 | 1.14 | 0.08 | 0.12 |
| acetamide | -1.07 | -0.51 | -0.75 | 1.07 | 0.51 | 0.75 |
| N-methylacetamide | 0.50 | 0.91 | 0.95 | 0.50 | 0.91 | 0.95 |
| dimethylacetamide | 0.22 | -0.35 | -0.92 | 0.22 | 0.35 | 0.92 |
| N-methylformamide | 0.64 | 0.79 | 1.05 | 0.64 | 0.79 | 1.05 |
| dimethylformamide | -0.32 | -0.97 | -0.94 | 0.32 | 0.97 | 0.94 |
| propionamide | -0.48 | 0.74 | -0.49 | 0.48 | 0.74 | 0.49 |
| benzamide | 0.64 | 0.12 | -0.04 | 0.64 | 0.12 | 0.04 |
| N,N,4-trimethylbenzamide | 0.93 | 0.35 | 0.21 | 0.93 | 0.35 | 0.21 |
| pyrrole | -0.24 | 0.00 | -0.98 | 0.24 | 0.00 | 0.98 |
| pyrrolidine | 3.17 | 0.87 | 1.24 | 3.17 | 0.87 | 1.24 |
| pyridine | 0.32 | 0.15 | 0.79 | 0.32 | 0.15 | 0.79 |
| N-methylpyrrolidine | 1.54 | -0.41 | 0.71 | 1.54 | 0.41 | 0.71 |
| N-methyl-2-pyridone | 0.58 | 0.79 | -1.10 | 0.58 | 0.79 | 1.10 |
| 3-methylindole | -0.52 | -0.12 | 0.15 | 0.52 | 0.12 | 0.15 |
| imidazole | 0.20 | -0.16 | 0.27 | 0.20 | 0.16 | 0.27 |
| 5-fluorouracil | -0.61 | -0.60 | 1.11 | 0.61 | 0.60 | 1.11 |
| benzene | -0.40 | -0.80 | -0.64 | 0.40 | 0.80 | 0.64 |
| toluene | -0.51 | -0.70 | -0.22 | 0.51 | 0.70 | 0.22 |
| p-cresol | -0.09 | 0.76 | 1.14 | 0.09 | 0.76 | 1.14 |
| ethylbenzene | -0.42 | -0.62 | -0.08 | 0.42 | 0.62 | 0.08 |
| fluorobenzene | -0.81 | -1.70 | -1.19 | 0.81 | 1.70 | 1.19 |
| methoxymethane | 0.09 | -0.29 | -0.02 | 0.09 | 0.29 | 0.02 |
| methoxyethane | 0.25 | 0.02 | -0.01 | 0.25 | 0.02 | 0.01 |
| ethoxyethane | 0.03 | -0.07 | 0.04 | 0.03 | 0.07 | 0.04 |
| 1,2-dimethoxyethane | -0.07 | 0.42 | -1.01 | 0.07 | 0.42 | 1.01 |
| 1-methoxypropane | -0.42 | -0.60 | -0.79 | 0.42 | 0.60 | 0.79 |
| tetrahydrofuran | 0.44 | -0.47 | 0.38 | 0.44 | 0.47 | 0.38 |
| 2-methyltetrahydrofuran | 0.13 | -0.76 | 0.21 | 0.13 | 0.76 | 0.21 |
| tetrahydropyran | 0.35 | 0.42 | 1.29 | 0.35 | 0.42 | 1.29 |
| 1,4-dioxane | -2.25 | -0.62 | -1.61 | 2.25 | 0.62 | 1.61 |
| methanethiol | -0.21 | 0.02 | 0.15 | 0.21 | 0.02 | 0.15 |
| ethanethiol | -0.13 | -0.06 | 0.12 | 0.13 | 0.06 | 0.12 |
| propanethiol | -1.21 | -0.70 | -0.61 | 1.21 | 0.70 | 0.61 |
| butanethiol | -0.42 | -0.30 | -0.03 | 0.42 | 0.30 | 0.03 |
| acetic acid | 0.13 | -0.06 | 0.32 | 0.13 | 0.06 | 0.32 |
| propionic acid | -0.01 | 0.14 | 0.12 | 0.01 | 0.14 | 0.12 |
| butyric acid | -0.35 | -0.11 | -0.45 | 0.35 | 0.11 | 0.45 |
| dimethylsulfide | -0.50 | 0.05 | -0.38 | 0.50 | 0.05 | 0.38 |
| hydrogen sulfide | -0.36 | -0.07 | -0.10 | 0.36 | 0.07 | 0.10 |
| methyl ethyl sulfide | -0.14 | 0.31 | -0.39 | 0.14 | 0.31 | 0.39 |
| acetaldehyde | -0.68 | -0.34 | -0.14 | 0.68 | 0.34 | 0.14 |
| formaldehyde | -0.09 | -0.44 | 0.07 | 0.09 | 0.44 | 0.07 |
| dimethylsulfoxide | 1.67 | -0.11 | -0.87 | 1.67 | 0.11 | 0.87 |
| triethylphosphate | -2.79 | -1.52 | -0.10 | 2.79 | 1.52 | 0.10 |
| water | 0.94 | -0.04 | 0.03 | 0.94 | 0.04 | 0.03 |
| acetate (-) | -0.68 | 1.08 | -0.60 | 0.68 | 1.08 | 0.60 |
| propanoate (-) | 0.27 | -1.16 | 0.50 | 0.27 | 1.16 | 0.50 |
| methyl hydrogen phosphate (-) | -0.28 | -0.16 | 0.03 | 0.28 | 0.16 | 0.03 |
| dimethylphosphate (-) | 0.25 | 0.02 | -0.31 | 0.25 | 0.02 | 0.31 |
| methylammonium (+) | 0.29 | 2.67 | 1.75 | 0.29 | 2.67 | 1.75 |
| dimethylammonium (+) | -0.49 | 0.65 | 0.32 | 0.49 | 0.65 | 0.32 |
| trimethylammonium (+) | -1.74 | -0.81 | -1.34 | 1.74 | 0.81 | 1.34 |
| butylammonium (+) | -0.84 | -1.66 | -0.52 | 0.84 | 1.66 | 0.52 |
| guanidinium (+) | 0.81 | 0.80 | 0.52 | 0.81 | 0.80 | 0.52 |
| methylguanidinium (+) | -3.18 | -2.09 | -0.35 | 3.18 | 2.09 | 0.35 |
| imidazolium (+) | -0.59 | 0.24 | -0.12 | 0.59 | 0.24 | 0.12 |
| 4-methylimidazolium (+) | -0.86 | -0.38 | -0.30 | 0.86 | 0.38 | 0.30 |
| All | 0.05 | 0.00 | 0.00 | 0.70 | 0.63 | 0.58 |
| Neutrals | 0.14 | 0.01 | 0.00 | 0.67 | 0.58 | 0.58 |
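The summary rows of Table S8 follow directly from the per-molecule entries: each unsigned error is the absolute value of the corresponding signed error, and the mean signed and mean unsigned errors are averages over the chosen set of molecules. As a minimal sketch, the Python snippet below applies those definitions to the GK signed errors of the twelve charged solutes in Table S8. The function and variable names are illustrative and are not taken from the software used in this work.

```python
from statistics import mean

# GK signed errors (kcal/mol) for the twelve charged solutes in Table S8.
gk_signed_errors = {
    "acetate (-)": -0.60,
    "propanoate (-)": 0.50,
    "methyl hydrogen phosphate (-)": 0.03,
    "dimethylphosphate (-)": -0.31,
    "methylammonium (+)": 1.75,
    "dimethylammonium (+)": 0.32,
    "trimethylammonium (+)": -1.34,
    "butylammonium (+)": -0.52,
    "guanidinium (+)": 0.52,
    "methylguanidinium (+)": -0.35,
    "imidazolium (+)": -0.12,
    "4-methylimidazolium (+)": -0.30,
}


def mean_signed_error(errors):
    """Average of signed errors; positive and negative errors can cancel."""
    return mean(errors)


def mean_unsigned_error(errors):
    """Average of absolute errors; no cancellation between molecules."""
    return mean(abs(e) for e in errors)


values = list(gk_signed_errors.values())
mse = mean_signed_error(values)
mue = mean_unsigned_error(values)
print(f"GK, charged solutes: MSE = {mse:.2f} kcal/mol, MUE = {mue:.2f} kcal/mol")
```

Because signed errors can cancel, a mean signed error near zero does not by itself indicate small per-molecule errors, which is why both statistics are tabulated.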


Supplementary Table S9. Free energy differences and standard deviations using the BAR algorithm for phosphate and guanidinium compounds. All BAR values were calculated across seven equally spaced lambda windows (kcal/mol).

| Stage | λ Window | Methylphosphate Free Energy | Methylphosphate Std. Dev. | Dimethylphosphate Free Energy | Dimethylphosphate Std. Dev. | Guanidinium Free Energy | Guanidinium Std. Dev. | Methylguanidinium Free Energy | Methylguanidinium Std. Dev. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| vdW | 0.000 - 0.143 | 0.05 | 0.00 | 0.06 | 0.00 | 0.04 | 0.00 | 0.05 | 0.00 |
| vdW | 0.143 - 0.286 | 1.08 | 0.00 | 1.31 | 0.01 | 0.75 | 0.01 | 0.99 | 0.01 |
| vdW | 0.286 - 0.429 | 4.71 | 0.06 | 5.46 | 0.07 | 3.32 | 0.06 | 4.22 | 0.07 |
| vdW | 0.429 - 0.571 | 2.22 | 0.07 | 2.26 | 0.07 | 2.00 | 0.08 | 2.33 | 0.10 |
| vdW | 0.571 - 0.714 | -0.26 | 0.03 | -0.34 | 0.03 | -0.11 | 0.03 | -0.14 | 0.04 |
| vdW | 0.714 - 0.857 | -1.97 | 0.02 | -2.31 | 0.02 | -1.35 | 0.02 | -1.67 | 0.02 |
| vdW | 0.857 - 1.000 | -3.69 | 0.01 | -4.23 | 0.01 | -2.56 | 0.02 | -3.24 | 0.02 |
| vdW | Total | 2.13 | 0.09 | 2.21 | 0.10 | 2.09 | 0.11 | 2.54 | 0.13 |
| Electrostatic | 0.000 - 0.143 | -2.57 | 0.02 | -2.46 | 0.03 | -0.53 | 0.02 | -0.39 | 0.03 |
| Electrostatic | 0.143 - 0.286 | -6.65 | 0.03 | -6.27 | 0.03 | -2.86 | 0.02 | -2.35 | 0.03 |
| Electrostatic | 0.286 - 0.429 | -11.08 | 0.04 | -10.42 | 0.04 | -5.24 | 0.03 | -4.39 | 0.03 |
| Electrostatic | 0.429 - 0.571 | -15.94 | 0.05 | -15.14 | 0.04 | -7.75 | 0.03 | -6.48 | 0.03 |
| Electrostatic | 0.571 - 0.714 | -22.09 | 0.06 | -21.07 | 0.06 | -10.52 | 0.03 | -8.76 | 0.03 |
| Electrostatic | 0.714 - 0.857 | -29.14 | 0.07 | -27.45 | 0.06 | -13.48 | 0.03 | -11.18 | 0.04 |
| Electrostatic | 0.857 - 1.000 | -37.14 | 0.06 | -34.44 | 0.06 | -16.67 | 0.04 | -13.88 | 0.04 |
| Electrostatic | Total | -124.62 | 0.19 | -117.24 | 0.19 | -57.04 | 0.07 | -47.43 | 0.08 |
| Vacuum | 0.000 - 0.143 | -0.73 | 0.00 | -0.60 | 0.00 | -0.05 | 0.00 | 0.05 | 0.00 |
| Vacuum | 0.143 - 0.286 | -2.21 | 0.01 | -1.79 | 0.01 | -0.15 | 0.00 | 0.14 | 0.00 |
| Vacuum | 0.286 - 0.429 | -3.69 | 0.01 | -2.99 | 0.01 | -0.26 | 0.00 | 0.24 | 0.00 |
| Vacuum | 0.429 - 0.571 | -5.15 | 0.01 | -4.18 | 0.01 | -0.36 | 0.01 | 0.33 | 0.00 |
| Vacuum | 0.571 - 0.714 | -6.69 | 0.01 | -5.43 | 0.01 | -0.48 | 0.01 | 0.42 | 0.00 |
| Vacuum | 0.714 - 0.857 | -8.21 | 0.01 | -6.69 | 0.01 | -0.61 | 0.01 | 0.51 | 0.00 |
| Vacuum | 0.857 - 1.000 | -9.78 | 0.01 | -7.98 | 0.01 | -0.73 | 0.01 | 0.60 | 0.00 |
| Vacuum | Total | -36.46 | 0.03 | -29.65 | 0.03 | -2.64 | 0.01 | 2.28 | 0.00 |
| Hydration | Total | -86.02 | 0.21 | -85.38 | 0.22 | -52.31 | 0.14 | -47.18 | 0.15 |
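The hydration totals in Table S9 can be recovered from the three stage totals. The sketch below assumes that the hydration free energy is the vdW total plus the electrostatic total minus the vacuum total, and that the stage standard deviations combine in quadrature. Both assumptions are inferred from the tabulated numbers, which they reproduce to within rounding of the two-decimal entries, and are not stated explicitly in the table. The dictionary layout and function name are illustrative.

```python
import math

# Stage totals from Table S9 as (free energy, standard deviation) in kcal/mol.
stage_totals = {
    "methylphosphate": {"vdw": (2.13, 0.09), "elec": (-124.62, 0.19), "vacuum": (-36.46, 0.03)},
    "dimethylphosphate": {"vdw": (2.21, 0.10), "elec": (-117.24, 0.19), "vacuum": (-29.65, 0.03)},
    "guanidinium": {"vdw": (2.09, 0.11), "elec": (-57.04, 0.07), "vacuum": (-2.64, 0.01)},
    "methylguanidinium": {"vdw": (2.54, 0.13), "elec": (-47.43, 0.08), "vacuum": (2.28, 0.00)},
}


def hydration_total(vdw, elec, vacuum):
    """Combine stage totals into a hydration free energy and its uncertainty.

    Assumption: dG_hyd = dG_vdw + dG_elec - dG_vacuum, with independent
    stage uncertainties added in quadrature.
    """
    free_energy = vdw[0] + elec[0] - vacuum[0]
    std_dev = math.sqrt(vdw[1] ** 2 + elec[1] ** 2 + vacuum[1] ** 2)
    return free_energy, std_dev


hydration = {name: hydration_total(**stages) for name, stages in stage_totals.items()}
for name, (free_energy, std_dev) in hydration.items():
    print(f"{name}: {free_energy:.2f} +/- {std_dev:.2f} kcal/mol")
```

Small differences of about 0.01 kcal/mol from the tabulated hydration totals are expected, because the inputs used here are already rounded.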


Supplementary Table S10. (A) Detailed simulation conditions for validation set molecules with explicit solvent. (B) Simulation box content for validation set molecules in explicit water.

(A) Equilibration and production simulation protocols.

| Purpose | Simulation Time (ns) | Ensemble | Temperature (K) |
| --- | --- | --- | --- |
| Equilibration | 1 | NVT | 100 |
| Equilibration | 1 | NVT | 200 |
| Equilibration | 1 | NVT | 300 |
| Equilibration | 1 | NPT | 298 |
| Production | 1 | NVT | 298 |

(B) Simulation box sizes and content.

| Type | PDB Structure | Experimental Method | Explicit Solvent Water/Na+/Cl- Number | Explicit Solvent Cubic Box Side Length (Å) |
| --- | --- | --- | --- | --- |
| RNA Helix | 1MIS | NMR | 11955/14/0 | 50.591 |
| RNA Helix | 2JXQ | NMR | 16968/18/0 | 56.617 |
| RNA Helix | 1F5G | NMR | 19629/18/0 | 59.169 |
| RNA Helix | 2L8F | NMR | 24054/20/0 | 63.333 |
| RNA Hairpin | 1ZIH | NMR | 7629/11/0 | 43.769 |
| RNA Hairpin | 1SZY | NMR | 22941/20/0 | 62.323 |
| RNA Hairpin | 2KOC | NMR | 12933/13/0 | 51.622 |
| DNA Helix | 1D20 | NMR | 14022/18/0 | 53.443 |
| DNA Helix | 2HKB | NMR | 22806/22/0 | 62.464 |
| Protein | 1BPI | X-ray (1.09 Å) | 18906/0/6 | 58.716 |
| Protein | 1L2Y | NMR | 7164/0/1 | 42.473 |
| Protein | 1UBQ | X-ray (1.80 Å) | 23052/0/1 | 62.933 |
| Protein | 1UCS | X-ray (0.62 Å) | 13182/0/0 | 52.623 |
| Protein | 1VII | NMR | 11505/0/2 | 50.037 |
| Protein | 1WM3 | X-ray (1.20 Å) | 15390/0/1 | 55.628 |
| Protein | 2OED | NMR | 14451/2/0 | 54.063 |
| Protein | 2PPN | X-ray (0.92 Å) | 26052/0/4 | 65.701 |
| Protein | 6LYT | X-ray (1.90 Å) | 29667/0/9 | 68.637 |
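The ion counts in Table S10(B) can be cross-checked against the solute charges listed in Tables S11 to S15. As a minimal sketch, the snippet below computes the net charge of each explicit solvent box as the solute charge plus the number of Na+ ions minus the number of Cl- ions. The tuple layout and function name are illustrative; the numbers are copied from the tables.

```python
# (solute charge, Na+ count, Cl- count) per PDB entry.
# Charges come from Tables S11 to S15, ion counts from Table S10(B).
systems = {
    "1MIS": (-14, 14, 0), "2JXQ": (-18, 18, 0), "1F5G": (-18, 18, 0),
    "2L8F": (-20, 20, 0), "1ZIH": (-11, 11, 0), "1SZY": (-20, 20, 0),
    "2KOC": (-13, 13, 0), "1D20": (-18, 18, 0), "2HKB": (-22, 22, 0),
    "1BPI": (6, 0, 6), "1L2Y": (1, 0, 1), "1UBQ": (1, 0, 1),
    "1UCS": (0, 0, 0), "1VII": (2, 0, 2), "1WM3": (1, 0, 1),
    "2OED": (-2, 2, 0), "2PPN": (4, 0, 4), "6LYT": (9, 0, 9),
}


def net_box_charge(solute_charge, n_sodium, n_chloride):
    """Net charge of the box: solute plus monovalent counterions."""
    return solute_charge + n_sodium - n_chloride


net_charges = {pdb: net_box_charge(*entry) for pdb, entry in systems.items()}
non_neutral = [pdb for pdb, charge in net_charges.items() if charge != 0]
print("Boxes with a nonzero net charge:", non_neutral)
```

With the tabulated values, every box contains exactly the number of counterions needed to neutralize the solute and no additional salt.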


Supplementary Table S11. Electrostatic Component of the Solvation Free Energy for Validation Set Molecules Using van der Waals Radii. The electrostatic component of solvation free energy (ΔUelec) is reported for validation set molecules using each electrostatics model (APBS, ddCOSMO, and GK). APBS results without salt are referenced in the body of the text, while those with 150 mM KCl are shown for both the LPBE and NPBE.

ΔUelec (kcal/mol)

| PDB ID | Molecule Type | Charge | # Atoms | APBS No Salt | APBS LPBE | APBS NPBE | ddCOSMO | GK |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1MIS | RNA Helix | -14 | 522 | -3181.86 | -3203.41 | -3207.45 | -3359.98 | -3121.93 |
| 2JXQ | RNA Helix | -18 | 642 | -4730.14 | -4763.72 | -4771.49 | -4968.01 | -4652.90 |
| 1F5G | RNA Helix | -18 | 650 | -4518.76 | -4552.02 | -4555.92 | -4698.63 | -4455.45 |
| 2L8F | RNA Helix | -20 | 711 | -5487.93 | -5527.23 | -5534.07 | -5738.46 | -5392.34 |
| 1ZIH | RNA Hairpin | -11 | 389 | -2254.48 | -2268.58 | -2270.80 | -2384.24 | -2210.16 |
| 1SZY | RNA Hairpin | -20 | 673 | -5606.59 | -5648.27 | -5659.94 | -5926.07 | -5518.29 |
| 2KOC | RNA Hairpin | -13 | 447 | -2724.30 | -2742.20 | -2743.73 | -2795.40 | -2665.67 |
| 1D20 | DNA | -18 | 633 | -4533.15 | -4565.12 | -4568.09 | -4675.10 | -4444.43 |
| 2HKB | DNA | -22 | 757 | -6142.44 | -6189.25 | -6194.42 | -6353.89 | -6050.11 |
| 1BPI | Protein | +6 | 892 | -1264.62 | -1268.62 | -1268.75 | -1377.45 | -1286.94 |
| 1L2Y | Protein | +1 | 304 | -277.59 | -277.84 | -277.85 | -321.18 | -291.98 |
| 1UBQ | Protein | +1 | 1232 | -1250.73 | -1252.05 | -1252.25 | -1439.03 | -1268.37 |
| 1UCS | Protein | 0 | 997 | -695.42 | -695.90 | -695.93 | -783.62 | -708.75 |
| 1VII | Protein | +2 | 596 | -798.02 | -799.20 | -799.40 | -951.52 | -810.02 |
| 1WM3 | Protein | +1 | 1169 | -1869.17 | -1873.77 | -1875.49 | -2300.84 | -1870.17 |
| 2OED | Protein | -2 | 862 | -926.22 | -927.33 | -987.50 | -1049.90 | -941.38 |
| 2PPN | Protein | +4 | 1666 | -1427.79 | -1429.65 | -1485.42 | -1636.35 | -1460.97 |
| 6LYT | Protein | +9 | 1961 | -2201.99 | -2208.96 | -2241.56 | -2477.67 | -2212.97 |
| Mean | | | | -2771.73 | -2788.51 | -2799.45 | -2957.63 | -2742.38 |


Supplementary Table S12. Dipole Moment Magnitudes for Validation Set Molecules Using van der Waals Radii. Dipole moments for each electrostatics model (GK, ddCOSMO, and APBS) were collected using van der Waals electrostatic radii. APBS results without salt are referenced in the body of the text, while those with 150 mM KCl are shown for both the LPBE and NPBE.

Dipole Moment Magnitude (Debye)

| PDB ID | Molecule Type | Charge | # Atoms | Vacuum | Explicit Solvent | APBS No Salt | APBS LPBE | APBS NPBE | ddCOSMO | GK |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1MIS | RNA Helix | -14 | 522 | 5.09 | 12.96 | 16.51 | 16.90 | 17.47 | 17.62 | 16.83 |
| 2JXQ | RNA Helix | -18 | 642 | 43.81 | 44.74 | 48.22 | 48.08 | 47.61 | 47.44 | 46.30 |
| 1F5G | RNA Helix | -18 | 650 | 19.74 | 26.75 | 32.22 | 32.64 | 33.09 | 32.16 | 33.47 |
| 2L8F | RNA Helix | -20 | 711 | 57.56 | 66.16 | 73.46 | 73.46 | 73.18 | 72.81 | 72.44 |
| 1ZIH | RNA Hairpin | -11 | 389 | 36.81 | 57.04 | 61.00 | 61.23 | 61.54 | 60.63 | 58.38 |
| 1SZY | RNA Hairpin | -20 | 673 | 69.58 | 76.35 | 78.55 | 78.50 | 78.32 | 78.31 | 78.31 |
| 2KOC | RNA Hairpin | -13 | 447 | 78.32 | 92.77 | 96.22 | 96.40 | 96.48 | 95.81 | 96.13 |
| 1D20 | DNA | -18 | 633 | 20.65 | 33.25 | 29.68 | 29.64 | 29.60 | 29.47 | 29.04 |
| 2HKB | DNA | -22 | 757 | 15.47 | 21.09 | 21.87 | 21.97 | 22.04 | 21.76 | 21.23 |
| 1BPI | Protein | +6 | 892 | 262.78 | 298.29 | 295.50 | 295.81 | 295.83 | 295.75 | 294.74 |
| 1L2Y | Protein | +1 | 304 | 63.06 | 72.23 | 74.28 | 74.33 | 74.33 | 74.09 | 73.88 |
| 1UBQ | Protein | +1 | 1232 | 208.93 | 271.94 | 273.14 | 273.80 | 273.88 | 273.67 | 271.67 |
| 1UCS | Protein | 0 | 997 | 113.82 | 132.88 | 133.53 | 133.72 | 133.73 | 134.37 | 134.42 |
| 1VII | Protein | +2 | 596 | 154.14 | 184.41 | 187.58 | 187.83 | 187.86 | 187.52 | 187.11 |
| 1WM3 | Protein | +1 | 1169 | 414.65 | 542.01 | 549.25 | 550.46 | 550.81 | 549.35 | 546.01 |
| 2OED | Protein | -2 | 862 | 167.99 | 210.06 | 212.13 | 212.47 | 213.72 | 212.13 | 212.09 |
| 2PPN | Protein | +4 | 1666 | 123.10 | 158.74 | 155.08 | 155.46 | 155.69 | 154.81 | 155.31 |
| 6LYT | Protein | +9 | 1961 | 165.65 | 200.36 | 205.41 | 205.85 | 204.84 | 204.97 | 205.22 |
| Mean | | | | 112.29 | 139.00 | 141.31 | 141.59 | 141.67 | 141.26 | 140.70 |


Supplementary Table S13. Electrostatic Component of Solvation Free Energy for Validation Set Molecules Using Fit Radii. The electrostatic component of the solvation free energy (ΔUelec) is reported for validation set molecules using each electrostatics model (APBS, ddCOSMO, and GK). APBS results without salt are referenced in the body of the text, while those with 150 mM KCl are shown for both the LPBE and NPBE.

ΔUelec (kcal/mol)

| PDB ID | Molecule Type | Charge | # Atoms | APBS No Salt | APBS LPBE | APBS NPBE | ddCOSMO | GK |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1MIS | RNA Helix | -14 | 522 | -3380.63 | -3402.67 | -3407.31 | -3413.94 | -3308.29 |
| 2JXQ | RNA Helix | -18 | 642 | -4987.68 | -5022.01 | -5030.83 | -5027.65 | -4903.48 |
| 1F5G | RNA Helix | -18 | 650 | -4761.02 | -4794.80 | -4799.14 | -4791.46 | -4697.51 |
| 2L8F | RNA Helix | -20 | 711 | -5772.71 | -5812.61 | -5820.11 | -5813.17 | -5673.36 |
| 1ZIH | RNA Hairpin | -11 | 389 | -2393.91 | -2408.25 | -2410.79 | -2426.99 | -2340.00 |
| 1SZY | RNA Hairpin | -20 | 673 | -5881.04 | -5923.77 | -5936.83 | -5969.97 | -5791.95 |
| 2KOC | RNA Hairpin | -13 | 447 | -2905.35 | -2923.43 | -2925.07 | -2878.95 | -2843.67 |
| 1D20 | DNA | -18 | 633 | -4817.35 | -4849.46 | -4852.40 | -4557.81 | -4720.34 |
| 2HKB | DNA | -22 | 757 | -6477.62 | -6524.75 | -6530.09 | -6245.20 | -6352.56 |
| 1BPI | Protein | +6 | 892 | -1283.79 | -1287.74 | -1287.87 | -1261.82 | -1339.72 |
| 1L2Y | Protein | +1 | 304 | -287.03 | -287.28 | -287.28 | -286.00 | -327.12 |
| 1UBQ | Protein | +1 | 1232 | -1328.58 | -1329.90 | -1330.10 | -1303.64 | -1476.49 |
| 1UCS | Protein | 0 | 997 | -732.14 | -732.63 | -732.67 | -710.19 | -868.89 |
| 1VII | Protein | +2 | 596 | -814.08 | -815.28 | -815.48 | -862.63 | -880.73 |
| 1WM3 | Protein | +1 | 1169 | -1949.26 | -1953.87 | -1955.57 | -2169.39 | -2097.99 |
| 2OED | Protein | -2 | 862 | -986.27 | -987.42 | -987.50 | -965.49 | -1139.72 |
| 2PPN | Protein | +4 | 1666 | -1483.47 | -1485.30 | -1485.42 | -1478.17 | -1690.92 |
| 6LYT | Protein | +9 | 1961 | -2234.36 | -2241.28 | -2241.56 | -2255.05 | -2324.01 |
| Mean | | | | -2915.35 | -2932.36 | -2935.34 | -2912.08 | -2932.04 |


Supplementary Table S14. Dipole Moment Magnitudes for Validation Set Molecules Using Fit Radii. Dipole moments for each electrostatics model (APBS, ddCOSMO and GK) evaluated using fit electrostatic radii. APBS results without salt are referenced in the body of the text, while those with 150 mM KCl are shown for both the LPBE and NPBE.

Dipole Moment Magnitude (Debye)

| PDB ID | Molecule Type | Charge | # Atoms | Vacuum | Explicit Solvent | APBS No Salt | APBS LPBE | APBS NPBE | ddCOSMO | GK |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1MIS | RNA Helix | -14 | 522 | 5.09 | 12.96 | 18.26 | 18.72 | 19.34 | 14.57 | 17.12 |
| 2JXQ | RNA Helix | -18 | 642 | 43.81 | 44.74 | 47.58 | 47.39 | 46.89 | 48.05 | 45.54 |
| 1F5G | RNA Helix | -18 | 650 | 19.74 | 26.75 | 35.21 | 35.71 | 36.20 | 27.82 | 37.30 |
| 2L8F | RNA Helix | -20 | 711 | 57.56 | 66.16 | 75.50 | 75.48 | 75.19 | 72.68 | 73.96 |
| 1ZIH | RNA Hairpin | -11 | 389 | 36.81 | 57.04 | 63.59 | 63.87 | 64.22 | 57.53 | 59.70 |
| 1SZY | RNA Hairpin | -20 | 673 | 69.58 | 76.35 | 78.40 | 78.32 | 78.11 | 80.35 | 79.29 |
| 2KOC | RNA Hairpin | -13 | 447 | 78.32 | 92.77 | 98.94 | 99.14 | 99.24 | 94.21 | 97.69 |
| 1D20 | DNA | -18 | 633 | 20.65 | 33.25 | 29.96 | 29.92 | 29.87 | 29.09 | 29.36 |
| 2HKB | DNA | -22 | 757 | 15.47 | 21.09 | 21.62 | 21.71 | 21.77 | 22.23 | 21.07 |
| 1BPI | Protein | +6 | 892 | 262.78 | 298.29 | 294.71 | 295.01 | 295.03 | 294.28 | 295.17 |
| 1L2Y | Protein | +1 | 304 | 63.06 | 72.23 | 73.46 | 73.51 | 73.52 | 72.61 | 73.78 |
| 1UBQ | Protein | +1 | 1232 | 208.93 | 271.94 | 273.31 | 273.96 | 274.04 | 274.06 | 272.76 |
| 1UCS | Protein | 0 | 997 | 113.82 | 132.88 | 134.21 | 134.41 | 134.42 | 134.79 | 135.18 |
| 1VII | Protein | +2 | 596 | 154.14 | 184.41 | 188.62 | 188.86 | 188.89 | 187.89 | 188.99 |
| 1WM3 | Protein | +1 | 1169 | 414.65 | 542.01 | 548.53 | 549.73 | 550.06 | 546.63 | 546.68 |
| 2OED | Protein | -2 | 862 | 167.99 | 210.06 | 213.36 | 213.70 | 213.72 | 212.74 | 212.36 |
| 2PPN | Protein | +4 | 1666 | 123.10 | 158.74 | 155.27 | 155.66 | 155.69 | 155.06 | 156.28 |
| 6LYT | Protein | +9 | 1961 | 165.65 | 200.36 | 204.34 | 204.78 | 204.84 | 203.59 | 210.17 |
| Mean | | | | 112.29 | 139.00 | 141.94 | 142.22 | 142.28 | 140.45 | 141.80 |
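One way to summarize how closely each implicit solvent reproduces the explicit solvent polarization response in Table S14 is the mean unsigned deviation of the dipole moment magnitudes from the explicit solvent ensemble averages. The sketch below does this for the nine nucleic acids using the GK and ddCOSMO columns. The choice of metric and subset is an illustration and is not necessarily the analysis used in the main text; the function and variable names are hypothetical.

```python
from statistics import mean

# Dipole moment magnitudes (Debye) for the nine nucleic acids in Table S14.
explicit_solvent = {
    "1MIS": 12.96, "2JXQ": 44.74, "1F5G": 26.75, "2L8F": 66.16, "1ZIH": 57.04,
    "1SZY": 76.35, "2KOC": 92.77, "1D20": 33.25, "2HKB": 21.09,
}
gk = {
    "1MIS": 17.12, "2JXQ": 45.54, "1F5G": 37.30, "2L8F": 73.96, "1ZIH": 59.70,
    "1SZY": 79.29, "2KOC": 97.69, "1D20": 29.36, "2HKB": 21.07,
}
ddcosmo = {
    "1MIS": 14.57, "2JXQ": 48.05, "1F5G": 27.82, "2L8F": 72.68, "1ZIH": 57.53,
    "1SZY": 80.35, "2KOC": 94.21, "1D20": 29.09, "2HKB": 22.23,
}


def mean_unsigned_deviation(model, reference):
    """Average absolute difference between model and reference dipoles."""
    return mean(abs(model[pdb] - reference[pdb]) for pdb in reference)


mud_gk = mean_unsigned_deviation(gk, explicit_solvent)
mud_ddcosmo = mean_unsigned_deviation(ddcosmo, explicit_solvent)
print(f"GK:      {mud_gk:.2f} D")
print(f"ddCOSMO: {mud_ddcosmo:.2f} D")
```

The same function can be applied to the APBS columns or to the protein rows by swapping in the corresponding entries from the table.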


Supplementary Table S15. Dipole moment magnitudes from fixed charge force fields (AMBER ff99SB, OPLS-AA/L, and CHARMM22) are compared to dipole moments obtained using AMOEBA in vacuum and explicit solvent (ensemble average) for validation set biomolecules. Tinker version 8.8.1 (August 2020) does not include nucleic acid force field parameters for OPLS-AA/L or CHARMM22, so only AMOEBA and AMBER ff99SB dipole moment magnitudes are reported for nucleic acids.

Dipole Moment Magnitude (Debye)

| PDB ID | Molecule Type | Charge | # Atoms | AMOEBA Vacuum | AMOEBA Explicit Solvent | AMBER | OPLS | CHARMM |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1MIS | RNA Helix | -14 | 522 | 5.09 | 12.96 | 10.83 | -- | -- |
| 2JXQ | RNA Helix | -18 | 642 | 43.81 | 44.74 | 44.06 | -- | -- |
| 1F5G | RNA Helix | -18 | 650 | 19.74 | 26.75 | 19.16 | -- | -- |
| 2L8F | RNA Helix | -20 | 711 | 57.56 | 66.16 | 70.38 | -- | -- |
| 1ZIH | RNA Hairpin | -11 | 389 | 36.81 | 57.04 | 55.14 | -- | -- |
| 1SZY | RNA Hairpin | -20 | 673 | 69.58 | 76.35 | 70.60 | -- | -- |
| 2KOC | RNA Hairpin | -13 | 447 | 78.32 | 92.77 | 89.06 | -- | -- |
| 1D20 | DNA | -18 | 633 | 20.65 | 33.25 | 31.38 | -- | -- |
| 2HKB | DNA | -22 | 757 | 15.47 | 21.09 | 14.81 | -- | -- |
| 1BPI | Protein | +6 | 892 | 262.78 | 298.29 | 291.02 | 292.78 | 295.29 |
| 1L2Y | Protein | +1 | 304 | 63.06 | 72.23 | 68.59 | 70.63 | 70.70 |
| 1UBQ | Protein | +1 | 1232 | 208.93 | 271.94 | 275.25 | 288.69 | 286.33 |
| 1UCS | Protein | 0 | 997 | 113.82 | 132.88 | 128.51 | 128.35 | 127.81 |
| 1VII | Protein | +2 | 596 | 154.14 | 184.41 | 180.46 | 176.98 | 178.04 |
| 1WM3 | Protein | +1 | 1169 | 414.65 | 542.01 | 534.47 | 539.46 | 540.70 |
| 2OED | Protein | -2 | 862 | 167.99 | 210.06 | 204.65 | 203.57 | 206.48 |
| 2PPN | Protein | +4 | 1666 | 123.10 | 158.74 | 156.56 | 159.53 | 156.19 |
| 6LYT | Protein | +9 | 1961 | 165.65 | 200.36 | 184.10 | 182.04 | 181.14 |
| Mean | | | | 112.29 | 139.00 | 134.95 | | |
