
doi:10.1016/j.jmb.2009.05.068 J. Mol. Biol. (2009) 391, 484–497

Available online at www.sciencedirect.com

Calculation of Proteins' Total Side-Chain Torsional Entropy and Its Influence on Protein–Ligand Interactions

Kateri H. DuBay and Phillip L. Geissler⁎

Department of , University of California at Berkeley, Berkeley, CA 94720, USA

Chemical Sciences Division, Lawrence Berkeley National Lab, Berkeley, CA 94720, USA

Physical Biosciences Division, Lawrence Berkeley National Lab, Berkeley, CA 94720, USA

Received 12 January 2009; received in revised form 20 May 2009; accepted 22 May 2009. Available online 28 May 2009.

Despite the high density within a typical protein fold, the ensemble of sterically permissible side-chain repackings is vast. Here, we examine the extent of this variability that survives energetic biases due to van der Waals interactions, hydrogen bonding, salt bridges, and solvation. Monte Carlo simulations of an atomistic model exhibit thermal fluctuations among a diverse set of side-chain arrangements, even with the peptide backbone fixed in its crystallographic conformation. We have quantified the torsional entropy of this native-state ensemble, relative to that of a noninteracting reference system, for 12 small proteins. The reduction in entropy per rotatable bond due to each kind of interaction is remarkably consistent across this set of molecules. To assess the biophysical importance of these fluctuations, we have estimated side-chain entropy contributions to the binding affinity of several peptide ligands with calmodulin. Calculations for our fixed-backbone model correlate very well with experimentally determined binding entropies over a range spanning more than 80 kJ/(mol·308 K).

© 2009 Elsevier Ltd. All rights reserved.

Edited by M. Levitt

Keywords: side-chain entropy; configurational entropy; side-chain fluctuations; protein–ligand binding; protein thermodynamics

*Corresponding author. E-mail address: [email protected].

Abbreviations used: MC, Monte Carlo; MD, molecular dynamics; LJ, Lennard–Jones; SB, salt bridge; HB, hydrogen bond; IS, implicit solvent; SASA, solvent-accessible surface area; CaM, calmodulin; PYP, photoactive yellow protein; WL, Wang–Landau.

Introduction

Native protein conformations are extremely dense, with packing fractions comparable to those of organic crystals.1 This observation motivated in early studies of protein structure and dynamics a jigsaw-puzzle notion, in which side chains of a folded structure become fixed in a unique spatial arrangement by steric interactions with their neighbors. Still today, many computational procedures that explore side-chain packing strive to identify a single native configuration.2

The three-dimensional structure of a protein, however, can fluctuate considerably.3 Large-scale motions involve partial or full unfolding and backbone hinge motions, but subtle structural variation on smaller scales has been highlighted by several experimental measurements. NMR relaxation techniques in particular resolve fluctuations at the level of single bond vectors in both the backbones and side-chains of folded proteins.3–5 The Lipari–Szabo order parameters6 they determine, which increase from S²axis = 0 to S²axis = 1 as rotational motion becomes restricted, report on the range of picosecond to nanosecond dynamics for backbone and side-chain methyl groups. Computational studies suggest that order parameters lower than 0.8 point to transitions between multiple rotameric states in addition to the inevitable vibrations about optimal torsional angles.7,8

Side-chain methyl group order parameters often lie in the range 0.2 < S²axis < 0.8, indicating extensive exploration of different rotameric states. These results are corroborated by dipolar coupling measurements, suggesting that side chains substantially populate different rotameric states within the ensemble of folded configurations.9,10 Evidence for alternative side-chain conformations has even been found in electron density maps from crystallography experiments.11 The data accumulating from such studies paint a consistent picture: Residual side-chain fluctuations in the native-state ensemble are distributed heterogeneously throughout the protein; side-chain bond vectors fluctuate more significantly than do those along the backbone; and the entropy associated
with such fluctuations is likely to be a significant player in protein thermodynamics.4,12

Computational studies focusing on geometric aspects of side-chain packing have reconciled the evidence for significant torsional fluctuations with constraints due to steric interactions in a dense environment.13 Much as in a dense liquid, volume exclusion reduces the diversity of accessible configurations greatly, but by no means completely. Nearly 10^20 distinct side-chain conformations were determined to satisfy hard-core constraints in a 125-residue protein with native backbone structure.13 To what degree non-steric interactions further reduce this variability is not at all clear a priori. Populating even a very small fraction of the geometrically acceptable arrangements would be sufficient to allow for significant contributions to free energies of folding and ligand binding.

Mean field theories,14 various interpretations of molecular dynamics (MD) simulations,15–18 and several Monte Carlo (MC) approaches13,19–21 have all been used to estimate the residual entropy of side-chain rotations in folded proteins. Each of these approaches, however, is limited by underlying approximations or formidable practical challenges. Mean field approaches, by definition, do not account for a complete range of thermal fluctuations; straightforward MD simulations can explore only rearrangements that occur on computationally accessible time scales. MC methods are similarly hindered by sampling difficulties intrinsic to such tightly packed systems. As a compromise, entropies are sometimes calculated separately for single residues or small groups of neighboring residues while keeping other residues fixed.19 Studies that do confront the full combinatorial problem, allowing all side chains to rotate simultaneously, have neglected potentially important contributions from intra-rotameric motions20 or have considered geometric effects independent of non-steric interactions.13,21

In this article, we present a new approach for estimating side-chain torsional entropy. Building on algorithms developed by Kussell et al.,13 our calculations are enabled by enhanced MC methods and a schematic treatment of forces due to sterics, van der Waals interactions, hydrogen bonding, salt bridges (SBs), and solvation. Through this combination, we achieve thorough sampling of thermal fluctuations, incorporate fully coupled rotations of all residues, and address a comprehensive set of physical interactions. Model outlines our approach and the physical perspectives underlying it. Results and Discussion describes applications to a series of small globular proteins, quantifying and comparing the ways in which various forces act to limit rotational freedom.

Within the model we have developed, substantial freedom remains in the packing of side-chains, even in the presence of strong, anisotropic attractions such as hydrogen bonding. The corresponding entropy can, therefore, in principle, strongly influence the thermodynamics of folding, protein–protein binding, and protein–ligand interactions. Indeed, it now appears from calorimetric data that, in several systems, entropy changes figure prominently in tuning protein binding affinities.22,23 In the case of stromelysin 1 binding to the N-terminal domain of tissue inhibitor of metalloproteinases 1, they even overcome a substantially unfavorable enthalpy of binding.24 Implicating the involvement of side chains in these phenomena, entropies inferred from NMR order parameters correlate strongly with calorimetrically determined binding entropies for calmodulin (CaM) and several peptide ligands.12 We find even better agreement between binding entropy measurements and calculated values based on the methods we have developed. This comparison is discussed in detail in Results and Discussion.

Model

In developing a theoretical approach, we are guided by the notion that side-chain rearrangements within a protein's native state are not strongly mediated by motions of the peptide backbone. Physically, we expect that once the molecule has folded, it is subject to global constraints of high packing fraction that vary little with small-amplitude backbone fluctuations. Empirically, we note that correlations observed between backbone NMR order parameters, S², and their associated side-chain parameters, S²axis, are weak.25 Following Kussell et al.,13 we thus adopt a model in which the peptide backbone is fixed in its crystallographically determined conformation. As a result, applications of our methods are limited to proteins whose native structures have been determined with high resolution.

The sole degrees of freedom in our calculations are dihedral angles χ for rotatable side-chain bonds with heavy-atom (i.e., non-hydrogen) substituents. Other variables are known to influence side-chain entropy,15 but torsional entropy alone is thought to provide a good approximation.19 Natural amino acids possess no more than a handful of such dihedral degrees of freedom. Alanine, for example, has none, while lysine and arginine possess the largest number (four). As in a simple molecule such as propane, local bonding energetics bias such angles to lie in one of typically three ranges. For classification purposes, we consider these ranges as discrete rotameric states, each with an ideal angular value θ. We do, however, permit deviations from these ideal angles, ϕ = χ − θ. We and others have found them to be essential for accommodating tightly packed rearrangements.13,26 The intrinsic energetic penalty Edihedrals limiting such fluctuations in our model is quadratic in ϕ, except for dihedrals between sp2 and sp3 hybridized carbons, where Edihedrals = 0 and is therefore χ-independent. Correspondingly, these bonds possess a single discrete rotamer state.

It is well known from studies of microscopic structure in liquids27 and polymeric materials28 that the most essential feature of non-covalent interactions in dense environments is the harsh repulsion between overlapping moieties. Energetic models that discard constraints of volume exclusion in favor of slowly varying potentials for computational convenience29 are therefore not suitable for our purpose of quantifying side-chain entropy. Nonetheless, the precise dependence of steric interactions on inter-atomic distances is likely unimportant,13,30 provided that penetration becomes prohibitively costly at the appropriate length scale. We employ a Lennard–Jones (LJ) potential between all pairs of heavy atoms separated by at least three bonds, which describes van der Waals attractions in addition to imposing steric constraints.31,32 This interaction is truncated at both small and large distances: For separations larger than twice the LJ diameter, we set the potential energy to zero (and shift the entire potential to maintain continuity at the cutoff); separations smaller than 3/4 the van der Waals contact distance are assigned infinite energy and thus disallowed entirely. This latter modification, introduced for practical reasons, has no physical consequences at reasonable temperatures and densities.

The pairwise interactions we expect to exert the largest influence on side-chain packing are electrostatic in nature, namely, SBs and hydrogen bonds (HBs). We model these energetics based on previous coarse-grained approaches.33,34 Although we make no effort to represent electrostatic forces between residues in great detail, their strength and anisotropy should be appropriate to the chemical variety of natural amino acids.

Finally, we treat hydrophobic effects in terms of the relative amounts of polar and non-polar surface area exposed to solvent. This simplistic implicit solvent (IS) description does not address the sensitivity of aqueous solvation to the spatial distribution of hydrophobic and hydrophilic moieties at the protein surface,35,36 but it does roughly account for the many-body nature of such effects. For this purpose, we utilized an inexpensive but faithful approximation to standard procedures for determining solvent-accessible surface area (SASA).37,38 See Methods for details.

The full potential energy function governing our model sums these various interactions,

$$E(\Theta, \Phi) = E_{\mathrm{dihedrals}} + E_{\mathrm{non\text{-}bonded}} + E_{\mathrm{implicit\ solvent}}. \tag{1}$$

It depends on the set of N torsional angles for all rotatable bonds described above, which we specify through the nearest ideal values, Θ = {θ1, θ2, …, θN}, and deviations about them, Φ = {ϕ1, ϕ2, …, ϕN}. Note that we have collected LJ, SB, and hydrogen-bonding contributions into a total potential Enon-bonded for pairwise-additive, non-bonded interactions. Free parameters in the energies of Eq. (1) were tuned exclusively for the purpose of ensuring that side-chain packing in crystallographic configurations yields energies not much larger than those of alternative arrangements generated in the course of computer simulations. Their values lie well within the range of analogous parameters appearing in other models that attempt a similar level of resolution.

Because it represents steric constraints realistically, our model shares with many other approaches severe challenges to thorough sampling of thermal fluctuations. From typical configurations, it is difficult to rotate a side-chain bond through the ∼120° needed to transit from the neighborhood of one ideal angle to another without introducing steric overlaps. In real systems, such an isolated rotation would incur great energetic cost; in our model, the price is often not even finite. We circumvent this problem with MC sampling procedures that preserve the Boltzmann distribution determined by Eq. (1). Specifically, we employ a modified energy function in which the singular hard core of our van der Waals potential is replaced by a finite constant energy ɛtunnel. Correcting exactly for the resulting bias is trivial, since the relative weights of sterically allowed configurations are unchanged. For many purposes, one need only discard sampled configurations that violate steric constraints (see Methods). The advantage of this artifice is an ability to “tunnel” through disallowed regions of configuration space. If ɛtunnel does not greatly exceed the energy kBT of typical thermal excitations, simulations can move much more readily through the free-energy barriers that frustrate MD. An optimal value of ɛtunnel must also ensure that the proportion of sterically inadmissible states generated by MC simulations is not overwhelmingly large. This procedure can enhance sampling efficiency considerably. Several of the calculations we present nonetheless additionally required adaptive umbrella sampling39 and/or staging through multiple ensembles in which side-chain interactions are gradually introduced (see Methods) to obtain well-converged results with available computing resources.

The model energetics and Metropolis MC methods we have described provide a straightforward and computationally manageable way to characterize side-chain fluctuations quantitatively. By design, our sampling scheme is not dynamically realistic on the time scale of torsional vibrations. Individual trial moves that advance these simulations often switch directly between distinct rotameric states. In the course of natural dynamics, such transitions occur on time scales of picoseconds to milliseconds.40,41 We have found that MC trajectories comprising 50,000 sweeps are sufficient (but not excessive) for sampling a representative set of side-chain rearrangements in small globular proteins (including on the order of 250 rotatable bonds). Exploring the same range of fluctuations using straightforward MD simulations of detailed atomistic models such as CHARMM or AMBER, which proceed in roughly femtosecond steps, would be extremely taxing if not unfeasible. Indeed, previous MD work suggests that the breadth of side-chain motions cannot be reliably gauged from nanosecond trajectories even for very small proteins.42

Results and Discussion

Entropy of side-chain configurations

Absolute entropies are not well defined for continuous classical variables. It is therefore necessary in computing torsional entropy of a model such as ours to specify a standard state. For this purpose, we choose a noninteracting reference system where all dihedral angles are equally likely, E(ref) = 0. All entropies we report are given relative to this maximally flexible system, Δ(ref)S = Sconfig − S(ref)config, where Sconfig is the configurational entropy associated with fluctuations both within and between distinct rotameric wells. This choice of reference state has several merits. First, a state in which motions of one residue are independent from all others serves as a crude proxy for side-chain fluctuations of an unfolded protein. In other words, Δ(ref)S could be thought of as a rough estimate for the change in torsional entropy upon folding. Second, by setting E(ref) equal to a constant, we remove all chemical details distinguishing between different rotating moieties. The reference state consequently has an entropy per rotatable bond, s(ref), that is consistent across proteins of arbitrary composition. Thus, while we can determine side-chain entropies only up to an additive constant, Ns(ref), where N is the total number of rotatable bonds, we ensure that s(ref) has the same value for all proteins we consider. Finally, a noninteracting standard state facilitates ligand affinity calculations based on the thermodynamic cycle shown in Fig. 1. Because non-translational free-energy contributions are invariant when two molecules A and B bind in their noninteracting reference states, association entropies can be computed via

$$\Delta^{(\mathrm{binding})}S = S_{A\cdots B} - (S_A + S_B) = \Delta^{(\mathrm{ref})}S_{A\cdots B} - \left(\Delta^{(\mathrm{ref})}S_A + \Delta^{(\mathrm{ref})}S_B\right). \tag{2}$$

We compute these entropy differences using the corresponding changes in energy and partition function Q,

$$\Delta^{(\mathrm{ref})}S = k_B \ln\!\left(\frac{Q_{\mathrm{config}}}{Q^{(\mathrm{ref})}_{\mathrm{config}}}\right) + \frac{1}{T}\left(\langle E\rangle - \langle E^{(\mathrm{ref})}\rangle^{(\mathrm{ref})}\right). \tag{3}$$

Angled brackets denote equilibrium averages over canonical ensembles at temperature T. (We perform most calculations at T = 300 K.) Lacking superscripts, these brackets refer to the Boltzmann distributions determined by the full energy function of Eq. (1); the superscript “(ref)” refers to statistics of the noninteracting reference system. The ratio of partition functions in Eq. (3) could be evaluated using Zwanzig's formula,

$$\frac{Q_{\mathrm{config}}}{Q^{(\mathrm{ref})}_{\mathrm{config}}} = \frac{\sum_{\Theta}\int d\Phi\, \exp[-\beta E(\Theta,\Phi)]}{\sum_{\Theta}\int d\Phi\, \exp[-\beta E^{(\mathrm{ref})}]} = \left\langle \exp\{-\beta[E(\Theta,\Phi) - E^{(\mathrm{ref})}]\}\right\rangle^{(\mathrm{ref})}, \tag{4}$$

where β⁻¹ = kBT. It is therefore necessary in principle only to sample configurations from the noninteracting system. This approach is not practical, however, since the ensembles defined by E and E(ref) overlap weakly. We overcome this problem with a staging protocol that introduces several intervening ensembles. In these intermediate states, IS and non-bonded interactions are scaled by a parameter 0 < λ < 1 (see Methods).

Fig. 1. Thermodynamic cycle relating the change in entropy upon protein–ligand binding, Δ(binding)S, to the entropic differences between the interacting and the noninteracting reference cases for the bound and unbound species. Note that ΔS = 0 for the binding of the ligand to the protein within the reference system, allowing Δ(binding)S to be calculated as shown in Eq. (2).

Side-chain configurational entropy is commonly discussed in terms of separate contributions from vibrations within a rotameric state (Svib) and from conformational transitions between discrete rotameric states (Sconf).43 Many computational efforts focus exclusively on Sconf, even though recent theoretical studies highlight the importance of vibrational entropy changes in ligand binding.18,44 That a large set of rotameric states becomes accessible only when such vibrations are allowed indicates that these motions are in fact strongly interdependent.13,26 Our calculations of Δ(ref)S make no attempt to treat vibrational and conformational contributions separately. We will, however, describe ways to quantify the variability of one motion, while fixing or integrating out the other.

Entropic losses due to side-chain interactions

We have applied the techniques outlined in the previous section to determine side-chain entropies of 12 small proteins, ranging in size from 46 to 143 residues and exhibiting a diverse set of secondary structures. For each molecule, we have also performed calculations with model energetics that include only a subset of the interaction types described by Enon-bonded and Eimplicit solvent. In this way, we quantify the extent to which different kinds of forces limit torsional freedom in the dense environment of a folded protein. Results for the entropy reduction per rotatable bond, Δ(ref)S/N, are shown in Fig. 2. For each variant of the model, the similarity of Δ(ref)S/N values across the entire set of proteins is striking. Local energetic biases due to covalent bonding, described by Edihedrals and included in all of the interacting systems, result in a significant reduction in entropy. Although weakly dependent on the specific amino acid makeup of the protein, it is found to be quite consistent across the 12 proteins. Of the various interactions considered in isolation, electrostatics yields the largest entropy reduction in most cases. Sterics and van der Waals attractions effect changes similar in magnitude but typically somewhat smaller. Solvation forces, in effect acting only at the periphery of the molecule, contribute least, even though approximately 60% of the residues in
photoactive yellow protein (PYP) are considered solvent-exposed.13 The CaM structure, consisting of two globular regions connected by an extended α-helix,46 retains the most entropy. The steric contribution for CaM is among the smallest in this set of molecules, as might be expected from its relatively open structure, but the isolated effects of other interaction types are not at all atypically weak.

Fig. 2. Total side-chain dihedral entropy per rotatable bond of 12 small proteins, relative to that of a noninteracting reference system [see Eq. (3)]. Results are shown for various combinations of interaction types at 300 K. D refers to a noninteracting reference system that includes only the intrinsic dihedral energy Edihedrals. All other cases include this dihedral potential together with subsets of side-chain interactions: S indicates steric energetics due to the repulsive part of the LJ potential; LJ indicates the full Lennard–Jones potential; IS indicates the implicit solvent; and HBSB indicates the hydrogen bonding and SB interactions. The proteins studied here are barstar (1a19, ref. 45), calmodulin (3cln, ref. 46), crambin (1cbn, ref. 47), eglin c (1cse, ref. 48), GB3 (1igd, ref. 49), protein L (1hz6, ref. 50), PYP (1f9i, ref. 51), PZD2 (1r6j, ref. 52), SH2 (1d1z, ref. 53), CspA (1mjc, ref. 54), ubiquitin (1ubq, ref. 55), and tenascin (1ten, ref. 56). These results were calculated using Metropolis MC.57 Five trials starting from different randomly chosen side-chain configurations were run for 50,000 MC sweeps each when calculating 〈E(Θ, Φ)〉 and for 17 stages of three 20,000-sweep trials each when calculating Q/Qref. Error bars represent one standard deviation.

Averaging the entropy reduction per rotatable bond over this set of proteins yields Δ(ref)S/N = −5.2 kJ/(mol·300 K). As we have noted, one might regard the noninteracting reference system as a schematic representation of the unfolded state, whose side-chain rotations should be considerably less restricted than in a native fold. A more faithful description of the unfolded state would include the local biases of Edihedrals, which operate regardless of non-covalent structure. Accounting for the corresponding entropic reduction, averaged over the set of proteins we consider, of −1.8 kJ/(mol·300 K), we obtain a typical difference of Δ(ref,dihedrals)S/N = −3.6 kJ/(mol·300 K) between the noninteracting state with restrained dihedrals, denoted by a superscript “(ref,dihedrals),” and the fully interacting state. The change in entropy upon folding due to side-chain conformational fluctuations has been estimated from several different approaches, leading to a consensus figure of ≈−2.1 kJ/(mol·300 K) per rotatable bond.43 That Δ(ref,dihedrals)S/N exceeds this value in magnitude is not surprising. Viewed as an approximation of an unfolded protein, our reference state, even with dihedral restraints, certainly overestimates torsional freedom. Further, rigidity of the peptide backbone would likely cause our model to underestimate torsional freedom of the folded state. Despite these limitations, the two values are nonetheless well within kB of one another. We consider this correspondence an assuring sign that our model captures the basic physical determinants of side-chain entropy correctly.

Over the set of fully globular proteins (which excludes CaM) we have studied, results for Δ(ref)S/N range from −5.02 kJ/(mol·300 K) for protein L to −5.66 kJ/(mol·300 K) for eglin C. Since the standard state is equivalent in all cases, the range of absolute torsional entropies per rotatable bond, Sconfig/N, is identical in breadth to the range in Δ(ref)S/N. Judging from these 12 proteins, natural variations in native side-chain environments can easily shift torsional entropies by an amount δSconfig/N ≈ 0.6 kJ/(mol·300 K). [Note that when the values of Δ(ref,dihedrals)S/N for protein L and eglin C are compared, the difference is ≈0.7 kJ/(mol·300 K), indicating that this difference is not simply due to differing numbers of rotatable sp3–sp3 hybridized bonds.] In the context of protein binding thermodynamics, this result provides a rough gauge for the potential strength of entropic driving forces. If, for example, two globular proteins form a complex whose interface is comparable to the internal structure of typical native folds, overall side-chain entropy may nonetheless change by as much as N (0.6 kJ/(mol·300 K)). For a complex with 100 residues, this maximum change in total entropy would amount to a substantial 10² kJ/(mol·300 K).
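
The unit conventions behind these figures can be checked with a few lines of arithmetic. The Python sketch below converts entropies quoted in kJ/(mol·300 K) into units of kB and evaluates the upper-bound estimate N × 0.6 kJ/(mol·300 K). The number of rotatable bonds per residue is an assumption made only for illustration (the text gives the 100-residue size and the order-of-magnitude result, not this ratio), and the function names are hypothetical.

```python
# Gas constant in kJ/(mol*K); R*T converts molar energies into units of k_B*T.
R_KJ_PER_MOL_K = 8.314462618e-3
T_KELVIN = 300.0

def to_kB_units(entropy_kj_per_mol_300K):
    """Convert an entropy quoted in kJ/(mol*300 K) into units of k_B."""
    return entropy_kj_per_mol_300K / (R_KJ_PER_MOL_K * T_KELVIN)

def max_entropy_shift(n_residues, bonds_per_residue, shift_per_bond=0.6):
    """Upper-bound entropy change N * shift_per_bond, in kJ/(mol*300 K)."""
    n_bonds = n_residues * bonds_per_residue
    return n_bonds * shift_per_bond

# Per-bond values quoted in the text, in kJ/(mol*300 K).
model_vs_restrained_ref = -3.6   # Delta(ref,dihedrals)S/N
literature_consensus = -2.1      # consensus folding estimate (Ref. 43)

gap = abs(model_vs_restrained_ref - literature_consensus)
print(f"gap between the two estimates: {to_kB_units(gap):.2f} k_B per bond")

# ASSUMPTION: 1.7 to 2.0 rotatable bonds per residue, chosen for illustration.
for ratio in (1.7, 2.0):
    total = max_entropy_shift(100, ratio)
    print(f"{ratio} bonds/residue -> {total:.0f} kJ/(mol*300 K)")
```

Under these assumed ratios the bound stays on the order of 10² kJ/(mol·300 K), and the gap between the two per-bond estimates comes out below one kB, in line with the statements above.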

Side-chain entropic contributions to CaM–ligand binding

Calorimetry provides unambiguous evidence for strong entropic contributions to protein binding equilibria.12,24 CaM, for example, binds a series of peptides with similar affinities, but with widely varying entropies of association.12 For the specific ligand CaMKKα(p), the contribution to the free energy of binding due to entropy alone is nearly 100 kJ/mol, but it is not clear how such entropic changes are distributed among the degrees of freedom associated with solvent, peptide backbone, and amino acid side-chains. The role of side-chain rotations in CaM binding thermodynamics has recently been explored by estimating torsional entropy from NMR order parameters.12 Although the connection between S²axis and side-chain entropy is not precise, and although this estimate, of necessity, neglects correlated fluctuations of different residues and fluctuations that take place on time scales longer than those detected in the relaxation experiment, thermodynamic trends were successfully predicted.12 Specifically, Frederick et al. found a linear correlation between calorimetric results for Δ(binding)S and those computed from NMR data, with a slope of 0.51 and correlation coefficient r = 0.88. Side-chain contributions to CaM affinity thus appear considerable.

Our approach provides a way to estimate side-chain contributions to Δ(binding)S without the assumptions inherent when inferring thermodynamic behavior from NMR order parameters. This CaM–peptide system thus serves as a test both of the methods we have developed and of the notion that torsional fluctuations can play an essential role in peptide binding. We focus on the four peptides considered in Ref. 12 for which calorimetric data12,58,59 and high-resolution structures are available: CaMKKα (1ckk, ref. 60), smMLCK (1cdl, ref. 61), CaMKI (1mxe, ref. 62), and eNOS (1niw, ref. 63). The thermodynamic cycle in Fig. 1 was used to calculate Δ(binding)S from our Δ(ref)S calculations of the bound and unbound CaM and ligand species. The entropy of unbound CaM was computed using the globular structure of Ref. 64 (1prw). The backbone conformation of each ligand when co-crystallized with CaM was used as well for the unbound peptide.

Binding entropies determined by our model match the trend of experimental data as well as their overall scale, as shown in Fig. 3. In particular, the experimental order Δ(binding)S[CaMKKα(p)] < Δ(binding)S[smMLCK(p)] < Δ(binding)S[CaMKI(p)] < Δ(binding)S[eNOS(p)] is correctly reproduced, although the difference between smMLCK(p) and CaMKKα(p) cannot be resolved within statistical errors. Correlation between computed values and experimental measurements exceeds reasonable expectations, given our exclusive focus on side-chain contributions and neglect of backbone fluctuations. We emphasize that model parameters were not adjusted to obtain this agreement. Neither is the correspondence a trivial consequence of peptide size and composition; the complexes we have studied possess similar numbers of rotatable bonds (between N = 281 and N = 286) and rank differently by N and by Δ(binding)S. Furthermore, calculations employing reduced sets of interaction types in many cases compare poorly with experiment. Our results thus bolster the conclusion of Ref. 12 that side-chain torsional rearrangements constitute a major, if not dominant, source of CaM binding entropy.

Heterogeneous distribution of side-chain entropy

Though ordered, a folded protein is structurally heterogeneous on all scales from atomic to macromolecular. One might expect that the rotational freedom of side chains is similarly nonuniform. Indeed, the fluctuation spectrum of a protein's interior has been likened to that of a solid, while the exposed surface is often considered fluid.65 Our calculations reveal spatial patterns of torsional variability that are not nearly as simple as this conjecture would suggest.

Fig. 3. Entropic contributions to the binding free energies −TΔ(binding)S for four CaM–peptide complexes. Results of MC simulations are plotted against corresponding calorimetric measurements from Ref. 12. Unbound CaM is shown on the plot as the reference point at (0,0). For the CaM–peptide complexes, 〈E(Θ, Φ)〉 was calculated using a WL bias in six sets of 10 trials, each with 90,000–100,000 sweeps. Average values were calculated within each set and errors were calculated across the six sets. For unbound CaM and peptides, Metropolis MC was sufficient to calculate 〈E(Θ, Φ)〉, and 5 trials of at least 50,000 sweeps each were performed. A WL bias was also used to calculate 〈Δ(C)〉(tunnel) and the first ratio of partition functions on the right-hand side of Eq. (20) in three sets of 10 trials for each of the CaM–peptide complexes. Metropolis MC was used to calculate the remaining 22 stages in the Q/Qref calculation of the CaM–peptide complexes, as well as the full Q/Qref calculation for unbound CaM (in 26 stages) and the unbound peptides (in 17 stages). Each stage included 3 trials of 20,000 sweeps. Averages and errors were calculated between the three independent calculations of Q/Qref. Error bars represent one standard deviation. Errors in the calorimetric measurements of TΔ(binding)S are ≤ 1.0 kJ/mol.12

We do find, on average, that side chains of surface residues are less tightly constrained by native interactions than are those of the interior. However, there are many exceptions, and simple features of crystallographic structures such as secondary structure and packing density do not reliably foreshadow the extent of local side-chain fluctuations.25

As a measure of local torsional variability, we consider the Gibbs entropy $S_i^{(\mathrm{res})}$ associated with a single residue's notionally discrete rotameric states,

$$S_i^{(\mathrm{res})} = -k_B \sum_{\Theta_i} p(\Theta_i) \ln p(\Theta_i), \tag{5}$$

where $\Theta_i = \{\theta_1^{(i)}, \ldots, \theta_{N_i}^{(i)}\}$ denotes the set of ideal torsion angles for each of the $N_i$ rotatable sp3−sp3 hybridized side-chain bonds belonging to residue i. The populations $p(\Theta_i)$ of these $3^{N_i}$ states are determined in simulations by constructing a histogram over sterically allowed configurations. Effectively integrating out torsional fluctuations $\Phi_i = \{\phi_1^{(i)}, \ldots, \phi_{N_i}^{(i)}\}$ within each ideal rotameric state, we focus on discrete degrees of freedom with a manageable set of possible realizations. As a result, we can calculate converged, absolute values of $S_i^{(\mathrm{res})}$. This analysis focuses explicitly on "conformational" contributions to entropy. Others have calculated analogous quantities for different models,43 some lacking vibrational fluctuations altogether.21 In our calculations, coupling between conformational and vibrational motions, and between rearrangements of different residues, is implicit in the weights $p(\Theta_i)$.

We have computed $S_i^{(\mathrm{res})}$ for all residues in each of the small proteins listed in Fig. 2, and for several subsets of the interaction types in Eq. (1). Here, we present and discuss in detail results only for PYP, whose behavior is typical of the entire set of molecules. Figure 4 illustrates the complex spatial distributions of rotational freedom generated by our model. It also demonstrates that different interaction types limit side-chain rearrangements in different ways.

Values of $S_i^{(\mathrm{res})}$ are indicated in Fig. 4 by the coloring of residues within PYP's three-dimensional structure. Although side chains are shown in their crystallographic configurations, it is fluctuations away from this ideal packing that determine the local entropies depicted. The color scale varies by residue according to its maximum possible value of $S_i^{(\mathrm{res})}$. Bright red corresponds to this maximum value ($S_i^{(\mathrm{res})} = k_B N_i \ln 3$), while dark blue signifies an absence of rotamer variability ($S_i^{(\mathrm{res})} = 0$). Residues that possess no rotatable bonds are colored blue, though Eq. (5) is not well defined in this case.

Results for our noninteracting reference system are shown in Fig. 4a. Lacking any bias on side-chain configuration, all residues with rotatable bonds exhibit their maximum local entropy and are thus colored red. Figure 4b–e correspond to different subsets of interaction types, each including the basic local energetics $E_{\mathrm{dihedrals}}$ of torsional rotations. Figure 4f shows results for the full model potential of Eq. (1).

Of the interaction types we consider, the combination of steric constraints and van der Waals attractions affects local entropy in ways most similar to the solid/liquid caricatures of a protein's interior/exterior (see Fig. 4b). However, even in this case, the entropic distinction between exposed and buried

residues is not clear-cut. By itself, the IS energy has an opposing effect, significantly limiting the motion of only those residues that can be readily accessed by solvating water molecules (see Fig. 4c). Electrostatic interactions exert a rather different influence on patterns of torsional freedom (see Fig. 4e), in isolation affecting only those residues that donate/accept HBs or participate in SBs. The directionality of these forces, as well as the fact that HB partners may reside on the peptide backbone, begets restrictions on side-chain motion that are, in general, much more localized and anisotropic than those due to other interaction types. The net effect of all these interactions, when operating simultaneously, is a local entropy much reduced from that of the reference state and distributed throughout the structure much less smoothly than would be expected from the notion that buried residues adopt unique rotamer states (see Fig. 4f).

Fig. 4. Side-chain conformational entropy, $S_i^{(\mathrm{res})}$ [see Eq. (5)], for all residues i in PYP (1f9i51) for various kinds of interactions. The side-chains are color coded according to each residue's value of $S_i^{(\mathrm{res})}$, with red indicating the residue's maximum entropy and blue indicating its minimum. $S_i^{(\mathrm{res})}$ values have been normalized by the number of sp3–sp3 hybridized rotatable bonds. (a) Noninteracting reference system (blue residues indicate amino acids without rotatable bonds), (b) LJ interactions, (c) IS interactions, (d) both LJ and IS interactions, (e) HB and SB interactions, and (f) all interactions. All interacting runs include the effects of $E_{\mathrm{dihedrals}}$. Results were calculated using Metropolis MC for five independent trials, each run for 50,000 MC sweeps. Images were made using MacPyMOL.66

These same interactions restrict torsional vibrations as well, whose variety is essentially overlooked by the local entropy of Eq. (5). This neglect is reasonable for assessing rotamer flexibility in qualitative terms but does not suffice for quantifying the magnitude of entropic driving forces. As an example, the total side-chain rotational entropy of a protein can be estimated by summing local entropies over all residues, $S = \sum_i S_i^{(\mathrm{res})}$. Discarding contributions from vibrational fluctuations in this manner diminishes computed CaM-peptide binding entropies by nearly 60%. Nevertheless, the reduction of TΔ(binding)S estimates is consistent in magnitude across the peptide ligands we have studied, so that correlation with experimental data remains strong, with a correlation coefficient of r=0.96.

In a separate approach to quantifying the importance of torsional vibrations, we consider a new reference state, denoted by "(ref′,dihedrals)," in which rotamers do not interact but are nonetheless constrained to a single set of ideal dihedral angles Θ and are governed by the dihedral potential. We can thus estimate the loss of entropy $\Delta^{(\mathrm{ref}',\mathrm{dihedrals})}S_{\mathrm{config}}$ solely due to restrictions on vibrational motion resulting from interactions. These values closely mirror the results shown in Fig. 2, but on a scale smaller by roughly a quarter.

Entropies of the reference systems we have considered are simply related, $S_{\mathrm{config}}^{(\mathrm{ref})} - S_{\mathrm{config}}^{(\mathrm{ref}')} = k_B N^{(sp^3)} \ln 3$, where $N^{(sp^3)}$ is the total number of rotatable sp3–sp3 hybridized bonds. A thermodynamic cycle can thus be used to connect interacting systems differing by constraints on ideal dihedral angles. In this way, we find that fixing Θ in the fully interacting model effects an entropy loss of ≈0.8 kJ/(mol·300 K) per rotatable sp3–sp3 hybridized bond.

Comparisons to experimentally determined side-chain fluctuations

Comparisons between these detailed local entropies and experimental data are ambiguous in several respects. As we have noted, the quantity $S_i^{(\mathrm{res})}$ discards contributions of torsional vibrations we have found to be numerically significant. On the experimental side, currently feasible measurements at this level of resolution can only be related to thermodynamics in approximate ways. NMR order parameters, for example, are sensitive only to the range of rotational fluctuations that function on picosecond to nanosecond time scales.3 Dipolar coupling and J-coupling experiments report on longer time scale motions, but for side-chains, they are generally only applied to the rotatable bond closest to the peptide backbone (whose dihedral angle is denoted $\chi_1$).67 Despite these limitations, we employ methyl order parameters and $\chi_1$ rotamer populations as rough points of comparison for our computer simulations.

Fig. 5. Side-chain NMR order parameters, $S^2_{\mathrm{axis}}$, and $\chi_1$ rotameric populations for eglin c (1cse48). Results of MC sampling plotted against NMR-derived measurements. (a) Comparison of the MC and NMR10 methyl group order parameters. (b) Comparison of $\chi_1$ rotamer state populations determined from MC sampling and experimental three-bond J-coupling constants.10 Five independent trials were run for 50,000 sweeps each using Metropolis MC. Error bars represent one standard deviation.
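As an illustration of the local entropies discussed in this section, the following minimal sketch evaluates the rotamer entropy of Eq. (5) from a histogram of sampled rotamer assignments, its upper bound of ln 3 per rotatable sp3–sp3 bond, and the vibration-free total obtained by summing over residues. All function names and the sample data are hypothetical; entropies are returned in units of k_B, and the conversion helper assumes the kJ/(mol·300 K) convention used in the text.

```python
import math
from collections import Counter

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol K)


def residue_rotamer_entropy(samples):
    """Eq. (5) in units of k_B: -sum over observed states of p ln p.

    `samples` is a sequence of hashable rotamer assignments, e.g. tuples with
    one entry (0, 1, or 2) per rotatable sp3-sp3 bond of the residue.
    """
    counts = Counter(samples)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def max_rotamer_entropy(n_bonds):
    """Upper bound N_i ln 3 (units of k_B) for n_bonds rotatable bonds."""
    return n_bonds * math.log(3)


def total_rotamer_entropy(samples_by_residue):
    """Vibration-free estimate S = sum_i S_i^(res), in units of k_B."""
    return sum(residue_rotamer_entropy(s) for s in samples_by_residue.values())


def to_kj_per_mol(entropy_in_kb, temperature=300.0):
    """Convert an entropy in units of k_B to T*S in kJ/mol at `temperature` (K)."""
    return R_KJ_PER_MOL_K * temperature * entropy_in_kb


# Hypothetical sampled states for two residues (not data from the paper).
samples = {
    "res_A": [(0,), (0,), (0,), (1,)],                      # one bond, mostly one rotamer
    "res_B": [(a, b) for a in range(3) for b in range(3)],  # two bonds, uniform over 9 states
}
normalized = {
    name: residue_rotamer_entropy(s) / max_rotamer_entropy(len(s[0]))
    for name, s in samples.items()
}
```

The `normalized` values run from 0 (a single rotamer state) to 1 (uniform population of all states), which corresponds to the blue-to-red scale described for Fig. 4. In the units used in the text, the bound of ln 3 per bond amounts to roughly 2.74 kJ/(mol·300 K).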

Figure 5 presents results for the specific protein eglin c, for which both methyl group order parameters and $\chi_1$ rotamer populations have been experimentally determined.10

Values of $S^2_{\mathrm{axis}}$ derived from NMR data10 and those calculated from MC simulations (using the approach described in Ref. 68), both shown in Fig. 5a, are modestly correlated (r=0.66). Previous simulation results obtained from 50-ns MD trajectories for a detailed model of calbindin have matched experimental measurements more closely, but not dramatically so (r=0.8).42 Numerical calculations for eglin c that utilize a sampling procedure inconsistent with Boltzmann statistics generate still stronger correlation,69 perhaps highlighting the sensitivity of $S^2_{\mathrm{axis}}$ to very sluggish rearrangements. The result of such comparisons indicates that side-chain fluctuations are overly restricted in our model, as might be expected from the neglect of backbone flexibility. Alanine orientation, for example, is completely fixed in our model, yielding $S^2_{\mathrm{axis}} = 1$ identically. The NMR result for alanine in eglin c, $S^2_{\mathrm{axis}} = 0.8$,10 points to non-negligible effects of backbone motion, although such effects appear to correlate weakly with measured side-chain fluctuations.25

Populations of distinct $\chi_1$ rotamer states inferred from experiment10 also agree reasonably (but not strikingly) well with results from our simulations (see Fig. 5b). The dearth of probabilities between 0.1 and 0.9 indicates that these bonds are strongly biased toward one rotameric state. This fact should not, however, be taken as a sign of overall torsional rigidity. Bonds that are not proximal to the backbone show greater variability. Indeed, in a typical configuration of our model, roughly one-sixth of the rotatable sp3–sp3 hybridized side-chain bonds in eglin c adopt an ideal dihedral angle $\theta_i$ different from the most probable.

Importance of model interactions and thorough sampling

The high correlation between experimental and calculated TΔ(binding)S values in the CaM–ligand system suggests that our model includes the interactions most essential for describing side-chain fluctuations within the folded protein. We emphasize the importance of considering energetics beyond those imposing steric constraints, despite the dense environment; when non-steric interactions are omitted, calculated entropies correlate only moderately with calorimetric measurements. Similarly, neglecting inter-residue correlations and intra-rotameric fluctuations substantially reduces the quantitative correspondence with experimental data. These results strongly recommend models of side-chain thermodynamics that include intra-rotameric fluctuations18,44 and respect not only constraints of packing but also the diversity and broad energy spectrum of sterically allowed configurations.

Our MC sampling methods probe diverse side-chain configurations that may be difficult to access using more straightforward sampling methods. Notably, NMR order parameters, $S^2_{\mathrm{axis}}$, estimated from a 5-ns MD trajectory of barstar resemble our MC results more closely than do those determined from only 250 ps of time evolution.4 The diversity of side-chain packings we have identified suggests an important role for still slower fluctuations. Even with our MC sampling procedure, obtaining converged results for CaM–peptide TΔ(binding)S values requires the implementation of advanced techniques such as staging and the use of Wang–Landau (WL) procedures. This necessity highlights the limitations associated with calculating entropies from MD simulations alone, as has been attempted previously.15–18 In one study on protein–protein binding, several shorter MD trajectories were run in order to improve the convergence of calculated binding entropies, but the errors were still quite large.70

Combining MC and MD techniques might provide an optimal approach for exploring structural excursions broadly while preserving the dynamical character of short-time relaxation.71 Capturing the time dependence of the slowest side-chain rearrangements, which in our model must navigate severe dynamical bottlenecks, will likely require importance sampling in trajectory space.72

Conclusions

We have examined spontaneous side-chain fluctuations in several folded proteins using computer simulations that sample all side-chain torsional degrees of freedom simultaneously. Overall, our MC method facilitates exploration of rearrangements that proceed sluggishly in the course of natural dynamics, and our model appears to successfully capture the physical character of these variations. Their extent is likely underestimated due to backbone constraints, rendering conclusions about their thermodynamic significance conservative.

We have assessed the impact of various interaction types in restricting the range of side-chain motions, by quantifying entropy reductions relative to a noninteracting reference system. The ability to probe these interactions separately is a strength of our computational approach that would be difficult to mimic experimentally. These reductions, normalized by the number of rotatable bonds, are remarkably consistent among the 12 proteins we have considered, despite significantly heterogeneous distributions of rotational freedom. Under the collected influence of steric, dispersive, and electrostatic forces, globular proteins in our model possess, on average, an entropy per rotatable bond of 5.2±0.2 kJ/(mol·300 K) less than their noninteracting counterparts.

Our binding entropy calculations for CaM–peptide complexes, which correlate strongly with calorimetric measurements, underscore the thermodynamic importance of side-chain torsional freedom. They also hint at the possibility that correlated side-chain fluctuations could communicate structural change over significant distances. Indeed, NMR studies show that effects of side-chain mutation or ligand binding on side-chain methyl dynamics can extend far from the site of perturbation.10,73 The computational tools we have presented are well suited to explore this unconventional mechanism for protein allostery.

Methods

Model

The potential energy function governing side-chain fluctuations in our model is a sum of three physically distinct contributions: from the local torsional bias of covalent bonding ($E_{\mathrm{dihedrals}}$), from direct interactions between non-bonded moieties ($E_{\text{non-bonded}}$), and from the free energy of aqueous solvation ($E_{\text{implicit solvent}}$).

The local dihedral energy $E_{\mathrm{dihedral},i}$ of a rotatable bond i depends on its hybridized geometry. Since ideal angles are difficult to identify for sp3−sp2 hybridized rotatable bonds,74 we impose no intrinsic bias on the corresponding rotations, that is, $E_{\mathrm{dihedral},i} = 0$ independent of $\chi_i$ for these bonds. For the more prevalent sp3–sp3 rotatable bonds, the dihedral energy function is constructed so that the Boltzmann weight $\exp(-\beta E_{\mathrm{dihedral},i})$ is a sum of (unnormalized) Gaussian distributions centered at ideal rotamer angles $\theta_i$,

$$E_{\mathrm{dihedral},i} = -k_B T \ln\left[\sum_{\theta_i} \exp\left(-\frac{(\chi_i - \theta_i)^2}{2\sigma^2}\right)\right]. \tag{6}$$

We parameterize this function through the approximate width of empirical distributions of side-chain torsional rotations, σ=12.7°, as found in the rotamer library.74 For each sp3–sp3 bond, three ideal values of $\theta_i$ are assigned using data from Ref. 74 (see Supplemental Material). Since the range of $\chi_i$ is unbounded in our simulations, each of these three ideal values is in fact repeated with a period of 2π; that is, Gaussian distributions in Eq. (6) are centered at $\theta_i$, $\theta_i \pm 2\pi$, $\theta_i \pm 4\pi$, …. The strongest overlap among these distributions is between neighboring ideal angles; however, in practice, σ is sufficiently small compared to the spacing between ideal values that the overlap is extremely weak. Neglecting this overlap entirely, we could consider $E_{\mathrm{dihedral},i}$ as a piecewise continuous superposition of quadratic functions centered at each ideal rotamer angle. With this approximation, a protein's total intrinsic dihedral energy can be written

$$E_{\mathrm{dihedrals}}(\Phi) \approx k_B T \sum_i h_i \frac{\phi_i^2}{2\sigma^2}, \tag{7}$$

where $\phi_i = \min_{\theta_i}(\chi_i - \theta_i)$ is the deviation of dihedral angle $\chi_i$ from its nearest ideal rotamer angle, $\theta_i$. The indicator function $h_i$ takes values of $h_i = 1$ if bond i is sp3–sp3 hybridized and $h_i = 0$ if bond i is sp3−sp2 hybridized. The exact function $E_{\mathrm{dihedrals}} = \sum_i E_{\mathrm{dihedral},i}$ is a slightly smoothed version of Eq. (7), more closely resembling the detailed dihedral potentials used in CHARMM and AMBER.

Our model includes non-bonded interactions due to sterics and van der Waals attractions, due to SBs, and due to hydrogen bonding:

$$E_{\text{non-bonded}}(\Theta, \Phi) = \sum_{i \neq j}\left[L_{ij}(r_{ij}) + \frac{q_i q_j}{K(r_{ij})\, r_{ij}} + H_{ij}(r_{ij}, \Psi)\right]. \tag{8}$$

We denote the distance between heavy atoms i and j as $r_{ij}$, their charges as $q_i$ and $q_j$, and the set of angles describing their HB geometry as Ψ. The factor K(r) = 0.124r Å mol/kJ accounts empirically for the screening of ionic interactions in the heterogeneous environment of a protein's interior.31,34

We represent steric as well as dispersion interactions between heavy atoms using a modified LJ potential

$$L_{ij}(r_{ij}) = \begin{cases} \infty, & r_{ij} < r_{ij}^{*} \\ \epsilon_{ij}\left[\left(\dfrac{r_{ij}^{\min}}{r_{ij}}\right)^{12} - 2\left(\dfrac{r_{ij}^{\min}}{r_{ij}}\right)^{6} + \alpha\right], & r_{ij}^{*} \le r_{ij} < 2\sigma_{ij} \\ 0, & r_{ij} \ge 2\sigma_{ij}. \end{cases} \tag{9}$$

We set the distance $r_{ij}^{\min}$ of minimum energy to be the sum of van der Waals radii for atoms i and j (taken from Ref. 32). Attraction strengths $\epsilon_{ij}$ are taken from Ref. 31. The smooth decay at long distances is truncated at $r_{ij} = 2\sigma_{ij}$, where $\sigma_{ij} = (1/2)^{1/6} r_{ij}^{\min}$, and the entire potential is shifted by the constant α = 0.0615 so that the potential is continuous; that is, $\lim_{r_{ij} \to 2\sigma_{ij}} L_{ij} = 0$. We describe the harsh repulsion at short distances ($r_{ij}^{*} = 0.75\, r_{ij}^{\min}$) with a hard sphere potential (rather than the sharp but smooth $r^{-12}$ of LJ) for sampling purposes as described below. Also, toward that end, we define a non-singular version of the steric interaction

$$L_{ij}^{(\mathrm{tunnel})}(r_{ij}) = \begin{cases} \epsilon^{(\mathrm{tunnel})}, & r_{ij} < r_{ij}^{*} \\ L_{ij}(r_{ij}), & r_{ij} \ge r_{ij}^{*}. \end{cases} \tag{10}$$

The superscript "(tunnel)" accompanying other quantities indicates usage of $L^{(\mathrm{tunnel})}$ in place of L. Steric repulsions and dispersion attractions are only considered for atoms separated by at least three bonds.

Hydrogen bonding between the donors and acceptors specified in Table 1 of Ref. 33 is described by a potential adapted from Ref. 31,

$$H_{ij}(r_{ij}, \Psi) = \begin{cases} D_0\left[5\left(\dfrac{r_{ij}^{\mathrm{HB}\min}}{r_{ij}}\right)^{12} - 6\left(\dfrac{r_{ij}^{\mathrm{HB}\min}}{r_{ij}}\right)^{10} + \eta\right] F(\Psi), & r_{ij}^{\mathrm{HB}*} \le r_{ij} < 4.0\ \text{Å} \\ 0, & \text{otherwise}. \end{cases} \tag{11}$$

The strength $D_0$ = 18 kJ/mol of a perfectly aligned HB was chosen such that the total energies of crystallographic structures lie within the energy range of typical repacked structures. Averaged over fluctuations in donor–acceptor geometry, the resulting dissociation energy amounts to roughly 11 kJ/mol when the full potential is considered for PYP. The donor–acceptor distance $r_{ij}^{\mathrm{HB}\min}$ = 2.75 Å of minimum hydrogen-bonding energy was taken from Ref. 31. $r_{ij}^{\mathrm{HB}*}$ is set to 2.52 Å, and the entire potential is again shifted by a constant η = 0.0858 to preserve continuity.

Orientation dependence of this model potential is determined by a set Ψ of three angles. In terms of the unit vectors $\hat{u}_{DA}$ pointing from the donor D to the acceptor A, $\hat{u}_{DD'}$ pointing from the donor to its nearest bonded heavy atom D′, and $\hat{u}_{AA'}$ pointing from the acceptor to its nearest bonded heavy atom A′, these angles are defined as $\psi_D = \cos^{-1}(\hat{u}_{DA} \cdot \hat{u}_{DD'})$ and $\psi_A = \cos^{-1}(\hat{u}_{AD} \cdot \hat{u}_{AA'})$. $\psi_n$ is the angle between the normals of the planes defined by (D,D′,D″) and (A,A′,A″), where A″ and D″ are the next antecedent heavy atoms, bound either to the acceptor's or the donor's nearest bound neighbor or to the acceptor

or donor itself. See Ref. 33 and Fig. 1 for details.31,33,34 We take their influence to be multiplicatively separable,

$$F(\psi_D, \psi_A, \psi_n) = f_D(\psi_D)\, f_A(\psi_A)\, g_n(\psi_n), \tag{12}$$

where

$$f_D(\psi_D) = \begin{cases} \cos^2(\psi_D - \psi_D^{*}), & 90^\circ \le \psi_D \le 180^\circ \\ 0, & \text{otherwise}. \end{cases} \tag{13}$$

For donors that are sp2 hybridized, $\psi_D^{*}$ = 120°, while for sp3 donors, $\psi_D^{*}$ = 109.5°. The function $f_A(\psi_A)$ differs from $f_D(\psi_D)$ only in the range 60° ≤ $\psi_A$ ≤ 180° over which it is nonzero, and only for sp3 hybridized acceptors. Finally,

$$g_n(\psi_n) = \begin{cases} 0, & \psi_n > 60^\circ \text{ and the donor is } sp^2 \text{ hybridized} \\ 1, & \text{otherwise}. \end{cases} \tag{14}$$

Hydrogen bonding between protein and solvent is allowed when a side-chain donor or acceptor has not formed its maximum number of HBs33 with other protein donors or acceptors. Contributions of these bonds to the pairwise interaction energy $E_{\text{non-bonded}}$ are small, favorable by exactly 2 kJ/mol in all cases, with no distance or angular component. More substantial effects of these bonds are subsumed in $E_{\text{implicit solvent}}$, whose strength is determined by the energy of sequestering non-polar atoms from solvent by exposing polar moieties instead.

We represent solvent–protein interactions primarily according to the composition of SASA

$$E_{\text{implicit solvent}}(\Theta, \Phi) = \gamma\, A_{\text{non-polar}}(\Theta, \Phi), \tag{15}$$

where γ = 0.3 kJ/(mol Å²) is the surface tension of a hydrocarbon–water interface.37 The exposed non-polar area $A_{\text{non-polar}}(\Theta, \Phi)$ is calculated using a computationally inexpensive implementation of the Shrake–Rupley algorithm.38 Fifty points are placed at uniform density on a sphere centered at each heavy atom, with a radius R equal to the sum of its van der Waals radius and that of a water molecule. We then determine the fraction x of such points that lie outside all spheres centered on neighboring atoms. A non-polar atom's contribution to SASA is computed as $4\pi R^2 x$. For the crystal structure of PYP, this estimate differs from values obtained with the more taxing but exact method GETAREA75 by only 1.5% of the total surface area for typical heavy atoms and by 1.0% of the total surface area for the final value of $A_{\text{non-polar}}$.

Within the framework of this model, glycines, alanines, and prolines possess no degrees of freedom and therefore cannot contribute to the overall entropy. In addition, residues that participate in disulfide bridges, those residues binding to Ca2+ in CaM, and the residue attached to the chromophore in PYP are considered to have no rotatable bonds. All bond lengths and angles are taken directly from the Protein Data Bank structures for each protein, and in all cases when more than one structure is resolved, the first structure is always used. Within the crystalline unit, the most complete structures were used. In crystal structures where non-standard amino acids are used to assist in crystallization or phasing, we mutate those residues back to standard amino acids before sampling. Unresolved residues at the N- or C-termini were not included in the modeling, while unresolved side chains or atoms among the resolved portion of each protein were arbitrarily assigned appropriate initial positions.

We developed this model using only PYP and protein L for testing and refining. No potential refinement was done to optimize results for eglin c, CaM, or CaM–ligand complexes.

Sampling

All computer simulations were performed in canonical ensembles permissive of steric overlaps, that is, according to the regularized potential $E^{(\mathrm{tunnel})}$. Physical quantities of interest must be calculated for the full potential E, which, of course, precludes steric clashes. We have constructed these two potentials such that converting computed averages $\langle\cdot\rangle^{(\mathrm{tunnel})}$ of an arbitrary observable into physically realistic averages $\langle\cdot\rangle$ is a straightforward task. Let C be the number of hard steric overlaps (instances of $r_{ij} < r_{ij}^{*}$) in a given configuration. It is simple to show that

$$\langle\cdot\rangle = \frac{\langle\cdot\,\Delta(C)\rangle^{(\mathrm{tunnel})}}{\langle\Delta(C)\rangle^{(\mathrm{tunnel})}}, \tag{16}$$

where the indicator function, Δ(x)=1 for x=0 and Δ(x)=0 otherwise, effectively imposes steric constraints. Similarly, partition functions for the two ensembles are related by $Q = Q^{(\mathrm{tunnel})}\langle\Delta(C)\rangle^{(\mathrm{tunnel})}$.

Poor overlap between the canonical ensemble of interest and that of the noninteracting reference state requires that the ratio of partition functions $Q/Q^{(\mathrm{ref})}$ in Eq. (4) be computed in stages. To avoid performing simulations with hard steric constraints, we first make use of the above result for the regularized partition function,

$$\frac{Q}{Q^{(\mathrm{ref})}} = \frac{Q}{Q^{(\mathrm{tunnel})}}\,\frac{Q^{(\mathrm{tunnel})}}{Q^{(\mathrm{ref})}} = \langle\Delta(C)\rangle^{(\mathrm{tunnel})}\,\frac{Q^{(\mathrm{tunnel})}}{Q^{(\mathrm{ref})}}. \tag{17}$$

The statistical consequences of adding the dihedral potential $E_{\mathrm{dihedrals}}$ to our reference system can be evaluated with little computational effort, since no coupling among different rotatable bonds is involved. We can even calculate the corresponding ratio of partition functions analytically, $Q^{(\mathrm{ref,dihedrals})}/Q^{(\mathrm{ref})} = (2\pi\sigma^2)^{N^{(sp^3)}/2}$. We introduce additional factors to exploit this simplicity,

$$\frac{Q}{Q^{(\mathrm{ref})}} = \langle\Delta(C)\rangle^{(\mathrm{tunnel})}\,\frac{Q^{(\mathrm{tunnel})}}{Q^{(\mathrm{ref,dihedrals})}}\,\frac{Q^{(\mathrm{ref,dihedrals})}}{Q^{(\mathrm{ref})}}. \tag{18}$$

Finally, we introduce non-bonded and IS interactions in a gradual way through the potential

$$E^{(\mathrm{switch})}(\lambda) = E_{\mathrm{dihedrals}} + \lambda\left[E^{(\mathrm{tunnel})}_{\text{non-bonded}} + E_{\text{implicit solvent}}\right]. \tag{19}$$

By varying the switching parameter λ between 0 and 1, we interpolate between ensembles; in particular, $E^{(\mathrm{switch})}(0) = E^{(\mathrm{ref,dihedrals})}$ and $E^{(\mathrm{switch})}(1) = E^{(\mathrm{tunnel})}$. The non-interacting reference ensemble with dihedral bias can then be transformed into the fully interacting ensemble in a series of M steps,

$$\frac{Q^{(\mathrm{tunnel})}}{Q^{(\mathrm{ref,dihedrals})}} = \frac{Q^{(\mathrm{switch})}(\lambda_0)}{Q^{(\mathrm{switch})}(\lambda_1)}\,\frac{Q^{(\mathrm{switch})}(\lambda_1)}{Q^{(\mathrm{switch})}(\lambda_2)} \cdots \frac{Q^{(\mathrm{switch})}(\lambda_{M-1})}{Q^{(\mathrm{switch})}(\lambda_M)}, \tag{20}$$

where $\lambda_i = 1 - i/M$. Partition function ratios are evaluated according to

$$\frac{Q^{(\mathrm{switch})}(\lambda_{i-1})}{Q^{(\mathrm{switch})}(\lambda_i)} = \left\langle \exp\left[-\frac{\beta}{M}\left(E^{(\mathrm{tunnel})}_{\text{non-bonded}} + E_{\text{implicit solvent}}\right)\right]\right\rangle^{(\mathrm{switch})}_{\lambda_i}, \tag{21}$$

where $\langle\cdot\rangle^{(\mathrm{switch})}_{\lambda_i}$ denotes an average in the ensemble corresponding to energy function $E^{(\mathrm{switch})}(\lambda_i)$. By making M large, the difference between consecutive ensembles can be made arbitrarily small, ensuring convergence of numerical averages in reasonable time.

Our MC simulations proceed by steps that attempt to reassign the value of a randomly selected side-chain dihedral angle $\chi_i$. Trial values $\chi_i^{(\mathrm{trial})}$ are generated from a distribution $p^{(\mathrm{gen})}$ proportional to $\exp(-\beta E_{\mathrm{dihedrals}})$, accounting for local dihedral biases. Specifically, for sp3–sp3 hybridized bonds,

$$p^{(\mathrm{gen})}\left(\chi_i^{(\mathrm{trial})}\right) = \frac{1}{3}\left(2\pi\sigma^2\right)^{-1/2} \exp\left[-\left(\phi_i^{(\mathrm{trial})}\right)^2 / 2\sigma^2\right]. \tag{22}$$

In this case, two-thirds of the attempted MC moves include hopping to a different rotameric state. For sp3−sp2 hybridized bonds, which lack intrinsic torsional bias in our model, trial values are selected from a distribution uniform in $\chi_i^{(\mathrm{trial})}$. These trial moves are accepted with a Metropolis probability57 $p^{(\mathrm{acc})}$ based on the Boltzmann distribution determined by $E^{(\mathrm{switch})}(\lambda)$:

$$p^{(\mathrm{acc})} = \min\left[1, \exp\left(-\beta\lambda\left(\Delta E^{(\mathrm{tunnel})}_{\text{non-bonded}} + \Delta E_{\text{implicit solvent}}\right)\right)\right]. \tag{23}$$

Here, $\Delta E^{(\mathrm{tunnel})}_{\text{non-bonded}}$ and $\Delta E_{\text{implicit solvent}}$ are changes in interaction energies resulting from the trial move. This acceptance probability does not involve changes in $E_{\mathrm{dihedrals}}$, whose statistics are fully addressed by the generation probability of Eq. (22).

Calculations were performed for many different values of λ (including λ=0 and λ=1, corresponding to the noninteracting reference system with dihedral bias and the regularized full potential, respectively). All simulations were repeated multiple times starting from randomly chosen initial side-chain configurations. Errors were estimated from variances among these trials or sets of trials.

Straightforward Metropolis MC sampling was sufficient to generate much of the data presented here. In the case of the CaM–ligand complexes, however, precise estimates could only be obtained with umbrella sampling techniques. For this purpose, we employed the adaptive method of Wang and Landau (WL).39 Their original procedure was used to first construct a rough bias function, which was subsequently refined in several additional steps. During each refinement step, multiple independent simulations were performed using the same bias, and their resulting energy distributions were pooled to obtain a new estimate for the density of states.76 This procedure was repeated until the density of states could be confidently constructed over a range of energies spanning those characteristic of physiological temperatures. Physical averages were finally computed in a nonadaptive run according to

$$\langle\cdot\rangle = \frac{\left\langle\cdot\,\exp[-\beta E(\Theta, \Phi)]\,\exp[W(E)]\right\rangle_{W(E)}}{\left\langle\exp[-\beta E(\Theta, \Phi)]\,\exp[W(E)]\right\rangle_{W(E)}}, \tag{24}$$

where W(E) denotes the WL bias potential in units of $k_B T$ and $\langle\cdot\rangle_{W(E)}$ indicates an average over the WL-biased ensemble.

Acknowledgements

This work was supported by the Director, Office of Science, Office of Basic Energy Sciences, Materials Sciences and Engineering Division, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. K.H.D. was supported by a National Science Foundation Graduate Research Fellowship and the Berkeley Fellowship.

Supplementary Data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.jmb.2009.05.068

References

1. Chothia, C. (1975). Structural invariants in protein folding. Nature, 254, 304–308.
2. Misura, K. M. S., Morozov, A. V. & Baker, D. (2004). Analysis of anisotropic side-chain packing in proteins and application to high-resolution structure prediction. J. Mol. Biol. 342, 651–664.
3. Igumenova, T. I., Frederick, K. K. & Wand, A. J. (2006). Characterization of the fast dynamics of protein amino acid side chains using NMR relaxation in solution. Chem. Rev. 106, 1672–1699.
4. Wong, K. B. & Daggett, V. (1998). Barstar has a highly dynamic hydrophobic core: evidence from molecular dynamics simulations and nuclear magnetic resonance relaxation data. Biochemistry, 37, 11182–11192.
5. Li, Z., Raychaudhuri, S. & Wand, A. J. (1996). Insights into the local residual entropy of proteins provided by NMR relaxation. Protein Sci. 5, 2647–2650.
6. Lipari, G. & Szabo, A. (1982). Model-free approach to the interpretation of nuclear magnetic resonance relaxation in macromolecules. 1. Theory and range of validity. J. Am. Chem. Soc. 104, 4559–4570.
7. Best, R. B., Clarke, J. & Karplus, M. (2005). What contributions to protein side-chain dynamics are probed by NMR experiments? A molecular dynamics simulation analysis. J. Mol. Biol. 349, 185–203.
8. Hu, H., Hermans, J. & Lee, A. L. (2005). Relating side-chain mobility in proteins to rotameric transitions: insights from molecular dynamics simulations and NMR. J. Biomol. NMR, 32, 151–162.
9. Mittermaier, A. & Kay, L. E. (2001). Chi1 torsion angle dynamics in proteins from dipolar couplings. J. Am. Chem. Soc. 123, 6892–6903.
10. Clarkson, M. W., Gilmore, S. A., Edgell, M. H. & Lee, A. L. (2006). Dynamic coupling and allosteric behavior in a nonallosteric protein. Biochemistry, 45, 7693–7699.
11. Shapovalov, M. V. & Dunbrack, R. L. (2007). Statistical and conformational analysis of the electron density of protein side chains. Proteins, 66, 279–303.
12. Frederick, K. K., Marlow, M. S., Valentine, K. G. & Wand, A. J. (2007). Conformational entropy in molecular recognition by proteins. Nature, 448, 325–329.
13. Kussell, E., Shimada, J. & Shakhnovich, E. I. (2001). Excluded volume in protein side-chain packing. J. Mol. Biol. 311, 183–193.
14. Koehl, P. & Delarue, M. (1994). Application of a self-consistent mean field theory to predict protein side-chains conformation and estimate their conformational entropy. J. Mol. Biol. 239, 249–275.
15. Karplus, M. & Kushick, J. N. (1981). Method for estimating the configurational entropy of macromolecules. Macromolecules, 14, 325–332.
16. Gohlke, H. & Case, D. A. (2004). Converging free energy estimates: MM-PB(GB)SA studies on the protein–protein complex Ras–Raf. J. Comput. Chem. 25, 238–250.

17. Killian, B. J., Kravitz, J. Y. & Gilson, M. K. (2007). Extraction of configurational entropy from molecular simulations via an expansion approximation. J. Chem. Phys. 127, 024107.
18. Chang, C.-E. A., McLaughlin, W. A., Baron, R., Wang, W. & McCammon, J. A. (2008). Entropic contributions and the influence of the hydrophobic environment in promiscuous protein–protein association. Proc. Natl Acad. Sci. USA, 105, 7456–7461.
19. Gautier, R. & Tuffery, P. (2003). Critical assessment of side-chain conformational space sampling procedures designed for quantifying the effect of side-chain environment. J. Comput. Chem. 24, 1950–1961.
20. Hu, X. & Kuhlman, B. (2006). Protein design simulations suggest that side-chain conformational entropy is not a strong determinant of amino acid environmental preferences. Proteins, 62, 739–748.
21. Zhang, J. & Liu, J. S. (2006). On side-chain conformational entropy of proteins. PLoS Comput. Biol. 2, 1586–1591.
22. Zídek, L., Novotny, M. V. & Stone, M. J. (1999). Increased protein backbone conformational entropy upon hydrophobic ligand binding. Nat. Struct. Biol. 6, 1118–1121.
23. Bernini, A., Ciutti, A., Spiga, O., Scarselli, M., Klein, S., Vannetti, S. et al. (2004). NMR and MD studies on the interaction between ligand peptides and alpha-bungarotoxin. J. Mol. Biol. 339, 1169–1177.
24. Arumugam, S., Gao, G., Patton, B. L., Semenchenko, V., Brew, K. & Doren, S. R. V. (2003). Increased backbone mobility in beta-barrel enhances entropy gain driving binding of N-TIMP-1 to MMP-3. J. Mol. Biol. 327, 719–734.
25. Mittermaier, A., Kay, L. E. & Forman-Kay, J. D. (1999). Analysis of deuterium relaxation-derived methyl axis order parameters and correlation with local structure. J. Biomol. NMR, 13, 181–185.
26. Shetty, R. P., Bakker, P. I. W. D., DePristo, M. A. & Blundell, T. L. (2003). Advantages of fine-grained side chain conformer libraries. Protein Eng. 16, 669–963.
27. Chandler, D. & Andersen, H. (1972). Optimized cluster expansions for classical fluids. II. Theory of molecular liquids. J. Chem. Phys. 57, 1930–1931.
28. Schweizer, K. S. & Curro, J. G. (1987). Integral-equation theory of the structure of polymer melts. Phys. Rev. Lett. 58, 246–249.
29. Rohl, C. A., Strauss, C. E. M., Misura, K. M. S. & Baker, D. (2004). Protein structure prediction using Rosetta. Methods Enzymol. 383, 66–93.
30. Weeks, J., Chandler, D. & Andersen, H. (1971). Role of repulsive forces in determining the equilibrium structure of simple liquids. J. Chem. Phys. 54, 5237–5247.
31. Mayo, S. L., Olafson, B. D. & Goddard, W. A., III (1990). DREIDING: a generic force field for molecular simulations. J. Phys. Chem. 94, 8897–8909.
32. Tsai, J., Taylor, R., Chothia, C. & Gerstein, M. (1999). The packing density in proteins: standard radii and volumes. J. Mol. Biol. 290, 253–266.
33. Stickle, D. F., Presta, L. G., Dill, K. A. & Rose, G. D. (1992). Hydrogen bonding in globular proteins. J. Mol. Biol. 226, 1143–1159.
34. Gordon, D. B., Marshall, S. A. & Mayo, S. L. (1999). Energy functions for protein design. Curr. Opin. Struct. Biol. 9, 509–513.
35. Chandler, D. (2005). Interfaces and the driving force of hydrophobic assembly. Nature, 437, 640–647.
36. Giovambattista, N., Lopez, C. F., Rossky, P. J. & Debenedetti, P. G. (2008). Hydrophobicity of protein surfaces: separating geometry from chemistry. Proc. Natl Acad. Sci. USA, 105, 2274–2279.
37. Sharp, K. A., Nicholls, A., Fine, R. F. & Honig, B. (1991). Reconciling the magnitude of the microscopic and macroscopic hydrophobic effects. Science, 252, 106–109.
38. Shrake, A. & Rupley, J. A. (1973). Environment and exposure to solvent of protein atoms. Lysozyme and insulin. J. Mol. Biol. 79, 351–371.
39. Wang, F. & Landau, D. P. (2001). Efficient, multiple-range random walk algorithm to calculate the density of states. Phys. Rev. Lett. 86, 2050–2053.
40. Fersht, A. (1999). Structure and Mechanism in Protein Science. W. H. Freeman and Company, New York.
41. Hattori, M., Li, H., Yamada, H., Akasaka, K., Hengstenberg, W., Gronwald, W. & Kalbitzer, H. R. (2004). Infrequent cavity-forming fluctuations in HPr from Staphylococcus carnosus revealed by pressure- and temperature-dependent tyrosine ring flips. Protein Sci. 13, 3104–3114.
42. Showalter, S. A., Johnson, E., Rance, M. & Brüschweiler, R. (2007). Toward quantitative interpretation of methyl side-chain dynamics from NMR by molecular dynamics simulations. J. Am. Chem. Soc. 129, 14146–14147.
43. Doig, A. J. & Sternberg, M. J. (1995). Side-chain conformational entropy in protein folding. Protein Sci. 4, 2247–2251.
44. Chang, C. A., Chen, W. & Gilson, M. K. (2007). Ligand configurational entropy and protein binding. Proc. Natl Acad. Sci. USA, 104, 1534–1539.
45. Ratnaparkhi, G. S., Ramachandran, S., Udgaonkar, J. B. & Varadarajan, R. (1998). Discrepancies between the NMR and X-ray structures of uncomplexed barstar: analysis suggests that packing densities of protein structures determined by NMR are unreliable. Biochemistry, 37, 6958–6966.
46. Babu, Y. S., Bugg, C. E. & Cook, W. J. (1988). Structure of calmodulin refined at 2.2 Å resolution. J. Mol. Biol. 204, 191–204.
47. Teeter, M. M., Roe, S. M. & Heo, N. H. (1993). Atomic resolution (0.83 Å) crystal structure of the hydrophobic protein crambin at 130 K. J. Mol. Biol. 230, 292–311.
48. Bode, W., Papamokos, E. & Musil, D. (1987). The high-resolution X-ray crystal structure of the complex formed between subtilisin Carlsberg and eglin c, an elastase inhibitor from the leech Hirudo medicinalis. Structural analysis, subtilisin structure and interface geometry. Eur. J. Biochem. 166, 673–692.
49. Derrick, J. P. & Wigley, D. B. (1994). The third IgG-binding domain from streptococcal protein G. An analysis by X-ray crystallography of the structure alone and in a complex with Fab. J. Mol. Biol. 243, 906–918.
50. O'Neill, J. W., Kim, D. E., Baker, D. & Zhang, K. Y. (2001). Structures of the B1 domain of protein L from Peptostreptococcus magnus with a tyrosine to tryptophan substitution. Acta Crystallogr., Sect. D: Biol. Crystallogr. 57, 480–487.
51. Brudler, R., Meyer, T. E., Genick, U. K., Devanathan, S., Woo, T. T., Millar, D. P. et al. (2000). Coupling of hydrogen bonding to chromophore conformation and function in photoactive yellow protein. Biochemistry, 39, 13478–13486.
52. Kang, B. S., Devedjiev, Y., Derewenda, U. & Derewenda, Z. S. (2004). The PDZ2 domain of syntenin at ultra-high resolution: bridging the gap between macromolecular and small molecule crystallography. J. Mol. Biol. 338, 483–493.

53. Poy, F., Yaffe, M. B., Sayos, J., Saxena, K., Morra, M., Sumegi, J. et al. (1999). Crystal structures of the XLP protein SAP reveal a class of SH2 domains with extended, phosphotyrosine-independent sequence recognition. Mol. Cell, 4, 555–561.
54. Schindelin, H., Jiang, W., Inouye, M. & Heinemann, U. (1994). Crystal structure of CspA, the major cold shock protein of Escherichia coli. Proc. Natl Acad. Sci. USA, 91, 5119–5123.
55. Vijay-Kumar, S., Bugg, C. E. & Cook, W. J. (1987). Structure of ubiquitin refined at 1.8 Å resolution. J. Mol. Biol. 194, 531–544.
56. Leahy, D. J., Hendrickson, W. A., Aukhil, I. & Erickson, H. P. (1992). Structure of a fibronectin type III domain from tenascin phased by MAD analysis of the selenomethionyl protein. Science, 258, 987–991.
57. Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. & Teller, E. (1953). Equation of state calculations by fast computing machines. J. Chem. Phys. 21, 1087.
58. Marlow, M. S. & Wand, A. J. (2006). Conformational dynamics of calmodulin in complex with the calmodulin-dependent kinase kinase alpha calmodulin-binding domain. Biochemistry, 45, 8732–8741.
59. Frederick, K. K., Kranz, J. K. & Wand, A. J. (2006). Characterization of the backbone and side chain dynamics of the CaM–CaMKip complex reveals microscopic contributions to protein conformational entropy. Biochemistry, 45, 9841–9848.
60. Osawa, M., Tokumitsu, H., Swindells, M. B., Kurihara, H., Orita, M., Shibanuma, T. et al. (1999). A novel target recognition revealed by calmodulin in complex with Ca2+-calmodulin-dependent kinase kinase. Nat. Struct. Biol. 6, 819–824.
61. Meador, W. E., Means, A. R. & Quiocho, F. A. (1992). Target enzyme recognition by calmodulin: 2.4 Å structure of a calmodulin–peptide complex. Science, 257, 1251–1255.
62. Clapperton, J. A., Martin, S. R., Smerdon, S. J., Gamblin, S. J. & Bayley, P. M. (2002). Structure of the complex of calmodulin with the target sequence of calmodulin-dependent protein kinase I: studies of the kinase activation mechanism. Biochemistry, 41, 14669–14679.
63. Aoyagi, M., Arvai, A. S., Tainer, J. A. & Getzoff, E. D. (2003). Structural basis for endothelial nitric oxide synthase binding to calmodulin. EMBO J. 22, 766–775.
64. Fallon, J. L. & Quiocho, F. A. (2003). A closed compact structure of native Ca(2+)-calmodulin. Structure, 11, 1303–1307.
65. Zhou, Y., Vitkup, D. & Karplus, M. (1999). Native proteins are surface-molten solids: application of the Lindemann criterion for the solid versus liquid state. J. Mol. Biol. 285, 1371–1375.
66. DeLano, W. (2007). MacPyMOL: a PyMOL-based molecular graphics application for MacOS X, DeLano Scientific LLC, Palo Alto, CA.
67. Mittermaier, A. & Kay, L. E. (2006). New tools provide new insights in NMR studies of protein dynamics. Science, 312, 224–228.
68. Prabhu, N. V., Lee, A. L., Wand, A. J. & Sharp, K. A. (2003). Dynamics and entropy of a calmodulin–peptide complex studied by NMR and molecular dynamics. Biochemistry, 42, 562–570.
69. Shehu, A., Kavraki, L. E. & Clementi, C. (2007). On the characterization of protein native state ensembles. Biophys. J. 92, 1503–1511.
70. Grunberg, R., Nilges, M. & Leckner, J. (2006). Flexibility and conformational entropy in protein–protein binding. Structure, 14, 683–693.
71. Deng, Y. & Roux, B. (2008). Computation of binding free energy with molecular dynamics and grand canonical Monte Carlo simulations. J. Chem. Phys. 128, 115103.
72. Bolhuis, P. G., Chandler, D., Dellago, C. & Geissler, P. L. (2002). Transition path sampling: throwing ropes over rough mountain passes, in the dark. Annu. Rev. Phys. Chem. 53, 291–318.
73. Lee, A. L., Kinnear, S. A. & Wand, A. J. (2000). Redistribution and loss of side chain entropy upon formation of a calmodulin–peptide complex. Nat. Struct. Biol. 7, 72–77.
74. Lovell, S. C., Word, J. M., Richardson, J. S. & Richardson, D. C. (2000). The penultimate rotamer library. Proteins, 40, 389–408.
75. Fraczkiewicz, R. & Braun, W. (1998). Exact and efficient analytical calculation of the accessible surface areas and the gradients for macromolecules. J. Comp. Chem. 19, 319–333.
76. Jayasri, D., Sastry, V. S. S. & Murthy, K. P. N. (2005). Wang–Landau Monte Carlo simulation of isotropic–nematic transition in liquid crystals. Phys. Rev., E Stat. Nonlinear Soft Matter Phys. 72, 036702.
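
Illustrative note on the reweighting step. The Methods describe computing physical averages from a nonadaptive run under the Wang–Landau bias W(E), weighting each sample by the Boltzmann factor times exp[W(E)]. The sketch below shows that kind of reweighting on a toy three-level system, not on the protein model. It assumes the biased run visits configurations with probability proportional to exp[-W(E)], and that W(E) equals the logarithm of the density of states. The function names, energy levels, and degeneracies are all hypothetical and chosen only so that the estimate can be checked against an exact canonical average.

```python
import math


def reweighted_average(obs, energies, bias, beta):
    """Canonical average estimated from samples drawn with weight exp[-W(E)].

    obs      : observable value for each sampled configuration
    energies : energy E of each sampled configuration
    bias     : callable returning W(E), the bias potential in units of k_B T
    beta     : inverse temperature, in units matching `energies`
    """
    # Log of the per-sample weight exp[-beta E] * exp[W(E)].
    log_w = [-beta * e + bias(e) for e in energies]
    # Subtract the maximum before exponentiating to avoid overflow.
    shift = max(log_w)
    weights = [math.exp(lw - shift) for lw in log_w]
    return sum(w * o for w, o in zip(weights, obs)) / sum(weights)


# Hypothetical toy system: energy level -> number of microstates g(E).
DEGENERACY = {0.0: 1.0, 1.0: 10.0, 2.0: 100.0}


def toy_bias(e):
    # Assumed flat-histogram bias, W(E) = ln g(E).
    return math.log(DEGENERACY[e])


# With weight exp[-W(E)] per microstate, every level is visited equally
# often, so an idealized biased run holds equal counts of each level.
samples = [e for e in DEGENERACY for _ in range(1000)]

beta = 1.5
estimate = reweighted_average(samples, samples, toy_bias, beta)

# Exact canonical mean energy for comparison.
exact = (sum(g * e * math.exp(-beta * e) for e, g in DEGENERACY.items())
         / sum(g * math.exp(-beta * e) for e, g in DEGENERACY.items()))
print(estimate, exact)
```

Working with log-weights and subtracting their maximum leaves the ratio unchanged, because the common factor cancels between numerator and denominator. This matters in practice when the bias spans many units of k_BT.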