Loss of conformational entropy in folding calculated using realistic ensembles and its implications for NMR-based calculations

Michael C. Baxaa,b, Esmael J. Haddadianc, John M. Jumperd, Karl F. Freedd,e,1, and Tobin R. Sosnicka,b,e,1

aDepartment of Biochemistry and Molecular Biology, bInstitute for Biophysical Dynamics, cBiological Sciences Collegiate Division, dJames Franck Institute and Department of Chemistry, and eComputation Institute, The University of Chicago, Chicago, IL 60637

Edited* by Robert L. Baldwin, Stanford University, Stanford, CA, and approved September 12, 2014 (received for review April 28, 2014)

The loss of conformational entropy is a major contribution in the 2° structure and weakly correlated with burial. Combining this thermodynamics of . However, accurate determi- correlation with the average loss of BB entropy for each 2° structure nation of the quantity has proven challenging. We calculate this type provides a site-resolved estimate of the entropy loss for an loss using molecular dynamic simulations of both the native protein input structure (godzilla.uchicago.edu/cgi-bin/PLOPS/PLOPS.cgi). and a realistic denatured state ensemble. For ubiquitin, the total This estimate can assist in thermodynamic studies and coarse- Δ = · −1 change in entropy is T STotal 1.4 kcal mol per residue at 300 K grained modeling of protein dynamics and design. with only 20% from the loss of side-chain entropy. Our analysis exhibits mixed agreement with prior studies because of the use of Results more accurate ensembles and contributions from correlated motions. The NSE and DSE of Ub (15) are generated from molecular Buried side chains lose only a factor of 1.4 in the number of confor- dynamics (MD) simulations using the CHARMM36 Ω Ω mations available per rotamer upon folding ( U/ N). The entropy loss (14, 16) and the TIP3P water model (17). The DSE is produced for helical and sheet residues differs due to the smaller motions of from short trajectories initiated from 315 members of a larger Δ = · −1 helical residues (T Shelix−sheet 0.5 kcal mol ), a property not fully ensemble constructed using BB (ϕ,ψ) dihedral angles found in reflected in the amide N-H and carbonyl C=O bond NMR order param- a PDB-based coil library (18). The DSE’s NMR chemical shifts eters. The results have implications for the thermodynamics of folding (19, 20), N-H residual dipolar couplings (RDCs) (18) (Fig. S1), and binding, including estimates of solvent ordering and microscopic and radius of gyration, Rg (14, 18, 21), agree with data for entropies obtained from NMR. chemically denatured Ub. The average helicity for residues lo- cated in the helix and the rest of the residues is 2% in water at NMR order parameters | | helix propensity | 293 K (22). Hydrogen exchange studies are also consistent with sheet propensity | denatured state all of the H-bonds being broken upon global unfolding (23), further supporting the use of the statistical coil model to de- n accurate determination of the loss of conformational en- scribe the DSE. Atropy is critical for dissecting the energetics of reactions To maintain agreement with experiment, the (ϕ,ψ) angles are involving protein motions, including folding, conformational constrained in the simulations to remain in their Rama basins via change, and binding (1–6). Given the difficulty of directly mea- reflecting walls that separate the five major basins (Fig. S2). suring the conformational entropy, most early estimates relied Without these constraints, the MD would undergo interbasin on computational approaches (2, 7–10), although, more recently, transitions, altering the angles to produce an ensemble that no NMR methods have been used to measure site-resolved entropies longer agrees with the experimental RDCs. Additionally, the (11). The computational methods often calculated the entropy of either the ensemble (NSE) or the denatured state en- Significance semble (DSE) and invoked assumptions about the entropy of the other ensemble [e.g., assuming the NSE is a single state or that the DSE is a composite of all side-chain (SC) rotameric states in the Despite 40 years of study, no consensus has been achieved on Protein Data Bank (PDB)]. Most previous approaches focused on the magnitude of the loss of backbone (BB) and side-chain (SC) helices and omitted contributions from vibrations and correlated entropies upon folding, even though these quantities are es- motions (12, 13), thereby partly accounting for the spectrum of sential for characterizing the energetics of folding and con- calculated values. formational change. We calculate the loss using experimentally We address these issues by calculating the chain’s conforma- validated denatured and native state ensembles, avoiding the tional entropy from the distributions of the backbone (BB) (ϕ,ψ) drastic assumptions used in many past analyses. By also ac- counting for correlated motions, we find that the loss of BB and SC rotametric angles, [χn], obtained from all-atom simu- lations of the NSE and DSE for mammalian ubiquitin (Ub). This entropy is three- to fourfold larger than the SC contribution. study extends our previous calculation of the loss of BB entropy Our values differ with some calculations by up to a factor of 3 and that used an experimentally validated DSE (14). The calculated depend strongly on 2° structure. These results have implications angular distributions reflect both the Ramachandran (Rama) upon other thermodynamic properties, the estimation of entropy basin populations and the torsional vibrations. Correlated using NMR methods, and coarse-grained simulations. motions are accounted for through the use of joint probability distributions [e.g., P(ϕ,ψ,χ1,χ2)]. Author contributions: M.C.B., K.F.F., and T.R.S. designed research; M.C.B. and E.J.H. per- The computed loss of BB entropy is 80% of the total entropy formed research; M.C.B., J.M.J., and T.R.S. analyzed data; and M.C.B., J.M.J., K.F.F., and loss at 300 K. The BB entropy is independent of burial and T.R.S. wrote the paper. residue type (excluding Pro, Gly, and pre-Pro residues) but The authors declare no conflict of interest. depends on the secondary structure. Helical residues lose more *This Direct Submission article had a prearranged editor. −1 BB entropy than sheet residues, TΔShelix−sheet = 0.5 kcal·mol at 1To whom correspondence may be addressed. Email: [email protected] or trsosnic@ 300 K, a difference not fully reflected by either amide N-H or uchicago.edu. carbonyl C=O bond NMR order parameters. The SC entropy This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. −1 −1 loss, TΔSSC ∼ 0.2 kcal·mol ·rotamer , is largely independent of 1073/pnas.1407768111/-/DCSupplemental.

15396–15401 | PNAS | October 28, 2014 | vol. 111 | no. 43 www.pnas.org/cgi/doi/10.1073/pnas.1407768111 Downloaded by guest on September 28, 2021 Fig. 1. Loss of conformational entropy upon folding for Ub. (A and B) Average value of TΔS is shown (line) [native 2° structure at top: strand (blue), helix (red), and coil (green)]. (C and D) Entropy and 2° structure. Data are calculated using 10° bins. Entropy differences (ΔS) in A–C are independent of bin size and can be compared with values in the literature. Absolute entropy in D should only be compared with other calculations using 10° bins. Points are spread out horizontally for ease of viewing.

removal of constraints would produce chain collapse [e.g., difference emerges because of a higher entropy reduction in the unconstrained exp ϕ ψ MD < 15 Å (24, 25), whereas Rg ∼ 25 Å (21)]. NSE resulting from a tighter ( , ) distribution for residues in Entropies are calculated as before using S = R ΣPilnPi, where helices and, to a lesser extent, in turns, which are considered to Pi is the probability of being in microstate i (9, 12–14, 26–30). be coil residues (Fig. 2A). Because helical residues, regardless of Although entropy is defined in terms of the probability dis- amino acid type, sample a very similar (ϕ,ψ) distribution in the tributions over all atomic positions, it can be decomposed into NSE, variations in the BB entropy loss upon helix formation the protein entropy and the average solvent entropy conditional arise from residue-dependent differences in the DSE (Fig. 1D on the protein’s microstates. Conversion to (ϕ,ψ,[χn]) angular and Fig. S4). However, as discussed below, the net entropy loss coordinates from (x,y,z) coordinates normally requires a Jacobian. does not correlate well with helical propensity. However, the Jacobian is constant across phase space (assuming The DSE entropies of the β-branched residues Val and Ile are of other bond angles and lengths are constant); hence, it is not lower entropy than the average of other non-Gly, pre-Pro, and Pro −1 calculated. Similarly, the size of a microstate in phase space is residues by TΔS = 0.3 kcal·mol (Fig. S4C). The spread for the affected by bin size, but this effect cancels for entropy differences other residues is due to sequence dependence and closely matches so that our values can be compared with other values of ΔS in the the spread calculated from Dunbrack’s coil library (31) (Fig. S4D). literature. The restriction to BB and SC angles ignores other degrees of freedom, such as bond length vibrations, that are as- NMR Order Parameters and Entropy. Lipari–Szabo NMR order BIOPHYSICS AND sumed to cancel in entropy differences. parameters (SLZ) have been used to estimate BB and SC en- COMPUTATIONAL BIOLOGY To account for correlated motions, entropies are calculated tropies in folding and binding (3, 11–13, 26, 32–37). The order using joint probability distributions for up to four angles at a parameter reflects the angular distribution of a bond vector [e.g., time, where the effects of limited sampling statistics are, on av- the peptide N-H) relative to the molecular (reference) frame for −1 erage, TΔSsampling ∼ 0.1 kcal·mol (Fig. S3A). For the loss of chain entropy upon folding per residue, TΔSTotal equals 1.380.46 −1

kcal·mol (the subscript represents the SD across the sequence CHEMISTRY rather than the uncertainty). Rigorously, the total entropy can- not be separated into BB and SC values due to correlations. Nevertheless, such a separation is useful even if not fully valid. Accordingly, we compute the individual BB entropy and then the corrected SC entropy from the difference between the total and BB − entropies. The average reduction of 0.14 kcal·mol 1 due to corre- lated BB/SC motions is assigned to the SC entropy. The BB entropy −1 loss, 1.10.3 kcal·mol , is fourfold larger than the SC loss, TΔSSC = −1 −1 −1 0.240.30 kcal·mol (0.150.16 kcal·mol ·rotamer ). If the mutual entropy between the BB and SC is split evenly between the two, the BB term still is threefold larger than the SC term. The errors due to finite sampling and bin size are estimated from four tests (Fig. S3). First, the NSE and DSE ensembles are split into halves, quarters, eighths, and sixteenths. The de- pendence of the total entropy on ensemble size indicates that our calculation has converged to within ∼2%. In addition, the resi- due-level entropies are very similar whether calculated using Fig. 2. Rama distributions and order parameters in the NSE vary with 2° structure. (A) Native distribution for I30 (helix) is tighter than for I3 (sheet) and either a 1/2:1/2 or a 3/4:1/4 decomposition of the total ensemble, · −1 B = or with 10° or 15° bins (using the entire ensemble). These four has 0.44 kcal mol less entropy. ( ) Probability distributions for the N-H, C O, and peptide plane vectors for I3 and I30. The motions of their N-H vectors are calculations indicate a sampling uncertainty of TΔS ∼ 0.1 −1 −1 more similar to each other compared with the motions of the other two pairs. · · SI Text 2 kcal mol residue ( ). (C) Entropy calculated using SLZ with the diffusion-in-a-cone model under- estimates BB entropy of sheet residues in the NSE. Points are colored according Secondary Structure and Amino Acid Dependence. The loss of SC to 2° structure as in Fig. 1. (D) Entropy differences calculated from the probability entropy is largely independent of 2° structure (Fig. 1B). In con- distributions of the vectors in B are less than those entropy differences calculated trast, the change in BB entropy varies for helices, sheets, and coil from the BB, indicating that none of the three vectors fully reflects the BB en- −1 residues, being 1.5, 1.0, and 1.2 kcal·mol , respectively. This tropy. ILE, isoleucine (Ile).

Baxa et al. PNAS | October 28, 2014 | vol. 111 | no. 43 | 15397 Downloaded by guest on September 28, 2021 subnanosecond motions (32, 33)]. Our NSE BB entropy is similar maximum information spanning tree (39). Our calculation omits to the entropy calculated using the N-H order parameter with correlations across more than three pairs of BB dihedral angles, a diffusion-in-a-cone model (32), albeit with a notable discrepancy across more than three sets of SCs, or between BB and SC angles 2 (Fig. 2 and Fig. S5). Generally, SLZ is slightly higher for helical than on different residues. The neglect of these terms should not af- 2 2 fect our final values beyond the stated statistical uncertainties, for sheet residues (SLZ,helix ∼ 0.90–0.92 and SLZ,sheet ∼ 0.8–0.91), − but their BB entropy often differs significantly (i.e., TΔShelix sheet = especially considering that the correlations examined above − BB yielded even smaller contributions (13). 0.52 kcal·mol 1, even for sheet residues having high order param- S2 ≥ eters, LZ 0.89). As a result, no single relationship between the Burial and Entropy. The loss of BB entropy is independent of N-H order parameter and entropy can simultaneously describe r = − C burial ( 0.10), whereas the change in SC entropy is weakly both helical and sheet residues (Fig. 2 ). correlated with the solvent accessible surface area (SASA) that Δ − We also calculated T Shelix−sheet directly from the motions of becomes buried upon folding, <ΔSASADSE NSE>/ the N-H, C=O, and peptide plane vectors for I3 (sheet) and I30 r = − ( 0.47; Fig. S7). Non-Gly, Ala, and Pro residues that bury (helix) (Fig. 2B). For these three vectors, TΔShelix sheet = 0.18, − vector either less or more than 50% of their SASA upon folding have an 0.29, and 0.35 kcal·mol 1, respectively, whereas our Rama-based Δ · −1 − average T S per rotamer of 0.080.13 or 0.200.16 kcal mol , re- calculation yields 0.44 kcal·mol 1. Because the calculations are spectively. The correlation slightly improves when the loss is based on the same data, they should agree if they report on the compared with the burial relative to the average for each residue same quantity (38). Given that they produce disparate values, we type (r = 0.52; Fig. 3A). conclude that the three vectors are reporting on different Considerable site-to-site variation in the loss of SC entropy motions (38) and are not true proxies of the total BB entropy. exists for exposed and buried residues (Fig. 4). The surface 2 ∼ Nevertheless, a single model relating each vector’s SLZ value to residues K33 and K48 ( 20% burial) have significantly different − its own entropy, rather than the BB entropy, successfully SC entropy losses, 0.5 and −0.1 kcal·mol 1, respectively (Fig. describes both helical and sheet residues (32) (Fig. S5). 4A). K33’s χ1 and χ2 angles are more restricted because of the threefold greater number of contacts formed by SCs in the NSE, Correlated Motions. The values listed above are adjusted for corre- even though both residues have comparable exposure (135 vs. − · −1 2 lated motions, a moderate effect on average ( 0.120.26 kcal mol ) 144 Å ) and similar DSE distributions (Fig. S8). Furthermore, (Fig. S6). The contributions from correlations between the BB and some exposed Thr and Ser are restricted in the NSE (TΔSSC = SC angles are deduced from S − (S + S ), the difference −1 Total BB SC 0.30.2 kcal·mol ). The S20 SC is only 10% buried in the NSE, yet between the total entropy and the sum of the BB and SC entropies, · −1 P ϕ ψ P χ its single rotamer loses 0.2 kcal mol of entropy on folding. independently evaluated from the distributions i( , )and i([ n]), Conversely, burial does not always imply the same reduction in respectively. These correlations differ in the NSE and DSE, −0.05 − − SC entropy even for the same residue type. Four core Iles with over − · 1· 1 A − − and 0.19 kcal mol residue , respectively (Fig. S6 ). The 90% burial have TΔSDSE NSE = 0.4–0.7 kcal·mol 1 due to varia- difference arises because the BB samples multiple Rama basins in SC χ χ B χ tions in the native state ( 1, 2) distributions (Fig. 4 and Fig. S8), the DSE, and the BB angles are correlated with 1. In contrast, consistent with other studies (40). Conformational heterogeneity the BB angles typically remain in a single basin in the NSE, so the appears in the SC of a highly buried Arg while still maintaining effect of these correlations is absent. Hence, this correlation ordered contacts (41). These and other examples illustrate that produces a measurable reduction for the difference between the Δ = − · −1 upon folding, surface and core residues lose an amount of entropy NSE and DSE, T SSC correction 0.14 kcal mol . that depends on context. Hence, the common assumption that SCs The contribution from correlated BB rotations of adjacent adopt a single state when buried and an unfolded-like number of residues is evaluated from the difference of the joint and in- states when exposed to solvent is inaccurate in the aggregate. dividual entropies, SBB(ϕi,ψi,ϕi+1,ψi+1) − [(SBB(ϕi,ψi) + SBB(ϕi+1, ψ A i+1)] (Fig. S6 ). This effect varies along the sequence (e.g., Estimating Entropy from Structure. Our results suggest that Δ NSE = − : − · −1 T SBB corr 0 06 and 0.18 kcal mol for helices and sheets, knowledge of the structure is sufficient to estimate the site- respectively). However, the cumulative corrections are similar in resolved BB and SC entropies. The BB entropy is estimated −1 the NSE and the DSE, −0.14 and −0.13 kcal·mol , respectively. using the average values for α, β, and coil residues, corrected for DSE−NSE Consequently, their difference is small, hTΔS i = 0:02 the dependence of the DSE entropy on residue type. The SC BB corr kcal · mol 1. This near-cancellation primarily is due to the dif- entropy’s sensitivity to burial is estimated using a linear corre- − ferent signs of the corrections for helical and sheet residues; lation with the relative burial (ΔS·rot 1 = 0.4·ΔBurialrel;Fig.3A). h Δ DSE−NSEi = − : · −1 T SBB corr 0 08 and 0.06 kcal mol , respectively. Helical The sum provides a reasonable estimate of the total entropy for residues have smaller corrections in the NSE compared with the DSE, whereas the opposite is true for sheet residues. The average contribution from correlated BB rotations for the 442 contacting pairs of nonsequence adjacent residues, defined by those residues making at least one heavy atom contact during the entire NSE − − simulations, is only −0.06 kcal·mol 1·residue 1 (Fig. S6C). The three − − largest corrections, 0.30–0.34 kcal 1·mol 1,areintheN-terminalβ turn (i,i + 2,3 contacts). The correlations between contacting SCs are determined using the joint χ-angle distributions involving six or fewer angles or more than six angles with bin sizes of 20° or 30°, respectively (Fig. S6B). We use an average of at least 3, 5, or 10 heavy-atom SC contacts to define a pairwise interaction in the NSE trajectories (152, 113, and 41 pairs, respectively). After subtracting the corre- sponding correction for each pair in the DSE simulations, the re- Δ − − · −1 duction in T Sis 0.02 to 0.06 kcal mol . This analysis indicates Fig. 3. Estimating the entropy lost using burial levels and 2° structure. (A) that motions of contacting SCs are nearly independent of each Loss of SC entropy is mildly correlated with the relative burial level for each other, in agreement with Hu and Kuhlman (6) and Li et al. (13). amino acid (AA) type. (B) Total entropy is estimated from this correlation plus Our procedure of using joint probability distributions for up to the 2° structure dependence of the BB entropy, corrected for differences in eight angles largely eliminates the need to calculate the higher the DSE, and is compared with the values calculated from the simulations. The order corrections using a mutual information expansion or unity line is provided as a visual guide.

15398 | www.pnas.org/cgi/doi/10.1073/pnas.1407768111 Baxa et al. Downloaded by guest on September 28, 2021 et al. (9) and D’Aquino et al. (29) are in mixed agreement with our data. They use calorimetric data to estimate ΔSAla−Gly, which is combined with a computation approach similar to ours to es- timate the DSE entropy. Their entropy loss is also amino acid- and burial-dependent. However, their Rama distributions in the DSE are broader than ours, and they assume one conformation for buried SCs. Applying their analysis to Ub’s α-helix produces −1 an average TΔSBB = 0.9 kcal·mol , which is lower than our − value of 1.5 kcal·mol 1. The corresponding SC calculation −1 produces TΔSSC = 0.7–1.0 kcal·mol (using a 50–80% thresh- old for classifying a residue as buried), which is much higher than −1 our value of 0.3 kcal·mol . For the entire protein, TΔSSC = 0.5–0.7 − kcal·mol 1, which is again higher than our average value of 0.2 − kcal·mol 1. Nevertheless, the two methods produce comparable − total losses of entropy for Ub’s helix (∼1.8 kcal·mol 1), but the Fig. 4. Entropy loss can vary even at the same burial level in the NSE. (A) relative contributions from the BB and SC are very different. Although Lys K33 and K48, are equally buried (21–22%), they have different χ χ Δ = − · −1 Our BB entropy loss is fivefold larger than the SC entropy NSE ( 1, 2) distributions and entropies, T SSC 0.5 and 0.1 kcal mol ,re- · −1 B > > (1.5 vs. 0.3 kcal mol ), whereas theirs is nearly equal (0.9 vs. spectively. ( ) Four Iles are highly buried in the NSE ( 90%), but I23 loses 0.3 −1 − – · kcal·mol 1 more TΔS than the others due to its tighter (χ ,χ ) distribution. 0.7 1.0 kcal mol ). SC 1 2 Daggett and coworkers (28) also use MD to produce NSE and DSE Rama maps from which they calculate the loss of BB en- each Ub residue (Fig. 3B). We expect the values found for Ub, tropy. Their values for the two states are smaller than ours by Δ ∼ · −1 a typical globular protein, apply to other when their DSEs T S 0.6 kcal mol . Their DSE maps (from 498 K simulations) are similarly unstructured. are heavily dominated by helical conformations for Ala and, to a lesser extent, for Gly, whereas our maps are dominated by Discussion extended conformers, a necessary feature for matching experi- A method for determining the loss of entropy is essential for mental RDCs (18). Nevertheless, their calculations and our dissecting the energetics of folding, binding, and conformational calculations produce similar values for the difference in the loss change (1–3). Our method resembles a variety of previous of BB entropy upon helix formation between an Ala and a Gly, Δð Δ DSE−NSEÞ · −1 studies that use angular distributions to calculate entropies (9, T SBB Ala → Glyjhelix, 0.1 and 0.3 kcal mol , respectively. 12–14, 26–30), although we include features that have not been Plaxco and coworkers (30) estimate the loss of BB entropy previously considered or used in combination. MD simulations from the work required to stretch an unfolded polypeptide to the – are used to create realistic ensembles for both the NSE and DSE. force at which the next attached protein unfolds (200 250 pN). · −1 We account for intrawell and interwell motions, obviating the need This free energy, 1.4 kcal mol , is equated to the loss of BB to invoke assumptions about the entropy in either state or the shape entropy under the implicit assumption that the native state has of the wells. Our DSE includes Rama basin restraints so that the one conformation or, more precisely, that the residual entropy of BIOPHYSICS AND the extended chain is the same as the native state entropy. Al- ensemble has the proper Rg, chemical shifts, and distributions of COMPUTATIONAL BIOLOGY dihedral angles that are necessary to reproduce experimental RDCs though there is no a priori reason for this equivalence, a recent (14, 18, 21). Without these constraints, the chain would unphysically computational study provides support because the area sampled collapse. We also evaluated contributions arising from correlated in the Rama map of an unfolded chain under 250 pN of force −1 (47) is similar to the area sampled by helical residues in our NSE motions, TΔScorr ∼ 0.3 kcal·mol . An investigation of the depen- dence on burial and 2° structure finds that only the latter is signif- simulations. The similarity explains why their approach produces − − icant, TΔShelix sheet = 0.5 kcal·mol 1. a similar value for helical residues (but not for sheets). BB Coarse-grained simulations have been used to predict folding CHEMISTRY pathways and conformational changes. These models generally Comparison with Other Studies. Our values for TΔS ,TΔS , − Total BB TΔS = 1.4, 1.0–1.1, and 0.2–0.3 kcal·mol 1, respectively, fall into two classes. In the first class, real-space dynamics are SC conducted and the conformational entropy is determined by overlap with some previous estimates but are smaller than others, ΔS = Energy/T . For one Cα bead model, the entropy is particularly for the SC entropy (30). Our small value for TΔS is − midpoint SC 1.6 kcal·mol 1, as calibrated using a variety of proteins (48). The consistent with the finding of Hu and Kuhlman (6) that including second class involves Go-style Ising models that lack any explicit SC entropy has minimal effect in amino acid preference for BB or SC degrees of freedom. As a result, an entropy term is protein design. Many previous methods similarly calculate SC added for each unfolded residue and should correspond to our entropy losses upon binding and folding by counting rotameric total loss of entropy. Alm and Baker (49) and Muñoz and Eaton states, sometimes using energy-weighted (9, 42) or Monte Carlo · −1 – (50) use values of 1.8 and 1.0 kcal mol , respectively. Although approaches (43 45), which are likely to have very good sampling. these values are similar to our average, the models omit all de- Doig and Sternberg (7) survey several works and report an av- · −1· −1 pendence on 2° structure, amino acid type, or SC burial. This lack erage loss of 0.5 kcal mol rotamer . Many approaches model of dependence on 2° structure favors helix formation because it fails ’ − the DSE s rotamer distribution (e.g., using the diversity observed to account for the additional 0.5 kcal·mol 1 entropy loss for helix in the PDB), while assuming the NSE has one conformation (9). formation. This bias can influence the prediction of folding path- Using Monte Carlo sampling, DuBay and Geissler (8) find Δ = · −1 ways (e.g., it may explain the frequent misidentification of helix in T S 0.9 kcal mol for the loss of SC entropy, although they the transition state of protein L by coarse-grained models) (51). Our note that their reference state overestimates the freedom in the · −1· −1 entropies and their dependence on structure, residue type, and DSE. Their computed entropy loss of 0.14 kcal mol rot for burial may be used to improve both classes of coarse-grained binding is very similar to our SC value for folding. Other treat- models, as well as to assist in protein design. ments assume single, sometimes quasiharmonic, energy wells (2, ’ 46), unlike our DSE s multiwell energy surface. NMR Order Parameters. Order parameters have been used to es- A variety of methods, typically applied to helices, produce timate microscopic BB and SC entropies (3, 11–13, 26, 32–37). Δ = – · −1 · −1 2 T SBB 0.9 2.2 kcal mol , with a median of 1.4 kcal mol Yang and Kay (32) derived a popular equation relating SLZ to (30), which is close to our helical value but greater than our sheet entropy from a diffusion-in-a-cone model, whereas Palmer and − values, 1.5 and 1.0 kcal·mol 1, respectively. Calculations by Lee coworkers (52) showed that estimates of changes in stability are

Baxa et al. PNAS | October 28, 2014 | vol. 111 | no. 43 | 15399 Downloaded by guest on September 28, 2021 2 not strongly model-dependent for values of SLZ observed in correlation between the Chou–Fasman helical propensities and proteins. Although order parameters are ill-defined in the DSE either the total, BB, or SC entropy, with the exception of the BB due to the lack of a global reference frame, some estimates of the entropy for the four apolar residues (r = −0.91). Context could be change in BB entropy using N-H order parameters with a diffu- responsible for the difference between the present and prior stud- −1 sion-in-a-cone model are close to our values, 0.8–1.6 kcal·mol ies. The earlier results examine monomeric helices, whereas our Ub (32–34, 53). values are computed for a partially buried amphipathic helix. However, our calculation exhibits a notable discrepancy, with ϕ ψ 2 Our previous study examining ( , ) preferences in a coil li- SLZ-derived values for the difference in entropy loss for helical brary found that both helical and sheet propensities correlate and sheet residues even for those positions having similarly high well with a single factor, the probability that a residue in a PDB- N-H order parameters. For example, the difference in the BB based coil library adopts a helical angle and sheet angle, re- Δ = helix helix helix entropy between an Ile in a helix and in a sheet is T Shelix−sheet ΔΔG = −RT ðP =P Þ · −1 · −1 spectively, as given by A → X ln X A (57). The 0.44 kcal mol , compared with 0.18, 0.29, and 0.35 kcal mol individual propensities can be viewed as the work or free energy for calculations based on the N-H, C=O, and peptide plane ’ B D required to restrict a residue s BB to a specific subset of vector motions, respectively (Fig. 2 and ). The three vectors ϕ,ψ-angles in the NSE (e.g., helical angles) from the multitude of are reporting on different motions, consistent with a study by angles adopted by the residue when the protein is unfolded Wang et al. (38), and none fully reflects the BB entropy (Fig. 2C (using a coil library as a proxy for the DSE). The total work and Fig. S5). This underreporting arises due to their being in- complete proxies of all (ϕ,ψ) combinations. The dominant BB contains both entropic and enthalpic components, which may motion on the subnanosecond time scale is the rocking of the explain the absence of a correlation when considering only the peptide plane. This motion involves a counterrotation of the two conformational entropy. flanking dihedral angles, Δψi− ∼−Δϕi (13, 54), with the cor- 1 Conclusions relation being stronger for sheet residues because they undergo larger angular changes. Consequently, the three vectors do not We determine the loss of protein conformation entropy upon sample the entire 2 degrees of (ϕ,ψ) angular freedom that define folding using realistic NSEs and DSEs and include contributions the BB entropy (37), particularly for sheet residues. from correlated motions. This protocol provides a nearly com- 2 Consequently, no single relationship between SLZ and the BB plete treatment for a thermodynamic parameter whose magni- entropy simultaneously describes both helices and sheet residues tude has been debated for decades. The total loss of confor- − − − with an accuracy better than 0.2–0.5 kcal·mol 1. This deficiency mational entropy, 1.4 kcal·mol 1·residue 1, is largely due to the 2 is particularly true at SLZ ≥ 0.9, where the entropy-order pa- loss of BB entropy. The total conformational entropy loss 2 rameter relationship is most sensitive to slight changes in SLZ. equates to a factor of ∼5–13 reduction in the number of con- 2 Nevertheless, the SLZ-entropy relationship is quite accurate for formations available per residue, with buried SCs losing only any one of the three vectors’ order parameter and its own en- a factor of 1.4 per rotamer. Helical residues lose a factor of 2.3 tropy (32) (Fig. S5). This difference supports the application of more conformations than sheet residues due to more restricted the diffusion-in-a-cone model for a given vector motion and its motions in the native state. This difference should be accounted entropy but emphasizes that none of the three bond vectors’ for in coarse-grained simulations and NMR analyses of micro- motions fully reflects the BB’s ϕ- and ψ-motions. scopic entropy changes upon folding and binding. These data are SC order parameters have been used to estimate changes in incorporated into a server that provides estimates of the entropy entropy upon binding (11). Wand and coworkers (55) find loss for an input native structure (godzilla.uchicago.edu/cgi-bin/ a strong correlation between the change in the SC methyl sym- PLOPS/PLOPS.cgi). metry axis order parameter and the loss in calmodulin’s con- formational entropy upon binding a series of target peptides. Methods This relationship is used to calibrate an “entropy meter.” Later, Entropy Calculations. The entropy is calculated from Pi(ϕ,ψ,χ ,χ ) for residues Kasinath et al. (3) analyzed MD simulations and found that 1 2 with up to two SC rotamers. The BB entropy is evaluated from Pi(ϕ,ψ). The SC motions of methyl-bearing SCs are excellent proxies of the entire entropy is the difference between the total entropy and the BB entropy. The SC entropy once the individual values are weighted according to total distributions for residues with more than three rotamers (K, R, E, Q) are the number of rotameric states. These and other calculations decomposed into two parts, Pi(ϕ,ψ,χ1) and Pi([χn]). The total entropy is the stress the importance of making an in-depth comparison be- sum of the entropies calculated from these two distributions minus tween NMR order parameters and microscopic entropies. the entropy associated with Pi(χ1) to avoid double-counting. A further description of our methods is detailed in SI Text. Implications for Hydration Entropies. Estimates for the change in conformational entropy are used to estimate other thermody- Creation of the Ensembles. The DSE is generated starting from 315 repre- – namic properties for folding and binding reactions (1 3). Typi- sentative conformations of a DSE constructed using a PDB-based coil library of cally, ΔS nearly vanishes for the entire reaction (e.g., near the non–H-bonded residues in the PDB (18). Angle selection from the library temperature of maximum protein stability), where the loss of accounts for each residue’s chemical identity, as well as for those chemical conformational entropy is balanced by the gain in solvent entropy identities of the neighboring residues and their conformation. Each residue typically associated with the burial of apolar and polar groups (1). is constrained to remain within its original Rama basin using a harmonic Our revised value for the loss of conformational entropy can reflecting “wall” at the edge of the basin (14). This calculation decomposes provide an estimate of the entropy of hydration, ΔShydr = ΔSexp − the total probability distribution into two components: the interbasin dis- ΔSconf. As a result, our study may help resolve inconsistencies in tribution (established by the relative basin propensities in the coil library) previous estimates of hydration entropies using transfer studies of and the distribution for intrabasin motions obtained with all-atom MD model compounds that may be attributable to the use of alter- simulations using NAMD (58). After reaching the target temperature, the BB native reference states [e.g,. assumptions that the gas phase is restraints are gradually released. Following an additional 1 ns of equilibra- a more appropriate reference state than the solvent phase (56)]. tion with no BB restraints, a 2-ns trajectory is carried out while maintaining the Rama basin restraints, resulting in ∼6 × 105 structures in the DSE. Secondary Structure Propensities. Residues in Ub’s helices, sheets, The corresponding angular distributions in the NSE are generated from 10 −1 and coil regions lose, on average, 1.5, 1.0, and 1.2 kcal·mol in independent 100-ns trajectories at 300 K, starting from the energy-minimized BB entropy, but the average SC entropy loss is largely in- crystal structure (59) and using the same equilibration protocol with BB dependent of the 2° structure. Earlier studies find a strong cor- restraints described above. Structures after the first 10 ns of simulation are relation between helical propensity and the loss of SC entropy on saved every 1 ps (providing a total of 9 × 105 structures). MD simulations in folding for apolar residues (10, 44). All 14 residues in Ub’s he- both states are run with the CHARMM36 (16) force field using the TIP3P lices, or just the four apolar residues, display, at most, a mild water model (17).

15400 | www.pnas.org/cgi/doi/10.1073/pnas.1407768111 Baxa et al. Downloaded by guest on September 28, 2021 ACKNOWLEDGMENTS. We thank B. Roux, J. C. Gumbart, A. Palmer III, Research Computing Center and National Institutes of Health (NIH) through J. Wand, K. Sharp, R. L. Baldwin, and members of our group for comments resources provided by the Computation Institute, the Bioscience Division of and helpful discussions. NAMD was developed by the Theoretical and Compu- The University of Chicago, and the Argonne National Laboratory under Grant tational Biophysics Group at the University of Illinois at Urbana–Champaign. This S10 RR029030 and by grants from the NIH (Grant GM055694) and National work was completed using resources provided by The University of Chicago Science Foundation (Grants CHE-1111918 and CHE-1363012).

1. Makhatadze GI, Privalov PL (1996) On the entropy of protein folding. Protein Sci 5(3): 30. Thompson JB, Hansma HG, Hansma PK, Plaxco KW (2002) The backbone conforma- 507–510. tional entropy of protein folding: Experimental measures from atomic force micros- 2. Brady GP, Sharp KA (1997) Entropy in protein folding and in protein-protein inter- copy. J Mol Biol 322(3):645–652. actions. Curr Opin Struct Biol 7(2):215–221. 31. Ting D, et al. (2010) Neighbor-dependent Ramachandran probability distributions of 3. Kasinath V, Sharp KA, Wand AJ (2013) Microscopic insights into the NMR relaxation- amino acids developed from a hierarchical Dirichlet process model. PLOS Comput Biol based protein conformational entropy meter. J Am Chem Soc 135(40):15092–15100. 6(4):e1000763. 4. Gumbart JC, Roux B, Chipot C (2013) Efficient determination of protein-protein 32. Yang D, Kay LE (1996) Contributions to conformational entropy arising from bond standard binding free energies from first principles. J Chem Theory Comput 9(8): vector fluctuations measured from NMR-derived order parameters: Application to 3789–3798. protein folding. J Mol Biol 263(2):369–382. 5. Tzeng SR, Kalodimos CG (2012) Protein activity regulation by conformational entropy. 33. Alexandrescu AT, et al. (1998) 15N backbone dynamics of the S-peptide from ribo- Nature 488(7410):236–240. nuclease A in its free and S-protein bound forms: Toward a site-specific analysis of 6. Hu X, Kuhlman B (2006) Protein design simulations suggest that side-chain confor- entropy changes upon folding. Protein Sci 7(2):389–402. mational entropy is not a strong determinant of amino acid environmental prefer- 34. Bracken C, Carr PA, Cavanagh J, Palmer AG, 3rd (1999) Temperature dependence of ences. Proteins 62(3):739–748. intramolecular dynamics of the basic leucine zipper of GCN4: Implications for the 7. Doig AJ, Sternberg MJ (1995) Side-chain conformational entropy in protein folding. entropy of association with DNA. J Mol Biol 285(5):2133–2146. Protein Sci 4(11):2247–2251. 35. Genheden S, Akke M, Ryde U (2014) Conformational entropies and order parameters: 8. DuBay KH, Geissler PL (2009) Calculation of proteins’ total side-chain torsional en- Convergence, reproducibility, and transferability. J Chem Theory Comput 10(1): – tropy and its influence on protein-ligand interactions. J Mol Biol 391(2):484–497. 432 438. 9. Lee KH, Xie D, Freire E, Amzel LM (1994) Estimation of changes in side chain con- 36. Lee AL, Wand AJ (2001) Microscopic origins of entropy, heat capacity and the glass Nature – figurational entropy in binding and folding: General methods and application to helix transition in proteins. 411(6836):501 504. formation. Proteins 20(1):68–84. 37. Fischer MWF, et al. (1997) Experimental characterization of models for backbone 10. Creamer TP, Rose GD (1992) Side-chain entropy opposes alpha-helix formation but picosecond dynamics in proteins. Quantification of NMR auto- and cross-correlation J Am Chem rationalizes experimentally determined helix-forming propensities. Proc Natl Acad Sci relaxation mechanisms involving different nuclei of the peptide plane. Soc – USA 89(13):5937–5941. 119(51):12629 12642. 11. Wand AJ (2013) The dark energy of proteins comes to light: Conformational entropy 38. Wang T, Weaver DS, Cai S, Zuiderweg ER (2006) Quantifying Lipari-Szabo modelfree J Biomol NMR – and its role in protein function revealed by NMR relaxation. Curr Opin Struct Biol parameters from 13CO NMR relaxation experiments. 36(2):79 102. 23(1):75–81. 39. King BM, Silver NW, Tidor B (2012) Efficient calculation of molecular configurational J Phys Chem B 12. Li DW, Khanlarzadeh M, Wang J, Huo S, Brüschweiler R (2007) Evaluation of con- entropies using an information theoretic approximation. 116(9): 2891–2904. figurational entropy methods from peptide folding-unfolding simulation. J Phys 40. Lindorff-Larsen K, Best RB, Depristo MA, Dobson CM, Vendruscolo M (2005) Simul- Chem B 111(49):13807–13813. taneous determination of protein structure and dynamics. Nature 433(7022):128–132. 13. Li DW, Showalter SA, Brüschweiler R (2010) Entropy localization in proteins. J Phys 41. Trbovic N, et al. (2009) Protein side-chain dynamics and residual conformational en- Chem B 114(48):16036–16044. tropy. J Am Chem Soc 131(2):615–622. 14. Baxa MC, Haddadian EJ, Jha AK, Freed KF, Sosnick TR (2012) Context and force field 42. Cole C, Warwicker J (2002) Side-chain conformational entropy at protein-protein in- dependence of the loss of protein backbone entropy upon folding using realistic terfaces. Protein Sci 11(12):2860–2870. denatured and native state ensembles. J Am Chem Soc 134(38):15929–15936. 43. Chellgren BW, Creamer TP (2006) Side-chain entropy effects on protein secondary 15. Vijay-Kumar S, Bugg CE, Cook WJ (1987) Structure of ubiquitin refined at 1.8 A res- structure formation. Proteins 62(2):411–420. olution. J Mol Biol 194(3):531–544. BIOPHYSICS AND 44. Creamer TP, Rose GD (1994) Alpha-helix-forming propensities in peptides and pro- 16. Best RB, et al. (2012) Optimization of the additive CHARMM all-atom protein force teins. Proteins 19(2):85–97. COMPUTATIONAL BIOLOGY field targeting improved sampling of the backbone φ, ψ and side-chain χ(1) and χ(2) 45. Zhang J, Liu JS (2006) On side-chain conformational entropy of proteins. PLOS dihedral angles. J Chem Theory Comput 8(9):3257–3273. Comput Biol 2(12):e168. 17. Jorgensen WL, Chandrasekhar J, Madura JD, Impey RW, Klein ML (1983) Comparison 46. Baron R, van Gunsteren WF, Hünenberger PH (2006) Estimating the configurational of simple potential functions for simulating liquid water. J Comput Chem 79(2): entropy from molecular dynamics simulations: Anharmonicity and correlation cor- 926–935. rections to the quasi-harmonic approximation. Trends Phys Chem 11:87–122. 18. Jha AK, Colubri A, Freed KF, Sosnick TR (2005) Statistical coil model of the unfolded 47. Stirnemann G, Kang SG, Zhou R, Berne BJ (2014) How force unfolding differs from state: Resolving the reconciliation problem. Proc Natl Acad Sci USA 102(37): chemical denaturation. Proc Natl Acad Sci USA 111(9):3413–3418.

– CHEMISTRY 13099 13104. 48. Karanicolas J, Brooks CL, 3rd (2002) The origins of asymmetry in the folding transition 19. Li DW, Brüschweiler R (2012) PPM: A side-chain and backbone chemical shift predictor states of protein L and protein G. Protein Sci 11(10):2351–2361. J Biomol NMR for the assessment of protein conformational ensembles. 54(3): 49. Alm E, Baker D (1999) Prediction of protein-folding mechanisms from free-energy – 257 265. landscapes derived from native structures. Proc Natl Acad Sci USA 96(20):11305–11310. 20. Peti W, Smith LJ, Redfield C, Schwalbe H (2001) Chemical shifts in denatured proteins: 50. Muñoz V, Eaton WA (1999) A simple model for calculating the kinetics of protein Resonance assignments for denatured ubiquitin and comparisons with other dena- folding from three-dimensional structures. Proc Natl Acad Sci USA 96(20):11311–11316. J Biomol NMR – tured proteins. 19(2):153 165. 51. Yoo TY, et al. (2012) The folding transition state of protein L is extensive with non- 21. Jacob J, Krantz B, Dothager RS, Thiyagarajan P, Sosnick TR (2004) Early collapse is not native interactions (and not small and polarized). J Mol Biol 420(3):220–234. J Mol Biol – an obligate step in protein folding. 338(2):369 382. 52. Akke M, Bruschweiler R, Palmer AG (1993) NMR order parameters and free-energy— 22. Muñoz V, Serrano L (1997) Development of the multiple sequence approximation An analytical approach and its application to cooperative Ca2+ binding by calbindin- within the AGADIR model of alpha-helix formation: Comparison with Zimm-Bragg D(9k). J Am Chem Soc 115(21):9832–9833. and Lifson-Roig formalisms. Biopolymers 41(5):495–509. 53. Maragakis P, et al. (2008) Microsecond molecular dynamics simulation shows effect of 23. Zheng Z, Sosnick TR (2010) Protein vivisection reveals elusive intermediates in folding. slow loop dynamics on backbone amide order parameters of proteins. J Phys Chem B J Mol Biol 397(3):777–788. 112(19):6155–6158. 24. Piana S, Lindorff-Larsen K, Shaw DE (2013) Atomic-level description of ubiquitin 54. Fitzgerald JE, Jha AK, Sosnick TR, Freed KF (2007) Polypeptide motions are dominated folding. Proc Natl Acad Sci USA 110(15):5915–5920. by peptide group oscillations resulting from dihedral angle correlations between 25. Piana S, Klepeis JL, Shaw DE (2014) Assessing the accuracy of physical models used in nearest neighbors. Biochemistry 46(3):669–682. protein-folding simulations: Quantitative evidence from long molecular dynamics 55. Marlow MS, Dogan J, Frederick KK, Valentine KG, Wand AJ (2010) The role of con- simulations. Curr Opin Struct Biol 24:98–105. formational entropy in molecular recognition by calmodulin. Nat Chem Biol 6(5): 26. Li DW, Brüschweiler R (2009) A dictionary for protein side-chain entropies from NMR 352–358. order parameters. J Am Chem Soc 131(21):7226–7227. 56. Baldwin RL (2013) The new view of hydrophobic free energy. FEBS Lett 587(8): 27. Fenley AT, et al. (2014) Correlation as a determinant of configurational entropy in 1062–1066. supramolecular and protein systems. J Phys Chem B 118(24):6447–6455. 57. Jha AK, et al. (2005) Helix, sheet, and polyproline II frequencies and strong nearest 28. Scott KA, Alonso DO, Sato S, Fersht AR, Daggett V (2007) Conformational entropy of neighbor effects in a restricted coil library. Biochemistry 44(28):9691–9702. alanine versus glycine in protein denatured states. Proc Natl Acad Sci USA 104(8): 58. Phillips JC, et al. (2005) Scalable molecular dynamics with NAMD. J Comput Chem 2661–2666. 26(16):1781–1802. 29. D’Aquino JA, et al. (1996) The magnitude of the backbone conformational entropy 59. Vijay-Kumar S, et al. (1987) Comparison of the three-dimensional structures of hu- change in protein folding. Proteins 25(2):143–156. man, yeast, and oat ubiquitin. J Biol Chem 262(13):6396–6399.

Baxa et al. PNAS | October 28, 2014 | vol. 111 | no. 43 | 15401 Downloaded by guest on September 28, 2021