
Protein folding kinetics and thermodynamics from atomistic simulation

Stefano Piana^{a,1,2}, Kresten Lindorff-Larsen^{a,1,2}, and David E. Shaw^{a,b,2}

^{a}D. E. Shaw Research, New York, NY 10036; and ^{b}Center for Computational Biology and Bioinformatics, Columbia University, New York, NY 10032

Edited by Peter G. Wolynes, Rice University, Houston, TX, and approved June 13, 2012 (received for review March 2, 2012)

Advances in simulation techniques and computing hardware have created a substantial overlap between the timescales accessible to atomic-level simulations and those on which the fastest-folding proteins fold. Here we demonstrate, using simulations of four variants of the human villin headpiece, how simulations of spontaneous folding and unfolding can provide direct access to thermodynamic and kinetic quantities such as folding rates, free energies, folding enthalpies, heat capacities, Φ-values, and temperature-jump relaxation profiles. The quantitative comparison of simulation results with various forms of experimental data probing different aspects of the folding process can facilitate robust assessment of the accuracy of the calculations while providing a detailed structural interpretation for the experimental observations. In the example studied here, the analysis of folding rates, Φ-values, and folding pathways provides support for the notion that a norleucine double mutant of villin folds five times faster than the wild-type sequence, but following a slightly different pathway. This work showcases how molecular dynamics simulation has now developed into a mature tool for the quantitative computational study of protein folding and dynamics that can provide a valuable complement to experimental techniques.

Amber ff99SB*-ILDN ∣ heat capacity ∣ pre-exponential factor ∣ transition path time

Proteins are synthesized in the cell or in vitro as unstructured polypeptide chains that, in most cases, self-assemble into their functionally active three-dimensional shapes. This process, called protein folding, occurs on a broad range of timescales, from microseconds to seconds and longer. From a purely physical-chemical perspective, it should be possible in principle to characterize the folding mechanism of a given protein at atomistic resolution and to reconstruct its free-energy landscape, given only its primary sequence, through molecular dynamics (MD) simulations based on elementary physical principles. This direct approach has rarely been pursued because even the simplest systems representing a protein immersed in water consist of several thousand atoms, and simulating their behavior on the timescales typical of protein folding is computationally extremely demanding. The discovery and design of fast-folding proteins (1) significantly narrowed the timescale gap between simulations and experiments, making such simulations feasible, at least for the fastest-folding proteins.

The C-terminal fragment of the villin headpiece [referred to in the remainder of this paper simply as “villin” (2)], one of the fastest-folding protein domains known (3), has proven to be an excellent target for folding simulations with physics-based force fields and an atomistically detailed representation of both the solute and the surrounding solvent (4–7). Until recently, the length of such simulations was limited to a few microseconds, a timescale sufficient to capture, at best, a single folding event (8, 9). With this limitation, it has been difficult to directly connect the data produced by short, non-equilibrium simulations to experimental observations, unless sufficient statistics were generated to allow the construction of coarse-grained kinetic models that approximate the underlying folding dynamics (10, 11). While these models have proven useful for obtaining certain insights into the folding mechanism, it has been difficult to produce consistent predictions of the thermodynamics of villin folding (10–13), and where comparison has been possible between kinetic models built from multiple short simulations and independent, long equilibrium MD simulations (14), substantial differences have been observed in the folding free energies.

Recent advances in computer hardware have, however, extended the timescale accessible to simulation up to the millisecond (15), thus creating a broad overlap with the microsecond timescale characteristic of fast-folding proteins such as villin, and allowing for the direct calculation of equilibrium thermodynamic and kinetic properties from the simulation data (16, 17). Substantial improvements have also been made in the molecular mechanics force fields used in MD simulations (14, 18–20).

Taking advantage of improvements in both simulation speeds and force fields, we have employed equilibrium MD simulations to study the folding kinetics and thermodynamics of several variants of villin. Some researchers have suggested that temperature-jump experiments may underestimate the folding time of villin (4, 5) and may be more sensitive to processes other than folding. It should be noted, however, that some of these theoretical estimates were based on models that combine the results of non-equilibrium, short, individual trajectories, thereby introducing additional sources of uncertainty in the quantitative comparisons between the simulation results and the experimental observations. The equilibrium simulations presented here overcome this difficulty: in each trajectory at least 30 folding and unfolding events are observed, making it possible to compute thermodynamic and kinetic quantities directly, without the need to build approximate models to describe the system. Further, we compared simulations of different variants, allowing the direct calculation of Φ-values in a manner analogous to experiments (17, 21).

Most of the results presented here are in good agreement with previous experimental findings, with the notable exception of the heat capacity for folding, which appears to be smaller than the value extracted from calorimetric data. We find that the double norleucine (Nle/Nle) mutant (3) folds approximately five times faster than the wild-type protein, thus supporting the original interpretation of the experimental data (3). In agreement with our previous observations (14), the results reported here also indicate that both the number of helical residues and the Trp side-chain environment are sensitive to the folding/unfolding process, supporting the notion that experiments that probe these quantities, like infrared (IR) and fluorescence-detected temperature-jump, may be used to determine folding and unfolding rates (22).

Author contributions: S.P., K.L.-L., and D.E.S. designed research; S.P. and K.L.-L. performed research; S.P. and K.L.-L. analyzed data; and S.P., K.L.-L., and D.E.S. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

^{1}S.P. and K.L.-L. contributed equally to this work.

^{2}To whom correspondence may be addressed. E-mail: Stefano.Piana-Agostinetti@DEShawResearch.com or [email protected] or David.Shaw@DEShawResearch.com.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1201811109/-/DCSupplemental.

www.pnas.org/cgi/doi/10.1073/pnas.1201811109 PNAS ∣ October 30, 2012 ∣ vol. 109 ∣ no. 44 ∣ 17845–17850

Table 1. Folding kinetics and thermodynamics from equilibrium MD simulations of wild-type (WT) and two variants of villin headpiece C-terminal fragment

Variant        T_sim (K)  L (μs)  n    ΔG_f     ΔH_f (MD)  ΔH_f (VH)  ΔC_v (MD)  ΔC_v (ΔΔH)  τ_f (μs)  ⟨τ_TP⟩ (μs)  k_0 (μs⁻¹)
WT (HP-35)     345        398     30   0.8(2)   −15.1(4)   --         0.2(2)     --          19(5)     0.5(1)       0.65
WT (HP-35)     360        319     31   1.6(2)   −18(3)     −19(7)     0.1(2)     0.2(2)      16(4)     0.24(5)      1.49
WT-F10L        345        371     30   1.6(3)   −10(2)     --         0.4(2)     --          24(4)     0.39(7)      0.90
Nle/Nle        360        305     61   −0.6(2)  −16(3)     --         0.1(1)     --          3.2(6)    0.19(2)      1.05
Nle/Nle        370        395     150  0.0(1)   −18.2(8)   −22(8)     0.07(2)    0.1(2)      2.3(2)    0.15(1)      1.05
Nle/Nle        380        301     140  0.7(1)   −21.2(9)   −26(5)     0.0(4)     0.1(1)      3.0(4)    0.12(1)      2.33
Nle/Nle-F10L   360        301     110  0.4(1)   −14.7(8)   --         0.2(1)     --          3.5(5)    0.21(2)      1.00
Nle/Nle-F10L   370        300     130  0.8(1)   −16(1)     −15(5)     0.3(1)     0.1(1)      2.9(3)    0.18(1)      1.10

The temperature of each MD simulation (T_sim) is reported together with the total length (L) and the total number of observed folding and unfolding events (n). The trajectories have been partitioned into folded and unfolded segments using a transition-based assignment (14, 26). The folding free energy (ΔG_f, kcal mol⁻¹) is calculated from the ratio of the folded and unfolded fractions. Folding enthalpies (ΔH_f, kcal mol⁻¹) were calculated either from the folding free energy at different temperatures using the van’t Hoff equation (VH) or as ΔH = ΔU + VΔP, where ΔU is the difference in average force-field energy in the folded and unfolded states (MD). The heat capacities (ΔC_v, kcal mol⁻¹ K⁻¹) were calculated either from the difference in the fluctuations of the force-field energy between the folded and unfolded states (MD) or from the temperature dependency of the folding enthalpy (ΔΔH). The folding time (τ_f) is calculated as the average waiting time in the unfolded state. The pre-exponential factor for folding (k_0) is estimated from the folding time and the mean transition path time (⟨τ_TP⟩) using Kramers’ theory (40).
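The definitions in the Table 1 caption can be illustrated with a short Python sketch. This is not the authors' analysis code: the function names, the gas-constant value, and the two-point finite-difference form of the van't Hoff estimate are assumptions made here for illustration. The Kramers-type relation is evaluated in the forward direction, ⟨τ_TP⟩ = ln[3 ln(k_0/k_f)] / (2π k_0) with k_f = 1/τ_f, and the barrier estimate additionally assumes k_f = k_0 exp(−ΔG‡/RT).

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal mol^-1 K^-1 (assumed value)


def folding_free_energy(frac_folded, temperature):
    """dG_f = -RT ln(p_folded / p_unfolded); positive when the unfolded state is favored."""
    return -R_KCAL * temperature * math.log(frac_folded / (1.0 - frac_folded))


def mean_waiting_time(dwell_times):
    """Folding time taken as the average waiting time in the unfolded state."""
    return sum(dwell_times) / len(dwell_times)


def vant_hoff_enthalpy(dg1, t1, dg2, t2):
    """Two-point finite-difference estimate: dH ~ delta(dG/T) / delta(1/T)."""
    return (dg2 / t2 - dg1 / t1) / (1.0 / t2 - 1.0 / t1)


def kramers_tau_tp(k0, tau_f):
    """Mean transition path time from k0 (1/us) and folding time tau_f (us)."""
    return math.log(3.0 * math.log(k0 * tau_f)) / (2.0 * math.pi * k0)


def barrier_height(k0, tau_f, temperature):
    """Barrier in kcal/mol, assuming k_f = k0 * exp(-dG_barrier / RT)."""
    return R_KCAL * temperature * math.log(k0 * tau_f)


if __name__ == "__main__":
    # Wild-type entries from Table 1 (345 K and 360 K rows).
    print(vant_hoff_enthalpy(0.8, 345.0, 1.6, 360.0))
    print(kramers_tau_tp(0.65, 19.0))
    print(barrier_height(0.65, 19.0, 345.0))
```

With the wild-type entries of Table 1, this sketch gives a van't Hoff enthalpy of roughly −18 kcal mol⁻¹ (within the uncertainty of the tabulated −19(7)), a mean transition path time of roughly 0.5 μs at 345 K, and a barrier of roughly 1.7 kcal mol⁻¹. Because the tabulated values are rounded, inverting the relation numerically for k_0 can be ill-conditioned, which is why only the forward evaluation is shown.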

Results and Discussion

Equilibrium Reversible Folding Simulation of the Villin Headpiece C-terminal Fragment. We verified that the Amber ff99SB*-ILDN force field (18, 19, 23) appears to be reasonably transferable across different protein classes (SI Text) and used it to investigate computationally the kinetics and thermodynamics of villin folding. In particular, we performed equilibrium MD simulations, at multiple temperatures, of the human villin C-terminal fragment (24), the Nle/Nle double mutant (3), and variants where we introduced an F10L mutation into either the wild-type protein or the Nle/Nle variant. Each simulation was run for at least 300 μs and contained between 30 and 150 folding and unfolding events (Table 1).

The simulation trajectories were analyzed using the stable state approximation (25) and partitioned into folded and unfolded states through a “transition-based assignment” (26) of time series of the Cα-RMSD from the experimental native structure (14). We previously demonstrated that this approach is robust (14), and leads to a natural definition of transition paths as the portions of the trajectory where the system is transitioning between the two cutoffs.

Calculation of Thermodynamic Properties. Partitioning of the trajectory into folded and unfolded states allows for the direct calculation of the folding free energy from the ratio of the populations of the folded and unfolded states. The calculated (14) melting temperatures of the wild type and the Nle/Nle double mutant in simulation (325 K and 370 K, respectively) are in reasonably good agreement with the experimental values of 342 K (27) and 361 K (3), respectively, and remarkably consistent with the value of 320 K estimated for wild-type villin in an independent metadynamics simulation with the same force field (28). At 360 K we calculate that the double Nle/Nle mutant is 2.2 kcal mol⁻¹ more stable than the wild-type villin, compared to the experimental result of 1 kcal mol⁻¹ (3). Also, the F10L mutation destabilizes the Nle/Nle double mutant by 1 kcal mol⁻¹ and wild-type villin by 0.8 kcal mol⁻¹, in excellent agreement with the experimentally measured value of 1 kcal mol⁻¹ for the same mutation in the wild-type sequence at 340 K (29). We conclude that the simulations are able to satisfactorily reproduce both the absolute and relative stabilities of these three proteins.

The folding enthalpy was calculated from the difference in internal energy between the folded and unfolded states (Table 1). Similarly, the folding heat capacity was calculated from the differences in the fluctuations of the force-field energy in the two states. These values can be compared to the folding enthalpies and the folding heat capacities calculated from the temperature dependency of the folding free energy and enthalpy, respectively. Both methods of calculation result in similar folding enthalpies, with values for the three proteins ranging between 14 and 26 kcal mol⁻¹ (Table 1). These values are consistent with those observed in previous simulations performed with different force fields (14), but generally appear to be slightly smaller than the value of 29 kcal mol⁻¹ determined from calorimetry for wild-type villin (30) and the approximately 25 kcal mol⁻¹ estimated from a van’t Hoff analysis for the Nle/Nle double mutant (3).

As is the case for the folding enthalpies, the values calculated for the folding heat capacities from either the enthalpy changes as a function of temperature or the fluctuations of the folding enthalpy are the same within (a relatively large) error (Table 1). However, the calculated values, ranging between 0 and 0.2 kcal mol⁻¹ K⁻¹, are smaller than the corresponding ΔC_p values obtained from calorimetry experiments on the wild-type protein [0.457 kcal mol⁻¹ K⁻¹ (30)]. Similarly small heat capacities have also been observed in simulations performed with a different force field (16). This discrepancy should most likely be ascribed to deficiencies in the force field representation of the system, and may be related to the weak temperature dependence of the stability of helical structures (9). This appears to be a general characteristic of the force fields commonly used for protein simulations (19, 20), suggesting that some aspects of protein folding, like helix formation, involve many-body effects that may not be fully reproduced by simple pairwise additive force fields. The importance of such effects is likely to be system- and size-dependent and, while it appears to be possible to get good quantitative agreement with experiment in folding simulations of small proteins with current force fields, folding simulations of larger proteins may require force fields whose functional forms go beyond those currently used.

It has recently been suggested that unfolded states in MD simulations with simple water models like TIP3P may be unusually compact due to inaccuracies in the force-field description of the enthalpies of hydration (31). In our simulations of villin, we observed that the average radius of gyration of the unfolded state (approximately 11 Å) is only 1 Å larger than that of the folded state (approximately 10 Å). This relatively small change may be ascribed in part to the substantial fraction of helix that is observed in the unfolded state even at the melting temperature (14). Indeed, in our simulation of the FiP35 WW domain, a much less helical protein with the same number of amino acid residues, the radius of gyration of the unfolded state (approximately 16 Å) is 1.5 times larger than the value in the folded state, suggesting that the unusual compactness of the unfolded state observed for villin

is not necessarily an intrinsic property of all proteins simulated with this force field. In all cases, it should be noted that, while errors in the enthalpy and heat capacity affect the temperature-dependence of calculated properties, most of this study was carried out in a relatively narrow range of temperatures (between 345 and 380 K), and we do not observe dramatic deviations from experiment when comparing simulations performed in this limited range.

Kinetics. As described above, we used a transition-based assignment (14, 26) to define folded and unfolded segments of the trajectories, and calculated folding and unfolding rates from the waiting times in these states (Table 1). Experimental measurements of folding and unfolding rates are, however, generally based on observing the time-resolved relaxation of a spectroscopic signal after a small external perturbation; for fast-folding proteins this is typically a sudden increase in temperature of 5–10 K. For a two-state folder, the relaxation rate obtained from such an experiment is the sum of the folding and unfolding rates, and this relaxation rate can, together with a measured equilibrium constant, thus be used to determine the folding and unfolding rates. In the case of villin, such temperature-jump experiments have been performed using either tryptophan fluorescence (3, 27) or the amide I band in IR spectroscopy (32) as spectroscopic signals.

We recently described an approach by which the kinetics observed in a long MD simulation can be compared more directly with the results of temperature-jump experiments (14). In particular, in the limit of a small temperature perturbation, the relaxation kinetics observed in a temperature-jump experiment can be compared to an MD simulation by calculating the autocorrelation function (ACF) of a quantity that mimics the experimentally employed spectroscopic probe (14). This analysis requires (i) simulations that are much longer than the timescale of the relaxation of interest (so that the ACFs can be calculated with a reasonable statistical accuracy) and (ii) an efficient method that can be used to calculate the spectroscopic signal from the structures observed in the simulation. Most of the simulations presented here satisfy the first requirement, as they are one to two orders of magnitude longer than the folding and unfolding times (Table 1). To the best of our knowledge, however, no computationally affordable method has been reported that allows for a highly accurate calculation of IR spectra or excited-state energies for a large number of conformations in a complex protein system. We have thus limited ourselves to calculating quantities that are expected to be correlated with the spectroscopic signal of interest, although the exact nature of the correlation is unknown. This approach is expected to be sufficient for the estimation of relaxation times, but may prevent quantitative comparisons with the amplitude of the signal observed in experiment, in particular when relaxations on multiple timescales are present.

The spectroscopic signal from the amide I band in IR spectroscopy is primarily sensitive to the number and geometry of backbone hydrogen bonds, and is therefore a global measure of the secondary structure content in the protein. In our calculations we approximate this signal by the total number of amino acid residues that are found in a helical geometry (33). The fluorescence properties of tryptophan residues are strongly influenced by fluctuations of the electric field surrounding the indole ring, which in turn is determined by the type of environment that surrounds the Trp side chain (34). In the absence of an accurate and efficient method to predict fluorescence properties, we approximate this signal by the solvent-accessible surface area (SASA) of the indole ring (14), although we note that a very recent study suggests that a more detailed description of the local geometry may be required to fully capture the fluorescence properties of villin (35). We therefore calculated the ACFs of the number of helical residues and the indole SASA from our equilibrium simulations of villin folding. Fitting of the ACFs to a single exponential decay resulted in considerable discrepancies for short lag times. They could, however, be fitted well to a double exponential decay (Fig. 1A) with a fast phase (time constants between 40 ns and 200 ns) and a slow phase (time constants between 0.4 μs and 5 μs). A similar two-exponential fit was deemed necessary in the analysis of temperature-jump experiments (3).

The faster relaxation was originally attributed to intra-basin dynamics (3, 27) and has been the subject of extensive triplet–triplet energy transfer (TTET) studies (36) and simulations (10, 28). We have simulated TTET experiments using the definition of TTET-active states proposed in (10) and the trajectory data of wild-type villin at 345 K, as these are the simulation data most readily comparable to the TTET experiment; analysis of the Nle/Nle double mutant trajectory at 360 K gave very similar results. The calculated triplet decay profiles are all fully consistent with the experimental results obtained in 2 M guanidinium hydrochloride (36). For probes placed on the N-terminus and residue 23 (0/23) and on residues 7/23 (Fig. 1D), the triplet decay profiles are very similar (Fig. 1C) and can be modelled by two exponential decays with timescales of 60–100 ns and approximately 4.5 μs. A decomposition of the decay into the folded and unfolded state components shows that the slow phase corresponds to the unfolding transition and the fast phase to contact formation in the unfolded state. The calculated triplet decay profiles generated by probes placed on residues 23/35 and on the N-terminus and residue 35 (0/35) (Fig. 1D) can also be modelled by double-exponential decays (Fig. 1C), but this time with timescales of a few tens of nanoseconds and 190–230 ns. In these cases, the fast phase can be attributed to contact formation in the unfolded state, while the slower phase is generated by partial melting of helix 3 in the folded state, consistent with previous interpretations (10).

The timescale of the slow relaxation is very similar for the two spectroscopic probes and suggests that both the Trp side-chain environment and the number of helical residues are sensitive to the global folding and unfolding reaction. To quantify the extent to which this is the case, we plotted the long-timescale relaxation time constants obtained from fitting the ACFs against the relaxation time calculated directly from the folding and unfolding rates (Fig. 1B). The results show a strong correlation between the relaxation time obtained from the ACFs and the values obtained from the mean waiting times in the folded and unfolded states, consistent with recent experimental studies indicating that Trp fluorescence properties can be used to monitor the folding/unfolding transition in villin (22). Over the temperature range studied here, we find, in agreement with experiments, that the folding rates are only weakly dependent on temperature (3, 27), whereas the unfolding rates increase substantially as the temperature is increased (Table 1). We also find that wild-type villin folds substantially slower than the Nle/Nle double mutant. At 360 K, where we have data for both proteins, the wild-type protein folds in 16 μs and the double mutant in 3 μs; the relative folding rates of the two proteins are in excellent agreement with the experimental measurements (3, 27), while the absolute folding rates appear to be a factor of three slower than the rates experimentally determined at 300 K.

The ability to calculate folding rates and equilibrium constants directly from simulations of reversible folding also allows us to calculate protein-engineering Φ-values in a manner analogous to experiments (17, 21, 37). We performed simulations of a variant of villin in which we introduced the F10L mutation into both the wild-type and Nle/Nle backgrounds (Fig. 1D). This mutation disrupts several key hydrophobic interactions between helices 1 and 2 and is expected to have a sizable effect on the stability of the protein. As described above, this mutation gives rise to a 0.8 to 1 kcal mol⁻¹ destabilization when introduced either in the Nle/Nle variant or in wild-type villin (Table 1). We find that at 345 K the F10L mutation appears to decrease the folding rate of wild-type

[Fig. 1 graphic. (A) Helix ACF vs. time (μs). (B) τ from ACF (μs) vs. τ from kf + ku (μs), for the IR and fluorescence probes. (C) Relative absorbance vs. time (μs) for probes on residues 0/23, 0/35, 23/35, and 7/23. (D) Native structure with residues 0, 7, 10, 23, and 35 labeled.]

Fig. 1. Simulated T-jump and TTET experiments. (A) Simulated IR T-jump experiment from the wild-type villin simulation performed at 345 K. The autocorrelation function for the number of helical residues as reported by STRIDE (33) (black) is used as a proxy for the decay of the amide band IR absorption following a T-jump and can be fitted to the sum of two exponential decays (red) with timescales of 0.12 and 5.2 μs. The long timescale describes the folding/unfolding transition, as demonstrated by the excellent agreement (B) between relaxation times calculated from the simulated IR and fluorescence T-jump experiments and the sum of the folding and unfolding rates (kf and ku) obtained from a two-state analysis of the simulation trajectories. (C) Simulated TTET experiments from the wild-type villin simulation at 345 K. Four experiments with probes located on different pairs of residues were simulated. The decay profile of probe absorbance, monitored in TTET experiments, was calculated from the kinetics of contact formation using the contact definitions of (10). The dashed and dotted black lines show the decomposition of the solid black curve (probes located on residues 0 and 23) into the contributions coming from the folded (dashed) and unfolded (dotted) states. (D) Native state structure of the villin headpiece showing in red the side chains corresponding to the residues where donor and acceptor probes were attached in TTET experiments (36). The side chain of residue Phe10 is shown in cyan.
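The comparison in panels A and B can be sketched in a few lines of Python. This is an illustrative sketch under a strict two-state assumption, not the authors' fitting procedure: the function names and the fast-phase amplitude are hypothetical, and the unfolding time is derived here from the tabulated folding time and folding free energy via k_f/k_u = exp(−ΔG_f/RT).

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal mol^-1 K^-1 (assumed value)


def unfolding_time(tau_f, dg_fold, temperature):
    """Two-state estimate: tau_u = tau_f * exp(-dG_f / RT)."""
    return tau_f * math.exp(-dg_fold / (R_KCAL * temperature))


def relaxation_time(tau_f, tau_u):
    """Two-state relaxation time, 1 / (k_f + k_u)."""
    return 1.0 / (1.0 / tau_f + 1.0 / tau_u)


def double_exponential(t, amp_fast, tau_fast, tau_slow):
    """Normalized sum of a fast and a slow exponential decay (ACF model)."""
    return (amp_fast * math.exp(-t / tau_fast)
            + (1.0 - amp_fast) * math.exp(-t / tau_slow))


if __name__ == "__main__":
    # Wild-type villin at 345 K: tau_f = 19 us, dG_f = 0.8 kcal/mol (Table 1).
    tau_u = unfolding_time(19.0, 0.8, 345.0)
    print(relaxation_time(19.0, tau_u))
    # Timescales from the Fig. 1A caption; the amplitude 0.3 is a placeholder.
    print(double_exponential(1.0, 0.3, 0.12, 5.2))
```

For wild-type villin at 345 K this two-state estimate gives a relaxation time of roughly 4.5 μs, comparable to the 5.2 μs slow phase quoted in the caption for the helix ACF fit.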

villin, and the resulting Φ-value of 0.2 ± 0.2 is consistent with the fractional Φ-values measured experimentally at 310 K (Φ = 0.3) and 340 K (Φ = 0.6) (29). In contrast, this mutation has no observable effect on the folding rate of the Nle/Nle variant, and the calculated Φ-value for folding is therefore small (0.0 ± 0.2).

The small difference in the Φ-value calculated for wild-type villin and for the Nle/Nle variant might suggest that the introduction of two stabilizing Nle residues in helix 3 in the Nle/Nle double mutant subtly shifts the folding pathway relative to that in the wild-type protein (38). In our previous analysis of the Nle/Nle variant in the Amber ff99SB*-ILDN force field we found that helix 3 in general forms early during the folding pathway, and that helix 1 almost invariably forms last (14), a result that is in good agreement with the very low Φ-value that we here obtained computationally for F10L in the Nle/Nle background. A comparable analysis of the order of helix formation in wild-type villin shows that helix 2 forms first in 80% of the folding events (as compared to only 30% in the Nle/Nle double mutant). Thus, the larger Φ-value for F10L in the WT background reflects genuine, but subtle, differences in the folding mechanisms of the two proteins in our simulations. Future experiments probing the folding kinetics and thermodynamics of the F10L mutant in the Nle/Nle variant could be used to validate or disprove the pathway shift proposed on the basis of MD simulations.

The transition path time is the time it takes the protein to transition between the folded and unfolded basins, and can be substantially shorter than the mean waiting times between such folding and unfolding events (39, 40). The mean value of the transition path time is of considerable interest as it contains useful information regarding the properties of the free-energy barrier. It also determines the time resolution needed in experiments in order to resolve individual folding events in single-molecule studies (39, 40). For each folding and unfolding event we determined the transition path time as the time required to transition fully between the RMSD cutoffs used to define these two states. The calculated values range between 120 and 460 ns, and show an approximately exponential dependence on the temperature (Fig. 2A). The calculated mean transition path times can be used to estimate the pre-exponential factor for folding (k_0) using the relation (40):

\[
\langle \tau_{TP} \rangle = \frac{1}{2\pi k_0} \ln\!\left[ 3 \ln\!\left( \frac{k_0}{k_f} \right) \right] \tag{1}
\]

The values of k_0 estimated with this approach range between (0.5 μs)⁻¹ and (1.5 μs)⁻¹, in remarkable agreement with previous estimates (1), and also appear to be temperature-dependent (Fig. 2B). For diffusion in a rough potential, it has been proposed that k_0 should depend exponentially on the inverse, or on the square of the inverse, of the temperature (41). The exponent is expected to be related to the roughness of the free-energy surface (41). We note that the exact functional form describing the temperature-dependence of k_0 is not known and a substantial amount of additional data spanning a larger temperature interval would be required to infer it from simulation. A crude estimate of the roughness of the energy landscape can be obtained from an Arrhenius plot of k_0 (Fig. 2B), giving “effective activation energies” for diffusion of

approximately 7 kcal mol⁻¹ for wild-type villin, approximately 5 kcal mol⁻¹ for the Nle/Nle double mutant, and approximately 1 kcal mol⁻¹ for the F10L mutant, when assuming a simple exponential dependency on the inverse temperature. [For a model in which the exponent scales with the inverse square of the temperature (41) we find the corresponding numbers to be approximately 2 kcal mol⁻¹ for wild-type villin, approximately 2 kcal mol⁻¹ for the Nle/Nle double mutant, and approximately 1 kcal mol⁻¹ for the F10L mutant.]

[Fig. 2 graphic. (A) Mean transition path time (μs) vs. inverse temperature (K⁻¹). (B) Pre-exponential factor (μs⁻¹) vs. inverse temperature (K⁻¹). Series: Nle/Nle, Nle/Nle-F10L, wild type.]

Fig. 2. Arrhenius plots of mean transition path time and pre-exponential factor. Mean transition path times (A) and pre-exponential factors (B) observed across the seven different simulations of villin folding plotted as a function of the inverse of temperature. Pre-exponential factors are estimated from the mean transition path times and folding times using Kramers’ theory (40). The apparent activation free energy for diffusion can be defined as the slope of the Arrhenius plot where the logarithm of the pre-exponential factor is plotted against the inverse temperature.

Finally, we find that the distribution of transition-path times for individual folding and unfolding events is rather broad and characterized by a “lag” at short times where no events are found and a roughly exponential decay at longer timescales. In Fig. 3 we show a histogram of the observed transition path times for the 150 folding and unfolding events observed for the Nle/Nle double mutant at 370 K. The mean value is 150 ns, and the histogram is peaked around 40 ns. This shape is qualitatively consistent with that predicted by a number of theories (41–44); it would, however, require a substantially larger number of transitions to determine which of these theories fits the simulation results best. The short-end timescale of this distribution sets a natural lower limit to the timescales needed to observe folding events in shorter MD simulations. As long as the folding and unfolding relaxation times are substantially longer than the transition path time, as is the case for the simulations performed here, a two-state analysis will produce a reasonable description of the kinetics of the system.

[Fig. 3 graphic. Number of events vs. transition path time (μs).]

Fig. 3. Distribution of transition path times. The plot shows the distribution of the observed transition path times for the 150 folding/unfolding events observed in the Nle/Nle double mutant simulation at 370 K. On the longest timescales the distribution is roughly exponential. On the shortest timescales, however, there is a clear “lag time” so that almost no events are observed at the shortest timescales. The distribution can be fitted to a simple model with two intermediate states on the transition path (red) (40), although some discrepancy is observed for short timescales, suggesting that the actual transition is a more complex process.

[Fig. 4 graphic. Free energy (kcal mol⁻¹) vs. reaction coordinate r for Nle/Nle at 360 K, 370 K, and 380 K.]

Fig. 4. Free energy profiles for the fast-folding Nle/Nle double mutant of villin. Free energy profiles for the folding of the Nle/Nle double mutant of villin at three temperatures, projected along an optimized one-dimensional coordinate. The reaction coordinate was optimized separately for each simulation. To facilitate comparison, the profiles have been rescaled on the x-axis and translated so that the folded state has a coordinate value of approximately 0 and the unfolded state has a coordinate value of approximately 1. Folding free-energy barriers calculated from the profiles are 1.10 kcal mol⁻¹

Estimation of the Folding Free-Energy Barrier. We used a previously described variational approach (45, 46) to estimate the free-energy barrier for the folding of villin. In this approach, a reaction coordinate is optimized to separate maximally the transition paths from the stable folded and unfolded states. The free energy profile calculated along the optimized reaction coordinate is therefore expected to provide a reasonable estimate of the actual folding free-energy barrier. The free-energy barriers for folding calculated with this approach range between 1.1 kcal mol⁻¹ and 2.3 kcal mol⁻¹ (Fig. 4). These values are consistent with the pre-exponential factors of (0.5 μs)⁻¹ to (1.5 μs)⁻¹ estimated from the mean transition path times and the calculated folding times of 2.3 to 20 μs. In Fig. 4 we report the free energy profiles for the Nle/Nle double mutant, where simulations are available at three temperatures. While the three profiles are remarkably similar, the shape of the calculated barrier changes with temperature. In the 360-K simulation of the Nle/Nle mutant, a sparsely populated folding intermediate can be observed; analysis of the trajectory suggests that this intermediate corresponds to formation of helix 3 and the turn between helices 2 and 3. As the temperature is increased, this intermediate disappears and at 380 K the top of the barrier moves towards the native state, suggestive of Hammond behavior. The free-energy barriers calculated for the wild-type villin simulations (1.7 kcal mol⁻¹ and 2.3 kcal mol⁻¹ at 345 K and 360 K, respectively) can be
Briefly, in this approach, at 360 K, 1.14 kcal mol−1 at 370 K, and 1.70 kcal mol−1 at 380 K.

Piana et al. PNAS ∣ October 30, 2012 ∣ vol. 109 ∣ no. 44 ∣ 17849 Downloaded by guest on October 1, 2021 compared to the 0.5–2 kcal mol−1 estimated experimentally using and the Nle/Nle double mutant are slightly different and suggest a a number of approaches (30). possible way to probe this experimentally. Although the main focus of our work was to demonstrate the Methods feasibility of comparing folding simulations directly with experi- The Anton specialized hardware (15) was used to perform MD simulation ments, we also note here that many of our results are in line with with the Amber ff99SB*-ILDN (18, 19, 23) force field following the protocol described in (14). Further details on the simulation and analysis methods are expectations based on existing theories of protein folding such as reported in the SI Text. energy landscape theory (47). These theories describe protein folding as a diffusive process on a rough free energy landscape; Conclusion indeed we find that even in the presence of a marginal free energy We have further validated the Amber ff99SB*-ILDN force field barrier, landscape roughness still limits folding to the microse- and used it to examine the folding process of villin, demonstrating cond timescale, and that the relaxation kinetics can be exponen- how long-timescale molecular dynamics simulations can provide tial even with small barriers. Also, landscape theory suggests that direct access to a range of thermodynamic and kinetic properties the folding mechanism is more easily affected by perturbations for folding. Most of the calculated observables are in reasonably than native state structure, in good agreement with our findings. good agreement with experiments, the largest discrepancies being that the calculated heat capacity for folding is smaller than in the ACKNOWLEDGMENTS. We thank Ron O. Dror and William A. Eaton for helpful experiments and folding rates are slower by a factor of three. 
Our discussions and a critical reading of the manuscript and Mollie Kirk for simulations indicate that the folding pathway for wild-type villin editorial assistance.
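The consistency argument made in the section on the folding free-energy barrier, and the Arrhenius analysis of the pre-exponential factors, can be illustrated with a short numerical sketch. It assumes the simple two-state relation τ_fold = τ_0 exp(ΔG‡/RT), which may differ in detail from the Kramers-type expression used in the paper; the function names and the pairing of folding times, pre-exponential factors, and temperatures are hypothetical and chosen only for illustration.

```python
import math

# Molar gas constant in kcal mol^-1 K^-1.
R_KCAL = 1.987204e-3


def barrier_from_times(tau_fold_us, tau_pre_us, temperature_k):
    """Barrier (kcal/mol) implied by tau_fold = tau_pre * exp(dG / RT).

    tau_pre_us is the inverse of the pre-exponential factor, in microseconds.
    This relation is an assumption of the sketch, not the paper's exact formula.
    """
    return R_KCAL * temperature_k * math.log(tau_fold_us / tau_pre_us)


def arrhenius_activation_energy(temps_k, prefactors_per_us):
    """Apparent activation energy (kcal/mol) from ln(k0) versus 1/T.

    Fits a straight line by ordinary least squares; for k0 = A exp(-Ea / RT)
    the slope is -Ea / R, so Ea = -R * slope.
    """
    xs = [1.0 / t for t in temps_k]
    ys = [math.log(k) for k in prefactors_per_us]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return -R_KCAL * sxy / sxx


# Illustrative pairings of (folding time, 1 / pre-exponential factor, T);
# the text gives only the ranges, not which values belong together.
for tau_fold, tau_pre, temp in [(2.3, 0.5, 380.0), (20.0, 1.5, 345.0)]:
    dg = barrier_from_times(tau_fold, tau_pre, temp)
    print(f"tau_fold={tau_fold} us, tau_0={tau_pre} us, T={temp} K: "
          f"barrier ~ {dg:.2f} kcal/mol")
```

Under these assumptions the implied barriers come out at roughly 1 to 2 kcal mol−1, the same order as the values obtained from the free energy profiles. The second function only recovers a slope; with three or fewer temperatures per variant, as here, the resulting activation energy should be read as a rough estimate.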

1. Kubelka J, Hofrichter J, Eaton WA (2004) The protein folding “speed limit”. Curr Opin Struct Biol 14:76–88.
2. McKnight CJ, Doering DS, Matsudaira PT, Kim PS (1996) A thermostable 35-residue subdomain within villin headpiece. J Mol Biol 260:126–134.
3. Kubelka J, Chiu TK, Davies DR, Eaton WA, Hofrichter J (2006) Sub-microsecond protein folding. J Mol Biol 359:546–553.
4. Ensign DL, Kasson PM, Pande VS (2007) Heterogeneity even at the speed limit of folding: Large-scale molecular dynamics study of a fast-folding variant of the villin headpiece. J Mol Biol 374:806–816.
5. Freddolino PL, Schulten K (2009) Common structural transitions in explicit-solvent simulations of villin headpiece folding. Biophys J 97:2338–2347.
6. Mittal J, Best RB (2010) Tackling force-field bias in protein folding simulations: Folding of Villin HP35 and Pin WW domains in explicit water. Biophys J 99:L26–L28.
7. Duan Y, Kollman PA (1998) Pathways to a protein folding intermediate observed in a 1-microsecond simulation in aqueous solution. Science 282:740–744.
8. Freddolino PL, Harrison CB, Liu Y, Schulten K (2010) Challenges in protein-folding simulations. Nat Phys 6:751–758.
9. Best RB (2012) Atomistic molecular simulations of protein folding. Curr Opin Struct Biol 22:52–61.
10. Beauchamp KA, Ensign DL, Das R, Pande VS (2011) Quantitative comparison of villin headpiece subdomain simulations and triplet–triplet energy transfer experiments. Proc Natl Acad Sci USA 108:12734–12739.
11. Beauchamp KA, et al. (2011) MSMBuilder2: Modeling conformational dynamics at the picosecond to millisecond scale. J Chem Theory Comput 7:3412–3419.
12. Bowman GR, Beauchamp KA, Boxer G, Pande VS (2009) Progress and challenges in the automated construction of Markov state models for full protein systems. J Chem Phys 131:124101.
13. Bowman GR, Pande VS (2010) Protein folded states are kinetic hubs. Proc Natl Acad Sci USA 107:10890–10895.
14. Piana S, Lindorff-Larsen K, Shaw DE (2011) How robust are protein folding simulations with respect to force field parameterization? Biophys J 100:L47–L49.
15. Shaw DE, et al. (2009) Millisecond-scale molecular dynamics simulations on Anton. Proceedings of the Conference on High Performance Computing, Networking, Storage and Analysis (SC09) (ACM, New York).
16. Lindorff-Larsen K, Piana S, Dror RO, Shaw DE (2011) How fast-folding proteins fold. Science 334:517–520.
17. Shaw DE, et al. (2010) Atomic-level characterization of the structural dynamics of proteins. Science 330:341–346.
18. Hornak V, et al. (2006) Comparison of multiple Amber force fields and development of improved protein backbone parameters. Proteins 65:712–725.
19. Best RB, Hummer G (2009) Optimized molecular dynamics force fields applied to the helix-coil transition of polypeptides. J Phys Chem B 113:9004–9015.
20. Lindorff-Larsen K, et al. (2012) Systematic validation of protein force fields against experimental data. PLoS ONE 7:e32131.
21. Settanni G, Rao F, Caflisch A (2005) Phi-value analysis by molecular dynamics simulations of reversible folding. Proc Natl Acad Sci USA 102:628–633.
22. Cellmer T, Buscaglia M, Henry ER, Hofrichter J, Eaton WA (2011) Making connections between ultrafast protein folding kinetics and molecular dynamics simulations. Proc Natl Acad Sci USA 108:6103–6108.
23. Lindorff-Larsen K, et al. (2010) Improved side-chain torsion potentials for the Amber ff99SB protein force field. Proteins 78:1950–1958.
24. Chiu TK, et al. (2005) High-resolution X-ray structures of the villin headpiece subdomain, an ultrafast folding protein. Proc Natl Acad Sci USA 102:7517–7522.
25. Northrup SH, Hynes JT (1980) The stable states picture of chemical reactions. I. Formulation for rate constants and initial condition effects. J Chem Phys 73:2700–2714.
26. Buchete NV, Hummer G (2008) Coarse master equations for peptide folding dynamics. J Phys Chem B 112:6057–6069.
27. Kubelka J, Eaton WA, Hofrichter J (2003) Experimental tests of villin subdomain folding simulations. J Mol Biol 329:625–630.
28. Saladino G, Marenchino M, Gervasio FL (2011) Bridging the gap between folding simulations and experiments: The case of the villin headpiece. J Chem Theory Comput 7:2675–2680.
29. Kubelka J, Henry ER, Cellmer T, Hofrichter J, Eaton WA (2008) Chemical, physical, and theoretical kinetics of an ultrafast folding protein. Proc Natl Acad Sci USA 105:18655–18662.
30. Godoy-Ruiz R, et al. (2008) Estimating free-energy barrier heights for an ultrafast folding protein from calorimetric and kinetic data. J Phys Chem B 112:5938–5949.
31. Best RB, Mittal J (2010) Protein simulations with an optimized water model: Cooperative helix formation and temperature-induced unfolded state collapse. J Phys Chem B 114:14916–14923.
32. Bunagan MR, Gao J, Kelly JW, Gai F (2009) Probing the folding transition state structure of the villin headpiece subdomain via side chain and backbone mutagenesis. J Am Chem Soc 131:7470–7476.
33. Frishman D, Argos P (1995) STRIDE: A web server for secondary structure assignment from known atomic coordinates of proteins. Proteins 23:566–579.
34. Callis PR (2011) Predicting fluorescence lifetimes and spectra of biopolymers. Methods Enzymol 487:1–38.
35. Tusell JR, Callis PR (2012) Simulations of tryptophan fluorescence dynamics during folding of the villin headpiece. J Phys Chem B 116:2586–2594.
36. Andreas R, Henklein P, Kiefhaber T (2010) An unlocking/relocking barrier in conformational fluctuations of villin headpiece subdomain. Proc Natl Acad Sci USA 107:4955–4960.
37. Matouschek A, Kellis JT, Jr, Serrano L, Fersht AR (1989) Mapping the transition state and pathway of protein folding by protein engineering. Nature 340:122–126.
38. Lei H, Chen C, Xiao Y, Duan Y (2011) The protein folding network indicates that the ultrafast folding mutant of villin headpiece subdomain has a deeper folding funnel. J Chem Phys 134:205104.
39. Chung HS, Louis JM, Eaton WA (2009) Experimental determination of upper bound for transition path times in protein folding from single-molecule photon-by-photon trajectories. Proc Natl Acad Sci USA 106:11837–11844.
40. Chung HS, McHale K, Louis JM, Eaton WA (2012) Single molecule fluorescence experiments determine protein folding transition path times. Science 335:981–984.
41. Zwanzig R (1988) Diffusion in a rough potential. Proc Natl Acad Sci USA 85:2029–2030.
42. Zhang BW, Jasnow D, Zuckerman DM (2007) Efficient and verified simulation of a path ensemble for conformational change in a united-residue model of calmodulin. Proc Natl Acad Sci USA 104:18043–18048.
43. Malinin SV, Chernyak VY (2010) Transition times in the low-noise limit of stochastic dynamics. J Chem Phys 132:014504.
44. Chaudhury S, Makarov DE (2010) A harmonic transition state approximation for the duration of reactive events in complex molecular rearrangements. J Chem Phys 133:034118.
45. Best RB, Hummer G (2005) Reaction coordinates and rates from transition paths. Proc Natl Acad Sci USA 102:6732–6737.
46. Hummer G (2005) Position-dependent diffusion coefficients and free energies from Bayesian analysis of equilibrium and replica molecular dynamics simulations. New J Phys 7:34.
47. Onuchic JN, Wolynes PG (2004) Theory of protein folding. Curr Opin Struct Biol 14:70–75.
