Chemical, physical, and theoretical kinetics INAUGURAL ARTICLE of an ultrafast folding protein

Jan Kubelkaa,b,1, Eric R. Henrya,1, Troy Cellmera, James Hofrichtera, and William A. Eatona,2

aLaboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892-0520; and bDepartment of Chemistry, University of Wyoming, Laramie, WY 82071

This contribution is part of the special series of Inaugural Articles by members of the National Academy of Sciences elected on April 25, 2006.

Contributed by William A. Eaton, September 3, 2008 (sent for review June 25, 2008) An extensive set of equilibrium and kinetic data is presented and A analyzed for an ultrafast folding protein—the villin subdomain. The equilibrium data consist of the excess heat capacity, trypto- phan fluorescence quantum yield, and natural circular-dichroism spectrum as a function of temperature, and the kinetic data consist of time courses of the quantum yield from nanosecond-laser temperature-jump experiments. The data are well fit with three kinds of models—a three-state chemical-kinetics model, a physical- kinetics model, and an Ising-like theoretical model that considers 105 possible conformations (microstates). In both the physical- kinetics and theoretical models, folding is described as diffusion on a one-dimensional free-energy surface. In the physical-kinetics model the reaction coordinate is unspecified, whereas in the theoretical model, order parameters, either the fraction of native BIOPHYSICS contacts or the number of native residues, are used as reaction B coordinates. The validity of these two reaction coordinates is demonstrated from calculation of the splitting probability from the rate matrix of the master equation for all 105 microstates. The analysis of the data on site-directed mutants using the chemical- kinetics model provides information on the structure of the transition-state ensemble; the physical-kinetics model allows an

estimate of the height of the free-energy barrier separating the CHEMISTRY folded and unfolded states; and the theoretical model provides a detailed picture of the free-energy surface and a residue-by- residue description of the evolution of the folded structure, yet contains many fewer adjustable parameters than either the chemical- or physical-kinetics models. Fig. 1. Structure of villin subdomain solved by x-ray diffraction (PDB 1WY4) (2). fluorescence ͉ funneled energy landscape ͉ Ising-like model ͉ Ribbon diagram of backbone showing the side chains of W23 and H27 (A) and laser temperature jump ͉ polypeptide structure with all nonhydrogen atoms (B). Residues F6,K7,A8,G11,M12,T13 are shown in black because their contacts contribute most to the stability of the most populated microstates of the transition state ensemble at 310 K (see Figs. S8 major challenge to advancing our understanding of how and S9). Aproteins fold is the development of an analytical theoretical model capable of calculating the quantities directly measured in both equilibrium and kinetic experiments. We have approached able states and the folding rates for individual proteins and have this problem experimentally by studying a small ultrafast folding also had some success in predicting the relative effect on folding protein, the 35-residue subdomain from the villin headpiece rates and equilibrium constants produced by site-directed mu- (1–7) (Fig. 1). It is the smallest naturally occurring protein that tations [i.e., ␾-values (22)] (12, 16–18, 23). However, up to now autonomously folds into a globular structure (8–10), so it should they have not been used to calculate the physical properties that have one of the simplest protein-folding mechanisms, which may are actually measured experimentally. therefore be amenable to understanding in depth by a theoretical In this work, we present and analyze an extensive set of model. Moreover, because folding of this protein occurs in a few equilibrium and kinetic data on the villin subdomain. The microseconds, close to the proposed theoretical speed limit (4, equilbrium data consist of the excess heat capacity (6), trypto- 11), it can be investigated in detail by molecular-dynamics simulations. Our theoretical approach is to calculate the exper- imentally measured quantities with an Ising-like statistical me- Author contributions: J.K., E.R.H., T.C., J.H., and W.A.E. designed research, performed chanical model (12, 13), originally developed to explain our research, analyzed data, and wrote the paper. results on the ␤-hairpin from the protein GB1 (14, 15), and The authors declare no conflict of interest. similar to models of Baker, Finkelstein, and coworkers (16–18). 1J.K. and E.R.H. contributed equally to this work. The key simplifying feature of these models is that they explicitly 2To whom correspondence should be addressed at: Building 5, Room 104, National Insti- consider only interactions between residues that are in contact tutes of Health, Bethesda, MD 20892-0520. E-mail: [email protected]. in the native structure [the perfectly funneled energy landscape This article contains supporting information online at www.pnas.org/cgi/content/full/ of Wolynes and Onuchic (19–21)]. These models have been 0808600105/DCSupplemental. remarkably successful in predicting both the number of observ- © 2008 by The National Academy of Sciences of the USA

www.pnas.org͞cgi͞doi͞10.1073͞pnas.0808600105 PNAS ͉ December 2, 2008 ͉ vol. 105 ͉ no. 48 ͉ 18655–18662 Downloaded by guest on September 30, 2021 phan fluorescence quantum yield (QY), and natural circular- dichroism (CD) spectrum (1, 2, 4) as a function of temperature. The kinetic data consist of time courses of the QY over a wide temperature range from nanosecond-laser-induced tempera- ture-jump experiments. These measured quantities are calcu- lated using a coarse-grained version of the Ising-like model of Mun˜oz, Henry, and Eaton (12, 13). The kinetics are described by diffusion on a one-dimensional free-energy surface (24), using either the number of ordered residues (25) or the fraction of native contacts (26, 27) as reaction coordinates (28), and a position-dependent diffusion coefficient determined from mea- surements of the relaxation rate as a function of viscosity (7). We also calculate ␾-values for a number of mutants from the change in relaxation rate resulting from the perturbation of the free- Fig. 2. Excess heat capacity as a function of temperature. The filled circles are energy surface produced by the mutation. To test the validity of our use the experimental data. The curves are the fits using the three-state chemical- kinetics (dashed, red), physical-kinetics (continuous, cyan), and theoretical of order parameters as reaction coordinates, we calculated the splitting (continuous, blue) models. probability (also called the pfold) (29–32) from the rate matrix of the master equation for all 105 microstates of the model. In addition, we have analyzed the data in terms of a conven- Chemical-Kinetics Model. The simplest chemical-kinetics model is tional chemical-kinetics model and a model in which the kinetics a two-state model, in which there are only two populations of are described by diffusion on an empirical free-energy surface, molecules at equilibrium and at all times in kinetic experiments. similar to what has previously been done for other proteins by However, the observation of two phases in the kinetic progress Gruebele (33), Mun˜oz (34), and their coworkers. For lack of a curves requires consideration of a third state. The conventional better term, we call this a physical-kinetics model. The chemical- three-state folding model is one in which there is an intermediate and physical-kinetics models are not only helpful in interpreting state (I) on the pathway from the unfolded (U) to folded (F) experimental results, but they also expose features that support state, i.e. the validity of the theoretical results. We believe that our analysis, using three very different types of models to interpret 105 Ϫ 106 sϪ1 107 sϪ1 the data, represents the most comprehensive approach so far to U -|0 I -|0 F understanding the results of equilibrium and kinetic experiments on . Results This model provides an excellent fit to both equilibrium and kinetic data (Figs. 2–4). The standard equations used and the Experimental Data to Be Calculated. The most important equilib- rium data to be calculated are the heat capacity as a function of temperature measured by differential scanning calorimetry (6). The reason for its special importance is that the parameters A required to fit the data are purely thermodynamic quantities, unlike both equilibrium fluorescence and CD data that require additional parameters to describe the temperature dependence of the QY and CD for each state of the chemical-kinetics model, each position along the reaction coordinate for the physical- kinetics model, or each microstate of the theoretical model. Because the theoretical-kinetics model does not consider the contributions from hydration or internal degrees of freedom arising from bond and angle vibrations, and the chemical- kinetics model considers only differences in heat capacity rela- tive to the native state, the relevant experimental quantity is the B

)

3

heat capacity in excess of that for the fully folded, native state 1

(Fig. 2). 1010

Fig. 3 shows the equilibrium thermal unfolding curves mea- x

dmol

2- sured by fluorescence and CD. To reduce the number of 2 -

adjustable parameters in fitting with all three models, we also cm measured the fluorescence of a fragment to simulate the QY for

ellipticity

deg the fully unfolded protein (upper dotted curve in Fig. 3A). The deg results of the kinetics experiments are shown in Fig. 4. The ( observed progress curves [QY versus time, supporting informa- tion (SI) Fig. S1] are well characterized at each temperature by two exponential processes with relaxation times of Ϸ100 ns and 1–5 ␮s. Several lines of evidence, including an independent Fig. 3. Tryptophan fluorescence quantum yield and circular dichroism as a estimate of the folding rate from an analysis of end-to-end function of temperature. The filled circles are the experimental data. The contact measurements (3) and infrared studies by Dyer and curves are the fits using the three-state chemical-kinetics (dashed, red), phys- coworkers (35, 36), indicate that the slower relaxation corre- ical-kinetics (continuous, cyan), and theoretical (continuous, blue) models. (A) ␾ Tryptophan fluorescence: the upper dotted curve corresponds to the mea- sponds to global unfolding/refolding. Fig. 5 shows -values for 10 sured quantum yield of the fragment (AcWKQQH); the lower dotted curve is mutants calculated from simultaneous fitting of fluorescence the quantum yield of the theoretical-kinetics model for microstates that have and CD curves as in reference (1) and a W23-H27 contact. (B) Circular dichroism. The curves are the fits using the relaxation rates assuming a two-state model (equilibrium and three-state chemical-kinetics (dashed, red), physical-kinetics (continuous, rate constants are summarized in Table S1). cyan), and theoretical (continuous, blue) models.

18656 ͉ www.pnas.org͞cgi͞doi͞10.1073͞pnas.0808600105 Kubelka et al. Downloaded by guest on September 30, 2021 A INAUGURAL ARTICLE

B Fig. 6. Free-energy surface (continuous) and populations (dashed) of phys- ical-kinetics model.

unfolded and folded states, with the position of the minima allowed to move with temperature as a way of generating an additional relaxation. Recognizing that, with the possible excep- tion of the calorimetric data, the information content in the data is insufficient to determine curvatures, we used effectively the same curvature for both wells and barrier top and parameterized the surface with coordinates (qi,Gi) for the unfolded state, the barrier top, and the folded state. Fig. 6 shows the free-energy Fig. 4. Relaxation rates (A) and kinetic amplitudes (B) as a function of surfaces as a function of temperature that optimally fit the temperature. The open and filled circles are the relaxation rates obtained experimental data (the coordinate dependencies of the QY and from biexponential fits to the measured kinetic progress curves shown in Fig. S1. The curves are the fits using the three-state chemical-kinetics (red), phys- CD are shown in Fig. S3). The calorimetric data (Fig. 2) were fit BIOPHYSICS ical-kinetics (cyan), and theoretical models with P (green) and Q (blue) as from the temperature dependence of an equilibrium constant reaction coordinates. (Fig. S4), defined by the ratio of the populations on either side of the dividing line, taken as the value of q at the free-energy barrier top (see SI Appendix for details). values of the adjustable parameters of the model are given in SI The physical-kinetics model also provides an excellent fit to all Appendix and Table S2. The important result of the three-state of the equilibrium and kinetic data (Figs. 2–4). The important model, as indicated in the scheme above, is that the intermediate results of analyzing the data by using this model are that the (I) interconverts with the folded state (F) much faster than it Ϸ

free-energy barrier to folding is very small ( 2 kcal/mol) and CHEMISTRY does with the unfolded state (U) and therefore lies on the folded that the Ϸ100-ns process is explained as relaxation in the folded side of the major free-energy barrier (the populations of the well to more unfolded conformations as the temperature in- three states as a function of temperature are shown in Fig. S2). creases, as indicated by the movement of the folded-well mini- mum to smaller values of the reaction coordinate. Physical-Kinetics Model. Our physical-kinetics model consists of a free energy (G) versus reaction coordinate function, and the Theoretical-Kinetics Model. The model has been described in detail values of the observables (QY and CD) at each value of the elsewhere (6, 12, 13) (see SI Appendix and Table S3 for details). reaction coordinate (q). We assumed that there are only two The principal assumptions of the model are that each residue of deep minima on this free-energy surface, corresponding to the the polypeptide chain can exist in one of two possible states— native (n) or nonnative (c), as in an Ising model (37)—and that no more than two continuous stretches of native residues are allowed in each molecule (e.g., …cnnncccnnnccc…). This latter assumption, the so-called double-sequence approximation, greatly reduces the number of possible configurations from 235 (3 ϫ 1011)to6ϫ 104. The free energy and thermodynamic weight of a stretch of native residues of length j beginning at position i are, respectively, ϭ ␧Ϫ ⌬ ϭ ͑Ϫ ͒ Gji nji jT sconf, wji exp Gji/RT [1]

where nji is the number of contacts in the stretch, ␧ is the energy per contact, and ⌬sconf is the conformational-entropy cost of fixing a residue in its native conformation. The model allows contacts between residues in a native segment and between residues in two different native segments, so there is an addi- tional destabilizing term in the partition function for connecting Fig. 5. ␾-Values. The red points are the experimental data at 310 K (see Table the two segments by a disordered loop, but it contains no S1); the continuous blue and green curves are the ␾-values calculated for the adjustable parameters. The same energy (␧) was assigned to each Q and P reaction coordinates, respectively, assuming a two-state model and a interresidue contact and is an adjustable parameter of the model. 50 cal/mol perturbation of the stability by altering the contact energy for the ⌬ mutated residue. The dotted blue and green curves, for Q and P respectively, In the same spirit, the same entropy decrease ( sconf) for the correspond to the (Boltzmann-weighted) average fraction of native contacts nonnative to native transition was assigned to every residue. ⌬ ␧ for the microstates at reaction coordinates with free energies within 1 kBT of Allowing sconf and to be temperature-dependent to account the barrier top. for hydrophobic interactions gave no improvement to the fits.

Kubelka et al. PNAS ͉ December 2, 2008 ͉ vol. 105 ͉ no. 48 ͉ 18657 Downloaded by guest on September 30, 2021 Fig. 7. Results of theoretical-kinetics model. (A) Contact map (PDB 1YRF). (B) Free energy and relative population versus Q reaction coordinate. (C) Free energy and relative population versus P reaction coordinate. (D) Probability that a residue is in its native conformation at each temperature. The thick red bars indicate helical residues. (E) Probability that a residue is in its native conformation at each value of Q relative to all microstates at that value of Q.(F) Probability that a residue is in its native conformation at each value of P relative to all microstates at that value of P. (G–I). Relative probability that contact is formed at Q ϭ 0.09 (barrier top) (G), 0.33 (H), 0.71 (I). See Fig. S5 for corresponding plot for P reaction coordinate.

A contact between residues exists if the distance between contribution to the CD from the lone tryptophan (see Eqs. S7 and ␣-carbons of the polypeptide backbone in the folded structure is S8 in SI Appendix). For the QY of each microstate we assumed that Յ0.8 nm. This definition results in a rather sparse interresidue the only source of quenching relative to the fully unfolded state contact map (Fig. 7A), with only 13 nonlocal contacts (which we arises from the contact of the tryptophan with the protonated define as contacts between residues separated by five or more one turn away on the helix (Fig. 1) when all of the residues in the sequence) and 9 contacts between helices. Using intervening residues are in their native conformation. This assump- atom–atom contacts as a criterion for interresidue contacts, Chiu tion is based on the well known quenching of tryptophan fluores- et al. (2) found the corresponding numbers to be 18 and 13. cence by protonated histidine (38) and our observation of a Ϸ4-fold Importantly, our more coarse-grained contact map does contain decrease in the amplitude for the slower unfolding/refolding relax- contacts by all three of the core (F6, F10, and ation at pH 7, where the histidine is mainly unprotonated (data not F17) and two of the three interhelical hydrogen bonds. shown). For microstates in which this contact is not made, we The excess heat capacity was calculated directly from the parti- assumed that the QY is the same as that measured for a short tion function of the model (see Eqs. S1–S5 in SI Appendix), whereas peptide fragment containing the lone tryptophan of the sequence calculation of the unfolding curves measured by CD and fluores- (upper dotted curve in Fig. 3A). cence required a model for the CD and the QY for each microstate The kinetics were obtained by solving the system of differen- of the model (see SI Appendix for details). For short helices, there tial equations describing reversible hopping between adjacent is a large dependence of the CD on the length of the helix. We used discrete values of the reaction coordinate on a one-dimensional the model of Thompson et al. (38), which also considers the free-energy surface, where the reaction coordinate is taken as

18658 ͉ www.pnas.org͞cgi͞doi͞10.1073͞pnas.0808600105 Kubelka et al. Downloaded by guest on September 30, 2021 either the number of ordered residues (P) (25) or the fraction of native contacts (Q) (26) (see Eqs. S9–S12 in SI Appendix) (Figs. A 7 B and C). The QY at each position along the reaction INAUGURAL ARTICLE coordinate was calculated from the Boltzmann-weighted con- tribution of the subset of the 92,696 microstates of the partition function having that value of the reaction coordinate. Fig. S1 shows representative fits to the progress curves, with all of the results summarized in Fig. 4, which compares the calculated and experimental values of the relaxation rates and amplitudes at each temperature. Fig. 5 contains the ␾-values predicted by the model using two different types of calculations (see SI Appendix for details). In one, ␾-values were calculated in the small-perturbation limit by B adjusting the contact energy of the mutated residue to produce a new free-energy surface with a 50-cal/mol decrease in the stability of the folded state, assuming a two-state system with the dividing surface at the barrier top (Q ϭ 0.09 or P ϭ 5). The folding rate was then obtained from the new relaxation rate and equilibrium constant to yield the ␾-value, as was done with the experimental data. In the second method, the ␾-value was obtained from the (Boltzmann-weighted) fraction of native contacts for each residue for all microstates at and close to the barrier top, assumed to be the transition state. This method corresponds to the conventional interpretation of experimental ␾-values and is also similar to what is done to calculate ␾-values from simulation results; the correct calculation, namely the deter- mination of the change in folding rate and equilibrium constant Fig. 8. Distribution of spitting probabilities (pfold) calculated from the rate Ͻ BIOPHYSICS produced by the mutation, is much more difficult to calculate from matrix for microstates 2 kBT above the most stable one at each value of the ϭ simulations and has not yet been done for any protein. reaction coordinate at 310 K. (A) pfold distribution at Q 0.03, 0.09 (the barrier top), and 0.27. (B) p distribution at P ϭ 3, 5 (the barrier top), and 10. See Fig. The important output of the theoretical model (Fig. 7) is fold S6 for pfold distribution for all microstates at the same values of the reaction mechanistic information, that is, the prediction of how the coordinates. structure evolves from the unfolded to folded state along the reaction coordinates, P and Q. Fig. 7 E and F show the relative probability at each value of the Q or P reaction coordinate, and equilibrium constant—the ␾-value (22), defined as ⌬lnkf/

respectively, that a residue is in its native conformation, whereas ⌬lnKeq. Because of the separation in time scales between the fast CHEMISTRY Fig. 7 G–I shows the evolution of the interresidue contacts along phase and the global unfolding/refolding phase, the fast phase the Q reaction coordinate (the corresponding contact maps for could be ignored, permitting the calculation of ␾-values from a P are shown in Fig. S5 and are very similar). simple two-state analysis to obtain the folding rate and equilib- rium constant for the wild-type and the mutants. None of the Splitting Probabilities. The splitting probability (or pfold)isthe mutations are ideal for ␾-value analysis, because they do not probability that a given microstate of the model will reach the represent small structural changes, such as simple deletion of a folded state before reaching the unfolded state (29–32). These methyl group, as in replacing with or probabilities were calculated for the individual microstates from with . Nevertheless, the ␾-values are unusually low at 310 the rate matrix for the master equation of the theoretical model K (Fig. 5), suggesting a transition state with little structure (see SI Appendix) (39). Fig. 8 shows the splitting-probability formation and therefore one that appears very early along the distribution for those microstates of the model with free energies reaction coordinate. There is, moreover, a significant increase in of 2 k T or less above the lowest-free energy microstate for the B ␾-values for four of the nine residues for which ␾-values could specified value of the coordinate. Fig. S6 shows the distribution be calculated at 340 K (Table S1), suggesting a shift in the transition for all microstates at these values of the reaction coordinates. state toward the folded state at the higher temperature. Discussion The physical-kinetics model, which consists of an empirical Each of the three models used to simultaneously fit the equi- one-dimensional free-energy surface with two deep minima librium and kinetic data provides information on the mechanism (Fig. 6) and the coordinate dependence of the fluorescence QY of folding of the villin subdomain. The three-state chemical- and CD (Fig. S3) to give a total of 16 adjustable parameters kinetics model provides an excellent fit to all of the data (Figs. (Table S4), also provides an excellent fit to the equilibrium and 2–4), which is not surprising because there are 18 adjustable kinetic data (Figs. 2–4). According to this model, the position of parameters in the model (Table S2). The important result from the folded-well minimum moves toward smaller values of the this model is that the intermediate interconverts with the fully reaction coordinate with increasing temperature. This motion folded state (Ϸ107 sϪ1) much faster than with the unfolded state represents partial unfolding and explains the decrease in helix (Ϸ105 to 106 sϪ1), from which we conclude that it is located on content before the global thermal unfolding (Fig. 3B). It also the folded side of the major free-energy barrier. The three-state explains the Ϸ100-ns phase in the kinetics (Fig. 4) as reconfigu- model also attributes the increase in CD before the main ration in the shifted folded well at the elevated temperature. The unfolding transition (Fig. 3B) to an increase in the partially lack of temperature dependence for the rate of the Ϸ100-ns unfolded intermediate-state population of lower helix content phase is consistent with our a priori assumption that there is no (Fig. S2). The important feature of a chemical-kinetics model is additional barrier on the free-energy surface. that there is a straightforward prescription for obtaining infor- The major result of the physical-kinetics model is the predic- mation on the ensemble of structures of the transition state from tion of the height of the free-energy barrier separating folded the relative effects of site-directed mutants on the folding rate and unfolded states. According to Kramers’ theory, the rate

Kubelka et al. PNAS ͉ December 2, 2008 ͉ vol. 105 ͉ no. 48 ͉ 18659 Downloaded by guest on September 30, 2021 of barrier crossing depends on the curvature of the wells and the bution of splitting probabilities for those microstates of the model barrier top and exponentially on the barrier height (24), so the within 2 kBT of the lowest free-energy microstate at the barrier top barrier height dominates. An important result of the physical- and at positions 1.6 kBT to the left and right of the barrier top. A kinetics model, then, is that the free-energy barrier to folding substantial fraction of microstates exhibits a pfold between 0.4 and (Fig. 6) is small (Ϸ2 kcal/mol). A Ϸ2-kcal/mol barrier is also 0.6 at the barrier top for both Q (28% at Q ϭ 0.09) and P (44% at simply calculated from Kramers’ equation for a two-state system, P ϭ 5) reaction coordinates, with a sharp change in the pfold assuming that the diffusion coefficients and curvatures in the distribution at higher and lower values of Q and P. We also asked folded and unfolded well and at the barrier top of the free-energy the question: what fraction of the low lying microstates with a pfold ⌬ ‡ ϭ ␶ ␲␶ ϭ surface are the same, i.e., Gf RTf ln( f/2 ) 1.6 kcal/mol, between 0.4 and 0.6 is within the barrier region, i.e., at reaction where ␶f (4.6 ␮s) is the folding time, and ␶ (70 ns) is the coordinates within 1 kBT of the barrier top? If we identify the low reconfiguration time in the folded well (4). The 1- to 2-kcal/mol lying microstates as those within 2 kBT of the lowest free-energy barriers at the folding temperature are also obtained by fitting microstate at each value of the reaction coordinate, we find that the the calorimetric data (Fig. 2) with the variable barrier model of barrier region contains 84% of the microstates with a pfold between Mun˜oz and Sanchez-Ruiz (6) or the temperature dependence 0.4 and 0.6 for Q and 100% for P. If the low lying microstates are of the relaxation rates (Fig. 4A) with a mean-field model of identified as those within 3 kBT of the lowest free-energy microstate Naganathan et al. (34). at each value of the reaction coordinate, the barrier region contains The Ising-like theoretical model provides remarkably good fits 48% for Q and 90% for P. All of these results indicate that both Q to all of the experimental equilibrium and kinetic data (Figs. and P are indeed good reaction coordinates for describing the 2–4) and yields the most information about the folding mech- kinetics. anism, yet contains far fewer adjustable parameters than either Another important test of the theoretical model is its ability the chemical-kinetics or physical-kinetics models. Although the to predict ␾-values, which cannot be calculated from either the fit to the excess heat capacity-vs.-temperature curve (Fig. 2) is chemical- or physical-kinetics model. Fig. 5 shows the ␾-values not as good as either of the other two models, the theoretical for the 10 mutations studied. Of these, only two (L20V and model requires only two adjustable parameters to (nearly) fit the L28V) might represent a sufficiently small structural perturba- heat-capacity data—a contact energy and a conformational- tion to satisfy the assumption of the ␾-value analysis that the entropy loss (Eq. 1) that are the same for every residue, mutation does not alter the unfolded state and changes only the compared with eight adjustable parameters of the chemical- interaction of the mutated residues with its contacting neighbors kinetics model and nine of the physical-kinetics model. The in the transition and folded states without perturbing any other fitted value of Ϫ3.70 cal molϪ1 KϪ1 for the conformational- residue–residue interactions. At 310 K the observed ␾-values are entropy loss, moreover, is comparable to the average value unusually low compared with all other proteins, which is qual- expected from thermodynamic analysis of many proteins (6). itatively explained by the theoretical model as resulting from the The kinetics are described by the theoretical model as hopping major barrier being very early along the reaction coordinate for along the discretized one-dimensional free-energy surface given by both Q and P (Fig. 7 B and C), where there is a very low the partition function, by using either the fraction of native contacts probability of native contact formation for most residues (Fig. (Q) or number of native residues (P) as reaction coordinates 7G and Figs. S5A, S8, and S9). The model also does a remarkably (25–27). By invoking a linear free-energy relation between the good job of predicting the low ␾-values quantitatively using two hopping rate and the ratio of equilibrium populations at adjacent different methods (Fig. 5 and Table S1). values of the reaction coordinate, the additional adjustable param- Although the system is not far from two-state, as judged by the eter required to describe the kinetics is the proportionality constant temperature dependence of the probability that a residue is in its (␥) and an activation energy for its temperature dependence (Eqs. native state (Fig. 7D), we do not have the simple case of a high S10–S12 in SI Appendix). From a recent study of the viscosity barrier that remains at the same position along the reaction dependence of the measured relaxation rates using a viscogen that coordinate at all temperatures. At temperatures above the folding has no effect on the equilibrium properties, we obtained the temperature of 340 K, the surface becomes more complex, indi- reaction-coordinate dependence of ␥ by fitting the data with the cating formation of an intermediate, and a new major barrier theoretical model (see Fig. S7) (7). The resulting simultaneous fits appears closer to the folded state (Q Ϸ 0.5 and P Ϸ 25). (A simple to all of the equilibrium and kinetic data (Figs. 2–4 and Fig. S1), calculation explains almost all of the shift: Increasing the temper- apart from the CD (see SI Appendix), are not quite as good as the ature by ⌬T results in an increase in free energy at P ϭ 25 relative other models, but again there is a large difference in the number of to P ϭ 5 from the change in the contribution from the conforma- adjustable parameters (7 for the theoretical model, 15 for the tional entropy of ⌬P ϫ⌬T ϫ⌬sconf, which, for a 40°C temperature physical-kinetics model, and 18 for the chemical-kinetics model). increase, is 3 kcal/mol). This change in the surface contributes to the The theoretical model fails to come close to fitting the Ϸ100-ns denaturant independence (Fig. S10) of the observed relaxation rate phase, attributed by the physical-kinetics model to reconfiguration because of the relative insensitivity of the folding and unfolding in the folded well. This relaxation apparently reflects the fine barriers to the contact energy at low and high denaturant concen- structure of the free-energy surface, which is not captured by this tration, respectively (Fig. 7 B and C) (5). The change in the simple theoretical model. The failure of the model may also result free-energy surface also results in the decrease in the viscosity from the implicit assumption of the same prefactor independent of dependence of the relaxation rate as the temperature is increased the type of motion (40) or from an oversimplification in our because of an increased contribution of internal friction in the more treatment of the (W23) quantum yield, which assumes that contact compact structures at higher values of the reaction coordinate (7) with the protonated histidine (H27) one turn away in the helix (Fig. (Fig. S7). The model therefore also predicts that there should be an 1) is the only additional source of fluorescence quenching in the increase in ␾-values with increasing temperature, and in the four folded state. mutants where measurements could be made at 340 K, there is A question that immediately arises in describing the kinetics as indeed a significant increase (Table S1). diffusion on the one-dimensional free-energy surface is: are Q and Because of the complexity of the surface, both experimental and P good reaction coordinates? A critical test is whether the splitting theoretical ␾-values are only approximate. The correct and more probability is close to one-half for the microstates at the free-energy rigorous approach is to compare the new thermal unfolding curves barrier tops of these profiles. We therefore calculated the splitting and relaxation rates that result from the mutation. That is, vary the probability (also called the pfold) from the rate matrix of the master contact energy (and possibly also the conformational entropy equation of the model (see SI Appendix). Fig. 8 shows the distri- change) of the mutated residue to optimally fit the new thermal

18660 ͉ www.pnas.org͞cgi͞doi͞10.1073͞pnas.0808600105 Kubelka et al. Downloaded by guest on September 30, 2021 unfolding curve and compare the calculated relaxation rate on the Concluding Remarks new free-energy surface with the observed relaxation rate. To test Our analysis of equilibrium and kinetic data using three different this prediction of the model will require more extensive data on the approaches highlights the importance of a theoretical model. INAUGURAL ARTICLE temperature dependence of the kinetics, particularly results from Despite having many fewer adjustable parameters than either more conservative mutations than are currently available. the chemical-kinetics or physical-kinetics models, the Ising-like The utility of the theoretical model, of course, is that if accurate model produces an almost equally good fit to a wide range of it provides a detailed description of the mechanism. Unlike ␾-value experimental data. An obvious question is: Why does such a analysis, which, with rare exceptions (21), provides structural in- simple model, with a perfectly funneled energy landscape and no formation at a single, albeit important point along the reaction distinction between the different amino acid residues, work so coordinate—the transition state—the theoretical model describes well? A possible answer to the first part is that coarse graining the complete evolution of the unfolded to folded structure. This works because of enthalpy–entropy compensation. This issue information is shown in Fig. 7 E and F as the probability of a residue could be explored further by using contact maps generated with being in the native confirmation at each position on the reaction different distance or atomistic criteria or with residue-specific coordinate and the evolution of the contact map (Fig. 7 G–I and potentials (45). The answer to the second part of the question is Fig. S5 A–C). The model predicts that the N-terminal helix (D3 to more biological than physical, and is based on the idea that natural selection minimizes nonnative interactions to prevent F10) forms first, along with an interhelical contact with the first few traps that will slow folding or lead to aggregation (19). residues of the second helix (R14 to F17), followed by the middle What can be done to further test and refine the model? One helix, and finally the C-terminal helix (L21 to E31). The longest- computational test would be to carry out Langevin simulations range contacts between the N- and C-terminal helices (V9-K32, of a coarse-grained representation of the protein. These calcu- F10-L28) do not begin to form until very late along the reaction lations could determine how many contiguous sequences would coordinate. Diagrams of the individual microstates of the transi- be required in the model to adequately describe the folding tion-state ensemble at 310 K (Figs. S8 and S9) show that for both process. An experimental test would be to carry out mutational Q and P reaction coordinates, the interaction between the helical studies that can be interpreted more rigorously, in which the residues F6-K7-A8 with the G11-M12-T13 loop contributes most to change in the amino acid is a simple methyl-group deletion (e.g., the stability (Fig. 1) (reflected in the peaks of the theoretically threonine to serine or leucine to norvaline). Finally, a more calculated ␾-values in Fig. 5), so that forming the transition state complete description of the mechanism can, of course, be ensemble does not correspond to simple helix nucleation. obtained from the kinetic equations that describe the intercon- BIOPHYSICS There have been a large number of simulations of various types version of all microstates of the theoretical model (Eq. S13). with the aim of describing the dynamical properties and folding Solution of these equations will yield the distribution of pathways mechanism of the villin subdomain (see SI Appendix foralistof that the protein takes from the folded to the unfolded state, as simulation references). So an obvious question might be: How does can be obtained by molecular simulations, and might be tested this evolution of the structure compare with the predictions of by single-molecule FRET experiments (46). simulations that provide much more structural detail than can be Materials and Methods

obtained from our theoretical model? We believe that it is prema- CHEMISTRY ture to make detailed comparisons for at least two reasons. First, The 35-residue villin subdomain (LSDED FKAVF GMTRS AFANL PLWKQ QHLKK EKGLF—helical residues in bold type) was obtained from California Peptide none of the directly measured experimental quantities have been Research. Solutions were buffered with 20 mM sodium acetate at pH 4.9. calculated in any of the simulations—most importantly, the tem- Details of the differential scanning calorimetry experiments have been de- perature dependence of the heat capacity that tests their thermo- scribed by Godoy-Ruiz et al. (6). The absolute heat capacity for the fully F ϭ ϩ dynamic accuracy. Second, although the agreement between the formed native structure was taken from the study of Freire, using CP (b Ϫ Ϫ1 Ϫ1 ϭ Ϫ1 Ϫ1 ϭ ϫ Ϫ3 predictions of our theoretical model and experimental results is c(T 273.15))Mr cal K mol , with b 0.329 cal K g , c 1.9 10 cal Ϫ2 Ϫ1 ϭ Ϫ1 impressive, as pointed out above, more extensive mutagenesis will K g , and the molecular weight of the protein M␥ 4084 g mol . Godoy-Ruiz et al. argued that the larger surface-to-volume ratio for this be required to more rigorously test the predicted order of structure 35-residue protein, with a greater fraction of surface residues having confor- formation. Nevertheless, there are two intriguing results that should mational freedom of the side chains compared with the much larger proteins be mentioned. Shakhnovich and coworkers used Monte Carlo used in the Freire study (47), justified the use of the Freire parameters that simulations of an atomistic model to calculate ␾-values from the produce the highest heat capacity within the experimental uncertainty. fraction of native contacts in the ensemble of microstates with 0.4 Ͻ CD was measured with a Jasco J-720 spectropolarimeter at a protein p Ͻ 0.6 (41). Although their calculated ␾-values are uniformly concentration of 0.2 mM. Fluorescence thermal unfolding curves were mea- fold ␮ higher at 300 K than we observe at 310 K (Fig. 5), the pattern of sured with a SPEX Fluorolog spectrofluorometer at 10 M concentration. ␾ Folding kinetics were measured at 0.5 mM by using a laser-temperature-jump -values indicates that the N-terminal and middle helices form apparatus described in ref. 38. Each kinetic progress curve was obtained from before the C-terminal helix. Pande and coworkers have carried out the average of 512 laser shots. The size of the temperature jump (7–10 K) was all-atom molecular-dynamics calculations in explicit solvent (42, calibrated by using the temperature dependence of N-acetyltryptophana- 43). They find that the native conformation is in fast exchange with mide fluorescence. ensembles of conformations containing the middle and C-terminal helices and conformations containing only the middle helix, pro- ACKNOWLEDGMENTS. We thank Attila Szabo, Peter Wolynes, Victor Mun˜oz, Ϸ and Eugene Shakhnovich for many helpful discussions. This work was sup- viding a possible explanation of the 100-ns relaxation (see also ported by the Intramural Research Program of the National Institute of ref. 44). Diabetes and Digestive and Kidney Diseases, National Institutes of Health.

1. Kubelka J, Eaton WA, Hofrichter J (2003) Experimental tests of villin subdomain folding 6. Godoy-Ruiz R, et al. (2008) Estimating free energy barrier heights for an ultra- simulations. J Mol Biol 329:625–630. fast folding protein from calorimetric and kinetic data. J Phys Chem B 112:5938– 2. Chiu TK, et al. (2005) High-resolution x-ray crystal structures of the villin headpiece 5949. subdomain, an ultrafast folding protein. Proc Natl Acad Sci USA 102:7517–7522. 7. Cellmer T, Henry ER, Hofrichter J, Eaton WA (2008) Measuring internal friction in 3. Buscaglia M, Kubelka J, Eaton WA, Hofrichter J (2005) Determination of ultrafast ultrafast folding kinetics. Proc Natl Acad Sci USA, in press. protein folding rates from loop formation dynamics. J Mol Biol 347:657–664. 8. McKnight CJ, Doering DS, Matsudaira PT, Kim PS (1996) A thermostable 35-residue 4. Kubelka J, Chiu TK, Davies DR, Eaton WA, Hofrichter J (2006) Sub-microsecond protein subdomain within villin headpiece. J Mol Biol 260:126–134. folding. J Mol Biol 359:546–553. 9. McKnight CJ, Matsudaira PT, Kim PS (1997) NMR structure of the 35-residue villin 5. Cellmer T, Henry ER, Kubelka J, Hofrichter J, Eaton WA (2007) Relaxation rate for an headpiece subdomain. Nat Struct Biol 4:180–184. ultrafast folding protein is independent of chemical denaturant concentration. JAm 10. Frank BS, Vardar D, Buckley DA, McKnight CJ (2002) The role of aromatic residues in the Chem Soc 129:14564–14565. hydrophobic core of the villin headpiece subdomain. Protein Sci 11:680–687.

Kubelka et al. PNAS ͉ December 2, 2008 ͉ vol. 105 ͉ no. 48 ͉ 18661 Downloaded by guest on September 30, 2021 11. Kubelka J, Hofrichter J, Eaton WA (2004) The protein folding ‘speed limit’. Curr Opin 31. Shakhnovich E (2006) Protein folding thermodynamics and dynamics: Where physics, Struct Biol 14:76–88. chemistry, and biology meet. Chem Rev 106:1559–1588. 12. Muñoz V, Eaton WA (1999) A simple model for calculating the kinetics of protein 32. Berezhkovskii A, Szabo A (2006) A perturbation theory of phi-value analysis of two- folding from three-dimensional structures. Proc Natl Acad Sci USA 96:11311–11316. state protein folding: Relation between p(fold) and phi values. J Chem Phys 13. Henry ER, Eaton WA (2004) Combinatorial modeling of protein folding kinetics: free 125:104902. energy profiles and rates. Chem Phys 307:163–185. 33. Ma HR, Gruebele M (2005) Kinetics are probe-dependent during downhill folding of an 14. Muñoz V, Thompson PA, Hofrichter J, Eaton WA (1997) Folding dynamics and mech- engineered lambda(6–85) protein. Proc Natl Acad Sci USA 102:2283–2287. anism of beta-hairpin formation. Nature 390:196–199. 34. Naganathan AN, Doshi U, Muñoz V (2007) Protein folding kinetics: Barrier effects in 15. Muñoz V, Henry ER, Hofrichter J, Eaton WA (1998) A statistical mechanical model for chemical and thermal denaturation experiments. J Am Chem Soc 129:5673–5682. ␤-hairpin kinetics. Proc Natl Acad Sci USA 95:5872–5879. 35. Brewer SH, et al. (2005) Effect of modulating unfolded state structure on the folding 16. Alm E, Baker D (1999) Prediction of protein-folding mechanisms from free-energy kinetics of the villin headpiece subdomain. Proc Natl Acad Sci USA 102:16662–16667. landscapes derived from native structures. Proc Natl Acad Sci USA 96:11305–11310. 36. Brewer SH, Song BB, Raleigh DP, Dyer RB (2007) Residue specific resolution of protein 17. Galzitskaya OV, Finkelstein AV (1999) A theoretical search for folding/unfolding nuclei in three-dimensional protein structures. Proc Natl Acad Sci USA 96:11299–11304. folding dynamics using isotope-edited infrared temperature jump spectroscopy. Bio- 18. Alm E, Morozov AV, Kortemme T, Baker D (2002) Simple physical models connect chemistry 46:3279–3285. theory and experiment in protein folding kinetics. J Mol Biol 322:463–476. 37. Zwanzig R, Szabo A, Bagchi B (1992) Levinthals paradox. Proc Natl Acad Sci USA 19. Bryngelson JD, Onuchic JN, Socci ND, Wolynes PG (1995) Funnels, pathways, and the 89:20–22. energy landscape of protein-folding—A synthesis. Proteins Struct Funct Genet 21:167– 38. Thompson PA, et al. (2000) The helix-coil kinetics of a heteropeptide. J Phys Chem B 195. 104:378–389. 20. Onuchic JN, Luthey-Schulten A, Wolynes PG (1997) Theory of protein folding: The 39. Berezhkovskii A, Szabo A (2004) Ensemble of transition states for two-state protein energy landscape perspective. Ann Rev Phys Chem 48:545–600. folding from the eigenvectors of rate matrices. J Chem Phys 121:9186–9187. 21. Oliveberg M, Wolynes PG (2005) The experimental survey of protein-folding energy 40. Portman JJ, Takada S, Wolynes PG (2001) Microscopic theory of protein folding rates. landscapes. Quart Rev Biophys 38:245–288. II. Local reaction coordinates and chain dynamics. J Chem Phys 114:5082–5096. 22. Fersht A (1999) Structure and Mechanism in Protein Science (Freeman, New York). 41. Yang JS, Wallin S, Shakhnovich EI (2008) Universality and diversity of folding mechanics 23. Garbuzynskiy SO, Finkelstein AV, Galzitskaya OV (2004) Outlining folding nuclei in for three-helix bundle proteins. Proc Natl Acad Sci USA 105:895–900. globular proteins. J Mol Biol 336:509–525. 42. Jayachandran G, Vishal V, Pande VS (2006) Using massively parallel simulation and 24. Kramers HA (1940) Brownian motion in a field of force and the diffusion model of Markovian models to study protein folding: Examining the dynamics of the villin chemical reactions. Physica VII:284–304. headpiece. J Chem Phys 124:164902. 25. Bryngelson JD, Wolynes PG (1989) Intermediates and barrier crossing in a random 43. Jayachandran G, Vishal V, Garcia AE, Pande VS (2007) Local structure formation in energy-model (with applications to protein folding). J Phys Chem 93:6902–6915. simulations of two small proteins. J Struct Biol 157:491–499. 26. Sali A, Shakhnovich E, Karplus M (1994) How does a protein fold. Nature 369:248–251. 44. Lei HX, Duan Y (2007) Two-stage folding of HP-35 from ab initio simulations. J Mol Biol 27. Socci ND, Onuchic JN, Wolynes PG (1996) Diffusive dynamics of the reaction coordinate 370:196–206. for protein folding funnels. J Chem Phys 104:5860–5868. 28. Onuchic JN, Wolynes PG, Luthey-Schulten Z, Socci ND (1995) Toward an outline of 45. Miyazawa S, Jernigan RL (1985) Estimation of effective interresidue contact energies the topography of a realistic protein-folding funnel. Proc Natl Acad Sci USA from protein crystal structures—Quasi-chemical approximation. Macromoleules 92:3626–3630. 18:534–552. 29. Du R, Pande VS, Grosberg AY, Tanaka T, Shakhnovich EI (1998) On the transition 46. Schuler B, Eaton WA (2008) Protein folding studied by single-molecule FRET. Curr Opin coordinate for protein folding. J Chem Phys 108:334–350. Struct Biol 18:16–26. 30. Best RB, Hummer G (2005) Reaction coordinates and rates from transition paths. Proc 47. Freire E (1995) in Protein Stability and Folding. Theory and Practice, ed Shirley BA Natl Acad Sci USA 102:6732–6737. (Humana, Totowa, NJ), pp 191–218.

18662 ͉ www.pnas.org͞cgi͞doi͞10.1073͞pnas.0808600105 Kubelka et al. Downloaded by guest on September 30, 2021