Collapse kinetics and chevron plots from simulations of denaturant-dependent folding of globular proteins

Zhenxing Liua, Govardhan Reddya, Edward P. O’Briena, and D. Thirumalaia,b,1

aBiophysics Program, Institute for Physical Science and Technology, University of Maryland, College Park, MD 20742; and bDepartment of Chemistry, University of Maryland, College Park, MD 20742

Edited* by William A. Eaton, National Institutes of Health-National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, MD, and approved March 8, 2011 (received for review December 24, 2010)

Quantitative description of how proteins fold under experimental (folding arm) followed by an increase (unfolding arm) at high conditions remains a challenging problem. Experiments often use ½C. In previous computational studies, chevron plots were gen- urea and guanidinium chloride to study folding whereas the nat- erated by mapping temperature or certain interaction parameters ural variable in simulations is temperature. To bridge the gap, we in the force-field to ½C (16). Although these studies have pro- use the molecular transfer model that combines measured denatur- vided insights into the role of nonnative interactions they fall ant-dependent transfer free energies for the peptide group and short of directly probing folding of proteins in the presence of amino acid residues, and a coarse-grained Cα-side chain model denaturants. Here, we solve the long-standing problem of how for polypeptide chains to simulate the folding of src SH3 domain. to calculate chevron plots, and in the process describe the entire Stability of the native state decreases linearly as ½C (the concentra- folding reaction of a small protein in near quantitative detail. tion of guanidinium chloride) increases with the slope, m, that is in Molecular simulations of coarse-grained off-lattice models (14, excellent agreement with experiments. Remarkably, the calculated 24, 25), are ideally suited to make (semi)quantitative predictions folding rate at ½C¼0 is only 16-fold larger than the measured of the folding reaction. A combination of statistical mechanical k k value. Most importantly ln obs ( obs is the sum of folding and approach and simulations using simple off-lattice models has unfolding rates) as a function of ½C has the characteristic V (chev- provided insights into mechanisms. In particular, ron) shape. In every folding trajectory, the times for reaching the these approaches have shown how linear free energy relations for native state, interactions stabilizing all the substructures, and the seemingly complex folding reactions emerge naturally (26). BIOPHYSICS AND

mf COMPUTATIONAL BIOLOGY global collapse coincide. The value of m (mf is the slope of the fold- More importantly, the theoretical ideas rationalize the robustness ing arm of the chevron plot) is identical to the fraction of buried of folding in terms of topology alone as illustrated in the simula- solvent accessible surface area in the structures of the transition tions of structure-based models of small globular proteins (27). state ensemble. In the dominant transition state, which does not Of particular relevance to the present study is the demonstration vary significantly at low ½C, the core of the protein and certain that solvation forces determine the origin of barriers in SH3 loops are structured. Besides solving the long-standing problem domains, which helped rationalize the effect of specific mutations of computing the chevron plot, our work lays the foundation for on the folding rates (28). Here, we extend the molecular transfer CHEMISTRY incorporating denaturant effects in a physically transparent man- model (MTM) (29–31) to map in quantitative detail the thermo- ner either in all-atom or coarse-grained simulations. dynamics and to a lesser extent kinetics of folding of src SH3 do- Δ main as a function of GdmCl and urea. We show that GNUð½CÞ kinetic cooperativity ∣ self-organized polymer model ∣ pathway diversity ∣ decreases linearly as ½C, with the slopes (m) being in quantitative protein denaturation agreement with experiments. Simulations of the ½C-dependent ln kobs has the characteristic chevron shape, and the values of nderstanding how proteins fold in quantitative molecular the slopes of the folding (mf ) and unfolding (mu) arms are in ex- Udetail can provide insights into protein aggregation and dy- cellent agreement with experiments. There is a single dominant namics of formation of multisubunit complexes. As a result there transition state whose structure shows that the core β-sheets are have been intense efforts in deciphering the folding mechanisms well packed and the stiff distal loop is ordered. Contacts involving of proteins using a variety of experiments (1–6), theories (7–13), the hydrophobic core and the other secondary structural ele- and simulations (14–16). Despite these advances, a number of ments, overall collapse, and formation of the entire native struc- issues such as the denaturant-dependent characteristics of the ture form nearly simultaneously at zero and low ½C. Our work unfolded states and the link between protein collapse and fold- solves the long-standing problem of describing the entire dena- ing remain unclear (17–21). Single molecule experiments have turant-dependent folding of a protein, including the creation of provided clear evidence for folding pathway diversity, and have the chevron plot, albeit at a coarse-grained level. shown that polypeptide chains undergo almost continuous col- lapse as the denaturant concentration (½C) is decreased (1, Results 19–21). These and other experiments (5, 22) raise the need for Structure of src SH3. To establish the utility of MTM we computed computational models whose predictions can be directly com- several experimentally measurable thermodynamic and kinetic pared to experiments, which often use denaturants [guanidinium properties of the 56-residue src SH3 domain (Fig. 1A) (29) (see chloride (GdmCl) and urea] to initiate folding and unfolding. Methods for details). The core of SH3 domains, whose folding has Global thermodynamic and kinetic behavior of small proteins been extensively investigated using experiments (32–35) and si- often exhibit two-state behavior (23), which implies that at all

values of ½C only the population of native (N) and the unfolded Author contributions: Z.L., G.R., E.P.O, and D.T. designed research; Z.L., G.R., and D.T. (U) states are detectable. Experimentally two-state behavior is performed research; D.T. contributed new reagents/analytic tools; Z.L., G.R., and D.T. Δ characterized by (i) the free energy of stability, GNUð½CÞ analyzed data; and Z.L., G.R., and D.T. wrote the paper. − ð¼ GN ð½CÞ GU ð½CÞÞ,ofN with respect to U, varies linearly The authors declare no conflict of interest. with ½C, and (ii) the of the relaxation rate kobs, which *This Direct Submission article had a prearranged editor. is the sum of folding and unfolding rates, is V-shaped (“chevron 1To whom correspondence should be addressed. E-mail: [email protected]. ” plot ) when plotted as a function of ½C. For two-state systems This article contains supporting information online at www.pnas.org/lookup/suppl/ chevron plots show a linear decrease in ln kobs as ½C increases doi:10.1073/pnas.1019500108/-/DCSupplemental.

www.pnas.org/cgi/doi/10.1073/pnas.1019500108 PNAS Early Edition ∣ 1of6 Downloaded by guest on September 29, 2021 mulations (28, 36–38), consists of two well-packed three stranded Δ − f NBA ð½CÞ f NBAð½CÞ leading to GNUð½CÞ ¼ kBTs lnð1− Þ. The line- β f NBA ð½CÞ -sheets that are connected by RT, n-src, and short distal loops Δ Δ 0 ar fit (Fig. 1C), GNUð½CÞ ¼ GNUð½ Þ þ m½C, yields (Fig. 1A). Δ 0 −3 4 ∕ 1 34 ∕ GNUð½ Þ ¼ . kcal mol and m ¼ . kcal mol·M. We also computed ΔG C using free energy profiles as a function of χ Denaturant-Dependent Folding Thermodynamics. Given the approx- NUð½ Þ (Eq. S16), and obtained m ¼ 1.47 kcal∕mol·M. The experimen- imations in our model (see Methods and SI Text), it is difficult 1 5 ∕ to get absolute free energy differences between the folded and tally inferred m value using is . kcal mol·M 1 6 ∕ unfolded states from simulations. To make comparisons with whereas m ¼ . kcal mol·M using fluorescence measurements. The values from our simulation for m ð1.34–1.47Þ kcal∕mol·M experiments, we choose a simulation temperature, Ts, at which the calculated free energy of stability of the native state (N) with are in excellent agreement with experimental estimates, which Δ − further shows that MTM can predict the global thermodynamic respect to the unfolded state (U), GNUðTsÞ (GN ðTsÞ GU ðTsÞ), 295 Δ properties accurately. and the measured free energy at TEð¼ KÞ GNUðTEÞ coin- Δ Δ 0 Although the calculated value of ΔG 0 is in good agree- cide. The use of GNUðTsÞ¼ GNUðTEÞ (in water with ½C¼ ) NUð½ Þ to fix Ts is equivalent to choosing a overall reference energy ment with the experimental result, it is not as accurate as m-value Δ 295 scale in the simulations. For src SH3, GNUðTE ¼ KÞ¼ because of the known difficulties in obtaining accurate results at −4 1 ∕ 0 339 . kcal mol at ½C¼ (34), which results in Ts ¼ K. low GdmCl concentrations (40). To ensure that MTM can predict Δ 0 Besides the choice of Ts no other parameter is adjusted to fit GNUð½ Þ accurately we calculated f NBAð½CÞ for urea (green 339 Δ any other aspect of src SH3 folding. With Ts ¼ K fixed, we squares in Fig. 1B) from which we obtained GNUð½CÞ (Fig. 1C). Δ 0 −4 2 ∕ calculated the dependence of fraction of molecules in the native The linear fit yields GNUð½ Þ ¼ . kcal mol, which is in S18 basin of attraction (39), f NBAð½CÞ,on½C using Eq. . The excellent agreement with measurements. The m value for urea agreement between measured and simulated results for is 0.92 kcal∕mol·M (less than m for GdmCl), which is consistent f NBAð½CÞ versus ½C is excellent (Fig. 1B). The midpoint concen- with the observation that folding is less cooperative in urea 0 5 2 5 tration, Cm, obtained using f NBAð½CmÞ ¼ . is ½C¼ . M, also than in GdmCl (Fig. 1B). The results also confirm that GdmCl agrees with the measured experimental value of 2.6 M (34). is more efficient (C ¼ 2.5 M) in unfolding proteins than urea Δ m The native state stability with respect to U, GNUð½CÞ (C ¼ 4.7 M). Our predictions for equilibrium folding in urea − m ðGN ð½CÞ GU ð½CÞÞ, can be calculated using a two-state fit to can be experimentally tested. 1 ∑ 2 1∕2 Equilibrium Collapse. The dependence of hRgi¼h2N2 riji , where rij is the distance between all interaction centers of the protein and N (¼112 for src SH3) is the total number of beads, on ½C shows compaction of the denatured state ensemble (DSE) structures of SH3 domain below Cm (Fig. 2A). By following the population of conformations in the unfolded basin of attraction (UBA) and native basin of attraction (NBA) separately, (see Fig. S1 for description of UBA and NBA using χ as the order N parameter) we find that Rg of the NBA (Rg ) is nearly indepen-

Fig. 1. Thermodynamics of folding. (A) Cartoon representation of src SH3 [Protein Data Bank (PDB) code 1SRL]. Sequence and the corresponding loca- tion of secondary structures are given below. The numbering starts at posi- tion 9. In ref. 34 there is an additional aspartic acid residue at the C terminal.

In addition, the sequence in ref. 34 has arginine at the 52nd position whereas Fig. 2. Equilibrium collapse. (A) Average hRgi (black circles) as a function of our sequence has glutamine. (B) Fraction of molecules in the native state as a GdmCl concentration. Red and green corresponds to hRgi for folded and un- N 11 50 function of denaturant concentration. The folded-unfolded transition region foled structures, respectively. Blue symbol is constructed using Rg ð¼ . ÅÞ, dfNBA ð½CÞ U 24 04 8 0 is assessed using full width at half-maximum of d½C , which in experiments Rg ð¼ . ÅÞ at ½C¼ . M, and f NBAð½CÞ (see text for details). The Inset 1 0 and simulations for GdmCl is 1.42 M and 1.71 M respectively. Red (green) are shows PðRgÞ at ½C¼ . M for DSE. (B) Distribution PðRgÞ of radius of gyration 0 the simulation results for GdmCl (urea) and the black dots are experimental Rg for various concentrations of GdmCl. The Inset shows PðRgÞ for ½C¼ M Δ 2 5 5 0 results. (C) The dependence of GNUð½CÞ on ½C for GdmCl (Left) and urea (black), ½C¼ . M (navy), and ½C¼ . M (orange) corresponding to the 15 (Right). In (B) and (C) the denaturant concentration is in molarity. extended conformations (Rg > Å).

2of6 ∣ www.pnas.org/cgi/doi/10.1073/pnas.1019500108 Liu et al. Downloaded by guest on September 29, 2021 dent of ½C (Fig. 2A). In contrast, the size of the structures in the UBA decrease continuously (41) and is about 17% more compact under folding condition than at high ½Cð>CmÞ (Fig. 2A). The UBA variations in the size of the UBA, hRg ð½CÞi,as½C is de- creased, is reflected in the ½C-dependent changes in hRgi (black circles in Fig. 2A). If the collapse of the DSE is neglected then UBA ≈ U 24 04 8 0 hRg ð½CÞi Rg ¼ . Å, the value at ½C¼ . M (Fig. 2A). The distribution PðRgÞ of Rg shows that as ½C increases src SH3 samples extended conformations with Rg exceeding 30 Å (Fig. 2B). Comparison of PðRgÞ of the DSE below Cm (Fig. 2A, Inset) and above Cm (the DSE region at high [C] in Fig. 2B, Inset) 15 shows that compact conformations (Rg < Å) in DSE are sampled only under folding conditions. Compaction of DSE below Cm implies that the dynamics of unfolded states must be different under folding conditions relative to high ½C as shown recently for protein L (42).

Folding Kinetics and the Chevron Plot. We calculated the ½C-depen- dent folding (unfolding) rates from folding (unfolding) trajec- tories, which were generated from Brownian dynamics using 3 the effective energy function, HPðfrig;½CÞ (Eq. and SI Text 0 for details). From sixty (fifty for ½C¼ ) folding trajectories Fig. 3. Folding kinetics. (A) Fraction of molecules that have not folded the fraction of unfolded molecules at time t, is computed using (½C¼0.0 M, and ½C¼1.0 M) or unfolded (½C¼5.0 M) as a function of 1 −∫t PuðtÞ¼ 0PfpðsÞds, where PfpðsÞ is the distribution of first time. The lines are exponential fits to the data. (B) Comparison of chevron − ≈ tkf ½C plots obtained from simulations and experiment. The scale for the experi- passage times. For some of the ½C values, we fit PuðtÞ e mental plot for ln kobs is on the left, and for the simulations is on the right. under folding conditions (½C < Cm) from which kf ð½CÞ can be

The slopes of the calculated folding (mf ) and unfolding (mu)are BIOPHYSICS AND extracted. Similarly, a single exponential fit for some values of 0.95 kcal∕mol·M and 0.60 kcal∕mol·M, respectively, which are in good agree- ½C >Cm yields kuð½CÞ. At high (low) ½C, we can approximate ment with experiment. COMPUTATIONAL BIOLOGY 2 5 kobs ¼ kf ð½CÞ þ kuð½CÞ as kuð½CÞ (kf ð½CÞ). For ½C¼ . M and ≈ 3.0 M ( Cm) the folding and unfolding rates are comparable, and hence we expect m ≈ m þ m . From the simulated chevron ≈ f u kuð½CÞ kf ð½CÞ. At these concentrations we performed both plot, we obtain m ≈ 1.55 kcal∕mol·M, which is in excellent agree- folding and unfolding simulations and by fitting the results for ment with m ð1.34–1.47 kcal∕mol·MÞ obtained from equilibrium P t using exponential function we extracted k C and ≈ ≈ uð Þ uð½ Þ calculations (Fig. 1C). The midpoint Cm at which kf ku is Cm k ð½CÞ (see SI Text for details). We obtained k ð½0Þ by globally 2 2 2 5 CHEMISTRY f u . M (Fig. 3B), which nearly coincides with Cm ¼ . M calcu- 0 5 fitting the relaxation rate, kobs ¼ kf ð½CÞ þ kuð½CÞ, using ln kobs ¼ lated using f NBA½Cm¼ . (Fig. 1B). Thus, both thermodynamic − ∕ ∕ 0 mf ½C RT 0 mu½C RT ln½kf ð½ Þe þ kuð½ Þe , where mf (mu) is the slope and kinetic results obtained using MTM simulations are in har- 0 − ∕ of the folding (unfolding) arm with ln kf ¼ ln kf ð Þ mf ½C RT mony, and justifies the two-state description for src SH3 folding. 0 ∕ Comparison of simulation and experiments shows that and ln ku ¼ ln kuð Þþmu½C RT. 0 0 although MTM simulations reproduce the chevron shape well, The dependence of PuðtÞ at ½C¼ . M, 1.0 M, and 5.0 M as a 5 0 the dependence of ln k on ½C does not quantitatively agree function of t is given in Fig. 3A.For½C¼ . M, PuðtÞ represents obs with experiments. In particular, k ð½0Þ from simulations is the fraction of conformations in the NBA. Clearly, kf ð½CÞ de- f 925 9 −1 0 creases as ½C increases, and kuð½CÞ increases as ½C increases. . s whereas the extrapolated value to ½C¼ from experi- 56 7 −1 Most importantly, plot of ln kobs as a function of ½C over the con- ment is . s . Thus, the folding rate from MTM simulations is centration range (0 M ≤ ½C ≤ 6.5 M) of GdmCl shows a classic about 16 times faster that the experimental rate. Deviations from 0 ∼ 2 0 ≪ ∼ chevron shape (Fig. 3B). For ½C¼ . M, ku kf , so that experiments increase to about 40 at C Cm. The differences ≈ ≈ kobs kf and similarly for ½C above 3.5 M, kobs ku. In the between kuð½CÞ obtained using simulations and experiments ≈ 5 0 transition region, kf ku, which requires generation of a large are larger. For example, kuð½C¼ . MÞ differs by a factor of number of trajectories, kobsð½CÞ converges slowly resulting in approximately 150 from experiments (Fig. 3B). The unfolding 0 0 ≈ larger errors. Comparison of the simulation and experimental rate kuð½ Þ (the value in water) from our simulations is kuð½ Þ results (filled black circles in Fig. 3B) allows us to draw two major 7.60 s−1, whereas the extrapolated rate from experiment is 0 1 −1 0 conclusions. (i) The slopes (from the folding and unfolding arms) . s . The larger discrepancy in kuð½ Þ between simulations of the simulated chevron plot are surprisingly similar to the and experiments is due to the substantial error in the simulations, experimental values. (ii) Within error bars in simulations and which occurs because relatively small number of unfolding events experiments, we do not find any deviation from linearity in the are observed at low ½C. The agreement with experiments could chevron plot. Taken together, our simulation results for kobs½C be improved further by taking into account the changes in the semiquantitatively capture all of the experimental features, which denaturant-dependent changes in viscosity. Nevertheless, from 0 0 Δ 0 ≈ is remarkable given the simplicity of the MTM. the calculated kf ð½ Þ and kuð½ Þ values, we infer that GNUð½ Þ 0 − kf ð½ Þ −3 2 ∕ kBTs ln 0 ¼ . kcal mol, which compares favorably with Consistency Between Thermodynamics and Kinetics. From the slope kuð½ Þ of the folding arm (simulation results in Fig. 3B), we obtain estimates from thermodynamic simulations. The coincidence of 0 95 ∕ 0 60 ∕ m value calculated from the linear dependence of ΔG C mf ¼ . kcal mol·M and mu ¼ . kcal mol·M from the NUð½ Þ E with m þ m obtained from chevron plot, and the similarity unfolding arm. The corresponding experimental values are mf ¼ f u 0 99 ∕ E 0 45 ∕ Δ 0 0 0 . kcal mol·M and mu ¼ . kcal mol·M, which shows very between GNUð½ Þ and the estimate using kf ð½ Þ and kuð½ Þ good agreement between experiments and simulations although allows us to conclude that a two-state model is accurate for fold- E mu is somewhat higher than mu . For a two-state description, ing of src SH3. Although such a conclusion has been reached thermodynamics and kinetics simulations should be consistent, using experiments it is gratifying that we can capture the global

Liu et al. PNAS Early Edition ∣ 3of6 Downloaded by guest on September 29, 2021 aspects of folding thermodynamics and kinetics using simulations symbols in Fig. 4 C and D shows the correlation between the first based on MTM. passage times for establishing interactions between β1 − β2 and β2 − β3. The quantitative picture that emerges is that unfolding Collapse Kinetics, Folding, and Unfolding Pathways. The kinetics of and folding in src SH3 occurs in a single step in a highly coopera- collapse of src SH3 domain at ½C¼0 and ½C¼1.0 M shows tive manner although there is pathway diversity in the dynamics of (Fig. 4A) that hRgðtÞi decays with a single rate constant, kcð½CÞ, individual folding trajectories (Fig. S2). the rate of collapse. The values of kcð½CÞ from the data in Fig. 4A are similar to kf ð½CÞ, which shows compaction and Structures of the Transition State Ensemble (TSE). We first deter- ; δ folding occur nearly simultaneously. The inset shows PðRg tÞ mined Pð Þ, the probability of forming native contacts as a func- 0 ≈ τ ≈ 1 0 −1 ≈ 4 2 34 δ ∕τ τ at t ¼ , t c ðkcð½C¼ . MÞÞ . ms, and t ¼ ms tion of the progress variable ¼ t 1i, where 1i is the first passage δ ∕ δ (the first passage time for the slowest trajectory), respectively. time for the ith trajectory. The values of TS where dP d starts to ≈ τ Even after collapse (t c) the protein samples, with small prob- increase rapidly is identified with the transition state (TS) region, 12 N ability (middle inset), conformations with Rg > Å(Rg ¼ and the conformations in this neighborhood are grouped to ob- 11.5 Å). Thus, the ensemble of kinetically collapsed conforma- tain the TSE (44). We determined the structures of the TSE using tions is a mixture of folded and specific (native-like) and nonspe- a procedure that is similar to the progress variable clustering cific compact structures (43). Although at an ensemble level method (45, 46), which uses a neural network algorithm to classify ≈ kcð½CÞ kf ð½CÞ, examination of the dynamics of acquisition of TSE structures without using surrogate reaction coordinates. structure (χðtÞ as a function of t in Fig. S2) shows heterogeneity, The global characteristics of the TSE is experimentally de- β ∕ β 1 − β which is masked in the ensemble averages. The link between com- scribed using the Tanford f ¼ mf m (or u ¼ f ). From the paction and acquisition of secondary structure, measured in βE 0 69 experimental chevron plot f ¼ . whereas our simulations terms of fraction (f ss) of secondary structure element, shows that give βS ¼ 0.61. It is generally assumed that β is related to the these two processes also occur simultaneously (Fig. 4B) in src f f buried solvent accessible surface area (SASA) at the TS. For SH3. Due to the cooperative nature of folding there is very little the TSE obtained in our simulations, we calculated the ½C-de- dependence of these processes on ½C. Δ 1 0 pendent distribution Pð RÞ (see Fig. 5A for ½C¼ . M), where Kinetic cooperativity is most vividly illustrated in Fig. 4C for Δ Δ − Δ ∕ Δ − Δ Δ Δ Δ ½C¼1.0 M and Fig. 4D for ½C¼6.0 M. In Fig. 4c, we plot the R ¼ð U TSEÞ ð U N Þ and U , TSE, N are the SASA 1 0 values in the DSE (½C¼6.0 M), TSE, and the NBA (½C¼ first passage times for reaching the native state at ½C¼ . M 1 0 Δ 0 68 and for establishment of interactions involving various structural . M), respectively. We found that the average h Ri¼ . . The elements for the sixty folding trajectories. All the first passage times coincide (all measures of structure content form at the same time), which shows that in the final stages (after crossing the free energy barrier) the entire structure is consolidated simul- taneously. Similarly, unfolding at ½C¼6.0 M as measured by loss of structures of the whole protein or various fragments (β12, β23, β34, and β45) also occur simultaneously (Fig. 4D). The black

Fig. 4. Folding and unfolding pathways. (A) Collapse kinetics monitored by

the time-dependent average hRgðtÞi as a function of t. Collapse (expansion) Δ decreases (increases) in hRgðtÞi can be fit using a single exponential function. Fig. 5. Transition state ensemble structures. (A) Distribution Pð RÞ,ofthe ; 0 ≈ τ ≈ 1 0 −1 ≈ 4 2 Δ Δ − Δ ∕ Δ − Δ The Inset shows PðRg tÞ at t ¼ , t c ðkcð½C¼ . MÞÞ . ms, and R ¼ð U TSEÞ ð U NÞ, which is the fraction of buried solvent accessible 34 Δ t ¼ ms, the first passage time for the slowest trajectory, respectively. surface area relative to the unfoled structures. The average values of N and Δ 5270 2 8189 2 Δ 0 68 (B) Projection of kinetic folding trajectories in terms of fraction of secondary U are Å and Å , respectively. The average h Ri¼ . , which structure formation, f ss, and hRgi.(C) First passage time for formation of var- coincides with Tanford β parameter. (B) Contact map of the native state ious interactions, indicated in the figure, from the sixty folding trajectories at ensemble (Upper Left) and the one for the TSE structures (Lower Right). 1 0 τβ12 ½C¼ . M. The solid black symbols show the correlation between i and The scale on the right gives the probablity of contact formation. The number τβ23 β − β i , which are first passage times for formation of interactions 1 2 and of native contacts in the folded state is 507. (C) Superposition of a few β2 − β3, respectively. (D) Same as (C) except the results are for unfolding structures of the dominant TSE cluster (Left) shows formation of the core at ½C¼6.0 M. Similarity between (C) and (D) establishes the reversibility with disorder mostly in the RT loop (see text). The subdominant cluster has between folding and unfolding. well-formed β1, β2, and β3 hydrophobic core.

4of6 ∣ www.pnas.org/cgi/doi/10.1073/pnas.1019500108 Liu et al. Downloaded by guest on September 29, 2021 βS βE Δ close correspondence between f (or f ) and h Ri affirms the states. In contrast, in the folding of Cytochrome c, monitored by standard interpretation of the Tanford parameter. The small SAXS and CD, it has been shown that collapse is preceded by fluctuation, σ2 ¼ðhΔ2 i − hΔ i2Þ∕hΔ i2 ¼ 0.1, shows that the acquisition of secondary structure (51, 52). We suggest that with R R R decreased cooperativity there could be a separation in the time TSE in src SH3 is conformationally restricted. From the lower half of Fig. 5B, which shows the contact map scale between these two distinct processes (53). obtained from the TSE structures, it is clear that relative to the native state (top half) there are variations in the extent to which Folding Kinetics. The most important achievement of our study is the structure is ordered in the transition region. The core of the the simulation of the chevron plot. The difference between the protein involving interactions in strands β1, β2 and to a lesser ex- calculated and measured kobs could arise from neglect of nonna- tent β3 form with substantial probability in the TSE. The distal tive interactions and possibly from not using explicit models for loop is structured in the TS as are residues involving the diverging water and GdmCl. Although one can appeal to extensive studies turn. It is likely that the necessity to form the distal loop due to its using lattice and offlattice models (44), which show that to a large intrinsic stiffness forces the core of the protein to be ordered in extent the folding mechanisms for foldable sequences are not the TSE (37, 47). Residues from the n-src loop that are part of greatly altered by including nonnative interactions, their role has the hydrophobic core form with great probability in the TSE. to be quantitatively assessed. Nonnative interactions are likely However, the RT-loop, with large conformational fluctuations in not relevant either at very high or very low ½C values but do play the transition state, is disordered in the TSE (structures on the an important role in affecting the thermodynamics in the transi- left in Fig. 5C). tion region (Fig. 1B). Despite these reservations the remarkable We classified the TSE structures according to the similarity to agreement between simulations and experiments shows that the the native state for ½C¼1.0 M (Fig. 5C). In the dominant tran- theory outlined here can yield fundamental insights into protein sition state (found in approximately 72% of the folding trajec- folding using higher resolution models. tories) the core of the structures comprised of β1, β2, and β3 is packed orthogonally as in the native state (structures in Fig. 5C, Concluding Remarks Left). The subdominant member of the TSE (approximately We have produced the most comprehensive picture of denatur- 12%) also has a well-formed central β-sheet but are otherwise ant-dependent folding of src SH3 domain using simulations that not well structured (conformation of Fig. 5C, Right). The calcu- include denaturant effects in conjunction with self-organized lated average structure of the dominant TSE conformations is polymer-side chain (SOP-SC) model for the polypeptide chain. similar to that inferred from experimental ϕ-value analysis. Results for folding thermodynamics are in quantitative agree- BIOPHYSICS AND

The common features of the TSE is that the central β1, β2, β3 ment with experiments. Somewhat surprisingly it is found that COMPUTATIONAL BIOLOGY strands are well structured as are residues in the distal loop. not only do the calculations reproduce the chevron plot but the The present and previous simulations using entirely different predicted values of the ½C-dependent folding rates are in reason- models and methods of determining the TSE structures, are also able agreement with measurements. The nearly ½C-independent in very good agreement (36, 38). Both the structures, which are structure of the dominant transition state structure is broadly Φ consistent with experiments, show that contacts involving β1 − β2 consistent with the structural interpretation based on -value analysis and simulations (at ½C¼0) carried out using entirely sheet and the RT loop forms with low probability. Although CHEMISTRY simulations predict a single dominant transition state structure, different model and methods. Our work shows that the Tanford diversity (assessed by probability distribution of contact forma- β-parameter measures the fraction of buried SASA in the TSE. Δ The theory underlying the MTM is general and can be adopted tion or Pð RÞ) in the TSE structures, which is hard to glean from experiments, points to the inherent plasticity of the TSE, even in in conjunction with all-atom representation of proteins, thus SH3 domains in which there is predominantly a single transi- making it feasible to study denaturant and osmolyte effects using tion state. simulations. Discussion Methods Collapse Transition. Based on Flory theory we expect that radius SOP-Sidechain Model. Simulations were carried out using the SOP model (54) each residue is represented by two interaction centers, one that is located at U ≈ ν ν ≈ 0 6 ≈ 2 of gyration must scale as Rg aDN ( . , aD Å)(48) at the Cα position and the other is at the center of mass of the side chain. Native N ≈ 3 1∕3 – – high ½C. Similarly, in the folded state Rg N Å (49). The state is stabilized by backbone backbone, side chain side chain, and back- – – predicted theoretical values (RN ≈ 11.3 Å and RU ≈ 22.4 Å) bone side chain interactions. For the side chain side chain interactions we g g use the Betancourt–Thirumalai statistical potential (55). The effective energy, are in very good agreement with simulation results. Using the obtained by integrating over solvent (water) degrees of freedom, which U ≈24 04 8 0 N values for Rg ( . Åat½C¼ . M in Fig. 2A) and Rg describes the intrapeptide interactions of the polypeptide chain with coor- ≈11 50 T N 1− dinates frig,is ( . Å) a theoretical estimate hRg i¼f NBAð½CÞRg þð U T f ð½CÞÞRg can be made. However, Fig. 2A shows that hRg i NAT NEI NN [1] NBA EPðfrigÞ ¼ V FENE þ V þ V þ V : UBA LJ LJ does not agree with hRgi. Because the changes in hRg ð½CÞi in small single domain proteins are not large (20–30%), which Detailed descriptions of the terms in Eq. 1 and the parameters of the forcefield are in Tables S1 and S2. We use low friction Langevin simulations predominantly occur below Cm where the population of mole- cules in the UBA is small, it is in general difficult to detect them to obtain thermodynamic properties (56). Folding trajectories are generated using standard small angle X-ray scattering (SAXS) methods. using a Brownian dynamics algorithm (57) using a friction coefficient that corresponds to water viscosity (see SI Text for details). However, a recent study using SAXS has reported continuous UBA reduction in hRg ð½CÞi as ½C is lowered in dihydrofolate reduc- Molecular Transfer Model. The free energy of transferring a protein confor- 0 tase (50), a protein with approximately 150 amino acid residues. mation described by fri g from water (½C¼ ) to aqueous denaturant solution Single molecule experiments also show unambiguous features (½C ≠ 0) is approximated (58, 59) as of collapse for protein L and cold shock protein. Δ δ α ∕α [2] In the folding of src SH3 protein, collapse process and acquisi- Gðfrig;½CÞ ¼ ∑ gðk;½CÞ kðfrigÞ Gly-k-Gly; tion of native structure occur nearly simultaneously (Fig. 4). On k δ the time scale of collapse we find that only about 50% of the where the sum is over backbone and side chain, gðk;½CÞ ¼ mk½Cþbk (see α secondary structures are fully formed (Fig. 4B), which implies Table S3 for mk and bk values, and Table S4 for Gly-k-Gly values) is the transfer α α that there is a dynamic coexistence between folded and unfolded free energy of group k (60, 61), k ðfri gÞ is the SASA of k, and Gly-k-Gly is the

Liu et al. PNAS Early Edition ∣ 5of6 Downloaded by guest on September 29, 2021 ≠ 0 α SASA of the kth group in the tripeptide Gly-k-Gly. Thus, the effective free unfolding trajectories at ½C (see SI Text for details). We compute kðfrigÞ ∂α ≠ 0 k ðfri gÞ energy function for a protein at ½C is and ∂ using the procedure in ref. 62. Note that MTM does not assume r~i whether the protein folds by a two-state or multistate mechanism. If EP ðfrigÞ Δ [3] HPðfrig;½CÞ ¼ EPðfrigÞ þ Gðfrig;½CÞ: captures the nature of folding the MTM can accurately describe the effect of denaturants. We used the procedure in (29), which is accurate provided exhaustive

sampling using EP ðfri gÞ is performed, to obtain thermodynamic properties. ACKNOWLEDGMENTS. This work is supported by the National Science Founda- We use the full effective free energy function (Eq. 3) to generate folding and tion (NSF CHE 09-14033).

1. Schuler B, Eaton WA (2008) Protein folding studied by single-molecule FRET. Curr Opin 31. O’Brien EP, Morrison G, Brooks BR, Thirumalai D (2009) How accurate are polymer Struct Biol 18:16–26. models in the analysis of Forster resonance energy transfer experiments on proteins? 2. Nickson AA, Clarke J (2010) What lessons can be learned from studying the folding of J Chem Phys 130:124903. homologous proteins? Methods 52:38–50. 32. Riddle DS, et al. (1999) Experiment and theory highlight role of native state topology 3. Borgia A, Williams PM, Clarke J (2008) Single-molecule studies of protein folding. in SH3 folding. Nat Struct Biol 6:1016–1024. Annu Rev Biochem 77:101–125. 33. Grantcharova VP, Riddle DS, Baker D (2000) Long-range order in the src SH3 folding 4. Bartlett AI, Radford SE (2009) An expanding arsenal of experimental methods yields transition state. Proc Natl Acad Sci USA 97:7084–7089. an explosion of insights into protein folding mechanisms. Nat Struct Mol Biol 34. Grantcharova VP, Baker D (1997) Folding dynamics of the src SH3 domain. Biochemistry 16:582–588. 36:15685–15692. 5. Shank EA, Cecconi C, Dill JW, Marqusee S, Bustamante C (2010) The folding coopera- 35. Martinez J, Serrano L (1999) The folding transition state between SH3 domains is tivity of a protein is controlled by its chain topology. Nature 465:637–640. conformationally restricted and evolutionarily conserved. Nat Struct Biol 6:1010–1016. 6. Gebhardt JCM, Bornschloegla T, Rief M (2010) Full distance-resolved folding energy 36. Hubner IA, Edmonds KA, Shakhnovich EI (2005) Nucleation and the transition state of landscape of one single protein molecule. Proc Natl Acad Sci USA 107:2013–2018. the SH3 domain. J Mol Biol 349:424–434. 7. Onuchic JN, Wolynes PG (2004) Theory of protein folding. Curr Opin Struct Biol 37. Klimov D, Thirumalai D (2002) Stiffness of the distal loop restricts the structural het- 14:70–75. erogeneity of the transition state ensemble in SH3 domains. J Mol Biol 317:721–737. 8. Thirumalai D, Hyeon C (2005) RNA and protein folding: Common themes and varia- 38. Ding F, Guo WH, Dokholyan NV, Shakhnovich EI, Shea JE (2005) Reconstruction of the tions. Biochemistry 44:4957–4970. src-SH3 protein domain transition state ensemble using multiscale molecular dynamics 9. Shakhnovich E (2006) Protein folding thermodynamics and dynamics: Where physics, simulations. J Mol Biol 350:1035–1050. chemistry, and biology meet. Chem Rev 106:1559–1588. 39. Klimov D, Thirumalai D (1998) Cooperativity in protein folding: From lattice models 10. Dill KA, Ozkan BS, Shell M, Weikl TR (2008) The protein folding problem. Ann Rev with side chains to real proteins. Fold Des 3:127–139. Biophys 37:289–316. 40. Makhatadze G (1999) Thermodynamics of protein interactions with urea and guani- 11. Munoz V, Eaton W (1999) A simple model for calculating the kinetics of protein folding dinium hydrochloride. J Phys Chem B 103:4781–4785. from three-dimensional structures. Proc Natl Acad Sci USA 96:11311–11316. 41. Dasgupta A, Udgaonkar J (2010) Evidence for initial non-specific polypeptide chain 12. Kubelka J, Henry E, Cellmer T, Hofrichter J, Eaton W (2008) Chemical, physical, and collapse during the refolding of the SH3 domain of PI3 kinase. J Mol Biol 403:430–445. theoretical kinetics of an ultrafast folding protein. Proc Natl Acad Sci USA 42. Waldauer S, Bakajin O, Lapidus L (2010) Extremely slow intramolecular diffusion in 105:18655–18662. unfolded protein L. Proc Natl Acad Sci USA 107:13713–13717. 13. Thirumalai D, O’Brien EP, Morrison G, Hyeon C (2010) Theoretical perspectives on 43. Camacho CJ, Thirumalai D (1993) Minimum energy compact structures of random protein folding. Ann Rev Biophys 39:159–183. sequences of heteropolymers. Phys Rev Lett 71:2505–2508. 14. Pincus DL, Cho S, Hyeon C, Thirumalai D (2008) Minimal models for proteins and RNA: 44. Klimov D, Thirumalai D (2001) Multiple protein folding nuclei and the transition state From folding to function. Progress in Molecular Biology and Translational Science, ensemble in two state proteins. Proteins 43:465–475. (Academic, London), 84, pp 203–250. 45. Guo Z, Thirumalai D (1997) The nucleation-collapse mechanism in protein folding: 15. Whitford PC, et al. (2009) An all-atom structure-based potential for proteins: Bridging Evidence for the non-uniqueness of the folding nucleus. Fold Des 2:377–391. minimal models with all-atom empirical forcefields. Proteins 75:430–441. 46. Klimov D, Thirumalai D (1998) Lattice models for proteins reveal multiple folding 16. Zhang Z, Chan HS (2010) Competition between native topology and nonnative inter- nuclei for nucleation-collapse mechanism. J Mol Biol 282:471–492. actions in simple and complex folding kinetics of natural and designed proteins. Proc 47. Spagnolo L, Ventura S, Serrano L (2003) Folding specificity induced by loop stiffness. Natl Acad Sci USA 107:2920–2925. Protein Sci 12:1473–1482. 17. Ziv G, Thirumalai D, Haran G (2009) Collapse transition in proteins. Phys Chem Chem 48. Kohn J, et al. (2004) Random-coil behavior and the dimensions of chemically unfolded Phys 11:83–93. proteins. Proc Natl Acad Sci USA 101:12491–12496. 18. Nettels D, Gopich IV, Hoffmann A, Schuler B (2007) Ultrafast dynamics of protein col- 49. Dima R, Thirumalai D (2004) Asymmetry in the shapes of folded and denatured states lapse from single-molecule photon statistics. Proc Natl Acad Sci USA 104:2655–2660. of proteins. J Phys Chem B 108:6564–6570. 19. Sherman E, Haran G (2006) Coil-globule transition in the denatured state of a small 50. Arai M, et al. (2007) Microsecond hydrophobic collapse in the folding of Escherichia protein. Proc Natl Acad Sci USA 103:11539–11543. coli Dihydrofolate reductase, an alpha/beta-type protein. J Mol Biol 368:219–229. 20. Merchant KA, Best RB, Louis JM, Gopich IV, Eaton WA (2007) Characterizing the 51. Akiyama S, et al. (2002) Conformational landscape of cytochrome C folding studied unfolded states of proteins using single-molecule FRET spectroscopy and molecular by microsecond-resolved small-angle X-ray scattering. Proc Natl Acad Sci USA simulations. Proc Natl Acad Sci USA 104:1528–1533. 99:1329–1334. 21. Sadqi M, Lapidus LJ, Munoz V (2003) How fast is protein hydrophobic collapse? Proc 52. Cardenas A, Elber R (2003) Kinetics of cytochrome C folding: Atomically detailed Natl Acad Sci USA 100:12117–12122. simulations. Proteins 51:245–257. 22. Chung HS, Louis JM, Eaton WA (2009) Experimental determination of upper bound 53. Thirumalai D (1995) From minimal models to proteins: Time scales for protein folding for transition path times in protein folding from single-molecule photon-by-photon kinetics. J Phys I 5:1457–1467. trajectories. Proc Natl Acad Sci USA 106:11837–11844. 54. Hyeon C, Dima RI, Thirumalai D (2006) Pathways and kinetic barriers in mechanical 23. Jackson S (1998) How do small single-domain proteins fold? Fold Des 3:R81–R91. unfolding and refolding of RNA and proteins. Structure 14:1633–1645. 24. Clementi C (2007) Coarse-grained models of protein folding: Toy models or predictive 55. Betancourt M, Thirumalai D (1999) Pair potentials for protein folding: Choice of re- tools? Curr Opin Struct Biol 17:1–6. ference states and sensitivity of predicted native states to variations in the interaction 25. Karanicolas J, Brooks CL, III (2002) The origins of asymmetry in the folding transition schemes. Protein Sci 8:361–369. states of protein L and protein G. Protein Sci 11:2351–2361. 56. Guo Z, Thirumalai D (1996) Kinetics and thermodynamics of folding of a de novo 26. Oliveberg M, Wolynes PG (2005) The experimental survey of protein-folding energy designed four helix bundle. J Mol Biol 263:323–343. landscapes. Q Rev Biophys 38:245–288. 57. Ermak D, McCammon J (1978) Brownian dynamics with hydrodynamic interactions. 27. Clementi C, Nymeyer H, Onuchic J (2000) Topological and energetic factors: What J Chem Phys 69:1352–1360. determines the structural details of the transition state ensemble and “en-route” 58. Pace C (1986) Determination and analysis of urea and guanidine hydrochloride dena- intermediates for protein folding? An investigation for small globular proteins. turation curves. Methods Enzymol 131:266–280. J Mol Biol 298:937–953. 59. Nozaki Y, Tanford C (1963) Solubility of amino acids and related compounds in 28. Fernandez-Escamilla AM, et al. (2004) Solvation in protein folding analysis: Combina- aqueous urea solutions. J Am Chem Soc 288:4074–4081. tion of theoretical and experimental approaches. Proc Natl Acad Sci USA 60. Auton M, Bolen D (2004) Additive transfer free energies of the peptide backbone 101:2834–2839. unit that are independent of the model compound and the choice of concentration 29. O’Brien EP, Ziv G, Haran G, Brooks BR, Thirumalai D (2008) Effects of denaturants scale. Biochemistry 43:1329–1342. and osmolytes on proteins are accurately predicted by the molecular transfer model. 61. Auton M, Holthauzen L, Bolen D (2007) Anatomy of energetic changes accompanying Proc Natl Acad Sci USA 105:13403–13408. urea-induced protein denaturation. Proc Natl Acad Sci USA 104:15317–15322. 30. O’Brien EP, Brooks BR, Thirumalai D (2009) Molecular origin of constant m-values, 62. Hayryan S, Hu CK, Skrivanek J, Hayryan E, Pokorny I (2005) A new analytical method denatured state collapse, and residue-dependent transition midpoints in globular for computing solvent-accessible surface area of macromolecules and its gradients. proteins. Biochemistry 48:3743–3754. J Comput Chem 26:334–343.

6of6 ∣ www.pnas.org/cgi/doi/10.1073/pnas.1019500108 Liu et al. Downloaded by guest on September 29, 2021