doi:10.1016/j.jmb.2004.02.024 J. Mol. Biol. (2004) 337, 789–797

Does Native State Topology Determine the RNA Folding Mechanism?

Eric J. Sorin1, Bradley J. Nakatani1, Young Min Rhee1 Guha Jayachandran2, V Vishal1 and Vijay S. Pande1,3,4,5*

1Department of Chemistry Recent studies in folding suggest that native state topology plays a Stanford University, Stanford dominant role in determining the folding mechanism, yet an analogous CA 94305-5080, USA statement has not been made for RNA, most likely due to the strong coupling between the ionic environment and conformational energetics 2Department of Computer that make RNA folding more complex than . Applying a Science, Stanford University distributed computing architecture to sample nearly 5000 complete tRNA Stanford, CA 94305-5080 folding events using a minimalist, atomistic model, we have characterized USA the role of native topology in tRNA folding dynamics: the simulated bulk 3Department of Biophysics folding behavior predicts well the experimentally observed folding Stanford University, Stanford mechanism. In contrast, single-molecule folding events display multiple CA 94305-5080, USA discrete folding transitions and compose a largely diverse, heterogeneous

4 dynamic ensemble. This both supports an emerging view of hetero- Department of Structural geneous folding dynamics at the microscopic level and highlights the Biology, Stanford University need for single-molecule experiments and both single-molecule and bulk Stanford, CA 94305-5080 simulations in interpreting bulk experimental measurements. USA q 2004 Elsevier Ltd. All rights reserved. 5Stanford Synchrotron Radiation Laboratory Stanford University Stanford, CA 94305-5080 USA

Keywords: RNA folding; bulk kinetics; mechanism; distributed computing; *Corresponding author biasing potential

Introduction pathways have been made.6,7 Ferrara and Caflisch studied the folding of peptides forming beta struc- The study of biological self-assembly is at the ture and concluded that the folding free energy forefront of modern biophysics, with many labora- landscape is determined by the topology of the tories around the world focused on answering the native state, whereas specific atomic interactions question “How do biopolymers adopt biologically determine the pathway(s) followed,8 and our active native folds?” In recent years, protein recent comparison between protein and RNA hair- folding studies have put forth a new “topology- pin structures supports this line of reasoning.9 driven” view of folding by recognizing correlations This is not surprising when one considers that between folding mechanism and topology of the the folding transition states for of similar native state.1 For instance, the laboratories of shape were shown to be insensitive to significant Goddard and Plaxco have put forth several reports sequence mutations.10 – 12 connecting the folding rates of small proteins with Here, we attempt to directly characterize the the topological character of those proteins,2–5 and contribution of native state topology to the RNA comparisons between topology and unfolding folding mechanism at both the single-molecule and bulk levels. Because bulk experiments inher- Abbreviations used: RMSD, root-mean-squared ently include ensemble-averaged observables, deviations; LJ, Lennard–Jones. with distributions around those averages expected E-mail address of the corresponding author: within a given system, single-molecule studies [email protected] complement bulk studies by allowing observation

0022-2836/$ - see front matter q 2004 Elsevier Ltd. All rights reserved. 790 Bulk tRNA Folding Simulations of dynamics that are masked by such ensemble- at best, the simplified biasing potential used here averaging.13 – 15 Yet differences between single- presents an ideal methodology to probe the molecule and bulk folding behaviors can be relationship between topology and folding mecha- profound, as detailed herein. nism: the folding potential is built on information In the past, the ability of simulation to predict from the native tRNA fold. If the relationship bulk system dynamical properties has been hin- between topology and folding mechanism is negli- dered by the limited number of events that one gible, our results should show significant disagree- could simulate. Using a distributed computing ment with experimental observations. If, on the architecture16 to overcome this barrier, we have other hand, this relationship is significant, the collected nearly 5000 complete folding events for agreement with experiment should also be the 76 nucleotide tRNA topology, which is signifi- significant. cantly more complex than the small, fast-folding Indeed, we show below that the bulk folding systems commonly examined in folding behavior predicted by this simple, atomistic, simulations.9,17,18 This degree of sampling repre- minimalist model predicts well the experimentally sents ,500 CPU years of work conducted on observed bulk folding mechanism, suggesting that ,10,000 CPUs within our global computing topology does in fact contribute to the bulk folding network†. mechanism. Furthermore, massive sampling has, Interestingly, topological considerations have for the first time, allowed direct comparison of only been applied to the smallest of ,9 likely simulated ensemble kinetics to single-molecule because the folding of functional RNAs is known folding simulations, illustrating a tremendous to be more complex than that of small proteins heterogeneity inherent to individual folding events due to the coupling of RNA conformational ener- that collectively contribute to a topologically dri- getics with the predominantly ionic environment.19 ven bulk folding mechanism. We discuss below For instance, it has been demonstrated that the the general results seen in single-molecule folding identity and concentration of ions present can trajectories, followed by analysis of kinetic and alter the folding and unfolding dynamics exhibited mechanistic observations from an ensemble- by the tRNA topology,20 – 22 further complicating averaged perspective, as well as the role of poly- our understanding of RNA folding. meric entropy in folding. Caveats of the model are To directly assess the relationship between then discussed in Methods. native topology and RNA folding mechanism, we employ a minimalist model similar to the Go¯-like models previously used to study protein Results folding.15,23 – 28 We improve upon these models by developing a folding potential that: (i) treats the Diverse pathways in tRNA folding polymer in atomic detail; (ii) introduces attractive non-native interaction energies; and (iii) includes The native fold of tRNA can be described in ion-dependent effects on the folding dynamics. terms of nine “substructures” (Figure 1(a)). The These features allow for a much more accurate well-defined secondary structure is composed of approximation of the true energetics than the four helix-stem regions (S1,S2,S3,S4), while the coarse-grained models commonly used to study tertiary structure consists of a set of five regions of 28 the dynamics of large biomolecules, while inher- long-range contact (T12,T13,T14,T23,T24), where ently maintaining the polymeric entropy expected the dual subscripts denote the two helix-stems in from atomistic models, thereby allowing a causal contact for a given tertiary region. Formation of relationship to be inferred between changes in these substructures in simulated single-molecule the energetics of the model and changes in the folding events occur as a series of numerous, dis- resulting dynamics. crete transitions in the fraction of native contacts, Striving for simplicity, we consider the most Q. To quantitatively characterize the observed elementary model of the ionic effect in RNA fold- pathways, the relative folding times for these ing, which assumes that monovalent and divalent nine substructures were calculated. In many cases, cations stabilize secondary and tertiary structure, multiple substructures folded in the same time respectively. We propose that such a simple model range. To unambiguously determine the sub- is adequate to capture the essential folding structural folding order in a given trajectory, the dynamics observed in experiments under varying variance in the fraction of native contacts within ionic conditions: our model employs two para- each substructure throughout that trajectory, meters, 12 and 13, which specify the energetic bene- VarðQÞ; was calculated, and the relative folding fit of native secondary and tertiary interactions, time of each substructure was defined as the tem- respectively, coupled only in their simultaneous poral point at which VarðQÞ was a maximum (i.e. use in thermal calibrations of the model. the point at which Q was increasing most rapidly). We note that while the use of Go¯-like potentials Pearson correlations (defined as the covariance to extract precise folding dynamics is questionable of two sets divided by the product of their stan- dard deviations) between folding times for each pair of substructures within the ensemble of fold- † Folding@Home, http://folding.stanford.edu ing events were then evaluated, and the resulting Bulk tRNA Folding Simulations 791

Table 1. Weighting of tRNA folding pathways Class Weight (%) Pathway

I 36.3 S4 [S2,T12]S3 [T13,T23][T14,T24]S1 10.6 S4 S3 [S2,T12][T13,T23][T14,T24]S1 4.5 [S2,T12]S4 S3 [T13,T23][T14,T24]S1 1.7 S4 [S2,T12]S3 T24 [T13,T23]T14 S1

II 13.0 S4 [S2,T12]S3 S1 [T13,T23][T14,T24] 3.4 S4 S3 [S2,T12]S1 [T13,T23][T14,T24] 2.2 S4 [S2,T12]S3 S1 T14 [T13,T23]T24 1.5 S4 [S2,T12]S1 S3 [T13,T23][T14,T24] 1.4 [S2,T12]S4 S3 S1 [T13,T23][T14,T24]

III 8.1 S4 [S2,T12]S3 [T13,T23]S1 [T14,T24] 2.4 S4 S3 [S2,T12][T13,T23]S1 [T14,T24] 2.4 S4 S3 [S2,T12][T13,T23]S1 [T14,T24] 1.6 S4 [S2,T12]S3 [T13,T23]T14 S1 T24

stages of folding exhibited a much larger variation. For this reason, the ordering of the substructural folding events within phase 2 was used to classify folding trajectories. The observed statistical weighting of folding pathways is detailed for each folding class in Table 1. For brevity, pathways that contributed less than 1% of the folding ensemble have been removed, and the tabulated pathways account for approximately 90% of observed folders. Although this sampling of pathways cannot quantitatively capture equilibrium thermodynamic behavior, insight into the qualitative nature of the folding landscape (e.g. locations of intermediates and free energy barriers) can be gained by consid- ering the relative probabilities of each microstate present in the simulated ensemble along a minimal number of folding parameters (the fraction of native secondary and tertiary contacts, Q2 and Q3, respectively). The log of the conformational proba- bilities on this simplified, two-dimensional surface is plotted in Figure 2(a). It is evident that each Figure 1. tRNA substructure depiction and folding time correlations. (a) tRNA substructures: red, S accep- class of folding pathways predicts multiple inter- 1 mediates in the folding process, observed as the tor stem; orange, S2 D-stem; green, S3 anti-codon stem; blue, S4 T-stem; regions of tertiary contact are noted previously mentioned numerous, discrete steps in with arrows, where Tij indicates tertiary interactions single-molecule folding trajectories. A graphical between the Si and Sj stems. (b) Relative folding time representation of the most statistically predomi- (Pearson) correlations between each substructural pair nant pathway observed is shown in Figure 2(b), within the ensemble of 2592 folding trajectories. and an animation of this simulation is available for viewing online†.

Ensemble-averaged kinetics matrix (Figure 1(b)) shows a distinct separation We find significant differences in comparing between two phases during folding. These ensemble averaged dynamic properties to single- uncorrelated groups are referred to below as molecule trajectories. As shown in Figure 3(a), phase 1 ¼ {S ; S ; S ; T } and phase 2 ¼ {T ; T ; 2 3 4 12 13 14 again using the fraction of native contacts as the T ; T ; S }: Each folding trajectory was then 23 24 1 primary folding parameter, several distinct popu- assigned to a specific folding pathway, denoted by lations are present throughout the ensemble of the sequence of substructural folding events that folding events. With the exception of the fully occurred in that specific simulation. extended chain (E, a physically irrelevant state In an effort to simplify the representation of the used to start simulations without biasing the resul- many pathways observed, folding simulations ting folding pathways), these states qualitatively were grouped into three classes. While the initial events in the folding process were similar for all folders (predominantly stem formation), the final † http://folding.stanford.edu/tRNA/ 792 Bulk tRNA Folding Simulations

Figure 2. Statistical energy pro- files and the predominant folding pathway. (a) Locations of free energy barriers and minima on the simplified two-dimensional folding surface, where folding classes were defined by the order of substruc- tural formation in the second phase of folding. (b) Graphical represen- tation of the most predominant folding pathway in our simulations with substructures color-coded as in Figure 1(a). Labels indicate which substructures have folded or are folding in each frame. The relevant statistical energy landscape is shown in grayscale with the trajectory overlaid in red.

agree with experimentally observed populations agreement observed for IMg is a striking prediction described by Sosnick and co-workers,20,29 who of the character of the intermediate from a model differentiate between multiple unfolded states (U0 based solely on native topology information. The and U in the presence and absence of 4 M urea, distribution of all-atom root-mean-squared devia- respectively) and intermediates (INa denotes the tions (RMSD) from the native structure is also bulk intermediate observed in high sodium con- shown (inset). centrations and IMg is the bulk intermediate The folding mechanism predicted from a bulk observed in the presence of magnesium). perspective (which is dependent on ion identity The distributions of native contact formation and concentration) is shown in Figure 3(d). From PfoldðQÞ versus the number of native contacts ðQÞ the unfolded state, the IMg (full formation of for each substructure are plotted in Figure 3(b), secondary structure) and INa (formation of the with secondary substructures (upper panel) color- three non-terminal helix-stems) intermediates are coded as in Figure 1(a), and tertiary substructures both possible. From IMg, the INa intermediate can (lower panel) scaled from white (T12) to black be formed, with phase 2 tertiary contacts forming (T24). Early events in the folding process (when Q last. Alternatively, folding may include formation is small) include formation of the three non- of tertiary contacts followed by zipping of the terminal helices, alongside the T12 tertiary contact terminal S1 helix-stem. (phase 1). Long-range tertiary contacts To illustrate how a three-state (U ! I ! N) {T13; T14; T23; T24} then form in any variety of tem- observation could come from a large ensemble of poral sequences after phase 1 has completed, with individual folding events, each exhibiting numer- the formation of the terminal helix-stem ðS1; redÞ ous discrete transitions, Figure 4 depicts the either preceding or succeeding collapse and ensemble-averaged fraction of native contacts formation of tertiary structure, as described kQðtÞl alongside a single, randomly chosen trajec- experimentally.20 tory (Figure 4(a)). Figure 4(b) demonstrates how The distribution of the radius of gyration Rg is the discrete nature of folding is lost when consider- shown in Figure 3(c), and shows quantitative ing only ten randomly chosen individual folding agreement with experimentally measured molecu- events in the averaging. With more than 2500 29 lar sizes attained for collapsed states: the IMg trajectories (Figure 4(c)), all sense of discrete, step- ˚ and N ensembles have kRgl ¼ 31:8ð^3:4Þ A and wise folding is lost, and the kQðtÞl curve becomes ˚ kRgl ¼ 26:2ð^1:3Þ A, respectively. These values are a smooth function of time. Still, the averaging of only slightly larger than those reported by Fang only ten independent folding trajectories is more et al. due to the added polymeric flexibility of our representative of a single trajectory than of the model (see Methods). While the quantitative agree- ensemble as a whole, and only after including ,500 ment for the native state Rg is fortuitous (based on or more events (data not shown) does the kQðtÞl the construction of the model), the quantitative curve approximate that of the entire ensemble. Bulk tRNA Folding Simulations 793

Figure 4. Ensemble-averaged kinetics. The fraction of native contacts for a single folding trajectory (a) shown next to the ensemble-averaged fraction of native contacts for ten randomly chosen folding simulations (b) and all 2592 folding events (c). kQðtÞl approaches a smooth func- tion of time with two distinct, sequential exponential Figure 3. Ensemble-averaged properties during tRNA phases (fitted as dotted curves) in (c). In (b) and (c), folding. (a) Detected states in the bulk folding process. Var½Qðtފ is shown as a thick gray line. (d) The time- (b) Distributions of relative folding times for each sub- dependent mole fractions for the U, I, and N states. structure, with secondary structures color-coded as in Figure 1(a) (upper panel) and tertiary structures scaled from white to black (lower panel). (c) Rg and RMSD (inset) distributions for the observed states. (d) Predicted In contrast to the simple, exponentially increas- ion-dependent bulk folding mechanism. ing kQðtÞl described above, two such phases are apparent in Figure 4(c). Because phase 2 begins only after a portion of individual trajectories have It is convenient to consider the formation of each completed phase 1, the two transitions in the of the nine independent substructures within the kQðtÞl curve were fit individually to single-expo- ensemble, starting from the extended polymer, as nential processes (dotted curves in Figure 4(c)), a stochastic (Poisson) process with a given folding resulting in R2 values of 0.999 and 0.998, respect- rate ln. This is particularly the case for simple ively. Were there not a significant overlap between helix formation (with no bulges or other internal the two phases (i.e. if all trajectories completed structure), which has been shown to be a two- phase 1 prior to any of them starting phase 2), state process.9,30,31 Such processes are additive these fits would be expected to fully predict kQðtÞl; such that the sum of n independent Poisson which would then exhibit a cusp at the crossover processes is itself a Poisson process with a total point between the two fits. As the resulting rate of rate equal to the sum of the n individual rates. phase 1 is greater than that for phase 2, the buildup Thus, if folding occurred simply as simultaneous of a single observable intermediate between the formation of all nine substructures, one would phases is predicted. expect the total number of native contacts Q to To consider the dynamic populations present increase exponentially in time, with a total rate from a chemical kinetics perspective, Figure 4(d) equal to the sum of the rates for each substructure. shows the evolution of ensemble mole fractions of 794 Bulk tRNA Folding Simulations

unfolded U, intermediate I, and native state N con- as a preference for IMg), with S1 forming late in the formations. The I state was defined (from statistical folding process. In contrast, when secondary struc- populations in Figure 2(a)) as having formed ture is significantly more favorable than tertiary ,60% and ,10% of native secondary and tertiary structure, 12=13 ¼ 3; our model simulates a regime contacts, respectively, and exhibits a concentration analogous to higher sodium concentrations in the maximum that coincides with the crossover point absence of magnesium, as shown in Figure 5 of the individual fits in (c). The native state curve (lower panels). In this case, the formation of the is well fit by the proper biexponential relationship terminal stem-helix precedes the formation of for the sequential, irreversible reaction U ! I ! N phase 2 tertiary contacts (INa). We stress that this with an R2 value of 0.999. finding is in good agreement with the rather non- canonical experimental results reported by Shelton Pathways for overcoming entropy et al.,20 in which the simple cloverleaf secondary structure is not a necessary intermediate on the A strength of atomistic, minimalist models is tRNA folding pathway. their utility in elucidating the role of polymeric Like the formation of hairpins, the entropy in the folding dynamics.27 It is thus inter- folding of the terminal RNA helix-stem appears esting to consider the balance of interaction energy to be cooperative due to a balance between the and polymeric entropy in tRNA folding. We stress energetics of interaction and the polymeric entropy that in this minimalist model the energetics are loss of structure formation. Indeed, it has been defined by the native tRNA topology or, more previously suggested that the polymeric entropy specifically, by the “stability ratio,” 12/13. The fold- of biomolecules can be sufficient to lead to ing mechanisms for extreme values of the stability cooperative folding.32 However, one would expect ratio are obvious: for 12=13 q 1 secondary structure a significant entropic penalty for bringing together p will form preferentially, and for 12=13 1; tertiary the two ends of the 76 nucleotide tRNA, and sur- structure would be preferred. Simulation can play mounting this barrier is apparently accomplished an important role in characterizing the experi- in one of the two experimentally observed ways. mentally relevant intermediate regime, where one Increasing the stability ratio results in greater must consider the delicate balance between the secondary structure stability, and the enthalpy of energetic benefits and entropic penalties of form- base-pairing dominates the conformational free ing native structure. energy, thus favoring early zipping of the terminal The variation of 12/13 implies the variation of stem (INa). On the other hand, decreasing the ionic conditions: calibration of our model to stability ratio implies a more stable tertiary struc- experiment suggests that for 12=13 ¼ 1(Figure 5, ture, and the entropic barrier to terminal zipping top panels) we are in a regime analogous to dominates, resulting in unpaired S1 strands (IMg) magnesium-mediated folding, whereby three of until late in the folding process (after long-range the four helix-stem elements are structured in the collapse has occurred). We thus hypothesize that early stages of folding (observed experimentally this entropic barrier is responsible for the distinct ion-dependent dynamics observed in tRNA fold- ing, with counterions tuning the relative stability of the intermediates and, in effect, “tipping the scale” in favor of one mechanism over another. To be sure, our simulations show that even a some- what subtle change in the stability ratio can lead to a qualitative change in the folding behavior.

Discussion and Implications

Using massively distributed computational sampling and a minimalist, atomistic model with and counterion effects, we have shown that bulk non-equilibrium reaction dynamics can be extracted from many single-mol- ecule events without the need for extrapolating bulk behavior from a small number of simulations. While our ensemble-averaging includes far fewer events than the number studied in standard bulk experiments, we have observed convergence to an ensemble signal from ,2500 individual folding Figure 5. Ion-dependent dynamics. As in Figure 3(b), events, and have identified the lower bound for the distributions of relative substructural folding times attaining the ensemble signal to be ,500 indi- are shown for both perturbed values of the stability vidual trajectories for a molecule of this size and ratio 12=13: complexity. Bulk tRNA Folding Simulations 795

Our results indicate that native tRNA topology A unified-atom variant of the AMBER94 force field36 serves as a dominant predictor of the bulk folding was used to define bonded interactions (bond, angle, mechanism. To be sure, a biasing potential built and torsion terms), with optimal geometries taken from from native state information that includes non- the native state. To speed the dynamics of interest, low t ¼ , native interaction energetics recovers the known viscosity ( T 0.5 ps, 2% that of water) was employed three-state behavior reported previously and well and dihedral energy terms were increased by a factor of 5 relative to the effective temperature. The polymer was predicts the character of known environmentally made less rigid by also decreasing bonded interaction dependent intermediates, as specified by both energies by a factor of 10 relative to the effective 20 molecular size and substructure formation. In temperature. A complete set of simulation input files is contrast to the ensemble-averaged behavior, great available†. diversity in single-molecule folding pathways has All simulations were conducted using the stochastic been observed. This is not surprising when con- dynamics integrator within the Gromacs molecular sidered in the context of contemporary folding dynamics suite37 with 15 A˚ cutoffs on all non-bonded theories, which identify native folds as energetic interactions. As the mean fraction of native contacts “traps” and portray the polymeric motion that present in simulations of the native state was 0.87, all leads to trapping in low-energy states as predomi- folding simulations (started from a fully extended con- formation) were considered fully folded when 87% of nantly stochastic in nature. all possible secondary contacts and 87% of all possible Still, it remains a common practice to interpret tertiary contacts were formed. bulk signals from various experimental sources as representing the “true” dynamics on a microscopic level, and it has only recently been put forth that Caveats of the biasing potential (unobservable) intermediates are likely present even in the most simple two-state folders.33 The While minimalist models allow the study of events not sampling reported herein thus serves as an observable via fully atomistic molecular dynamics,15,25,27 example of the distinction that should be drawn the limitations of the minimalist model approach should between single-molecule (microscopic) and bulk be considered. First of all, a quantitative prediction of (macroscopic) probes, and highlights the need for absolute transition rates is not possible, and temporal both single-molecule experiments and simulation, analyses have therefore been conducted in units of iterations, yielding qualitative information on the nature both single-molecule and bulk, in the interpre- of relative rates.15 Moreover, since non-native inter- tation of bulk measurements. Indeed, it will be actions are less energetically favorable than native ones, intriguing to see this complexity revealed by this model is not expected to accurately characterize single-molecule experiments of tRNA folding. off-pathway folding intermediates (kinetic traps), other than those resulting from topological frustration.15,26 However, we note that the inclusion of weak non-native attractions not present in previous biasing models Methods greatly enhances the stochastic nature of folding trajec- tories, allowing a more plausible description of the Construction of the minimalist model underlying energetic landscape.

Furthermore, as 12 and 13 were chosen to be constant To construct a minimalist model for RNA folding, we over the whole molecule, the simulated tRNA sequence have expanded upon previous minimalist models of pro- may not have specific energetic biases to particular sub- 15,25,28,34,35 tein folding by including non-native long-range structures. While some tRNA sequences may break this interaction energy terms (adding energetic frustration approximation, the goal of this work is to study the to the folding landscape) and by considering the role of biasing role of polymeric entropy and we are thus inves- solvated counterions in the folding process. We define tigating a general property of RNA folding, i.e. the effect native contacts as atomic pairs separated by three or of the tRNA topology on folding. While one could ˚ more residues and within 6.0 A of one another in the include substructure-dependent energetic biases to phe native structure (based on tRNA , PDB ID 6tna). These model differences between specific tRNA sequences, interatomic interactions are modeled using different this would obscure the roles of polymeric entropy and classes of long-range Lennard–Jones (LJ) interactions: topology in folding. native contact pairs within the same helix-stem region Finally, a prime consideration in employing a biasing (secondary contacts) are assigned energy 12 (initially potential of any sort, is the possibility of artifacts within ,ten times greater than non-native interaction energies), the potential distorting the observed dynamics far from while all other native pairs (tertiary contacts) are that of the true biomolecular system, and the true test of assigned LJ energy 13. such a potential is the ability to show agreement with An initial stability ratio of 12=13 ¼ 2 was used, thereby current knowledge prior to making predictions. Within assigning twice the energetic benefit to native secondary the results reported above, we propose that such “arti- contacts as that assigned to native tertiary contacts, and factual folding” is seen as the early formation of the T12 a total of 2592 complete folding events were simulated. tertiary contact (Figure 3(b)), which has not been shown To gain insight into the distinct kinetics reported for to occur in the early stages of folding. Such potential arti- 20 differing ionic conditions, over 1000 complete folding facts may also affect the relative rates of individual helix- events were also collected using stability ratios of 1 and stem formation. We believe such artifactual folding to be 3, respectively. Each parameterization was calibrated to minimal within the model, and stress that the ordering of a melting temperature of ,343 K such that an effective temperature of 280(^10) K was employed for all folding simulations. † http://folding.stanford.edu/tRNA 796 Bulk tRNA Folding Simulations phase 1 secondary structure formation does not change 12. Riddle, D. S., Grantcharova, V. P., Santiago, J. V., the overall character of the analyses presented above. Alm, E., Ruczinski, I. & Baker, D. (1999). Experiment and theory highlight role of native state topology in SH3 folding. Nature Struct. Biol. 6, 1016–1024. 13. Deschenes, L. A. & Vanden Bout, D. A. (2001). Single-molecule studies of heterogeneous dynamics Acknowledgements in polymer melts near the glass transition. Science, 292, 255–258. We thank Erik Lindahl for technical aid, Dan 14. McKinney, S. A., Declais, A. C., Lilley, D. M. J. & Ha, Herschlag, Rhiju Das, Sung-Joo Lee, and Mark T. (2003). Structural dynamics of individual Holliday junctions. Nature Struct. Biol. 10, 93–97. Engelhardt for their comments on the manuscript, 15. Shimada, J. & Shakhnovich, E. I. (2002). The ensem- and the Folding@Home volunteers worldwide ble folding kinetics of protein G from an all-atom who made this work possible (a list of contributors Monte Carlo simulation. Proc. Natl Acad. Sci. USA, is available)†. G.J. is a Siebel Fellow. E.J.S. and 99, 11175–11180. Y.M.R. are supported by predoctoral fellowships 16. Zagrovic, B., Sorin, E. J. & Pande, V. (2001). from the Krell/DOE CSGF and Stanford Graduate b-Hairpin folding simulations in atomistic detail Fellowship, respectively, with additional support- using an implicit solvent model. J. Mol. Biol. 313, ing grants ACS-PRF (36028-AC4) and NSF MRSEC 151–169. CPIMA (DMR-9808677), and a gift from Intel. 17. Pande, V. S., Baker, I., Chapman, J., Elmer, S., Kaliq, S., Larson, S. et al. (2003). Atomistic protein folding simulations on the submillisecond timescale using worldwide distributed computing. Biopolymers, 68, References 91–109. 1. Baker, D. (2000). A surprising simplicity to protein 18. Sorin, E. J., Engelhardt, M. A., Herschlag, D. & folding. Nature, 405, 39–42. Pande, V. S. (2002). RNA simulations: probing hair- 2. Debe, D. A. & Goddard, W. A. (1999). First principles pin unfolding and the dynamics of a GNRA tetra- prediction of protein folding rates. J. Mol. Biol. 294, loop. J. Mol. Biol. 317, 493–506. 619–625. 19. Thirumalai, D., Lee, N., Woodson, S. A. & Klimov, 3. Plaxco, K. W., Simons, K. T., Ruczinski, I. & David, B. D. K. (2001). Early events in RNA folding. Annu. (2000). Topology, stability, sequence, and length: Rev. Phys. Chem. 52, 751–762. defining the determinants of two-state protein fold- 20. Shelton, V. M., Sosnick, T. R. & Pan, T. (2001). Alter- ing the intermediate in the equilibrium folding of ing kinetics. , 39, 11177–11183. Phe 4. Makarov, D. E., Keller, C. A., Plaxco, K. W. & Metiu, unmodified yeast tRNA with monovalent and H. (2002). How the folding rate constant of simple, divalent cations. Biochemistry, 40, 3629–3638. single-domain proteins depends on the number 21. Stein, A. & Crothers, D. M. (1976). Conformational of native contacts. Proc. Natl Acad. Sci. USA, 99, changes of transfer RNA. The role of magnesium(II). 3535–3539. Biochemistry, 15, 160–168. 22. Stein, A. & Crothers, D. M. (1976). Equilibrium bind- 5. Jewett, A. I., Pande, V. S. & Plaxco, K. W. (2003). Met Cooperativity, smooth energy landscapes and the ing of magnesium(II) by Escherichia coli tRNAf . origins of topology-dependent protein folding rates. Biochemistry, 15, 157–160. J. Mol. Biol. 326, 247–253. 23. Ueda, Y., Taketomi, H. & Go, N. (1975). Studies on 6. Klimov, D. K. & Thirumalai, D. (2000). Native top- protein folding, unfolding and fluctuations by com- ology determines force-induced unfolding pathways puter simulation. I. The effects of specific amino in globular proteins. Proc. Natl Acad. Sci. USA, 97, acid sequence represented by specific inter-unit 7254–7259. interactions. Int. J. Pept. Protein Res. 7, 445–459. 7. Gsponer, J. & Caflisch, A. (2001). Role of native 24. Ueda, Y., Taketomi, H. & Go, N. (1978). Studies on topology investigated by multiple unfolding simu- protein folding, unfolding, and fluctuations by com- lations of four SH3 domains. J. Mol. Biol. 309, puter-simulation. 2. 3-dimensional lattice model of 285–298. lysozyme. Biopolymers, 17, 1531–1548. 8. Ferrara, P. & Caflisch, A. (2001). Native topology or 25. Shea, J. E., Onuchic, J. N. & Brooks, C. L. (2002). specific interactions: what is more important for Probing the folding free energy landscape of the protein folding? J. Mol. Biol. 306, 837–850. src-SH3 protein domain. Proc. Natl Acad. Sci. USA, 9. Sorin, E. J., Rhee, Y. M., Nakatani, B. J. & Pande, V. S. 99, 16064–16068. (2003). Insights into nucleic acid conformational 26. Clementi, C., Hugh Nymeyer, H. & Onuchic, J. N. dynamics from massively parallel stochastic simu- (2000). Topological and energetic factors: what deter- lations. Biophys. J. 85, 790–803. mines the structural details of the transition state 10. Chiti, F., Taddei, N., White, P. M., Bucciantini, M., ensemble and “en-route” intermediates for protein Magherini, F., Stefani, M. & Dobson, C. M. (1999). folding? an investigation for small globular proteins. Mutational analysis of acylphosphatase suggests the J. Mol. Biol. 298, 937–953. importance of topology and contact order in protein 27. Pande, V. S. & Rokhsar, D. S. (1998). Is the molten folding. Nature Struct. Biol. 6, 1005–1009. globule a third phase of proteins? Proc. Natl Acad. 11. Martı´nez, J. C. & Serrano, L. (1999). The folding tran- Sci. USA, 95, 1490–1494. sition state between SH3 domains is conformation- 28. Clementi, C., Garcia, A. E. & Onuchic, J. N. (2003). ally restricted and evolutionarily conserved. Nature Interplay among tertiary contacts, secondary struc- Struct. Biol. 6, 1010–1016. ture formation and side-chain packing in the protein folding mechanism: all-atom representation study of protein L. J. Mol. Biol. 326, 933–954. † Folding@Home, http://folding.stanford.edu 29. Fang, X., Littrell, K., Yang, X.-J., Henderson, S. J., Bulk tRNA Folding Simulations 797

Siefert, S., Thiyagarajan, P. et al. (2000). Mg2þ-depen- 34. Nymeyer, H., Garcia, A. E. & Onuchic, J. N. (1998). dent compaction and folding of yeast tRNAPhe and Folding funnels and frustration in off-lattice minim- the catalytic domain of the B. subtilis RNase P RNA alist protein landscapes. Proc. Natl Acad. Sci. USA, determined by small-angle X-ray scattering. Biochem- 95, 5921–5928. istry, 39, 11107–11113. 35. Li, L. & Shakhnovich, E. I. (2001). Constructing, veri- 30. Zhang, W. B. & Chen, S. J. (2002). RNA hairpin-fold- fying, and dissecting the folding transition state of ing kinetics. Proc. Natl Acad. Sci. USA, 99, 1931–1936. chymotrypsin inhibitor 2 with all-atom simulations. 31. Ansari, A., Kuznetsov, S. V. & Shen, Y. Q. (2001). Proc. Natl Acad. Sci. USA, 98, 13014–13018. Configurational diffusion down a 36. Cornell, W. D., Cieplak, P., Bayly, C. I., Gould, I. R., describes the dynamics of DNA hairpins. Proc. Natl Merz, K. M., Ferguson, D. M. et al. (1995). A second Acad. Sci. USA, 98, 7771–7776. generation force field for the simulation of proteins, 32. Pande, V. S., Grosberg, A. Y. & Tanaka, T. (2000). nucleic acids, and organic molecules. J. Am. Chem. Heteropolymer freezing and design: towards physi- Soc. 117, 5179–5197. cal models of protein folding. Rev. Mod. Phys. 72, 37. Lindahl, E., Hess, B. & van der Spoel, D. (2001). 259–314. GROMACS 3.0: a package for molecular simulation 33. Daggett, V. & Fersht, A. (2003). The present view of and trajectory analysis. J. Mol. Model. 7, 306–317. the mechanism of protein folding. Nature Rev. Mol. Cell Biol. 4, 497–502.

Edited by J. Doudna

(Received 3 October 2003; received in revised form 7 February 2004; accepted 10 February 2004)