(Beta)-Hairpin Folding Simulations in Atomistic Detail Using an Implicit Solvent Model

doi:10.1006/jmbi.2001.5033, available online at http://www.idealibrary.com
J. Mol. Biol. (2001) 313, 151–169

Bojan Zagrovic1, Eric J. Sorin2 and Vijay Pande1,2*

1Biophysics Program and 2Department of Chemistry, Stanford University, Stanford, CA 94305-5080, USA

We have used distributed computing techniques and a supercluster of thousands of computer processors to study folding of the C-terminal β-hairpin from protein G in atomistic detail using the GB/SA implicit solvent model at 300 K. We have simulated a total of nearly 38 μs of folding time and obtained eight complete and independent folding trajectories. Starting from an extended state, we observe relaxation to an unfolded state characterized by non-specific, temporary hydrogen bonding. This is followed by the appearance of interactions between hydrophobic residues that stabilize a bent intermediate. Final formation of the complete hydrophobic core occurs cooperatively at the same time that the final hydrogen bonding pattern appears. The folded hairpin structures we observe all contain a closely packed hydrophobic core and proper β-sheet backbone dihedral angles, but they differ in backbone hydrogen bonding pattern. We show that this is consistent with the existing experimental data on the hairpin alone in solution. Our analysis also reveals short-lived semi-helical intermediates which define a thermodynamic trap. Our results are consistent with a three-state mechanism with a single rate-limiting step in which a varying final hydrogen bond pattern is apparent, and semi-helical off-pathway intermediates may appear early in the folding process. We include details of the ensemble dynamics methodology and a discussion of our achievements using this new computational device for studying dynamics at the atomic level.
© 2001 Academic Press. 0022-2836/01/010151–19 $35.00/0

Keywords: protein folding; simulation; ensemble dynamics; β-hairpin; folding mechanism

*Corresponding author. E-mail address of the corresponding author: [email protected]

B. Z. and E. J. S. contributed equally to this work.

Abbreviations used: dmin, minimum distance between hydrophobic core residues; F, relative statistical energy; PC, principal component; MC, Monte Carlo; NOE, nuclear Overhauser enhancement; MD, molecular dynamics; Ace, acetyl; ASA, solvent-accessible surface area; E, fully extended conformation; H, hydrophobically collapsed intermediate; F, folded conformation; U, more collapsed unfolded conformation; GB, generalized Born model; SA, surface area; PB, Poisson-Boltzmann.

Introduction

Understanding protein folding is one of the much desired goals of modern day molecular biology.1–3 The ability to predict the folding mechanism and the final structure of a protein based on its sequence would dramatically affect different fields ranging from biochemistry to molecular medicine to nanotechnology. After decades of largely independent experimental and theoretical work, the field of protein folding is slowly but undeniably entering a mature age in which the two are converging. Experimental techniques have become sophisticated enough to probe the folding of small, fast-folding proteins and protein elements, while computational power and algorithms have reached a level at which simulating these events is tractable.

Recently, folding of the C-terminal β-hairpin from protein G (Figure 1) has received much attention from both experimental and theoretical fronts.4–19 Parallel and antiparallel β structures are, together with α helices, the key secondary structural elements in proteins, and it is believed that understanding the folding of these elements will be a foundation for investigating larger and more complex structures. The study of isolated β-sheets has for a long time been limited by the lack of an amenable experimental system. The breakthrough experiments by the Serrano5,6 and Eaton7 groups have recently established the β-hairpin from the C terminus of protein G as the system of choice to study β-structures in isolation. The kinetic studies by the Eaton group7 have revealed several surprising facts, the most important being the observation that the hairpin folds in a manner that is very similar to the folding of small proteins: in their thermal unfolding experiments, the peptide behaved as a two-state system, both kinetically and thermodynamically. Together with the fact that the hairpin possesses a hydrophobic core which is well packed in the folded state, these results have added an additional impetus for studying the folding of this molecule computationally.

Based on the high-resolution structures of protein G4,20–22 and the NMR analysis of the hairpin alone in solution,5 the structural features of the hairpin include two β-pleated strands joined by a six residue turn, a closely packed hydrophobic core (residues 43, 45, 52 and 54), and a number of interstrand hydrogen bonds connecting the two opposing strands of the hairpin. It is important to note that as of now there is no high-resolution experimental structure of the hairpin alone in solution: the number of nuclear Overhauser enhancements (NOEs) from the NMR analysis by the Serrano group was not sufficient to fully constrain the structure.5 Figure 1 shows the fold of the hairpin in protein G (PDB ID code 1GB1, first model), as determined by NMR.4 In the absence of a high-resolution structure of the hairpin alone, this best serves as an approximate model for the interactions and structural characteristics that would be expected from an excised hairpin in solution. One should keep in mind, however, that the measurements by the Serrano group do indicate that, despite the overall similarity, the structure of the hairpin alone in solution may differ from the high-resolution structures of the hairpin embedded in protein G.5 In Figure 1(a) the hairpin backbone is displayed in cyan, with the four hydrophobic side-chains colored to show their close packing (Trp43 in blue, Tyr45 in yellow, Phe52 in red and Val54 in green). This same structure has been rotated 90° about the hairpin axis in (b) to allow easy visualization of the Cα trace. β-Sheet angles and a slight twist about the channel axis are evident. This side view also demonstrates the hydrophobic packing that occurs only on one side of the hairpin, near the center of the folded structure.

Figure 1. β-Hairpin structure from the 1GB1 NMR structure of protein G.4 In (a) the "top" view of the hairpin structure shows the relative strand positions and the hydrophobic side-chains which comprise the core: Trp43 (blue), Tyr45 (yellow), Phe52 (red), Val54 (green). Note that the numbering system agrees with the original nomenclature of the protein G NMR structure.4 The excised hairpin is expected to exhibit slightly different properties, as tertiary interactions with the rest of the protein G molecule are absent when the hairpin is alone in solution. (b) The same representation rotated 90° about the long axis of the hairpin. Numerous characteristics of the structure are clearly represented: hydrophobic packing occurs on only one side of the hairpin and is located near the central region of the strands; β-sheet angles are present along most of the chain, with the loop and terminal residues breaking this trend; a slight twist about the channel axis is evident.

Being the smallest known naturally occurring system which exhibits many features of a full size protein, the hairpin has recently motivated a number of theoretical studies involving an impressive array of different computational techniques.8–12,15–18 Nevertheless, simulating the folding of the molecule in atomistic detail at a physiological temperature has not been previously achieved. The hairpin folds with a time constant of several microseconds,7 almost three orders of magnitude longer than the typical time-scales attainable by molecular dynamics simulations. Due to this significant time-scale gap, there are a number of important questions regarding the folding of the hairpin which are still controversial.19 The foremost among these concern (a) the nature of the on-pathway intermediate observed in some, but not all, studies of this peptide; (b) the existence of off-pathway intermediates that may act to hinder hairpin formation; (c) the relative importance of interstrand hydrogen bonds in comparison to hydrophobic core formation; (d) the order of structure formation in the course of folding; and (e) the heterogeneity of the folded state.

Recently, we have introduced a new technique, ensemble dynamics (see Methods), aimed at bridging the gap between computationally achievable time-scales and the much longer time-scales involved in protein folding.23–25 Using a supercluster of processors around the world and distributed computing techniques25 we have folded α-helices, the 36 residue villin headpiece and a 20 residue zinc finger molecule (I. Baker et al., unpublished results). Here, we have used the ensemble dynamics method to analyze the folding process of the C-terminal β-hairpin from protein G.

... important to mention that the trajectories which are included in this set are fully independent from one another. Regarding the thermodynamic results we present, consideration must be given to the extent of our sampling. While we have sampled a sizeable fraction of the configurational space, we have by no means fully covered it: after all, our ensemble of generated structures is not nearly in equilibrium. For this reason, we employ the term statistical energy herein to describe the calculated energy surfaces which, in the limit of sufficient sampling, should approach the true free energy landscape of this system. The statistical energy is calculated as the logarithm of the probability of finding a conformation with specific values of the observed folding parameters.

Overview of structural trends and sampling of
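As an illustration of the statistical energy described in this section, the following minimal sketch estimates it from sampled values of a single folding parameter. It is not the authors' code. The function name `statistical_energy`, the fixed-width binning, the use of one parameter rather than several, and the sign and reference convention (negative logarithm of the probability relative to the most populated bin, in units of kT) are all assumptions made for the example; the text states only that the quantity is the logarithm of a probability.

```python
import math
from collections import Counter


def statistical_energy(values, bin_width):
    """Relative statistical energy per bin, in units of kT.

    Hypothetical helper for illustration only. Assumed convention:
    F(bin) = -ln(P(bin) / P_max), so the most populated bin has F = 0
    and less populated bins have higher F.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    # Histogram the sampled parameter values into fixed-width bins.
    counts = Counter(math.floor(v / bin_width) for v in values)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no samples given")
    p_max = max(counts.values()) / total
    energies = {}
    for bin_index, n in sorted(counts.items()):
        p = n / total
        # Key each bin by its lower edge.
        energies[bin_index * bin_width] = -math.log(p / p_max)
    return energies


# Usage: four sampled values of some folding parameter, binned at width 1.0.
surface = statistical_energy([0.1, 0.2, 0.3, 1.1], 1.0)
for lower_edge, f in surface.items():
    print(f"bin starting at {lower_edge:.1f}: F = {f:.3f} kT")
```

Bins that were never visited have no entry at all, which reflects the caveat in the text: with incomplete sampling, such a surface only approximates the true free energy landscape.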
