Roles of Native Topology and Chain-Length Scaling in Protein Folding: a Simulation Study with a Go-Like Model Nobuyasu Koga1 and Shoji Takada1,2*

doi:10.1006/jmbi.2001.5037 available online at http://www.idealibrary.com on J. Mol. Biol. (2001) 313, 171±180 Roles of Native Topology and Chain-length Scaling in Protein Folding: A Simulation Study with a Go-like Model Nobuyasu Koga1 and Shoji Takada1,2* 1Graduate School of Science We perform folding simulations on 18 small proteins with using a simple and Technology, Kobe Go-like protein model and analyze the folding rate constants, character- University and istics of the transition state ensemble, and those of the denatured states in terms of native topology and chain length. Near the folding transition 2PRESTO, Japan Science and temperature, the folding rate k scales as k exp( c RCO N0.6) where Technology, Rokkodai, Nada F F RCO and N are the relative contact order and numberÀ of residues, Kobe 657-8501, Japan respectively. Here the topology RCO dependence of the rates is close to that found experimentally (kF exp( c RCO)), while the chain length N dependence is in harmony withÀ the predicted scaling property 2/3 (kF exp( cN )). Thus, this may provides a uni®ed scaling law in fold- À 2/3 ing rates at the transition temperature, kF exp( c RCO N ). The degree of residual structure in the denatured stateÀ is highly correlated with RCO, namely, proteins with smaller RCO tend to have more ordered structure in the denatured state. This is consistent with the observation that many helical proteins such as myoglobin and protein A, have partial helices, in the denatured states. The characteristics of the transition state ensemble calculated by the current model, which uses native topology but not sequence speci®c information, are consistent with experimental f-value data for about half of proteins. # 2001 Academic Press Keywords: contact order; folding rate; transition state; f-value; denatured *Corresponding author state Introduction tacts, which are the residue pairs close in contact at the native structure, divided by the total number How proteins fold to their native structures has of residues. Second, the folding transition state long been studied and still remains a fundamental ensembles (TSEs) are characterized by the f-value question in biophysics.1 ± 3 Recent studies of small, analysis developed by Fersht et al.1,6 for many pro- single-domain proteins have leaded to a new para- teins and it was found that proteins that have simi- digm that the folding rates and pathways are lar- lar structures with low sequence homology have gely determined by their native topology. First, similar TSE for the SH3 domain family7,8 and ab Baker and co-workers found that the folding rates plaits9,10 which implies that the native topology of two-state folding proteins are strongly corre- tends to determine folding pathways of the pro- lated with a measure of the topological complexity tein. It should be noted, however, those recent of native structure,4,5 that is, folding is slower for works report some exceptions to this scenario:11± 13 more complex topologies. The measure of topologi- For example, protein G and protein L have the cal complexity, which they called the relative con- same topology, but exhibit different transition state tact order, is a key quantity currently. It is de®ned ensembles. as the average sequence separation of native con- Why native topology plays an important role in determining folding mechanisms is qualitatively understood from the energy landscape perspective. Abbreviations used: TSE, transition state ensembles; RCO, relative contact order; ACO, absolute contact It suggests that fast folding proteins have a funnel- like energy landscape with some ruggedness on order; D, denatured; N, native; AcP, acylphosphatase; 2 ADA2h, procarboxypeptidase A2. the surface. Evolutionary pressure has made the E-mail address of the corresponding author: ruggedness on the slope suf®ciently small so that [email protected] the current natural proteins may have minimal 0022-2836/01/010171±10 $35.00/0 # 2001 Academic Press 172 Topology and Chain-length in Protein Folding frustration. The effective energy, which is de®ned ility due to contacts, and the other from entropy as protein internal interaction energy plus sol- change due to loss of chain ¯exibility. vation free energy, decreases smoothly with fold- Advantages of these free-energy functional ing reaction to proceed, whereas the acquisition of approaches are their simplicity and reasonable structured segments upon folding reduces the agreement between theory and experiments. On chain entropy, too. Since the energy stabilization the other hand, a possible shortcoming may be the accelerates and the entropy loss decelerates folding dif®culty to single out reasons for any discrepancy reaction, their subtle balance determines the fold- between theory and experiments because these the- ing pathways and rates. Upon formation of a resi- ories often are based on many drastic approxi- due-residue contact, the entropy loss is larger for mations, such as neglect of non-native interactions, non-local contact than for local one. Thus, a protein crude approximations in the expression of entropy whose native topology has more non-local contacts loss and sequence-speci®c interactions included in folds more slowly. Also, qualitatively, contacts the model. formed at the TSE are chosen so that the entropy An alternative and complementary approach to loss may be minimized with given amount of these studies was recently taken by Onuchic and 26,27 structured segments. Here the entropy loss is his colleagues. They proposed a simple off- strongly dependent on the sequence separation of lattice simulation model that implements Go-like the formed contact and so the native topology con- interactions and performed molecular dynamics trols characteristics of the folding TSEs. simulation for a few small-to-medium size pro- If we move from the qualitative to the semi- teins. They reported probabilities of forming native quantitative level, however, many issues need to be contacts in the TSEs and intermediate states that clari®ed. First, it is somewhat surprising that the are qualitatively consistent with experiments. This folding rate constant correlates with relative contact simulation study seems more straightforward than order instead of absolute contact order{ because the the free-energy functional approaches. Especially, latter is more directly linked to the entropy loss no additional approximation is needed besides the upon contact formation.3 Related to this is chain- interaction energy function. It is also based on the length scaling of the folding rate, where theoretical perfect-funnel model and is constructed mostly work predicts some different types of scaling such from the native topology without taking into n 14 2/3 15 1/2 16 account details of sequence-speci®c information. as kF NÀ , kF exp( cN ), kF exp( cN ), that wer e uni®edby WolynÀ es17 while expÀerimental Thus, it is less ambiguous to address the interplay data exhibit little correlation between chain lengths between the native topology and folding mechan- and folding rates.5 Here, experimental data are col- isms, making it easier to single out reasons for a lected only for small two-state folding proteins, discrepancy, if any, with experiments. Here, we attempt to extend Clementi, Nymeyer, while the theory dealt with relatively large proteins 26 and thus this may not be inconsistency, but a gap and Onuchic's work to a more comprehensive between small and large proteins. Here, we show level. Performing molecular dynamics simulation on 18 small proteins, most of which are experimen- how the contact order and chain length scaling can tally known to exhibit apparent two-state tran- be coupled to ®ll the gap giving a uni®ed perspec- sition, we address in greater detail the relation tive of folding rates. between folding mechanisms and the native top- Another challenge at the semi-quantitative level ology. In particular, the folding rates are computed is the prediction of folding pathways and charac- and their correlation with the contact order and its teristics of the TSE from a simple funnel theory. In variants is investigated; the degree of native-like particular, comparison of the calculated values f order in the denatured states is analyzed in terms with experimental ones is an excellent way of test- of the contact order; and the folding pathways are ing the funnel theory. During the last few years, analyzed and computed TSEs, i.e. computed f- several groups reported that simple free energy values, are compared with experimental f-value functionals can describe the folding free energy results. surface and quantitatively predict folding pathways and rate constants with reasonable accuracy.18 ± 23 A basic assumption common in all of these works is that the native contacts (the con- Results tacts that exist in the native structure) alone can We use a minimal protein model proposed by describe the overall shape of a funnel energy land- clementi et al.26,27 that tries to mimic the perfect scape and interaction energies of any non-native funnel aspect of folding energy landscape. Brie¯y, contacts are neglected. These models are often the protein chain is represented only with ca called perfect-funnel models, or Go-like mod- 24,25 atoms of every amino acid residues that are con- els. The functional model essentially has two nected via virtual bonds. Both local and non-local competing contributions, one from energetic stab- interactions are set up so that they have the lowest energy at the native structure. In particular, attrac- { The absolute contact order is the average sequence tive contact interactions are introduced between separation of native contacts not divided by the chain native contacts, which are the residue pairs in close length. contact in the native structure, while the rest of Topology and Chain-length in Protein Folding 173 pairs have solely repulsive interactions avoiding SH3 domain) Both curves have peaks at the same chain crossing.

Roles of Native Topology and Chain-Length Scaling in Protein Folding: a Simulation Study with a Go-Like Model Nobuyasu Koga1 and Shoji Takada1,2*

NIH Public Access Author Manuscript Proteins

Spectrin SH3 Domain Xavier Periole,1* Michele Vendruscolo,2 and Alan E

The Limited Role of Nonnative Contacts in the Folding Pathways of a Lattice Protein

Using Stochastic Roadmap Simulation to Predict Experimental Quantities in Protein Folding Kinetics: Folding Rates and Phi-Values

A New Approach to Optimize a Protein Energy Function on a Folding

Solvation Effects and Driving Forces for Protein Thermodynamic and Kinetic Cooperativity: How Adequate Is Native-Centric Topological Modeling?

Bcl::Fold - De Novo Protein Structure Prediction by Assembly Of

'Folding and Unfolding Mechanism of Highly Stable Full-Consensus Ankyrin

The Protein Folding Problem

Sequence-Structure Relationship in Proteins: a Computational Analysis

Probing Possible Downhill Folding: Native Contact Topology Likely Places a Significant Constraint on the Folding Cooperativity of Proteins with ∼40 Residues

Theory of Protein Folding Jose´ Nelson Onuchic1,2, and Peter G Wolynes1,2,3