Statistical mechanics of a correlated landscape model for funnels Steven S. Plotkin, Jin Wang, and Peter G. Wolynes Department of Physics and School of Chemical Sciences, University of Illinois, Urbana, Illinois 61801 ͑Received 31 May 1996; accepted 6 November 1996͒ In heteropolymers, energetic correlations exist due to polymeric constraints and the locality of interactions. Pair correlations in conjunction with the a priori specification of the existence of a particularly low energy state provide a method of introducing the aspect of minimal frustration to the energy landscapes of random heteropolymers. The resulting funneled landscape exhibits both a from a molten globule to a folded state, and the heteropolymeric glass transition in the globular state. We model the folding transition in the self-averaging regime, which together with a simple theory of collapse allows us to depict folding as a double-well free energy surface in terms of suitable reaction coordinates. Observed trends in barrier positions and heights with protein sequence length and thermodynamic conditions are discussed within the context of the model. We also discuss the new physics which arises from the introduction of explicitly cooperative many-body interactions, as might arise from sidechain packing and nonadditive hydrophobic forces. © 1997 American Institute of Physics. ͓S0021-9606͑97͒52406-8͔

I. INTRODUCTION teins have used much of the mathematical techniques used to treat spin glasses6 and regular magnetic systems.7 The poly- Molecular scientists view protein folding as a complex meric nature of the problem must also be taken into account. chemical reaction. Another fruitful analogy from statistical Mean field theories based on replica techniques8 and varia- physics is that folding resembles a phase transition in a finite tional methods9 have been very useful, but are more difficult system. A new view of the folding process combines these to make physically intuitive than the straightforward ap- two ideas along with the notion that a statistical character- proach of the random energy model,10 which flexibly takes ization of the numerous possible protein configurations is into account many of the types of partial order expected in sufficient for understanding folding kinetics in many re- biopolymers.11 Recently we have generalized the latter ap- gimes. proach to take into account correlations in the landscape of The resulting theory of folding ac- finite-sized random heteropolymers.12 This treatment used knowledges that the energy surface of a protein is rough, the formalism of the generalized random energy model containing many local minima like the landscape of a spin ͑GREM͒ analyzed by Derrida and Gardner.13 In this paper, glass. On the other hand, in order to fold rapidly to a stable we extend that analysis to take into account the minimum structure there must also be guiding forces that stabilize the native structure substantially more than other local minima frustration principle and thereby treat proteinlike, partially on the landscape. This is the principle of minimum nonrandom heteropolymers. frustration.1 The energy landscape can be said then to re- There are various ways of introducing the aspect of semble a ‘‘funnel.’’ 2 Folding rates then depend on the sta- minimum frustration to analytical models with rugged land- tistics of the energy states as they become more similar to the scapes. One way recognizes that many empirical potentials at the bottom of the funnel. actually are obtained by a statistical analysis of a database, One powerful way of investigating protein energy land- and when the database is finite, there is automatically an scapes has been the simulation of ‘‘minimalist’’ models. aspect of minimal frustration for any member of that data- These models are not fully atomistic, but caricature the pro- base. Thus the so-called ‘‘associative memory’’ Hamiltonian 14 tein as a series of beads on a chain either embedded in a models have coexisting funnel-like and rugged features in continuum3 or on a lattice.4 A correspondence, in the sense their landscape. Other methods of introducing minimal frus- of phase transition theory, between these models and real tration model the process of evolution as giving a Boltzmann proteins has been set up using energy landscape ideas.5 distribution over sequences for an energy gap between a 15 Many issues remain to be settled however in understanding fixed target structure and unrelated ones. All of the above how these model landscapes and folding mechanisms change approaches can be straightforwardly handled with replica- as the system under study becomes larger and as one intro- based analyses. Here we show that the GREM analyses can duces greater complexity into the modeling of this corre- be applied to minimally frustrated systems merely by requir- spondence, as for example, by explicitly incorporating many- ing the energy of a given state to be specified as having a body forces and extra degrees of freedom. Simulations particularly low value ͑i.e., less than the putative ground become cumbersome for such surveys, and an analytical un- state value͒. Minimally frustrated, funneled landscapes are derstanding is desirable. just a special case of the general correlated landscape studied Analytical approaches to the energy landscape of pro- earlier.

2932 J. Chem. Phys. 106 (7), 15 February 1997 0021-9606/97/106(7)/2932/17/$10.00 © 1997 American Institute of Physics

Downloaded¬31¬Jul¬2003¬to¬142.103.236.75.¬Redistribution¬subject¬to¬AIP¬license¬or¬copyright,¬see¬http://ojps.aip.org/jcpo/jcpcr.jsp Plotkin, Wang, and Wolynes: Protein folding funnels 2933

A convenient aspect of the correlated landscape model is the denaturation curve as determined by the constant and that it allows the treatment of the polymer physics in a very variable density models. In Sec. VI we discuss the results direct way, using simple statistical thermodynamics in the and conclude with some remarks. tradition of Flory.16 Here we will show how the interplay of collapse and topological ordering can be studied. In order to II. A THEORY OF THE FREE ENERGY do this we introduce a simple ‘‘core-halo’’ model to take into account the spatially inhomogeneous density. We will In this section, we show how the existence of a particu- also discuss the role of many-body forces in folding. Explic- larly low energy configuration, together with energetic cor- itly cooperative many-body forces have often been involved relations for similar states, leads to a model for the folding in the thinking about protein structure formation. Hydropho- transition and corresponding free energy surface in protein- bic forces are often modeled as involving buried surface like heteropolymers. This ansatz for the correlated energy area. Such an energy term is not pairwise additive but in- landscape corresponds to the introduction of minimal frustra- volves three or more interacting bodies. Sidechain packing tion in a random energy landscape, where the order param- involves objects fitting into holes created by more than one eter here ͑which will function as a reaction coordinate for the other part of the chain, thus the elimination of sidechains folding transition͒ counts the number of native contacts or from the model can yield an energy function for backbone hydrogen bonds. units with explicit nonadditivity. These many-body forces We start by assuming a simple ‘‘ball and chain’’ model can be treated quite easily by the GREM, and we will see for a protein which is readily comparable with simulations, that they can make qualitative changes in the funnel topog- e.g., of the 27-mer, which is widely believed to capture many raphy. of the quantitative aspects of folding ͑Sec. IV͒. Proteins with To illustrate the methods here, we construct two- significant secondary structure have an effectively reduced dimensional free energy surfaces for the of number of interacting units as may be described by a ball and minimally frustrated polymers. These explicitly show the chain model. Properties of both, when appropriately scaled coupling between density and topological similarity in fold- by critical state variables such as the folding temperature TF , ing. We pay special attention to the location of the transition glass temperature TG , and collapse temperature TC , will 5 state ensemble and discuss how this varies with system size, obey a law of corresponding states. Thus the behavior de- cooperatively of interactions, and thermodynamic conditions. scribing a complicated real protein can be validly described In the case of the 27-mer on a lattice, a detailed fit to the by an order parameter applied to a minimal ball and chain lattice simulation data4 is possible. Although delicate cancel- model in the same universality class. lations of energetic and entropic terms are involved in the For a 27-mer on a three-dimensional cubic lattice, there overall free energy, plausible parameters fit the data. are 28 contacts in the most collapsed ͑cubic͒ structure. For The trends we see in the present calculations are in concreteness we take such a maximally compact structure to rough agreement with experimental information on the na- be the configuration of our ground state, the generalization to ture and location of the transition state ensemble,17,18 al- a less compact ground state being straightforward in the con- though the theory suggests that fluctuation mechanisms, in text of the model to be described. For a collapsed polymer of the form of independently folding units ͑foldons͓͒A. R. sequence length N, the number of pair contacts per mono- Panchenko et al., Proc. Natl. Acad. Sci. USA 93, 2008 mer, zN , is a combination of a bulk term, a surface term, and 19 ͑1996͔͒ become more important at larger N. We intend later a lattice correction ͓see Eq. ͑A4͔͒. The effect of the surface to return to the experimental comparison, especially taking on the number of contacts is quite important even for large into account more structural details within the protein. macromolecules, as zN approaches its bulk value of 2 con- Ϫ1/3 The organization of this paper is as follows: In Sec. II tacts per monomer rather slowly, as ϳ2–3N . we introduce a theory of the free energy at constant density, To describe states that are not completely collapsed, we 3 and in this context investigate the effects of cooperative in- introduce the packing fraction ␩ХN␴/Rg as a measure of the teractions on the transition state ensemble and corresponding density of the polymer, where ␴ is the volume per monomer free energy barrier. In Sec. III we detail a simple theory and Rg is the radius of gyration of the whole protein. So for coupling collapse with topological similarity, and resulting less dense states the total number of contacts is reduced from in the core-halo model described there. In Sec. IV we apply its collapsed value NzN ,toNzN␩. this collapse theory to obtain the free energy in terms of In the spirit of the lattice model we have in mind for density and topological order, now coupled via the core-halo concreteness, we introduce a simple contact Hamiltonian to model. In the same section we compare our model of the determine the energy of the system, minimally frustrated heteropolymer with lattice simulations of the 27-mer. In terms of the categorization of Bryngelson Hϭ͚ ⑀ij␴ij , ͑2.1͒ et al.2 these free energy surfaces depict scenarios described iϽ j as type I or type IIa folding. We then study the quantitative where ␴ijϭ1 when there is a contact made between mono- aspects of the barrier as a function of the magnitude of three- mers ͕ij͖ in the chain, and ␴ijϭ0 otherwise. Here contact body effects. The dependence of position and height of the means that two monomers ͕ij͖, nonconsecutive in sequence barrier as a function of sequence length is studied, as well as along the backbone chain, are adjacent in space at neighbor- the effects of increasing the stability gap. Finally, we study ing lattice sites. ⑀ij is a random variable so that, at constant

J. Chem. Phys., Vol. 106, No. 7, 15 February 1997

Downloaded¬31¬Jul¬2003¬to¬142.103.236.75.¬Redistribution¬subject¬to¬AIP¬license¬or¬copyright,¬see¬http://ojps.aip.org/jcpo/jcpcr.jsp 2934 Plotkin, Wang, and Wolynes: Protein folding funnels

density, the total of the various configurations, each P͑E,Q,En͒ ͗n ͑E,Q,E ͒͘ϭeS␩͑Q͒ the sum of many ⑀ij , are approximately Gaussianly distrib- ␩ n P͑E ͒ uted by the central limit theorem, with mean energy at a n ¯ given density ␩ given by E␩ϭNzN␩¯⑀, where ¯⑀ is simply 1 ϳexp N s ͑Q͒Ϫ defined as the mean energy per contact and NzN␩ is again ͭ ␩ 2͑1ϪQ2͒ 2 2 the total number of contacts, and variance ⌬E␩ϭNzN␩⑀ , 2 ¯ ¯ 2 where ⑀ is the effective width of the energy distribution per ͑EϪE͒ϪQ͑EnϪE͒ ϫ , ͑2.5͒ contact. ͩ NJ ͪ ␩ ͮ Suppose there exists a configurational state n of energy 2 2 where J␩ϵzN␩⑀ and s␩(Q)ϵS␩(Q)/N. Equation ͑2.5͒ is En ͑which will later become the ‘‘native’’ state͒. Then if the ¯ Hamiltonian for our system is defined as in Eq. ͑2.1͒,wecan still Gaussian with a large number of states provided EϪE is within a band of energies having E Ϫ ¯E ϭ Q(E Ϫ ¯E) find the probability that configuration a has energy Ea , given c n 12 Ϯ NJ ͱ2(1ϪQ2)s (Q) as upper and lower bounds. There that a has an overlap Qan with n, where QanϵQ is the ␩ ␩ number of contacts that state a has in common with n, di- is a negligibly small number of states with energies above or below this range ͓where the exponent changes sign in Eq. vided by the total number of contacts NzN␩, ͑2.5͔͒. At temperature21 T, the Boltzmann factor eϪE/T/z 1 weighting each state shifts the number distribution of ener- Qϭ ␴a ␴n . ͑2.2͒ Nz ␩ ͚ ij ij gies so that the maximum of the thermally weighted distri- N bution can be interpreted as the most probable ͑thermody- namic͒ energy at that temperature Since this analysis is at constant density both a and n have 2 2 NzN␩⑀ ͑1ϪQ ͒ E ͑T,Q,E ͒ϭ¯EϩQ͑E Ϫ¯E͒Ϫ . ͑2.6͒ NzN␩ contacts. This probability is obtained directly from the ␩ n n T Hamiltonian ͑2.1͒ by averaging over Gaussian distributions of contact energies ⑀ij , The above expression for the most probable energy is useful provided the distribution ͑2.5͒ is a good measure of the ac- tual number of states at E and Q, the condition for which is a n Pan͑Ea ,Q,En͒ ͗␦͓EaϪH͕͑␴ij͖͔͒␦͓EnϪH͕͑␴ij͖͔͒͘ that the fluctuations in the number of states be much smaller ϭ n , than the number of states itself. To this end, we make here Pn͑En͒ ͗␦͓EnϪH͕͑␴ij͖͔͒͘ ͑2.3͒ the simplifying assumption that in each ‘‘stratum’’ defined by the set of states which have an overlap Q with the native state, the states themselves are not further correlated with a each other, i.e., P(E ,Q,E ,Q E )ϭP(E ,Q,E ) where ͕␴ij͖ is the set of contacts in configuration a. The a b ͉ n a n conditional probability distribution is simply a Gaussian with ϫP(Eb ,Q,En), so that in each stratum of the reaction coor- a Q dependent mean and variance, dinate Q, the set of states is modeled by a random energy model. Then since the number of states n␩(E,Q,En) counts a collection of random uncorrelated variables—large when 2 2 P ͑E ,Q,E ͒ ͓͑E Ϫ¯E͒ϪQ͑E Ϫ¯E͔͒ EϾEc—the relative fluctuations ͱ͗(nϪ͗n͘) ͘/͗n͘ are an a n a n Ϫ1/2 ϳexp Ϫ 2 2 . ϳ͗n͘ and are thus negligible. So P ͑E ͒ ͩ 2Nz ␩⑀ ͑1ϪQ ͒ ͪ n n N ͑2.4͒ n(E,Q,En)Ϸ͗n(E,Q,En)͘, and we can evaluate the expo- nent in the number of states ͑2.5͒ at the the most probable energy ͑2.6͒ as an accurate measure of the ͑Q-dependent͒ When Qϭ1, states a and n are identical and must then have thermodynamic at temperature T, the same energy, which Eq. ͑2.4͒ imposes by becoming delta 2 2 NzN␩⑀ ͑1ϪQ ͒ function, and when Qϭ0 states a and n are uncorrelated and S ͑T,Q,E ͒ϭS ͑Q͒Ϫ . ͑2.7͒ ␩ n ␩ 2T2 then Eq. ͑2.4͒ becomes the Gaussian distribution of the ran- dom energy model for the energy of state a. Expression ͑2.4͒ The assumption of a REM at each stratum of Q is clearly holds for all states of the same density as n, e.g., all col- a first approximation to a more accurate correlational lapsed states if n is the native state ͑the degree of collapse scheme. The generalization to treat each stratum itself as a must be a somewhat coarse-grained description to avoid GREM as in our earlier work is nevertheless straightforward, fluctuations due to lattice effects coupled with finite size͒. since our earlier work suggested only quantitative changes, Previously a theory was developed of the configurational which we will not pursue here. If two configurations a and b 20 entropy S␩(Q) as a function of similarity Q with a given have an overlap Q with state n and thus are correlated to n state, at constant density ␩.12 The results of this theory are energetically, they are certainly correlated to each other, par- summarized in Appendix A. Given S␩(Q) and the condi- ticularly for large overlaps where the number of shared con- tional probability distribution ͑2.4͒, the average number of tacts is large. Using the REM scheme at each stratum is more states of energy E and overlap Q with state n, all at density accurate for small Q and breaks down to some extent for ␩,is large Q. In the ultrametric scheme of the GREM, states a

J. Chem. Phys., Vol. 106, No. 7, 15 February 1997

Downloaded¬31¬Jul¬2003¬to¬142.103.236.75.¬Redistribution¬subject¬to¬AIP¬license¬or¬copyright,¬see¬http://ojps.aip.org/jcpo/jcpcr.jsp Plotkin, Wang, and Wolynes: Protein folding funnels 2935 and b have an overlap qabуQ, which is more accurate for From the thermodynamic expressions for the energy large overlap than at small Q, since at small Q states a and ͑2.6͒ and entropy ͑2.7͒ with the mean energy at density ␩, b need not share any bonds and still can both have overlap Q we can write down the free energy per monomer above the with n. One can also further correlate the energy landscape glass temperature as the sum of four terms, of states by stratifying with respect to q ϭq and so on, ab F z ␩⑀2 resulting in a hierarchy of overlaps and correlations best ␩ N 2 ͑T,Q,En͒ϭzN␩¯⑀ϩQzN␦⑀nϪTs␩͑Q͒Ϫ ͑1ϪQ ͒, treated using renormalization group ideas. N 2T All of the states in each stratum defined by Q are still ͑2.9͒ correlated to state n, and their statistics are correspondingly ¯ where zN␦⑀nϭzN(⑀nϪ¯⑀)ϭ(EnϪE)/N is the extra energy modified. For smaller values of Q, most of the states have for each bond beyond the mean homopolymeric attraction zero overlap with each other essentially because their con- energy ͑the energy ‘‘gap’’ between an average molten glob- figurational entropy is largest when their is no topological ule structure and the minimally frustrated one͒, times the constraint between the states. Microscopic ultrametricity is number of bonds per monomer, and s␩(Q)ϭS␩(Q)/N is the broken in that qab , the overlap between any two states (a,b) entropy per monomer described in Appendix A. in the stratum, is less than Q.AsQincreases, there is a The first term in Eq. ͑2.9͒ multiplied by N is just the crossover to a regime where microscopic ultrametricity be- homopolymeric attraction energy between all the monomers, comes a more accurate description. We assume here that this for a polymer of density ␩. It depends only on the degree of happens typically after QϷ0.5, where there must be some collapse, and not on how many contacts are native contacts. overlap between states a and b. In this regime the form of The second term is the average extra bias energy if a contact the thermodynamic functions is modified by the replacement is native, times the average number of native contacts per 2 of ͑1ϪQ ͒ in expressions ͑2.6͒, ͑2.7͒, ͑2.9͒, etc., by ͑1ϪQ͒. monomer. The third term measures the equilibrium bias to- For a derivation of these formulas in the ultrametric regime, ward larger configurational entropy at smaller values of the see Appendix B. reaction coordinate Q. The last term accounts for the diver- Just as the number of states ͑2.5͒ has a characteristic sity of energy states that exist on a rough energy landscape, energy for which it is exponentially small, the REM entropy the variance of which lowers thermodynamically the energy for a stratum at Q ͑2.7͒ vanishes at a characteristic tempera- more than the entropy, and so lowers the equilibrium free ture energy. For a special surface in (␦⑀n ,⑀,T) space, expression 2 ͑2.9͒ has a double minimum structure in the reaction coordi- Tg͑Q͒ zN␩͑1ϪQ ͒ ϭ , ͑2.8͒ nate Q, with one entropic minimum at low Q corresponding ͱ 2s Q ⑀ ␩͑ ͒ to the ‘‘molten globule’’ state, separated by a barrier from an energetic minimum at high Q corresponding to a ‘‘folded’’ 2 which signals the trapping of the polymer into a low energy state. For a given temperature, values of ␦⑀n and ⑀ can be conformational state within the stratum characterized by Q. obtained which are reasonably close to the values obtained If Tg(Q) is a monotonically decreasing function of Q,as by a more accurate analysis which includes the coupling of the temperature is lowered the polymer will gradually be density with topology, but we will not examine the constant thermodynamically confined in its conformational search to density case in much detail for reasons discussed below, ex- smaller and smaller basins of states. The basin around the cept to make the following remarks: ͑1͒ The true coupling native state is the largest basin with the lowest ground state, between density and Q constraints need not be strong to and hence is the first basin within which to be confined. Its obtain a double-well free energy structure. ͑2͒ For mono- characteristic size at temperature T is just the number of meric units with pair interactions, at constant density, the states within overlap Q0(T), where Q0(T) is the value of molten globule and folded minima are not at Qϭ0 and 1, overlap Q that gives Tg(Q0)ϭT in Eq. ͑2.8͒. Thus there is respectively. The position of the molten globule state is near now no longer a single glass temperature at which ergodic the maximum of the entropy of the system, which is at confinement suddenly occurs, as in the REM, but there is a QХ0.1 for the 27-mer due to the interplay of confinement continuum of basin sizes to be localized within at corre- effects and the combinatorial mixing entropy inherent in the sponding glass temperatures for those basins. ‘‘coarse-grained’’ description Q.12 The native minimum If Tg(Q) has a single maximum at say Q*, the glass shifts to Qϭ1 when many-body interactions are introduced transition is characterized by a sudden REM-like freezing to ͑see the next section͒. ͑3͒ The barrier height, at position a basin of configurations whose size is determined by Q*. Q0Х0.25 for the 27-mer with proteinlike parameters 0 The range of glass temperatures will turn out to be lower ͑TF/TGХ2͒, is small (⌬F ϷkBTF), due to the effective can- than the temperature at which a folding transition occurs ͑see cellation of entropy loss by negative energy gain, as the sys- Fig. 3͒, so that this model predicts a proteinlike heteropoly- tem moves toward the native state ͑This cancellation is re- mer whose folded state is stable by several kBT at tempera- duced when many-body forces are taken into account͒. ͑4͒ tures where freezing becomes important. A replica- When a linear form for the entropy is used in Eq. ͑2.9͒, e.g., symmetric analysis of the free energy is therefore sufficient s(Q)ϭs0(1ϪQ) instead of the more accurate s(Q) obtained to describe the folding transition to such deep native states in Ref. 12 the double minimum structure disappears and is that are minimally frustrated. replaced by a single minimum near TF at QϷ1/2, with the

J. Chem. Phys., Vol. 106, No. 7, 15 February 1997

Downloaded¬31¬Jul¬2003¬to¬142.103.236.75.¬Redistribution¬subject¬to¬AIP¬license¬or¬copyright,¬see¬http://ojps.aip.org/jcpo/jcpcr.jsp 2936 Plotkin, Wang, and Wolynes: Protein folding funnels

Qϭ0 and Qϭ1 states becoming free energy maxima. So question. The condition that the square root term in Eq. folding is downhill or spinodal-like in this approximation. ͑2.11͒ be real gives the minimum gap for global foldability in terms of the roughness ⑀, A. Effects of cooperative interactions ͑c͒ ␦⑀n 2s0 As the interactions between monomeric segments be- ϭͱ Ϸ&, ͑2.12͒ come more explicitly cooperative, the energetic correlations ⑀ zN between states become significant only at greater similarity, where the minimum folding temperature is then with the system approaching the REM limit for ϱ-body in- (c) (c) kBTF Ϸ␦⑀n /2 ͑or equivalently, one can obtain the maxi- teractions, where the statistical energy landscape assumes a mum roughness for foldability as Ϸ1/& of a given gap en- rough ‘‘golf-course’’ topography with a steep funnel close to ergy͒. For typical proteins ͑with folding temperatures at the native state. Ϸ330 K͒ gap energies are ͑at least͒Ϸ1 kcal/mol ͑lattice In the presence of m-body interactions, the homopoly- unit͒. Note that Eq. ͑2.12͒ is precisely the same result, as it mer collapse energy scales as a higher power of density 23 mϪ1 should be, to that obtained previously in the context of ͑ϳ¯⑀¯z ͒. For even moderate mϳO͑1͒ a first-order phase finding optimal folding energy functions, by requiring the transition to collapsed states results, which effectively con- quantity TF/TGϾ1, where the glass temperature TG fines all reaction paths in the coordinate Q between molten ϭ ͱz ⑀2/(2s ) is evaluated at the molten globule overlap globule and folded states to those where the density is con- N 0 QϭQg . stant and Ϸ1. So within this constant density approximation Evaluating F(T,Q,E )/N ͓Eq. ͑2.10͔͒ with proteinlike we can investigate the nature of the folding transition as a n energetic parameters at the folding temperature TF ,weob- function of the cooperativity of the interactions, and see how tain free energy curves as in Fig. 1͑a͒, plotted for illustrative the correlated landscape simplifies to the REM in the limit of examples with mϭ3 and mϭ12, for a 27-mer lattice protein. m-body interactions with large m. Note that the transition state ensemble ͑the collection of In the presence of m-body interactions, the Q depen- states at QϭQ* where the free energy is a maximum͒ be- dence in the pair energy distribution ͑2.4͒ scales with Q as mϪ1 comes more and more native like ͑and thus the ensemble Q , when Q is defined as in Eq. ͑2.2͒, and the terms becomes smaller and smaller, eventually going to 1 state in ␴ijk•••m in the modified Hamiltonian factorize into pair inter- the REM͒ as the energy correlations become more short- action terms ␴ij␴jk••• through a suitable decomposition law ranged in Q ͑i.e., as m increases͒—see Fig. 1͑b͒. The corre- such as in the superposition approximation in the theory of 22 sponding free energy barrier then grows with m as the ener- fluids. Using this modified pair distribution along with the getic bias ͑ϳQmϪ1͒ overcomes the entropic barrier only collapsed homopolymeric state as our zero point energy, the much closer to the native state, and the barrier becomes more free energy ͑2.9͒ becomes and more entropic and less energetic ͓see Fig. 1͑c͔͒. There are less kinetic paths to the native state through the transition F mϪ1 state ensemble. ͑T,Q,En͒ϭϪTs1͑Q͒ϪQ zN͉␦⑀n͉ N As was already mentioned, the above analysis was for a 2 polymer of constant collapse density. However, experimental zN⑀ Ϫ ͑1ϪQ2͑mϪ1͒͒, ͑2.10͒ evidence of folding, as well as numerical evidence for lattice 2T models, suggest a coupling of density with nativeness, with energetically favorable nativelike states typically being where s1(Q) is the entropy as a function of constraint Q for denser. So to this end we now investigate in detail a simple a fully collapsed polymer. For pure three-body interactions theory coupling collapse density ␩ with nativeness Q, as- and higher, the globule and folded states are very nearly at suming a native ͑Qϭ1͒ state which is completely collapsed Q 0 and Q 1, respectively see Fig. 1 a . To the extent Х Х ͓ ͑ ͔͒ ͑␩ϭ1͒. Including this effect in Eq. ͑2.9͒ will complete our that this approximation is good, we can equate the free en- simple model of the folding funnel topography in two ergies of the molten globule ͑at QϭQMGХ0͒ and folded ͑Q reaction-coordinate dimensions. ϭ1͒ structures and obtain an m-independent folding tem- perature ͑note again that this is not a good approximation for pair interactions͒, III. A SIMPLE THEORY OF COLLAPSE The GREM theory for random heteropolymers devel- 2 zN͉␦⑀n͉ 2s0⑀ oped by us earlier investigates the interplay between entropy TFϭ 1ϩͱ1Ϫ 2 , ͑2.11͒ 2s0 ͩ zN␦⑀ ͪ loss and energetic roughness as a function of similarity to n any given reference state, all at fixed density. However for where s0 is the maximum of the entropy as a function of exceptional reference states such as the ground state of a constraint Q ͑essentially the log of the total number of con- well-packed protein, the density is not independent of con- figurations͒. figurational similarity, so a theory of the coupling of density From expression ͑2.11͒ we can obtain a first approxima- ␩ with topological similarity Q must also be developed. tion to the constraint on the magnitude of the gap energy ␦⑀n We wish to obtain the polymer density ␩ as a function of in order to have a global folding transition ͑rather than both the fraction of total contacts ¯z/zN and fraction of native merely a local glass transition͒ to the low energy state in contacts Q. At low degrees of nativeness, a good approxima-

J. Chem. Phys., Vol. 106, No. 7, 15 February 1997

Downloaded¬31¬Jul¬2003¬to¬142.103.236.75.¬Redistribution¬subject¬to¬AIP¬license¬or¬copyright,¬see¬http://ojps.aip.org/jcpo/jcpcr.jsp Plotkin, Wang, and Wolynes: Protein folding funnels 2937

FIG. 2. ͑a͒ A model of the partially native protein can be pictured as a frozen native core surrounded by a halo of non-native polymer of variable density. ͑b͒ The halo density ␩H as a function of the fraction of the total contacts ¯z/zN .

tion to the density can be obtained by assuming homoge- neous collapse, or ␩ϭ¯z/zN . In this approximation the density is a function of total contacts only, irrespective of nativeness. At higher degrees of nativeness, we adopt a simple model consisting of a native ‘‘core’’ region of density ␩Х1 surrounded by a typically less dense ‘‘halo’’ region ͑␩Hр1͒ FIG. 1. ͑a͒ Free energy per monomer F/N for a 27-mer, in units of kBTF as a function of Q, at constant density ␩ϭ1, with s0ϭ0.8, for proteinlike en- consisting of dangling loops and ends ͓see Fig. 2͑a͔͒.We ergetic parameters ͑␦⑀n ,⑀͒ϭ͑Ϫ2.28, 1.55͒. For these parameters TFϷ͉␦⑀n͉. then seek the functional form of the halo density ␩H(Q,¯z). For illustrative purposes, two values of m-body interactions are chosen: We make the approximation that at constant ¯z, the halo ͑solid line͒ pure three-body interactions; ͑dashed line͒ pure twelve-body interactions. Note the trends in height and position of the barrier, and note density is approximately constant. i.e., contacts will increase how in the mϭ12 case the free energy curve is essentially ϪT times the the halo density by reducing the effective loop size in the entropy curve s(Q) of Fig. 10 with ␩ϭ1, until Q is very large. ͑b͒ Position halo, which is determined by total contacts only, irrespective of the transition state ensemble Q* along the reaction coordinate Q as a of nativeness. Note that the total density includes both core function of the explicit cooperativity in pure m-body forces, m. The fact that and halo density, and is of course Q dependent. the asymptotic limit Qmax is less than one is due to the finite size of the system, so that Q, the fraction of native contacts, is not a continuous pa- We determine the function ␩H(¯z/zN) by first construct- rameter. ͑c͒ The free energy barrier height ⌬F in units of kBTF as a function ing a theory of loop density for a given length. Then we of the explicit cooperativity of the m-body force, m. The barrier height rises obtain the loop length as a function of total contacts by cal- to the limit of S(Qϭ0͒ as m ϱ, when it becomes completely entropic. Also shown are the energetic ͑→dashed͒ and entropic ͑solid͒ contributions to culating it along the line ¯zϭzNQ, and using the Q depen- the barrier. dence of free loop length from the high Q entropy theory of

J. Chem. Phys., Vol. 106, No. 7, 15 February 1997

Downloaded¬31¬Jul¬2003¬to¬142.103.236.75.¬Redistribution¬subject¬to¬AIP¬license¬or¬copyright,¬see¬http://ojps.aip.org/jcpo/jcpcr.jsp 2938 Plotkin, Wang, and Wolynes: Protein folding funnels

Appendix A. The high Q theory gives lE(Q) and ͗lmelted(Q)͘ along the line Qϭ¯z/zN . Using the fact that ␩H should be a function of total contacts only gives the density for all Q and ¯z. The high Q expressions for loop sizes will be a good approximation since the molten globule and folded states are largely collapsed, so that Qϭ¯z/zN is in the strongly con- strained regime. To estimate the packing fraction of a polymer end E ␩H(lE), consider a chain of sequence length lE with intrinsic 3 volume lEb , confined to a half-plane making a self-avoiding walk. The characteristic volume of space it occupies is given 1 3 1 9/5 3 E Ϫ4/5 by 2Rrmsϭ 2lE b , so its packing fraction is ␩HХ2lE . For a loop of length l its characteristic volume is approximately 1 9/5 3 2(l/2) b , giving a denser packing fraction L 9/5 Ϫ4/5 ␩H(l)Х2ϫ2 l . Next, we average the density over the total number of loops and ends,

9/5 Ϫ4/5 Ϫ4/5 ͚ni␩i Nf2 l ϩ2lE ͗␩H͘ϭ ϭ2 . ͑3.1͒ ͚ni Nfϩ2

Equation ͑3.1͒ is the halo density when the loops and ends have no cross-links or bonds, i.e., along the line ¯zϭzNQ. The quantities l(Q)ϭ͗lmelted(Q)͘, lE(Q), and Nf(Q) are taken from the high Q entropy analysis ͑see Ref. 12 and Appendix A͒. Putting these values into Eq. ͑3.1͒ FIG. 3. The folding temperature TF and glass transition temperature TG as a gives the halo density ␩ (Q) along the line Qϭ¯z/z . Then function of the fraction of native contacts Q and the total contacts per H N monomer¯z. The folding temperature is above the glass temperature ͑2.8͒ for we use the independence of loop density on the nativeness of most values of Q and ¯z, for proteinlike energetic parameters used in fitting ¯ ¯ contacts made so that ␩H(Q,z/zN)ϭ␩H(z/zN). the theory to simulations ͑⑀Х1.1 and ␦⑀nХϪ2.1͒. Figure 2͑b͒ gives a plot of ␩H(¯z/zN). The value at ¯zϭ0 is the density of an end of length N/2 ͑Ϸ0.24 for a 27-mer͒. The true packing fraction should be roughly 1/2 of this, how- The scalar T /T (Q,¯z) is a rather simple indication of ever this artifact of the theory has little effect on the folding F G self-averaging, and a more rigorous method to determine the transition, which involves states with ¯z/z typically larger N degree of self-averaging would be to follow the calculations than 0.6. Х by Derrida and Toulouse24 of the moments of the probability We can now reinvestigate the glass transition tempera- distribution of Yϭ⌺ W2, measuring the sample to sample ture as a function of both Q and ¯z through the insertion of j j fluctuations of the sum of weights of the free energy valleys, the halo density (¯z/z ) into Eq. 2.8 . This gives the re- ␩H N ͑ ͒ and generalize them to finding the probability distribution of gions in the space of these reaction coordinates where the Y(Q,¯z). dynamics tends to become glassy if Tg(Q,¯z) is comparable to TF ͑see Fig. 3͒. We can see from Fig. 3 that TF/TG grows during the folding process, with a corresponding slowing IV. THE DENSITY-COUPLED FREE ENERGY down of the dynamics. TFХTG when QХ0.85. At these high values of Q the dynamics is glassy. At the transition state, In this section we obtain the free energy in terms of the (Q*,¯z*)Ϸ͑0.50, 0.92͒, TF/TGХ1.5 ͓these coordinates are reaction coordinates Q and ¯z through the introduction of the determined in Sec. IV, see Fig. 4͑b͔͒. TF/TGϷ2.3 in the halo density ͑3.1͒. ¯ molten-globule phase at ͑QMG , zMG͒ϭ͑0.14, 0.67͒. This The halo density ␩H(¯z/zN) from Eq. ͑3.1͒ will appear in value is larger than that obtained from simulations: the roughness term of Eq. ͑2.9͒ since this term arises as a TF/TG͑QMG ,¯zMG͒Х1.6. Values closer to 1.6 are easy to ob- result of non-native interactions which contribute to the total tain by adjusting the energetic parameters, however these variance of state energies. The entropic term in Eq. ͑2.9͒ new parameters move the transition state to lower Q values. contains the configurational entropy s␩(Q)atQand density In any event, both theory and simulations justify a replica- ␩. The most accurate values of barrier position and height symmetric treatment of the folding transition for minimally are obtained by inserting in s␩(Q) an interpolated form of frustrated polymers, particularly regarding the characteriza- density, between homogeneous collapse valid at low Q, and tion of the barrier. Thermodynamic quantities are self- the high Q core-halo formula ͑3.1͒, averaging at T , with the exception of very nativelike states F ¯z ¯z where the free energy becomes strongly sequence dependent full ␩H ͑Q,¯z͒ϭ͑1ϪQ͒ ϩQ␩H . for a finite size polymer. zN ͩ zN ͪ

J. Chem. Phys., Vol. 106, No. 7, 15 February 1997

Downloaded¬31¬Jul¬2003¬to¬142.103.236.75.¬Redistribution¬subject¬to¬AIP¬license¬or¬copyright,¬see¬http://ojps.aip.org/jcpo/jcpcr.jsp Plotkin, Wang, and Wolynes: Protein folding funnels 2939

surface as a function of the reaction coordinates Q, the frac- tion of native contacts, and ¯z, the total contacts per mono- mer,

F ͑T,Q,¯z͉E ͒ϭϪ¯z͉¯⑀͉ϪQz ͉␦⑀ ͉ϪTs͑Q,¯z͒ N n N n

2 zN␩H⑀ Ϫ ͑1ϪQ2͒, ͑4.1͒ 2T

full where s(Q,¯z) ϭ s(Q,␩H (Q,¯z)). The first term is an equilib- rium bias toward states that simply have more contacts and depends only on ¯z, whereas the second term is a bias toward states with greater nativeness and depends only on Q, al- though the maximum value of this bias ¯z␦⑀n does depend on ¯z. The entropic term biases the free energy minimum toward both small values of Q and ¯z where the entropy is largest. The free energy bias due to landscape roughness is largest when there are many non-native contacts ͑¯z is large and Q is small͒, which means that the protein can find itself in non- native low energy states due to the randomness of those non- native interactions. To model the protein behavior at the folding tempera- ture, the temperature T is held fixed at a value TF described below, and the other energetic parameters ͑¯⑀, ␦⑀n , and ⑀͒ are adjusted so as to give the free energy a double well structure with folded and unfolded minima of equal depth. A. Comparison with a simulation The 27-mer lattice model protein has been simulated for polymer sequences designed to show minimal frustration.4,25,26 The system we are interested in is modeled by a contact Hamiltonian as in Eq. ͑2.1͒, but now the beads representing the monomers are of three different kinds with respect to their energies of interaction. If like monomers are in contact, they have an energy ⑀ ϭϪ3, otherwise ⑀ ϭϪ1, ¯ ij ij FIG. 4. ͑a͒ The free energy vs Q and z, at the folding temperature TF , from where the interaction energy is in arbitrary units of order simulations ͑Ref. 5͒. The native minimum is in the upmost right-hand cor- ner. ͑b͒ Free energy surface at TF for the 27-mer, obtained from Eq. ͑4.1͒ kBT. Specific sequences are modeled to have a fully col- with the parameters (⑀,¯⑀,␦⑀n ,TF)ϭ͑1.1, Ϫ1.27, Ϫ2.1, 1.51͒. The surface lapsed native state with a specific set of 28 contacts and a has a double well structure ͑darker is deeper͒ with a transition state en- ground state energy of Ϫ3ϫ28. semble at Q* 0.50, and barrier height 3.0k T. Х Х B In the thermodynamic limit, the discrete interaction en- ergies used in the simulation give a Gaussian distribution for the total energy of the system by the central limit theorem, This gives more weight to the two behaviors in their respec- whose mean and width naturally depends on the fraction of tive regimes: mean-field uniform density at weak constraint native contacts. or low Q, and core-halo behavior at strong constraint. If we call ¯Z the total number of contacts of any kind, the ¯ The total density ␩totϭ¯z/zN appears in the homopoly- energy at Q and Z is determined simply by the energies of meric energy since this term is a function only of the number these native and non-native contacts above, while the en- of contacts, irrespective of whether they were native or not. tropy at high temperatures is the log of the number of states The extra gap energy defined with respect to fully collapsed satisfying the constraints of ¯Z total contacts and ␮ native states in Eq. ͑2.9͒ is an energetic contribution added to each contacts. However, the temperature range where folding oc- native bond formed, independent of ¯z, up to the limit curs is well below the temperature of homopolymeric col- QzNϭ¯z, where the gap term in Eq. ͑2.9͒ becomes simply lapse, and so the polymer can be considered to be largely ¯z␦⑀n . collapsed. This can be seen either by direct computation or These substitutions in Eq. ͑2.9͒ describe a free energy by computing the entropy, defined through

J. Chem. Phys., Vol. 106, No. 7, 15 February 1997

Downloaded¬31¬Jul¬2003¬to¬142.103.236.75.¬Redistribution¬subject¬to¬AIP¬license¬or¬copyright,¬see¬http://ojps.aip.org/jcpo/jcpcr.jsp 2940 Plotkin, Wang, and Wolynes: Protein folding funnels

fying the conditions for global foldability. The system has a S͑Q,¯z,T͒ϭϪ pi log pi double-well structure with a weakly first-order transition be- ͚i tween a semicollapsed globule, at ͑QMG ,¯zMG͒Х͑0.14, 0.67͒, eϪEi /T eϪEi /T and a fully collapsed, near-native folded state at ϭϪ͚ log , ͑4.2͒ (Q ,¯z ) 0.98, z 1.03 . i ͩ Z ͪ ͩ Z ͪ F F Х͑ NХ ͒ p p In what follows let NCϭthe number of native residues in the core, and let N ϭthe number of non-native residues in where Z p is the ͑partial͒ partition function, the sum being H over all of the states consistent with the constraints charac- the halo. The molten globule states have a halo density of terized by ␮ and ¯Z above. ␩HХ0.91, and there are NCХNzNQ*/zNQϳ8 native residues 5 6 Onuchic et al. have obtained the free energy as FϭE in the various ϳN!/NC!NH!Ϸ2ϫ10 cores consisting of ϪTS for the 27-mer, which mimics the landscape of a small QMGNzNХ4 native contacts. The total density is given by helical protein, as a surface plot versus the total number of ␩ϭN␩H/(NC␩HϩNH) which is Х0.93 in the molten globule contacts per monomer ¯zϭ¯Z/N, and Qϭ͑total number of na- state. The folded state has a core with Х27 contacts, contain- tive bonds͒/28, see Fig. 4͑a͒. The largest value of Q for a ing Х26 monomers ͑almost all͒ and at density ␩Cϭ1 and a given ¯Z is ¯Z/28, because there cannot be more native con- collapsed halo of about 1 monomer at density ␩Hϭ1. The tacts than there are total contacts, hence the allowable region folded state is fully collapsed. It is energetically favored over is the upper left-hand side of the surface plot. The surface the molten globule by about a kBT per monomer plot in Fig. 4͑a͒ has a double minimum structure at a specific ͑E f ϪEmgХϪ28.3kBT͒ and thus less entropic T S ϪS Ϫ28.3k T . ͑folding͒ temperature TFϭ1.51 on the energy scale where ͓ F͑ f mg͒Х B ͔ The core residues in the transition state ensemble at ⑀ijϭ͕Ϫ3,Ϫ1͖ described above. The free energy barrier of (Q*,¯z*) 0.50, 0.92 contain approximately Ϸ2kBTF is small compared with the entropic barrier of the Х͑ ͒ Nz Q*/z 16 monomers. The 11 remaining monomers in system ͑ϳ14kBTF͒. The transition ensemble at reaction co- N NQХ ordinates (Q*,¯z*)Х͑0.54, 0.88͒, consists of about the dangling loops and ends are nearly collapsed, with 5 exp Ns(Q*,¯z*)Х2000 thermally occupied states and ϳ10 ␩HХ0.99, so the total density is very nearly nearly one. The configurational states. transition state ensemble in the theory consists of Ϸ1600 5 There are four energetic parameters in the free energy thermally occupied states and Ϸ4ϫ10 configurational states. Its thermal entropy is 7.4k . The folding free en- theory ͑⑀, ¯⑀, ␦⑀n , and kBTF͒, and three parameters in the Х B ergy barrier F F(Q*,¯z*)ϪF Q ,¯z is 3.0k T. simulation ͓⑀͑like units͒, ⑀͑unlike units͒, and kBTF͔, plus the ⌬ ϵ ͑ MG MG͒ Х B roughness parameter, which is implicitly evaluated through The energetic gain from Eq. ͑2.6͒ is ХϪ17.7kBT, and the the diversity of energies consistent with overlap Q. Minimal entropic barrier from Eq. ͑2.7͒ is Х20.7kBT. The full free frustration in the lattice simulation is implicit in the sequence energy barrier ͑Х3.0kBT͒ arises from the delicate incomplete design, in that the ground state is topologically consistent cancellation of entropic losses with energetic gains. with all the pair interactions between like monomers. Ener- One aspect of lattice simulations also present in our getic correlations are implicit from minimal frustration, since theoretical model is an essentially native folded minimum 26,27 states at similarity Q to the native state are also low in en- which persists up to high temperatures. In simulations ergy. this is a lattice effect. The very few near native states avail- We should note however that the gap energy in these able on a lattice create an entropic barrier to escape from the simulations is functionally somewhat different than our theo- native state. This barrier has led to some confusion in iden- 27 retical model in that contacts between like monomers are tifying the position of the folding transition state. The na- always favored whether native or not, and in the theory only tive state has a glass temperature TG(Qϭ1), which in the true native contacts have explicit contributions to the energy GREM formalism is determined by the ratio of energetic gap. gains to entropic losses as the native state is approached. The We do not undertake here a comparison of simulations at discrete nature of lattice models gives the native state an all parameter values with theory. Rather, we compare simu- effective high glass temperature above any simulation tem- lations and theory only for the 27-mer, with parameters cho- perature. In our analytical model, the collective nature of sen to be proteinlike according to the corresponding states melting ͑i.e., lC and lEC are Ͼ0, see Appendix A͒ leads to a principle analysis of Onuchic et al.5 The scheme for com- similar gap in the density of states along the Q coordinate. parison between the simulations and theory for the 27-mer is This also causes a weak barrier ͑Х0.4kBT͒ between a near- folded local minimum and the native folded minimum see to hold TF fixed at the simulational value of 1.5, and then ͓ determine the remaining three energetic parameters Fig. ͑4b͔͒. Again this weak barrier is a result of discrete quantities in the simulations and theoretical model. (⑀,␦⑀n ,¯⑀) by the conditions of folding equilibrium ͓double- well structure of F(Q,¯z)͔, and a barrier position and height More elaborate theories should incorporate a local na- consistent with simulations and experiments. tiveness parameter Q͑x͒ which varies in space, allowing for The result of this is shown in Fig. 4͑b͒, which shows the rigid as well as fluctuating regions of the protein. free energy surface at T obtained from the parameters F B. Explicit three-body effects (⑀,¯⑀,␦⑀n ,TF)ϭ͑1.1, Ϫ1.27, Ϫ2.1, 1.51͒. These values are very compatible with those of the simulations. The gap to It is interesting to investigate the effects of explicit roughness ratio for this minimal model is ͉␦⑀n͉/⑀Х1.9, satis- many-body cooperativity on the folding funnel by introduc-

J. Chem. Phys., Vol. 106, No. 7, 15 February 1997

Downloaded¬31¬Jul¬2003¬to¬142.103.236.75.¬Redistribution¬subject¬to¬AIP¬license¬or¬copyright,¬see¬http://ojps.aip.org/jcpo/jcpcr.jsp Plotkin, Wang, and Wolynes: Protein folding funnels 2941

a weakly decreasing function of ␣ ͓Fig. 5͑a͔͒ ͑although as described above, Q* is not independent of m, the order of the m-body interactions͒. We eventually expect this trend to reverse for larger m as in Fig. 1͑b͒. The position of the folded state QF remains near native and QMG is also roughly constant. However as homopolymer attraction becomes more collective ͑increasing ␣͒, ¯zMG increases and the molten glob- ule state becomes denser ͓Fig. 5͑a͔͒. The transition becomes more first-order-like with increasing ␣, as the trend in ener- getic loss is not as great as entropic loss, so that the free energy barrier increases with ␣ ͓Fig. 5͑b͔͒.

C. Dependence of the barrier on sequence length It is simple in our theory to vary the polymer sequence length. One recalculates s(Q) at constant density for a larger 12 chain and inserts this, along with the density ␩H ͓Eq. ͑3.1͔͒ at the larger value of N, into the free energy ͑4.1͒. To model the barrier at TF , one must scale the temperature with N since in our model larger proteins fold at higher temperatures ͓see Fig. 6͑a͔͒, e.g., in Eq. ͑2.11͒ TFϰzN . The barrier posi- tion Q* mildly decreases with N ͓Fig. 6͑b͔͒.28 Plotted along with the theoretical curve are two experimental measure- ments of the barrier position. The square represents the mea- surement for truncated ␭ repressor,17 a ϳ80 residue protein fragment with largely helical structure. The corresponding states analysis5 shows that the formation of helical secondary structure within the ␭ repressor makes it entropically similar to the lattice 27-mer. Also plotted in Fig. 6 ͑circle͒ is the experimental barrier measurement for Cytochrome C,29 a FIG. 5. ͑a͒͑Left axis͒ Positions of the barrier Q* and total bonds per 104 residue helical protein which is entropically similar to ¯ monomer in the molten globule ͑zMG͒, as a function of the three-body co- the 64-mer lattice model. The simple proposed model has the efficient ␣. ͑Right axis͒ Decrease with ␣ in the necessary energy gap to maintain equilibrium at TF . Plots are for the 27-mer with fitted energy same decreasing trend in the position of the transition state parameters ͑Sec. IV A͒. ͑b͒ Free energy barrier ⌬F ensemble that is observed experimentally, but decreases ¯ ¯ ϭF(Q*,z*)ϪF͑QMG ,zMG͒ in units of kBTF , and its energetic and entropic much slower. This suggests that a local nucleation descrip- contributions, for the 27-mer, as a function of ␣. The barrier grows moder- ately with ␣, i.e., the energetic drop decreases faster than entropic losses, tion may become more appropriate as N increases, rather due to the collective nature of the energetic interactions. than the homogeneous mean field theory proposed here. However the question as to whether energetic heterogeneity induces a specific nucleus30 rather than an ensemble of nu- ing a three-body interaction in addition to the pair interac- clei with correspondingly many kinetic paths, is an open is- tions already present. Models with such partially explicit co- sue. In Appendix C, we show for thoroughness that experi- operativity mimic the idea that only formed secondary mental plots of folding rate versus equilibrium constant used structure units can couple, and have been introduced in lat- in the experiments above are indeed a measure of the posi- tice models by Kolinski et al.4 Three-body interactions enter tion of the transition state ensemble. into the energetic contributions of Eq. ͑4.1͒ as an additional Figure 6͑c͒ shows the roughly linear trend of the barrier Q2 term in the bias and roughness, and ¯z 2 term in the ho- height with N. This mean-field result applies for small N;as mopolymer attraction, so that those terms in the free energy Ngrows, fluctuation mechanisms begin to dominate the scal- become ing behavior, and may reproduce the sublinear scaling with N seen in lattice studies31 and by scaling arguments.32 Ϫ͓͑1Ϫ␣͒¯zϩ␣¯z 2͔͉¯⑀͉Ϫ͓͑1Ϫ␣͒Qϩ␣Q2͔z ͉␦⑀ ͉ N n The overall folding time results from a combination of 2 zN␩H͑¯z/zN͒⑀ thermodynamic barrier crossing dealt with here, and kinetic Ϫ ͕1Ϫ͓͑1Ϫ␣͒Qϩ␣Q2͔2͖, ͑4.3͒ 26,33 2T diffusion between locally stable basins. Experimental measurements from the folding rate at TF can thus lead to an where ␣ is a measure of the amount of three-body force estimate for the reconfiguration time at Q*. For example, present. In the model defined by Eqs. ͑4.1͒ and ͑4.3͒, pro- rearranging Eq. ͑C2͒, the folding rate over the reconfigura- teins with more three-body forces need not be as strongly tion rate is given by optimized, and so the magnitude of the gap is a decreasing F͑Q*͒ϪF function of ␣ at fixed TF , ¯⑀, and ⑀ ͓see Fig. 5͑a͔͒. With this u ln͑kFt͑Q*͒͒ϭϪ . ͑4.4͒ correction included, we find the barrier position Q*͑␣͒ to be kBT

J. Chem. Phys., Vol. 106, No. 7, 15 February 1997

Downloaded¬31¬Jul¬2003¬to¬142.103.236.75.¬Redistribution¬subject¬to¬AIP¬license¬or¬copyright,¬see¬http://ojps.aip.org/jcpo/jcpcr.jsp 2942 Plotkin, Wang, and Wolynes: Protein folding funnels

FIG. 6. ͑a͒ The folding temperature TF is an increasing function of polymer sequence length N. ͑b͒ Position of the barrier Q* as a function of sequence length N. The solid line is the theory as determined by Eq. ͑4.1͒, and the FIG. 7. ͑a͒ Two plots of the free energy vs Q for the 27-mer with the fitted points marked are experimental results ͑see the text͒. ͑c͒ Free energy barrier parameters ͓Fig. 4͑b͔͒. The upper curve is the free energy with the fitted height ⌬F in units of kBTF , as a function of sequence length N. stability gap ␦⑀nϭϪ2.1, and ␦⑀nϭϪ2.5 in the lower curve. These one- dimensional plots are the most folding probable paths ͑minimum free energy along coordinate Q determined by dF/dz¯ϭ0͒ on the 2D surface plot of Fig. 4. From the figure we can see that as ͉␦⑀n͉ increases the folding becomes downhill. Folding becomes purely downhill with no barrier at 17 ͉␦⑀n͉Х2.75. For the ␭ repressor at the folding midpoint, the folding rate Because of the stability of the molten globule position, the barrier shifts Ϫ1 kF is about 400 s . Using the barrier height slightly to lower Q* ͑from 0.51 to 0.47͒, and decreases in height ͑from about 3kBT to 0.6kBT͒. ͑b͒ Position of the barrier Q* as a function of F(Q*)ϪFuХ3kBT for the corresponding 27-mer gives a re- magnitude of the energy gap ͉␦⑀ ͉ ͑in units of k T͒, for the fitted 27-mer configuration time t(Q*) Ϸ 10Ϫ4 s. Since the Rouse–Zimm n B described in ͑a͒. Q* weakly decreases until ͉␦⑀n͉/kBTХ1.75, and then rap- time is typically in the ␮s range, this suggests that configu- idly merges with the position QMG of the molten globule at ͉␦⑀n͉/kBTХ1.82 rational diffusion in the transition region is typically acti- as the barrier vanishes. ͑c͒ Free energy barrier in units of kBT vs magnitude vated. Of course there are many issues involved in the local of stability gap ͉␦⑀n͉. The short dashed line is the entropic contribution to the barrier, and the long dashed line is minus the energetic contribution. These dynamics and structure of folding proteins, which make pre- two terms merge and become zero at ͉␦⑀n͉/kBTХ1.82 where the barrier cise comparisons with specific examples difficult. The num- vanishes.

J. Chem. Phys., Vol. 106, No. 7, 15 February 1997

Downloaded¬31¬Jul¬2003¬to¬142.103.236.75.¬Redistribution¬subject¬to¬AIP¬license¬or¬copyright,¬see¬http://ojps.aip.org/jcpo/jcpcr.jsp Plotkin, Wang, and Wolynes: Protein folding funnels 2943

FIG. 9. ͑Solid line͒ Probability to be in the unfolded state vs temperature ͑in simulation units͒ for the two-order parameter model used in fitting the 27- FIG. 8. ͑Right axis͒ Plot of the logarithm of the folding rate vs the logarithm mer simulations. With this model TFϭ1.51 and ⌬T/TFХ0.14. ͑Dashed line͒ of the unfolding equilibrium constant. ͑Left axis͒ Reading the slope of the Same probability for the one-order parameter model using Eq. ͑4.5͒. Here folding rate gives a measure of the position of the barrier Q*. Also plotted TFϭ2.14 and ⌬T/TFХ0.22. is the actual value of Q* directly calculated from the free energy curves. The values compare well for most values of the gap where the free energy has a double-well structure. 1 Ϫ1 P ϭ 1ϩexp Ϫ ͑F ϪF ͒ , u ͫ ͫ T f u ͬͬ bers we quote should be interpreted as estimates showing the reasonableness of the current parametrizations. where Fu and F f are the free energies at temperature T of the unfolded and folded minima ͑at TF , F f ϭFu and Puϭ1/2͒. This can be used to obtain denaturation curves as a function D. Dependence of the barrier on the stability gap, at of temperature. For illustration, we make the simplifying as- fixed temperature and roughness sumptions that both the folded and globule states are col- As the stability gap is increased at fixed temperature, lapsed, making Pu independent of ¯⑀, and that the folded and folding approaches a downhill process, with a folded global globule states occur approximately at QFϭ1 and QMGϭ0. equilibrium state ͓see Fig. 7͑a͔͒. We can see from Fig. 7͑a͒ As the temperature is lowered, the molten globule that the barrier position and height are decreasing functions freezes into a low energy configuration at Tg 2 of stability gap, with true downhill folding ͑zero barrier͒ oc- ϭ ⑀ͱzN(1ϪQmg)/(2s(Qmg)) Х ⑀ͱzN /(2s0) ͑see Fig. 3͒, and curring when ͉␦⑀n͉/⑀Х2.5 or ͉␦⑀n͉/TFs0Х1.4 for the 27-mer the expression for Pu becomes one of equilibrium between ͓see Figs. 7͑b͒ and 7͑c͔͒. At folding equilibrium, two temperature independent states with the corresponding ͉␦⑀n͉/TFs0Х1.1. Thus, achieving downhill folding requires a ‘‘Shottky’’ form of the energy and specific heat: considerable change of stability—an estimate for a 60-aa 2 Ϫ1 ͉␦⑀n͉ ⑀ protein ͑27-mer lattice model͒ would be an excess stability P ϭ 1ϩexp͑ϪNs ͒exp Nz Ϫ , T ϽT u ͫ 0 Nͩ T 2T2ͪͬ g of Ϸ12kBTF . We can apply the equations of Appendix C to changes of Ϫ1 NzN 2s0 the transition state free energy by modifying stability. Figure ϭ 1ϩexp ͉␦⑀ ͉Ϫ⑀ , TϽT . ͑4.5͒ T n ͱz g 8 shows a plot of the log of a normalized folding rate ͫ ͩ N ͪͬ ln(kF/k0) vs the log of the unfolding equilibrium constant The condition that TF/TGу1 gives Eq. ͑2.12͒. Using the ln Keq , whose slope is a measure of the barrier position Q*. glass temperature of the globule state, this is equivalent to The increasing magnitude of slope with increasing ln K eq ⑀2 means that the barrier position is shifting toward the native Tgу , state as the gap decreases. Also shown is a comparison be- ␦⑀n tween the position of the barrier Q* calculated directly from which is the temperature where the high T expression for Pu the theory, and Q* as derived from the slope of ln(kF/k0) in 4.5 has a minimum. Hence cold denaturation will not be using Eq. ͑C5͒. The linear free energy relation works well for seen in the constant density model ͑as it would if there were the range of parameters having a double-well free energy no glass transition͒, and Pu will always decrease to zero at surface. low temperatures. In the limit of large T, Eq. ͑4.5͒ becomes Ϸ1/͑1ϩexpϪNs ͒Ϸ1, indicating denaturation. At small T E. Denaturation with increasing temperature 0 Eq. ͑4.5͒ tends to zero as expϪ͑const.ϫN/T͒. The probability Pu for the protein to be in the unfolded Allowing density to vary modifies the denaturation be- globule state at temperature T is havior. In Fig. 9, denaturation curves are plotted for the vari-

J. Chem. Phys., Vol. 106, No. 7, 15 February 1997

Downloaded¬31¬Jul¬2003¬to¬142.103.236.75.¬Redistribution¬subject¬to¬AIP¬license¬or¬copyright,¬see¬http://ojps.aip.org/jcpo/jcpcr.jsp 2944 Plotkin, Wang, and Wolynes: Protein folding funnels able density model with fitted parameters ͓(⑀,¯⑀,␦⑀n ,TF) protein density with nativeness. This resulted in a density ϭ͑1.1, Ϫ1.27, Ϫ2.1, 1.51͔͒, and for the one-dimensional contraction in the process of folding. During folding, a dense ͑1D͒ case for a completely collapsed ͑␩ϭ1͒ protein. In the inner native core forms, which grows while possibly inter- 1D case, the same energetic parameters ͓͑␦⑀n ,⑀͒ϭ͑Ϫ2.1, changing some native contacts with others upon completion 1.1͔͒ are used in Eq. ͑4.5͒ but the molten globule entropy is of folding. This core is surrounded by a halo of non-native the fully collapsed value ͑SmgХ27ϫ0.88͒. TF in the 1D case polymer which shrinks and condenses in the folding process, increases to Х2.14, in units where TFϭ1.51 as in the simu- as topological constraints upon folding make dangling loops lation. shorter and denser. The folded free energy minimum is es- If we define the width ⌬T of the transition between 10% sentially native in the model when parameters are chosen to and 90% denatured, the ratios of widths to folding tempera- fit simulated free energy curves for the 27-mer. tures ⌬T/TF are about 0.22 and 0.14 for the 1D and 2D Explicitly cooperative interactions were shown to en- cases, respectively. Allowing density to vary sharpens the hance the first-order nature of the transition through an in- transition. The values lie between those obtained in lattice crease in the size of the barrier, and a shift toward more 25,34 studies of the 27-mer, where ⌬T/TFϷ0.3 for foldable nativelike transition state ensembles ͑i.e., at higher Q*͒. For sequences, and from measurements of the thermal denatur- the constant density scenario the barrier becomes almost en- 35 ation of ␭ repressor, where ⌬T/TFϷ0.05. This suggests tirely entropic when the order m of the m-body interactions that many-body forces may play a role in the stabilization of becomes large, and the transition state ensemble becomes the native state for laboratory proteins. correspondingly more nativelike. In the energy landscape picture, as explicit cooperativity increases, the protein fold- ing funnel disappears, and the landscape tends toward a golf- V. CONCLUSION course topography with energetic correlations less effective In this paper we have shown that if the energy of a given and more short range in Q space. The correlation of stability configuration of a random heteropolymer is known to be gaps and TF/TG ratios with kinetic foldability is true only for lower than expected for the ground state of a completely fixed m much less than N. random sequence ͑i.e., the protein is minimally frustrated͒, A full treatment of the barrier as a function of the three then correlations in the energies of similar configurations energetic parameters (⑀,¯⑀,␦⑀n) plus temperature T would lead to a funneled landscape topography. The interplay of involve the analysis of a multidimensional surface defining entropic loss and energetic loss as the system approaches the folding equilibrium in the space of these parameters. We native state results in a free energy surface with weakly two- shall return to this issue in the future, but we have deferred it state behavior between a dense globule of large entropy, and for now in favor of the simpler analysis of seeking trends in a rigid folded state with nearly all native contacts. The weak the position and height of the barrier as a function of indi- first-order transition is characterized by a free energy barrier vidual parameters such as ␦⑀n and T. which functions as a ‘‘bottleneck’’ in the folding process. The barrier is small compared with the total thermal en- ACKNOWLEDGMENTS ergy of the system—on the order of a few kBT for smaller proteins of sequence length about 60 amino acids. For these We wish to thank R. Cole, J. Onuchic, J. Saven, Z. small proteins the model predicts a position of the barrier Q* Schulten, and N. Socci for helpful discussions. This work about halfway between the unfolded and native states was supported NIH Grant No. 1RO1 GM44557, and NSF ͑Q*Х1/2͒. For larger proteins, the barrier height rises lin- Grant No. DMR-89-20538. early, and its position Q* moves away from the native state toward the molten globule ensemble roughly as 1/zN , due essentially to the fact that the entropy decrease per contact is APPENDIX A independent of N initially. Experimental measurements of the barrier for fast folding proteins are consistent with this Here we summarize the derivation of the configurational predicted shift in position with increasing sequence length, entropy S(Q,␩), as a function of the topological constraints but with a shift somewhat greater in magnitude. and density ␩. For a complete derivation see Ref. 12. The unfolded and transition states are not single configu- The entropy of an unconstrained polymer is given by rations but ensembles of many configurations. The transition ␯ 1Ϫ␩ state ensemble according to the theory consists of about 1600 S ͑␩͒ϭN ln Ϫ ln͑1Ϫ␩͒ , ͑A1͒ 0 ͫ e ͩ ␩ ͪ ͬ thermally accessible states for a small protein such as the ␭ repressor. There are about Ϸ4ϫ105 configurational states in where ␯ϭthe number of configurations per monomer ͑six for the transition state ensemble, less than the ϳ107 ways of three-dimensional cubic lattice models͒. choosing 16 core residues in the 27-mer minimal model, so For a weakly constrained polymer ͑low values of Q͒, the that many but not all nuclei are sampled. The multitude of entropy loss from the unconstrained state can be decomposed states at Q* seen in the theory is in harmony with a picture into three terms; of a transition state ensemble of generally delocalized nuclei, S ͑Q,␩͒ϭS ͑␩͒ϩ⌬S ͑Q,␩͒ϩ⌬S ͑Q,␩͒ a subject investigated recently by various authors.36–38 low 0 B mix A simple theory of collapse was introduced to couple ϩ⌬SAB͑Q,␩͒. ͑A2͒

J. Chem. Phys., Vol. 106, No. 7, 15 February 1997

Downloaded¬31¬Jul¬2003¬to¬142.103.236.75.¬Redistribution¬subject¬to¬AIP¬license¬or¬copyright,¬see¬http://ojps.aip.org/jcpo/jcpcr.jsp Plotkin, Wang, and Wolynes: Protein folding funnels 2945

3 1/3 The first term ⌬SB(Q,␩) is the reduction in searchable phase where ͑⌬␶/b ͒ is the ratio of the bond radius to the persis- space due to ␮ϭQNzN␩ cross-links in the polymer chain, tence length. 16 first derived by Flory. For polymers in three dimensions There are many ways QNzN␩ cross-links can be formed this is from the NzN␩ total contacts in each molten globule state, at least for weakly constrained polymers. This results in a re- duction of the entropy lost due to a combinatoric or mixing ⌬S ͑Q,␩͒ϭ 3NQz ␩ ln͑CQz ␩͒, ͑A3͒ B 2 N N entropy given by where zN , the coordination number for a chain of length N, is given by

S Q, ϭϪNz Q ln Qϩ 1ϪQ ln 1ϪQ . 1 mix͑ ␩͒ N␩͓ ͑ ͒ ͑ ͔͒ z Ϸ Int͓2NϪ3͑Nϩ1͒2/3ϩ3͔ ͑A4͒ ͑A6͒ N N ͑Int͓...͔ means take the integer part͒, and There is an additional reduction in the searchable phase space because of the (Nz ␩ϪQNz ␩) native contacts that 2/3 N N 3 ⌬␶ cannot be formed because the overlap can be no larger than Cϭ , ͑A5͒ 4␲e ͩ b3 ͪ Q:

N CzN␩ N ⌬S ͑Q,␩͒ϭ dx ln͑1Ϫx3/2͒ϭϪ 3Cz ␩͑1ϪQ͒Ϫ2Cz ␩͑ln͓1Ϫ͑Cz ␩͒3/2͔ϪQ ln͓1Ϫ͑QCz ␩͒3/2͔͒ AB C ͵ 2C N N N N CQz ␩ ͫ N 2 1ϪͱCzN␩ 1ϩQCzN␩ϩͱQCzN␩ 1 ϩln ϩ2) arctan ͓1ϩ2ͱCzN␩͔ ͫͩ 1ϪͱQCz ␩ͪ ͩ 1ϩCz ␩ϩͱCz ␩ ͪͬ ͩ ) N N N 1 Ϫarctan ͓1ϩ2ͱQCzN␩͔ , ͑A7͒ ) ͪͬ

where C is given in expression ͑A5͒. N In Ref. 12, surface effects were also considered as reduc- lnlϭNM , ͑A8͒ ͚l ing the conformational search when the rms loop size was c comparable to the size of the globule. This somewhat modi- N fies the previous formulas for small values of Q. ͚ lmlϭNF , ͑A9͒ When QNzN␩ХN, there is about 1 cross-link per mono- 1 mer, Slow͑Q,␩͒Х0, and the low Q formula is no longer valid. At some point before this, configurations having fluctuations N in the mean field contact pattern begin to dominate the free ͚ lplϭlE , ͑A10͒ lEC energy. It then becomes more accurate to switch the descrip- where N and N are the numbers of melted ͑free͒ and tion of entropy loss from that due to a dilute ‘‘gas’’ of con- M F frozen ͑constrained͒ monomers, respectively. It follows that tacts, to an atomistic description ascribing entropy to lengths ͚ln ϩ͚lm ϩ2͚lp ϭN. Furthermore, if there are N fro- of chain melted out from the frozen ͑Qϭ1͒ three- l l l F zen monomers, there are z ␩N ϭQNz ␩ frozen bonds, so, dimensional native structure, and the combinatorics of these N F N e.g., ͚lnlϭN(1ϪQ)Ϫ2lE . Since ͚nlϭ͚mlϭNf, the av- pieces of melted chain. erage melted loop length at Q is given by In what follows, let Nf be the total number of melted or unconstrained pieces in the chain of any length, let lE be the ͚N ln lc l 1ϪQ 2lE͑Q͒ sequence length of the melted ends of the chain, and lc and ͗lmelted͑Q͒͘ϭ N ϭ Ϫ . ͑A11͒ ͚l nl f͑Q͒ Nf͑Q͒ lEC be the minimum or critical lengths of the melted pieces c or ends, respectively. We can characterize a state by the This expression will be useful in modeling the polymer den- number distribution of melted and frozen pieces of length l, sity as a function of collapse. ͕nl͖, and ͕ml͖, respectively, and the probability distribution In terms of the macroscopic parameters Q, f , and lE , the 12 for an end of the chain to have length l, ͕pl͖. It follows that entropy in the strongly constrained regime is given by

J. Chem. Phys., Vol. 106, No. 7, 15 February 1997

Downloaded¬31¬Jul¬2003¬to¬142.103.236.75.¬Redistribution¬subject¬to¬AIP¬license¬or¬copyright,¬see¬http://ojps.aip.org/jcpo/jcpcr.jsp 2946 Plotkin, Wang, and Wolynes: Protein folding funnels

S͑Q, f ,lE͒ ␯ 1Ϫ␩ 2͑lECϪ1͒ ϭ ln Ϫ ln͑1Ϫ␩͒ 1ϪQϪf͑l Ϫ1͒Ϫ ϩQ ln QϪ͑QϪ f ͒ln͑QϪ f ͒Ϫ2 f ln f N ͫ e ͩ ␩ ͪ ͬͫ c N ͬ

lE lE lE lE ϩ 1ϪQϪ f ͑l Ϫ1͒Ϫ2 ln 1ϪQϪ f ͑l Ϫ1͒Ϫ2 Ϫ 1ϪQϪ fl Ϫ2 ln 1ϪQϪ fl Ϫ2 ͩ c N ͪ ͩ c N ͪ ͩ c Nͪ ͩ c Nͪ

lEϪ͑lECϪ1͒ lEϪ͑lECϪ1͒ lEϪlEC lEϪlEC ln N ϩ2 ln Ϫ2 ln Ϫ2 . ͑A12͒ ͩ N ͪ ͩ N ͪ ͩ N ͪ ͩ N ͪ N

Maximizing S(Q, f ,lE) with respect to f and lE gives Equations ͑A13͒ and ͑A14͒ determine f (Q) and lE(Q), from which we can obtain ͗l (Q)͘ from Eq. ͑A11͒. These 1ϪQϪ2l /N melted E quantities, plotted in Fig. 11 for parameters appropriate for a f ͑Q,lE͒ϭ ͑A13͒ lcϩlEϪlEC 27-mer, will be used in calculating the density as a function and of total contacts. In the high Q analysis the total polymer length in the halo is a linearly decreasing function of Q. lE lc ͑lEϪlEC͒ Q͑lcϩlEϪlEC͒ϩQϪ1ϩ2 ͫ Nͬ APPENDIX B

lE Here we derive the form of the thermodynamic functions Ϫ 1ϪQϪ2 ͑l Ϫl ϩ1͒lcϪ1␮lcϪ1ϭ0. ͑A14͒ ͩ Nͪ E EC in the ultrametric approximation, valid for high Q. We wish to find the free energy relative to the state n with energy En , The solution to Eq. ͑A14͒ is numeric for lcу3. These equa- tions put into expression ͑A12͒ gives the entropy Shigh͑Q,␩͒ in the strongly constrained ͑high Q͒ regime. The total entropy may be numerically approximated by an interpolation between the low Q and high Q expressions,

Stot͑Q,␩͒ϭ͑1ϪQ͒Slow͑Q,␩͒ϩQShigh͑Q,␩͒, ͑A15͒ which is plotted in Fig. 10 for a 27-mer with lcϭ3, and lECϭ1.5. Equation ͑A15͒ together with Eq. ͑3.1͒ gives S(Q,¯z) used in Eq. ͑4.1͒.

FIG. 10. Interpolated entropy from Eq. ͑A15͒ for a 27-mer with lCϭ3, and lECϭ1.5, for two densities: ␩ϭ1 and ␩ϭ0.7. The maximum value of S(Q) FIG. 11. ͑a͒ Number of melted pieces ͑free loops͒ Nf(Q)vsQfor a 27-mer at QminХ0.04 is the entropy at the statistically most likely overlap between with lCϭ4.5, lECϭ1.5. ͑b͒ Average free end sequence length in the polymer any two states for the 27-mer. The value of QϭQmaxϽ1 where the entropy 2lE(Q) ͑solid line͒, and mean melted loop length in the polymer at Q, vanishes is due to the finite chain length that must be collectively melted out Nf(Q)͗lmelted(Q)͑͘dashed line͒ for the same 27-mer. These parameters are from the frozen structure. used in deriving the density function ␩(Qϭ¯z/zN) ͓Fig. 2͑b͔͒.

J. Chem. Phys., Vol. 106, No. 7, 15 February 1997

Downloaded¬31¬Jul¬2003¬to¬142.103.236.75.¬Redistribution¬subject¬to¬AIP¬license¬or¬copyright,¬see¬http://ojps.aip.org/jcpo/jcpcr.jsp Plotkin, Wang, and Wolynes: Protein folding funnels 2947

FIG. 12. Diagram of the hierarchy used in a microscopically ultrametric model.

as a function of Q, if all the states are organized ultrametri- cally as in Fig. 12. Call the energetic contribution common FIG. 13. Free energy surface at TF for an ultrametric 27-mer, obtained from to states with overlap Q ␾Ͻ , and call the different contribu- Eq. ͑B5͒ with the parameters (⑀,¯⑀,␦⑀ ,T )ϭ͑1.1, Ϫ1.17, Ϫ2.1, 1.54͒. The n n F tions ␾Ͼ for state n, and ␾Ͼ for the group ͑stratum͒ of states surface has a double-well structure with a transition state ensemble at Q* 0.52. at Q ͑␾Ͼ varies from state to state͒. As in Fig. 12, Х n Enϭ␾Ͻϩ␾Ͼ, Eϭ␾Ͻϩ␾Ͼ . n Without the constraint ␾Ͻϩ␾ϾϭEn , the distributions of ␾Ͻ 2 2 ⌬E and ␾Ͼ are simple Gaussians of widths Q⌬E and F͑Q,␾ ͒ϭ␾ ϪTS ͑Q͒Ϫ ͑1ϪQ͒. ͑B3͒ 2 2 2 Ͻ Ͻ ␩ T (1ϪQ)⌬E , where ⌬E ϭNzN␩⑀ . These are just the num- ber of bonds contributing to ␾ and ␾ times the individual Ͻ Ͼ Since ␾ is chosen from the distribution ͑B1͒, the free en- variance of a bond ⑀2. Native contacts are energetically het- Ͻ ergy function F in Eq. ͑B3͒ is chosen from the distribution erogeneous and also contribute to the total width. We will let ¯⑀ϭ0 for simplicity here since it has no bearing on the cal- 1 culation. PQ͑F͒ϷexpϪ FϪQEnϩTS ͑Q͒ 2⌬E2Q͑1ϪQ͒ ͩ ␩ For the state n of energy En however, ␾Ͻ is chosen from 2 2 the conditional probability distribution PQ(␾Ͻ͉En), where ⌬E ϩ ͑1ϪQ͒ , ͑B4͒ T ͪ P͑␾Ͻ ,En͒ PQ͑␾Ͻ͉En͒ϭ P͑En͒ with the mean and most probable free energy function 2 2 F*(Q,T,En) given by ␾Ͻ ͑EnϪ␾Ͻ͒ exp Ϫ exp Ϫ ͩ 2⌬E2Qͪ ͩ 2⌬E2͑1ϪQ͒ͪ ⌬E2 Ϸ 2 F*͑Q,T,En͒ϭQEnϪTS␩͑Q͒Ϫ ͑1ϪQ͒ ͑B5͒ En 2T exp Ϫ ͩ 2⌬E2 ͪ for temperatures above the glass temperature TG ͑the glass 2 ͑␾ϽϪQEn͒ temperatures in this model have been investigated Ϸexp Ϫ . ͑B1͒ 13,33 ͩ 2⌬E2Q͑1ϪQ͒ͪ elsewhere ͒. So the modification of the free energy from Eq. ͑2.9͒ is ͓which approaches ␦͑␾ ͒ and ␦͑␾ ϪE ͒ when Q 0 and 2 Ͻ Ͻ n → simply to replace ͑1ϪQ ͒ by ͑1ϪQ͒ in the roughness term. Q 1 respectively͔. Supposing we have found a state n with The entropy and energy follow straightforwardly from the E →and a given contribution ␾ , the number of states n Ͻ derivatives of F*(Q,T,En). N(E,q,En) having overlap Q with it and energy E is given Including homopolymer attraction in Eq. ͑B5͒ yields a by exp[S␩(Q)]P͑␾Ͼ͒ or free energy surface in coordinates (Q,¯z) as before ͑see Fig. 2 13͒. For the same energy parameters as in Sec. IV A, the ͑EϪ␾Ͻ͒ ln N͑E,Q,E ͒ϭS ͑Q͒Ϫ ͑B2͒ surface is very similar to the nonultrametric case. The fold- n ␩ 2⌬E2͑1ϪQ͒ ing temperature is slightly higher ͓(⑀,¯⑀,␦⑀n ,TF)ϭ͑1.1, with ␾Ͻ chosen from the distribution ͑B.1͒. Using 1/T Ϫ1.17, Ϫ2.0, 1.54͔͒ presumably because the ultrametric E, we obtain the free energy relative to the state n landscape is smoother. Equivalently, ultrametric polymersץ/ln N ץϭ as a function of Q, need not be as strongly optimized. The barrier is slightly

J. Chem. Phys., Vol. 106, No. 7, 15 February 1997

Downloaded¬31¬Jul¬2003¬to¬142.103.236.75.¬Redistribution¬subject¬to¬AIP¬license¬or¬copyright,¬see¬http://ojps.aip.org/jcpo/jcpcr.jsp 2948 Plotkin, Wang, and Wolynes: Protein folding funnels larger and more native ͓⌬FХ4.4kBT and (Q*,¯z*)ϭ͑0.52, N. D. Socci and J. N. Onuchic, ibid. 101, 1519 ͑1994͒; K. A. Dill, S. 0.93͔͒, and its height scales roughly linearly with N as be- Bromberg, K. Yue, K. M. Fiebig, D. P. Yee, P. D. Thomas, and H. S. fore, up to 32.0k T for a 125-mer. Chan, Protein Sci. 4, 561 ͑1995͒; A. Sali, E. Shakhnovich, and M. Kar- Х B plus, J. Mol. Biol. 235, 1614 ͑1994͒. 5 J. N. Onuchic, P. G. Wolynes, Z. Luthey-Schulten, and N. D. Socci, Proc. APPENDIX C Natl. Acad. Sci. USA 92, 3626 ͑1995͒. 6 M. Mezard, G. Parisi, and M. A. Virasoro, Spin Glass Theory and Beyond For small, Q dependent changes in the free energy ͑4.1͒, ͑World Scientific, Singapore, 1986͒. K. H. Fischer and J. A. Hertz, Spin e.g., changes in temperature, we can approximate the change Glasses ͑Cambridge University Press, Cambridge, 1991͒. in the free energy at the position of the barrier ␦F(Q*)asa 7 L. D. Landau and E. M. Lifshitz, Statistical Physics, Part 1, 3rd ed., edited linear interpolation between the free energy changes of the by J. B. Sykes, M. J. Kearsley trans., Course of Theoretical Physics Vol. 5 ͑Pergamon, Oxford, 1980͒; N. Goldenfeld, Lectures on Phase Transi- unfolded and folded minima ␦Fu and ␦F f , here estimated to tions and the Renormalization Group ͑Addison-Wesley, New York, be at QuХ0 and Q f Х1, respectively, 1992͒. 8 T. Garel and H. Orland, Europhys. Lett. 6, 307 ͑1988͒; E. I. Shakhnovich ␦F͑Q*͒ХQ*␦F f ϩ͑1ϪQ*͒␦Fu . ͑C1͒ and A. M. Gutin, ibid. 8, 327 ͑1989͒; Biophys. Chem. 34, 187 ͑1989͒. 9 M. Sasai and P. G. Wolynes, Phys. Rev. Lett. 65, 2740 ͑1990͒. Furthermore let us approximate the kinetic folding time by 10 B. Derrida, Phys. Rev. B 24, 2613 ͑1981͒. 38 the thermodynamic folding time 11 Z. Luthey-Schulten, B. E. Ramirez, and P. G. Wolynes, J. Phys. Chem. 99, 2177 ͑1995͒; J. G. Saven and P. G. Wolynes, J. Mol. Biol. 257, 199 ͑F͑Q*͒ϪF ͒/k T ␶ϭt͑Q*͒e u B ͑C2͒ ͑1996͒. 12 S. S. Plotkin, J. Wang, and P. G. Wolynes, Phys. Rev. E 53, 6271 ͑1996͒. where Q* is the position of the barrier and ¯t(Q*) is the 13 B. Derrida and E. Gardner, J. Phys C 19, 2253 ͑1986͒. lifetime of the microstates of the transition ensemble. Then 14 M. S. Friedrichs and P. G. Wolynes, Science 246, 371 ͑1989͒;M.S. the log of the folding rate k is ϰF ϪF(Q*). The equilib- Friedrichs, R. A. Goldstein, and P. G. Wolynes, J. Mol. Biol. 222, 1013 F u ͑1991͒. rium constant for the unfolding transition Keq is the probabil- 15 S. Ramanathan and E. Shakhnovich, Phys. Rev. E 50, 1303 ͑1994͒;V.S. ity to be in the unfolded minimum over the probability to be Pande, A. Y. Grosberg, and T. Tanaka, J. Phys. II France 4, 1711 ͑1994͒. in the folded minimum, and so ln K ϰF ϪF . If we plot 16 P. J. Flory, J. Am. Chem. Soc., 78, 5222 ͑1956͒. eq f u 17 ln k vs ln K , the assumption of a linear free energy rela- G. S. Huang and T. G. Oas, Proc. Natl. Acad. Sci. USA 92, 6878 ͑1995͒. F eq 18 S. E. Jackson and A. R. Fersht, 30, 10428 ͑1991͒. tion ͑C1͒ and a stable barrier position results in a linear de- 19 J. F. Douglas and T. Ishinabe, Phys. Rev. E 51, 1791 ͑1995͒. pendence of rate upon equilibrium constant, with slope 20 Unless otherwise indicated, all have dimensions of Boltzmann’s constant kB . ␦ ln͑kF /k0͒ ␦͓FuϪF͑Q*͔͒ ␦FuϪ␦F͑Q*͒ 21 Unless otherwise indicated, all temperatures have units of energy, with Х ϭ ϭϪQ* ln K F ϪF F Ϫ F Boltzmann’s constant kB included in the definition of T. ␦ eq ␦͓ f u͔ ␦ f ␦ u 22 ͑C3͒ J. G. Kirkwood, J. Chem. Phys. 3, 300 ͑1935͒. 23 R. A. Goldstein, Z. A. Luthey-Schulten, and Peter G. Wolynes, Proc. Natl. so that experimental slopes of folding rates versus unfolding Acad. Sci. USA 89, 4918 ͑1992͒. 24 equilibrium constants are indeed a measure of the position of B. Derrida and G. Toulouse, J. Phys. Lett. 46, 223 ͑1985͒. 25 E. I. Shakhnovich and A. M. Gutin, Proc. Natl. Acad. Sci. USA 90, 7195 the barrier in our theory. ͑1993͒. If the unfolded and folded states are not assumed to be at 26 N. D. Socci, J. N. Onuchic, and P. G. Wolynes J. Chem. Phys. 104, 5860 Qϭ0 and Qϭ1, respectively, Eqs. ͑C1͒ and ͑C3͒ are modi- ͑1996͒. 27 fied by A. Sali, E. Shakhnovich, and M. Karplus, Nature 248 ͑1994͒. 28 An explanation for this is that in larger polymers, entropy loss due to topological constraints is more dramatic in Q because a smaller fraction of Q*ϪQU QFϪQ* ␦F͑Q*͒Х ␦F f ϩ ␦Fu ͑C4͒ total native contacts is necessary to constrain the polymer. That is, as N ͩ QFϪQU ͪ ͩ QFϪQU ͪ increases, the number of bonds per monomer in the fully constrained state and ͑Qϭ1͒ approaches the bulk limit of 2 ͓see Eq. ͑A4͔͒, while only one bond is needed to constrain a monomer. So this pushes the position Q* of the barrier in, roughly as 1/Z . Density coupling quantitatively modifies the ␦͓FuϪF͑Q*͔͒ Q*ϪQU N ϭϪ , ͑C5͒ above picture in that as N grows, there is a slower decrease in barrier ␦͓F ϪF ͔ ͩQ ϪQ ͪ position Q*. f u F U 29 T. Pascher, J. P. Chesick, J. R. Winkler, and H. B. Gray, Science 271, where QU and QF are the respective positions of the un- 1558 ͑1996͒. folded and folded states. So one can obtain the barrier posi- 30 V. I. Abkevich, A. M. Gutin, and E. I. Shakhnovich, Biochemistry 33 tion from the slope of a plot of ln k vs ln K , given the 10026 ͑1994͒. F eq 31 positions of the unfolded and folded states ͑see Fig. 8͒. A. M. Gutin, V. I. Abkevich, and E. I. Shakhnovich, Phys. Rev. Lett. 77, 5433 ͑1996͒. 32 D. Thirumalai, J. Phys. 1 France 5, 1457 ͑1995͒. 1 J. Bryngelson and P. G. Wolynes, Proc. Natl. Acad. Sci. USA 84, 7524 33 J. Wang, S. S. Plotkin, and P. G. Wolynes, J. Phys 1 France ͑in press͒. ͑1987͒. 34 N. D. Socci and J. N. Onuchic, J. Chem. Phys. 101, 1519 ͑1994͒. 2 J. Bryngelson, J. O. Onuchic, N. D. Socci, and P. G. Wolynes, Proteins: 35 G. S. Huang and T. G. Oas, Biochemistry 34, 3884 ͑1995͒. Struct., Funct., Genet. 21, 167–195 ͑1995͒. 36 L. S. Itzhaki, D. E. Otzen, and A. R. Fersht J. Mol. Biol. 254, 260 ͑1995͒. 3 M. S. Friedrichs and P. G. Wolynes, Tet. Comp. Meth. 3, 175 ͑1990͒.D. 37 E. M. Boczko and C. L. Brooks, Science 269, 393 ͑1995͒. Thirumalai and Z. Guo, Biopolymers 35, 137 ͑1995͒. 38 J. N. Onuchic, N. D. Socci, Z. Luthey-Schulten, and P. G. Wolynes, 4 A. Kolinski, A. Godzik, and J. Skolnick, J. Chem. Phys. 98, 7420 ͑1993͒; Folding & Design 1, 441 ͑1996͒.

J. Chem. Phys., Vol. 106, No. 7, 15 February 1997

Downloaded¬31¬Jul¬2003¬to¬142.103.236.75.¬Redistribution¬subject¬to¬AIP¬license¬or¬copyright,¬see¬http://ojps.aip.org/jcpo/jcpcr.jsp