Proc. Natl. Acad. Sci. USA
Vol. 89, pp. 8721-8725, September 1992
Biophysics

Protein folding funnels: A kinetic approach to the sequence-structure relationship

PETER E. LEOPOLD*†, MAURICIO MONTAL*‡, AND JOSÉ NELSON ONUCHIC*§

Departments of *Physics and ‡Biology, University of California, San Diego, La Jolla, CA 92093-0319

Communicated by John J. Hopfield, June 5, 1992 (received for review November 4, 1991)

†Present address: Department of Chemistry, Harvard University, 12 Oxford Street, Cambridge, MA 02138.
§To whom reprint requests should be addressed.
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

ABSTRACT A lattice model of protein folding is developed to distinguish between amino acid sequences that do and do not fold into unique conformations. Although Monte Carlo simulations provide insights into the long-time processes involved in protein folding, these simulations cannot systematically chart the conformational energy surface that enables folding. By assuming that protein folding occurs after chain collapse, a kinetic map of important pathways on this surface is constructed through the use of an analytical theory of probability flow. Convergent kinetic pathways, or "folding funnels," guide folding to a unique, stable, native conformation. Solution of the probability flow equations is facilitated by limiting treatment to diffusion between geometrically similar collapsed conformers. Similarity is measured in terms of a reconfigurational distance. Two specific amino acid sequences are deemed foldable and nonfoldable because one gives rise to a single, large folding funnel leading to a native conformation and the other has multiple pathways leading to several stable conformers. Monte Carlo simulations demonstrate that folding funnel calculations accurately predict the fact of and the pathways involved in folding-specific sequences. The existence of folding funnels for specific sequences suggests that geometrically related families of stable, collapsed conformers fulfill kinetic and thermodynamic requirements of protein folding.

A first functionally important action of any protein is to fold. Though the general process of protein folding is not fully understood, it has been known since the ribonuclease refolding experiments of Anfinsen that small globular proteins fold in the absence of any catalytic biomolecules (1). From this fact, it was surmised that folded proteins exist in the global minimum free energy state (1). Soon thereafter, Levinthal (2, 3) used a simple counting argument to demonstrate that Anfinsen's thermodynamic hypothesis implied a paradoxical result: a typical protein has far too many conformations to permit a thorough search for the global minimum.

Since the statement of Levinthal's celebrated paradox, several groups have attempted to preserve the concept of the global minimum by arguing that compactness (4) or native state propensities (5) reduce the effective size of conformation space. Neither requirement resolves the paradox because compactness alone lessens but does not eliminate the conformation counting problem and native state propensities cannot be justified from amino acid sequence analysis. Accordingly, a plausible resolution to the Levinthal paradox might be obtained using a kinetic rather than a thermodynamic approach to the folding problem. Folded proteins are not (necessarily) equilibrium entities; rather, they are metastable with lifetimes longer than (or perhaps about equal to) their functionally significant lifetimes. To overcome the Levinthal problem, proteins must have a broad conformational focusing property to direct folding to a unique, locally stable, kinetically accessible "native" conformation.

Here, we introduce the concept of "protein folding funnels," a kinetic mechanism for understanding the self-organizing principle of the sequence-structure relationship. This concept follows from a few general considerations (Fig. 1). (i) Proteins fold from a random state by collapsing and reconfiguring (4), (ii) reconfiguration occurs diffusively and follows a general drift from higher energy to lower energy conformations, and (iii) reconfiguration occurs between conformations that are geometrically similar (i.e., global interconversions are energetically prohibitive after collapse), so local interconversions alone are considered. We define the folding funnel as a collection of geometrically similar collapsed structures, one of which is thermodynamically stable with respect to the rest, though not necessarily with respect to the whole conformation space. Since it is tightly coupled to the ability of a protein to fold, the existence of a folding funnel for a given protein depends on its primary structure. An amino acid sequence having a single sizeable folding funnel leading to a unique, stable conformation is said to be "foldable." Conversely, "nonfoldable" amino acid sequences have multiple folding funnels and do not exhibit a single minimum.

FIG. 1. Schematic representation of the folding process. The denatured coil (A) collapses to a random dense structure (B), approximated by a set of maximally compact conformers (C). A reconfigurational distance is defined between compact states (D) and is used to sort pairs of cubes for calculation of the mean first passage time for interconversion (E). The kinetic structure of conformation space (F) shows a folding funnel leading to a unique, locally stable, kinetically accessible state.
[Figure 1 panel labels: random coil; random dense state; 103,346 maximally compact conformers; reconfigurational metric (distance between conformers); mean first passage time of diffusional interconversion; folding funnel; native conformation.]

Three major simplifications are required to describe the long-time kinetic behavior of a protein. (i) A lattice model is used to restrict discussion to the most important degrees of conformational freedom. Each residue is located on a vertex of the lattice. (ii) Consistent with the low-resolution structural description of the lattice model, the chemical identity of the protein is depicted in a simplified N-letter hydropathic code, where some number N of amino acid types is chosen to enable an arbitrary degree of heterogeneity for residue-residue interactions. Each residue may represent clusters of real amino acids. (iii) The dynamics of the folding process can be treated as a collection of discrete "hopping" processes between local minima. Since conformational dynamics is governed by the same residue-residue interactions that drive the initial conformational collapse, it is possible to divide motion into two types: relatively fast, sterically constrained chain motions that do not involve the breaking of many chain-chain contacts and slow motions across energy barriers associated with larger structural fluctuations and the breaking of many chain-chain contacts.

The time scale separation proposed for the interconversion process can be used to divide collapsed conformations into "compactness cells." Within a compactness cell, conformations interconvert quickly; between compactness cells, conformations interconvert slowly. The rates for slow interconversion are found by calculating the average time the protein spends in any cell before a thermal fluctuation moves the chain over a conformational barrier into another cell. The division of conformation space into compactness cells is valid for any model for protein structure, whether on- or off-lattice.

A difficulty arises when attempting to define compactness cells for specific sequences. To show how such difficulties can be managed through an approximation technique, we now focus on a simple lattice model.

STATEMENT OF MODEL

Lattice Model and Energy Function. A protein is represented as a 27-unit chain on a simple cubic lattice. The protein is assigned a chemical identity in the form of a primary sequence $S = \{s_1, \ldots, s_{27}\}$, where each site $i$ on the chain is assigned a residue from a list of $N$ residue types, $s_i \in \{a_1, \ldots, a_N\}$. The residue types do not correspond directly to the naturally occurring amino acids but are chosen to provide a wide range of chemical differentiation. It is not clear how [...]

FIG. 2. Conformation of the native state of sequence S1 represented in a four-letter hydropathic code (for calculations a 20-letter code was used). Two "subdomains" are present as eight-residue 2 × 2 × 2 subcubes in the upper left octant and in the lower right octant. Subdomains enhance kinetic accessibility by providing a large number of dense conformations that are geometrically similar to the native state.
[Figure 2 label: C-terminus.]

[...]metrically distinct, maximally compact conformations, each shaped like a 3 × 3 × 3 cube (6, 7).

The maximally compact cubes perform a 2-fold function in describing interconversion. (i) They act as markers among the dense conformations. Each dense conformation is assumed to be a structural fluctuation around some specific cubic conformation. (ii) Maximally compact conformations are strong candidates for local energy minima. Energy barriers between maximally compact conformations are large because many contacts must be broken in fluctuating from one conformation to another. Taken together, the maximally compact local minimum conformation and its dense affiliates constitute a "compactness