The Protein Folding Problem

Ken A. Dill,1,2 S. Banu Ozkan,3 M. Scott Shell,4 and Thomas R. Weikl5

1 Department of Pharmaceutical Chemistry, 2 Graduate Group in Biophysics, University of California, San Francisco, California 94143
3 Department of Physics, Arizona State University, Tempe, Arizona 85287
4 Department of Chemical Engineering, University of California, Santa Barbara, California 93106
5 Max Planck Institute of Colloids and Interfaces, Department of Theory and Bio-Systems, 14424 Potsdam, Germany

Annu. Rev. Biophys. 2008. 37:289–316
The Annual Review of Biophysics is online at biophys.annualreviews.org
This article's doi: 10.1146/annurev.biophys.37.092707.153558
Copyright © 2008 by Annual Reviews. All rights reserved
1936-122X/08/0609-0289$20.00

Key Words

structure prediction, funnel energy landscapes, CASP, folding code, folding kinetics

Abstract

The "protein folding problem" consists of three closely related puzzles: (a) What is the folding code? (b) What is the folding mechanism? (c) Can we predict the native structure of a protein from its amino acid sequence? Once regarded as a grand challenge, protein folding has seen great progress in recent years. Now, foldable proteins and nonbiological polymers are being designed routinely and moving toward successful applications. The structures of small proteins are now often well predicted by computer methods. And, there is now a testable explanation for how a protein can fold so quickly: A protein solves its large global optimization problem as a series of smaller local optimization problems, growing and assembling the native structure from peptide fragments, local structures first.

Contents

INTRODUCTION
THE FOLDING CODE: WHAT BALANCE OF FORCES ENCODES NATIVE STRUCTURES?
  Anfinsen's Thermodynamic Hypothesis
  One Dominant Driving Force or Many Small Ones?
  Designing New Proteins and Nonbiological Foldamers
COMPUTATIONAL PROTEIN STRUCTURE PREDICTION IS INCREASINGLY SUCCESSFUL
  CASP: A Community-Wide Experiment
ARE THERE MECHANISMS OF PROTEIN FOLDING?
  There Have Been Big Advances in Experimental and Theoretical Methods
  The PSB Plot: Folding Speed Correlates with the Localness of Contacts in the Native Structure
  Proteins Fold on Funnel-Shaped Energy Landscapes
  The Zipping and Assembly Hypothesis for the Folding Routes
PHYSICS-BASED MODELING OF FOLDING AND STRUCTURE PREDICTION
SUMMARY

INTRODUCTION

The protein folding problem is the question of how a protein's amino acid sequence dictates its three-dimensional atomic structure. The notion of a folding "problem" first emerged around 1960, with the appearance of the first atomic-resolution protein structures. Some form of internal crystalline regularity was previously expected (117), and α-helices had been anticipated by Linus Pauling and colleagues (180, 181), but the first protein structures, those of the globins, had helices that were packed together in unexpected irregular ways. Since then, the protein folding problem has come to be regarded as three different problems: (a) the folding code: the thermodynamic question of what balance of interatomic forces dictates the structure of the protein, for a given amino acid sequence; (b) protein structure prediction: the computational problem of how to predict a protein's native structure from its amino acid sequence; and (c) the folding process: the kinetics question of what routes or pathways some proteins use to fold so quickly. We focus here only on soluble proteins and not on fibrous or membrane proteins.

THE FOLDING CODE: WHAT BALANCE OF FORCES ENCODES NATIVE STRUCTURES?

Anfinsen's Thermodynamic Hypothesis

A major milestone in protein science was the thermodynamic hypothesis of Christian Anfinsen and colleagues (3, 92). From his now-famous experiments on ribonuclease, Anfinsen postulated that the native structure of a protein is the thermodynamically stable structure; it depends only on the amino acid sequence and on the conditions of solution, and not on the kinetic folding route. It became widely appreciated that the native structure does not depend on whether the protein was synthesized biologically on a ribosome or with the help of chaperone molecules, or if, instead, the protein was simply refolded as an isolated molecule in a test tube. [There are rare exceptions, however, such as insulin, α-lytic protease (203), and the serpins (227), in which the biologically active form is kinetically trapped.] Two powerful conclusions followed from Anfinsen's work. First, it enabled the large research enterprise of in vitro protein folding that has come to understand native structures by experiments inside test tubes rather than inside cells. Second, the Anfinsen principle implies a sort of division of labor: Evolution can act to change an amino acid sequence, but the folding equilibrium and kinetics of a given sequence are then matters of physical chemistry.

One Dominant Driving Force or Many Small Ones?

Prior to the mid-1980s, the protein folding code was seen as a sum of many different small interactions, such as hydrogen bonds, ion pairs, van der Waals attractions, and water-mediated hydrophobic interactions. A key idea was that the primary sequence encoded secondary structures, which then encoded tertiary structures (4). However, through statistical mechanical modeling, a different view emerged in the 1980s, namely, that there is a dominant component to the folding code, that it is the hydrophobic interaction, that the folding code is distributed both locally and nonlocally in the sequence, and that a protein's secondary structure is as much a consequence of the tertiary structure as a cause of it (48, 49).

Because native proteins are only 5–10 kcal/mol more stable than their denatured states, it is clear that no type of intermolecular force can be neglected in folding and structure prediction (238). Although it remains challenging to separate in a clean and rigorous way some types of interactions from others, here are some of the main observations. Folding is not likely to be dominated by electrostatic interactions among charged side chains because most proteins have relatively few charged residues; they are concentrated in high-dielectric regions on the protein surface. Protein stabilities tend to be independent of pH (near neutral) and salt concentration, and charge mutations typically lead to small effects on structure and stability. Hydrogen-bonding interactions are important, because essentially all possible hydrogen-bonding interactions are generally satisfied in native structures. Hydrogen bonds among backbone amide and carbonyl groups are key components of all secondary structures, and studies of mutations in different solvents estimate their strengths to be around 1–4 kcal/mol (21, 72) or stronger (5, 46). Similarly, tight packing in proteins implies that van der Waals interactions are important (28).

However, the question of the folding code is whether there is a dominant factor that explains why any two proteins, for example, lysozyme and ribonuclease, have different native structures. This code must be written in the side chains, not in the backbone hydrogen bonding, because it is through the side chains that one protein differs from another. There is considerable evidence that hydrophobic interactions must play a major role in protein folding. (a) Proteins have hydrophobic cores, implying nonpolar amino acids are driven to be sequestered from water. (b) Model compound studies show 1–2 kcal/mol for transferring a hydrophobic side chain from water into oil-like media (234), and a protein contains many such side chains. (c) Proteins are readily denatured in nonpolar solvents. (d) Sequences that are jumbled and retain only their correct hydrophobic and polar patterning fold to their expected native states (39, 98, 112, 118), in the absence of efforts to design packing, charges, or hydrogen bonding. Hydrophobic and polar patterning also appears to be a key to encoding of amyloid-like fibril structures (236).

What stabilizes secondary structures? Before any protein structure was known, Linus Pauling and colleagues (180, 181) inferred from hydrogen-bonding models that proteins might have α-helices. However, secondary structures are seldom stable on their own in solution. Although different amino acids have different energetic propensities to be in secondary structures (6, 41, 55, 100), there are also many "chameleon" sequences in natural proteins, which are peptide segments that can assume either helical or β conformations depending on their tertiary context (158, 162).

Figure 1
Panel c axes: secondary structure content ([θ]222) versus (radius)^-3, with the native fold and partially folded structures marked.
(a) Binary code. Experiments show that a primarily binary hydrophobic-polar code is sufficient to fold helix-bundle proteins (112). Reprinted from Reference 112 with permission from AAAS. (b) Compactness stabilizes secondary structure, in proteins, from lattice models. (c) Experiments supporting panel b, showing that compactness correlates with secondary structure content in nonnative
