REVIEW: The Protein-Folding Problem, 50 Years On

Ken A. Dill1,2,3* and Justin L. MacCallum1

1Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY 11794–5252, USA. 2Department of Physics, Stony Brook University, Stony Brook, NY 11794–3800, USA. 3Department of Chemistry, Stony Brook University, Stony Brook, NY 11794–3400, USA.
*To whom correspondence should be addressed. E-mail: [email protected]

1042 23 NOVEMBER 2012 VOL 338 SCIENCE www.sciencemag.org

The protein-folding problem was first posed about one half-century ago. The term refers to three broad questions: (i) What is the physical code by which an amino acid sequence dictates a protein’s native structure? (ii) How can proteins fold so fast? (iii) Can we devise a computer algorithm to predict protein structures from their sequences? We review progress on these problems. In a few cases, computer simulations of the physical forces in chemically detailed models have now achieved the accurate folding of small proteins. We have learned that proteins fold rapidly because random thermal motions cause conformational changes leading energetically downhill toward the native structure, a principle that is captured in funnel-shaped energy landscapes. And thanks in part to the large Protein Data Bank of known structures, predicting protein structures is now far more successful than was thought possible in the early days. What began as three questions of basic science one half-century ago has now grown into the full-fledged research field of protein physical science.

Protein molecules embody a remarkable relationship between structure and function at the molecular level. Proteins perform many different functions in biochemistry. A protein’s biological mechanism is determined by its three-dimensional (3D) native structure, which in turn is encoded in its 1D string of amino acid monomers.

This year marks the 50th anniversary of the 1962 Nobel Prize in Chemistry awarded to Max Perutz and John Kendrew for their pioneering work in determining the structure of globular proteins (1–3). That work laid the foundation for structural biology, which interprets molecular-level biological mechanisms in terms of the structures of proteins and other biomolecules. Their work also raised the question of how protein structures are explained by physical principles. Upon seeing the structure of myoglobin (Fig. 1) at 6 Å resolution (1), Kendrew et al. said,

“Perhaps the most remarkable features of the molecule are its complexity and its lack of symmetry. The arrangement seems to be almost totally lacking in the kind of regularities which one instinctively anticipates, and it is more complicated than has been predicated by any theory of protein structure. Though the detailed principles of construction do not yet emerge, we may hope that they will do so at a later stage of the analysis.”

Fig. 1. In 1958, Kendrew and co-workers published the first structure of a globular protein: myoglobin at 6 Å resolution (1). Its puzzlingly complex structure lacked the expected symmetry and regularity and launched the protein-folding problem. [With permission from the Medical Research Council Laboratory of Molecular Biology]

The protein-folding problem came to be three main questions: (i) The physical folding code: How is the 3D native structure of a protein determined by the physicochemical properties that are encoded in its 1D amino-acid sequence? (ii) The folding mechanism: A polypeptide chain has an almost unfathomable number of possible conformations. How can proteins fold so fast? (iii) Predicting protein structures using computers: Can we devise a computer algorithm to predict a protein’s native structure from its amino acid sequence? Such an algorithm might circumvent the time-consuming process of experimental protein-structure determination and accelerate the discovery of protein structures and new drugs.

Here, we give our perspective on these questions at the broad-brush level. More detailed reviews can be found elsewhere (4–8).

The Physical Code of Protein Folding

What forces drive a protein to its 3D folded structure? Much insight comes from the Protein Data Bank (PDB), a collection of now more than 80,000 protein structures at atomic detail (9). The following factors appear to contribute (10): (i) Hydrogen bonds. Protein structures are composed of α-helices and β-sheets, as was predicted by Linus Pauling on the basis of expected hydrogen bonding patterns (11). (ii) van der Waals interactions. The atoms within a folded protein are tightly packed, implying the importance of the same types of close-ranged interactions that govern the structures of liquids and solids. (iii) Backbone angle preferences. Like other types of polymers, protein molecules have preferred angles of neighboring backbone bond orientations. (iv) Electrostatic interactions. Some amino acids attract or repel because of negative and positive charges. (v) Hydrophobic interactions. Proteins ball up into well-packed folded states in which the hydrophobic (H) amino acids are predominantly located in the protein’s core and the polar (P) amino acids are more commonly on the folded protein’s surface. Theory and experiments indicate that folding is governed by a predominantly binary code based on interactions with surrounding water molecules: There are few ways a given protein sequence of H and P residues can configure to bury its hydrophobic amino acids optimally (12, 13). (vi) Chain entropy. Opposing the folding process is a large loss in chain entropy as the protein collapses into its compact native state from its many open denatured configurations (12).

These physical forces are described approximately by “forcefields” (14). Forcefields are models of potential energies that are used in computer simulations. They are widely applied to studies of protein equilibria and dynamics. In computer modeling, a protein molecule is put into an initial configuration, often random. Conformations change over the course of the simulation by repeatedly solving Newton’s dynamical laws of motion for the atoms of the protein molecule and the solvent by using the forcefield energies. According to the laws of thermodynamics, systems tend toward their states of lowest free energy. Computational protein folding explores the process by which the protein proceeds through conformational states to states of lower free energies. As shown in Fig. 2, the thermodynamically stable states of 12 small protein structures can be reached fairly successfully by means of extensive molecular dynamics (MD) simulations in a bath of explicit water molecules (15). However, such successes, important as they are, are limited. So far, such modeling succeeds only on a limited set of small simple protein folds (16). And, it does not yet accurately predict protein stabilities or thermodynamic properties. Opportunities for the future include better forcefields, better models of the protein-water interactions, and faster ways to sample conformations, which are far too limited, even with today’s most powerful computers.

Fig. 2. Modern physical models can compute the folded structures of some small proteins. Using a high-performance custom computer called Anton (48), Shaw and co-workers observed reversible folding and unfolding in more than 400 events across 12 small proteins to structures within 4.5 Å of the experimental structure (15). The experimental structures are shown in red, and the computed structures are blue. Shown are the name, PDB identifier, and RMSD (root-mean-square deviation between alpha carbon atoms) between the predicted and experimental structures. [Adapted with permission (15)]

The early days saw hopes of finding simple se- [...]

[...] This question led to a major experimental quest to characterize the kinetics of protein folding and to find folding intermediates, which are partially structured states along the “folding pathway” (18, 19). The hope was that snapshots of the chain caught in the act of folding would give insights into folding “mechanisms,” the rules by which nature performs conformational searching. The experimental challenge was not just to measure atom-by-atom contacts within the heterogeneous interior of a protein molecule, but to do [...]

[...] taking random steps that are mostly incrementally downhill in energy. Steps need only be favorable by one to two times the thermal energy to reach the native structure rapidly (24). Insights from funnels, however, have not yet been sufficient to improve computer search methods. A landscape that appears smooth and funnel-shaped on a global scale can be rough on the local scales that are sampled in computer simulations.

But we are still missing a “folding mechanism.” By mechanism, we mean a narrative that explains how the time evolution of a protein’s folding to its native state derives from its amino acid sequence and solution conditions. A mechanism is more than just the sequences of events followed by any one given protein in experiments or in computed trajectories. We do not yet have in hand a general principle that is applicable to a broad range of proteins, that would explain differences and similarities of the folding routes and rates of different proteins in advance of the data, and that properly average, in some meaningful way, over “irrelevant” thermal motions. One difficulty has been reconciling our “macroscopic” understanding of kinetics (mass-action models) that result from ensemble-averaged experiments with our “microscopic” understanding of the angstrom-by-angstrom changes of each protein conformation in computer simulations (energy landscapes).

However, there are a few general conclusions (25). Proteins appear to fold in units of secondary structures. A protein’s stability increases with its growing partial structure as it folds. And, a protein appears to first develop local structures in the chain (such as helices and turns) followed by growth into more global structures.
