Protein Folding and the Organization of the Protein Topology Universe
Total Page:16
File Type:pdf, Size:1020Kb
Opinion TRENDS in Biochemical Sciences Vol.30 No.1 January 2005 Protein folding and the organization of the protein topology universe Kresten Lindorff-Larsen1, Peter Røgen2, Emanuele Paci3, Michele Vendruscolo1 and Christopher M. Dobson1 1University of Cambridge, Department of Chemistry, Lensfield Road, Cambridge, UK, CB2 1EW 2Department of Mathematics, Technical University of Denmark, Building 303, DK-2800 Kongens Lyngby, Denmark 3University of Zu¨ rich, Department of Biochemistry, Winterthurerstrasse 190, 8057 Zu¨ rich, Switzerland The mechanism by which proteins fold to their native ensembles has shown that establishing the correct overall states has been the focus of intense research in recent topology of the polypeptide chain is a crucial aspect of years. The rate-limiting event in the folding reaction is protein folding. This observation is in accord with a series the formation of a conformation in a set known as the of studies that have shown that the folding rate of a transition-state ensemble. The structural features pre- protein, to a first approximation, can be related to the sent within such ensembles have now been analysed for entropic cost of forming the native-like topology [16–22]. a series of proteins using data from a combination of The structural changes occurring during protein fold- biochemical and biophysical experiments together with ing have also been analysed in detail for a series of computer-simulation methods. These studies show that proteins and we discuss some of these studies here. The the topology of the transition state is determined by a results enable the topological view of folding to be set of interactions involving a small number of key reconciled with the well-established concept of nucleation residues and, in addition, that the topology of the [23] by showing that – despite the many different ways in transition state is closer to that of the native state than which a given topology could, in principle, be generated – to that of any other fold in the protein universe. Here, we individual proteins use interactions between a specific and review the evidence for these conclusions and suggest a limited set of residues to define the fold [8,15,23,24]. molecular mechanism that rationalizes these findings by Together with methods that aim to describe protein presenting a view of protein folds that is based on the structures in terms of general topological quantities topological features of the polypeptide backbone, rather [25,26], these results indicate how the underlying than the conventional view that depends on the principles that determine the native-state structure arrangement of different types of secondary-structure are closely related to those that guide the protein- elements. By linking the folding process to the organ- folding reaction. ization of the protein structure universe, we propose an explanation for the overwhelming importance of Glossary topology in the transition states for protein folding. Generalized Gauss integrals: A family of geometric measures constructed to classify protein conformations in terms of general conformational properties The widespread application of the methods of structural [26]. An example of one of these measures is the average number of times a biology is beginning to provide a comprehensive picture of chain segment crosses over and under any other segment when averaged over all directions from which the chain is seen. the variety of possible native folds available to proteins Molecular dynamics simulations: A computational method to calculate the [1–4]. Folding to these structures is, in most cases, the time-dependent behaviour of a molecular system. In classical molecular final and crucial step in the transformation of genetic dynamics simulations, a force field associated with the potential energy of a protein is used and Newton’s equations of motion are integrated to sample the information into a specific biological function. A full relevant conformations of all the atoms in a protein molecule. In restrained understanding of the mechanisms by which folding occurs molecular dynamics simulations, the force field is modified to take experimen- tal data into account, to bias the simulations towards those regions of therefore represents the solution to a central problem in conformation space that are consistent with the experiment. This procedure molecular biology [5–7]. enables protein conformations that are in agreement with available exper- Procedures have recently been developed to provide a imental data (e.g. F-values) to be obtained even if they have free energies that are far from the minimum (e.g. at the TSE) in the unrestrained simulations [10]. molecular description of the conformations that are rate- Transition-state ensemble: An ensemble of conformations, the formation of limiting in the folding of a given protein, the transition- which is rate-limiting for folding. If folding from the denatured state is modelled state ensemble (TSE; see Glossary), by incorporating the as occurring on a free-energy landscape, the transition state is associated with the largest barrier that needs to be crossed to reach the native state [5,7]. results of a mutational analysis of folding kinetics into F-Value analysis: A kinetic method for obtaining structural information about computer simulations [8]. Using this approach, ensembles the transition state for protein folding. Individual amino acid mutations are made throughout the protein sequence, and the effects of the mutations on the of conformations representing the TSE have been deter- folding and unfolding kinetics, in addition to the thermodynamics of folding, mined for a series of proteins [8–15]. Examination of these are measured. The F-value for a specific mutation is the ratio of the stability change in the transition state and in the native state accompanying that Corresponding authors: Vendruscolo, M. ([email protected]), Dobson, C.M. mutation and, thus, reports the extent to which the interactions found in the ([email protected]). native state can also be found in the transition state [7]. Available online 7 December 2004 www.sciencedirect.com 0968-0004/$ - see front matter Q 2004 Elsevier Ltd. All rights reserved. doi:10.1016/j.tibs.2004.11.008 14 Opinion TRENDS in Biochemical Sciences Vol.30 No.1 January 2005 The nature of folding transition states To examine this topological similarity more quantitat- The TSEs for three mainly b-sheet-containing pro- ively, we have explored ways of analysing the relation- teins, muscle acyl-phosphatase (AcP) [27],the ships between conformations in a TSE and the known a-spectrin Src homology 3 (SH3) domain [28] and structures of native proteins [15]. The method that has the third fibronectin type III domain of human proved most valuable is based on structural alignments tenascin (TNfn3) [29], have been studied in particular between the TSE conformations and a large database of detail, including comprehensive mutational F-value native protein structures. Representative members of the analysis [7] of the interactions that are present within conformations in the TSEs of AcP, TNfn3 and the a-spectrin the TSEs. Structural models of the TSEs of these SH3 domain were aligned with the native-state structures of proteins have been determined by incorporating the 921 protein domains of 40–110 residues, extracted from the experimentally determined F-values into molecular major structural classes (a, b, a/b, aCb)oftheSCOP dynamics simulations. The results of such a procedure is (structural classification of proteins) database [1,3] that the simulations are able to identify the conformations (http://scop.mrc-lmb.cam.ac.uk/scop). The structural com- that are fully compatible with the experimental data parisons were carried out using a distance matrix [10,11,15] and, hence, to determine the TSE (Figure 1). alignment procedure implemented in the DALI program Analysis of structures determined in this way has revealed [2,30,31], in which comparisons of matrices of pairwise Ca that many features of the native states, including well- distances are used to quantify structural and topological defined secondary-structure elements and the burial from similarities. An alignment from DALI can be judged by its solvent of the amino acid residues in the hydrophobic core, Z-score, a length-normalized measure of the structural are only partially formed in the conformations that make up similarity; the higher the Z-score the higher the level of the TSEs. Nevertheless, despite the fact that these confor- structural similarity [31]. mations have average root mean square deviations from the The results of this analysis have revealed that the native state typically of w7A˚ [10,11,15], and that some majority of conformations in the TSEs of these proteins conformations have values even larger than 10 A˚ , inspection can be characterized as being structurally more similar to of the structures within the TSEs suggests that they have their native folds than to any other fold within the SCOP overall topologies that are similar to those of their database [15] (Table 1). Thus, for AcP, 27 out of 29 (93%) corresponding native states. representative TSE structures, selected from a much (a) TNfn3 (b) AcP (c) α-Spectrin SH3 Native state Transition state Figure 1. Native structures and transition-state ensembles (TSEs) for (a) the third fibronectin type III domain of human tenascin (TNfn3), (b) acyl-phosphatase (AcP) and (c) a-spectrin Src homology 3 (SH3) domain. Native-state structures were produced using Bobscript