PROTEIN STRUCTURE EVOLUTION and the SCOP DATABASE Raghu P
Total Page:16
File Type:pdf, Size:1020Kb
17 PROTEIN STRUCTURE EVOLUTION AND THE SCOP DATABASE Raghu P. R. Metpally and Boojala V. B. Reddy INTRODUCTION The structure of a protein can elucidate its function and its evolutionary history (see Chapters 18, 21 and 23). Extracting this information requires knowledge of the structure and its relationships with other proteins. These in turn require a general knowledge of the folds that proteins adopt and detailed information about the structure of many proteins. Nearly all proteins have structural similarities with other proteins, and in many cases, share a common evolutionary origin. The knowledge of these relationships makes important contributions to structural bioinformatics and other related areas of science. Further, these relationships will play an important role in the interpretation of sequences produced by genome projects. To facilitate understanding and access to the information available for known protein structures, Murzin, Brenner, Hubbard, and Chothia (1995) have constructed a structural classification of proteins (SCOP) database. The SCOP database is based on evolutionary relationships and on the principles that govern their three-dimensional structure. It provides for each entry links to coordinates, images of the structure, interactive viewers, sequence data, and literature references. The database is freely accessible on the World Wide Web (http:// scop.mrc-lmb.cam.ac.uk/scop). To understand the rationale behind SCOP, we begin with a discussion of protein evolution from a sequence structure and functional perspective. THE EVOLUTION OF PROTEINS Proteins that have descended from the same ancestor retain memory of that ancestor through the sequence, structure, and function. Murzin (1998) discussed various aspects of the evolution of sequence, structure, fold, and function with specific examples. Strong sequence Structural Bioinformatics, Second Edition Edited by Jenny Gu and Philip E. Bourne Copyright Ó 2009 John Wiley & Sons, Inc. 419 420 PROTEIN STRUCTURE EVOLUTION AND THE SCOP DATABASE similarity alone is considered to be sufficient evidence for common ancestry. Close structural and functional similarity taken together is also accepted as sufficient evidence for distant homology between proteins that lack significant sequence similarity. But neither structural nor functional similarity alone is considered to be strong evidence. Proteins of independent origin may well have similar structures due to physicochemical reasons and they may also evolve similar functions due to functional selection referred to as divergent evolved structures. Although in theory descendents from the same ancestor may have different functions and even different structures, they would be very difficult to detect. THE EVOLUTION OF FOLD It is generally accepted that, in distantly related proteins, structure is more conserved than sequence. Proteins that have diverged beyond significant sequence similarity still retain the architecture and topology of their ancestral fold. The reasons for this structural conservation are not completely understood. In principle, a protein chain can have more than one stable fold. Of the competing theories, one theory states that selective pressure originating from the physical demands on the protein chain results in protein evolution towards adopting a stable and fast folding conformation. Another theory states as protein evolution convergences upon a set of small, highly stable protein folds, evolutionary pressures were then dominated by functional constraints resulting in the present state of affairs. Evolution requires that existing proteins continue to function and such processes would be interrupted if the protein’s fold changed significantly. Notwithstanding, a protein fold can change in a minor way during evolution by keeping all the secondary structure elements the same, packed and connected exactly the same way, but with different organization of connecting loops. The structural similarity between seemingly unrelated proteins are often explained by ‘‘convergence’’ to a stable fold as opposed to divergence from a common ancestor. Convergence does not only imply that the given proteins are of independent origin, but that they also had different original folds. With no evidence for their difference in original fold, these examples are referred to as ‘‘parallel’’ evolution. THE EVOLUTION OF ENZYMATIC CATALYSIS Proven cases of distantly related enzymes with very different functions are very rare and therefore are of great value for the understanding of the origin of enzymatic activity. The precision and complexity of the active-site structures of modern enzymes did not happen by chance but evolved from primitive catalytic features of the ancestral protein. Primitive enzymes were probably less efficient, but they were also likely to have a broader range of activities. If this were the case, the devolution of these activities would allow improvements in both efficiency and specificity through customization of the active-site architectures with additional specificity-determining groups. Within the active site, the original catalytic features are likely to remain conserved and can be revealed by structural comparison. When a protein evolves to a new function, the protein fold can change before becoming fully optimized after developing new functional constraints. A simple way of altering a protein fold without a large destabilization effect is to change its topology while maintaining its architecture. This can be done by the internal swapping of similar a-helices and b-strands or by reversing the direction of some of its secondary structures. There are many proteins with SCOP HIERARCHY 421 similar secondary structuresandarchitecturesbutdifferenttopologiesthatcouldberelatedin such a way (e.g., an immunoglobulin domain and plastocyanin; an SH3 domain and GroES). THE COMPARISON OF STRUCTURES Most present methods for structure comparison measure the significance of structural similarity by a score derived from the number of residues that are common in the structural core and their root mean square deviation. These values mainly reflect the arrangement of regular secondary structures, whose packing and topology is determined by stereochemical rules rather than by evolutionary constraints. The common ancestry may manifest itself in more subtle features, such as the conservation of rare and unusual topological and packing details, and other structural irregularities that are less likely to occur independently. Within the common structural core, there may also be conserved turns, a-helical caps, b-bulges and other small structural elements that, in general, may occur in different orientations independent of the protein fold. Early work on protein structures showed that there are striking regularities with the secondary structure assembly (Levitt and Chothia, 1976; Chothia, Levitt, and Richardson, 1977) and polypeptide topologies (Richardson, 1976; Sternberg and Thornton, 1976; Richardson, 1977). These regularities arise from the intrinsic physical and chemical properties of proteins (Chothia, 1984; Finkelstein and Ptitsyn, 1987) and provide the basis for the classification of protein folds (Levitt and Chothia, 1976; Richardson, 1981). Structure comparison (Holm and Sander, 1993; Shindyalov and Bourne, 1998) and structure classification (Orengo et al., 1993; Overington et al., 1993; Yee and Dill, 1993) provides further insight into the systematic relationship among protein structures. Resources are now available for recognition of the relationships between protein structures and several are discussed in this book (Chapters 16, 17 and 18). The SCOP database hierarchically organizes proteins according to their structures and evolutionary origin (Murzin et al., 1995; Lo Conte et al., 2000). It forms a resource that allows researchers to study the nature of protein folds, to help focus research investigations, and to rely on expert-defined relationships. SCOP is the most cited resource for classifying proteins (Table 17.1). In the context of structural bioinformatics it provides a reductionism that facilitates many of the studies outlined in this book. SCOP HIERARCHY The method used to classify proteins in SCOP is manually achieved by the visual inspection and comparison of structures as the human mind is quite adept at recognizing slightly dissimilar objects as belonging to a common architecture. The assistance of automatic tools is sometimes used only to make more complex tasks manageable and to provide generality. Thus, simplified representations of protein structures can be grouped by expert inspection (Murzin et al., 1995). SCOP differs from other protein classification methods like Class Architecture Topology Homologous (CATH) and DALI, in that the protein domains are classified manually compared to CATH which approaches protein domain classification with a combination of manual and automated procedures (Orengo et al., 1997). The Dali Domain Dictionary uses a fully automated approach of defining and classifying domains (Dietmann et al., 2001). 422 PROTEIN STRUCTURE EVOLUTION AND THE SCOP DATABASE T A B L E 1 7 . 1 . A Partial List of References Where SCOP Classification was Used as the Basis for Analysis Apic G, et al. (2001): J Mol Biol 310(2):311–325. Aung Z, Tan KL (2006): J Bioinform Comput Biol 4(6):1197–1216. Babbitt PC, Gerlt JA (2000): Adv Protein Chem 55:1–28. Bertone P, Gerstein M (2001): IEEE Eng Med Biol Mag 20(4):33–40. Bhaduri A, et al. (2004): BMC Bioinform 5(35). Bukhman YV, Skolnick J (2001): Bioinformatics