Minireview

A structural explanation for the twilight zone of protein sequence homology
Su Yun Chung1 and S Subbiah2†

Homology modeling of protein structures as a function of sequence breaks down at the twilight zone limit of sequence identity between the template and target proteins. Our results suggest that protein sequences that have diverged from a common ancestor beyond the twilight zone may adopt side-chain interactions that are very different from those endowed by the ancestral sequence.

Addresses: 1Department of Biochemistry, Uniformed Services University of the Health Sciences, 4301 Jones Bridge Road, Bethesda, MD 20814, USA and 2Department of Structural Biology, Stanford University, School of Medicine, Fairchild Building, Stanford, CA 94305, USA.

† Present address: The Wistar Institute, 3601 Spruce Street, Philadelphia, PA 19104, USA.

E-mail: [email protected] [email protected]

Structure 15 October 1996, 4:1123–1127
© Current Biology Ltd ISSN 0969-2126

Decades of comparing and aligning protein sequences led to the empirical observation of the ‘twilight zone’ for sequence similarity [1]. The twilight zone is an operationally defined term. It represents a range of sequence identity that sets the boundary of confidence levels for detecting evolutionary relatedness of proteins in sequence-alignment analysis. When two protein sequences diverge, the remaining similarity, measured as percentage sequence identity, steadily decreases past the twilight zone to the limit expected by random chance. Above the twilight zone, the case for divergent evolution is strong, with greater sequence identity reflecting a shorter period of evolutionary divergence between a pair of proteins. When additional biophysical or biochemical evidence becomes available, such higher-than-twilight-zone levels of sequence identity are almost always accompanied by a very convincing similarity in three-dimensional (3D) structure and biological function. When the sequence identity falls in the twilight zone, the statistical measure for the evolutionary relatedness of proteins becomes uncertain. In most such instances, the sequences share neither an evolutionary past, similar structures nor biological functions. Despite this, there are documented cases in which seemingly unrelated sequences, sharing less than the twilight-zone limit of sequence identity, adopt similar 3D folds [2,3].

In practice, the minimum sequence identity sufficient to infer evolutionary relatedness depends on the length and amino-acid composition of the aligned sequences, as well as on the gap penalty imposed by the sequence-alignment procedures. With most computer alignment programs, the twilight zone typically falls between 20 and 25% sequence identity for proteins that comprise at least one stable domain [1,4]. When two totally unrelated sequences composed of the 20 standard amino acids were aligned without any introduced gaps, random chance led to a mean value of 6% for sequence identity. Sequence-alignment techniques that maximize similarity by introducing relative insertions and deletions can be expected to significantly raise this baseline average [5]. To summarize, when a pair of protein sequences have high sequence identity (higher than the twilight-zone limit of 25%), divergent evolutionary relatedness can be convincingly inferred. When the sequence identity falls within or below the twilight zone of 20–25%, common ancestry from a shared past cannot be readily assumed by sequence data alone.

Although the significance of the twilight zone is well established, much of its justification stems from our empirical experience with statistical analysis of 1D protein sequences. As almost all protein sequences fold into specific 3D structures, and these folded structures are under evolutionary selection pressure, the twilight zone is likely to have some 3D structural meaning. Cumulative amino-acid changes in protein sequence, including insertions and deletions, result in altered 3D structures [6]. The connection between sequence similarity and structural similarity can be established by a combined sequence-structural analysis of structurally superimposed proteins. Studies carried out on pairs of optimally superimposed homologous proteins demonstrated that the structural differences, measured as the average rms deviation of the backbone atoms, increase with decreasing sequence identity [7–9]. Although there are many individual exceptions, the general observation is that when two proteins share 50% or higher sequence identity, their backbones differ by less than 1Å rms deviation; when two proteins share 20–25% sequence identity, their backbones typically differ by some 2Å rms deviation. Therefore, to date, one simple 3D structural implication of the 1D twilight zone exists: when the sequences of two proteins diverge to the twilight-zone limit, their backbones can be expected to differ by 2Å rms deviation. Our recent results, obtained while developing the application of side-chain-packing methods for the homology modeling of proteins, unexpectedly offer a deeper insight into the structural meaning of the twilight zone.

Approaches that analyze side-chain packing are based on the premise that the fixed backbone template of a protein is sufficient to allow the prediction of the side-chain coordinates of its buried core residues, based on packing criteria [10]. It is now well established that different side-chain-packing methods, including the one [11] that we have used, can be expected to accurately predict the side-chain coordinates of the buried core residues when the experimental backbone coordinates of a globular protein are given [12–19]. On average, the overall side-chain rms error in prediction is about 1.2Å, while 85% of the χ1 and 80% of the χ2 angles can be predicted accurately [20]. Allowing for the approximate 0.3Å experimental error in the backbone coordinates of even the most well-determined X-ray structures, this is remarkably accurate when compared with the 3.1Å side-chain rms error and the 22% and 29% success rates for χ1 and χ2 angles (averaging over all amino-acid residues) that can be expected by random chance.

Recently, we demonstrated that the side-chain-packing methods can be successfully applied to homology modeling, using families of proteins with known 3D structures as model systems [9,21]. For each target sequence, the side-chain coordinates of the buried residues were predicted using the backbone coordinates of a known homologous protein as a fixed template. The side-chain prediction accuracy was assessed as a function of either sequence similarity or backbone structural similarity between the pairs of target and template proteins (Fig. 1). We observed that the average rms errors for the predicted buried side chains increase in an exponential fashion with decreasing sequence identity or increasing backbone rms deviation (Fig. 1a,b). Specifically, when the sequence identity was about 50%, or with a corresponding backbone rms deviation of about 1Å between the template and the true target, the average rms error for the predicted buried side chains remained low, at 1.5Å (Fig. 1a,b, arrows). In addition, 60–65% (Fig. 1c,d, arrows) of the χ2 angles were accurately predicted. When the template and target sequences are at the twilight zone of about 20–25% sequence identity, or the corresponding backbone rms deviation of about 1.9–2.0Å, the prediction accuracies for the average side- [...] side chains of the target buried core residues into their correct rotamer orientations.

It is remarkable that the limit at which the template backbone is insufficient to constrain the correct packing of the buried side chains should occur at the twilight zone. This provides a structural explanation of the raison d’être for the twilight zone: the confident assumption of divergent evolution from sequence information alone. When protein sequences diverge from a common ancestor by a single amino-acid replacement in the buried core, the new side chain replacing the original residue is quite severely hemmed in by its immediate tertiary environment, comprising nearby side chains and backbone atoms distal in sequence. Thus, only certain side chains in certain rotamer conformations can be accommodated in the new folds. As more amino-acid replacements are progressively introduced, there are successive gradual distortions of the tertiary fold and side-chain interactions to accommodate the changes. However, at each step, the descendant protein’s freedom to diverge is always restricted by the particular structural environment, imposed by the immediate predecessor, at the site of replacement. It appears that there is a continuing ‘structural memory’ of both the pattern of side-chain interactions and the constraining backbone fold endowed by the common ancestral sequence. Both these memories are embedded in the evolving repertoire of side chains, the former only in the residues that have been left unchanged and the latter in all the residues.

When the diverging protein sequences reach the twilight zone, the memory of the specific pattern of side-chain interactions endowed by the ancestral sequence is in a structural sense ‘lost’: major rearrangements in side-chain interactions can take place as long as the side-chain conformations are compatible with the ongoing constraining backbone scaffolds. This is consistent with our recent observation that when a pair of target and template