
Proc. Natl. Acad. Sci. USA Vol. 81, pp. 1075-1078, February 1984 Biochemistry On the use of sequence homologies to predict protein structure: Identical pentapeptides can have completely different conformations (cooperativity/protein folding/amino acid sequence homology) WOLFGANG KABSCH AND CHRISTIAN SANDER Department of Biophysics, Max Planck Institute of Medical Research, 6900 Heidelberg, Federal Republic of Germany Communicated by Sir John Kendrew, November 9, 1983 ABSTRACT The search for amino acid sequence homolo- Probability of Occurrence. A systematic search for identi- gies can be a powerful tool for predicting protein structure. cal pentapeptides seems not to have been undertaken previ- Discovered sequence homologies are currently used in predict- ously, perhaps because one intuitively, but wrongly, does ing the function of oncogene proteins. To sharpen this tool, we not expect to find any. An old party joke is analogous: one investigated the structural significance of short sequence ho- intuitively does not expect to find two identical birthdays in mologies by searching proteins of known three-dimensional a group of 30 randomly chosen people, yet one can, statisti- structure for subsequence identities. In 62 proteins with cally, be 70% certain of finding them by chance (3). What 10,000 residues, we found that the longest isolated homologies matters is not that the number of people is small compared between unrelated proteins are five residues long. In 6 (out of with the number of days in a year but that the number of all 25) cases we saw surprising structural adaptability: the same possible pair comparisons among 30 people, 435, is larger five residues are part of an a-helix in one protein and part of a than 365. ,8-strand in another protein. These examples show quantita- By analogy, the occurrence of identical pentapeptide se- tively that pentapeptide structure within a protein is strongly quences in unrelated proteins in the data base is actually dependent on sequence context, a fact essentially ignored in highly probable: although the number of residues in the data most protein structure prediction methods: just considering base of known protein structures (104) is very small com- the local sequence of five residues is not sufficient to predict pared with the number of possible pentapeptide sequences correctly the local conformation (secondary structure). Coop- (205 = 3.2 x 106), the number of pair comparisons (5 x 107) erativity of length six or longer must be taken into account. is larger. More precisely, assume we have n amino acid resi- Also, we are warned that in the growing practice of comparing dues arranged in random order with, for simplicity, frequen- a new protein sequence with a data base of known sequences, cy of occurrence of p = 1/20 each. The probability that all finding an identical pentapeptide sequence between two pro- possible pairs of peptides of length / are different is teins is not a significant indication of structural similarity or of evolutionary kinship. P = (1 - )N, [1] The folding process of a globular protein is highly selective: where (1 - p') is the probability for two peptides of length I a long amino acid chain ends up in one or a few out of a huge to be different and N = n(n - 1)/2 is the number of pair number of possible three-dimensional conformations. In comparisons. The formula holds for n << 1/pl and neglects contrast, conformational preference of single amino acid res- the complication of overlap of successive pentapeptides. For idues is weak. So the high selectivity of protein folding is pentapeptides (1 = 5) in the data base of known protein struc- only possible through the interaction of many residues. What tures (n = 10,000), we calculate P = 2 x 10-7. The probabili- is the minimum number of residues for folding into a unique ty to find at least one pair is 1 - P = 0.9999998. Thus the structure? Five residues can form more than one turn of an odds are more than 1,000,000:1 for finding one or more such a-helix or an entire p-strand. Cooperativity of length five pairs, and indeed we find 25 pairs. could therefore be deemed sufficient for stabilization of sec- For hexapeptides, the odds for finding one or more pairs in ondary structure. We were thus surprised by the discovery unrelated proteins are 1:1 (P = 0.5, 1 - P = 0.5). Consistent of amino acid sequences of length five that in one protein are with that, within the data base the systematic search yields in a-helical and in another, quite different protein, in /-sheet none. Outside the data base, without a systematic search, we conformation. have so far found two: (i) Cys-Arg-Asp-Lys-Ala-Ser are resi- dues 63-68 in rhodanese (data set 1RHD) and residues 151- Search for Identical Oligopeptides 156 in a gonococcal pilus protein (Thomas Meyer, personal communication) and (ii) Gly-Tyr-Ile-Thr-Asp-Gly are resi- Detailed knowledge of protein structure comes mainly from dues 92-97 in actinidin (data set 2ACT) and residues 68-73 in the more than 100 three-dimensional structures solved by x- phage T4 DNA ligase (John Armstrong, personal communi- ray crystallography (1). Of these, at least 62 are known to cation). sufficient resolution to allow assignment of details of hydro- gen-bonded (secondary) structure (2). A systematic search Conformation of Identical Pentapeptides among these 62 proteins revealed 25 subsequence identities of length five, not counting identities flanked by more exten- How similar in conformation are the discovered identical sive sequence homology. For example, the sequence Lys- pentapeptides? A good objective measure of similarity of Val-Leu-Asp-Ala occurs both in prealbumin and in carbonic three-dimensional structure is the minimum rms distance anhydrase, two proteins otherwise unrelated in sequence, [d(rms)] between the C(a) atom positions of equivalent resi- function, or structural type. dues obtainable by rotation and translation (4). Table 1 gives the pentapeptide pairs in their sequence context ordered in The publication costs of this article were defrayed in part by page charge terms of decreasing d(rms)-i.e., increasing structural simi- payment. This article must therefore be hereby marked "advertisement" larity. Considering that five residues can be more than one in accordance with 18 U.S.C. §1734 solely to indicate this fact. turn of an a-helix and that many B-strands are not longer 1075 Downloaded by guest on September 25, 2021 1076 Biochemistry: Kabsch and Sander Proc. Natl. Acad. Sci. USA 81 (1984) Table 1. Conformation of identical pentapeptides occurring in two different proteins Different conformation in different proteins (d > 2.5 A) Similar conformation in different proteins (d < 2.1 A) Protein (secondary Protein (secondary Protein subsequence structure of DataI Resi- Protein subsequence structure of Data Resi- d, A (conformation) pentapeptide) set dues d, A (conformation) pentapeptide) set dues 4.5 NIEAD VNTFV ASHKP erythrocruorit (hemogl.) lECD 80-84 2.0 CCQEA YGVSV IVGVP alcohol dehydrogenase 4ADH 286-290 hhh hhhhh hhhgg alpha-helix tb tt t eee e DRCKP VNTFV HESLA ribonuclease-s lRNS 44-48 QGGTH YGVSV VGIGR thermolysin (protease) 2TLN 251-255 ss s eeeee s hh beta-strand t ees ss ee s 4.3 CPLMV KVLDA VRGSP prealbumin 2PAB 6-10 2.0 AHTDF AGAEA AWGAT erythrocruorin (hemogl.) lECD 115-119 eee eeeet tttee beta-strand hhs g ggghh hhhhh NPKLQ KVLDA LQAIK carbonic anhydrase B 1CAB 155-159 KFAGQ AGAEA ELAQR cytochrome C551 251C 38-42 gggh hhhht gggt alpha-helix hhtt tthhh hhhhh 4.1 DNGIR LAPVA acid protease 1APR 320-324 1.7 AKKIV SDGDG MNAWV lysozyme (hen) 7LYZ 100-104 tteee bb beta-strand hhhhh hsssg gggsh CTTNC LAPVA KVLHE GP dehydrogenase lGPD 153-157 LKAGD SDGDG KIGVD Ca binding parvalbumin lCPV 91-95 hhhhh hhhhh hhhhh alpha-helix hhht tt ss eeehh 4.1 LIGQK VAHAL AEGLG triose phosp. isomerase 1TIM 112-116 1.6 LMVKV LDAVR GSPAI prealbumin 2PAB 8-12 hhhhh hhhhh htt alpha-helix eeeee eettt tee VSVNG VAHAL TAGHC strep.gris. proteinase A 1SGA 25-29 PVYDS LDAVR RCALI lysozyme (phage) 1LZM 91-95 eeett eeeee e hhh beta-strand hhhhh s hhh hhhhh 4.1 LSGEE KAAVL ALWDK hemoglobin (horse) 2MHB 150-154 1.4 LLILP DEAAV GNLVG acid protease 1APR 229-233 hhh hhhhh hhhtt alpha-helix see ssttt gggtt KVIKC KAAVL WEEKK alcohol dehydrogenase 4ADH 10-14 QQKRW DEAAV NLAKS lysozyme (phage) 1LZM 127-131 s eee eeeeb stts beta-strand htt tttss sstts 3.9 GFFSK IIGEL PNIEA erythrocruorin (hemogl.) lECD 69-73 1.3 SLKPL SVSYD QATSL carbonic anhydrase C lCAC 45-49 hhhhh hhhtt t hh alpha-helix ts e eee t t beta-strand LSKTF IIGEL HPDDR cytochrome B5 2B5C 73-77 NADAT SVSYD VDLND concanavalin A 3CNA 74-78 hgggg eeeee gggg beta-strand ts e eeeee tt be ta-strand 3.3 KITVL GVRQV GMACG lactate dehydrogenase 1LDX 27-31 i.2 TLFPP SSEEL QANKA immunoglobulin 1FAB 117-121 eeee tthh hhhhh alpha-helix start e tttt ttt helix AAKTD GVRQV QPYNQ papain (protease) 8PAP 109-113 AIHPT SSEEL VTLR glutathione reductase 2GRS 453-457 sb s eeee ss h beta-strand sss sgggg ss 3-10-helix 3.3 GNEGS TGSSS TVGYP subtilisin BPN' 1SBT 159-163 1.1 ELPGR SVIVG AGYIA glutathione reductase 2GRS 173-177 s stts bt s ss eeee shhh beta-strand VTISC TGSSS NIGAG immunog lobulin 1FAB 23-27 EAYGV SVIVG VPPDS alcohol dehydrogenase 4ADH 289-293 eeee e tt tttss ttt e eee tt beta-strand 3.1 FIPLS GGIDV VAHEL thermolysin 2TLN 135-139 0.6 GNWVC AAKFE SNFNT lysozyme (hen) 7LYZ 31-35 b gg g hhh hhhhh hhhhh hhhhh tssbt alpha-helix DTVQV GGIDV TGGPQ acid protease 1APR 95-99 KETA AAKFE RQHMD ribonulease-s lRNS 5-9 b sss b s tt h hhhhh hhhb alpha-helix 3.1 KAGIQ LSKTF VKVVS
Details
-
File Typepdf
-
Upload Time-
-
Content LanguagesEnglish
-
Upload UserAnonymous/Not logged-in
-
File Pages4 Page
-
File Size-