Biomolecules 2014, 4, 268-288; doi:10.3390/biom4010268 OPEN ACCESS biomolecules ISSN 2218-273X www.mdpi.com/journal/biomolecules/
Article Similar Structures to the E-to-H Helix Unit in the Globin-Like Fold are Found in Other Helical Folds
Masanari Matsuoka 1,2, Aoi Fujita 1, Yosuke Kawai 1,† and Takeshi Kikuchi 1,*
1 Department of Bioinformatics, College of Life Sciences, Ritsumeikan University, Kusatsu, Shiga 525-8577, Japan; E-Mails: [email protected] (M.M.); [email protected] (A.F.); [email protected] (Y.K.) 2 Japan Society for the Promotion of Science (JSPS), Ichibancho, Chiyoda-ku, Tokyo 102-8471, Japan
† Present address: Division of Biomedical Information Analysis, Department of Integrative Genomics, Tohoku Medical Megabank Organization, Tohoku University, Sendai, Miyagi 980-8575, Japan
* Author to whom correspondence should be addressed; E-Mail: [email protected]; Tel.: +81-77-561-5909; Fax: +81-77-561-5203.
Received: 6 December 2013; in revised form: 11 February 2014 / Accepted: 13 February 2014 / Published: 27 February 2014
Abstract: A protein in the globin-like fold contains six alpha-helices, A, B, E, F, G and H. Among them, the E-to-H helix unit (E, F, G and H helices) forms a compact structure. In this study, we searched the whole protein structure space for structures similar to the E-to-H helix unit of leghemoglobin using the Dali program. Several similar structures were found in other helical folds, such as the KaiA/RbsU domain and the Type III secretion system domain. These observations suggest that the E-to-H helix unit may be a common subunit in the whole protein 3D structure space. In addition, common conserved hydrophobic residues were found among the structures similar to the E-to-H helix unit. Hydrophobic interactions between the conserved residues may stabilize the 3D structures of the unit. We also predicted the possible compact regions of the units using the average distance method.
Keywords: Globin-like fold; super-fold; keyword; dali search; average distance map
1. Introduction
Elucidating how a protein folds into its unique 3D structure is a long-standing unsolved problem in molecular bioinformatics and molecular biophysics. To tackle this problem, we have to know how the information for constructing a protein’s 3D structure is encoded in its amino acid sequence. Protein structures are characterized by combinations of secondary structural elements, and such structural characteristics of a protein are called its fold or topology. Some folds are shared by proteins that have neither sequence nor functional similarity. Folds of this kind that appear frequently in the structure database, accounting for more than 30% of determined structures, are called “superfolds” [1]. In such a case, one might ask whether there is a common property in the amino acid sequences that leads to the same fold. Alternatively, is there a partial key 3D structure that leads to the same topology? If so, it should be a common 3D structural unit that leads to a specific topology. It is well recognized that some of the proteins classified as having the globin-like fold according to the Structural Classification of Proteins (SCOP) database show a rather low amino acid sequence identity of around 15% in spite of the high conservation of the 3D scaffold [1]. The globin-like fold is regarded as a superfold [1]. Figures 1 and 2 show the amino acid sequence alignment and 3D structures of leghemoglobin (soybean) and myoglobin (sperm whale) as examples. Their Protein Data Bank (PDB) codes are 1FSL and 1MBN, respectively.
Figure 1. Amino acid sequence alignment of soybean leghemoglobin (PDB: 1FSL) and sperm whale myoglobin (PDB: 1MBN). The amino acid sequence identity is 15%. White letters on a black background denote residues in the α-helices of the E-to-H helix unit. The portions labeled E, F, G and H refer to these α-helical regions.
Figure 2. 3D structures of (a) soybean leghemoglobin (PDB: 1FSL) and (b) sperm whale myoglobin (PDB: 1MBN). The portion in light gray is the E-to-H helix unit.
The amino acid sequence identity between them is 15%. As seen in Figure 2, a protein in the globin-like fold contains six helices, A, B, E, F, G and H. Nakajima et al. [2] have suggested that the E-to-H helices form a folding initiation site, especially in the plant hemoglobins, which are supposed to retain the ancestral characteristics in the predicted folding mechanism. This analysis corresponds well to the NMR experimental results [3] for leghemoglobin. Furthermore, the E-to-H helix unit makes up a major part of the 3D structure of 2-on-2 hemoglobin (Mycobacterium tuberculosis), in which the N-terminal part of the peptide chain is truncated in comparison with leghemoglobin and myoglobin, as shown in Figure 3 [2].
Figure 3. 3D structure of 2-on-2 hemoglobin, i.e., Mycobacterium tuberculosis hemoglobin (PDB: 1NGK). The portion in light gray is the E-to-H helix unit. The E-to-H helix unit of this protein is very similar to that in 1FSL or 1MBN.
These observations lead us to speculate that the E-to-H helix part forms an evolutionarily conserved 3D scaffold, that is, this part can be considered a structure formation unit. In addition, proteins regarded as members of a superfold sometimes appear in several SCOP folds [1]. Thus, we have come to believe that the fold of the E-to-H helix unit may be widely observed in protein 3D structure space.
The purpose of the present study is to search for 3D structures similar to the E-to-H helix unit as independent folding units in SCOP folds other than the globin-like fold. Namely, we investigate the commonality of the 3D scaffold of the E-to-H helix unit in the protein 3D structure space. One technique employed in this work is the Dali algorithm [4], a 3D structure alignment technique. The other is a contact map constructed from only the amino acid sequence of a protein based on inter-residue average distance statistics (average distance map, ADM) [5,6]. This technique is used to predict a possible compact region in the 3D structure of a protein. In this study, a heme is not explicitly considered because a protein in the globin-like fold, myoglobin, has been confirmed to fold into its native structure without a heme [3]. A concept similar to that of the present study has been proposed as “protein lego”, which refers to similar partial 3D structures observed in the protein 3D structure space [7,8]. Some analyses of the 3D structural properties and stability of α-helix bundles have been carried out by means of energy calculations [9–19], Wenxiang diagrams and so on [20–24]. In the present work, we focus on the E-to-H helix unit and predict whether a corresponding part will form a compact region as a possible folding unit. The present paper is organized as follows. We first show the results of the Dali searches with the 3D structure of the E-to-H helix unit in soybean leghemoglobin as a query, followed by the ADM analyses for all hit proteins, which examine whether the partial sequence of the E-to-H helix part in a hit protein tends to form a compact structure according to the ADM for this protein (Section 2.1). The details of the 3D structures of the proteins obtained in Section 2.1 are summarized in Section 2.2.
In Sections 2.3–2.5, the conservation of hydrophobic residues in homologues of each protein considered is presented, and the hydrophobic packing formed by the conserved hydrophobic residues is analyzed to examine whether a specific residue pattern can be observed in the E-to-H helix unit of the proteins treated in the present study. Finally, the commonality of the E-to-H helix unit in the whole protein structure space is discussed.
2. Results
2.1. Dali Search and ADM Analyses
About 7500 hits were obtained by the Dali algorithm with the E-to-H helix unit of soybean leghemoglobin as a query. We selected proteins with Z ≥ 2.0 so that the proportion of false positives is less than about 5%. The hit proteins could be classified into 690 SCOP families, and a representative protein was selected from each family based on the criterion described in the method section (Section 4.1). The 690 families are summarized in Table S1 of the Supplementary Material. It is somewhat hard to identify folding units by analyzing compact parts defined in 3D structures, because some proteins contain several compact parts that may be probable folding units. In this study, in order to detect partial peptide sequences that tend to form compact structures, we use the ADM method. The ADM method predicts the possible compact regions in a protein from its amino acid sequence, and the predicted regions correspond well to structural domains [5,6,25] and to experimentally observed folding regions, including highly protected regions measured by NMR during folding [26–28].
For every representative protein, average distance map (ADM) analysis was performed, and we retained each protein in which the corresponding E-to-H unit was predicted as a compact region. An ADM is a kind of predicted contact map, and its details are described in the method section (Section 4.2). The ADMs for soybean leghemoglobin (PDB: 1FSL) and sperm whale myoglobin (PDB: 1MBN) are presented in Figure 4. In the ADM for soybean leghemoglobin, the regions 9–34 and 65–140 are predicted as compact regions with values of 0.228 and 0.324, respectively. The value is an index of the strength of the compactness of a compact region predicted by ADM (see method section for details). The respective predicted regions include helices A–B and E-to-H. From the values, the E-to-H part can be regarded as the main compact unit in soybean leghemoglobin. On the other hand, the ADM for sperm whale myoglobin predicts the compact units 7–33 and 99–151, with values of 0.273 and 0.276, respectively; these regions correspond to helices A–B and G–H, as shown in Figure 4b. The two regions are thus predicted to be almost equally compact. This prediction reflects the experimental results on the folding of myoglobin obtained by NMR [3,26]. It suggests that the G–H part sometimes forms strongly within the E-to-H helix unit. Therefore, we also retained each protein hit by the Dali search in which only the G–H part is predicted to be a compact region, as is the case for sperm whale myoglobin.
Figure 4. Average distance maps (ADM) for 1FSL (a), and 1MBN (b). The locations of the α-helices are labeled by A, B and so on. The label “9–34” denotes the compact region predicted by ADM. A numeral in parentheses indicates the value of a compact region predicted by ADM.
The ADM predictions were applied for the representative proteins obtained above and finally, five representative proteins were obtained. We confirmed that these five proteins are also all obtained when the E-to-H helix unit from sperm whale myoglobin is used as a query instead of that from leghemoglobin (data not shown). The hit proteins and their details are summarized in Table 1. The amino acid sequences from FASTA files of these five proteins and the positions of the E-to-H helix parts are shown in Figure 5 (a helix is presented by white letters with a black background).
Table 1. Proteins obtained in the present work. The last two columns give the region hit by the DALI search with each query, followed by (Z score, rmsd (a)).

| Protein (Source, PDB ID) | Family | Fold | Query: Leghemoglobin E-to-H (Soy Bean, 1FSL) | Query: Myoglobin E-to-H (Sperm Whale, 1MBN) |
|---|---|---|---|---|
| Circadian clock protein KaiA (Synechococcus, 1R8J) | Circadian clock protein KaiA, C-terminal domain | KaiA/RbsU domain | E-to-H helices (4.4, 3.8) | E-to-H helices (7.0, 3.3) |
| Secretion control protein A chain (Yersinia, 1XL3) | LcrE-like | Type III secretion system domain | EGH helices (2.0, 4.5) | FGH helices (4.1, 3.9) |
| Cell invasion protein SipA (Salmonella, 2FM9) | SipA N-terminal domain-like | SipA N-terminal domain-like | E-to-H helices (2.9, 9.3) | H helix (3.0, 8.0) |
| Transcriptional regulator RHA1_ro04179 (Rodococcus, 2NP5) | Tetracyclin repressor-like, C-terminal domain | Tetracyclin repressor-like, C-terminal domain | GH helices (4.4, 9.7) | GH helices (2.1, 4.7) |
| Hypothetical protein AF0060 (E. coli, 2P06) | AF0060-like | all-alpha NTP pyrophosphatases | GH helices with a part of the E helix (3.2, 4.8) | GH helices (3.5, 3.5) |

(a): rms distance (Å) between the 3D structures of the aligned regions of the query structure and a hit protein.
Figure 5. The amino acid sequences from FASTA files of (a) Circadian clock protein KaiA (Synechococcus, PDBID: 1R8J), (b) Secretion control protein A chain (Yersinia, PDBID: 1XL3), (c) Cell invasion protein SipA (Salmonella, PDBID: 2FM9), (d) Transcriptional regulator RHA1_ro04179 (Rodococcus, PDBID: 2NP5), and (e) Hypothetical protein AF0060 (E. coli, PDBID: 2P06). White letters on a black background denote residues in the α-helices of the corresponding E-to-H helix unit.
2.2. Detailed Comparisons of the Folding Units Predicted by ADMs with a Region Hit by the Dali Search
The regions predicted by ADMs for the respective proteins with the values are summarized in Table 2. The details are as follows. We may use a PDB ID to specify each protein.
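Table 2. The summary of the results of the ADMs.

| Protein (PDB ID) | Predicted Folding Unit | Value |
|---|---|---|
| 1FSL | 9–34 | 0.228 |
| 1FSL | 65–140 | 0.324 |
| 1MBN | 7–33 | 0.273 |
| 1MBN | 99–151 | 0.276 |
| 1R8J | 5–34 | 0.218 |
| 1R8J | 51–82 | 0.203 |
| 1R8J | 223–270 | 0.254 |
| 1XL3 | 3–44 | 0.195 |
| 1XL3 | 77–99 | 0.202 |
| 1XL3 | 124–201 | 0.297 |
| 2FM9 | 1–51 | 0.292 |
| 2FM9 | 79–115 | 0.157 |
| 2FM9 | 166–199 | 0.297 |
| 2NP5 | 5–38 | 0.253 |
| 2NP5 | 76–103 | 0.175 |
| 2NP5 | 128–186 | 0.372 |
| 2P06 | 3–83 | 0.370 |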
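As a reading aid for Table 2, the following minimal sketch picks, for each protein, the predicted region with the highest value, which is how the text speaks of a "main" compact unit. The region boundaries and values are transcribed from Table 2; the function name `main_compact_region` is hypothetical and introduced only for illustration, and the sketch is not part of the ADM software itself.

```python
# Predicted compact regions ((start, end), value), transcribed from Table 2.
ADM_REGIONS = {
    "1FSL": [((9, 34), 0.228), ((65, 140), 0.324)],
    "1MBN": [((7, 33), 0.273), ((99, 151), 0.276)],
    "1R8J": [((5, 34), 0.218), ((51, 82), 0.203), ((223, 270), 0.254)],
    "1XL3": [((3, 44), 0.195), ((77, 99), 0.202), ((124, 201), 0.297)],
    "2FM9": [((1, 51), 0.292), ((79, 115), 0.157), ((166, 199), 0.297)],
    "2NP5": [((5, 38), 0.253), ((76, 103), 0.175), ((128, 186), 0.372)],
    "2P06": [((3, 83), 0.370)],
}


def main_compact_region(pdb_id):
    """Return ((start, end), value) of the region with the highest value.

    Hypothetical helper: it only ranks the tabulated values and does not
    run an ADM prediction.
    """
    return max(ADM_REGIONS[pdb_id], key=lambda item: item[1])


for pdb_id in ADM_REGIONS:
    (start, end), value = main_compact_region(pdb_id)
    print(f"{pdb_id}: {start}-{end} (value {value:.3f})")
```

Note that for 2FM9 the two highest values (0.292 and 0.297) are nearly equal, so a simple maximum should be read together with the discussion in Section 2.2.3.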
2.2.1. Circadian Clock Protein KaiA (1R8J)
The 3D structure and ADM of the circadian clock protein KaiA (1R8J) are presented in Figure 6a and 6b. The predicted compact regions are 5–34, 51–82 and 223–270, as shown in Figure 6b, and the last predicted region, which has the highest value, corresponds to the G and H helices of the E-to-H helix unit, as presented in Table 2 and Figure 6b. The rest of this protein, that is, the region 1–179, contains a Flavodoxin-like fold domain, namely, an α/β domain (see Figures 5a (regions enclosed by rectangles) and 6a). Incidentally, the first and second predicted regions form the Rossmann fold.
Figure 6. 3D structures and ADMs of 1R8J (a) and (b), 1XL3(c) and (d), 2FM9 (e) and (f), 2NP5 (g) and (h), and 2P06 (i) and (j). A portion in light gray denotes the corresponding E-to-H helix part. The label “5–34” denotes the compact region predicted by ADM. The numeral in parentheses means the value of a compact region predicted by ADM.
2.2.2. Secretion Control Protein (1XL3)
This protein consists of only helices. Figure 6c and 6d present the 3D structure and ADM of the secretion control protein (1XL3). The predicted compact regions are 3–44, 77–99 and 124–201 and the corresponding E-to-H helices are included in the last predicted region with the highest value (see Figure 6d and Table 2). The DALI search picked up the corresponding E, G and H helix parts in this protein, as the parts are structurally similar to the query structure. However, there is a helix between the corresponding E and G helices in this protein, and thus we regard this helix as the helix corresponding to the F helix (Figure 6d).
2.2.3. Cell Invasion Protein SipA (2FM9)
This protein consists of only helices. Figures 6e and 6f show the 3D structure and ADM of the cell invasion protein SipA (2FM9). The predicted compact regions are 1–51, 79–115 and 166–199, as presented in Table 2, and the region 1–51 corresponds to the E-to-H unit with one of the highest values (0.292 for 1–51 and 0.297 for 166–199). The Dali search hit the segment corresponding to the E-to-G helices as the part structurally similar to the query structure (see also Figure S1 in Supplementary Material). The helix corresponding to the H helix is outside the predicted compact region, but the proximity of this helix to the E-to-G helix unit can be confirmed by visual inspection, as seen in Figure 6e.
2.2.4. Transcriptional Regulator RHA1_ro04179 (2NP5)
This protein consists of only helices. The 3D structure and ADM of the transcriptional regulator RHA1_ro04179 (2NP5) are presented in Figure 6g,h. The predicted compact regions are 5–38, 76–103, and 128–186, as shown in Table 2, and the last predicted region, which has the highest value, corresponds to the E (short part), G and H helices. The Dali search hit only the part corresponding to the G–H helices as the part structurally similar to the query structure (Figure S1 in the Supplementary Material). The other helix lies close to these two helices in a perpendicular orientation, and it can be regarded as the E helix (see Figure 6g). There is no helix corresponding to the F helix.
2.2.5. Hypothetical Protein AF0060 (2P06)
The 3D structure and ADM of the hypothetical protein AF0060 (2P06) are presented in Figure 6i and 6j. The ADM predicts almost the whole region to be a compact region (3–83), as seen in Table 2. The Dali search hit portions corresponding to the G–H helices and a small portion of the corresponding E helix as the part with a 3D structure similar to the query structure (Figure S1 in Supplementary Material). The structural similarity of this part to the E-to-H unit in leghemoglobin can also be confirmed by visual inspection (Figure 6i). The configurations of the four helices corresponding to the E-to-H helix unit in the proteins obtained by the present method are illustrated in Figure 7. In particular, 1R8J and 1XL3 have the same helix configuration as the helices in 1FSL and 1MBN. However, the helix configuration in 2FM9 and 2NP5 is almost a mirror image of that in 1FSL. This is because the Dali search picks up a protein with a 3D structure similar to that of a query protein by comparing the inter-Cα distances, which are unchanged by reflection, and thus a protein whose structure is a mirror image of the query can be hit. 2P06 contains a helix configuration similar to that of 1FSL, but the orientation of helix E is different. Thus, one should be careful about the helix configuration in each E-to-H helix unit. We call these helix configurations Configuration A (the configuration in 1FSL, 1MBN, 1R8J, and 1XL3), Configuration B (the configuration in 2FM9 and 2NP5), and Configuration C (the configuration in 2P06), respectively, as illustrated in Figure 7.
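The reason a distance-based comparison cannot tell a structure from its mirror image can be illustrated with a small sketch. It is not the Dali algorithm; it only shows that a pairwise distance matrix is unchanged by reflection, whereas a handedness measure (here a signed volume) changes sign. The coordinates are hypothetical toy values and are not taken from any PDB entry.

```python
import math


def distance_matrix(coords):
    """All pairwise Euclidean distances between 3D points."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def mirror(coords):
    """Reflect the points through the yz-plane (x -> -x)."""
    return [(-x, y, z) for x, y, z in coords]


def signed_volume(a, b, c, d):
    """Scalar triple product (b-a) . ((c-a) x (d-a)); its sign encodes handedness."""
    u = [b[k] - a[k] for k in range(3)]
    v = [c[k] - a[k] for k in range(3)]
    w = [d[k] - a[k] for k in range(3)]
    cross = (
        v[1] * w[2] - v[2] * w[1],
        v[2] * w[0] - v[0] * w[2],
        v[0] * w[1] - v[1] * w[0],
    )
    return sum(u[k] * cross[k] for k in range(3))


# Hypothetical toy coordinates (four points, arbitrary units).
toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0), (5.5, 4.0, 3.7)]
toy_mirror = mirror(toy)

same_distances = all(
    math.isclose(d1, d2)
    for row1, row2 in zip(distance_matrix(toy), distance_matrix(toy_mirror))
    for d1, d2 in zip(row1, row2)
)
print("distance matrices identical:", same_distances)
print("signed volumes:", signed_volume(*toy), signed_volume(*toy_mirror))
```

A check of this kind on four representative Cα atoms, one per helix, would be one simple way to separate Configuration A from Configuration B after a distance-based search; that use is a suggestion of this sketch and not a step reported in the text.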
Figure 7. The configurations of helices E, F, G and H. (a) Configuration A. 1FSL, 1MBN, 1R8J and 1XL3 belong to this category. (b) Configuration B. Mirror image of configuration A. 2FM9 and 2NP5 belong to this category. (c) Configuration C. Variant of Configuration A. 2P06 belongs to this category.
2.3. Conserved Residues in the E-to-H Unit
We performed BLAST searches with the amino acid sequences of the E-to-H helix parts of the five proteins, as well as those of leghemoglobin and myoglobin, followed by multiple alignments with MUSCLE (see method section, Section 4.4, for details). The conservation in a helix corresponding to the F helix in 1FSL or 1MBN was not taken into account because this helix is not indispensable for structure formation [3]. The statistics of the homologs found are summarized in Table 3. The accession codes of the homologues and their multiple alignments are presented in Figure S2 of the Supplementary Material. Because there is insufficient amino acid sequence diversity among the homologous sequences of 2FM9 and 2NP5, we could not identify any conserved residues in terms of the number of amino acid substitutions. The conservation in the amino acid sequence of 2P06 could not be evaluated since a BLAST search of the UniProtKB databases with the amino acid sequence of this protein as a query returned no significant hit. The configurations of the helices in these proteins, 2FM9, 2NP5, and 2P06, are shown in Figure 7b and 7c. Thus, we discuss the conservation of residues for the following four proteins: 1FSL, 1MBN, 1R8J, and 1XL3. All of these proteins contain the E-to-H helix unit with Configuration A (Figure 7a). Focusing on the conservation of hydrophobic residues, that is, A, F, I, L, M, V, and W, we present the conserved hydrophobic residues in each protein in Figure 8, indicating the conserved residues with the mark “˅”.
Table 3. Homologues of the proteins in the present analyses.

| Protein (Source, PDB) | Number of Homologs | Number of Conserved Residues | Number of Residues Containing E-to-H Helices |
|---|---|---|---|
| Leghemoglobin (soybean, 1FSL) | 45 | 34 | 88 |
| Myoglobin (sperm whale, 1MBN) | 82 | 38 | 96 |
| Circadian clock protein KaiA (Synechococcus, 1R8J) | 49 | 50 | 98 |
| Secretion control protein A chain (Yersinia, 1XL3) | 29 | 25 | 76 |
| Cell invasion protein SipA (Salmonella, 2FM9) | 6 | 0 | 79 |
| Transcriptional regulator RHA1_ro04179 (Rodococcus, 2NP5) | 3 | 0 | 74 |
| Hypothetical protein AF0060 (E. coli, 2P06) | 0 | 0 | 81 |
Figure 8. Conserved hydrophobic residues in the E-to-H helix unit of (a) 1FSL, (b) 1MBN, (c) 1R8J, (d) 1XL3, (e) 2FM9, (f) 2NP5, and (g) 2P06. The conserved residues are labeled with the mark “˅”. Any two residues with the same mark (one of the marks, #, %, ‡, †, ▲, ▼, ■, □, ○, ◊, and ∆) in different helices form a residue pair that makes a hydrophobic packing detected by the buried surface. Residues with “*” in a helix constitute a common residue pattern specific for the E-to-H helix unit. The residue with “*” actually does not form hydrophobic packing. For (e) 2FM9, (f) 2NP5 and (g) 2P06, a residue with a mark “+” indicates a residue pattern similar to one of the present common residue patterns in Table 4.
2.4. Residues Involved in Hydrophobic Packing Assigned Based on Buried Surface
The packing residue pairs in the E-to-H unit in each protein detected based on buried surface (see method section, Section 4.3, for details) are depicted in Figure 8. In this figure, any two residues with the same mark (one of the marks, #, %, ‡, †, ▲, ▼, ■, □, ○, ◊, and ∆) in different helices form a residue pair that makes hydrophobic packing. For the analyses of the packing residue pairs, the F helix was not considered in the present study because it has been confirmed that the F helix in leghemoglobin does not play a significant role in its folding, as already mentioned [3]. Relatively many packing pairs are observed between the G and H helices in every protein, while only a few packing residues are observed between the E helix and the G or H helix, as seen in Figure 8. From this figure, we can confirm that almost all the packing hydrophobic residue pairs are found among the conserved residues. Only 1 of the 58 packing residues, 181-I in 1XL3, is not a conserved residue under the present definition (Figure 8, see also Figure 5b).
2.5. Common Residue Patterns Specific to the E-to-H Helix Unit Defined from the Packing Patterns of Conserved Hydrophobic Residues
A common residue pattern might be defined from the information of both the conserved hydrophobic residues and the packing residues of the E-to-H helix unit in 1FSL, 1MBN, 1R8J and 1XL3. The common residue patterns defined by visual inspection are presented in Figure 8. A residue constituting a common pattern is labeled by the mark “*”. These residue patterns are also summarized in Table 4. The common residue patterns are expressed as Φx(3)Φx(3)Φ or Φx(2)Φx(4)Φ for the E helix, Φx(2,3)Φ(2)x(2,3)Φ for the G helix, and Φx(2)Φ(2)x(2)Φ for the H helix. The symbol Φ denotes a hydrophobic residue. Only 1 of the 44 residues in these patterns, 267-R in 1R8J (a residue with the mark “*” in Figure 8), does not form hydrophobic packing. The packing of the residues constituting the common residue patterns in the E-to-H helix unit in 1FSL, 1R8J and 1XL3 is presented in Figure 9a–c, respectively. For 2FM9, 2NP5, and 2P06, similar residue patterns are also inferred by visual inspection based only on the packing in Figure 8e–g. A residue in such a pattern in 2FM9, 2NP5, and 2P06 is labeled by “+” in Figure 8e–g. These residue patterns are also summarized in Table 4. The residue patterns for 2FM9, 2NP5, and 2P06 deviate somewhat from those in the four proteins discussed above. This is probably due to the different configurations of the E-to-H helix units (Figure 7b,c) compared with the helix configuration of the four proteins 1FSL, 1MBN, 1R8J, and 1XL3. Thus, the rule of common residue patterns cannot be applied strictly to 2FM9, 2NP5, and 2P06, as presented in Table 4.
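Table 4. Common residue patterns (Φ denotes a hydrophobic residue).

| Protein | E Helix | G Helix | H Helix |
|---|---|---|---|
| 1FSL | Φx(2)Φx(4)Φ | Φx(3)Φ(2)x(2)Φ | Φx(2)Φ(2)x(2)Φ |
| 1MBN | Φx(3)Φx(3)Φ | Φx(3)Φ(2)x(2)Φ | Φx(2)Φ(2)x(2)Φ |
| 1R8J | Φx(3)Φx(3)Φ | Φx(3)Φ(2)x(2)Φ | Φx(2)Φx(3)Φ |
| 1XL3 | Φx(3)Φx(3)Φ | Φx(2)Φ(2)x(3)Φ | Φx(2)Φ(2)x(3)Φ |
| 2FM9 | — | Φx(6)Φ | Φx(10)Φ |
| 2NP5 | Φx(2)Φx(3)Φ | Φx(3)Φ(2)xΦ | Φx(3)Φ(2)x(2)Φ |
| 2P06 | — | Φx(3)Φ | Φx(3)Φ(2)x(1)Φ |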
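Patterns of this kind can be scanned against a sequence with ordinary regular expressions. The sketch below is illustrative only and rests on two assumptions: that each pattern is read as hydrophobic positions (A, F, I, L, M, V, W, the set named in Section 2.3) separated by gaps of arbitrary residues, and that it is written in a hypothetical ASCII notation in which `h` stands for a hydrophobic position and `x(n)` or `x(m,n)` for a gap, with tokens joined by hyphens. The example sequence is a made-up toy string, not a sequence from any of the proteins studied.

```python
import re

HYDROPHOBIC = "AFILMVW"  # hydrophobic residue set named in Section 2.3

# One token: 'h' or 'x', optionally followed by (n) or (m,n).
TOKEN = re.compile(r"^(h|x)(?:\((\d+)(?:,\s*(\d+))?\))?$")


def pattern_to_regex(pattern):
    """Translate a hypothetical pattern such as 'h-x(3)-h-x(3)-h' to a regex."""
    parts = []
    for token in pattern.split("-"):
        m = TOKEN.match(token.strip())
        if m is None:
            raise ValueError(f"unrecognized token: {token!r}")
        kind, low, high = m.groups()
        base = f"[{HYDROPHOBIC}]" if kind == "h" else "."
        if low is None:
            parts.append(base)                      # single position
        elif high is None:
            parts.append(f"{base}{{{low}}}")        # exactly n positions
        else:
            parts.append(f"{base}{{{low},{high}}}")  # between m and n positions
    return re.compile("".join(parts))


def find_matches(pattern, sequence):
    """Return (1-based start, matched segment) for every start position that matches."""
    regex = pattern_to_regex(pattern)
    hits = []
    for start in range(len(sequence)):
        m = regex.match(sequence, start)
        if m:
            hits.append((start + 1, m.group()))
    return hits


# Hypothetical toy sequence, for illustration only.
toy_sequence = "GLKDEVAKRLSEAG"
print(find_matches("h-x(3)-h-x(3)-h", toy_sequence))
print(pattern_to_regex("h-x(2,3)-h(2)-x(2,3)-h").pattern)
```

Because such patterns are short and, as noted in the Discussion, resemble typical α-helical periodicity, a match on sequence alone would not identify an E-to-H helix unit; the sketch only shows how the pattern notation can be made operational.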
The locations of these residue patterns for 1R8J, 1XL3, 2FM9, 2NP5, and 2P06 correspond well to those for 1FSL. Figure S1 in Supplementary Material shows the amino acid sequence alignments based on the Dali algorithm of these proteins with 1FSL. In this figure, a residue with the mark “†” constitutes a common residue pattern.
3. Discussion
The results of the present study show that the common topology of the E-to-H helix unit can be observed beyond the globin-like fold when the Dali search is followed by ADM screening, which checks the tendency of the partial amino acid sequence of a hit region to form a compact structure. This finding suggests that the E-to-H helix unit is a common structural unit in the structural space of proteins. Thus, the E-to-H structures appearing in several protein folds need not have evolved from one common ancestor. Instead, the 3D structure of the E-to-H unit should be energetically stable, and so this unit is observed widely in the 3D structure space. As far as the globin-like fold is concerned, as mentioned in the introduction section, there are truncated hemoglobins, regarded as primitive hemoglobins, and the main structural unit of these proteins is the E-to-H helix unit. We speculate that the E-to-H unit was the basic structure in the early stage of the evolution of globin-like fold proteins, and that this unit might have grown into various globin-fold proteins during evolution. Proteins in other folds, for example, 1R8J, 1XL3 and so on, might also each have grown from an ancestor protein with a structure similar to the E-to-H helix unit during evolution. Among the hit proteins, 1R8J is interesting because this protein is composed of two domains, one being the E-to-H unit and the other an α/β domain, while the other hit proteins are all-α proteins. This also indicates that the E-to-H helix unit exists widely in the structural space of proteins. Examining the conservation of packing residues with respect to the homologues of each protein reveals common patterns of packing residue locations on the helices. These residue patterns seem to be typical patterns in α-helices, but the present specific combination of these residue patterns is thought to stabilize the 3D structure of the E-to-H helix unit. In other words, we speculate that the helix bundle formed by the G and H helices is stabilized and that the E helix interacts with this bundle, as shown in Figure 9. It might be possible to define a structural motif specific to the E-to-H helix unit by refining the present amino acid sequence patterns.
Figure 9. The hydrophobic packing formed by residues of the common patterns in (a) 1FSL; (b) 1R8J and (c) 1XL3.
For the other hit proteins, with Configurations B and C, only loosely defined common residue patterns are observed, as seen in Table 4. Thus, proteins with Configurations B and C show irregular patterns. Such proteins might be excluded from the hits returned by Dalilite by tuning the Z-score threshold. The analyses in the present study may be extended to other protein structural units and may lead us to find various common structural units beyond folds. We are currently pursuing studies in this direction.
4. Method
4.1. Search for Protein 3D Structures Similar to that of the E-to-H Helix Unit in Leghemoglobin
In order to search for 3D structures similar to that of the E-to-H helix unit, a 3D structure comparison program, Dalilite (http://ekhidna.biocenter.helsinki.fi/dali/star), was used [4] with the PDB structures. The 3D coordinates of the Cα atoms in the E-to-H unit of soybean leghemoglobin (PDB: 1FSL) were used as a query because the E-to-H unit in this protein has been confirmed as a folding core [2,3,26]. For the Dalilite searches for E-to-H-unit-like structures, we prepared a 3D structure database by excluding the globin-like fold proteins from the whole set of PDB structures. Because all the globin-like fold proteins are expected to possess an E-to-H unit, any PDB structure annotated as “globin-like fold” in SCOP was discarded from the Dali search. Within each SCOP family, the protein with the longest region aligned with the query structure by the Dalilite search was taken as the representative of the family.
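The selection logic described above (keep hits with Z ≥ 2.0, as stated in Section 2.1, then keep the hit with the longest aligned region in each SCOP family) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the record type `DaliHit`, its field names, and the example hits are all hypothetical, and the sketch does not parse real Dalilite output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DaliHit:
    """Hypothetical record for one structure hit; field names are assumptions."""
    pdb_id: str
    scop_family: str
    z_score: float
    aligned_length: int  # number of residues aligned with the query


def select_representatives(hits, z_cutoff=2.0):
    """Keep hits with Z >= z_cutoff, then the longest alignment per SCOP family."""
    best = {}
    for hit in hits:
        if hit.z_score < z_cutoff:
            continue  # below the Z-score threshold used in Section 2.1
        current = best.get(hit.scop_family)
        if current is None or hit.aligned_length > current.aligned_length:
            best[hit.scop_family] = hit
    return best


# Made-up hits for illustration only (not real search results).
example_hits = [
    DaliHit("hitA", "family-1", 4.4, 70),
    DaliHit("hitB", "family-1", 3.1, 82),
    DaliHit("hitC", "family-2", 1.8, 90),  # discarded: Z < 2.0
    DaliHit("hitD", "family-2", 2.0, 55),
]

representatives = select_representatives(example_hits)
for family, hit in sorted(representatives.items()):
    print(family, hit.pdb_id, hit.z_score, hit.aligned_length)
```

Ties in aligned length are resolved here in favour of the first hit encountered; the text does not state how ties were handled, so this is an arbitrary choice of the sketch.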
4.2. Prediction of Compact Regions Based on the Average Distance Map Method
To examine whether a polypeptide chain with the E-to-H-unit-like structure in a protein hit by Dalilite itself tends to form a compact structure, we employ the average distance map method. A summary of this method is presented below; readers can refer to [5,6] for more details. For a given amino acid sequence with an unknown 3D structure, a kind of predicted contact map is defined by plotting a possible contact on a map when the average distance of a pair of residues in a “range” is less than a cutoff value determined beforehand. A map defined with inter-residue average distance statistics is referred to as an Average Distance Map (ADM) in this paper. The procedure to construct an ADM is as follows.
(1) Calculation of average distances. The average distances between pairs of Cα atoms of residues were calculated in each range using proteins with known structures. A range is defined by the separation between two residues along the amino acid sequence of a given protein. The range M = 1 is defined by 1 ≤ k ≤ 8, where k = |i − j| for the i-th and j-th residues along the amino acid sequence. In the same way, the ranges M = 2, 3, 4 are defined by 9 ≤ k ≤ 20, 21 ≤ k ≤ 30, 31 ≤ k ≤ 40, and so on. For every range, the inter-Cα atomic distances between two residues were calculated using proteins with known 3D structures [5].
(2) Construction of ADM. To construct a map for a given amino acid sequence, a cutoff distance for each range should be defined. These values are defined so that the contact density of the whole real distance map (RDM) of the protein is reproduced [5]. The RDM refers to a typical contact map, in other words, a map constructed based on the determined 3D structure of a protein. In the present study, an RDM corresponds to a map constructed with a cutoff of inter-residue Cα atomic distance of 15 Å. The formula ρav = C/N, where N is the total number of residues of a protein and C is an adjustable constant, approximately predicts the average value of the contact density of the entire region of an RDM [5]. C = 36.12, which corresponds to the 15 Å threshold for the construction of an RDM, is used in the present work [5]. The cutoff distances for the construction of an ADM from the amino acid sequence of a protein are determined to reproduce this value of ρav. A different cutoff distance is found for each range to construct an ADM, whereas in the case of an RDM construction just one cutoff distance is required. In the construction of an ADM, it is assumed that the number of residue pairs that make contacts (and should therefore be plotted on a map) obeys the following equation in a range M [5]:
( )
Here, P(M)C is the number of amino acid pairs whose average distance in the range M is less than a given cutoff distance (i.e., residue pairs to be plotted on the ADM), and P(M)t is the total number of residue pairs in a given range (i.e., the number of pairs with statistical significance among all 210 pairs of residue types) [5].
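The range definition and the plotting rule of this section can be sketched in a few lines. The sketch makes several assumptions that are not taken from the text: ranges beyond M = 4 are taken to continue in steps of 10 residues ("and so on"), the average-distance table `avg_dist` and the per-range `cutoffs` are hypothetical toy values rather than the published statistics, and residue-type pairs without an entry in the table are simply skipped.

```python
def range_index(i, j):
    """Range M for residues i and j: 1-8 -> 1, 9-20 -> 2, 21-30 -> 3, 31-40 -> 4.

    For k > 40 the ranges are assumed to continue in steps of 10.
    """
    k = abs(i - j)
    if k < 1:
        raise ValueError("i and j must be different residues")
    if k <= 8:
        return 1
    if k <= 20:
        return 2
    return (k - 1) // 10 + 1


def build_adm(sequence, avg_dist, cutoffs):
    """Return the set of 1-based (i, j) pairs, i < j, plotted on the map.

    avg_dist maps (M, (aa1, aa2)) with aa1 <= aa2 to an average distance;
    cutoffs maps M to the cutoff distance for that range.
    """
    plots = set()
    n = len(sequence)
    for i in range(1, n):
        for j in range(i + 1, n + 1):
            m = range_index(i, j)
            pair = tuple(sorted((sequence[i - 1], sequence[j - 1])))
            d = avg_dist.get((m, pair))
            if d is not None and m in cutoffs and d < cutoffs[m]:
                plots.add((i, j))
    return plots


# Hypothetical toy statistics (distances in Å); not the published values.
toy_avg_dist = {
    (1, ("L", "L")): 9.0,
    (1, ("A", "L")): 7.5,
    (1, ("K", "L")): 12.0,
    (1, ("A", "K")): 11.0,
}
toy_cutoffs = {1: 10.0}

print(sorted(build_adm("LAKL", toy_avg_dist, toy_cutoffs)))
```

In the method itself the cutoffs are not free inputs but are chosen so that the map reproduces the predicted overall contact density; the sketch leaves that calibration step out.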
D is an adjustable parameter chosen so that the overall average density $\rho_{av}$ of the ADM is close to the value of $\rho_{av}$ predicted for an RDM. For an amino acid sequence of a protein, when a pair of residues has an average distance less than the cutoff distance in the range of the residue pair, a plot is made on a contact map. In this way, based on the inter-residue average distances, a kind of predicted contact map is produced from only the amino acid sequence of a given protein. A constructed map is analyzed by the following procedure. (1) Calculation of contact density differences.
Suppose that $\rho_i$ and $\tilde{\rho}_i$ represent the contact densities of the triangular and trapezoidal parts, respectively, when the whole area of a map is divided into two parts by a line parallel to the abscissa at the i-th residue or by a line parallel to the ordinate at the i-th residue, as illustrated in Figure 10a,b.
The contact density difference is defined as $\Delta\rho_i = \rho_i - \tilde{\rho}_i$.
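A minimal Python sketch of such a scan is given below. The geometry is an assumption made for illustration, since Figure 10 is not reproduced here: the triangular part is taken as all pairs among the first i residues and the trapezoidal part as all remaining pairs. The function name and the toy map are hypothetical.

```python
def density_difference_scan(contacts, n):
    """Scan the contact density difference along a map of n residues.

    contacts: set of pairs (a, b) with 0 <= a < b < n (plots on the map).
    Returns {i: difference} for i = 2 .. n-1, where i is the number of
    residues on the N-terminal side of the dividing line.
    Assumption: triangle = pairs with both residues among the first i
    residues; trapezoid = all remaining pairs.
    """
    total_pairs = n * (n - 1) // 2
    scan = {}
    for i in range(2, n):
        tri_pairs = i * (i - 1) // 2
        trap_pairs = total_pairs - tri_pairs
        tri_contacts = sum(1 for a, b in contacts if b < i)
        trap_contacts = len(contacts) - tri_contacts
        rho_tri = tri_contacts / tri_pairs      # density of triangular part
        rho_trap = trap_contacts / trap_pairs   # density of trapezoidal part
        scan[i] = rho_tri - rho_trap
    return scan


# Toy map: residues 0-4 of a 10-residue chain form a fully connected block.
toy_contacts = {(a, b) for a in range(5) for b in range(a + 1, 5)}
scan = density_difference_scan(toy_contacts, 10)
boundary = max(scan, key=scan.get)  # position of the highest peak
```

In this toy case the highest value of the scan falls at the edge of the dense block, which is the behaviour the boundary detection relies on.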
Figure 10. Schematic drawing of a map divided by a line parallel to the ordinate at the i-th residue (a), and divided by a line parallel to the abscissa at the i-th residue (b). The densities of plots in the trapezoidal and triangular parts are denoted by $\tilde{\rho}_i$ and $\rho_i$, respectively.
The values of the contact density difference, $\Delta\rho_i$, are calculated for residues 1 to N, providing a scanning plot of $\Delta\rho_i$. The scanning plot produced by the division using the line parallel to the ordinate is called horizontal scanning, and the plot produced by the line parallel to the abscissa is called vertical scanning. The superscripts h in $\Delta\rho_i^h$ and v in $\Delta\rho_i^v$ denote the horizontal and vertical divisions of a map, respectively.
(2) Detection of boundaries of a compact region. A maximum (peak) or minimum (valley) in the scanning plot reflects a large change in contact density on a map. Figure 11a shows a schematic example of a horizontal scanning plot of $\Delta\rho_i^h$ from 1 to N at the bottom of the figure; a peak and a valley appear at c and d, respectively, indicating a large change in contact density. The same situation is observed in the vertical scanning in Figure 11a, which shows a peak and a valley at a and b, respectively (on the left of the figure). The boundary of a compact region on a map can thus be detected from the peaks and valleys appearing in the horizontal and vertical scanning plots of contact density differences.
Figure 11. (a) Schematic drawing of a map with some plots. A peak and a valley appear at the boundaries of a highly dense region of plots. This map suggests the interaction between the segments a–b and c–d. (b) Hypothetical contact map with two compact areas near the diagonal along with the horizontal and vertical scanning plots. This map suggests the existence of two domains at the regions p–q and m–n. We define $\eta$ as a measure of the compactness of the region, namely, $\eta^{v}_{pq}$ or $\eta^{v}_{mn}$.
(3) Prediction of the location of compact regions. The following explains how to define a possible compact region on a map using the positions of the scanning plot peaks. Figure 11b shows a hypothetical contact map with two compact areas near the diagonal, together with the horizontal and vertical scanning plots. We recognize the existence of two domains from the peaks at residues m and n in the horizontal scanning plot and at residues p and q in the vertical scanning plot. Thus, the regions m–p and n–q on the map are predicted as possible compact regions or domains in the protein. In the ADM method, the strength of the compactness of a region m–p is measured by the $\eta$ value defined in Figure 11b [5]. The procedure described above predicts the locations of possible compact regions in a protein from only its amino acid sequence. The region with the highest $\eta$ value can be defined as the most probable compact region, and such a region can be interpreted as a folding unit in a protein.
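The detection of peaks and valleys in a scanning plot can be sketched in Python as follows. This is a simplified illustration that assumes a strict local-extremum criterion; the function name and the example values are hypothetical, and the criteria actually used in the ADM method are those described in [5].

```python
def local_extrema(values):
    """Return (peaks, valleys) as lists of indices into `values`.

    Assumption: a peak (valley) is a point strictly higher (lower)
    than both of its immediate neighbours; end points are ignored.
    """
    peaks, valleys = [], []
    for i in range(1, len(values) - 1):
        if values[i] > values[i - 1] and values[i] > values[i + 1]:
            peaks.append(i)
        elif values[i] < values[i - 1] and values[i] < values[i + 1]:
            valleys.append(i)
    return peaks, valleys


# Made-up scanning plot with one peak and one valley.
toy_scan = [0.0, 0.2, 0.9, 0.3, -0.1, -0.8, -0.2, 0.0]
peaks, valleys = local_extrema(toy_scan)
```

A peak in one scanning direction paired with a peak or valley in the other would then delimit a candidate compact region, as in Figure 11.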
4.3. Identification of Residues Forming Hydrophobic Packing
The residues forming the hydrophobic packing were identified by the decrease in solvent accessible surface area (SASA) upon folding. If the decrease in SASA exceeds 10 Å² for both residues of a given pair, the pair is regarded as a pair of hydrophobic packing residues. The reduction in surface area is given by the difference between the surface area of the side chain atoms in the presence of a contact with another residue upon folding and that in the absence of such a contact before folding. The SASAs were calculated numerically by enumerating the grid points placed on the mesh surface equidistant from each atom of a side chain [29]. The distance of a grid point from the center of an atom is the van der Waals radius of that atom plus 1.4 Å, the probe radius corresponding to the van der Waals radius of a solvent molecule (i.e., water). We confirmed that this criterion for residue-pair packing corresponds to a contact between atoms of the two residues within approximately 5 Å.
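The pair criterion can be illustrated with a small, pure-Python sketch of a point-counting SASA calculation in the spirit of [29]. Several details are assumptions made for illustration and are not taken from the present work: the golden-spiral placement of surface points, the number of points per atom, the carbon radius of 1.7 Å, and all function names.

```python
import math

PROBE = 1.4      # probe radius in angstroms (water), as stated in the text
C_RADIUS = 1.7   # assumed van der Waals radius of carbon, for the toy example


def sphere_points(n):
    """Roughly uniform points on a unit sphere (golden-spiral placement)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        pts.append((r * math.cos(golden * k), r * math.sin(golden * k), z))
    return pts


def sasa(atoms, others=(), n_points=200):
    """SASA of `atoms`, each (x, y, z, vdw_radius), shielded by each other
    and by the atoms in `others`."""
    unit = sphere_points(n_points)
    everything = list(atoms) + list(others)
    total = 0.0
    for idx, (x, y, z, rad) in enumerate(atoms):
        big_r = rad + PROBE
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = x + big_r * ux, y + big_r * uy, z + big_r * uz
            buried = False
            for jdx, (ox, oy, oz, orad) in enumerate(everything):
                if jdx == idx:
                    continue  # an atom does not shield its own surface
                d2 = (px - ox) ** 2 + (py - oy) ** 2 + (pz - oz) ** 2
                if d2 < (orad + PROBE) ** 2:
                    buried = True
                    break
            if not buried:
                exposed += 1
        total += 4.0 * math.pi * big_r * big_r * exposed / n_points
    return total


def is_packing_pair(side_a, side_b, threshold=10.0):
    """True if both side chains lose more than `threshold` square angstroms."""
    loss_a = sasa(side_a) - sasa(side_a, side_b)
    loss_b = sasa(side_b) - sasa(side_b, side_a)
    return loss_a > threshold and loss_b > threshold


# Toy example: two single-atom "side chains" 4 angstroms apart, then 20 apart.
near = is_packing_pair([(0.0, 0.0, 0.0, C_RADIUS)], [(4.0, 0.0, 0.0, C_RADIUS)])
far = is_packing_pair([(0.0, 0.0, 0.0, C_RADIUS)], [(20.0, 0.0, 0.0, C_RADIUS)])
```

In the toy example the two atoms placed 4 Å apart satisfy the criterion, whereas the atoms placed 20 Å apart do not shield each other at all.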
4.4. Identification of Conserved Residues in the E-to-H Helix Unit in Each Protein
In order to identify the conserved residues in the E-to-H helix unit of each protein hit by the Dali search, homologous amino acid sequences of the helix unit were collected from the UniProt amino acid sequence database (http://www.uniprot.org/) by means of BLAST. The amino acid sequence of the E-to-H helix unit of a representative protein for each family, as well as that of leghemoglobin (1FSL), was queried against UniRef100, a non-redundant set of the UniProt database. The expectation-value threshold for retrieving a hit sequence in the BLAST search was set to a sufficiently low value, E = 0.001, to ensure that homologous sequences were obtained. We made a multiple alignment of the obtained amino acid sequences for each query protein using MUSCLE [30]. In general, high similarity among homologous amino acid sequences may arise either from structural conservation or from phylogenetic closeness, and the two causes cannot be distinguished from similarity alone. We therefore conducted a phylogenetic analysis and compared the number of amino acid substitutions at each residue in order to find residues conserved across the homologous sequences because of 3D structural restraints. The phylogenetic trees of homologous amino acid sequences were reconstructed to obtain their ancestral amino acid sequences. The phylogenetic reconstruction was carried out by the neighbor-joining (NJ) method implemented in SeaView [31]. The ancestral amino acid sequences were estimated with PAML 4.0 [32], using the reconstructed tree as the input tree and the JTT model as the amino acid substitution model. The number of amino acid substitutions through the tree was determined for every residue from the estimated ancestral amino acid sequences.
The total length of the phylogenetic tree corresponds to the expected number of amino acid substitutions averaged over all residues, and the distribution of the number of amino acid substitutions can be approximated by a Poisson distribution. Therefore, a position in a multiple alignment whose number of amino acid substitutions falls within the lower 5% of the Poisson distribution with the mean estimated from the tree length can be regarded as a genuinely conserved residue.
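One reading of this criterion can be sketched in Python as follows. The sketch assumes that a site counts as conserved when the cumulative Poisson probability of observing at most its number of substitutions is 0.05 or less, with the tree length as the mean; the function names and the substitution counts are hypothetical.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for a Poisson distribution with mean lam."""
    return sum(math.exp(-lam) * lam ** x / math.factorial(x)
               for x in range(k + 1))


def conserved_sites(substitutions, tree_length, alpha=0.05):
    """Return alignment positions whose substitution count lies in the
    lower `alpha` tail of Poisson(tree_length).

    Assumption: the tree length is used directly as the expected number
    of substitutions per site.
    """
    return [pos for pos, k in enumerate(substitutions)
            if poisson_cdf(k, tree_length) <= alpha]


# Made-up substitution counts for five alignment positions.
toy_counts = [0, 1, 2, 7, 1]
toy_conserved = conserved_sites(toy_counts, tree_length=6.0)
```

With a mean of six substitutions per site, positions with zero or one substitution fall in the lower 5% tail, while a position with two substitutions does not.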
Acknowledgments
This work was supported by a JSPS KAKENHI Grant-in-Aid for JSPS Fellows to M.M. (Grant No. 259198). One of the authors (Takeshi Kikuchi) also expresses his gratitude to the Ministry of Education, Culture, Sports, Science and Technology for supporting the present work through a program for strategic research foundations at private universities, 2010–2014 (Grant No. S10010).
Conflicts of Interest
The authors declare no conflict of interest.
References
1. Orengo, C.A.; Jones, D.T.; Thornton, J.M. Protein superfamilies and domain superfolds. Nature 1994, 372, 631–634.
2. Nakajima, S.; Álvarez-Salgado, E.; Kikuchi, T.; Arredondo-Peter, R. Prediction of folding pathway and kinetics among plant hemoglobins by using an average distance map method. Proteins 2005, 61, 500–506.
3. Nishimura, C.; Prytulla, S.; Dyson, H.J.; Wright, P.E. Conservation of folding pathways in evolutionarily distant globin sequences. Nature Struct. Biol. 2000, 7, 679–686.
4. Holm, L.; Park, J. DaliLite workbench for protein structure comparison. Bioinformatics 2000, 16, 566–567.
5. Kikuchi, T.; Némethy, G.; Scheraga, H.A. Prediction of the location of structural domains in globular proteins. J. Protein Chem. 1988, 7, 427–471.
6. Kikuchi, T. Decoding amino acid sequences of proteins using inter-residue average distance statistics to extract information on protein folding mechanisms. In Protein Folding; Walters, E.C., Ed.; Nova Science Publishers, Inc.: New York, NY, USA, 2011; pp. 465–487, ISBN 978-1-61728-990-3.
7. Tai, C.H.; Sam, V.; Gibrat, J.F.; Garnier, J.; Munson, P.J.; Lee, B. Protein domain assignment from the recurrence of locally similar structures. Proteins 2011, 79, 853–866.
8. Szustakowski, J.D.; Kasif, S.; Weng, Z. Less is more: Towards an optimal universal description of protein folds. Bioinformatics 2005, 21, ii66–ii71.
9. Carlacci, L.; Chou, K.C. Energetic approach to the folding of four α-helices connected sequentially. Protein Eng. 1990, 3, 509–514.
10. Chou, K.C.; Maggiora, G.M.; Némethy, G.; Scheraga, H.A. Energetics of the structure of the four-alpha-helix bundle in proteins. Proc. Natl. Acad. Sci. USA 1988, 85, 4295–4299.
11. Chou, K.C.; Maggiora, G.M.; Scheraga, H.A. The role of loop-helix interactions in stabilizing four-helix bundle proteins. Proc. Natl. Acad. Sci. USA 1992, 89, 7315–7319.
12. Chou, K.C.; Némethy, G.; Pottle, M.; Scheraga, H.A. Energy of stabilization of the right-handed beta-alpha-beta crossover in proteins. J. Mol. Biol. 1989, 205, 241–249.
13. Chou, K.C.; Némethy, G.; Scheraga, H.A. Energetic approach to packing of α-helices: 2. General treatment of nonequivalent and nonregular helices. J. Amer. Chem. Soc. 1984, 106, 3161–3170.
14. Chou, K.C. Review: Applications of graph theory to enzyme kinetics and protein folding kinetics. Steady and non-steady state systems. Biophys. Chem. 1990, 35, 1–24.
15. Chou, K.C. Does the folding type of a protein depend on its amino acid composition? FEBS Lett. 1995, 363, 127–131.
16. Chou, K.C.; Carlacci, L. Energetic approach to the folding of alpha/beta barrels. Proteins 1991, 9, 280–295.
17. Chou, K.C.; Zhang, C.T. Predicting protein folding types by distance functions that make allowances for amino acid interactions. J. Biol. Chem. 1994, 269, 22014–22020.
18. Zhang, C.T.; Chou, K.C. An eigenvalue-eigenvector approach to predicting protein folding types. J. Protein Chem. 1995, 14, 309–326.
19. Gerritsen, M.; Chou, K.C.; Némethy, G.; Scheraga, H.A. Energetics of multi-helix interactions in protein folding: Application to myoglobin. Biopolymers 1985, 24, 1271–1293.
20. Chou, K.C.; Lin, W.Z.; Xiao, X. Wenxiang: A web-server for drawing wenxiang diagrams. Natur. Sci. 2011, 3, 862–865.
21. Chou, K.C.; Zhang, C.T.; Maggiora, G.M. Disposition of amphiphilic helices in heteropolar environments. Proteins 1997, 28, 99–108.
22. Zhou, G.P. The disposition of the LZCC protein residues in wenxiang diagram provides new insights into the protein-protein interaction mechanism. J. Theor. Biol. 2011, 284, 142–148.
23. Zhou, G.P. The structural determinations of the leucine zipper coiled-coil domains of the cGMP-dependent protein kinase I alpha and its interaction with the myosin binding subunit of the myosin light chains phosphatase. Protein Peptide Lett. 2011, 18, 966–978.
24. Zhou, G.P.; Huang, R.B. The pH-triggered conversion of the PrP(c) to PrP(sc). Curr. Top. Med. Chem. 2013, 13, 1152–1163.
25. Kawai, Y.; Matsuoka, M.; Kikuchi, T. Analyses of protein sequences using inter-residue average distance statistics to study folding processes and the significance of their partial sequences. Protein Peptide Lett. 2011, 18, 979–990.
26. Ichimaru, T.; Kikuchi, T. Analysis of the differences in the folding kinetics of structurally homologous proteins based on predictions of the gross features of residue contacts. Proteins 2003, 51, 515–530.
27. Ishizuka, Y.; Kikuchi, T. Analysis of the local sequences of folding sites in sandwich proteins with inter-residue average distance statistics. Open Bioinforma. J. 2011, 5, 59–68.
28. Nakajima, S.; Kikuchi, T. Analysis of the differences in the folding mechanisms of c-type lysozymes based on contact maps constructed with interresidue average distances. J. Mol. Model. 2007, 13, 587–594.
29. Shrake, A.; Rupley, J.A. Environment and exposure to solvent of protein atoms. Lysozyme and insulin. J. Mol. Biol. 1973, 79, 351–371.
30. Edgar, R.C. MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32, 1792–1797.
31. Gouy, M.; Guindon, S.; Gascuel, O. SeaView version 4: A multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Mol. Biol. Evol. 2010, 27, 221–224.
32. Yang, Z. PAML 4: Phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 2007, 24, 1586–1591.
© 2014 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).
Table S1: All Families hit by the present Dali search
14-3-3 protein ALDH-like
1-deoxy-D-xylulose-5-phosphate reductoisomerase, Allosteric chorismate mutase
C-terminal domain alpha-Amylases, C-terminal beta-sheet domain
2Fe-2S ferredoxin domains from multidomain proteins alpha-catenin/vinculin
3-Methyladenine DNA glycosylase III (MagIII) alpha-D-glucuronidase/Hyaluronidase catalytic domain
5-carboxymethyl-2-hydroxymuconate isomerase (CHMI) alpha-D-glucuronidase, N-terminal domain
5' nucleotidase-like Alpha-hemoglobin stabilizing protein AHSP
5' to 3' exonuclease catalytic domain alpha-Subunit of urease
5' to 3' exonuclease, C-terminal subdomain alpha-subunit of urease, catalytic domain
6-phosphogluconate dehydrogenase-like, N-terminal Amidase signature (AS) enzymes
domain A middle domain of Talin 1
ABC transporter ATPase domain-like Aminoacid dehydrogenase-like, C-terminal domain
Aconitase B, N-terminal domain Aminoacid dehydrogenases
Aconitase iron-sulfur domain Aminoglycoside phosphotransferases
Actin/HSP70 AmyC C-terminal domain-like
Acyl-CoA binding protein AmyC N-terminal domain-like
acyl-CoA oxidase C-terminal domains Amylase, catalytic domain
acyl-CoA oxidase N-terminal domains Animal lipoxigenases
Adapter protein APS, dimerisation domain Anticodon-binding domain of a subclass of class I
Adenylation domain of NAD+-dependent DNA ligase aminoacyl-tRNA synthetases
Adenylyl and guanylyl cyclase catalytic domain Antioxidant defense protein AhpD
Adenylylcyclase toxin (the edema factor) Anti-sigma factor AsiA
a domain/subunit of cytochrome bc1 complex Apolipophorin-III
(Ubiquinol-cytochrome c reductase) Apolipoprotein
Aerobic respiration control sensor protein, ArcB Aquaporin-like
AF0060-like Archaeal DNA-binding protein
AF0941-like Arfaptin, Rac-binding fragment
AF1104-like Arginyl-tRNA synthetase (ArgRS), N-terminal 'additional'
AgoG-like domain
Alcohol dehydrogenase-like, C-terminal domain Aristolochene/pentalenene synthase
Alcohol dehydrogenase-like, N-terminal domain Armadillo repeat
Aldehyde ferredoxin oxidoreductase, C-terminal domains Arp2/3 complex 16 kDa subunit ARPC5
Aldehyde ferredoxin oxidoreductase, N-terminal domain Arylsulfatase Aspartate/ornithine carbamoyltransferase Calmodulin-like
Aspartokinase allosteric domain-like Calpain large subunit, catalytic domain (domain II)
ATP synthase Calpain large subunit, middle domain (domain III)
ATP synthase (F1-ATPase), gamma subunit cAMP-binding domain a tRNA synthase domain CAPPD, an extracellular domain of amyloid beta A4 protein
Atu0492-like Cell cycle transcription factor e2f-dp
Avirulence protein AvrPto Cell division protein ZapA-like
B12-dependent (class II) ribonucleotide reductase Cellulases catalytic domain
Bacterial dinuclear zinc exopeptidases Cellulose-binding domain family III
Bacterial exopeptidase dimerisation domain Centromere-binding
Bacterial GAP domain Cfr10I/Bse634I
Bacterial glucoamylase C-terminal domain-like Chaperone J-domain
Bacterial glucoamylase N-terminal domain-like Chemotaxis phosphatase CheZ
Bacterial photosystem II reaction centre, L and M subunits Chemotaxis protein CheA P1 domain
Bacterial tryptophan 2,3-dioxygenase CheW-like
Bacteriophage CII protein Cholesterol oxidase
Bacteriorhodopsin-like Choline/Carnitine O-acyltransferase
BAG domain Circadian clock protein KaiA, C-terminal domain
Band 7/SPFH domain Class I alpha-1;2-mannosidase, catalytic domain
BAR domain Class I aminoacyl-tRNA synthetases (RS), catalytic domain
Barrier-to-autointegration factor, BAF Class II aminoacyl-tRNA synthetase (aaRS)-like, catalytic
BAS1536-like domain
BC ATP-binding domain-like Class-II DAHP synthetase
BC C-terminal domain-like Class II glutamine amidotransferases
Bcl-2 inhibitors of programmed cell death Class III anaerobic ribonucleotide reductase NRDD subunit
BC N-terminal domain-like clathrin assemblies
Beta-D-glucan exohydrolase, C-terminal domain Clostridium neurotoxins, catalytic domain beta-Lactamase/D-ala carboxypeptidase Clostridium neurotoxins, "coiled-coil" domain beta-Phosphoglucomutase-like Clostridium neurotoxins, C-terminal domain
Biotin repressor-like Clostridium neurotoxins, the second last domain
BRCA2 helical domain CMD-like
BRCA2 tower domain Cobalamin adenosyltransferase
Bromodomain Cobalamin (vitamin B12)-binding domain
Calcium ATPase, transduction domain A CO dehydrogenase flavoprotein C-terminal domain-like
Calcium ATPase, transmembrane domain M CO dehydrogenase flavoprotein N-terminal domain-like CO dehydrogenase ISP C-domain like Cytochrome c'-like
CO dehydrogenase molybdoprotein N-domain-like Cytochrome c oxidase subunit III-like
Coiled-coil domain of nucleotide exchange factor GrpE Cytochrome c oxidase subunit II-like, transmembrane
Cold shock DNA-binding domain-like region
Colicin E3 receptor domain Cytochrome c oxidase subunit I-like
Computational models partly based on NMR data Cytochrome P450
Conserved hypothetical protein MTH1747 Cytochrome p450 reductase N-terminal domain-like
CorA soluble domain-like Cytoplasmic domain of inward rectifier potassium channel
Coronavirus NSP8-like D-aminoacid aminotransferase-like PLP-dependent
Coronavirus S2 glycoprotein enzymes
COX17-like DBL homology domain (DH-domain)
Creatinase/aminopeptidase Dcp2 box A domain
Creatinase/prolidase N-terminal domain Decarboxylase
Crotonase-like Dehydroquinate synthase, DHQS
Cryptochrome/photolyase FAD-binding domain delta-Endotoxin, C-terminal domain
Cryptochrome/photolyase, N-terminal domain delta-Endotoxin (insectocide), middle domain
CSE2-like delta-Endotoxin (insectocide), N-terminal domain
C-terminal domain of alpha and beta subunits of F1 ATP Designed four-helix bundle protein synthase Designed single chain three-helix bundle
C-terminal domain of class I lysyl-tRNA synthetase DHN aldolase/epimerase
C-terminal domain of DFF45/ICAD (DFF-C domain) Dihydrodipicolinate reductase-like
C-terminal domain of eukaryotic peptide chain release Dimeric chorismate mutase factor subunit 1, ERF1 Dimerisation domain of CENP-B
C-terminal domain of Ku80 Dimerization-anchoring domain of cAMP-dependent PK
C-terminal fragment of elongation factor SelB regulatory subunit
C-terminal fragment of thermolysin DinB-like
C-terminal UvrC-binding domain of UvrB Diol dehydratase, gamma subunit
CUE domain DLC
Cullin homology domain DNA-binding N-terminal domain of transcription activators
Cullin repeat DNA-binding protein Mj223
Cyclin DNA damage-inducible protein DinI
Cyclophilin (peptidylprolyl isomerase) DnaK suppressor protein DksA, alpha-hairpin domain
Cytochrome b562 DNA ligase/mRNA capping enzyme postcatalytic domain
Cytochrome b of cytochrome bc1 complex DNA polymerase I
(Ubiquinol-cytochrome c reductase) DNA polymerase III clamp loader subunits, C-terminal domain Exocyst complex component
DnaQ-like 3'-5' exonuclease Exodeoxyribonuclease V beta chain (RecC), C-terminal
DNA repair protein MutS, domain I domain
DNA repair protein MutS, domain II EXOSC10 HRDC domain-like
DNA repair protein MutS, domain III Extended AAA-ATPase domain
DNA topoisomerase IV, alpha subunit F1F0 ATP synthase subunit A
Domain of poly(ADP-ribose) polymerase F1F0 ATP synthase subunit C
Domain of the SRP/SRP receptor G-proteins FAD-linked oxidases, N-terminal domain
Double Clp-N motif FAD-linked reductases, N-terminal domain double-SIS domain FAD/NAD-linked reductases, dimerisation (C-terminal)
E2 regulatory, transactivation domain domain
EF2458-like FAD/NAD-linked reductases, N-terminal and central
EF-hand modules in multidomain proteins domains eIF-2-alpha, C-terminal domain FAT domain of focal adhesion kinase eIF2alpha middle domain-like Fe,Mn superoxide dismutase (SOD), C-terminal domain
Electron transfer flavoprotein-ubiquinone Fe,Mn superoxide dismutase (SOD), N-terminal domain oxidoreductase-like FemXAB nonribosomal peptidyltransferases
Elongation factor Ts (EF-Ts), dimerisation domain Ferredoxin domains from multidomain proteins
EntA-Im Ferritin
Enzyme IIa from lactose specific PTS, IIa-lac Fertilization protein
Enzyme I of the PEP:sugar phosphotransferase system FHV B2 protein-like
HPr-binding (sub)domain Fibrinogen coiled-coil and central regions
Epoxide hydrolase FIS-like
Epsilon subunit of F1F0-ATP synthase C-terminal domain FKBP12-rapamycin-binding domain of
Epsilon subunit of F1F0-ATP synthase N-terminal domain FKBP-rapamycin-associated protein (FRAP)
ERP29 C domain-like FKBP immunophilin/proline isomerase
ERP29 N domain-like Flagellar export chaperone FliS
ESAT-6 like FMN-linked oxidoreductases
E-set domains of sugar-utilizing enzymes Formate dehydrogenase N, cytochrome (gamma) subunit
ETF-QO domain-like Formin homology 2 domain (FH2 domain)
Eukaryotic DNA topoisomerase I, catalytic core Fumarate reductase respiratory complex cytochrome b
Eukaryotic DNA topoisomerase I, dispensable insert domain subunit, FrdC
Eukaryotic DNA topoisomerase I, N-terminal DNA-binding Fungal zinc peptidase fragment GABA-aminotransferase-like
Eukaryotic type KH-domain (KH-domain type I) GAPDH-like GAT domain Heat-inducible transcription repressor HrcA, N-terminal
Gated mechanosensitive channel domain
G domain-linked domain Heat shock protein 70kD (HSP70), C-terminal subdomain
Glutamate-cysteine ligase Heat shock protein 70kD (HSP70), peptide-binding domain
Glutamyl tRNA-reductase catalytic, N-terminal domain Hect, E3 ligase catalytic domain
Glutamyl tRNA-reductase dimerization domain Helical scaffold and wing domains of SecA
Glutathione peroxidase-like Hemerythrin
Glutathione S-transferase (GST), C-terminal domain Hepatitis B viral capsid (hbcag)
Glutathione S-transferase (GST), N-terminal domain Hermes dimerisation domain
Glyceraldehyde-3-phosphate dehydrogenase-like, Hermes transposase-like
N-terminal domain HisE-like (PRA-PH)
Glycerol-3-phosphate (1)-acyltransferase Histidine kinase
Glycerol-3-phosphate dehydrogenase HlyD-like secretion proteins
(Glycosyl)asparaginase HMA, heavy metal-associated domain
Glycosyltransferase family 36 C-terminal domain HMD dimerization domain-like
Glycosyltransferase family 36 N-terminal domain HMG-box
GMC oxidoreductases Homeodomain
G proteins Homodimeric domain of signal transducing histidine kinase
GreA transcript cleavage factor, C-terminal domain HP1531-like
GreA transcript cleavage protein, N-terminal domain HR1 repeat
GRIP domain HrcA C-terminal domain-like
GroEL chaperone, ATPase domain HRDC domain from helicases
GroEL-like chaperone, apical domain HSC20 (HSCB), C-terminal oligomerisation domain
GroEL-like chaperone, intermediate domain Hsp90 co-chaperone CDC37
Group II chaperonin (CCT, TRIC), apical domain HSP90 C-terminal domain
Group II chaperonin (CCT, TRIC), ATPase domain Hut operon positive regulatory protein HutP
Group II chaperonin (CCT, TRIC), intermediate domain Hyaluronate lyase-like catalytic, N-terminal domain
Group I mobile intron endonuclease Hyaluronate lyase-like, central domain
Group V grass pollen allergen Hyaluronate lyase-like, C-terminal domain
GTP cyclohydrolase I Hyaluronidase N-terminal domain-like half-ferritin Hyaluronidase post-catalytic domain-like
Haloperoxidase (bromoperoxidase) Hybrid and chimeric proteins
HAL/PAL-like Hybrid cluster protein (prismane protein)
HCDH C-domain-like Hypothetical protein D-63
Head domain of nucleotide exchange factor GrpE Hypothetical protein MTH393 Hypothetical protein MTH677 domain
Hypothetical protein Ta1238 LeuD-like
Hypothetical protein TM1457 Lipoxigenase N-terminal domain
Hypothetical protein VC0424 Long-chain cytokines
Hypothetical protein YfhH L-sulfolactate dehydrogenase-like
Hypothetical protein YhaI Magnesium transport protein CorA, transmembrane region
Hypothetical Protein YjhP Mago nashi protein
IF2B-like Malate synthase G
I/LWEQ domain Malic enzyme N-domain
IMD domain MarR-like transcriptional regulators
Influenza hemagglutinin (stalk) Mason-Pfizer monkey virus matrix protein
Inositol MazG-like monophosphatase/fructose-1,6-bisphosphatase-like Mechanosensitive channel protein MscS (YggB), C-terminal
Inositol polyphosphate 5-phosphatase (IPP5) domain
Insect pheromone/odorant-binding proteins Mechanosensitive channel protein MscS (YggB), middle
Insert subdomain of RNA polymerase alpha subunit domain
Interferon-induced guanylate-binding protein 1 (GBP1), Mechanosensitive channel protein MscS (YggB),
C-terminal domain transmembrane region
Interferons/interleukin-10 (IL-10) MED7 hinge region
IpaD-like Medium chain acyl-CoA dehydrogenase-like, C-terminal
Iron-containing alcohol dehydrogenase domain
Isoprenyl diphosphate synthases Medium chain acyl-CoA dehydrogenase, NM (N-terminal
ISY1 N-terminal domain-like and middle) domains
KaiB-like Membrane ion ATPase
L27 domain Meta-cation ATPase, catalytic domain P
Lactate & malate dehydrogenases, C-terminal domain Metal cation-transporting ATPase, ATP-binding domain N
Lambda integrase-like, catalytic core Methane monooxygenase hydrolase, gamma subunit lambda integrase N-terminal domain Methicillin resistance protein FemA probable tRNA-binding
Large subunit arm
L-aspartase/fumarase Methionine synthase domain
LcrE-like Methionine synthase SAM-binding domain
LDH N-terminal domain-like Methionyl-tRNA synthetase (MetRS), Zn-domain
LemA-like Methyl-accepting chemotaxis protein (MCP) signaling
Lesion bypass DNA polymerase (Y-family), catalytic domain domain
Lesion bypass DNA polymerase (Y-family), little finger Methyl-coenzyme M reductase alpha and beta chain C-terminal domain NadC N-terminal domain-like
Methyl-coenzyme M reductase alpha and beta chain NAD+-dependent DNA ligase, domain 3
N-terminal domain NADH oxidase/flavin reductase
Methylmalonyl-CoA mutase, N-terminal (CoA-binding) NADPH-cytochrome p450 reductase FAD-binding domain domain-like
Middle domain of eukaryotic peptide chain release factor NADPH-cytochrome p450 reductase-like subunit 1, ERF1 NagZ-like
MIF4G domain-like Neurotransmitter-gated ion-channel transmembrane pore
MIF-related NF-kappa-B/REL/DORSAL transcription factors,
Mismatch glycosylase C-terminal domain
MIT domain Nickel-containing superoxide dismutase, NiSOD
Mitochondrial ATP synthase coupling factor 6 Nickel-iron hydrogenase, large subunit
Mitochondrial glycoprotein MAM33-like NifU C-terminal domain-like
Mob1/phocein NifU/IscU domain modified HD domain Nitric oxide (NO) synthase oxygenase domain
Molybdenum cofactor-binding domain Nitrogenase iron protein-like
Molybdopterin synthase subunit MoaE NKL-like monodomain cytochrome c Non-heme 11 kDa protein of cytochrome bc1 complex monomeric chorismate mutase (Ubiquinol-cytochrome c reductase)
Motor proteins Nonstructural protein ns2, Nep, M1-binding domain
MPP-like Nop domain mRNA decapping enzyme-like NSP3 homodimer
MTH1187-like N-terminal domain of adenylylcyclase associated protein,
Multidrug efflux transporter AcrB pore domain; PN1, PN2, CAP
PC1 and PC2 subdomains N-terminal domain of alpha and beta subunits of F1 ATP
Multidrug efflux transporter AcrB TolC docking domain; DN synthase and DC subdomains N-terminal domain of cbl (N-cbl)
Multidrug efflux transporter AcrB transmembrane domain N-terminal domain of enzyme I of the PEP:sugar
MxiH-like phosphotransferase system
Myb/SANT domain N-terminal domain of eukaryotic peptide chain release
Myosin S1 fragment, N-terminal domain factor subunit 1, ERF1
N-acetyl transferase, NAT N-terminal domain of the circadian clock protein KaiA
N-acylglucosamine (NAG) epimerase N-terminal domain of the delta subunit of the F1F0-ATP
NAD-binding domain of HMG-CoA reductase synthase
NadC C-terminal domain-like N-terminal, RNA-binding domain of nonstructural protein NS1 PHBH-like
Nuclear receptor ligand-binding domain Phenylalanyl-tRNA synthetase (PheRS)
Nucleotide and nucleoside kinases PhoB-like
NusA extra C-terminal domains Phoshoinositide 3-kinase (PI3K), catalytic domain
Occludin/ELL domain Phoshoinositide 3-kinase (PI3K) helical domain
Ohr/OsmC resistance proteins Phosphate binding protein-like
Oligosaccharide phosphorylase Phosphoenolpyruvate carboxylase
OmpH-like Phosphonoacetaldehyde hydrolase-like
Orbivirus capsid Phosphorelay protein-like
Orbivirus core Phosphorelay protein luxU
Outer membrane efflux proteins (OEP) Phosphoserine phosphatase RsbU, N-terminal domain
Outer surface protein C (OspC) Photosystem I reaction center subunit XI, PsaL
Oxygen-evolving enhancer protein 3, Photosystems
P40 nucleoprotein PhoU-like
PadR-like PIN domain
Pantothenate synthetase (Pantoate-beta-alanine ligase, Plant invertase/pectin methylesterase inhibitor
PanC) Plant lipoxigenases
PAPS sulfotransferase PLC-like (P variant)
PDEase Pleckstrin-homology domain (PH domain)
PE Poly(ADP-ribose) polymerase, C-terminal domain
Penicillin-binding protein 2x (pbp-2x), c-terminal domain Poly A polymerase C-terminal region-like
Penicillin binding protein dimerisation domain Poly A polymerase head domain-like
Penta-EF-hand proteins Polynucleotide phosphorylase/guanosine pentaphosphate
PEP carboxykinase C-terminal domain synthase (PNPase/GPSI), domain 3
PEP carboxykinase N-terminal domain Polyphosphate kinase C-terminal domain
Pepsin-like Potassium channel NAD-binding domain
PepX catalytic domain-like POU-specific domain
PepX C-terminal domain-like PPE
Periplasmic domain of cytochrome c oxidase subunit II PPK middle domain-like
PF1790-like PPK N-terminal domain-like
PfkB-like kinase Predicted hydrolases Cof
PFOR PP module Prefoldin
PFOR Pyr module Pre-protein crosslinking domain of SecA
PG0816-like PriB N-terminal domain-like
Phase 1 flagellin Prion-like Prokaryotic DksA/TraR C4-type zinc finger RecG "wedge" domain
Prokaryotic phospholipase A2 Regulator of G-protein signaling, RGS
Prokaryotic type I DNA topoisomerase RelA/SpoT domain
Prokaryotic type KH domain (KH-domain type II) Rel/Dorsal transcription factors, DNA-binding domain
Proteasome subunits Release factor
Proteinase/alpha-amylase inhibitors Reovirus components
Protein HNS-dependent expression A; HdeA Restriction endonuclease EcoO109IR
Protein kinases, catalytic subunit Retrovirus capsid protein C-terminal domain
Protein-L-isoaspartyl O-methyltransferase Retrovirus capsid protein, N-terminal core domain
Protein prenylyltransferase Ribonucleotide reductase-like
Pseudo ankyrin repeat Ribosomal protein L19 (L19e)
PTS-regulatory domain, PRD Ribosomal protein L29 (L29p)
Putative anticodon-binding domain of alanyl-tRNA Ribosomal protein L7/12, C-terminal domain synthetase (AlaRS) Ribosomal protein L7/12, oligomerisation (N-terminal)
Putative thiamin/HMP-binding protein YkoF domain
Putative transcriptional regulator TM1602, C-terminal Ribosomal protein S15 domain Ribosomal protein S20
PyrH-like Ribosome-binding factor A, RbfA
Pyrimidine 5'-nucleotidase (UMPH-1) Ribosome complexes
Pyruvate-ferredoxin oxidoreductase, PFOR, domain II Ribosome recycling factor, RRF
Pyruvate-ferredoxin oxidoreductase, PFOR, domain III RING finger domain, C3HC4
Pyruvate phosphate dikinase, central domain RNA polymerase
Pyruvate phosphate dikinase, C-terminal domain RNA polymerase alpha subunit dimerisation domain
Pyruvate phosphate dikinase, N-terminal domain RNA-polymerase beta
R1 subunit of ribonucleotide reductase, C-terminal domain RNA-polymerase beta-prime
R1 subunit of ribonucleotide reductase, N-terminal domain RNase III catalytic domain-like
Rabenosyn-5 Rab-binding domain-like Roadblock/LC7 domain
Rab geranylgeranyltransferase alpha-subunit, C-terminal ROK domain ROK associated domain
Rab geranylgeranyltransferase alpha-subunit, insert domain ROP protein
Rad50 coiled-coil Zn hook RuBisCo LSMT catalytic domain
RAP domain RuBisCo LSMT C-terminal, substrate-binding domain
Ras-binding domain, RBD Rubredoxin
RecA protein-like (ATPase-domain) RWD domain
RecG, N-terminal domain S100 proteins Saposin B Succinate dehydrogenase/fumarate reductase
SCF ubiquitin ligase complex WHB domain flavoprotein C-terminal domain
ScpB/YpuH-like Succinate dehydrogenase/fumarate reductase
Sec1/munc18-like (SM) proteins flavoprotein N-terminal domain
Secreted chorismate mutase-like Succinate dehydrogenase/Fumarate reductase
Serine acetyltransferase transmembrane subunits (SdhC/FrdC and SdhD/FrdD)
Serum albumin-like Superantigen MAM
Seryl-tRNA synthetase (SerRS) Swaposin
SH2 domain Synatpobrevin N-terminal domain
SH3-domain T7 RNA polymerase
Short-chain cytokines Tandem AAA-ATPase domain
Siah interacting protein N terminal domain-like TAZ domain
Sigma2 domain of RNA polymerase sigma factors TBP-associated factors, TAFs
Sigma3 domain TENA/THI-4
Sigma4 domain Terpene synthases
Signal peptide-binding domain Terpenoid cyclase C-terminal domain
Single strand DNA-binding domain, SSB Terpenoid cyclase N-terminal domain
SipA N-terminal domain-like Tetracyclin repressor-like, C-terminal domain
Siroheme synthase middle domains-like Tetracyclin repressor-like, N-terminal domain
Siroheme synthase N-terminal domain-like Tetratricopeptide repeat (TPR)
Small-conductance potassium channel Thermolysin-like
Small subunit Tim10/DDP
Smc hinge domain TK-like PP module
SopE-like GEF domain TK-like Pyr module
Spectrin repeat Top domain of virus capsid protein
SPy1572-like Topoisomerase V catalytic domain-like
SRP alpha N-terminal domain-like Topoisomerase V repeat domain
Stabilizer of iron transporter SufD TorD-like
Staphylocoagulase Transcriptional repressor TraM
STAT Transcription factor IIA (TFIIA), alpha-helical domain
STAT DNA-binding domain Transcription factor IIA (TFIIA), beta-barrel domain
Substrate-binding domain of HMG-CoA reductase Transcription factor IIB (TFIIB), core domain
Subtilases Transcription factor NusA, N-terminal domain
Succinate dehydrogenase/fumarate reductase Transcription factor STAT-4 N-domain flavoprotein, catalytic domain Transducin (alpha subunit), insertion domain Transglutaminase core Virus ectodomain
Transglutaminase N-terminal domain Voltage-gated potassium channels
Transglutaminase, two C-terminal domains VPR protein fragments
Transketolase C-terminal domain-like VPS23 C-terminal domain
Translationally controlled tumor protein TCTP VPS28 C-terminal domain-like
(histamine-releasing factor) VPS28 N-terminal domain
Translin VPS37 C-terminal domain-like
Translocated intimin receptor (tir) intimin-binding domain VPS9 domain
Transmembrane helical fragments V-type ATP synthase subunit C
Transposase IS200-like XseB-like
Trigger factor ribosome-binding domain YciF-like
TrkA C-terminal domain-like YfiT-like putative metal-dependent hydrolases
TrmB-like YfmB-like
Tryptophan synthase beta subunit-like PLP-dependent YggX-like enzymes YhfA-like t-snare proteins YihX-like
TS-N domain YlxM/p13-like
Tumor suppressor gene product Apc YppE-like
TyeA-like Zn metallo-beta-lactamase
Type II restriction endonuclease catalytic domain 14-3-3 protein
Type II restriction endonuclease effector domain 1-deoxy-D-xylulose-5-phosphate reductoisomerase,
Typo IV secretion system protein TraC C-terminal domain
TyrA dimerization domain-like 2Fe-2S ferredoxin domains from multidomain proteins
Tyrosine-dependent oxidoreductases 3-Methyladenine DNA glycosylase III (MagIII)
Ubiquitin activating enzymes (UBA) 5-carboxymethyl-2-hydroxymuconate isomerase (CHMI)
U-box 5' nucleotidase-like
Upper collar protein gp10 (connector protein) 5' to 3' exonuclease catalytic domain
Urocanase 5' to 3' exonuclease, C-terminal subdomain
Vacuolar ATP synthase subunit C 6-phosphogluconate dehydrogenase-like, N-terminal
Vanabin-like domain
Vanillyl-alcohol oxidase-like ABC transporter ATPase domain-like
Variant surface glycoprotein (N-terminal domain) Aconitase B, N-terminal domain
VBS domain Aconitase iron-sulfur domain
Vertebrate phospholipase A2 Actin/HSP70
Viral DNA-binding domain Acyl-CoA binding protein acyl-CoA oxidase C-terminal domains Anticodon-binding domain of a subclass of class I acyl-CoA oxidase N-terminal domains aminoacyl-tRNA synthetases
Adapter protein APS, dimerisation domain Antioxidant defense protein AhpD
Adenylation domain of NAD+-dependent DNA ligase Anti-sigma factor AsiA
Adenylyl and guanylyl cyclase catalytic domain Apolipophorin-III
Adenylylcyclase toxin (the edema factor) Apolipoprotein a domain/subunit of cytochrome bc1 complex Aquaporin-like
(Ubiquinol-cytochrome c reductase) Archaeal DNA-binding protein
Aerobic respiration control sensor protein, ArcB Arfaptin, Rac-binding fragment
AF0060-like Arginyl-tRNA synthetase (ArgRS), N-terminal 'additional'
AF0941-like domain
AF1104-like Aristolochene/pentalenene synthase
AgoG-like Armadillo repeat
Alcohol dehydrogenase-like, C-terminal domain Arp2/3 complex 16 kDa subunit ARPC5
Alcohol dehydrogenase-like, N-terminal domain Arylsulfatase
Aldehyde ferredoxin oxidoreductase, C-terminal domains Aspartate/ornithine carbamoyltransferase
Aldehyde ferredoxin oxidoreductase, N-terminal domain Aspartokinase allosteric domain-like
ALDH-like ATP synthase
Allosteric chorismate mutase ATP synthase (F1-ATPase), gamma subunit alpha-Amylases, C-terminal beta-sheet domain a tRNA synthase domain alpha-catenin/vinculin Atu0492-like alpha-D-glucuronidase/Hyaluronidase catalytic domain Avirulence protein AvrPto alpha-D-glucuronidase, N-terminal domain B12-dependent (class II) ribonucleotide reductase
Alpha-hemoglobin stabilizing protein AHSP Bacterial dinuclear zinc exopeptidases alpha-Subunit of urease Bacterial exopeptidase dimerisation domain alpha-subunit of urease, catalytic domain Bacterial GAP domain
Amidase signature (AS) enzymes Bacterial glucoamylase C-terminal domain-like
A middle domain of Talin 1 Bacterial glucoamylase N-terminal domain-like
Aminoacid dehydrogenase-like, C-terminal domain Bacterial photosystem II reaction centre, L and M subunits
Aminoacid dehydrogenases Bacterial tryptophan 2,3-dioxygenase
Aminoglycoside phosphotransferases Bacteriophage CII protein
AmyC C-terminal domain-like Bacteriorhodopsin-like
AmyC N-terminal domain-like BAG domain
Amylase, catalytic domain Band 7/SPFH domain
Animal lipoxigenases BAR domain Barrier-to-autointegration factor, BAF Class II aminoacyl-tRNA synthetase (aaRS)-like, catalytic
BAS1536-like domain
BC ATP-binding domain-like Class-II DAHP synthetase
BC C-terminal domain-like Class II glutamine amidotransferases
Bcl-2 inhibitors of programmed cell death Class III anaerobic ribonucleotide reductase NRDD subunit
BC N-terminal domain-like clathrin assemblies
Beta-D-glucan exohydrolase, C-terminal domain Clostridium neurotoxins, catalytic domain beta-Lactamase/D-ala carboxypeptidase Clostridium neurotoxins, "coiled-coil" domain beta-Phosphoglucomutase-like Clostridium neurotoxins, C-terminal domain
Biotin repressor-like Clostridium neurotoxins, the second last domain
BRCA2 helical domain CMD-like
BRCA2 tower domain Cobalamin adenosyltransferase
Bromodomain Cobalamin (vitamin B12)-binding domain
Calcium ATPase, transduction domain A CO dehydrogenase flavoprotein C-terminal domain-like
Calcium ATPase, transmembrane domain M CO dehydrogenase flavoprotein N-terminal domain-like
Calmodulin-like CO dehydrogenase ISP C-domain like
Calpain large subunit, catalytic domain (domain II) CO dehydrogenase molybdoprotein N-domain-like
Calpain large subunit, middle domain (domain III) Coiled-coil domain of nucleotide exchange factor GrpE cAMP-binding domain Cold shock DNA-binding domain-like
CAPPD, an extracellular domain of amyloid beta A4 protein Colicin E3 receptor domain
Cell cycle transcription factor e2f-dp Computational models partly based on NMR data
Cell division protein ZapA-like Conserved hypothetical protein MTH1747
Cellulases catalytic domain CorA soluble domain-like
Cellulose-binding domain family III Coronavirus NSP8-like
Centromere-binding Coronavirus S2 glycoprotein
Cfr10I/Bse634I COX17-like
Chaperone J-domain Creatinase/aminopeptidase
Chemotaxis phosphatase CheZ Creatinase/prolidase N-terminal domain
Chemotaxis protein CheA P1 domain Crotonase-like
CheW-like Cryptochrome/photolyase FAD-binding domain
Cholesterol oxidase Cryptochrome/photolyase, N-terminal domain
Choline/Carnitine O-acyltransferase CSE2-like
Circadian clock protein KaiA, C-terminal domain C-terminal domain of alpha and beta subunits of F1 ATP
Class I alpha-1;2-mannosidase, catalytic domain synthase
Class I aminoacyl-tRNA synthetases (RS), catalytic domain C-terminal domain of class I lysyl-tRNA synthetase C-terminal domain of DFF45/ICAD (DFF-C domain) Dihydrodipicolinate reductase-like
C-terminal domain of eukaryotic peptide chain release Dimeric chorismate mutase factor subunit 1, ERF1 Dimerisation domain of CENP-B
C-terminal domain of Ku80 Dimerization-anchoring domain of cAMP-dependent PK
C-terminal fragment of elongation factor SelB regulatory subunit
C-terminal fragment of thermolysin DinB-like
C-terminal UvrC-binding domain of UvrB Diol dehydratase, gamma subunit
CUE domain DLC
Cullin homology domain DNA-binding N-terminal domain of transcription activators
Cullin repeat DNA-binding protein Mj223
Cyclin DNA damage-inducible protein DinI
Cyclophilin (peptidylprolyl isomerase) DnaK suppressor protein DksA, alpha-hairpin domain
Cytochrome b562 DNA ligase/mRNA capping enzyme postcatalytic domain
Cytochrome b of cytochrome bc1 complex DNA polymerase I
(Ubiquinol-cytochrome c reductase) DNA polymerase III clamp loader subunits, C-terminal
Cytochrome c'-like domain
Cytochrome c oxidase subunit III-like DnaQ-like 3'-5' exonuclease
Cytochrome c oxidase subunit II-like, transmembrane DNA repair protein MutS, domain I region DNA repair protein MutS, domain II
Cytochrome c oxidase subunit I-like DNA repair protein MutS, domain III
Cytochrome P450 DNA topoisomerase IV, alpha subunit
Cytochrome p450 reductase N-terminal domain-like Domain of poly(ADP-ribose) polymerase
Cytoplasmic domain of inward rectifier potassium channel Domain of the SRP/SRP receptor G-proteins
D-aminoacid aminotransferase-like PLP-dependent Double Clp-N motif enzymes double-SIS domain
DBL homology domain (DH-domain) E2 regulatory, transactivation domain
Dcp2 box A domain EF2458-like
Decarboxylase EF-hand modules in multidomain proteins
Dehydroquinate synthase, DHQS eIF-2-alpha, C-terminal domain delta-Endotoxin, C-terminal domain eIF2alpha middle domain-like delta-Endotoxin (insectocide), middle domain Electron transfer flavoprotein-ubiquinone delta-Endotoxin (insectocide), N-terminal domain oxidoreductase-like
Designed four-helix bundle protein Elongation factor Ts (EF-Ts), dimerisation domain
Designed single chain three-helix bundle EntA-Im
DHN aldolase/epimerase Enzyme IIa from lactose specific PTS, IIa-lac Enzyme I of the PEP:sugar phosphotransferase system FHV B2 protein-like
HPr-binding (sub)domain Fibrinogen coiled-coil and central regions
Epoxide hydrolase FIS-like
Epsilon subunit of F1F0-ATP synthase C-terminal domain FKBP12-rapamycin-binding domain of
Epsilon subunit of F1F0-ATP synthase N-terminal domain FKBP-rapamycin-associated protein (FRAP)
ERP29 C domain-like FKBP immunophilin/proline isomerase
ERP29 N domain-like Flagellar export chaperone FliS
ESAT-6 like FMN-linked oxidoreductases
E-set domains of sugar-utilizing enzymes Formate dehydrogenase N, cytochrome (gamma) subunit
ETF-QO domain-like Formin homology 2 domain (FH2 domain)
Eukaryotic DNA topoisomerase I, catalytic core Fumarate reductase respiratory complex cytochrome b
Eukaryotic DNA topoisomerase I, dispensable insert domain subunit, FrdC
Eukaryotic DNA topoisomerase I, N-terminal DNA-binding Fungal zinc peptidase fragment GABA-aminotransferase-like
Eukaryotic type KH-domain (KH-domain type I) GAPDH-like
Exocyst complex component GAT domain
Exodeoxyribonuclease V beta chain (RecC), C-terminal Gated mechanosensitive channel domain G domain-linked domain
EXOSC10 HRDC domain-like Glutamate-cysteine ligase
Extended AAA-ATPase domain Glutamyl tRNA-reductase catalytic, N-terminal domain
F1F0 ATP synthase subunit A Glutamyl tRNA-reductase dimerization domain
F1F0 ATP synthase subunit C Glutathione peroxidase-like
FAD-linked oxidases, N-terminal domain Glutathione S-transferase (GST), C-terminal domain
FAD-linked reductases, N-terminal domain Glutathione S-transferase (GST), N-terminal domain
FAD/NAD-linked reductases, dimerisation (C-terminal) Glyceraldehyde-3-phosphate dehydrogenase-like, domain N-terminal domain
FAD/NAD-linked reductases, N-terminal and central Glycerol-3-phosphate (1)-acyltransferase domains Glycerol-3-phosphate dehydrogenase
FAT domain of focal adhesion kinase (Glycosyl)asparaginase
Fe,Mn superoxide dismutase (SOD), C-terminal domain Glycosyltransferase family 36 C-terminal domain
Fe,Mn superoxide dismutase (SOD), N-terminal domain Glycosyltransferase family 36 N-terminal domain
FemXAB nonribosomal peptidyltransferases GMC oxidoreductases
Ferredoxin domains from multidomain proteins G proteins
Ferritin GreA transcript cleavage factor, C-terminal domain
Fertilization protein GreA transcript cleavage protein, N-terminal domain GRIP domain HrcA C-terminal domain-like
GroEL chaperone, ATPase domain HRDC domain from helicases
GroEL-like chaperone, apical domain HSC20 (HSCB), C-terminal oligomerisation domain
GroEL-like chaperone, intermediate domain Hsp90 co-chaperone CDC37
Group II chaperonin (CCT, TRIC), apical domain HSP90 C-terminal domain
Group II chaperonin (CCT, TRIC), ATPase domain Hut operon positive regulatory protein HutP
Group II chaperonin (CCT, TRIC), intermediate domain Hyaluronate lyase-like catalytic, N-terminal domain
Group I mobile intron endonuclease Hyaluronate lyase-like, central domain
Group V grass pollen allergen Hyaluronate lyase-like, C-terminal domain
GTP cyclohydrolase I Hyaluronidase N-terminal domain-like half-ferritin Hyaluronidase post-catalytic domain-like
Haloperoxidase (bromoperoxidase) Hybrid and chimeric proteins
HAL/PAL-like Hybrid cluster protein (prismane protein)
HCDH C-domain-like Hypothetical protein D-63
Head domain of nucleotide exchange factor GrpE Hypothetical protein MTH393
Heat-inducible transcription repressor HrcA, N-terminal Hypothetical protein MTH677 domain Hypothetical protein Ta1238
Heat shock protein 70kD (HSP70), C-terminal subdomain Hypothetical protein TM1457
Heat shock protein 70kD (HSP70), peptide-binding domain Hypothetical protein VC0424
Hect, E3 ligase catalytic domain Hypothetical protein YfhH
Helical scaffold and wing domains of SecA Hypothetical protein YhaI
Hemerythrin Hypothetical Protein YjhP
Hepatitis B viral capsid (hbcag) IF2B-like
Hermes dimerisation domain I/LWEQ domain
Hermes transposase-like IMD domain
HisE-like (PRA-PH) Influenza hemagglutinin (stalk)
Histidine kinase Inositol
HlyD-like secretion proteins monophosphatase/fructose-1,6-bisphosphatase-like
HMA, heavy metal-associated domain Inositol polyphosphate 5-phosphatase (IPP5)
HMD dimerization domain-like Insect pheromone/odorant-binding proteins
HMG-box Insert subdomain of RNA polymerase alpha subunit
Homeodomain Interferon-induced guanylate-binding protein 1 (GBP1),
Homodimeric domain of signal transducing histidine kinase C-terminal domain
HP1531-like Interferons/interleukin-10 (IL-10)
HR1 repeat IpaD-like Iron-containing alcohol dehydrogenase domain
Isoprenyl diphosphate synthases Medium chain acyl-CoA dehydrogenase, NM (N-terminal
ISY1 N-terminal domain-like and middle) domains
KaiB-like Membrane ion ATPase
L27 domain Meta-cation ATPase, catalytic domain P
Lactate & malate dehydrogenases, C-terminal domain Metal cation-transporting ATPase, ATP-binding domain N
Lambda integrase-like, catalytic core Methane monooxygenase hydrolase, gamma subunit lambda integrase N-terminal domain Methicillin resistance protein FemA probable tRNA-binding
Large subunit arm
L-aspartase/fumarase Methionine synthase domain
LcrE-like Methionine synthase SAM-binding domain
LDH N-terminal domain-like Methionyl-tRNA synthetase (MetRS), Zn-domain
LemA-like Methyl-accepting chemotaxis protein (MCP) signaling
Lesion bypass DNA polymerase (Y-family), catalytic domain domain
Lesion bypass DNA polymerase (Y-family), little finger Methyl-coenzyme M reductase alpha and beta chain domain C-terminal domain
LeuD-like Methyl-coenzyme M reductase alpha and beta chain
Lipoxigenase N-terminal domain N-terminal domain
Long-chain cytokines Methylmalonyl-CoA mutase, N-terminal (CoA-binding)
L-sulfolactate dehydrogenase-like domain
Magnesium transport protein CorA, transmembrane region Middle domain of eukaryotic peptide chain release factor
Mago nashi protein subunit 1, ERF1
Malate synthase G MIF4G domain-like
Malic enzyme N-domain MIF-related
MarR-like transcriptional regulators Mismatch glycosylase
Mason-Pfizer monkey virus matrix protein MIT domain
MazG-like Mitochondrial ATP synthase coupling factor 6
Mechanosensitive channel protein MscS (YggB), C-terminal Mitochondrial glycoprotein MAM33-like domain Mob1/phocein
Mechanosensitive channel protein MscS (YggB), middle modified HD domain domain Molybdenum cofactor-binding domain
Mechanosensitive channel protein MscS (YggB), Molybdopterin synthase subunit MoaE transmembrane region monodomain cytochrome c
MED7 hinge region monomeric chorismate mutase
Medium chain acyl-CoA dehydrogenase-like, C-terminal Motor proteins MPP-like Nop domain mRNA decapping enzyme-like NSP3 homodimer
MTH1187-like N-terminal domain of adenylylcyclase associated protein,
Multidrug efflux transporter AcrB pore domain; PN1, PN2, CAP
PC1 and PC2 subdomains N-terminal domain of alpha and beta subunits of F1 ATP
Multidrug efflux transporter AcrB TolC docking domain; DN synthase and DC subdomains N-terminal domain of cbl (N-cbl)
Multidrug efflux transporter AcrB transmembrane domain N-terminal domain of enzyme I of the PEP:sugar
MxiH-like phosphotransferase system
Myb/SANT domain N-terminal domain of eukaryotic peptide chain release
Myosin S1 fragment, N-terminal domain factor subunit 1, ERF1
N-acetyl transferase, NAT N-terminal domain of the circadian clock protein KaiA
N-acylglucosamine (NAG) epimerase N-terminal domain of the delta subunit of the F1F0-ATP
NAD-binding domain of HMG-CoA reductase synthase
NadC C-terminal domain-like N-terminal, RNA-binding domain of nonstructural protein
NadC N-terminal domain-like NS1
NAD+-dependent DNA ligase, domain 3 Nuclear receptor ligand-binding domain
NADH oxidase/flavin reductase Nucleotide and nucleoside kinases
NADPH-cytochrome p450 reductase FAD-binding NusA extra C-terminal domains domain-like Occludin/ELL domain
NADPH-cytochrome p450 reductase-like Ohr/OsmC resistance proteins
NagZ-like Oligosaccharide phosphorylase
Neurotransmitter-gated ion-channel transmembrane pore OmpH-like
NF-kappa-B/REL/DORSAL transcription factors, Orbivirus capsid
C-terminal domain Orbivirus core
Nickel-containing superoxide dismutase, NiSOD Outer membrane efflux proteins (OEP)
Nickel-iron hydrogenase, large subunit Outer surface protein C (OspC)
NifU C-terminal domain-like Oxygen-evolving enhancer protein 3,
NifU/IscU domain P40 nucleoprotein
Nitric oxide (NO) synthase oxygenase domain PadR-like
Nitrogenase iron protein-like Pantothenate synthetase (Pantoate-beta-alanine ligase,
NKL-like PanC)
Non-heme 11 kDa protein of cytochrome bc1 complex PAPS sulfotransferase
(Ubiquinol-cytochrome c reductase) PDEase
Nonstructural protein ns2, Nep, M1-binding domain PE Penicillin-binding protein 2x (pbp-2x), c-terminal domain Poly A polymerase C-terminal region-like
Penicillin binding protein dimerisation domain Poly A polymerase head domain-like
Penta-EF-hand proteins Polynucleotide phosphorylase/guanosine pentaphosphate
PEP carboxykinase C-terminal domain synthase (PNPase/GPSI), domain 3
PEP carboxykinase N-terminal domain Polyphosphate kinase C-terminal domain
Pepsin-like Potassium channel NAD-binding domain
PepX catalytic domain-like POU-specific domain
PepX C-terminal domain-like PPE
Periplasmic domain of cytochrome c oxidase subunit II PPK middle domain-like
PF1790-like PPK N-terminal domain-like
PfkB-like kinase Predicted hydrolases Cof
PFOR PP module Prefoldin
PFOR Pyr module Pre-protein crosslinking domain of SecA
PG0816-like PriB N-terminal domain-like
Phase 1 flagellin Prion-like
PHBH-like Prokaryotic DksA/TraR C4-type zinc finger
Phenylalanyl-tRNA synthetase (PheRS) Prokaryotic phospholipase A2
PhoB-like Prokaryotic type I DNA topoisomerase
Phoshoinositide 3-kinase (PI3K), catalytic domain Prokaryotic type KH domain (KH-domain type II)
Phoshoinositide 3-kinase (PI3K) helical domain Proteasome subunits
Phosphate binding protein-like Proteinase/alpha-amylase inhibitors
Phosphoenolpyruvate carboxylase Protein HNS-dependent expression A; HdeA
Phosphonoacetaldehyde hydrolase-like Protein kinases, catalytic subunit
Phosphorelay protein-like Protein-L-isoaspartyl O-methyltransferase
Phosphorelay protein luxU Protein prenylyltransferase
Phosphoserine phosphatase RsbU, N-terminal domain Pseudo ankyrin repeat
Photosystem I reaction center subunit XI, PsaL PTS-regulatory domain, PRD
Photosystems Putative anticodon-binding domain of alanyl-tRNA
PhoU-like synthetase (AlaRS)
PIN domain Putative thiamin/HMP-binding protein YkoF
Plant invertase/pectin methylesterase inhibitor Putative transcriptional regulator TM1602, C-terminal
Plant lipoxigenases domain
PLC-like (P variant) PyrH-like
Pleckstrin-homology domain (PH domain) Pyrimidine 5'-nucleotidase (UMPH-1)
Poly(ADP-ribose) polymerase, C-terminal domain Pyruvate-ferredoxin oxidoreductase, PFOR, domain II Pyruvate-ferredoxin oxidoreductase, PFOR, domain III RING finger domain, C3HC4
Pyruvate phosphate dikinase, central domain RNA polymerase
Pyruvate phosphate dikinase, C-terminal domain RNA polymerase alpha subunit dimerisation domain
Pyruvate phosphate dikinase, N-terminal domain RNA-polymerase beta
R1 subunit of ribonucleotide reductase, C-terminal domain RNA-polymerase beta-prime
R1 subunit of ribonucleotide reductase, N-terminal domain RNase III catalytic domain-like
Rabenosyn-5 Rab-binding domain-like Roadblock/LC7 domain
Rab geranylgeranyltransferase alpha-subunit, C-terminal ROK domain ROK associated domain
Rab geranylgeranyltransferase alpha-subunit, insert domain ROP protein
Rad50 coiled-coil Zn hook RuBisCo LSMT catalytic domain
RAP domain RuBisCo LSMT C-terminal, substrate-binding domain
Ras-binding domain, RBD Rubredoxin
RecA protein-like (ATPase-domain) RWD domain
RecG, N-terminal domain S100 proteins
RecG "wedge" domain Saposin B
Regulator of G-protein signaling, RGS SCF ubiquitin ligase complex WHB domain
RelA/SpoT domain ScpB/YpuH-like
Rel/Dorsal transcription factors, DNA-binding domain Sec1/munc18-like (SM) proteins
Release factor Secreted chorismate mutase-like
Reovirus components Serine acetyltransferase
Restriction endonuclease EcoO109IR Serum albumin-like
Retrovirus capsid protein C-terminal domain Seryl-tRNA synthetase (SerRS)
Retrovirus capsid protein, N-terminal core domain SH2 domain
Ribonucleotide reductase-like SH3-domain
Ribosomal protein L19 (L19e) Short-chain cytokines
Ribosomal protein L29 (L29p) Siah interacting protein N terminal domain-like
Ribosomal protein L7/12, C-terminal domain Sigma2 domain of RNA polymerase sigma factors
Ribosomal protein L7/12, oligomerisation (N-terminal) Sigma3 domain domain Sigma4 domain
Ribosomal protein S15 Signal peptide-binding domain
Ribosomal protein S20 Single strand DNA-binding domain, SSB
Ribosome-binding factor A, RbfA SipA N-terminal domain-like
Ribosome complexes Siroheme synthase middle domains-like
Ribosome recycling factor, RRF Siroheme synthase N-terminal domain-like Small-conductance potassium channel Thermolysin-like
Small subunit Tim10/DDP
Smc hinge domain TK-like PP module
SopE-like GEF domain TK-like Pyr module
Spectrin repeat Top domain of virus capsid protein
SPy1572-like Topoisomerase V catalytic domain-like
SRP alpha N-terminal domain-like Topoisomerase V repeat domain
Stabilizer of iron transporter SufD TorD-like
Staphylocoagulase Transcriptional repressor TraM
STAT Transcription factor IIA (TFIIA), alpha-helical domain
STAT DNA-binding domain Transcription factor IIA (TFIIA), beta-barrel domain
Substrate-binding domain of HMG-CoA reductase Transcription factor IIB (TFIIB), core domain
Subtilases Transcription factor NusA, N-terminal domain
Succinate dehydrogenase/fumarate reductase Transcription factor STAT-4 N-domain flavoprotein, catalytic domain Transducin (alpha subunit), insertion domain
Succinate dehydrogenase/fumarate reductase Transglutaminase core flavoprotein C-terminal domain Transglutaminase N-terminal domain
Succinate dehydrogenase/fumarate reductase Transglutaminase, two C-terminal domains flavoprotein N-terminal domain Transketolase C-terminal domain-like
Succinate dehydrogenase/Fumarate reductase Translationally controlled tumor protein TCTP transmembrane subunits (SdhC/FrdC and SdhD/FrdD) (histamine-releasing factor)
Superantigen MAM Translin
Swaposin Translocated intimin receptor (tir) intimin-binding domain
Synatpobrevin N-terminal domain Transmembrane helical fragments
T7 RNA polymerase Transposase IS200-like
Tandem AAA-ATPase domain Trigger factor ribosome-binding domain
TAZ domain TrkA C-terminal domain-like
TBP-associated factors, TAFs TrmB-like
TENA/THI-4 Tryptophan synthase beta subunit-like PLP-dependent
Terpene synthases enzymes
Terpenoid cyclase C-terminal domain t-snare proteins
Terpenoid cyclase N-terminal domain TS-N domain
Tetracyclin repressor-like, C-terminal domain Tumor suppressor gene product Apc
Tetracyclin repressor-like, N-terminal domain TyeA-like
Tetratricopeptide repeat (TPR) Type II restriction endonuclease catalytic domain Type II restriction endonuclease effector domain VPR protein fragments
Typo IV secretion system protein TraC VPS23 C-terminal domain
TyrA dimerization domain-like VPS28 C-terminal domain-like
Tyrosine-dependent oxidoreductases VPS28 N-terminal domain
Ubiquitin activating enzymes (UBA) VPS37 C-terminal domain-like
U-box VPS9 domain
Upper collar protein gp10 (connector protein) V-type ATP synthase subunit C
Urocanase XseB-like
Vacuolar ATP synthase subunit C YciF-like
Vanabin-like YfiT-like putative metal-dependent hydrolases
Vanillyl-alcohol oxidase-like YfmB-like
Variant surface glycoprotein (N-terminal domain) YggX-like
VBS domain YhfA-like
Vertebrate phospholipase A2 YihX-like
Viral DNA-binding domain YlxM/p13-like
Virus ectodomain YppE-like
Voltage-gated potassium channels Zn metallo-beta-lactamase
Figure S1: Alignment of the sequence of 1FSL with those of 1R8J, 1XL3, 2FM9, 2NP5, and 2P06, based on the 3D structure alignments produced by DaliLite. A segment in capital letters indicates that its 3D structure is similar to that of the aligned segment of the query protein. A residue marked with “†” is part of a common residue pattern.
Figure S2: Multiple alignment of each protein and its homologues. The accession numbers of the homologues are shown on the right side of each figure. The histogram at the bottom of each figure shows the number of substitutions at each residue. A red bar denotes a helix.
(a) Multiple alignment of the sequences of 1FSL and its homologues.
(b) Multiple alignment of the sequences of 1MBN and its homologues.
(c) Multiple alignment of the sequences of 1R8J and its homologues.
(d) Multiple alignment of the sequences of 1XL3 and its homologues.