An Atpase Domain Common to Prokaryotic Cell Cycle Proteins
Total Page:16
File Type:pdf, Size:1020Kb
Proc. Natl. Acad. Sci. USA Vol. 89, pp. 7290-7294, August 1992 Biochemistry An ATPase domain common to prokaryotic cell cycle proteins, sugar kinases, actin, and hsp7O heat shock proteins (structural comparison/property pattern/remote homology) PEER BORK, CHRIS SANDER, AND ALFONSO VALENCIA European Molecular Biology Laboratory, D-6900 Heidelberg, Federal Republic of Germany Communicated by Russell F. Doolittle, March 6, 1992 ABSTRACT The functionally diverse actin, hexokinase, and hsp7O protein families have in common an ATPase domain of known three-dimensional structure. Optimal superposition ofthe three structures and alignment ofmany sequences in each of the three families has revealed a set of common conserved residues, distributed in five sequence motifs, which are in- volved in ATP binding and in a putative interdomain hinge. From the multiple sequence aliment in these motifs a pattern of amino acid properties required at each position is defined. The discriminatory power of the pattern is in part due to the use of several known three-dimensional structures and many sequences and in part to the "property" method ofgeneralizing from observed amino acid frequencies to amino acid fitness at each sequence position. A sequence data base search with the pattern significantly matches sugar kinases, such as fuco-, glucono-, xylulo-, ribulo-, and glycerokinase, as well as the prokaryotic cell cycle proteins MreB, FtsA, and StbA. These are predicted to have subdomains with the same tertiary structure as the ATPase subdomains Ia and Ha of hexokinase, actin, and Hsc7O, a very similar ATP binding pocket, and the capacity for interdomain hinge motion accompanying func- tional state changes. A common evolutionary origin for all of the proteins in this class is proposed. In spite of their different biological functions, actin, Hsc7O, CONNECT Z and hexokinase contain similar three-dimensional structures pxxtthhGhhhxhxh (Fig. 1) (1-4). No overall sequence similarity between these L . three protein families can be detected with standard pairwise FIG. 1. Diagramatic representation of actin. The five parts of the sequence so the structural sequence pattern characteristic for the actin/hexokinase/hsc70 alignment algorithms, similarity ATPase domain are mapped onto the three-dimensional structure came as a surprise (1, 3, 4). All three bind and hydrolyze ATP. (solid arrows). The sequence motifin each ofthe five regions is given The ATPase activity of actin is involved in the control of here in terms of the following amino acid groups: h, purely hydro- polymerization (5); that of Hsc7O (a member of the hsp70 phobic (VLIFWY); f, partly hydrophobic (VLIFWYMCGATKHR); family of heat shock proteins) is involved in a variety of t, tiny (GSAT); s, small (GSATNDVCP); p, tiny plus polar (GSAT- "6chaperonin" functions, such as keeping protein chains NDQEKHR); x, any aminoacid. Note thatthegroups overlapandthat translocation competent, preventing aggregation, or aiding in their names are merely mnemonic. The five-motifpattern used for the their refolding from aggregated states and that of hexo- search cannot be fully represented in this simple form. The motifs are (6); alllocated in the conserved substructure (dark shading) commontothe kinase is involved in phosphorylation of glucose at the entry three known crystal structures. We denote the subdomains by Ia, lb, to the glycolytic pathway (7). 11a, and IIb, as in the Hsc7O crystal structure (4). The common structural feature is that of two domains of similar fold on either side of a large cleft with an ATP binding adenylate kinase, recA protein, elongation factor Tu, or ras site at the bottom of the cleft. Each of these (I and II in Fig. oncogene protein p21 (8). 1) is composed of two subdomains. Subdomains Ia and Ha These three structures are not only similar in three- have the same basic fold (Fig. 1, lower left and right, shaded dimensional fold but probably also in some aspects of mech- ribbons): a central (sheet surrounded by helices, with iden- anism: their ATPase active sites are lined with identical or tical topology of loop connections, probably a result of gene similar residues and the overall domain structure is sugges- duplication (1, 3, 4). The phosphate tail of ATP is bound by tive ofthe capacity for interdomain motion, perhaps directly residues on two /-hairpins, one from each ofthe subdomains coupled to the ATPase activity. A rotation of -30° of one of Ia and Ha, and on other nearby segments (1-3) (not yet the domains is required to superimpose optimally the nucle- proven for hexokinase). This ATP binding motif is distinctly otide-bound [actin or Hsc7O (1, 3)] onto the nucleotide-free different from the single phosphate binding loop in, e.g., [hexokinase (2)] structure, suggesting that each of the struc- tures can be in an open and closed state. For hexokinase, The publication costs ofthis article were defrayed in part by page charge there is direct crystallographic evidence for a large confor- payment. This article must therefore be hereby marked "advertisement" mational change when a nucleotide binds (9, 10). Thus, the in accordance with 18 U.S.C. §1734 solely to indicate this fact. structural similarity of the three proteins is correlated with 7290 Downloaded by guest on September 27, 2021 Biochemistry: Bork et al. Proc. Natl. Acad. Sci. USA 89 (1992) 7291 partial similarity of function. A common evolutionary origin Definition of Conserved Regions. The multiple sequence of this ATPase fold is therefore likely. alignment reveals regions with conserved residues; several of Here, we exploit the similarities of the three remotely, but these are near the ATP binding site, indicating a functional clearly, related three-dimensional structures and the infor- role. The boundaries of the conserved regions were delin- mation contained in the multiple sequence alignment within eated by trial and error, optimizing discriminatory power. each of the three families and define a common sequence Each of the five conserved regions chosen was -20 residues pattern. The pattern appears to be characteristic of the long and each was used to define a motif. Together, the five tertiary structure and ATPase properties of this class of motifs define the overall 93-residue search pattern, covering domains and can be used to scan data bases for structural 47% of all residues in subdomains Ia and Ha. related proteins. Pattern Definition. The problem ofpattern definition is one of extrapolating from the known frequencies ofresidue types at each position, to the spectrum of structurally permissible MATERIALS AND METHODS residue types. This process of generalization is a key feature Alignment of Three-Dimensional Structures. The common of our pattern approach and goes beyond the direct use of structural core of the three proteins is based on optimal residue frequencies in standard profile search methods. The superposition of the actin crystal structure onto that ofhsc70 pattern derivation was carried out by the approach of Bork and of a model of hexokinase. The superposition is done one and Grunwald (14). The method is based on the definition of subdomain at a time, using an algorithm that defines pairs of an 11 by 20 table of amino acid properties, indicating which residues equivalent in three dimensions (11). The common of the 20 standard amino acids has which of 11 amino acid structural core is defined here as those residue pairs that fall properties (hydrophobic, positive, negative, polar, charged, within a 2.5-A cutoff on Ca-Ca distances between equivalent small, tiny, aliphatic, aromatic, proline, glycine). residues. For actin and Hsc7O, the superposition is compat- The property pattern is defined for the five sequence motifs ible with that of Flaherty et al. (4). The structural core-i.e., or "boxes" (Figs. 1 and 2). Each of the five motifs (PHOS- the equivalenced residues-covers as much as 45% (90 of PHATE1, CONNECT1, PHOSPHATE2, ADENOSINE, 199) of the residues in subdomains Ia and Iha. and CONNECT2) is derived from the multiple sequence The hexokinase model was derived from the coordinate alignment of the known members of the actin/hexokinase/ data set 2YHX (Brookhaven Protein Data Bank) after align- hsp70 class as follows (14). (i) The multiple alignment of all ment of the isoform b sequence (Swiss-Prot identifier available sequences is translated to a set of amino acid HXKB.YEAST), using an automatic side-chain building properties observed at each position. (ii) The set ofproperties procedure (12). allowed at each position is translated to a mismatch penalty Combining Multiple Sequence Alignment and Structure for each potential amino acid at that position. (iii) A variable Alignment. To obtain the overall alignment of the sequences gap penalty is assigned to each position, depending on in all three families, we make use of the three-dimensional whether a gap was observed in any one protein. The full alignment of actin/Hsc70/hexokinase in the structural core. property pattern can be reduced to a simplified pattern shown The many additional sequences of unknown three- in Fig. 1, in terms of the binary language of allowed/ dimensional structure are aligned separately in each family by disallowed amino acids at specific positions. This simpler a multiple sequence alignment procedure (13). Finally, the pattern can be used with the pattern search option in standard multiple sequence alignments for the three families are sequence analysis packages but it is less discriminating (data brought into register by