Internal Homologies in the Two Aspartokinase-Homoserine
Proc. Natl. Acad. Sci. USA
Vol. 81, pp. 3019-3023, May 1984
Biochemistry

Internal homologies in the two aspartokinase-homoserine dehydrogenases of Escherichia coli K-12
(gene duplication/evolution/bifunctional proteins)

PASCUAL FERRARA, NATHALIE DUCHANGE, MARIO M. ZAKIN, AND GEORGES N. COHEN

Unité de Biochimie Cellulaire, Département de Biochimie et Génétique Moléculaire, Institut Pasteur, 28 rue du Dr. Roux, 75724 Paris Cedex 15, France

Communicated by Norman H. Horowitz, January 27, 1984

ABSTRACT

In Escherichia coli, AK I-HDH I and AK II-HDH II are two bifunctional proteins, derived from a common ancestor, that catalyze the first and third reactions of the common pathway leading to threonine and methionine. An extensive amino acid sequence comparison of both molecules reveals two main features on each of them: (i) two segments, each of about 130 amino acids, covering the first one-third of the polypeptide chain, are similar to each other and (ii) two segments, each of about 250 amino acids and covering the COOH-terminal 500 amino acids also present a significant homology. These findings suggest that these two regions may have evolved independently of each other by a process of gene duplication and fusion previous to the appearance of an ancestral aspartokinase-homoserine dehydrogenase molecule.

In Escherichia coli, aspartokinase I-homoserine dehydrogenase I and aspartokinase II-homoserine dehydrogenase II are two bifunctional proteins that catalyze the first and third reaction of the common pathway leading to threonine and methionine (1). Recently, the sequence of the corresponding genes, thrA and metL has been established in our laboratory (2, 3). The comparison of the two amino acid sequences establishes beyond doubt that the two proteins derive from a common ancestor (3).

It has been known for more than 10 years that the homotetrameric aspartokinase I-homoserine dehydrogenase I (820 residues) is composed of at least two functional domains: (i) An NH2-terminal domain defined by a tetrameric fragment extracted from a nonsense mutant possessing only the aspartokinase activity (4). This fragment, as extracted, extends from the NH2 terminus to approximately residue 495 (5). (ii) A COOH-terminal domain, defined by a proteolytic dimeric fragment, endowed with homoserine dehydrogenase activity only (4). This fragment starts, depending on the protease used to generate it, between residues 293 and 300 (6, 7) and extends to the COOH-terminal residue 820. Thus, the two fragments share a common sequence between residues 293-300 and ca. 495. A recent more detailed study of limited proteolysis has defined three structural and functional domains (7): an NH2-terminal aspartokinase I domain (Mr, ~27,000; 250 residues), a central domain ID (Mr, ~25,000; 232 residues) involved in subunit association and devoid of any catalytic activity; and a COOH-terminal homoserine dehydrogenase domain (Mr, 35,000; 324 residues). These fragments do not overlap and the sum of their molecular weights is equal or nearly equal to that of the entire polypeptide chain.

To ascertain whether more information could be obtained about the origin of this long complex molecule, we decided to search for internal homologies in its sequence and to extend this study to its isofunctional counterpart, aspartokinase II-homoserine dehydrogenase II (809 residues), with which it shares a common ancestor (3).

METHODS

Protein sequence comparisons were done by computer. Each sequence was compared with itself by a simple frameshift method using different segment lengths. After small homologous fragments were found, comparison was extended in both directions, allowing gaps to maximize homologies. Finally, the fragments were aligned with a computer program, provided by T. F. Smith and W. M. Fitch, using the matrix algorithm of Sellers (8), modified by Smith et al. (9). A deletion weight of 0.7 for each gap plus 1.5 times the number of residues in each gap was initially used. Some of the alignments were also checked using a deletion weight of 1.5 for each gap and 1.5 times the number of residues in each gap. Unmatched terminal sequences were always weighted.

The number of introduced gaps in the compared sequences is between 0.22 and 0.38 times the number of identities plus homologies produced by their introduction. If we restrict ourselves to identities, this ratio varies between 0.42 and 0.61.

In one case (see Fig. 5), in which segments of 51 residues were compared, no gaps needed to be introduced, and we took advantage of this fact to compare these sequences to the National Biomedical Research Foundation data base, using the program PROBE/EXPLOR, devised by Claverie (10).

To determine the significance of the final alignment obtained, two tests were applied. First, 10-20 random sequences, with identical amino acid composition to one of the segments compared, were generated and subjected to the same alignment program with the other segment, to determine a set of random Sellers values. The second test was the comparison between unrelated proteins or fragments of similar size. For example, with the gap penalty chosen, we were not able to detect any homology between the 190-residue-long equine growth hormone (11) and the first 190 amino acids of rabbit erythrocyte carbonic anhydrase C (12), or between the same length of carbonic anhydrase and a segment of comparable length of aspartokinase II-homoserine dehydrogenase II.

In addition to identities, we also considered the two types of homologies normally accepted (13): those concerning amino acids that could be considered to derive from one another by a single base change in the corresponding codon, and those concerning accepted replacements. For the first category, because we know the nucleotide sequence of the corresponding genes (2, 3), we considered only the cases in which a single base change was actually observed and not the other theoretically possible cases. The only accepted amino acid replacements that we have taken into account are as follows: isoleucine to leucine to valine, serine to threonine, phenylalanine to tyrosine, arginine to lysine, and aspartate to glutamate. When a given homology was generated either through a single base change or through an accepted replacement, it was scored only once.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

FIG. 1. Comparison of aspartokinase I segments 9-128 and 146-263. Positions compared, 125; gaps, 12; identities, 27 (21.6%); total identities plus homologies, 55 (44%). In this and in the other figures, identities are boxed, homologies deriving from a single base change are denoted by ●, and accepted replacements are denoted by +. Amino acids are designated by standard one-letter abbreviations.

RESULTS

The Aspartokinase I Region is the Product of a Gene Duplication. Segments 9-128 and 146-263 present 27 matches out of 125 positions (22%). If the single base-change-derived amino acids are considered, the similarity increases to 39%, and if we add accepted replacements, the total homology becomes 44% (Fig. 1). The homology is distributed homogeneously along the sequence (histogram not shown).

The Intermediate Region ID and the Homoserine Dehydrogenase I Region are Homologous. The alignment of segments 336-569 and 572-812 shows 22% identity; that is, 55 matches out of 255 positions (Fig. 2). Using the same accepted rules of homology as described above, this percentage increases, respectively, to 34% and 39%.

It is interesting to note that within these two segments,
447, and from Thr-666 to Leu-686 (Fig. 3), showed 12 matches out of 63 base pairs (19%) within the range expected on a random basis in an organism in which the four bases are approximately equally represented. However, if the chain residues 427-447 shifted by one base, the comparison showed 26 matches out of the 62 positions (42%) (Fig. 4a). If the shifted chain is translated in the +1 frame and the amino acids are compared to sequence 666-685, the identity increases from 0% to 20%, and the total homology increases from 0% to 50% (Fig. 4 b and c).

We have chosen to apply the shift to residues 427-447 rather than to residues 666-686, since the latter must resemble the ancestral gene because its translated amino acids present 82% identity with the corresponding homoserine dehydrogenase II region (3) (Fig.