Evidence for Massive Gene Exchange Between Archaeal and Bacterial Hyperthermophiles
MEETING REPORT

releasing Sir proteins from the Ku70p–Ku80p telomerase complex (David Shore, Univ. of Geneva, Switzerland). Cdc13p protein binds single-stranded DNA at the Ku70p–Ku80p telomerase complex (Vicki Lundblad, Baylor, USA). Nuclear organization of telomeres is important, with telomeres located at the nuclear periphery (Susan Gasser, ISREC, Switzerland). Targeting DNA to the periphery using an ER–Golgi anchoring signal can produce silencing (Rolf Sternglanz, SUNY, USA). Hence, any gene brought to the nuclear periphery will be silenced by the Sir protein complex.

In summary, the importance of chromatin structure was evident in all sessions. Yeast origins, centromeres and telomeres bind elegant multiprotein complexes that act as regulatory machines to change chromatin structure and to allow important cellular processes to occur.

Robert A. Sclafani
Department of Biochemistry and Molecular Genetics, University of Colorado Health Sciences Center, 4200 E. Ninth Avenue, Denver, CO 80262, USA.

LETTER

Evidence for massive gene exchange between archaeal and bacterial hyperthermophiles

Sequencing of multiple complete genomes of bacteria and archaea makes it possible to perform systematic, genome-scale comparisons that aim to delineate the genomic complement of a particular phenotype. Recently, the first genome of a hyperthermophilic bacterium, Aquifex aeolicus, has been sequenced¹. Previous studies based on rRNA and aminoacyl-tRNA analysis had suggested a very early divergence of Aquifex from the rest of the bacteria²,³. Aquifex is exceptional among bacteria in that it occupies the hyperthermophilic niche otherwise dominated by archaea². The published analysis of the Aquifex genome concluded that the genome sequence yielded ‘only a few specific indications of thermophily’¹. Three genomes of extreme thermophilic archaea are currently available⁴–⁶: Methanococcus jannaschii, Methanobacterium thermoautotrophicum and Archaeoglobus fulgidus. We therefore reasoned that a detailed comparison of the Aquifex and archaeal genomes could reveal genome-scale adaptations for thermophily.

The protein sequences encoded in all complete bacterial genomes were compared with the non-redundant protein sequence database using the gapped BLAST program⁷. A phylogenetic breakdown was then produced automatically using the TAX_COLLECTOR program (Ref. 8, and D.R. Walker, unpublished). The results show that the fraction of Aquifex gene products that have archaeal proteins as clear best hits is far greater than in any of the other bacteria: 16.2% in Aquifex, compared with at most 5.0% in B. subtilis (Table 1). We took the fraction of ‘archaeal’ genes in Bacillus subtilis as a conservative estimate of the random expectation for a bacterial genome. Using the normal approximation of the binomial distribution, we estimated that random fluctuation could not explain the excess of ‘archaeal’ genes in Aquifex (p ≪ 10⁻¹⁰).

TABLE 1. ‘Archaeal’ genes in bacterial genomes

Bacterial speciesᵃ      Reliable best hits to archaeal proteinsᵇ
Aquifex aeolicus        246 (16.2%)
Bacillus subtilis       207 (5.0%)
Synechocystis sp.       126 (4.0%)
Borrelia burgdorferi    45 (3.6%)
Escherichia coli        99 (2.3%)

ᵃ The data on Haemophilus influenzae and Helicobacter pylori (Proteobacteria) and on Mycoplasma genitalium and Mycoplasma pneumoniae (Gram-positive bacteria) are not shown. In these species, most best hits are to homologs from larger genomes within the same phylogenetic lineage, namely E. coli and B. subtilis, respectively.
ᵇ All database hits with associated expectation (e) values <10⁻³ were analyzed. A ‘reliable best hit’ was registered when the e-value with an archaeal protein was lower than that with any bacterial or eukaryotic protein by at least a factor of 100.

A reciprocal comparison gave the same picture. For proteins encoded in each of the three archaeal genomes, Aquifex proteins are the best hits significantly more often than proteins from other bacteria (Table 2). This holds even for bacteria whose genomes are 2–3 times larger than the Aquifex genome, such as Synechocystis sp. or B. subtilis.

TIG NOVEMBER 1998 VOL. 14 NO. 11 0168-9525/98/$ – see front matter. Published by Elsevier Science. 442 PII: S0168-9525(98)01553-4

TABLE 2. ‘Bacterial’ proteins in archaea

Archaeal species                       Reliable best hits in bacteriaᵃ
                                       Aa            Bs           Ssp          Ec           Bb
Methanococcus jannaschii               193 (10.9%)   78 (4.4%)    56 (3.2%)    44 (2.5%)    16 (0.9%)
Methanobacterium thermoautotrophicum   151 (8.0%)    103 (5.4%)   91 (4.8%)    41 (2.2%)    13 (0.7%)
Archaeoglobus fulgidus                 227 (9.4%)    140 (5.8%)   80 (3.9%)    59 (2.5%)    16 (0.7%)

ᵃ Defined as in Table 1. The bacterial species included are the same as in Table 1. Abbreviations: Aa, Aquifex aeolicus; Bb, Borrelia burgdorferi; Bs, Bacillus subtilis; Ec, Escherichia coli; Ssp, Synechocystis sp.

In a complementary analysis, bacterial proteins were compared with protein families that are conserved in all three sequenced archaeal genomes (Ref. 9 and K. Makarova, L. Aravind, R.L. Tatusov and E.V. Koonin, unpublished). The fraction of bacterial proteins that could be included in the conserved archaeal families was essentially uniform, at about 20% of each bacterial proteome. Aquifex was the sharp exception, at 39% (Table 3).

TABLE 3. Inclusion of bacterial proteins into conserved archaeal familiesᵃ

Bacterial species       Proteins from the given species included in archaeal COGs
Aquifex aeolicus        597 (39%)
Synechocystis sp.       707 (22%)
Bacillus subtilis       910 (22%)
Escherichia coli        891 (21%)
Borrelia burgdorferi    215 (17%)

ᵃ A total of 789 families of probable orthologs (clusters of orthologous groups, or COGs) in the three archaeal genomes were identified as previously described. Bacterial proteins were compared with these COGs using the gapped BLASTP program, and a bacterial protein was included in the given COG if its best hits to

These results indicate a direct relationship between a sizeable fraction of genes in Aquifex and archaea. We therefore investigated the shared protein families in further detail, using two approaches. The first was iterative database searches with the PSI-BLAST program⁷. The second was phylogenetic tree construction with the neighbor-joining and parsimony methods¹⁰. Of the 246 Aquifex proteins that are most similar to their archaeal homologs (Table 1), 26 belong to families found only in archaea and Aquifex. A further 60 of the remaining families were investigated by phylogenetic methods. For 26 of these, the Aquifex/archaea grouping received statistically significant support (>65% of bootstrap replications; data not shown).

These observations suggest that there has been massive gene exchange between extreme thermophilic archaea and the lineage of bacterial hyperthermophiles represented by Aquifex. Convergence driven by positive selection for thermotolerance could account for a subset of the archaeal best hits among Aquifex proteins. However, three lines of evidence argue for horizontal transfer of at least 10% of Aquifex genes from the archaea:
- for many Aquifex proteins, sequence similarity to archaeal best hits is far higher than to bacterial best hits, and the difference is highly significant;
- unique domain architectures are conserved in archaea and Aquifex;
- the phylogenetic analysis supports an Aquifex/archaea grouping.

The ‘archaeal’ genes in Aquifex are a functionally diverse set. Predictably, the genes found exclusively in archaea and Aquifex are functionally uncharacterized, owing to the lack of experimental data on these organisms. Several of them, however, form highly conserved families. On the basis of their patterns of amino acid residue conservation, these families could be predicted to possess as yet unknown enzymatic activities. The remaining genes have homologs in well-characterized genomes, so their functions can be predicted in most cases. These include metabolic enzymes, transporters and proteins involved in genome replication and repair. Three cases are of particular interest:
- two families of ATP-dependent DNA ligases, one of which has not been described previously and is only distantly related to eukaryotic ligases;
- an archaeal/eukaryotic-type ATPase distantly related to the bacterial RecA;
- a small protein homologous to the catalytic domain of DnaG-type DNA primases.

In each of these cases, Aquifex also encodes a typical bacterial counterpart of the ‘archaeal’ protein, namely the NAD-dependent DNA ligase, RecA and a classic DnaG ortholog. Similar chimerism was observed for several enzymes, for example, the tryptophan synthase β subunit, peroxidase and isopropylmalate dehydratase. In these cases, it seems particularly plausible that the ‘archaeal’ genes have been introduced into the Aquifex genome by horizontal transfer, on top of a

The Aquifex genome contains 36 clusters of two or more adjacent ‘archaeal’ genes (Fig. 1). The mean length of a cluster is significantly greater than expected from a random distribution in the genome (p < 10⁻³). This was calculated using a geometric distribution approximation and confirmed by computer simulation. The result suggests a conserved arrangement of some genes in Aquifex and archaea. Indeed, three such clusters were identified. The most prominent includes 13 Aquifex genes whose arrangement is partially conserved in the archaea but not in at least