Predicted Highly Expressed Genes in Archaeal Genomes
Samuel Karlin*†, Jan Mrázek*, Jiong Ma‡, and Luciano Brocchieri*

Departments of *Mathematics and ‡Biological Sciences, Stanford University, Stanford, CA 94305-2125

Contributed by Samuel Karlin, March 25, 2005

Based primarily on 16S rRNA sequence comparisons, life has been broadly divided into the three domains of Bacteria, Archaea, and Eukarya. Archaea is further classified into Crenarchaea and Euryarchaea. Archaea generally thrive in extreme environments as assessed by temperature, pH, and salinity. For many prokaryotic organisms, ribosomal proteins (RP), transcription/translation factors, and chaperone genes tend to be highly expressed. A gene is predicted highly expressed (PHX) if its codon usage is rather similar to the average codon usage of at least one of the RP, transcription/translation factors, and chaperone gene classes and deviates strongly from the average gene of the genome. The thermosome (Ths) chaperonin family represents the most salient PHX genes among Archaea. The chaperones Trigger factor and HSP70 have overlapping functions in the folding process, but both of these proteins are lacking in most archaea where they may be substituted by the chaperone prefoldin. Other distinctive PHX proteins of Archaea, absent from Bacteria, include the proliferating cell nuclear antigen PCNA, a replication auxiliary factor responsible for tethering the catalytic unit of DNA polymerase to DNA during high-speed replication, and the acidic RP P0, which helps to initiate mRNA translation at the ribosome. Other PHX genes feature Cell division control protein 48 (Cdc48), whereas the bacterial septation proteins FtsZ and minD are lacking in Crenarchaea. RadA is a major DNA repair and recombination protein of Archaea. Archaeal genomes feature a strong Shine–Dalgarno ribosome-binding motif more pronounced in Euryarchaea compared with Crenarchaea.

acidic ribosomal proteins | Archaea | highly expressed proteins | thermosome

Abbreviations: CH, chaperone/degradation genes; PHX, predicted highly expressed; RP, ribosomal proteins; SD, Shine–Dalgarno; TF, transcription/translation synthesis factors.

†To whom correspondence should be addressed. E-mail: [email protected].

© 2005 by The National Academy of Sciences of the USA. www.pnas.org/cgi/doi/10.1073/pnas.0502313102. PNAS | May 17, 2005 | vol. 102 | no. 20 | 7303–7308

The identity of the three domains of life (1) and their relationships are controversial (2–11). Archaea form a heterogeneous clade composed of a mosaic of bacterial, eukaryotic, and unique features. Archaea and Eukarya share many homologous genes involved in information processing (replication, transcription, and translation), whereas Archaea and Bacteria share many morphological structures and metabolic proteins (10, 12). Of 19 archaeal genomes completely sequenced (Table 1, mid-2004), 4 are from Crenarchaea and 14 are from Euryarchaea. Nanoarchaeum equitans, a parasitic archaeon that lives in coculture with the archaeon Ignicoccus, has been tentatively assigned to the separate group of Nanoarchaea. Most sequenced archaea, to date, are thermophilic, generally prefer extreme environments, and are found in most ecosystems. The four sequenced crenarchaea are all hyperthermophiles (optimal growth temperature, ≥75°C), although mesophilic crenarchaea have been putatively found in pelagic waters (3, 13). Among the Euryarchaea, six are methanogens, including three mesophiles, Methanosarcina acetivorans, Methanosarcina mazei, and Methanococcus maripaludis. Halobacterium NRC-1 is also mesophilic, thriving in high salt concentrations. Most sequenced archaea, excluding methanogens (lifestyle strictly anaerobic, metabolism methanogenesis) grow both aerobically and anaerobically. Archaeal mRNAs are principally polycistronic as in bacterial genomes. Archaeal proteins involved in translation have both eukaryotic and bacterial features (14).

Table 1. Archaeal genomes

Name     Genome size, kb   Optimal growth temp, °C   G+C content, %   No. of genes ≥80 aa
Crenarchaea
SULSO    2,992             80                        36               2,869
SULTO    2,695             80                        33               2,657
AERPE    1,670             90                        56               1,783
PYRAE    2,222             100                       51               2,290
Euryarchaea
PYRAB    1,765             96                        45               1,786
PYRFU    1,908             96                        41               1,941
PYRHO    1,739             96                        42               1,828
THEAC    1,565             59                        46               1,415
THEVO    1,585             60                        40               1,415
PICTO    1,546             60                        36               1,473
ARCFU    2,178             83                        49               2,214
METKA    1,695             110                       61               1,606
METTH    1,751             65                        50               1,735
METJA    1,665             85                        31               1,635
METMP    1,661             37                        33               1,630
METAC    5,751             37                        43               4,252
METMA    4,096             37                        41               3,147
HALSP    2,014             37                        68               1,880
Nanoarchaea
NANEQ    491               90                        32               515

Notice that most archaea subtend genomes of moderate size, ranging from ≈1.5 to 3.00 Mb. The methanogens are of variable size with the two mesophilic Methanosarcina species especially relatively large, exceeding 4- and 5.7-Mb genome lengths. SULSO, Sulfolobus solfataricus; SULTO, Sulfolobus tokodaii; AERPE, Aeropyrum pernix; PYRAE, Pyrobaculum aerophilum; PYRAB, Pyrococcus abyssi; PYRFU, Pyrococcus furiosus; PYRHO, Pyrococcus horikoshii; THEAC, Thermoplasma acidophilum; THEVO, Thermoplasma volcanium; PICTO, Picrophilus torridus; ARCFU, Archaeoglobus fulgidus; METKA, Methanopyrus kandleri; METTH, Methanobacter thermoautotrophicus; METJA, Methanocaldococcus jannaschii; METMP, Methanococcus maripaludis; METAC, Methanosarcina acetivorans; METMA, Methanosarcina mazei; HALSP, Halobacterium sp. NRC-1; NANEQ, Nanoarchaeum equitans; temp, temperature.

The objectives of this work are to identify and analyze the major predicted highly expressed (PHX) genes with respect to codon usage biases among the archaeal genomes. Our analyses of bacterial genomes support the hypothesis that each species has evolved codon usage patterns promoting "optimal" gene expression levels for most circumstances of its habitat, energy sources, and lifestyle (15, 16). Codon bias is often different at the start of a gene compared with the central or terminal part of the gene (17, 18). Different selection pressures are imposed by the constraints of ribosomal binding and translation fidelity.

Protein folding is possibly correlated with codon usage (19, 20). According to the rare codon hypothesis for domains and secondary structures, repetition of rare codons reduces translation rate and introduces translation pauses, allowing time for protein domains and secondary structures to fold into native conformations. However, there appear to be subtle differences in bacterial and eukaryotic translation mechanisms, e.g., the role of chaperonins in bacteria vs. eukaryotes and the important activity of cotranslational folding in eukaryotes but not in prokaryotes. Generally, PHX genes in bacterial genomes rely on favorable codon usages, tend to possess a strong Shine–Dalgarno (SD) sequence (21), and putatively possess a strong promoter sequence. The substantial variability of G+C composition within mammalian genomes (isochores) may complicate predicting gene expression levels from codon usages. In contrast, the nucleotide compositions of bacterial genomes are largely homogeneous. Gene expression in prokaryotes is regulated at initiation and termination of transcription and translation and by different rates of transcription and translation, differential mRNA stabilities, segmental stability differences in polycistronic messages, codon preferences, and interactions with chaperones and other proteins.

The thermosome chaperones are outstandingly PHX genes (Table 2) consistent with the extreme environmental conditions to which these species have adapted. General comparisons of bacterial vs. archaeal genomes and corresponding discussion of the genomic and proteomic content of the Saccharomyces cerevisiae and Drosophila melanogaster genomes are set forth in our companion paper (2).

RP gene class of Archaea is the most variable, whereas the CH and TF gene classes are more coherent and consistent. There is good agreement of our determinations of PHX protein abundances with assessments by 2D gel electrophoresis displacements (e.g., ref. 16).

Results and Discussion

Distinctive Proteins of Archaeal Genomes. The thermosome subunits (Ths) are among the most PHX throughout the archaeal domain and almost always essential (22). Archaea generally live in extreme environments that are likely to affect the integrity of their proteins, nucleic acid, and membranes. Mutational and other disturbances conceivably may be alleviated by an abundance of chaperone and degradation proteins, including thermosome, prefoldin, and the proteasome complex assisted by repair, recombination, and replication enzymes (22, 23). Ths is pervasively PHX in archaeal genomes at a very high predicted expression level (Table 2). Ths also has been investigated experimentally and confirmed especially abundant in Sulfolobus species encompassing up to 20% of the cellular protein content (24–26). DnaK (HSP70) is found, so far, only in archaeal mesophiles or in moderate thermophiles (27), where it is PHX. The heat-inducible Lon protease is absent from the Crenarchaea but usually PHX among the Euryarchaea (see Table 5, which is published as supporting information on the PNAS web site). Archaeal genomes also are distinguished with proteasome subunits. The chaperone prefoldin (Pfd) β-subunit is present in all Archaea, whereas the α-subunit is lacking in Crenarchaea and Thermoplasma genomes (Tables 2 and 5).

The replication protein PCNA (proliferating cell nuclear antigen) is present mostly PHX in all archaea and eukaryotes but

Methods

Highly