Genome Sequence of an Extremely Halophilic Archaeon
Shiladitya DasSarma

INTRODUCTION

Extreme halophiles are novel microorganisms that require 5–10 times the salinity of seawater (ca. 3–5M NaCl) for optimal growth (1,2). They include diverse prokaryotic species, both archaeal and bacterial, and some eukaryotic organisms. Extreme halophiles are found in hypersaline environments near the sea or salt deposits of marine or nonmarine origin. Two of the largest hypersaline lakes supporting a variety of halophilic species are the Great Salt Lake in the western United States and the Dead Sea in the Middle East. Some of the most interesting hypersaline environments are small artificial solar salterns used for producing salt from the sea, which are distributed throughout the world. Many hypersaline environments increase in salinity over time, producing a succession of progressively more halophilic species, including complex microbial mats and spectacular blooms of bright red and red-orange colored species. These environments are important ecologically, frequently supporting entire populations of such exotic birds as pink flamingoes, which obtain their color from the pigmented halophilic microorganisms. A critical feature of halophilic microbes that prevents cell lysis in hypersaline environments is their high internal concentration of compatible solutes (e.g., amino acids, polyols, and salts), which act as osmoprotectants.

Although a wide variety of halophiles has been cultured, the genome of only a single extreme halophile, Halobacterium sp. NRC-1, has been completely sequenced thus far (3,4). This species is a typical halophile commonly found in many hypersaline environments, including the Great Salt Lake and solar salterns. Phylogenetically, it is classified as an archaeon, a member of the third branch of life (Fig. 1).
It has a growth optimum of 4.5M NaCl, close to the saturation point, and a high concentration of K+ salts internally. Halobacterium NRC-1 is a mesophilic archaeon, with a temperature optimum of 42°C for growth. Although Halobacterium species are thought to have limited physiological capabilities, strain NRC-1 is metabolically quite versatile, growing aerobically, anaerobically, and phototrophically. Phototrophic growth is mediated by the light-driven proton pumping of bacteriorhodopsin, which forms a two-dimensional crystalline lattice in the purple membrane. Halobacterium NRC-1 is also highly resistant to ultraviolet and γ-radiation and displays sophisticated motility responses, including phototaxis, chemotaxis, and gas vesicle-mediated flotation. One of the most notable features of Halobacterium NRC-1, revealed by genome sequencing, is a highly acidic proteome, which is likely essential to maintain protein solubility and function in high salinity. Significantly, this organism is amenable to analysis using well-developed genetic methodology, including gene knockouts, expression vectors, and complementation systems, which make Halobacterium NRC-1 a good model for functional genomic studies among extremophiles and archaea (2).

Fig. 1. Whole genome tree of selected archaeal organisms. Gene content phylogeny done by neighbor-joining using the SHOT web server (19) indicates that Halobacterium is located at the base of the archaeal branch of the phylogenetic tree.

From: Microbial Genomes. Edited by: C. M. Fraser, T. D. Read, and K. E. Nelson. © Humana Press Inc., Totowa, NJ

In addition to Halobacterium NRC-1, several other halophiles are the subject of ongoing genome projects.
The most notable among these are two Dead Sea archaea, Haloarcula marismortui and Haloferax volcanii (1), which are slightly less halophilic than Halobacterium NRC-1, with an optimum salinity of 2–3M NaCl and a high magnesium ion tolerance, reflecting the salt composition of their environment. They can also grow in media containing simple sugars and carbohydrates as carbon and energy sources. Several other interesting categories of halophiles worthy of genomic studies include alkaliphilic halophiles, which grow in soda lakes with pH of 9.0–11.0; psychrotrophic halophiles, which grow at freezing temperatures in Antarctic lakes; bacterial halophiles, which tolerate a wide range of salinity; and eukaryotic halophiles, such as the green alga Dunaliella salina. Finally, sequencing of a haloarchaeal strain with a chromosome nearly identical to that of strain NRC-1 is also in progress. A listing of current genome projects on halophiles is maintained on the Halophile Genomes Web site at the University of Maryland Biotechnology Institute, Center of Marine Biotechnology (http://zdna2.umbi.umd.edu).

THE HALOBACTERIUM GENOME

The genomes of Halobacterium species were originally studied a half-century ago; they are composed of two components, a major fraction that is G+C-rich and a relatively A+T-rich (58% G+C) satellite (5). Subsequent studies showed that the satellite deoxyribonucleic acid (DNA) corresponded mainly to large heterogeneous extrachromosomal replicons containing many transposable insertion sequence (IS) elements (6). For Halobacterium NRC-1, extensive mapping revealed the presence of three replicons: pNRC100, about 200 kbp; pNRC200, nearly twice the size of pNRC100; and a 2-Mbp chromosome (Fig. 2) (7,8). The pNRC100 replicon was found to be partly identical to pNRC200 and to exist as inversion isomers (7).
The chromosomes of strain NRC-1 and another wild-type strain, GRB, were compared by restriction mapping, which showed extensive regions of similarity and a few regions with differences, including a large inversion and an insertion. Ordered cosmid libraries representing the genomes of Halobacterium species GRB and H. volcanii were also constructed and compared by hybridization, which indicated the lack of any detectable conserved gene organization (9). These and other mapping projects suggest that significant diversity exists within the genomes of halophilic archaea.

Genome Sequencing and Analysis

Because of the high G+C composition and the large number of IS elements, the Halobacterium NRC-1 genome was sequenced in two stages. Initially, the pNRC100 replicon was sequenced by a combination of random shotgun sequencing of libraries made from purified covalently closed circular DNA and directed sequencing of cloned and mapped HindIII fragments (3,7). This approach permitted the assembly of an unstable replicon that undergoes frequent DNA rearrangements, including inversion isomerization, and contains many IS elements. Subsequently, whole genome random shotgun sequencing was performed, providing 7.5× coverage of the relatively stable large chromosome (4). Remaining lower-quality regions were sequenced using polymerase chain reaction fragments and by primer walking. The NRC-1 genome was assembled using the Phred, Phrap, and Consed programs, initially masking all the known and putative new IS elements, to avoid the formation of chimeric contigs (4,10).

The complete genome sequence of Halobacterium NRC-1 revealed a 2,571,010-bp genome, including the 2,014,239-bp G+C-rich chromosome, and two smaller circles, 191,346-bp pNRC100, and 365,425-bp pNRC200 (Table 1; Fig. 2) (3,4).
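The replicon sizes quoted above can be checked against the reported genome total with a few lines of arithmetic. The following is only an illustrative sketch: the sizes are taken from the text, while the function and variable names are hypothetical and chosen for this example.

```python
# Replicon sizes in base pairs, as quoted in the text.
REPLICONS_BP = {
    "chromosome": 2_014_239,
    "pNRC100": 191_346,
    "pNRC200": 365_425,
}
REPORTED_TOTAL_BP = 2_571_010  # genome size quoted in the text


def replicon_summary(replicons):
    """Return the summed genome size and each replicon's share of it."""
    total = sum(replicons.values())
    fractions = {name: size / total for name, size in replicons.items()}
    return total, fractions


total_bp, fractions = replicon_summary(REPLICONS_BP)
print("sum matches reported total:", total_bp == REPORTED_TOTAL_BP)
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%} of the genome")
```

The three replicon sizes sum exactly to the reported total, and the chromosome accounts for a little over three quarters of the genome.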
Interestingly, pNRC100 and pNRC200 contained a 145,428-bp region of 100% identity, including 33- to 39-kb inverted repeats, which mediate inversion isomerization; the small single copy region; and a part of the large single copy regions (Fig. 2) (7). The unique portions of the large single copy regions, that is, each replicon's length minus the 145,428-bp shared region, comprised 45,918 bp for pNRC100 and 219,997 bp for pNRC200. Glimmer (Gene Locator and Interpolated Markov Modeler) was used to identify 2,630 likely genes in the genome, of which 64% coded for proteins with significant matches to the databases (4). In addition, 52 ribonucleic acid (RNA) genes were identified. About 40 genes in pNRC100 and pNRC200 coded for proteins likely to be essential or important for cell viability, such as a DNA polymerase, TBP and TFB transcription factors, and the arginyl-tRNA (transfer RNA) synthetase, suggesting that these replicons should be classified as minichromosomes rather than megaplasmids (3,4).

Proteome Analysis

One of the most dramatic results of genome sequencing of Halobacterium NRC-1 was the finding of an extremely acidic complement of encoded proteins, which is likely directly related to protein function in its hypersaline (>4M KCl) cytoplasm (11). Calculated isoelectric points (pIs) for predicted proteins showed an average pI of approximately 5, a prediction confirmed by proteomic analysis (Fig. 3). Similarly acidic proteomes were predicted from partial genome sequences of two other halophiles, H. marismortui and H. volcanii. In contrast, the average pIs of nearly all other proteomes are close to neutral. Notable exceptions are Methanobacterium thermoautotrophicum, which also contains both an acidic proteome and a relatively high (~1M) internal concentration of K+ ions, and three hyperthermophiles (Pyrobaculum aerophilum, Pyrococcus furiosus, and Sulfolobus solfataricus), which have relatively basic proteomes.
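To make the notion of a calculated pI concrete, the sketch below estimates the isoelectric point of a sequence by finding the pH at which its net charge is zero. The text does not say how the pIs were computed, so this is a generic charge-balance approach and not necessarily the method used for NRC-1. The pKa values are an assumed textbook-style set (published scales differ slightly), and the two peptides are made up for illustration.

```python
# Assumed, textbook-style pKa values; published scales differ by a few tenths.
BASIC_PKA = {"N_TERM": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
ACIDIC_PKA = {"C_TERM": 2.0, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def group_counts(seq, residues, terminus):
    """Count ionizable residues and add the single chain terminus."""
    counts = {aa: seq.count(aa) for aa in residues}
    counts[terminus] = 1
    return counts


def net_charge(seq, ph):
    """Net charge at a given pH from Henderson-Hasselbalch fractions."""
    basic = group_counts(seq, "KRH", "N_TERM")
    acidic = group_counts(seq, "DECY", "C_TERM")
    positive = sum(n / (1 + 10 ** (ph - BASIC_PKA[g])) for g, n in basic.items())
    negative = sum(n / (1 + 10 ** (ACIDIC_PKA[g] - ph)) for g, n in acidic.items())
    return positive - negative


def isoelectric_point(seq, tol=1e-4):
    """Bisect on pH; net charge falls monotonically as pH rises."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid   # still positively charged, so the pI lies higher
        else:
            high = mid  # negatively charged (or neutral), so the pI lies lower
    return (low + high) / 2


# Made-up illustrative peptides, not sequences from Halobacterium NRC-1.
acidic_peptide = "MDDEELDAVEDG"
basic_peptide = "MKKRLKAVRKG"
for name, seq in [("acidic", acidic_peptide), ("basic", basic_peptide)]:
    print(name, round(isoelectric_point(seq), 2))
```

Averaging such values over all predicted proteins gives a proteome-wide mean pI. A sequence rich in aspartate and glutamate, like the acidic peptide here, yields a pI well below neutral.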
Homology modeling has shown that the acidic pI of Halobacterium NRC-1 proteins is correlated with a high concentration of surface negative charge (11). For example, a transcription factor (TbpE) and a topoisomerase subunit (GyrA) showed a marked increase in surface negative charge when compared