Genome Dynamics in a Natural Archaeal Population
Total Page:16
File Type:pdf, Size:1020Kb
Genome dynamics in a natural archaeal population Eric E. Allen*†, Gene W. Tyson*‡, Rachel J. Whitaker*§, John C. Detter¶, Paul M. Richardson¶, and Jillian F. Banfield*ʈ** Departments of *Environmental Science, Policy, and Management and ʈEarth and Planetary Science, University of California, Berkeley, CA 94720; and ¶Department of Energy Joint Genome Institute, Walnut Creek, CA 94598 Edited by James M. Tiedje, Michigan State University, East Lansing, MI, and approved November 30, 2006 (received for review June 9, 2006) Evolutionary processes that give rise to, and limit, diversification to the interrogated strains and the inability to extract syntenic within strain populations can be deduced from the form and information. distribution of genomic heterogeneity. The extent of genomic An alternative route to assess genetic diversity and reveal mi- change that distinguishes the acidophilic archaeon Ferroplasma croevolutionary processes within natural microbial populations acidarmanus fer1 from an environmental population of the same involves the cultivation-independent analysis of sequences from species from the same site, fer1(env), was determined by compar- many individuals obtained from environmentally derived shotgun ing the 1.94-megabase (Mb) genome sequence of the isolate with sequence data (9, 10). At the population level, comparison of that reconstructed from 8 Mb of environmental sequence data. The reconstructed genome fragments and the associated raw sequence fer1(env) composite sequence sampled Ϸ92% of the isolate ge- data can reveal genomic regions that are invariant as well as those nome. Environmental sequence data were also analyzed to reveal regions that possess heterogeneity. genomic heterogeneity within the coexisting, coevolving fer1(env) In the present study, we reconstructed the genome sequence of population. Analyses revealed that transposase movement and the an isolate, Ferroplasma acidarmanus fer1, and compared it with insertion and loss of blocks of novel genes of probable phage Ϸ103 megabases (Mb) of environmental sequence data obtained origin occur rapidly enough to give rise to heterogeneity in gene directly from a natural biofilm community from the same site. Both content within the local population. Because the environmental the isolate (collected in July 1997) (11) and the environmental DNA was derived from many closely related individuals, it was biofilm (Ϸ1-cm3 sample collected in March 2002) (9) were recov- possible to quantify gene sequence variability within the popula- ered from the five-way region of the Richmond Mine at Iron tion. All but a few gene variants show evidence of strong purifying Mountain in Northern California. F. acidarmanus fer1 is an ex- selection. Based on the small number of distinct sequence types tremely acidophilic ferrous iron [Fe(II)]-oxidizing Euryarchaeote and their distribution, we infer that the population is undergoing (order Thermoplasmatales) implicated in acid mine drainage frequent genetic recombination, resulting in a mosaic genome pool (AMD) generation (11). Isolation of fer1 and description of its that is shaped by selection. The larger genetic potential of the habitat have been reported (11). The fer1 strain grows optimally at population relative to individuals within it and the combinatorial 42°C, pH 1.2, and is capable of growth at pH 0. It is closely related process that results in many closely related genome types may to Ferroplasma acidiphilum (12, 13), based on comparison of 16S provide the basis for adaptation to environmental fluctuations. rRNA gene sequences (99.8% nucleotide similarity). Detailed phenotypic characterization of Ferroplasma species, including F. Archaea ͉ Ferroplasma ͉ population genomics ͉ recombination acidarmanus fer1, has been reported (12). To date, fer1 represents the only archaeal isolate recovered from the Richmond Mine. Here, we investigate the structure and genomic diversity of an he sequencing of microbial isolate genomes has enabled com- environmental population of F. acidarmanus, herein referred to as parison of gene complements from a broad selection of bacte- T fer1(env), making use of the fer1 isolate genome sequence as a rial and archaeal species. Comparative genomic explorations lend reference. Although both fer1 and the fer1(env) population were insight into the metabolic and ecological diversity of microbial taxa obtained from the same site, it cannot be claimed that the fer1 and provide us with the raw materials for inferring evolutionary isolate is a member of the fer1(env) population because it was ECOLOGY history among lineages. However, in most cases only a single collected Ͼ4 years earlier. However, because both fer1 and the representative of a particular lineage has been analyzed genomi- fer1(env) population had a very recent common ancestor, we can cally. Where multiple strain genomes are available, the organisms use the fer1 isolate genome in a comparative approach to quantify are typically pathogens, and the strains are derived from distinct genome-wide heterogeneity and deduce the relative rates of the localities (1–5). Lacking are comparative population genomic stud- ies that include analysis of an isolate versus an environmental population of the same species obtained from the same site. Such Author contributions: E.E.A. and J.F.B. designed research; E.E.A., G.W.T., J.C.D., and P.M.R. data would enable evaluation of how representative of the popu- performed research; E.E.A., G.W.T., R.J.W., and J.F.B. analyzed data; and E.E.A. and J.F.B. lation an individual is and would reveal the form and distribution wrote the paper. of genetic variability that can give rise to the emergence of new The authors declare no conflict of interest. species. This article is a PNAS direct submission. Genetic techniques such as multilocus sequence typing (MLST) Abbreviations: MLST, multilocus sequence typing; Mb, megabase; IS, insertion sequence; (6, 7) of multiple individuals derived from a microbial population R/M, restriction-modification; dN/Ds ratio, ratio of nonsynonymous to synonymous nucle- can provide a glimpse into the extent of genetic diversity within a otide substitutions. Data deposition: Accession numbers for the community genomic data are AADL01000000 natural population and insight into important processes that shape and AADL00000000. The Ferroplasma acidarmanus fer1 genome data associated with this population structure. However, MLST requires isolation of the study have been deposited in the GenBank database (accession no. AABC05000000). organisms under investigation and typically surveys only a small (6 †Present address: Scripps Institution of Oceanography, University of California at San to 10) number of loci. Consequently, MLST approaches are limited Diego, La Jolla, CA 92093. in the extent to which they can be applied to a wide variety of ‡Present address: Civil and Environmental Engineering, Massachusetts Institute of Tech- organisms, and extrapolation of such data to the whole-genome nology, Cambridge, MA 02139. scale may be limited. Similarly, the utility of DNA microarray §Present address: Department of Microbiology, University of Illinois, Urbana, IL 61801. technologies to survey genetic diversity among closely related **To whom correspondence should be addressed. E-mail: [email protected]. strains or environmental populations is inherently limited owing to This article contains supporting information online at www.pnas.org/cgi/content/full/ the fact that only genes shared between reference and test strain(s) 0604851104/DC1. can be surveyed (8). Missing from such analyses are genes unique © 2007 by The National Academy of Sciences of the USA www.pnas.org͞cgi͞doi͞10.1073͞pnas.0604851104 PNAS ͉ February 6, 2007 ͉ vol. 104 ͉ no. 6 ͉ 1883–1888 Downloaded by guest on September 26, 2021 a 1 1800001 200001 integrated elements unique 1600001 genes tnps 400001 Fig. 2. Example of a heteromorphic fer1(env) genomic region that includes G+Cdev a type I restriction-modification system (M, methylation; S, specificity; R, restriction) that is present in only a subset of fer1(env) members, and is distinct from the comparable region in fer1. The 24-gene insertion in fer1 (shown in 1400001 red) includes a phage integrase, four hypothetical genes, a phage-related DNA primase, 13 hypothetical proteins, another phage integrase, a conserved 600001 hypothetical membrane protein, a site-specific DNA methylase, and a hypo- thetical protein. The insertion region is 3Ј delimited by a tRNALeu. 1200001 800001 composite fer1(env) genomes. There are 45 genes in fer1(env) not 1000001 detected in the fer1 isolate and 152 genes in fer1 not present in fer1(env). These gene products are predominantly hypothetical and conserved hypothetical proteins, as well as transposases, integrases, b 1.6e+06 transposase rearrangements integrated element sites transport-related proteins, products involved in restriction/ 1.4e+06 modification and DNA repair, and glycosyltransferases (see SI 1.2e+06 Table 4). No missing genes were found in functional categories ome gen e t involved in core energy metabolism, lipid biosynthesis, cofactor si 1e+06 po biosynthesis, or transcription. Of the unique fer1 isolate genes, 79% 800000 com occur in distinct genomic blocks (the largest comprising 24 genes; v) ϩ 600000 Fig. 2 and SI Fig. 8). Such regions often posses anomalous G C 1(en fer contents (Fig. 1a) and are often associated with phage-type inte- 400000 grase genes. Thus, we infer that the blocks of unique genes are the 200000 result of prophage insertion.