Pan-Genome Analyses Identify Lineage
Total Page:16
File Type:pdf, Size:1020Kb
University of Rhode Island DigitalCommons@URI Cell and Molecular Biology Faculty Publications Cell and Molecular Biology 2014 Pan-Genome Analyses Identify Lineage- and Niche-Specific aM rkers of Evolution and Adaptation in Epsilonproteobacteria Ying Zhang University of Rhode Island, [email protected] Stefan M. Sievert Creative Commons License Creative Commons License This work is licensed under a Creative Commons Attribution 3.0 License. Follow this and additional works at: https://digitalcommons.uri.edu/cmb_facpubs Citation/Publisher Attribution Zhang Y., Sievert S.M. (2014). "Pan-genome analyses identify lineage- and niche-specific am rkers of evolution and adaptation in Epsilonproteobacteria." Frontiers in Microbiology. 5: 110. Available at: http://dx.doi.org/10.3389/fmicb.2014.00110 This Article is brought to you for free and open access by the Cell and Molecular Biology at DigitalCommons@URI. It has been accepted for inclusion in Cell and Molecular Biology Faculty Publications by an authorized administrator of DigitalCommons@URI. For more information, please contact [email protected]. ORIGINAL RESEARCH ARTICLE published: 19 March 2014 MICROBIOLOGY doi: 10.3389/fmicb.2014.00110 Pan-genome analyses identify lineage- and niche-specific markers of evolution and adaptation in Epsilonproteobacteria Ying Zhang*† and Stefan M. Sievert Biology Department, Woods Hole Oceanographic Institution, Woods Hole, MA, USA Edited by: The rapidly increasing availability of complete bacterial genomes has created new Martin G. Klotz, University of North opportunities for reconstructing bacterial evolution, but it has also highlighted the difficulty Carolina at Charlotte, USA to fully understand the genomic and functional variations occurring among different Reviewed by: lineages. Using the class Epsilonproteobacteria as a case study, we investigated the Barbara J. Campbell, Clemson University, USA composition, flexibility, and function of its pan-genomes. Models were constructed to Amrita Pati, DOE Joint Genome extrapolate the expansion of pan-genomes at three different taxonomic levels. The results Institute, USA show that, for Epsilonproteobacteria the seemingly large genome variations among *Correspondence: strains of the same species are less noticeable when compared with groups at higher Ying Zhang, Department of Cell and taxonomic ranks, indicating that genome stability is imposed by the potential existence of Molecular Biology, College of the Environment and Life Sciences, taxonomic boundaries. The analyses of pan-genomes has also defined a set of universally University of Rhode Island, 487 conserved core genes, based on which a phylogenetic tree was constructed to confirm CBLS, 120 Flagg Road, Kingston, that thermophilic species from deep-sea hydrothermal vents represent the most ancient RI 02881, USA lineages of Epsilonproteobacteria. Moreover, by comparing the flexible genome of a e-mail: [email protected] chemoautotrophic deep-sea vent species to (1) genomes of species belonging to the †Present address: Ying Zhang, Department of Cell and same genus, but inhabiting different environments, and (2) genomes of other vent species, Molecular Biology, College of the but belonging to different genera, we were able to delineate the relative importance of Environment and Life Sciences, lineage-specific versus niche-specific genes. This result not only emphasizes the overall University of Rhode Island, importance of phylogenetic proximity in shaping the variable part of the genome, but Kingston, USA also highlights the adaptive functions of niche-specific genes. Overall, by modeling the expansion of pan-genomes and analyzing core and flexible genes, this study provides snapshots on how the complex processes of gene acquisition, conservation, and removal affect the evolution of different species, and contribute to the metabolic diversity and versatility of Epsilonproteobacteria. Keywords: pan-genome, core genes, flexible genes, Epsilonproteobacteria, Sulfurimonas, Helicobacter, Campylobacter INTRODUCTION important role. The universally conserved 16S ribosomal RNA The evolution of bacterial genomes is characterized by a massive (rRNA) gene has been widely used to assess the phylogenetic amount of insertions, deletions, and rearrangements, permitting diversity and, by inference, even the functional diversity of the differentiation and adaptation of evolutionarily related lin- microbes in environmental samples. However, inferring func- eages into vastly diverse environments (Romero and Palacios, tionbasedonthe16SrRNAgeneisonlypossibleinselected 1997; Cohan, 2001; Mira et al., 2002; Reams and Neidle, 2003). cases where all members of a 16S-defined clade share the same Over the past two decades, the realization of extended pan- physiology, e.g., in case of cyanobacteria or certain groups genomes in bacterial species has stimulated many discussions of sulfate-reducing Deltaproteobacteria. Further, various studies about the meaning of species boundaries (Medini et al., 2005). have demonstrated substantial genomic variations among strains The fact that strains within a single species can have widely that differ only slightly in 16S rRNA sequences (Coleman et al., different repertoires of genes has highlighted the complexity in 2006; Rasko et al., 2008; Tettelin et al., 2008). Therefore, a identifying bacterial species. Despite the complete sequencing of genome-scale understanding of how genes evolve and what deter- over 2000 bacterial genomes and the in-depth study of a number mines the acquisition and deletion of genes is essential for map- of model organisms, a coherent model is still lacking for genome ping the complete genetic variations of bacteria, while at the same evolution both within and among bacterial species. time assisting in the classification and functional identification of In practice, bacteria have been classified using a polypha- bacterial species. sic approach combining information from multiple molecular, Traditionally, the term pan-genome has been used to describe morphological, and physiological analyses (Rosselló-Mora and the full repertoire of genes found in different strains of a single Amann, 2001), with molecular methods playing an increasingly species (Hanage et al., 2005; Konstantinidis et al., 2006; Read and www.frontiersin.org March 2014 | Volume 5 | Article 110 | 1 Zhang and Sievert Evolution and adaptation of Epsilonproteobacteria Ussery, 2006; Lefébure et al., 2010; Lukjancenko et al., 2010), but new genomes, the size of a closed pan-genome would reach a more recently this concept has been extended to represent the plateau after a certain number of sample genomes were included. total genes in any pre-defined group of bacteria or archaea (Polz Previously, such analyses have only been performed at the species et al., 2013). Here, we compared the pan-genomes at different level (Tettelin et al., 2008). In order to compare the pan-genomes taxonomic ranks within the class Epsilonproteobacteria.Byexam- of different taxonomic ranks, we constructed regression mod- ining the genomic variations within same species as well as among elsforfivegroupsofEpsilonproteobacteria that correspond to different species, we investigated how the processes of gene con- three different levels of taxonomy (Table S1): two groups repre- servation and transfer affect the evolution of pan-genomes and senting strains of same species (intra-species), two representing contribute to the functional adaptation of individual species. species of same genera (intra-genus), and one representing dif- The class Epsilonproteobacteria is metabolically diverse and ferent species of the entire class (intra-class or inter-species). It is contains organisms with different life styles, including both worth mentioning that groups of higher taxonomic rank encom- free-living and host-associated species (Campbell et al., 2006). passed those of lower rank. For example, the inter-species group To date, most studies have focused on species that are associ- (Class) included all the different species in the intra-genus groups ated with the human digestive system, such as members of the (Genus), and the intra-genus groups included a representative genera Helicobacter and Campylobacter, where they exist either strain from each of the intra-species groups (Species). Hence, asymptomatically or cause diseases like peptic ulcers or gastric our analyses compared the extent of pan-genome expansion at cancer (Engberg et al., 2000). The environmental relevance of all three taxonomic levels. Epsilonproteobacteria had not been recognized until the late 90 s We used a two-step process to model the expansion of the pan- (Moyer et al., 1995; Polz and Cavanaugh, 1995; Longnecker and genomes.First,completeorrandompermutationswerecarriedout Reysenbach, 2001). As more and more free-living species were with step-wise additions of new genomes, and a median was taken identified and isolated, it became clear that Epsilonproteobacteria on the size of pan-genomes after each step. Second, the median play important roles in the biogeochemical cycling of nitrogen, counts was extrapolated using two different models: power-law sulfur, and carbon in various marine and terrestrial environ- regression(Tettelinetal.,2008)andexponentialregression(Tettelin ments (Campbell et al., 2006). At deep-sea hydrothermal vents, et al., 2005). The resulting extrapolations were normalized chemoautotrophic Epsilonproteobacteria serve as important pri- by the median genome sizes in their respective sets to assist mary producers