ORIGINAL RESEARCH published: 14 November 2017 doi: 10.3389/fmicb.2017.02132
Ecogenomics and Taxonomy of Cyanobacteria Phylum
Juline M. Walter 1, 2, Felipe H. Coutinho 1, 2, Bas E. Dutilh 2, 3, Jean Swings 4, Fabiano L. Thompson 1, 5* and Cristiane C. Thompson 1*
1 Laboratory of Microbiology, Institute of Biology, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil, 2 Radboud Institute for Molecular Life Sciences, Centre for Molecular and Biomolecular Informatics, Radboud University Medical Centre, Nijmegen, Netherlands, 3 Theoretical Biology and Bioinformatics, Utrecht University, Utrecht, Netherlands, 4 Laboratory of Microbiology, Ghent University, Ghent, Belgium, 5 Center of Technology - CT2, SAGE-COPPE, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
Cyanobacteria are major contributors to global biogeochemical cycles. The genetic diversity among Cyanobacteria enables them to thrive across many habitats, although only a few studies have analyzed the association of phylogenomic clades to specific environmental niches. In this study, we adopted an ecogenomics strategy with the aim to delineate ecological niche preferences of Cyanobacteria and integrate them to the genomic taxonomy of these bacteria. First, an appropriate phylogenomic framework was established using a set of genomic taxonomy signatures (including a tree based on conserved gene sequences, genome-to-genome distance, and
Edited by: average amino acid identity) to analyse ninety-nine publicly available cyanobacterial Ana Beatriz Furlanetto Pacheco, genomes. Next, the relative abundances of these genomes were determined throughout Universidade Federal do Rio de diverse global marine and freshwater ecosystems, using metagenomic data sets. The Janeiro, Brazil whole-genome-based taxonomy of the ninety-nine genomes allowed us to identify 57 (of Reviewed by: Zhe-Xue Quan, which 28 are new genera) and 87 (of which 32 are new species) different cyanobacterial Fudan University, China genera and species, respectively. The ecogenomic analysis allowed the distinction of Petr Dvorak, Palacký University, Olomouc, Czechia three major ecological groups of Cyanobacteria (named as i. Low Temperature; ii. Low *Correspondence: Temperature Copiotroph; and iii. High Temperature Oligotroph) that were coherently Fabiano L. Thompson linked to the genomic taxonomy. This work establishes a new taxonomic framework for [email protected] Cyanobacteria in the light of genomic taxonomy and ecogenomic approaches. Cristiane C. Thompson [email protected] Keywords: microbial ecology, ecological niches, charting biodiversity, genome-based microbial taxonomy, metagenome, high-throughput sequencing technology Specialty section: This article was submitted to Aquatic Microbiology, INTRODUCTION a section of the journal Frontiers in Microbiology Earth is home to nearly one trillion (1012) microbial species that have evolved over ∼4 billion years Received: 16 May 2017 (Locey and Lennon, 2016). Cyanobacteria emerged ∼3 billion years ago, ushering Earth’s transition Accepted: 18 October 2017 from anoxygenic to oxygenic conditions through photosynthesis (Schirrmeister et al., 2011a). Published: 14 November 2017 Throughout their evolution, Cyanobacteria became one of the most diverse and widely distributed Citation: Prokaryotes, occupying many niches within terrestrial, planktonic, and benthic habitats. Their long Walter JM, Coutinho FH, Dutilh BE, history evolved in a broad heterogeneity comprising unicellular and multicellular, photosynthetic Swings J, Thompson FL and Thompson CC (2017) Ecogenomics and non-photosynthetic (i.e., Melainabacteria) (Schirrmeister et al., 2011a; Di Rienzi et al., 2013; and Taxonomy of Cyanobacteria Soo et al., 2014), free-living, symbiotic, toxic and predatory organisms (Soo et al., 2015), with Phylum. Front. Microbiol. 8:2132. genomes sizes ranging from 1 to 10 Mb (Shih et al., 2013). Here we consider Cyanobacteria phylum doi: 10.3389/fmicb.2017.02132 as consisting only of oxygenic phototrophs.
Frontiers in Microbiology | www.frontiersin.org 1 November 2017 | Volume 8 | Article 2132 Walter et al. Cyanobacterial Diversity
Cyanobacteria (also known as the Cyanophyceae, Recommendations that nomenclature should agree with Cyanophyta, cyanoprokaryota, blue-green algae or blue-green and reflect genomic information were stated during the pre- bacteria) share similar metabolic features with eukaryotic algae genomic era (Wayne et al., 1987), due nothing describes an and have been named according to the Botanical Code (Kauff organism better than its genome. Sequence-based methods to and Büdel, 2010). The inclusion of Cyanobacteria in taxonomic delimit prokaryotic species have emerged to define and to schemes of Bacteria was only proposed in 1978 by Stanier et al. improve cut-offs criteria during the genomic era (Gevers et al., (1978), and through time the bacterial taxonomic names have 2005; Konstantinidis and Tiedje, 2005a,b; Konstantinidis et al., come into conflict with the botanical nomenclature (Oren, 2004; 2006; Goris et al., 2007; Richter and Rossello-Mora, 2009; Oren and Garrity, 2014). More than two decades passed before Auch et al., 2010a; Thompson et al., 2013a,b; Varghese et al., a Note to General Consideration 5 (1999) was published for 2015), demonstrating a greater discriminatory power. Inexorable Cyanobacteria to be included under the rules of the International advances in methodologies will incorporate genomics into the Committee on Systematic Bacteriology (ICSB)/International taxonomy and systematics of the prokaryotes, boosting the Committee on Systematic of Prokaryotes (ICSP) (Tindall, 1999; credibility of taxonomy in the current post-genomic era (Coenye De Vos and Trüper, 2000; Labeda, 2000). Taxa nomenclature et al., 2005; Chun and Rainey, 2014). Up-to-date, while several within this group has long been a topic of discussion, but groups have been analyzed through a genomic-wide view (Gupta currently there is no consensus (Hoffmann et al., 2005; Oren et al., 2015; Adeolu et al., 2016; Hahnke et al., 2016; Ahn and Tindall, 2005; Oren et al., 2009; Oren and Ventura, 2017). et al., 2017; Amin et al., 2017; Waite et al., 2017), many others As a result, more than 50 genera of Cyanobacteria have been have faced hurdles, such as Cyanobacteria. However, a genomic described since 2000, and many of them remain unrecognized in taxonomy approach has successfully been applied to elucidate the List of Prokaryotic Names with Standing in Nomenclature, the taxonomic structure of the two cyanobacterial genera, LPSN, http://www.bacterio.net (Parte, 2014) or in databases (e.g., Prochlorococcus and Synechococcus (Thompson et al., 2013a; NCBI). Coutinho et al., 2016a,b). As genomic taxonomy postulates The Cyanobacteria form a challenging group for the numeric, non-subjective, cut-offs for taxa delimitation, strains microbiologists. Their traditional taxonomy based on were considered to belong to the same species when share at morphologic traits does not reflect the results of phylogenetic least 98.8% 16S rRNA gene sequence similarity, 95% of AAI, and analyses (Rippka et al., 1979; Boone and Castenholz, 2001; 70% GGD (Konstantinidis and Tiedje, 2005a; Thompson et al., Gugger and Hoffmann, 2004; Schirrmeister et al., 2011b; 2013a,b), while species from the same genus form monophyletic Hugenholtz et al., 2016). The predominance of morphology branches (Yarza et al., 2008; Qin et al., 2014). It is in agreement assembled unrelated Cyanobacteria into polyphyletic species with the concept of species as a discrete, monophyletic and and genera and higher taxonomic categories which require genomically homogeneous population of organisms that can revisions in the future (Komárek et al., 2014). The polyphyly be discriminated from other related populations by means is an indicative of the taxonomic mislabeling of many taxa. of diagnostic properties (Rossello-Mora and Amann, 2001; The 16S rRNA gene sequences were useful in charting and Stackenbrandt et al., 2002). The availability of whole-genomes characterizing microbial communities (Kozlov et al., 2016) but opened the doors for an in-depth knowledge in microbial this molecule lack sensitivity for evolutionary changes that occur diversity and ecology, where the entire genomic pool may be in ecological dynamics, where microbial diversity is organized applied to understanding the forces that govern community by physicochemical parameters (Choudoir et al., 2012; Becraft structure. The use of ecogenomic analysis postulates a reliable et al., 2015). Hence, the processes that shape cyanobacterial and scalable approach to delineate species and genera in communities over space and time are less known. A recent study order to reconstruct their evolution and to draw a global proposed that there should be 170 genera of Cyanobacteria based picture of possible ecological determinants (Di Rienzi et al., on 16S rRNA sequences only (Kozlov et al., 2016). Farrant et al. 2013; Soo et al., 2014; Spang et al., 2015; Thompson et al., (2016) delineated 121 Prochlorococcus and 15 Synechococcus 2015; Anantharaman et al., 2016; Garrity, 2016; Hug et al., ecologically significant taxonomic units (ESTUs) in the global 2016; Hugenholtz et al., 2016). Our hypothesis is that a ocean using single-copy petB sequences (encoding cytochrome phylogenomic framework will reflect ecologic groups found in b6) and environmental cues. nature. High Throughput Sequencing (HTS) have revolutionized the To test this hypothesis, we first established a phylogenomic practice of microbial systematics, providing an informative, framework, using genomic signatures (i.e., a tree based on reproducible, and portable tool to delineate species, reconstruct conserved gene sequences, average amino acid identity, and their evolutionary history, and infer ecogenomic features (Gevers genome-to-genome distance), with the circumscription of et al., 2005; Konstantinidis and Tiedje, 2005a,b; Garrity and Oren, species and genera. We then classified the genomes in three 2012; Gribaldo and Brochier-Armanet, 2012; Shih et al., 2013; major groups according to their ecological traits as inferred Sutcliffe et al., 2013; Hugenholtz et al., 2016). This approach through metagenomics and environmental metadata. Finally, allows both cultured (Al-saari et al., 2015; Appolinario et al., we correlated the three disclosed ecogenomic groups (i. Low 2016) and uncultured microorganisms (Iverson et al., 2012; Temperature; ii. Low Temperature Copiotroph; and iii. High Brown et al., 2015; Hugerth et al., 2015) to be studied. The latter Temperature Oligotroph) with the circumscribed species and is especially important because the cyanobacterial cultivation in genera. We observed that the taxonomic delineation of species laboratory is another hurdle in the study of this group of bacteria. and genera is coherent with the ecogenomic groups.
Frontiers in Microbiology | www.frontiersin.org 2 November 2017 | Volume 8 | Article 2132 Walter et al. Cyanobacterial Diversity
MATERIALS AND METHODS best-fitting nucleic acid substitution models were calculated through the MLModelTest feature. Models were ranked based on Genome Election their Bayesian Information Criterion (BIC) scores as described Cyanobacterial genomes publicly available in January 2016 were by Tamura et al. (2013). The model with the lowest BIC retrieved from RefSeq (NCBI Reference Sequence Database), score was selected and used for further phylogenetic analysis. GenBank and GEBA (Genomic Encyclopedia of Bacteria and The phylogenetic inference was obtained using the Maximum Archaea) databases. Genome completeness was assessed with Likelihood method based on the Kimura 2 parameter method CheckM (Parks et al., 2015), and the genomes that were at + < with the Gamma distributed rate variation (K2 G) as the least 90% complete and assembled in 500 contigs were used nucleotide substitution model, which was estimated from the for further analyses. Ninety-nine genomes were selected based data. The support branches of tree topology were checked by on that criterion, and they are listed in Table 1 (additional 1,000 bootstrap replicates. The 16S rRNA gene alignments were information on Table S2). used to estimate the degree of genetic distance between strains Annotation and Genomic Taxonomy through the Tajima-Nei method (Tajima and Nei, 1984). Gloeobacter violaceus PCC 7421 was set as the outgroup All genomes were annotated using Prokka version 1.11 in both trees. Trees were visualized with FigTree, version (Seemann, 2014), with default settings, in order to avoid 1.4.2 (Rambaut, 2015). Due to incomplete or partial sequences, any possible bias. Genomic taxonomy of the ninety-nine Synechococcus sp. CB0101 was omitted from these analyses. cyanobacterial genomes was performed according to Thompson Planktothrix mougeotii NIVA-CYA 405 as well as Planktothrix et al. (2013a) and Coutinho et al. (2016a) and are briefly prolifica NIVA-CYA 540 were not included in the phylogenetic described here. Average Amino acid Identity (AAI) and Genome- analyses because 16S rRNA sequences are not currently available to-Genome Distance (GGD) were calculated as described for these strains (and not retrievable from their genomes). previously (Konstantinidis and Tiedje, 2005a; Auch et al., The type-strains or the type-species of each taxa were 2010a,b; Meier-Kolthoff et al., 2013). GGD were calculated using included in the 16S phylogenetic tree to confirm the phylogenetic the Genome-to-Genome Distance Calculator tool, version 2.1 relatedness of the cyanobacterial genomes. Designations of type under recommended settings (Meier-Kolthoff et al., 2013; http:// strain or type species were not available for Chaemaesiphon ggdc.dsmz.de/), whereas AAI values were carried out through minutus PCC6605, Pleurocapsa sp. PCC7319, Rivularia GenTaxo as previously described (Coutinho et al., 2016a). The sp. PCC7116, Synechocystis sp. PCC7509, Trichodesmium species cut-offs delimitation were ≥95% AAI and ≥70% GGD, erythraeum IMS01, Xenococcus sp. PCC7305, cyanobacterium and ≥70% AAI for genus delimitation. ESFC-1, and cyanobacterium JSC-12. Geitlerinema sp. PCC7105 The Manhattan distances were calculated based on the is the reference strain for marine species of Geitlerinema, and percentage AAI values of every genome (genome-genome PCC73106 is the reference strain for Gloeocapsa (Sarma, 2012). matrix) and was used as the input for making the hierarchical clustering using the hclust() function in R (R Development Conserved Marker Genes Core Team, 2011). This distance is able to indicate how A tree was generated using 31 conserved gene sequences far/close the genomes are located from each other. The heatmap previously validated as phylogenetic markers for (cyano) bacteria was produced by heatmap.2 {gplots} package in R, with (Wu and Eisen, 2008, and recently used by Shih et al., 2013 background color of each panel mapping to percentage AAI and Komárek et al., 2014). The sequences of these proteins values. were mined using the AutoMated Phylogenomic infeRence Phylogenetic Analysis Application—AMPHORA2 tool (Wu and Scott, 2012), through default settings for the Bacteria option, and with a cut-off To establish the phylogenetic structure of the phylum value of 1.e−10. Individual alignments were performed for Cyanobacteria, phylogenetic trees were constructed using each of the 31 gene sets through MUSCLE v. 3.8 with default the 16S rRNA gene sequences and the concatenated alignments settings (Edgar, 2004). All alignments were then concatenated. of a set of conserved genes, most of which encode ribosomal Only genomes which present all the set of conserved genes proteins. were used in the phylogenetic analysis. A Maximum Likelihood Ribosomal RNA Sequences tree was constructed using RaxML v. 7 (Stamatakis, 2006) The small subunit ribosomal RNA (16S rRNA) sequences from and the Dayhoff+G likelihood model. One thousand bootstrap all cyanobacterial strains for which whole genome sequence replications were calculated to evaluate the relative support of data are publicly available (exception see below, thus N = 97), the branches. Trees were visualized with FigTree, version 1.4.2 as well as 16S rRNA gene sequences from additional type- (Rambaut, 2015). = strains available (N 14) were all analyzed. The sequences were Abundance of Cyanobacterial Genomes retrieved from the ARB SILVA database (Pruesse et al., 2007; Quast et al., 2013). Whenever sequences were not available, they Across Aquatic Environments and were retrieved directly from the genomes using RNammer 1.2 Ecological Correlations Server (Lagesen et al., 2007). Sequences were aligned through Marine and freshwater metagenomes were retrieved to determine MUSCLE v. 3.8 (Edgar, 2004), with default settings, and Gblocks the abundance of ninety-nine cyanobacterial genomes across 0.91b (Castresana, 2000; Talavera and Castresana, 2007) was used the Earth. A set of 191 marine metagenomes from the Tara for alignment curation. Using MEGA 6 (Tamura et al., 2013), Ocean project were retrieved for analysis along with their
Frontiers in Microbiology | www.frontiersin.org 3 November 2017 | Volume 8 | Article 2132 Walter et al. Cyanobacterial Diversity β β β β β β β β β β β β β β β β β β β β β β β β β β β β β β β β β β β β (Continued) # CDS Completeness* Carboxysome % mol GC Lenght (Mbp) 5 7.26 47.02 6,827 99.41 2 6.89 57.69 6,414 99.18 5 9.42 47.62 8,366 99.73 1 4.26 34.28 3,887 99.78 1 5.12 43.87 4,601 99.42 3 5.9 38.78 5,169 99.63 12 8.08 42.61 7,060 100 52 5.62 46.51 4,857 99.59 40 6.69 46.04 5,519 99.67 142 8.65 45.43 7,100 99.37 USA Point, Long Island, NY, USA Moderatehotspring 1 5.43Massachusetts, USA 43.27 5,254 99.78 USA Bermuda Massachusetts, USA Rahama, Israel communities, Elkhorn Slough estuary, CA, USA USA CA, USA FreshwaterFreshwater Solar Lake, Israel Lake near Madison, Wisconsin, 1 4.18 42.92 3,663 99.48 Freshwater Sphagnum bog, SwitzerlandMarine (coastal) 228 Rock at shoreline, Montauk 4.025 41.11 3,704 98.84 Marine (coastal) Limestone, Crystal Cave, Marine (coastal) NA 110 7.03 41.11 5,854 99.3 Thermal - Freshwater Marine (coastal) Plankton, Woods Hole, NAMarine NA India 298 432 8.79 6.87 55.63 41.16 8,370 5,597 99.34 99.74 SoilNAUnknownFreshwater Limestone, Jerucham, Har Unknown Lake USA near Madison, Wisconsin, 1 8 4.68 6.15 58.46 3,727 51.59 4,735 99.87 93.75 FreshwaterFreshwaterSoil Crawford Co., Wisconsin, USAMarine Berkeley, (coastal) CA, USA Salt marsh in 12 Woods Hole, NAMarine (coastal) Greifswald,Germany Extremophylic mat 11.58 1 NA 38.55 3 9,371 6.28 99.39 6.68 45.73 5,956 44.47 5,618 1 99.48 99.63 5.31 40.16 5,002 99.48 NA NA 8 7.16 40.19 6,104 99.76 NA NA 13 8 40.16 6,720 99.76 Soil San Francisco, California, USA 1 7.47 46.21 6,734 99.56 Unknown Unknown 119 6.27 44.7 5,171 99.93 NA NA 34 5.38 41.05 4,627 99.76 Freshwater California, USA 3 5.77 55.18 5,156 98.64 FreshwaterSoilFreshwater Thermal springs, alkaline pod Solar Lake, Israel Stockholm,Sweden 1 3.16 4 1 38.66 2,886 7.61 3.78 99.52 42.44 42.2 3,412 6,127 99.55 99.78 Freshwater NA 20 5.52 47.49 5,024 99.29 Soil Canyonlands National Park, UT, Freshwater Alkaline salt lakes 239 6.49 44.31 5,436 99.34 Freshwater Alkaline salt lakes 1 6.78 44.27 6,676 99.13 Freshwater Alkaline salt lakes 63 6.09 44.69 4,852 99.71 Marine (coastal) Intertidal zone, Moss Beach, Freshwater Cambridge, UK 7 7.06 38.79 6,182 99.44 Habitat Type source/Place # Contigs G. sphagnus L. limosa F. peptidasii G. catellasis C. wisconsii A. nitrilium F. welwii F. hapalii F. sesquitii A. sesilensis A. mossi New species proposal y. Dactylococcopsis D. halotolerans Allonema A. longislandicus Enugrolinea E. bermudensis Rotundosa R. thermolimnetic Adonisia A.splendidus Rheamaris R. confervoides Pseudogeitlerinema P. shalloid Cyclospexia C. valenium Allocoleopsis A. franciscanus Euryforis E. eilemai Geminocystis G. stanieri Tapinonema T. coecalium Microcoleus M. vaginatus New genus proposal NZ_ALVP00000000.1 NZ_ALVM00000000.1 NZ_AAVU00000000.1 NZ_ALVO00000000.1 NC_019695.1 NZ_ABRS00000000.1 NC_019757.1 NC_019780.1 CP003940.1 NZ_CM001632.1 NC_019771.1 NCBI or JGI Reference Sequence d T e T T cT T T T T a b PCC 6306 BL-JBDU NZ_AUZM00000000.1 NZ_JTHE00000000.1 PCC 6308 PCC6605PCC 7203 PCC 7420 NC_019697.1 PCC9333ESFC-1 NC_019753.1 NZ_ARCP00000000.1 PCC 7417 PCC 8305 JSC-12PCC 7202 NZ_CM001633.1 FGP-2 NZ_AFJC00000000.1 Paraca NZ_ACSK00000000.3 NIES-39 NC_016640.1 C1 Strain PCC 7122 a Details of all cyanobacterial genomes included in this stud sp. PCC7104 sp. PCC7375 NZ_ALVN00000000.1 sp. PCC7376 NC_019683.1 sp. PCC6406 NZ_ALVV00000000.2 sp. PCC7407 NC_019703.1 sp. PCC7105 NZ_ANFQ00000000.1 sp. PCC7113 NC_019738.1 sp. PCC73106 NZ_ALVY00000000.1 sp. PCC7428 NC_019745.1 sp. PCC8005 NZ_FO818640.1 sp. PCC9605 NZ_ALVT00000000.1 sp. PCC9431 ALVX00000000.1 sp. PCC9339 NZ_ALVS00000000.1 sp. JSC-11 NZ_AGIZ00000000.1 sp. PCC7418 NC_019779.1 sp. PCC7108 NZ_AJWF00000000.1 sp. PCC8106 sp. PCC7103 NZ_ALVJ00000000.1 Leptolyngbya Leptolyngbya Leptolyngbya boryana Halothece Leptolyngbya Gloeocapsa Lyngbya aestuarii Lyngbya confervoides Lyngbya Gloeocapsa Leptolyngbya Geitlerinema Geminocystis herdmanii Chamaesiphon minutus Chroococcidiopsis thermalis Coleofasciculus chthonoplastes Crinalium epipsammum Cyanobacterium Fischerella Geitlerinema Fischerella Microcoleus Calotrix Arthrospira Fischerella Cylindrospermum stagnale Dactylococcopsis salina Fischerella Cyanobacterium Cyanobacterium stanieri Microcoleus vaginatus Arthrospira platensis Arthrospira platensis Arthrospira platensis Anabaena TABLE 1 | Bacterial Strain Anabaena cylindrica
Frontiers in Microbiology | www.frontiersin.org 4 November 2017 | Volume 8 | Article 2132 Walter et al. Cyanobacterial Diversity β β β β β β β β β β β β β α α α α α α α α α α α α α α α (Continued) # CDS Completeness* Carboxysome % mol GC Lenght (Mbp) 1 2.23 54.2 2,289 99.46 1 2.29 54.2 2,322 99.46 1 2.51 59.2 2,583 99.73 1 2.61 52.4 2,627 99.73 1 6.32 40.36 5,200 99.26 3 6.71 41.53 5,326 99.33 78 2.43 63 2,473 99.18 94 2.69 64.2 2,757 99.73 116 3.28 65.4 2,917 99.46 oligotrophic, 5 m 1,800 m oligotrophic, 51 m coastal, 95 m Connecticut, USA California, USA Maryland, USA Maryland, USA Maha Oya, Sri Lanka Marine Gulf of Aqaba, Red Sea, 10 m 1 2.66 59.8 2,603 99.73 Freshwater Point Reyes Peninsula, Marine Sargasso Sea, 25 m 1 2.37 60.2 2,439 99.18 Marine Sargasso Sea 3 2.63 57.6 2,595 99.73 Marine California current, Pacific, Marine WoodsHole,MA,USA 16 2.69 54.1 2,990 99.18 Marine Chesapeake Bay, Baltimore, Marine LongIslandSound, Marine Gulf of Aqaba, Red Sea, 10 m 1 2.58 64.4 2,575 99.46 Marine Sargasso Sea 1 2.12 60.1 2,661 99.32 NA NAMarine 287 California current, Pacific, 8.38 43.68 6,979 98.56 Marine Sargasso Sea 1 2.43 59.4 2,461 99.46 Marine Blanes Bay, Mediterranean Sea, Freshwater Hot spring, Amparai District, Marine California current, Pacific, Habitat Type source/Place # Contigs Marine Chesapeake Bay, Baltimore, SoilSoilNANA NA USA NA NA 1 1 9 377 7.68 7.47 8.59 6.67 47.6 45.87 6,004 54.1 6,925 43.4 7,012 6,007 99.71 99.78 100 99.12 Marine Mediterranean Sea, 15 m 1 2.22 60.8 2,348 99.64 Freshwater NA 12 6.89 43.43 5,693 99.56 Freshwater NA 13 5.04 39.57 4,188 100 FreshwaterFreshwater NA NA 238 185 5.38 5.48 39.48 4,606 39.48 4,674 100 99.78 Freshwater NA 240 5.46 39.47 4,697 99.56 Freshwater NA 375 5.62 39.51 4,873 100 New species proposal P. stereotis P. agardhii P. agardhii Pseudosynechococcus P. gyrus Nostoc N. reyesii Pseudosynechococcus P. pacificus Pseudosynechococcus P. subtropicalis Pseudosynechococcus P. benguelii Pseudosynechococcus P. sudipacificus Magnicoccus M. indicus Regnicoccus R. antarcticus Pseudosynechococcus P. equatorialis New genus proposal Nostoc N. amparaii Magnicoccus M. sudiatlanticus MicrocoleusSomacatelliumToxinema M. nigroviridis S. hydroxylic T. oscillati Inmanicoccus I. mediterranei Toxinema T. oscillati NZ_DS022299.1 NC_009481 ACNY00000000.1 NZ_CH724168.1 NC_008319.1 AGIK00000000.1 NZ_ADXM00000000.1 NZ_CH724159–NZ_CH724167 NZ_AEPQ00000000.1 NC_007513 NZ_CH724158.1 NC_005070.1 NCBI or JGI Reference Sequence NZ_DS022298.1 NC_007516 NZ_ADXL00000000.1 NC_019693.1 NZ_ALVI00000000.1 NC_009482.1 T g • • • • • • • • • • • • • T T T T T T T T T T • T T T • T a f T WH7803 WH8109 WH7805 WH8016 WH5701 RS9916 WH8102 CC9311 Strain 3L CC9902 RS9917 BL107 CC9605 CB0205 CB0101 RCC307 PCC 6304 PCC7112 NC_019729.1 NIVA-CYA 126/8 NZ_CM002803.1 NIVA-CYA 15NIVA-CYA 56/3 NZ_AVFS00000000.1 NZ_AVFY00000000.1 NIVA-CYA 405 NZ_AVFU00000000.1 NIVA-CYA 406 NZ_AVFV00000000.1 a Continued sp.sp.sp. PCC10802 PCC6506 PCC6407 NZ_ANKO00000000.1 NZ_CACA00000000.1 sp. PCC7107 NC_019676.1 sp. PCC7524 NC_019684.1 Parasynechococcus pacificus Parasynechococcus marearabicus Parasynechococcus subtropicalis Parasynechococcus sudipacificus Parasynechococcus antarcticus Parasynechococcus gyrus Parasynechococcus marenigrum Parasynechococcus benguelii TABLE 1 | Bacterial Strain Moorea producens Nostoc Parasynechococcus chillensis Parasynechococcus equatorialis Parasynechococcus nordiatlanticus Parasynechococcus africanus Parasynechococcus indicus Nostoc Parasynechococcus sudiatlanticus Parasynechococcus mediterranei Oscillatoria acuminata Oscillatoria nigroviridis Oscillatoria Oscillatoria Oscillatoria Planktothrix agardhii Planktothrix agardhii Planktothrix agardhii Planktothrix mougeotii Planktothrix prolifica
Frontiers in Microbiology | www.frontiersin.org 5 November 2017 | Volume 8 | Article 2132 Walter et al. Cyanobacterial Diversity β β β β β β β β β β β β β β β β β β β α α α α α α α α α α α α α (Continued) # CDS Completeness* Carboxysome % mol GC Lenght (Mbp) 1 3.05 58.5 3,064 100 1 2.93 60.2 3,036 100 3 8.72 37.53 6,612 99.78 1 5.07 53.7 5,093 100 11 5.97 48.2 5,702 98.91 10 7.38 38.74 4,516 99.56 10 7.38 38.74 4,516 99.56 Octopus Spring, Yellowstone Park, USA Octopus Spring, Yellowstone Park, USA Mexico Puerto Penasco, Mexico California, Puerto Penasco, Mexico California, Puerto Penasco, Mexico University, CA, USA Thermal- Thermal- Freshwater Unknown Unknown 7 3.41 49.16 3,121 100 Freshwater California, USA 2 3.72 48.49 3,795 99.29 Marine (coastal) Intertidal zone, Mexico 1 4.55 46.31 3,960 98.23 Freshwater FreshwaterFreshwater NA NA 1 2 2.7 2.74 55.5 55.46 2,576 2,655 99.73 100 Marine (coastal) Snail shell, intertidal zone, Marine Sargasso Sea, 100 m 1 2.68 50.01 2,504 100 Marine Equatorial Pacific, 83 m 1 1.68 38.01 1,748 99.73 MarineFreshwater NA GulfStream,135m 1 2.41 464 50.74 5.47 2,339 43.18 99.46 4,774 99.29 Freshwater NA 157 5.5 39.48 4,710 99.78 NAFreshwater Havana, Cuba NA 6 5.54 10 36.22 5.32 4,895 47.39 99.56 4,580 99.56 Freshwater California, USA 6 5.62 47.83 5,363 99.76 Freshwater NA 346 5.61 39.52 4,862 99.78 Marine (coastal) Sea Water Tank, Berkeley Habitat Type source/Place # Contigs Freshwater NA 219 5.39 39.46 4,658 100 Marine (coastal) La Paz, Baja California Sur, Marine Arabian Sea, 50 m 1 1.66 31.32 1,769 99.64 Marine (coastal) Arizona Station, Gulf of Marine (coastal) Arizona Station, Gulf of Marine GulfStream,135m 1 1.7 31.21 1,815 99.73 Marine Equatorial Pacific, surface 1 1.73 31.15 1,840 99.73 Marine South Pacific, 79 m 1 1.69 31.1 1,795 98.78 MarineMarine Sargasso Sea, 90 m Northern Atlantic, 10 m 1 1 1.64 1.84 31.34 1,774 35.12 1,930 99.46 99.45 MarineMarine Equatorial Pacific, 15 m Northern Atlantic, 30 m 1 1 1.7 1.86 30.79 34.98 1,784 2,204 100 99.73 Marine Sargasso Sea, 120 m 1 1.75 36.44 1,883 100 P. ceticus P. agardhii P. agardhii New species proposal P. agardhii R. bajacalifornii P. penascus P. penascus E. marinus Marine Mediterranean Sea, 5 m 1 1.65 30.8 1,777 99.46 Leptococcus L. yellowstonii Leptococcus L. springii Enugrolinea E. euryhalinus Stenotopis S. californii Leptolatis L. gracile Coccusdissimilis C. mexicanus New genus proposal Thaumococcus T. swingsii Thaumococcus T. swingsii Paraspirulina P. subsalsa Paraleptovivax P. allomegium Brevicoccus B. berkleyi Eurycolium Eurycolium E.chisholmii Eurycolium E.neptunius Eurycolium E. nereus Eurycolium E. nereus EurycoliumProlificoccus E. ponticus P. proteus EurycoliumProlificoccus E. tetisii P. proteus NC_007775.1 NC_007776 NC_010475.1 NC_019680.1 NC_006576.1 ABRV00000000.1 NC_009976.1 NCBI or JGI Reference Sequence NC_008820.1 NC_019748.1 NC_005071.1 ALWC00000000.1 NC_008816.1 NC_007577.1 NC_009840.1 NZ_ACDW00000000.1 NC_007335.2 NC_009091.1 NC_005042.1 NC_008817.1 NC_008819.1 • T T • • • • T T T T T T ◦ ◦ ◦ ◦ ◦ ◦ ◦ • T ◦ T T T ◦ T T ◦ T T T a JA23Ba213 JA33Ab PCC 7002 PCC 6312 PCC 6301 PCC7942 NC_007604.1 PCC 7335 MIT9211 Strain NIVA-CYA 540 NZ_AVFX00000000.1 MIT9303 MIT9313 PCC7429 NZ_ALWB00000000.1 NIVA-CYA 98 NZ_AVFZ00000000.1 PCC9445PCC 7437 NZ_ALVR00000000.1 NIVA-CYA 407 NZ_AVFW00000000.1 PCC 7336 CCMP1986 NC_005072.1 AS9601 MIT9312 MIT9202 MIT9215 MIT9301 MIT9515 CCMP1375 NATL1A NATL2A sp. PCC6802 ALVK00000000.1 a sp. PCC7367 NC_019701.1 Continued sp. PCC7319 NC_019689.1 sp. PCC7319 NC_019689.1 sp. PCC7116 NC_019678.1 Synechococcus springii Synechococcus yellowstonii Synechococcus euryhalinus Synechococcus californii Pseudanabaena Synechococcus elongatus Synechococcus elongatus Synechococcus mexicanus Prochlorococcus ceticus Pseudoanabaena TABLE 1 | Bacterial Strain Planktothrix prolifica Prochlorococcus swingsii Prochlorococcus swingsii Pseudanabaena biceps Planktothrix prolifica Rivularia Spirulina subsalsa Stanieria cyanosphaera Planktothrix rubescens Pleurocapsa Synechococcus berkleyi Pleurocapsa Prochlorococcus marinus Prochlorococcus chisholmii Prochlorococcus neptunius Prochlorococcus nereus Prochlorococcus nereus Prochlorococcus ponticus Prochlorococcus tetisii Prochlorococcus marinus Prochlorococcus proteus Prochlorococcus proteus
Frontiers in Microbiology | www.frontiersin.org 6 November 2017 | Volume 8 | Article 2132 Walter et al. Cyanobacterial Diversity
associated metadata (Sunagawa et al., 2015). Sample-associated ed for β β β β β environmental data were inferred across multiple depths at global scale of Tara’s metagenomics sampling: (i) surface water layer (5m, s.d. = 0); and (ii) subsurface layer, including deep chlorophyll maximum zone (71 m, s.d. = 41 m) and mesopelagic zone (600m, s.d. = 220m) (Sunagawa et al., 2015). Eight freshwater metagenomes were retrieved for analysis from the Caatinga biome microbial community project along with their associated metadata (Lopes et al., 2016).
# CDS Completeness* Carboxysome Metagenome reads were mapped to a database containing the