
Single-cell genomics unveiled a cryptic cyanobacterial lineage with a worldwide distribution hidden by a dinoflagellate host

Takuro Nakayamaᵃ,¹, Mami Nomuraᵇ,², Yoshihito Takanoᶜ, Goro Tanifujiᵈ, Kogiku Shibaᵇ, Kazuo Inabaᵇ, Yuji Inagakiᵉ, and Masakado Kawataᵃ

ᵃGraduate School of Sciences, Tohoku University, Sendai 980-8578, Japan; ᵇShimoda Marine Research Center, University of Tsukuba, Shimoda 415-0025, Japan; ᶜFaculty of Science and Technology, Kochi University, Nankoku 783-8502, Japan; ᵈDepartment of Zoology, National Museum of Nature and Science, Tsukuba 305-0005, Japan; and ᵉCenter for Computational Sciences, University of Tsukuba, Tsukuba 305-8577, Japan

Edited by David M. Karl, University of Hawaii at Manoa, Honolulu, HI, and approved May 24, 2019 (received for review February 13, 2019) are one of the most important contributors to revealed that this cyanobacterial lineage symbiotically interacts oceanic primary production and survive in a wide range of marine with a lineage of unicellular photosynthetic (14–17). . Much effort has been made to understand their ecological This is thought to trace back to at least the Late features, diversity, and , based mainly on data from free- Cretaceous Period (18). Whole- sequencing of UCYN- living cyanobacterial . In addition, symbiosis has emerged as A showed that the cyanobacterial lineage has greatly reduced an important lifestyle of oceanic microbes and increasing knowl- its metabolic capacities for and has been spe- edge of cyanobacteria in symbiotic relationships with unicellular cialized for fixation, thus furthering current un- eukaryotes suggests their significance in understanding the global derstanding of marine cyanobacteria (16, 19). While detailed oceanic . However, detailed characteristics of these genetic features of some symbiotic cyanobacteria have been cyanobacteria remain poorly described. To gain better insight into reported aside from UCYN-A (20–22), those of most cyano- marine cyanobacteria in symbiosis, we sequenced the genome of cyanobacteria collected from a cell of a pelagic dinoflagellate that bacterial symbionts remain unknown. These poorly understood ECOLOGY is known to host cyanobacterial symbionts within a specialized symbiotic species potentially have , which is im- chamber. Phylogenetic analyses using the genome sequence revealed portant to an understanding of cyanobacteria as a whole. that the cyanobacterium represents an underdescribed lineage within Pelagic heterotrophic of the genus Ornitho- an extensively studied, ecologically important group of marine cercus have long been known to host cyanobacteria as symbionts cyanobacteria. 
Metagenomic analyses demonstrated that this cyano- (Fig. 1). species cells are surrounded by a cellulosic bacterial lineage is globally distributed and strictly coexists with its covering known as the thecal plate, and crown-shaped extensions host dinoflagellates, suggesting that the intimate symbiotic associa- tion allowed the cyanobacteria to escape from previous metagenomic Significance studies. Furthermore, a comparative analysis of the repertoire with related species indicated that the lineage has independently un- Cyanobacteria are an important component of marine micro- Prochloro- dergone reductive genome evolution to a similar extent as bial ecology, and thus their biodiversity has been extensively coccus , which has the most reduced among free-living studied. Here, through whole-genome sequencing, we discov- cyanobacteria. Discovery of this cyanobacterial lineage, hidden by ered that a marine cyanobacterium in a symbiotic association its symbiotic lifestyle, provides crucial insights into the diversity, with a unicellular (OmCyn) represents a previously ecology, and evolution of marine cyanobacteria and suggests the under-described lineage within an ecologically important cya- existence of other undiscovered cryptic cyanobacterial lineages. nobacterial group. Our metagenomic analyses showed that the cyanobacterium OmCyn thrives in global , further sug- cyanobacteria | dinoflagellate | symbiosis | single-cell genomics | gesting the existence of other cryptic cyanobacterial lineages metagenomics that have been overlooked because of their symbiotic lifestyle. Via comparison with genomes of free-living relatives, the yanobacteria are one of the most successful groups of OmCyn genome was shown to have a reductive nature, which Coxygen-producing photoautotrophs occupying a broad range apparently resulted from intimate association with the host. of habitats on earth. 
in this group are highly ubiqui- Together, our results expand current understanding of the bi- tous in marine environments and play a vital role in oceanic ology of cyanobacteria and marine . biogeochemical cycles, with their metabolic abilities including photosynthesis and (1–3). To obtain a better Author contributions: T.N., M.N., Y.T., and Y.I. designed research; T.N., M.N., G.T., K.S., and K.I. performed research; T.N. analyzed data; and T.N., M.N., Y.T., G.T., K.S., K.I., Y.I., understanding of marine microbial ecology, the biodiversity and and M.K. wrote the paper. ecological features of marine cyanobacteria have been actively The authors declare no conflict of interest. explored (4–8). Previous studies based on environmental DNA and cultivated strains have revealed that marine cyanobacteria This article is a PNAS Direct Submission. display broad genetic diversity and have evolved by adapting to Published under the PNAS license. various ecological niches (9–11). Data deposition: The data reported in this paper are available from the DNA Data Bank of Japan (http://www.ddbj.nig.ac.jp/), NCBI GenBank database (https://www.ncbi.nlm.nih. Other than free-living species, which have been extensively gov/genbank), and the European Molecular Biology Laboratory (https://www.embl.de/) studied, cyanobacterial species have symbiotic relationships with DNA database under BioProject accession number PRJDB7787. various organisms (12, 13). Recent developments in DNA-sequencing 1To whom correspondence may be addressed. Email: [email protected]. technologies have revealed the genetic characteristics of one group 2Present address: Graduate School of Human and Environmental Studies, Kyoto Univer- of symbiotic cyanobacteria. Unicellular cyanobacteria group A sity, Kyoto 606-8501, Japan. (UCYN-A; also known as Candidatus Atelocyanobacterium tha- This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. 
lassa) is a recently recognized, ecologically important cyano- 1073/pnas.1902538116/-/DCSupplemental. bacterial group in the marine environment. Recent studies have

www.pnas.org/cgi/doi/10.1073/pnas.1902538116 PNAS Latest Articles | 1of6 Downloaded by guest on September 26, 2021 of the thecal plate form an extracellular chamber per cell (SI phylogenetic relationship between OmCyn and the marine Appendix, Fig. S1). A number of coccoid cyanobacteria reside in picocyanobacteria (SI Appendix, Fig. S3). The OmCyn sequence the specialized chamber (Fig. 1 and SI Appendix, Fig. S1); these made a monophyletic clade along with previously reported se- extracellular symbionts are also called phaeosomes (12, 23). quences. Most of those were a part of diverse 16S rDNA se- Despite the existence of phaeosomes being first recorded over quences directly amplified from dinoflagellate cells with 100 y ago (23), there is no laboratory culture or any report of cyanobacterial symbionts that are akin to O. magnificus (24). successful cultivation of the symbiont and detailed characteristics Biodiversity within the picocyanobacterial clade has been ex- of the cyanobacteria surviving in these chambers remain poorly tensively studied, as these cyanobacterial species numerically understood. Here, we sequenced the genome of the cyanobacteria dominate global oceans (7, 25). However, we detected no strong isolated from the dinoflagellate Ornithocercus magnificus (Fig. 1) phylogenetic affinity between the clade of cyanobacterial se- using single-cell genomics technology. Analyses based on the ge- quences from dinoflagellates, including OmCyn, and previously nome sequence revealed that the cyanobacterium represents an explicitly described marine picocyanobacterial lineages (7) based underdescribed lineage of a cyanobacterial clade, of which bio- on traditional phylogenetic markers for this group—namely, 16S diversity has been extensively studied and which has independently rDNA (SI Appendix, Fig. S3) and the internal transcribed spacer undergone reductive genome evolution. Analyses using meta- (ITS; SI Appendix, Fig. S4). 
The two trees showed different genomic data from the Tara Oceans Expedition further suggest phylogenetic positions of the clade comprising the cyanobacterial worldwide distribution of this cyanobacterial lineage and the po- sequences from dinoflagellates: The ITS tree suggested the clade tential existence of other undiscovered symbiotic cyanobacterial is basal to a well-defined clade, known as Syn- lineages. echococcus subcluster 5.1 (7), while the 16S rDNA tree placed the clade as sister to one of the clades within Synechococcus Results and Discussion subcluster 5.1, named clade V. Nevertheless, since both back- Genome Sequence and Phylogenetic Analyses of Cyanobacteria bones of the two trees did not receive high bootstrap support, we Isolated from a Pelagic Dinoflagellate. A single O. magnificus cell could not conclude on the phylogenetic position of OmCyn was obtained from collected off Shimoda Bay, Japan, based on these single- phylogenetic analyses. To reveal the and cyanobacterial symbiont cells were isolated from a chamber position of OmCyn within the phylogeny of marine picocyano- of the host (SI Appendix, Fig. S2). The genomes of the isolated , we carried out phylogenomic analyses using datasets cyanobacterial cells were amplified and analyzed using two high- consisting of 40 protein sequences (Fig. 2, Left, and SI Appendix, throughput sequencing platforms, Illumina MiSeq and Oxford Fig. S5). The resultant trees clearly showed that OmCyn oc- Nanopore MinION. From scaffolds assembled using the short cupies a basal position to Synechococcus subcluster 5.1 in SI Appendix Syn- and long sequencing reads from the MiSeq and MinION systems, agreement with the ITS phylogeny ( , Fig. S4). 
echococcus subcluster clade 5.1, together with OmCyn, was respectively, we successfully recovered ∼1.87 Mbp of sequence shown to be a sister to another monophyletic clade consisting of (SI Appendix, Table S1) representing the genome sequence of species; this topology was strongly supported the cyanobacteria isolated from O. magnificus (hereafter referred with high bootstrap values (Fig. 2, Left, and SI Appendix, Fig. S5). to as OmCyn). The sequence showed particular similarity to Based on this phylogeny, the OmCyn lineage likely emerged genomes of the cyanobacterial lineage referred to as the marine soon after the split of the Prochlorococcus lineage and before the picocyanobacterial clade, consisting of the genera Synechococcus divergence of Synechococcus subcluster 5.1. and Prochlorococcus. This is consistent with the results of a previous study (24), which showed that the majority of pro- Metagenomic Analyses Reveal an Intimate Association of OmCyn with karyotic 16S ribosomal RNA gene (16S rDNA) sequences am- the Host Dinoflagellate and Its Worldwide Distribution. Marine plified from marine dinoflagellates with cyanobacterial symbionts picocyanobacteria occupy a wide variety of ecological niches (7, have high similarities to Synechococcus/Prochlorococcus species. 26). We attempted to determine the OmCyn distribution pattern Our phylogenetic analysis of 16S rDNA sequences confirmed the using publicly available marine metagenomic data to reveal the and behavior of the newly identified marine picocyano- bacterial lineage in the natural environment. A read-mapping analysis using Tara Oceans short-read data (27–29) against OmCyn and 19 picocyanobacteiral genome sequences revealed a marked difference in the occurrence pattern among sample size fractions between OmCyn and the remaining species examined. Reflecting their small cell size (<2 μm), the genome sequences of marine picocyanobacteria were most frequently (>70%; Fig. 
3A) identified in samples from the 0.8–5-μm fraction relative to other fractions (5–20, 20–180, and 180–2,000 μm). In contrast, only 7% of OmCyn genome sequences were detected in samples from the 0.8–5-μm fraction, with 83% detected in the 20–180-μm fraction (Fig. 3A), inconsistent with the size of the symbiont cells (<5 μm) (30). This distribution pattern is clearly equivalent to that of the 18S rDNA of O. magnificus, which hosts the OmCyn lineage, based on Tara Oceans metabarcoding analysis data (Fig. 3A) (31). The majority of the 18S rDNA V9 amplicons corresponding to the O. magnificus 18S rDNA sequence were from the 20–180- Fig. 1. Microscopy images of an Ornithocercus magnificus dinoflagellate μm fraction (Fig. 3A), with only 3% from the 0.8–5 μm fraction. with its cyanobacterial symbiont. (Left) Bright-field image of O. magnificus This distribution pattern for the host is reasonable, considering from the same sampling area where the individual used in the present study that O. magnificus cells are 75–115 μm in size (32) and the low was collected, showing brown cyanobacterial symbionts (arrowheads) re- – μ siding inside the symbiotic chamber. (Right) Blue-light epifluorescence im- proportion of reads detected in the smaller fraction (0.8 5 m) age of the same individual at the same angle of view as that of the left may have originated from free DNA in the water sample (e.g., image. The symbionts emit yellow autofluorescence. Arrowheads point to DNA released from disrupted cells). OmCyn sequences were the same cyanobacterial cells as those indicated in the bright-field image. detected from the 20–180-μm samples from 51 out of 57 Tara

2of6 | www.pnas.org/cgi/doi/10.1073/pnas.1902538116 Nakayama et al. Downloaded by guest on September 26, 2021 Fig. 2. Maximum likelihood (ML) phylogenetic tree of a multiple-protein dataset and protein repertoire comparison between the marine picocyanobacteria ECOLOGY and OmCyn. (Left) ML tree inferred from a concatenated dataset of 40 orthologous . The dataset includes only taxa whose genomes have been completely sequenced. The ML tree for the full dataset comprising 50 taxa is shown in SI Appendix, Fig. S5. Thick branches indicate 100% bootstrap values based on ML nonparametric bootstrapping. (Right) Comparison of protein numbers in the nonredundant proteomes of marine picocyanobacteria and OmCyn. Each bar indicates a proteome size without functional redundancy (protein number in a species protein repertoire).

Oceans geographical stations, suggesting its wide distribution the OmCyn genome sequence from the eukaryote size fraction across the world’s oceans (Fig. 3B). Consistent with the com- (20–180 μm; Fig. 4). Considering that Ornithocercus species parison of sample size fractions, global distribution patterns of other than O. magnificus and their relatives also have cyano- the OmCyn genome sequence and the O. magnificus 18S rDNA bacterial symbionts (39, 40), and that some 16S rDNA sequences V9 amplicon were shown to be the most correlated with each from dinoflagellate cells made a monophyletic clade with the other compared with free-living marine picocyanobacterial ge- OmCyn sequence (SI Appendix, Fig. S3), these metagenomic nomes (Fig. 3C). The result of our principal coordinate analysis reads may partially represent the genomic diversity of cyano- (PCoA) on the distribution patterns also showed clear isolation bacteria in symbiotic relationships with dinoflagellates. of the OmCyn genome and the O. magnificus 18S rDNA V9 amplicon from the marine picocyanobacterial genomes (SI Ap- Genome Streamlining. Another remarkable feature of the OmCyn pendix, Fig. S6A). Moreover, the sequence occurrence patterns genome is its streamlined protein-coding gene content. Given for both the OmCyn genome and the O. magnificus rDNA the high genome-sequencing coverage (mean, 436-fold), and the amplicon among Tara Oceans metagenomic samples were pre- fact that the sequence contains tRNA for all 20 amino dicted to be similarly associated with environmental physico- acids, the scaffolds likely represent the complete, or nearly chemical parameters (SI Appendix, Fig. S6B), supporting further complete, OmCyn genome. The genome sequence of OmCyn the co-occurrence of OmCyn and O. magnificus. was predicted to contain 1,846 protein-coding genes, which could Given the strong correlation of sequence occurrence patterns be clustered into 1,726 protein types by orthologous protein for OmCyn and O. 
magnificus, along with reports that the sym- clustering (41). This protein repertoire of OmCyn (1,726 protein biotic cyanobacteria of Ornithocercus species can be passed on to types; Fig. 2, Right, and SI Appendix, Table. S2) was significantly the symbiotic chambers of daughter cells through host cell di- smaller (detected as an outlier by Grubbs’ ; α = 0.05) relative vision (12, 33), OmCyn cells are likely to exclusively reside within to the repertoires of other species within Synechococcus sub- the symbiotic chambers of these dinoflagellates in the natural cluster 5.1 (2,061–2,498 protein types; Fig. 2, Right). The size of environment. This assumption could explain why the lineage the protein repertoire of OmCyn is comparable to those of including OmCyn was never explicitly described in previous Prochlorococcus species, which have the most reduced genomes surveys that extensively explored the genetic diversity of the among free-living cyanobacteria (Fig. 2, Right). The limited marine picocyanobacterial clade. This may be attributable to the protein repertoire suggests that many protein-coding genes were fact that studies focused on marine picocyanobacteria often use lost during the evolution of the OmCyn lineage. To estimate the samples of small fraction size (i.e., filter size <20 μm) (34–38) to extent of OmCyn protein repertoire reduction, we assessed how increase the sensitivity of analyses by filtering out nontarget or- the protein repertoires in each lineage have been altered from ganisms with larger cell size; such procedures may lead to the ancestral state by comparing them with the putative protein oversight of species attached to larger organisms. This implies repertoire of a common ancestor of the marine picocyano- that we may still be missing symbiotic cyanobacterial lineages bacterial clade. The ancestral protein repertoire was defined other than OmCyn. 
Interestingly, our metagenomic read- as the protein set in common between total protein repertoires mapping analyses detected a large number of metagenomic for each group of genomes included in Synechococcus subcluster reads, which did not exactly match but showed high similarity to 5.1 and the Prochlorococcus clade. This putative ancestral

Nakayama et al. PNAS Latest Articles | 3of6 Downloaded by guest on September 26, 2021 Fig. 3. Distribution patterns in natural environments estimated on the basis of metagenomic and metabarcoding data. (A) Relative abundances among four size fractions. While the marine picocyanobacterial lineages were abundant in the 0.8–5-μm fraction, OmCyn was estimated to be most abundant in the 20– 180-μm fraction, matching the abundance pattern of O. magnificus, the host dinoflagellate. The relative abundances of cyanobacterial lineages, including OmCyn, and O. magnificus were calculated based on Tara Oceans metagenomic data and 18S rRNA V9 metabarcoding data, respectively. (B) OmCyn genome- sequence read abundance in the 20–180-μm fraction from 57 Tara Oceans stations. Circle sizes indicate abundance of detected reads per 5 million meta- genomic reads. (C) Correlation matrix of distribution patterns among marine picocyanobacteria, OmCyn, and O. magnificus.

protein repertoire of marine picocyanobacteria consists of 2,114 genomes of other bacterial symbionts (22, 43–45), implying that proteins, and comparisons showed that, while the Synechococcus the symbionts have been adapted to relatively stable environ- genomes have retained an average of 86% of ancestral proteins, ments (e.g., the symbiotic chamber of dinoflagellates). the OmCyn genome possesses only 65% (Fig. 2, Right), sup- In contrast to these gene losses, the cyanobacterial “core” porting that a genome reduction event occurred in this lineage protein set that is conserved among all complete genomes of free- (see Dataset S1 for a list of proteins missing from OmCyn). The living cyanobacteria analyzed in this study is mostly retained same assessment revealed that Prochlorococcus species with (96.9%; Fig. 2, Right) in the OmCyn genome. The high degree of small genomes (Prochlorococcus other than P. marinus MIT universality of the core protein set appears to indicate the im- 9313) have retained only 69% of ancestral proteins. While ge- portance of these proteins for growth and function of cyanobac- nome size and retention of putative ancestral proteins are ap- teria. The genome of OmCyn has likely been streamlined while parently equivalent in OmCyn and Prochlorococcus species with retaining minimum metabolic integrity, including photosynthetic reduced genomes, genome reduction, including gene losses, ap- ability. A total of 20 cyanobacterial core proteins are missing in pears to have occurred independently in the two lineages based OmCyn (SI Appendix, Table S3); however, to our knowledge, none on their phylogenetic relationship (Fig. 2, Left, and SI Appendix, of these proteins are critical for growth or photosynthetic ability. It Fig. S5). 
In support of this notion, the OmCyn genome retains most genes encoding the components of , a large is noteworthy that the list of the missing cyanobacterial core light-harvesting antennae complex, that were characteristically proteins includes five genes for DNA recombination and repair: lost during Prochlorococcus evolution (26). To evaluate the im- recF, recG, recN, recO,andrecR. Since such loss of genes for ge- pact of gene loss on metabolic function in OmCyn, we further netic recombination has been observed in various bacterial sym- – analyzed the proteins that were predicted to be discarded from bionts with reduced genomes (44, 46 48), it is likely to be a the OmCyn genome using the Kyoto Encyclopedia of Genes and common phenomenon associated with reductive genome evolu- Genomes (KEGG) orthology system (42). The analysis showed tion. In contrast to OmCyn, which retains most of the cyano- that the metabolic functional category that experienced the most bacterial core protein set, intracellular cyanobacterial symbionts, severe reduction was membrane transport (SI Appendix, Fig. S7), which metabolically depend on their host cell for growth (16, 19, which is responsible for interactions with extracellular environ- 21, 44), retain only 74–88% (SI Appendix,TableS4) of core pro- ments. Similar trends in gene loss have been documented in the teins. Retention of the conserved protein set implies that OmCyn

4of6 | www.pnas.org/cgi/doi/10.1073/pnas.1902538116 Nakayama et al. Downloaded by guest on September 26, 2021 Fig. 4. Metagenomic-read recruitment plot of the OmCyn genome; 250 million Tara Oceans metagenomic reads from each of four filter size fractions (0.8–5, 5–20, 20–180, and 180–2,000 μm), totaling 1 billion metagenomic reads, were used for the mapping. Each marker in the left plot indicates reads with ≥100-bp alignment lengths. The histogram on the right shows the number of mapped reads. The asterisk indicates the group of reads that did not exactly match the OmCyn genome sequence but had high similarity (>90% identity). The color code represents filter size fractions from which each metagenomic read was obtained.

retains metabolic independence from its host cell, contrasting sharply lineages that were overlooked in previous metagenomic analyses, with obligate intracellular cyanobacterial symbionts. and further emphasize the significance of symbiosis as an important The genome streamlining of OmCyn cyanobacteria can be lifestyle in oceanic microbial ecology. explained by the intimate association between the symbiont and its host; if the habitat of the cyanobacteria is limited to inside the Materials and Methods O. magnificus symbiotic chamber, the effect of genetic drift Sampling, Cell Isolation, Genome Amplification, and Sequencing. An Ornitho- should increase due to shrinkage of the effective population size cercus magnificus cell containing cyanobacterial symbionts (SI Appendix, Fig. S2A) was isolated from a natural sample collected using a microplankton net (49). An increased effect of genetic drift would accelerate the μ loss of genes dispensable in the relatively mild and stable envi- (mesh size 25 m) in Shimoda Bay, Shizuoka, Japan, on 2017 September 14. The water temperature of the sampling area was 23.7 °C. The isolated di- ECOLOGY ronment inside the symbiotic chamber. noflagellate cell was transferred into sterilized , and the cyano- The characteristics of OmCyn also shed light on another im- bacterial symbionts residing in a symbiotic chamber were isolated using a portant question regarding the function of the symbiont within microcapillary pipette under an inverted microscope (SI Appendix, Fig. S2B). dinoflagellate cells. Some symbiotic cyanobacteria provide fixed The genomes of the isolated cyanobacterial cells were amplified and ana- nitrogen to their hosts (17, 21, 50); however, this cannot be the lyzed using two different high-throughput sequencing platforms, Illumina case for the relationship between OmCyn and O. magnificus MiSeq and Oxford Nanopore MinION. 
Detailed descriptions of genome since all species of the marine picocyanobacterial clade, in- amplification and sequencing are provided in SI Appendix, SI Materials cluding OmCyn, lack genes encoding proteins required for ni- and Methods. trogen fixation. As the OmCyn genome retains genes required De Novo Genome Assembly, Scaffold Screening, and Genome Annotation. for photosynthetic activity, and since Ornithocercus is a non- Paired-end short reads from the two Illumina libraries were assembled in- photosynthetic dinoflagellate, a reasonable explanation for this to scaffolds with the nanopore reads using SPAdes (version 3.11.1) (52, 53) in symbiotic relationship would be that the symbiont provides a hybrid assembling mode (54). Preliminary assessment of the resulting scaf- source of organic carbon for its host in open oceans, an oligo- folds showed that the majority exhibited similarity to marine picocyano- trophic habitat. It remains ambiguous how the host cell takes up bacterial genome sequences while some had similarity to proteobacterial nutrients from the symbionts, which reside in an extracellular sequences, indicating the presence of contaminating genome sequences. To space (i.e., the symbiotic chamber). An intriguing possibility that remove the contaminating sequences, we conducted scaffold screening Ornithocercus species and their relatives ingest and digest their based on phylogenetic analyses (see SI Appendix, SI Materials and Methods cyanobacterial symbionts has been raised, based on transmission for details). With this procedure, 16 scaffolds totaling 1.879 Mbp were re- trieved as the OmCyn genome, while 8 scaffolds totaling 2.371 Mbp were microscopy observations (33, 51). In symbiont-bearing judged to be derived from contaminated (noncyanobacterial) cells. Anno- dinoflagellate cells, vacuoles with inclusions resembling tation of predicted open reading frames in the OmCyn genome and pre- degraded cyanobacterial symbionts were observed (33). 
If true, diction of tRNA and rRNA genes was initially conducted with Prokka (55) given the intimate association between OmCyn and its host, using a prokaryotic protein database including Reference Sequence (RefSeq) cyanobacterial symbionts could be considered a “domesticated breed” protein sequences from a wide range of cyanobacterial lineages (SI Ap- farmed within the specialized chambers of the dinoflagellates. pendix, Table S6), particularly marine picocyanobacteria, and then checked manually. The annotated genome data are available from the DNA Data Conclusions Bank of Japan/GenBank/European Molecular Biology Laboratory DNA da- tabases under BioProject accession number PRJDB7787. Metagenomic analyses using high-throughput sequencing have provided opportunities for greater understanding of the diversity Phylogenetic Analyses. Phylogenetic analyses based on the 16S rRNA gene, the and structure of oceanic microbial communities, including or- internal transcribed spacer, and the multiprotein datasets were conducted as ganisms that have not been cultivated; however, because the described in SI Appendix, SI Materials and Methods. interpretation of metagenomic data still depends heavily on knowledge of the properties of organisms inhabiting an environ- Protein Repertoire Analysis. The OrthoFinder (version 2.1.2) (41) clustering ment (e.g., morphology, lifestyle, and whole-genome sequence), method was used to classify the proteomes predicted from cyanobacterial efforts to obtain basic knowledge about poorly understood organ- genomes, as well as that from the OmCyn genome, into orthologous protein isms must be simultaneously made. In the present study, we focused groups, referred to as orthogroups. The overall statistics for the OrthoFinder analysis, including a list of cyanobacterial genomes used, are presented in SI on, and revealed the genome sequence of, a yet to be cultured – cyanobacterium in symbiosis with a dinoflagellate. The genome Appendix, Table S2. 
By this orthologous protein clustering procedure, functionally redundant proteins encoded by a genome (i.e., products of gene duplications within that genome) were clustered into a single orthogroup. For each species, the protein repertoire size without functional redundancy was defined as the number of orthogroups detected in its proteome plus the number of proteins not assigned to any orthogroup (i.e., proteins unique to that species). The cyanobacterial core protein set was defined as the protein repertoire conserved in all completely sequenced, free-living cyanobacterial genomes used for this analysis. The protein repertoire of the common ancestor of the marine picocyanobacteria was estimated on the assumption that any protein shared between the protein repertoires of the Prochlorococcus clade and Synechococcus subcluster 5.1 was present in the common ancestor of these two clades. The total orthogroups of the Prochlorococcus clade and of Synechococcus subcluster 5.1 were extracted separately, and the intersection of the two sets was defined as the putative protein repertoire of the ancestral marine picocyanobacterium. Only completely sequenced genomes were used for this ancestral estimate. KEGG orthology IDs were assigned to the putative ancestral proteins (2,114 proteins) using eggNOG-mapper (56) based on eggNOG 4.5 orthology data (57).

Abundance Estimation Using Tara Oceans Expedition Data and Co-Occurrence Analysis. Relative abundances of OmCyn, free-living cyanobacteria, and O. magnificus in natural environments were estimated using metagenomic and metabarcoding data provided by the Tara Oceans Expedition (27). Detailed descriptions of the abundance estimation and co-occurrence analysis are provided in SI Appendix, SI Materials and Methods.

ACKNOWLEDGMENTS. We are indebted to Kenjiro Hinode at Nagasaki University, Toshihiko Sato, Daisuke Shibata, Tomomi Kodaka, Jiro Takano, and all other staff at the Shimoda Marine Research Center, University of Tsukuba, for their support in sampling the dinoflagellate. We thank the Tara Oceans Consortium and sponsors who supported the Tara Oceans Expedition for making the data accessible. This work was supported by Japan Society for the Promotion of Science KAKENHI Grants (16H04826, 17K15164, 18KK0203, and JP16H06280) and Advanced Bioimaging Support platform (ABiS): Imaging Marine Organisms.

sequence provided insights into its unique phylogenetic position and reductive genome evolution, and enabled the use of metagenomic data to estimate its global distribution and relationship with its host. Our findings suggest the possibility of cryptic microbial
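The repertoire definitions in the orthologous protein clustering analysis of Materials and Methods reduce to simple set operations on orthogroup assignments. The following is a minimal Python sketch of that bookkeeping, not the authors' pipeline. It assumes the orthogroup assignments have already been parsed into a dictionary mapping each genome to {protein ID: orthogroup ID, or None if unassigned}; that input format, the helper names, and the toy genomes (proA, proB, synA, synB) are hypothetical. It also assumes that a clade's "total orthogroups" means the union of orthogroups over the clade's genomes.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical input format: genome -> {protein ID -> orthogroup ID},
# with None marking a protein left unassigned (species-specific).
Assignments = Dict[str, Dict[str, Optional[str]]]


def orthogroups_of(assignments: Assignments, genome: str) -> Set[str]:
    """Distinct orthogroups detected in one genome's proteome."""
    return {og for og in assignments[genome].values() if og is not None}


def repertoire_size(assignments: Assignments, genome: str) -> int:
    """Orthogroups detected plus proteins unassigned to any orthogroup.

    Paralogs that fall into the same orthogroup are counted once, so the
    size ignores redundancy from gene duplications within the genome.
    """
    unassigned = sum(1 for og in assignments[genome].values() if og is None)
    return len(orthogroups_of(assignments, genome)) + unassigned


def core_set(assignments: Assignments, genomes: Iterable[str]) -> Set[str]:
    """Orthogroups conserved in every listed genome (empty if none given)."""
    per_genome = [orthogroups_of(assignments, g) for g in genomes]
    return set.intersection(*per_genome) if per_genome else set()


def clade_total(assignments: Assignments, clade: Iterable[str]) -> Set[str]:
    """All orthogroups found in at least one genome of the clade (union)."""
    total: Set[str] = set()
    for genome in clade:
        total |= orthogroups_of(assignments, genome)
    return total


def ancestral_set(
    assignments: Assignments, clade_a: Iterable[str], clade_b: Iterable[str]
) -> Set[str]:
    """Intersection of the two clades' total orthogroup sets."""
    return clade_total(assignments, clade_a) & clade_total(assignments, clade_b)


# Toy data: hypothetical genomes standing in for completely sequenced ones.
toy: Assignments = {
    "proA": {"p1": "OG1", "p2": "OG1", "p3": "OG2", "p4": None},
    "proB": {"p1": "OG1", "p2": "OG3"},
    "synA": {"p1": "OG1", "p2": "OG2", "p3": None},
    "synB": {"p1": "OG1", "p2": "OG4"},
}

print(repertoire_size(toy, "proA"))  # 3
print(sorted(core_set(toy, toy)))  # ['OG1']
print(sorted(ancestral_set(toy, ["proA", "proB"], ["synA", "synB"])))  # ['OG1', 'OG2']
```

In this toy case, the two proA paralogs in OG1 count once, so the repertoire size of proA is 3 even though it encodes 4 proteins. Unassigned proteins never enter the core or ancestral sets, because by definition they are not shared between genomes.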

1. W. K. W. Li, Primary production of prochlorophytes, cyanobacteria, and eucaryotic ultraphytoplankton: Measurements from flow cytometric sorting. Limnol. Oceanogr. 39, 169–175 (1994).
2. D. G. Capone, J. P. Zehr, H. W. Paerl, B. Bergman, E. J. Carpenter, Trichodesmium, a globally significant marine cyanobacterium. Science 276, 1221–1229 (1997).
3. J. P. Zehr et al., Unicellular cyanobacteria fix N2 in the subtropical North Pacific Ocean. Nature 412, 635–638 (2001).
4. F. Partensky, J. Blanchot, D. Vaulot, Differential distribution and ecology of Prochlorococcus and Synechococcus in oceanic waters: A review. Bull. de l'Institut Océanographique, Monaco 19, 457–475 (1999).
5. D. J. Scanlan et al., Ecological genomics of marine picocyanobacteria. Microbiol. Mol. Biol. Rev. 73, 249–299 (2009).
6. A. D. Jungblut, C. Lovejoy, W. F. Vincent, Global distribution of cyanobacterial ecotypes in the cold biosphere. ISME J. 4, 191–202 (2010).
7. N. A. Ahlgren, G. Rocap, Diversity and distribution of marine Synechococcus: Multiple gene phylogenies for consensus classification and development of qPCR assays for sensitive measurement of clades in the ocean. Front. Microbiol. 3, 213 (2012).
8. S. Huang et al., Novel lineages of Prochlorococcus and Synechococcus in the global oceans. ISME J. 6, 285–297 (2012).
9. G. Rocap et al., Genome divergence in two Prochlorococcus ecotypes reflects oceanic niche differentiation. Nature 424, 1042–1047 (2003).
10. G. C. Kettler et al., Patterns and implications of gene gain and loss in the evolution of Prochlorococcus. PLoS Genet. 3, e231 (2007).
11. F. Partensky, L. Garczarek, Prochlorococcus: Advantages and limits of minimalism. Annu. Rev. Mar. Sci. 2, 305–331 (2010).
12. J. Decelle, S. Colin, R. A. Foster, in Marine Protists: Diversity and Dynamics, S. Ohtsuka, T. Suzaki, T. Horiguchi, N. Suzuki, F. Not, Eds. (Springer Japan, Tokyo, 2015), pp. 465–500.
13. E. J. Carpenter, R. A. Foster, Cyanobacteria in Symbiosis (Kluwer Academic Publishers, Dordrecht, Netherlands, 2002), pp. 11–17.
14. K. Hagino, R. Onuma, M. Kawachi, T. Horiguchi, Discovery of an endosymbiotic nitrogen-fixing cyanobacterium UCYN-A in Braarudosphaera bigelowii (Prymnesiophyceae). PLoS One 8, e81749 (2013).
15. J. P. Zehr et al., Globally distributed uncultivated oceanic N2-fixing cyanobacteria lack oxygenic photosystem II. Science 322, 1110–1112 (2008).
16. H. J. Tripp et al., Metabolic streamlining in an open-ocean nitrogen-fixing cyanobacterium. Nature 464, 90–94 (2010).
17. A. W. Thompson et al., Unicellular cyanobacterium symbiotic with a single-celled eukaryotic alga. Science 337, 1546–1550 (2012).
18. F. M. Cornejo-Castillo et al., Cyanobacterial symbionts diverged in the late Cretaceous towards lineage-specific nitrogen fixation factories in single-celled phytoplankton. Nat. Commun. 7, 11071 (2016).
19. D. Bombar, P. Heller, P. Sanchez-Baracaldo, B. J. Carter, J. P. Zehr, Comparative genomics reveals surprising divergence of two closely related strains of uncultivated UCYN-A cyanobacteria. ISME J. 8, 2530–2542 (2014).
20. J. A. Hilton et al., Genomic deletions disrupt nitrogen metabolism pathways of a cyanobacterial diatom symbiont. Nat. Commun. 4, 1767 (2013).
21. T. Nakayama et al., Complete genome of a nonphotosynthetic cyanobacterium in a diatom reveals recent adaptations to an intracellular lifestyle. Proc. Natl. Acad. Sci. U.S.A. 111, 11407–11412 (2014).
22. T. Nakayama, Y. Inagaki, Unique genome evolution in an intracellular N2-fixing symbiont of a rhopalodiacean diatom. Acta Soc. Bot. Pol. 83, 409–413 (2014).
23. F. Schütt, "Die peridineen der plankton-expedition" in Ergebnisse der Plankton-Expedition der Humboldt-Stiftung (Lipsius & Tischer, 1895), pp. 1–170.
24. R. A. Foster, J. L. Collier, E. J. Carpenter, Reverse transcription PCR amplification of cyanobacterial symbiont 16S rRNA sequences from single non-photosynthetic eukaryotic marine planktonic host cells. J. Phycol. 42, 243–250 (2006).
25. P. Flombaum et al., Present and future global distributions of the marine Cyanobacteria Prochlorococcus and Synechococcus. Proc. Natl. Acad. Sci. U.S.A. 110, 9824–9829 (2013).
26. S. J. Biller, P. M. Berube, D. Lindell, S. W. Chisholm, Prochlorococcus: The structure and function of collective diversity. Nat. Rev. Microbiol. 13, 13–27 (2015).
27. S. Sunagawa et al., Structure and function of the global ocean microbiome. Science 348, 1261359 (2015).
28. S. Pesant et al., Open science resources for the discovery and analysis of Tara Oceans data. Sci. Data 2, 150023 (2015).
29. Q. Carradec et al., A global ocean atlas of eukaryotic genes. Nat. Commun. 9, 373 (2018).
30. R. A. Foster, E. J. Carpenter, B. Bergman, Unicellular cyanobionts in open ocean dinoflagellates, radiolarians, and tintinnids: Ultrastructural characterization and immuno-localization of phycoerythrin and nitrogenase. J. Phycol. 42, 453–463 (2006).
31. C. de Vargas et al., Eukaryotic plankton diversity in the sunlit ocean. Science 348, 1261605 (2015).
32. Y. B. Okolodkov, Dinophysiales (Dinophyceae) of the national park Sistema Arrecifal Veracruzano, Gulf of Mexico, with a key for identification. Acta Bot. Mex. 106, 9–71 (2014).
33. I. A. N. Lucas, Symbionts of the tropical Dinophysiales (Dinophyceae). Ophelia 33, 213–224 (1991).
34. G. K. Farrant et al., Delineating ecologically significant taxonomic units from global patterns of marine picocyanobacteria. Proc. Natl. Acad. Sci. U.S.A. 113, E3365–E3374 (2016).
35. B. Díez et al., Metagenomic analysis of the Indian Ocean picocyanobacterial community: Structure, potential function and evolution. PLoS One 11, e0155757 (2016).
36. F. Humily et al., Development of a targeted metagenomic approach to study a genomic region involved in light harvesting in marine Synechococcus. FEMS Microbiol. Ecol. 88, 231–249 (2014).
37. K. Zwirglmaier et al., Basin-scale distribution patterns of picocyanobacterial lineages in the Atlantic Ocean. Environ. Microbiol. 9, 1278–1290 (2007).
38. S. Mazard, M. Ostrowski, F. Partensky, D. J. Scanlan, Multi-locus sequence analysis, taxonomic resolution and biogeography of marine Synechococcus. Environ. Microbiol. 14, 372–386 (2012).
39. H. Farnelid, W. Tarangkoon, G. Hansen, P. Hansen, L. Riemann, Putative N2-fixing heterotrophic bacteria associated with dinoflagellate-Cyanobacteria consortia in the low-nitrogen Indian Ocean. Aquat. Microb. Ecol. 61, 105–117 (2010).
40. R. E. Norris, Proceeding of the Seminar on Sea, Salt and Plants, V. Krishnamurti, Ed. (Central Salt and Marine Chemicals Research Institute, Bhavnagar, 1967), pp. 178–189.
41. D. M. Emms, S. Kelly, OrthoFinder: Solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol. 16, 157 (2015).
42. M. Kanehisa, Y. Sato, M. Kawashima, M. Furumichi, M. Tanabe, KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 44, D457–D462 (2016).
43. H. Kuwahara et al., Reduced genome of the thioautotrophic intracellular symbiont in a deep-sea clam, Calyptogena okutanii. Curr. Biol. 17, 881–886 (2007).
44. E. C. Nowack, M. Melkonian, G. Glöckner, Chromatophore genome sequence of Paulinella sheds light on acquisition of photosynthesis by eukaryotes. Curr. Biol. 18, 410–418 (2008).
45. H. Charles et al., A genomic reappraisal of symbiotic function in the aphid/Buchnera symbiosis: Reduced transporter sets and variable membrane organisations. PLoS One 6, e29096 (2011).
46. G. J. Sharples, For absent friends: Life without recombination in mutualistic γ-proteobacteria. Trends Microbiol. 17, 233–242 (2009).
47. E. P. C. Rocha, E. Cornet, B. Michel, Comparative and evolutionary analysis of the bacterial homologous recombination systems. PLoS Genet. 1, e15 (2005).
48. H. Kuwahara et al., Loss of genes for DNA recombination and repair in the reductive genome evolution of thioautotrophic symbionts of Calyptogena clams. BMC Evol. Biol. 11, 285 (2011).
49. N. A. Moran, Microbial minimalism: Genome reduction in bacterial pathogens. Cell 108, 583–586 (2002).
50. T. Nakayama, Y. Inagaki, Genomic divergence within non-photosynthetic cyanobacterial endosymbionts in rhopalodiacean diatoms. Sci. Rep. 7, 13075 (2017).
51. W. Tarangkoon, G. Hansen, P. Hansen, Spatial distribution of symbiont-bearing dinoflagellates in the Indian Ocean in relation to oceanographic regimes. Aquat. Microb. Ecol. 58, 197–213 (2010).
52. A. Bankevich et al., SPAdes: A new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19, 455–477 (2012).
53. S. Nurk et al., Assembling single-cell genomes and mini-metagenomes from chimeric MDA products. J. Comput. Biol. 20, 714–737 (2013).
54. D. Antipov, A. Korobeynikov, J. S. McLean, P. A. Pevzner, hybridSPAdes: An algorithm for hybrid assembly of short and long reads. Bioinformatics 32, 1009–1015 (2016).
55. T. Seemann, Prokka: Rapid prokaryotic genome annotation. Bioinformatics 30, 2068–2069 (2014).
56. J. Huerta-Cepas et al., Fast genome-wide functional annotation through orthology assignment by eggNOG-mapper. Mol. Biol. Evol. 34, 2115–2122 (2017).
57. J. Huerta-Cepas et al., eggNOG 4.5: A hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic Acids Res. 44, D286–D293 (2016).

6 of 6 | www.pnas.org/cgi/doi/10.1073/pnas.1902538116 Nakayama et al.