<<

ARTICLE

https://doi.org/10.1038/s42003-019-0500-9 OPEN assembly of oceanica provides evidence of host nucleus overthrow by the symbiont nucleus during speciation

Li Guo1, Sijie Liang1, Zhongyi Zhang1, Hang Liu1, Songwen Wang2, Kehou Pan3, Jian Xu4, Xue Ren5, Surui Pei5 &

1234567890():,; Guanpin Yang1,6,7

The of the Nannochloropsis are unique in their maintenance of a nucleus- continuum throughout their cell cycle, non-motility and asexual reproduction. These char- acteristics should have been endorsed in their assemblages (). Here we show that N. oceanica has a genome of 29.3 Mb consisting of 32 pseudochromosomes and con- taining 7,330 -coding ; and the host nucleus may have been overthrown by an ancient red alga symbiont nucleus during speciation through secondary endosymbiosis. In addition, N. oceanica has lost its flagella and abilities to undergo meiosis and sexual repro- duction, and adopted a genome reduction strategy during speciation. We propose that N. oceanica emerged through the active fusion of a host protist and a photosynthesizing ancient red alga and the symbiont nucleus became dominant over the host nucleus while the was wrapped by two layers of endoplasmic reticulum. Our findings evidenced an alternative speciation pathway of .

1 College of Marine Life Sciences, University of China (OUC), Qingdao 266003, P. R. China. 2 College of Agriculture and Resources and Environment, Tianjin Agricultural University, Tianjin 300384, P. R. China. 3 Laboratory of Applied , College of Fisheries, OUC, Qingdao 266003, P. R. China. 4 Functional Genomics Group, Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao 266101, P. R. China. 5 Annoroad Gene Technology (Beijing) Co., Ltd, Beijing 100176, P. R. China. 6 Institutes of Evolution and Marine Biodiversity, OUC, Qingdao 266003, P. R. China. 7 Key Laboratory of Marine Genetics and Breeding of Ministry of Education, OUC, Qingdao 266003, P. R. China. Correspondence and requests for materials should be addressed to G.Y. (email: [email protected])

COMMUNICATIONS BIOLOGY | (2019) 2:249 | https://doi.org/10.1038/s42003-019-0500-9 | www.nature.com/commsbio 1 ARTICLE COMMUNICATIONS BIOLOGY | https://doi.org/10.1038/s42003-019-0500-9

pecies in the genus Nannochloropsis are promising for them may originate from an ancient red alga symbiont nucleus industrial applications because they accumulate large through secondary endosymbiosis during speciation. We also S 1,2 quantities of polyunsaturated fatty acids . They also show found that N. oceanica has adopted a genome reduction strategy promising characteristics allowing convenient genetic improve- during speciation; it has lost its flagella and abilities to undergo ment, which include, for example, a monoploid nucleus and meiosis and sexual reproduction. We propose that N. oceanica asexual reproduction, as revealed by mutation3, homologous emerged through the active fusion of a host protist and a pho- recombination4, and preliminary genome sequencing5. They are tosynthesizing ancient red alga and the symbiont nucleus became easy to genetically manipulate. Various species of the genus dominant over the host nucleus, while the chloroplast was Nannochloropsis are able to perform homologous recombination4 wrapped by two layers of endoplasmic reticulum. Our findings and genome editing6,7. The species in genus Nannochloropsis are evidenced an alternative speciation pathway of eukaryotes. mainly used as feed for fish larvae and , and as food additives for human nutrition. They have also evolved gradually into the models for both industrial applications and biological Results research8,9. Genome assembly and characterization. We sequenced the Genus Nannochloropsis comprises six known species that genome of N. oceanica with a third-generation sequencing tech- inhabit marine, freshwater and brackish environments10,11. nique. The quality reads were assembled into contigs that were Recently, a phylogenetic analysis of the concatenated 18S ribo- then grouped, ordered and oriented into pseudochromosomes somal RNA (rDNA) and rbcL for Nannochloropsis taxon revealed based on Hi-C revealed chromatin conformational information. a new species, N. australis, and erected a new genus, Micro- In total, 3.31 G raw PacBio reads (4.3 kb in average length, ~113- chloropsis, comprising M. salina and M. gaditana, which were fold of the genome) were generated on the Sequel System, and modified from N. salina and N. gaditana, respectively12. Whether 773,137 clean reads were obtained. The PacBio reads were self- they belong to Nannochloropsis or Microchloropsis, the species in corrected and assembled with Canu and polished using PacBio these taxa are assigned to class Eustigmatophyceae, reads and Illumina reads (~105-fold of genome), and then 10 , superphylum Heterokonta (or ) . assembled into 129 contigs with a contig N50 of 664.75 kb. The Heterokonta refers to flagellated cells with two differently shaped longest contig was 1.54 Mb in length. The assembled genome was flagella. Of the species in Heterokonta, the colored alga-like 29.3 Mb in total length. The GC content of the genome was species in phylum Ochrophyta include and golden, 54.01%. To evaluate the integrity of the assembly, the assembly brown, and yellow-green , among others. However, all of the was assessed with 303 Benchmarking Universal Single Copy species in the genera Nannochloropsis and Microchloropsis are Orthologs (BUSCO)29,30 genes from eukaryota ortholog database, small, nonmotile, and spherical in shape13; they can be dis- of which 264 (87.1%) were annotated and 255 (84.2%) were tinguished only by their rbcL and rDNA sequences14–16. The intact. In order to anchor the 129 contigs into pseudochromo- algae of genus Nannochloropsis are able to build up high con- somes, the Hi-C library was constructed and ~425.8 million reads centrations of diverse pigments, such as , zeaxanthin were generated on Illumina platform. In total, ~201.7 million and canthaxanthin17 and have a but completely lack valid interaction reads were used to build the interaction matrices b and c18. and draw the heatmap. Eighty six contigs were ordered as trunks The genomes of two species in the genus Nannochloropsis, N. which contained 26,395,858 bp (~93.84%) and the others were gaditana19,20 and N. oceanica21,22, have been sequenced on the reinserted in between the trunks to maximize the amount of Illumina platform. Their genomes vary between 28.5 and 29.0 Mb linkage. As results, 128 contigs were ordered (28,129,022 bp, in size and are characterized by a high density of genes, low ~95.99%) and these contigs have high-confidential orientations content, short spacers between genes19 and few repetitive within each chromosome-scale group as were calculated with sequences21. In addition, a comparison of the genomes of all six weighted directed acyclic graph (WDGA). With the assistance of species in the genus Nannochloropsis revealed an extreme case of Hi-C data, the primary genome assembly was scaled up to 32 23 = dose expansion among biosynthesis genes . These genomes pseudochromosomes (pseudochromosomal N50 1.14 Mb), are all sequenced on Illumina platforms, and have not been scaled including 30 nuclear, one mitochondrial, and one chloroplast up to pseudochromosomes with the newly commercialized pseudochromosomes (Table 1, Supplementary Tables 1–4, technologies, including single-molecule real-time (SMRT) Fig. 1a–c). The possible errors revealed by the heatmap showing sequencing24 and Hi-C technology25,26, among others. Such interactions among the pseudochromosomes (Fig. 1b) may result scenario is detrimental to identify completely the genes each from the contigs containing high portions of repeats rather than genome. from the order and orientation of the contigs themselves. We noted in particular that the of N. oculata is Gene modeling and homolog searching identified 7330 unusual; a nucleus-plastid continuum exists throughout its cell nonredundant protein-coding genes. The exons of these genes cycle27, but the basal body and flagellum observable in green were 483.7 bp in average. The of these genes were 370.9 algae28 are absent. Considering comprehensively the above bp in average. The number of exons in each gene was 2.87. findings, we suspected that an invaded red algal nucleus may have Different numbers of non-protein-coding miRNA, tRNA, rRNA, overthrown the host protist nucleus during evolution, i.e., a red and snRNA genes were also predicted, which in total accounted algal nucleus has occupied the host cytoplasm, and the host for 0.0674% of the genome. The repeated sequences occupied nucleus has been expelled. Unfortunately, these unique char- ~19.52% of the genome (Table 1; Supplementary Tables 5–8). acteristics have not been deciphered genetically and The genome assembly of N. oceanica contained 7330 protein- evolutionarily. coding genes, which were 2084 bp in average. Unfortunately, a In this study, we sequenced the genome of N. oceanica with a portion of these genes were annotated as putative/hypothetical real-time DNA sequencing method and scaled up the primary protein-coding genes. It was interesting to note that 1030 genes assembly with the chromatin conformation information revealed (14.05%) contained 2354 transposons, ~2.26 transposons each. by Hi-C data, aiming to look for evidence supporting our This result was different from those in higher plants where assumptions. A genome assembly of 29.3 Mb containing 32 retrotransposons are the greatest contributors to the size chromosome-scale scaffolds (pseudochromosomes) was obtained. expansion of the genome21. The transposons in the genome of The genome contains 7330 protein-coding genes, and most of N. oceanica were fewer in number, indicating reduction of N.

2 COMMUNICATIONS BIOLOGY | (2019) 2:249 | https://doi.org/10.1038/s42003-019-0500-9 | www.nature.com/commsbio COMMUNICATIONS BIOLOGY | https://doi.org/10.1038/s42003-019-0500-9 ARTICLE

Table 1 The characteristics of N. oceanica genome three species, N. oceanica, A. anophagefferens, and P. tricornutum, as data points; their genome sequences are currently available, and their positions on the 18S rDNA phylogenetic tree are Assembled genome size (bp) 29,303,273 appropriate for determining the sources of genes. We counted the Read coverage depth 112× homologs among N. oceanica, A. anophagefferens, and P. No. of contigs 129 tricornutum. Of 7330 protein-coding genes of N. oceanica, 2867 Contig N50 (bp) 664,749 Length of maximum contig 1,540,838 genes were unique, and 4463 genes were homologous to A. No. of contigs clustered 129 anophagefferens (656) or P. tricornutum (1329) only or to both of No. of contigs ordered and oriented 128a them (2478) (Fig. 3b). Judging from the number of homologous No. of contigs in trunks 86b genes, the nucleus of N. oceanica is more similar to that of P. No. of pseudochromosomes 32 tricornutum and, thus, more distant from that of A. anopha- No. of nuclear chromosomes 30 gefferens (1329 vs. 656). Except for the microevolutionary tree, the No. of chloroplast chromosome 1 OrthoFinder calculated also the phylogenetic trees for the No. of mitochondrial chromosome 1 homologs using BLAST, MAFFT, and FastTree as the tools and Total length of pseudochromosomes (bp) 29,303,273 and Glaucophytes as outgroup. A gene of N. oceanica Max. length of pseudochromosomes (bp) 1,670,642 No. of protein-coding genes 7330 is from the host nucleus if it is phylogenetically near its homolog Average length of protein-coding genes (bp)c 2084 from A. anophagefferens (Pattern I; Fig. 3c), while a gene of N. Length percentage of repeat sequence (bp) 19.54 oceanica is from the symbiont nucleus if it is phylogenetically far Length percentage of non-protein-coding genes (bp) 0.0674 from its homolog in A. anophagefferens (Patterns II and III, No. of exons each gene 2.87 Fig. 3c). In order to clarify the evolutionary relationships of 2478 Average length of exons (bp) 483.7 N. oceanica with homologs in both A. anophagefferens Average length of introns (bp) 370.9 and P. tricornutum, we aligned the amino acid sequences of these aOne contig is a singleton, which is not related with any other contigs proteins one by one by setting that a cluster must contain only bA trunk contains at least more than three contigs two sequences, and found that 645 (26%) belonged to Pattern I, cA portion of these genes are annotated as hypothetical 635 (26%) to Pattern II, and 722 (29%) to Pattern III. The remaining 476 cannot be categorized into these patterns; they are oceanica genome (Table 1; Supplementary Table 9; Fig. 1d). The encoded by multiple copy genes (Fig. 3c). The genes in pattern I high-quality genome assembly ensured that searching for were phylogenetically near A. anophagefferens, indicating their homologous genes and phylogenetic analyses of both genomes host nuclear origin, while those in patterns II and III were and genes can be carried out appropriately. phylogenetically far from A. anophagefferens, indicating their symbiont nuclear origin. Judging again from the number of genes homologous to those of both A. anophagefferens and P. Phylogenetic position of N. oceanica within Heterokonta. The tricornutum, Nannochloropsis nuclei are different from those of phylogenetic tree of 18S ribosomal RNA gene (rDNA) sequences A. anophagefferens (1357 vs. 645). Therefore, we believe that the representing Rhodophyta, , Viridiplantae, and Glau- Nannochloropsis nuclei derive mainly from symbiont nuclei. cophytes, among others, showed that Nannochloropsis were at the The protoplasts of Chromista are believed to have a single top of the current Chromista (Fig. 2c; Supplementary Fig. 1), ancient origin31,32. However, we found that the phylogenetic which were similar to some peridinin-containing dinoflagellates position of class Eustigmatophyceae is unusual. As indicated by evolved through tertiary endosymbiosis. The nuclei involved in the phylogenetic tree of the rbcL, families Eustigmatophyceae and diverse endosymbioses belong to different protists. The positions clustered separately from single-cell of Nannochloropsis on the phylogenetic tree of 18S rDNA were as and Chromista (Fig. 2d; Supplementary Fig. 2). The whole- expected and demonstrated that the nuclei of Nannochloropsis are genome-based microevolutionary analysis showed that Nanno- host ones. It was expected that the vast majority of Nanno- chloropsis formed a clade outside the single-cell red algae (Fig. 2b). chloropsis protein-coding genes could be placed at the position of These findings demonstrated that the symbiont carries an ancient rDNA. However, we found that the whole-genome-based red algal chloroplast. Accordingly, we believe that the symbiont is microevolution of Heterokonta showed a parallel relationship an ancient red alga that is phylogenetically distant from the between Nannochloropsis and the microalgae derived through the current single-cell microalgae. second endosymbiosis (Fig. 2a), indicating that the vast majority of Nannochloropsis protein-coding genes are phylogenetically different from rDNA. This discrepancy implied that Nanno- Simple protein import mechanism into the plastid. We have chloropsis nuclei have different origins from those of other extensively searched the genome of N. oceanica obtained in this Chromista. study and those of N. gaditana, P. tricornutum, C. merolae, and G. We calculated the percentages of homologous genes of two sulphuraria, which were published previously, for the protein diatoms, one golden microalga and two Nannochloropsis in E. components of translocons (or complexes) mediating protein huxleyi, G. theta (nucleus) and two single-cell red algae (Fig. 3a). import across the four membranes covering their using We found that the percentages of genes homologous to those of the identified proteins33,34 as the queries. We found that the red algae are similar among Nannochloropsis and other major components of SELMA complexes were absent in the two Chromista, while the percentages of genes homologous to those species of Nannochloropsis, and a few residual proteins had no of Haptophytes and Cryptophytes are obviously different (Fig. 3a). predictable bipartite topogenic signal except sDer1-1 and sHsp70 The percentages of Nannochloropsis genes with homologs in of N. oceanica (Table 2). We also found that TOC75, the main Haptophytes and Cryptophytes are obviously lower than those of component of the import complex across the outer membrane of diatoms and golden microalgae, indicating that a portion of plastids, was absent in N. oceanica, while its alternative, Omp85, Nannochloropsis genes have been replaced by those from the existed (Table 3). In addition, the main component of the import symbiont nucleus. When we calculated the microevolutionary complex across the inner membrane of plastids, Tic110, was tree, OrthoFinder simultaneously generated a list showing the found to exist in N. oceanica but not in N. gaditana, which may homologs among species used to calculate the tree. We selected be due to the incompleteness of its genome sequence. All these

COMMUNICATIONS BIOLOGY | (2019) 2:249 | https://doi.org/10.1038/s42003-019-0500-9 | www.nature.com/commsbio 3 ARTICLE COMMUNICATIONS BIOLOGY | https://doi.org/10.1038/s42003-019-0500-9

a c + chr1 – + chr2 – + chr3 – + chr4 – + chr5 – NR + NT chr6 – + chr7 – + chr8 – + BLASTX chr9 – BLASTP + 1964 1 chr10 – + chr11 – + 17 chr12 – + 178 0 chr13 – + chr14 – + chr15 – 15 0 3 0 + chr16 – + chr17 – + chr18 – + 456 chr19 – + chr20 – 0 18 + chr21 – + 0 4178 chr22 – + chr23 – + chr24 – + 8 chr25 – + chr26 – + chr27 – + chr28 – + chr29 – + chr30 – + C – + M –

b d

chr1 –10.00

chr2 chr10

1 1050 1100 1000

150

950 900 chr9 850 800 750 700 650 550 600 0 500 50 200 100 150 250 450 300 350 400 400 450 500 350 550 600 300 650 250 700 200 750 150 800 850 900 100 950 1000 100 1050 chr1 150 1 M 50 1 0 0 50 50 100 150 chr3 0 200 1 C 250 300 100 350 –9.75 400 450 50 500 550 0 600 1300 650 1250 700 1200 750 1150 800 1 850 900 1050100 950 1000 1000 1050100 950 1 900 850 800 chr4 750 chr12 700 0 650 50 chr7600 100 550 150 500 200 450 250 400 300 350 350 300 400 250 450 200 500 150 550 100 600 chr5 650 50 700 750 0 800 850 900 –9.50 1300 950 1250 1000 1200 1050 1 1100 1 150 1050100 chr6 1000 950 0 900 50 850 100 chr13 800 150 750 200 700 250 650 300 600 350 550 400 chr7 chr6500 450 450 500 400 550 350 600 300 650 250 700 200 750 150 800 100 850 900 chr8 –9.25 50 950 0 1000

1500 0 1450 50 1400 100 chr10 1350 150 chr14 1300 200 1250 250 1200 300 1150 350 1100 400 1050 450 1000 500 chr11 950 550 900 600 850 650 800 700 750 750 800 700 850 650 900 –9.00 chr4 chr12 600 950 550 500 450 400 350 0 300 50 chr13 250 100 200 150

150 200 chr16 100 250 50 300 0 350 400 chr14 450 500 550 600 350 650 300 700 250 750 chr15 200 800 150 850 100 –8.75 50 900 0

chr16 chr30 0 50 1550 100 1500 150

1450 200 1400 250 chr17 chr17 1350 300 1300 350 1250 400 1200 450 1150 500 100 550 chr18 1 600 1050 650 1000 950 700 900 750 850 800 800 850 chr19 750 –8.50 700 650

600 0

550 50 chr3 500 100

chr20 450 150

400 200 350 250 chr18 300 300 250 350 200 400 150 450 chr21 100 500 500 550 600 650 700 chr22 750 800 500 450 400 350 chr23 0 300 50 250 100 200 150 150 200

–8.25 100 250 50 0

chr24 300 350 chr27 400 450 500 550 550 chr25 500 600 650 450 700

400 750 350 800

300 850 250 900 chr26 200 950 chr2 150 1000

10050 0 1050 1

1 100 1200150 1250 chr27 chr26 1300 1350 1400 550 1450 500 1500 450 1550 400 chr28 350 300 250

200 0 150 50 50 0 100 100 150

200 250 chr29 300 350

chr25 400 450 600 500 550 550

600 500 650 450 700 400 750 chr30 800 –8.00 350 300 chr20 250 0

0 200 50 50

150 100 150 100 200 250 300 350 450 400

500 550 6

7 6

0 0 600 0 5 550 500 450

400 50 0 chr24 350 0 0 300 250 200 150 100 chr21 chr22 chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chr23 chr24 chr25 chr26 chr27 chr28 chr29 chr30

Fig. 1 Protein-encoding gene models on each pseudochromosomes of N. oceanica. Mitochondrial and chloroplast pseudochromosomes are presented linearly but they can be circulated as their two ends contain the repeat assemblies of sequences. a The pseudochromosomes with gene models marked with vertical bars. The green bars represent gene models in the sense strand, while the yellow ones in the anti-sense strand of DNA. b The heatmap showing the interaction between 30 nuclear pseudochromosomes. c The Venn diagram of functional genes annotated against NT, NR, BLASTX, and BLASTP databases. d The similarity between nuclear pseudochromosomes. Less homozygosity is found among these pseudochromosomes, which is one of the characteristics of N. oceanica, genome reduction

findings indicated that N. oceanica and N. gaditana have either (stramenopiles, haptophytes, diatoms, and green algae) protists evolved a different protein import system from that of P. tri- and found a nearly two-fold differential between motile and cornutum or do not need these complexes at all. nonmotile protists (104 vs. 55). We found that the total numbers of flagellum-associated genes in N. oceanica and N. gaditana were 29 and 17, respectively (Table 4 and Supplementary Data 1), Nonmotility of N. oceanica. The flagellum is an axial 9 × 2 which explained to some extent the two most salient character- +2 structural assembly that includes substructures such as dynein istics of Nannochloropsis species, their small size and nonmotility. arms, radial spokes, and a central pair complex. Jang et al.35 The loss of flagellum-associated genes also reflected the reduction described the existence of flagellum-associated genes in motile of Nannochloropsis genomes. From being motile to being small (mainly dinoflagellates), motile small (cryptophytes, eugleno- and motile to being completely nonmotile, the number of phytes, stramenopiles, and haptophytes), and nonmotile movement-associating genes is reduced exponentially.

4 COMMUNICATIONS BIOLOGY | (2019) 2:249 | https://doi.org/10.1038/s42003-019-0500-9 | www.nature.com/commsbio COMMUNICATIONS BIOLOGY | https://doi.org/10.1038/s42003-019-0500-9 ARTICLE

a Nannochloropsis oceanica b Nannochloropsis granulata Nannochloropsis oculata Nannochloropsis gaditana Nannochloropsis limnetica Phaeodactylum tricornutum Nannochloropsis oceanica Nannochloropsis gaditana Nannochloropsis salina anophagefferens Phaeodactylum tricornutum Thalassiosira pseudonana Guillardia theta Aureococcus anophagefferens Guillardia theta Galdieria sulphuraria Emiliania huxleyi Cyanidioschyzon merolae Galdieria sulphuraria Cyanidioschyzon merolae Chlorella variabilis Cyanophora paradoxa Chlamydomonas reinhardtii Chlorella variabilis Cyanophora paradoxa Chlamydomonas reinhardtii

0.05 0.5

cd 1 Eustigmatophyceae 1 Bacillariophyta 0.74 1 Synurophyceae 0.83 1 1 Synurophyceae 0.97 0.99 Chrysophyceae 1 0.79 1 Chrysolepidomonas 0.98 1 Raphidophyceae 1 1 1 Pelagophyceae 0.92 0.98 1 1 Raphidophyceae 1 PX clade 0.61 1 Dictyochophyceae 1 PX clade 1 0.91 1 1 Bacillariophyta 0.67 Rhodophyta

Bacillariophyta 1 Eustigmatophyceae 0.66 1 1 Cryptophyta 1 1 1 Dictyochophyceae

1 Glaucocystophyceae 1 Rhodophyta 1 0.69 1 1 Haptophyceae 1 1 Cryptophyta 1 Viridiplantae 1 1 Haptophyceae 1 1 nucleomorph 1 1 1 Rhodophyta Viridiplantae 1 Rhodophyta Glaucocystophyceae

0.06 0.07

Fig. 2 Phylogeny of Nannochloropsis and their relatives in different taxa deduced from nuclear genome (a), chloroplast genome (b), 18S ribosomal RNA gene (18S rDNA, c) and ribulose bisphosphate carboxylase large chain gene (rbcL)(d). a The phylogenetic tree of species inferred from the OrthoFinder using the genome protein sequences each species from the whole genome. b The phylogenetic tree of species inferred from the OrthoFinder using the chloroplast genome sequences each species. c The phylogenetic trees for 18S rDNA. The tree shows the consensus tree topology inferred by Bayesian analysis using alignments of 18S genes from NCBI. The scale bar indicates the nucleotide substitutions per site. This consensus topology derived from 512 trees, lnL = 22033.73. d The phylogenetic trees for rbcL protein. The tree shows the consensus tree topologies inferred by Bayesian analysis using alignments of rbcL proteins from NCBI. Scale bars represent 0.1 amino acid substitutions per site. In total, 439 aligned amino acid sites were analyzed. This consensus topology derived from 726 trees, α = 0.47 (0.41 < α < 0.56), pI = 0.0019 (0.0000007 < pI < 0.0059) and lnL = 10364.8

Nannochloropsis have lost their ability to move because their of yeast and cell mating and fusion genes among other number of movement-associating genes is approximately one- organisms, including N. oceanica and N. gaditana.Weturned fourth that of motile protists and approximately one-half that of to the second basic process involved in sex determination, the small and nonmotile protists. meiosis, focusing on the meiosis-specific genes developed as a sex detection toolkit42,43. During evolution, the function of a gene may be replaced by Loss of sexual reproduction. From mating types in yeast to other genes, and a gene may evolve a new function. We assumed different sexes in plants and animals and from a single gene to that a meiosis-specific gene was truly absent and not replaced by sex chromosome systems36–39 sex determination mechanisms an unidentified gene if it existed in a relatively wide taxonomic have been intensively investigated. The sexual reproduction and range of species. Of the nine meiosis-specific genes documented characteristics of organisms can be extremely complex; how- early42,43, Rec8 was not identified in the two species of genus ever, mating, cell fusion, and meiosis are among the basic Nannochloropsis. A gene homologous to Rec8 is found in their processes that are absolutely involved in sexual reproduction. genomes; however, it is phylogenetically similar to Rad21,a We have tried to identify the mating and cell fusion-associating mitosis-associating but not specific gene. The remaining eight genes found in yeast40 and diatoms (phylogenetically closer to meiosis-specific genes were identified with BLASTp or tBLASTn Nannochloropsis)41. Unfortunately, we found that this gene set and by phylogenetic analysis in a yeast, two green algae, a brown is extremely diverse; thus, it is difficult to uncover the homologs alga and a haptophyte among the currently available genome

COMMUNICATIONS BIOLOGY | (2019) 2:249 | https://doi.org/10.1038/s42003-019-0500-9 | www.nature.com/commsbio 5 ARTICLE COMMUNICATIONS BIOLOGY | https://doi.org/10.1038/s42003-019-0500-9

a b 60 Phaeodactylum tricornutum Thalassiosira pseudonana N. oceanica Aureococcus anophagefferens Nannochloropsis ocaenica 50 Nannochloropsis gaditana 2867

40

30

20 656 2478 1329

10 Percentage of homologous gene (%) gene homologous of Percentage

0

A. anophagefferens P. tricornutum

Guillardia theta Emiliania huxleyi Galdieria sulphuraria

Cyanidioschyzon merolae c N. oceanica Pattern I A. anophagefferens 476 P. tricornutum (19%) 645 Pattern I (26%) A. anophagefferens

Pattern II P. tricornutum

N. oceanica Pattern II 722 (29%) 635 Pattern III A. anophagefferens (26%) P. tricornutum Multi-copy gene N. oceanica Pattern III

Fig. 3 Homologous protein-coding genes of N. oceanica found among a few representative microalgal species. a The percentages of homologous genes of P. tricornutum, T. pseudonana, A. anophagefferens, N. oceanica, and N. gaditana found in the nuclear genomes of E. huxleyi, G. theta, C. merolae, and G. sulphuraria. b The homologous protein-coding genes of N. oceanica found in A. anophagefferens but not in P. tricornutum,inP. tricornutum but not in A. anophagefferens and in both of them. c The number and percentage of the homologous genes of N. oceanica found in both A. anophagefferens and P. tricornutum (2478 in total) can be further partitioned into three patterns (I through III) and multi-copy genes. The pattern I contains the genes phylogenetically near to those of A. anophagefferens, while patterns II and III contain the genes phylogenetically far from those of A. anophagefferens sequences (Table 5; Supplementary Data 1). Although they do not flagellated cells, have never been observed among species of genus completely encompass all species reproducing sexually, these Nannochloropsis, class Eustigmatophyceae. This discrepancy between species represent a relatively wide taxonomic range of species. classification and observation prompted us to rethink the phyloge- Eight meiosis-specific genes are conserved among these species, netic position of genus Nannochloropsis within phylum Ochrophyta. indicating that they cannot be replaced and are not substitutable We found that the phylogenetic positions of Nannochloropsis in meiosis. We failed to identify three meiosis-specific genes indicated by their 18S rDNA were different from those indicated (Hop1, Dmc1, and Mnd1) in the genome of N. oceanica and two by their whole genomes. The former implied an identical origin (Hop1 and Hop2) in the genome of N. gaditana (Table 5). These way to modern Chromista, while the latter implied an unusual findings (the absence of meiosis-specific and mating-associating way different from that of modern Chromista. If the nuclei of genes) indicated that species in the genus Nannochloropsis have modern Chromista belong to diverse host protists, those of lost their ability to reproduce sexually. Nannochloropsis should belong to a protist different from all of them. We found that Nannochloropsis contained a similar num- ber of red algal genes with diatoms and golden microalgae but Discussion obviously less genes of Haptophytes and Cryptophytes than Heterokonta (or Stramenopiles) refers to the type of motile life cycle diatoms and golden microalgae, indicating that a portion of stage at which the flagellated cells possess two differentially shaped Nannochloropsis genes should have been replaced by those from flagella. The typical classes in this infrakingdom10 include the colored the symbiont nucleus. The positions of N. oceanica, A. anopha- Ochrophyta (alga-like) and colorless and phyla. gefferens, and P. tricornutum on phylogenetic trees are appro- The classes in the alga-like Ochrophyta include Dictyochophyceae priate for determining the sources of genes; a gene of N. oceanica (axodines), Bacillariophyceae (diatoms), Chrysophyceae (golden is from the host nucleus if it is phylogenetically near its homolog algae), Eustigmatophyceae, Phaeophyceae (), and Xan- from A. anophagefferens, while a gene of N. oceanica is from the thophyceae (yellow-green algae), among others, while those in symbiont nucleus if it is phylogenetically far from its homolog in Pseudofungi include (water molds), among others44. A. anophagefferens. We found that the nucleus of N. oceanica is However, the two most prominent characteristics, motile and more distant from that of A. anophagefferens. N. oceanica has

6 COMMUNICATIONS BIOLOGY | (2019) 2:249 | https://doi.org/10.1038/s42003-019-0500-9 | www.nature.com/commsbio COMMUNICATIONS BIOLOGY | https://doi.org/10.1038/s42003-019-0500-9 ARTICLE

Table 2 The identification of host ERAD (endoplasmic reticulum-associated protein degradation) and symbiont SELMA (symbiont-derived ERAD-like machinery) components in N. oceanica (NO), N. gaditana (NG), P. tricornutum (PT), C. merolae (CM), and G. sulphuraria (GS)

Species NOa NGa PTa CMa GSa HS? HS? HS ER translocation Sec61 X − X − X − XX Derlin protein Dfm1, hDer1-1 (sDer1-1) X Xsp XXXXXX Der1, hDer1-2 (sDer1-2) X − X − XXXX Ubiquitinylation Hrd1/Der3 (ptE3p) X − X − XXXX Hrd3 −−−−X − XX Uba1 (sUba1) X X X X X X X X Doa10 X − X − X − XX Ubc6 (sUbc6) Xsf − X − XXXX Polyubiquitin (sUbi) X − X − XX−− Cdc48 complex Cdc48 (sCdc48-1) X − X − XXXX (sCdc48-2) −−−−−X −− Ufd1 (sUfd1) X − X − XXXX Npl4 (sNpl4) X − X − XXXX (sUBX) −−−−−XX X (sPUB) −−−−−X −− Processing Png1 (sPng1) X X X X X X X X Dsk2 (sUbq) X X − XX XX X Rad23 X − X − X − XX Ufd2 X − X − X − XX (ptDUP) −−−−−X −− Unknown function (PPP1) X − X −− XX X Chaperones Hsp70 (sHsp70) X Xsp XXXXXX Hsp40/Ydj1/MAS5 X − X − X − XX (sDPC) −−−−−X −−

The genes of N. oceanica are searched against the genome obtained in this study, while those of N. gaditana are searched against the genome published early19 (e-value < e−5 and annotated as the genes searched). In parentheses, symbiont specific if with s or putatively symbiont specific if without s. X, found; –, not found; ?, not sure as the signal peptide at the N-terminal was not found except two with superscript sp of N. oceanica. sf, annotated as different names but the function is the same aFor these two species, host-specific (H) and symbiont-specific (S) are judged according to the similarity between protein query and its hit. Data cited from Stork et al.42

Table 3 The identification of translocon components in the words, the symbiont nucleus has overthrown the host nucleus. outer and inner membranes (OM and IM) of plastid of N. We propose that Nannochloropsis evolutionarily emerged oceanica (NO), N. gaditana (NG), P. tricornutum (PT), C. through the active fusion of a host protist and a symbiont pho- merolae (CM), and G. sulphuraria (GS) tosynthesizing protist, forming a triad of two nuclei and a plastid; the two nuclei were covered by independent nuclear membranes and shared endoplasmic reticulum, while the plastid was covered Species NO NG PT CM GS by two layers of endoplasmic reticulum outside the two mem- Translocon in IM Tic20 −−XX X branes derived from the primary endosymbiosis; and the two Tic22 X X X X X nuclei interacted with each other, the symbiont nucleus finally Tic55 X X X −− Tic110 X − XX X won the competition, and the host nucleus was overthrown Translocon in OM Toc75 −−−XX during the speciation of Nannochloropsis (Fig. 4). The cell fusion Omp85 X X X X X mode of speciation requires no evolution of new structure and Toc64 X X X X X function, which should have helped Nannochloropsis to thrive Toc34 X X X X X during speciation. We can find more clues among previous studies. For example, The genes of N. oceanica are searched against the genome obtained in this study, while those of remaining species are searched against the genome published early. X, found; –, not found the nucleus and plastid of N. oculata accompany each other throughout the whole cell cycle by forming a nucleus-plastid continuum covered by a membrane with ribosomes on the outer more genes from the symbiont, thus its nucleus derived mainly surface27. This observation supports our hypothesis. The exis- from symbiont nucleus during speciation. As shown on rbcL tence of barrel-shaped nuclear pole bodies27 in N. oculata evi- phylogenetic and the whole-genome-based microevolutionary dence the ancient nature of the nuclei of Nannochloropsis, which trees, the symbiont carries an ancient red algal chloroplast. may explain the unusual position of Nannochloropsis on the rbcL Therefore, we believe that the symbiont is an ancient red alga that phylogenetic tree. In addition, it has been hypothesized previously is phylogenetically distant from the current single-cell microalgae. that the two outermost membranes of complex plastids may both The widely accepted understanding is that a protist engulfed originate from the endoplasmic reticulum45. and enslaved a red alga and discarded its nucleus but returned its Cell fusion and nuclear competition may avoid the formation chloroplast during secondary endosymbiosis, yielding crypto- of complex plastids and the requirement of new protein targeting phytes, haptophytes, and . During the speciation of systems. Complex plastids are believed to be veiled by four Nannochloropsis, it seems that the host nucleus contributed only a membranes originating during secondary endosymbiosis45,46. small portion of the genes, including the rDNA, while the sym- Early studies have demonstrated that translocons (or complexes) biont nucleus contributed a large portion of the genes. In other help to import proteins synthesized on the cytoplasmic ribosomes

COMMUNICATIONS BIOLOGY | (2019) 2:249 | https://doi.org/10.1038/s42003-019-0500-9 | www.nature.com/commsbio 7 ARTICLE COMMUNICATIONS BIOLOGY | https://doi.org/10.1038/s42003-019-0500-9

Table 4 The numbers of eight categories of flagellum-associated genes of two Nannochloropsis species and their comparison with those (in average) of protists determined early35

Gene category Motile protists Motile small protists Nonmotile protists N. oceanica N. gaditana Gene category Structural gene Tubulin 4 4 4 2 2 Radial spoke 15 15 8 2 1 Central pair 11 10 8 6 4 ODA 19 19 11 6 4 IDA 21 21 16 4 2 Functional gene IFT-A complex 6 6 1 2 0 IFT-B complex 21 20 6 7 4 BBSome 8 8 2 0 0 Total 104 104 55 29 17 Total

Nonmotile species include stramenopiles, haptophytes, diatoms, and green algae; motile species are all dinoflagellates; motile small species include cryptophytes, euglenophytes, stramenopiles, and haptophytes. The genes of N. oceanica are searched against the genome obtained in this study, while those of N. gaditana are searched against the genome published early19 (searching with e-value < e−5 as the threshold of expected hits; filtrated with those of full lengths as candidates and accepted as the protein expected if a candidate shows the highest similarity which is usually extremely less than the e-value threshold for searching and is annotated simultaneously as the expected). ODA, outer dynein arm; IDA, inner dynein arm

Table 5 Identification of known meiosis-specific genes used as a meiosis detection toolkit

Species Meiosis-specific genesa Spo11-2 Mer3 Hop1 Dmc1 Hop2 Mnd1 Msh4 Msh5 Saccharomyces cerevisiae XXXXXXXX Chlamydomonas reinhardtii XXXXXXXX Volvox carteri XXXXXXXX Ectocarpus siliculosus XXXXXXXX Emiliania huxleyi XXXXXXXX Nannochloropsis oceanica XX−− X − XX N. gaditana XX− X − XX X aOf nine meiosis-specific genes51, 52, Rec8 is not identified in two species of genus Nannochloropsis. A homologous gene of Rec8 is found in the genome of N. oceanica; however, it is phylogenetically similar to Rad21, a meiosis associating gene. We believed that a meiosis-specific gene is truly absent and not replaced by an unidentified gene if it exists in a wide taxonomical range of species which include here, for example, a yeast, two green algae, a brown alga, and a haptophyte. These genes are identified by searching against the genomes either downloaded from public databanks or sequenced ourselves (N. oceanica) with BLASTp or tBLASTn and by phylogenetic analysis into the periplastid compartment and the stroma of the chlor- always believed to be a passive process in which the host protist oplast47–49. From the outermost (possibly derived from the host engulfs a photosynthesizing alga and enslaves its plastid but endoplasmic reticulum) to the second outermost (periplastid discards its mitochondria. This widely accepted mechanism membrane) and the inner plastid membranes, Sec61, SELMA, hypothesized that: the newly appearing chloroplast must be and TIC complexes have been suggested to import preproteins covered by four membranes. The outermost membrane possibly across these membranes; however, the TOC complex in the outer originated from the endoplasmic reticulum, while the periplastid plastid membrane has not yet been found among complex plas- membrane possibly originated from the host cellular membrane, tids50. Instead of the TOC complex, a homolog of TOC75 and the two interior membranes were inherited from the first (Omp85) has been demonstrated to mediate the translocation of endosymbiosis; a new system must import proteins synthesized in proteins across the outer membranes of complex plastids51, and a the cytoplasm into the periplastid compartment and the stroma Der complex has been proposed to replace TOC52. In addition, of plastids; and the genes of the enslaved nucleus must undergo the genes encoding SELMA complex components are believed to horizontal transfer to the host nucleus on a large scale. It seems be recycled from the red alga symbiont, and their corresponding that this mechanism has received strong support from the studies proteins are synthesized in the cytoplasm and imported either carried out in some Chromista47–49. However, this mechanism into the periplastid compartment or the plastid stroma. These also has exceptions, including, for example, the absence of the proteins are called preproteins and are marked by a bipartite components of the protein import system in many other species, topogenic signal at their N-terminals. A bipartite topogenic signal including N. oceanica and N. gaditana as analyzed in this study. consists of a signal peptide for crossing Sec61, a transient peptide In our assumption of N. oceanica speciation, the chloroplast of N. for crossing the periplastid membrane and a special amino acid oceanica was enveloped outside by two layers of endoplasmic residue between them for targeting the plastid stroma50. If the reticulum, which is homogeneous to that found in the cytoplasm. two outermost membranes of are derived from the Sec61 and ERAD should be involved in transportation into and endoplasmic reticulum, the evolution of complex preprotein out of the endoplasmic reticulum45. Although some modifications transportation systems can be avoided. As was strongly expected, may exist, the protein transportation system of Nannochloropsis the plastids of N. oceanica and N. gaditana had not evolved a was inherited from the symbiont, e.g., an ancestral single-cell red protein import system similar to that of P. tricornutum. alga. This evolutionary path would readily explain the speciation If N. oceanica and N. gaditana do not need these complexes, of Nannochloropsis. they must have adopted a membrane system different from that The two nuclei fused to make a hybrid nucleus, which then of P. tricornutum during evolution. Secondary endosymbiosis is reduced its genetic information by haploidization. During this

8 COMMUNICATIONS BIOLOGY | (2019) 2:249 | https://doi.org/10.1038/s42003-019-0500-9 | www.nature.com/commsbio COMMUNICATIONS BIOLOGY | https://doi.org/10.1038/s42003-019-0500-9 ARTICLE

what we found in this study. With the increase in available genome sequences, we believe that molecular characterization of M M the reproductive strategies of many organisms will become fea- sible and convenient. N N Methods Microalgal cultivation and DNA isolation. The marine microalga N. oceanica was C provided by Key Laboratory of Mariculture of the Chinese Ministry of Education, Ocean University of China. The algae were cultivated in f/2 medium (pH 7.8, salinity 30)54 prepared with natural seawater and autoclaved at 121 °C for 30 min. DNA was isolated with the cetyltrimethylammonium bromide (CTAB) method55.

M M Genome sequencing. The whole genome of N. oceanica was sequenced on the N PacBio Sequel System (https://www.pacb.com/products-and-services/pacbio- systems/sequel/) based on the single-molecule real-time (SMRT) sequencing technology. The template library was constructed using SMRTbell Template Prep Kit 1.0 (product code 100-259-100) and SMRTbell Damage Repair Kit (product C code 100-465-900). Following the procedure described in the PacBio brochure to ™ – N prepare > 20 kb template using BluePippin size-selection system (15 20 kb cutoff) for Sequel™ Systems, the quality DNA was fragmented with g-TUBE (covaries, 520079) and concentrated with AMPure® PB magnetic beads with the fragments eluted with the Pacific Biosciences® Elution Buffer. The fragments were damage- repaired with ExoVII, end-repaired with End Repair Mix and ligated with the blunt adapter. After removing away the failed ligation products with ExoIII and ExoVII, the ligation products were purified twice with AMPure® PB Beads, and subjected to size selection using BluePippin™ Size-Selection System. The yielded fragments were fi M bead-puri ed, damage-repaired, and used as the ~20 kb SMRTbell templates. The templates were annealed with primers and bound to DNA polymerase using the PacBio DNA/Polymerase Kit and magnetic beads, and loaded onto PacBio Sequel™ C Systems to read the sequence of templates. N Generation of short reads for genome correction. In order to collect the Illu- mina paired-end reads, the possible degradation and contamination of genomic DNA were observed in 1% agarose gel, the purity of the genomic DNA was determined on NanoPhotometer® (IMPLEN, CA, USA) and the concentration was measured with Qubit® 2.0 Fluorometer (Life Technologies, CA, USA). The genomic Fig. 4 The hypothetical route of N. oceanica speciation through cellular DNA survived quality control was used to construct the short fragment library fusion and nuclear haploidization. The symbiont nucleus may have following the TruSeq DNA Sample Preparation Guide (Illumina, 15026486 Rev. C). overthrown the host nucleus during speciation of Nannochloropsis. The cell The procedure includes mainly the steps of DNA fragmentation, end-repairing, fusion mode of speciation needs no evolution of new structure and function, base A tailing, adaptors ligation, recovering of DNA with desirable sizes from gel, and PCR amplification of the recovered DNA. The amplification products are used which should aid to fast speciation of Nannochloropsis. Instead of free as the libraries for sequencing once they survived the quality checking. In brief, the floating, the plastid homing in the host cytoplasm may be enveloped by amplification products were quantitated on Qubit2.0, and the size range of the either one or two layers of cytoplasmic reticulum, forming a continuum with amplification products was determined with Agilent 2100. If the fragment size fi nucleus and dividing simultaneously with nucleus ranged as the expected, the library was accurately quanti ed with Bio-RAD CFX 96 real-time quantitative PCR thermocycler and Bio-RAD KIT iQ SYBR GRN Q-PCR thermocycler. The quality library was sequenced on HiSeq X Ten Platform set at PE150 program with paired-end reads obtained. kind of evolution, diverse structures and functions may have been lost through either selection or genetic drift or both, including ™ fl Genome assembling. The reads exported by Sequel Systems were quality eval- motility and sexual reproduction. The agellum plays important uated with the in-built High Quality Region Finder (HQRF) which identifies the roles in competition for nutrients by enabling rapid movement, longest high-quality regain each read generated by a singly-loaded DNA poly- and thus, rapid finding, capture, and ingestion of prey. It also merase according to the ratio of signal to noise. Upon being generated by the helps autotrophic and mixotrophic species sense and receive solar system, all bases were marked with exclamation point in order to make the format perfect. The quality reads (or regions) were marked with 0.8, while the low quality light. It is known that all species in the genus Nannochloropsis are ones were marked with zero. 13 extremely small, structurally simple and nonmotile . The The quality reads obtained were assembled into contigs using Canu (v1.5 advantages and/or disadvantages of these characteristics in evo- https://github.com/marbl/canu)56 by setting the parameters as the following: canu lution remain unclear. However, these characteristics may imply -pacbio-raw nanoc.subreads.bam.fasta -p nanoc -d nanoc -canu genomeSize=40m gridEngineMemoryOption=“-l vf=MEMORY”. Canu uses the all-versus-all the loss of structures, thus marking the reduction of the Nan- overlap information to correct individual reads. It selects these overlaps in a two- nochloropsis genome. In this study, we found that most flagellum- step manor, global and local filtrations. The global filtration identifies the targets associating genes were lost, providing at least one example of such where a read may provide correction support, and the local filtration allows a read change during the speciation of N. oceanica. to accept or reject the correction evidences provided by other reads. At the trimming stage, Canu identifies the region of reads without correction supports, The genome reduction of N. oceanica has also caused the loss trimming or splicing the reads into their longest regions with correction supports. of its other functions, including sexual reproduction. Neither These regions were subjected to a final identification of sequencing errors, and used ploidy variation nor morphological mating and cellular fusion has to construct the best overlap graph as the output contigs and make a summary been documented among the species in the genus Nanno- statistics. fi chloropsis to date. As was expected, the function of sexual The errors in the primary assembly were identi ed and corrected with blasr (v5.1, https://github.com/PacificBiosciences/blasr)57 and Arrow (v2.2.1), a tool reproduction has been lost during the speciation of N. oceanica; built in Smrt Link (https://downloads.pacbcloud.com/public/software/installers/ an incomplete set of meiosis-specific genes were identified in the smrtlink_5.0.1.9585.zip). First, the PacBio reads were mapped to the raw contig genome of N. oceanica. We once tried to determine the existence using blasr with the parameter: --bam --bestn 5 --minMatch 18 --nproc 4 of sexual reproduction among microalgae with their genome --minSubreadLength 1000 --minAlnLength 500 --minPctSimila rity 70 fi --minPctAccuracy70 --hitPolicy randombest --randomSeed 1, and then the sequences available by setting the number of meiosis-speci c consensus sequences were obtained and variants were called via Arrow with the 53 genes ≥ 6 . This early set threshold is roughly in accordance with default parameters.

COMMUNICATIONS BIOLOGY | (2019) 2:249 | https://doi.org/10.1038/s42003-019-0500-9 | www.nature.com/commsbio 9 ARTICLE COMMUNICATIONS BIOLOGY | https://doi.org/10.1038/s42003-019-0500-9

The consensus genome was subjected to a final round of base-error correction contigs of each chromosomal group into pseudochromosomes. LACHESIS ordered (polish) by referring to the Ilumina reads with BWA (v0.7.9a) and Pilon (v1.22, the contigs by building a graph and extracting the longest path as the trunk (the https://github.com/broadinstitute/pilon)58. The Illumina paired-end reads were highest-confidence order of the contigs in a chromosomal group). Those contigs mapped to the contigs by BWA (parameter, -k 30), and Pilon 1.22 (default excluded by the trunk were reinserted according to the linkage information parameters) uses this alignment to correct the assembly. The quality of the genome between adjacent contigs. The ordered contigs were oriented by building a sequence obtained was further evaluated with BUSCO (v2, http://busco.ezlab.org/)59 weighted, directed and acyclic graph (WDAG)26. The weight was calculated as the (default parameters) based on a benchmark of 303 conserved eukaryota ortholog log-likelihood of Hi-C links between adjacent contigs25, and the orientation each database. contig each chromosomal group was predicted according to the maximum likelihood path through WDAG. N. gaditana may contain 30 chromosomes, one chloroplast, and one Construction of Hi-C library. Algal cells (0.2 g, fresh weight) were harvested 62 = fi mitochondrial chromosomes . Accordingly, we set CLUSTER_N 30 for through centrifugation and xed with 2% formaldehyde to crosslink the DNA and LACHESIS. The tool then full-range scanned the five key parameters, including proteins at a final concentration of 1% (v/v) and at room temperature for 15 min. fi CLUSTER_MIN_RE_SITES [15,2000], CLUSTER_MAX_LINK_DENSITY [1,10], The crosslinking was quenched with glycine at a nal concentration of 0.2 M CLUSTER_NONINFORMATIVE_RATIO [1,v 10], through incubation at room temperature for 5 min. The algal cells were collected ORDER_MIN_N_RES_IN_TRUNK [15,2000], and and resuspended in cold 1× PBS. The cells were rinsed again and the cell pellet was fl − ORDER_MIN_N_RES_IN_SHREDS [15,2000]. All other parameters were set as ash-frozen in liquid nitrogen and stored at 80 °C. the default. After about 2000 trials, a pseudochromosome candidate was picked up The crosslinked algal cells were lysed in Hi-C lysis buffer (10 mM Tris-HCl pH if it contained > 95% of the contig (in total length) and its corresponding trunk 8.0, 10 mM NaCl, 0.2% Igepal CA630) supplemented with the protease inhibitors contained > 90% of the contigs (in total length). A single pseudochromosome was (Sigma, P8340) on ice for > 15 min. The debris (mainly nuclei) was collected considered as the best if it contained 100% of the contig (in total length) and its through centrifugation (2500 × g, 5 min), washed with ice-cold Hi-C lysis buffer, corresponding trunk contained > 93% of the contigs (in total length). Finally, the gently resuspended in 0.5% sodium dodecyl sulfate (SDS) and incubate at 62 °C for – tool yielded 29 nucleic pseudochromosomes, one chloroplast, and one 5 10 min, and then quenched with 10% Triton X-100 (Sigma, 93443) at 37 °C for mitochondrial pseudochromosomes. 15 min. The chromatin was digested with Mbo I (New England Biolabs, R0147) To determine the accuracy of the scaling-up results, we cut the overnight at 37 °C. The chromatin digestion was ceased by incubating at 62 °C for pseudochromosomes predicted by Lachesis into bins with 100 kb equal length and 20 min. fi constructed the heatmap based on the interaction signals that were revealed by The cohesive ends of DNA were lled in and biotin labeled with DNA valid mapped read pairs between bins. The matrix was produced by Hic-Pro and polymerase I Large (Klenow) Fragment (NEB, M0210) and biotin-14-dATP (Life then visualized as the heatmap to show the diagonal patches of strong linkage. Technologies, 19524-016) at 37 °C for 45 min. The DNA fragments were ligated The plastid and mitochondrial genomes were clustered as two contig singletons, proximately with T4 DNA ligase (NEB, B0202) at room temperature for 4 h. and both had long overlapping regions at both ends. This helped us to circularize Crosslink was reversed by degrading the protein with proteinase K (NEB, P8102) their genomes. In addition, gene modeling and annotation characterized sets of when SDS was present at 55 °C for 30 min. genes shared by the chloroplasts and mitochondria of other species. To simplify In order to extract HiC information through high-throughput sequencing using chromosomal drawing, the chloroplast and mitochondrial genomes were linearized Illumina sequencers, the biotinylated DNA was sheared to sizes varying between (Fig. 1a). 300 and 500 bp in length with Covaris LE220 (Covaris, Woburn, MA) (volume 130 μL in a Covaris microTUBE; fill level 10; duty cycle 15; PIP 500; cycles/burst 200; time 58 s). The sheared DNA was eluted and verified to be successfully sheared Identification of repeat sequences. The tandem and interspersed repeats were through agarose gel electrophoresis. The sheared DNA was purified with AMPure recognized and annotated with RepeatProteinMask and RepeatMasker (v1.323, XP beads (Beckman Coulter, A63881). The DNA fragments in the range of http://www.repeatmasker.org/) according to their sequence similarity with the 300–500 bp retained on the beads were eluted, quantified by Qubit dsDNA High deposited ones in Repbase63 (http://www.girinst.org/repbase/) and predicted as Sensitivity Assay (Life Technologies, Q32854), and run on a 2% agarose gel to models with RepeatModeler (open-1.0.8, http://www.repeatmasker.org/ verify successful size selection. RepeatModeler/) and de novo documented and annotated with RepeatMasker The sheared DNA fragments were end-repaired with NEB T4 PNK (NEB, (v1.323, http://www.repeatmasker.org/). All the parameters were default. The M0201), NEB T4 DNA polymerase I (NEB, M0203), and NEB DNA polymerase I tandem repeats were also predicted and annotated directly with TRF software64 Large (Klenow) Fragment (NEB, M0210) at room temperature for 30 min, and built in RepeatMasker. NEB Klenow exo minus (NEB, M0212) at 37 °C for 30 min. The DNA fragments Repeats that were predicted and annotated in different ways were integrated containing biotin labels were pulled down and the biotin from unligated ends were and documented as nonredundant repeats. The tRNAs were predicted using removed away by adding to the reaction the pretreated Dynabeads MyOne tRNAscan-SE65 (http://lowelab.ucsc.edu/tRNAscan-SE/). Other noncoding RNAs, Streptavidin T1 beads (Life Technologies, 65602). The Illumina indexed adapter for example, rRNA, snRNA and miRNA, were identified by homologous searching was added to the DNA fragments attained on the beads, which were further for sequences deposited in the Rfam database (http://rfam.xfam.org/). All the polished with NEB Klenow exo minus (NEB, M0212) and ligated with an Illumina parameters were default. indexed adapter with NEB DNA Quick ligase (NEB, M2200) at room temperature for 15 min. The HiC library was constructed by amplifying the innate one directly off of the Protein-encoding gene modeling and annotation. The genes in the genome assembly were modeled (predicted) following the philosophy described pre- T1 beads through PCR using Illumina primers and protocol, which was separated 66 from the beads, purified with AMpure XP beads and eluted from the beads into 1× viously . Such gene modeling procedure integrated the evidences from ab initio Tris buffer, yielding an in situ HiC library. The library was quantified and gene predictors based on Hidden Markov Models, spliced transcripts from RNA- seq data available currently and orthologous proteins from closely related species. sequenced on an Illumina HiSeq 4000 Platform (San Diego, CA, USA) which reads 67 150 bp paired ends of DNA fragments. Trinity (v2.4.0, default parameters) predicted and joined the potential transcripts. PASA (v2.1, https://github.com/PASApipeline/PASApipeline/wiki)68 extracted the open reading frames (ORFs) from these potential transcripts. These ORFs were HiC information aided scaling-up of the genome assembly. A read was con- directly used to optimize (train) the parameters of Augustus (v3.3, http://bioinf. sidered to be clean if it contained < 5 bases from adaptor; > 50% of bases with uni-greifswald.de/augustus/), SNAP 2006-07-28 (http://archive.broadinstitute.org/ phred quality value of < 19; and < 5% of unknown base (N). The clean reads were mpg/snap/) and GlimmerHMM (v4.33, http://ccb.jhu.edu/software/glimmerhmm/) first aligned on the genome assembly using the bowtie 2 (v2.2.3; http://bowtie-bio. but not GeneMark-ES (v4.33, http://opal.biology.gatech.edu/GeneMark/) which sourceforge.net/bowtie2/index.shtml)60. The parameters used for bowtie were as used the genome assembly with the repetitive sequences marked to obtain an initial the following: --very-sensitive -L 20 --score-min L,−0.6,−0.2 --end-to-end set of gene models69,70. Augustus was optimized for the parameters by eightfold --reorder --rg-id BMG --phred33-quals -p 5. The unmapped reads were mainly cross-validation. SNAP and GlimmerHMM was optimized for the parameters by composed of the chimeric regions spanning across the ligation junction. Following one round of training with the training set. GeneMark-ES did not require an the philosophy of the Hi-C, the ligation site of an unmapped read was detected explicit training data set. Orthologous protein sequences from Nannochloropsis with HiC-Pro (v2.7.8; https://github.com/nservant/HiC-Pro)61 using an exact gaditana20, Phaeodactylum tricornutum71, Chlamydomonas reinhardtii72, and matching procedure and aligned its 5′ fractions back onto the genome. The results Chlorella variabili73 from DOE-JGI (https://jgi.doe.gov) were mapped against the of both mapping steps were merged into a single alignment file, while the low masked genome using tblastn 2.2.28 (https://blast.ncbi.nlm.nih.gov/Blast.cgi)74 (e- mapping quality and multiple hitting reads and singletons were discarded. value = 1e−5), and then Genewise (v2.2.0, https://www.ebi.ac.uk/~birney/wise2/) The genome of N. oceanica was sequenced with the third-generation sequencing was used to obtain the evidence of spliced protein-encoding gene. All the results method. The contigs obtained cannot be scaled up to scaffolds; only one library was were integrated into a single high-confidence gene model set with EVidenceMo- constructed. In this case, contig and scaffold are identical in meaning. The valid deler (v1.1.1, http://evidencemodeler.github.io/)75 using a weight set (PROTEIN interaction pairs were used to build the interaction matrices and scale up the GeneWise 5 ABINITIO_PREDICTION SNAP 1 ABINITIO_PREDICTION primary genome assembly in contigs to chromosome-scale scaffolds (hereafter Augustus 5 ABINITIO_PREDICTION GlimmerHMM 1 ABINITIO_PREDIC- pseudochromosomes) with LACHESIS26 (https://github.com/shendurelab/ TION GeneMark.hmm 1 TRANSCRIPT assembler 10). LACHESIS). The tool first clustered the contigs into chromosomal groups with the These genes were functionally annotated by homologous searching against Swiss- agglomerative hierarchical clustering algorithm and then ordered and oriented the Prot (https://web.expasy.org/docs/swiss-prot_guideline.html), NCBI nt (https://www.

10 COMMUNICATIONS BIOLOGY | (2019) 2:249 | https://doi.org/10.1038/s42003-019-0500-9 | www.nature.com/commsbio COMMUNICATIONS BIOLOGY | https://doi.org/10.1038/s42003-019-0500-9 ARTICLE ncbi.nlm.nih.gov/nucleotide/), NCBI nr (ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/ 13. Kandilian, R., Lee, E. & Pilon, L. Radiation and optical properties of nr.gz), Pfam (http://xfam.org/), eggnog (http://eggnogdb.embl.de/), GO (http:// Nannochloropsis oculata grown under different irradiances and spectra. geneontology.org/page/go-database), and KEGG (http://www.genome.jp/kegg/). Bioresour. Technol. 137,63–73 (2013). 14. Andersen, R. A., Brett, R. W., Potter, D. & Sexton, J. P. Phylogeny of the Phylogenetic analysis. The 18S ribosomal RNA (18S rDNA), ribulose bispho- Eustigmatophyceae based upon 18s rDNA, with emphasis on – sphate carboxylase large chain (rbcL), meiosis-associated and flagellum component Nannochloropsis. Protist 149,61 74 (1998). genes and genome sequences used in this study were retrieved from GenBank 15. Bailey, J. C. & Freshwater, D. W. Molecular systematics of the Gelidiales: (https://www.ncbi.nlm.nih.gov/genbank/). A set of selected proteins or gene Inferences from separate and combined analyses of plastid rbcL and nuclear sequences was aligned with ClustalX (v2.0, http://www.clustal.org)76, and the SSU gene sequences. J. Eur. J. Phycol. 32, 343–352 (1997). phylogenetic tree of such set of sequences was constructed with MrBayes (v3.2.6, 16. Daugbjerg, N. & Andersen, R. A. A molecular phylogeny of the http://nbisweden.github.io/MrBayes/)77. The nuclear and chloroplast genomes algae based on analyses of chloroplast-encoded rbcl sequence data. J. Phycol. selected for microevolution analysis included those of C. merolae78,G. 33, 1031–1041 (1997). sulphuraia 79, C. reinhardtii72, C. variabilis 73, C. paradoxa80, E. huxleyi81, G. 17. Lubian, L. M. et al. Nannochloropsis (Eustigmatophyceae) as source of theta82, P. tricornutum71, A. anophagefferen83, and N. gaditana20 from either NCBI commercially valuable pigments. J. Appl. Phycol. 12, 249–255 (2000). (https://www.ncbi.nlm.nih.gov/) or DOE-JGI (https://jgi.doe.gov) and those of N. 18. Manning, W. M. & Starain, H. H. Chlorophyll d, a green pigment of red algae. oceanica were obtained in this study. The OrthoFinder (v2.2.6, https://github.com/ J. Biol. Chem. 151,1–19 (1943). davidemms/OrthoFinder/releases)84 generated an orthogroup set showing the 19. Carpinelli, E. C. et al. Chromosome scale genome assembly and transcriptome protein homologs among the species selected. The gene trees and species tree were profiling of Nannochloropsis gaditana in nitrogen depletion. Mol. Plant 7, inferred using the OrthoFinder workflow, including BLAST, MAFFT, and FastTree 323–335 (2014). with green algae and Glaucophytes as outgroup. From the orthogroup set of all 20. Radakovits, R. et al. Draft genome sequence and genetic transformation of the species, the orthogroups of N. oceanica, P. tricornutum, and A. anophagefferens oleaginous alga Nannochloropis gaditana. Nat. Commun. 3, 686 (2012). were selected to clarify the similarities of protein homologs among these three 21. Vieler, A. et al. Genome, functional gene annotation, and nuclear species one by one. The protein homologs in orthogroups each were clustered using transformation of the heterokont oleaginous alga Nannochloropsis oceanica 85 ClustalO (v1.2.3, http://www.clustal.org) by limiting the maximum number of CCMP1779. PLoS Genet. 8, e1003064 (2012). sequences in sub-clusters to two. In this way, the most similar protein homologs 22. Liang, C. et al. De novo sequencing and global transcriptome analysis of between N. oceanica and diatom, between N. oceanica and golden alga and between Nannochloropsis sp. (Eustigmatophyceae) following nitrogen starvation. fi diatom and golden alga can be identi ed and counted. Bioenerg. Res. 6, 494–505 (2012). 23. Wang, D. et al. Nannochloropsis genomes reveal evolution of microalgal Reporting summary. Further information on research design is available in oleaginous traits. PLoS Genet. 10, e1004094 (2014). the Nature Research Reporting Summary linked to this article. 24. Eid, J. et al. Real-time DNA sequencing from single polymerase molecules. Science 323, 133–138 (2009). 25. Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions Data availability reveals folding principles of the human genome. Science 326, 289–293 (2009). The N. oceanica genome assembly data are available through the NCBI under project 26. Burton, J. N. et al. Chromosome-scale scaffolding of de novo genome number PRJNA503776. This genome assembly has been deposited at DDBJ/ENA/ assemblies based on chromatin interactions. Nat. Biotechnol. 31, 1119–1125 GenBank under accession number GenBank CP038106-CP03813. (2013). 27. Murakami, R. & Hashimoto, H. Unusual nuclear division in Nannochloropsis Received: 22 October 2018 Accepted: 29 May 2019 oculata (Eustigmatophyceae, Heterokonta) which may ensure faithful transmission of secondary plastids. Protist 160,41–49 (2009). 28. Nishikawa, K. & Tominaga, N. Isolation, growth, ultrastructure, and metal tolerance of the green alga, Chlamydomonas acidophila (Chlorophyta). Biosci. Biotechnol. Biochem. 65, 2650–2656 (2001). 29. Waterhouse, R. M. et al. BUSCO Applications from quality assessments to – References gene prediction and phylogenomics. Mol. Biol. Evol. 35, 543 548 (2018). 30. Simão, F. A. et al. BUSCO: assessing genome assembly and annotation 1. Sukenik, A., Carmeli, Y. & Berner, T. Regulation of composition by completeness with single-copy orthologs. Bioinformatics 31, 3210–3012 irradiance level in the Nannochloropsis sp. J. Phycol. 25, (2015). 686–692 (1989). 31. Bhattacharya, D., Yoon, H. S. & Hackett, J. D. Photosynthetic eukaryotes 2. Boussiba, S. et al. Lipid and biomass production by the halotolerant microalga unite: Endosymbiosis connects the dots. BioEssays 26,50–60 (2004). Nannochloropsis salina. Biomass 12,37–47 (1987). 32. Yoon, H. S., Hackett, J. D., Pinto, G. & Bhattacharya, D. The single, ancient 3. Galloway, R. E. Selective condition and isolation of mutants in salt-tolerant, origin of chromist plastids. Proc. Natl Acad. Sci. USA 99, 15507–15512 (2002). lipid-producing microalgae. J. Phycol. 26, 752–260 (1990). 33. Stork, S. et al. Distribution of the SELMA translocon in secondary plastids of 4. Kilian, O., Benemann, C. S. E., Niyogi, K. K. & Vick, B. High-efficiency red algal origin and predicted uncoupling of ubiquitin-dependent homologous recombination in the oil-producing alga Nannochloropsis sp. translocation from degradation. Eukaryot. Cell 11, 1472–1481 (2012). Proc. Natl Acad. Sci. USA 108, 21265–21269 (2011). 34. Hirakawa, Y., Burki, F. & Keeling, P. J. Genome-based reconstruction of the 5. Pan, K. H. et al. Nuclear monoploidy and asexual propagation of protein import machinery in the secondary plastid of a Chlorarachniophyte Nannochloropsis oceanica (Eustigmatophyceae) as revealed by its genome alga. Eukaryot. Cell 11, 324–333 (2012). sequence. J. Phycol. 47, 1425–1432 (2011). 35. Jang, S. H., Jeong, H. J., Chon, J. K. & Lee, S. Y. De novo assembly and 6. Verruto, J. et al. Unrestrained markerless trait stacking in Nannochloropsis characterization of the transcriptome of the newly described dinoflagellate gaditana through combined genome editing and marker recycling Ansanella granifera: Spotlight on flagellum-associated genes. Mar. Genom. 33, technologies. Proc. Natl Acad. Sci. USA 115, E7015–E7022 (2018). 47–55 (2017). 7. Wang, Q. et al. Genome editing of model oleaginous microalgae 36. Charlesworth, D. Plant sex chromosome evolution. J. Exp. Bot. 64, 405–420 Nannochloropsis sp. by CRISPR/Cas9. Plant J. 88, 1071–1081 (2016). (2013). 8. Weeks, D. P. Homologous recombination in Nannochloropsis: a powerful tool 37. Charlesworth, D. Plant sex chromosomes. Annu. Rev. Plant Biol. 67, 397–420 in an industrially relevant alga. Proc. Natl Acad. Sci. USA 108, 20859–20860 (2016). (2011). 38. Loidl, J. Conservation and variability of meiosis across the eukaryotes. Annu. 9. Gee, C. W. & Niyogi, K. K. The carbonic anhydrase CAH1 is an essential Rev. Genet. 50, 293–316 (2016). component of the carbon-concentrating mechanism in Nannochloropsis 39. Bachtrog, D. et al. Are all sex chromosomes created equal? Trends Genet. 27, oceanica. Proc. Natl Acad. Sci. USA 114, 4537–4542 (2017). 350–357 (2011). 10. Hibberd, D. J. Notes on the and nomenclature of the algal classes 40. Merlini, L., Dudin, O. & Martin, S. G. Mate and fuse: how yeast cells do it. Eustigmatophyceae and Tribophyceae (synonym Xanthophyceae). Bot. J. Open Biol. 3, 130008 (2013). Linn. Soc. 82,93–119 (1981). 41. Russo, M. T. et al. MRP3 is a sex determining gene in the diatom Pseudo- 11. Fawley, K. P. & Fawley, M. W. Observations on the diversity and of nitzschia multistriata. Nat. Commun. 9, 5050 (2018). freshwater Nannochloropsis (Eustigmatophyceae), with descriptions of new 42. Malik, S. B. et al. An expanded inventory of conserved meiotic genes provides taxa. Protist 158, 325–336 (2007). evidence for sex in Trichomonas vaginalis. PLoS ONE 3, e2879 (2008). 12. Fawley, M. W., Jameson, I. & Fawley, K. P. The phylogeny of the genus 43. Schurko, A. M. & Logsdon, J. M. Jr. Using a meiosis detection toolkit to Nannochloropsis (Monodopsidaceae, Eustigmatophyceae), with descriptions of investigate ancient asexual ‘scandals’ and the evolution of sex. BioEssays 30, N. australis sp. nov. and Microchloropsis gen. nov. Phycologia 54, 545–552 579–589 (2008). (2015).

COMMUNICATIONS BIOLOGY | (2019) 2:249 | https://doi.org/10.1038/s42003-019-0500-9 | www.nature.com/commsbio 11 ARTICLE COMMUNICATIONS BIOLOGY | https://doi.org/10.1038/s42003-019-0500-9

44. Cavalier-Smith, T. The Chromista, origin and systematics. In 75. Haas, B. J. et al. Automated eukaryotic gene structure annotation using Progress in Phycological Research Vol. 4 (eds Round, F. E. & Chapman, D. J.) EVidenceModeler and the program to assemble spliced alignments. Genome 309–347 (Elsevier, Amsterdam, Netherlands, 1986). Biol. 9, R7 (2008). 45. Gould, S. B., Maier, U.-G. & Martin, W. F. Protein import and the origin of 76. Larkin, M. A. et al. Clustal W and Clustal X version 2.0. Bioinformatics 23, red complex plastids. Curr. Biol. 25, R515–R521 (2015). 2947–2948 (2007). 46. Agrawal, S. & Striepen, B. More membranes, more proteins: complex protein 77. Ronquist, F. et al. MRBAYES 3.2: efficient Bayesian phylogenetic inference import mechanisms into secondary plastids. Protist 161, 672–687 (2010). and model selection across a large model space. Syst. Biol. 61, 539–542 (2012). 47. Gentil, J. et al. Origin of complex algae by secondary endosymbiosis: a journey 78. Nozaki, H. et al. A 100%-complete sequence reveals unusually simple genomic through time. Protoplasma 254, 1835–1843 (2017). features in the hot-spring red alga Cyanidioschyzon merolae. BMC Biol. 5,28 48. Gross, J. & Bhattacharya, D. Revaluating the evolution of the Toc and Tic (2007). protein translocons. Trends Plant Sci. 14,13–20 (2008). 79. Schonknecht, G. et al. Gene transfer from and archaea facilitated 49. Maier, U. G., Zauner, S. & Hempel, F. Protein import into complex plastids: evolution of an extremophilic . Science 339, 1207–1210 (2013). cellular organization of higher complexity. Eur. J. Cell Biol. 94, 340–348 80. Price, D. C. et al. Cyanophora paradoxa genome elucidates origin of (2015). in algae and plants. Science 335, 843–847 (2012). 50. Sommer, M. S. et al. Der1-mediated preprotein import into the periplastid 81. Read, B. A. et al. Pan genome of the Emiliania underpins its compartment of Chromalveolates? Mol. Biol. Evol. 24, 918–928 (2007). global distribution. Nature 499, 209–213 (2013). 51. Bullmann, L. et al. Filling the gap, evolutionarily conserved Omp85 in plastids 82. Curtis, B. A. et al. Algal genomes reveal evolutionary mosaicism and the fate of Chromalveolates. J. Biol. Chem. 285, 6848–6856 (2010). of nucleomorphs. Nature 492,59–65 (2012). 52. Tonkin, C. J., Kalanon, M. & McFadden, G. I. Protein targeting to the malaria 83. Gobler, C. J. et al. Niche of harmful alga Aureococcus anophagefferens revealed parasite plastid. Traffic 9, 166–175 (2008). through ecogenomics. Proc. Natl Acad. Sci. USA 108, 4352–4357 (2011). 53. Guo, L. & Yang, G. Predicting the reproduction strategies of several 84. Emms, D. M. & Kelly, S. OrthoFinder: solving fundamental biases in whole microalgae through their genome sequences. J. Ocean Univ. China 13,1–12 genome comparisons dramatically improves orthogroup inference accuracy. (2014). Genome Biol. 16, 157 (2015). 54. Guillard, R. R. L. & Ryther, J. H. Studies of marine planktonic diatoms. I. 85. Sievers, F. et al. Fast, scalable generation of high-quality protein multiple Cyclotella nana Hustedt and Detonula confervacea Cleve. Can. J. Microbiol. 8, sequence alignments using Clustal Omega. Mol. Syst. Biol. 7, 539 (2011). 229–239 (1962). 55. Couch, J. A. & Fritz, P. J. Isolation of DNA from plants high in polyphenolics. Plant Mol. Biol. Rep. 8,8–12 (1990). Acknowledgements 56. Koren, S. et al. Canu: Scalable and accurate long-read assembly via adaptive k- This study was financially supported by National Key R&D Program of China mer weighting and repeat separation. Genome Res. 27, 722–736 (2017). (2018YFD0900305 and 2018YFD0901506), The Marine S & T Fund of Shandong Pro- 57. Chaisson, M. J. & Tesler, G. Mapping single molecule sequencing reads using vince for Pilot National Laboratory for Marine Science and Technology (Qingdao) (2018 basic local alignment with successive refinement (BLASR): application and SDKJ0406-3), and The Fundamental Research Funds for The Central Universities theory. BMC Bioinformatics 13, 238 (2012). (201762017). The authors sincerely thank Weifei Yang, Yezhen Chen, Guori Gao, 58. Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial Sanyang Liu, Tao Yan, Ning Wang, Yanfang Gu, Yunlong Zhang, Ying Xin, Yuanyuan variant detection and genome assembly improvement. PLoS ONE 9, e112963 Li, and Jiao Zhang of ANOROAD GENOME (Beijing, China) for their assistances in (2014). sequencing the genome and suggestions and assistances in data processing. 59. Simão, F. A. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 Author contributions (2015). L.G. carried out culture and collected the algae, processed the genome data, and drafted 60. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. the paper. As the corresponding author, G.Y. contributed to designing and coordinating Methods 9, 357 (2012). of the project, and writing the paper; J.X. and K.P., two professors who work on N. 61. Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data oceanica, discussed the topics described in this paper and provided their assistance in processing. Genome Biol. 16, 259 (2015). paper writing and data processing. X.R. and S.P. contributed to the SMRT sequencing 62. Corteggiani Carpinelli, E. et al. Chromosome scale genome assembly and and data processing. Other authors including S.L., Z.Z., H.L., and S.W. contributed to the transcriptome profiling of Nannochloropsis gaditana in nitrogen depletion. microalgal culture, data processing among others. Mol. Plant 7, 323–335 (2014). 63. Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinformatics 25, Additional information 4.10.1–4.10.14 (2009). Supplementary information accompanies this paper at https://doi.org/10.1038/s42003- 64. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. 019-0500-9. Nucleic Acids Res. 27, 573–580 (1999). 65. Lowe, T. M. & Chan, P. P. tRNAscan-SE On-line: search and contextual Competing interests: The authors declare no competing interests. analysis of transfer RNA genes. Nucleic Acids Res. 44, W54–W57 (2016). 66. Salojärvi, J. et al. Genome sequencing and population genomic analyses Reprints and permission information is available online at http://npg.nature.com/ provide insights into the adaptive landscape of silver birch. Nat. Genet. 49, reprintsandpermissions/ 904–912 (2017). 67. Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-seq data Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in without a reference genome. Nat. Biotechnol. 29, 644–652 (2011). (7). published maps and institutional affiliations. 68. Haas, B. J. et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 31, 5654–5666 (2003). Open Access This article is licensed under a Creative Commons 69. Majoros, W. H. et al. TigrScan and GlimmerHMM: two open source ab initio Attribution 4.0 International License, which permits use, sharing, eukaryotic gene-finders. Bioinformatics 20, 2878–2879 (2004). adaptation, distribution and reproduction in any medium or format, as long as you give 70. Stanke, M. et al. AUGUSTUS: a web server for gene finding in eukaryotes. appropriate credit to the original author(s) and the source, provide a link to the Creative Nucleic Acids Res. 32, W309–W312 (2004). Commons license, and indicate if changes were made. The images or other third party 71. Bowler, C. et al. The Phaeodactylum genome reveals the evolutionary history material in this article are included in the article’s Creative Commons license, unless of diatom genomes. Nature 456, 239–244 (2008). indicated otherwise in a credit line to the material. If material is not included in the 72. Merchant, S. S. et al. The Chlamydomonas genome reveals the evolution of article’s Creative Commons license and your intended use is not permitted by statutory – key animal and plant functions. Science 318, 245 250 (2007). regulation or exceeds the permitted use, you will need to obtain permission directly from 73. Blanc, G. et al. The Chlorella variabilis NC64A genome reveals adaptation to the copyright holder. To view a copy of this license, visit http://creativecommons.org/ photosymbiosis, coevolution with viruses, and cryptic sex. Plant Cell 22, licenses/by/4.0/. 2943–2955 (2010). 74. McGinnis, S. & Madden, T. L. BLAST: at the core of a powerful and diverse set of sequence analysis tools. Nucleic Acids Res. 32, W20–W25 (2004). © The Author(s) 2019

12 COMMUNICATIONS BIOLOGY | (2019) 2:249 | https://doi.org/10.1038/s42003-019-0500-9 | www.nature.com/commsbio