The genome of Prasinoderma coloniale unveils the existence of a third phylum within green

Li, Linzhou; Wang, Sibo; Wang, Hongli; Sahu, Sunil Kumar; Marin, Birger; Li, Haoyuan; Xu, Yan; Liang, Hongping; Li, Zhen; Cheng, Shifeng; Reder, Tanja; Çebi, Zehra; Wittek, Sebastian; Petersen, Morten; Melkonian, Barbara; Du, Hongli; Yang, Huanming; Wang, Jian; Wong, Gane Ka-Shu; Xu, Xun; Liu, Xin; Van de Peer, Yves; Melkonian, Michael; Liu, Huan Published in: Nature Ecology and Evolution

DOI: 10.1038/s41559-020-1221-7

Publication date: 2020

Document version Publisher's PDF, also known as Version of record

Document license: CC BY

Citation for published version (APA): Li, L., Wang, S., Wang, H., Sahu, S. K., Marin, B., Li, H., ... Liu, H. (2020). The genome of Prasinoderma coloniale unveils the existence of a third phylum within green plants. Nature Ecology and Evolution, 4(9), 1220- 1231. https://doi.org/10.1038/s41559-020-1221-7

Download date: 10. Sep. 2020 Articles https://doi.org/10.1038/s41559-020-1221-7

The genome of Prasinoderma coloniale unveils the existence of a third phylum within green plants

Linzhou Li1,2,13, Sibo Wang1,3,13, Hongli Wang1,4, Sunil Kumar Sahu 1, Birger Marin 5, Haoyuan Li1, Yan Xu1,4, Hongping Liang1,4, Zhen Li 6, Shifeng Cheng1, Tanja Reder5, Zehra Çebi5, Sebastian Wittek5, Morten Petersen3, Barbara Melkonian5,7, Hongli Du8, Huanming Yang1, Jian Wang1, Gane Ka-Shu Wong 1,9, Xun Xu 1,10, Xin Liu 1, Yves Van de Peer 6,11,12 ✉ , Michael Melkonian5,7 ✉ and Huan Liu 1,3 ✉

Genome analysis of the pico-eukaryotic marine green alga Prasinoderma coloniale CCMP 1413 unveils the existence of a novel phylum within green plants (), the Prasinodermophyta, which diverged before the split of and . Structural features of the genome and gene family comparisons revealed an intermediate position of the P. colo- niale genome (25.3 Mb) between the extremely compact, small genomes of picoplanktonic (Chlorophyta) and the larger, more complex genomes of early-diverging streptophyte algae. Reconstruction of the minimal core genome of Viridiplantae allowed identification of an ancestral toolkit of transcription factors and flagellar proteins. Adaptations of P. coloniale to its deep-water, oligotrophic environment involved expansion of light-harvesting proteins, reduction of early light-induced proteins, evolution of a distinct type of C4 photosynthesis and carbon-concentrating mechanism, synthesis of the metal-complexing metabolite picolinic acid, and vitamin B1, B7 and B12 auxotrophy. The P. coloniale genome provides first insights into the dawn of green evolution.

ne of the most important biological events in the history later studies20. M. viride is now recognized as an early-diverging of life was the successful colonization of the terrestrial member of the Streptophyta21,22. While the majority of the Olandscape by green plants (Viridiplantae) that paved the early-diverging lineages in the Chlorophyta consisted of (mostly way for terrestrial animal evolution, altering geomorphology marine) scaly flagellates, some lineages were represented by very and changes in the Earth’s climate1–3. The Viridiplantae comprise small, non-flagellate unicells often surrounded by cell walls23,24. perhaps 500,000 , ranging from the smallest to the largest One of these lineages, provisionally termed ‘Prasinococcales’23 eukaryotes4,5. Divergence time estimates from molecular data sug- (clade VI), could not be reliably positioned in phylogenetic gest that Viridiplantae may be close to 1 billion years old6,7. All trees24,25. A major step forward was made when it was discovered extant green plants are classified in either of two divisions/phyla, that an enigmatic, non-cultured group of deep-water, oceanic Chlorophyta and Streptophyta, which differ structurally, bio- macroscopic algae of palmelloid organization comprising the chemically and molecularly8–12. The Streptophyta contain the land genera Verdigellas and Palmophyllum formed a deeply diverg- plants () and a paraphyletic assemblage of algae ing lineage of Viridiplantae that included the Prasinococcales26. known as the streptophyte algae, whereas all other green algae Later, the class was established for these comprise the Chlorophyta. The reconstruction of phylogenetic organisms as the first divergence in Chlorophyta, that is sister to relationships across green plants using transcriptomic or genomic all other Chlorophyta27. Phylogenies based on nuclear-encoded data provided evidence that unicellular, often scaly, flagellate ribosomal RNA genes (4,579 positions), however, placed organisms were positioned near the base of the radiation in both Palmophyllophyceae as the earliest divergence in Viridiplantae, phyla13–16, corroborating earlier proposals based on ultrastruc- but monophyly of Chlorophyta + Streptophyta to the exclusion of tural analyses that the common ancestor of all green plants may Palmophyllophyceae, received no support in these analyses27. have been a scaly flagellate17,18. The search for an extant relative To date, genomic resources for the Palmophyllophyceae have of such a flagellate, however, has been in vain, although an initial been limited to organelle genomes. Here we present the first report suggested that viride diverged before the split nuclear genome sequence of a unicellular member of this lineage, of Chlorophyta and Streptophyta19, a result not corroborated by Prasinoderma coloniale (Fig. 1a). Based on phylogenomic analyses,

1State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen, China. 2Department of Biotechnology and Biomedicine, Technical University of Denmark, Lyngby, Denmark. 3Department of Biology, University of Copenhagen, Copenhagen, Denmark. 4BGI Education Center, University of Chinese Academy of Sciences, Shenzhen, China. 5Institute for Plant Sciences, Department of Biological Sciences, University of Cologne, Cologne, Germany. 6Department of Plant Biotechnology and Bioinformatics (Ghent University) and Center for Plant Systems Biology, Ghent, Belgium. 7Central Collection of Algal Cultures, Faculty of Biology, University of Duisburg-Essen, Essen, Germany. 8School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China. 9Department of Biological Sciences and Department of Medicine, University of Alberta, Edmonton, Alberta, Canada. 10Guangdong Provincial Key Laboratory of Genome Read and Write, BGI-Shenzhen, Shenzhen, China. 11College of Horticulture, Nanjing Agricultural University, Nanjing, China. 12Centre for Microbial Ecology and Genomics, Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Pretoria, South Africa. 13These authors contributed equally: Linzhou Li, Sibo Wang. ✉e-mail: [email protected]; [email protected]; [email protected]

1220 Nature Ecology & Evolution | VOL 4 | September 2020 | 1220–1231 | www.nature.com/natecolevol Nature Ecology & Evolution Articles

a c

Arabidopsis thaliana 0.05 91/57/1 Pinus taeda 69/-/- Angiopteris lygodiifolia/fokiensis Embryophyta Equisetum arvense/diffusum Huperzia selago/lucidula Tracheophytes Physcomitrella patens Polytrichum juniperinum Diphyscium fulvifolium 95/58/1 Takakia lepidozioides Marchantia polymorpha Treubia lacunosa Liverworts Nardia compressa Nothoceros vincentianus Phaeoceros sp. Cosmarium ochthodes 97/100/1 Penium margaritaceum Netrium digitus 57/–/– Mougeotia sp. 56/100/- 68/63/1 Mesotaenium caldariorum Cylindrocystis cushleckae Cylindrocystis brebissonii Mesotaenium endlicherianum 94/ Chaetosphaeridium globosum 76/1 Coleochaete scutata Coleochaetophyceae Nitella hyalina Chara braunii Klebsormidium nitens Entransia fimbriata Klebsormidiophyceae condensata atmophyticus Streptophyta Mesostigma viride Mesostigmatophyceae

Ulva mutabilis 10 µm 95/89/1 Desmochloris halophila 100/93/1 Ignatius tetrasporus Chlorophyta Oltmannsiellopsis sp. carteri 81/81/1 94/86/1 sp. 97/96/1 unicocca Chlamydomonas reinhardtii Spermatozopsis exsultans –/70/– Hydrodictyonre ticulatum b Chromochloris zofingiensis Stigeoclonium helveticum 89/99/1 Tetraselmis cordiformis Arabidopsis thaliana 94/98/1 Tetraselmis striata Chlorodendrophyceae 100/1 Scherffelia dubia Stichococcus bacillaris 100/1 Chara braunii Koliella longiseta Streptophyta subellipsoidea 100/1 Klebsormidium nitens Botryococcus braunii Auxenochlorella protothecoides variabilis

Chlorokybus atmophyticus 85/98/1 Chlorella vulgaris 100/1 minor Mesostigma viride Marsupiomonas pelliculata Pedinophyceae 94/96/1 Chloropicon primus Chloropicophyceae/Pseudoscourfieldiales 82/99/1 provasolii 0.1 100/1 Volvox carteri Nephroselmis pyriformis Nephroselmis olivacea Nephroselmidophyceae Gonium pectorale Micromonas bravo 56/83/0.98 100/1 Micromonas commoda Mamiellophyceae 100/1 Chlamydomonas reinhardtii Mamiella gilva Ostrecoccus tauri 96/98/1 100/1 100/1 95/91/1 Ostreococcus lucimarinus Chromochloris zofingiensis Bathycoccus prasinos Dolichomastix tenuilepis Ulva mutabilis 75/55/1 Crustomastix stigmatica Monomastix opisthostigma 100/1 Monomastix sp. Auxenochlorella protothecoides 96/98/1 Pyramimona s parkeae 100/1 Pyramimona s olivacea Chlorella variabilis Chlorophyta Pyramimona s tetrarhynchus 100/1 Cymbomona s tetramitiformis Viridiplantae Coccomyxa subellipsoidea Ostreococcus tauri Verdigellas peltata Palmophyllo- 82/100/1 Palmophyllum crassum 100/1 phyceae Prasinodermo- 100/1 100/1 Ostreococcus lucimarinus capsulatus Prasinoderma coloniale phyta Bathycoccus prasinos Prasinoderma singularis Prasinodermo- 100/1 Prasinoderma sp. MBIC10622 phyceae 100/1 Micromonas pusilla 100/1 100/1 Micromonas commoda Cyanophora paradoxa 54/100/1 Glaucocystis nostochinearum Monomastix opisthostigma (1KP) Cyanoptyche gloeocystis Glaucoplantae 62/1 Gloeochaete wittrockiana 100/1 Dolichomastix tenuilepis (1KP) Rhodogorgon sp. 100/1 Sporolithon durum parkeae (1KP) Porolithon onkodes Helminthocladia australis 100/1 sp. (1KP) Viridiplantae Palmaria palmata Ahnfeltia plicata Florideophyceae 66/56/1 79/96/1 Chondrus crispus Prasinoderma coloniale 82/–/1 Gracilariopsis chorda Schizymenia dubyi 100/1 Asparagopsis taxiformis Prasinoderma singularis (MMETSP) Porphyra umbilicalis 100/1 97/99/1 Porphyra purpurea Prasinodermophyta Bangia fuscopurpurea Prasinococcus capsulatus (1KP) 67/–/1 Pyropia yezoensis Rhodoplantae Madagascaria erythrocladiodes Cyanophora paradoxa Glaucoplantae Rhodochaete parvula/pulchella

92/93/1 Compsopogon caeruleus 83/96/1 Erythrolobus australicus Porphyra umbilicalis Timspurckia oligopyrenoides 81/73/1 Porphyridium aerugineum 100/1 Rhodosorus marinus UTEX2760 Chondrus crispus Rhodoplantae Rhodosorus marinus CCMP769 100/1 cf.Bangiopsis sp. CCMP1999 75/73/1 Rhodella maculata Cyanidioschyzon merolae Galdieria sulphuraria 57/–/0.96 Cyanidioschyzon merolae

Fig. 1 | Phylogenetic analysis of P. coloniale. a, Light micrograph of P. coloniale. b, The phylogenetic tree was constructed using the maximum-likelihood method in RAxML and MrBayes based on a concatenated sequence alignment of 256 single-copy genes (500 bootstraps). c, The basal divergence of the new phylum Prasinodermophyta, as revealed by analyses of complete nuclear- and plastid-encoded rRNA operons from 109 . The rRNA dataset comprised 8,818 aligned positions and contained representatives of all major lineages of Rhodoplantae (seven classes), Glaucoplantae (four genera) and Viridiplantae (three divisions with several classes) including embryophytes. Shown is the RAxML phylogeny (GTRGAMMA model); the three support values at branches are RAxML/IQ-TREE bootstrap percentages/Bayesian posterior probabilities. Bold branches received maximal support (100/100/1).

we establish a new phylum for this group, the Prasinodermophyta, could be mapped to the assembled genome. P. coloniale has a GC with two classes, as the earliest divergence of the Viridiplantae. The content of 69.8%, while 6.51% of the genome consists of repeats genome of P. coloniale provided new insights into pico-eukaryotic (Supplementary Fig. 3, Supplementary Table 3 and Extended Data biology near the dawn of green plant evolution. Fig. 1). A total of 7,139 protein-coding genes were annotated, of which 6,996 were supported by the transcriptome. Additionally, Results and discussion 6,759 (94.7%) genes were annotated from known protein databases Genome sequencing and characteristics. The genome size of (Supplementary Table 4). P. coloniale was estimated to be about 26.04 Mb. After reads fil- tering (7.4 Gb PacBio data) and self-correction, a 25.3 Mb Phylogenetic analyses and Prasinodermophyta div. nov. Phylo­ genome was de novo assembled consisting of 22 chromosomes, genetic analyses of P. coloniale were performed with two different including the complete chloroplast and mitochondrial genomes taxon and datasets. (1) Both maximum-likelihood and Bayesian (Supplementary Figs. 1 and 2). The sizes of the individual chromo- trees were constructed from an alignment of 256 orthologues of somes varied from 0.45 to 3.60 Mb. BUSCO analysis showed a high single-copy nuclear genes from 28 taxa of Archaeplastida, showing degree of completeness of the genome, with 282 out of 303 (93.1%) that P. coloniale (and the related Prasinococcus capsulatus) diverged complete eukaryotic universal genes (Supplementary Table 1). before the split of Streptophyta and Chlorophyta (Fig. 1b). All inter- Additionally, 99.38% (Supplementary Table 2) of the transcriptome nal branches in the tree received maximal/nearly maximal support,

Nature Ecology & Evolution | VOL 4 | September 2020 | 1220–1231 | www.nature.com/natecolevol 1221 Articles Nature Ecology & Evolution and the monophyly of Streptophyta + Chlorophyta to the exclusion were equally distributed among Streptophyta- and Chlorophyta- of P. coloniale and P. cap su l atu s received 88% bootstrap and 1.0 specific genes (Fig. 2b). Principal component analysis (PCA) posterior probability support. A phylogenetic tree constructed from showed that early-diverging Chlorophyta (Mamiellophyceae), 31 mitochondrial genes of 19 taxa of Archaeplastida also revealed streptophyte algae (Mesostigmatophyceae, Klebsormidiophyceae P. coloniale as the earliest divergence in Viridiplantae (Supplemen­ and Charophyceae), Glaucoplantae and Rhodoplantae form four tary Fig. 4). (2) We increased the taxon sampling to 109 taxa of separate clusters with P. coloniale in an isolated position, which also Archaeplastida, including six sequences of Palmophyllophyceae27 further supports its classification as a new and independent clade— comprising nuclear- and plastid-encoded rRNA operons. The that is, the Prasinodermophyta div. nov. (Fig. 2c) phylogeny corroborated the multi-protein phylogeny because Furthermore, a comparative analysis on structural genomic Palmophyllophyceae again diverged before the split of Chlorophyta features showed a trend of gradually increasing average intron and Streptophyta (Fig. 1c). Separate phylogenies of nuclear- and length and decreasing average exon length from P. coloniale plastid-encoded rRNA operons gave congruent results, although to early-diverging streptophytes, and the opposite trend from support values were generally lower (Supplementary Figs. 5 and 6). P. coloniale to early-diverging Chlorophyta was observed (Fig. 2d). The summary coalescent method ASTRAL gave inconclusive In addition, the genome size, gene size, gene spacing distance results (Supplementary Table 5 and Supplementary Figs. 7 and 8), and total and average exon numbers exhibited a similar pat- and taxon sampling was sensitive to long-branch attraction28 tern with early-diverging Chlorophyta (Extended Data Fig. 3 and (Supplementary Fig. 9 and Extended Data Fig. 2). The former tree is Supplementary Table 7). However, the P. coloniale genome contains corroborated by a recent phylotranscriptomic analysis of 1,090 viri- 41% coding sequences, higher than the early-diverging strepto- diplant species in which the placement of three Palmophyllophyceae phytes but considerably lower than early-diverging Chlorophyta. In was unstable in ASTRAL trees but resolved as the basal divergence summary, the structural characteristics of the P. coloniale genome of Viridiplantae in concatenated trees29. Previous plastome phylog- revealed its intermediate position between the extremely compact enies placed Palmophyllophyceae either as the earliest divergence and small genomes of picoplanktonic early-diverging Chlorophyta34 within Chlorophyta, sister to all other Chlorophyta27, or in an unre- and the larger and structurally more complex genomes of solved position among Chlorophyta15. Plastome phylogenies are early-diverging streptophytes35,36. limited by the dataset (70–80 plastid-encoded genes) but also suf- fer from introgression of the plastid from one species to another, Analysis of transcription factors in P. coloniale. In total, 55 of recombination and gene conversion, as well as differential selective 114 types of TF/TR genes were identified in the P. coloniale genome pressures acting on protein-coding plastid genes, which may also (Supplementary Table 8). Although all 55 types of transcription introduce biases and lead to incongruent gene and species trees30–33. factor/transcription regulator (TF/TR) genes of P. coloniale were For example, unlike nuclear trees, some studies have failed to recover also found in Chlorophyta and early-diverging streptophyte algae, Ulvophyceae, Trebouxiophyceae and Pedinophyceae as mono- considerably lower numbers of TF/TR genes (201) were identi- phyletic groups27 or Mesostigmatophyceae within Streptophyta15. fied in P. coloniale compared to Chlorophyta and Streptophyta Based on their phylogenetic positions (Fig. 1b,c, Supplementary (Supplementary Table 9). Figs. 4–6 and Extended Data Fig. 2), gene family comparisons and Among the 55 types of TF/TR genes, the majority (50) are also molecular synapomorphies, we here propose a new division/phy- present in Glaucoplantae and/or Rhodoplantae, suggesting that lum for the Palmophyllophyceae sensu27, the Prasinodermophyta these constituted the basic TF/TR toolbox in the common ances- div. nov. with two classes, Prasinodermophyceae class. nov. and tor of Archaeplastida. However, five TF/TR types of P. coloniale Palmophyllophyceae emend (Supplementary Data 1). (C2C2-Dof, WRKY, SBP, GARP_ARR-B and TAZ) were presumably gained in the common ancestor of the Viridiplantae since they are Comparison of gene families among Archaeplastida. The phy- absent in both Glaucoplantae and Rhodoplantae (Supplementary logenetic placement of Prasinodermophyta as a sister group to all Table 8). WRKY proteins are key regulators of development, car- other Viridiplantae provided a unique opportunity to reconstruct bohydrate synthesis, senescence and responses to biotic and abiotic the minimum core genome of Viridiplantae, and to compare the stresses in embryophytes37. Using newly retrieved WRKY sequences, genome of P. coloniale to those of early-diverging Streptophyta, we confirmed the presence of eight well-supported WRKY domain Chlorophyta and the Glaucoplantae, to identify plesiomorphic and subgroups in Viridiplantae (Extended Data Fig. 4). The number of apomorphic traits. In total, 4,052 orthogroups are shared among gene copies with WRKY domains and the divergent sequences of Chlorophyta and Streptophyta, of which 3,292 are also shared the N-terminal WRKY domains in P. coloniale may be related to with P. coloniale. If the orthogroups shared uniquely by P. coloniale its picoplanktonic lifestyle and/or low-light environment (the pico- with either Micromonas commoda (621) or Chlorokybus atmoph- planktonic Mamiellophyceae generally also display more than one yticus (179) are added, 4,092 orthogroups represent the minimal WRKY gene copy; Supplementary Table 8). core genome of Viridiplantae (Fig. 2a). A total of 1,356 unique The type-B phospho-accepting response regulator (GARP_ orthogroups were found in P. coloniale, mainly involved in bio- ARR-B) family modulates plastid biogenesis, circadian clock oscil- logical process categories such as photosynthesis-antenna proteins, lation, cytokinin signalling and control of the phosphate starvation plant–pathogen interaction and plant hormone signal transduc- response in plants38. Since many genes of the cytokinin biosynthesis tion (Supplementary Table 6). Thus, it is reasonable to expect that and signalling pathways are lacking in P. coloniale (Supplementary these unique biological traits reflect adaptations of P. coloniale to its Table 10), these response regulators may be involved in other func- deep-water/low-light, oligotrophic habitat. tions. Finally, the evolution of the SQUAMOSA promotor-binding protein (SBP)-box TF was previously suggested to predate the split Comparative genomics of P. coloniale with early-diverging of Streptophyta and Chlorophyta39. SBP-box TFs have diverse spe- Viridiplantae. About 38.5% of the P. coloniale genes gave best hits cialized functions in embryophytes, but in green algae they may with Chlorophyta, while a similar percentage (33.9%) gave best hits be involved in more basic functions such as regulation of trace with Streptophyta, supporting an equidistant relationship between metal homeostasis40. The C2C2-Dof (DNA binding with one fin- P. coloniale and Streptophyta and Chlorophyta (Supplementary ger) TFs have been implicated in light control of zygote germina- Fig. 10). P. coloniale, along with some representative early-diverging tion in Chlamydomonas reinhardtii41 and apparently originated Viridiplantae, showed a very similar percentage of Viridiplantae also in the common ancestor of Viridiplantae. Besides the five TF/ genes (commonly shared). The remaining proteins of P. coloniale TR gains, expansion in gene copy numbers was observed in only

1222 Nature Ecology & Evolution | VOL 4 | September 2020 | 1220–1231 | www.nature.com/natecolevol Nature Ecology & Evolution Articles

a c C. atmophyticus M. commoda O. tauri Early Streptophyta O. lucimarinus M. commoda Early Chlorophyta M. pusilla B. prasinos P. coloniale 3,290 3,728 Rhodoplantae 0.2 (3,318) (3,730) Glaucoplantae 320 432 (720) 621 (1,162) (1,385) P. coloniale 0 440 682 19,205 (1,733) (2,336) 1,356 C. atmophyticus (19,315) 2,610 (1,360) K. nitens

PC2 (17% variance) C. merolae M. endlicherianum 226 (13,647) 179 0.2 M. viride C. crispus (594) 371 245 (393) (1,422) (940) C. braunii P. umbilicalis C. paradoxa 113 P. coloniale (318) C. paradoxa 0.2 0 0.2 0.4 PC1 (25% variance)

b d Exon O. tauri O. tauri Intron O. lucimarinus O. lucimarinus

B. prasinos B. prasinos

M. commoda M. commoda Early Chlorophyta Early Chlorophyta M. pusilla M. pusilla

P. coloniale P. coloniale

C. atmophyticus C. atmophyticus

M. viride M. viride

K. nitens K. nitens

M. endlicherianum M. endlicherianum Early Streptophyta Early Streptophyta C .braunii C .braunii

0 25 50 75 100 2.5 5.0 7.5 10.0 12.5 Percentage of proteins Ln(exon/intron length) Both Early Streptophyta Early Chlorophyta

Fig. 2 | Comparative analysis of P. coloniale and other Chlorophyta. a, Venn diagram showing unique and shared orthogroups among P. coloniale, C. atmophyticus, M. commoda and C. paradoxa. Gene numbers are given in parentheses. b, Percentages of proteins found among Viridiplantae (red), Chlorophyta-specific (blue) and Streptophyta-specific (green) based on the classification given in OrthoFinder. Species abbreviations are listed in Supplementary Table 32. c, PCA of the type and number of Pfam domains of all genes across the Viridiplantae. d, Box-and-whisker plots depicting distributions of the lengths of exons and introns in selected Viridiplantae. one TF (Jumonji_Other) in the common ancestor of Viridiplantae light-harvesting pigments44–47. We identified 41 LHC and LHC-like when compared with Glaucoplantae and Rhodoplantae. Plant JmjC proteins of P. coloniale (Supplementary Table 11). domain-containing proteins have important functions in both his- Phylogenetic analysis of LHC proteins from P. coloniale showed tone modification42 and regulation of development and environ- them to be widely distributed in seven of the ten LHC clades, namely mental responses43. LHCA, LHCB, LHCX, PSBS, OHP, Ferrochelatase and ELIPs (Fig. 3). In contrast to gains and expansion of TFs/TRs, P. coloniale The P. coloniale genome has 19 Lhcb genes (six of which appar- also exhibited loss of six TFs/TRs (C2H2, C3H, CCAAT_HAP2, ently originated from three successive gene duplications in the MADS_MIKC, MBF1 and Zinc Finger MIZ type) that are present Prasinoderma lineage). P. coloniale also displayed nine Lhca genes, in Chlorophyta, Streptophyta, Glaucoplantae and Rhodoplantae. whereas in most of the investigated early-diverging Chlorophyta/ Mapping of TFs/TRs on the phylogeny (Fig. 1b) also allowed Mamiellophyceae and in Mesostigma (Streptophyta) there are only tentative conclusions about gains of TFs/TRs in the common six Lhca genes (Supplementary Table 11). As in the early-diverging ancestor of Chlorophyta + Streptophyta (five: ABI3/VP1, Dicer, Mamiellophyceae, P. coloniale displayed two LHCX proteins. There HD_DDT, Pseudo ARR-B and Whirly) and the common ancestor of are three helix proteins in P. coloniale, as in Cyanophora paradoxa. Streptophyta (seven: HD-ZIP_I_II, HD-ZIP_III, HD-PLINC, GRF, Other types of LHC-like proteins, such as RedCap, SEP (SEP LUG, SRS and Trihelix). apparently originated in streptophyte algae) and LHCL, are miss- ing in P. coloniale. The relatively large number of gene copies of Light-harvesting complex (LHC) and LHC-like proteins in chlorophyll-a/b-binding proteins (Lhca, Lhcb) in P. coloniale could P. coloniale. Archaeplastida produce metabolic energy by col- reflect adaptation to the low-light environment from which this lecting solar energy and transferring it to photosynthetic reac- strain was isolated (150 m depth), requiring larger LHC antennae. tion centres, facilitated by two types of light-harvesting complexes This is corroborated by two other observations: first, the relatively (LHC I and LHC II), composed of LHC proteins that interact with low Chl-a/b ratio (1.13) reported for this organism and the related

Nature Ecology & Evolution | VOL 4 | September 2020 | 1220–1231 | www.nature.com/natecolevol 1223 Articles Nature Ecology & Evolution

LHCB LHCA

ChlNC IGS.gm 18 00325

KleflGAQ91570.1

1 1 C C 1

. . .

PRCOL 00005963-RA Chrsp4S06301

h h

2 PRCOL 00005945-RA 2 1 Mesvi1099S09088 Mesvi229S00395

r r

9 Mesvi120S02122 1 OstRCC56129 estExt Genewise1.C 80543 PRCOL 00003440-RA Mesvi1099S09089 2

s s

6 3 0

ChlNC estExt Genewise1Plus.C 40251 p p Mesvi120S00996 Cocsu8039

2 PRCOL 00005165-RA 3 2

8 9

8 9 8

0 4 Volca7030 Chlre7351 KleflGAQ88363.1

S S Q Q Q LHCB-M

LHCB1 LHCB3 0 0 A Chlre742 A A

LHCB2 7 0

G G G

l Mesvi294S04944 l l 8 7

f Chrsp110S01575 f f

9 0

e e e

l l l Phypa6 313V6.1 0 1 Ostlu7206

K K KleflGAQ82022.1 Phypa6 312V6.1 K PRCOL 00000621-RA KleflGAQ82232.1 Batpr6697 OstluCp1090 Mesvi3437S05282 Chrsp75S07575 KleflGAQ82095.1 AT3G54890.1 Cocsu7438 LHCA1 PRCOL 00002442-RA KleflGAQ82234.1Chrsp57S06809 PRCOL 00005820-RA Mesvi1297S02417AT4G10340.1 LHCB5 KleflGAQ86204.1 Phypa28 315V6.1 OstRCC91056 eugene1.0000020518 PRCOL 00006738-RA Chrsp51S06525 PRCOL 00003242-RA PRCOL 00002443-RA AT1G61520.3 KleflGAQ89773.1 PRCOL 00003351-RA Mesvi1213S02138 LHCA3 Ostlu6840 Ostta7409 Volca518 LHCA8 ChlNC IGS.gm 12 00329 Chlre8938 LHCA7 MicRCC8849 Chrsp4S06304 LHCA4 KleflGAQ89802.1 Mesvi7832S00861 Micpu9518 Chrsp175S02382 PRCOL 00001429-RA Micpu9517 ChlNC fgenesh3 pg.C scaffold 3000291 PRCOL 00001426-RA KleflGAQ79026.1 Mesvi649S00788Ostta6764 AT3G47470.1 PRCOL 00006299-RA PRCOL 00004018-RA PRCOL 00006230-RA PRCOL 00006179-RA Volca561 PRCOL 00006298-RA LHCA6 PRCOL 00003081-RA Volca10216 LHCA5 PRCOL 00001034-RA PRCOL 00000672-RA PRCOL 00004719-RA Mesvi163S03399 PRCOL 00004425-RA Mesvi2349S04479 Batpr4993 PRCOL 00006974-RA MicRCC7301 PRCOL 00001129-RA Batpr6663 Micpu8788 Mesvi1580S00244 Ostta974 Ostlu831 KleflGAQ85853.1 OstRCC59614 est orfs.7 1200 CBPU CBPX CBPY 158 1 PRCOL 00006231-RA LHCA9 Mesvi566S06883 Chrsp23S03692 LHCB4 Cocsu8703 Volca5761 Cocsu8704 Chlre9739 Ostta6972 LHCA2 PRCOL 00003936-RAOstlu6397 Mesvi566S06881 MicRCC9338Batpr7309 PRCOL 00006232-RA OstRCC7388 gw1.1.703.1 MicRCC547 Micpu9402 KleflGAQ84962.1 KleflGAQ89393.1 KleflGAQ78204.1 KleflGAQ89394.1 KleflGAQ78205.1 Batpr4852 Batpr6824 OstRCC87531 eugene1.0000110216 Ostlu2457 OstRCC45963 estExt Genewise1Plus.C 50413 Ostta2555 Chlre13639 MicRCC9510 Chlre13637 Micpu1634 Volca10160 Micpu2560 Chrsp115S00750 LI818-2-M.commoda Chrsp115S00070 99 9 MicRCC2881 100

100

8

98 100 LI818-1-M.commoda 89

Volca7897 85 82 LHCX5 100 100 Chlre5807 100 100 100 99 LHCX4 Cocsu8809 100 100 91 100 LHCX6 100 94 97 Cocsu7412 100 99 100 100 97 PTI 17G00120 84 91 100 100 88 100 100 PSBS ChlNC IGS.gm 3 00253 100 100 LHCX3--Phaeodactylum tricornutum PSBS-1 99 100 100 100 87 100 99 96 91 100 93 85 97 100 94 99 100 LHCX1

100 100 99 LHCX 100 89 100 Cocsu8199 99 100 95 96 100 89100100100 TP23G01280 10080 100 100 100 100 100 100 100 100 100 100 100 LHCX2 PSBS-2 82 98 80 KleflGAQ89003.1 Phypa241 86V6.2 100 95 99 99 99 96 93 100 PRCOL 00003352-RA 94100 90 KleflGAQ81861.1 Chlre2155 99 100 99 PRCOL 00000073-RA 100 99 100 92 92 100 Mesvi409S05858 Cocsu1035 100 99 100 PRCOL 00001561-RA 99 100 Mesvi1156S00088 93 91 100 97 97 Ostlu3216 100 100 100 92 Mesvi531S06696 100 86 Mesvi2480S04542 Micpu2975 95 100 91 Mesvi686S07587 MicRCC5674 99 100 86 100 96 100 86 94 100 Mesvi1889S03762 Volca10426 100 10099 86 100 ES0009G00310 100 100100 Cocsu4837 99 84 85 99 100 85 100 Phypa99 95V6.1 Chrsp158S02304 100 100 Phypa213 80V6.1 Micpu4422 100 Cocsu4143 100 100 MicRCC7651 100 100100 ChlNC fgenesh3 pm.C scaffold 8000032 Micpu2001 Chlre5397

99 LHCSR1-LI818 LHCL KleflGAQ78568.1 99 100 100 Tree scale: 1 100 Volca3084 Cyapa27092C21200 100 100 94 100 Chlre5427 Cyapa7056C13222 98 91 100 LHCSR2-LI818 100 100 100 Chrsp88S09238 OstRCC17975 e gw1.1.961.1Micpu5452 10010094 81 100 82 100 Mesvi1214S02143 MicRCC1826 Phypa9 38V6.1 Batpr2517 100 100 100 Phypa15 328V6.1 Ostta2962 Chrsp56S06666 Volca6284 94 100 KleflGAQ87685.1 Chlre11852 83 100 SEP2 100 100 Phypa15 146V6.1 PRCOL 00006531-RACocsu8822 100100 10098 Phypa54 304V6.1 95 92 93 PRCOL 00004845-RA 87 Porpu2120.4 ChlNC e gw1.35.79.1 97 Phypa33 127V6.2 96 94 Porpu983.1 100 100 Phypa12 176V6.2 98 OHP6 KleflGAQ82899.1 Porpu2120.3 OHP1-ARATH 100 100 Porpu3409.2 10098 81 100 100 81 OHP9 Chrsp33S05150 89 Sep

82 OHP7 OHP2 KleflGAQ82404.1 99 OHP1 Phypa364 8V6.1 10087 81 100 100 80 OHP2 100 100 Phypa38 333V6.1 99 99 OHP10 100 Phypa371 25V6.1 99 100 OHP3 100 Mesvi855S08428 83 96 OHP4 100 82 KleflGAQ89635.1Cocsu7575 100 92 91 OHP5 100 96 98 100 92 OLQ13027.1 100 89 97 100 ES0486G00010 99 100 Ferrochelatase-2 100 98 100 100 97 AT2G30390.2 100 95 96 Cyame3457 90 Porpu2050.7 98 95 97 100 Mesvi277S04867 8 87 OHP8

8 Volca6174 100 98 100 Chrsp44S05761 98 98 100 PTI 25G00510 86 Chlre8238 100 100 86 HLIP2 TP02G07730 100 Porpu504.4 Ostta5365 100 Cyapa25517C20094 100 100 ChlNC e gw1.13.26.1Ostlu5142 98 PRCOL 00003386-RA ChlNC estExt fgenesh3 pg.C 120179 PRCOL 00003592-RA 98 Batpr6309 100 Chlre12974

99 Cocsu5492

100 Chrsp80S07789 Mesvi487S09487 89 Ferrochelatase-1Micpu3415 100 89 Phypa22 380V6.1 100 OstRCC13158 gw1.8.715.1 100 KleflGAQ81840.1 Cyapa38681C07299MicRCC8130 OHP2-ARATH Batpr1548 Ostta1036 ES0018G00040PTI 06G04090 Porpu4470.5 Micpu8203 Chocr4498 MicRCC3535 Cyame4449 HLIP-LIKE--Micromonas commoda Ostlu771 OstRCC93518 eugene1.0000070083 Cocsu8078 XP 005841655.1 KleflGAQ79032.1 Ferrochelatase Volca9179 Chlre6630 Cyapa26964C21111XP 005839672.1TP02G07820 Chlre6633 Cocsu2649 PTI 01G02410 Chrsp171S08706 Cyapa26964C21112 RedCap2 ELIP1 RedCap1 ES0256G00100 ELIP2 Cocsu3787 Rhodoplantae XP 005831727.1 KleflGAQ84898.1 Ostta5966 Phypa78 96V6.1 Porpu3431.6 KleflGAQ83766.1 KleflGAQ83767.1 Ostlu7513 KleflGAQ82390.1 Ostlu5723 Phypa196 69V6.1 Chlre11593 Batpr3657 Chlre11594 ChlNC IGS.gm 4 00564 Cocsu7386 Chlre13289 MicRCC6199 PRCOL 00006120-RA Chlre8151 Ostlu1937 PRCOL 00002344-RA Ostta1996 Chlre5661 Cocsu9730 ChlNC e gw1.7.200.1 Embryophyta Volca3680 Micpu6770 MicRCC9020 Micpu4367 ChlNC gw1.1.492.1 Cocsu5074 Batpr3368 Cocsu9222 Cocsu1579 KleflGAQ81172.1 Volca1616 Chlre6654 Mesvi179S03581

K K MicRCC3257 Micpu3934 K

3 1 2

Batpr2103 l l l

. .

e e e

5 OHP1 PRCOL 00000777-RA

0 6

f f MicRCC4406 f 6

l l Ostlu7440 l 8

G G G

V Ostlu5798 Batpr174 9

Chlre7018 5 3

0 A A A

1

6

S Q Q Q

9

1

4

8 9 8

Q Prasinophytes

4 OstRCC21948 e gw1.4.646.1 7

0 0 0

A 8

7

i 3 0 3

a

v G 5 5 PRCOL 00000234-RA 4 Chrsp387S05514 l

p s f

PRCOL 00002372-RA Chrsp201S03275 3 4 Chrsp387S05513 Chrsp4S06419 0

PRCOL 00000235-RA Chrsp387S05519 y Chrsp45S05933 e e

. . Chrsp201S03279 . Chrsp68S07373 Chrsp49S06161 l

Chrsp201S03273 1 1 1 Chrsp201S03272 h

M

K OstRCC48476 estExt Genewise1Plus.C 130524 P Streptophyte algae RedCap TUC clade Diatoms Cryptophyta Brown algae ELIP Glaucoplantae

Fig. 3 | Phylogenetic tree of the LHC antenna protein superfamily. The tree is derived from a MAFFT alignment and was constructed using IQ-TREE (see Methods) with the model of sequence evolution suggested by the programme. Bootstrap values (500 replicates) ≥50% are shown. The LHC superfamily can be divided into ten clades, marked by different colours; the LHC genes of P. coloniale are highlighted in red. The coloured circles on the outer ring denote the distribution of the different LHC subfamilies in the respective taxa. The TUC clade comprises Trebouxiophyceae, Ulvophyceae and Chlorophyceae (all in Chlorophyta). ELIP, early light-induced protein; SEP, two-helix stress-enhanced protein; OHP, one-helix protein; PSBS, the photosystem II subunit S; RedCAP, red lineage chlorophyll a/b-binding (CAB)-like protein.

Palmophyllum48 (0.64), and second, the lower number (six) of ELIPs only Mamiellophyceae were found to encode delta-type CAs in P. coloniale compared to core Chlorophyta (nine in Ulva lactuca (Supplementary Table 12). Whereas alpha-, beta- and gamma-type and ten in C. reinhardtii) or early-diverging subaerial/terrestrial CAs apparently evolved in the common ancestor of Archaeplastida streptophyte algae (13/9 in C. atmophyticus and Klebsormidium (alpha- and beta-type CAs were lost in P. coloniale, and alpha-type nitens, respectively). CAs in the later-diverging Mamiellophyceae, perhaps related to cell miniaturization in both groups), delta-type CAs apparently evolved Carbon-concentrating mechanisms (CCMs). Previous studies of in the common ancestor of Viridiplantae and were independently picoplanktonic Mamiellophyceae suggested that these algae might lost in the core Chlorophyta and in Streptophyta.

possess a C4-like carbon fixation pathway to alleviate low CO2 affin- Here we propose a putative model of CCMs in P. coloniale, 49 ity . A C4-like CCM has also been reported in photosynthetic stra- based on the targets of the genes that are necessary for inorganic 50–53 menopiles . CCMs mainly rely on carbonic anhydrases (CAs) carbon assimilation (Fig. 4b). As a potential C4-like CCM, malate that catalyse the reversible conversion of carbon dioxide to bicar- dehydrogenase catalyses the reaction to yield malate in the cyto- bonate. Four CA genes belonging to the delta- and gamma-type sol, mitochondrion and chloroplast (Fig. 4b). Malic enzymes could CAs were identified in the genome of P. coloniale, while alpha- be transported into the mitochondrion and chloroplast, where

and beta-type CAs were absent (Fig. 4a). Among Viridiplantae, they release CO2. Previous studies of CCMs in Micromonas and

1224 Nature Ecology & Evolution | VOL 4 | September 2020 | 1220–1231 | www.nature.com/natecolevol Nature Ecology & Evolution Articles

a b PRCOL_00000018-RA

Glycolysis

Glucose PEP Pyr Ethanol

PRCOL_00006549-RA

ME MA PEPCK PC + + MDH TCA γ-CA – γ-CA CO HCO 2 3 CO2 OAA Gamma-CA Respiration PRCOL_00006246-RA Beta-CA

PRCOL_00000479-RA 100 98 δ-CA – PEPC MDH CO2 CO2 HCO3 OAA MA PEP 69 1 100 – Delta-CA HCO PEP 46 3 OAA Calvin cycle PPDK -CA Pyr 3-PGA RuBP δ MDH ME RuBisCO ME CO2 MA

Alpha-CA

Fig. 4 | CCMs in P. coloniale. a, Maximum-likelihood phylogeny of CA proteins in P. coloniale. b, Proposed CCMs in which inorganic carbon is assimilated by P. coloniale based on predicted protein localizations. A brown arrow denotes that a reaction occurs only in P. coloniale, and a grey dotted arrow denotes a reaction that exists in Mamiellophyceae. MA, malic acid; MDH, malate dehydrogenase; ME, malic enzyme; Pyr, pyruvate; 3-PGA, 3-phosphoglyceric acid; PPDK, pyruvate, phosphate dikinase; RuBisCO, ribulose-1,5-bisphosphate carboxylase oxygenase; TCA, tricarboxylic acid cycle.

Ostreococcus suggested that these algae might perform cytosol- and biogenesis in bacteria and archaea59. Furthermore, seven copies of 49 chloroplast-based C4-like CCMs . P. coloniale, however, potentially regulatory response/sensor proteins, homologous to bacteria, could harbours cytosol-, chloroplast- and mitochondrion-based CCMs to be identified in P. coloniale, which might respond to environmen- enhance the ability to concentrate CO2 in a low-CO2 environment. tal signals. Further studies are needed to biochemically explore the Interestingly, the P. coloniale genome encoded phosphoenolpyru- main components of the of P. coloniale. vate carboxykinase (PEPCK) but not pyruvate carboxylase (PC), Peptidoglycan is the main component of cell walls in bacteria60. opposite to the situation in the genomes of Mamiellophyceae49,54–56. Peptidoglycan biosynthesis requires several enzymes to participate in This result suggests that a distinct CCM might exist in P. coloniale the conversion of UDP-N-acetyl-d-glucosamine (GlcNAc) t­o GlcNac- that uses phosphoenolpyruvate (PEP) as a substrate from the gly- ­N-­ ­ac­et­yl­mu­ra­my­l-­pe­nt­ap­ep­ti­de­-p­yr­op­ho­sp­horyl-unde­caprenol61. colytic pathway to produce oxaloacetate (OAA) by PEPCK, instead All ten core enzymes involved in peptidoglycan biosynthesis were of PC as in the Mamiellophyceae (Supplementary Table 12 and 13). identified in the P. coloniale genome (Fig. 5a). Consistent with pre- vious results, Glaucoplantae (C. paradoxa) and Micromonas pusilla Analysis of carbohydrate-active enzymes (CAZymes) and (Mamiellophyceae), as well as all streptophyte algae, bryophytes peptidoglycan biosynthesis. The P. coloniale genome encoded and ferns, encoded all the core enzymes62. We conclude that pep- 34 glycoside hydrolases (GHs) and 83 glycosyltransferases (GTs) tidoglycan was present in the ancestor of Archaeplastida, com- belonging to 16 GH and 33 GT families (Supplementary Table 14). pletely lost in Rhodoplantae but retained in the common ancestor The total number of CAZymes was lower than in the early-diverging of Viridiplantae and Glaucoplantae, and then independently lost (to Chlorophyta and Streptophyta, and even lower than in Ostreococcus different degrees) in the later-diverging Mamiellophyceae, the core spp., the smallest (Supplementary Table 15), which prob- Chlorophyta and the vast majority of vascular seed plants. ably reflects the simple chemical structure of the P. coloniale cell wall (cells are enclosed within thick cell walls57). P. coloniale, however, Evolutionary analysis of flagella and sexual reproduction in harbours all genes involved in the biosynthesis and metabolism P. coloniale. Prasinoderma coloniale and other members of the of starch (Supplementary Table 16 and Supplementary Fig. 11). recently described class Palmophyllophyceae27 have been reported Surprisingly, we could not find any enzymes involved in the syn- to lack flagella57,63–65. We performed a comparative analysis of fla- thesis or remodelling of the major components of the primary cell gellar proteins, and found that non-flagellate species (three species wall in embryophytes, such as enzymes of cellulose, mannan, xylo- of Trebouxiophyceae and four of Ostreococcus and Bathycoccus) glucan and xylan biosynthesis and degradation. Chlorella spp. have have ≤26 core flagellar proteins and a total number of flagellar pro- been reported to contain a cell wall with components of glucos- teins of ≤140 (average 117, n = 7), whereas flagellate species display amine polymers such as chitin and chitosan58. However, very few ≥40 core flagellar proteins and a total of ≥192 flagellar proteins chitosan-related genes were identified in the genome of P. coloniale (average 272, n = 13) (Fig. 5b and Supplementary Table 18). This (Supplementary Table 17). Interestingly, some but not many bacte- is corroborated by recent analyses among non-flagellate organisms ria/archaea-specific protein glycosylation genes could be detected (angiosperms, Rhodoplantae and pennate diatoms) that yielded on in the P. coloniale genome, such as low-salt glycan biosynthesis pro- average 77 flagellar proteins21 (n = 10). Furthermore, non-flagellate tein Agl12, low-salt glycan biosynthesis reductase Agl14 and the species completely lack central pair proteins, dynein heavy chains GT AglE, which are involved in S-layer and cell surface structure (DHC1–7), and most of the intraflagellar transport (IFT) and radial

Nature Ecology & Evolution | VOL 4 | September 2020 | 1220–1231 | www.nature.com/natecolevol 1225 Articles Nature Ecology & Evolution

a PBP

MurG

MurY

MurF

Ddl

MurE

MurD

MurC

MurB

MurA Kni Ath Mvi Ccr Cat Apr Olu Bpr Cre Ota Vca Pco Czo Cva Cso Csu Ppa Ceu Osp Cpa Osa Mco Gpe Mpo Mne Mpu Pum Smo Cme

Streptophyta TUC clade Prasinophytes Glaucoplantae Rhodoplantae Presence Absence

b Central pair Intraflagellar transport Radial spoke protein protein Out dynein Inner dynein protein Dynein protein

Reciprocal best hits 0 100 200 300 RSP1 RSP2 RSP3 RSP4 RSP5 RSP6 RSP8 RSP9 RSP10 RSP11 RSP12 RSP16 RSP17 RSP23 CPC1 PF6 PF16 PF20 ODA DC1 ODA DC2 ODA DC3 ODA IC1 ODA IC2 ODA LC1 ODA LC3 ODA LC6 ODA LC7 ODA LC8 ODA DHCa ODA DHCb ODA DHCg IA1 DHC1a IA1 DHC1b IDA5 IA IC28 IDA IC138 IA1 IC140 IFT20 IFT22 IFT46 IFT52 IFT57 IFT72/74 IFT80 IFT81 IFT88 IFT122A IFT140 IFT172 DHC1b DHC2 DHC4 DHC5 DHC6 DHC7 DHC8 DHC9 Mastigoneme

C. braunii 40/169 K. nitens 51/189 C. atmophyticus 42/150 M. viride 42/162 Streptophyta V. carteri 57/282 G. pectorale 57/295 C. reinhardtii 59/338 C. eustigma 56/271 D. salina 50/256 C. zofingiensis 54/228 M. neglectum 49/202 U. mutabilis 44/227 A. protothecoides 21/87

Chlorophyta C. variabilis 26/114 C. subellipsoidea 25/104 Ostreococcus sp. 18/95 O. lucimarinus 17/101 O. tauri 17/90 B. prasinos 20/84 M. commoda 53/187 M. pusilla 52/184 P. coloniale 50/167

Glaucoplantae C. paradoxa 45/172

Flagellum loss events Selected flagellar proteins Other flagellar proteins Flagella previously not reported

Fig. 5 | Analysis of peptidoglycan biosynthesis and flagellar proteins derived from the P. coloniale genome. a, Distribution of proteins involved in the peptidoglycan biosynthetic pathway across Archaeplastida. b, Distribution of key flagellar proteins across Viridiplantae and Glaucoplantae. spoke proteins (RSP)21. RSP3, which binds to the inner dynein arm meiosis-specific genes in some other protists (five genes in Giardia71 of the axonemal doublet microtubule and is required for axonemal and diatoms72 and four in the trebouxiophytes Auxenochlorella and sliding and flagellar motility, is also absent in non-flagellate organ- Helicosporidium73), but similar to the number of meiosis-specific isms66,67. The number of flagellar proteins in P. coloniale (50 core genes in Micromonas (7)49. Interestingly, P. coloniale seems to proteins, 217 total flagellar proteins) and the presence of IFT (12) lack DMC1, the loss of which correlates with the adaptation of and DHC proteins (6), as well as RSP3, strongly suggest that P. col o - recombination-independent mechanisms for pairing and synapsis niale can produce flagellate cells. The absence of the PF6 protein of in both Drosophila and Caenorhabditis74. We tentatively conclude the central pair microtubule apparatus may indicate that flagellate that P. coloniale retains the capacity for meiotic recombination and cells in P. coloniale are short-lived, like the spermatozoids of centric thus sexual reproduction. diatoms68 or Chara braunii that also lack this protein. Since sexual reproduction has not been observed in P. coloniale De novo NAD+ and quinolate biosynthesis in P. coloniale. and Palmophyllophyceae in general, we searched for genes partici- Nicotinamide adenine dinucleotide (NAD) and its phosphate pating in sexual reproduction. Thirty-one out of 40 meiosis-related (NADP) are essential redox co-factors in all living systems. All genes were identified in the P. coloniale genome, and 8 out of 11 eukaryotic organisms have the ability to synthesize NAD by one of meiosis-specific genes were found (Supplementary Table 19). two de novo pathways, the aspartate pathway75 or the kynurenine These numbers are higher than reported for meiosis-related genes pathway starting with tryptophan76. To date, no eukaryotic organism in Symbiodinium69 (25) and Trichomonas vaginalis70 (27), and for had been found that contains both pathways. P. coloniale is the first

1226 Nature Ecology & Evolution | VOL 4 | September 2020 | 1220–1231 | www.nature.com/natecolevol Nature Ecology & Evolution Articles

a Tryptophan

TDO/IDO

N-Formylkynurenine P. coloniale Early Chlorophyta AFM Early Streptophyta Rhodoplantae L-Kynurenine Glaucoplantae Bacteria KMO

3-Hydroxy-L-kynurenine

KYU

3-Hydroxyanthranilate L-Aspartate

HAAO AO

2-Amino 3-carboxymuconate Iminoaspartate 6-semialdehyde

ACMSD Spontaneous QS

2-Aminomuconic Quinolinate 6-semialdehyde

NNP

Picolinic acid NAD+

b 0 50 100 150 200 250 300 350 400 450 500 550 600 650 700 750 800

Consensus

PRCOL_00003084-RA

Bacteria-WP_040496802.1

KYU Fungi-XP_021878578.1

Animal-XP_027039601.1

Animal-XP_027050477.1

HAAO Fungi-CEG65823.1

Bacteria-WP_026611299.1

KYU HAAO

Fig. 6 | Comparison of de novo NAD+ and quinolinate biosynthesis genes. a, Distribution of genes related to the de novo NAD+ and quinolinate biosynthetic pathways in P. coloniale (orange) as compared with Rhodoplantae (red), Glaucoplantae (purple), early-diverging Chlorophyta (blue), early-diverging Streptophyta (green) and bacteria (brown). Solid circles denote the presence of homologues in each clade. TDO/IDO, tryptophan-/ indoleamine 2,3-dioxygenase; AFM, arylformamidase; KMO, kynurenine 3-monooxygenase; KYU, kynureninase; HAAO, 3-hydroxyanthranilate 3,4-dioxygenase; ACMSD, 2-amino-3-carboxymuconate-6-semialdehyde decarboxylase; AO, l-aspartate oxidase; QS, quinolinate synthase. b, A gene fusion architecture between KYU and HAAO of P. coloniale; the left and right parts are the KYU and HAAO genes, respectively. A comparison of sequence similarity of various KYU and HAAO genes from different organisms is shown.

Nature Ecology & Evolution | VOL 4 | September 2020 | 1220–1231 | www.nature.com/natecolevol 1227 Articles Nature Ecology & Evolution eukaryotic organism to display both pathways (Supplementary high number of selenocysteine-containing proteins, similar to Table 20, Fig. 6a and Supplementary Figs. 12–21): the (presumably) picoplanktonic Mamiellophyceae but unlike core Chlorophyta, ancestral eukaryotic kynurenine pathway and the aspartate path- early-diverging streptophyte algae, Glaucoplantae and Rhodoplantae way. It has been hypothesized that the latter was acquired through (Supplementary Table 22). Selenoenzymes are more catalytically primary endosymbiosis from cyanobacteria77. This is corroborated active than similar enzymes lacking selenium, and small cells may by the fact that both aspartate oxidase (AO) and quinolinate syn- therefore require fewer of those proteins54. thase (QS) of Glaucoplantae in phylogenetic analyses branch within cyanobacteria. Rhodoplantae have apparently lost the aspartate Conclusion pathway (and instead retained the kynurenine pathway for NAD The picoplanktonic P. coloniale is a member of a new divi- biosynthesis77). In Viridiplantae, however, the original cyanobac- sion/phylum of Viridiplantae, the Prasinodermophyta, that in phy- terial AO and QS genes were replaced by those acquired through logenomic analyses diverges before the split of Viridiplantae into horizontal gene transfer (HGT) from other bacteria (Bacteriodetes Chlorophyta and Streptophyta. Its genome revealed both ancestral and Deltaproteobacteria, respectively77). However, we can now and derived characteristics that correspond to its unique phyloge- develop a hypothetical evolutionary scenario for both pathways netic position, equidistant from Chlorophyta and Streptophyta. The in Archaeplastida: with the introduction of the aspartate pathway genome of strain CCMP 1413 showed adaptations to a low-light from cyanobacteria during primary endosymbiosis, the ancestral (deep-water), oligotrophic oceanic environment. In such an envi- eukaryotic kynurenine pathway for NAD biosynthesis was lost in ronment, metabolic coupling and horizontal gene transfer from bac- Glaucoplantae and Viridiplantae (but not in Rhodoplantae, which teria may have facilitated adaptation. In the latter, it resembles the apparently lost the aspartate pathway). While Glaucoplantae essen- genomes of the picoplanktonic Mamiellophyceae (Chlorophyta), tially retained the cyanobacterial aspartate pathway, the ancestor of although a number of apomorphic features in the genome of the Viridiplantae replaced the cyanobacterial AO and QS genes by P. coloniale suggest that the picoplanktonic lifestyle in the two nuclear-encoded genes obtained from other bacteria through HGT, groups evolved independently. thus compensating their function and representing a new synapo- morphy for Viridiplantae. This recalls the situation in Paulinella Methods chromatophora, in which loss of genes from the chromatophore Cultivation of algae, nucleic acid extraction and light microscopy. Cultures of genome was compensated by bacterial genes obtained through P. coloniale (CCMP 1413) were obtained from the National Center for Marine 78 Algae and Microbiota (https://ncma.bigelow.org/ccmp1413#.XqP0zGgzYdU). HGT and encoded on the nuclear genome . We suggest that Axenic cultures were prepared by streaking out algae on agar and picking single P. coloniale retained the kynurenine pathway, not for NAD syn- cell-derived clones from the plates. Algae were grown in a modifed ASP12 thesis but for synthesis (and possible excretion) of picolinic acid. culture medium86 (http://www.ccac.uni-koeln.de/) in a 14/10 h light/dark cycle Additionally, gene fusion architecture between KYU and HAAO of at 20 µmol photons m−2 s−1 and 23 °C. During all steps of culture scale-up until P. coloniale was observed (Fig. 6b). The metabolite picolinic acid, a nucleic acid extraction, axenicity was monitored by both sterility tests and light microscopy. Total RNA was extracted from P. coloniale using the CTAB-PVP tryptophan catabolite, can potentially form metal complexes with method as described in ref. 87 (appendix S1 therein). Total DNA was extracted using limiting trace elements such as iron, an important property in oli- a modifed CTAB protocol88. Light microscopy was performed with a Leica DMLB gotrophic environments. light microscope using a PL-APO ×100/1.40 numerical aperture (NA) objective, an immersed condenser (NA 1.4) and a Metz Mecablitz 32 Ct3 fash system. Vitamin auxotrophy and selenocysteine-containing proteins in P. coloniale. Previous studies found that some Mamiellophyceae Genome sequencing and assembly. The long reads libraries were constructed using standard library preparation protocols and sequenced by the Pacbio Sequel (Ostreococcus, Micromonas) need to acquire the vitamins thiamine platform. NextDenovo (https://github.com/Nextomics/NextDenovo) was used (B1) and cobalamin (B12) from the extracellular environment for to generate the draft assembly. The draft assembly were first polished by Pacbio growth, because they lack either enzymes needed for their biosyn- reads using Arrow, then NextPolish was used to perform a second round of polishing using short reads generated by the Illumina sequence platform. To thesis (B1) or vitamin-independent isoforms of essential enzymes (for example, methionine synthase) that require the vitamin as eliminate putative bacterial contamination, contigs were searched against the NCBI 79,80 non-redundant database. a co-enzyme (B12) . P. coloniale shares these features with the K-mer analysis was performed to survey genome size, heterozygosity and picoplanktonic Mamiellophyceae (Supplementary Table 21). The repeat content before genome assembly. The peak of K-mer frequency (M) was absence of any enzymes involved in the biosynthesis of thiamine determined by the real sequencing depth of the genome (N), read length (L) and in P. coloniale (except for the final phosphorylation step) dem- the length of the K-mer (K) following the formula: M = N × (L – K + 1)/L. This onstrates (1) the lack of bacterial contaminations in the genome formula enables accurate estimation of N, and hence an estimation of the genome size for homozygous diploid or haploid genomes. All these analyses indicated assembly of the axenic strain used, and (2) that thiamine must be homozygosity of the genome and gave similar estimations of genome size. The final provided by the extracellular environment. It has recently been genome size was estimated (~26.04 Mb) using 17-mer analysis. shown that, in Ostreococcus tauri, B1 and B12 auxotrophy can be alle- The quality of the assembly was evaluated in four ways: (1) we used BUSCO viated by co-cultivation with the bacterium Dinoroseobacter shibae, v.3 to determine the presence of a proportion of a core set of 303 highly conserved 81 eukaryotic genes. (2) SOAP (v.2.21) was used to map the short reads to the a member of the Rhodobacteraceae , suggesting that in P. coloniale assemblies to evaluate the DNA reads mapping rate in both species. In the similar algal/bacterial partnerships may exist. Most importantly, meantime, sequence depth and genomic copy content distribution were calculated. P. coloniale is also a vitamin B7 (biotin) auxotroph, lacking all four (3) We used BLAT (v.36) to compare the draft assemblies to a transcript assembled genes involved in the biosynthesis of this vitamin, and apparently by Bridger. (4) We mapped the RNA reads to the draft assemblies to evaluate the the only biotin auxotroph currently known among Archaeplastida82 RNA reads mapping rate using Tophat2. (Supplementary Table 21). Many genomes of marine bacteria con- tain full sets of biotin biosynthesis genes83,84 and these bacteria could Transcriptome sequencing and assembly. Two methods of library construction were performed. The rRNA-depleted RNA library was constructed using the be a source of biotin for P. coloniale. Genome compaction in these ribo-zero rRNA removal kit (plant) (Illumina) following the manufacturer’s picoplanktonic eukaryotes may have facilitated evolution of such protocol, while the poly (A)-selected RNA library was constructed using the symbiotic interactions in an oligotrophic marine environment. ScriptSeq Library Prep kit (Plant leaf) (Illumina) following the manufacturer’s The Mamiellophyceae genomes encode a large number of protocol. A total of 12.09 Gb of PE-100 RNA-seq data was generated using the selenocysteine-containing proteins compared to C. reinhardtii49,54. Illumina Hiseq 4000 sequencing platform. SOAPfilter (v.2.2) was used to filter the reads with N > 10 bp, removing duplicates and adaptors. As a result, 5.58 Gb The number of selenoproteins, their homologues and selenocys- of clean reads were obtained after filtering, then Bridger was used to assemble teine insertion sequences have recently been investigated across the clean data into a transcriptome, which was used for gene annotation and selected Archaeplastida genomes85. P. coloniale also displayed a genome evaluation.

1228 Nature Ecology & Evolution | VOL 4 | September 2020 | 1220–1231 | www.nature.com/natecolevol Nature Ecology & Evolution Articles

Repeat annotation. A pipeline combining de novo and library-based approaches RAxML analyses were performed with 1,000 bootstrap replicates, each with was used to identify the repeat elements. For the de novo approach, MITE-hunter 100 starting trees, using either the GTRGAMMA model (for all trees shown here) and LTRharvest were used to annotate the transposon and retrotransposon, or the GTRCAT model (GTRCAT trees were almost identical to GTRGAMMA respectively, then RepeatModeler (v.1.0.8) was performed to annotate the other trees; not shown). In likelihood analyses using IQ-TREE, the best-fitting model repeat elements. For the library-based approach, the custom library Repbase 22.01 was identified by ModelFinder and the bootstrap analysis again involved was used to identify the repeat elements by RepeatMasker. 1,000 replicates. For Bayesian analysis, 1,000,000 generations were calculated under the GTR + I + G model, and generations 1–250,000 were discarded as burn-in. Gene prediction and preliminary functional annotation. Three methods were Bootstrap percentages <50% and Bayesian posterior probabilities <0.9 were combined to predict the gene model, an ab initio prediction method, a homologue regarded as ‘unsupported’. Phylogenetic trees were also constructed using nuclear- search method and a RNA-seq data-aided method. For the first method, and plastid-encoded rRNA operons separately (Supplementary Figs. 5 and 6). PASApipeline-2.1.0 was performed to predict gene structure using transcripts assembled by Bridger, which were further used in AUGUSTUS (v.3.2.3) to train Search for unique rRNA synapomorphies. To find unique molecular gene models. GeneMark (v.1.0) was used to construct a hidden Markov model synapomorphies (Supplementary Files_Taxonomic Acts and Revisions)—that (HMM) profile for further annotation. For the homologue search method, gene is, rare mutations that characterize a given clade—we performed tree-based sets of homologue species and public proteins of Prasinoderma were downloaded synapomorphy searches as previously described89. To identify genuine from the NCBI database. For the RNA-aided method, transcripts were assembled non-homoplasious synapomorphies (flagged as NHS), and to find homoplasious by Bridger as evidence. All predictions were combined using two rounds of changes (parallelisms and reversals), the synapomorphy search must cover as MAKER (v.2.31.8) to yield the consensus gene sets. The final gene set was much diversity as possible. Therefore, all synapomorphies that resulted from the evaluated by mapping with eukaryotic BUSCO v.3 dataset and RNA read mapping initial search procedure (using only 109 Plantae) were controlled for homoplasies by Tophat2. Coverage depth was calculated by Samtools (v.0.1.19). by (1) a taxon-rich alignment containing nuclear rRNA operons from about Preliminary gene function annotation was performed by BLASTP (<10 × 10–5) 1,300 Archaeplastida/Plantae, (2) an alignment with plastid rRNA operons from against certain known databases, including SwissProt, TrEMBL, KEGG, COG about 1,600 Archaeplastida/Plantae and (3) BLAST searches (https://blast.ncbi.nlm. and NR. InterProScan (using data from Pfam, PRINTS, SMART, ProDom nih.gov/Blast.cgi). and PROSITE) was used to identify protein motifs and protein domains of the predicted gene set. Gene Ontology information was obtained through Blast2Go Genome composition of P. coloniale genome. We looked at the components (v.2.5.0). For certain key functional genes we used a stricter functional annotation of the P. coloniale genome mainly in three ways: (1) gene family clustering was method by the addition of some known query genes, as described in Detection of first performed on gene sets of the species C. atmophyticus (Streptophyta), M. important candidate functional genes. commoda (Chlorophyta), C. paradoxa (Glaucoplantae) and P. coloniale. Commonly shared and unique gene families were shown and displayed on a Venn diagram. Whole-genome phylogenetic analysis. For whole-genome phylogenetic analysis, (2) The P. coloniale gene set was aligned to the NCBI non-redundant database both genome data downloaded from public databases (NCBI Refseq or JGI) (NR), and the best alignment results (Best-hit) were obtained for each gene. The and transcriptome data downloaded from the 1000 Plants Project (1KP, https:// NCBI database was then used to classify the P. coloniale gene set. (3) We sites.google.com/a/ualberta.ca/onekp/) were used. First, OrthoFinder (v.1.1.8) selected early-diverging Streptophyta (five species), early-diverging Chlorophyta was used to infer orthogroups (gene families) among the 28 selected organisms. (five species) and P. coloniale to perform the gene family cluster, and then divided Single-copy orthogroups (gene families with only one gene copy per species) all gene families into three categories: early Chlorophyta gene families, early were collected, since every single-copy gene in each gene family could be an Streptophyta gene families and gene families shared by both early Chlorophyta and orthologue among 28 organisms. We used multiple alignment with fast Fourier early Streptophyta35. First, we removed unusual/weird gene families in which the transform (MAFFT v.7.310) to perform multiple sequence alignment for each gene number of some species was over tenfold larger than the average gene number single-copy gene orthogroup, followed by a gap position (removing only positions of the other species. We also removed gene families that include only one species. where 50% or more of the sequences having a gap are treated as gap positions). Then, the average gene numbers in early Chlorophyta and early Streptophyta were We constructed multiple phylogenetic trees using different tree construction determined for each gene family. If the average gene number of early Chlorophyta methods (concatenated and coalescent methods) based on different taxon in a gene family was more than twice the number of early Streptophyta, that gene samplings (that is, number of species). In concatenated tree reconstruction, family was designated as an early Chlorophyta gene family. Conversely, if the each single-copy gene alignment was linked by order to establish a super-gene, average gene number of early Streptophyta in a gene family was larger than twice which was used to construct a concatenated maximum-likelihood phylogenetic the number of early Chlorophyta, that gene family was designated as an early tree with either RAxML (amino acid substitution model: CAT + GTR, with Streptophyta gene family. The remaining gene families were shared between early 500 bootstrap replicates) or IQ-TREE (amino substitution model inferred by Chlorophyta and early Streptophyta. ModelFinder, with 500 bootstrap replicates). In addition, we used MrBayes (v.3.2.6) to construct a Bayesian phylogenetic tree, Markov chain Monte Carlo, Detection of key candidate functional genes. All candidate genes were screened which was set to run 1,000,000 generations and sampled every 1,000 generations, based on the following conditions: (1) candidate gene sequences should be the first 25% of which was discarded as burn-in. In the coalescent method, a similar to the query genes collected from previous studies or databases (BLAST maximum-likelihood phylogenetic tree was constructed for each single-copy <10 × 10–5); and (2) the function of the candidate genes should be consistent orthogroup. We then used ASTRAL to combine all single-copy gene trees into a with the query genes according to online NR functional annotation or Swissprot species tree with the multi-species coalescent model. Finally, we compared and functional annotation. summarized phylogenetic trees using different methods or different datasets. For Regarding the detection of flagellar genes, we mainly referenced the flagellar a general discussion on concatenated versus coalescent methods for phylogenetic genes from refs. 49,90. After elimination of redundancy, we obtained 397 flagellar reconstruction, see ref. 28. genes as our query set. We used the reciprocal best hits method to identify flagellar genes. Phylogenetic analyses of complete nuclear and plastid-encoded rRNA For cell wall-related gene annotation we used the CAZyme database as query, operon sequences of 109 Archaeplantae. New sequence data of rRNA operons then the web meta-server dbCAN2 (http://bcb.unl.edu/dbCAN2/index.php) was were generated for several taxa (see Supplementary Table 33, and as described used to detect CAZymes. dbCAN2 integrates three tools/databases for automated previously30). For other taxa, data were either retrieved from annotated CAZyme annotation: (1) HMMER, for annotation of the CAZyme domain against entries in sequence databases (https://www.ncbi.nlm.nih.gov/nucleotide/) or the dbCAN CAZyme domain HMM database; (2) DIAMOND for fast blast hits assembled from non-annotated transcriptome sequence data (MMETSP and in the CAZy database; and (3) Hotpep for short conserved motifs in the Peptide ONE_KP; see Supplementary Table 31). All new rRNA sequences, as well as Pattern Recognition (PPR) library. newly assembled transcriptome data, were submitted to GENBANK (https:// For TFs we used the HMMER search method. We downloaded the HMMER www.ncbi.nlm.nih.gov/genbank/; bold accession numbers in Supplementary model of the domain structure of each transcription factor from the Pfam website Table 31). Sequences were manually aligned, guided by rRNA/transfer RNA (https://pfam.xfam.org/) while referring to the TAPscan v.2 transcription factor secondary structures using SeaView 4.3.0 (http://pbil.univ-lyon1.fr/software/ database91 (https://plantcode.online.uni-marburg.de/tapscan/). Preliminary seaview.html). For phylogenetic analyses, only those positions were selected that candidates were collected by searching the profile HMM for each species could be unambiguously aligned among the Rhodoplantae, Glaucoplantae and (<10 × 10–5), then we filtered those genes that did not match the SwissProt Viridiplantae—in total, 8,818 nucleotides (nt). For all phylogenetic analyses, the functional annotation (<10 × 10–5). Finally, we filtered genes containing a wrong 8,818 positions were subdivided into four sections: nuclear 18 S rDNA (1,621 nt), domain according to the domain rules of the TAPscan v.2 transcription factor nuclear 5.8 S and 28 S rDNA (3,025 nt), plastid 16 S rDNA and two tRNA database. Most TFs/TRs were confirmed by phylogenetic tree analysis. genes (1,535 nt) and plastid 23 S rDNA (2,637 nt). Tree reconstructions were performed at the CIPRES Science Gateway (http://www.phylo.org/sub_sections/ Subcellular localization. To predict where key proteins (for example, certain portal/) using three methods: maximum-likelihood with RAxML (v.8.2.10), enzymes related to carbon-concentrating mechanisms) reside in a cell, we used maximum-likelihood with IQ-TREE (v.1.6.10) and Bayesian tree reconstruction online tools including WoLF_PSORT (https://www.genscript.com/wolf-psort. with MrBayes (v.3.2.6). html?src=leftbar), TargetP (http://www.cbs.dtu.dk/services/TargetP/), Hectar

Nature Ecology & Evolution | VOL 4 | September 2020 | 1220–1231 | www.nature.com/natecolevol 1229 Articles Nature Ecology & Evolution

(https://webtools.sb-roscoff.fr/) and LocSigDB (http://genome.unmc.edu/ 26. Zechman, F. W. et al. An unrecognized ancient lineage of green plants LocSigDB/index.html) to predict the subcellular localization of these proteins. persists in deep marine waters. J. Phycol. 46, 1288–1295 (2010). Combining the results of the four tools, we estimated the localization. 27. Leliaert, F. et al. Chloroplast phylogenomic analyses reveal the deepest-branching lineage of the Chlorophyta, Palmophyllophyceae class. nov. Reporting Summary. Further information on research design is available in the Sci. Rep. 6, 25367 (2016). Nature Research Reporting Summary linked to this article. 28. Molloy, E. & Warnow, T. Large-scale species tree estimation. Preprint at arXiv https://arxiv.org/abs/1904.02600 (2019). Data availability 29. Leebens-Mack, J. H. et al. One thousand plant transcriptomes and the Whole-genome assemblies, annotation and raw data for P. coloniale in this study phylogenomics of green plants. Nature 574, 679–685 (2019). are deposited at the CNGB Nucleotide Sequence Archive92 (CNSA: http://db.cngb. 30. Marin, B. Nested in the or independent class? Phylogeny and org/cnsa, accession no. CNP0000924). classifcation of the Pedinophyceae (Viridiplantae) revealed by molecular phylogenetic analyses of complete nuclear and plastid-encoded rRNA operons. Protist 163, 778–805 (2012). Received: 12 June 2019; Accepted: 12 May 2020; 31. Shen, X.-X., Hittinger, C. T. & Rokas, A. Contentious relationships in Published online: 22 June 2020 phylogenomic studies can be driven by a handful of genes. Nat. Ecol. Evol. 1, 0126 (2017). References 32. Walker, J. F., Walker-Hale, N., Vargas, O. M., Larson, D. A. & Stull, G. W. 1. Niklas, K. J. Te Evolutionary Biology of Plants (Univ. of Chicago Press, 1997). Characterizing gene tree confict in plastome-inferred phylogenies. PeerJ. 7, 2. Kenrick, P. & Crane, P. Te origin and early evolution of plants on Land. e7747 (2019). Nature 389, 33–39 (1997). 33. Gonçalves, D. J. P., Simpson, B. B., Ortiz, E. M., Shimizu, G. H. & Jansen, R. 3. Willis, K. & McElwain, J. Te Evolution of Plants (Oxford Univ. Press, 2014). K. Incongruence between gene trees and species trees and phylogenetic signal 4. Judd, W. S., Campbell, C. S., Kellogg, E. A., Stevens, P. F. & Donoghue, M. J. variation in plastid genes. Mol. Phylogenet. Evol. 138, 219–232 (2019). Plant Systematics: A Phylogenetic Approach (Sinauer, 2008). 34. Grimsley, N., Yau, S., Piganeau, G. & Moreau, H. in Marine Protists 5. Courties, C. et al. Smallest eukaryotic organism. Nature 370, 255 (1994). (eds. Ohtsuka, S. et al.) 107–127 (Springer, 2015). 6. Yoon, H. S., Hackett, J. D., Ciniglia, C., Pinto, G. & Bhattacharya, D. A 35. Hori, K. et al. Klebsormidium faccidum genome reveals primary factors for molecular timeline for the origin of photosynthetic eukaryotes. Mol. Biol. plant terrestrial adaptation. Nat. Commun. 5, 3978 (2014). Evol. 21, 809–818 (2004). 36. Nishiyama, T. et al. Te Chara genome: secondary complexity and 7. Morris, J. L. et al. Te timescale of early land plant evolution. Proc. Natl Acad. implications for plant terrestrialization. Cell 174, 448–464 (2018). Sci. USA 115, E2274–E2283 (2018). 37. Rinerson, C. I., Rabara, R. C., Tripathi, P., Shen, Q. J. & Rushton, P. J. Te 8. Bremer, K. Summary of green plant phylogeny and classifcation. Cladistics 1, evolution of WRKY transcription factors. BMC Plant Biol. 15, 66 (2015). 369–385 (1985). 38. Saf, A. et al. Te world according to GARP transcription factors. Curr. Opin. 9. Melkonian, M. & Surek, B. Phylogeny of the Chlorophyta: congruence Plant Biol. 39, 159–167 (2017). between ultrastructural and molecular evidence. Bull. Soc. Zool. Fr. 120, 39. Guo, A.-Y. et al. Genome-wide identifcation and evolutionary analysis of the 191–208 (1995). plant-specifc SBP-box transcription factor family. Gene 418, 1–8 (2008). 10. Lewis, L. A. & McCourt, R. M. Green algae and the origin of land plants. 40. Kropat, J. et al. A regulator of nutritional copper signaling in Chlamydomonas Am. J. Bot. 91, 1535–1556 (2004). is an SBP domain protein that recognizes the GTAC core of copper response 11. Becker, B. & Marin, B. Streptophyte algae and the origin of embryophytes. element. Proc. Natl Acad. Sci. USA 102, 18730–18735 (2005). Ann. Bot. 103, 999–1004 (2009). 41. Moreno-Risueno, M. Á., Martínez, M., Vicente-Carbajosa, J. & Carbonero, P. 12. Leliaert, F. et al. Phylogeny and molecular evolution of the green algae. Te family of DOF transcription factors: from green unicellular algae to Crit. Rev. Plant Sci. 31, 1–46 (2012). vascular plants. Mol. Genet. Genomics 277, 379–390 (2007). 13. Wickett, N. J. et al. Phylotranscriptomic analysis of the origin and 42. Crevillén, P. et al. Epigenetic reprogramming that prevents transgenerational early diversifcation of land plants. Proc. Natl Acad. Sci. USA 111, inheritance of the vernalized state. Nature 515, 587–590 (2014). E4859–E4868 (2014). 43. Liu, C., Lu, F., Cui, X. & Cao, X. Histone methylation in higher plants. 14. Ruhfel, B. R., Gitzendanner, M. A., Soltis, P. S., Soltis, D. E. & Burleigh, J. G. Annu. Rev. Plant Biol. 61, 395–420 (2010). From algae to angiosperms–inferring the phylogeny of green plants 44. Croce, R., Van Grondelle, R., Van Amerongen, H. & Van Stokkum, I. (Viridiplantae) from 360 plastid genomes. BMC Evol. Biol. 14, 23 (2014). Light Harvesting in Photosynthesis (CRC Press, 2018). 15. Gitzendanner, M. A., Soltis, P. S., Wong, G. K. S., Ruhfel, B. R. & Soltis, D. E. 45. Grossman, A. R., Bhaya, D., Apt, K. E. & Kehoe, D. M. Light-harvesting Plastid phylogenomic analysis of green plants: a billion years of evolutionary complexes in oxygenic photosynthesis: diversity, control, and evolution. history. Am. J. Bot. 105, 291–301 (2018). Annu. Rev. Genet. 29, 231–288 (1995). 16. Turmel, M. & Lemieux, C. Evolution of the plastid genome in green algae. 46. Dreyfuss, B. W. & Tornber, J. P. Assembly of the light-harvesting complexes Adv. Bot. Res. 85, 157–193 (2018). (LHCs) of photosystem II (monomeric LHC Iib complexes are intermediates 17. Stewart, K. D. & Mattox, K. R. Structural evolution in the fagellated cells of in the formation of oligomeric LHC IIb complexes). Plant Physiol. 106, green algae and land plants. BioSystems 10, 145–152 (1978). 829–839 (1994). 18. Melkonian, M. Structural and evolutionary aspects of the fagellar apparatus 47. Schmid, V. H. R. Light-harvesting complexes of vascular plants. Cell. Mol. Life in green algae and land plants. Taxon 31, 255–265 (1982). Sci. 65, 3619–3639 (2008). 19. Lemieux, C., Otis, C. & Turmel, M. Ancestral chloroplast genome in 48. Kunugi, M. et al. Evolution of green plants accompanied changes in Mesostigma viride reveals an early branch of green plant evolution. Nature light-harvesting systems. Plant Cell Physiol. 57, 1231–1243 (2016). 403, 649–652 (2000). 49. Worden, A. Z. et al. Green evolution and dynamic adaptations revealed 20. Wang, S. et al. Genomes of early-diverging streptophyte algae shed light on by genomes of the marine picoeukaryotes Micromonas. Science 324, plant terrestrialization. Nat. Plants 6, 95–106 (2020). 268–272 (2009). 21. Rodríguez-Ezpeleta, N., Philippe, H., Brinkmann, H., Becker, B. & 50. Cock, J. M. et al. Te Ectocarpus genome and the independent evolution of Melkonian, M. Phylogenetic analyses of nuclear, mitochondrial, and plastid multicellularity in brown algae. Nature 465, 617–621 (2010). multigene data sets support the placement of Mesostigma in the Streptophyta. 51. Roberts, K., Granum, E., Leegood, R. C. & Raven, J. A. Carbon acquisition by Mol. Biol. Evol. 24, 723–731 (2007). diatoms. Photosynth. Res. 93, 79–88 (2007). 22. Marin, B. & Melkonian, M. Mesostigmatophyceae, a new class of streptophyte 52. Kroth, P. G. et al. A model for carbohydrate metabolism in the diatom green algae revealed by SSU rRNA sequence comparisons. Protist 150, Phaeodactylum tricornutum deduced from comparative whole genome 399–417 (1999). analysis. PLoS ONE 3, e1426 (2008). 23. Guillou, L. et al. Diversity of picoplanktonic prasinophytes assessed by 53. Radakovits, R. et al. Draf genome sequence and genetic transformation direct nuclear SSU rDNA sequencing of environmental samples and novel of the oleaginous alga Nannochloropis gaditana. Nat. Commun. 3, isolates retrieved from oceanic and coastal marine ecosystems. Protist 155, 686 (2012). 193–214 (2004). 54. Palenik, B. et al. Te tiny eukaryote Ostreococcus provides genomic insights 24. Marin, B. & Melkonian, M. Molecular phylogeny and classifcation of into the paradox of plankton speciation. Proc. Natl Acad. Sci. USA 104, the Mamiellophyceae class. nov. (Chlorophyta) based on sequence 7705–7710 (2007). comparisons of the nuclear- and plastid-encoded rRNA operons. Protist 161, 55. Derelle, E. et al. Genome analysis of the smallest free-living eukaryote 304–336 (2010). Ostreococcus tauri unveils many unique features. Proc. Natl Acad. Sci. USA 25. Lemieux, C., Otis, C. & Turmel, M. Six newly sequenced chloroplast genomes 103, 11647–11652 (2006). from prasinophyte green algae provide insights into the relationships among 56. Moreau, H. et al. Gene functionalities and genome structure in Bathycoccus prasinophyte lineages and the diversity of streamlined genome architecture in prasinos refect cellular specializations at the base of the green lineage. picoplanktonic species. BMC Genomics 15, 857 (2014). Genome Biol. 13, R74 (2012).

1230 Nature Ecology & Evolution | VOL 4 | September 2020 | 1220–1231 | www.nature.com/natecolevol Nature Ecology & Evolution Articles

57. Jouenne, F. et al. Prasinoderma singularis sp. nov. (, 83. Cho, S. H. et al. Elucidation of the biosynthetic pathway of vitamin B groups Chlorophyta), a solitary coccoid prasinophyte from the South-East Pacifc and potential secondary metabolite gene clusters via genome analysis of a Ocean. Protist 162, 70–84 (2011). marine bacterium Pseudoruegeria sp. M32A2M. J. Microbiol. Biotechnol. 30, 58. Blanc, G. et al. Te Chlorella variabilis NC64A genome reveals adaptation to 505–514 (2020). photosymbiosis, coevolution with viruses, and cryptic sex. Plant Cell 22, 84. Karimi, E. et al. Genome sequences of 72 bacterial strains isolated from 2943–2955 (2010). Ectocarpus subulatus: a resource for algal microbiology. Genome Biol. Evol. 59. Fagan, R. P. & Fairweather, N. F. Biogenesis and functions of bacterial 12, 3647–3655 (2020). S-layers. Nat. Rev. Microbiol. 12, 211–222 (2014). 85. Liang, H. et al. Phylogenomics provides new insights into gains and losses of 60. Seltmann, G. & Holst, O. Te Bacterial Cell Wall (Springer Science & selenoproteins among Archaeplastida. Int. J. Mol. Sci. 20, 3020 (2019). Business Media, 2013). 86. McFadden, G. I. & Melkonian, M. Use of Hepes bufer for microalgal culture 61. Lovering, A. L., Safadi, S. S. & Strynadka, N. C. J. Structural perspective media and fxation for electron microscopy. Phycologia 25, 551–557 (1986). of peptidoglycan biosynthesis and assembly. Annu. Rev. Biochem. 81, 87. Johnson, M. T. J. et al. Evaluating methods for isolating total RNA and 451–478 (2012). predicting the success of sequencing phylogenetically diverse plant 62. van Baren, M. J. et al. Evidence-based green algal genomics reveals marine transcriptomes. PLoS ONE 7, e50226 (2012). diversity and ancestral characteristics of land plants. BMC Genomics 17, 88. Sahu, S. K., Tangaraj, M. & Kathiresan, K. DNA extraction protocol for 267 (2016). plants with high levels of secondary metabolites and polysaccharides without 63. Miyashita, H., Ikemoto, H., Kurano, N., Miyachi, S. & Chihara, M. using liquid nitrogen and phenol. ISRN Mol. Biol. 2012, 205049 (2012). Prasinococcus capsulatus gen. et sp. nov., a new marine coccoid prasinophyte. 89. Marin, B., Palm, A., Klingberg, M. & Melkonian, M. Phylogeny and J. Gen. Appl. Microbiol. 39, 571–582 (1993). taxonomic revision of plastid-containing euglenophytes based on SSU rDNA 64. Hasegawa, T. et al. Prasinoderma coloniale gen. et sp. nov., a new pelagic sequence comparisons and synapomorphic signatures in the SSU rRNA coccoid prasinophyte from the Western Pacifc Ocean. Phycologia 35, secondary structure. Protist 154, 99–145 (2003). 170–176 (1996). 90. Nevers, Y. et al. Insights into ciliary genes and evolution from multi-level 65. Sieburth, J. M., Keller, M. D., Johnson, P. W. & Myklestad, S. M. Widespread phylogenetic profling. Mol. Biol. Evol. 34, 2016–2034 (2017). occurrence of the oceanic ultraplankter, Prasinococcus capsulatus 91. De Clerck, O. et al. Insights into the evolution of multicellularity from the sea (Prasinophyceae), the diagnostic “Golgi‐decapore complex” and the newly lettuce genome. Curr. Biol. 28, 2921–2933 (2018). described polysaccharide “capsulan”. J. Phycol. 35, 1032–1043 (1999). 92. Cheng, S. et al. 10KP: a phylodiverse genome sequencing plan. Gigascience 7, 66. Yang, P. & Smith, E. F. in Te Chlamydomonas Sourcebook (ed. Witman, G. giy013 (2018). B.) 209–234 (Elsevier, 2009). 67. Jivan, A., Earnest, S., Juang, Y. C. & Cobb, M. H. Radial spoke protein 3 is a Acknowledgements mammalian protein kinase A-anchoring protein that binds ERK1/2. J. Biol. We thank G. Günther (http://www.mikroskopia.de/index.html) for microscopic images Chem. 284, 29437–29445 (2009). of P. coloniale. Financial support was provided by the Shenzhen Municipal Government 68. Armbrust, E. V. et al. Te genome of the diatom Talassiosira pseudonana: of China (grant no. JCYJ20151015162041454) and the Guangdong Provincial Key ecology, evolution, and metabolism. Science 306, 79–86 (2004). Laboratory of Genome Read and Write (grant no. 2017B030301011). This work is part of 69. Chi, J., Parrow, M. W. & Dunthorn, M. Cryptic sex in Symbiodinium the 10KP project, and is supported by China National GeneBank. (Alveolata, Dinofagellata) is supported by an inventory of meiotic genes. J. Eukaryot. Microbiol. 61, 322–327 (2014). 70. Malik, S.-B. An expanded inventory of conserved meiotic genes provides Author contributions evidence for sex in Trichomonas vaginalis. PLoS ONE 3, e2879 (2008). H.Liu., M.M. and Y.V.P. conceived, designed and supervised the project. M.M., H.Liu., 71. Ramesh, M. A., Malik, S. B. & Logsdon, J. M. Jr A phylogenomic inventory of X.X., J.W., G.K.-S.W. and H.Y. provided resources and materials. Z.C. and S.K.S. meiotic genes: evidence for sex in Giardia and an early eukaryotic origin of developed the protocol for DNA extraction. S.Wittek., T.R. and B.Melkonian grew the meiosis. Curr. Biol. 15, 185–191 (2005). organisms to quantity and extracted DNA. Samples were sequenced by B.G.I.; L.L. and 72. Patil, S. et al. Identifcation of the meiotic toolkit in diatoms and exploration S.Wang generated the draft genome and performed the annotation. L.L., S.Wang, H.W., of meiosis-specifc SPO11 and RAD51 homologs in the sexual species S.K.S., B.Marin, H.Y., Y.X., H.Li., H.Liang, Z.L., S.C. and M.P. analysed data. S.Wang., Pseudo-nitzschia multistriata and Seminavis robusta. BMC Genomics 16, L.L., S.K.S. and M.M. wrote the paper. J.W., H.Y., X.L., H.Liu., M.M. and Y.V.P. revised 930 (2015). the manuscript. All authors read and revised the final version of the manuscript. 73. Fučíková, K., Pažoutová, M. & Rindi, F. Meiotic genes and sexual reproduction in the green algal class Trebouxiophyceae (Chlorophyta). Competing interests J. Phycol. 51, 419–430 (2015). The authors declare no competing interests. 74. Villeneuve, A. M. & Hillers, K. J. Whence meiosis? Cell 106, 647–650 (2001). 75. Grifth, G. R., Chandler, J. L. & Gholson, R. K. Studies on the de novo biosynthesis of NAD in Escherichia coli: the separation of the nadB gene Additional information product from the nadA gene product and its purifcation. Eur. J. Biochem. 54, Extended data is available for this paper at https://doi.org/10.1038/s41559-020-1221-7. 239–245 (1975). 76. Gaertner, F. H. & Shetty, A. S. Kynureninase-type enzymes and the evolution Supplementary information is available for this paper at https://doi.org/10.1038/ of the aerobic tryptophan-to-nicotinamide adenine dinucleotide pathway. s41559-020-1221-7. Biochim. Biophys. Acta Enzymol. 482, 453–460 (1977). Correspondence and requests for materials should be addressed to Y.V.P., M.M. or H.L. 77. Ternes, C. M. & Schönknecht, G. Gene transfers shaped the evolution Reprints and permissions information is available at www.nature.com/reprints. of de novo NAD+ biosynthesis in eukaryotes. Genome Biol. Evol. 6, 2335–2349 (2014). Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in 78. Nowack, E. C. M. et al. Gene transfers from diverse bacteria compensate for published maps and institutional affiliations. reductive genome evolution in the chromatophore of Paulinella Open Access This article is licensed under a Creative Commons chromatophora. Proc. Natl Acad. Sci. USA 113, 12214–12219 (2016). Attribution 4.0 International License, which permits use, sharing, adap- 79. Crof, M. T., Lawrence, A. D., Raux-Deery, E., Warren, M. J. & Smith, A. G. tation, distribution and reproduction in any medium or format, as long Algae acquire vitamin B12 through a symbiotic relationship with bacteria. as you give appropriate credit to the original author(s) and the source, provide a link to Nature 438, 90–93 (2005). the Creative Commons license, and indicate if changes were made. The images or other 80. Helliwell, K. E. Te roles of B vitamins in phytoplankton nutrition: new third party material in this article are included in the article’s Creative Commons license, perspectives and prospects. New Phytol. 216, 62–68 (2017). unless indicated otherwise in a credit line to the material. If material is not included in 81. Cooper, M. B. et al. Cross-exchange of B-vitamins underpins a mutualistic the article’s Creative Commons license and your intended use is not permitted by statu- interaction between Ostreococcus tauri and Dinoroseobacter shibae. ISME J. tory regulation or exceeds the permitted use, you will need to obtain permission directly 13, 334–345 (2019). from the copyright holder. To view a copy of this license, visit http://creativecommons. 82. Crof, M. T., Warren, M. J. & Smith, A. G. Algae need their vitamins. org/licenses/by/4.0/. Eukaryot. Cell 5, 1175–1183 (2006). © The Author(s) 2020

Nature Ecology & Evolution | VOL 4 | September 2020 | 1220–1231 | www.nature.com/natecolevol 1231 Articles Nature Ecology & Evolution

Extended Data Fig. 1 | A physical map of the P. coloniale genome. Outer ring: The 22 chromosomes were labeled from Chr1 to Chr22. Inner rings 2–5 (from outside to inside): Illumina sequencing depth colored in light green (y-axis min-max: 0–592). PacBio sequencing depth colored in light purple (y-axis min-max: 0–67). GC content of P. coloniale chromosomes in light blue (y-axis min-max: 0–80.0). The gene number distribution of P. coloniale colored in red (y-axis min-max: 0–38). The slide window of inner rings 2–5 is 5,000 bp. Inner rings 6–15: Genes shared between P. coloniale and other early-diverging viridiplant genomes, from outside to inside. M. viride, C. atmophyticus, K. nitens, C. braunii and M. endlicherianum (green), M. commoda, M. pusilla, B. prasinos, O. lucimarinus and O. tauri (blue).

Nature Ecology & Evolution | www.nature.com/natecolevol Nature Ecology & Evolution Articles

Extended Data Fig. 2 | See next page for caption.

Nature Ecology & Evolution | www.nature.com/natecolevol Articles Nature Ecology & Evolution

Extended Data Fig. 2 | The impact of a severely reduced taxon sampling in rRNA phylogenies on the placement of the Prasinodermophyta. (a). RAxML phylogeny of 23 Archaeplastida/Plantae, for which genome sequences have been determined. As an exception, the Gonium genome project did not cover both rRNA operons, and thus, Gonium was replaced by the closely related Yamagishiella. For similar reasons, Micromonas pusilla was replaced by M. bravo. The Prasinodermophyta, represented only by Prasinoderma coloniale, was resolved as sister to the (Mamiellophyceae) with maximal support. This artificial placement (that is Prasinoderma coloniale diverging within the Chlorophyta) gained high support by bootstrapping (numbers in red color). (b). Splitting the long branch of Prasinoderma coloniale by addition of Prasinococcus capsulatus did not change the artificial placement of the Prasinodermophyta, but reduced the bootstrap support for the artificial branches (numbers in red color). (c). When the long branch of the Mamiellales was subdivided by addition of Monomastix sp. and Pyramimonas parkeae, the Mamiellophyceae/Pyramimonadophyceae-clade diverged independently, and the Prasinodermophyta attained a basally diverging position within Viridiplantae. However, the support for the basal divergence of the Prasinodermophyta was relatively low (numbers in blue color). (d). Further addition of only two taxa, Dolichomastix tenuilepis and Cymbomonas tetramitiformis, was sufficient to raise the bootstrap support for the monophyly of the Chlorophyta (to the exclusion of Prasinodermophyta; 94%), and the monophyly of Chlorophyta+Streptophyta (again to the exclusion of Prasinodermophyta; 89%) to high values (numbers in blue color), comparable to the 109-taxa rRNA phylogeny (Fig. 1c), and the genome/transcriptome tree (Fig. 1b). Taxon sampling for resolving the phylogenetic position of the Prasinodermophyta is thus saturated with only 28 sequences of Archaeplastidae/Plantae.

Nature Ecology & Evolution | www.nature.com/natecolevol Nature Ecology & Evolution Articles

Extended Data Fig. 3 | Comparison of genome characteristics across Viridiplantae. Genome size, average gene size, the percentage of the coding sequence, average gene density, average exon number per gene and total exon number among early-diverging lineages of Chlorophyta and Streptophyta compared to P. coloniale.

Nature Ecology & Evolution | www.nature.com/natecolevol Articles Nature Ecology & Evolution

Extended Data Fig. 4 | The phylogenetic tree of WRKY domain. Prasinoderma’s WRKY domain is marked in light green color. WRKY domains I CTD and I NTD represent the C- and N-terminal domains of a single WRKY gene, each domain is monophyletic comprising both Streptophyta and Chlorophyta. This suggests that the common ancestor of Chlorophyta and Streptophyta had this configuration. Interestingly, P. coloniale has four gene copies with a total of six WRKY domains (Supplementary Fig. 13). Two of the gene copies display both N- and C-terminal WRKY domains, the other two have only N-terminal WRKY domains. The phylogenetic tree (Supplementary Fig. 13) placed three WRKY domains in clade I CTD (two C-terminal and one N-terminal WRKY domain), the other N-terminal WRKY domains of P. coloniale could not be positioned in one of the 8 WRKY domain subfamilies. We suggest that the I CTD subfamily is ancestral in the Viridiplantae and the N-terminal WRKY domains originated by domain duplication and shuffling.

Nature Ecology & Evolution | www.nature.com/natecolevol nature research | reporting summary

Corresponding author(s): Huan Liu

Last updated by author(s): Apr 30, 2020 Reporting Summary Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency in reporting. For further information on Nature Research policies, see Authors & Referees and the Editorial Policy Checklist.

Statistics For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section. n/a Confirmed The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly The statistical test(s) used AND whether they are one- or two-sided Only common tests should be described solely by name; describe more complex techniques in the Methods section. A description of all covariates tested A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)

For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted Give P values as exact values whenever suitable.

For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated

Our web collection on statistics for biologists contains articles on many of the points above. Software and code Policy information about availability of computer code Data collection Paired-end libraries with insert sizes of 170 bp, 250 bp, 2 kb, 5 kb, 10 kb and 20 kb were constructed following standard Illumina protocols. The libraries were sequenced on an Illumina HiSeq 2000/4000. A total of 179Gb (about 8885.94X) paired-end data were generated for Prasinoderma coloniale (CCMP 1413). Besides, 7.4 Gb Pacbio long reads were generated by Sequel II platform.

For Illumina sequencing, we considered two ways of library construction. The rRNA-depleted RNA library was constructed using the ribo- zero rRNA removal kit (plant) (Illumina, American) following the manufacturer’s protocol, while the poly (A)-selected RNA library was constructed using the ScriptSeq Library Prep kit (Plant leaf) (Illumina, American) following the manufacturer’s protocol.

Data analysis The list of Software used in this study are as follows:

CLC Assembly Cell (version 5.0.1) Pairfq (version 0.16.0) SOAPfilter (version 2.2) fastp (version v0.20.1) Kmerfreq (version 1.0) Jellyfish (version v2.3.0) SPAdes (version 3.10.1) October 2018 SSPACE (version 3.0) GapCloser (version 1.12) MeDuSa (version 1.6) NextDenovo (version v2.2) NextPolish (version v1.1.0) BUSCO (version3) Soap (version 2.21) blat (v36)

1 Bridger_r2014-12-01

Trinityrnaseq (version 2.1.1) nature research | reporting summary Tophat2 (version 2.1.0) RepeatModeler (version 1.0.8) GenomeTools (version 1.5.8) MITE-hunter LTRharvest PASApipeline-2.1.0 AUGUSTUS (version 3.2.3) GeneMark (version 1.0) MAKER (version 2.31.8) SNAP (version 2006-07-28) Samtools (version 0.1.19) blast-2.2.26 ncbi-blast-2.2.31+ Blast2go (version 2.5.0) InterProScan 5.28-67.0 OrthoFinder (version 1.1.8) MAFFT (version 7.310) RAxML (version 8.2.4) IQ-tree (version 1.6.1) ASTRAL (version 4.11.1) For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information. Data Policy information about availability of data All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: - Accession codes, unique identifiers, or web links for publicly available datasets - A list of figures that have associated raw data - A description of any restrictions on data availability

The whole genome assemblies for P. coloniale in this study are deposited at DDBJ/ENA/GenBank under the accession numbers of RQSC00000000. Those data are also available in the CNGB Nucleotide Sequence Archive (CNSA: http://db.cngb.org/cnsa; accession number CNA0002354).

Field-specific reporting Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection. Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf

Life sciences study design All studies must disclose on these points even when the disclosure is negative. Sample size Axenic cultures of Prasinoderma coloniale (CCMP 1413) were obtained from the Culture Collection of Algae at the University of Cologne and grown in a modified ASP12 culture medium (http://www.ccac.uni-koeln.de/).

Data exclusions The reads with low quality are more likely to contain errors, which might complicate the assembly process, and were excluded. Detailed criteria are provided in the subsection of Method "Genome sequencing and assembly"

Replication NA

Randomization No randomization is required for our experiment.

Blinding Blind experiment is not required for our work. October 2018 Reporting for specific materials, systems and methods We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response.

2 Materials & experimental systems Methods nature research | reporting summary n/a Involved in the study n/a Involved in the study Antibodies ChIP-seq Eukaryotic cell lines Flow cytometry Palaeontology MRI-based neuroimaging Animals and other organisms Human research participants Clinical data October 2018

3