<<

Letter https://doi.org/10.1038/s41586-019-1398-6

Non-photosynthetic predators are sister to red Ryan M. R. Gawryluk1,3,5*, Denis V. Tikhonenkov1,2,5*, Elisabeth Hehenberger1,4, Filip Husnik1, Alexander P. Mylnikov2 & Patrick J. Keeling1*

Rhodophyta () is one of three lineages of Archaeplastida1, complete statistical support (Fig. 2a, b and Extended Data Fig. 4a, b), a supergroup that is united by the primary endosymbiotic origin although approximately unbiased tests were unable to distinguish of in eukaryotes2,3. Red algae are a diverse and - between monophyly or paraphyly (P = 0.6693 and rich group, members of which are typically photoautotrophic, P = 0.3397, respectively). To examine the possibility that long-branch but are united by a number of highly derived characteristics: they attraction affected the position of Rhodelphis, we carried out fast-site have relatively small intron-poor genomes, reduced metabolism removal analyses, which showed that support for the sisterhood of and lack cytoskeletal structures that are associated with motility, Rhodelphis and red algae remained high for both 153/253 (Extended flagella and centrioles. This suggests that marked gene loss occurred Data Fig. 3c) and 151/253 datasets (Fig. 2c). To test whether mixed around their origin4; however, this is difficult to reconstruct gene ancestry (for example, from horizontal gene transfer) affected because they differ so much from the other archaeplastid lineages, the phylogenetic placement of Rhodelphis, we calculated internode and the relationships between these lineages are unclear. Here we certainty values6 from 253 bootstrapped single-gene trees and the 50 trees describe the novel eukaryotic Rhodelphidia and, using with the highest relative tree certainty. These analyses showed a degree phylogenomics, demonstrate that it is a closely related sister to red of conflict that is similar to other ancient but well-established rela- algae. However, the characteristics of the two Rhodelphis species tionships (such as monophyly) and that the internode described here are nearly opposite to those that define red algae: certainty scores for the Rhodelphis and red algae bipartition increase they are non-photosynthetic, predators with gene-rich in the 50 best-supported trees (Extended Data Fig. 5a, b), which also genomes, along with a relic genome-lacking primary that recover Rhodelphis as sisters to the red algae in a concatenated phylog- probably participates in haem synthesis. Overall, these findings eny (Extended Data Fig. 6). Coalescent species trees7 estimated from alter our views of the origins of Rhodophyta, and Archaeplastida the 253 and 50 single-gene tree datasets also recover Rhodelphis and red as a whole, as they indicate that mixotrophic feeding—that algae as sisters, with full support (Extended Data Fig. 7a, b). is, a combination of predation and phototrophy—persisted well into Most Archaeplastida are photoautotrophic and phagotrophy is very the evolution of the group. rare. However, phagotrophy must have existed for the archaeplastid Two previously undescribed eukaryovorous , Rhodelphis ancestor to take up the plastid, and must have persisted at least until limneticus and Rhodelphis marinus (see Supplementary Information the protoplastid became a reliable source of both energy and nutrients. for taxonomic diagnosis), were isolated from a freshwater lake and This suggests a key mixotrophic intermediate stage; however, because marine coral sand, respectively. Rhodelphis are 10–13 μm, oval or the few known phagotrophic archaeplastids have been interpreted as tapered, slightly flattened cells with two subapical heterodynamic secondarily derived, it has been widely assumed that phagotrophy was flagella (Fig. 1a–f and Extended Data Figs. 1a, 2a, b). Characteristic lost early in the evolution of archaeplastids8. We therefore characterized morphological features include umbrella-shaped glycostyles on the the Rhodelphis genome and transcriptomes further to investigate the surface and (Fig. 1g, h, s and Extended Data Figs. 1d, 2c–e), ancestral states of phagotrophy and other characteristics that are seem- perpendicularly oriented flagellar basal bodies (Fig. 1i, j and Extended ingly absent from red algal and archaeplastid ancestors. Altogether, Data Figs. 1e, k, 1, 2g, h) with outgoing striated structures and at least these analyses suggest that Rhodelphis possess larger and more gene- two fibrils (Fig. 1j–l and Extended Data Fig. 2e), one narrow and two rich nuclear genomes than most red algae (Extended Data Table 1a–c) wide microtubular bands (Fig. 1l–q and Extended Data Fig. 1h–l), a and genes with many more introns (nearly 40,000 spliceosomal introns flagellar transition zone with a transverse plate at the cell surface and were identified in genome–transcriptome comparisons in R. limneti- a proximal diaphragm through which a central pair of flagellar micro- cus). But the most notable differences between Rhodelphis and red algae tubules surrounded by a cylinder passes (Fig. 1q, r and Extended Data are in gene content: Rhodelphis possess genes that are associated with Fig. 1e–h), a sac-shaped double-layered smooth endoplasmic reticulum centrioles, autophagy and synthesis of glycosylphosphatidylinositol, (Fig. 1 s–u and Extended Data Fig. 1o) and mitochondria with tubular all of which are absent from red algae9. Notably, genes that encode cristae (Fig. 1 s–u and Extended Data Fig. 2g, h, j). Plastids were not flagellar proteins are absent from red algae, whereas Rhodelphis encode observed. homologues of 209 out of 361 high-confidence Chlamydomonas flagel- To establish the evolutionary position of Rhodelphis, we sequenced lar proteins10, consistent with our microscopic observations. transcriptomes from cultures and manually isolated cells, and generated Rhodelphis engulf whole and eukaryotic prey at the posterior a concatenated 153-taxon/253-protein supermatrix (153/253 dataset; end (Extended Data Fig. 1r–t and Supplementary Video 1), but a dis- 56,312 sites). Maximum-likelihood and Bayesian analyses recovered tinct feeding apparatus is not evident. Phagotrophy in Rhodelphis there- Rhodelphis as a well-supported sister to the red algae, with ultrafast fore differs from feeding in some prasinophytes (the only other known bootstrap support of 97%, Shimodaira–Hasegawa-like approximate class in which phagotrophy occurs within Archaeplastida), which use likelihood ratio test (SH-aLRT) support of 0.99 and a Bayesian poste- a mouth-like opening, a tubular channel and a large permanent vac- rior probability of 0.98 (Extended Data Fig. 3a, b). Analyses of a second uole to engulf, transport and digest bacterial cells11. Specific genetic supermatrix without the picozoan MS584-115 and (both of markers of phagotrophy are difficult to define; however, genome-level which had poor datasets; 151-taxon/253-protein supermatrix (151/253 predictive models12 indicate that the genetic repertoire of Rhodelphis dataset), 56,530 sites), recovered Rhodelphis and red algae with is consistent with phagocytotic feeding (Extended Data Fig. 8).

1Department of Botany, University of British Columbia, Vancouver, British Columbia, Canada. 2Papanin Institute for Biology of Inland Waters, Russian Academy of Sciences, Borok, Russia. 3Present address: Department of Biology, University of Victoria, Victoria, British Columbia, Canada. 4Present address: GEOMAR – Helmholtz Centre for Ocean Research Kiel, Division of Experimental 5 , Kiel, Germany. These authors contributed equally: Ryan M. R. Gawryluk, Denis V. Tikhonenkov. *e-mail: [email protected]; [email protected]; [email protected]

NAtUre| www.nature.com/nature RESEARCH Letter

a b c d cv cv cv

n

e g h i gl bb2 gl tp bb1

j fb k l ss f ss sm wmb1 af bb2 bb1 st bb1 bb2 fb mnwmb2 o pf wmb1

rs fv

wmb1 bb2

sm wmb2

p q dp cmt tp

bb2 ob

nmb mrt wmb2 r s tu dp ser ser ob mt cr bb1 mt cmt cl dc

n tp n ser rgl

gl

Fig. 1 | Cell morphology of R. limneticus. a–c, Living cells, visualized af, anterior flagellum; cl, cylinder; cv, contractile ; cr, cristae; cmt, by light microscopy. d, Cyst, visualized by light microscopy. e, Scanning central ; dc, dark condensation; dp, diaphragm; fb, fibril; fv, electron microscopy image highlighting the flagella. f–u, Cells, food vacuole; gl, glycostyles; bb1, of posterior flagellum; bb2, visualized by transmission electron microscopy. f, Longitudinal section. basal body of anterior flagellum; mt, ; mrt, ; g, Glycostyles can be seen on the cell surface. h, Transverse section of n, nucleus; nmb, narrow microtubular band; ob, osmiophilic body; pf, the posterior flagellum covered with glycostyles. i–l, Arrangement of posterior flagellum; rgl, rudiments of glycostyles; rs, reserve substance; ser, basal bodies, connecting structures and satellites. m–p, Arrangement sac of smooth endoplasmic reticulum; sm, single microtubules; ss, striated of microtubular bands, microtubules and satellites. q, r, Structure of the structure; st, satellite of basal body; tp, transverse plate; wmb1, wide flagellar transition zone. s, t, Transverse sections of the cell showing the microtubular band 1; wmb2, wide microtubular band 2. Scale bars, 10 μm sac-shaped smooth endoplasmic reticulum, nucleus and mitochondria. (a–c), 5 μm (d, e), 1 μm (f), 0.2 μm (g, h, n–p, r), 0.5 μm (i–m, q, u) and u, Area near the endoplasmic reticulum showing a single mitochondrion 2 μm (s, t). These experiments were repeated 50 (a–d), 3 (e) and 7 (f–u) with dark condensations and vesicles with rudiments of glycostyles. times, with similar results.

Taken together, the cellular motility and phagotrophic feeding by and TOC75 (Extended Data Fig. 9b–e), as well as many proteins with a Rhodelphis and the ancestral photosynthetic capacity of red algae indi- putative plastid function (Fig. 3a and Supplementary Table 1). Notably, cates that their common ancestor was mixotrophic. plastid proteins of Rhodelphis encode leader sequences that are similar to Although an ancestor of Rhodelphis must have been photosynthetic, archaeplastid transit peptides (Extended Data Fig. 9a); they do not have no plastids were observed using microscopy, so we searched for genetic bipartite leader sequences or specific homologues of SELMA complex evidence of a relic plastid, as complete loss of the plastid is exceedingly subunits15 that are indicative of protein targeting to complex plastids rare13,14. In archaeplastids, nucleus-encoded proteins are targeted to by the endoplasmic reticulum. Taken together, the evidence from the plastids through N-terminal transit-peptide leaders that are recognized analysis of the phylogenomics, plastid-targeting leaders and plastid by TIC/TOC import complexes. We identified homologues of several import machinery is consistent with the conclusion that Rhodelphis plastid import proteins in Rhodelphis, including TIC20, TIC22, TIC32 have a primary plastid as with other members of Archaeplastida.

NAtUre| www.nature.com/nature Letter RESEARCH

a b Fig. 2 | The evolutionary position of Rhodelphis. a, b, Bayesian Chloroplastida 20 Stramenopila (a; CAT + GTR) and maximum-likelihood (b; 151/253 dataset; LG + C60 + F + G4)21 analyses place Rhodelphis as a sister to red Glaucophyta –/81 algae (Rhodophyta). Bayesian analyses also recover the monophyly of Alveolata Archaeplastida, whereas maximum-likelihood analyses do not. Black dots denote full statistical support for the Bayesian tree (a; Bayesian posterior – probability = 1.0) or maximum-likelihood tree (b; maximum-likelihood Rhodophyta ultrafast bootstrap = 100%/SH-aLRT = 1.0); support values below 1.0 or Haptophyta Rhodelphis limneticus 100% are shown in cases in which either value is below this value; support Centrohelida Rhodelphis marinus values <0.7 or 70% are not shown (indicated by a ‘−’). Complete Bayesian 0.82 Ancoracysta twista and maximum-likelihood trees for 151/253 and 153/253 datasets are shown in Extended Data Figs. 3 and 4, respectively. c, Bootstrap support Chloroplastida c Opisthokonta Rhodelphis + red algae for maximum-likelihood trees (PROT + CAT + LG + F) after progressive Glaucophyta Archaeplastida removal of the fastest-evolving amino acid sites (151/253 supermatrix) Cryptista + green shows that the Rhodelphis and red algae relationship is robust, but that and Rhodophyta Archaeplastida paraphyly (Cryptista, and green plants and glaucophytes) chaeplastid a 100 may be the result of long branch attraction. Support for Opisthokonta Rhodelphis limneticus Ar monophyly serves as a control for the presence of sufficient information Rhodelphis marinus for phylogenomic inference. A parallel analysis for 153/253 is shown in 75 Cryptista Extended Data Fig. 3c.

Discoba 50 is found in numerous , fungi and protists, but not in archae- Opisthokonta 25 plastids (Fig. 3a). Furthermore, Rhodelphis also lack the plastid methy- Thecamonas trahens Bootstrap support (%) lerythritol phosphate (MEP) isoprenoid biosynthetic pathway, and use Breviate 0.77 the cytosolic mevalonate (MVA) pathway instead (Fig. 3a). The MVA 0 0.72 pathway is patchily distributed among red algae, and has been inde- sp. 01020304050 9 Thousand sites removed pendently lost numerous times , whereas the MEP pathway is found in most archaeplastids that have been examined to date. The distribu- tion of both pathways highlights the potential for long-term functional Consistent with their lack of pigmentation, Rhodelphis encode almost redundancy to shape gene loss and metabolic reorganization. no proteins that are involved in . The exceptions are By contrast, most Rhodelphis haem biosynthetic enzymes are prob- ferredoxin and ferredoxin NADP+ reductase, which probably partic- ably targeted to plastids, with the exception of the first step that is cat- ipate in FeS cluster assembly through the plastidial SUF-type (sulfur alysed by δ-aminolevulinic acid synthase (ALAS), which is probably a assimilation) biosynthesis pathway (Fig. 3a). A mitochondrial-type iron mitochondrial protein (Fig. 3a); this is probably the major reason that sulfur cluster (ISC) pathway is also present in Rhodelphis, which sug- Rhodelphis plastids have not been lost. All of the putatively plastid- gests that the SUF system supports only plastid FeS proteins. targeted haem proteins are of the typical plastid-type (phylogenetic We found no evidence for the type II fatty acid biosynthesis pathway trees are available through the Dryad Digital Repository), except fer- that is typical of plastids. Instead, Rhodelphis synthesize fatty acids in rochelatase (HemH), which branches with homologues from some the cytosol, through a multidomain type I fatty acid synthase, which red algae and that are in turn nested within a group

Fig. 3 | Genomic insights into Rhodelphis cell ab biology. a, Rhodelphis genome and transcriptome CPN60 LG + R9 analyses reveal a relic plastid that generates Flagella 0.3 87 FeS clusters and probably cooperates with (swimming) mitochondria in haem biosynthesis, cytosolic isoprenoid and fatty acid synthesis, extensive nuclear mRNA splicing (coloured lines on Nucleus 88 chromosomes), (Extended Data Basal Gene-rich Fig. 8 and Supplementary Video 1), basal bodies bodies Intron-rich 94 and flagella (a long flagellum was shortened to fit). 73 Note that subcellular structures are not drawn to 87 88 scale and the mitochondrial genome has not been Plastid Mitochondrion Non- verified to be circular. b, Maximum-likelihood Tubular photosynthetic ALAD cristae R. limneticus mtDNA phylogeny of Cpn60 with a predicted N-terminal PBGD R. marinus UROS transit peptide (Extended Data Fig. 9), indicating UROD e– SCoA Galdieria sulphuraria FNR FDX CPOX the presence of a primary plastid in Rhodelphis, PPOX ALAS Chondrus crispus NifU SufE FeCH Gly Cytosol Pyropia yezoensis 86 although none were observed by microscopy. SufB SufC Haem Porphyra purpurea SufS SufD Isoprenoid Porphyra umbilicalis Rhodelphis Cpn60 groups basal to homologues biosynthesis Porphyridium aerugineum FeS from red algae and other with plastids TIC acetyl-coA Porphyridium cruentum TOC 88 Erythrolophus madagascariensis derived from red algae (coloured wedges). Type I mevalonate Erythrolobus australicus fatty acid 75 Rhodosorus marinus Other putative plastid proteins are presented Madagascaria erythrocladiodes biosynthesis IPP Phagocytosis (FASI) Compsopogon coeruleus in Supplementary Table 1. IPP, isopentenyl (eukaryotic prey) Rhodella maculata Rhodella maculata pyrophosphate. Eukaryotic groups are indicated 91 94 Erythrolobus australicus Mastigonemes Exocytosis Porphyridium cruentum by coloured taxon names/collapsed : green, Digestion Cyanidioschyzon merolae (lysosomes) ; red, red algae; grey, glaucophytes; 90 turquoise, cryptophytes; pink, ;

78 dark blue, ; yellow, dinoflagellates; orange, chrompodellids and apicomplexans; brown, , amoebozoans, apusozoans, ancyromonads and . Taxa or clades outlined in black are prokaryotic.

NAtUre| www.nature.com/nature RESEARCH Letter of bacterial homologues. Other red algae and lineages with red algal Received: 27 February 2019; Accepted: 13 June 2019; plastids—such as , cryptophytes and ochrophytes—have a Published online xx xx xxxx. typical cyanobacterial HemH16. This again suggests that there is redun- dancy of HemH types in the common ancestor of Rhodelphis and red 1. Burki, F. The eukaryotic tree of from a global phylogenomic perspective. Cold Spring Harb. Perspect. Biol. 6, a016147 (2014). algae. 2. Archibald, J. M. The puzzle of plastid evolution. Curr. Biol. 19, R81–R88 (2009). Despite finding many nucleus-encoded putatively plastid-targeted 3. Keeling, P. J. The number, speed, and impact of plastid endosymbioses in proteins, we found no evidence that supports the existence of a plastid eukaryotic evolution. Annu. Rev. Biol. 64, 583–607 (2013). 4. Qiu, H., Price, D. C., Yang, E. C., Yoon, H. S. & Bhattacharya, D. Evidence of genome. No plastid DNA is present in the genomic datasets of ancient genome reduction in red algae (Rhodophyta). J. Phycol. 51, 624–636 R. limneticus, which were generated from over 300 million paired reads (2015). from both cultures and single cells. By contrast, mitochondrial DNA 5. Yoon, H. S. et al. Single-cell genomics reveals organismal interactions in was readily identified; the genome could not be unambiguously assem- uncultivated . Science 332, 714–717 (2011). 6. Salichos, L. & Rokas, A. Inferring ancient divergences requires genes with strong bled, but is >100 kilobases and individual contigs were connected in a phylogenetic signals. Nature 497, 327–331 (2013). single contig map. Similarly, no nucleus-encoded components of plastid 7. Zhang, C., Rabiee, M., Sayyari, E. & Mirarab, S. ASTRAL-III: polynomial time genome replication, gene expression or translation systems were identi- species tree reconstruction from partially resolved gene trees. BMC Bioinformatics 19, 153 (2018). fied in any Rhodelphis dataset. Notably, many of the nucleus-encoded, 8. Spiegel, F. W. Contemplating the frst Plantae. Science 335, 809–810 (2012). putatively plastid-targeted Rhodelphis proteins that we did identify are 9. Qiu, H., Yoon, H. S. & Bhattacharya, D. Red algal phylogenomics provides a encoded in plastid genomes of red algae. Taken together, we interpret robust framework for inferring evolution of key metabolic pathways. PLoS Curr. 8, https://doi.org/10.1371/currents.tol.7b037376e6d84a1be34af756a4d these observations as strong evidence for the complete loss of plastid 90846 (2016). DNA in Rhodelphis. This has only been reported in a few non- 10. Pazour, G. J., Agrin, N., Leszyk, J. & Witman, G. B. Proteomic analysis of a photosynthetic plastids17,18 and is in contrast to the gene-rich nature eukaryotic . J. Cell Biol. 170, 103–113 (2005). 19 11. Maruyama, S. & Kim, E. A modern descendant of early green algal phagotrophs. of the plastid genomes of red algae . Curr. Biol. 23, 1081–1084 (2013). In conclusion, phylogenomic analyses strongly support the place- 12. Burns, J. A., Pittis, A. A. & Kim, E. Gene-based predictive models of trophic ment of Rhodelphis as a sister lineage to red algae. Rhodelphis are modes suggest Asgard are not phagocytotic. Nat. Ecol. Evol. 2, 697–704 (2018). flagellate predators with primary, non-photosynthetic plastids that 13. Gornik, S. G. et al. Endosymbiosis undone by stepwise elimination of the plastid are involved in haem biosynthesis; all of which indicates that the in a parasitic dinofagellate. Proc. Natl Acad. Sci. USA 112, 5767–5772 (2015). ancestor of Rhodelphis and red algae was very different from previ- 14. Xu, P. et al. The genome of hominis. Nature 431, 1107–1112 ous models of the ancestors of red algae. This ancestor was probably (2004). 15. Gould, S. B., Maier, U.-G. & Martin, W. F. Protein import and the origin of red a mixotrophic flagellate that obtained energy and nutrients from both complex plastids. Curr. Biol. 25, R515–R521 (2015). photosynthetic plastids and phagotrophy, which suggests that phag- 16. Oborník, M. & Green, B. R. Mosaic origin of the heme biosynthesis pathway in otrophy persisted within Archaeplastida until well after the diver- photosynthetic eukaryotes. Mol. Biol. Evol. 22, 2343–2353 (2005). 17. Smith, D. R. & Lee, R. W. A plastid without a genome: evidence from the gence of red algae from green plants and glaucophytes. The gene- and nonphotosynthetic green algal Polytomella. Plant Physiol. 164, intron-rich genomes and complex pattern of biosynthetic pathway 1812–1819 (2014). retention also provide insights into the potential for functional redun- 18. Fernández Robledo, J. A. et al. The search for the missing link: a relic plastid in ? Int. J. Parasitol. 41, 1217–1229 (2011). dancy to persist over considerable periods of evolutionary time, 19. Muñoz-Gómez, S. A. et al. The new red algal subphylum Proteorhodophytina followed by differential loss. Indeed, Rhodelphis reveals that the comprises the largest and most divergent plastid genomes known. Curr. Biol. absence of phagotrophy and many other characteristics of the 27, 1677–1684 (2017). 20. Lartillot, N., Lepage, T. & Blanquart, S. PhyloBayes 3: a Bayesian software Archaeplastida as a whole are due to multiple convergent losses rather package for phylogenetic reconstruction and molecular dating. Bioinformatics than an already established ancestral state. 25, 2286–2288 (2009). 21. Nguyen, L.-T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and efective stochastic algorithm for estimating maximum-likelihood phylogenies. Online content Mol. Biol. Evol. 32, 268–274 (2015). Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, Publisher’s note: Springer Nature remains neutral with regard to jurisdictional peer review information; details of author contributions and competing interests; claims in published maps and institutional affiliations. andstatements of data and code availability are available at at https://doi. org/10.1038/s41586-019-1398-6. © The Author(s), under exclusive licence to Springer Nature Limited 2019

NAtUre| www.nature.com/nature Letter RESEARCH

Methods R. limneticus DNA was extracted from cultures (containing R. limneticus, prey Data reporting. No statistical methods were used to predetermine sample size. and bacteria) using the MasterPure DNA Purification Kit (Epicentre) or obtained The experiments were not randomized and the investigators were not blinded to from three individually picked cells through whole-genome amplification using the allocation during experiments and outcome assessment. TruePrime Single Cell WGA kit v.2.0 (Expedeon) according to the manufacturer’s Cell isolation and culture establishment. R. marinus (clone Colp-29) was obtained instructions. Libraries were generated at The Centre of Applied Genomics and from near-shore marine coral sand off the coast of a small island, Bay Canh, 151-bp paired-end reads were sequenced on an Illumina HiSeq X. Whole-genome near Con Dao Island, South Vietnam (8° 39′ 58.6362″ N, 106° 40′ 49.7562″ E). amplified DNA (TruePrime Single Cell WGA kit v.2.0 (Expedeon)) was sequenced The sample was collected from a depth of 40 cm on 3 May 2015. on MinION, Oxford Nanopore Technologies using the Ligation Sequencing Kit R. limneticus (clone Colp-38) was obtained from a near-shore freshwater sample 1D (SQK-LSK108, Oxford Nanopore Technologies). Covaris shearing was omit- including organic debris, taken by Y. V. Dubrovsky (IEE NAS Ukraine) on 9 August ted to preserve long fragments. DNA was initially treated with T7 endonuclease 2016 from Lake Trubin (floodlands of Desna River, 51° 23′ 49.9986″ N, 32° 22′ to remove extremely branched DNA structures resulting from whole-genome 8.0004″ E), near Yaduty village, Chernigovskaya oblast, Ukraine. amplification. The samples were examined on the third, sixth and ninth day of incubation Sequencing dataset assembly and decontamination. Sequence quality and adaptor in accordance with previously described methods22. Following isolation by glass contamination of reads from transcriptomic datasets were assessed with FastQC27. micropipette, R. marinus and R. limneticus were propagated on the bodonids Reads were trimmed with Trimmomatic-0.32 ILLUMINACLIP28, with a maximum Procryptobia sorokini strain B-69 and Parabodo caudatus strain BAS-1, respec- of two mismatches, a palindromeClipThreshold of 30 and a simpleClipThreshold tively, which were grown in marine Schmalz–Pratt’s medium and spring water of 10. Low-quality sequences were discarded, using a sliding window of 4 bp, a (Aqua Minerale, PepsiCo or PC Natural Spring Water, President’s Choice) using minimum quality score of 25 and a minimum trimmed length of 35 bp. the bacterium Pseudomonas fluorescens as food23. R. limneticus is currently being A strand-specific R. marinus HiSeq transcriptome was assembled with Trinity stored in a collection of live protozoan cultures at the Papanin Institute for Biology v.2.0.629, with the --SS_lib_type flag set to RF. MiSeq transcriptomes of R. limne- of Inland Waters, Russian Academy of Sciences; however, R. marinus perished after ticus from culture or 20-cell preparations were assembled in essentially the same several months of cultivation. manner, but without strand specificity. Transdecoder was used to infer the most Light and electron microscopy. Light microscopy observations of R. limneticus likely open reading frame sequences, informed by blastp30 and hmmscan queries were made using a Zeiss AxioScope A.1 equipped with a differential interference of the Swissprot and Pfam databases, respectively (E-value cut-off = 1 × 10−5). contrast (DIC) water-immersion objective (63×) and an AVT HORN MC-1009/S CD-HIT31 was used to reduce the redundancy of the inferred protein dataset analogue video camera. Observations of R. marinus were made using a Zeiss by clustering proteins with ≥95% identity. Extensive in silico decontamination Axioplan 2 Imaging microscope equipped with a DIC objective (40×) and a Canon was performed to remove sequences derived from the eukaryotic prey P. sorokini XL H1S video camera. (R. marinus), P. caudatus (R. limneticus) and co-cultured bacteria. We used mega- For scanning electron microscopy, cells from a culture in exponential growth blast to identify transcripts that were ≥95% identical to sequences of previously phase were fixed with 2.5% glutaraldehyde (final concentration). The cells were generated P. sorokini and P. caudatus MiSeq transcriptome datasets, along with mounted on a glass coverslip coated with poly-l- for 30 min and subsequently previously generated HiSeq transcriptome datasets from protists that feed on rinsed three times with 0.1 M sodium cacodylate buffer (pH 7.34), which was P. sorokini or P. caudatus (that is, sequences that these datasets had in common diluted twice with spring water (PC Natural Spring Water, President’s Choice). were probably derived from P. sorokini or P. caudatus). Contaminating bacterial Next, cells were fixed in 1% osmium tetroxide for 1 h. The fixed cells were rinsed sequences were identified using megablast and blastp queries of the NCBI nt and three times with distilled water, 10 min each time, and dehydrated with a graded nr databases. Nucleotide sequences that were ≥80% identical to bacterial entries ethanol series from 30% to absolute ethanol (10 min per step), followed by 100% were removed. For a protein sequence from the Rhodelphis datasets to be classified hexamethyldisilazane (three times, 15 min each) and dried at 65 °C. Dry glass as bacterial, each of the top-15 blastp hits had to be most similar to a bacterial coverslips were mounted on aluminium stubs, coated with gold–palladium, and homologue and ≥70% identical to a bacterial protein. Assessment of transcriptome observed with a Hitachi S4700 scanning electron microscope (Hitachi High- completeness was performed by searching ‘’ protein datasets with BUSCO Technologies Corporation). v.3.0.132, using default parameters. For transmission electron microscopy, cells were centrifuged, fixed in a cocktail The R. limneticus genome was assembled from HiSeq X reads (either culture of 0.6% glutaraldehyde and 2% osmium tetroxide (final concentration) prepared or whole-genome-amplified DNA from single cells (WGA)) and MinION reads using a 0.1 M cacodylate buffer (pH 7.2) for freshwater cells, or Schmaltz–Pratt using SPAdes v.3.11.133 with kmer lengths of 21, 33, 55, 77, 99 and 121, and with medium for marine cells at 1 °C for 30–60 min and dehydrated in an alcohol and the ‘–sc’ flag activated for the single-cell assembly. For each of the WGA and culture acetone series (30, 50, 70, 96 and 100%; 20 min per step). Finally, cells were embed- genome assemblies, contigs from co-cultured contaminants (for example, kine- ded in a mixture of Araldite and Epon24. Ultrathin sections were obtained with toplastid prey and bacteria) were identified using Autometa34 and removed. As an LKB ultramicrotome. Transmission electron microscopy observations were expected, the culture dataset was heavily contaminated, whereas WGA contigs obtained using a JEM-1011 (JEOL) electron microscope. were predominantly from R. limneticus. Assessments of genome assembly were Preparation of libraries and sequencing. RNA and genomic DNA isolation. Сells performed using QUAST v.5.0.235. grown in clonal laboratory cultures were collected when the cultures had reached A search for putative spliceosomal introns in R. limneticus was performed by peak abundance and after the prey had been eaten (based on daily light microscopy aligning transcripts to the culture assembly, requiring a minimum 95% identity observations). Cells were collected by centrifugation (1,000g, room temperature) threshold, using GMAP v.2016-08-1636. Spliceosomal introns with GT/AG splice onto a 0.8-μm membrane of a Vivaclear mini column (Sartorius Stedim Biotech, boundaries were extracted from the GMAP output with a custom Python script VK01P042); this was done separately for RNA and DNA extractions. Total RNA and visualized with WebLogo37. The total proportion of transcripts from R. limne- was then extracted using an RNAqueous-Micro Kit (Invitrogen, AM1931) and ticus mapping to the nuclear genome sequence was determined with isoblat v.338. converted into cDNA using the Smart-Seq2 protocol25. Additionally, cDNA of Searches for a plastid genome were carried out by querying Rhodelphis tran- R. limneticus was obtained from 20 single cells using the Smart-Seq2 protocol: scriptome and genome datasets with red algal plastid-encoded and proteins, cells were manually picked from the culture using a glass micropipette and and by searching for rRNAs of plastidial/cyanobacterial affinity with phyloFlash transferred to a 0.2-ml thin-walled PCR tube containing 2 μl cell lysis buffer v.3.039. All plastid-type proteins that were found are probably encoded by DNA in (0.2% Triton X-100 and RNase inhibitor (Invitrogen)). Total DNA was extracted the nucleus and no plastidial/cyanobacterial rRNAs were found. from the filters using the MasterPure Complete DNA and RNA Purification Kit Phylogenomic dataset preparation and analysis. Construction of a phylogenomic (Epicentre, MC85200). supermatrix was performed essentially as previously described40, using nearly The small subunit (SSU) rRNA genes of R. marinus and R. limneticus were the same dataset. In brief, blastp was used to identify Rhodelphis homologues, amplified by polymerase chain reaction (PCR) using the general eukar- with an expect value threshold of ≤1 × 10−30. Alignments were generated with yotic primers GGF (CTTCGGTCATAGATTAAGCCATGC) and GGR MAFFT L-INS-I v.7.21241 and trimmed automatically with BMGE v.1.1242 based (CCTTGTTACGACTTCTCCTTCCTC) and 18SFU and 18SRU26, respectively. on the BLOSUM75 substitution matrix. Single-protein maximum-likelihood PCR products were subsequently cloned (R. marinus) or sequenced directly phylogenies—derived from 20 independent heuristic searches with RAxML (R. limneticus) using Sanger dideoxy sequencing. v.8.1.643 using the PROTGAMMALGF model—were used to screen for paralogues R. limneticus transcriptome sequencing was performed on the Illumina MiSeq and sequences that were probably derived from prey contamination. Individual platform with read lengths of 300 bp using the NexteraXT protocol (Illumina, trimmed alignments were concatenated with SCaFOs v.1.2.544, requiring a FC-131-1024) to construct paired-end libraries. R. marinus transcriptome sequenc- minimum of 15% coverage for inclusion. The final concatenated alignment ing was performed on the Illumina HiSeq platform (UCLA Clinical Microarray included 153 taxa, 253 proteins and 56,312 amino acid sites (153/253 dataset); Core) with read lengths of 100 bp using the KAPA stranded RNA-seq kit (Roche) R. marinus and R. limneticus were well-represented (R. marinus, 98% of genes to construct paired-end libraries. and 99% of sites; R. limneticus, 94% of genes and 93% of sites). Another alignment RESEARCH Letter was generated without the picozoan MS584-11 and Telonema, leaving 151 taxa, v.1.247 using a gap threshold of 20% and trees were calculated using FastTree48 with 253 proteins, and 56,530 sites (151/253 dataset); R. marinus and R. limneticus were default options. Resulting phylogenies and underlying alignments were manually again well represented (R. marinus, 98% of genes and 99% of sites; R. limneticus, inspected, and obvious contaminations and paralogues were removed. Cleaned 94% of genes and 93% of sites). sequence files as well as the pre-existing alignments with added R. marinus/ Maximum-likelihood phylogenomic tree reconstruction was performed using R. limneticus sequences were filtered using prequel v.1.0.249 to remove stretches IQ-TREE v.1.5.521 with the LG matrix combined with the C60 protein mixture of non-homologous characters and then aligned using the G-INS-I algorithm of 50 model and four gamma categories (that is, LG + C60 + F + G4). The results of MAFFT in combination with VSM , using an αmax of 0.6, to reduce over-alignment. 1,000 ultrafast bootstrap and SH-aLRT replicates are reported as a measure of Alignments were trimmed as above and final trees were reconstructed using statistical support for bipartitions. Bayesian analyses were carried out by run- IQTREE21, using ModelFinder51 to identify the best model for each alignment ning three independent Markov chain Monte Carlo chains with PhyloBayes based on the Bayesian information criterion. Node supports were calculated with MPI v.1.720, using an infinite mixture model and four discrete gamma cate- 1,000 ultrafast bootstrap (UFBoot) replicates52. gories (CAT + GTR + G4). Chains were run for more than 7,200 generations Comparison of Rhodelphis gene repertoire to red algae. We used the KEGG for both the 153/253 and 151/253 datasets, and the first 1,500 generations were Automatic Annotation Server53, using the bi-directional best hit, to functionally discarded as burn-in. As is frequently seen in large-scale phylogenomic analyses, annotate Rhodelphis genes, along with genes encoded by the red algae Cyanidioschyzon the chains failed to converge in each case; for the 153/253 dataset, maxdiff = 1 merolae, Galdieria sulphuraria, Chondrus crispus and Porphyra purpureum. and meandiff = 0.0128946, and for the 151/253 dataset, maxdiff = 1 and mean- The resulting KEGG Orthology assignments were used to infer the overlap or diff = 0.0204329. Bayesian posterior probabilities are reported as a measure of difference in gene repertoire between Rhodelphis and red algae. Similarly, we used statistical support for bipartitions. OrthoFinder v.2.0.054 to assess the overlap in orthologous gene sets between the The effect of fast-evolving sites on phylogenomic inference was examined by same species (Extended Data Table 1c). Genome-based assessments of Rhodelphis progressively removing the fastest-evolving sites in intervals of 3,000 and using trophic mode were done with R. marinus and R. limneticus Transdecoder proteins RAxML (PROT + CAT + LG + F model) to generate 100 rapid-bootstrap replicates using PredictTrophicMode_Tool.R12. for each alignment. The fastest-evolving sites were identified using AgentSmith, Reporting summary. Further information on research design is available in based on the tree topologies generated in the IQ-TREE LG + C60 + F + G4 153- the Nature Research Reporting Summary linked to this paper. and 151-taxon trees. The support for the sister relationship of Rhodelphis and red algae was assessed for both trees; support for Archaeplastida monophyly versus Data availability Cryptista, glaucophytes and green plants (Cryptista and GG) was similarly tested Raw transcriptome and genome reads from R. limneticus and R. marinus are deposited for both datasets. Support for , Rhodelphis and red algae was assessed only in GenBank (PRJNA544719), along with full SSU rRNA gene sequences for for the 153-taxon tree. In each case, the monophyly of Opisthokonta was tested as R. marinus (MK966712) and R. limneticus (MK966713). Assembled tran- a positive control. Alternative tree topologies were generated from the maximum- scriptomes and genomes, along with raw light and electron-microscopy images, 45 likelihood trees using Mesquite v.3.5 and tested using the approximately unbiased individual gene alignments, concatenated and trimmed alignments, single-gene trees, 21 test, as implemented in IQ-TREE v.1.5.5 . and maximum-likelihood and Bayesian tree files for the 151-taxon and 153-taxon We carried out a number of analyses to test whether mixed gene ancestry could datasets have been deposited in Dryad (https://doi.org/10.5061/dryad.tr6d8q2). 43 be affecting the phylogenetic analysis. RAxML was used to compute internode The family Rhodelphidae (urn:lsid:zoobank.org:act:80B5C004-2954-4A57- 6 certainty scores by mapping bootstrapped single-gene trees (generated as above, A411-482BCD29E85D), genus Rhodelphis (urn:lsid:zoobank.org:act:6D09D4D9- except with the PROTCATLGF model) to the 151/253 maximum-likelihood D9FC-4D0C-8FB2-55FD9DDEAD53) and species Rhodelphis limneticus (urn:lsid: phylogenomic tree (Extended Data Fig. 4b). We subsequently identified the 50 single- zoobank.org:act:695ACD0B-8151-4609-97FC-A044A312BE22) and Rhodelphis 43 gene trees with the highest relative tree certainty (RTC) values with RAxML marinus (urn:lsid:zoobank.org:act:84233191-4710-43D1-A2DA-914B8E7B7E01) (average RTC score for all 253 single-gene trees is 0.175; average for best 50 is have been registered with the Zoobank database (http://zoobank.org/). 0.362), and computed internode certainty scores for this subset of the data, as above. To test whether the 50 highest RTC single-gene datasets recover the sister Code availability relationship of Rhodelphis and red algae, we concatenated the alignments with All unpublished code is available upon reasonable request from the corresponding SCaFOs v.1.2.544 and performed phylogenetic analysis in IQ-TREE using the authors. + + LG PMSF G model along with 1,000 ultrafast bootstrap and SH-aLRT rep- licates. ASTRAL-III7 was used to calculate overall species trees from individual 22. Tikhonenkov, D. V., Mazeĭ, IuA. & Embulaeva, E. A. [Degradation succession of bootstrapped trees under a ‘coalescence’ framework, using default parameters and heterotrophic fagellate communities in microcosms]. Zh. Obshch. Biol. 69, 100 multilocus bootstrap replicates. 57–64 (2008). Identification of putative plastid-targeted proteins. A screen for Rhodelphis 23. Tikhonenkov, D. V. et al. Description of vietnamica sp.n. and proteins bearing putative N-terminal plastid transit peptides was performed with Acavomonas peruviana n. gen. n. sp., two new phyla (Colponemidia TargetP46, with the ‘Plant’ flag activated. Proteins with a score corresponding to a nom. nov. and Acavomonidia nom. nov.) and their contributions to reconstructing the ancestral state of and eukaryotes. PLoS ONE 9, 70% probability of plastid localization were retained for further manual inspection. e95467 (2014). Additionally, we queried the NCBI nr database with Rhodelphis protein sequences 24. Luft, J. H. Improvements in epoxy resin embedding methods. J. Biophys. in search of proteins that were most similar to characterized plastidial or cyanobac- Biochem. Cytol. 9, 409–414 (1961). terial homologues, and queried the Rhodelphis genome and transcriptome assem- 25. Picelli, S. et al. Full-length RNA-seq from single cells using Smart-seq2. blies with proteins encoded by red algal plastid genomes. We considered proteins Nat. Protocols 9, 171–181 (2014). to potentially be plastidial if they fulfilled both of the above criteria, or if they were 26. Tikhonenkov, D. V., Janouškovec, J., Keeling, P. J. & Mylnikov, A. P. The morphology, ultrastructure and SSU rRNA gene sequence of a new freshwater constituents of a metabolic unit of members which predominantly met the criteria. fagellate, borokensis n. sp. (Kinetoplastea, ). J. Eukaryot. Plastid protein phylogenies. Proteins of R. marinus and R. limneticus identified as Microbiol. 63, 220–232 (2016). being putatively plastid-targeted were either added to existing alignments (FeS clus- 27. Andrews, S. FastQC: a quality control tool for high throughput sequence data. ter and haem biosynthesis pathway proteins) or used as queries in a blastp search version 0.10.1 https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ (E-value threshold of 1 × 10−5) against a comprehensive custom database, which (2010). 28. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a fexible trimmer for contained representatives of plastid-bearing and non-plastidial groups (archaeplas- Illumina sequence data. Bioinformatics 30, 2114–2120 (2014). tids, cryptophytes, haptophytes, stramenopiles, dinoflagellates, chrompodellids, 29. Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-seq data apicomplexans as well as opisthokonts, amoebozoans/apusozoans/ancyromonads without a reference genome. Nat. Biotechnol. 29, 644–652 (2011). and ciliates). For the TIC22 and TOC75 alignments, additional known homologues 30. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein were used as queries, to extend the number of taxa obtained with R. marinus and database search programs. Nucleic Acids Res. 25, 3389–3402 (1997). 31. Li, W. & Godzik, A. Cd-hit: a fast program for clustering and comparing large R. limneticus. Results from blast searches were parsed for hits with a minimum −25 −5 sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659 (2006). query coverage of 50% and E < 1 × 10 or E < 1 × 10 (TIC/TOC and YCF 32. Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. proteins, respectively). The number of bacterial hits was constrained to 20 hits BUSCO: assessing genome assembly and annotation completeness with per phylum (for the –Chlorobi– group, most classes single-copy orthologs. Bioinformatics 31, 3210–3212 (2015). of , the group, 33. Bankevich, A. et al. SPAdes: a new genome assembly algorithm and its , , Cyanobacteria (unranked) and ) or 10 per applications to single-cell sequencing. J. Comput. Biol. 19, 455–477 (2012). 34. Miller, I. J. et al. Autometa: automated extraction of microbial genomes from phylum (remaining ) as defined by NCBI . For initial individual shotgun metagenomes. Nucleic Acids Res. 47, e57 (2019). tree reconstruction, corresponding sequences were aligned with MAFFT using 35. Gurevich, A., Saveliev, V., Vyahhi, N. & Tesler, G. QUAST: quality assessment tool the ‘--auto’ option, ambiguously aligned positions were trimmed off with trimAL for genome assemblies. Bioinformatics 29, 1072–1075 (2013). Letter RESEARCH

36. Wu, T. D. & Watanabe, C. K. GMAP: a genomic mapping and alignment 52. Hoang, D. T., Chernomor, O., von Haeseler, A., Minh, B. Q. & Vinh, L. S. UFBoot2: program for mRNA and EST sequences. Bioinformatics 21, 1859–1875 improving the ultrafast bootstrap approximation. Mol. Biol. Evol. 35, 518–522 (2005). (2018). 37. Crooks, G. E., Hon, G., Chandonia, J.-M. & Brenner, S. E. WebLogo: a sequence 53. Moriya, Y., Itoh, M., Okuda, S., Yoshizawa, A. C. & Kanehisa, M. KAAS: an logo generator. Genome Res. 14, 1188–1190 (2004). automatic genome annotation and pathway reconstruction server. Nucleic 38. Ryan, J. F. Baa.pl: a tool to evaluate de novo genome assemblies with RNA Acids Res. 35, W182–W185 (2007). transcripts. Preprint at https://arxiv.org/abs/1309.2087 (2013). 54. Emms, D. M. & Kelly, S. OrthoFinder: solving fundamental biases in whole 39. Gruber-Vodicka, H. R., Seah, B. K. B. & Pruesse, E. phyloFlash – rapid SSU rRNA genome comparisons dramatically improves orthogroup inference accuracy. profling and targeted assembly from metagenomes. Preprint at https://www. Genome Biol. 16, 157 (2015). biorxiv.org/content/10.1101/521922v1 (2019). 40. Burki, F. et al. Untangling the early diversifcation of eukaryotes: a phylogenomic Acknowledgements We thank L. Nguyen-Ngoc, H. Doan-Nhu, E. S. Gusev, study of the evolutionary origins of Centrohelida, Haptophyta and Cryptista. Y. Dubrovsky and the staff of the Russian-Vietnam Tropical Centre, Coastal Proc. R. Soc. B 283, 20152802 (2016). Branch for assistance with sample collection and trip management; 41. Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software S. A. Karpov for assistance with interpretation of transmission electron version 7: improvements in performance and usability. Mol. Biol. Evol. 30, microscopy images; Compute/Calcul Canada for computational resources, 772–780 (2013). especially the Orcinus (Westgrid) and Mammouth Parallèle II (Calcul Québec) 42. Criscuolo, A. & Gribaldo, S. BMGE (Block Mapping and Gathering with Entropy): clusters. This work was supported by a grant from the Natural Sciences and a new software for selection of phylogenetic informative regions from multiple Engineering Research Council of Canada to P.J.K. (grant 227301). Field work sequence alignments. BMC Evol. Biol. 10, 210 (2010). in Vietnam is part of the project ‘Ecolan 3.2’ of the Russian-Vietnam Tropical 43. Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and Centre. R.M.R.G. was supported by a fellowship from the Canadian Institutes post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014). of Health Research and a grant from the Tula Foundation to the Centre for 44. Roure, B., Rodriguez-Ezpeleta, N. & Philippe, H. SCaFoS: a tool for selection, Microbial Diversity and Evolution. Cell isolation and culturing, generation of concatenation and fusion of sequences for phylogenomics. BMC Evol. Biol. 7 material for sequencing, light and electron microscopy and analysis were (Suppl. 1), S2 (2007). supported by the Russian Science Foundation to D.V.T. (grant 18-14-00239). 45. Maddison, W. P. & Maddison, D. R. Mesquite: a modular system for F.H. is supported by an EMBO fellowship (ALTF 1260–2016). evolutionary analysis. Version 3.5 https://www.mesquiteproject.org/ (2018). Author contributions R.M.R.G., D.V.T. and P.J.K. designed the study. D.V.T. 46. Emanuelsson, O., Nielsen, H., Brunak, S. & von Heijne, G. Predicting subcellular isolated and cultured cells. D.V.T. and F.H. generated material for sequencing localization of proteins based on their N-terminal amino acid sequence. J. Mol. and F.H. removed contaminant sequences from the genome assembly. Biol. 300, 1005–1016 (2000). D.V.T. and A.P.M. performed microscopy experiments. R.M.R.G. performed 47. Capella-Gutiérrez, S., Silla-Martínez, J. M. & Gabaldón, T. trimAl: a tool for phylogenomic, genomic and transcriptomic analyses. E.H. performed automated alignment trimming in large-scale phylogenetic analyses. phylogenetic analysis of plastid proteins. R.M.R.G., D.V.T. and P.J.K. wrote the Bioinformatics 25, 1972–1973 (2009). manuscript with input from all authors. 48. Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree: computing large minimum evolution trees with profles instead of a distance matrix. Mol. Biol. Evol. 26, Competing interests The authors declare no competing interests. 1641–1650 (2009). 49. Whelan, S., Irisarri, I. & Burki, F. PREQUAL: detecting non-homologous Additional information characters in sets of unaligned homologous sequences. Bioinformatics 34, Supplementary information is available for this paper at https://doi.org/ 3929–3930 (2018). 10.1038/s41586-019-1398-6. 50. Katoh, K. & Standley, D. M. A simple method to control over-alignment in the Correspondence and requests for materials should be addressed to R.M.R.G., MAFFT multiple sequence alignment program. Bioinformatics 32, 1933–1942 D.V.T. or P.J.K. (2016). Peer review information Nature thanks Geoffrey McFadden and the other 51. Kalyaanamoorthy, S., Minh, B. Q., Wong, T. K. F., von Haeseler, A. & Jermiin, L. S. anonymous reviewer(s) for their contribution to the peer review of this work. ModelFinder: fast model selection for accurate phylogenetic estimates. Reprints and permissions information is available at http://www.nature.com/ Nat. Methods 14, 587–589 (2017). reprints. RESEARCH Letter

Extended Data Fig. 1 | Cell structure of R. limneticus. Related to several engulfed bacterial cells. bc, bacterium; cl, cylinder; cv, contractile Fig. 1. a, b, Scanning electron microscopy images showing the flagella vacuole; fv, food vacuole; gl, glycostyles; bb1, basal body of posterior and mastigonemes on the posterior flagellum. c, Section of an anterior flagellum; bb2, basal body of anterior flagellum; mn, mastigonemes; mrt, flagellum. d, Section of a posterior flagellum. e–g, Arrangement of the microtubule; n, nucleus; nmb, narrow microtubular band; ob, osmiophilic transitional zone of a flagellum with transverse plate and cylinder. body; pf, posterior flagellum; pr, eukaryotic prey; rgl, rudiments of h, Wide microtubular band 2 accompanies the posterior flagellum. glycostyles; rp, rhizoplast; ser, sac of smooth endoplasmic reticulum; i–l, Cell sections from anterior to posterior. m, Single microtubules sm, single microtubule; ss, striated structure; st, satellite of basal body; inside the . n, A rhizoplast connects the basal body of the tp, transverse plate; wmb1, wide microtubular band 1; wmb2, wide posterior flagellum to the nucleus. o, Area of cell with nucleus, rudiments microtubular band 2. Scale bars, 5 μm (a, b), 0.5 μm (c, d, i–o), 0.2 μm of glycostyles inside the vesicles and smooth endoplasmic reticulum. (e–h, q), 1 μm (p) and 2 μm (r–t). These experiments were repeated three p, . q, Osmiophilic body. r, s, Phagocytosis of (a, b) and seven (c–t) times, with similar results. eukaryotic prey and bacteria. t, Cell section showing food vacuole with Letter RESEARCH

Extended Data Fig. 2 | Cell structure of R. marinus. a, b, Living cells, k, Osmiophilic formation and microtubules. cr, cristae; dg, dark granules; obtained by light microscopy. c, Longitudinal section of the cell, d, Region fb, fibril; gl, glycostyles; bb1, basal body of posterior flagellum; bb2, of the cell surface with glycostyles. e, Basal body of posterior flagellum basal body of anterior flagellum; mt, mitochondrion; mrt, microtubules; with outgoing fibrils and striated structure. f, Section of the flagellum n, nucleus, of, osmiophilic (dark) formation; pf, posterior flagellum; covered with glycostyles and dark granules. g, Emergence of a posterior rgl, rudiments of glycostyles; rs, reserve substance; ser, sac of smooth flagellum. h, Basal body of the posterior flagellum and mitochondrion endoplasmic reticulum; ss, striated structure. Scale bars, 10 μm (a, b), with tubular cristae. i, Formation of rudiments of glycostyles in 2 μm (c), 0.2 μm (d, e), 0.5 μm (f–i, k) and 1 μm (j). These experiments were perinuclear space. j, Nucleus, mitochondrion and reserve substance. repeated ten (a, b) and three (c–k) times, with similar results. RESEARCH Letter

Chattonella subsalsa marina Fibrocapsa japonica Pseudopedinella elastica --/82 Vaucheria litorea Dictyocha speculum Nannochloropsis gaditana Pelagomonas calceolata Ochromonas sp. MMETSP1177 --/-- Spumella elongata Aureococcus anophageferrens 81/98 Paraphysomonas imperforata Aureoumbra lagunensis Mallomonas sp. ab100/81 Phaeomonas parva -- Phaeodactylum tricornutum Pinguiococcus pyrenoidosus Asterionellopsis glacialis Aureococcus anophageferrens Odontella aurita Pelagomonas calceolata Aureoumbra lagunensis Stramenopiles Pseudopedinella elastica Bolidomonas pacifica Rhizochromulina marina Dictyocha speculum Spumella elongata 99.9/82 Asterionellopsis glacialis Ochromonas sp. MMETSP1177 Phaeodactylum tricornutum 92.5/99 Odontella aurita Paraphysomonas imperforata Thalassiosira pseudonana -- Mallomonas sp. Bolidomonas pacifica Pinguiococcus pyrenoidosus --/91 Phytophthora parasitica Phaeomonas parva Stramenopiles Schizochytrium aggregatum Nannochloropsis gaditana Thraustochytrium sp. Aurantiochytrium limacinum Heterosigma akashiwo subsalsa hominis Fibrocapsa japonica sp. Cafeteria roebergensis Vaucheria litorea bovis Ectocarpus siliculosus falciparum Saprolegnia parasitica 0.89 Phytophthora Thraustochytrium sp. 93.1/98 pontica GG 99.8/100 Schizochytrium aggregatum Aurantiochytrium limacinum carterae brevis Aplanochytrium Alveolata Cafeteria sp. sp Blastocystis hominis 73.9/96 Cafeteria roebergensis Sterkiella histriomuscorum Litonotus pictus 87.9/96 tetraurelia Babesia bovis thermophila Toxoplasma gondii Platyophrya macrostoma Protocruzia adherens Cryptosporidium muris Colponema vietnamica Voromonas pontica GG 87.3/-- Elphidium sp Colpodella filosa 82.8/98 Astrolonche serrata Chromera velia sphaerica Vitrella brassicaformis Chlorarachnion reptans Rhizaria Karenia brevis Telonema subtilis Amphidinium carterae Arabidopsis Noctiluca scintillans 99.8/100 Populus trichocarpa Alveolata Mimulus guttatus Hematodinium sp Brachypodium distachyon Perkinsus marinus Oryza sativa Physcomitrella patens Tetrahymena thermophila Selaginella moellendorffii Paramecium tetraurelia Chlamydomonas reinhardtii Archaeplastida Platyophrya macrostoma Volvox carteri Chlorella vulgaris Sterkiella histriomuscorum Coccomyxa sp. Euplotes Micromonas sp. Ostreococcus lucimarinus Litonotus pictus Glaucocystis nostochinearum Protocruzia adherens 99.8/100 Gloeochaete witrockiana Colponema vietnamica --/94 Cyanophora paradoxa Cyanoptyche gloeocystis Reticulomyxa filosa 98.4/99 abbreviata Elphidium sp. Rhodomonas sp --/-- 95.6/100 sp. Astrolonche serrata theta Gromia sphaerica Rhizaria 99.8/100 andersenii Hemiselmis rufescens Chlorarachnion reptans curvata Cryptista Bigelowiella natans pacifica Telonema subtilis Goniomonas sp. Calcidiscus leptoporus Porphyridium aerugineum 100/75 Porphyridium cruentum Scyphosphaera apsteinii Erythrolobus galbana Compsopogon coeruleus Madagascaria erythrocladoides Chondrus crispus parvum Haptophyta Porphyra polylepis 70.1/78 Rhodella maculata Archaeplastida 87.6/80 Rhodosorus marinus Chrysochromulina brevifilum Cyanidioschyzon merolae Phaeocystis sp. 97.3/99 Galdieria sulphuraria Phaeocystis antarctica R. marinus 100/99 Pavlovales sp. R. limneticus Picobiliphyte MS584-11 heterophryoidea --/-- Calcidiscus leptoporus Pleurochrysis carterae Raphidiophrys ambigua Scyphosphaera apsteinii Raineriophrys erinaceoides 0.83 Centrohelida Emiliania huxleyi Choanocystis sp. 99.4/100 Chrysochromulina polylepis Acanthocystis sp. Haptophyta Ancoracysta twista Chrysochromulina brevifilum Phaeocystis antarctica Populus trichocarpa Phaeocystis sp. Arabidopsis Pavlovales sp. 100/97 Raphidiophrys ambigua Mimulus guttatus Raphidiophrys heterophryoidea Oryza sativa --/83 Raineriophrys erinaceoides Brachypodium distachyon 73.4/94 Choanocystis sp. 99.7/95 Acanthocystis sp. Centrohelida Selaginella moellendorffii Ancoracysra twista -- Physcomitrella patens Naegleria gruberi 90.5/100 Sawyeria marylandensis Coccomyxa sp. Percolomonas cosmopolitus MMETSP0758 Chlorella vulgaris Percolomonas cosmopolitus MMETSP0759 Volvox carteri saltans Neobodo designis Chlamydomonas reinhardtii --/72 Eutreptiella gymnastica Ostreococcus lucimarinus Excavata --/98 Seculamonas ecuadoriensis Micromonas sp. Reclinomonas americana Gloeochaete witrockiana Tsukubamonas globosa Glaucocystis nostochinearum Danio rerio 0.99 Archaeplastida Homo sapiens Cyanophora paradoxa Branchiostoma floridae Cyanoptyche gloeocystis Daphnia pulex Lottia gigantea Porphyridium cruentum Nematostella vectensis Porphyridium aerugineum Monosiga brevicollis Erythrolobus owczarzaki Cryptococcus neoformans Madagascaria erythrocladoides Ustilago maydis Compsopogon coeruleus Neurospora crassa -- Schizosaccharomyces pombe Rhodosorus marinus 99.8/97 Phycomyces blakesleeanus Porphyra Batrachochytrium dendrobatidis Chondrus crispus 97/97 Breviate -- Thecamonas trahens Rhodella maculata Dictyostelium discoideum Galdieria sulphuraria Dictyostelium purpureum 0.98 Polysphondylium pallidum Cyanidioschyzon merolae 97.8/100 Acanthamoeba castellanii R. limneticus --/77 --/95 Hartmannella vermiformis Collodictyon sp. 153/253 dataset (56,312 sites) R. marinus --/74 Malawimonas Picobiliphyte MS584-11 Rhodomonas sp. Rhodomonas abbreviata 0.2 IQ-TREE LG + C60 + F + G4 Cryptophyceae sp. Guillardia theta Hemiselmis rufescens Hemiselmis andersenii c 100 Cryptomonas curvata Cryptista Goniomonas sp. Goniomonas pacifica Roombia truncata Palpitomonas bilix Percolomonas cosmopolitus MMETSP0759 Percolomonas cosmopolitus MMETSP0758 Sawyeria marylandensis 0.97 Naegleria gruberi Neobodo designis 75 Archaeplastida Bodo saltans Excavata -- Eutreptiella gymnastica Seculamonas ecuadoriensis 0.99 Jakoba Rhodelphis + reds Reclinomonas americana Tsukubamonas globosa Homo sapiens Danio rerio Cryptista + green and glauc. Branchiostoma floridae support Lottia gigantea 50 Daphnia pulex Opisthokonta Nematostella vectensis Monosiga brevicollis Capsaspora owczarzaki Schizosaccharomyces pombe Pico + Rhodelphis + reds Neurospora crassa Ustilago maydis Cryptococcus neoformans Obazoa

Phycomyces blakesleeanus % bootstra p Batrachochytrium dendrobatidis 25 Thecamonas trahens 0.99 Breviate Dictyostelium purpureum Dictyostelium discoideum 153/253 dataset (56,312 sites) Polysphondylium pallidum Acanthamoeba castellanii -- Hartmannella vermiformis PhyloBayes CAT + GTR + G4 Collodictyon sp. Malawimonas 0

0.3 01020304050 Thousand sites removed

Extended Data Fig. 3 | Phylogenomic analysis of the concatenated removal of the fastest evolving amino acid sites shows that both the 153/253 dataset. a, Bayesian tree using the CAT + GTR evolutionary Rhodelphis and red algae and the picozoa, Rhodelphis and red algae model as implemented in PhyloBayes. b, Maximum-likelihood tree using relationships are relatively robust to data removal. Similar to the 151/253 the LG + C60 + F + G4 model as implemented in IQ-TREE. Black dots dataset, support for Archaeplastida paraphyly (Cryptista, green plants and denote full statistical support (Bayesian posterior probability = 1.0, glaucophytes (green and glauc.)) decreases with data removal, whereas maximum-likelihood ultrafast bootstrap and SH-aLRT = 100%); support Archaeplastida monophyly support increases. Support for Opisthokonta values <0.7/70% are not shown (indicated by ‘–’). c, Bootstrap support for monophyly serves as a control for the presence of sufficient information maximum-likelihood trees (PROT + CAT + LG + F) after progressive for phylogenomic inference. Letter RESEARCH

Rhizochromulina marina Pseudopedinella elastica Dictyocha speculum Pelagomonas calceolata Aureococcus anophageferrens Aureoumbra lagunensis -- Phaeodactylum tricornutum a Asterionellopsis glacialis Odontella aurita Thalassiosira pseudonana Bolidomonas pacifica Spumella elongata Ochromonas sp. MMETSP1177 -- Paraphysomonas imperforata Mallomonas sp Pinguiococcus pyrenoidosus Phaeomonas parva Stramenopiles Nannochloropsis gaditana Heterosigma akashiwo Chattonella subsalsa Fibrocapsa japonica Vaucheria litorea Ectocarpus siliculosus Saprolegnia parasitica Phytophthora Thraustochytrium sp. Schizochytrium aggregatum Aurantiochytrium limacinum Aplanochytrium Cafeteria sp. Blastocystis hominis Cafeteria roebergensis Plasmodium falciparum Babesia bovis Toxoplasma gondii Cryptosporidium muris Voromonas pontica GG Colpodella Chromera velia Vitrella brassicaformis Karenia brevis Amphidinium carterae Noctiluca scintillans Hematodinium sp. Alveolata Perkinsus marinus Tetrahymena thermophila Paramecium tetraurelia Platyophrya macrostoma Sterkiella histriomuscorum Euplotes Litonotus pictus Protocruzia adherens Colponema vietnamica Reticulomyxa filosa Elphidium sp. Astrolonche serrata Gromia sphaerica Rhizaria Chlorarachnion reptans Bigelowiella natans Pleurochrysis carterae -- Calcidiscus leptoporus Scyphosphaera apsteinii Isochrysis galbana Emiliania huxleyi Prymnesium parvum Haptophyta Chrysochromulina polylepis Chrysochromulina brevifilum Phaeocystis sp. Phaeocystis antarctica Pavlovales sp. Raphidiophrys heterophryoidea 0.81 Raphidiophrys ambigua Raineriophrys erinaceoides Choanocystis sp. Centrohelida 0.82 Acanthocystis sp. Ancoracysta twista Populus trichocarpa Arabidopsis Mimulus guttatus Oryza sativa Brachypodium distachyon Selaginella moellendorffii -- Physcomitrella patens Coccomyxa sp. Chlorella vulgaris Volvox carteri Chlamydomonas reinhardtii Ostreococcus lucimarinus Micromonas sp. Gloeochaete witrockiana Glaucocystis nostochinearum 0.96 Cyanophora paradoxa Archaeplastida Cyanoptyche gloeocystis Porphyridium cruentum Porphyridium aerugineum Erythrolobus Madagascaria erythrocladoides 0.86 Compsopogon coeruleus Rhodosorus marinus Porphyra Chondrus crispus 0.86 Rhodella maculata Galdieria sulphuraria Cyanidioschyzon merolae R. limneticus R. marinus Rhodomonas sp. Rhodomonas abbreviata Cryptophyceae sp. Guillardia theta Hemiselmis rufescens Hemiselmis andersenii Cryptomonas curvata Cryptista Goniomonas sp. Goniomonas pacifica Roombia truncata Palpitomonas bilix Percolomonas cosmopolitus MMETSP0759 Percolomonas cosmopolitus MMETSP0758 Sawyeria marylandensis 0.97 Naegleria gruberi Neobodo designis Bodo saltans Excavata -- Eutreptiella gymnastica Seculamonas ecuadoriensis 1 0.99 Jakoba Reclinomonas americana Tsukubamonas globosa Homo sapiens Danio rerio Branchiostoma floridae Lottia gigantea Daphnia pulex Nematostella vectensis Monosiga brevicollis Capsaspora owczarzaki Schizosaccharomyces pombe Neurospora crassa Ustilago maydis Cryptococcus neoformans Obazoa Phycomyces blakesleeanus Batrachochytrium dendrobatidis Thecamonas trahens Breviate 0.77 Dictyostelium purpureum Dictyostelium discoideum -- Polysphondylium pallidum Hartmannella vermiformis Acanthamoeba castellanii 0.72 Collodictyon sp. Malawimonas 151/253 dataset (56,530 sites) PhyloBayes CAT + GTR + G4 0.3

b Chattonella subsalsa Heterosigma akashiwo Fibrocapsa japonica Ectocarpus siliculosus --/84 Vaucheria litorea Nannochloropsis gaditana Ochromonas sp. MMETSP1177 --/-- Spumella elongata 85.2/96 Paraphysomonas imperforata 100/72 Mallomonas sp. Phaeomonas parva Pinguiococcus pyrenoidosus Aureococcus anophageferrens Pelagomonas calceolata Aureoumbra lagunensis Pseudopedinella elastica Stramenopiles Rhizochromulina marina Dictyocha speculum 99.9/73 Asterionellopsis glacialis Phaeodactylum tricornutum 94.9/99 Odontella aurita Thalassiosira pseudonana Bolidomonas pacifica --/91 Phytophthora Saprolegnia parasitica Schizochytrium aggregatum Thraustochytrium sp Aurantiochytrium limacinum Aplanochytrium Blastocystis hominis 99.9/100 Cafeteria sp. Cafeteria roebergensis Babesia bovis Plasmodium falciparum Toxoplasma gondii Cryptosporidium muris Colpodella 92.6/98 Voromonas pontica GG Chromera velia Vitrella brassicaformis Amphidinium carterae Karenia brevis Noctiluca scintillans Alveolata Hematodinium sp Perkinsus marinus 77.4/96 Euplotes Sterkiella histriomuscorum Litonotus pictus 89/97 Paramecium tetraurelia Tetrahymena thermophila Platyophrya macrostoma Protocruzia adherens Colponema vietnamica Elphidium sp. Reticulomyxa filosa 87/98 Gromia sphaerica Astrolonche serrata Rhizaria Bigelowiella natans Chlorarachnion reptans Arabidopsis 99.8/100 Populus trichocarpa Mimulus guttatus Brachypodium distachyon Oryza sativa Physcomitrella patens Selaginella moellendorffii Chlamydomonas reinhardtii Archaeplastida Volvox carteri Chlorella vulgaris Coccomyxa sp Micromonas sp Ostreococcus lucimarinus Glaucocystis nostochinearum 94.4/98 Gloeochaete witrockiana --/93 Cyanophora paradoxa Cyanoptyche gloeocystis Rhodomonas abbreviata Rhodomonas sp. --/81 93.7/99 Cryptophyceae sp. Guillardia theta 99.2/100 Hemiselmis andersenii Hemiselmis rufescens Cryptista Cryptomonas curvata Goniomonas pacifica Goniomonas sp. Roombia truncata Palpitomonas bilix Porphyridium aerugineum Porphyridium cruentum Erythrolobus Compsopogon coeruleus 70.7/-- Madagascaria erythrocladoides Rhodosorus marinus Chondrus crispus Archaeplastida Porphyra 98.5/-- Rhodella maculata Cyanidioschyzon merolae Galdieria sulphuraria R. marinus R. limneticus 73.3/89 Calcidiscus leptoporus Pleurochrysis carterae Scyphosphaera apsteinii Emiliania huxleyi 99.4/100 Isochrysis galbana Chrysochromulina polylepis Haptophyta Prymnesium parvum Chrysochromulina brevifilum Phaeocystis antarctica Phaeocystis sp. Pavlovales sp. 99.9/100 Raphidiophrys ambigua --/88 Raphidiophrys heterophryoidea Raineriophrys erinaceoides 71.5/95 Choanocystis sp. Centrohelida 99.7/100 Acanthocystis sp. Ancoracysta twista Naegleria gruberi 90.9/100 Sawyeria marylandensis Percolomonas cosmopolitus MMETSP0758 Percolomonas cosmopolitus MMETSP0759 Bodo saltans --/-- Neobodo designis Eutreptiella gymnastica Excavata Tsukubamonas globosa Jakoba --/98 Seculamonas ecuadoriensis Reclinomonas americana Danio rerio Homo sapiens Branchiostoma floridae Daphnia pulex Lottia gigantea Nematostella vectensis Monosiga brevicollis Capsaspora owczarzaki Cryptococcus neoformans Ustilago maydis Neurospora crassa Schizosaccharomyces pombe 99.8/100 Phycomyces blakesleeanus Obazoa Batrachochytrium dendrobatidis Breviate 90.6/99 Thecamonas trahens Dictyostelium discoideum Dictyostelium purpureum Polysphondylium pallidum Acanthamoeba castellanii 96.5/96 --/85 Hartmannella vermiformis Collodictyon sp. 72.3/79 Malawimonas

0.2

151/253 dataset (56,530 sites) IQ-TREE LG + C60 + F + G4 Extended Data Fig. 4 | Phylogenomic analysis of the concatenated 151/253 dataset. a, Bayesian tree using the CAT + GTR evolutionary model as implemented in PhyloBayes. b, Maximum-likelihood tree using the LG + C60 + F + G4 model as implemented in IQ-TREE. Black dots denote full statistical support (Bayesian posterior probability = 1.0, maximum-likelihood ultrafast bootstrap and SH-aLRT = 100%); support values <0.7/70% are not shown. RESEARCH Letter

0.993,0.993 Eutreptiella gymnastica Bodo saltans 1.000,1.000 Neobodo designis ab0.704,0.704 Bodo saltans 0.221,0.162 Neobodo designis 0.096,0.096 0.996,0.996 Percolomonas cosmopolitus 759 Eutreptiella gymnastica 0.128,0.128 Percolomonas cosmopolitus 758 0.133,0.117 0.083,0.083 Naegleria gruberi Naegleria gruberi -0.997,-0.543 Sawyeria marylandensis 0.609,0.609 Sawyeria marylandensis 0.069,0.078 0.100,0.235 0.216,0.252 Percolomonas cosmopolitus 758 0.999,0.999 Seculamonas ecuadoriensis 0.595,0.595 Percolomonas cosmopolitus 759 Jakoba 0.225,0.185 Tsukubamonas globosa -0.986,-0.986 Reclinomonas americana 0.000,0.122 Jakoba Tsukubamonas globosa 0.141,0.272 Seculamonas ecuadoriensis 0.993,0.993 Guillardia theta Reclinomonas americana Cryptophyceae sp. Arabidopsis -0.039,-0.039 Rhodomonas abbreviata 0.034,0.050 1.000,1.000 0.299,0.327 Populus trichocarpa -0.857,-0.857 Rhodomonas sp. 0.958,0.958 Mimulus guttatus 1.000,1.000 Hemiselmis rufescens 0.774,0.774 -0.995,-0.680 0.998,0.998 Brachypodium distachyon Hemiselmis andersenii 0.346,0.371 Oryza sativa Cryptomonas curvata 0.773,0.773 0.098,0.195 Physcomitrella patens 1.000,1.000 Goniomonas sp. Selaginella moellendorffii 0.612,0.612 Goniomonas pacifica 0.328,0.354 0.999,0.999 Chlamydomonas reinhardtii Roombia truncata 0.573,0.573 Volvox carteri Palpitomonas bilix Chlorella vulgaris 1.000,1.000 Volvox carteri 0.312,0.340 -0.002,-0.001 Coccomyxa sp. 0.127,0.127 Chlamydomonas reinhardtii -0.959,-0.122 0.724,0.724 Micromonas sp. Coccomyxa sp 0.773,0.773 Ostreococcus lucimarinus -0.238,-0.289 Chlorella vulgaris 0.117,0.211 Cyanophora paradoxa Ostreococcus lucimarinus Glaucocystis nostochinearum 0.992,0.992 Micromonas sp 0.122,0.183 1.000,1.000 0.083,0.083 Gloeochaete witrockiana 0.110,0.110 Selaginella moellendorffii Cyanoptyche gloeocystis -0.993,-0.418 Physcomitrella patens Cryptomonas curvata -0.153,-0.153 Arabidopsis -1.000,-0.108 0.773,0.773 0.036,0.052 Cryptophyceae sp. 1.000,1.000 Populus trichocarpa 1.000,1.000 Rhodomonas abbreviata -0.993,-0.378 0.263,0.295 Mimulus guttatus 0.041,0.091 Rhodomonas sp. 0.987,0.987 1.000,1.000 Brachypodium distachyon Guillardia theta 0.286,0.316 0.237,0.272 Oryza sativa 0.992,0.992 Hemiselmis andersenii 0.991,0.991 Cyanophora paradoxa Hemiselmis rufescens 0.235,0.189 0.996,0.996 Glaucocystis nostochinearum 0.981,0.981 Goniomonas pacifica Cyanoptyche gloeocystis Goniomonas sp. 0.678,0.678 Gloeochaete witrockiana 0.045,0.063 Roombia truncata 0.096,0.096 Chondrus crispus -0.993,-0.108 Palpitomonas bilix -0.666,-0.287 Porphyra 0.090,0.132 Chondrus crispus -0.993,-0.379 Rhodella maculata 0.103,0.130 Porphyra 0.993,0.993 Rhodosorus marinus Rhodella maculata 0.082,0.082 Porphyridium cruentum Compsopogon coeruleus 0.166,0.141 0.191,0.137 -0.081,-0.081 1.000,1.000 Porphyridium aerugineum 0.006,0.039 Madagascaria erythrocladoides 1.000,1.000 Erythrolobus Erythrolobus -0.082,-0.082 Compsopogon coeruleus Porphyridium aerugineum 0.268,0.299 0.310,0.310 0.758,0.758 Madagascaria erythrocladoides 0.001,0.019 0.001,0.045 Porphyridium cruentum -0.674,-0.674 Galdieria sulphuraria Rhodosorus marinus 0.110,0.110 Cyanidioschyzon merolae 0.018,0.108 0.075,0.128 Cyanidioschyzon merolae 0.128,0.128 Rhodelphis limneticus Galdieria sulphuraria Rhodelphis marinus 0.568,0.568 Rhodelphis marinus Rhodelphis limneticus -0.338,-0.338 Pleurochrysis carterae -1.000,-0.131 -0.365,-0.365 Calcidiscus leptoporus 0.119,0.081 Calcidiscus leptoporus 0.515,0.515 Scyphosphaera apsteinii -0.021,-0.026 Pleurochrysis carterae 1.000,1.000 Isochrysis galbana 0.064,0.075 Scyphosphaera apsteinii -0.496,-0.496 Emiliania huxleyi 0.939,0.939 Emiliania huxleyi -0.994,-0.379 0.098,0.066 Isochrysis galbana 0.991,0.991 Chrysochromulina polylepis Chrysochromulina brevifilum 1.000,1.000 0.520,0.520 Prymnesium parvum 0.420,0.342 Chrysochromulina brevifilum 0.859,0.859 Chrysochromulina polylepis 1.000,1.000 1.000,1.000 Phaeocystis antarctica 0.718,0.718 Prymnesium parvum Phaeocystis sp. 0.582,0.582 0.988,0.988 Phaeocystis antarctica Phaeocystis sp. Pavlovales sp. -0.031,-0.030 Pavlovales sp. -0.039,-0.039 0.997,0.997 Raphidiophrys heterophryoidea Acanthocystis sp. -0.674,-0.674 Raphidiophrys ambigua 0.587,0.587 Choanocystis sp. -0.039,-0.039 0.082,0.082 Raineriophrys erinaceoides -0.039,-0.052 Raineriophrys erinaceoides 1.000,1.000 Choanocystis sp. 0.090,0.132 Acanthocystis sp. -0.272,-0.087 Raphidiophrys ambigua 0.331,0.331 Raphidiophrys heterophryoidea Ancoracysta twista Ancoracysta twista 1.000,1.000 Bigelowiella natans Astrolonche serrata Chlorarachnion reptans 0.280,0.212 0.124,0.124 Elphidium sp. -0.945,-0.945 Astrolonche serrata -0.994,-0.125 0.280,0.212 0.999,0.999 Reticulomyxa filosa Reticulomyxa filosa 0.077,0.148 0.876,0.876 Gromia sphaerica -0.947,-0.947 Elphidium sp 0.945,0.945 Bigelowiella natans Gromia sphaerica Chlorarachnion reptans 1.000,1.000 Phytophthora Aplanochytrium Saprolegnia parasitica 0.977,0.977 Aurantiochytrium limacinum 0.082,0.082 Pseudopedinella elastica 0.127,0.127 0.598,0.598 Schizochytrium aggregatum Rhizochromulina marina 0.160,0.201 Thraustochytrium sp. 0.987,0.987 Dictyocha speculum 0.292,0.303 Asterionellopsis glacialis 1.000,1.000 Pelagomonas calceolata 0.126,0.171 Phaeodactylum tricornutum Aureococcus anophageferrens 0.363,0.386 Odontella aurita 1.000,1.000 Aureoumbra lagunensis 0.366,0.389 Thalassiosira pseudonana -0.994,-0.379 -0.496,-0.218 Bolidomonas pacifica 0.612,0.612 1.000,1.000 Bolidomonas pacifica 0.703,0.703 Thalassiosira pseudonana -0.371,-0.167 -0.997,-0.129 0.270,0.270 Aureococcus anophageferrens Odontella aurita 0.335,0.360 Pelagomonas calceolata 1.000,1.000 Asterionellopsis glacialis 0.006,0.077 0.156,0.134 Aureoumbra lagunensis -0.497,-0.218 0.096,0.096 0.612,0.612 Phaeodactylum tricornutum Dictyocha speculum 1.000,1.000 Phaeomonas parva Pseudopedinella elastica -0.998,-0.128 0.306,0.334 Pinguiococcus pyrenoidosus 0.145,0.276 Rhizochromulina marina 0.998,0.998 Ectocarpus siliculosus 0.227,0.186 0.964,0.964 Chattonella subsalsa Vaucheria litorea 0.110,0.110 0.268,0.282 Heterosigma akashiwo -0.497,-0.511 1.000,1.000 Chattonella subsalsa 0.032,0.053 Fibrocapsa japonica 0.622,0.622 Heterosigma akashiwo 0.110,0.110 -0.488,-0.101 Ectocarpus siliculosus Fibrocapsa japonica Vaucheria litorea 0.773,0.773 Spumella elongata 0.166,0.295 0.172,0.145 Nannochloropsis gaditana 0.100,0.100 Ochromonas sp. 1177 Mallomonas sp. 1.000,1.000 Mallomonas sp. 0.129,0.116 -0.278,-0.069 0.354,0.379 -0.081,-0.081 0.128,0.128 Ochromonas sp. 1177 Paraphysomonas imperforata Spumella elongata Nannochloropsis gaditana 0.129,0.174 -0.998,-0.380 0.124,0.124 -0.127,-0.044 Paraphysomonas imperforata 0.128,0.128 Aurantiochytrium limacinum 0.500,0.500 Phaeomonas parva Schizochytrium aggregatum 1.000,1.000 Pinguiococcus pyrenoidosus -0.869,-0.721 Thraustochytrium sp. 0.973,0.973 Phytophthora Aplanochytrium Saprolegnia parasitica -0.082,-0.218 Cafeteria sp. -0.989,-0.108 -0.054,-0.077 Blastocystis hominis 0.772,0.772 Blastocystis hominis 0.048,0.119 Cafeteria sp. Cafeteria roebergensis Cafeteria roebergensis Colponema vietnamica Colponema vietnamica 1.000,1.000 Karenia brevis 0.862,0.862 Euplotes Amphidinium carterae 0.092,0.228 0.999,0.999 Sterkiella histriomuscorum -0.669,-0.669 Noctiluca scintillans Litonotus pictus 0.622,0.622 0.098,0.144 Hematodinium sp. 0.257,0.290 Paramecium tetraurelia 0.110,0.100 0.110,0.110 Perkinsus marinus 0.237,0.234 Tetrahymena thermophila 0.593,0.593 Toxoplasma gondii Platyophrya macrostoma 0.100,0.100 0.082,0.082 Plasmodium falciparum Protocruzia adherens 0.096,0.096 0.760,0.760 Babesia bovis 0.065,0.112 Babesia bovis Cryptosporidium muris 0.050,0.067 Plasmodium falciparum 0.084,0.088 0.128,0.128 0.124,0.124 Chromera velia 0.096,0.091 Toxoplasma gondii Cryptosporidium muris Colpodella 0.082,0.218 0.980,0.980 Voromonas pontica GG 0.105,0.101 Chromera velia 0.124,0.124 0.105,0.170 Colpodella Vitrella brassicaformis 0.029,0.080 Voromonas pontica GG Protocruzia adherens 0.097,0.164 Sterkiella histriomuscorum 0.089,0.135 Vitrella brassicaformis 0.096,0.096 0.290,0.290 Perkinsus marinus 0.096,0.096 Euplotes Litonotus pictus 0.239,0.273 Hematodinium sp. Noctiluca scintillans 0.096,0.096 Platyophrya macrostoma 0.357,0.357 Paramecium tetraurelia 0.333,0.333 Karenia brevis 0.993,0.993 0.049,0.031 Amphidinium carterae 0.773,0.773 Tetrahymena thermophila Batrachochytrium dendrobatidis 0.082,0.218 Branchiostoma floridae Homo sapiens 0.832,0.832 Cryptococcus neoformans 0.216,0.253 0.096,0.232 1.000,1.000 Danio rerio 0.090,0.060 Ustilago maydis Neurospora crassa 0.996,0.996 Daphnia pulex 0.316,0.316 0.098,0.093 0.090,0.059 Schizosaccharomyces pompe 0.987,0.987 Lottia gigantea Phycomyces blakesleeanus Nematostella vectensis 0.124,0.124 Monosiga brevicollis 0.086,0.125 Branchiostoma floridae 0.037,0.053 Danio rerio -0.057,-0.057 Capsaspora owczarzaki 0.027,0.089 0.388,0.388 Homo sapiens Batrachochytrium dendrobatidis 1.000,1.000 Phycomyces blakesleeanus 0.245,0.262 -0.046,-0.056 Daphnia pulex 1.000,1.000 Cryptococcus neoformans Lottia gigantea -0.082,-0.218 -0.082,-0.030 0.199,0.163 Nematostella vectensis 0.099,0.099 Ustilago maydis 0.609,0.609 Neurospora crassa 0.120,0.106 Monosiga brevicollis Capsaspora owczarzaki 0.609,0.609 Schizosaccharomyces pompe 0.000,0.069 Breviate -0.999,-0.682 Thecamonas trahens Thecamonas trahens Breviate -0.994,-0.679 Collodictyon sp. 0.920,0.920 Dictyostelium purpureum Malawimonas 1.000,1.000 Dictyostelium discoideum 0.128,0.128 0.112,0.137 0.736,0.736 Dictyostelium discoideum Polysphondylium pallidum 1.000,1.000 Dictyostelium purpureum 0.086,0.222 Hartmannella vermiformis Polysphondylium pallidum 0.086,0.222 -1.000,-0.139 Acanthamoeba castellanii 0.510,0.375 0.497,0.497 Hartmannella vermiformis Collodictyon sp. Acanthamoeba castellanii Malawimonas Extended Data Fig. 5 | Internode certainty analyses for the 151/253 and internode certainty-all scores for the 50 trees among the 253-tree and 151/50 datasets. Internode certainty and internode certainty- dataset that have the highest RTC scores, which are expected to improve all scores were calculated with RAxML v.8.1.6 and mapped onto the the robustness of phylogenomic analyses. Internode certainty scores for 151/253 maximum-likelihood tree topology presented in Extended Data the sister relationship of Rhodelphis and red algae are higher for the 50 Fig. 4b. a, Internode certainty and internode certainty-all scores for 253 best-supported trees, indicating that they also support this relationship, individual bootstrapped maximum-likelihood trees used to generate the and with fewer conflicting signals. concatenated alignment for Extended Data Fig. 4b. b, Internode certainty Letter RESEARCH

Thalassiosira pseudonana 89.3/100 Odontella aurita Asterionellopsis glacialis Phaeodactylum tricornutum 99.3/79 Bolidomonas pacifica Phaeomonas parva Pinguiococcus pyrenoidosus --/71 Pseudopedinella elastica Rhizochromulina marina Dictyocha speculum Pelagomonas calceolata 98.7/-- Aureococcus anophageferrens Aureoumbra lagunensis Chattonella subsalsa Heterosigma akashiwo Fibrocapsa japonica Ectocarpus siliculosus Stramenopiles Vaucheria litorea Spumella elongata Ochromonas sp. MMETSP1177 83.2/99 Mallomonas sp. --/-- Paraphysomonas imperforata Nannochloropsis gaditana Phytophthora --/72 Saprolegnia parasitica Schizochytrium aggregatum Thraustochytrium sp. Aurantiochytrium limacinum Aplanochytrium Cafeteria sp. 99.6/100 Blastocystis hominis 81.6/91 Cafeteria roebergensis Reticulomyxa filosa Elphidium sp Astrolonche serrata Gromia sphaerica Rhizaria Bigelowiella natans Chlorarachnion reptans Plasmodium falciparum Babesia bovis Toxoplasma gondii Cryptosporidium muris Colpodella 99.7/100 Voromonas pontica GG 99.6/100 Chromera velia Vitrella brassicaformis Amphidinium carterae --/93 Noctiluca scintillans Alveolata Karenia brevis Hematodinium sp. Perkinsus marinus 88.4/98 Tetrahymena thermophila Paramecium tetraurelia Platyophrya macrostoma 99.6/100 Sterkiella histriomuscorum Euplotes Litonotus pictus Protocruzia adherens 76.2/87 Colponema vietnamica Pleurochrysis carterae Calcidiscus leptoporus Scyphosphaera apsteinii Isochrysis galbana 89.1/99 Emiliania huxleyi Chrysochromulina polylepis Haptophyta Prymnesium parvum Chrysochromulina brevifilum Phaeocystis antarctica Phaeocystis sp. Pavlovales sp. 99/99 Raphidiophrys heterophryoidea 86.1/98 Raphidiophrys ambigua Raineriophrys erinaceoides --/97 Choanocystis sp. 82/96 Acanthocystis sp. Centrohelida Ancoracysta twista Arabidopsis 92.1/99 Populus trichocarpa Mimulus guttatus Brachypodium distachyon Oryza sativa 100/96 Selaginella moellendorffii 90.3/99 Physcomitrella patens Volvox carteri Chlamydomonas reinhardtii Coccomyxa sp. Chlorella vulgaris 92.9/95 Ostreococcus lucimarinus Micromonas sp. Cyanophora paradoxa --/76 Glaucocystis nostochinearum Cyanoptyche gloeocystis Archaeplastida --/79 Gloeochaete witrockiana Porphyridium cruentum 99.9/100 Porphyridium aerugineum --/85 Erythrolobus 99.5/100 Compsopogon coeruleus 92.2/87 Madagascaria erythrocladoides Rhodosorus marinus Chondrus crispus Porphyra 98.3/87 Rhodella maculata Galdieria sulphuraria 100/95 99.8/100 Cyanidioschyzon merolae Rhodelphis marinus 99.9/99 Rhodelphis limneticus Rhodomonas abbreviata --/89 Rhodomonas sp. Cryptophyceae sp. Guillardia theta Hemiselmis rufescens --/80 Hemiselmis andersenii Cryptomonas curvata Cryptista 92.5/98 Goniomonas sp. Goniomonas pacifica Roombia truncata Palpitomonas bilix Percolomonas cosmopolitus MMETSP0759 Percolomonas cosmopolitus MMETSP0758 Naegleria gruberi 84.9/99 Sawyeria marylandensis 89.8/-- Seculamonas ecuadoriensis 95.4/100 Jakoba Excavata 97.5/-- Reclinomonas americana Bodo saltans Neobodo designis Eutreptiella gymnastica Tsukubamonas globosa Homo sapiens 99.6/100 Danio rerio Branchiostoma floridae Daphnia pulex Lottia gigantea Nematostella vectensis Monosiga brevicollis Capsaspora owczarzaki Cryptococcus neoformans Ustilago maydis Neurospora crassa Schizosaccharomyces pompe Obazoa Phycomyces blakesleeanus 95.2/80 Batrachochytrium dendrobatidis Thecamonas trahens 100/95 Breviate 100/95 Dictyostelium purpureum Dictyostelium discoideum 99.5/100 Polysphondylium pallidum 95.4/92 Hartmannella vermiformis Acanthamoeba castellanii --/-- --/70 Collodictyon sp. Malawimonas 151/50 dataset (21,886 sites) IQ-TREE LG + PMSF + G 0.2

Extended Data Fig. 6 | Phylogenomic analysis based on concatenation statistical support (maximum-likelihood ultrafast bootstrap and of 50 single-gene datasets with highest RTC scores. Maximum- SH-aLRT = 100%); support values <0.7/70% are not shown. The sister likelihood tree using the LG + PMSF + G model as implemented in relationship of Rhodelphis and red algae still receives full statistical support IQ-TREE (151 taxa, 50 proteins, 21,886 sites). Black dots denote full with a highly reduced, phylogenetically well-supported dataset. RESEARCH Letter

ab

Rhizochromulina marina Populus trichocarpa Pseudopedinella elastica -- Arabidopsis Dictyocha speculum Mimulus guttatus Pelagomonas calceolata Brachypodium distachyon Aureococcus anophageferrens Oryza sativa Aureoumbra lagunensis Physcomitrella patens -- Asterionellopsis glacialis 0.75 Selaginella moellendorffii 0.7 Phaeodactylum tricornutum Volvox carteri Odontella aurita Chlamydomonas reinhardtii -- Thalassiosira pseudonana Chlorella vulgaris Bolidomonas pacifica Coccomyxa sp. Phaeomonas parva 0.81 Ostreococcus lucimarinus Pinguiococcus pyrenoidosus Micromonas sp. -- Chattonella subsalsa Cyanoptyche gloeocystis Heterosigma akashiwo -- Gloeochaete witrockiana Fibrocapsa japonica -- Glaucocystis nostochinearum Vaucheria litorea Cyanophora paradoxa 0.8 Ectocarpus siliculosus Porphyridium cruentum -- Porphyridium aerugineum Nannochloropsis gaditana 0.79 Ochromonas sp. MMETSP1177 0.83 Erythrolobus 0.85 Spumella elongata Madagascaria erythrocladoides Mallomonas sp. Compsopogon coeruleus Paraphysomonas imperforata Chondrus crispus Phytophthora -- Porphyra 0.99 Saprolegnia parasitica -- Rhodella maculata Schizochytrium aggregatum Rhodosorus marinus Thraustochytrium sp. Cyanidioschyzon merolae Aurantiochytrium limacinum 0.96 Galdieria sulphuraria Aplanochytrium Rhodelphis limneticus Cafeteria sp. Rhodelphis marinus -- Blastocystis hominis Calcidiscus leptoporus 0.89 Cafeteria roebergensis Pleurochrysis carterae Colpodella Scyphosphaera apsteinii 0.9 Voromonas pontica GG Emiliania huxleyi 0.95 Chromera velia -- Isochrysis galbana Vitrella brassicaformis Chrysochromulina polylepis Babesia bovis Prymnesium parvum Plasmodium falciparum -- Chrysochromulina brevifilum -- Toxoplasma gondii Phaeocystis sp Cryptosporidium muris -- Phaeocystis antarctica Karenia brevis Pavlovales sp. Amphidinium carterae Ancoracysta twista Noctiluca scintillans -- -- Raineriophrys erinaceoides Hematodinium sp. Choanocystis sp. Perkinsus marinus 0.76 Raphidiophrys heterophryoidea 0.97 Raphidiophrys ambigua Euplotes 0.83 0.95 Sterkiella histriomuscorum Acanthocystis sp. Litonotus pictus Malawimonas 0.98 Paramecium tetraurelia Rhodomonas abbreviata Rhodomonas sp. Tetrahymena thermophila -- 0.84 Platyophrya macrostoma Cryptophyceae sp. Protocruzia adherens 0.81 Guillardia theta Hemiselmis rufescens Colponema vietnamica Hemiselmis andersenii Reticulomyxa filosa Cryptomonas curvata Elphidium sp. Goniomonas sp. Astrolonche serrata 0.94 Goniomonas pacifica Bigelowiella natans 0.99 Roombia truncata 0.76 Chlorarachnion reptans Gromia sphaerica Palpitomonas bilix Arabidopsis Rhizochromulina marina 0.89 Populus trichocarpa Pseudopedinella elastica Mimulus guttatus Dictyocha speculum Oryza sativa Aureococcus anophageferrens -- Pelagomonas calceolata Brachypodium distachyon Aureoumbra lagunensis Physcomitrella patens Nannochloropsis gaditana Selaginella moellendorffii -- Chattonella subsalsa Chlamydomonas reinhardtii Heterosigma akashiwo Volvox carteri Fibrocapsa japonica Coccomyxa sp. Ectocarpus siliculosus Chlorella vulgaris 0.84 -- 0.93 Micromonas sp. -- Vaucheria litorea Ostreococcus lucimarinus Phaeodactylum tricornutum Gloeochaete witrockiana Asterionellopsis glacialis 0.98 Odontella aurita 0.76 Glaucocystis nostochinearum -- Cyanophora paradoxa Thalassiosira pseudonana Cyanoptyche gloeocystis -- Bolidomonas pacifica Porphyridium aerugineum Phaeomonas parva Porphyridium cruentum Pinguiococcus pyrenoidosus 0.76 Erythrolobus Spumella elongata 0.76 -- Ochromonas sp. MMETSP1177 Madagascaria erythrocladoides Mallomonas sp. -- Compsopogon coeruleus Paraphysomonas imperforata -- Rhodella maculata Saprolegnia parasitica Rhodosorus marinus Phytophthora Chondrus crispus Thraustochytrium sp. Porphyra Schizochytrium aggregatum Galdieria sulphuraria Aurantiochytrium limacinum Cyanidioschyzon merolae Aplanochytrium Rhodelphis limneticus 0.74 Cafeteria roebergensis Rhodelphis marinus -- Blastocystis hominis Pleurochrysis carterae Cafeteria sp. Calcidiscus leptoporus Babesia bovis Scyphosphaera apsteinii Plasmodium falciparum Isochrysis galbana Toxoplasma gondii -- -- Emiliania huxleyi Cryptosporidium muris Chrysochromulina polylepis -- Voromonas pontica GG Prymnesium parvum -- Colpodella 0.88 Chrysochromulina brevifilum 0.81 Chromera velia Phaeocystis antarctica -- Vitrella brassicaformis Phaeocystis sp. Amphidinium carterae Pavlovales sp. 0.75 Noctiluca scintillans -- Raphidiophrys ambigua Karenia brevis Raphidiophrys heterophryoidea Hematodinium sp. -- Choanocystis sp. 0.85 Perkinsus marinus -- Raineriophrys erinaceoides 0.73 Sterkiella histriomuscorum Acanthocystis sp. 0.91 Euplotes Ancoracysta twista Litonotus pictus Rhodomonas abbreviata 0.93 Paramecium tetraurelia Rhodomonas sp. Tetrahymena thermophila Cryptophyceae sp. -- Platyophrya macrostoma -- Guillardia theta Protocruzia adherens Hemiselmis rufescens Colponema vietnamica Hemiselmis andersenii Reticulomyxa filosa Cryptomonas curvata 0.99 Elphidium sp. Goniomonas pacifica 0.7 Astrolonche serrata Goniomonas sp. Gromia sphaerica Roombia truncata Bigelowiella natans Palpitomonas bilix Chlorarachnion reptans Percolomonas cosmopolitus MMETSP0758 Percolomonas cosmopolitus MMETSP0758 Percolomonas cosmopolitus MMETSP0759 -- Percolomonas cosmopolitus MMETSP0759 Sawyeria marylandensis Naegleria gruberi -- Naegleria gruberi Sawyeria marylandensis Bodo saltans 0.98 Bodo saltans Neobodo designis Neobodo designis Eutreptiella gymnastica 0.73 Eutreptiella gymnastica Seculamonas ecuadoriensis Jakoba -- Jakoba 0.98 Seculamonas ecuadoriensis -- Reclinomonas americana 0.77 Reclinomonas americana Tsukubamonas globosa Tsukubamonas globosa Danio rerio Homo sapiens Homo sapiens 0.88 Danio rerio Branchiostoma floridae -- Branchiostoma floridae Lottia gigantea Lottia gigantea 0.86 Daphnia pulex 0.99 Daphnia pulex 0.99 Nematostella vectensis Nematostella vectensis Monosiga brevicollis Monosiga brevicollis Capsaspora owczarzaki Capsaspora owczarzaki Ustilago maydis Cryptococcus neoformans Cryptococcus neoformans Ustilago maydis Schizosaccharomyces pompe Neurospora crassa Neurospora crassa Schizosaccharomyces pompe -- Phycomyces blakesleeanus -- Phycomyces blakesleeanus -- Batrachochytrium dendrobatidis -- Batrachochytrium dendrobatidis Breviate Breviate Thecamonas trahens Thecamonas trahens -- Dictyostelium discoideum -- Dictyostelium purpureum Dictyostelium purpureum Dictyostelium discoideum Polysphondylium pallidum Polysphondylium pallidum -- 0.88 Acanthamoeba castellanii Hartmannella vermiformis 0.96 Hartmannella vermiformis 0.72 Acanthamoeba castellanii Malawimonas Collodictyon sp. -- Collodictyon sp.

2.0 2.0

ASTRAL-III species tree ASTRAL-III species tree 253 ML gene trees 50 ML gene trees 151 taxa 151 taxa Extended Data Fig. 7 | A coalescence phylogenomic framework not shown. a, Species trees were made from all 253 single-gene trees from recovers Rhodelphis as sister to red algae based on individual gene the 151/253 dataset. b, Species trees were made from the 50 trees with the trees. Individual bootstrapped gene trees were generated with RAxML highest relative tree certainty. The sister relationship of Rhodelphis and red v.8.1.6 and used to generate a species tree with ASTRAL-III under default algae is recovered with both datasets, and is consistent with concatenated parameters and 100 bootstrap replicates. Support values <0.7/70% are phylogenomic analyses. Letter RESEARCH

a

1

R. limneticus Groups R. marinus Phagocytotic eukaryote 0 Phagomixotrophic eukaryote Non-phagocytotic eukaryote PC2 (4%)

Rhodelphis

−1

−4 −2 02 PC1 (73%) b

1

Groups R. marinus R. limneticus Non-photosynthetic eukaryote Phagomixotrophic eukaryote

Photosynthetic eukaryote 0

PC2 (4%) Rhodelphis

−1

−4 −2 02 PC1 (73%) c

Color Key

Non-phagocytes Phagocytes

00.2 0.40.6 0.81 Value GO.0044655_phagosome.reneutralization_3 GO.0046952_ketone.body.catabolic.process_1 GO.0045806_negative.regulation.of.endocytosis_3 GO.0008045_motor.neuron.axon.guidance_3 GO.0014074_response.to.purine.containing.compound_6 GO.0098876_vesicle.mediated.transport.to.the.plasma.membrane_9 GO.0055070_copper.ion.homeostasis_2 GO.0010507_negative.regulation.of.autophagy_3 GO.0046683_response.to.organophosphorus_5 GO.1901031_regulation.of.response.to.reactive.oxygen.species GO.2000058_regulation.of.ubiquitin.dependent.protein.catabolic.process_6 GO.0034394_protein.localization.to.cell.surface_3 GO.0033132_negative.regulation.of.glucokinase GO.0016057_regulation.of.membrane.potential.in.photoreceptor.cell_1 GO.0061462_protein.localization.to.lysosome_8 GO.0007041_lysosomal.transport_7 GO.0071902_positive.regulation.of.protein.serine.threonine.kinase.activity_8 GO.0034097_response.to.cytokine_14 GO.0050767_regulation.of.neurogenesis_24 GO.0010506_regulation.of.autophagy_10 GO.0010770_positive.regulation.of.cell.morphogenesis.involved.in.differentiation_8 GO.0045666_positive.regulation.of.neuron.differentiation_14 GO.0031647_regulation.of.protein.stability_6 GO.0008361_regulation.of.cell.size_7 GO.0009166_nucleotide.catabolic.process_12 GO.0009612_response.to.mechanical.stimulus_7 GO.0030036_actin..organization_34 GO.0051017_actin.filament.bundle.assembly_7 GO.0019915_lipid.storage_5 GO.0030041_actin.filament.polymerization_13 GO.0007409_axonogenesis_14 GO.0045494_photoreceptor.cell.maintenance_8 GO.0010212_response.to.ionizing.radiation_9 GO.0030587_sorocarp.development_16 GO.0030218_erythrocyte.differentiation_5 GO.0010906_regulation.of.glucose.metabolic.process_4 GO.0032743_positive.regulation.of.inter leukin.2.production_3 GO.0033036_macromolecule.localization_125 GO.0032388_positive.regulation.of.intracellular.transport_7 GO.0005975_carbohydrate.metabolic.process_42 GO.0001894_tissue.homeostasis_13 GO.0007399_nervous.system.development_77 GO.0051495_positive.regulation.of.cytoskeleton.organization_11 GO.0007040_lysosome.organization_6 GO.0048666_neuron.development_41 GO.0007423_sensory.organ.development_15 GO.0033299_secretion.of.lysosomal.enzymes_5 GO.0035264_multicellular..growth_7 GO.0001932_regulation.of.protein.phosphorylation_25 GO.0040011_locomotion_49 GO.1902905_positive.regulation.of.supramolecular.fiber.organization_8 GO.1902904_negative.regulation.of.supramolecular.fiber.organization_8 GO.0009880_embryonic.pattern.specification_6 GO.0060429_epithelium.development_38 GO.0042060_wound.healing_15 GO.1903829_positive.regulation.of.cellular.protein.localization_8 GO.0046835_carbohydrate.phosphorylation_3 GO.0031152_aggregation.involved.in.sorocarp.development_9 GO.0051336_regulation.of.hydrolase.activity_34 GO.0060322_head.development_26 GO.0006915_apoptotic.process_42 GO.0061448_connective.tissue.development_6 GO.0001822_kidney.development_8 GO.0030097_hemopoiesis_17 GO.0000910_cytokinesis_11 GO.0022610_biological.adhesion_42 GO.0051260_protein.homooligomerization_9 GO.0051896_regulation.of.protein.kinase.B.signaling_4 GO.0002790_peptide.secretion_18 GO.0007601_visual.perception_8 GO.0045936_negative.regulation.of.phosphate.metabolic.process_16 GO.9999999_noGOannot_75 GO.0019835_cytolysis_5 GO.0051298_centrosome.duplication_7 GO.0007517_muscle.organ.development_8 GO.0008015_blood.circulation_10 GO.0019692_deoxyribose.phosphate.metabolic.process_5 GO.0018209_peptidyl.serine.modification_7 GO.0001501_skeletal.system.development_12 GO.0001503_ossification_8 . i i fii e is s m r f va on tus ae d m olae y r tta aste patens sorghi g tach thaliana onatus inhardtii C. tobiF.n alba P. abieO. satiss P. orella sp V. carter R. filosa P. S. pombcor e california irregularis T. trahenB. natanss B. saltans C. me di A. M. gu castellani lano. musculus R. marinus moellendor B. S. cerevisiC. Chl C. re N. R. A. M P. tetraurelia dendrobatiA. macrogynuD. discoideu T. thermophilaR. limneticus S. B. A. subglobosuD. me C. tetramitoformis

Extended Data Fig. 8 | Genomic support for Rhodelphis as non- by free-living phagocytes (a; n = 86 GO categories, 474 proteins) and photosynthetic phagotrophs. a, b, Principal component (PC) plots of photosynthesis model (b; n = 37 GO categories, 243 proteins). c, Heat gene ontology (GO) category score from free-living phagocytes (a) and map of phagocyte GO terms showing (as in a) that Rhodelphis gene photosynthetic (b). a, Rhodelphis associate with phagocytes, repertoires are similar to phagocytes. Analyses were performed using but not with photosynthetic eukaryotes. Dashed ellipses represent 95% PredictTrophicMode_Tool.R. confidence intervals based on training datasets using a model defined RESEARCH Letter

Extended Data Fig. 9 | Rhodelphis encode plastid-targeted proteins homologues (names in cyan), but lack a signal peptide (SP) characteristic with N-terminal targeting sequences and homologues of the TIC/TOC of proteins targeted to complex secondary or tertiary plastids, as found in plastid import system. a, An alignment of plastid-type chaperonin 60 Plasmodium (orange). b–e, Rhodelphis nuclear genomes encode bona fide (related to Fig. 3b) shows that Rhodelphis nuclear genomes encode plastid- homologues of plastid protein import subunits TIC20, TIC32, TIC22 and targeted proteins that have clear N-terminal extensions (cTP) relative to TOC75, which are specific genetic markers for plastid presence. plastid-encoded red algal homologues (names in red) and cyanobacterial Letter RESEARCH

Extended Data Table 1 | Summary of Rhodelphis genome and transcriptome data

a, Assembly statistics for the R. limneticus WGA dataset. HiSeq X 150-bp paired-end and MinION reads were assembled with SPAdes v.3.11.1, the assembly (scafolds) was decontaminated using Autometa and assessed with QUAST v.5.0.2. N50 and N75 values refer to the shortest contig lengths that make up 50% and 75% of the genome assembly, respectively. L50 and L75 refer to the smallest number of contigs that account for 50% and 75% of the assembly length, respectively. b, Summary of BUSCO reports for both Rhodelphis species based on decontaminated transcriptome data. R. limneticus results are from a combination of single-cell and culture datasets. Percentage of transcripts mapping to the R. limneticus genome was determined with isoblat and introns were inferred from GMAP alignments. High-quality R. marinus genome data were not generated before the death of the culture. c, Orthologous group (OG) distribution across red algae and Rhodelphis. OrthoFinder v.2.0.0 was used to infer orthologous groups between proteins inferred from R. marinus and R. limneticus transcriptomes and red algal genomes. Rhodelphis genomes are gene-rich in comparison to published red algal genomes, and retain components of evolutionarily conserved structures (such as fagella and centrioles) that have been lost in red algae. Species is abbreviated as ‘spec’. nature research | reporting summary

Corresponding author(s): Ryan Gawryluk

Last updated by author(s): May 26, 2019 Reporting Summary Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency in reporting. For further information on Nature Research policies, see Authors & Referees and the Editorial Policy Checklist.

Statistics For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section. n/a Confirmed The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly The statistical test(s) used AND whether they are one- or two-sided Only common tests should be described solely by name; describe more complex techniques in the Methods section. A description of all covariates tested A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)

For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted Give P values as exact values whenever suitable.

For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated

Our web collection on statistics for biologists contains articles on many of the points above. Software and code Policy information about availability of computer code Data collection Sequence quality: FastQC v0.10.1 Transcriptome assembly: Trimmomatic v0.32;Trinity v2.0.6;Transdecoder v5.0.2;CD-HIT v4.6 Genome assembly: SPAdes v3.11.1

Data analysis Transcriptome and genome analysis: QUAST v5.0.2; GMAP v2016-08-16; Autometa, cloned Jan 29, 2019 ( https://bitbucket.org/ jason_c_kwan/autometa); phyloFlash v3.0;R v3.5.0;BUSCO v3.0.1;BLAST+ v2.2.30;WebLogo v3 (http://weblogo.threeplusone.com/ create.cgi);isoblat v0.3

Comparative genomics: OrthoFinder v2.0.0;KEGG Automatic Annotation Server (KAAS) (https://www.genome.jp/kegg/kaas/);TargetP v1.1

Phylogenomic and phylogenetic: mesquite v3.5;trimAL v1.2;MAFFT v7.212;SCaFOs v1.2.5;BMGE v1.12;RAxML v8.1.6;FastTree v2.1.7 SSE3;IQ-TREE v1.5.5;PhyloBayes MPI v1.7;AgentSmith;ASTRAL-III v5.6.2;Prequal v1.0.2 For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.

Data October 2018 Policy information about availability of data All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: - Accession codes, unique identifiers, or web links for publicly available datasets - A list of figures that have associated raw data - A description of any restrictions on data availability

Raw transcriptome and genome reads from R. limneticus and R. marinus have been deposited in the GenBank Sequence Read Archive (SRA). Full SSU rRNA gene sequences have been deposited in GenBank under the accessions MK966712 (R. marinus) and MK966713 (R. limneticus). Assembled transcriptomes and genomes

1 have been deposited in Dryad (ACCESSION), along with raw light and electron microscopy images, individual gene alignments, concatenated and trimmed

alignments, and ML and Bayesian tree-files for the 151-taxon and 153-taxon datasets . Zoobank accessions are also provided for family nature research | reporting summary (urn:lsid:zoobank.org:act:80B5C004-2954-4A57-A411-482BCD29E85D), genus (urn:lsid:zoobank.org:act:6D09D4D9-D9FC-4D0C-8FB2-55FD9DDEAD53), and species (R. limneticus, urn:lsid:zoobank.org:act:695ACD0B-8151-4609-97FC-A044A312BE22; R. marinus, urn:lsid:zoobank.org:act:84233191-4710-43D1- A2DA-914B8E7B7E01).

Field-specific reporting Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection. Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf

Ecological, evolutionary & environmental sciences study design All studies must disclose on these points even when the disclosure is negative. Study description In this study, we describe two species from a novel phylum of predatory eukaryotic microbes, Rhodelphidia. We performed detailed ultrastructural, transcriptomic/genomic, and phylogenomic analyses, showing that Rhodelphis are the sister lineage to red algae, and that they likely retain a non-photosynthetic plastid involved in heme biosynthesis.

Research sample This research describes two new species, Rhodelphis limneticus and R. marinus, from a new phylum of predatory eukaryotic microbes that is the sister lineage to red algae. The organisms were collected from freshwater lake sediments and seawater sediments, respectively.

Sampling strategy Sample size is not relevant to the present study.

Data collection Samples were collected from freshwater and marine sediments, and the new organisms were subsequently grown in the laboratory. Microscopic data were recorded by D Tikhonenkov. Sequencing data were generated by the UCLA Clinical Microarray Core (R. marinus), The Centre of Applied Genomics (Toronto, Canada; R. limneticus), and in-house using a minION (F Husnik). Transcriptome/ genome data were assembled by R Gawryluk.

Timing and spatial scale Sampling relevant to the present study was carried out only two times: once from marine coral sand off of a small island, Bay Canh, near Con Dao Island, South Vietnam on May 3, 2015, and once from a near-shore freshwater sample on August 9, 2016. We had no reason to expect to find the organisms that we did, so there is no specific rationale to sampling sites.

Data exclusions Sequencing data from prey organisms were excluded from the analyses as best as possible. To do this for transcriptomes (R.marinus and R. limneticus), we subtracted transcripts derived from prey (kinetoplastids and co-cultured bacteria) from the total datasets. For genomic datasets, we used automated binning (autometa) along with limited manual curation in order to reduce prey contributions to the genome datasets. The raw data associated with this are still accessible in the raw read files deposited in the NCBI SRA database.

Reproducibility Microscopic analyses were conducted several times. Phylogenomic analyses were carried out with a number of different approaches (maximum likelihood, Bayesian etc.) and all associated datasets have been made available.

Randomization Randomization is not relevant to the present study because organisms were not allocated into groups.

Blinding Blinding was not relevant to the present study. Did the study involve field work? Yes No

Field work, collection and transport Field conditions Climatic conditions in the field were not recorded and are not relevant to the study.

Location 1) Bay Canh island near Con Dao Island, South Vietnam (8.666288 N, 106.680488 E). 2) Lake Trubin (flood-lands of Desna River, 51.397222 N, 32.368889 E), near Yaduty village, Chernigovskaya oblast, Ukraine

Habitats were accessed via a motor boat (location 1) and a car (location 2). No permissions were required for sampling in the

Access and import/export October 2018 selected sampling sites.

Disturbance No disturbances to the sites were caused; we sampled a small amount of water and sand/debris from a marine and lake habitat.

Reporting for specific materials, systems and methods

2 We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response. nature research | reporting summary Materials & experimental systems Methods n/a Involved in the study n/a Involved in the study Antibodies ChIP-seq Eukaryotic cell lines Flow cytometry Palaeontology MRI-based neuroimaging Animals and other organisms research participants Clinical data

Eukaryotic cell lines Policy information about cell lines Cell line source(s) Two clonal cultures of protists were isolated from marine coral sand and a freshwater sample containing organic debris.

Authentication Phase contrast light microscopy and 18S rRNA gene sequencing was used for authentication.

Mycoplasma contamination This is not relevant to cell culture.

Commonly misidentified lines This is not relevant to protist cell culture. (See ICLAC register)

Animals and other organisms Policy information about studies involving animals; ARRIVE guidelines recommended for reporting research Laboratory animals The study did not involve laboratory animals.

Wild animals The study did not involve wild animals (or any animals).

Field-collected samples Monoeukaryotic cultures of Rhodelphis marinus and R. limneticus were established by isolating cells with a glass micropipette. Cultures were maintained at room temperature. R. marinus and R. limneticus were propagateed using the kinetoplastid protists Procryptobia sorokini B-69 and Parabodo caudatus BAS-1 as prey, respectively. The kinetoplastids were grown in marine Schmalz-Pratt’s medium and spring water and preyed upon Pseudomonas fluorescens.

Ethics oversight No ethical approval was required. The organisms described here are novel eukaryotic microbes (protists) that feed on other protists and pose no risks. Note that full information on the approval of the study protocol must also be provided in the manuscript. October 2018

3