
Molecular Biology And Evolution Achimer April 2016, Volume 33 Issue 4 Pages 980-983 http://dx.doi.org/10.1093/molbev/msv340 http://archimer.ifremer.fr http://archimer.ifremer.fr/doc/00320/43156/ © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected]

Evolutionary Origins of Rhizarian Parasites

Sierra Roberto 1, *, Canas-Duarte Silvia J. 2, 8, Burki Fabien 3, Schwelm Arne 4, Fogelqvist Johan 4, Dixelius Christina 4, Gonzalez-Garcia Laura N. 2, Gile Gillian H. 5, Slamovits Claudio H. 5, Klopp Christophe 6, Restrepo Silvia 2, Arzul Isabelle 7, Pawlowski Jan 1

1 Univ Geneva, Dept Genet & Evolut, Geneva, Switzerland. 2 Univ Los Andes, Dept Biol Sci, Bogota, Colombia. 3 Univ British Columbia, Dept Bot, Vancouver, BC, Canada. 4 Swedish Univ Agr Sci, Linnean Ctr Plant Biol, Uppsala BioCtr, Dept Plant Biol, Uppsala, Sweden. 5 Dalhousie Univ, Dept Biochem & Mol Biol, Halifax, NS, Canada. 6 INRA, GenoToul Bioinformat Facil, F-31931 Toulouse, France. 7 IFREMER, La Tremblade, France. 8 Harvard Univ, Sch Med, Dept Syst Biol, Boston, MA USA.

* Corresponding author : Roberto Sierra, email address : [email protected]

Abstract :

The SAR group (Stramenopila, Alveolata, Rhizaria) is one of the largest clades in the tree of eukaryotes and includes a great number of parasitic lineages. Rhizarian parasites are obligate and have devastating effects on commercially important plants and animals, yet our knowledge of their biology and evolution is limited. Here, we present rhizarian transcriptomes from all major parasitic lineages in order to elucidate their evolutionary relationships using a phylogenomic approach. Our results suggest that Ascetosporea, parasites of marine invertebrates, are sister to the novel clade Apofilosa. The phytomyxean plant parasites branch sister to the vampyrellid algal ectoparasites in the novel clade Phytorhiza. They also show that Ascetosporea + Apofilosa + Retaria + Filosa + Phytorhiza form a monophyletic clade, although the branching pattern within this clade is difficult to resolve and appears to be model-dependent. Our study does not support the monophyly of the rhizarian parasitic lineages (Endomyxa), suggesting independent origins for rhizarian animal and plant parasites.

Keywords : Rhizaria, parasites, phylogenomics

Please note that this is an author-produced PDF of an article accepted for publication following peer review. The definitive publisher-authenticated version is available on the publisher Web site.

The SAR group includes a large diversity of eukaryotic parasites, including economically important plant parasites such as oomycetes (Stramenopila) and plasmodiophorids (Rhizaria), as well as the human malaria parasite (Alveolata), along with many other animal parasites that infect commercially important invertebrates. Parasitic rhizarians comprise poorly studied organisms that are split into two major assemblages based on their hosts, either animals or plants. The Ascetosporea, composed of Haplosporida (e.g. Bonamia) and Paramyxida (no phylogenomic data available for the group), are intracellular parasites of marine invertebrates such as Turbellaria, molluscs and crustaceans (Hartikainen, Ashford, et al. 2014; Hartikainen, Stentiford, et al. 2014). The Phytomyxea, composed of Plasmodiophorida and Phagomyxida (e.g. Phagomyxa, no phylogenomic data available for the group), are obligate biotrophic parasites of plants, diatoms, oomycetes and brown algae (Neuhauser, et al. 2014). Additionally, the naked vampyrellid amoebae, most commonly found in freshwater habitats, are ectoparasites of algae, fungi, small metazoans, and other protists (Hess, et al. 2012; Berney, et al. 2013). Despite being very diverse, vampyrellids have remained enigmatic, but these parasites will likely gain attention as the commercial algae industry grows (Carney and Lane 2014).

Based on 18S rRNA phylogenies, all rhizarian parasites have been classified as Endomyxa together with two non-parasitic lineages, Gromia and Filoreta (Cavalier-Smith 2002). However, multigene phylogenetic analyses neither supported their common origin (Sierra, et al. 2013) nor resolved their evolutionary relationships, mainly because Ascetosporea were represented only by the extremely fast-evolving Mikrocytos mackini (Burki, et al. 2013). Here, we generated extensive transcriptomic data from representative species of all major rhizarian clades to elucidate the rhizarian parasites’ evolutionary relationships by means of phylogenomic analyses. Most notably, we obtained and analyzed Illumina-sequenced transcriptomic data for two ascetosporeans (Bonamia and Minchinia), a vampyrellid (Leptophrys), and the plasmodiophorid Spongospora, together with the complete genome data from Plasmodiophora (Schwelm, et al. 2015). In addition, 454 GS-FLX Titanium reads were generated for 12 additional rhizarian species, including Gromia, Aulacantha scolymantha and 10 retarians (fig. 1, species in bold). These data were used to assemble a super-matrix of 229 proteins and 56 species containing 64,107 amino acid positions. The average proportion of missing sites per taxon in the full matrix is 45% (supplementary table S2).
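As an illustration of how a super-matrix and its per-taxon missing-data proportion can be computed, the following minimal Python sketch concatenates per-gene alignments and pads absent taxa. It is not the pipeline used in this study (concatenation was done with SCaFoS); the function names, the `?` padding symbol, the choice to count both `-` and `?` as missing, and the toy sequences are all assumptions made for the example.

```python
from typing import Dict, List

MISSING = "?"        # assumed symbol for a taxon absent from a gene
ABSENT_CHARS = "-?"  # assumed: gaps and '?' both count as missing sites


def concatenate(alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Join per-gene alignments into one super-matrix keyed by taxon."""
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    parts: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for aln in alignments:
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError("each alignment needs sequences of equal length")
        width = widths.pop()
        for taxon in taxa:
            # A taxon without this gene is padded with missing symbols.
            parts[taxon].append(aln.get(taxon, MISSING * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


def missing_fraction(matrix: Dict[str, str]) -> Dict[str, float]:
    """Fraction of missing sites for every taxon in the super-matrix."""
    return {
        taxon: sum(ch in ABSENT_CHARS for ch in seq) / len(seq)
        for taxon, seq in matrix.items()
    }


# Toy data (hypothetical sequences, two genes, three taxa).
gene_1 = {"Bonamia": "MKV", "Gromia": "M-V"}
gene_2 = {"Bonamia": "LL", "Leptophrys": "LI"}

matrix = concatenate([gene_1, gene_2])
per_taxon = missing_fraction(matrix)
average = sum(per_taxon.values()) / len(per_taxon)
```

With real data, each dictionary would hold one trimmed protein alignment, and `average` would correspond to the mean proportion of missing sites per taxon reported for the full matrix.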

We recovered a sister relationship of Rhizaria and Alveolata with Stramenopila at their base using green plants as outgroup (fig. 1). This topology is strongly supported (fig. 1, branch support 1/0.5/98/98/100) by most analyses with the exception of the Bayesian inference calculated on PhyloBayes (Lartillot, et al. 2009), where the posterior probability (PP) for the branch is 0.5 (fig. 1 and supplementary fig. S3). This result is also congruent with the previously obtained phylogeny based on 109 protein-coding genes (Sierra, et al. 2013). However, different topologies have been proposed for the SAR group. Burki, et al. (2013) suggested a sister relationship of Alveolata and Stramenopila with Rhizaria at their base while Parfrey, et al. (2010) proposed a sister relationship of Rhizaria and Stramenopila with Alveolata at their base. Hence the branching order within the group remains unresolved.

Our results show that the ascetosporean parasites form a monophyletic group of shellfish parasites including a strongly supported (fig. 1, branch support .99/1/93/98/93) clade of Bonamia and Minchinia. The best topologies obtained from our analyses place the intracellular parasite, M. mackini, in the Ascetosporea clade. However, due to its fast evolutionary rate, its phylogenetic position is not strongly supported (fig. 1, branch support .57/-/53/65/100), except for the maximum support obtained with the non-parametric SH-aLRT single branch test. The Bayesian analysis run with PhyloBayes (Lartillot, et al. 2009) does not resolve the position of M. mackini and thus results in a trichotomy of Retaria, Ascetosporea + Gromia + Filoreta and M. mackini (supplementary fig. S3). As previously suggested by Burki, et al. (2013), the Ascetosporea form a sister group to the clade Gromia + Filoreta, called here Apofilosa. The support for this sister group relationship is not very strong (fig. 1, branch support .60/-/53/65/100), yet in the absence of M. mackini, the Ascetosporea + Apofilosa form a robust clade (supplementary fig. S1, branch support .92/.97/100/100). On the other hand, the rhizarian plant parasites, Phytomyxea and Vampyrellida, were recovered in a strongly supported (fig. 1, branch support 1/1/100/100/100) monophyletic clade named here Phytorhiza. In the absence of M. mackini, the Phytorhiza clade is still recovered with maximum branch support values, but its position within Rhizaria is weakly supported (supplementary fig. S1, branch support .92/-/64/70) compared to the full data set analyses (fig. 1, branch support .98/1/89/90/100). A monophyletic Endomyxa (rhizarian parasites, Gromia and Filoreta) was only recovered by Bayesian inference with PhyloBayes in the absence of M. mackini, though not supported (0.66 PP, supplementary fig. S8). Additionally, the sister relationship of this assemblage with Retaria is not resolved (0.58 PP).

To conclude, the highly supported deep nodes of our phylogenomic tree suggest that there are four major groups of Rhizaria: Filosa, Phytorhiza, Ascetosporea + Apofilosa, and Retaria (fig. 1); this greatly clarifies rhizarian phylogeny. Nonetheless, the diversity of these groups is extremely high, and further sampling and data are needed to obtain better phylogenetic resolution. In particular, the addition of two ascetosporean taxa to the data matrix resulted in a sister relationship between ascetosporeans and Apofilosa; together with the strongly supported clade of Vampyrellida and Phytomyxea, this allowed us to confirm the independent evolution of parasitism in Phytorhiza and Ascetosporea.

Materials and Methods

Species and alignments. A phylogenomic supermatrix of 229 genes (64,107 positions, supplementary table S1) was constructed using full-length proteins based on a pool of genes from previous studies (Burki, et al. 2010; Burki, et al. 2012; Burki, et al. 2013). Rhizarian sequences were assigned and added to the individual alignments based on homology searches using blastp. The alignments included representative species for all major groups of eukaryotes. New, unpublished sequences from L. vorax, S. subterranea, P. brassicae, M. chitonis, B. ostreae, M. mackini, G. sphaerica, six species of Foraminifera (Elphidium margaritaceum, Globobulimina turgida, Brizalina sp., Bulimina marginata, Nonionellina sp., and Ammonia sp.), five radiolarians (Phyllostaurus siculus, Amphilonche elongata, Astrolonche serrata, Collozoum sp., and Spongosphaera streptacantha) and A. scolymantha were generated for this study. See supplementary methods for details of library preparation, sequencing, and assembly. All new sequences were added and automatically aligned to the data set using MAFFT v.6.847b (Katoh, et al. 2002). Ambiguously aligned positions were removed using BMGE (Criscuolo and Gribaldo 2010).

Phylogenetic analyses. Single-gene maximum likelihood (ML) trees were inferred with RAxML v.8.0.26 (Stamatakis 2006) using the PROTGAMMALGF setting with 1000 bootstrap replicates. All trees were carefully examined to discard any sequence from the SAR group that branched with plants or animals. The 229 alignments were concatenated using SCaFoS (Roure, et al. 2007). The ML analysis of the phylogenomic matrix was performed using RAxML with the PROTGAMMALGF setting, and statistical support was evaluated with 1000 standard bootstrap replicates. A second ML analysis was performed with IQ-Tree under the LG+G model, with 1000 ultrafast bootstrap replicates and the non-parametric SH-aLRT single branch test (Nguyen, et al. 2015). Bayesian inference was carried out using ExaBayes with the revmat=(0) setting (Aberer, et al. 2014) and PhyloBayes v.3.3f (Lartillot, et al. 2009) under the CAT-GTR model, with two independent chains run until convergence. For post-analysis of the independent chains, a 20% burn-in was used.
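The burn-in and consensus step can be sketched as follows. This is a simplified illustration, not the ExaBayes or PhyloBayes post-analysis code: each sampled tree is represented directly as a set of clades (frozensets of taxon names) rather than being parsed from Newick, and the function names, the strict "greater than 0.5" majority threshold and the toy chains are assumptions.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set

Clade = FrozenSet[str]
Tree = Set[Clade]  # assumed representation: a tree is the set of its clades


def discard_burnin(samples: List[Tree], fraction: float = 0.20) -> List[Tree]:
    """Drop the first `fraction` of a chain's samples."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("burn-in fraction must be in [0, 1)")
    return samples[int(len(samples) * fraction):]


def clade_support(chains: List[List[Tree]], burnin: float = 0.20) -> Dict[Clade, float]:
    """Posterior probability of each clade, pooled over post-burn-in samples."""
    pooled = [tree for chain in chains for tree in discard_burnin(chain, burnin)]
    counts = Counter(clade for tree in pooled for clade in tree)
    return {clade: n / len(pooled) for clade, n in counts.items()}


def majority_rule(support: Dict[Clade, float], threshold: float = 0.5) -> Dict[Clade, float]:
    """Keep clades found in more than `threshold` of the pooled samples."""
    return {clade: p for clade, p in support.items() if p > threshold}


# Toy example: two chains of ten samples each (hypothetical data).
AB = frozenset({"Bonamia", "Minchinia"})
GF = frozenset({"Gromia", "Filoreta"})
chain_1 = [{GF}] * 2 + [{AB, GF}] * 8
chain_2 = [{AB}] * 10

support = clade_support([chain_1, chain_2], burnin=0.20)
consensus = majority_rule(support)
```

In this toy case the first two samples of each chain are discarded, the Bonamia + Minchinia clade appears in all 16 pooled samples, and the Gromia + Filoreta clade appears in exactly half of them, so it is left out of the majority-rule consensus.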

Data availability. The transcriptomes used for this study were deposited under: SRX1014535, SRX1014576, SRX1014585, SRX1014586, SRX1014587, SRX1014589, SRX1016224, SRS621308, SRS412271, PRJEB8376, PRJEB9159, SRS621308, SRS621524.

Acknowledgements. Most computations were performed at the University of Geneva on the Baobab cluster (http://baobab.unige.ch); PhyloBayes was run on the CIPRES portal (phylo.org); and sequence assemblies were performed at the Vital-IT High Performance Computing Center (http://www.vital-it.ch) at the Swiss Institute of Bioinformatics. We are indebted to Bryony A. P. Williams for providing chitons infected with Minchinia chitonis. GHG is supported by an NSERC postdoctoral fellowship. CHS is a Fellow of the Canadian Institute for Advanced Research and is also supported by the Natural Sciences and Engineering Research Council of Canada (Discovery Grant 386345). AS and CD are supported by BioSoM and the Swedish University of Agricultural Sciences, SLU. This work was supported by the Swiss National Science Foundation Grant No. 140766 and 159709 (RS, JP).

References

Aberer AJ, Kobert K, Stamatakis A. 2014. ExaBayes: massively parallel Bayesian tree inference for the whole-genome era. Mol Biol Evol 31:2553-2556.

Berney C, Romac S, Mahe F, Santini S, Siano R, Bass D. 2013. Vampires in the oceans: predatory cercozoan amoebae in marine habitats. ISME J 7:2387-2399.

Burki F, Corradi N, Sierra R, Pawlowski J, Meyer GR, Abbott CL, Keeling PJ. 2013. Phylogenomics of the intracellular parasite Mikrocytos mackini reveals evidence for a mitosome in Rhizaria. Curr Biol 23:1541-1547.

Burki F, Kudryavtsev A, Matz MV, Aglyamova GV, Bulman S, Fiers M, Keeling PJ, Pawlowski J. 2010. Evolution of Rhizaria: new insights from phylogenomic analysis of uncultivated protists. BMC Evol Biol 10:377.

Burki F, Okamoto N, Pombert JF, Keeling PJ. 2012. The evolutionary history of haptophytes and cryptophytes: phylogenomic evidence for separate origins. Proc Biol Sci 279:2246-2254.

Carney LT, Lane TW. 2014. Parasites in algae mass culture. Front Microbiol 5:278.

Cavalier-Smith T. 2002. The phagotrophic origin of eukaryotes and phylogenetic classification of Protozoa. Int J Syst Evol Microbiol 52:297-354.

Criscuolo A, Gribaldo S. 2010. BMGE (Block Mapping and Gathering with Entropy): a new software for selection of phylogenetic informative regions from multiple sequence alignments. BMC Evol Biol 10:210.

Hartikainen H, Ashford OS, Berney C, Okamura B, Feist SW, Baker-Austin C, Stentiford GD, Bass D. 2014. Lineage-specific molecular probing reveals novel diversity and ecological partitioning of haplosporidians. ISME J 8:177-186.

Hartikainen H, Stentiford GD, Bateman KS, Berney C, Feist SW, Longshaw M, Okamura B, Stone D, Ward G, Wood C, et al. 2014. Mikrocytids are a broadly distributed and divergent radiation of parasites in aquatic invertebrates. Curr Biol 24:807-812.

Hess S, Sausen N, Melkonian M. 2012. Shedding light on vampires: the phylogeny of vampyrellid amoebae revisited. PLoS ONE 7:e31165.

Lartillot N, Lepage T, Blanquart S. 2009. PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics 25:2286-2288.

Neuhauser S, Kirchmair M, Bulman S, Bass D. 2014. Cross-kingdom host shifts of phytomyxid parasites. BMC Evol Biol 14:33.

Nguyen LT, Schmidt HA, von Haeseler A, Minh BQ. 2015. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol 32:268-274.

Parfrey LW, Grant J, Tekle YI, Lasek-Nesselquist E, Morrison HG, Sogin ML, Patterson DJ, Katz LA. 2010. Broadly sampled multigene analyses yield a well-resolved eukaryotic tree of life. Syst Biol 59:518-533.

Roure B, Rodriguez-Ezpeleta N, Philippe H. 2007. SCaFoS: a tool for selection, concatenation and fusion of sequences for phylogenomics. BMC Evol Biol 7 Suppl 1:S2.

Schwelm A, Fogelqvist J, Knaust A, Jülke S, Lilja T, Bonilla-Rosso G, Karlsson M, Shevchenko A, Dhandapani V, Choi SR, Kim HG, et al. 2015. The Plasmodiophora brassicae genome reveals insights in its life cycle and ancestry of chitin synthases. Sci Rep 5:11153.

Sierra R, Matz MV, Aglyamova G, Pillet L, Decelle J, Not F, de Vargas C, Pawlowski J. 2013. Deep relationships of Rhizaria revealed by phylogenomics: a farewell to Haeckel's Radiolaria. Mol Phylogenet Evol 67:53-59.

Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:2688-2690.

Figure 1. Phylogeny of the SAR group. The consensus tree shown was obtained by the majority rule of the Bayesian tree pool after 385 000 generations inferred with 229 genes using the GTR matrix (revMat with Dirichlet process). The statistical support on branches was estimated by Bayesian posterior probabilities of post-burnin bipartitions using ExaBayes, PhyloBayes under CAT-GTR model, bootstrap replicates from maximum likelihood analysis on RAxML and ultrafast bootstrap values and the single branch SH-like approximate likelihood ratio test (SH-aLRT) implemented on IQ-Tree, respectively. Maximum statistical support was obtained under the five methods where solid dots and no values are shown (i.e. 1/1/100/100/100). The parasitic lineages are marked with a capital "P" on the branches. New data added in this study is shown for the rhizarian species in bold italic fonts.

[Figure 1 tree: SAR group phylogeny with branch support values. Major groups labelled: RHIZARIA, ALVEOLATA, STRAMENOPILA and VIRIDIPLANTAE. Groups labelled within Rhizaria: Foraminifera, Polycystinea, Retaria, Ascetosporea, Apofilosa, Phytomyxea, Vampyrellida, Phytorhiza and Filosa. Scale bar: 0.3.]

Supplementary figure S1. Phylogeny of the SAR group excluding Mikrocytos mackini. Summary tree of supplementary figures S7-S10. The consensus tree shown was obtained by the majority rule of the Bayesian tree pool after 15 000 generations inferred with 229 genes using the GTR matrix (revMat with Dirichlet process). The statistical support on branches was estimated by Bayesian posterior probabilities of post-burnin bipartitions using ExaBayes, PhyloBayes under CAT-GTR model, bootstrap replicates from maximum likelihood analysis on RAxML and ultrafast bootstrap values implemented on IQ-Tree, respectively. Maximum statistical support was obtained under the four methods where solid dots and no values are shown (i.e. 1/1/100/100).

Supplementary figure S2. Phylogeny of the SAR group. The consensus tree shown was obtained by the majority rule of the Bayesian tree pool after 15,000 generations inferred with 229 genes using the GTR matrix (revMat with Dirichlet process) as implemented in ExaBayes. The statistical support on branches was estimated by Bayesian posterior probabilities of post-burnin bipartitions. Scale bar: 0.3.

Supplementary figure S3. Phylogeny of the SAR group. The strict consensus tree shown was obtained by the majority rule of the Bayesian tree pool after 4,000 generations inferred with 229 genes using the CAT+GTR setting as implemented in PhyloBayes. The statistical support on branches was estimated by Bayesian posterior probabilities of post-burnin bipartitions. Scale bar: 0.9.

Supplementary figure S4. Phylogeny of the SAR group. The best maximum likelihood tree shown was obtained using RAxML inferred with 229 genes using the PROTGAMMALG setting. The statistical support on branches was estimated by 1000 bootstrap replicates. Scale bar: 0.3.

Supplementary figure S5. Phylogeny of the SAR group. The best maximum likelihood tree shown was obtained using IQ-Tree inferred with 229 genes using the LG+GAMMA setting. The statistical support on branches was estimated by 1000 ultrafast bootstrap replicates. Scale bar: 0.3.

Supplementary figure S6. Phylogeny of the SAR group. The best maximum likelihood tree shown was obtained using IQ-Tree inferred with 229 genes using the LG+GAMMA setting. The statistical support on branches was estimated by the non-parametric SH-aLRT and 1000 ultrafast bootstrap replicates, respectively. Scale bar: 0.3.

Supplementary figure S7. Phylogeny of the SAR group excluding Mikrocytos mackini. The consensus tree shown was obtained by the majority rule of the Bayesian tree pool after 15,000 generations inferred with 229 genes using the GTR matrix (revMat with Dirichlet process) as implemented in ExaBayes. The statistical support on branches was estimated by Bayesian posterior probabilities of post-burnin bipartitions. Scale bar: 0.2.

Supplementary figure S8. Phylogeny of the SAR group excluding Mikrocytos mackini. The strict consensus tree shown was obtained by the majority rule of the Bayesian tree pool after 4,000 generations inferred with 229 genes using the CAT+GTR setting as implemented in PhyloBayes. The statistical support on branches was estimated by Bayesian posterior probabilities of post-burnin bipartitions. Scale bar: 0.4.

Supplementary figure S9. Phylogeny of the SAR group excluding Mikrocytos mackini. The best maximum likelihood tree shown was obtained using RAxML inferred with 229 genes using the PROTGAMMALG setting. The statistical support on branches was estimated by 1000 bootstrap replicates. Scale bar: 0.2.

Supplementary figure S10. Phylogeny of the SAR group excluding Mikrocytos mackini. The best maximum likelihood tree shown was obtained using IQ-Tree inferred with 229 genes using the LG+GAMMA setting. The statistical support on branches was estimated by 1000 ultrafast bootstrap replicates. Scale bar: 0.2.

Evolutionary origins of rhizarian parasites

Supplementary Materials and Methods

Isolation of Leptophrys vorax, library and preliminary analyses. Leptophrys vorax CCAC 3422B was co-cultured in tall Petri dishes with Closterium sp. CCAC 2697B in Waris-H culture medium at 18 °C. After the algae were consumed, the vampyrellid cells were collected and total RNA was extracted using the NucleoSpin RNA XS kit (Macherey-Nagel, Germany). The sample was prepared using the TruSeq RNA sample kit (Illumina) and sequenced on the Illumina HiSeq 2000. Low-quality reads were discarded and the remaining reads were assembled using the Velvet/Oases software (Zerbino and Birney 2008).

Isolation of Plasmodiophora brassicae and Spongospora subterranea, library and preliminary analyses. The DNA and RNA isolations and the gene prediction and annotation were performed as described by Schwelm et al. (2015).

Isolation of Bonamia ostreae, library preparation and preliminary analysis.

Sample collection. B. ostreae was purified according to the protocol developed by Mialhe et al. (1988) from one highly infected flat oyster collected in Quiberon Bay (Morbihan, France), a bay where bonamiosis is endemic. Briefly, the heavily infected oyster was selected by examination of heart tissue imprints using light microscopy. After homogenisation of all the organs except the adductor muscle, parasites were concentrated by differential centrifugation on sucrose gradients. Finally, the purified B. ostreae cells were suspended in filtered seawater (FSW) and counted using a Malassez cell. About 2 × 10^8 parasites were obtained.

RNA extraction and library preparation. After removing, parasite cells were placed in TRIzol® (Invitrogen) and kept at -80 °C until RNA isolation, which followed the manufacturer's protocol. Total RNA concentration was measured on a NanoDrop 2000c spectrophotometer (Thermo Scientific). The quality of the RNA sample was then assessed with an Agilent 2100 Bioanalyzer (Agilent Technologies). The cDNA libraries were prepared from 3 µg of total RNA with Illumina TruSeq RNA Sample Prep Kits (Illumina) for 100 bp paired-end reads according to the manufacturer's instructions. Libraries were sequenced on a HiSeq 2000 (Illumina).

Transcriptome analyses. The read pairs were checked and stored in the ng6 environment (Mariette et al. 2012). Remaining sequencing adapters were removed using trim_galore (http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/). Over-represented reads were filtered out using the normalize_by_kmer_coverage.pl script from the Trinity software package (Haas et al. 2013). The longest subsequence without Ns was extracted from each read and kept if its length exceeded half of the initial read length. This step was performed using an in-house script. Nine Oases (Schulz et al. 2012) assemblies using k-mers ranging from 25 to 69 (step 6) were performed on the cleaned read pairs. Only the longest contig per locus provided by the OasesV0.2.04OutputToCsvDataBase.py script (http://code.google.com/p/oases-to-csv/) was retained. All contig files were merged. Finally, anti-sense chimeras (accidentally produced by the assembly step) were found and split with an in-house script. Cd-hit-est (Li and Godzik 2006) was used to cluster and remove similar contigs produced by different k-mers, based on their sequence similarities. TGICL (Pertea et al. 2003) was used to assemble contigs having significant overlaps. Contigs shorter than 200 base pairs were discarded. Input reads were then mapped back to the contigs using bwa aln (Li and Durbin 2009). The resulting alignment files were used to correct spurious insertions and deletions in the contig sequences using an in-house script and to filter out contigs with very low coverage. The filter excluded contigs with fewer than two mapped reads per million.
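The two simplest filters in the transcriptome analysis above (N-free read trimming and the contig length and coverage cut-offs) can be illustrated with a short sketch. This is not the authors' in-house script: the function names are hypothetical, and taking the total number of mapped reads as the denominator for "reads per million" is an assumption, since the text does not specify it.

```python
import re
from typing import Optional


def longest_n_free(read: str) -> Optional[str]:
    """Return the longest stretch of `read` without N, or None if that
    stretch does not exceed half of the original read length."""
    if not read:
        return None
    # Split on runs of N (either case); max() keeps the first of any ties.
    pieces = re.split(r"[Nn]+", read)
    best = max(pieces, key=len)
    return best if len(best) > len(read) / 2 else None


def reads_per_million(mapped_to_contig: int, total_mapped: int) -> float:
    """Mapped reads on one contig, scaled per million mapped reads
    (assumed denominator, see the note above)."""
    return mapped_to_contig * 1_000_000 / total_mapped


def keep_contig(length: int, mapped_to_contig: int, total_mapped: int,
                min_length: int = 200, min_rpm: float = 2.0) -> bool:
    """Keep contigs of at least `min_length` bp that have at least
    `min_rpm` mapped reads per million."""
    rpm = reads_per_million(mapped_to_contig, total_mapped)
    return length >= min_length and rpm >= min_rpm
```

For example, `longest_n_free("ACGTNACGTACGT")` keeps the eight-base stretch after the N, whereas a read in which no N-free stretch exceeds half of the read length is dropped.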

Isolation of Minchinia chitonis, library and preliminary analysis.

Sample collection. Two heavily infected individuals of Lepidochitona cinerea were collected from Torre Abbey Sands beach, Torquay, Devon, on November 3, 2009. The presence of Minchinia chitonis was verified by light microscopy. Each chiton was allowed to rot in seawater in the lab for 2 weeks, after which the plates of each chiton were removed and the remaining clumps of chiton tissue were diced with a razor blade. The tissue and slurry from each chiton were concentrated by centrifugation and resuspended in 1 mL dH2O.

Density gradient centrifugation. Samples were layered on 90% Percoll™ in 0.15 M NaCl and spun at 30 000 rpm in an SW 41 rotor on a Beckman-Coulter Optima 190-K ultracentrifuge (Beckman-Coulter, Brea, CA, USA) for 1 hour at 12 °C. Each sample separated into a pale layer near the top of the tube, a small, dark pellet at the bottom of the tube, and a brown layer slightly above and incompletely separated from the pellet. The pellet consisted of spores and sand particles, the brown layer was nearly pure spores, and the top pale layer was mainly cellular debris with a few spores. The spore layers were extracted and the spin procedure was repeated twice more. Clean spore fractions from both chitons were pooled for subsequent wash steps. To remove Percoll™, then surface bacteria, then salts, the spore sample was washed twice in 0.15 M NaCl, once in 10% SDS, and twice in dH2O, with spores collected after each wash by benchtop centrifugation at 12 000 g for 10 minutes (Eppendorf, Hamburg, Germany).

RNA extraction and whole transcriptome amplification. Clean spores were resuspended in 1 mL DNase buffer and treated with 2 µL DNase I (Invitrogen, Carlsbad, CA, USA) at room temperature for 25 minutes. Spores were collected by centrifugation, resuspended in 0.5 mL lysis buffer (10 mM Tris-HCl pH 8, 0.1 M EDTA pH 8, 0.5% w/v SDS) and incubated with 5 µL of 20 mg/mL proteinase K (Bioshop, Burlington, ON, Canada) overnight at 50 °C. Pellet density was estimated to be 4000 spores per µL. For RNA extraction using the NucleoSpin RNA XS kit (Macherey-Nagel, Düren, Germany), approximately 800,000 spores were repeatedly frozen in liquid nitrogen and ground by hand in buffer RA1 with 4 µL TCEP in a small tube-shaped glass mortar for 1 hour. Subsequent steps were performed according to the manufacturer's protocol, without use of carrier RNA, and including the filtration step. Total RNA was eluted in 10 µL elution buffer. Agarose gel electrophoresis of 2 µL of the RNA sample did not reveal detectable bands after staining with SYBR Green II for RNA, though both the NEB ssRNA ladder (New England Biolabs, Ipswich, MA, USA) and the Ambion RNA 6000 ladder (Life Technologies, Grand Island, NY, USA) were clearly visualized. The remaining 8 µL of RNA sample were subjected to whole transcriptome amplification using the Sigma WTA kit (Sigma, St. Louis, MO, USA). 40 µL of the amplified cDNA sample was ethanol-precipitated (100 µL 100% EtOH and 4 µL 3 M NaOAc pH 5.2) and resuspended in dH2O. 10 µL of 100 ng/µL amplified cDNA was submitted to NCGR for sequencing. Project ID: MMETSP0186 (http://data.imicrobe.us/).
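As a quick consistency check of the quantities reported for the Minchinia chitonis sample, the following sketch converts the stated spore density and spore count into a pellet volume, and the submitted cDNA volume and concentration into a mass. The helper names are hypothetical, and the assumption that the 800,000 spores were drawn from the pellet at the estimated density is ours rather than the text's.

```python
def volume_for_spores(n_spores: float, spores_per_ul: float) -> float:
    """Volume (µL) of suspension holding `n_spores` at the given density."""
    return n_spores / spores_per_ul


def mass_ng(volume_ul: float, conc_ng_per_ul: float) -> float:
    """Nucleic acid mass (ng) in `volume_ul` at `conc_ng_per_ul`."""
    return volume_ul * conc_ng_per_ul


# Values taken from the text above.
pellet_volume_ul = volume_for_spores(800_000, 4000)  # 200.0 µL
submitted_cdna_ng = mass_ng(10, 100)                 # 1000 ng, i.e. 1 µg
```

Under these assumptions, about 200 µL of pellet went into the RNA extraction and about 1 µg of amplified cDNA was submitted for sequencing.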

The full materials and methods for the library construction and preliminary analyses of the transcriptome from Mikrocytos mackini can be found in Burki et al., 2013; and for Gromia sphaerica, Nonionellina sp., Globobulimina turgida, Elphidium margaritaceum, Bulimina marginata, Ammonia sp., Collozoum sp., Spongosphaera streptacantha, Phyllostaurus siculus, Amphilonche elongata, Astrolonche serrata and Aulacantha scolymantha in Sierra et al., 2012.

References

Haas BJ, Papanicolaou A, Yassour M, Grabherr M, Blood PD, Bowden J, Couger MB, Eccles D, Li B, Lieber M, et al. 2013. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protoc 8:1494-1512.

Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754-1760.

Li W, Godzik A. 2006. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22:1658-1659.

Mariette J, Escudie F, Allias N, Salin G, Noirot C, Thomas S, Klopp C. 2012. NG6: Integrated next generation sequencing storage and processing environment. BMC Genomics 13:462.

Pertea G, Huang X, Liang F, Antonescu V, Sultana R, Karamycheva S, Lee Y, White J, Cheung F, Parvizi B, et al. 2003. TIGR Gene Indices clustering tools (TGICL): a software system for fast clustering of large EST datasets. Bioinformatics 19:651-652.

Schulz MH, Zerbino DR, Vingron M, Birney E. 2012. Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels. Bioinformatics 28:1086-1092.

Schwelm A, Fogelqvist J, Knaust A, Jülke S, Lilja T, Bonilla-Rosso G, Karlsson M, Shevchenko A, Dhandapani V, Choi SR, Kim HG, et al. 2015. The Plasmodiophora brassicae genome reveals insights in its life cycle and ancestry of chitin synthases. Sci Rep 5.

Zerbino DR, Birney E. 2008. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 18:821-829.