
Downloaded from http://rspb.royalsocietypublishing.org/ on October 18, 2017

rspb.royalsocietypublishing.org

Research

Untangling the early diversification of eukaryotes: a phylogenomic study of the evolutionary origins of Centrohelida, Haptophyta and Cryptista

Fabien Burki1, Maia Kaplan1, Denis V. Tikhonenkov1,2, Vasily Zlatogursky3, Bui Quang Minh4, Liudmila V. Radaykina2, Alexey Smirnov3, Alexander P. Mylnikov2 and Patrick J. Keeling1,5

1Department of Botany, University of British Columbia, Vancouver, British Columbia, Canada
2Institute for Biology of Inland Waters, Russian Academy of Sciences, Borok, Russia
3Department of Invertebrate Zoology, St Petersburg State University, St Petersburg, Russia
4Center for Integrative Bioinformatics, Max F. Perutz Laboratories, University of Vienna, Medical University of Vienna, Vienna, Austria
5Canadian Institute for Advanced Research, Integrated Microbial Biodiversity Program, Toronto, Ontario, Canada

Cite this article: Burki F, Kaplan M, Tikhonenkov DV, Zlatogursky V, Minh BQ, Radaykina LV, Smirnov A, Mylnikov AP, Keeling PJ. 2016 Untangling the early diversification of eukaryotes: a phylogenomic study of the evolutionary origins of Centrohelida, Haptophyta and Cryptista. Proc. R. Soc. B 283: 20152802. http://dx.doi.org/10.1098/rspb.2015.2802

Received: 24 November 2015
Accepted: 22 December 2015

Subject Areas: evolution, and systematics

Keywords: , eukaryotes, , tree of life, plastid evolution

Authors for correspondence:
Fabien Burki e-mail: [email protected]
Patrick J. Keeling e-mail: [email protected]

Electronic supplementary material is available at http://dx.doi.org/10.1098/rspb.2015.2802 or via http://rspb.royalsocietypublishing.org.

Assembling the global eukaryotic tree of life has long been a major effort of Biology. In recent years, pushed by the new availability of genome-scale data for microbial eukaryotes, it has become possible to revisit many evolutionary enigmas. However, some of the most ancient nodes, which are essential for inferring a stable tree, have remained highly controversial. Among other reasons, the lack of adequate genomic datasets for key taxa has prevented the robust reconstruction of early diversification events. In this context, the centrohelid heliozoans are particularly relevant for reconstructing the tree of eukaryotes because they represent one of the last substantial groups that was missing large and diverse genomic data. Here, we filled this gap by sequencing high-quality transcriptomes for four centrohelid lineages, each corresponding to a different family. Combining these new data with a broad eukaryotic sampling, we produced a gene-rich taxon-rich phylogenomic dataset that enabled us to refine the structure of the tree. Specifically, we show that (i) centrohelids relate to haptophytes, confirming Haptista; (ii) Haptista relates to SAR; (iii) Cryptista share strong affinity with Archaeplastida; and (iv) Haptista + SAR is sister to Cryptista + Archaeplastida. The implications of this topology are discussed in the broader context of plastid evolution.

1. Introduction

Reconstructing the tree of life is a challenging task, because the long evolutionary history since the origin of life has often confounded the phylogenetic signal that can be recovered today. Nevertheless, molecular-based phylogenies have made possible profound rearrangements in the tree, most recently using phylogenomics (i.e. the use of genomic-scale datasets with stronger phylogenetic power) [1]. Accordingly, the global tree of eukaryotes has been reshuffled once again, leading to a better understanding of the relationships between the largest assemblages, or supergroups, and the origins of some ‘orphan’ lineages [2]. However, contentious nodes between supergroups remain, as well as a few lingering ‘orphans’. Resolving the positions of these orphans is necessary for understanding their evolution, but also impacts the tree as a whole, because poorly sampled ‘orphan’ groups may lead to instability in the tree.

One such group lacking proper genomic data is Centrohelida, a monophyletic group of free-living predatory mainly found in freshwater and soil habitats, but also increasingly recognized to occur widely in marine environments [3]. With about 90 described species and a vast diversity of environmental sequences [4], centrohelids have traditionally constituted the core of the original Heliozoa,

© 2016 The Author(s) Published by the Royal Society. All rights reserved.

which included a subset of microbial eukaryotes characterized by a special type of pseudopodia, the axopodia. Heliozoa was shown to be a polyphyletic assemblage, and today several relatively minor lineages are scattered across the tree [5]. Centrohelids, however, have remained one of the last substantially diverse groups of eukaryotes that has eluded phylogenetic placement in the tree of life. Different analyses of the 18S rRNA and a small number of protein-coding genes (actin, α-tubulin, β-tubulin, EF2, HSP70, HSP90) led to placement in various regions of the tree, but never with good statistical support. For example, centrohelids were weakly inferred to branch close to members of the Viridiplantae, specifically [6] or red algae [7]. Other studies showed the centrohelids to share affinities with haptophytes [4,8], or were inconclusive [9]. Even a larger-scale multigene analysis involving 127 genes was unsuccessful at the task, placing centrohelids with low confidence as sister to either haptophytes or the enigmatic telonemids [10]. More recently, the partial transcriptome sequencing for the tiny centrohelid marina was included in a 187-gene dataset, which resulted in a less ambiguous monophyletic grouping with haptophytes [11], reinforcing the phylum Haptista originally proposed on weaker evidence [4,12].

Beyond their intrinsic interest as a large group of eukaryotes with unknown evolutionary origin, centrohelids also hold some of the clues to better understand a larger and a priori unrelated evolutionary mystery. Owing to their possible link to haptophytes [11], centrohelids may help to shed light on one of the most puzzling aspects of plastid evolution: the origin and evolution of complex red plastids [13,14]. Centrohelids are heterotrophs, and no permanent plastid has ever been observed [15], although kleptoplasty has been reported [16]. Haptophytes, on the other hand, are phototrophs and possess complex plastids derived from an endosymbiotic event with a red alga [17]. Haptophytes represent one of four lineages harbouring such plastids, the others being ochrophytes (photosynthetic stramenopiles), myzozoans (alveolates with plastids: apicomplexans, dinoflagellates and chrompodellids) and cryptophytes (belonging to Cryptista, which also include goniomonads, and Palpitia). Whereas the origins of stramenopiles and alveolates are better understood [18,19], haptophytes and Cryptista have notoriously remained challenging to place in the tree. They are sometimes grouped together, along with telonemids and centrohelids [10,11,20–23], which resulted in the establishment of Hacrobia [24]. However, haptophytes and Cryptista have also been shown to have polyphyletic origins in several recent multigene analyses [25–27]. Thus, untangling the controversial phylogenetic positions of these two groups, along with their closely related plastid-lacking lineages such as centrohelids, is a much-needed step to better explain the observed distribution of red plastids in the eukaryotic tree.

In this study, we used a phylogenomic approach including a broad sampling of diversity to investigate the deep evolutionary relationships among eukaryotes, with particular focus on centrohelids, haptophytes and Cryptista. For that purpose, we filled an important gap in genome datasets by sequencing high-quality transcriptomes for four centrohelid species, and combined those with recent transcriptomes for a very large diversity of marine microbial eukaryotes (the MMETSP initiative [28]). Cultures for four species were established, each representing a different centrohelid family, namely Raphidiophrys heterophryoidea (Raphidiophryidae), Raineriophrys erinaceoides (Pterocystidae), as well as two undescribed species: Acanthocystis sp. (Acanthocystidae) and Choanocystis sp. (Choanocystidae). Our analyses unambiguously confirm that centrohelids share a common origin with haptophytes. More generally, we present compelling evidence for the phylogenetic position of the centrohelid–haptophyte group and Cryptista, altogether bringing us one step closer to a fully resolved eukaryotic tree of life.

2. Methods

Details of experimental procedure for culturing, molecular work, sequencing, assembling and gene preparation are described in the electronic supplementary material.

(a) Phylogenomic datasets construction

Following the preparation of 263 genes for phylogenomic analysis (see electronic supplementary material), all taxa were listed with SCaFoS [29], which amounted to 274 taxa. This list was first reduced to 234 taxa after removing all taxa with more than or equal to 20% missing genes. A 234-taxa, 263-gene (234/263) supermatrix was then constructed to infer an initial maximum-likelihood (ML) tree with IQ-TREE v. 1.3.0 [30] under the LG + Γ model. Based on this initial tree, a phylogeny-driven taxon selection approach was applied to reduce further the number of taxa by retaining only representative sequences within strongly supported monophyletic groups (100% bootstrap support), discarding the longest branches and/or least complete sequences. Chimeric concatenated sequences were also allowed by pooling highly incomplete taxa of the same genus (see electronic supplementary material, table S1 for details). This approach led to a final taxon sampling composed of 150 operational taxonomic units (OTUs). Because removal of ambiguously aligned sites is directly influenced by the proportion of gaps, we then re-extracted from the unaligned and untrimmed fasta files the 150 OTUs corresponding to our final selection, which were re-aligned with MAFFT-LINSI v. 7 and trimmed with BMGE v. 1.1 [31] using conservative settings (removal of sites with more than 20% gaps, minimum block size of 8, substitution matrix BLOSUM 75). Finally, from our starting dataset of 263 seed genes, only 250 were retained to enter the final concatenated alignment (55 554 aa positions), which corresponded to genes with less than 50% missing OTUs. See electronic supplementary material, table S1 for details about missing data, and electronic supplementary material, table S2 for complete gene names. These 250 genes, containing up to 150 OTUs, were concatenated into a supermatrix (150/250) with SCaFoS [29].

From the full 150/250 dataset, two reduced datasets were considered. First, Telonema subtilis and Picomonas sp. were removed (see Results and Discussion for the justification), leading to the 148/250 dataset. This dataset was reduced further by eliminating the 19 047 fastest-evolving positions corresponding to bin10, according to the tree-independent method described in [32]; this dataset was named 148/250-slow.

(b) Phylogenetic analyses

Our supermatrices were analysed by ML and Bayesian tree reconstruction methods. ML analyses were performed with IQ-TREE v. 1.3.0–1.3.10 [30]. Gene-partitioned and unpartitioned alignments were analysed; in all cases, the model that best fits the data was determined by IQ-TREE according to the Bayesian information criterion (BIC). The partitioned analysis was applied to the 150/250 and 148/250 datasets, where the best-fit model was chosen according to a greedy strategy that sequentially merges genes from the fully partitioned alignment (250 partitions) until the model fit stops improving. We opted for the new model selection procedure (-m TESTNEW), which additionally implements the FreeRate

heterogeneity model inferring the site rates directly from the data instead of being drawn from a gamma distribution [33]. Owing to the large size of the partition schemes, only the top 20% was checked using the relaxed clustering algorithm (-rcluster 20), as described in [34]. For both datasets, the best-fit partitioning scheme contained the original 250 partitions, i.e. no merging was deemed necessary. This partitioning scheme was then used to specify a model for each partition, allowing each gene to have its own rate (-spp). For the unpartitioned analyses of both 150/250 and 148/250 supermatrices, the best-fitted model corresponded to the LG matrix with relative rates estimated from the data using the non-parametric FreeRate model with 10 categories and empirical amino acid frequencies (LG + R10 + F). The best-fitted model for the unpartitioned analysis of the 148/250-slow dataset was LG + R6 + F. A more complex empirical mixture model not evaluated by the selection strategy in IQ-TREE was also tested on all datasets: following the recommendation in [35], the LG matrix was combined with an amino acid class frequency mixture model with 60 frequency component profiles plus a class of empirical amino acid frequency of the alignment, and four gamma categories to take into account the across-site rate heterogeneity (LG + C60 + F). To assess branch support, all IQ-TREE analyses used the ultrafast bootstrap approximation (UFboot) with 1000 replicates [36] and the SH-like approximate likelihood ratio test (SH-aLRT) also with 1000 bootstrap replicates [37].

Bayesian analyses were performed with PHYLOBAYES MPI v. 1.5a [38], under a site-heterogeneous mixture model combining infinite profile mixtures and exchange rates inferred from the data with the rates across site drawn from a discrete gamma distribution (CAT + GTR + Γ4). Constant sites were removed to decrease computational time (-dc). Three independent Markov chain Monte Carlo (MCMC) chains were run, for at least 3000 generations but up to 7000 for the smaller 148/250-slow dataset. The burnin period was determined after plotting the evolution of the log-likelihood (Lnl) across the iterations, removing the generations anterior to the stabilization of the Lnl. Convergence between the chains was assessed by examining the difference in frequency between all bipartitions (maxdiff). Owing to the large size of our taxon sampling, convergence was generally not globally achieved (maxdiff 0.46), an issue that has been reported in other taxon-rich phylogenomic studies [11,22]. The discrepancies between the chains mostly concerned nodes not under active discussion in this study, except for the monophyly of Archaeplastida, which was accordingly labelled unsupported; electronic supplementary material, figures S2 and S5 show the trees inferred from each individual chain to allow visual assessment of the discrepancies.

3. Results

(a) Improved dataset and model selection

To place the centrohelids in a broad eukaryotic framework, we took special care to include a very large diversity for all known main lineages. Building on previously published datasets [25,39], we more than doubled the taxon sampling, mostly using recently released high-quality transcriptomes for marine microbial species [28] (electronic supplementary material, table S3). Our carefully curated taxon sampling contained 150 OTUs for 250 genes (55 554 aa positions), globally characterized by only 21% of missing data (electronic supplementary material, table S1). Importantly, the four new centrohelid sequences missed only between 7.4% and 12.9% positions, which corresponded to at least 48 399 aa positions included, representing a many-fold improvement compared with the 76.3% missing data for the older Polyplacocystis contractilis dataset [10].

In total, four models of evolution were tested on the different datasets: ML analyses employed a partition approach with 250 gene partitions allowing each gene to have its own model, the LG + Rx + F model and the LG + C60 + F model (electronic supplementary material, table S4); Bayesian analyses were run under the CAT + GTR + Γ4 model. To select the best-fitting model in ML, we followed the BIC score selection criterion, which showed that the LG + C60 + F model consistently achieved better scores than the other two models (electronic supplementary material, table S4). In the Bayesian framework, the CAT + GTR + Γ4 model has been repeatedly shown to have a better fit than simpler models based on empirical exchangeability matrices such as LG, or even CAT + Γ4 alone [40,41]. However, the size of our datasets makes comparing the fit of these complex models computationally prohibitive, and thus topologies corresponding to the best-fitting LG + C60 + F model (ML) and the CAT + GTR + Γ4 model (Bayesian) are discussed in the following sections.

(b) Evolutionary relationships among major eukaryotic lineages

The LG + C60 + F and CAT + GTR + Γ4 analyses of the complete dataset (150/250) recovered with maximal support (100% UFboot and SH-aLRT; 1.0 PP) a monophyletic assemblage including centrohelids and haptophytes (figure 1; electronic supplementary material, figure S1). More generally, these analyses recovered many of the major eukaryotic groups, namely , , and the SAR assemblage (stramenopiles, alveolates, Rhizaria). The association previously suggested between , katablepharids and the marine biflagellate into the Cryptista clade was supported with 100% UFboot and SH-aLRT and 1.0 PP [25,27,43]. In the LG + C60 + F tree, the Archaeplastida lineages (i.e. green algae and land plants, glaucophytes and red algae) were paraphyletic, with Cryptista branching with green plants and glaucophytes (96% UFboot; 88% SH-aLRT). In the CAT + GTR + Γ4 analysis, the position of Cryptista among the Archaeplastida lineages was unresolved owing to incongruent nodes in the independent MCMC chains (electronic supplementary material, figure S2a–c). Telonemids were recovered as sister to SAR (93% UFboot; 99% SH-aLRT; 0.78 PP) and as sister to the red algae (93% UFboot; 100% SH-aLRT; 1.0 PP).

Following the inference of a close evolutionary link between centrohelids and haptophytes, the next important question is where Haptista goes in the global tree. The analyses of the 150/250 dataset placed Haptista as sister to SAR, a relationship that received no support under the LG + C60 + F model, but 1.0 PP under the CAT + GTR + Γ4 model (figure 1). To investigate this and the deeper structure of the tree in more detail, we reduced our dataset in two successive steps. First, we removed two orphan lineages, T. subtilis and Picozoa, leading to the 148/250 dataset. These enigmatic taxa mirror in many ways the problems we sought to solve here for centrohelids. They are still extremely poorly represented in genomic databases, being the sole representatives of a much higher lineage diversity [44,45], which translates into high proportions of missing data (67% for telonemids and 86% for Picomonas sp.; electronic supplementary material, table S1). Second, we removed from the 148/250 supermatrix the 19 047 fastest-evolving positions using the similarity between characters as an estimate of the evolutionary rates


Figure 1. Phylogenetic tree of eukaryotes inferred from the complete dataset (150/250). The topology shown corresponds to the ML tree under the LG + C60 + F model, with both ML and Bayesian support values reported. Black dots on branches mean maximal support (i.e. 100% UFboot and SH-aLRT, and 1.0 Bayesian PP; the Bayesian CAT + GTR + Γ4 topology is shown in electronic supplementary material, figure S1). When not maximal, values are indicated only if deemed robust as follows: UFboot 95%/SH-aLRT 80%/PP 0.9. The tree is drawn rooted between Obazoa, Amoebozoa, , and the rest of eukaryotes after [42], though we note that the position of the root is under active debate.
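The 150/250 dataset shown in figure 1 depends on two missing-data thresholds given in the Methods: taxa with 20% or more missing genes were removed, and only genes with less than 50% missing OTUs were retained. The following is a minimal sketch of those two thresholds only, not the authors' pipeline (which used SCaFoS and an intermediate phylogeny-driven taxon selection that is skipped here). The function names, the presence-set data structure and the toy taxa are hypothetical.

```python
from typing import Dict, Set


def filter_taxa(presence: Dict[str, Set[str]], genes: Set[str],
                max_missing_pct: int = 20) -> Dict[str, Set[str]]:
    """Drop taxa that are missing max_missing_pct percent or more of the genes."""
    kept = {}
    for taxon, present in presence.items():
        n_missing = len(genes - present)
        # Integer comparison avoids floating-point edge cases at exactly 20%.
        if 100 * n_missing < max_missing_pct * len(genes):
            kept[taxon] = present
    return kept


def filter_genes(presence: Dict[str, Set[str]], genes: Set[str],
                 max_missing_pct: int = 50) -> Set[str]:
    """Keep genes that are missing from fewer than max_missing_pct percent of taxa."""
    n_taxa = len(presence)
    kept = set()
    for gene in genes:
        n_missing = sum(1 for present in presence.values() if gene not in present)
        if 100 * n_missing < max_missing_pct * n_taxa:
            kept.add(gene)
    return kept


# Toy example with hypothetical taxa and ten hypothetical genes g1..g10.
genes = {f"g{i}" for i in range(1, 11)}
presence = {
    "taxonA": set(genes),              # 0% missing, kept
    "taxonB": genes - {"g9", "g10"},   # exactly 20% missing, removed
    "taxonC": genes - {"g10"},         # 10% missing, kept
}
taxa = filter_taxa(presence, genes)
kept_genes = filter_genes(taxa, genes)
print(sorted(taxa), len(kept_genes))
```

In the toy data, g10 is absent from one of the two remaining taxa (50% missing), so it fails the strict "less than 50%" test and is dropped.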

[Figure 2: schematic tree. Groups shown: SAR (Stramenopiles, Alveolata, Rhizaria); Haptophyta; Centrohelida; Archaeplastida (, Glaucophyta); Cryptista; Archaeplastida (Rhodophyta); Excavata; Malawimonas; Obazoa (+ Collodictyon); Amoebozoa. Support values shown on branches: 98/91/1, 98/92/1, 100/98/1, 98/94/1.]

Figure 2. Schematics of the new backbone for the eukaryotic tree, highlighting the relationships among the main groups. The topology is based on the 148/250-slow supermatrix, and corresponds to both ML and Bayesian reconstructions under the LG + C60 + F and CAT + GTR + Γ4 models, respectively. The complete tree is presented in electronic supplementary material, figure SX. Black dots on branches mean maximal support (i.e. 100% UFboot and SH-aLRT, and 1.0 Bayesian PP). When not maximal, values are indicated as follows: UFboot/SH-aLRT/PP. All supergroups indicated by the triangles received maximal support, with the exception of the grouping of Viridiplantae and glaucophytes, which was unsupported (shown by dashed lines). The size of the triangles roughly represents the diversity of taxa included in our analyses, as well as the length of the longest branch in each group. The root is placed in the same position as in figure 1.
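The 148/250-slow supermatrix behind figure 2 was obtained by discarding the 19 047 fastest-evolving positions. If the 148/250 supermatrix kept all 55 554 positions of the full alignment (an assumption), this leaves 55 554 − 19 047 = 36 507 positions. The sketch below illustrates only the final column-removal step, assuming a per-site rate score has already been computed by some other means; it does not reimplement the tree-independent binning method of [32]. The function name, the rate values and the toy alignment are hypothetical.

```python
from typing import Dict, List


def drop_fastest_sites(alignment: Dict[str, str], site_rates: List[float],
                       n_drop: int) -> Dict[str, str]:
    """Return the alignment without the n_drop columns of highest rate score."""
    n_sites = len(site_rates)
    if any(len(seq) != n_sites for seq in alignment.values()):
        raise ValueError("every sequence must have one character per rate score")
    if not 0 <= n_drop <= n_sites:
        raise ValueError("n_drop must lie between 0 and the number of sites")
    # Rank columns from slowest to fastest (stable sort keeps ties in column order),
    # keep the slowest ones, then restore the original column order.
    slowest_first = sorted(range(n_sites), key=lambda i: site_rates[i])
    keep = sorted(slowest_first[: n_sites - n_drop])
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Toy example: five columns, the two with the highest scores are removed.
toy_alignment = {"otu1": "ACDEF", "otu2": "ACDEY"}
toy_rates = [0.1, 0.9, 0.2, 0.8, 0.3]  # hypothetical per-site rate scores
slow = drop_fastest_sites(toy_alignment, toy_rates, n_drop=2)
print(slow)

# Arithmetic from the figures quoted in the text (same-length assumption above).
remaining_positions = 55554 - 19047
print(remaining_positions)
```

Keeping the surviving columns in their original order means gene boundaries in the concatenated alignment can still be tracked if a partitioned analysis is wanted afterwards.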

[32], leading to the 148/250-slow dataset. Fast-evolving positions are more likely to concentrate undetected multiple substitutions, even by advanced models of evolution such as the mixture models used here. Removing these positions from large alignments diminishes the amount of undetected multiple substitutions, but maintains enough phylogenetic information to reconstruct even ancient events, so this approach has shown great potential in other phylogenomic studies [26,46].

The resulting topologies were similar to those based on the full dataset. However, whereas the analyses of the 148/250 dataset did not improve the general statistical support of the tree (electronic supplementary material, figures S3, S4 and S5a–c), the reconstructions based on the 148/250-slow dataset led to consistent and more robust topologies (figure 2; electronic supplementary material, figure S6). Here, Haptista received strong support (98% UFboot; 91% SH-aLRT; 1.0 PP) for its position as sister to SAR, and the Archaeplastida lineages and Cryptista were strongly inferred to share a common ancestor (100% UFboot; 98% SH-aLRT; 1.0 PP). Archaeplastida remained paraphyletic, but this was still unsupported and should be further tested (69% UFboot; 80% SH-aLRT; 0.89 PP). In these analyses, the Archaeplastida–Cryptista grouping branched with SAR + Haptista to the exclusion of all other eukaryotes with maximal support (100% UFboot; 100% SH-aLRT; 1.0 PP).

4. Discussion

(a) Towards resolving the eukaryotic tree

Over the past decade, several phylogenomic studies have attempted to resolve the deep-level relationships among the main lineages of eukaryotes [18,19,25–27,47]. These studies have greatly improved our model for the tree of eukaryotes, but several questions remain unsolved owing to the lack of data from poorly studied groups. Among these unsolved questions, the relationships between centrohelids, haptophytes, Cryptista and the main Archaeplastida lineages (green plants, glaucophytes and red algae) have all proved to be refractory to robust phylogenetic inferences. A combination of three important sources of artefact is most likely to explain the poor resolution for the placement of these lineages: (i) lack of data; (ii) too few representative species with genomic datasets; and (iii) models of evolution that fail to account for homoplasic positions. In this study, we addressed these possible sources of incongruence by (i) sequencing the transcriptome of four centrohelid lineages, (ii) using a considerable amount of newly available taxon diversity, and (iii) reducing non-phylogenetic signal by removing fast-evolving sites and applying site-heterogeneous models of evolution in both ML and Bayesian frameworks.

Our analyses recovered Haptista with maximal support, regardless of the dataset or the model used, strongly

confirming that centrohelids share a direct common ancestry with haptophytes [4,11]. For the deeper relationships among eukaryotic groups, we found that a greater taxon diversity together with the systematic use of site-heterogeneous models, allowing us to take into account site-specific substitution patterns (C60 mixture and CAT models), improves the general statistical confidence of the tree. When combined with a less noisy dataset (removal of the fastest-evolving sites), these models converged towards a similar picture in both ML and Bayesian frameworks (figure 2). In this tree, Haptista are closely related to the SAR assemblage with high support, in agreement with weaker results based on lower taxon diversity and different models [25,48]. Another relationship to receive strong support for the first time is the grouping of Archaeplastida with Cryptista. This affinity between Archaeplastida and Cryptista has been noted before in several nuclear [25–27,48] and mitochondrial-based [42] phylogenomic investigations, as well as in many 18S rDNA molecular studies [6], but unlike here, it never received significant support. Taken together, the affinities of Cryptista to Archaeplastida and of Haptista to SAR further diminish the support for Hacrobia, which was initially a less controversial assemblage when poorer taxon sampling was available [10,20,21,24]. Even though recent phylogenomic analyses continued to show a monophyletic Hacrobia, this was with no support [11,22], or with better confidence only when a large part of the diversity was removed [11].

One group of Cryptista (the cryptomonads) includes lineages with plastids of red algal origin (see below), which may confound our ability to discriminate vertically inherited genes from endosymbiotically derived ones. Indeed, it is at face value possible that the phylogenetic relationship between Cryptista and Archaeplastida observed here and elsewhere [25–27,48] is due to undetected red algal genes in phylogenomic datasets, rather than common ancestry. This is formally possible, because endosymbiotic gene transfer (EGT) is common during endosymbiosis, but there are several reasons to suggest this is not affecting our results. First, if it was the case that large numbers of unrecognized red algal genes invaded eukaryotic genomes after endosymbiosis, then one would expect all red algal plastid-containing lineages to contain many such genes, and accordingly, all be attracted to Archaeplastida, not only cryptomonads. Second, large-scale investigations of EGT in various eukaryotes (including the whole genome of Guillardia theta) have shown that the endosymbiotic contribution to the host genome, although real, is probably less substantial than originally envisioned [49–52]. Third, phylogenomic datasets usually consist of highly expressed housekeeping genes that show no sign of widespread red algal signal. Careful inspection of our dataset allowed us to detect various contamination in different lineages, but not specifically from red algae, and suspicious topologies were not included, as in the case of the translation elongation factor 2 [53]. Overall, we observed no genes in our dataset that individually showed a strong affinity between Cryptista and red algae, suggesting that this relationship is a reflection of vertical inheritance rather than owing to a cryptic contamination of endosymbiont genes.

(b) Implications for plastid evolution

Beyond these taxonomic considerations, the positions of centrohelids, haptophytes and Cryptista in the tree of eukaryotes have important implications for how we interpret some major evolutionary and ecological transitions in eukaryotic history. The groups investigated here and their relationships to the SAR and Archaeplastida supergroups represent a complex mixture of photosynthetic and heterotrophic eukaryotes, as well as lineages for which we have little evidence as to whether they harbour a plastid or not [13,54]. Many of these lineages possess plastids bounded by three or four membranes, which are the result of eukaryote-to-eukaryote endosymbioses where heterotrophic organisms acquired plastids from red algae [55]. What makes the evolution of complex red plastids so hard to decipher is the apparent discrepancy between plastid and host phylogenies. Plastid phylogenies have generally been consistent with the notion that all red plastids are the product of a single secondary endosymbiosis [17,56–58]. This idea of a single origin was first formalized in the chromalveolate hypothesis, which posited that there was a single engulfment of a red alga in a common ancestor of stramenopiles, haptophytes, cryptophytes and alveolates [59]. Host-derived phylogenies, on the other hand, have generally failed to provide any strong evidence that all red-algal-containing lineages (and their associated plastid-lacking relatives) are monophyletic, which is required under the single endosymbiotic origin scenario. However, host phylogenies have thus far not provided any convincing alternative topologies either, making it difficult to see how plastid and host data can be best reconciled.

In this context, our work can help us understand the evolution of red plastids. Specifically, the strongly supported grouping of Archaeplastida and Cryptista de facto rules out the scenario of a single red plastid origin in a hypothetical ancestor of a unified chromalveolate assemblage (figure 3a). As stated above, the lack of support for the monophyletic origin of red plastids from host data is not new, but this is the first time, to the best of our knowledge, that a phylogenetic tree strongly argues against it. Indeed, had Cryptista branched elsewhere in the tree, a single origin of chromalveolate plastids could be explained by positing additional plastid loss events, however likely that may be. However, because Cryptista branches with the same lineage from which the plastid is derived (i.e. Archaeplastida), a single origin of red plastids is formally impossible, because those plastids would have needed to travel backwards in time to result in this topology.

With what are now robust relationships for both plastids and hosts, how can we best reconcile their apparently conflicting topologies? Two main scenarios exist to explain the origin and present distribution of complex red plastids: (i) independent secondary endosymbioses (figure 3b) and (ii) a unique secondary endosymbiosis followed by additional layers of endosymbioses (i.e. tertiary or quaternary; figure 3c). Even though the first scenario of independent endosymbioses involving different red algae could explain the tree topologies, such a model is unlikely in the light of several other pieces of evidence showing that all or substantial subsets of the ‘chromalveolate’ plastids trace back to a single secondary red algal endosymbiont [23,60–62]. Lately, the second scenario of serial endosymbioses (figure 3c) has received increased attention, being now supported by a growing body of empirical data [48,63]. Several versions of this serial endosymbiotic framework for red plastid evolution have been proposed, all involving the idea of one secondary endosymbiosis with a red alga, followed by subsequent eukaryote-to-eukaryote endosymbioses [48,62,64,65]. Recently, an explicit model was

[Figure 3 panels: (a) single secondary endosymbiosis (labels: secondary endosymbiosis, diversification); (b) multiple secondary endosymbioses (labels: multiple secondary endosymbioses, diversification); (c) cryptic serial endosymbioses (labels: secondary endosymbiosis, diversification, additional serial endosymbioses).]

Figure 3. Scenarios for the origin and evolution of complex red plastids. These scenarios do not refer to any specific taxa, but rather illustrate the various possibilities discussed in the text, and show that the same diversity of plastid types can be generated by different combinations of events. (a) A single secondary endosymbiosis in the ancestor of all red plastid-bearing eukaryotes was followed only by descent with modification, as formalized in the chromalveolate hypothesis; this scenario is not supported by the current analyses. (b) Multiple independent secondary endosymbioses take place with different red algal symbionts, followed by descent with modification; this is compatible with current phylogenetic evidence from hosts, but not with evidence from plastids. (c) A single secondary endosymbiosis takes place, but is followed by serial eukaryote-to-eukaryote endosymbioses; several versions of this scenario have been proposed (see text for references), and they are consistent with current phylogenetic data.

devised using regression analyses to measure the expected similarity between genomes of various ‘chromalveolate’ lineages [63]. This approach resulted in a model where cryptophytes first engulfed a red alga, which was then transferred to the ochrophytes by tertiary endosymbiosis, and to the haptophytes by quaternary endosymbiosis [63].

In this context, our results are compatible with such a ‘cryptophyte-first’ model, although we note that phylogenetic lines of evidence are not compelling by themselves. More generally, our results will need to stand the test of time, as even strongly supported trees can be shown to be misleading with additional data. Moreover, the breadth of plastid genome data has now been far exceeded by that of nuclear data, so that it is likely that changes to the plastid tree will occur after the addition of new sequences, as recently demonstrated [58]. All of this could ultimately affect our interpretation, but more importantly various kinds of data, not only phylogenetics, will be needed to validate a particular model. Plastids are cellular structures of great complexity that have integrated with their hosts in many ways [54,66]. Serial endosymbiosis is currently known for certain only in a few lineages, whose endosymbionts display peculiar ways of integrating that are very different from what we observe in lineages like haptophytes, ochrophytes or most alveolates [67–69]. Thus, an integrative model of plastid evolution will need to explain many aspects to be comprehensive, from phylogeny to genetics to fine cellular processes.

5. Concluding remarks

Our centrohelid transcriptomes fill an important diversity gap in genomic sequencing. In the near future, efforts should be made to provide better-quality datasets for taxa that remain evolutionary mysteries but are essential to resolving the tree further; telonemids and Picozoa represent obvious targets near to the organisms studied here, but many other enigmatic microbial eukaryotes probably affect other parts of the tree in similar ways. More work is also necessary to determine the relative position of Cryptista to the Archaeplastida lineages in order to assess the monophyletic origin of the primary plastids.

Data accessibility. Raw reads are available through GenBank sequence read archive: SRR2170621, SRR2170625, SRR2170626, SRR2170627, SRR2170634. Assembled transcriptomes: Dryad data depository (http://datadryad.org) accession http://dx.doi.org/10.5061/dryad.rj87v. Untrimmed sequences, trimmed alignments and single-gene trees: Dryad data depository (http://datadryad.org) accession http://dx.doi.org/10.5061/dryad.rj87v.

Authors’ contributions. F.B. designed the study, participated in the dataset construction, carried out the phylogenetic analyses and drafted the manuscript; M.K. participated in the dataset construction; D.V.T. and V.Z. collected field samples, established cultures, carried out molecular laboratory work and drafted the manuscript; L.V.R. collected field samples and established cultures; B.Q.M. carried out phylogenetic analyses and drafted the manuscript; A.S. and A.P.M. participated in the design of the study and critically revised the manuscript; P.J.K. designed the study and drafted the manuscript. All authors gave final approval for publication.

Competing interests. The authors declare no competing interests.

Funding. This work was supported by a grant from the Natural Sciences and Engineering Research Council of Canada, and by a grant from the Tula Foundation to the Centre for Microbial Diversity and Evolution. This work was also partially supported by the Russian Foundation for Basic Research (no. 14-04-00554, 15-34-20065, 15-29-02518, 15-04-18101_a) and by a grant from the President of the Russian Federation MK-7436.2015.4. The work of D.V.T. was supported by the Russian Science Foundation (no. 14-14-00515). B.Q.M. acknowledges financial support to Arndt von Haeseler from the University of Vienna and the Medical University Vienna.

Acknowledgements. We thank Compute/Calcul Canada for computing resources and assistance, in particular WestGrid’s Orcinus and Calcul Quebec’s Guillimin and Colosse facilities.

References

1. Delsuc F, Brinkmann H, Philippe H. 2005 Phylogenomics and the reconstruction of the tree of life. Nat. Rev. Genet. 6, 361–375. (doi:10.1038/nrg1603)
2. Burki F. 2014 The eukaryotic tree of life from a global phylogenomic perspective. Cold Spring Harb. Perspect. Biol. 6, a016147. (doi:10.1101/cshperspect.a016147)
3. Mikrjukov KA. 2000 System and phylogeny of Heliozoa: should this taxon exist in modern systems of protists? Zool Z. 79, 883–897.
4. Cavalier-Smith T, Heyden von der S. 2007 Molecular phylogeny, scale evolution and taxonomy of centrohelid heliozoa. Mol. Phylogenet. Evol. 44, 1186–1203. (doi:10.1016/j.ympev.2007.04.019)
5. Cavalier-Smith T. 2003 Protist phylogeny and the high-level classification of . Eur. J. Protistol. 39, 338–348. (doi:10.1078/0932-4739-00002)
6. Nikolaev SI, Berney C, Fahrni JF, Bolivar I, Polet S, Mylnikov AP, Aleshin VV, Petrov NB, Pawlowski J. 2004 The twilight of Heliozoa and rise of Rhizaria, an emerging supergroup of amoeboid eukaryotes. Proc. Natl Acad. Sci. USA 101, 8066–8071. (doi:10.1073/pnas.0308602101)
7. Sakaguchi M, Nakayama T, Hashimoto T, Inouye I. 2005 Phylogeny of the Centrohelida inferred from SSU rRNA, tubulins, and actin genes. J. Mol. Biol. 61, 765–775. (doi:10.1007/s00239-005-0006-6)
8. Nikolaev SI, Berney C, Petrov NB, Mylnikov AP, Fahrni JF, Pawlowski J. 2006 Phylogenetic position of Multicilia marina and the evolution of Amoebozoa. Int. J. Syst. Evol. Microbiol. 56, 1449–1458. (doi:10.1099/ijs.0.63763-0)
9. Sakaguchi M, Inagaki Y, Hashimoto T. 2007 Centrohelida is still searching for a phylogenetic home: analyses of seven Raphidiophrys contractilis genes. Gene 405, 47–54. (doi:10.1016/j.gene.2007.09.003)
10. Burki F et al. 2009 Large-scale phylogenomic analyses reveal that two enigmatic lineages, and centroheliozoa, are related to photosynthetic chromalveolates. Genome Biol. Evol. 1, 231–238. (doi:10.1093/gbe/evp022)
11. Cavalier-Smith T, Chao EE, Lewis R. 2015 Multiple origins of Heliozoa from ancestors: new cryptist subphylum Corbihelia, superclass Corbistoma, and monophyly of Haptista, Cryptista, Hacrobia and . Mol. Phylogenet. Evol. 93, 331–362. (doi:10.1016/j.ympev.2015.07.004)
12. Cavalier-Smith T, Chao EE-Y. 2003 Molecular phylogeny of centrohelid heliozoa, a novel lineage of eukaryotes that arose by ciliary loss. J. Mol. Biol. 56, 387–396. (doi:10.1007/s00239-002-2409-y)
13. Archibald JM. 2015 Genomic perspectives on the birth and spread of plastids. Proc. Natl Acad. Sci. USA 112, 10 147–10 153. (doi:10.1073/pnas.1421374112)
14. Keeling PJ. 2013 The number, speed, and impact of plastid endosymbioses in eukaryotic evolution. Annu. Rev. Biol. 64, 27.1–27.25. (doi:10.1146/annurev-arplant-050312-120144)
15. Mikrjukov KA, Siemensma FJ, Patterson DJ. 2000 Phylum Heliozoa. In The Illustrated guide to the protozoa (eds JJ Lee, GF Leedale, P Bradbury), pp. 860–871. Lawrence, KS: Society of Protozoologists.
16. Patterson DJ, Dürrschmidt M. 1987 Selective retention of by algivorous heliozoa: fortuitous symbiosis? Eur. J. Protistol. 23, 51–55. (doi:10.1016/S0932-4739(87)80007-X)
17. Yoon HS, Hackett JD, Pinto G, Bhattacharya D. 2002 The single, ancient origin of chromist plastids. Proc. Natl Acad. Sci. USA 99, 15 507–15 512. (doi:10.1073/pnas.242379899)
18. Burki F, Shalchian-Tabrizi K, Minge M, Skjaeveland A, Nikolaev SI, Jakobsen KS, Pawlowski J. 2007 Phylogenomics reshuffles the eukaryotic supergroups. PLoS ONE 2, e790. (doi:10.1371/journal.pone.0000790)
19. Rodriguez-Ezpeleta N, Brinkmann H, Burger G, Roger AJ, Gray MW, Philippe H, Lang BF. 2007 Toward resolving the eukaryotic tree: the phylogenetic positions of and cercozoans. Curr. Biol. 17, 1420–1425. (doi:10.1016/j.cub.2007.07.036)
20. Hackett JD, Yoon HS, Li S, Reyes-Prieto A, Rümmele SE, Bhattacharya D. 2007 Phylogenomic analysis supports the monophyly of cryptophytes and haptophytes and the association of rhizaria with chromalveolates. Mol. Biol. Evol. 24, 1702–1713. (doi:10.1093/molbev/msm089)
21. Patron NJ, Inagaki Y, Keeling PJ. 2007 Multiple gene phylogenies support the monophyly of cryptomonad and haptophyte host lineages. Curr. Biol. 17, 887–891. (doi:10.1016/j.cub.2007.03.069)
22. Katz LA, Grant JR. 2014 Taxon-rich phylogenomic analyses resolve the eukaryotic tree of life and reveal the power of subsampling by sites. Syst. Biol. 64, 406–415. (doi:10.1093/sysbio/syu126)

23. Rice DW, Palmer JD. 2006 An exceptional horizontal gene transfer in plastids: gene replacement by a distant bacterial paralog and evidence that haptophyte and cryptophyte plastids are sisters. BMC Biol. 4, 31. (doi:10.1186/1741-7007-4-31)
24. Okamoto N, Chantangsi C, Horak A, Leander BS, Keeling PJ. 2009 Molecular phylogeny and description of the novel Roombia truncata gen. et sp. Nov., and establishment of the Hacrobia taxon nov. PLoS ONE 4, e7080. (doi:10.1371/journal.pone.0007080)
25. Burki F, Okamoto N, Pombert J-F, Keeling PJ. 2012 The evolutionary history of haptophytes and cryptophytes: phylogenomic evidence for separate origins. Proc. R Soc. B 279, 2246–2254. (doi:10.1098/rspb.2011.2301)
26. Brown MW, Sharpe SC, Silberman JD, Heiss AA, Lang BF, Simpson AGB, Roger AJ. 2013 Phylogenomics demonstrates that breviate are related to and apusomonads. Proc. R. Soc. B 280, 20131755. (doi:10.1016/j.protis.2010.06.004)
27. Yabuki A, Kamikawa R, Ishikawa SA, Kolisko M, Kim E, Tanabe AS, Kume K, Ishida K-I, Inagaki Y. 2014 Palpitomonas bilix represents a basal cryptist lineage: insight into the character evolution in Cryptista. Sci. Rep. 4, 4641. (doi:10.1038/srep04641)
28. Keeling PJ et al. 2014 The marine microbial eukaryote transcriptome sequencing project (MMETSP): illuminating the functional diversity of eukaryotic life in the oceans through transcriptome sequencing. PLoS Biol. 12, e1001889. (doi:10.1371/journal.pbio.1001889)
29. Roure B, Rodriguez-Ezpeleta N, Philippe H. 2007 SCaFoS: a tool for selection, concatenation and fusion of sequences for phylogenomics. BMC Evol. Biol. 7(Suppl. 1), S2. (doi:10.1186/1471-2148-7-S1-S2)
30. Nguyen L-T, Schmidt HA, von Haeseler A, Minh BQ. 2015 IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274. (doi:10.1093/molbev/msu300)
31. Criscuolo A, Gribaldo S. 2010 BMGE (block mapping and gathering with entropy): a new software for selection of phylogenetic informative regions from multiple sequence alignments. BMC Evol. Biol. 10, 210. (doi:10.1186/1471-2148-10-210)
32. Cummins CA, McInerney JO. 2011 A method for inferring the rate of evolution of homologous characters that can potentially improve phylogenetic inference, resolve deep divergence and correct systematic biases. Syst. Biol. 60, 833–844. (doi:10.1093/sysbio/syr064)
33. Soubrier J, Steel M, Lee MSY, Der Sarkissian C, Guindon S, Ho SYW, Cooper A. 2012 The influence of rate heterogeneity among sites on the time dependence of molecular rates. Mol. Biol. Evol. 29, 3345–3358. (doi:10.1093/molbev/mss140)
34. Lanfear R, Calcott B, Kainer D, Mayer C, Stamatakis A. 2014 Selecting optimal partitioning schemes for phylogenomic datasets. BMC Evol. Biol. 14, 82. (doi:10.1186/1471-2148-14-82)
35. Wang H-C, Susko E, Roger AJ. 2014 An amino acid substitution–selection model adjusts residue fitness to improve phylogenetic estimation. Mol. Biol. Evol. 31, 779–792. (doi:10.1093/molbev/msu044)
36. Minh BQ, Nguyen MAT, von Haeseler A. 2013 Ultrafast approximation for phylogenetic bootstrap. Mol. Biol. Evol. 30, 1188–1195. (doi:10.1093/molbev/mst024)
37. Guindon S, Dufayard J-F, Lefort V, Anisimova M, Hordijk W, Gascuel O. 2010 New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, 307–321. (doi:10.1093/sysbio/syq010)
38. Lartillot N, Lepage T, Blanquart S. 2009 PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics 25, 2286–2288. (doi:10.1093/bioinformatics/btp368)
39. Burki F, Corradi N, Sierra R, Pawlowski J, Meyer GR, Abbott CL, Keeling PJ. 2013 Phylogenomics of the intracellular parasite Mikrocytos mackini reveals evidence for a in rhizaria. Curr. Biol. 23, 1541–1547. (doi:10.1016/j.cub.2013.06.033)
40. Philippe H, Brinkmann H, Copley RR, Moroz LL, Nakano H, Poustka AJ, Wallberg A, Peterson KJ, Telford MJ. 2011 Acoelomorph flatworms are deuterostomes related to Xenoturbella. Nature 470, 255–258. (doi:10.1038/nature09676)
41. Lartillot N, Philippe H. 2008 Improvement of molecular phylogenetic inference and the phylogeny of . Proc. R. Soc. B 363, 1463–1472. (doi:10.1098/rstb.2007.2236)
42. Derelle R, Torruella G, Klimeš V, Brinkmann H, Kim E, Vlček Č, Lang BF, Eliáš M. 2015 Bacterial proteins pinpoint a single eukaryotic root. Proc. Natl Acad. Sci. USA 112, E693–E699. (doi:10.1073/pnas.1420657112)
43. Cavalier-Smith T. 2013 Symbiogenesis: mechanisms, evolutionary consequences, and systematic implications. Annu. Rev. Ecol. Evol. Syst. 44, 145–172. (doi:10.1146/annurev-ecolsys-110411-160320)
44. Bråte J, Klaveness D, Rygh T, Jakobsen KS, Shalchian-Tabrizi K. 2010 Telonemia-specific environmental 18S rDNA PCR reveals unknown diversity and multiple marine-freshwater colonizations. BMC Microbiol. 10, 168. (doi:10.1186/1471-2180-10-168)
45. Seenivasan R, Sausen N, Medlin LK, Melkonian M. 2013 Picomonas judraskeda gen. et sp. nov.: the first identified member of the Picozoa phylum nov., a widespread group of picoeukaryotes, formerly known as ‘picobiliphytes’. PLoS ONE 8, e59565. (doi:10.1371/journal.pone.0059565.s006)
46. Rodriguez-Ezpeleta N, Brinkmann H, Roure B, Lartillot N, Lang BF, Philippe H. 2007 Detecting and overcoming systematic errors in genome-scale phylogenies. Syst. Biol. 56, 389–399. (doi:10.1080/10635150701397643)
47. Hampl V, Hug L, Leigh JW, Dacks JB, Lang BF, Simpson AGB, Roger AJ. 2009 Phylogenomic analyses support the monophyly of Excavata and resolve relationships among eukaryotic ‘supergroups’. Proc. Natl Acad. Sci. USA 106, 3859–3864. (doi:10.1073/pnas.0807880106)
48. Baurain D et al. 2010 Phylogenomic evidence for separate acquisition of plastids in cryptophytes, haptophytes, and stramenopiles. Mol. Biol. Evol. 27, 1698–1709. (doi:10.1093/molbev/msq059)
49. Burki F, Imanian B, Hehenberger E, Hirakawa Y, Maruyama S, Keeling PJ. 2014 Endosymbiotic gene transfer in tertiary plastid-containing . Eukaryot. Cell 13, 246–255. (doi:10.1128/EC.00299-13)
50. Burki F, Flegontov P, Obornik M, Cihlar J, Pain A, Lukes J, Keeling PJ. 2012 Reevaluating the green versus red signal in eukaryotes with secondary plastid of red algal origin. Genome Biol. Evol. 4, 626–635. (doi:10.1093/gbe/evs049)
51. Deschamps P, Moreira D. 2012 Re-evaluating the green contribution to diatom genomes. Genome Biol. Evol. 4, 795–800. (doi:10.1093/gbe/evs053)
52. Curtis BA et al. 2012 Algal genomes reveal evolutionary mosaicism and the fate of . Nature 492, 59–65. (doi:10.1038/nature11681)
53. Kim EE, Graham LE. 2008 EEF2 analysis challenges the monophyly of Archaeplastida and . PLoS ONE 3, e2621. (doi:10.1371/journal.pone.0002621)
54. Archibald JM. 2009 The puzzle of plastid evolution. Curr. Biol. 19, R81–R88. (doi:10.1016/j.cub.2008.11.067)
55. Bhattacharya D, Archibald JM, Weber APM, Reyes-Prieto A. 2007 How do endosymbionts become organelles? Understanding early events in plastid evolution. Bioessays 29, 1239–1246. (doi:10.1002/bies.20671)
56. Qiu H, Yang EC, Bhattacharya D, Yoon HS. 2012 Ancient gene paralogy may mislead inference of plastid phylogeny. Mol. Biol. Evol. 29, 3333–3343. (doi:10.1093/molbev/mss137)
57. Janouskovec J, Horak A, Obornik M, Lukes J, Keeling PJ. 2010 A common red algal origin of the apicomplexan, dinoflagellate, and plastids. Proc. Natl Acad. Sci. USA 107, 10 949–10 954. (doi:10.1073/pnas.1003335107)
58. Ševčíková T et al. 2015 Updating algal evolutionary relationships through plastid genome sequencing: did plastids emerge through endosymbiosis of an ochrophyte? Sci. Rep. 5, 10134. (doi:10.1038/srep10134)
59. Cavalier-Smith T. 1999 Principles of protein and lipid targeting in secondary symbiogenesis: euglenoid, dinoflagellate, and sporozoan plastid origins and the eukaryote family tree. J. Eukaryot. Microbiol. 46, 347–366. (doi:10.1111/j.1550-7408.1999.tb04614.x)
60. Zimorski V, Ku C, Martin WF, Gould SB. 2014 Endosymbiotic theory for organelle origins. Curr. Opin. Microbiol. 22, 38–48. (doi:10.1016/j.mib.2014.09.008)
61. Stork S, Moog D, Przyborski JM, Wilhelmi I, Zauner S, Maier UG. 2012 Distribution of the SELMA translocon in secondary plastids of red algal origin and predicted uncoupling of ubiquitin-dependent translocation from degradation. Eukaryot Cell 11, 1472–1482. (doi:10.1128/EC.00183-12)
62. Petersen J, Ludewig A-K, Michael V, Bunk B, Jarek M, Baurain D, Brinkmann H. 2014 Chromera velia, endosymbioses and the rhodoplex hypothesis: plastid evolution in cryptophytes, alveolates, stramenopiles, and haptophytes (CASH lineages). Genome Biol. Evol. 6, 666–684. (doi:10.1093/gbe/evu043)
63. Stiller JW, Schreiber J, Yue J, Guo H, Ding Q, Huang J. 2014 The evolution of in chromist algae through serial endosymbioses. Nat. Commun. 5, 5764. (doi:10.1038/ncomms6764)
64. Bodyl A, Stiller JW, Mackiewicz P. 2009 Chromalveolate plastids: direct descent or multiple endosymbioses? Trends Ecol. Evol. 24, 119–121; author reply 121–122. (doi:10.1016/j.tree.2008.11.003)
65. Sanchez Puerta MV, Delwiche CF. 2008 A hypothesis for plastid evolution in chromalveolates. J. Phycol. 44, 1097–1107. (doi:10.1111/j.1529-8817.2008.00559.x)
66. Bolte K, Bullmann L, Hempel F, Bozarth A, Zauner S, Maier UG. 2009 Protein targeting into secondary plastids. J. Eukaryot. Microbiol. 56, 9–15. (doi:10.1111/j.1550-7408.2008.00370.x)
67. Keeling PJ. 2010 The endosymbiotic origin, diversification and fate of plastids. Phil. Trans. R. Soc. B 365, 729–748. (doi:10.1098/rstb.2009.0103)
68. Wisecaver JH, Hackett JD. 2011 Dinoflagellate genome evolution. Annu. Rev. Microbiol. 65, 369–387. (doi:10.1146/annurev-micro-090110-102841)
69. Patron NJN, Waller RF. 2007 Transit peptide diversity and divergence: a global analysis of plastid targeting signals. Bioessays 29, 1048–1058. (doi:10.1002/bies.20638)