
with relic nuclei as models for elucidating organellogenesis

Chihiro Saraia,1, Goro Tanifujib,c,1,2, Takuro Nakayamad,e,1, Ryoma Kamikawaf,1, Kazuya Takahashia,g, Euki Yazakih, Eriko Matsuoi, Hideaki Miyashitaf, Ken-ichiro Ishidab, Mitsunori Iwatakia,g,2, and Yuji Inagakid,i,2

aGraduate School of Science and Engineering, Yamagata University, 990-8560 Yamagata, Japan; bFaculty of Life and Environmental Sciences, University of Tsukuba, 305-8572 Tsukuba, Japan; cDepartment of Zoology, National Museum of Nature and Science, 305-0005 Tsukuba, Japan; dCenter for Computational Sciences, University of Tsukuba, 305-8577 Tsukuba, Japan; eGraduate School of Life Sciences, Tohoku University, 980-8578 Sendai, Japan; fGraduate School of Human and Environmental Studies, Kyoto University, 606-8501 Kyoto, Japan; gAsian Natural Environmental Science Center, The University of Tokyo, 113-8657 Tokyo, Japan; hDepartment of Biochemistry and Molecular Biology, Graduate School and Faculty of Medicine, The University of Tokyo, 113-0033 Tokyo, Japan; and iGraduate School of Life and Environmental Sciences, University of Tsukuba, 305-8572 Tsukuba, Japan

Edited by W. Ford Doolittle, Dalhousie University, Halifax, NS, Canada, and approved January 29, 2020 (received for review July 15, 2019)

Nucleomorphs are relic endosymbiont nuclei so far found only in two algal groups, cryptophytes and chlorarachniophytes, which have been studied to model the evolutionary process of integrating an endosymbiont alga into a host-governed plastid (organellogenesis). However, past studies suggest that DNA transfer from the endosymbiont to host nuclei had already ceased in both cryptophytes and chlorarachniophytes, implying that the organellogenesis at the genetic level has been completed in the two systems. Moreover, we have yet to pinpoint the closest free-living relative of the endosymbiotic alga engulfed by the ancestral chlorarachniophyte or cryptophyte, making it difficult to infer how organellogenesis altered the endosymbiont genome. To counter the above issues, we need novel nucleomorph-bearing algae, in which endosymbiont-to-host DNA transfer is on-going and for which endosymbiont/plastid origins can be inferred at a fine taxonomic scale. Here, we report two previously undescribed dinoflagellates, strains MGD and TGD, with green algal endosymbionts enclosing plastids as well as relic nuclei (nucleomorphs). We provide evidence for the presence of DNA in the two nucleomorphs and the transfer of endosymbiont genes to the host (dinoflagellate) genomes. Furthermore, DNA transfer between the host and endosymbiont nuclei was found to be in progress in both the MGD and TGD systems. Phylogenetic analyses successfully resolved the origins of the endosymbionts at the genus level. With the combined evidence, we conclude that the host–endosymbiont integration in MGD/TGD is less advanced than that in cryptophytes/chlorarachniophytes, and propose the two dinoflagellates as models for elucidating organellogenesis.

secondary endosymbiosis | nucleomorph | endosymbiotic gene transfer | plastid | Pedinophyceae

Significance

We report here two previously undescribed dinoflagellates that can be models for elucidating the genome evolution associated with transforming an endosymbiotic alga into a plastid (organellogenesis). The two strains possess green algal endosymbionts enclosing the plastids and relic nuclei (nucleomorphs). Our analyses indicated that DNA transfer from the nucleomorph to the host nuclear genome is in progress in both dinoflagellates, even though their endosymbiotic algae have been transformed into organelles (plastids). Moreover, the origins of the two endosymbionts were resolved at the genus level. These two features found in the two dinoflagellates are absent from well-studied nucleomorph-bearing algal lineages, namely cryptophytes and chlorarachniophytes. Consequently, the two dinoflagellates assist us in understanding endosymbiosis-driven eukaryotic genome evolution at a finer scale.

Author contributions: G.T., M.I., and Y.I. designed research; C.S., G.T., T.N., R.K., K.T., H.M., and K.-i.I. performed research; C.S., K.T., K.-i.I., and M.I. isolated and cultivated the two strains of dinoflagellates used in this study; G.T., T.N., R.K., E.Y., E.M., and Y.I. analyzed data; and C.S., G.T., T.N., R.K., M.I., and Y.I. wrote the paper.

The authors declare no competing interest.

This article is a PNAS Direct Submission.

Published under the PNAS license.

Data deposition: The transcriptome data from the dinoflagellate strains MGD and TGD are available from the DNA Data Bank Japan (BioProject PRJDB8237).

1C.S., G.T., T.N., and R.K. contributed equally to this work.

2To whom correspondence may be addressed. Email: [email protected], iwataki@anesc.u-tokyo.ac.jp, or [email protected].

This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1911884117/-/DCSupplemental.

EVOLUTION

www.pnas.org/cgi/doi/10.1073/pnas.1911884117 PNAS Latest Articles

The transformation of a free-living photosynthetic organism into a plastid through endosymbiosis has occurred multiple times in eukaryotic evolution. The first plastid was most likely established through “primary endosymbiosis” between a cyanobacterium and the common ancestor of red, glaucophyte, and green algae (plus descendants of green algae, i.e., land plants) (1–4). The plastids in the three lineages described above are the direct descendants of the cyanobacterial endosymbiont, and are designated as “primary plastids.” After major eukaryotic lineages diverged, some heterotrophs turned into phototrophs by acquiring “secondary plastids” through algal endosymbionts bearing primary plastids (secondary endosymbioses). Secondary endosymbioses most likely occurred multiple times in eukaryotic evolution, as the host lineages bearing secondary plastids (so-called complex algae) are distantly related to one another (1–4). In addition, the origins of secondary plastids vary among complex algae; some possess red alga-derived plastids, while others possess green alga-derived plastids, strongly arguing that the two types of secondary plastids were established through separate (at least two) endosymbiotic events (1–4).

The evolutionary process of integrating an endosymbiont into the host cell (organellogenesis) has yet to be fully understood. Nevertheless, genomic data from diverse eukaryotic lineages indicated that endosymbiont genomes should have lost a massive number of genes that were dispensable for intracellular/endosymbiotic lifestyles (1–5). It is most likely that the reduction of endosymbiont genomes and integration of the endosymbiont into the host progressed simultaneously during organellogenesis (1–4). The reductive process that occurred in endosymbiont genomes seemingly has a tight correlation with genome G + C content (GC%), as reduced endosymbiont genomes are commonly poor in G and C (6–9). To interlock the host and endosymbiont metabolically and genetically, we regard transfer of endosymbiont genes to the host nuclear genome (endosymbiotic gene transfer, EGT), coupled with the host’s invention of machineries that enable it to express the transferred genes and target the gene products to the original compartment, as critical (1–4, 10). Nevertheless, the precise process that enables organellogenesis still remains unclear.

No intracellular structures of the endosymbiotic algae were left except plastids in most of the complex algae (euglenids, chromerids, haptophytes, ochrophytes, and dinoflagellates). Only cryptophytes and chlorarachniophytes are known to retain nucleomorphs, the relic nuclei of their algal endosymbionts (1, 8, 9, 11, 12). As the two complex algae bearing nucleomorphs possess the morphological characteristics that have been lost from others, they were thought to provide clues to understand the detailed process of organellogenesis (1, 8, 9). In this regard, the genomic data of both nuclei and nucleomorphs, as well as transcriptomic and proteomic data, have been accumulated for cryptophytes and chlorarachniophytes (13–22), and genetic transformation was established for a chlorarachniophyte species (23). It has been determined that a red alga and an ulvophyte green alga are the sources of cryptophyte and chlorarachniophyte plastids, respectively (18, 24–28), but even recent multigene phylogenetic analyses did not provide finer resolution in determining the closest living species/genus for the origins of the two plastids (18, 28–32).

Such uncertainties are potential drawbacks of using cryptophytes and chlorarachniophytes as the model organisms to study the effect of the reductive process on the endosymbiont genome during secondary endosymbiosis. Without pinpointing the precise origin of the plastid (or endosymbiotic alga that gave rise to the plastid), it is difficult to reconstruct the original gene contents of the nuclear genome of the endosymbionts, and the reductive process that shaped the current nucleomorph genomes in the two algal lineages. Thus, it is ideal to find and investigate a novel nucleomorph-bearing organism, for which the endosymbiont origin is resolved at a fine evolutionary scale, and compare it with cryptophytes and chlorarachniophytes. However, no novel nucleomorph-bearing lineage has been found since the discovery of the nucleomorph in chlorarachniophytes in 1984 (12).

Dinoflagellates are a eukaryotic group belonging to Alveolata, comprising both photosynthetic and nonphotosynthetic species (33, 34). The vast majority of photosynthetic dinoflagellates possess red alga-derived plastids containing a unique carotenoid peridinin (35, 36). It is widely accepted that a peridinin-containing plastid already existed in the ancestral dinoflagellate, and photosynthetic capacity has been lost secondarily on multiple branches of the tree of dinoflagellates (37–39). In addition to multiple losses of photosynthesis, different types of noncanonical plastids lacking peridinin have been reported. So far, three types of noncanonical plastids have been found: 1) The plastids containing chlorophylls (Chls) a and c plus 19′ hexanoyloxyfucoxanthin in the family Kareniaceae (e.g., Karenia brevis); 2) those containing Chls a and b (Chls a+b) in the genus Lepidodinium (e.g., Lepidodinium chlorophorum); and 3) those containing Chls a and c plus fucoxanthin in the family Kryptoperidineaceae (e.g., Durinskia baltica) (36, 40–43). The pigment composition, together with molecular phylogenies inferred from plastid genes, clearly designated the origins of the first, second, and third noncanonical plastids described above as a haptophyte, a green alga, and a diatom, respectively (44–49). In the species with the haptophyte-derived plastids, the endosymbiotic algae are regarded as being fully integrated into the dinoflagellate (host) cells, because no cellular component except the plastid remains, and gene transfer from the endosymbiont genome to the dinoflagellate genome (i.e., EGT) has been detected (47, 50–56). In Lepidodinium viride with a green alga-derived plastid, although a nucleus-like structure in the compartment corresponding to the cytoplasm of the endosymbiont alga was reported (41), it is still controversial (Discussion). On the other hand, the species with the third noncanonical plastids, derived from diatoms, are unique in maintaining major cellular components of the endosymbiont, such as the plastid, nucleus, and mitochondria (57–59). To our knowledge, there has been no molecular evidence for the diatom endosymbionts being modified extensively during the endosymbiosis (54, 60–64).

Here, we report two undescribed dinoflagellates, strains MGD and TGD, with green alga-derived plastids containing Chls a+b. The two dinoflagellates are distinct from each other in terms of their cell morphologies, and no clear affinity between the two hosts was recovered by molecular phylogenetic analyses. In both MGD and TGD, conspicuous nucleus-like structures with DNA were identified in the periplastidal compartments (PPCs) that correspond to the endosymbiont cytoplasm. We successfully obtained evidence for the green algal endosymbionts being genetically integrated into the dinoflagellate host cells. Both green algal endosymbionts showed clear phylogenetic affinities to Pedinophyceae, a particular group of green algae. Taken together, these observations lead us to conclude that MGD and TGD are nucleomorph-bearing organisms harboring Chls a+b-containing plastids derived from endosymbiotic Pedinophyceae green algae. Finally, we propose the two dinoflagellates as models to study the genome evolution associated with secondary endosymbiosis.

Results

MGD and TGD Possess Nucleomorphs. Dinoflagellate strains MGD and TGD were isolated from two distinct coastal locations in Japan and their monoclonal cultures have been maintained in the laboratory (Materials and Methods). Both strains contain permanent green-colored plastids instead of the peridinin-containing plastids found in major photosynthetic dinoflagellates. Their overall cell structures appeared to be distinct from each other (Fig. 1 B and F). The green-colored plastids in the two dinoflagellates are most likely of green algal origin, as pigment profiles similar to green algae were detected from MGD and TGD (SI Appendix, Fig. S1). Significantly, the cell structure and plastid shape of Lepidodinium spp. (41, 42), previously described species bearing green alga-derived plastids, are distinctive from those of MGD and TGD (SI Appendix, Fig. S1). Under transmission electron microscopy (TEM) observations, nucleus-like structures with double-membranes were found in the spaces between the second and third plastid membranes, which correspond to the cytoplasm of the endosymbiont algae, in MGD and TGD (Fig. 1 A, C, E, and G). Each of the dinoflagellate cells examined most likely contained a single plastid associated with a single nucleus-like structure (Fig. 1 A, D, E, and H). The nucleus-like structures in MGD and TGD are clearly distinct from characteristic dinoflagellate nuclei in the cytoplasm (Fig. 1 A and E).

We conducted fluorescent microscopic observations to examine whether the nucleus-like structure in the PPCs contains any DNA by SYBR green staining. In our preliminary observations staining whole MGD and TGD cells, a DNA signal from the endosymbiont compartment was undetectable, due to the overpowering brightness of the dinoflagellate nucleus. Thus, the plastid enclosing the nucleus-like structure was separated from the dinoflagellate nucleus prior to SYBR green staining. We confirmed that the isolated plastids were associated physically with the nucleus-like structures by TEM observation (SI Appendix, Fig. S2). The observation of the SYBR green-stained plastid samples successfully detected clear DNA signals on the surface of the plastid, a location consistent with the nucleus-like structures in the PPCs (Fig. 1 D and H). The results described above strongly suggest that the nucleus-like structures in the PPCs are derived from the genome-containing nuclei of the green algal endosymbionts in MGD and TGD.

In dinoflagellate species bearing obligate diatom endosymbionts, collectively called “dinotoms” (e.g., Kryptoperidinium foliaceum and D. baltica), the endosymbionts retain their nuclei and mitochondria as well as the plastids (44, 49). The presence of mitochondria in the endosymbiont alga-derived compartment suggests that energy production in the endosymbiont has yet to be fully brought under host control in the dinotom systems. Thus,


Fig. 1. Morphology of undescribed dinoflagellate strains MGD (A–D) and TGD (E–H). (A and E) Cross sections of the cell under TEM, showing the dinoflagellate nucleus (DN), nucleomorph (Nm), plastid (Pl), and PPC. (B and F) Whole-cell light micrographs. (C and G) Enlarged image of a cross section of the cell under TEM observation. (D and H) Fluorescent microscopy with SYBR green I-staining image.

the endosymbionts in dinotoms are much less morphologically reduced and host-governed than those in cryptophytes or chlorarachniophytes, which contain residual nuclei (nucleomorphs) and plastids, but no mitochondrion (1, 8, 9). Our TEM observations detected ribosomes but no mitochondrion in the PPCs of MGD or TGD (Fig. 1 A and E and SI Appendix, Fig. S2), suggesting that their green algal endosymbionts were ultrastructurally reduced and governed by the host to a degree similar to that seen in the endosymbionts of cryptophytes and chlorarachniophytes. The microscopic data described above strongly suggest that: 1) The endosymbiont compartments in both MGD and TGD are indeed endosymbiont-derived organelles, not obligate endosymbionts; and 2) the nucleus-like structures found in the PPCs of the two dinoflagellates are equivalent to cryptophyte and chlorarachniophyte nucleomorphs. Thus, we conclude that both MGD and TGD possess nucleomorphs derived from the nuclei of their green algal endosymbionts.

Recent Genetic Integration of the Green Algal Endosymbionts in MGD and TGD. In this section, we provide the evidence for the host and endosymbiont being genetically interlocked with each other in MGD and TGD. From the transcriptomic data generated from the MGD and TGD cells cultured in the laboratory, we predicted 57,983 and 73,589 transcripts encoding putative proteins, respectively. Of the putative proteins in MGD and TGD, 534 and 961, respectively, showed high amino acid sequence similarity to nucleus-encoded proteins of free-living green algae. Hereafter, the abovementioned transcripts/proteins are referred to as “green algal transcripts/proteins.”

There are two possibilities for which genome (or genomes) the “green algal genes” encoding green algal proteins reside in, depending on how the host and endosymbiont are integrated with each other in the MGD and TGD systems. If the host–endosymbiont integration at the genetic level has yet to be established in the two systems [as proposed for the dinotom system; Hehenberger et al. (64)], green algal genes are anticipated to be found exclusively in their endosymbiont genomes. However, the endosymbiont-derived compartments of MGD and TGD are ultrastructurally more reduced than those of dinotoms, implying that the host–endosymbiont integration is more advanced in the former systems than the latter. If the host–endosymbiont integration of the MGD and TGD systems has reached the genetic level observed in cryptophytes and chlorarachniophytes (20), then some of the green algal genes were likely transferred from the endosymbiont genome to the host genome (i.e., EGT).

We repeated the procedure described above to retrieve the transcripts encoding the proteins conserved among alveolates (including dinoflagellates) in MGD and TGD. Such “alveolate transcripts” were most likely expressed from the host (dinoflagellate) genomes. Alveolate transcripts in MGD formed a cluster in the two-dimensional plot based on the GC% of first codon positions and that of third codon positions (Fig. 2A, dots in orange). Similarly, alveolate transcripts from TGD formed a cluster in the same plot, but shifted toward higher GC% in third codon positions (Fig. 2B). In sharp contrast, green algal transcripts from both MGD and TGD were found to be split into two populations (Fig. 2 A and B, dots in green), and the population

Fig. 2. Scatter plots showing the distribution of GC% (A and B) and box plots for putative N-terminal extension (C and D) of the transcripts found in TGD (Left) and MGD (Right). (A and B) The x and y axes show the GC% of first and third codon positions, respectively. Plots in green and orange represent the transcripts encoding the putative green algal and alveolate proteins, respectively. In both plots, green algal transcripts were divided into two populations based on GC%, and the ones with higher GC% overlapped with the masses of alveolate transcripts, which were presumably expressed from the dinoflagellate nuclear genomes. (C and D) Box plots of N-terminal extension of green algal transcripts with low and high GC%. The x axes show lengths of putative N-terminal extensions (see Materials and Methods). P values displayed in the plots were calculated based on the Wilcoxon rank-sum test.
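The two axes used in panels A and B (GC% at first and at third codon positions) can be computed for any in-frame coding sequence. The following is a minimal Python sketch under stated assumptions, not the authors' pipeline: the function name `codon_position_gc` is hypothetical, the input is assumed to start at the first base of a codon, and ambiguous bases (anything other than A, C, G, T) are simply skipped.

```python
from typing import Tuple


def codon_position_gc(cds: str) -> Tuple[float, float]:
    """Return (GC1, GC3): GC% at first and third codon positions.

    Assumes `cds` is an in-frame coding sequence. Trailing bases that do
    not complete a codon are ignored, as are non-ACGT characters.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3   # drop an incomplete trailing codon
    first = seq[0:usable:3]            # bases at codon position 1
    third = seq[2:usable:3]            # bases at codon position 3

    def gc_percent(bases: str) -> float:
        counted = [b for b in bases if b in "ACGT"]
        if not counted:
            return 0.0
        return 100.0 * sum(b in "GC" for b in counted) / len(counted)

    return gc_percent(first), gc_percent(third)


if __name__ == "__main__":
    gc1, gc3 = codon_position_gc("ATGGCCGCG")
    print(f"GC1 = {gc1:.1f}%, GC3 = {gc3:.1f}%")
```

Plotting one point per transcript with these two values would give a scatter of the kind described in the caption; how the two populations were then delimited is not specified here and is left out of the sketch.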

with high GC% overlapped with alveolate transcripts. Likewise, a codon usage-based analysis also split green algal transcripts into two populations and one of them overlapped with the cluster of alveolate transcripts (SI Appendix, Fig. S3).

We estimated the abundance of each of the green algal/alveolate transcripts in MGD and TGD by calculating FPKM (fragments per kilo-base transcript length per million fragments mapped) (65). Green algal transcripts with high GC% and alveolate transcripts appeared to be expressed at similar levels in both dinoflagellates (SI Appendix, Fig. S4). On the other hand, the average FPKM for green algal transcripts with low GC% (334.8 and 279.6 for MGD and TGD, respectively) appeared to be higher than those for the transcripts with high GC% (5.7 and 4.8 for MGD and TGD, respectively). Similar differences in transcriptional intensity between the nuclear and nucleomorph genomes have been documented in cryptophytes and chlorarachniophytes (21, 22).

Mature mRNA molecules transcribed from dinoflagellate nuclear genomes are known to possess a particular short sequence (spliced leader or SL sequence; 5′-CCGTAGCCATTTTGGCTCAAG-3′) at their 5′ termini (66). Thus, green algal transcripts expressed from the host nuclear genome are anticipated to be preceded by the SL sequences. We examined the presence/absence of the SL sequence in six green algal transcripts identified from TGD by RT-PCR using a forward primer matching the dinoflagellate SL sequence (66) and reverse primers matching specifically to the individual transcripts. As presented in SI Appendix, Fig. S5, the 5′ termini of the three transcripts with high GC% were confirmed by PCR amplification followed by Sanger sequencing, while no amplification was observed for the three transcripts with low GC% (SI Appendix, Fig. S5, Upper). We subjected six green algal transcripts identified from MGD to the same RT-PCR experiments, and observed that the amplification of the 5′ termini occurred only for the transcripts with high GC% (SI Appendix, Fig. S5, Lower). We systematically looked for green algal transcripts bearing 5′ sequences that matched with the 3′ portion of the SL sequence (5′-TTTTGGCTCAAG-3′). We detected the partial SL sequence in 78 of the 534 green algal transcripts in MGD and the vast majority of the SL-bearing transcripts were classified into the high-GC% category, albeit a few transcripts were on the boundary between high- and low-GC% categories (SI Appendix, Fig. S6). In TGD, 9 SL-bearing transcripts, all of which are of high-GC%, were detected among the 961 green algal transcripts.

We confirmed the presence of the SL sequence in a subset of green algal transcripts (see above), and the SL-bearing transcripts are compelling evidence that the host genomes of MGD and TGD carry and transcribe genes acquired from the genomes of their

green algal endosymbionts. We should be cautious not to regard the GC% of green algal transcripts as the absolute “probe” for the presence/absence of the SL sequence, but anticipate that a substantial portion of high-GC% green algal transcripts are expressed from the dinoflagellate nuclear genomes and receive the SL sequences posttranscriptionally in both MGD and TGD. Although the precise magnitude of EGT is currently uncertain in the MGD or TGD system, we here conclude that the host and endosymbiont in the two dinoflagellates have been integrated by the process of EGT to the same extent as seen in cryptophytes and chlorarachniophytes.

We additionally searched for the transcripts encoding enzymes involved in the C5 pathway for heme biosynthesis, and in the chlorophyll a (Chl-a) and isopentenyl diphosphate (IPP) biosynthetic pathways that are typically localized in plastids (SI Appendix, Fig. S7). Indeed, some of the transcripts were predicted to encode proteins with typical plastid-targeting signals (SI Appendix, Fig. S7). Heme and IPP pathways in both MGD and TGD appeared to be dominated by “vertically inherited (VI)-type” enzymes that share their origins with the homologs of peridinin-containing dinoflagellates. Thus, besides EGT, substantial proportions of the current plastid proteomes in MGD and TGD may have been inherited from the ancestral plastids containing peridinin. In addition to VI-type genes, “laterally acquired (LA)-type” genes from diverse organisms distantly related to dinoflagellates (host) or green algae (endosymbiont) were detected in the pathways assessed here (SI Appendix, Fig. S6). According to the shopping bag hypothesis (67), LA-type genes could have made MGD/TGD preadapted for plastid acquisition. To test this possibility, it is critical to determine whether each LA-type gene was acquired in the dinoflagellate genome prior to the green algal endosymbiosis, by comparative studies of plastid proteomes sampled from diverse dinoflagellates, particularly those closely related to MGD and TGD.

Green algal transcripts with low GC% (Fig. 2 A and B) likely bear no SL sequence at their 5′ termini, implying the presence of a second genome with a GC% content lower than the dinoflagellate nuclear genome, in both MGD and TGD. Accumulated genomic data clearly suggest that endosymbiont and organellar genomes underwent reductive evolution (e.g., cryptophyte and chlorarachniophyte nucleomorph genomes) and tend to have low GC% (8, 9, 13–19). Considering the reduced characteristics in the endosymbiont compartments in MGD and TGD at the morphological level, we predict that the nucleomorph genomes in the two dinoflagellates bear lower GC% than those of the dinoflagellate nuclear genomes. Thus, the sources of green algal transcripts with low GC% are most likely the nucleomorph genomes. It is worth noting that housekeeping proteins, which are involved in the eukaryote-type machineries for translation, transcription, and replication, were found to be encoded almost exclusively by green algal transcripts with low GC% (i.e., putative nucleomorph transcripts) in both MGD and TGD (SI Appendix, Fig. S8 A and B). In contrast, transcripts encoding proteins involved in plastid metabolic pathways appeared to span the two populations of green algal transcripts with distinct GC% (SI Appendix, Fig. S8 C and D). Similar biased gene content in the nucleomorph genomes has been documented in the cryptophyte and chlorarachniophyte systems (8, 9, 13–19).

We identified the green algal transcripts with high GC% (i.e., putative nuclear transcripts) encoding plastid-related proteins involved in plastid function and maintenance (e.g., photosynthesis) in both MGD and TGD (SI Appendix, Fig. S8 C and D). In theory, to operate the plastids, these nucleus-encoded plastid-related proteins (many of them are presumably acquired from the green algal endosymbionts) need to be targeted to the plastids after being synthesized by the host machinery in the cytoplasm in MGD and TGD.

In photosynthetic eukaryotes with primary plastids surrounded by two membranes (e.g., green algae), many of the nucleus-encoded plastid-targeted proteins bear N-terminal extensions (so-called “transit peptides”) that work as a plastid-targeting signal. The N-terminal extensions of the nucleus-encoded proteins targeted to complex plastids, which are surrounded by three or four membranes, tend to have a bipartite structure comprising a hydrophobic “signal peptide” (SP) followed by the transit peptide-like region (68). As both MGD and TGD plastids are surrounded by four membranes (Fig. 1 A and E), the nucleus-encoded plastid-targeted proteins of the two dinoflagellates should have a bipartite plastid-targeting signal. The proteins encoded by endosymbiotically transferred green algal genes are unlikely to have acquired their bipartite signals ab initio, and instead needed to modify their N-terminal extensions by appending the SPs to be targeted into the current MGD and TGD plastids. If the above scenario is the case, the nucleus-encoded plastid-targeted green algal proteins in MGD and TGD possess N-terminal extensions longer than those of the homologous proteins in free-living green algae with primary plastids. It is reasonable to expect that the nucleus-encoded plastid-targeted green algal proteins are a subset of the green algal proteins encoded by the putative nuclear transcripts (i.e., green algal transcripts with high GC%). As expected, the nucleus-encoded green algal proteins tend to bear significantly longer N-terminal extensions than the nucleomorph-encoded green algal proteins (Fig. 2 C and D), implying that these N-terminal extensions have bipartite structures. Indeed, some green algal proteins in MGD and TGD were predicted to have the typical bipartite plastid-targeting signals (SI Appendix, Fig. S9). The results described above suggest that both MGD and TGD possess nucleus-encoded green algal proteins, which are localized posttranslationally in the plastid. On the other hand, for the green algal proteins encoded by green algal transcripts with low GC% (i.e., putative nucleomorph transcripts), it is unnecessary to have the N-terminal extensions with the bipartite structure.

Our detailed assessment, focused on green algal transcripts, identified multiple psbO and rbcS transcripts with distinct GC% in MGD (Fig. 3). Some of these green algal transcripts showed high affinity to particular green algae, Pedinomonas spp., that were closely related to the endosymbionts that gave rise to the current plastids in MGD and TGD (see the next section for the details). We confirmed the SL sequences of the high-GC% versions of the aforementioned transcripts bioinformatically or experimentally (see above), but found no evidence for the SL sequence at the 5′ termini of the low-GC% counterparts (Fig. 3). These data suggest that MGD possesses two sets of psbO and rbcS genes; one is nucleus-encoded (high-GC% and generates the SL-bearing transcripts) and the other is nucleomorph-encoded (low-GC% and generates the SL-lacking transcripts). Similarly, TGD likely possesses both nucleus- and nucleomorph-encoded petC genes, as we found two petC transcripts; one is of high-GC% and bears the SL sequence, and the other is of low GC% and bears no SL sequence. The petC genes of TGD, as well as that of MGD, showed a phylogenetic affinity to the Pedinomonas homolog, reflecting their plastid (and endosymbiont) origins (Fig. 3).

In addition to the nuclear (high-GC%) and nucleomorph (low-GC%) versions described above, we found a third psbO transcript in MGD (Fig. 3), which has neither a high nor low GC%. The psbO transcript with an “intermediate GC%” was found to bear the SL sequence, indicating that MGD possesses a nucleus-encoded psbO gene whose GC% does not match that of other nuclear genes, including the high-GC% version of the psbO gene. We herein propose that the intermediate-GC% version of the psbO gene was transplanted in the nuclear genome before the GC% of the nucleomorph genome had been lowered to the current level. Alternatively, the intermediate-GC% psbO gene is

Fig. 3. ML trees for the green algal orthologous proteins with distinct GC%. The numbers above branches show nonparametric ML bootstrap values. Only ML bootstrap support values greater than 50% are shown on the corresponding branches.
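The gene copies compared in Fig. 3 are distinguished in the text by their GC% and by whether their transcripts begin with the dinoflagellate SL sequence. A bioinformatic screen of that kind can be sketched as follows. This is an illustration under assumptions, not the authors' procedure: the function name `leading_sl_length` is hypothetical, matching is exact (no mismatches allowed), and the minimum match defaults to the 12-nt 3′ portion of the SL quoted in the text.

```python
# SL sequence and its 3' portion as quoted in the text (5'->3').
SL_FULL = "CCGTAGCCATTTTGGCTCAAG"
SL_3PRIME = "TTTTGGCTCAAG"


def leading_sl_length(transcript: str, min_len: int = len(SL_3PRIME)) -> int:
    """Length of the longest SL suffix found at the transcript's 5' end.

    Returns 0 if the transcript does not start with at least `min_len`
    bases of the 3' end of the SL. Exact matching is an assumption made
    for this sketch; a real screen might tolerate mismatches.
    """
    seq = transcript.upper().replace("U", "T")
    # Try the full SL first, then progressively shorter 3' portions.
    for k in range(len(SL_FULL), min_len - 1, -1):
        if seq.startswith(SL_FULL[-k:]):
            return k
    return 0


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    examples = {
        "partial_SL": "TTTTGGCTCAAGATGGCC",
        "full_SL": "CCGTAGCCATTTTGGCTCAAGATG",
        "no_SL": "ATGGCCAAA",
    }
    for name, seq in examples.items():
        print(name, leading_sl_length(seq))
```

Because assembled transcripts are often truncated at the 5′ end, searching for a 3′ portion of the SL rather than the full 21 nt keeps partially assembled SL-bearing transcripts in the screen, which is consistent with the partial-SL search described in the text.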

the result of more recent EGT than the high GC% one, and the amelioration of GC% has yet to be completed in the former.

Pedinophyceae Origin of the Green Algal Endosymbionts in MGD and TGD. We isolated two eukaryotic small subunit rRNA (18S rRNA) genes (one with and the other without a clear phylogenetic affinity to those of dinoflagellates) from each of MGD and TGD. The 18S rRNA sequences isolated from MGD and TGD were phylogenetically analyzed along with those from , green plants, including green algae and land plants, glaucophytes, cryptophytes, chlorarachniophytes, and dinoflagellates. The 18S rRNA phylogeny placed nondinoflagellate-type MGD

www.pnas.org/cgi/doi/10.1073/pnas.1911884117 | Sarai et al.

and TGD sequences within the sequences from free-living green algae, being distant from the clade of the dinoflagellate sequences, including dinoflagellate-type MGD and TGD sequences (Fig. 4). We thus regard the positions of dinoflagellate-type and nondinoflagellate-type sequences from MGD and TGD in the 18S rRNA phylogeny as the phylogenetic positions of the host and endosymbiont in the MGD and TGD systems.

Unlike cryptophyte or chlorarachniophyte nucleomorphs, the 18S rRNA phylogeny clarified the precise origins of the green algal endosymbionts in the MGD and TGD systems. Nondinoflagellate-type MGD and TGD sequences grouped together, and this MGD–TGD clade was placed within a clade of a small collection of green algae, Pedinophyceae, with a specific affinity to Pedinomonas spp. (Fig. 4). The monophyly of Pedinomonas plus MGD and TGD received a maximum-likelihood (ML) bootstrap value of 83% and a Bayesian posterior probability (BPP) of 0.97 (Fig. 4). The detailed origins of the endosymbionts in MGD and TGD were also assessed by phylogenetic analyses of plastidal small-subunit rRNA (16S rRNA) sequences (Fig. 4). The plastidal 16S rRNA phylogeny united MGD, TGD, and Pedinomonas spp. into a clade with an ML bootstrap value of 97% and a BPP of 0.92, excluding other Pedinophyceae considered in the analyses, Marsupiomonas spp. and Resultor mikron (Fig. 4). The two phylogenetic analyses consistently and strongly indicate that the current plastids in both MGD and TGD can be traced back to Pedinophyceae green algae closely related to Pedinomonas.

Some of us have already reported the Pedinophyceae origin of the Chls a+b-containing plastids in the dinoflagellate genus Lepidodinium (69). Indeed, in the plastidal 16S rRNA phylogeny including the L. chlorophorum sequence, L. chlorophorum robustly grouped with MGD and TGD, and this dinoflagellate clade as a whole was found to be sister to Pedinomonas (SI Appendix, Fig. S10; note that the L. chlorophorum sequence was excluded from the analyses presented in Fig. 4 due to its extremely divergent nature). The simplest interpretation of this tree topology is that L. chlorophorum, MGD, and TGD share a single ancestor with a Pedinophyceae-derived plastid. However,

Fig. 4. ML tree inferred from eukaryotic small subunit ribosomal RNA (18S rRNA) sequences. All of the taxon names are omitted except MGD, TGD, and Pedinophyceae green algae. Taxon labels of red and green algae per Adl et al. (89). Only ML bootstrap support values greater than 80% are shown on the corresponding branches. The branches supported by BPPs greater than 0.95 are shown as thick lines. The clade comprising MGD, TGD, and Pedinophyceae green algae inferred from plastidal small subunit rRNA (16S rRNA) sequences is shown in the box. The 16S rRNA tree including L. chlorophorum with full taxon names is provided as SI Appendix, Fig. S10.

a taxon-rich dinoflagellate phylogeny inferred from eukaryotic large subunit rRNA (28S rRNA) sequences (SI Appendix, Fig. S11) appeared to be inconsistent with the scenario deduced from the plastid phylogeny. In the host phylogeny deduced from 28S rRNA sequences, MGD and TGD showed no particular affinity to any other dinoflagellates, while L. chlorophorum was nested within a robustly supported clade of dinoflagellates with peridinin-containing plastids (e.g., Gymnodinium and Nematodinium). The relationship among the three dinoflagellates with Pedinophyceae-derived plastids was much better resolved in a phylogenetic analysis of 75 proteins than in that of the 28S rRNA sequences. In the 75-protein phylogeny, the three species were separated by robustly supported nodes (Fig. 5). We examined the relationship among L. chlorophorum, MGD, and TGD by subjecting the ML tree, wherein the three species are paraphyletic, and 15 alternative trees to an approximately unbiased test (70). In the alternative trees, all or subsets of the three species were forced to be monophyletic (SI Appendix, Fig. S12). Significantly, all of the 15 alternative trees were rejected with very small P values. Thus, we can conclude that the host lineages of the three species are highly likely to be paraphyletic.

The inconsistency between the host and plastid phylogenies demands three independent endosymbioses of Pedinophyceae green algae on the branches leading to Lepidodinium species, MGD, and TGD in dinoflagellate evolution. We currently cannot rationalize why separate dinoflagellate lineages engulfed very closely related Pedinophyceae green algae as endosymbionts. To answer the above question, we need to understand the interaction between dinoflagellates and Pedinophyceae at the genetic, physiological, and environmental levels.

Fig. 5. ML tree inferred from a 75-protein alignment. The topology in question was emphasized. The 75-protein alignment contains 21,042 amino acid positions. The tree topology was generated by a ML analysis with the LG + Γ + F + C60 model. Nonparametric ML bootstrap support values (shown above the nodes) were calculated from 100 replicates with the LG + Γ + F + PMSF model. The nodes highlighted by closed dots were supported fully by nonparametric bootstrap analyses. Only bootstrap support values greater than 70% are shown.

Discussion

In this study, we reported previously undescribed dinoflagellates, strains MGD and TGD, both of which still retain the remnant intercellular structures of algal endosymbionts enclosing the nuclei and plastid, but no mitochondrion (Fig. 1). Multiple cases of EGT in MGD and TGD indicated that the endosymbiont-derived compartments had been integrated as organelles in the two dinoflagellates. Combining these facts, we regard MGD and TGD as nucleomorph-bearing organisms. Cryptophytes and chlorarachniophytes have been investigated intensively as model organisms to depict organellogenesis that transformed an algal endosymbiont into a plastid (e.g., refs. 1, 8, 9, and 20). Curiously, three characteristics of the cryptophyte and chlorarachniophyte nucleomorphs were found in those of MGD and TGD (see below). Pioneering studies demonstrated that, in both cryptophytes and chlorarachniophytes: 1) the nucleomorph genomes are rich in house-keeping genes; 2) the GC% of the nucleomorph genomes is low, as seen in other reduced genomes such as plastid and mitochondrial genomes; and 3) nucleomorph genes tend to be transcribed more intensively than nuclear genes (8, 9, 13–22). Although no genome data are available for the nucleomorphs of MGD or TGD, the putative nucleomorph transcripts of the two dinoflagellates appeared to be rich in those encoding house-keeping proteins (SI Appendix, Fig. S5). We also noticed that nucleomorph genes were found to be transcribed more intensively than nuclear genes in both MGD and TGD (SI Appendix, Fig. S3). We suspect that the two characteristics in gene content and transcription shared among the nucleomorph genomes known to date might provide the critical clues to understand organellogenesis.

Besides the characteristics shared with cryptophytes and chlorarachniophytes (see above), MGD and TGD turned out to have characteristics that are not present in the two previously known nucleomorph-containing lineages. The phylogenetic analyses inferred from plastidal 16S and eukaryotic 18S rRNA sequences revealed that the plastid origins of MGD, TGD, and Lepidodinium spp. are the close relatives of a particular green algal genus, Pedinomonas (Fig. 4). In contrast, neither the origin of the red alga engulfed by the ancestral cryptophyte nor that of the green alga engulfed by the ancestral chlorarachniophyte has been pinpointed to the genus level (18, 24–28). Thus, the modifications of the endosymbiont genomes during the transition from an endosymbiotic alga into the plastid can be predicted directly by comparing MGD/TGD nucleomorph genomes to those of free-living Pedinophycean green algae in the future.

The process of transferring endosymbiont genes to the host genome may have gone through a transitional state in which the particular genes co-occurred in both endosymbiont and host genomes. Nevertheless, Curtis et al. (20) proposed that both cryptophyte and chlorarachniophyte systems are in the post-EGT state, as no co-occurrence of a nucleomorph gene and its nuclear copy was detected. Nor have recent (i.e., lineage-specific) EGTs been detected despite apparent examples of lineage-specific nucleomorph gene loss (16, 17, 20). On top of that, no recent DNA influx from the endosymbiont to host in the cryptophyte or chlorarachniophyte system was apparent, since no nucleomorph or plastid genome copies (NUNMs or NUPTs) were found in the host genome (20). In contrast, we identified both nuclear and nucleomorph versions of psbO and rbcS genes in MGD, and those of petC in TGD (Fig. 3) (the three sets of genes showed phylogenetic affinities to the corresponding Pedinomonas homologs, suggesting that the nuclear versions were derived from the corresponding genes in the endosymbiont genome). Moreover,

the MGD nuclear genome appeared to possess two psbO genes with distinct GC% (Fig. 3), which are likely the outcome of two separate EGT events. The co-occurrence of a nucleomorph gene and its nuclear copy in MGD and TGD suggests that EGT has yet to be completed in either of the two dinoflagellate systems. In this sense, MGD and TGD are excellent models to elucidate the detailed process of EGT during organellogenesis. It is also intriguing to search for evidence of DNA influx from the endosymbiont to the host in the two dinoflagellate systems. According to the limited transfer window hypothesis (20, 71, 72), NUPTs or NUNMs are unlikely to be abundant in the nuclear genomes of MGD or TGD, which have cells containing a single plastid associated with a single nucleomorph. The above question should be addressed by sequencing the nuclear, plastid, and nucleomorph genomes of both MGD and TGD in future studies.

A recent study proposed that the current plastids in Lepidodinium spp. were established more recently than the chlorarachniophyte plastids (73). Both Lepidodinium plastids and MGD/TGD plastids were derived from closely related Pedinophyceae green algae (Fig. 4), but the host phylogeny inferred from a 75-protein alignment clearly suggests that three independent endosymbioses of Pedinophyceae green algae gave rise to the current plastids in Lepidodinium spp., MGD, and TGD (Fig. 5 and SI Appendix, Fig. S12). Although a nucleus-like structure within the plastid in L. viride was reported previously (41), no clear structure such as those seen in MGD and TGD was observed in our survey. The differences in intracellular structure between MGD/TGD and Lepidodinium spp. lead us to propose that the endosymbiosis in MGD/TGD was more recent than that in the ancestral Lepidodinium cell (or in the ancestral chlorarachniophyte cell). We suspect that 1) the clarity in plastid/nucleomorph origin and 2) traits corresponding to a transitional state of EGT (see above) found in MGD and TGD stem from their evolutionary youthfulness.

The present study reports the third and fourth nucleomorph-bearing organisms, dinoflagellate strains MGD and TGD, which are unique in having been discovered within the last 30 y. It is anticipated that these discoveries will shed light on the nature of endosymbiogenesis. The two dinoflagellates will be the foundation of future works that deepen our knowledge derived from the pioneering works on cryptophytes and chlorarachniophytes, which used to be the sole models to study the evolutionary process of transforming an algal endosymbiont into a plastid.

Materials and Methods

Strains and Culture Condition. Two unique green dinoflagellate strains MGD and TGD were collected from coastal areas in Muroran, Hokkaido, Japan and Tsuruoka, Yamagata, Japan in September 2011, respectively. Single cells were isolated using a glass pipette under light microscopy. The strains were cultivated and have been maintained in half final concentration of Daigo IMK medium (Wako) with seawater at 20 °C under a light/dark cycle of 12/12 h.

Light and Fluorescent Microscopy. Living cells were observed by Olympus IX71 inverted microscope (Olympus) with an Olympus DP71 CCD camera (Olympus). For fluorescent microscopic observation, the cells were fixed with the same volume of fixation buffer containing 2.5% glutaraldehyde, 2% paraformaldehyde, 0.2 M sucrose, and 0.1 M cacodylate at pH 7.0 for 5 min at room temperature. Fixed cells were rinsed with room temperature distilled water for 10 min three times, and were harvested by centrifugation at 2,810 × g for 10 min at 18 °C. The cells were put onto a glass coverslip coated with poly-L-lysine (Wako), and left to stand for 30 min. DNA in the fixed cells was stained by 0.1% SYBR green I solution for 10 min at room temperature. After washing three times with distilled water for 10 min, the DNA-stained cells were mounted with ProLong Diamond Antifade Mountant (Life Technologies), and then observed using a Leica DMRD light and fluorescence microscope (Leica) with an Olympus DP73 CCD camera (Olympus).

Transmission Electron Microscopy. The first fixation step was carried out as described above, then the cells were washed with 0.2 M cacodylate buffer. The second fixation was performed with buffer containing 1% OsO4 and 0.2 M cacodylate for 15 min at 4 °C. Cells were dehydrated in a series of baths with increasing ethanol concentrations, 10 min per bath, and were then embedded in low viscosity resin via three incubations with polypropylene oxide at room temperature over an hour. Embedded samples were polymerized for 14 h at 70 °C. Polymerized blocks were sectioned using a diamond knife and placed onto formvar-coated copper grids. Ultra-thin sections of the cells were stained with 2% uranyl acetate and lead citrate. Observations were carried out using a H7650 (Hitachi) microscope.

Plastid Isolation. Living TGD cells were centrifuged for 10 min at 2,810 × g to make the cells burst. For MGD, the cells were put into a 25% final concentration of sucrose solution for 5 min. Then, four volumes of distilled water were added to make the cells burst by hypoosmotic shock. TGD and MGD cell pellets including the plastids freed from other cellular structures were harvested by centrifugation, and were fixed as described above. The freed plastids were sought and photos were taken under fluorescent microscopy and TEM.

Transcriptome Analyses and Protein Prediction. Total RNAs of MGD and TGD cells harvested from monoclonal laboratory cultures were extracted using TRIzol reagent (Life Technologies). After enrichment of polyA-tailed mRNA molecules, RNA samples were reverse-transcribed and the cDNAs were ligated with adaptor primers. The cDNA libraries from MGD and TGD were then sequenced on a HiSeq 2000 instrument (Illumina); 286 and 411 million reads (paired-end) were generated from MGD and TGD libraries, respectively. Sequence reads with low sequencing quality were removed using FASTQ Trimmer and FASTQ Quality Filter programs included in the FASTX-toolkit package (http://hannonlab.cshl.edu/fastx_toolkit/). The remaining reads of MGD and TGD were then assembled into 286,124 and 393,513 transcript contigs using the TRINITY package (release 2013-02-25) (74, 75), respectively. All of the transcript contigs were subjected to blastx analysis against an in-house database of protein sequences retrieved from phylogenetically diverse organisms. ORFs encoding known proteins, identified by blastx analyses, were translated into amino acid sequences. Otherwise, the longest possible ORFs seen in individual transcripts were translated into amino acid sequences. The putative proteins predicted from the transcriptomic data were further analyzed as described below. All transcriptome data are available from the DNA Data Bank of Japan under BioProject PRJDB8237. The contig data of MGD and TGD are available from the Dryad Digital Repository (76).

Functional Annotation of the Proteins Predicted from MGD and TGD Transcriptomic Data. The proteins predicted from MGD and TGD transcripts were roughly annotated by referring to the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (77). First, a total of 193,301 proteins with KEGG orthology IDs (KO IDs) from 40 eukaryotic and 32 bacterial species were retrieved from the KEGG database. The protein sets predicted for MGD and TGD were subjected to blastp analyses against the retrieved KEGG proteins. Then the eukaryotic sequences retrieved from the KEGG database were subjected to blastp analysis in the opposite direction (i.e., blastp searches against the protein sets predicted from the MGD and TGD transcriptomes). If a particular MGD/TGD sequence and a KO ID matched in reciprocal blastp analyses, the KO ID was assigned to the sequence of interest. We assigned functional annotations to 57,983 and 73,589 of the putative proteins identified from MGD and TGD transcriptome data, respectively.

Evolutionary Origins of the Proteins Predicted from MGD and TGD Transcriptomic Data. The phylogenetic origins of functionally annotated proteins were individually assessed as described below. Each of the MGD and TGD proteins with functional annotations was subjected to blastp analyses against a custom protein database containing genome-wide protein data from 129 phylogenetically diverse organisms (48 eukaryotic, 68 bacterial, and 13 archaeal species), and the proteins encoded in the plastid genome of a Pedinophyceae green alga, Pedinomonas minor (78), a free-living relative of the green algal endosymbionts engulfed by MGD and TGD (Discussion). We identified two sets of proteins, green algal proteins and alveolate proteins, for which the top five hits from blastp searches were occupied by sequences from members of or those from alveolates, respectively. MGD and TGD proteins, whose top five blastp hits included matches to P. minor plastid genes, were designated as plastid genome-encoded proteins, and were not analyzed any further.

GC-Content and FPKM Calculation for Each Transcript. GC-content for an entire sequence as well as for first, second, and third codon positions of each transcript was calculated using an in-house Ruby script. In this study, the green algal transcripts with >50% GC at first and >40% GC at third codon positions were designated as high-GC% green algal transcripts, and those

with <60% at first and <40% at third codon positions were designated as low-GC% green algal transcripts.

To estimate expression levels for each transcript from MGD and TGD, we calculated relative abundances as FPKM. To calculate FPKM, RNA sequencing (RNA-seq) reads from MGD and TGD were separately aligned onto their respective transcriptome assembly using bowtie (79). FPKMs were then calculated using RSEM (80) based on the short-read alignments. We carried out these steps using a Perl script (align_and_estimate_abundance.pl) bundled with Trinity (v2.0.6) (74, 75, 80).

Estimation of N-Terminal Extensions in Green Algal Proteins. Possible N-terminal extensions of the green algal proteins from MGD and TGD were determined based on blastp alignments. First, all of the green algal proteins from the two dinoflagellates were subjected to a blastp search against the custom protein database described above. For each green algal protein, we defined the mature protein region that is conserved among the top 10 hits from the blastp search. The N terminus of the putative mature protein region for each protein was set as an average of the start positions of the top 10 blast alignments (outliers detected by the Grubbs’ test were not used in the average calculation). In this way, the putative N-terminal extension of each protein corresponds to the amino acid sequence between the first methionine of the protein sequence and the estimated N terminus of the mature protein region. If no methionine was found upstream of a putative mature protein region, we concluded that that particular transcript most likely encoded an N terminus truncated protein and excluded it from the analysis described below. Finally, lengths of putative N-terminal extensions were compared between green algal proteins encoded by high-GC% transcripts and those encoded by low-GC% transcripts (see above). The difference in the length of N-terminal extensions between the two groups of green algal proteins was tested by the Wilcoxon rank-sum test after removal of the outliers determined by the Grubbs’ test.

Confirmation of the SL Sequence at the 5′ Termini of MGD and TGD Transcripts. Total RNA samples were extracted from the cultured MGD and TGD cells using TRIzol reagent (Life Technologies). Reverse-transcription and cDNA amplification were performed with the SMARTer PCR cDNA Synthesis Kit (Clontech Laboratories) according to the manufacturer’s instructions. DNA amplification was carried out as described below. The first PCR was performed with the cDNA sample as the template and a set of primers: an adaptor primer supplied in the kit mentioned above (5′ PCR Primer II A) and a transcript-specific primer. The thermal cycle was set as: 5 cycles consisting of 10 s at 98 °C and 1 min at 68 °C; 5 cycles consisting of 10 s at 98 °C, 20 s at 60 °C, and 1 min at 68 °C; followed by 25 cycles consisting of 10 s at 98 °C, 20 s at 53 °C, and 1 min at 68 °C. We conducted the second PCR with an SL sequence-specific primer (66) and another transcript-specific primer that was set in a nested position to the first primer. The amplicons of the first PCR were used as the template in the second PCR. The thermal cycle was set as: 5 cycles consisting of 10 s at 98 °C and 1 min at 68 °C; 5 cycles consisting of 10 s at 98 °C, 20 s at 60 °C, and 1 min at 68 °C; followed by 25 cycles consisting of 10 s at 98 °C, 20 s at 53 °C, and 1 min at 68 °C. The amplicons were sequenced by the Sanger method.

We also conducted an in silico survey of SL sequences at the 5′ termini of the transcripts reconstructed from the transcriptomic data of MGD and TGD. The 3′ portion of the SL sequence (12 nucleotides) was searched for within the first 50 nucleotides from the 5′ termini of all of the transcripts.

Phylogenetic Analyses of Ribosomal RNA Genes. Eukaryotic small and large subunit ribosomal RNAs (rRNAs) (18S and 28S rRNA, respectively), and plastidal small-subunit ribosomal RNA (16S rRNA) sequences were amplified by standard PCR and then sequenced by the Sanger method. The determined sequences were separately aligned by the program MAFFT (81). After manual exclusion of ambiguously aligned positions, 1,658 positions and 82 taxa remained in the final 18S rRNA alignment, 732 positions and 78 taxa in the 28S rRNA alignment, and 1,232 positions and 66 taxa in the plastidal 16S rRNA alignment. ML phylogenetic analyses of the three alignments were done using RAxML v8.0.2 (82) under the GTR substitution model incorporating among-site rate variation approximated by a discrete gamma distribution with four categories. (For the 18S rRNA analysis, the proportion of invariant sites was also used to approximate among-site rate variation.) The ML trees were heuristically searched starting from 10 distinct maximum-parsimony trees, each of which was reconstructed by the random stepwise addition of sequences. One-thousand bootstrap replicates were generated from the 18S rRNA alignment, and analyzed with the rapid bootstrap method under the CAT model in RAxML. For the bootstrap analyses of both 28S rRNA and plastidal 16S rRNA alignments, 100 replicates were generated, and individually subjected to tree searches initiated from a single MP tree reconstructed by the random stepwise addition of sequences using RAxML.

The three alignments were also analyzed by a Bayesian method using the CAT-GTR + Γ model implemented in the program PHYLOBAYES v3.3 (83) with two independent chains. Markov chain Monte Carlo chains were run for 80,000 (18S rRNA), 80,000 (28S rRNA), and 78,000 (plastidal 16S rRNA) generations with a burn-in of 20,000 generations, respectively. We considered the two chains to have converged when the maximum difference value was less than 0.1. After “burn-in,” the consensus tree with branch lengths and BPPs was calculated from the rest of the sampled trees.

Phylogenetic Analyses of the 75-Protein Alignment. A phylogenetic alignment comprising 75 protein sequences (SI Appendix, Table S1) was assembled by referring to a previously published multigene phylogenetic study (38). The sequences included in the 75-protein alignment were retrieved from the assembled data of MGD and TGD generated in this study, the National Center for Biotechnology Information GenBank database (https://www.ncbi.nlm.nih.gov/), and the reassembled data of the Marine Microbial Eukaryote Transcriptome Sequencing Project (https://github.com/dib-lab/dib-MMETSP) (84) by TBLASTN searches. We used the homologs in the organisms for which high-quality genomes/transcriptomes are available as the BLAST queries (e.g., Plasmodium falciparum, Chromera velia, and Thalassiosira pseudonana). Note that the sets of the BLAST queries included the sequences of a land plant, Oryza sativa, and a green alga, Chlamydomonas reinhardtii, to help identify and exclude the possible endosymbiont-derived (green algal) sequences in the MGD and TGD data. The retrieved sequences were aligned with MAFFT (81). After ambiguously aligned positions were removed, each single-gene alignment was subjected to an ML phylogenetic analysis to identify and remove extremely divergent sequences, as well as putative paralogous and contaminated sequences. In each of the single-gene ML analyses, the most appropriate substitution model was chosen. Finally, 75 single-gene alignments were concatenated into a single alignment with 42 taxa and 21,042 amino acid positions, and then analyzed by the ML method with the LG + Γ + F + C60 model to infer the ML tree with ultrafast bootstrap support (1,000 replicates). The statistical support for the bipartitions in the ML tree was also assessed by a nonparametric ML bootstrap analysis (100 replicates) with the LG + Γ + F + PMSF [posterior mean site frequency (85)] model. IQ-TREE (86) was used for all of the ML analyses described above. The 75-protein alignment, single-gene alignments, and a spreadsheet with calculated site coverages for the 75-protein alignment are available from Dryad Digital Repository (76).

Phylogenetic Analyses of Overlapping Green Algal Proteins. Homologous sequences of three green algal proteins in MGD and TGD, namely PsbO, RbcS, and PetC, were retrieved from an in-house database comprising protein sequences from phylogenetically diverse organisms by similarity searches using blastp software. Retrieved protein sequences and the homologous sequences from MGD and TGD were then aligned by MAFFT (81) with the L-INS-I method. Ambiguously aligned positions and gaps were removed from the final PsbO, RbcS, and PetC alignments, which comprise 35 sequences with 254 amino acid positions, 35 sequences with 100 amino acid positions, and 36 sequences with 178 amino acid positions, respectively. ML trees were inferred from the three alignments by IQ-TREE (86) with nonparametric bootstrapping (100 replicates) under the LG4X substitution model. The three alignments are available from the Dryad Digital Repository (76).

Phylogenetic Analyses of the Enzymes Involved in Heme, Chl-a, and IPP Biosynthetic Pathways. We retrieved the sequences encoding the 9, 6, and 7 nucleus-encoded enzymes involved in the C5 pathway for heme biosynthesis, and the Chl-a and IPP biosynthetic pathways, respectively, from the MGD and TGD transcriptomic data. The identified sequences were translated into amino acid sequences and added to the corresponding alignments previously generated by some of us (56). The alignments, updated by adding the MGD and TGD homologs (22 in total), were subjected to ML phylogenetic analyses to classify individual enzymes into 1) VI-type, 2) “endosymbiotically acquired (EA)-type,” or 3) LA-type (see above). We labeled the sequences that we were unable to categorize into any of the three types with confidence as “uncertain origin” (note that this category can contain contaminated sequences). The single-gene alignments and ML tree with bootstrap support values are available from Dryad Digital Repository (76).

Approximately Unbiased Test. To assess alternative relationships among L. chlorophorum, MGD, and TGD in the 75-protein phylogeny (Fig. 5), we pruned and regrafted one or two of the branches of interest in the ML tree to generate alternative trees with 1) the monophyly of MGD and TGD, 2) the monophyly of L. chlorophorum and MGD, 3) the monophyly of L. chlorophorum and TGD, and 4) the monophyly of the three species. The ML tree, in

which the three species were paraphyletic, and the 15 alternative trees were subjected to an approximately unbiased test (70). For each test tree, site-wise log-likelihoods were calculated by IQ-TREE with the LG + Γ + F + C60 model. The resulting site-wise log-likelihood data were subsequently analyzed by Consel v0.1 with default settings (87).

Pigment Analysis. For L. chlorophorum, we used strain NIES-1868 deposited in the National Institute for Environmental Studies (NIES) culture collection. MGD, TGD, and L. chlorophorum cells were harvested by centrifugation. After removal of a transparent viscous layer above the cell pellet, pigments were extracted with 100 μL of 100% methanol, and the pigment extract was collected into a 1.5-mL centrifuge tube after centrifugation. We repeated the extraction until the extract was no longer colored. The extracted pigments were subjected to reverse-phase HPLC with a Waters Symmetry C8 column (150 mm × 4.6 mm; particle size 3.5 μm; pore size 100 Å). The HPLC was performed as described in Zapata et al. (88) without any modifications. The eluted pigments were detected by their absorbance at 440 nm and identified by their elution patterns compared to those reported in Zapata et al. (88).

ACKNOWLEDGMENTS. The authors thank Dr. Bruce A. Curtis (Dalhousie University, Canada) for proofreading an early draft of this manuscript. This work was supported in part by the Japan Society for the Promotion of Science Grants 23117006, 16H04826, 18KK0203, and 19H03280 (to Y.I.), 25304029 and 15H04533 (to M.I.), 17H03723 and 26840123 (to G.T.), 14J05929 (to C.S.), 17K15164 (to T.N.), and 19H03274 (to R.K.); Ministry of Education, Culture, Sports, Science and Technology of Japan Grant-in-Aid for Scientific Research on Innovative Areas 3308; a research grant from The Yanmar Environmental Sustainability Support Association (to R.K.); and the “Tree of Life” research project of the University of Tsukuba. Phylogenetic analyses of the 75-protein alignment were carried out under the “Interdisciplinary Computational Science Program” in the Center for Computational Sciences, University of Tsukuba.

