BMC Evolutionary Biology BioMed Central

Research article Open Access Cyanobacterial contribution to the genomes of the plastid-lacking Shinichiro Maruyama*1, Motomichi Matsuzaki1,2,3, Kazuharu Misawa1,3,3 and Hisayoshi Nozaki1

Address: 1Department of Biological Sciences, Graduate School of Science, University of Tokyo, 7-3-1 Hongo, Bunkyo, Tokyo 113-0033, Japan, 2Current address: Department of Biomedical Chemistry, Graduate School of Medicine, University of Tokyo, 7-3-1 Hongo, Bunkyo, Tokyo 113- 0033, Japan and 3Current address: Research Program for Computational Science, Riken, 4-6-1 Shirokane-dai, Minato-ku, Tokyo 108-8639, Japan Email: Shinichiro Maruyama* - [email protected]; Motomichi Matsuzaki - [email protected]; Kazuharu Misawa - [email protected]; Hisayoshi Nozaki - [email protected] * Corresponding author

Published: 11 August 2009 Received: 13 March 2009 Accepted: 11 August 2009 BMC Evolutionary Biology 2009, 9:197 doi:10.1186/1471-2148-9-197 This article is available from: http://www.biomedcentral.com/1471-2148/9/197 © 2009 Maruyama et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract Background: Eukaryotic genes with cyanobacterial ancestry in plastid-lacking protists have been regarded as important evolutionary markers implicating the presence of plastids in the early evolution of . Although recent genomic surveys demonstrated the presence of cyanobacterial and algal ancestry genes in the genomes of plastid-lacking protists, comparative analyses on the origin and distribution of those genes are still limited. Results: We identified 12 gene families with cyanobacterial ancestry in the genomes of a taxonomically wide range of plastid-lacking eukaryotes (Phytophthora [], [], Dictyostelium [], Saccharomyces and Monosiga [Opisthokonta]) using a novel phylogenetic pipeline. The eukaryotic gene clades with cyanobacterial ancestry were mostly composed of genes from (, Chromalveolata, and Excavata). We failed to find genes with cyanobacterial ancestry in Saccharomyces and Dictyostelium, except for a photorespiratory enzyme conserved among fungi. Meanwhile, we found several Monosiga genes with cyanobacterial ancestry, which were unrelated to other Opisthokonta genes. Conclusion: Our data demonstrate that a considerable number of genes with cyanobacterial ancestry have contributed to the genome composition of the plastid-lacking protists, especially bikonts. The origins of those genes might be due to lateral gene transfer events, or an ancient primary or secondary endosymbiosis before the diversification of bikonts. Our data also show that all genes identified in this study constitute multi-gene families with punctate distribution among eukaryotes, suggesting that the transferred genes could have survived through rounds of gene family expansion and differential reduction.

Background tion, by way of endosymbiotic gene transfer (EGT), a par- Cyanobacterial ancestors gave rise to plastids (chloro- ticular form of lateral gene transfer (LGT) from plasts) in the ancestor of a eukaryotic lineage. The birth of endosymbionts into the phylogenetically discontiguous the plastid had an impact on eukaryotic genome evolu- host genome [1]. Subsequently, an algal ancestor gave rise

Page 1 of 15 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:197 http://www.biomedcentral.com/1471-2148/9/197

to secondary plastids in several punctate lineages of phytes, green and ) [10]. Although it is eukaryotes. A number of these secondarily phototrophic widely accepted that primary plastids share a single origin lineages lost their photosynthetic ability and further [[11-13], but see [14,15]] and the Archaeplastida are diverged into secondarily heterotrophic, plastid-lacking monophyletic [[3,16], but see [17,18]], the evolutionary protists [2,3]. history of the primary plastids is still debatable [19-21]. In plastid-lacking protists, 'plastid imprints' can be exempli- Although the position of the root of eukaryotes is still fied by genomic information, i.e. genes with affinity to uncertain, the presence of gene fusions and insertion/ extant cyanobacterial or algal genes. These genes were sup- deletion sequences in the marker genes have allowed us to posed to have originated from EGT events, and this sort eukaryotes into at least three large groups; assumption should be affirmed by the resulting phyloge- Opisthokonta, Amoebozoa and bikonts (Archaeplastida, netic relationship between 'imprint' genes and the extant Chromalveolata, Rhizaria and Excavata) [4-10] (Figure 1). relatives of the putative endosymbionts. The biggest chal- Most phototrophic eukaryotes harboring plastids derived lenge and the limitation of this 'imprint' searching process from primary endosymbiosis (primary plastids) are classi- is that the inevitable incompleteness of genome informa- fied into the super-group Archaeplastida (i.e. glauco- tion on lineages of interest and the ever-developing phyl- Amoebozoa Unikonts

Opisthokonta

(Chlorarachniophytes) Rhizaria

Alveolata Origin of Eukaryotes () ? Chromalveolata Stramenopiles

(Oomycetes)

Glaucophytes

Red algae Archaeplastida ( Plantae sensu stricto) ? ? Primary endosymbiosis Green plants

(Euglenida) Excavata Secondary endosymbiosis Bikonts AFigure schematic 1 representation of eukaryotic phylogeny A schematic representation of eukaryotic phylogeny. The current consensus phylogeny and rooting of eukaryotes based on previous studies [4,23,40]. Arrows and stars indicate plastid acquisition via endosymbiosis and alternative hypotheti- cal time points which primary endosymbiosis occurred, respectively. The root of the eukaryotic tree on the unikonts/bikonts boundary is hypothesized, but still controversial [5,7-9,19]. Archaeplastida are represented as a monophyletic group, but see also [19].

Page 2 of 15 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:197 http://www.biomedcentral.com/1471-2148/9/197

ogenetic methodologies make it difficult to distinguish To address how many genes have cyanobacterial ancestry EGT and ancient LGT [22]. Thus, although available in plastid-lacking protists, and whether cyanobacterial eukaryotic genome data are increasingly accumulating, ancestry is limited to this gnd gene or also found in other gene and genome phylogenies should be carefully inter- genes, we conducted a phylogenomic analysis using preted to infer evolutionary scenarios. genome sequence data of a taxonomically wide range of plastid-lacking eukaryotes. Here we present a gene mining Chromalveolata is a large taxonomic group of eukaryotes, study with a novel pipeline automatically producing and encompassing secondary phototrophs and secondarily summarizing one-by-one phylogenetic trees, and show heterotrophic protists [10], and the 'chromalveolate phylogenetic analyses of resultant candidate genes with hypothesis' argues that this group originated from a com- cyanobacterial ancestry, using the whole genome mon ancestor harboring the chlorophyll c-containing sec- sequence data from a wide range of eukaryotic lineages. ondary plastid derived from a red alga (Figure 1) [23]. Among the secondarily heterotrophic chromalveolates, Results several lineages have retained remnant chloroplasts for To address how many genes are derived from cyanobacte- non-photosynthetic metabolic pathways, e.g. apicoplasts ria in non-photosynthetic protists, we conducted cyano- in apicomplexan parasites [24]. Recent genomic surveys bacterial gene mining using the genome sequence data of revealed the presence of plastid-derived genes, and further a wide range of the plastid-lacking eukaryotes (Additional suggested the presence of cryptic secondary plastids in file 1). Using the whole genome data, we conducted non-photosynthetic protists [25,26]. Further- BLAST searches against all 'Bacteria' and selected queries more, re-examination of the whole genome sequences showing the highest similarity to genes in the available suggested the existence of algal genes in ciliates, another cyanobacteria genome sequences. We then drew the plastid-lacking alveolate lineage, which could support the neighbor-joining (NJ) trees for genes showing homology photosynthetic ancestry of ciliates [27]. Oomycetes are to cyanobacterial counterparts. After the first tree con- plastid-lacking stramenopiles, or chromists, classified into struction step, we selected the gene trees where cyanobac- Chromalveolata [10]. Although whole genome sequence teria and eukaryotes formed a monophyletic group analysis showed that a number of genes with affinity to excluding other prokaryotes. As a result, we obtained a photosynthetic organisms (cyanobacteria and algae) are shorter list of candidates, which we termed 'genes with encoded in the nuclear genome, most of these 'plastid cyanobacterial affinity'. Subsequently we re-analyzed the imprints' candidates were only suggested by similarity eukaryotic genes with cyanobacterial affinity by visually search and phylogenetic analyses have not yet led to fully checking and re-drawing the Bayesian and maximum like- recovering the expected tree topology [28]. Considering lihood (ML) trees after manually trimming operational the uncertain phylogenetic affinity of the 'best hit' in sim- taxonomic units (OTUs). In total, we identified 12 plas- ilarity search [29], reassessment of the genome informa- tid-lacking genes 'with cyanobacterial ancestry' in tion is important to determine whether the evolutionary the genomes of the wide range of eukaryotes: two plastid- history of oomycetes is comparable to ciliates [27]. lacking bikonts (the oomycete P. ramorum and the heter- olobosean N. gruberi) and three unikonts (the One candidate of 'plastid imprints' in oomycetes has been D. discoideum, the budding yeast S. cerevisiae and the cho- confirmed by studies reporting the phylogeny of gnd anoflagellate M. brevicollis) (Table 1). These were the genes, which encode 6-phosphogluconate dehydroge- eukaryotic genes with cyanobacterial ancestry that shared nase, showing that some plastid-lacking protists have the same origin with Archaeplastida and other eukaryotes. -like, cyanobacterium-derived gnd genes [20,21,30]. They were placed within a monophyletic subclade mostly These analyses suggested that the gnd genes with cyano- composed of photosynthetic organisms (cyanobacteria bacterial ancestry were acquired early in eukaryotic evolu- and plants/algae) and showed an apparent cyanobacterial tion, either via ancient -to-eukaryote LGT, or ancestry as far as was determined by tree topology (Table primary EGT that occurred earlier than had ever been 1; Figures 2, 3, 4 and 5; and Additional files 2, 3, 4, 5, 6, thought [21]. Additionally, the phylogeny of gnd genes 7, 8 and 9). We found another type of gene with cyano- demonstrated that cyanobacterial genes are also present in bacterial ancestry, which were the protist genes forming several Excavata protists, e.g. the heterolobosean amoebo- monophyletic groups mostly with genes from extant flagellate Naegleria gruberi. Naegleria gruberi is a non-para- cyanobacteria (prokaryote-type genes with cyanobacterial sitic heterotrophic species related to N. fowleri, which is ancestry). Among a number of candidate genes found the causative agent of primary amoebic meningoencepha- through the first screening, we have presented three typi- litis in mammals [31]. Although the phylogenetic rela- cal trees that were resolved with significant support values tionship within Excavata is still unclear, Heterolobosea, (Additional files 10, 11 and 12). We postulate that these together with Jakobida, is likely to be a sister group of prokaryote-type genes are remnants of the bacterium-to- [18,32]. eukaryote LGT, which occurred 'recently' in evolution.

Page 3 of 15 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:197 http://www.biomedcentral.com/1471-2148/9/197

BI/ML 0.2 Ostreococcus tauri 17331 Ostreococcus lucimarinus CCE9901 XP_001417831 Micromonas sp. RCC299 72207 0.93 Micromonas pusilla 21172 1.00 70 75 Thalassiosira pseudonana Volvox carteri 104902 0.84 -- Phytophthora sojae 109158 Phytophthora ramorum 51635 1.00 Pythium oligandrum EV244605 98 0.63 Pythium ultimum EL778029 65 Monosiga brevicollis MX1 XP_001742170 0.70 Galdieria sulphuraria -- Cyanothece sp. CCY0110 ZP_01730125 0.85 -- Anabaena variabilis ATCC 29413 YP_323976 1.00 Acaryochloris marina MBIC11017 YP_001518945 63 Synechococcus elongatus PCC 6301 YP_171952 0.98 71 Gloeobacter violaceus PCC 7421 NP_924180 0.99 Arabidopsis thaliana NP_198901 96 Vitis vinifera CAO15100 Oryza sativa (japonica cultivar-group) NP_001043645 Zea mays NP_001105724 Ostreococcus lucimarinus 4668 Ostreococcus tauri 4145 Micromonas sp. RCC299 87830 0.96 Micromonas pusilla 22906 84 Chlamydomonas reinhardtii 191432 Volvox carteri 81921 1.00 88 Emiliania huxleyi 213726 Gluconobacter oxydans 621H YP_192457 0.99 86 Alteromonas macleodii 'Deep ecotype' ZP_01110047 0.99 Nitrosomonas eutropha C91 YP_747226 0.99 76 Cyanidioschyzon merolae CMV039C 73 Cyanidium caldarium NP_045061 1.00 .72 87 Acaryochloris marina MBIC11017 YP_001518662 -- Gloeobacter violaceus PCC 7421 NP_925526 1.00 83 Synechococcus sp. BL107 ZP_01469115 Methanoculleus marisnigri JR1 YP_001046899 Methanosarcina acetivorans C2A NP_617926 1.00 .82 72 Clostridium tetani E88 NP_781393 -- Arcobacter butzleri RM4018 YP_001489309 .77 -- Desulfitobacterium hafniense Y51 YP_518457 .97 -- Syntrophus aciditrophicus SB YP_460621 Physcomitrella patens subsp. patens XP_001787048 Deinococcus geothermalis DSM 11300 YP_605594 1.0 Rubrobacter xylanophilus DSM 9941 YP_644736 55 Carboxydothermus hydrogenoformans Z-2901 YP_359632 0.85 .98 Methanocaldococcus jannaschii DSM 2661 NP_247960 -- 53 Streptomyces avermitilis MA-4680 NP_827972

FigureUroporphyrin 2 III methyltransferase gene phylogeny showing the presence of genes with cyanobacterial ancestry in oomycetes Uroporphyrin III methyltransferase gene phylogeny showing the presence of genes with cyanobacterial ances- try in oomycetes. The MrBayes consensus tree with Bayesian posterior probabilities (BI) (70% or more) and maximum like- lihood (ML) bootstrap support values (50% or more) is shown. Thick branches represent BI and ML values not lower than 100 and 95, respectively. Different phylogenetic affiliations are represented as follows: green, green plants; magenta, red algae; blue- green, glaucophytes; orange, Chromalveolata; dark blue, Excavata; yellow, Rhizaria; gray, unikonts; sky blue, cyanobacteria. Stars indicate plastid-lacking eukaryotes.

Page 4 of 15 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:197 http://www.biomedcentral.com/1471-2148/9/197

BI/ML Chlamydomonas reinhardtii XP_001702934 0.2 0.98 Galdieria sulphuraria 72 Phytophthora sojae 108148 1.00 100 Phytophthora ramorum 72019 Phaeodactylum tricornutum 28056 1.00 Synechococcus sp. JA-3-3Ab YP_473659 89 Arthrospira maxima CS-328 ZP_03272025 0.92 59 Thermosynechococcus elongatus BP-1 NP_681881 Trichodesmium erythraeum IMS101 YP_720730 1.00 Cyanothece sp. ATCC 51142 YP_001806235 79 Lactococcus lactis subsp. lactis Il1403 NP_267411 Escherichia coli 536 YP_671884 Klebsiella pneumoniae 342 YP_002241112 0.94 -- Serratia proteamaculans 568 YP_001476480 0.84 Bordetella bronchiseptica RB50 NP_888622 -- Naegleria gruberi 59646 Cyanidioschyzon merolae CMJ234C Ralstonia eutropha JMP134 YP_298442 Methylococcus capsulatus str. Bath YP_114678 Neisseria meningitidis 053442 YP_001599025 0.88 Haemophilus somnus 2336 YP_001784615 51 Saccharophagus degradans 2-40 YP_526112 1.00 85 Acaryochloris marina MBIC11017 YP_001522348 Crocosphaera watsonii WH 8501 ZP_00518984 1.00 Parvibaculum lavamentivorans DS-1 YP_001411886 60 Neosartorya fischeri NRRL 181 XP_001267290 Aspergillus terreus NIH2624 XP_001214650 0.80 Coccidioides immitis RS XP_001244621 63 Sclerotinia sclerotiorum 1980 XP_001588472 1.00 90 Arabidopsis thaliana NP_187028 .99 Oryza sativa Japonica Group NP_001067314 65 Arabidopsis thaliana NP_197598 Physcomitrella patens subsp. patens XP_001758527 Nitratiruptor sp. SB155-2 YP_001356355 Caminibacter mediatlanticus TB-2 ZP_01871179 Sulfurihydrogenibium sp. YO3AOP1 YP_001931535 Bacillus halodurans C-125 NP_241304 Listeria monocytogenes str. 4b F2365 YP_014301 0.91 Geobacillus sp. Y412MC10 ZP_03037227 59

FigureCobalamin-independent 3 methionine synthase genes in oomycetes are monophyletic with algal and cyanobacterial homologs Cobalamin-independent methionine synthase genes in oomycetes are monophyletic with algal and cyanobac- terial homologs. See legend for figure 2.

Page 5 of 15 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:197 http://www.biomedcentral.com/1471-2148/9/197

0.91 67 Oryza sativa Japonica Group NP_001047064 BI/ML 1.00 73 Beta vulgaris BAE07183 0.5 1.00 Vitis vinifera CAN70523 87 0.89 Arabidopsis thaliana NP_175036 64 0.98 Volvox carteri 82548 74 Chlamydomonas reinhardtii XP_001689659 Solanum lycopersicum ABE77149 Thalassiosira pseudonana 14389 1.00 Phaeodactylum tricornutum 2217 95 Naegleria fowleri Naegleria gruberi 36109 Nostoc punctiforme PCC 73102 ZP_00106716 Gloeobacter violaceus PCC 7421 NP_925165 Morganella morganii BAE94286 Vibrio harveyi ATCC BAA-1116 YP_001445292 1.00 85 Acaryochloris marina MBIC11017 YP_001520311 Clostridium tetani E88 NP_782098 Clostridium botulinum A str. ATCC 3502 YP_001255424 .85 Lotus japonicus CAG30580 0.96 58 75 Oryza sativa Japonica Group NP_001051099 Nicotiana tabacum AAC39483 1.00 77 Arabidopsis thaliana NP_178310 0.92 Cyanidioschyzon merolae CMF072C -- Coccidioides immitis RS XP_001249050 0.88 Synechococcus sp. WH5701 ZP_01085038 78 0.98 Lactococcus lactis subsp. cremoris MG1363 YP_001032492 -- 1.00 0.80 87 Yarrowia lipolytica CLIB122 XP_505160 -- Candida albicans SC5314 XP_713020 1.00 92 Ustilago maydis 521 XP_762210 Ralstonia metallidurans CH34 YP_583914 Methanosarcina barkeri str. Fusaro YP_306227 0.70 Vitis vinifera CAO41624 -- Ostreococcus lucimarinus CCE9901 XP_001416416 .87 Monosiga brevicollis XP_001743625 51 Homo sapiens CAA09590 1.00 0.95 85 Ornithorhynchus anatinus XP_001508856 50 Caenorhabditis elegans NP_505372 0.76 .95 Strongylocentrotus purpuratus XP_790556 63 70 Monosiga brevicollis XP_001745715 Phytophthora ramorum 130206 Cryptococcus neoformans var. neoformans JEC21 XP_572250 0.89 Tetrahymena thermophila SB210 XP_001017229 56 0.84 Paramecium tetraurelia XP_001454206 52 0.71 -- Haloarcula marismortui ATCC 43049 YP_136395 Archaeoglobus fulgidus DSM 4304 NP_070828 Methanosphaera stadtmanae DSM 3091 YP_447382

FigureAmino acid 4 decarboxylase genes in Heterolobosea, within a subfamily with cyanobacterial ancestry Amino acid decarboxylase genes in Heterolobosea, within a subfamily with cyanobacterial ancestry. See legend for figure 2.

Page 6 of 15 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:197 http://www.biomedcentral.com/1471-2148/9/197

0.88 Arabidopsis thaliana NP_849444 (ACD1-like) BI/ML 64 Physcomitrella patens subsp. patens XP_001753171 0.5 0.82 Arabidopsis thaliana NP_190074 (ACD1) -- Physcomitrella patens subsp. patens XP_001769164 Anabaena variabilis ATCC 29413 YP_321822 Chlamydomonas reinhardtii XP_001697234 1.00 92 Ostreococcus lucimarinus CCE9901 XP_001420009 Chlamydomonas reinhardtii XP_001697980 .72 Anabaena variabilis ATCC 29413 YP_321575 0.84 61 Aureococcus anophagefferens 32727 -- 1.00 0.76 Nostoc punctiforme PCC 73102 ZP_00109510 76 -- 53 Cyanothece sp. CCY0110 ZP_01731008 .99 52 Acaryochloris marina MBIC11017 YP_001514559 67 Ostreococcus lucimarinus CCE9901 XP_001421803 0.91 Micromonas strain RCC299 56398 65 0.79 Emiliania huxleyi 42217 -- Volvox carteri 89016 1.00 Emiliania huxleyi 95263 80 Emiliania huxleyi 205252 Emiliania huxleyi 308518 1.00 Aureococcus anophagefferens 11520 78 Acaryochloris marina MBIC11017 YP_001514433 .96.93 Synechocystis sp. PCC 6803 NP_441106 68 53 Synechococcus sp. WH 8102 NP_896952 Chlamydomonas reinhardtii XP_001698326 .99 1.00 53 Ostreococcus lucimarinus CCE9901 XP_001421195 81 Arabidopsis thaliana NP_180055 (TIC55) Physcomitrella patens subsp. patens XP_001753962 .94 51 Nostoc punctiforme PCC 73102 ZP_00107775 Emiliania huxleyi 457299 0.99 Emiliania huxleyi 219969 69 Arabidopsis thaliana NP_973969 (CAO) 0.84 0.81 Physcomitrella patens subsp. patens XP_001780924 -- -- Chlamydomonas reinhardtii XP_001690175 Ostreococcus lucimarinus CCE9901 XP_001418699 0.84 Naegleria gruberi 53656 -- Naegleria gruberi 52597 Emiliania huxleyi 437979 1.00 Aureococcus anophagefferens 64390 97 Synechococcus sp. RCC307 YP_001228728 Prochlorococcus marinus str. MIT 9312 YP_397313 Streptomyces coelicolor A3(2) NP_630017 Myxococcus xanthus DK 1622 YP_629862 Synechococcus sp. JA-2-3B'a(2-13) YP_477536 .84 Gloeobacter violaceus PCC 7421 NP_922973 59 Nodularia spumigena CCY9414 ZP_01632124 1.00 Volvox carteri 88044 82 0.99 Xenopus laevis NP_001090436 1.00 78 Aedes aegypti XP_001655171 93 Paramecium tetraurelia strain d4-2 XP_001429174 0.98 Mycobacterium smegmatis str. MC2 155 YP_886596 78 Anabaena variabilis ATCC 29413 YP_323010 Nodularia spumigena CCY9414 ZP_01629472 Nostoc punctiforme PCC 73102 ZP_00108134 0.73 67 Burkholderia pseudomallei 1106a YP_001076766

FigureNaegleria 5 genes are a member of the multiple gene families of TIC55-like oxidoreductase genes Naegleria genes are a member of the multiple gene families of TIC55-like oxidoreductase genes. See legend for figure 2.

Page 7 of 15 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:197 http://www.biomedcentral.com/1471-2148/9/197

Table 1: Summary of eukaryote-type genes with cyanobacterial ancestry identified in this study

bikonts unikonts Chromalveolata Excavata Amoebozoa Opisthokonta Opisthokonta Gene family Pathway P. ramorum N. gruberi D. discoideum S. cerevisiae M. brevicollis

Uroporphyrin III methyltransferase porphyrin 51635 - - - XP_001742170 Cobalamin-independent methionine synthase methionine 72019 - - - - Amino acid decarboxylase amino acid - 36109 - - - TIC55-like oxidoreductase unknown - 52597 - - - Folate/biopterin transporter folate 72218 - - - - 6-phosphogluconate dehydrogenase pentose phosphate 71783 30694 - - - Cobalamin synthesis protein cobalamin 85610 38446 - - XP_001746731 Oligopeptidase unknown 54177 - - - - YCF45 unknown 83996 2396 - - - Glycerate kinase glyoxylate 94130 - - YGR205w - Amino acid aminotransferase amino acid - 2119 - - XP_001749475 Glyoxalase I family protein-like unknown - 29304 - - XP_001750995

JGI gene ID (P. ramorum and N. gruberi), gene name (S. cerevisiae) and accession numbers (M. brevicollis) are listed. Interestingly, while Phytophthora ycf21 homologs, proba- ancestry of the genes in the chromalveolate lineage. We bly transferred from a relative of the extant cyanobacterial failed to find the homologs in the prasinophytes Ostreo- species via LGT, were placed within the cyanobacterial coccus and Micromonas, suggesting that this gene family gene clade, the Tetrahymena ycf21 homolog showed was dispensable in some plant lineages. affinity to Archaeplastida (Additional file 11). This gene was neither found in the Paramecium genome nor in the One of the genes with cyanobacterial ancestry found in N. list of the recently identified algal genes in ciliates [27]. gruberi is pyridoxal-dependent amino acid decarboxylase gene (Figure 4). The tree indicated that green plants were We found that Uroporphyrin III methyltransferase gene split into different eukaryotic clades. Naegleria and chro- homologs (Figure 2) consisted of two large subfamilies of malveolate genes showed robust monophyly with green genes with cyanobacterial ancestry, and that oomycete plants, included in a cyanobacterial gene clade. The tree genes were included only in one of them. Given that both showed that land plants possessed another subfamily, subfamilies include green plants, red algae, chromalveo- associated with red algal and fungal genes, apparently of lates and cyanobacteria, it is likely that they diverged non-cyanobacterial origin. We also identified genes with within the ancestral cyanobacteria and transferred into cyanobacterial ancestry from Naegleria in an oxidoreduct- eukaryotic hosts via primary and secondary endosymbi- ase gene family that included genes encoding Rieske iron- oses. Both of the subfamilies were concurrently present in sulfur cluster 55 kDa protein of chloroplast inner mem- the cyanobacterial and green algal genomes. In land brane translocon (TIC55), chlorophyll a oxidase (CAO), plants, red algae, diatoms, and the plastid- Lethal-leaf spot 1 (LLS1, which is synonymous with phe- lacking oomycetes, one of the subfamilies might be lost ophorbide a oxygenase (PAO)) and accelerated cell death along with the loss of the plastid. The Thalassiosira 1 (ACD1) (Figure 5) [34,35]. All the members of this fam- homolog formed a monophyletic group with green ily in land plants were hypothesized to be located at the plants, rather than red algae, suggesting that it was inner membrane of the chloroplast, and to be involved in acquired independently of the secondary plastid of the red chlorophyll metabolism [34]. The phylogenetic tree of the lineage. In this study, the bacteriovorous TIC55-like gene family showed intricate distribution of Monosiga brevicollis gene and the proteobacterial genes cyanobacterial, green plant and chromalveolate genes. (Gluconobacter, Alteromonas and Nitrosomonas) were treated as 'apparently LGT-derived genes', incongruously In other trees of the genes identified in this study (Addi- showing affinities to photosynthetic bikonts [33]. tional files 2, 3, 4, 5, 6, 7, 8 and 9), gene clades with cyanobacterial ancestry were mostly composed of bikonts Genes encoding cobalamin-independent methionine syn- genes, besides the choanoflagellate M. brevicollis genes thase in green and red algae, diatoms, and oomycetes (see Discussion). formed a monophyletic group with cyanobacterial homologs, while the land plants and the red alga Cyanid- Discussion ioschyzon homologs were placed in different clades unre- We identified eight and seven genes with cyanobacterial lated to cyanobacteria (Figure 3). Close association ancestry in the genome sequences of the oomycete P. ram- between diatom and oomycete genes suggested the deep orum and the heterolobosean N. gruberi, respectively

Page 8 of 15 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:197 http://www.biomedcentral.com/1471-2148/9/197

(Table 1). It was reported that the apicomplexan Crypt- fore, it is unlikely that the genes with cyanobacterial osporidium 'recently' lost their secondary plastid, and ancestry found in the heterolobosean nuclear genomes retained two to seven putative plastid-derived genes in the originated from the plastid cognate with any known sec- genome [36]. This number is comparable to our result of ondary plastids in extant photosynthetic eukaryotes. The the gene mining study using oomycete and heterolo- amino acid decarboxylase gene (Figure 4) and the gnd bosean genomes. In addition, our system resolved the gene (Additional file 3) [21] trees demonstrated the pres- hidden diversity of the gene family repertoire in eukaryo- ence of genes with cyanobacterial ancestry in other heter- tic genomes by one-by-one gene phylogenies. olobosean species than N. gruberi, suggesting that the ancestor of the genus Naegleria possessed this gene family. Secondary EGT scenario Furthermore, although ML bootstrap support or Bayesian Although the phylogenetic positions of posterior probability (BI) values were not always suffi- and Haptophyta are still debatable [e.g. [17,37-41]], the cient, the Naegleria genes occupy relatively basal phyloge- chromalveolate hypothesis has been reinstated to support netic positions within the bikonts clade in all seven trees the evolutionary scenario that the plastid-lacking protists (Figures 4 and 5; Additional files 3, 4, 6, 8, and 9). Thus it oomycetes and ciliates once might have had a plastid [27]. is possible that the genes with cyanobacterial ancestry According to this hypothesis, the genes with cyanobacte- were introduced en bloc in the ancestor of Heterolobosea, rial ancestry found in the oomycete genomes were via a batch gene transfer, in a concerted manner. One pos- acquired via secondary EGT in the common ancestor of sible origin of such a concerted gene transfer is secondary Chromalveolata, from the red algal ancestor of secondary EGT from a photosynthetic eukaryote with a basal phylo- plastids. This explanation is also applicable under the genetic position within bikonts. However, as discussed alternative hypothesis for chromalveolate plastids, which above, it is unlikely that Heterolobosea experienced sec- proposes that a tertiary endosymbiont of the / ondary endosymbiosis and acquired genes common to cryptophyceae lineage is the origin of the stramenopile/ the extant secondary plastid-containing eukaryotes via alveolate plastids [22]. The phylogenetic tree of the pho- secondary EGT. torespiratory glycerate kinase genes, suggesting the red algal origin of the Phytophthora genes (Additional file 7), Ancient eukaryote-to-eukaryote LGT or primary EGT is consistent with the chromalveolate hypothesis. How- scenarios ever, several other gene trees in this study showed oomyc- Alternatively, we can argue for two other explanations: a ete genes with green lineage affinity, not red algae (e.g. concerted eukaryote-to-eukaryote LGT scenario or a more Additional files 2, 3 &4). Recently, Frommolt et al. [42] ancient primary EGT scenario. The Naegleria genes with demonstrated that, out of 16 genes involved in carotenoid cyanobacterial ancestry shown in Table 1 are basally posi- biosynthesis from chromalveolate algae, one third (5/16) tioned within bikonts, but not intruding into any of gene of plastid-targeted, nuclear-encoded genes are most clades from extant photosynthetic eukaryotes (Figures 4 closely related to green algal homologs. Reyes-Prieto, and 5; Additional files 3, 4, 6, 8, and 9). Thus, if we Moustafa and Bhattacharya [27] identified 16 genes of assume that these genes were acquired via non-endosym- possible algal origin in the ciliates Tetrahymena ther- biotic LGT, they may originate from unknown ancient mophila and Paramecium tetraurelia, and 7/16 of their trees photosynthetic lineages basally positioned within show a close relationship between green plants and Chro- bikonts. Meanwhile, under the primary EGT scenario, in malveolata. Frommolt et al. [42] attributed the close rela- which the primary endosymbiosis occurred in the com- tionships between green plants and chromalveolate genes mon ancestor of bikonts (Figure 1) [[19-21], but see Ref. to the secondary endosymbiosis of an ancient green plant [9] for further discussion on the root of eukaryotic tree of (e.g. prasinophyte), based on the hypothesis on the life], ancient primary EGT occurred much earlier than the monophyly of the Archaeplastida [16,40]. This explana- conventional hypothesis, from the cyanobacterium-like tion might be also applicable to the plant-like genes in cil- prokaryote to the common ancestor of bikonts. Primary iates [27]. plastids were subsequently lost in many lineages of bikonts, except for the Archaeplastida lineages, but some While Heterolobosea and Euglenozoa are often united as genes originating from the cyanobacterial ancestor of the the morphologically defined taxon, Discicristata, within primary plastids have been retained in the nuclear Excavata [10], recent morphological and molecular phyl- genomes of the plastid-lacking lineages of bikonts (Fig- ogenetic analyses suggest that the heteroloboseans (e.g. ures 1 and 6). The loss of the plastid might have triggered Naegleria) never possessed the secondary plastid of green the loss of genes that specifically functioned within the lineage and share the same origin with Euglenida [43]. plastid. Only a portion of the plastid-derived genes, which Molecular phylogenetic analyses showed that Excavata is we can find now in the plastid-lacking protist genomes, separated from other secondary plastid-containing might have escaped from or survived through eliminative eukaryotes (Chromalveolata and Rhizaria) [18,40]. There- pressure in a lineage-specific manner, by acquiring addi-

Page 9 of 15 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:197 http://www.biomedcentral.com/1471-2148/9/197

Endosymbiosis Retargeting Duplication Plastid loss Plastid

EGT ? ? Mt

Nucleus

gene protein expression/transport

FigureAn evolutionary 6 history of the genes with cyanobacterial ancestry An evolutionary history of the genes with cyanobacterial ancestry. Thick continuous arrows represent gene flow via EGT. Thin broken arrows indicate gene expression or intracellular transport into organelles. Dashed line circles and boxes indicate that they have been lost in the evolutionary history. Note that the genes with cyanobacterial ancestry (white), which had been derived from the plastid genome via EGT, were retargeted into the plastid. After rounds of gene family duplication, some genes (magenta) gained additional functions in other cellular compartments (cytosol, mitochondrion, etc.). In some plas- tid-lacking protists, a number of genes were retained in the nuclear genomes after the plastid loss events. Mt, mitochondrion. tional functions with other components and/or in other deum (Amoebozoa), and only one gene in S. cerevisiae cellular compartments. This might account for the (Opisthokonta). These results are also consistent with the observed punctate distribution of gene families among ancient primary EGT scenario. the eukaryotes [44,45]. A photorespiratory gene with cyanobacterial ancestry in Recently, a hypothesis for the non-monophyly of Archae- fungi plastida was proposed based on the phylogenetic analyses Our analysis using the genome data of the budding yeast of slowly evolving nuclear-encoded genes [17,19]. This S. cerevisiae identified one gene with cyanobacterial ances- non-monophyly hypothesis could be also considered try, encoding the glycerate kinase for photorespiration within the scope of the primary EGT scenario. It is notable (Additional file 7). Given that photorespiration is essen- that a number of the trees in this study (Figure 2; Addi- tial for cyanobacteria and plants, it is likely that the glyc- tional files 2, 3, 4, 6, and 8) showed intriguing topologies, erate kinases in plants and cyanobacteria are depicting the split of Archaeplastida and inclusion of phylogenetically and physiologically related to photores- Chromalveolata and Excavata genes within it, as shown in piration [47,48]. A previous study on glycerate kinases the previously reported multiple slowly-evolving gene showed that, regardless of the complete absence of pho- phylogeny [19] and gnd gene phylogeny [20,21]. These torespiratory metabolism in fungi, the gene product from results are consistent with the hypothesis for the non- the budding yeast Saccharomyces showed similar enzy- monophyly of Archaeplastida, and suggest that the matic activity and substrate specificity compared with the oomycete and heterolobosean genes with cyanobacterial Arabidopsis gene, suggesting that the plant and fungal ancestry might reflect the host nuclear genome phylogeny. genes catalyze the same reaction in different contexts of On the other hand, the genes found in the marine cho- the metabolic pathway [47]. Another example of plant- anoflagellate M. brevicollis were positioned within the type genes in fungi was reported in a phylogenetic study bikonts clade, but not associated with the genes from of the genes encoding high-affinity nitrate transporter other Opisthokonta relatives (Metazoa and fungi), sug- NRT2, which suggested that fungi probably acquired the gesting that the tree topologies were probably not reflec- NRT2 genes via LGT from one of the chromalveolate line- tive of the host phylogeny [46] but eukaryote-to- ages [49]. Meanwhile, our data showed that the fungal eukaryote LGT (Figure 2; Additional files 4, 8, and 9). No clade was located outside the clade of plants plus oomyc- gene with cyanobacterial ancestry was found in D. discoi- etes (Additional file 7), suggesting that fungal glycerate

Page 10 of 15 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:197 http://www.biomedcentral.com/1471-2148/9/197

kinase genes with cyanobacterial ancestry likely origi- otide substitutions (see Methods). At the stage of starting nated from an LGT event from an ancestor of cyanobacte- the present gene mining study, N. gruberi was the only ria, or eukaryote-to-eukaryote LGT from an ancestor of species with whole genome data released within the non- Archaeplastida (or bikonts). One likely explanation for parasitic excavates, and thereby the excavate genes with the presence of photorespiratory genes in oomycetes is cyanobacterial ancestry were mostly from N. gruberi. More that the ancestor of Chromalveolata possessed this gene genome data from plastid-lacking protists from Excavata family, but some photosynthetic descendants lost this and Rhizaria as well as Archaeplastida, especially red algae gene family or replaced it with other genes during the and glaucophytes, are needed to unravel the evolutionary course of lineage-dependent customization of photorespi- history of plastids, and plastid-lacking protists. ratory pathways [[50,51]; for discussion on carbon assim- ilation in diatoms], while oomycetes retained the genes Conclusion without any replacement. The comparative analyses of the genome sequence data of the plastid-lacking eukaryotes demonstrated the poten- Gene family expansion and differential reduction tially significant contributions of ancestral or extant Another conclusion of this analysis is that rounds of gene cyanobacteria to the eukaryotic genomes, which probably family expansion and selective reduction are important occurred via LGT or ancient primary EGT events. Further- factors in making eukaryotic genome phylogeny look like more, the automated phylogenetic analyses revealed the a complicated mosaic (Figure 6). It is likely that the alter- diversity and punctate distribution of gene families within ation of gene family repertoire contributed to the restruc- the genomes in the unicellular microbes. More genome turing of the intracellular metabolome and a reduction of data of the plastid-lacking Excavata and Rhizaria will the dispensable gene families. Our data showed that all make the evolutionary history clear and support our the genes identified in this study were members of multi- hypotheses. ple gene families. Algae and plastid-lacking protists retained only members of subfamilies (e.g. Figure 2 and Methods Additional file 8), suggesting that the punctate distribu- Data preparation tion might be a corollary of the common mechanism by The genome sequence data of P. ramorum, N. gruberi and which genes with cyanobacterial ancestry were retained in M. brevicollis was produced by the US Department of their genomes. The presence of genes from multiple sub- Energy Joint Genome Institute (JGI) [54]. D. discoideum families in one organism supports this idea (e.g. two Uro- genome data (9 Nov 2007) at dictyBase [55] and S. cerevi- porphyrin III methyltransferase subfamilies in siae genome data [56] were used for phylogenetic analysis. prasinophytes and Volvox in Figure 2). Discontinuous loss Red algal data were retrieved from the Cyanidioschyzon or gain of a metabolic pathway in a lineage might be merolae [57], Galdieria sulphuraria [58] genome databases, another factor in punctate distribution; e.g. the oxidative and other algal data were from Aureococcus anophageffer- pentose pathway, and the cyanobacterial gnd genes func- ens, Emiliania huxleyi, Micromonas pusilla, Micromonas sp. tioning therein, were present in most bikonts but lost in RCC299, Ostreococcus tauri, Ostreococcus sp. RCC809, the ciliate Tetrahymena [21,52]. A recent study on pyri- Phaeodactylum tricornutum, Phytophthora sojae, Thalassiosira doxal-dependent amino acid aminotransferase reported pseudonana and Volvox carteri genome databases on JGI. that, besides the ancestrally eukaryotic enzymes, land EST sequences of several protists were obtained from plants possess a distinct subfamily of prokaryote-type TBestDB [59] and all other sequences were from the NCBI chloroplast-targeted enzymes [53]. Our data with richer GenBank refseq database [60]. We excluded amitochon- taxon sampling identified another prokaryote-type sub- drial and/or parasitic eukaryotes, which might cause long family with cyanobacterial ancestry (Additional file 8), branch attraction due to unusual nucleotide substitutions illustrating the hidden evolutionary diversity of protist [61,62]. Fragments of N. fowleri amino acid decarboxylase and algal metabolomes. gene [DDBJ: AB491948] were amplified from genomic DNA using degenerated primers based on the conserved Future prospects amino acid motif YHHFGYP for the forward primer (TAY- Our results showed that many genes with cyanobacterial CAYCAYTTIGGITAYCC) and WQLACEG for the reverse ancestry identified in this study were found only in com- primer (CCYTCRCAIGCIARYTGCCA). PCR products were plete genome sequences, suggesting that these genes directly sequenced using an ABI PRISM 3100 Genetic Ana- might be difficult to discover by expressed sequence tag lyzer (Applied Biosystems, Foster City, CA, USA) with a (EST) library sequencing, probably due to the low-level BigDye Terminator Cycle Sequencing Ready Reaction kit expression of these genes. Although the whole genome v. 3.1 (Applied Biosystems). data from excavate parasites (e.g. , and ) are available, they seem to be unsuited for the gene mining study because of the unusual nucle-

Page 11 of 15 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:197 http://www.biomedcentral.com/1471-2148/9/197

Phylogenomic analysis frequencies reached less than 0.01. Except for cyanobacte- A genome-wide phylogenetic program was made with sev- rial genes of which no homologs were found in other eral bench-made BioRuby scripts (Additional file 1), refer- prokaryotes (e.g. Additional file 2), or of which mono- ring to the previously reported phylogenomic pipeline phyly was confirmed by previous studies (e.g. Additional used in the macronuclear genome analysis of Tetrahymena file 3), threshold values to assess the monophyly of cyano- thermophila [63]. For the first screening, query amino acid bacterial gene clades were 50% on ML bootstrap or 0.9 on sequences were automatically subjected to BLAST search- BI posterior probability values. ing using NCBI netblast [64] and EFetch utilities [65], extracting the genes showing the highest E-value to a Abbreviations cyanobacterial counterpart among 'Bacteria' by BLASTP. BI: Bayesian posterior probability; EGT: endosymbiotic For the second step, these genes were subjected to BLASTP gene transfer; EST: expressed sequence tag; LGT: lateral analysis against 'refseq-protein' to fetch homologous gene transfer; ML: maximum likelihood; NJ: neighbor- sequences with E-values less than 0.001, up to 500 hits at joining; OUT: Operational Taxonomic Unit; TIC55: a maximum. Multiple alignments were then performed Rieske iron-sulfur cluster 55 kDa protein of chloroplast using MUSCLE [66], which automatically removed inner membrane translocon. ambiguously aligned sites or sequences with too many gaps. Bootstrapped neighbor-joining trees were produced Authors' contributions using QuickTree [67]. Trees were output in the PostScript SM and HN conceived the study. SM, MM and KM pre- format using the newicktops program in the NJplot pack- pared and analyzed the data. SM and HN drafted the man- age [68] with sizes and colors of OTU names modified uscript. All authors read and approved the final according to the NCBI database [69] to sim- manuscript. plify the subsequent visual checking process. Genomes of several bacterial genera were intensively sequenced and Additional material many homologous sequences from closely related species and strains (e.g. Escherichia, Bacillus) appeared on the trees. To diminish the sampling bias, the output files of Additional file 1 Supplemental Figure 7. Flow chart of procedures used in the phylogenetic QuickTree were also used to parse tree topology and analyses. detect a monophyletic clade exclusively composed of Click here for file OTUs from a single genus using Bio::Tree class methods in [http://www.biomedcentral.com/content/supplementary/1471- BioRuby scripts. One representative OTU was automati- 2148-9-197-S1.pdf] cally selected in such single-genus clades, the other OTUs were removed, and the trees were re-constructed for visual Additional file 2 checking. In addition to the automatic process, trees for Supplemental Figure 8. MrBayes consensus tree of folate/biopterin trans- genes listed in the putative photosynthetic endosymbi- porter genes. Click here for file ont-derived genes [28], but not detected in our analysis, [http://www.biomedcentral.com/content/supplementary/1471- were manually re-constructed. Non-cyanobacterial 2148-9-197-S2.pdf] prokaryotic genes taxonomically unrelated to, but placed within, the cyanobacterial clade were interpreted as Additional file 3 'apparently LGT-derived genes' with cyanobacterial ances- Supplemental Figure 9. MrBayes consensus tree of 6-phosphogluconate try. dehydrogenase genes. Click here for file [http://www.biomedcentral.com/content/supplementary/1471- Candidate cyanobacteria-related genes were manually 2148-9-197-S3.pdf] selected, their homologs were collected from major groups of the three domains of life, and then subjected to Additional file 4 multiple protein sequence alignments using MUSCLE. Supplemental Figure 10. MrBayes consensus tree of cobalamin synthesis Phylogenetic analyses were performed with a maximum protein genes. likelihood (ML) method using RAxML [70] and with a Click here for file Bayesian interference (BI) method using MrBayes [71]. [http://www.biomedcentral.com/content/supplementary/1471- 2148-9-197-S4.pdf] ML and BI were based on the WAG substitution matrix with options of four gamma-distributed rate categories Additional file 5 and estimate of invariable sites (plus empirical base fre- Supplemental Figure 11. MrBayes consensus tree of oligopeptidase genes. quencies in ML). ML branch support was evaluated with Click here for file 1000 bootstrap replicates, and BI posterior probability [http://www.biomedcentral.com/content/supplementary/1471- values were calculated from the MCMC run data, which 2148-9-197-S5.pdf] summarized when the average standard deviation of split

Page 12 of 15 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:197 http://www.biomedcentral.com/1471-2148/9/197

References Additional file 6 1. Timmis JN, Ayliffe MA, Huang CY, Martin W: Endosymbiotic gene Supplemental Figure 12. MrBayes consensus tree of YCF45 genes. transfer: organelle genomes forge eukaryotic chromosomes. Nat Rev Genet 2004, 5:123-135. Click here for file 2. Delwiche CF: Tracing the thread of plastid diversity through [http://www.biomedcentral.com/content/supplementary/1471- the tapestry of life. Am Nat 1999, 154:164-177. 2148-9-197-S6.pdf] 3. Bhattacharya D, Yoon HS, Hackett JD: Photosynthetic eukaryo- tes unite: endosymbiosis connects the dots. Bioessays 2004, Additional file 7 26:50-60. 4. Baldauf SL: The deep roots of eukaryotes. Science 2003, Supplemental Figure 13. MrBayes consensus tree of glycerate kinase 300:1703-1706. genes. 5. Stechmann A, Cavalier-Smith T: Rooting the eukaryote tree by Click here for file using a derived gene fusion. Science 2002, 297:89-91. [http://www.biomedcentral.com/content/supplementary/1471- 6. Stechmann A, Cavalier-Smith T: The root of the eukaryote tree 2148-9-197-S7.pdf] pinpointed. Curr Biol 2003, 13:665-666. 7. Nozaki H, Matsuzaki M, Misumi O, Kuroiwa H, Higashiyama T, Kuroiwa T: Phylogenetic implications of the CAD complex Additional file 8 from the primitive red alga Cyanidioschyzon merolae (Cya- Supplemental Figure 14. MrBayes consensus tree of amino acid ami- nidiales, Rhodophyta). J Phycol 2005, 41:652-657. notransferase genes. 8. Richards TA, Cavalier-Smith T: Myosin evolution and the primary divergence of eukaryotes. Nature 2005, Click here for file 436:1113-1118. [http://www.biomedcentral.com/content/supplementary/1471- 9. Roger AJ, Simpson AGB: Revisiting the root of the eukaryote 2148-9-197-S8.pdf] tree. Curr Biol 2009, 19:165-167. 10. Adl SM, Simpson AG, Farmer MA, Andersen RA, Anderson OR, Barta JR, Bowser SS, Brugerolle G, Fensome RA, Fredericq S, James TY, Additional file 9 Karpov S, Kugrens P, Krug J, Lane CE, Lewis LA, Lodge J, Lynn DH, Supplemental Figure 15. ML consensus tree of glyoxalase I family pro- Mann DG, McCourt RM, Mendoza L, Moestrup O, Mozley-Standridge tein-like genes. SE, Nerad TA, Shearer CA, Smirnov AV, Spiegel FW, Taylor MF: The Click here for file new higher level classification of eukaryotes with emphasis [http://www.biomedcentral.com/content/supplementary/1471- on the taxonomy of protists. J Eukaryot Microbiol 2005, 52:399-451. 2148-9-197-S9.pdf] 11. Matsuzaki M, Misumi O, Shin-I T, Maruyama S, Takahara M, Miyag- ishima SY, Mori T, Nishida K, Yagisawa F, Nishida K, Yoshida Y, Additional file 10 Nishimura Y, Nakao S, Kobayashi T, Momoyama Y, Higashiyama T, Supplemental Figure 16. MrBayes consensus tree of phosphoadenosine Minoda A, Sano M, Nomoto H, Oishi K, Hayashi H, Ohta F, Nishizaka S, Haga S, Miura S, Morishita T, Kabeya Y, Terasawa K, Suzuki Y, Ishii phosphosulfate reductase genes. Y, et al.: Genome sequence of the ultrasmall unicellular red Click here for file alga Cyanidioschyzon merolae 10D. Nature 2004, 428:653-657. [http://www.biomedcentral.com/content/supplementary/1471- 12. Reyes-Prieto A, Bhattacharya D: Phylogeny of calvin cycle 2148-9-197-S10.pdf] enzymes supports plantae monophyly. Mol Phylogenet Evol 2007, 45:384-391. 13. Tyra HM, Linka M, Weber AP, Bhattacharya D: Host origin of plas- Additional file 11 tid solute transporters in the first photosynthetic eukaryo- Supplemental Figure 17. MrBayes consensus tree of YCF21 genes. tes. Genome Biol 2007, 8:R212. Click here for file 14. Larkum AW, Lockhart PJ, Howe CJ: Shopping for plastids. Trends [http://www.biomedcentral.com/content/supplementary/1471- Plant Sci 2007, 12:189-195. 2148-9-197-S11.pdf] 15. Stiller JW: Plastid endosymbiosis, genome evolution and the origin of green plants. Trends Plant Sci 2007, 12:391-396. 16. Rodríguez-Ezpeleta N, Brinkmann H, Burey SC, Roure B, Burger G, Additional file 12 Löffelhardt W, Bohnert HJ, Philippe H, Lang BF: Monophyly of pri- Supplemental Figure 18. MrBayes consensus tree of hypothetical protein mary photosynthetic eukaryotes: green plants, red algae, and glaucophytes. Curr Biol 2005, 15:1325-1330. genes. 17. Nozaki H, Iseki M, Hasegawa M, Misawa K, Nakada T, Sasaki N, Click here for file Watanabe M: Phylogeny of primary photosynthetic eukaryo- [http://www.biomedcentral.com/content/supplementary/1471- tes as deduced from slowly evolving nuclear genes. Mol Biol 2148-9-197-S12.pdf] Evol 2007, 24:1592-1595. 18. Hampl V, Hug L, Leigh JW, Dacks JB, Lang BF, Simpson AGB, Roger AJ: Phylogenomic analyses support the monophyly of Exca- vata and resolve relationships among eukaryotic "super- groups". Proc Nat Acad Sci USA 2009, 106:3859-3864. Acknowledgements 19. Nozaki H: A new scenario of plastid evolution: plastid primary endosymbiosis before the divergence of the "Plantae," We thank Dr. Kenji Yagita and Dr. Takuro Endo for kindly providing the emended. J Plant Res 2005, 118:247-255. genomic DNA from N. fowleri strain Nf 66. Computation time was provided 20. Andersson JO, Roger AJ: A cyanobacterial gene in nonphoto- by the Super Computer System, Human Genome Center, Institute of Med- synthetic protists – an early chloroplast acquisition in ical Science, University of Tokyo. This work was supported by Grants-in- eukaryotes? Curr Biol 2002, 12:115-119. 21. Maruyama S, Misawa K, Iseki M, Watanabe M, Nozaki H: Origins of Aid for Research Fellowships for Young Scientists (No.20-9894 to SM) a cyanobacterial 6-phosphogluconate dehydrogenase in plas- from the Japan Society for the Promotion of Science; Creative Scientific tid-lacking eukaryotes. BMC Evol Biol 2008, 8:151. Research (No. 16GS0304 to HN) and Scientific Research (No. 20247032 22. Sanchez-Puerta MV, Delwiche CF: A hypothesis for plastid evolu- to HN) from The Ministry of Education, Culture, Sports, Science, and Tech- tion in chromalveolates. J Phycol 2008, 44:1097-1107. nology, Japan. 23. Cavalier-Smith T: Principles of protein and lipid targeting in secondary symbiogenesis: euglenoid, , and spo- rozoan plastid origins and the eukaryote family tree. J Eukaryot Microbiol 1999, 46:347-366.

Page 13 of 15 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:197 http://www.biomedcentral.com/1471-2148/9/197

24. Foth BJ, McFadden GI: The apicoplast: a plastid in Plasmodium ments of translation elongation factor 1alpha. Proc Natl Acad falciparum and other apicomplexan parasites. Int Rev Cytol Sci USA 2004, 101:15380-15385. 2003, 224:57-110. 45. Rogers MB, Watkins RF, Harper JT, Durnford DG, Gray MW, Keeling 25. Matsuzaki M, Kuroiwa H, Kuroiwa T, Kita K, Nozaki H: A cryptic PJ: A complex and punctate distribution of three eukaryotic algal group unveiled: a plastid biosynthesis pathway in the genes derived by lateral gene transfer. BMC Evol Biol 2007, 7:89. oyster parasite Perkinsus marinus. Mol Biol Evol 2008, 46. Shalchian-Tabrizi K, Minge MA, Espelund M, Orr R, Ruden T, Jakobsen 25:1167-1179. KS, Cavalier-Smith T: Multigene phylogeny of choanozoa and 26. Slamovits CH, Keeling PJ: Plastid-derived genes in the non-pho- the origin of . PLoS ONE 2008, 3:e2098. tosynthetic alveolate Oxyrrhis marina. Mol Biol Evol 2008, 47. Boldt R, Edner C, Kolukisaoglu U, Hagemann M, Weckwerth W, 25:1297-1306. Wienkoop S, Morgenthal K, Bauwe H: D-GLYCERATE 3- 27. Reyes-Prieto A, Moustafa A, Bhattacharya D: Multiple genes of KINASE, the last unknown enzyme in the photorespiratory apparent algal origin suggest ciliates may once have been cycle in Arabidopsis, belongs to a novel kinase family. Plant photosynthetic. Curr Biol 2008, 18:956-962. Cell 2005, 17:2413-2420. 28. Tyler BM, Tripathy S, Zhang X, Dehal P, Jiang RH, Aerts A, Arre- 48. Eisenhut M, Ruth W, Haimovich M, Bauwe H, Kaplan A, Hagemann M: dondo FD, Baxter L, Bensasson D, Beynon JL, Chapman J, Damasceno The photorespiratory glycolate metabolism is essential for CM, Dorrance AE, Dou D, Dickerman AW, Dubchak IL, Garbelotto cyanobacteria and might have been conveyed endosymbion- M, Gijzen M, Gordon SG, Govers F, Grunwald NJ, Huang W, Ivors tically to plants. Proc Natl Acad Sci USA 2008, 105:17199-17204. KL, Jones RW, Kamoun S, Krampis K, Lamour KH, Lee MK, McDon- 49. Slot JC, Hallstrom KN, Matheny PB, Hibbett DS: Diversification of ald WH, Medina M, et al.: Phytophthora genome sequences NRT2 and the origin of its fungal homolog. Mol Biol Evol 2007, uncover evolutionary origins and mechanisms of pathogene- 24:1731-43. sis. Science 2006, 313:1261-1266. 50. Wilhelm C, Büchel C, Fisahn J, Goss R, Jakob T, Laroche J, Lavaud J, 29. Koski LB, Golding GB: The closest BLAST hit is often not the Lohr M, Riebesell U, Stehfest K, Valentin K, Kroth PG: The regula- nearest neighbor. J Mol Evol 2001, 52:540-542. tion of carbon and nutrient assimilation in diatoms is signifi- 30. Nozaki H, Matsuzaki M, Misumi O, Kuroiwa H, Hasegawa M, Higash- cantly different from green algae. Protist 2006, 157:91-124. iyama T, Shin-I T, Kohara Y, Ogasawara N, Kuroiwa T: Cyanobac- 51. Roberts K, Granum E, Leegood RC, Raven JA: C3 and C4 pathways terial genes transmitted to the nucleus before divergence of of photosynthetic carbon assimilation in marine diatoms are red algae in the . J Mol Evol 2004, 59:103-113. under genetic, not environmental, control. Plant Physiol 2007, 31. Schuster FL, Visvesvara GS: Free-living amoebae as opportunis- 145:230-235. tic and non-opportunistic pathogens of humans and animals. 52. Eldan MB, Blum J: Presence of Nonoxidative Enzymes of the Int J Parasitol 2004, 34:1001-1027. Pentose Phosphate Shunt in Tetrahymena. J Eurkaryot Micro- 32. Simpson AG, Perley TA, Lara E: Lateral transfer of the gene for biol 1975, 22:145-149. a widely used marker, alpha-tubulin, indicated by a multi- 53. de la Torre F, De Santis L, Suárez MF, Crespillo R, Cánovas FM: Iden- protein study of the phylogenetic position of tification and functional analysis of a prokaryotic-type aspar- (Excavata). Mol Phylogenet Evol 2008, 47:366-377. tate aminotransferase: implications for plant amino acid 33. Archibald JM, Rogers MB, Toop M, Ishida K, Keeling PJ: Lateral gene metabolism. Plant J 2006, 46:414-425. transfer and the evolution of plastid-targeted proteins in the 54. DOE Joint Genome Institute [http://www.jgi.doe.gov/] secondary plastid-containing alga Bigelowiella natans. Proc 55. DictyBase: An Online Informatics Resource for Dictyostelium Natl Acad Sci USA 2003, 100:7678-7683. [http://dictybase.org/] 34. Gray J, Wardzala E, Yang M, Reinbothe S, Haller S, Pauli F: A small 56. MIPS Comprehensive Yeast Genome Database [http:// family of LLS1-related non-heme oxygenases in plants with mips.gsf.de/genre/proj/yeast/] an origin amongst oxygenic photosynthesizers. Plant Mol Biol 57. Cyanidioschyzon merolae Genome Database [http://mero 2004, 54:39-54. lae.biol.s.u-tokyo.ac.jp/] 35. Gross J, Bhattacharya D: Revaluating the evolution of the Toc 58. Galdieria sulphuraria Genome Database [http://genom and Tic protein translocons. Trends Plant Sci 2009, 14:13-20. ics.msu.edu/galdieria/] 36. Huang J, Mullapudi N, Lancto CA, Scott M, Abrahamsen MS, Kissinger 59. Taxonomically broad EST database TBestDB [http:// JC: Phylogenomic evidence supports past endosymbiosis, tbestdb.bcm.umontreal.ca/] intracellular and horizontal gene transfer in 60. National Center for Biotechnology Information (NCBI) parvum. Genome Biol 2004, 5:R88. [http://www.ncbi.nlm.nih.gov/] 37. Moreira D, Heyden S von der, Bass D, López-García P, Chao E, Cav- 61. Stiller JW, Duffield EC, Hall BD: Amitochondriate amoebae and alier-Smith T: Global eukaryote phylogeny: combined small- the evolution of DNA-dependent RNA polymerase II. Proc and large-subunit ribosomal DNA trees support monophyly Natl Acad Sci USA 1998, 95:11769-11774. of Rhizaria, and Excavata. Mol Phylogenet Evol 2007, 62. Stiller JW, Riley J, Hall BD: Are red algae plants? A critical eval- 44:255-266. uation of three key molecular data sets. J Mol Evol 2001, 38. Hackett JD, Yoon HS, Li S, Reyes-Prieto A, Rümmele SE, Bhattacharya 52:527-539. D: Phylogenomic analysis supports the monophyly of crypto- 63. Eisen JA, Coyne RS, Wu M, Wu D, Thiagarajan M, Wortman JR, phytes and haptophytes and the association of rhizaria with Badger JH, Ren Q, Amedeo P, Jones KM, Tallon LJ, Delcher AL, Salz- chromalveolates. Mol Biol Evol 2007, 24:1702-1713. berg SL, Silva JC, Haas BJ, Majoros WH, Farzad M, Carlton JM, Smith 39. Patron NJ, Inagaki Y, Keeling PJ: Multiple gene phylogenies sup- RK, Garg J, Pearlman RE, Karrer KM, Sun L, Manning G, Elde NC, port the monophyly of and haptophyte host Turkewitz AP, Asai DJ, Wilkes DE, Wang Y, Cai H, et al.: Macronu- lineages. Curr Biol 2007, 17:887-891. clear genome sequence of the ciliate Tetrahymena ther- 40. Burki F, Shalchian-Tabrizi K, Pawlowski J: Phylogenomics reveals mophila, a model eukaryote. PLoS Biol 2006, 4:e286. a new 'megagroup' including most photosynthetic eukaryo- 64. NCBI netblast [http://www.ncbi.nlm.nih.gov/BLAST/down tes. Biol Lett 2008, 4:366-369. load.shtml] 41. Minge M, Silberman JD, Orr R, Cavalier-Smith T, Shalchian-Tabrizi K, 65. NCBI EFetch utilities [http://eutils.ncbi.nlm.nih.gov/entrez/eutils/ Burki F, Skjæveland Å, Jakobsen KS: Evolutionary position of bre- efetch.fcgi?] viate amoebae and the primary eukaryote divergence. Proc 66. Edgar RC: MUSCLE: multiple sequence alignment with high Biol Sci 2009, 276:597-604. accuracy and high throughput. Nucleic Acids Res 2004, 42. Frommolt R, Werner S, Paulsen H, Goss R, Wilhelm C, Zauner S, 32:1792-1797. Maier UG, Grossman AR, Bhattacharya D, Lohr M: Ancient recruit- 67. Howe K, Bateman A, Durbin R: QuickTree: building huge Neigh- ment by chromists of green algal genes encoding enzymes bour-Joining trees of protein sequences. Bioinformatics 2002, for carotenoid biosynthesis. Mol Biol Evol 2008, 25:2653-2667. 18:1546-1547. 43. Leander BS: Did trypanosomatid parasites have photosyn- 68. Perrière G, Gouy M: WWW-query: an on-line retrieval system thetic ancestors? Trends Microbiol 2004, 12:251-258. for biological sequence banks. Biochimie 1996, 78:364-369. 44. Keeling PJ, Inagaki Y: A class of eukaryotic GTPase with a punc- 69. NCBI taxonomy database [http://www.ncbi.nlm.nih.gov/Taxon tate distribution suggesting multiple functional replace- omy]

Page 14 of 15 (page number not for citation purposes) BMC Evolutionary Biology 2009, 9:197 http://www.biomedcentral.com/1471-2148/9/197

70. Stamatakis A, Hoover P, Rougemont J: A rapid bootstrap algo- rithm for the RAxML Web servers. Syst Biol 2008, 57:758-771. 71. Huelsenbeck JP, Ronquist F: MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 2001, 17:754-755.

Publish with BioMed Central and every scientist can read your work free of charge "BioMed Central will be the most significant development for disseminating the results of biomedical research in our lifetime." Sir Paul Nurse, Cancer Research UK Your research papers will be: available free of charge to the entire biomedical community peer reviewed and published immediately upon acceptance cited in PubMed and archived on PubMed Central yours — you keep the copyright

Submit your manuscript here: BioMedcentral http://www.biomedcentral.com/info/publishing_adv.asp

Page 15 of 15 (page number not for citation purposes)