
ARTICLE IN PRESS

Protist, Vol. 160, 191–204, May 2009 http://www.elsevier.de/protis Published online date 9 February 2009

ORIGINAL PAPER

Seven Gene Phylogeny of Heterokonts

Ingvild Riisberg(a,d,1), Russell J.S. Orr(b,d,1), Ragnhild Kluge(b,c,2), Kamran Shalchian-Tabrizi(d), Holly A. Bowers(e), Vishwanath Patil(b,c), Bente Edvardsen(a,d), and Kjetill S. Jakobsen(b,d,3)

(a) Marine Biology, Department of Biology, University of Oslo, P.O. Box 1066, Blindern, NO-0316 Oslo, Norway
(b) Centre for Ecological and Evolutionary Synthesis (CEES), Department of Biology, University of Oslo, P.O. Box 1066, Blindern, NO-0316 Oslo, Norway
(c) Department of and Environmental Sciences, P.O. Box 5003, The Norwegian University of Sciences, N-1432, Ås, Norway
(d) Microbial Evolution Research Group (MERG), Department of Biology, University of Oslo, P.O. Box 1066, Blindern, NO-0316, Oslo, Norway
(e) Center of Marine Biotechnology, 701 East Pratt Street, Baltimore, MD 21202, USA

Submitted May 23, 2008; Accepted November 15, 2008 Monitoring Editor: Mitchell L. Sogin

Nucleotide ssu and lsu rDNA sequences of all major lineages of autotrophic (Ochrophyta) and heterotrophic (Bigyra and Pseudofungi) heterokonts were combined with amino acid sequences from four protein-coding genes (actin, β-tubulin, cox1 and hsp90) in a multigene approach for resolving the relationship between lineages. Applying these multigene data in Bayesian and maximum likelihood analyses improved the heterokont tree compared to previous rDNA analyses by placing all plastid-lacking heterotrophic heterokonts sister to Ochrophyta with robust support, and divided the heterotrophic heterokonts into the previously recognized phyla, Bigyra and Pseudofungi. Our trees identified the heterotrophic heterokonts Bicoecida, Blastocystis and Labyrinthulida (Bigyra) as the earliest diverging lineages. A separate analysis of the phototrophic lineages, with the rbcL gene added, further resolved the Ochrophyta lineages by increasing support for several important nodes. Except for the positioning of Chrysophyceae, Eustigmatophyceae, Raphidophyceae and Pinguiophyceae, all main branches of Ochrophyta were resolved. Our results support the transfer of classes Pelagophyceae and Dictyochophyceae from subphylum Phaeista to Khakista. Based on all our trees, in combination with current knowledge about the ultrastructure of heterokonts, we suggest that a more advanced flagellar apparatus originated on one occasion in the ancestor of Phaeista, whereas Khakista independently reduced their flagellar apparatus and gained chlorophyll c3. © 2008 Elsevier GmbH. All rights reserved.

Key words: heterokont phylogeny; covarion; Bayesian; maximum likelihood; multigene; .

1 These authors contributed equally to this work.
2 Present address: Norwegian Pollution Control Authority, Box 8100 Dep, 0032 Oslo, Norway.
3 Corresponding author; fax +47 22854001; e-mail [email protected] (K.S. Jakobsen).

doi:10.1016/j.protis.2008.11.004

Introduction

Heterokonta was established as a phylum by Cavalier-Smith (1986), comprising all eukaryotic motile biflagellate cells having typically a forward-directed flagellum with tripartite rigid tubular flagellar hairs (mastigonemes) and a trailing hairless (smooth) flagellum, plus all their descendants having secondarily lost one or both flagella. The diversity among heterokonts is striking: they range from large multicellular to tiny unicellular species, they are present in freshwater, marine and terrestrial habitats, and they embrace many ecologically important algal (e.g. diatoms, brown algae, chrysophytes) and heterotrophic (e.g. oomycetes) groups. Due to the diversity in Heterokonta, it was later raised to infrakingdom (Cavalier-Smith 1997) with two main groups: Ochrophyta (Cavalier-Smith 1986), consisting mainly of autotrophic heterokonts, and a purely heterotrophic group, which was further subdivided into two phyla, Bigyra and Pseudofungi (Cavalier-Smith and Chao 2006).

Due to their diversity and ecological significance, a number of molecular phylogenetic studies have been performed on separate heterokont groups, predominantly single-gene inferences (Andersen et al. 1999; Guillou et al. 1999; Moriya et al. 2000; Negrisolo et al. 2004; Potter et al. 1997). Investigations on the global phylogeny of heterokonts are limited to relatively few studies of nuclear-encoded ribosomal RNAs (rDNA) or plastid-encoded rbcL (ribulose-1,5-bisphosphate carboxylase/oxygenase) (Ben Ali et al. 2001; Daugbjerg and Guillou 2001; Edvardsen et al. 2007). Recently, the most species-rich phylogeny of all three heterokont phyla (Ochrophyta, Bigyra, Pseudofungi), employing a comprehensive ssu rDNA dataset, was performed (Cavalier-Smith and Chao 2006). The ssu rDNA study clarified some sister taxa relationships, and displayed the heterotrophic groups as deep-branching heterokonts (Cavalier-Smith and Chao 2006). The phylogenetic inferences of heterokonts have been congruent in placing the plastid-free heterotrophic species basal to the plastid-containing forms (Cavalier-Smith and Chao 2006), but the statistical support for this phylogeny has been only moderate. According to the chromalveolate hypothesis, heterokonts, haptophytes, cryptophytes and alveolates acquired their red-algal derived plastids in a single secondary endosymbiotic event (Cavalier-Smith 1999). However, recurrent plastid losses and gains have been suggested for several chromalveolate groups (Bachvaroff et al. 2006; Bodyl 2005; Sanchez-Puerta and Delwiche 2008; Shalchian-Tabrizi et al. 2006b). A robust phylogeny for Ochrophyta and the entirely heterotrophic heterokont groups may improve our understanding of the type of processes forming the current plastid distribution.

Multigene approaches for resolving phylogenetic relationships using a moderate number of protein-encoding genes have during the last few years been successfully applied to several groups (Fast et al. 2002; Kim et al. 2006; Nosenko and Bhattacharya 2007; Shalchian-Tabrizi et al. 2006a; Simpson et al. 2006). For heterokonts, two-gene analyses (lsu+ssu rDNA) have been performed, but with the main emphasis on Ochrophyta (Ben Ali et al. 2002; Edvardsen et al. 2007). In the present study, we have performed a two-gene lsu+ssu rDNA analysis of all three heterokont phyla (Ochrophyta, Pseudofungi and Bigyra). In order to further enhance the phylogenetic resolution we have combined the rDNA data with amino acids from five protein-encoding genes (actin, beta-tubulin [β-tubulin], cytochrome oxidase subunit I (cox1), heat-shock protein 90 (hsp90) and rbcL). Our largest concatenated alignment consisted of 5643 characters (3886 nucleotides and 1757 amino acids), thereby representing the most gene- and character-rich heterokont alignment published to date.

Results

Thirty-six new heterokont nucleotide sequences including lsu (rRNA), ssu (rRNA), actin, β-tubulin, cox1 and hsp90 were PCR-amplified for the purpose of this study and have been made publicly available with accession numbers from FJ030880 to FJ030915. In addition, we have extracted all available accessions from GenBank for the various heterokont groups including the above-mentioned genes plus rbcL (Table 1).

Global Phylogeny of Ochrophyta and Heterotrophic Heterokonts

Our dataset included 35 taxa representing ten heterokont classes (Bacillariophyceae, Chrysophyceae, Dictyochophyceae, Eustigmatophyceae, Pelagophyceae, Phaeophyceae, Phaeothamniophyceae, Pinguiophyceae, Raphidophyceae, Xanthophyceae) as well as the parasitic taxa Blastocystis hominis and Developayella elegans. Thus, all major lineages of Ochrophyta, Bigyra and Pseudofungi (heterotrophic heterokonts) except Bolidophyceae, Chrysomerophyceae, Placididea and Opalinidea (excluded due to lack of available sequence data or lack of available cultures or DNA) were represented. For all our analyses the Bayesian covarion sequence evolution model (Huelsenbeck 2002) fitted the data better than competing models, yielding the highest marginal likelihoods. Our phylogenetic analysis based on lsu+ssu rDNA sequence data


Table 1. Taxa, genes and accession numbers of sequences used in the phylogenetic inferences (Figs 1–3)

Taxon lsu ssu actin β-tubulin cox1 hsp90 rbcL Missing %

Aconoidasida AF0134191* AF0134181 ABO696292 XP_0013473692 AAC633902 BM1624043 Heterotroph 0.4
    CX0201683
anophagereffens AF289042 U40257 EH117431 AAP42295 - EH117437 AAD39107 14.1
Bacillariophycidae EF5534584 EF1406224 AAV328315 AAW580834 BAA866106 AAX109454 YP_8744184 0.2
    CT8998704
Blastocystis hominis CT899870 AB023499 EC649064 EC650343 - - Heterotroph 23.2
    EC649592 EC650685 EC651237 EC649837 EC649960 EC645290
Caecitellus parvulus FJ030883 AY642126 ABC85753 - FJ030895 - Heterotroph 17.4
sp. FJ0308917 AB2176267 - - AAC949767 - AAC029208 16.9
Chlorarachniophyceae AF2890369 AF0548329 AAK923639 AAC685069 BAA9635510 DR0415479 Heterotroph 0.4
    DR0402089 DR0388999
Cryptomonas sp. AY75299011 AF50827112 AAG3147413 ABR2255111 BAA9635813 - CAJ1966013 11.2
Developayella elegans FJ030882 U37107 - - - - Heterotroph 25.3
Dictyocha speculum AF289046 U14385 - - AM850252 - AAK85735 19.5
AF33115514 AY30739815 - P3015616 AAC9498017 - AAD2259518 11.2
Eimeriorina AF04425219 AF00924419 AAC1376620 XP_62780321 ABL9629022 XP_62692421 Heterotroph 4.8
AF33115123 AB01142324 Q3975823 DV67006325 YP_44862323 ACF0618426 CAC6751827 9.7
Glossomastix chrysoplasta AF409128 AF438324 FJ030905 - FJ030899 FJ030915 AAM45877 9.8
Guillardia theta AF289037 X57162 AAG31473 AAD02570 - AAX10949 NP_050705 6.3
Gymnodiniaceae AY83141228 AY83141229 ABI1438729 ABV2220029 ABK5796530 CAJ7575129 Excluded 0.2
AF409124 DQ191681 AAW56948 AAW58080 - AAX10940 BAD151144 6.3
catenoides X80345 AF163294 - - - - Heterotroph 25.1
Isochrysis galbana DQ202389 AJ246266 AAW56949 AAW58081 BAA24905 AAX10942 BAB20783 6.5
    EC143993 EC146828 EC144527 EC142332 EC137274
Japonochytrium sp. FJ03088731 AB02210431 ABC8574332 ABC6149332 - - Heterotroph 33.6
AF33115333 AB02281834 ABB0244535 CN46811233 NP_65927433 CN46820433 AAR2445033 10.0
sp. FJ03088536 FJ03089337 AAW5695036 - - FJ03091336 ABN4695638 15.8
    EF16514737
Mischococcales FJ03088139 FJ03089239 - - BAA2496740 FJ03090739 CAE4570039 11.8
gaditana FJ030880 AF067957 FJ030901 - - FJ030909 BAC10559 13.7
Nannochloropsis salina Y07975 AF045048 - - - - BAC10567 23.1
Nerada mexicana FJ030884 AY520453 FJ030902 - - FJ030912 Heterotroph 15.7
Ochromonadaceae Y0797741 EF16513642 AAR2754943 AAW5808743 NP_06645041 AAR2754043 ABN4693444 2.3
Pavlova sp. DQ20238845 AF10237346 EC17666247 AAW5808247 AAD1206847 AAX1094447 BAA0828148 12.9
    EC17494845 EC17713147 EC17448445 EC17428845 EC17717945
AF28904549 U1438750 - - FJ03089650 FJ03091050 AAC0281750 13.6
Pelagococcus subviridis FJ030889 U14386 - - - - AAC02919 23.4
calceolata AF289047 U14389 ABX25970 ABX25960 - - AAC02816 14.3
Peridiniales AY11274651 AF07705551 AAO4934052 AAO4934352 AAM9769451 AAX1094152 Excluded 6.6
FJ03089053 AF04484654 - - - - AAC6246254 34.4
Phytophthora sp. X7563155 AY74276156 P2213156 ABV0008157 AAA3202355 AAX1094658 Heterotroph 0.4
Pinguiococcus pyrenoidosa AF409130 AF438325 FJ030906 - FJ030900 FJ030908 AAM45878 9.9
Prymnesiaceae AF28903859 AB18361260 AAW5695461 ABJ8099762 - AAX1095161 BAB2078861 6.8
Pseudochattonella farcimen FJ030886 AM075624 FJ030903 FJ030894 FJ030897 - AM850235 27.9
sp. AY59864763 EF41892564 AAW5695565 AAY4045766 AAM9873867 AAX1094865 Heterotroph 0.3
    EL77723063 EL77716463 EL774542
Rhizosolenia setigera AF289048 M87329 - - BAA86611 - AAC02907 16.9
Saprolegniaceae AF23594368 M3270569 P2618269 P2080270 YP_05291571 AAM9067472 Heterotroph 20.2
    AF32058768
Synura sp. DQ98047573 U7322173 ABJ8092573 ABJ8100473 - FJ03091174 ABN4695275 9.5
Y1151276 AJ63220976 ABC5473877 AAF8190678 YP_31658679 FC50395079 YP_87449879 0.4
    FC49718279 FC53609579
Thaumatomonas sp. DQ98047780 AF41125981 ABJ8092780 ABJ8100780 - AAP7216280 Heterotroph 7.5
Thraustochytrium sp. FJ03088882 AB02211082 FJ03090482 ABC6149283 FJ03089882 FJ03091482 Heterotroph 18.8
Tribonema sp. Y0797984 AF08339785 - - BAA2497584 - AAD0284384 16.9
Vacuolaria virescens AF409125 U41651 - - - - AAC02921 24.9
Character end position 2473 3886 4134 4521 4873 5185 5643
Alignment accession ALIGN_001291 ALIGN_001292 ALIGN_001293 ALIGN_001294 ALIGN_001295 ALIGN_001296 ALIGN_001297

Calculated percentage of total missing characters for each taxon in the multigene alignment is presented. Note that percentages of missing characters for species not included in the phylogenetic inference represented in Fig. 3 are calculated without rbcL (labelled "Heterotroph"). The dinoflagellate (Dinophyta) taxa Gymnodiniaceae and Peridiniales were "Excluded" from the rbcL inference, as this gene is highly derived. All seven single-gene alignments have been made available under the provided accession numbers from the EMBL-Align database: ftp://ftp.ebi.ac.uk/pub/databases/embl/align/

*Composite (in silico chimeric) sequences constructed from closely related species indicated as follows: 1Theileria parva, 2Plasmodium falciparum, 3Plasmodium sp., 4Phaeodactylum tricornutum, 5Nitzschia thermalis, 6Nitzschia frustulum, 7Chattonella antiqua, 8Chattonella subsalsa, 9Bigelowiella natans, 10Chlorarachnion reptans, 11Cryptomonas paramecium, 12Cryptomonas platyuris, 13Cryptomonas ovata, 14Streblonema maculans, 15Ectocarpus siliculosus, 16Ectocarpus variabilis, 17Ectocarpus sp., 18Myriotrichia clavaeformis, 19Frenkelia microti, 20Toxoplasma gondii, 21Cryptosporidium parvum, 22Eimeria acervulina, 23Fucus vesiculosus, 24Fucus distichus, 25Sargassum binderi, 26Fucus serratus, 27Sargassum muticum, 28Akashiwo sanguinea, 29Karlodinium micrum, 30Akashiwo sp., 31Japonochytrium sp., 32Japonochytrium marinum, 33Laminaria digitata, 34Laminaria angustata, 35Saccharina japonica, 36Mallomonas tonsurata, 37Mallomonas sp., 38Mallomonas rasilis, 39Chlorellidium tetrabotrys, 40Botrydiopsis alpina, 41Ochromonas danica, 42Ochromonas distigma, 43Spumella uniguttata, 44Ochromonas sp., 45Pavlova sp., 46Pavlova pinguis, 47Pavlova lutheri, 48Pavlova salina, 49Apedinella radians, 50Pseudopedinella elastica, 51Pfiesteria piscicida, 52Heterocapsa triquetra, 53Phaeobotrys solitaria, 54Phaeothamnion confervicola, 55Phytophthora megasperma, 56Phytophthora infestans, 57Phytophthora glovera, 58Phytophthora palmivora, 59Prymnesium patelliferum, 60Prymnesium sp., 61Prymnesium parvum, 62Chrysochromulina sp., 63Pythium sp., 64Pythium helicoids, 65Pythium graminicola, 66Pythium macrosporum, 67Pythium aphanidermatum, 68Achlya sp., 69Achlya bisexualis, 70Achlya klebsiana, 71Saprolegnia ferax, 72Achlya ambisexualis, 73Synura sphagnicola, 74Synura curtispina, 75Synura petersenii, 76Skeletonema pseudocostatum, 77Skeletonema costatum, 78Thalassiosira weissflogii, 79Thalassiosira pseudonana, 80Thaumatomonas sp., 81Thaumatomonas seravini, 82Thraustochytrium aureum, 83Thraustochytrium kinnei, 84Tribonema aequale, 85Tribonema intermixtum.
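The "Character end position" row of Table 1 gives cumulative coordinates in the concatenated alignment, so per-gene lengths follow by subtraction. The sketch below recovers them and also estimates how much of a taxon's row is missing when whole genes are absent. The function names are hypothetical, and the estimate is only a lower bound on the "Missing %" column, because partial sequences and alignment gaps are not modelled; it is not the authors' calculation.

```python
# Cumulative "Character end position" values from Table 1, in gene order.
GENES = ["lsu", "ssu", "actin", "b-tubulin", "cox1", "hsp90", "rbcL"]
END_POSITIONS = [2473, 3886, 4134, 4521, 4873, 5185, 5643]


def gene_lengths(end_positions):
    """Convert cumulative end positions into per-gene character counts."""
    starts = [0] + end_positions[:-1]
    return [end - start for start, end in zip(starts, end_positions)]


LENGTHS = dict(zip(GENES, gene_lengths(END_POSITIONS)))


def whole_gene_missing_pct(absent_genes, genes_in_alignment):
    """Percent of characters missing because entire genes are absent.

    Lower bound only: partial sequences and alignment gaps are ignored.
    """
    total = sum(LENGTHS[g] for g in genes_in_alignment)
    missing = sum(LENGTHS[g] for g in absent_genes)
    return 100.0 * missing / total


# Heterotrophs are scored without rbcL (six-gene alignment, 5185 characters).
SIX_GENES = GENES[:-1]
PROTEIN_GENES = ["actin", "b-tubulin", "cox1", "hsp90"]

print(LENGTHS)
# A taxon with only lsu and ssu data, scored on the six-gene alignment:
print(round(whole_gene_missing_pct(PROTEIN_GENES, SIX_GENES), 1))
```

The four protein genes sum to 1299 characters and rbcL contributes 458, matching the counts quoted in the text. For a heterotroph with rDNA data only, the whole-gene estimate is about 25.1%, in line with the 25.1 and 25.3 listed for the two such rows in Table 1.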

(3886 nucleotides, Fig. 1) supported Ochrophyta as a monophyletic clade with high support (Bayesian posterior probability of 1.00, maximum likelihood bootstrap value of 99 (Treefinder) and 100 (RAxML), hereafter written 1.00/99/100). The lsu+ssu rDNA tree (Fig. 1) included two weakly supported heterotrophic branches, Bigyra (0.64/-/-) and Pseudofungi (0.82/-/-). Within Ochrophyta, all classes were highly supported (1.00/>90/>90). In addition, the phylogenetic relationship of the classes Pelagophyceae and Dictyochophyceae as sister taxa was well resolved (1.00/93/90), as was that of Phaeophyceae as sister taxa to Xanthophyceae (1.00/100/99).

By adding protein sequences from four genes (actin, β-tubulin, cox1 and hsp90; in total 1299 amino acid characters) to the lsu+ssu rDNA dataset, we generated the most gene-rich heterokont alignment to date. No taxa had more than 35% missing character data and the supermatrix proportion of missing data was <15% (Table 1).
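As a reminder of what these support values measure, a bootstrap value is the percentage of pseudoreplicate trees that recover a given clade, and a posterior probability is the analogous fraction of trees sampled after burn-in. The sketch below illustrates that counting on toy data. The tree representation (each tree as a list of clades) and the function name are assumptions made for illustration; this is not how MrBayes, Treefinder or RAxML are implemented.

```python
from collections import Counter


def clade_support(trees):
    """Return {clade: percent of trees containing it}.

    Each tree is an iterable of clades; each clade is a collection of
    taxon names (a simplified, hypothetical representation).
    """
    if not trees:
        return {}
    counts = Counter()
    for tree in trees:
        # The set guards against counting a clade twice within one tree.
        counts.update({frozenset(clade) for clade in tree})
    return {clade: 100.0 * n / len(trees) for clade, n in counts.items()}


# Toy input: four replicate trees over made-up taxa A-D.
replicates = [
    [("A", "B"), ("A", "B", "C")],
    [("A", "B"), ("A", "B", "D")],
    [("B", "A"), ("A", "B", "C")],
    [("A", "C"), ("A", "B", "C")],
]
support = clade_support(replicates)
print(support[frozenset({"A", "B"})])       # 75.0
print(support[frozenset({"A", "B", "C"})])  # 75.0
```

Dividing the same counts by the number of trees without the factor of 100 gives values on the 0 to 1 scale used for posterior probabilities.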


Figure 1. rDNA phylogeny of heterokonts. Combined lsu and ssu phylogeny of Ochrophyta, Bigyra and Pseudofungi (3886 characters). The tree is reconstructed with Bayesian inference (MrBayes). Numbers on the internal nodes represent posterior probability and bootstrap values (>50%) for Treefinder and RAxML (ordered: MrBayes/Treefinder/RAxML). Black circles indicate support values of 1.00 posterior probability and bootstrap over 90%. See Table 1 for inferred composite (in silico chimeric) sequences.
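The node labels follow a fixed MrBayes/Treefinder/RAxML order, so they can be read mechanically. The sketch below parses such a label and applies the black-circle rule from the caption. The names `Support`, `parse_support` and `is_black_circle` are hypothetical, and reading "bootstrap over 90%" as applying to both bootstrap values is an assumption.

```python
from typing import NamedTuple, Optional


class Support(NamedTuple):
    posterior: Optional[float]  # MrBayes posterior probability
    treefinder: Optional[int]   # Treefinder bootstrap (%)
    raxml: Optional[int]        # RAxML bootstrap (%)


# Dash variants that stand for "no value shown" in a label.
MISSING = {"-", "\u2013", "\u2014", ""}


def parse_support(label: str) -> Support:
    """Parse a label such as '1.00/99/100' or '0.64/-/-'."""
    pp, tf, rx = (part.strip() for part in label.split("/"))
    return Support(
        None if pp in MISSING else float(pp),
        None if tf in MISSING else int(tf),
        None if rx in MISSING else int(rx),
    )


def is_black_circle(s: Support) -> bool:
    """Assumed rule: posterior of 1.00 and both bootstraps over 90%."""
    return (
        s.posterior == 1.0
        and s.treefinder is not None and s.treefinder > 90
        and s.raxml is not None and s.raxml > 90
    )


for label in ("1.00/99/100", "1.00/93/90", "0.64/-/-"):
    print(label, is_black_circle(parse_support(label)))
```

Under this reading, 1.00/99/100 qualifies for a black circle, whereas 1.00/93/90 does not because its RAxML value is exactly 90 rather than above it.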

The combined rDNA and protein tree (Supplementary Fig. S1) showed a similar topology to the lsu+ssu rDNA tree (Fig. 1), suggesting that the protein and rDNA data contain congruent phylogenetic signals. However, some branching patterns remained elusive, and the heterotrophic taxon Developayella elegans moved from a basal position in the Pseudofungi to a position basal to Ochrophyta (0.98/63/68). Additionally, Bigyra split to form two clades consisting of Labyrinthulida and Blastocystis/Bicosoecida (0.96/-/51). To investigate the effect of the fastest evolving heterotrophic taxa on the topology, PAML was used to remove fast-evolving classes of sites from the alignment (site class 8) (Yang 1997, 2007; Kumar et al. unpublished). The resulting tree (Fig. 2) again supports Ochrophyta as a monophyletic group, congruent with Figures 1 and S1 (support values 1.00/100/99). The groupings of Pelagophyceae with Dictyochophyceae (1.00/91/87) and

Figure 2. Protein and rDNA phylogeny of heterokonts. Combined lsu, ssu, actin, β-tubulin, cox1 and hsp90 phylogeny of Ochrophyta, Bigyra and Pseudofungi. PAML fast-evolving site category 8 for both rDNA and proteins has been removed (Yang 1997, 2007; Kumar et al. unpublished) (4632 characters). For further information, see legend for Figure 1.
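Removing a site category amounts to dropping every alignment column assigned to that category. Compared with the 5185-character six-gene alignment in Table 1, the 4632 characters reported here would correspond to 553 removed columns. The sketch below shows the filtering step on made-up data. It assumes each column has already been assigned an integer rate category (for example from PAML output) and does not call PAML itself; the function and variable names are hypothetical.

```python
def drop_site_category(alignment, site_categories, category_to_drop):
    """Remove every column whose rate category equals category_to_drop.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    site_categories: one integer rate category per alignment column.
    """
    n_sites = len(site_categories)
    if any(len(seq) != n_sites for seq in alignment.values()):
        raise ValueError("every sequence needs one category per column")
    keep = [i for i, cat in enumerate(site_categories) if cat != category_to_drop]
    return {
        taxon: "".join(seq[i] for i in keep)
        for taxon, seq in alignment.items()
    }


# Toy data (made up): six columns, two of them assigned to category 8.
toy_alignment = {"taxon1": "ACGTAC", "taxon2": "ACGAAC", "taxon3": "TCGTAG"}
toy_categories = [8, 1, 2, 8, 1, 3]
filtered = drop_site_category(toy_alignment, toy_categories, 8)
print(filtered)  # {'taxon1': 'CGAC', 'taxon2': 'CGAC', 'taxon3': 'CGAG'}
```

Filtering by column index keeps all taxa aligned with one another after the fast-evolving sites are removed.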


Phaeophyceae as sister taxa to Xanthophyceae formed a separate analysis of these lineages only were still highly supported (1.00/100/99). Further, (in total 30 taxa, 5643 characters). The hetero- Bacillariophyceae clustered at a deep Ochrophyta trophic lineages (Bigyra and Pseudofungi) not node, and the placement of Raphidophyceae was possessing the autotrophic rbcL gene were consistent with Figure 1. The combined rDNA and excluded and the phylogeny was inferred both protein tree did not resolve the position of with and without an outgroup (Figs 3 and S3). The Pinguiophyceae, Chrysophyceae and Bacillario- resulting tree with outgroup (Fig. 3) showed many phyceae; all positions received weak support. In similarities with Figures 1 and 2, but a notable contrast to the tree including fast evolving sites change was the placement of Pelagophyceae/ (Fig. S1) and in congruence with Figure 1, Dictyochophyceae together with Bacillariophy- Developayella elegans clustered monophyletically ceae (Khakista excluded from Phaeista with with the Pseudofungi to form a moderately 1.00/60/—). The exclusion of Khakista from supported clade (0.99/73/68). The branching Phaeista was further highlighted when the rela- pattern for Bigyra remained unresolved. As tionship was inferred without an outgroup (1.00/ branching pattern for the heterotrophic taxa 80/58) (Fig. S3). The branching pattern within remained an issue, the approximately unbiased Phaestia was unresolved with Eustigmatophy- (AU) test (Shimodaira 2002) was applied to ceae, Raphidophyceae, Chrysophyceae and Pin- compare several user-defined trees as well as guiophyceae receiving no topological support. the optimal topologies obtained from the Bayesian The single gene tree for rbcL did not reveal any analyses. The AU test rejected a single plastid loss supported topological conflicts with the concate- (heterotrophic monophyly) hypothesis (p- nated trees (see Methods and Fig. S10). value ¼ 0.00). 
However, it could not reject a heterotrophic heterokont evolution resulting in 2, 3 or 4 separate clades. See Supplementary data; Discussion AU tests. Multigene Phylogeny and Covarion Protein Tree is Congruent with the rDNA and Analyses Strongly Divide Ochrophyta and Combined rDNA and Protein Trees Heterotrophs In previous studies, usually only a few of the A separate four-gene protein analysis (actin, affiliated classes have clustered together with b-tubulin, cox1, hsp90) of taxa having less than statistical support, and many of the deeply 50% missing character data was performed (25 diverging nodes have remained unresolved (Ben taxa, 1299 characters, Fig. S2). As this dataset Ali et al. 2001, 2002; Daugbjerg and Andersen contained fewer taxa than Figures 1 and 2, the 1997; Edvardsen et al. 2007). Here we present a tree does not reflect the relationship between all considerable number of sequences from all the classes, but shows an overall similarity by dividing major classes of the heterokonts, and increase the the Ochrophyta and the heterotrophic groups support for class-level phylogenies. Compared to (1.00/99/92). The deviations from the other topol- previous combined lsu+ssu rDNA analyses (Ben ogies in Figures 1 and 2 were not significantly Ali et al. 2002; Edvardsen et al. 2007)we supported. Thus, it seems that there are no increased the number of taxa, and instead of significant conflicts in the phylogenetic signal in using only partial sequences including the D1 and the rDNA and protein datasets. Additionally, the D2 domains (Edvardsen et al. 2007)we single protein gene trees for actin, b-tubulin, cox1 sequenced and compared the full length of the and hsp90 did not reveal any supported topolo- lsu sequences (Fig. 1). gical conflicts with the protein concatenated trees In all our analyses no taxa had more than 35% (see Methods and Figs S4—S9). missing character data in the inferred alignments (for further details, see Table 1). 
According to Addition of rbcL Sequence Data for Wiens, taxa that are missing up to 50% character Inference of Ochrophyta Phylogeny data can subdivide long branches and improve accuracy as well as those with no missing To further improve the resolution of Ochrophyta, character data (Wiens 2005, 2006). In addition, we added the rbcL gene to the protein and rDNA the resulting supermatrices proportion of missing sequences (458 amino acids, in total the align- data never exceeded 15% (14% missing data in ment constituted 1757 amino acids) and per- Fig. 3), a figure significantly lower than that ARTICLE IN PRESS

198 I. Riisberg et al.

Figure 3. Protein and rDNA phylogeny of photosynthetic heterokonts. Combined lsu, ssu, actin, b-tubulin, cox1, hsp90 and rbcL phylogeny of Ochrophyta (5643 characters). For further information, see legend for Figure 1. scrutinized by others (e.g., 25%: Philippe et al. heterokont trees (Figs 1 and 2) the split between 2004; 75%: Wiens and Reeder 1995; 95%: heterotrophic heterokonts (Bigyra and Pseudo- McMahon and Sanderson 2006). For this reason, fungi) and Ochrophyta was strongly supported the Bayesian topology was preferred since this which shows that the resolution and support for has been documented to be less prone to missing our preferred topology is greater than in previous characters than both likelihood and parsimony published analyses (Ben Ali et al. 2002; Cavalier- methods (Wiens 2005, 2006; Wiens and Moen Smith and Chao 2006; Edvardsen et al. 2007). 2008). Additionally, the covarion model received The heterokont trees embracing the three phyla significantly higher marginal likelihood values than Ochrophyta, Bigyra and Pseudofungi (Figs 1 and competing models. S1) indicated that the heterotrophic heterokonts When comparing the amounts of rDNA data (Bigyra and pseudofungi) have higher evolutionary versus protein data in our alignments, rDNA rates (inferred from branch length) than Ochro- constituted the majority (Table 1). Still, the phyta. This evolutionary rate was considerably separate protein tree (with taxa having less than reduced with the exclusion of the fastest-evolving 50% missing character data) was congruent with nucleotide and amino acid sites from the align- the rDNA tree (Figs 1 and S2). In our global ment (Fig. 2). Among all tested models of ARTICLE IN PRESS

Heterokont Phylogeny 199 sequence evolution the covarion model (Huelsen- clades may belong to a single class as earlier beck 2002) fitted the data with the highest log suggested (Goertzen and Theriot 2003; Negrisolo likelihood. The covarion model is most appro- et al. 2004). However, more data from deeply priate for this dataset due to changes in site-rate diverging xanthophytes are needed to test this pattern, in which sites switch between invariable scenario. and variable states across the heterokont tree — The position of Pinguiophyceae, Chrysophy- similarly to the uneven rates observed among the ceae, Eustigmatophyceae and Raphidophyceae lineages of other protist nuclear and plastid within Phaeista varied in almost all trees, and genome sequences (Shalchian-Tabrizi et al. these seem to be the clades with weakest 2006 a,b,c). Application of the covarion model on affiliation among all ochrophytes. the multigene data resulted in a tree with better support for the basal branches of heterokonts than earlier shown in studies of global heterokont Proposed Relationships between phylogeny (Ben Ali et al. 2002; Cavalier-Smith and Heterotrophic Heterokonts (Bigyra and Chao 2006; Edvardsen et al. 2007). Pseudofungi) In conclusion, some nodes in Ochrophyta were well resolved (Figs 1 and 2), but despite the The basal heterotrophic branch (Figs 1 and 2) amount of data (up to 5185 characters), concate- consisting of Bigyra (Bicoecida, Blastocystis and nating six genes in a multigene analysis, we were Labyrinthulida) is consistent with previous studies not able to resolve the phylogeny of heterotrophic (Cavalier-Smith and Chao 2006) but only weakly heterokonts. Our four gene protein tree (Fig. S2) as supported. The branch consisting of oomycetes well as the combined rDNA+protein tree (Fig. 2) and Hyphochytrium catenoides (forming Pseudo- were in congruence with the rDNA tree (Fig. 1). 
In fungi) was strongly supported, and the mono- all our global heterokont trees, Ochrophyta was phyletic inclusion, with moderate support, of placed as a monophyletic group clearly divided Developayella elegans within the pseudofungi from the heterotrophic phyla (Bigyra and Pseudo- was achieved when the class of fastest-evolving fungi). sites was excluded from the analysis. Both Bigyra and Pseudofungi were excluded from the Ochro- phyta congruent with earlier studies (Cavalier- Proposed Relationships of Ochrophyta Smith and Chao 2006; Leipe et al. 1994, 1996), but In spite of several attempts to resolve the here shown with high support in Bayesian and evolutionary relationships among heterokonts, maximum likelihood analyses. the phylogeny among classes in Ochrophyta has been unclear. Previous analyses (Ben Ali et al. Molecular Phylogeny and Ultrastructure 2002; Cavalier-Smith and Chao 2006; Daugbjerg and Andersen 1997; Edvardsen et al. 2007) Taken all our trees, plus known ultrastructural clarified some interclass relationships in Ochro- traits into consideration, we propose that Bacillar- phyta such as Dictyochophyceae clustering with iophyceae, Dictyochophyceae and Pelagophycae Pelagophyceae (Ben Ali et al. 2001, 2002; are related and placed within the subphylum Edvardsen et al. 2007) and likewise Phaeophy- Khakista, and separated from subphylum Phaeista ceae as sister taxa to Xanthophyceae (Ben Ali (Cavalier-Smith and Chao 2006). This is consistent et al. 2002). These sister taxa relationships are with the morphological features as the lineages strongly supported in our analyses (Figs 1 and 2) belonging to Khakista have reduced flagellar by receiving maximum support values. When apparatus (Cavalier-Smith and Chao 2006; Leipe adding rbcL the separate analysis of Ochrophyta et al. 1994) with no or vestigial flagellar roots displayed the three heterokont classes, Bacillar- (Guillou et al. 
1999) and usually no distal transi- iophyceae, Pelagophyceae and Dictyochophy- tional helix (members of the dictyochophyte order ceae (Khakista) as being excluded from the is an exception; Edvardsen et al. Phaeista lineages with moderately high support (2007), but may have a proximal transitional helix (Fig. 3). This topology is contradictory to Figures 1 (found in some dictyochophytes and pelago- and 2 as well as Cavalier-Smith and Chao (2006), phytes) in the forwardly directed flagellum (Guillou where the Bacillariophyceae (together with Boli- et al. 1999). Furthermore, they lack the pigment dophyceae) formed a basal group within Ochro- violaxanthin and possess chlorophyll c3 (Andersen phyta. Figure 3 shows 2004; Bjørnland and Liaaen-Jensen 1989; Edvard- cluster strongly with Xanthophyceae. These two sen et al. 2007). The basal bodies are placed in a ARTICLE IN PRESS

depression in the nucleus in Dictyochophyceae and Pelagophyceae (Edvardsen et al. 2007; Honda et al. 1995) and in Bacillariophyceae, only one basal body may be present in the flagellate stage (Preisig 1989). In the remaining taxa, the basal bodies are placed anterior to the nucleus. Contrary to the original definition of Khakista (Cavalier-Smith and Chao 2006), both the molecular and ultrastructural data indicate that Pelagophyceae and Dictyochophyceae should be moved from subphylum Phaeista to Khakista. The existence of a monophyletic group consisting of Bacillariophyceae, Pelagophyceae and Dictyochophyceae has previously been suggested from ultrastructural, biochemical and ssu rDNA data (Potter et al. 1997; Saunders et al. 1995, 1997).

This study indicated at least two heterotrophic heterokont clades (Fig. 1), and possibly more (Figs 2 and S1). Since the number of heterotrophic heterokont taxa is limited in this study, more data from other groups will probably further resolve the heterotrophic heterokont tree. Assuming two heterotrophic clades, as our data may suggest, two independent plastid losses seem to be the most likely scenario given a common photosynthetic ancestor of heterokonts (Cavalier-Smith 1999). Alternatively, assuming a non-photosynthetic origin of heterokonts (see Sanchez-Puerta and Delwiche 2008), only a single acquisition of a red algal (or red algal-derived) plastid in the ancestral Ochrophyta needs to be invoked. Conversely, the evolution of plastids in heterokonts may be more complex, for example involving independent plastid losses in some Ochrophyta lineages (e.g. Oikomonas within the Chrysophyceae clade; see Cavalier-Smith and Chao 2006).

This study has further resolved the phylogeny of heterokonts. In particular, the topology within Ochrophyta was better resolved and their separation from heterotrophic heterokonts was robustly demonstrated. However, despite the substantial increase in sequence data in this work, there are some ambiguities among the heterotrophic phyla, Bigyra and Pseudofungi. As most of the character information in our data came from the rDNA genes, we believe it is worth generating an even larger alignment with more genes and an increased number of taxa, preferably from the classes Bolidophyceae, , Placididea and several of the heterotrophic classes including Opalinidea, as well as the recently identified pico-sized MAST clades (Kolodziej and Stoeck 2007; Massana et al. 2006). Extending our analyses with improved taxon and gene sampling combined with ultrastructural characters will enable us to better understand the evolution of this diverse group of organisms.

Methods

Algal cultures: In total 19 different algal strains (Supplementary Table 1) were grown in media and at temperatures according to the specific recommendations of the culture collections. The dictyochophyte Pseudochattonella farcimen strain UIO110 was grown in IMR1/2 medium (Eppley et al. 1967) with 10 nM selenite added and a salinity of 25, at a temperature of 14–15 °C and an irradiance of about 100 µmol photons m⁻² s⁻¹.

DNA isolation and PCR amplification: Algal cells were harvested directly from the culture (1 ml) by centrifugation (Eppendorf 5415R, Eppendorf, Hamburg, Germany) at 3220 g for 5 min. Organisms grown on solid surfaces (bottom of Petri dishes or agar surfaces) were scraped off and dissolved in molecular biology grade water prior to centrifugation. DNA from the algal pellets was isolated using the standard procedure of the Dynabeads DNA direct Universal kit (Invitrogen, CA, USA) according to the manufacturer's recommendations. The primers used for amplifications by PCR are shown in Supplementary Table 2. For degenerate primers the final concentration was doubled (20 mM). When nested PCR reactions were required, the initial PCR products were diluted 1:20 in molecular biology grade water. Amplifications of PCR products were optimized and carried out in a Master Cycler (Eppendorf). Two different DNA polymerases, HotMaster (Eppendorf) and Phusion (Finnzymes OY, Espoo, Finland), were used (Supplementary Table 2).

Molecular cloning and sequencing: All PCR products were examined for correct length on a 1.0% agarose gel stained with ethidium bromide. PCR products were purified from agarose gels with the WizardSV Gel and PCR Clean-Up System (Promega, WI, USA). Depending on the polymerase used in PCR, the products were cloned into the TOPO cloning pCR 2.1 vector (Invitrogen, CA, USA) or the PJET vector (Fermentas, St. Leon-Rot, Germany) and transformed into TOP10 competent bacterial cells (Invitrogen). Plasmids were purified using the Wizard plus SV miniprep (Promega) and bidirectionally sequenced using the standard M13 sequencing primers or PJET sequencing primers (Fermentas). Internal sequencing primers (Supplementary Table 2) were also used. One to five clones from each PCR product were sequenced and compared.

Single and multigene alignments: Additional non-amplified sequences used during this study were obtained from the publicly available GenBank databases (http://www.ncbi.nlm.nih.gov/). Sequences were compared and manually edited in Vector NTI (Invitrogen). Identification of different paralogous gene copies was performed using Pfam gene family seed alignments (Finn et al. 2006), and non-orthologous gene copies were subsequently removed prior to concatenation. Sequences were aligned to the Pfam alignments using ClustalW (Thompson et al. 1994) and subsequently edited manually in BioEdit v.7.0.5 (Tippmann 2004) and MacClade (Vazquez 2004). The single gene alignments were then analysed using RAxML version 7.0.3 (Stamatakis 2006) with 100 bootstrap replicates, random starting trees and either the GTRMIX (rDNA) or PROTMIXrtREV(F) (protein) substitution model. This determined whether the genes had phylogenetically compatible histories (no strongly supported conflicting clades). In all cases, no paralogous evidence and no strongly
supported (i.e. bootstrap support >55%) topological conflicts were observed (Figs S4–S10). The only exceptions to this rule were the placement of Ochromonas danica (Ochromonadaceae) as sister to Caecitellus parvulus in the cox1 tree (Fig. S8) and the placement of Nerada mexicana within the outgroup in the hsp90 tree (Fig. S9). Both these topologies and their high bootstrap support values are most likely artefacts caused by biased taxon sampling; for cox1, Ochromonas danica and Caecitellus are the only representatives of Chrysophyceae and Bigyra, whilst for hsp90, Nerada mexicana is adversely affected by being the only representative of the basal Bigyra taxa (Bicosoecida and Blastocystis). Removing the Ochromonas danica cox1 and Nerada mexicana hsp90 sequences from the multigene inferences did not significantly change the topology or the maximum likelihood bootstrap values from those presented in Figures 1 and 2 (results not shown).

The choice of outgroup has been a contentious issue in resolving heterokont evolutionary relationships (Cavalier-Smith and Chao 2006). An outgroup distant to the ingroup can result in long branch attraction (LBA) affecting the ingroup relationships (Hillis 1998; Rannala et al. 1998). However, recent articles that have inferred global eukaryotic phylogenies based upon 123+ genes (29908+ amino acid characters) show a clear relationship between the heterokonts, and the alveolates (, and dinoflagellates), which form the SAR clade (Burki et al. 2007, 2008). It was therefore important to include taxa from these lineages in addition to the chromist divisions of Cryptophyta and Haptophyta, which make up the chromalveolates with the heterokonts and alveolates, for the enhanced resolution of heterokont evolution.

Two genes (β-tubulin and rbcL) from the dictyochophyte Pseudochattonella farcimen strain UIO110 originated from an EST library of this organism (Riisberg et al. unpublished; accessions FJ030894 and AM850235, respectively).

Taxa were chosen to prevent missing character data for the multiple gene alignments. However, as missing character data was extensive for some species, composite (in silico chimeric) sequences were constructed from closely related species. All taxa had less than 35% missing character data in the resulting alignments (for further details, see Table 1), and in the resulting supermatrices the proportion of missing data never exceeded 15% (14.37% missing data in Fig. 3).

Phylogenetic analyses and AU test: The evolutionary models were chosen on the basis of the BIC and AIC criteria implemented in ProtTest version 1.3, Modeltest and Treefinder (Abascal et al. 2005; Jobb et al. 2004; Posada and Crandall 1998). The best fitting evolutionary model for the protein partitions was estimated to be the rtREV+F (frequency) model. For the rDNA partition the GTR evolutionary model was preferred. For all analyses we accounted for invariable sites and among-site rate variation by assuming a gamma distribution of site rates approximated with four rate categories. In analyses of concatenated data, the alignment was divided into rDNA and protein data partitions and the best fitting model of evolution was applied to each partition.

Bayesian analyses were carried out with MrBayes MPI version 3.1.2 (Huelsenbeck et al. 2001; Ronquist and Huelsenbeck 2003) with the best fitting BIC model estimated in ProtTest, Modeltest and Treefinder. Bayesian covarion trees were generated from two runs with one heated and three cold chains in the Metropolis-coupled Monte Carlo Markov Chain (MCMC) with 4,000,000 generations implemented. Both covarion and non-covarion phylogenies were inferred; however, the covarion model, which allows for different between-site evolutionary rates, was preferred in all cases, with significantly higher marginal likelihoods for the runs. Burn-in trees were set based on the assessment of likelihood plots and convergence diagnostics implemented in MrBayes. Tree sampling was done every 100 generations, and after burn-in the consensus of the sampled trees was used to calculate the posterior probability of the tree topology.

Maximum likelihood (ML) analyses were performed with Treefinder (Jobb et al. 2004) and RAxML version 7.0.3 (Stamatakis 2006). Bootstrap analyses were done with 100 pseudoreplicates under the same evolutionary model as the initial search. Trees were inferred in RAxML under GTR(PROT)MIX, with an initial inference under the GTR(PROT)CAT approximation (optimization of individual per-site substitution rates and classification of those individual rates under 25 specified rate categories), before the GTR(PROT)GAMMA model with 4 discrete GAMMA rate categories was used to evaluate the final tree topology, such that it yielded stable likelihood values (Stamatakis 2006).

To investigate the impact of the fast-evolving heterotrophic taxa on the phylogenetic inference, fast-evolving sites were identified using PAML (Yang 1997, 2007). The rDNA (lsu and ssu) dataset was assessed by estimation with baseML, whilst the protein dataset (actin, β-tubulin, cox1 and hsp90) was assessed by estimation with codonML. For both estimations, sites were classified under 8 rate categories, and a site removal script was subsequently applied to the alignment (Kumar et al. unpublished). Alignments were constructed with the sites in rate category 8 removed, and with the sites in categories 7 and 8 removed; however, subsequent analysis showed that removal of more than category 8 resulted in a highly reduced alignment and therefore reduced phylogenetic resolution (results not shown) (Kumar et al. unpublished).

The paired-site approximately unbiased (AU) test (Shimodaira 2002), as implemented in Treefinder (Jobb et al. 2004), was used to compare incongruent heterotrophic topologies. The heterotrophic topologies to be tested were reoptimized (multifurcations resolved) in Treefinder before AU testing. The alignment used was based on Figure S1 (lsu, ssu, actin, β-tubulin, cox1 and hsp90), where no fast-evolving sites were excluded. Taxon choice for the autotrophic heterokonts was also reduced, with the exclusion of Phaeista taxa from the dataset. The AU test was performed with 1,000,000 bootstrap replicates with the same evolutionary models and partitions as described earlier (GTR and rtREV+F). The tested topologies and results are shown in Supplementary data.

All model estimation and phylogenetic analyses were done on the freely available Bioportal at the University of Oslo (http://www.bioportal.uio.no/).

Note added in proof

Tsui et al. (2008) recently published a global heterokont phylogeny using three protein-coding genes, focusing mainly on the relationship between Labyrinthulida and Bicosoecida. In congruence with our topologies, they obtained statistical support for the separation of Bigyra from Pseudofungi and for the separation between the heterotrophs and Ochrophyta. However, in Tsui et al. (2008), Labyrinthulida was moderately supported as sister to Bicosoecida. In our trees,
based on seven gene sequences and different taxon sampling, we found the latter two as weakly supported paraphyletic groups.

Acknowledgements

We would like to thank Thomas Cavalier-Smith for discussions and Ema Chao for DNA from Nerada mexicana, Cedric Berney for lsu and ssu alignments of available heterokont sequences, and Ave Tooming-Klunderud for PCR amplifying and cloning of the actin sequence for the Labyrinthulida Thraustochytrium aureum. We thank the Bioportal at the University of Oslo for computer resources. The study was financially supported by grants from the Norwegian Research Council (Projects no. 166555 and 172572) to KSJ, PhD and by the University of Oslo and the Norwegian University of Life Sciences for strategic funding (Trippelalliansen). The seven single-gene alignments used in this study have been made freely available for public access at the EMBL-Align database: ftp://ftp.ebi.ac.uk/pub/databases/embl/align/ under the accession numbers ALIGN_001291–ALIGN_001297.

Appendix A. Supplementary material

Supplementary data associated with this article can be found in the online version at doi:10.1016/j.protis.2008.11.004.

References

Abascal F, Zardoya R, Posada D (2005) ProtTest: selection of best-fit models of protein evolution. Bioinformatics 21: 2104–2105

Andersen RA, Van de Peer Y, Potter D, Sexton JP, Kawachi M, LaJeunesse T (1999) Phylogenetic analysis of the SSU rRNA from members of the Chrysophyceae. Protist 150: 71–84

Andersen RA (2004) Biology and systematics of heterokont and algae. Am J Bot 91: 1508–1522

Bachvaroff TR, Sanchez-Puerta MV, Delwiche CF (2006) Rate variation as a function of gene origin in plastid-derived genes of peridinin-containing dinoflagellates. J Mol Evol 62: 42-U27

Ben Ali A, De Baere R, Van der Auwera G, De Wachter R, Van de Peer Y (2001) Phylogenetic relationships among algae based on complete large-subunit rRNA sequences. Int J Syst Evol Microbiol 51: 737–749

Ben Ali A, De Baere R, De Wachter R, Van de Peer Y (2002) Evolutionary relationships among heterokont algae (the autotrophic stramenopiles) based on combined analyses of small and large subunit ribosomal RNA. Protist 153: 123–132

Bjørnland T, Liaaen-Jensen S (1989) Distribution Patterns of Carotenoids in Relation to Chromophyte Phylogeny and Systematics. In Green JC, Leadbeater BSC, Diver WL (eds) The Chromophyte Algae: Problems and Perspectives. Systemat Assoc, Clarendon Press, Oxford, pp 37–60

Bodyl A (2005) Do plastid-related characters support the chromalveolate hypothesis? J Phycol 41: 712–719

Burki F, Shalchian-Tabrizi K, Minge M, Skjaeveland A, Nikolaev SI, Jakobsen KS, Pawlowski J (2007) Phylogenomics reshuffles the eukaryotic supergroups. PLoS ONE 2: e790

Burki F, Shalchian-Tabrizi K, Pawlowski J (2008) Phylogenomics reveals a new 'megagroup' including most photosynthetic . Biol Lett 4: 366–369

Cavalier-Smith T (1986) The : Origin and Systematics. In Round FE, Chapman DJ (eds) Progress in Phycological Research, Vol. 4. Biopress Ltd., Bristol, pp 309–347

Cavalier-Smith T (1997) and Bigyra, two phyla of heterotrophic heterokont chromists. Arch Protistenkd 148: 253–267

Cavalier-Smith T (1999) Principles of protein and targeting in secondary : euglenoid, dinoflagellate, and sporozoan plastid origins and the family tree. J Eukaryot Microbiol 46: 347–366

Cavalier-Smith T, Chao EEY (2006) Phylogeny and megasystematics of phagotrophic heterokonts (kingdom Chromista). J Mol Evol 62: 388–420

Daugbjerg N, Guillou L (2001) Phylogenetic analyses of (Heterokontophyta) using rbcL gene sequences support their sister group relationship to diatoms. Phycologia 40: 153–161

Daugbjerg N, Andersen RA (1997) A molecular phylogeny of the heterokont algae based on analyses of chloroplast-encoded rbcL sequence data. J Phycol 33: 1031–1041

Edvardsen B, Eikrem W, Shalchian-Tabrizi K, Riisberg I, Johnsen G, Naustvoll L, Throndsen J (2007) Verrucophora farcimen gen. et sp. nov. (Dictyochophyceae, Heterokonta) – a bloom-forming ichthyotoxic flagellate from the Skagerrak, Norway. J Phycol 43: 1054–1070

Eppley R, Holmes RW, Paasche E (1967) Periodicity in cell division and physiological behaviour of Ditylum brightwellii, a marine planktonic during growth in light–dark cycles. Arch Mikrobiol 56: 305–323

Fast NM, Xue LR, Bingham S, Keeling PJ (2002) Re-examining evolution using multiple protein molecular phylogenies. J Eukaryot Microbiol 49: 30–37

Finn RD, Mistry J, Schuster-Bockler B, Griffiths-Jones S, Hollich V, Lassmann T, Moxon S, Marshall M, Khanna A, Durbin R, Eddy SR, Sonnhammer ELL, Bateman A (2006) Pfam: clans, web tools and services. Nucleic Acids Res 34: D247–D251

Goertzen LR, Theriot EC (2003) Effect of taxon sampling, character weighting, and combined data on the interpretation of relationships among the heterokont algae. J Phycol 39: 423–439

Guillou L, Chretiennot-Dinet MJ, Medlin LK, Claustre H, Loiseaux-de Goer S, Vaulot D (1999) Bolidomonas: a new genus with two species belonging to a new algal class, the Bolidophyceae (Heterokonta). J Phycol 35: 368–381

Hillis DM (1998) Taxonomic sampling, phylogenetic accuracy, and investigator bias. Syst Biol 47: 3–8

Honda D, Kawachi M, Inouye I (1995) Sulcochrysis biplastida gen. et sp. nov.: cell structure and absolute configuration of the flagellar apparatus of an enigmatic chromophyte alga. Phycol Res 43: 1–16

Huelsenbeck JP, Ronquist F, Nielsen R, Bollback JP (2001) Bayesian inference of phylogeny and its impact on evolutionary biology. Science 294: 2310–2314

Huelsenbeck JP (2002) Testing a covariotide model of DNA substitution. Mol Biol Evol 19: 698–707

Jobb G, von Haeseler A, Strimmer K (2004) TREEFINDER: a powerful graphical analysis environment for molecular phylogenetics. BMC Evol Biol 4: 18

Kim E, Simpson AGB, Graham LE (2006) Evolutionary relationships of apusomonads inferred from taxon-rich analyses of 6 nuclear encoded genes. Mol Biol Evol 23: 2455–2466

Kolodziej K, Stoeck T (2007) Cellular identification of a novel uncultured marine (MAST-12 clade) small-subunit rRNA gene sequence from a Norwegian estuary by use of fluorescence in-situ hybridization-scanning electron microscopy. Appl Environ Microbiol 73: 2718–2726

Leipe DD, Wainright PO, Gunderson JH, Porter D, Patterson DJ, Valois F, Himmerich S, Sogin ML (1994) The Stramenopiles from a molecular perspective – 16S-like ribosomal-RNA sequences from Labyrinthuloides minuta and roenbergensis. Phycologia 33: 369–377

Leipe DD, Tong SM, Goggin CL, Slemenda SB, Pieniazek NJ, Sogin ML (1996) 16S-like rDNA sequences from Developayella elegans, Labyrinthuloides haliotidis, and Proteromonas lacertae confirm that the stramenopiles are a primarily heterotrophic group. Eur J Protistol 32: 449–458

Massana R, Terrado R, Forn I, Lovejoy C, Pedros-Alio C (2006) Distribution and abundance of uncultured heterotrophic flagellates in the world oceans. Environ Microbiol 8: 1515–1522

McMahon MM, Sanderson MJ (2006) Phylogenetic supermatrix analysis of GenBank sequences from 2228 papilionoid legumes. Syst Biol 55: 818–836

Moriya M, Nakayama T, Inouye I (2000) Ultrastructure and 18S rDNA sequence analysis of Wobblia lunata gen. et sp. nov., a new heterotrophic flagellate (Stramenopiles, Incertae sedis). Protist 151: 41–55

Negrisolo E, Maistro S, Incarbone M, Moro I, Dalla Valle L, Broady PA, Andreoli C (2004) Morphological convergence characterizes the evolution of Xanthophyceae (Heterokontophyta): evidence from nuclear SSU rDNA and plastidial rbcL genes. Mol Phylogenet Evol 33: 156–170

Nosenko T, Bhattacharya D (2007) Horizontal gene transfer in chromalveolates. BMC Evol Biol 7: 173

Philippe H, Snell EA, Bapteste E, Lopez P, Holland PWH, Casane D (2004) Phylogenomics of eukaryotes: impact of missing data on large alignments. Mol Biol Evol 21: 1740–1752

Posada D, Crandall KA (1998) MODELTEST: testing the model of DNA substitution. Bioinformatics 14: 817–818

Potter D, Saunders GW, Andersen RA (1997) Phylogenetic relationships of the Raphidophyceae and Xanthophyceae as inferred from nucleotide sequences of the 18S ribosomal RNA gene. Am J Bot 84: 966–972

Preisig HR (1989) The Flagellar Base Ultrastructure and Phylogeny of Chromophytes. In Green JC, Leadbeater BSC, Diver WL (eds) The Chromophyte Algae: Problems and Perspectives. Clarendon Press, Oxford, pp 167–187

Rannala B, Huelsenbeck JP, Yang ZH, Nielsen R (1998) Taxon sampling and the accuracy of large phylogenies. Syst Biol 47: 702–710

Ronquist F, Huelsenbeck JP (2003) MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19: 1572–1574

Sanchez-Puerta MV, Delwiche CF (2008) A hypothesis for plastid evolution in chromalveolates. J Phycol 44: 1097–1107

Saunders GW, Potter D, Paskind MP, Andersen RA (1995) Cladistic analyses of combined traditional and molecular-data sets reveal an algal lineage. Proc Natl Acad Sci USA 92: 244–248

Saunders GW, Potter D, Andersen RA (1997) Phylogenetic affinities of the Sarcinochrysidales and Chrysomeridales (Heterokonta) based on analyses of molecular and combined data. J Phycol 33: 310–318

Shalchian-Tabrizi K, Eikrem W, Klaveness D, Vaulot D, Minge MA, Le Gall F, Romari K, Throndsen J, Botnen A, Massana R, Thomsen HA, Jakobsen KS (2006a) , a new protist phylum with affinity to chromist lineages. Proc R Soc Biol Sci B 273: 1833–1842

Shalchian-Tabrizi K, Minge MA, Cavalier-Smith T, Nedreklepp JM, Klaveness D, Jakobsen KS (2006b) Combined heat shock protein 90 and ribosomal RNA sequence phylogeny supports multiple replacements of dinoflagellate plastids. J Eukaryot Microbiol 53: 217–224

Shalchian-Tabrizi K, Skanseng M, Ronquist F, Klaveness D, Bachvaroff TR, Delwiche CF, Botnen A, Tengs T, Jakobsen KS (2006c) Heterotachy processes in rhodophyte-derived secondhand plastid genes: Implications for addressing the origin and evolution of dinoflagellate plastids. Mol Biol Evol 23: 1504–1515

Shimodaira H (2002) An approximately unbiased test of selection. Syst Biol 51: 492–508

Simpson AGB, Inagaki Y, Roger AJ (2006) Comprehensive multigene phylogenies of excavate reveal the evolutionary positions of "primitive" eukaryotes. Mol Biol Evol 23: 615–625

Stamatakis A (2006) RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22: 2688–2690

Thompson JD, Higgins DG, Gibson TJ (1994) Clustal-W – improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22: 4673–4680

Tippmann HF (2004) Analysis for free: comparing programs for sequence analysis. Brief Bioinform 5: 82–87

Tsui CK, Marshall W, Yokoyama R, Honda D, Lippmeier JC, Craven KD, Peterson PD, Berbee ML (2008) Labyrinthulomycetes phylogeny and its implications for the evolutionary loss of and gain of ectoplasmic gliding. Mol Phylogenet Evol 50: 129–140

Vazquez J (2004) Phylogenetics – MacClade 4: analysis of phylogeny and character evolution, version 4.06. Am Biol Teacher 66: 511–512

Wiens JJ (2005) Can incomplete taxa rescue phylogenetic analyses from long-branch attraction? Syst Biol 54: 731–742

Wiens JJ (2006) Missing data and the design of phylogenetic analyses. J Biomed Inform 39: 34–42

Wiens JJ, Moen DS (2008) Missing data and the accuracy of Bayesian phylogenetics. J Syst Evol 46: 307–314

Wiens JJ, Reeder TW (1995) Combining data sets with different numbers of taxa for phylogenetic analysis. Syst Biol 44: 548–558

Yang Z (1997) PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci 13: 555–556

Yang Z (2007) PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol 24: 1586–1591
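
Illustrative sketch (not part of the original analyses): The Methods describe concatenating single-gene alignments into a supermatrix, reporting the proportion of missing character data, and removing sites assigned to the fastest rate categories. The site removal script itself is unpublished (Kumar et al. unpublished), so the Python below is only a minimal sketch of how such steps might look. All function names, the toy sequences, the toy rate-category vector and the choice to count both "?" and "-" as missing are assumptions made for illustration, not details taken from the study.

```python
from typing import Dict, Iterable, List

MISSING = "?"  # assumed placeholder for a gene that was not sampled in a taxon


def concatenate(genes: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Join per-gene alignments (gene -> taxon -> aligned sequence) into one
    supermatrix, padding taxa that lack a gene with missing characters."""
    taxa = sorted({taxon for aln in genes.values() for taxon in aln})
    matrix = {taxon: "" for taxon in taxa}
    for gene, aln in genes.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: aligned sequences differ in length")
        width = lengths.pop()
        for taxon in taxa:
            matrix[taxon] += aln.get(taxon, MISSING * width)
    return matrix


def missing_fraction(seq: str) -> float:
    """Fraction of characters that are missing (assumption: '?' or '-')."""
    return sum(ch in "?-" for ch in seq) / len(seq)


def drop_fast_sites(matrix: Dict[str, str],
                    site_category: List[int],
                    categories_to_drop: Iterable[int]) -> Dict[str, str]:
    """Remove every alignment column whose rate category is in categories_to_drop."""
    drop = set(categories_to_drop)
    keep = [i for i, cat in enumerate(site_category) if cat not in drop]
    return {taxon: "".join(seq[i] for i in keep) for taxon, seq in matrix.items()}


# Toy data: taxon C has no actin sequence, so it is padded with '?'.
genes = {
    "ssu": {"A": "ACGT", "B": "AC-T", "C": "ACGA"},
    "actin": {"A": "MKV", "B": "MRV"},
}
matrix = concatenate(genes)
per_taxon_missing = {t: missing_fraction(s) for t, s in matrix.items()}
overall_missing = missing_fraction("".join(matrix.values()))

# Hypothetical rate category (1 = slowest, 8 = fastest) for each of the 7 columns.
site_category = [1, 8, 2, 7, 1, 8, 3]
without_cat8 = drop_fast_sites(matrix, site_category, {8})
without_cat7_8 = drop_fast_sites(matrix, site_category, {7, 8})
```

In this sketch the per-site categories are simply supplied as a list; in practice they would have to be parsed from the rate estimation output, which is not shown here. Comparing the widths of `without_cat8` and `without_cat7_8` mirrors the trade-off noted in the Methods, where removing more than the fastest category left a strongly reduced alignment.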