
Article

How many novel eukaryotic 'kingdoms'? Pitfalls and limitations of environmental DNA surveys

BERNEY, Cédric, FAHRNI, José, PAWLOWSKI, Jan Wojciech

Abstract

Over the past few years, the use of molecular techniques to detect cultivation-independent, eukaryotic diversity has proven to be a powerful approach. Based on small-subunit ribosomal RNA (SSU rRNA) gene analyses, these studies have revealed the existence of an unexpected variety of new phylotypes. Some of them represent novel diversity in known eukaryotic groups, mainly stramenopiles and alveolates. Others do not seem to be related to any molecularly described lineage, and have been proposed to represent novel eukaryotic kingdoms. In order to review the evolutionary importance of this novel high-level eukaryotic diversity critically, and to test the potential technical and analytical pitfalls and limitations of eukaryotic environmental DNA surveys (EES), we analysed 484 environmental SSU rRNA gene sequences, including 81 new sequences from sediments of the small river, the Seymaz (Geneva, Switzerland).

Reference

BERNEY, Cédric, FAHRNI, José, PAWLOWSKI, Jan Wojciech. How many novel eukaryotic 'kingdoms'? Pitfalls and limitations of environmental DNA surveys. BMC Biology, 2004, vol. 2, p. 13

DOI : 10.1186/1741-7007-2-13 PMID : 15176975

Available at: http://archive-ouverte.unige.ch/unige:82107


BMC Biology, BioMed Central

Research article, Open Access

How many novel eukaryotic 'kingdoms'? Pitfalls and limitations of environmental DNA surveys

Cédric Berney*, José Fahrni and Jan Pawlowski

Address: Department of Zoology and Animal Biology, University of Geneva, CH – 1211 Geneva 4, Switzerland

Email: Cédric Berney* - [email protected]; José Fahrni - [email protected]; Jan Pawlowski - [email protected]

* Corresponding author

Published: 04 June 2004
Received: 05 February 2004
Accepted: 04 June 2004

BMC Biology 2004, 2:13

This article is available from: http://www.biomedcentral.com/1741-7007/2/13

© 2004 Berney et al; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.

Abstract

Background: Over the past few years, the use of molecular techniques to detect cultivation-independent, eukaryotic diversity has proven to be a powerful approach. Based on small-subunit ribosomal RNA (SSU rRNA) gene analyses, these studies have revealed the existence of an unexpected variety of new phylotypes. Some of them represent novel diversity in known eukaryotic groups, mainly stramenopiles and alveolates. Others do not seem to be related to any molecularly described lineage, and have been proposed to represent novel eukaryotic kingdoms. In order to review the evolutionary importance of this novel high-level eukaryotic diversity critically, and to test the potential technical and analytical pitfalls and limitations of eukaryotic environmental DNA surveys (EES), we analysed 484 environmental SSU rRNA gene sequences, including 81 new sequences from sediments of the small river, the Seymaz (Geneva, Switzerland).

Results: Based on a detailed screening of an exhaustive alignment of eukaryotic SSU rRNA gene sequences and the phylogenetic re-analysis of previously published environmental sequences using Bayesian methods, our results suggest that the number of novel higher-level taxa revealed by previously published EES was overestimated. Three main sources of errors are responsible for this situation: (1) the presence of undetected chimeric sequences; (2) the misplacement of several fast-evolving sequences; and (3) the incomplete sampling of described, but yet unsequenced eukaryotes. Additionally, EES give a biased view of the diversity present in a given biotope because of the difficult amplification of SSU rRNA genes in some taxonomic groups.

Conclusions: Environmental DNA surveys undoubtedly help to reveal many novel eukaryotic lineages, but there is no clear evidence for a spectacular increase of the diversity at the kingdom level. After re-analysis of previously published data, we found only five candidate lineages of possible novel high-level eukaryotic taxa, two of which comprise several phylotypes that were found independently in different studies. To ascertain their taxonomic status, however, the organisms themselves now have to be identified.

Background

Over the past few years, cultivation-independent identification of microbial organisms by PCR amplification and sequencing of small-subunit ribosomal RNA (SSU rRNA) genes revealed a huge diversity of eubacterial and archaeal phylotypes in environmental samples, many of which are not represented by cultured organisms [1,2]. Recently, the same techniques have been applied to surveys of

Page 1 of 13 (page number not for citation purposes) BMC Biology 2004, 2 http://www.biomedcentral.com/1741-7007/2/13

eukaryotic diversity in different marine and freshwater biotopes, including planktonic [3,4] and some extreme, anoxic [5], acidic and iron-rich [6] or deep-sea hydrothermal vent [7,8] environments. All these studies revealed an unexpectedly high diversity of new eukaryotic phylotypes at three distinct taxonomic levels. Some of them can be attributed to novel species in already known genera, families or orders. Others represent novel lineages within already known eukaryotic groups, such as fungi, stramenopiles, alveolates, and kinetoplastids [8-11]. Finally, some of these new phylotypes do not seem to be related to any described lineage, and have been proposed to represent novel high-level taxonomic diversity in eukaryotes [5,7,8].

Here we report 81 new partial SSU rRNA gene sequences of eukaryotes from sediments of the small river, the Seymaz (Geneva, Switzerland). We analyze these sequences together with 403 complete or nearly complete environmental eukaryotic sequences available in GenBank. We point out some of the pitfalls that can impede a correct interpretation of the results of eukaryotic environmental DNA surveys (EES), and evaluate the candidature of some phylotypes to represent novel higher-level eukaryotic lineages. We discuss the impact of an accurate assessment of the environmental diversity on our view of megaevolution, in light of recent hypotheses about the shape of the eukaryotic tree and the position of its root.

Results

Sequencing of 81 clones from an EES of the small river, the Seymaz (Geneva, Switzerland) yielded 58 distinct SSU rRNA phylotypes. The size of the sequences varies from 760 to 900 base pairs, which corresponds to the average size expected for the amplified fragment (helices 27 to 50 of the SSU rRNA secondary structure). Size variations occur mainly in the variable region V7, but expansions were observed in the variable region V8 for some sequences. The newly obtained SSU rRNA phylotypes were added to a general alignment of eukaryotes, including most complete or nearly complete sequences from EES available in GenBank. Sequences from cultured organisms were selected so that all major taxonomic groups of eukaryotes were represented; only extremely divergent lineages such as microsporidia and metamonads were omitted. Manual alignment of our sequences allowed the identification of 10 chimeras, which were initially detected because different regions of the same sequence contained rare substitutions and/or indels that are specific for different groups of eukaryotes. Distance analyses based on different subsets of unambiguously aligned regions (partial treeing analysis [12]) were then used to confirm the chimeric nature of these sequences (see Additional file 1 for detailed examples of how we detected chimeric sequences).

The phylogenetic position of the 48 non-chimeric phylotypes from our samples was assessed by minimum evolution analyses. Results are illustrated in Figure 1 (see Additional file 2 for a summary of the identification of all 81 sequences). The tree shown in Figure 1A is the result of an analysis of 86 partial eukaryotic SSU rRNA gene sequences, including five selected environmental phylotypes from previous studies. A total of 670 unambiguously aligned positions were included, and the GTR + G model of evolution was used (alpha = 0.37). Because of the short size of the amplified fragment, some phylogenetic signal was lost and the monophyly of cercozoans and fungi was not retrieved. Almost all phylotypes belong to already known eukaryotic groups. Their relative proportions are illustrated in Figure 1B. Only two phylotypes (Sey010 and Sey017, represented by ten and two sequences, respectively) belong to a yet undetermined, fast-evolving eukaryotic lineage (Figure 1A). They clearly correspond to already published environmental sequences from deep-sea Antarctic plankton (DH148-5-EKD18 [3]), from the Guaymas Basin hydrothermal vent (CS_R003 [7]), and from anoxic, marine sediments collected in Bolinas Tidal Flat (BOL1 cluster [5]). These phylotypes were screened by eye in search of rare sequence signatures that would support their inclusion in already known eukaryotic groups, but none could be detected, suggesting that this lineage might represent a novel high-level taxon.

In the second part of this work, we re-analysed 403 complete or nearly complete published environmental sequences, representing 289 distinct phylotypes. We focused on 28 phylotypes that could not be attributed to known groups of eukaryotes. First, our general alignment was screened by eye for the presence of specific sequence signatures, as described above. It is noteworthy that several previously undetected chimeras were identified in that way, among which three phylotypes were considered as novel high-level taxa, and this result was confirmed by partial treeing analysis. The phylogenetic position of all non-chimeric phylotypes was analysed using Bayesian methods (Figures 2, 3, and 4; see Additional file 3 for a summary of the identification of all 403 sequences). In order to avoid the loss of important informative sites, none of our partial sequences were included in these analyses. The tree shown in Figure 2 is the result of a Bayesian analysis of 125 eukaryotic SSU rRNA gene sequences, including a selection of 56 phylotypes from environmental surveys. A total of 1,175 unambiguously aligned positions were included, and the GTR + G model of evolution was used (alpha = 0.44). Since resolution within alveolates and opisthokonts was poor (using only 1,175 sites), two additional datasets were designed to refine evolutionary relationships within these supergroups. Figure 3 presents the result of a Bayesian analysis of 77


Figure 1: Identification of the 48 distinct, non-chimeric eukaryotic phylotypes we obtained from our samples of the small river, the Seymaz (Geneva, Switzerland). (A) Phylogenetic positions of the 48 eukaryotic phylotypes we obtained. The tree shown is the result of a minimum evolution analysis of 68 partial SSU rRNA gene sequences, using the GTR + G model of evolution (see text). The number of phylotypes belonging to each higher-level eukaryotic group is indicated in brackets under the clade name. A fast-evolving lineage of undetermined taxonomic position is highlighted in blue. The tree was arbitrarily rooted on opisthokonts. Numbers at nodes are bootstrap support values following 10,000 replicates. All branches are drawn to scale. (B) Relative proportion of phylotypes belonging to each higher-level eukaryotic group.

Panel B values: Fungi, 33.3%; Ciliophora, 18.9%; Cercozoa, 10.4%; Stramenopiles, 10.4%; Apicomplexa, 6.2%; Metazoa, 6.2%; Chlorophyta, 6.2%; Amoebozoa, 4.2%; undetermined eukaryotic lineage, 4.2%.
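The chimera check described in the Results (different regions of one sequence carrying signatures of different groups, confirmed by distance analyses on subsets of aligned regions) can be illustrated with a deliberately simplified sketch. It is not the partial treeing procedure of [12]: instead of building distance trees, it only asks which reference is closest to each side of an assumed breakpoint. The function names, the `split_at` parameter, the group labels and the toy sequences are all hypothetical.

```python
# Toy illustration only: sequences, group labels and helper names are made up.

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences (gaps skipped)."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0
    return sum(x != y for x, y in pairs) / len(pairs)

def nearest_group(query, references, start, end):
    """Label of the reference closest to the query within columns [start, end)."""
    group, _ = min(
        references,
        key=lambda ref: p_distance(query[start:end], ref[1][start:end]),
    )
    return group

def looks_chimeric(query, references, split_at):
    """Flag a query whose two regions are closest to different groups."""
    left = nearest_group(query, references, 0, split_at)
    right = nearest_group(query, references, split_at, len(query))
    return left != right, left, right

# Hypothetical aligned references, one per group (not real SSU rRNA data).
references = [
    ("group_A", "AAAAAAAAAACCCCCCCCCC"),
    ("group_B", "GGGGGGGGGGTTTTTTTTTT"),
]
chimera = "AAAAAAAAAATTTTTTTTTT"    # left half like group_A, right half like group_B
clean_seq = "AAAAAAAAAACCCCCCCCCA"  # group_A throughout, with one substitution

print(looks_chimeric(chimera, references, 10))
print(looks_chimeric(clean_seq, references, 10))
```

In practice the breakpoint is unknown and may lie close to one end of the sequence, so a real screen would have to try several split positions and would work on unambiguously aligned regions only.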


Figure 2: Bayesian phylogeny of eukaryotes based on the analysis of 125 complete or nearly complete SSU rRNA gene sequences (1,175 positions), including 56 selected environmental phylotypes (indicated in bold). The number of phylotypes belonging to each higher-level eukaryotic group is indicated in brackets next to the clade name. Phylotypes previously considered as novel eukaryotic lineages, which are in fact fast-evolving members of known groups, are highlighted in orange. Phylotypes that could be identified thanks to an increasing taxon sampling are highlighted in green. The remaining phylotypes of undetermined taxonomic position are highlighted in blue. The tree is presented with a basal bifurcation between unikonts (Amoebozoa + opisthokonts) and bikonts. The GTR + G model of evolution was used, and the topology shown is a Bayesian consensus of 20,000 sampled trees (see text). The posterior probability of each resolved node is indicated above branches, while numbers under branches represent bootstrap support following 10,000 replicates of a minimum evolution analysis of the same dataset, using maximum likelihood-corrected estimates of the distances (dashes indicate bootstrap values under 50%). Branches are drawn to scale, except those marked with an asterisk (*), which were reduced by half for clarity.
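To give a feeling for what the small alpha values reported with the GTR + G model imply, the sketch below draws per-site relative rates from a gamma distribution with shape alpha and mean 1, which is the usual parameterization of the "+ G" component. It is an illustration under that assumption, not a reconstruction of the analyses reported here; the function names, the number of simulated sites and the thresholds 0.1 and 2.0 are arbitrary choices.

```python
import random

def sample_site_rates(alpha, n_sites, seed=0):
    """Draw relative substitution rates from Gamma(shape=alpha, scale=1/alpha)."""
    rng = random.Random(seed)
    # With scale = 1/alpha the expected rate is alpha * (1/alpha) = 1.
    return [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_sites)]

def summarize(alpha, n_sites=20000):
    """Return the mean rate and the fractions of very slow and fast sites."""
    rates = sample_site_rates(alpha, n_sites)
    mean_rate = sum(rates) / n_sites
    slow = sum(r < 0.1 for r in rates) / n_sites  # nearly invariant sites
    fast = sum(r > 2.0 for r in rates) / n_sites  # sites at more than twice the mean
    return mean_rate, slow, fast

# Alpha values quoted in the text for the different datasets.
for alpha in (0.37, 0.38, 0.44):
    mean_rate, slow, fast = summarize(alpha)
    print(f"alpha={alpha}: mean={mean_rate:.2f}, "
          f"rate<0.1: {slow:.0%}, rate>2: {fast:.0%}")
```

With shape values well below 1, a large share of sites is almost invariant while a minority evolves much faster than average, which is the kind of rate heterogeneity that makes fast-evolving sequences difficult to place.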


Figure 3: Bayesian phylogeny of alveolates based on the analysis of 80 complete or nearly complete SSU rRNA gene sequences (1,325 positions), including 44 selected environmental phylotypes (indicated in bold). The number of phylotypes belonging to each of the five main alveolate lineages is indicated in brackets next to the clade name. Phylotypes previously considered as novel eukaryotic lineages, which are in fact fast-evolving members of known groups, are highlighted in orange. Phylotypes that could be identified thanks to an increasing taxon sampling are highlighted in green. The tree is rooted with three stramenopile sequences. The GTR + G model of evolution was used, and the topology shown is a Bayesian consensus of 20,000 sampled trees (see text). The posterior probability of each resolved node is indicated. Branches are drawn to scale, except those marked with an asterisk (*), which were reduced by half for clarity.

Clade labels in the figure: Apicomplexa (12), Colpodellidae (3), Dinoflagellata (44), Perkinsea (3), Ciliophora (29); outgroup: Stramenopiles.


Figure 4: Bayesian phylogeny of opisthokonts based on the analysis of 80 complete or nearly complete SSU rRNA gene sequences (1,395 positions), including 28 selected environmental phylotypes (indicated in bold). The number of phylotypes belonging to each opisthokont lineage is indicated in brackets next to the clade name. An as yet undetermined lineage is highlighted in blue. The tree is rooted with five amoebozoan sequences. The GTR + G model of evolution was used, and the topology shown is a Bayesian consensus of 20,000 sampled trees (see text). The posterior probability of each resolved node is indicated. All branches are drawn to scale.


Figure 5: Identification of the 289 published phylotypes we re-analysed. (A) As determined by their authors and (B) after our re-analysis, highlighting the relative proportion of previously undetected chimeras and the reduced number of phylotypes of undetermined taxonomic position, compared to the proportion of phylotypes belonging to each defined higher-level eukaryotic group. The phylotypes related to, respectively, Mastigamoeba invertens, Jakoba incarcerata, and the Carpediemonas + Retortamonas + diplomonads lineage were grouped together as 'Excavates'.

Panel A values: Alveolata, 33.6%; Opisthokonts, 19.4%; Stramenopiles, 17.6%; undetermined eukaryotic lineages, 9.7%; Radiolaria, 4.8%; Cercozoa, 4.5%; Viridiplantae, 3.8%; Discicristates, 3.5%; Amoebozoa, 1.4%; Haptophyta, 1.4%; Apusozoa, 0.3%.

Panel B values: Alveolata, 31.5%; Opisthokonts, 16.9%; chimeras, 13.8%; Stramenopiles, 13.1%; Radiolaria, 4.8%; Cercozoa, 4.5%; Viridiplantae, 3.5%; Discicristates, 3.5%; undetermined eukaryotic lineages, 3.5%; Amoebozoa, 1.4%; Haptophyta, 1.4%; 'Excavates', 1.4%; Apusozoa, 0.7%.
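As a quick arithmetic check, the proportions quoted in the text and in Figure 5 can be recomputed from the counts given in this article (10 chimeras among 58 Seymaz phylotypes; 40 chimeras, 28 initially undetermined and 10 still undetermined phylotypes among the 289 re-analysed ones). The helper name `percent` is ours and purely illustrative.

```python
def percent(part, total):
    """Share of `part` in `total`, as a percentage rounded to one decimal."""
    return round(100 * part / total, 1)

# Counts taken from the text; the variable names are illustrative.
seymaz_chimeras = percent(10, 58)         # "about 17%" in the Discussion
published_chimeras = percent(40, 289)     # "about 14%" in the Discussion
undetermined_before = percent(28, 289)    # undetermined lineages, Figure 5A
undetermined_after = percent(28 - 18, 289)  # after 18 were re-identified, Figure 5B

print(seymaz_chimeras)      # 17.2
print(published_chimeras)   # 13.8
print(undetermined_before)  # 9.7
print(undetermined_after)   # 3.5
```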


Table 1: Summary of our re-analysis of 28 published phylotypes proposed to represent novel high-level eukaryotic diversity

Phylotype | GenBank accession number | Taxonomic status proposed after our re-analysis | Reference

Previously undetected chimeric sequences
CS_E042 | AY046663 | 1/2 = CS_E028 (Dinoflagellata); 2/2 = CS_E022 (Jakobidae) | Edgcomb et al. 2002 [7]
LEMD145 | AF372805 | 1/2 = undet. Ascomycota; 2/2 = LEMD003 (Gregarinia) | Dawson & Pace 2002 [5]
LEMD119 | AF372777 | 1/2 = undet. Apicomplexa; 2/2 = LEMD003 (Gregarinia) | Dawson & Pace 2002 [5]

Misplaced fast-evolving phylotypes
LEMD267 | AF372778 | Lobosea (Amoebozoa) | Dawson & Pace 2002 [5]
C1_E016 | AY046618 | Dinoflagellata (Alveolata) | Edgcomb et al. 2002 [7]
C3_E014 | AY046873 | Apicomplexa (Alveolata) | Edgcomb et al. 2002 [7]
C1_E017 | AY046619 | Apicomplexa (Alveolata) | Edgcomb et al. 2002 [7]
C2_E016 | AY046806 | Apicomplexa (Alveolata) | Edgcomb et al. 2002 [7]
BOLA267 | AF372774 | Apicomplexa (Alveolata) | Dawson & Pace 2002 [5]
LEMD134 | AF372806 | Apicomplexa (Alveolata) | Dawson & Pace 2002 [5]
LEMD003 | AF372797 | Apicomplexa (Alveolata) | Dawson & Pace 2002 [5]
CS_E036 | AY046668 | Labyrinthulata (Stramenopiles) | Edgcomb et al. 2002 [7]
LEMD052 | AF372744 | Cercozoa () | Dawson & Pace 2002 [5]

Phylotypes identified with an increasing molecular sampling of described organisms
AT4-11 | AF530526 | Apusomonadidae (Apusozoa) | López-García et al. 2003 [8]
BOLA187 | AF372745 | 'Mastigamoeba invertens group' | Dawson & Pace 2002 [5]
BOLA366 | AF372746 | 'Mastigamoeba invertens group' | Dawson & Pace 2002 [5]
CS_E022 | AY046649 | Jakobidae (Excavates) | Edgcomb et al. 2002 [7]
C1_E027 | AY046628 | 'Retortamonas/Carpediemonas group' (Excavates) | Edgcomb et al. 2002 [7]

Phylotypes that passed our checking procedure
DH145-EKD11 | AF290065 | possibly novel high-level lineage | López-García et al. 2001 [3]
DH148-5-EKD18 | AF290084 | possibly novel high-level lineage | López-García et al. 2001 [3]
C3_E012 | AY046842 | possibly novel high-level lineage | Edgcomb et al. 2002 [7]
C2_E026 | AY046816 | possibly novel high-level lineage | Edgcomb et al. 2002 [7]
CS_R003 | AY046643 | possibly novel high-level lineage | Edgcomb et al. 2002 [7]
BOLA212 | AF372767 | possibly novel high-level lineage | Dawson & Pace 2002 [5]
BOLA458 | AF372771 | possibly novel high-level lineage | Dawson & Pace 2002 [5]
BOLA048 | AF372821 | possibly novel high-level lineage | Dawson & Pace 2002 [5]
BAQA065 | AF372825 | possibly novel high-level lineage | Dawson & Pace 2002 [5]
AT4-68 | AF530543 | possibly novel high-level lineage | López-García et al. 2003 [8]
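The tallies behind Table 1 can be checked with a few lines of code. The phylotype names below are copied from the table; the dictionary layout and the shortened category labels are illustrative choices of ours.

```python
# Phylotypes per category, as listed in Table 1 (labels shortened for the sketch).
TABLE_1 = {
    "undetected chimera": ["CS_E042", "LEMD145", "LEMD119"],
    "misplaced fast-evolving": [
        "LEMD267", "C1_E016", "C3_E014", "C1_E017", "C2_E016",
        "BOLA267", "LEMD134", "LEMD003", "CS_E036", "LEMD052",
    ],
    "identified with increased sampling": [
        "AT4-11", "BOLA187", "BOLA366", "CS_E022", "C1_E027",
    ],
    "possibly novel high-level lineage": [
        "DH145-EKD11", "DH148-5-EKD18", "C3_E012", "C2_E026", "CS_R003",
        "BOLA212", "BOLA458", "BOLA048", "BAQA065", "AT4-68",
    ],
}

counts = {category: len(names) for category, names in TABLE_1.items()}
total = sum(counts.values())
# Every phylotype outside the last category was re-assigned by the re-analysis.
misidentified = total - counts["possibly novel high-level lineage"]

for category, n in counts.items():
    print(f"{category}: {n}")
print(f"misidentified: {misidentified} of {total}")  # misidentified: 18 of 28
```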

SSU rRNA gene sequences, inferred from 1,325 unambiguously aligned positions, using the GTR + G model of evolution (alpha = 0.38). Figure 4 presents the result of a Bayesian analysis of 75 opisthokont SSU rRNA gene sequences, inferred from 1,395 unambiguously aligned positions, using the same model (alpha = 0.37). Remarkably, 10 of the 25 non-chimeric phylotypes that could not be attributed to known lineages of eukaryotes are now robustly identified as fast-evolving members of different well-known groups (mainly alveolates), and five other phylotypes can be linked to recently published sequences of various small eukaryotic lineages (Figures 2 and 3). Figure 5 summarizes the proportion of phylotypes belonging to each of the higher-level eukaryotic groups identified in EES as previously published (Figure 5A) and after our re-analysis (Figure 5B).

Discussion

Our study, based on a detailed visual screening of an exhaustive alignment of SSU rRNA gene sequences of eukaryotes and the phylogenetic re-analysis of previously published environmental sequences using Bayesian methods, shows that at least 18 of the 28 previously published phylotypes proposed to represent novel high-level eukaryotic diversity were misidentified (Table 1). Three main sources of errors are responsible for this situation.

Undetected chimeric sequences

When performing PCR amplifications of SSU rRNA genes on total environmental DNA extracts, chimeric sequences are easily formed because highly conserved regions of ribosomal genes can anneal even between sequences from distantly related organisms. As a result, chimeras can represent a relatively large proportion of environmental sequences [12,13]. In our samples, at least 10 out of the 58 phylotypes we obtained (about 17%) could be identified as chimeras (see Additional file 2). Comparisons with previous studies are difficult, because although most authors checked for the presence of chimeric sequences in their data, some do not indicate precisely how many clones were sequenced and how many of them were chimeras. However, we found at least 40 undetected chimeras among previously published environmental sequences, three of which had been considered as novel high-level taxa (Figure 5, Table 1). The fact that chimeras represent about 14% of the 289 phylotypes we re-analysed is of concern, given that chimeric sequences are a source of artifactual diversity and can bias phylogenetic reconstructions [14].

These results support the idea that the methods generally used for the identification of chimeras might be misleading [13]. In order to detect potential chimeric sequences, programs such as CHECK_CHIMERA [15] can be used. However, the efficiency of these programs depends largely on the completeness of the databases. When a chimera is composed of two parts for which no closely related sequences are available, then either part will have low similarity to all sequences in the database. Furthermore, the asymmetric composition of some chimeric sequences (that is, chimeras in which the putative breakpoint is very close to one extremity of the sequence) can drastically limit the available signal for their detection. Of the 10 chimeras we detected in our sequences, only five could be unambiguously identified as such using CHECK_CHIMERA. A thorough visual checking of all putative new phylotypes for the presence of specific sequence tags might thus prove a more efficient way to detect chimeras.

Long-branch attraction artifacts

The high heterogeneity of the rates of substitution in the SSU rRNA gene sequences of eukaryotes is a second source of errors for an accurate evaluation of the diversity in EES. López-García et al. [8] showed that the two undetermined phylotypes BOLA267 [5] and C3_E014 [7] belong to some as yet undescribed, fast-evolving, apicomplexan lineage. We decided to screen the 23 remaining non-chimeric, undetermined phylotypes by eye to look for rare sequence signatures that would also support their inclusion in already known eukaryotic supergroups. At least eight sequences displayed such signatures, suggesting that they are not representatives of novel high-level, taxonomic diversity, but fast-evolving members of well-known groups, such as lobose amoebae or apicomplexan alveolates. These results were strongly confirmed by our Bayesian analyses, which correctly placed all 8 sequences (Figures 2 and 3; Table 1).

Because of the well-known long-branch attraction (LBA) phenomenon [16], fast-evolving sequences tend to be artifactually attracted to each other in phylogenetic trees [17]. In the case of eukaryotic phylogenies, this is especially problematic when prokaryotic sequences are used to root the trees, because distant outgroup sequences act as long branches [18]. The resulting topologies often merely correspond to a 'sequential attachment of longer and longer branches in the absence of any evolutionary signal' [19]. A possible solution to this problem is to avoid the use of prokaryotic outgroup sequences in eukaryotic phylogenies. We recently used this approach to demonstrate the relationship between Foraminifera and Cercozoa, in spite of the extreme SSU rRNA gene divergence between both groups [20]. Besides, the rapid accumulation of SSU rRNA sequences in the databases constantly diminishes the risks of LBA artifacts in phylogenetic analyses. With an increasing taxonomic sampling, the chances of finding slowly evolving taxa closely related to the fast-evolving ones increase. This can hopefully help to position the long branches correctly in the trees. However, the problem of the position of the root persists in the case of phylogenetic analyses performed without obvious outgroup sequences because it can still be argued that the root lies along the stem-branch of one of the apparently fast-evolving lineages. Other evidence, such as rare genomic rearrangements, should hopefully help resolve this problem. Following recent hypotheses on the position of the root of the eukaryotic tree [21,22], we decided to root our eukaryotic phylogeny between unikonts (opisthokonts + Amoebozoa) and bikonts. Under these conditions, we are confident that the topology obtained best reflects the true phylogenetic signal present in the sequences, and that LBA artifacts are minimized.

Incomplete molecular sampling of described eukaryotes

The incompleteness of molecular databases for known eukaryotes is a third source of misinterpretation of the results of EES. The identification of molecular phylotypes as novel eukaryotic groups is correct only if we can be sure that these phylotypes do not belong to some described, but as yet unsequenced eukaryotes. Unfortunately, the proportion of eukaryotic taxa for which no molecular data exist is still relatively high. To our knowledge, SSU rRNA data are available only for about 35 of the 170 genera considered as amoebae and flagellates of uncertain affinities in a recent taxonomic review of [23]. Besides, no molecular data exist yet for some higher-level, morphologically well-defined taxa, such as the Hemimastigophora, the testate, lobose , or for some members of the polyphyletic heliozoans, and for many families of the testate, filose amoebae and the so-called 'ramicristate amoebae' (gymnamoebae) [23], whose monophyly is uncertain.

There are several examples that show how the putatively novel eukaryotic phyla disappear with publication of new

Page 9 of 13 (page number not for citation purposes) BMC Biology 2004, 2 http://www.biomedcentral.com/1741-7007/2/13

sequences. When Edgcomb et al. [7] published their good PCR amplifications even from cultured organisms results, no SSU rRNA sequences from flagellates (A. Smirnov, personal communication). It is not surpris- were available, and data on other so-called excavate taxa ing that these eukaryotes are rarely found in EES. were scarce. Re-analysis of their sequences in light of the results published by Simpson et al. [24] reveals that two of In our EES of the River Seymaz, we decided to amplify the phylotypes that previously did not show any close only the second half of the SSU rRNA gene, which is resemblance to molecularly described groups of eukaryo- generally more conserved both in sequence length and tes (CS_E022 and C1_E027) turned out to be related to primary structure, in the hope of avoiding negative com- Jakoba incarcerata and the clade comprising Carpediemonas petition against SSU rRNA gene sequences of unusual + Retortamonas + diplomonads, respectively (Figure 2). length or high divergence. The analysis of our data reveals Similarly, the phylotype AT4-11 [9] is related to the apu- the presence of many common eukaryotic groups, includ- somonad Amastigomonas (Figure 2). The same applies to ing , cercozoans, chlorophytes, diatoms, and fungi, the potential new diversity within known eukaryotic which are expected to be present in a freshwater environ- supergroups. Dawson and Pace [5] obtained three ment like the small river, the Seymaz (Figure 1, Additional sequences forming a novel alveolate lineage near apicom- file 2). However, some groups of common protists that plexans (BOLA176, BOLA553, and BOLA914). In light of were repeatedly observed microscopically in the same area the results published by Kuvardina et al. [25], Leander et over previous years, such as lobose amoebae and eugleno- al. [26] showed that it corresponds to the family Col- zoans (R. Peck, personal communication), were widely podellidae. 
Likewise, we recently obtained the first SSU under-represented or even absent from the sequences we rRNA gene sequence from the heliozoan-like Sticholonche obtained (Figure 1, Additional file 2). We suspect that this zanclea [27]. This sequence turned out to be closely related discrepancy might also apply to previously published EES to Acantharea and Polycystinea (data not shown), and it of marine biotopes. The use of several combinations of corresponds to the previously published environmental universal and/or specific primers, coupled with the use of 'radiolarian' phylotypes DH145-KW16 [28] and CS_E043 a range of different PCR conditions, might allow a more [7]. Obtaining molecular data on a comprehensive sam- realistic qualitative sample of the diversity of organisms pling of described protists is thus of prime importance to present in a given biotope, although this would never be avoid over-interpretation of the diversity revealed by EES. guaranteed.

Other pitfalls of eukaryotic environmental DNA surveys Identifying novel eukaryotic lineages The correct identification of higher-level phyla is only one After carefully re-analyzing most available near full-length of the problems related to EES. Another obvious problem environmental eukaryotic sequences, we found that the is the accurateness of the diversity inferred from EES data. number of supposedly novel higher-level phylotypes that Whether molecular surveys correctly represent the eukary- cannot be included in defined eukaryotic supergroups is otic diversity in a given biotope is of crucial importance much smaller than enthusiastically proclaimed by the for inferring accurate ecological conclusions from the authors of some previous studies [5,7] (see Figure 5 and samples. Foraminifera are a good example of an impor- Table 1). Among 28 phylotypes, three were identified as tant taxonomic group that is absent in all environmental chimeras, 10 were misplaced fast-evolving sequences, and surveys reported so far, although they are present in both five were identified after new molecular data on described planktonic and benthic marine biotopes, as well as in eukaryotes became available. Among the remaining 10 freshwater biotopes, including the small river we sampled candidates, three phylotypes (DH148-5-EKD18, [29]. This may be due to the extreme divergence of CS_R003, BOLA048) from three different EES form a foraminiferan ribosomal genes, which cannot be directly strongly supported clade with two of the phylotypes we amplified with most known universal primers [30], obtained in our samples (Sey010 and Sey017), suggesting although other explanations, such as a low abundance of that they belong to a group of organisms present in all Foraminifera in the samples, cannot be discarded. Apart types of environment (Figures 1 and 2). 
Another candi- from sequence divergence, the wide range of possible date cluster that passed our checking procedure comprises lengths for eukaryotic SSU rRNA genes can also be an the phylotypes C3_E012 and C2_E026 [7] and the BOL2 important limiting factor during PCR amplifications or cluster [5] (Figure 2). Finally, three isolated phylotypes cloning. Even with appropriate primers, it is doubtful that from previous EES might also represent novel high-level complete SSU rRNA gene sequences of more than 3,000 diversity: AT4-68 [8], DH145-EKD11 [3], and BAQA065 nucleotides, such as those of most foraminiferans, many [5] (Figure 2). euglenozoans, and some amoebozoans, would amplify or be cloned in the presence of competing sequences of nor- Although these phylotypes passed our checking, it is pre- mal length. Finally, in case of some eukaryotes (lobose mature to claim that they truly represent novel eukaryotic amoebae, actinophryid heliozoans) it is difficult to obtain kingdoms. First, we cannot exclude the possibility that the

Page 10 of 13 (page number not for citation purposes) BMC Biology 2004, 2 http://www.biomedcentral.com/1741-7007/2/13

three phylotypes represented by single sequences are multigenic databases [34,35]. Following this view, most amplification artifacts, especially in the case of the of the eukaryotic diversity can be distributed into eight extremely divergent sequence BAQA065, or as yet unde- 'supergroups' [36], with a limited number of possibly tected chimeras. It is also uncertain what the real nature of independent, smaller, high-level lineages such as apuso- the two clusters of undetermined phylotypes is. The fact zoans or centroheliozoans [37-39]. Most of the taxa that that sequences belonging to these two clades were inde- were traditionally considered early diverging branches of pendently found in several different EES indicates that the eukaryotic tree [40] are now seen as highly derived they probably represent real taxonomic lineages. With an members of groups belonging to the so-called crown of increasing taxonomic sampling and/or the development eukaryotes [41]. It seems, therefore, that the eukaryotes, in of better phylogenetic tools, it might ultimately be possi- terms of cytological innovations and fundamental body ble to link these clusters with already known groups of plans, are much less diverse than previously thought; the eukaryotes. In the tree shown in Figure 2, all fast-evolving opposite view that emerged at the dawn of the molecular phylotypes of unknown affiliation are grouped in a clade systematics era was strongly biased by LBA artifacts. Fur- that also includes the jakobid flagellates and the discic- thermore, the whole diversity of eukaryotes may even be ristates (Heterolobosea + ). Because these reduced to a single basal bifurcation between unikonts sequences are fast evolving (especially the clade present in and bikonts [22]. 
However, the existence among extant our samples), we cannot exclude the possibility that their eukaryotes of truly ancient lineages predating the grouping in the tree is the result of LBA artifacts that even unikont/ divergence cannot be excluded. The detec- Bayesian analyses and a large taxa sampling could not tion of such early diverging organisms, if they exist, might avoid. Supposing that this is not the case, however, these prove difficult and necessitate different molecular phylotypes might belong to the recently proposed super- approaches, such as the use of randomly modified eukary- group of excavates [31]. Alternatively, these sequences otic primers. In this respect, the use of cultivation-inde- might belong to some extremely fast-evolving apicompl- pendent identification of eukaryotes by PCR exan parasites, as suggested by some distance analyses per- amplification of SSU rRNA gene sequences should not be formed on a larger dataset (data not shown), and as neglected, provided that results of such EES are correctly proposed by Cavalier-Smith [32] in a similar, simultane- interpreted, and the pitfalls discussed in our study are ous study. No clear sequence signature could be detected circumvented. to support the inclusion of any of these phylotypes in the apicomplexan alveolates. However, such signatures are Conclusions secondarily absent in some of the fastest evolving Environmental DNA surveys undoubtedly contribute to sequences of gregarines known to date. Finally, it is possi- unraveling many novel eukaryotic lineages. In view of our ble that these sequences represent as yet unrecognized results, however, there is no clear evidence for a spectacu- nucleomorphs, which are generally characterized by rapid lar increase of the diversity at a megaevolutionary level. rates of substitution [33]. 
This is in agreement with the recent view of eukaryotic evolution, proposing that most of the known diversity of Whatever hypothesis is correct, the influence of LBA will eukaryotes can be attributed to a relatively small number be difficult to disprove convincingly in the case of such of 'supergroups'. After re-analysis of previously published fast-evolving sequences. Therefore, the only way to ascer- data, we found only five candidate lineages of possibly tain the nature of these putative novel high-level taxa is to novel high-level eukaryotic taxa, four of which are typi- identify them in environmental samples using such cally fast evolving. Only two of these lineages comprise approaches as the fluorescent in situ hybridization. This several phylotypes that were found independently in dif- technique was successfully used by Massana et al. [10] to ferent studies, suggesting that they represent real taxo- identify representatives of two novel lineages of strameno- nomic lineages. To ascertain their taxonomic status, piles. One of these lineages was shown to be an important however, the organisms themselves must now be component of the total stock of bacterial grazers in a identified. coastal environment [10]. Similarly, the novel eukaryotic lineages that might be revealed with this approach might Methods turn out to be quantitatively and/or ecologically impor- Sediment was sampled in the small river, the Seymaz tant members of the biotopes to which they belong. (Geneva, Switzerland), in May and June, 2002. Total DNA extractions were performed following a protocol modified How large is the novel eukaryotic mega-diversity? from Zhou et al. [42], as detailed in Holzmann et al. [29]. 
The fact that most of the new phylotypes discovered in A fragment of about one half of the SSU rRNA gene was EES can be attributed to already known supergroups of amplified by PCR with the universal primers s12.2 (5'- eukaryotes is not surprising given the new view of eukary- GATYAGATACCGTCGTAGTC-3') and sB (5'-TGATCCT- otic evolution that is emerging from recent analyses of TCTGCAGGTTCACCTAC-3'). PCR amplifications, purifi-

Page 11 of 13 (page number not for citation purposes) BMC Biology 2004, 2 http://www.biomedcentral.com/1741-7007/2/13

cations, cloning and sequencing were done as described Supplementary Table 1. Identification of the 81 environmental eukaryotic elsewhere [43]. sequences we obtained from our samples of the small river, the Seymaz (Geneva, Switzerland). Two phylotypes of undetermined taxonomic posi- tion are highlighted in blue. SSU rRNA gene sequences were aligned manually with the Click here for file Genetic Data Environment software, version 2.2 [44], fol- [http://www.biomedcentral.com/content/supplementary/1741- lowing a secondary structure model [45]. Chimeras were 7007-2-13-S2.xls] identified by visual screening of the alignment in search of contradictory sequence signatures, and confirmed by par- Additional File 3 tial treeing analysis [12,46]. PAUP* [47] was used for Supplementary Table 2. Identification of the 403 published, environmen- minimum evolution analyses using the GTR model of tal eukaryotic sequences we re-analysed. Previously undetected chimeras are highlighted in pink. Phylotypes previously considered as novel eukary- substitution [48,49], and taking into account a gamma- otic lineages, which are in fact fast-evolving members of known groups are shaped distribution of the rates of substitution among highlighted in orange. Phylotypes that could be identified thanks to an sites, with eight rate categories. Maximum likelihood-cor- increasing taxon sampling are highlighted in green. Remaining phylotypes rected estimates of the distances were used, and parame- of undetermined taxonomic position are highlighted in blue. ters were estimated from the dataset. Bayesian analyses Click here for file were performed with MrBayes, version 3.0b4 [50], using [http://www.biomedcentral.com/content/supplementary/1741- 7007-2-13-S3.xls] the GTR + G model, as above. For each dataset, the chains were run for 2,500,000 generations, and 25,000 trees were sampled. 
The first 5,000 sampled trees, corresponding to the initial phase before the chains reach stationarity Acknowledgements (burn-in), were discarded. The reliability of internal The authors wish to thank Louisette Zaninetti, Alexey Smirnov, Robert branches was assessed using the posterior probabilities Peck, and Juan Montoya for helpful discussion. This work was supported by (PP) calculated with MrBayes. Alternatively, the bootstrap the Swiss NSF grant 3100-064073 and 3100A0-100415. method [51] was used with 10,000 replicates for mini- mum evolution analyses, as described above. The 48 non- References chimeric phylotypes reported in this paper have been 1. Barns SM, Delwiche CF, Palmer JD, Pace NR: Perspectives on archaeal diversity, thermophily and monophyly from envi- deposited in the EMBL/GenBank database under acces- ronmental rRNA sequences. Proc Natl Acad Sci USA 1996, sion numbers AY605183 to AY605230. 93:9188-9193. 2. Hugenholtz P, Goebel BM, Pace NR: Impact of culture-independ- ent studies on the emerging phylogenetic view of bacterial List of abbreviations used diversity. J Bacteriol 1998, 180:4765-4774. EES, eukaryotic environmental DNA survey; LBA, long- 3. López-García P, Rodríguez-Valera F, Pedrós-Alió C, Moreira D: Unexpected diversity of small Eukaryotes in deep-sea Ant- branch attraction; SSU rRNA, small-subunit ribosomal arctic plankton. Nature 2001, 409:603-607. RNA 4. Moon-van der Staay SY, De Wachter R, Vaulot D: Oceanic 18S rDNA sequences from picoplankton reveal unsuspected eukaryotic diversity. Nature 2001, 409:607-610. Authors' contributions 5. Dawson SC, Pace NR: Novel kingdom-level eukaryotic diver- CB constructed and screened the sequence alignments, sity in anoxic environments. Proc Natl Acad Sci USA 2002, 99:8324-8329. carried out all phylogenetic analyses, drafted the manu- 6. Amaral Zettler LA, Gómez F, Zettler E, Keenan BG, Amils R, Sogin script and prepared all figures and tables. 
JF performed ML: Eukaryotic diversity in Spain's River of Fire. Nature 2002, total DNA extractions and carried out PCR amplifications, 417:137. 7. Edgcomb VP, Kysela DT, Teske A, de Vera Gomez A, Sogin ML: Ben- purifications, cloning and sequencing. JP supervised the thic eukaryotic diversity in the Guaymas Basin hydrothermal study and participated in the preparation of the manu- vent environment. Proc Natl Acad Sci USA 2002, 99:7658-7662. 8. López-García P, Philippe H, Gail F, Moreira D: Autochthonus script. All authors read and approved the final eukaryotic diversity in hydrothermal sediment and experi- manuscript. mental microcolonizers at the Mid-Atlantic ridge. Proc Natl Acad Sci USA 2003, 100:697-702. 9. van Hannen EJ, Mooij W, van Agterveld MP, Gons HJ, Laanbroek HJ: Additional material Detritus-dependent development of the microbial commu- nity in an experimental system: qualitative analysis by dena- turing gradient gel electrophoresis. Appl Environ Microbiol 1999, Additional File 1 65:2478-2484. 10. Massana R, Guillou L, Díez B, Pedrós-Alió C: Unveiling the organ- Supplementary Figure 1. Illustration of the methods we used for the detec- isms behind novel eukaryotic ribosomal DNA sequences tion of chimeric sequences. from the ocean. Appl Environ Microbiol 2002, 68:4554-4558. Click here for file 11. Moreira D, López-García P: The molecular ecology of microbial [http://www.biomedcentral.com/content/supplementary/1741- Eukaryotes unveils a hidden world. Trends Microbiol 2002, 7007-2-13-S1.pdf] 10:31-38. 12. Hugenholtzt P, Huber T: Chimeric 16S rDNA sequences of diverse origin are accumulating in the public databases. Int J Additional File 2 Syst Evol Microbiol 2003, 53:289-293.


13. Robison-Cox JF, Bateson MM, Ward DM: Evaluation of nearest-neighbor methods for detection of chimeric small-subunit rRNA sequences. Appl Environ Microbiol 1995, 61:1240-1245.
14. Liesack W, Weyland H, Stackebrandt E: Potential risks of gene amplification by PCR as determined by 16S rDNA analysis of a mixed-culture of strict barophilic bacteria. Microb Ecol 1991, 21:191-198.
15. Cole JR, Chai B, Marsh TL, Farris RJ, Wang Q, Kulam SA, Chandra S, McGarrell DM, Schmidt TM, Garrity GM, Tiedje JM: The Ribosomal Database Project (RDP-II): previewing a new autoaligner that allows regular updates and the new prokaryotic . Nucleic Acids Res 2003, 31:442-443.
16. Felsenstein J: Cases in which parsimony or compatibility methods will be positively misleading. Syst Zool 1978, 27:401-410.
17. Philippe H: Opinion: Long branch attraction and phylogeny. Protist 2000, 151:307-316.
18. Wheeler WC: Nucleic acid sequence phylogeny and random outgroups. Cladistics 1990, 6:363-368.
19. Stiller JW, Hall BD: Long-branch attraction and the rDNA model of early eukaryotic evolution. Mol Biol Evol 1999, 16:1270-1279.
20. Berney C, Pawlowski J: Revised small subunit rRNA analysis provides further evidence that Foraminifera are related to Cercozoa. J Mol Evol 2003, 57(Suppl 1):120-127.
21. Stechmann A, Cavalier-Smith T: Rooting the Eukaryote tree by using a derived gene fusion. Science 2002, 297:89-91.
22. Stechmann A, Cavalier-Smith T: The root of the Eukaryote tree pinpointed. Curr Biol 2003, 13:R665-R666.
23. Lee JJ, Leedale GF, Bradbury P: An Illustrated Guide to the Protozoa 2nd edition. Lawrence, Kansas: Society of Protozoologists; 2000.
24. Simpson AGB, Roger AJ, Silberman JD, Leipe DD, Edgcomb VP, Jermiin LS, Patterson DJ, Sogin ML: Evolutionary history of "early-diverging" Eukaryotes: the excavate taxon Carpediemonas is a close relative of . Mol Biol Evol 2002, 19:1782-1791.
25. Kuvardina ON, Leander BS, Aleshin VV, Myl'nikov AP, Keeling PJ, Simdyanov TG: The phylogeny of Colpodellids (Alveolata) using small subunit rRNA gene sequences suggests they are the free-living sister group to Apicomplexans. J Eukaryot Microbiol 2002, 49:498-504.
26. Leander BS, Kuvardina ON, Aleshin VV, Myl'nikov AP, Keeling PJ: Molecular phylogeny and surface morphology of Colpodella edax (Alveolata): insights into the phagotrophic ancestry of Apicomplexans. J Eukaryot Microbiol 2003, 50:334-340.
27. Nikolaev SI, Berney C, Fahrni JF, Bolivar I, Polet S, Mylnikov AP, Aleshin VV, Petrov NB, Pawlowski J: The twilight of Heliozoa and rise of Rhizaria, a new supergroup of amoeboid Eukaryotes. Proc Natl Acad Sci USA 2004, 101:8066-8071.
28. López-García P, Rodríguez-Valera F, Moreira D: Toward the monophyly of Haeckel's Radiolaria: 18S rRNA environmental data support the sisterhood of Polycystinea and Acantharea. Mol Biol Evol 2002, 19:118-121.
29. Holzmann M, Habura A, Giles H, Bowser SS, Pawlowski J: Freshwater Foraminiferans revealed by analysis of environmental DNA samples. J Eukaryot Microbiol 2002, 50:135-139.
30. Pawlowski J: Introduction to the molecular systematics of Foraminifera. Micropaleontology 2000, Suppl 1:1-112.
31. Cavalier-Smith T: The phagotrophic origin of Eukaryotes and phylogenetic classification of Protozoa. Int J Syst Evol Microbiol 2002, 52:297-354.
32. Cavalier-Smith T: Only six kingdoms of life. Proc R Soc Lon B Biol Sci 2004 in press.
33. Van de Peer Y, Rensing SA, Maier UG, De Wachter R: Substitution rate calibration of small subunit ribosomal RNA identifies chlorarachniophyte endosymbionts as remnants of green algae. Proc Natl Acad Sci USA 1996, 93:7732-7736.
34. Baldauf SL, Roger AJ, Wenk-Siefert I, Doolittle WF: A kingdom-level phylogeny of Eukaryotes based on combined protein data. Science 2000, 290:972-977.
35. Bapteste E, Brinkmann H, Lee JA, Moore DV, Sensen CW, Gordon P, Duruflé L, Gasterlan T, Lopez P, Müller M, Philippe H: The analysis of 100 genes supports the grouping of three highly divergent amoebae: Dictyostelium, Entamoeba, and Mastigamoeba. Proc Natl Acad Sci USA 2002, 99:1414-1419.
36. Baldauf SL: The deep roots of Eukaryotes. Science 2003, 300:1703-1706.
37. Atkins MS, McArthur AG, Teske AP: : a new phylogenetic lineage among the closely related to the common ancestor of Metazoans, Fungi, and Choanoflagellates (Opisthokonta). J Mol Evol 2000, 51:278-285.
38. Cavalier-Smith T, Chao EEY: Phylogeny of Choanozoa, Apusozoa, and other Protozoa and early Eukaryote megaevolution. J Mol Evol 2003, 56:540-563.
39. Cavalier-Smith T, Chao EEY: Molecular phylogeny of , a novel lineage of bikont Eukaryotes that arose by ciliary loss. J Mol Evol 2003, 56:387-396.
40. Sogin ML: Early evolution and the origin of Eukaryotes. Curr Opin Genet Dev 1991, 1:457-463.
41. Philippe H, Germot A: Phylogeny of Eukaryotes based on ribosomal RNA: long-branch attraction and models of sequences evolution. Mol Biol Evol 2000, 17:830-834.
42. Zhou J, Bruns AM, Tiedje JM: DNA recovery from soils of diverse composition. Appl Environ Microbiol 1996, 62:316-322.
43. Fahrni JF, Bolivar I, Berney C, Nassonova E, Smirnov A, Pawlowski J: Phylogeny of lobose amoebae based on actin and small-subunit ribosomal RNA genes. Mol Biol Evol 2003, 20:1881-1886.
44. Larsen N, Olsen GJ, Maidak BL, McCaughey MJ, Overbeek R, Macke TJ, Marsh TL, Woese CR: The ribosomal database project. Nucleic Acids Res 1993, 21:3021-3023.
45. Wuyts J, De Rijk P, Van de Peer Y, Pison G, Rousseeuw P, De Wachter R: Comparative analysis of more than 3000 sequences reveals the existence of two pseudoknots in area V4 of eukaryotic small subunit ribosomal RNA. Nucleic Acids Res 2000, 28:4698-4708.
46. Kopczynski ED, Bateson MM, Ward DM: Recognition of chimeric small-subunit ribosomal DNAs composed of genes from uncultivated microorganisms. Appl Environ Microbiol 1994, 60:746-748.
47. Swofford DL: PAUP*, phylogenetic analyses using parsimony (* and other methods) Sunderland, Massachusetts: Sinauer Associates; 1998.
48. Lanave C, Preparata G, Saccone C, Serio G: A new method for calculating evolutionary substitution rates. J Mol Evol 1984, 20:86-93.
49. Rodriguez F, Oliver JL, Marin A, Medina JR: The general stochastic model of nucleotide substitution. J Theor Biol 1990, 142:485-501.
50. Huelsenbeck JP, Ronquist F: MrBayes: Bayesian inference of phylogenetic trees. Bioinformatics 2001, 17:754-755.
51. Felsenstein J: Confidence limits on phylogenies: an approach using the bootstrap. Evolution 1985, 39:783-791.
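Supplementary worked example. The Bayesian sampling figures reported in the Methods (2,500,000 generations, 25,000 sampled trees, the first 5,000 discarded as burn-in) can be checked with a few lines of arithmetic. The Python sketch below is an illustration only; it is neither MrBayes code nor the pipeline used in this study. It assumes that trees were sampled at evenly spaced intervals, and it shows on a made-up toy sample how a posterior probability can be read as the frequency of a clade among the trees retained after burn-in. The function names, the representation of a tree as a set of clades, and the toy data are all hypothetical and were introduced for the example.

```python
from fractions import Fraction

# Figures reported in the Methods.
GENERATIONS = 2_500_000
TREES_SAMPLED = 25_000
BURN_IN_TREES = 5_000


def sampling_summary(generations, trees_sampled, burn_in_trees):
    """Return (sampling interval, retained trees, burn-in fraction).

    Assumes evenly spaced sampling; this is an assumption of the sketch.
    """
    interval = generations // trees_sampled
    retained = trees_sampled - burn_in_trees
    burn_in_fraction = Fraction(burn_in_trees, trees_sampled)
    return interval, retained, burn_in_fraction


def clade_support(sampled_trees, clade, burn_in):
    """Fraction of post-burn-in trees that contain `clade`.

    Hypothetical data model: each tree is a set of clades, and each
    clade is a frozenset of taxon names.
    """
    kept = sampled_trees[burn_in:]
    hits = sum(1 for tree in kept if clade in tree)
    return hits / len(kept)


interval, retained, fraction = sampling_summary(
    GENERATIONS, TREES_SAMPLED, BURN_IN_TREES
)

# Toy sample (invented): 10 trees, the first 2 treated as burn-in.
ab = frozenset({"A", "B"})
ac = frozenset({"A", "C"})
toy_trees = [{ac}, {ac}, {ab}, {ab}, {ab}, {ac}, {ab}, {ab}, {ab}, {ab}]
toy_support = clade_support(toy_trees, ab, burn_in=2)

print(interval, retained, fraction, toy_support)
```

Under the even-spacing assumption, the reported figures correspond to one tree sampled every 100 generations, 20,000 trees retained, and one fifth of the sample discarded as burn-in. In the toy sample, the clade (A, B) occurs in seven of the eight retained trees.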
