<<

Received 13 November 2003 Accepted 3 February 2004 Published online 17 May 2004

Only six kingdoms of life Thomas Cavalier-Smith Department of Zoology, University of Oxford, South Parks Road, Oxford OX1 3PS, UK ([email protected]) There are many more phyla of microbes than of macro-organisms, but microbial biodiversity is poorly understood because most microbes are uncultured. Phylogenetic analysis of rDNA sequences cloned after PCR amplification of DNA extracted directly from environmental samples is a powerful way of exploring our degree of ignorance of major groups. As there are only five eukaryotic kingdoms, two claims using such methods for numerous novel ‘-level’ lineages among anaerobic would be remarkable, if true. By reanalysing those data with 167 known species (not merely 8–37), I identified relatives for all 8–10 ‘mysterious’ lineages. All probably belong to one of five already recognized phyla (, , , Myzozoa, Loukozoa) within the basal kingdom , mostly in known classes, sometimes even in known orders, families or genera. This strengthens the idea that the ancestral was a mitochondrial aerobe. Analogous claims of novel bacterial divisions or kingdoms may reflect the weak resolution and grossly non-clock-like evolution of ribosomal rRNA, not genuine -level biological disparity. Critical interpretation of environmental DNA sequences suggests that our overall picture of microbial biodiversity at phylum or division level is already rather good and comprehensive and that there are no uncharacterized kingdoms of life. However, immense lower-level diversity remains to be mapped, as does the root of the tree of life. Keywords: kingdoms of life; Protozoa; gregarines; Amoebozoa; environmental PCR; eukaryote phylogeny

1. INTRODUCTION kingdoms are probably not justified. Taxonomic rank depends on phenotypic disparity (Cavalier-Smith 1998); All six kingdoms of life contain both unicellular and asserting it from unidentified sequences on trees is funda- macroscopically visible organisms (Cavalier-Smith 1998). mentally unsound. This is especially true for trees based However, the higher-level classification of microbes has on rRNA genes, which contrary to widespread assump- lagged considerably behind that of macro-organisms. Yet, tions are grossly non-clock-like and suffer from major sys- in the revised six-kingdom system (Cavalier-Smith 2002a, tematic biases in evolutionary mode that can result in 2003a), 34 out of the 57 living phyla consist entirely or radically incorrect topologies with long branches being largely of unicellular species (see table 1). With the placed far too deeply in the tree (Philippe 2000; Cavalier- notable exceptions of Cyanobacteria and Spirochaetes, Smith 2002b). Large systematic biases are also found in which are sufficiently distinctive morphologically to be many protein trees (see Cavalier-Smith 2002b; Gribaldo & easily recognized microscopically, establishing the major Philippe 2002). Some proteins, e.g. Hsp90 (Stechmann & divisions or phyla of bacteria has depended on being able Cavalier-Smith 2003a), appear to be much more clock- to culture them, which has not yet been achieved for the like and less misleading in this respect. The gross errors majority of microbes. This has raised the question of how possible with rRNA trees were first revealed for microspo- complete at the level of kingdoms and phyla is our inven- ridia, which early rRNA trees implied were the most diver- tory of the Earth’s microbial biodiversity. A powerful way gent of all eukaryotes (Vossbrinck et al. 1987). Studies of of exploring this question is to sequence well-conserved numerous proteins and the analysis of a complete and phylogenetically informative genes from uncultured microsporidian genome have shown conclusively that samples of soil or water taken directly from the environ- are highly derived secondarily amitochondr- ment. This approach was pioneered with bacteria using ial fungi and not early-branching eukaryotes, and that PCR primers specific for the 16S rRNA to amplify their deeper branching position on even the best 18S environmental DNA extracts and to generate gene rRNA trees obtained so far (Van de Peer et al. 2000) is a libraries that could be cloned and sequenced at random mathematical artefact (Roger 1999; Cavalier-Smith (Giovannoni et al. 1990). Numerous lineages that are 2000a, 2002b; Williams et al. 2002; Keeling 2003). highly divergent from those known from cultures have Accordingly microsporidia are now classified within the thus been identified. A few of these are now being charac- kingdom Fungi, not Protozoa (Cavalier-Smith 1998, terized and this process has already revealed quite novel 2000b); molecular and cytological evidence indicates that groups of bacteria of considerable ecological, physiological they evolved from an ancestor within the lower fungal phy- and systematic importance (Morris et al. 2002; Sait et lum Archemycota, possibly a parasitic trichomycete al. 2002). (Cavalier-Smith 2000b; Keeling 2003). The degree of Claims that the apparently most deeply diverging of misplacement of microsporidia on the rRNA tree is so such lineages, e.g. the 14 eubacterial ones found in hot great that it far exceeds the signal supporting the positions springs (Hugenholtz et al. 1998) or korarcheota (Barns et of most other species within the tree, so the position of al. 1996), represent undescribed divisions (phyla) or any long-branch on the tree must be treated with

Proc. R. Soc. Lond. B (2004) 271, 1251–1262 1251  2004 The Royal Society DOI 10.1098/rspb.2004.2705 1252 T. Cavalier-Smith Only six kingdoms of life

Table 1. The six kingdoms of life and the 34 microbial phyla (based on Cavalier-Smith 1998, 2002a, 2003a,b). empire PROKARYOTA (Cavalier-Smith 2002b) kingdom Bacteria subkingdom Negibacteria (phyla Eobacteria, Sphingobacteria, Spirochaetae, Proteobacteria, Planctobacteria, Cyanobacteria) subkingdom Unibacteria (phyla Posibacteria, Archaebacteria) empire EUKARYOTA (Cavalier-Smith 1998) kingdom Protozoa (Cavalier-Smith 2002a, 2003a) subkingdom Sarcomastigota (phyla Amoebozoa, Choanozoa) subkingdom Biciliata infrakingdom (phyla Cercozoa, , Radiozoa) infrakingdom (phyla Loukozoa, , , Metamonada; the latter now includes Parabasalia and ; Cavalier-Smith 2003a,b) infrakingdom Alveolata (phyla Myzozoa (Cavalier-Smith & Chao 2004), Ciliophora) Biciliata incertae sedis: phylum Apusozoa (may be sister to Excavata); phylum Heliozoab kingdom Animalia (Myxozoa and 21 othera phyla) (Cavalier-Smith 1998; Cavalier-Smith & Chao 2003c) kingdom Fungi (phyla Archemycota, Microsporidia, , ) (Cavalier-Smith 2000b) kingdom Plantae subkingdom Biliphyta (phyla Glaucophyta, Rhodophyta) subkingdom Viridaeplantae (Chlorophyta, Bryophytaa, Tracheophytaa) kingdom Chromista subkingdom Cryptista (phylum Cryptista: cryptophytes, goniomonads, katablepharids) subkingdom Chromobiota infrakingdom Heterokonta (phyla Ochrophyta, Pseudofungi, Opalozoa (comprising subphyla Opalinata, Sagenista) infrakingdom Haptista (phylum Haptophyta) a No microbial members. All 34 phyla that contain microbes are listed. b Although centrohelid might be Chromista, they probably belong in Protozoa (Biciliata) (Cavalier-Smith & Chao 2003a). deep suspicion unless it is corroborated by independent belong within the phylum Myzozoa (=Miozoa) as sisters evidence. Such evidence is not available for purely to the dinoflagellates (Cavalier-Smith & Chao 2004). environmental sequences. Within eukaryotes both single- Because within most phyla there are many known and gene protein trees and protein trees combining data from well-described (whether of clear or obscure taxo- several or many genes have revealed that most published nomic position) that have not yet been cultured or had rRNA trees have fundamentally incorrect topologies for their rRNA sequenced, any study of environmental-DNA several major long-branch ; this is particularly true libraries is bound to come up with unidentifiable lineages for trees published before corrections for intramolecular that are not close to known sequences. But in eukaryotes, variation in evolutionary rates became de rigueur and unlike bacteria, most of these are likely to be from groups those with unduly sparse taxon sampling. with previously known representatives; such lineages are Despite these problems, rRNA phylogeny remains a simply unidentified, not totally new to science. valuable tool for initial explorations of biodiversity. Such Anaerobic aquatic habitats are ecologically important studies on cultured protists (unicellular eukaryotes) have (Fenchel & Finlay 1995) but have been undersampled by contributed to major recent improvements in eukaryotic culturing and ecological studies. It was formerly thought phylogeny and high-level classification (Silberman et al. that early eukaryotes might have been anaerobic (Cavalier- 2002; Simpson et al. 2002; Cavalier-Smith 2003a; Cava- Smith 1983a,b). However, all well-characterized groups of lier-Smith & Chao 2003a,b,c, 2004) and through synthesis anaerobes have now been shown to have had aerobic with morphological and other data to a considerable ancestors (Embley & Hirt 1998; Roger 1999; Silberman reduction in the number of phyla (Cavalier-Smith et al. 2002; Simpson et al. 2002), making it highly prob- 2003a) compared with a decade ago (Cavalier-Smith able that the ancestral eukaryote was aerobic (Cavalier- 1993a). PCR of DNA extracts from environmental Smith 2002a). Nonetheless, the remote possibility samples would a priori be expected to reveal many fewer remains that some little-known or entirely unknown deeply branching novel lineages than has been the case for primitively anaerobic eukaryote group might still exist, bacteria. This is because most eukaryote cells are much and anaerobic environments could still hold early- larger and morphologically much more complex than branching eukaryotic lineages important for understand- most bacteria, so centuries of microscopical study have ing the origin of the nucleated (eukaryotic) cell. The probably already revealed most of the major types, even recent studies of Dawson & Pace (2002) and Stoeck & of lineages that have never been cultured. This has been Epstein (2003) of DNA extracted from marine and fresh- borne out by the first such studies of eukaryotes from water anoxic sites are therefore of particular interest. They aerobic habitats, where almost all the so-called ‘novel’ lin- found that over 90% of their sequences could easily be eages discovered can be easily placed within the known identified as belonging to known kingdoms and phyla, but phyla (Lo´pez-Garcı´a et al. 2001; Moon-van der Staay et were not able to identify relatives for 13 and nine novel al. 2001). This is true even for the alveolate protozoa, sequences, respectively. Their conclusions that these new where the major groups of ‘novel lineages’ almost certainly sequences represent eight and six novel kingdoms,

Proc. R. Soc. Lond. B (2004) Only six kingdoms of life T. Cavalier-Smith 1253

CABOZOA unikonts (ancestrally bikonts uniciliate eukaryotes) Excavata (ancestrally biciliate eukaryotes) Metamonada Euglenozoa kingdom Plantae kingdom kingdom kingdom Percolozoa Viridaeplantae Rhodophyta Chromista Animalia Fungi (green plants) Choanozoa Loukozoa chromalveolates G Glaucophyta R Alveolata Rhizaria * Foraminifera plastid GAPDH Cercozoa* CHLOROPLAST replacement posterior cilium; flat polyubiquitin Heliozoa cortical alveoli (corticates) mitochondrial cristae; insertion EF1-α insertion Radiozoa ? ? Apusozoa Amoebozoa pyrimidine synthesis DHFR–TS fusion gene triple gene fusion empire MITOCHONDRION ancestral heterotrophic aerobic uniciliate eukaryote ca. 900 Myr ago Eukaryota endomembrane system; cytoskeleton; nucleus; cilium Archaebacteria ancestral neomuran hyperthermophily; N-linked Unibacteria isoprenoid ether lipids; glycoproteins; reverse gyrase histones H3/4 Posibacteria kingdom outer Bacteria membrane eubacteria loss empire Prokaryota Negibacteria

origin of life and photosynthesis ? 2.9–3.5 Gyr ago

Figure 1. The tree of life based on molecular, ultrastructural and palaeontological evidence. Contrary to widespread assumptions, the root is among the eubacteria, probably within the double-enveloped Negibacteria, not between the eubacteria and archaebacteria (Cavalier-Smith 2002b). Establishing its precise position with confidence is one of the most challenging questions in evolutionary biology and is not yet solved; it may lie between Eobacteria and other Negibacteria (Cavalier-Smith 2002b); even the internal branching order of eubacterial phyla is more uncertain than for eukaryotes. The position of the eukaryotic root has been nearly as controversial, but is less hard to establish: it probably lies between the unikonts and the bikonts (Lang et al. 2002; Stechmann & Cavalier-Smith 2002, 2003a). For clarity the basal eukaryotic kingdom Protozoa is not labelled; it comprises four major groups (Alveolata, cabozoa, Amoebozoa and Choanozoa) plus the phylum Apusozoa, which may be sister to the other bikonts or, possibly more likely, sister to excavates alone (Cavalier-Smith 2003a,b), and probably Heliozoa. Heliozoa are of uncertain phylogenetic position within the bikonts, and might be chromists (Cavalier- Smith & Chao 2003a) not earlier branching as shown here. Symbiogenetic cell enslavement occurred four or five times: in the origin of mitochondria and chloroplasts from different Negibacteria, of chromalveolates by the enslaving of a red alga (Cavalier-Smith 1999, 2003c; Harper & Keeling 2003) and in the origin of the green plastids of euglenoid (excavate) and chlorarachnean (cercozoan) algae—a green algal cell (G) was enslaved either by the ancestral cabozoan (dashed arrow) or (less likely) twice independently within excavates and Cercozoa (asterisks) (Cavalier-Smith 2003c). The ages of eukaryotes and archaebacteria are also controversial; reasons for scepticism over substantially earlier dates than shown are explained by Cavalier-Smith (2002b) and Cavalier-Smith & Chao (2003c). The thumbnail sketches show the four major kinds of cell in the living world, plus the most complex eukaryote cells of all: the cryptophytes (upper sketch, nucleus and nucleomorph in blue, mitochondrion in purple, chloroplast in green, periplastid space in red). The middle ones show in red the contrasting ancestral microtubular cytoskeleton (ciliary roots) of unikonts (a cone of single microtubules attaching the single centriole to the nucleus) and bikonts (two bands of microtubules attached to the posterior centriole and an anterior fan of microtubules attached to the anterior centriole); cilia and plasma membrane are black and the nucleus blue. The lower ones show the single plasma membrane of unibacteria (Posibacteria plus archaebacteria), which were ancestral to eukaryotes, and the double envelope of Negibacteria, which were ancestral to mitochondria and chloroplasts (which retained the outer membrane, shown in red). Eubacteria are a paraphyletic grade comprising Negibacteria and Posibacteria. respectively, are very surprising indeed, given that only relatives or yield an evolutionarily reliable topology. In five eukaryotic kingdoms are currently recognized (figure addition both studies were seriously flawed by a systematic 1). misrooting caused by including the very distant bacterial Have we really missed as much deep eukaryotic biodiv- outgroups. I have therefore reanalysed their data using ersity as these authors claim? No! Their conclusions are 167, not just 37 (Dawson & Pace 2002) or only 8–35 invalid, not only because one cannot assess rank from (Stoeck & Epstein 2003), known eukaryotic taxa rep- divergence depth on a molecular tree, but also because resenting all the major groups. As recent microscopic the key analyses shown in their figures include far too few studies have shown that the deep sea harbours many novel known eukaryotes to be able to identify their likely protists (Hausmann et al. 2002a), sometimes quite

Proc. R. Soc. Lond. B (2004) 1254 T. Cavalier-Smith Only six kingdoms of life extraordinary ones (Hausmann et al. 2002b), I have also that any of the sequences grouped with them and to test the included four environmental sequences from the deep sea robustness of the groupings found. Although these other trees not previously assigned to phyla that were more cautiously are not shown, the generalizations about grouping are based not suggested as possibly representing novel kingdoms only on figure 2 but also on these dozens of other trees. (Lo´pez-Garcı´a et al. 2003). My analysis of 193 sequences As rRNA trees cannot be reliably rooted using bacterial out- is, I think, the most taxonomically comprehensive and bal- groups because of long-branch artefacts (Philippe 2000), an anced phylogenetic study to date of eukaryotic 18S rRNA. independent method is needed. Figure 1 shows that the position Unsurprisingly, it does not support the idea that anaerobic of the eukaryote root can be unambiguously specified by the use habitats or the deep ocean contain uncharacterized eukar- of two complementary gene fusions. One is the derived fusion yotic kingdoms. All 26 mystery lineages discovered by the of dihydrofolate reductase (DHFR) and thymidine synthetase three studies can be placed within the established six- (TS) first noted by Philippe et al. (2000) and shown by Stech- kingdom classification (table 1) in established phyla and mann & Cavalier-Smith (2002, 2003a) to characterize all groups usually also in classes within the kingdom Protozoa, of bikont eukaryotes; by contrast all three groups except for one that proves to be an artefactual chimera of and Amoebozoa have separate DHFR and TS genes like their sequences from two phyla. bacterial ancestors (Stechmann & Cavalier-Smith 2002, 2003a). Dawson & Pace (2002) also interpreted their tree in This fusion shows that bikonts are a clade and that the root terms of my former hypothesis (Cavalier-Smith 1983a,b) cannot lie within them as it does on rRNA trees rooted by bac- that the earliest eukaryotes lacked mitochondria, asserting terial outgroups. The second is a fusion of the first three genes that seven clades were ‘deep branching’, as did Stoeck & of the pyrimidine biosynthesis pathway, which is found in ani- Epstein (2003) and another study that appeared after I mals, fungi and Amoebozoa, but not in bacteria or bikonts, completed the present analysis (Stoeck et al. 2003). How- showing that unikonts are a clade and that the root cannot lie ever, recent evidence indicates that the root of the eukary- among them. Therefore the eukaryote root must lie precisely ote tree lies among aerobic eukaryotes with mitochondria between the bikonts and the unikonts (Stechmann & Cavalier- (Lang et al. 2002; Simpson & Roger 2002; Stechmann & Smith 2003b). Precisely the same position for the root is shown Cavalier-Smith 2002, 2003a), not anaerobic ones as those by the cytosolic Hsp90 tree where the branch lengths of all the authors mistakenly assumed (figure 1). Therefore, as I eukaryote groups are relatively uniform, in marked contrast to shall explain, the derived positions of all their sequences the grossly non-clock-like rRNA, if it is rooted using the rela- on my tree give added support for the now prevailing view tively closely related endoplasmic reticulum paralogue and not that the last common ancestor of eukaryotes was aerobic the much more distant bacterial outgroups (Stechmann & and that mitochondria originated immediately following Cavalier-Smith 2003b). Concatenated trees based on mitochon- or even during the origin of the nucleus. drial proteins, which also have relatively uniform branch lengths (except for the aberrant Euglenozoa, which must be excluded), also place the root precisely between bikonts and unikonts (Lang 2. METHODS et al. 2002) when rooted using bacterial outgroups, which are The 12 unknown environmental sequences from fig. 4 of relatively much less distant than for rRNA. Ideally one should Dawson & Pace (2002), claimed to represent new kingdoms, also root the rRNA tree (figure 2) between the unikonts and the plus another of their sequences that they did not assign to a bikonts. In figure 2 this is not possible, as the Amoebozoa are known group (BOLA868), 10 phyletically unassigned sequences not cleanly separated from the bikonts, so it is rooted (only from Stoeck & Epstein (2003) and four from Lo´pez-Garcı´a et approximately correctly) between the opisthokonts and the al. (2001, 2003) were aligned with 167 sequences of known Amoebozoa/bikonts. organisms, representing all the major eukaryotic lineages, obtained from GenBank; wherever possible short-branch rep- 3. IDENTIFYING RELATIVES OF THE PUTATIVELY resentatives were chosen to minimize phylogenetic artefacts. As ANAEROBIC MYSTERY CLADES the 26 environmental sequences were incomplete, the terminal regions and obviously ambiguously aligned regions were omitted Figure 2 indicates that 22 out of the 24 sequences that from the phylogenetic analysis. Bacterial outgroups were also previous authors (Dawson & Pace 2002; Lo´pez-Garcı´a et omitted, as these are so distant that they would simply have arte- al. 2003; Stoeck & Epstein 2003) could not even place in factually joined the longest eukaryotic branch () established kingdoms form only 17 distinct groups, every and caused an artefactual rooting, as happened in the trees of one of which is related to known groups, seven belonging Dawson & Pace (2002) and Stoeck & Epstein (2003); in the in a single order of eugregarine . The rank former the bacterial branch was about twice as long as the of the known groups to which they can be assigned with Trepomonas/Hexamita part of the one. Omitting bac- reasonable confidence is highly variable, ranging from teria allowed the inclusion of 1044 nucleotide positions not just to phylum. I shall discuss them in the order shown 789 as included by Dawson & Pace (2002); Stoeck & Epstein by the numbered arrows in figure 2. (2003) did not specify how many they used. Neighbour-joining Three of these lineages belong in the protozoan phylum (BioNJ), weighted least-squares (power 2) distance analyses Amoebozoa. One (BOLA187/366) is very robustly sister (GTRϩ⌫ϩI model: ␣ = 0.614764; i = 0.120037; parameters cal- to the anaerobic uniciliate amoeboflagellate ‘ culated via Modeltest v. 3.06) and maximum parsimony were invertens’ (the quotes signify that the strain sequenced was carried out using PAUP∗ v. 4.0b10. In addition to the tree misidentified and is not even a Mastigamoeba)—it might shown in figure 2, a large number of other trees were also calcu- even belong in the same genus; this clade is now treated lated with differing taxon samples, including the addition of as the class Breviatea (Cavalier-Smith et al. 2004). The longer-branch taxa such as diplomonads, , Para- second (BOLA868) belongs in the order Euamoebida basalia, Percolozoa and Foraminifera to exclude the possibility (moderate support); most trees place it as sister to the

Proc. R. Soc. Lond. B (2004) Only six kingdoms of life T. Cavalier-Smith 1255 family , but in figure 2 it is sister to the (2001), unlike in Dawson & Pace (2002) where they Leptomyxidae instead (it was sister to the Amoebidae in appeared as three apparently unrelated clades, is also note- the corresponding bootstrap consensus trees, both dis- worthy as it suggests that their dispersal in the Dawson & tance and parsimony, with just over 50% support); it may Pace (2002) tree cannot be attributed solely to the belong in the Amoebidae. The third lineage (LEMD267) inclusion of eubacterial outgroups. However, Milyutina et is sister to Filamoeba on most trees (e.g. the bootstrapped al. (2001) included 10 Amoebozoa and used 1209 align- consensus trees corresponding to figure 2; distance 62% ment positions, compared with only 789 in Dawson & and parsimony 84% support) or less often to Filamoeba Pace (2002). Both factors would have improved their plus the myxogastrids (figure 2)—it might even belong to trees. the genus Filamoeba or in the family Filamoebidae, and it Lineage 4 (LEMD052/CCI78) belongs to the proto- is certainly a member of the order Varipodida and the zoan phylum Cercozoa and is a deep branch within the class Variosea (Cavalier-Smith et al. 2004). Thus none of subphylum (Cavalier-Smith & Chao 2003b); these sequences represents a new kingdom, phylum, class on most trees it is sister to with moderate or even order; as such a small fraction of the Amoebozoa bootstrap support (67% on the parsimony tree corre- has been sequenced, they may not even be new families, sponding to figure 2), but in figure 2 it is sister to all other genera or species. They are simply unidentified amoe- Cercozoa. The Dawson & Pace (2002) fig. 4 included only bozoan sequences. It cannot safely be assumed that all 20 two cercozoans (not the most divergent possible), not 12 novel uncultured lineages of Dawson & Pace (2002) and as here. This poor taxon sampling coupled with the use Stoeck & Epstein (2003) are anaerobic, as aerobic cysts of only 789 not 1044 positions probably explains why or other cells must sometimes enter anaerobic habitats. It LEMD052 did not group within the Cercozoa on their is likely, however, that the BOLA187/366 clade at least is tree (though it was nearby). Although the bootstrap sup- anaerobic like its sister ‘M. invertens’. The fact that this port for Cercozoa is low in figure 2 (probably because only short-branch clade does not group with other Amoebozoa partial sequences could be used because of the incom- (it does in some trees based on longer rRNA sequences; pleteness of the environmental ones), both LEMD052 and Bolivar et al. 2001) may be just a consequence of the very CCI78 possess the almost unique signature deletion that poor resolution at the base of the ribosomal rRNA tree, characterizes all Cercozoa (Cavalier-Smith & Chao probably caused by a combination of rapid early radiation 2003b,c). In separate analyses with over 80 cercozoan and saturation effects. In figure 2 even the Acanthamoeba sequences the LEMD052/CCI78 clade branches well clade does not group with the rest of the Amoebozoa, within them as sister to Phytomyxea with good support. At although it does with slightly different taxon samples or present only one anaerobic cercozoan is known (Cavalier- methods. Smith & Chao 2003b), and this novel clade does not The failure of Dawson & Pace (2002) to identify any group with it. We know from our own studies of environ- of these sequences as Amoebozoa probably partly arose mental cercozoan sequences that hundreds of cercozoan because their fig. 4 included only five Amoebozoa, not 40 sequences can be found that are not close to identified as here. Their fig. 4 specifically did not include strains, and we have identified a clade from aerobic habi- ‘M. invertens’, Filamoeba, , myxogastrids or tats in the same position on the rRNA tree as any Amoebidae or Leptomyxidae, the very groups to LEMD052/CCI78 (Bass & Cavalier-Smith 2004). Thus, which their sequences are related. Furthermore, their five in addition to not being a new kingdom or even phylum, Amoebozoa formed three apparently unrelated clades this cercozoan clade may not even be anaerobic; it could (four counting LEMD267) rather than a single one as be a new class or just deep-branching Phytomyxea. here and in another recent analysis with good taxon It is also questionable whether sequence CCA32 sampling (Bolivar et al. 2001). This emphasizes the (Stoeck & Epstein 2003) is from an anaerobe. Stoeck & importance of broad taxon sampling for obtaining sound Epstein (2003) did not claim that it represents a novel trees and making sweeping conclusions about the non- kingdom, as it grouped with strong support on their trees affinity of environmental clones to known lineages. Their with the apusozoan Ancyromonas, but it was included here tree and those of Stoeck & Epstein (2003) were probably because no sequence was previously known to group also distorted by including the very distant bacterial out- robustly with Ancyromonas. On distance trees it is weakly groups, as such extreme outgroups tend to pull the Myce- sister to the zooflagellate Diphylleia, not included in the tozoa, Dictyostelium and Physarum (and often also the tree of Stoeck & Epstein (2003), but in some parsimony ) towards them and away from the other analyses it groups weakly with AT4-68 in an unresolved Amoebozoa, as seen in the trees of Cavalier-Smith position deep among the bikonts. The taxonomic position (1993a), Milyutina et al. (2001) and Silberman et al. of Diphylleia has itself been problematic, but it is now (2002) compared with the better ones of Bolivar et al. classified in the class Diphyllatea within the protozoan (2001) and figure 2. However, the distortion of fig. 4 of phylum Apusozoa (Cavalier-Smith 2003a,b). Figure 2 Dawson & Pace (2002) and the artefactually wide disper- does not resolve the position of the Diphylleia/CCA32 sal of the Amoebozoa were far worse than in the clade, showing it as a very deep bikont branch not sister maximum-likelihood trees of Milyutina et al. (2001) that to any other group. However, very minor changes in the also included bacterial outgroups. The reasonably well- gamma correction parameters or in taxon sampling can supported grouping together of the Archamoebae, cause it to group with Apusomonadida and Ancyromonas, and the clade (mislabelled as acan- the other Apusozoa on the tree, as well as with the brevi- thamoebae in Dawson & Pace (2002): Hartmannella is not ates. This is consistent with CCA32 being a member of an acanthamoebid—it does not even belong in the same the class Diphyllatea of the phylum Apusozoa. Currently class; Cavalier-Smith et al. 2004) in Milyutina et al. no anaerobic Apusozoa are known; although there is no

Proc. R. Soc. Lond. B (2004) 1256 T. Cavalier-Smith Only six kingdoms of life

Nuclearia simplex Nuclearia delicatula Nuclearia sp. salmonis Amoebidium parasiticum Aurofeca richardsi 97/99 Choanozoa limacisporum Monosiga ovata Monosiga brevicollis Diaphanoeca grandis Anemonia sulcata opisthokonts unikonts Trichoplax adhaerens Mnemiopsis leidyi Scypha ciliata kingdom Animalia Tetilla japonica Chytridium confervae Glomus mossae Smittium culisetae kingdom Fungi Schizosaccharomyces pombe 99 Raphidiophrys ambigua Chlamydaster sterni Heterophrys marina Centrohelida Heliozoa AT4-68 Petalomonas cantuscygni Peranema trichophorum 17 Trachelomonas volvocina Euglena Phacus megalopsis mutabilis Khawkinea quartana 95 Eutreptia viridis Eutreptiella sp. Astasia Diplonema papillatum curvata diplonemid Ichthyobobdo necator Crithidia oncopelti Trypanosoma brucei Bodo saltans - Reclinomonas americana Rhynchomonas nasuta Euglenozoa libera Jakobea Jakoba incarcerata Loukozoa CCW8 Excavata Parvilucifera infectans 91 Monocystis agilis LEMD119 6 87 77 9 BAQA65div BOLA48 59 84 Leidyana migrator 82 Gregarina polymorpha 49 81 89 LEMD145 13 72 LEMD003 38 8 99/97 DH145-EKD1 CCW100 CCW75 69 Ophriocystis elektroscirra 81 Mattesia geminata Lecudina sp. Gregarinea CCI7 47 Selenidium vivax 7 14 Apicomplexa BOLA458 20 27 BOLA212 11 - CCI31 CCA38 12 76/84 BOLA267 58 CCI73 10 parasite of Ammonia beccarii Alveolata 55 Hepatozoon catesbianae Toxoplasma gondii 96 Babesia bigemina Theileria parva 97 Selenidium terebellae 96 Cryptosporidium serpentis Cryptosporidium parva 96 parasite of Tridacna Colpodella pontica Colpodella sp. 83 Heterocapsa triquetra 54 Noctiluca scintillans Perkinsus sp. Dinozoa 6 Amoebophrya sp. Myzozoa 4 35/28 Colpoda inflata Furgasonia blochmanni Paramecium tetraurelia Protocruzia adherens Ciliophora 27 Oxytricha nova Homalozoon vermiculare 61 Blepharisma americanum 97 Prymnesium patelliferum Pavlova aff. salina 99 Ochromonas danica Haptophyta Chromulina chromophila Skeletonema costatum Nitzschia apiculata Bolidomonas pacifica Fucus distichus Achlya bisexualis Phytophthora megasperma Heterokonta Chromobiota kingdom Chromista Developayella elegans 75 Hyphochytrium sp. Labyrinthuloides minuta 72 96 Siluania monomastiga Cafeteria roenbergensis 95 Wobblia lunata Euglypha rotunda Thaumatomonas sp. bikonts Cercomonas sp.21 Proleptomonas faecicola Cryothecomonas longipes Pseudodifflugia cf. gracilis Massisteria marina n-por filose Chlorarachnion reptans Cercozoa oviformis Haplosporidium nelsoni Rhizaria 51 Urosporidium crescens 84 Plasmodiophora brassicae 52 Phagomyxa odontellae Phytomyxea 64 85 LEMD052 CCI78 85 chaunacanthid sp. 4 arthracanthid sp. 92 Zea mays Mantoniella squamata 86 Nephroselmis olivacea Viridiplantae 76 Porphyra umbilicalis Stylonema alsidii kingdom Plantae 81 Porphyridium aerugineum Rhodophyta unidentified microheliozoan AT4-11 16 membranifera marina Eopharyngia 97 Trimastix pyriformis Metamonada Excavata 97 Streblomastix strix Anaeromonadea 86 Oxymonas sp. 30 Gephyramoeba sp. 35 LEMD267 3 Varipodida Variosea 15 - Filamoeba nolandi 62/ Physarum polycephalum 20/- 83 Didymium nigripes Stemonitis flavogenita Hyperamoeba sp. 2 Hyperamoeba flagellata Dictyostelium discoideum BOLA868 - Paraflabellula reniformis2 - sp. 69 Leptomyxidae Chaos nobile 59 Amoeba leningradensis Euamoebida Amoeba proteus Amoebidae 51/53 Saccamoeba limax Echinamoeba exundans Amoebozoa Hartmannella vermiformis Mastigamoeba sp. 96 commutans Phreatamoeba balamuthi unikonts 56 Mastigamoeba simplex Archamoebea Endolimax nana 58 Pelomyxa palustris gingivalis 82 Vexillifera armata Entamoeba coli stella 99 Neoparamoeba pemaquidensis Lingulamoeba leei Clydonella sp. Vannella miroides Glycostylea 99 plurinucleolus Vanellidae Vannella anglica 93 ‘Mastigamoeba invertens’ BOLA187 88 BOLA366 1 Breviatea Malawimonas jakobiformis Malawimonadea Loukozoa Excavata Apusomonas proboscidea Amastigomonas bermudensis 27 Amastigomonas sp. Millport 15/- Amastigomonas sp. CCAP Apusomonadida 30 AT-50 Thecomonadea Amastigomonas15 sp. ATCC50062 Apusozoa Amastigomonas mutabilis Ancyromonas sigmoides 32 Diphylleia rotans bikonts 19 CCA32 Diphyllatea Cyanophora paradoxa5 Cyanoptyche gloeocystis Glaucophyta kingdom Plantae Goniomonas truncata 98 Goniomonas sp. ATCC 50108 95 Chilomonas paramecium Cryptomonada Rhodomonas mariana kingdom Chromista Acanthamoeba castellani solitarium Amoebozoa unikonts

10% nucleotide substitution

Figure 2. (Caption opposite.) Only six kingdoms of life T. Cavalier-Smith 1257

Figure 2. Phylogenetic analysis (BioNJ) of 193 eukaryotic 2 of Fenchel et al. (1995), which has a groove 18S ribosomal RNA sequences using 1044 nucleotide (like Loukozoa and Percolozoa) and two posteriad cilia positions. The 26 environmental sequences not previously (unlike these two phyla or Euglenozoa, but like the postu- assigned to established groups are in bold, and the 17 clades lated missing link between Loukozoa and discicristates; they form are marked by red numbered arrows. Only Cavalier-Smith 2003b). bootstrap percentages of 95% or more are shown except for those directly relevant to the positions of the environmental The remaining eight lineages of Dawson & Pace (2002) clades: distance (1000 pseudoreplicates: BioNJ) upper or and Stoeck & Epstein (2003) all apparently belong in the left; parsimony (1000 pseudoreplicates) lower or right; as the phylum Myzozoa in the subphylum Apicomplexa, class tree is crowded, some are shown by the named clades not Gregarinea and order Eugregarinida. This is most obvious the bipartition points on the tree itself; bipartitions with for CCI7, which is so close to Lecudina sp. that it almost 100% support by both methods are marked by a single black certainly belongs to that very genus. The LEMD145/003 disc on the branch itself. The scale bar represents 10% clade is sister to Gregarina polymorpha with 100% support, sequence divergence. The great disparity of branch lengths while LEMD119 is sister to Monocystis with very high sup- indicates that 18S rRNA is grossly non-clock-like in its port. BOLA267/CCI73 has 100% support as sister to the evolutionary rates; this would be several times greater than parasite of Ammonia, identified by Leander et al. (2003) shown had not the longest-branch taxa been excluded to as a gregarine. BOLA458/212 groups weakly as sister to reduce artefacts. All taxa not included in the four derived kingdoms (Animalia, Fungi, Plantae, Chromista; shown in Selenidium vivax. Clade CCI31/CCA38 groups with mod- colour) belong in the unlabelled basal kingdom Protozoa (in erate support with the Ammonia parasite clade. Two black, see also table 1). clades that consistently branch within the gregarines (BAQA65/BOLA48 and DH145 (of Lo´pez-Garcı´a et al. 2001)/CCW75/100) are, however, such long branches reason why some should not exist, it seems more likely that their identification as gregarines is more open to ques- that CCA32 is an aerobic flagellate (apart from breviates, tion, but there is no reason from the present analysis to anaerobic protists, unlike CCA32, have long or very long regard any of these sequences as representing novel king- branches on rRNA trees making them very hard to place; doms or phyla or even classes or orders. it seems that loss of the mitochondrial genome generally It is not surprising that Dawson & Pace (2002) and causes dramatic and long-sustained acceleration of Stoeck & Epstein (2003) failed to identify any of these nuclear rRNA gene evolution; Cavalier-Smith 2002b). sequences as gregarines, as neither included any known Sequence ccw88 of Stoeck & Epstein (2003) turns out gregarines in their published trees, and Dawson & Pace’s to be an artefactual chimera that was probably formed (2002) had only one apicomplexan. When I omit gregar- during the PCR process. The 3Ј part of the molecule is ines altogether from the present dataset and add Percol- clearly from a deep-branching heterokont as it has the ozoa (as in Dawson & Pace 2002), none of these characteristic heterokont signature sequence (Cavalier- sequences group within Apicomplexa, Myzozoa or Smith et al. 1994) and branches with or close to Labyr- (usually) even the Alveolates, but form a single large clade inthuloides. The 5Ј part groups weakly with another of their (or two clades) that is very distantly related and very unidentified environmental lineages, CCW8 (significantly weakly sister to Percolozoa; this long-branch artefact is from the same DNA sample). This chimera was omitted corrected only by the addition of numerous gregarines. from the analysis shown because of the risk of its system- Gregarines themselves have highly variable rRNA evol- atically distorting the tree. utionary rates and branch lengths; coupled with the Sequence CCW8 might be genuinely anaerobic. It explosive basal branching of Myzozoa, this means that groups as sister to the Loukozoa plus the Eugleno- bootstrap support for the unity of the Myzozoa itself is zoa in figure 2; when Percolozoa are added to the tree very low when they are all included. they group with Euglenozoa, not CCW8. The phyletic It is hard to know just how fig. 4 of Dawson & Pace position of CCW8 is the least clear of all sequences from (2002) was produced. The text calls it ‘one typical tree’ anaerobic habitats; it is likely to be a member of the Lou- without specifying method, but implying that only one was kozoa, a possibly paraphyletic phylum at the base of the used; the figure legend calls it ‘a consensus tree’ without excavates (Cavalier-Smith 2003b). The other loukozoan, explaining how a fully resolved consensus tree with branch Malawimonas, is in an isolated position in figure 2, but on lengths was calculated. The tree shows support values for most trees that also include Percolozoa it groups as sister different methods, three that did not allow for intersite to Metamonada, which are here in an unusual artefactual variation and one (Bayesian) that did; for the first three position within the Amoebozoa. The bootstrapped con- there was no support at all for four bipartitions that separ- sensus tree for the figure 2 dataset showed Malawimonas ated most of the sequences claimed to represent separate as sister to Anaeromonadea with 19% support and this kingdoms. The tree itself looks very different from the one clade plus AT4-68 as sister to the discicristate/jakobid presented here. It is pectinate with the known and clade; excavates would be holophyletic on that tree except environmental sequences in positions from top to bottom for the misplacement of Carpediemonas alone within the in approximate proportion to the branch lengths. Such a Amoebozoa. Establishing the unity and basal branching tree has all the hallmarks of being dominated by long- order for excavates is notoriously difficult on rRNA trees branch artefacts, and is just the kind of tree that one used because of their tremendous rate variations (Cavalier- to get before corrections for intersite rate variation were Smith 2003b). A variety of anaerobic has been introduced (Cavalier-Smith 1993a, 1995). Correction for observed microscopically by ecologists but not studied intersite rate variation is important to reduce long-branch systematically; one such that appears to have just the right artefacts, though such correction can never be totally suc- mix of characters to branch precisely as does CCW8 is cessful for molecules that are as grotesquely non-clock-

Proc. R. Soc. Lond. B (2004) 1258 T. Cavalier-Smith Only six kingdoms of life like as rRNA. The reader can easily compare the overall following the origin of the eukaryotic cell (Cavalier- eukaryote trees produced by both methods for rRNA by Smith & Chao 2003c). Nonetheless, the present tree is consulting Bolivar et al. (2001) and Milyutina et al. (2001) much more congruent with other molecular data, ultra- and for proteins by consulting Bapteste et al. (2002). In structure and cell biology (Cavalier-Smith 2002b; all cases the corrected trees are more congruent with other Cavalier-Smith & Chao 2003c) than any recent tree with biological data than are the uncorrected trees. The rRNA only limited taxon sampling and/or including bacterial trees shown in fig. 2a,b of Milyutina et al. (2001) are parti- outgroups (Dawson & Pace 2002; Silberman et al. 2002) cularly instructive as like in Dawson & Pace (2002) and (including bacteria leads many authors to ignore parts of Stoeck & Epstein (2003) they were rooted by bacterial the molecule informative for eukaryote phylogeny). In outgroups and included microsporidia and even the long- contrast to such trees, including those of Dawson & Pace branch Foraminifera. Their uncorrected fig. 2a is pecti- (2002), the long-branch Amoebozoa group with most of nate and graded in branch lengths from top to bottom like the short-branch aerobic Amoebozoa, not arbitrarily else- that of Dawson & Pace (2002), but is not as bad as there where or artefactually with the long-branch excavates, and is a major amoebozoan clade apart from slime moulds. the latter do not all artefactually cluster together. It is very Their tree is also better than that of Dawson & Pace difficult indeed for tree-reconstruction algorithms to cope (2002) in showing a /diplomonad clade, now with the extreme variations in evolutionary rate and mode supported also by several protein trees, including cpn60 of the rRNA molecule, especially in the vastly accelerated (Archibald et al. 2002), and shared lateral transfers (see secondarily amitochondrial, and the discicristate and review by Cavalier-Smith 2003b). Their gamma-corrected myxogastrid amoebozoan lineages. Nonetheless, including fig. 2b is very different. It is largely non-pectinate and non- a large number of taxa in the tree, including as high a graded with a big bang of rapidly radiating mostly short- proportion of the molecule as is reasonable and excluding branch lineages from which numerous long-branch the excessively distant outgroups (bacteria) are keys to sequences stem largely independently, with a general reducing these problems and obtaining a relatively undis- appearance similar to my present tree, even though it is torted phylogeny. These three features of the present tree rooted differently and artefactually on the parabasalid may explain why Excavata and Amoebozoa are both much sequence. A striking difference between the two trees is in less randomly dispersed in the present tree than in taxo- the position of the sole microsporidian: in the uncorrected nomically sparse bacterially rooted trees. The positions of tree it is well towards the base among other long-branch many excavates on rRNA trees are notoriously sensitive taxa (but not as far away from its true fungal position as to taxon sampling and the phylogenetic parameters, in Dawson & Pace (2002)), but in the corrected tree it is because of their exceptional disparity in rRNA evolution- between opisthokonts and Amoebozoa, i.e. in the unikont ary rate and mode; only rarely do all branch together part of the tree, much closer to its true position within the (Cavalier-Smith 2002b, 2003b). Fungi. Fig. 2b was also biologically realistic in grouping together the short-branch radiolaria and the ultra-long- 4. IDENTIFYING RELATIVES OF THE MYSTERY branch Foraminifera, the first molecular support for the DEEP-OCEAN CLADES group Retaria, established on morphological grounds (Cavalier-Smith 1999). Fig. 4 of Dawson & Pace (2002) The tree of Lo´pez-Garcı´a et al. (2003) already weakly would seem to be an uncorrected tree, which would arte- suggested that two of their mystery sequences were sisters factually have dispersed both named and unidentified of Apusomonas. Figure 2 shows that one of them, AT-50, sequences, impeding recognition of their true affinities. does indeed branch within the class Thecomonadea of the The frequent capacity of gamma-corrected distance trees phylum Apusozoa, actually within the genus Amastigo- to recover (albeit usually with weak support) clades well- monas (sister to Amastigomonas sp. Millport on both con- substantiated by morphology but with a mixture of short- sensus trees), suggesting that it is simply an additional and very-long-branch taxa that usually disrupt the clade Amastigomonas species. From our previous work we con- on uncorrected trees is also evident in the present figure 2: cluded that Amastigomonas is probably vastly under- e.g. the short-branch are grouped with the long- described (Cavalier-Smith & Chao 2003c). If AT-50 is an branch Euglenozoa, are holophyletic, with Amastigomonas, that means that we already have six dra- the short-branch anaeromonad clade grouped with the matically different sequences for this genus in which only medium-length Carpediemonas (if retortamonads and the nine species have been described (Mylnikov 1999). Only ultra-long-branch diplomonads are also included they two morphospecies (A. mutabilis and A. debruynei) were group as expected with Carpediemonas) and the long- previously recorded from the deep ocean (Arndt et al. branch myxogastrids are grouped with the medium- 2003), and their sequences are markedly different from branch Dictyostelium. that of AT-50. The latter need not necessarily be Amastig- However, figure 2 is not a perfect representation of omonas, however, for there are currently no sequences eukaryote phylogeny. As in most previous studies, several available from the thecomonad order Hemimastigida. major groups known to be holophyletic from other exten- As this order is considered to have evolved from an sive evidence (see review by Cavalier-Smith & Chao Amastigomonas-like ancestor (Cavalier-Smith 2000a), one 2003c) are not recovered as clades (e.g. the kingdoms of these sequences might be from a hemimastigid; Plantae and Chromista, and the chromalveolates although hemimastigids were previously known only from (Chromista plus Alveolata)). Moreover, there is virtually terrestrial environments, two genera were recently no support for the basal-branching orders (bootstrap sup- recorded in very deep (1325–1249 m) Mediterranean port for them typically in the range of 0–10%) consistent sediments (Arndt et al. 2003). Thus, while we can be with a very rapid, virtually explosive radiation immediately reasonably confident that AT-50 is a thecomonad

Proc. R. Soc. Lond. B (2004) Only six kingdoms of life T. Cavalier-Smith 1259 sequence, we cannot exclude the possibility that it belongs numerous phylogenetically informative sites, using only to the order Hemimastigida rather than the Apusomonadida. 789 nucleotide positions, compared with 1044 in my On some trees AT4-11 groups as sister to the Apuso- analysis. monadida (rarely within it), but it can also go within the Although their trees are technically inferior to many Amoebozoa or, as in figure 2, in a deep position with no published ones, being topologically incorrect in major specific relative. It seems likely that it belongs either to respects and rooted in entirely the wrong place, Dawson & the Apusozoa or to the Amoebozoa, as these phyla seldom Pace (2002) and Stoeck & Epstein (2003) assume that appear strictly holophyletic on rRNA trees, but it might both features are correct when discussing so-called deep belong to a third protozoan phylum. branches within them. They rooted their trees using bac- Sequence AT4-68 is even harder to place. It typically terial outgroups (despite this being known to give an occupies a deep but variable position within the bikonts incorrect root among the longest-branch eukaryotes; with no clear relatives. In some distance trees it groups Cavalier-Smith 2002b; Simpson & Roger 2002; Stech- with very low support as sister to the Glaucophyceae or mann & Cavalier-Smith 2002). It is remarkable that eight within the excavates as sister to Malawimonas/ of the lineages claimed to represent novel anaerobic king- Metamonada, but with parsimony it is sister to CCA32 doms turn out to belong to a single gregarine order, as no near the base of the plant kingdom. Glaucophytes are the anaerobic gregarines have so far been described. They are only one of the three major groups of the plant kingdom neither novel kingdoms nor early diverging; are they even in which secondarily heterotrophic species have not been anaerobic? The cercozoan clade and two of the three amo- identified; by contrast there are non-photosynthetic green ebozoan clades may not be anaerobic and are certainly not plants and red algae that have secondarily lost photosyn- ‘early diverging’ sequences. Only one clade thesis (but retained plastids, presumably for starch and/or (BOLA187/BOLA366) has any potential to be early fatty-acid synthesis; Cavalier-Smith 1993b). Because diverging, but it is not really novel, being sister to Lo´pez-Garcı´a et al. (2003) found no sequences from auth- ‘M. invertens’. If the breviate clade (‘M. invertens’/ entic photosynthetic algae in their deep-sea samples, BOLA187/BOLA366) really belongs within a holophyletic CCA32 is probably also from a heterotroph. While it phylum Amoebozoa, as seems most likely (Cavalier-Smith might be the first heterotrophic member of the phylum et al. 2004), then it would not be early diverging either. Glaucophyta and of great potential interest for under- If so, as figure 1 makes clear, the rooting of the eukaryote standing the early evolution of the plant kingdom if only tree between bikonts and unikonts (not within the most it could be cultured, its occasional grouping with CCA32 highly derived bikonts, as in fig. 4 of Dawson & Pace or within the excavates makes it likely that it is simply a (2002)) would mean that there are also no known extant bikont protozoan. eukaryotes, whether aerobic or anaerobic, that diverged prior to the last common ancestor of and plants. Far from revealing ‘novel’ early-diverging lineages, as 5. ALL ANAEROBIC EUKARYOTES ARE claimed, both studies failed to detect any novel anaerobic PROBABLY DERIVED: PROBLEMS OF ROOTING lineages that are early diverging (given the correct rooting THE TREE of the tree), which considerably strengthens the current In accordance with previous evidence that the last com- interpretation that there are probably no primitively ami- mon ancestor of all eukaryotes had mitochondria capable tochondrial eukaryotes. Only the breviates remain as poss- of aerobic respiration (probably facultatively rather than ible candidates for such a position. ‘Mastigamoeba obligately; Cavalier-Smith 2002b) and that all anaerobic invertens’ needs to be studied for the presence or absence eukaryotes arose secondarily by converting mitochondria of relict Hsp70 and Cpn60 chaperones from a mitochon- into or mitosomes (Silberman et al. drial ancestry and for the presence of the two fusion genes. 2002; Williams et al. 2002; Tovar et al. 2003), all the puta- If both sets of genes are unfused, this would provide evi- tively anaerobic lineages claimed to represent new king- dence that breviates are early-diverging anaerobic eukary- doms (Dawson & Pace 2002; Stoeck & Epstein 2003) nest otes, as postulated to exist in the Archezoa hypothesis well within aerobic clades in figure 2. In the past decade (Cavalier-Smith 1983a). If the DHFR and TS genes are data from numerous proteins and the discovery of mito- unfused but the three pyrimidine-biosynthesis ones are somes firmly established the secondary nature of amito- fused, this would place the breviates clearly in the Amoe- chondrial eukaryotes (Cavalier-Smith 2002a,b, 2003b; bozoa and unikonts; the converse would place them on Keeling 2003; Roger 1999; Silberman et al. 2002; Will- the bikont side of the basal eukaryote bifurcation. While iams et al. 2002). With no evidence or arguments whatso- I have identified two out of the four putatively aerobic ever, Dawson & Pace (2002) superficially dismissed as clades of Lo´pez-Garcı´a et al. (2001, 2003), the identity of ‘lateral transfer’ the disparate and extensive evidence for the other two remains unclear. This is unsurprising as this major advance in eukaryotic phylogeny. Their tree is there are numerous aerobic protozoan genera of uncertain also topologically incorrect, as shown by the non-grouping taxonomic position that have not been cultured, of microsporidia with fungi from which they evolved sequenced or studied ultrastructurally. Until we have such (Cavalier-Smith 2000b; Keeling 2003), and by the three information we cannot tell whether they represent new widely dispersed amoebozoan clades that in better analy- orders or classes (both likely) or even phyla (unlikely, but ses come together (Bolivar et al. 2001), as they do in figure possible). Naive interpretations of rRNA trees and protein 2; the topology in Dawson & Pace (2002) was probably paralogy trees have grossly misled evolutionary biology distorted by long-branch bacterial outgroups. Unwise (Cavalier-Smith 2002a). inclusion of bacteria and the drastically shortened If breviates turn out to be secondarily anaerobic microsporidial genes also meant that they excluded Amoebozoa and Amoebozoa prove to be holophyletic, the

Proc. R. Soc. Lond. B (2004) 1260 T. Cavalier-Smith Only six kingdoms of life rooting of the eukaryote tree between unikonts and eukaryotes. Contrary to over-confident but deeply held bikonts means that there is no such thing as a deep- assumptions, we do not actually know where the base of branching eukaryote, in the sense of one branching prior the tree of life is, but it is much more likely to be among to the last common ancestor of animals and plants. The the Gram-negative eubacteria (Cavalier-Smith 2002b) common use of the terms ‘crown’ and basal/deep by many than between archaebacteria and eubacteria as is often rRNA sequencers reflect multiple misunderstandings assumed. Until we know the answer to this question, any (Cavalier-Smith 1999). The dichotomy between ‘crown reference to ‘deeply branching’ or ‘ancient’ prokaryotic groups’ and ‘basal lineages’ is biologically totally meaning- lineages (e.g. Gaucher et al. 2003) is potentially mislead- less and simply reflects the severe long-branch exclusion ing and immensely more controversial than it is generally artefacts and the systematic misrooting of rRNA trees realized to be. While it may be that the Eobacteria are when using bacterial outgroups. For this reason, and the earliest-diverging phylum (Cavalier-Smith 2002b), we because the cladistic term ‘crown’ properly refers to all cannot currently exclude the possibility that their appar- extant eukaryotes (Cavalier-Smith 2002a), the muddled ently primitive characters are secondarily simplified and and misleading phrase ‘crown eukaryote’ should no longer that, as in eukaryotes, there are no extant ‘early diverging’ be used for any subset of extant eukaryotes. lineages. Even when representatives of novel rRNA lin- The widespread but false assumption that rRNA is a eages are cultivated, as is increasingly being done universal molecular clock has led to very serious misinter- (Leadbetter 2003), and partly characterized, ranking them pretations of the tree of life, especially concerning the as phyla purely on the basis of the degree of rRNA diver- roots of both the bacterial and eukaryotic parts of the tree gence (e.g. Zhang et al. 2003) is taxonomically unsound (Cavalier-Smith 2002a). To reconstruct phylogeny satis- as rRNA divergence can be greatly accelerated for rela- factorily it is fundamentally unsound to rely on a single tively trivial reasons and does not necessarily correlate well molecule, as Dawson & Pace (2002) appear to. Trees with biologically more important character differences. based on large numbers of molecules (Baldauf et al. 2000; Bapteste et al. 2002), and genic, biochemical and ultra- 6. CONCLUSIONS structural data that can be treated cladistically (Stechmann & Cavalier-Smith 2002, 2003a; Cavalier- Far from revealing novel kingdoms, the new rRNA Smith & Chao 2003c), as well as palaeontology (Cavalier- sequences from anaerobic habitats (Dawson & Pace 2002; Smith 2002a,b) and single-gene trees for proteins (e.g. Stoeck & Epstein 2003) and the deep ocean (Lo´pez- Hsp90, which appears to be much more clock-like than Garcı´a et al. 2003) show that our understanding of eukary- rRNA; Stechmann & Cavalier-Smith 2003b) as well as ote high-level diversity is actually now rather good. When rRNA are all taken note of in the synthesis of figure 1. properly analysed they tell us, in conjunction with numer- Dawson & Pace (2002) and Stoeck & Epstein (2003) ous studies from aerobic habitats (Lo´pez-Garcı´a et al. ignore the vast majority of such data on eukaryote evol- 2001; Moon-van der Staay et al. 2001; Moreira & Lo´pez- ution, including the evidence that the positions of the Garcı´a 2002), that there may be very few, if any, pre- roots of their trees are profoundly incorrect (Simpson & viously unknown protist phyla—and no ‘new kingdoms’— Roger 2002; Stechmann & Cavalier-Smith 2002; remaining to be ‘discovered’. These other studies did find Cavalier-Smith & Chao 2003c); their discussion in terms a very small number of sequences that are long branches of a ‘crown radiation’ and ‘deep branching’ is basically and as hard to place as some of those included here, e.g. upside down. It is the divergences among the short- Edgcomb et al. (2002), who also misrooted the tree and branch, misnamed, ‘crown groups’ that are basal, whereas made similar unwarranted statements about ‘early branch- their so-called ‘deep’ branches (mostly excavates) are ing’ anaerobic eukaryotes. While it is not impossible that actually among the most derived groups. It is particularly some of these lineages might represent new phyla, it is astounding that Dawson & Pace (2002) ignore the evi- more probable that they also will turn out to be hard-to- dence for the fungal nature of microsporidia, which was place long-branch representatives of established ones. I do the first group to show conclusively how dramatically mis- not wish to be misunderstood as predicting that no new leading the rRNA tree can be when naively interpreted phyla will be discovered. There are scores of protist genera (Embley & Hirt 1998; Roger 1999; Cavalier-Smith 2000b; that have not yet been studied ultrastructurally (e.g. the Williams et al. 2002; Keeling 2003). When naively inter- extraordinary Meteora, which waves its filopodia like sema- preted in the absence of other data, rRNA trees are the phores (Hausmann et al. 2002b); its phylum is unknown; single most misleading source of information we have my hunch is that it belongs in the Cercozoa, which are about the history of life. When integrated with all the other unsurpassed in filopodial wonders (Cavalier-Smith & information they are very valuable. Chao 2003b)) and there are many others still undescribed. Analogous claims of 14 novel division (phylum)/ It is possible that some may belong in a small group suf- kingdom-level bacterial lineages (Hugenholtz et al. 1998) ficiently distinct from known phyla to merit one of its own; based on unidentified environmental sequences are prob- Apusozoa is one such phylum that was only recently estab- ably also ill-founded; long-branch problems, many exacer- lished (Cavalier-Smith 2002a). There may be others, but bated by thermophilic bias (Cavalier-Smith 2002b; probably rather few, and possibly none. Phyla and king- Gribaldo & Philippe 2002), coupled with the non-existent doms are not actually things that one discovers, but con- resolution at the base of the eubacterial rRNA tree, ceptual entities that systematists create by deliberately impede their placement in one of the seven established grouping together known organisms. Novel organisms, eubacterial phyla (table 1 in Cavalier-Smith 2002b). The molecules or lineages can be discovered, but one lesson eubacterial tree is equally poorly resolved at its base, and from the present study is that molecular trees used to the problem of its rooting is even more severe than for claim novel major eukaryote lineages should in future

Proc. R. Soc. Lond. B (2004) Only six kingdoms of life T. Cavalier-Smith 1261 include the broadest range of protist diversity as in figure Cavalier-Smith, T. 2000b What are fungi? In The Mycota, vol. 2 and not just a tiny subset of it. VII part A (ed. D. J. McLaughlin, E. J. McLaughlin & P. Ribosomal DNA sequencing is very useful indeed for Lemke), pp. 3–37. Berlin: Springer. evolutionary and ecological studies, and we ourselves are Cavalier-Smith, T. 2002a The phagotrophic origin of eukary- using environmental rDNA sequencing to explore the otes and phylogenetic classification of Protozoa. Int. J. Syst. immense hidden diversity of protists, but it has to be criti- Evol. Microbiol. 52, 297–354. Cavalier-Smith, T. 2002b The neomuran origin of archaebac- cally integrated into traditional and not given a teria, the negibacterial root of the universal tree and bacterial spurious pre-eminence. megaclassification. Int. J. Syst. Evol. Microbiol. 52, 7–76. I thank NERC for research grants, and the Canadian Institute Cavalier-Smith, T. 2003a Protist phylogeny and the high-level for Advanced Research and NERC for fellowship support. classification of Protozoa. Eur. J. Protistol. 39, 338–348. Cavalier-Smith, T. 2003b The excavate protozoan phyla Meta- REFERENCES monada Grasse´ emend. (Anaeromonadea, Parabasalia, Car- pediemonas, Eopharyngia) and Loukozoa emend. (Jakobea, Archibald, J. M., O’Kelly, C. J. & Doolittle, W. F. 2002 The Malawimonas): their evolutionary affinities and new higher chaperonin genes of jakobid and jakobid-like flagellates: taxa. Int. J. Syst. Evol. Microbiol. 53, 1741–1758. implications for eukaryotic evolution. Mol. Biol. Evol. 19, Cavalier-Smith, T. 2003c Genomic reduction and evolution of 422–431. novel genetic membranes and protein-targeting machinery Arndt, H., Hausmann, K. & Wolf, M. 2003 Deep-sea hetero- in eukaryote–eukaryote chimaeras (meta-algae). Phil. Trans. trophic nanoflagellates of the Eastern Mediterranean Sea: R. Soc. Lond. B 358, 109–134. (DOI 10.1098/rstb. qualitative and quantitative aspects of their pelagic and ben- 2002.1194.) thic occurrence. Mar. Ecol. Prog. Ser. 256, 45–56. Baldauf, S. L., Roger, A. J., Wenk-Siefert, I. & Doolittle, W. F. Cavalier-Smith, T. & Chao, E. E. 2003a Molecular phylogeny 2000 A kingdom-level phylogeny of eukaryotes based on of centrohelid heliozoa, a novel lineage of bikont eukaryotes combined protein data. Science 290, 972–977. that arose by ciliary loss. J. Mol. Evol. 56, 387–396. Bapteste, E. (and 10 others) 2002 The analysis of 100 genes Cavalier-Smith, T. & Chao, E. E. 2003b Phylogeny and classi- supports the grouping of three highly divergent amoebae: fication of phylum Cercozoa (Protozoa). Protist 154, 341– Dictyostelium, Entamoeba, and Mastigamoeba. Proc. Natl 358. Acad. Sci. USA 99, 1414–1419. Cavalier-Smith, T. & Chao, E. E. 2003c Phylogeny of Choano- Barns, S. M., Delwiche, C. F., Palmer, J. D. & Pace, N. R. zoa, Apusozoa, and other Protozoa and early eukaryote 1996 Perspectives on archaeal diversity, thermophily and megaevolution. J. Mol. Evol. 56, 540–563. monophyly from environmental rRNA sequences. Proc. Natl Cavalier-Smith, T. & Chao, E. E. 2004 Protalveolate phy- Acad. Sci. USA 93, 9188–9193. logeny and the origins of Sporozoa and dinoflagellates Bass, D. & Cavalier-Smith, T. 2004 Phylum-specific environ- (phylum Myzozoa nom. nov.) Eur. J. Protistol. 40, 21–48. mental DNA analysis reveals remarkably high global biodiv- Cavalier-Smith, T., Allsopp, M. T. E. P. & Chao, E. E. 1994 ersity of Cercozoa (Protozoa). Int. J. Syst. Evol. Microbiol. Thraustochytrids are chromists, not Fungi: 18S rRNA signa- (In the press.) tures of Heterokonta. Phil. Trans. R. Soc. Lond. B 339, Bolivar, I., Fahrni, J. F., Smirnov, A. & Pawlowski, J. 2001 139–146. SSU rRNA-based phylogenetic position of the genera Cavalier-Smith, T., Chao, E. E. & Oates, B. 2004 Molecular Amoeba and Chaos (Lobosea, Gymnamoebia): the origin of phylogeny of Amoebozoa and the evolutionary significance gymnamoebae revisited. Mol. Biol. Evol. 18, 2306–2314. of the unikont Phalansterium. Eur. J. Protistol. 40, 21–48. Cavalier-Smith, T. 1983a A 6-kingdom classification and a Dawson, S. C. & Pace, N. R. 2002 Novel kingdom-level eukar- unified phylogeny. In Endocytobiology II (ed. W. yotic diversity in anoxic environments. Proc. Natl Acad. Sci. Schwemmler & H. E. A. Schenk), pp. 1027–1034. Berlin: USA 99, 8324–8329. de Gruyter. Edgcomb, V. P., Kysela, D. T., Teske, A., de Vera Gomez, Cavalier-Smith, T. 1983b Endosymbiotic origin of the mito- A. & Sogin, M. L. 2002 Benthic eukaryotic diversity in the chondrial envelope. In Endocytobiology II (ed. W. Guaymas Basin hydrothermal vent environment. Proc. Natl Schwemmler & H. E. A. Schenk), pp. 265–279. Berlin: de Acad. Sci. USA 99, 7658–7662. Gruyter. Embley, T. M. & Hirt, R. P. 1998 Early branching eukaryotes? Cavalier-Smith, T. 1993a Kingdom Protozoa and its 18 phyla. Curr. Opin. Genet. Dev. 8, 624–629. Microbiol. Rev. 57, 953–994. Fenchel, T. & Finlay, B. J. 1995 Ecology and evolution in anoxic Cavalier-Smith, T. 1993b The origin, losses and gains of chlor- worlds. Oxford University Press. oplasts. In Origin of plastids: symbiogenesis, prochlorophytes and Fenchel, T., Bernard, C., Esteban, G., Finlay, B. J., Hansen, the origins of chloroplasts (ed. R. A. Lewin), pp. 291–348. New P. J. & Iversen, N. 1995 Microbial diversity and activity in York: Chapman & Hall. a Danish fjord with anoxic deep water. Ophelia 43, 45–100. Cavalier-Smith, T. 1995 Membrane heredity, symbiogenesis, Gaucher, E. A., Thomson, J. M., Burgan, M. F. & Benner, and the multiple origins of algae. In Biodiversity and evolution S. A. 2003 Inferring the palaeoenvironment of ancient bac- (ed. R. Arai, M. Kato & Y. Doi), pp. 75–114. Tokyo: The National Science Museum Foundation. teria on the basis of resurrected proteins. Nature 425, 285– Cavalier-Smith, T. 1998 A revised six-kingdom system of life. 288. Biol. Rev. Camb. Phil. Soc. 73, 203–266. Giovannoni, S. J., Britschgi, T. B., Moyer, C. L. & Field, Cavalier-Smith, T. 1999 Principles of protein and lipid tar- K. G. 1990 Genetic diversity in Sargasso Sea bacterioplank- geting in secondary symbiogenesis: euglenoid, dinoflagellate, ton. Nature 345, 60–63. and sporozoan plastid origins and the eukaryotic family tree. Gribaldo, S. & Philippe, H. 2002 Ancient phylogenetic J. Eukaryotic Microbiol. 46, 347–366. relationships. Theor. Popul. Biol. 61, 391–408. Cavalier-Smith, T. 2000a Flagellate megaevolution: the basis Harper, J. T. & Keeling, P. J. 2003 Nucleus-encoded, plastid- for eukaryote diversification. In The flagellates (ed. J. R. targeted glyceraldehyde-3-phosphate dehydrogenase Green & B. S. C. Leadbeater), pp. 361–390. London: (GAPDH) indicates a single origin for chromalveolate plas- Taylor & Francis. tids. Mol. Biol. Evol. 20, 1730–1735.

Proc. R. Soc. Lond. B (2004) 1262 T. Cavalier-Smith Only six kingdoms of life

Hausmann, K., Hu¨lsmann, N., Polianski, I., Schade, S. & Roger, A. J. 1999 Reconstructing early events in eukaryotic Weitere, M. 2002a Composition of benthic protozoan com- evolution. Am. Nat. 154(Suppl. 4), S146–S163. munities along a depth transect in the eastern Mediterranean Sait, M., Hugenholtz, P. & Janssen, P. H. 2002 Cultivation of Sea. Deep Sea Res. 49, 1959–1970. globally distributed soil bacteria from phylogenetic lineages Hausmann, K., Weitere, M., Wolf, M. & Arndt, H. 2002b previously only detected in cultivation-independent surveys. Meteora sporadica gen. nov. et sp. nov. (Protista incertae Environ. Microbiol. 4, 654–666. sedis): an extraordinary free-living protist from the Mediter- Silberman, J. D., Simpson, A. G., Kulda, J., Cepicka, I., ranean deep sea. Eur. J. Protistol. 38, 171–177. Hampl, V., Johnson, P. J. & Roger, A. J. 2002 Hugenholtz, P., Pitulle, C., Hershberger, K. L. & Pace, N. R. flagellates are closely related to diplomonads: implications 1998 Novel division level bacterial diversity in a Yellowstone for the history of mitochondrial function in eukaryote evol- hot spring. J. Bacteriol. 180, 366–376. ution. Mol. Biol. Evol. 19, 777–786. Keeling, P. J. 2003 Congruent evidence from alpha-tubulin Simpson, A. G. & Roger, A. J. 2002 Eukaryotic evolution: get- and beta-tubulin gene phylogenies for a zygomycete origin ting to the root of the problem. Curr. Biol. 12, R691–R693. of microsporidia. Fungal Genet. Biol. 38, 298–309. Simpson, A. G., Roger, A. J., Silberman, J. D., Leipe, D. D., Lang, B. F., O’Kelly, C., Nerad, T., Gray, M. W. & Burger, Edgcomb, V. P., Jermiin, L. S., Patterson, D. J. & Sogin, G. 2002 The closest unicellular relatives of animals. Curr. M. L. 2002 Evolutionary history of ‘early-diverging’ eukary- Biol. 12, 1773–1778. otes: the excavate taxon Carpediemonas is a close relative of Leadbetter, J. R. 2003 Cultivation of recalcitrant microbes: Giardia. Mol. Biol. Evol. 19, 1782–1791. cells are alive, well and revealing their secrets in the 21st Stechmann, A. & Cavalier-Smith, T. 2002 Rooting the eukary- century laboratory. Curr. Opin. Microbiol. 6, 274–281. ote tree by using a derived gene fusion. Science 297, 89–91. Leander, B. S., Clopton, R. E. & Keeling, P. J. 2003 Phy- Stechmann, A. & Cavalier-Smith, T. 2003a The root of the logeny of gregarines (Apicomplexa) as inferred from small- eukaryote tree pinpointed. Curr. Biol. 13, R665–R666. subunit rDNA and beta-tubulin. Int. J. Syst. Evol. Microbiol. Stechmann, A. & Cavalier-Smith, T. 2003b Phylogenetic 53, 345–354. analysis of eukaryotes using heat-shock protein Hsp90. J. Lo´pez-Garcı´a, P., Rodriguez-Valera, F., Pedros-Alio, C. & Mol. Evol. 57, 408–419. Moreira, D. 2001 Unexpected diversity of small eukaryotes Stoeck, T. & Epstein, S. 2003 Novel eukaryotic lineages in deep-sea Antarctic plankton. Nature 409, 603–607. inferred from small-subunit rRNA analyses of oxygen- Lo´pez-Garcı´a, P., Philippe, H., Gail, F. & Moreira, D. 2003 depleted marine environments. Appl. Environ. Microbiol. 69, Autochthonous eukaryotic diversity in hydrothermal sedi- 2657–2663. ment and experimental microcolonizers at the Mid-Atlantic Stoeck, T., Taylor, G. T. & Epstein, S. S. 2003 Novel eukary- Ridge. Proc. Natl Acad. Sci. USA 100, 697–702. otes from the permanently anoxic Cariaco Basin (Caribbean Milyutina, I. A., Aleshin, V. V., Mikrjukov, K. A., Kedrova, Sea). Appl. Environ. Microbiol. 69, 5656–5663. O. S. & Petrov, N. B. 2001 The unusually long small sub- Tovar, J., Leon-Avila, G., Sanchez, L. B., Sutak, R., Tachezy, unit ribosomal RNA gene found in amitochondriate amoe- J., Van Der Giezen, M., Hernandez, M., Mu¨ller, M. & Luc- boflagellate Pelomyxa palustris: its rRNA predicted secondary ocq, J. M. 2003 Mitochondrial remnant organelles of Giar- structure and phylogenetic implication. Gene 272, 131–139. dia function in iron–sulphur protein maturation. Nature 426, Moon-van der Staay, S. Y., De Wachter, R. & Vaulot, D. 2001 172–176. Van de Peer, Y., Ali, A. B. & Meyer, A. 2000 Microsporidia: Oceanic 18S rDNA sequences from picoplankton reveal accumulating molecular evidence that a group of amitochon- unsuspected eukaryotic diversity. Nature 409, 607–610. driate and suspectedly primitive eukaryotes are just curious Moreira, D. & Lo´pez-Garcı´a, P. 2002 The molecular ecology fungi. Gene 246, 1–8. of microbial eukaryotes unveils a hidden world. Trends Vossbrinck, C. R., Maddox, J. V., Friedman, S., Debrunner- Microbiol. 10, 31–38. Vossbrinck, B. A. & Woese, C. R. 1987 Ribosomal RNA Morris, R. M., Rappe, M. S., Connon, S. A., Vergin, K. L., sequence suggests microsporidia are extremely ancient euka- Siebold, W. A., Carlson, C. A. & Giovannoni, S. J. 2002 ryotes. Nature 326, 411–414. SAR11 clade dominates ocean surface bacterioplankton Williams, B. A., Hirt, R. P., Lucocq, J. M. & Embley, T. M. communities. Nature 420, 806–810. 2002 A mitochondrial remnant in the microsporidian Mylnikov, A. P. 1999 New brackish water amoeboid flagellates Trachipleistophora hominis. Nature 418, 865–869. of the genus Amastigomonas (Apusomonadida, Protozoa). Zhang, H., Sekiguchi, Y., Hanada, S., Hugenholtz, P., Kim, Zool. Zhurnal 78, 771–777. [In Russian.] H., Kamagata, Y. & Nakamura, K. 2003 Gemmatimonas aur- Philippe, H. 2000 Opinion: long branch attraction and protist antiaca gen. nov., sp. nov., a Gram-negative, aerobic, poly- phylogeny. Protist 151, 307–316. phosphate-accumulating micro-organism, the first cultured Philippe, H., Lopez, P., Brinkmann, H., Budin, K., Germot, representative of the new bacterial phylum Gemmatimonad- A., Laurent, J., Moreira, D., Mu¨ller, M. & Le Guyader, H. etes phyl. nov. Int. J. Syst. Evol. Microbiol. 53, 1155–1163. 2000 Early-branching or fast-evolving eukaryotes? An answer based on slowly evolving positions. Proc. R. Soc. As this paper exceeds the maximum length normally permitted, the Lond. B 267, 1213–1221. (DOI 10.1098/rspb.2000.1130.) author has agreed to contribute to production costs.

Proc. R. Soc. Lond. B (2004)