Cell, Vol. 107, 419–425, November 16, 2001, Copyright 2001 by Cell Press

Review

Reconstructing/Deconstructing the Earliest Eukaryotes: How Comparative Genomics Can Help

Joel B. Dacks and W. Ford Doolittle¹
Program in Evolutionary Biology
Canadian Institute for Advanced Research
Department of Biochemistry and Molecular Biology
Dalhousie University
Halifax, Nova Scotia
Canada

¹Correspondence: [email protected]

We could reconstruct the evolution of eukaryote-specific molecular and cellular machinery if some living eukaryotes retained primitive cellular structures and we knew which eukaryotes these were. It’s not clear that either is the case, but the expanding protist genomic database could help us in several ways.

Introduction

Almost thirty years ago, the Canadian microbiologist Roger Stanier (Stanier, 1974) asserted that “the differences in structure and function between prokaryotic and eukaryotic cells are many and profound, and no contemporary biological group has a cellular organization that can plausibly be interpreted as intermediate.” Microbiologists’ understanding now is more nuanced in several ways. We divide Life into three domains (Bacteria, Archaea, and Eukarya) rather than two (prokaryotes and eukaryotes). We recognize an astonishing diversity of form and function within and between the two prokaryotic domains, and the defining positive prokaryotic characters recognized by Stanier are no longer universal among them. And increasingly we can confirm the presence in Bacteria or Archaea or both of “primitive” forms of key macromolecules and processes previously thought to be eukaryote specific. For instance, bacterial MreB, first claimed as a homolog to eukaryotic actin only on the basis of predicted structural similarity in ATPase domains (Bork et al., 1992), has recently been shown to form “cytoskeletal” actin-like filaments in vivo (Jones et al., 2001) and in vitro (van den Ent et al., 2001).

Nevertheless, differences at the level of cell structure and function remain “many and profound.” It is still reasonable to speak of the prokaryote-eukaryote transition as one of cellular evolution’s greatest leaps. It is still sensible to ask how and when the various complex macromolecular machines (cytoskeletons, endomembrane systems, spliceosome, membrane-bounded organelles, and many others) that all the best-known eukaryotes have and all known prokaryotes lack (or possess only in inchoate form) came into being.

One can approach this question from the bottom up, looking to diverse prokaryotes for homologs of components and subassemblies of complex eukaryotic cellular machines (as in the case of actin and MreB), or from the top down, searching among eukaryotes for simpler versions of such machines (as we will describe below). Either way, there are principles of logic and parsimony that can help in inferring when in cellular evolution complex cellular features and their component parts first appeared. In the simplest cases, features found in homologous form in all descendants of a common ancestor must, barring lateral transfer, have been present in that ancestor (must be primitive or ancestral features). More complex cases involve traits found in some descendants (for instance, as designated “+” in species B of Figure 1A) but not in others (species A). These could either have been gained (in the lineage leading to B) or lost (in the lineage leading to A) after their divergence from their common ancestor (X). One can tell which by looking at “outgroups” (O). If (and only if) some of these show the + trait, parsimony demands that X was +, and that A has lost this feature.

To apply such reasoning, it is essential to know the phylogeny, and to make interesting and sensible conclusions about early eukaryotic cellular evolution, it is essential to have a deeply branched eukaryotic phylogeny. Too many eukaryotic features have been assumed to be universal (and thus primitive) for all eukaryotes simply because they are “present from yeast to man.” There are likely very many early eukaryotic branches (many outgroups) below the common ancestor of yeast and man, whatever the true structure of the eukaryotic tree.

Before molecular methods achieved their current dominance, however, attempts to reconstruct phylogeny were often conflated with attempts to infer when complex traits and their components first appeared. For instance, it seemed reasonable to suggest that the simplest contemporary eukaryotes branched off the main trunk of the eukaryotic tree very early, because the first eukaryotic cells obviously must have been simpler than modern ones and simplicity is unlikely to be a secondarily derived feature. Among such simple contemporary eukaryotes were the diplomonads (including the intestinal parasite Giardia), parabasalids (including the sexually transmitted disease-causing agent Trichomonas), oxymonads, microsporidia (such as the human pathogen Encephalitozoon), pelobionts (Mastigamoeba), entamoebae, and the heteroloboseid amoebae such as Naegleria. These protists are variously deficient in such basic eukaryotic cellular components as mitochondria, peroxisomes, and Golgi dictyosomes.

For much of the last decade, molecular phylogeny seemed to support an early branching for many of these simple protists. (Note that we should call such eukaryotes “early branching” or “deeply diverging” rather than “ancestral” or “primitive,” since no contemporary cell can be any other’s ancient ancestor, and none is likely to retain primitive states of all important characters.) Small subunit (SSU) ribosomal RNA gene sequences, which have served as the gold standard molecular measure for reconstructing phylogenetic relationships, showed the amitochondriate diplomonads, microsporidia, and parabasalids at the base of the eukaryotic tree (Figure 1B). Above them, mitochondriate protist lineages such as Heterolobosea and Euglenozoa emerged, followed by the unresolved cluster of animals, fungi, plants, and other lineages (including some protists) commonly called the “eukaryotic crown” (Sogin, 1991).
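The outgroup reasoning described for Figure 1A can be made concrete with a small counting exercise. The sketch below is only an illustration, not a method from the review: it assumes a hypothetical root R joining the outgroup O to the ancestor X, treats the trait as simply present or absent, and weights gains and losses equally. All names (`EDGES`, `min_changes`, `infer_x`) are invented for the example.

```python
# Rooted tree for Figure 1A as (parent, child) edges.
# "R" is a hypothetical root joining outgroup O to X, the ancestor of A and B.
EDGES = [("R", "O"), ("R", "X"), ("X", "A"), ("X", "B")]


def min_changes(tips, x_state):
    """Fewest gains/losses needed if ancestor X is fixed to x_state.

    Assumption: a gain and a loss each cost one step.
    """
    costs = []
    for r_state in (True, False):  # try both possible states at the root
        states = dict(tips, R=r_state, X=x_state)
        costs.append(sum(states[p] != states[c] for p, c in EDGES))
    return min(costs)


def infer_x(tips):
    """Return the most parsimonious state of X: 'present', 'absent', or 'ambiguous'."""
    cost_present = min_changes(tips, True)
    cost_absent = min_changes(tips, False)
    if cost_present < cost_absent:
        return "present"
    if cost_absent < cost_present:
        return "absent"
    return "ambiguous"


if __name__ == "__main__":
    # B has the trait, A lacks it; the outgroup decides the state of X.
    print(infer_x({"A": False, "B": True, "O": True}))
    print(infer_x({"A": False, "B": True, "O": False}))
```

With the trait present in the outgroup, fixing X as present needs one change (a loss in A) while fixing it as absent needs two, which is the asymmetry the text relies on. Lateral transfer, which the text explicitly sets aside, is not modeled here.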

Figure 1. Parsimony Diagram and Representative Tree of Eukaryotes
(A) demonstrates the principles of parsimony in evolutionary deduction (see text). (B) is a redrawn version of the tree of eukaryotes based on SSU rDNA from Sogin (1991). This tree, and the early and sequential divergence of several protist groups it implies, have provided an invaluable stimulus and guide to much evolutionary research in the last two decades. However, the agnostic representation in Figure 2 may be more consistent with the bulk of available reliable data.

Now, however, things are much less certain. In some cases, the simplicity of supposed deeply diverging protist lineages is obviously due to secondary loss of complex cell structures, likely a consequence of adopting a parasitic lifestyle. Although surely the first eukaryotes were simpler in structure than modern cells, there is no compelling evidence that the last common ancestor of all surviving eukaryotes was. Nor is there any agreed upon and robust deep eukaryotic phylogeny. While the lack of resolution may be largely a reflection of our methods of phylogenetic analysis, some feel that eukaryotes diversified so rapidly (in a “Big Bang” radiation) that the order of the early branchings will never be known (Philippe et al., 2000a). Here, we summarize the problems, and consider whether comparative genomics might hold the answers. Our take on these issues is not unlike (and was influenced by) that presented earlier by Embley and Hirt (1998) and by Roger (1999), and of necessity leaves out much: for instance, the complex supportive arguments for early mitochondria based on hydrogenosomes (Tachezy et al., 2001).

Primitive Simplicity versus Secondary Simplification

The deepest of the deeply diverging eukaryotes were until recently thought to be those mitochondrion-lacking protists that Cavalier-Smith (1983) called “Archezoa,” comprising, among others, the diplomonads (Giardia), parabasalids (Trichomonas), and microsporidia. The prevailing belief about mitochondria since Margulis (1970) had been that these eukaryotic organelles are the degenerate descendants of endosymbiotic bacteria engulfed by some early amitochondriate eukaryote (the host). It was logical to propose that the immediate ancestor of this eukaryote left other surviving descendants that never harbored such endosymbionts or never converted them to mitochondria. Those descendants would have been the direct ancestors of modern archezoa.

This appealing “Archezoa Hypothesis” now seems likely to be false, based on two types of evidence. First, several gene sequences (Hirt et al., 1999; Keeling and Doolittle, 1996; Van de Peer et al., 2000) disagree with SSU rDNA on the deep branching of microsporidia (placing them instead among or as sister to fungi), while the deep divergences of diplomonads and parabasalids obtained with several molecular datasets are now seen as “long branch artifacts” (see below). Second, all three lineages harbor, in their nuclear genomes, at least one gene that obviously derived from mitochondrial genomes. Initial evidence for this came in 1996, when four groups simultaneously described a mitochondrial-type cpn60 gene in nuclear DNA of Trichomonas vaginalis (Bui et al., 1996; Germot et al., 1996; Horner et al., 1996; Roger et al., 1996). The same gene has since been found in the diplomonads Giardia lamblia (Roger et al., 1998) and Spironucleus barkhanus (Horner and Embley, 2001), and thorough phylogenetic analyses place all such “archezoal” cpn60s with a mitochondrial cluster (Horner and Embley, 2001). Of course, their functions cannot be “mitochondrial”: current thinking is that, at least in parabasalids, cpn60 serves the hydrogenosome, a hydrogen-generating organelle that likely shares a common origin with mitochondria. Several other amitochondriate lineages have been shown to have nuclear genes of mitochondrial origin (Clark and Roger, 1995; Germot et al., 1997) or to be embedded in groups that have likely mitochondrial homologs (Dacks et al., 2001), and thus (by the logic of Figure 1A) likely also to be secondarily deficient in these organelles. At the moment, it is unclear that there are any living direct descendants of the premitochondriate “host,” if such ever existed. Thus, parsimony says that the last common ancestor of all currently known eukaryotes had mitochondria.

Deep protists should also, according to prevailing theory, lack introns. Not only would this be consistent with the general observation that more complex organisms generally have more introns (Logsdon, 1998), it would jibe with the popular theory that spliceosomal introns derive from group II introns that were first introduced into eukaryotes by the premitochondrial symbiont (Cavalier-Smith, 1991). But if all known eukaryotes once had mitochondria, did they also all once have (indeed do they all still have) spliceosomal introns? Microsporidia surely do: introns have now been described in some of their genes (Biderre et al., 1998) and U2 and U6 spliceosomal RNAs are present (Fast et al., 1998). Although no intron-containing genes have yet been described in diplomonads or parabasalids, a gene for the essential and highly conserved splicing component PRP8 has been found in Trichomonas (Fast and Doolittle, 1999) and a PRP8 gene and several other splicing factors are listed at the Giardia

Table 1. Presence of Endomembrane Genes in Diverse Eukaryotic Genome Projects

Higher taxon    Organism        Syntaxin  Snap25  Rab    ARFGap  COPI   Sec1   Ykt6p
Fungi           Saccharomyces   ****      ****    ****   ****    ****   ****   ****
Land Plants     Arabidopsis     ****      ****    ****   ***     ****   ****   ****
Animal          Human           ****      ****    ****   ****    ****   ****   ****
Diplomonads     Giardia         ***       NI      ***    **      *      **     **
Kinetoplastid   Trypanosoma     *         NI      ****   *       ***    *      *
Apicomplexa     Plasmodium      ***       NI      ***    ***     ***a   ***    *
Apicomplexa     Theileria       *         NI      ****   *       *      *      *
Slime molds     Dictyostelium   ****      NI      ****   NI      NI     **     **
Entamoebae      Entamoeba       *         NI      ****   *       *      *      *
Red Algae       Porphyra        **        NI      **     NI      **     NI     **
Stramenopiles   Phytophthora    **        NI      *      **      *      **     **
Green algae     Chlamydomonas   **        NI      ****   ****    **     **     **

This summarizes searches as of Oct 2001. **** indicates a published reference, *** means that a homolog is listed in GenBank, ** denotes an identified BLAST hit listed on the relevant genomic project web page, * indicates that our search identified a putative homolog by BLAST with a score less than p = 0.05. NI = Not Identified: a homolog could not be reliably identified by any of the above criteria. a = there is a COPI homolog identified in GenBank for the apicomplexan Toxoplasma. These data were obtained by keyword searching or BLASTing either GenBank or the following genomics projects, which we thank for their contributions: Giardia = http://hermes.mbl.edu/baypaul/Giardia-HTML/index2.html, Chlamydomonas and Porphyra = http://www.kazusa.or.jp/en/plant/database.html, Dictyostelium = http://www.csm.bio1.tsukuba.ac.jp/cDNAproject.html, Phytophthora = http://www.ncgr.org/pgc/indes.html and Theileria = http://www.tigr.org/tdb/e2k1/tpa1/. While preliminary lists of this type can allow general statements about evolutionary events, to make robust and detailed conclusions using genomic data, it is also important to obtain near complete open reading frames (ORFs) of putative proteins identified by BLAST. Potentially misleading BLAST results of partial sequences aside, the full ORF can be used for phylogenetic analysis, or to identify those rare evolutionary events discussed below. The verified sequence will also allow for comparative analysis of functionally characterized residues.
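The evidence codes in Table 1 lend themselves to a simple tabulation. The sketch below is an illustration only: the codes are transcribed from the table (the footnote marker on the Plasmodium COPI entry is dropped), while the data layout and the helper names `organisms_with_hit` and `genes_found_everywhere` are assumptions made for the example.

```python
# Evidence codes transcribed from Table 1 ("NI" = no homolog reliably identified).
GENES = ["Syntaxin", "Snap25", "Rab", "ARFGap", "COPI", "Sec1", "Ykt6p"]

TABLE_1 = {
    "Saccharomyces": ["****", "****", "****", "****", "****", "****", "****"],
    "Arabidopsis":   ["****", "****", "****", "***", "****", "****", "****"],
    "Human":         ["****", "****", "****", "****", "****", "****", "****"],
    "Giardia":       ["***", "NI", "***", "**", "*", "**", "**"],
    "Trypanosoma":   ["*", "NI", "****", "*", "***", "*", "*"],
    "Plasmodium":    ["***", "NI", "***", "***", "***", "***", "*"],
    "Theileria":     ["*", "NI", "****", "*", "*", "*", "*"],
    "Dictyostelium": ["****", "NI", "****", "NI", "NI", "**", "**"],
    "Entamoeba":     ["*", "NI", "****", "*", "*", "*", "*"],
    "Porphyra":      ["**", "NI", "**", "NI", "**", "NI", "**"],
    "Phytophthora":  ["**", "NI", "*", "**", "*", "**", "**"],
    "Chlamydomonas": ["**", "NI", "****", "****", "**", "**", "**"],
}


def organisms_with_hit(gene):
    """Organisms with any level of evidence (one to four stars) for the gene."""
    col = GENES.index(gene)
    return [org for org, codes in TABLE_1.items() if codes[col] != "NI"]


def genes_found_everywhere():
    """Genes with at least a putative homolog in every organism surveyed."""
    return [g for g in GENES if len(organisms_with_hit(g)) == len(TABLE_1)]


if __name__ == "__main__":
    for gene in GENES:
        hits = organisms_with_hit(gene)
        print(f"{gene}: {len(hits)} of {len(TABLE_1)} organisms")
```

A tabulation like this treats a single-star BLAST hit the same as a published reference, so it is best read as a first pass. As the caption notes, absence in an unfinished genome project is weak evidence, and putative hits need full open reading frames before firm conclusions are drawn.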

genome project web site. The few gene sequences available for other putatively early eukaryotes such as Reclinomonas (Archibald et al., 2001; Edgcomb et al., 2001) and Mastigamoeba (Stiller et al., 1998) contain introns, sometimes at high density. Although one awaits proof of spliceable introns in genes of Giardia and Trichomonas, the odds are that all known eukaryotes have, or once had, these elements in their genes, and the wherewithal to remove them.

A number of putatively deeply diverging lineages including Heterolobosea, diplomonads, Entamoeba, Mastigamoeba, and oxymonads lack recognizable stacked Golgi bodies, and some were once thought to have diverged prior to this key eukaryotic innovation (Cavalier-Smith, 1983). In the case of Giardia (Lujan et al., 1995) and Entamoeba (Ghosh et al., 1999), however, there is evidence that a Golgi-like structure can be induced, and that some Golgi enzymes are present in the cell. In the case of such a complex and variable system, mapping components and subassemblies to eukaryote phylogeny might provide a picture of progressive complexification. The diversity of Rab proteins (GTPase proteins involved in vesicular transport) has been examined, with homologs identified in animals, fungi (Armstrong, 2000), and plants (Borg and Poulsen, 1994) as well as diverse protists (Bush et al., 1993; Field and Boothroyd, 1995; Janoo et al., 1999; Rodriguez et al., 2000). However, the majority of proteins involved in the endomembrane system have not been examined through comparative genomics. Table 1 summarizes our recent preliminary searches of completed and in-progress genomes for several crucial genes involved in the endomembrane system. Although Snap-25 homologs were not found outside animals, plants, and fungi, most proteins searched did have at least one homolog present in diverse taxa. The last common ancestor of all extant eukaryotes likely had complicated vesicular transport machinery.

These are but three examples of systems once thought to be primitively absent in deeply diverging lineages that seem now more likely to have been secondarily lost in such cells, or cryptically present in all eukaryotes. Others, including the protein folding apparatus (Archibald et al., 2000) and translation elongation release factors (Inagaki and Doolittle, 2000), are also well documented. The last common eukaryote ancestor would probably not be thought primitive, were it available for examination today. Either (1) the “many and profound” differences between eukaryotes and prokaryotes appeared very quickly in evolution, (2) many transitional stages have left no survivors, or (3) those survivors have yet to be identified.

Problems with Phylogeny

Another barrier to reconstructing the prokaryote-eukaryote transition is uncertainty about the phylogenetic relationships of the lineages that are known. Many protein sequences from eukaryotes, developed to be reliable markers of eukaryotic organismal evolution, do not reproduce the small subunit rDNA tree. Similar disharmony among prokaryotic phylogenies based on different genes may frequently be due to lateral gene transfer (Doolittle, 1999), and eukaryotes are surely not immune to transfer. Nevertheless, such incongruence in eukaryotes has most often been attributed to failures in methods of phylogenetic reconstruction. For computational reasons, early models of phylogenetic analysis involved radical and unrealistic simplifications. These included the assumption that all sites in a protein vary equally, and that all changes between nucleotides or

amino acids occur with the same frequency (Swofford et al., 1996). Most important, however, was the faulty assumption that the same gene in different organisms evolves at a constant rate. This led to an artifact called long branch attraction (LBA) (Felsenstein, 1978). In phylogenetic reconstruction, sequences that evolve at higher rates are artificially attracted to each other and (for the same reasons) to sequences that are very different because they diverged very long ago. For eukaryotic trees rooted with bacterial or archaeal outgroup sequences (which present long branches because they are indeed anciently diverged), the result will be the artifactual placement of rapidly evolving sequences at the root of the tree.

Figure 2. Eukaryotic Relationships Based on Consensus Data
This scheme takes into account (Baldauf and Palmer, 1993; Baldauf et al., 2000; Cavalier-Smith and Chao, 1996; Dacks et al., 2001; Fast et al., 2001; Moreira et al., 2000; Simpson and Patterson, 2001), among others. Taxa with full genome sequencing or GSS projects underway are shown in blue, while those with EST projects only are shown in red. Taxa with both EST and full genomic initiatives are shown in purple.

The recognition of this artifact, and attempts to compensate for it with more sophisticated computer algorithms and biologically accurate models of sequence evolution, have thrown into doubt the ancient nature of many of the organisms once thought to be deeply diverging. Microsporidia were the first to go. Phylogenies of tubulins and RNA polymerase as well as reanalyses of other markers have shown that the microsporidia are not ancient eukaryotes, but either degenerate members or very close relatives of the fungi (Hirt et al., 1999; Keeling and Doolittle, 1996; Van de Peer et al., 2000). The ancient nature of the parabasalids and diplomonads has also come into question: these organisms do present long branches in many phylogenetic reconstructions (Hirt et al., 1999; Stiller and Hall, 1999). However, no alternative placement has been suggested for these taxa and so their status as deeply diverging lineages is only in doubt, not disproved.

In doubt as well, though, is the very structure of the eukaryotic tree. Philippe et al. (2000b) have argued that long branch attraction can provide false support for the ladder-like structure of sequentially emerging taxa in the eukaryotic phylogenetic tree, seen in Figure 1B. They show that correction for this artifact produces a multifurcated tree, whose deep branching order is unresolved (Philippe et al., 2000a). This, they propose, is the result of a real biological phenomenon, not methodological failure. Their “Big Bang” hypothesis suggests that many of the extant eukaryotic lineages evolved very rapidly from one another: the branching pattern is unresolved because, for most markers, not enough mutations occurred between branchings to allow reconstruction of the order of events (Philippe et al., 2000a). Such a view is of course consistent with (although not required by) the conclusion drawn above that extant characterized eukaryotes diverged from an already complex eukaryotic ancestor. However, a rapid radiation does not necessarily mean that the phylogeny of eukaryotes is an insurmountable or unresolvable question.

How Genomics Might Help with Phylogeny

Scientists almost always assert that what is needed is “more data.” Here, we need more data of at least three sorts.

First, one can look harder for surviving direct descendants of the primitively simple eukaryotes that must once have thrived in ancient anaerobic habitats, according to all currently fashionable theories of cell evolution (Lopez-Garcia and Moreira, 1999; Margulis, 1970; Martin and Muller, 1998). Searches by microscopic methods in benthic environments have uncovered interesting new organisms such as Reclinomonas (Flavin and Nerad, 1993), Trimastix (Brugerolle and Patterson, 1997), Malawimonas (O’Kelly and Nerad, 1999), and Carpediemonas (Simpson and Patterson, 1999). For prokaryotes, it is estimated that we only know 1% of the actual biodiversity that exists globally (Pace, 1997). For eukaryotes the fraction might be higher than that since diversity can be more easily seen, but by how much? Recent environmental PCR studies of ocean water have revealed (through their SSU rDNA sequences) novel eukaryotic groups (Lopez-Garcia et al., 2001). When environmental genomic sampling methods (such as characterizing BAC clones from whole communities or ecosystems [Beja et al., 2000]) become widely applied, our appreciation of eukaryotic diversity, not only phylogenetic but genomic, physiological, and structural diversity, will expand enormously.

Second, the multiplicity of new individual gene sequences that emerge from genome projects and genome surveys (such as single-pass and EST analyses) will allow different and more compelling sequence-based phylogenetic reconstructions. As we have seen in the past few years, single strong pieces of evidence that are consistent with previous inconclusive data can help us lock down phylogenetic positions. Sequences from different genes will be useful for reconstructing relationships between different organisms. The debate about whether red and green algae share a recent common ancestor seemed played to a stalemate, until sequences of the EF2 gene strongly placed them together (Moreira et al., 2000). Similarly, adding an additional organism can help to pin down a taxon that had gone phylogenetically adrift, as in the case of the oxymonad and Trimastix relationship (Dacks et al., 2001). More genome sequences will help us find more insertion/deletions such as the EF1α insertion that unites animals and

fungi (Baldauf and Palmer, 1993). These can give great confidence because they rely on positive evidence for extremely unlikely events, in addition to whatever phylogenetic signal their sequences provide. Increasing gene sequence from genomic projects also allows us to combine many genes into concatenated multigene datasets. This approach has been used successfully a number of times in the past few years to yield robust evidence for relationships that were previously contested (Martin et al., 1998; Baldauf et al., 2000). Some of these new or newly affirmed relationships are illustrated in Figure 2, which summarizes what we feel to be the most well-supported relationships among eukaryotes, based on many sorts of data.

Third, data from EST and partial and complete genome projects for many protists will allow comparative analyses of their (partial or complete) gene complements, or “proteomes,” in popular parlance. There are several ways in which information on gene content could augment sequence-based phylogenetic analyses of individual or concatenated genes, by revealing unique rare gene duplication, fusion, or transfer events. These can convincingly link existing groups to single inferred ancestors, just as insertions and deletions within genes can do. For instance, Philippe and collaborators (2000a, 2000b) recently identified a fusion of dihydrofolate reductase (DHFR) and thymidylate synthase (TS) genes that unites plants, alveolates, and Euglenozoa. In a more complex case, Fast et al. (2001) propose that the plastid-targeted GAPDH (glyceraldehyde-3-phosphate dehydrogenase) genes common to apicomplexa, dinoflagellates, heterokonts, and perhaps cryptomonads derive from a unique duplication and retargeting of cytosolic GAPDH at the base of this group, which also includes the secondarily aplastidic ciliates.

Unambiguous cases of such unique genetic events may not only serve to unite groups, but also to root the eukaryotic tree. Rooting requires events that unite all eukaryotic taxa save one; that one, provided it convincingly shares the pre-event character state with prokaryotic outgroups, can then be taken as the deepest branch. Keeling and Palmer (2000) recently found two deletions in enolase shared by all known eukaryotes save parabasalids (such as Trichomonas). Parabasalids are themselves like archaea in enolase structure, and on this basis these authors nominated parabasalids as the deepest eukaryotes. Any single such one-of-a-kind event can in principle establish relationships between the unconnected groups of Figure 2, but molecular evolutionists are as often wrong as right in specific cases. Thus, it is comforting that genomic data will likely provide scores of such examples.

Protist genomics is in its infancy, about where prokaryotic genomics was six years ago when the first complete sequences began to appear. What evolutionary microbiologists will soon have are phylogenetic trees for tens of thousands of genes and partial or complete gene contents (through EST, genome surveys and complete sequences) for dozens of protists about many of whose evolutionary relationships we can currently only guess. As these data accumulate, comparative analyses of the “parts lists” of eukaryotic molecular machines and cellular organelles much more detailed and complete than Table 1 will be assembled. Our conclusion that the last common eukaryotic ancestor was a complex cell notwithstanding, there will surely be taxon-specific differences in parts lists that support more inclusive groupings of the taxa in Figure 2. Conversely, as such groupings (and the eukaryotic root) become established in multiple overlapping analyses, one can undertake a detailed reconstruction of the evolutionary history of specific eukaryotic cellular structures. If we do find new eukaryotic lineages that are primitively simple, then the “many and profound” differences that distinguish eukaryotes from prokaryotes can be tracked to their origins. If we don’t, we can content ourselves with detailing the many elaborations wrought in such structures over one to two billion years of eukaryotic cellular radiation.

From a practical standpoint, until we have a well-resolved phylogeny, it is necessary to sample as many eukaryotes as possible. EST projects provide evidence for protein presence and expression levels. Full genome sequences are going to be harder to come by, but, importantly, provide information about protein absence, crucial when ascribing simplified protein complements to potentially early-diverging lineages. Although genomics projects are currently clustered in the better-known taxa, as a wider diversity is covered, we will be able to glean more and more information about eukaryotic biology. By looking at the detailed phylogenies and paralog complements from diverse eukaryotes, we may be able to reconstruct, on a molecular level, aspects of our earliest eukaryotic ancestors’ cell biology.

Acknowledgments

We would like to thank A.J. Roger, J.M. Logsdon, A.G.B. Simpson, and R.A. Singer for helpful discussion, and the latter two also for critical reading of the manuscript. This work was supported by Canadian Institutes of Health Research (C.I.H.R.) grant MT4467 to W.F.D. J.B.D. is supported by a C.I.H.R. Doctoral Research Award and W.F.D. by a Canada Research Chair.

References

Archibald, J.M., Logsdon, J.M., Jr., and Doolittle, W.F. (2000). Origin and evolution of eukaryotic chaperonins: phylogenetic evidence for ancient duplications in CCT genes. Mol. Biol. Evol. 17, 1456–1466.
Archibald, J.M., Doolittle, W.F., and O’Kelley, C.J. (2001). The chaperonin genes of jakobid and jakobid-like flagellates: implications for eukaryotic evolution. Mol. Biol. Evol., in press.
Armstrong, J. (2000). Membrane traffic between genomes. Genome Biol. 104, 1–4.
Baldauf, S.L., and Palmer, J.D. (1993). Animals and fungi are each other’s closest relatives: congruent evidence from multiple proteins. Proc. Natl. Acad. Sci. USA 90, 11558–11562.
Baldauf, S.L., Roger, A.J., Wenk-Siefert, I., and Doolittle, W.F. (2000). A kingdom-level phylogeny of eukaryotes based on combined protein data. Science 290, 972–977.
Beja, O., Aravind, L., Koonin, E.V., Suzuki, M.T., Hadd, A., Nguyen, L.P., Jovanovich, S.B., Gates, C.M., Feldman, R.A., Spudich, J.L., et al. (2000). Bacterial rhodopsin: evidence for a new type of phototrophy in the sea. Science 289, 1902–1906.
Biderre, C., Metenier, G., and Vivares, C.P. (1998). A small spliceosomal-type intron occurs in a ribosomal protein gene of the microsporidian Encephalitozoon cuniculi. Mol. Biochem. Parasitol. 94, 283–286.
Borg, S., and Poulsen, C. (1994). Molecular analysis of two Ypt/Rab-related sequences isolated from soybean (Glycine max) DNA libraries. Plant Mol. Biol. 26, 175–187.

Bork, P., Sander, C., and Valencia, A. (1992). An ATPase domain common to prokaryotic cell cycle proteins, sugar kinases, actin, and hsp70 heat shock proteins. Proc. Natl. Acad. Sci. USA 89, 7290–7294.
Brugerolle, G., and Patterson, D.J. (1997). Ultrastructure of Trimastix convexa Hollande, an amitochondriate anaerobic flagellate with a previously undescribed organization. Eur. J. Prot. 33, 121–130.
Bui, E.T., Bradley, P.J., and Johnson, P.J. (1996). A common evolutionary origin for mitochondria and hydrogenosomes. Proc. Natl. Acad. Sci. USA 93, 9651–9656.
Bush, J., Franek, K., Daniel, J., Spiegelman, G.B., Weeks, G., and Cardelli, J. (1993). Cloning and characterization of five novel Dictyostelium discoideum rab-related genes. Gene 136, 55–60.
Cavalier-Smith, T. (1983). A 6-kingdom classification and a unified phylogeny. In Endocytobiology II, H.E.A. Schenk and W. Schwemmler, eds. (Berlin: Walter de Gruyter), pp. 1027–1034.
Cavalier-Smith, T. (1991). Intron phylogeny: a new hypothesis. Trends Genet. 7, 145–148.
Cavalier-Smith, T., and Chao, E.E. (1996). Sarcomonad ribosomal RNA sequences, rhizopod phylogeny, and the origin of euglyphid amoebae. Arch. Protistenkd. 147, 227–236.
Clark, C.G., and Roger, A.J. (1995). Direct evidence for secondary loss of mitochondria in Entamoeba histolytica. Proc. Natl. Acad. Sci. USA 92, 6518–6521.
Dacks, J.B., Silberman, J.D., Simpson, A.G., Moriya, S., Kudo, T., Ohkuma, M., and Redfield, R.J. (2001). Oxymonads are closely related to the excavate taxon Trimastix. Mol. Biol. Evol. 18, 1034–1044.
Doolittle, W.F. (1999). Phylogenetic classification and the universal tree. Science 284, 2124–2129.
Edgcomb, V.P., Roger, A.J., Simpson, A.G., Kysela, D.T., and Sogin, M.L. (2001). Evolutionary relationships among “jakobid” flagellates as indicated by alpha- and beta-tubulin phylogenies. Mol. Biol. Evol. 18, 514–522.
Embley, T.M., and Hirt, R.P. (1998). Early branching eukaryotes? Curr. Opin. Genet. Dev. 8, 655–661.
Fast, N.M., and Doolittle, W.F. (1999). Trichomonas vaginalis possesses a gene encoding the essential spliceosomal component, PRP8. Mol. Biochem. Parasitol. 99, 275–278.
Fast, N.M., Kissinger, J.C., Roos, D.S., and Keeling, P.J. (2001). Nuclear-encoded, plastid-targeted genes suggest a single common origin for apicomplexan and dinoflagellate plastids. Mol. Biol. Evol. 18, 418–426.
Fast, N.M., Roger, A.J., Richardson, C.A., and Doolittle, W.F. (1998). U2 and U6 snRNA genes in the microsporidian Nosema locustae: evidence for a functional spliceosome. Nucleic Acids Res. 26, 3202–3207.
Felsenstein, J. (1978). Cases in which parsimony or compatibility methods will be positively misleading. Systematic Zoology 25, 401–410.
Field, M.C., and Boothroyd, J.C. (1995). Trypanosoma brucei: molecular cloning of homologues of small GTP-binding proteins involved in vesicle trafficking. Exp. Parasitol. 81, 313–320.
Flavin, M., and Nerad, T.A. (1993). Reclinomonas americana N. G., N. Sp., a new freshwater heterotrophic flagellate. J. Eukaryot. Microbiol. 40, 172–179.
Germot, A., Philippe, H., and Le Guyader, H. (1996). Presence of a mitochondrial-type 70-kDa heat shock protein in Trichomonas vaginalis suggests a very early mitochondrial endosymbiosis in eukaryotes. Proc. Natl. Acad. Sci. USA 93, 14614–14617.
Germot, A., Philippe, H., and Le Guyader, H. (1997). Evidence for the loss of mitochondria in Microsporidia from a mitochondrial-type HSP70 in Nosema locustae. Mol. Biochem. Parasitol. 87, 159–168.
Ghosh, S.K., Field, J., Frisardi, M., Rosenthal, B., Mai, Z., Rogers, R., and Samuelson, J. (1999). Chitinase secretion by encysting Entamoeba invadens and transfected Entamoeba histolytica trophozoites: localization of secretory vesicles, endoplasmic reticulum, and Golgi apparatus. Infect. Immun. 67, 3073–3081.
Hirt, R.P., Logsdon, J.M., Jr., Healy, B., Dorey, M.W., Doolittle, W.F., and Embley, T.M. (1999). Microsporidia are related to Fungi: evidence from the largest subunit of RNA polymerase II and other proteins. Proc. Natl. Acad. Sci. USA 96, 580–585.
Horner, D.S., and Embley, T.M. (2001). Chaperonin 60 phylogeny provides further evidence for secondary loss of mitochondria among putative early-branching eukaryotes. Mol. Biol. Evol. 18, 1970–1975.
Horner, D.S., Hirt, R.P., Kilvington, S., Lloyd, D., and Embley, T.M. (1996). Molecular data suggest an early acquisition of the mitochondrion endosymbiont. Proc. R. Soc. Lond. B Biol. Sci. 263, 1053–1059.
Inagaki, Y., and Doolittle, W.F. (2000). Evolution of the eukaryotic translation termination system: origins of release factors. Mol. Biol. Evol. 17, 882–889.
Janoo, R., Musoke, A., Wells, C., and Bishop, R. (1999). A Rab1 homologue with a novel isoprenylation signal provides insight into the secretory pathway of Theileria parva. Mol. Biochem. Parasitol. 102, 131–143.
Jones, L.J., Carballido-Lopez, R., and Errington, J. (2001). Control of cell shape in bacteria: helical, actin-like filaments in Bacillus subtilis. Cell 104, 913–922.
Keeling, P.J., and Palmer, J.D. (2000). Parabasalian flagellates are ancient eukaryotes. Nature 405, 635–636.
Keeling, P.J., and Doolittle, W.F. (1996). Alpha-tubulin from early-diverging eukaryotic lineages and the evolution of the tubulin family. Mol. Biol. Evol. 13, 1297–1305.
Logsdon, J.M., Jr. (1998). The recent origins of spliceosomal introns revisited. Curr. Opin. Genet. Dev. 8, 637–648.
Lopez-Garcia, P., and Moreira, D. (1999). Metabolic symbiosis at the origin of eukaryotes. Trends Biochem. Sci. 24, 88–93.
Lopez-Garcia, P., Rodriguez-Valera, F., Pedros-Alio, C., and Moreira, D. (2001). Unexpected diversity of small eukaryotes in deep-sea Antarctic plankton. Nature 409, 603–607.
Lujan, H.D., Marotta, A., Mowatt, M.R., Sciaky, N., Lippincott-Schwartz, J., and Nash, T.E. (1995). Developmental induction of Golgi structure and function in the primitive eukaryote Giardia lamblia. J. Biol. Chem. 270, 4612–4618.
Margulis, L. (1970). Origin of eukaryotic cells; evidence and research implications for a theory of the origin and evolution of microbial, plant, and animal cells on the Precambrian earth (New Haven: Yale University Press).
Martin, W., and Muller, M. (1998). The hydrogen hypothesis for the first eukaryote. Nature 392, 37–41.
Martin, W., Stoebe, B., Goremykin, V., Hapsmann, S., Hasegawa, M., and Kowallik, K.V. (1998). Gene transfer to the nucleus and the evolution of chloroplasts. Nature 393, 162–165.
Moreira, D., Le Guyader, H., and Philippe, H. (2000). The origin of red algae and the evolution of chloroplasts. Nature 405, 69–72.
O’Kelly, C.J., and Nerad, T.A. (1999). Malawimonas jakobiformis n. gen., n. sp. (Malawimonadidae n. fam.): A Jakoba-like heterotrophic nanoflagellate with discoidal mitochondrial cristae. J. Euk. Micro. 46, 522–531.
Pace, N.R. (1997). A molecular view of microbial diversity and the biosphere. Science 276, 734–740.
Philippe, H., Germot, A., and Moreira, D. (2000a). The new phylogeny of eukaryotes. Curr. Opin. Genet. Dev. 10, 596–601.
Philippe, H., Lopez, P., Brinkmann, H., Budin, K., Germot, A., Laurent, J., Moreira, D., Muller, M., and Le Guyader, H. (2000b). Early-branching or fast-evolving eukaryotes? An answer based on slowly evolving positions. Proc. R. Soc. Lond. B Biol. Sci. 267, 1213–1221.
Roger, A.J. (1999). Reconstructing early events in eukaryotic evolution. Am. Nat. 154, S146–S163.
Rodriguez, M.A., Garcia-Perez, R.M., Garcia-Rivera, G., Lopez-Reyes, I., Mendoza, L., Ortiz-Navarrete, V., and Orozco, E. (2000). An Entamoeba histolytica rab-like encoding gene and protein: function and cellular location. Mol. Biochem. Parasitol. 108, 199–206.
Roger, A.J., Clark, C.G., and Doolittle, W.F. (1996). A possible mitochondrial gene in the early-branching amitochondriate protist Trichomonas vaginalis. Proc. Natl. Acad. Sci. USA 93, 14618–14622.
Roger, A.J., Svard, S.G., Tovar, J., Clark, C.G., Smith, M.W., Gillin, F.D., and Sogin, M.L. (1998).
A mitochondrial-like chaperonin 60 Genome Review 425

gene in Giardia lamblia: Evidence that diplomonads once harbored an endosymbiont related to the progenitor of mitochondria. Proc. Natl. Acad. Sci. USA 95, 229–234.
Simpson, A.G., and Patterson, D.J. (2001). On core jakobids and excavate taxa: the ultrastructure of Jakoba incarcerata. J. Eukaryot. Microbiol. 48, 480–492.
Simpson, A.G.B., and Patterson, D.J. (1999). The ultrastructure of Carpediemonas membranifera (Eukaryota), with reference to the “excavate hypothesis”. Eur. J. Protistol. 35, 353–370.
Sogin, M.L. (1991). Early evolution and the origin of eukaryotes. Curr. Opin. Genet. Dev. 1, 457–463.
Stanier, R.Y. (1974). The origins of photosynthesis in Eukaryotes. In Evolution in the Microbial World, M.J. Carlile and J.J. Skehel, eds. (Cambridge, UK: Cambridge University Press), pp. 219–240.
Stiller, J.W., and Hall, B.D. (1999). Long-branch attraction and the rDNA model of early eukaryotic evolution. Mol. Biol. Evol. 16, 1270–1279.
Stiller, J.W., Duffield, E.C., and Hall, B.D. (1998). Amitochondriate amoebae and the evolution of DNA-dependent RNA polymerase II. Proc. Natl. Acad. Sci. USA 95, 11769–11774.
Swofford, D.L., Olsen, G.J., Waddell, P.J., and Hillis, D.M. (1996). Phylogenetic Inference. In Molecular Systematics, D.M. Hillis, C. Moritz and B.K. Mable, eds. (Sunderland, Mass.: Sinauer Associates Inc.), pp. 407–514.
Tachezy, J., Sanchez, L.B., and Muller, M. (2001). Mitochondrial type iron sulfur cluster assembly in the amitochondriate eukaryotes Trichomonas vaginalis and Giardia intestinalis, as indicated by the phylogeny of IscS. Mol. Biol. Evol. 18, 1919–1928.
Van de Peer, Y., Ben Ali, A., and Meyer, A. (2000). Microsporidia: accumulating molecular evidence that a group of amitochondriate and suspectedly primitive eukaryotes are just curious fungi. Gene 246, 1–8.
van den Ent, F., Amos, L.A., and Lowe, J. (2001). Prokaryotic origin of the actin cytoskeleton. Nature 413, 39–44.