<<

The Evolution of Eukaryotes

Denis BAURAIN
Department of Life Sciences, University of Liège
Société Royale des Sciences de Liège, 20 September 2012

Outline of the talk

1. What is a eukaryote?
2. What is the diversity of eukaryotes?
3. What are the relationships among the major groups of eukaryotes?
4. Where do eukaryotes come from?

1. What is a eukaryote?

Eukaryotic Cells

Ultrastructural definition: specific organelles (numbers refer to the labels in the figure)
• nucleus (1)
• nucleolus (2)
• ER (5, 8)
• Golgi (6)
• centriole(s) (13)
• mitochondrion/mitochondria (9)
• chloroplast(s)
• ...
http://en.wikipedia.org/

A eukaryotic gene is arranged in a patchwork of coding sequences (exons) and non-coding sequences (introns). Introns are eliminated while exons are spliced together to yield the mature mRNA used for protein synthesis.

http://reflexions.ulg.ac.be/

[Figure: gene (DNA) → transcription → pre-mRNA (Exon1 to Exon6) → alternative splicing → mature mRNA → translation → protein]

In many eukaryotes, almost all genes can lead to different proteins through a process termed alternative splicing.
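As a rough illustration of how one pre-mRNA can yield several mature mRNAs, the sketch below enumerates exon-skipping isoforms for a six-exon gene like the one in the figure. It rests on simplifying assumptions that are not in the slides: only whole internal exons are skipped, the first and last exons are always kept, and the exon sequences are invented toy strings. Real splicing is regulated, and most of these combinations would never be observed.

```python
from itertools import combinations


def splice(exons, keep):
    """Join the exons whose 1-based positions are in `keep`, in genomic order."""
    return "".join(seq for pos, seq in enumerate(exons, start=1) if pos in keep)


def isoforms(n_exons):
    """List exon-skipping isoforms as tuples of 1-based exon positions.

    Assumption (hypothetical model): the first and last exons are always
    retained and any subset of internal exons may be skipped.
    """
    internal = range(2, n_exons)  # positions 2 .. n_exons-1
    result = []
    for k in range(len(internal) + 1):
        for combo in combinations(internal, k):
            result.append((1, *combo, n_exons))
    return result


# Toy exon sequences (invented for illustration only).
exons = ["AUGGCC", "GAU", "CCA", "UUG", "GGA", "UAA"]

all_isoforms = isoforms(len(exons))
print(len(all_isoforms))            # 2**4 = 16 combinations of internal exons
print(splice(exons, {1, 3, 6}))     # exons 1, 3 and 6 joined: AUGGCCCCAUAA
```

With four internal exons the model gives 2^4 = 16 candidate transcripts from a single gene, which is the combinatorial point behind the statement above.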

http://reflexions.ulg.ac.be/

Endosymbiotic Organelles

Box 2 | Endosymbiotic evolution and the tree of genomes

Intracellular endosymbionts that originally descended from free-living prokaryotes have been important in the evolution of eukaryotes by giving rise to two cytoplasmic organelles. Mitochondria arose from α-proteobacteria and chloroplasts arose from cyanobacteria. Both organelles have made substantial contributions to the complement of genes that are found in eukaryotic nuclei today. The figure shows a schematic diagram of the evolution of eukaryotes, highlighting the incorporation of mitochondria and chloroplasts into the eukaryotic lineage through endosymbiosis and the subsequent co-evolution of the nuclear and organelle genomes. The host that acquired plastids probably possessed two flagella [113]. The nature of the host cell that acquired the mitochondrion (lower right) is fiercely debated among cell evolutionists. The host is generally accepted by most to have an affinity to ARCHAEBACTERIA but beyond that, biologists cannot agree as to the nature of its intracellular organization (prokaryotic, eukaryotic or intermediate), its age, its biochemical lifestyle or how many and what kind of genes it possessed [120]. The host is usually assumed to have been unicellular and to have lacked mitochondria.

[Figure: schematic tree of genomes. Base: Cyanobacteria, Proteobacteria, Archaebacteria. An ancient proteobacterium enters the host that acquired mitochondria (origin of mitochondria), followed by early diversification of eukaryotic lineages and gene transfer to the host. An ancient cyanobacterium enters an ancient protozoon (origin of plastids), followed by early diversification of algal/[...] lineages and gene transfer to the host. Top: Eukaryotes.]

Lynn Margulis: endosymbiotic origins of the mitochondrion and the chloroplast

Timmis et al. (2004) Nat Rev Genet 5:123-135

[...] the term 'NUMTS' was coined to designate these nuclear stretches of mtDNA [26]. Numts have been found in the nuclear genomes of grasshoppers [27], primates [28,29] and shrimps [30], and are often mistaken for bona fide mtDNA [31,32].

Eukaryotic genome sequences have more fully exposed the scale of integrated mitochondrial and cpDNA in the nuclear genome. Fragments of organelle DNA are becoming recognized as a normal attribute of nearly all eukaryotic chromosomes. For example, the yeast genome contains tracts with 80-100% similarity to mtDNA that range in size from 22 to 230 base pairs (bp) integrated at 34 sites [33]. This range of sequence divergence indicates recurrent transfer events, from ancient to contemporary. The human genome has at least 296 different numts of between 106 bp and 14,654 bp (90% of the mitochondrial genome) that cover the entire mtDNA circle [34]. Other studies tallied 612 mtDNA insertions in the human genome [35], a greater number because different sequence conservation criteria for identifying numts are used in different studies [36]. Older numts are more abundant in the human genome than recent INTEGRANTS, indicating that mtDNA can be amplified once inserted [36,37] and many are organized as tandem repeats [35]. Barely detectable numts are present in Plasmodium [32], but highly conserved numts have now [...]

ARCHAEBACTERIA: An ancient group of organisms that have ribosomes and cell membranes that distinguish them from eubacteria. They sometimes show environmentally extreme ecology.

NUMT: An acronym to describe nuclear integrants of mitochondrial DNA.

NATURE REVIEWS | GENETICS VOLUME 5 | FEBRUARY 2004 | 125

262 DANIEL GRZEBYK ET AL.

Genome Reduction

Chloroplasts:
1. genes useless to an endosymbiont
2. genes redundant with those of the nucleus
3. genes transferred to the nucleus

Table 2. The protein-coding gene nomenclature in algal genomes (adapted from Stoebe et al. 1998).

Gene code            Protein function
accA,B,D             Acetyl-CoA carboxylase
acpP                 Acyl carrier protein
apcA,B,D,E,F         Allophycocyanin phycobilisome
argB                 Acetylglutamate kinase
atpA,B,D,E,F,G,H,I   ATP synthase
bas1                 Thiol-specific antioxidant protein
bioY                 Biotin synthase
carA                 Carbamoyl phosphate synthetase
cbbX                 Red type Calvin cycle operon
ccsA                 Heme attachment to cytochrome c
cemA                 Envelope membrane protein
chlB,I,L,N           Protochlorophyllide reductase
clpC/P               Caseinolytic-like protease (Clp)
cpcA,B,G             Phycocyanin phycobilisome
cpeA,B               Phycoerythrin
crtE                 Geranylgeranyl pyrophosphate synthetase
cysA,T               Probable transport proteins
dfr                  Drug sensory protein
dnaB                 DNA-replication helicase
dnaK                 Hsp 70-type chaperone
dsbD                 Thiol:disulfide interchange protein
fabH                 β-Ketoacyl-acyl carrier protein synthase III
fdx                  2[4Fe-4S] ferredoxin
ftrB                 Ferredoxin-thioredoxin reductase
ftsH,W               Division proteins
glnB                 Nitrogen regulatory protein
gltB                 Glutamate synthase (GOGAT)
groEL,ES             Chaperonins 60 and 10 kDa
hemA                 5-Aminolevulinic acid synthase
hisH                 Histidinol-phosphate aminotransferase
I-CvuI               DNA endonuclease
ilvB,H               Acetohydroxyacid synthase
infA,B,C             Translational initiation factors
minD,E               Homologues of bacterial cell division regulators
mntA,B               Manganese transport system proteins
moeB                 Molybdopterin biosynthesis protein
nadA                 Quinolinate synthase
nblA                 Phycobilisome degradation protein
ndhA-J               NADH-plastoquinone oxidoreductase
ndhK                 NADH-ubiquinone oxidoreductase
ntcA                 Global nitrogen transcriptional regulator
odpA,B               Pyruvate dehydrogenase E1 component
pbsA                 Heme oxygenase
petA,B,D,F,G,J,L,M   Photosystem electron transport proteins
pgmA                 Phosphoglycerate mutase
preA                 Prenyl transferase
psaA-M               PSI proteins
psbA-X               PSII proteins
rbcLg/r              RUBISCO large subunit, green and red forms
rbcR                 RUBISCO operon transcriptional regulator
rbcSg/r              RUBISCO small subunit, green and red forms
rdpO                 Probable reverse transcriptase
rne                  RNAseE
rpl1-36              Large subunit ribosomal proteins
rpoA,B,C1,C2         RNA polymerase
rps1-20              Small subunit ribosomal proteins
secA,Y               Preprotein-translocase
syfB                 Phenylalanine tRNA synthetase
syh                  Histidine tRNA synthetase
thiG                 Thiamine biosynthesis
trpA                 Tryptophane synthase
trpG                 Anthranilate synthase, glutamine amidotransferase
trxA                 Thioredoxin
tsf                  Translational elongation factor Ts
tufA                 Translational elongation factor Tu
upp                  Uracil phosphoribosyltransferase
ycf                  Hypothetical proteins

[...chlo]rarachniophyte plastids appear to have been acquired from secondary endosymbiosis of chlorophytes (van de Peer et al. 1996, Bhattacharya and Medlin 1998, Palmer and Delwiche 1998, Turmel et al. 1999). Accordingly, the set of genes retained in secondary endosymbiotic plastids is almost totally included in the primary lineage from which they descended (Fig. 1). The only exceptions are a small number of genes retained in secondary plastids that may not be present in their related extant primary plastids: minD and minE in cryptophytes (Fig. 1), ycf66 in [...], and ycf13 and ycf67 in euglenophytes. These genes are relics from ancestral primary plastids that were later lost.

Gene retention and losses in chloroplast genomes reflect evolutionary patterns

Whereas the presence of specific genes provides clues about the origin of plastids, plastid gene losses, which are effectively irreversible, also provide evolutionary information. There are 200 or fewer protein-coding genes in primary plastids. Assuming that the number of genes in the original primary ur-plastid was similar to that found in the extant cyanobacterium Synechocystis PCC6803 (3168 protein-coding genes; Kaneko et al. 1996), more than 93% of the ancestral endosymbiont genome was lost or transferred to the host nucleus in the primary plastid lineages (Fig. 2). Green plastids exhibit the most numerous gene losses, either specific losses or those in common with [...]. Subsequently, patterns of gene losses occurred differently at phylum level radiations within primary lineages and at radiations accompanying secondary symbiotic events. Between 1% and 10% of the remaining plastid genes were lost at phylum level radiations within rhodophytes and chlorophytes. A larger number of the remaining primary plastid genes were lost at radiations linked to secondary symbiotic events. For example, with the divergence of cryptophyte and bacillariophyte plastids from the rhodoplasts, between 15% and 20% of the plastid genes were lost. Approximately 30% of the genes were lost at the divergence of euglenoid plastids from chloroplasts. This analysis suggests that endosymbiotic events resulted in relatively rapid and massive gene losses in plastids, whereas the radiations within phyla were accompanied by slower and more gradual genomic erosion.

We used the patterns of gene loss, inferred from a simple presence/absence analysis, to construct an evolutionary tree (Fig. 3). In this analysis, we implicitly assume that each gene retained in a plastid reflects an ancestral status, and hence, at least within a lineage, the more evolved a plastid, the more genes were lost. The patterns of gene loss clearly separate the red plastid lineage, with rhodophytes at the base of the cluster, from the green lineage. Further out of the red lineage are the secondary endosymbiotic algae: the cryptophytes and bacillariophytes. In the green lineage, the Mesostigma plastid appears ancestral within the chlorophytes, and the pattern of gene loss suggests that land plants diverged early from chloro[...]

Mitochondrion-to-nucleus transfers. In yeast, a recombinant plasmid, which was introduced into a genome-lacking mitochondrion, was shown to relocate to the nucleus as an EPISOME (that is, not recombined into nuclear DNA) at a frequency of 2 × 10^-5 per cell per generation [80]. A lower frequency (5 × 10^-6 per cell per generation) of episomal relocation was observed when the plasmid was integrated into the mitochondrial chromosome [81]. In these experiments, the released DNA was episomal, indicating that release of DNA from the yeast mitochondrion is frequent, but integration might be rare in yeast nuclei because of their characteristically high level of reliance on homologous recombination for DNA incorporation. Newer work indicates that mtDNA escape in yeast occurs through an intracellular mechanism that depends on the composition of the growth medium and the genetic state of the mitochondrial genome, and is independent of an RNA intermediate [82].

Chloroplast-to-nucleus transfers in higher plants. Only more recently has it been possible to quantify the process of chloroplast-to-nucleus DNA transfer. To determine the frequency of plastid DNA transfer and integrative recombination into the higher plant nuclear genome, the plastome of tobacco was transformed with a neomycin phosphotransferase gene (neoSTLS2) that was tailored for expression only in the nuclear genome [83] (BOX 4). In 16 out of ~250,000 seedlings, the neoSTLS2 marker had been integrated into a nuclear chromosome, each time in a different location, which equates to a chloroplast-to-nucleus DNA transfer frequency of one in 16,000 gametes tested. The diversity of insertion locations indicates that the marker might be transposed during meiotic or postmeiotic events during male gamete formation, because the extreme alternative explanation for these integrations (a single transfer event that is subsequently amplified by somatic cell division) would lead to the same integration site being found in all plants with chloroplast integrants [83]. In agreement with the DNA integrations induced by BIOLISTIC TRANSFORMATION, these transfers show no particular preference for recombination sites in either the nuclear or plastid genomes.

Using a similar experimental strategy with a transgene in a different plastomic location, the frequency of chloroplast-to-nucleus transposition was estimated in tobacco somatic cells [84]. Leaf tissue from transplastomic tobacco that contained an intron-less neo gene was cultured on medium that contained high concentrations of kanamycin (100-400 mg/L). Twelve highly resistant plants were regenerated, 11 of which showed Mendelian inheritance of the antibiotic-resistant phenotype. After a courageous approximation of the number of regenerable cells that were present in LEAF EXPLANTS, a chloroplast-to-nucleus transposition frequency of one event in ~five million somatic cells was estimated [84]. Taken at face value, the frequency in somatic cells is ~300 times lower than that in male gametes of the same species [83]. The programmed degeneration of plastids that occurs during pollen-grain development (the process that underpins UNIPARENTAL INHERITANCE of plastid genes; FIG. 1) might explain this difference. After the chloroplast genomes are [...]

[Figure: Chloroplast genome reduction and gene transfer to the nucleus. Bars show gene counts by functional category (photosynthetic electron transport, respiration, ATPase; translation; other) mapped onto a tree with axes "Evolutionary time" and "Lineage diversification". Taxa: Pinus, Chlorella, Porphyra, Guillardia, Odontella, Cyanidium, Marchantia, Mesostigma, Cyanophora, Nephroselmis, flowering plants, Synechocystis (3,200 genes), other cyanobacteria (~2,000 to >7,000 genes).]

Figure 2 | Reduction of the chloroplast genome over time. We know that plastids originated more than 1.2 billion years ago, because fossil [...] of that age have been found [121]. The ancestor of plastids was a free-living cyanobacterium and therefore must have possessed several thousand genes as did its contemporaries. Subsequent to the invention of a plastid protein import apparatus (a prerequisite for relocating genes that encode proteins required by the organelle to the nucleus), plastids relinquished most of their genes to the genome of their host cell. This gene relocation process occurred massively at the onset of endosymbiosis and continued in parallel during algal diversification, yet the same core set of genes (for photosynthesis and translation) has been retained in all lineages. The size of the bars shown indicates the genome sizes of chloroplasts from a diversity of plant lineages, from red algae (Porphyra) to angiosperms (flowering plants) and Cyanophora (belonging to the most ancient lineage of photosynthetic eukaryotes), and their free-living cyanobacterial relatives (cyanobacteria). The reduction in chloroplast genome size has been mapped onto a phylogenetic tree of the relationships among these genomes. Numbers at the end of branches indicate the number of genes that are present in the respective genome. These genes are divided into three functional categories that are represented by the three different colours making up the bars. Data from REF. 19.

Timmis et al. (2004) Nat Rev Genet 5:123-135; Grzebyk et al. (2003) J Phycol 39:259-267

Endosymbiotic Gene Transfer

[Figure labels: Proteobacterium-like endosymbiotic ancestor, Mitochondrion, Chloroplast, Nucleus, Cyanobacterium-like endosymbiotic ancestor. Arrow legend: organelle DNA, proteins, other.]

Bill Martin: relocation to the nucleus of the genes encoded in the endosymbiotic organelles

Figure 1 | Organellar DNA mobility and the genetic control of biogenesis of mitochondria and chloroplasts. The eukaryotic mitochondrion is derived from a proteobacterial endosymbiotic ancestor but most of the genes that were originally present in this ancestor's genome have been transferred to the nucleus (thick black arrow), with only a small number being retained in the organelle (blue circle). Similarly, most of the genes from the cyanobacterial endosymbiont ancestor of the chloroplast were also transferred to the nucleus (thick black arrow). So, as a result, cytoplasmic organelles are heavily dependent on nuclear genes and import more than 90% of their proteins from the cytoplasm (white arrows). The dotted arrows indicate how DNA of mitochondrial (blue) and chloroplast (green) origin is still being transferred to the nucleus. Chloroplast and nuclear sequences are also found in the mitochondrial genome but little or no promiscuous DNA is located in the chloroplast.

Timmis et al. (2004) Nat Rev Genet 5:123-135

[...] new patterns of compartmentalization in the cell [11]. Moreover, gene donations from organelles often lead to functional replacement of pre-existing and functionally equivalent host genes, a process known as endosymbiotic gene replacement [69].

The number of proteins that are predicted to be imported into mitochondria varies markedly across eukaryotic groups, ranging from ~150 proteins in the parasitic Encephalitozoon cuniculi to ~4,000 proteins in humans. Only ~50 proteins were common to the mitochondria of all non-parasitic eukaryotes [72]. Similarly, the number of nuclear-encoded proteins that are predicted to be targeted to chloroplasts differs by a factor of two between rice and Arabidopsis [73]. Such predictions still have clear limitations but are improving with the accumulation of more direct experimental data for localization [73,74].

For biochemical pathways that are present in both the original host and its endosymbionts, competition can ensue [8,66]. In some cases, the pathway of the symbiont can predominate [18,75] but hybrid pathways can develop from both host and [...] sources [66,76,77]. Organelle division is a prime example of lineage-specific mixing and matching of endosymbiotically inherited functions with newly evolved, [...]-specific biochemistry [78,79].

In summary, retargeting of proteins among organelles and, most notably, the cytosol is a highly dynamic and influential process in eukaryotic evolution. When genes are donated from organelles to the nucleus, there is no homing device that automatically re-routes the protein product back to the donor organelle. Rather, chance, natural selection and lineage diversification seem to govern the intracellular targeting fate of genes that organelles donate to the chromosomes of their host. In this sense, gene donations from organelles are important starting material for the evolution of new genes that are specific to the eukaryotic lineage.

Laboratory estimates of transfer frequencies

Comparative genome analyses show us that gene transfers have occurred at different times in the past, and indicate that the process is continuing. The challenge has been to get direct empirical estimates of the frequency at which DNA is being transferred among cellular compartments.

EPISOME: A unit of genetic material that is composed of a series of genes that sometimes has an independent existence in a host cell and at other times is integrated into a chromosome of the cell, replicating itself along with the chromosome.

BIOLISTIC TRANSFORMATION: A commonly used transformation method in which metal beads are coated with gene constructs and shot into cells.

LEAF EXPLANTS: Small sterile sections of leaf or other plant tissue from which whole plants might sometimes be regenerated.

UNIPARENTAL INHERITANCE: The mode of inheritance that generally characterizes the genes of cytoplasmic organelles in which only one of the two sexual partners contributes to the offspring.

Woese’s Tree of Life

SSU rRNA (16S/18S)

Phylogenetic definition

3 domains
1. Eubacteria = Bacteria (prokaryotes)
2. Archaebacteria = Archaea (prokaryotes)
3. Eukaryotes = Eukarya

Carl Woese
Woese (1987) Microbiol Rev 51:221-271; http://pacelab.colorado.edu/

2. What is the diversity of eukaryotes?

Whittaker’s System

[...] together are the Archimycetes of Gäumann (65, 57)] become phyla of [...], adjacent to absorptive and spore-forming organisms regarded as protozoans, the Sporozoa and Cnidosporidia. Other wall-less fungi, the slime molds, probably include at least three separate evolutionary lines from the unicellular condition (66), the true slime molds (Myxomycetales), cellular slime molds (Acrasiales), and cell-net slime molds (Labyrinthulales). These have, for their separate origin and different organization, been treated as three phyla and grouped in a polyphyletic subkingdom Gymnomycota.

This treatment results in a considerable elevation of taxa; groups which are orders and classes in most other classifications become phyla here, in some cases separated into different branches and subkingdoms. Recognition of three phyla of slime molds and seven of chytrid and mycelial fungi is not, however, undue taxonomic inflation. The range of forms comprised in the fungi is wide, and the evidence of independent origin of various fungal and [...] groups is clear. It is suggested that true fungi and slime molds are not best treated as two phyla, that their designation as such is in part a consequence of the effort to treat these groups within the plant or the protoctist kingdom, and that the expansion of each into a number of phyla is more reasonable.

I believe that this system better represents broad relationships in regard to both levels of organization and nutritive modes affecting kinds of organization than the two-kingdom and Copeland systems. The red and [...] and the fungi may seem better placed, the former as the higher plants of the sea, the latter as the third major evolutionary direction among higher organisms. The system may further have much advantage over the two-kingdom system and some over the Copeland system in the coherence and definable character of the kingdoms as units of classification.

Limitations of the Five-Kingdom System

1) The distinction of the unicellular versus the multicellular and multinucleate conditions becomes the line of division and difficulty. The phylum Chlorophyta includes intergrading unicellular, colonial-unicellular, and multicellular forms and consequently violates the definition either of the Plantae (in which it is placed here) or of the Protista (in which it could with equal justice be placed). The slime molds cross the distinctions of the kingdoms in both nutrition and organization, and offer a free choice of treatment as aberrant fungi, eccentric protists, or very peculiar animals. The line from the unicellular to multicellular and multinucleate organization has been crossed by a number of independent phyletic lines. I suggest that the transition between the unicellular and multicellular-multinucleate conditions is a better conceptual division between lower and higher organisms than degree of tissue differentiation. The practical difficulties with borderline groups are at least as great, and may be greater, when the separation is based on the unicellular condition rather than degree of tissue differentiation. There is room for different judgments on the merits of the two lines of division.

2) The three higher kingdoms are polyphyletic. The Rhodophyta and Phaeophyta are recognized to have come from different unicellular ancestors than the [...]; the resemblance of these three groups as higher plants results from convergence. There is reason also to suspect that these algae supplement photosynthetic nutrition by absorption (67). Judged by the criterion of monophyly, the Plantae as treated here may seem less a [...] than an alliance of separate groups which are multicellular and predominantly photosynthetic. It is also true that the Metazoa in its traditional form is polyphyletic, with separate derivation to be assumed for two and probably all three of its subkingdoms. The kingdom Fungi includes, as indicated, probably two convergent groups of chytrid and mycelial fungi and three of slime molds.

3) Even with the multicellular algae and higher fungi excluded, the Protista is a grouping of diverse organisms of disparate directions of evolution. Necessarily, some protist phyla are more [...]

[Figure: five-kingdom diagram with Plantae, Fungi and Animalia at the top.]

Fig. 3. A five-kingdom system based on three levels of organization: the procaryotic (kingdom Monera), eucaryotic unicellular (kingdom Protista), and eucaryotic multicellular and multinucleate. On each level there is divergence in relation to three principal modes of nutrition: the photosynthetic, absorptive, and ingestive. Ingestive nutrition is lacking in the Monera; and the three modes are continuous along numerous evolutionary lines in the Protista; but on the multicellular-multinucleate level the nutritive modes lead to the widely different kinds of organization which characterize the three higher kingdoms: Plantae, Fungi, and Animalia. Evolutionary relations are much simplified, particularly in the Protista. Phyla are those of Table 1; but only major animal phyla are entered, and phyla of the bacteria are omitted. The Coelenterata comprise the Cnidaria and Ctenophora; the Tentaculata comprise the Bryozoa, Brachiopoda, and Phoronida, and in some treatments the Entoprocta.

10 JANUARY 1969 157

Robert H. Whittaker
Whittaker (1969) Science 163:150-160

Eukaryotic Diversity

Walker et al. (2011) Parasitology 138:1638-1663

letters to nature

Environmental Surveys

SSU rRNA
Purificacion Lopez-Garcia, David Moreira
Drake Passage

[...] 5 µm planktonic fraction and, for comparison, also from the microbial fraction >5 µm at 3,000 m deep. After partial sequencing of the 3′ region of the gene (700 base pairs, bp, on average), BLAST [5] searches and phylogenetic reconstruction distance methods provided us with a first survey of the type of eukaryotic sequences present in our samples. Twenty-four representative clones from all depths were subsequently chosen for complete sequencing. The complete sequences were aligned with 1,443 additional 18S rRNA gene sequences retrieved from databanks. A subset of 101 complete sequences was then selected for phylogenetic analysis, taking special care to include a taxonomically broad sample of eukaryotes (all closest relatives to our sequences) to minimize artefacts related to taxonomic sampling. We constructed distance (neighbour-joining, NJ), maximum-parsimony (MP) and maximum-likelihood (ML) trees, which produced similar congruent results. Figure 1 shows an ML tree displaying the eukaryotic microbial diversity found.

As was expected of cold, highly oxygenated waters, the majority of sequences affiliate with the eukaryotic "crown", the densely branched apical part of the eukaryotic tree [6]. However, we found three phylotypes belonging to the early branching part of the 18S rRNA tree (Fig. 1a). DH148-5-EKD18 represents a new lineage emerging in the region of the Archezoa. The large length of its branch suggests that it could correspond to a parasite whose rRNA has evolved rapidly. It would thus be 'attracted' to the base of the tree by a long-branch attraction artefact, as indeed occurs with Microsporidia [7]. This may be supported by the occurrence of several specific deletions in this sequence (data not shown), as fast-evolving eukaryotic sequences are often characterized by length variation [8]. DH145-EKD11 also corresponds to a new eukaryotic lineage of uncertain phylogenetic ascription, although it emerges in a region of the tree occupied by amoeboid organisms (Phreatamoeba, Entamoeba or Myxozoa) (Fig. 1a). We could retrieve very few sequences with the primer set EK-1F + EK-1520R, which was used only with the 3,000-m sample, but all of them (here represented by DH148-EKB1) were related to Diplonema spp., which are euglenozoan heterotrophs frequently found in marine benthic sites [9].

The diversity of crown eukaryotes is much larger (Fig. 1b). The most frequently retrieved groups were the alveolates, followed by heterokonts. We also found sequences belonging to fungi, and to the amoeboid phagotrophic acantharean radiolaria. Fungi reaffirm themselves as one of the most ecologically successful eukaryotic lineages; they have even been isolated from the bottom of the Mariana trench (10,897 m) [10]. Within the heterokonts, we detected sequences related to the labyrinthulids (DH147-EKD10). These are relatively common in the sea and play a role in decomposition processes colonizing faecal pellets also under deep-sea conditions [11]. Two other sequences, DH148-5-EKD53 and DH144-EKD10, do not clearly affiliate with any known species, and may represent new lineages of heterokonts. We also retrieved a pennate diatom sequence (DH148-5-EKD54) at 3,000 m that could correspond to a sinking cell. However, as it is very similar to Pseudo-nitzschia spp. sequences, common dinoflagellate endosymbionts, this sequence could instead derive from a dinoflagellate endosymbiont.

Alveolate sequences were by far the most diverse in our samples. Within the commonly predatory ciliates, we obtained new oligohymenophorean (DH148-5-EKD6) and colpodean (DH147-EKD23) sequences. We also recovered typical dinoflagellate sequences from all depths and both planktonic fractions (Figs 1b and 2). These sequences are related to [...] (often lacking a theca cell wall) and [...] (theca with two plates), which are usually small [12]. Interestingly, the vast majority of sequences obtained grouped in two major [...] between dinoflagellates and apicomplexans (we have termed these marine alveolate groups I and II). These sequences were mainly retrieved from the smallest planktonic fraction at all depths (Figs 1b and 2). In terms of genetic divergence, the diversity found within these groups is equivalent to [...]

[Figure: 18S rRNA tree in two panels. Panel a (basal part): Microsporidia, Diplomonadida, Trichomonadida, Physarum polycephalum, Diplonema papillatum, Euglena gracilis, Phreatamoeba balamuthi, Entamoeba histolytica, Myxozoa, Dictyostelium discoideum, Ammonia beccarii, Haplosporidia, crown eukaryotes, with new sequences DH148-5-EKD18, DH148-EKB1 and DH145-EKD11. Panel b (crown): alveolates (ciliates, Apicomplexa, marine alveolate groups I and II, dinoflagellates), Acantharea, Chlorarachniophyta, heterokonts (labyrinthulids and thraustochytrids, Dictyochophyceae, Bolidophyceae, centric and pennate diatoms, Chrysophyceae), Haptophyceae, green plants, Cryptophyta, Acanthamoebidae, red algae, Metazoa, Fungi. Depth legend: 250 m, 500 m, 2,000 m, 3,000 m, 3,000 m (> 5 µm).]

Figure 1 Maximum-likelihood tree of eukaryotic phylotypes in deep Antarctic waters constructed using 101 eukaryotic 18S rRNA sequences. The tree has been split in two parts representing the basal part (a) and the crown (b) of the eukaryotic rRNA-based phylogeny. The outgroup branch (archaea) is not shown. Thin triangles correspond to two representative species of a given taxon, three in the case of Apicomplexa. Bootstrap values are given only below nodes concerning the new eukaryotic sequences. The colour code indicates sea depths at which sequences were obtained. Scale bars correspond to 15 substitutions per 100 positions for a unit branch length.

604 © 2001 Macmillan Magazines Ltd NATURE | VOL 409 | 1 FEBRUARY 2001 | www.nature.com

Lopez-Garcia et al. (2001) Nature 409:603-607
We also retrieved a pennate diatom Vacuolaria virescens Nannochloropsis salina sequence (DH148-5-EKD54) at 3,000 m that could correspond to Fucus gardneri a sinking cell. However, as it is very similar to Pseudo-nitzschia spp. Haptophyceae sequences, common dino¯agellate endosymbionts, this sequence Green plants Cryptophyta could instead derive from a dino¯agellate endosymbiont. Cyanophora paradoxa Alveolate sequences were by far the most diverse in our samples. Acanthamoebidae Within the commonly predatory ciliates, we obtained new oligo- Red algae Hydra littoralis Metazoa hymenophorean (DH148-5-EKD6) and colpodean (DH147- Saccharomyces cerevisiae DH148-5-EKD21 Fungi EKD23) sequences. We also recovered typical dino¯agellate 100 Eupenicillium javanicum sequences from all depths and both planktonic fractions (Figs 1b and 2). These sequences are related to Gymnodiniales (often lacking Figure 1 Maximum-likelihood tree of eukaryotic phylotypes in deep Antarctic waters a theca cell wall) and Prorocentrales (theca with two plates), which constructed using 101 eukaryotic 18S rRNA sequences. The tree has been split in two are usually small12. Interestingly, the vast majority of sequences parts representing the basal part (a) and the crown (b) of the eukaryotic rRNA-based obtained grouped in two major clades between dino¯agellates and phylogeny. The outgroup branch (archaea) is not shown. Thin triangles correspond to two apicomplexans (we have termed these marine alveolate groups I and representative species of a given taxon, three in the case of Apicomplexa. Bootstrap II). These sequences were mainly retrieved from the smallest values are given only below nodes concerning the new eukaryotic sequences. The colour planktonic fraction at all depths (Figs 1b and 2). In terms of genetic code indicates sea depths at which sequences were obtained. 
Scale bars correspond to divergence, the diversity found within these groups is equivalent to 15 substitutions per 100 positions for a unit branch length.
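The excerpt mentions distance (neighbour-joining) trees and a scale bar expressed in substitutions per 100 positions. As a minimal sketch of how such a distance can be derived from two aligned sequences, the Python below computes the observed proportion of differing sites and applies the Jukes-Cantor correction. The choice of Jukes-Cantor is an assumption made for simplicity (the excerpt does not say which correction the authors used), and the two toy sequences are hypothetical.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore alignment columns where either sequence has a gap.
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance, in expected substitutions per site."""
    if p >= 0.75:
        raise ValueError("p must be below 0.75 for the correction to be defined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical toy alignment (not data from the paper): 2 differences in 20 sites.
seq_1 = "ACGTACGTACGTACGTACGT"
seq_2 = "ACGTACGAACGTACCTACGT"

p = p_distance(seq_1, seq_2)
d = jukes_cantor(p)
print(f"observed p-distance: {p:.3f}")
print(f"Jukes-Cantor distance: {d:.4f} substitutions per site")
```

The corrected distance is slightly larger than the observed proportion because it accounts for multiple substitutions at the same site; a scale bar of 15 substitutions per 100 positions corresponds to 0.15 in these units.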

604 © 2001 Macmillan Magazines Ltd NATURE | VOL 409 | 1 FEBRUARY 2001 | www.nature.com

REPORTS

Single-Cell Genomics Reveals Organismal Interactions in Uncultivated Marine Protists

Hwan Su Yoon,1,2* Dana C. Price,3* Ramunas Stepanauskas,1 Veeran D. Rajah,3 Michael E. Sieracki,1 William H. Wilson,1 Eun Chan Yang,1 Siobain Duffy,3 Debashish Bhattacharya3†

Whole-genome shotgun sequence data from three individual cells isolated from seawater, followed by analysis of ribosomal DNA, indicated that the cells represented three divergent clades of picobiliphytes. In contrast with the recent description of this phylum, we found no evidence of plastid DNA nor of nuclear-encoded plastid-targeted proteins, which suggests that these picobiliphytes are heterotrophs. Genome data from one cell were dominated by sequences from a widespread single-stranded DNA virus. This virus was absent from the other two cells, both of which contained non-eukaryote DNA derived from marine Bacteroidetes and large DNA viruses. By using shotgun sequencing of uncultured marine picobiliphytes, we revealed the distinct interactions of individual cells.

Culture-independent analyses of environmental ribosomal DNA (rDNA) clone libraries and metagenomes can uncover unexpected microbial species and gene diversity (e.g., 1–3). These methods cannot, however, reveal in situ interactions among organisms. To achieve this level of resolution, genome data from single cells captured from the wild environment are needed. We used single-cell genomics (4–7) to study the marine plankton group Picobiliphyta, recently described as a previously unknown lineage of pigmented eukaryotes with a phylogenetic affinity to cryptophytes and katablepharids (8, 9). The cells were originally identified microscopically with the use of 18S rDNA-based fluorescent in situ hybridization probes. Although their ultrastructure is unknown, previous studies using autofluorescence and 4´,6-diamidino-2-phenylindole staining data (9, 10) appeared to show that picobiliphytes contain a plastid derived from a cryptophyte alga (owing to the presence of phycobilin proteins; hence the phylum name) and the associated remnant nucleus (nucleomorph). These taxa have not yet been successfully cultivated, leaving open the possibility that the plastid and nucleomorph may not be permanent acquisitions but rather come from a klepto-plastid or a cryptophyte alga

1Bigelow Laboratory for Ocean Sciences, West Boothbay Harbor, ME 04575, USA. 2Department of Biological Sciences, Sungkyunkwan University, Suwon 440-746, South Korea. 3Department of Ecology, Evolution and Natural Resources, Rutgers University, New Brunswick, NJ 08901, USA.
*These authors contributed equally to this work.
†To whom correspondence should be addressed. E-mail: [email protected]

REPORTS

Phylum-level Findings (1)

SSU rRNA, FISH, SCG

body stainable with the nucleic acid–specific dye DAPI (Fig. 1) may be a DNA-containing nucleomorph, similar to that found in cryptophytes and chlorarachniophytes (17), supporting the idea that picobiliphytes are another secondary endosymbiotic algal group (18). Kleptoplastidy is another possibility, such as in the katablepharids (19, 20), which along with the cryptophytes are the picobiliphytes’ purported sister group. However, kleptoplastidy is unlikely in such small organisms. In the absence of living cells to follow through cell division, we screened filtered 3-µm–fractioned water for cells that hybridized with our probes, using a ChemScan solid-phase cytometer (8) (fig. S2). We never encountered positive cells without a plastid on the filters scanned by the laser, which implies that the cells are predominately pigmented, so kleptoplastidy does not seem very likely.

Are the picobiliphytes representatives of another red algal secondary endosymbiosis, such as chromo-alveolates, in the broad sense, or do they have kleptoplastids? Without living cells, the status of their endosymbiosis and a formal description will remain unresolved. Nevertheless, picobiliphytes are pigmented and thus contribute to primary production. Molecular analysis confirms that they are a eukaryotic group that should be recognized at the phylum or division level, without any real indication of their sister group. We found that they are well represented in polar and cold temperate coastal marine ecosystems, as judged from their appearance in clone libraries and preliminary FISH data. The putative presence of a DNA-containing body in the purported plastid places them in an intriguing position in the study of plastid reduction to organelles.

Within the past 15 years, four algal classes have been described from the picoplankton [see (5) for details], and picobiliphytes represent another division or phylum. The phylogenetic analysis indicates that they are a highly diverse group, composed of at least three distinct clades. The temporal and spatial scales at which they occur, as inferred from molecular data, indicate that they could make up a substantial picoplankton fraction under certain conditions. The existence of small, sometimes rare, organisms is only now being recognized, and their role in ecosystem function is unknown, but they probably act as reservoirs of genetic capacity that are activated under specific conditions. The discovery of picobiliphytes and their apparent widespread distribution and contribution to marine protist assemblages highlight the imperative of understanding biodiversity before its loss on a global scale.

References and Notes
1. B. Díez, C. Pedrós-Alió, R. Massana, Appl. Environ. Microbiol. 67, 2932 (2001).
2. P. López-García, F. Rodríguez-Valera, C. Pedrós-Alió, D. Moreira, Nature 409, 603 (2001).
3. S. Y. Moon-van der Staay, R. De Wachter, D. Vaulot, Nature 409, 607 (2001).
4. K. Romari, D. Vaulot, Limnol. Oceanogr. 49, 784 (2004).
5. L. K. Medlin et al., Microb. Ecol. 167, 1432 (2006).
6. R. Massana, V. Balagué, L. Guillou, C. Pedrós-Alió, FEMS Microbiol. Ecol. 50, 231 (2004).
7. C. Lovejoy, R. Massana, C. Pedrós-Alió, Appl. Environ. Microbiol. 72, 3085 (2006).
8. See supporting material on Science Online.

[Figure, Not et al. (2007): 18S rRNA tree of picobiliphytes, cryptophytes, katablepharids and glaucophytes; scale bar, 0.1 substitutions/site.]

Fig. 1. Phylogenetic trees were reconstructed from full-length 18S rRNA sequence data listed in table S1 and inferred with Bayesian analysis from two parallel runs, each with one million generations with six chains and increased temperature between the chains to facilitate exchange between the chains (8). This tree is the 50% majority-rule tree of the last 100 trees saved from one of the parallel runs. Support for each node was also determined with 100 replicated bootstrap analyses of weighted maximum parsimony and neighbor-joining analyses. Nodes supported by bootstrap or posterior probability values above 50% are labeled for the three methods used (MrBayes/maximum parsimony/neighbor-joining). If a node was not supported by a method, it is indicated by a dash. The asterisk indicates that internal major clades were supported by 100 posterior probabilities from the MrBayes analysis. PICOBI01 and PICOBI02 are specific for the sequences belonging to the three clades as bracketed. (Insert) Picture of a cell targeted by the probe PICOBI02 (specific for picobiliphyte clade 2) from the Roscoff ASTAN sampling site on 26 September 2001. Arrows point to the DAPI-stained nucleus (nuc) in blue, to the green fluorescence from probe-specific labeling of the small subunit rRNA in the cytoplasm (cyto), and to the red autofluorescence from the phycobiliprotein-containing organelle (PBPorg). Double asterisks indicate sequences not recognized by the probes.

[Figure, Yoon et al. (2011): panels A (SSU rRNA tree, 1804 nt), B (contigs with hits for MS584-5, MS584-11 and MS584-22; all gene hits and unique gene hits), C (ssDNA virus genome) and D (Rep protein tree; scale bar, 0.2 substitutions/site).]

Fig. 1. (A) Randomized accelerated maximum likelihood (RAxML) phylogenetic tree of picobiliphyte SSU rDNA coding regions. RAxML bootstrap values are above the branches, and those derived from maximum parsimony (when nodes are shared) are below the branches. Only bootstrap values ≥60% are shown. Sequenced genomes are in bold. GenBank numbers are shown for each taxon. (B) Analysis of the taxonomic distribution of BLASTx hits using as query the 454-derived contigs from each SAG assembly (when ≥10; if ≤10, the different hits were grouped under “Others”). The total number of hits (blue bars) and the unique gene hits (red bars) are shown for MS584-5, MS584-11, and MS584-22. Some taxa are overrepresented, such as virus hits in MS584-5 and Bacteroidetes in MS584-22 that are probably explained by MDA bias. (C) Genome structure of the previously unknown ssDNA virus. (D) Simplified RAxML tree of Rep proteins from representative ssDNA viruses, showing the phylogenetic position of the MS584-5 sequence. Rep from marine ssDNA viruses is shown in blue, whereas sequences derived from ocean metagenome data are shown in red. The bootstrap values (when ≥60%) above the branches are from RAxML, whereas those below are from PhyML. The full tree is shown in fig. S2A.

Not et al. (2007) Science 315:253-255; Yoon et al. (2011) Science 332:714-717
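The panel (B) description in the Yoon et al. caption (total hits versus unique gene hits per taxon, with rare taxa pooled under "Others") can be illustrated with a short tally. The Python below is only a sketch under stated assumptions: the input tuple layout `(contig_id, gene_id, taxon)`, the function name `summarize_hits`, and the example records are all hypothetical, and taxa with exactly 10 hits are kept separate here because the caption is ambiguous at that boundary.

```python
from collections import Counter


def summarize_hits(hits, min_count=10):
    """Tally hits per taxon as (total hits, unique gene hits).

    `hits` is an iterable of (contig_id, gene_id, taxon) tuples; this layout
    is a hypothetical stand-in for parsed BLASTx output. Taxa with fewer than
    `min_count` total hits are pooled under "Others".
    """
    total = Counter()
    genes_by_taxon = {}
    for _contig_id, gene_id, taxon in hits:
        total[taxon] += 1
        genes_by_taxon.setdefault(taxon, set()).add(gene_id)

    summary = {}
    for taxon, n_hits in total.items():
        key = taxon if n_hits >= min_count else "Others"
        all_hits, unique_hits = summary.get(key, (0, 0))
        # Assumes a given gene_id is only ever assigned to one taxon.
        summary[key] = (all_hits + n_hits,
                        unique_hits + len(genes_by_taxon[taxon]))
    return summary


# Hypothetical records: 12 viral hits spread over 3 genes, plus rare taxa.
example_hits = [(f"contig{i}", f"rep{i % 3}", "Viruses") for i in range(12)]
example_hits += [
    ("contig12", "geneA", "Fungi"),
    ("contig13", "geneA", "Fungi"),
    ("contig14", "geneB", "Metazoa"),
]

summary = summarize_hits(example_hits)
for taxon, (all_hits, unique_hits) in sorted(summary.items()):
    print(f"{taxon}: {all_hits} hits, {unique_hits} unique genes")
```

A large gap between total and unique hits for one taxon, as with the toy viral records, is the kind of pattern the caption attributes to amplification bias.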

254 | 12 JANUARY 2007 | VOL 315 | SCIENCE | www.sciencemag.org
714 | 6 MAY 2011 | VOL 332 | SCIENCE | www.sciencemag.org

Phylum-level Findings (2)

Cryptomycota

SSU rRNA, FISH

RESEARCH LETTER

cryptomycota group-specific probes and a panel of negative controls (detailed in the Methods) were consistently negative.

To complement this approach, we also used two LKM11 FISH probes previously used to determine the eukaryotic community structure in freshwater environments⁶⁻⁸. We used the probe LKM11-01 (Fig. 1a, c) to identify a second subset of the cryptomycota clade and consistently recovered a cell type similar to that recovered for CM1 (Fig. 1d). The LKM11-02 probe was negative. We note that the abundance of cells identified by the LKM11-01 was at least tenfold higher than observed for the CM1 cells, suggesting that the LKM11-01 subclade was more abundant than the CM1 subclade in the environments sampled.

To examine further the life cycle and morphology of the target groups in the freshwater samples, we tested whether the cells identified by

[Figure: panels a (SSU rRNA tree of Fungi and cryptomycota, with probes CM1.1, CM1.2, CM2 and LKM11-01 marked; scale bar, 0.2 substitutions per site), b (provenance list), c (primer and probe positions on the SSU–5.8S–LSU array) and d (DAPI, TSA-FISH and overlay images).]

Figure 1 | Identification of the cryptomycota. a, Phylogeny demonstrates a diverse clade of environmental sequences (coloured dots) branching at the base of the Fungi. MrBayes tree topology was calculated from an alignment of 100 sequences and 1,012 DNA characters. Support values are summarized by black dots (indicating at least 0.9 Bayesian posterior probability and 80% bootstrap support by ML and Log-Det distance methods) or by black rings (Bayesian posterior probability above 0.6 and bootstrap support above 50%). The node showing monophyly of cryptomycota encompassing key sequences is marked with actual values, with the SSU–5.8S–LSU support value shown in red (Supplementary Fig. 2a). Shortened branches are labelled in grey. Sequences targeted by probes/primers are labelled on the tree. LKM11 denotes a subsection of the environmental sequences branching within the cryptomycota clade detected in earlier studies¹⁵. b, Provenance list of environmental DNA sequences. c, Schematic representation of the positions of primers (arrows) and probes (green hexagons) on a partial rDNA gene array (grey). d, TSA-FISH identification of cryptomycota cells (green) and DAPI staining of the microbial community (blue). Scale bar, 10 µm.

©2011 Macmillan Publishers Limited. All rights reserved

Jones et al. (2011) Nature 474:200-203

3. What are the relationships between the groups of Eukaryotes?

Vol 440|30 March 2006|doi:10.1038/nature04546 REVIEWS

Eukaryotic evolution, changes and challenges

T. Martin Embley1 & William Martin2

The idea that some eukaryotes primitively lacked mitochondria and were true intermediates in the prokaryote-to-eukaryote transition was an exciting prospect. It spawned major advances in understanding anaerobic and parasitic eukaryotes and those with previously overlooked mitochondria. But the evolutionary gap between prokaryotes and eukaryotes is now deeper, and the nature of the host that acquired the mitochondrion more obscure, than ever before.

New findings have profoundly changed the ways in which we view early eukaryotic evolution, the composition of major groups, and the relationships among them. The changes have been driven by a flood of sequence data combined with improved—but by no means consummate—computational methods of phylogenetic inference. Various lineages of oxygen-shunning or parasitic eukaryotes were once thought to lack mitochondria and to have diverged before the mitochondrial endosymbiotic event. Such key lineages, which are salient to traditional concepts about eukaryote evolution, include the diplomonads (for example, Giardia), trichomonads (for example, Trichomonas) and microsporidia (for example, Vairimorpha). From today’s perspective, many key groups have been regrouped in unexpected ways, and aerobic and anaerobic eukaryotes intermingle throughout the unfolding tree. Mitochondria in previously unknown biochemical manifestations seem to be universal among eukaryotes, modifying our views about the nature of the earliest eukaryotic cells and testifying to the importance of endosymbiosis in eukaryotic evolution. These advances have freed the field to consider new hypotheses for eukaryogenesis and to weigh these, and earlier theories, against the molecular record preserved in genomes. Newer findings even call into question the very notion of a ‘tree’ as an adequate metaphor to describe the relationships among genomes. Placing eukaryotic evolution within a time frame and ancient ecological context is still problematic owing to the vagaries of the molecular clock and the paucity of Proterozoic fossil eukaryotes that can be clearly assigned to contemporary groups. Although the broader contours of the eukaryote phylogenetic tree are emerging from genomic studies, the details of its deepest branches, and its root, remain uncertain.

The universal tree and early-branching eukaryotic lineages

The universal tree based on small-subunit (SSU) ribosomal RNA¹ provided a first overarching view of the relationships between different types of cellular life. The relationships among eukaryotes recovered from rRNA², backed up by trees of translation elongation factor (EF) proteins³, provided what seemed to be a consistent, and hence compelling, picture (Fig. 1). The three lineages at the base of these trees (Giardia, Trichomonas and Vairimorpha), along with Entamoeba and its relatives, were seen as members of an ultrastructurally simple, paraphyletic group of eukaryotes called the Archezoa⁴. Archezoa were thought to primitively lack mitochondria, having split from the main trunk of the eukaryotic tree before the mitochondrial endosymbiosis: all other eukaryotes contain mitochondria because they diverged after this singular symbiotic event⁵. Therefore, Archezoa were interpreted as contemporary descendants of a phagotrophic, nucleated, amitochondriate cell lineage that included the host for the mitochondrial endosymbiont⁶. The apparent agreement between molecules and morphology depicted the relative timing of the mitochondrial endosymbiosis (Fig. 1) as a crucial, but not ancestral, event in eukaryote phylogeny.

Chinks in the consensus

Mitochondrial genomes studied so far encode less than 70 of the proteins that mitochondria need to function⁵; most mitochondrial proteins are encoded by the nuclear genome and are targeted to

Archezoa vs. ‘Crown’ groups

~1995

Tom Cavalier-Smith

SSU rRNA & elongation factors

Cavalier-Smith (1993) Microbiol Rev 57:953-994; Embley & Martin (2006) Nature 440:623-630

Figure 1 | The general outline of eukaryote evolution provided by rooted rRNA trees. The tree has been redrawn and modified from ref. 92. Until recently, lineages branching near the root were thought to primitively lack mitochondria and were termed Archezoa⁴. Exactly which archezoans branched first is not clearly resolved by rRNA data², hence the polytomy (more than two branches from the same node) involving diplomonads, parabasalids and microsporidia at the root. Plastid-bearing lineages are indicated in colours approximating their respective pigmentation. Lineages furthest away from the root, including those with multicellularity, were thought to be the latest-branching forms and were sometimes misleadingly (see ref. 60) called the ‘crown’ groups.
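The Figure 1 caption defines a polytomy as more than two branches arising from the same node. A minimal sketch of that definition in Python follows, with a rooted tree written as nested tuples and leaves as strings. The function name `count_polytomies` and the toy topology are hypothetical illustrations and are not the tree shown in the figure.

```python
def count_polytomies(node) -> int:
    """Count internal nodes with more than two child branches.

    A tree is a nested tuple; a leaf is a plain string.
    """
    if isinstance(node, str):
        return 0  # a leaf has no children
    here = 1 if len(node) > 2 else 0
    return here + sum(count_polytomies(child) for child in node)


# Hypothetical toy topology: three unresolved lineages at the root,
# followed by a fully resolved (binary) remainder.
toy_tree = (
    "Diplomonads",
    "Trichomonads",
    "Microsporidia",
    ("Euglenozoa", ("Alveolates", ("Plants", ("Animals", "Fungi")))),
)

print(count_polytomies(toy_tree))
```

In this toy tree only the root is unresolved, which mirrors the idea that the branching order of the earliest lineages could not be settled from the data.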

1School of Biology, The Devonshire Building, University of Newcastle upon Tyne, Newcastle NE1 7RU, UK. 2Institute of Botany III, University of Düsseldorf, D-40225 Düsseldorf, Germany.

623 © 2006 Nature Publishing Group

What are the Archezoa?

Giardia

Trichomonas

Encephalitozoon

amitochondriate parasites (simple ultrastructure, no peroxisomes, poorly developed endomembrane system)

S150 The American Naturalist

Not so Archezoa (1)

Andrew Roger

CPN60 HSP70

presence of nuclear genes of mitochondrial origin in the Archezoa

Roger (1999) Am Nat 154:146-163

Figure 3: Phylogenies of chaperonin 60 and heat shock protein 70 genes from amitochondriate protists. Trees were obtained from protein maximum likelihood (PROTML; Adachi and Hasegawa 1996) analysis of aligned cpn60 (513 positions) (A) and hsp70 (515 positions) (B) data sets. Phylogenetic methodology is described in Roger et al. (1998). Amitochondriate lineages are shown in bold.

could be highly modified mitochondria (Cavalier-Smith 1987).

When data for hydrogenosomal proteins became available from T. vaginalis, it was shown that they had mitochondrial-like N-terminal targeting peptides that were cleaved upon import into the hydrogenosome (Johnson et al. 1993). However, stronger evidence came with the report of cpn60, cpn10, and hsp70 genes in T. vaginalis (Bui et al. 1996; Germot et al. 1996; Horner et al. 1996; Roger et al. 1996). Once again, phylogenetic analyses indicated that these genes were clearly related to mitochondrial isoforms (fig. 3). Furthermore, the cpn60 and hsp70 proteins appear to localize to hydrogenosomes in this organism (Bui et al. 1996; Bozner 1997), where they may

Not so Archezoa (2)

REVIEWS, NATURE, Vol 440, 30 March 2006

mitochondria using a protein import machinery that is specific to this organelle (7). The mitochondrial endosymbiont is thought to have belonged to the α-proteobacteria, because some genes and proteins still encoded by the mitochondrial genome branch in molecular trees among homologues from this group (5, 8). Some mitochondrial proteins, such as the 60- and 70-kDa heat shock proteins (Hsp60, Hsp70), also branch among α-proteobacterial homologues, but the genes are encoded by the host nuclear genome. This is readily explained by a corollary to endosymbiotic theory called endosymbiotic gene transfer (9): during the course of mitochondrial genome reduction, genes were transferred from the endosymbiont's genome to the host's chromosomes, but the encoded proteins were reimported into the organelle where they originally functioned. With the caveat that gene origin and protein localization do not always correspond (9), any nuclear-encoded protein that functions in mitochondria and clusters with α-proteobacterial homologues is most simply explained as originating from the mitochondrion in this manner.

By that reasoning (10), the discovery of mitochondrial Hsp60 in E. histolytica was taken as evidence that its ancestors harboured mitochondria. A flood of similar reports on mitochondrial Hsp60 and Hsp70 from all key groups of Archezoa ensued (11), suggesting that their common ancestor also contained mitochondria. At face value, those findings falsified the central prediction of the archezoan concept. However, suggestions were offered that lateral gene transfer (LGT) in a context not involving mitochondria could also account for the data. But that explanation, apart from being convoluted, now seems unnecessary: the organisms once named Archezoa for lack of mitochondria not only have mitochondrial-derived proteins, they have the corresponding double-membrane-bounded organelles as well.

Mitochondria in multiple guises

The former archezoans are mostly anaerobes, avoiding all but a trace of oxygen, and like many anaerobes, including various ciliates and fungi that were never grouped within the Archezoa, they are now known to harbour derived mitochondrial organelles: hydrogenosomes and mitosomes. These organelles all share one or more traits in common with mitochondria (Fig. 2), but no traits common to them all, apart from the double membrane and conserved mechanisms of protein import, have been identified so far. Mitochondria typically, but not always (the […] mitochondrion lacks DNA (12)), possess a genome that encodes components involved in oxidative phosphorylation (5). With one notable exception (13), all hydrogenosomes

mitochondria, hydrogenosomes, mitosomes

Presence of organelles homologous to mitochondria in the Archezoa

Figure 2 | Enzymes and pathways found in various manifestations of mitochondria. Proteins sharing more sequence similarity to eubacterial than to archaebacterial homologues are shaded blue; those with converse similarity pattern are shaded red; those whose presence is based only on biochemical evidence are shaded grey; those lacking clearly homologous counterparts in prokaryotes are shaded green. a, Schematic summary of salient biochemical functions in mitochondria (5, 88), including some anaerobic forms (16, 17). b, Schematic summary of salient biochemical functions in hydrogenosomes (14, 19). c, Schematic summary of available findings for mitosomes and 'remnant' mitochondria (32–34, 93). The asterisk next to the Trachipleistophora and Cryptosporidium mitosomes denotes that these organisms are not anaerobes in the sense that they do not inhabit O2-poor niches, but that their ATP supply is apparently O2-independent. UQ, ubiquinone; CI, mitochondrial complex I (and II, III and IV, respectively); NAD, nicotinamide adenine dinucleotide; MCF, mitochondrial carrier family protein transporting ADP and ATP; STK, succinate thiokinase; PFO, pyruvate:ferredoxin oxidoreductase; PDH, pyruvate dehydrogenase; CoA, coenzyme A; Fd, ferredoxin; HDR, iron-only hydrogenase; PFL, pyruvate:formate lyase; ASC, acetate-succinate CoA transferase; ADHE, bi-functional alcohol acetaldehyde dehydrogenase; FRD, fumarate reductase; RQ, rhodoquinone; Hsp, heat shock protein; IscU, iron–sulphur cluster assembly scaffold protein; IscS, cysteine desulphurase; ACS (ADP), acetyl-CoA synthase (ADP-forming).

Embley & Martin (2006) Nature 440:623-630

582 Evolution: Hirt et al. Proc. Natl. Acad. Sci. USA 96 (1999)

Not so Archezoa (3)

animal, plant, and some RPB1 protein sequences (36, 37) but are absent from the protists Trichomonas, Trypanosoma, and Giardia (38) as well as from the red algae Bonnemaisonia and Porphyra (39). Thus, the presence of these CTD repeats is consistent with the inferred fungal relationship (see below). The CTD has been implicated in the processing of spliceosomal introns from pre-mRNA (40), so it is interesting that spliceosomal snRNAs have now been reported from both V. necatrix and N. locustae (ref. 18 and references therein), and a spliceosomal intron has recently been discovered in the microsporidian Encephalitozoon cuniculi (41). With the exception of red algae, the known presence of spliceosomal introns in eukaryotes (ref. 42; J.M.L., unpublished data) is perfectly correlated with the possession of CTD heptapeptide repeats (37, 39).

RPB1 Protein and DNA Sequences Support a Relationship Between Microsporidia and Fungi. The protein ML tree (Fig. 1) placed the two Microsporidia together with the fungal RPB1 sequences (M + F) with strong bootstrap support in all analyses. Use of the Kishino–Hasegawa (29) test to compare the statistical significance of different trees also supports M + F. In fact, the one tree we found that could not be rejected at the 0.05 level and that placed Microsporidia early also placed Fungi as the next deepest branch. Such a deep position for Fungi conflicts with analyses of several proteins (11) and SSUrRNA (6) that support a Fungi + Metazoa relationship. Indeed, we also recovered a Fungi (+ Microsporidia) relationship with Metazoa in our protein ML tree, albeit with weak support. When Fungi are constrained with Metazoa (as most data would have them), trees in which Microsporidia branched before Trichomonas, Giardia, or […] could all be rejected at the 0.05 level.

Reduction of rate heterogeneity between sites did not reduce support for M + F with protein ML (Fig. 1; data not shown), whereas removal of fast-evolving sites dramatically increased bootstrap support from 41% to 77% for M + F in MP analyses. This last result is consistent with the hypothesis that parsimony is particularly sensitive to long-branch effects caused by unequal substitution rates (43). Analyses of RPB1 DNA sequences also support a relationship between Microsporidia and Fungi. Application of the LogDet transformation to variable sites at coding positions 1 + 2 produced a tree (not shown) where M + F was supported with BP 74%, rising to BP 93% in the absence of the outgroup.

In summary, a relationship between Microsporidia and Fungi is strongly supported by our analyses of the RPB1 data sets, and this relationship cannot be attributed to shared amino acid or nucleotide biases, mutational saturation, or long-branch effects. Because RPB1 appears robust in supporting M + F, we were interested in how inferences from the elongation factors, EF-1α and EF-2, which apparently support Microsporidia-early, would stand up in the face of the same analyses.

Do Elongation Factors Support M + F or Microsporidia Early? Our ML analysis of the EF-2 protein sequences gave a tree similar to the one recently published for the same data set (4) where the microsporidian Glugea plecoglossi is a long branch at the base of the eukaryote clade. However, bootstrap support (with resampled datasets) for Glugea as first branch was only 54%, and support was further reduced to 33% when invariant sites were removed. This is in striking contrast to the published local bootstrap support (75%) from the same data set for the basal position of Glugea (4). However, local bootstrap support can only be interpreted as bootstrap probabilities of a particular internal branch when the other parts of the tree are correct (25), an assumption that is likely not met for these data.

We surmised that a common amino acid bias with some outgroups may be influencing the observed deep position of the long Glugea branch relative to other eukaryotes, because the amino acid frequency tree for the aligned sequences grouped Glugea with the outgroup Archaea Sulfolobus and Methanococcus. To investigate whether base-compositional effects and/or long-branch attraction was contributing to the observed deep position for Glugea, we removed the archaeal outgroup sequences. We also removed the category of fastest evolving sites because these are expected to contribute most to any long-branch effect (33). Consistent with our hypothesis that Glugea is branching deep because of artifact, ML and MP analyses both recovered M + F (Fig. 2A), albeit with weak bootstrap support (ML = 29%, MP = 40%).

[Tree panel labels on the slide: TUBA, EF-2]

microsporidia = derived fungi

[Tree panel label on the slide: RPB1]

FIG. 2. Phylogenetic analyses of EF-2. (A) Protein ML tree with outgroups removed and FSR (72 sites). The scale bar represents 10% estimated sequence divergence under the JTT-F model. (B–E) Phylogenetic analyses of EF-2 DNA codon positions 1 + 2 using LogDet and investigating the influence of constant sites and choice of outgroup on bootstrap support. The DNA alignment was derived from a 542-aa alignment. (B) Bootstrap consensus tree topology for LogDet distances estimated from 812 variable sites (93% of constant sites removed as invariant) indicating support for M + F. The effect of incremental removal of observed constant sites on bootstrap support for different relationships in the presence or absence of all 3 outgroups (C), with Halobacterium halobium as outgroup (D), and with Methanococcus vannielii as outgroup (E). The dashed vertical lines represent the ML estimates of invariant sites: 93%, 92%, and 93% of constant sites for C, D, and E, respectively (see text for discussion).
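The incremental removal of constant sites described in the FIG. 2 caption can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy alignment, and the rule of dropping constant columns from left to right are all assumptions made for illustration (in practice one would more likely drop them at random or by an estimated proportion).

```python
def constant_site_indices(alignment):
    """Return the column indices where every sequence carries the same state."""
    n_cols = len(alignment[0])
    return [j for j in range(n_cols)
            if len({seq[j] for seq in alignment}) == 1]


def remove_constant_sites(alignment, fraction):
    """Drop a given fraction of the constant columns.

    Hypothetical helper: the leftmost constant columns are removed first,
    which keeps the toy example deterministic.
    """
    constant = constant_site_indices(alignment)
    n_drop = round(fraction * len(constant))
    dropped = set(constant[:n_drop])
    return ["".join(state for j, state in enumerate(seq) if j not in dropped)
            for seq in alignment]


# Toy alignment (invented data): columns 0, 1, 4 and 5 are constant.
toy = ["ACGTAC", "ACGAAC", "ACTTAC"]
half_removed = remove_constant_sites(toy, 0.5)
all_removed = remove_constant_sites(toy, 1.0)
```

Repeating the tree inference on alignments produced with increasing values of `fraction` is the kind of experiment the caption summarises in panels C to E.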

Analyses of transitions versus transversions for EF-2 codon positions 1 or 2 (data not shown) suggest that Microsporidia sequences are potentially saturated. Furthermore, analyses of base compositions suggest that shared biases may potentially influence tree topology. For example, nucleotide frequency trees clustered Glugea with the outgroup Archaea Sulfolobus and Methanococcus, with which it shares a similar base composition (interestingly, Giardia and Trichomonas clustered with the other outgroup, Halobacterium). We therefore investigated the effect of reducing site rate heterogeneity by using progressive CSR and outgroup choice on support for M + F or Microsporidia-early from the EF-2 DNA dataset (Fig. 2 C–E).

ML estimates suggested that 93% of constant sites should be removed from the EF-2 dataset for LogDet analyses if the assumption of the method, that all sites can vary, is not to be violated (22, 23, 33). Under these conditions, the LogDet tree recovered G. plecoglossi as the sister group to Saccharomyces with 76% bootstrap support (Fig. 2B). In the absence of outgroups, the M + F relationship was strongly supported in all analyses (Fig. 2C). In the presence of outgroups, there were two conflicting positions for Microsporidia detected among bootstrap partitions: (i) Microsporidia-early when all or most constant sites were included and (ii) M + F, which increased in support as constant sites were removed (Fig. 2C). This suggests that EF-2 support for Microsporidia-early is mainly caused by a failure to adequately account for site-by-site rate variation coupled with an outgroup-attraction effect. This last phenomenon is clearly related to the base composition of the outgroup. When Halobacterium was used as outgroup, M + F was recovered even when all constant sites were included, and there was never any strong support for Microsporidia-early (Fig. 2D). The Microsporidia-early hypothesis was strongly supported only when (i) outgroup taxa, i.e., Sulfolobus (data not shown) or Methanococcus (Fig. 2E), shared the same base composition bias as Glugea and (ii) too many constant sites were included in the analysis.

Protein ML analyses of the EF-1α data set produced similar trees as previously published (4) with moderate (64%) bootstrap support for Glugea as the first branch. However, pairwise comparisons for EF-1α codon positions 1 and 3 (data not shown) indicate potential saturation with all comparisons involving Glugea strongly clustered and separated from other among-eukaryote comparisons. The pattern for changes at replacement position 2 is even more extreme where the clustered points for Glugea are outside the range for the other eukaryote and Archaea outgroup comparisons (data not shown). Because all changes at position 2 result in a change of amino acid, the use of the Glugea sequence for phylogenetic inference is compromised, and trees inferred from this data set must be considered unreliable. Consistent with the hypothesis that the Glugea EF-1α sequence is behaving in a manner that is different from the other eukaryote sequences are observations (44) that it contains many nonconservative amino acid

FIG. 1. Phylogenetic analyses of RPB1. Effects on bootstrap support of ISR or FSR were considered by using protein ML. The tree shown is the ML consensus tree topology from analysis of 760 aligned positions for 15 RPB1 sequences and 1 outgroup RPA1 sequence. Values report bootstrap support from ML analyses for all 760 sites (ALL sites), 669 sites (FSR, where the fastest evolving sites common to the ML tree and a tree where Microsporidia are at the base of the eukaryotes were removed), and 645 sites (ISR). Where only a single bootstrap value is shown, support was 100% in all analyses. The scale bar represents 10% estimated sequence divergence under the JTT-F model.

Keeling & Doolittle (1996) Mol Biol Evol 13:1297-1305; Hirt et al. (1999) PNAS USA 96:580-585

Long Branch Attraction

[Diagram: four-taxon trees over W, X, Y and Z, with branch change probabilities p and q; the condition shown is p < q^2.]

Joe Felsenstein

Mainly under parsimony, but not only.

If Y is a phylogenetically distant outgroup: artefactual position of the Archezoa in C. Woese's tree.
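The long-branch effect on parsimony can be checked numerically with a short Python sketch. It assumes a symmetric two-state model on a four-taxon tree ((A,B),(C,D)), with A and C on long branches and B, D and the internal branch short; taxon names, function name and parameter values are illustrative choices, not taken from the slide. The function sums, over all state assignments, the probability of the site patterns that support the true grouping and of those that group the two long branches.

```python
from itertools import product


def pattern_probs(long_p, short_p):
    """Parsimony-informative pattern probabilities on the true tree ((A,B),(C,D)).

    Assumed model: two states, symmetric change. Branches to A and C change
    with probability long_p; branches to B, D and the internal branch change
    with probability short_p.
    """
    def step(x, y, p):
        # Probability of ending in state y when starting in state x.
        return p if x != y else 1.0 - p

    support_true = 0.0  # patterns A=B != C=D, supporting the true split AB|CD
    support_lba = 0.0   # patterns A=C != B=D, grouping the two long branches
    # u and v are the two internal nodes; a, b, c, d are the tip states.
    for u, v, a, b, c, d in product((0, 1), repeat=6):
        prob = (0.5 * step(u, v, short_p)
                * step(u, a, long_p) * step(u, b, short_p)
                * step(v, c, long_p) * step(v, d, short_p))
        if a == b and c == d and a != c:
            support_true += prob
        elif a == c and b == d and a != b:
            support_lba += prob
    return support_true, support_lba


# Long branches much longer than short ones: the misleading pattern dominates.
true_far, lba_far = pattern_probs(long_p=0.4, short_p=0.05)
# Moderately long branches: the true grouping still wins.
true_near, lba_near = pattern_probs(long_p=0.1, short_p=0.05)
```

Under these assumptions the misleading pattern becomes the more frequent one once the square of the long-branch probability exceeds short_p * (1 - short_p), which for small values is close to the condition p < q^2 shown on the slide (with q the long and p the short branches). More data then only strengthens the wrong grouping.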

Felsenstein (1978) Syst Zool 27:401-410; Philippe et al. (2000) Curr Opin Genet Dev 10:596-601

DHFR-TS gene fusion: 'unikonts' vs. 'bikonts'

REPORTS

basalia), suggesting dependence on exogenous thymidine (10-12), and the gene is absent from the virtually complete Giardia genome. Possibly these parasites have lost the fusion gene. rRNA trees not using long-branching archaebacteria as an outgroup place Parabasalia and Metamonada together with high bootstrap support (5, 13, 14) and often place Archezoa as sisters to Percolozoa. The complex tetrakont ciliary apparatus of Archezoa and Percolozoa was long an obstacle to considering them the most primitive eukaryotes; more likely, they evolved from simpler biciliate eukaryotes (5, 9, 14). As parabasalids and diplomonads share two derived laterally transferred genes of cyanobacterial origin [glucokinase (GK) and glucose-phosphate isomerase (GPI); Fig. 1] (15, 16), Archezoa are almost certainly holophyletic, so the root cannot lie within them, whether beside metamonads, as rRNA trees rooted on archaebacteria suggested, or Parabasalia, as suggested by two single amino-acid insertions in enolase that are shared with prokaryotes (7). The enolase insertions are far weaker evidence than the DHFR-TS fusion, as they could easily have been secondarily acquired by replication slippage mutations or by a single gene conversion using as template a bacterial enolase gene taken into the phagotrophic ancestor of Parabasalia after it diverged from Metamonada and other excavates. Sequence evidence that retortamonads arose from diplomonads (17) eliminates them as candidate "early amitochondrial eukaryotes" by placing them firmly within the ancestrally tetrakont metamonads.

Jakobid Loukozoa have also been considered possible primitive eukaryotes because their mitochondria retain more prokaryotic features than others (8); retention of the proteobacterial RNA polymerase by Reclinomonas does not mean that it is the most ancient eukaryote: if the viral-type polymerase that replaced it was present in the ancestral eukaryote, the bacterial enzymes could have been lost independently in several lineages.

Although most published distance trees do not show excavate monophyly, probably because of long-branch problems, one maximum likelihood rRNA tree not rooted on archaebacteria shows the monophyly of excavates including Reclinomonas (5), with reasonably strong bootstrap support (87%) in the corresponding distance tree. In view of this and of the ultrastructural unity of excavates and their complex ciliary roots (5), we predict that jakobid Loukozoa also have the gene fusion. Whether Loukozoa are monophyletic or polyphyletic is uncertain, as is the precise position of excavates. But given the strong evidence for bikont monophyly, no uncertainty in their internal branching order would be relevant to our conclusion unless an unstudied lineage consistently branches closer to Amoebozoa than any other bikonts and also has separate DHFR and TS genes.

Like shared laterally transferred genes, symbiogenetic acquisitions of chloroplasts are derived characters that help narrow down the position of the eukaryotic root, for it cannot lie within any group created by a single such event. Thus the root cannot be within Plantae or chromalveolates (5, 18). If euglenoids and chlorarachneans got their chloroplasts in one event in a common ancestor, for which good evolutionary arguments exist (18), then it cannot lie in any of its descendants (i.e., not within excavates or the Cercozoa/Retaria clade; Fig. 1). Even if euglenoids acquired chloroplasts independently (5), an early timing of that event would exclude the root from some groups: the gnd gene of plastid affinity in Percolozoa (19) implies a photosynthetic common ancestor of Percolozoa and Euglenozoa, ruling out both discicristates and Archezoa, if the latter really are sisters of Percolozoa (Fig. 1) [even the highly divergent gnd of Archezoa (19) might have the same origin].

Although the DHFR-TS gene fusion is the strongest available evidence for the position of the eukaryote root, among uniciliate protozoa (5), independent corroboration is desirable to rule out the theoretical possibility of its reversal in opisthokonts (and possibly Amoebozoa). Such reversal might in principle occur by horizontal acquisition from bacteria of separate DHFR and/or TS genes and partially or totally deleting the fusion gene. Our multiple alignments argue clearly against this; derived signature sequences indicate that the separate opisthokont DHFR and TS genes are distinctly more similar to the fusion genes than to the separate bacterial genes. Unfortunately, both genes are too short to make robust trees.

A unique insertion in EF-1α of animals

Fig. 1. Phylogenetic relationships of the major eukaryote groups [modified from (5)]. Opisthokonts have separate DHFR and TS genes. Asterisks mark all eight groups positive for the bifunctional DHFR-TS fusion gene. The origin of the fusion gene at the same time as the bikont flagellates is indicated [< and > mark primer sites used here; for details and a list of the nine taxa studied see (4)]. Triangles on the gene organization diagrams indicate the position of translation initiation codons. Although the branching order among the bikont groups is uncertain in places (notably the precise position of Heliozoa and excavates), these uncertainties are irrelevant to the conclusion that the root lies below the common ancestor of all these bikont groups. The monophyly and internal branchings of excavates are also uncertain, involving both sequence and ultrastructural evidence (5); the topology (not the rooting) of this tree is congruent with several (not all) recent sequence trees (e.g., 2, 5, 6, 9). The uncertain position of Amoebozoa relative to the root is emphasized by a dashed branch. Symbiogenetic events that created eukaryotic algae are shown: the primary origin of chloroplasts from cyanobacteria to form the plant kingdom (23); the secondary symbiogenetic implantation of a red algal cell (circled R) into a heterotrophic host to form chromalveolates (5) and lateral transfer of green algal chloroplasts to create euglenoid and chlorarachnean algae; whether both got plastids in a single ancestral event [circled G (18)] or separately [circled plus sign (5)] is uncertain. Miozoa comprise dinoflagellates, Sporozoa and protalveolates; Loukozoa include jakobids and anaeromonads; Retaria comprise Foraminifera and Radiolaria (5).

90 5 JULY 2002 VOL 297 SCIENCE www.sciencemag.org

The Six Supergroups (~2002)

Stechmann & Cavalier-Smith (2002) Science 297:89-91; Simpson & Roger (2004) Curr Biol 14:R693-696

Current Biology Vol 14 No 17 R694

[Figure 1 tree labels: Plantae, Chromalveolata, Amoebozoa, Excavata, Opisthokonta; glaucophytes, red algae, green algae + plants, alveolates, stramenopiles, cryptophytes, Radiolaria, Cercozoa (inc. Foraminifera?), lobose amoebae, mycetozoan slime moulds, pelobionts, entamoebae, Euglenozoa, Heterolobosea, Malawimonas, diplomonads, retortamonads, parabasalids, animals, Ichthyosporea, nucleariid amoebae, Fungi (inc. microsporidia), apusomonads, collodictyonids, ? Heliozoa; root; Eukaryotes; Prokaryotes (Eubacteria, Archaebacteria).]

Figure 1. A diagrammatic tree depicting the organisation of most eukaryotes into six major groups. The relationships amongst most of the major groups and the position of the 'root' of the tree are shown as unresolved (note however, the grouping of Opisthokonta and Amoebozoa). The arrow shows a possible precise placement of the root, based on gene fusion data (see text).

multicellularity and differentiation, derived independently of animals, plants and fungi. Myxogastrid slime moulds, such as Physarum, form giant multinucleate super-cells (plasmodia) which are commonly found in terrestrial ecosystems.

Evidence that Amoebozoa is a monophyletic group has emerged only recently. Traditional classifications grouped diverse amoeboid organisms together; this united most or all Amoebozoa, but grouped them also with unrelated forms such as Radiolaria, Foraminifera and 'heliozoa' (see below). By contrast, early phylogenetic studies of rRNA sequences suggested the various Amoebozoa are independent groups. This now appears to have been an artefact of the simplistic analytical methods available at the time: improved analyses and sampling of more amoebae tend to show that these organisms are related, as do analyses of individual protein sequences (especially actin) and sophisticated studies of multiple proteins.

Plantae

'Primary endosymbiosis' describes the origin of a eukaryotic organelle by the engulfment, enslavement and genomic reduction of a prokaryotic cell. Three photosynthetic groups have plastids (chloroplasts) that originated by primary endosymbiosis: land plants ([…]) and green algae such as Chlamydomonas; red algae (rhodophytes); and an obscure group called the glaucophytes. Phylogenies of several plastid genes and the organisation of plastid genomes suggest that the plastids of these groups form a single lineage specifically related to cyanobacteria. So primary plastid endosymbiosis seems to have happened just once in eukaryotic evolution, with the host being a common ancestor of these three groups. Phylogenetic analyses, particularly some centred around the gene for elongation factor 2 indicate that 'reds' and 'greens' are closely related, with glaucophytes perhaps being their sister group. At present we follow many in the field, and refer to this whole group as 'Plantae', but we caution that most botanists use 'Plantae' for subsets of this group, such as green algae plus land plants.

The incorporation of the primary plastid had a huge effect on the genetic potential and basic biology of the host organisms. Almost all Plantae are specialist phototrophs; a few are non-photosynthetic parasites, but even these organisms retain plastids in a reduced form. Plantae are the only one of the major groups that may lack entirely the ability to engulf particulate food.

Multicellularity has evolved on several occasions within Plantae: probably once in red algae, but multiple times within […]. One particular multicellular assemblage, the 'charophytes', actually gave rise to the embryophytes that dominate land habitats, but are of very minor importance in the ocean (where, in fact, extremely small unicellular green algae are significant).

Chromalveolata

In 'secondary endosymbiosis' a eukaryote already containing a primary plastid is engulfed by another host eukaryote, and over time is reduced to an organelle. The new plastid-containing host is termed a 'secondary alga'. Secondary endosymbiosis has happened more than once in eukaryotes, but mounting evidence from plastid gene trees and a distinctive gene replacement event suggests that most groups of secondary algae descend from one particular endosymbiosis involving a red algal symbiont. These organisms, plus their many non-photosynthetic relatives, comprise the group Chromalveolata.

The chromalveolates unites four major groups of eukaryotic algae: dinoflagellates, cryptophytes, haptophytes and stramenopiles (~heterokonts), and many non-photosynthetic forms (see below). The first three groups are unicellular, with a few colonial forms. Stramenopiles, however, range from tiny unicells, through to elaborate unicells and colonies, for example diatoms, and truly multicellular and massive life forms, such as […]. Dinoflagellates and diatoms are the dominant 'large' phytoplankton in the ocean.

Dinoflagellates and stramenopiles also include a wide diversity of heterotrophic forms (and mixotrophs, organisms that subsist by both photosynthesis and heterotrophy). Heterotrophic stramenopiles are very important consumers of bacteria in aquatic environments, but also include some animal parasites/commensals, and a diversity of fungal-like forms. For example, the Irish potato famine pathogen, […], is an […] stramenopile. Heterotrophic and mixotrophic dinoflagellates are important

[Page excerpt from Bininda-Emonds (2010) on the supertree and supermatrix frameworks; the body text is not recoverable from the extraction.]

Phylogenomics

(Supermatrix)

Bininda-Emonds (2010) Palaeodiversity 3 (Suppl.):99-106
Fig. 1. Breaking down the wall between supermatrix (left) and supertree (right) analyses.

Supermatrix Assembly

Alignment S1          Alignment S2      Alignment S3
S1  A C G T C A A G   S1  T G G - - T   S1  C G G A C T A C G T
S2  A C - T C C A G   S3  A G C T C C   S4  C C C T - - - - G G
S3  A C - T C G A C   S4  A G C T C G   S5  C G T T C G A C G T

Supermatrix (alignments S1, S2 and S3 side by side)
S1  A C G T C A A G   T G G - - T   C G G A C T A C G T
S2  A C - T C C A G   . . . . . .   . . . . . . . . . .
S3  A C - T C G A C   A G C T C C   . . . . . . . . . .
S4  . . . . . . . .   A G C T C G   C C C T - - - - G G
S5  . . . . . . . .   . . . . . .   C G T T C G A C G T

Fig. 4.1 – Example of a character supermatrix. From three DNA sequence alignments S1, S2 and S3 (top), a character supermatrix was built by concatenating the three alignments and representing missing character states with a dot (bottom).
Delsuc et al. (2005) Nat Rev Genet 6:361-375; Criscuolo (2006) Doctoral thesis
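The concatenation described in the caption can be sketched in a few lines of Python. This is a minimal illustration, not the thesis author's implementation: the function names `build_supermatrix` and `missing_fraction` are hypothetical, and counting both dots and alignment gaps as missing states is an assumption (it is the reading that reproduces the roughly 46% quoted for Figure 4.1).

```python
from typing import Dict, List

def build_supermatrix(alignments: List[Dict[str, str]], missing: str = ".") -> Dict[str, str]:
    """Concatenate alignments taxon by taxon, padding absent taxa with `missing`."""
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("all sequences in an alignment must have the same length")
        width = lengths.pop()
        for taxon in taxa:
            # A taxon absent from this alignment gets a run of missing symbols.
            rows[taxon].append(aln.get(taxon, missing * width))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}

def missing_fraction(matrix: Dict[str, str], symbols: str = ".-") -> float:
    """Fraction of cells that are missing (assumption: dots and gaps both count)."""
    total = sum(len(seq) for seq in matrix.values())
    absent = sum(seq.count(symbol) for seq in matrix.values() for symbol in symbols)
    return absent / total

# The three alignments of Figure 4.1 (taxa S1 to S5).
ALIGNMENT_1 = {"S1": "ACGTCAAG", "S2": "AC-TCCAG", "S3": "AC-TCGAC"}
ALIGNMENT_2 = {"S1": "TGG--T", "S3": "AGCTCC", "S4": "AGCTCG"}
ALIGNMENT_3 = {"S1": "CGGACTACGT", "S4": "CCCT----GG", "S5": "CGTTCGACGT"}

supermatrix = build_supermatrix([ALIGNMENT_1, ALIGNMENT_2, ALIGNMENT_3])
for taxon, row in supermatrix.items():
    print(taxon, row)
print(f"missing character states: {missing_fraction(supermatrix):.1%}")
```

Sorting the taxon names keeps the row order stable across runs; a real pipeline would also track partition boundaries so that each gene can receive its own substitution model.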

4.1.2 Simultaneous analysis from a character supermatrix

It is not unusual to observe topological differences between two phylogenetic trees defined on the same set of species but inferred from two different genes. This fact, called incongruence between genes, is often caused by homoplasy (Sober, 1988). Homoplasy is the independent appearance of similar character states in distant taxa; it often involves long-branch attraction, particularly when the MP or distance criteria are used. Homoplasy is subdivided into convergence (independent appearance of the same character state) and reversal (appearance of a character state that looks like an ancestral character state). Strong heterogeneity of evolutionary rates among sites is one of the causes of homoplasy when sequences of molecular characters are considered. Horizontal gene transfers are also responsible for incongruence between genes. Thus, when one wishes to infer a phylogenetic tree from a collection of genes, it is often recommended to test for incongruence between genes, e.g. with the ILD test (Farris et al., 1995). A first approach consists of eliminating from the initial data set the gene(s) showing strong incongruence relative to the others. Another approach is to detect and eliminate only the character states responsible for the incongruence. These two approaches are not incompatible with the TE principle (Lecointre and Deleporte, 2005). Low-level combination techniques can often be hampered by the appearance of many missing data (e.g. 46% missing character states in the example of Figure 4.1). Recent phylogenomic studies based on low-level combination often have more than 50% of character states ...
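To make "topological differences between two gene trees" concrete, the sketch below counts the clades found in one rooted tree but not in the other. It is a simplified, rooted variant of a symmetric-difference distance and is not the ILD test mentioned above; the nested-tuple tree encoding, the function names and the two example topologies are all hypothetical.

```python
def clades(tree):
    """Return the set of clades (frozensets of leaf names) of a rooted tree.

    A tree is a leaf name (str) or a tuple of subtrees.
    """
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(visit(child) for child in node))
        found.add(leaves)
        return leaves

    visit(tree)
    return found

def clade_distance(tree_a, tree_b):
    """Number of clades present in exactly one of the two trees."""
    return len(clades(tree_a) ^ clades(tree_b))

# Two hypothetical gene trees on the same five species.
gene_a_tree = ((("S1", "S2"), "S3"), ("S4", "S5"))
gene_b_tree = ((("S1", "S3"), "S2"), ("S4", "S5"))

print(clade_distance(gene_a_tree, gene_b_tree))
```

A distance of zero means the two genes support the same rooted topology; any positive value flags an incongruence that may come from homoplasy or horizontal gene transfer.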


Lack of Resolution

Evolution: Revisiting the Root of the Eukaryote Tree

A recent phylogenomic investigation shows that the enigmatic flagellate Breviata is a distinct anaerobic lineage within the eukaryote super-group Amoebozoa and challenges the unikont–bikont rooting of the tree of eukaryotes.

Andrew J. Roger and Alastair G.B. Simpson

In the 1980s and 1990s, prevailing views of the eukaryote tree of life were strongly influenced by phylogenies of small subunit ribosomal RNA (rRNA) genes [1]. Although these analyses placed many eukaryotes into major groups, it became clear that the relationships amongst these groups could not be determined because of the limited information available in a single gene, as well as methodological artifacts [2]. More recently, a 'six super-groups' hypothesis for deep eukaryote phylogeny emerged as a synthesis of analyses of sequence data for rRNAs, concatenated sets of conserved proteins and organellar genomes, and some detailed ultrastructural comparisons [3]. The six super-groups proposed are the opisthokonts, Amoebozoa, Archaeplastida, chromalveolates, Rhizaria and Excavata.

In the absence of outgroup sequences that are sufficiently closely related to allow reliable rooting of eukaryotes in molecular phylogenies, Cavalier-Smith and colleagues proposed that the presence or absence of a dihydrofolate reductase-thymidylate synthase (DHFR-TS) gene fusion [4] and specific myosin gene families [5] in diverse eukaryotes could be used to infer that the eukaryote root falls between so-called 'unikonts' (opisthokonts and Amoebozoa) and 'bikonts' (all other super-groups), as shown in Figure 1. Both the six super-groups model and the unikont–bikont root hypothesis have been controversial since they were first proposed [6]. Now, with the rapid accumulation of genome-scale data for diverse protist species, a flurry of phylogenomic analyses [7–9] are putting these hypotheses to the test.

A recent paper by Minge et al. [7] reports phylogenomic analyses of the enigmatic protist Breviata anathema, a small cell with an anterior flagellum. Breviata is interesting for two major reasons: it lives in low oxygen conditions and ...

Figure 1. The placement of Breviata anathema in the eukaryote tree of life. The relationships amongst the six super-groups of eukaryotes are shown as recovered by Minge et al. [7] and other recent phylogenomic analyses [8,9]. The hypothesized super-groups are colour-coded as follows: opisthokonts (purple), Amoebozoa (light blue), Archaeplastida (green), chromalveolates (orange), Rhizaria (dark blue) and Excavata (brown). Note that recent evidence suggests that Rhizaria are specifically related to some chromalveolates [8,9]. The tree is shown as rooted according to the unikont–bikont hypothesis [5,14]. Anaerobic/microaerophilic protistan lineages that lack classical mitochondria are shown in red. The numbers in ellipses show the inferred ancestral number of basal bodies per kinetid (flagellar unit) in the various eukaryote lineages. The plus (+) indicates that Breviata may contain more basal bodies than the number cited whereas the asterisk (*) indicates that one basal body is non-flagellated. Dashed lineages indicate uncertainty in the location of that branch on the tree (see text).

Tree labels: 'Unikonts', 'Bikonts', Apusomonads, (= Plantae), Amoebozoa, Breviata, Haptophytes, Slime moulds, Stramenopiles, Phalansterium, Alveolata, Multicilia, Lobose amoebae, Rhizaria, Opisthokonts (inc. Animals and Fungi), Excavata, Eukaryotes, Prokaryotes (Eubacteria, Archaebacteria).

~2010
adapted from Roger & Simpson (2009) Curr Biol 19:R165-7

Issues (1)

possible 'big-bang' radiation of Eukaryotes (worsened by phylogenetic artefacts)

Issues (2)

horizontal gene transfer (HGT) due to "You are what you eat"

... support for HGT having played a significant role in the evolution of this organism.

Because all of the genes studied by Archibald et al. are encoded within the nucleus of B. natans but operate within the plastid, the signal and transit peptides are absolutely necessary. One can surmise that this necessity would strongly favor HGT in and among algae, where nuclear-encoded genes targeted to plastids must navigate a similar maze of endomembranes through the direction of transit peptides, and additional signal peptides in the case of other secondary algae. Indeed, a majority of genes studied by the authors support this expectation, favoring phylogenies consistent with HGT from streptophytes or red algae into the B. natans genome.

Most intriguingly, two of the genes from their analysis indicate HGT from different bacteria, significant not only as an example of prokaryote-to-eukaryote gene transfer but also because these acquired genes initially would have not had the proper leader sequence for import into the plastid. Whether the appropriate targeting sequence was incorporated de novo through gene conversion or some other mechanism of homologous or orthologous replacement is not clear, but this remarkable finding certainly invokes new ideas on how genes are assimilated into a genome.

Microbiologists have long known about phenotypes that favor promiscuous plasmid sharing among bacteria, responsible for the epidemic spread of antibiotic resistance. Although no plasmid analog exists in eukaryotes, Archibald et al. suggest that HGT in B. natans may occur in the same way it has for the many endosymbiotic events that have happened over the past 2 billion years, by engulfing other organisms (Fig. 1). Compared with the green alga Chlamydomonas reinhardtii, which is photoautotrophic and in which no parallel evidence of HGT is found, B. natans is mixotrophic, meaning that it can live phagocytotically and photosynthetically. Models have been proposed whereby small snippets of DNA from an engulfed microbe are able to escape digestion, e.g., from protist lysosomes, and migrate to and subsequently be incorporated into the host genome (6, 16). One can imagine the series of fleetingly small probability events proceeding from engulfment to incorporation of a strand of foreign DNA into the genome to a new gene overcoming genetic drift to become fixed in the population. In a certain sense, this is the same cumulative effect as random mutations in single genes or dripping water, but it now operates on a different level, an entire gene. However, the essential point is that those probabilities are nonzero and over time have made a significant contribution to the genome of this organism (6).

The real boon of Archibald et al.'s hypothesis, perhaps best synopsized by Ford Doolittle's epigram "you are what you eat" (16), is that it is eminently testable as more eukaryotic genome data become available. It is already apparent that the magnitude of HGT varies dramatically in different lineages of algae, with the proposed explanation of the phagocytotic lifestyle as a likely but not proven explanation for the observed mosaic pattern. Whether this is a more general mechanism for HGT in a wider range of eukaryotes, including nonalgal taxa, is not yet apparent.

So although Nature herself may not make leaps, it now seems clear that many organisms, eukaryotes and prokaryotes, are certainly able to mimic evolutionary jumps through HGT.

Fig. 1. Stepwise conceptual image of one mechanism of HGT that, as proposed by Archibald et al. (5), might operate in the chlorarachniophyte alga B. natans. Red arrows show phagocytosis and subsequent digestion of a bacterium or protist, from which foreign DNA has survived digestion and become incorporated into the algal nucleus (flow of HGT-acquired genetic information indicated with blue arrows). Although the genes studied herein by Archibald et al. are directed for function to the plastid, the significant number of horizontally transferred genes they found may only be the tip of the iceberg in phagocytic protists such as B. natans.

1. Doolittle, W. F. (1999) Trends Cell Biol. 9, M5–M8.
2. Gogarten, J. P., Doolittle, W. F. & Lawrence, J. G. (2002) Mol. Biol. Evol. 19, 2226–2238.
3. Ochman, H., Lawrence, J. G. & Groisman, E. A. (2000) Nature 405, 299–304.
4. Avery, O. T., MacLeod, C. M. & McCarty, M. (1944) J. Exp. Med. 79, 137–159.
5. Archibald, J. M., Rogers, M. B., Toop, M., Ishida, K.-i. & Keeling, P. J. (2003) Proc. Natl. Acad. Sci. USA 100, 7678–7683.
6. Doolittle, W. F., Boucher, Y., Nesbo, C. L., Douady, C. J., Andersson, J. O. & Roger, A. J. (2003) Philos. Trans. R. Soc. London B 358, 39–57; discussion 57–58.
7. Stiller, J. W. & Hall, B. D. (1997) Proc. Natl. Acad. Sci. USA 94, 4520–4525.
8. Stiller, J. W., Riley, J. & Hall, B. D. (2001) J. Mol. Evol. 52, 527–539.
9. Kirschvink, J. L., Gaidos, E. J., Bertani, L. E., Beukes, N. J., Gutzmer, J., Maepa, L. N. & Steinberger, R. E. (2000) Proc. Natl. Acad. Sci. USA 97, 1400–1405.
10. McFadden, G. I. (2001) J. Phycol. 37, 951–959.
11. Douglas, A. E. & Raven, J. A. (2003) Philos. Trans. R. Soc. London B 358, 5–17; discussion 17–18.
12. Douglas, S. E. & Gray, M. W. (1991) Nature 352, 290 (lett.).
13. Douglas, S. E. (1992) Biosystems 28, 57–68.
14. McFadden, G. I. (1999) J. Eukaryotic Microbiol. 46, 339–346.
15. Deane, J. A., Fraunholz, M., Su, V., Maier, U. G., Martin, W., Durnford, D. G. & McFadden, G. I. (2000) Protist 151, 239–252.
16. Doolittle, W. F. (1998) Trends Genet. 14, 307–311.

Raymond and Blankenship, www.pnas.org/cgi/doi/10.1073/pnas.1533212100, p. 7420

What the papers say: M. Elias and J. M. Archibald

Issues (3)

endosymbiotic gene transfer (EGT) due to multiple plastid transfers

... a distantly related photosynthetic eukaryote whose plastid evolved directly from the cyanobacterial plastid progenitor. Inferring how many times the 'primary' plastids of red algae, green algae (and plants) and glaucophyte algae evolved into 'secondary' plastids is an area of active investigation and debate.(22–25) No secondary plastids derived from glaucophytes are known, but both green and red algae have, each at the very least on one occasion, been captured and converted into a secondary plastid (Fig. 2). This process involves a second round of EGT, this time from the endosymbiont nucleus to that of the secondary host (Fig. 1), as well as the evolution of another protein import pathway built on top of that used by primary plastids.(5,26) For these reasons, successful integration of a secondary endosymbiont is thought by many to be difficult to achieve, and secondary endosymbiosis is thus usually invoked only sparingly.

Secondary plastids of green algal origin occur in euglenophytes and chlorarachniophytes, whereas most plastids in so-called 'chromalveolates' are derived from red algae. Chromalveolates include cryptophytes, haptophytes, dinoflagellates, apicomplexans, the newly discovered coral alga Chromera velia and stramenopiles (or heterokonts), the latter being the group to which diatoms belong (Fig. 2). The chromalveolate hypothesis was put forth as an attempt to ...

Figure 1. Endosymbiosis and gene flow in photosynthetic eukaryotes. Diagram depicts movement of genes in the context of primary and secondary endosymbiosis, beginning with the cyanobacterial endosymbiont (CB) that gave rise to modern-day plastids. Acquisition of genes by horizontal (or lateral) gene transfer is always a possibility, and such genes can be difficult to distinguish from those acquired by endosymbiotic gene transfer (EGT). CB, cyanobacterium; HGT, horizontal gene transfer; MT, mitochondrion.
Diagram labels: Plastid progenitor (CB), Primary host (N1, M1), Secondary host (N2, M2), EGT, HGT.

Figure 2. Origin and spread of photosynthesis across the eukaryotic tree of life. Diagram shows hypothesised 'supergroups' with emphasis on those containing photosynthetic lineages. Some (but not all) lineages within the different supergroups are provided for context. The tree topology shown within the 'chromalveolate' + SAR (Stramenopiles + Alveolates + Rhizaria) clade represents a synthesis of phylogenetic and phylogenomic data published as of 31 August 2009. Branch lengths do not correspond to evolutionary distance. Dashed lines indicate uncertainties with respect to the timing and/or directionality of secondary (2°) or tertiary (3°) endosymbiotic events, question marks (?) indicate uncertainty as to the presence of a plastid and '+/-' indicates that both plastid-bearing (+) and plastid-lacking (-) dinoflagellates and apicomplexans exist. Examples of green and red algal-derived tertiary plastids in dinoflagellates are known (see text for discussion). HGT, horizontal (or lateral) gene transfer.
Tree labels: Archaeplastida (Plantae), Glaucophytes, Red algae, Green algae + plants, SAR, Rhizaria, Foraminifera, Chlorarachniophytes, Other Rhizaria, Stramenopiles, Diatoms, Oomycetes, Other heterotrophs, Other phototrophs, Alveolates, Ciliates, Apicomplexans +/-, Chromera, Perkinsids (?), Dinoflagellates +/-, 'Chromalveolates', Hacrobia, Cryptophytes, Goniomonas, Katablepharids, Telonemids, Centrohelids, Haptophytes, (Pico)biliphytes (?), Amoebozoa, Slime moulds, Lobose amoebae, Other lineages, Apusomonads, Opisthokonts, Animals, Fungi, Choanoflagellates, Excavata, Diplomonads, Parabasalids, Heterolobosea, Jakobids, Kinetoplastids, Euglenophytes; endosymbiosis marks 1°, 2°, 3° and HGT.

adapted from Elias and Archibald (2009) BioEssays 31:1273-1279

1274 BioEssays 31:1273–1279, © 2009 Wiley Periodicals, Inc.

primary endosymbiosis: Plantae
secondary endosymbiosis: 'Chromalveolates'

Reyes-Prieto et al. (2007) Annu Rev Genet 41:147-168

... cyanobacterium (10, 56, 103). The sister of Paulinella chromatophora, Paulinella ovalis, lacks a plastid but is an active predator of cyanobacteria that are localized in food vacuoles in the cytoplasm (43). This observation provides some support for the phagotrophic origin of the ancient plastid. Molecular clock analyses using multigene data sets and "relaxed clock" approaches (e.g., penalized likelihood, Bayesian methods) that do not assume strict chronometric behavior of genes under study suggest that the 'Plantae' primary endosymbiosis is an ancient event in eukaryotic evolution. Although still controversial (23), recent analyses suggest that the primary plastid was established ca. 1.5 billion years ago in the Mesoproterozoic (e.g., 14, 35, 59, 101).

Early Events in 'Plantae' Plastid Evolution

We have hypothesized that a crucial early step in endosymbiosis must have been the establishment of a reliable connection between the host cell and the ancestral plastid to allow the controlled exchange of metabolic intermediates between the symbiotic partners. Regulated exchange is important because the unfettered flux of metabolites between the host and plastid would have had detrimental effects on the metabolism of both partners and thereby lowered the evolutionary fitness of the symbiosis. A diverse set of metabolite antiporters that are embedded in the inner membrane of current day plastids allows the controlled exchange of solutes between cellular compartments (100). This antiport function is dependent on the presence of a suitable counterexchange substrate on the trans-site of the membrane. It was recently shown that the plastid triosephosphate and related sugar transporters were established in the common ancestor of the red and green algae (and likely all 'Plantae', supporting their monophyly), allowing this first alga to profit from cyanobacterial carbon fixation (99). This evolutionary step likely rendered irreversible the association between the plastid and the host cell. The ancestral plastid antiporter evolved from an existing metabolite translocator in the host cell that had evolved due to the pre-existence of mitochondria and an endomembrane system and was likely transferred to the ancient plastid via membrane fusion. This hypothesis of a close interaction between plastid envelope membranes and the host endomembrane system is supported by the observation that the outer leaflet of the outer membrane of extant primary plastids consists mainly of ER-derived phospholipids (17, 21). A more recent analysis of 83 annotated plastid solute transporters from Arabidopsis thaliana shows that the majority of these genes that have a resolved phylogeny in this taxon and in other 'Plantae', including all carbohydrate transporters, originated from co-option (i.e., gene duplication followed by retargeting to the plastid) of host genes, whereas about a quarter are of cyanobacterial (i.e., endosymbiont) provenance. The nuclear origin of many transporters is supported by their absence in ...

Antiporter: an integral membrane protein that couples the active transport of two different molecules in opposite directions across the membrane, as in the plastid triosephosphate/phosphate antiporter
ER: endoplasmic reticulum

Figure 2. Hypothetical model showing the primary endosymbiotic origin of the plastid in the 'Plantae' common ancestor.
Diagram labels: Phagotrophic 'Plantae' ancestor, Cyanobacteria (prey), Prey digestion, Prey retention, Plastid establishment, Photosynthetic 'Plantae' ancestor.

... analyses of nuclear-encoded plastid-targeted proteins that supports the monophyly of chromalveolate plastids is shown in Figure 4b. The separate origins of the chlorarachniophyte and euglenid green plastids that was inferred from analysis of plastid genomes from these taxa (85) have been added to this tree. The potential power offered by phylogenetics is exemplified by Figure 4b in which we can trace in one framework the origin of prokaryotic genes in eukaryotic nuclear genomes via primary endosymbiosis (filled blue circle) and the subsequent transfer of these genes from one or more red algae to the chromalveolates via secondary endosymbiosis (filled red circle). This type of analysis has also provided direct evidence for tertiary endosymbiosis in which an alga containing a secondary plastid was itself engulfed and retained by another protist (13, 40, 69). Although not discussed in detail here, this phenomenon is until now limited to dinoflagellates that are the masters of serial endosymbiosis (31).

Case Study: The Peculiar Path of Dinoflagellate Peridinin Plastid Evolution

The most common type of plastid in dinoflagellates contains peridinin as the major carotenoid. This pigment, although similar in structure to fucoxanthin, is unique to this group. Three membranes surround the peridinin-containing plastid, which is not ...

Figure 4. The origin(s) of plastids in photosynthetic eukaryotes. (a) Multiple lines of evidence (see text) support the single origin of the primary plastid in the 'Plantae' common ancestor. The plastid in red and green algae was then transferred to chromalveolates, euglenids, and chlorarachniophyte amoebae via independent secondary endosymbioses. (b) Phylogenetic tree based on maximum likelihood analysis of a data set of 6 nuclear-encoded plastid-targeted proteins that shows the origin of the primary plastid in 'Plantae' from a cyanobacterial source (blue circle), the secondary origin of the red algal plastid (red circle) in chromalveolates, and the independent origins of the green algal plastid (green circles) in euglenids, and chlorarachniophytes (see text for details). These latter two groups are not part of the phylogenetic analysis and have been simply added to the tree.
Panel (a) labels: Cyanobacteria, Primary endosymbiosis, 'Plantae' ancestor, Glaucophyte algae, Red algae, Green algae, Chlorarachniophytes, Ancestral Chromalveolate, Cryptophytes, Haptophytes, Stramenopiles, Alveolates.
Panel (b) groups: Chromalveolates, Red algae, Green algae, Cyanobacteria. Taxa include Karenia brevis, Isochrysis galbana, Prymnesium parvum, Thalassiosira pseudonana, Phaeodactylum tricornutum, Heterocapsa triquetra, Alexandrium tamarense, Galdieria sulphuraria, Cyanidioschyzon merolae, Arabidopsis thaliana, Oryza sativa, Chlamydomonas reinhardtii, Bigelowiella natans, Euglena gracilis, Nostoc sp. PCC7120, Crocosphaera watsonii WH8501, Synechococcus elongatus PCC6301 and Gloeobacter violaceus PCC7421.

Elias and Archibald (2009) BioEssays 31:1273-1279
Figure 2. Origin and spread of photosynthesis across the eukaryotic tree of life.

4. Where do Eukaryotes come from?

Eukaryotic Origins

3 main scenarios

http://reflexions.ulg.ac.be/

Chimeric Eukaryotes (1)

c: cyanobacterium (eubacterium)
p: proteobacterium (eubacterium)
m: methanogen (archaebacterium)
y: yeast (eukaryote)

628 genes; 513 genes
James Lake

628 genes, BLAST BRH (best reciprocal hits): informational genes vs. operational genes

Rivera et al. (1998) PNAS USA 95:6239-6244; Poptsova & Gogarten (2007) BMC Bioinformatics 8:120
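The "best reciprocal hits" (BRH) criterion named on the slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of the cited studies: the score tables stand in for parsed BLAST output, the gene identifiers and bit scores are invented, and the function names are hypothetical.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    `scores` is {query: {subject: bit_score}}; queries without hits are skipped.
    """
    return {query: max(hits, key=hits.get) for query, hits in scores.items() if hits}

def best_reciprocal_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)

# Invented bit scores: yeast genes (y*) searched against a bacterial genome (b*),
# and the reverse search.
yeast_vs_bacterium = {
    "y1": {"b1": 250.0, "b2": 90.0},
    "y2": {"b2": 120.0, "b3": 110.0},
    "y3": {"b1": 80.0},
}
bacterium_vs_yeast = {
    "b1": {"y1": 245.0, "y3": 75.0},
    "b2": {"y2": 118.0},
    "b3": {"y2": 105.0},
}

print(best_reciprocal_hits(yeast_vs_bacterium, bacterium_vs_yeast))
```

Here y3 is dropped because its best hit, b1, prefers y1. Ties are resolved by whichever subject `max` encounters first, so a production pipeline would need an explicit tie-breaking rule and a score or E-value threshold.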

NATURE|Vol 440|30 March 2006 REVIEWS

called into question, as some phylogenies have suggested that ism involving mitochondrial origins (Fig. 4e–g). A criticism facing all prokaryotes might be derived from eukaryotes55. However, the ‘archaebacterial host’ models is that phagotrophy (the ability to ubiquity of mitochondrial homologues represents a strong argument engulf bacteria as food particles) was once seen as an absolute that clearly polarizes the prokaryote-to-eukaryote transition: because prerequisite for mitochondrial origins60. This argument has lost the common ancestor of contemporary eukaryotes contained a some of its strength with the discovery of symbioses where one mitochondrial endosymbiont that originated from within the prokaryote lives inside another, non-phagocytotic prokaryote65. proteobacterial lineage, we can confidently infer that prokaryotes arose and diversified before contemporary eukaryotes—the only The elusive informational ancestor ones whose origin requires explanation—did. This view is consistent With the exception of the neomuran hypothesis, which views both with microfossil and biogeochemical evidence56. eukaryotes and archaebacteria as descendants of Gram-positive Current ideas on the origin of eukaryotes fall into two general eubacteria60,61 (Fig. 4d), most current theories for eukaryotic origins classes: those that derive a nucleus-bearing but amitochondriate cell (Fig. 4) posit the involvement of an archaebacterium in that process. first, followed by the origin of mitochondria in a eukaryotic host57–61 The archaebacterial link to eukaryote origins was first inferred (Fig. 4a–d), and those that derive the origin of mitochondria in a from shared immunological and biochemical similarities of their prokaryotic host, followed by the origin of eukaryotic-specific DNA-dependent RNA polymerases66. Tree-based studies of entire features62–64 (Fig. 4e–g). 
Models that derive a nucleated but amito- genomes67,68 extended this observation: most eukaryotic genes for chondriate cell as an intermediate (Fig. 4a–d) have suffered a replication, transcription and translation (informational genes) are substantial blow with the demise of Archezoa. Models that do not related to archaebacterial homologues, while those encoding biosyn- entail amitochondriate intermediates have in common that the host thetic and metabolism functions (operational genes) are usually assumed to have acquired the mitochondrion was an archaebacter- related to eubacterial homologues8,67,68. Chimericium not a eukaryote; hence, the steep organizationalEukaryotes grade between The rooted SSU rRNA tree1 depicts eukaryotes (2) and archaebacteria prokaryotes and eukaryotes follows in the wake of radical chimaer- as sister groups, as in the neomuran (Fig. 4d) hypothesis60,61. By nombreux scénarios de fusion entre une archébactérie et une ou plusieurs eubactérie(s) ‘archébactérie’ à mitochondrie ‘Archezoa’

Figure 4 | Models for eukaryote origins that are, in principle, testable with genome data. a–d, Models that propose the origin of a nucleus-bearing but amitochondriate cell first, followed by the acquisition of mitochondria in a eukaryotic host. e–g, Models that propose the origin of mitochondria in a prokaryotic host, followed by the acquisition of eukaryotic-specific features. Panels a–g are redrawn from refs 57 (a), 58 (b), 59 (c), 60 and 61 (d), 62 (e), 63 (f) and 64 (g). The relevant microbial players in each model are labelled. Archaebacterial and eubacterial lipid membranes are indicated in red and blue, respectively.

Martin Embley
Embley & Martin (2006) Nature 440:623-630
627 © 2006 Nature Publishing Group

Eukaryotic Signature Proteins

Charles Kurland
Kurland et al. (2006) Science 312:1011-1014

1. invented in Eukaryotes
2. highly modified in Eukaryotes
3. lost in Prokaryotes

Giardia lamblia: 347 ESPs / 2136

mitochondria descended from the ancestral bacterium (17, 18, 36, 47). Several hundred genes have been transferred from the ancestral bacterium to the nuclear genome, but most proteins from the original endosymbiont have been lost. For yeast, the largest protein class contains more than 200 eukaryote proteins (ESPs) targeted to the mitochondrion but encoded in the nucleus. In addition, the yeast nucleus encodes 150 mitochondrial proteins not uniquely identifiable with a single but apparently eukaryotic descendants from the common ancestor. Accordingly, the yeast and human mitochondria proteomes emerge largely as products of the eukaryotic nuclear genome (85%) and only to a lesser degree (15%) as direct descendants of endosymbionts (17, 18, 36, 45). The strong representation of ESPs in their proteomes means that mitochondria and their descendants are usefully viewed as “honorary” CSSs.

There are substantial numbers of ESPs in the other CSSs. For the proteome of the reduced anaerobic parasite Giardia lamblia (23), searches of 2136 proteins found in each of Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, and Arabidopsis thaliana yielded 347 ESPs for G. lamblia. This was reduced to roughly 300 by rigorous screening, with ESPs distributed between nuclear and cytoplasmic compartments (Fig. 2) (48). The ubiquity of the ESPs and the absence of archaeal descendants are not easily explained by a prokaryote genome fusion model (49). The simplest interpretation is that the host for the endosymbiont/mitochondrial lineage was an ancestral eukaryote.

Similar results are obtained for another reduced eukaryote, the intracellular parasite Encephalitozoon cuniculi. A recent study (24) identified 401 ESPs, of which 295 had homologs among the ESPs of G. lamblia (23). Two major categories of ESPs in the G. lamblia and E. cuniculi genomes were distinguished: those associated with the CSSs (Fig. 2) and those involved in control functions such as guanosine triphosphate (GTP) binding proteins, kinases, and phosphatases (7). It was also observed (23) that many characteristic eukaryotic proteins with weak sequence homology to prokaryotic proteins but more convincing homologies of structural fold such as the actins, tubulins, kinesins, ubiquitins, and some GTP binding proteins are among the most highly conserved eukaryotic proteins. These may be descendants of the common ancestor recruited early in the evolution of the eukaryotic nuclear genome.

Nucleolar proteomes (20, 21) are examples of essential eukaryote compartments not wrapped in double membranes and where there is no suspicion of an endosymbiotic origin. From 271 proteins in the human nucleolar proteome, 206 protein folds were identified and classified phylogenetically (20, 21). Of these, 109 are eukaryotic signature folds, and the remaining ones appear to be descendants of the common ancestor, occurring in two or three domains.

The spliceosome is a unique molecular machine that removes introns from eukaryote mRNAs (22). Even though we do not know the ancestral processing signals for the earliest eukaryotes (50), roughly half of the 78 spliceosomal proteins likely to be present in the ancestral spliceosome are ESPs (13), whereas the other half containing the Sm/LSm proteins (51) have homologs in bacteria and archaea (13). These distributions of both ESPs as well as of putative descendants of the common ancestor suggest that many components of modern spliceosomes were present in the common ancestor (52).

The subdivision into subcellular compartments (CSSs) with characteristic proteomes restricts proteins to volumes considerably smaller than the whole cell. Concentrations of macromolecules in cells are very high, typically between 20 and 30% of weight or volume (53). Such densities are described as “molecular crowding” because the space between macromolecules is much less than their diameters; consequently, diffusion of proteins in cells is retarded (54). Molecular crowding favors macromolecular associations, large complexes, and networks of proteins that support biological functions (16, 17, 53).

High densities enhance the association kinetics of small molecules with proteins because the excluded volumes of the proteins reduce the effective volume through which small molecules diffuse (55). The sum of these effects is that the high macromolecular densities within CSSs enhance the kinetic efficiencies of proteins. The same principles apply to the smaller prokaryotic cells, but the effects are accentuated in larger cells. Subdividing high densities of proteins into more or less distinct compartments containing functionally interactive macromolecules is expected to be an early feature of the eukaryote lineage. The distinctive proteome of nucleoli demonstrates that compartmentation does not require an enclosing membrane. Furthermore, cell fusion is not required to account for, nor does it explain (49), the large number of eukaryote cell compartments.

Selection Gives and Selection Takes

Genomes evolve continuously through the interplay of unceasing mutation, unremitting competition, and ever-changing environments. Both sequence loss and sequence gain can result. In general, expanded genome size, along with augmented gene expression, increases the costs of cell propagation so the evolution of larger genomes and larger cells requires gains in fitness that compensate (15, 56, 57). Conversely, genome reduction is expected to lower the costs of propagation. There is an ever-present potential to improve the efficiency of cell propagation by reductive evolution.

Environmental shifts may neutralize sequences, leaving no selective pressure to maintain them against the persistent flux of deleterious mutations. Such neutralized sequences eventually and inevitably disappear because of “mutational meltdown” (14, 15, 56, 57). Genome reduction can be achieved through differential loss of coding and noncoding sequences (com-

Fig. 2. Distribution of ESPs in the proteome of G. lamblia. ESPs (23) were matched to the human International Protein Index data set (48) and then assigned to individual CSSs based on their gene ontology annotations. A protein may be present in more than one CSS (e.g., a protein involved in transport from the nucleus to the cytoplasm will be assigned to both CSSs). Black numbers are the number of proteins assigned to each CSS from the total G. lamblia proteome (AACB00000000) (3077 ORFs matched and linked to gene ontology); red numbers are the ESPs assigned to each CSS (320 proteins matched and linked to gene ontology).

1012 19 MAY 2006 VOL 312 SCIENCE www.sciencemag.org


Tree of Life Roots

Delsuc et al. (2005) Nat Rev Genet 6:361-375; Forterre & Philippe (1999) BioEssays 21:871-879; Rivera & Lake (2004) Nature 431:152-155

Patrick Forterre, Hervé Philippe

SSU rRNA
duplicated proteins
artefactual eubacterial root?
'Ring of Life'

phylogenomic approach to shed light on long-standing phylogenetic questions, spanning all levels of the tree of life (FIG. 1). In this section, we present recent advances for each of the three domains of life that have been enabled by phylogenomics.

Eukaryotes. The reconstructions of phylogenies of placental mammals and land plants represent the most spectacular examples of recent advances enabled by phylogenomics. The evolutionary history of placental mammals was traditionally considered to be irresolvable, owing to the explosive radiation of species that occurred in a short space of time. However, this has now been elucidated by the analysis of supermatrices that contain about 20 nuclear genes (refs 14, 15, 85, 86), and the resulting phylogeny has been confirmed by recent analyses of complete mitochondrial genomes (refs 30, 87), with only a few nodes left unresolved. All but 1 of the 18 morphologically defined extant mammalian orders have been confirmed by molecular studies. The notable exception to this is the insectivores, which have been split into two distinct orders by the recognition of an unexpected group of African origin named Afrotheria (refs 14, 15, 77). Four principal groups have been identified among the orders of placental mammals, and their origins might be explained by geographical isolation, resulting from plate tectonics, during the early stages of diversification among placental mammals (ref. 85). Molecular studies have also revealed the prevalence of morphological convergence during the evolution of placental mammals, with the occurrence of parallel adaptive radiations (ref. 14). This might partly explain why reconstructing the phylogeny of placental orders on the basis of morphological characters has been difficult.

Similarly, studies of multiple genes from all three compartments of the plant cell (the mitochondrion, chloroplast and nucleus) have helped to overcome long-standing uncertainties in land-plant phylogeny (refs 11, 88–92). These advances have brought several changes to the traditional botanical classification of flowering plants, which were previously based on morphological features such as floral characteristics and leaf shape (ref. 93). In this case, as with the placental mammals, molecular evidence revealed the plasticity of phenotypic characters, highlighting several examples of previously unrecognized convergent evolution. This is perhaps best illustrated by the case of the sacred lotus (Nelumbo nucifera), which was previously thought to be related to water lilies (Nymphaeaceae), whereas molecular studies unveiled a close phylogenetic affinity with plane trees (genus Platanus) (ref. 88).

At a larger scale, sequence-based phylogenomic studies of eukaryotic phylogeny confirmed the MONOPHYLY of most phyla, which were originally defined on the basis of ultrastructural or rRNA analyses. However, they also demonstrated the common origin of a group of morphologically diverse amoebae, which were previously thought to have evolved independently (ref. 17). Phylogenomic analyses of nuclear (refs 20, 94) and mitochondrial (ref. 95) genes have also corroborated the long-suspected sister-group relationship between the unicellular choanoflagellates and multicellular animals. However, the lack of representatives from several principal lineages (that is, Rhizaria, Cryptophytes, Haptophytes and Jakobids) in phylogenomic studies currently prevents the study of most of the working hypotheses derived from decades of ultrastructural and molecular studies. These hypotheses have proposed the division of the eukaryotic world into six main groups (refs 80, 96): the Opisthokonta, Amoebozoa, Plantae, Chromalveolata, Rhizaria, and Excavata (FIG. 1). The validity of these proposed 'kingdoms' represents one of the most important outstanding questions that phylogenomics has the potential to answer.

Finally, the question of the origin of the eukaryotes has been recently addressed (ref. 97) using a gene-content method that allows the modelling of genome-fusion events (ref. 60). The authors proposed that the eukaryotes originated from a fusion event between a bacterial species and an archaeal species, leading to a ring-like structure at the root of the tree of life (ref. 97). This would account for the chimeric nature of the eukaryotes, which has been inferred from the observation that many eukaryotic metabolic genes have a greater number of similar counterparts in Bacteria, whereas most of those

MONOPHYLY: Monophyletic taxa include all the species that are derived from a single common ancestor.

Figure 1 | Phylogenomics and the tree of life. A schematic representation showing recent advances and future challenges of the phylogenomic approach for resolving the main branches of the tree of life. This tree aims to represent a consensus view on evolutionary relationships within the three domains (Bacteria, Archaea, and Eukaryota) with hypothetical relationships indicated as dashed lines. The main branches that have been identified (purple) or confirmed (yellow) by phylogenomics are indicated. Blue dashed lines underline putative phylogenetic hypotheses that have been indicated by phylogenomic studies and need further investigation. The main uncertainties for which the phylogenomic approach might provide future answers are pinpointed by red circles. Note that most of the progress brought about by the phylogenomic approach has been realized at a smaller taxonomic scale for land plants, and for placental mammals within the metazoans (see main text). The two well-recognized endosymbiotic events involving bacteria that gave rise to eukaryotic organelles (mitochondria and chloroplasts) are indicated by arrows (blue and green, respectively). Note however, that other horizontal gene transfers and gene duplication events are not represented in this organismal tree, although they do constitute important aspects of genome evolution.

Tree labels in Figure 1:
Bacteria: Low-GC Gram positive, High-GC Gram positive, Planctomycetales, Thermotogales, Aquificales, Spirochaetes, Chlamydiales, Deinococcales, Cyanobacteria, Proteobacteria
Archaea: Crenarchaeota (Thermoproteales, Desulfurococcales, Sulfolobales); Euryarchaeota (Halobacteriales, Methanosarcinales, Thermoplasmatales, Archaeoglobales, Methanococcales, Thermococcales)
Eukaryota: Opisthokonta (Fungi, Choanoflagellates, Metazoans); Plantae (Land plants, Green algae, Red algae, Glaucophytes); Amoebozoa (Mycetozoans, Pelobionts, Entamoebae); Rhizaria (Radiolaria, Cercozoa); Chromalveolata (Alveolates, Stramenopiles, Cryptophytes, Haptophytes); Excavata (Diplomonads, Jakobids, Euglenoids)
Organelles: Mitochondria, Chloroplasts

366 | MAY 2005 | VOLUME 6 www.nature.com/reviews/genetics © 2005 Nature Publishing Group

hypotheses, the derived domain corresponds to the eukaryotes that originated from a merging of two primitive prokaryotic lineages, but the authors disagree on the nature of these lineages (see legend of Fig. 1).

Alternatively, others have favoured the view that the rRNA tree rooted in the bacterial branch is the good one, and that most, if not all, contradictions observed in other phylogenies should be explained by lateral gene transfers (LGT).(8,21) For these authors, the choice of the “correct tree” is based on the assumption that proteins involved in translation, transcription, and replication (informational proteins, sensu Rivera et al.(14)), which are often more similar in Archaea and Eucarya, are especially informative in inferring phylogenies because they are less frequently involved in LGT. Moreover, Brown and Doolittle(8) argued that these proteins have evolved at a similar rate in the three domains, through the application of the relative rate test to a large data set(8) (see Fig. 2A for explanation).

The common assumption of most people who support one of the two above hypotheses (fusion or LGT) is that trees inferred from molecular phylogenetic analyses are basically correct (and thus require biological explanations). We argue here that this is not a correct assumption and that many molecular phylogenies, in particular those rooting the tree of life, do not reflect the genuine evolutionary history (see for review (22)). Recent studies in eukaryotic evolution support this view and suggest an explanation for some artefactual results that are strongly supported by classical tree reconstruction methods.(23–26) In addition, we argue that most authors focus too exclusively on LGT or cell fusion as a possible source of mosaicism between the three domains (often ignoring that mosaicism is a general rule in evolution) and do not pay enough attention to other major evolutionary forces, such as differential gene loss and non-orthologous replacement. Finally we think that most current scenarios about early evolution are highly biased by prejudicial view of prokaryotes as primitive organisms and we consider alternative hypoth-

Figure 1. Current views of the universal tree. a: The traditional tree with the root in the bacterial branch (in green) and the sisterhood of archaea and eucarya. This tree mainly corresponds to the rRNA tree, the rooting being based on duplicated genes. b: Two universal trees of paralogous proteins are used to root each other.(3,4) The root of the tree of life cannot be inferred from a single gene, because molecular phylogeny only produced unrooted trees. As proposed by Schwartz and Dayhoff,(78) when a gene duplication has occurred in an earliest common ancestor and has been followed by divergent evolution in eukaryotes and prokaryotes, it is possible to reciprocally root trees of paralogous genes. c: Recent hypotheses suggest instead the emergence of eukaryotes from the merging of ancient bacteria and archaea. In some scenarios, the archaeal endosymbiont that gives rise to the eukaryotic nucleus was a crenoarchaeota,(63) in other cases a methanogenic euryarchaeota.(19,20) Gupta(9) suggests a first split between Gram positive and Gram negative bacteria (instead of Bacteria and Archaea) followed by the emergence of Archaea from Gram positive bacteria.

Figure 2. Effect of mutational saturation on the relative rate test and the long branch attraction artefact. A: The relative rate test is performed to know whether species A and B evolve at the same rate. To do that, one tests if d(A,O) = d(B,O).(79) This test is efficient when the gene under study is not mutationally saturated (□ and △). In contrast, two species that have evolved at different rates (■ and ▲, upper right tree) will be also separated from an outgroup by the same distance if the sequences are saturated by multiple substitutions, because in that case, the observed differences between these two proteins are not proportional to the real number of substitutions (plateau of the curve). When performed on such a protein, the relative rate test will give the misleading result that the two species have evolved at the same rate. Obviously, the relative rate test is improved by the use of methods that correct hidden multiple substitutions, but these methods are hindered by the poor fit of the model of sequence evolution (see (22) for a more complete discussion). B,C: Left trees are supposed to be real ones and right trees are those incorrectly recovered by classical methods of molecular phylogeny because of the long branch attraction artefact.

cumulative probabilities of these five trees are shown at the right of each tree. The upper tree is supported by 60.5% of the conditioned reconstruction bootstraps, the second tree by 16.8% of the conditioned reconstruction bootstraps, for a cumulative probability of 77.3%, and so on. It initially appears that the resolution of the tree is poor because the most probable tree is supported by a low bootstrap value, and the other trees are supported by even lower values. However, when the five most probable unrooted trees are aligned by shifting each to the left or the right, until their leaves match, they form a repeating pattern indicating that the five trees are simply permutations of an underlying cyclic pattern. This implies that they are derived from the single cycle graph (ring) shown at the bottom left of Fig. 1 (ref. 20). When that ring is cut at any one of the five central arcs and then unfolded, the resulting unrooted tree will correspond to one of the five most probable trees. In other words, the data are not tree-like, they are ring-like.

A rigorous combinatorial analysis of the genomic fusion of two organisms valid for all possible initial pre-fusion trees has shown that the conditioned reconstruction algorithm recovers all permutations of the cycle graph, and only those permutations (ref. 20). Hence, these results may be interpreted, in a manner analogous to the interpretation of restriction digests of a circular plasmid or the mapping of a circular chromosome, as implying a ring of life. The ring shown at the lower left of Fig. 1 is completely consistent with all five of the fully resolved trees shown at the top of the figure, hence we refer to it as a fully resolved ring. That ring explains 96.3% of the bootstrap replicates, and the partially resolved ring at the lower right of Fig. 1 explains 99.2% of the bootstrap replicates. The ring is supported for all possible conditioning genomes (see Supplementary Data). This provides strong evidence for the completely resolved ring on the left, and even stronger evidence for the less-resolved ring on the right. From these results we infer that the eukaryotic nuclear genome was formed from the fusion of the genomes of a relative of a proteobacterium (P) and a relative of an archaeal eocyte (E).

Eubacterial roots of the ring of life

By analysing additional taxa, one can discover whether the eubacterial fusion partner branches even deeper than the γ-proteobacterium Xylella. The data set for this second analysis, described in the legend for Fig. 2, specifically included the α-proteobacterium Brucella and the cyanobacterium Synechocystis. The results are shown in Fig. 2. The cumulative probability of the five trees, shown at the top of the figure, corresponds to 87.1% of the total bootstraps, indicating significant statistical support for the ring shown at the lower left of Fig. 2. The percentage increases to 92.8% when the α-proteobacterium and the γ-proteobacterium are allowed to form an unresolved bifurcation, and rises to 99.7% when the α-proteobacterium, γ-proteobacterium and cyanobacterium are allowed to form an unresolved trifurcation. The ring is also supported for all possible conditioning genomes (see Supplementary Data). The combined analyses in Figs 1 and 2 provide strong evidence for the ring in Fig. 3, and indicate that the immediate relative of the eubacterial fusion partner was probably a primitive proteobacterium, or possibly a primitive photosynthetic eubacterium.

The second analysis, illustrated in Fig. 2, did not detect an additional ring connecting the α-Proteobacteria, the phylum from which mitochondria arose (ref. 27), to the eukaryotes. There is increasing evidence that eukaryotes have received a deluge of DNA from organelles (ref. 28). In one study, 630 nuclear-encoded genes in eukaryotes

Figure 2 Eubacterial relationships within the ring of life. The genomes are from two yeasts (Y1, S. pombe and Y2, S. cerevisiae), a γ-proteobacterium (Pγ, X. fastidiosa 9a5c), an α-proteobacterium (Pα, Brucella melitensis 16M), a cyanobacterium (C, Synechocystis sp. PCC6803), a halobacterium (H, Halobacterium sp. NRC-1), an eocyte (E, S. tokodaii) and a bacillus (not shown; the conditioning genome S. aureus MW2). The five unrooted trees consistent with the ring are shown with leaves pointing upward to emphasize that each is part of a repeating pattern. Cumulative probabilities are at the right of each tree. Fully and partially resolved rings are at the lower left, right and centre, respectively.

Figure 3 A schematic diagram of the ring of life. The eukaryotes plus the two eukaryotic root organisms (the operational and informational ancestors) comprise the eukaryotic realm (see Supplementary Discussion). Ancestors defining major groups in the prokaryotic realm are indicated by small circles on the ring. The Archaea (ref. 49), shown on the bottom right, includes the Euryarchaea, the Eocyta and the informational eukaryotic ancestor. The Karyota (ref. 5), shown on the upper right of the ring, includes the Eocyta and the informational eukaryotic ancestor. The upper left circle includes the Proteobacteria (ref. 49) and the operational eukaryotic ancestor. The most basal node on the left represents the photosynthetic prokaryotes and the operational eukaryotic ancestor.

NATURE | VOL 431 | 9 SEPTEMBER 2004 | www.nature.com/nature 153 © 2004 Nature Publishing Group

872 BioEssays 21.10 Problems and paradigms

indel in the elongation factor EF1α present in eukaryotes and crenotes.(63)

The phenotypic homogeneity of a group of organisms is sometimes used as an argument to support their monophyly. However, this can be misleading. For example, the emergence of birds and mammals led to the appearance of three phenotypically coherent groups: reptiles, birds, and mammals; but only the latter two are monophyletic, reptiles being paraphyletic! The phenotypic coherence of the three domains at the molecular level does not in fact impose a strong constraint on evolutionary scenarios except to assume two (Fig. 4a) or three (Fig. 4b) major morphological and physiological changes (dramatic evolutionary events), during which fundamental molecular mechanisms evolved independently from each others. In a scenario with three dramatic evolutionary events, the three groups are related by a trifurcation in an unrooted tree (Fig. 4b), whilst in the case of two dramatic evolutionary events, one of the three groups has an "intermediate" position between the two others, but this group can be either mono, poly, or paraphyletic (Fig. 4).

The possibility that archaea have an intermediate position between the two other domains is suggested by recent data from comparative genomics. Indeed, one notices a reduction in the number of proteins involved in transcription, translation, and replication machineries, with a gradient from Eucarya to Bacteria, and Archaea in-between. For example, the replication factor that loads the DNA replicase onto the RNA primer is encoded by five homologous genes in Eucarya, two in Archaea and one in Bacteria.(49) The same trends can be recognised in the number of RNA polymerase subunits and basal transcription factors, of ribosomal proteins, and of translation initiation factors. This suggests an evolutionary trend from eukaryotes to bacteria or from bacteria to eukaryotes, with archaea in between. A striking feature of this process is the huge gap in complexity between archaea and eukaryotes, as exemplified by the evolution of the transcription apparatus (Fig. 5),

Figure 5. Schematic evolution of the transcription apparatus. The number of different known subunits is indicated (this number is an approximation for large eukaryotic complexes). Double-arrows and question marks indicate that the direction of evolution is a priori unknown. Homologous protein or complex have the same colour.

Figure 6. Our proposal for the universal tree of life. The root is located in the eukaryotic branch(45) (see text). In all lineages, evolution is supposed to pass through stages of simplification (blue) and complexification (red) in terms of gene contents and variety of molecular mechanisms. Simplification is prevalent in prokaryotic lineages but with some exceptions. The reverse is true in eukaryotes.

876 BioEssays 21.10

REVIEW

Genomics and the Irreducible Nature of Eukaryote Cells

C. G. Kurland,1 L. J. Collins,2 D. Penny2*

Large-scale comparative genomics in harness with proteomics has substantiated fundamental features of eukaryote cellular evolution. The evolutionary trajectory of modern eukaryotes is distinct from that of prokaryotes. Data from many sources give no direct evidence that eukaryotes evolved by genome fusion between archaea and bacteria. Comparative genomics shows that, under certain ecological settings, sequence loss and cellular simplification are common modes of evolution. Subcellular architecture of eukaryote cells is in part a physical-chemical consequence of molecular crowding; subcellular compartmentation with specialized proteomes is required for the efficient functioning of proteins.

Comparative genomics and proteomics have strengthened the view that modern eukaryote and prokaryote cells have long followed separate evolutionary trajectories. Because their cells appear simpler, prokaryotes have traditionally been considered ancestors of eukaryotes (1–4). Nevertheless, comparative genomics has confirmed a lesson from paleontology: Evolution does not proceed monotonically from the simpler to the more complex (5–9). Here, we review recent data from proteomics and genome sequences suggesting that eukaryotes are a unique primordial lineage.

Mitochondria, mitosomes, and hydrogenosomes are a related family of organelles that distinguish eukaryotes from all prokaryotes (10). Recent analyses also suggest that early eukaryotes had many introns (11, 12), and RNAs and proteins found in modern spliceosomes (13). Indeed, it seems that life-history parameters affect intron numbers (14, 15). In addition, "molecular crowding" is now recognized as an important physical-chemical factor contributing to the compartmentation of even the earliest eukaryote cells (16, 17).

Nuclei, nucleoli, Golgi apparatus, centrioles, and endoplasmic reticulum are examples of cellular signature structures (CSSs) that distinguish eukaryote cells from archaea and bacteria. Comparative genomics, aided by proteomics of CSSs such as the mitochondria (18, 19), nucleoli (20, 21), and spliceosomes (13, 22), reveals hundreds of proteins with no orthologs evident in the genomes of prokaryotes; these are the eukaryotic signature proteins (ESPs) (23, 24). The many ESPs within the subcellular structures of eukaryote cells provide landmarks to track the trajectory of eukaryote genomes from their origins. In contrast, hypotheses that attribute eukaryote origins to genome fusion between archaea and bacteria (25–30) are surprisingly uninformative about the emergence of the cellular and genomic signatures of eukaryotes (CSSs and ESPs). The failure of genome fusion to directly explain any characteristic feature of the eukaryote cell is a critical starting point for studying eukaryote origins.

It is agreed that, whether using gene content, protein-fold families, or RNA sequences (31–36), the unrooted tree of life divides into archaea, bacteria, and eukaryotes (Fig. 1). On such unrooted trees, the three domains diverge from a population that can be called the last universal common ancestor (LUCA). However, LUCA (37) means different things to different people, so we prefer to call it a common ancestor; in this case it is the hypothetical node at which the three domains coalesce in unrooted trees.

There are links between comparative genomics and the ecology of organisms. These include the aerobic/anaerobic states of the environment and the adaptive fit of organelles such as mitochondria, hydrogenosomes, and mitosomes (10, 18, 19, 38–41). In addition to the advantages from oxidative metabolism and/or oxygen detoxification, other advantages must have accrued from having a cellular compartment with the dense proteomes (15, 38, 42). Ecological specialization can account for the differences between prokaryote and eukaryote cell architectures and genome sizes. Small prokaryote cells with streamlined genomes may reflect adaptation to rapid growth and/or minimal resource use by autotrophs, heterotrophs, and saprotrophs. Divergent evolutionary paths may emerge with the adoption of a phagotrophic-feeding mode in an ancestor of eukaryotes. This uniquely eukaryote feeding mode requires a larger and more complex cell, consistent with earlier suggestions that a unicellular raptor (predator), which acquired a bacterial endosymbiont/mitochondria lineage, became the common ancestor of all modern eukaryotes (3, 4, 43). Indeed, predator/prey relationships may provide the ecological setting for the divergence of the distinctive cell types adopted by eukaryotes, bacteria, and archaea.

Proteomics of Cell Compartments

Comparative genomics and proteomics reveal phylogenetic relationships between proteins making up eukaryote subcellular features and those found in prokaryotes. We distinguish three main phylogenetic classes; the first are proteins that are unique to eukaryotes: the ESPs. The ESPs we place in three subclasses: proteins arising de novo in eukaryotes; proteins so divergent to homologs of other domains that their relationship is largely lost; or finally, descendants of proteins that are lost from other domains, surviving only as ESPs in eukaryotes.

The second class contains interdomain horizontal gene transfers; these are proteins occurring in two domains with the lineage of one domain rooted within their homologs in a second domain (44). The third class contains homologs found in at least two domains, but the proteins of one domain are not rooted within another domain(s); instead, the homologs appear to descend from the common ancestor (Fig. 1). Most eukaryote proteins shared by prokaryotes are distant, rather than close, relatives. Thus, proteins shared between domains appear to be descendants of the common ancestor; few seem to result from interdomain lateral gene transfer (31–35).

Although the genomes of mitochondria are clearly descendants of α-proteobacteria (45, 46), proteomics and comparative genomics identify relatively few proteins in yeast and human

Fig. 1. The common ancestor of eukaryotes, bacteria, and archaea may have been a community of organisms containing the following: autotrophs that produced organic compounds from CO2 either photosynthetically or by inorganic chemical reactions; heterotrophs that obtained organics by leakage from other organisms; saprotrophs that absorbed nutrients from decaying organisms; and phagotrophs that were sufficiently complex to envelop and digest prey. +M: endosymbiosis of mitochondrial ancestor.

1 Department of Microbial Ecology, Lund University, Lund, Sweden. 2 Allan Wilson Center for Molecular Ecology and Evolution, Massey University, Palmerston North, New Zealand.
*To whom correspondence should be addressed. E-mail: [email protected]

www.sciencemag.org SCIENCE VOL 312 19 MAY 2006 1011

Secondary Simplification

Forterre & Philippe (1999) BioEssays 21:871-879; Kurland et al. (2006) Science 312:1011-1014
