Probing early eukaryotic evolution using phylognetic methods The Universal SSU rRNA Tree Wheelis et al. 1992 PNAS 89: 2930 The SSU Ribosomal RNA Tree for Fungi Choanozoa / Mitochondria? + Stramenopiles Dictyostelium Entamoebae Prokaryotic Physarum outgroup

Trichomonas The Archezoa Hypothesis T. Cavalier-Smith (1983)

“Archezoa are eukaryotes which primitively lack mitochondria” – The nucleus was invented before the was acquired – The first eukaryotes were anaerobes – Archezoans might provide insight into the nature of ancestral eukaryotic genomes and biology The Archezoa Hypothesis T. Cavalier-Smith (1983)

• The Archezoa hypothesis would fall if: – Find mitochondrial genes on archezoan genomes – Find mitochondrion-derived organelles in archezoans – Find that archezoans branch among aerobic species with mitochondria Is an Archezoan?

mitochondria

alpha- other

Clostridia

Chaperonin60/GroEL phylogeny

pyruvate ferredoxin oxidoreductase PFO Pyruvate Acetyl CoA

Fe-hydrogenase 2H+ H2 Pi + ADP ATP Trichomonas ancestry with mitochondrial chaperonins ancestry with chaperonin 60 shares common chaperonin .10 Rickettsia tsutsugumushi Ehrlichia chaffeensis Homo sapiens Ratus norvegicus Mus musculus Schizosaccharomyces pombe Arabidopsis thaliana Dictyostelium discoidium Saccharomyces cerevisiae Zea mays Trichomonas vaginalis Neocallimastix frontalis Ajellomyces capsulatum Heliothis virescens fowleri

Trypanosoma brucei alpha - proteobacteria Mitochondria Mitochondrial genes in Archezoa

Archezoa Proteins of mitochondrial origin*

Giardia / Heat shock 70, Chaperonin 60

Trichomonas Heat shock 70, Chaperonin 60

Microsporidia Heat shock 70

*defined as forming a monophyletic group with mitochondrial homologues in a non-controversial species phylogeny Chaperonin 60 Protein Maximum Likelihood Tree (PROTML, Roger et al. 1998, PNAS 95: 229)

Note 100% Bootstrap support

A case of Eukaryote HGT? Long branches may cause problems for phylogenetic analysis

• Felsenstein (1978) made a simple model phylogeny including four taxa and a mixture of short and long branches

AB A

p p D q TRUE TREE WRONG TREE q q p > q C CD B • Methods which assume all sites change at the same rate (e.g. PROTML) may be particularly sensitive to this problem Chaperonin 60 Protein Maximum Likelihood Tree (PROTML, Roger et al. 1998, PNAS 95: 229)

Longest branches A simple experiment:

• Does the Cpn60 tree topology change: – If we remove long-branch outgroups – If we remove sites where every species has the same amino acid Cpn-60 Protein ML tree (PROTML) from variable sites with outgroups removed

Giardia

Trichomonas

Euglena & 30 31 Animals & Fungi

Apicomplexa

Plants Dictyostelium

Entamoeba Competing Hypotheses for Microsporidia

Animals Fungi Microsporidia + Fungi Choanozoa Tubulin, mitHSP70 Plants / green algae Ciliates + Apicomplexa Stramenopiles Red algae Dictyostelium Entamoebae Euglenozoa Physarum Percolozoa “Microsporidia Early” Trichomonas SSU rRNA, EF-1 alpha Giardia Microsporidia EF-2

• HGT from Fungi to Microsporidia? (Sogin, 1998) • Another artefact of the method of analysis? Microsporidia have a number of unusual features

• Absence of mitochondria and peroxisomes • 70s ribosomes - most eukaryotes have 80S • 5.8S and 23S rRNA genes are fused - like in some prokaryotes • Lack 9 + 2 microtubule structures Alternative explanations of Microsporidia unusual features

• Retention of ancestral features of the eukaryote cell at an early stage of evolution? Or are they • Adaptations to an obligate intracellular lifestyle? Elongation Factor 2 protein ML tree (PROTML) (Hashimoto et al. 1997 Arch. . 148:287)

Animals + Fungi

Trypanosoma Entamoeba 88 Dictyostelium Eukaryote root 75 Trichomonas

Giardia Microsporidia Sulfolobus Halobacterium Archaebacteria outgroups Methanococcus

Also note that in PROTML the amino acid substitution process is assumed to be homogeneous across the tree Shared nucleotide or amino acid composition biases can also cause problems for phylogenetic analysis

Aquifex Thermus Aquifex (73%) Bacillus (50%)

True 16S rRNA Wrong tree tree

Bacillus Deinococcus Thermus Deinococcus (72%) (52% G+C)

Aquifex The correct tree can be obtained if a model is used which allows base/aa composition to vary between sequences -LogDet/Paralinear Bacillus Distances Thermus Heterogeneous Maximum Likelihood Deinococcus LogDet/Paralinear distances for EF-2 DNA variable sites codon positions 1+2

Animals

Chlorella 70 Note that Trypanosoma root has Trichomonas changed 60 Giardia Dictyostelium 25 Entamoeba Sacharomyces 76 Microsporidia 45% G+C Cryptosporidium Sulfolobus Archaebacteria Methanococcus 44% outgroups Halobacterium 58% A combination of factors (outgroup GC content and site rate heterogeneity) influence the EF-2 DNA tree

Methanococcus outgroup Halobacterium outgroup (low G+C) Higher G+C 100 100

80 80

LogDet ML 60 60 Bootstrap estimate values 40 40

20 20

0 0 0 20406080100 0 20406080100 Fraction of constant sites removed (Microsporidia, outgroup) A combination of factors (outgroup GC content & site rate heterogeneity) influence the EF-2 DNA tree Methanococcus outgroup Halobacterium outgroup (low G+C) Higher G+C 100 100

80 80 Bootstrap values 60 60

40 40

20 20

0 0 0 20406080100 0 20406080100 Fraction of constant sites removed (Microsporidia, outgroup) (Microsporidia, Fungi) A combination of factors (outgroup GC content & site rate heterogeneity) influence the EF-2 DNA tree

Methanococcus outgroup Halobacterium outgroup (low G+C) Higher G+C 100 100

80 80 Bootstrap values 60 60

40 40

20 20

0 0 0 20406080100 0 20406080100

Fraction of constant sites removed (Giardia, Trichomonas, outgroup) Competing hypotheses for Microsporidia

Animals Microsporidia + Fungi Fungi Choanozoa Tubulin, RNA polymerase, Plants / green algae LSU rRNA, HSP70, Ciliates + Apicomplexa TATA binding protein, Stramenopiles Red algae EF-2, EF-1 alpha Dictyostelium Entamoebae Euglenozoa Physarum Percolozoa

Trichomonas “Microsporidia Early” Giardia SSU rRNA Microsporidia

The best supported hypothesis for Microsporidia is a relationship to fungi - why does SSU rRNA place them deep? Summary I

• Making trees is not easy: – Among-site rate heterogeneity, “fast clock” species, shared nucleotide or amino acid composition biases – Different data sets may be affected by individual phenomena to different degrees – Biases need not be large if phylogenetic signal is weak Summary II

• Are Archezoa ancient offshoots? – Microsporidia are related to fungi – Evidence for Giardia and Trichomonas branching deeper than other eukaryotes is based on trees made using unrealistic assumptions • PLUS – For the same reasons we don’t know where the root lies on the eukaryote tree – So arguments about early or late branching are probably premature anyway Can we make a robust unrooted tree for eukaryotes?

• Combining different genes in a single analysis may provide a more robust eukaryotic tree • One argument is that phylogenetic signal should be additive whereas gene-specific “noise” will pull in different directions DNA ML tree found using a model which allows both base composition and site rates to vary across the tree

Animals + fungi + slime Entamoeba moulds

Dictyostelium Ciliates plus

Human Plasmodium apicomplexa Schizosaccharomyces Drosophila

Sacharomyces Cryptosporidium

Halteria /Stylonichia Giardia

Giardia and Tetrahymena Trichomonas Green algae Trichomonas Arabidopsis Trypanosoma Red and green algae/plants another red alga Actin+tubulin+EF-2 Chondrus Hydrogenosomes • Strange anaerobic eukaryotic organelles which make hydrogen 12um Methanocorpusculum (Uses H2 and CO2) Metopus endosymbiont

Anaerobic eukaryotes from different phylogenetic groups make hydrogen - how? Origin(s) of Hydrogenosomes

• Is a 2 part problem – The organelle (the bag) – The biochemistry to produce hydrogen particularly hydrogenase CO2 hsp70 ME Protein Pyruvate Malate cpn60 import Transit NAD(P)H NAD(P)+ peptides AAC ATP N ADP NAD(P)-FO [Fe]Hyd H2 2Fd 2Fd- 2H+ ASCT Acetyl-CoA Acetate PFO

Succinate Succinyl-CoA

CO2 CoASH Double STK membrane Fungi and Trichomonas ATP ADP + Pi Enzyme found also in mitochondria Alpha-proteobacterial ancestry Schematic Map of Hydrogenosomes Unknown ancestry (after Muller 1993) Spironucleus

DNA ML tree for Fe hydrogenases

Thermotoga maritima Clostridium acetobutylicum Clostridium pasteurianum Clostridium acetobutylicum Clostridium perfringens

Clostridium difficile

100ClostridiumDesulfovibrio thermocellum spp. Desulfovibrio fructosovorans

Entamoeba histolytica 48

45 Thermotoga maritima 100 72 Desulfovibrio vulgarisNyctotherus ovalis Spironucleus barkhanus Trichomonas vaginalis

Trichomonas vaginalis 0.1 substitutions per site A likelihood ratio test of monophyly (Huelsenbeck, Hillis & Nielson 1996)

δ The Test Statistic ( ) = lnL1 - lnL0

• Where lnL1 is the likelihood of the best tree

and lnL0 is the likelihood of the best monophyly tree • The null (eukaryote monophyly) distribution of δ is generated by simulation under an appropriate model (parametric bootstrapping) Parametric Bootstrapping to estimate a test distribution What might the test statistic distribution look like if the Fe hydrogenases were monophyletic?

Estimate ML model Simulate 1000 new parameters using original sequence data sets using data this model over the best monophyly tree Calculate d for original data and compare to distribution - if it falls outside of the 95% interval it is bigger than expected by chance and monophyly can be rejected

For each new data set Plot d for each of the estimate L0 and L1 using 1000 data sets to give the ML, with model re- test distribution and the optimised each time 95%confidence interval The likelihood ratio test rejects the hypothesis that eukaryotic hydrogenases are monophyletic

δ = lnL1 - lnL0

δ for original data δ Original data 95%95% 9.64

δ (lnL1 - lnL0) distribution from 1000 simulations of the [Fe] hydrogenase data on the best monophyly tree

Clostridium acetobutylicum p262 Clostridium perfringens Iron hydrogenase ML tree Clostridium pasteurianum Clostridium acetobutylicum ATCC824 Shewanella putrefaciens Dehalococcoides ethanogenes

Desulfovibrio vulgaris (Oxamicus) Desulfovibrio vulgaris (Hildenborough) Eukaryotic compartment Desulfovibrio fructosovorans

Trichomonas vaginalis (long form) Trichomonas Hydrogenosome Trichomonas vaginalis (short form) Spironucleus barkhanus Entamoeba histolytica Cytosolic? Chlamydomonas reinhardtii Scenedesmus obliquus Plastid

Megasphaera elsdenii

Treponema denticola Thermotoga maritima

Desulfovibrio vulgaris Hyd-g Nyctotherus ovalis Hydrogenosome Clostridium thermocellum.

Clostridium difficile Desulfovibrio fructosovorans (Hildenborough) Conclusions I

• Hydrogenosomes share common ancestry with mitochondria • Hydrogenase has been acquired at least twice and can be targeted to different cell compartments in different eukaryotes – Humans, plants and fungi also contain remnants of iron hydrogenases • There is no evidence from phylogenetic analysis that the “bag” and hydrogenase share a common origin from the mitochondrion endosymbiont Conclusions II

• Phylogeny is hard, there are lots of potential problems with data, so we need to be careful in our interpretations of what trees mean - includes inferences of HGT • Better methods hold promise of more reliable trees (allowing re-analysis of SSUrRNA data) • Archezoa contain genes which originated with mitochondrion endosymbiont and the jury is still out on whether former archezoa have lost the mitochondrial bag • We don’t know which eukaryotes are early branching - for this we need a rooted tree The mitosome, a novel organelle related to mitochondria in Entamoeba histolytica

Tovar et al., 1999.

Slide shows epitope tagged recombinant cpn60 localised to mitosome Are there still organelles of common ancestry with mitochondria in Giardia and Microsporidia?

• Giardia: – “What are the ovoid pellicular bodies (in Giardia)? The study made suggests that they might be nothing but changed mitochondria with a few crysts or tubules” – “The ultrastructure of mitochondria may be related with the oxygen deficiency in Lamblia environment” (Cheissen, 1965) • Microsporidia: – “There are reports of mitochondria-like structures in several microsporidia” (Vavra, 1976)