Sequence analysis and molecular phylogeny (Analyse de séquences et phylogénie moléculaire)
From phylogenetics to phylogenomics: Lessons from the Eucarya
École doctorale E2M2, 2015-2016 (http://www.frangun.org)
Céline Brochier, Guy Perrière
Analyse de séquences et phylogénie moléculaire (Céline Brochier-Armanet 2015-2016)

Phylogeny of Eucarya (eighties)
[Figure: tree of eukaryotes with an unresolved crown and an asymmetric base]
• Based on SSU rRNA
• A well-resolved asymmetric base
• An unresolved crown
• Late emergence of multicellular eukaryotes
• Early emergence of amitochondriate lineages (Archezoa)
⇒ Gradual increase in the complexity of the eukaryotic cell
(Adapted from Sogin, Early evolution and the origin of eukaryotes, Curr. Opin. Genet. Dev., 1991)

Origin of the mitochondria
(From Keeling, A kingdom’s progress: Archezoa and the origin of eukaryotes, BioEssays, 1998)

Archezoa
Cavalier-Smith, Eukaryotes with no mitochondria, Nature, 1987
Cavalier-Smith, Archaebacteria and Archezoa, Nature, 1989
[Figure: representative archezoans: Metamonad (Retortamonas), Archamoeba (Pelomyxa), Parabasalia (Trichomonas), Microsporidia (Nosema)]
(From Keeling, A kingdom’s progress: Archezoa and the origin of eukaryotes, BioEssays, 1998)

The archezoa hypothesis
• “Archezoa are eukaryotes which primitively lack mitochondria”
• The nucleus originated before the mitochondrial endosymbiosis
• The first eukaryotes were anaerobes
• Archezoans might provide insights into the nature of ancestral eukaryotic genomes and biology
• BUT… the hypothesis would fall if:
  - We find mitochondrial genes in archezoan genomes
  - We find that archezoans branch among aerobic species with mitochondria
  - Mitochondrion-derived organelles are found in archezoans

[Figure (Mol. Biol. Evol. 2001): trees of mitochondrial genes; labels: Proteobacteria; mitochondrial genes; Trichomonas vaginalis 1 and 2 (Parabasalia, hydrogenosomes); Giardia intestinalis (Metamonads, mitosomes)]

Conflicting phylogenetic signals
[Figure: trees of eukaryotes inferred from SSU rRNA, actin and β-tubulin]
(Philippe H. et al., Early-branching or fast-evolving eukaryotes?, Proc. Biol. Sci., 2000)

The phylogeny of Eucarya is severely affected by tree reconstruction artefacts
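
The long branch attraction artefact behind this slide can be illustrated numerically. The sketch below is not taken from the lecture. It assumes a two-state symmetric substitution model, a true tree ((A,B),(C,D)), long terminal branches leading to A and D (per-branch change probability p) and short branches elsewhere (probability q for B, C and the internal branch). The function names pattern_probs and parsimony_choice are hypothetical. The code enumerates every site pattern and reports which of the three possible groupings collects the most parsimony-informative sites when the amount of data tends to infinity.

```python
from itertools import product


def edge(parent, child, change_prob):
    """Probability of observing `child` given `parent` on one branch."""
    return change_prob if parent != child else 1.0 - change_prob


def pattern_probs(p, q):
    """Expected frequencies of the three parsimony-informative pattern classes.

    Assumed setting (illustrative only): true tree ((A,B),(C,D)), two states
    (0/1), long branches to A and D (change probability p), short branches
    to B, C and the internal branch (change probability q).
    """
    probs = {"AB|CD": 0.0, "AC|BD": 0.0, "AD|BC": 0.0}
    # x and y are the two internal nodes; a, b, c, d are the leaf states.
    for x, y, a, b, c, d in product((0, 1), repeat=6):
        pr = 0.5 * edge(x, y, q)                  # root state and internal branch
        pr *= edge(x, a, p) * edge(x, b, q)       # leaves attached to x
        pr *= edge(y, c, q) * edge(y, d, p)       # leaves attached to y
        if a == b and c == d and a != c:
            probs["AB|CD"] += pr                  # supports the true tree
        elif a == c and b == d and a != b:
            probs["AC|BD"] += pr
        elif a == d and b == c and a != b:
            probs["AD|BC"] += pr                  # groups the two long branches
    return probs


def parsimony_choice(p, q):
    """Grouping favoured by maximum parsimony with infinitely many sites."""
    probs = pattern_probs(p, q)
    return max(probs, key=probs.get)


if __name__ == "__main__":
    for p, q in [(0.05, 0.05), (0.40, 0.05)]:
        print(p, q, parsimony_choice(p, q), pattern_probs(p, q))
```

Under these assumptions, with p = q = 0.05 the true grouping AB|CD is the most frequent informative pattern, whereas with p = 0.4 and q = 0.05 the pattern AD|BC that joins the two long branches becomes the most frequent, so adding more sites only reinforces the wrong tree.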
[Figure: two four-taxon trees (A, B, C, D) with branch lengths p and q, illustrating long branch attraction]

Microsporidia are related to Fungi
The use of accurate evolutionary models (Slow-Fast (S/F) method, removal of invariant sites, Gamma corrections, etc.) and methods (ML, Bayesian) clarifies the phylogenetic position of Microsporidia.
[Figure: ML trees of V-ATPase and RPB1]
(Vivares et al., Curr. Opin. Microbiol., 2002) (Hirt et al., PNAS, 2001)

Hydrogenosomes and mitosomes are mitochondrial remnants
[Figure: micrographs of the organelles, dated 2002, 2003 and 2005 (Nyctotherus); scale bars: 50 nm, 50 nm, 50 nm, 1 µm]

Origin of the mitochondria
(From Keeling, A kingdom’s progress: Archezoa and the origin of eukaryotes, BioEssays, 1998)

Phylogeny of Eucarya (nineties)
[Figure: tree labels: Microsporidia, Crown, ?]

Why such a lack of resolution?
• Radiation
  - Too little phylogenetic signal has been recorded in sequences
  [Figure: radiation with a short ∆t]
• Substitutional saturation
  - The ancient phylogenetic signal is progressively erased by multiple substitutions
  [Figure: observed substitutions plotted against the real number of substitutions]
(Gribaldo and Brochier, Phylogeny of prokaryotes: does it exist and why should we care?, Res. Microbiol., 2009)

How can we overcome the lack of signal?
• Improve methods of reconstruction and evolutionary models
(Van de Peer, Ben Ali and Meyer, Gene, 2000)
• Improve the taxonomic sampling to avoid tree reconstruction artefacts and reduce saturation
[Figure: 28S rRNA of 31 gnathostomes (Gnathostomata), parsimony, 529 positions]
(Lecointre et al., Mol. Phylogenet. Evol., 1993)
• Increase the amount of data

Increasing the amount of data is not sufficient
• The use of inaccurate evolutionary models can severely bias phylogenetic inferences
[Figure: trees of Fungi, Deuterostomia, Arthropoda and Nematoda inferred with WAG+F+Γ and with CAT+F+Γ; ~70 nuclear genes, 17,807 amino acid positions]
(Lartillot, Brinkmann and Philippe, BMC Evol. Biol., 2007)
• The use of inadequate taxonomic sampling can severely bias phylogenetic inferences
[Figure: trees of Fungi, Deuterostomia, Arthropoda and Nematoda inferred with WAG+F+Γ, without and with Cnidaria + Choanoflagellates; ~70 nuclear proteins, 17,807 amino acid positions]
(Lartillot, Brinkmann and Philippe, BMC Evol. Biol., 2007)
• Break the long branches to avoid Long Branch Attraction artefacts
[Figure: 146 nuclear proteins (35,346 amino acid positions), ML (JTT+Γ)]
(Delsuc, Brinkmann and Philippe, Nat. Rev. Genet., 2005)
• Compositional biases and multiple substitutions can severely affect tree inference, especially when using inaccurate evolutionary models
[Figure: trees of the proteobacterial classes (α, β, γ, δ, ε) inferred with GG+Γ and with GTR+Γ; 46 ribosomal proteins, 137 proteobacterial species, 15,372 nucleotide positions]
(Ramulu et al., Mol. Phylogenet. Evol., 2014)

Garbage in, garbage out
• The recoding of multiple alignments can help to overcome systematic biases
  4 mitochondrial genes (3,729 nucleotide positions)
  - ML (GTR+I+Γ) ⇒ Ticks emerge with A+T-rich insects
  - RY recoding + ML (CF+I+Γ) ⇒ Ticks emerge within other chelicerates
(Delsuc, Brinkmann and Philippe, Nat. Rev. Genet., 2005)
• The removal of the fastest-evolving sites can help to overcome systematic biases and to detect Long Branch Attraction artefacts
  146 nuclear genes (35,371 amino acid positions), ML (JTT+Γ), Slow-Fast method
(Delsuc, Brinkmann and Philippe, Nat. Rev. Genet., 2005)

Supermatrix approaches have improved our knowledge of eukaryotic phylogeny

Year  Author                     Number of genes  Main results
2002  Bapteste et al.            123              Monophyly of Amoebozoa
2004  Philippe et al.            129              Monophyly of Opisthokonta
2005  Rodríguez-Ezpeleta et al.  143              Monophyly of Plantae
2007  Patron et al.              102              Monophyly of Haptophyta + Cryptophyta
2007  Burki et al.               123              Question the monophyly of Chromalveolata
2008  Burki et al.               135              Early emergence of Excavata within Bikonta
2007  Rodríguez-Ezpeleta et al.  143              Monophyly of Excavata and question the monophyly of Chromalveolata
2009  Hampl et al.               143              Monophyly of Excavata
2010  Baurain et al.             108              Question the monophyly of Chromalveolata
2010  Burki et al.               167              Monophyly of Rhizaria
2012  Burki et al.               258              Question the monophyly of Chromalveolata

Phylogenomics led to the emergence of super-phyla
(Simpson and Roger, The ‘real’ kingdoms of eukaryotes, Current Biology, 2004)
(Baldauf et al., The Deep Roots of Eukaryotes, Science, 2003)
(Pawlowski, Jan, Protist Evolution and Phylogeny. In: eLS.
John Wiley & Sons, Ltd: Chichester, 2014)

The Opisthokonta
• Typical features:
  - One posteriorly-inserting flagellum
  - Flattened mitochondrial cristae
  - Specific insertion in EF-1α
• Divided into Holozoa and Holomycetes
[Figure: tree inferred from EF-1α + HSP70 + actin + tubulin; labels: Metazoa, Ministeriids, Choanoflagellates, Ichthyosporeans, Corallochytreans, Fungi, Nucleariids; support values shown: 96, 84, 94]
[Figure: EF-1α alignment around the insertion; spacing as extracted]
  Homo           WFGG WK VTRKD GNASGTTLL
  Saccharomyces  WYGG WEKETK AG VVKGKTLL
  Glugea         WFKG WK PVSG AG DSI-FTLE
  Arabidopsis    WYKG------PTLL
  Trypanosoma    WYKG------PILV
  Plasmodium     WYKG------RTLI
  Giardia        WYEG------PCLI
  Sulfolobus     WYNG ------PTLE
  E. coli        WEAK------ILELA
(Steenkamp et al., The Protistan Origins of Animals and Fungi, MBE, 2006)

The Amoebozoa
• Mainly composed of amoebae and amoeboid flagellates
• Some species have flagella and/or subpseudopodia
• Many species have branching, irregular mitochondrial cristae
• Amitochondriate members
• Artificially divided into amoebae, slime moulds and flagellated amoebae
[Figure: 123 genes (25,032 aa positions), JTT + Γ4; labels: Pelomyxa, support value 97]
(Bapteste et al., PNAS, 2002)

The Archaeplastida
• Typical features:
  - Presence of primary plastids
  - Duplication of the cytosolic fructose 1,6-bisphosphate aldolase (FBA)
  - Type I transcription factor pBRp
  - Plastids bound by two membranes
• Divided into Viridiplantae, Rhodoplastida and Glaucophyta
[Figure: 143 genes (30,113 aa positions), ML and BI; support values shown: 98, 97]
(Rodríguez-Ezpeleta et al., Current Biology, 2005)

The SAR clade or Harosa
• Typical features:
  - Presence of secondary (red and green) photosynthetic lineages
  - Duplication of the GTPase Rab1
• Divided into Stramenopiles, Alveolata and Rhizaria
[Figure: 258 genes (55,881 aa positions), BI, CAT + Γ4]
(Burki et al., The evolutionary history of haptophytes and cryptophytes: phylogenomic evidence for separate origins, Proc. Biol. Sci., 2012)

The CCTH clade or Hacrobia
• Typical features:
  - Free-living, heterotrophic, mixotrophic and autotrophic
  - No synapomorphies
  - No parasites
• Divided into Cryptomonads, Centrohelids, Telonemids and Haptophyta, plus Kathablepharids and Picobiliphytes
[Figure: 127 genes (25,235 aa positions), BI, CAT + Γ4; RAxML + recoding]
(Burki et al., Large-Scale Phylogenomic Analyses Reveal That Two Enigmatic Protist Lineages, Telonemia and Centroheliozoa, Are Related to Photosynthetic Chromalveolates, GBE, 2009)

The Excavata
• Typical features:
  - Heterotrophic flagellates
  - Most live in oxygen-poor environments
  - Most harbor non-aerobic mitochondria
  - Contain a distinctive longitudinal feeding groove
  - Include secondary photosynthetic lineages (Euglenozoa)
• Divided into Metamonads (amitochondriate) and Discoba (mitochondriate)
[Figure: 143 genes (35,584 aa positions), ML, WAG + Γ4; labels: Pseudociliata, Heterolobosea, Jakobida, Kinetoplastida, Diplomonad, Euglenozoa, Parabasalia]
(Hampl et al., Phylogenomic analyses support the monophyly of Excavata and resolve relationships among eukaryotic “supergroups”, PNAS, 2009)

Is the story over?
• The phylogenetic position of some lineages remains elusive
  - Apusozoa, Breviates, Collodictyonids, Colponema, Hemimastigophora, Palpitomonas, etc.
• Uncultured protists (picoeukaryotes, etc.)
• What are the relationships among the super-phyla?
• Where is the root of Eucarya?
  - Between Unikonta and Bikonta? (DHFR/TS fusion)
  - Between Plantae and other phyla? (distribution pattern of rare aa)
  - Between Euglenozoa and other phyla? (cytochrome synthesis pathway)