Analyse de séquences et phylogénie moléculaire (Sequence analysis and molecular phylogeny)

From phylogenetics to phylogenomics: Lessons from the Eucarya École doctorale E2M2 – 2015-2016 (http://www.frangun.org)

Céline Brochier, Guy Perrière

Analyse de séquences et phylogénie moléculaire (Céline Brochier-Armanet 2015-2016)

Phylogeny of Eucarya (eighties)

• Based on SSU rRNA
• A well resolved asymmetric base
• An unresolved crown
• Late emergence of multicellular organisms
• Early emergence of amitochondriate lineages (Archezoa)

⇒ Gradual increase in the complexity of the eukaryotic cell

[Figure: SSU rRNA tree with an asymmetric base and a crown]

(Adapted from Sogin, Early evolution and the origin of eukaryotes - Curr. Opin. Genet. Dev. – 1991)

Origin of the mitochondria

(From Keeling, A kingdom’s progress: Archezoa and the origin of eukaryotes – BioEssays - 1998)

Archezoa

Cavalier-Smith, Eukaryotes with no mitochondria – Nature - 1987
Cavalier-Smith, Archaebacteria and Archezoa - Nature - 1989

[Figure labels: Metamonad, Archamoeba, Parabasalia, Retortamonas, Nosema]

(From Keeling, A kingdom’s progress: Archezoa and the origin of eukaryotes – BioEssays - 1998)

The archezoa hypothesis

• “Archezoa are eukaryotes which primitively lack mitochondria”
• The nucleus originated before the mitochondrial endosymbiosis
• The first eukaryotes were anaerobes
• Archezoans might provide insights into the nature of ancestral eukaryotic genomes and biology
• BUT… the hypothesis would fall if
  - We find mitochondrial genes in archezoan genomes
  - We find that archezoans branch among aerobic species with mitochondria
  - Mitochondrion-derived organelles are found in archezoans


[Figure (Mol. Biol. Evol. 2001): tree labelled Parabasalia, Trichomonas vaginalis 1 and 2 (hydrogenosomes), … intestinalis (mitosomes), mitochondrial genes and Proteobacteria]

Conflicting phylogenetic signals

[Figure: trees inferred from SSU rRNA, actin and β-tubulin]

(Philippe H. et al. Early-branching or fast-evolving eukaryotes? - Proc. Biol. Sci. - 2000)

The phylogeny of Eucarya is severely affected by tree reconstruction artefacts

[Figure: two four-taxon trees (A, B, C, D) with branch lengths labelled p and q]

Microsporidia are related to Fungi

The use of accurate evolutionary models (S/F, removal of invariant sites, Gamma corrections, etc.) and methods (ML, Bayesian) disentangles the phylogenetic position of Microsporidia

[Figure: ML trees of V-ATPase and RPB1]

(Vivares et al. Curr Opin Microbiol. 2002) (Hirt et al. PNAS 2001)

Hydrogenosomes and mitosomes are mitochondrial remnants

[Figure: micrographs dated 2002, 2003 and 2005 (Nyctotherus); scale bars 50 nm and 1 µm]

Origin of the mitochondria

(From Keeling A kingdom’s progress: Archezoa and the origin of eukaryotes – BioEssays - 1998)

Phylogeny of Eucarya (nineties)

[Figure: tree with Microsporidia, the crown and a question mark]

Why such a lack of resolution?


• Radiation
  - Too little phylogenetic signal was recorded in the sequences (short ∆t between divergences)

• Substitutional saturation
  - The ancient phylogenetic signal is progressively erased by multiple substitutions

[Figure: observed substitutions plotted against the real number of substitutions]
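The saturation curve can be illustrated numerically. The sketch below assumes the Jukes-Cantor model, which the slide does not name, and uses illustrative function names; it shows how the observed fraction of differing sites levels off near 0.75 as the real number of substitutions per site grows.

```python
import math


def expected_observed_fraction(real_subs_per_site: float) -> float:
    """Expected fraction of differing sites under Jukes-Cantor (assumed model)."""
    return 0.75 * (1.0 - math.exp(-4.0 * real_subs_per_site / 3.0))


def jukes_cantor_distance(observed_fraction: float) -> float:
    """Estimate the real number of substitutions per site from the observed one."""
    if not 0.0 <= observed_fraction < 0.75:
        raise ValueError("observed fraction must be in [0, 0.75): saturated data")
    return -0.75 * math.log(1.0 - 4.0 * observed_fraction / 3.0)


if __name__ == "__main__":
    # The observed value grows ever more slowly than the real one.
    for real in (0.1, 0.5, 1.0, 2.0, 5.0):
        observed = expected_observed_fraction(real)
        print(f"real = {real:4.1f}   observed = {observed:.3f}")
```

Once the observed fraction approaches the plateau, very different real distances give nearly the same observation, so the correction becomes unreliable.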

(Gribaldo and Brochier - Phylogeny of prokaryotes: does it exist and why should we care? - Res Micro - 2009)

How can we overcome the lack of signal?

• Improve methods of reconstruction and evolutionary models

(Van de Peer, Ben Ali, Meyer – Gene - 2000)

• Improve the taxonomic sampling to avoid tree reconstruction artefacts and reduce saturation

[Figure: 28S rRNA of 31 gnathostomes (parsimony, 529 positions)]

(Lecointre et al. – Mol. Phyl. Evol. - 1993)

• Increase the amount of data

Increasing the amount of data is not sufficient

• The use of inaccurate evolutionary models can severely bias phylogenetic inferences

[Figure: trees of Fungi, Deuterostomia, Arthropoda and Nematoda (~70 nuclear genes, 17,807 amino acid positions) inferred with WAG+F+Γ and with CAT+F+Γ]

(Lartillot, Brinkmann and Philippe – BMC Evol. Biol. - 2007)

• Poor taxonomic sampling can severely bias phylogenetic inferences

[Figure: two WAG+F+Γ trees of Fungi, Deuterostomia, Arthropoda and Nematoda (~70 nuclear proteins, 17,807 amino acid positions), one of them with Cnidaria+Choano added]

(Lartillot, Brinkmann and Philippe – BMC Evol. Biol. - 2007)

• Break the long branches to avoid Long Branch Attraction artefacts

[Figure: ML tree (JTT+Γ) of 146 nuclear proteins (35,346 amino acid positions)]

(Delsuc, Brinkmann and Philippe – Nat. Rev. Micro. - 2005)

• Compositional biases and multiple substitutions can severely affect tree inference, especially when using inaccurate evolutionary models

[Figure: trees of the proteobacterial classes α, β, γ, δ and ε inferred with GG+Γ and with GTR+Γ; 46 r-prot, 137 proteobacterial species, 15,372 nucleotide positions]

(Ramulu et al. – Mol. Phylogenet. Evol. - 2014)

Garbage in, garbage out

• Recoding multiple alignments can help to overcome systematic biases

[Figure: 4 mitochondrial genes (3,729 nucleotide positions)
  ML (GTR+I+Γ) ⇒ Ticks emerge with A+T-rich insects
  RY recoding + ML (CF+I+Γ) ⇒ Ticks emerge within other chelicerates]

(Delsuc, Brinkmann and Philippe – Nat. Rev. Micro. - 2005)
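RY recoding itself is a simple transformation: purines (A, G) become R and pyrimidines (C, T) become Y, so only transversions remain visible. A minimal sketch follows; the function name and the choice to map any other symbol to "?" are illustrative assumptions, not the procedure used in the cited study.

```python
# Purines are recoded as R, pyrimidines as Y (U is treated like T).
RY_CODE = {"A": "R", "G": "R", "C": "Y", "T": "Y", "U": "Y"}


def ry_recode(sequence: str) -> str:
    """Return the RY-recoded version of a nucleotide sequence."""
    recoded = []
    for base in sequence.upper():
        if base in "-?":
            recoded.append(base)  # keep gaps and missing data as they are
        else:
            # Ambiguity codes and unknown symbols become missing data (assumption).
            recoded.append(RY_CODE.get(base, "?"))
    return "".join(recoded)


if __name__ == "__main__":
    alignment = {"taxon_AT_rich": "AATTAT", "taxon_GC_rich": "GGCCGC"}
    for name, seq in alignment.items():
        print(name, ry_recode(seq))
```

In this toy alignment the two sequences differ at every site in base composition but become identical after recoding, which is how the recoding removes an A+T versus G+C bias.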


• The removal of the fastest-evolving sites can help to overcome systematic biases and to detect Long Branch Attraction artefacts

[Figure: Slow-Fast method applied to 146 nuclear genes (35,371 amino acid positions), ML (JTT + Γ)]
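The idea of removing fast-evolving sites can be sketched as follows. This is a simplified proxy, not the published Slow-Fast procedure: it assumes predefined groups of taxa and scores each site by the number of distinct residues minus one within each group (a lower bound on the number of changes), then keeps the sites at or below a threshold. All names and the threshold are illustrative.

```python
from typing import Dict, List

IGNORED = {"-", "?", "X"}  # gaps and unknown residues are not counted as states


def site_variability(alignment: Dict[str, str], groups: List[List[str]]) -> List[int]:
    """Sum over groups of (distinct residues - 1) at each alignment position."""
    n_sites = len(next(iter(alignment.values())))
    scores = []
    for i in range(n_sites):
        total = 0
        for group in groups:
            states = {alignment[taxon][i] for taxon in group} - IGNORED
            total += max(len(states) - 1, 0)
        scores.append(total)
    return scores


def keep_slow_sites(alignment: Dict[str, str], groups: List[List[str]],
                    max_changes: int) -> Dict[str, str]:
    """Keep only the positions whose variability score is <= max_changes."""
    scores = site_variability(alignment, groups)
    kept = [i for i, score in enumerate(scores) if score <= max_changes]
    return {taxon: "".join(seq[i] for i in kept) for taxon, seq in alignment.items()}


if __name__ == "__main__":
    aln = {"a1": "MKLA", "a2": "MKIA", "b1": "MRLS", "b2": "MRVT"}
    groups = [["a1", "a2"], ["b1", "b2"]]
    print(site_variability(aln, groups))
    print(keep_slow_sites(aln, groups, max_changes=1))
```

Inferring trees from a series of such filtered alignments, with increasing thresholds, shows whether a grouping depends on the fastest sites, which is a warning sign for Long Branch Attraction.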

(Delsuc, Brinkmann and Philippe – Nat. Rev. Micro. - 2005)

Supermatrix approaches have improved our knowledge of eukaryotic phylogeny

Year  Author                     Number of genes  Main results
2002  Bapteste et al.            123              Monophyly of Amoebozoa
2004  Philippe et al.            129              Monophyly of Opisthokonta
2005  Rodríguez-Ezpeleta et al.  143              Monophyly of Plantae
2007  Patron et al.              102              Monophyly of Haptophyta + Cryptophyta
2007  Burki et al.               123              Question the monophyly of Chromalveolata
2008  Burki et al.               135              Early emergence of Excavata within Bikonta
2007  Rodríguez-Ezpeleta et al.  143              Monophyly of [taxon missing] and question the monophyly of Chromalveolata
2009  Hampl et al.               143              Monophyly of Excavata
2010  Baurain et al.             108              Question the monophyly of Chromalveolata
2010  Burki et al.               167              Monophyly of Rhizaria
2012  Burki et al.               258              Question the monophyly of Chromalveolata
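A supermatrix is built by concatenating the alignments of individual genes, taxon by taxon. The sketch below is a minimal illustration under stated assumptions: each gene alignment is a dictionary of equal-length sequences, and a taxon absent from a gene is filled with "?" (missing data). The function name and data are hypothetical, not taken from the studies listed above.

```python
from typing import Dict, List


def build_supermatrix(gene_alignments: List[Dict[str, str]],
                      missing: str = "?") -> Dict[str, str]:
    """Concatenate per-gene alignments; pad absent taxa with missing data."""
    taxa = sorted({taxon for aln in gene_alignments for taxon in aln})
    parts: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for aln in gene_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences of one gene must be aligned (same length)")
        gene_length = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, missing * gene_length))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


if __name__ == "__main__":
    genes = [
        {"Homo": "MK", "Giardia": "MR"},
        {"Homo": "LLA"},  # this gene is missing for Giardia
    ]
    for taxon, row in build_supermatrix(genes).items():
        print(taxon, row)
```

Filling absent genes with missing data lets taxa with incomplete sampling stay in the analysis, at the cost of a sparser matrix.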

Phylogenomics led to the emergence of super-phyla

(Simpson and Roger - The ‘real’ kingdoms of eukaryotes – Current Biology 2004)
(Baldauf et al. The Deep Roots of Eukaryotes – Science – 2003)
(Pawlowski, Jan Evolution and Phylogeny. In: eLS. John Wiley & Sons, Ltd: Chichester – 2014)

The Opisthokonta

• Typical features:
  - One posteriorly-inserting flagellum
  - Flattened mitochondrial cristae
  - Specific insertion in the EF1-alpha
• Divided into Holozoa and Holomycetes

[Figure: tree (EF-1α + HSP70 + actin + tubulin) with Metazoa, Ministeriids, Ichthyosporeans, Corallochytreans, Nucleariids and Fungi; support values 96, 84 and 94]

Insertion in EF1-alpha:
Homo            WFGG WK VTRKD GNASGTTLL
Saccharomyces   WYGG WEKETK AG VVKGKTLL
Glugea          WFKG WK PVSG AG DSI-FTLE
Arabidopsis     WYKG------PTLL
(name missing)  WYKG------PILV
Plasmodium      WYKG------RTLI
Giardia         WYEG------PCLI
Sulfolobus      WYNG ------PTLE
E. coli         WEAK------ILELA

(Steenkamp et al. The Protistan Origins of Animals and Fungi - MBE - 2006)

The Amoebozoa

• Mainly composed of amoebae and amoeboid flagellates
• Some species have flagella and/or subpseudopodia
• Many species have branching, irregular mitochondrial cristae
• Includes amitochondriate members
• Artificially divided into amoebae, slime moulds and flagellated amoebae

[Figure: tree including Pelomyxa; 123 genes (25,032 aa positions), JTT + Γ4; support value 97]

(Bapteste et al - PNAS - 2002)

The Archaeplastida

• Typical features:
  - Presence of primary plastids
  - Duplication of the cytosolic fructose 1,6-bisphosphate aldolase (FBA)
  - Type I transcription factor pBRp
  - Plastids bound by two membranes
• Divided into Viridiplantae, Rhodoplastida and Glaucophyta

[Figure: 143 genes (30,113 aa positions), ML and BI; support values 98 and 97]

(Rodríguez-Ezpeleta et al.– Current Biology - 2005)

The SAR clade or Harosa

• Typical features:
  - Presence of secondary (red and green) photosynthetic lineages
  - Duplication of the GTPase Rab1
• Divided into Stramenopiles, Alveolata and Rhizaria

[Figure: 258 genes (55,881 aa positions), BI, CAT + Γ4]

(Burki et al. The evolutionary history of haptophytes and cryptophytes: phylogenomic evidence for separate origins – Proc Biol Sci. 2012)

The CCTH clade or Hacrobia

• Typical features:
  - Free-living, heterotrophic, mixotrophic and autotrophic
  - No synapomorphies
  - No parasites
• Divided into Cryptomonads, Centroheliozoa, Telonemids and Haptophyta, plus Kathablepharids and Picobiliphytes

[Figure: 127 genes (25,235 aa positions), BI with CAT + Γ4, RAxML + Recod]

(Burki et al. Large-Scale Phylogenomic Analyses Reveal That Two Enigmatic Protist Lineages, Telonemia and Centroheliozoa, Are Related to Photosynthetic Chromalveolates – GBE 2009)

The Excavata

• Typical features:
  - Heterotrophic flagellates
  - Most live in oxygen-poor environments
  - Most harbor non-aerobic mitochondria
  - Contain a distinctive longitudinal feeding groove
  - Include secondary photosynthetic lineages
• Divided into Metamonads (amitochondriate) and Discoba (mitochondriate)

[Figure: tree with Pseudociliata, Heterolobosea, Jakobida, Euglenozoa, Diplomonads and Parabasalia; 143 genes (35,584 aa positions), ML, WAG + Γ4]

(Hampl et al. Phylogenomic analyses support the monophyly of Excavata and resolve relationships among eukaryotic ‘‘supergroups’’ – PNAS 2009)

Is the story over?

• The phylogenetic position of some lineages remains elusive
  - Apusozoa, Breviates, Collodictyonids, Colponema, Palpitomonas, etc.
• Uncultured lineages (picoeukaryotes, etc.)
• What are the relationships among the super-phyla?
• Where is the root of Eucarya?
  - Between Unikonta and Bikonta? (DHFR/TS fusion)
  - Between Plantae and other phyla? (distribution pattern of rare aa)
  - Between Euglenozoa and other phyla? (cytochrome synthesis pathway)