
Molecular Phylogeny

T.D. Master 2 Recherche SdV UE GAP

Karine ALIX [email protected]

Phylogeny

• Main purpose: to model the evolutionary history of living organisms

• The word phylogenesis (from Greek phylon « race, tribe » and genesis « origin ») was defined by Ernst Haeckel (Germany) in 1866 as follows:

• « During its rapid development, an individual repeats the most important changes in form evolved by its ancestors during their long and slow paleontological development » = Ontogeny recapitulates phylogeny

• By studying embryo development, you can identify the different ancestral steps that led to the current species

To estimate relationships between species along evolution

First trees: Charles Darwin

Notebook, 1837 (Wikimedia Commons)

On the Origin of Species, 1859: the only drawing in the whole book!

From Baum 2008

Same principles as a family tree

Philippe Claire

Marc Estelle Corinne Tom

John Vincent Anaïs

Anaïs is closer to her brother Vincent than to her cousin John.

Search for the most recent common ancestor…

• Anaïs and Vincent: the closest common ancestor is the mother

…and compare!

• Anaïs and John: the closest common ancestor is the grandmother (mother's mother)

But in phylogeny:

The objective is to reconstruct the tree,

• having no complete data concerning the ancestors

• having no idea of the changes that occurred during the evolutionary history

→ The need to develop appropriate methods for phylogenetic reconstruction

Phylogenetic tree

• A schematic representation of the phylogeny (phylogenesis) of a group of taxa:

• The external nodes (leaves) represent the OTUs = operational taxonomic units (1.)

• The branches define the relationships between the taxa in terms of descent (4.); the branch length estimates the evolutionary distance (3.)

• The internal nodes represent hypothetical ancestors (5.), and a clade is a group of OTUs sharing a common ancestor (2.)

• The root (the most recent ancestor common to all the taxa under study) gives the direction of the relationships, from ancestors towards descendants (6.)
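To make these terms concrete, here is a minimal Python sketch of a rooted tree stored as nested tuples. The taxon names A to D, the branch lengths and the helper names (`leaves`, `clades`, `root_to_tip`) are all hypothetical and chosen only for illustration.

```python
# A node is (content, branch_length).
# For a leaf (OTU), content is its name; for an internal node, content is a list of children.
# All names and lengths below are made up for the example.
tree = (
    [
        ([("A", 0.1), ("B", 0.2)], 0.3),  # internal node = hypothetical ancestor of A and B
        ([("C", 0.4), ("D", 0.1)], 0.2),  # internal node = hypothetical ancestor of C and D
    ],
    0.0,  # the root: no branch above it
)


def leaves(node):
    """Names of the OTUs (external nodes) found below a node."""
    content, _length = node
    if isinstance(content, str):
        return [content]
    names = []
    for child in content:
        names.extend(leaves(child))
    return names


def clades(node):
    """One clade per internal node: the set of OTUs sharing that ancestor."""
    content, _length = node
    if isinstance(content, str):
        return []
    found = [frozenset(leaves(node))]
    for child in content:
        found.extend(clades(child))
    return found


def root_to_tip(node, above=0.0):
    """Evolutionary distance from the root to each leaf (sum of branch lengths)."""
    content, length = node
    total = above + length
    if isinstance(content, str):
        return {content: total}
    distances = {}
    for child in content:
        distances.update(root_to_tip(child, total))
    return distances


print(leaves(tree))
print(clades(tree))
print(root_to_tip(tree))
```

Reading the tree from the root towards the leaves follows the direction from ancestors to descendants.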

[Figure: phylogenetic tree with its elements numbered 1. to 6.]

The phylogeny of angiosperms as an example

[Figure: angiosperm phylogeny from a common ancestor; labelled groups: rice, maize, wheat…; Medicago, pea…; Arabidopsis, kale, rapeseed…; tomato, tobacco…]

https://www.mobot.org/MOBOT/research/APweb/

Phylogenetic tree

• Several drawings but the same topology

• We have to look at the way the branches are attached along the tree = the location of the internal nodes

• This indicates the position of the hypothetical common ancestors and so characterizes the relationships along the lineages

→ same topology = same biological meaning!!
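One possible way to check that two drawings carry the same topology is to compare the groups of leaves defined by their internal nodes. The sketch below assumes rooted trees written as nested tuples of hypothetical taxon names. The function names are illustrative and do not come from any particular library.

```python
def clade_set(tree):
    """Collect the groups of leaves (clades) defined by the internal nodes."""
    groups = set()

    def visit(node):
        if isinstance(node, str):  # a leaf
            return frozenset([node])
        members = frozenset().union(*(visit(child) for child in node))
        groups.add(members)
        return members

    visit(tree)
    return groups


def same_topology(tree_1, tree_2):
    """Two rooted trees have the same topology if they define the same clades."""
    return clade_set(tree_1) == clade_set(tree_2)


drawing_1 = (("A", "B"), ("C", "D"))
drawing_2 = (("D", "C"), ("B", "A"))  # same tree, branches rotated around the nodes
other = (("A", "C"), ("B", "D"))      # the internal nodes are placed differently

print(same_topology(drawing_1, drawing_2))
print(same_topology(drawing_1, other))
```

Rotating branches around a node changes the drawing but not the set of clades, which is why the first two trees compare as equal.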

What do we need to construct a molecular phylogeny…

• A set of nucleic acid sequences (AATTCG) or protein sequences (LKEVM)

• That are « homologous »

• With high sequence similarity

• But there are different types of molecular homology, associated with the evolution of the species and the

Molecular homology

Orthology = homology by speciation

Paralogy = homology by duplication

Evolution can lead to highly complex situations…

What do we need to construct a molecular phylogeny…

• A set of homologous sequences

• A multiple sequence alignment of « very good » quality: be careful with uncertain or highly variable sites

• Example software: Clustal Omega http://www.ebi.ac.uk/Tools/msa/clustalo/
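As a rough illustration of what "being careful with uncertain or highly variable sites" can mean in practice, the sketch below scores each column of a small alignment. The four sequences, the function names and both thresholds (`min_conservation`, `max_gaps`) are assumptions made for this example, not recommended values.

```python
from collections import Counter

# Hypothetical aligned sequences ("-" marks a gap); all must have the same length.
alignment = {
    "seq1": "ATGCCGTA-A",
    "seq2": "ATGCCGTACA",
    "seq3": "ATGACGTTCA",
    "seq4": "ATGTCG-GCA",
}


def column_report(aln):
    """For each column: (index, share of the most frequent residue, number of gaps)."""
    seqs = list(aln.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned (same length)")
    report = []
    for i in range(length):
        column = [s[i] for s in seqs]
        residues = [c for c in column if c != "-"]
        gaps = len(column) - len(residues)
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        report.append((i, top / len(column), gaps))
    return report


def sites_to_keep(aln, min_conservation=0.75, max_gaps=0):
    """Indices of columns that are conserved enough and contain few enough gaps."""
    return [
        i
        for i, conservation, gaps in column_report(aln)
        if conservation >= min_conservation and gaps <= max_gaps
    ]


for index, conservation, gaps in column_report(alignment):
    print(index, conservation, gaps)
print(sites_to_keep(alignment))
```

A report like this does not replace visual inspection of the alignment; it only points to columns that deserve a closer look.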

What do we need to construct a molecular phylogeny…

Need (sometimes) to correct the alignment by hand

Conserved vs. non-conserved regions (after E. Talla)

[Figure: alignment showing the conserved regions to keep]

• To build evolutionary scenarios → to find protein homologs between different species → focus on the conserved regions!

→ Changes within the sequences in the conserved regions reflect the mutations that occurred during the evolutionary divergence between the OTUs

And for phylogenomics

You do not use one gene but a set of numerous genes or full genomes obtained by high-throughput sequencing

What do we need to construct a molecular phylogeny…

• A set of homologous sequences

• A multiple sequence alignment of « very good » quality (be careful with highly variable sites)

• You need an outgroup to root the phylogenetic tree and to position the internal nodes correctly relative to each other; the role of the root is to provide the time direction (from ancestors to descendants)

The phylogenetic methods

• Based on distances

  - alignment → matrix of distances → phylogenetic tree

• Based on characters

  - alignment → analysis of the changes for the informative characters → phylogenetic tree

Methods based on distances

• To construct the matrix of distances from the set of sequences: the observed distance is corrected into an evolutionary distance using a substitution model

  - Jukes-Cantor model

  - Tajima-Nei model

  - Kimura model
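As a sketch of the step from observed distance to evolutionary distance, the code below computes the proportion of differing sites p between two aligned sequences, then applies the Jukes-Cantor correction d = -3/4 ln(1 - 4p/3). The sequences and function names are hypothetical. Skipping gapped positions is one possible convention among several.

```python
import math
from itertools import combinations

# Hypothetical aligned nucleotide sequences.
alignment = {
    "A": "ATGCCGTACA",
    "B": "ATGCCGTACT",
    "C": "ATGACGTTCA",
    "D": "TTGACGGTCA",
}


def p_distance(s1, s2):
    """Observed distance: proportion of differing sites, ignoring gapped positions."""
    pairs = [(a, b) for a, b in zip(s1, s2) if a != "-" and b != "-"]
    differences = sum(1 for a, b in pairs if a != b)
    return differences / len(pairs)


def jukes_cantor(p):
    """Evolutionary distance under Jukes-Cantor: d = -3/4 * ln(1 - 4p/3)."""
    if p >= 0.75:
        return float("inf")  # saturation: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(aln):
    """Corrected distance for every pair of sequences."""
    return {
        (a, b): jukes_cantor(p_distance(aln[a], aln[b]))
        for a, b in combinations(aln, 2)
    }


for pair, d in distance_matrix(alignment).items():
    print(pair, round(d, 4))
```

The corrected distance is always at least as large as the observed one, because the model accounts for multiple substitutions at the same site that the raw comparison cannot see.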

• To construct the phylogenetic tree from the distance matrix established previously

  - UPGMA (assumes that the rate of evolution is constant)

  - Neighbor-Joining

Method based on parsimony

• Principle:

  → For a group of taxa, the most plausible phylogeny is the one that requires the smallest number of evolutionary changes

  → The shortest way

  → Looking for the shortest tree

• The methods to obtain the tree:

  → Exhaustive search: very slow, so used only for a small set of sequences

  → « Branch and bound » and heuristic searches
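The search for the shortest tree can be sketched for four taxa, where an exhaustive search only has to compare three topologies. The example below counts the minimum number of changes per site with Fitch's algorithm, which is one standard way to do this count; the slides do not say which algorithm is used. The alignment and all names are hypothetical.

```python
def fitch(node, states):
    """Fitch's algorithm for one site: return (possible ancestral states, changes)."""
    if isinstance(node, str):  # a leaf: its observed state, no change yet
        return {states[node]}, 0
    left, right = node
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: at least one change is needed at this node.
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree, aln):
    """Total number of changes required by a tree over all sites."""
    n_sites = len(next(iter(aln.values())))
    total = 0
    for i in range(n_sites):
        states = {name: seq[i] for name, seq in aln.items()}
        total += fitch(tree, states)[1]
    return total


# Hypothetical alignment of four taxa.
alignment = {
    "A": "AAGCT",
    "B": "AAGCA",
    "C": "GTGCA",
    "D": "GTGCT",
}

# Exhaustive search: the three possible groupings of four taxa.
candidates = {
    "((A,B),(C,D))": (("A", "B"), ("C", "D")),
    "((A,C),(B,D))": (("A", "C"), ("B", "D")),
    "((A,D),(B,C))": (("A", "D"), ("B", "C")),
}

lengths = {name: tree_length(tree, alignment) for name, tree in candidates.items()}
best = min(lengths, key=lengths.get)
print(lengths)
print(best)
```

With more taxa the number of candidate trees grows very quickly, which is why branch and bound or heuristic searches replace the exhaustive comparison.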

Method based on probability = maximum likelihood

• Principle

  → The likelihood is the probability of observing

  → the data we have, D = molecular sequences

  → given the evolutionary hypothesis H = phylogenetic tree

  → using the evolutionary model M

  → D and M are known: we look for the H with the highest likelihood

• The method is highly robust but time-consuming…
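A toy version of this search can be written for four taxa. In the sketch below, D is a hypothetical alignment, M is the Jukes-Cantor model, and the candidate hypotheses H are three tree shapes. The likelihood is computed with Felsenstein's pruning algorithm. Giving every branch the same fixed length `t` is a strong simplifying assumption made only to keep the example short; real programs also optimize the branch lengths.

```python
import math

BASES = "ACGT"


def jc_prob(a, b, t):
    """Jukes-Cantor probability of going from base a to base b along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e


def partial(node, states, t):
    """Conditional likelihoods: P(data below node | node has base x), for each x."""
    if isinstance(node, str):  # a leaf: the observed base is certain
        return {x: 1.0 if x == states[node] else 0.0 for x in BASES}
    result = {x: 1.0 for x in BASES}
    for child in node:
        child_like = partial(child, states, t)
        for x in BASES:
            result[x] *= sum(jc_prob(x, y, t) * child_like[y] for y in BASES)
    return result


def log_likelihood(tree, aln, t=0.1):
    """log P(D | H, M): sites are treated as independent, so their logs add up."""
    n_sites = len(next(iter(aln.values())))
    total = 0.0
    for i in range(n_sites):
        states = {name: seq[i] for name, seq in aln.items()}
        root = partial(tree, states, t)
        total += math.log(sum(0.25 * root[x] for x in BASES))
    return total


# D: hypothetical alignment (A, C, G, T only; no gaps).
alignment = {
    "A": "AAGCT",
    "B": "AAGCA",
    "C": "GTGCA",
    "D": "GTGCT",
}

# Candidate hypotheses H.
candidates = {
    "((A,B),(C,D))": (("A", "B"), ("C", "D")),
    "((A,C),(B,D))": (("A", "C"), ("B", "D")),
    "((A,D),(B,C))": (("A", "D"), ("B", "C")),
}

scores = {name: log_likelihood(tree, alignment) for name, tree in candidates.items()}
best = max(scores, key=scores.get)
print(scores)
print(best)
```

Even in this reduced form, every candidate tree requires a full pass over all sites and all possible ancestral bases, which gives a feeling for why the method is time-consuming.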

Is the resulting tree robust?

• To test the robustness: bootstrap

• Provides the reliability of each branch using statistics. NECESSARY! (How can we trust the tree obtained?)

• Principle:

  → Random sampling from the data (sites along the alignment) with replacement: you obtain a virtual data set

  → For each sample, construction of one tree

  → 1000 random samples (1000 bootstraps)

  → For each branch: number of trees among the 1000 that display this specific branch
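The resampling principle can be sketched as follows. To stay short, the example does not build a full tree for each sample: as a stand-in, it only records which pair of sequences is closest, that is, the first grouping a distance method would make. This stand-in, the alignment and all names are assumptions made for illustration.

```python
import random
from itertools import combinations

# Hypothetical aligned sequences.
alignment = {
    "A": "ATGCCGTACA",
    "B": "ATGCCGTACT",
    "C": "ATGACGTTCA",
    "D": "TTGACGGTCA",
}


def p_distance(s1, s2):
    """Proportion of differing sites between two aligned sequences."""
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def closest_pair(aln):
    """Stand-in for tree building: the pair of sequences with the smallest distance.

    Ties are resolved in favour of the first pair in alphabetical order.
    """
    return min(
        combinations(sorted(aln), 2),
        key=lambda pair: p_distance(aln[pair[0]], aln[pair[1]]),
    )


def bootstrap_support(aln, n_replicates=1000, seed=0):
    """Percentage of resampled data sets in which each pair is found as closest."""
    rng = random.Random(seed)
    n_sites = len(next(iter(aln.values())))
    counts = {}
    for _ in range(n_replicates):
        # Draw as many columns as in the original alignment, with replacement.
        columns = [rng.randrange(n_sites) for _ in range(n_sites)]
        pseudo = {name: "".join(seq[i] for i in columns) for name, seq in aln.items()}
        pair = closest_pair(pseudo)
        counts[pair] = counts.get(pair, 0) + 1
    return {pair: 100.0 * n / n_replicates for pair, n in counts.items()}


print(bootstrap_support(alignment))
```

In a real analysis, each resampled data set is used to build a complete tree with the chosen method, and the support is counted for every branch of the original tree.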

Objectives of phylogeny

• To establish the Tree of Life / taxonomy

• To study biodiversity, e.g. to determine the evolutionary geographical distribution of species → phylogeography

Figure 2. Chronogram and Major Biogeographic Events of the Poison Frogs Divided by Major Regions and Inferred from the DEC Analysis

Santos JC, Coloma LA, Summers K, Caldwell JP, Ree R, et al. (2009) Amazonian Amphibian Diversity Is Primarily Derived from Late Miocene Andean Lineages. PLoS Biol 7(3): e1000056. doi:10.1371/journal.pbio.1000056 http://journals.plos.org/plosbiology/article?id=info:doi/10.1371/journal.pbio.1000056

A nice example if you are interested in…

Objectives of phylogeny

• To establish the Tree of Life / taxonomy

• To study biodiversity, e.g. to determine the evolutionary geographical distribution of species → phylogeography

• To understand evolutionary dynamics by defining the molecular mechanisms associated with evolution, e.g. to determine gene functions. Principle: orthologous genes usually have the same function! A major feature within multigene families.

Opaque2 and the evolution of monocots / eudicots

• Or O2

• Transcription factor with a bZIP domain, first characterized in maize

• A role in the biosynthesis of the endosperm proteins

• As well as in carbohydrate metabolism and nitrate assimilation in developing grains

→ It is a domestication gene associated with grain quality

Opaque2 and the evolution of monocots / eudicots

3 homologs, EMGO1, EMGO2 and EMGO3; sequences retrieved from a large sample of angiosperms.

Try to identify the events that have shaped the evolution of the Opaque2 gene family in angiosperms.

There are limits

• There is no universally accepted algorithm (= phylogenetic reconstruction method)

• All the hypotheses concerning the evolutionary dynamics of the sequences (mutation model) are simplifications

• From the same set of sequences, different algorithms can lead to different trees

• You can never be sure of obtaining the real phylogenetic tree: it is still a model!

There are limits

• A gene tree is not a species tree!

• The sequences do not have the same evolutionary dynamics as the whole species (mutation rate, selection pressure)

• Genes evolve at different rates (when comparing one gene to another)

• Horizontal transfers / recombination / hybridization…

• Need to combine phylogenetic information from several gene phylogenies

Maybe phylogenomics is the answer…

http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1002411