Molecular Phylogeny
T.D. Master 2 Recherche SdV UE GAP
Karine ALIX [email protected]

Phylogeny
Main purpose: to model the evolutionary history of living organisms
The word phylogenesis (from Greek phylon « race, tribe » and genesis « origin ») was defined by Ernst Haeckel (Germany) in 1866 as follows:
« During its rapid evolution, an individual repeats the most important changes in form evolved by its ancestors during their long and slow paleontological development » = Ontogeny recapitulates phylogeny
In this view, studying the development of the embryo lets you identify the successive ancestral steps that led to the current species

Phylogenetic tree
To estimate the relationships between species over the course of evolution
First trees: Charles Darwin
• notebook, 1837 (Wikimedia Commons)
• On the Origin of Species, 1859: the only drawing in the whole book!
From Baum 2008

Same principles as family tree
Philippe Claire
Marc Estelle Corinne Tom
John Vincent Anaïs
Anaïs is closer to her brother Vincent than to her cousin John.
Search for the most recent common ancestor… and compare!
• Anaïs and Vincent: the closest common ancestor is the mother
• Anaïs and John: the closest common ancestor is the grandmother (mother's mother)

But in phylogeny:
The objective is to reconstruct the tree:
• without complete data on the ancestors
• without knowing which changes occurred during the evolutionary history
Hence the need to develop appropriate methods for phylogenetic reconstruction
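When the ancestors are known, as in the family tree, the comparison is mechanical. The sketch below is only an illustration: the `MOTHER` mapping is an assumption (the slides do not say which parent links each child to Claire), and the function names are hypothetical. In phylogeny this table is exactly what is missing, so the ancestors have to be inferred.

```python
# Hypothetical mother links: an assumption made for this illustration only.
MOTHER = {
    "Anaïs": "Corinne",
    "Vincent": "Corinne",
    "John": "Estelle",
    "Corinne": "Claire",
    "Estelle": "Claire",
}

def maternal_line(person):
    """Return the known ancestors of a person, nearest first."""
    line = []
    while person in MOTHER:
        person = MOTHER[person]
        line.append(person)
    return line

def closest_common_ancestor(a, b):
    """Return (ancestor, generations back from a), or None if unrelated."""
    ancestors_of_b = set(maternal_line(b))
    for depth, ancestor in enumerate(maternal_line(a), start=1):
        if ancestor in ancestors_of_b:
            return ancestor, depth
    return None

print(closest_common_ancestor("Anaïs", "Vincent"))  # ('Corinne', 1)
print(closest_common_ancestor("Anaïs", "John"))     # ('Claire', 2)
```

The smaller the number of generations back to the common ancestor, the closer the two individuals.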
Phylogenetic tree
A diagram of the phylogeny (phylogenesis) of a group of taxa:
The external nodes (leaves) represent the OTUs = operational taxonomic units (1.)
The branches define the relationships between the taxa in terms of descent (4.); the length of a branch estimates the evolutionary distance (3.)
The internal nodes represent hypothetical ancestors (5.), and a clade groups all the OTUs descended from one common ancestor (2.)
The root (the most recent ancestor common to all the taxa under study) gives the direction of the relationships, from ancestor to descendants (6.)
[Figure: phylogenetic tree with its elements labelled 1. to 6.]

The phylogeny of angiosperms as an example
[Figure: angiosperm tree starting from a common ancestor, with groups labelled: rice, maize, wheat,… / Medicago, pea… / Arabidopsis, kale, rapeseed… / tomato, tobacco…]
https://www.mobot.org/MOBOT/research/APweb/

Phylogenetic tree
Several drawings but the same topology
We have to look at how the branches are attached along the tree = the location of the internal nodes
These indicate the position of the hypothetical common ancestors and so characterize the relationships along the lineages
same topology = same biological meaning!!
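One way to check that two drawings share a topology is to compare their clades. The sketch below assumes rooted trees written as nested tuples with OTU names as strings; this representation and the function names are choices made for the illustration, not a standard format.

```python
def leaf_set(tree):
    """Leaves (OTUs) below a node; a leaf is a plain string."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def clades(tree):
    """Set of clades, one per internal node, each a frozenset of leaves."""
    if isinstance(tree, str):
        return set()
    found = {leaf_set(tree)}
    for child in tree:
        found |= clades(child)
    return found

def same_topology(tree_a, tree_b):
    """Two rooted trees have the same topology if they define the same clades."""
    return clades(tree_a) == clades(tree_b)

drawing_1 = (("A", "B"), ("C", "D"))
drawing_2 = (("D", "C"), ("B", "A"))  # branches rotated around the nodes
drawing_3 = (("A", "C"), ("B", "D"))  # internal nodes placed differently

print(same_topology(drawing_1, drawing_2))  # True
print(same_topology(drawing_1, drawing_3))  # False
```

Rotating branches around a node changes the picture but not the set of clades, which is why the first two drawings carry the same biological meaning.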
What do we need to construct a molecular phylogeny…
A set of nucleotide sequences (AATTCG) or protein sequences (LKEVM)
That are « homologous »
With high sequence similarity
But there are different types of molecular homology, linked to the evolution of the species and of the genomes…
Molecular homology
Orthology = homology by speciation
Paralogy = homology by duplication
Evolution can lead to highly complex situations…
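The two definitions can be read off a gene tree whose internal nodes are annotated with the event that created them. The sketch below assumes such annotations are already available, which in practice requires a separate inference step; the tree encoding `(event, left, right)` and the gene names are hypothetical.

```python
def genes_below(node):
    """Set of gene names below a node; a leaf is a plain string."""
    if isinstance(node, str):
        return {node}
    _event, left, right = node
    return genes_below(left) | genes_below(right)

def relation(tree, gene_a, gene_b):
    """Classify two distinct genes of the tree by the event at their
    most recent common ancestor."""
    event, left, right = tree
    for child in (left, right):
        if {gene_a, gene_b} <= genes_below(child):
            return relation(child, gene_a, gene_b)
    return "orthologs" if event == "speciation" else "paralogs"

# Toy gene tree: one ancestral duplication, then a speciation in each copy.
gene_tree = (
    "duplication",
    ("speciation", "maize_copy1", "rice_copy1"),
    ("speciation", "maize_copy2", "rice_copy2"),
)

print(relation(gene_tree, "maize_copy1", "rice_copy1"))   # orthologs
print(relation(gene_tree, "maize_copy1", "maize_copy2"))  # paralogs
print(relation(gene_tree, "maize_copy1", "rice_copy2"))   # paralogs
```

The last case shows one of the complex situations: two genes from different species can still be paralogs when the duplication predates the speciation.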
What do we need to construct a molecular phylogeny…
A set of homologous sequences
A multiple sequence alignment of « very good » quality: be careful with uncertain or highly variable sites…
Example software: Clustal Omega http://www.ebi.ac.uk/Tools/msa/clustalo/
Need (sometimes) to correct the alignment by hand
Conserved vs. non-conserved regions (after E. Talla)
Conserved regions are the ones to keep
To build evolutionary scenarios and to find protein homologs between different species: focus on the conserved regions!
Changes within the sequences of a conserved region reflect the mutations that occurred during the evolutionary divergence between the OTUs

And for phylogenomics
You do not use one gene but a large set of genes, or full genomes, obtained by high-throughput sequencing
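The advice to focus on conserved regions can be made concrete by scoring each column of the alignment. The sketch below is a minimal illustration, not the procedure of any particular tool: the function names and both thresholds (`min_identity`, `max_gap_fraction`) are arbitrary assumptions.

```python
from collections import Counter

def conserved_columns(alignment, min_identity=0.7, max_gap_fraction=0.0):
    """Return the indices of the columns that pass both thresholds."""
    kept = []
    for index, column in enumerate(zip(*alignment)):
        residues = [r for r in column if r != "-"]
        gap_fraction = 1 - len(residues) / len(column)
        if not residues or gap_fraction > max_gap_fraction:
            continue  # uncertain site: too many gaps
        most_common_count = Counter(residues).most_common(1)[0][1]
        if most_common_count / len(residues) >= min_identity:
            kept.append(index)
    return kept

def keep_columns(alignment, indices):
    """Rebuild each sequence from the selected columns only."""
    return ["".join(seq[i] for i in indices) for seq in alignment]

toy_alignment = ["MKLV-A", "MKLVTA", "MRLI-G", "MKLVSC"]
kept = conserved_columns(toy_alignment)
print(kept)                               # [0, 1, 2, 3]
print(keep_columns(toy_alignment, kept))  # ['MKLV', 'MKLV', 'MRLI', 'MKLV']
```

Column 4 is dropped because of its gaps and column 5 because it is too variable; the remaining columns still contain changes (K/R, V/I) that carry the phylogenetic signal.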
What do we need to construct a molecular phylogeny…
A set of homologous sequences
A multiple sequence alignment of « very good » quality (be careful with highly variable sites)
An outgroup, to root the phylogenetic tree and to define clades that are clearly separated from each other / the role of the root is to give the direction of time (from ancestry…)
The phylogenetic methods
Based on distances
alignment → matrix of distances → phylogenetic tree
Based on characters
alignment → analysis of the changes for the informative characters → phylogenetic tree
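The character-based route can be sketched with four sequences, for which only three unrooted topologies exist. The example below counts, site by site, the minimum number of changes each topology requires and keeps the smallest total, which is the parsimony criterion. Fitch's algorithm is used for the counting; the alignment is made up for the illustration and the function names are hypothetical.

```python
def fitch(tree, states):
    """Fitch's algorithm for one site: return (state set, number of changes)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_changes = fitch(tree[0], states)
    right_set, right_changes = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_changes + right_changes
    # No state in common: one change is needed at this node.
    return left_set | right_set, left_changes + right_changes + 1

def tree_length(tree, alignment):
    """Total number of changes over all sites of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for site in range(n_sites):
        states = {name: seq[site] for name, seq in alignment.items()}
        total += fitch(tree, states)[1]
    return total

# Toy alignment (invented data).
alignment = {"A": "ACGTA", "B": "ACGTT", "C": "GCATA", "D": "GCAAT"}

# The three possible groupings of four taxa, rooted on the central branch.
candidates = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]
lengths = {tree: tree_length(tree, alignment) for tree in candidates}
best = min(lengths, key=lengths.get)
print(lengths[candidates[0]], lengths[candidates[1]], lengths[candidates[2]])  # 5 6 7
print(best)  # (('A', 'B'), ('C', 'D'))
```

Sites 2 and 4 cost the same on every topology, so they do not help to choose between trees; only sites 1, 3 and 5 are informative here.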
Methods based on distances
To construct the matrix of distances from the set of sequences: observed distance / evolutionary distance (a substitution model corrects the observed distance for multiple changes at the same site)
Jukes-Cantor model
Tajima-Nei model
Kimura model
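The difference between observed and evolutionary distance can be illustrated on a pair of aligned nucleotide sequences. The sketch below implements the observed proportion of differences, the Jukes-Cantor correction d = -3/4 ln(1 - 4p/3), and Kimura's two-parameter correction d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)); it assumes that "Kimura model" refers to the two-parameter version. The sequences are invented and the Tajima-Nei model is not shown.

```python
import math

PURINES = {"A", "G"}

def site_pairs(seq_a, seq_b):
    """Aligned positions where neither sequence has a gap."""
    return [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]

def p_distance(seq_a, seq_b):
    """Observed distance: proportion of sites that differ."""
    pairs = site_pairs(seq_a, seq_b)
    return sum(a != b for a, b in pairs) / len(pairs)

def jukes_cantor(seq_a, seq_b):
    """Jukes-Cantor distance (only defined for p < 0.75)."""
    p = p_distance(seq_a, seq_b)
    return -0.75 * math.log(1 - 4 * p / 3)

def kimura_2p(seq_a, seq_b):
    """Kimura two-parameter distance: transitions and transversions apart."""
    pairs = site_pairs(seq_a, seq_b)
    transitions = sum(
        a != b and (a in PURINES) == (b in PURINES) for a, b in pairs
    )
    transversions = sum((a in PURINES) != (b in PURINES) for a, b in pairs)
    P = transitions / len(pairs)
    Q = transversions / len(pairs)
    return -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))

seq_1 = "ACGTACGTAC"
seq_2 = "GCGTACGTAA"  # one transition (A>G) and one transversion (C>A)
print(round(p_distance(seq_1, seq_2), 4))    # 0.2
print(round(jukes_cantor(seq_1, seq_2), 4))  # 0.2326
print(round(kimura_2p(seq_1, seq_2), 4))     # 0.2341
```

Both corrected distances are larger than the observed one, because some changes are hidden by later changes at the same site.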
To construct the phylogenetic tree from the matrix of distances established previously
UPGMA / assumes that the rate of evolution is constant
Neighbor-Joining

Method based on parsimony
Principle: for a group of taxa, the most plausible phylogeny is the one that requires the smallest number of evolutionary changes
The shortest way: looking for the shortest tree
The methods to obtain the tree:
• exhaustive search / it takes very long… used for a small set of sequences
• « branch and bound » and heuristic searches

Method based on probability = maximum likelihood
Principle: the likelihood is the probability of observing
• the data we have, D = molecular sequences
• given the evolutionary hypothesis H = phylogenetic tree
• using the evolutionary model M
that is, L = P(D | H, M)
D and M are known / we look for the H with the highest likelihood
The method is highly robust but time-consuming…
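The likelihood principle can be tried on the simplest possible case: two sequences, where the hypothesis H reduces to a single branch length. The sketch below assumes the Jukes-Cantor model as M and summarizes the data D as a number of sites and a number of differences (both invented). A real maximum likelihood analysis also searches over tree shapes, which is why it is so much slower.

```python
import math

def log_likelihood(distance, n_sites, n_differences):
    """Log-likelihood of a pairwise alignment under Jukes-Cantor."""
    decay = math.exp(-4 * distance / 3)
    p_same = 0.25 * (0.25 + 0.75 * decay)  # one site with identical bases
    p_diff = 0.25 * (0.25 - 0.25 * decay)  # one site with a given mismatch
    return ((n_sites - n_differences) * math.log(p_same)
            + n_differences * math.log(p_diff))

# D: 10 aligned sites, 2 of which differ. H: candidate distances on a grid.
n_sites, n_differences = 10, 2
candidates = [i / 1000 for i in range(1, 2001)]
best = max(candidates, key=lambda d: log_likelihood(d, n_sites, n_differences))

# For this model the optimum is also known in closed form.
analytic = -0.75 * math.log(1 - 4 * (n_differences / n_sites) / 3)
print(best, round(analytic, 4))
```

The grid search lands next to the Jukes-Cantor distance formula, as expected: for two sequences, that formula is the maximum likelihood estimate under this model.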
Is the resulting tree robust?
To test the robustness: bootstrap
Provides a statistical estimate of the reliability of each branch // NECESSARY! (how can we trust the tree obtained?)
Principle:
• random sampling from the data (sites along the alignment) with replacement / you obtain a virtual data set
• for each sample, construction of one tree
• 1000 random samples (1000 bootstraps)
• for each branch: number of trees among the 1000 that display this specific branch
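The bootstrap procedure can be sketched on four sequences. The example below resamples alignment columns with replacement, builds one tree per replicate, and counts how often each grouping appears. The tree builder is a toy stand-in for a real method: for four taxa it keeps the pairing with the smallest summed within-pair distance. The alignment is invented and all function names are hypothetical.

```python
import random
from collections import Counter

def p_distance(seq_a, seq_b):
    """Proportion of sites that differ between two aligned sequences."""
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)

def best_quartet_split(alignment):
    """Toy tree builder for four taxa, based on pairwise distances."""
    a, b, c, d = sorted(alignment)
    pairings = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]

    def cost(pairing):
        return sum(p_distance(alignment[x], alignment[y]) for x, y in pairing)

    return min(pairings, key=cost)  # ties go to the first pairing

def bootstrap(alignment, n_replicates=1000, seed=0):
    """Count how many resampled data sets support each grouping."""
    rng = random.Random(seed)
    n_sites = len(next(iter(alignment.values())))
    support = Counter()
    for _ in range(n_replicates):
        # Draw as many columns as the alignment has, with replacement.
        columns = [rng.randrange(n_sites) for _ in range(n_sites)]
        replicate = {
            name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()
        }
        support[best_quartet_split(replicate)] += 1
    return support

# Invented data: 6 sites group A with B, 2 sites group A with C, 12 are constant.
alignment = {
    "A": "AAAAAA" + "CC" + "G" * 12,
    "B": "AAAAAA" + "TT" + "G" * 12,
    "C": "TTTTTT" + "CC" + "G" * 12,
    "D": "TTTTTT" + "TT" + "G" * 12,
}
support = bootstrap(alignment)
for split, count in support.most_common():
    print(split, count / 10, "%")
```

The percentage printed for a grouping is its bootstrap value: the share of the 1000 virtual data sets in which the branch separating that grouping was recovered.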
Objectives of phylogeny
To establish the Tree of Life / taxonomy
To study biodiversity, e.g. to determine how the geographical distribution of species evolved (phylogeography)
A nice example if you are interested in…
Figure 2. Chronogram and Major Biogeographic Events of the Poison Frogs Divided by Major Regions and Inferred from the DEC Analysis
Santos JC, Coloma LA, Summers K, Caldwell JP, Ree R, et al. (2009) Amazonian Amphibian Diversity Is Primarily Derived from Late Miocene Andean Lineages. PLoS Biol 7(3): e1000056. doi:10.1371/journal.pbio.1000056 http://journals.plos.org/plosbiology/article?id=info:doi/10.1371/journal.pbio.1000056
To understand evolutionary dynamics by defining the molecular mechanisms associated with evolution, e.g. to determine gene functions
Principle: orthologous genes are expected to have the same function! Major feature within multigene families.

Opaque2 and the evolution of monocots / eudicots
Or O2
Transcription factor with a bZIP domain, first characterized in maize
A role in the biosynthesis of the endosperm proteins
As well as in carbohydrate metabolism and nitrate assimilation in developing grains
It is a domestication gene associated with grain quality
3 homologs: EMGO1, EMGO2, EMGO3; sequences retrieved from a large sample of angiosperms.
Try to identify the events that have shaped the evolution of the Opaque2 gene family in angiosperms.
There are limits
There is no universally accepted algorithm (= phylogenetic reconstruction method)
All the hypotheses concerning the evolutionary dynamics of the sequences (mutation model) are simplifications
From the same set of sequences, different algorithms can lead to different trees
You can never be sure of obtaining the real phylogenetic tree: it is still a model!
A gene tree is not a species tree!
The sequences do not have the same evolutionary dynamics as the whole species (mutation rate, selection pressure)
Genes evolve at different rates (when comparing one gene to another)
Horizontal transfers / recombination / hybridization…
Need to combine the phylogenetic information from several gene phylogenies
Maybe phylogenomics is the answer… http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1002411