Molecular Phylogeny
T.D. (tutorial session), Master 2 Recherche SdV, UE GAP
Karine ALIX

Phylogeny
Main purpose: to model the evolutionary history of living organisms.
The word phylogenesis (from the Greek phylon, « race, tribe », and genesis, « origin ») was defined by Ernst Haeckel (Germany) in 1866 as follows: « During its rapid evolution, an individual repeats the most important changes in form evolved by its ancestors during their long and slow paleontological development », i.e. ontogeny recapitulates phylogeny. In this view, studying embryonic development should reveal the successive ancestral steps that led to the current species.

Phylogenetic tree
Aim: to estimate the relationships between species through evolution.
The first trees: Charles Darwin's notebook, 1837 (Wikimedia Commons), and On the Origin of Species, 1859, where the tree is the only drawing in the whole book (from Baum 2008).

Same principles as a family tree
(Figure: family tree of Philippe, Claire, Marc, Estelle, Corinne, Tom, John, Vincent and Anaïs.)
Anaïs is closer to her brother Vincent than to her cousin John. To show it, search for the most recent common ancestor of each pair and compare: the closest common ancestor of Anaïs and Vincent is their mother, whereas the closest common ancestor of Anaïs and John is their grandmother (the mother's mother).

But in phylogeny, the objective is to reconstruct the tree
• having no complete data concerning the ancestors,
• having no idea of the changes that occurred during the evolutionary history.
Hence the need to develop appropriate methods for phylogenetic reconstruction.

Phylogenetic tree
A phylogenetic tree is a scheme of the phylogeny (phylogenesis) of a group of taxa:
• The external nodes (leaves) represent the OTUs, operational taxonomic units (1.).
• The branches define the relationships between the taxa in terms of descent (4.); branch lengths estimate evolutionary distances (3.).
• The internal nodes represent hypothetical ancestors (5.), and a clade is a group of OTUs sharing a common ancestor (2.).
• The root (the most recent ancestor common to all the taxa under study) gives the direction of the relationships, from ancestors to descendants (6.).
(Figure: example tree with the elements 1. to 6. labelled.)

The phylogeny of angiosperms as an example
(Figure: angiosperm tree from a common ancestor to rice, maize, wheat…; Medicago, pea…; Arabidopsis, kale, rapeseed…; tomato, tobacco…. Source: https://www.mobot.org/MOBOT/research/APweb/)

Phylogenetic tree
Several drawings, but the same topology. We have to look at the way the branches are inserted along the tree, i.e. the location of the internal nodes. These indicate the positions of the hypothetical common ancestors and thus characterize the relationships along the lineages. Same topology = same biological meaning!

What do we need to construct a molecular phylogeny…
• A set of nucleic acid sequences (AATTCG) or protein sequences (LKEVM)
• that are « homologous »,
• with high sequence similarity.
But there are different types of molecular homology, linked to the evolution of species and genomes…

Molecular homology
• Orthology = homology by speciation
• Paralogy = homology by duplication
Evolution can lead to highly complex situations…

What do we need to construct a molecular phylogeny…
• A set of homologous sequences
• A multiple sequence alignment of « very good » quality: be careful with uncertain or highly variable sites. Example software: Clustal Omega, http://www.ebi.ac.uk/Tools/msa/clustalo/
The alignment (sometimes) needs to be corrected by hand. (Figure adapted from E. Talla: conserved regions to keep.)

Conserved vs. non-conserved regions
To build an evolutionary scenario, or to find protein homologs between different species, focus on the conserved regions! Changes within the sequences of the conserved regions reflect the mutations that occurred during the evolutionary divergence of the OTUs.

And for phylogenomics
You do not use one gene but a set of numerous genes, or full genomes, obtained by high-throughput sequencing.

What do we need to construct a molecular phylogeny…
• A set of homologous sequences
• A multiple sequence alignment of « very good » quality (be careful with highly variable sites)
• An outgroup, to root the tree of the ingroup and allow the definition of clades that are well separated from each other. The root provides the time direction (from ancestors to descendants).

The phylogenetic methods
• Based on distances: alignment → matrix of distances → phylogenetic tree
• Based on characters: alignment → analysis of the changes at the informative characters → phylogenetic tree

Methods based on distances
• Construct the distance matrix from the set of sequences: the observed distance is converted into an evolutionary distance with a substitution model, e.g. the Jukes-Cantor, Tajima-Nei or Kimura model.
• Construct the phylogenetic tree from this distance matrix: UPGMA (assumes a constant rate of evolution, i.e. a molecular clock) or Neighbor-Joining.

Method based on parsimony
Principle: for a group of taxa, the preferred phylogeny is the one that requires the smallest number of evolutionary changes: the shortest path, so we look for the shortest tree.
Methods to obtain the tree: exhaustive search (very long, only feasible for a small set of sequences), « branch and bound », and heuristic searches.

Method based on probability = maximum likelihood
Principle: the likelihood is the probability of observing the data we have, D (the molecular sequences), under an evolutionary hypothesis H (a phylogenetic tree), using an evolutionary model M: L = P(D | H, M). D and M are known; we look for the H with the highest likelihood. The method is highly robust but time-consuming…

Is the resulting tree robust?
To test robustness: the bootstrap, which provides a statistical estimate of the reliability of each branch. NECESSARY! (Otherwise, how can we trust the tree obtained?)
Principle:
• Draw a random sample of the data (sites along the alignment) with replacement, to obtain a virtual data set of the same size.
• Construct one tree from each sample.
• Repeat for 1000 random samples (1000 bootstrap replicates).
• For each branch, count the number of trees among the 1000 that display this specific branch.

Objectives of phylogeny
• To establish the Tree of Life / taxonomy
• To study biodiversity, e.g. to determine the evolutionary geographical distribution of species (phylogeography)

(Figure 2. Chronogram and Major Biogeographic Events of the Poison Frogs Divided by Major Regions and Inferred from the DEC Analysis.) Santos JC, Coloma LA, Summers K, Caldwell JP, Ree R, et al. (2009) Amazonian Amphibian Diversity Is Primarily Derived from Late Miocene Andean Lineages. PLoS Biol 7(3): e1000056. doi:10.1371/journal.pbio.1000056 http://journals.plos.org/plosbiology/article?id=info:doi/10.1371/journal.pbio.1000056
A nice example if you are interested in…

Objectives of phylogeny (continued)
• To understand evolutionary dynamics by defining the molecular mechanisms associated with evolution, e.g. to determine gene functions. Principle: orthologous genes are assumed to have the same function! A major feature within multigene families.

Opaque2 and the evolution of monocots / eudicots
Opaque2 (O2) is a transcription factor with a bZIP domain, first characterized in maize. It plays a role in the biosynthesis of endosperm proteins, as well as in carbohydrate metabolism and nitrate assimilation in developing grains. It is a domestication gene associated with grain quality.
Three homologs, EMGO1, EMGO2 and EMGO3; sequences retrieved from a large sample of angiosperms.
Try to identify the events that have shaped the evolution of the Opaque2 gene family in angiosperms.
(Figure: Opaque2 and the evolution of monocots / eudicots.)

There are limits
• There is no universally accepted algorithm (= phylogenetic reconstruction method).
• All hypotheses concerning the evolutionary dynamics of the sequences (mutation model) are simplified.
• From the same set of sequences, different algorithms can lead to different trees.
• We are never sure to obtain the true phylogenetic tree: it is still a model!

There are limits (continued)
A gene tree is not a species tree!
• Sequences do not have the same evolutionary dynamics as the whole species (mutation rate, selection pressure).
• Genes evolve at different rates (when comparing one gene to another).
• Horizontal transfers / recombination / hybridization…
Phylogenetic information therefore needs to be accumulated from several gene phylogenies. Maybe phylogenomics is the answer…
http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1002411
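Worked example: observed vs. evolutionary distance. The distance-based methods start by turning the observed proportion of differing sites p into an evolutionary distance. Under the Jukes-Cantor model this correction is d = -3/4 ln(1 - 4/3 p), which accounts for multiple substitutions at the same site. The sketch below is a minimal illustration, assuming two already-aligned DNA sequences; the function names are hypothetical, and skipping gapped columns (pairwise deletion) is a choice of this sketch, not something prescribed by the course.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Observed distance: proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped
    (pairwise deletion, an assumption of this sketch).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must come from the same alignment (same length)")
    compared = differences = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        differences += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor evolutionary distance d = -3/4 * ln(1 - 4/3 * p)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("JC distance is undefined for p >= 0.75 (saturation)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical aligned fragments, for illustration only.
p = p_distance("AATTCGGA", "AATCCGTA")
print(p, round(jukes_cantor(p), 3))  # 0.25 0.304
```

The corrected distance (0.304) is larger than the observed one (0.25): the model assumes some substitutions are hidden by later changes at the same site.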
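Worked example: UPGMA. UPGMA builds a tree from the distance matrix by repeatedly merging the two closest clusters; because it assumes a constant rate of evolution, each new node is placed at half the distance between the merged clusters, giving an ultrametric, rooted tree. The sketch below is a minimal illustration under that assumption; the function name, the Newick output and the toy matrix are hypothetical, and ties are broken by taking the first minimum found.

```python
def upgma(names, matrix):
    """Build a rooted, ultrametric tree (Newick string) with UPGMA.

    names:  list of OTU labels
    matrix: symmetric distance matrix (list of lists) in the same order as names
    """
    def key(x, y):
        return (x, y) if x < y else (y, x)

    # cluster id -> (Newick subtree, number of OTUs, height of the cluster node)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    dist = {(i, j): float(matrix[i][j])
            for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        # 1. merge the two closest clusters
        (i, j), dij = min(dist.items(), key=lambda item: item[1])
        tree_i, size_i, height_i = clusters.pop(i)
        tree_j, size_j, height_j = clusters.pop(j)
        height = dij / 2.0  # constant rate: the new node sits halfway
        subtree = f"({tree_i}:{height - height_i:g},{tree_j}:{height - height_j:g})"
        del dist[(i, j)]
        # 2. distance to the new cluster = average over all OTU pairs
        for k in clusters:
            dik, djk = dist.pop(key(i, k)), dist.pop(key(j, k))
            dist[(k, next_id)] = (size_i * dik + size_j * djk) / (size_i + size_j)
        clusters[next_id] = (subtree, size_i + size_j, height)
        next_id += 1
    (tree, _, _), = clusters.values()
    return tree + ";"


# Hypothetical ultrametric distances between four OTUs.
M = [[0, 2, 6, 10],
     [2, 0, 6, 10],
     [6, 6, 0, 10],
     [10, 10, 10, 0]]
print(upgma(["A", "B", "C", "D"], M))  # (D:5,(C:3,(A:1,B:1):2):2);
```

If the rates are not constant, UPGMA can return a wrong topology, which is one reason Neighbor-Joining is usually preferred.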
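Worked example: Neighbor-Joining. Neighbor-Joining does not assume a constant rate: at each step it joins the pair (i, j) minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r(i) is the sum of distances from i to all other nodes, then computes the branch lengths to the new node and updates the matrix. This is a minimal sketch of the standard algorithm; the function name, the Newick output and the toy matrix are illustrative assumptions, and ties are broken by the first pair found.

```python
def neighbor_joining(names, matrix):
    """Build an unrooted tree (Newick string) with the Neighbor-Joining algorithm."""
    if len(names) < 3:
        raise ValueError("Neighbor-Joining needs at least three OTUs")
    nodes = list(names)
    d = [[float(x) for x in row] for row in matrix]
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]  # total distance from each node to all others
        # 1. choose the pair (i, j) minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j)
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # 2. branch lengths from i and j to their new common node u
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        # 3. distances from u to the remaining nodes, then shrink the matrix
        keep = [k for k in range(n) if k not in (i, j)]
        du = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] + [du[pos]] for pos, a in enumerate(keep)]
        d.append(du + [0.0])
        nodes = [nodes[k] for k in keep] + [f"({nodes[i]}:{li:g},{nodes[j]}:{lj:g})"]
    # 4. three nodes left: connect them to one central node
    x, y, z = nodes
    lx = 0.5 * (d[0][1] + d[0][2] - d[1][2])
    ly = 0.5 * (d[0][1] + d[1][2] - d[0][2])
    lz = 0.5 * (d[0][2] + d[1][2] - d[0][1])
    return f"({x}:{lx:g},{y}:{ly:g},{z}:{lz:g});"


# Hypothetical additive distances between five OTUs (not clock-like).
M = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]
print(neighbor_joining(["a", "b", "c", "d", "e"], M))  # (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

Because this toy matrix is additive, Neighbor-Joining recovers exactly the tree and branch lengths that generated it, even though the leaves are not equidistant from any root.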
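Worked example: counting changes for parsimony. To compare candidate trees by parsimony, each tree needs a length: the minimum number of changes it requires. One classical way to count them on a fixed binary tree is Fitch's algorithm, used here as an illustrative choice: going from the leaves to the root, a node gets the intersection of its children's state sets if it is non-empty, otherwise their union at the cost of one change. The sketch below assumes toy trees written as nested tuples; the data and function names are hypothetical, and it only scores given topologies (no exhaustive, branch-and-bound or heuristic search).

```python
def fitch_site(tree, states):
    """Fitch small-parsimony pass for one site.

    tree:   rooted binary tree as nested 2-tuples of leaf names, e.g. (("A", "B"), ("C", "D"))
    states: dict leaf name -> character at this site
    Returns (candidate ancestral states at the root, minimum number of changes).
    """
    if isinstance(tree, str):  # leaf
        return {states[tree]}, 0
    left, right = tree
    set_l, changes_l = fitch_site(left, states)
    set_r, changes_r = fitch_site(right, states)
    common = set_l & set_r
    if common:  # children can agree: no extra change needed
        return common, changes_l + changes_r
    return set_l | set_r, changes_l + changes_r + 1  # disagreement costs one change


def tree_length(tree, alignment):
    """Parsimony score (tree length) summed over all alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[site] for name, seq in alignment.items()})[1]
        for site in range(n_sites)
    )


alignment = {"A": "ACGTA", "B": "ACGTT", "C": "GCATT", "D": "GCATA"}  # toy data
for topology in [(("A", "B"), ("C", "D")), (("A", "C"), ("B", "D")), (("A", "D"), ("B", "C"))]:
    print(topology, tree_length(topology, alignment))  # lengths 4, 6 and 5
```

The shortest tree, ((A,B),(C,D)) with 4 changes, is the one retained by parsimony for this toy alignment.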
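Worked example: maximum likelihood in the simplest case. For two aligned sequences under the Jukes-Cantor model, the only parameter of the « tree » is the distance d between them, so the likelihood principle L = P(D | H, M) can be shown in a few lines: compute the log-likelihood of n sites with k differences for a given d, then search for the d that maximises it. This is a toy sketch, assuming a two-sequence case and a golden-section search (our choice of optimiser); real maximum likelihood phylogenetics also searches over topologies and all branch lengths, which is what makes it time-consuming. Function names are hypothetical.

```python
import math


def jc_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood of two aligned sequences separated by distance d under Jukes-Cantor."""
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay   # P(same base at both ends of the branch)
    p_other = 0.25 - 0.25 * decay  # P(one specific different base)
    return (n_sites * math.log(0.25)  # equilibrium frequency of the starting base
            + (n_sites - n_diff) * math.log(p_same)
            + n_diff * math.log(p_other))


def ml_distance(n_sites, n_diff, lo=1e-8, hi=10.0, tol=1e-9):
    """Maximise the log-likelihood over d by golden-section search (assumes unimodality)."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        x1 = b - inv_phi * (b - a)
        x2 = a + inv_phi * (b - a)
        if jc_log_likelihood(x1, n_sites, n_diff) > jc_log_likelihood(x2, n_sites, n_diff):
            b = x2  # maximum lies in [a, x2]
        else:
            a = x1  # maximum lies in [x1, b]
    return (a + b) / 2.0


d_ml = ml_distance(n_sites=8, n_diff=2)
d_jc = -0.75 * math.log(1.0 - 4.0 / 3.0 * 2 / 8)
print(round(d_ml, 4), round(d_jc, 4))  # 0.3041 0.3041
```

In this special case the numerical maximum coincides with the closed-form Jukes-Cantor distance, which is a useful sanity check on the likelihood function.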
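Worked example: bootstrap support. The bootstrap procedure resamples alignment columns with replacement, rebuilds a tree from each pseudo-alignment and counts how often each branch reappears. To keep the sketch short, it assumes only four taxa and uses a deliberately simple stand-in tree builder (keep the split {x,y}|{z,w} with the smallest d(x,y) + d(z,w) on p-distances); in practice the same method as for the original tree (distance, parsimony or maximum likelihood) is rerun on each replicate. Names, data and the fixed random seed are hypothetical.

```python
import random


def p_dist(s1, s2):
    """Proportion of differing sites (no gaps assumed in this toy example)."""
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def best_split(alignment):
    """Toy tree builder for four taxa: keep the split {x,y}|{z,w} minimising d(x,y) + d(z,w)."""
    a, b, c, d = sorted(alignment)
    s = alignment
    candidates = {
        f"{a}{b}|{c}{d}": p_dist(s[a], s[b]) + p_dist(s[c], s[d]),
        f"{a}{c}|{b}{d}": p_dist(s[a], s[c]) + p_dist(s[b], s[d]),
        f"{a}{d}|{b}{c}": p_dist(s[a], s[d]) + p_dist(s[b], s[c]),
    }
    return min(candidates, key=candidates.get)  # ties go to the first candidate


def bootstrap_support(alignment, replicates=1000, seed=1):
    """Fraction of bootstrap replicates in which each split is recovered."""
    rng = random.Random(seed)
    names = list(alignment)
    n_sites = len(alignment[names[0]])
    counts = {}
    for _ in range(replicates):
        # resample alignment columns with replacement, same number of sites
        columns = [rng.randrange(n_sites) for _ in range(n_sites)]
        pseudo = {name: "".join(alignment[name][i] for i in columns) for name in names}
        split = best_split(pseudo)
        counts[split] = counts.get(split, 0) + 1
    return {split: count / replicates for split, count in counts.items()}


alignment = {  # hypothetical toy alignment
    "A": "ACGTACGTACGTAA",
    "B": "ACGTACGAACGTAT",
    "C": "GCATACGAACGCTT",
    "D": "GCATACGTACGCTA",
}
print(bootstrap_support(alignment))
```

The value reported for a split is the proportion of the 1000 replicate trees that contain it, i.e. the bootstrap support written on the corresponding branch.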