Phylogeny Reconstruction
Total Page:16
File Type:pdf, Size:1020Kb
Phylogeny reconstruction Eva Stukenbrock & Julien Dutheil [email protected] [email protected] Max Planck Institute for Terrestrial Microbiology – Marburg 19 Janvier 2011 Stukenbrock EH & Dutheil JY (MPItM) Phylogeny reconstruction 19 Janvier 2011 1 / 30 Infer the history of taxa Taxonomy Phylogeography Necessary to comparative analysis, as species are not “independent” Why studying phylogenetics? Phylogenetics The study of the evolution of (taxonomic) groups of organisms (e.g. species, populations). Stukenbrock EH & Dutheil JY (MPItM) Phylogeny reconstruction 19 Janvier 2011 2 / 30 Why studying phylogenetics? Phylogenetics The study of the evolution of (taxonomic) groups of organisms (e.g. species, populations). Infer the history of taxa Taxonomy Phylogeography Necessary to comparative analysis, as species are not “independent” Stukenbrock EH & Dutheil JY (MPItM) Phylogeny reconstruction 19 Janvier 2011 2 / 30 Some applications: Resolving orthology and paralogy Datation: estimate divergence times Reconstruct ancestral sequences Exhibit sites under positive selection Predict the structure of a molecule. Phylogenetics applications Phylogenetic tree A graph depicting the ancestor-descendant relationships between organisms or gene sequences. The sequences are the tips of the tree. Branches of the tree connect the tips to their (unobservable) ancestral sequences (Holder 2003) Stukenbrock EH & Dutheil JY (MPItM) Phylogeny reconstruction 19 Janvier 2011 3 / 30 Phylogenetics applications Phylogenetic tree A graph depicting the ancestor-descendant relationships between organisms or gene sequences. The sequences are the tips of the tree. Branches of the tree connect the tips to their (unobservable) ancestral sequences (Holder 2003) Some applications: Resolving orthology and paralogy Datation: estimate divergence times Reconstruct ancestral sequences Exhibit sites under positive selection Predict the structure of a molecule. Stukenbrock EH & Dutheil JY (MPItM) Phylogeny reconstruction 19 Janvier 2011 3 / 30 3 leaves 4 leaves A A C A 1 B C C D B B 2 D A A B B C D C D With branch lengths: With clock: A A B B (2n − 5)! C C 2n−3(n − 2)! D D distinct unrooted trees What is a tree? 2 leaves A B Stukenbrock EH & Dutheil JY (MPItM) Phylogeny reconstruction 19 Janvier 2011 4 / 30 4 leaves A C A 1 B C D B 2 D A A B B C D C D With branch lengths: With clock: A A B B (2n − 5)! C C 2n−3(n − 2)! D D distinct unrooted trees What is a tree? 2 leaves 3 leaves A A B C B Stukenbrock EH & Dutheil JY (MPItM) Phylogeny reconstruction 19 Janvier 2011 4 / 30 A 1 B C D 2 A B C D With branch lengths: With clock: A A B B (2n − 5)! C C 2n−3(n − 2)! D D distinct unrooted trees What is a tree? 2 leaves 3 leaves 4 leaves A A C A B C B B D A B C D Stukenbrock EH & Dutheil JY (MPItM) Phylogeny reconstruction 19 Janvier 2011 4 / 30 A 1 B C D 2 A B C D With branch lengths: With clock: A A B B C C D D What is a tree? 2 leaves 3 leaves 4 leaves A A C A B C B B D A B C D (2n − 5)! 2n−3(n − 2)! distinct unrooted trees Stukenbrock EH & Dutheil JY (MPItM) Phylogeny reconstruction 19 Janvier 2011 4 / 30 2 A B C D With branch lengths: With clock: A A B B C C D D What is a tree? 2 leaves 3 leaves 4 leaves A A C A 1 B A B C C D B B D A B C D (2n − 5)! 2n−3(n − 2)! distinct unrooted trees Stukenbrock EH & Dutheil JY (MPItM) Phylogeny reconstruction 19 Janvier 2011 4 / 30 With branch lengths: With clock: A A B B C C D D What is a tree? 2 leaves 3 leaves 4 leaves A A C A 1 B A B C C D B B 2 D A A B B C D C D (2n − 5)! 2n−3(n − 2)! distinct unrooted trees Stukenbrock EH & Dutheil JY (MPItM) Phylogeny reconstruction 19 Janvier 2011 4 / 30 With clock: A B C D What is a tree? 2 leaves 3 leaves 4 leaves A A C A 1 B A B C C D B B 2 D A A B B C D C D With branch lengths: A B (2n − 5)! C 2n−3(n − 2)! D distinct unrooted trees Stukenbrock EH & Dutheil JY (MPItM) Phylogeny reconstruction 19 Janvier 2011 4 / 30 What is a tree? 2 leaves 3 leaves 4 leaves A A C A 1 B A B C C D B B 2 D A A B B C D C D With branch lengths: With clock: A A B B (2n − 5)! C C 2n−3(n − 2)! D D distinct unrooted trees Stukenbrock EH & Dutheil JY (MPItM) Phylogeny reconstruction 19 Janvier 2011 4 / 30 Taxon 2 Taxon 1 Site Taxon 1 AAGACATGTGGCA Taxon 2 AGGAC-TGTGGCA Taxon 3 AGTAC-TGTGA-A Taxon 3 Taxon 4 AGCAC-TGTG--T Taxon 5 AGCACATGTGA-A Taxon 5 Taxon 4 Aligned homologous positions: sites Each site is a realization of a random variable From sequences to trees Stukenbrock EH & Dutheil JY (MPItM) Phylogeny reconstruction 19 Janvier 2011 5 / 30 Taxon 2 Taxon 1 Site Taxon 3 Taxon 5 Taxon 4 Aligned homologous positions: sites Each site is a realization of a random variable From sequences to trees Taxon 1 AAGACATGTGGCA Taxon 2 AGGAC-TGTGGCA Taxon 3 AGTAC-TGTGA-A Taxon 4 AGCAC-TGTG--T Taxon 5 AGCACATGTGA-A Stukenbrock EH & Dutheil JY (MPItM) Phylogeny reconstruction 19 Janvier 2011 5 / 30 Site Aligned homologous positions: sites Each site is a realization of a random variable From sequences to trees Taxon 2 Taxon 1 Taxon 1 AAGACATGTGGCA Taxon 2 AGGAC-TGTGGCA Taxon 3 AGTAC-TGTGA-A Taxon 3 Taxon 4 AGCAC-TGTG--T Taxon 5 AGCACATGTGA-A Taxon 5 Taxon 4 Stukenbrock EH & Dutheil JY (MPItM) Phylogeny reconstruction 19 Janvier 2011 5 / 30 From sequences to trees Taxon 2 Taxon 1 Site Taxon 1 AAGACATGTGGCA Taxon 2 AGGAC-TGTGGCA Taxon 3 AGTAC-TGTGA-A Taxon 3 Taxon 4 AGCAC-TGTG--T Taxon 5 AGCACATGTGA-A Taxon 5 Taxon 4 Aligned homologous positions: sites Each site is a realization of a random variable Stukenbrock EH & Dutheil JY (MPItM) Phylogeny reconstruction 19 Janvier 2011 5 / 30 The phenetic approach Uses an overall similarity measure (commonly used in morphology) Consider that the similarity is (inversely) correlated to the evolutionary distance Build a tree from a pairwise distance matrix: T1 T2 T3 T4 T5 T1 0 T2 2 0 T3 3.5 4.5 0 T4 8 10 8 0 T5 6 8 9 3 0 Stukenbrock EH & Dutheil JY (MPItM) Phylogeny reconstruction 19 Janvier 2011 6 / 30 (Assumes that the rate of change is constant over time: molecular clock) 2 Gather the corresponding groups 3 Recompute distances from the new group: 1.0 T1 d + d 1.0 d = i;k j;k ij;k 1.0 2 2.125 T2 2.0 T3 1.5 T4 2.625 1.5 T5 The WPGMA clustering method Weighted Pair Group Method using Average T1 T2 T3 T4 T5 1 Pick the smallest distance T1 0 T2 2 0 T3 3.5 4.5 0 T4 8 10 8 0 T5 6 8 9 3 0 Stukenbrock EH & Dutheil JY (MPItM) Phylogeny reconstruction 19 Janvier 2011 7 / 30 (Assumes that the rate of change is constant over time: molecular clock) 1 Pick the smallest distance 3 Recompute distances from the new group: di;k + dj;k 1.0 dij;k = 2 2.125 2.0 T3 1.5 T4 2.625 1.5 T5 The WPGMA clustering method Weighted Pair Group Method using Average T1 T2 T3 T4 T5 T1 0 2 Gather the corresponding T2 2 0 groups T3 3.5 4.5 0 T4 8 10 8 0 T5 6 8 9 3 0 1.0 T1 1.0 T2 Stukenbrock EH & Dutheil JY (MPItM) Phylogeny reconstruction 19 Janvier 2011 7 / 30 1 Pick the smallest distance 2 Gather the corresponding groups 1.0 2.125 2.0 T3 1.5 T4 2.625 1.5 T5 The WPGMA clustering method Weighted Pair Group Method using Average T12 T3 T4 T5 T12 0 T3 4 0 T4 9 8 0 3 Recompute distances from the T5 7 9 3 0 new group: 1.0 T1 di;k + dj;k dij;k = 2 1.0 T2 (Assumes that the rate of change is constant over time: molecular clock) Stukenbrock EH & Dutheil JY (MPItM) Phylogeny reconstruction 19 Janvier 2011 7 / 30 (Assumes that the rate of change is constant over time: molecular clock) 2 Gather the corresponding groups 3 Recompute distances from the new group: di;k + dj;k 1.0 dij;k = 2 2.125 2.0 T3 1.5 T4 2.625 1.5 T5 The WPGMA clustering method Weighted Pair Group Method using Average T12 T3 T4 T5 1 Pick the smallest distance T12 0 T3 4 0 T4 9 8 0 T5 7 9 3 0 1.0 T1 1.0 T2 Stukenbrock EH & Dutheil JY (MPItM) Phylogeny reconstruction 19 Janvier 2011 7 / 30 (Assumes that the rate of change is constant over time: molecular clock) 1 Pick the smallest distance 3 Recompute distances from the new group: di;k + dj;k 1.0 dij;k = 2 2.125 2.0 T3 2.625 The WPGMA clustering method Weighted Pair Group Method using Average T12 T3 T4 T5 T12 0 2 Gather the corresponding T3 4 0 groups T4 9 8 0 T5 7 9 3 0 1.0 T1 1.0 T2 1.5 T4 1.5 T5 Stukenbrock EH & Dutheil JY (MPItM) Phylogeny reconstruction 19 Janvier 2011 7 / 30 1 Pick the smallest distance 2 Gather the corresponding groups 1.0 2.125 (Assumes that the rate of 2.0 T3 change is constant over time: molecular clock) 2.625 The WPGMA clustering method Weighted Pair Group Method using Average T12 T3 T45 T12 0 T3 4 0 T45 8 8.5 0 3 Recompute distances from the new group: 1.0 T1 di;k + dj;k dij;k = 2 1.0 T2 1.5 T4 1.5 T5 Stukenbrock EH & Dutheil JY (MPItM) Phylogeny reconstruction 19 Janvier 2011 7 / 30 (Assumes that the rate of change is constant over time: molecular clock) 2 Gather the corresponding groups 3 Recompute distances from the new group: di;k + dj;k 1.0 dij;k = 2 2.125 2.0 T3 2.625 The WPGMA clustering method Weighted Pair Group Method using Average 1 Pick the smallest distance T12 T3 T45 T12 0 T3 4 0 T45 8 8.5 0 1.0 T1 1.0 T2 1.5 T4 1.5 T5 Stukenbrock EH & Dutheil JY (MPItM) Phylogeny reconstruction 19 Janvier 2011 7 / 30 (Assumes that the rate of change is constant over time: molecular clock) 1 Pick the smallest distance 3 Recompute distances from the new group: di;k + dj;k dij;k = 2 2.125 2.625 The WPGMA clustering method Weighted Pair Group Method using Average T12 T3 T45 2 Gather the corresponding T12 0 groups T3 4 0 T45 8 8.5 0 1.0 T1 1.0 1.0 T2 2.0 T3 1.5 T4 1.5 T5 Stukenbrock EH & Dutheil JY (MPItM) Phylogeny reconstruction 19 Janvier 2011 7 / 30 1 Pick the smallest