An Introduction to Phylogenetic Analysis
Recommended publications
-
Phylogenetics Topic 2: Phylogenetic and Genealogical Homology
Phylogenetics Topic 2: Phylogenetic and genealogical homology Phylogenies distinguish homology from similarity Previously, we examined how rooted phylogenies provide a framework for distinguishing similarity due to common ancestry (HOMOLOGY) from non-phylogenetic similarity (ANALOGY). Here we extend the concept of phylogenetic homology by making a further distinction between a HOMOLOGOUS CHARACTER and a HOMOLOGOUS CHARACTER STATE. This distinction is important to molecular evolution, as we often deal with data composed of homologous characters with non-homologous character states. The figure below shows three hypothetical protein-coding nucleotide sequences (for simplicity, only three codons long) that are related to each other according to a phylogenetic tree. In the figure the nucleotide sequences are aligned to each other; in so doing we are making the implicit assumption that the characters aligned vertically are homologous characters. In the specific case of nucleotide and amino acid alignments this assumption is called POSITIONAL HOMOLOGY. Under positional homology it is implicit that a given position, say the first position in the gene sequence, was also present as the same position in the gene sequence of the common ancestor. In the figure below it is clear that some positions do not have identical character states (see red characters in figure below). In such a case the position involved is considered to be a homologous character, while the state of that character is non-homologous wherever the sequences differ. [Figure: Phylogenetic perspective on homologous characters and homologous character states. Sequences shown include ACG TAC TAA and ACG TAT TAA. SYNAPOMORPHY: a shared derived character state in two or more lineages; these must be homologous in state.] -
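The distinction drawn in this excerpt between a homologous character (an aligned column) and its character states (the nucleotides in that column) can be illustrated with a minimal Python sketch. The first two sequences are the ones legible in the figure text; the third sequence and the names `seq1` to `seq3` are assumptions made only for illustration, since the excerpt does not preserve them.

```python
def variable_columns(alignment):
    """Return (position, states) for every aligned column whose states differ.

    Each column is treated as one homologous character (positional homology);
    a column is reported when its character states are not all identical.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    names = list(alignment)
    variable = []
    for pos in range(lengths.pop()):
        states = "".join(alignment[name][pos] for name in names)
        if len(set(states)) > 1:
            variable.append((pos, states))
    return variable


# seq1 and seq2 follow the figure text; seq3 is a hypothetical third sequence.
alignment = {
    "seq1": "ACGTACTAA",
    "seq2": "ACGTATTAA",
    "seq3": "ACGTATTAA",
}
print(variable_columns(alignment))  # [(5, 'CTT')]
```

Only column 5 (0-based) is reported: it is a single homologous character whose state differs between the sequences, whereas the other eight columns are homologous in both character and state.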
Phylogenetic Trees and Cladograms Are Graphical Representations (Models) of Evolutionary History That Can Be Tested
AP Biology Lab/Cladograms and Phylogenetic Trees Name _______________________________ Relationship to the AP Biology Curriculum Framework Big Idea 1: The process of evolution drives the diversity and unity of life. Essential knowledge 1.B.2: Phylogenetic trees and cladograms are graphical representations (models) of evolutionary history that can be tested. Learning Objectives: LO 1.17 The student is able to pose scientific questions about a group of organisms whose relatedness is described by a phylogenetic tree or cladogram in order to (1) identify shared characteristics, (2) make inferences about the evolutionary history of the group, and (3) identify character data that could extend or improve the phylogenetic tree. LO 1.18 The student is able to evaluate evidence provided by a data set in conjunction with a phylogenetic tree or a simple cladogram to determine evolutionary history and speciation. LO 1.19 The student is able to create a phylogenetic tree or simple cladogram that correctly represents evolutionary history and speciation from a provided data set. [Introduction] Cladistics is the study of evolutionary classification. Cladograms show evolutionary relationships among organisms. Comparative morphology investigates characteristics for homology and analogy to determine which organisms share a recent common ancestor. A cladogram begins by grouping organisms based on a characteristic displayed by ALL the members of the group. Subsequently, the larger group will contain increasingly smaller groups that share the traits of the groups before them. However, they also exhibit distinct changes as the new species evolve. Further, molecular evidence from genes that rarely mutate can provide molecular clocks that tell us how long ago organisms diverged, unlocking the secrets of organisms that may have similar convergent morphology but do not share a recent common ancestor. -
Phylogeny Codon Models • Last Lecture: Poor Man’S Way of Calculating Dn/Ds (Ka/Ks) • Tabulate Synonymous/Non-Synonymous Substitutions • Normalize by the Possibilities
Phylogeny Codon models • Last lecture: poor man’s way of calculating dN/dS (Ka/Ks) • Tabulate synonymous/non-synonymous substitutions • Normalize by the possibilities • Transform to genetic distance $K_{JC}$ or $K_{K2P}$ • In reality we use a codon model • Amino acid substitution rates meet nucleotide models • Codon (nucleotide triplet) Codon model parameterization Stop codons are not allowed, reducing the matrix from 64x64 to 61x61. The entire codon matrix can be parameterized using: κ (kappa), the transition/transversion ratio; ω (omega), the dN/dS ratio (optimizing this parameter gives an estimate of the selective force); $\pi_j$, the equilibrium frequency of codon j (Goldman and Yang, MBE 1994). Empirical codon substitution matrix Observations: Instantaneous rates of double nucleotide changes seem to be non-zero. There should be a mechanism for mutating 2 adjacent nucleotides at once! (Kosiol and Goldman) Phylogeny • Last lecture: inferring distance given an alignment • Phylogenetic trees: how to infer trees and distance • How do we infer trees given an alignment? • Branch length • Topology -
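The parameterization listed in this excerpt (κ, ω and the equilibrium codon frequency) can be sketched in Python as follows. This is a minimal illustration, assuming the commonly used simplified form in which the rate between two sense codons is zero unless they differ at exactly one position, and is otherwise the target codon's frequency multiplied by κ for a transition and by ω for a non-synonymous change. The function name `relative_rate` and the uniform frequency used in the example are assumptions, not taken from the slides.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
# Removing the three stop codons leaves the 61 states of the codon model.
SENSE_CODONS = [codon for codon, aa in CODON_TABLE.items() if aa != "*"]
PURINES = {"A", "G"}


def relative_rate(codon_i, codon_j, kappa, omega, pi_j):
    """Off-diagonal relative rate from codon_i to codon_j (assumed simple form)."""
    if "*" in (CODON_TABLE[codon_i], CODON_TABLE[codon_j]):
        raise ValueError("stop codons are excluded from the model")
    diffs = [(a, b) for a, b in zip(codon_i, codon_j) if a != b]
    if len(diffs) != 1:
        # Identical codons (diagonal, not defined here) or multi-nucleotide changes.
        return 0.0
    a, b = diffs[0]
    rate = pi_j
    if (a in PURINES) == (b in PURINES):  # purine<->purine or pyrimidine<->pyrimidine
        rate *= kappa                     # transition
    if CODON_TABLE[codon_i] != CODON_TABLE[codon_j]:
        rate *= omega                     # non-synonymous change
    return rate


print(len(SENSE_CODONS))  # 61
pi = 1 / len(SENSE_CODONS)  # uniform frequencies, for illustration only
print(relative_rate("TAC", "TAT", kappa=2.0, omega=0.5, pi_j=pi))  # synonymous transition
print(relative_rate("TTT", "GGG", kappa=2.0, omega=0.5, pi_j=pi))  # three changes: 0.0
```

Setting double and triple changes to zero is the assumption that the empirical observation quoted in the excerpt calls into question; an empirical codon matrix would replace those zeros with estimated rates.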
Interpreting Cladograms Notes
Interpreting Cladograms Notes INTERPRETING CLADOGRAMS BIG IDEA: PHYLOGENIES DEPICT ANCESTOR AND DESCENDANT RELATIONSHIPS AMONG ORGANISMS BASED ON HOMOLOGY. THESE EVOLUTIONARY RELATIONSHIPS ARE REPRESENTED BY DIAGRAMS CALLED CLADOGRAMS (BRANCHING DIAGRAMS THAT ORGANIZE RELATIONSHIPS). What is a Cladogram ● A diagram which shows ___________ among organisms. ● Lines branching off other lines. The lines can be traced back to where they branch off. These branching-off points represent a hypothetical ancestor. ● Is not an evolutionary tree, ___________ show how ancestors are related to descendants or how much they have changed. Parts of a Cladogram Reading Cladograms ● Read like a family tree: show ________ of shared ancestry between lineages. ● When an ancestral lineage ______: speciation is indicated due to the “arrival” of some new trait. Each lineage has unique ____ to itself alone and traits that are shared with other lineages. ● Each lineage has _______ that are unique to that lineage and ancestors that are shared with other lineages (common ancestors). Quick Question #1 ● What is a _________? ● A group that includes a common ancestor and all the descendants (living and extinct) of that ancestor. Reading Cladograms: Identifying Clades ● Using a cladogram, it is easy to tell if a group of lineages forms a clade. ● Imagine clipping a single branch off the phylogeny. ● All of the organisms on that pruned branch make up a clade. Quick Question #2 ● Looking at the image to the right: ● Is the green box a clade? ● The blue? ● The pink? ● The orange? Reading Cladograms: Clades ● Clades are nested within one another. ● They form a nested hierarchy. ● A clade may include many thousands of species or just a few. -
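The "clip a single branch" test for a clade described in these notes can be sketched in a few lines of Python. The nested-tuple tree representation, the function names and the four example taxa are assumptions chosen for illustration; they are not part of the worksheet.

```python
def collect_clades(tree, clades):
    """Record the leaf set under every branch of a nested-tuple tree.

    A leaf is a string; an internal node is a tuple of subtrees. Each leaf set
    recorded here is what would fall off if that branch were clipped.
    """
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        leaves = frozenset().union(*(collect_clades(child, clades) for child in tree))
    clades.add(leaves)
    return leaves


def is_clade(tree, group):
    """True if `group` is exactly the set of tips on some single branch."""
    clades = set()
    collect_clades(tree, clades)
    return frozenset(group) in clades


# Hypothetical example tree: ((lizard, (crocodile, bird)), mouse)
tree = (("lizard", ("crocodile", "bird")), "mouse")
print(is_clade(tree, {"crocodile", "bird"}))    # True: one clipped branch
print(is_clade(tree, {"lizard", "crocodile"}))  # False: leaves out "bird"
```

Because every clade is the leaf set of one branch, the collected sets are automatically nested within one another, which is the nested hierarchy the notes refer to.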
Conceptual Issues in Phylogeny, Taxonomy, and Nomenclature
Contributions to Zoology, 66 (1) 3-41 (1996) SPB Academic Publishing bv, Amsterdam Conceptual issues in phylogeny, taxonomy, and nomenclature Alexandr P. Rasnitsyn Paleontological Institute, Russian Academy of Sciences, Profsoyuznaya Street 123, 117647 Moscow, Russia Keywords: Phylogeny, taxonomy, phenetics, cladistics, phylistics, principles of nomenclature, type concept, paleoentomology, Xyelidae (Vespida) Abstract Phylogenetic hypotheses are designed and tested (usually in implicit form) on the basis of a set of presumptions, that is, of statements describing a certain order of things in nature. These statements are to be accepted as such, no matter what evidence for them exists, but only in the absence of reasonably sound evidence pleading against them. A set of the most current phylogenetic presumptions is discussed, and a factual example of a practical realization of the approach is presented. A comparison is made of the three Résumé (translated from French) The three main taxonomic approaches developed so far are compared, namely phenetics, cladistics, and phylistics (= evolutionary systematics). The last term applies to an approach that tries explicitly to represent the fundamental features of traditional taxonomy, in particular its use of evidence drawn at once from the similarity and from the kinship relations of the taxa in question. The phylistic approach offers certain advantages in the search for answers to the fundamental problems of taxonomy. The author considers nomenclature -
Math/C SC 5610 Computational Biology Lecture 12: Phylogenetics
Math/C SC 5610 Computational Biology Lecture 12: Phylogenetics Stephen Billups University of Colorado at Denver Announcements Project Guidelines and Ideas are posted (proposal due March 8). CCB Seminar, Friday (Mar. 4) Speaker: Jack Horner, SAIC Title: Phylogenetic Methods for Characterizing the Signature of Stage I Ovarian Cancer in Serum Protein Mas Time: 11-12 (followed by lunch) Place: Media Center, AU008 Outline Distance-based methods for phylogenetics: UPGMA, WPGMA, Neighbor-Joining. Character-based methods: Maximum Likelihood, Maximum Parsimony. Review: Distance-Based Clustering Methods Main idea: Requires a distance matrix $D$ (defining distances between each pair of elements). Repeatedly group together the closest elements. Different algorithms differ in how they treat distances between groups: UPGMA (unweighted pair group method with arithmetic mean) and WPGMA (weighted pair group method with arithmetic mean). UPGMA 1. Initialize $C$ to the $n$ singleton clusters $\{1\}, \ldots, \{n\}$. 2. Initialize $\mathrm{dist}(c, d)$ on $C$ by defining $\mathrm{dist}(\{i\}, \{j\}) = D(i, j)$. 3. Repeat $n - 1$ times: (a) determine a pair $c, d$ of clusters in $C$ such that $\mathrm{dist}(c, d)$ is minimal; define $d_{\min} = \mathrm{dist}(c, d)$. (b) define a new cluster $e = c \cup d$; update $C = (C - \{c, d\}) \cup \{e\}$. (c) define a node with label $e$ and daughters $c, d$, where $e$ has distance $d_{\min}/2$ to its leaves. (d) define for all $f \in C$ with $f \neq e$, $\mathrm{dist}(e, f) = \mathrm{dist}(f, e) = \frac{\mathrm{dist}(c, f) + \mathrm{dist}(d, f)}{2}$. (avg. -
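The clustering loop in steps 1 to 3 of this slide can be sketched in Python as below. Two points are assumptions rather than facts from the slide: the function name and data layout are illustrative, and the distance update is exposed as a switch, because the update in step (d), as far as it can be read, averages the two distances with equal weight (the rule usually called WPGMA), while the size-weighted average is what is usually called UPGMA.

```python
from itertools import combinations


def pair_group_cluster(names, D, size_weighted=True):
    """Agglomerative pair-group clustering over a symmetric distance table.

    D[a][b] is the distance between items a and b. Returns the merges in order
    as (cluster_c, cluster_d, node_height), with node_height = d_min / 2.
    size_weighted=True averages by cluster size (usual UPGMA);
    size_weighted=False uses the plain mean of the two distances.
    """
    size = {name: 1 for name in names}
    dist = {frozenset((a, b)): D[a][b] for a, b in combinations(names, 2)}
    merges = []
    while len(size) > 1:
        # (a) closest pair of current clusters
        c, d = min(combinations(size, 2), key=lambda pair: dist[frozenset(pair)])
        d_min = dist[frozenset((c, d))]
        # (b) new cluster, labelled in a Newick-like style
        e = f"({c},{d})"
        # (d) distances from the new cluster to every remaining cluster
        for other in size:
            if other in (c, d):
                continue
            cf = dist[frozenset((c, other))]
            df = dist[frozenset((d, other))]
            if size_weighted:
                new = (size[c] * cf + size[d] * df) / (size[c] + size[d])
            else:
                new = (cf + df) / 2
            dist[frozenset((e, other))] = new
        size[e] = size.pop(c) + size.pop(d)
        # (c) the new node sits at height d_min / 2 above its leaves
        merges.append((c, d, d_min / 2))
    return merges


D = {
    "A": {"B": 2.0, "C": 6.0},
    "B": {"A": 2.0, "C": 6.0},
    "C": {"A": 6.0, "B": 6.0},
}
for merge in pair_group_cluster(["A", "B", "C"], D):
    print(merge)
```

On this three-item example both update rules give the same result: A and B merge first at height 1.0, and the cluster (A,B) then joins C at height 3.0. The rules only diverge once clusters of unequal size are merged.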
Clustering and Phylogenetic Approaches to Classification: Illustration on Stellar Tracks Didier Fraix-Burnet, Marc Thuillard
Clustering and Phylogenetic Approaches to Classification: Illustration on Stellar Tracks Didier Fraix-Burnet, Marc Thuillard To cite this version: Didier Fraix-Burnet, Marc Thuillard. Clustering and Phylogenetic Approaches to Classification: Illustration on Stellar Tracks. 2014. hal-01703341 HAL Id: hal-01703341 https://hal.archives-ouvertes.fr/hal-01703341 Preprint submitted on 7 Feb 2018 D. Fraix-Burnet1, M. Thuillard2 1 Univ. Grenoble Alpes, CNRS, IPAG, 38000 Grenoble, France email: [email protected] 2 La Colline, 2072 St-Blaise, Switzerland February 7, 2018 This pedagogical article was written in 2014 and is as yet unpublished. Part of it can be found in Fraix-Burnet (2015). Abstract Classifying objects into groups is a natural activity which is most often a prerequisite before any physical analysis of the data. Clustering and phylogenetic approaches are two different and complementary ways of doing so: the first relies on similarities and the second on relationships. In this paper, we describe these approaches very simply and show how phylogenetic techniques can be used in astrophysics by using a toy example based on a sample of stars obtained from models of stellar evolution. -
II. a Cladistic Analysis of Rbcl Sequences and Morphology
Systematic Botany (2003), 28(2): pp. 270–292 © Copyright 2003 by the American Society of Plant Taxonomists Phylogenetic Relationships in the Commelinaceae: II. A Cladistic Analysis of rbcL Sequences and Morphology TIMOTHY M. EVANS,1,3 KENNETH J. SYTSMA,1 ROBERT B. FADEN,2 and THOMAS J. GIVNISH1 1Department of Botany, University of Wisconsin, 430 Lincoln Drive, Madison, Wisconsin 53706; 2Department of Systematic Biology-Botany, MRC 166, National Museum of Natural History, Smithsonian Institution, P.O. Box 37012, Washington, DC 20013-7012; 3Present address, author for correspondence: Department of Biology, Hope College, 35 East 12th Street, Holland, Michigan 49423-9000 ([email protected]) Communicating Editor: John V. Freudenstein ABSTRACT. The chloroplast-encoded gene rbcL was sequenced in 30 genera of Commelinaceae to evaluate intergeneric relationships within the family. The Australian Cartonema was consistently placed as sister to the rest of the family. The Commelineae is monophyletic, while the monophyly of Tradescantieae is in question, due to the position of Palisota as sister to all other Tradescantieae plus Commelineae. The phylogeny supports the most recent classification of the family with monophyletic tribes Tradescantieae (minus Palisota) and Commelineae, but is highly incongruent with a morphology-based phylogeny. This incongruence is attributed to convergent evolution of morphological characters associated with pollination strategies, especially those of the androecium and inflorescence. Analysis of the combined data sets produced a phylogeny similar to the rbcL phylogeny. The combined analysis differed from the molecular one, however, in supporting the monophyly of Dichorisandrinae. 
The family appears to have arisen in the Old World, with one or possibly two movements to the New World in the Tradescantieae, and two (or possibly one) subsequent movements back to the Old World; the latter are required to account for the Old World distribution of Coleotrypinae and Cyanotinae, which are nested within a New World clade. -
Hemiplasy and Homoplasy in the Karyotypic Phylogenies of Mammals
Hemiplasy and homoplasy in the karyotypic phylogenies of mammals Terence J. Robinson*†, Aurora Ruiz-Herrera‡, and John C. Avise†§ *Evolutionary Genomics Group, Department of Botany and Zoology, University of Stellenbosch, Private Bag X1, Matieland 7602, South Africa; ‡Laboratorio di Biologia Cellulare e Molecolare, Dipartimento di Genetica e Microbiologia, Università degli Studi di Pavia, via Ferrata 1, 27100 Pavia, Italy; and §Department of Ecology and Evolutionary Biology, University of California, Irvine, CA 92697 Contributed by John C. Avise, July 31, 2008 (sent for review May 28, 2008) Phylogenetic reconstructions are often plagued by difficulties in distinguishing phylogenetic signal (due to shared ancestry) from phylogenetic noise or homoplasy (due to character-state convergences or reversals). We use a new interpretive hypothesis, termed hemiplasy, to show how random lineage sorting might account for specific instances of seeming “phylogenetic discordance” among different chromosomal traits, or between karyotypic features and probable species phylogenies. We posit that hemiplasy is generally less likely for underdominant chromosomal polymorphisms (i.e., those with heterozygous disadvantage) than for neutral polymorphisms or especially for overdominant rearrangements (which should tend to be longer-lived), and we illustrate this concept by 
(2). These syntenic blocks, sometimes shared across even distantly related species, may involve entire chromosomes, chromosomal arms, or chromosomal segments. Based on the premise that each syntenic assemblage in extant species is of monophyletic origin, researchers have reconstructed phylogenies and ancestral karyotypes for numerous mammalian taxa (ref. 3 and references therein). Normally, the phylogenetic inferences from these cladistic appraisals are self-consistent (across syntenic blocks) and taxonomically reasonable, and they have helped greatly to identify particular mammalian clades. However, in a few cases problematic phylogenetic patterns of the sort described above have emerged. -
Phylogenetics
Phylogenetics What is phylogenetics? • Study of branching patterns of descent among lineages • Lineages – Populations – Species – Molecules • Shift between population genetics and phylogenetics is often the species boundary – Distantly related populations also show patterning – Patterning across geography What is phylogenetics? • Goal: Determine and describe the evolutionary relationships among lineages – Order of events – Timing of events • Visualization: Phylogenetic trees – Graph – No cycles Phylogenetic trees • Nodes – Terminal – Internal – Degree • Branches • Topology Phylogenetic trees • Rooted or unrooted – Rooted: Precisely 1 internal node of degree 2 • Node that represents the common ancestor of all taxa – Unrooted: All internal nodes with degree 3+ Stephan Steigele • Binary: all speciation events produce two lineages from one • Cladogram: Topology only • Phylogram: Topology with edge lengths representing time or distance • Ultrametric: Rooted tree with time-based edge lengths (all leaves equidistant from root) Phylogenetic trees • Clade: Group of ancestral and descendant lineages • Monophyly: All of the descendants of a unique common ancestor • Polyphyly: -
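The degree-based definitions of rooted and unrooted trees given in these slides translate directly into a small check. The sketch below is a minimal Python illustration; the adjacency-dictionary representation, the function name and the node labels are assumptions, and the code takes for granted that its input is already a tree (a connected graph with no cycles) rather than verifying it.

```python
def classify_tree(adjacency):
    """Classify a tree as 'rooted', 'unrooted' or 'neither' from node degrees.

    adjacency maps each node to the list of its neighbours. Terminal nodes have
    degree 1; every other node is internal. Following the slide definitions:
    rooted   = exactly one internal node of degree 2 (the root),
    unrooted = every internal node has degree 3 or more.
    """
    internal_degrees = [len(nbrs) for nbrs in adjacency.values() if len(nbrs) > 1]
    degree_two_nodes = internal_degrees.count(2)
    if degree_two_nodes == 1:
        return "rooted"
    if degree_two_nodes == 0:
        return "unrooted"
    return "neither"


# Rooted tree ((A,B),C): the root R has degree 2, the internal node X degree 3.
rooted = {
    "R": ["X", "C"],
    "X": ["R", "A", "B"],
    "A": ["X"],
    "B": ["X"],
    "C": ["R"],
}
# The same three taxa without a root: a single internal node of degree 3.
unrooted = {
    "X": ["A", "B", "C"],
    "A": ["X"],
    "B": ["X"],
    "C": ["X"],
}
print(classify_tree(rooted))    # rooted
print(classify_tree(unrooted))  # unrooted
```

Removing the degree-2 root R from the first tree and connecting X directly to C gives the second tree, which is how a rooted tree is usually unrooted.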
The Generalized Neighbor Joining Method
The Generalized Neighbor Joining method Ruriko Yoshida Dept. of Statistics University of Kentucky Joint work with Dan Levy and Lior Pachter www.math.duke.edu/~ruriko Oxford Challenge We would like to assemble the fungi tree of life. Francois Lutzoni and Rytas Vilgalys Department of Biology, Duke University 1500+ fungal species http://ocid.nacse.org/research/aftol/about.php Many problems to be solved.... http://tolweb.org/tree?group=fungi Zygomycota is not monophyletic. The position of some lineages such as that of Glomales and of Endogonales-Mortierellales is unclear, but they may lie outside Zygomycota as independent lineages basal to the Ascomycota-Basidiomycota lineage (Bruns et al., 1993). Phylogeny Phylogenetic trees describe the evolutionary relations among groups of organisms. Constructing trees from sequence data “Ten years ago most biologists would have agreed that all organisms evolved from a single ancestral cell that lived 3.5 billion or more years ago. More recent results, however, indicate that this family tree of life is far more complicated than was believed and may not have had a single root at all.” (W. Ford Doolittle, (June 2000) Scientific American). Since the proliferation of Darwinian evolutionary biology, many scientists have sought a coherent explanation for the evolution of life and have tried to reconstruct phylogenetic trees. Methods to reconstruct a phylogenetic tree from DNA sequences include: • The maximum likelihood estimation (MLE) methods: They describe evolution in terms of a discrete-state continuous-time Markov process. -
Rapid Neighbour-Joining
Rapid Neighbour-Joining Martin Simonsen, Thomas Mailund and Christian N. S. Pedersen Bioinformatics Research Center (BIRC), University of Aarhus, C. F. Møllers Allé, Building 1110, DK-8000 Århus C, Denmark. fkejseren,mailund,[email protected] Abstract. The neighbour-joining method reconstructs phylogenies by iteratively joining pairs of nodes until a single node remains. The criterion for which pair of nodes to merge is based on both the distance between the pair and the average distance to the rest of the nodes. In this paper, we present a new search strategy for the optimisation criteria used for selecting the next pair to merge and we show empirically that the new search strategy is superior to other state-of-the-art neighbour-joining implementations. 1 Introduction The neighbour-joining (NJ) method by Saitou and Nei [8] is a widely used method for phylogenetic reconstruction, made popular by a combination of computational efficiency and reasonable accuracy. With its cubic running time by Studier and Kepler [11], the method scales to hundreds of species, and while it is usually possible to infer phylogenies with thousands of species, tens or hundreds of thousands of species is infeasible. Various approaches have been taken to improve the running time for neighbour-joining. QuickTree [5] was an attempt to improve performance by making an efficient implementation. It outperformed the existing implementations but still performs as an $O(n^3)$ time algorithm. QuickJoin [6, 7], instead, uses an advanced search heuristic when searching for the pair of nodes to join. Where the straightforward search takes time $O(n^2)$ per join, QuickJoin on average reduces this to $O(n)$ and can reconstruct large trees in a small fraction of the time needed by QuickTree.