The Impact of Tree Topology on Neutrality Tests

INVESTIGATION

Decomposing the Site Frequency Spectrum: The Impact of Tree Topology on Neutrality Tests

Luca Ferretti,*,1 Alice Ledda,† Thomas Wiehe,‡ Guillaume Achaz,§,** and Sebastian E. Ramos-Onsins††

*The Pirbright Institute, Woking, GU24 0NF, United Kingdom, †Department of Infectious Disease Epidemiology, Imperial College, London, W2 1PG, United Kingdom, ‡Institute of Genetics, University of Cologne, D-50674, Germany, §Institut de Systématique, Evolution, Biodiversité, Unité Mixte de Recherche 7205, and **Centre Interdisciplinaire de Recherche en Biologie, Unité Mixte de Recherche 7241, Paris College de France, and ††Centre for Research in Agricultural Genomics (CRAG), Bellaterra, 08290 Barcelona, Spain

ABSTRACT We investigate the dependence of the site frequency spectrum on the topological structure of genealogical trees. We show that basic population genetic statistics, for instance, estimators of θ or neutrality tests such as Tajima’s D, can be decomposed into components of waiting times between coalescent events and of tree topology. Our results clarify the relative impact of the two components on these statistics. We provide a rigorous interpretation of positive or negative values of an important class of neutrality tests in terms of the underlying tree shape. In particular, we show that values of Tajima’s D and Fay and Wu’s H depend in a direct way on a peculiar measure of tree balance, which is mostly determined by the root balance of the tree. We present a new test for selection in the same class as Fay and Wu’s H and discuss its interpretation and power. Finally, we determine the trees corresponding to extreme expected values of these neutrality tests and present formulas for these extreme values as a function of sample size and number of segregating sites.

KEYWORDS coalescent theory; neutrality tests; site frequency spectrum; tree shape; tree balance

Copyright © 2017 by the Genetics Society of America. doi: https://doi.org/10.1534/genetics.116.188763. Manuscript received March 1, 2016; accepted for publication May 19, 2017; published Early Online July 5, 2017. Supplemental material is available online at www.genetics.org/lookup/suppl/doi:10.1534/genetics.116.188763/-/DC1. 1Corresponding author: The Pirbright Institute, Ash Rd., Pirbright, Woking, GU24 0NF, United Kingdom. E-mail: [email protected]. Genetics, Vol. 207, 229–240, September 2017.

COALESCENT theory (Kingman 1982; Hein et al. 2004; Wakeley 2009) provides a powerful framework to interpret the mutation patterns in a sample of DNA sequences. Grounded in the neutral theory of molecular evolution (Kimura 1985), binary coalescent trees are the dual backward representations of the continuous-forward-time diffusion model of genetic drift. In this view, sequences are related by a genealogical tree where leaf nodes represent the sampled sequences at present time, and internal nodes (coalescent events) represent last common ancestors of the leaves underneath. In particular, the root node represents the most recent common ancestor of the whole sample.

In species phylogeny and epidemiology, tree structure is often used to compare different models of evolution or to fit model parameters (Bouckaert et al. 2014). Two summary statistics are routinely used to characterize tree structure: the γ statistic relates to the waiting times (Pybus et al. 2000) and the β statistic to tree balance (Blum and François 2006). Importantly, these statistics can only be computed after the tree structure has been independently inferred, typically by phylogenetic reconstruction methods (Felsenstein 2004).

In population genetics, the historical relationship among nonrecombining sequences is represented by a single genealogical tree. The tree is completely determined by the waiting times and the branching order of coalescent events. The waiting times determine branch lengths; the branching order determines tree shape. Population genetic statistics, such as estimates of the scaled mutation rate or tests of the neutral evolution hypothesis (neutrality tests), are sensitive to waiting times and tree shape.

The site frequency spectrum (SFS) is one of the most-used statistics in population genetics. The unfolded SFS $\xi = (\xi_1, \ldots, \xi_{n-1})$ of a sample of $n$ sequences is defined as the vector of counts $\xi_i$, $i \in \{1, \ldots, n-1\}$, of all polymorphic sites with a derived allele (“mutation”) at frequency $i/n$. The SFS is a function of both tree structure and mutational process. For a given mutational process, the SFS carries information on the underlying, but not directly observable, genealogical trees and therefore on the forward process that has generated the trees. For a nonrecombining locus, the SFS carries information on the realized coalescent tree and can be used to estimate tree structure (both waiting times and topology).

Variation over time in the effective population size affects the expected waiting times between coalescent events. In the past, much attention in theoretical works has been paid to the relation between waiting times and population size variation. For example, skyline plots (Pybus et al. 2000) are directly used to infer variation of population size (Ho and Shapiro 2011), although care should be taken while using this approach (Lapierre et al. 2016). More generally, formulas of the SFS can be generalized to include deterministic changes of population size (Griffiths and Tavaré 1998; Zivkovic and Wiehe 2008; Liu and Fu 2015). In contrast, the influence of tree shape on the SFS has not yet been tackled analytically.

The shape of a tree can range from completely symmetric trees, in which all internal nodes evenly split the lineages, to caterpillar trees, in which each node isolates exactly one lineage. In the standard neutral model, as well as in any other equal-rates Markov or Yule model (Yule 1925), both of these extreme cases are very unlikely to appear by chance (Blum and François 2006). In fact, since the number of binary tree shapes [enumerated by the Wedderburn–Etherington numbers (Sloane and Plouffe 1995)] grows rapidly with the number of sequences $n$, any specific tree shape is arbitrarily improbable if $n$ is sufficiently large. Nonetheless, tree topology is a major determinant of the SFS. For example, a caterpillar shape leads to a large excess of singleton mutations, while a completely symmetric tree leads to an overrepresentation of intermediate frequency alleles.

This study aims at providing a systematic analysis of the impact of the structure of genealogical trees upon the SFS. First, we introduce the theoretical framework for neutrality tests and tree balance. In particular, we develop a new measure of imbalance appropriate for population genetics. Then, we present the decomposition of the SFS in terms of waiting times and tree shape. We discuss the case of a single nonrecombining locus, assuming a single realized tree (fixed topology). As recombination affects mostly lower branches of the tree, this also constitutes an excellent approximation for a locus with a low level of recombination.

We present a mathematically rigorous, yet intuitive, interpretation of neutrality tests in terms of tree topology and branch lengths. We focus on a subclass of tests of special interest and simplicity. A qualitative summary of the results about the interpretation of neutrality tests is given in Table 1. We also propose a new neutrality test, L, for selection.

… internal nodes close to the root will be referred to as the “upper part” of the tree; conversely, the “lower part” is close to the leaves.

The waiting times between subsequent “binary” coalescent events, i.e., the level heights, are denoted by $t_k$. For trees with coalescent events involving multiple mergers, some of the binary waiting times could be null, i.e., $t_k = 0$. For example, if four lineages would coalesce together in a tree with five lineages, and then the two remaining lineages would coalesce to form the root, then $t_3 = 0$.

In a neutral, panmictic population of ploidy $p$ (typically $p = 1$ or 2) and constant effective population size $N_e$ that can be modeled by the Kingman coalescent, the $t_k$ are exponentially distributed with parameter $k(k-1)$, when the time is measured in $2pN_e$ generations (Wakeley 2009). Two summary tree statistics are the height $h = \sum_{k=2}^{n} t_k$, which is the time from the present to the most recent common ancestor, and the total tree length $l = \sum_{k=2}^{n} k\,t_k$. Basic coalescent theory states $E(h) = 1 - 1/n$ and $E(l) = a_n$, where $a_n = \sum_{i=1}^{n-1} 1/i$ is the $(n-1)$th harmonic number.

Tree imbalance per level

Following Fu (1995), we define the size $d_k$ of a branch from level $k$ as the number of leaves that descend from that branch. Any mutation on this branch is carried by $d_k$ sequences from the present sample. We denote by $P(d_k = i \mid T)$ the probability that a randomly chosen branch of level $k$ is of size $i$, given tree $T$. The complete set of distributions $P(d_k = i \mid T)$ for each $i$ and $k$ determines uniquely the shape of the tree $T$.

The mean number of descendants across all branches from level $k$ is $E(d_k) = \sum_{i=1}^{n-k+1} i\,P(d_k = i \mid T) = n/k$. This holds for any tree, since all $n$ present-day sequences must descend from one of the $k$ branches from that level.

In contrast, the size variance, $\mathrm{Var}(d_k)$, depends on the tree topology: at all levels, it is almost zero in completely balanced trees and maximal in caterpillar trees, where all nodes isolate one leaf from the remaining subtree. For this reason, we propose the variance $\mathrm{Var}(d_k)$ as the natural measure of imbalance for each level.

The bounds on $\mathrm{Var}(d_k)$, shown in Figure 1A, vary greatly from level to level: for example, the variance of the uppermost level is $\mathrm{Var}(d_2) \in [0, (n/2-1)^2]$, whereas $\mathrm{Var}(d_n) = 0$ (since $d_n = 1$ for all branches). More generally, the maximum variance at a given level $k$ is obtained in trees where $k - 1$ lineages lead to exactly one leaf and one lineage has …