Molecular Phylogenetics: Principles and Practice
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Selecting Molecular Markers for a Specific Phylogenetic Problem
MOJ Proteomics & Bioinformatics Review Article Open Access Selecting molecular markers for a specific phylogenetic problem Abstract Volume 6 Issue 3 - 2017 In a molecular phylogenetic analysis, different markers may yield contradictory Claudia AM Russo, Bárbara Aguiar, Alexandre topologies for the same diversity group. Therefore, it is important to select suitable markers for a reliable topological estimate. Issues such as length and rate of evolution P Selvatti Department of Genetics, Federal University of Rio de Janeiro, will play a role in the suitability of a particular molecular marker to unfold the Brazil phylogenetic relationships for a given set of taxa. In this review, we provide guidelines that will be useful to newcomers to the field of molecular phylogenetics weighing the Correspondence: Claudia AM Russo, Molecular Biodiversity suitability of molecular markers for a given phylogenetic problem. Laboratory, Department of Genetics, Institute of Biology, Block A, CCS, Federal University of Rio de Janeiro, Fundão Island, Keywords: phylogenetic trees, guideline, suitable genes, phylogenetics Rio de Janeiro, RJ, 21941-590, Brazil, Tel 21 991042148, Fax 21 39386397, Email [email protected] Received: March 14, 2017 | Published: November 03, 2017 Introduction concept that occupies a central position in evolutionary biology.15 Homology is a qualitative term, defined by equivalence of parts due Over the last three decades, the scientific field of molecular to inherited common origin.16–18 biology has experienced remarkable advancements -
Investgating Determinants of Phylogeneic Accuracy
IMPACT OF MOLECULAR EVOLUTIONARY FOOTPRINTS ON PHYLOGENETIC ACCURACY – A SIMULATION STUDY Dissertation Submitted to The College of Arts and Sciences of the UNIVERSITY OF DAYTON In Partial Fulfillment of the Requirements for The Degree Doctor of Philosophy in Biology by Bhakti Dwivedi UNIVERSITY OF DAYTON August, 2009 i APPROVED BY: _________________________ Gadagkar, R. Sudhindra Ph.D. Major Advisor _________________________ Robinson, Jayne Ph.D. Committee Member Chair Department of Biology _________________________ Nielsen, R. Mark Ph.D. Committee Member _________________________ Rowe, J. John Ph.D. Committee Member _________________________ Goldman, Dan Ph.D. Committee Member ii ABSTRACT IMPACT OF MOLECULAR EVOLUTIONARY FOOTPRINTS ON PHYLOGENETIC ACCURACY – A SIMULATION STUDY Dwivedi Bhakti University of Dayton Advisor: Dr. Sudhindra R. Gadagkar An accurately inferred phylogeny is important to the study of molecular evolution. Factors impacting the accuracy of a phylogenetic tree can be traced to several consecutive steps leading to the inference of the phylogeny. In this simulation-based study our focus is on the impact of the certain evolutionary features of the nucleotide sequences themselves in the alignment rather than any source of error during the process of sequence alignment or due to the choice of the method of phylogenetic inference. Nucleotide sequences can be characterized by summary statistics such as sequence length and base composition. When two or more such sequences need to be compared to each other (as in an alignment prior to phylogenetic analysis) additional evolutionary features come into play, such as the overall rate of nucleotide substitution, the ratio of two specific instantaneous, rates of substitution (rate at which transitions and transversions occur), and the shape parameter, of the gamma distribution (that quantifies the extent of iii heterogeneity in substitution rate among sites in an alignment). -
Analysis of the Impact of Sequencing Errors on Blast Using Fault Injection
View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Illinois Digital Environment for Access to Learning and Scholarship Repository ANALYSIS OF THE IMPACT OF SEQUENCING ERRORS ON BLAST USING FAULT INJECTION BY SO YOUN LEE THESIS Submitted in partial fulfillment of the requirements for the degree of Master of Science in Electrical and Computer Engineering in the Graduate College of the University of Illinois at Urbana-Champaign, 2013 Urbana, Illinois Adviser: Professor Ravishankar K. Iyer ABSTRACT This thesis investigates the impact of sequencing errors in post-sequence computational analyses, including local alignment search and multiple sequence alignment. While the error rates of sequencing technology are commonly reported, the significance of these numbers cannot be fully grasped without putting them in the perspective of their impact on the downstream analyses that are used for biological research, forensics, diagnosis of diseases, etc. I approached the quantification of the impact using fault injection. Faults were injected in the input sequence data, and the analyses were run. Change in the output of the analyses was interpreted as the impact of faults, or errors. Three commonly used algorithms were used: BLAST, SSEARCH, and ProbCons. The main contributions of this work are the application of fault injection to the reliability analysis in bioinformatics and the quantitative demonstration that a small error rate in the sequence data can alter the output of the analysis in a significant way. BLAST and SSEARCH are both local alignment search tools, but BLAST is a heuristic implementation, while SSEARCH is based on the optimal Smith-Waterman algorithm. -
An Automated Pipeline for Retrieving Orthologous DNA Sequences from Genbank in R
life Technical Note phylotaR: An Automated Pipeline for Retrieving Orthologous DNA Sequences from GenBank in R Dominic J. Bennett 1,2,* ID , Hannes Hettling 3, Daniele Silvestro 1,2, Alexander Zizka 1,2, Christine D. Bacon 1,2, Søren Faurby 1,2, Rutger A. Vos 3 ID and Alexandre Antonelli 1,2,4,5 ID 1 Gothenburg Global Biodiversity Centre, Box 461, SE-405 30 Gothenburg, Sweden; [email protected] (D.S.); [email protected] (A.Z.); [email protected] (C.D.B.); [email protected] (S.F.); [email protected] (A.A.) 2 Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, SE-405 30 Gothenburg, Sweden 3 Naturalis Biodiversity Center, P.O. Box 9517, 2300 RA Leiden, The Netherlands; [email protected] (H.H.); [email protected] (R.A.V.) 4 Gothenburg Botanical Garden, Carl Skottsbergsgata 22A, SE-413 19 Gothenburg, Sweden 5 Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford St., Cambridge, MA 02138 USA * Correspondence: [email protected] Received: 28 March 2018; Accepted: 1 June 2018; Published: 5 June 2018 Abstract: The exceptional increase in molecular DNA sequence data in open repositories is mirrored by an ever-growing interest among evolutionary biologists to harvest and use those data for phylogenetic inference. Many quality issues, however, are known and the sheer amount and complexity of data available can pose considerable barriers to their usefulness. A key issue in this domain is the high frequency of sequence mislabeling encountered when searching for suitable sequences for phylogenetic analysis. -
Advancing Solutions to the Carbohydrate Sequencing Challenge † † † ‡ § Christopher J
Perspective Cite This: J. Am. Chem. Soc. 2019, 141, 14463−14479 pubs.acs.org/JACS Advancing Solutions to the Carbohydrate Sequencing Challenge † † † ‡ § Christopher J. Gray, Lukasz G. Migas, Perdita E. Barran, Kevin Pagel, Peter H. Seeberger, ∥ ⊥ # ⊗ ∇ Claire E. Eyers, Geert-Jan Boons, Nicola L. B. Pohl, Isabelle Compagnon, , × † Göran Widmalm, and Sabine L. Flitsch*, † School of Chemistry & Manchester Institute of Biotechnology, The University of Manchester, 131 Princess Street, Manchester M1 7DN, U.K. ‡ Institute for Chemistry and Biochemistry, Freie Universitaẗ Berlin, Takustraße 3, 14195 Berlin, Germany § Biomolecular Systems Department, Max Planck Institute for Colloids and Interfaces, Am Muehlenberg 1, 14476 Potsdam, Germany ∥ Department of Biochemistry, Institute of Integrative Biology, University of Liverpool, Crown Street, Liverpool L69 7ZB, U.K. ⊥ Complex Carbohydrate Research Center, University of Georgia, Athens, Georgia 30602, United States # Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States ⊗ Institut Lumierè Matiere,̀ UMR5306 UniversitéLyon 1-CNRS, Universitéde Lyon, 69622 Villeurbanne Cedex, France ∇ Institut Universitaire de France IUF, 103 Blvd St Michel, 75005 Paris, France × Department of Organic Chemistry, Arrhenius Laboratory, Stockholm University, S-106 91 Stockholm, Sweden *S Supporting Information established connection between their structure and their ABSTRACT: Carbohydrates possess a variety of distinct function, full characterization of unknown carbohydrates features with -
Phylogeny Inference Based on Parsimony and Other Methods Using Paup*
8 Phylogeny inference based on parsimony and other methods using Paup* THEORY David L. Swofford and Jack Sullivan 8.1 Introduction Methods for inferring evolutionary trees can be divided into two broad categories: thosethatoperateonamatrixofdiscretecharactersthatassignsoneormore attributes or character states to each taxon (i.e. sequence or gene-family member); and those that operate on a matrix of pairwise distances between taxa, with each distance representing an estimate of the amount of divergence between two taxa since they last shared a common ancestor (see Chapter 1). The most commonly employed discrete-character methods used in molecular phylogenetics are parsi- mony and maximum likelihood methods. For molecular data, the character-state matrix is typically an aligned set of DNA or protein sequences, in which the states are the nucleotides A, C, G, and T (i.e. DNA sequences) or symbols representing the 20 common amino acids (i.e. protein sequences); however, other forms of discrete data such as restriction-site presence/absence and gene-order information also may be used. Parsimony, maximum likelihood, and some distance methods are examples of a broader class of phylogenetic methods that rely on the use of optimality criteria. Methods in this class all operate by explicitly defining an objective function that returns a score for any input tree topology. This tree score thus allows any two or more trees to be ranked according to the chosen optimality criterion. Ordinarily, phylogenetic inference under criterion-based methods couples the selection of The Phylogenetic Handbook: a Practical Approach to Phylogenetic Analysis and Hypothesis Testing, Philippe Lemey, Marco Salemi, and Anne-Mieke Vandamme (eds.). -
Comparative Phylogeography As an Integrative Approach to Historical Biogeography
Journal of Biogeography, 28, 819±825 Comparative phylogeography as an integrative approach to historical biogeography ABSTRACT GUEST EDITORIAL Phylogeography has become a powerful approach for elucidating contemporary geographical patterns of evolutionary subdivision within species and species complexes. A recent extension of this approach is the comparison of phylogeographic patterns of multiple co-distributed taxonomic groups, or `comparative phylogeography.' Recent comparative phylogeographic studies have revealed pervasive and previously unrecognized biogeographic patterns which suggest that vicariance has played a more important role in the historical development of modern biotic assemblages than current taxonomy would indicate. Despite the utility of comparative phylogeography for uncovering such `cryptic vicariance', this approach has yet to be embraced by some researchers as a valuable complement to other approaches to historical biogeography. We address here some of the common misconceptions surrounding comparative phylogeography, provide an example of this approach based on the boreal mammal fauna of North America, and argue that together with other approaches, comparative phylogeography can contribute importantly to our understanding of the relationship between earth history and biotic diversi®cation. Keywords Area cladistics, comparative phylogeography, historical biogeography, vicariance. INTRODUCTION In a recent guest editorial Humphries (2000) presented an overview of historical biogeography. He concentrated largely on comparisons -
University of Florida Thesis Or Dissertation Formatting
TOOLS FOR BIODIVERSITY ANALYSES USING NATURAL HISTORY COLLECTIONS AND REPOSITORIES: DATA MINING, MACHINE LEARNING AND PHYLODIVERSITY By CHANDRA EARL A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY UNIVERSITY OF FLORIDA 2020 1 . © 2020 Chandra Earl 2 . ACKNOWLEDGMENTS I thank my co-chairs and members of my supervisory committee for their mentoring and generous support, my collaborators and colleagues for their input and support and my parents and siblings for their loving encouragement and interest. 3 . TABLE OF CONTENTS page ACKNOWLEDGMENTS .................................................................................................. 3 LIST OF TABLES ............................................................................................................ 5 LIST OF FIGURES .......................................................................................................... 6 ABSTRACT ..................................................................................................................... 8 CHAPTER 1 INTRODUCTION ...................................................................................................... 9 2 GENEDUMPER: A TOOL TO BUILD MEGAPHYLOGENIES FROM GENBANK DATA ...................................................................................................................... 12 Materials and Methods........................................................................................... -
EVOLUTIONARY INFERENCE: Some Basics of Phylogenetic Analyses
EVOLUTIONARY INFERENCE: Some basics of phylogenetic analyses. Ana Rojas Mendoza CNIO-Madrid-Spain. Alfonso Valencia’s lab. Aims of this talk: • 1.To introduce relevant concepts of evolution to practice phylogenetic inference from molecular data. • 2.To introduce some of the most useful methods and computer programmes to practice phylogenetic inference. • • 3.To show some examples I’ve worked in. SOME BASICS 11--ConceptsConcepts ofof MolecularMolecular EvolutionEvolution • Homology vs Analogy. • Homology vs similarity. • Ortologous vs Paralogous genes. • Species tree vs genes tree. • Molecular clock. • Allele mutation vs allele substitution. • Rates of allele substitution. • Neutral theory of evolution. SOME BASICS Owen’s definition of homology Richard Owen, 1843 • Homologue: the same organ under every variety of form and function (true or essential correspondence). •Analogy: superficial or misleading similarity. SOME BASICS 1.Concepts1.Concepts ofof MolecularMolecular EvolutionEvolution • Homology vs Analogy. • Homology vs similarity. • Ortologous vs Paralogous genes. • Species tree vs genes tree. • Molecular clock. • Allele mutation vs allele substitution. • Rates of allele substitution. • Neutral theory of evolution. SOME BASICS Similarity ≠ Homology • Similarity: mathematical concept . Homology: biological concept Common Ancestry!!! SOME BASICS 1.Concepts1.Concepts ofof MolecularMolecular EvolutionEvolution • Homology vs Analogy. • Homology vs similarity. • Ortologous vs Paralogous genes. • Species tree vs genes tree. • Molecular clock. -
"Phylogenetic Analysis of Protein Sequence Data Using The
Phylogenetic Analysis of Protein Sequence UNIT 19.11 Data Using the Randomized Axelerated Maximum Likelihood (RAXML) Program Antonis Rokas1 1Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee ABSTRACT Phylogenetic analysis is the study of evolutionary relationships among molecules, phenotypes, and organisms. In the context of protein sequence data, phylogenetic analysis is one of the cornerstones of comparative sequence analysis and has many applications in the study of protein evolution and function. This unit provides a brief review of the principles of phylogenetic analysis and describes several different standard phylogenetic analyses of protein sequence data using the RAXML (Randomized Axelerated Maximum Likelihood) Program. Curr. Protoc. Mol. Biol. 96:19.11.1-19.11.14. C 2011 by John Wiley & Sons, Inc. Keywords: molecular evolution r bootstrap r multiple sequence alignment r amino acid substitution matrix r evolutionary relationship r systematics INTRODUCTION the baboon-colobus monkey lineage almost Phylogenetic analysis is a standard and es- 25 million years ago, whereas baboons and sential tool in any molecular biologist’s bioin- colobus monkeys diverged less than 15 mil- formatics toolkit that, in the context of pro- lion years ago (Sterner et al., 2006). Clearly, tein sequence analysis, enables us to study degree of sequence similarity does not equate the evolutionary history and change of pro- with degree of evolutionary relationship. teins and their function. Such analysis is es- A typical phylogenetic analysis of protein sential to understanding major evolutionary sequence data involves five distinct steps: (a) questions, such as the origins and history of data collection, (b) inference of homology, (c) macromolecules, developmental mechanisms, sequence alignment, (d) alignment trimming, phenotypes, and life itself. -
EMBL-EBI Powerpoint Presentation
Processing data from high-throughput sequencing experiments Simon Anders Use-cases for HTS · de-novo sequencing and assembly of small genomes · transcriptome analysis (RNA-Seq, sRNA-Seq, ...) • identifying transcripted regions • expression profiling · Resequencing to find genetic polymorphisms: • SNPs, micro-indels • CNVs · ChIP-Seq, nucleosome positions, etc. · DNA methylation studies (after bisulfite treatment) · environmental sampling (metagenomics) · reading bar codes Use cases for HTS: Bioinformatics challenges Established procedures may not be suitable. New algorithms are required for · assembly · alignment · statistical tests (counting statistics) · visualization · segmentation · ... Where does Bioconductor come in? Several steps: · Processing of the images and determining of the read sequencest • typically done by core facility with software from the manufacturer of the sequencing machine · Aligning the reads to a reference genome (or assembling the reads into a new genome) • Done with community-developed stand-alone tools. · Downstream statistical analyis. • Write your own scripts with the help of Bioconductor infrastructure. Solexa standard workflow SolexaPipeline · "Firecrest": Identifying clusters ⇨ typically 15..20 mio good clusters per lane · "Bustard": Base calling ⇨ sequence for each cluster, with Phred-like scores · "Eland": Aligning to reference Firecrest output Large tab-separated text files with one row per identified cluster, specifying · lane index and tile index · x and y coordinates of cluster on tile · for each -
Resolving Postglacial Phylogeography Using High-Throughput Sequencing
Resolving postglacial phylogeography using high-throughput sequencing Kevin J. Emerson1, Clayton R. Merz, Julian M. Catchen, Paul A. Hohenlohe, William A. Cresko, William E. Bradshaw, and Christina M. Holzapfel Center for Ecology and Evolutionary Biology, University of Oregon, Eugene, OR 97403-5289 Edited by David L. Denlinger, Ohio State University, Columbus, OH, and approved August 4, 2010 (received for review May 11, 2010) The distinction between model and nonmodel organisms is becom- not useful for determining the genetic similarity in closely related ing increasingly blurred. High-throughput, second-generation se- populations of nonmodel species or species for which whole-genome quencing approaches are being applied to organisms based on their resequencing is not yet possible. interesting ecological, physiological, developmental, or evolutionary Phylogeography and phylogenetics have recently benefited from properties and not on the depth of genetic information available for genome-wide SNP detection methods to elucidate patterns of fi them. Here, we illustrate this point using a low-cost, ef cient tech- variation in model taxa or their close relatives (7, 13). The limiting fi nique to determine the ne-scale phylogenetic relationships among step in the applicability of multilocus datasets in nonmodel organ- recently diverged populations in a species. This application of restric- isms has been generating the genetic markers to be used (14). tion site-associated DNA tags (RAD tags) reveals previously unre- solved genetic structure and direction of evolution in the pitcher Baird et al. (4) developed a second generation sequencing ap- plant mosquito, Wyeomyia smithii, from a southern Appalachian proach that allowed for the simultaneous discovery and typing of Mountain refugium following recession of the Laurentide Ice Sheet thousands of SNPs throughout the genome (5).