Proquest Dissertations

INFORMATION TO USERS

This manuscript has been reproduced from the microfilm master. UMI films the text directly from the original or copy submitted. Thus, some thesis and dissertation copies are in typewriter face, while others may be from any type of computer printer.

The quality of this reproduction is dependent upon the quality of the copy submitted. Broken or indistinct print, colored or poor quality illustrations and photographs, print bleedthrough, substandard margins, and improper alignment can adversely affect reproduction.

In the unlikely event that the author did not send UMI a complete manuscript and there are missing pages, these will be noted. Also, if unauthorized copyright material had to be removed, a note will indicate the deletion.

Oversize materials (e.g., maps, drawings, charts) are reproduced by sectioning the original, beginning at the upper left-hand corner and continuing from left to right in equal sections with small overlaps. Each original is also photographed in one exposure and is included in reduced form at the back of the book.

Photographs included in the original manuscript have been reproduced xerographically in this copy. Higher quality 6” x 9” black and white photographic prints are available for any photographs or illustrations appearing in this copy for an additional charge. Contact UMI directly to order.

UMI
A Bell & Howell Information Company
300 North Zeeb Road, Ann Arbor MI 48106-1346 USA
313/761-4700 800/521-0600

Simulation-Based Estimation of Phylogenetic Trees

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By Laura A. Salter, B.A., M.S.

* * * * *

The Ohio State University
1999

Dissertation Committee:
Professor Dennis K. Pearl, Adviser
Professor L. Mark Berliner
Professor Paul Fuerst
Professor Joseph Verducci

Approved by: Adviser, Department of Statistics

UMI Number: 9931673
UMI Microform 9931673. Copyright 1999, by UMI Company. All rights reserved. This microform edition is protected against unauthorized copying under Title 17, United States Code.
UMI, 300 North Zeeb Road, Ann Arbor, MI 48103

ABSTRACT

A common goal in the analysis of nucleotide sequence data is the inference of the phylogenetic history of the sequences under consideration. Many criteria for the selection of a phylogenetic representation of the data have been developed. We focus here on two criteria: the maximum likelihood criteria and the parsimony criteria.

The maximum likelihood method of phylogenetic tree construction has several advantages over other criteria, including the interpretability of the underlying Markov models, consistency in the statistical sense, and the possibility of statistical testing of hypotheses using the likelihood framework. However, use of the maximum likelihood method in practice has been limited because the method is computationally intensive, especially when the number of sequences under consideration is large. We therefore propose a stochastic search algorithm for estimation of the maximum likelihood tree. The method significantly reduces the computation time involved in constructing the maximum likelihood tree, and in many cases returns an estimate of the phylogeny that has a higher likelihood than those returned by the methods currently in use. We give some convergence results for the algorithm and apply it to several theoretical and real data sets. The algorithm can also be extended to allow for simultaneous estimation of the tree and the model parameters. Examples of the application of the extended algorithm are also given.

Parsimony is currently one of the most widely used phylogenetic tree construction methods.
However, current implementations of the parsimony method can be shown to give locally optimal estimates of the phylogeny when a large number of sequences are considered. We have developed a simulated annealing algorithm for estimation of phylogenetic trees under the parsimony criteria, in the hope that such an algorithm would be less prone to entrapment in local minima. Though our algorithm does show reasonable ability to locate the most parsimonious tree, it does so at the expense of computing time. This result is in agreement with previous literature concerning the use of simulated annealing in estimating phylogenetic trees under the parsimony criteria. We provide convergence results for our algorithm and apply the method to several examples.

This is dedicated to my parents and my sister

ACKNOWLEDGMENTS

I would like to thank my family, my friends, and especially Justin, for the constant encouragement and support they have given me.

I am grateful to the members of my committee for their insight and suggestions concerning this research. I would especially like to thank Dr. Berliner for his assistance with the results in Chapter 4.

Finally, I am extremely grateful to Dr. Pearl for his continued advice, encouragement, dedication, and understanding throughout my thesis work.

VITA

April 11, 1972: Born - New Orleans, Louisiana USA
1994: B.A. Biology, Mathematics
1996: M.S. Statistics
1997-present: Graduate Research Associate, The Ohio State University.

FIELDS OF STUDY

Major Field: Biostatistics

TABLE OF CONTENTS

Abstract
Dedication
Acknowledgments
Vita
List of Tables
List of Figures

Chapters:

1. Introduction and Literature Review
   1.1 Phylogenetic Trees and Reconstruction Methods
   1.2 Simulated Annealing and Stochastic Probing
2. A Stochastic Search Strategy for Estimation of the Maximum Likelihood Tree
   2.1 Calculation of the Likelihood
   2.2 A Stochastic Search Algorithm
       2.2.1 The Generation Scheme
       2.2.2 The Cooling Schedule
       2.2.3 The Stopping Rule
   2.3 Simultaneous Estimation of the Tree and the Substitution Model Parameters
       2.3.1 Estimation of the Nucleotide Frequency Parameters
       2.3.2 Estimation of Other Substitution Model Parameters
   2.4 Computer Implementation
3. A Simulated Annealing Algorithm for Estimation of Phylogenetic Trees Under the Parsimony Criteria
   3.1 The Parsimony Criteria
   3.2 Estimating the Most Parsimonious Tree(s) Using Simulated Annealing
       3.2.1 A New Simulated Annealing Algorithm for Estimation of the Most Parsimonious Tree(s)
       3.2.2 Computer Implementation
4. Properties of the Algorithms
   4.1 Convergence Results for the Stochastic Search Algorithm
   4.2 Convergence Results for the Simulated Annealing Algorithm for Estimation of the Most Parsimonious Tree(s)
5. Applications
   5.1 Estimation of the ML Tree for Fixed Parameter Values
       5.1.1 Theoretical Data
       5.1.2 Mitochondrial DNA Sequences
       5.1.3 Group A Papillomavirus Sequences
       5.1.4 Analysis of the env Region for 30 HIV Sequences
   5.2 Simultaneous Estimation of the Tree and Substitution Model Parameters
   5.3 Estimation of the Most Parsimonious Tree(s)
6. Conclusion and
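
The abstract above describes a simulated annealing search for trees under the parsimony criteria. As a rough illustration only, and not the dissertation's algorithm, the Python sketch below shows the two ingredients such a search typically needs: a parsimony score for a candidate tree (here Fitch's counting rule for a single site on a rooted binary tree) and a Metropolis-style test that sometimes accepts a worse tree so the search can leave a local minimum. The function names, the nested-tuple tree encoding, and the example data are assumptions made for this sketch.

```python
import math
import random


def fitch_score(tree, leaf_states):
    """Parsimony length of one site on a rooted binary tree (Fitch counting).

    tree        -- nested 2-tuples with leaf names (str) at the tips
                   (hypothetical encoding chosen for this sketch)
    leaf_states -- dict mapping each leaf name to its observed character
    """
    def visit(node):
        # A leaf contributes its observed state and no changes.
        if isinstance(node, str):
            return {leaf_states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:
            # Children agree on at least one state: no extra change needed.
            return common, left_cost + right_cost
        # Children disagree: take the union and count one change.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def accept(delta, temperature, rng=random):
    """Metropolis rule: always accept a move that does not raise the score;
    accept a worse move with probability exp(-delta / temperature)."""
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)


# Illustrative data: one site observed in four sequences.
states = {"A": "A", "B": "A", "C": "G", "D": "G"}
current = (("A", "C"), ("B", "D"))   # groups unlike states together
proposal = (("A", "B"), ("C", "D"))  # groups like states together

delta = fitch_score(proposal, states) - fitch_score(current, states)
moved = accept(delta, temperature=1.0)  # delta is negative, so the move is accepted
```

In a full search the temperature would be lowered gradually according to a cooling schedule, and proposals would come from a tree rearrangement scheme; both are left out here because the text above does not specify them.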
Recommended publications
  • Uncorrelated Genetic Drift of Gene Frequencies and Linkage
    Genet. Res., Camb. (1974), 24, pp. 281-294. Printed in Great Britain. Uncorrelated genetic drift of gene frequencies and linkage disequilibrium in some models of linked overdominant polymorphisms. By Joseph Felsenstein, Department of Genetics, University of Washington, Seattle, Washington 98195 (Received 15 April 1974). SUMMARY For large population sizes, gene frequencies p and q at two linked overdominant loci and the linkage disequilibrium parameter D will remain close to their equilibrium values. We can treat selection and recombination as approximately linear forces on p, q and D, and we can treat genetic drift as a multivariate normal perturbation with constant variance-covariance matrix. For the additive-multiplicative family of two-locus models, p, q and D are shown to be (approximately) uncorrelated. Expressions for their variances are obtained. When selection coefficients are small the variances of p and q are those previously given by Robertson for a single locus. For small recombination fractions the variance of D is that obtained for neutral loci by Ohta & Kimura. For larger recombination fractions the result differs from theirs, so that for unlinked loci r² ≈ 2/(3N) instead of 1/(2N). For the Lewontin-Kojima and Bodmer symmetric viability models, and for a model symmetric at only one of the loci, a more exact argument is possible. In the asymptotic conditional distribution in these cases, various of p, q and D are uncorrelated, depending on the type of symmetry in the model. 1. INTRODUCTION Much of the work on linked genes in the last few years has centred on the deterministic theory of natural selection of linked polymorphisms.
  • Workshop on Molecular Evolution July 27-August 8, 2003 (Extended Special Topics Session August 8-August 15, 2003) Course Director
    Workshop on Molecular Evolution July 27-August 8, 2003 (Extended Special Topics Session August 8-August 15, 2003) Course Director: Michael P. Cummings, University of Maryland and Marine Biological Laboratory Molecular evolution has become the nexus of many areas of biological research. It both brings together and enriches such areas as biochemistry, molecular biology, microbiology, population genetics, systematics, developmental biology, genomics, bioinformatics, in vitro evolution, and molecular ecology. The Workshop provides an important contribution to these fields in that it promotes interdisciplinary research and interaction, and thus provides a glue that sticks together disparate fields. Due to the wide range of fields addressed by the study of molecular evolution, it is difficult to offer a comprehensive course in a university setting. It is rare for a single institution to maintain expertise in all necessary areas. In contrast, the Workshop is uniquely able to provide necessary breadth and depth by utilizing a large number of faculty with appropriate expertise. Furthermore, the flexible nature of the Workshop allows for rapid adaptation to changes in the dynamic field of molecular evolution. For example, the 2003 Workshop included recently emergent research areas of molecular evolution of development and genomics. The interest in the Workshop remains very strong and is increasing. The number of applications for the 2003 course was 143, continuing the trend of increased applications since 2000. In 2003 there were 60 students participating in the Workshop, which was taught by 19 faculty and 4 teaching assistants. The students came from all over the world (17 countries), and represented several career stages: graduate students (57%), postdoctoral researchers (13%), faculty/principal investigators (27%), and other (3%).
  • American Society of Naturalists Honorary Lifetime Membership Awards
    The University of Chicago. American Society of Naturalists Honorary Lifetime Membership Awards. Source: The American Naturalist, Vol. 183, No. 6 (June 2014), pp. ii-v. Published by: The University of Chicago Press for The American Society of Naturalists. Stable URL: http://www.jstor.org/stable/10.1086/676468. American Society of Naturalists Honorary Lifetime Membership Awards. Jane Lubchenco. The American Society of Naturalists is pleased to award Jane Lubchenco an Honorary Lifetime Membership. Jane received her BA in biology from Colorado College, an MS in zoology from the University of Washington, and her doctorate from Harvard University, studying with [...] step of effective involvement in the political processes that direct and fund our endeavors. She has cofounded three organizations: the Leopold Leadership Program, the Communications Partnership for Science and the Sea, and Climate Central.
  • JUMPSTARTING PHYLOGENETIC SEARCHES by Jesse L. Mecham A
    JUMPSTARTING PHYLOGENETIC SEARCHES by Jesse L. Mecham. A thesis submitted to the faculty of Brigham Young University in partial fulfillment of the requirements for the degree of Master of Science. Department of Computer Science, Brigham Young University, April 2006. Copyright © 2006 Jesse L. Mecham. All Rights Reserved. BRIGHAM YOUNG UNIVERSITY GRADUATE COMMITTEE APPROVAL of a thesis submitted by Jesse L. Mecham. This thesis has been read by each member of the following graduate committee and by majority vote has been found to be satisfactory. Mark J. Clement, Chair; Quinn O. Snell; Dennis Ng. BRIGHAM YOUNG UNIVERSITY As chair of the candidate’s graduate committee, I have read the thesis of Jesse L. Mecham in its final form and have found that (1) its format, citations, and bibliographical style are consistent and acceptable and fulfill university and department style requirements; (2) its illustrative materials including figures, tables, and charts are in place; and (3) the final manuscript is satisfactory to the graduate committee and is ready for submission to the university library. Mark J. Clement, Chair, Graduate Committee. Accepted for the Department: Parris Egbert, Graduate Coordinator. Accepted for the College: Tony Martinez, Dean, College of Engineering and Technology. ABSTRACT JUMPSTARTING PHYLOGENETIC SEARCHES Jesse L. Mecham, Department of Computer Science, Master of Science. Phylogenetic analysis is a central tool in studies of comparative genomics. When a new region of DNA is isolated and sequenced, researchers are often forced to throw away months of computation on an existing phylogeny of homologous sequences in order to incorporate this new sequence.
  • A General Population Genetic Theory for the Evolution of Developmental Interactions
    A general population genetic theory for the evolution of developmental interactions. Sean H. Rice, Department of Ecology and Evolutionary Biology, Osborn Memorial Laboratories, Yale University, New Haven, CT 06520. Communicated by Joseph Felsenstein, University of Washington, Seattle, WA, October 11, 2002 (received for review October 7, 2001). The development of most phenotypic traits involves complex interactions between many underlying factors, both genetic and environmental. To study the evolution of such processes, a set of mathematical relationships is derived that describe how selection acts to change the distribution of genetic variation given arbitrarily complex developmental interactions and any distribution of genetic and environmental variation. The result is illustrated by using it to derive models for the evolution of dominance and for the evolutionary consequences of asymmetry in the distribution of genetic variation. During development of a phenotypic trait, gene products interact in highly nonadditive ways with one another and [...] The geometry of this surface is determined by how the underlying factors interact to influence phenotype, in other words, by development. There is a straightforward relation between the terminology of gene interaction and the geometry of the phenotype landscape. If u1 and u2 are genetic (heritable) factors, then the degree to which the value of u1 influences the phenotypic consequences of changing u2 (i.e., the epistatic interaction between them) is ∂²φ/∂u1∂u2, which measures the curvature of the landscape in one direction. Similarly, ∂²φ/∂u1² measures the nonlinear effects of changing u1 and thus (when u1 is genetic) provides a measure of dominance (10). If u1 is a genetic factor and u2 an environmental factor, then ∂²φ/∂u1∂u2 measures the genotype by environment (G × E) interaction.
  • Dobzhansky's Evolution of Tropical Populations, and the Science and Politics Of
    CARVALHO, Tito. “A most bountiful source of inspiration:” Dobzhansky’s evolution of tropical populations, and the science and politics of genetic variation. História, Ciências, Saúde – Manguinhos, Rio de Janeiro, v.26, n.1, jan.-mar. 2019, p.281-297. Portuguese title: “A mais abundante fonte de inspiração”: Dobzhansky e sua evolução sobre as populações dos trópicos, a ciência e a política da variabilidade genética. Abstract Theodosius Dobzhansky has been studied for how he integrated field naturalism and laboratory experimentation in ways that helped produce the Modern Synthesis, as well as how he leveraged biological expertise to support liberal and cosmopolitan values amidst Second World War and the Cold War. Moreover, Dobzhansky has been central in analyses of the institutionalization of genetics in Brazil, where he spent several years. This article situates Dobzhansky’s Brazilian research within the science of variation and the politics of diversity. I conclude by raising questions about how the ways in which science figured in politics depended on ideas about the role of scientists in society which were advanced in parallel, suggesting research on the “co-production” of natural and social orders. Keywords: evolutionary genetics; transnational science; eugenics; race; tropics; Theodosius Dobzhansky (1900-1975).
  • Indo-European Phylogenetics with R a Tutorial Introduction
    Indo-European Linguistics (2020) 1–71, brill.com/ieul. Indo-European phylogenetics with R: A tutorial introduction. David Goldstein, University of California, Los Angeles, CA, USA. © David Goldstein, 2020 | doi:10.1163/22125892-20201000. This is an open access article distributed under the terms of the CC BY-NC 4.0 license. Abstract The last twenty or so years have witnessed a dramatic increase in the use of computational methods for inferring linguistic phylogenies. Although the results of this research have been controversial, the methods themselves are an undeniable boon for historical and Indo-European linguistics, if for no other reason than that they allow the field to pursue questions that were previously intractable. After a review of the advantages and disadvantages of computational phylogenetic methods, I introduce the following methods of phylogenetic inference in R: maximum parsimony; distance-based methods (UPGMA and neighbor joining); and maximum likelihood estimation. I discuss the strengths and weaknesses of each of these methods and in addition explicate various measures associated with phylogenetic estimation, including homoplasy indices and bootstrapping. Phylogenetic inference is carried out on the Indo-European dataset compiled by Don Ringe and Ann Taylor, which includes phonological, morphological, and lexical characters. Keywords phylogenetics – computational methods – parsimony – UPGMA – neighbor joining – maximum likelihood – homoplasy – bootstrapping 1 Introduction Phylogenetic trees model linguistic descent. More specifically, they are hypotheses about the order of lineage-splitting events from an often unobservable common ancestor to a set of observable descendants (Bowern & Koch 2004: 8–9, Pagel 2017: 152). The phylogeny of the Indo-European languages is a mat-
  • Phylip and Phylogenetics
    Genes, Genomes and Genomics ©2009 Global Science Books. Phylip and Phylogenetics. Ahmed Mansour, Genetics Department, Faculty of Agriculture, Zagazig University, Zagazig, Egypt. ABSTRACT Phylogenetics studies are mainly concerned with evolutionary relatedness among various groups of organisms. Recently, phylogenetic analyses have been performed on a genomic scale to address issues ranging from the prediction of gene and protein function to organismal relationships. Computing the relatedness of organisms either by phylogenetic (gene by gene analyses) or phylogenomic (the whole genome comparison) methods reveals high-quality results for demonstrating phylogenies. In this regard, Phylip (Phylogeny Inference Package) software is a free package of programs for inferring phylogenies of living species and organisms. It is now one of the most widely used packages for computing accurate phylogenetic trees and carrying out certain related tasks. This paper provides an overview on Phylip package and its applications and contribution to phylogenetic analyses. Keywords: bioinformatics, evolutionary relatedness, genetic diversity. INTRODUCTION The word phylogenetics is derived from the Greek words, phylon, which means tribe or race, and genetikos, which means birth. Phylogenetic [...] Phylip: Different useful programs The PHYLIP programs could be classified into five categories (Table 1):
  • Phylogeny of Five Taxa in the Felsenstein and Farris Zones
    Phylogeny of Five Taxa in the Felsenstein and Farris Zones. Eric Trung Lam. Thesis submitted to the Faculty of Science in partial fulfillment of the requirements for the degree of Masters in Statistics with Specialization in Bioinformatics (the program is a joint program with Carleton University, administered by the Ottawa-Carleton Institute of Mathematics and Statistics). Mathematics and Statistics, Faculty of Science, University of Ottawa. © Eric Trung Lam, Ottawa, Canada, 2021. Abstract Mathematical conditions which showed where parsimony was not consistent for four taxa were first introduced by Felsenstein in 1978. This was subsequently labelled the “Felsenstein zone”. Following Felsenstein's findings, ‘frequentists’ conjectured that for five taxa there would also be a region in parameter space where parsimony is not consistent. In response, ‘cladists’ claimed that parsimony was consistent in a different region of parameter space, which is called the “Farris zone”. However, no analytical description of the region in which this consistency occurs has been made. Furthermore, no mathematical extensions of this Felsenstein theory to five taxa or more has been made. The same is true for the Farris zone. In this thesis, we give a complete account for the Felsenstein zone and Farris zone for four and five taxa and interpret these in terms of the shape of the phylogenetic tree. Dedications I would like to dedicate this to my parents and sister, whose love and support has helped me throughout my university career. Acknowledgements Firstly, I would like to express my thanks and gratitude to my supervisor, David Sankoff, without whom this thesis would not have been completed.
  • The Troubled Growth of Statistical Phylogenetics
    Syst. Biol. 50(4):465–467, 2001. The Troubled Growth of Statistical Phylogenetics. Joseph Felsenstein, Department of Genetics, University of Washington, Seattle, Washington 98919 USA. Statistical inference of phylogenies almost didn’t happen. The story of the origin, growth, and spread of “statistical phylogenetics” needs to be told, because it is so strange. It is not the straightforward story of gradual spread that one might imagine. It starts with the development of numerical methods in systematics, whose modern proponents were Sokal and Sneath. Their work, embodied in their book Principles of Numerical Taxonomy, set off an explosion of work by mathematical clusterers, but did not win many converts in systematics. In the early 1960s two groups started work on numerical inference of phylogenies. Edwards and Cavalli-Sforza, working on trees of human populations and using gene frequencies, invented parsimony and distance matrix methods. [...] of a small band of pioneers exploring new territory. The stage would now seem set for a gradual spread of statistical methods, but reality was not to be this simple. In 1969 I began to attend the annual Numerical Taxonomy conferences convened by Bob Sokal. In 1971, at that meeting in Ann Arbor, Gareth Nelson advocated Willi Hennig’s strictly monophyletic classification. It became clear that some systematists wanted to take a well-defined, almost algorithmic approach. Hennig set forth well-defined methods for inferring phylogenies (provided there was no internal conflict in the data), an approach with enormous appeal to a new generation of morphological systematists. The difficulty was that although well-
  • Punctuated Evolution Shaped Modern Vertebrate Diversity
    bioRxiv preprint doi: https://doi.org/10.1101/151175; this version posted June 18, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. Punctuated evolution shaped modern vertebrate diversity. Michael J. Landis (1) and Joshua G. Schraiber (2, 3). (1) Department of Ecology and Evolutionary Biology, Yale University; (2) Department of Biology, Temple University; (3) Institute for Genomics and Evolutionary Medicine, Temple University. June 16, 2017. Abstract The relative importance of different modes of evolution in shaping phenotypic diversity remains a hotly debated question. Fossil data suggest that stasis may be a common mode of evolution, while modern data suggest very fast rates of evolution. One way to reconcile these observations is to imagine that evolution is punctuated, rather than gradual, on geological time scales. To test this hypothesis, we developed a novel maximum likelihood framework for fitting Lévy processes to comparative morphological data. This class of stochastic processes includes both a gradual and punctuated component. We found that a plurality of modern vertebrate clades examined are best fit by punctuated processes over models of gradual change, gradual stasis, and adaptive radiation. When we compare our results to theoretical expectations of the rate and speed of regime shifts for models that detail fitness landscape dynamics, we find that our quantitative results are broadly compatible with both microevolutionary models and with observations from the fossil record. A key debate in evolutionary biology centers around the seeming contradictions regarding the tempo and mode of evolution as seen in fossil data compared to ecological data.
  • A Primer to Phylogenetic Analysis Using the PHYLIP Package
    A primer to phylogenetic analysis using the PHYLIP package. Jarno Tuimala. Fifth Edition. All rights reserved. The PDF version of this book or parts of it can be used in Finnish universities as course material, provided that this copyright notice is included. However, this publication may not be sold or included as part of other publications without permission of the publisher. © The author and CSC – Scientific Computing Ltd. 2006. ISBN 952-5520-02-1. Index: Index; Preface; Introduction; What is PHYLIP?; Installation; User interface; Getting started – datafiles and programs; Always keep