Hierarchical Phylogeny Construction

Total Page:16

File Type:pdf, Size:1020Kb

Hierarchical Phylogeny Construction Iowa State University Capstones, Theses and Graduate Theses and Dissertations Dissertations 2019 Hierarchical phylogeny construction Anindya Das Iowa State University Follow this and additional works at: https://lib.dr.iastate.edu/etd Part of the Bioinformatics Commons, and the Computer Sciences Commons Recommended Citation Das, Anindya, "Hierarchical phylogeny construction" (2019). Graduate Theses and Dissertations. 17433. https://lib.dr.iastate.edu/etd/17433 This Dissertation is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Graduate Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository. For more information, please contact [email protected]. Hierarchical phylogeny construction by Anindya Das A dissertation submitted to the graduate faculty in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Major: Computer Science Program of Study Committee: Xiaoqiu Huang, Major Professor David Fernandez-Baca Oliver Eulenstein Peng Liu Dennis V Lavrov The student author, whose presentation of the scholarship herein was approved by the program of study committee, is solely responsible for the content of this dissertation. The Graduate College will ensure this dissertation is globally accessible and will not permit alterations after a degree is conferred. Iowa State University Ames, Iowa 2019 Copyright c Anindya Das, 2019. All rights reserved. ii DEDICATION I would like to dedicate this dissertation to my wife Soma and to my parents without whose support I would not have been able to complete this work. I would also like to thank my friends and family for their loving guidance and continuous encouragement and assistance during the writing of this work. iii TABLE OF CONTENTS Page LIST OF TABLES . vi LIST OF FIGURES . viii ACKNOWLEDGMENTS . .x ABSTRACT . xi CHAPTER 1. OVERVIEW . .1 1.1 Introduction . .2 1.2 Outline . .3 1.2.1 Background And Literature Review . .3 1.2.2 Identification Of Informative Regions For Construction Of Phylogeny . .3 1.2.3 Hierarchical Phylogeny Construction . .4 1.2.4 Summary And Conclusion . .4 CHAPTER 2. BACKGROUND . .5 2.1 Phylogeny . .5 2.1.1 Branch Length . .6 2.1.2 Bipartition . .6 2.2 Genome Sequence Alignment . .6 2.2.1 Construction Of Alignment From SNPs . .7 2.3 Construction Of Phylogeny From Sequence Alignment . .8 2.4 Models Of DNA Evolution . .8 2.4.1 Jukes-Cantor Model . .9 2.4.2 K80 Model . .9 iv 2.4.3 GTR Model . .9 2.5 Methods For Computing Phylogeny From Sequence Alignment . 10 2.5.1 Distance Based Methods . 10 2.5.2 Maximum Parsimony Method . 11 2.5.3 Bayesian Method . 11 2.5.4 Maximum Likelihood Method . 11 2.6 Programs For Maximum Likelihood Phylogeny Construction . 16 2.6.1 PhyML . 16 2.6.2 RAxML . 17 2.6.3 IQ-TREE . 17 2.6.4 FastTree . 17 2.7 Shortcomings Of Existing Approach . 18 CHAPTER 3. RECONSTRUCTION OF LOW RESOLUTION CLADES . 20 3.1 Introduction . 20 3.2 Methodology . 22 3.2.1 Detection Of Informative Regions . 22 3.2.2 Selection Of Clades With Low Resolution . 25 3.2.3 Reconstruction Of Phylogeny Of Closely Related Isolates . 26 3.3 Results . 27 3.3.1 Real Data . 27 3.3.2 Simulated Data . 28 3.4 Discussion . 32 v CHAPTER 4. HIERARCHICAL PHYLOGENY CONSTRUCTION . 34 4.1 Introduction . 34 4.2 Methods . 35 4.2.1 Partition Of Isolates . 36 4.2.2 Construction Of Phylogeny For The Current Group . 38 4.2.3 Learning Optimal Values For Parameters of HPC . 38 4.2.4 Workflow of HPC . 40 4.3 Results . 41 4.3.1 Datasets With Two Levels Of Sequence Similarity . 42 4.3.2 Comparison Of HPC With Existing Approach . 43 4.3.3 Datasets With Three Levels Of Sequence Similarity . 48 4.3.4 Computational Resources And Average Wall-clock Time . 55 4.3.5 Settings And Commands For RAxML, IQ-TREE And FastTree . 56 4.4 Discussion . 57 CHAPTER 5. SUMMARY AND CONCLUSION . 74 5.1 Summary . 74 5.1.1 Identification Of Informative Columns And Closely Related Isolates . 74 5.1.2 Hierarchical Phylogeny Construction . 75 5.2 Conclusion . 76 BIBLIOGRAPHY . 77 vi LIST OF TABLES Page Table 3.1 Deviation of likelihood for the alignment S ................... 23 Table 3.2 Noise alignment from sequence alignment . 23 Table 3.3 Deviation of likelihood for the alignment S0 .................. 24 Table 3.4 Number (percentage) of informative columns . 30 Table 3.5 Robinson-Foulds distance of trees built from whole alignment (Tw) and from informative columns (Tr)............................. 31 Table 4.1 Robinson-Foulds distance (average over 5 instances) from true phylogenies for various lengths of alignment . 39 Table 4.2 Time required by three versions of HPC and three existing programs . 44 Table 4.3 Sum of normalized Robinson-Foulds distances over 24 groups in 40 Wide alignments . 60 Table 4.4 Sum of normalized Robinson-Foulds distances over 24 groups in 40 Wide alignments . 61 Table 4.5 Sum of normalized Robinson-Foulds distances over 24 groups in 40 Wide alignments . 62 Table 4.6 Sum of normalized Robinson-Foulds distances over 24 groups in 40 Wide alignments . 63 Table 4.7 Sum of normalized Robinson-Foulds distances over 24 groups in 40 Wide alignments . 64 Table 4.8 Sum of normalized Robinson-Foulds distances over 24 groups in 40 Wide alignments . 65 vii Table 4.9 Sum of normalized Robinson-Foulds distances over 24 groups in 40 Wide alignments . 66 Table 4.10 Sum of normalized Robinson-Foulds distances over 24 groups in 20 Wide alignments . 67 Table 4.11 Sum of normalized Robinson-Foulds distances over 24 groups in 40 Narrow alignments . 68 Table 4.12 Sum of normalized Robinson-Foulds distances over 24 groups in 40 Narrow alignments . 69 Table 4.13 Sum of normalized Robinson-Foulds distances over 24 groups in 40 Narrow alignments . 70 Table 4.14 Sum of normalized Robinson-Foulds distances over 24 groups in 40 Narrow alignments . 71 Table 4.15 Sum of normalized Robinson-Foulds distances over 24 groups in 40 Narrow alignments . 72 Table 4.16 Average normalized Robinson-Foulds distance for each program . 73 Table 4.17 Normalized Robinson-Foulds distance for 16 subgroups in 10 alignments . 73 viii LIST OF FIGURES Page Figure 2.1 Example of a phylogenetic tree of 4 species . .5 Figure 2.2 Example of an alignment of 3 species with 4 columns . 12 Figure 2.3 A tree of 3 species for calculating the likelihood . 13 Figure 2.4 A tree of 3 species with character states in the nodes . 13 Figure 2.5 Nearest neighbor interchange on an edge . 15 Figure 3.1 Algorithm for selection of informative columns from a sequence alignment and corresponding maximum likelihood tree . 26 Figure 3.2 The maximum likelihood tree of 11 Fusarium isolates from an alignment of 137; 718 columns . 28 Figure 3.3 The maximum likelihood tree of 4 F. virguliforme isolates from an alignment of 319 informative columns . 29.
Recommended publications
  • Phylogenetic Analysis of Anostracans (Branchiopoda: Anostraca) Inferred from Nuclear 18S Ribosomal DNA (18S Rdna) Sequences
    MOLECULAR PHYLOGENETICS AND EVOLUTION Molecular Phylogenetics and Evolution 25 (2002) 535–544 www.academicpress.com Phylogenetic analysis of anostracans (Branchiopoda: Anostraca) inferred from nuclear 18S ribosomal DNA (18S rDNA) sequences Peter H.H. Weekers,a,* Gopal Murugan,a,1 Jacques R. Vanfleteren,a Denton Belk,b and Henri J. Dumonta a Department of Biology, Ghent University, Ledeganckstraat 35, B-9000 Ghent, Belgium b Biology Department, Our Lady of the Lake University of San Antonio, San Antonio, TX 78207, USA Received 20 February 2001; received in revised form 18 June 2002 Abstract The nuclear small subunit ribosomal DNA (18S rDNA) of 27 anostracans (Branchiopoda: Anostraca) belonging to 14 genera and eight out of nine traditionally recognized families has been sequenced and used for phylogenetic analysis. The 18S rDNA phylogeny shows that the anostracans are monophyletic. The taxa under examination form two clades of subordinal level and eight clades of family level. Two families the Polyartemiidae and Linderiellidae are suppressed and merged with the Chirocephalidae, of which together they form a subfamily. In contrast, the Parartemiinae are removed from the Branchipodidae, raised to family level (Parartemiidae) and cluster as a sister group to the Artemiidae in a clade defined here as the Artemiina (new suborder). A number of morphological traits support this new suborder. The Branchipodidae are separated into two families, the Branchipodidae and Ta- nymastigidae (new family). The relationship between Dendrocephalus and Thamnocephalus requires further study and needs the addition of Branchinella sequences to decide whether the Thamnocephalidae are monophyletic. Surprisingly, Polyartemiella hazeni and Polyartemia forcipata (‘‘Family’’ Polyartemiidae), with 17 and 19 thoracic segments and pairs of trunk limb as opposed to all other anostracans with only 11 pairs, do not cluster but are separated by Linderiella santarosae (‘‘Family’’ Linderiellidae), which has 11 pairs of trunk limbs.
    [Show full text]
  • Investgating Determinants of Phylogeneic Accuracy
    IMPACT OF MOLECULAR EVOLUTIONARY FOOTPRINTS ON PHYLOGENETIC ACCURACY – A SIMULATION STUDY Dissertation Submitted to The College of Arts and Sciences of the UNIVERSITY OF DAYTON In Partial Fulfillment of the Requirements for The Degree Doctor of Philosophy in Biology by Bhakti Dwivedi UNIVERSITY OF DAYTON August, 2009 i APPROVED BY: _________________________ Gadagkar, R. Sudhindra Ph.D. Major Advisor _________________________ Robinson, Jayne Ph.D. Committee Member Chair Department of Biology _________________________ Nielsen, R. Mark Ph.D. Committee Member _________________________ Rowe, J. John Ph.D. Committee Member _________________________ Goldman, Dan Ph.D. Committee Member ii ABSTRACT IMPACT OF MOLECULAR EVOLUTIONARY FOOTPRINTS ON PHYLOGENETIC ACCURACY – A SIMULATION STUDY Dwivedi Bhakti University of Dayton Advisor: Dr. Sudhindra R. Gadagkar An accurately inferred phylogeny is important to the study of molecular evolution. Factors impacting the accuracy of a phylogenetic tree can be traced to several consecutive steps leading to the inference of the phylogeny. In this simulation-based study our focus is on the impact of the certain evolutionary features of the nucleotide sequences themselves in the alignment rather than any source of error during the process of sequence alignment or due to the choice of the method of phylogenetic inference. Nucleotide sequences can be characterized by summary statistics such as sequence length and base composition. When two or more such sequences need to be compared to each other (as in an alignment prior to phylogenetic analysis) additional evolutionary features come into play, such as the overall rate of nucleotide substitution, the ratio of two specific instantaneous, rates of substitution (rate at which transitions and transversions occur), and the shape parameter, of the gamma distribution (that quantifies the extent of iii heterogeneity in substitution rate among sites in an alignment).
    [Show full text]
  • Eidesstattliche Erklärung
    ZENTRUM FÜR BIODIVERSITÄT UND NACHHALTIGE LANDNUTZUNG SEKTION BIODIVERSITÄT, ÖKOLOGIE UND NATURSCHUTZ CENTRE OF BIODIVERSITY AND SUSTAINABLE LAND USE SECTION: BIODIVERSITY, ECOLOGY AND NATURE CONSERVATION Mitochondrial genomes and the complex evolutionary history of the cercopithecine tribe Papionini Dissertation zur Erlangung des Doktorgrades der Mathematisch-Naturwissenschaftlichen Fakultäten der Georg-August-Universität zu Göttingen vorgelegt von Dipl. Biol. Rasmus Liedigk aus Westerstede Göttingen, September 2014 Referent: PD Dr. Christian Roos Korreferent: Prof. Dr. Eckhard Heymann Tag der mündlichen Prüfung: 19.9.2014 Table of content 1 General introduction .............................................................................................. 1 1.1 An introduction to phylogenetics ....................................................................... 1 1.2 Tribe Papionini – subfamily Cercopithecinae ..................................................... 3 1.2.1 Subtribe Papionina.................................................................................... 4 1.2.2 Subtribe Macacina, genus Macaca ........................................................... 5 1.3 Papionin fossils in Europe and Asia................................................................... 7 1.3.1 Fossils of Macaca ..................................................................................... 8 1.3.2 Fossils of Theropithecus ........................................................................... 9 1.4 The mitochondrial genome and its
    [Show full text]
  • Comparative Phylogeography As an Integrative Approach to Historical Biogeography
    Journal of Biogeography, 28, 819±825 Comparative phylogeography as an integrative approach to historical biogeography ABSTRACT GUEST EDITORIAL Phylogeography has become a powerful approach for elucidating contemporary geographical patterns of evolutionary subdivision within species and species complexes. A recent extension of this approach is the comparison of phylogeographic patterns of multiple co-distributed taxonomic groups, or `comparative phylogeography.' Recent comparative phylogeographic studies have revealed pervasive and previously unrecognized biogeographic patterns which suggest that vicariance has played a more important role in the historical development of modern biotic assemblages than current taxonomy would indicate. Despite the utility of comparative phylogeography for uncovering such `cryptic vicariance', this approach has yet to be embraced by some researchers as a valuable complement to other approaches to historical biogeography. We address here some of the common misconceptions surrounding comparative phylogeography, provide an example of this approach based on the boreal mammal fauna of North America, and argue that together with other approaches, comparative phylogeography can contribute importantly to our understanding of the relationship between earth history and biotic diversi®cation. Keywords Area cladistics, comparative phylogeography, historical biogeography, vicariance. INTRODUCTION In a recent guest editorial Humphries (2000) presented an overview of historical biogeography. He concentrated largely on comparisons
    [Show full text]
  • Phylogeny Codon Models • Last Lecture: Poor Man’S Way of Calculating Dn/Ds (Ka/Ks) • Tabulate Synonymous/Non-Synonymous Substitutions • Normalize by the Possibilities
    Phylogeny Codon models • Last lecture: poor man’s way of calculating dN/dS (Ka/Ks) • Tabulate synonymous/non-synonymous substitutions • Normalize by the possibilities • Transform to genetic distance KJC or Kk2p • In reality we use codon model • Amino acid substitution rates meet nucleotide models • Codon(nucleotide triplet) Codon model parameterization Stop codons are not allowed, reducing the matrix from 64x64 to 61x61 The entire codon matrix can be parameterized using: κ kappa, the transition/transversionratio ω omega, the dN/dS ratio – optimizing this parameter gives the an estimate of selection force πj the equilibrium codon frequency of codon j (Goldman and Yang. MBE 1994) Empirical codon substitution matrix Observations: Instantaneous rates of double nucleotide changes seem to be non-zero There should be a mechanism for mutating 2 adjacent nucleotides at once! (Kosiol and Goldman) • • Phylogeny • • Last lecture: Inferring distance from Phylogenetic trees given an alignment How to infer trees and distance distance How do we infer trees given an alignment • • Branch length Topology d 6-p E 6'B o F P Edo 3 vvi"oH!.- !fi*+nYolF r66HiH- .) Od-:oXP m a^--'*A ]9; E F: i ts X o Q I E itl Fl xo_-+,<Po r! UoaQrj*l.AP-^PA NJ o - +p-5 H .lXei:i'tH 'i,x+<ox;+x"'o 4 + = '" I = 9o FF^' ^X i! .poxHo dF*x€;. lqEgrE x< f <QrDGYa u5l =.ID * c 3 < 6+6_ y+ltl+5<->-^Hry ni F.O+O* E 3E E-f e= FaFO;o E rH y hl o < H ! E Y P /-)^\-B 91 X-6p-a' 6J.
    [Show full text]
  • University of Florida Thesis Or Dissertation Formatting
    TOOLS FOR BIODIVERSITY ANALYSES USING NATURAL HISTORY COLLECTIONS AND REPOSITORIES: DATA MINING, MACHINE LEARNING AND PHYLODIVERSITY By CHANDRA EARL A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY UNIVERSITY OF FLORIDA 2020 1 . © 2020 Chandra Earl 2 . ACKNOWLEDGMENTS I thank my co-chairs and members of my supervisory committee for their mentoring and generous support, my collaborators and colleagues for their input and support and my parents and siblings for their loving encouragement and interest. 3 . TABLE OF CONTENTS page ACKNOWLEDGMENTS .................................................................................................. 3 LIST OF TABLES ............................................................................................................ 5 LIST OF FIGURES .......................................................................................................... 6 ABSTRACT ..................................................................................................................... 8 CHAPTER 1 INTRODUCTION ...................................................................................................... 9 2 GENEDUMPER: A TOOL TO BUILD MEGAPHYLOGENIES FROM GENBANK DATA ...................................................................................................................... 12 Materials and Methods...........................................................................................
    [Show full text]
  • Family Classification
    1.0 GENERAL INTRODUCTION 1.1 Henckelia sect. Loxocarpus Loxocarpus R.Br., a taxon characterised by flowers with two stamens and plagiocarpic (held at an angle of 90–135° with pedicel) capsular fruit that splits dorsally has been treated as a section within Henckelia Spreng. (Weber & Burtt, 1998 [1997]). Loxocarpus as a genus was established based on L. incanus (Brown, 1839). It is principally recognised by its conical, short capsule with a broader base often with a hump-like swelling at the upper side (Banka & Kiew, 2009). It was reduced to sectional level within the genus Didymocarpus (Bentham, 1876; Clarke, 1883; Ridley, 1896) but again raised to generic level several times by different authors (Ridley, 1905; Burtt, 1958). In 1998, Weber & Burtt (1998 ['1997']) re-modelled Didymocarpus. Didymocarpus s.s. was redefined to a natural group, while most of the rest Malesian Didymocarpus s.l. and a few others morphologically close genera including Loxocarpus were transferred to Henckelia within which it was recognised as a section within. See Section 4.1 for its full taxonomic history. Molecular data now suggests that Henckelia sect. Loxocarpus is nested within ‗Twisted-fruited Asian and Malesian genera‘ group and distinct from other didymocarpoid genera (Möller et al. 2009; 2011). 1.2 State of knowledge and problem statements Henckelia sect. Loxocarpus includes 10 species in Peninsular Malaysia (with one species extending into Peninsular Thailand), 12 in Borneo, two in Sumatra and one in Lingga (Banka & Kiew, 2009). The genus Loxocarpus has never been monographed. Peninsular Malaysian taxa are well studied (Ridley, 1923; Banka, 1996; Banka & Kiew, 2009) but the Bornean and Sumatran taxa are poorly known.
    [Show full text]
  • The Probability of Monophyly of a Sample of Gene Lineages on a Species Tree
    PAPER The probability of monophyly of a sample of gene COLLOQUIUM lineages on a species tree Rohan S. Mehtaa,1, David Bryantb, and Noah A. Rosenberga aDepartment of Biology, Stanford University, Stanford, CA 94305; and bDepartment of Mathematics and Statistics, University of Otago, Dunedin 9054, New Zealand Edited by John C. Avise, University of California, Irvine, CA, and approved April 18, 2016 (received for review February 5, 2016) Monophyletic groups—groups that consist of all of the descendants loci that are reciprocally monophyletic is informative about the of a most recent common ancestor—arise naturally as a conse- time since species divergence and can assist in representing the quence of descent processes that result in meaningful distinctions level of differentiation between groups (4, 18). between organisms. Aspects of monophyly are therefore central to Many empirical investigations of genealogical phenomena have fields that examine and use genealogical descent. In particular, stud- made use of conceptual and statistical properties of monophyly ies in conservation genetics, phylogeography, population genetics, (19). Comparisons of observed monophyly levels to model pre- species delimitation, and systematics can all make use of mathemat- dictions have been used to provide information about species di- ical predictions under evolutionary models about features of mono- vergence times (20, 21). Model-based monophyly computations phyly. One important calculation, the probability that a set of gene have been used alongside DNA sequence differences between and lineages is monophyletic under a two-species neutral coalescent within proposed clades to argue for the existence of the clades model, has been used in many studies. Here, we extend this calcu- (22), and tests involving reciprocal monophyly have been used to lation for a species tree model that contains arbitrarily many species.
    [Show full text]
  • Resolving Postglacial Phylogeography Using High-Throughput Sequencing
    Resolving postglacial phylogeography using high-throughput sequencing Kevin J. Emerson1, Clayton R. Merz, Julian M. Catchen, Paul A. Hohenlohe, William A. Cresko, William E. Bradshaw, and Christina M. Holzapfel Center for Ecology and Evolutionary Biology, University of Oregon, Eugene, OR 97403-5289 Edited by David L. Denlinger, Ohio State University, Columbus, OH, and approved August 4, 2010 (received for review May 11, 2010) The distinction between model and nonmodel organisms is becom- not useful for determining the genetic similarity in closely related ing increasingly blurred. High-throughput, second-generation se- populations of nonmodel species or species for which whole-genome quencing approaches are being applied to organisms based on their resequencing is not yet possible. interesting ecological, physiological, developmental, or evolutionary Phylogeography and phylogenetics have recently benefited from properties and not on the depth of genetic information available for genome-wide SNP detection methods to elucidate patterns of fi them. Here, we illustrate this point using a low-cost, ef cient tech- variation in model taxa or their close relatives (7, 13). The limiting fi nique to determine the ne-scale phylogenetic relationships among step in the applicability of multilocus datasets in nonmodel organ- recently diverged populations in a species. This application of restric- isms has been generating the genetic markers to be used (14). tion site-associated DNA tags (RAD tags) reveals previously unre- solved genetic structure and direction of evolution in the pitcher Baird et al. (4) developed a second generation sequencing ap- plant mosquito, Wyeomyia smithii, from a southern Appalachian proach that allowed for the simultaneous discovery and typing of Mountain refugium following recession of the Laurentide Ice Sheet thousands of SNPs throughout the genome (5).
    [Show full text]
  • Genetics 540 Winter, 2001 Models of DNA Evolution – Part 2 Joe Felsenstein Department of Genetics University of Washington
    Genetics 540 Winter, 2001 Models of DNA evolution { part 2 Joe Felsenstein Department of Genetics University of Washington joe@genetics A model of variation in evolutionary rates among sites The basic idea is that the rate at each site is drawn independently from a distribution of rates. The most widely used choice is the Gamma distribution, which has density function (if its mean is 1): αα rα 1 e α r f(r) = − − Γ(α) Unrealistic aspects of the model: • There is no reason, aside from mathematical convenience, to assume that the Gamma is the right distribution. A common variation is to assume there is a separate probability f0 of having rate 0. • Rates at different sites appear to be correlated, which this model does not allow. • Rates are not constant throughout evolution { they change with time. α = 0.25 cv = 2 α = 11.1111 α = 1 cv = 0.3 frequency cv = 1 0 0.5 1 1.5 2 rate Gamma distributions with mean 1 and different coefficients of variation (standard deviation / mean). α = 1=CV 2 is the \shape parameter" of the Gamma distribution Hidden Markov Models These are the most widely used models allowing rate variation to be correlated along the sequence. We assume: • There are a finite number of rates, m. Rate i is ri. • There are probabilities pi of a site having rate i. • A process not visible to us (\hidden") assigns rates to sites. It is a Markov process working along the sequence. For example it might have transition probability Prob (jji) of changing to rate j in the next site, given that it is at rate i in this site.
    [Show full text]
  • A Comparative Phenetic and Cladistic Analysis of the Genus Holcaspis Chaudoir (Coleoptera: .Carabidae)
    Lincoln University Digital Thesis Copyright Statement The digital copy of this thesis is protected by the Copyright Act 1994 (New Zealand). This thesis may be consulted by you, provided you comply with the provisions of the Act and the following conditions of use: you will use the copy only for the purposes of research or private study you will recognise the author's right to be identified as the author of the thesis and due acknowledgement will be made to the author where appropriate you will obtain the author's permission before publishing any material from the thesis. A COMPARATIVE PHENETIC AND CLADISTIC ANALYSIS OF THE GENUS HOLCASPIS CHAUDOIR (COLEOPTERA: CARABIDAE) ********* A thesis submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy at Lincoln University by Yupa Hanboonsong ********* Lincoln University 1994 Abstract of a thesis submitted in partial fulfilment of the requirements for the degree of Ph.D. A comparative phenetic and cladistic analysis of the genus Holcaspis Chaudoir (Coleoptera: .Carabidae) by Yupa Hanboonsong The systematics of the endemic New Zealand carabid genus Holcaspis are investigated, using phenetic and cladistic methods, to construct phenetic and phylogenetic relationships. Three different character data sets: morphological, allozyme and random amplified polymorphic DNA (RAPD) based on the polymerase chain reaction (PCR), are used to estimate the relationships. Cladistic and morphometric analyses are undertaken on adult morphological characters. Twenty six external morphological characters, including male and female genitalia, are used for cladistic analysis. The results from the cladistic analysis are strongly congruent with previous publications. The morphometric analysis uses multivariate discriminant functions, with 18 morphometric variables, to derive a phenogram by clustering from Mahalanobis distances (D2) of the discrimination analysis using the unweighted pair-group method with arithmetical averages (UPGMA).
    [Show full text]
  • Math/C SC 5610 Computational Biology Lecture 12: Phylogenetics
    Math/C SC 5610 Computational Biology Lecture 12: Phylogenetics Stephen Billups University of Colorado at Denver Math/C SC 5610Computational Biology – p.1/25 Announcements Project Guidelines and Ideas are posted. (proposal due March 8) CCB Seminar, Friday (Mar. 4) Speaker: Jack Horner, SAIC Title: Phylogenetic Methods for Characterizing the Signature of Stage I Ovarian Cancer in Serum Protein Mas Time: 11-12 (Followed by lunch) Place: Media Center, AU008 Math/C SC 5610Computational Biology – p.2/25 Outline Distance based methods for phylogenetics UPGMA WPGMA Neighbor-Joining Character based methods Maximum Likelihood Maximum Parsimony Math/C SC 5610Computational Biology – p.3/25 Review: Distance Based Clustering Methods Main Idea: Requires a distance matrix D, (defining distances between each pair of elements). Repeatedly group together closest elements. Different algorithms differ by how they treat distances between groups. UPGMA (unweighted pair group method with arithmetic mean). WPGMA (weighted pair group method with arithmetic mean). Math/C SC 5610Computational Biology – p.4/25 UPGMA 1. Initialize C to the n singleton clusters f1g; : : : ; fng. 2. Initialize dist(c; d) on C by defining dist(fig; fjg) = D(i; j): 3. Repeat n ¡ 1 times: (a) determine pair c; d of clusters in C such that dist(c; d) is minimal; define dmin = dist(c; d). (b) define new cluster e = c S d; update C = C ¡ fc; dg Sfeg. (c) define a node with label e and daughters c; d, where e has distance dmin=2 to its leaves. (d) define for all f 2 C with f 6= e, dist(c; f) + dist(d; f) dist(e; f) = dist(f; e) = : (avg.
    [Show full text]