Abstract a Comprehensive Phylogenetic Study of The
Total Page:16
File Type:pdf, Size:1020Kb
ABSTRACT A COMPREHENSIVE PHYLOGENETIC STUDY OF THE CORE GENOME OF THIRTEEN BACILLUS AND RELATED SPECIES USING KNOWN AND NOVEL TECHNIQUES Jennifer Diane Hintzsche, Ph.D. Department of Biological Sciences Northern Illinois University, 2014 Mitrick A. Johns, Director A comprehensive study into the phylogeny of the Bacillus core genome was accomplished using two primary studies. The first study involved two previously studied methods, MUMmer and BSR, and one novel approach, RINC, to compare core genome phylogeny. BSR analysis revealed genomic rearrangements among the genomes of interest. MUMmer was used to confirm these genomic rearrangements and provide evidence that 76.2% of the inverted regions were statistically significant. Circular chromosome comparisons connecting homologous core genes revealed an ancestral inversion pivoted on the terminus in several species. The inversions were resolved, allowing for the identification of the location of the core genome before the inversion event. These analyses led to development of a novel approach to core genome phylogeny, RINC. Based on the tree produced, as well as agreement of groups of nodes with previous studies, the RINC approach was successful at comparing the core genome of Bacillus. RINC offers a simplistic yet powerful tool for core genome phylogeny. In the second study, core genomes of Bacillus and Eudicot were determined individually from whole genome sequences of members of each genus, along with closely related species using BSRs. The Eudicot core genome was used as a control for evidence that results observed in the Bacillus core genomes were not due to horizontal gene transfer. Each core genome was analyzed with MrBayes and BUCKy to test the robustness of both approaches using sequences from different domains of life. Three multiple sequence alignments, ClustalW2, MUSCLE, and T-COFFEE, were used as inputs for MrBayes. Over 75% of all genes studied had a resolved gene tree topology after one of ten million generations of MrBayes, regardless of MSA. BUCKy calculated concordance trees for both core genomes. BUCKy tree topology was not affected by the MSAs. In both core genomes, high concordance factors were found on interior nodes, defining the relationship between groups of species. Outer nodes between outgroups had lower concordance factors, leading to the conclusion that genomes of more species would need to be included in order to resolve the phylogeny from more distant relatives. NORTHERN ILLINOIS UNIVERSITY DEKALB, ILLINOIS DECEMBER 2014 A COMPREHENSIVE PHYLOGENETIC STUDY OF THE CORE GENOME OF THIRTEEN BACILLUS AND RELATED SPECIES USING KNOWN AND NOVEL TECHNIQUES BY JENNIFER DIANE HINTZSCHE ©2014 Jennifer Diane Hintzsche A DISSERTATION SUBMITTED TO THE GRADUATE SCHOOL IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE DOCTOR OF PHILOSOPHY DEPARTMENT OF BIOLOGICAL SCIENCES Doctoral Director: Mitrick A. Johns ACKNOWLEDGEMENTS I am most grateful to the many people who have contributed their time, expertise, ideas, and resources to help complete this project. First of all, thank you to Dr. Mitrick Johns for years of training, education, and support. Without y skill set and capabilities needed for the next step in my career. Thank you for taking a chance on me in the beginning. Thank you for knowing when to let my independence shine and also when I needed your guidance. Perhaps most of all, thank you for your continued inspiration and motivation to become the best scientist I can be. Dr. Gordon Pusch, thank you for your countless hours offering critical insight and expertise. Thank you also for providing me with training and access to the computational resources at Argonne National Laboratory, which was integral in my being able to complete this project. Thank you Dr. Melvin Duvall, for your phylogenetic expertise, advice, and mentoring. Thank you Dr. Thomas Sims, for advising me since the start of my graduate career, and your continued expertise and advice. Thank you Dr. Wesley Swingley, for your expertise and advice about this project and my career options. DEDICATION To my Mom, Dad, Jessica, Lera, and Ryan TABLE OF CONTENTS Page LIST OF TABLES .............................................................................................................. x LIST OF FIGURES ............................................................................................................. xi LIST OF APPENDICES ..................................................................................................... xiv Chapter 1. INTRODUCTION ............................................................................................... 1 Phylogeny of the Bacillus Genus .................................................................. 2 Significance of Species Studied .................................................................... 10 Outline of Study ............................................................................................ 18 2. BSR & CORE GENOME .................................................................................... 26 Introduction ................................................................................................... 26 Methods ......................................................................................................... 27 Bacillus Genomes ................................................................................. 27 Bidirectional Best Hits ......................................................................... 27 BLAST Score Ratio .............................................................................. 28 Plotting BSR ......................................................................................... 28 Results and Discussion .................................................................................. 29 Core Genome ........................................................................................ 29 BSR Plots .............................................................................................. 30 vii Chapter Page 3. MUMmer ............................................................................................................. 41 Introduction ................................................................................................... 41 Methods ......................................................................................................... 44 Number of MUMs ................................................................................ 44 Length of MUMs .................................................................................. 45 Results and Discussion .................................................................................. 46 Range of MUMmer Results .................................................................. 46 Total MUMs Found .............................................................................. 52 MUMs on the Forward Diagonal ......................................................... 56 MUMs on the Reverse Diagonal .......................................................... 61 Statistical Significance with Respect to the Number of MUMs ........... 65 Significance of the number of MUMs on the Forward Diagonal ........................................................................................ 66 Significance of the number of MUMs on the Reverse Diagonal ........................................................................................ 69 Statistical Significance with Respect to the Length of MUMs ............ 73 Significance of MUM Length on the Forward Diagonal .............. 73 Significance of MUM Length on the Reverse Diagonal............... 75 Conclusions ................................................................................................... 77 4. CIRCULAR CHROMOSOMAL COMPARISON ............................................. 79 Introduction ................................................................................................... 79 Methods ......................................................................................................... 80 viii Chapter Page Circular Chromosomal Comparisons ...................................................... 80 Resolved Inversion Comparisons ............................................................ 81 Results and Discussion .................................................................................. 82 Circular Chromosomal Comparisons ...................................................... 82 Number of Inverted Genes ...................................................................... 87 Resolved Inversions Comparison ............................................................ 94 Conclusions ................................................................................................... 97 5. RINC .................................................................................................................... 99 Introduction ................................................................................................... 99 Methods ......................................................................................................... 100 Results and Discussion .................................................................................. 103 RINC Distance ......................................................................................... 103 RINC Phylogeny ..................................................................................... 110 Conclusions ........................................................................................................