Whole Genome Comparison of 1,803 Bacteria: an Analysis of Genetic Relatedness

Whole genome comparison of 1,803 bacteria: An analysis of genetic relatedness and species-specific antibiotic target identification.

A dissertation presented by Anthony Bissell
to The Department of Biology
in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the field of Biology

Northeastern University
Boston, Massachusetts
October 28, 2013

© 2013 Anthony Bissell
ALL RIGHTS RESERVED

ABSTRACT OF DISSERTATION
Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Biology in the College of Science of Northeastern University
October 28, 2013

ABSTRACT

The primary method of identifying and classifying novel bacteria is comparison of 16S ribosomal RNA gene sequences. To better understand the relationship between 16S gene sequence relatedness and similarity of gene content between bacteria, we conducted all pairwise comparisons of all predicted open reading frames (ORFs) from 1,803 fully sequenced bacteria. Previously, the mean amino acid identity (AAI) of shared gene content has been used to characterize bacterial genome relatedness. However, because AAI considers only orthologous genes, it does not account for major differences in pan-genomes or unique genes, nor does it distinguish organisms with relatively small genomes (such as Mycoplasma). To fully characterize the relationship between bacterial genomes, we developed a novel metric, termed the genome similarity index (GSI). Put simply, a comparison with a GSI of g is one in which g percent of ORFs have at least g percent similarity. Unexpectedly, we found that the range of the 95th percentile of GSI across all bacterial relationships increased with 16S percent identity, indicating inconsistencies between 16S percent identity and genome similarity.
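The GSI definition given above reads like a fixed-point rule, and a minimal sketch may make it concrete. The function below is an illustration only, not the dissertation's implementation: the name genome_similarity_index is hypothetical, the input is assumed to be one percent-similarity value (0 to 100) per ORF of the query genome against the other genome, and ORFs with no match are assumed to be recorded as 0. How similarity is scored and how the two comparison directions are combined are not specified in this abstract.

```python
from typing import Sequence


def genome_similarity_index(orf_similarities: Sequence[float]) -> float:
    """Largest g such that at least g percent of ORFs have >= g percent similarity.

    Assumption (not from the source): each entry is the percent similarity
    (0-100) of one ORF's best match in the other genome, with 0 for no match.
    """
    n = len(orf_similarities)
    if n == 0:
        raise ValueError("at least one ORF similarity is required")

    # Rank ORFs from most to least similar.
    ranked = sorted(orf_similarities, reverse=True)

    best = 0.0
    for rank, similarity in enumerate(ranked, start=1):
        # Percent of ORFs that are at least as similar as this one.
        percent_of_orfs = 100.0 * rank / n
        # g cannot exceed either the similarity threshold or the ORF share.
        best = max(best, min(similarity, percent_of_orfs))
    return best


if __name__ == "__main__":
    # Five ORFs: three highly similar, one weak match, one with no match.
    print(genome_similarity_index([100, 90, 80, 30, 0]))
```

For the five example values the sketch returns 60.0: three of the five ORFs (60 percent) have at least 60 percent similarity, and no larger g satisfies the rule. Unlike an average over shared genes, the weak and missing matches pull the value down, which is the property the abstract attributes to the GSI.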
At a 16S percent identity of 97%, genome similarity ranged from 49% to 100%. In addition, we analyzed the relationship between current biological taxonomic classifications (phylum, class, order, family, genus, and species) and the GSI, AAI, and 16S percent identity. With every metric we examined, there was overlap in the distribution of intra-taxonomic comparisons. Finally, the GSI (when compared to 16S percent identity) identified, where AAI did not, a distinct subset of obligate blood-borne pathogens, termed hemotropic Mycoplasma, with relatively low 16S percent identity and GSI compared to the rest of the bacteria.

ACKNOWLEDGEMENTS

I would like to thank my thesis advisor Kim Lewis for providing excellent guidance and support and for the opportunity to work with a great group of scientists for the past seven years at the Antimicrobial Discovery Center (ADC). To my dissertation committee members Christopher Sassetti, Slava Epstein, Amy Spoering, and Ken Coleman, thank you for providing guidance and advice on my thesis project. Special thanks to my committee member Eric Stewart for all the advice, support, friendship, and conversations over many cups of coffee. Thank you to Laura Fleck and Chao Chen, my fellow drug discovery comrades. Thank you to Heather for all the help with M. tuberculosis. Thank you to Katya Gavrish for all your help and advice over the years. Thank you to Iris Keren for the brilliant discussions and critiques of my research plans. Thank you to the people at NovoBiotic: Dallas Hughes, Aaron Peoples, Losee Ling, Theresa Farrell, and Ashley Zullo. Your help and advice have been invaluable. Thank you to Bill Sheehan for the assistance with the Northeastern University Linux Cluster. Thank you to all the members of the ADC, past and present, for making the lab a wonderful place to do science. To my parents, Mom and Errol, and my siblings, Nick and Sam, I sincerely appreciate all the love and support through all of my endeavors.
To my wife Thea, I am forever grateful for your unending love, encouragement, advice, and support during the past three years!

TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
Chapter 1: Introduction
    Whole genome sequence comparison of bacteria
    Identification of species-specific antimicrobial targets
    Dissertation Aims
Chapter 2: Pairwise comparison of bacterial genomes
    Introduction
    Results
        Genomic relatedness
        Taxonomic classification
        Neighbor-joining trees
        Single strain analyses
    Discussion
    Materials and Methods
        Determination of genetic relatedness
        Taxonomy classification
        Taxonomy neighbor-joining trees
        Density scatterplot visualization
Chapter 3: Identification of species-selective antibiotic targets in Mycobacterium tuberculosis
    Introduction
    Results
        Identification of Mycobacterium-selective targets
    Discussion
    Materials and Methods
        Comparison of M. tuberculosis to active and inactive strains
        Annotation of essential genes in M. tuberculosis
        Identification of Mycobacterium-selective targets
REFERENCES
SUPPLEMENTAL DATA
APPENDIX I: Platforms for drug discovery
    Introduction
    Small molecule potentiates killing of Mycobacterium tuberculosis by rifampicin
        Introduction
        Screen development
        Pilot screen
        References
    A screen for species-selective compounds
        Pilot screen
        Results and discussion
        References
