Genome-Wide Studies of Mycolic Acid Bacteria: Computational Identification And

Genome-wide Studies of Mycolic Acid Bacteria: Computational Identification and Analysis of a Minimal Genome Dissertation by Frederick Kinyua Kamanu In Partial Fulfillment of the Requirements For the Degree of Doctor of Philosophy King Abdullah University of Science and Technology Thuwal, Kingdom of Saudi Arabia December, 2012 2 EXAMINATION COMMITTEE APPROVALS FORM The dissertation of Frederick Kinyua Kamanu is approved by the examination committee. Committee Chairperson: Prof. Vladimir B. Bajic Committee Co-Chair: Dr. John A.C. Archer Committee Member: Prof. Pierre Magistretti Committee Member: Prof. Xin Gao Committee Member: Prof. Christoph Gehring Committee Member: Prof. Kothandaraman Narasimhan 3 COPYRIGHT © December, 2012 Frederick Kinyua Kamanu All Rights Reserved 4 ABSTRACT Genome wide studies of mycolic acid bacteria: Computational Identification and Analysis of a Minimal Genome Frederick Kinyua Kamanu The mycolic acid bacteria are a distinct suprageneric group of asporogenous Gram- positive, high GC-content bacteria, distinguished by the presence of mycolic acids in their cell envelope. They exhibit great diversity in their cell and morphology; although primarily non-pathogens, this group contains three major pathogens Mycobacterium leprae, Mycobacterium tuberculosis complex, and Corynebacterium diphtheria. Although the mycolic acid bacteria are a clearly defined group of bacteria, the taxonomic relationships between its constituent genera and species are less well defined. Two approaches were tested for their suitability in describing the taxonomy of the group. First, a Multilocus Sequence Typing (MLST) experiment was assessed and found to be superior to monophyletic (16S small ribosomal subunit) in delineating a total of 52 mycolic acid bacterial species. Phylogenetic inference was performed using the neighbor- joining method. To further refine phylogenetic analysis and to take advantage of the widespread availability of bacterial genome data, a computational framework that simulates DNA-DNA hybridisation was developed and validated using multiscale bootstrap resampling. The tool classifies microbial genomes based on whole genome DNA, and was deployed as a web-application using PHP and Javascript. It is accessible online at http://cbrc.kaust.edu.sa/dna_hybridization/ 5 A third study was a computational and statistical methods in the identification and analysis of a putative minimal mycolic acid bacterial genome so as to better understand (1) the genomic requirements to encode a mycolic acid bacterial cell and (2) the role and type of genes and genetic elements that lead to the massive increase in genome size in environmental mycolic acid bacteria. Using a reciprocal comparison approach, a total of 690 orthologous gene clusters forming a putative minimal genome were identified across 24 mycolic acid bacterial species. In order to identify new potential drug candidates against the pathogenic mycolic acid bacteria, a drug target analysis of the protein-coding genes, that are conserved across 15 pathogens, identified a total of 175 potential broad- spectrum drug candidates. The activity of the predicted targets can be modified by a total of 306 chemical compounds, 34 of which have been approved by the Food and Drug Administration (FDA), for human treatment. 6 ACKNOWLEDGMENTS I would like to thank my PhD advisors, Prof. Vladimir Bajic and Dr. John Archer, for their guidance throughout the study. Special thanks goes to Dr. Noor Azlin Mokhtar for all the help in understanding mycolic acid bacteria. Finally I'd like to thank my mum, brother and sister for their love and support. 7 TABLE OF CONTENTS EXAMINATION COMMITTEE APPROVALS FORM ...............................................2 COPYRIGHT..................................................................................................................3 ABSTRACT ...................................................................................................................4 ACKNOWLEDGMENTS...............................................................................................6 TABLE OF CONTENTS................................................................................................7 LIST OF ABBREVIATIONS..........................................................................................9 LIST OF FIGURES.......................................................................................................10 LIST OF TABLES.........................................................................................................11 Chapter 1: Introduction.................................................................................................12 1.1 Bacterial Genomes..............................................................................................12 1.1.1 Genome evolution.......................................................................................14 1.1.2 Genome annotation.....................................................................................16 1.1.3 Bacterial systematics and taxonomy...........................................................19 1.1.4 The minimal genome..................................................................................24 1.2 Computational Methods.....................................................................................28 1.2.1 Computers in biology.................................................................................28 1.2.2 Cluster analysis of biological data..............................................................32 1.3 Mycolic acid bacteria.........................................................................................34 1.4 Thesis aims and objectives.................................................................................37 1.5 Thesis outline......................................................................................................38 Chapter 2: Methods.......................................................................................................40 2.1 Research overview..............................................................................................40 2.2 A multi-gene phylogenetic analysis of mycolic acid bacteria............................41 2.2.1 Gene sequence data.....................................................................................41 2.2.2 Reconstruction of single-gene trees............................................................42 2.2.3 Reconstruction of the multi-gene tree.........................................................42 2.3 In silico DNA-DNA hybridisation .....................................................................43 2.3.1 Whole genome sequence data.....................................................................44 2.3.2 Computing similarity indices from whole genomes...................................46 2.3.3 Hierarchical clustering................................................................................46 2.3.4 Development of a web application ............................................................47 2.3.5 rRNA-based phylogenetic reconstruction of the mycolic acid bacteria.....50 2.4 Analysis of the minimal mycolic acid bacterial genome ...................................51 2.4.1 Identification of the core essential genes....................................................52 2.4.2 Visualizing genome annotations.................................................................52 2.4.3 Metabolic pathway analysis........................................................................53 2.4.4 Operon analysis..........................................................................................54 2.4.5 Comparative analysis of the pathogens and non-pathogens.......................55 2.4.6 Analysis of the pathogenic core genome....................................................55 Chapter 3: A multi-gene phylogenetic analysis of mycolic acid bacteria ..................58 3.1 Research overview..............................................................................................58 3.2 Single-gene and multi-gene phylogenetic reconstruction .................................59 3.3 The Rhodococcus and Nocardia genera.............................................................64 8 Chapter 4: In silico DNA-DNA hybridisation..............................................................65 4.1 Research overview .............................................................................................65 4.2 Mycolic acid bacteria genomes..........................................................................66 4.3 Computing genome-to-genome similarity..........................................................69 4.4 Genome clustering and analysis.........................................................................70 4.5 Development of a web-application for DNA hybridisation...............................77 4.6 Ribosomal RNA (rRNA)-based phylogeny........................................................79 4.7 A comparison of the 5S, 16S and 23S rRNA phylogeny....................................80 4.8 A comparison of the rRNA and in silico DNA hybridization phylogeny...........86 Chapter 5: Defining the minimal mycolic acid bacterial genome................................88 5.1 Research overview..............................................................................................88 5.2 Genome size and structure .................................................................................89 5.3 Identification of the core genome.......................................................................92

Genome-Wide Studies of Mycolic Acid Bacteria: Computational Identification And

Microorganisms

Investigation of New Actinobacteria for the Biodesulphurisation of Diesel Fuel

A Novel Approach to the Discovery of Natural Products from Actinobacteria Rahmy Tawfik University of South Florida, [email protected]

Characterization of Actinobacteria Degrading and Tolerating Organic Pollutants and Tolerating Organic Pollutants

An Innovative Approach to Resolve the Structure of Complex Prokaryotic Taxa

Lipid Accumulation by Rhodococcus Rhodochrous

Diversity of Culturable Nocardioform Actinomycetes from Wastewater Treatment Plants in Spain and Their Role in the Biodegradability of Aromatic Compounds

Genome-Based Taxonomic Classification of the Phylum

Document Downloaded From: This Paper Must Be Cited As: the Final Publication Is Available at Copyright

(12) United States Patent (10) Patent No.: US 9,157,058 B2 Dalla-Betta Et Al

Provided for Non-Commercial Research and Educational Use. Not for Reproduction, Distribution Or Commercial Use

A Report of 12 Unrecorded Prokaryotic Species Isolated from Gastrointestinal Tracts and Feces of Various Endangered Animals in Korea