A Genomic and Transcriptomic Study of Lineage-Specific Variation in Mycobacterium tuberculosis

Graham David Rose

Thesis submitted for the degree of Doctor of Philosophy, 2013

MRC National Institute for Medical Research

Declaration

I, Graham David Rose, confirm that the work presented in this thesis is my own. Where information has been derived from other sources, I confirm that this has been indicated in the thesis.

Signed ………………………………………… Date ……………………………………

The thesis work was conducted from September 2009 to March 2013 at the MRC National Institute for Medical Research (NIMR), London, UK, under the supervision of Douglas Young (NIMR, London) and Sebastien Gagneux (Swiss Tropical and Public Health Institute, Switzerland).

Abstract

Human tuberculosis (TB) is caused by several closely related species of bacteria collectively known as the Mycobacterium tuberculosis complex (MTBC). In this thesis the identification and effect of lineage-specific genetic variation within the phylogenetic lineages of the MTBC were investigated using a combination of computational methods and high-throughput sequencing technology. Genome sequencing has now identified an extensive repertoire of single nucleotide polymorphisms (SNPs) amongst clinical isolates of the MTBC. Comparative analysis focused on the detection of all lineage-specific SNPs, providing the first glimpse of the total SNP diversity that separates the main phylogenetic lineages from each other. Bioinformatic analysis then focused on the SNPs more likely to contribute to functional diversity; it predicted nearly half of all SNPs in the MTBC to have functional consequences, and found SNPs within regulatory proteins to be over-represented. To determine whether these and other lineage-specific SNPs lead to phenotypic diversity, genome datasets were integrated with RNA-sequencing to assess their impact on the comparative transcriptome profiles of strains belonging to two MTBC lineages.
Analysing the transcriptomes in the light of the underlying genetic variation found clear correlations between genotype and transcriptional phenotype. These arose by three mechanisms. First, lineage-specific changes in the amino acid sequence of transcriptional regulators were associated with alterations in their ability to control gene expression. Second, changes in nucleotide sequence were associated with alteration of promoter activity and generation of novel transcriptional start sites in intergenic regions and within coding sequences. Finally, genes showing lineage-specific patterns of differential expression not linked directly to primary mutations were characterised by a striking over-representation of toxin-antitoxin pairs.

Acknowledgements

This thesis would not have been possible without the efforts of my colleagues and friends. Firstly, I would like to thank my PhD supervisors Sebastien Gagneux and Douglas Young for their support and guidance throughout my project, and for sharing their invaluable depth of knowledge and resources. Of special note were the annual Gagneux group retreats in Charmey and Les Diablerets, which always provided a healthy mix of stimulating scientific discussions about my projects and great food, including of course the meringue et la crème double. I am grateful to my three thesis supervisors, Delmiro Fernandez-Reyes, Roger Buxton and Seb, who were a great help in contextualising my ideas and providing a focus. My thesis relied heavily on sequence data, and as such I thank Abdul Sesay and the rest of the High Throughput Sequencing group at NIMR for performing the Illumina sequencing. Next I would like to thank Iñaki Comas, who was always happy to answer my questions on evolutionary theory and phylogenomics, and to provide more general daily support on all things computational.
I also thank the other original member of the Gagneux group at NIMR, Sonia Borrell, particularly for her help in getting me up and running in the lab at the start, and the current members of Douglas Young's group, including Kristine Arnvig, for her guidance on the RNA side of my project, and Steve Coade, who was my Biosafety Containment Level 3 trainer for the first six months of my PhD. My time at NIMR would not have been as enjoyable without my colleagues and friends Christina Kahramanoglou and Teresa Cortés Méndez. Teresa, I am indebted to you for your support in keeping me focused and keeping all things in perspective during the final few months. I apologise that, despite your efforts and the past efforts of the Spanish contingent of the group, my vocabulary in your language is still quite limited. One day!

Of course I am grateful to my parents, who provided me with their untiring support to undertake my studies throughout the years, and to my brother Phil for his advice and the countless Sunday lunches in Balham. Finally, I am grateful to the Medical Research Council (MRC) for its funding, which supported not only my university costs and living expenses for the last three and a half years, but the research of many of my colleagues as well. Thank you.

Contents

Declaration
Abstract
Acknowledgements
List of Figures
List of Tables
Glossary

Chapter 1 Introduction
  1.1 The genus Mycobacterium
    1.1.1 Taxonomy
    1.1.2 The Mycobacterium tuberculosis complex (MTBC)
    1.1.3 TB disease in humans
    1.1.4 Disease diversity
  1.2 Genetic diversity in the MTBC
    1.2.1 General features of the M. tuberculosis genome
    1.2.2 Typing the MTBC
    1.2.3 The phylogenetic lineages of the MTBC
    1.2.4 Origin of the MTBC
    1.2.5 Selective pressures acting within the MTBC
  1.3 Phenotypic diversity
    1.3.1 Laboratory strains
    1.3.2 Clinical strain phenotype
  1.4 Linking genotype to phenotype
    1.4.1 In silico prediction of functional SNPs
    1.4.2 Gene expression diversity
    1.4.3 High throughput DNA sequencing technology
  1.5 Thesis Outline

Chapter 2 Materials and Methods
  2.1 General microbiological methods
    2.1.1 Containment 3 laboratory
    2.1.2 General chemicals and reagents
    2.1.3 Bacterial culture and storage
    2.1.4 Growth curves
  2.2 Molecular biology techniques
    2.2.1 Genomic DNA extraction
    2.2.2 RNA isolation and handling
    2.2.3 Quantification of DNA and RNA by Nanodrop
    2.2.4 Determination of DNA and RNA integrity by microfluidics
    2.2.5 Removal of DNA contamination from RNA samples
    2.2.6 Polymerase chain reaction (PCR)
  2.3 Materials
    2.3.1 Mycobacterium tuberculosis strains
  2.4 DNA-seq
  2.5 RNA-seq
    2.5.1 Strand specific RNA-seq libraries
    2.5.2 TSS 5' enriched RNA-seq libraries
