The Evolution of the Pan-Genome of Shiga-Toxin (Stx) Producing Escherichia Coli and the Stx2 Bacteriophage
Total Page:16
File Type:pdf, Size:1020Kb
The evolution of the pan-genome of Shiga-toxin (Stx) producing Escherichia coli and the Stx2 bacteriophage CHAD LAING University of Lethbridge A Thesis Submitted to the School of Graduate Studies of the University of Lethbridge in Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY Department of Biological Sciences University of Lethbridge LETHBRIDGE, ALBERTA, CANADA c Chad Laing, 2013 Sooner or later you’ll have to see The cause and effect So many things still left to do But we haven’t made it yet –Neil Young, ‘Transformer Man’ iii Abstract Human infections with Shiga-toxin (Stx)-producing E. coli (STEC) vary in severity of ill- ness. The pan-genome of a bacterial species contains a shared, essential core genome, and a variably distributed accessory genome. While single nucleotide changes likely influence virulence in STEC, horizontal gene transfer (HGT) on elements such as bacteriophage are thought to be most important. My thesis objectives were to: 1) develop tools for the pan- genomic analyses of bacterial genomes; 2) describe the phylogeny of STEC and; 3) deter- mine if the evolution of the Stx2-bacteriophage parallels that of its bacterial host. For this thesis, the software program Panseq was created and used to identify pan-genomic differ- ences among STEC. Whole-genome phylogenies showed all serotypes as discrete clusters, with O157:H7 having three distinct lineages and grouping separately from all other STEC. Finally, the phylogenies of Stx2-bacteriophage and their bacterial hosts were largely con- cordant, with occasional instances of HGT having led to novel pathogen emergence. iv Acknowledgments Thank you to everyone involved. v Contents Approval/Signature Page ii Epigraph iii Abstract iv Acknowledgements v Table of Contents vi List of Tables x List of Figures xi List of Abbreviations xiv 1 Introduction 1 2 Everything at once: Comparative analyses of bacterial pathogen genomes 5 2.1 Preface . 5 2.2 Introduction . 5 2.3 The Pan-genome . 6 2.4 Molecular Genotyping via Phenotypic Differences . 8 2.5 Molecular genotyping using the accessory genome . 10 2.6 Molecular genotyping using the core genome . 13 2.7 Software for Whole-genome Analyses . 19 2.8 Examples of WGS Comparisons Using Current Methods . 23 2.9 Conclusion . 25 2.10 Conflict of interest statement . 26 2.11 Acknowledgements . 26 3 A review of Shiga-toxin producing E. coli virulence and genomic evolution 33 3.1 Introduction . 33 3.2 Main Virulence Factors . 36 3.2.1 Shiga-toxins . 36 3.2.2 Stx-containing Bacteriophage . 41 3.2.3 Role of Shiga-toxin . 43 3.2.4 The Locus of Enterocyte Attachment and Effacement (LEE) . 43 3.3 The STEC Genome . 48 3.3.1 Acid Resistance . 51 3.3.2 Adherence and Additional Virulence Factors . 52 3.4 Plasmid-encoded Virulence Factors . 53 vi 3.5 Evolutionary Dynamics . 56 3.6 Conclusions . 59 4 In silico genomic analyses reveal three distinct lineages of Escherichia coli O157:H7, one of which is associated with hyper-virulence. 60 4.1 Preface . 60 4.2 Introduction . 60 4.3 Materials and Methods . 63 4.3.1 Genome-sequenced Strains Used in the Analyses . 63 4.3.2 In silico Lineage Typing . 63 4.3.3 Shiga-toxin Encoding Bacteriophage Insertion Site Typing . 63 4.3.4 Comparative Genomic Fingerprinting (CGF) . 64 4.3.5 Genomic in silico Subtractive Hybridization (GISSH) . 64 4.3.6 Single Nucleotide Polymorphism (SNP) Analysis . 64 4.3.7 Microarray Comparative Genomic Hybridization (mCGH) . 65 4.3.8 Multi-locus Variable Number Tandem Repeat Analysis (MLVA) . 66 4.3.9 Construction of Dendrograms . 66 4.3.10 Construction of Supernetworks . 66 4.4 Results . 67 4.4.1 Lineage Typing . 67 4.4.2 Stx-phage Insertion Site Typing . 67 4.4.3 SNP Genotyping . 68 4.4.4 GISSH-based Novel Region Distribution . 68 4.4.5 Combined Analysis . 69 4.5 Discussion . 70 5 Pan-genome sequence analysis using Panseq: an online tool for the rapid anal- ysis of core and accessory genomic regions 81 5.1 Preface . 81 5.2 Introduction . 81 5.3 Implementation and Methods . 85 5.3.1 Novel Region Finder (NRF) . 86 5.3.2 Core and Accessory Genome Finder (CAGF) . 86 5.3.3 Loci Selector (LS) . 87 5.3.4 Construction of Salmonella enterica Phylogenies . 89 5.3.5 In Silico MLST of Salmonella enterica genomes . 90 5.4 Results and Discussion . 90 5.4.1 Novel Region Finder (NRF) Module . 90 5.4.1.1 The NRF Module in Genomic Island Identification . 91 5.4.1.2 The NRF Module in Multiple Sequence Comparisons . 93 5.4.1.3 Experimental Confirmation of Outputs from the NRF Mod- ule . 94 5.4.2 Core/Accessory Genome Finder (CAGF) Module . 94 vii 5.4.2.1 The CAGF Module in SNP Analysis of the Core Genome 95 5.4.2.2 The CAGF Module in the Analysis of the Accessory Genome . 96 5.4.2.3 The CAGF Module in the Examination of Pan-Genomic Differences . 97 5.4.2.4 Selectable Core Stringency . 97 5.4.3 The Loci Selector (LS) Module . 102 5.4.3.1 The LS Module in the Comparison of Randomly Gener- ated Data . 102 5.4.3.2 The LS Module in the Analysis of SNP Genotyping Data 103 5.4.4 Advantages of Panseq over Other Related Programs . 104 5.5 Conclusions . 105 5.6 Availability and Requirements . 105 5.7 Acknowledgements . 106 6 A comparison of Shiga-toxin 2 bacteriophage from classical enterohemorrhagic Escherichia coli serotypes and the German E. coli O104:H4 outbreak strain 126 6.1 Preface . 126 6.2 Introduction . 126 6.3 Materials and Methods . 128 6.3.1 Identification and Isolation of Stx2-bacteriophage Sequences . 128 6.3.2 Phylogenetic Relationships Among Stx2-bacteriophage . 128 6.3.3 Whole-genome Phylogeny . 129 6.3.4 Quantification of Stx2 Production . 129 6.3.5 Antibiotic Induction . 130 6.3.6 RNA Isolation . 130 6.3.7 cDNA synthesis . 131 6.3.8 Detection of gnd and stx2 Gene Expression by RT-PCR . 131 6.3.9 Statistical Analyses . 132 6.4 Results . 132 6.5 Discussion . 134 6.5.1 Pan-genomic Comparison . 134 6.5.2 Evolutionary Concordance of Stx2-bacteriophage and Host-genomes 135 6.5.3 Stx2-bacteriophage Sequence Alignment . 137 6.5.4 Stx2-toxin Production . 138 6.6 Conclusions . 140 6.7 Acknowledgements . 141 7 Thesis summary 151 7.1 Introduction . 151 7.2 Major Contributions . 152 7.2.1 Design and Implementation of Pan-genomic Software . 152 viii 7.2.2 Population Structure and Phylogeny of E. coli O157:H7 and other STEC . 154 7.2.3 The evolution of the Shiga-toxin 2 producing bacteriophage and its bacterial STEC host . 156 7.3 Conclusions and Future Directions . 157 Bibliography 161 ix List of Tables 2.1 A summary of 15 recent whole-genome based studies. 27 4.1 The 19 E. coli strains analyzed in silico in this study. 76 4.2 The 102 E. coli O157:H7 strains used in the construction of the supernet- work in Figure 4.3 . 77 5.1 The six Listeria monocytogenes genomic sequences analyzed, with RefSeq accession numbers and genomic sequence status. 107 5.2 The 14 E. coli O157:H7 strains compared by Panseq to the E. coli O157:H7 reference strains EDL933 and Sakai for novel accessory genomic regions. 108 5.3 The 39 Salmonella enterica genome sequences analyzed, and the propor- tion of the 9.03 mbp pan-genome of the 39 strains present in each genome. 109 5.4 The 20 best loci as chosen by the LS module of Panseq from the original 96 loci. Loci are listed in order of choice and with points of discrimination (POD). 110 6.1 The Stx2-bacteriophage and genomic sequences used in the current study. 142 x List of Figures 4.1 The distribution of 1456 regions approximately 500 bp in size among 17 E. coli O157:H7 strains and the O145:NM strain EC33264 and K12 strain MG1655. Regions are not necessarily contiguous and are defined as novel based on less than 80% sequence identity to the genome of either EDL933 or Sakai. Black indicates the presence of a region and white indicates the absence of a region. 78 4.2 The supernetwork created from the combination of each maximum par- simony tree from the following typing methods: Stx-phage insertion site typing, MLVA, CGF, SNP genotyping, mCGH, and GISSH-based novel region distribution typing. Maximum parsimony trees were combined us- ing the unweighted mean distance and Z-closure with 1000 iterations; the resulting supernetwork was displayed using the equal angle method. 79 4.3 The supernetwork created from the combination of each maximum par- simony tree from the following typing methods: Stx-phage insertion site typing, MLVA, CGF, SNP genotyping, mCGH, and GISSH-based novel region distribution typing. Both mCGH and CGF datasets included the in silico data from the strains in Table 4.1 in addition to experimental data from the remaining strains in Table 4.2. Maximum parsimony trees were combined using the unweighted mean distance and Z-closure with 1000 iterations; the resulting supernetwork was displayed using the equal angle method. 80 5.1 The size distribution of regions novel to one or both Listeria monocyto- genes strains F6900 and 10403S with respect to the four L. monocytogenes strains Clip81459, EGD-e, HCC23 and 4bF2365. The minimum extracted novel region size was 500 bp. 111 5.2 The size of.