Title: Comparative and functional genomic analysis of dairy lactococci
Author(s): Kelleher, Philip
Publication date: 2017
Original citation: Kelleher, P. 2017. Comparative and functional genomic analysis of dairy lactococci. PhD Thesis, University College Cork.
Type of publication: Doctoral thesis
Rights: © 2017, Philip Kelleher. http://creativecommons.org/licenses/by-nc-nd/3.0/
Item downloaded from: http://hdl.handle.net/10468/4522
Downloaded on: 2021-10-09T07:19:49Z

Comparative and Functional Genomic Analysis of Dairy Lactococci

A thesis presented to the National University of Ireland, Cork
by
Philip Kelleher
BSc Pharmaceutical Biotechnology, MSc Bioinformatics & Systems Biology

For the degree of PhD in Microbiology

School of Microbiology, National University of Ireland, Cork
January 2017

Supervisor: Prof. Douwe van Sinderen
Head of School: Prof. Gerald Fitzgerald

General Table of Contents

Abbreviations
Abstract
Chapter I: General introduction
Chapter II: Performance and flavour-based characterisation of lactococcal starter cultures
Chapter III: Comparative and functional genomics of the Lactococcus lactis taxon; insights into evolution and niche adaptation
Chapter IV: Comparative genomic analysis of the Lactococcus lactis plasmidome and assessment of its technological properties
Chapter V: Base modification analysis of Lactococcus lactis strains and their corresponding restriction-modification systems
Chapter VI: Assessing functionality and genetic diversity of lactococcal prophages
Chapter VII: General discussion
Appendix A: Large-scale cheese fermentation trial results
Acknowledgements

Declaration

I hereby declare that the content of this thesis is the result of my own work and has not been submitted for another degree, either at University College Cork or elsewhere.

Signed: ______________________________
Philip Kelleher
Date: 05/01/2017

Abbreviations

AA = Amino acid
Abi = Abortive infection system
ADI = Arginine deiminase pathway
AMC = 7-amino-4-methylcoumarin
ANI = Average nucleotide identity
ARD = Amino-proximal recognition domain
BLAST = Basic local alignment search tool
BPP = Baseplate protein
Cas = CRISPR-associated proteins
CDS = Coding sequence
CFU = Colony forming unit
COG = Clusters of Orthologous Groups
CPS = Capsular exopolysaccharide
CRD = Carboxy-proximal recognition domain
CRISPR = Clustered Regularly Interspaced Short Palindromic Repeats
Dit = Distal tail protein
dso = Double-stranded origin of replication
EPS = Exopolysaccharide
FDA = Food and Drug Administration
GRAS = Generally recognized as safe
HCL = Hierarchical clustering
HsdM = Methylase subunit
HsdR = Restriction endonuclease subunit
HsdS = Specificity subunit
IS = Insertion sequence elements
KEGG = Kyoto Encyclopedia of Genes and Genomes
LAB = Lactic acid bacteria
LDH = Lactate dehydrogenase
MCL = Markov Clustering Algorithm
NCBI = National Center for Biotechnology Information
NGS = Next generation sequencing
NICE = Nisin-inducible controlled gene expression
NT = Nucleotide
ORF = Open reading frame
Ori = Origin of replication
OriT = Origin of transfer
PFGE = Pulsed-field gel electrophoresis
PHAST = Phage Search Tool
PTS = Phosphotransferase system
qPCR = Quantitative PCR
RBP = Receptor binding protein
RCR = Rolling-circle replication
Rep = Replication protein
R-M = Restriction-modification
RSM = Reconstituted skimmed milk
SBS = Sequencing-by-synthesis
Sie = Superinfection exclusion system
SMRT = Single molecule real time
Tal = Tail-associated lysin
TETRA = Tetranucleotide frequency correlation coefficients
TMP = Tail tape measure protein
TRD = Target recognition domain
ZMW = Zero-mode waveguide

Abstract

Lactococcus lactis has been exploited for thousands of years for the production of fermented dairy products, and from an economic perspective has become one of the most valuable bacteria. L. lactis is used predominantly as a starter culture for the production of various hard and soft cheeses. The constant threat of (bacterio)phage infection, combined with consumer-driven diversification of product ranges, has created an increased need to improve technologies for the rational selection of novel starter culture blends. Whole genome sequencing, spurred on by recent advances in next-generation sequencing (NGS) platforms, is a promising approach to facilitate the rapid identification and selection of such strains based on gene-trait matching. In this thesis, the most up-to-date sequencing methodologies were applied to sequence sixteen L. lactis isolates to facilitate an in-depth comparative and functional genomic analysis of the taxon, with particular emphasis placed on dairy traits.

A selection of lactococcal strains was first functionally characterised based on phenotypic traits and assessed for industrial robustness and flavour formation. The behaviour of the strains under simulated cheese production conditions was monitored and used to assess their temperature-induced autolytic properties. This analysis was followed by the determination of activity profiles of enzymes related to key flavour formation pathways, in order to explore the proteolytic and lipolytic abilities of each strain. Comparative analysis of our selection of L. lactis strains and four starter cultures currently employed in the Irish dairy industry for the production of half-fat Cheddar cheese facilitated the identification of potentially novel starter cultures. In total, twenty strains were assessed for the activity of twelve separate enzymes related to cheese production.
From these strains, eleven were selected for whole genome shotgun sequencing to further investigate their genetic composition, and to explore the possibility of linking genotype to phenotype (also called gene-trait matching).

The genomes of sixteen L. lactis subsp. lactis and L. lactis subsp. cremoris dairy strains were sequenced to completion, doubling the number of fully sequenced L. lactis genomes currently available from the public National Center for Biotechnology Information (NCBI) database. These newly sequenced genomes, along with available whole genome sequences, were used to perform the largest comparative and functional genomic study to date on the L. lactis taxon. Their chromosomal features were assessed with particular emphasis on discerning the L. lactis subspecies division, evolution and niche adaptation. This analysis clearly identified a phylogenetic division between subspecies lactis and cremoris strains, which was further corroborated by hierarchical clustering based on carbohydrate and amino acid metabolic pathways. The pan and core genomes of L. lactis were shown to comprise 5906 and 1129 genes, respectively. Both were found to be in a closed state, indicating that the representative data sets employed for this analysis are sufficient to fully describe the genetic diversity of the taxon. Niche adaptation appears to play a significant role in governing the genetic content of each L. lactis subspecies, while (differential) genome decay and redundancy in the dairy niche were also highlighted. The description of chromosomal adaptations in L. lactis has not received the same level of attention as plasmid-mediated characteristics, due to the perceived biotechnological importance of the latter. Our comparative analysis revealed that the division between plasmid- and chromosome-based traits is less clear, as multiple integration events within the lactococcal chromosome suggest a more fluid genome than previously thought.
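As an illustration of the pan and core genome terms used above, the following is a minimal sketch and not the pipeline used in this thesis. It assumes that genes have already been grouped into families upstream (for example by a clustering step), and the strain names and family identifiers are hypothetical toy data. The pan genome is taken as the union of gene families over all strains, and the core genome as the families shared by every strain.

```python
from typing import Dict, Set, Tuple


def pan_and_core(genomes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (pan genome, core genome) as sets of gene-family identifiers.

    `genomes` maps a strain name to the set of gene families it carries.
    """
    if not genomes:
        return set(), set()
    gene_sets = list(genomes.values())
    # Pan genome: every family seen in at least one strain.
    pan = set().union(*gene_sets)
    # Core genome: families present in all strains.
    core = set(gene_sets[0]).intersection(*gene_sets[1:])
    return pan, core


# Hypothetical toy data: three strains and five gene families.
toy = {
    "strain_A": {"fam1", "fam2", "fam3"},
    "strain_B": {"fam1", "fam2", "fam4"},
    "strain_C": {"fam1", "fam2", "fam3", "fam5"},
}

pan, core = pan_and_core(toy)
print(len(pan), len(core))  # prints: 5 2
```

In this simplified picture, a pan genome is described as closed when adding further strains stops contributing new families to the union, and as open (or fluid) when each added strain keeps enlarging it.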
The complete genome sequence analysis of sixteen L. lactis strains revealed the presence of a total of sixty-seven plasmids, including two megaplasmids, which represent the first megaplasmids identified in lactococcal strains. Megaplasmids are large, autonomous, self-replicating extrachromosomal genetic elements greater than 100 kb. While megaplasmids are not essential for the growth of their host, they may encode additional metabolic capabilities. Comparative genome analysis of these sequences, combined with those of eighty-one publicly available plasmids, allowed the definition of the lactococcal plasmidome based on one hundred and forty-eight complete plasmid sequences, and facilitated an investigation into technologically important plasmid-encoded traits. In contrast to the lactococcal chromosomes, the lactococcal pan-plasmidome was found to be in a fluid state, implying that continued sequencing efforts will likely expand the diversity of this data set and lead to the identification of further novel plasmid features. In the present study, lactococcal gut adhesion was also investigated, identifying potential gut adhesion factors within the lactococcal plasmidome. It is envisioned that this may provide further insights for the application of L. lactis as a vector for vaccine and biomolecule delivery. Finally, the frequency of plasmid-encoded phage resistance mechanisms was assessed, with particular emphasis on abortive infection (Abi) systems. In total, fourteen plasmid-encoded Abi systems were identified, while further analysis also identified frequent occurrences of these systems within the lactococcal chromosomes. Single molecule real time sequencing (SMRT)