bioRxiv preprint doi: https://doi.org/10.1101/2020.09.09.290726; this version posted September 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Phylotype-Level Characterization of Complex Lactobacilli Communities Using a High-Throughput, High-Resolution Phenylalanyl-tRNA Synthetase (pheS) Gene Amplicon Sequencing Approach

Shaktheeshwari Silvaraju,a Nandita Menon,a Huan Fan,a Kevin Lim,a Sandra Kittelmann,a#

a Wilmar International Limited, WIL@NUS Corporate Laboratory, Centre for Translational Medicine, National University of Singapore, Singapore.

Running Head: pheS-based species-level profiling of lactobacilli

#Address correspondence to [email protected].
Shaktheeshwari Silvaraju and Nandita Menon contributed equally to this work. Author order was determined on the basis of starting date on this project.

Keywords: amplicon sequencing, fermented food, host-associated lactobacilli, Lactobacillus, Pediococcus, taxonomic framework.

ABSTRACT

The ‘lactobacilli’ to date encompass more than 270 closely related species that were recently re-classified into 26 genera. Because of their relevance to industry, there is a need to distinguish between closely related, yet metabolically and regulatorily distinct species, e.g., during monitoring of biotechnological processes or screening of samples of unknown composition. Currently available methods, such as shotgun metagenomics or rRNA-based amplicon sequencing, have significant limitations (high cost, low resolution, etc.).
Here, we generated a lactobacilli phylogeny based on phenylalanyl-tRNA synthetase (pheS) genes and, from it, developed a high-resolution taxonomic framework which allows for comprehensive and confident characterization of lactobacilli community diversity and structure at the species level. This framework is based on a total of 445 pheS gene sequences, including sequences of 277 validly described species and subspecies (out of a total of 283, coverage of 98%). It allows differentiation between 263 lactobacilli species-level clades out of a total of 273 validly described species (including the proposed species L. timonensis) and a further two subspecies. The methodology was validated through next-generation sequencing of mock communities. At a sequencing depth of ~30,000 sequences, the minimum level of detection was approximately 0.02 pg DNA per µl (equalling approximately 10 genome copies per µl template DNA). The pheS approach, along with parallel sequencing of partial 16S rRNA genes, revealed considerable lactobacilli diversity and distinct community structures across a broad range of samples from different environmental niches. This novel complementary approach may be applicable to industry and academia alike.

IMPORTANCE

Species within the former genera Lactobacillus and Pediococcus have been studied extensively at the genomic level. To accommodate their exceptional functional diversity, the over 270 species were recently re-classified into 26 distinct genera.
Despite their relevance to both academia and industry, methods that allow detailed exploration of their ecology are still limited by low resolution, high cost or copy number variations. The approach described here makes use of a single-copy marker gene which outperforms other markers with regard to species-level resolution and availability of reference sequences (98% coverage). The tool was validated against a mock community and used to address lactobacilli diversity and community structure in various environmental matrices. Such analyses can now be performed at broader scale to assess and monitor lactobacilli community assembly, structure and function at the species (in some cases even at subspecies) level across a wide range of academic and commercial applications.

INTRODUCTION

The generic term ‘lactobacilli’ encompasses all organisms described as belonging to the family Lactobacillaceae until 2020, a highly diverse range of bacteria within the phylum Firmicutes (1). Species within this group are Gram-positive, facultatively anaerobic or microaerophilic, rod-shaped, non-spore-forming bacteria. Due to their beneficial effects on food during fermentation, lactobacilli play an important role in the food industry and have thus been studied extensively for both their phenotypic and genomic characteristics over the last decades (2). Several species have received generally recognized as safe (GRAS) status for use in human food and animal feed. Besides their naturally high abundances in fermented foods, they are frequently observed as part of the healthy microbiomes of animals, humans and plants (2-4).
Formerly, this group consisted of only three genera, Lactobacillus, Paralactobacillus, and Pediococcus. However, these traditionally defined genera alone represent a genetic diversity larger than that of a typical bacterial family (5, 6). A tremendous recent re-classification effort of the lactobacilli resulted in the establishment of 26 genera (encompassing 272 species plus the proposed species L. timonensis) and reduction of the original genus Lactobacillus to only 38 species around its type species, Lactobacillus delbrueckii (1). This refined taxonomic structure will in future facilitate the detection and description of functional properties that are shared within each genus.

In the last two decades, numerous studies have aimed at elucidating the bacterial communities involved in the fermentation of specific traditional foods or beverages by using both culture-dependent and culture-independent methods (reviewed by 7, 8). These communities are most often dominated by lactobacilli. While cultivation-based approaches allow isolation and detailed study of individual strains, they are biased towards microorganisms that are culturable under laboratory conditions (9). Molecular techniques for bacterial community structure analysis in fermented food samples have so far largely relied on amplification and (high-throughput) sequencing of 16S rRNA marker genes (recently reviewed by 10).
Comparatively few studies have used shotgun metagenomic sequencing (10-12) despite the availability of highly resolving bioinformatics tools (13), likely because this approach is still rather costly at a sequencing depth that allows for confident species- or even strain-level taxonomic classification and, instead, is more suited to deriving functional information (14). Furthermore, the accuracy of taxonomic assignment relies on a curated genome database and the availability of at least one representative genome for every species present in the sample. In the case of the lactobacilli, genome-based taxonomies have been published and updated on a regular basis (5, 6, 15-17). However, new species within this group are even more frequently discovered and described (1). While draft genomes may not be immediately available, deposition of corresponding marker gene sequences in public databases for reference is required for the valid description of new species. The 16S rRNA gene is a widely accepted marker gene to analyze bacterial and archaeal community diversity and structure in many habitats and to understand evolutionary distances between species. Sequence databases and taxonomic frameworks for bacterial and archaeal 16S rRNA genes such as Greengenes and SILVA are highly curated (18-20) and compatible with next-generation sequencing bioinformatics pipelines such as mothur (14, 21) or QIIME/QIIME2 (22, 23). However, at the 16S rRNA gene level, and even if almost the entire 16S rRNA gene is analyzed (~1,500 bp), different lactobacilli type species display sequence similarities greater than the accepted species cut-off of 98.7% and up to 100% sequence identity across shorter regions commonly used for next-generation amplicon sequencing protocols (24, 25). In consequence,