Discovery of Thermophilic Bacillales Using Reduced-Representation
Total Page:16
File Type:pdf, Size:1020Kb
Talamantes-Becerra et al. BMC Microbiology (2020) 20:114 https://doi.org/10.1186/s12866-020-01800-z METHODOLOGY ARTICLE Open Access Discovery of thermophilic Bacillales using reduced-representation genotyping for identification Berenice Talamantes-Becerra1* , Jason Carling2, Andrzej Kilian2 and Arthur Georges1 Abstract Background: This study demonstrates the use of reduced-representation genotyping to provide preliminary identifications for thermophilic bacterial isolates. The approach combines restriction enzyme digestion and PCR with next-generation sequencing to provide thousands of short-read sequences from across the bacterial genomes. Isolates were obtained from compost, hot water systems, and artesian bores of the Great Artesian Basin. Genomic DNA was double-digested with two combinations of restriction enzymes followed by PCR amplification, using a commercial provider of DArTseq™, Diversity Arrays Technology Pty Ltd. (Canberra, Australia). The resulting fragments which formed a reduced-representation of approximately 2.3% of the genome were sequenced. The sequence tags obtained were aligned against all available RefSeq bacterial genome assemblies by BLASTn to identify the nearest reference genome. Results: Based on the preliminary identifications, a total of 99 bacterial isolates were identified to species level, from which 8 isolates were selected for whole-genome sequencing to assess the identification results. Novel species and strains were discovered within this set of isolates. The preliminary identifications obtained by reduced-representation genotyping, as well as identifications obtained by BLASTn alignment of the 16S rRNA gene sequence, were compared with those derived from the whole-genome sequence data, using the same RefSeq sequence database for the three methods. Identifications obtained with reduced-representation sequencing agreed with the identifications provided by whole-genome sequencing in 100% of cases. The identifications produced by BLASTn alignment of 16S rRNA gene sequence to the same database differed from those provided by whole-genome sequencing in 37.5% of cases, and produced ambiguous identifications in 50% of cases. Conclusions: Previously, this method has been successfully demonstrated for use in bacterial identification for medical microbiology. This study demonstrates the first successful use of DArTseq™ for preliminary identification of thermophilic bacterial isolates, providing results in complete agreement with those obtained from whole-genome sequencing of the same isolates. The growing database of bacterial genome sequences provides an excellent resource for alignment of reduced-representation sequence data for identification purposes, and as the available sequenced genomes continue to grow, the technique will become more effective. Keywords: Bacterial identification, DArTseq, Genotyping-by-sequencing, Great Artesian Basin, Reduced-representation sequencing, Thermophiles * Correspondence: [email protected] 1Institute of Applied Ecology, University of Canberra, Canberra, ACT 2601, Australia Full list of author information is available at the end of the article © The Author(s). 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. Talamantes-Becerra et al. BMC Microbiology (2020) 20:114 Page 2 of 16 Background resolution of identifications obtained from the 16S rRNA Thermophiles continue to generate interest owing to the gene sequence. For this study, we have tested a novel thermostability of their enzymes, which have been approach of reduced-representation sequencing for the adapted for use in scientific and industrial processes The first stage identification of bacterial isolates to identify proteins of thermophilic bacteria generally exhibit higher 99 isolates from a variety of thermal sources. Addition- thermostability compared to those of mesophiles, in part ally, we have compared the preliminary identification because they tend to have stronger hydrophobic interac- outcomes obtained from 16S rRNA gene sequence and tions amongst their amino acids than in other bacteria reduced-representation sequencing with identifications [1]. The ability to withstand such extreme temperatures derived from whole-genome sequence on a subset of has made the enzymes from thermophilic bacteria of bacterial isolates. Our method used DArTseq™ particular interest for commercial, industrial and scien- (Canberra, Australia) [24], one of several available tific applications [2–5] in areas such as pharmaceutical methods for generating representative sequences from [6], food [7, 8] and detergent industries [9]. the genome. It uses restriction enzyme digestion The classic environments in which thermophilic mi- followed by PCR and Illumina short-read sequencing to croorganisms occur are primarily geothermal in nature amplify and sequence thousands of restriction fragments [10]. The Great Artesian Basin in South Australia has as genomic representations. DArTseq™ has been success- the temperature and chemical properties which are suit- fully used for a broad range of applications, for breeding able for thermophiles [11–13]. Specifically, there are of plants and animals [25], for assessment of genetic di- bores in this region with water temperatures of 90 °C or versity [26–28] and for ecological genetics [29, 30]. This more [14, 15], some of them with open, running bore study represents the first usage of DArTseq™ for identifi- drains known to contain communities of thermophilic cation of thermophilic bacterial isolates. microorganisms [16]. Thermophilic bacteria are also found in other environments such as compost [17–20] Results and hot water systems [21, 22]. In-silico analysis of control E. coli O157 (EDL 933) The isolation and discovery of thermophilic bacteria is IRMM449 certified reference standard a continuing area of research interest around the world. Reduced-representation sequence assays were performed The identification of novel thermophilic isolates is now as a control experiment on the reference standard gen- routinely achieved through DNA sequencing methods. omic DNA of E. coli O157 (EDL 933) IRMM449 [31], Jain et al. (2018), in a high throughput analysis of 90 with 6 technical replicates for each combination of re- thousand bacterial genomes, discussed the importance striction enzymes. Correct identification results were of accurate estimation of genetic relatedness in species produced for all assays at the species and strain level delimitation. In this context, ANI (Average Nucleotide using the Currito3.1 DNA Fragment Analysis Software Identity) has been considered one of the standard tools [32] which was developed for this project. The mean for this task. ANI is calculated as the average nucleotide genome coverage obtained for each method was 2.64% identity from the set of orthologous genes identified be- for PstI with HpaII and 2.34% for PstI with MseI. The tween any two genomes. Organisms belonging to the mean BLASTn percentage alignment values obtained same species are typically considered to show ANI against the genome sequence of E. coli O157 (EDL 933) values of ≥95% in pairwise comparisons [23]. IRMM449, GenBank accession number CP008957.1 [31] Here we aim to assess reduced-representation sequen- were 99.9915 and 99.9974%, respectively. The average cing as an alternative method for preliminary identifica- number of restriction fragments obtained in the se- tion of isolates derived from sampling locations across quence output for each method was 2433 and 1836 frag- Australia. A standard approach to the identification of ments, respectively. Finally, the average nucleotide novel bacterial strains or species utilises partial or sequence distance (NSD) value obtained for each complete 16S rRNA gene sequence as a preliminary method was 0.000103 and 0.000040 respectively. For the identification method to screen for potentially novel PstI with HpaII enzyme combination, the average NSD strains or species among a set of isolates. Candidates values showed less than 1 bp of difference per 10,000 bp identified from the 16S rRNA gene sequencing subse- aligned. quently undergo whole-genome sequencing. The use of 16S rRNA gene sequencing for bacterial identification is well established, although it has two potential limita- Isolation of the strains tions: firstly, in some cases it is necessary to attempt A total of 99 bacterial isolates were obtained from 27 more than one set of PCR primers in order to achieve different sampling sources. Microbial growth results of amplification