Genomic characterization reconfirms the taxonomic status of Lactobacillus parakefiri
Note
Bioscience of Microbiota, Food and Health Vol. 36 (3), 129–134, 2017

Yasuhiro TANIZAWA1, Hisami KOBAYASHI2, Eli KAMINUMA1, Mitsuo SAKAMOTO3, 4, Moriya OHKUMA3, Yasukazu NAKAMURA1, Masanori ARITA1, 5 and Masanori TOHNO2*

1Center for Information Biology, National Institute of Genetics, 1111 Yata, Mishima, Shizuoka 411-8540, Japan
2Institute of Livestock and Grassland Science, National Agriculture and Food Research Organization, 768 Senbonmatsu, Nasushiobara, Tochigi 329-2793, Japan
3Japan Collection of Microorganisms, RIKEN BioResource Center, 3-1-1 Koyadai, Tsukuba, Ibaraki 305-0074, Japan
4PRIME, Japan Agency for Medical Research and Development (AMED), 3-1-1 Koyadai, Tsukuba, Ibaraki 305-0074, Japan
5RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi, Yokohama, Kanagawa 230-0045, Japan

*Corresponding author. Masanori Tohno, Institute of Livestock and Grassland Science, National Agriculture and Food Research Organization, 768 Senbonmatsu, Nasushiobara, Tochigi 329-2793, Japan. Tel: +81-287-37-7804, E-mail: [email protected]

Received November 16, 2016; Accepted February 20, 2017; Published online in J-STAGE March 4, 2017

©2017 BMFH Press. This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial No Derivatives (by-nc-nd) License. (CC-BY-NC-ND 4.0: https://creativecommons.org/licenses/by-nc-nd/4.0/)

Whole-genome sequencing was performed for Lactobacillus parakefiri JCM 8573T to confirm its hitherto controversial taxonomic position. Here, we report its first reliable reference genome. Genome-wide metrics, such as average nucleotide identity and digital DNA-DNA hybridization, and phylogenomic analysis based on multiple genes supported its taxonomic status as a distinct species in the genus Lactobacillus. The availability of a reliable genome sequence will aid future investigations on the industrial applications of L. parakefiri in functional foods such as kefir grains.

Key words: Lactobacillus parakefiri, taxonomy, lactic acid bacteria, whole-genome sequence

Type strains hold significant positions in bacterial nomenclature: the taxonomic affiliation of any other isolate is identified on the basis of comparison with the type strains. For example, an isolate showing a DNA-DNA hybridization (DDH) similarity of more than 70% to the type strain of a certain species is considered to belong to the same species. The DDH value of 70%, a classical threshold described nearly 30 years ago, still remains the gold standard for the species boundary, although it has been partly supplemented by the use of 16S rRNA gene sequence similarity [1]. However, recently, the use of genome-wide metrics, such as average nucleotide identity (ANI) and digital DDH (dDDH), has been proposed as a replacement for the laborious process of DDH. ANI is calculated from the mean identity of homologous regions between two genomes, and the cutoff value of 95% is widely acknowledged as the threshold to delineate two species [2, 3]. The dDDH value is also based on the sequence alignment between a given pair of genomes, as implemented by the Genome-to-Genome Distance Calculator (GGDC, http://ggdc.dsmz.de), which infers an in-silico analogue of DDH with confidence intervals [4]. These sequence-based methods are reproducible and scalable and thus may be applied in the detection and validation of taxonomic mislabeling in public sequence databases [5–7]. In our previous study, we used ANI to assess the taxonomic status of 718 publicly available genomes of the genus Lactobacillus, the largest group of lactic acid bacteria, with over 180 known species [8]. We found mislabeling of organism names in several genomes, possibly caused by taxonomic misidentification or human errors such as sample mix-ups during experimental procedures or data handling. Moreover, our results suggested that even genomes for type strains might be erroneous in several species, including L. parakefiri and L. homohiochii.

L. parakefiri is a heterofermentative lactic acid bacterium described by Takizawa et al. in 1994, with strain GCL 1731T (= JCM 8573T = DSM 10551T = ATCC 51648T) designated as its type strain [9]. It was originally isolated from kefir grains and, together with another kefir-isolated species, L. kefiri, belongs to the L. buchneri group [10, 11]. The complex compositions and mechanisms of kefir grain microbiota have been studied extensively because of the health benefits associated with kefir ingestion [12, 13]. Recently, conflicting views were reported by two research groups for L. parakefiri on the basis of genomic analyses, resulting in controversies about the taxonomic status of L. parakefiri. Zheng et al. suggested that L. parakefiri is a later heterotypic synonym of L. kefiri [14], whereas Sun et al. acknowledged it as the species with the largest genome in the genus Lactobacillus [15]. In both studies, the draft genome was reconstructed independently using the raw sequencing reads for DSM 10551T deposited under the accession numbers SRR1151226 and ERR433484, respectively, in the Sequence Read Archive (SRA). The reconstructed genome from ERR433484 was also deposited in the International Nucleotide Sequence Database Collaboration (INSDC) under the accession number AZEN01. Our preliminary assessment suggested that the genome AZEN01 was contaminated with L. kefiri. Indeed, housekeeping genes that normally exist in single copies, such as pheS and rpoA, were found in duplicate, with one matching the sequence of L. parakefiri deposited in the public sequence database and the other matching that of L. kefiri. Therefore, the only publicly available genome for the type strain of L. parakefiri cannot serve as a reference for taxonomic studies. In this study, we obtained the type strain of L. parakefiri, JCM 8573T, from the Japan Collection of Microorganisms (JCM) and reanalyzed its taxonomic status by conducting whole-genome sequencing.

Genomic DNA of JCM 8573T was extracted from cells cultured in de Man, Rogosa, and Sharpe broth (Difco) at the mid-logarithmic phase and then purified using Qiagen Genomic-tip 500/G gravity-flow, anion-exchange tips and a Qiagen Genomic DNA Buffer Set with lysozyme (Sigma) and proteinase K (Qiagen), according to the manufacturer's instructions. Whole-genome sequencing using a 300-bp paired-end Illumina MiSeq system yielded 5,829,866 reads, which corresponds to approximately 700-fold coverage. De novo assembly was performed using the Platanus_B assembler (version 1.1.0) with the default settings after preprocessing the raw reads to remove low-quality bases and adapter sequences using Platanus_trim (version 1.0.7) [16]. The draft genome was annotated using the DDBJ Fast Annotation and Submission Tool (DFAST, http://dfast.nig.ac.jp). We obtained a draft genome that consisted of 161 contigs with an estimated genome size of 2,493,412 bp, which was comparable with those of other species in the L. buchneri subgroup. The genome completeness and contamination values were also calculated using CheckM (version

assembled de novo in the same manner. The genome statistics are summarized in Table 1. Our data showed a high completeness value and a low contamination level. In contrast, the data from SRR1151226 and ERR433484 showed high contamination values (14.20% and 99.35%, respectively) and high ANI values (97.9% and 91.7%, respectively) against the publicly available draft genome of L. kefiri DSM 20587T (accession number: AYYV01), supporting our previous findings of contamination in the sequencing data deposited in the public databases. The different genome sizes and contamination values implied different extents of contamination in the SRR1151226 and ERR433484 data. We identified one copy each of the rpoA, pheS, and recA genes in the genome of JCM 8573T, whose nucleotide sequences exactly matched the ones reported for L. parakefiri LMG 15133T (accession numbers: AM087851, AM263510.1, and AJ621665, respectively), while showing only 92%, 84%, and 82% nucleotide identity with those of L. kefiri LMG 9480T (AM087840, AM263508, and AJ621650, respectively). This strongly suggests that the strain JCM 8573T belongs to a species distinct from L. kefiri. Notably, full-length 16S rRNA genes could not be identified in the draft genome. Bacterial genomes generally harbor multiple copies of rDNA regions, and such repetitive sequences make genome assembly difficult, often yielding collapsed or fragmented contigs for such regions [18]. We found that several contigs contained fragmented 16S rDNA gene sequences, which showed nearly 100% identity with the ones deposited in the public databases for L. parakefiri.

The draft genome sequence of L. parakefiri JCM 8573T was deposited in the INSDC under the accession numbers BDGB01000001–BDGB01000161. The raw sequencing reads were also deposited in the SRA under the accession number DRR064132.

Next, ANI and GGD calculations were performed between L. parakefiri JCM 8573T and each type strain of the 13 species in the L. buchneri subgroup. Genomes included in the analysis were obtained from the DFAST Archive of Genome Annotation (DAGA) [8]. ANI was calculated using the modified pyani script provided at https://github.com/widdowquinn/pyani, and the dDDH values were calculated using the GGDC web service (http://ggdc.dsmz.de). All the ANI values were less than the 95% cutoff line used to delineate two species [2], with the highest value being 84.92% against L. buchneri
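
The coverage estimate and the ANI species cutoff quoted in the text can be checked with simple arithmetic. The following Python sketch is illustrative only and is not part of the authors' pipeline. It assumes that the 5,829,866 reads are counted individually at the nominal 300-bp length (before trimming) and that an ANI at or above the 95% cutoff is read as "same species"; the function names sequencing_depth and same_species_by_ani are hypothetical.

```python
# Illustrative sketch only; not the authors' pipeline.
# Assumption: 5,829,866 is the number of individual reads, each of the
# nominal 300-bp length (no trimming), so the depth is an upper estimate.

ANI_SPECIES_CUTOFF = 95.0  # percent; species cutoff cited in the text [2, 3]


def sequencing_depth(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Return mean fold coverage: total sequenced bases / genome size."""
    return n_reads * read_length_bp / genome_size_bp


def same_species_by_ani(ani_percent: float, cutoff: float = ANI_SPECIES_CUTOFF) -> bool:
    """Return True when the ANI value reaches the species-level cutoff."""
    return ani_percent >= cutoff


# Read count, read length, and estimated genome size as reported in the text.
depth = sequencing_depth(5_829_866, 300, 2_493_412)
print(f"estimated depth: {depth:.0f}-fold")

# ANI values reported in the text.
reported_ani = [
    ("JCM 8573T, highest value within the L. buchneri subgroup", 84.92),
    ("SRR1151226 assembly vs. L. kefiri DSM 20587T", 97.9),
    ("ERR433484 assembly vs. L. kefiri DSM 20587T", 91.7),
]
for label, ani in reported_ani:
    side = "at or above" if same_species_by_ani(ani) else "below"
    print(f"{label}: ANI {ani}% is {side} the {ANI_SPECIES_CUTOFF}% cutoff")
```

Under these assumptions the depth comes out at roughly 700-fold, in line with the reported figure. For the SRR1151226 and ERR433484 assemblies, the comparison against the cutoff is not a species call, because the text attributes their high ANI values against L. kefiri to contamination of the sequencing data.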