bioRxiv preprint doi: https://doi.org/10.1101/2021.06.09.447659; this version posted June 10, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 Article Title: Draft genome assembly data of Anoxybacillus sp. strain MB8 isolated from 2 Tattapani hot springs, India

3

4 Authors: Vishnu Prasoodanan P K, Shruti S. Menon, Rituja Saxena, Prashant Waiker, Vineet 5 K. Sharma

6

7 Affiliations: MetaBioSys Group, Department of Biological Sciences, Indian Institute of 8 Science Education and Research, Bhopal, Madhya Pradesh, India

9

10 Corresponding author: Vineet K. Sharma ([email protected])

11

12 Abstract

13 Discovery of novel thermophiles has shown promising applications in the field of 14 biotechnology. Due to their thermal stability, they can survive the harsh processes in the 15 industries, which make them important to be characterized and studied. Members of 16 Anoxybacillus are alkaline tolerant thermophiles and have been extensively isolated from 17 manure, dairy-processed plants, and geothermal hot springs. This article reports the assembled 18 data of an aerobic bacterium Anoxybacillus sp. strain MB8, isolated from the Tattapani hot 19 springs in Central India, where the 16S rRNA gene shares an identity of 97% (99% coverage) 20 with Anoxybacillus kamchatkensis strain G10. The de novo assembly and annotation performed 21 on the genome of Anoxybacillus sp. strain MB8 comprises of 2,898,780 bp (in 190 contigs) 22 with a GC content of 41.8% and includes 2,976 protein-coding genes,1 rRNA operon, 73 23 tRNAs, 1 tm-RNA and 10 CRISPR arrays. The predicted protein-coding genes have been 24 classified into 21 eggNOG categories. The KEGG Automated Annotation Server (KAAS) 25 analysis indicated the presence of assimilatory sulfate reduction pathway, nitrate reducing 26 pathway, and genes for glycoside hydrolases (GHs) and glycoside transferase (GTs). GHs and 27 GTs hold widespread applications, in the baking and food industry for bread manufacturing, 28 and in the paper, detergent and cosmetic industry. Hence, Anoxybacillus sp. strain MB8 holds 29 the potential to be screened and characterized for such commercially relevant enzymes.

30 bioRxiv preprint doi: https://doi.org/10.1101/2021.06.09.447659; this version posted June 10, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

31 Keywords

32 Anoxybacillus; Illumina; NGS sequencing; Genome assembly

33

34 Specifications Table

Subject Biology Specific subject area Microbial genomics Type of data Genomic sequence, predicted genes and annotation, Table, Figure How data were acquired Whole-genome sequencing using Illumina NextSeq 500 platform Data format Raw and analysed Parameters for data collection The strain MB8 was isolated and cultured on DifcoTryptic Soy Agar (Becton, Dickinson and Company, USA) constituting of pancreatic digest of casein, papaic digest of soybean, sodium chloride and agar and were incubated at 65°C. Genomic DNA was extracted using phenol- chloroform extraction method from the pure culture. The bacterial isolate was identified using 16S rRNA gene amplification and whole genome sequencing using Illumina NextSeq 500 platform. Reads were assembled into 190 contigs using SPAdes 3.5.0. Genome sequence were annotated using DDBJ Fast Annotation and Submission Tool. Description of data collection Extraction of genomic DNA, Libraries were then loaded on Illumina NextSeq 500 platform, de novo assembly and annotation procedures Data source location MetaBioSys Group, Department of Biological Sciences, Indian Institute of Science Education and Research, Bhopal, Madhya Pradesh, India bioRxiv preprint doi: https://doi.org/10.1101/2021.06.09.447659; this version posted June 10, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Data accessibility This whole-genome shotgun project has been deposited in DDBJ/ENA/GenBank under the accession numberRXXX00000000; BioSample SAMN10593045. 35

36 Values of Data

37 1. The Anoxybacillus sp. strain MB8 genome assembly data provides insights into 38 functional potential of thermophilic enzymes of this thermotolerant microbe. 39 2. The presence of genes for Glycoside Hydrolase (GHs) like alpha-amylase, pullulanase, 40 neopullulanase, alpha-glucosidase, beta-fructofuranosidase etc. and genes for Glycosyl 41 Transferase (GTs) like sucrose synthase, maltodextrin phosphorylase, starch synthase, 42 and glycogen phosphorylase were identified, which hold strong industrial values. 43 3. The taxonomic annotation of Anoxybacillus sp. strain MB8 using different approaches 44 indicates that the closest relatives were Anoxybacillus gonensis NZ CP012152T 45 (96.86% ANI) and Anoxybacillus kamchatkensis G10 NZ CP025535 (96.83% ANI) 46 obtained from. The strain MB8 most likely belongs to the same subspecies of 47 Anoxybacillus gonensis NZ CP012152T.

48

49 1. Data Description

50 Thermophiles are known for their capability to subsist in extreme temperatures and are known 51 to be relevant in industrial and biotechnological applications. Metagenomic analysis of several 52 extreme ecosystems across the globe has revealed micro-organisms harbouring genes for 53 enzymes showing a promising role in the applied biotechnology [1–3]. India is known to 54 harbour about 400 thermal springs according to the Geological Survey of India [4], and a study 55 conducted at Tattapani hot-springs (located in the state of Chhattisgarh, India) revealed to 56 harbor bacterial that have adapted to survive in extreme ecosystems [1]. Anoxybacillus 57 mongoliensis strain MB4 was previously isolated from the same hot spring and is a sulfur- 58 utilizing aerobic thermophile [5]. The genome sequence of Gulbenkiania mobilis and 59 Tepidimonas taiwanensis, chemolithotrophic thermophiles, have been reported previously 60 from the Anhoni hot springs located in Central India [6,7]. Here, we report the draft genome 61 of Anoxybacillus sp. strain MB8, isolated from Tattapani hot springs with a temperature of 62 61.5°C, located in the state of Chhattisgarh, Central India (23.41°N latitude and 83.39°E 63 longitude). The 16S rRNA gene sequence of MB8 showed 97% sequence identity with the bioRxiv preprint doi: https://doi.org/10.1101/2021.06.09.447659; this version posted June 10, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

64 thermophilic A. kamchatkensis strain G10, previously isolated from a hot spring in 65 Indonesia [8]. The genome shows the presence of assimilatory sulfate reduction pathway and 66 the presence of glycoside hydrolases (GHs) and glycosyltransferases (GTs). The GH enzymes 67 are capable of degrading carbohydrates and starch; which are the major raw materials in 68 industries. Glycoside hydrolases have been widely used in food manufacturing, cosmetics 69 industry, and animal nutrition. These enzymes have been predicted to have important roles in 70 biorefining applications, food, and baking industries as well as in second-generation biofuel 71 production [9–11].

72 Previous studies have reported, Anoxybacillus spp. to possess genes for starch modifying 73 enzyme. The thermostable properties of the enzymes make them a suitable and stable catalyst 74 for industrial applications, and thus their study and characterization seem to be important for 75 their efficient utilization. This article also reports reference-based genome alignment of the 76 strain MB8 with the five complete and twenty-five available draft genome sequences of the 77 members of genus Anoxybacillus. The analysis reveals the alignment of Anoxybacillus sp. MB8 78 with Anoxybacillus kamchatkensis strain G10 (73.77%) to be the highest among the known 79 five complete genomes whereas Anoxybacillus gonensis strain DT3-1 (72.25%) to be highest 80 among the draft genomes. The MiGA, digital DNA: DNA hybridization (dDDH) obtained 81 using genome-to-genome distance calculator (GGDC) and average nucleotide identity (ANI) 82 revealed that the Anoxybacillus sp. MB8 most likely is a subspecies of Anoxybacillus gonensis 83 NZ CP012152T.

84 The colonies grown on DifcoTryptic Soy Agar were medium-sized, round-shaped, and 85 transparent with irregular margins. The growth of the colonies was observed after 12-16h of 86 incubation under aerobic conditions. The 16S rDNA sequence was used to identify the species. 87 The isolate showed a 97% sequence identity and 99% coverage with Anoxybacillus 88 kamchatkensis strain G10. The phylogenetic tree (Fig. 1) was used to reveal its position in the 89 taxonomic tree, which confirmed that strain MB8 is a member of the genus Anoxybacillus.

90 Illumina sequencing generated 849.3 MB of total data and 47, 86,534 reads (trimmed and 91 filtered) were used for the genome assembly using SPAdes 3.5.0. The draft genome sequence 92 annotated by DFAST server contains 190 contigs (length>=300bp, k-mer set to 91) with an 93 N50 of 54,614 bp and 41.8% G+C content. A total of 2,976 protein-coding genes, RNA genes 94 (1 rRNA operon, 73 tRNA, and 1tmRNA) and 10 CRISPR array were predicted (Table 1). A 95 total of 2,767 protein coding-genes were assigned to 21 eggNOG categories with their relative bioRxiv preprint doi: https://doi.org/10.1101/2021.06.09.447659; this version posted June 10, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

96 abundance after alignment of the genes with eggnog v4.1 database and assigning eggNOG IDs. 97 Within the 21 categories, the majority of genes were assigned to “Function unknown” 98 (26.3823%) followed by “General function prediction only” (7.58%) and “Replication, 99 recombination and repair” (7.37%). The least number of genes were assigned to “Chromatin 100 structure and dynamics” (0.018%). The relative abundance of each category is shown in (Table 101 2).

102 As revealed by the KAAS analysis, the genome MB8 contained genes coding for a complete 103 set of enzymes of general carbohydrate metabolism namely, glycolysis, the TCA cycle, and the 104 pentose phosphate pathway. The genes for catalase-peroxidase, thiol peroxidase, glutathione 105 peroxidase and (Fe-Mn, Cu-Zn) superoxide dismutase was also observed, which are important 106 for protection against reactive oxygen species. Genes for menaquinol-cytochrome c 107 oxidoreductase (qcrA, qcrB, qcrC) were identified, which seem to be absent in A. 108 kamchatkensis strain G10 [8,12]. Precursors for carotenoid biosynthesis are formed by the 109 MEP/DOXP pathway. The only carotenoid biosynthesis genes present were crtB, crtN, and 110 crtP, which can be correlated to the colorless appearance of the colonies. Genes for sugar 111 phosphotransferase systems (PTS) like glucose, sucrose, mannitol, and trehalose were found, 112 indicating the ability of the bacteria to utilize multiple carbohydrates as a sole source of carbon. 113 The genes for assimilatory sulfate reduction pathway (sulfate to sulfide conversion), 114 dissimilatory nitrate reduction to ammonium (NarGHI, NrfAH), base excision repair, and 115 nucleotide excision repair were also observed. The presence of dissimilatory nitrate reduction 116 indicates that the bacterium utilizes nitrate or nitrite as an electron acceptor during anaerobic 117 conditions [13].

118 Genes involved in the selenocompound metabolism like PAPSS and TXNRD, which help in 119 the conversion of selenate to hydrogen selenide were identified. Also, genes involved in the 120 conversion of selenocysteine to seleno-methionine were present. The genome comprised of 121 genes kinA, kinB, kinD, kinE, spo0B, spo0Fand spo0A (a specific master regulator gene) 122 present in sporulation biofilm formation [14]. Genes for aerobic and anaerobic respiration in 123 case of oxygen limitation and genes involved in flagellar motor switch adaption were 124 identified. The genome showed the presence of genes for GHs like alpha-amylase, pullulanase, 125 neopullulanase, alpha-glucosidase, beta-fructofuranosidase, 6-phospho-β-glucosidase, 126 cyclomaltodextrinase (CDase), 1, 4-alpha-glucan branching enzyme, trehalose-6-phosphate 127 hydrolase and oligo-1, 6-glucosidase, which hold strong industrial value [15]. Genes for GTs 128 like sucrose synthase, maltodextrin phosphorylase, starch synthase, and glycogen bioRxiv preprint doi: https://doi.org/10.1101/2021.06.09.447659; this version posted June 10, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

129 phosphorylase were identified. There were genes encoding few prokaryotic type-ABC 130 transporters like arabinooligosacchrides, ribose, zinc, etc. Presence of phosphotransacetylase, 131 acetate kinase, and L-lactate dehydrogenase genes corresponds to fermentative growth in B. 132 subtilis and A. flavithermusWK1[12], all of which were identified in the strain MB8.

133 Reference-based genome alignment for Anoxybacillus sp. strain MB8 was carried out using 134 software Bowtie2 [16] and BWA [17]. Trimmed reads (47,86,534) were aligned against 135 complete genomes of five Anoxybacillus spp. as well as the twenty-five draft genomes of 136 Anoxybacillus spp. The alignment rate against the five complete genomes was obtained using 137 the Bowtie2 (Table 3, Fig. 2a) and the MB8 genome aligned highest with Anoxybacillus 138 kamchatkensis strain G10 (73.77%) and the lowest with Anoxybacillus sp. strain B7M1 139 (2.84%). The reads were also aligned with the 190 contigs obtained after assembly, which 140 showed a 98.68% alignment rate. The alignment rate against the 25 draft reference genomes 141 varied from 1.25%-72.25% (Fig. 2b). The alignments obtained from BWA (Fig. 3 and 4) were 142 further analyzed by Qualimap [18]. The comparison between the 5 reference genomes clearly 143 reveals the coverage (X) across the genome is higher for A. kamchatkensis strain G10 and 144 lowest for Anoxybacillus sp. strain B7M1. The coverage histograms (Fig. 4) indicate the 145 number of genomic locations vs. coverage (X). The comparison between the complete genomes 146 indicates A. kamchatkensis strain G10 having better coverage (X= 208-271) for number base 147 pairs, whereas A. amylolyticus strain DSM, Anoxybacillus sp. strain B7M1, A. flavithermus 148 strain WK1 have lower coverage. A. gonensis strain G2 also has a similar coverage histogram 149 as that of A. kamchatkensis strain G10.

150 According to the 95% ANI and 70% dDDH, which are the thresholds for species demarcation 151 [19], the OGRI (dDDH and ANI) results revealed that the Anoxybacillus spp. strains MB8, G2 152 and G10 have a high degree of genomic relatedness. The results indicate that Anoxybacillus 153 gonensis G2 shares 71.1% dDDH and 96.73% ANI whereas Anoxybacillus kamchatkensis G10 154 shares 71.8% dDDH and 96.67% ANI with Anoxybacillus sp. strain MB8.

155 The MiGA results also indicate that the closest relatives were Anoxybacillus gonensis NZ 156 CP012152T (96.86% ANI) and Anoxybacillus kamchatkensis G10 NZ CP025535 (96.83% 157 ANI) obtained from MiGA. The strain MB8 most likely belongs to the same subspecies of 158 Anoxybacillus gonensis NZ CP012152T (Fig. 5).

159

160 bioRxiv preprint doi: https://doi.org/10.1101/2021.06.09.447659; this version posted June 10, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

161 Fig. 1: Phylogenetic analysis of Anoxybacillus sp. strain MB8 using the Maximum 162 Likelihood method.

163 The tree was constructed based on the 16S rRNA gene sequence. The scale bar represents 0.02 164 nucleotide substitutions per position and the species used as an out-group is Brevibacillus 165 brevis NCIMB 9372. The bootstrap consensus tree inferred from 1000 replicates and the 166 branches corresponding to partitions reproduced in less than 50% bootstrap replicates are 167 collapsed. The evolutionary analyses were conducted in MEGA7.

168 Fig. 2: Alignment of Anoxybacillus sp. strain MB8 reads against complete and draft 169 genomes of Anoxybacillus genus

170 The software Bowtie2 was used to calculate the alignment rates (in percentage). a) The column 171 chart explains the alignment rate of the Anoxybacillus sp. strain MB8 aligned against complete 172 genomes of Anoxybacillus sp. strain B7M1, Anoxybacillus amylolyticus strain DSM, 173 Anoxybacillus flavithermus strain WK1, Anoxybacillus gonensis strain G2, Anoxybacillus 174 kamchatkensis strain G10, and Anoxybacillus sp. strain MB8. b) The bar chart depicts the 175 alignment rate of the Anoxybacillus sp. strain MB8 aligned against twenty-five draft genomes 176 of Anoxybacillus species downloaded from NCBI

177 Fig. 3: Coverage across reference genomes using Qualimap.

178 It represents the coverage distribution and deviation across the reference. The bottom part of 179 each figure indicates the GC content across a reference (black line) and the dotted line shows 180 its average value. The genome Anoxybacillus sp. strain MB8 was aligned against the 5 181 complete genomes using Burrows-Wheeler Aligner (BWA).

182 Fig. 4: Genome coverage histogram generated from Qualimap

183 This figure is created after aligning genome MB8 reads with the 5 reference genomes using 184 BWA.

185 Fig. 5: Figure showing the taxonomic classification of reads using Microbial Genomes 186 Atlas (MiGA) online service.

187 It represents that the isolate MB8 is closely related and most likely be a subspecies of 188 Anoxybacillus gonensis.

189 bioRxiv preprint doi: https://doi.org/10.1101/2021.06.09.447659; this version posted June 10, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

190 Table 1: Genome Features of Anoxybacillus sp. MB8

Attribute Value genome size 2,898,780 bp Total no. of contigs 190 contigs (length>=300) Shortest contig 300 Longest contig 1,60,603 N50 54,614 bp average length 15,257 GC content 41.8% Total no. of protein-coding genes 2,976 Total no. of rRNA operon 1 Total no. of tRNA 73 Total no. of tmRNA 1 Total no. of CRISPRS 10 Coding Ratio 88.1% 191

192 Table 2: eggNOG functional categories obtained using the eggNOG database

Code Relative abundance Description (%) B 0.0180 Chromatin structure and dynamics C 5.5836 Energy production and conversion D 1.0842 Cell cycle control, cell division, chromosome partitioning E 7.0654 Amino acid transport and metabolism F 2.3671 Nucleotide transport and metabolism G 3.6863 Carbohydrate transport and metabolism H 3.6320 Coenzyme transport and metabolism I 3.0538 Lipid transport and metabolism J 5.3306 Translation, ribosomal structure, and biogenesis K 5.7824 Transcription L 7.3726 Replication, recombination and repair M 3.8670 Cell wall/membrane/envelope biogenesis N 1.4395 Cell Motility O Posttranslational modification, protein turnover, 3.4815 chaperones P 5.3668 Inorganic ion transport and metabolism Q Secondary metabolites biosynthesis, transport, and 0.7228 catabolism R 7.5894 General function prediction only S 26.3823 Function unknown T 3.9573 Signal transduction mechanisms U 0.8794 Intracellular trafficking, secretion, and vesicular transport V 1.3371 Defense mechanisms 193 bioRxiv preprint doi: https://doi.org/10.1101/2021.06.09.447659; this version posted June 10, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

194 Table 3: Alignment rate of reads against complete genomes of Anoxybacillus available in 195 NCBI using Bowtie2

Genome name Alignment rate Total aligned reads

Anoxybacillus sp. strain 2.84% 1,36,014 B7M1 Anoxybacillus amylolyticus 4.53% 2,16,934 strain DSM Anoxybacillus flavithermus 24.60% 11,77,579 strain WK1 Anoxybacillus gonensis 73.54% 35,20,021 strain G2 Anoxybacillus 73.77% 35,31,006 kamchatkensis strain G10 Anoxybacillus sp. strain 98.68% 47,23,548 MB8 Total no. of reads 47,86,534 196

197 2. Experimental Design, Materials and Methods 198 2.1. Growth conditions

199 The strain MB8 was isolated and cultured on DifcoTryptic Soy Agar (Becton, Dickinson and 200 Company, USA) constituting of pancreatic digest of casein, papaic digest of soybean, sodium 201 chloride and agar and were incubated at 65°C.

202 2.2. DNA extraction, PCR amplification, and Phylogenetic analysis

203 Genomic DNA was extracted using phenol-chloroform extraction method from the pure 204 culture. The bacterial isolate was identified using 16S rRNA gene amplification and sequencing 205 using universal eubacteria specific primers (8F: 5’-AGAGTTTGATCCTGGCTCAG-3’ and 206 1492R: 5’-GGTTACCTTGTTACGACTT-3’) [20]. The PCR product was analyzed by gel 207 electrophoresis, which was observed to be of ~1400 bp size. The amplified regions were 208 sequenced using Sanger sequencing by ABI 3500 genetic analyzer sequencer (Applied 209 Biosystems) at the DNA sequencing facility of IISER Bhopal. The forward and reverse 210 sequences obtained were assembled using FLASH [21]. A similarity search against the NCBI 211 database was performed for each of the assembled sequences using BLASTn [14]. Further, the 212 phylogenetic analysis was carried using the MEGA7 software [22].

213 2.3. Genome sequencing, de novo assembly, and annotations bioRxiv preprint doi: https://doi.org/10.1101/2021.06.09.447659; this version posted June 10, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

214 The genomic libraries were prepared using the Illumina Nextera XT sample preparation kit 215 (Illumina Inc., USA). The libraries were evaluated on 2100 Bioanalyzer using Bioanalyzer 216 High Sensitivity DNA kit (Agilent Technologies, Santa Clara, CA, USA) to estimate the library 217 size. They were further quantified on a Qubit 2.0 fluorometer using Qubit dsDNA HS kit (Life 218 technologies, USA) and by qPCR using KAPA SYBR FAST qPCR Master Mix and Illumina 219 standards and primer premix (KAPA Biosystems, Wilmington, MA, USA) following the 220 Illumina suggested protocol. Libraries were then loaded on Illumina NextSeq 500 platform 221 using NextSeq 500/550 v2 sequencing reagent kit (Illumina Inc., USA) and 150 bp paired-end 222 sequencing was performed at the Next-Generation Sequencing (NGS) Facility, IISER Bhopal, 223 India.

224 The pre-processing, i.e., ambiguity filtration and quality control, of the raw reads obtained from 225 Illumina sequencing, was performed using the NGSQC Toolkit_v2.3.3 [23]. The filtered reads 226 obtained with a cut-off value for read-length 70 and PHRED quality score of 25 were selected 227 for de novo assembly, which was carried out using SPAdes 3.5.0 [24]. The k-mer parameters 228 from 21 to 101 were used to assemble the reads. The contigs assembled with the k-mer 91 were 229 selected for gene prediction and annotation. The genome sequences (190 contigs) were 230 annotated using DDBJ Fast Annotation and Submission Tool (DFAST) [25]. The tRNA genes 231 were predicted using tRNAscan-SE 2.0 web server [26]. The KEGG Automated Annotation 232 Server (KAAS) analysis was used to assign KOs and to identify metabolic pathways [27]. 233 Further, the annotated genome with 2,796 genes was assigned eggNOG categories by aligning 234 them against eggnog v4.1 database [28] using NCBI-blast-2.6.0+ [29] and were annotated with 235 eggNOG ID. The representation of eggnog classes was deducted using in-house Perl scripts. 236 Contigs were further assembled (72 scaffolds) using Medusa web server [30] for predicting 237 rRNA genes using RNAmmer 1.2 Server [31].

238 2.4. Alignment of MB8 reads against complete genome of Anoxybacillus spp.

239 The complete genome sequences of Anoxybacillus sp. strain B7M1, Anoxybacillus 240 amylolyticus strain DSM, Anoxybacillus flavithermus strain WK1, Anoxybacillus gonensis 241 strain G2, and Anoxybacillus kamchatkensis strain G10 as well as the twenty-five available 242 draft genome sequences were retrieved from NCBI database. Reference-based alignment was 243 carried out using the reads of genome MB8 with the tool Bowtie2 and Burrows-Wheeler 244 Aligner [16,17]. Reads were also aligned against the 190 contigs (length>=300bp) of 245 Anoxybacillus sp. strain MB8. Qualimap [18] was used for comparing the alignment coverage bioRxiv preprint doi: https://doi.org/10.1101/2021.06.09.447659; this version posted June 10, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

246 and GC content after aligning the reads against an aforementioned complete reference 247 genomes. To determine the closest genomic neighbours and nucleotide identities, the web 248 server Microbial Genomes Atlas (MiGA) was used, where the query genome MB8 was 249 searched against the NCBI prokaryotic genome database [32]. The Overall Genome 250 Relatedness Indices (OGRI) was calculated between the query and the 5 reference genomes 251 using Genome-to-Genome Distance Calculator (GGDC) version 2.1 online server, using the 252 recommended formula 2 [33] . The OrthoANIu tool [34] was used to calculate the Average 253 Nucleotide Identity (ANI) values between the query MB8 and two complete reference genome 254 i.e. Anoxybacillus gonensis G2 and Anoxybacillus kamchatkensis G10. The 190 contigs 255 (>=300bp) obtained after assembly was used for MiGA, ANI and GGDC calculations.

256 Author Contributions

257 RS, PW, and VKS carried out the sample collection. RS and PW cultured and isolated the 258 strain, and carried out the 16S rRNA sequencing and identification. RS prepared the sequencing 259 libraries and carried out the whole genome sequencing. VPPK and SSM estimated the genome 260 size and assembly, assessment of assembly quality, carried out the genome annotation, 261 eggNOG analysis, and reference-based genome analysis. SSM carried out the MiGA, GGDC, 262 and OrthoANIu analysis. SSM, RS, VPPK, and VKS wrote the manuscript. All authors read, 263 edited, and approved the final manuscript.

264 Declaration of Competing Interest

265 Authors declare no conflict of interest.

266 Acknowledgments

267 The authors gratefully acknowledge the NGS facility at IISER Bhopal for providing the 268 infrastructure to carry out the sequencing experiments. RS, SSM and VPPK acknowledge DST- 269 INSPIRE for providing research fellowship. We thank MHRD, Govt of India for funding 270 Centre for Research on Environment and Sustainable Technologies (CREST) at IISER Bhopal 271 for providing financial support. However, the views expressed in this manuscript are that of 272 the authors alone and no approval of the same, explicit or implicit, by MHRD should be 273 assumed.

274 References

275 [1] R. Saxena, D.B. Dhakan, P. Mittal, P. Waiker, A. Chowdhury, A. Ghatak, V.K. bioRxiv preprint doi: https://doi.org/10.1101/2021.06.09.447659; this version posted June 10, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

276 Sharma, Metagenomic analysis of hot springs in central india reveals hydrocarbon 277 degrading thermophiles and pathways essential for survival in extreme environments, 278 Front. Microbiol. (2017). https://doi.org/10.3389/fmicb.2016.02123.

279 [2] E.E. Ferrandi, C. Sayer, M.N. Isupov, C. Annovazzi, C. Marchesi, G. Iacobone, X. 280 Peng, E. Bonch-Osmolovskaya, R. Wohlgemuth, J.A. Littlechild, D. Monti, Discovery 281 and characterization of thermophilic limonene-1,2-epoxide hydrolases from hot spring 282 metagenomic libraries, FEBS J. (2015). https://doi.org/10.1111/febs.13328.

283 [3] J.A. Littlechild, Enzymes from extreme environments and their industrial applications, 284 Front. Bioeng. Biotechnol. (2015). https://doi.org/10.3389/fbioe.2015.00161.

285 [4] A. Poddar, S.K. Das, Microbiological studies of hot springs in India: a review, Arch. 286 Microbiol. (2018). https://doi.org/10.1007/s00203-017-1429-3.

287 [5] P. Mittal, R. Saxena, V.K. Sharma, Draft genome sequence of Anoxybacillus 288 mongoliensis strain MB4, a sulfur-utilizing aerobic thermophile isolated from a hot 289 spring in Tattapani, central India, Genome Announc. (2017). 290 https://doi.org/10.1128/genomeA.01709-16.

291 [6] R. Saxena, N. Chaudhary, D.B. Dhakan, V.K. Sharma, Draft genome sequence of 292 Gulbenkiania mobilis strain MB1, a sulfur-metabolizing thermophile isolated from a 293 hot spring in Central India, Genome Announc. (2015). 294 https://doi.org/10.1128/genomeA.01295-15.

295 [7] D.B. Dhakan, R. Saxena, N. Chaudhary, V.K. Sharma, Draft genome sequence of 296 Tepidimonas taiwanensis strain MB2, a chemolithotrophic thermophile isolated from a 297 hot spring in central India, Genome Announc. (2016). 298 https://doi.org/10.1128/genomeA.01723-15.

299 [8] S.J. Lee, Y.J. Lee, N. Ryu, S. Park, H. Jeong, S.J. Lee, B.C. Kim, D.W. Lee, H.S. Lee, 300 Draft genome sequence of the thermophilic bacterium Anoxybacillus kamchatkensis 301 G10, J. Bacteriol. (2012). https://doi.org/10.1128/JB.01877-12.

302 [9] J. Linares-Pasten, M. Andersson, E. Karlsson, Thermostable Glycoside Hydrolases in 303 Biorefinery Technologies, Curr. Biotechnol. (2014). 304 https://doi.org/10.2174/22115501113026660041.

305 [10] S.H. Park, Y. Na, J. Kim, S.D. Kang, K.H. Park, Properties and applications of starch bioRxiv preprint doi: https://doi.org/10.1101/2021.06.09.447659; this version posted June 10, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

306 modifying enzymes for use in the baking industry, Food Sci. Biotechnol. (2018). 307 https://doi.org/10.1007/s10068-017-0261-5.

308 [11] A.O. Belduz, S. Canakci, K.G. Chan, U.M. Kahar, C.S. Chan, A.S. Yaakop, K.M. 309 Goh, Genome sequence of Anoxybacillus ayderensis AB04T isolated from the Ayder 310 hot spring in Turkey, Stand. Genomic Sci. (2015). https://doi.org/10.1186/s40793-015- 311 0065-2.

312 [12] J.H. Saw, B.W. Mountain, L. Feng, M. V. Omelchenko, S. Hou, J.A. Saito, M.B. Stott, 313 D. Li, G. Zhao, J. Wu, M.Y. Galperin, E. V. Koonin, K.S. Makarova, Y.I. Wolf, D.J. 314 Rigden, P.F. Dunfield, L. Wang, M. Alam, Encapsulated in silica: Genome, proteome 315 and physiology of the thermophilic bacterium Anoxybacillus flavithermus WK1, 316 Genome Biol. (2008). https://doi.org/10.1186/gb-2008-9-11-r161.

317 [13] R.W. Ye, S.M. Thomas, Microbial nitrogen cycles: Physiology, genomics and 318 applications, Curr. Opin. Microbiol. (2001). https://doi.org/10.1016/S1369- 319 5274(00)00208-3.

320 [14] V. Thiel, M. Tank, L.P. Tomsho, R. Burhans, S.E. Gaya, T.L. Hamilton, S.C. Schuster, 321 D.A. Bryant, Draft genome sequence of anoxybacillus ayderensis strain MT-Cab 322 (), Genome Announc. (2017). https://doi.org/10.1128/genomeA.00547-17.

323 [15] K.M. Goh, H.M. Gan, K.G. Chan, G.F. Chan, S. Shahar, C.S. Chong, U.M. Kahar, 324 K.P. Chai, Analysis of Anoxybacillus genomes from the aspects of lifestyle 325 adaptations, prophage diversity, and carbohydrate metabolism, PLoS One. (2014). 326 https://doi.org/10.1371/journal.pone.0090549.

327 [16] B. Langmead, S.L. Salzberg, Fast gapped-read alignment with Bowtie 2, Nat. 328 Methods. (2012). https://doi.org/10.1038/nmeth.1923.

329 [17] H. Li, R. Durbin, Fast and accurate short read alignment with Burrows-Wheeler 330 transform, Bioinformatics. (2009). https://doi.org/10.1093/bioinformatics/btp324.

331 [18] F. García-Alcalde, K. Okonechnikov, J. Carbonell, L.M. Cruz, S. Götz, S. Tarazona, J. 332 Dopazo, T.F. Meyer, A. Conesa, Qualimap: Evaluating next-generation sequencing 333 alignment data, Bioinformatics. (2012). https://doi.org/10.1093/bioinformatics/bts503.

334 [19] Y. Liu, Q. Lai, Z. Shao, Genome-based analysis reveals the and diversity of 335 the family idiomarinaceae, Front. Microbiol. (2018). bioRxiv preprint doi: https://doi.org/10.1101/2021.06.09.447659; this version posted June 10, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

336 https://doi.org/10.3389/fmicb.2018.02453.

337 [20] S. Turner, K.M. Pryer, V.P.W. Miao, J.D. Palmer, Investigating deep phylogenetic 338 relationships among cyanobacteria and plastids by small subunit rRNA sequence 339 analysis, in: J. Eukaryot. Microbiol., 1999. https://doi.org/10.1111/j.1550- 340 7408.1999.tb04612.x.

341 [21] T. Magoč, S.L. Salzberg, FLASH: Fast length adjustment of short reads to improve 342 genome assemblies, Bioinformatics. (2011). 343 https://doi.org/10.1093/bioinformatics/btr507.

344 [22] S. Kumar, G. Stecher, K. Tamura, MEGA7: Molecular Evolutionary Genetics Analysis 345 Version 7.0 for Bigger Datasets, Mol. Biol. Evol. (2016). 346 https://doi.org/10.1093/molbev/msw054.

347 [23] R.K. Patel, M. Jain, NGS QC toolkit: A toolkit for quality control of next generation 348 sequencing data, PLoS One. (2012). https://doi.org/10.1371/journal.pone.0030619.

349 [24] A. Bankevich, S. Nurk, D. Antipov, A.A. Gurevich, M. Dvorkin, A.S. Kulikov, V.M. 350 Lesin, S.I. Nikolenko, S. Pham, A.D. Prjibelski, A. V. Pyshkin, A. V. Sirotkin, N. 351 Vyahhi, G. Tesler, M.A. Alekseyev, P.A. Pevzner, SPAdes: A new genome assembly 352 algorithm and its applications to single-cell sequencing, J. Comput. Biol. (2012). 353 https://doi.org/10.1089/cmb.2012.0021.

354 [25] Y. Tanizawa, T. Fujisawa, E. Kaminuma, Y. Nakamura, M. Arita, DFAST and 355 DAGA: Web-based integrated genome annotation tools and resources, Biosci. 356 Microbiota, Food Heal. (2016). https://doi.org/10.12938/bmfh.16-003.

357 [26] P.P. Chan, T.M. Lowe, tRNAscan-SE: Searching for tRNA genes in genomic 358 sequences, in: Methods Mol. Biol., 2019. https://doi.org/10.1007/978-1-4939-9173- 359 0_1.

360 [27] Y. Moriya, M. Itoh, S. Okuda, A.C. Yoshizawa, M. Kanehisa, KAAS: An automatic 361 genome annotation and pathway reconstruction server, Nucleic Acids Res. (2007). 362 https://doi.org/10.1093/nar/gkm321.

363 [28] S. Powell, K. Forslund, D. Szklarczyk, K. Trachana, A. Roth, J. Huerta-Cepas, T. 364 Gabaldón, T. Rattei, C. Creevey, M. Kuhn, L.J. Jensen, C. Von Mering, P. Bork, 365 EggNOG v4.0: Nested orthology inference across 3686 organisms, Nucleic Acids Res. bioRxiv preprint doi: https://doi.org/10.1101/2021.06.09.447659; this version posted June 10, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

366 (2014). https://doi.org/10.1093/nar/gkt1253.

367 [29] C. Camacho, G. Coulouris, V. Avagyan, N. Ma, J. Papadopoulos, K. Bealer, T.L. 368 Madden, BLAST+: Architecture and applications, BMC Bioinformatics. (2009). 369 https://doi.org/10.1186/1471-2105-10-421.

370 [30] E. Bosi, B. Donati, M. Galardini, S. Brunetti, M.F. Sagot, P. Lió, P. Crescenzi, R. 371 Fani, M. Fondi, MeDuSa: A multi-draft based scaffolder, Bioinformatics. (2015). 372 https://doi.org/10.1093/bioinformatics/btv171.

373 [31] K. Lagesen, P. Hallin, E.A. Rødland, H.H. Stærfeldt, T. Rognes, D.W. Ussery, 374 RNAmmer: Consistent and rapid annotation of ribosomal RNA genes, Nucleic Acids 375 Res. (2007). https://doi.org/10.1093/nar/gkm160.

376 [32] L.M. Rodriguez-R, S. Gunturu, W.T. Harvey, R. Rosselló-Mora, J.M. Tiedje, J.R. 377 Cole, K.T. Konstantinidis, The Microbial Genomes Atlas (MiGA) webserver: 378 Taxonomic and gene diversity analysis of Archaea and Bacteria at the whole genome 379 level, Nucleic Acids Res. (2018). https://doi.org/10.1093/nar/gky467.

380 [33] J.P. Meier-Kolthoff, A.F. Auch, H.P. Klenk, M. Göker, Genome sequence-based 381 species delimitation with confidence intervals and improved distance functions, BMC 382 Bioinformatics. (2013). https://doi.org/10.1186/1471-2105-14-60.

383 [34] S.H. Yoon, S. min Ha, J. Lim, S. Kwon, J. Chun, A large-scale evaluation of 384 algorithms to calculate average nucleotide identity, Antonie van Leeuwenhoek, Int. J. 385 Gen. Mol. Microbiol. (2017). https://doi.org/10.1007/s10482-017-0844-4.

386

387

388

389 bioRxiv preprint doi: https://doi.org/10.1101/2021.06.09.447659; this version posted June 10, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. bioRxiv preprint doi: https://doi.org/10.1101/2021.06.09.447659; this version posted June 10, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

a b

98.68% Anoxybacillus tepidamans PS2 N667DRAFT 1.25% 100 Anoxybacillus geothermalis strain ATCC BAA-2555 1.60% Anoxybacillus flavithermus strain B4168 2.51% Anoxybacillus sp. P3H1B 2.83% 80 73.54% 73.77% Anoxybacillus sp. UARK-01 2.86% Anoxybacillus vitaminiphilus strain CGMCC 3.26% Anoxybacillus flavithermus AK1 18.48% 60 Anoxybacillus pushchinoensis strain K1 20.96% Anoxybacillus flavithermus TNO-09.006 25.61% Anoxybacillus flavithermus strain AF14 26.93% 27.24% 40 Anoxybacillus flavithermus strain AF16 Anoxybacillus sp. 103 28.89% 24.60% Anoxybacillus suryakundensis strain DSM 27374 30.52% Anoxybacillus flavithermus subsp. yunnanensis str. E13 31.22% 20 Alignment rate (in %) (in rate Alignment Anoxybacillus flavithermus NBRC 33.06% 4.53% Anoxybacillus flavithermus strain KU2-6_11 42.16% 2.84% Anoxybacillus sp. KU2-6(11) 42.69% 0 Anoxybacillus thermarum strain AF/04T 63.23% Anoxybacillus ayderensis strain SK3-4 66.56% Anoxybacillus ayderensis strain MT-Cab 67.59% Anoxybacillus flavithermus strain 25 68.68%

A.sp MB8 Anoxybacillus sp. BCO1 69.22% A.sp B7M1 A.sp Anoxybacillus ayderensis strain AB04 69.47% A.amylolyticus A.gonensis G2 Anoxybacillus mongoliensis strain MB4 71.17% Anoxybacillus gonensis strain DT3-1 72.25%

A.flavithermus WK1 0 10 20 30 40 50 60 70 80 A.kamchatkensis G10 Alignment rate bioRxiv preprint doi: https://doi.org/10.1101/2021.06.09.447659; this version posted June 10, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. Coverage (X) Coverage (X) Coverage (X) GC (%) GC (%) GC (%) Position (bp) Position (bp) Position (bp) a. Anoxybacillus sp. strain B7M1 b. Anoxybacillus amylolyticus strain DSM c. Anoxybacillus flavithermus strain WK1

Coverage GC Content Coverage (X) Coverage (X) Mean GC Content GC (%) GC (%) Position (bp) Position (bp) d. Anoxybacillus gonensis strain G2 e. Anoxybacillus kamchatkensis strain G10 bioRxiv preprint doi: https://doi.org/10.1101/2021.06.09.447659; this version posted June 10, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Number of genomic locations Number of genomic locations

Number of genomic locations

Coverage(X) Coverage(X) Coverage(X) a. Anoxybacillus sp. strain B7M1 b. Anoxybacillus amylolyticus strain DSM c. Anoxybacillus flavithermus strain WK1

Number of genomic locations

Number of genomic locations

Coverage(X) Coverage(X) d. Anoxybacillus gonensis e. Anoxybacillus kamchatkensis strain G10 strain G2 bioRxiv preprint doi: https://doi.org/10.1101/2021.06.09.447659; this version posted June 10, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

root (p-value 0****) * : p-value < 0.5 : p-value < 0.1 domain Bacteria (p-value 0****) ** : p-value < 0.05 phylum Firmicutes (p-value 0****) *** **** : p-value < 0.01 class Bacilli (p-value 0****) order (p-value 0****) family Bacillaciea (p-value 0****) genus Anoxybacillus (p-value 0.00933****) species Anoxybacillus gonensis (p-value 0.0156***) subspecies Anoxybacillus sp. MB8 (p-value 0.0762**)