Canadian Journal of Microbiology

Genomic Comparison of Facultatively Anaerobic and Obligatory Aerobic Caldibacillus debilis Strains GB1 and Tf Helps Explain Physiological Differences

Journal: Canadian Journal of Microbiology

Manuscript ID cjm-2018-0464.R2

Manuscript Type: Article

Date Submitted by the 18-Dec-2018 Author:

Complete List of Authors: Wushke, Scott; University of Manitoba Froese, Alan; University of Manitoba , Microbiology Fristensky, Brian; University of Manitoba, Plant Science Zhang, Xiang;Draft University of Manitoba, Plant Science Spicer, Victor; University of Manitoba, Physics & Astronomy Krokhin, Oleg; University of Manitoba, Physics & Astronomy Levin, David B.; University of Manitoba, Biosystems Engineering Sparling, Richard; University of Manitoba, Microbiology

Keyword: Thermophile, Fermentation, Genomics

Is the invited manuscript for consideration in a Special Not applicable (regular submission) Issue? :

https://mc06.manuscriptcentral.com/cjm-pubs Page 1 of 28 Canadian Journal of Microbiology

1 Genomic Comparison of Facultatively Anaerobic and Obligatory Aerobic Caldibacillus

2 debilis Strains GB1 and Tf Helps Explain Physiological Differences

3

4 Scott Wushke1, Alan Froese1, Brian Fristensky2, Xiang Li Zhang2, Vic Spicer3, Oleg V.

5 Krokhin4, David B. Levin 5, Richard Sparling1*

6 1Department of Microbiology, University of Manitoba, Winnipeg, Manitoba, Canada

7 2Department Plant Science, University of Manitoba, Winnipeg, Manitoba, Canada

8 3Department Physics & Astronomy, University of Manitoba, Winnipeg, Manitoba, Canada 9 4Department of Internal Medicine & DraftManitoba Centre for Proteomics and Systems Biology, 10 University of Manitoba, Winnipeg, Manitoba, Canada

11 5Department of Biosystems Engineering, University of Manitoba, Winnipeg, Manitoba, Canada

12

13

14

15

16

17

18 * Corresponding Author: Richard Sparling, Department of Microbiology, University of

19 Manitoba, Winnipeg, Manitoba, Canada, R3T 2N2; Tel: (204) 474-8320; Fax: (204) 474-8320;

20 E-mail: [email protected]

https://mc06.manuscriptcentral.com/cjm-pubs Canadian Journal of Microbiology Page 2 of 28

21 Abstract

22 Caldibacillus debilis strains GB1 and Tf display distinct phenotypes. C. debilis GB1 is capable

23 of anaerobic growth and can synthesize ethanol while C. debilis Tf cannot. Comparison of the

24 GB1 and Tf genome sequences revealed that the genomes were highly similar in gene content

25 and showed a high level of synteny. At the genome scale, there were several large sections of

26 DNA that appeared to be from lateral gene transfer into the GB1 genome. Tf did have unique

27 genetic content but at a much smaller scale; 300 genes in Tf verses 857 genes in GB1 that

28 matched at ≤90% sequence similarity. Gene complement and copy number of genes for the

29 glycolysis, tricarboxylic acid (TCA) cycle, and electron transport chain (ETC) pathways were 30 identical in both strains. While Tf is an Draftobligate aerobe, it possesses the gene complement for an 31 anaerobic lifestyle (ldh, ak, pta, adhE, pfl). As a , other strains of C. debilis should be

32 expected to have the potential for anaerobic growth. Assaying the whole cell lysate for ADH

33 activity revealed an approximately 2-fold increase in the enzymatic activity in GB1 when

34 compared to Tf

35

36 Keywords: Thermophilic, Fermentation, Genomics

37

https://mc06.manuscriptcentral.com/cjm-pubs Page 3 of 28 Canadian Journal of Microbiology

38 Importance

39 This work characterizes the genomes of the thermophilic C. debilis strains GB1 and Tf with

40 regard to core metabolism. Comparison of two highly related but physiologically distinct strains

41 highlights the physiology and genome potential of the single species genus Caldibacullis

42 especially with regards to fermentation. Caldibacillus spp. may be of interest in industrial

43 process, namely biofuel production, a deep understanding of core metabolism is needed in order

44 to optimize these processes.

45

Draft

https://mc06.manuscriptcentral.com/cjm-pubs Canadian Journal of Microbiology Page 4 of 28

46 1.0 Introduction 47 48 Caldibacillus debilis GB1 was isolated from a co-culture found to convey aerotolerance

49 to Clostridium thermocellum when co-cultured aerobically on cellulose (Wushke et al. 2013).

50 The recently defined single species genus Caldibacillus has been the subject of only a few

51 physiological studies in pure culture (Banat et al. 2004; Wushke et al. 2015). Caldibacillus

52 debilis Tf was originally isolated from a cool soil environment in Northern Ireland (Banat et al.

53 2004). Contrary to what was observed with C. debilis GB1, Caldibacillus debilis Tf was unable

54 to create a micro-aerophilic environment suitable for co-culturing with the strict anaerobe

55 Clostridium thermocellum (Wushke et al. 2015). We believe that the inability of C. debilis Tf to 56 grow anaerobically may affect its abilityDraft to create the reduced anaerobic microenvironment 57 needed for C. thermocellum to thrive. Previously, Wushke et al. (2015) compared the

58 physiological profile of two strains of C. debilis, GB1 and Tf. They showed near identical

59 characteristics in growth rate, temperature optimum, substrate utilization profile, and types of

60 end-products produced under aerobic and oxygen-limiting conditions, with the notable exception

61 of ethanol production. A C. debilis Tf draft genome has been generated and annotated by the

62 Joint Genome Institute (JGI) (Grigoriev et al. 2012), but there has not been a detailed genome

63 analysis published. The C. debilis GB1 genome was previously sequenced and its proteome was

64 analyzed with respect to aerobic and anaerobic growth (Wushke et al. 2017). Because of the

65 major physiological differences observed, including anaerobic growth and ethanol synthesis, we

66 hypothesized that a comparison of the genomes sequences of the GB1 and Tf, with a focus on

67 core metabolism, could explain the observed physiological differences. Part of the work

68 presented herein was adapted from the Ph.D. thesis of Scott Wushke (2017).

https://mc06.manuscriptcentral.com/cjm-pubs Page 5 of 28 Canadian Journal of Microbiology

69 2.0 Materials and Methods 70

71 2.1 Cell Culturing

72 The methods for growing Caldibacillus debilis strain GB1 DSM 29516 and C. debilis Tf

73 DSM 16016 cultures were described previously (Wushke et al. 2013, 2015). C. debilis strains

74 were grown on modified 1191 (M-1191) medium (Islam et al. 2006) with a lower concentration

75 of yeast extract (0.76 g/L) and the pH was adjusted to 7.2 for all experiments. Sealed Balch tubes

76 (26 mL) from Bellco Glass Inc. (Vineland, NJ) and 1L Corning bottles supplied from Fisher

77 Scientific (Toronto, ON), were used to carry out experiments. Anaerobic and aerobic

78 environments in Balch tubes and Corning bottles were prepared as previously described by

79 Wushke et al. (2015). Draft

80 81 2.2 Genomic Comparison 82 83 The GB1 genome was sequenced and annotated as previously described (Wushke et al.

84 2017). The GB1 and Tf genome sequences can both be accessed through the National Center for

85 Biotechnology Information (NCBI) under the catalogue numbers AZRV00000000 and

86 ARVR00000000, respectively. Joint Genome Institute (JGI) Integrated Microbial Genomes

87 (IMG) Expert Review (IMG/ER) was used to compare the bacterial genomes (Markowitz et al.

88 2008). NUCmer plots were generated through IMG/ER. Gview was used to create genome-scale

89 comparative images (Stothard et al 2017, Petkau et al. 2010). Protein and DNA alignments were

90 done with Multiple Sequence Alignment Tool (MUSCLE) (Edgar et al. 2004). NCBI Blast was

91 used to create a 2D model of domains and key amino acids involved in function (Johnson et al.

92 2008). 3D models of the ADHE proteins were created using RAPTORX (Källberg et al. 2012).

https://mc06.manuscriptcentral.com/cjm-pubs Canadian Journal of Microbiology Page 6 of 28

93 2.3 Protein Extraction 94 95 Samples for protein extraction were collected during early exponential in aerobic growth,

96 OD600 0.2, shaken at 150rpm for both strains GB1 and Tf of C. debilis. Approximately ~200 mg

97 cell pellets were washed twice in phosphate buffered saline (PBS) buffer then re-suspended in 1

98 mL of lysis buffer [1% sodium dodecyl sulfate (SDS), 100 mM dithiothreitol (DTT)] in 100 mM

99 ammonium bicarbonate (ABC). The mixture was vortexed for 30 seconds (sec), then placed in

100 boiling water for 5 minutes (min), and then cooled for 5 min. Samples were then sonicated three

101 times with 15 sec pulses, and 1 min breaks in-between then were centrifuged at 16000 x g for 20

102 min. Supernatants containing the proteins were kept for further processing, and stored at -80 ºC. 103 Aliquots (20 μL) of the protein containingDraft supernatants were diluted with 180 µL of 100 mM 104 ABC buffer. Twenty (20) µL of 500 mM iodoacetic acid (IAA) were added, vortexed for 5 sec

105 and stored in the dark for 45 min to complete alkylation. To quench excess IAA, 33 µL of

106 100mM DTT was added. Then, 150 µL of ABC were added to bring total volume to ~380 µL to

107 reduce the SDS concentration in order to allow digestion. Trypsin (4 µg) was added, and CaCl2

108 was added to a final concentration of 10 mM. Samples were incubated for 12 hours at 37 ºC to

109 allow trypsinization to occur. Samples were then dried using roto-evaporation (Savant Speedvac-

110 sc110), and 200 µL of 3 M KCl was added and vortexed for 10 min, then were spun at 16000 x g

111 for 25 min and supernatants were gently removed from the precipitate (containing the SDS,

112 which is incompatible mass spectrometry methods). The samples were diluted 2 fold with 0.5%

113 trifluoroacetic acid (TFA) then desalted by HPLC using a C18 column as described previously

114 (Wushke et al. 2017).

https://mc06.manuscriptcentral.com/cjm-pubs Page 7 of 28 Canadian Journal of Microbiology

115 2.4 Mass Spectrometry Methods 116

117 One-dimensional liquid chromatography followed by mass spectrometry (1D-LC-MS)

118 proteomics were used to identify proteins by their peptide sequences, as previously described by

119 Gungormusler-Yilmaz et al. (2014). Proteomic profiles of the C. debilis strains GB1 and Tf

120 grown to early exponential phase on cellobiose were compared using an in-house data analysis

121 system called UNITY (Table S5), as previously described (Fu et al. 2015). The XML raw output

122 file from an X!tandem search and its corresponding TIC intensities can be found at

123 http://hs2.proteome.ca/thegpm-cgi/plist.pl?path=/gpm/archive/C.debilis_GB1/GB1A-

124 BASERESULTS.xml&proex=-1&npep=0 and http://hs2.proteome.ca/thegpm- 125 cgi/plist.pl?path=/gpm/archive/C.debilis_GB1/GB1B-BASERESULTS.xml&proex=-1&npep=0Draft 126 and http://hs2.proteome.ca/thegpm-cgi/plist.pl?path=/gpm/archive/C.debilis_GB1/DSM1606AX-

127 BASERESULTS.xml&proex=-1&npep=0.

128

129 2.5 Proteogenomics 130 131 To enhance the genome sequence and annotation, a proteogenomic approach was used,

132 on the basis of the proteomes described above. We have previously described the use of

133 proteomic data to identify and correct improperly annotated genes using a naïve assembler

134 (Wushke et al. 2017) using the same methodology as Verbeke et al. (2014). We used these

135 proteomic data to verify unique regions and genes missing in the C. debilis GB1 and Tf genome

136 by analyzing the peptides generated with respect to both the GB1 and Tf genomes.

137

https://mc06.manuscriptcentral.com/cjm-pubs Canadian Journal of Microbiology Page 8 of 28

138 2.6 Whole cell extract enzymatic assays

139 C. debilis cells grown to early exponential (OD600 0.2) on cellobiose were pelleted

140 through centrifugation at 13,000 rpm for 2 minutes. The pelleted cells were resuspended in lysis

141 buffer containing 10 mM MOPS, pH 7.4, 20 mM NaCl, and 5 mM DTT, and lysed using

142 sonication for 6 minutes with 30 seconds on / 30 seconds off cycles. The alcohol dehydrogenase

143 (ADH) assay reaction mixture contained 0.2 mM NADH or NADPH, 10 mM acetaldehyde, 10

144 mM DTT, 100 mM MOPS, and 20 µl cell extract. The final volume was 200 µl, the assay

145 temperature was 50°C, the pH was 7.4, and the assay was started by the addition of acetaldehyde.

146 The ADH reaction was also done in the reverse direction using 1.36 M ethanol, 5 mM NAD+ or

147 NAD(P)+, 10 mM DTT, 100 mM MOPS, and 20 µl cell extract. In the ALDH forward reaction

148 the acetaldehyde in the reaction mixtureDraft was replaced with 0.5 mM acetyl-CoA. In the reverse

149 direction for the ALDH reaction 1 mM CoA was added and the NAD(P)H was replaced with 2.5

150 mM NAD(P)+. Background activity was measured by adding water instead of the usual starter

151 chemical, and was subtracted from the reaction activity recorded. The changes in NAD(P)(H)

152 concentrations were recorded in a BioTek synergy 4 plate reader by monitoring the absorbance at

153 340 nm. All enzymatic assays were done in triplicate.

154

https://mc06.manuscriptcentral.com/cjm-pubs Page 9 of 28 Canadian Journal of Microbiology

155 3.0 Results 156 157 3.1 GB1 and Tf Genome Comparison 158 159 A comparision of the GB1 (Wushke et al. 2017) and Tf (Mukherjee et al. 2017)

160 permanent draft genomes containing 49 and 41 contigs respectively can be found in Table 1

161 (complete genome comparison summary in Table S1). CheckM analysis (Parks et al. 2014)

162 indicates 99.42 completeness for Tf and 99.27 for GB1 indicating that these genomes are mostly

163 complete with respect to functional genes. A pairwise ANI of the two genomes is consistent

164 with two strains of the same species (Table S2). The genome sequences of C. debilis strains

165 GB1 and Tf are highly similar in both gene identity and genome synteny (2354 genes in 166 matching cassettes, where a cassette consistsDraft of blocks of associated genes in sets of 2 or more). 167 GB1 has 2570 genes matching Tf with a sequence identity ≥90%. Figure S2 shows a NUCmer

168 plot demonstrating their synteny produced using IMG/ER. When the genome sequences of the

169 two strains were compared, there were 853 genes in GB1 that did not match with a sequence

170 identity greater than or equal to 90% in Tf. Since these strains primarily have highly related

171 genes, matches with ≤90% sequence identity were considered to be unique. Many of the

172 unmatched genes found in GB1 were considered hypothetical (408). Many of the unmatched

173 genes could be attributed to two structural differences: i) the addition of 207 genes on contig 4

174 which appear to form a conjugative element; ii) the addition of the full length genome of a

175 cryptic prophage on contig 16 described elsewhere Wushke et al. (2018), as well as several

176 partial copies of this prophage associated with the end of various other contigs. There were also

177 16 transposase-associated genes dispersed through the genome several of which were near

178 identical copies. Figure 1 provides a graphical representation produced by G-View of the GB1

179 and Tf aligned genome contigs with genes matching at ≤90% sequence similarity highlighted. In

https://mc06.manuscriptcentral.com/cjm-pubs Canadian Journal of Microbiology Page 10 of 28

180 contrast to GB1, Tf has only 300 unique genes that do not match sequences in the GB1 genome

181 at a ≤90% sequence similarity. Unlike GB1 most of the unique genes in Tf were scattered

182 throughout the various contigs and did not appear in any large contiguous sections. The unique

183 genes seen in Tf came from a variety of genes including different transposase-associated genes,

184 different small hypothetical genes, and differences in ABC cassettes. Figure S1 is a Venn

185 diagram of unique genes of GB1 and Tf at a 90% sequence similarity cut off.

186 Sequence analyses of GB1 revealed that there are 24 tRNA genes (Cdeb_01241-

187 Cdeb_01300) on contig 4 that are associated with the conjugative element described above.

188 There are 85 tRNA genes in GB1 compared to only 58 in Tf, as shown in Table S3. 189 Investigation of differences in theDraft gene complement of the Phosphotransferase (PT) 190 systems show that GB1 has several extra subunits associated with putative cellobiose transport

191 specificity (Table S4), consistent with its isolation from cellulose enrichment cultures and plates

192 containing cellobiose as carbon and energy substrate (Wushke et al. 2013).

193 We found two annotation differences between GB1 and Tf during our investigation: i)

194 LSU ribosomal protein L36P (Cdeb_03413), while Tf possessed the exact same DNA as GB1

195 sequence but the ORF was missing in Tf. ii) GB1 has its pyruvate kinase split into two ORF’s

196 (Cdeb_02915, Cdeb_02914) which was annotated as separate but adjacent genes representing the

197 two domains of the pyruvate kinase in GB1 and one gene, a pyruvate kinase, in Tf even though

198 DNA in the coding region is 100% identical.

199 3.2 Proteogenomics 200 201 The general features of the C. debilis GB1 genome have been described previously, and

202 proteogenomics used to correct some annotation start sites (Wushke et al. 2017). In this study we

https://mc06.manuscriptcentral.com/cjm-pubs Page 11 of 28 Canadian Journal of Microbiology

203 used a further single shallow proteome for each strain (Table S5), GB1 and Tf (704 and 473

204 proteins detected respectively), during early aerobic growth (Figure S3) which allowed us to: i)

205 further check for genes that were missed due to incomplete assembly of sequences and ii) check

206 whether or not genes in regions which appeared unique to GB1 were expressed. No new genes

207 were found in GB1 or the Tf using proteogenomics that would indicate incomplete sequencing of

208 major transcribed and translated genes. The regions that appeared to be unique to GB1, a cryptic

209 prophage region within contig 16 and conjugation equipment comprising contig 4, resulted in 5

210 and 3 proteins detected respectively. When the Tf proteome was used to check the GB1 genome

211 unique regions on contig 16 and contig 4 no proteins were observed, indicating that these regions

212 were most likely absent in Tf, but unique to GB1.

213 3.4 Pyruvate Fermentation Draft

214 A list of pyruvate metabolism genes and their abbreviations GB1 and Tf can be found in

215 Table 2, the full table with corresponding locus tags can be found in the supplementary (Table

216 S6). When focusing on pyruvate fermentation pathways, small differences were observed in

217 regions within 200bp up stream of a several genes of interest shown in Figure S4. When

218 examining the genes each appeared to have a Pribnow box and transcription start site. In some

219 cases, the differences in the upstream regions between GB1 and Tf were extensive (ldh, pfl, adh),

220 compared to others (ak, pta, adhE, aldh) where differences were observed but were less

221 extensive, as seen in Figure S4. Some of the coding sequences for pyruvate fermentation

222 contained differences (adhE, pfl activating enzyme, one of two aldh, and four of the adh genes),

223 shown in Figure S5, and others (one of two aldh, pfl C-acetyltransferase, ldh and ak) did not, as

224 would be expected in closely related strains. The activity of key enzymes PFL, LDH, AK,

225 ADHE could be surmised by end-product analysis under oxygen-limited conditions, as their

https://mc06.manuscriptcentral.com/cjm-pubs Canadian Journal of Microbiology Page 12 of 28

226 corresponding products were detected, with the exception of ethanol in Tf, even though the

227 genome of strain Tf encodes ADHE. Both strains of C. debilis have a single copy of adhe as well

228 as discrete copies of adh and aldh.

229 While lactic acid production was low in both organisms, differences in lactic acid

230 production were observed between the strains, with GB1 typically producing significantly less

231 than Tf under comparable conditions (Wushke et al. 2015). There are significant differences in

232 upstream regions of the ldh gene between the two strains; these changes may have affected the

233 transcriptional regulation of LDH.

234 When cultured under oxygen-limiting conditions both strains, GB1 and Tf, synthesize 235 formate and acetate as end-products indicatingDraft that both are able to use PFL (Wushke et al. 236 2015). There is a difference in the amount of formate produced by each strain under oxygen-

237 limited conditions; significant differences in the upstream regions of PFL are observed.

238 3.5 Alcohol Aldehyde Dehydrogenase (adhE) fine analysis and Alcohol Aldehyde

239 Dehydrogenase whole lysate assays

240 While the genomes of both strains encoded a putative ADHE enzyme, detailed analyses

241 revealed sequence differences in regions upstream as well as within the coding sequence of the

242 adhE gene. Nucleotide substitutions in the ADHE coding sequence, shown in Figure S6, result in

243 different amino acids: GB1  Tf = 269AE, 482DE, 495VI, 825D  A, 829R  K,

244 831PH. These amino acid changes, shown graphically in Figure S6 are caused by a single

245 nucleotide polymorphism in all but one case, 495VI where two bp were changed. Changes

246 482DE and 495VI, are most likely benign sense mutations. Two codon changes, 825 DA

247 and 829RK, are seen in other functional ADHE alignments and seem to be normal variations

248 in coding sequence (Extance et al. 2013). However, 269AE is a missense substitution rather

https://mc06.manuscriptcentral.com/cjm-pubs Page 13 of 28 Canadian Journal of Microbiology

249 close to a putative catalytic cysteine at position 255. The 831PH codon change is a missense

250 mutation that appears to place a charged amino acid in an α-helix region on the outer shell of the

251 Tf ADHE protein. The changes in the ADH domain of ADHE are not directly within any

252 putative active site, cofactor binding-site, or known allosteric regulation site (Zheng et al. 2015).

253 The Blast domain analysis is shown in Figure S7. These amino acid substitutions affect a small

254 loop connecting an α-helix region (Extance et al. 2013), and therefore could affect overall

255 protein or multimeric protein structure.

256 While there are also differences in regions upstream of adhE, they did not appear to

257 affect the promoter binding site and are not as extensive as the differences observed in ldh and

258 pfl (Figure S4). Draft 259 In order to determine if ADH activity or expression could account for the observed

260 differences in end-product production and anaerobic growth whole cell lysates were assayed.

261 The cell extract ADH activity in the forward direction for GB1 was 2.14 (st. dev = 0.031) U/mg

262 of cell-free extract protein (U = 1 µmol of substrate/min) and for Tf was 1.06 (st. dev. = 0.060)

263 U/mg (Table 3). The ADH activity in the reverse direction for GB1 was 5.75 (st. dev = 0.833)

264 U/mg of cell-free extract protein and for Tf was 0.861 (st. dev. = 0.031) U/mg (Table 3). There

265 was no detectable ADH activity for either GB1 or Tf when using NADPH as the cofactor in the

266 cell extract. The activity of AlDH could not be effectively measured in the reverse direction,

267 possibly due to the presence of an aldh (Cdeb_00412) that only required H2O, acetaldehyde, and

268 NAD+ to form acetate. This enzyme activity masked the CoA dependent activity of aldh, Table

269 S7 shows that activity was the same regardless of the addition of HS-CoA. The forward reaction

270 was not tried as acetyl-CoA is a substrate not only for aldh but also for pyruvate dehydrogenase

271 (PDH) which again would mask results in cell free-extracts.

https://mc06.manuscriptcentral.com/cjm-pubs Canadian Journal of Microbiology Page 14 of 28

272

Draft

https://mc06.manuscriptcentral.com/cjm-pubs Page 15 of 28 Canadian Journal of Microbiology

273 4.0 Discussion

274 4.1 Genomics

275 The genome comparison confirms that these two organisms are distinct strains of the

276 same species despite their divergent capabilities to grow in the absence of oxygen. While there

277 are some genome structural changes observed, the metabolic potential appears to be nearly

278 identical. Interpreting changes in physiology on the basis of minor changes at the nucleotide

279 level in and surrounding some of the core metabolism genes must be approached with caution as

280 such differences are also consistent with what would be expected in distinct strains (Hayashi et

281 al. 2001). Indeed many of the differences noted between GB1 and Tf would not be expected to 282 affect gene expression or activity (e.g. silentDraft mutations, mutations in non-coding areas). 283 Differences in tRNA complement have the potential to affect gene regulation at a global

284 level at the level of translation. However, all of the tRNA’s unique to GB1 had anticodons that

285 were shared by Tf and GB1, therefore the codon usage ability was not expanded.

286

https://mc06.manuscriptcentral.com/cjm-pubs Canadian Journal of Microbiology Page 16 of 28

287 4.2 Pyruvate Fermentation

288 Oxygen depleting conditions, i.e. conditions that become microaerobic, were the most

289 informative to observe differences in end-product profiles between the 2 strains, Table S8

290 (Wushke et al. 2015). Indeed the end product profile observed previously by Wushke et al.

291 (2015) was key in driving the genomic comparison done in this study.

292 Many wild type thermophilic and mesophilic organisms synthesize lactate as a major

293 fermentative end-product (Narayanan et al. 2004) While there are differences in the proportion

294 of lactate produced for Tf and GB1 and differences in the DNA sequence of the gene, clear

295 reasons for difference in lactic acid production were not apparent despite the difference in LDH 296 sequence. Draft 297 Production of formate and acetate alone from pyruvate cannot act as an electron sink for

298 NADH generated by glycolysis. Under microaerobic conditions where oxygen is present and

299 formate and acetate are being produced, Tf is able to use the electrons generated through

300 glycolysis via the electron transport chain (ETC). If typical glycolysis and PFL are the primary

301 means by which acetyl-CoA is generated, then anaerobic growth is not possible unless an

302 additional reduced compound is produced. Strain GB1 is capable of anaerobic growth because it

303 is also able to synthesize ethanol. Typically, expression of an air-sensitive enzyme, such as PFL,

304 is linked to anaerobic metabolism, including in subtilis which has a similar end-product

305 profile as C. debilis GB1 (Nakano et al. 1998). The use of PFL can be important for biosynthesis

306 in other (Rydzak et al. 2015). It is more likely that Tf represents a strain that has lost

307 the ability to produce ethanol, preventing anaerobic growth. Tf still maintains regulation and

308 expression of PFL since formate production is seen under oxygen limiting conditions (Wushke et

309 al. 2015; 2017). Since both strains demonstrate the ability to express PFL, as indicated by the

https://mc06.manuscriptcentral.com/cjm-pubs Page 17 of 28 Canadian Journal of Microbiology

310 presences of formate, the focus should be on ethanol production genes, since this is where there

311 is a clear physiological difference.

312 The synthesis of ethanol appears to be the main mechanism used by C. debilis GB1 to

313 dispose of excess electrons from glycolysis under anaerobic conditions, as lactic acid production

314 seems to be limited in both strains under the growth conditions used (Wushke et al. 2015; 2017).

315 Ethanol synthesis may therefore be an anaerobic requirement in C. debilis GB1. ADHE is

316 associated with the bulk of ethanol production in many anaerobic organisms (Carere et al. 2012,

317 Peng et al. 2008).

318 4.3 Lack of ethanol synthesis in Tf 319 The lack of ethanol synthesis in DraftC. debilis Tf may be understood by looking at the 320 sequence of the bifunctional alcohol/aldehyde dehydrogenase (adhE) gene. Differences in adhE

321 gene expression regulation or absence of ADHE enzyme activity could result in the ethanol-

322 minus Tf phenotype and explain the lack of anaerobic growth. There is precedence in

323 Geobacillus thermoglucosidasius, an organism from closely related genus, which produces

324 similar end-products to C. debilis GB1 under both aerobic and anaerobic growth conditions

325 (Extance et al. 2013). G. thermoglucosidasius has a similar core metabolism gene complement to

326 C. debilis GB1 including several ADH genes, a standalone AlDH and a ADHE Extance et al.

327 (2013) noted that gene knock-outs of G. thermoglucosidasius ADHE results in a lack of

328 anaerobic growth and an ethanol-minus phenotype despite the presence of alternate annotated

329 alcohol dehydrogenases and AlDH, very similar to the Tf phenotype. Furthermore, we do

330 observe a ~2 fold increase in acetaldehyde reduction to ethanol activity in C. debilis GB1 cell

331 extracts, and a 5 fold difference in the opposite direction. Taken together, the results suggest that

https://mc06.manuscriptcentral.com/cjm-pubs Canadian Journal of Microbiology Page 18 of 28

332 the phenotype differences observed when comparing GB1 and Tf could be due to the differences

333 in the amino acid sequence of the ADHE.

334 4.4 Respirofermentative Metabolism

335 Previous experiments done by Wushke et al. (2015) characterised end-product production

336 on cellobiose aerobically and anaerobically for strains GB1 and Tf. Under conditions where

337 oxygen concentrations are low Tf and GB1 both utilise incomplete respiration and non-reductive

338 components of fermentative pathways, also referred to as respirofermentative growth (Siso et al.

339 1996, Zigha et al. 2007). Incomplete respiration can happen when TCA cycle flux is less than

340 that of glycolysis (Vemuri et al. 2006). During respirofermentative growth a significant 341 proportion of the electrons produced fromDraft glycolysis are consumed by respiration, but excess 342 electrons from glycolysis are shunted to fermentation products such as ethanol. GB1 can either

343 use its electrons to create ethanol and/or lactate and/or use them in ETC; based on end-product

344 production under oxygen limiting conditions it appears to do all three. Tf, when undergoing

345 respiro-fermentative growth, utilizes the ETC and produces lactate as alternate electron sink.

346 While organism like Lactobacillus balance its electrons anaerobically by producing lactate alone

347 (De Vries et al.1970), Tf does not do this under the growth conditions tested. While Tf and GB1

348 both have additional adh and aldh genes, these may be used primarily for aerobic ethanol

349 oxidation; Wushke et al. (2015) show GB1 is capable of consuming ethanol aerobically.

350

351 4.5 Issues with the current genus description for Caldibacillus

352 Members of the Bacillus and Geobacillus genera appear to fill many niches within both

353 aerobic and facultativly anaerobic environmental communities. The loss of anaerobic

354 metabolism would seem to be a rather large change regarding taxonomical classification, which

https://mc06.manuscriptcentral.com/cjm-pubs Page 19 of 28 Canadian Journal of Microbiology

355 would typically lead to different strains being described as different species based on

356 physiological characteristics. However, strains GB1 and Tf do appear to be 2 strains of the same

357 species based on their genome comparison. The inability of Tf to undergo anaerobic metabolism

358 appears to be the result of one or several very small changes in the genomes. The ability of

359 similar strains to fulfill distinct roles within a co-culture has been previously described by

360 Verbeke et al. (2011). However, because of the geographic distance between the sources of these

361 2 strains it is impossible at this point to state whether loss of anaerobic growth capabilities is due

362 to natural microdiversity within the species or post enrichment genetic drift because of

363 differences in growth conditions in laboratory prior to sequencing.

364 Overall C. debilis GB1 and Tf are very similar organisms at the genetic level, in both

365 synteny and gene complement, includingDraft core metabolism pathway genes. Through our genome

366 analysis a gene complement consistent with fermentation (pfl, ldh, ak, pta, and adhE) was

367 observed in Tf and GB1. Tf likely evolved from a strain that could ferment cellobiose and

368 produce ethanol. The ability of C. debilis GB1 to grow under anaerobic conditions and

369 synthesize ethanol clearly distinguishes it from the C. debilis the type strain, Tf. The genus

370 description of Caldibacillus should be amended to include anaerobic growth as a possible

371 characteristic for strains of this genus, since it is observed in at least one strain of the only

372 species currently described.

373 Acknowledgements

374 This research was supported by Genome Canada, through the Applied Genomics Research in

375 Bioproducts or Crops (ABC) program for the grant titled, “Microbial Genomics for Biofuels and

376 Co-Products from Biorefining Processes” and by the Natural Sciences and Engineering Research

377 Council of Canada (NSERC) Discovery grant (RGPIN-06173-2014) held by Dr. Sparling

https://mc06.manuscriptcentral.com/cjm-pubs Canadian Journal of Microbiology Page 20 of 28

378 References 379 380 Banat, I., Marchant, R., Rahman, T. 2004. Geobacillus debilis sp. nov., a novel obligately 381 thermophilic bacterium isolated from a cool soil environment, and reassignment of 382 Bacillus pallidus to Geobacillus pallidus comb. nov. International Journal of Systematic 383 Evolutionary Microbiology. 54: 2197-2201. http://dx.doi.org/10.1099/ijs.0.63231-0 384 Carere, C.R., Rydzak, T., Verbeke, T.J., Cicek, N., Levin, D.B., Sparling, R. 2012. Linking 385 genome content to biofuel production yields: a meta-analysis of major catabolic pathways 386 among select H 2 and ethanol-producing .BMC Microbiology.12:1 387 http://dx.doi.org/10.1186/1471-2180-12-295

388 Cripps, R., Eley, K., Leak, D., Rudd, B., Taylor, M., Todd, M., Boakes, S., Martin, S., Atkinson, 389 T. 2009. Metabolic engineering of Geobacillus thermoglucosidasius for high yield 390 ethanol production. Metabolic Engineering. 11: 398-408. 391 http://dx.doi.org/10.1016/j.ymben.2009.08.005 392 Cryer, D. R., R. Eccleshall, J. Marmur. 1975. Isolation of yeast DNA. Methods in cell biology 393 12:39-44. 394 De Vries, Wytske, Willemina MC Kapteijn, E. G. Van der Beek, and A. H. Stouthamer. 1970. 395 Molar growth yields and fermentation balances of Lactobacillus casei L3 in batch 396 cultures and in continuous cultures.Draft Microbiology 63:333-345.

397 Edgar, R.C. 2004 MUSCLE: multiple sequence alignment with high accuracy and high 398 throughput. Nucleic Acids Research. 32:1792-1797. http://doi.org/10.1093/nar/gkh340

399 Extance, J., Crennell, S.J., Eley, K., Cripps, R., Hough, D.W., Danson, M.J. 2013. Structure of a 400 bifunctional alcohol dehydrogenase involved in bioethanol generation in Geobacillus 401 thermoglucosidasius. Acta Crystallographica. 69: 2104-2115. 402 http://dx.doi.org/10.1107/S0907444913020349 403 Fu, J., Sharma, P., Spicer, V., Krokhin, O.V., Zhang, X., Fristensky, B., Sparling, R., Levin, D.B. 404 2015. Effects of impurities in biodiesel-derived glycerol on growth and expression of 405 heavy metal ion homeostasis genes and gene products in Pseudomonas putida LS46. 406 Applied Microbiology and Biotechnology. 1:10. http://dx.doi.org/10.1007/s00253-015- 407 6685-z 408 Grigoriev, I.V., Nordberg, H., Shabalov, I., Aerts, A., Cantor, M., Goodstein, D., Otillar, R. 409 2012. The genome portal of the department of energy joint genome institute. Nucleic 410 Acids Research. 40:D26-D32. http://dx.doi.org/10.1093/nar/gkr947 411 Gungormusler-Yilmaz, M., Shamshurin, D., Grigoryan, M., Taillefer, M., Spicer, V., Krokhin, 412 O.V., Levin, D.B. 2014. Reduced catabolic protein expression in Clostridium butyricum 413 DSM 10702 correlate with reduced 1, 3-propanediol synthesis at high glycerol loading. 414 AMB Express. 4:63. http://dx.doi.org/10.1186/s13568-014-0063-6 415 Hayashi, T., Makino, K., Ohnishi, M., Kurokawa, K., Ishii, K., Yokoyama, K., Shinagawa, H. 416 2001. Complete genome sequence of enterohemorrhagic Eschelichia coli O157: H7 and 417 genomic comparison with a laboratory strain K-12. DNA Research. 8:11-22.

https://mc06.manuscriptcentral.com/cjm-pubs Page 21 of 28 Canadian Journal of Microbiology

418 http://dx.doi.org/10.1093/dnares/8.1.11 419 Islam, R., Cicek, N., Sparling, R., Levin, D. 2006. Effect of substrate loading on hydrogen 420 production during anaerobic fermentation by Clostridium thermocellum 27405. Applied 421 Microbiology and Biotechnology. 72:576-583. http://dx.doi.org/10.1007/s00253-006- 422 0316-7

423 Johnson, M., Zaretskaya, I., Raytselis, Y., Merezhuk, Y., McGinnis, S., Madden, T.L. 2008. 424 NCBI BLAST: a better web interface. Nucleic Acids Research. 36:W5-W9. 425 http://doi.org/10.1093/nar/gkn201

426 Källberg, M., Wang, H., Wang, S., Peng, J., Wang, Z., Lu, H., Xu, J. (2012) Template-based 427 protein structure modeling using the RaptorX web server. Nature Protocols. 7:1511- 428 1522. http://doi.org/10.1038/nprot.2012.085

429 Markowitz, V.M., Chen, I.M.A., Chu, K., Szeto, E., Palaniappan, K., Grechkin, Y., Kyrpides, 430 N.C. 2012. IMG/M: the integrated metagenome data management and comparative 431 analysis system. Nucleic Acids Research. 40:D123-D129. 432 http://dx.doi.org/10.1093/nar/gkr1044 433 Markowitz, V.M., Ivanova, N.N., Szeto, E., Palaniappan, K., Chu, K., Dalevi, D., Kyrpides, N.C. 434 2008. IMG/M: a data management and analysis system for metagenomes.Nucleic Acids 435 Research. 36:D534-D538. http://dx.doi.org/10.1093/nar/gkm869Draft 436 Mukherjee, S., Seshadri, R., Varghese, N.J., Eloe-Fadrosh, E.A., Meier-Kolthoff, J.P., Göker, 437 M., Coates, R.C., Hadjithomas, M., Pavlopoulos, G.A., Paez-Espino, D. and Yoshikuni, 438 Y., 2017. 1,003 reference genomes of bacterial and archaeal isolates expand coverage of 439 the tree of life. Nature biotechnology.35:7 https://doi.org/10.1038/nbt.3886 440 Nakano, M.M. & Zuber, P. 1998. Anaerobic growth of a “strict aerobe” . Annual 441 Review of Microbiology.52:165-190. http://dx.doi.org/10.1146/annurev.micro.52.1.165 442 Narayanan, N., Roychoudhury, P.K., Srivastava, A. 2004. L (+) lactic acid fermentation and its 443 product polymerization. Electronic Journal of Biotechnology. 7:167-178. 444 Parks, D.H., Imelfort, M., Skennerton, C.T., Hugenholtz, P. and Tyson, G.W., 2015. CheckM: 445 assessing the quality of microbial genomes recovered from isolates, single cells, and 446 metagenomes. Genome research. pp.86072. 447 Peng, H., Wu, G., Shao, W. 2008. The aldehyde/alcohol dehydrogenase (AdhE) in relation to the 448 ethanol formation in Thermoanaerobacter ethanolicus JW200. Anaerobe.14:125-127. 449 http://dx.doi.org/10.1016/j.anaerobe.2007.09.004 450 Petkau, A., Matthew, S., Paul, S., Gary, V.D. 2010. Interactive microbial genome visualization 451 with GView. Bioinformatics.24:3125-3126. 452 http://dx.doi.org/10.1093/bioinformatics/btq588 453 Rydzak, T., Levin, D., Cicek, N., Sparling, R. 2009. Growth phase-dependant enzyme profile of 454 pyruvate catabolism and end-product formation in Clostridium thermocellum ATCC 455 27405. Journal of Biotechnology. 140:169-175. 456 http://dx.doi.org/10.1016/j.jbiotec.2009.01.022

https://mc06.manuscriptcentral.com/cjm-pubs Canadian Journal of Microbiology Page 22 of 28

457 Siso, M.G., Ramil, E., Cerdán, M.E., Freire-Picos, M.A. 1996. Respirofermentative metabolism 458 in Kluyveromyces lactis: ethanol production and the Crabtree Effect. Enzyme and 459 Microbial Technology.18:585-591. http://dx.doi.org.uml.idm.oclc.org/10.1016/0141- 460 0229(95)00151-4

461 Stothard, P., Grant, J.R. and Van Domselaar, G., 2017. Visualizing and comparing circular 462 genomes using the CGView family of tools. Briefings in bioinformatics.

463 Verbeke, T. J., Tim J. Dumonceaux, Scott Wushke, Nazim Cicek, David B. Levin, and Richard 464 Sparling. 2011. Isolates of Thermoanaerobacter thermohydrosulfuricus from decaying 465 wood compost display genetic and phenotypic microdiversity." FEMS microbiology 466 ecology 78:3 473-487. 467 Vemuri, G.N., Altman, E., Sangurdekar, D.P., Khodursky, A.B. and Eiteman, M.A. 2006. 468 Overflow metabolism in Escherichia coli during steady-state growth: transcriptional 469 regulation and effect of the redox ratio. Applied and environmental microbiology 470 72:3653-3661. 471 Wushke, S., Levin, D.B., Cicek, N., Sparling, R. 2015. Characterization of the facultative 472 anaerobe Caldibacillus debilis GB1 and its use in a designed aerotolerant, cellulose 473 degrading, co-culture with Clostridium thermocellum. Applied and Environmental 474 Microbiology. AEM, 00735. http://dx.doi.org/10.1128/AEM.00735-15Draft 475 Wushke, S., Levin, D.B., Cicek, N., Sparling, R. 2013. Characterization of enriched aerotolerant 476 cellulose-degrading communities for biofuels production using differing selection 477 pressures and inoculum sources. Canadian Journal of Microbiology.59: 679-683. 478 http://dx.doi.org/10.1139/cjm-2013-0430 479 Wushke, S., Spicer, V., Zhang, X.L., Fristensky, B., Krokhin, O.V., Levin, D.B., Cicek, N. and 480 Sparling, R. 2017. Understanding aerobic/anaerobic metabolism in Caldibacillus debilis 481 through a comparison with model organisms. Systematic and applied microbiology. 482 40:5245-253 483 Wushke, S., Jin, Z., Spicer, V., Zhang, X.L., Fristensky, B., Krokhin, O.V., Levin, D.B. and 484 Sparling, R. 2018. Description of a cryptic thermophilic (pro) phage, CBP1 from 485 Caldibacillus debilis strain GB1. Extremophiles. 22:203-209. 486 https://doi.org/10.1007/s00792-017-0988-1 487 Wushke, S. 2017. Characterization of C. debilis GB1 a thermophilic facultative anaerobe capable 488 of lending aerotolerance in co-culture with C. thermocellum. Ph.D Thesis, University of 489 Manitoba, Winnipeg. 490 Zheng, T., Olson, D.G., Tian, L., Bomble, Y.J., Himmel, M.E., Lo, J., Lynd, L.R. 2015. 491 Cofactor Specificity of the bifunctional alcohol and aldehyde dehydrogenase (AdhE) in 492 wild-type and mutant Clostridium thermocellum and Thermoanaerobacterium 493 saccharolyticum. Journal of Bacteriology.197: 2610-2619. 494 Zigha, A., Rosenfeld, E., Schmitt, P. and Duport, C. 2007. The redox regulator Fnr is required 495 for fermentative growth and enterotoxin synthesis in F4430/73. Journal of 496 bacteriology. 189: 2813-2824. 497

https://mc06.manuscriptcentral.com/cjm-pubs Page 23 of 28 Canadian Journal of Microbiology

498 Figure 1. Unique Genome of C. debilis GB1 (panel A) and Tf (Panel B). Gene bank file (.gbk) 499 comparison of Unique Genome genes that match ≤90% sequence similarity to the 500 corresponding strain are shown in red, ORF’s are shown in brown. This figure is a 501 graphical representation of the differences between the two genomes created by 502 concatenating the contigs of each organism and then comparing them. In panel A, Contig 503 4 is shown with a green arrow, the full putative phage genome shown with orange arrow, 504 plasmid shown with blue arrow. In panel B the areas of uniqueness are much shorter, 505 typically only corresponding to a few genes. The purple arrow points to a divergent ATP 506 binding cassette transporter. 507 508

Draft

https://mc06.manuscriptcentral.com/cjm-pubs Canadian Journal of Microbiology Page 24 of 28

A)

Draft

https://mc06.manuscriptcentral.com/cjm-pubs Page 25 of 28 Canadian Journal of Microbiology

B)

Draft

https://mc06.manuscriptcentral.com/cjm-pubs Canadian Journal of Microbiology Page 26 of 28

Table 1. General genome features of C. debilis GB1 and Tf (source: IMG/er) GB1 Tf DNA, total number of bases 3346235 3061346 DNA coding number of bases 2670185 2430336 % G+C content 51.2 51.6

DNA scaffolds 49 41 CRISPR Count 5 8

Genes total number 3374 2896 Protein coding genes 3264 2800 RNA genes 110 96 rRNA genes 8 13 5S rRNA 5 5 16S rRNA 2 5 23S rRNA 1 3 tRNA genes 85 58 Other RNA genes 17 25 Protein coding genes with functionDraft 2450 2212 prediction without function prediction 814 588 Shared genes (<90% sequence similarity) 2571 Unique genes (<90% sequence similarity) 853 300

https://mc06.manuscriptcentral.com/cjm-pubs Page 27 of 28 Canadian Journal of Microbiology

Table 2. Pyruvate metabolism gene complement of GB1 and Tf as specified through KEGG maps built in IMG/ER

Pyruvate Metabolism # of gene copies in GB1 (same as Tf) Pyruvate formate lyase (pfl) 2 Bifunctional aldehyde/alcohol dehydrogenase (adhe) 1 Acetate kinase (ak) 1 Phosphotransacetylase (pta) 1 Lactate dehydrogenase (ldh) 1 Aldehyde dehydrogenase (aldh) 2 Alcohol dehydrogenase (adh) 4 Pyruvate Dehydrogenase (pdh) E1 (EC:1.2.4.1) 6 E2 (EC:2.3.1.12) 3 E3 (EC:1.8.1.4) 3

Draft

https://mc06.manuscriptcentral.com/cjm-pubs Canadian Journal of Microbiology Page 28 of 28

Table 3. NADH/NAD+ dependent ADH activity in whole cell lysate U/mg Strain added to reaction mixture protein Standard deviation GB1 acetaldehyde 2.033 0.031 Total activity of GB1 ADH in forward direction* 2.137 Tf acetaldehyde 1.281 0.059 Total activity of Tf ADH in forward direction* 1.059 GB1 ethanol 6.59 0.833 Total activity of GB1 in reverse direction* 5.75 Tf ethanol 0.710 0.031 Total activity of Tf in reverse direction* 0.861 *Corrected by subtracting the no carbon substrate control

Draft

https://mc06.manuscriptcentral.com/cjm-pubs