bioRxiv preprint doi: https://doi.org/10.1101/2021.05.11.443690; this version posted May 12, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 Absolute proteome quantification in the gas-fermenting acetogen Clostridium

2 autoethanogenum

3 Kaspar Valgepea1,2, Gert Talbo1,3, Nobuaki Takemori4, Ayako Takemori4, Christina Ludwig5,

4 Alexander P. Mueller6, Ryan Tappel6, Michael Köpke6, Séan Dennis Simpson6, Lars Keld Nielsen1,3,7

5 and Esteban Marcellin1,3*

6 1Australian Institute for Bioengineering and Nanotechnology (AIBN), The University of Queensland, 4072 St. Lucia, Australia

7 2ERA Chair in Gas Fermentation Technologies, Institute of Technology, University of Tartu, 50411 Tartu, Estonia

8 3Queensland Node of Metabolomics Australia, AIBN, The University of Queensland, 4072 St. Lucia, Australia

9 4Institute for Promotion of Science and Technology, Ehime University, 791-0295 Ehime, Japan

10 5Bavarian Center for Biomolecular Mass Spectrometry (BayBioMS), Technical University of Munich, 85354 Freising, Germany

11 6LanzaTech Inc., 60077 Skokie, USA

12 7The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, 2800 Kongens Lyngby, Denmark

13 *Correspondence: [email protected]

14

15 ABSTRACT

16 Microbes that can recycle one-carbon (C1) greenhouse gases into fuels and chemicals are vital for the

17 biosustainability of future industries. Acetogens are the most efficient known microbes for fixing

18 carbon oxides CO2 and CO. Understanding proteome allocation is important for metabolic

19 engineering as it dictates metabolic fitness. Here, we use absolute proteomics to quantify intracellular

20 concentrations for >1,000 proteins in the model-acetogen Clostridium autoethanogenum grown on

21 three gas mixtures. We detect prioritisation of proteome allocation for C1 fixation and significant

22 expression of proteins involved in the production of acetate and ethanol as well as proteins with

23 unclear functions. The data also revealed which isoenzymes are important. Integration of proteomic

24 and metabolic flux data demonstrated that enzymes catalyse high fluxes with high concentrations and

25 high in vivo catalytic rates. We show that flux adjustments were predominantly accompanied by

26 changing catalytic rates rather than concentrations. Our work serves as a reference dataset and

27 advances systems-level understanding and engineering of acetogens.

28

29 INTRODUCTION

30 Increasing concerns about irreversible climate change are accelerating the shift to renewable,

31 carbon-free energy production (e.g., solar, wind, fuel cells). However, many fuels and chemicals will stay

32 carbon-based, and thus, technologies for their production using sustainable and renewable feedstocks

33 are needed to transition towards a circular bioeconomy. Moreover, the rising amount of solid waste

34 produced by human activities (e.g., municipal solid waste, lignocellulosic waste) will further endanger

35 our ecosystems' already critical state. Both challenges can be tackled by using organisms capable of

36 recycling gaseous one-carbon (C1) waste feedstocks (e.g., industrial waste gases [CO2, CO, CH4],

37 syngas from gasified biomass or municipal solid waste [CO, H2, CO2]) into fuels and chemicals at

38 industrial scale1–3.

39 As we transition into a new bioeconomy, a key feature of global biosustainability will be the

40 capacity to convert carbon oxides into products at industrial scale. Acetogens are the ideal biocatalysts

41 for this as they use the most energy-efficient pathway, the Wood-Ljungdahl pathway (WLP)4,5, for

42 fixing CO2 into the central metabolite acetyl-CoA6–9 and accept gas (CO, H2, CO2) as their sole carbon

43 and energy source5. Indeed, the model-acetogen Clostridium autoethanogenum is already being used

44 as a cell factory in industrial-scale gas fermentation3,10. The WLP is considered the first biochemical

45 pathway on Earth7,11–13 and continues to play a critical role in the biogeochemical carbon cycle by

46 fixing an estimated 20% of the global CO2 (refs 6,14). While biochemical details of the WLP are well

47 described4,6,15, a quantitative understanding of acetogen metabolism is just emerging16,17. Notably,

48 recent systems-level analyses of acetogen metabolism have revealed mechanisms behind metabolic

49 shifts18–21, transcriptional architectures22,23, and features of translational regulation24,25. However, we

50 still lack an understanding of acetogen proteome allocation through the quantification of proteome-

51 wide intracellular protein concentrations. This fundamental knowledge is required for advancing

52 rational metabolic engineering of acetogen cell factories and for accurate in silico reconstruction of

53 their phenotypes using metabolic models1,2.

54 Quantitative description of an organism’s proteome allocation through absolute proteome

55 quantification is valuable in several ways. Firstly, it enables us to understand prioritisation of the

56 energetically costly proteome resources among functional protein categories, metabolic pathways, and

57 single proteins26,27. This may also identify relevant proteins with unclear functions and high

58 abundances. Secondly, some metabolic fluxes can be catalysed by isoenzymes and a comparison of

59 their intracellular concentrations can indicate which are likely relevant in vivo and are thus targets for

60 genetic perturbation experiments to validate in vivo functionalities28. Thirdly, integration of absolute

61 proteomics and metabolic flux data enables the estimation of apparent in vivo catalytic rates of

62 enzymes (kapp)26,29, which can be used to identify less-efficient enzymes as targets for improving

63 pathways through metabolic and protein engineering. Absolute proteomics data also contribute to the

64 curation of accurate genome-scale metabolic models.

65 Absolute proteome quantification is generally performed using label-free mass-spectrometry

66 (MS) approaches without spike-in standards30,31. The major limitation of this approach is that

67 accuracy of label-free estimated protein concentrations cannot be determined. Furthermore, the

68 optimal model to convert MS signals (e.g., spectral counts, peak intensities) into protein

69 concentrations remains unknown32–34. Label-based approaches using stable-isotope labelled (SIL)

70 spike-ins of endogenous proteins are thus preferred for reliable absolute proteome quantification. This

71 strategy relies on accurate absolute quantification of a limited set of intracellular proteins (i.e.,

72 anchors) using SIL spike-ins to establish a linear correlation between protein concentrations and their

73 measured MS intensities32. Studies with the latter approach have determined a 1.5–2.4-fold error for

74 label-free estimation of proteome-wide protein concentrations in multiple organisms28,35–41.

75 The aim of our work was to perform reliable absolute proteome quantification for the first

76 time in an acetogen. We employed a label-based MS approach using SIL-protein spike-in standards to

77 quantify SIL-based concentrations for 16 key proteins and label-free-based concentrations for >1,000

78 C. autoethanogenum proteins during autotrophic growth on three gas mixtures. This allowed us to

79 explore global proteome allocation, uncover isoenzyme usage in central metabolism, and quantify

80 regulatory principles associated with estimated kapps. Our work provides an important reference

81 dataset and advances the systems-level understanding and engineering of the ancient metabolism of

82 acetogens.

83

84 RESULTS

85 Absolute proteome quantification framework in the model-acetogen C. autoethanogenum. We

86 performed absolute proteome quantification from autotrophic steady-state chemostat cultures of C.

87 autoethanogenum grown on three different gas mixtures: CO, syngas (CO+CO2+H2), or CO+H2

88 (termed “high-H2 CO”) described before18,19. Briefly, four biological cultures of each gas mixture

89 were grown anaerobically on a chemically defined medium at 37 °C, pH of 5, and dilution rate ~1

90 day⁻¹ (specific growth rate ~0.04 h⁻¹) without the use of heavy SIL substrates. The absolute proteome

91 quantification framework (Fig. 1) was built on using 19 synthetic heavy SIL-variants of key C.

92 autoethanogenum proteins covering central metabolism (Supplementary Table 1). The SIL-protein

93 standards were spiked in for quantification of intracellular concentrations of their endogenous light

94 counterparts. This framework ensures more accurate absolute quantification than commonly used

95 peptide spike-ins. Spiking cell lysates with protein standards before sample clean-up and protein

96 digestion accounts for errors accompanying these critical steps30,31,42,43. Furthermore, selection of

97 peptides ensuring accurate absolute protein quantification without prior MS data is challenging as it is

98 difficult to predict which peptides “fly” well30,31,42,43. In contrast, all proteotypic peptides from a

99 protein spike-in can be used for quantification.

100 We synthesised heavy-labelled lysine and arginine SIL-proteins using a cell-free wheat germ

101 extract platform as described previously18,44,45 and quantified standard stocks using parallel reaction

102 monitoring (PRM) MS. Next, proteins were extracted from culture samples using an optimised

103 protocol maximising extraction yield18 followed by spike-in of the 19 heavy SIL-proteins into light

104 cell lysates. We then used a data-independent acquisition (DIA) MS approach46 to quantitate 1,243

105 proteins of C. autoethanogenum across 12 samples (quadruplicate cultures of three gas mixtures)

106 using a comprehensive spectral library consisting of whole-cell lysates, lysate fractions, and spike-in

107 SIL-proteins. Finally, we quantified intracellular concentrations for 16 key C. autoethanogenum

108 proteins using light-to-heavy ratios between endogenous and spike-in DIA MS intensities and further

109 used these 16 as anchor proteins for label-free estimation of ~1,043 protein concentrations through

110 establishing a linear correlation between protein concentrations and their measured MS intensities32.

111 We express intracellular protein concentrations in nanomoles of protein per gram of dry cell weight

112 (nmol/gDCW).
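
The light-to-heavy quantification step described above can be illustrated with a minimal sketch. It assumes that the endogenous (light) amount equals the light-to-heavy intensity ratio multiplied by the known spiked amount of heavy standard, and that the result is normalised to the biomass in the lysate aliquot. The function name, argument names, and numerical values are hypothetical and are not taken from the study.

```python
def anchor_concentration(light_intensity, heavy_intensity,
                         heavy_spike_pmol, biomass_mg_dcw):
    """Return endogenous protein concentration in nmol/gDCW.

    Assumes equal MS response of light and heavy forms, so that
    light amount = (light / heavy intensity ratio) * spiked heavy amount.
    pmol per mg DCW is numerically identical to nmol per g DCW.
    """
    if heavy_intensity <= 0 or biomass_mg_dcw <= 0:
        raise ValueError("heavy intensity and biomass must be positive")
    ratio = light_intensity / heavy_intensity
    return ratio * heavy_spike_pmol / biomass_mg_dcw


# Hypothetical example values (not measured data).
example_conc = anchor_concentration(
    light_intensity=4.0e6,   # summed light peptide signal
    heavy_intensity=2.0e6,   # summed heavy peptide signal
    heavy_spike_pmol=5.0,    # amount of SIL-protein spiked in
    biomass_mg_dcw=0.5,      # biomass represented by the aliquot
)
```

In practice the ratio would be derived from several proteotypic peptides per protein, each filtered as described in the next section.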

113

114 Absolute quantification of 16 anchor protein concentrations. To ensure high confidence absolute

115 quantification of anchor protein concentrations from the DIA MS data, we employed stringent criteria

116 on top of the automated mProphet peak picking algorithm47 within the software Skyline48. We also

117 performed a dilution series experiment for each SIL-protein to increase accuracy (see Methods for

118 details). Briefly, we kept only peaks with Gaussian shapes and without interference and precursors

119 with highest Skyline quality metrics. Importantly, only peptides whose signals were above the lower

120 limit of quantification (LLOQ) and within the linear dynamic quantification range in the dilution

121 series experiment were used for anchor protein quantification (Supplementary Table 2). We thus used

122 106 high-confidence peptides for the absolute quantification of 16 anchor protein concentrations

123 (Table 1; see also Fig. 5). High confidence of the intracellular concentrations for these key C.

124 autoethanogenum proteins of central metabolism is supported both by the low average 11%

125 coefficient of variation (CV) between biological quadruplicate cultures (Table 1) and the average 22%

126 CV between different peptides of single proteins (Supplementary Table 2).
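
For readers less familiar with the metric, the coefficient of variation reported above can be sketched as follows. The sketch assumes the sample standard deviation (n−1 denominator) is used; the replicate values are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the CV in percent: 100 * sample standard deviation / mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined for a zero mean")
    return 100.0 * stdev(values) / m


# Hypothetical concentrations (nmol/gDCW) from four replicate cultures.
replicate_concs = [9.0, 10.0, 11.0, 10.0]
cv_percent = coefficient_of_variation(replicate_concs)
```

The same function applies to peptide-level values of a single protein when assessing agreement between peptides.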

127

128 Label-free estimation of proteome-wide protein concentrations. Both high quality proteomics data

129 and suitable anchor proteins are required for reliable label-free absolute proteome quantification. Our

130 proteome-wide DIA MS data were highly reproducible with an average Pearson correlation

131 coefficient of R = 0.99 between biological replicates (Fig. 2a and Supplementary Fig. 1). We also

132 found our anchor proteins suitable as their concentrations spanned across three orders of magnitude,

133 and the summed mass accounted for ~⅓ of the peptide mass injected into the mass spectrometer

134 (Table 1 and Fig. 2b). We used the 16 anchor proteins (with 106 peptides) to determine the optimal

135 label-free quantification model with the best linear fit between anchor protein concentrations and their

136 measured DIA MS intensities using the aLFQ R package49 as described before for SWATH MS28 (Fig.

137 2c). Notably, we detected an average 1.5-fold cross-validated mean fold-error (CV-MFE;

138 bootstrapping) for the label-free estimated anchor protein concentrations across samples (Fig. 2d).

139 The errors were distributed normally (Supplementary Fig. 2) with an average 95% CI of 0.3 (Fig. 2d).

140 We then applied the optimal label-free quantification model to estimate ~1,043 protein concentrations

141 in C. autoethanogenum (Supplementary Table 3).
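
The anchor-based label-free estimation can be illustrated with a simplified sketch. It is not the aLFQ implementation used in this work: it assumes a plain least-squares fit on log10-transformed values, replaces the bootstrapping with leave-one-out cross-validation, and uses one common definition of mean fold error (10 to the power of the mean absolute log10 error). All names and data values are hypothetical.

```python
import math


def fit_loglog(intensities, concentrations):
    """Least-squares fit of log10(conc) = slope * log10(intensity) + intercept."""
    xs = [math.log10(i) for i in intensities]
    ys = [math.log10(c) for c in concentrations]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def predict(intensity, slope, intercept):
    """Estimate a concentration from an MS intensity using the fitted line."""
    return 10 ** (slope * math.log10(intensity) + intercept)


def loo_mean_fold_error(intensities, concentrations):
    """Leave-one-out mean fold error: each anchor is predicted from the others."""
    log_errors = []
    for k in range(len(intensities)):
        train_i = intensities[:k] + intensities[k + 1:]
        train_c = concentrations[:k] + concentrations[k + 1:]
        slope, intercept = fit_loglog(train_i, train_c)
        estimate = predict(intensities[k], slope, intercept)
        log_errors.append(abs(math.log10(estimate / concentrations[k])))
    return 10 ** (sum(log_errors) / len(log_errors))


# Hypothetical anchor data: MS intensities and SIL-based concentrations (nmol/gDCW).
anchor_intensities = [1e4, 1e5, 1e6, 1e7, 1e8]
anchor_concs = [0.12, 0.9, 11.0, 95.0, 1100.0]

slope, intercept = fit_loglog(anchor_intensities, anchor_concs)
mfe = loo_mean_fold_error(anchor_intensities, anchor_concs)
```

A fold error of 1 would mean perfect agreement; anchors spanning several orders of magnitude, as here, keep the fit from being dominated by a narrow concentration range.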

142 Prior to the detailed analysis of proteome-wide protein concentrations, we further evaluated

143 our label-free data accuracy beyond the 1.5-fold CV-MFE determined above. Firstly, the total

144 proteome mass (1.2±0.1 μg; average ± standard deviation) closely matched the 1 μg peptide mass

145 injected into the mass spectrometer (Fig. 2d). The data were also supported by a strong correlation

146 between estimated protein concentrations and expected stoichiometries for equimolar (Fig. 3) and

147 non-equimolar protein complexes (Supplementary Fig. 3). Notably, absolute protein concentrations of

148 syngas cultures correlated well (R = 0.65) with their respective absolute transcript expression levels

149 determined before19 (Supplementary Fig. 4). This result is similar to the correlations of absolute data

150 seen in other steady-state cultures26,50. Altogether, we present the first absolute quantitative proteome

151 dataset for a gas-fermenting acetogen that includes SIL-based concentrations for 16 key proteins and

152 label-free estimates for over 1,000 C. autoethanogenum proteins during growth on three gas mixtures.

153

154 C1 fixation dominates global proteome allocation. Global proteome allocation amongst functional

155 classifications was explored using proteomaps27 and KEGG Orthology identifiers (KO IDs)51.

156 The “treemap” structure defining the four-level hierarchy of our proteomaps (Supplementary Table 4)

157 also included manually curated categories to accurately reflect acetogen metabolism (e.g., C1

158 fixation/WLP, Hydrogenases). As expected for autotrophic growth of an acetogen, the C1 fixation

159 (Fig. 4) or WLP (Supplementary Fig. 5) categories dominated the proteome allocation with a ~⅓

160 fraction, compared to Carbohydrate metabolism or Glycolysis/Gluconeogenesis. Notably, the data

161 show that two proteins, dihydrolipoamide dehydrogenase (LpdA; CAETHG_RS07825) and glycine

162 cleavage system H protein (GcvH; RS07795), encoded by the WLP gene cluster, were translated at

163 very high levels (Fig. 4). This is important as both have unknown functions in C. autoethanogenum

164 metabolism. Significant investment in expression of proteins involved in acetate and ethanol

165 production (Supplementary Fig. 5) is consistent with ⅓-to-⅔ of fixed carbon channelled into these

166 two growth by-products across the three gas mixtures18,19. The 11% proteome fraction of category

167 Translation (Fig. 4) is expected for cells growing at a specific growth rate ~0.04 h⁻¹ based on absolute

168 proteomics data from Escherichia coli26,39,52. The notable proteome allocation for Amino acid

169 metabolism and particularly the high abundance of ketol-acid reductoisomerase (IlvC; RS00580) are

170 surprising since metabolic fluxes through 2,3-butanediol and branched-chain amino acid pathways

171 were low under these growth conditions18,19. In addition, numerous proteins with unknown or unclear

172 functions (coloured grey in proteomaps) are highly expressed (e.g., RS12590, RS08610, RS08145),

173 highlighting the need for global mapping of genotype-phenotype relationships in acetogens. In general,

174 proteome allocation was highly similar between the three gas mixtures (Supplementary Table 3). This

175 result is unsurprising given the few relative protein expression differences detected previously18.
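
As an illustration of how category-level proteome fractions can be derived from absolute concentrations, the sketch below assumes that allocation is expressed as a mass fraction, i.e. molar concentration multiplied by molecular weight and summed per functional category. The category labels, concentrations, and molecular weights are hypothetical.

```python
from collections import defaultdict


def mass_fractions(proteins):
    """Return {category: mass fraction of the quantified proteome}.

    `proteins` is an iterable of (category, conc_nmol_per_gdcw, mw_kda).
    nmol/gDCW multiplied by kDa gives micrograms of protein per gDCW.
    """
    mass_by_category = defaultdict(float)
    for category, conc, mw_kda in proteins:
        mass_by_category[category] += conc * mw_kda
    total = sum(mass_by_category.values())
    return {cat: mass / total for cat, mass in mass_by_category.items()}


# Hypothetical proteins: (category, concentration in nmol/gDCW, MW in kDa).
example_proteins = [
    ("C1 fixation", 100.0, 60.0),
    ("Translation", 50.0, 20.0),
    ("Other", 200.0, 40.0),
]
fractions = mass_fractions(example_proteins)
```

Weighting by mass rather than by molecule number reflects the biosynthetic cost of large proteins, which is the quantity of interest for proteome allocation.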

176

177 Enzyme usage revealed in central metabolism. Next, we focused on uncovering enzyme usage in

178 acetogen central metabolism (Fig. 5). This contains enzymes of the WLP, acetate, ethanol, and 2,3-

179 butanediol production pathways, hydrogenases, and the Nfn transhydrogenase, which together carry

180 >90% of the carbon and most of the redox flow in C. autoethanogenum18–20. Multiple metabolic fluxes

181 in these pathways can be catalysed by isoenzymes and absolute proteomics data can indicate which of

182 the isoenzymes are likely relevant in vivo. While the carbon monoxide dehydrogenase (CODH) AcsA

183 (RS07861–62) that forms the bifunctional CODH/ACS complex with the acetyl-CoA synthase53

184 (AcsB; RS07800) is essential for C. autoethanogenum growth on gas as confirmed in mutagenesis

185 studies54, the higher concentrations of the dispensable monofunctional CODH CooS1 (RS14775)

186 suggest it may also play a role in CO oxidation (Fig. 4, 5), in addition to CO2 reduction54.

187 Additionally, our proteomics data show high abundance of the primary acetaldehyde:ferredoxin

188 oxidoreductase (AOR1; RS00440), and this supports the emerging understanding that in C.

189 autoethanogenum ethanol is dominantly produced using the AOR1 activity via acetate, instead of

190 directly from acetyl-CoA via acetaldehyde using mono- or bifunctional activities18,19,55,56 (Fig. 5).

191 Furthermore, the data suggest that the specific alcohol dehydrogenase (Adh4; RS08920) is responsible

192 for reducing acetaldehyde to ethanol, a key reaction in terms of carbon and redox metabolism. The

193 high abundance of the electron-bifurcating hydrogenase HytA-E complex (RS13745–70) compared

194 to alternative hydrogenases confirms that it is the main H2-oxidiser57,58 (Fig. 5). This is consistent with

195 the fact that in the presence of H2 all the CO2 fixed by the WLP is reduced to formate using H2 by the

196 HytA-E and formate dehydrogenase (FdhA; RS13725) enzyme complex activity18,19. Despite the

197 proteomics evidence, genetic perturbations are required to determine condition-specific in vivo

198 functionalities of isoenzymes in acetogens unequivocally.

199 The overall most abundant protein was the formate-tetrahydrofolate ligase (Fhs; RS07850), a

200 key enzyme in the WLP (Fig. 5, 4). Despite the high abundance, its expression might still be rate-

201 limiting (see below). Another key enzyme for acetogens is AcsB because of its essentiality for acetyl-

202 CoA synthesis by the CODH/ACS complex. AcsB is linked to the WLP by the corrinoid iron sulfur

203 proteins AcsC (RS07810) and AcsD (RS07815) that supply the methyl group to AcsB. Interestingly,

204 the ratio of AcsCD-to-AcsB increased from 1.7 (CO) to 2.3 (syngas) to 2.9 (high-H2 CO), suggesting

205 that the primary role of the CODH/ACS complex shifted from CO oxidation towards acetyl-CoA

206 synthesis, likely because increased H2 uptake could replace the supply of reduced ferredoxin from CO

207 oxidation. Concurrently, levels of the Nfn transhydrogenase (RS07665), which acts as a redox valve in

208 acetogens20, remain high (Fig. 5), potentially to rapidly respond to redox perturbations. We

209 conclude that absolute quantitative proteomics can significantly contribute to a systems-level

210 understanding of metabolism, particularly in less-studied organisms.

211

212 Integration of absolute proteomics and flux data yields in vivo enzyme catalytic rates. Absolute

213 proteomics data enable estimation of intracellular catalytic working rates of enzymes when metabolic

214 flux rates are known26,29. We thus calculated apparent in vivo catalytic rates of enzymes, denoted as

215 kapp (s⁻¹)26, as the ratio of specific flux rate (mmol/gDCW/h) determined before18 and protein

216 concentration (nmol/gDCW) (see Methods). This produced kapp values for 13 and 48

217 enzymes/complexes using either anchor or label-free protein concentrations, respectively

218 (Supplementary Table 5 and Fig. 5, 6). The first two critical steps for carbon fixation in the methyl

219 branch of the WLP (i.e., CO to formate) are catalysed at high rates (Fig. 5). Notably, FdhA showed a

220 kapp~30 s⁻¹ for CO2 reduction without H2 during growth on CO only, which is similar to in vitro kcat

221 data of formate dehydrogenases in other acetogens59,60. Interestingly, the next step of formate

222 reduction was potentially catalysed by a less-efficient enzyme (Fhs), as its kapp of ~3 s⁻¹ is significantly

223 lower compared to other WLP enzymes (Fig. 5). Overall, enzymes catalysing reactions in high flux

224 pathways such as the WLP and acetate and ethanol production have higher kapps than those

225 downstream of the conversion of acetyl-CoA to pyruvate (Fig. 5). Indeed, enzymes catalysing high

226 metabolic fluxes in C. autoethanogenum have both higher concentrations and higher catalytic rates

227 compared to enzymes catalysing lower fluxes as both specific flux rates and enzyme concentrations

228 (Kendall’s τ = 0.56, p-value = 5×10⁻⁹) and flux and kapp (τ = 0.45, p-value = 2×10⁻⁶) were significantly

229 correlated (Fig. 6a), as seen before for other organisms26,61.
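
The kapp calculation defined at the start of this section reduces to a unit conversion, sketched below. The sketch assumes one catalytic unit per quantified protein (or complex) molecule; the function name and example values are hypothetical.

```python
def kapp_per_second(flux_mmol_per_gdcw_h, conc_nmol_per_gdcw):
    """Apparent in vivo catalytic rate (per second) = flux / enzyme concentration.

    Flux is converted from mmol/gDCW/h to nmol/gDCW/h (factor 1e6),
    divided by the enzyme concentration in nmol/gDCW,
    and converted from per hour to per second (factor 3600).
    """
    if conc_nmol_per_gdcw <= 0:
        raise ValueError("enzyme concentration must be positive")
    turnovers_per_hour = flux_mmol_per_gdcw_h * 1e6 / conc_nmol_per_gdcw
    return turnovers_per_hour / 3600.0


# Hypothetical example: 10 mmol/gDCW/h carried by 100 nmol/gDCW of enzyme.
example_kapp = kapp_per_second(10.0, 100.0)
```

Because kapp is a ratio, any error in the estimated protein concentration propagates directly into the estimated rate, which is why the fold error of the label-free estimates matters here.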

230 Having acquired absolute proteomics data for C. autoethanogenum growth on three gas

231 mixtures with different metabolic flux profiles also allowed us to determine the impact of change in

232 enzyme concentration and its catalytic rate for adjusting metabolic flux rates. Two extreme examples

233 are the reactions catalysed by the HytA-E (Fig. 6b) and the Nfn (Fig. 5) complexes where flux

234 adjustments were accompanied by large changes in kapps rather than in enzyme concentrations. Flux

235 changes in high flux pathways such as the WLP and acetate and ethanol production also coincided

236 mainly with kapp changes (Fig. 6b). This principle seems to be dominant in C. autoethanogenum as

237 90% of flux changes were not regulated through enzyme concentrations (i.e., post-translational

238 regulation; Supplementary Table 6) when comparing all statistically significant flux changes between

239 the three gas mixtures with respective enzyme expression changes (see Methods).
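
Since kapp is defined as flux divided by enzyme concentration, a flux change between two conditions can be split into a concentration part and a kapp part on a log scale. The sketch below shows only this algebraic identity; it does not reproduce the statistical testing of significant changes described in Methods, and all names and values are hypothetical.

```python
import math


def flux_change_contributions(flux_a, conc_a, flux_b, conc_b):
    """Decompose the log2 flux change from condition A to B.

    Returns (d_flux, d_conc, d_kapp) with d_flux = d_conc + d_kapp,
    which follows from flux = kapp * concentration.
    """
    d_flux = math.log2(flux_b / flux_a)
    d_conc = math.log2(conc_b / conc_a)
    d_kapp = d_flux - d_conc
    return d_flux, d_conc, d_kapp


# Hypothetical example: flux rises 4-fold while enzyme concentration is unchanged,
# so the whole change is attributed to kapp.
d_flux, d_conc, d_kapp = flux_change_contributions(
    flux_a=2.0, conc_a=100.0, flux_b=8.0, conc_b=100.0
)
```

A reaction whose d_conc is close to zero while d_flux is large would fall among the flux changes that are not regulated through enzyme concentration.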

240

241 DISCUSSION

242 The looming danger of irreversible climate change and harmful effects of solid waste accumulation

243 are pushing humanity to develop and adopt sustainable technologies for renewable production of fuels

244 and chemicals and for waste recycling. Acetogen gas fermentation offers great potential to tackle both

245 challenges through recycling waste feedstocks (e.g., industrial waste gases, gasified biomass or

246 municipal solid waste) into fuels and chemicals1,2. Although the quantitative understanding of

247 acetogen metabolism has recently improved16,17, a quantitative description of acetogen proteome

248 allocation was missing. This is needed to advance their metabolic engineering into superior cell

249 factories and accurate in silico reconstruction of their phenotypes1,2. Thus, we performed absolute

250 proteome quantification in the model-acetogen C. autoethanogenum grown autotrophically on three

251 gas mixtures.

252 Our absolute proteome quantification framework relied on SIL-protein spike-in standards and

253 DIA MS analysis to ensure high confidence of the determined intracellular concentrations for 16 key

254 C. autoethanogenum proteins. We further used these proteins as anchor proteins for label-free

255 estimation of >1,000 protein concentrations. This enabled us to determine the optimal label-free

256 quantification model for our data to infer protein concentrations from MS intensities, which remains

257 unknown in common label-free approaches not utilising spike-in standards32,33. More importantly,

258 label-free estimated protein concentrations using the latter approach are questionable as their accuracy

259 cannot be determined. We determined an excellent average error of 1.5-fold for our label-free

260 estimated protein concentrations based on 16 anchor proteins and a bootstrapping approach. This error

261 is in the same range as described in previous studies using SIL spike-in standards for absolute

262 proteome quantification28,35–41. Further, we also observed a good match both between estimated and

263 injected proteome mass into the mass spectrometer and between protein concentrations and expected

264 protein complex stoichiometries. We conclude that label-free estimation of proteome-wide protein

265 concentrations using SIL-protein spike-ins and state-of-the-art MS analysis is reasonably accurate.

266 Quantification of acetogen proteome allocation during autotrophic growth expectedly showed

267 prioritisation of proteome resources for fixing carbon through the WLP, in line with transcript

268 expression data in C. autoethanogenum19,56. The allocation of one third of the total proteome for C1

269 fixation is higher than proteome allocation for carbon fixation through glycolysis during heterotrophic

270 growth of other microorganisms26,36. High abundances of other key enzymes of acetogen central

271 metabolism were also expected as the WLP, acetate and ethanol production pathways, hydrogenases,

272 and the Nfn transhydrogenase carry >90% of the carbon and most of the redox flow in C.

273 autoethanogenum18–20. However, very high expression of the two genes–LpdA and GcvH–of the WLP

274 gene cluster with unknown functions in C. autoethanogenum is striking, raising the question whether

275 their function in C. autoethanogenum could also be to link WLP and glycine synthase-reductase

276 pathways, as recently proposed for another acetogen62. Since many other proteins with unknown or

277 unclear functions were also highly abundant, global mapping of genotype-phenotype relationships in

278 acetogens is much needed.

279 The in vivo functionalities of isoenzymes are not clear for multiple key metabolic fluxes in

280 acetogen central metabolism and absolute proteomics data can indicate which isoenzymes are likely

281 relevant. Oxidation of CO or reduction of CO2 is a fundamental step for all acetogens and known to

282 be catalysed by three CODHs in C. autoethanogenum54. Though only AcsA that forms the

283 bifunctional CODH/ACS complex with the acetyl-CoA synthase53 is essential for growth on gas54, we

284 detected higher concentrations of the monofunctional CODH CooS1, whose deletion strain shows

285 intriguing phenotypes54. Concurrently, our data suggest that prioritisation of CODH/ACS activity

286 between CO oxidation and acetyl-CoA synthesis is sensitive to H2 availability. Thus, further studies

287 are required to decipher condition-dependent functionalities of CODHs. In addition to CODHs, the

288 biochemical understanding of ethanol production is important in terms of both carbon and redox

289 metabolism. Our data confirm that in C. autoethanogenum ethanol is predominantly produced via

290 acetate by AOR118,19,55,56 and more importantly, indicate for the first time that AOR1 activity is

291 followed by Adh4 (previously characterised as butanol dehydrogenase63) for reduction of

292 acetaldehyde to ethanol. These observations call for large-scale genetic perturbation experiments to

293 determine unequivocally the condition-specific in vivo functionalities of isoenzymes in acetogens.

294 Absolute proteomics data offer a unique opportunity to estimate apparent in vivo catalytic

295 rates of enzymes (kapp)26,29 if metabolic flux data are also available. These data are particularly

296 valuable for more accurate in silico reconstruction of phenotypes using protein-constrained

297 genome-scale metabolic models64,65. While in vitro kcat and in vivo kapp data generally correlate29, models using

298 maximal kapp values show better prediction of protein abundances66. Furthermore, kapp values

299 can identify less-efficient enzymes as targets for improving pathways through metabolic and protein

300 engineering. For example, protein engineering of Fhs (catalysing formate reduction) might improve

301 WLP throughput and carbon fixation since its kapp was significantly lower compared to other pathway

302 enzymes. At the same time, the large change in kapps of the abundant electron-bifurcating hydrogenase

303 HytA-E and the Nfn transhydrogenase complexes indicates capacity for the cells to rapidly respond to

304 H2 availability and redox perturbations, which may be critical for metabolic robustness of acetogens20.

305 Overall, we detected both higher concentrations and kapps for enzymes catalysing higher metabolic

306 fluxes, which is believed to arise from an evolutionary push towards reducing protein production costs

307 for enzymes carrying high flux61. The observation that 90% of flux changes in C. autoethanogenum

308 were not regulated through changes in enzyme concentrations is not surprising for a metabolism that

309 operates at the thermodynamic edge of feasibility16,17 since post-translational regulation of fluxes is

310 energetically least costly. Further research is needed to identify which mechanism from post-

311 translational protein modification, , or substrate concentration change is

312 responsible for post-translational regulation of fluxes.

313 We have produced the first absolute proteome quantification in an acetogen and thus provided

314 understanding of global proteome allocation, isoenzyme usage in central metabolism, and regulatory

315 principles of in vivo enzyme catalytic rates. This fundamental knowledge has potential to advance

316 both rational metabolic engineering of acetogen cell factories and accurate in silico reconstruction of

317 their phenotypes1,2,64,65. Our study also highlights the need for large-scale mapping of genotype-

318 phenotype relationships in acetogens to infer in vivo functionalities of isoenzymes and proteins with

319 unknown or unclear functions. This absolute proteomics dataset serves as a reference towards a better

320 systems-level understanding of the ancient metabolism of acetogens.

321

322 METHODS

323 Bacterial strain and culture growth conditions. Absolute proteome quantification was performed

324 from high biomass concentration (~1.4 gDCW/L) steady-state autotrophic chemostat cultures of C.

325 autoethanogenum growing on three different gas mixtures with culturing conditions described in our

326 previous works18,19. Briefly, four biological replicate chemostat cultures of C. autoethanogenum strain

327 DSM 19630 were grown on a chemically defined medium (without yeast extract) either on CO (~60%

328 CO and 40% Ar), syngas (~50% CO, 20% H2, 20% CO2, and 10% N2/Ar), or CO+H2, termed “high-

329 H2 CO”, (~15% CO, 45% H2, and 40% Ar) under strictly anaerobic conditions. The bioreactors were

330 maintained at 37 °C, pH of 5, and dilution rate ~1 day-1 (specific growth rate ~0.04 h-1).

331

332 Cell-free synthesis of stable-isotope labelled protein standards. Twenty proteins covering C.

333 autoethanogenum central carbon metabolism, the HytA-E hydrogenase, and a ribosomal protein

334 (Supplementary Table 1) were selected for cell-free synthesis of SIL-proteins as described in ref.18.

335 Briefly, genes encoding for these proteins were synthesised by commercial gene synthesis services

336 (Biomatik). Target genes were sub-cloned into the cell-free expression vector pEUE01-His-N2

337 (CellFree Sciences) and transformed into Escherichia coli DH5α from which plasmid DNA was

338 extracted and purified. Correct gene insertion into the pEUE01-His-N2 was verified by DNA

339 sequencing. Subsequently, cell-free synthesis of His-tag fused C. autoethanogenum proteins was

340 performed using the bilayer reaction method with the wheat germ extract WEPRO8240H (CellFree

341 Sciences) as described previously44,45. mRNAs for cell-free synthesis were prepared by an in vitro

342 transcription reaction while in vitro translation of target proteins was performed using a bilayer

343 reaction where the translation layer was supplemented with L-Arg-13C6,15N4 and L-Lys-13C6,15N2

344 (Wako) at final concentrations of 20 mM to achieve high efficiency (>99 %) for stable-isotope

345 labelling of proteins. The in vitro synthesised SIL-protein sequences also contained an N-terminal

346 amino acid sequence GYSFTTTAEK that was later used as a tag for quantification of the SIL-protein

347 stock concentration. Subsequently, SIL-proteins were purified using the Ni-Sepharose High-

348 Performance resin (GE Healthcare Life Sciences) and precipitated using methanol:chloroform:water

349 precipitation in Eppendorf Protein LoBind® tubes. Lastly, precipitated SIL-proteins were reconstituted

350 in 104 μL of 8 M urea ([UA]; Sigma-Aldrich) in 0.1 M Trizma® base (pH 8.5) by vigorous vortexing

351 and stored at -80 °C until further use.

352

353 Absolute quantification of SIL-protein standards using PRM MS. Concentrations of the twenty

354 synthesised SIL-protein standard stocks were determined using PRM MS preceded by in-solution

355 digestion of proteins and sample desalting and preparation for MS analysis.

356 Sample preparation

357 Only Eppendorf Protein LoBind® tubes and pipette tips were used for all sample preparation steps.

358 Firstly, 20 μL of UA was added to 4 μL of the SIL-protein standard stock used to determine the stock

359 concentration, and the mix was vortexed. Then, 1 μL of 0.2 M DTT (Promega) was added, followed

360 by vortexing and incubation for 1 h at 37 °C to reduce disulphide bonds. Sulfhydryl groups were

361 alkylated with 2 µL of 0.5 M iodoacetamide (IAA; Sigma-Aldrich), vigorous vortexing, and

362 incubation for 30 min at room temperature in the dark. Next, 75 µL of 25 mM ammonium bicarbonate

363 was added to dilute UA down to 2 M concentration. Subsequently, 2 pmol (2 µL of stock) of the non-

364 labelled AQUA® peptide HLEAAKGYSFTTTAEKAAELHK (Sigma-Aldrich) containing the

365 quantification tag sequence GYSFTTTAEK was added to enable quantification of SIL-protein stock

366 concentrations using MS analysis based on the ratio of heavy-to-light GYSFTTTAEK signals (see

367 below). Protein digestion was performed for 16 h at 37 °C with 0.1 µg of Trypsin/Lys-C mix (1 µL of

368 stock; Promega) and stopped by lowering pH to 3 by the addition of 5 µL of 10% (v/v) trifluoroacetic

369 acid (TFA).
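
The urea dilution stated above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes that the 4 μL SIL-protein stock aliquot is itself in 8 M urea (as it was reconstituted in UA) and that volumes are additive; the function and variable names are hypothetical.

```python
# Volumes in µL, taken from the sample preparation steps described in the text.
UA_ADDED = 20.0      # 8 M urea (UA) added to the stock aliquot
STOCK = 4.0          # SIL-protein stock; ASSUMED to be in 8 M urea as reconstituted
DTT = 1.0            # 0.2 M DTT
IAA = 2.0            # 0.5 M iodoacetamide
AMBIC = 75.0         # 25 mM ammonium bicarbonate used for dilution

UREA_STOCK_M = 8.0   # molarity of the UA solution


def urea_molarity(urea_volume_ul, total_volume_ul, urea_stock_m=UREA_STOCK_M):
    """Final urea molarity after dilution, assuming additive volumes."""
    return urea_stock_m * urea_volume_ul / total_volume_ul


total_volume = UA_ADDED + STOCK + DTT + IAA + AMBIC
final_urea = urea_molarity(UA_ADDED + STOCK, total_volume)
print(f"total volume: {total_volume:.0f} µL, urea: {final_urea:.2f} M")
```

Under these assumptions the final urea concentration comes out slightly below 2 M, consistent with the approximate value given in the text.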

370 Samples were desalted using C18 ZipTips (Merck Millipore) as follows: the column was

371 wetted using 0.1% (v/v) formic acid (FA) in 100% acetonitrile (ACN), equilibrated with 0.1% FA in

372 70% (v/v) ACN, and washed with 0.1% FA before loading the sample and washing again with 0.1%

373 FA. Peptides were eluted from the ZipTips with 0.1% FA in 70% ACN. Finally, samples were dried

374 using a vacuum-centrifuge (Eppendorf) at 30 °C until dryness followed by reconstitution in 12 µL of

375 0.1% FA in 5% ACN for subsequent MS analysis.

376 LC method for PRM MS

377 A Thermo Fisher Scientific UltiMate 3000 RSLCnano UHPLC system was used to elute the samples.

378 Each sample was initially injected (6 µL) onto a Thermo Fisher Acclaim PepMap C18 trap reversed-

379 phase column (300 µm x 5 mm nano viper, 5 µm particle size) at a flow rate of 15 µL/min using 2%

380 ACN for 3 min with the solvent going to waste. The trap column was switched in-line with the

381 separation column (GRACE Vydac Everest C18, 300Å 150 µm x 150 mm, 2 µm) and the peptides

382 were eluted using a flowrate of 3 µL/min using 0.1% FA in water (buffer A) and 80% ACN in buffer

383 A (buffer B) as mobile phases for gradient elution. Following 3 min isocratic of 3% buffer B, peptide

384 elution employed a 3-40% ACN gradient for 28 min followed by 40-95% ACN for 1.5 min and 95%

385 ACN for 1.5 min at 40 °C. The total elution time was 50 min including a 95% ACN wash and a re-

386 equilibration step.

387 PRM MS data acquisition

388 The eluted peptides from the C18 column were introduced to the MS via a nano-ESI and analysed

389 using the Thermo Fisher Scientific Q-Exactive HF-X mass spectrometer. The electrospray voltage

390 was 1.8 kV in positive ion mode, and the ion transfer tube temperature was 250 °C. Full MS-scans

391 were acquired in the Orbitrap mass analyser over the range m/z 550–560 with a mass resolution of

392 30,000 (at m/z 200). The AGC target value was set at 1.00E+06 and maximum accumulation time 50

393 ms for full MS-scans. The PRM inclusion list included two mass values of 552.7640 and 556.7711.

394 MS/MS spectra were acquired in the Orbitrap mass analyser with a mass resolution of 15,000 (at m/z

395 200). The AGC target value was set at 1.00E+06 and maximum accumulation time 30 ms for MS/MS

396 with an isolation window of 2 m/z. The loop count was set at 14 to acquire more MS/MS data. Raw

397 PRM MS data have been deposited to Panorama at

398 https://panoramaweb.org/Valgepea_Cauto_PRM.url (private reviewer account details: username:

399 [email protected]; password: hSAwNUAL) with a ProteomeXchange

400 Consortium (http://proteomecentral.proteomexchange.org) dataset identifier PXD025760.

401 PRM MS data analysis

402 Analysis of PRM MS data was performed using the software Skyline48. The following parameters

403 were used to extract PRM MS data for the quantification tag sequence GYSFTTTAEK: three

404 precursor isotope peaks with a charge of 2 (++) were included (monoisotopic; M+1; M+2); five of the

405 most intense y product ions from ion 3 to last ion of charge state 1 and 2 among the precursor were

406 picked; chromatograms were extracted with an ion match mass tolerance of 0.05 m/z for product ions

407 by including all matching scans; full trypsin specificity with two missed cleavages allowed for

408 peptides with a length of 8-25 AAs; cysteine carbamidomethylation as a fixed peptide modification.

409 Additionally, peptide modifications included heavy labels for lysine and arginine as

410 13C(6)15N(2)/+8.014 Da (K) and 13C(6)15N(4)/+10.008 Da (R), respectively. This translated into the

411 SIL-proteins and the non-labelled AQUA® peptide possessing the tag GYSFTTTAEK with m/z of

412 556.7711 and 552.7640, respectively. Hence, the concentrations of SIL-protein stocks were calculated

413 based on the ratio of heavy-to-light GYSFTTTAEK signals and the spike-in of 2 pmol of the non-

414 labelled AQUA® peptide (see above). High accuracy of quantification was evidenced by the very high

415 similarity between both precursor peak areas and expected isotope distribution (R2>0.99; idotp in

416 Skyline) and heavy and light peak areas (R2>0.99; rdotp in Skyline) for all SIL-protein standard

417 stocks. No heavy GYSFTTTAEK signal was detected for protein CAETHG_RS16140 (an acetylating

418 acetaldehyde dehydrogenase in the NCBI annotation of sequence NC_022592.167), thus 19 SIL-

419 proteins could be used for following absolute proteome quantification in C. autoethanogenum

420 (Supplementary Table 1). PRM MS data with all Skyline processing settings can be viewed and

421 downloaded from Panorama at https://panoramaweb.org/Valgepea_Cauto_PRM.url (private reviewer

422 account details: username: [email protected]; password: hSAwNUAL).
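
As an illustration of the quantification logic described above, the following sketch recomputes the doubly charged precursor m/z values of the light and heavy GYSFTTTAEK tag from standard monoisotopic residue masses, and converts a heavy-to-light signal ratio into a stock concentration. This is not the Skyline workflow itself. The function names are hypothetical, and the concentration formula assumes that each SIL-protein and each AQUA peptide releases exactly one tag peptide upon digestion and that the 2 pmol light spike-in was added to 4 μL of stock, as stated in the text.

```python
# Standard monoisotopic residue masses (Da) for the residues present in the tag.
RESIDUE_MASS = {
    "G": 57.02146, "Y": 163.06333, "S": 87.03203, "F": 147.06841,
    "T": 101.04768, "A": 71.03711, "E": 129.04259, "K": 128.09496,
}
WATER = 18.010565
PROTON = 1.007276
HEAVY_LYS_SHIFT = 8.014199  # 13C(6)15N(2) label on the C-terminal lysine


def precursor_mz(sequence, charge=2, label_shift=0.0):
    """m/z of a peptide precursor for the given charge state."""
    neutral = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER + label_shift
    return (neutral + charge * PROTON) / charge


def stock_concentration(heavy_area, light_area,
                        light_spike_pmol=2.0, stock_volume_ul=4.0):
    """SIL-protein stock concentration (pmol/µL).

    ASSUMES one tag peptide per protein molecule and complete digestion.
    """
    return heavy_area / light_area * light_spike_pmol / stock_volume_ul


light_mz = precursor_mz("GYSFTTTAEK")
heavy_mz = precursor_mz("GYSFTTTAEK", label_shift=HEAVY_LYS_SHIFT)
print(f"light: {light_mz:.4f}, heavy: {heavy_mz:.4f}")
```

The computed values agree with the two masses on the PRM inclusion list (552.7640 and 556.7711) to within rounding.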

423

424 Absolute proteome quantification in C. autoethanogenum using DIA MS. We used 19 synthetic

425 heavy SIL variants (see above) of key C. autoethanogenum proteins (Supplementary Table 1) as

426 spike-in standards for quantification of intracellular concentrations of their non-SIL counterparts

427 using a DIA MS approach46. Also, we performed a dilution series experiment for the spike-in SIL-

428 proteins to ensure accurate absolute quantification. We refer to these 19 intracellular proteins as

429 anchor proteins that were further used to estimate proteome-wide absolute protein concentrations in C.

430 autoethanogenum. This was achieved by determining the best linear fit between anchor protein

431 concentrations and their measured DIA MS intensities using the same strategy as described

432 previously28.

433 Preparation of spike-in SIL-protein standard mix and dilution series samples

434 Only Eppendorf Protein LoBind® tubes and pipette tips were used for all preparation steps. The 19

435 spike-in SIL-protein standards that could be used for absolute proteome quantification in C.

436 autoethanogenum (see above) were mixed in two lots: 1) ‘sample spike-in standard mix’: SIL-protein

437 quantities matching estimated intracellular anchor protein quantities (i.e., expected light-to-heavy

438 [L/H] ratios of ~1) based on label-free absolute quantification of the same samples in our previous

439 work18; 2) ‘dilution series standard mix’: SIL-protein quantities doubling the estimated intracellular

440 anchor protein quantities for the dilution series sample with the highest SIL-protein concentrations.

441 To ensure accurate absolute quantification of anchor protein concentrations, a dilution series

442 experiment was performed to determine the linear dynamic quantification range and LLOQ for each

443 of the 19 spike-in SIL-proteins. Dilution series samples were prepared by making nine 2-fold dilutions

444 of the ‘dilution series spike-in standard mix’ (i.e., 10 samples total for dilution series with a 512-fold

445 concentration span) in a constant C. autoethanogenum cell lysate background (0.07 µg/µL; 10

446 µg/tube) serving as a blocking agent to avoid loss of purified SIL-proteins (to container and pipette tip

447 walls) and as a background proteome for accurate MS quantification of the linear range and LLOQ for

448 anchor proteins.
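
The layout of the dilution series can be sketched as follows. The code only reproduces the arithmetic stated in the text (nine 2-fold dilutions of the undiluted mix, giving ten samples); the function name is hypothetical and the levels are expressed relative to the undiluted sample.

```python
def dilution_series(n_dilutions=9, fold=2.0):
    """Relative spike-in levels, starting from the undiluted standard mix."""
    return [1.0 / fold ** i for i in range(n_dilutions + 1)]


levels = dilution_series()
span = levels[0] / levels[-1]  # fold difference between highest and lowest level
print(len(levels), span)
```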

449 Sample preparation

450 C. autoethanogenum cultures were sampled for proteomics by immediate pelleting of 2 mL of culture

451 using centrifugation (25,000 × g for 1 min at 4 °C) and stored at -80 °C until analysis. Details of

452 protein extraction and protein quantification in cell lysates are described previously18. In short, thawed

453 cell pellets were suspended in lysis buffer (containing SDS, DTT, and Trizma® base) and cell lysis

454 was performed using glass beads and repeating a ‘lysis cycle’ consisting of heating, bead beating,

455 centrifugation, and vortexing before protein quantification using the 2D Quant Kit (GE Healthcare

456 Life Sciences).

457 Sample preparation and protein digestion for MS analysis was based on the filter-aided

458 sample preparation (FASP) protocol68. The following starting material was loaded onto an Amicon®

459 Ultra-0.5 mL centrifugal filter unit (nominal molecular weight cut-off of 30,000; Merck Millipore): 1)

460 50 µg of protein for one culture sample from each gas mixture (CO, syngas, or high-H2 CO) for

461 building the spectral library for DIA MS data analysis (samples 1-3); 2) 7 µg of protein for one

462 culture sample from either syngas or high-H2 CO plus ‘sample spike-in standard mix’ for including

463 spike-in SIL-protein data to the spectral library (samples 4-5); 3) 15 µg of protein for all 12 culture

464 samples (biological quadruplicates from CO, syngas, and high-H2 CO) plus ‘sample spike-in standard

465 mix’ for performing absolute proteome quantification in C. autoethanogenum (samples 6-17); 4) ten

466 dilution series samples with 10-15 µg of total protein (C. autoethanogenum cell lysate background

467 plus ‘dilution series spike-in standard mix’) for performing the dilution series experiment for the 19

468 spike-in SIL-proteins (see above) (samples 18-27).

469 Samples containing SIL-proteins (samples 4-27) were incubated at 37 °C for 1 h to reduce

470 SIL-protein disulphide bonds (cell lysate contained DTT). Details of the FASP workflow are

471 described before18. In short, samples were washed with UA, sulfhydryl groups alkylated with IAA,

472 proteins digested using a Trypsin/Lys-C mix, and peptides eluted from the filter with 60 µL of

473 ammonium bicarbonate. Next, 50 µL of samples 1-3 were withdrawn and pooled for performing high

474 pH reverse-phase fractionation as described previously18 for expanding the spectral library for DIA

475 MS data analysis, yielding eight fractions (samples 28-35). Subsequently, all samples were vacuum-

476 centrifuged at 30 °C until dryness followed by reconstitution of samples 1-3 and 4-35 in 51 and 13 µL

477 of 0.1% FA in 5% ACN, respectively. Finally, total peptide concentration in each sample was

478 determined using the PierceTM Quantitative Fluorometric Peptide Assay (Thermo Fisher Scientific) to

479 ensure that the same total peptide amount across samples 1-17 and 28-35 (excluding samples 18-27,

480 see below) could be injected for DIA MS analysis.

481 LC method for data-dependent acquisition (DDA) and DIA MS

482 Details of the LC method employed for generating the spectral library using DDA and for DIA sample

483 runs are described previously20. In short, a Thermo Fisher Scientific UHPLC system including C18

484 trap and separation columns was used to elute peptides with a gradient and total elution time of 110

485 min. For each DDA and DIA sample run, 1 µg of peptide material from protein digestion was injected,

486 except for dilution series samples (samples 18-27 above) that were injected in a constant volume of 3

487 µL to maintain the dilution levels of the ‘dilution series spike-in standard mix’.

488 DDA MS spectral library generation

489 The following 13 samples were analysed on the Q-Exactive HF-X in DDA mode to yield the spectral

490 library for DIA MS data analysis: 1) three replicates of one culture sample from each gas mixture (CO,

491 syngas, or high-H2 CO) (samples 1-3 above); 2) three replicates of one culture sample from either

492 syngas or high-H2 CO plus ‘sample spike-in standard mix’ (samples 4-5); 3) eight high pH reverse-

493 phase fractions of a pool of samples from each gas mixture (samples 28-35).

494 Details of DDA MS acquisition for generating the spectral library are described before20. In

495 short, eluted peptides from the C18 column were introduced to the MS via a nano-ESI and analysed

496 using the Q-Exactive HF-X with an Orbitrap mass analyser. The DDA MS spectral library for DIA

497 MS data confirmation and quantification using the software Skyline48 was created using the Proteome

498 Discoverer 2.2 software (Thermo Fisher Scientific) and its SEQUEST HT search as described

499 previously18. The final .pd result file contained peptide-spectrum matches (PSMs) with q-values

500 estimated at 1% false discovery rate (FDR) for peptides ≥4 AAs. The generated spectral library file

501 has been deposited to the ProteomeXchange Consortium

502 (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository69 with the dataset

503 identifier PXD025732 (private reviewer account details: username: [email protected];

504 password: QdkJHXy0).

505 DIA MS data acquisition

506 Details of DIA MS acquisition are described before20. In short, as for DDA MS acquisition, eluted

507 peptides were introduced to the MS via a nano-ESI and analysed using the Q-Exactive HF-X with an

508 Orbitrap mass analyser. DIA was achieved using an inclusion list: m/z 395–1100 in steps of 15 amu

509 and scans cycled through the list of 48 isolation windows with a loop count of 48. In total, DIA MS

510 data was acquired for 22 samples (samples 6-27 defined in section Sample preparation). Raw DIA

511 MS data have been deposited to the ProteomeXchange Consortium

512 (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository69 with the dataset

513 identifier PXD025732 (private reviewer account details: username: [email protected];

514 password: QdkJHXy0).
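
The DIA inclusion list described above can be reconstructed in a few lines, assuming that it spans m/z 395 to 1100 inclusive in steps of 15. Whether the listed values denote window centres or window edges is not stated in the text, so the sketch only generates the list entries; the function name is hypothetical.

```python
def dia_inclusion_list(start_mz=395, stop_mz=1100, step=15):
    """Inclusion list entries from start_mz to stop_mz (inclusive)."""
    return list(range(start_mz, stop_mz + 1, step))


windows = dia_inclusion_list()
print(len(windows), windows[0], windows[-1])
```

Under this assumption the list contains 48 entries, which matches the 48 isolation windows and the loop count of 48 given in the text.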

515 DIA MS data analysis

516 DIA MS data analysis was performed with Skyline48 as described before18 with the following

517 modifications: 1) 12 manually picked high confidence endogenous peptides present in all samples and

518 spanning the elution gradient were used for iRT alignment through building an RT predictor; 2)

519 outlier peptides from iRT regression were removed; 3) a minimum of three isotope peaks were

520 required for a precursor; 4) single peptide per spike-in SIL-protein was allowed for anchor protein

521 absolute quantification while at least two peptides per protein were required for label-free estimation

522 of proteome-wide protein concentrations; 5) extracted ion chromatograms (XICs) were transformed

523 using Savitzky-Golay smoothing. Briefly, the .pd result file from Proteome Discoverer was used to

524 build the DIA MS spectral library and the mProphet peak picking algorithm47 within Skyline was used

525 to separate true from false positive peak groups (per sample) and only peak groups with q-value<0.01

526 (representing 1% FDR) were used for further quantification. We confidently quantitated 7,288

527 peptides and 1,243 proteins across all samples and 4,887 peptides and 1,043 proteins on average

528 within each sample for estimating proteome-wide absolute protein concentrations. For absolute

529 quantification of anchor protein concentrations, we additionally manually: 1) removed integration of

530 peaks showing non-Gaussian shapes or interference from other peaks; 2) removed precursors with

531 similarity measures of R2<0.9 between product peak areas and corresponding intensities in the

532 spectral library (dotp in Skyline), precursor peak areas and expected isotope distribution (idotp), or

533 heavy and light peak areas (rdotp). After analysis in Skyline, 17 spike-in SIL-proteins remained for

534 further analysis as protein CAETHG_RS14410 was not identified in DIA MS data while

535 CAETHG_RS18395 did not pass quantification filters (Supplementary Table 1). DIA MS data with

536 all Skyline processing settings can be viewed and downloaded from Panorama at

537 https://panoramaweb.org/Valgepea_Cauto_Anchors.url for anchor protein absolute quantification and

538 at https://panoramaweb.org/Valgepea_Cauto_LF.url for estimating proteome-wide absolute protein

539 concentrations (private reviewer account details: username: [email protected];

540 password: hSAwNUAL).

541 Absolute quantification of anchor protein concentrations

542 We employed further stringent criteria on top of the output from Skyline analysis to ensure high

543 confidence absolute quantification of 17 anchor protein concentrations. Firstly, the precursor with the highest

544 heavy intensity for the highest ‘dilution series spike-in standard mix’ sample in the dilution series

545 (DS01) was kept while others were deleted. Peptides quantified in fewer than three biological replicate

546 cultures within a gas mixture, with no heavy signal for the DS01 sample, or with heavy signals for fewer

547 than three continuous dilution series samples were removed. Next, we utilised the dilution series

548 experiment to only keep signals over the LLOQ and within the linear dynamic quantification range.

549 For this, correlation between experimental and expected peptide L/H signal ratios for each peptide

550 across the dilution series was made to determine the LLOQ and calculate correlation, slope, and

551 intercept between MS signal and spike-in level (Supplementary Table 2). Only peptides showing

552 correlation R2>0.95, 0.95

553 ensured that we were only using peptides within the linear dynamic range. The remaining peptides

554 were further filtered for each culture sample by removing peptides whose light or heavy signal was

555 below the LLOQ in the dilution series. Subsequently, only peptides were kept with L/H ratios for at

556 least three biological replicate cultures for each gas mixture (i.e., ≥9 data points). Finally, we aimed

557 to detect outlier peptides by calculating the % difference of a peptide’s L/H ratio from the average

558 L/H ratio of all peptides for a given protein for every sample. Peptides were considered outliers and

559 thus removed if the average difference across all samples was >50% or if the average difference

560 within biological replicate cultures was >50%. After the previous stringent criteria were applied, 106

561 high-confidence peptides remained (Supplementary Table 2) for the quantification of 16 anchor

562 protein concentrations since CAETHG_RS01830 was lost during manual analysis (Table 1 and

563 Supplementary Table 1). Proteins CAETHG_RS13725 and CAETHG_RS07840 were excluded from

564 the high-H2 CO culture dataset as their calculated concentrations varied >50% between biological

565 replicates. Data of one high-H2 CO culture was excluded from further analysis due to difference from

566 bio-replicates likely due to challenges with MS analysis.
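
The outlier-peptide criterion described above can be sketched as follows. This is one possible reading of the text, not the authors' implementation. It assumes that the % difference is taken as an absolute value, that the protein-level average L/H ratio includes the peptide being tested, and that replicate groups are given as lists of sample indices. All names and the example ratios are hypothetical.

```python
def mean_percent_difference(ratios_by_peptide, sample_indices):
    """Mean absolute % difference of each peptide's L/H ratio from the
    per-sample average L/H ratio of all peptides of ONE protein."""
    peptides = list(ratios_by_peptide)
    totals = {p: 0.0 for p in peptides}
    for s in sample_indices:
        sample_mean = sum(ratios_by_peptide[p][s] for p in peptides) / len(peptides)
        for p in peptides:
            totals[p] += abs(ratios_by_peptide[p][s] - sample_mean) / sample_mean * 100
    return {p: t / len(sample_indices) for p, t in totals.items()}


def remove_outliers(ratios_by_peptide, replicate_groups, max_percent=50.0):
    """Drop peptides whose average difference exceeds max_percent across all
    samples or within any group of biological replicates."""
    all_samples = [s for group in replicate_groups for s in group]
    checks = [mean_percent_difference(ratios_by_peptide, all_samples)]
    checks += [mean_percent_difference(ratios_by_peptide, g) for g in replicate_groups]
    return {
        p: r for p, r in ratios_by_peptide.items()
        if all(check[p] <= max_percent for check in checks)
    }


# Hypothetical L/H ratios for five peptides of one protein in four samples.
example = {
    "pep1": [1.0, 1.1, 0.9, 1.0],
    "pep2": [0.9, 1.0, 1.0, 1.1],
    "pep3": [1.1, 0.9, 1.0, 0.9],
    "pep4": [1.0, 1.0, 1.1, 1.0],
    "pep5": [4.0, 4.2, 3.8, 4.0],  # deviates strongly from the others
}
kept = remove_outliers(example, replicate_groups=[[0, 1], [2, 3]])
print(sorted(kept))
```

Because the protein-level average includes the outlier itself, a strongly deviating peptide also inflates the differences of the remaining peptides; with few peptides per protein this can matter when choosing how to apply the threshold.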

567 Label-free estimation of proteome-wide protein concentrations

568 We used the anchor proteins to estimate proteome-wide protein concentrations in C.

569 autoethanogenum by determining the best linear fit between anchor protein concentrations and their

570 measured DIA MS intensities using the aLFQ R package49 and the same strategy as described for

571 SWATH MS28. Briefly, aLFQ used anchor proteins and cross-validated model selection by

572 bootstrapping to determine the optimal model within various label-free absolute proteome

573 quantification approaches (e.g., TopN, iBAQ). The approach can obtain the model with the smallest

574 error between anchor protein concentrations determined using SIL-protein standards and label-free

575 estimated concentrations. The models with the highest accuracy were used to estimate proteome-wide

576 label-free concentrations for all proteins from their DIA MS intensities (1,043 proteins on average

577 within each sample; minimal two peptides per protein; see above): summing the five most intense

578 fragment ion intensities of the most or three of the most intense peptides per protein for CO or high-

579 H2 CO cultures, respectively; summing the five most intense fragment ion intensities of all quantified

580 peptides of the protein divided by the number of theoretically observable peptides (i.e., iBAQ70) for

581 syngas cultures.
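
To make the label-free intensity measures above concrete, the sketch below computes TopN-style and iBAQ-style protein intensities from fragment ion intensities and fits a straight line between anchor protein intensities and concentrations. It is written in plain Python and does not use or reproduce the aLFQ package; fitting in log10 space is an assumption of this sketch, and all function names and example numbers are hypothetical.

```python
import math


def peptide_intensity(fragment_intensities, top_fragments=5):
    """Sum of the most intense fragment ion intensities of one peptide."""
    return sum(sorted(fragment_intensities, reverse=True)[:top_fragments])


def top_n_protein_intensity(peptides, n_peptides=3, top_fragments=5):
    """Sum over the n most intense peptides (each a list of fragment intensities)."""
    per_peptide = sorted(
        (peptide_intensity(f, top_fragments) for f in peptides), reverse=True
    )
    return sum(per_peptide[:n_peptides])


def ibaq_protein_intensity(peptides, n_theoretical_peptides, top_fragments=5):
    """Sum over all quantified peptides divided by the number of
    theoretically observable peptides."""
    total = sum(peptide_intensity(f, top_fragments) for f in peptides)
    return total / n_theoretical_peptides


def fit_log_linear(intensities, concentrations):
    """Least-squares fit of log10(conc) = slope * log10(intensity) + intercept."""
    xs = [math.log10(v) for v in intensities]
    ys = [math.log10(v) for v in concentrations]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


def predict_concentration(intensity, slope, intercept):
    """Estimated concentration of a non-anchor protein from its intensity."""
    return 10 ** (slope * math.log10(intensity) + intercept)


# Hypothetical protein with four quantified peptides.
peptides = [[10, 9, 8, 7, 6, 5], [4, 4, 4, 4, 4], [1, 1, 1, 1, 1], [0.5] * 5]
top3 = top_n_protein_intensity(peptides, n_peptides=3)
top1 = top_n_protein_intensity(peptides, n_peptides=1)
ibaq = ibaq_protein_intensity(peptides, n_theoretical_peptides=10)

# Hypothetical anchor intensities and concentrations.
slope, intercept = fit_log_linear([1e3, 1e4, 1e5], [1e1, 1e2, 1e3])
```

In the study, the choice between such intensity measures was made per condition by cross-validated model selection on the anchor proteins; the sketch leaves that step out.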

582

583 Expected protein complex stoichiometries. Equimolar stoichiometries for the HytA-E/FdhA and

584 MetFV protein complexes were expected based on SDS gel staining experiments in C.

585 autoethanogenum57 and the acetogen Moorella thermoacetica71, respectively. Expected

586 stoichiometries for other protein complexes in Fig. 3 and Supplementary Fig. 3 were based on

587 measured stoichiometries in E. coli K-12 (Complex Portal; www.ebi.ac.uk/complexportal) and

588 significant homology between complex member proteins in C. autoethanogenum and E. coli. All

589 depicted C. autoethanogenum protein complex members had NCBI protein-protein BLAST E-

590 values<10-16 and scores>73 against respective E. coli K-12 proteins using non-redundant protein

591 sequences.

592

593 Generation of proteomaps. The distribution of proteome-wide protein concentrations among

594 functional gene classifications was visualised using proteomaps27. For this, the NCBI annotation of

595 sequence NC_022592.167 was used as the annotation genome for C. autoethanogenum, with

596 CAETHG_RS07860 removed and replaced with the carbon monoxide dehydrogenase genes named

597 CAETHG_RS07861 and CAETHG_RS07862 with initial IDs of CAETHG_1620 and 1621,

598 respectively. Functional categories were assigned to protein sequences with KO IDs51 using the

599 KEGG annotation tool BlastKOALA72. Since proteomaps require a tree-like hierarchy, proteins that

600 were automatically assigned to multiple functional categories were manually assigned to one bottom-

601 level category (Level 3 in Supplementary Table 4) based on their principal task. We also created

602 functional categories “C1 fixation/Wood-Ljungdahl Pathway” (Level 2/3), “Acetate & ethanol

603 synthesis” (Level 3), “Energy conservation” (Level 3), “Hydrogenases” (Level 3) and manually

604 assigned key acetogen proteins to these categories to reflect more accurately functional categories for

605 an acetogen. Proteins without designated KO IDs were manually assigned to these newly created

606 categories or grouped under “Not Included in Pathway or Brite” (Level 1) with Level 2 and 3 as “No

607 KO ID”. If BlastKOALA assigned multiple genes the same proposed gene/protein name, unique

608 numbers were added to names (e.g., pfkA, pfkA2). The final “treemap” defining the hierarchy for our

609 proteomaps is in Supplementary Table 4.
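
The naming rule for duplicate gene/protein names can be illustrated with a short sketch. It follows the example given in the text (pfkA, pfkA2); the function name is hypothetical, and the sketch does not guard against a generated name colliding with a name that already carries a number.

```python
from collections import Counter


def make_unique(names):
    """Append a running number to repeated names: pfkA, pfkA2, pfkA3, ..."""
    seen = Counter()
    unique = []
    for name in names:
        seen[name] += 1
        unique.append(name if seen[name] == 1 else f"{name}{seen[name]}")
    return unique


print(make_unique(["pfkA", "pfkA", "pgk", "pfkA"]))
```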

610

611 Calculation of apparent in vivo catalytic rates of enzymes (kapp). We calculated apparent in vivo

612 catalytic rates of enzymes, denoted as kapp (s-1)26, as the ratio of specific flux rate (mmol/gDCW/h)

613 determined before18 and protein concentration (nmol/gDCW) quantified here for the same C.

614 autoethanogenum CO, syngas, and high-H2 CO cultures. Gene-protein-reaction (GPR) data of the

615 genome-scale metabolic model iCLAU78618 were manually curated to reflect most recent knowledge

616 and were used to link metabolic fluxes with catalysing enzymes. For reactions with multiple assigned

617 enzymes (i.e., isoenzymes), the enzyme with the highest average ranking of its concentration across

618 the three cultures (Supplementary Table 3) was assumed to solely catalyse the flux. For enzyme

619 complexes, the average of quantified subunit concentrations was used (standard deviation estimated using

620 error propagation). For the HytA-E hydrogenase, its measured protein concentration was split

621 between reactions “rxn08518_c0” (direct CO2 reduction with H2 in complex with FdhA) and

622 “leq000001_c0” (H2 oxidation solely by HytA-E) proportionally to flux for syngas and high-H2 CO

623 cultures. The resulting enzymes or enzyme complexes catalysing specific fluxes are shown in

624 Supplementary Table 5. Finally, we assumed each protein chain being catalytically active and only

625 calculated kapp values for metabolic reactions with a non-zero flux in at least one condition, specific

626 flux rate>0.1% of CO fixation flux in at least one condition, and label-free data with measured

627 concentration for the associated enzyme(s) in all conditions (Supplementary Table 5). Membrane

628 proteins were excluded from kapp calculations to avoid bias from potentially incomplete protein

629 extraction. This produced kapp values for 13 enzymes/complexes using anchor protein concentrations

630 and for 48 enzymes/complexes using label-free protein concentrations (Supplementary Table 5).
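The kapp calculation above can be illustrated with a minimal sketch. It is not the authors' script: the function names and all numbers are hypothetical, subunit standard deviations are assumed to be independent, and isoenzymes are ranked only against each other rather than within the whole proteome. The unit conversion follows from the stated units: a flux in mmol/gDCW/h is multiplied by 10⁶ nmol/mmol and divided by 3600 s/h before dividing by the enzyme concentration in nmol/gDCW.

```python
from math import sqrt
from statistics import mean

SECONDS_PER_HOUR = 3600.0
NMOL_PER_MMOL = 1e6


def kapp_per_second(flux_mmol_per_gdcw_h, enzyme_nmol_per_gdcw):
    """Apparent catalytic rate (1/s): flux divided by enzyme concentration."""
    if enzyme_nmol_per_gdcw <= 0:
        raise ValueError("enzyme concentration must be positive")
    # Convert mmol/gDCW/h to nmol/gDCW/s so the ratio has units of 1/s.
    flux_nmol_per_gdcw_s = flux_mmol_per_gdcw_h * NMOL_PER_MMOL / SECONDS_PER_HOUR
    return flux_nmol_per_gdcw_s / enzyme_nmol_per_gdcw


def complex_concentration(subunit_means, subunit_sds):
    """Average subunit concentrations of a complex.

    The standard deviation of the average is propagated assuming
    independent subunit errors (an assumption of this sketch).
    """
    n = len(subunit_means)
    average = mean(subunit_means)
    sd = sqrt(sum(s ** 2 for s in subunit_sds)) / n
    return average, sd


def pick_isoenzyme(concs_by_enzyme):
    """Pick the isoenzyme with the best average concentration rank.

    concs_by_enzyme maps an enzyme name to a list of concentrations,
    one per culture condition. Rank 1 is the highest concentration.
    """
    names = sorted(concs_by_enzyme)
    n_conditions = len(concs_by_enzyme[names[0]])
    rank_sums = {name: 0 for name in names}
    for c in range(n_conditions):
        ordered = sorted(names, key=lambda name: concs_by_enzyme[name][c], reverse=True)
        for rank, name in enumerate(ordered, start=1):
            rank_sums[name] += rank
    return min(names, key=lambda name: rank_sums[name] / n_conditions)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(kapp_per_second(36.0, 100.0))
    print(complex_concentration([10.0, 14.0], [3.0, 4.0]))
    print(pick_isoenzyme({"isoA": [5.0, 50.0, 7.0], "isoB": [9.0, 20.0, 8.0]}))
```

Ranking rather than averaging concentrations keeps one condition with an unusually high value from deciding which isoenzyme is taken to carry the flux.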

Determination of regulation level of metabolic fluxes. We used published flux and relative proteomics data¹⁸ of the same cultures studied here to determine whether fluxes are regulated by changing enzyme concentrations or their catalytic rates. We considered metabolic fluxes with non-zero specific flux rates in at least two conditions of CO, syngas, or high-H2 CO cultures. The same manually curated GPRs and criteria for isoenzymes and protein complexes as described above for kapp calculation were used to determine flux-enzyme pairs (Supplementary Table 6). We first used a two-tailed two-sample equal variance Student’s t-test with FDR correction⁷³ to determine fluxes with significant changes between any two conditions (q-value<0.05). We then used the Student’s left-tailed t-distribution with FDR correction to determine whether each significant flux change was significantly different from the change in the respective enzyme expression between the same conditions (Supplementary Table 6). A flux with a q-value<0.05 for the latter test was considered to be regulated at the post-translational level (e.g., by changing enzyme catalytic rate).
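The FDR step of both tests can be sketched as follows, assuming the correction is the Benjamini-Hochberg step-up procedure of ref. 73. The function name and the p-values are hypothetical, and the p-values themselves would come from the t-tests described above, which are not reproduced here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


if __name__ == "__main__":
    # Hypothetical p-values for four flux comparisons.
    p = [0.01, 0.04, 0.03, 0.20]
    q = benjamini_hochberg(p)
    significant = [qi < 0.05 for qi in q]
    print(q, significant)
```

In this toy input only the first comparison passes the q-value<0.05 threshold, although three of the four raw p-values are below 0.05.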

DATA AVAILABILITY

All data generated or analysed during this study are in the main text, supplementary information files, or public databases. Raw PRM MS data have been deposited to Panorama at https://panoramaweb.org/Valgepea_Cauto_PRM.url (private reviewer account details: username: [email protected]; password: hSAwNUAL) with the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) dataset identifier PXD025760. Raw DIA MS data have been deposited to the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository⁶⁹ with the dataset identifier PXD025732 (private reviewer account details: username: [email protected]; password: QdkJHXy0). PRM MS data with all Skyline processing settings can be viewed and downloaded from Panorama at https://panoramaweb.org/Valgepea_Cauto_PRM.url. DIA MS data with all Skyline processing settings can be viewed and downloaded from Panorama at https://panoramaweb.org/Valgepea_Cauto_Anchors.url for anchor protein absolute quantification and at https://panoramaweb.org/Valgepea_Cauto_LF.url for estimating proteome-wide absolute protein concentrations (private reviewer account details: username: [email protected]; password: hSAwNUAL). Any other relevant data are available from the corresponding author on reasonable request.

REFERENCES

1. Liew, F. et al. Gas Fermentation—A flexible platform for commercial scale production of low-carbon-fuels and chemicals from waste and renewable feedstocks. Front. Microbiol. 7, 694 (2016).

2. Claassens, N. J., Sousa, D. Z., dos Santos, V. A. P. M., de Vos, W. M. & van der Oost, J. Harnessing the power of microbial autotrophy. Nat. Rev. Microbiol. 14, 692–706 (2016).

3. Fackler, N. et al. Stepping on the gas to a circular economy: accelerating development of carbon-negative chemical production from gas fermentation. Annu. Rev. Chem. Biomol. Eng. 12, 1 (2021).

4. Ragsdale, S. W. & Pierce, E. Acetogenesis and the Wood–Ljungdahl pathway of CO2 fixation. Biochim. Biophys. Acta 1784, 1873–1898 (2008).

5. Wood, H. G. Life with CO or CO2 and H2 as a source of carbon and energy. FASEB J. 5, 156–163 (1991).

6. Drake, H. L., Küsel, K. & Matthies, C. In The Prokaryotes Ch. Acetogenic Prokaryotes. 354–420 (2006).

7. Fuchs, G. Alternative pathways of carbon dioxide fixation: insights into the early evolution of life? Annu. Rev. Microbiol. 65, 631–658 (2011).

8. Fast, A. G. & Papoutsakis, E. T. Stoichiometric and energetic analyses of non-photosynthetic CO2-fixation pathways to support synthetic biology strategies for production of fuels and chemicals. Curr. Opin. Chem. Eng. 1, 380–395 (2012).

9. Cotton, C. A., Edlich-Muth, C. & Bar-Even, A. Reinforcing carbon fixation: CO2 reduction replacing and supporting carboxylation. Curr. Opin. Biotechnol. 49, 49–56 (2018).

10. Köpke, M. & Simpson, S. D. Pollution to products: recycling of ‘above ground’ carbon by gas fermentation. Curr. Opin. Biotechnol. 65, 180–189 (2020).

11. Russell, M. J. & Martin, W. The rocky roots of the acetyl-CoA pathway. Trends Biochem. Sci. 29, 358–363 (2004).

12. Weiss, M. C. et al. The physiology and habitat of the last universal common ancestor. Nat. Microbiol. 1, 16116 (2016).

13. Varma, S. J., Muchowska, K. B., Chatelain, P. & Moran, J. Native iron reduces CO2 to intermediates and end-products of the acetyl-CoA pathway. Nat. Ecol. Evol. 2, 1019–1024 (2018).

14. Ljungdahl, L. G. A life with acetogens, thermophiles, and cellulolytic anaerobes. Annu. Rev. Microbiol. 63, 1–25 (2009).

15. Ragsdale, S. W. Enzymology of the Wood-Ljungdahl pathway of acetogenesis. Ann. N. Y. Acad. Sci. 1125, 129–136 (2008).

16. Schuchmann, K. & Müller, V. Autotrophy at the thermodynamic limit of life: a model for energy conservation in acetogenic bacteria. Nat. Rev. Microbiol. 12, 809–821 (2014).

17. Molitor, B., Marcellin, E. & Angenent, L. T. Overcoming the energetic limitations of syngas fermentation. Curr. Opin. Chem. Biol. 41, 84–92 (2017).

18. Valgepea, K. et al. H2 drives metabolic rearrangements in gas-fermenting Clostridium autoethanogenum. Biotechnol. Biofuels 11, 55 (2018).

19. Valgepea, K. et al. Maintenance of ATP homeostasis triggers metabolic shifts in gas-fermenting acetogens. Cell Syst. 4, 505–515.e5 (2017).

20. Mahamkali, V. et al. Redox controls metabolic robustness in the gas-fermenting acetogen Clostridium autoethanogenum. Proc. Natl. Acad. Sci. U. S. A. 117, 13168–13175 (2020).

21. Richter, H. et al. Ethanol production in syngas-fermenting Clostridium ljungdahlii is controlled by thermodynamics rather than by enzyme expression. Energy Environ. Sci. 9, 2392–2399 (2016).

22. de Souza Pinto Lemgruber, R. et al. A TetR-family protein (CAETHG_0459) activates transcription from a new promoter motif associated with essential genes for autotrophic growth in acetogens. Front. Microbiol. 10, 2549 (2019).

23. Song, Y. et al. Determination of the genome and primary transcriptome of syngas fermenting Eubacterium limosum ATCC 8486. Sci. Rep. 7, 13694 (2017).

24. Al-Bassam, M. M. et al. Optimisation of carbon and energy utilisation through differential translational efficiency. Nat. Commun. 9, 4474 (2018).

25. Song, Y. et al. Genome-scale analysis of syngas fermenting acetogenic bacteria reveals the translational regulation for its autotrophic growth. BMC Genomics 19, 837 (2018).

26. Valgepea, K., Adamberg, K., Seiman, A. & Vilu, R. Escherichia coli achieves faster growth by increasing catalytic and translation rates of proteins. Mol. Biosyst. 9, 2344–2358 (2013).

27. Liebermeister, W. et al. Visual account of protein investment in cellular functions. Proc. Natl. Acad. Sci. U. S. A. 111, 8488–8493 (2014).

28. Schubert, O. T. et al. Absolute proteome composition and dynamics during dormancy and resuscitation of Mycobacterium tuberculosis. Cell Host Microbe 18, 96–108 (2015).

29. Davidi, D. et al. Global characterisation of in vivo enzyme catalytic rates and their correspondence to in vitro kcat measurements. Proc. Natl. Acad. Sci. U. S. A. 104, 20167–20172 (2016).

30. Calderón-Celis, F., Encinar, J. R. & Sanz-Medel, A. Standardization approaches in absolute quantitative proteomics with mass spectrometry. Mass Spectrom. Rev. 37, 715–737 (2017).

31. Maaß, S. & Becher, D. Methods and applications of absolute protein quantification in microbial systems. J. Proteomics 136, 222–233 (2016).

32. Ludwig, C., Claassen, M., Schmidt, A. & Aebersold, R. Estimation of absolute protein quantities of unlabeled samples by selected reaction monitoring mass spectrometry. Mol. Cell. Proteomics 11, M111.013987 (2012).

33. Blein-Nicolas, M. & Zivy, M. Thousand and one ways to quantify and compare protein abundances in label-free bottom-up proteomics. Biochim. Biophys. Acta 1864, 883–895 (2016).

34. Ahrné, E., Molzahn, L., Glatter, T. & Schmidt, A. Critical assessment of proteome-wide label-free absolute abundance estimation strategies. Proteomics 17, 2567–2578 (2013).

35. Schmidt, A. et al. Absolute quantification of microbial proteomes at different states by directed mass spectrometry. Mol. Syst. Biol. 7, 510 (2011).

36. Maier, T. et al. Quantification of mRNA and protein and integration with protein turnover in a bacterium. Mol. Syst. Biol. 7, 511 (2011).

37. Marguerat, S. et al. Quantitative analysis of fission yeast transcriptomes and proteomes in proliferating and quiescent cells. Cell 151, 671–683 (2012).

38. Zeiler, M., Moser, M. & Mann, M. Copy number analysis of the murine platelet proteome spanning the complete abundance range. Mol. Cell. Proteomics 13, 3435–3445 (2014).

39. Schmidt, A. et al. The quantitative and condition-dependent Escherichia coli proteome. Nat. Biotechnol. 34, 104–110 (2015).

40. Beck, M. et al. The quantitative proteome of a human cell line. Mol. Syst. Biol. 7, 549 (2011).

41. Malmström, J. et al. Proteome-wide cellular protein concentrations of the human pathogen Leptospira interrogans. Nature 460, 762–765 (2009).

42. Brun, V. et al. Isotope-labeled protein standards: toward absolute quantitative proteomics. Mol. Cell. Proteomics 6, 2139–2149 (2007).

43. Shuford, C. M. et al. Absolute protein quantification by mass spectrometry: not as simple as advertised. Anal. Chem. 89, 7406–7415 (2017).

44. Takemori, N. et al. High-throughput synthesis of stable isotope-labeled transmembrane proteins for targeted transmembrane proteomics using a wheat germ cell-free protein synthesis system. Mol. Biosyst. 11, 361–365 (2015).

45. Takemori, N. et al. MEERCAT: multiplexed efficient cell free expression of recombinant QconCATs for large scale absolute proteome quantification. Mol. Cell. Proteomics 16, 2169–2183 (2017).

46. Gillet, L. C. et al. Targeted data extraction of the MS/MS spectra generated by data-independent acquisition: a new concept for consistent and accurate proteome analysis. Mol. Cell. Proteomics 11, O111.016717 (2012).

47. Reiter, L. et al. mProphet: automated data processing and statistical validation for large-scale SRM experiments. Nat. Methods 8, 430–435 (2011).

48. MacLean, B. et al. Skyline: an open source document editor for creating and analysing targeted proteomics experiments. Bioinformatics 26, 966–968 (2010).

49. Rosenberger, G., Ludwig, C., Röst, H. L., Aebersold, R. & Malmström, L. aLFQ: an R-package for estimating absolute protein quantities from label-free LC-MS/MS proteomics data. Bioinformatics 30, 2511–2513 (2014).

50. Lahtvee, P.-J. et al. Absolute quantification of protein and mRNA abundances demonstrate variability in gene-specific translation efficiency in yeast. Cell Syst. 4, 495–504.e5 (2017).

51. Kanehisa, M., Sato, Y., Kawashima, M., Furumichi, M. & Tanabe, M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 44, D457–D462 (2016).

52. Peebo, K. et al. Proteome reallocation in Escherichia coli with increasing specific growth rate. Mol. Biosyst. 11, 1184–1193 (2015).

53. Lemaire, O. N. & Wagner, T. Gas channel rerouting in a primordial enzyme: Structural insights of the carbon-monoxide dehydrogenase/acetyl-CoA synthase complex from the acetogen Clostridium autoethanogenum. Biochim. Biophys. Acta 1862, 148330 (2021).

54. Liew, F. et al. Insights into CO2 fixation pathway of Clostridium autoethanogenum by targeted mutagenesis. mBio 7, e00427-16 (2016).

55. Liew, F. et al. Metabolic engineering of Clostridium autoethanogenum for selective alcohol production. Metab. Eng. 40, 104–114 (2017).

56. Marcellin, E. et al. Low carbon fuels and commodity chemicals from waste gases – systematic approach to understand energy metabolism in a model acetogen. Green Chem. 18, 3020–3028 (2016).

57. Wang, S. et al. NADP-specific electron-bifurcating [FeFe]-hydrogenase in a functional complex with formate dehydrogenase in Clostridium autoethanogenum grown on CO. J. Bacteriol. 195, 4373–4386 (2013).

58. Mock, J. et al. Energy conservation associated with ethanol formation from H2 and CO2 in Clostridium autoethanogenum involving electron bifurcation. J. Bacteriol. 197, 2965–2980 (2015).

59. Maia, L. B., Fonseca, L., Moura, I. & Moura, J. J. G. Reduction of carbon dioxide by a molybdenum-containing formate dehydrogenase: a kinetic and mechanistic study. J. Am. Chem. Soc. 138, 8834–8846 (2016).

60. Schuchmann, K. & Müller, V. Direct and reversible hydrogenation of CO2 to formate by a bacterial carbon dioxide reductase. Science 342, 1382–1385 (2013).

61. Bar-Even, A. et al. The moderately efficient enzyme: evolutionary and physicochemical trends shaping enzyme parameters. Biochemistry 50, 4402–4410 (2011).

62. Song, Y. et al. Functional cooperation of the glycine synthase-reductase and Wood–Ljungdahl pathways for autotrophic growth of Clostridium drakei. Proc. Natl. Acad. Sci. U. S. A. 117, 7516–7523 (2020).

63. Tan, Y., Liu, J., Liu, Z. & Li, F. Characterization of two novel butanol dehydrogenases involved in butanol degradation in syngas-utilising bacterium Clostridium ljungdahlii DSM 13528. J. Basic Microbiol. 54, 996–1004 (2014).

64. Davidi, D. & Milo, R. Lessons on enzyme kinetics from quantitative proteomics. Curr. Opin. Biotechnol. 46, 81–89 (2017).

65. Nilsson, A., Nielsen, J. & Palsson, B. Ø. Metabolic models of protein allocation call for the kinetome. Cell Syst. 5, 538–541 (2017).

66. Heckmann, D. et al. Machine learning applied to enzyme turnover numbers reveals protein structural correlates and improves metabolic models. Nat. Commun. 9, 5252 (2018).

67. Brown, S. D. et al. Comparison of single-molecule sequencing and hybrid approaches for finishing the genome of Clostridium autoethanogenum and analysis of CRISPR systems in industrial relevant Clostridia. Biotechnol. Biofuels 7, 40 (2014).

68. Wiśniewski, J. R., Zougman, A., Nagaraj, N. & Mann, M. Universal sample preparation method for proteome analysis. Nat. Methods 6, 359–362 (2009).

69. Vizcaíno, J. A. et al. The Proteomics Identifications (PRIDE) database and associated tools: status in 2013. Nucleic Acids Res. 41, D1063–D1069 (2012).

70. Schwanhäusser, B. et al. Global quantification of mammalian gene expression control. Nature 473, 337–342 (2011).

71. Mock, J., Wang, S., Huang, H., Kahnt, J. & Thauer, R. K. Evidence for a hexaheteromeric methylenetetrahydrofolate reductase in Moorella thermoacetica. J. Bacteriol. 196, 3303–3314 (2014).

72. Kanehisa, M., Sato, Y. & Morishima, K. BlastKOALA and GhostKOALA: KEGG tools for functional characterisation of genome and metagenome sequences. J. Mol. Biol. 428, 726–731 (2016).

73. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289–300 (1995).

ACKNOWLEDGEMENTS

We thank Jörg Bernhardt for help with proteomaps, Tim McCubbin for scripts, Andrus Seiman for statistics, and Olivier Lemaire for valuable discussions. This work was funded by the Australian Research Council (ARC LP140100213) in collaboration with LanzaTech. We thank the following investors in LanzaTech's technology: Sir Stephen Tindall, Khosla Ventures, Qiming Venture Partners, Softbank China, the Malaysian Life Sciences Capital Fund, Mitsui, Primetals, CICC Growth Capital Fund I, L.P. and the New Zealand Superannuation Fund. The research utilised equipment and support provided by the Queensland node of Metabolomics Australia, an initiative of the Australian Government being conducted as part of the NCRIS National Research Infrastructure for Australia. There was no funding support from the European Union for the experimental part of the study. However, K.V. acknowledges support also from the European Union’s Horizon 2020 research and innovation programme under grant agreement N810755.

AUTHOR CONTRIBUTIONS

Conceptualisation, K.V., C.L., M.K., L.K.N., and E.M.; Methodology, K.V., G.T., N.T., A.T., C.L., A.P.M., R.T., and E.M.; Formal Analysis, K.V. and G.T.; Investigation, K.V., G.T., N.T., and A.T.; Resources, N.T., A.T., M.K., S.D.S., L.K.N., and E.M.; Writing – Original Draft, K.V. and E.M.; Writing – Review & Editing, K.V., N.T., C.L., A.P.M., R.T., M.K., L.K.N., and E.M.; Supervision, M.K., L.K.N., and E.M.; Project Administration, R.T. and E.M.; Funding Acquisition, M.K., S.D.S., L.K.N., and E.M.

COMPETING INTERESTS

LanzaTech has an interest in commercial gas fermentation with C. autoethanogenum. A.P.M., R.T., M.K., and S.D.S. are employees of LanzaTech.

MATERIALS & CORRESPONDENCE

Correspondence and requests for materials should be addressed to E.M.

FIGURE LEGENDS

Fig. 1 Absolute proteome quantification framework in C. autoethanogenum. Absolute proteome quantification in light (no stable-isotope labelled [SIL] substrates) autotrophic C. autoethanogenum chemostat cultures was built on 19 synthetic heavy SIL-protein spike-in standards and data-independent acquisition (DIA) mass spectrometry (MS) analysis. Culture samples with SIL-protein spike-ins and samples for the DIA spectral library were analysed by DIA MS. Subsequent stringent data analysis allowed us to quantify intracellular concentrations for 16 key C. autoethanogenum proteins using light-to-heavy ratios between endogenous and spike-in DIA MS intensities. These 16 key proteins were further used as anchor proteins for label-free estimation of ~1,043 protein concentrations through establishing a linear correlation between protein concentrations and their measured MS intensities. Some parts created with BioRender.com.
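The light-to-heavy quantification principle can be sketched as below. This is an illustration only, not the Skyline-based workflow used in the study: the function name, the use of a median over peptides, and all numbers are hypothetical, and the spiked amount of heavy standard and the sample biomass are assumed to be known.

```python
from statistics import median


def endogenous_concentration(light_intensities, heavy_intensities,
                             heavy_spike_nmol, biomass_gdcw):
    """Estimate endogenous protein concentration (nmol/gDCW).

    light_intensities and heavy_intensities hold paired MS intensities of
    the same peptides from the endogenous (light) protein and the spiked
    (heavy) standard. The median ratio is a choice made for this sketch.
    """
    if biomass_gdcw <= 0 or heavy_spike_nmol <= 0:
        raise ValueError("spike amount and biomass must be positive")
    ratios = [light / heavy
              for light, heavy in zip(light_intensities, heavy_intensities)
              if heavy > 0]
    if not ratios:
        raise ValueError("no usable light/heavy peptide pairs")
    # Endogenous amount = light-to-heavy ratio x known heavy amount.
    return median(ratios) * heavy_spike_nmol / biomass_gdcw


if __name__ == "__main__":
    # Hypothetical intensities for three peptides of one protein.
    print(endogenous_concentration([2.0e6, 4.1e6, 1.9e6],
                                   [1.0e6, 2.0e6, 1.0e6],
                                   heavy_spike_nmol=0.5,
                                   biomass_gdcw=0.01))
```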

Fig. 2 Label-free estimation of proteome-wide protein concentrations. a Correlation of peptide mass spectrometry (MS) feature intensities between biological replicate cultures of the three gas mixtures. b Linear correlation between anchor protein concentrations and their measured MS intensities for one syngas culture. gDCW, gram of dry cell weight; aLFQ, absolute label-free quantification. c Errors of different label-free quantification models for the linear fit between anchor protein concentrations and their measured MS intensities determined by bootstrapping using the aLFQ R package⁴⁹ for one syngas culture. CV-MFE, cross-validated mean fold-error. d Label-free quantification error of optimal model (orange) and total proteome mass (blue) across samples. Error bars denote 95% CI.
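The anchor-based calibration described in panel b can be sketched as an ordinary least-squares fit, assuming the fit is done on log10-transformed intensities and concentrations. This is not the aLFQ implementation: the function names and numbers are hypothetical, and the model selection and bootstrapping of panel c are not reproduced.

```python
from math import log10


def fit_loglog(intensities, concentrations):
    """Least-squares line through log10(intensity) vs log10(concentration)."""
    xs = [log10(v) for v in intensities]
    ys = [log10(v) for v in concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_concentration(intensity, slope, intercept):
    """Label-free concentration estimate for a protein without a standard."""
    return 10 ** (slope * log10(intensity) + intercept)


if __name__ == "__main__":
    # Hypothetical anchor proteins: MS intensity and concentration (nmol/gDCW).
    anchor_intensity = [1e7, 1e8, 1e9]
    anchor_conc = [10.0, 100.0, 1000.0]
    slope, intercept = fit_loglog(anchor_intensity, anchor_conc)
    print(slope, intercept)
    print(predict_concentration(1e10, slope, intercept))
```

Fitting in log space keeps the few very abundant anchor proteins from dominating the regression, since intensities span several orders of magnitude.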

Fig. 3 Strong correlation between protein concentrations and expected stoichiometries for equimolar protein complexes. Grey dotted lines denote the average 1.5-fold cross-validated mean fold-error (CV-MFE) of label-free protein concentrations. Label-free protein concentrations are plotted, except for the HytA-FdhA complex, which was quantified using stable-isotope labelled protein spike-ins. Data points of the same colour represent gas mixtures. See Methods for details on expected protein complex stoichiometries. See Supplementary Table S3 for gene/protein ID, proposed name, description, and label-free data. See Table 1 for HytA-FdhA data. gDCW, gram of dry cell weight.

Fig. 4 Proteomaps uncover global proteome allocation. Left proteomap shows proteome allocation amongst functional gene classification categories (KEGG Orthology identifiers [KO IDs]⁵¹) at level two of the four-level “treemap” hierarchy structure (Supplementary Table S4). Right proteomap shows proteome allocation at the level of single proteins (level four of “treemap”). See Supplementary Fig. S5 for proteomaps of levels one and three of “treemap”. Area of the tile is proportional to protein concentration. Colours denote level one categories of “treemap”. Proteomaps visualise average concentrations of syngas cultures while category percentages are average of three gas mixtures (shown for categories with a fraction >5%). See Supplementary Table S3 for gene/protein ID, proposed name, description, and label-free protein concentrations.

Fig. 5 Quantitative systems-level view of acetogen central metabolism. Enzyme concentrations (nmol/gDCW), apparent in vivo catalytic rates of enzymes (kapp; s⁻¹), and metabolic flux rates (mmol/gDCW/h) are shown for C. autoethanogenum steady-state chemostat cultures grown on three gas mixtures. See dashed inset for bar chart and heatmap details. Enzyme concentration and kapp data are average of biological replicates. Proteins forming a complex are highlighted with non-black borders (FdhA forms a complex with HytA–E for direct CO2 reduction with H2; CooS1 is expected to form a complex with CooS1a and b as they are encoded from the same operon). For reactions with isoenzymes, kapp is for the enzyme with the highest concentration ranking (top location on enzyme heatmap), see Methods for details. Flux data from ref. 18 are average of biological replicates and error bars denote standard deviation. Arrows show direction of calculated fluxes; red arrow denotes uptake or secretion. Gene/protein IDs right of enzyme concentration heatmaps are preceded with CAETHG_RS and red font denotes concentrations determined using stable-isotope labelled (SIL) protein spike-in standards (i.e., anchor proteins). Asterisk denotes data for redox-consuming CO2 reduction to formate solely by FdhA without the use of H2 during growth on CO. ᵃBifunctional acetaldehyde/alcohol dehydrogenase (acetyl-CoA→ethanol); ᵇFlux into PEP from OAA and pyruvate is merged and kapp is for PEPCK. See Supplementary Table S3 for gene/protein ID, proposed name, description, and label-free protein concentrations. See Table 1 for anchor protein concentrations. See Supplementary Table S5 for kapp and flux data. See ref. 18 for cofactors of reactions and metabolite abbreviations. gDCW, gram of dry cell weight; NQ, not quantified.

Fig. 6 Regulatory principles of apparent in vivo catalytic rates of enzymes (kapp) and metabolic flux throughput. a Enzymes catalysing higher metabolic flux rates have both higher concentrations and higher kapps. Yellow and blue denote high and low values, respectively. Kendall’s τ correlations with significance p-values between respective pairs are shown below heatmap. See Supplementary Table S5 for flux rate, enzyme concentration, and kapp data, and for description of reaction names (Rxn name) and gene-protein-reaction (GPR) associations. b Control of metabolic flux throughput through kapp changes for high flux pathways. See also Fig. 5. gDCW, gram of dry cell weight.

[Fig. 1 image. Workflow panels: 19 heavy SIL-proteins of C. autoethanogenum; light C. autoethanogenum chemostat cultures (CO, H2, CO2); light and heavy peptides; DIA spectral library from light cell lysates and fractions; DIA-MS (Thermo Q-Exactive HF-X); data analysis (Skyline & mProphet); absolute quantification (SILP-based concentrations for 16 anchor proteins, label-free concentrations for ~1,043 proteins).]

[Fig. 2 image. a Correlation of peptide features (Pearson correlation, R) for CO, syngas, and high-H2 CO cultures. b Anchor protein calibration: Log10 anchor protein concentration (nmol/gDCW) versus Log10 aLFQ MS intensity; y = 0.94x - 6.15, R² = 0.96. c aLFQ optimal model determination: CV-MFE for peptide (top1 to top5, all, iBAQ) and fragment ion selections. d Label-free quantification accuracy: CV-MFE and sum of label-free protein concentrations (µg) for CO, syngas, and high-H2 CO cultures.]

[Fig. 3 image. Protein concentration of subunit 2 versus protein concentration of subunit 1 (nmol/gDCW), logarithmic axes, with guide lines at 1.50x, 1x, and 0.67x. Labelled complexes: Ribosome (30S/50S), MetFV, Anthranilate synthase, HytA-FdhA, Glutamate synthase, IscS-IscU, SecD-SecF, CarA-CarB, 3-isopropylmalate dehydratase, InfB-InfC, GyrA-GyrB, FtsEX ABC cell division, Citrate.]

[Fig. 4 image. Proteomaps with tiles labelled by category (including C1 fixation, Energy metabolism, Carbohydrate metabolism, Amino acid metabolism, Nucleotide metabolism, Lipid metabolism, Metabolism of cofactors and vitamins, Translation, Folding, sorting and degradation, No KO ID, Poorly characterised) and by individual gene/protein names.]

disA yabN tsaB larC E4.3.1.15 PDF rplB dtd cfa apbE2 hydN2 Unclassified RpsI mtaB apbE E1.6.99.1 tilS queG dacA fixA rpsS modD sepF parA RplP clpC MreB2 clpA larB FARSA tcdA pdhR dnaC rplA mnmE CspA3 topB2 prsA fsaB GroES trmL gidA PDF5 IscU hslO iscS2 AbrB purR ABC.SP.S_ acrA2 argR PDF3 2 rho rseC FprA1_ cheR KARS mcp31 FtsZ ftsA RplU HARS fdhF Cellular parB pucR kdgR3 mcp29 community MARS kdgR4 perR RS01050 pgdA mcp15 acoR2 nusG 2 nnr - glnB2 phoB1 PDF2 deaD2 cheC prokaryotes rnj cheB mcp17 RplS EARS NARS2 AARS tex larE2 luxS rplQ acoR K07445 K22882 minD RpsU greA mcp13 iscS resD3 flgE mcp16 Replication ecfA1 divIVA yajC codY greA3 citF3 argR2 citF4 CARS2 acoR6 Transcription cbiO2 rny padR NARS hfq citE4 mcp27 TARS LARS mcp20 fliC2 VARS acoR3 LacI ftsE ytfQ cheA2 mcp5 rpoC mglB and repair Signal recG gyrB PtsH mreC rplR asnC transduction pnp cbiQ2 fliD mcp8 mcp3 bapA Membrane rsbV gyrA WARS secA mcp4 recG2 YARS2 greA2 fliC3 RplT secF ftsX bmpA2 CheY3 Cell motility dnaA spo0A metQ DARS2 QARS ffh mcp23 transport RecA radA IARS rex ccpN flgK cheW3 cheD HupB GARS SARS nusB flgL lsrK mutT3 PARS potD cycB topA addA nusA rpoB znuA FARSB CARS ycbB znuA2 cas1 RARS secD ftsY RpoA pleD ptsI rbsB metN tupA fliF fliN mcp19 motB2 fliM bioRxiv preprint doi: https://doi.org/10.1101/2021.05.11.443690; this version posted May 12, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

[Figure: metabolic map of central metabolism overlaid with flux, enzyme and kapp data for three gas mixtures; the graphic is not recoverable from the text extraction.

Parameters and units: Enzyme/protein - protein concentration (nmol/gDCW); Flux - specific flux rate (mmol/gDCW/h); kapp - apparent in vivo catalytic rate of enzyme (s^-1). Values are shown on a heatmap scale from MIN to MAX. Input and output fluxes are given in mmol/gDCW/h.

Colour key: Blue - CO; Red - syngas; Green - high-H2 CO.

Each reaction is annotated with its substrate, product, flux direction, and the enzyme/protein name with its CAETHG_RS... gene identifier; some entries carry the label NQ.

Labelled metabolites and pathway elements: CO, H2, CO2, Formate, 10-Formyl-THF, 5,10-Methenyl-THF, 5,10-Methylene-THF, 5-Methyl-THF, [CH3], Acetyl-CoA, Acetyl-P, Acetate, Acetaldehyde, Ethanol, Acetolactate, Acetoin, 2,3-BDO, Pyruvate, OAA, PEP, 2PG, 3PG, GAP, F6P, α-KG, Biomass, the Nfn transhydrogenase, the Rnf complex and the ATPase.]
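The three quantities in the figure legend (protein concentration in nmol/gDCW, specific flux rate in mmol/gDCW/h, and kapp in s^-1) can be related by a simple unit conversion. The sketch below assumes that kapp is obtained by dividing the specific flux through a reaction by the concentration of its catalysing enzyme, which is the usual definition of an apparent in vivo catalytic rate but is not spelled out in this part of the text. The function name and the example numbers are hypothetical and are not taken from the data.

```python
# Minimal sketch: apparent in vivo catalytic rate from flux and enzyme level.
# Assumption (not stated in this section): kapp = specific flux / enzyme concentration.

SECONDS_PER_HOUR = 3600.0
NMOL_PER_MMOL = 1e6


def apparent_catalytic_rate(flux_mmol_per_gdcw_h, enzyme_nmol_per_gdcw):
    """Return kapp in s^-1.

    flux_mmol_per_gdcw_h: specific flux rate (mmol/gDCW/h)
    enzyme_nmol_per_gdcw: protein concentration (nmol/gDCW)
    """
    if enzyme_nmol_per_gdcw <= 0:
        raise ValueError("enzyme concentration must be positive")
    # Convert the flux to nmol/gDCW/s so that the units cancel to s^-1.
    flux_nmol_per_gdcw_s = flux_mmol_per_gdcw_h * NMOL_PER_MMOL / SECONDS_PER_HOUR
    return flux_nmol_per_gdcw_s / enzyme_nmol_per_gdcw


if __name__ == "__main__":
    # Hypothetical values chosen only to illustrate the unit handling.
    print(apparent_catalytic_rate(3.6, 1000.0))
```

For a reaction catalysed by a multi-subunit complex, the choice of which subunit concentration to use as the denominator is a separate decision that this sketch does not address.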

[Figure, panels a and b; the graphic is not recoverable from the text extraction.

Panel a: heatmap with the columns Rxn name, Flux, Enzyme, kapp and GPR (CAETHG_RS…). Reaction and GPR pairs recoverable from the row labels: rxn07189 14775; rxn00690 07850; rxn01211 07845; rxn00907 07840; rxn00910 07830+35; rxn06149 07805; rxn12080 07800; rxn00225 16495; rxn00173 16490; rxn08518 13725+(45-70); rxn00543 08920; leq000004 00440; leq000001 13745-70; leq000002 07665; rxn00103 13725; rxn05938 14890; rxn00001 15405; rxn00250 07735; rxn00184 11665; rxn00097 09335; rxn05289 09165; rxn00247 13335; rxn01106 08500; rxn01100 08510; rxn00459 08495; rxn00781 16815; rxn00003 01930; rxn00260 01020; rxn00903 14905; rxn00747 08505; rxn02187 00580; rxn03068 00580; rxn00898 00585; rxn06673 09960+65; rxn00187 09895; rxn02112 01830; rxn01643 06525; rxn00611 16355; rxn01301 15215; rxn00974 13540; rxn00256 13535; rxn01388 13540; rxn00505 13545; rxn01069 05835; rxn00770 09805; rxn01974 15555; rxn01972 18870; rxn03030 06545; rxn03086 06510; rxn00248 12200; rxn00799 09220. Reported statistics: τ=0.56, p=5×10^-9 and τ=0.45, p=2×10^-6.

Panel b: kapp (s^-1) for HytA-E, Pta, AOR1 and MetF under CO, syngas and high-H2 CO, with the group labels Hydrogenase, Acetate production, Ethanol production and Wood-Ljungdahl pathway. Axis labels: Protein concentration (nmol/gDCW) and Specific flux rate (mmol/gDCW/h).]
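The τ values reported with the figure are written in the notation normally used for Kendall's rank correlation. Assuming that is the statistic intended (this section does not name it, nor does it say which pairs of variables were correlated), the following sketch shows how a tau-b coefficient can be computed for two paired series such as fluxes and enzyme concentrations. The function name and example vectors are hypothetical, and the p-value calculation is omitted.

```python
# Minimal sketch of Kendall's tau-b for paired observations (pure Python).
# Assumption: the tau values in the figure are Kendall rank correlations.
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Return Kendall's tau-b for two equal-length numeric sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if len(x) < 2:
        raise ValueError("at least two observations are required")
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: contributes to neither term
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denominator = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    if denominator == 0:
        raise ValueError("tau-b is undefined when a variable is constant")
    return (concordant - discordant) / denominator


if __name__ == "__main__":
    # Hypothetical paired values, used only to show the call.
    print(kendall_tau_b([1.0, 2.0, 3.0], [1.0, 3.0, 2.0]))
```

A rank-based coefficient is a natural choice when fluxes and protein concentrations span several orders of magnitude, because it depends only on the ordering of the values.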