bioRxiv preprint doi: https://doi.org/10.1101/2021.05.11.443690; this version posted May 12, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Absolute proteome quantification in the gas-fermenting acetogen Clostridium autoethanogenum
Kaspar Valgepea1,2, Gert Talbo1,3, Nobuaki Takemori4, Ayako Takemori4, Christina Ludwig5,
Alexander P. Mueller6, Ryan Tappel6, Michael Köpke6, Séan Dennis Simpson6, Lars Keld Nielsen1,3,7
and Esteban Marcellin1,3*
1Australian Institute for Bioengineering and Nanotechnology (AIBN), The University of Queensland, 4072 St. Lucia, Australia
2ERA Chair in Gas Fermentation Technologies, Institute of Technology, University of Tartu, 50411 Tartu, Estonia
3Queensland Node of Metabolomics Australia, AIBN, The University of Queensland, 4072 St. Lucia, Australia
4Institute for Promotion of Science and Technology, Ehime University, 791-0295 Ehime, Japan
5Bavarian Center for Biomolecular Mass Spectrometry (BayBioMS), Technical University of Munich, 85354 Freising, Germany
6LanzaTech Inc., 60077 Skokie, USA
7The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, 2800 Kongens Lyngby, Denmark
*Correspondence: [email protected]

ABSTRACT
Microbes that can recycle one-carbon (C1) greenhouse gases into fuels and chemicals are vital for the
biosustainability of future industries. Acetogens are the most efficient known microbes for fixing
the carbon oxides CO2 and CO. Understanding proteome allocation is important for metabolic
engineering as it dictates metabolic fitness. Here, we use absolute proteomics to quantify intracellular
concentrations for >1,000 proteins in the model-acetogen Clostridium autoethanogenum grown on
three gas mixtures. We detect prioritisation of proteome allocation for C1 fixation and significant
expression of proteins involved in the production of acetate and ethanol as well as proteins with
unclear functions. The data also indicate which isoenzymes are likely relevant in vivo. Integration of
proteomic and metabolic flux data demonstrated that enzymes catalysing high fluxes have both high
concentrations and high in vivo catalytic rates. We show that flux adjustments were predominantly
accompanied by changes in enzyme catalytic rates rather than in concentrations. Our work serves as a reference dataset and
advances systems-level understanding and engineering of acetogens.

INTRODUCTION
Increasing concerns about irreversible climate change are accelerating the shift to renewable, carbon-free
energy production (e.g., solar, wind, fuel cells). However, many fuels and chemicals will stay
carbon-based, and thus, technologies for their production using sustainable and renewable feedstocks
are needed to transition towards a circular bioeconomy. Moreover, the rising amount of solid waste
produced by human activities (e.g., municipal solid waste, lignocellulosic waste) will further endanger
our ecosystems' already critical state. Both challenges can be tackled by using organisms capable of
recycling gaseous one-carbon (C1) waste feedstocks (e.g., industrial waste gases [CO2, CO, CH4],
syngas from gasified biomass or municipal solid waste [CO, H2, CO2]) into fuels and chemicals at
industrial scale1–3.
As we transition into a new bioeconomy, a key feature of global biosustainability will be the
capacity to convert carbon oxides into products at industrial scale. Acetogens are the ideal biocatalysts
for this as they use the most energy-efficient pathway, the Wood-Ljungdahl pathway (WLP)4,5, for
fixing CO2 into the central metabolite acetyl-CoA6–9 and accept gas (CO, H2, CO2) as their sole carbon
and energy source5. Indeed, the model-acetogen Clostridium autoethanogenum is already being used
as a cell factory in industrial-scale gas fermentation3,10. The WLP is considered the first biochemical
pathway on Earth7,11–13 and continues to play a critical role in the biogeochemical carbon cycle by
fixing an estimated 20% of the global CO2 (refs. 6,14). While biochemical details of the WLP are well
described4,6,15, a quantitative understanding of acetogen metabolism is just emerging16,17. Notably,
recent systems-level analyses of acetogen metabolism have revealed mechanisms behind metabolic
shifts18–21, transcriptional architectures22,23, and features of translational regulation24,25. However, we
still lack an understanding of acetogen proteome allocation through the quantification of proteome-wide
intracellular protein concentrations. This fundamental knowledge is required for advancing
rational metabolic engineering of acetogen cell factories and for accurate in silico reconstruction of
their phenotypes using metabolic models1,2.
Quantitative description of an organism’s proteome allocation through absolute proteome
quantification is valuable in several ways. Firstly, it enables us to understand prioritisation of the
energetically costly proteome resources among functional protein categories, metabolic pathways, and
single proteins26,27. This may also identify relevant proteins with unclear functions and high
abundances. Secondly, some metabolic fluxes can be catalysed by isoenzymes and a comparison of
their intracellular concentrations can indicate which are likely relevant in vivo and are thus targets for
genetic perturbation experiments to validate in vivo functionalities28. Thirdly, integration of absolute
proteomics and metabolic flux data enables the estimation of apparent in vivo catalytic rates of
enzymes (kapp)26,29, which can be used to identify less-efficient enzymes as targets for improving
pathways through metabolic and protein engineering. Absolute proteomics data also contribute to the
curation of accurate genome-scale metabolic models.
Absolute proteome quantification is generally performed using label-free mass-spectrometry
(MS) approaches without spike-in standards30,31. The major limitation of this approach is that
the accuracy of label-free estimated protein concentrations cannot be determined. Furthermore, the
optimal model to convert MS signals (e.g., spectral counts, peak intensities) into protein
concentrations remains unknown32–34. Label-based approaches using stable-isotope labelled (SIL)
spike-ins of endogenous proteins are thus preferred for reliable absolute proteome quantification. This
strategy relies on accurate absolute quantification of a limited set of intracellular proteins (i.e.,
anchors) using SIL spike-ins to establish a linear correlation between protein concentrations and their
measured MS intensities32. Studies with the latter approach have determined a 1.5–2.4-fold error for
label-free estimation of proteome-wide protein concentrations in multiple organisms28,35–41.
The aim of our work was to perform reliable absolute proteome quantification for the first
time in an acetogen. We employed a label-based MS approach using SIL-protein spike-in standards to
quantify SIL-based concentrations for 16 key proteins and label-free-based concentrations for >1,000
C. autoethanogenum proteins during autotrophic growth on three gas mixtures. This allowed us to
explore global proteome allocation, uncover isoenzyme usage in central metabolism, and quantify
regulatory principles associated with estimated kapps. Our work provides an important reference
dataset and advances the systems-level understanding and engineering of the ancient metabolism of
acetogens.

RESULTS
Absolute proteome quantification framework in the model-acetogen C. autoethanogenum. We
performed absolute proteome quantification from autotrophic steady-state chemostat cultures of C.
autoethanogenum grown on three different gas mixtures: CO, syngas (CO+CO2+H2), or CO+H2
(termed “high-H2 CO”) described before18,19. Briefly, four biological replicate cultures of each gas mixture
were grown anaerobically on a chemically defined medium at 37 °C, pH of 5, and dilution rate ~1
day⁻¹ (specific growth rate ~0.04 h⁻¹) without the use of heavy SIL substrates. The absolute proteome
quantification framework (Fig. 1) was built on 19 synthetic heavy SIL-variants of key C.
autoethanogenum proteins covering central metabolism (Supplementary Table 1). The SIL-protein
standards were spiked in for quantification of intracellular concentrations of their endogenous light
counterparts. This framework ensures more accurate absolute quantification than commonly used
peptide spike-ins. Spiking cell lysates with protein standards before sample clean-up and protein
digestion accounts for errors accompanying these critical steps30,31,42,43. Furthermore, selection of
peptides ensuring accurate absolute protein quantification without prior MS data is challenging as it is
difficult to predict which peptides “fly” well30,31,42,43. In contrast, all proteotypic peptides from a
protein spike-in can be used for quantification.
We synthesised SIL-proteins labelled with heavy lysine and arginine using a cell-free wheat germ
extract platform as described previously18,44,45 and quantified standard stocks using parallel reaction
monitoring (PRM) MS. Next, proteins were extracted from culture samples using an optimised
protocol maximising extraction yield18 followed by spike-in of the 19 heavy SIL-proteins into light
cell lysates. We then used a data-independent acquisition (DIA) MS approach46 to quantify 1,243
proteins of C. autoethanogenum across 12 samples (quadruplicate cultures of three gas mixtures)
using a comprehensive spectral library consisting of whole-cell lysates, lysate fractions, and spike-in
SIL-proteins. Finally, we quantified intracellular concentrations for 16 key C. autoethanogenum
proteins using light-to-heavy ratios between endogenous and spike-in DIA MS intensities and further
used these 16 as anchor proteins for label-free estimation of ~1,043 protein concentrations by
establishing a linear correlation between protein concentrations and their measured MS intensities32.
We express intracellular protein concentrations in nanomoles of protein per gram of dry cell weight
(nmol/gDCW).
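The spike-in arithmetic described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the spiked amount, and the biomass amount are hypothetical, and the sketch assumes that the heavy standard and its endogenous light counterpart give the same MS response per mole.

```python
def conc_from_light_heavy(light_intensity, heavy_intensity, heavy_spike_nmol, biomass_gdcw):
    """Estimate endogenous (light) protein concentration in nmol/gDCW.

    All argument names are hypothetical; the light-to-heavy ratio is scaled by the
    known amount of spiked heavy standard and normalised to the biomass in the sample.
    """
    if heavy_intensity <= 0 or biomass_gdcw <= 0:
        raise ValueError("heavy intensity and biomass must be positive")
    light_to_heavy = light_intensity / heavy_intensity
    return light_to_heavy * heavy_spike_nmol / biomass_gdcw


# Hypothetical example: light signal is twice the heavy signal,
# 0.005 nmol of heavy standard spiked into lysate from 0.0001 gDCW.
example = conc_from_light_heavy(2.0e6, 1.0e6, heavy_spike_nmol=0.005, biomass_gdcw=1e-4)
print(f"{example:.1f} nmol/gDCW")
```

In practice several proteotypic peptides per protein would each yield such a ratio, and these would be combined into one protein-level value.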

Absolute quantification of 16 anchor protein concentrations. To ensure high-confidence absolute
quantification of anchor protein concentrations from the DIA MS data, we employed stringent criteria
on top of the automated mProphet peak picking algorithm47 within the software Skyline48. We also
performed a dilution series experiment for each SIL-protein to increase accuracy (see Methods for
details). Briefly, we kept only peaks with Gaussian shapes and without interference, and only precursors
with the highest Skyline quality metrics. Importantly, only peptides whose signals were above the lower
limit of quantification (LLOQ) and within the linear dynamic quantification range in the dilution
series experiment were used for anchor protein quantification (Supplementary Table 2). We thus used
106 high-confidence peptides for the absolute quantification of 16 anchor protein concentrations
(Table 1; see also Fig. 5). High confidence of the intracellular concentrations for these key C.
autoethanogenum proteins of central metabolism is supported both by the low average 11%
coefficient of variation (CV) between biological quadruplicate cultures (Table 1) and the average 22%
CV between different peptides of single proteins (Supplementary Table 2).
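For readers less familiar with the CV metric quoted above, a minimal sketch of its calculation follows. The replicate values are hypothetical, and the use of the sample standard deviation (n-1 denominator) is an assumption, as the text does not state which estimator was used.

```python
import statistics


def coefficient_of_variation(values):
    """Return the CV in percent: sample standard deviation divided by the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    # statistics.stdev uses the n-1 denominator (assumed here, not stated in the text)
    return 100.0 * statistics.stdev(values) / mean


# Hypothetical concentrations (nmol/gDCW) of one protein in four replicate cultures
replicates = [90.0, 100.0, 110.0, 100.0]
print(f"CV = {coefficient_of_variation(replicates):.1f}%")
```

The same function applies to the peptide-level CV if the list holds the concentrations derived from different peptides of one protein.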

Label-free estimation of proteome-wide protein concentrations. Both high-quality proteomics data
and suitable anchor proteins are required for reliable label-free absolute proteome quantification. Our
proteome-wide DIA MS data were highly reproducible with an average Pearson correlation
coefficient of R = 0.99 between biological replicates (Fig. 2a and Supplementary Fig. 1). We also
found our anchor proteins suitable as their concentrations spanned three orders of magnitude,
and their summed mass accounted for ~⅓ of the peptide mass injected into the mass spectrometer
(Table 1 and Fig. 2b). We used the 16 anchor proteins (with 106 peptides) to determine the optimal
label-free quantification model with the best linear fit between anchor protein concentrations and their
measured DIA MS intensities using the aLFQ R package49 as described before for SWATH MS28 (Fig.
2c). Notably, we detected an average 1.5-fold cross-validated mean fold-error (CV-MFE;
bootstrapping) for the label-free estimated anchor protein concentrations across samples (Fig. 2d).
The errors were normally distributed (Supplementary Fig. 2) with an average 95% CI of 0.3 (Fig. 2d).
We then applied the optimal label-free quantification model to estimate ~1,043 protein concentrations
in C. autoethanogenum (Supplementary Table 3).
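The anchor-based estimation above can be illustrated with a simplified sketch. It is not the aLFQ implementation: it assumes a log-log linear model, uses leave-one-out cross-validation in place of the bootstrapping used in this work, and defines fold error as max(predicted/measured, measured/predicted), which is one common convention. All anchor values are hypothetical.

```python
import math


def fit_loglog(intensities, concentrations):
    """Least-squares fit of log10(conc) = slope * log10(intensity) + intercept."""
    xs = [math.log10(v) for v in intensities]
    ys = [math.log10(v) for v in concentrations]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_conc(intensity, slope, intercept):
    """Convert an MS intensity into an estimated concentration."""
    return 10 ** (slope * math.log10(intensity) + intercept)


def fold_error(predicted, measured):
    """Fold error is always >= 1, regardless of the direction of the deviation."""
    return max(predicted / measured, measured / predicted)


def loo_mean_fold_error(intensities, concentrations):
    """Leave-one-out estimate: refit without each anchor, then predict it."""
    errors = []
    for i in range(len(intensities)):
        train_x = intensities[:i] + intensities[i + 1:]
        train_y = concentrations[:i] + concentrations[i + 1:]
        slope, intercept = fit_loglog(train_x, train_y)
        errors.append(fold_error(predict_conc(intensities[i], slope, intercept),
                                 concentrations[i]))
    return sum(errors) / len(errors)


# Hypothetical anchors: MS intensities and SIL-based concentrations (nmol/gDCW)
anchor_intensity = [1e4, 1e5, 1e6, 1e7]
anchor_conc = [2.0, 20.0, 200.0, 2000.0]
slope, intercept = fit_loglog(anchor_intensity, anchor_conc)
print(f"slope = {slope:.2f}, LOO mean fold error = "
      f"{loo_mean_fold_error(anchor_intensity, anchor_conc):.2f}")
```

Once fitted on the anchors, `predict_conc` would be applied to the intensity of every other quantified protein to obtain its label-free concentration estimate.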
Prior to the detailed analysis of proteome-wide protein concentrations, we further evaluated
our label-free data accuracy beyond the 1.5-fold CV-MFE determined above. Firstly, the total
estimated proteome mass (1.2±0.1 μg; average ± standard deviation) closely matched the 1 μg peptide mass
injected into the mass spectrometer (Fig. 2d). Secondly, the data were supported by a strong correlation
between estimated protein concentrations and expected stoichiometries for equimolar (Fig. 3) and
non-equimolar protein complexes (Supplementary Fig. 3). Notably, absolute protein concentrations of
syngas cultures correlated well (R = 0.65) with their respective absolute transcript expression levels
determined before19 (Supplementary Fig. 4). This result is similar to the correlations of absolute data
seen in other steady-state cultures26,50. Altogether, we present the first absolute quantitative proteome
dataset for a gas-fermenting acetogen that includes SIL-based concentrations for 16 key proteins and
label-free estimates for over 1,000 C. autoethanogenum proteins during growth on three gas mixtures.

C1 fixation dominates global proteome allocation. Global proteome allocation amongst functional
gene classifications was explored using proteomaps27 and KEGG Orthology identifiers (KO IDs)51.
The “treemap” structure defining the four-level hierarchy of our proteomaps (Supplementary Table 4)
also included manually curated categories to accurately reflect acetogen metabolism (e.g., C1
fixation/WLP, Hydrogenases). As expected for autotrophic growth of an acetogen, the C1 fixation
(Fig. 4) or WLP (Supplementary Fig. 5) categories dominated the proteome allocation with a ~⅓
fraction, ahead of categories such as Carbohydrate metabolism or Glycolysis/Gluconeogenesis. Notably,
the data show that two proteins encoded by the WLP gene cluster, dihydrolipoamide dehydrogenase
(LpdA; CAETHG_RS07825) and glycine cleavage system H protein (GcvH; RS07795), were expressed at
very high levels (Fig. 4). This is important as both have unknown functions in C. autoethanogenum
metabolism. Significant investment in expression of proteins involved in acetate and ethanol
production (Supplementary Fig. 5) is consistent with ⅓-to-⅔ of fixed carbon channelled into these
two growth by-products across the three gas mixtures18,19. The 11% proteome fraction of the
Translation category (Fig. 4) is expected for cells growing at a specific growth rate ~0.04 h⁻¹ based on absolute
proteomics data from Escherichia coli26,39,52. The notable proteome allocation for Amino acid
metabolism and particularly the high abundance of ketol-acid reductoisomerase (IlvC; RS00580) are
surprising since metabolic fluxes through 2,3-butanediol and branched-chain amino acid pathways
were low under these growth conditions18,19. In addition, numerous proteins with unknown or unclear
functions (coloured grey in proteomaps) are highly expressed (e.g., RS12590, RS08610, RS08145),
highlighting the need for global mapping of genotype-phenotype relationships in acetogens. In general,
proteome allocation was highly similar between the three gas mixtures (Supplementary Table 3). This
result is unsurprising given the few relative protein expression differences detected previously18.

Enzyme usage revealed in central metabolism. Next, we focused on uncovering enzyme usage in
acetogen central metabolism (Fig. 5). This comprises enzymes of the WLP, acetate, ethanol, and
2,3-butanediol production pathways, hydrogenases, and the Nfn transhydrogenase, which together
carry >90% of the carbon and most of the redox flow in C. autoethanogenum18–20. Multiple metabolic fluxes
in these pathways can be catalysed by isoenzymes and absolute proteomics data can indicate which of
the isoenzymes are likely relevant in vivo. While the carbon monoxide dehydrogenase (CODH) AcsA
(RS07861–62) that forms the bifunctional CODH/ACS complex with the acetyl-CoA synthase53
(AcsB; RS07800) is essential for C. autoethanogenum growth on gas as confirmed in mutagenesis
studies54, the higher concentrations of the dispensable monofunctional CODH CooS1 (RS14775)
suggest it may also play a role in CO oxidation (Fig. 4, 5), in addition to CO2 reduction54.
Additionally, our proteomics data show high abundance of the primary acetaldehyde:ferredoxin
oxidoreductase (AOR1; RS00440) and this supports the emerging understanding that in C.
autoethanogenum ethanol is dominantly produced using the AOR1 activity via acetate, instead of
directly from acetyl-CoA via acetaldehyde using mono- or bifunctional activities18,19,55,56 (Fig. 5).
Furthermore, the data suggest that the specific alcohol dehydrogenase (Adh4; RS08920) is responsible
for reducing acetaldehyde to ethanol, a key reaction in terms of carbon and redox metabolism. The
high abundance of the electron-bifurcating hydrogenase HytA-E complex (RS13745–70) compared
to alternative hydrogenases confirms that it is the main H2-oxidiser57,58 (Fig. 5). This is consistent with
the fact that in the presence of H2 all the CO2 fixed by the WLP is reduced to formate using H2 by the
HytA-E and formate dehydrogenase (FdhA; RS13725) enzyme complex activity18,19. Despite the
proteomics evidence, genetic perturbations are required to determine condition-specific in vivo
functionalities of isoenzymes in acetogens unequivocally.
The overall most abundant protein was the formate-tetrahydrofolate ligase (Fhs; RS07850), a
key enzyme in the WLP (Fig. 4, 5). Despite the high abundance, its expression might still be rate-limiting
(see below). Another key enzyme for acetogens is AcsB because of its essentiality for acetyl-CoA
synthesis by the CODH/ACS complex. AcsB is linked to the WLP by the corrinoid iron sulfur
proteins AcsC (RS07810) and AcsD (RS07815) that supply the methyl group to AcsB. Interestingly,
the ratio of AcsCD-to-AcsB increased from 1.7 (CO) to 2.3 (syngas) to 2.9 (high-H2 CO), suggesting
that the primary role of the CODH/ACS complex shifted from CO oxidation towards acetyl-CoA
synthesis, likely because increased H2 uptake could replace the supply of reduced ferredoxin from CO
oxidation. Concurrently, levels of the Nfn transhydrogenase (RS07665), which acts as a redox valve in
acetogens20, are maintained high (Fig. 5), potentially to rapidly respond to redox perturbations. We
conclude that absolute quantitative proteomics can significantly contribute to a systems-level
understanding of metabolism, particularly in less-studied organisms.

Integration of absolute proteomics and flux data yields in vivo enzyme catalytic rates. Absolute
proteomics data enable estimation of intracellular catalytic working rates of enzymes when metabolic
flux rates are known26,29. We thus calculated apparent in vivo catalytic rates of enzymes, denoted as
kapp (s⁻¹)26, as the ratio of specific flux rate (mmol/gDCW/h) determined before18 to protein
concentration (nmol/gDCW) (see Methods). This produced kapp values for 13 and 48
enzymes/complexes using either anchor or label-free protein concentrations, respectively
(Supplementary Table 5 and Fig. 5, 6). The first two critical steps for carbon fixation in the methyl
branch of the WLP (i.e., CO to formate) are catalysed at high rates (Fig. 5). Notably, FdhA showed a
kapp~30 s⁻¹ for CO2 reduction without H2 during growth on CO only, which is similar to in vitro kcat
data of formate dehydrogenases in other acetogens59,60. Interestingly, the next step of formate
reduction was potentially catalysed by a less-efficient enzyme, Fhs, as its kapp of ~3 s⁻¹ is significantly
lower than those of other WLP enzymes (Fig. 5). Overall, enzymes catalysing reactions in high flux
pathways such as the WLP and acetate and ethanol production have higher kapps than those
downstream of the conversion of acetyl-CoA to pyruvate (Fig. 5). Indeed, enzymes catalysing high
metabolic fluxes in C. autoethanogenum have both higher concentrations and higher catalytic rates
compared to enzymes catalysing lower fluxes as both specific flux rates and enzyme concentrations
(Kendall’s τ = 0.56, p-value = 5×10⁻⁹) and flux and kapp (τ = 0.45, p-value = 2×10⁻⁶) were significantly
correlated (Fig. 6a), as seen before for other organisms26,61.
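The unit conversion behind kapp can be made explicit with a short sketch. The function name and the input values are hypothetical; the sketch only applies the ratio defined above (flux in mmol/gDCW/h divided by concentration in nmol/gDCW) and ignores details such as complex stoichiometry that the Methods may handle differently.

```python
NMOL_PER_MMOL = 1e6      # 1 mmol = 1,000,000 nmol
SECONDS_PER_HOUR = 3600.0


def apparent_catalytic_rate(flux_mmol_per_gdcw_h, conc_nmol_per_gdcw):
    """Return kapp in s^-1 from a specific flux and an enzyme concentration."""
    if conc_nmol_per_gdcw <= 0:
        raise ValueError("enzyme concentration must be positive")
    flux_nmol_per_gdcw_s = flux_mmol_per_gdcw_h * NMOL_PER_MMOL / SECONDS_PER_HOUR
    return flux_nmol_per_gdcw_s / conc_nmol_per_gdcw


# Hypothetical enzyme: 3.6 mmol/gDCW/h carried by 1,000 nmol/gDCW of enzyme
kapp = apparent_catalytic_rate(3.6, 1000.0)
print(f"kapp = {kapp:.2f} s^-1")
```

Because kapp is a ratio, any fold error in the estimated concentration propagates directly into kapp, which is why anchor-based and label-free values are reported separately.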
Having acquired absolute proteomics data for C. autoethanogenum growth on three gas
mixtures with different metabolic flux profiles also allowed us to determine the contributions of changes in
enzyme concentration and catalytic rate to adjusting metabolic flux rates. Two extreme examples
are the reactions catalysed by the HytA-E (Fig. 6b) and the Nfn (Fig. 5) complexes where flux
adjustments were accompanied by large changes in kapps rather than in enzyme concentrations. Flux
changes in high flux pathways such as the WLP and acetate and ethanol production also coincided
mainly with kapp changes (Fig. 6b). This principle seems to be dominant in C. autoethanogenum as
90% of flux changes were not regulated through enzyme concentrations (i.e., post-translational
regulation; Supplementary Table 6) when comparing all statistically significant flux changes between
the three gas mixtures with respective enzyme expression changes (see Methods).

DISCUSSION
The looming danger of irreversible climate change and harmful effects of solid waste accumulation
are pushing humanity to develop and adopt sustainable technologies for renewable production of fuels
and chemicals and for waste recycling. Acetogen gas fermentation offers great potential to tackle both
challenges through recycling waste feedstocks (e.g., industrial waste gases, gasified biomass or
municipal solid waste) into fuels and chemicals1,2. Although the quantitative understanding of
acetogen metabolism has recently improved16,17, a quantitative description of acetogen proteome
allocation was missing. This is needed to advance both the metabolic engineering of acetogens into superior cell
factories and the accurate in silico reconstruction of their phenotypes1,2. Thus, we performed absolute
proteome quantification in the model-acetogen C. autoethanogenum grown autotrophically on three
gas mixtures.
Our absolute proteome quantification framework relied on SIL-protein spike-in standards and
DIA MS analysis to ensure high confidence of the determined intracellular concentrations for 16 key
C. autoethanogenum proteins. We further used these proteins as anchor proteins for label-free
estimation of >1,000 protein concentrations. This enabled us to determine the optimal label-free
quantification model for our data to infer protein concentrations from MS intensities, which remains
unknown in common label-free approaches not utilising spike-in standards32,33. More importantly,
label-free estimated protein concentrations using the latter approach are questionable as their accuracy
cannot be determined. We determined a low average error of 1.5-fold for our label-free
estimated protein concentrations based on 16 anchor proteins and a bootstrapping approach. This error
is in the same range as described in previous studies using SIL spike-in standards for absolute
proteome quantification28,35–41. Further, we observed a good match both between the estimated
proteome mass and the peptide mass injected into the mass spectrometer and between protein
concentrations and expected protein complex stoichiometries. We conclude that label-free estimation of proteome-wide protein
concentrations using SIL-protein spike-ins and state-of-the-art MS analysis is reasonably accurate.
Quantification of acetogen proteome allocation during autotrophic growth showed, as expected,
prioritisation of proteome resources for fixing carbon through the WLP, in line with transcript
expression data in C. autoethanogenum19,56. The allocation of one third of the total proteome for C1
fixation is higher than proteome allocation for carbon fixation through glycolysis during heterotrophic
growth of other microorganisms26,36. High abundances of other key enzymes of acetogen central
metabolism were also expected as the WLP, acetate and ethanol production pathways, hydrogenases,
and the Nfn transhydrogenase carry >90% of the carbon and most of the redox flow in C.
autoethanogenum18–20. However, the very high expression of two WLP gene cluster genes with
unknown functions in C. autoethanogenum, LpdA and GcvH, is striking and raises the question of whether
their function could also be to link the WLP and glycine synthase-reductase
pathways, as recently proposed for another acetogen62. Since many other proteins with unknown or
unclear functions were also highly abundant, global mapping of genotype-phenotype relationships in
acetogens is much needed.
The in vivo functionalities of isoenzymes are not clear for multiple key metabolic fluxes in
acetogen central metabolism and absolute proteomics data can indicate which isoenzymes are likely
relevant. Oxidation of CO or reduction of CO2 is a fundamental step for all acetogens and known to
be catalysed by three CODHs in C. autoethanogenum54. Though only AcsA that forms the
bifunctional CODH/ACS complex with the acetyl-CoA synthase53 is essential for growth on gas54, we
detected higher concentrations of the monofunctional CODH CooS1, whose deletion strain shows
intriguing phenotypes54. Concurrently, our data suggest that prioritisation of CODH/ACS activity
between CO oxidation and acetyl-CoA synthesis is sensitive to H2 availability. Thus, further studies
are required to decipher condition-dependent functionalities of CODHs. In addition to CODHs, the
biochemical understanding of ethanol production is important in terms of both carbon and redox
metabolism. Our data confirm that in C. autoethanogenum ethanol is predominantly produced via
acetate by AOR1 (refs. 18,19,55,56) and, more importantly, indicate for the first time that AOR1 activity is
followed by Adh4 (previously characterised as butanol dehydrogenase63) for reduction of
acetaldehyde to ethanol. These observations call for large-scale genetic perturbation experiments to
determine unequivocally the condition-specific in vivo functionalities of isoenzymes in acetogens.
294 Absolute proteomics data offer a unique opportunity to estimate apparent in vivo catalytic
26,29 295 rates of enzymes (kapp) if also metabolic flux data are available. These data are particularly
296 valuable for more accurate in silico reconstruction of phenotypes using protein-constrained genome-
64,65 29 297 scale metabolic models . While in vitro kcat and in vivo kapp data generally correlate , models using
66 298 maximal kapp values show better prediction of protein abundances . Furthermore, information of kapps
299 can infer less-efficient enzymes as targets for improving pathways through metabolic and protein
300 engineering. For example, protein engineering of Fhs (catalysing formate reduction) might improve
301 WLP throughput and carbon fixation since its kapp was significantly lower compared to other pathway
302 enzymes. At the same time, the large change in kapps of the abundant electron-bifurcating hydrogenase
303 HytA-E and the Nfn transhydrogenase complexes indicate capacity for the cells to rapidly respond to
20 304 H2 availability and redox perturbations, which may be critical for metabolic robustness of acetogens . 11 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.11.443690; this version posted May 12, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
305 Overall, we detected both higher concentrations and kapps for enzymes catalysing higher metabolic
306 fluxes, which is believed to arise from an evolutionary push towards reducing protein production costs
307 for enzymes carrying high flux61. The observation that 90% of flux changes in C. autoethanogenum
308 were not regulated through changes in enzyme concentrations is not surprising for a metabolism that
309 operates at the thermodynamic edge of feasibility16,17 since post-translational regulation of fluxes is
310 energetically least costly. Further research is needed to identify which mechanism from post-
311 translational protein modification, allosteric regulation, or substrate concentration change is
312 responsible for post-translational regulation of fluxes.
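
The kapp estimate discussed above is essentially a unit conversion once a flux and an enzyme concentration are available. The following is a minimal sketch using the definition given in Methods (specific flux in mmol/gDCW/h divided by protein concentration in nmol/gDCW, with subunit concentrations averaged for complexes). The function names and the numbers are hypothetical and are not taken from the dataset.

```python
from statistics import mean

SECONDS_PER_HOUR = 3600.0
NMOL_PER_MMOL = 1e6


def complex_concentration(subunit_conc_nmol_per_gdcw):
    """Average of the quantified subunit concentrations of an enzyme complex."""
    return mean(subunit_conc_nmol_per_gdcw)


def kapp_per_second(flux_mmol_per_gdcw_h, enzyme_nmol_per_gdcw):
    """Apparent in vivo catalytic rate (s^-1) = flux / enzyme concentration."""
    if enzyme_nmol_per_gdcw <= 0:
        raise ValueError("enzyme concentration must be positive")
    # Convert the flux from mmol/gDCW/h to nmol/gDCW/s before dividing.
    flux_nmol_per_gdcw_s = flux_mmol_per_gdcw_h * NMOL_PER_MMOL / SECONDS_PER_HOUR
    return flux_nmol_per_gdcw_s / enzyme_nmol_per_gdcw


# Hypothetical example: a two-subunit complex carrying 36 mmol/gDCW/h.
example_conc = complex_concentration([900.0, 1100.0])  # nmol/gDCW
example_kapp = kapp_per_second(36.0, example_conc)     # s^-1
```

The sketch assumes, as the Methods do, that every protein chain is catalytically active, so the result is a lower bound on the true turnover rate if part of the enzyme pool is inactive.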
313 We have produced the first absolute proteome quantification in an acetogen and thus provided
314 understanding of global proteome allocation, isoenzyme usage in central metabolism, and regulatory
315 principles of in vivo enzyme catalytic rates. This fundamental knowledge has potential to advance
316 both rational metabolic engineering of acetogen cell factories and accurate in silico reconstruction of
317 their phenotypes1,2,64,65. Our study also highlights the need for large-scale mapping of genotype-
318 phenotype relationships in acetogens to infer in vivo functionalities of isoenzymes and proteins with
319 unknown or unclear functions. This absolute proteomics dataset serves as a reference towards a better
320 systems-level understanding of the ancient metabolism of acetogens.
321
322 METHODS
323 Bacterial strain and culture growth conditions. Absolute proteome quantification was performed
324 from high biomass concentration (~1.4 gDCW/L) steady-state autotrophic chemostat cultures of C.
325 autoethanogenum growing on three different gas mixtures with culturing conditions described in our
326 previous works18,19. Briefly, four biological replicate chemostat cultures of C. autoethanogenum strain
327 DSM 19630 were grown on a chemically defined medium (without yeast extract) either on CO (~60%
328 CO and 40% Ar), syngas (~50% CO, 20% H2, 20% CO2, and 10% N2/Ar), or CO+H2, termed “high-
329 H2 CO”, (~15% CO, 45% H2, and 40% Ar) under strictly anaerobic conditions. The bioreactors were
330 maintained at 37 °C, pH of 5, and dilution rate ~1 day⁻¹ (specific growth rate ~0.04 h⁻¹).
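
The link between the two rates quoted above follows from the standard chemostat assumption that, at steady state, the specific growth rate equals the dilution rate. A small sketch of the conversion is given below; the helper names are illustrative, and the doubling time is a derived quantity that the text does not report.

```python
import math

HOURS_PER_DAY = 24.0


def specific_growth_rate_per_hour(dilution_rate_per_day):
    """At chemostat steady state the specific growth rate equals the dilution rate."""
    return dilution_rate_per_day / HOURS_PER_DAY


def doubling_time_hours(mu_per_hour):
    """Doubling time implied by exponential growth at rate mu."""
    return math.log(2) / mu_per_hour


mu = specific_growth_rate_per_hour(1.0)  # dilution rate of ~1 per day
t_double = doubling_time_hours(mu)       # hours
```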
331
332 Cell-free synthesis of stable-isotope labelled protein standards. Twenty proteins covering C.
333 autoethanogenum central carbon metabolism, the HytA-E hydrogenase, and a ribosomal protein
334 (Supplementary Table 1) were selected for cell-free synthesis of SIL-proteins as described in ref.18.
335 Briefly, genes encoding for these proteins were synthesised by commercial gene synthesis services
336 (Biomatik). Target genes were sub-cloned into the cell-free expression vector pEUE01-His-N2
337 (CellFree Sciences) and transformed into Escherichia coli DH5α from which plasmid DNA was
338 extracted and purified. Correct gene insertion into the pEUE01-His-N2 was verified by DNA
339 sequencing. Subsequently, cell-free synthesis of His-tag fused C. autoethanogenum proteins was
340 performed using the bilayer reaction method with the wheat germ extract WEPRO8240H (CellFree
341 Sciences) as described previously44,45. mRNAs for cell-free synthesis were prepared by an in vitro
342 transcription reaction while in vitro translation of target proteins was performed using a bilayer
343 reaction where the translation layer was supplemented with L-Arg-13C6,15N4 and L-Lys-13C6,15N2
344 (Wako) at final concentrations of 20 mM to achieve high efficiency (>99 %) for stable-isotope
345 labelling of proteins. The in vitro synthesised SIL-protein sequences also contained an N-terminal
346 amino acid sequence GYSFTTTAEK that was later used as a tag for quantification of the SIL-protein
347 stock concentration. Subsequently, SIL-proteins were purified using the Ni-Sepharose High-
348 Performance resin (GE Healthcare Life Sciences) and precipitated using methanol:chloroform:water
349 precipitation in Eppendorf Protein LoBind® tubes. Lastly, precipitated SIL-proteins were reconstituted
350 in 104 μL of 8 M urea ([UA]; Sigma-Aldrich) in 0.1 M Trizma® base (pH 8.5) by vigorous vortexing
351 and stored at -80 °C until further use.
352
353 Absolute quantification of SIL-protein standards using PRM MS. Concentrations of the twenty
354 synthesised SIL-protein standard stocks were determined using PRM MS preceded by in-solution
355 digestion of proteins and sample desalting and preparation for MS analysis.
356 Sample preparation
357 Only Eppendorf Protein LoBind® tubes and pipette tips were used for all sample preparation steps.
358 Firstly, 20 μL of UA was added to 4 μL of the SIL-protein standard stock used to determine the stock
359 concentration, and the mix was vortexed. Then, 1 μL of 0.2 M DTT (Promega) was added, followed
360 by vortexing and incubation for 1 h at 37 °C to reduce disulphide bonds. Sulfhydryl groups were
361 alkylated with 2 µL of 0.5 M iodoacetamide (IAA; Sigma-Aldrich), vigorous vortexing, and
362 incubation for 30 min at room temperature in the dark. Next, 75 µL of 25 mM ammonium bicarbonate
363 was added to dilute UA down to 2 M concentration. Subsequently, 2 pmol (2 µL of stock) of the non-
364 labelled AQUA® peptide HLEAAKGYSFTTTAEKAAELHK (Sigma-Aldrich) containing the
365 quantification tag sequence GYSFTTTAEK was added to enable quantification of SIL-protein stock
366 concentrations using MS analysis based on the ratio of heavy-to-light GYSFTTTAEK signals (see
367 below). Protein digestion was performed for 16 h at 37 °C with 0.1 µg of Trypsin/Lys-C mix (1 µL of
368 stock; Promega) and stopped by lowering pH to 3 by the addition of 5 µL of 10% (v/v) trifluoroacetic
369 acid (TFA).
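
The urea dilution step in the digestion protocol above can be checked with simple volume bookkeeping. The sketch below assumes additive volumes and uses the fact that the SIL-protein stock itself was reconstituted in 8 M urea; the function name is hypothetical, and the AQUA peptide, protease, and TFA additions are left out because they follow the dilution step.

```python
def final_molarity(components):
    """Urea molarity of a mix of (volume_uL, urea_molarity) components.

    Volumes are assumed to be additive.
    """
    total_volume = sum(volume for volume, _ in components)
    urea_amount = sum(volume * molarity for volume, molarity in components)
    return urea_amount / total_volume


digest_mix = [
    (20.0, 8.0),  # UA added to the stock aliquot
    (4.0, 8.0),   # SIL-protein stock, itself reconstituted in UA
    (1.0, 0.0),   # 0.2 M DTT
    (2.0, 0.0),   # 0.5 M iodoacetamide
    (75.0, 0.0),  # 25 mM ammonium bicarbonate
]

urea_molar = final_molarity(digest_mix)
```

Under these assumptions the mix ends at roughly 1.9 M urea, in line with the stated target of about 2 M before protease addition.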
370 Samples were desalted using C18 ZipTips (Merck Millipore) as follows: the column was
371 wetted using 0.1% (v/v) formic acid (FA) in 100% acetonitrile (ACN), equilibrated with 0.1% FA in
372 70% (v/v) ACN, and washed with 0.1% FA before loading the sample and washing again with 0.1%
373 FA. Peptides were eluted from the ZipTips with 0.1% FA in 70% ACN. Finally, samples were dried
374 using a vacuum-centrifuge (Eppendorf) at 30 °C until dryness followed by reconstitution in 12 µL of
375 0.1% FA in 5% ACN for subsequent MS analysis.
376 LC method for PRM MS
377 A Thermo Fisher Scientific UltiMate 3000 RSLCnano UHPLC system was used to elute the samples.
378 Each sample was initially injected (6 µL) onto a Thermo Fisher Acclaim PepMap C18 trap reversed-
379 phase column (300 µm x 5 mm nano viper, 5 µm particle size) at a flow rate of 15 µL/min using 2%
380 ACN for 3 min with the solvent going to waste. The trap column was switched in-line with the
381 separation column (GRACE Vydac Everest C18, 300Å 150 µm x 150 mm, 2 µm) and the peptides
382 were eluted using a flowrate of 3 µL/min using 0.1% FA in water (buffer A) and 80% ACN in buffer
383 A (buffer B) as mobile phases for gradient elution. Following 3 min of isocratic elution at 3% buffer B, peptide
384 elution employed a 3-40% ACN gradient for 28 min followed by 40-95% ACN for 1.5 min and 95%
385 ACN for 1.5 min at 40 °C. The total elution time was 50 min including a 95% ACN wash and a re-
386 equilibration step.
387 PRM MS data acquisition
388 The eluted peptides from the C18 column were introduced to the MS via a nano-ESI and analysed
389 using the Thermo Fisher Scientific Q-Exactive HF-X mass spectrometer. The electrospray voltage
390 was 1.8 kV in positive ion mode, and the ion transfer tube temperature was 250 °C. Full MS-scans
391 were acquired in the Orbitrap mass analyser over the range m/z 550–560 with a mass resolution of
392 30,000 (at m/z 200). The AGC target value was set at 1.00E+06 and maximum accumulation time 50
393 ms for full MS-scans. The PRM inclusion list included two mass values of 552.7640 and 556.7711.
394 MS/MS spectra were acquired in the Orbitrap mass analyser with a mass resolution of 15,000 (at m/z
395 200). The AGC target value was set at 1.00E+06 and maximum accumulation time 30 ms for MS/MS
396 with an isolation window of 2 m/z. The loop count was set at 14 to acquire more MS/MS data. Raw
397 PRM MS data have been deposited to Panorama at
398 https://panoramaweb.org/Valgepea_Cauto_PRM.url (private reviewer account details: username:
399 [email protected]; password: hSAwNUAL) with a ProteomeXchange
400 Consortium (http://proteomecentral.proteomexchange.org) dataset identifier PXD025760.
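
The two inclusion-list masses can be reproduced from the tag sequence GYSFTTTAEK as doubly charged precursors. The sketch below uses standard monoisotopic residue masses; the function name is illustrative, and the heavy form is assumed to carry a single 13C(6)15N(2) lysine, since the tag contains no arginine.

```python
# Monoisotopic residue masses (Da) for the amino acids present in the tag.
MONOISOTOPIC_RESIDUE = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "T": 101.047679,
    "E": 129.042593, "K": 128.094963, "F": 147.068414, "Y": 163.063329,
}
WATER = 18.010565          # added once per peptide
PROTON = 1.007276          # added once per charge
HEAVY_LYS_SHIFT = 8.014199  # 13C(6)15N(2) lysine


def precursor_mz(sequence, charge, label_shift=0.0):
    """m/z of a peptide precursor with the given charge and optional label mass shift."""
    neutral_mass = sum(MONOISOTOPIC_RESIDUE[aa] for aa in sequence) + WATER + label_shift
    return (neutral_mass + charge * PROTON) / charge


light_mz = precursor_mz("GYSFTTTAEK", 2)
heavy_mz = precursor_mz("GYSFTTTAEK", 2, HEAVY_LYS_SHIFT)
```

Both values agree with the inclusion list to within rounding (about 552.764 for the light and 556.771 for the heavy precursor).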
401 PRM MS data analysis
402 Analysis of PRM MS data was performed using the software Skyline48. The following parameters
403 were used to extract PRM MS data for the quantification tag sequence GYSFTTTAEK: three
404 precursor isotope peaks with a charge of 2 (++) were included (monoisotopic; M+1; M+2); five of the
405 most intense y product ions from ion 3 to last ion of charge state 1 and 2 among the precursor were
406 picked; chromatograms were extracted with an ion match mass tolerance of 0.05 m/z for product ions
407 by including all matching scans; full trypsin specificity with two missed cleavages allowed for
408 peptides with a length of 8-25 AAs; cysteine carbamidomethylation as a fixed peptide modification.
409 Additionally, peptide modifications included heavy labels for lysine and arginine as
410 13C(6)15N(2)/+8.014 Da (K) and 13C(6)15N(4)/+10.008 Da (R), respectively. This translated into the
411 SIL-proteins and the non-labelled AQUA® peptide possessing the tag GYSFTTTAEK with m/z of
412 556.7711 and 552.7640, respectively. Hence, the concentrations of SIL-protein stocks were calculated
413 based on the ratio of heavy-to-light GYSFTTTAEK signals and the spike-in of 2 pmol of the non-
414 labelled AQUA® peptide (see above). High accuracy of quantification was evidenced by the very high
415 similarity between both precursor peak areas and expected isotope distribution (R2>0.99; idotp in
416 Skyline) and heavy and light peak areas (R2>0.99; rdotp in Skyline) for all SIL-protein standard
417 stocks. No heavy GYSFTTTAEK signal was detected for protein CAETHG_RS16140 (an acetylating
418 acetaldehyde dehydrogenase in the NCBI annotation of sequence NC_022592.167), thus 19 SIL-proteins
419 could be used for the subsequent absolute proteome quantification in C. autoethanogenum
420 (Supplementary Table 1). PRM MS data with all Skyline processing settings can be viewed and
421 downloaded from Panorama at https://panoramaweb.org/Valgepea_Cauto_PRM.url (private reviewer
422 account details: username: [email protected]; password: hSAwNUAL).
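
The stock concentration calculation described above can be sketched as follows. This is an illustration rather than the authors' script: it assumes one tag per protein, complete digestion of both the SIL-protein and the AQUA peptide, and that the 4 µL stock aliquot and the 2 pmol light spike-in end up in the same tube. The function name and the peak areas are hypothetical.

```python
def sil_stock_concentration(heavy_area, light_area,
                            light_spike_pmol=2.0, stock_volume_ul=4.0):
    """Estimate SIL-protein stock concentration (pmol/uL) from the heavy-to-light tag ratio."""
    if light_area <= 0:
        raise ValueError("light peak area must be positive")
    # Heavy amount in the tube = heavy/light signal ratio x light spike-in amount.
    heavy_pmol = heavy_area / light_area * light_spike_pmol
    # All heavy tag originates from the stock aliquot that was digested.
    return heavy_pmol / stock_volume_ul


# Hypothetical peak areas giving a heavy-to-light ratio of 1.5.
stock_pmol_per_ul = sil_stock_concentration(3.0e6, 2.0e6)
```

Because heavy and light forms of the tag are measured in the same injection, losses during desalting or injection cancel out in the ratio.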
423
424 Absolute proteome quantification in C. autoethanogenum using DIA MS. We used 19 synthetic
425 heavy SIL variants (see above) of key C. autoethanogenum proteins (Supplementary Table 1) as
426 spike-in standards for quantification of intracellular concentrations of their non-SIL counterparts
427 using a DIA MS approach46. Also, we performed a dilution series experiment for the spike-in SIL-
428 proteins to ensure accurate absolute quantification. We refer to these 19 intracellular proteins as
429 anchor proteins that were further used to estimate proteome-wide absolute protein concentrations in C.
430 autoethanogenum. This was achieved by determining the best linear fit between anchor protein
431 concentrations and their measured DIA MS intensities using the same strategy as described
432 previously28.
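
The anchor-based calibration described above can be illustrated with an ordinary least-squares fit. The sketch below assumes the fit is done on log10-transformed values, which is a common choice for label-free calibration but is an assumption here, and the anchor values are invented for the example.

```python
import math


def fit_log_log(intensities, concentrations):
    """Least-squares line through log10(intensity) vs log10(concentration)."""
    xs = [math.log10(v) for v in intensities]
    ys = [math.log10(v) for v in concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_concentration(intensity, slope, intercept):
    """Apply the fitted line to a protein that has no SIL standard."""
    return 10 ** (slope * math.log10(intensity) + intercept)


# Hypothetical anchors: MS intensity (arbitrary units) and known concentration.
anchor_intensity = [1e4, 1e5, 1e6]
anchor_conc = [10.0, 100.0, 1000.0]

slope, intercept = fit_log_log(anchor_intensity, anchor_conc)
estimate = predict_concentration(1e7, slope, intercept)
```

In practice the choice of how peptide intensities are summarised per protein (for example TopN or iBAQ) changes the intensity axis, which is why model selection against the anchors matters.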
433 Preparation of spike-in SIL-protein standard mix and dilution series samples
434 Only Eppendorf Protein LoBind® tubes and pipette tips were used for all preparation steps. The 19
435 spike-in SIL-protein standards that could be used for absolute proteome quantification in C.
436 autoethanogenum (see above) were mixed in two lots: 1) ‘sample spike-in standard mix’: SIL-protein
437 quantities matching estimated intracellular anchor protein quantities (i.e., expected light-to-heavy
438 [L/H] ratios of ~1) based on label-free absolute quantification of the same samples in our previous
439 work18; 2) ‘dilution series spike-in standard mix’: SIL-protein quantities doubling the estimated intracellular
440 anchor protein quantities for the dilution series sample with the highest SIL-protein concentrations.
441 To ensure accurate absolute quantification of anchor protein concentrations, a dilution series
442 experiment was performed to determine the linear dynamic quantification range and LLOQ for each
443 of the 19 spike-in SIL-proteins. Dilution series samples were prepared by making nine 2-fold dilutions
444 of the ‘dilution series spike-in standard mix’ (i.e., 10 samples total for dilution series with a 512-fold
445 concentration span) in a constant C. autoethanogenum cell lysate background (0.07 µg/µL; 10
446 µg/tube) serving as a blocking agent to avoid loss of purified SIL-proteins (to container and pipette tip
447 walls) and as a background proteome for accurate MS quantification of the linear range and LLOQ for
448 anchor proteins.
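
The layout of the dilution series follows directly from the numbers above: nine successive 2-fold dilutions of the starting mix give ten samples spanning a 512-fold range. A short sketch, with a hypothetical function name and relative (unitless) spike-in levels:

```python
def dilution_levels(n_dilutions=9, fold=2.0):
    """Relative spike-in level of each sample, starting from the undiluted mix."""
    return [1.0 / fold ** i for i in range(n_dilutions + 1)]


levels = dilution_levels()
span = levels[0] / levels[-1]  # fold difference between highest and lowest sample
```

Only the SIL-protein level changes along the series; the cell lysate background is held constant in every tube.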
449 Sample preparation
450 C. autoethanogenum cultures were sampled for proteomics by immediate pelleting of 2 mL of culture
451 using centrifugation (25,000 × g for 1 min at 4 °C) and stored at -80 °C until analysis. Details of
452 protein extraction and protein quantification in cell lysates are described previously18. In short, thawed
453 cell pellets were suspended in lysis buffer (containing SDS, DTT, and Trizma® base) and cell lysis
454 was performed using glass beads and repeating a ‘lysis cycle’ consisting of heating, bead beating,
455 centrifugation, and vortexing before protein quantification using the 2D Quant Kit (GE Healthcare
456 Life Sciences).
457 Sample preparation and protein digestion for MS analysis was based on the filter-aided
458 sample preparation (FASP) protocol68. The following starting material was loaded onto an Amicon®
459 Ultra-0.5 mL centrifugal filter unit (nominal molecular weight cut-off of 30,000; Merck Millipore): 1)
460 50 µg of protein for one culture sample from each gas mixture (CO, syngas, or high-H2 CO) for
461 building the spectral library for DIA MS data analysis (samples 1-3); 2) 7 µg of protein for one
462 culture sample from either syngas or high-H2 CO plus ‘sample spike-in standard mix’ for including
463 spike-in SIL-protein data to the spectral library (samples 4-5); 3) 15 µg of protein for all 12 culture
464 samples (biological quadruplicates from CO, syngas, and high-H2 CO) plus ‘sample spike-in standard
465 mix’ for performing absolute proteome quantification in C. autoethanogenum (samples 6-17); 4) ten
466 dilution series samples with 10-15 µg of total protein (C. autoethanogenum cell lysate background
467 plus ‘dilution series spike-in standard mix’) for performing the dilution series experiment for the 19
468 spike-in SIL-proteins (see above) (samples 18-27).
469 Samples containing SIL-proteins (samples 4-27) were incubated at 37 °C for 1 h to reduce
470 SIL-protein disulphide bonds (cell lysate contained DTT). Details of the FASP workflow are
471 described previously18. In short, samples were washed with UA, sulfhydryl groups alkylated with IAA,
472 proteins digested using a Trypsin/Lys-C mix, and peptides eluted from the filter with 60 µL of
473 ammonium bicarbonate. Next, 50 µL of samples 1-3 were withdrawn and pooled for performing high
474 pH reverse-phase fractionation as described previously18 for expanding the spectral library for DIA
475 MS data analysis, yielding eight fractions (samples 28-35). Subsequently, all samples were vacuum-
476 centrifuged at 30 °C until dryness followed by reconstitution of samples 1-3 and 4-35 in 51 and 13 µL
477 of 0.1% FA in 5% ACN, respectively. Finally, total peptide concentration in each sample was
478 determined using the Pierce™ Quantitative Fluorometric Peptide Assay (Thermo Fisher Scientific) to
479 ensure that the same total peptide amount across samples 1-17 and 28-35 (excluding samples 18-27,
480 see below) could be injected for DIA MS analysis.
481 LC method for data-dependent acquisition (DDA) and DIA MS
482 Details of the LC method employed for generating the spectral library using DDA and for DIA sample
483 runs are described previously20. In short, a Thermo Fisher Scientific UHPLC system including C18
484 trap and separation columns was used to elute peptides with a gradient and total elution time of 110
485 min. For each DDA and DIA sample run, 1 µg of peptide material from protein digestion was injected,
486 except for dilution series samples (samples 18-27 above) that were injected in a constant volume of 3
487 µL to maintain the dilution levels of the ‘dilution series spike-in standard mix’.
488 DDA MS spectral library generation
489 The following 13 samples were analysed on the Q-Exactive HF-X in DDA mode to yield the spectral
490 library for DIA MS data analysis: 1) three replicates of one culture sample from each gas mixture (CO,
491 syngas, or high-H2 CO) (samples 1-3 above); 2) three replicates of one culture sample from either
492 syngas or high-H2 CO plus ‘sample spike-in standard mix’ (samples 4-5); 3) eight high pH reverse-
493 phase fractions of a pool of samples from each gas mixture (samples 28-35).
494 Details of DDA MS acquisition for generating the spectral library have been described previously20. In
495 short, eluted peptides from the C18 column were introduced to the MS via a nano-ESI and analysed
496 using the Q-Exactive HF-X with an Orbitrap mass analyser. The DDA MS spectral library for DIA
497 MS data confirmation and quantification using the software Skyline48 was created using the Proteome
498 Discoverer 2.2 software (Thermo Fisher Scientific) and its SEQUEST HT search as described
499 previously18. The final .pd result file contained peptide-spectrum matches (PSMs) with q-values
500 estimated at 1% false discovery rate (FDR) for peptides ≥4 AAs. The generated spectral library file
501 has been deposited to the ProteomeXchange Consortium
502 (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository69 with the dataset
503 identifier PXD025732 (private reviewer account details: username: [email protected];
504 password: QdkJHXy0).
505 DIA MS data acquisition
506 Details of DIA MS acquisition have been described previously20. In short, as for DDA MS acquisition, eluted
507 peptides were introduced to the MS via a nano-ESI and analysed using the Q-Exactive HF-X with an
508 Orbitrap mass analyser. DIA was achieved using an inclusion list: m/z 395–1100 in steps of 15 amu
509 and scans cycled through the list of 48 isolation windows with a loop count of 48. In total, DIA MS
510 data were acquired for 22 samples (samples 6-27 defined in section Sample preparation). Raw DIA
511 MS data have been deposited to the ProteomeXchange Consortium
512 (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository69 with the dataset
513 identifier PXD025732 (private reviewer account details: username: [email protected];
514 password: QdkJHXy0).
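
The DIA inclusion list described above can be reproduced by reading it as m/z 395 to 1100 in steps of 15, which gives exactly 48 entries and so matches the stated number of isolation windows. The sketch below is illustrative only; whether the listed values are window centres or window edges is not stated in the text, so the function simply returns the list entries.

```python
def inclusion_list(start_mz=395, stop_mz=1100, step=15):
    """Target m/z values for the DIA isolation windows, both end points included."""
    return list(range(start_mz, stop_mz + 1, step))


targets = inclusion_list()
n_windows = len(targets)
```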
515 DIA MS data analysis
516 DIA MS data analysis was performed with Skyline48 as described before18 with the following
517 modifications: 1) 12 manually picked high confidence endogenous peptides present in all samples and
518 spanning the elution gradient were used for iRT alignment through building an RT predictor; 2)
519 outlier peptides from iRT regression were removed; 3) a minimum of three isotope peaks were
520 required for a precursor; 4) a single peptide per spike-in SIL-protein was allowed for anchor protein
521 absolute quantification while at least two peptides per protein were required for label-free estimation
522 of proteome-wide protein concentrations; 5) extracted ion chromatograms (XICs) were transformed
523 using Savitzky-Golay smoothing. Briefly, the .pd result file from Proteome Discoverer was used to
524 build the DIA MS spectral library and the mProphet peak picking algorithm47 within Skyline was used
525 to separate true from false positive peak groups (per sample) and only peak groups with q-value<0.01
526 (representing 1% FDR) were used for further quantification. We confidently quantitated 7,288
527 peptides and 1,243 proteins across all samples and 4,887 peptides and 1,043 proteins on average
528 within each sample for estimating proteome-wide absolute protein concentrations. For absolute
529 quantification of anchor protein concentrations, we additionally manually: 1) removed integration of
530 peaks showing non-Gaussian shapes or interference from other peaks; 2) removed precursors with
531 similarity measures of R2<0.9 between product peak areas and corresponding intensities in the
532 spectral library (dotp in Skyline), precursor peak areas and expected isotope distribution (idotp), or
533 heavy and light peak areas (rdotp). After analysis in Skyline, 17 spike-in SIL-proteins remained for
534 further analysis as protein CAETHG_RS14410 was not identified in DIA MS data while
535 CAETHG_RS18395 did not pass quantification filters (Supplementary Table 1). DIA MS data with
536 all Skyline processing settings can be viewed and downloaded from Panorama at
537 https://panoramaweb.org/Valgepea_Cauto_Anchors.url for anchor protein absolute quantification and
538 at https://panoramaweb.org/Valgepea_Cauto_LF.url for estimating proteome-wide absolute protein
539 concentrations (private reviewer account details: username: [email protected];
540 password: hSAwNUAL).
541 Absolute quantification of anchor protein concentrations
542 We employed further stringent criteria on top of the output from Skyline analysis to ensure high
543 confidence absolute quantification of 17 anchor protein concentrations. Firstly, the precursor with the highest
544 heavy intensity in the dilution series sample with the highest ‘dilution series spike-in standard mix’ level
545 (DS01) was kept while the others were discarded. Peptides quantified in fewer than three biological replicate
546 cultures within a gas mixture, with no heavy signal for the DS01 sample, or with heavy signals for fewer
547 than three consecutive dilution series samples were removed. Next, we utilised the dilution series
548 experiment to only keep signals above the LLOQ and within the linear dynamic quantification range.
549 For this, experimental and expected peptide L/H signal ratios were correlated for each peptide
550 across the dilution series to determine the LLOQ and to calculate the correlation, slope, and
551 intercept between MS signal and spike-in level (Supplementary Table 2). Only peptides showing
552 correlation R2>0.95, 0.95
553 ensured that we were only using peptides within the linear dynamic range. The remaining peptides
554 were further filtered for each culture sample by removing peptides whose light or heavy signal was
555 below the LLOQ in the dilution series. Subsequently, only peptides were kept with L/H ratios for at
556 least three biological replicate cultures for each gas mixture (i.e., ≥9 data points). Finally, we aimed
557 to detect outlier peptides by calculating the % difference of a peptide’s L/H ratio from the average
558 L/H ratio of all peptides for a given protein for every sample. Peptides were considered outliers and
559 thus removed if the average difference across all samples was >50% or if the average difference
560 within biological replicate cultures was >50%. After the previous stringent criteria were applied, 106
561 high-confidence peptides remained (Supplementary Table 2) for the quantification of 16 anchor
562 protein concentrations since CAETHG_RS01830 was lost during manual analysis (Table 1 and
563 Supplementary Table 1). Proteins CAETHG_RS13725 and CAETHG_RS07840 were excluded from
564 the high-H2 CO culture dataset as their calculated concentrations varied >50% between biological
565 replicates. Data from one high-H2 CO culture were excluded from further analysis because they differed
566 from the other bio-replicates, likely due to challenges with MS analysis.
567 Label-free estimation of proteome-wide protein concentrations
568 We used the anchor proteins to estimate proteome-wide protein concentrations in C.
569 autoethanogenum by determining the best linear fit between anchor protein concentrations and their
570 measured DIA MS intensities using the aLFQ R package49 and the same strategy as described for
571 SWATH MS28. Briefly, aLFQ used anchor proteins and cross-validated model selection by
572 bootstrapping to determine the optimal model within various label-free absolute proteome
573 quantification approaches (e.g., TopN, iBAQ). The approach can obtain the model with the smallest
574 error between anchor protein concentrations determined using SIL-protein standards and label-free
575 estimated concentrations. The models with the highest accuracy were used to estimate proteome-wide
576 label-free concentrations for all proteins from their DIA MS intensities (1,043 proteins on average
577 within each sample; minimal two peptides per protein; see above): summing the five most intense
578 fragment ion intensities of the most or three of the most intense peptides per protein for CO or high-H2
579 CO cultures, respectively; summing the five most intense fragment ion intensities of all quantified
580 peptides of the protein divided by the number of theoretically observable peptides (i.e., iBAQ70) for
581 syngas cultures.
582
583 Expected protein complex stoichiometries. Equimolar stoichiometries for the HytA-E/FdhA and
584 MetFV protein complexes were expected based on SDS gel staining experiments in C.
585 autoethanogenum57 and the acetogen Moorella thermoacetica71, respectively. Expected
586 stoichiometries for other protein complexes in Fig. 3 and Supplementary Fig. 3 were based on
587 measured stoichiometries in E. coli K-12 (Complex Portal; www.ebi.ac.uk/complexportal) and
588 significant homology between complex member proteins in C. autoethanogenum and E. coli. All
589 depicted C. autoethanogenum protein complex members had NCBI protein-protein BLAST E-values<10⁻¹⁶
590 and scores>73 against respective E. coli K-12 proteins using non-redundant protein
591 sequences.
592
593 Generation of proteomaps. The distribution of proteome-wide protein concentrations among
594 functional gene classifications was visualised using proteomaps27. For this, the NCBI annotation of
595 sequence NC_022592.167 was used as the annotation genome for C. autoethanogenum, with
596 CAETHG_RS07860 removed and replaced with the carbon monoxide dehydrogenase genes named
597 CAETHG_RS07861 and CAETHG_RS07862 with initial IDs of CAETHG_1620 and 1621,
598 respectively. Functional categories were assigned to protein sequences with KO IDs51 using the
599 KEGG annotation tool BlastKOALA72. Since proteomaps require a tree-like hierarchy, proteins that
600 were automatically assigned to multiple functional categories were manually assigned to one bottom-level
601 category (Level 3 in Supplementary Table 4) based on their principal task. We also created
602 functional categories “C1 fixation/Wood-Ljungdahl Pathway” (Level 2/3), “Acetate & ethanol
603 synthesis” (Level 3), “Energy conservation” (Level 3), “Hydrogenases” (Level 3) and manually
604 assigned key acetogen proteins to these categories to reflect more accurately functional categories for
605 an acetogen. Proteins without designated KO IDs were manually assigned to the newly created
606 categories or grouped under “Not Included in Pathway or Brite” (Level 1) with Level 2 and 3 as “No
607 KO ID”. If BlastKOALA assigned multiple genes the same proposed gene/protein name, unique
608 numbers were added to names (e.g., pfkA, pfkA2). The final “treemap” defining the hierarchy for our
609 proteomaps is in Supplementary Table 4.
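
The naming rule mentioned above for duplicated gene/protein names (e.g., pfkA, pfkA2) can be sketched as a small helper. This is an illustration of one way to apply such a rule, not the procedure actually used: the function name is hypothetical, the first occurrence is assumed to keep the plain name, and later occurrences receive a running number starting at 2.

```python
from collections import Counter


def make_unique(names):
    """Append a running number to repeated names: pfkA, pfkA -> pfkA, pfkA2."""
    seen = Counter()
    unique = []
    for name in names:
        seen[name] += 1
        # The first occurrence keeps the plain name; later ones get a numeric suffix.
        unique.append(name if seen[name] == 1 else f"{name}{seen[name]}")
    return unique


# Hypothetical list of proposed names with two duplicates of pfkA.
labels = make_unique(["pfkA", "adhE", "pfkA", "pfkA"])
```

A limitation of this simple scheme is that it does not check whether a generated label such as pfkA2 already exists as a genuine name in the input, so a real annotation table would need that check as well.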
610 22 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.11.443690; this version posted May 12, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 611 Calculation of apparent in vivo catalytic rates of enzymes (kapp). We calculated apparent in vivo -1 26 612 catalytic rates of enzymes, denoted as kapp (s ) , as the ratio of specific flux rate (mmol/gDCW/h) 613 determined before18 and protein concentration (nmol/gDCW) quantified here for the same C. 614 autoethanogenum CO, syngas, and high-H2 CO cultures. Gene-protein-reaction (GPR) data of the 615 genome-scale metabolic model iCLAU78618 were manually curated to reflect most recent knowledge 616 and were used to link metabolic fluxes with catalysing enzymes. For reactions with multiple assigned 617 enzymes (i.e,. isoenzymes), the enzyme with the highest average ranking of its concentration across 618 the three cultures (Supplementary Table 3) was assumed to solely catalyse the flux. For enzyme 619 complexes, average of quantified subunit concentrations was used (standard deviation estimated using 620 error propagation). For the HytA-E hydrogenase, its measured protein concentration was split 621 between reactions “rxn08518_c0” (direct CO2 reduction with H2 in complex with FdhA) and 622 “leq000001_c0” (H2 oxidation solely by HytA-E) proportionally to flux for syngas and high-H2 CO 623 cultures. The resulting enzymes or enzyme complexes catalysing specific fluxes are shown in 624 Supplementary Table 5. Finally, we assumed each protein chain being catalytically active and only 625 calculated kapp values for metabolic reactions with a non-zero flux in at least one condition, specific 626 flux rate>0.1% of CO fixation flux in at least one condition, and label-free data with measured 627 concentration for the associated enzyme(s) in all conditions (Supplementary Table 5). 
Membrane 628 proteins were excluded from kapp calculations to avoid bias from potentially incomplete protein 629 extraction. This produced kapp values for 13 enzymes/complexes using anchor protein concentrations 630 and for 48 enzymes/complexes using label-free protein concentrations (Supplementary Table 5). 631 632 Determination of regulation level of metabolic fluxes. We used published flux and relative 633 proteomics data18 of the same cultures studied here to determine whether fluxes are regulated by 634 changing enzyme concentrations or their catalytic rates by considering metabolic fluxes with non-zero 635 specific flux rates in at least two conditions of CO, syngas, or high-H2 CO cultures. The same 636 manually curated GPRs and criteria for isoenzymes and protein complexes as described above for kapp 637 calculation were used to determine flux-enzyme pairs (Supplementary Table 6). We first used a two- 638 tailed two-sample equal variance Student’s t-test with FDR correction73 to determine fluxes with 23 bioRxiv preprint doi: https://doi.org/10.1101/2021.05.11.443690; this version posted May 12, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 639 significant changes between any two conditions (q-value<0.05). We then used the Student’s left-tailed 640 t-distribution with FDR to determine if the significant flux change for every flux was significantly 641 different from the change in respective enzyme expression between the same conditions 642 (Supplementary Table 6). Flux with a q-value<0.05 for the latter test was considered to be regulated at 643 post-translational level (e.g., by changing enzyme catalytic rate). 644 645 DATA AVAILABILITY 646 All data generated or analysed during this study are in the main text, supplementary information files, 647 or public databases. 
Raw PRM MS data have been deposited to Panorama at https://panoramaweb.org/Valgepea_Cauto_PRM.url (private reviewer account details: username: [email protected]; password: hSAwNUAL) with a ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) dataset identifier PXD025760. Raw DIA MS data have been deposited to the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository (ref. 69) with the dataset identifier PXD025732 (private reviewer account details: username: [email protected]; password: QdkJHXy0). PRM MS data with all Skyline processing settings can be viewed and downloaded from Panorama at https://panoramaweb.org/Valgepea_Cauto_PRM.url. DIA MS data with all Skyline processing settings can be viewed and downloaded from Panorama at https://panoramaweb.org/Valgepea_Cauto_Anchors.url for anchor protein absolute quantification and at https://panoramaweb.org/Valgepea_Cauto_LF.url for estimating proteome-wide absolute protein concentrations (private reviewer account details: username: [email protected]; password: hSAwNUAL). Any other relevant data are available from the corresponding author on reasonable request.

REFERENCES
1. Liew, F. et al. Gas Fermentation—A flexible platform for commercial scale production of low-carbon-fuels and chemicals from waste and renewable feedstocks. Front. Microbiol. 7, 694 (2016).
2. Claassens, N. J., Sousa, D. Z., dos Santos, V. A. P. M., de Vos, W. M. & van der Oost, J. Harnessing the power of microbial autotrophy. Nat. Rev. Microbiol. 14, 692–706 (2016).
3. Fackler, N. et al. Stepping on the gas to a circular economy: accelerating development of carbon-negative chemical production from gas fermentation. Annu. Rev. Chem. Biomol. Eng. 12, 1 (2021).
4. Ragsdale, S. W. & Pierce, E. Acetogenesis and the Wood–Ljungdahl pathway of CO2 fixation. Biochim. Biophys. Acta 1784, 1873–1898 (2008).
5. Wood, H. G. Life with CO or CO2 and H2 as a source of carbon and energy. FASEB J. 5, 156–163 (1991).
6. Drake, H. L., Küsel, K. & Matthies, C. In The Prokaryotes Ch. Acetogenic Prokaryotes. 354–420 (2006).
7. Fuchs, G. Alternative pathways of carbon dioxide fixation: insights into the early evolution of life? Annu. Rev. Microbiol. 65, 631–658 (2011).
8. Fast, A. G. & Papoutsakis, E. T. Stoichiometric and energetic analyses of non-photosynthetic CO2-fixation pathways to support synthetic biology strategies for production of fuels and chemicals. Curr. Opin. Chem. Eng. 1, 380–395 (2012).
9. Cotton, C. A., Edlich-Muth, C. & Bar-Even, A. Reinforcing carbon fixation: CO2 reduction replacing and supporting carboxylation. Curr. Opin. Biotechnol. 49, 49–56 (2018).
10. Köpke, M. & Simpson, S. D. Pollution to products: recycling of ‘above ground’ carbon by gas fermentation. Curr. Opin. Biotechnol. 65, 180–189 (2020).
11. Russell, M. J. & Martin, W. The rocky roots of the acetyl-CoA pathway. Trends Biochem. Sci. 29, 358–363 (2004).
12. Weiss, M. C. et al. The physiology and habitat of the last universal common ancestor. Nat. Microbiol. 1, 16116 (2016).
13. Varma, S. J., Muchowska, K. B., Chatelain, P. & Moran, J. Native iron reduces CO2 to intermediates and end-products of the acetyl-CoA pathway. Nat. Ecol. Evol. 2, 1019–1024 (2018).
14. Ljungdahl, L. G. A life with acetogens, thermophiles, and cellulolytic anaerobes. Annu. Rev. Microbiol. 63, 1–25 (2009).
15. Ragsdale, S. W. Enzymology of the Wood-Ljungdahl pathway of acetogenesis. Ann. N. Y. Acad. Sci. 1125, 129–136 (2008).
16. Schuchmann, K. & Müller, V. Autotrophy at the thermodynamic limit of life: a model for energy conservation in acetogenic bacteria. Nat. Rev. Microbiol. 12, 809–821 (2014).
17. Molitor, B., Marcellin, E. & Angenent, L. T. Overcoming the energetic limitations of syngas fermentation. Curr. Opin. Chem. Biol. 41, 84–92 (2017).
18. Valgepea, K. et al. H2 drives metabolic rearrangements in gas-fermenting Clostridium autoethanogenum. Biotechnol. Biofuels 11, 55 (2018).
19. Valgepea, K. et al. Maintenance of ATP homeostasis triggers metabolic shifts in gas-fermenting acetogens. Cell Syst. 4, 505–515.e5 (2017).
20. Mahamkali, V. et al. Redox controls metabolic robustness in the gas-fermenting acetogen Clostridium autoethanogenum. Proc. Natl. Acad. Sci. U. S. A. 117, 13168–13175 (2020).
21. Richter, H. et al. Ethanol production in syngas-fermenting Clostridium ljungdahlii is controlled by thermodynamics rather than by enzyme expression. Energy Environ. Sci. 9, 2392–2399 (2016).
22. de Souza Pinto Lemgruber, R. et al. A TetR-family protein (CAETHG_0459) activates transcription from a new promoter motif associated with essential genes for autotrophic growth in acetogens. Front. Microbiol. 10, 2549 (2019).
23. Song, Y. et al. Determination of the genome and primary transcriptome of syngas fermenting Eubacterium limosum ATCC 8486. Sci. Rep. 7, 13694 (2017).
24. Al-Bassam, M. M. et al. Optimisation of carbon and energy utilisation through differential translational efficiency. Nat. Commun. 9, 4474 (2018).
25. Song, Y. et al. Genome-scale analysis of syngas fermenting acetogenic bacteria reveals the translational regulation for its autotrophic growth. BMC Genomics 19, 837 (2018).
26. Valgepea, K., Adamberg, K., Seiman, A. & Vilu, R. Escherichia coli achieves faster growth by increasing catalytic and translation rates of proteins. Mol. Biosyst. 9, 2344–2358 (2013).
27. Liebermeister, W. et al. Visual account of protein investment in cellular functions. Proc. Natl. Acad. Sci. U. S. A. 111, 8488–8493 (2014).
28. Schubert, O. T. et al. Absolute proteome composition and dynamics during dormancy and resuscitation of Mycobacterium tuberculosis. Cell Host Microbe 18, 96–108 (2015).
29. Davidi, D. et al. Global characterisation of in vivo enzyme catalytic rates and their correspondence to in vitro kcat measurements. Proc. Natl. Acad. Sci. U. S. A. 104, 20167–20172 (2016).
30. Calderón-Celis, F., Encinar, J. R. & Sanz-Medel, A. Standardization approaches in absolute quantitative proteomics with mass spectrometry. Mass Spectrom. Rev. 37, 715–737 (2017).
31. Maaß, S. & Becher, D. Methods and applications of absolute protein quantification in microbial systems. J. Proteomics 136, 222–233 (2016).
32. Ludwig, C., Claassen, M., Schmidt, A. & Aebersold, R. Estimation of absolute protein quantities of unlabeled samples by selected reaction monitoring mass spectrometry. Mol. Cell. Proteomics 11, M111.013987 (2012).
33. Blein-Nicolas, M. & Zivy, M. Thousand and one ways to quantify and compare protein abundances in label-free bottom-up proteomics. Biochim. Biophys. Acta 1864, 883–895 (2016).
34. Ahrné, E., Molzahn, L., Glatter, T. & Schmidt, A. Critical assessment of proteome-wide label-free absolute abundance estimation strategies. Proteomics 17, 2567–2578 (2013).
35. Schmidt, A. et al. Absolute quantification of microbial proteomes at different states by directed mass spectrometry. Mol. Syst. Biol. 7, 510 (2011).
36. Maier, T. et al. Quantification of mRNA and protein and integration with protein turnover in a bacterium. Mol. Syst. Biol. 7, 511 (2011).
37. Marguerat, S. et al. Quantitative analysis of fission yeast transcriptomes and proteomes in proliferating and quiescent cells. Cell 151, 671–683 (2012).
38. Zeiler, M., Moser, M. & Mann, M. Copy number analysis of the murine platelet proteome spanning the complete abundance range. Mol. Cell. Proteomics 13, 3435–3445 (2014).
39. Schmidt, A. et al. The quantitative and condition-dependent Escherichia coli proteome. Nat. Biotechnol. 34, 104–110 (2015).
40. Beck, M. et al. The quantitative proteome of a human cell line. Mol. Syst. Biol. 7, 549 (2011).
41. Malmström, J. et al. Proteome-wide cellular protein concentrations of the human pathogen Leptospira interrogans. Nature 460, 762–765 (2009).
42. Brun, V. et al. Isotope-labeled protein standards: toward absolute quantitative proteomics. Mol. Cell. Proteomics 6, 2139–2149 (2007).
43. Shuford, C. M. et al. Absolute protein quantification by mass spectrometry: not as simple as advertised. Anal. Chem. 89, 7406–7415 (2017).
44. Takemori, N. et al. High-throughput synthesis of stable isotope-labeled transmembrane proteins for targeted transmembrane proteomics using a wheat germ cell-free protein synthesis system. Mol. Biosyst. 11, 361–365 (2015).
45. Takemori, N. et al. MEERCAT: multiplexed efficient cell free expression of recombinant QconCATs for large scale absolute proteome quantification. Mol. Cell. Proteomics 16, 2169–2183 (2017).
46. Gillet, L. C. et al. Targeted data extraction of the MS/MS spectra generated by data-independent acquisition: a new concept for consistent and accurate proteome analysis. Mol. Cell. Proteomics 11, O111.016717 (2012).
47. Reiter, L. et al. mProphet: automated data processing and statistical validation for large-scale SRM experiments. Nat. Methods 8, 430–435 (2011).
48. MacLean, B. et al. Skyline: an open source document editor for creating and analysing targeted proteomics experiments. Bioinformatics 26, 966–968 (2010).
49. Rosenberger, G., Ludwig, C., Röst, H. L., Aebersold, R. & Malmström, L. aLFQ: an R-package for estimating absolute protein quantities from label-free LC-MS/MS proteomics data. Bioinformatics 30, 2511–2513 (2014).
50. Lahtvee, P.-J. et al. Absolute quantification of protein and mRNA abundances demonstrate variability in gene-specific translation efficiency in yeast. Cell Syst. 4, 495–504.e5 (2017).
51. Kanehisa, M., Sato, Y., Kawashima, M., Furumichi, M. & Tanabe, M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 44, D457–D462 (2016).
52. Peebo, K. et al. Proteome reallocation in Escherichia coli with increasing specific growth rate. Mol. Biosyst. 11, 1184–1193 (2015).
53. Lemaire, O. N. & Wagner, T. Gas channel rerouting in a primordial enzyme: Structural insights of the carbon-monoxide dehydrogenase/acetyl-CoA synthase complex from the acetogen Clostridium autoethanogenum. Biochim. Biophys. Acta 1862, 148330 (2021).
54. Liew, F. et al. Insights into CO2 fixation pathway of Clostridium autoethanogenum by targeted mutagenesis. mBio 7, e00427-16 (2016).
55. Liew, F. et al. Metabolic engineering of Clostridium autoethanogenum for selective alcohol production. Metab. Eng. 40, 104–114 (2017).
56. Marcellin, E. et al. Low carbon fuels and commodity chemicals from waste gases – systematic approach to understand energy metabolism in a model acetogen. Green Chem. 18, 3020–3028 (2016).
57. Wang, S. et al. NADP-specific electron-bifurcating [FeFe]-hydrogenase in a functional complex with formate dehydrogenase in Clostridium autoethanogenum grown on CO. J. Bacteriol. 195, 4373–4386 (2013).
58. Mock, J. et al. Energy conservation associated with ethanol formation from H2 and CO2 in Clostridium autoethanogenum involving electron bifurcation. J. Bacteriol. 197, 2965–2980 (2015).
59. Maia, L. B., Fonseca, L., Moura, I. & Moura, J. J. G. Reduction of carbon dioxide by a molybdenum-containing formate dehydrogenase: a kinetic and mechanistic study. J. Am. Chem. Soc. 138, 8834–8846 (2016).
60. Schuchmann, K. & Müller, V. Direct and reversible hydrogenation of CO2 to formate by a bacterial carbon dioxide reductase. Science 342, 1382–1385 (2013).
61. Bar-Even, A. et al. The moderately efficient enzyme: evolutionary and physicochemical trends shaping enzyme parameters. Biochemistry 50, 4402–4410 (2011).
62. Song, Y. et al. Functional cooperation of the glycine synthase-reductase and Wood–Ljungdahl pathways for autotrophic growth of Clostridium drakei. Proc. Natl. Acad. Sci. U. S. A. 117, 7516–7523 (2020).
63. Tan, Y., Liu, J., Liu, Z. & Li, F. Characterization of two novel butanol dehydrogenases involved in butanol degradation in syngas-utilising bacterium Clostridium ljungdahlii DSM 13528. J. Basic Microbiol. 54, 996–1004 (2014).
64. Davidi, D. & Milo, R. Lessons on enzyme kinetics from quantitative proteomics. Curr. Opin. Biotechnol. 46, 81–89 (2017).
65. Nilsson, A., Nielsen, J. & Palsson, B. Ø. Metabolic models of protein allocation call for the kinetome. Cell Syst. 5, 538–541 (2017).
66. Heckmann, D. et al. Machine learning applied to enzyme turnover numbers reveals protein structural correlates and improves metabolic models. Nat. Commun. 9, 5252 (2018).
67. Brown, S. D. et al. Comparison of single-molecule sequencing and hybrid approaches for finishing the genome of Clostridium autoethanogenum and analysis of CRISPR systems in industrial relevant Clostridia. Biotechnol. Biofuels 7, 40 (2014).
68. Wiśniewski, J. R., Zougman, A., Nagaraj, N. & Mann, M. Universal sample preparation method for proteome analysis. Nat. Methods 6, 359–362 (2009).
69. Vizcaíno, J. A. et al. The Proteomics Identifications (PRIDE) database and associated tools: status in 2013. Nucleic Acids Res. 41, D1063–D1069 (2012).
70. Schwanhäusser, B. et al. Global quantification of mammalian gene expression control. Nature 473, 337–342 (2011).
71. Mock, J., Wang, S., Huang, H., Kahnt, J. & Thauer, R. K. Evidence for a hexaheteromeric methylenetetrahydrofolate reductase in Moorella thermoacetica. J. Bacteriol. 196, 3303–3314 (2014).
72. Kanehisa, M., Sato, Y. & Morishima, K. BlastKOALA and GhostKOALA: KEGG tools for functional characterisation of genome and metagenome sequences. J. Mol. Biol. 428, 726–731 (2016).
73. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289–300 (1995).

ACKNOWLEDGEMENTS
We thank Jörg Bernhardt for help with proteomaps, Tim McCubbin for scripts, Andrus Seiman for statistics, and Olivier Lemaire for valuable discussions. This work was funded by the Australian Research Council (ARC LP140100213) in collaboration with LanzaTech. We thank the following investors in LanzaTech's technology: Sir Stephen Tindall, Khosla Ventures, Qiming Venture Partners, Softbank China, the Malaysian Life Sciences Capital Fund, Mitsui, Primetals, CICC Growth Capital Fund I, L.P. and the New Zealand Superannuation Fund. The research utilised equipment and support provided by the Queensland node of Metabolomics Australia, an initiative of the Australian Government being conducted as part of the NCRIS National Research Infrastructure for Australia. There was no funding support from the European Union for the experimental part of the study. However, K.V. also acknowledges support from the European Union’s Horizon 2020 research and innovation programme under grant agreement N810755.

AUTHOR CONTRIBUTIONS
Conceptualisation, K.V., C.L., M.K., L.K.N., and E.M.; Methodology, K.V., G.T., N.T., A.T., C.L., A.P.M., R.T., and E.M.; Formal Analysis, K.V. and G.T.; Investigation, K.V., G.T., N.T., and A.T.; Resources, N.T., A.T., M.K., S.D.S., L.K.N., and E.M.; Writing – Original Draft, K.V. and E.M.; Writing – Review & Editing, K.V., N.T., C.L., A.P.M., R.T., M.K., L.K.N., and E.M.; Supervision, M.K., L.K.N., and E.M.; Project Administration, R.T. and E.M.; Funding Acquisition, M.K., S.D.S., L.K.N., and E.M.

COMPETING INTERESTS
LanzaTech has interest in commercial gas fermentation with C. autoethanogenum. A.P.M., R.T., M.K., and S.D.S. are employees of LanzaTech.

MATERIALS & CORRESPONDENCE
Correspondence and requests for materials should be addressed to E.M.

FIGURE LEGENDS
Fig. 1 Absolute proteome quantification framework in C. autoethanogenum. Absolute proteome quantification in light (no stable-isotope labelled [SIL] substrates) autotrophic C. autoethanogenum chemostat cultures was built on 19 synthetic heavy SIL-protein spike-in standards and data-independent acquisition (DIA) mass spectrometry (MS) analysis. Culture samples with SIL-protein spike-ins and samples for the DIA spectral library were analysed by DIA MS. Subsequent stringent data analysis allowed quantification of intracellular concentrations for 16 key C. autoethanogenum proteins using light-to-heavy ratios between endogenous and spike-in DIA MS intensities. These 16 key proteins were further used as anchor proteins for label-free estimation of ~1,043 protein concentrations through establishing a linear correlation between protein concentrations and their measured MS intensities. Some parts created with BioRender.com.

Fig. 2 Label-free estimation of proteome-wide protein concentrations. a Correlation of peptide mass spectrometry (MS) feature intensities between biological replicate cultures of the three gas mixtures. b Linear correlation between anchor protein concentrations and their measured MS intensities for one syngas culture. gDCW, gram of dry cell weight; aLFQ, absolute label-free quantification.
c Errors of different label-free quantification models for the linear fit between anchor protein concentrations and their measured MS intensities determined by bootstrapping using the aLFQ R package (ref. 49) for one syngas culture. CV-MFE, cross-validated mean fold-error. d Label-free quantification error of optimal model (orange) and total proteome mass (blue) across samples. Error bars denote 95% CI.

Fig. 3 Strong correlation between protein concentrations and expected stoichiometries for equimolar protein complexes. Grey dotted lines denote the average 1.5-fold cross-validated mean fold-error (CV-MFE) of label-free protein concentrations. Label-free protein concentrations are plotted, except for the HytA-FdhA complex, which was quantified using stable-isotope labelled protein spike-ins. Data points of the same colour represent gas mixtures. See Methods for details on expected protein complex stoichiometries. See Supplementary Table S3 for gene/protein ID, proposed name, description, and label-free data. See Table 1 for HytA-FdhA data. gDCW, gram of dry cell weight.

Fig. 4 Proteomaps uncover global proteome allocation. Left proteomap shows proteome allocation amongst functional gene classification categories (KEGG Orthology identifiers [KO IDs]; ref. 51) at level two of the four-level “treemap” hierarchy structure (Supplementary Table S4). Right proteomap shows proteome allocation at the level of single proteins (level four of “treemap”). See Supplementary Fig. S5 for proteomaps of levels one and three of “treemap”. Area of the tile is proportional to protein concentration. Colours denote level one categories of “treemap”.
Proteomaps visualise average concentrations of syngas cultures while category percentages are average of three gas mixtures (shown for categories with a fraction >5%). See Supplementary Table S3 for gene/protein ID, proposed name, description, and label-free protein concentrations.

Fig. 5 Quantitative systems-level view of acetogen central metabolism. Enzyme concentrations (nmol/gDCW), apparent in vivo catalytic rates of enzymes (kapp; s⁻¹), and metabolic flux rates (mmol/gDCW/h) are shown for C. autoethanogenum steady-state chemostat cultures grown on three gas mixtures. See dashed inset for bar chart and heatmap details. Enzyme concentration and kapp data are average of biological replicates. Proteins forming a complex are highlighted with non-black borders (FdhA forms a complex with HytA–E for direct CO2 reduction with H2; CooS1 is expected to form a complex with CooS1a and b as they are encoded from the same operon). For reactions with isoenzymes, kapp is for the enzyme with the highest concentration ranking (top location on enzyme heatmap), see Methods for details. Flux data from ref. 18 are average of biological replicates and error bars denote standard deviation. Arrows show direction of calculated fluxes; red arrow denotes uptake or secretion. Gene/protein IDs right of enzyme concentration heatmaps are preceded with CAETHG_RS and red font denotes concentrations determined using stable-isotope labelled (SIL) protein spike-in standards (i.e., anchor proteins). Asterisk denotes data for redox-consuming CO2 reduction to formate solely by FdhA without the use of H2 during growth on CO. ᵃBifunctional acetaldehyde/alcohol dehydrogenase (acetyl-CoA→ethanol); ᵇFlux into PEP from OAA and pyruvate is merged and kapp is for PEPCK. See Supplementary Table S3 for gene/protein ID, proposed name, description, and label-free protein concentrations.
See Table 1 for anchor protein concentrations. See Supplementary Table S5 for kapp and flux data. See ref. 18 for cofactors of reactions and metabolite abbreviations. gDCW, gram of dry cell weight; NQ, not quantified.

Fig. 6 Regulatory principles of apparent in vivo catalytic rates of enzymes (kapp) and metabolic flux throughput. a Enzymes catalysing higher metabolic flux rates have both higher concentrations and higher kapps. Yellow and blue denote high and low values, respectively. Kendall’s τ correlations with significance p-values between respective pairs are shown below heatmap. See Supplementary Table S5 for flux rate, enzyme concentration, and kapp data, and for description of reaction names (Rxn name) and gene-protein-reaction (GPR) associations. b Control of metabolic flux throughput through kapp changes for high flux pathways. See also Fig. 5. gDCW, gram of dry cell weight.

[Fig. 1 image. Workflow panels: 19 heavy SIL-proteins of C. autoethanogenum; light and heavy DIA-MS peptides (Thermo Q-Exactive HF-X); data analysis (Skyline & mProphet); absolute quantification (SILP-based concentrations for 16 anchor proteins; label-free concentrations for ~1,043 proteins); DIA spectral library from light cell lysates and fractions.]

[Fig. 2 image. Panels: a Correlation of peptide features (Pearson correlation, R, for CO, syngas, and high-H2 CO cultures); b Anchor protein calibration (Log10 anchor protein concentration, nmol/gDCW, against Log10 aLFQ MS intensity; fit y = 0.94x - 6.15, R² = 0.96); c aLFQ optimal model determination (CV-MFE against number of peptides and fragment ions: top1 to top5, all, iBAQ); d Label-free quantification accuracy (CV-MFE and sum of label-free protein concentrations, µg, for CO, syngas, and high-H2 CO cultures).]

[Fig. 3 image. Scatter plot of protein concentration of subunit 2 against subunit 1 (nmol/gDCW, log scale) with 1.50x, 1x, and 0.67x guide lines. Labelled complexes: ribosome (30S/50S), MetFV, anthranilate synthase, HytA-FdhA, glutamate synthase, IscS-IscU, SecD-SecF, CarA-CarB, 3-isopropylmalate dehydratase, InfB-InfC, GyrA-GyrB, FtsEX ABC cell division, citrate lyase.]

[Fig. 4 image. Proteomaps with category labels (e.g. C1 fixation, energy metabolism, carbohydrate metabolism, amino acid metabolism, nucleotide metabolism, translation, folding, sorting and degradation) and single-protein tiles.]

[Fig. 5 image. Metabolic map with enzyme concentration heatmaps, flux and kapp bar charts. Inset key: Enzyme/protein, protein concentration (nmol/gDCW); Flux, specific flux rate (mmol/gDCW/h); kapp, apparent in vivo catalytic rate of enzyme (s⁻¹); blue, CO; red, syngas; green, high-H2 CO.]
a GPR b Rxn name Flux Enzyme kapp CAETHG_RS… rxn07189 14775 rxn00690 07850 10 49 rxn01211 07845 60 HytA-E high-H2 CO rxn00907 07840 syngas 300 rxn00910 07830+35 rxn06149 07805 40 rxn12080 07800 (s -¹ ) rxn00225 16495 rxn00173 16490 app rxn08518 13725+(45-70) k 20 200 rxn00543 08920 CO leq000004 00440 72 leq000001 13745-70 400 100 leq000002 07665 rxn00103 13725 81 rxn05938 14890 30 Pta rxn00001 15405 high-H2 CO rxn00250 07735 89 (s -¹ ) 30 rxn00184 11665 20 81 syngas rxn00097 09335 app rxn05289 09165 k CO rxn00247 13335 10 rxn01106 08500 10 rxn01100 08510 Protein concentration (nmol/gDCW) rxn00459 08495 rxn00781 16815 150 rxn00003 01930 rxn00260 01020 high-H CO rxn00903 14905 AOR1 2 Hydrogenase rxn00747 08505 10 (s -¹ ) syngas rxn02187 00580 290 rxn03068 00580 app CO rxn00898 00585 k 211 09960+65 5 rxn06673 Acetate rxn00187 09895 270 rxn02112 01830 production rxn01643 06525 rxn00611 16355 15 rxn01301 15215 rxn00974 13540 rxn00256 13535 Ethanol MetF 233 280 rxn01388 13540 10 production rxn00505 13545 (s -¹ ) rxn01069 05835 syngas 254 high-H2 CO rxn00770 09805 app rxn01974 15555 k 5 CO rxn01972 18870 Wood-Ljungdahl rxn03030 06545 rxn03086 06510 pathway rxn00248 12200 0 rxn00799 09220 0 2 4 6 8 10 12 τ=0.56 p=5×10-9 τ=0.45 p=2×10-6 Specific flux rate (mmol/gDCW/h)
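The figure legends give enzyme concentration in nmol/gDCW, specific flux rate in mmol/gDCW/h and kapp in s^-1. A minimal sketch of how these three quantities relate, assuming kapp is taken as flux divided by enzyme concentration with one catalytic unit per quantified protein (the function name and the input values are hypothetical and are not data from this study):

```python
def apparent_kcat(flux_mmol_per_gdcw_h: float, enzyme_nmol_per_gdcw: float) -> float:
    """Return an apparent in vivo catalytic rate in 1/s.

    Assumption (not taken from the figures): kapp = flux / enzyme concentration,
    with the whole flux carried by the quantified enzyme.
    """
    if enzyme_nmol_per_gdcw <= 0:
        raise ValueError("enzyme concentration must be positive")
    # mmol/gDCW/h -> nmol/gDCW/s: 1 mmol = 1e6 nmol, 1 h = 3600 s
    flux_nmol_per_gdcw_s = flux_mmol_per_gdcw_h * 1e6 / 3600.0
    return flux_nmol_per_gdcw_s / enzyme_nmol_per_gdcw


# Hypothetical example values chosen only to make the arithmetic easy to follow.
example_kapp = apparent_kcat(flux_mmol_per_gdcw_h=36.0, enzyme_nmol_per_gdcw=1000.0)
print(f"kapp = {example_kapp:.1f} 1/s")
```

For multi-subunit complexes such as HytA-E, the choice of which subunit concentration to divide by changes the result; that choice is not recoverable from the figure residue and is left open in this sketch.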