ORIGINAL RESEARCH published: 21 August 2020 doi: 10.3389/fbioe.2020.00772
Development of a Genome-Scale Metabolic Model of Clostridium thermocellum and Its Applications for Integration of Multi-Omics Datasets and Computational Strain Design
Sergio Garcia 1,2, R. Adam Thompson 2,3,4†, Richard J. Giannone 2,5, Satyakam Dash 2,6, 2,6 1,2,3,4 Edited by: Costas D. Maranas and Cong T. Trinh * Young-Mo Kim, 1 2 Pacific Northwest National Laboratory Department of Chemical and Biomolecular Engineering, The University of Tennessee, Knoxville, TN, United States, Center 3 (DOE), United States for Bioenergy Innovation, Oak Ridge National Laboratory, Oak Ridge, TN, United States, Bredesen Center for Interdisciplinary Research and Graduate Education, The University of Tennessee, Knoxville, TN, United States, 4 Oak Ridge Reviewed by: National Laboratory, Oak Ridge, TN, United States, 5 Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, Karsten Zengler, TN, United States, 6 Department of Chemical Engineering, The Pennsylvania State University, University Park, PA, University of California, San Diego, United States United States Esteban Marcellin, The University of Queensland, Solving environmental and social challenges such as climate change requires a shift Australia from our current non-renewable manufacturing model to a sustainable bioeconomy. *Correspondence: To lower carbon emissions in the production of fuels and chemicals, plant biomass Cong T. Trinh [email protected] feedstocks can replace petroleum using microorganisms as biocatalysts. The anaerobic thermophile Clostridium thermocellum is a promising bacterium for bioconversion due †Present address: R. Adam Thompson, to its capability to efficiently degrade lignocellulosic biomass. However, the complex Quantitative Translational metabolism of C. thermocellum is not fully understood, hindering metabolic engineering Pharmacology, DMPK-BA, Abbvie to achieve high titers, rates, and yields of targeted molecules. In this study, we developed Inc., North Chicago, IL, United States an updated genome-scale metabolic model of C. thermocellum that accounts for recent Specialty section: metabolic findings, has improved prediction accuracy, and is standard-conformant to This article was submitted to ensure easy reproducibility. We illustrated two applications of the developed model. Synthetic Biology, a section of the journal We first formulated a multi-omics integration protocol and used it to understand Frontiers in Bioengineering and redox metabolism and potential bottlenecks in biofuel (e.g., ethanol) production in Biotechnology C. thermocellum. Second, we used the metabolic model to design modular cells for Received: 02 April 2020 Accepted: 18 June 2020 efficient production of alcohols and esters with broad applications as flavors, fragrances, Published: 21 August 2020 solvents, and fuels. The proposed designs not only feature intuitive push-and-pull Citation: metabolic engineering strategies, but also present novel manipulations around important Garcia S, Thompson RA, central metabolic branch-points. We anticipate the developed genome-scale metabolic Giannone RJ, Dash S, Maranas CD and Trinh CT (2020) Development of a model will provide a useful tool for system analysis of C. thermocellum metabolism to Genome-Scale Metabolic Model of fundamentally understand its physiology and guide metabolic engineering strategies to Clostridium thermocellum and Its Applications for Integration of rapidly generate modular production strains for effective biosynthesis of biofuels and Multi-Omics Datasets and biochemicals from lignocellulosic biomass. Computational Strain Design. Front. Bioeng. Biotechnol. 8:772. Keywords: Clostridium thermocellum, biofuels, genome-scale model, metabolic model, omics integration, doi: 10.3389/fbioe.2020.00772 modular cell design, ModCell
Frontiers in Bioengineering and Biotechnology | www.frontiersin.org 1 August 2020 | Volume 8 | Article 772 Garcia et al. C. thermocellum Model and Its Applications
1. INTRODUCTION ethanol production and carbon recovery but reduced amino acid formation, confirming the role of the malate shunt as an NADPH Global oil reserves will be soon depleted (Shafiee and Topal, source in C. thermocellum. 2009), and climate change could become a major driver Sulfur metabolism also plays a key role in redox metabolism of civil conflict (Hsiang et al., 2011). These challenges to of C. thermocellum and has been investigated for its role in security and the environment need to be addressed by ethanol production. Sulfate, a component of C. thermocellum replacing our current non-renewable production of energy media, serves as an electron acceptor, which is capable of and materials for a renewable and carbon neutral approach oxidizing sulfate to sulfite and then sulfide. Thompson et al. (Ragauskas et al., 2006). The gram-positive, thermophilic, (2015) demonstrated that the strain hydG ech pfl could not cellulolytic, strict anaerobe C. thermocellum is capable of efficient grow in a conventional defined medium due to its inability to degradation of lignocellulosic biomass to produce biofuels and secrete hydrogen or formate, but was able to rescue growth biomaterial precursors, making this organism an ideal candidate by sulfate supplementation to the culture medium. More for consolidated bioprocessing (CBP), where production of recently, Biswas et al. (2017) reported an increase in final lignocellulosic enzymes, saccharification, and fermentation take sulfide concentration and over-expression of the associated place in a single step (Olson et al., 2012). However, its complex sulfate uptake and reduction pathway in the hydG strain, and poorly understood metabolism remains the main roadblock but did not observe a significant difference in final sulfide to achieve industrially competitive titers, rates, and yields of concentration in hydG ech. Remarkably, neither of the strains biofuels such as ethanol (Tian et al., 2016) and isobutanol consumed cysteine from the medium, unlike the wild-type. (Lin et al., 2015). Sulfide can be converted to cysteine by CYSS (cysteine synthase) For the past decade, significant efforts have been dedicated or homocysteine and then methionine by SHSL2 (succinyl- to characterize and manipulate the central metabolism of homoserine succinate lyase) and METS (methionine synthase), C. thermocellum, due to increasing interest in developing but the connection between the cessation of cysteine uptake and this organism as a CBP manufacturing platform for biofuels sulfate metabolism remains unclear. production (Akinosho et al., 2014). C. thermocellum possesses Overall, the complex interactions of C. thermocellum atypical central metabolism, characterized by the important metabolic pathways remain challenging to understand and roles of pyrophosphate and ferredoxin (Zhou et al., 2013), engineer with conventional methods, and hence require a which makes redirection of both carbon and electron flows quantitative systems biology approach to decipher. To this for biofuel production challenging to achieve. Specifically, end, several genome-scale metabolic models (GSMs) of the metabolic network of C. thermocellum contains various C. thermocellum have been developed. The first GSM, named reactions to regulate intracellular concentration levels of NADH, iSR432, was constructed for the strain ATCC27405 and applied NADPH, and reduced ferredoxin. These cofactors are used as to identify gene deletion strategies for high ethanol yield electron donors with high specificity throughout metabolism. (Roberts et al., 2010). This model was then further curated To maintain redox balance, C. thermocellum also possesses into iCth446 (Dash et al., 2017). More recently, Thompson several hydrogenases to oxidize these reduced cofactors to et al. developed the iAT601 genome-scale model (Thompson molecular hydrogen that is secreted by the cell. Removal of these et al., 2016) for the strain DSM1313, which is genetically hydrogenases through deletion of ech (encoding the ferredoxin- tractable (Argyros et al., 2011). The iAT601 model was used dependent hydrogenase, ECH) and hydG (associated with the to identify genetic manipulations for high ethanol, isobutanol, bifurcating hydrogenase, BIF, and bidirectional hydrogenase, and hydrogen production (Thompson et al., 2016), and to H2ASE_syn) was successfully applied to increase ethanol yield by understand growth cessation prior to substrate depletion electron rerouting (Biswas et al., 2015). Thompson et al. (2015) observed under high-substrate loading fermentations that characterized the hydG ech strain in depth by flux analysis simulate industrial conditions (Thompson and Trinh, 2017). In of its core metabolism, concluding that the major driver for addition to these core and genome-scale steady-state metabolic ethanol production was redox rather than carbon balancing. In models, a kinetic model of central metabolism, k-ctherm118, particular, the conversion of reduced ferredoxin to NAD(P)H was recently developed and used to elucidate the mechanisms is likely the most rate limiting step. In a subsequent study, of nitrogen limitation and ethanol stress (Dash et al., 2017). Lo et al. (2017) over-expressed rnf (encoding the ferredoxin- Due to the biotechnological relevance of the Clostridium genus, NAD oxidoreductase, RNF) in the hydG ech strain that GSMs have also been developed for other species (Dash et al., is expected to enhance NADH supply, but did not achieve 2016), including C. acetobutylicum (Senger and Papoutsakis, improved ethanol yield. 2008; Salimi et al., 2010; McAnulty et al., 2012; Wallenius et al., In an attempt to redirect carbon and electron flows for 2013; Dash et al., 2014; Yoo et al., 2015; Lee and Trinh, 2019), enhanced ethanol production, Deng et al. (2013) manipulated C. beijerinckii (Milne et al., 2011), C. butyricum (Serrano- the pyruvate node and malate shunt of C. thermocellum. By Bermúdez et al., 2017), C. cellulolyticum (Salimi et al., 2010), and converting phosphoenolpyruvate (pep) to oxaloacetate (oaa) and C. ljungdahlii (Nagarajan et al., 2013). then to pyruvate (pyr), this shunt can interchange one mole In this study, we developed an updated genome-scale of NADPH with one mol of NADH generated from glycolysis. metabolic model of C. thermocellum, named iCBI655, with Interestingly, the authors noted that replacement of the malate more comprehensive and precise metabolic coverage, enhanced shunt by alternative pathways not linked to NADPH increased prediction accuracy, and extensive documentation. This model
Frontiers in Bioengineering and Biotechnology | www.frontiersin.org 2 August 2020 | Volume 8 | Article 772 Garcia et al. C. thermocellum Model and Its Applications is a human-curated database that coherently represents all Transport and exchange reactions were updated to reflect the the available genetic, genomic, and metabolic knowledge export of amino acids and uptake of pyruvate as observed during of C. thermocellum from both experimental literature and fermentation experiments (Holwerda et al., 2014). bioinformatic predictions. Furthermore, the model can be In terms of specific reactions, oxaloaceate decarboxylase applied not only to enable metabolic flux simulation but also to was eliminated from the model in accordance with recent provide a framework to contextualize disparate datasets at the findings (Olson et al., 2017). The stoichiometries of pentose- system level. As a demonstration for the model application, we phospate reactions, including sedoheptulose 1,7-bisphosphate first developed a quantitative multi-omics integration protocol D-glyceraldehyde-3-phosphate-lyase (FBA3) and sedoheptulose and used it to fundamentally study redox metabolism and 1,7-bisphosphate ppi-dependent phosphofructokinase potential redox bottlenecks critical for production of biofuels (PFK3_ppi), were corrected (according to experimental (e.g., ethanol) in C. thermocellum. Furthermore, we used the evidence, Rydzak et al., 2012) from the previous model by model, in combination with the previously developed ModCell ensuring mass balance and avoiding lumping multiple steps tool (Garcia and Trinh, 2019c), to design modular (chassis) cells into one reaction. Transaldolase (TALA) was removed from the (Garcia and Trinh, 2019b) for alcohol and ester production. model due to lack of annotation for this gene in C. thermocellum. Several modifications were also performed in key bioenergetic 2. RESULTS reactions. The reactions catalyzed by membrane-bound enzymes, including inorganic diphosphatase (PPA) (Zhou 2.1. Development of an Upgraded et al., 2013) and membrane-bound ferredoxin-dependent C. thermocellum Genome-Scale Model hydrogenase (ECH) (Calusinska et al., 2010), were corrected Named iCBI655 to capture proton translocation. Furthermore, hydrogenase The iCBI655 model was developed using the published iAT601 reactions were updated to ensure ferredoxin association for model (Thompson et al., 2016) as a starting point. The all cases and remove those reactions that do not involve model improvements include updated metabolic pathways, ferredoxin and only use NAD(P)H as a cofactor, based on new annotation, and new extensive documentation. A our recent understanding of C. thermocellum metabolism detailed account of these changes can be found in the (Biswas et al., 2017). Gene-protein-reaction associations were Supplementary Datasheet 1. Here, we highlight the most updated to represent experimental knowledge. For instance, relevant modifications. the hydrogenases BIF (CLO11313_RS09060-09070) and H2ASE (CLO1313_RS12830, CLO1313_RS02840) require 2.1.1. Modeling Updates the maturase Hyd (CLO1313_RS07925, CLO1313_RS11095, To facilitate model usage and reduce human error, the identifiers CLO1313_RS12830) to be functional, and the maturase itself of reactions and metabolites were converted from KEGG into requires all of its subunits to operate, which enables accurate BiGG human-readable form (King et al., 2015). Additionally, representations of hydG deletion genotypes (Biswas et al., 2015). reaction and metabolite identifiers were linked to the modelSEED Two hypothetical reaction modifications were introduced to database (Henry et al., 2010) that enables analysis through ensure consistency with reported phenotypes. First, to enable the KBase web interface (Arkin et al., 2018). The gene growth without the need for succinate secretion, as observed identifiers and functional descriptions were updated to the most in experimental data (Supplementary Datasheet 2), the reaction current annotation (NCBI Reference Sequence: NC_017304.1). homoserine-O-trans-acetylase (HSERTA) was added to enable Metabolite formulas and charges from the modelSEED database methionine biosynthesis (essential for growth). Although this (Henry et al., 2010) were included in the model and reactions reaction is not currently known to be associated with any were systematically corrected for charge and mass balance by the gene in C. thermocellum, it is present as a gene-associated addition of protons and water. reaction in other Clostridium GSMs (Nagarajan et al., 2013). Next, the reaction deoxyribose-phosphate aldolase (DRPA) was 2.1.2. Metabolic Updates removed based on a systematic analysis (section 4.4) to ensure The automated construction process used in the previous model correct lethality prediction of the hydG ech pfl mutant introduced several inconsistencies that were corrected in the strain as well as the correct prediction of growth recovery current model. We removed reactions that were blocked and in this mutant by addition of external electron sinks such as non-gene-associated, apparently introduced during automated sulfate or ketoisovalerate (Table 1). The correct prediction of gap-filling. Two notable examples are (i) the blocked selenate hydG ech pfl-associated phenotypes is critical to successfully pathway which lacks experimental evidence (e.g., selenoproteins use the model for computational strain design (Long et al., have not been found in C. thermocellum), and (ii) blocked 2015; Ng et al., 2015; Maranas and Zomorrodi, 2016; Wang and reactions involving molecular oxygen (e.g., oxidation of Fe2+ Maranas, 2018; Garcia and Trinh, 2019a,b,c). to Fe3+) that are not possible in strict anaerobes like C. thermocellum. Furthermore, tRNA cycling reactions were unblocked by including tRNA in the biomass reaction (Reimers 2.2. Comparison of iCBI655 Against Other et al., 2017). Metabolite isomers were examined and consolidated Genome-Scale Models under the same metabolite identifier when possible, leading to We compared iCBI655 with the previous GSMs of the removal of duplicated reactions and the elimination of gaps. C. thermocellum and the highly-curated GSM iML1515 of
Frontiers in Bioengineering and Biotechnology | www.frontiersin.org 3 August 2020 | Volume 8 | Article 772 Garcia et al. C. thermocellum Model and Its Applications
TABLE 1 | Comparison of mutant growth rates predicted by iAT601 and iCBI655. NGAM has its own pseudo-reaction that hydrolyzes ATP at a rate tuned by the constraint parameters. Gene deletions Medium Percent of W.T. growth rate (%) To increase model prediction accuracy for various iAT601 iCBI655 Experiment conditions, we trained GAM and NGAM parameters of iCBI655 using an extensive dataset of 28 extracellular fluxes hydg MTC 100 100 73 (Supplementary Datasheet 2) measured during the growth hydg-ech MTC 85 85 67 phase under different reactor configurations, carbon sources, hydg-pta-ack MTC 100 100 48 and gene deletion mutants. This approach is based on the hydG-ech-pfl MTC 58 0 0 method used to train the iML1655 E. coli model (Monk et al., hydG-ech-pfl MTC + fumarate 377 726 0 2017). Remarkably, we observed highly linear trends under three hydG-ech-pfl MTC + sulfate 58 65 + different conditions, including chemostat reactor with cellobiose hydG-ech-pfl MTC + ketoisovalerate 97 101 + as a carbon source, chemostat reactor with cellulose as a carbon
Experimental values are taken from Thompson et al. (2015); for some mutants whose source, and batch reactor with either cellobiose or cellulose as growth recovery, not growth rate, was reported, they are presented with “+”. W.T., a carbon source (Figure 1A). This model training has led to wildtype; MTC, Medium for Thermophilic Clostridia. improved growth rate prediction of iCBI655 as compared to iAT601 that has previously been trained with only a smaller dataset (Figure 1B). Specifically, the iAT601 training dataset was TABLE 2 | Comparison of all genome-scale metabolic models of C. thermocellum and the latest E. coli model. limited to batch conditions; hence, the inaccurate predictions of iAT601 were observed for chemostat conditions (Figure 1C). iSR432 iCth446 iAT601 iCBI665 iML1515
Strain ATCC27405 ATCC27405 DSM1313 DSM1313 MG1655 2.4. Assessment of Model Quality and Genes 432 446 601 665 1515 Standard Compliance With Memote Metabolites 583 599 903 795 1877 The field of metabolic network modeling suffers from a lack Reactions 632 660 872 854 2712 of standard enforcement and quality control metrics that limit Blocked 39.2% 32.1% 40.8% 35.1% 9.8% model reproducibility and applicability. To address this issue, reactions Lieven et al. (2020) recently developed the Memote framework Reference Roberts Dash et al., Thompson This study Monk that systematically tests for standards and best practices in GSMs. et al., 2010 2017 et al., 2016 et al., 2017 We applied Memote to both the iCBI655 and E. coli iML1515 models for comparison (Figure 1D). This analysis produced five independent scores that assess model quality. The consistency score measures basic biochemical requirements, such as mass the extensively studied bacterium Escherichia coli (Table 2). and charge balance of metabolic reactions, and it was near 100% The increased number of genes in iCBI655 with respect to for both models. Additionally, the different annotation scores iAT601 cover a variety of functions, including hydrogenase quantify how many elements in the model contain relevant chaperones, cellulosome and cellulase, ATP synthase, and metadata. More specifically, the systems biology ontology (SBO) transporters. Remarkably, iCBI655 has a smaller percentage of annotation indicates if an object in the model refers to a blocked reactions than iAT601, indicating higher biochemical metabolite, reaction, or gene, while the respective annotation consistency. The number of metabolites in iCBI655 is smaller scores of these elements correspond to properties (e.g., name, than those in iAT601 mainly due to the removal of metabolites chemical formula, etc.) and identifiers linking them to relevant that did not appear in any reaction, duplicated metabolites (e.g., databases (e.g., KEGG Kanehisa and Goto, 2000 or modelSEED certain isomers), and blocked pathways added automatically Henry et al., 2010). The overall score is computed as a weighted during gap-filling without any gene association. C. thermocellum average of all the individual scores with additional emphasis DSM1313 has 2911 protein coding genes, 22% of which is on the consistency score. In summary, the high scores obtained captured by iCB655, while E. coli MG1655 has 4240 genes, by iCBI655 indicate the quality of the model and ensure its 35% of which is included in iML1515. Overall, iCBI655 applicability for future studies. has the increased coverage of the metabolic functionality of C. thermocellum but remains far from the well-studied E. coli. 2.5. Model-Guided Analysis of Proteomics 2.3. Training of Model Parameters Under and Flux Datasets Sheds Light on Redox Diverse Conditions Metabolism Critical for Biofuel Production Growth and non-growth associated maintenance (GAM and in C. thermocellum NGAM) are parameters that capture the consumption of ATP For the first application of the genome-scale metabolic model, toward cell division and homeostasis, respectively. These are we aimed to understand the complex redox metabolism known to be condition-specific; however, genome-scale models and potential redox bottlenecks critical for enhanced biofuel do not include a mechanistic description that allows to determine production in C. thermocellum. We used the model as a scaffold these ATP consumption rates as part of the simulation. Instead, to analyze proteomics and metabolic flux data collected for GAM is incorporated into the biomass pseudo-reaction and the C. thermocellum wild-type and hydG ech strains. The
Frontiers in Bioengineering and Biotechnology | www.frontiersin.org 4 August 2020 | Volume 8 | Article 772 Garcia et al. C. thermocellum Model and Its Applications
FIGURE 1 | Training and validation of the iCBI655 model. (A) Training of GAM and NGAM parameters. Discrete points correspond to experimental data. The slope of the linear regression function corresponds to GAM, while the intercept corresponds to NGAM. The data points circled as outliers were not included in any of the linear regression calculations. (B) Comparison of growth prediction error between iCBI655 and iAT601. Each maximum growth rate was predicted by constraining the models with the measured substrate uptake and product secretion fluxes (Supplementary Datasheet 2). r2 corresponds to the Pearson correlation coefficient. (C) Error in growth predictions under batch and chemostat conditions. Predicted and measured growth rates correspond to the values included in (B). (D) Scores provided by the quality control tool Memote (Lieven et al., 2020) for iCBI655 and iML1515. “Overall score” is shown in the legend.