
Prediction of metabolic fluxes by incorporating genomic context and flux-converging pattern analyses

Jong Myoung Park (a,b), Tae Yong Kim (a,b), and Sang Yup Lee (a,b,c,1)

(a) Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 Program), KAIST, Daejeon 305-701, Republic of Korea; (b) Center for Systems and Synthetic Biotechnology, Institute for the BioCentury, KAIST, Daejeon 305-701, Republic of Korea; and (c) Department of Bio and Brain Engineering, BioProcess Engineering Research Center and Bioinformatics Research Center, KAIST, Daejeon 305-701, Republic of Korea

Edited* by Bernhard Ø. Palsson, University of California at San Diego, La Jolla, CA, and accepted by the Editorial Board July 6, 2010 (received for review March 26, 2010)

Flux balance analysis (FBA) of a genome-scale metabolic model allows calculation of intracellular fluxes by optimizing an objective function, such as maximization of cell growth, under given constraints, and has found numerous applications in the field of systems biology and biotechnology. Due to the underdetermined nature of the system, however, it has limitations such as inaccurate prediction of fluxes and existence of multiple solutions for an optimal objective value. Here, we report a strategy for accurate prediction of metabolic fluxes by FBA combined with systematic and condition-independent constraints that restrict the achievable flux ranges of grouped reactions by genomic context and flux-converging pattern analyses. Analyses of three types of genomic contexts, conserved genomic neighborhood, gene fusion events, and co-occurrence of genes across multiple organisms, were performed to suggest a group of fluxes that are likely on or off simultaneously. The flux ranges of these grouped reactions were constrained by flux-converging pattern analysis. FBA of the Escherichia coli genome-scale metabolic model was carried out under several different genotypic (pykF, zwf, ppc, and sucA mutants) and environmental (altered carbon source) conditions by applying these constraints, which resulted in flux values that were in good agreement with the experimentally measured 13C-based fluxes. Thus, this strategy will be useful for accurately predicting the intracellular fluxes of large metabolic networks when their experimental determination is difficult.

Escherichia coli | flux balance analysis | grouping reaction constraints | 13C-based flux | genome-scale metabolic model

SYSTEMS BIOLOGY

Author contributions: J.M.P., T.Y.K., and S.Y.L. designed research; J.M.P. and T.Y.K. performed research; J.M.P. contributed new reagents/analytic tools; J.M.P., T.Y.K., and S.Y.L. analyzed data; and J.M.P., T.Y.K., and S.Y.L. wrote the paper.
The authors declare no conflict of interest.
*This article is a PNAS Direct Submission. B.Ø.P. is a guest editor invited by the Editorial Board.
(1) To whom correspondence should be addressed. E-mail: [email protected].
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1003740107/-/DCSupplemental.

www.pnas.org/cgi/doi/10.1073/pnas.1003740107 PNAS Early Edition | 1 of 6 Downloaded by guest on September 26, 2021

The accumulation of omics data, including genome, transcriptome, proteome, metabolome, and fluxome, provides an opportunity to understand cellular physiology and characteristics at multiple levels (1, 2). Among them, the fluxome profiling allows quantification of metabolic fluxes, which collectively represent the cellular metabolic characteristics. For successful metabolic engineering, it is important to understand and predict the changes of metabolic fluxes after modifying interesting genes, pathways, and regulatory circuits. On the basis of systems-level understanding of cellular status through fluxome profiling, rational and systematic development of an improved strain becomes possible (3–6).

One of the popular methods that has been used to examine cellular metabolic fluxes and predict their changes in perturbed conditions is flux balance analysis (FBA) (7–12). To calculate the optimal flux distribution in the multidimensional flux solution space of metabolic network, FBA optimizes an objective function, such as maximizing cell growth rate, under pseudosteady state with mass balance and other constraints (13). Given the fact that the information used to reconstruct the metabolic networks is incomplete, multiple equivalent solutions can be computed for the states of these networks (14). One method that has been presented to overcome the problem of multiple solutions is to explore the flux solution space by flux variability analysis (FVA). However, this approach still works with a flux solution space that is too large compared with the biologically feasible flux solution space of a real organism (15–17). Thus, additional information and procedures, such as providing more constraints, are needed to narrow down the set of candidate solutions.

One example of a suitable constraint that can be applied is the experimentally measured metabolic flux data obtained from 13C-based flux analysis and fermentation data (18–21). Other constraints that can be applied include transcriptome data under various environmental changes (22–26), thermodynamic constraints (27–30), and molecular crowding of various biomolecules in limited cytoplasmic space (31). However, to apply these condition-dependent and objective function-specific constraints to the models, biologically meaningful knowledge and information on environmental and genetic conditions are required.

The community has thus been actively pursuing the development of such methods. Previously, gene neighborhood methods were used for gap filling in a metabolic network, finding new metabolic modules and linking them to the genome, reconstructing pathway networks, and refining the accuracy of metabolic flux prediction (32). Also, the coupling of fluxes was previously studied in genome-scale networks using methods such as the flux coupling finder (33, 34). These methods were used to primarily investigate the topology of the metabolic network and to supplement operon prediction tools.

Fig. 1. Schematic illustration of flux balance analysis with constraints of grouping of functionally and physically related reactions based on genomic context and flux-converging pattern analyses. (A) By considering protein–protein associations based on genomic context analyses consisting of conserved neighborhood, gene fusion and co-occurrence, significantly related reactions were organized in a group. The binary variable, y, is 0 or 1. v_j is the flux of reaction j and v_1 and v_2 belong to the same group. v_j,min and v_j,max are lower and upper bounds for the flux of reaction j. (B) The flux ranges of the grouped reactions based on genomic context analyses were further constrained by flux-converging pattern analyses based on the carbon number of metabolites that participate in each reaction (C_a) and the number of passing through flux-converging metabolites from a carbon source (J_b), for instance in this study. A metabolite could be split into different metabolites or created by combining two metabolites. Each circle with different colors defines reaction groups that have a similar scale of flux values, determined by stoichiometric carbon balances for metabolites and fluxes through . The flux-converging metabolite is defined as a metabolite that sums fluxes originated from a reaction that produces two or more metabolites into parallel pathways. δ denotes a constant defining the flux level of reactions in a reaction group. v_1^n, v_2^n are normalized flux of reaction 1, 2 by dividing each reaction by a carbon source uptake rate, such as glucose. (C) To restrict the flux solution space of the Escherichia coli genome-scale model, the simultaneous on/off constraints (C_on/off) by genomic context analyses and flux scale constraints (C_scale) by flux-converging pattern analyses were applied to the E. coli central metabolism. (D) Grouping of reactions using genomic context analysis and clustering of the grouped reactions using flux-converging pattern analysis in the form of C_xJ_y. C_x indicates the carbon number of metabolites that participate in each reaction and J_y indicates the type of fluxes through flux-converging metabolites from a carbon source. The red metabolites are flux-converging metabolites. The flux-converging metabolites correspond to metabolites where two split reactions combine, indicated by bold, transparent, and red arrows, such as glyceraldehyde-3-phosphate, which converges fluxes split by fructose-bisphosphate aldolase. The flux-converging metabolites categorize J_y into four types, denoted as J_A, J_B, J_C, and J_D, where each subscript indicates the number of flux-converging metabolites passed zero to three times, respectively, for the given flux from a carbon source. The subscript E, which specifically indicates that the flux has once converged into pyruvate, is placed next to subscripts of J, including A, B, C, or D. Each rectangle with different colors defines reaction groups that are likely on or off simultaneously, determined by genomic context analysis. The C_xJ_y's assigned by using a different pathway from a glucose for each reaction are partitioned by a slash.

Here we present a unique strategy of using information generated from genomic context and flux-converging pattern analyses to generate and introduce unique constraints for more accurate simulation of the metabolic network. Three types of genomic context analyses consisting of conserved neighborhood, gene fusion, and co-occurrence were performed (Fig. 1A). The conserved neighborhood analysis predicts genes in close proximity on various genomes repeatedly. The gene fusion analysis captures events of forming a hybrid gene from previously separate genes in other organisms. The co-occurrence analysis predicts presence or absence of linked proteins across organisms (35). Flux-converging pattern analysis restricts the range of attainable flux distributions by considering the number of carbons in the metabolite that participates in the reactions and the patterns of the fluxes from a carbon source in the metabolic network by using the concept of flux-converging metabolite (Fig. 1B). These were all considered together in generating grouping reaction constraints, which represent those genes that have good chance of being simultaneously on/off under the influence of the same regulatory mechanisms and showing similar expression patterns under given genetic and/or environmental perturbations (23). The grouping reaction constraints were incorporated during the FBA of the Escherichia coli genome-scale metabolic
model under various genetic and environmental conditions. Flux distributions obtained by this approach showed good agreement with those experimentally determined by 13C-based flux analysis, which suggests its general usefulness in more accurately determining metabolic fluxes by FBA.

Results

Grouping of Reactions in E. coli Central Metabolism by Genomic Context Analysis. Genomic context and flux-converging pattern analyses were performed to group functionally and physically related reactions (Fig. 1C). Here, grouping reaction constraints for FBA, which consist of simultaneous on/off (C_on/off) and flux scale (C_scale) constraints, were focused on reactions of central carbon metabolism because it is the most interconnected core metabolism wherein, in contrast to most linear biosynthetic pathways, multiple regulations and controls are in operation to properly operate the metabolism (21). Thus, it is expected that appropriately constraining the reaction fluxes of central carbon metabolism would lead to accurate prediction of flux distribution in the whole metabolism. By considering protein–protein associations on the basis of genomic context analysis, significantly related reactions were organized in a group (Fig. 1). After the assignment of association scores for each predictor of functional associations of proteins, the final combined score between any pair of proteins was summed up as

    S = 1 - \prod_i (1 - S_i),    [1]

where S_i denotes the score of each predictor i. This combined score is usually higher than the individual subscores. After considering the criteria of minimal score from 0.7 to 0.95 to organize the groups that are composed of significantly related reactions, the minimal score of 0.75 that showed good prediction fidelity compared with experimental data was chosen (Table S1). The individual and combined scores were calculated to predict protein–protein associations, which can be obtained from a database of known and predicted protein interactions, STRING 8.0 (35). Consequently, E. coli central metabolism appeared to be composed of nine reaction groups from genomic context analysis (Tables S1 and S2). Reactions in the same group by genomic context analysis receive C_on/off during FBA as follows:

    y(v_1) = y(v_2)    [2]

    y(v_1) \cdot v_{1,\min} \le v_1 \le y(v_1) \cdot v_{1,\max}    [3]

    y(v_2) \cdot v_{2,\min} \le v_2 \le y(v_2) \cdot v_{2,\max}.    [4]

The binary variable, y, is available to operate the simultaneous on/off of reactions for each group (Eq. 2). The binary variables in each group represent the same values and are multiplied to upper and lower bounds of reactions (Eq. 3 and Eq. 4). The v_j is the flux of reaction j and the v_1 and v_2 belong to the same group. The v_j,min and v_j,max are lower and upper bounds for the flux of reaction j.

The reactions that constitute the same predicted group usually belong to an identical functional category of metabolism. However, some predicted groups consist of reactions belonging to different functional categories (Table S2). In group 3 of Table S2, although pyruvate dehydrogenase (PDH) bridges between glycolysis and TCA cycle by providing acetyl-CoA, PDH was predicted to be related to and grouped with reactions in the TCA cycle. Additionally, glucokinase (GLK) and glucose-6-phosphate isomerase in glycolysis are found in group 5 with reactions in the pentose phosphate (PP) pathway and the Entner–Doudoroff (ED) pathway. In the utilization of glucose, although most of glucose is transported and phosphorylated by the phosphoenolpyruvate:sugar phosphotransferase system (PTS) in E. coli and most other bacteria, the expression of glk encoding GLK reaction is still active (36). In contrast to FBA without grouping reaction constraints, which predicts only PTS as a glucose importer, FBA with grouping reaction constraints predicted that both PTS and GLK reactions are in operation, which is consistent with experimental observation.

Restricting the Flux Ranges of Grouped Reactions by Flux-Converging Pattern Analysis. Followed by genomic context analyses, reactions in each reaction group were further clustered into 35 sets using flux-converging pattern analysis, to constrain each set of reactions to have a similar scale of flux values. This analysis is based on the idea that the scale of flux values is considerably affected by two elements: the number of carbons (C_x) in a metabolite that participates in the reaction (R_j) and the flux-converging pattern of fluxes from a carbon source (J_y) (SI Text and Fig. 1). For example, the C6 metabolite, fructose-6-phosphate, is divided into two C3 metabolites, glyceraldehyde-3-phosphate and dihydroxyacetone phosphate, by fructose-bisphosphate aldolase in glycolysis. This would cause downstream reactions of fructose-bisphosphate aldolase metabolizing C3 metabolites to have 2-fold higher flux values, compared with the upstream reactions of fructose-bisphosphate aldolase. Also, the flux-converging metabolites in the metabolic pathway heavily influence the scale of flux values as fluxes propagate from an initial carbon source, and thus the number of such flux-converging metabolites is considered. The flux-converging metabolite is defined as a metabolite that sums fluxes originated from a reaction that produces two or more metabolites into parallel pathways (Fig. 1 and Fig. S1). These flux-converging metabolites categorize J_y into four types, denoted as J_A, J_B, J_C, and J_D, where each subscript indicates the number of flux-converging metabolites passing through 0, 1, 2, and 3 times, respectively, for the given flux originated from a carbon source. Analysis of the flux-converging pattern is terminated if the flux goes back to the metabolite it has already passed.

There were three flux-converging metabolites, glyceraldehyde-3-phosphate, pyruvate, and malate, found in E. coli central metabolism. Among them, pyruvate deserves further attention as it causes more complex changes of flux distribution, compared with the other two flux-converging metabolites; fluxes split from 6-phospho-D-gluconate eventually converge into pyruvate, but one of them goes through glyceraldehyde-3-phosphate, which is a flux-converging metabolite. Thus, subscript E, which specifically indicates that the flux has once converged into pyruvate, is placed next to subscripts of J, including A, B, C, or D. Although each reaction could be assigned by several C_xJ_y at the same time by using a different pathway from a carbon source, reactions having a similar scale of flux values showed the same C_xJ_y's. Furthermore, it should be noted that J_y is specifically assigned for the case of using glucose as a carbon source and should be modified accordingly if a different carbon source is used (see below). Taken together, reactions in the same group from genomic context analyses were further clustered into the same set if they have the same C_xJ_y in E. coli central metabolism with glucose as a carbon source (Fig. 1). Reactions in the same set from flux-converging pattern analysis will not only receive simultaneous on/off constraints, but also be constrained to have a similar scale of flux values, which is C_scale, whereas reactions that were not clustered into the set within each reaction group are only subjected to on/off constraints. In other words, the C_scale by flux-converging analysis further constrains the flux ranges of the reactions grouped by genomic context analysis. The mathematical expression of the C_scale is detailed in SI Text.

FBA with Grouping Reaction Constraints Predicts the Changes of Flux Patterns upon Gene Knockouts. FVA was first performed to estimate the maximal and minimal flux values of each reaction to explore the change of flux solution space under the given condition with respect to 13C-based measurements. The changes in flux patterns in response to several perturbations compared with a control flux solution space were studied by using two simple determinants called flux bias, V_avg, and flux capacity, l_sol; the former is defined as the average value of the maximal and
minimal flux values of a reaction, and the latter is the distance between them (SI Text). To systematically characterize the changes in flux values, flux changes in response to perturbations were categorized into nine types on the basis of the combinations of positive and negative changes in V_avg and l_sol in comparison with the control flux distribution (Table S3). In this study, the control flux solution space was the flux distribution of the wild-type E. coli cultured in glucose minimal medium under aerobic conditions. The flux values from FBA and 13C-based experiments are relative fluxes that were normalized to the respective carbon source uptake rates. Details on the determination of flux pattern changes by considering the increase or decrease of V_avg and l_sol are described in SI Text.

The intracellular fluxes can often be much affected by genetic perturbation, such as gene knockouts (37–40), and hence validation of our approach was performed by gene knockouts. Thus, FBA with grouping reaction constraints was performed on four representative gene knockout mutant strains: the pykF (pyruvate kinase in glycolytic pathway), zwf (glucose 6-phosphate dehydrogenase in PP pathway), ppc (anaplerotic phosphoenolpyruvate carboxylase), and sucA (oxoglutarate dehydrogenase in TCA cycle) mutants. According to the results of 13C-based flux analysis of the pyk mutant strain and the wild-type strain, the glycolytic, malate dehydrogenase (MDH) in TCA cycle and acetate production fluxes decreased whereas the fluxes in the PP pathway and the anaplerotic pathway increased when the pyk gene was knocked out (37). However, results obtained by FBA did not show such changes. It was found that an alternative route for the flux was found to compensate the deleted pyruvate kinase (PYK) reaction during the FBA of the pyk knockout strain, resulting in no change of glycolysis. The results of FBA on the pyk knockout mutant showed that they all belong to type 9 (Table S3), implying no effects of pyk deletion on the changes in flux distribution. Thus, FBA does not seem to be suitable for determining the flux distribution. Next, the grouping reaction constraints were applied during the FBA. PYK is grouped with other reactions in glycolysis (group 4 in Table S2), and thus the other reactions in group 4 are also affected upon the knockout of the pyk gene, as shown by their decreased flux values (Table S2 and Fig. 2). FBA with grouping reaction constraints predicted that the fluxes through reactions in the PP pathway and malic enzyme reaction in the anaplerotic pathway increased whereas the glycolytic, MDH in TCA cycle and acetate production fluxes decreased, compared with the wild type based on V_avg. These results are in agreement with those of the 13C-based flux and fermentation profile of the pykF knockout mutant (Fig. 2) (37), suggesting that the FBA with grouping reaction constraints allows more accurate determination of the fluxes. Furthermore, the l_sol's of reactions in the ED pathway and the PP pathway were found to be increased to supply the deficient pyruvate pool caused by the removal of PYK reaction. Also, the l_sol of PDH, an enzyme connecting the ED pathway with TCA cycle, and the l_sol's of the beginning reactions of TCA cycle increased. The type 1 reactions, implying the increase of the V_avg's and l_sol's of reactions, were 43.59% of reactions in the E. coli central metabolism of the pykF knockout mutant (Fig. 2 and Table S3). The percentage of type I reactions are the results of the high fluxes through the PP pathway, anaplerotic pathway, and TCA cycle in the pykF knockout mutant. The type 4 and 5 reactions, implying the decrease of the V_avg's of reactions, were 2.56 and 51.28% of reactions in the E. coli central metabolism of the pykF knockout mutant, respectively, and mainly influenced by MDH reaction and neighboring reactions in TCA cycle and those in glycolysis, all of which showed lower flux values from pykF knockout. The type 9 reaction corresponds to the isocitrate lyase reaction in glyoxylate shunt, comprising 2.56% of considered reactions in E. coli central metabolism. These changes of flux distribution were further confirmed by the analysis based on the random sampling of glucose uptake rate having normal distribution (Fig. S2).

Fig. 2. Comparison of 13C-based flux values to those calculated from FBA with or without grouping reaction constraints in wild-type E. coli and pykF knockout mutant strains on glucose minimal medium under aerobic conditions and flux distribution changes for pykF knockout in E. coli central metabolism. The color gradient and squares indicate the values of V_avg. The red lines indicate that the fluxes show the increment determined by flux bias (V_avg), compared with control fluxes, which were from E. coli wild type aerobically grown in glucose minimal medium. The blue lines indicate the opposite. Thickness of the lines denotes flux capacities (l_sol) after environmental perturbation, compared with control fluxes; three types of thickness were used to indicate the changes in flux capacity. The thickest line indicates the increased flux capacity, the thinnest line the decreased flux capacity, and the medium thickness refers to no changes in flux capacity. Each graph represents the 13C-based fluxes and solution spaces of each flux predicted with or without grouping reaction constraints for wild type and pykF mutant. The y axis in each graph indicates the relative flux (%) that is normalized to the carbon source uptake rate. The 13C-based fluxes are represented in (I) wild type and (II) pykF mutant. The fluxes predicted by FBA without grouping reaction constraints are represented in (III) wild type and (IV) pykF mutant. The fluxes predicted by FBA with grouping reaction constraints are represented in (V) wild type and (VI) pykF mutant. The bars in each embedded graph denote the range between the maximal and minimal values of each reaction computed from FVA. The graph for succinyl-CoA synthetase is not shown because its flux values were outside the range of biologically feasible values in the case of FBA without grouping reaction constraints. In the case of reversible reactions, the fluxes corresponding to the direction represented in the pathway were considered. The 13C-based fluxes were evaluated with 90% confidence limits obtained from statistical analysis (37).

Flux patterns predicted by FBA with grouping reaction constraints for other gene knockout mutants also showed good agreement with 13C-based fluxes of corresponding strains. For the zwf knockout mutant, in which glucose 6-phosphate dehydrogenase in the PP pathway was disabled, FBA with grouping reaction constraints predicted increased V_avg's through glycolysis, TCA cycle and acetate production reaction, and decreased V_avg's of the PP pathway, which is consistent with results of the 13C-based metabolic flux analysis. In both methods, higher V_avg's through TCA cycle were observed to complement the shortage of reducing power (NADPH) caused by the disruption of the PP pathway (Fig. S3) (38). For the ppc (phosphoenolpyruvate carboxylase) and sucA (2-oxoglutarate dehydrogenase) knockouts, FBA with grouping reaction constraints also yielded consistent predictions, compared with 13C-based fluxes (Figs. S4 and S5) (39, 40). In the case of the gnd and zwf mutants, the changes of the 13C-based flux range (95% confidence intervals) were in good agreement with those of the predicted flux ranges obtained using the grouping reaction constraints (Fig. S6). As a result, FBA with grouping reaction constraints well predicted changes in flux distribution caused by various single-gene knockouts, which suggests its useful application in in silico gene targeting for metabolic engineering.
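
The two determinants used in this section can be illustrated with a minimal Python sketch. It computes the flux bias (V_avg) and flux capacity (l_sol) of one reaction from its FVA minimum and maximum, then reports the direction of change against a control. The function names, the tolerance, and the flux values are assumptions made for illustration only. The mapping of sign combinations onto the nine numbered types follows Table S3, which is not reproduced here, so the sketch returns only the pair of signs.

```python
from typing import Tuple

FluxRange = Tuple[float, float]  # (minimal flux, maximal flux) from FVA


def flux_bias(v_min: float, v_max: float) -> float:
    """V_avg: average of the minimal and maximal flux of a reaction."""
    return (v_min + v_max) / 2.0


def flux_capacity(v_min: float, v_max: float) -> float:
    """l_sol: distance between the maximal and minimal flux of a reaction."""
    return v_max - v_min


def sign(x: float, tol: float) -> int:
    """Return +1, -1, or 0 when |x| is within the (assumed) tolerance."""
    if x > tol:
        return 1
    if x < -tol:
        return -1
    return 0


def change_pattern(control: FluxRange, perturbed: FluxRange,
                   tol: float = 1e-6) -> Tuple[int, int]:
    """Signs of the changes in (V_avg, l_sol) relative to the control.

    Three possible signs for each determinant give nine combinations.
    """
    d_bias = flux_bias(*perturbed) - flux_bias(*control)
    d_capacity = flux_capacity(*perturbed) - flux_capacity(*control)
    return sign(d_bias, tol), sign(d_capacity, tol)


# Hypothetical relative fluxes (% of carbon source uptake), not from the paper.
control_range = (40.0, 60.0)   # V_avg = 50, l_sol = 20
knockout_range = (55.0, 95.0)  # V_avg = 75, l_sol = 40
print(change_pattern(control_range, knockout_range))  # (1, 1)
```

A result of (1, 1) means that both determinants increased, which is the situation the text describes for type 1 reactions.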

FBA with Grouping Reaction Constraints Predicts the Changes of Flux Patterns upon Altering the Carbon Source from Glucose to Acetate. The changes of flux patterns using glucose and acetate as carbon sources were predicted by FBA with grouping reaction constraints, and the results were compared with the experimentally measured fluxes (Fig. S7) (38). When acetate was used as a carbon source, the direction of the fluxes through glycolysis was reversed. In addition to the redirection of the pathway, the activation of the glyoxylate shunt and anaplerotic phosphoenolpyruvate carboxykinase reaction toward phosphoenolpyruvate changed the flux distribution dramatically compared with that observed with glucose as a carbon source. FBA with grouping reaction constraints also successfully predicted the use of the glyoxylate shunt and anaplerotic reaction toward phosphoenolpyruvate, and the fluxes were redirected toward glycolysis on the basis of V_avg and l_sol. Most of the fluxes that originated from acetate uptake went through the TCA cycle and the glyoxylate shunt. The V_avg's and l_sol's of the redirected reactions in glycolysis and V_avg's of glyoxylate shunt reactions showed increments implying a higher likelihood of fluxes going through the newly directed pathways. The high V_avg's of the reactions in the TCA cycle were consistent with those flux patterns experimentally observed with acetate as a carbon source.

The organic acid production rates predicted with grouping reaction constraints showed very small V_avg's, which are also consistent with the batch fermentation profiles obtained using acetate as a carbon source (Fig. S8). The type 1 and 2 reactions covered 21.62 and 8.11% of reactions in E. coli central metabolism when glucose and acetate were used, respectively, which is because of the amplified redirected reactions and the reactions in the glyoxylate shunt (Fig. S7 and Table S3). The type 5 reactions mainly correspond to reactions involved in organic acid production, the PP pathway, and the TCA cycle, comprising 70.27% of all considered central reactions. Thus, FBA with grouping reaction constraints allowed more accurate prediction of the changes in flux patterns when different carbon sources were used.

Assessment of the Prediction Accuracy of FBA with Grouping Reaction Constraints. Having seen that FBA with grouping reaction constraints can predict flux patterns better than simple FBA, the prediction accuracies of the two methods were compared with the experimentally determined 13C-based fluxes. Here, the Euclidean distance was used to quantify the overall agreement of the calculated fluxes with experimentally determined fluxes (SI Text). More specifically, the Euclidean distance was maximized or minimized with the constrained optimal cell growth rate under the given condition to estimate a range of possible Euclidean distances resulting from the multiple solutions of computationally calculated fluxes. Consequently, the maximal and minimal Euclidean distances that were estimated with and without grouping reaction constraints allowed us to see the improvement of predictions in the presence of grouping reaction constraints. The minimal distances obtained with grouping reaction constraints were shorter than those without grouping reaction constraints. Additionally, the distances between the best (minimal Euclidean distance) and the worst (maximal Euclidean distance) possible predictions with grouping reaction constraints were remarkably reduced, compared with those without grouping reaction constraints (Fig. 3). These results mean that the predicted flux values became much more similar to those experimentally determined for all six perturbations examined by applying grouping reaction constraints. In conclusion, FBA with grouping reaction constraints allows more accurate prediction of fluxes during the simulation of the genome-scale metabolic model, which is typically a highly underdetermined system.

Discussion

FBA-based simulation of genome-scale models has been valuable in predicting metabolic flux distributions and understanding the metabolic characteristics under various genotypic and environmental conditions. However, the accuracy of FBA in predicting the metabolic fluxes is limited due to the incomplete information available; e.g., there is broad flux solution space causing multiple intracellular flux solutions for a given optimal objective state. The key issue to overcome such problems is to reduce the flux solution space of the genome-scale model by using additional constraints (22–31). Even though these constraints are invaluable in improving the prediction, they require complex information, such as transcriptional regulation and a signaling mechanism, and are condition dependent. We thus aimed at developing a strategy that generates additional information to be incorporated, in the form of constraints, to improve the accuracy in predicting the metabolic fluxes.

Performance of FBA could also be improved by adopting appropriate objective functions under specific circumstances. Schuetz et al. (41) systematically evaluated several objective functions under specific conditions and showed that the prediction accuracy could be significantly improved by selecting a suitable objective function. Yet, this approach still requires the selection of an appropriate objective function according to the specific condition being investigated. Consequently, the objective of this study was whether the prediction of metabolic fluxes in pseudosteady state can be improved by applying condition- and objective function-independent constraints, such as genomic features and stoichiometric reactions.

Recently, the complexity of transcription unit architecture was investigated by several groups (42–44). The presence of these transcriptional units results in a more complicated model of gene expression, compared with the classical operon model. A transcription unit can contain a gene group that is transcribed to an mRNA and corresponds to its respective metabolic reaction. However, the transcriptional unit architecture does not account for the influences of other functionally related genes that are not found in the same transcriptional unit or operon. The grouping reaction constraints, defined by this study, consider the co-occurrence and gene fusion, in addition to the conserved neighborhood relationship. As a result, grouping reaction constraints encompasses a broader scope that includes the operons and transcription units of functionally related genes and resulted in an assumption that the reactions in the same group are affected by similar regulation. Thus, grouping of such significantly related reactions by genomic context analysis provides us with a condition-independent constraint that incorporates complex regulations, such as transcriptional regulation, in a simple way. In addition, applying additional flux scale constraints within the groups generated by genomic context analysis to the model could improve the fidelity of the predicted metabolic fluxes by further restricting the range of flux solution space. From an engineering point of view, predicting the metabolic fluxes and their changes under various perturbed conditions is important in designing the strategies for strain improvement. The existence of alternative functional states to the predicted metabolic fluxes can be explained from a biological point of view,

Fig. 3. Assessment of the prediction accuracy of FBA with grouping reaction constraints using Euclidean distance between computationally calculated fluxes and 13C-based fluxes. Shown are Euclidean distances of E. coli wild type on glucose minimal medium under aerobic conditions (A) with grouping reaction constraints and (B) without grouping reaction constraints, E. coli pykF mutant on glucose minimal medium under aerobic conditions (C) with grouping reaction constraints and (D) without grouping reaction constraints, and E. coli wild type on acetate minimal medium under aerobic conditions (E) with grouping reaction constraints and (F) without grouping reaction constraints.

Park et al. PNAS Early Edition | 5of6 Downloaded by guest on September 26, 2021 often described as biological (14, 45, 46). Thus, we in- Fermentations and Analytical Procedures. Batch fermentations were carried vestigated the metabolic fluxes and their changes under perturbed out in a 6.6-L Bioflo 3000 fermenter (New Brunswick Scientific) containing 2 L conditions using FVA, which can calculate alternative functional of M9 minimal medium with 20 g/L of glucose or 10 g/L of acetate at 37 °C states for a given condition. under aerobic conditions. The pH was controlled at 6.8 by automatic feeding In this paper, we reported a simple yet generally applicable strat- of NH4OH or HCl. Cell growth was monitored by measuring the absorbance egy for performing FBA more accurately by providing condition- at 600 nm (OD600) using an Ultrospec3000 spectrophotometer (Pharmacia and objective function-independent constraints. The range of flux Biotech). Glucose concentration was measured using a glucose analyzer solution space could be reduced by applying grouping reaction (model 2700 STAT; Yellow Springs Instrument). The concentrations of glu- constraints generated from information easily obtainable from the cose and organic acids were determined by HPLCy (ProStar 210; Varian) genome sequence and stoichiometric network. Thus, this strategy equipped with UV/visible-light (ProStar 320; Varian) and refractive index will be generally useful for the accurate prediction of metabolic (Shodex RI-71) detectors. The experimental details of fermentations and fluxes in genome scale when their experimental determination analytical procedures are detailed in SI Text. is difficult. ACKNOWLEDGMENTS. We thank Hyun Uk Kim and Seung Bum Sohn for Materials and Methods their help in manuscript preparation. We also thank Dr. Byung Kwan Cho for his helpful discussion. This work was supported by the Korean Systems FBA with Grouping Reaction Constraints. 
FBA was used to maximize or min- Biology Research Project (20100002164) of the Ministry of Education, Sci- imize an objective function under different constraints. The FBA with ence, and Technology through the National Research Foundation of Korea. grouping reaction constraints framework was established by applying si- Further support by the World Class University Program (R32-2009-000-10142-0) multaneous on/off and flux scale constraints to FBA, as described in Results through the National Research Foundation of Korea funded by the Ministry and as detailed in SI Text. of Education, Science, and Technology is appreciated.
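The use of a range of Euclidean distances to assess prediction accuracy, as described in Results, can be illustrated with a minimal Python sketch. It is not the procedure of SI Text. It assumes a hypothetical toy case in which two parallel reactions carry a fixed total flux at optimal growth, so that every alternative optimal solution has the form (t, total - t). The flux values, the "measured" fluxes, the narrowed interval standing in for a grouping reaction constraint, and the function names are all illustrative assumptions.

```python
import math


def euclidean_distance(calculated, measured):
    """Euclidean distance between two flux vectors of equal length."""
    return math.sqrt(sum((c - m) ** 2 for c, m in zip(calculated, measured)))


def distance_range(t_min, t_max, total, measured):
    """Best and worst distance over solutions v = (t, total - t), t in [t_min, t_max].

    The squared distance is convex in t, so the minimum lies at the
    unconstrained minimizer clipped to the interval, and the maximum
    lies at one of the two interval endpoints.
    """
    m1, m2 = measured
    # Minimizer of (t - m1)^2 + (total - t - m2)^2 with respect to t.
    t_star = (m1 + total - m2) / 2.0
    t_best = min(max(t_star, t_min), t_max)

    def dist(t):
        return euclidean_distance((t, total - t), measured)

    return dist(t_best), max(dist(t_min), dist(t_max))


# Hypothetical numbers: total flux of 10 split over two parallel reactions,
# with assumed "measured" fluxes of 6 and 4.
TOTAL = 10.0
MEASURED = (6.0, 4.0)

# Without additional constraints, the first flux may take any value in [0, 10].
best_free, worst_free = distance_range(0.0, 10.0, TOTAL, MEASURED)

# Assumed stand-in for a grouping constraint: each flux is at most twice the
# other, which restricts the first flux to [10/3, 20/3].
best_grouped, worst_grouped = distance_range(10.0 / 3.0, 20.0 / 3.0, TOTAL, MEASURED)

print("without constraint:", best_free, worst_free)
print("with constraint:   ", best_grouped, worst_grouped)
```

In this toy case the gap between the best and the worst possible prediction shrinks once the admissible flux interval is narrowed, which is the qualitative behavior that Fig. 3 reports for the genome-scale model. For a genome-scale network, the distance would have to be minimized and maximized over the full constrained solution space with an optimization solver rather than in closed form.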

1. Lee SY, Lee DY, Kim TY (2005) Systems biotechnology for strain improvement. Trends Biotechnol 23:349–358.
2. Joyce AR, Palsson BO (2006) The model organism as a system: Integrating ‘omics’ data sets. Nat Rev Mol Cell Biol 7:198–210.
3. Park JH, Lee SY (2008) Towards systems metabolic engineering of microorganisms for amino acid production. Curr Opin Biotechnol 19:454–460.
4. Krömer JO, Sorgenfrei O, Klopprogge K, Heinzle E, Wittmann C (2004) In-depth profiling of lysine-producing Corynebacterium glutamicum by combined analysis of the transcriptome, metabolome, and fluxome. J Bacteriol 186:1769–1784.
5. Deshpande R, Yang TH, Heinzle E (2009) Towards a metabolic and isotopic steady state in CHO batch cultures for reliable isotope-based metabolic profiling. Biotechnol J 4:247–263.
6. Becker J, Klopprogge C, Wittmann C (2008) Metabolic responses to pyruvate kinase deletion in lysine producing Corynebacterium glutamicum. Microb Cell Fact 7:8.
7. Price ND, Reed JL, Palsson BO (2004) Genome-scale models of microbial cells: Evaluating the consequences of constraints. Nat Rev Microbiol 2:886–897.
8. Kim HU, Kim TY, Lee SY (2008) Metabolic flux analysis and metabolic engineering of microorganisms. Mol Biosyst 4:113–120.
9. Park JH, Lee KH, Kim TY, Lee SY (2007) Metabolic engineering of Escherichia coli for the production of L-valine based on transcriptome analysis and in silico gene knockout simulation. Proc Natl Acad Sci USA 104:7797–7802.
10. Lee KH, Park JH, Kim TY, Kim HU, Lee SY (2007) Systems metabolic engineering of Escherichia coli for L-threonine production. Mol Syst Biol 3:149.
11. Jamshidi N, Palsson BO (2009) Using in silico models to simulate dual perturbation experiments: Procedure development and interpretation of outcomes. BMC Syst Biol 3:44.
12. Feist AM, Palsson BO (2008) The growing scope of applications of genome-scale metabolic reconstructions using Escherichia coli. Nat Biotechnol 26:659–667.
13. Edwards JS, Ibarra RU, Palsson BO (2001) In silico predictions of Escherichia coli metabolic capabilities are consistent with experimental data. Nat Biotechnol 19:125–130.
14. Reed JL, Palsson BO (2004) Genome-scale in silico models of E. coli have multiple equivalent phenotypic states: Assessment of correlated reaction subsets that comprise network states. Genome Res 14:1797–1805.
15. Mahadevan R, Schilling CH (2003) The effects of alternate optimal solutions in constraint-based genome-scale metabolic models. Metab Eng 5:264–276.
16. Park JM, Kim TY, Lee SY (2009) Constraints-based genome-scale metabolic simulation for systems metabolic engineering. Biotechnol Adv 27:979–988.
17. Puchałka J, et al. (2008) Genome-scale reconstruction and analysis of the Pseudomonas putida KT2440 metabolic network facilitates applications in biotechnology. PLoS Comput Biol 4:e1000210.
18. Blank LM, Kuepfer L, Sauer U (2005) Large-scale 13C-flux analysis reveals mechanistic principles of metabolic network robustness to null mutations in yeast. Genome Biol 6:R49.
19. Fischer E, Sauer U (2005) Large-scale in vivo flux analysis shows rigidity and suboptimal performance of Bacillus subtilis metabolism. Nat Genet 37:636–640.
20. Yong K T, Lee SY (2006) Accurate metabolic flux analysis through data reconciliation of isotope balance-based data. J Microbiol Biotechnol 16:1139–1143.
21. Sauer U (2006) Metabolic networks in motion: 13C-based flux analysis. Mol Syst Biol 2:62.
22. Barrett CL, Herring CD, Reed JL, Palsson BO (2005) The global transcriptional regulatory network for metabolism in Escherichia coli exhibits few dominant functional states. Proc Natl Acad Sci USA 102:19103–19108.
23. Covert MW, Knight EM, Reed JL, Herrgard MJ, Palsson BO (2004) Integrating high-throughput and computational data elucidates bacterial networks. Nature 429:92–96.
24. Shlomi T, Eisenberg Y, Sharan R, Ruppin E (2007) A genome-scale computational study of the interplay between transcriptional regulation and metabolism. Mol Syst Biol 3:101.
25. Covert MW, Xiao N, Chen TJ, Karr JR (2008) Integrating metabolic, transcriptional regulatory and signal transduction models in Escherichia coli. Bioinformatics 24:2044–2050.
26. Lee JM, Min Lee J, Gianchandani EP, Eddy JA, Papin JA (2008) Dynamic analysis of integrated signaling, metabolic, and regulatory networks. PLoS Comput Biol 4:e1000086.
27. Feist AM, et al. (2007) A genome-scale metabolic reconstruction for Escherichia coli K-12 MG1655 that accounts for 1260 ORFs and thermodynamic information. Mol Syst Biol 3:121.
28. Henry CS, Broadbelt LJ, Hatzimanikatis V (2007) Thermodynamics-based metabolic flux analysis. Biophys J 92:1792–1805.
29. Kümmel A, Panke S, Heinemann M (2006) Systematic assignment of thermodynamic constraints in metabolic network models. BMC Bioinformatics 7:512.
30. Yang F, Qian H, Beard DA (2005) Ab initio prediction of thermodynamically feasible reaction directions from biochemical network stoichiometry. Metab Eng 7:251–259.
31. Beg QK, et al. (2007) Intracellular crowding defines the mode and sequence of substrate uptake by Escherichia coli and constrains its metabolic activity. Proc Natl Acad Sci USA 104:12663–12668.
32. Breitling R, Vitkup D, Barrett MP (2008) New surveyor tools for charting microbial metabolic maps. Nat Rev Microbiol 6:156–161.
33. Burgard AP, Nikolaev EV, Schilling CH, Maranas CD (2004) Flux coupling analysis of genome-scale metabolic network reconstructions. Genome Res 14:301–312.
34. Papin JA, Reed JL, Palsson BO (2004) Hierarchical thinking in network biology: The unbiased modularization of biochemical networks. Trends Biochem Sci 29:641–647.
35. Jensen LJ, et al. (2009) STRING 8—a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res 37(Database issue):D412–D416.
36. Meyer D, Schneider-Fresenius C, Horlacher R, Peist R, Boos W (1997) Molecular characterization of glucokinase from Escherichia coli K-12. J Bacteriol 179:1298–1306.
37. Al Zaid Siddiquee K, Arauzo-Bravo MJ, Shimizu K (2004) Metabolic flux analysis of pykF gene knockout Escherichia coli based on 13C-labeling experiments together with measurements of enzyme activities and intracellular metabolite concentrations. Appl Microbiol Biotechnol 63:407–417.
38. Zhao J, Baba T, Mori H, Shimizu K (2004) Effect of zwf gene knockout on the metabolism of Escherichia coli grown on glucose or acetate. Metab Eng 6:164–174.
39. Li M, Ho PY, Yao S, Shimizu K (2006) Effect of sucA or sucC gene knockout on the metabolism in Escherichia coli based on gene expressions, enzyme activities, intracellular metabolite concentrations and metabolic fluxes by 13C-labeling experiments. Biochem Eng J 30:286–296.
40. Peng L, Arauzo-Bravo MJ, Shimizu K (2004) Metabolic flux analysis for a ppc mutant Escherichia coli based on 13C-labelling experiments together with enzyme activity assays and intracellular metabolite measurements. FEMS Microbiol Lett 235:17–23.
41. Schuetz R, Kuepfer L, Sauer U (2007) Systematic evaluation of objective functions for predicting intracellular fluxes in Escherichia coli. Mol Syst Biol 3:119.
42. Cho BK, et al. (2009) The transcription unit architecture of the Escherichia coli genome. Nat Biotechnol 27:1043–1049.
43. Güell M, et al. (2009) Transcriptome complexity in a genome-reduced bacterium. Science 326:1268–1271.
44. Sharma CM, et al. (2010) The primary transcriptome of the major human pathogen Helicobacter pylori. Nature 464:250–255.
45. Raamsdonk LM, et al. (2001) A functional genomics strategy that uses metabolome data to reveal the phenotype of silent mutations. Nat Biotechnol 19:45–50.
46. Fong SS, Marciniak JY, Palsson BO (2003) Description and interpretation of adaptive evolution of Escherichia coli K-12 MG1655 by using a genome-scale in silico metabolic model. J Bacteriol 185:6400–6408.

www.pnas.org/cgi/doi/10.1073/pnas.1003740107