UNIVERSITY OF CALIFORNIA, SAN DIEGO

Identification of growth-coupled and kinetically robust production in strain designs using genome-scale models

A thesis submitted in partial satisfaction of the requirements for the degree Master of Science

in

Bioengineering

by

Hoang Viet Dinh

Committee in charge:

Professor Bernhard O. Palsson, Chair
Professor Adam M. Feist
Professor Nathan E. Lewis
Professor Christian M. Metallo

2017

Copyright

Hoang Viet Dinh, 2017

All rights reserved.

The Thesis of Hoang Viet Dinh is approved, and it is acceptable in quality and form for publication on microfilm and electronically:

______

______

______

______

Chair

University of California, San Diego

2017

DEDICATION

I dedicate this work to my parents, who wholeheartedly support their children's education.

TABLE OF CONTENTS

Signature Page

Dedication

Table of Contents

List of Figures

List of Tables

Acknowledgements

Abstract of the Thesis

Chapter 1 Introduction

Chapter 2 Genome-scale models: System biology calculation platform for rational metabolic engineering strain designs
2.1. Introduction
2.2. Model and design modifications

Chapter 3 Growth-coupled production evaluation
3.1. Cell growth simulation
3.2. Growth-coupled production
3.3. Removal of redundant knockouts
3.4. Robustness evaluation of growth-coupled production with kinetic parameter sampling in Metabolic-Expression model
3.5. Term definitions

Chapter 4 Results
4.1. Integrated strain design search and evaluation workflow
4.2. Growth-coupled production diversity in Escherichia coli
4.3. Knockout prediction
4.4. Robust designs after kinetic parameters sampling test in Metabolic-Expression model

Chapter 5 Discussion

Bibliography

LIST OF FIGURES

Figure 3.1 Iterative workflow to remove redundant knockouts

Figure 3.2 Visualization of growth-coupled production slope

Figure 3.3 Sampled magnitude space for target and competitive pathways’ kinetic parameters

Figure 3.4 Status of kinetic parameters sampling with discretization

Figure 3.5 High resolution discretization with binary search

Figure 4.1 Integrated model-driven strain design search and evaluation workflow using Metabolic and Metabolic-Expression models

Figure 4.2 Filtration workflow results

Figure 4.3 Growth-coupled production pathway map

Figure 4.4 Growth-coupled production of target molecules in specific conditions

Figure 4.5 Knockout strategies suggested by strain design search algorithm for growth-coupled production

Figure 4.6 Suggestion of robust 1,4-butanediol production pathway by Metabolic-Expression model

LIST OF TABLES

Table 2.1 Model and design modifications

Table 3.1 Nonphysiological pathway shutdowns

Table 3.2 Molecules growth-coupled production status

Table 3.3 Table of terms and their definitions used in this work

Table 4.1 Properties of strain designs

ACKNOWLEDGEMENTS

First and foremost, I want to thank Dr. Adam Feist for the initial idea of the project that eventually turned into this thesis. His insight into large-scale rational metabolic engineering using systems biology was present not only in his guidance for this project but also in his previously published works, which stimulated my thinking on how to tackle the problem at a visionary level.

I want to thank my advisor, Professor Bernhard Palsson, for the resources and opportunities he provided. This project could not have been realized without the foundation he has established in his lab and in the systems biology and metabolic engineering community as a whole. He has organized his lab across multiple disciplines and fostered collaboration, of which this project is one of the fruits.

I want to thank Dr. Zachary King for his close supervision, which helped keep the project on the right track. He provided much of the technical help and vision, without which the project would have stopped at a general workflow level without an elegant execution. He is also a great mentor who helped me develop my simulation expertise, critical thinking, and idea presentation skills.

On the modeling side, I want to thank Colton Lloyd and Dr. Laurence Yang. Their expert answers to the countless questions I asked shaped the core of the simulation side of this project.

I want to thank the COBRApy community for their contribution of an excellent and convenient toolbox for systems biology researchers. I want to thank everyone in the Systems Biology Research Group for all the stimulating discussions and presentations that I experienced during my stay, which have shaped my thinking.

Last but not least, I want to thank my parents for their infinite interest in and support for their children's education. My first seeds of knowledge and appreciation of academia were planted by my dad, without whom I would not be where I am today.

The thesis, including chapters 1-5, is adapted from a manuscript in preparation: Dinh, H. V., King Z. A., Palsson B. Ø., Feist A. M. Identification of growth-coupled and kinetic parameters robust production in strain designs using genome-scale models. The thesis author was the primary author of the manuscript and was responsible for the research.

ABSTRACT OF THE THESIS

Identification of growth-coupled and kinetically robust production in strain designs using genome-scale models

by

Hoang Viet Dinh

Master of Science in Bioengineering

University of California, San Diego, 2017

Professor Bernhard O. Palsson, Chair

Conversion of renewable biomass to useful molecules in microbial cell factories can be approached in a rational and systematic manner using constraint-based reconstruction and analysis. Filtering for high-confidence in silico designs is critical because in vivo construction and testing are expensive and time consuming. As such, a workflow was devised to analyze the robustness of growth-coupled production against variability in kinetic parameters by applying a genome-scale model of metabolism and macromolecular expression (ME-model). A collection of 2632 knockout designs in E. coli was evaluated by a workflow that removed 40 redundant knockouts and returned pools of 634 growth-coupled designs and 41 ME-model robust designs. The resulting knockout strategies revealed how enzyme efficiency and pathway tradeoffs can affect growth-coupled production phenotypes.

Chapter 1

Introduction

The chemical industry has relied on petroleum as a raw material for the last century (Sittig and Weil 1954). Recently, shifting to renewable biomass has gained interest in both industry and academia (United Nations Industrial Development Organization; Lee and Kim 2015). Microbial cell factories can be used in efficient bioprocesses to convert biomass to a wide array of useful products (Lee and Kim 2015). Microbial cell factories need to be designed to optimize production rate, yield, and titer because wild-type strains do not generally produce desirable molecules. For example, in the model bacterium Escherichia coli, CO2 is secreted aerobically and a mixture of fermentation products (ethanol, acetate, formate, D-lactate, and succinate) is secreted anaerobically (Clark 1989). Knocking out the default fermentation pathways is a common strategy that redirects raw material toward desirable products (King et al. 2017). Byproduct secretion during optimal cell growth, called growth-coupled production, is a desirable target for strain design because adaptive laboratory evolution can be used to enhance growth-coupled production by selecting populations with higher fitness (Fong et al. 2005; Zhang et al. 2007).

Fueled by advances in systems biology, synthetic biology, and evolutionary engineering, rational metabolic engineering has been applied successfully to design overproduction strains (Park and Lee 2008; Lee and Kim 2015; Davy et al. 2017). A systems biology approach that evaluates strain designs at genome scale takes into account multiple biological components and their interactions, which is necessary for predicting growth-coupled production of a target molecule. Genome-scale metabolic reconstructions, or models (GEMs), are collections of genetic and biochemical information from databases and the literature (Thiele and Palsson 2010). Constraint-based reconstruction and analysis (COBRA) can be used on GEMs to calculate metabolic flux distributions and predict genotype-phenotype relationships (Lewis et al. 2012). Optimization algorithms (Machado and Herrgård 2015; Maia et al. 2016) can be applied systematically to GEMs to identify strain designs that optimize cell factory strains. In recent years, models of metabolism and macromolecule expression (ME-models) have been reconstructed with additional biological constraints, allowing for more accurate predictions of overflow metabolism (O'Brien et al. 2013), membrane content (Liu et al. 2014), and by-product secretion (King et al. 2017).

Model-driven strain design and pathway prediction methods (Feist et al. 2010; Campodonico et al. 2014) provide a large number of strain designs, more than is currently feasible to test in vivo. Such strain designs are susceptible to alternative production that becomes growth optimal and replaces the target growth-coupled production under a different model reconstruction, especially under ME-models with their extra kinetic parameter inputs. Therefore, this work set out to build an efficient high-throughput workflow that filters 2632 E. coli designs from previous studies down to high-confidence robust designs suitable for expensive and time-consuming in vivo testing. Robust strain designs sustained growth-coupled production in the ME-model across a sampled region of effective turnover rates (keff) chosen to hamper production. After the workflow, 634 growth-coupled production designs were found, but only 41 designs maintained growth-coupled production under kinetic parameter sampling in the ME-model. Enzyme efficiency was shown to be a decisive factor for the presence of growth-coupled production in the ME-model. Robust strain designs that worked in both the M- and ME-models are highly recommended for in vivo testing and subsequent implementation.

Chapter 2

Genome-scale models: System biology calculation platform for rational metabolic engineering strain designs

2.1. Introduction

Genome-scale metabolic reconstructions, or models (GEMs), are collections of genetic and biochemical information from databases and the literature (Thiele and Palsson 2010). The models provide an organized way to store knowledge about organisms and to perform iterative studies that validate hypotheses and discover new information (O'Brien et al. 2015). For metabolic engineering studies, the centralization of all possible fermentation pathways in a genome-scale model is key to establishing growth-coupled production of target molecules under competition from other pathways.

One method that makes genome-scale models computable is constraint-based reconstruction and analysis (COBRA) (Schellenberger et al. 2011). For metabolic engineering specifically, a GEM with COBRA calculations provides theoretical yields, effects of gene deletions, biomass requirements (King et al. 2015), and strain design suggestions via algorithms (Machado and Herrgård 2015). For a growth-coupled production strain, which can be suggested by a search algorithm, it is crucial that the strain is able to both grow and overproduce the target molecule.

Growth-coupled strain designs were collected from previous studies that used search algorithms implemented in the iAF1260b and iJO1366 M-models of Escherichia coli K-12 MG1655 (Feist et al. 2007; Feist et al. 2010; Orth et al. 2011). M-model simulations to validate growth-coupled production for a strain design were performed on the model in which the strain design was found. The newest E. coli ME-model, iLE1678-ME (Lloyd et al. 2017), was used for design robustness evaluation. The ME-model explicitly calculates the cost of the machinery supporting a metabolic flux state of the kind calculated in the M-model, and accounts for both the metabolic flux distribution and the machinery cost in optimization.

2.2. Model and design modifications

The following modifications (Table 2.1) were made to the models and designs because the original settings caused erratic behavior in in silico simulation; literature evidence and explanations are provided to support each change.

Table 2.1. Model and design modifications

Modification | Explanation
1. Remove glycogen synthesis pathway knockout (PGMT) | Glycogen is required for growth in the ME-model. Glycogen contributed a minuscule amount to usage, thus the PGMT knockout was redundant.
2. Remove phosphoglucose mutase knockout (PGM) | PGM knockout is lethal in vivo (Foster et al. 2010). This knockout is viable in silico but results in a very low growth rate (0 to 0.11 h^-1).
3. Remove transketolase 2 knockout (TKT2) | tktAB expresses the enzyme that catalyzes both the TKT1 and TKT2 reactions, but the algorithm suggests only TKT2. Knocking out both is lethal both in vivo (Josephson and Fraenkel 1969; Josephson and Fraenkel 1974) and in silico.
4. Remove fructose-6-phosphate aldolase knockout (F6PA) | See Table 3.1, Nonphysiological pathway shutdowns, item #1.
5. Turn off all lipid exchange reactions (EX_acolipa_e, EX_colipa_e, EX_colipap_e, EX_eca4colipa_e, EX_enlipa_e, EX_hacolipa_e, EX_halipa_e, EX_kdo2lipid4_e, EX_lipa_e, EX_lipa_cold_e, EX_o16a4colipa_e) | Lipid metabolites are assigned to the external compartment (‘e’). However, those metabolites are integral to the membrane and do not float or diffuse freely outside the cell. In simulation, lipids were found to be growth-coupled and to diffuse freely to the outside via exchange reactions.
6. Turn off L-valine exchange reaction (EX_val__L_e) | L-Alanine and L-valine are a growth-coupled production pair, i.e. their sum is growth-coupled. The L-valine exchange reaction was also shut down when L-alanine strain designs were searched (Feist et al. 2010).

Chapter 3

Growth-coupled production evaluation

3.1. Cell growth simulation

Flux balance analysis (FBA) was used to simulate feasible flux distributions. Cell growth was simulated by maximizing the biomass objective function in the M-model and biomass dilution in the ME-model. FBA was performed with the COBRApy (Ebrahim et al. 2013) software package in Python 2 (Python Software Foundation, Beaverton, OR, USA). For the M-model, linear optimization problems were solved by the Gurobi solver (Gurobi Optimization Inc., Houston, TX, USA). For ME-model calculations, the linear optimization sub-problems within the binary search were solved by the solveME (Yang et al. 2016) software package and the Quad MINOS solver (Ma and Saunders 2015). ME-model simulation results were analyzed using the COBRAme toolbox (Lloyd et al. 2017).

Substrate uptake rate was set to 20 mmol gDW^-1 h^-1. Oxygen uptake rate was set to 20 mmol gDW^-1 h^-1 for aerobic conditions and 0 for anaerobic conditions. Reaction knockouts were simulated by constraining the flux of the corresponding reactions to zero. Some nonphysiological pathways were found to carry flux in silico despite being unlikely to be active in vivo; detailed explanations and their in silico implementation are laid out in Table 3.1.

Table 3.1. Nonphysiological pathway shutdowns

Shutdown | Explanation
1. Turn off fructose 6-phosphate aldolase (F6PA) | In simulation, this reaction was found to carry high flux in the C-C bond breaking direction, neither of which has been shown in vivo or in vitro. Despite the discovery of the enzyme (Schürmann and Sprenger 2001), no in vivo over-expression or pathway activation study has been performed. Existing studies have used this enzyme as a C-C bond forming aldolase (Samland and Sprenger 2006). In terms of enzyme activity, the KM for the hexose fructose 6-phosphate (f6p) is very large (in µM: dha = 35000, g3p = 800, f6p = 9000) compared to glycolysis alternatives such as pgi (in µM: glucose 6-phosphate (g6p) = 1018, f6p = 78; EcoCyc) and zwf (in µM: g6p = 174; EcoCyc), all three of which use hexoses as substrates.
2. Turn off purine nucleotide phosphorylase (PUNP1/2/3/4/5/6, DURIPP) | In simulation, the connected nucleotide synthesis-salvage pathway acts as glycolysis, making acetaldehyde and g3p. A transcriptome analysis study indicated that deoD is repressed during glucose uptake (Gosset et al. 2004). Reactions containing deoD in their GPR were shut down, which shuts down the pathway.
3. Turn off L-allo-threonine dehydrogenase’s L-allo-threonine forming direction (ATHRDHr.lower_bound) | A high KM value for NADP+ (Fujisawa et al. 2003) suggests the reaction acts in the L-allo-threonine breaking direction.
4. Turn off reverse beta-oxidation (ACACT1r.lower_bound) | This pathway is active in silico as a design failure mode producing hexanoate. This pathway was shut down because (1) hexanoate is not a target of the strain designs and (2) the pathway is not likely to occur in normal conditions and requires extensive engineering effort based on the literature (Dellomonaco et al. 2011).
5. Turn off malate oxidase via oxygen (MOX) | This reaction carried high flux in simulation. Based on KEGG, the MOX reaction (EC number: 1.1.3.3) was outdated (Kanehisa and Goto 2000) and was redirected to malate oxidase via quinone instead.
6. Turn off glycolate dehydrogenase (NAD) (GLYCLTDx) | This reaction carries high flux in ME-model simulation due to its high enzyme efficiency, yet based on the literature this version of the reaction has low activity (Nuñez et al. 2001).
7. Turn off pyruvate synthase (with flavodoxin)’s pyruvate forming direction (POR5.lower_bound) | In simulation, this pathway recycled acetyl-CoA, made pyruvate, and formed a loop with pyruvate-formate overproducing formate at extremely high flux (~70-90 mmol gDW^-1 h^-1). The pyruvate-forming direction of pyruvate synthase with flavodoxin can only be activated under oxidative stress (specifically methyl viologen in the literature) (Eremina et al. 2010; Nakayama et al. 2013).
8. Turn off L-serine degradation (TRPAS2.upper_bound, SERD_D, SERD_L) | L-serine degradation was found to carry high flux and act as a lower glycolysis alternative. The following genes were repressed during catabolism: tnaA, dsdA, tdcG, and sdaB (Shao and Newman 1993; Sawers 2001; Gosset et al. 2004). The corresponding reactions were shut down.
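The shutdown notation used in Table 3.1 (a bare reaction ID for a full shutdown, or an ID followed by `.lower_bound` or `.upper_bound` for one direction) can be illustrated with a minimal sketch. The dictionary of flux bounds and the helper `apply_shutdowns` are hypothetical stand-ins introduced for illustration only; they are not the COBRApy objects used in this work, and the bound values are made up.

```python
# Hypothetical helper: apply knockouts and directional shutdowns to a table of
# flux bounds. The dict-based "model" is an illustrative stand-in, not COBRApy.

def apply_shutdowns(bounds, shutdowns):
    """Return a copy of `bounds` with each shutdown applied.

    bounds    -- dict mapping reaction ID -> (lower_bound, upper_bound)
    shutdowns -- iterable of strings such as "F6PA" (block both directions),
                 "POR5.lower_bound" (block negative flux only), or
                 "TRPAS2.upper_bound" (block positive flux only)
    """
    new_bounds = dict(bounds)
    for entry in shutdowns:
        reaction_id, _, side = entry.partition(".")
        lower, upper = new_bounds[reaction_id]
        if side == "":
            lower, upper = 0.0, 0.0      # full knockout: flux forced to zero
        elif side == "lower_bound":
            lower = 0.0                  # no flux in the reverse direction
        elif side == "upper_bound":
            upper = 0.0                  # no flux in the forward direction
        else:
            raise ValueError("unknown bound specifier: " + side)
        new_bounds[reaction_id] = (lower, upper)
    return new_bounds


# Toy bounds in mmol gDW^-1 h^-1 (values are assumptions for illustration).
toy_bounds = {
    "F6PA": (-1000.0, 1000.0),
    "POR5": (-1000.0, 1000.0),
    "TRPAS2": (-1000.0, 1000.0),
}
closed = apply_shutdowns(
    toy_bounds, ["F6PA", "POR5.lower_bound", "TRPAS2.upper_bound"]
)
```

Returning a modified copy keeps the unmodified bounds available, so the same starting point can be reused for each design.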

3.2. Growth-coupled production

After implementing design conditions and knockouts in simulations, cell growth was maximized. In the M-model, target molecule production was then minimized at the maximum growth rate to account for possible alternative molecule production. In the ME-model, by contrast, each enzyme has its own cost and turnover rate, so the metabolic flux distribution at the maximum growth rate is unique and growth-coupled production was taken directly from the solution, although the production rate can also be minimized in each linear optimization iteration. The growth-coupled production level was evaluated as a carbon yield given by the following equation, where v_product and v_substrate are the target molecule growth-coupled production rate and the substrate uptake rate in mmol gDW^-1 h^-1, respectively, and numC_product and numC_substrate are the numbers of carbon atoms in the target molecule and the substrate molecule, respectively:

$$y_{product} = \frac{v_{product} \cdot numC_{product}}{v_{substrate} \cdot numC_{substrate}} \quad \left(\frac{\text{mmol carbon in product}}{\text{mmol carbon in substrate}}\right)$$

A design was considered growth-coupled if its flux through the biomass objective function was greater than 0.1 h^-1 and the carbon yield of its target molecule production was more than 10%. A list of valid byproducts based on previous work and their growth-coupled production status is given in Table 3.2. Production was minimized in the M-model to account for possible alternative solutions of the underdetermined linear optimization. In the ME-model, optimal solutions were unique because of the different costs and turnover rates of the enzymes.
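As a worked illustration of the carbon yield and the two cutoffs above, the calculation can be sketched as follows. The function names and the example fluxes are assumptions made for illustration, not values from this study.

```python
def carbon_yield(v_product, v_substrate, num_c_product, num_c_substrate):
    """Carbon yield: mmol carbon in product per mmol carbon in substrate."""
    if v_substrate <= 0 or num_c_substrate <= 0:
        raise ValueError("substrate uptake and carbon count must be positive")
    return (v_product * num_c_product) / (v_substrate * num_c_substrate)


def is_growth_coupled(growth_rate, min_yield,
                      growth_cutoff=0.1, yield_cutoff=0.10):
    """Apply the two cutoffs from the text: growth > 0.1 h^-1, yield > 10%.

    `min_yield` is the carbon yield at the minimized production rate.
    """
    return growth_rate > growth_cutoff and min_yield > yield_cutoff


# Made-up example: a 6-carbon substrate taken up at 20 mmol gDW^-1 h^-1 and a
# 2-carbon product secreted at 30 mmol gDW^-1 h^-1.
y = carbon_yield(v_product=30.0, v_substrate=20.0,
                 num_c_product=2, num_c_substrate=6)
coupled = is_growth_coupled(growth_rate=0.3, min_yield=y)
```

In this made-up case 60 of the 120 mmol of substrate carbon end up in the product, so the yield is 0.5 and both cutoffs are met.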

Table 3.2. Molecules growth-coupled production status

Molecule Growth-coupled production status

1-Butanol Robust

Isobutanol Non-robust

1-Propanol Non-robust

2-Propanol Non-robust

Ethanol Non-robust

1,4-Butanediol Robust

2,3-Butanediol Non-robust

1,3-Propanediol Non-robust

Glycerol Non-robust

3-Hydroxyvalerate No growth-coupled

3-(R)-Hydroxybutyrate No growth-coupled

D-Lactate Robust

Acrylate Non-robust

Acrylamide Non-robust

2-Oxopentanoate No growth-coupled

3-Methyl-2-oxobutanoate No growth-coupled

3-Hydroxypropanoate Non-robust

Fumarate Non-robust

2-Oxoglutarate Robust

Pyruvate Robust

L-Alanine Non-robust

L-Glutamate No growth-coupled

L-Malate No growth-coupled

L-Serine No growth-coupled

2-Phenylethanol No designs reported

2,3-Propanediol No designs reported

3-Methyl-1-butanol No designs reported

2-Methyl-1-butanol No designs reported

4-Hydroxybutyrate No designs reported

Designs were not evaluated in this work due to low activity of formate-hydrogen lyase (Feist et al. 2010).

3.3. Removal of redundant knockouts

A procedure was implemented to scan through each strain design and remove excess knockouts in order to minimize the time needed for in vivo strain construction. Specifically, a knockout was considered redundant and was removed from a design if removing it did not reduce any design quality by more than 1%. Three design qualities, which were modified versions of those from Feist et al. 2010, were examined: carbon yield, which could not drop by more than 1% carbon yield; maximum growth rate without target molecule production (μmax,Remove Production or μmax,RP), which could not drop by more than 1% of its original value; and substrate-specific productivity (SSP), which could not drop by more than 1% of its original value. Knockout candidates in each design were iteratively removed and the design's growth-coupled production was reevaluated (Figure 3.1).

Figure 3.1. Iterative workflow to remove redundant knockouts, i.e. knockouts whose removal did not reduce the growth-coupled production qualities by more than 1%.
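The iterative removal can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in this work: `evaluate` is a hypothetical callback that returns the three qualities for a set of knockouts, the 1% rule is read as an absolute 0.01 for carbon yield and a relative 1% for μmax,RP and SSP, and the toy reaction IDs and quality values are made up.

```python
def qualities_maintained(original, candidate, tol=0.01):
    """Assumed reading of the 1% rule: carbon yield may drop by at most 0.01
    (absolute); mu_max_rp and ssp may drop by at most 1% of the original."""
    return (
        candidate["carbon_yield"] >= original["carbon_yield"] - tol
        and candidate["mu_max_rp"] >= original["mu_max_rp"] * (1 - tol)
        and candidate["ssp"] >= original["ssp"] * (1 - tol)
    )


def remove_redundant_knockouts(knockouts, evaluate):
    """Drop knockouts one at a time while the design qualities are maintained.

    evaluate -- hypothetical callable: frozenset of knockouts -> dict with
                keys "carbon_yield", "mu_max_rp", and "ssp"
    """
    original = evaluate(frozenset(knockouts))
    kept = list(knockouts)
    changed = True
    while changed:
        changed = False
        for knockout in list(kept):
            trial = [k for k in kept if k != knockout]
            if qualities_maintained(original, evaluate(frozenset(trial))):
                kept = trial          # knockout was redundant; restart scan
                changed = True
                break
    return kept


def toy_evaluate(knockout_set):
    """Toy stand-in for a simulation: only two knockouts affect the yield."""
    useful = {"ACKr", "LDH_D"}
    y = 0.25 * len(useful & knockout_set)
    return {"carbon_yield": y, "mu_max_rp": 0.2, "ssp": y * 0.4}


minimal = remove_redundant_knockouts(["ACKr", "PGMT", "LDH_D"], toy_evaluate)
```

Comparing every trial against the original design, rather than against the previous trial, prevents small losses from accumulating over several removals.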

Substrate-specific productivity (SSP_product) was calculated by the following formula, where y_product is the carbon yield and μmax is the maximum growth rate:

$$SSP_{product} = y_{product} \cdot \mu_{max} \quad (\text{carbon yield } h^{-1})$$

In Feist et al. 2010, the strength of growth coupling was originally calculated as the square of the product yield divided by the production-growth slope, which is visualized in Figure 3.2. The production-growth slope (PS) was calculated by the following formula, where v_product is the production rate, μmax is the maximum growth rate, and μmax,RP is the maximum growth rate when the product exchange reaction is removed:

$$PS = \frac{v_{product}}{\mu_{max} - \mu_{max,RP}}$$

Because v_product was checked as the first criterion and was therefore maintained in subsequent checks, and μmax was constant, μmax,RP was checked instead of PS.
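The two quantities can be written as small functions to make the dependence on μmax,RP explicit. The function names and the numbers below are illustrative assumptions, not results of this work.

```python
def substrate_specific_productivity(carbon_yield, mu_max):
    """SSP = carbon yield times maximum growth rate (carbon yield per hour)."""
    return carbon_yield * mu_max


def production_slope(v_product, mu_max, mu_max_rp):
    """PS = v_product / (mu_max - mu_max_rp)."""
    delta = mu_max - mu_max_rp
    if delta <= 0:
        raise ValueError("mu_max must exceed mu_max_rp for a finite slope")
    return v_product / delta


# Made-up values: yield 0.5, mu_max 0.5 h^-1, mu_max_rp 0.25 h^-1, and a
# production rate of 30 mmol gDW^-1 h^-1.
ssp = substrate_specific_productivity(0.5, 0.5)
ps = production_slope(30.0, 0.5, 0.25)
```

With v_product and μmax held fixed, PS changes only through μmax,RP, which is why tracking μmax,RP alone is sufficient in the knockout removal step.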

Figure 3.2. Visualization of growth-coupled production slope. The more suboptimal competitive production is removed, the more the suboptimal growth rate during alternative production (μmax,RP) shifts and the lower the production slope becomes.

3.4. Robustness evaluation of growth-coupled production with kinetic parameter sampling in Metabolic-Expression model

Enzyme turnover rates were set as constants prior to ME-model simulation; the enzyme cost of supporting a reaction flux is inversely proportional to the turnover rate. In this study, kinetic parameters were decreased in the target molecule production pathway and/or increased in the competitive molecule production pathway, either of which could render the target growth-coupled production pathway suboptimal. A strain design was considered robust if these changes did not lead to cell survival without target molecule production.

Possible turnover rates (keff) for enzymes were taken to range from 10^-2 to 10^6 s^-1, based on all enzyme turnover rates in the BRENDA database (Bar-Even et al. 2011). Enzyme turnover rate magnitudes were sampled between 10^-2 s^-1 and the default value for the target pathway, and between the default value and 10^6 s^-1 for the competitive pathway. The competitive molecule was identified in the ME-model by re-simulation after removing the exchange reaction of the target molecule from the model. The sampled space was discretized into points; at each point the pathways' turnover rate magnitudes were adjusted and the ME-model was re-simulated. If any simulation returned survival without target molecule production, the design was not robust.
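One way to picture the discretization is as a log-spaced grid over the two pathway magnitudes. The sketch below is an assumption-laden illustration: the helper names, the default keff of 100 s^-1, and the choice of three points per axis (which yields nine grid points) are all hypothetical and not taken from the workflow code.

```python
import math

KEFF_MIN = 1e-2   # s^-1, lower end of the range quoted in the text
KEFF_MAX = 1e6    # s^-1, upper end of the range quoted in the text


def log_spaced(start, stop, n_points):
    """Return n_points values evenly spaced in log10 from start to stop."""
    if n_points < 2:
        raise ValueError("need at least two points")
    log_start, log_stop = math.log10(start), math.log10(stop)
    step = (log_stop - log_start) / (n_points - 1)
    return [10 ** (log_start + i * step) for i in range(n_points)]


def sample_grid(default_target, default_competitive, n_points=3):
    """Grid of (target keff, competitive keff) pairs.

    The target pathway is sampled from its default down to KEFF_MIN and the
    competitive pathway from its default up to KEFF_MAX.
    """
    target_values = log_spaced(default_target, KEFF_MIN, n_points)
    competitive_values = log_spaced(default_competitive, KEFF_MAX, n_points)
    return [(t, c) for t in target_values for c in competitive_values]


# Hypothetical default keff of 100 s^-1 for both pathways.
grid = sample_grid(default_target=100.0, default_competitive=100.0)
```

Spacing the points in log10 rather than linearly spreads them evenly over the eight orders of magnitude covered by the keff range.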

Solving for the maximum growth rate in the ME-model required a binary search; each step fixed the growth rate, at which point the ME-model becomes linear and can be solved by linear optimization. The binary search solution tolerance was set to 10^-6 h^-1. Turnover rates (keffs) were set before the binary search, and different keffs could cause growth-coupled production to fail. The turnover rate magnitudes of the target pathway and the competitive pathway make up a two-dimensional space. Centered at the default average values, the quadrant that increases keff for the competitive pathway and decreases keff for the target pathway was sampled. The quadrant spanned from the default magnitudes to 10^6 s^-1 in the competitive pathway dimension and to 10^-2 s^-1 in the target pathway dimension. The sampled space was initially discretized into 9 points (Figure 3.3).
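The growth-rate binary search mentioned above can be sketched generically. This is not the solveME implementation: `is_feasible` is a hypothetical callback standing in for the linear optimization solved at a fixed growth rate, the upper bracket of 2.0 h^-1 is an assumption, and feasibility is assumed to hold for every growth rate up to the optimum and for none above it.

```python
def max_feasible_growth(is_feasible, mu_low=0.0, mu_high=2.0, tol=1e-6):
    """Bisection on growth rate.

    is_feasible -- hypothetical callable: growth rate (h^-1) -> bool, True
                   when the fixed-growth-rate linear problem has a solution
    Returns the largest growth rate found to be feasible (within `tol` of
    the true maximum), or None when even mu_low is infeasible.
    """
    if not is_feasible(mu_low):
        return None                      # no growth is possible
    if is_feasible(mu_high):
        return mu_high                   # optimum lies at or above the bracket
    while mu_high - mu_low > tol:
        mid = 0.5 * (mu_low + mu_high)
        if is_feasible(mid):
            mu_low = mid                 # optimum is at or above mid
        else:
            mu_high = mid                # optimum is below mid
    return mu_low


# Toy oracle: pretend every growth rate up to 0.4321 h^-1 is feasible.
toy_optimum = 0.4321
mu = max_feasible_growth(lambda growth: growth <= toy_optimum)
```

Each halving of the bracket costs one feasibility check, so a starting bracket of 2.0 h^-1 and a tolerance of 10^-6 h^-1 require about 21 checks per set of keffs.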

Figure 3.3. Sampled magnitude space for target and competitive pathways’ kinetic parameters. Sampled space was initially discretized into 9 points. A single point with cell survival without growth-coupled production was sufficient to declare the design non-robust.

The possible outcomes are shown schematically in Figure 3.4. If a strain was found to survive without growth-coupled production at any point, the strain was considered not robust. If no survival without growth-coupled production was found among the 9 points, a higher-resolution discretization was performed (Figure 3.5).
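The decision rule can be expressed compactly. In the sketch below the labels, function names, and example values are illustrative assumptions, and the 0.1 h^-1 growth cutoff and 10% yield cutoff from Section 3.2 are assumed to carry over to the ME-model sampling.

```python
GROWTH_COUPLED = "growth-coupled"
SURVIVAL_NO_PRODUCTION = "survival without production"
NO_GROWTH = "no growth"


def classify_point(growth_rate, carbon_yield,
                   growth_cutoff=0.1, yield_cutoff=0.10):
    """Label one sampled keff point from its simulated phenotype."""
    if growth_rate <= growth_cutoff:
        return NO_GROWTH
    if carbon_yield > yield_cutoff:
        return GROWTH_COUPLED
    return SURVIVAL_NO_PRODUCTION


def is_robust(point_results):
    """A design is robust when no sampled point survives without production.

    point_results -- iterable of (growth_rate, carbon_yield) pairs, one per
                     sampled set of keffs
    """
    return all(
        classify_point(mu, y) != SURVIVAL_NO_PRODUCTION
        for mu, y in point_results
    )


# Made-up phenotypes: the first design only shows coupling or cell death,
# the second has one point that grows without producing the target.
robust_design = [(0.5, 0.4), (0.0, 0.0), (0.3, 0.2)]
fragile_design = [(0.5, 0.4), (0.3, 0.0)]
```

The same function can be applied to the points of a finer grid, so the higher-resolution pass only changes the input, not the rule.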

Figure 3.4. Status of kinetic parameters sampling with discretization. Red glows were non-robust, green glows were robust, and no glows were inconclusive. A nine-point status circled in red was suspected to have a region of survival without target growth-coupled production (red band) between the growth-coupled (green band) and cell death (black band) regions, thus higher-resolution sampling was run, as visualized in Figure 3.5.

Figure 3.5. High resolution discretization with binary search

3.5. Term definitions

Definitions of the terms used in the filtration workflow procedure and results are laid out in Table 3.3.

Table 3.3. Table of terms and their definitions used in this work.

Term | Definition
Carbon yield | Yield calculated as the fraction of carbon from the substrate converted into the target molecule (mmol carbon in product / mmol carbon in substrate).
Growth-coupled production | A design has growth-coupled production, or a design is growth-coupled, when the carbon yield of its minimum target molecule production is more than 10% at maximum growth rate.
Growth-coupled qualities | The qualities are carbon yield, substrate-specific productivity (SSP), and maximum growth rate without target molecule production (μmax,RP). SSP and μmax,RP are modified versions of SSP and strength-of-coupling from Feist et al. 2010, respectively.
Redundant knockouts | Knockouts whose removal from the design does not decrease carbon yield by 1%, SSP by 1% of the original value, or μmax,RP by 1% of the original value.
Duplicates | Designs with the exact same set of knockouts, substrate, oxygenation state, target molecule, and heterologous pathway are duplicates. Only one among the duplicates goes through the filter.
Turnover rate (keff) (in ME-model) | Turnover rate (keff) is a parameter associated with each enzyme in the ME-model. A set of constant keffs is set prior to each ME-model simulation. Different input sets of keffs can produce different phenotypes at the optimal growth rate.
Robust and non-robust (in ME-model) | Designs for which, at every sampled set of keffs, the cell either survives with the target molecule produced (growth-coupled production) or dies are robust. Designs for which the cell survives without the target molecule produced for at least one sampled set of keffs are non-robust.

Chapter 4

Results

4.1. Integrated strain design search and evaluation workflow

A workflow was developed to identify robust growth-coupled strains using both M- and ME-models of E. coli K-12 MG1655. Significant target molecule production at maximal growth in M-model simulation, represented by a cutoff of 10% carbon yield, was considered growth-coupled production. Minimum production was used to represent target production in order to account for possible alternative production. The 10% cutoff was used in the evaluation workflow to filter for strain designs with significant production and to prevent strains with minuscule non-zero production from passing. Target production retained after production minimization was deemed growth-coupled in the M-model. The ME-model's macromolecular expression calculation made it possible to compute the enzyme cost of sustaining a metabolic flux state, and a returned solution with target production indicates that the design's growth-coupled production is enzymatically efficient. Changing turnover rates caused some designs to fail, and designs that passed the test were therefore labeled as robust. Designs that passed the successive evaluation steps were believed to have a higher confidence of working in vivo because their growth-coupled production was robust under the ME-model reconstruction and, additionally, under adversely sampled keffs.

The strain design pools collected from previous studies were generated in the context of M-models (Figure 4.1); generating them with the ME-model was not possible due to its non-linearity. To continue taking advantage of the M-model's simplicity, the steps were arranged to minimize computational resources: M-model calculations were performed before ME-model calculations, and single-pass steps were performed before iterative processes. Because previous studies used different M-models, each strain design was evaluated with the M-model in which it was found. Thus strain designs found in Feist et al. 2010 were evaluated in iAF1260b and strain designs found in iJO1366 were evaluated in iJO1366. In this study, a design was defined as a unique combination of knockout set, target molecule, substrate, oxygenation state, and heterologous pathway inserted (or no heterologous pathway). Through the workflow, 2632 designs from the previous studies were reduced to 634 growth-coupled designs in the M-model and 41 robust designs in the ME-model (Figure 4.2).
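The definition of a unique design translates directly into a lookup key for removing duplicates. The sketch below is illustrative only: the dictionary layout, field names, and toy designs are assumptions, not the data structures used in this work.

```python
def design_key(design):
    """Key built from the five properties that define a unique design.

    The knockout set is stored as a frozenset so that knockout order does
    not matter; a missing heterologous pathway is represented by None.
    """
    return (
        frozenset(design["knockouts"]),
        design["target"],
        design["substrate"],
        design["oxygenation"],
        design.get("heterologous_pathway"),
    )


def drop_duplicates(designs):
    """Keep the first design seen for each key, preserving input order."""
    seen = set()
    unique = []
    for design in designs:
        key = design_key(design)
        if key not in seen:
            seen.add(key)
            unique.append(design)
    return unique


# Toy designs: the second repeats the first with the knockouts reordered.
toy_designs = [
    {"knockouts": ["ACKr", "LDH_D"], "target": "ethanol",
     "substrate": "glucose", "oxygenation": "anaerobic"},
    {"knockouts": ["LDH_D", "ACKr"], "target": "ethanol",
     "substrate": "glucose", "oxygenation": "anaerobic"},
    {"knockouts": ["ACKr", "LDH_D"], "target": "ethanol",
     "substrate": "xylose", "oxygenation": "anaerobic"},
]
unique_designs = drop_duplicates(toy_designs)
```

Removing duplicates before any simulation is a cheap single-pass step, in line with the ordering of the workflow from inexpensive to expensive calculations.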

Based on the formulation of the strain design search algorithms, the number of reactions knocked out did not affect the optimality of a strain design. For example, when no knockout penalty is implemented, a set of 3 useful knockouts is an equally optimal solution to that set plus an additional knockout of a reaction carrying no flux. Specifically, for the design pool collected from previous work, the algorithms were run without a knockout penalty for OptKnock with 3 and 5 knockouts in Feist et al. 2010 and for RobustKnock with 3 knockouts and GDLS with 4 knockouts in Campodonico et al. 2014; only OptGene with 10 knockouts in Feist et al. 2010 had a knockout penalty implemented. Additionally, after implementing the model modifications (Table 2.1), some knockouts might become obsolete. To address this issue, a knockout filtration step was added to remove knockouts that did not significantly improve growth-coupled qualities; this step removed 40 knockouts from the collected design pool.

Figure 4.1. Integrated model-driven strain design search and evaluation workflow using Metabolic and Metabolic-Expression models

Figure 4.2. Filtration workflow results. A pool of 2632 designs was collected from previous studies; among them, 634 were growth-coupled and 41 were kinetically robust growth-coupled. Forty redundant knockouts were removed. Designs were filtered for significant growth-coupled production (>10% carbon yield) and subsequently for robustness, i.e. the ability to maintain growth-coupled production under various sets of sampled keffs.

4.2. Growth-coupled production diversity in Escherichia coli

Using the workflow, growth-coupled production of 19 molecules was found; their growth-coupled yields, knockouts, and numbers of designs are presented in Table 4.1. The majority of designs produced alcohols (69% of growth-coupled designs), which accept electrons and recycle the reduced cofactors generated by glycolysis. D-lactate and L-alanine were also terminal electron acceptors. 3-(R)-Hydroxybutyrate (with the exception of an anaerobic design co-producing ethanol), acrylate, acrylamide, and 3-hydroxypropanoate, while being electron acceptors, required oxygen as a co-acceptor. Some of the 1,4-butanediol designs also required oxygen as a co-acceptor. TCA molecules (fumarate, succinate, and 2-oxoglutarate) were co-produced with alcohols. Pyruvate is not a terminal electron acceptor and required oxygen as one (Figure 4.3). Growth-coupled designs were found for all substrates (49% of designs used glucose, 33% xylose, and 18% glycerol) and all oxygenation states (40% of designs were aerobic, 24% ECOM, and 35% anaerobic), although glycerol and anaerobic conditions favored alcohol designs (89% of glycerol designs and 80% of anaerobic designs produced alcohols). Most of the designs in the ECOM condition, due to its “aerobic fermentation” feature, produced molecules that required oxygen as an electron co-acceptor (80% of ECOM designs utilized oxygen as an electron co-acceptor) (Figure 4.4).

Table 4.1. Properties of strain designs. (GC = growth-coupled)

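Max carbon yield is given in mmol C in product/mmol C in substrate. Empty cells are empty in the original table.

| Target molecule | Designs found (GC) | Designs found (Robust) | Max carbon yield (GC) | Max carbon yield (Robust) | Knockouts range (GC) | Knockouts range (Robust) |
|---|---|---|---|---|---|---|
| 1-Butanol | 76 | 6 | 0.61 | 0.61 | 1-3 | 2-3 |
| Isobutanol | 17 | | 0.63 | | 1-3 | |
| 1-Propanol | 58 | | 0.55 | | 1-4 | |
| 2-Propanol | 10 | | 0.41 | | 2-4 | |
| Ethanol | 55 | | 0.62 | | 1-10 | |
| 1,4-Butanediol | 191 | 17 | 0.61 | 0.54 | 1-4 | 2-4 |
| 2,3-Butanediol | 2 | | 0.44 | | 3-3 | |
| 1,3-Propanediol | 28 | | 0.20 | | 2-4 | |
| Glycerol | 1 | | 0.34 | | 3-3 | |
| 3-Hydroxyvalerate | 0 | | | | | |
| 3-(R)-Hydroxybutyrate | 9 | | 0.60 | | 2-4 | |
| D-Lactate | 63 | 6 | 0.96 | 0.95 | 2-10 | 5-10 |
| Acrylate | 33 | | 0.50 | | 1-4 | |
| Acrylamide | 14 | | 0.50 | | 2-4 | |
| 2-Oxopentanoate | 0 | | | | | |
| 3-Methyl-2-oxobutanoate | 0 | | | | | |
| 2-Oxobutanoate | 0 | | | | | |
| 3-Hydroxypropanoate | 29 | | 0.80 | | 2-4 | |
| Fumarate | 1 | | 0.17 | | 3-3 | |
| Succinate | 12 | 4 | 0.60 | 0.51 | 4-10 | 5-10 |
| 2-Oxoglutarate | 1 | 1 | 0.12 | 0.14 | 5-5 | 5-5 |
| Pyruvate | 30 | 9 | 0.84 | 0.81 | 2-5 | 3-5 |
| L-Alanine | 4 | | 0.95 | | 3-5 | |
| L-Glutamate | 0 | | | | | |
| L-Malate | 0 | | | | | |
| L-Serine | 0 | | | | | |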
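The carbon yield reported in Table 4.1 (mmol C in product per mmol C in substrate) and the >10% significance filter can be written out as a short calculation. The sketch below is illustrative: the function names are assumptions and the flux values are hypothetical numbers chosen for the arithmetic, not simulation results from this study.

```python
def carbon_yield(product_flux, product_carbons, substrate_uptake, substrate_carbons):
    """Carbon yield in mmol C in product per mmol C in substrate.

    Fluxes must share the same units (e.g. mmol/gDW/h); the carbon counts
    are the number of carbon atoms per molecule.
    """
    if substrate_uptake <= 0:
        raise ValueError("substrate uptake must be positive")
    return (product_flux * product_carbons) / (substrate_uptake * substrate_carbons)


def is_significant(yield_value, threshold=0.10):
    """Apply the >10% carbon yield filter used for growth-coupled designs."""
    return yield_value > threshold


# Hypothetical example: D-lactate (3 carbons) produced from glucose (6 carbons).
example_yield = carbon_yield(product_flux=19.2, product_carbons=3,
                             substrate_uptake=10.0, substrate_carbons=6)
print(round(example_yield, 2), is_significant(example_yield))
```

Expressing yield per carbon rather than per mole makes products of different sizes, and substrates with different carbon counts such as glucose, xylose, and glycerol, directly comparable.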

Figure 4.3. Growth-coupled production pathway map. Target molecules, substrates, reduced cofactors (e.g. NADH, NADPH), ATP, important branch points (e.g. pyruvate, acetyl-CoA), and molecules that contribute to carbon loss (e.g. formate, CO2) are annotated. Several pathways, such as the pentose phosphate and Entner-Doudoroff pathways, are omitted to reduce the figure complexity arising from network interconnectivity and to emphasize production pathways.

Figure 4.4. Growth-coupled production of target molecules in specific conditions. Oxygenation states included aerobic (A), ECOM (E), and anaerobic (N).

4.3. Knockout prediction

Knockout predictions by existing algorithms were consistent with observations from the literature mining study (King et al. 2017), such as common knockouts blocking the wild-type fermentation pathways that produce formate (PFL), lactate (LDH_D), ethanol (ACALD, ALCD2x), acetate (ACKr), and succinate (MDH, FUM, FRD3). Sixty-four knockouts suggested by the optimization-based strain design search algorithms contributed to robust growth-coupled designs. Reactions around pyruvate were highly targeted, with pyruvate-formate lyase (PFL) targeted the most. Electron transport chain reactions were the second most targeted group; interestingly, ATP synthase was the second most frequent knockout and was present in strain designs regardless of oxygenation condition (ATPS4rpp, Figure 4.5B).
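A ranking of the kind described above can be obtained by tallying how many designs contain each knockout. The sketch below is illustrative only; the function name and the three toy designs are assumptions and do not reproduce the design pool of this study.

```python
from collections import Counter


def knockout_frequency(designs):
    """Count in how many designs each knockout appears.

    `designs` is an iterable of knockout collections; each knockout is
    counted at most once per design.
    """
    counts = Counter()
    for knockouts in designs:
        counts.update(set(knockouts))
    return counts


# Hypothetical robust designs, each given as a set of reaction IDs.
toy_designs = [
    {"PFL", "ATPS4rpp"},
    {"PFL", "LDH_D"},
    {"PFL", "ATPS4rpp", "ACKr"},
]
frequency = knockout_frequency(toy_designs)
ranked = frequency.most_common()  # most frequently targeted reactions first
print(ranked[:2])
```

Grouping the same counts by target molecule would give the per-molecule breakdown shown in Figure 4.5B.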

Most of the knockouts were involved in redirecting flux toward target fermentation pathways around branch points such as pyruvate (e.g. pyruvate-formate lyase and pyruvate dehydrogenase knockouts) and acetyl-CoA (e.g. acetate kinase and knockouts). Important branch points and their relative positions are shown in Figure 4.3. The choice of electron transport cofactor (e.g. NADH, NADPH) was important for robust designs, as knockouts were also found to redirect flux toward one of the three glycolytic pathway options: Embden-Meyerhof-Parnas (EMP) (e.g. glucose 6-phosphate dehydrogenase knockout), Entner-Doudoroff (ED) (e.g. phosphogluconate dehydrogenase and glucose-6-phosphate knockouts), or the pentose phosphate pathway (PPP) (2-dehydro-3-deoxy-phosphogluconate aldolase knockout), which differ in cofactor choice. For example, when breaking glucose down to 2 pyruvate (or 5/3 pyruvate plus CO2 in the PPP), the EMP pathway generates 2 NADH, the PPP generates 2 NADPH, and the ED pathway generates half of each.

Electron transport chain control knockouts (e.g. knockouts of cytochrome reactions in ECOM strains and the ATP synthase knockout) participated in 60% of the designs and were beneficial to growth-coupled production.

Figure 4.5. Knockout strategies suggested by strain design search algorithms for growth-coupled production. (A) Fraction of robust designs that employed a specific knockout strategy. The three most employed strategies were pyruvate outflow control, electron transport chain control, and NADH/NADPH generation choice via glycolysis control. (B) Number of designs having a specific knockout for a target molecule. Knockouts are mapped to strategies with colored patches between panels A and B.

4.4. Robust designs after kinetic parameters sampling test in Metabolic-Expression model

A molecule is growth-coupled if it is produced at the optimal growth rate in simulation. In the M-model, growth-coupled production of a molecule is metabolically favorable, whereas in the ME-model the production is a balance between the metabolic benefit gained and the enzyme cost, which yields optimal enzyme efficiency. This efficiency is, however, susceptible to changes in turnover rates in the ME-model. In our study, the enzyme efficiency optimality of growth-coupled production was altered by tuning the keffs so that the target molecule pathway became inefficient and/or a competitive molecule pathway became efficient.

In M-model simulation, a competitive molecule is produced if its production is equally optimal to target molecule production. In the ME-model, because enzyme costs change with keff, a competitive molecule production was identified as a suboptimal replacement for target molecule production at the default keffs that becomes optimal under a different set of kinetic parameters, which alters the enzyme efficiency of the network. Growth-coupled production could fail because of an efficient competitive molecule pathway and/or an inefficient target molecule pathway. However, failure of growth-coupled production was more likely to be caused by inefficient target molecule production (68% of designs that failed in the ME-model) than by efficient competitive molecule production (11% of designs that failed in the ME-model). The default ME-model keffs are perhaps already efficient enough for the cell's metabolic activities, so that higher enzyme efficiency hardly changes cell metabolism, whereas decreasing enzyme efficiency is an effective way to do so.
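The robustness criterion and the two failure modes discussed above can be expressed as a small bookkeeping routine over the outcomes of sampled keff sets. This is a sketch under stated assumptions: the record layout, the perturbation labels, the reuse of the 10% yield threshold, and the toy values are all hypothetical, and each record stands in for one ME-model simulation.

```python
from collections import Counter


def is_robust(samples, threshold=0.10):
    """A design is robust if the target yield stays above the threshold
    for every sampled keff set."""
    return all(s["target_yield"] > threshold for s in samples)


def failure_causes(samples, threshold=0.10):
    """Tally the kind of keff perturbation behind each failed sample."""
    causes = Counter()
    for s in samples:
        if s["target_yield"] <= threshold:
            causes[s["perturbation"]] += 1
    return causes


# Hypothetical outcomes for one design under three keff sets.
samples = [
    {"perturbation": "default", "target_yield": 0.60},
    {"perturbation": "target_pathway_slowed", "target_yield": 0.02},
    {"perturbation": "competitive_pathway_accelerated", "target_yield": 0.55},
]
robust = is_robust(samples)
causes = failure_causes(samples)
print(robust, dict(causes))
```

Requiring every sample to pass is the strictest reading of robustness; a tolerance for a small fraction of failed samples would be a different, weaker criterion.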

Among the 41 robust strain designs, 10 were not found in the pool of strains from the literature mining study (King et al. 2017), while the rest at least partially matched strains published in the literature. As an example of ME-model robust designs, two pathways for 1,4-butanediol (14BDO) production were found to be growth-coupled: one using a succinyl-CoA precursor and one using a butanoyl-CoA precursor (Figure 4.6). However, under ME-model sampled keffs, D-lactate replaced 14BDO as the secreted compound for the latter pathway. Thus, the succinyl-CoA precursor pathway had a higher confidence level of success, as its growth-coupled production worked in both the M-model and the ME-model with sampled keffs. The robust 14BDO pathway has the same succinyl-CoA precursor as the published strain, except that 2-oxoglutarate was additionally used as a precursor in the published strain (Yim et al. 2011). Another difference between the strains was that the robust strain used the reductive TCA cycle whereas the published strain used the oxidative counterpart. The published strain worked both in vivo and in silico (King et al. 2017) and contains all the knockouts suggested by the algorithms in the previous study (Campodonico et al. 2014).

Figure 4.6. Robust 1,4-butanediol production pathway suggested by the Metabolic-Expression model. Of the two growth-coupled 1,4-butanediol (14BDO) production pathways, in ME-model simulations with different sets of keffs, competitive production pathways producing D-lactate (red arrows) were found to replace 14BDO production in the pathway on the right. Robust designs were found using the 14BDO pathway on the left.

Chapter 5

Discussion

Advances in the scale and scope of E. coli genome-scale models have enabled additional studies of cell factory engineering, and next-generation models have been shown to have increasingly better predictive capabilities (King et al. 2017). The M-model has become increasingly informative, allowing systematic study of target molecule production in the context of whole-cell metabolism and, specifically, of alternative molecule production. In addition to metabolism, the ME-model provides a mathematical framework for studying growth-coupled production under suboptimal enzyme efficiency by altering the model's kinetic parameters.

There are a number of challenges involved in using an ME-model for systems metabolic engineering. First, the current scope of study could be expanded from the pathway level to the genome-scale level, trading computational time for prediction confidence. At genome scale, reducing the number of enzymes selected for kinetic parameter sampling would be a rational first step to reduce the computational resources required. A different sampling method, such as random sampling (King et al. 2017), could be employed in place of discretization to reduce the number of simulations performed.
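The computational argument can be illustrated by comparing a discretized grid, whose size grows as (number of levels)^(number of enzymes), with a fixed budget of random samples. The sketch below is an assumption-laden illustration: the enzyme names, the three multiplier levels, the log-uniform distribution, and the sample budget are hypothetical and are not the settings used in this study.

```python
import itertools
import random


def grid_samples(enzymes, levels):
    """Yield every combination of discretized keff multipliers (full grid)."""
    for combo in itertools.product(levels, repeat=len(enzymes)):
        yield dict(zip(enzymes, combo))


def random_samples(enzymes, n, log10_min, log10_max, seed=0):
    """Draw n keff multiplier sets, log-uniform between the two bounds."""
    rng = random.Random(seed)
    return [
        {e: 10 ** rng.uniform(log10_min, log10_max) for e in enzymes}
        for _ in range(n)
    ]


enzymes = ["E1", "E2", "E3", "E4"]   # hypothetical enzyme names
levels = [1e-2, 1.0, 1e2]            # hypothetical multipliers of the default keff

n_grid = sum(1 for _ in grid_samples(enzymes, levels))   # 3**4 = 81 simulations
random_sets = random_samples(enzymes, n=20, log10_min=-2, log10_max=2)
print(n_grid, len(random_sets))
```

With the grid, adding a fifth enzyme triples the number of simulations, whereas the random sampling budget stays fixed at the cost of covering the parameter space less exhaustively.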

Second, the magnitude of keff changes was set to an extreme of up to 10^5 s^-1 to guarantee the highest level of kinetic parameter robustness for strain designs, because the lowest and highest recorded turnover rates were assumed to be possible for fermentation pathways and it is uncertain how much enzyme efficiency could change. Tighter constraints on the keffs of different enzymes could be applied depending on their catalytic activities (Bar-Even et al. 2011), or the magnitude of change could be lowered to the highest recorded in vivo to in vitro change of 10^3 s^-1 (Davidi et al. 2016). More designs, with lower prediction confidence but higher biological relevance, would be robust under a less severe keff magnitude change.

Third, this study, which builds upon previous systematic strain design searches (Feist et al. 2010; Campodonico et al. 2014), is based on growth optimization, while cells in vivo might not operate at the optimal growth rate. Growth-coupled production was found in both M- and ME-models to be increasingly favorable at higher growth rates. This poses the challenge of driving cells toward optimal growth in experiments. ALE is a platform for driving cells toward optimal growth, but can ALE alone be used to drive cells to that optimal point? Several studies have successfully used ALE to increase growth-coupled production (Fong et al. 2005; Zhang et al. 2007). ALE was also found to increase product tolerance (Atsumi et al. 2010), which is favorable given the high concentrations resulting from overproduction.

Fourth, in support of the adaptive evolution effort to drive the cell toward optimal growth, other biological barriers beyond the scope of M- and ME-models should be considered, such as regulation and transporter engineering, which have been shown to affect molecule overproduction (Davy et al. 2017). For example, feedback inhibition was removed to increase production of the antitumor drugs violacein and deoxyviolacein (Rodrigues et al. 2013), and an L-threonine importer was knocked out to increase L-threonine production (Lee et al. 2007). As a continuation of the workflow (Figure 4.1), available modeling methods for regulation integration (Blazier and Papin 2012) and transporter expression (Dunlop et al. 2011) could be integrated, and further design improvements could be suggested to increase the confidence in strain designs.

Chapters 1-5 are adapted from a manuscript in preparation: Dinh, H. V., King, Z. A., Palsson, B. Ø., Feist, A. M. Identification of growth-coupled and kinetic parameters robust production in strain designs using genome-scale models. The thesis author was the primary author of the manuscript and was responsible for the research.

Bibliography

Atsumi, S., Wu, T.-Y., Machado, I.M.P., Huang, W.-C., Chen, P.-Y., Pellegrini, M., Liao, J.C., 2010. Evolution, genomic analysis, and reconstruction of isobutanol tolerance in Escherichia coli. Mol. Syst. Biol. 6, 449.

Bar-Even, A., Noor, E., Savir, Y., Liebermeister, W., Davidi, D., Tawfik, D.S., Milo, R., 2011. The moderately efficient enzyme: evolutionary and physicochemical trends shaping enzyme parameters. Biochemistry 50, 4402–4410.

Blazier, A.S., Papin, J.A., 2012. Integration of expression data in genome-scale metabolic network reconstructions. Front. Physiol. 3, 299.

Campodonico, M.A., Andrews, B.A., Asenjo, J.A., Palsson, B.O., Feist, A.M., 2014. Generation of an atlas for commodity chemical production in Escherichia coli and a novel pathway prediction algorithm, GEM-Path. Metab. Eng. 25, 140–158.

Clark, D.P., 1989. The fermentation pathways of Escherichia coli. FEMS Microbiol. Rev. 5, 223–234.

Davidi, D., Noor, E., Liebermeister, W., Bar-Even, A., Flamholz, A., Tummler, K., Barenholz, U., Goldenfeld, M., Shlomi, T., Milo, R., 2016. Global characterization of in vivo enzyme catalytic rates and their correspondence to in vitro kcat measurements. Proc. Natl. Acad. Sci. U. S. A. 113, 3401–3406.

Davy, A.M., Kildegaard, H.F., Andersen, M.R., 2017. Cell Factory Engineering. Cell Syst 4, 262–275.

Dellomonaco, C., Clomburg, J.M., Miller, E.N., Gonzalez, R., 2011. Engineered reversal of the β-oxidation cycle for the synthesis of fuels and chemicals. Nature 476, 355–359.

Dharmadi, Y., Murarka, A., Gonzalez, R., 2006. Anaerobic fermentation of glycerol by Escherichia coli: a new platform for metabolic engineering. Biotechnol. Bioeng. 94, 821–829.

Dunlop, M.J., Dossani, Z.Y., Szmidt, H.L., Chu, H.C., Lee, T.S., Keasling, J.D., Hadi, M.Z., Mukhopadhyay, A., 2011. Engineering microbial biofuel tolerance and export using efflux pumps. Mol. Syst. Biol. 7, 487.

Ebrahim, A., Lerman, J.A., Palsson, B.O., Hyduke, D.R., 2013. COBRApy: COnstraints-Based Reconstruction and Analysis for Python. BMC Syst. Biol. 7, 74.


Eremina, N.S., Yampolskaya, T.A., Altman, I.B., Mashko, S. V, Stoynova, N. V, 2010. Overexpression of ydbK-encoding putative pyruvate synthase improves L-valine production and aerobic growth on ethanol media by an Escherichia coli strain carrying an oxygen-resistant alcohol dehydrogenase. J Microb. Biochem Technol 2, 77–83.

Feist, A.M., Henry, C.S., Reed, J.L., Krummenacker, M., Joyce, A.R., Karp, P.D., Broadbelt, L.J., Hatzimanikatis, V., Palsson, B.Ø., 2007. A genome‐scale metabolic reconstruction for Escherichia coli K‐12 MG1655 that accounts for 1260 ORFs and thermodynamic information. Mol. Syst. Biol. 3, 121.

Feist, A.M., Zielinski, D.C., Orth, J.D., Schellenberger, J., Herrgard, M.J., Palsson, B.Ø., 2010. Model-driven evaluation of the production potential for growth-coupled products of Escherichia coli. Metab. Eng. 12, 173–186.

Fong, S.S., Burgard, A.P., Herring, C.D., Knight, E.M., Blattner, F.R., Maranas, C.D., Palsson, B.O., 2005. In silico design and adaptive evolution of Escherichia coli for production of lactic acid. Biotechnol. Bioeng. 91, 643–648.

Foster, J.M., Davis, P.J., Raverdy, S., Sibley, M.H., Raleigh, E.A., Kumar, S., Carlow, C.K.S., 2010. Evolution of bacterial phosphoglycerate mutases: non-homologous isofunctional enzymes undergoing gene losses, gains and lateral transfers. PLoS One 5, e13576.

Fujisawa, H., Nagata, S., Misono, H., 2003. Characterization of short-chain dehydrogenase/reductase homologues of Escherichia coli (YdfG) and Saccharomyces cerevisiae (YMR226C). Biochim. Biophys. Acta 1645, 89–94.

Gosset, G., Zhang, Z., Nayyar, S., Cuevas, W.A., Saier Jr, M.H., 2004. Transcriptome analysis of Crp-dependent catabolite control of gene expression in Escherichia coli. J. Bacteriol. 186, 3516–3524.

Josephson, B.L., Fraenkel, D.G., 1974. Sugar metabolism in transketolase mutants of Escherichia coli. J. Bacteriol. 118, 1082–1089.

Josephson, B.L., Fraenkel, D.G., 1969. Transketolase mutants of Escherichia coli. J. Bacteriol. 100, 1289–1295.

Kanehisa, M., Goto, S., 2000. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30.

King, Z.A., Lloyd, C.J., Feist, A.M., Palsson, B.O., 2015. Next-generation genome-scale models for metabolic engineering. Curr. Opin. Biotechnol. 35, 23–29.

King, Z.A., O’Brien, E.J., Feist, A.M., Palsson, B.O., 2017. Literature mining supports a next-generation modeling approach to predict cellular byproduct secretion. Metab. Eng. 39, 220–227.

Lee, J., 1997. Biological conversion of lignocellulosic biomass to ethanol. J. Biotechnol. 56, 1–24.

Lee, K.H., Park, J.H., Kim, T.Y., Kim, H.U., Lee, S.Y., 2007. Systems metabolic engineering of Escherichia coli for L-threonine production. Mol. Syst. Biol. 3, 149.

Lee, S.Y., Kim, H.U., 2015. Systems strategies for developing industrial microbial strains. Nat. Biotechnol. 33, 1061–1072.

Lewis, N.E., Nagarajan, H., Palsson, B.O., 2012. Constraining the metabolic genotype-phenotype relationship using a phylogeny of in silico methods. Nat. Rev. Microbiol. 10, 291–305.

Liu, J.K., O’Brien, E.J., Lerman, J.A., Zengler, K., Palsson, B.O., Feist, A.M., 2014. Reconstruction and modeling protein translocation and compartmentalization in Escherichia coli at the genome-scale. BMC Syst. Biol. 8, 110.

Lloyd, C.J., Ebrahim, A., Yang, L., King, Z.A., Catoiu, E., O’Brien, E.J., Liu, J.K., Palsson, B.O., 2017. COBRAme: A Computational Framework for Building and Manipulating Models of Metabolism and Gene Expression (unpublished), bioRxiv.

Ma, D., Saunders, M.A., 2015. Solving Multiscale Linear Programs Using the Simplex Method in Quadruple Precision, in: Numerical Analysis and Optimization, Springer Proceedings in Mathematics & Statistics. Springer, Cham, pp. 223–235.

Machado, D., Herrgård, M.J., 2015. Co-evolution of strain design methods based on flux balance and elementary mode analysis. Metab. Eng. Commun. 2, 85–92.

Maia, P., Rocha, M., Rocha, I., 2016. In Silico Constraint-Based Strain Optimization Methods: the Quest for Optimal Cell Factories. Microbiol. Mol. Biol. Rev. 80, 45–67.

Nakayama, T., Yonekura, S.-I., Yonei, S., Zhang-Akiyama, Q.-M., 2013. Escherichia coli pyruvate:flavodoxin oxidoreductase, YdbK - regulation of expression and biological roles in protection against oxidative stress. Genes Genet. Syst. 88, 175–188.

Nuñez, M.F., Pellicer, M.T., Badia, J., Aguilar, J., Baldoma, L., 2001. Biochemical characterization of the 2-ketoacid reductases encoded by ycdW and yiaE genes in Escherichia coli. Biochem. J 354, 707–715.

O’Brien, E.J., Lerman, J.A., Chang, R.L., Hyduke, D.R., Palsson, B.Ø., 2013. Genome-scale models of metabolism and gene expression extend and refine growth phenotype prediction. Mol. Syst. Biol. 9, 693.

O’Brien, E.J., Monk, J.M., Palsson, B.O., 2015. Using Genome-scale Models to Predict Biological Capabilities. Cell 161, 971–987.

Orth, J.D., Conrad, T.M., Na, J., Lerman, J.A., Nam, H., Feist, A.M., Palsson, B.Ø., 2011. A comprehensive genome‐scale reconstruction of Escherichia coli metabolism---2011. Mol. Syst. Biol. 7, 535.

Park, J.H., Lee, S.Y., 2008. Towards systems metabolic engineering of microorganisms for amino acid production. Curr. Opin. Biotechnol. 19, 454–460.

Perlack, R.D., Wright, L.L., Turhollow, A.F., Graham, R.L., Stokes, B.J., Erbach, D.C., 2005. Biomass as feedstock for a bioenergy and bioproducts industry: the technical feasibility of a billion-ton annual supply (techreport).

Portnoy, V.A., Herrgård, M.J., Palsson, B.Ø., 2008. Aerobic fermentation of D-glucose by an evolved oxidase-deficient Escherichia coli strain. Appl. Environ. Microbiol. 74, 7561–7569.

Rodrigues, A.L., Trachtmann, N., Becker, J., Lohanatha, A.F., Blotenberg, J., Bolten, C.J., Korneli, C., de Souza Lima, A.O., Porto, L.M., Sprenger, G.A., Wittmann, C., 2013. Systems metabolic engineering of Escherichia coli for production of the antitumor drugs violacein and deoxyviolacein. Metab. Eng. 20, 29–41.

Samland, A.K., Sprenger, G.A., 2006. Microbial aldolases as C--C bonding enzymes---unknown treasures and new developments. Appl. Microbiol. Biotechnol. 71, 253.

Sawers, G., 2001. A novel mechanism controls anaerobic and catabolite regulation of the Escherichia coli tdc operon. Mol. Microbiol. 39, 1285–1298.

Schellenberger, J., Que, R., Fleming, R.M.T., Thiele, I., Orth, J.D., Feist, A.M., Zielinski, D.C., Bordbar, A., Lewis, N.E., Rahmanian, S., Kang, J., Hyduke, D.R., Palsson, B.Ø., 2011. Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox v2.0. Nat. Protoc. 6, 1290–1307.

Schürmann, M., Sprenger, G.A., 2001. Fructose-6-phosphate Aldolase Is a Novel Class I Aldolase from Escherichia coli and Is Related to a Novel Group of Bacterial Transaldolases. J. Biol. Chem. 276, 11055–11061.

Shao, Z., Newman, E.B., 1993. Sequencing and characterization of the sdaB gene from Escherichia coli K-12. FEBS J. 212, 777–784.

Sittig, M., Weil, B.H., 1954. Raw Materials for Chemicals from Petroleum, in: LITERATURE RESOURCES, Advances in Chemistry. AMERICAN CHEMICAL SOCIETY, pp. 327–338.

Thiele, I., Palsson, B.Ø., 2010. A protocol for generating a high-quality genome-scale metabolic reconstruction. Nat. Protoc. 5, 93–121.

United Nations Industrial Development Organization (UNIDO), 2007. Industrial biotechnology and biomass utilization.

Yang, L., Ma, D., Ebrahim, A., Lloyd, C.J., Saunders, M.A., Palsson, B.O., 2016. solveME: fast and reliable solution of nonlinear ME models. BMC Bioinformatics 17, 391.

Yim, H., Haselbeck, R., Niu, W., Pujol-Baxley, C., Burgard, A., Boldt, J., Khandurina, J., Trawick, J.D., Osterhout, R.E., Stephen, R., Estadilla, J., Teisan, S., Schreyer, H.B., Andrae, S., Yang, T.H., Lee, S.Y., Burk, M.J., Van Dien, S., 2011. Metabolic engineering of Escherichia coli for direct production of 1,4-butanediol. Nat. Chem. Biol. 7, 445–452.

Zhang, X., Jantama, K., Moore, J.C., Shanmugam, K.T., Ingram, L.O., 2007. Production of L-alanine by metabolically engineered Escherichia coli. Appl. Microbiol. Biotechnol. 77, 355–366.