Flux balance analysis to improve butanol productivity in Synechocystis PCC 6803

Master of Science Thesis

Kiyan Shabestary

Supervisor: Josefine Anfelt, PhD student, KTH

Examiner: Paul Hudson, Assistant Professor, KTH

Degree Project in Biotechnology BB202X

Abstract

Engineering microorganisms at the systems level is widely regarded as the future of metabolic engineering. Thanks to advances in genome annotation, microorganisms can be understood as never before and reconstructed in the form of computational models. Flux balance analysis provides insight into cellular metabolism and can guide metabolic engineering strategies. In particular, algorithms can handle the complexity of metabolism and suggest genetic interventions to improve productivity. In this work, the metabolism of Synechocystis PCC 6803 was investigated in silico. Genetic interventions were identified that couple butanol synthesis to growth as a way to improve current productivities. Cofactor recycling and, in particular, buffering mechanisms were shown to be important targets. Creating a cofactor imbalance and removing these buffering mechanisms provides an important driving force: it forces carbon flux through butanol synthesis to maintain cofactor balance and sustain growth.

Objective

The objective of the present work is to identify gene targets in silico at the systems level to improve n-butanol and isobutanol productivities in Synechocystis PCC 6803. An emphasis is put on coupling product to growth and understanding the driving forces behind it.

Contents

Introduction
Theory
    Genome-scale metabolic models
    Flux balance analysis
    Optimization algorithms
    n-butanol and isobutanol pathways
    Metabolic engineering in cyanobacteria
Results
    OptKnock to predict reaction deletions
    OptForce to predict reaction modulations
    Identifying buffering mechanisms for cofactor balancing
Discussion
Conclusion
Future directions
References

Introduction

Industrial biotechnology is an emerging and promising discipline for the production of a wide range of compounds, including fine chemicals, biopolymers, biofuels and biopharmaceuticals. Using living organisms as ‘cell factories’, essentially micro-refineries that perform a specific task, is the cornerstone of this field and lays the foundations for a future bio-based society. The advent of high-throughput sequencing to read cellular genetic information was a revolution that led biotechnology into a new era. Understanding the cellular genetic code enables unprecedented insight into the physiology and functions of microorganisms. As a result, tools to genetically engineer cell factories have become more accurate.

Previous methods for finding high-producing strains were mainly based on chance: random mutagenesis created strain diversity, and good screening methods enabled selection of the best producers. However, these methods were time-consuming and did not necessarily find the optimal strains. More recently, genomics paved the way for other -omics techniques such as transcriptomics, proteomics and metabolomics. The large amount of data made accessible by high-throughput analytical techniques can be translated into gene-protein-reaction (GPR) associations. Newer methods, mainly rational design, use this relationship to target genes precisely and modulate cellular functions for production purposes. Metabolic engineering is the field that aims to understand how genes influence cellular metabolism and to exploit this knowledge to shape the metabolism of microorganisms to meet engineering objectives. Metabolic engineers can rely on tools including gene deletion, over-expression, down-regulation and heterologous gene insertion from other organisms to improve strains.

Initial metabolic engineering strategies aimed at redirecting carbon flux to product synthesis regardless of the metabolic burden created in the cell. In particular, cofactors, widely connected metabolites that help drive reactions forward, were not taken into account, and their imbalance resulted in bottlenecks. It is now commonly thought that understanding the cell as a whole is key to metabolic engineering success. However, biological organisms are complex systems involving hundreds if not thousands of biochemical reactions. For metabolic engineers, this means a large number of gene target combinations to explore in order to reach an optimal overproduction phenotype. Computational methods are therefore welcome tools to support experiments and to assess efficiently the many possibilities a cell can offer. Genome-scale metabolic models aim to represent cellular metabolism as a whole using GPR associations and can be used for simulations. Flux balance analysis (FBA) is a computational method that solves for flux distributions (phenotype) in cellular metabolism for a given genotype. Algorithms can further be used to suggest strategies (mainly reaction deletions or modulations) to reach a desired overproduction phenotype.

An important issue in engineering strategies is cellular behavior. Very often, engineering objectives are in opposition to cellular ones. Most microorganisms optimize their growth as a means of survival. Those growing the fastest outcompete slower organisms. This forms the basis for natural selection: organisms have evolved over millions of years to fit their environment. Coupling product synthesis to growth is a way to bypass this issue. A successful strategy is to modify the stoichiometric system in such a way that the cell needs to synthesize the product in order to grow. At the maximal growth rate, the product is then synthesized as a growth-related by-product.

The aim of this project is to perform FBA on a genome-scale model of the cyanobacterium Synechocystis PCC 6803 to identify metabolic engineering strategies for enhanced biofuel production. n-Butanol and isobutanol are the target molecules. Cyanobacteria are prokaryotic algae and promising cell factories for the production of a wide variety of compounds, including biofuels. There are two types of process using algae for biofuel production. One is to use eukaryotic algae with a high biomass content, such as Chlamydomonas species, for biomass production; the biomass is then decomposed into sugars and fermented into biofuel by microorganisms such as yeast. The other is the direct use of algae as a chassis for biofuel production. Cyanobacteria are the preferred cell factories in this case because they are easier to modify genetically. In both cases, algae only require nitrogen, carbon dioxide and sunlight to grow. These low requirements make them promising cell factories. Moreover, they do not compete with food production as they do not require arable land.

Theory

Genome-scale metabolic models

Genome-scale metabolic models (GSMMs) have been the catalysts for the development of flux balance analysis. Their reconstruction is a bottom-up approach in which the model is built from its smallest constituents, starting from genes, to establish the metabolic map (Thiele and Palsson, 2010; Fig. 1). Based on extensive research in the literature and in databases (KEGG, BRENDA), a draft reconstruction is first established. This may include previous GSMMs that are to be upgraded. Software such as the RAVEN toolbox (Agren et al., 2013) can perform reconstructions in a semi-automated way. The reconstruction often contains gaps, i.e. missing reactions in a pathway (for example a missing serine pathway in Synechocystis PCC 6803; Knoop et al., 2013). For traceability, a confidence score is assigned to each reaction. Reactions without any physiological evidence, introduced in the model solely for functional purposes, receive the lowest confidence value. Finally, the model is converted into a mathematical form, the stoichiometric matrix (S matrix), which stores the stoichiometric coefficients of every metabolite in every reaction. An important part of the model is the addition of artificial reactions for modeling purposes, namely a maintenance reaction and a biomass reaction. The biomass reaction links precursors to biomass constituents such as lipids, proteins and other macromolecules in order to model growth. The flux through this reaction is scaled so that it represents the growth rate (μ).

The reconstructed model can then be saved in the Systems Biology Markup Language (SBML) format (Hucka et al., 2003). This provides a standardized way to use and improve models. Genome-scale models serve two main purposes. First, they provide a way to contextualize high-throughput omics data. Second, they can be used alongside algorithms to find optimal engineering strategies. The latter is the aim of the present work. Synechocystis PCC 6803, often labeled the “green Escherichia coli” (E. coli), is a model organism for phototrophs. However, the lack of genome annotation and of extensive in vivo gene essentiality data has slowed down the development of accurate cyanobacterial genome-scale models in comparison to well-studied organisms like E. coli or Saccharomyces cerevisiae (S. cerevisiae). The creation of a gene database (CyanoBase; http://genome.microbedb.jp/cyanobase/) for all known Synechocystis PCC 6803 and Anabaena PCC 7120 ORFs (Nakamura et al., 1999; Nakao et al., 2010) paved the way for numerous genome-scale reconstructions (Shastri and Morgan, 2005; Montagud et al., 2011; Saha et al., 2012; Nogales et al., 2012; Knoop et al., 2013).

Fig. 1. Genome-scale model reconstruction procedure. Figure from Feist et al. (2008).

The Synechocystis sp. PCC 6803 model iJN678 (Nogales et al., 2012) was used to perform flux balance analysis. The reconstructed network incorporates 678 genes, 863 reactions and 795 metabolites. The model supports simulation of heterotrophic, autotrophic and mixotrophic conditions by constraining uptake rates. Autotrophic conditions were simulated as “light limited”: the photon flux was constrained to -18.7 mmol/gDW/h (both bounds) and carbon uptake (HCO3-) to a maximal value of 3.8 mmol/gDW/h. In heterotrophic conditions, glucose uptake was constrained to 0.8 mmol/gDW/h in order to match the photoautotrophic maximum growth rate, and the photon flux was set to 0.

In this project, iJN678 was used because it offers a standardized metabolite, reaction and transporter annotation comparable to E. coli genome-scale models. The biomass objective function is refined in comparison to other models. Respiration and photosynthesis are also well modeled, in particular cyclic electron flows which are important targets in this study.

Results have been shown to be sensitive to the model used, cofactor preference and reaction promiscuity. Since cofactor preference is not known for a large number of reactions, models can show high differences in cofactor assignment. This has been shown to be a real issue when investigating suitable strategies for target overproduction. For this reason, the model from Knoop et al. (2013) is used to contrast and discuss the validity of obtained results.

Flux balance analysis

Flux balance analysis (FBA) is a mathematical method that assumes steady-state growth to simulate the flux distribution of a given genome-scale model (Orth et al., 2010). Internal metabolites, such as those found in central metabolism, are assumed not to accumulate: for each internal metabolite, production and consumption rates are equal. This leads to a set of linear equations that can be solved using linear programming (LP). Since the system is underdetermined, multiple flux distributions are possible. This is a fundamental difference from metabolic flux analysis (MFA), where exchange reactions are measured so that the system is determined and a single flux distribution is obtained. In FBA, a particular distribution is selected by maximizing or minimizing an objective function (a reaction in the model must be chosen). For the flux distribution to be realistic, the objective function should reflect the objective of the organism. Natural selection has driven microorganisms towards increased fitness. Therefore, maximization of growth (the biomass equation) is often chosen as the objective function, especially for prokaryotes.

Alternatives to FBA include minimization of metabolic adjustment (MOMA; Segrè et al., 2002) and regulatory on/off minimization (ROOM; Shlomi et al., 2005). Both are used to find flux distributions in mutant cells based on wild-type (WT) distributions, by minimizing the differences in flux distribution between the WT and the mutant. MOMA predicts the flux distribution immediately after the perturbation, whereas FBA and ROOM give the final one (Segrè et al., 2002). MOMA and ROOM are relevant tools when mutants are tested experimentally.
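As a sketch of what minimizing the difference between WT and mutant flux distributions means in the case of MOMA, the problem can be written as a quadratic program. The notation below is an assumption made here for illustration and is not taken from the thesis: v^WT is a reference wild-type flux distribution (for example obtained by FBA), v^mut the mutant distribution, S the stoichiometric matrix, and N the number of reactions. ROOM instead minimizes the number of reactions whose flux changes significantly, which leads to a mixed-integer problem and is not shown.

```latex
\[
\min_{v^{\mathrm{mut}}} \; \sum_{j=1}^{N} \left( v^{\mathrm{mut}}_j - v^{\mathrm{WT}}_j \right)^2
\quad \text{subject to} \quad
S\, v^{\mathrm{mut}} = 0, \qquad
v^{\min}_j \le v^{\mathrm{mut}}_j \le v^{\max}_j, \qquad
v^{\mathrm{mut}}_k = 0 \;\; \text{for each deleted reaction } k
\]
```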

FBA is often referred to as a constraint-based tool. Constraints present in current GSMMs can be classified into two types: (1) balance constraints, due to the stoichiometry of the system; (2) flux constraints, where each flux is limited by an upper and a lower bound (e.g. an irreversible reaction cannot carry a negative flux). Additional constraints can greatly restrict the solution space, that is, the set of all mathematically possible flux distributions that satisfy the constraints (Fig. 2). The more constraints one adds, the more restricted the solution space, and the more accurately an optimal flux distribution can be predicted.

Fig. 2. The conceptual basis of constraint-based modeling. In this representation, a three-reaction model is assumed for simplicity. Without any constraints, all mathematical distributions are possible solutions of the system. When constraints are added, the solution space is restricted to a constrained region (in blue). An objective function can be used to obtain an optimal flux distribution for (v1, v2, v3). In this example, v3 is the objective function to be optimized. Adapted from Orth et al. (2013).

The main constraint in FBA is the stoichiometry of the system. The net formation of a given metabolite is calculated as the sum of all fluxes producing this metabolite minus the sum of all fluxes consuming it, over the whole system. Depending on the stoichiometry, some reactions weigh more than others: for a given flux, a reaction whose stoichiometry forms 2 moles of a metabolite leads to a higher formation rate than one forming only 1 mole. Thus, the net formation rate of metabolite i, $dx_i/dt$, can be written as the sum of the reaction rates $v_j$ multiplied by the reaction-specific stoichiometric coefficient $S_{ij}$ of the metabolite:

\[ \frac{dx_i}{dt} = \sum_{j=1}^{N} S_{ij} v_j \]

N is the total number of reactions; $S_{ij}$ and $v_j$ are the stoichiometric coefficient and the reaction rate for the ith metabolite and the jth reaction. This equation can be generalized to all metabolites M of the system. A vector of net formation rates dx/dt is obtained as the S matrix multiplied by a vector v containing the flux distribution. At steady state, the net rate for each metabolite is zero (no accumulation).

\[ \frac{dx}{dt} = S v = 0 \]
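The steady-state balance can be illustrated on a small example. The following is a minimal sketch in plain Python; the five-reaction network, the reaction names and the flux values are hypothetical and are not taken from iJN678. It computes the net formation rate of each metabolite as a row of S multiplied by v and checks whether a flux distribution satisfies S v = 0.

```python
# Toy network (hypothetical, for illustration only):
#   R1: -> A        (uptake)
#   R2: A -> B
#   R3: B ->        (biomass drain)
#   R4: A -> 2 C    (stoichiometric coefficient of 2 for C)
#   R5: C ->        (product export)
METABOLITES = ["A", "B", "C"]
S = [
    # R1  R2  R3  R4  R5
    [1, -1,  0, -1,  0],  # A
    [0,  1, -1,  0,  0],  # B
    [0,  0,  0,  2, -1],  # C
]


def net_formation_rates(S, v):
    """Return dx/dt = S * v, one entry per metabolite."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


def is_steady_state(S, v, tol=1e-9):
    """True when no metabolite accumulates or is depleted."""
    return all(abs(rate) <= tol for rate in net_formation_rates(S, v))


balanced = [10.0, 6.0, 6.0, 4.0, 8.0]    # every metabolite is balanced
unbalanced = [10.0, 6.0, 6.0, 4.0, 4.0]  # C accumulates at 2*4 - 4 = 4

print(dict(zip(METABOLITES, net_formation_rates(S, balanced))))
print(is_steady_state(S, balanced), is_steady_state(S, unbalanced))
```

The coefficient of 2 in R4 shows how stoichiometry weights a flux: a flux of 4 through R4 forms C at a rate of 8. FBA then searches, among all vectors v that pass this check and respect the flux bounds, for the one that optimizes the objective.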

Other types of constraints not integrated (for now) in current genome-scale models are regulatory constraints or advanced kinetic constraints such as activity or kinetic parameter values.

Similarly, all mathematically possible flux distributions can be plotted in two or three dimensions. Plotting one reaction flux against another gives a particular projection of the solution space. This is useful for determining the possible phenotypes of a given genotype. Ibarra et al. (2002) demonstrated the accuracy of FBA for long-term phenotypes: after 40 days, E. coli evolved from a sub-optimal phenotype to the optimal growth predicted by FBA.

The most widely used platform to perform FBA is the COBRA toolbox (Schellenberger et al., 2011). It can be used on MATLAB or Python platforms, and its high modularity and documentation make it a popular choice in the scientific community. Most simulations were performed using the COBRA toolbox 2.0 on MATLAB.

Optimization algorithms

As discussed above, FBA offers insight into the flux distribution of a given network. As an extension, perturbations can be applied to the system in the form of reaction deletions or modulations. Flux distributions can then be computed and their differences analyzed to understand cellular metabolism. Algorithms can also be employed to go the other way round: for a particular phenotype, for example an overproduction phenotype, they predict the associated reaction deletions or modulations (genotype alteration).

OptKnock

The most widely used algorithm is the bi-level optimization framework OptKnock (Burgard et al., 2003). It identifies reaction deletion strategies that couple the biomass objective function with a biochemical overproduction target. The bi-level formulation can be recast as a single-level mixed integer linear program (MILP), i.e. a problem with both discrete and continuous variables.

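\[
\begin{aligned}
&\underset{y_j}{\text{maximize}} \quad v_{\mathrm{product}} && \text{(OptKnock)} \\
&\text{subject to} \\
&\qquad \underset{v_j}{\text{maximize}} \quad v_{\mathrm{biomass}} && \text{(Primal)} \\
&\qquad \text{subject to} \\
&\qquad\qquad \sum_{j=1}^{N} S_{ij} v_j = 0 \\
&\qquad\qquad v_{\mathrm{photon\_uptake}} = v_{\mathrm{photon\_LL\_conditions}} \\
&\qquad\qquad v_{\mathrm{HCO_3}} \le v_{\mathrm{HCO_3}}^{\max} \\
&\qquad\qquad v_{\mathrm{biomass}} \ge v_{\mathrm{biomass}}^{\mathrm{target}} \\
&\qquad\qquad v_j^{\min} \cdot y_j \le v_j \le v_j^{\max} \cdot y_j, \quad \forall j \in \mathcal{M} \\
&\qquad \sum_{j \in \mathcal{M}} (1 - y_j) \le K
\end{aligned}
\]

with $y_j \in \{0,1\}$ for every reaction $j \in \mathcal{M}$, and $K$ the number of allowable knockouts.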

However, because of the complexity of the algorithm, computation time is often an issue. OptKnock screens all reactions, and the more knockouts are allowed, the higher the number of combinations. An alternative is OptGene, a genetic algorithm (GA) that finds knockout strategies more quickly (Patil et al., 2005). The idea behind this algorithm is to find individuals with high fitness values (mainly high productivity). These individuals are selected for mating, their offspring are screened, and the iterative process continues. A mutation rate is also applied in each round in order to create diversity.
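The select, mate and mutate loop described above can be sketched generically in Python. This is not the OptGene implementation: the fitness function below is a hypothetical stand-in for an FBA evaluation of each knockout set, and the reaction names, scores and parameter values are assumptions chosen only to make the example self-contained.

```python
import random


def toy_fitness(knockouts):
    """Hypothetical stand-in for an FBA evaluation of a knockout set.

    A real implementation would delete the reactions, run FBA, and
    return for example the product flux at maximal growth.
    """
    helpful = {"R3": 0.2, "R7": 0.15}   # assumed beneficial deletions
    return sum(helpful.get(r, -0.01) for r in sorted(knockouts))


def evolve(reactions, fitness, max_knockouts=3, pop_size=20,
           generations=30, mutation_rate=0.2, seed=0):
    """Generic genetic algorithm over sets of knocked-out reactions."""
    rng = random.Random(seed)

    def trim(genes):
        genes = sorted(genes)            # sorted for reproducibility
        rng.shuffle(genes)
        return frozenset(genes[:max_knockouts])

    population = [frozenset(rng.sample(reactions, max_knockouts))
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:pop_size // 2]      # selection
        offspring = []
        while len(parents) + len(offspring) < pop_size:
            mum, dad = rng.sample(parents, 2)     # mating
            child = set(trim(mum | dad))
            if rng.random() < mutation_rate:      # mutation
                child ^= {rng.choice(reactions)}  # toggle one reaction
                child = set(trim(child))
            offspring.append(frozenset(child))
        population = parents + offspring
    return max(population, key=fitness)


reactions = [f"R{i}" for i in range(1, 11)]
best = evolve(reactions, toy_fitness)
print(sorted(best), toy_fitness(best))
```

Unlike OptKnock, such a search gives no guarantee of optimality, but each generation only requires a fixed number of fitness evaluations regardless of how many knockouts are allowed.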

OptForce

Developed as an extension of the OptKnock framework, OptForce finds reaction modulations in addition to deletions (Ranganathan et al., 2010). The algorithm compares the flux variability of an overproducing (OP) phenotype with that of a wild-type (WT) phenotype (Fig. 3). Reactions whose flux ranges differ significantly between the OP and WT phenotypes are listed in MustU (up-regulation) and MustL (down-regulation). This can be extended to reaction pairs (MustUU, MustLL, MustUL). As a first output of the algorithm, these reaction lists can be used to guide strategies. OptForce can then compute combinations of reaction modulations from these lists, alongside reaction knockouts. The resulting strategy is referred to as the FORCE set. By increasing the number of allowed modifications (knock-out/-up/-down), OptForce iteratively gives different strategies, and the solutions can be pictured in the form of a Boolean diagram.
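The comparison of flux ranges can be sketched as follows. This is a minimal illustration, assuming that "significantly different" means that the OP and WT ranges do not overlap; the reaction names and flux ranges are hypothetical, and in practice the ranges would come from flux variability analysis on the model.

```python
def classify_must_sets(wt_ranges, op_ranges, tol=1e-6):
    """Split reactions into MustU / MustL from flux ranges.

    Both arguments map a reaction id to a (min_flux, max_flux) tuple.
    Assumption: a reaction belongs to a Must set only when the two
    ranges do not overlap at all.
    """
    must_u, must_l = [], []
    for rxn, (wt_min, wt_max) in wt_ranges.items():
        op_min, op_max = op_ranges[rxn]
        if op_min > wt_max + tol:
            must_u.append(rxn)   # whole OP range lies above the WT range
        elif op_max < wt_min - tol:
            must_l.append(rxn)   # whole OP range lies below the WT range
    return must_u, must_l


# Hypothetical flux ranges (mmol/gDW/h), for illustration only.
wt = {"R_a": (0.0, 1.0), "R_b": (2.0, 3.0), "R_c": (0.5, 1.5)}
op = {"R_a": (1.5, 2.0), "R_b": (0.0, 1.0), "R_c": (1.0, 2.0)}
must_u, must_l = classify_must_sets(wt, op)
print("MustU:", must_u, "MustL:", must_l)
```

Here R_c appears in neither list because its two ranges overlap, so no modulation of this reaction is strictly required to reach the overproducing phenotype.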


OptSwap

OptSwap (King and Feist, 2013) is a bi-level MILP problem based on the RobustKnock formulation (Tepper and Shlomi, 2010). Compared with RobustKnock and OptKnock, it additionally considers swaps in the cofactor specificity of reactions. Since cofactor usage is an essential feature of coupling, this algorithm is a valuable tool for achieving high productivity.

Fig. 3. Comparing flux ranges between normal and overproducing phenotype in OptForce (Ranganathan et al., 2010).

There are other similar algorithms developed for specific purposes but not available in the COBRA toolbox (OptORF, Kim & Reed, 2010; CASOP, Hädicke & Klamt, 2010). Nevertheless, results obtained with different algorithms are useful for comparison with results from OptForce, OptKnock and OptSwap.

n-butanol and isobutanol pathways

n-butanol (hereafter butanol) is a fermentative product synthesized from the precursor acetyl-CoA in the ABE fermentation process. Traditionally, fermentative products have been produced under anaerobic conditions in facultative or obligate fermentative organisms. Early efforts to produce butanol used the native butanol producers and obligate anaerobic bacteria Clostridia, in which butanol production was first reported (Pasteur, 1862). Clostridium species are not ideal hosts because of their slow growth and thus low volumetric productivity. The Clostridium butanol pathway has therefore been transferred to other organisms (Nielsen et al., 2009). Expressing a pathway from a fermentative organism in an autotrophic organism such as a cyanobacterium raises a few concerns. Among them, the oxygen sensitivity of the pathway (Atsumi et al., 2008; Boynton et al., 1996) and the reversibility of trans-enoyl-CoA reductase (Bond-Watts et al., 2011) have been targets for improvement. To solve these problems, chimeric pathways using enzymes from other organisms have been employed (Lan & Liao, 2012; Bond-Watts et al., 2011; Anfelt et al., 2015). Alternative pathways to produce butanol modify the Clostridium pathway at one or several steps. For instance, Lan and Liao (2012) replaced the first condensation step by an ATP-driven two-step process using beta-oxidation reversal in Synechococcus elongatus PCC 7942. One malonyl-CoA and one acetyl-CoA ultimately condense to one acetoacetyl-CoA. Pasztor et al. (2014) also used beta-oxidation reversal and extended it to butyryl-ACP, which is further converted to butanol by the TPC pathway in E. coli (Fig. 4).

Although isobutanol is similar to butanol, it is less soluble and has an energy density close to that of gasoline, making it a better biofuel. Isobutanol synthesis proceeds via the 2-keto-acid (or Ehrlich) pathway. Amino acid precursors in the form of keto acids are decarboxylated into aldehydes and reduced to the corresponding alcohols. Isobutanol can be obtained from valine, whereas propanol is obtained from isoleucine. Isobutanol has been successfully produced in E. coli (Yan and Liao, 2009) and Synechococcus elongatus PCC 7942 (Atsumi et al., 2009). All pathways are studied in this project (Fig. 4). They are hereafter referred to as the LL pathway (Lan and Liao, 2012), the TPC pathway (Pasztor et al., 2014), the isobutanol pathway, and the NADH and NADPH pathways.

FBA was used to determine the maximal theoretical butanol (or isobutanol) yields for all these pathways (Table 1). Yields were calculated as production rate over carbon uptake rate. Under light-limited conditions, the carbon uptake is the sole limiting factor. The net carbon uptake is calculated as bicarbonate input minus carbon dioxide output. Even though most pathways have the same maximal rate towards the product, yields differ slightly because of different uptake rates.

Table 1. Maximum rates and yields for the different pathways in light-limited conditions.

Pathway      Maximum production rate   Bicarbonate uptake rate   Carbon dioxide export rate   Maximum theoretical yield YHCO3-,P
             (mmol Prod. / gDW h)      (mmol HCO3- / gDW h)      (mmol CO2 / gDW h)           (mol Prod. / mol HCO3-)
Std NADH     0.39                      1.55                      0                            0.252
Std NADPH    0.39                      1.55                      0                            0.252
TPC          0.375                     3.7                       2.19                         0.248
LL           0.39                      3.7                       2.14                         0.250
Isobutanol   0.39                      1.55                      0                            0.252
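The yields in Table 1 follow directly from the rates in the same table. As a worked check, a short Python sketch (the function and variable names are illustrative) divides each production rate by the net carbon uptake, taken as bicarbonate uptake minus carbon dioxide export as described in the text.

```python
# Rates from Table 1 (mmol/gDW/h): (production, HCO3- uptake, CO2 export)
TABLE_1 = {
    "Std NADH":   (0.39, 1.55, 0.0),
    "Std NADPH":  (0.39, 1.55, 0.0),
    "TPC":        (0.375, 3.7, 2.19),
    "LL":         (0.39, 3.7, 2.14),
    "Isobutanol": (0.39, 1.55, 0.0),
}


def product_yield(production, hco3_uptake, co2_export):
    """Yield in mol product per mol of net carbon taken up."""
    net_carbon_uptake = hco3_uptake - co2_export
    return production / net_carbon_uptake


yields = {name: round(product_yield(*rates), 3)
          for name, rates in TABLE_1.items()}
for name, value in yields.items():
    print(f"{name:<11} {value:.3f}")
```

For the TPC pathway, for example, the net uptake is 3.7 - 2.19 = 1.51 mmol/gDW/h, which gives 0.375 / 1.51 ≈ 0.248.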

It is interesting to see that for the TPC and LL pathways, HCO3- is taken up at the maximal rate even though less than half of it is retained, as shown by the massive CO2 export. These two pathways differ from the others in using ATP. One explanation could be that this extra demand requires higher ATP production. The main ATP source is ATP synthase, whose activity depends on the linear electron flow (LEF), which is fixed in this case (light-limited). Based on these calculations, all pathways can achieve a similar maximal theoretical yield owing to similar stoichiometry. However, the TPC and LL pathways require more carbon dioxide. Carbon sinks such as decarboxylation steps have been reported to increase carbon uptake (Oliver and Atsumi, 2015). This can be a disadvantage for the cost of the process, since a large amount of carbon is not used efficiently. Production of isobutanol is suggested here since it is a better fuel and has a maximal yield similar to that of butanol.


Fig. 4. Pathways leading to butanol and isobutanol synthesis from the pyruvate precursor as implemented in the model. Plain and dashed arrows indicate native and non-native reactions, respectively. In blue, the 2-keto-acid route for isobutanol. In red, the standard Clostridium butanol pathway. In this project, the last three steps were studied with NADH or NADPH. In green, the ATP-driven pathway reported by Lan and Liao (2012). The first two steps and the conversion of crotonyl-CoA to butyryl-CoA using NADH are specific to this pathway. In yellow, the TPC pathway using β-oxidation reversal reported by Pasztor et al. (2014). Metabolite names next to arrows indicate reaction requirements.

Metabolic engineering in cyanobacteria

Generic approaches / Introduction

Regardless of the organism used, metabolic engineering can be divided into three levels of increasing complexity and predictive power (Fig. 5). Initial strategies mainly focused on the gene level. Rerouting carbon flux to a certain pathway by simple amplification, using a high gene copy number, optimized codon usage or increased promoter strength, was common. It was quickly realized that productivities did not meet expectations because cellular balancing had been neglected. Stepping back to the pathway level gave a better global understanding of the process. The “push and pull” concept, which aims at increasing pathway precursors and minimizing competing pathways, is an example. Balancing the pathway through cofactor usage and the notion of a driving force made it possible to achieve higher productivity. The current state of metabolic engineering focuses on these two levels. Generic strategies to increase productivity most notably include inhibition of by-product formation, removal of competing reactions, overexpression of the target pathway, cofactor usage, removal of product degradation or uptake, removal of feedback inhibition, and toxicity tolerance. Built on the previous levels, the systems level is in its infancy because of its high complexity. It requires high computational power and a thorough understanding of the organism.

Fig. 5. Systemic approach of strain development (Lan and Liao, 2013). From right to left, an increase in productivity often means an increase in complexity.

In cyanobacteria, strategies at the individual gene level have been explored extensively. Different native and foreign promoters have been used for expression. The native promoters used are usually those of housekeeping genes (light-inducible PpsbA2 from PSII; Prbc in CO2 fixation). The strong foreign IPTG-inducible promoter Ptrc is widely used in pathway engineering (Atsumi et al., 2009; Huang et al., 2010; Lan and Liao, 2011). Codon usage of heterologous genes can also be optimized. The IspS gene encoding isoprene synthase from a plant species was successfully codon-optimized, resulting in a 10-fold increase in gene expression (Lindberg et al., 2010). Recent advances have mainly focused on the pathway level, with the notion of a driving force pushing towards product synthesis. Decarboxylation and ATP usage can provide such a driving force (Lan and Liao, 2012). As discussed earlier, the reversibility of pathway reactions has also been investigated as a way to push the carbon flux forward (Bond-Watts et al., 2011). Enzyme kinetic data can also be integrated to identify the rate-controlling step in a pathway using metabolic control analysis (Angermayr et al., 2013). Engineering tolerance to the product is also important for achieving high productivity (Kaczmarzyk et al., 2014).

Future studies aim at building on these driving forces to improve current productivity. Genetic engineering of non-obvious targets aims at creating such driving forces at the level of the whole organism rather than of the pathway. Since genome-scale reconstructions involve hundreds of reactions, powerful computational techniques are essential. In E. coli, OptKnock was used to couple lactate production to growth (Fong et al., 2005). After 60 days, experimental results were in good agreement with simulations, and approximately 90% of the maximal theoretical lactate production rate was achieved. Throughout this project, the analysis of driving forces that lead to growth-coupled butanol synthesis is a recurring theme.

Cofactor recycling

One important issue in metabolic engineering is cofactor balance. Adding or removing pathways often results in a cofactor imbalance, which acts as a bottleneck. Because of the interconnections between photosynthesis and respiration, cyanobacteria have a highly regulated cofactor balance, which makes it difficult to engineer.

Cofactor recycling is important for coupling product synthesis to biomass formation. Since the internal objective of a cell factory is to optimize its growth, linking growth to the engineering objective provides a real driving force towards an overproduction phenotype. A phenotypic phase plane (PPP, or production envelope) can be plotted to illustrate this. The biomass flux is iteratively fixed at different values and, for each of them, the flux towards product synthesis is maximized and minimized. For a given genotype, the PPP represents all mathematically possible phenotypes. We assume that the cell will optimize its growth. Thus, the cellular flux distribution corresponds to the point with the highest growth rate (far right; Fig. 6). If product synthesis is coupled to growth, then at the maximal growth rate product synthesis occurs as a necessary by-product of growth. This notion is perhaps the central aspect of this work and provides a powerful strategy for product synthesis.
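The scan behind a production envelope can be sketched on a deliberately reduced example. In a genome-scale model, each minimum and maximum is an LP solved on the full network; here the network is collapsed, as an assumption for illustration, to linear constraints on only two quantities (growth rate and product flux), so that the bounds can be computed in closed form in plain Python. The coefficients are hypothetical and are not results from iJN678.

```python
def product_range(growth, constraints):
    """Min and max product flux at a fixed growth rate.

    Each constraint (a, b, c) encodes a*growth + b*product <= c.
    Returns None when the growth rate is infeasible.
    """
    low, high = 0.0, float("inf")   # product flux is non-negative
    for a, b, c in constraints:
        slack = c - a * growth
        if b > 0:
            high = min(high, slack / b)
        elif b < 0:
            low = max(low, slack / b)
        elif slack < 0:
            return None
    return (low, high) if low <= high + 1e-12 else None


def production_envelope(constraints, max_growth, steps=4):
    """Scan growth rates from 0 to max_growth and record the product range."""
    points = []
    for i in range(steps + 1):
        growth = max_growth * i / steps
        bounds = product_range(growth, constraints)
        if bounds is not None:
            points.append((growth, bounds[0], bounds[1]))
    return points


# Hypothetical resource limit: 9.75*growth + product <= 0.39
CARBON = (9.75, 1.0, 0.39)
# Hypothetical coupling: product >= 5*growth, i.e. 5*growth - product <= 0
COUPLING = (5.0, -1.0, 0.0)

trade_off = production_envelope([CARBON], max_growth=0.04)
coupled = production_envelope([CARBON, COUPLING], max_growth=0.04)
for growth, p_min, p_max in coupled:
    print(f"growth={growth:.2f}  product in [{p_min:.3f}, {p_max:.3f}]")
```

In the trade-off case the minimum product flux is zero at every growth rate, so nothing forces production at maximal growth. With the coupling constraint, the lower bound rises with growth and the highest feasible growth rate in the scan has a strictly positive minimum product flux, which is the growth-coupled situation.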

Coupling biomass and fermentative products requires few gene deletions in E. coli or S. cerevisiae. Lactate production in E. coli is a good example. Biomass formation results in an excess of reducing equivalents, in the form of NADH, through central metabolism. Under anaerobic conditions, the TCA cycle is not active. Fermentation pathways leading to lactate, ethanol or succinate (all recycling NADH to NAD+) are thus the main remaining mechanisms to re-oxidize excess NADH. This recycling is essential for the cell to sustain growth: without any NAD+ available, flux through central metabolism cannot proceed. As a result, lactate, ethanol and succinate are all potential growth-coupled products through cofactor recycling. Strong coupling between one of these products and growth can be achieved by knocking out the other fermentative pathways.

[Figure 6 panels: A. Simple trade-off; B. Growth-coupled. Horizontal axis: growth rate (1/h); vertical axis: product (mmol/gDW/h).]

Fig. 6. Simple trade-off versus growth-coupled phenotypic phase plane. A. In the simple trade-off design, no product is synthesized at maximal growth rate. B. In the growth-coupled design, product is synthesized at maximal growth rate.

Another key concept related to cofactor recycling as a driving force is the ATP/NADPH ratio, a central theme in cyanobacteria. For instance, in E. coli, knocking out ethanol dehydrogenase and acetate kinase leads to growth-coupled synthesis of lactate in silico (Burgard et al., 2003) and in vivo (Fong et al., 2005). This suggests that NADH recycling alone does not achieve coupling in this case; ATP generation is also an important factor.

Coupling product with biomass has been shown to be difficult in cyanobacteria (Nogales et al., 2013). Under phototrophic conditions, the presence of mechanisms that balance the cofactor redox state hinders coupling (Fig. 7). Cyclic electron flows (or alternate electron flows; CEF) allow the cell to modulate the ATP/NADPH ratio depending on environmental conditions. Because carbon dioxide fixation requires an ATP/NADPH ratio of 1.5, higher than the ratio of 1.28 provided by the linear electron flow, CEF increases the ATP/NADPH ratio by indirectly increasing ATP production at the expense of direct NADH/NADPH oxidation. It is believed that biomass formation requires a ratio larger than 1.51 or 2 (Erdrich et al., 2014; Knoop and Steuer, 2015). For this reason, a successful strategy would aim at decreasing the ATP/NADPH ratio supplied to the cell. Since biomass formation requires a higher ratio, excess NADPH would then be available for the product. This NADPH would be recycled back to NADP+ by the product synthesis pathway, which would provide a way to couple product with growth.

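[Figure 7: A. Photosynthetic and respiratory electron transport chains. B. Scheme with fixed ATP/NADPH ratios: 1.3 supplied by LEF versus ≥ 2 required for biomass, leaving extra NADPH for product synthesis.]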

Fig. 7. A. Overview of photosynthetic and respiratory electron transport chains. In most cyanobacteria, photosynthesis occurs only in the thylakoids, whereas respiration takes place in both the cellular and thylakoid membranes. As a result, photosynthesis and respiration closely interact, even sharing some membrane components (plastoquinone (PQ) pool, cytochrome b6f). Linear electron flow includes photosystem II (PSII), cytochrome b6f (CYTBF), photosystem I (PSI), and ferredoxin NADP+ oxidoreductase (FNR), connected through plastoquinone (PQ), cytochrome c (cytC), and ferredoxin, respectively. Cyclic electron flows include the ferredoxin PQ reductase (FQR), the NAD(P)H dehydrogenase complexes, the aa3-type terminal oxidase (CYO), the PQ oxidase (CydBD), the Mehler reaction, and the hydrogenase (H2ase). B. Simple scheme showing how ATP/NADPH requirements can be engineered to provide a driving force for product synthesis.

Results

OptKnock and OptForce were applied to iJN678. OptKnock was run using the COBRA Toolbox 2.0 in MATLAB (Schellenberger et al., 2011). OptForce was run in the GAMS modeling environment (General Algebraic Modeling System, www.gams.com). The CPLEX solver was used for both algorithms. For OptKnock, transport reactions were removed from consideration. For OptForce, transport and peripheral reactions associated with biomass formation (carotenoid, riboflavin, sterol, nucleic acid components...) were removed from the pool as much as possible (reduced set). Two approaches were used: one iteratively removing these reactions from the solution, and another removing them before running the algorithm. Transporters and other peripheral reactions were otherwise allowed to be knocked out, mainly for informative reasons: understanding the logic behind these knockouts helps to understand the driving forces behind growth-coupled butanol production. In many cases, strategies could not be found for the reduced set. In both cases, reaction deletions found using OptKnock were used to guide the algorithm and decrease the computation time. Simulations were limited to a few hours.

Identifying the driving force

Identifying the reason behind coupling is an important part of understanding how to create a high-producing strain. For each strategy, sensitivity to cofactor recycling is tested to identify which cofactor balance requirements push the flux through product synthesis. The test consists of adding a reaction that re-oxidizes a specific cofactor and examining its influence on the coupling design. If the coupling is not affected, the strategy is not sensitive to this cofactor. Conversely, loss of coupling means that the added reaction is used instead of the pathway, because recycling the cofactor through a single reaction is less of a burden for the cell than producing butanol. For instance, a reaction converting NADH to NAD+ can be added for a given strategy to test sensitivity to NADH recycling. If the phenotypic phase plane (PPP) then shows uncoupling between growth and product, the strategy is said to be sensitive to NADH consumption. Strategies were sensitive to NADH consumption, NADPH consumption, and/or ATP production.
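The logic of this sensitivity test can be illustrated on a toy network small enough to solve by hand. The sketch below is not the iJN678 model and does not call an FBA solver. It assumes a hypothetical three-reaction system in which growth releases one NADH per unit of carbon, product synthesis consumes one carbon and two NADH, and a sink reaction re-oxidizes NADH; all names and numbers are illustrative.

```python
def min_product_at_max_growth(carbon_max, sink_max):
    """Toy balance: growth yields 1 NADH, product uses 2 NADH, sink uses 1.

    Constraints: growth + product <= carbon_max (carbon uptake cap),
                 growth = 2 * product + sink   (NADH steady state),
                 0 <= sink <= sink_max.
    Returns (maximal growth, minimal product flux at that growth).
    """
    # Closed-form maximum of growth under the two constraints above.
    growth = min(carbon_max, (2 * carbon_max + sink_max) / 3)
    # NADH not absorbed by the sink must be recycled through the product.
    product = max(0.0, (growth - sink_max) / 2)
    return growth, product


def is_sensitive(carbon_max, native_sink, test_sink=1000.0, tol=1e-9):
    """A design is sensitive if adding a recycling reaction removes coupling."""
    _, coupled = min_product_at_max_growth(carbon_max, native_sink)
    _, with_test = min_product_at_max_growth(carbon_max, native_sink + test_sink)
    return coupled > tol and with_test <= tol


# Knockout design: the native NADH sink has been removed (capacity 0).
print(min_product_at_max_growth(carbon_max=10.0, sink_max=0.0))
print(is_sensitive(carbon_max=10.0, native_sink=0.0))
```

With the sink removed, growth is only possible if product is made. Adding the test reaction restores maximal growth without product, which is the signature of an NADH-sensitive design.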

OptKnock to predict reaction deletions

OptKnock provides the first step towards identifying suitable gene targets. Long computation times, and the possibility that no solution exists, make it difficult to find targets. OptKnock suggested strategies for both heterotrophic and autotrophic conditions. For simplicity, and because autotrophic conditions are the target conditions for cyanobacterial growth, only autotrophic strategies are considered here. Strategies were found for the NADH pathway and for the Lan & Liao (LL) pathway.

Strategy for NADH pathway

The initial list proposed by OptKnock involved more than 10 reaction deletions to achieve coupling. The list was refined by successively removing each target to obtain a minimal set required to reach coupling. The reduced list involves 7 reaction deletions (Table 2). To compare the effect of each reaction deletion on the flux distribution, a simulation with the corresponding reaction restored was computed for each target (mutant+). A visualization tool was used to easily see global changes in the metabolism (Maarleveld et al., 2014). A phenotypic phase plane makes it possible to visualize growth-coupled production of butanol (Fig. 8). This strategy was shown to be sensitive to NADH consumption.
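The successive-removal procedure can be sketched as follows. This is a minimal illustration rather than the script used in this work: `minimal_knockout_set` and `is_coupled` are hypothetical names, the reaction identifiers are placeholders, and the toy predicate stands in for an FBA-based check of the phenotypic phase plane.

```python
def minimal_knockout_set(knockouts, is_coupled):
    """Restore each knockout in turn; keep it restored if coupling survives."""
    current = list(knockouts)
    for target in list(knockouts):
        trial = [k for k in current if k != target]
        if is_coupled(trial):
            current = trial
    return current


# Toy predicate (assumption): coupling needs these three deletions together.
essential = {"NDH1", "NDH2", "GLUSy"}


def toy_is_coupled(kos):
    return essential <= set(kos)


initial = ["NDH1", "ACK", "NDH2", "LDH", "GLUSy"]
print(minimal_knockout_set(initial, toy_is_coupled))
```

The result is minimal in the sense that no single remaining knockout can be restored without losing coupling. It depends on the order in which targets are tested and is not guaranteed to be the smallest possible set.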

[Figure 8: PPP for NADH pathway. x-axis: growth rate (1/h), 0 to 0.04; y-axis: butanol (mmol/gDW/h), 0 to 0.45; series: initial strain, modified strain.]

Fig. 8. Phenotypic phase plane for the standard NADH pathway. In the modified strain, approximately 0.13 mmol/gDW/h butanol production is predicted at maximal growth.

Among the targets, hindering the cyclic electron flow was shown to be essential for coupling. NADH dehydrogenases (type 1 and type 2) had to be knocked out. This is consistent with the studies of Erdrich et al. (2014) and Reed et al. (2013), which used other algorithms, CASOP and OptORF, respectively. The most obvious reason behind these knockouts is the removal of competing reactions for NADH re-oxidation. Since biomass is optimized, NADH re-oxidation will preferentially occur in the ETC, where ATP can be produced as an end product, rather than being 'wasted' on a product of no interest to the cell (butanol is excreted and cannot be re-used, after all).

This deletion is also believed to be sufficient to lower the ATP/NAD(P)H ratio in two different ways: (1) preventing NADH oxidation to NAD+, and (2) decreasing ATP production through ATP synthase. Decreasing electron flow activity reduces the number of protons crossing the membrane, resulting in a lower proton gradient.

Glutamate synthase was also suggested as a suitable reaction deletion. Glutamate synthase is an oxidoreductase converting glutamine to glutamate. Glutamine synthetase, involved in nitrogen assimilation, goes in the reverse direction. These reactions can combine to form a cycle in which 1 mol ATP and 1 mol NADH are consumed per iteration (Fig. 9). Glutamate synthase may thus be seen as a competing reaction consuming NADH, and its repression is necessary to ensure coupling. Interestingly, its repression also down-regulates the flux through glutamine synthetase, which could otherwise lower the ATP/NADPH ratio by consuming ATP. This deletion shows that NADH recycling, and not lowering this ratio, is the target effect.

Fig. 9. Futile cycle between glutamine and glutamate. This cycle could balance NADH at the expense of ATP. Knockout of glutamate synthase is required in many strategies.
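The cofactor cost of this cycle can be checked by summing the two stoichiometries. In the sketch below, the glutamate synthase reaction is the one listed in Table 2. The glutamine synthetase stoichiometry is an assumption based on the textbook reaction and may differ in detail from the model. The helper `net_reaction` is hypothetical.

```python
from collections import Counter


def net_reaction(*reactions):
    """Sum stoichiometries (negative = consumed) and drop what cancels."""
    total = Counter()
    for rxn in reactions:
        for met, coeff in rxn.items():
            total[met] += coeff
    return {met: coeff for met, coeff in total.items() if coeff != 0}


# Glutamate synthase, as listed in Table 2.
glutamate_synthase = {"H+": -1, "NADH": -1, "2-Oxoglutarate": -1,
                      "L-Glutamine": -1, "NAD+": 1, "L-Glutamate": 2}
# Glutamine synthetase (assumed textbook stoichiometry, not from Table 2).
glutamine_synthetase = {"L-Glutamate": -1, "NH4+": -1, "ATP": -1,
                        "L-Glutamine": 1, "ADP": 1, "Pi": 1, "H+": 1}

net = net_reaction(glutamate_synthase, glutamine_synthetase)
for met, coeff in sorted(net.items()):
    print(f"{coeff:+d} {met}")
```

Under this assumption, one turn consumes one ATP and one NADH, in agreement with the text. The two reactions alone also yield one net glutamate from 2-oxoglutarate and ammonium, so a further reaction is needed to close the loop completely.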

Malate dehydrogenase (MDH) might have been suggested for the same reason. It catalyzes the interconversion between oxaloacetate and malate (Fig. 12). The MDH+ simulation shows that the reaction normally runs in the direction of NADH consumption to form malate. Thus, knocking this reaction out is believed to remove another competing reaction for NADH consumption. Knocking out MDH also resulted in a 23% increase in flux from pyruvate to acetyl-CoA, the precursor of 1-butanol. This deletion may offer an additional important feature of coupling: directing the flux towards the pathway of interest. Indeed, FBA predicts a higher flux towards pyruvate in the mutant than in the MDH+ distribution. However, it is not clear whether this results from the coupling itself or from this individual knockout. To further confirm the influence of NADH recycling, the cofactor preference of malate dehydrogenase and glutamate synthase was swapped from NADH to NADPH. Knocking out these two targets was then no longer essential to reach coupling.

Pyruvate-ferredoxin oxidoreductase (POR_syn) catalyzes pyruvate decarboxylation to acetyl-CoA, resulting in the reduction of the cofactor ferredoxin. Ferredoxin can be oxidized back using ferredoxin-NADP+ reductase (FNOR) with NADP+ as electron acceptor. The mechanism behind this knockout is not clear, since POR_syn indirectly offers more NADPH. Interestingly, knocking out POR_syn results in a 10% increase in flux through FNOR in comparison to the POR+ mutant. Many interventions target the central metabolism, as expected: since the flux there is higher than in the rest of the metabolism, interventions there have the highest effect.

Glycolate oxidase (GLYCTO1), our last target, is part of photorespiration. Two variants exist in the model: one uses O2 and the other uses NAD+ as electron acceptor. This knockout is potentially problematic and illustrates a limitation that one can encounter using OptKnock. In fact, both versions are encoded by the same open reading frame (ORF). Knocking out both versions results in a lethal phenotype according to simulations. This is consistent with reports that photorespiration is important for cell viability (Knoop et al., 2013). To understand the meaning of this knockout, the mutant was compared with the GLYCTO1+ mutant flux distribution. When the flux is not constrained, FBA predicts that the O2 version of the reaction carries all the flux. In contrast, the NAD+ version is the only route for photorespiration in the mutant, resulting in coupling.

An interesting question is whether the knockout itself or the down-regulation of photorespiration is the essential feature of coupling. To test this, the flux was forced to a low value for both versions. This resulted in coupling, indicating that the target knockout aims at reducing the flux through photorespiration. But why should the flux be down-regulated to ensure coupling? There are two explanations: (1) more NADH is produced via the NAD+ variant, which can be used for butanol production; (2) low flux through photorespiration together with the steady-state constraints implies that the carbon flux is redirected to the central metabolism. Comparison with the mutant+ indicates that this accounts for a 12% increase in flux towards 3-phosphoglycerate (3PG). Again, it is not clear whether this feature is due to this individual knockout or to the sum of the knockouts resulting in coupling.

Table 2. Minimal knockout set to couple growth with butanol in the NADH pathway.

Enzyme name                             Reaction
Glutamate synthase                      H+ + NADH + 2-Oxoglutarate + L-Glutamine -> NAD+ + 2 L-Glutamate
NADH dehydrogenase 2 (thylakoid), NADH  H+ + NADH + PQ_til -> NAD+ + PQH2_til
NADH dehydrogenase 1 (periplasm), NADH  4 H+ + NADH + PQ_per -> NAD+ + 3 H+_per + PQH2_per
NADH dehydrogenase 1 (thylakoid), NADH  4 H+ + NADH + PQ_til -> NAD+ + 3 H+_til + PQH2_til
Pyruvate-ferredoxin oxidoreductase      CoA + Pyruvate + 2 oxidized ferredoxin -> H+ + CO2 + Acetyl-CoA + 2 reduced ferredoxin
Glycolate oxidase                       O2 + Glycolate -> H2O2 + Glyoxylate
Malate dehydrogenase                    NAD+ + L-Malate <=> H+ + NADH + Oxaloacetate

Strategy for LL pathway

The initial set suggested by OptKnock involves acetate kinase, glycolate oxidase, NADH dehydrogenases type 1 and type 2, cytochrome c oxidase, malate dehydrogenase, lactate dehydrogenase and glutamate synthase. The minimal knockout set to ensure coupling is similar to the one obtained for the NADH pathway (Table 2). The only difference is the absence of pyruvate-ferredoxin oxidoreductase. Therefore, the knockouts are believed to have the same functions. However, the growth-coupled phenotype shows a lower level of butanol production (Fig. 10). This strategy is sensitive to NADH consumption, indicating that NADH recycling is the driving force behind the coupling.

[Figure 10: PPP for LL pathway. x-axis: growth rate (1/h), 0 to 0.04; y-axis: butanol (mmol/gDW/h), 0 to 0.45; series: initial strain, modified strain.]

Fig. 10. Phenotypic phase plane for the Lan & Liao pathway. In the modified strain, approximately 0.02 mmol/gDW/h butanol production is predicted at maximal growth.

When pyruvate-ferredoxin oxidoreductase was added to the knockout set, almost no growth was observed. This suggests that the model is sensitive to additional knockouts. Interestingly, this strategy also worked for the standard NADH pathway but not for any of the others. This is consistent with the assumption that NADH is the driving force behind this strategy. When cofactor specificity is changed such that the pathway only requires NADPH, almost no growth is observed. Adding a reaction that recycles NADH to NAD+ rescues this phenotype and restores normal growth. This indicates that the pathway is important for NADH recycling and is now vital for the cell.

OptForce to predict reaction modulations

OptForce was used to predict reaction modulations to couple butanol to growth for the five different pathways. Because it considers gene modulations, this algorithm is more powerful than OptKnock and in theory requires fewer interventions. Solutions could be found for all pathways. However, many of these results were cofactor-insensitive, meaning that coupling was achieved through a different driving force. Adding a proton sink uncoupled growth from product in most of these strategies. Since proton-driven coupling strategies appear unlikely, those designs were discarded.

Strategy for TPC pathway

The initial strategy consists of no more than 21 interventions. Ferredoxin NADP+ reductase was suggested to be up-regulated. Glycolate oxidase, cytochrome c oxidase, NADH dehydrogenase type 1 and 2, transhydrogenase, ATP synthase, glutamate synthase, glutamate dehydrogenase, and malate and acetate transporters were among the initial knockout list. The reduced set includes 14 reaction deletions and one up-regulation. This strategy is sensitive to NADH and NADPH consumption and to ATP production. Familiar targets found in other strategies (glutamate futile cycle, cyclic electron flows) re-appear and seem to be important components of coupling designs.

It should be pointed out that another set of NADH dehydrogenase knockouts is required here, including both NADH- and NADPH-dependent enzymes, consistent with NADH and NADPH sensitivity. ATP synthase, which uses the proton gradient across the membrane to generate ATP, is also suggested as a knockout, hypothetically to decrease the ATP/NADPH ratio. Cyclic electron flows, the Mehler reaction and hydrogenase are potential competing reactions for NADPH re-oxidation. The Mehler reaction was also predicted by CASOP (Erdrich et al., 2014) as a member of the cyclic electron flows. Thus, its deletion may aim at lowering the ATP/NADPH ratio rather than at direct cofactor recycling. Restoring both reactions and swapping cofactor preferences from NADPH to NADH had no effect, owing to the sensitivity to both cofactors.

Fumarase (FUM) deletion is an interesting target since it is cofactor-independent. This knockout offers a genuine flux redirection instead. To analyze flux rerouting, the flux distribution was compared with FUM+, the modified strain with fumarase restored. In the model, three reactions are connected to malate: malate dehydrogenase (a knockout encountered earlier), fumarase, and malic enzyme, which converts malate to pyruvate using NADP+ (Fig. 12). In FUM+, malic enzyme is not active and malate is entirely converted into fumarate. In the mutant, knocking out FUM diverts the flux from malate back to pyruvate and, by extension, to the acetyl-CoA pool. Additionally, this rerouting produces extra NADPH for butanol.

[Figure 11: PPP for TPC pathway. x-axis: growth rate (1/h), 0 to 0.04; y-axis: butanol (mmol/gDW/h), 0 to 0.45; series: initial strain, modified strain.]

Fig. 11. Phenotypic phase plane for the TPC pathway. In the modified strain, approximately 0.14 mmol/gDW/h butanol production is predicted at maximal growth.

It is interesting to compare this deletion with the malate dehydrogenase deletion from the previous strategy for the NADH pathway. Even though these two knockouts are neighbors, their resulting effects are quite different. The MDH knockout aimed at removing a reaction competing with butanol synthesis for NADH. It decreased NADPH production, which is not essential for butanol in that pathway. In the TPC design, NADPH requirements are higher (4 mol vs 1 mol). For this reason, producing NADPH at the expense of NADH might be an explanation for this knockout. This can be seen as a regulatory mechanism (similar to a futile cycle). Phosphoenolpyruvate (PEP) can be converted to pyruvate either through the direct dephosphorylation route producing ATP, or through a first carboxylation step to oxaloacetate (OAA). A second step through MDH then produces malate from OAA and NADH. Finally, a decarboxylation step to pyruvate produces NADPH. It should be noted that PEP, oxaloacetate, malate and pyruvate could together form a futile cycle acting as a regulatory mechanism. This is an example of how the cofactor requirements of butanol synthesis can lead to different knockouts in a specific region. Finally, the acetate and malate transporter knockouts could be replaced by an acetate kinase knockout. The main reason behind ferredoxin NADP+ reductase overexpression is the production of extra NADPH, a driving force in this design.


Fig. 12. Overview of central metabolism reactions around pyruvate. There are two routes from PEP to pyruvate; a direct and an indirect one through OAA and malate. PEP carboxylase, malate dehydrogenase, malic enzyme and PEP synthase could form a cycle indirectly converting NADH to NADPH.

Table 3. Minimal knockout set to couple growth with butanol for the TPC pathway.

Enzyme name                                     Reaction
Glutamate synthase                              H+ + NADH + 2-Oxoglutarate + L-Glutamine -> NAD+ + 2 L-Glutamate
NADH dehydrogenase 2 (thylakoid)                H+ + NADH + PQ_til -> NAD+ + PQH2_til
NADH dehydrogenase 1 (thylakoid), NADPH         4 H+ + NADPH + PQ_til -> NADP+ + 3 H+_til + PQH2_til
NADH dehydrogenase 1 (thylakoid), NADH          4 H+ + NADH + PQ_til -> NAD+ + 3 H+_til + PQH2_til
ATP synthase                                    3 ADP + 3 Pi + 14 H+_per -> 3 ATP + 11 H+ + 3 H2O
Glycolate oxidase                               O2 + Glycolate -> H2O2 + Glyoxylate
Fumarase                                        H2O + Fumarate <=> Malate
Phosphotransacetylase                           Acetyl-CoA + Pi <=> CoA + Acetyl-P
Hydrogenase                                     H+ + NADPH <=> NADP+ + H2
Active CO2 transporter facilitator (thylakoid)  3 H+ + H2O + NADPH + PQ + CO2 -> NADP+ + HCO3- + 3 H+ + PQH2
Mehler reaction                                 H+ + 0.5 O2 + NADPH -> H2O + NADP+
Cytochrome c oxidase (thylakoid)                4 H+ + 2 ferrocytochrome + 0.5 O2_til -> 2 H+ + 2 ferricytochrome + H2O_til
                                                4 H+ + 2 plastocyanin(Cu+) + 0.5 O2_til -> 2 H+ + 2 plastocyanin(Cu2+) + H2O_til
Ferredoxin NADP+ reductase (up-regulated)       H+ + NADP+ + 2 reduced ferredoxin <=> NADPH + 2 oxidized ferredoxin

This strategy also works for all other pathways, including the NADPH one. However, it did not work for the isobutanol pathway, most probably because a different precursor is used (pyruvate). This design is the first to show sensitivity to three factors: NADH consumption, NADPH consumption, and ATP production. The last two factors are familiar and can be grouped into the ATP/NADPH ratio, which acts as a driving force. Adding a reaction that consumes NADPH and produces ATP mimics an increase in the ATP/NADPH ratio. A natural question is how two driving forces can arise. Is one not sufficient, or are these sensitivities related? To investigate these questions, the flux distribution at maximal growth rate with an NADH recycling reaction was analyzed. Flux through the transhydrogenase suggests that excess NADPH is converted into NADH. Flux distributions were then compared with and without the transhydrogenase reaction set to 0. This allowed the identification of another regulatory mechanism that can substitute for the transhydrogenase. Two glyceraldehyde-3-phosphate dehydrogenases (slr0844 & sll1342) form a cycle that can consume NADH and produce NADPH (Fig. 13). Knocking out both mechanisms did not make the strategy less sensitive to either of the two cofactors. This does not necessarily indicate the presence of two independent driving forces. Rather, judging from the transhydrogenase direction, the ATP/NADPH ratio could be the driving force. Since the pathway requires NADH, active conversion of NADPH to NADH occurs in the network and is necessary to achieve coupling. With hundreds of reactions, it is always possible for the network to balance the NADH/NADPH ratio using peripheral reactions rather than the mechanisms studied, but at higher metabolic cost.

Fig. 13. Glyceraldehyde 3-phosphate dehydrogenase cycle. This cycle can substitute the transhydrogenase reaction.

Strategy for the isobutanol pathway

The initial list includes more than 15 interventions. The set was reduced, and reactions including PHB synthesis, succinate dehydrogenase and transhydrogenase were not required for the coupling. The reduced set incorporates 16 reaction knockouts and one overexpression (Table 4). This strategy is sensitive to NADH consumption and ATP production. The transhydrogenase also carries no flux in this situation, suggesting that NADH is in excess. This is confirmed when reversible flux is allowed. However, when fumarase, succinate dehydrogenase, and polyhydroxybutyrate synthase are knocked out, the design is no longer sensitive to NADH. It is not clear how butanol synthesis regenerates ATP and can be coupled to growth.

Among the targets not already discussed, the pyruvate transporter was suggested as a knockout. The most logical explanation is a redirection of carbon flux to isobutanol. It is interesting to obtain this target for isobutanol, the only pathway using pyruvate as a direct precursor. Starting from pyruvate may require more rerouting than starting from acetyl-CoA, which has no transporter. Pyruvate has been suggested as an overflow metabolite under nitrogen starvation for cells lacking glycogen synthesis (Gründel et al., 2012). The ATP synthase knockout is consistent with ATP sensitivity and a decrease in the ATP/NADPH ratio. Overexpression of phosphoribulokinase, an enzyme of carbon fixation, was also suggested. Increasing carbon fixation may indeed be beneficial for product synthesis and, additionally, the ATP consumed by this reaction lowers the ATP/NADPH ratio. This strategy also works for all other pathways (including NADPH-dependent butanol) except for the LL pathway.

Table 4. Minimal knockout set to couple growth with isobutanol.

Enzyme name                                     Reaction
NADH dehydrogenase 2 (thylakoid)                H+ + NADH + PQ_til -> NAD+ + PQH2_til
NADH dehydrogenase 1 (periplasm)                4 H+ + NADPH + PQ_til -> NADP+ + 3 H+_til + PQH2_til
NADH dehydrogenase 1 (thylakoid)                4 H+ + NADH + PQ_til -> NAD+ + 3 H+_til + PQH2_til
ATP synthase                                    3 ADP + 3 Pi + 14 H+_per -> 3 ATP + 11 H+ + 3 H2O
Glycolate oxidase                               O2 + Glycolate -> H2O2 + Glyoxylate
Phosphotransacetylase                           Acetyl-CoA + Pi <=> CoA + Acetyl-P
Active CO2 transporter facilitator (thylakoid)  3 H+ + H2O + NADPH + PQ + CO2 -> NADP+ + HCO3- + 3 H+ + PQH2
Cytochrome c oxidase                            4 H+ + 2 ferrocytochrome + 0.5 O2_til -> 2 H+ + 2 ferricytochrome + H2O_til
                                                4 H+ + 2 plastocyanin(Cu+) + 0.5 O2_til -> 2 H+ + 2 plastocyanin(Cu2+) + H2O_til
Cyclic electron flow (FQR)                      2 H+ + PQ_til + 2 reduced ferredoxin -> PQH2_til + 2 oxidized ferredoxin
Pyruvate-ferredoxin oxidoreductase              CoA + Pyruvate + 2 oxidized ferredoxin -> H+ + CO2 + Acetyl-CoA + 2 reduced ferredoxin
Pyruvate transporter                            Pyruvate_cyt <=> Pyruvate_ext
Phosphoribulokinase (overexpressed)             ATP + Ribulose-5P -> ADP + H+ + Ribulose-1,5-biP

[Figure 14: PPP for isobutanol. x-axis: growth rate (1/h), 0 to 0.04; y-axis: isobutanol (mmol/gDW/h), 0 to 0.45; series: initial strain, modified strain.]

Fig. 14. Phenotypic phase plane for isobutanol production. In the modified strain, approximately 0.3 mmol/gDW/h isobutanol production is predicted at maximal growth. This corresponds to 77% of the maximal theoretical production rate.

Identifying buffering mechanisms for cofactor balancing

Buffering mechanisms are to some extent unwanted, as the aim is to create a cofactor imbalance that shifts the flux towards product synthesis. Buffering mechanisms were identified using cofactor excess conditions. Reactions that create a cofactor excess were added to the model. FBA was solved and the flux distributions were compared with normal conditions. Reactions with significant changes in their fluxes were identified as targets and as potential competing reactions for cofactor recycling. Moreover, since FBA always aims at finding an optimal distribution, such buffering reactions can be found iteratively: the reactions identified are set to their initial flux values and new ones are then identified. A hierarchical order of regulation can thus be obtained, since the first reactions predicted by FBA are mathematically the most efficient for cofactor recycling. Imbalance was created in the form of (1) reduction of NAD+ to NADH, (2) reduction of NADP+ to NADPH, (3) ATP hydrolysis to ADP, and (4) ATP hydrolysis combined with NADPH production, to account for ATP/NADPH ratio modulation. For each condition, an arbitrary flux of 20 mmol/gDW/h was set for the reaction creating excess cofactors. This arbitrary value needs to be high enough to see changes but not so high as to prevent growth. Ideally, a value leading to a growth rate similar to the ones obtained using the algorithms would perhaps enable better comparisons.
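The comparison step can be sketched as a ranking of reactions by their change in flux between the two conditions. The code below is an illustration under stated assumptions: the function name `rank_buffering_candidates`, the threshold, the reaction identifiers and all flux values are hypothetical and are not results from this work.

```python
def rank_buffering_candidates(reference, perturbed, threshold=1.0):
    """Return (reaction, change) pairs sorted by decreasing absolute change.

    reference, perturbed: dicts mapping reaction id to flux (mmol/gDW/h).
    Reactions changing by less than `threshold` are ignored.
    """
    changes = {}
    for rxn in set(reference) | set(perturbed):
        delta = perturbed.get(rxn, 0.0) - reference.get(rxn, 0.0)
        if abs(delta) >= threshold:
            changes[rxn] = delta
    # Largest change first; ties are broken alphabetically for stability.
    return sorted(changes.items(), key=lambda item: (-abs(item[1]), item[0]))


# Hypothetical flux distributions without and with forced NADH excess.
normal = {"NDH1": 2.0, "NDH2": 0.5, "CYO": 1.0, "GLUSy": 0.3, "RBC": 5.0}
nadh_excess = {"NDH1": 14.0, "NDH2": 6.5, "CYO": 3.0, "GLUSy": 0.8, "RBC": 4.8}

for rxn, delta in rank_buffering_candidates(normal, nadh_excess):
    print(f"{rxn}: {delta:+.1f}")
```

For the iterative procedure, the top-ranked reactions would be fixed at their reference fluxes, FBA solved again, and the ranking repeated to reveal secondary mechanisms.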

Many of the regulatory mechanisms found are consistent with targets predicted by both OptForce and OptKnock. Two general patterns can be obtained from the simulations: one for (1) NADH balance and another one for (2) NADPH, (3) ATP, (4) ATP/NADPH. This indicates that ATP production and NADPH consumption are closely linked (ATP/NADPH ratio) and use a similar set of reactions.

For NADH regulation, NADH dehydrogenases type 1 and 2 are the first regulatory mechanisms predicted by FBA to balance an increased NADH production. Secondary regulating mechanisms involve other enzymes of the electron transport chain, such as the cytochrome b6/f complex or cytochrome c oxidase. Interestingly, glutamate synthase and glutamate dehydrogenase both showed the same level of up-regulation, in what might be a futile cycle recycling NADH into NAD+. This is consistent with the hypothesis derived from OptKnock for the NADH pathway strategy. It also makes sense that this futile cycle is listed as a secondary regulatory mechanism, since it wastes one mole of ATP per NADH recycled. Other secondary regulating mechanisms involve adenylate kinase, which consumes ATP. The utility of this target for NADH dissipation is not clearly understood.

The other pattern involves the cytochrome b6/f complex or cytochrome c oxidase as primary regulatory mechanism, whereas NADH dehydrogenase type 1 was here predicted as a secondary mechanism, together with the transhydrogenase. Since the transhydrogenase is modeled as a non-reversible reaction converting NADPH to NADH, it makes sense that it participates in cofactor balancing for NADPH dissipation.

Discussion

Objective function choice

All simulations were performed under the assumption that cyanobacteria optimize their growth, which is true for most bacteria. Even though this assumption is relevant for cyanobacteria, growth might not be their primary objective. A diurnal environment imposes some flexibility on the cell. It has been suggested that cyanobacteria optimize ATP production to better react to changes in the environment (Siurana et al., unpublished). Similarly, ATP yield optimization has been shown to better describe E. coli flux states in resting conditions (). Other groups suggested splitting the objective function in two, one part accounting for light and one for dark conditions: biomass is optimized during the day and ATP is maximized during the night. Nevertheless, choosing a suitable objective function remains a challenge in bringing genome-scale models closer to reality.

Comparing results with other studies

Results are in good agreement with two other studies using different models and algorithms. However, growth-coupled strategies for fermentative products under autotrophic conditions have not been reported previously. OptORF suggested blocking reactions or cycles that consume reducing power in the form of NAD(P)H for ethanol, acetate, alanine, succinate, butanol, and isoprene in Synechococcus sp. PCC 7002 (Reed et al., 2013). NADH dehydrogenase was also a target to be blocked in their study. They acknowledged difficulties in coupling growth to chemical production because of the carbon-limiting conditions considered (excess light). Under these conditions, an excess of reductants, the driving force for growth-coupling a product, was absent.

Erdrich et al. (2014) used the Knoop model and CASOP to find strategies for ethanol production (NADPH used). Cyclic electron flows were the main targets to be blocked. The list includes the NADPH dehydrogenase, NADH dehydrogenase type 2, FQR, cytochrome c oxidase, cytochrome oxidase bd, and the Mehler reaction. Additionally, the proline dehydrogenase reaction "Proline + PQ => 1-Pyrroline-5-carboxylate + PQH2" (sll1561) is believed to form a cyclic electron flow with pyrroline-5-carboxylate reductase catalyzing "Pyrroline-5-carboxylate + NAD(P)H + H+ => Proline + NAD(P)+" (slr0661). In iJN678, proline dehydrogenase uses flavin adenine dinucleotide (FADH2) as cofactor instead of plastoquinone (PQ). This electron carrier does not participate in cyclic electron flows. Nevertheless, pyrroline-5-carboxylate reductase was also suggested by OptKnock, even though its knockout was not required to couple butanol with growth. The authors also suggested a similar cycle, catalyzed by two glycerol-3-phosphate dehydrogenases. The NADH-dependent interconversion of glycerol 3-phosphate and dihydroxyacetone phosphate forms one half of the cycle (slr1755). PQ acts as an electron carrier to convert glycerol 3-phosphate to dihydroxyacetone phosphate and close the cycle (sll1085). Again, this last reaction uses FADH2 instead according to iJN678. When FADH2 was swapped for PQ, product synthesis became uncoupled from growth. Knocking out these two cycles rescued the coupling. This shows the importance of comparing cofactor preferences between genome-scale models.

Cofactor buffering mechanisms

Throughout this work, cofactor buffering mechanisms have proven to be suitable knockout targets. Removing these mechanisms forces cofactor balance to be maintained through product synthesis.

In E. coli and Bacillus subtilis (B. subtilis), transhydrogenases have been the subject of intensive studies by U. Sauer and co-workers (Rühl et al., 2012; Chou et al., 2015). As a regulatory reaction that balances the NAD(H)/NADP(H) ratio, the transhydrogenase is of interest for engineering, as it might make more cofactor available for product synthesis (Angermayr et al., 2012). In iJN678, an irreversible transhydrogenase converting NADPH to NADH is present in the model. In contrast, Knoop et al. (2013) account for a reversible transhydrogenase in their model. Allowing a reversible flux through the transhydrogenase uncoupled butanol from growth for designs sensitive to NADH consumption. In other words, instead of the butanol pathway being used to recycle NADH, the transhydrogenase is used, producing reducing equivalents in the form of NADPH. NADPH is then recycled by numerous reactions in the cell. The transhydrogenase should therefore not be seen as a competing reaction for cofactor recycling, since a re-oxidation step still needs to be carried out. Instead, it allows regulation between both cofactors. Nevertheless, its knockout may leave fewer competing reactions if coupling is achieved through one particular cofactor balance. In strategies using both NADH and NADPH, mechanisms such as the transhydrogenase are essential to keep a good balance between both cofactor types in the most direct way.

In this project, a couple of cycles that could regulate cofactor balance were identified and could be added to the list of important targets to consider. Interconversion between glutamine and glutamate provides a regulatory cycle that consumes ATP and NADH at each iteration. Experimentally, a related cycle between anabolism and catabolism of glutamate (consuming NADPH and producing NADH) has been identified in B. subtilis (Rühl et al., 2012). Such a cycle exists in cyanobacteria, but did not seem to be active in silico.

Another cycle involves several reactions around pyruvate. This cycle consumes NADH and produces NADPH on the way from PEP to PYR, instead of producing ATP through pyruvate kinase. OptKnock suggested a pyruvate kinase knockout in numerous simulations, but this knockout was not essential to achieve coupling. This cycle can be modulated to suit the cofactor requirements of the pathway of interest. For instance, in NADPH-dependent pathways, fumarase deletion was found to amplify this cycle and produce NADPH. If the pathway requires NADH, malate dehydrogenase was suggested for knockout. This set of reactions is believed to play an important role in cell physiology. The highest fluxes are found in central metabolism, so knockouts there potentially have large effects. In B. subtilis, malic enzyme was experimentally identified as a cofactor buffering mechanism alongside a reverse reaction not present in cyanobacteria.
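The net effect of such a cycle can be checked by summing reaction stoichiometries. The sketch below assumes that the route from PEP to PYR runs through PEP carboxylase, malate dehydrogenase and an NADP-dependent malic enzyme. This reaction set and the simplified textbook stoichiometries (protons and water omitted) are assumptions made for illustration and are not copied from the model.

```python
from collections import Counter

# Assumed textbook stoichiometries; negative coefficients are consumed.
PEP_CARBOXYLASE = {"PEP": -1, "HCO3": -1, "OAA": 1, "Pi": 1}
MALATE_DEHYDROGENASE = {"OAA": -1, "NADH": -1, "MAL": 1, "NAD": 1}
MALIC_ENZYME = {"MAL": -1, "NADP": -1, "PYR": 1, "CO2": 1, "NADPH": 1}
PYRUVATE_KINASE = {"PEP": -1, "ADP": -1, "PYR": 1, "ATP": 1}


def net_stoichiometry(reactions):
    """Sum reaction stoichiometries and drop metabolites that cancel out."""
    total = Counter()
    for reaction in reactions:
        for metabolite, coefficient in reaction.items():
            total[metabolite] += coefficient
    return {m: c for m, c in total.items() if c != 0}


bypass = net_stoichiometry(
    [PEP_CARBOXYLASE, MALATE_DEHYDROGENASE, MALIC_ENZYME]
)
direct = net_stoichiometry([PYRUVATE_KINASE])
print(bypass)
print(direct)
```

Under these assumptions, the bypass converts PEP to PYR while oxidizing one NADH and reducing one NADP, and the intermediates OAA and MAL cancel out. The direct pyruvate kinase step yields one ATP instead.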

Simultaneous activity of both glyceraldehyde-3-phosphate dehydrogenases was also found to form a cycle interconnecting NADH with NADPH. This cycle was shown to be important for NADPH balancing in nitrogen-starving B. subtilis. In their study of NADPH regulation, Rühl et al. (2012) measured catabolic NADPH production. A significant part of this production could not be assigned to the measured recycling mechanisms, indicating the presence of additional buffering reactions.

ATP/NADPH versus NADH as driving forces

Using pathways with different cofactor requirements makes it possible to analyze the different driving forces behind coupling. Overall, driving forces and pathway requirements correlate. In some cases, coupling occurs through a cofactor other than the one the pathway requires, suggesting that an interconversion mechanism meets the cofactor requirements of the butanol pathway. Cofactors are highly interconnected metabolites. This property, together with the numerous mechanisms that balance the oxidized and reduced forms as well as the cofactor types, makes it difficult to understand the driving forces behind coupling.

Two driving forces appear to couple growth to product synthesis. On the one hand, the ATP/NADPH ratio has already been discussed (Erdrich et al., 2014) for NADPH-dependent synthesis of ethanol. On the other hand, NADH recycling as a driving force in cyanobacteria has not been reported previously. These two driving forces are not mutually exclusive: they can occur individually or together in the same design. Moreover, because of regulatory mechanisms other than the ones investigated, both cofactor balances are closely related. To illustrate this point, NADH recycling was identified as a driving force for all strategies, including the fully NADPH-dependent pathway. When the identified targets for cofactor balance were knocked out, the strategy was still sensitive to NADH. This could indicate that NADH recycling is the driving force, interconnected with NADPH through mechanisms not identified in this study.

Another example shows the complexity of this close connection between both driving forces. In the first strategy for the NADH-dependent pathway, NADH was the sole driving force identified. However, when the ATP/NADPH ratio is plotted for different fixed growth rates and butanol production rates, it decreases as butanol is produced. Even though NADH is the driving force, the strategy correlates with the ATP/NADPH ratio.
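A scan of this kind can be sketched as follows. This is a simplified demand calculation and not a flux balance simulation: the function name and all four cost coefficients are hypothetical placeholders, and the ratio is computed from assumed ATP and NADPH demands of biomass and butanol instead of from model fluxes.

```python
def atp_nadph_demand_ratio(growth, butanol,
                           atp_per_biomass=50.0, nadph_per_biomass=25.0,
                           atp_per_butanol=12.0, nadph_per_butanol=8.0):
    """ATP/NADPH demand ratio for a fixed growth rate and butanol flux.

    All four cost coefficients are hypothetical placeholders.
    """
    atp = atp_per_biomass * growth + atp_per_butanol * butanol
    nadph = nadph_per_biomass * growth + nadph_per_butanol * butanol
    if nadph <= 0:
        raise ValueError("NADPH demand must be positive")
    return atp / nadph


# Hypothetical grid of fixed growth rates and butanol production rates.
growth_rates = [0.02, 0.05]
butanol_rates = [0.0, 0.1, 0.2]
scan = {
    mu: [atp_nadph_demand_ratio(mu, v) for v in butanol_rates]
    for mu in growth_rates
}
for mu, ratios in scan.items():
    print(mu, [round(r, 3) for r in ratios])
```

In this sketch the ratio falls with increasing butanol flux only because the assumed ATP/NADPH cost of butanol (1.5) is lower than that of biomass (2.0). With other coefficients the trend could reverse, so the example shows how such a scan can be tabulated and does not reproduce the reported result.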

Overall, strategies using NADPH as the sole driving force could not be found. Often, coupling appeared to occur through NADH recycling, with NADH then converted into NADPH to meet the pathway's needs. One simple explanation is that NADPH is listed in more reactions than NADH (86 versus 52 reactions). Another explanation is the NADPH requirement for carbon fixation in the Calvin cycle. This could act as a competing reaction that cannot be knocked out under phototrophic conditions.

Conclusion

In conclusion, coupling a fermentative product to growth is difficult under autotrophic conditions, in particular under carbon-limited physiology with light in excess. In this work, the first strategies to couple butanol to growth under autotrophic conditions are presented. The strategies aim at reducing the robustness of photosynthesis by targeting cyclic electron flows, in accordance with similar studies. Additionally, different mechanisms buffering the cofactor balance, as well as reactions competing for cofactor recycling, are also targets. Two driving forces were identified: the ATP/NADPH ratio and NADH recycling. It seems that the latter is the actual driving force in coupling strategies. A successful strategy creates a cofactor imbalance and removes cofactor regulatory mechanisms so that product synthesis is the last mechanism available to restore the balance, forcing flux towards product synthesis.

Future directions

Testing these strategies, or a combination thereof, experimentally is a natural extension of this project. In particular, MFA could be used to further constrain the model. Metabolomics could help to identify driving forces by measuring cofactor concentrations for different genotypes. Transcriptomics data could also indicate whether certain reactions are active. To find alternative strategies, OptKnock and OptForce could be run again with different settings and on different models. For time reasons, OptSwap could not be used in this project, but it would be relevant for studying cofactor requirements in more detail. Finally, it is difficult to knock out a large number of targets at once. However, with recent advances in genome editing, accurate and wide-ranging genetic modification is now within reach.

References

Agren, R., Liu, L., Shoaie, S., Vongsangnak, W., Nookaew, I., and Nielsen, J. (2013). The RAVEN Toolbox and Its Use for Generating a Genome-scale Metabolic Model for Penicillium chrysogenum. PLoS Computational Biology 9.

Angermayr, S. A., Paszota, M., and Hellingwerf, K. J. (2012). Engineering a cyanobacterial cell factory for production of lactic acid. Applied and Environmental Microbiology 78, 7098-7106.

Angermayr, S. A., and Hellingwerf, K. J. (2013). On the use of metabolic control analysis in the optimization of cyanobacterial biosolar cell factories. Journal of Physical Chemistry B 117, 11169-11175.

Atsumi, S., Cann, A. F., Connor, M. R., Shen, C. R., Smith, K. M., Brynildsen, M. P., Chou, K. J. Y., Hanai, T., and Liao, J. C. (2008). Metabolic engineering of Escherichia coli for 1-butanol production. Metabolic Engineering 10, 305-311.

Atsumi, S., Higashide, W., and Liao, J. C. (2009). Direct photosynthetic recycling of carbon dioxide to isobutyraldehyde. Nature biotechnology 27, 1177-1180.

Bond-Watts, B. B., Bellerose, R. J., and Chang, M. C. Y. (2011). Enzyme mechanism as a kinetic control element for designing synthetic biofuel pathways. Nature chemical biology 7, 222-227.

Boynton, Z. L., Bennett, G. N., and Rudolph, F. B. (1996). Cloning, sequencing, and expression of clustered genes encoding beta-hydroxybutyryl-coenzyme A (CoA) dehydrogenase, crotonase, and butyryl-CoA dehydrogenase from Clostridium acetobutylicum ATCC 824. Journal of Bacteriology 178, 3015-3024.

Burgard, A. P., Pharkya, P., and Maranas, C. D. (2003). OptKnock: A Bilevel Programming Framework for Identifying Gene Knockout Strategies for Microbial Strain Optimization. Biotechnology and Bioengineering 84, 647-657.

Chou, H.-H., Marx, C. J., and Sauer, U. (2015). Transhydrogenase Promotes the Robustness and Evolvability of E. coli Deficient in NADPH Production. PLOS Genetics 11, e1005007.

Erdrich, P., Knoop, H., Steuer, R., and Klamt, S. (2014). Cyanobacterial biofuels: new insights and strain design strategies revealed by computational modeling. Microbial Cell Factories 13, 128.

Feist, A. M., Herrgård, M. J., Thiele, I., Reed, J. L., and Palsson, B. Ø (2009). Reconstruction of Biochemical Networks in Microbial Organisms. Nature Reviews Microbiology 7, 129-143.

Fong, S. S., Burgard, A. P., Herring, C. D., Knight, E. M., Blattner, F. R., Maranas, C. D., and Palsson, B. O. (2005). In silico design and adaptive evolution of Escherichia coli for production of lactic acid. Biotechnology and bioengineering 91, 643-648.

Gründel, M., Scheunemann, R., Lockau, W., and Zilliges, Y. (2012). Impaired glycogen synthesis causes metabolic overflow reactions and affects stress responses in the cyanobacterium Synechocystis sp. PCC 6803. Microbiology (United Kingdom) 158, 3032-3043.

Hädicke, O., and Klamt, S. (2010). CASOP: A Computational Approach for Strain Optimization aiming at high Productivity. Journal of Biotechnology 147, 88-101.

Huang, H.-H., and Lindblad, P. (2013). Wide-dynamic-range promoters engineered for cyanobacteria. Journal of biological engineering 7, 10.

Hucka, M., Finney, A., Sauro, H. M., Bolouri, H., Doyle, J. C., Kitano, H., Arkin, A. P., Bornstein, B. J., Bray, D., Cornish-Bowden, A., et al. (2003). The systems biology markup language (SBML): A medium for representation and exchange of biochemical network models. Bioinformatics 19, 524-531.

Ibarra, R. U., Edwards, J. S., and Palsson, B. O. (2002). Escherichia coli K-12 undergoes adaptive evolution to achieve in silico predicted optimal growth. Nature 420, 186-189.

Kaczmarzyk, D., Anfelt, J., Särnegrim, A., and Hudson, E. P. (2014). Overexpression of sigma factor SigB improves temperature and butanol tolerance of Synechocystis sp. PCC6803. Journal of Biotechnology 182-183, 54-60.

King, Z. A., and Feist, A. M. (2013). Optimizing Cofactor Specificity of Oxidoreductase Enzymes for the Generation of Microbial Production Strains—OptSwap. Industrial Biotechnology 9, 236-246.

Kim, J., and Reed, J. L. (2010). OptORF: Optimal metabolic and regulatory perturbations for metabolic engineering of microbial strains. BMC systems biology 4, 53.

Knoop, H., Gründel, M., Zilliges, Y., Lehmann, R., Hoffmann, S., Lockau, W., and Steuer, R. (2013). Flux Balance Analysis of Cyanobacterial Metabolism: The Metabolic Network of Synechocystis sp. PCC 6803. PLoS Computational Biology 9.

Knoop, H., and Steuer, R. (2015). A computational analysis of stoichiometric constraints and trade-offs in cyanobacterial biofuel production. Frontiers in Bioengineering and Biotechnology 3, 47.

Lan, E. I., and Liao, J. C. (2011). Metabolic engineering of cyanobacteria for 1-butanol production from carbon dioxide. Metabolic Engineering 13, 353-363.

Lan, E. I., and Liao, J. C. (2012). ATP drives direct photosynthetic production of 1-butanol in cyanobacteria. Proceedings of the National Academy of Sciences 109, 6018-6023.

Lan, E. I., and Liao, J. C. (2013). Microbial synthesis of n-butanol, isobutanol, and other higher alcohols from diverse resources. Bioresource Technology 135, 339-349.

Lindberg, P., Park, S., and Melis, A. (2010). Engineering a platform for photosynthetic isoprene production in cyanobacteria, using Synechocystis as the model organism. Metabolic Engineering 12, 70-79.

Maarleveld, T. R., Boele, J., Bruggeman, F. J., and Teusink, B. (2014). A data integration and visualization resource for the metabolic network of Synechocystis sp. PCC 6803. Plant Physiology 164, 1111-1121.

Montagud, A., Zelezniak, A., Navarro, E., Córdoba, P. F. de, Urchueguía, J. F., and Patil, K. R. (2011). Flux coupling and transcriptional regulation within the metabolic network of the photosynthetic bacterium Synechocystis sp. PCC6803. Biotechnology Journal 6, 330-342.

Nakamura, Y., Kaneko, T., Miyajima, N., and Tabata, S. (1999). Extension of CyanoBase, CyanoMutants: Repository of mutant information on Synechocystis sp. strain PCC6803. Nucleic Acids Research 27, 66-68.

Nakao, M., Okamoto, S., Kohara, M., Fujishiro, T., Fujisawa, T., Sato, S., Tabata, S., Kaneko, T., and Nakamura, Y. (2009). CyanoBase: The cyanobacteria genome database update 2010. Nucleic Acids Research 38.

Nielsen, D. R., Leonard, E., Yoon, S. H., Tseng, H. C., Yuan, C., and Prather, K. L. J. (2009). Engineering alternative butanol production platforms in heterologous bacteria. Metabolic Engineering 11, 262-273.

Nogales, J., Gudmundsson, S., Knight, E. M., Palsson, B. O., and Thiele, I. (2012). Detailing the optimality of photosynthesis in cyanobacteria through systems biology analysis. Proceedings of the National Academy of Sciences 109, 2678-2683.

Nogales, J., Gudmundsson, S., and Thiele, I. (2013). Toward systems metabolic engineering in cyanobacteria: Opportunities and bottlenecks. Bioengineered 4.

Oliver, J. W. K., and Atsumi, S. (2015). A carbon sink pathway increases carbon productivity in cyanobacteria. Metabolic Engineering.

Orth, J. D., Thiele, I., and Palsson, B. Ø. (2010). What is flux balance analysis? Nature biotechnology 28, 245-248.

Pasteur, L. (1862). Quelques résultats nouveaux relatifs aux fermentations acétique et butyrique. Bull Soc Chim Paris, May, pp. 52-53.

Pásztor, A., Kallio, P., Malatinszky, D., Akhtar, M. K., and Jones, P. R. (2014). A synthetic O2-tolerant butanol pathway exploiting native fatty acid biosynthesis in Escherichia coli. Biotechnology and bioengineering.

Patil, K. R., Rocha, I., Förster, J., and Nielsen, J. (2005). Evolutionary programming as a platform for in silico metabolic engineering. BMC bioinformatics 6, 308.

Ranganathan, S., Suthers, P. F., and Maranas, C. D. (2010). OptForce: An optimization procedure for identifying all genetic manipulations leading to targeted overproductions. PLoS Computational Biology 6.

Rühl, M., Le Coq, D., Aymerich, S., and Sauer, U. (2012). 13C-flux analysis reveals NADPH-balancing transhydrogenation cycles in stationary phase of nitrogen-starving Bacillus subtilis. Journal of Biological Chemistry 287, 27959-27970.

Saha, R., Verseput, A. T., Berla, B. M., Mueller, T. J., Pakrasi, H. B., and Maranas, C. D. (2012). Reconstruction and Comparison of the Metabolic Potential of Cyanobacteria Cyanothece sp. ATCC 51142 and Synechocystis sp. PCC 6803. PLoS ONE 7.

Schellenberger, J., Que, R., Fleming, R. M. T., Thiele, I., Orth, J. D., Feist, A. M., Zielinski, D. C., Bordbar, A., Lewis, N. E., Rahmanian, S., et al. (2011). Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox v2.0. Nature protocols 6, 1290-1307.

Schuetz, R., Kuepfer, L., and Sauer, U. (2007). Systematic evaluation of objective functions for predicting intracellular fluxes in Escherichia coli. Molecular systems biology 3, 119.

Segrè, D., Vitkup, D., and Church, G. M. (2002). Analysis of optimality in natural and perturbed metabolic networks. Proceedings of the National Academy of Sciences of the United States of America 99, 15112-15117.

Shastri, A. A., and Morgan, J. A. (2005). Flux balance analysis of photoautotrophic metabolism. Biotechnology Progress 21, 1617-1626.

Shlomi, T., Berkman, O., and Ruppin, E. (2005). Regulatory on/off minimization of metabolic flux changes after genetic perturbations. Proceedings of the National Academy of Sciences of the United States of America 102, 7695-7700.

Tepper, N., and Shlomi, T. (2009). Predicting metabolic engineering knockout strategies for chemical production: Accounting for competing pathways. Bioinformatics 26, 536-543.

Thiele, I., and Palsson, B. Ø. (2010). A protocol for generating a high-quality genome-scale metabolic reconstruction. Nature protocols 5, 93-121.

Yan, Y., and Liao, J. C. (2009). Engineering metabolic systems for production of advanced fuels. Journal of Industrial Microbiology and Biotechnology 36, 471-479.
