
Enhancing CO2 fixation by synergistic substrate cofeeding

by

Nian Liu

B.S. Chemical Engineering, University of California, Berkeley, 2014

SUBMITTED TO THE DEPARTMENT OF CHEMICAL ENGINEERING IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY IN CHEMICAL ENGINEERING AT THE MASSACHUSETTS INSTITUTE OF TECHNOLOGY

September 2020

© 2020 Massachusetts Institute of Technology. All rights reserved.

Signature of author ...... Department of Chemical Engineering July 1, 2020

Certified by ...... Gregory Stephanopoulos Willard Henry Dow Professor of Chemical Engineering and Biotechnology Thesis Supervisor

Accepted by ...... Patrick S. Doyle Robert T. Haslam (1911) Professor of Chemical Engineering Graduate Officer

Enhancing CO2 fixation by synergistic substrate cofeeding

by

Nian Liu

Submitted to the Department of Chemical Engineering On July 1, 2020, in partial fulfillment of the requirements for the Degree of Doctor of Philosophy in Chemical Engineering at the Massachusetts Institute of Technology

ABSTRACT

The irrevocable rise of atmospheric CO2 levels has prompted the development of scalable fixation technologies in recent years. To this end, biological non-photosynthetic methods serve as promising leads since they not only sequester CO2, but also convert it into a variety of value-added fuels, chemicals, and pharmaceuticals with high specificity. Additionally, because the reducing power for CO2 fixation derives from free electrons or electron carriers such as H2, the process can interface with existing photovoltaics to achieve a high energy efficiency (~8%) that outcompetes photosynthetic systems (<1%). In this thesis, we describe a non-photosynthetic CO2-fixation approach that sequentially utilizes an acetogenic bacterium and an oleaginous yeast to convert H2/CO2 into lipid-based biodiesel with acetate as the key intermediate. Despite its feasibility, this two-stage system suffers from the slow metabolism of the two chosen microbes. To remedy the issue, we began by identifying the limiting factors, which were determined to be insufficient ATP availability for CO2 fixation in the first stage and inadequate NADPH levels for acetate-driven lipogenesis in the second stage. Correspondingly, a dual carbon source cofeeding scheme was developed to promote mixed substrate metabolism in the two organisms, synergistically stimulating CO2 reduction into products. We demonstrate that adding minor amounts of glucose to cultures saturated with H2 enhances net CO2 conversion into acetate by simultaneously satisfying ATP and e− demands at the appropriate ratio. Similarly, feeding the resulting acetate in conjunction with small quantities of gluconate balances the supply of carbon, ATP, and NADPH, which significantly accelerates lipid formation. This work advances our understanding of systems-level control of metabolism and can be applied to many other situations as an alternative tool for enhancing strain performance in metabolic engineering.
Many products other than lipids can also be synthesized from CO2 using the two-stage non-photosynthetic design as long as suitable organisms are employed in the second stage. To expand the utility of the process, we therefore also developed a data-driven host selection framework. By implementing a recommender system algorithm on strain-product-titer information collected from the literature, we aimed to systematically summarize the criteria used for choosing an organism given a certain product of interest and vice versa. The results revealed an implicit principle that governs the selection of model versus non-model host organisms, which could benefit many industrial biotechnological applications.
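To make the recommender idea concrete, the sketch below fills a sparse organism × product titer matrix by low-rank matrix factorization with alternating least squares (ALS). ALS is one of the two algorithms compared in Chapter 5 (Figure 5.6). This is a generic textbook version written for illustration, not the implementation used in this thesis. The function name `als_complete`, the log-transformed titers, and the rank and regularization values are all assumptions.

```python
import numpy as np

def als_complete(R, mask, rank=2, lam=0.1, n_iter=100, seed=0):
    """Predict missing entries of an organism x product matrix by ALS.

    R    : (n_org, n_prod) array, e.g. log10 titers (hypothetical preprocessing);
           entries where mask is False are never read.
    mask : boolean array of the same shape, True where a titer was reported.
    """
    rng = np.random.default_rng(seed)
    n_org, n_prod = R.shape
    U = rng.normal(scale=0.1, size=(n_org, rank))   # organism latent factors
    V = rng.normal(scale=0.1, size=(n_prod, rank))  # product latent factors
    reg = lam * np.eye(rank)                        # ridge penalty
    for _ in range(n_iter):
        # Hold V fixed: ridge regression for each organism on its observed products.
        for i in range(n_org):
            Vi = V[mask[i]]
            U[i] = np.linalg.solve(Vi.T @ Vi + reg, Vi.T @ R[i, mask[i]])
        # Hold U fixed: ridge regression for each product on its observed organisms.
        for j in range(n_prod):
            Uj = U[mask[:, j]]
            V[j] = np.linalg.solve(Uj.T @ Uj + reg, Uj.T @ R[mask[:, j], j])
    return U @ V.T  # dense prediction, including unobserved pairs

# Toy rank-1 matrix with one hidden entry (illustrative numbers, not thesis data).
R = np.outer([1.0, 2.0, 3.0], [1.0, 0.5, 2.0, 1.0])
mask = np.ones_like(R, dtype=bool)
mask[2, 3] = False
pred = als_complete(R, mask, rank=1, lam=1e-3, n_iter=200)
print(round(pred[2, 3], 2))  # should be close to the hidden value 3.0
```

The predicted scores for unobserved organism-product pairs can then be ranked in two directions: to suggest hosts for a given product, or products for a given host.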

Thesis Supervisor: Gregory Stephanopoulos Title: Willard Henry Dow Professor of Chemical Engineering and Biotechnology


Acknowledgements

To say that my six years of grad school was a ‘journey’ would be a massive understatement. It is no exaggeration to say that the hardships I endured and the emotional swings I felt parallel those of Frodo as he crossed Middle Earth to reach Mount Doom (pardon my nerdy reference here). Although my PhD experience comes nowhere close to being as exciting as his adventures (and it is probably best to leave it that way), I find peace of mind knowing that we share something in common: Frodo always had his sidekick Sam by his side to carry him onward when the need arose; I, too, had countless friends, family members, and mentors who became the very giants that shouldered me throughout my research endeavor. Needless to say, I feel incredibly lucky and grateful to have everyone as part of my life here at MIT. First and foremost, I would like to express my sincerest gratitude to my thesis advisor, Professor Gregory Stephanopoulos. Funnily enough, as part of a research group that specializes in metabolic engineering, I have rarely done much of the actual engineering, which at times led me to question whether I was doing the right thing. Greg, on the other hand, never doubted my ideas and was patient enough to put up with a lot of the nonsense that I have thrown around over the years. I can confidently say that I would not have seen any of my projects through without his continued belief in my skills. His focus on impact and the utility of research has also been invaluable in pushing me to pay close attention to the big picture, in addition to laying a solid scientific foundation. All of his guidance and the resources he placed at my disposal shaped me into the scientist I am today, and for that I am very thankful. Additionally, I would like to extend my acknowledgements to the rest of my thesis committee members, Professor Charles Cooney and Professor Richard Braatz, for their insightful comments and encouragement.
The discussions we had during our meetings inspired me to look into areas beyond metabolic engineering and helped me build confidence in conducting scientific research. I am also indebted to Junyoung Park, with whom I had the pleasure of collaborating on the cofeeding project. In addition to the wealth of knowledge in cellular metabolism that he has imparted to me, what he has truly taught me is how to approach scientific research in general. Through our interactions, I have learned from him to design my experiments with a purpose in mind and to dig through my data extensively before rushing to the next experiment. Although he would always claim that these principles help him minimize manual labor and indulge in his ‘laziness’, I still hold him in the utmost respect as one of the best scientists I have had the honor to work with. Outside of lab and work, I regard Jun as a close friend, and we had many interesting conversations and risky bets. Hopefully in the near future, I can finally cash in on some of the delicious Korean barbecue that he has promised me in LA. There are a number of other lab members and collaborators that I would like to thank as well. Yuting Zheng and Thomas Wasylenko were the first people who trained me when I joined the lab. With no prior experience in at all, it is a miracle I have made it this far, and I owe it to both of them for helping me take the initial step. Kangjian Qiao was another postdoc in the lab whom I deeply respect. Much of his advice guided me through grad school and helped me avoid some of the common pitfalls, and I still talk with him these days about planning my career. I interacted quite heavily with Zbigniew Lazar, who taught me a lot of neat tricks in molecular biology. Our interactions have really made the time I spend in lab much more enjoyable, and I am honestly happy to have him as ‘my friend’. I have also had the pleasure to learn from Benjamin Woolston and his immeasurable depth of knowledge in biotechnology and analytical chemistry. His actions have always inspired me to probe the fundamentals of every scientific aspect and not be content with superficial understanding. The time I spent with these mentors has really prepared me to become an independent researcher early on during grad school, which I leveraged to build many fruitful collaborations. In particular, I would like to acknowledge Junichi Mano for research on acid whey utilization, Suvi Santala and Ville Santala on wax ester production, David Emerson on acetogen metabolism, and Xiang Ji on microbial electrosynthesis. Of course, none of this would have been possible without the support from our lab’s administrative assistants, Rosangela dos Santos and Nicholas Pasinella, who worked tirelessly to take care of the mundane tasks and offered help whenever needed. Outside of research, I also have to thank a number of people in lab whom I have had a lot of fun with. Jack Hammond was one of the few people I knew here who shared the same passion for gaming as I do. In many ways, I felt our personalities and general ‘chill’ attitudes were largely similar, which was refreshing in this hypercompetitive environment. I had many chats with Boonsum Uranukul and Alkiviadis Chatzivasileiou over coffee (despite my not drinking coffee), going over the small details of grad school life and the hardships associated with academic research. I also want to thank the Malden hotpot crew, Vincent Zu, Yongshuo Ma, Jingbo Li, and Zhengshan Luo, who were all terrific friends of mine and really gave my life here in the US a taste of home. Finally, I am grateful for meeting Bob Van Hove, Brian Pereira, Felix Lam, Constantinos Katsimpouras, Jingyang Xu, Sun Jin Moon, and many others whom I cannot possibly enumerate here.
Every past and present member of the lab has helped me in one way or another as I crawled through the myriad of experiments and simulations that ultimately stockpiled into this thesis. I would like to give special thanks to my labmates Mark Keibler and Wentao Dong, both of whom are very close friends with whom I spent a great deal of my time. Mark would always join me in activities such as going to escape rooms, playing Beat Saber, spontaneously wandering around town, and many of the other weird things that I do, not to mention his constant support whenever I am anxious about life and work. Now that I think about it, it truly is a miracle how we managed to have so many meals together given that he is a herbivore and I am (mostly) a carnivore. Then again, I guess having him as a consolidated source on (up-to-date) American politics, history, and pop culture more than makes up for our differences in culinary requirements. Wentao is another grad student who joined the lab in the same year as I did, and because of that, I never felt alone in any part of the journey towards graduation. Having fought through adapting to a new environment, selecting an advisor, progressing through degree requirements, and generally growing into mature researchers together, I am glad that he was by my side all this time. Specifics aside, when it comes to Mark and Wentao, what moved me the most was their willingness to help whenever I was in need, regardless of the circumstance or how difficult the task was, and I cannot imagine what life at MIT would have been without either of them. Apart from lab, I shared many fond memories with my friends. Albert Liu was a fantastic roommate with whom I spent over two years, and he really made a lot of the weekends and after-work hours that much more enjoyable. He never ceased to amaze me with his impeccable attention to detail as well as his diligence in working hard towards set goals, which at times motivated me to strive for perfection in my own duties.
I also got to know Yiming Mo very well from the beginning of grad school, as he would always share some unique and interesting ideas in our discussions. Most likely related to this, he was the type of person to constantly make bold, yet ineffective, moves in board games, which commonly led to a loss. Jim Chu was another friend of ours who seemed to know just about everything, and I rarely had a boring conversation with him.


Towards the end of my PhD career, I also had the honor to meet Jingfan Yang and Ge Zhang, who shared a lot of interests and hobbies with me. I am proud to say that my six years of student life at MIT have been much more meaningful with the company of my awesome friends. My family members have definitely been an integral part of my PhD experience. Both of my parents have been very supportive as I pursued my degree. As a professor himself, my dad pushed me to work on projects with a fundamental science component early on. Since then I have followed his advice in my research and always tried to look for mechanistic findings whenever possible. I am grateful for his general guidance on how to approach science, which helped complete my maturation in my chosen field. My mom, on the other hand, cared about my day-to-day life more than anyone else. She was always patient in listening to my (quite frequent) rants when I was feeling down, and at the end of the day it would help me clear my mind on what to do next, making me feel much more optimistic. Admittedly, I would not have made it this far without knowing that both of my parents have my back and will encourage me no matter what happens. Finally (and I always save the best for last), if there is one single person to whom I can dedicate this thesis, it would be my fiancée, Shichen Wang. Since we have been together for the past eight and a half years, it is difficult to define our relationship simply through words. Perhaps the best description I can muster is that I never have to wear a mask in front of her. After a long day of work, when I get on the phone to chat with her, I no longer have to be the attentive scientist working out every single detail of an experiment; I no longer have to be the experienced collaborator pretending to know more than I actually do; I no longer have to be the concerned individual trying to maintain fragile relationships with people who did not care to reciprocate.
Instead, I can brag about my 35 platinum trophies, complain about how upgrading from a 1080 Ti to a 2080 Ti broke the bank, and, most importantly, speak my true feelings without any filters. I am, regrettably, an oddball at MIT as well as in many other walks of life, and I cannot express how grateful I am to have someone in my life who completely understands me and supports me with unconditional love.



Contents

1. Introduction

1.1. Technologies for scalable CO2 fixation ...... 20

1.2. Two-stage designs for non-photosynthetic CO2 conversion ...... 22

1.3. Thesis overview ...... 23

1.4. References ...... 26

2. 13C Metabolic Flux Analysis of acetate conversion to lipids by Yarrowia lipolytica

2.1. Introduction ...... 30

2.2. Materials and methods ...... 33

2.2.1. Strain and culture conditions ...... 33

2.2.2. Cell quenching and metabolite extractions ...... 35

2.2.3. Metabolite analysis and quantifications ...... 36

2.2.4. Metabolic flux estimations ...... 38

2.3. Results ...... 42

2.3.1. Fermentation profiles and establishment of metabolic steady state ...... 42

2.3.2. Extracellular fluxes ...... 47

2.3.3. Biomass compositions ...... 49

2.3.4. Intracellular fluxes ...... 49

2.3.5. Lipogenic NADPH source ...... 55

2.4. Discussion ...... 58

2.5. References ...... 61


3. Mixed substrate metabolism—an alternative approach to genetic modifications

3.1. Introduction ...... 68

3.2. Challenges in mixed substrate utilization—carbon catabolite repression ...... 71

3.3. Efforts to enable substrate co-utilization for better conversion of renewable feedstocks ...... 72

3.4. Substrate co-utilization better balances biosynthesis components ...... 73

3.5. Substrate mixtures simultaneously activate multiple critical metabolic pathways ...... 75

3.6. Mixed substrate metabolism provides shortcut access to key synthesis pathways ...... 76

3.7. Multi-substrate enabled metabolic control enhances strain performance ...... 77

3.8. Conclusions and future outlook ...... 79

3.9. References ...... 80

4. Synergistic substrate cofeeding stimulates reductive metabolism

4.1. Introduction ...... 88

4.2. Materials and methods ...... 90

4.2.1. Strain and culture conditions ...... 90

4.2.2. Metabolite extraction and measurement ...... 93

4.2.3. Substrate uptake and product secretion measurement ...... 95

4.2.4. Headspace gas measurement ...... 97

4.2.5. Flux balance analysis and isotope tracing flux analysis ...... 98

4.2.6. Calculation of specific growth rate and productivities ...... 99

4.2.7. Genome-scale stoichiometric metabolic model ...... 100

4.2.8. Relationship between ATP availability and CO2 fixation ...... 100


4.2.9. Analysis of synergies associated with substrate cofeeding...... 100

4.3. Results ...... 108

4.3.1. Accelerating lipogenesis from acetate by enhancing NADPH generation in Y. lipolytica ...... 108

4.3.2. Recursive NADPH generation via the pentose cycle ...... 113

4.3.3. Preferential use of glucose leads to excessive decarboxylation in CO2-fixing M. thermoacetica ...... 114

4.3.4. Accelerating acetate production from CO2 by decoupling e− supply from decarboxylation ...... 118

4.3.5. Coordination of “doped” acetogenesis and lipogenesis ...... 122

4.4. Discussion ...... 125

4.5. References ...... 128

5. A recommender system for host organism selection

5.1. Introduction ...... 136

5.2. Methods...... 138

5.2.1. Data collection and processing ...... 138

5.2.2. The recommender system algorithm ...... 140

5.3. Results ...... 141

5.3.1. Characteristics of the dataset ...... 141

5.3.2. Algorithm testing ...... 147

5.3.3. Recommendation results for organisms ...... 149

5.3.4. Recommendation results for products ...... 155


5.4. Discussion ...... 160

5.5. References ...... 164

6. Conclusions and future directions

6.1. Summary and conclusions ...... 170

6.2. Suggestions for future directions ...... 173

6.2.1. Methanol as a more efficient NADPH generator ...... 173

6.2.2. Multi-substrate cofeeding ...... 174

6.3. References ...... 177

Appendix A ...... 179

Appendix B ...... 207


List of Figures

Chapter 1

Figure 1.1 Schematic of a scalable CO2 fixation and conversion process ...... 21

Figure 1.2 Two-stage non-photosynthetic design that converts H2/CO2 into lipids ...... 23

Chapter 2

Figure 2.1 Overview of the major metabolic pathways involved in triacylglyceride (TAG) synthesis using acetate as the sole carbon source ...... 32

Figure 2.2 Fermentation profiles for extracellular metabolites ...... 44

Figure 2.3 Fermentation profiles for fatty acids and their biosynthetic precursors acetyl-CoA, Glyc3P and NADPH ...... 45

Figure 2.4 Distribution profiles of fatty acids produced by Y. lipolytica ...... 47

Figure 2.5 Exponential fits for all six cultures during the growth phase ...... 48

Figure 2.6 Flux distributions from 13C-Metabolic Flux Analysis ...... 52

Figure 2.7 Flux confidence intervals of selected metabolic reactions ...... 53

Figure 2.8 Flux confidence intervals for metabolic reactions related to glyoxylate shunt pathway and ...... 54

Figure 2.9 Oxidative PPP flux confidence intervals ...... 57

Chapter 3

Figure 3.1 Schematic of fermentation using a mixture of carbon sources ...... 69


Figure 3.2 Examples of strategies where mixed substrate fermentation is employed to optimize cellular metabolism, leading to increases in carbon yield and/or ...... 70

Chapter 4

Figure 4.1 Overexpression of gluconate kinase does not affect lipid production ...... 92

Figure 4.2 Continuous glucose cofeeding relieves repression of acetate in Y. lipolytica .....109

Figure 4.3 Preferential consumption of glucose, fructose, , and gluconate over acetate by Y. lipolytica ...... 110

Figure 4.4 Cofeeding substrates near oxidative pentose phosphate pathway accelerates cell growth and lipogenesis from acetate ...... 111

Figure 4.5 Simultaneous consumption of acetate and superior substrates by Y. lipolytica ...112

Figure 4.6 Enhanced cell growth and lipid production with substrate cofeeding in Y. lipolytica ...... 112

Figure 4.7 Gluconate generates NADPH via the pentose cycle ...... 114

Figure 4.8 Glucose generates ATP for CO2 fixation but leads to decarboxylation in M. thermoacetica ...... 115

Figure 4.9 Combination of glucose oxidation and some biosynthetic pathways enables net CO2 fixation ...... 117

Figure 4.10 Labeling of CO2 in the cell and in the headspace ...... 118

Figure 4.11 M. thermoacetica growth rate decreases with increasing ATP demand while CO2 consumption drops suddenly to 0 once ATP demand reaches the threshold of ATP generation capability ...... 119


Figure 4.12 Continuous glucose cofeeding accelerates acetogenesis from CO2 fixation at the autotrophic limit ...... 120

Figure 4.13 M. thermoacetica glucose consumption rate decreases in the presence of H2 and with increasing utilization of H2 ...... 121

Figure 4.14 Synergy and coordination of substrate cofeeding accelerate the conversion of CO2 and H2 into lipids ...... 123

Chapter 5

Figure 5.1 Most used organisms ranked by the number of associated entries in the dataset ...... 143

Figure 5.2 Top ten most used organisms ranked by the number of unique products ...... 144

Figure 5.3 Most produced products ranked by the number of associated entries in the dataset ...... 146

Figure 5.4 Most produced natural products ranked by the number of associated entries in the dataset ...... 147

Figure 5.5 Titer histograms of the four major classes of bioproducts ...... 148

Figure 5.6 Comparison of validation error and classification accuracy between the ALS and PPCA algorithms ...... 149

Figure 5.7 30 products with the highest titers produced by C. glutamicum as indicated by the dataset ...... 154

Figure 5.8 Organisms that produced the highest titers of several (non-natural) products ....157

Figure 5.9 Organisms that produced the highest titers of several natural products ...... 160


Chapter 6

Figure 6.1 The dissimilatory RuMP pathway ...... 174

Figure 6.2 Exclusive conversion of methanol into NADPH through the RuMP cycle ...... 175

Figure 6.3 Multi-substrate cofeeding enhances cell density in addition to per-cell productivity in M. thermoacetica H2/CO2 cultures ...... 177


List of Tables

Chapter 2

Table 2.1 Extracellular fluxes for 13C-MFA ...... 49

Table 2.2 Comparison of NADPH generation in modified models ...... 56

Chapter 4

Table 4.1 Acetate yields, productivities, and CO2 fixation rates with varying energy sources in M. thermoacetica ...... 94

Chapter 5

Table 5.1 Top recommended products from the recommender system for selected organisms ...... 151

Table 5.2 Top recommended organisms from the recommender system for selected non-natural and natural products ...... 158

Appendix A

Table A1 Model for growth phase ...... 180

Table A2 Model metabolic network for lipid production phase ...... 183

Table A3 Biomass formula for the MTYL037 and MTYL065 strains during the growth phase ...... 185

Table A4 Intracellular metabolite mass isotopomer distribution during growth phase ...... 186

Table A5 Intracellular metabolite mass isotopomer distribution during lipid production phase ...... 192


Table A6 MTYL037 growth phase best-fit flux values and flux confidence intervals ...... 197

Table A7 MTYL065 growth phase best-fit flux values and flux confidence intervals ...... 200

Table A8 MTYL037 lipid production phase best-fit flux values and flux confidence intervals ...... 203

Table A9 MTYL065 lipid production phase best-fit flux values and flux confidence intervals ...... 205

Appendix B

Table B1 Steady-state metabolite labeling from 13C gluconate in Y. lipolytica lipogenic phase ...... 208

Table B2 Metabolic flux distributions of Y. lipolytica lipogenic phase with 95% confidence intervals (mmol gCDW⁻¹ hr⁻¹) determined by isotopomer balancing ...... 209

Table B3 Steady-state metabolite labeling from 13C glucose in M. thermoacetica ...... 210

Table B4 Metabolic flux distributions of M. thermoacetica with 95% confidence intervals (mmol gCDW⁻¹ hr⁻¹) determined by isotopomer balancing ...... 211

Table B5 New reactions and genes added to the M. thermoacetica metabolic model iAI563 ...... 212

Table B6 Corrected reactions in the M. thermoacetica metabolic model iAI563 ...... 212


Chapter 1

Introduction


1.1 Technologies for scalable CO2 fixation

A growing concern of the 21st century is the irrevocable rise in CO2 levels caused by our current reliance on fossil energy and petroleum. This has prompted the search for renewable alternatives that can not only sequester atmospheric carbon but also convert it into value-added products, achieving a carbon-neutral or even carbon-negative economy. Methods for carbon capture and revalorization can generally be classified into two broad categories: electrochemical and biological. While the former has the potential to fix large amounts of CO2, its ability to generate diverse compounds is often limited, and it also suffers from low selectivity (Godoy et al., 2017).

By contrast, biological CO2 fixation, combined with enzyme and metabolic engineering techniques, offers a much higher degree of versatility in transforming CO2 into fuels, chemicals, materials, and other products (Godoy et al., 2017). Therefore, biological systems for carbon conversion represent a promising approach to reduce our dependence on non-renewable resources and eventually move to a sustainable future.

Biological CO2 fixation can be further categorized into photosynthetic and non-photosynthetic approaches. In nature, the vast majority of CO2 consumption occurs through the photosynthetic route, which derives energy directly from sunlight (Appel et al., 2013). This route has also been exploited for large-scale CO2 fixation in the form of lignocellulosic biomass. However, photosynthetic systems are usually very inefficient at harvesting energy (Godoy et al., 2017) and also have slow rates of CO2 incorporation (Gonzales, Matson and Atsumi, 2019). On the other hand, non-photosynthetic CO2 fixation, which derives its energy from reduced inorganic compounds (such as H2, H2S, and CO) or electricity (Fast and Papoutsakis, 2012), generally operates at much higher rates and energetic efficiencies (Daniell, Köpke and Simpson, 2012; Hu, Rismani-Yazdi and Stephanopoulos, 2013; Gonzales, Matson and Atsumi, 2019). In particular, a non-photosynthetic CO2 fixation design using H2 or electricity as the energy input is most appealing since it can be coupled to photovoltaics, allowing the process to use sunlight as the primary driver (Figure 1.1). Consequently, this approach combines the high selectivity and versatility offered by biological systems with the high efficiencies offered by material and electrochemical systems. Considering that the energy efficiencies of commercially available solar panels, water electrolysis, and biological conversion of H2/CO2 to reduced compounds are ~20%, >80%, and >50%, respectively (Battersby, 2019; Park et al., 2019; Shiva Kumar and Himabindu, 2019), over 8% of the solar energy can ultimately be captured in the final product. This is much higher than the typical <1% reported for plants (Blankenship et al., 2011) and on par with other innovative designs (Liu et al., 2016). Furthermore, its scalability and relatively low cost are key features that give this technology the potential to disrupt our current infrastructure. Hence, the current thesis work revolves around this design and is dedicated to analyzing the CO2-fixing biological step of the process, detailing tractable methods that enhance its performance.
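The ~8% figure follows from multiplying the efficiencies of the three conversion steps in series. The short calculation below reproduces it using the stage values quoted above. Treating the steps as independent, multiplicative losses is a simplifying assumption of this sketch. The function name `overall_efficiency` is illustrative.

```python
def overall_efficiency(*stage_efficiencies: float) -> float:
    """Overall efficiency of conversion steps operating in series (fractions, 0-1)."""
    total = 1.0
    for eta in stage_efficiencies:
        if not 0.0 <= eta <= 1.0:
            raise ValueError(f"efficiency {eta} is not a fraction between 0 and 1")
        total *= eta
    return total

# Solar panel (~20%), water electrolysis (>80%), biological H2/CO2 conversion (>50%)
eta = overall_efficiency(0.20, 0.80, 0.50)
print(f"solar-to-product efficiency: {eta:.0%}")  # prints "solar-to-product efficiency: 8%"
```

Because the last two stages are quoted as lower bounds, the product is best read as an approximate floor for a ~20%-efficient panel.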

[Figure 1.1 graphic: hν → e− or H2; CO2 → biofuels, biopolymers, pharmaceuticals]

Figure 1.1. Schematic of a scalable CO2 fixation and conversion process. This system combines photovoltaics for energy generation and non-photosynthetic biological systems for carbon conversion. Various products can be synthesized directly from CO2 using either electrical energy generated directly from sunlight or H2 as an energy carrier from water electrolysis.

1.2 Two-stage designs for non-photosynthetic CO2 conversion

The biological component of the CO2 conversion system shown in Figure 1.1 uses either H2 or e− from electricity to biochemically reduce CO2 into value-added products. To maximize the carbon fixation potential and product diversity, a two-stage bioprocess was conceived (Hu et al., 2016). In the first stage, an acetogenic bacterium fixes CO2 into acetic acid using H2 as the energy source; in the second stage, various organisms can upgrade the acetic acid into different compounds. This design has many advantages. First of all, acetogens operate the reductive acetyl-CoA pathway (i.e., the Wood-Ljungdahl pathway) for CO2 fixation, which is the most efficient of all naturally occurring CO2 fixation pathways (Fast and Papoutsakis, 2012; Gong, Cai and Li, 2016). The setup also only requires the bacterium to synthesize its native, most preferred metabolic end-product, acetic acid, thereby eliminating the need to redirect carbon flux, which remains a major challenge (Woolston et al., 2018). In addition, feeding the resulting acetic acid into a separate stage allows the adoption of many existing engineered microbes tailored to the synthesis of specific products, so long as they can be cultured on this carbon source. An example of a bioprocess that utilizes this concept is shown in Figure 1.2. It employs Moorella thermoacetica, an acetogen that produces exclusively acetic acid (Pierce et al., 2008), and Yarrowia lipolytica, an oleaginous yeast capable of synthesizing lipids as a form of biodiesel (Abdel-Mawgoud et al., 2018). Although a proof-of-concept study has demonstrated the feasibility of the design in achieving the H2/CO2-to-acetate-to-lipids conversion (Hu et al., 2016), the inherent metabolic rates of both organisms cultured in their respective conditions are rather slow, leaving room for further improvement and optimization. Therefore, this system will serve as the starting point for this thesis, where the limitations are systematically analyzed and unique solutions to overcome the bottlenecks are presented. Additional ideas to expand the utility of the current design will also be discussed.
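The carbon and electron bookkeeping behind the acetogenic stage can be sketched with two ideal homoacetogenic stoichiometries: 4 H2 + 2 CO2 → CH3COOH + 2 H2O, and C6H12O6 → 3 CH3COOH. The hypothetical helper below assumes that every electron ends up in acetate. It ignores biomass formation and ATP accounting, which the abstract identifies as the actual limitation in M. thermoacetica. It is an illustration, not a model used in this thesis.

```python
from dataclasses import dataclass

@dataclass
class AcetateBalance:
    acetate: float        # mol acetate formed
    net_co2_fixed: float  # mol CO2 consumed minus mol CO2 released

def ideal_homoacetogenesis(glucose: float, h2: float) -> AcetateBalance:
    """Idealized acetogen stoichiometry (mol in, mol out).

    Assumes all electrons end up in acetate; biomass and ATP are ignored.
    """
    # C6H12O6 -> 3 CH3COOH: glycolysis plus pyruvate oxidation release 2 CO2 and
    # 8 electrons, which the Wood-Ljungdahl pathway reduces into a third acetate,
    # so glucose alone gives no net CO2 fixation.
    acetate_from_glucose = 3.0 * glucose
    # 4 H2 + 2 CO2 -> CH3COOH + 2 H2O: each H2-derived acetate fixes 2 CO2.
    acetate_from_h2 = h2 / 4.0
    return AcetateBalance(
        acetate=acetate_from_glucose + acetate_from_h2,
        net_co2_fixed=2.0 * acetate_from_h2,
    )

print(ideal_homoacetogenesis(glucose=1.0, h2=8.0))
# AcetateBalance(acetate=5.0, net_co2_fixed=4.0)
```

ATP is deliberately absent from this balance, so it cannot explain why glucose cofeeding raises net CO2 fixation. That effect depends on ATP supply, which this sketch omits.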

[Figure 1.2 graphic: H2 + CO2 → Moorella thermoacetica → acetate → Yarrowia lipolytica → lipids]

Figure 1.2. Two-stage non-photosynthetic design that converts H2/CO2 into lipids. The first stage employs the acetogenic bacterium M. thermoacetica to fix CO2 into acetic acid using H2 as the primary energy input, whereas the second stage feeds the acetic acid to Y. lipolytica, which excels at making lipids as a form of biodiesel. The second stage can also house other organisms to achieve the synthesis of different products from CO2.

1.3 Thesis overview

As stated previously, the overarching goal of this thesis is to optimize the bioprocess shown in Figure 1.2, where the general issue is slow metabolism related to autotrophic growth on H2/CO2 for M. thermoacetica and heterotrophic growth on acetic acid for Y. lipolytica. Tools such as quantitative metabolomics, isotopic tracing, and metabolic flux analysis (MFA) will be frequently used in this work to assess microbial metabolism under various culture conditions.


We open by examining the root causes of the undesirably low metabolic rates in our system. While it is well known that low ATP levels bottleneck cell growth as well as reductive acetyl-CoA pathway flux in M. thermoacetica (Fast and Papoutsakis, 2012), no prior work has investigated the challenges related to acetate metabolism in Y. lipolytica. Thus, Chapter 2 focuses on this topic, employing 13C-MFA to study Y. lipolytica grown on acetic acid as the sole carbon source. In this chapter, we will illustrate how insufficient gluconeogenic and oxidative pentose phosphate pathway (oxPPP) fluxes limit the rate at which cells can generate NADPH, eventually hindering lipid synthesis. A mechanism by which the cells regulate fluxes through these pathways is also proposed.
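The following gives a rough sense of scale for the NADPH demand of lipogenesis. Fatty acid synthase consumes 2 NADPH per elongation cycle, so palmitate (C16:0) requires 14 NADPH. If the oxPPP were the only NADPH source, each glucose 6-phosphate passing through it would supply 2 NADPH. The sketch below encodes this textbook stoichiometry under two simplifying assumptions. First, it treats the oxPPP as the sole NADPH source, whereas Section 2.3.5 examines lipogenic NADPH sources. Second, it ignores the gluconeogenic carbon cost of making glucose 6-phosphate from acetate. The function names are illustrative.

```python
def nadph_for_saturated_fatty_acid(carbons: int) -> int:
    """NADPH used by fatty acid synthase to build a saturated C_n acyl chain
    from acetyl-CoA: (n/2 - 1) elongation cycles x 2 NADPH per cycle."""
    if carbons < 4 or carbons % 2:
        raise ValueError("expected an even chain length of at least 4")
    return 2 * (carbons // 2 - 1)

def g6p_through_oxppp(nadph_needed: float, nadph_per_g6p: float = 2.0) -> float:
    """Glucose 6-phosphate that must enter the oxidative PPP if it were the only
    NADPH source (G6P dehydrogenase and 6-phosphogluconate dehydrogenase each
    yield one NADPH)."""
    return nadph_needed / nadph_per_g6p

palmitate = nadph_for_saturated_fatty_acid(16)
print(palmitate, g6p_through_oxppp(palmitate))  # prints "14 7.0"
```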

Knowing the specific bottlenecks of the two organisms in our process, we then devised strategies to overcome the limitations accordingly. Traditionally, genetic targets for overexpression or down-regulation would be identified at this stage, followed by extensive genetic engineering. However, given that the successful transformation of M. thermoacetica is still rather difficult, we adopted an alternative approach where judiciously chosen substrate pairs that complement each other in terms of biological function were co-fed to the cells, achieving similar effects to genetic modulations. Since the co-utilization of multiple substrates is not commonly seen, Chapter 3 provides a brief introduction into the subject and illustrates the cases where cells cultured on more than one carbon source can lead to significant performance enhancements. The goal here is to demonstrate some of the unique benefits that multi-substrate metabolism offers, providing metabolic engineers an additional aspect to consider in conjunction with established genetic engineering methods when it comes to strain and process optimization.

Then in Chapter 4, we present our dual-substrate co-feeding scheme in detail and describe how we applied it to the two-stage CO2-fixation process. We discovered that controlling the feed rate of the preferred substrate to maintain negligible concentrations in systems dominated by less-preferred substrates promotes co-utilization, eliminating issues related to catabolite repression.

This in turn allowed us to explore the mixed-substrate metabolism landscape. In particular, we were able to find conditions that led to synergistic improvements, where the productivity under dual-substrate co-feeding exceeded the sum of the productivities on the individual substrates, $V_{12} > V_1 + V_2$, where V denotes the productivity on substrate 1 or 2 alone (subscript 1 or 2) or on both substrates together (subscript 12). This shows that the benefit of co-feeding went beyond simply supplying more energy or carbon in the form of an additional substrate. Instead, the metabolic state of the cells changed when exposed to complementary carbon sources, and the underlying mechanisms were elucidated. We also established that the concept should be broadly applicable to situations other than the one we investigated, making it a useful tool for industrial biotechnology.
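The synergy criterion amounts to simple arithmetic on three measured productivities. The Python sketch below is illustrative only: the function names and the productivity values in the usage lines are hypothetical and are not measurements from this thesis.

```python
def synergy_ratio(v1: float, v2: float, v12: float) -> float:
    """Return V12 / (V1 + V2); a value above 1 indicates synergy.

    v1, v2: productivities on substrate 1 or 2 alone (same units)
    v12:    productivity when both substrates are co-fed
    """
    additive = v1 + v2
    if additive <= 0:
        raise ValueError("V1 + V2 must be positive")
    return v12 / additive


def is_synergistic(v1: float, v2: float, v12: float) -> bool:
    # Synergy as defined in the text: V12 > V1 + V2
    return v12 > v1 + v2


# Hypothetical productivities (arbitrary units), for illustration only
print(synergy_ratio(1.0, 0.5, 3.0))
print(is_synergistic(1.0, 0.5, 3.0))
```

A ratio close to 1 would indicate that the second substrate merely adds its own contribution, whereas a ratio well above 1 points to a change in the cells' metabolic state.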

In addition to the optimizations performed for the process shown in Figure 1.2, we also sought ways to extend the utility of the two-stage design. As mentioned previously, distributing the task of non-photosynthetic CO2-to-product conversion between two separate organisms allows the acetogen to dedicate its resources to fixing CO2 without having to meet any additional requirements. At the same time, the flexibility of the product-formation stage is preserved, with many possible microbes to choose from. In light of this, finding the ideal organism for a predetermined product, or vice versa, becomes key to the design of this stage. Chapter 5 addresses this question by implementing a recommender system (a machine learning algorithm) based on literature data in the metabolic engineering space to generate suitable strain-product pairs that can potentially achieve high titers. In a broader context, the work summarizes the host selection principles that have been implicitly applied over the past two decades and attempts to rationalize criteria for selecting model versus non-model organisms.

Finally, Chapter 6 will conclude this thesis by summarizing the general findings in each chapter as well as their implications. Several topics that focus on expanding the concept of multi- substrate co-feeding will also be listed and accompanied with preliminary results. We hope that these ideas will be a good starting point for related research in the near future.

1.4 References

Abdel-Mawgoud, A. M. et al. (2018) ‘Metabolic engineering in the host Yarrowia lipolytica’, Metabolic Engineering, 50, pp. 192–208. doi: 10.1016/j.ymben.2018.07.016.

Appel, A. M. et al. (2013) ‘Frontiers, opportunities, and challenges in biochemical and chemical catalysis of CO2 fixation’, Chemical Reviews, 113(8), pp. 6621–6658. doi: 10.1021/cr300463y.

Battersby, S. (2019) ‘The solar cell of the future’, Proceedings of the National Academy of Sciences of the United States of America, 116(1), pp. 7–10. doi: 10.1073/pnas.1820406116.

Blankenship, R. E. et al. (2011) ‘Comparing photosynthetic and photovoltaic efficiencies and recognizing the potential for improvement’, Science, 332(6031), pp. 805–809. doi: 10.1126/science.1200165.

Daniell, J., Köpke, M. and Simpson, S. D. (2012) ‘Commercial biomass syngas fermentation’, Energies. doi: 10.3390/en5125372.

Fast, A. G. and Papoutsakis, E. T. (2012) ‘Stoichiometric and energetic analyses of non-photosynthetic CO2-fixation pathways to support synthetic biology strategies for production of fuels and chemicals’, Current Opinion in Chemical Engineering, 1(4), pp. 380–395. doi: 10.1016/j.coche.2012.07.005.

Godoy, M. S. et al. (2017) ‘About how to capture and exploit the CO2 surplus that nature, per se, is not capable of fixing’, Microbial Biotechnology, 10(5), pp. 1216–1225. doi: 10.1111/1751-7915.12805.

Gong, F., Cai, Z. and Li, Y. (2016) ‘Synthetic biology for CO2 fixation’, Science China Life Sciences, 59(11), pp. 1106–1114. doi: 10.1007/s11427-016-0304-2.

Gonzales, J. N., Matson, M. M. and Atsumi, S. (2019) ‘Nonphotosynthetic Biological CO2 Reduction’, Biochemistry, 58(11), pp. 1470–1477. doi: 10.1021/acs.biochem.8b00937.

Hu, P. et al. (2016) ‘Integrated bioprocess for conversion of gaseous substrates to liquids’, Proceedings of the National Academy of Sciences of the United States of America, 113(14), pp. 3773–3778. doi: 10.1073/pnas.1516867113.

Hu, P., Rismani-Yazdi, H. and Stephanopoulos, G. (2013) ‘Anaerobic CO2 fixation by the acetogenic bacterium Moorella thermoacetica’, AIChE Journal, 59(9), pp. 3176–3183. doi: 10.1002/aic.

Liu, C. et al. (2016) ‘Water splitting-biosynthetic system with CO2 reduction efficiencies exceeding photosynthesis’, Science, 352(6290), pp. 1210–1213. doi: 10.1126/science.aaf5039.

Park, J. O. et al. (2019) ‘Synergistic substrate cofeeding stimulates reductive metabolism’, Nature Metabolism, 1(6), pp. 643–651. doi: 10.1038/s42255-019-0077-0.

Pierce, E. et al. (2008) ‘The complete genome sequence of Moorella thermoacetica (f. Clostridium thermoaceticum)’, Environmental Microbiology, 10(10), pp. 2550–2573. doi: 10.1111/j.1462-2920.2008.01679.x.


Shiva Kumar, S. and Himabindu, V. (2019) ‘Hydrogen production by PEM water electrolysis – A review’, Materials Science for Energy Technologies, 2(3), pp. 442–454. doi: 10.1016/j.mset.2019.03.002.

Woolston, B. M. et al. (2018) ‘Rediverting carbon flux in Clostridium ljungdahlii using CRISPR interference (CRISPRi)’, Metabolic Engineering. doi: 10.1016/j.ymben.2018.06.006.


Chapter 2

13C Metabolic Flux Analysis of Acetate Conversion to Lipids by Yarrowia lipolytica

This chapter is adapted from

Liu, N., Qiao, K. & Stephanopoulos, G., 2016. 13C Metabolic Flux Analysis of acetate conversion to lipids by Yarrowia lipolytica. Metabolic Engineering, 38, pp. 86–97.


2.1 Introduction

Concerns about global warming have motivated research into renewable energy sources to replace fossil liquid fuels for transportation. Biodiesel based on triacylglycerides (TAGs) and their derivatives represents an attractive route for decarbonizing the heavy-duty transport and aviation industries. Various methods have emerged for producing these fuels with ideal properties from a wide range of cheap and renewable feedstocks (Li, Du and Liu, 2008; Meng et al., 2009). As mentioned in the previous chapter, we propose a scalable two-step process based on non-photosynthetic means, where the first step converts H2/CO2 into acetic acid by anaerobic fermentation and the second step converts the resulting acid into lipids using an oleaginous organism (Fei et al., 2011; Morgan-Sagastume et al., 2011; Fontanille et al., 2012; Hu et al., 2016).

While anaerobic gas fermentation with acetogenic bacteria (i.e., the first stage) has been well characterized, knowledge of acetate metabolism, which is crucial for designing the second stage, is scarce due to limited experience with this carbon source.

Engineering microorganisms that exhibit high lipid titer, productivity, and yield from acetic acid is key to the cost-effective operation of this process. The model oleaginous yeast Yarrowia lipolytica has emerged as a promising biocatalyst for this purpose owing to its superior capacity for TAG overproduction and storage, the availability of genome sequence data, and established genetic engineering tools (Dujon et al., 2004; Beopoulos et al., 2009). However, although numerous efforts have been made to understand and engineer lipid accumulation in Y. lipolytica, they have focused almost exclusively on carbohydrates (sugars) as the starting feedstock (Beopoulos, Chardot and Nicaud, 2009; Papanikolaou et al., 2009; Blazeck et al., 2014; Qiao et al., 2015; Wasylenko, Ahn and Stephanopoulos, 2015), and few studies have elucidated its metabolism when acetate is the sole carbon source.

The metabolism of Y. lipolytica on acetate differs significantly from that on glucose. Initially, acetate is imported into the cytosol and activated to acetyl-CoA (AcCoA) by acetyl-CoA synthetase (ACS, YALI0F05962g) at the expense of two ATP equivalents (Jogl and Tong, 2004). The resulting cytosolic AcCoA has many metabolic destinations, as shown in Figure 2.1. For example, it can be incorporated directly into lipids during de novo fatty acid biosynthesis, or transported into the mitochondria via the carnitine shuttle to enter the tricarboxylic acid (TCA) cycle for energy production. The most prominent distinction between acetate and glucose metabolism is that the former activates the glyoxylate shunt and gluconeogenesis (Kornberg, 1966; Eaton, 2002; Eschrich, Kötter and Entian, 2002). The glyoxylate shunt involves the export of isocitrate from the mitochondria to the cytosol, cleavage of isocitrate into glyoxylate and succinate by isocitrate lyase (ICL, YALI0C16885p and YALI0F31999p), and condensation of glyoxylate with AcCoA to form malate by malate synthase (MES, YALI0D19140p and YALI0E15708g). This process bypasses the two decarboxylation steps of the TCA cycle and replenishes the metabolic pools of malate and oxaloacetate, thereby conserving carbon atoms for anaplerotic purposes. Cytosolic oxaloacetate also serves as an entry point to gluconeogenesis: phosphoenolpyruvate carboxykinase (PEPCK, YALI0C16995p) converts oxaloacetate into phosphoenolpyruvate (PEP), which can then proceed through the gluconeogenic reactions. This process is essentially the reverse of glycolysis and uses all of the glycolytic enzymes except at the conversion of fructose-1,6-bisphosphate (FBP) to fructose-6-phosphate (F6P), which requires fructose-1,6-bisphosphatase (YALI0A15972p). The significance of this pathway is that it replenishes glycolytic intermediates essential for macromolecule synthesis and supplies flux through the pentose phosphate pathway (PPP). Apart from the aforementioned reactions, the source of NADPH is also a major consideration, given that lipid biosynthesis requires large amounts of this reducing cofactor. Previous studies on Y. lipolytica using glucose as the carbon source suggest that lipogenic NADPH is supplied primarily by the oxidative pentose phosphate pathway rather than by malic enzyme (Zhang et al., 2013; Wasylenko, Ahn and Stephanopoulos, 2015).

Figure 2.1. Overview of the major metabolic pathways involved in triacylglyceride (TAG) synthesis using acetate as the sole carbon source. Metabolites in bold represent the final destinations of the carbon atoms coming from acetate. Abbreviations used: AcCoA, acetyl-CoA; TAGs, triacylglycerides; TCA, tricarboxylic acid; oxPPP, oxidative pentose phosphate pathway; non-oxPPP, non-oxidative pentose phosphate pathway.

These features of acetate metabolism suggest that there must be sufficient flux through gluconeogenesis to support biomass synthesis and NADPH generation through the oxidative PPP. However, this flux must be tightly regulated, as it draws resources away from energy (ATP) generation through the TCA cycle, which in turn competes with the pathways involved in lipid synthesis (Figure 2.1). Consequently, for Y. lipolytica to attain high lipid accumulation on acetate, the cells need to distribute their carbon source efficiently among the above pathways so as to optimally satisfy both energy and TAG synthesis requirements.

13C Metabolic Flux Analysis (MFA) is an effective tool for determining the intracellular flux distribution within the cell from experimental data (Wiechert, 2001). In this study, we conducted stationary 13C-MFA on two Y. lipolytica strains, a previously engineered lipid-overproducing strain and a control strain, with acetate as the sole carbon source. The goal was to elucidate how the organism partitions carbon atoms from acetate among the lipid synthesis pathway, TCA cycle, glyoxylate shunt, gluconeogenesis, and PPP in order to identify potential bottlenecks. In addition to metabolism during normal growth and cell division, a nitrogen-limited condition that triggers lipid accumulation in oleaginous organisms (Boulton and Ratledge, 1981; Evans and Ratledge, 1984) was also investigated, resulting in a total of four cases. Parallel labeling experiments were conducted using two different 13C sodium acetate tracers (1-13C1 sodium acetate and U-13C2 sodium acetate), metabolic models were constructed for both the growth and lipid production phases, and intracellular fluxes were estimated by fitting the experimental data to the models. Results indicate that malate transport and pyruvate kinase play crucial roles in controlling the flux through gluconeogenesis and that the oxidative PPP is the primary source of NADPH supporting lipogenesis in Y. lipolytica.

2.2 Materials and methods

2.2.1 Strain and culture conditions

Two strains of Y. lipolytica were used for all 13C-MFA experiments: a control strain, MTYL037, and a previously engineered lipid-overproducing strain, MTYL065, which overexpresses ACC1 (acetyl-CoA carboxylase 1) and DGA1 (diacylglycerol acyltransferase 1) (Tai and Stephanopoulos, 2013). In the engineered strain, the enzymes encoded by the two overexpressed genes catalyze the first and last steps of TAG synthesis, respectively, thereby greatly enhancing flux through this pathway. As a result, this strain has higher demands for cytosolic acetyl-CoA and NADPH. Prior to the experiment, both strains were maintained at 4 °C on minimal medium plates containing 20 g/L glucose, 5 g/L ammonium sulfate, and 1.7 g/L yeast nitrogen base without amino acids and ammonium sulfate (YNB-AA-AS). To prepare for the 13C-MFA experiments, one test tube (14 mL total volume) starter culture was set up for each strain by inoculating from the corresponding plate. Each tube contained 2 mL of yeast extract-peptone-dextrose medium (20 g/L glucose, 20 g/L peptone, and 10 g/L yeast extract) to rapidly build up cell density. After 24 hours, 1 mL of each test tube culture was transferred to a 40 mL shake flask culture (250 mL flask) containing 50 g/L sodium acetate, 1.34 g/L ammonium sulfate, and 1.7 g/L YNB-AA-AS to adapt the cells and synchronize growth. The molar carbon-to-nitrogen (C/N) ratio of the shake flask medium was 60:1. All tube and shake flask cultures were incubated at 30 °C and 250 rpm.
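The stated C/N ratio of 60:1 can be reproduced from the medium recipe on a molar basis. The sketch below assumes that sodium acetate (2 C per molecule) and ammonium sulfate (2 N per molecule) are the only significant carbon and nitrogen sources, i.e. it neglects the small contribution of YNB-AA-AS; the function name is hypothetical.

```python
# Approximate molar masses (g/mol)
MW_SODIUM_ACETATE = 82.03     # CH3COONa, 2 carbon atoms per molecule
MW_AMMONIUM_SULFATE = 132.14  # (NH4)2SO4, 2 nitrogen atoms per molecule


def molar_cn_ratio(acetate_g_per_l: float, ammonium_sulfate_g_per_l: float) -> float:
    """Molar C/N ratio, counting only sodium acetate carbon and ammonium sulfate nitrogen."""
    mol_c = 2 * acetate_g_per_l / MW_SODIUM_ACETATE
    mol_n = 2 * ammonium_sulfate_g_per_l / MW_AMMONIUM_SULFATE
    return mol_c / mol_n


# Shake flask medium: 50 g/L sodium acetate, 1.34 g/L ammonium sulfate
print(round(molar_cn_ratio(50.0, 1.34)))
```

On a mass basis the same recipe gives a ratio closer to 51, so the 60:1 figure corresponds to moles of carbon per mole of nitrogen.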

After 24 hours, each shake flask culture was used to inoculate three batch mini-bioreactors (Applikon Biotechnology MiniBio 250 mL, Foster City, CA) to an initial OD600 of 0.05. The working volume of all bioreactor cultures was 150 mL and the medium had the same composition as that of the shake flask. However, the sodium acetate substrate differed among the three bioreactor cultures: one contained 100% 1-13C1 sodium acetate (Cambridge Isotope Laboratories, Tewksbury, MA), another contained 100% unlabeled (natural abundance) sodium acetate, and the last contained 40 mol% U-13C2 sodium acetate (Cambridge Isotope Laboratories, Tewksbury, MA). All bioreactor cultures were maintained at 30 °C and pH 7.0 through the addition of 10 wt% sulfuric acid. The aeration rate was 1 vvm and the dissolved oxygen (DO) level was maintained at 20% through agitation. 100 μL of 20 vol% Antifoam 204 (Sigma-Aldrich) was added at 6 and 24 hours after inoculation to prevent foaming. The sampling port was located at the bottom of the bioreactor.

Cells from the shake flask cultures were washed prior to inoculation into the bioreactors. The appropriate culture volume was centrifuged at 18,000 g for 5 min, after which the supernatant was discarded and the cell pellet was resuspended in 1 mL of the medium to be used in the 13C-MFA experiment. A second centrifugation step was carried out and the supernatant was discarded. The cells were then resuspended in 1 mL of the same medium and transferred to the bioreactors.

2.2.2 Cell quenching and metabolite extractions

To obtain intracellular metabolites, 7.5 mL of cell culture was quenched in 37.5 mL of methanol precooled in an ethanol-dry ice bath (< -70 °C). After centrifugation at -10 °C and 3270 g for 5 min, the supernatant was carefully removed by aspiration. The cells were then resuspended in 40 mL of cold methanol (< -70 °C) for a wash step and centrifuged again under the same conditions. Following aspiration of the supernatant, 5 mL of 75% ethanol preheated in a water bath (80 °C) was added to the cell pellet for simultaneous cell lysis and extraction of intracellular metabolites. Samples were then vortexed for 30 s, incubated in the 80 °C water bath for 3 min, vortexed again for 30 s, briefly cooled in the ethanol-dry ice bath, and centrifuged under the same conditions. The supernatant containing the cell extracts was split into two fractions: 3.5 mL for LC-MS/MS analysis and 1.5 mL for GC-MS analysis. All samples were then dried under airflow using a Pierce Reacti-Therm III Heating/Stirring Module and stored at -80 °C.

2.2.3 Metabolite analysis and quantification


Extracellular acetate and citrate concentrations were quantified using high-performance liquid chromatography (HPLC). A 1 mL sample was withdrawn from each bioreactor culture and centrifuged at 18,000 g for 10 min. The supernatant was filtered through 0.2 μm nylon syringe filters (Denville Scientific Inc., Holliston, MA) and analyzed on an Agilent 1200 HPLC system coupled to a G1362A refractive index detector. A Bio-Rad HPX-87H column was used for separation with 14 mM sulfuric acid as the mobile phase at a flow rate of 0.7 mL/min. The injection volume was 10 μL. The extracellular ammonium concentration was measured using an Ammonium Assay Kit (Sigma-Aldrich).

Dry cell weight (DCW) was measured by withdrawing a 1 mL sample from the bioreactor culture and vacuum-filtering it through a pre-weighed 0.2 μm nitrocellulose filter paper (Whatman, Pittsburgh, PA). After washing with 2 volumes of Milli-Q water, the samples were dried at 60 °C and weighed again after 24 hours. For each time point, a control filter was prepared by filtering 1 mL of natural abundance sodium acetate medium followed by washing; this was used to correct for changes in filter mass during sample preparation.
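The control-filter correction amounts to subtracting the mass change of a cell-free filter from that of the sample filter. A minimal sketch, assuming both filters are weighed before filtration and after drying; the function name and example masses are hypothetical.

```python
def dry_cell_weight_g_per_l(sample_before_mg: float, sample_after_mg: float,
                            control_before_mg: float, control_after_mg: float,
                            volume_ml: float = 1.0) -> float:
    """DCW in g/L from a sample filter corrected by a cell-free control filter."""
    # Mass retained on the sample filter (cells plus any medium/filter artifacts)
    sample_gain = sample_after_mg - sample_before_mg
    # Mass change of a filter that saw only cell-free medium and washing
    control_gain = control_after_mg - control_before_mg
    # mg per mL is numerically equal to g per L
    return (sample_gain - control_gain) / volume_ml


# Hypothetical filter masses (mg) for a 1 mL sample
print(dry_cell_weight_g_per_l(100.0, 103.5, 99.0, 99.2))
```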

The fatty acids synthesized by Y. lipolytica, including palmitate (C16:0), palmitoleate (C16:1), stearate (C18:0), oleate (C18:1), and linoleate (C18:2), were quantified by gas chromatography with flame ionization detection (GC-FID). A 0.1-1 mL volume of cell culture, chosen such that the sample contained approximately 1 mg of biomass, was withdrawn from each bioreactor. After centrifugation at 18,000 g for 10 min, the supernatant was discarded and the cell pellets were stored at -20 °C until fatty acid analysis. For the analysis, 100 μL of internal standard containing 2 mg/mL methyl tridecanoate (Sigma-Aldrich) and 2 mg/mL glyceryl triheptadecanoate (Sigma-Aldrich) dissolved in hexane was added to each sample. Methyl tridecanoate was used to correct for volume loss during sample preparation and glyceryl triheptadecanoate to correct for transesterification efficiency. 500 μL of 0.5 N sodium methoxide (20 g/L sodium hydroxide in anhydrous methanol) was then added and the samples were vortexed at 1200 rpm for 60 min to transesterify the lipids to fatty acid methyl esters (FAMEs). Afterwards, 40 μL of 98% sulfuric acid was added to neutralize the mixture.

The FAMEs were then extracted by adding 500 μL of hexane followed by vortexing at 1200 rpm for 30 min. Centrifugation at 6000 g for 1 min was performed to remove cellular debris, and the top hexane layer was collected for analysis. The FAMEs were separated on an Agilent J&W HP-INNOWax capillary column using a Bruker 450-GC system. The injection volume was 1 μL, the split ratio was 10, and the injection temperature was 260 °C. The column was held at a constant temperature of 200 °C and helium was used as the carrier gas at a flow rate of 1.5 mL/min. The FID was set at 260 °C with helium make-up gas, hydrogen, and air flow rates of 25 mL/min, 30 mL/min, and 300 mL/min, respectively.
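One way the two internal standards could be combined is sketched below. This is a hypothetical illustration, not the calculation used in this study: it assumes equal FID response per unit mass for all FAMEs, uses methyl tridecanoate as the volume-loss reference, and estimates transesterification efficiency from how much of the glyceryl triheptadecanoate appears as C17:0 FAME. Each standard amounts to 200 μg (100 μL at 2 mg/mL); the function names are illustrative.

```python
# Approximate molar masses (g/mol)
MW_TRIHEPTADECANOIN = 849.4       # glyceryl triheptadecanoate
MW_METHYL_HEPTADECANOATE = 284.5  # C17:0 FAME released from it


def transesterification_efficiency(area_c13: float, area_c17: float,
                                   mass_c13_ug: float, mass_tag_c17_ug: float) -> float:
    """Fraction of the TAG standard recovered as C17:0 FAME (equal mass response assumed)."""
    # C17:0 FAME mass expected if the TAG standard were fully converted (3 FAMEs per TAG)
    expected_c17_ug = 3 * mass_tag_c17_ug * MW_METHYL_HEPTADECANOATE / MW_TRIHEPTADECANOIN
    # Observed C17:0 FAME mass, referenced to methyl tridecanoate to cancel volume loss
    observed_c17_ug = area_c17 / area_c13 * mass_c13_ug
    return observed_c17_ug / expected_c17_ug


def fame_mass_ug(area_fame: float, area_c13: float, mass_c13_ug: float,
                 efficiency: float) -> float:
    """FAME mass in the sample, corrected for volume loss and incomplete conversion."""
    return area_fame / area_c13 * mass_c13_ug / efficiency


# Hypothetical peak areas, for illustration only
eff = transesterification_efficiency(200.0, 180.9, 200.0, 200.0)
print(round(eff, 3), round(fame_mass_ug(90.0, 200.0, 200.0, eff), 1))
```

The result is a FAME mass; converting to free fatty acid or TAG mass would require an additional molar-mass adjustment.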

Intracellular TCA cycle metabolites as well as free amino acids were analyzed by gas chromatography-mass spectrometry (GC-MS). Metabolite extracts were resuspended in 20 μL of 2% methoxyamine hydrochloride in pyridine (MOX Reagent, Thermo Scientific) and the reaction proceeded for 90 min at 37 °C. Subsequently, 25 μL of N-tert-butyldimethylsilyl-N-methyltrifluoroacetamide with 1% tert-butyldimethylchlorosilane (TBDMS, Sigma-Aldrich) was added and the samples were incubated for 60 min at 56 °C. Following centrifugation to remove cell debris, the supernatant was analyzed on an Agilent 6890N Network GC System coupled to an Agilent 5975B Inert XL MSD. A 3 μL sample was injected in splitless mode with an inlet temperature of 270 °C. An Agilent J&W DB-35ms column was used with helium as the carrier gas at a flow rate of 1 mL/min. The GC oven was initially held at 100 °C for 1 min, increased to 105 °C at 2.5 °C/min, held at 105 °C for 2 min, increased to 250 °C at 3.5 °C/min, and finally increased to 320 °C at 20 °C/min. The MS was operated in electron ionization mode. The electron energy was 69.9 eV and the source and quadrupole temperatures were 230 °C and 150 °C, respectively. Mass spectra were obtained in selected ion monitoring (SIM) mode (Ahn and Antoniewicz, 2011).
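The length of one GC-MS run follows from the oven program. The sketch below sums ramp and hold times; it assumes linear ramps and no final hold at 320 °C (none is stated), and the function name is hypothetical.

```python
def oven_run_time(initial_temp_c: float, initial_hold_min: float, ramps) -> float:
    """Total oven program time in minutes.

    ramps: sequence of (rate_c_per_min, target_temp_c, hold_min) tuples.
    """
    total = initial_hold_min
    temp = initial_temp_c
    for rate, target, hold in ramps:
        total += (target - temp) / rate  # time spent ramping to the target
        total += hold                    # isothermal hold at the target
        temp = target
    return total


# Program described above; final hold at 320 °C assumed to be 0 min
program = [(2.5, 105.0, 2.0), (3.5, 250.0, 0.0), (20.0, 320.0, 0.0)]
print(round(oven_run_time(100.0, 1.0, program), 1))
```

Most of the roughly 50 min run is spent on the slow 3.5 °C/min ramp from 105 °C to 250 °C.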

All other intracellular metabolites, namely the glycolytic and PPP intermediates, were analyzed by liquid chromatography-tandem mass spectrometry (LC-MS/MS). Metabolite extracts were resuspended in 80 μL of Millipore water. An Agilent 1100 Series HPLC system coupled to an API 2000 MS/MS (AB Sciex, Framingham, MA) was used for analysis with an injection volume of 20 μL. Metabolites were separated on a Waters XBridge C18 column using an ion-pair chromatography method (Luo et al., 2007). The mobile phase (a mixture of A, 10 mM tributylamine + 15 mM acetic acid, and B, methanol) flowed at 300 μL/min with the following solvent profile: 0% B for 8 min; increase to 22.5% B from 8 to 18 min; increase to 40% B from 18 to 28 min; increase to 60% B from 28 to 32 min; increase to 90% B from 32 to 34 min; hold at 90% B from 34 to 36 min; increase to 100% B from 36 to 37 min; hold at 100% B from 37 to 42 min. Mass spectra were obtained in multiple reaction monitoring (MRM) mode.
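The solvent profile above can be encoded as a table of breakpoints, from which the composition at any time is obtained by interpolation. This sketch assumes each "increase" is a linear ramp between the listed times; the table and function names are hypothetical.

```python
# (time_min, percent_B) breakpoints transcribed from the solvent profile
GRADIENT = [(0.0, 0.0), (8.0, 0.0), (18.0, 22.5), (28.0, 40.0), (32.0, 60.0),
            (34.0, 90.0), (36.0, 90.0), (37.0, 100.0), (42.0, 100.0)]


def percent_b(t_min: float, table=GRADIENT) -> float:
    """Percent methanol (B) at time t, assuming linear ramps between breakpoints."""
    if t_min < table[0][0] or t_min > table[-1][0]:
        raise ValueError("time outside the programmed gradient")
    for (t0, b0), (t1, b1) in zip(table, table[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise ValueError("time not covered by the gradient table")


for t in (5.0, 13.0, 30.0, 35.0, 40.0):
    print(t, percent_b(t))
```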

2.2.4 Metabolic flux estimations

The extracellular flux for TAG synthesis was decomposed into those of the two constituent components of TAGs: AcCoA and glycerol-3-phosphate (Glyc3P). This was done by measuring the fatty acid distribution of the TAGs during each phase to determine the relative amounts of each fatty acid synthesized. The total amounts of AcCoA and Glyc3P required were then calculated by assuming that 1 mol AcCoA is used for every 2 mol carbon incorporated into the fatty acids and 1 mol Glyc3P for every 3 mol fatty acids incorporated into the TAGs. The amount of lipogenic NADPH required was also calculated by assuming that 2 NADPH are consumed in every fatty acid elongation step and 1 NADPH in every desaturation step. Note that these NADPH requirements were not used in the metabolic flux estimation: a mass balance constraint for NADPH was deliberately excluded from the model to avoid introducing errors from assumptions about enzyme cofactor preference and from unidentified sources and sinks of NADPH (Schmidt et al., 1998; Ahn and Antoniewicz, 2011).
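The stoichiometric rules above translate directly into per-fatty-acid demands. In this sketch, the number of elongation steps for an even-chain fatty acid with n carbons is taken as n/2 - 1 condensation cycles (7 for palmitate), and the mixture composition in the usage lines is hypothetical rather than the measured distribution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FattyAcid:
    name: str
    carbons: int       # even-chain length assumed
    double_bonds: int  # number of desaturation steps


def lipogenic_demand(fa: FattyAcid) -> tuple[int, int]:
    """(mol AcCoA, mol NADPH) per mol of fatty acid."""
    acetyl_coa = fa.carbons // 2             # 1 AcCoA per 2 carbons
    elongation_steps = fa.carbons // 2 - 1   # condensation cycles after the primer
    nadph = 2 * elongation_steps + fa.double_bonds  # 2 per elongation, 1 per desaturation
    return acetyl_coa, nadph


def demand_per_mol_fatty_acid(composition: dict) -> tuple[float, float, float]:
    """(AcCoA, Glyc3P, NADPH) per mol fatty acid for a mole-fraction composition."""
    accoa = sum(x * lipogenic_demand(fa)[0] for fa, x in composition.items())
    nadph = sum(x * lipogenic_demand(fa)[1] for fa, x in composition.items())
    glyc3p = 1.0 / 3.0  # 1 Glyc3P per 3 fatty acids esterified into a TAG
    return accoa, glyc3p, nadph


PALMITATE = FattyAcid("C16:0", 16, 0)
OLEATE = FattyAcid("C18:1", 18, 1)
print(lipogenic_demand(PALMITATE), lipogenic_demand(OLEATE))
print(demand_per_mol_fatty_acid({PALMITATE: 0.5, OLEATE: 0.5}))  # hypothetical mix
```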

All extracellular fluxes were normalized to an acetate uptake rate of 100. Linear regressions were performed on plots of DCW, citrate, and lipogenic AcCoA, Glyc3P, and NADPH versus acetate consumed, and the slope of each best-fit line was determined. Since this slope represents the yield of the given metabolite on acetate, its value multiplied by 100 gives the normalized extracellular flux based on the arbitrarily fixed acetate consumption rate. The results from the three bioreactor cultures for each strain were averaged and the standard deviations were taken as the uncertainties in the extracellular flux values.
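The normalization step can be sketched with NumPy's least-squares line fit. The data below are hypothetical, and the use of the sample standard deviation (ddof=1) across the three cultures is an assumption, since the text does not specify which estimator was used.

```python
import numpy as np


def normalized_flux(acetate_consumed, product, basis: float = 100.0) -> float:
    """Yield (slope of product vs. acetate consumed) scaled to an acetate uptake of `basis`."""
    slope, _intercept = np.polyfit(acetate_consumed, product, 1)
    return basis * float(slope)


def mean_and_sd(values) -> tuple[float, float]:
    """Mean and sample standard deviation across replicate cultures."""
    v = np.asarray(values, dtype=float)
    return float(v.mean()), float(v.std(ddof=1))


# Hypothetical time-course data for one culture (consistent molar units)
acetate = np.array([0.0, 10.0, 20.0, 30.0])
citrate = np.array([0.0, 0.5, 1.0, 1.5])
print(normalized_flux(acetate, citrate))
print(mean_and_sd([4.0, 5.0, 6.0]))  # hypothetical fluxes from three bioreactors
```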

For intracellular flux estimation, two separate compartmentalized reaction networks were constructed for the growth (G) phase and the lipid production (LP) phase. The G phase model consisted of the enzymatic reactions of the TCA cycle, glyoxylate shunt, gluconeogenesis, PPP, and one-carbon metabolism, as well as the pathways for synthesis of biomass constituents. The reactions of the non-oxidative PPP were modeled using half reactions (Kleijn et al., 2005). The biomass composition of Y. lipolytica was adapted from the literature (Pan and Hua, 2012). To account for variability in lipid content (grams of TAGs per gram DCW), the biomass formula was adjusted to reflect the lipid content actually measured in this study; as a result, the biomass equation differed slightly between the MTYL037 and MTYL065 strains. Furthermore, TAG synthesis was not included in the biomass equation and was instead represented by the reactions for lipogenic AcCoA and Glyc3P consumption (see section 2.5), to better facilitate comparison of TAG production across strains and fermentation phases. Reactions for mitochondrial malic enzyme as well as the gluconeogenic enzymes phosphoenolpyruvate carboxykinase and fructose-1,6-bisphosphatase were also included (Perea and Gancedo, 1982; Jardón, Gancedo and Flores, 2008; Beopoulos, Nicaud and Gaillardin, 2011). It has been shown that in Y. lipolytica, the anaplerotic function of pyruvate carboxylase is not essential for growth as long as the glyoxylate shunt is active (Flores and Gancedo, 2005), which is indeed the case under the conditions used in this study. Therefore, cytosolic pyruvate carboxylase was omitted from the model to avoid futile cycling mediated by pyruvate carboxylase, pyruvate kinase, and PEP carboxykinase during modeling.

Separate cytosolic and mitochondrial pools were constructed for citrate, succinate, malate, pyruvate, AcCoA, and oxaloacetate. For each of these metabolites, the labeling patterns of both pools were allowed to contribute to the experimentally measured total isotopomer distribution, with the relative contribution of each compartment left as a free parameter for estimation. Reactions transporting compartmentalized metabolites between the cytosol and the mitochondria were also included. Transport of cytosolic AcCoA into the mitochondria was assumed to be carried out reversibly by the carnitine shuttle (Eaton, 2002). The transport of pyruvate from the cytosol to the mitochondria was assumed to be unidirectional (Maaheimo et al., 2001). Succinate and malate were assumed to be transported unidirectionally from the cytosol to the mitochondria by dicarboxylate carriers (Luévano-Martínez et al., 2010). The oxaloacetate carrier was omitted since its inclusion did not significantly affect the flux estimates (Palmieri et al., 1999). For the LP phase, all reactions remained the same except that the synthesis reactions for biomass constituents other than TAGs were excluded and the reaction for extracellular citrate production was added.
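Letting both compartments contribute to a measured MID means the measurement is modeled as a convex mixture of the cytosolic and mitochondrial pool MIDs. The sketch below isolates that idea: if the two pool MIDs were known, the cytosolic fraction would have a closed-form least-squares estimate. In the actual model this fraction is fitted jointly with the fluxes, so this is a simplified, hypothetical illustration with made-up MIDs.

```python
import numpy as np


def mix_mid(mid_cyt, mid_mit, f_cyt: float) -> np.ndarray:
    """Measured MID as a mixture of cytosolic and mitochondrial pools."""
    return f_cyt * np.asarray(mid_cyt, float) + (1.0 - f_cyt) * np.asarray(mid_mit, float)


def fit_cytosolic_fraction(measured, mid_cyt, mid_mit) -> float:
    """Least-squares cytosolic fraction, clipped to [0, 1]."""
    c, m, y = (np.asarray(a, dtype=float) for a in (mid_cyt, mid_mit, measured))
    d = c - m
    denom = float(d @ d)
    if denom == 0.0:
        raise ValueError("pool MIDs are identical; the fraction is not identifiable")
    f = float(d @ (y - m)) / denom
    return min(max(f, 0.0), 1.0)


# Hypothetical pool MIDs (M+0, M+1, M+2)
cyt = [0.5, 0.3, 0.2]
mit = [0.8, 0.15, 0.05]
print(fit_cytosolic_fraction(mix_mid(cyt, mit, 0.3), cyt, mit))
```

The identifiability check mirrors a practical limitation: if both pools are labeled identically, the measurement carries no information about their relative contributions.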

The complete metabolic network models, along with carbon atom transitions, can be found in Appendix A (Tables A1 and A2). Note that reversible fluxes were modeled in terms of a net flux and an exchange flux rather than a forward and a reverse flux (Wiechert and de Graaf, 1997).
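Under one common convention, the net flux is the forward minus the reverse flux and the exchange flux is the smaller of the two, so the two parameterizations can be converted into each other without loss. A minimal sketch with hypothetical function names:

```python
def to_net_exchange(v_fwd: float, v_rev: float) -> tuple[float, float]:
    """Forward/reverse fluxes -> (net, exchange), with exchange = min(forward, reverse)."""
    if v_fwd < 0 or v_rev < 0:
        raise ValueError("forward and reverse fluxes must be non-negative")
    return v_fwd - v_rev, min(v_fwd, v_rev)


def to_forward_reverse(v_net: float, v_xch: float) -> tuple[float, float]:
    """(net, exchange) -> forward/reverse fluxes under the same convention."""
    return v_xch + max(v_net, 0.0), v_xch + max(-v_net, 0.0)


print(to_net_exchange(10.0, 4.0))    # net 6, exchange 4
print(to_forward_reverse(-4.0, 3.0)) # forward 3, reverse 7
```

Fitting net and exchange fluxes separately lets the net fluxes be constrained by mass balances, while the exchange fluxes, which affect only labeling, can be estimated from the MIDs.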

Mass isotopomer distributions (MIDs) were used to describe the labeling patterns of intracellular metabolites; these data were obtained from MS measurements of cell extracts from cultures fed labeled acetate tracers. Cell extracts from natural abundance acetate cultures were analyzed alongside the labeled samples. MIDs of the natural abundance samples were compared with theoretical values calculated from the expected effects of naturally occurring heavy isotopes, and metabolites with significant discrepancies were excluded from further analysis (Wittmann and Heinzle, 1999; Van Winden et al., 2005). The MIDs of the 13C-labeled samples were likewise corrected for naturally occurring heavy isotopes before subsequent analyses.
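The natural-abundance correction can be illustrated with a correction matrix. The sketch below is deliberately simplified: it corrects only for natural 13C in the n carbon atoms of the metabolite backbone, ignoring heavy isotopes of H, N, O, and Si introduced by derivatization that a full correction must also handle. The 13C abundance value and function names are assumptions.

```python
from math import comb

import numpy as np

P_13C = 0.0107  # approximate natural abundance of 13C (assumed value)


def carbon_correction_matrix(n_carbons: int, p: float = P_13C) -> np.ndarray:
    """C[i, j]: probability that a species with j tracer-derived 13C atoms is
    observed at M+i because of natural 13C in its remaining n - j carbons."""
    n = n_carbons
    mat = np.zeros((n + 1, n + 1))
    for j in range(n + 1):          # j carbons labeled by the tracer
        free = n - j                # carbons that may carry natural 13C
        for k in range(free + 1):   # k additional natural 13C atoms
            mat[j + k, j] = comb(free, k) * p**k * (1 - p) ** (free - k)
    return mat


def correct_mid(measured, n_carbons: int, p: float = P_13C) -> np.ndarray:
    """Solve measured = C @ corrected, then clip negatives and renormalize."""
    mat = carbon_correction_matrix(n_carbons, p)
    x, *_ = np.linalg.lstsq(mat, np.asarray(measured, dtype=float), rcond=None)
    x = np.clip(x, 0.0, None)       # remove small negative values caused by noise
    return x / x.sum()


# Round trip on a hypothetical 2-carbon fragment
true_mid = np.array([0.6, 0.1, 0.3])
observed = carbon_correction_matrix(2) @ true_mid
print(np.round(correct_mid(observed, 2), 4))
```

Applied to an unlabeled sample, the corrected MID should be close to [1, 0, ..., 0]; large deviations are the kind of discrepancy used above to exclude metabolites.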

This study estimates the intracellular flux distribution using stationary 13C-MFA; all computations were performed using in-house software based on the concept of elementary metabolite units (Maciek R. Antoniewicz, Kelleher and Stephanopoulos, 2007). Under the steady-state assumption, a random set of fluxes that satisfies the mass balance constraints is first generated as the initial guess. From this guess, the expected extracellular fluxes and MIDs are simulated and compared with the experimentally determined values. The lack of fit between simulated and experimental values is captured by the weighted sum of squared residuals (WSSR), which is minimized by iteratively refining the flux distribution until the minimization algorithm converges. This procedure was repeated 500 times for each strain and phase using different initial guesses, and the smallest WSSR was taken as the global minimum. The flux distribution that produced the global minimum was then taken as the best estimate of the actual intracellular flux distribution. A chi-square test was used to evaluate the goodness of fit, that is, whether the model for each scenario adequately described the data. 68% and 95% confidence intervals were determined for each flux using a parameter continuation technique (Antoniewicz, Kelleher and Stephanopoulos, 2006). For these calculations, the uncertainty in the labeling data for intracellular metabolites was assumed to be 0.4 mol% (Maciek R Antoniewicz, Kelleher and Stephanopoulos, 2007; Wasylenko and Stephanopoulos, 2013).
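The WSSR and the chi-square acceptance check can be sketched as follows. With 0.4 mol% uncertainty, each MID residual is divided by 0.004 (in fractional units). Instead of an exact chi-square quantile, this sketch swaps in the Wilson-Hilferty approximation so that it needs only NumPy; the degrees of freedom are assumed to equal the number of independent measurements minus the number of fitted free parameters. Function names and example numbers are hypothetical.

```python
import math

import numpy as np


def wssr(simulated, measured, sigma) -> float:
    """Weighted sum of squared residuals between simulated and measured values."""
    r = (np.asarray(simulated, float) - np.asarray(measured, float)) / np.asarray(sigma, float)
    return float(np.sum(r ** 2))


def chi2_acceptance_range(dof: int, z: float = 1.959964) -> tuple[float, float]:
    """Approximate two-sided 95% range of a chi-square variable (Wilson-Hilferty)."""
    a = 2.0 / (9.0 * dof)
    lo = dof * (1.0 - a - z * math.sqrt(a)) ** 3
    hi = dof * (1.0 - a + z * math.sqrt(a)) ** 3
    return lo, hi


# Hypothetical example: two MID entries with 0.4 mol% uncertainty each
print(wssr([0.500, 0.500], [0.496, 0.500], [0.004, 0.004]))
# A fit is statistically acceptable if its minimized WSSR lies in this range
print(chi2_acceptance_range(100))
```

For large degrees of freedom the approximation agrees closely with exact tables (about 74.2 to 129.6 for 100 degrees of freedom); for small ones an exact quantile function is preferable.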

2.3 Results

2.3.1 Fermentation profiles and establishment of metabolic steady state

As described in the introduction, the metabolism of Y. lipolytica on acetate differs significantly from that on glucose; consequently, the culture conditions must also be treated differently. With glucose, the pH of the culture medium gradually decreases. This does not pose a problem in shake flasks, since Y. lipolytica tolerates relatively acidic environments. However, when acetate is used as the carbon substrate, the pH of the medium rises rapidly, reaching 10 after 48 hours, a level at which the cells can no longer survive (data not shown). Growth is also hindered by this pH effect. As a result, the achievable cell density is low and intracellular metabolites cannot be extracted in appreciable amounts, resulting in low signal-to-noise ratios in GC-MS and LC-MS/MS analyses. To resolve these issues, 250 mL small-scale bioreactors were used for the MFA experiments to provide pH control and maintain the pH at 7.

The performance of the engineered lipid-overproducing strain MTYL065 and the control strain MTYL037 was evaluated and compared in a low-nitrogen medium (initial C/N = 60) with sodium acetate as the sole carbon source. Each strain was cultured in triplicate: one culture with 1-13C1 sodium acetate, one with natural-abundance sodium acetate, and one with 40% U-13C2 sodium acetate. Figures 2.2 and 2.3 show the time-course fermentation profiles for these six cultures. Adding pH control notably extended the fermentation beyond 48 hours, and the accumulated cell density was sufficient for metabolite extraction. The time course of ammonium consumption (Figure 2.2b) shows that nitrogen was depleted from the medium between 44 and 56 hours after inoculation. This divides the fermentation into two phases. In the G phase (24-44 hr), nitrogen is present and biomass accumulates. In the LP phase (56-76 hr), nitrogen depletion causes the cells to convert excess carbon into lipids. During the LP phase, the MTYL065 strain consumed acetate and produced lipids at much faster rates than the MTYL037 strain (Figures 2.2a and 2.3a).

These differences were less prominent during the G phase. The final lipid contents (g lipid per g dry cell weight) of the three MTYL065 cultures were 53-60%, much higher than those of the three MTYL037 cultures (23-28%). The fatty acid distributions shown in Figure 2.4 were similar between the two strains and the two fermentation phases. Oleate was the dominant species, accounting for ~55% of total fatty acids. The only byproduct detected by HPLC was citrate. During the G phase, neither strain produced detectable citrate (Figure 2.2c). Citrate began to accumulate during the LP phase, however, with the MTYL037 strain producing nearly four times as much as the MTYL065 strain.

[Figure 2.2: four panels plotted against time (hr): (a) sodium acetate (mM); (b) ammonium (mM); (c) citrate (mM); (d) DCW (g/L).]

Figure 2.2. Fermentation profiles for extracellular metabolites. Time courses of (a) acetate and (b) ammonium consumption and of (c) citrate and (d) dry cell weight accumulation in 250 mL mini-bioreactors for Y. lipolytica strains MTYL037 and MTYL065. Abbreviations used to denote the carbon substrate in the culture medium: 1, 1-13C1 sodium acetate; NA, natural abundance sodium acetate; U, 40% U-13C2 sodium acetate. Other abbreviations used: DCW, dry cell weight.

To perform stationary 13C-MFA, the isotopic labeling patterns of the intracellular metabolites must be at steady state. This requires the cells to be maintained at metabolic steady state throughout the period of study, with all intra- and extracellular fluxes invariant over time. If this condition holds long enough, the labeling patterns of the intracellular metabolites eventually reach isotopic steady state, after which the metabolites can be harvested and analyzed (Wiechert, 2001). This study uses the labeling patterns of central carbon metabolites for 13C-MFA. These metabolites have very fast turnover rates, so they can be expected to reach isotopic steady state relatively quickly if the culture is held at metabolic steady state (Canelas et al., 2008).

[Figure 2.3: four panels plotted against time (hr): (a) lipid titer (g/L); (b) AcCoA (mM); (c) Glyc3P (mM); (d) NADPH (mM).]

Figure 2.3. Fermentation profiles for fatty acids and their biosynthetic precursors acetyl-CoA, Glyc3P and NADPH. Time courses of (a) total lipid titer as well as the lipogenic (b) AcCoA, (c) Glyc3P, and (d) NADPH required for lipid biosynthesis in mini-bioreactors for strains MTYL037 and MTYL065. Abbreviations used to denote the carbon substrate in the culture medium: 1, 1-13C1 sodium acetate; NA, natural abundance sodium acetate; U, 40% U-13C2 sodium acetate. Other abbreviations used: AcCoA, acetyl-CoA; Glyc3P, glycerol-3-phosphate.

The fermentation profiles can be used to determine whether metabolic steady state has been reached. During the G phase, the cells use the nitrogen source in the medium and actively divide. Under these conditions, exponential growth in batch culture is generally assumed to approximate a metabolic steady state. In the bioreactor batch cultures used in this study, exponential growth is indeed observed in the dry cell weight time courses (Figure 2.5), so metabolic steady state is achieved during the G phase. During the LP phase, the cells no longer have access to nitrogen and can no longer divide, so cell number stays constant and exponential growth is lost. With a constant cell number, a culture at metabolic steady state would be expected to consume acetate and produce citrate and lipids at constant rates. Indeed, the fermentation profiles in Figures 2.2 and 2.3 are nearly linear during the LP phase (56-76 hr), which implies constant consumption and production rates. Metabolic steady state is therefore also achieved during this phase. Both the G and LP phases spanned 20-hour windows. Metabolic steady state was thus maintained long enough for the central carbon metabolites to reach isotopic steady state by the time they were harvested near the end of each phase.

[Figure 2.4: two panels, (a) growth phase and (b) lipid production phase, showing the mole fraction of each fatty acid (C16:0, C16:1, C18:0, C18:1, C18:2).]

Figure 2.4. Distribution profiles of fatty acids produced by Y. lipolytica. The mole fractions of the five principal fatty acids in strains MTYL037 and MTYL065, measured when cells were harvested for 13C-MFA during each phase: (a) growth phase and (b) lipid production phase.

2.3.2 Extracellular fluxes

Based on the results shown in Figures 2.2 and 2.3, extracellular fluxes were determined for strains MTYL037 and MTYL065 in both fermentation phases. In all four cases, the acetate uptake rate was fixed at an arbitrary value of 100 so that intracellular flux distributions could be compared across phases and strains. The extracellular fluxes for dry cell weight, citrate, and lipogenic AcCoA, Glyc3P, and NADPH were normalized to the acetate uptake rate. These fluxes and their uncertainties are listed in Table 2.1 and were used directly for metabolic flux analysis. During the G phase, the control strain MTYL037 converted the carbon source into 20% more biomass than the engineered strain MTYL065. MTYL065, in turn, generated 50% more TAG than MTYL037 during the same phase. In the LP phase, MTYL065 clearly outperformed MTYL037, producing nearly twice as much TAG. The control strain also generated significantly more citrate as a byproduct, which lowered its lipid yield.

[Figure 2.5: DCW (g/L) versus time (hr), 20-50 hr, for each culture, with fitted exponential trendlines (t in hr):
MTYL037 1-13C: DCW = 0.0356 e^(0.1082 t), R² = 0.9765
MTYL037 NA: DCW = 0.0691 e^(0.0905 t), R² = 0.9809
MTYL037 U-13C: DCW = 0.0424 e^(0.1034 t), R² = 0.9752
MTYL065 1-13C: DCW = 0.0094 e^(0.1399 t), R² = 0.9626
MTYL065 NA: DCW = 0.0097 e^(0.1394 t), R² = 0.9523
MTYL065 U-13C: DCW = 0.0125 e^(0.1339 t), R² = 0.9526]

Figure 2.5. Exponential fits for all six cultures during the growth phase. Abbreviations used to denote the carbon substrate in the culture medium: 1, 1-13C1 sodium acetate; NA, natural abundance sodium acetate; U, 40% U-13C2 sodium acetate.
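Figure 2.5 reports exponential trendlines of the form DCW = a·e^(μt). The fitting method is not stated in the text. The minimal sketch below assumes the fits came from linear least squares on ln(DCW), which is how spreadsheet exponential trendlines are typically computed. Under that assumption, it shows how the specific growth rate μ and the doubling time ln 2/μ follow from such a fit. The example uses noise-free points generated from the reported MTYL037 1-13C trendline, not the measured data. The function names are hypothetical.

```python
import math


def fit_exponential(times, dcw):
    """Fit DCW = a * exp(mu * t) by least squares on ln(DCW).

    Returns (a, mu, r2), where r2 is computed on the log scale.
    """
    n = len(times)
    logs = [math.log(y) for y in dcw]
    t_mean = sum(times) / n
    l_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (l - l_mean) for t, l in zip(times, logs))
    mu = sxy / sxx
    ln_a = l_mean - mu * t_mean
    ss_res = sum((l - (ln_a + mu * t)) ** 2 for t, l in zip(times, logs))
    ss_tot = sum((l - l_mean) ** 2 for l in logs)
    return math.exp(ln_a), mu, 1.0 - ss_res / ss_tot


def doubling_time(mu):
    """Doubling time (in the time unit of mu) for exponential growth."""
    return math.log(2.0) / mu


if __name__ == "__main__":
    # Noise-free points generated from the reported MTYL037 1-13C trendline.
    times = [24.0, 28.0, 32.0, 36.0, 40.0, 44.0]
    dcw = [0.0356 * math.exp(0.1082 * t) for t in times]
    a, mu, r2 = fit_exponential(times, dcw)
    print(f"a = {a:.4f}, mu = {mu:.4f} 1/hr, td = {doubling_time(mu):.1f} hr")
```

For μ = 0.1082 hr⁻¹, this corresponds to a doubling time of about 6.4 hr. Fitting on the log scale weights every time point by its relative rather than absolute error. This is one reason the reported R² values refer to the log-linear fit.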


Table 2.1. Extracellular fluxes for 13C-MFA

| Flux | MTYL037 growth | MTYL065 growth | MTYL037 lipid | MTYL065 lipid |
|---|---|---|---|---|
| Acetate | 100 | 100 | 100 | 100 |
| Dry cell weight | 2.49 ± 0.19 | 2.04 ± 0.07 | 0 | 0 |
| Citrate | 0 | 0 | 3.84 ± 0.45 | 0.693 ± 0.063 |
| AcCoA | 12.2 ± 0.9 | 18.8 ± 0.3 | 19.4 ± 1.0 | 37.3 ± 4.1 |
| Glyc3P | 0.463 ± 0.036 | 0.724 ± 0.012 | 0.750 ± 0.037 | 1.43 ± 0.15 |
| NADPH | 18.8 ± 1.6 | 33.1 ± 0.6 | 37.8 ± 1.9 | 69.4 ± 7.7 |
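The strain comparisons quoted in Section 2.3.2 are simple ratios of the normalized fluxes in Table 2.1. The sketch below recomputes them. It assumes that the lipogenic AcCoA flux is used as the proxy for TAG synthesis, which matches the "50% more" and "nearly twice" figures. It also shows the normalization to an acetate uptake of 100. The raw rates passed to `normalize_to_acetate` in the example are hypothetical. The dictionary layout and function names are illustrative only.

```python
# Best-fit extracellular fluxes from Table 2.1, normalized to acetate uptake = 100.
TABLE_2_1 = {
    ("MTYL037", "growth"): {"acetate": 100.0, "dcw": 2.49, "citrate": 0.0, "accoa": 12.2},
    ("MTYL065", "growth"): {"acetate": 100.0, "dcw": 2.04, "citrate": 0.0, "accoa": 18.8},
    ("MTYL037", "lipid"): {"acetate": 100.0, "dcw": 0.0, "citrate": 3.84, "accoa": 19.4},
    ("MTYL065", "lipid"): {"acetate": 100.0, "dcw": 0.0, "citrate": 0.693, "accoa": 37.3},
}


def normalize_to_acetate(rates, basis=100.0):
    """Rescale measured rates so that the acetate uptake rate equals `basis`."""
    scale = basis / rates["acetate"]
    return {name: value * scale for name, value in rates.items()}


def fold_change(flux, numerator, denominator):
    """Ratio of one normalized flux between two (strain, phase) cases."""
    return TABLE_2_1[numerator][flux] / TABLE_2_1[denominator][flux]


if __name__ == "__main__":
    g37, g65 = ("MTYL037", "growth"), ("MTYL065", "growth")
    l37, l65 = ("MTYL037", "lipid"), ("MTYL065", "lipid")
    print("biomass, 037/065 (G):", round(fold_change("dcw", g37, g65), 2))
    print("lipogenic AcCoA, 065/037 (G):", round(fold_change("accoa", g65, g37), 2))
    print("lipogenic AcCoA, 065/037 (LP):", round(fold_change("accoa", l65, l37), 2))
    # Hypothetical raw rates (e.g. mmol/gDCW/hr) to illustrate normalization.
    print(normalize_to_acetate({"acetate": 5.0, "citrate": 0.25}))
```

Normalizing every case to the same acetate uptake makes the flux maps comparable as yields per 100 mol acetate. Absolute rates are lost in this step.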

2.3.3 Biomass composition

The biomass equation obtained from Pan and Hua (2012) was modified slightly to reflect the differences in lipid content between the control and engineered strains. The lipid content at the end of the G phase was 17.2% for the MTYL037 strain and 32.2% for the MTYL065 strain. Amounts of amino acids, carbohydrates, , phospholipids, and sterols per gram of dry cell weight are listed in Appendix A (Table A3) for both strains. This biomass formula was used in 13C-MFA to estimate the fluxes toward biomass macromolecules other than TAGs.

2.3.4 Intracellular fluxes

13C-MFA was performed for all four cases. In each scenario, two kinds of input were used: the extracellular fluxes measured by HPLC and GC-FID, and the metabolite labeling patterns measured by GC-MS and LC-MS/MS. The best-fit flux values for central carbon metabolism are shown in Figure 2.6. The flux confidence intervals for several important reactions are shown in Figures 2.7 and 2.8. For the G phase, the sums of squared residuals for the MTYL037 and MTYL065 models were 196.1 and 191.0, respectively. Both values fall within the range of 170.0 to 235.6 required by the chi-square test for the model to adequately describe the data. Similarly, the sums of squared residuals for the MTYL037 and MTYL065 LP phase models were 189.3 and 160.2, respectively. Both fall within the required range of 134.0 to 199.1 (Antoniewicz, Kelleher and Stephanopoulos, 2006). The experimentally measured and simulated intracellular metabolite MIDs are listed in Appendix A (Tables A4 and A5). The complete set of best-fit flux values with their confidence intervals is listed in Appendix A (Tables A6 through A9).
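For an adequate model, the minimized SSR is expected to follow a chi-square distribution. Its degrees of freedom equal the number of fitted measurements minus the number of free fluxes (Antoniewicz, Kelleher and Stephanopoulos, 2006). The sketch below is not the in-house software. It approximates chi-square quantiles with the Wilson-Hilferty transformation, using only the Python standard library, and checks the four reported SSR values against the ranges quoted above. The degrees of freedom of these models are not given in the text. The `acceptance_range` helper is therefore shown only with an arbitrary example value. All names are hypothetical.

```python
import math
from statistics import NormalDist


def chi2_quantile(p, dof):
    """Wilson-Hilferty approximation to the chi-square quantile at probability p."""
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * dof)
    return dof * (1.0 - a + z * math.sqrt(a)) ** 3


def acceptance_range(dof, confidence=0.95):
    """Two-sided acceptance range for the minimized SSR."""
    alpha = 1.0 - confidence
    return chi2_quantile(alpha / 2.0, dof), chi2_quantile(1.0 - alpha / 2.0, dof)


def fit_accepted(ssr, lower, upper):
    """True if the minimized SSR lies inside the acceptance range."""
    return lower <= ssr <= upper


# SSR values and acceptance ranges as reported for the four models.
MODELS = {
    "MTYL037 G": (196.1, 170.0, 235.6),
    "MTYL065 G": (191.0, 170.0, 235.6),
    "MTYL037 LP": (189.3, 134.0, 199.1),
    "MTYL065 LP": (160.2, 134.0, 199.1),
}

if __name__ == "__main__":
    for name, (ssr, lo, hi) in MODELS.items():
        print(name, fit_accepted(ssr, lo, hi))
    print(acceptance_range(10))  # arbitrary example, approx (3.2, 20.5)
```

The Wilson-Hilferty approximation becomes more accurate as the degrees of freedom grow, so it is well suited to models with hundreds of residuals. `scipy.stats.chi2.ppf` would give exact quantiles where SciPy is available.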

During the G phase, the intracellular flux distributions of the two strains are largely similar, although the flux of AcCoA to TAGs is higher in the MTYL065 strain (Figures 2.6 and 2.7a). For instance, both strains exhibit high TCA cycle fluxes to generate ATP. The glyoxylate shunt and gluconeogenesis are both active to supply fluxes for biomass synthesis. The dicarboxylate carrier, which transports malate from the cytosol to the mitochondria, is active in both strains. This could serve to replenish the TCA cycle metabolite pools, since the high glyoxylate shunt flux constantly draws citrate from the mitochondria to the cytosol. Fluxes through major pathways such as the TCA cycle, glyoxylate shunt, and gluconeogenesis are nearly the same for MTYL037 and MTYL065 (Figures 2.6 and 2.7). These similarities during the G phase are not surprising. When nitrogen is present, the primary objective of the cells is to accumulate biomass regardless of strain. Hence, both strains use acetate primarily for biomass precursor synthesis and energy production, and the enzyme activities for these reactions are expected to be comparable in the control and engineered strains. The oxidative PPP flux is the exception: it is higher in the engineered strain (17.9 compared with 10.4 for the control strain; Figures 2.6 and 2.7b).

During the LP phase, however, several major distinctions emerge between the two strains. When the cells no longer have access to nitrogen, metabolism shifts toward converting excess carbon in the medium into TAGs. As a result, the flux of AcCoA to lipids increased significantly in both strains relative to the G phase, by 60% in MTYL037 and by 100% in MTYL065. Interestingly, the increase in lipid production does not appear to have a significant impact on TCA cycle fluxes in the MTYL037 strain. A smaller portion of the cytosolic AcCoA pool is transported into the mitochondria to enter the TCA cycle, but the strain uses other reactions to make up for the loss. Pyruvate kinase, for example, is upregulated 3-fold relative to the growth phase, diverting the majority of the glyoxylate shunt flux back to the TCA cycle (Figures 2.6 and 2.8b). The net result is that MTYL037 sacrificed a large portion of its gluconeogenic flux to keep TCA cycle fluxes high while still synthesizing more lipids during the LP phase. Indeed, the gluconeogenic flux decreased from 11.3 to 7.6 upon entry into this phase (Figure 2.6).

The MTYL065 strain, on the other hand, can no longer maintain high TCA cycle fluxes. Because the engineered strain produced twice as much TAG as the control strain, the cytosolic AcCoA available to enter the TCA cycle decreased even more. In addition, malate transport is nearly shut off and pyruvate kinase activity is much lower than in the control strain (Figures 2.6, 2.8a and 2.8b). Therefore, even though the engineered strain has a considerable glyoxylate shunt flux, most of this flux is diverted into gluconeogenesis, and only a small portion flows back into the TCA cycle. Correspondingly, this strain has lower TCA cycle fluxes (mitochondrial isocitrate dehydrogenase flux) during the LP phase than during the G phase (25.1 versus 33.9; Figures 2.6 and 2.7c). At the same time, it upregulates gluconeogenic enzymes to increase flux through this pathway (13.9 versus 11.9; Figures 2.6 and 2.8c).

Figure 2.6. Flux distributions from 13C-Metabolic Flux Analysis. Estimated best-fit values for the MTYL037 strain during the growth (WT G) and lipid production (WT L) phases, and for the MTYL065 strain during the growth (AD G) and lipid production (AD L) phases. Legend shown at the top right of the figure. For the growth phase, reactions that synthesize biomass macromolecules other than TAGs are not shown. Reversible reactions are described by a net flux and an exchange flux listed in parentheses. Exchange fluxes denoted “nr” could not be resolved.

[Figure 2.7: five panels showing best-fit fluxes and confidence intervals for MTYL037 and MTYL065 in the growth and lipid phases: (a) lipogenic AcCoA; (b) oxidative PPP; (c) mitochondrial IDH; (d) isocitrate lyase; (e) PEP carboxykinase.]

Figure 2.7. Flux confidence intervals of selected metabolic reactions. The best-fit flux values and corresponding confidence intervals are shown for reactions that best illustrate differences and similarities across strains (MTYL037 and MTYL065) and fermentation phases (growth and lipid production): (a) AcCoA to lipids; (b) oxidative PPP; (c) mitochondrial IDH (TCA cycle); (d) isocitrate lyase (glyoxylate shunt); (e) PEP carboxykinase. Best-fit values, 68% confidence intervals, and 95% confidence intervals are indicated by the middle lines, the boxes, and the error bars, respectively. All flux values are normalized to an acetate uptake rate of 100.

Abbreviations: growth, growth phase; lipid, lipid production phase; AcCoA, acetyl-CoA; PPP, pentose phosphate pathway; IDH, isocitrate dehydrogenase; PEP, phosphoenolpyruvate; TCA, tricarboxylic acid.

[Figure 2.8: three panels showing best-fit fluxes and confidence intervals for MTYL037 and MTYL065 in the growth and lipid phases: (a) malate transport; (b) pyruvate kinase; (c) enolase.]

Figure 2.8. Flux confidence intervals for metabolic reactions related to the glyoxylate shunt and gluconeogenesis. The best-fit flux values and corresponding confidence intervals are shown for reactions that divert the glyoxylate shunt flux back to the TCA cycle or into gluconeogenesis: (a) malate transport; (b) pyruvate kinase; (c) enolase. Best-fit values, 68% confidence intervals, and 95% confidence intervals are indicated by the middle lines, the boxes, and the error bars, respectively. All flux values are normalized to an acetate uptake rate of 100. Abbreviations: growth, growth phase; lipid, lipid production phase.

2.3.5 Lipogenic NADPH source

Amino acid synthesis ceases during the LP phase because nitrogen is depleted, and the cells no longer divide. TAG synthesis therefore becomes the primary pathway that requires the reducing cofactor NADPH. In the control strain MTYL037, the estimated NADPH consumption for lipid synthesis is 37.8 mol per 100 mol of acetate (Table 2.1). Figure 2.6 indicates that the flux through the oxidative PPP is 20.6. Each turn of this pathway generates 2 molecules of NADPH, one by glucose-6-phosphate dehydrogenase and one by 6-phosphogluconate dehydrogenase. The oxidative PPP therefore generates 41.2 mol NADPH per 100 mol acetate, which is enough to fully support lipid synthesis. The small excess of NADPH (3.4 mol) might be used for other purposes in the cell that this truncated model does not capture. The engineered strain MTYL065 shows a similar trend. Its NADPH requirement for lipid synthesis is 69.4 (Table 2.1), while the amount supplied through the oxidative PPP is 75.0 (Figure 2.6). Once again, the NADPH generated is adequate for lipid synthesis, with a small excess. In both cases, the lipogenic NADPH requirement and the NADPH produced through the oxidative PPP agree to within 10%, suggesting a correlation between the two.
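The NADPH balance above is simple arithmetic, and the short sketch below makes it explicit. It assumes 2 NADPH per unit of oxidative PPP flux. The MTYL037 supply is computed from the reported flux of 20.6, and the MTYL065 value of 75.0 is taken directly as the reported NADPH supply. Requirements come from Table 2.1. The function name and layout are illustrative.

```python
# Each G6P oxidized by G6PDH and 6PGDH yields 2 NADPH.
NADPH_PER_OXPPP_FLUX = 2.0


def nadph_balance(required, supplied):
    """Return (excess NADPH, excess as a fraction of the requirement)."""
    excess = supplied - required
    return excess, excess / required


# LP-phase values in mol per 100 mol acetate: (lipogenic requirement, oxPPP supply).
CASES = {
    "MTYL037": (37.8, NADPH_PER_OXPPP_FLUX * 20.6),  # oxPPP flux of 20.6
    "MTYL065": (69.4, 75.0),  # NADPH supply as reported in the text
}

if __name__ == "__main__":
    for strain, (required, supplied) in CASES.items():
        excess, rel = nadph_balance(required, supplied)
        print(f"{strain}: supplied {supplied:.1f}, excess {excess:.1f} ({rel:.1%})")
```

Expressed relative to the requirement, the excesses come out at roughly 9% for MTYL037 and 8% for MTYL065. Both lie within the 10% agreement stated above.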

To investigate other possible sources of lipogenic NADPH, the base model for the lipid production phase was revised to include other enzymatic reactions that could potentially generate NADPH. Flux estimation was repeated for the MTYL037 and MTYL065 strains using the modified models. Two cases were analyzed: inclusion of the cytosolic malic enzyme reaction, and inclusion of the cytosolic NADP+-dependent isocitrate dehydrogenase (IDH). The fluxes through these added reactions and their NADPH-producing capacities are listed in Table 2.2. In the second case, the cytosolic and mitochondrial IDH reactions cannot be differentiated from each other, so only the upper bound for the flux through cytosolic IDH is listed.

Adding the cytosolic malic enzyme did not change the flux distribution in either strain (data not shown). The flux through this reaction is near zero, even though large glyoxylate shunt fluxes generate cytosolic malate, the enzyme's substrate. Clearly, the lipogenic NADPH cannot come from this reaction. In the second case, even if the cytosolic IDH flux reached its maximum value, the NADPH generated would account for only 94% and 36% of the lipogenic NADPH required by the MTYL037 and MTYL065 strains, respectively. Moreover, this best-case scenario is unlikely to occur in the cell. It would require eliminating all mitochondrial IDH flux, which would significantly lower energy production through the loss of NADH. Furthermore, as with the malic enzyme, incorporating cytosolic IDH did not alter the fluxes through other pathways (data not shown). Accordingly, as shown in Figure 2.9, the flux through the oxidative PPP remains invariant when either reaction is included. The NADPH required for TAG biosynthesis is therefore likely generated through this pathway.

Table 2.2. Comparison of NADPH generation in modified models

| New reaction | Flux, MTYL037 | Flux, MTYL065 | NADPH generated, MTYL037 | NADPH generated, MTYL065 |
|---|---|---|---|---|
| Cytosolic malic enzyme | 0.00099 | 0.00034 | 0.00099 | 0.00034 |
| Cytosolic IDH | <35.8 | <25.3 | <35.8 | <25.3 |

[Figure 2.9: two panels, (a) MTYL037 oxPPP and (b) MTYL065 oxPPP, showing the oxidative PPP flux estimated with the base, malic enz, and cyto IDH models.]

Figure 2.9. Oxidative PPP flux confidence intervals. The best-fit oxidative PPP flux values and corresponding confidence intervals are shown. Three different models of the lipid production phase are used to calculate the oxPPP flux for the two strains: (a) MTYL037 and (b) MTYL065. Best-fit values, 68% confidence intervals, and 95% confidence intervals are indicated by the middle lines, the boxes, and the error bars, respectively. All flux values are normalized to an acetate uptake rate of 100. Abbreviations: oxPPP, oxidative pentose phosphate pathway; base, base model; malic enz, cytosolic malic enzyme reaction included; cyto IDH, cytosolic IDH reaction included.

Further evidence that the oxidative PPP is the primary source of lipogenic NADPH comes from the G phase results. As mentioned earlier, the flux maps of the control and engineered strains are largely similar. The exceptions are the fluxes through the oxidative PPP and from AcCoA to TAGs. Figure 2.6 shows a 58% increase in the AcCoA-to-lipid flux for the MTYL065 strain compared with the MTYL037 strain. Similarly, the oxidative PPP flux is 72% higher in the engineered strain (Figure 2.6). Most other fluxes are comparable between the two strains, so the upregulation of the oxPPP is likely due to the increased lipid synthesis flux. Thus, the lipid synthesis flux and the oxidative PPP flux are correlated in every case in this study. This suggests that the oxidative PPP is the primary source of lipogenic NADPH during acetate metabolism.

2.4 Discussion

We performed 13C-MFA on a control and an engineered strain of Y. lipolytica cultured on acetate as the sole carbon source. For each strain, both the G phase (before nitrogen depletion) and the LP phase (after nitrogen depletion) were analyzed, yielding four metabolic flux maps. Parallel labeling experiments used 1-13C1 sodium acetate and 40% U-13C2 sodium acetate as separate tracers. Together they gave high-resolution fluxes for the TCA cycle and glyoxylate shunt, and good resolution for gluconeogenesis and pentose phosphate pathway fluxes.

The results show that the flux through the glyoxylate shunt is high in both strains during both phases and does not differ significantly among the four cases. As stated previously, when acetate is the primary carbon source, the glyoxylate shunt is active because it provides an avenue for anaplerosis. Gluconeogenic flux is crucial to Y. lipolytica during both fermentation phases. In the G phase, gluconeogenesis replenishes the metabolite pools of upper glycolysis and the PPP, which are constantly siphoned off to generate biomass macromolecules. In the LP phase, the flux through the oxidative PPP, which is supported by gluconeogenesis, generates the NADPH needed for lipid synthesis. However, some steps in gluconeogenesis consume ATP or NADH, so a large flux through this pathway is costly for the cells. For these reasons, the flux through gluconeogenesis is expected to be tightly regulated.

By comparing how the control and engineered strains transition from the G phase to the LP phase, this study provides evidence that regulation may occur at two points: the malate transporter and pyruvate kinase. When the cell needs gluconeogenic flux for biomass accumulation, as both strains do during the G phase, malate transporter and pyruvate kinase activities are moderate. A significant portion of the glyoxylate shunt flux is then conserved through gluconeogenesis, and only a small portion is diverted back to the TCA cycle. Upon entry into the LP phase, the engineered strain requires an even higher gluconeogenic flux to generate sufficient lipogenic NADPH through the oxidative PPP. To meet this requirement, the malate transporter is completely shut off. Pyruvate kinase activity also stays relatively low (albeit increased compared with the G phase), which conserves more of the glyoxylate shunt flux for gluconeogenesis. The increase in pyruvate kinase activity could serve to generate more energy through pyruvate dehydrogenase (PDH) and to compensate for the loss of TCA cycle flux. Regardless, the combined effects of the malate transporter and pyruvate kinase allow 63% of the glyoxylate shunt flux to flow through gluconeogenesis, compared with 48% during the G phase.

The gluconeogenic flux requirement of the control strain, on the other hand, decreases because macromolecule synthesis ceases and TAG production rates are low. Consequently, malate transport is active and pyruvate kinase activity is upregulated significantly, recycling the majority of the glyoxylate shunt flux back into the TCA cycle. The effect of these two enzymes is evident: only 26% of the glyoxylate shunt flux goes into gluconeogenesis, much lower than the 41% observed during the G phase. In sum, the glyoxylate shunt flux stays high, while changes in malate transporter and pyruvate kinase activity can rapidly tune the gluconeogenic flux over a large range. In this way, only the flux needed for macromolecule and NADPH production flows through gluconeogenesis. The excess glyoxylate shunt flux is diverted back to the TCA cycle, preferentially through pyruvate kinase and PDH, to produce more ATP.

Another interesting finding is the correlation between the lipogenic NADPH required and the NADPH synthesized through the oxidative PPP in two different strains of Y. lipolytica. As noted before, the MFA model does not include a cofactor balance, so a mismatch between NADPH generated and consumed would be entirely possible in either strain. The correlation between the two therefore comes directly from the experimental results and is not imposed by model assumptions. The same conclusion, that lipogenic NADPH comes primarily from the oxidative PPP, was reached when Y. lipolytica was cultured on glucose (Wasylenko, Ahn and Stephanopoulos, 2015). The same trend appears in both studies even though cellular metabolism differs drastically between them. This provides strong evidence that the oxidative PPP may be the only native pathway available for lipogenic NADPH production.

For acetate metabolism, diverting flux through the oxidative PPP is very costly. Overall, 2 AcCoA and 2 ATP are required to generate at most 6 NADPH through this pathway, assuming that all the carbon atoms in the 2 AcCoA are eventually converted to 4 molecules of CO2. When the lipid synthesis pathway is active and the demand for lipogenic NADPH is high, the cell must consume large quantities of AcCoA and ATP. Together with other essential reactions such as acetate activation, this places the cell in a state of high energy demand. Furthermore, lipid synthesis and NADPH production constantly drain the cytosolic AcCoA pool. This reduces the carbon available to enter the TCA cycle and thus the cell's energy-producing capacity. These issues largely hinder the microbe's ability to rapidly synthesize lipids on acetate as the sole carbon source. Engineering designs to circumvent them should focus on supplying sufficient NADPH, as well as glycolysis- and PPP-derived biomass precursors, without relying on the energy-intensive gluconeogenic pathway.
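The cost stated above can be checked with a simple ledger. The sketch below assumes one particular route: the glyoxylate shunt, an ATP-dependent PEP carboxykinase, phosphoglycerate kinase running in reverse, and then complete oxidation of the resulting half G6P in a cycling oxidative PPP. It tracks only AcCoA, ATP, NADPH and CO2. NAD(H) and FAD(H2) exchanges (e.g., malate dehydrogenase, GAPDH) and the ATP spent on acetate activation are deliberately left out. The step list is an illustrative assumption, not a model from this study.

```python
from collections import Counter

# Illustrative route for 2 AcCoA (signs: + produced, - consumed).
STEPS = [
    ("glyoxylate shunt: 2 AcCoA -> 1 net malate -> OAA", {"AcCoA": -2}),
    ("PEP carboxykinase: OAA + ATP -> PEP + CO2", {"ATP": -1, "CO2": 1}),
    ("phosphoglycerate kinase (reverse): 3PG + ATP -> 1,3BPG", {"ATP": -1}),
    ("half G6P fully oxidized by a cycling oxPPP", {"NADPH": 6, "CO2": 3}),
]


def tally(steps):
    """Sum the per-step changes; Counter.update adds values and keeps negatives."""
    total = Counter()
    for _name, delta in steps:
        total.update(delta)
    return dict(total)


if __name__ == "__main__":
    for name, delta in STEPS:
        print(f"{name}: {delta}")
    print("net:", tally(STEPS))
```

The last step uses the standard result that one G6P, fully oxidized with carbon recycled through the non-oxidative PPP, yields 12 NADPH and 6 CO2. The C3 unit derived from 2 AcCoA supplies half of a G6P. The net result of 2 AcCoA and 2 ATP consumed for 6 NADPH and 4 CO2 matches the figure quoted in the text.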

2.5 References

Ahn, W.S. & Antoniewicz, M.R., 2011. Metabolic flux analysis of CHO cells at growth and non-growth phases using isotopic tracers and mass spectrometry. Metabolic Engineering, 13(5), pp.598–609.

Antoniewicz, M.R., Kelleher, J.K. & Stephanopoulos, G., 2007. Accurate Assessment of Amino Acid Mass Isotopomer Distributions for Metabolic Flux Analysis. Analytical Chemistry, 79(19), pp.7554–7559.

Antoniewicz, M.R., Kelleher, J.K. & Stephanopoulos, G., 2006. Determination of confidence intervals of metabolic fluxes estimated from stable isotope measurements. Metabolic Engineering, 8(4), pp.324–337.

Antoniewicz, M.R., Kelleher, J.K. & Stephanopoulos, G., 2007. Elementary metabolite units (EMU): A novel framework for modeling isotopic distributions. Metabolic Engineering, 9(1), pp.68–86.

Beopoulos, A., Cescut, J., et al., 2009. Yarrowia lipolytica as a model for bio-oil production. Progress in Lipid Research, 48(6), pp.375–387.

Beopoulos, A., Chardot, T. & Nicaud, J.M., 2009. Yarrowia lipolytica: A model and a tool to understand the mechanisms implicated in lipid accumulation. Biochimie, 91(6), pp.692–696.

Beopoulos, A., Nicaud, J.M. & Gaillardin, C., 2011. An overview of lipid metabolism in yeasts and its impact on biotechnological processes. Applied Microbiology and Biotechnology, 90(4), pp.1193–1206.

Blazeck, J. et al., 2014. Harnessing Yarrowia lipolytica lipogenesis to create a platform for lipid and biofuel production. Nature Communications, 5, p.3131.

Boulton, C.A. & Ratledge, C., 1981. Correlation of Lipid Accumulation in Yeasts with Possession of ATP: Citrate Lyase. Microbiology, 127(1), pp.169–176.

Canelas, A.B. et al., 2008. Leakage-free rapid quenching technique for yeast metabolomics. Metabolomics, 4(3), pp.226–239.

Dujon, B. et al., 2004. Genome evolution in yeasts. Nature, 430(6995), pp.35–44.

Eaton, S., 2002. Control of mitochondrial β-oxidation flux. Progress in Lipid Research, 41(3), pp.197–239.

Eschrich, D., Kötter, P. & Entian, K.D., 2002. Gluconeogenesis in Candida albicans. FEMS Yeast Research, 2, pp.315–325.

Evans, C.T. & Ratledge, C., 1984. Effect of Nitrogen Source on Lipid Accumulation in Oleaginous Yeasts. Microbiology, 130(April), pp.1693–1704.

Fei, Q. et al., 2011. The effect of volatile fatty acids as a sole carbon source on lipid accumulation by Cryptococcus albidus for biodiesel production. Bioresource Technology, 102(3), pp.2695–2701.

Flores, C.L. & Gancedo, C., 2005. Yarrowia lipolytica mutants devoid of pyruvate carboxylase activity show an unusual growth phenotype. Eukaryotic Cell, 4(2), pp.356–364.

Fontanille, P. et al., 2012. Bioconversion of volatile fatty acids into lipids by the oleaginous yeast Yarrowia lipolytica. Bioresource Technology, 114, pp.443–449.

Hu, P. et al. (2016) ‘Integrated bioprocess for conversion of gaseous substrates to liquids’,

Proceedings of the National Academy of Sciences of the United States of America,

113(14), pp. 3773–3778. doi: 10.1073/pnas.1516867113.

Jardón, R., Gancedo, C. & Flores, C.L., 2008. The gluconeogenic enzyme fructose-1,6-

bisphosphatase is dispensable for growth of the yeast Yarrowia lipolytica in

gluconeogenic substrates. Eukaryotic Cell, 7(10), pp.1742–1749.

Jogl, G. & Tong, L., 2004. Crystal Structure of Yeast Acetyl-Coenzyme A Synthetase in

Complex with AMP. Biochemistry, 43(6), pp.1425–1431.

Kleijn, R.J. et al., 2005. Revisiting the13C-label distribution of the non-oxidative branch of the

pentose phosphate pathway based upon kinetic and genetic evidence. FEBS Journal,

272(19), pp.4970–4982.

Kornberg, H.L., 1966. The role and control of the glyoxylate cycle in Escherichia coli. Biochemical Journal, 99(1), pp.1–11.

Li, Q., Du, W. & Liu, D., 2008. Perspectives of microbial oils for biodiesel production. Applied Microbiology and Biotechnology, 80(5), pp.749–756.

Luévano-Martínez, L.A. et al., 2010. Identification of the mitochondrial carrier that provides Yarrowia lipolytica with a fatty acid-induced and nucleotide-sensitive uncoupling protein-like activity. Biochimica et Biophysica Acta - Bioenergetics, 1797(1), pp.81–88.

Luo, B. et al., 2007. Simultaneous determination of multiple intracellular metabolites in glycolysis, pentose phosphate pathway and tricarboxylic acid cycle by liquid chromatography-mass spectrometry. Journal of Chromatography A, 1147(2), pp.153–164.

Maaheimo, H. et al., 2001. Central carbon metabolism of Saccharomyces cerevisiae explored by biosynthetic fractional 13C labeling of common amino acids. European Journal of Biochemistry, 268(8), pp.2464–2479.

Meng, X. et al., 2009. Biodiesel production from oleaginous microorganisms. Renewable Energy, 34(1), pp.1–5.

Morgan-Sagastume, F. et al., 2011. Production of volatile fatty acids by fermentation of waste activated sludge pre-treated in full-scale thermal hydrolysis plants. Bioresource Technology, 102(3), pp.3089–3097.

Palmieri, L. et al., 1999. Identification of the yeast mitochondrial transporter for oxaloacetate and sulfate. Journal of Biological Chemistry, 274(32), pp.22184–22190.

Pan, P.C. & Hua, Q., 2012. Reconstruction and In Silico Analysis of Metabolic Network for an Oleaginous Yeast, Yarrowia lipolytica. PLoS ONE, 7(12), pp.1–11.

Papanikolaou, S. et al., 2009. Biosynthesis of lipids and organic acids by Yarrowia lipolytica strains cultivated on glucose. European Journal of Lipid Science and Technology, 111(12), pp.1221–1232.

Perea, J. & Gancedo, C., 1982. Isolation and characterization of a mutant of Saccharomyces cerevisiae defective in phosphoenolpyruvate carboxykinase. Archives of Microbiology, 132, pp.141–143.

Qiao, K. et al., 2015. Engineering lipid overproduction in the oleaginous yeast Yarrowia lipolytica. Metabolic Engineering, 29, pp.56–65.

Schmidt, K. et al., 1998. 13C tracer experiments and metabolite balancing for metabolic flux analysis: Comparing two approaches. Biotechnology and Bioengineering, 58(3), pp.254–257.

Tai, M. & Stephanopoulos, G., 2013. Engineering the push and pull of lipid biosynthesis in oleaginous yeast Yarrowia lipolytica for biofuel production. Metabolic Engineering, 15(1), pp.1–9.

Wasylenko, T.M., Ahn, W.S. & Stephanopoulos, G., 2015. The oxidative pentose phosphate pathway is the primary source of NADPH for Lipid Overproduction from Glucose in Yarrowia lipolytica. Metabolic Engineering, 30, pp.27–39.

Wasylenko, T.M. & Stephanopoulos, G., 2013. Kinetic isotope effects significantly influence intracellular metabolite 13C labeling patterns and flux determination. Biotechnology Journal, 8(9), pp.1080–1089.

Wiechert, W., 2001. 13C metabolic flux analysis. Metabolic Engineering, 3(3), pp.195–206.

Wiechert, W. & de Graaf, A.A., 1997. Bidirectional reaction steps in metabolic networks: I. Modeling and simulation of carbon isotope labeling experiments. Biotechnology and Bioengineering, 55(1), pp.101–117.

Van Winden, W.A. et al., 2005. Metabolic-flux analysis of Saccharomyces cerevisiae CEN.PK113-7D based on mass isotopomer measurements of 13C-labeled primary metabolites. FEMS Yeast Research, 5(6-7), pp.559–568.

Wittmann, C. & Heinzle, E., 1999. Mass spectrometry for metabolic flux analysis. Biotechnology and Bioengineering, 62(6), pp.739–750.

Zhang, H. et al., 2013. Regulatory properties of malic enzyme in the oleaginous yeast, Yarrowia lipolytica, and its non-involvement in lipid accumulation. Biotechnology Letters, 35(12), pp.2091–2098.


Chapter 3

Mixed Substrate Metabolism—an Alternative Approach to Genetic Modifications

This chapter is adapted from

Liu, N., Santala, S. & Stephanopoulos, G., 2020. Mixed carbon substrates: a necessary nuisance or a missed opportunity? Current Opinion in Biotechnology, 62, pp.15–21.


3.1 Introduction

Industrial biotechnology uses cells as biocatalysts to convert substrates into valuable products with high specificity. In most cases, the carbon source is the most important nutrient, providing both energy and building blocks. Currently, the use of a single carbon source, most notably sugars such as glucose, is prevalent in both laboratory and industrial settings for historical and practical reasons. With rapidly developing metabolic engineering and synthetic biology tools, microbes harboring rewired metabolism can successfully transform a single sugar into a wide variety of products. However, relying on a single substrate naturally imposes several limitations on cellular metabolism. These limitations become apparent when the product of interest requires long synthetic routes from the starting substrate, when the product has chemical properties distinct from those of the substrate, or when unfavorable substrates are used; all of these situations lead to low yield and productivity (Babel, 2009; King, Woolston and Stephanopoulos, 2017). The issues related to acetate metabolism detailed in the previous chapter serve as a compelling case that illustrates the limitations of single-substrate fermentation. Providing microbes with multiple carbon sources, on the other hand, can potentially ease these constraints, since it adds an additional degree of freedom to cellular systems that can be optimized for product formation. As such, this chapter summarizes recent developments in which mixed substrates are utilized to enhance the performance of microbes (Figure 3.1). We begin by discussing carbon catabolite repression (CCR), the most pressing issue hindering widespread adoption of mixed substrate fermentation. Then, we briefly touch upon the various methods that alleviate CCR and their importance in the efficient utilization of renewable feedstocks (Mosier et al., 2005). Finally, we focus on the primary topic of this review: situations where multiple carbon sources are provided to the cells by design for significant enhancements in productivity and yield. We summarize the benefits of mixed substrate metabolism in four distinct categories, as it can 1) better balance the various biosynthetic components towards satisfying the requirements of the product, 2) simultaneously activate multiple required metabolic pathways for improved carbon conversion, 3) provide shortcut access to key synthesis pathways, reducing the overall number of enzymatic steps for product formation, or 4) serve as inexpensive inducers for segregating growth and production phases (Figure 3.2). By highlighting recent successes in utilizing mixed substrates, we hope to illustrate the significance of employing such strategies in biotechnology and elucidate the circumstances under which this can be an effective starting point for the biosynthesis of products.

Figure 3.1. Schematic of fermentation using a mixture of carbon sources. With judiciously chosen substrate types and carefully designed methods for introducing the substrates to the cells, mixed substrate metabolism can be exploited to enhance the performance of microbes. Shown here is an example where two simultaneously utilized substrates cater to different needs of the bioprocess (i.e., cell growth and product formation), resolving the conflict between the two.

Figure 3.2. Examples of strategies where mixed substrate fermentation is employed to optimize cellular metabolism, leading to increases in carbon yield and/or productivity. (a) Well-designed mixed substrate co-utilization balances key biosynthetic components (carbon, ATP, and NAD(P)H) for improved product formation. (b) Substrate mixtures simultaneously activate multiple pathways, leading to increased yield or better consumption of substrates. (c) Using a secondary substrate in addition to a growth-supporting primary substrate reduces the overall number of enzymatic steps leading to the final product, thereby enhancing productivity. (d) Automated transition from growth to production phase by adjusting the ratio of mixed substrates leads to optimal redistribution of resources.


3.2 Challenges in mixed substrate utilization—carbon catabolite repression

Bacteria and yeasts can commonly utilize several carbon sources, but only under strict regulation. Most catabolic pathways are subject to CCR, a global regulatory system found in nearly all heterotrophic hosts (Görke and Stülke, 2008; Simpson-Lavy and Kupiec, 2019). While CCR facilitates optimal growth in complex environments, it poses a great challenge for the efficient utilization of multiple carbon substrates in biotechnological applications. Because of CCR, the uptake of a secondary carbon source is inhibited in the presence of a preferred substrate, forcing sequential utilization and prolonging fermentation time. While some designs exploit this phenomenon to control gene expression, CCR is generally undesirable in fermentation systems containing multiple substrates.

The phosphotransferase system (PTS) and the cAMP-CRP complex are the best-studied mechanisms that lead to CCR. They play major roles in the selective transport and transcriptional regulation of catabolic genes in Escherichia coli, giving rise to either diauxie or co-utilization depending on the substrate types (Görke and Stülke, 2008; X. Wang et al., 2019). Distinct CCR mechanisms producing similar effects have also been discovered in other model organisms, such as Bacillus subtilis (Fujita, 2009) and Saccharomyces cerevisiae (Gancedo, 1998; Simpson-Lavy and Kupiec, 2019). However, despite numerous efforts to study CCR, the interaction between two or more substrates is still not well understood in many organisms. In Yarrowia lipolytica, for example, glucose presumably inhibits xylose uptake through regulation of sugar transporters rather than gene transcription (Lazar et al., 2017), although strains adapted for xylose utilization do not exhibit this effect (Ryu, Hipp and Trinh, 2016). CCR can also be triggered by carbon sources other than glucose. For instance, in Pseudomonas putida and Acinetobacter baylyi, succinate primarily inhibits the pathways responsible for hydrocarbon and aromatic compound degradation (Rojo, 2010). Overall, although CCR is widespread among microorganisms, the substrate pairs causing it and the underlying mechanisms need to be analyzed on a case-by-case basis, making it difficult to develop a universal strategy to overcome this undesirable effect. A better fundamental understanding is still required to enable smarter designs that facilitate multiple substrate utilization.

3.3 Efforts to enable substrate co-utilization for better conversion of renewable feedstocks

Although CCR is still not well understood in many cases, efforts have been made to alleviate its effects and allow simultaneous consumption of multiple carbon sources. Methods employed for substrate co-utilization include engineering of sugar transporters (Farwick et al., 2014), adaptive evolution (Papapetridis et al., 2018), targeted genome engineering (Kim et al., 2015), and introduction of non-native transporters or catabolic pathways that are not subject to CCR (Young, Lee and Alper, 2010). More recently, engineered consortia with different population distributions have been applied to consume designated substrate mixtures, which can be a promising alternative approach (Flores et al., 2019; L. Wang et al., 2019). Strategies for circumventing CCR-related effects are especially important for inexpensive and renewable feedstocks containing mixtures of organic carbon sources, such as lignocellulosic hydrolysates and municipal waste streams. For such feedstocks, separating the individual substrates is costly and impractical, so the efficient utilization of substrate mixtures becomes a necessity that requires additional considerations and engineering efforts. Many detailed reviews in the literature cover this topic extensively (Lee et al., 2014; Ledesma-Amaro and Nicaud, 2016; Gao, Ploessl and Shao, 2019; Li, Chen and Nielsen, 2019).

Nonetheless, when conducted properly, the switch from single substrates to mixed substrates can benefit the cells in many different ways. As such, the remainder of this review focuses on the intentional application of mixed substrate metabolism, with an emphasis on how it improves product biosynthesis.

3.4 Substrate co-utilization better balances biosynthetic components

We begin the discussion by illustrating how co-utilizing multiple carbon sources can balance key biosynthetic components. Generally, product synthesis in cells is a complex process involving the coordination of various types of biological building blocks. Due to the nature of metabolism, these are often generated with varying degrees of efficiency. Thus, it can be difficult for cells to simultaneously satisfy all demands at the appropriate ratios using only a single carbon substrate.

For instance, an interesting idea formulated by Babel describes how different substrates generate distinct carbon-to-energy ratios, which rarely match the requirements of the product (Babel, 2009). Therefore, in single-substrate bioconversions, the surplus component, either carbon or energy, is wasted, leading to suboptimal yield. This concept establishes that, for many products, glucose is an energy-deficient sole substrate (i.e., its carbon-to-energy ratio is too high).

As such, an additional energy-rich auxiliary substrate with a lower carbon-to-energy ratio can be fed to the cells together with glucose to compensate for the energetic deficiency of glucose-to-product conversions (Babel, 2009). Organic C1 substrates are generally considered convenient energy donors and can be readily supplied to organisms. In particular, formate, formaldehyde, and methanol have all been shown to improve the yields of various products when co-utilized with sugars, owing to a better-balanced carbon-to-energy ratio (Geertman, Van Dijken and Pronk, 2006; van der Krogt et al., 2007; Baerends et al., 2008; Koopman, De Winde and Ruijssenaars, 2009; Witthoff et al., 2015). However, engineering heterologous C1 pathways can be challenging in several industrial hosts, and the toxicity of these compounds can be problematic as well (Baerends et al., 2008). Other carbon sources have therefore also been explored as energy-rich auxiliary substrates used in conjunction with glucose, resulting in improvements in titer and yield compared to glucose-only cultures (Wang et al., 2013; Zeng et al., 2019).
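The carbon-to-energy argument can be made concrete with a simple electron balance. The sketch below is our own illustration, not Babel's formalism: it uses the degree of reduction (electrons released on complete oxidation to CO2 and H2O, 4C + H − 2O for nitrogen-free compounds) as a stand-in for "energy", takes palmitic acid as a hypothetical example of a highly reduced, lipid-like product, and assumes the auxiliary donor (formic acid or methanol) is oxidized entirely to CO2 without contributing carbon. The bounds ignore ATP, maintenance, and pathway CO2 losses, so real yields are lower.

```python
from fractions import Fraction

# Elemental composition (C, H, O); all compounds are nitrogen-free.
GLUCOSE = (6, 12, 6)          # C6H12O6, primary substrate
FORMIC_ACID = (1, 2, 2)       # CH2O2, auxiliary electron donor
METHANOL = (1, 4, 1)          # CH4O, auxiliary electron donor
PALMITIC_ACID = (16, 32, 2)   # C16H32O2, hypothetical reduced product


def degree_of_reduction(c: int, h: int, o: int) -> int:
    """Electrons released on complete oxidation to CO2 and H2O: 4C + H - 2O."""
    return 4 * c + h - 2 * o


def gamma_per_carbon(formula: tuple[int, int, int]) -> Fraction:
    """Degree of reduction normalized per carbon atom (per C-mol)."""
    c, h, o = formula
    return Fraction(degree_of_reduction(c, h, o), c)


def max_carbon_yield(substrate, product) -> Fraction:
    """Upper bound on C-mol product per C-mol substrate when the substrate is
    the only electron source (electron balance), capped at 1 (carbon balance)."""
    return min(Fraction(1), gamma_per_carbon(substrate) / gamma_per_carbon(product))


def donor_needed(substrate, product, donor) -> Fraction:
    """Moles of auxiliary donor (fully oxidized to CO2) needed per mole of
    substrate so that every substrate carbon could end up in the product."""
    electrons_required = substrate[0] * gamma_per_carbon(product)
    deficit = electrons_required - degree_of_reduction(*substrate)
    return max(Fraction(0), deficit / degree_of_reduction(*donor))


if __name__ == "__main__":
    print(max_carbon_yield(GLUCOSE, PALMITIC_ACID))           # 16/23, about 0.70
    print(donor_needed(GLUCOSE, PALMITIC_ACID, FORMIC_ACID))  # 21/4
    print(donor_needed(GLUCOSE, PALMITIC_ACID, METHANOL))     # 7/4
```

Under these assumptions glucose alone can route at most about 70% of its carbon into such a reduced product, and the electron gap shrinks as a more reduced donor (methanol versus formate) is used, which is the qualitative sense in which glucose is an "energy-deficient" sole substrate.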

Taking the auxiliary substrate concept one step further, a recent publication argued that a balanced supply of three principal biosynthetic components, namely carbon, ATP, and reducing equivalents, is required to maximize the performance of complex bioproduct synthesis (Park et al., 2019). The authors illustrated this in systems involving autotrophic CO2 fixation by Moorella thermoacetica, which has surpluses of carbon and reducing electrons but is severely limited by ATP. Similarly, the authors noted that in acetate-driven lipogenesis by Y. lipolytica, carbon and ATP are generated in excess, whereas reducing equivalents are insufficient. To address these challenges, “dopant substrates” providing facile access to the otherwise limiting biosynthetic components were cofed to the cells along with the primary substrates. Minimal feeding of the “dopant substrate” (< 5% of the total carbon source) accelerated productivity by more than 2-fold. Importantly, under controlled co-feeding conditions, 13C tracing revealed that the two substrates interacted synergistically, complementing each other in their respective roles in generating the three biosynthetic components. As a result, the measured productivity exceeded the sum of the productivities obtained with the individual substrates, a substantial improvement over previous mixed substrate studies (Park et al., 2019).


3.5 Substrate mixtures simultaneously activate multiple critical metabolic pathways

In addition to balancing biosynthetic components, substrate mixtures can simultaneously activate pathways that are isolated from each other in the metabolic network. Doing so enables a flux distribution distinct from that of single-substrate metabolism, which can benefit the cells in various ways. Mixotrophic fermentation, in which organic (e.g., glucose) and inorganic (e.g., CO2) carbon sources are co-utilized by CO2-assimilating cells, best illustrates this idea (Fast et al., 2015; Gonzales, Matson and Atsumi, 2019). Under these conditions, glycolysis and carbon fixation pathways operate concurrently, with the latter re-assimilating the CO2 evolved by the former, thus allowing for a near 100% carbon yield to the key metabolic intermediate acetyl-CoA (Ragsdale and Pierce, 2008). This has been applied to address the carbon conversion challenge of traditional heterotrophic sugar-based cultures, in which decarboxylation of pyruvate caps the carbon yield of acetyl-CoA at 67% (two of every three glucose carbons). Jones et al. applied this concept by engineering Clostridium ljungdahlii to produce acetone under fructose and CO2 mixotrophic conditions (Jones et al., 2016). In their experiments, re-fixation of CO2 allowed a product yield higher than the theoretical maximum heterotrophic yield. More recent developments have demonstrated the utility of mixotrophy for producing other complex products such as mevalonate and isoprene (Diner et al., 2018). Mixotrophy can also be applied to various non-model acetogens (Jones et al., 2016; Maru et al., 2018) as well as photosynthetic bacteria (Kanno, Carroll and Atsumi, 2017), each with unique strengths in forming specific products. However, using substrate mixtures to activate key metabolic pathways is not limited to mixotrophy (Meyer et al., 2018; Woolston et al., 2018). Uranukul et al., for instance, demonstrated that supplementing limited quantities of glucose to xylose-based S. cerevisiae cultures allows for efficient operation of both glycolysis and the monoethylene glycol (MEG) synthesis pathway. This helps maintain cell viability while allowing exclusive diversion of the xylose assimilation flux into MEG formation, resulting in 4 g/L of MEG, a nearly 50% increase compared to xylose-only cultures (Uranukul et al., 2019).
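The 67% versus near-100% acetyl-CoA carbon yields follow from textbook stoichiometry, which the short sketch below tallies. It assumes the acetogen case: glycolysis yields 2 NADH and pyruvate:ferredoxin oxidoreductase yields 2 reduced ferredoxin per glucose (8 electrons in total), and the Wood–Ljungdahl pathway consumes 2 CO2 plus 8 electrons per additional acetyl-CoA. ATP balance, biomass formation, and any exogenous H2 are ignored.

```python
from fractions import Fraction

GLUCOSE_CARBONS = 6
ACETYL_CARBONS = 2          # carbons in the acetyl group of acetyl-CoA

# Glycolysis + pyruvate decarboxylation, per glucose (assumed stoichiometry):
#   glucose -> 2 acetyl-CoA + 2 CO2 + 8 e-  (2 NADH + 2 reduced ferredoxin)
ACETYL_COA_FROM_GLYCOLYSIS = 2
CO2_RELEASED = 2
ELECTRONS_RELEASED = 8

# Wood-Ljungdahl pathway: 2 CO2 + 8 e- -> 1 acetyl-CoA
CO2_PER_WL_ACETYL_COA = 2
ELECTRONS_PER_WL_ACETYL_COA = 8


def acetyl_coa_carbon_yield(refix_co2: bool) -> Fraction:
    """C-mol acetyl-CoA per C-mol glucose, with or without CO2 re-fixation."""
    acetyl_coa = ACETYL_COA_FROM_GLYCOLYSIS
    if refix_co2:
        # Re-fixation is limited by whichever runs out first: CO2 or electrons.
        extra = min(CO2_RELEASED // CO2_PER_WL_ACETYL_COA,
                    ELECTRONS_RELEASED // ELECTRONS_PER_WL_ACETYL_COA)
        acetyl_coa += extra
    return Fraction(acetyl_coa * ACETYL_CARBONS, GLUCOSE_CARBONS)


if __name__ == "__main__":
    print(acetyl_coa_carbon_yield(refix_co2=False))  # 2/3 (heterotrophic)
    print(acetyl_coa_carbon_yield(refix_co2=True))   # 1 (mixotrophic)
```

The electrons released by sugar oxidation are exactly enough to re-fix the CO2 released by decarboxylation, which is why the mixotrophic ceiling reaches all six glucose carbons.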

3.6 Mixed substrate metabolism provides shortcut access to key synthesis pathways

Another area in which mixed substrate fermentation has achieved considerable success is the production of natural products, such as terpenes and polyketides. These compounds are synthesized through long and complex pathways that are heavily regulated by the cells, leading to many bottlenecks when sugars are used as the sole carbon source (King, Woolston and Stephanopoulos, 2017). A strategy to circumvent this issue is to provide another substrate, in addition to sugars, that can be more readily converted to the final product in fewer steps (Billingsley et al., 2017). Under these conditions, a sugar substrate is still required to sustain cell growth, while the added secondary substrate can bypass decarboxylation steps (Kaneko, Ishii and Kirimura, 2011; Xie et al., 2014), the need to express heterologous pathways (Luo et al., 2019), or native regulation and cofactor requirements (Chatzivasileiou et al., 2018; Lund, Hall and Williams, 2019), boosting flux and carbon conversion towards product formation. To illustrate, recent studies have engineered an alternative two-step synthetic pathway that efficiently phosphorylates isopentenols into isopentenyl pyrophosphate (IPP), the common precursor of nearly all isoprenoids (Chatzivasileiou et al., 2018; Lund, Hall and Williams, 2019). This considerably reduces the number of enzymatic steps compared to using glucose as the carbon source. As a result, when fermentations were carried out with a mixture of glucose and an isopentenol, flux through the isoprenoid synthesis pathway was elevated significantly, supporting rapid accumulation of lycopene and other isoprenoids (Chatzivasileiou et al., 2018; Lund, Hall and Williams, 2019). Meanwhile, cell growth on glucose remained mostly unhindered, as the engineered route is completely orthogonal to central carbon metabolism (Chatzivasileiou et al., 2018).

Expanding further on this concept, researchers have also used a secondary substrate as a precursor analog to incorporate functional groups not found in nature into a final product. Typically, introducing non-native functional groups into metabolites relies on the promiscuity of every enzyme in the pathway, so the amount of functionalized product decreases dramatically as the number of synthesis steps grows. By employing functionalized precursor substrates that lie fewer steps from the final product, the chances of obtaining new products can be improved significantly, with the added benefit of precise control over the location and extent of functionalization.

However, a mixture of substrates is required in this case, since the functionalized secondary substrate generally does not support growth. Li et al. successfully demonstrated this strategy, achieving alkaloid halogenation at the target position by feeding halogenated tyrosine derivatives to glucose-based cultures (Li et al., 2018). Other studies have also employed this idea for the microbial synthesis of new compounds with antibiotic functions (Harvey et al., 2012; Zhang et al., 2016). Overall, the flexibility in chemical transformations enabled by such a mixed substrate approach can greatly expand the properties of various natural products, aiding the discovery of novel applications.
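Why shorter routes from a functionalized precursor matter can be seen with a deliberately crude geometric model. The sketch assumes, hypothetically, that every enzyme passes on the non-native analog with the same relative efficiency and that steps act independently; neither the acceptance value nor the step counts below come from the cited studies, and real pathways involve competition with native substrates and enzyme-specific kinetics.

```python
def functionalized_fraction(acceptance: float, steps: int) -> float:
    """Fraction of pathway flux still carrying the non-native group after
    `steps` enzymatic steps, if each enzyme passes on the analog with the same
    relative efficiency `acceptance` (0-1), independently of the others."""
    if not 0.0 <= acceptance <= 1.0:
        raise ValueError("acceptance must lie between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return acceptance ** steps


if __name__ == "__main__":
    # Hypothetical comparison: the group enters early (10 downstream steps)
    # versus via a functionalized precursor fed 3 steps from the product.
    for steps in (10, 3):
        print(steps, functionalized_fraction(0.5, steps))
```

Because the losses compound multiplicatively, trimming the number of steps that must tolerate the analog raises the functionalized fraction far more than improving any single enzyme would.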

3.7 Multi-substrate enabled metabolic control enhances strain performance

Finally, engineering cells to segregate native (e.g., growth) and non-native (e.g., product overaccumulation) functions in response to the ratio of various substrates present in the medium represents the last class of mixed substrate fermentation applications that we discuss (Tan and Prather, 2017; Xu, 2018). This is typically achieved by coupling gene expression to regulatory elements responsive to specific substrates, such that the substrates themselves serve as inducers for the transition from growth to production phase. Such a transition can facilitate optimal resource allocation within the cells, as enzyme expression can be tailored to biomass production during the initial growth phase and to product accumulation during the later production phase (Studier, 2005; Williams et al., 2015). Using substrate mixtures to achieve this goal also offers some key advantages. Most notably, carbon substrates (sugars, organic acids, etc.) are much cheaper and less toxic than dedicated chemical inducers and should therefore be more attractive for large-scale industrial fermentations. The timing of the transition between the two phases can also be automated and controlled by adjusting the initial concentrations of the different carbon sources in the mixture.

Here we highlight two recent publications that illustrate how this concept improves product accumulation. The first example involves A. baylyi ADP1 engineered to decouple cell growth and wax ester synthesis using two common constituents of lignocellulose hydrolysates, acetate and arabinose (Santala, Efimova and Santala, 2018). The shift from growth mode to synthesis mode was achieved by placing a growth-essential gene under an arabinose-inducible promoter. Initially, when both substrates were present, the cells could replicate. However, as arabinose was progressively consumed during fermentation, cell growth gradually slowed and eventually ceased once arabinose was depleted. Excess acetate was then channeled through the wax ester synthesis pathway towards the final product. Tuning the initial ratio of arabinose to acetate adjusted the relative lengths of the growth and production phases. The optimal strain and bioprocess conditions gave rise to accumulation of nearly 20% of dry cell weight as wax esters, a 4-fold improvement compared to wild-type cells.
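The trade-off set by the initial arabinose-to-acetate ratio can be explored with a toy model. This is a minimal sketch under loudly stated assumptions: every kinetic parameter in the hypothetical `Params` class is invented for illustration and not taken from Santala et al.; the genetic switch is idealized so that growth stops the instant arabinose is exhausted; and acetate consumed during growth is simply removed rather than tracked into biomass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Params:
    """Hypothetical values, chosen only to illustrate the qualitative trend."""
    mu: float = 0.3             # 1/h, growth rate while arabinose is present
    q_arabinose: float = 0.2    # g arabinose per g cells per h
    q_acetate: float = 0.5      # g acetate per g cells per h
    product_yield: float = 0.3  # g product per g acetate after growth stops
    dt: float = 0.01            # h, explicit Euler time step


def simulate(arabinose0: float, acetate0: float, biomass0: float = 0.1,
             p: Params = Params(), t_max: float = 500.0):
    """Return (growth_phase_h, total_h, final_biomass, product) in g/L and h."""
    x, ara, ace, product, t = biomass0, arabinose0, acetate0, 0.0, 0.0
    growth_end = None
    while ace > 0.0 and t < t_max:
        used_ace = min(ace, p.q_acetate * x * p.dt)
        ace -= used_ace
        if ara > 0.0:
            # Growth mode: the growth-essential gene is expressed only while
            # arabinose remains, so cells keep dividing.
            ara -= min(ara, p.q_arabinose * x * p.dt)
            x += p.mu * x * p.dt
        else:
            # Production mode: growth has stopped; acetate goes to product.
            if growth_end is None:
                growth_end = t
            product += p.product_yield * used_ace
        t += p.dt
    if growth_end is None:  # acetate ran out before arabinose did
        growth_end = t
    return growth_end, t, x, product


if __name__ == "__main__":
    for ara0 in (1.0, 2.0):
        g, total, x, prod = simulate(arabinose0=ara0, acetate0=20.0)
        print(f"arabinose {ara0} g/L: growth {g:.1f} h, total {total:.1f} h, "
              f"product {prod:.2f} g/L, productivity {prod / total:.3f} g/L/h")
```

With these made-up parameters, more initial arabinose lengthens the growth phase and builds more catalyst, which shortens the run but leaves less acetate for product; the optimum depends on whether titer or productivity matters more, which is the kind of balance the initial substrate ratio is used to tune.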


Similarly, Lo et al. developed a system for the automated transition from cell growth to bioproduct synthesis through the sequential utilization of substrate pairs (Lo et al., 2016). With their approach, product biosynthesis was strategically delayed until sufficient biomass had accumulated. The results showed lowered metabolic stress, as well as increases in both the growth rate during the growth phase and the productivity during the production phase, illustrating the utility of this strategy.

3.8 Conclusions and future outlook

The synthesis of bioproducts from a single carbon source has historically been the preferred mode of operation. By contrast, using multiple substrates for cell growth and product formation adds a layer of complexity to the system, which can be discouraging. Nevertheless, mixed substrate fermentation introduces an additional metabolic degree of freedom that facilitates enhancements in yield and productivity. This is achieved by judiciously choosing co-substrates that work in conjunction to either optimally balance biosynthetic components, flux distributions, and cellular resources, or efficiently supply rate-limiting components in fewer enzymatic steps. Many studies discussed in this review have demonstrated that multiple substrates can achieve an effect similar to metabolic rewiring without the need for extensive genetic engineering. This is essential for improving the performance of unconventional organisms that lack the necessary genetic tools. Even in model organisms, utilizing more than one substrate can provide a shortcut to the desired goal. Overall, metabolic optimizations enabled by interactions among mixed carbon substrates have been successfully employed in cultures of different organisms under various conditions. As this concept attracts increasing attention from researchers in the field, we anticipate that the use of mixed substrates, in combination with current metabolic engineering methods, can open opportunities to discover novel designs that further improve microbial performance beyond state-of-the-art systems.

3.9 References

Babel, W. (2009) ‘The Auxiliary Substrate Concept: From simple considerations to heuristically valuable knowledge’, Engineering in Life Sciences, 9(4), pp. 285–290. doi: 10.1002/elsc.200900027.

Baerends, R. J. S. et al. (2008) ‘Engineering and analysis of a Saccharomyces cerevisiae strain that uses formaldehyde as an auxiliary substrate’, Applied and Environmental Microbiology, 74(10), pp. 3182–3188. doi: 10.1128/AEM.02858-07.

Billingsley, J. M. et al. (2017) ‘Engineering the biocatalytic selectivity of iridoid production in Saccharomyces cerevisiae’, Metabolic Engineering, 44, pp. 117–125. doi: 10.1016/j.ymben.2017.09.006.

Chatzivasileiou, A. O. et al. (2018) ‘Two-step pathway for isoprenoid synthesis’, Proceedings of the National Academy of Sciences, 116(2), pp. 506–511. doi: 10.1073/pnas.1812935116.

Diner, B. A. et al. (2018) ‘Synthesis of heterologous pathway enzymes in Clostridium ljungdahlii for the conversion of fructose and of syngas to mevalonate and isoprene’, Applied and Environmental Microbiology, 84(1), pp. 1–16. doi: 10.1128/AEM.01723-17.

Farwick, A. et al. (2014) ‘Engineering of yeast hexose transporters to transport D-xylose without inhibition by D-glucose’, Proceedings of the National Academy of Sciences, 111(14), pp. 5159–5164. doi: 10.1073/pnas.1323464111.

Fast, A. G. et al. (2015) ‘Acetogenic mixotrophy: novel options for yield improvement in biofuels and biochemicals production’, Current Opinion in Biotechnology, 33, pp. 60–72. doi: 10.1016/j.copbio.2014.11.014.

Flores, A. D. et al. (2019) ‘Engineering a Synthetic, Catabolically-Orthogonal Co-Culture System for Enhanced Conversion of Lignocellulose-Derived Sugars to Ethanol’, ACS Synthetic Biology, 8, pp. 1089–1099. doi: 10.1021/acssynbio.9b00007.

Fujita, Y. (2009) ‘Carbon Catabolite Control of the Metabolic Network in Bacillus subtilis’, Bioscience, Biotechnology, and Biochemistry, 73(2), pp. 245–259. doi: 10.1271/bbb.80479.

Gancedo, J. M. (1998) ‘Yeast carbon catabolite repression’, Microbiology and Molecular Biology Reviews, 62(2), pp. 334–361.

Gao, M., Ploessl, D. and Shao, Z. (2019) ‘Enhancing the co-utilization of biomass-derived mixed sugars by yeasts’, Frontiers in Microbiology, 10, pp. 1–21. doi: 10.3389/fmicb.2018.03264.

Geertman, J. M. A., Van Dijken, J. P. and Pronk, J. T. (2006) ‘Engineering NADH metabolism in Saccharomyces cerevisiae: Formate as an electron donor for glycerol production by anaerobic, glucose-limited chemostat cultures’, FEMS Yeast Research, 6(8), pp. 1193–1203. doi: 10.1111/j.1567-1364.2006.00124.x.

Gonzales, J. N., Matson, M. M. and Atsumi, S. (2019) ‘Nonphotosynthetic Biological CO2 Reduction’, Biochemistry, 58(11), pp. 1470–1477. doi: 10.1021/acs.biochem.8b00937.

Görke, B. and Stülke, J. (2008) ‘Carbon catabolite repression in bacteria: Many ways to make the most out of nutrients’, Nature Reviews Microbiology, 6(8), pp. 613–624. doi: 10.1038/nrmicro1932.

Harvey, C. J. B. et al. (2012) ‘Precursor directed biosynthesis of an orthogonally functional erythromycin analogue: Selectivity in the ribosome macrolide binding pocket’, Journal of the American Chemical Society, 134(29), pp. 12259–12265. doi: 10.1021/ja304682q.

Jones, S. W. et al. (2016) ‘CO2 fixation by anaerobic non-photosynthetic mixotrophy for improved carbon conversion’, Nature Communications, 7, pp. 1–9. doi: 10.1038/ncomms12800.

Kaneko, A., Ishii, Y. and Kirimura, K. (2011) ‘High-yield Production of cis,cis-Muconic Acid from Catechol in Aqueous Solution by Biocatalyst’, Chemistry Letters, 40(4), pp. 381–383. doi: 10.1246/cl.2011.381.

Kanno, M., Carroll, A. L. and Atsumi, S. (2017) ‘Global metabolic rewiring for improved CO2 fixation and chemical production in cyanobacteria’, Nature Communications, 8, pp. 1–11. doi: 10.1038/ncomms14724.

Kim, S. M. et al. (2015) ‘Simultaneous utilization of glucose and xylose via novel mechanisms

in engineered Escherichia coli’, Metabolic Engineering. 30, pp. 141–148. doi:

10.1016/j.ymben.2015.05.002.

King, J. R., Woolston, B. M. and Stephanopoulos, G. (2017) ‘Designing a New Entry Point into

Isoprenoid Metabolism by Exploiting Fructose-6-Phosphate Aldolase Side Reactivity of

Escherichia coli’, ACS Synthetic Biology, 6(7), pp. 1416–1426. doi:

10.1021/acssynbio.7b00072.

Koopman, F. W., De Winde, J. H. and Ruijssenaars, H. J. (2009) ‘C1 compounds as auxiliary substrate for engineered Pseudomonas putida S12’, Applied Microbiology and Biotechnology, 83(4), pp. 705–713. doi: 10.1007/s00253-009-1922-y.

van der Krogt, Z. A. et al. (2007) ‘Formate as an Auxiliary Substrate for Glucose-Limited Cultivation of Penicillium chrysogenum: Impact on Penicillin G Production and Biomass Yield’, Applied and Environmental Microbiology, 73(15), pp. 5020–5025. doi: 10.1128/aem.00093-07.

Lazar, Z. et al. (2017) ‘Characterization of hexose transporters in Yarrowia lipolytica reveals new groups of Sugar Porters involved in yeast growth’, Fungal Genetics and Biology, 100, pp. 1–12. doi: 10.1016/j.fgb.2017.01.001.

Ledesma-Amaro, R. and Nicaud, J. M. (2016) ‘Metabolic Engineering for Expanding the Substrate Range of Yarrowia lipolytica’, Trends in Biotechnology, 34(10), pp. 798–809. doi: 10.1016/j.tibtech.2016.04.010.

Lee, W. S. et al. (2014) ‘A review of the production and applications of waste-derived volatile fatty acids’, Chemical Engineering Journal, 235, pp. 83–99. doi: 10.1016/j.cej.2013.09.002.

Li, X., Chen, Y. and Nielsen, J. (2019) ‘Harnessing xylose pathways for biofuels production’, Current Opinion in Biotechnology, 57(11), pp. 56–65. doi: 10.1016/j.copbio.2019.01.006.

Li, Y. et al. (2018) ‘Complete biosynthesis of noscapine and halogenated alkaloids in yeast’, Proceedings of the National Academy of Sciences, 115(17), pp. E3922–E3931. doi: 10.1073/pnas.1721469115.

Lo, T. M. et al. (2016) ‘A Two-Layer Gene Circuit for Decoupling Cell Growth from Metabolite Production’, Cell Systems, 3(2), pp. 133–143. doi: 10.1016/j.cels.2016.07.012.

Lund, S., Hall, R. and Williams, G. J. (2019) ‘An Artificial Pathway for Isoprenoid Biosynthesis Decoupled from Native Hemiterpene Metabolism’, ACS Synthetic Biology, 8, pp. 232–238. doi: 10.1021/acssynbio.8b00383.

Luo, X. et al. (2019) ‘Complete biosynthesis of cannabinoids and their unnatural analogues in yeast’, Nature, 567(7746), pp. 123–126. doi: 10.1038/s41586-019-0978-9.

Maru, B. T. et al. (2018) ‘Fixation of CO2 and CO on a diverse range of carbohydrates using anaerobic, non-photosynthetic mixotrophy’, FEMS Microbiology Letters, 365(8), pp. 1–8. doi: 10.1093/femsle/fny039.

Meyer, F. et al. (2018) ‘Methanol-essential growth of Escherichia coli’, Nature Communications, 9(1). doi: 10.1038/s41467-018-03937-y.

Mosier, N. et al. (2005) ‘Features of promising technologies for pretreatment of lignocellulosic biomass’, Bioresource Technology, 96(6), pp. 673–686. doi: 10.1016/j.biortech.2004.06.025.

Papapetridis, I. et al. (2018) ‘Laboratory evolution for forced glucose-xylose co-consumption enables identification of mutations that improve mixed-sugar fermentation by xylose-fermenting Saccharomyces cerevisiae’, FEMS Yeast Research, 18(6), pp. 1–17. doi: 10.1093/femsyr/foy056.

Park, J. O. et al. (2019) ‘Synergistic substrate cofeeding stimulates reductive metabolism’, Nature Metabolism, 1(6), pp. 643–651. doi: 10.1038/s42255-019-0077-0.

Ragsdale, S. W. and Pierce, E. (2008) ‘Acetogenesis and the Wood-Ljungdahl pathway of CO2 fixation’, Biochimica et Biophysica Acta - Proteins and Proteomics, 1784(12), pp. 1873–1898. doi: 10.1016/j.bbapap.2008.08.012.

Rojo, F. (2010) ‘Carbon catabolite repression in Pseudomonas: Optimizing metabolic versatility and interactions with the environment’, FEMS Microbiology Reviews, 34(5), pp. 658–684. doi: 10.1111/j.1574-6976.2010.00218.x.

Ryu, S., Hipp, J. and Trinh, C. T. (2016) ‘Activating and Elucidating Metabolism of Complex Sugars in Yarrowia lipolytica’, Applied and Environmental Microbiology, 82(4), pp. 475–480. doi: 10.1128/AEM.03582-15.

Santala, S., Efimova, E. and Santala, V. (2018) ‘Dynamic decoupling of biomass and wax ester biosynthesis in Acinetobacter baylyi by an autonomously regulated switch’, Metabolic Engineering Communications, 7(June). doi: 10.1016/j.mec.2018.e00078.

Simpson-Lavy, K. and Kupiec, M. (2019) ‘Carbon Catabolite Repression in Yeast is Not Limited to Glucose’, Scientific Reports, 9(1), pp. 1–10. doi: 10.1038/s41598-019-43032-w.

Studier, F. W. (2005) ‘Protein production by auto-induction in high density shaking cultures’, Protein Expression and Purification, 41(1), pp. 207–234. doi: 10.1016/j.pep.2005.01.016.

Tan, S. Z. and Prather, K. L. (2017) ‘Dynamic pathway regulation: recent advances and methods of construction’, Current Opinion in Chemical Biology, 41, pp. 28–35. doi: 10.1016/j.cbpa.2017.10.004.

Uranukul, B. et al. (2019) ‘Biosynthesis of monoethylene glycol in Saccharomyces cerevisiae utilizing native glycolytic enzymes’, Metabolic Engineering, 51, pp. 20–31. doi: 10.1016/j.ymben.2018.09.012.

Wang, L. et al. (2019) ‘Simultaneous fermentation of biomass-derived sugars to ethanol by a co-culture of an engineered Escherichia coli and Saccharomyces cerevisiae’, Bioresource Technology, 273, pp. 269–276. doi: 10.1016/j.biortech.2018.11.016.

Wang, X. et al. (2019) ‘Growth strategy of microbes on mixed carbon sources’, Nature Communications, 10(1), pp. 1–7. doi: 10.1038/s41467-019-09261-3.

Wang, Y. et al. (2013) ‘Improved co-production of S-adenosylmethionine and glutathione using citrate as an auxiliary energy substrate’, Bioresource Technology, 131, pp. 28–32. doi: 10.1016/j.biortech.2012.10.168.

Williams, T. C. et al. (2015) ‘Dynamic regulation of gene expression using sucrose responsive promoters and RNA interference in Saccharomyces cerevisiae’, Microbial Cell Factories, 14(1), pp. 1–10. doi: 10.1186/s12934-015-0223-7.

Witthoff, S. et al. (2015) ‘Metabolic Engineering of Corynebacterium glutamicum for Methanol Metabolism’, Applied and Environmental Microbiology, 81(6), pp. 2215–2225. doi: 10.1128/aem.03110-14.

Woolston, B. M. et al. (2018) ‘Improving formaldehyde consumption drives methanol assimilation in engineered E. coli’, Nature Communications, 9(1). doi: 10.1038/s41467-018-04795-4.

Xie, N. Z. et al. (2014) ‘Biotechnological production of muconic acid: Current status and future prospects’, Biotechnology Advances, 32(3), pp. 615–622. doi: 10.1016/j.biotechadv.2014.04.001.

Xu, P. (2018) ‘Production of chemicals using dynamic control of metabolic fluxes’, Current Opinion in Biotechnology, 53, pp. 12–19. doi: 10.1016/j.copbio.2017.10.009.

Young, E., Lee, S. M. and Alper, H. (2010) ‘Optimizing pentose utilization in yeast: The need for novel tools and approaches’, Biotechnology for Biofuels, 3, pp. 1–12. doi: 10.1186/1754-6834-3-24.

Zeng, X. et al. (2019) ‘Production of natamycin by Streptomyces gilvosporeus Z28 through solid-state fermentation using agro-industrial residues’, Bioresource Technology, 273, pp. 377–385. doi: 10.1016/j.biortech.2018.11.009.

Zhang, N. et al. (2016) ‘Precursor-directed biosynthesis of new sansanmycin analogs bearing para-substituted-phenylalanines with high yields’, Journal of Antibiotics, 69(10), pp. 765–768. doi: 10.1038/ja.2016.2.

Chapter 4

Synergistic Substrate Cofeeding Stimulates Reductive Metabolism

This chapter is adapted from

Park, J.O.*, Liu, N.*, Holinski, K.M., Emerson, D.F., Qiao, K., Woolston, B.M., Xu, J., Lazar, Z., Islam, M.A., Vidoudez, C., Girguis, P.R. & Stephanopoulos, G., 2019. Synergistic substrate cofeeding stimulates reductive metabolism. Nature Metabolism, 1(6), pp. 643–651.

(*denotes equal contribution)


4.1 Introduction

One of the greatest feats of metabolism is the ability to synthesize reduced compounds from input substrates with varying oxidation states. Through reductive metabolism, cells reassemble the outputs of substrate catabolism into energy-dense bioproducts (Ledesma-Amaro and Nicaud, 2016). In both laboratory and industrial settings, this process is typically run with a single organic carbon source (e.g., a sugar) for simplicity (Atsumi, Hanai and Liao, 2008; Xue et al., 2013). However, a single substrate inherently imposes stoichiometric constraints on the available carbon, energy, and cofactors, leading to biosynthetic imbalance and suboptimal product yield. Shifting these stoichiometries favorably therefore requires genetic rewiring of metabolic pathways (Qiao et al., 2017), which limits the application of non-model organisms that lack suitable genetic tools (Kita et al., 2013).

Substrate mixtures, on the other hand, can alleviate such stoichiometric constraints in reductive metabolism without genetic rewiring. Because each substrate generates carbon, energy, and cofactors with different efficiencies, varying the relative amounts of substrates in the mixture allows the carbon-to-energy-to-cofactor ratio to be fine-tuned. Furthermore, substrates with different entry points into metabolism reduce protein burden by supplying the required components in fewer enzymatic steps. However, mixed substrate metabolism is typically characterized by sequential (e.g., diauxic) and hierarchical (yet simultaneous) utilization based on substrate preference (Monod, 1942; Görke and Stülke, 2008; Aristilde et al., 2015), reflecting the evolutionary fitness of cells in their native environments (Bren et al., 2016). Despite recent successes in mixed-substrate batch fermentation using limited substrate pairs that do not trigger catabolite repression (Joshua et al., 2011; Hermsen et al., 2015), genetic engineering (Martínez et al., 2008; Kanno, Carroll and Atsumi, 2017), and directed evolution (Sanchez et al., 2010; Kim et al., 2015; Meyer et al., 2018), the full mixture spectrum remains inaccessible and thus unexplored.

Here we report a simple and universal solution for overcoming undesirable substrate preferences and improving carbon reduction in various organisms. We eliminate catabolite repression by controlling the continuous feed rate of the preferred substrate so that its concentration remains negligible in a system dominated by the less-preferred substrate. Using this method, we explored mixed substrate metabolism and observed metabolic productivity exceeding the sum of the individual-substrate productivities.
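Why a controlled feed keeps the preferred substrate at negligible concentration can be illustrated with a quasi-steady-state balance. The sketch below is a simplified model, not the authors' control scheme: it assumes Monod-type uptake, q = q_max·S/(K_s + S), and a constant biomass X in a vessel of volume V. All parameter values and the function name are hypothetical. As long as the feed rate F stays below the maximum uptake capacity q_max·X·V, setting feed equal to uptake gives a residual concentration S = K_s·F/(q_max·X·V − F), which remains small relative to K_s. Once F reaches that capacity, the substrate accumulates.

```python
def residual_substrate(feed_rate, q_max, k_s, biomass, volume):
    """Quasi-steady-state residual concentration (mM) of a continuously fed substrate.

    Assumes Monod uptake q = q_max * S / (k_s + S), with q and q_max in
    mmol per g CDW per hr, k_s in mM, biomass in g CDW/L, volume in L, and
    feed_rate in mmol/hr. Setting feed = uptake gives
    S = k_s * F / (q_max * X * V - F). Returns None when the feed meets or
    exceeds the uptake capacity, i.e. the substrate would accumulate.
    """
    capacity = q_max * biomass * volume  # maximum uptake, mmol/hr
    if feed_rate >= capacity:
        return None
    return k_s * feed_rate / (capacity - feed_rate)


# Hypothetical parameters for illustration only (not measured values).
for feed in (0.05, 0.5, 1.0):
    s = residual_substrate(feed, q_max=2.0, k_s=0.1, biomass=3.0, volume=0.15)
    status = "accumulates" if s is None else f"{s:.4f} mM"
    print(f"feed {feed} mmol/hr -> {status}")
```

With these made-up numbers, the uptake capacity is 0.9 mmol/hr. Feeding well below it leaves only trace substrate, while feeding above it causes accumulation. That accumulation is the regime in which catabolite repression would be expected to reappear.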

We applied this substrate cofeeding scheme to two widely divergent organisms to optimize the reductive metabolism of lipogenesis and acetogenesis. We cultured the oleaginous yeast Y. lipolytica on acetate and continuously fed limiting quantities of glucose, fructose, glycerol, or gluconate as “dopant” substrates to augment reductive metabolism. In this fed-batch setup, cells simultaneously consumed acetate and the supplemental substrate, with acetate remaining the primary carbon source. Notably, the rate of lipogenesis with gluconate doping was twice that of the acetate-only control. Tracing 13C from gluconate revealed that obligatory NADPH synthesis through recursive use of the oxidative pentose phosphate pathway (oxPPP) was responsible for the observed synergy with acetate.

We then set out to source acetate via acetogenesis, a reductive metabolic process starting from CO2. The acetogenic bacterium M. thermoacetica simultaneously consumes CO2 and glucose, with the latter providing both the ATP and the electrons (e–) necessary for CO2 fixation, cell maintenance, and growth (Jones et al., 2016). However, tracing 13C-labeled glucose revealed that glucose metabolism dominated and that e– generation was coupled to undesirable decarboxylation. To shift cellular metabolism towards greater CO2 incorporation, we designed a chemostat that continuously supplied limiting glucose and ample H2. Under these conditions, CO2 reduction dominated: glucose primarily produced ATP sufficient for cell maintenance via pyruvate kinase, while H2 supplied the carbon-free e– for net CO2 reduction. Importantly, with glucose as the dopant substrate, M. thermoacetica rapidly and exclusively converted CO2 into acetate, which served as the ideal input for gluconate-doped lipogenesis.

With the aforementioned synergy, we fixed CO2 at 2.3 g per g cell dry weight per hour (g gCDW–1 hr–1), substantially faster than the ~0.05 g gCDW–1 hr–1 of typical photosynthetic systems (Bowes, Ogren and Hageman, 1972). Using the resulting acetate, we produced lipids at 0.046 g gCDW–1 hr–1, a more than two-fold improvement over the previously optimized system (~0.02 g gCDW–1 hr–1) (Xu et al., 2017). By coordinating glucose-doped acetogenesis with gluconate-doped lipogenesis, we converted carbon from its most oxidized, undesirable state (CO2) to a reduced, energy-dense state (lipids) with 38% energetic efficiency. Through synergistic substrate cofeeding, we overcame the ATP and NADPH limitations of biological carbon reduction, paving the way for CO2-derived advanced bioproduct synthesis.
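The headline rates can be cross-checked with simple unit conversions. The sketch below assumes that the 2.3 g gCDW–1 hr–1 figure corresponds to the highest CO2 fixation rate in Table 4.1 (52.7 mmol gCDW–1 hr–1). It uses the molar mass of CO2 (44.01 g/mol). The fold changes use only the rates quoted in the paragraph above, and the helper name is ours.

```python
M_CO2 = 44.01  # g/mol


def mmol_to_g_rate(rate_mmol, molar_mass):
    """Convert a specific rate from mmol gCDW^-1 hr^-1 to g gCDW^-1 hr^-1."""
    return rate_mmol * molar_mass / 1000.0


co2_g = mmol_to_g_rate(52.7, M_CO2)   # highest CO2 fixation rate in Table 4.1
vs_photosynthesis = co2_g / 0.05      # typical photosynthetic systems, g gCDW^-1 hr^-1
lipid_fold = 0.046 / 0.02             # lipid productivity vs previously optimized system
print(f"CO2 fixation: {co2_g:.2f} g/gCDW/hr (~{vs_photosynthesis:.0f}-fold vs ~0.05)")
print(f"Lipid productivity: {lipid_fold:.1f}-fold vs ~0.02")
```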

4.2 Materials and methods

4.2.1 Strain and culture conditions

Yarrowia lipolytica strains based on the ACCDGA strain (MTYL065) (Tai and Stephanopoulos, 2013) were pre-cultured at 30 °C in 14 mL test tubes containing YPD medium (20 g/L glucose, 20 g/L peptone, 10 g/L yeast extract). After 24 hr, 1 mL of culture was transferred to a shake flask containing 40 mL of acetate medium (50 g/L sodium acetate, 1.7 g/L YNB-AA-AS, and 1.34 g/L ammonium sulfate). The shake flask culture was grown for 24 hr to adapt the cells to acetate.

Afterwards, the cells were pelleted at 18,000 g for 5 min, washed once with acetate medium, and used to inoculate all Y. lipolytica experiments at an initial OD600 of 0.05.

Mixed substrate batch cultures were carried out in shake flasks with 40 mL of acetate medium, except that 6 mol% of the carbon from acetate was replaced with the supplemental substrate (glucose, fructose, glycerol, or gluconate). Continuous fed-batch supplementation cultures were carried out in 250 mL bioreactors (Applikon Biotechnology) with a 150 mL working volume.

Acetate medium was used under batch conditions, while the supplemental substrate was continuously fed at a rate of 0.13 mmol C/hr. For the acetate-only control, the supplemental substrate was replaced with acetate fed at the same rate, so that cells received equal amounts of carbon substrate under all conditions. All bioreactor cultures were carried out at 30 °C, pH 7.0 (controlled with 10 wt% sulfuric acid), and 0.2 LPM air sparging. Dissolved oxygen was controlled at 20% during the growth phase and at ~2% during the lipogenic phase for optimal lipid production and minimal citrate excretion (Qiao et al., 2017). For gluconate 13C tracing experiments, natural gluconate in the supplementation feed stream was replaced with [U-13C6]gluconate (99%, Cambridge Isotope Laboratories).
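To relate the carbon-based feed rate to substrate amounts, the 0.13 mmol C/hr feed can be converted into molar and mass feed rates for each dopant. The sketch below assumes neutral molecular formulas and standard molar masses. In particular, gluconate is treated as gluconic acid (C6H12O7, 196.16 g/mol) because the salt form used is not stated here. The dictionary and function names are hypothetical.

```python
FEED_C = 0.13  # mmol C per hr, from the text

# name: (carbons per molecule, molar mass in g/mol); neutral forms assumed
DOPANTS = {
    "glucose": (6, 180.16),
    "fructose": (6, 180.16),
    "glycerol": (3, 92.09),
    "gluconic acid": (6, 196.16),
}


def feed_rates(feed_c=FEED_C, dopants=DOPANTS):
    """Return {name: (mmol/hr, mg/hr)} for a feed specified in mmol C/hr."""
    out = {}
    for name, (n_carbon, molar_mass) in dopants.items():
        mmol_per_hr = feed_c / n_carbon
        out[name] = (mmol_per_hr, mmol_per_hr * molar_mass)  # mmol * g/mol = mg
    return out


for name, (mmol, mg) in feed_rates().items():
    print(f"{name}: {mmol:.4f} mmol/hr, {mg:.2f} mg/hr")
```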

In all Y. lipolytica experiments with gluconate as a substrate, an ACCDGA strain overexpressing its native gluconate kinase (glucK) under the TEFin promoter was used. The TEFin-glucK expression cassette was integrated into the genome. This ensured that gluconate uptake and incorporation into central carbon metabolism were not limited by inadequate kinase levels. All other experiments used the same ACCDGA strain with an empty control vector integrated into the genome. Overexpressing gluconate kinase had no appreciable effect on the strain’s ability to produce lipids on acetate, as shown in Figure 4.1.

[Figure 4.1: normalized lipid titer (0 to 1.0) for strains ACCDGA, EV, and TEFin-glucK]

Figure 4.1. Overexpression of gluconate kinase does not affect lipid production. Normalized lipid titers after 72 hr of culture in acetate medium. ACCDGA is the original Y. lipolytica strain without further modification. EV is a strain carrying an empty control vector stably integrated into the genome, while TEFin-glucK is a strain overexpressing the native Y. lipolytica gluconate kinase under the TEFin promoter through genome integration. Individual data points for each biological replicate are shown as circles. The center and error bars represent mean ± S.E.M. (n = 3 biologically independent samples).

Moorella thermoacetica (ATCC 39073 and 49707) were cultured in Balch-type tubes containing medium with 8 g/L glucose, 7.5 g/L NaHCO3, 7 g/L KH2PO4, 5.5 g/L K2HPO4, 2 g/L (NH4)2SO4, 0.5 g/L MgSO4 • 7H2O, 0.3 g/L cysteine, 0.02 g/L CaCl2 • 2H2O, 1% (v/v) trace minerals (ATCC MD-TMS), and 1% (v/v) vitamin supplement (ATCC MD-VS) at 55 °C and pH 6.8. Cysteine scavenged residual dissolved oxygen in the medium (Michaelis and Guzman Barron, 1929). The headspace was pressurized to either 170 kPa with CO2 or 240 kPa with 80:20 H2/CO2. For 13C tracing experiments, natural glucose was replaced with [U-13C6]glucose (99%, Cambridge Isotope Laboratories) and the headspace was pressurized to 170 kPa with natural CO2. The Balch-tube cultures were incubated inside a strictly anoxic glovebox with magnetic stirring.

For bioreactor experiments, M. thermoacetica (ATCC 49707) was cultured in a strictly anoxic vessel with pH controlled at 6.6 (using 10 M sodium hydroxide) and temperature at 55 °C. Culture medium identical to the above except for a lower glucose concentration was fed as follows (medium glucose concentration at medium feed rate): 0.25 g/L at 11.5 mL/hr; 0.25 g/L at 9.1 mL/hr; 0.25 g/L at 6.9 mL/hr; 0.25 g/L at 4.3 mL/hr; 0.25 g/L at 2.3 mL/hr; 0.25 g/L at 1.2 mL/hr; and 0.13 g/L at 1.2 mL/hr. The effluent rate matched the feed rate to keep the culture volume constant at 135 mL. H2 and CO2 were mixed at 60:40 and sparged into the culture at 200 mL/min. The headspace pressure was maintained at 130 kPa. All conditions and data are shown in Table 4.1.
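The dilution rates listed in Table 4.1 follow from D = F/V, using the feed rates above and the 135 mL working volume. At chemostat steady state, the specific growth rate equals D. The sketch below recomputes D and the glucose supply rate (feed concentration × feed rate) using only values stated in this section. The function name is ours.

```python
VOLUME_ML = 135.0  # constant culture volume, mL

# (glucose feed concentration in g/L, feed rate in mL/hr), as listed above
FEEDS = [(0.25, 11.5), (0.25, 9.1), (0.25, 6.9), (0.25, 4.3),
         (0.25, 2.3), (0.25, 1.2), (0.13, 1.2)]


def chemostat_summary(feeds, volume_ml=VOLUME_ML):
    """Return (dilution rate in hr^-1 rounded to 3 decimals, glucose supply in mg/hr)."""
    rows = []
    for conc_g_per_l, rate_ml_per_hr in feeds:
        dilution = rate_ml_per_hr / volume_ml               # hr^-1
        glucose_mg_per_hr = conc_g_per_l * rate_ml_per_hr   # (g/L)*(mL/hr) = mg/hr
        rows.append((round(dilution, 3), glucose_mg_per_hr))
    return rows


for d, glc in chemostat_summary(FEEDS):
    print(f"D = {d:.3f} hr^-1, glucose supply = {glc:.3f} mg/hr")
```

The recomputed dilution rates match the Dilution rate column of Table 4.1.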

4.2.2 Metabolite extraction and measurement

To extract metabolites, Y. lipolytica cells were collected during the exponential and lipogenic phases. Cells were filtered on 0.45 µm nylon membrane filters and immediately transferred to a precooled 40:40:20 acetonitrile/methanol/water solution. After 20 minutes at −20 °C, the filters were washed, and the extracts were moved to Eppendorf tubes. The samples were then centrifuged for five minutes, and the supernatants were dried under nitrogen.

In mid-exponential phase, the M. thermoacetica cultures were collected from Balch-type tubes using syringes inside the anaerobic glovebox. Immediately afterwards, cellular metabolism was quenched and metabolites were extracted by quickly transferring filtered cells (on 0.2 µm nylon membrane filters) to plates containing precooled 80% acetonitrile on ice (Rabinowitz and Kimball, 2007). After 20 minutes at 4 °C, the membrane filters were washed, and the metabolite extracts were moved to Eppendorf tubes. The supernatants were obtained after five minutes of centrifugation and lyophilized.

Table 4.1. Acetate yields, productivities, and CO2 fixation rates with varying energy sources in M. thermoacetica.

| Condition | Glc feed conc. (g/L) | Dilution rate (hr–1) | Gas; pressure (kPa) | Fraction of e– from glucose | Acetate yield (g ace / g glc) | Acetate productivity (mmol gCDW–1 hr–1) | CO2 fixation rate (mmol gCDW–1 hr–1) |
|---|---|---|---|---|---|---|---|
| Batch* | n/a | n/a | CO2; 170 | 100 % | 0.77 ± 0.02 | 8.0 ± 0.1 | –1.5 ± 0.5 |
| Chemostat | 0.25 | 0.085 | 60:40 H2/CO2; 130 | 41.1 ± 2.4 % | 2.3 ± 0.1 | 19.9 ± 0.8 | 25.3 ± 1.6 |
| Batch* | n/a | n/a | 80:20 H2/CO2; 240 | 24.9 ± 1.4 % | 3.8 ± 0.2 | 23.3 ± 0.8 | 36.6 ± 1.7 |
| Chemostat | 0.25 | 0.067 | 60:40 H2/CO2; 130 | 15.1 ± 1.1 % | 6.3 ± 0.1 | 25.9 ± 0.2 | 46.1 ± 0.5 |
| Batch | n/a | n/a | 80:20 H2/CO2; 240 | 12.4 ± 1.1 % | 7.5 ± 0.6 | 26.6 ± 0.9 | 50.0 ± 1.9 |
| Chemostat | 0.25 | 0.051 | 60:40 H2/CO2; 130 | 9.2 ± 1.5 % | 10.5 ± 0.2 | 28.0 ± 0.2 | 52.7 ± 0.4 |
| Chemostat | 0.25 | 0.032 | 60:40 H2/CO2; 130 | 4.2 ± 1.8 % | 23.0 ± 0.4 | 22.3 ± 0.2 | 43.8 ± 0.5 |
| Chemostat | 0.25 | 0.017 | 60:40 H2/CO2; 130 | 2.9 ± 0.1 % | 33.4 ± 1.3 | 11.1 ± 0.3 | 22.3 ± 0.7 |
| Chemostat | 0.25 | 0.009 | 60:40 H2/CO2; 130 | 1.9 ± 0.1 % | 52.3 ± 2.9 | 9.2 ± 0.2 | 18.3 ± 0.5 |
| Chemostat | 0.13 | 0.009 | 60:40 H2/CO2; 130 | 1.2 ± 0.1 % | 82.1 ± 7.2 | 6.9 ± 0.4 | 13.9 ± 0.8 |

Values shown in this table represent the mean and S.E.M. determined from three biological replicates.
* M. thermoacetica (ATCC 39073) was used in these conditions. M. thermoacetica (ATCC 49707) was used in all other conditions.

Dried samples were resuspended in HPLC-grade water for LC-MS analysis. Samples were analyzed on a Dionex UltiMate 3000 UPLC system (Thermo) with a ZIC-pHILIC (5 µm polymer particle) 150 × 2.1 mm column (EMD Millipore), coupled by electrospray ionization to a Q Exactive orbitrap mass spectrometer (Thermo). Solvent A was 20 mM ammonium carbonate with 0.1% ammonium hydroxide, and solvent B was acetonitrile. The chromatographic gradient was run at a flow rate of 0.150 mL/min as follows: a linear gradient from 80% B to 20% B between 0 and 20 min, a linear gradient from 20% B to 80% B between 20 and 20.5 min, and 80% B held from 20.5 to 28 min. The column and autosampler tray temperatures were 25 °C and 4 °C, respectively. The mass spectrometer was operated in polarity switching mode, scanning a range of 70–1,000 m/z. The resolving power was set to 70,000 for 13C labeling experiments. Retention times were determined using authenticated standards, and the resulting mass spectra and chromatograms were identified and processed using MAVEN software (Clasquin, Melamud and Rabinowitz, 2012). To obtain labeling information for cellular CO2 and acetate, the labeling of the carbamoyl group was obtained by comparing citrulline to ornithine (i.e., computing the inverse Cauchy product of their labeling distributions). Similarly, the labeling of the acetyl group was obtained by comparing N-acetyl-glutamate to glutamate.
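The “inverse Cauchy product” step can be sketched as a deconvolution of mass isotopomer distributions (MIDs). The sketch rests on three assumptions. First, the product MID (e.g., N-acetyl-glutamate) is the convolution of the precursor MID (glutamate) and the fragment MID (acetyl group). Second, measurement noise is ignored. Third, natural-abundance correction has already been applied. Under these assumptions, the fragment MID can be recovered term by term. The function names and the example MIDs are hypothetical.

```python
def cauchy_product(a, b):
    """Convolve two MIDs: c[k] = sum_j a[j] * b[k - j]."""
    c = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c


def inverse_cauchy_product(product_mid, precursor_mid, n_fragment):
    """Recover fragment MID f from product = precursor (Cauchy product) f.

    Solves f[k] = (product[k] - sum_{j=1..k} precursor[j] * f[k-j]) / precursor[0]
    for k = 0..n_fragment-1; requires precursor[0] > 0.
    """
    f = []
    for k in range(n_fragment):
        acc = product_mid[k]
        for j in range(1, min(k, len(precursor_mid) - 1) + 1):
            acc -= precursor_mid[j] * f[k - j]
        f.append(acc / precursor_mid[0])
    return f


# Hypothetical MIDs (M+0, M+1, M+2), for illustration only
glutamate = [0.6, 0.3, 0.1]
acetyl = [0.5, 0.2, 0.3]                  # the "unknown" acetyl-group labeling
nag = cauchy_product(glutamate, acetyl)   # simulated N-acetyl-glutamate MID
print(inverse_cauchy_product(nag, glutamate, len(acetyl)))
```

For the single-carbon carbamoyl group (citrulline vs ornithine), the same function would be called with `n_fragment=2`.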

4.2.3 Substrate uptake and product secretion measurement

For Y. lipolytica, 1 mL of culture was taken at each time point for media and cell dry weight (CDW) analysis. The cells were centrifuged at 18,000 g for 10 min. The supernatant was collected, filtered (0.2 µm syringe filters), and analyzed by high-performance liquid chromatography (HPLC). The cell pellet was then washed once with 1 mL of water to remove residual media components and dried in a 60 °C oven until its mass remained unchanged. This mass was taken as the CDW per mL of culture. For lipid analysis, a culture volume containing ~1 mg of CDW was sampled and centrifuged at 18,000 g for 10 min, and the supernatant was discarded. 100 μL of an internal standard containing 2 mg/mL methyl tridecanoate (Sigma-Aldrich) and 2 mg/mL glyceryl triheptadecanoate (Sigma-Aldrich) in hexane was added to each sample. Transesterification was then carried out in 500 μL of 0.5 N sodium methoxide with continuous vortexing at 1200 rpm for 60 min. Afterwards, 40 μL of 98% sulfuric acid was added for neutralization, and 500 μL of hexane was used for extraction. The samples were vortexed at 1200 rpm for an additional 30 min and then centrifuged at 6,000 g for 1 min to remove cellular debris. The top hexane layer was analyzed on a GC-FID system. All Y. lipolytica specific rate data were normalized to the lipid-free CDW, defined as the difference between the measured CDW and the lipid titer.

For media analysis in M. thermoacetica cultures, small aliquots were collected with syringes inside the anaerobic glovebox during exponential phase. Filtered media samples (0.2 µm syringe filters) were analyzed for glucose using a YSI biochemistry analyzer. Acetate and formate, along with other potential products (e.g., lactate and ethanol), were analyzed by HPLC. Culture density was measured by spectrophotometry (0.45 gCDW L–1 OD660–1) at the time of sampling. The rates of substrate uptake and product secretion were determined from the rates at which substrate concentrations, product concentrations, and culture density changed over time. The carbon output rate for biomass was determined from the growth rate and an elemental biomass composition of CH2.08O0.53N0.24 (Tracy et al., 2012). The net CO2 fixation rates were calculated as the measured acetate and biomass carbon production rates minus the corresponding measured glucose carbon consumption rates. The fraction of electrons derived from H2 was inferred from the fraction of acetate and biomass carbons generated from net CO2 fixation. This inference holds because the average oxidation state of acetate and biomass carbons is nearly the same as that of glucose.
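The carbon bookkeeping described above can be written out explicitly. The sketch assumes that all CDW is described by the CH2.08O0.53N0.24 formula (ash neglected). Under that assumption, the biomass carbon output in mmol C gCDW–1 hr–1 is μ × 1000 divided by the mass of one biomass C-mol. The rates passed in are hypothetical illustrations, not measured values, and the function names are ours.

```python
# Approximate atomic masses (g/mol)
ATOMIC = {"C": 12.011, "H": 1.008, "O": 15.999, "N": 14.007}
BIOMASS = {"C": 1.0, "H": 2.08, "O": 0.53, "N": 0.24}  # per C-mol (Tracy et al., 2012)


def cmol_mass(formula):
    """Mass in grams of one C-mol of a formula written per carbon."""
    return sum(ATOMIC[el] * n for el, n in formula.items())


def carbon_balance(q_acetate, mu, q_glucose):
    """Net CO2 fixation and electron-source fractions for an acetogen culture.

    q_acetate, q_glucose: mmol gCDW^-1 hr^-1; mu: hr^-1.
    Returns (net CO2 fixation in mmol C gCDW^-1 hr^-1,
             fraction of e- from H2, fraction of e- from glucose).
    """
    biomass_c = mu * 1000.0 / cmol_mass(BIOMASS)  # mmol C gCDW^-1 hr^-1
    product_c = 2.0 * q_acetate + biomass_c        # acetate carries 2 carbons
    glucose_c = 6.0 * q_glucose
    net_co2 = product_c - glucose_c
    frac_h2 = net_co2 / product_c                  # product carbon from net CO2 fixation
    return net_co2, frac_h2, 1.0 - frac_h2


# Hypothetical rates for illustration only
net, f_h2, f_glc = carbon_balance(q_acetate=20.0, mu=0.05, q_glucose=1.0)
print(f"net CO2 fixation = {net:.2f} mmol/gCDW/hr; e- from H2 = {f_h2:.1%}")
```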

For HPLC, 10 μL of sample was injected into an Agilent 1200 High-Performance Liquid Chromatography system coupled to a G1362 Refractive Index Detector (Agilent Technologies). A Bio-Rad HPX-87H column was used for separation with 14 mM sulfuric acid as the mobile phase flowing at 0.7 mL/min. For GC-FID, 1 μL of sample was injected at a split ratio of 50:1 into an Agilent 7890B GC-FID system coupled to a J&W HP-INNOWax capillary column (Agilent Technologies). The column was held at a constant temperature of 200 °C with helium as the carrier gas (1.5 mL/min). The injection and FID temperatures were set to 260 °C.

4.2.4 Headspace gas measurement

After the M. thermoacetica cultures were collected from Balch-type tubes inside the anaerobic glovebox for intracellular and extracellular metabolite analysis, the emptied tubes, now containing only headspace gas, were stored at 4 °C until gas chromatography-mass spectrometry (GC-MS) analysis. To measure CO2 isotope labeling, 100 µL of headspace gas was collected from each tube with a gastight syringe and injected into a multimode inlet maintained at 180 °C with a split ratio of 10:1. Samples were analyzed on a 7890A GC system with a 60 m GS-GasPro (0.320 mm diameter) column coupled to a 5975C quadrupole mass spectrometer (Agilent). The oven was kept at 90 °C for 3 minutes, heated to 260 °C at 45 °C/min, and held at 260 °C for 1 minute.


4.2.5 Flux balance analysis and isotope tracing flux analysis

An M. thermoacetica model based on the published genome-scale metabolic reconstruction (Islam et al., 2015) was used for constraint-based flux analysis. Among the feasible flux distributions satisfying steady-state mass balance and nutrient availability constraints, optimal solutions that maximize or minimize an objective function were obtained using the COBRA toolbox with the Gurobi solver (Schellenberger et al., 2011). To determine CO2 utilization capability, the objective was to maximize CO2 consumption (equivalently, to minimize CO2 production). To determine the growth potential with H2 as the energy source, the objective was to maximize biomass production (i.e., cell growth). Constraints on substrate uptake and product secretion rates were set based on experimental or previously reported values.

To determine flux distributions, isotopomer mass balance constraints were also imposed based on the 13C labeling results. For this purpose, metabolic networks with carbon atom mapping were constructed. For Y. lipolytica, the network comprised glycolysis and the PPP. For M. thermoacetica, it comprised lower glycolysis, the TCA cycle, anaplerosis, the reductive acetyl-CoA pathway, and the serine/glycine pathway. The labeling of the following metabolites was simulated using the elementary metabolite unit (EMU) framework (Antoniewicz, Kelleher and Stephanopoulos, 2007): for Y. lipolytica, G6P, F6P, 3PG, S7P, 6PG, R5P, PEP, and Pyr (Appendix B Table B1); for M. thermoacetica, 3PG, PEP, Ala, acetyl-CoA, Ser, Gly, Asp, Glu, and CO2 (Appendix B Table B3).

The flux distribution that best simulated the metabolite labeling and uptake-secretion rates was found by minimizing the variance-weighted sum of squared residuals (SSR) between simulation and experiment:

$$
\min_{\mathbf{v}} \; \sum \left( \frac{\mathbf{iso}_{\mathrm{exp}} - \mathbf{iso}(\mathbf{v})}{\mathbf{s}_{\mathrm{iso}}} \right)^{2} + \sum \left( \frac{\mathbf{v}_{\mathrm{exp}} - \mathbf{v}}{\mathbf{s}_{\mathbf{v}}} \right)^{2}
$$

Here $\mathbf{v}$ and $\mathbf{iso}(\mathbf{v})$ denote, in vector form, the metabolic flux distribution and the simulated 13C labeling of metabolites as a function of $\mathbf{v}$. $\mathbf{v}_{\mathrm{exp}}$ and $\mathbf{iso}_{\mathrm{exp}}$ denote the measured fluxes and measured metabolite labeling, and $\mathbf{s}_{\mathbf{v}}$ and $\mathbf{s}_{\mathrm{iso}}$ their measurement standard deviations. The 95% confidence interval for each best-fit flux was obtained by searching for the minimum and maximum flux values that increase the minimum SSR by less than the χ2 cutoff (1 degree of freedom) of 3.84 (Antoniewicz, Kelleher and Stephanopoulos, 2006).
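The fitting and confidence-interval logic can be illustrated on a one-flux toy problem. The sketch below is not the EMU-based model. It assumes a hypothetical pool fed by a labeled flux v and a fixed unlabeled flux of 1, so that the simulated labeled fraction is v/(v + 1). It combines one labeling measurement and one flux measurement in the variance-weighted SSR, locates the minimum by grid search, and reports the range of v over which SSR rises by less than 3.84 above the minimum. All measurements and names are hypothetical.

```python
def iso_sim(v, v_unlabeled=1.0):
    """Toy labeling model: labeled fraction of a pool fed by a labeled flux v
    and a fixed unlabeled flux (hypothetical, not the M. thermoacetica model)."""
    return v / (v + v_unlabeled)


def ssr(v, iso_exp, s_iso, v_exp, s_v):
    """Variance-weighted sum of squared residuals for labeling and flux data."""
    return ((iso_exp - iso_sim(v)) / s_iso) ** 2 + ((v_exp - v) / s_v) ** 2


def fit_with_ci(iso_exp, s_iso, v_exp, s_v, v_max=3.0, n=30001, cutoff=3.84):
    """Grid search for the best-fit flux and its 95% CI (SSR increase < cutoff)."""
    grid = [v_max * i / (n - 1) for i in range(n)]
    values = [ssr(v, iso_exp, s_iso, v_exp, s_v) for v in grid]
    i_best = min(range(n), key=values.__getitem__)
    ssr_min = values[i_best]
    inside = [v for v, s in zip(grid, values) if s - ssr_min < cutoff]
    return grid[i_best], ssr_min, min(inside), max(inside)


# Hypothetical measurements: labeled fraction 0.50 +/- 0.02, flux 1.1 +/- 0.2
v_best, ssr_min, lo, hi = fit_with_ci(0.50, 0.02, 1.1, 0.2)
print(f"best v = {v_best:.3f}, SSR_min = {ssr_min:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

Because this toy SSR has a single minimum, the points below the cutoff form one interval. The interval is asymmetric because the labeling model is nonlinear in v. Real 13C-MFA tools instead re-optimize all other fluxes at each trial value when scanning a flux.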

4.2.6 Calculation of specific growth rate and productivities

Y. lipolytica batch fermentations have two distinct phases: (a) a nitrogen-replete growth phase (6–44 hr) and (b) a nitrogen-depleted lipogenic phase (44 hr onwards). During the growth phase, cells grow exponentially, whereas during the lipogenic phase, cell division ceases due to nitrogen limitation.

For the growth phase, an exponential curve was fitted to the lipid-free CDW measurements between the 20 and 32 hr time points, and the exponent was taken as the specific growth rate μ (hr–1). To obtain the specific productivity, the ratio of lipid synthesis to lipid-free CDW synthesis was calculated using the lipid titer and lipid-free CDW measurements at the 20 and 32 hr time points. This ratio was then multiplied by μ to obtain the specific lipid productivity normalized to lipid-free CDW, qlipid,G (g gCDW–1 hr–1).

In the lipogenic phase, the cell number, and hence the lipid-free CDW, was approximately constant. Specific lipid productivity was calculated by first determining the volumetric lipid production rate from the lipid titer measurements at the 56 and 68 hr time points. This rate was then divided by the lipid-free CDW measured at the end point of the fermentation to obtain the specific productivity qlipid,L (g gCDW–1 hr–1).
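The two productivity definitions above can be sketched as follows. The sketch assumes the exponential fit is a least-squares line through ln(lipid-free CDW) versus time, which is one common way to fit an exponential; the exact fitting routine used is not stated. All data values and function names are hypothetical.

```python
import math


def specific_growth_rate(times, x_lipid_free):
    """Least-squares slope of ln(lipid-free CDW) vs time (hr^-1)."""
    ys = [math.log(x) for x in x_lipid_free]
    t_mean = sum(times) / len(times)
    y_mean = sum(ys) / len(ys)
    num = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den


def growth_phase_productivity(times, cdw, lipid):
    """q_lipid,G = mu * (delta lipid / delta lipid-free CDW) between first and last points."""
    x_lf = [c - l for c, l in zip(cdw, lipid)]  # lipid-free CDW = CDW - lipid titer
    mu = specific_growth_rate(times, x_lf)
    ratio = (lipid[-1] - lipid[0]) / (x_lf[-1] - x_lf[0])
    return mu, mu * ratio


def lipogenic_phase_productivity(lipid_56, lipid_68, x_lf_end):
    """q_lipid,L = volumetric lipid rate over 56-68 hr / end-point lipid-free CDW."""
    return (lipid_68 - lipid_56) / (68 - 56) / x_lf_end


# Hypothetical data in g/L, for illustration only
times = [20, 24, 28, 32]
cdw = [1.210, 1.805, 2.693, 4.017]
lipid = [0.110, 0.164, 0.245, 0.365]
mu, q_g = growth_phase_productivity(times, cdw, lipid)
q_l = lipogenic_phase_productivity(2.0, 2.6, 5.0)
print(f"mu = {mu:.3f} 1/hr, q_lipid,G = {q_g:.4f}, q_lipid,L = {q_l:.4f} g/gCDW/hr")
```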


4.2.7 Genome-scale stoichiometric metabolic model

Our genome-scale model of M. thermoacetica (iAI563) contains 563 genes and 712 reactions. We added 6 new reactions and 5 new genes involved in ethanol metabolism to iAI558 (Appendix B Table B5). To balance electrons in reactions involving ferredoxin, iAI563 also includes corrected stoichiometries (Appendix B Table B6).

4.2.8 Relationship between ATP availability and CO2 fixation

In autotrophic growth with an electron-rich energy source such as H2, cells produce ATP using energy derived from the transmembrane proton motive force. Using the genome-scale metabolic model, we characterized electron-rich metabolism in H2 + CO2 gas fermentation. We set the maximum H2 consumption rate to 32 mmol gCDW–1 hr–1. This value was selected based on the specific hydrogenase activity and the assumption that hydrogenase could account for up to 1% of total protein mass. With maximum growth as the objective function, we raised the non-growth-associated ATP maintenance requirement to increase ATP consumption. This decreased the growth rate without affecting the consumption rate of CO2, which serves as both the oxidizing agent and the carbon source. These results suggest that rapid CO2 fixation is feasible as long as the minimum cellular ATP requirement is met.

4.2.9 Analysis of synergies associated with substrate cofeeding

Using the product synthesis rates measured with individual substrates, we evaluated the extent of synergy. Specifically, we compared the product synthesis rates observed experimentally when two substrates were provided simultaneously with the expected rates calculated below.


For Y. lipolytica, we chose the lipogenic phase, in which gluconate contributed ~5.2% of carbon uptake during dual substrate cofeeding. The expected value shown here is the linearly extrapolated sum of the productivities measured individually for each substrate, weighted by their contributions to carbon uptake during cofeeding.

| Substrate(s) | Fatty acid productivity (mmol gCDW–1 hr–1) |
|---|---|
| Acetate | 0.126 |
| Gluconate | 0.094 |
| Acetate + Gluconate, expected | 0.126 + 0.052 × 0.094 = 0.13 |
| Acetate + Gluconate, observed* | 0.166 ± 0.008 |

*The “Observed” value represents the mean and S.E.M. determined from 3 biological replicates.

Similarly, for M. thermoacetica, we chose the maximum acetate productivity condition, in which glucose contributed ~9.2% of electron uptake, and obtained the following:

| Substrates | Acetate productivity (mmol gCDW–1 hr–1) |
|---|---|
| H2 & CO2 | ~0 |
| Glucose & CO2 | 8.00 |
| H2 + Glucose & CO2, expected | 0 + 0.092 × 8.00 = 0.74 |
| H2 + Glucose & CO2, observed* | 28.0 ± 0.2 |

*The “Observed” value represents the mean and S.E.M. determined from 3 biological replicates.
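Both expected values follow the same arithmetic: the primary-substrate productivity plus the dopant productivity scaled by the dopant's share of uptake. The ratio of observed to expected gives a compact measure of synergy. The sketch below reproduces the two calculations using the numbers in the tables above. The synergy ratio and the function name are our illustrative summary, not metrics defined in the original analysis.

```python
def expected_productivity(primary, dopant, dopant_share):
    """Primary-substrate productivity plus dopant productivity scaled by its uptake share."""
    return primary + dopant_share * dopant


cases = {
    # name: (primary productivity, dopant productivity, dopant share, observed)
    "Y. lipolytica (acetate + gluconate)": (0.126, 0.094, 0.052, 0.166),
    "M. thermoacetica (H2 + glucose)": (0.0, 8.00, 0.092, 28.0),
}
for name, (primary, dopant, share, observed) in cases.items():
    expected = expected_productivity(primary, dopant, share)
    print(f"{name}: expected {expected:.3f}, observed {observed}, "
          f"ratio {observed / expected:.1f}x")
```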

We then set out to determine whether the synergies observed with substrate cofeeding are metabolically feasible. We hypothesized that the synergies arise from the complementary roles of different substrates in providing balanced ingredients for lipogenesis and acetogenesis. Synthesizing one fatty acid (FA 18:0) molecule requires 18 carbons (i.e., 9 acetyl groups), 8 ATP, and 16 NADPH, and acetate and gluconate can each generate all three ingredients. For acetogenesis, synthesizing one acetate molecule requires 2 CO2 and 4 reducing equivalents (8 e–). Both H2 and glucose generate reducing equivalents for CO2 reduction. Because cell maintenance (housekeeping) in Y. lipolytica and M. thermoacetica also requires ATP, fatty acid and acetate synthesis in practice require more than 8 and 0 ATP, respectively.
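The 8 e– per acetate can be checked with standard degree-of-reduction bookkeeping. So can the statement in Section 4.2.3 that acetate, biomass, and glucose carbons have similar reduction levels. The convention assigns C = 4, H = 1, O = −2, and N = −3, with ammonia as the nitrogen reference. This is a textbook convention applied here for illustration, not part of the original analysis, and the names below are ours.

```python
VALENCE = {"C": 4, "H": 1, "O": -2, "N": -3}  # NH3 taken as the nitrogen reference


def degree_of_reduction(formula):
    """Available electrons per molecule (or per C-mol) for an elemental formula dict."""
    return sum(VALENCE[el] * n for el, n in formula.items())


CO2 = {"C": 1, "O": 2}
H2 = {"H": 2}
ACETIC_ACID = {"C": 2, "H": 4, "O": 2}
GLUCOSE = {"C": 6, "H": 12, "O": 6}
BIOMASS = {"C": 1, "H": 2.08, "O": 0.53, "N": 0.24}  # per C-mol

# Electrons needed to reduce 2 CO2 to one acetate, and the H2 that supplies them
electrons_per_acetate = degree_of_reduction(ACETIC_ACID) - 2 * degree_of_reduction(CO2)
h2_per_acetate = electrons_per_acetate / degree_of_reduction(H2)
print(electrons_per_acetate, h2_per_acetate)

# Electrons per carbon: glucose and acetate give 4, biomass about 4.3
for name, f in [("glucose", GLUCOSE), ("acetate", ACETIC_ACID), ("biomass", BIOMASS)]:
    print(name, degree_of_reduction(f) / f["C"])
```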

While individual substrates can satisfy these metabolic requirements, their product synthesis rates differ. We hypothesized that these rates are mainly determined by the synthesis of the ingredients least accessible to cells. In other words, ingredients whose synthesis requires more pathway steps are likely to be rate-determining, because longer pathways impose greater protein burdens and are more likely to contain rate-limiting enzymatic steps.

We began formulating the analysis of metabolic requirements and burdens by identifying different substrate usage pathways, their output products, and the number of enzymatic steps. For Y. lipolytica:

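Cases | Carbon | ATP | NADPH | # Steps | Role | Flux (mmol gCDW–1 hr–1)
Ace#1 | 2 | -2 | 0 | 1 | Max Carbon | 1.14
Ace#2 | 0 | 8 | 0 | 9 | Max ATP | 0.49
Ace#3 | 0 | -1 | 3 | 74 | Max NADPH | 0.67
Glcn#1 | 3.33 | 10 | 1 | 16 | Max Carbon | 0.51
Glcn#2 | 0 | 26.67 | 1 | 29.33 | Max ATP | 0
Glcn#3 | 0 | -1 | 11 | 37 | Max NADPH | 0.090
FA 18:0 (required) | 18 | >8 | 16 | | |
PHB (required) | 4 | >0 | 1 | | |
IPP (required) | 6 | >3 | 2 | | |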

These stoichiometries represent the numbers of products and steps when one acetate or gluconate molecule goes through each pathway. Conversion factors of 2.5 ATP/NADH and 1.5 ATP/FADH2 were used. These cases can be linearly combined to satisfy the metabolic demands of FA 18:0 synthesis, and we set out to determine their usage in terms of fluxes.

Mathematically, these problems were solved by linear programming, where we aimed to find a nonnegative flux vector that i) minimizes enzymatic steps, ii) produces exactly the moles of carbon and NADPH (reducing equivalents) required for product synthesis, and iii) produces more ATP than is required for product synthesis. The mathematical formulation is presented below. For acetate:

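$$
\min_{\mathbf{x}_{Ace}} \begin{bmatrix} 0 & 8 & 73 \end{bmatrix} \mathbf{x}_{Ace}
\quad \text{such that} \quad
\begin{bmatrix} 2 & 0 & 0 \\ 0 & 0 & 3 \end{bmatrix} \mathbf{x}_{Ace} = \begin{bmatrix} 18 \\ 16 \end{bmatrix}, \quad
\begin{bmatrix} -2 & 8 & -1 \end{bmatrix} \mathbf{x}_{Ace} \ge 8, \quad \text{and} \quad
\mathbf{x}_{Ace} \ge \mathbf{0},
\qquad \text{where } \mathbf{x}_{Ace} = \begin{bmatrix} \mathrm{Ace\#1} \\ \mathrm{Ace\#2} \\ \mathrm{Ace\#3} \end{bmatrix}
$$

To get absolute fluxes, we multiply the solution vector by the measured lipid production rate from acetate, 0.126 mmol gCDW–1 hr–1:

$$
\mathrm{fluxes}_{Ace} = 0.126 \, \mathbf{x}_{Ace}
$$

For gluconate:

$$
\min_{\mathbf{x}_{Glcn}} \begin{bmatrix} 15 & 28.33 & 36 \end{bmatrix} \mathbf{x}_{Glcn}
\quad \text{such that} \quad
\begin{bmatrix} 3.33 & 0 & 0 \\ 1 & 1 & 11 \end{bmatrix} \mathbf{x}_{Glcn} = \begin{bmatrix} 18 \\ 16 \end{bmatrix}, \quad
\begin{bmatrix} 10 & 26.67 & -1 \end{bmatrix} \mathbf{x}_{Glcn} \ge 8, \quad \text{and} \quad
\mathbf{x}_{Glcn} \ge \mathbf{0},
\qquad \text{where } \mathbf{x}_{Glcn} = \begin{bmatrix} \mathrm{Glcn\#1} \\ \mathrm{Glcn\#2} \\ \mathrm{Glcn\#3} \end{bmatrix}
$$

To get absolute fluxes, we multiply the solution vector by the measured lipid production rate from gluconate, 0.094 mmol gCDW–1 hr–1:

$$
\mathrm{fluxes}_{Glcn} = 0.094 \, \mathbf{x}_{Glcn}
$$

The resulting flux values are listed in the last column (Flux) of the table above.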
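Each single-substrate problem has only three unknowns, two equality constraints, and four inequalities. Its optimum can therefore be checked without an LP library by enumerating the vertices of the feasible region. The Python sketch below is an illustration, not the solver used for this analysis. The helpers `solve` and `min_steps` are hypothetical names. The approach assumes that an optimal vertex exists, which holds here because every flux is nonnegative and every objective coefficient is nonnegative. Objective coefficients and stoichiometries are copied from the table and formulation above.

```python
from itertools import combinations


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(a, b):
    """Solve the square system a·x = b by Gauss-Jordan elimination with
    partial pivoting; return None if the system is singular."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            return None
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def min_steps(cost, carbon, atp, nadph, need_c=18, need_atp=8, need_nadph=16, tol=1e-9):
    """Minimise cost·x subject to carbon·x = need_c, nadph·x = need_nadph,
    atp·x >= need_atp and x >= 0 (one FA 18:0) by checking every vertex."""
    n = len(cost)
    eqs = [(carbon, need_c), (nadph, need_nadph)]
    unit = [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]
    ges = [(atp, need_atp)] + [(e, 0.0) for e in unit]
    best = None
    for active in combinations(ges, n - len(eqs)):
        cons = eqs + list(active)
        x = solve([g for g, _ in cons], [h for _, h in cons])
        if x is None or any(dot(g, x) < h - tol for g, h in ges):
            continue  # singular or infeasible candidate vertex
        if best is None or dot(cost, x) < dot(cost, best):
            best = x
    return best


# Coefficients copied from the table and formulation above.
ace = min_steps([0, 8, 73], carbon=[2, 0, 0], atp=[-2, 8, -1], nadph=[0, 0, 3])
glcn = min_steps([15, 28.33, 36], carbon=[3.33, 0, 0], atp=[10, 26.67, -1], nadph=[1, 1, 11])

# Absolute fluxes (mmol gCDW-1 hr-1), scaled by the measured lipid productivities.
ace_flux = [0.126 * v for v in ace]
glcn_flux = [0.094 * v for v in glcn]
print([round(v, 3) for v in ace_flux], [round(v, 3) for v in glcn_flux])
```

The scaled solutions come out at roughly 1.13, 0.49, and 0.67 mmol gCDW–1 hr–1 for Ace#1 to Ace#3, and roughly 0.51, 0, and 0.091 for Glcn#1 to Glcn#3. The small difference from the tabulated 1.14 for Ace#1 is within the rounding of the reported 0.126. Vertex enumeration scales combinatorially, so for larger problems a dedicated LP solver is the appropriate tool.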

For each nutrient, we used these fluxes to define the flux upper bound (U.B.) of each case. The bound is the sum of three parts: the case's own flux, the fluxes of all cases with more enzymatic steps, and the scaled-back fluxes of all cases with fewer enzymatic steps. The scaling is proportional to the ratio of enzymatic steps, so that total protein burden does not increase. These upper bounds correspond to the maximum rates of substrate utilization through the corresponding cases.

Cases | U.B. (mmol gCDW–1 hr–1)
Ace#1 | 1.14 + 0.49 + 0.67 = 2.30
Ace#2 | 1.14×1/9 + 0.49 + 0.67 = 1.29
Ace#3 | 1.14×1/74 + 0.49×9/74 + 0.67 = 0.75
Glcn#1 | 0.51 + 0 + 0.090 = 0.60
Glcn#2 | 0.51×16/29.33 + 0 + 0.090 = 0.37
Glcn#3 | 0.51×16/37 + 0 + 0.090 = 0.31
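The upper-bound rule can be written as a short function. This Python sketch (the name `upper_bounds` is hypothetical) counts in full any case with at least as many steps as case i, including case i itself. It reproduces the U.B. column from the Flux and # Steps columns of the Y. lipolytica table.

```python
def upper_bounds(fluxes, steps):
    """Upper bound for each case i: its own flux, the full flux of every case
    with more enzymatic steps, and the flux of every case with fewer steps
    scaled back by steps_j / steps_i (so total protein burden does not grow)."""
    bounds = []
    for s_i in steps:
        total = 0.0
        for f_j, s_j in zip(fluxes, steps):
            total += f_j if s_j >= s_i else f_j * s_j / s_i
        bounds.append(total)
    return bounds


ace_ub = upper_bounds([1.14, 0.49, 0.67], [1, 9, 74])        # Ace#1..#3
glcn_ub = upper_bounds([0.51, 0.0, 0.090], [16, 29.33, 37])  # Glcn#1..#3
print([round(b, 3) for b in ace_ub], [round(b, 3) for b in glcn_ub])
```

All three gluconate bounds (about 0.60, 0.37, and 0.31 mmol gCDW–1 hr–1) lie well above the 0.049 mmol gCDW–1 hr–1 feed rate. This is why the feed rate, rather than these bounds, caps the gluconate cases.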

We used these upper bounds to test the feasibility of the observed synergy between acetate and gluconate. Gluconate feed was controlled at 0.049 mmol gCDW–1 hr–1, which is well below the limits for Glcn#1, Glcn#2, and Glcn#3. Therefore, we used 0.049 mmol gCDW–1 hr–1 as the upper bound for the sum of all gluconate utilization cases. To obtain maximum Carbon for FA 18:0 synthesis:

$$
\max f_{C} = \begin{bmatrix} 2 & 0 & 0 & 3.33 & 0 & 0 \end{bmatrix} \mathbf{y},
\qquad \text{where } \mathbf{y} = \begin{bmatrix} \mathrm{Ace\#1} & \mathrm{Ace\#2} & \mathrm{Ace\#3} & \mathrm{Glcn\#1} & \mathrm{Glcn\#2} & \mathrm{Glcn\#3} \end{bmatrix}^{T}
$$

To obtain maximum ATP for FA 18:0 synthesis:

$$
\max f_{ATP} = \begin{bmatrix} -2 & 8 & -1 & 10 & 26.67 & -1 \end{bmatrix} \mathbf{y}
$$

To obtain maximum NADPH for FA 18:0 synthesis:

$$
\max f_{NADPH} = \begin{bmatrix} 0 & 0 & 3 & 1 & 1 & 11 \end{bmatrix} \mathbf{y}
$$

which are all subject to:

$$
\begin{bmatrix} 0 & 0 & 0 & 1 & 1 & 1 \end{bmatrix} \mathbf{y} \le 0.049
\quad \text{and} \quad
\mathbf{0} \le \mathbf{y} \le \begin{bmatrix} 2.30 & 1.29 & 0.75 & 0.049 & 0.049 & 0.049 \end{bmatrix}^{T}
$$

We obtained the following maximum fluxes (mmol gCDW–1 hr–1):

$$
f_{C} = 4.77, \quad f_{ATP} = 11.66, \quad \text{and} \quad f_{NADPH} = 2.78
$$

The corresponding maximum FA 18:0 synthesis rates (mmol gCDW–1 hr–1) are obtained by dividing these fluxes by the stoichiometric requirements:

$$
f_{C}/18 = 0.27, \quad f_{ATP}/8 = 1.46, \quad \text{and} \quad f_{NADPH}/16 = 0.17
$$

Since the limiting component (i.e., NADPH) determines the rate of fatty acid synthesis, we take the minimum of these to obtain the maximum FA 18:0 productivity of 0.17 mmol gCDW–1 hr–1. This value is similar to the observed fatty acid synthesis rate of 0.166 ± 0.008 mmol gCDW–1 hr–1 from acetate with gluconate doping. Thus, we concluded that the observed synergy is theoretically feasible.
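These three maximizations have a special structure: independent box bounds on the acetate cases, plus a single shared unit-weight budget on the gluconate cases whose individual bounds equal that budget. Under this structure, each optimum can be read off directly, without a general LP solver. The Python sketch below uses that shortcut; it is an illustration of this particular structure, not a general LP method. It uses the rounded bounds printed in the formulation, and the names (`max_flux`, `COEF`, `NEED`) are hypothetical.

```python
# Upper bounds on Ace#1..#3 as printed in the formulation (mmol gCDW-1 hr-1)
ACE_UB = [2.30, 1.29, 0.75]
# Gluconate feed rate; caps Glcn#1 + Glcn#2 + Glcn#3 and each case individually
GLCN_FEED = 0.049
# Requirements per FA 18:0 molecule
NEED = {"Carbon": 18, "ATP": 8, "NADPH": 16}
# Objective rows: (coefficients for Ace#1..#3, coefficients for Glcn#1..#3)
COEF = {
    "Carbon": ([2, 0, 0], [3.33, 0, 0]),
    "ATP": ([-2, 8, -1], [10, 26.67, -1]),
    "NADPH": ([0, 0, 3], [1, 1, 11]),
}


def max_flux(ace_coef, glcn_coef):
    """Maximise one objective row under the box and shared-budget constraints.

    Acetate cases are independent boxes [0, ub], so each sits at its bound when
    its coefficient is positive and at zero otherwise. Gluconate cases share a
    single budget (and each individual bound equals that budget), so the whole
    feed goes to the case with the largest positive coefficient.
    """
    total = sum(c * ub for c, ub in zip(ace_coef, ACE_UB) if c > 0)
    total += max(max(glcn_coef), 0) * GLCN_FEED
    return total


fluxes = {k: max_flux(*rows) for k, rows in COEF.items()}
rates = {k: fluxes[k] / NEED[k] for k in fluxes}  # FA 18:0 rate each ingredient allows
limiting = min(rates, key=rates.get)
print({k: round(v, 2) for k, v in fluxes.items()}, limiting, round(rates[limiting], 2))
```

With the rounded bounds, this gives fC ≈ 4.76, fATP ≈ 11.63, and fNADPH ≈ 2.79, close to the reported 4.77, 11.66, and 2.78. It identifies NADPH as the limiting ingredient at ≈0.17 mmol gCDW–1 hr–1 of FA 18:0.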

As for the analysis of PHB and IPP production, the same mathematical framework as above was applied. We assumed that the ratio of productivities between the acetate-only culture and the gluconate-only culture remained constant regardless of the product being considered. Thus, although the absolute product synthesis flux could not be computed without the measured single substrate fluxes, we were able to obtain the percent gains in mixture productivity over the sum of individual-substrate productivities.

For M. thermoacetica:

Cases | ATP | Reducing eqv. (2e–) | Acetate | Pyruvate | # Steps | Role | Flux (mmol gCDW–1 hr–1)
H2#1 | 0.125 | 1 | 0 | 0 | 3 | Max ATP | ~0
H2#2 | 0 | 1 | 0 | 0 | 1 | Fastest e– | Mass transfer rate (R)
Glc#1 | 2 | 2 | 0 | 2 | 15 | Max efficiency | 0.80
Glc#2 | 4 | 0 | 3 | 0 | 31 | Max ATP | 2.67
Ace (required) | >0 | 4 | | | | |

These stoichiometries represent the numbers of products and steps when one H2 or glucose goes through different pathways. The microbial utilization of H2 is widely accepted to be mass transfer limited. However, since we did not observe appreciable acetate production or cell growth in an autotrophic condition with H2 and CO2, we assigned 0 flux for the ATP-producing H2#1 case.

Glc#1 and Glc#2 fluxes were obtained from the observed glucose uptake during glucose + CO2 fermentation and from the cells' acetate yield. Glucose uptake was 3.47 mmol gCDW–1 hr–1 and acetate yield was 77%. Therefore, we computed Glc#2 to be 3.47×0.77 = 2.67. Glc#1 was approximated as the rest of the glucose flux: 3.47 – 2.67 = 0.80. Glc#1 represents maximum ATP efficiency, i.e., the largest number of ATP produced per enzymatic step. We computed flux upper bounds by summing three parts: each case's own flux, the fluxes of all cases with more enzymatic steps, and the scaled-back fluxes of all cases with fewer enzymatic steps. The scaling is proportional to the ratio of enzymatic steps, so that total protein burden does not increase.

Cases | U.B. (mmol gCDW–1 hr–1)
H2#1 | 0 + R×1/3 = R/3
H2#2 | R + 0 = R
Glc#1 | 2.67 + 0.80 = 3.47
Glc#2 | 2.67 + 0.80×15/31 = 3.06
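The glucose-case fluxes and their bounds follow from two measured numbers: glucose uptake (3.47 mmol gCDW–1 hr–1) and acetate yield (77%). The minimal Python sketch below reproduces this arithmetic; variable and function names are hypothetical. Applying the step-scaling rule assigns the larger bound (3.47) to the shorter pathway, Glc#1.

```python
GLC_UPTAKE = 3.47     # mmol gCDW-1 hr-1, glucose + CO2 fermentation
ACETATE_YIELD = 0.77  # acetate carbon / glucose carbon
STEPS = {"Glc#1": 15, "Glc#2": 31}

# Glc#2 converts one glucose to three acetate (full carbon yield), so the
# measured acetate yield sets its share of uptake; Glc#1 takes the remainder.
glc2 = GLC_UPTAKE * ACETATE_YIELD
glc1 = GLC_UPTAKE - glc2

# Upper bounds: own flux, plus the full flux of the longer case, or the flux of
# the shorter case scaled back by the ratio of enzymatic steps.
ub_glc1 = glc1 + glc2
ub_glc2 = glc2 + glc1 * STEPS["Glc#1"] / STEPS["Glc#2"]


def h2_bounds(r):
    """Bounds for (H2#1, H2#2) given H2 mass transfer rate r. H2#1 (3 steps) has
    ~0 flux of its own, so its bound is the H2#2 flux scaled by 1/3."""
    return r / 3, r


print(round(glc1, 2), round(glc2, 2), round(ub_glc1, 2), round(ub_glc2, 2))
```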

Mathematically, we aimed to find a nonnegative flux vector that i) minimizes enzymatic steps, ii) produces the reducing equivalents required for the production of one acetate, and iii) produces more ATP than is required for product synthesis (i.e., >0). The mathematical formulation is as follows:

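$$
\min_{\mathbf{z}} \begin{bmatrix} 3 & 1 & 15 & 31 \end{bmatrix} \mathbf{z}
\quad \text{such that} \quad
\begin{bmatrix} 1 & 1 & 2 & 0 \end{bmatrix} \mathbf{z} = 4, \quad
\begin{bmatrix} 0.125 & 0 & 2 & 4 \end{bmatrix} \mathbf{z} \ge \varepsilon, \quad \text{and} \quad
\mathbf{0} \le \mathbf{z} \le \begin{bmatrix} R/3 \\ R \\ 3.47 \\ 3.06 \end{bmatrix},
\qquad \text{where } \mathbf{z} = \begin{bmatrix} \mathrm{H_2\#1} \\ \mathrm{H_2\#2} \\ \mathrm{Glc\#1} \\ \mathrm{Glc\#2} \end{bmatrix}
$$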

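Here, ε represents an infinitesimally small positive number. When sufficient H2 is provided to the system (R > 4), we get the following solution:

$$
\mathrm{H_2\#1} = 0, \quad \mathrm{H_2\#2} = 4 - \varepsilon, \quad \mathrm{Glc\#1} = \varepsilon/2 > 0, \quad \text{and} \quad \mathrm{Glc\#2} = 0
$$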

This solution indicates that H2#2 and Glc#1 supply reducing power and ATP, respectively, with the fewest enzymatic steps per unit supplied. Cells in a chemostat would therefore be inclined to use H2#2 for reducing power generation and Glc#1 for ATP generation. Our glucose cofeeding system supplied <3.06 mmol gCDW–1 hr–1 of glucose, below the upper bounds of both glucose cases, and thus our Glc#1 flux would equal the glucose feeding rate. Therefore, acetate production is maximized as long as ATP is generated by glucose doping and gas mass transfer (R) is maximized. The maximum acetate productivity (~R/4) is then proportional to the H2 transfer rate.
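The solution of this four-variable problem can be checked numerically by brute-force vertex enumeration. The Python sketch below is illustrative only. R = 10 and ε = 10⁻³ are arbitrary assumed values that satisfy R > 4. The glucose bounds follow the upper-bound table (3.47 for Glc#1, 3.06 for Glc#2), and the helper names are hypothetical. It returns H2#1 = Glc#2 = 0, Glc#1 = ε/2, and H2#2 = 4 − ε.

```python
from itertools import combinations


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(a, b):
    """Solve the square system a·x = b by Gauss-Jordan elimination with
    partial pivoting; return None if the system is singular."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            return None
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def acetogen_lp(r, eps=1e-3, tol=1e-9):
    """Minimise [3 1 15 31]·z subject to [1 1 2 0]·z = 4 (reducing equivalents
    per acetate), [0.125 0 2 4]·z >= eps (ATP), and 0 <= z <= [r/3, r, 3.47, 3.06],
    where z = [H2#1, H2#2, Glc#1, Glc#2]. Brute-force vertex enumeration."""
    cost = [3, 1, 15, 31]
    eqs = [([1, 1, 2, 0], 4.0)]
    upper = [r / 3, r, 3.47, 3.06]
    unit = [[1.0 if j == i else 0.0 for j in range(4)] for i in range(4)]
    ges = ([([0.125, 0, 2, 4], eps)]                                 # ATP > 0
           + [(e, 0.0) for e in unit]                                # z_i >= 0
           + [([-v for v in e], -u) for e, u in zip(unit, upper)])   # z_i <= upper_i
    best = None
    for active in combinations(ges, 4 - len(eqs)):
        cons = eqs + list(active)
        z = solve([g for g, _ in cons], [h for _, h in cons])
        if z is None or any(dot(g, z) < h - tol for g, h in ges):
            continue  # singular or infeasible candidate vertex
        if best is None or dot(cost, z) < dot(cost, best):
            best = z
    return best


z = acetogen_lp(r=10.0)  # hypothetical H2 transfer rate R = 10 (> 4)
print([round(v, 6) for v in z])
```

Substituting H2#2 = 4 − H2#1 − 2·Glc#1 turns the objective into 4 + 2·H2#1 + 13·Glc#1 + 31·Glc#2. Glc#1 is therefore the cheapest source of ATP: about 6.5 steps per ATP, versus 7.75 for Glc#2 and 16 for H2#1. This explains why the optimum uses only the minimum Glc#1 flux needed for ATP.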

4.3 Results

4.3.1 Accelerating lipogenesis from acetate by enhancing NADPH generation in Y. lipolytica

Lipogenesis requires a balanced supply of acetyl-CoA, ATP, and NADPH at a ~1:1:2 ratio. Single substrates, such as glucose and acetate, can provide all three building blocks for lipids (Ratledge and Wynn, 2002; Qiao et al., 2015; Xu et al., 2017). However, lipid synthesis from acetate, despite acetate's direct contribution to acetyl-CoA and ATP, is slower than lipid synthesis from glucose (Figure 4.2a) (Fontanille et al., 2012). This is because in Y. lipolytica, NADPH is generated mainly through oxPPP, which acetate can reach only through a series of ATP-intensive reactions (Liu, Qiao and Stephanopoulos, 2016).

We aimed to enhance acetate-to-lipid conversion by better supplying NADPH. Since glucose can flow more directly into oxPPP than acetate, we provided both acetate and glucose to a Y. lipolytica batch culture. Consistent with the widely accepted phenomenon of catabolite repression (Gancedo, 1992), cells initially consumed only glucose (Figure 4.3). To overcome this selective preference (i.e., diauxie), we devised a fed-batch system in which the same amount of glucose was instead supplied continuously to an acetate culture over the course of fermentation (Figure 4.2b). The feed rate was kept slow to maintain negligible glucose concentrations in the reactor. In this setup, despite the constant introduction of glucose, we observed steadily decreasing acetate and no glucose in the reactor, suggesting simultaneous consumption of the two carbon sources (Figure 4.2c). Furthermore, the fed-batch co-feeding strategy significantly enhanced both growth and lipid production in Y. lipolytica compared to the acetate-only control (Figure 4.2d,e).

Figure 4.2. Continuous glucose cofeeding relieves repression of acetate consumption in Y. lipolytica. (a) Acetate can efficiently support acetyl-CoA and ATP generation through the TCA cycle but not NADPH generation, which requires many enzymatic steps and ATP. Glucose, on the other hand, can produce NADPH more directly through oxPPP. (b) Since glucose batch feeding suppresses acetate consumption, glucose was continuously supplemented in small quantities to the acetate culture. (c) Despite the continuous feeding of glucose, its concentration in the reactor remained at 0 and acetate concentration decreased. Thus, the fed-batch system enabled simultaneous consumption of acetate and glucose. (d) Biomass and (e) lipid production were faster and higher with glucose-“doping” compared to the acetate-only control. (c-e) The center and error bars represent mean ± S.E.M. (n = 3 biologically independent samples).

[Figure 4.3 plot: four panels (glucose, fructose, gluconate, and glycerol supplement), each showing acetate (g/L) and supplement (g/L) concentrations versus time (hr).]

Figure 4.3. Preferential consumption of glucose, fructose, glycerol, and gluconate over acetate by Y. lipolytica. Fermentations were conducted in batch cultures with two biological replicates. Data points represent the mean.

Using the same fed-batch system, we also tested supplementing other substrates that enter metabolism near oxPPP (fructose, glycerol, and gluconate) as metabolic “dopants” to provide NADPH (Figure 4.4a). In all cases, we observed simultaneous consumption of acetate and the supplemented substrate (Figure 4.5). As with glucose, cell growth and lipid production were enhanced (Figure 4.6), even though the supplemental substrates constituted only a small fraction of the carbon consumed (Figure 4.4b). To distinguish whether the increase in lipid production was due to enhanced cellular metabolism or simply to having more cells in the culture, we determined specific growth rates and productivities. Substrate doping nearly doubled both the specific growth rate (Figure 4.4c) and the specific lipid productivity during the nitrogen-replete growth phase (Figure 4.4d). In the nitrogen-depleted lipogenic phase, glucose, fructose, and glycerol cofeeding only modestly enhanced specific productivity, while gluconate cofeeding significantly outperformed all other conditions (Figure 4.4e).

Figure 4.4. Cofeeding substrates near the oxidative pentose phosphate pathway accelerates cell growth and lipogenesis from acetate. (a) Glucose, fructose, glycerol, and gluconate enter central carbon metabolism through upper glycolysis and PPP. (b) Supplementation of these four substrates accounted for ~5% of the total carbon consumed by the cells, and the primary carbon source was acetate. (c) Specific growth rates nearly doubled with substrate cofeeding compared to the acetate-only control. (d) Growth phase (nitrogen-replete) specific lipid productivity nearly doubled with substrate cofeeding. (e) Lipogenic phase (nitrogen-depleted) specific lipid productivity was mildly enhanced by glucose, fructose, or glycerol supplementation. Gluconate-“doping” significantly outperformed the other conditions. Individual data points for each biological replicate are shown as circles. (b-e) The center and error bars represent mean ± S.E.M. (n = 3 biologically independent samples).

[Figure 4.5 plot: three panels (fructose, glycerol, and gluconate supplement), each showing concentration (g/L) versus time (hr).]

Figure 4.5. Simultaneous consumption of acetate and preferred substrates by Y. lipolytica. Fermentations were conducted in fed-batch cultures with the preferred substrates fed in at slow rates. The center and error bars represent mean ± S.E.M. (n = 3 biologically independent samples).

[Figure 4.6 plot: (a) lipid concentration (g/L) and (b) cell dry weight (g/L) versus time (hr), with panels for fructose, glycerol, and gluconate supplement.]

Figure 4.6. Enhanced cell growth and lipid production with substrate cofeeding in Y. lipolytica. (a) Lipid concentration over time in fructose-, glycerol-, and gluconate-supplemented cultures (red) compared to the acetate-only control condition (blue). (b) Cell dry weight over time in fructose-, glycerol-, and gluconate-supplemented cultures (red) compared to the acetate-only control condition (blue). The center and error bars represent mean ± S.E.M. (n = 3 biologically independent samples).

4.3.2 Recursive NADPH generation via the pentose cycle

To understand the mechanism of accelerated lipid production, we aimed to elucidate how continuous gluconate supplementation rewires metabolism. Tracing the carbons from [U-13C6]gluconate by liquid chromatography-mass spectrometry (LC-MS), we observed that the 13C atoms were confined to the PPP and upper glycolysis (Figure 4.7a and Appendix B Table B1). Gluconate enters metabolism as 6-phosphogluconate (6PG), which can only go in the oxidative direction through oxPPP. This is because the combined thermodynamics of glucose-6-phosphate dehydrogenase and 6-phosphogluconolactonase (ΔG°’ = –29 kJ mol–1) strongly favor the flow of 6PG further into PPP (Casazza and Veech, 1986). As a result, gluconate obligatorily generates NADPH via 6PG dehydrogenase, which is likely responsible for the acceleration of lipogenesis.

On the other hand, metabolites in the TCA cycle as well as fatty acids were completely unlabeled, indicating that lipogenic acetyl-CoA and ATP were derived exclusively from acetate (Figure 4.7a and Appendix B Table B1). These labeling data suggested a partitioned use of metabolism, in which acetate primarily provided acetyl-CoA and ATP while gluconate primarily provided NADPH to meet the metabolic demands of lipogenesis.

To further validate the hypothesis that gluconate enhances lipogenesis through NADPH supplementation, we performed metabolic flux analysis using the labeling data, substrate uptake rates, and lipid production rate. The flux distribution that best fit all these measurements revealed a strong flux through the NADPH-generating steps of oxPPP (Figure 4.7b and Appendix B Table B2). Interestingly, phosphoglucose isomerase operated in the reverse direction, converting fructose-6-phosphate (F6P) to glucose-6-phosphate (G6P). The flux analysis also revealed that gluconeogenic, oxPPP, and non-oxPPP fluxes together form a metabolic cycle, which we termed the “pentose cycle” (Figure 4.7b and Appendix B Table B3). Akin to the TCA cycle, the pentose cycle recursively oxidized the carbons from gluconate into CO2 while preserving the electrons as NADPH for lipogenesis, maximizing the dopant substrate's role as an NADPH provider.

Figure 4.7. Gluconate generates NADPH via the pentose cycle. (a) Tracing carbons from [U-13C6]gluconate revealed partitioned usage of metabolism. The 13C of gluconate remained mainly in upper glycolysis and PPP. Acetyl-CoA and TCA intermediates were completely unlabeled, indicating exclusive contribution from acetate. (b) Metabolic flux analysis via isotope mass balancing revealed the cyclic reaction sequence generating NADPH. The “pentose cycle” consisted of the NADPH-producing oxPPP, transketolase, transaldolase, and phosphoglucose isomerase. Flux values are in mmol gCDW–1 hr–1.

4.3.3 Preferential use of glucose leads to excessive decarboxylation in CO2-fixing M. thermoacetica

Acetogenesis is a reductive metabolic process that produces acetate from CO2. In acetogenic organisms, the reductive acetyl-CoA pathway incorporates CO2 as the carbonyl and methyl components of the acetyl group (Figure 4.8a) (Ragsdale and Pierce, 2008). The methyl branch of this pathway requires ATP, which acetogens may recover through acetate production. This ATP conservation contributes to efficient autotrophic CO2 fixation (Mall et al., 2018). However, autotrophic culture conditions, which derive energy solely from inorganic sources (e.g., oxidation of H2), result in slow metabolism and low culture density (Schuchmann and Müller, 2014; Nunoura et al., 2018).

Figure 4.8. Glucose generates ATP for CO2 fixation but leads to decarboxylation in M. thermoacetica. (a) The reductive acetyl-CoA pathway consists of the carbonyl and methyl branches for conversion of CO2 into the acetyl group. The methyl branch requires ATP. (b) Analysis of carbon input and output in batch cofeeding of M. thermoacetica with glucose and CO2 revealed the preferential use of glucose. The center and error bars represent mean ± S.E.M. (n = 3 biologically independent samples). (c) Batch cofeeding of [U-13C6]glucose and CO2 revealed the simultaneous use of glucose and CO2. Glucose carbons contributed mainly to glycolysis and PPP and partially to the TCA cycle. A substantial fraction of TCA carbons was traced to CO2. (d) Despite simultaneous utilization of CO2, preferred glucose use led to undesirable decarboxylation outpacing CO2 uptake. Flux values are in mmol gCDW–1 hr–1 of acetyl-CoA. Individual data points for each biological replicate are shown as circles.

Since glycolysis effectively produces the ATP and e– necessary for operating the reductive acetyl-CoA pathway, we co-fed CO2 and [U-13C6]glucose to M. thermoacetica and looked for signs of CO2 incorporation. If acetate were the only product, we would expect up to 100% carbon yield, that is, three acetate molecules per glucose (Schuchmann and Müller, 2014). On the other hand, with other potential products (e.g., pyruvate) or biomass components (e.g., Ser/Gly, Asp, and Glu), net CO2 utilization becomes feasible. Some of these pathways generate reducing agents without producing CO2, and others fix more CO2 than they produce (Figure 4.9). Since net CO2 utilization depends on the types and fractions of fermentation products, we quantified cell growth, the secreted molecules, and their carbon yields relative to glucose consumption (Figure 4.8b). The reductive acetyl-CoA pathway was active: the acetate produced accounted for 77% of glucose carbons, exceeding what is possible via glycolysis alone (67%). However, the glucose carbon consumption rate approximately matched the total carbon output rate of the major products (i.e., biomass, acetate, and formate accounted for 93%), so we did not observe net CO2 utilization.
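The 67% and 100% reference points follow from counting carbons. In the Python sketch below (names hypothetical), glycolysis alone yields at most two acetyl groups per glucose, while full recapture of the two decarboxylated CO2 by the reductive acetyl-CoA pathway yields three acetate. The gap between the observed 77% and the glycolytic 67% corresponds to about 0.62 mol of acetate carbon per mol of glucose, which must have passed through CO2 reduction.

```python
GLUCOSE_C = 6  # carbons per glucose
ACETATE_C = 2  # carbons per acetate


def acetate_carbon_yield(acetate_per_glucose):
    """Fraction of glucose carbon recovered as acetate carbon."""
    return ACETATE_C * acetate_per_glucose / GLUCOSE_C


glycolysis_only = acetate_carbon_yield(2)  # 2 acetyl groups per glucose; 2 CO2 lost
homoacetogenic = acetate_carbon_yield(3)   # both CO2 recaptured by the reductive pathway
observed = 0.77

# Acetate carbon (mol C per mol glucose) beyond what glycolysis alone can supply;
# this portion must have passed through CO2 reduction.
extra_carbon = (observed - glycolysis_only) * GLUCOSE_C
print(f"{glycolysis_only:.0%} {homoacetogenic:.0%} {extra_carbon:.2f}")
```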

We hypothesized that the observed carbon yield was due to insufficient reducing agents available for new CO2 utilization and to cells preferentially consuming glucose over CO2. To trace the fate of 13C-glucose carbons and visualize pathway usage, we measured 13C enrichment in cellular metabolites using LC-MS. Unlabeled CO2 was provided in the headspace, and CO2 remained mostly unlabeled (Figure 4.10). The carbons of glycolytic intermediates were ≥90% labeled except for pyruvate, which was ~50% labeled (Figure 4.8c and Appendix B Table B3). The lower labeling in pyruvate was due to reversible pyruvate:ferredoxin oxidoreductase (PFOR), which can form pyruvate by combining unlabeled CO2 and an acetyl group derived from the reductive acetyl-CoA pathway. With phosphoenolpyruvate (PEP) remaining mostly labeled, the contrasting pyruvate labeling indicated that pyruvate kinase (PEP + ADP → Pyr + ATP) was forward-driven to produce ATP.

[Figure 4.9 schematic: three panels, (A) extracting energy from glucose without generating CO2, (B) fixing CO2 without consuming e–, and (C) generating more e– than required for product CO2 fixation; each maps glucose and CO2 through 3PG (Ser/Gly), PEP, Pyr (Ala), AcCoA, Ace, the Wood-Ljungdahl pathway, and the TCA intermediates Cit, OA (Asp), and αKG (Glu).]

Figure 4.9. Combination of glucose oxidation and some biosynthetic pathways enables net CO2 fixation. (A) Ser and pyruvate/Ala synthesis from glucose produces energy without CO2. (B) Biosynthesis of Asp and Asn, which are derived from oxaloacetate, incorporates CO2 without consuming reducing power. (C) Glu/Gln synthesis produces more reducing power than is required for fixing the CO2 produced in the process.

Interestingly, serine, glycine, and other amino acids derived from pyruvate and TCA cycle intermediates were also half-labeled (Figure 4.8c and Appendix B Table B3). These labeling data suggested shared usage of central metabolism, in which glucose and CO2 jointly contributed to the TCA cycle (and thus to non-aromatic amino acid biosynthesis). However, because glycolysis and the pentose phosphate pathway (and thus the synthesis of nucleotide ribose rings and aromatic amino acids) were driven mainly by glucose, cells incorporated more carbons from glucose.

Therefore, despite the simultaneous consumption of CO2 and glucose, the lack of observable net CO2 fixation resulted from cells prioritizing ATP production (and cell growth). Prioritizing ATP production involves faster glycolysis via faster glucose uptake. This substrate hierarchy favoring glucose subsequently led to excessive pyruvate decarboxylation via PFOR (Figure 4.8d and Appendix B Table B4), which, together with CO2-producing biosynthetic pathways, outpaced CO2 incorporation.

[Figure 4.10 plot: (a) cellular CO2 labeling and (b) headspace CO2 labeling from [U-13C6]glucose, shown as M+0 and M+1 fractions.]

Figure 4.10. Labeling of CO2 in the cell and in the headspace. (a,b) Tracing 13C from [U-13C6]glucose in M. thermoacetica. (a) Cellular CO2 labeling was inferred by comparing the labeling of citrulline and ornithine. (b) Headspace CO2 labeling as measured by GC-MS was consistent with the cellular CO2 labeling. Individual data points for each biological replicate are shown as circles. The center and error bars represent mean ± S.E.M. (n = 3 biologically independent samples).

4.3.4 Accelerating acetate production from CO2 by decoupling e– supply from decarboxylation

Undesirable decarboxylation is coupled to the PFOR step for e– generation from glucose (2 e– generated per CO2 released). We therefore aimed to limit the function of glucose as an e– source by decreasing PFOR flux, and to stimulate net CO2 incorporation (4 e– required per CO2 fixed). On the other hand, sufficient ATP is still required from glucose through pyruvate kinase to avoid slow metabolism and sustain CO2 reduction. We note that acetate production via the reductive acetyl-CoA pathway does not consume net ATP, so cell maintenance (e.g., housekeeping) is the only ATP requirement for converting CO2 to acetate (Figure 4.11). Hence, we implemented glucose-limiting culture environments in a chemostat to reduce the rate of glycolysis such that it supplies the required ATP while overall decarboxylation is slowed (Figure 4.12a). To compensate for the decreased e– availability, cells were provided with H2 as a carbon-free e– source that yields reducing agents without CO2 generation. In addition, low dilution rates (0.009 and 0.017 hr–1) were selected to minimize biomass formation and maximize cell residence time in the reactor.

[Figure 4.11 plot: growth rate (hr–1) and CO2 consumption rate (mmol gCDW–1 hr–1) versus ATP maintenance (mmol gCDW–1 hr–1).]

Figure 4.11. M. thermoacetica growth rate decreases with increasing ATP demand while CO2 consumption drops suddenly to 0 once ATP demand reaches the threshold of ATP generation capability. Using the genome-scale metabolic model, we probed cell growth and CO2 utilization rates with increasing ATP demand. We used the maximum growth objective function. As the ATP cell maintenance cost is the only ATP requirement for converting CO2 into acetate via the reductive acetyl-CoA pathway, we observed that CO2 consumption remained steady until the ATP cell maintenance demand reached the ATP generation capability.

Figure 4.12. Continuous glucose cofeeding accelerates acetogenesis from CO2 fixation at the autotrophic limit. (a) Since glucose batch feeding leads to undesirable decarboxylation, glucose was continuously supplemented in small quantities to the gas-fermenting M. thermoacetica culture. (b,c) Acetate productivity, yield, and CO2 fixation rate at varying ratios of electrons (e–) derived from H2 and glucose. The plots include both batch and chemostat data (Table 4.1). The center and error bars represent mean ± S.E.M. (n = 3 biologically independent samples). (b) Acetate productivity peaked when 91% of e– were derived from H2 and 9% from glucose. On the other hand, carbon yield (acetate produced per glucose consumed) increased with an increasing fraction of electrons from H2. (c) CO2 fixation rate peaked when 9% of e– were derived from glucose and remained high near the autotrophic limit.

Using this glucose-doping system, we obtained productivities and yields at various fractions of electrons derived from glucose versus H2 (Figure 4.12b). In this plot, we also included batch results with or without H2 in the headspace (Table 4.1). Interestingly, the presence of H2 decreased the glucose consumption rate, shifting carbon substrate preferences towards CO2 (Figure 4.13). At steady state, the acetate concentration in the chemostat effluent could exceed 13 g L–1. With decreasing fractions of electrons from glucose, the acetate production rate could be more than 80 times as fast as the glucose feed rate, and the carbon yield monotonically increased to >80 g acetate produced per g glucose consumed. This high yield indicated that the overwhelming majority of acetate and biomass was derived from CO2 rather than glucose. While cell growth rates were slow in the chemostat (growth rate = dilution rate), acetate production remained fast (Figure 4.12b).

Importantly, we found that, at 2% of e– from glucose, glucose doping simultaneously enabled a very high yield (>50 g acetate per g glucose) and a substantial acetate productivity (>9 mmol gCDW–1 hr–1, ~⅓ of the maximum observed productivity).

[Figure 4.13 plot: glucose uptake rate (mmol gCDW–1 hr–1) versus fraction of e– from H2.]

Figure 4.13. M. thermoacetica glucose consumption rate decreases in the presence of H2 and with increasing utilization of H2. In batch cultures of M. thermoacetica (the first two points: ATCC 39073; the last point: ATCC 49707), we measured glucose uptake rates in the absence (the first point) and in the presence (the last two points) of H2 in the headspace. The greater fractional utilization of H2 corresponded to lower glucose consumption. The center and error bars represent mean ± S.E.M. (n = 3 biologically independent samples).

Across the glucose + H2 energy landscape, CO2 fixation rates peaked at 52.7 mmol gCDW–1 hr–1 (2.3 g gCDW–1 hr–1) (Figure 4.12c). Such high rates implied that we not only decreased CO2 generation from pyruvate decarboxylation but also increased the reductive acetyl-CoA pathway flux. Furthermore, the maximum occurred between the two extremes (glucose-only and H2-only). This indicated that the CO2 fixation rate is determined by a balance between reducing agents supplied via H2 and ATP supplied via glucose. Thus, by controlled glucose doping, we decoupled e– supply from decarboxylation, shifted cellular metabolism towards favoring CO2 utilization over glucose, and achieved rapid and continuous CO2 conversion into acetate.
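The mass-based figure in parentheses is a unit conversion using the molar mass of CO2 (44.01 g mol–1). A one-line Python check follows; the helper name is hypothetical.

```python
CO2_MOLAR_MASS = 44.01  # g mol-1


def mmol_to_grams(rate_mmol, molar_mass):
    """Convert a specific rate in mmol gCDW-1 hr-1 to g gCDW-1 hr-1."""
    return rate_mmol * molar_mass / 1000.0


peak_co2_g = mmol_to_grams(52.7, CO2_MOLAR_MASS)
print(f"{peak_co2_g:.2f} g CO2 per gCDW per hr")
```

The result, about 2.32 g gCDW–1 hr–1, rounds to the 2.3 g gCDW–1 hr–1 quoted above.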

4.3.5 Coordination of “doped” acetogenesis and lipogenesis

Coordinating acetogenesis and lipogenesis allows CO2-to-acetate-to-lipid conversion.

Interestingly, the observed acetate and fatty acid productivities from glucose- and gluconate-doping (V12) exceeded not only the productivities measured with individual substrates (V1 or V2) but also the expected productivity for the two substrates combined (V1 + V2) (Figure 4.14a). For acetogenesis, the expected productivity was linearly extrapolated from supplemental glucose feeding combined with CO2 + H2 batch fermentation. For lipogenesis, it was extrapolated from supplemental gluconate feeding combined with acetate batch fermentation.

We attributed the observed synergy (V12 > V1 + V2) to complementary substrate cofeeding. While our 13C labeling experiments showed the roles of glucose and gluconate in ATP and cofactor synthesis, respectively, we sought to define a theoretical framework that illustrates the feasibility of this synergy. To this end, stoichiometric analysis of the different fates of individual substrates was combined with experimentally measured rates of single-substrate acetogenesis and lipogenesis. The maximum carbon, ATP, and electrons attainable with mixed substrates were then evaluated for the two processes. We found that ATP and NADPH generation by glucose and gluconate doping relieved the limiting ingredients for acetate and lipid synthesis, respectively. In conjunction with the primary substrates, the dopants also better balanced the energy and cofactor ratio requirements for reduced bioproduct synthesis (Figure 4.14b).

Figure 4.14. Synergy and coordination of substrate co-feeding accelerate the conversion of CO2 and H2 into lipids. (a) Glucose- and gluconate-doping resulted in synergy (‘Observed’) that accelerated acetogenesis and lipogenesis beyond the linear extrapolation of additional carbon supplement (‘H2 + Glc’ and ‘Ace + Glcn’). The center and error bars represent mean ± S.E.M. (n = 3 biologically independent samples). (b) The maximum CO2 fixation and fatty acid production rates were attained by co-feeding glucose and gluconate in limiting quantities. Stoichiometric analysis of metabolic requirements and burdens revealed the key roles of glucose and gluconate in generating ATP and NADPH, respectively. The dashed arrows denote negligible contributions. (c) In terms of energy efficiency, 95% of H2 energy can be stored as acetate by M. thermoacetica and 55% of acetate energy can be stored as lipids by Y. lipolytica. Coordination of acetogenesis and lipogenesis enabled storage of 38% of H2 energy as lipids and 14% as biomass. (d) The cofeeding approach can also be applied to synthesizing other products, with predicted synergistic productivity that exceeds the sum of individual-substrate productivities. Individual data points for each biological replicate are shown as circles.

In terms of organic carbon yield, the integrated acetogenesis-lipogenesis process converted 1 g of glucose into ~13 g of lipids (0.154 g lipids per g acetate × ~82 g acetate per g glucose) through extensive CO2 utilization. Increasing gas mass transfer rates improves H2 (and CO2) utilization efficiency. It has been reported that ~95% of supplied H2 can be used by commercial CO2-fixing microbes (Daniell, Köpke and Simpson, 2012; Hu, Rismani-Yazdi and Stephanopoulos, 2013). By continuously converting CO2 and H2 into lipids via coordinated acetogenesis and lipogenesis, 38% of the energy from H2 could be stored as lipids and 14% as yeast biomass (Figure 4.14c). Nearly all of the carbon (~99%) in the lipids originated from CO2.
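Spelled out, the organic carbon yield quoted above is simply the product of the two stated factors, rounded as in the text:

$$
\underbrace{0.154\ \frac{\text{g lipids}}{\text{g acetate}}}_{\text{lipogenesis}} \times \underbrace{\sim 82\ \frac{\text{g acetate}}{\text{g glucose}}}_{\text{glucose-doped acetogenesis}} \approx 12.6\ \frac{\text{g lipids}}{\text{g glucose}} \approx 13\ \frac{\text{g lipids}}{\text{g glucose}}
$$

The large second factor reflects that glucose is only a minor dopant, with most of the acetate carbon coming from CO2.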

To further explore the potential of our synergistic co-feeding approach, we applied the stoichiometric analysis to other acetyl-CoA-derived products and determined the resulting productivity gains (Figure 4.14d). As with fatty acids, the model predicted synergy between the substrate pair in producing other reduced compounds. Examples include polyhydroxybutyrate (PHB) and isopentenyl pyrophosphate (IPP, the precursor of isoprenoids), whose predicted productivities exceeded the extrapolated sum (V12 > V1 + V2). Our synergistic substrate co-feeding strategy may therefore stimulate the conversion of CO2 into a wide array of advanced bioproducts.


4.4 Discussion

One of the greatest challenges in biotechnology is engineering metabolism. Current engineering efforts often focus on funneling metabolic fluxes through product synthesis pathways. This is done by assembling combinations of genes and knocking out competing pathways with existing genetic tools (Blazeck et al., 2014). In addition, most processes start from sugars as the sole substrate. Because the chemical energetics of the substrate and the product are mismatched, this inherently leaves some metabolic intermediates out of balance and wastes surplus components. This further necessitates genetic engineering for flux rewiring to achieve industrially relevant production metrics. Such approaches limit the choice of microbial hosts to those that are genetically tractable, and existing strategies are not generalizable to all organisms.

Here we presented synergistic substrate co-feeding as a potentially generalizable method and a more effective starting point for bioproduct synthesis. Importantly, we overcame the difficulties that arise from organisms’ preferential substrate usage: controlled, continuous feeding of a preferred substrate as a metabolic dopant did not inhibit consumption of the less favored substrate. Using this approach, we enhanced the utilization of CO2 and acetate, which are typically end products of metabolism and therefore least preferred by organisms. This was demonstrated in both M. thermoacetica and Y. lipolytica, two organisms with distinct metabolism and genetic manipulability. Two substrate pairs were used: glucose with H2/CO2, and gluconate with acetate. Accordingly, we expect this design to be widely applicable to other substrates and organisms.

Surprisingly, substrate co-feeding synergistically enhanced product synthesis. In both cases, the total product carbon flux from the co-utilized substrates exceeded the sum of the individual substrate fluxes (V12 > V1 + V2). Here V denotes the productivity on either substrate alone (subscript 1 or 2) or on both substrates (subscript 12). In contrast, previous experimental efforts and models describing mixed substrate utilization have shown sublinear productivity enhancements, in which V12 exceeds V1 and V2 individually but V12 < V1 + V2 (Babel, Brinkmann and Müller, 1993; Babel, 2009; Hermsen et al., 2015). The synergistic co-feeding scheme reported here goes beyond “sacrificing” a secondary substrate to provide energetic gains for the primary substrate. We found that under carefully controlled but easy-to-implement conditions, the two substrates can interact in a distinct mode. This mode balances fluxes across the entire metabolic network, achieving substantially higher productivities than previous mixed-substrate fermentation efforts. Importantly, the substantial enhancements in CO2 and acetate reduction required only minor additions of the more “valuable” glucose and gluconate, and the productivity gain outweighed the “cost” of the dopants.

To understand how the dopant substrates achieve such efficiency in enhancing reductive metabolism, we also elucidated the underlying mechanisms. Our stoichiometric analysis of metabolic requirements and burdens suggested that glucose and gluconate, as dopant substrates, could complement ATP and NADPH generation. This would alleviate the limitations in acetogenesis and lipogenesis, respectively. Tracing 13C-labeled glucose and gluconate revealed that nearly all of the glucose and gluconate were utilized locally, within glycolysis and the pentose phosphate pathway (PPP), respectively. Hence, cells used these substrates almost exclusively for ATP and NADPH production. We identified two important cofactor-generating steps:

- pyruvate kinase (PEP + ADP → Pyr + ATP) in M. thermoacetica;
- the pentose cycle (6PG → R5P → F6P → G6P → 6PG + 2 NADPH) in Y. lipolytica.

In particular, activating pyruvate kinase by co-feeding glucose addressed the slow CO2 fixation caused by ATP-limited metabolism in autotrophic fermentations (Daniel et al., 1990; Hu, Rismani-Yazdi and Stephanopoulos, 2013). Activating the pentose cycle by co-feeding gluconate overcame the limited NADPH production through the oxidative PPP (oxPPP) in acetate-fed cells. We therefore rewired metabolism through synergistic cofeeding, without genetic engineering.
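The cofactor yield of the pentose cycle can be checked with a small stoichiometric ledger. The sketch below assumes standard textbook stoichiometry for recycling hexose phosphate through the oxidative and non-oxidative PPP. NADP+, H2O and Pi are omitted, and the transketolase/transaldolase and aldolase/FBPase steps are lumped. The sketch confirms carbon balance in every step and sums the steps into a net reaction.

The result is one hexose phosphate fully oxidized to 6 CO2 with 12 NADPH. This corresponds to 2 NADPH per CO2 released, i.e., per turn of the cycle written above.

```python
from collections import defaultdict

# Carbon atoms per species (cofactors carry no tracked carbon here).
CARBONS = {"G6P": 6, "6PG": 6, "Ru5P": 5, "F6P": 6, "GAP": 3, "CO2": 1, "NADPH": 0}

# Lumped textbook steps (assumed); negative = consumed, positive = produced.
STEPS = [
    {"G6P": -6, "6PG": 6, "NADPH": 6},             # G6P dehydrogenase (+ lactonase)
    {"6PG": -6, "Ru5P": 6, "CO2": 6, "NADPH": 6},  # 6PG dehydrogenase
    {"Ru5P": -6, "F6P": 4, "GAP": 2},              # non-oxidative PPP rearrangements
    {"GAP": -2, "F6P": 1},                         # aldolase + FBPase (lumped)
    {"F6P": -5, "G6P": 5},                         # phosphoglucose isomerase
]


def carbon_balance(step):
    """Net carbon atoms created by a step (0 means balanced)."""
    return sum(coef * CARBONS[s] for s, coef in step.items())


def net_reaction(steps):
    """Sum the steps after checking each is carbon-balanced; drop zero terms."""
    total = defaultdict(int)
    for step in steps:
        assert carbon_balance(step) == 0, step
        for species, coef in step.items():
            total[species] += coef
    return {s: c for s, c in total.items() if c != 0}


net = net_reaction(STEPS)
print(net)  # {'G6P': -1, 'NADPH': 12, 'CO2': 6}
```

In the gluconate-fed case, gluconate enters the cycle at 6PG after phosphorylation, i.e., at the second step of this ledger.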

Akin to the widespread use of dopants in the electronics industry to tune material properties, we envision dopant substrates in synergistic co-feeding becoming valuable in a wide array of biotechnological applications. We demonstrated CO2/H2-to-acetate-to-lipids conversion at high productivity (CO2 fixation of up to 2.3 g gCDW⁻¹ hr⁻¹) and high energetic efficiency (38% energy yield). This serves as an exemplary renewable energy storage strategy using substrates that do not compete with the food supply. Acetate is closely related to acetyl-CoA, a focal point of many metabolic pathways. Other acetate-based processes with appropriate dopant substrates could therefore enable rapid synthesis of a wide repertoire of bioproducts. Examples include fatty acid-derived oleochemicals (Ledesma-Amaro et al., 2016) and mevalonate pathway-derived natural products (Martin et al., 2003). By coupling these processes to glucose-doped acetogenesis, CO2 could become the initial feedstock for all subsequent acetate-driven processes, benefiting both the environment and the carbon economy.

With process optimization and scale-up, the productivities reported here could potentially make these CO2-based biosynthetic processes market-competitive. The findings presented in this chapter therefore contribute to fields ranging from fundamental metabolic control theory to metabolic engineering and CO2 utilization.

Finally, metabolic enhancement by co-feeding complementary dopant substrates is not limited to CO2- and acetate-based fermentations. Imbalances of carbon building blocks, cofactors, and energy relative to product requirements can also be seen in many other single-substrate bioconversions. In these cases, identifying complementary substrates and implementing controlled dopant substrate co-feeding could better coordinate pathway usage for superior biosynthesis. Consequently, some substrates were previously considered infeasible for industrial bioprocesses because of limited productivity. These substrates may become economically and technologically viable feedstocks (Haynes and Gonzalez, 2014).

4.5 References

Antoniewicz, M. R., Kelleher, J. K. and Stephanopoulos, G. (2006) ‘Determination of confidence intervals of metabolic fluxes estimated from stable isotope measurements’, Metabolic Engineering, 8(4), pp. 324–337. doi: 10.1016/j.ymben.2006.01.004.

Antoniewicz, M. R., Kelleher, J. K. and Stephanopoulos, G. (2007) ‘Elementary metabolite units (EMU): A novel framework for modeling isotopic distributions’, Metabolic Engineering, 9(1), pp. 68–86. doi: 10.1016/j.ymben.2006.09.001.

Aristilde, L. et al. (2015) ‘Hierarchy in pentose sugar metabolism in Clostridium acetobutylicum’, Applied and Environmental Microbiology, 81(4), pp. 1452–1462. doi: 10.1128/AEM.03199-14.

Atsumi, S., Hanai, T. and Liao, J. C. (2008) ‘Non-fermentative pathways for synthesis of branched-chain higher alcohols as biofuels’, Nature, 451(7174), pp. 86–89. doi: 10.1038/nature06450.

Babel, W. (2009) ‘The Auxiliary Substrate Concept: From simple considerations to heuristically valuable knowledge’, Engineering in Life Sciences, 9(4), pp. 285–290. doi: 10.1002/elsc.200900027.

Babel, W., Brinkmann, U. and Müller, R. H. (1993) ‘The auxiliary substrate concept — an approach for overcoming limits of microbial performances’, Acta Biotechnologica, 13(3), pp. 211–242. doi: 10.1002/abio.370130302.

Blazeck, J. et al. (2014) ‘Harnessing Yarrowia lipolytica lipogenesis to create a platform for lipid and biofuel production’, Nature Communications, 5, p. 3131. doi: 10.1038/ncomms4131.

Bowes, G., Ogren, W. L. and Hageman, R. H. (1972) ‘Light Saturation, Photosynthesis Rate, RuDP Carboxylase Activity, and Specific Leaf Weight in Soybeans Grown Under Different Light Intensities’, Crop Science, 12(1), pp. 77–79. doi: 10.2135/cropsci1972.0011183x001200010025x.

Bren, A. et al. (2016) ‘Glucose becomes one of the worst carbon sources for E.coli on poor nitrogen sources due to suboptimal levels of cAMP’, Scientific Reports, 6, pp. 2–11. doi: 10.1038/srep24834.

Casazza, J. P. and Veech, R. L. (1986) ‘The interdependence of glycolytic and pentose cycle intermediates in ad libitum fed rats’, Journal of Biological Chemistry, 261(2), pp. 690–698.

Clasquin, M. F., Melamud, E. and Rabinowitz, J. D. (2012) ‘LC-MS data processing with MAVEN: A metabolomic analysis and visualization engine’, Current Protocols in Bioinformatics, pp. 1–23. doi: 10.1002/0471250953.bi1411s37.

Daniel, S. L. et al. (1990) ‘Characterization of the H2- and CO-dependent chemolithotrophic potentials of the acetogens Clostridium thermoaceticum and Acetogenium kivui’, Journal of Bacteriology, 172(8), pp. 4464–4471. doi: 10.1128/jb.172.8.4464-4471.1990.

Daniell, J., Köpke, M. and Simpson, S. D. (2012) ‘Commercial biomass syngas fermentation’, Energies. doi: 10.3390/en5125372.

Fontanille, P. et al. (2012) ‘Bioconversion of volatile fatty acids into lipids by the oleaginous yeast Yarrowia lipolytica’, Bioresource Technology, 114, pp. 443–449. doi: 10.1016/j.biortech.2012.02.091.

Gancedo, J. M. (1992) ‘Carbon catabolite repression in yeast’, European Journal of Biochemistry, 206(2), pp. 297–313. doi: 10.1111/j.1432-1033.1992.tb16928.x.

Görke, B. and Stülke, J. (2008) ‘Carbon catabolite repression in bacteria: Many ways to make the most out of nutrients’, Nature Reviews Microbiology, 6(8), pp. 613–624. doi: 10.1038/nrmicro1932.

Haynes, C. A. and Gonzalez, R. (2014) ‘Rethinking biological activation of methane and conversion to liquid fuels’, Nature Chemical Biology, 10(5), pp. 331–339. doi: 10.1038/nchembio.1509.

Hermsen, R. et al. (2015) ‘A growth-rate composition formula for the growth of E. coli on co-utilized carbon substrates’, Molecular Systems Biology, 11(4), p. 801. doi: 10.15252/msb.20145537.

Hu, P., Rismani-Yazdi, H. and Stephanopoulos, G. (2013) ‘Anaerobic CO2 fixation by the acetogenic bacterium Moorella thermoacetica’, AIChE Journal, 59(9), pp. 3176–3183. doi: 10.1002/aic.

Islam, M. A. et al. (2015) ‘Investigating Moorella thermoacetica metabolism with a genome-scale constraint-based metabolic model’, Integrative Biology, 7(8), pp. 869–882. doi: 10.1039/c5ib00095e.

Jones, S. W. et al. (2016) ‘CO2 fixation by anaerobic non-photosynthetic mixotrophy for improved carbon conversion’, Nature Communications, 7, pp. 1–9. doi: 10.1038/ncomms12800.

Joshua, C. J. et al. (2011) ‘Absence of diauxie during simultaneous utilization of glucose and xylose by Sulfolobus acidocaldarius’, Journal of Bacteriology, 193(6), pp. 1293–1301. doi: 10.1128/JB.01219-10.

Kanno, M., Carroll, A. L. and Atsumi, S. (2017) ‘Global metabolic rewiring for improved CO2 fixation and chemical production in cyanobacteria’, Nature Communications, 8, pp. 1–11. doi: 10.1038/ncomms14724.

Kim, S. M. et al. (2015) ‘Simultaneous utilization of glucose and xylose via novel mechanisms in engineered Escherichia coli’, Metabolic Engineering, 30, pp. 141–148. doi: 10.1016/j.ymben.2015.05.002.

Kita, A. et al. (2013) ‘Development of genetic transformation and heterologous expression system in carboxydotrophic thermophilic acetogen Moorella thermoacetica’, Journal of Bioscience and Bioengineering, 115(4), pp. 347–352. doi: 10.1016/j.jbiosc.2012.10.013.

Ledesma-Amaro, R. et al. (2016) ‘Combining metabolic engineering and process optimization to improve production and secretion of fatty acids’, Metabolic Engineering, 38, pp. 38–46. doi: 10.1016/j.ymben.2016.06.004.

Ledesma-Amaro, R. and Nicaud, J. M. (2016) ‘Metabolic Engineering for Expanding the Substrate Range of Yarrowia lipolytica’, Trends in Biotechnology, 34(10), pp. 798–809. doi: 10.1016/j.tibtech.2016.04.010.

Liu, N., Qiao, K. and Stephanopoulos, G. (2016) ‘13C Metabolic Flux Analysis of acetate conversion to lipids by Yarrowia lipolytica’, Metabolic Engineering, 38, pp. 86–97. doi: 10.1016/j.ymben.2016.06.006.

Mall, A. et al. (2018) ‘Reversibility of citrate synthase allows autotrophic growth of a thermophilic bacterium’, Science, 359, pp. 563–567. doi: 10.1126/science.aao2410.

Martin, V. J. J. et al. (2003) ‘Engineering a mevalonate pathway in Escherichia coli for production of terpenoids’, Nature Biotechnology, 21(7), pp. 796–802. doi: 10.1038/nbt833.

Martínez, K. et al. (2008) ‘Coutilization of glucose and glycerol enhances the production of aromatic compounds in an Escherichia coli strain lacking the phosphoenolpyruvate: Carbohydrate phosphotransferase system’, Microbial Cell Factories, 7, pp. 1–12. doi: 10.1186/1475-2859-7-1.

Meyer, F. et al. (2018) ‘Methanol-essential growth of Escherichia coli’, Nature Communications, 9(1). doi: 10.1038/s41467-018-03937-y.

Michaelis, L. and Guzman Barron, E. (1929) ‘Oxidation-reduction systems of biological significance. II. Reducing effect of cysteine induced by free metals’, Journal of Biological Chemistry, 81, pp. 29–40.

Monod, J. (1942) Recherches sur la croissance des cultures bacteriennes.

Nunoura, T. et al. (2018) ‘A primordial and reversible TCA cycle in a facultatively chemolithoautotrophic thermophile’, Science, 359, pp. 559–563. doi: 10.1126/science.aao3407.

Qiao, K. et al. (2015) ‘Engineering lipid overproduction in the oleaginous yeast Yarrowia lipolytica’, Metabolic Engineering, 29, pp. 56–65. doi: 10.1016/j.ymben.2015.02.005.

Qiao, K. et al. (2017) ‘Lipid production in Yarrowia lipolytica is maximized by engineering cytosolic redox metabolism’, Nature Biotechnology, 35(2), pp. 173–177. doi: 10.1038/nbt.3763.

Rabinowitz, J. D. and Kimball, E. (2007) ‘Acidic acetonitrile for cellular metabolome extraction from Escherichia coli’, Analytical Chemistry, 79(16), pp. 6167–6173. doi: 10.1021/ac070470c.

Ragsdale, S. W. and Pierce, E. (2008) ‘Acetogenesis and the Wood-Ljungdahl pathway of CO2 fixation’, Biochimica et Biophysica Acta - Proteins and Proteomics, 1784(12), pp. 1873–1898. doi: 10.1016/j.bbapap.2008.08.012.

Ratledge, C. and Wynn, J. P. (2002) ‘The biochemistry and molecular biology of lipid accumulation in oleaginous microorganisms’, Advances in Applied Microbiology, 51, pp. 1–51. doi: 10.1016/S0065-2164(02)51000-5.

Sanchez, R. G. et al. (2010) ‘Improved xylose and arabinose utilization by an industrial recombinant’, Biotechnology for Biofuels, 3(13). doi: 10.1186/1754-6834-3-13.

Schellenberger, J. et al. (2011) ‘Quantitative prediction of cellular metabolism with constraint-based models: The COBRA Toolbox v2.0’, Nature Protocols, 6(9), pp. 1290–1307. doi: 10.1038/nprot.2011.308.

Schuchmann, K. and Müller, V. (2014) ‘Autotrophy at the thermodynamic limit of life: A model for energy conservation in acetogenic bacteria’, Nature Reviews Microbiology. doi: 10.1038/nrmicro3365.

Tai, M. and Stephanopoulos, G. (2013) ‘Engineering the push and pull of lipid biosynthesis in oleaginous yeast Yarrowia lipolytica for biofuel production’, Metabolic Engineering, 15(1), pp. 1–9. doi: 10.1016/j.ymben.2012.08.007.

Tracy, B. P. et al. (2012) ‘: The importance of their exceptional substrate and metabolite diversity for biofuel and biorefinery applications’, Current Opinion in Biotechnology, 23(3), pp. 364–381. doi: 10.1016/j.copbio.2011.10.008.

Xu, J. et al. (2017) ‘Application of metabolic controls for the maximization of lipid production in semicontinuous fermentation’, Proceedings of the National Academy of Sciences, 114(27), pp. E5308–E5316.

Xue, Z. et al. (2013) ‘Production of omega-3 eicosapentaenoic acid by metabolic engineering of Yarrowia lipolytica’, Nature Biotechnology, 31(8), pp. 734–740. doi: 10.1038/nbt.2622.


Chapter 5

A Recommender System for Host Organism Selection


5.1 Introduction

With maturing high-throughput methods and the advent of multi-omics techniques, a plethora of biological data has become available and widely accessible. This has advanced data-driven and machine learning approaches in biotechnology (Presnell and Alper, 2019). Given abundant high-quality data, these algorithms can uncover the underlying characteristics of the input information and generate meaningful predictions. In doing so, they circumvent the lack of mechanistic understanding in many complex systems (Bishop, 2006). Consequently, they have been successfully employed in many biotechnological areas, with many more applications under development:

- disease diagnostics (Shipp et al., 2002; Tsai et al., 2020);
- genome annotation (Libbrecht and Noble, 2015; Ryu, Kim and Lee, 2019);
- enzyme engineering (Wu et al., 2019; Yang, Wu and Arnold, 2019).

Metabolic engineering is an integral component of industrial biotechnology that focuses on modulating an organism’s metabolism to synthesize various products. It also benefits from the availability of relevant data and the implementation of data-based models. In optimizing a strain for production, automated algorithms for searching enzymatic reactions (Mellor et al., 2016) can be combined with machine learning-guided directed evolution (Yang, Wu and Arnold, 2019) for pathway reconstruction. Together, these form the basis of substrate-to-product conversion.

Additionally, statistical methods applied to transcriptomics (Guan et al., 2018), proteomics (Alonso-Gutierrez et al., 2015), and metabolomics (Ohtake et al., 2017) can guide the rewiring of pathway fluxes. Further fine-tuning based on the outputs of predictive algorithms (Costello and Martin, 2018; Han et al., 2019) can then be conducted to maximize production metrics. In microbial fermentation, machine learning techniques have also been integrated for better control of process parameters (Gadkar, Mehra and Gomes, 2005).

These data-driven techniques complement knowledge- or experience-based practices in many aspects of metabolic engineering. However, they have rarely been applied to host organism selection, which can largely dictate the chances of successfully constructing an efficient strain.

Here we report a recommender system approach for choosing a suitable microbe for a specific product, or vice versa. Media streaming services (e.g., Netflix) recommend items to their users based on a user-item-rating dataset built from numerous viewing histories. In the same way, our work applies the method to a strain-product-titer dataset, with the goal of suggesting strain-product combinations that can potentially achieve high production metrics. The dataset was constructed from the metabolic engineering and synthetic biology literature and then manually curated.

Analyzing both the trends in the dataset and the recommendation lists obtained from the machine learning algorithm reveals two interesting findings. First, the results indicate that the strengths and weaknesses of most organisms are consistent with our experience. They also highlight situations where strains may be better suited for applications outside their traditional use cases. Second, the recommender system suggests when to use model organisms (i.e., Escherichia coli and Saccharomyces cerevisiae) versus non-model organisms. In particular, for simple bioproducts with well-characterized pathways, non-model microbes that are naturally high producers generally outcompete the model microbes. On the other hand, for more sophisticated compounds such as natural products, model organisms that allow extensive engineering should be the choice. The raw data collected from the literature support the same conclusions.

Overall, this work offers an alternative perspective on the host organism selection problem, in that it attempts to summarize the principles for choosing suitable strain-product pairs as suggested by recent literature results. With further refinement of the data, this framework can be a valuable tool for initiating metabolic engineering research aimed at creating high-performing strains for industrial biotechnology.

5.2 Methods

5.2.1 Data collection and processing

The raw data in this study were collected from abstracts of journal articles published between 1999 and mid-2019. Approximately 50,000 abstracts were retrieved from the Web of Science database using combinations of two keywords, one from each group:

Group I keywords: “metabolic engineering” and “synthetic biology”

Group II keywords: “product” and genus names of identified strains (see below)

The initial screening retained only abstracts containing a concentration unit (g/L, mg/L, mol/L, mmol/L, M, mM, or their formatting variants) as an indicator that a titer value was reported. Additionally, studies related to in vitro biocatalysis, co-cultures/consortia, biohydrogen production, wastewater treatment, protein production, and biomass production were removed.
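The unit-based screen can be approximated with a regular expression. The pattern below is a hypothetical reconstruction, not the script used in this study. It looks for a number followed by g/L, mg/L, mol/L, mmol/L, M or mM, plus the common "g L-1" style. Requiring a number before the unit means species abbreviations such as "M. thermoacetica" are ignored. Rejecting matches followed directly by a letter or digit keeps strings like "1.5 Mb" from being read as molar concentrations.

```python
import re

# Hypothetical screening pattern (an assumption, not the study's script).
_AMOUNT = r"(?:mmol|mol|mg|g)"                       # numerator of a per-litre unit
_PER_LITRE = r"(?:\s*/\s*[Ll]|\s*[Ll]\s*[-\u2212]1)"  # "/L", "/l", " L-1", " L−1"
_MOLAR = r"(?:mM|M)"                                 # molar units (case-sensitive)
TITER_RE = re.compile(
    r"\d+(?:\.\d+)?\s*(?:" + _AMOUNT + _PER_LITRE + "|" + _MOLAR + r")(?![A-Za-z0-9])"
)


def reports_titer(abstract: str) -> bool:
    """Return True if the abstract contains something that looks like a titer."""
    return TITER_RE.search(abstract) is not None


abstracts = [
    "The engineered strain produced 12.5 g/L of lactate.",
    "M. thermoacetica was grown in 10 mL serum vials.",
]
kept = [a for a in abstracts if reports_titer(a)]  # keeps only the first abstract
```

A pattern like this will still accept some concentrations that are not titers (e.g., "2 M NaCl" in a media description). This is one reason manual verification of the extracted candidates remains necessary.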

Preprocessing scripts were also used to propose candidate strains, product names, and titer values for each abstract. These candidates were then manually verified: the correct strain, product, and titer were selected, or entered manually if none of the candidates was correct. The manually verified strain and product names were collected into two separate lists. The preprocessing scripts then used these lists to identify strains and products in subsequent screenings. The strain list also served as a source of group II keywords for the abstract search.

Following raw data collection, the dataset was manually curated to remove any remaining irrelevant entries, including those related to the excluded topics listed above. Any strain with fewer than 5 associated entries in the dataset was consolidated into its genus to reduce variability in strain type. Products were classified into 46 distinct product classes based on common precursors or metabolic pathway usage. Finally, all titer values were converted to g/L by identifying the original unit and applying the corresponding conversion factor. The strains, consolidated strains, product names, and product classes were each assigned a unique numeric identifier for data processing and algorithm implementation. The full dataset can be accessed online along with the full recommendation lists provided by this study.
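Two bookkeeping steps from this paragraph can be sketched as follows: converting titers to g/L, and consolidating sparsely represented strains into their genus. Both functions are hypothetical illustrations. Converting molar units requires the product's molecular weight, which is assumed to be supplied by the caller. The "Genus sp." label mirrors names such as "Streptomyces sp." in Figure 5.1 and assumes each strain is recorded by its full binomial name.

```python
from collections import Counter

# Hypothetical conversion factors to g/L. Molar units also need the
# product's molecular weight (g/mol), assumed to be looked up separately.
_MASS_FACTORS = {"g/L": 1.0, "mg/L": 1e-3}
_MOLAR_FACTORS = {"mol/L": 1.0, "M": 1.0, "mmol/L": 1e-3, "mM": 1e-3}


def to_g_per_L(value, unit, mol_weight=None):
    """Convert a titer to g/L; molar units require mol_weight in g/mol."""
    if unit in _MASS_FACTORS:
        return value * _MASS_FACTORS[unit]
    if unit in _MOLAR_FACTORS:
        if mol_weight is None:
            raise ValueError(f"molecular weight needed to convert {unit}")
        return value * _MOLAR_FACTORS[unit] * mol_weight
    raise ValueError(f"unrecognised unit: {unit}")


def consolidate_strains(strains, min_entries=5):
    """Replace strains with fewer than min_entries entries by a 'Genus sp.' label.

    Assumes full binomial names such as 'Klebsiella oxytoca'.
    """
    counts = Counter(strains)
    return [s if counts[s] >= min_entries else f"{s.split()[0]} sp." for s in strains]


print(to_g_per_L(0.5, "mM", mol_weight=180.16))  # 0.5 mM glucose, about 0.090 g/L
```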

Prior to implementing the recommender system algorithm on this dataset, some additional processing was performed. When multiple entries were associated with the same strain-product combination, the median, average, or maximum titer value was used as its representative value. The algorithm's performance was evaluated with each of the three choices, and using the median gave the best results (see section 5.3.2). In addition, the raw titer values were collected into 46 separate groups according to product class. Within each group, the titers were normalized to a titer rating between 1 and 10. This was done by first calculating the log10 value of each raw titer and then applying a linear map within each product class. The lowest log10(titer) value was assigned a rating of 1, the highest was assigned a rating of 10, and intermediate values were scaled linearly in between.

These titer ratings were used to populate a sparse matrix, with each row representing a strain and each column representing a product. All data processing and analysis were conducted using a combination of Microsoft Excel and MATLAB 2018a.
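The aggregation, rating and matrix-assembly steps can be sketched in Python, although the study itself used Excel and MATLAB. This is an illustrative sketch under three stated assumptions. First, the median is taken per strain-product pair before log-scaling. Second, a product class containing only one distinct value is given the midpoint rating of 5.5. Third, missing pairs are stored as NaN in a dense NumPy array rather than in a dedicated sparse format. Each rating follows rating = 1 + 9 · (log10 t − min) / (max − min), with min and max taken within the product class.

```python
import math
from collections import defaultdict
from statistics import median

import numpy as np


def titer_ratings(records, product_class):
    """Return {(strain, product): rating}, with ratings on a 1-10 log scale.

    records: iterable of (strain, product, titer_g_per_L) with titer > 0.
    product_class: dict mapping each product to its product-class label.
    """
    # 1) Collapse repeated reports of the same strain-product pair to their median.
    titers = defaultdict(list)
    for strain, product, titer in records:
        titers[(strain, product)].append(titer)
    log_titer = {pair: math.log10(median(vals)) for pair, vals in titers.items()}

    # 2) Find the min and max log10(titer) within each product class.
    per_class = defaultdict(list)
    for (_, product), value in log_titer.items():
        per_class[product_class[product]].append(value)
    bounds = {cls: (min(v), max(v)) for cls, v in per_class.items()}

    # 3) Map each value linearly so the class minimum -> 1 and maximum -> 10.
    ratings = {}
    for (strain, product), value in log_titer.items():
        lo, hi = bounds[product_class[product]]
        # Assumption: a class with a single distinct value gets the midpoint, 5.5.
        ratings[(strain, product)] = 5.5 if hi == lo else 1 + 9 * (value - lo) / (hi - lo)
    return ratings


def rating_matrix(ratings, strains, products):
    """Strain x product array of ratings; NaN marks unobserved pairs."""
    row = {s: i for i, s in enumerate(strains)}
    col = {p: j for j, p in enumerate(products)}
    R = np.full((len(strains), len(products)), np.nan)
    for (strain, product), value in ratings.items():
        R[row[strain], col[product]] = value
    return R
```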


5.2.2 The recommender system algorithm

The sparse matrix of titer ratings, constructed as described in the previous section, served as the input to the recommender system algorithm. Essentially, the algorithm performs dimensionality reduction, decomposing the initial data matrix into two or more smaller matrices of lower rank. In other words, the following objective function (or one of similar form, depending on the specific algorithm used) is minimized:

$$\underset{\mathbf{U},\mathbf{V}}{\operatorname{argmin}} \sum_{(i,j)\in\mathbf{R}} \left(\mathbf{U}_i \cdot \mathbf{V}_j - R_{ij}\right)^2 + \lambda \sum_{i=1}^{m} \lVert \mathbf{U}_i \rVert^2 + \lambda \sum_{j=1}^{n} \lVert \mathbf{V}_j \rVert^2$$

In this objective:

- R is the m×n original (sparse) data matrix, and the first sum runs over its observed entries (i,j);
- U and V are the two matrices obtained from the decomposition, of size m×k and k×n, respectively, with U_i denoting the i-th row of U and V_j the j-th column of V;
- λ is the tuning parameter for regularization (regularization may or may not be included, depending on the algorithm used).

Note that k ≪ min(m, n), which is why this is a dimensionality reduction process. Depending on the algorithm, U and V may have different representations or may themselves be products of other matrices, but the general idea remains the same. Once U and V are obtained, the prediction matrix is simply their product:

$$\mathbf{R}_{\text{pred}} = \mathbf{U}\mathbf{V}$$

where Rpred is now a completely filled matrix. In our study, we examined two basic algorithms that can achieve this goal with missing data in the input: alternating least squares (ALS) and probabilistic principal component analysis (PPCA) (Ilin and Raiko, 2010). Both algorithms were implemented in MATLAB 2018a following Severson, Molaro and Braatz (2017).
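The sketch below is a minimal regularized ALS in Python that minimizes the objective above over the observed entries only. It is not the Severson, Molaro and Braatz implementation used in this study, and it omits PPCA entirely. NaN marks missing ratings. The initialization scale, iteration count and default hyperparameters are arbitrary assumptions. Each half-step fixes one factor and solves a small k × k ridge-regression system per row of U (or per column of V), which keeps the alternating updates cheap.

```python
import numpy as np


def als_complete(R, k=2, lam=0.1, n_iter=50, seed=0):
    """Regularized ALS on the observed (non-NaN) entries of R.

    Returns U (m x k), V (k x n) and the filled prediction matrix U @ V.
    """
    rng = np.random.default_rng(seed)
    m, n = R.shape
    mask = ~np.isnan(R)
    U = rng.normal(scale=0.1, size=(m, k))
    V = rng.normal(scale=0.1, size=(k, n))
    I = np.eye(k)
    for _ in range(n_iter):
        # Fix V; each row U_i solves (Vo Vo^T + lam I) U_i = Vo R_i,obs.
        for i in range(m):
            cols = mask[i]
            if cols.any():
                Vo = V[:, cols]
                U[i] = np.linalg.solve(Vo @ Vo.T + lam * I, Vo @ R[i, cols])
        # Fix U; each column V_j solves (Uo^T Uo + lam I) V_j = Uo^T R_obs,j.
        for j in range(n):
            rows = mask[:, j]
            if rows.any():
                Uo = U[rows]
                V[:, j] = np.linalg.solve(Uo.T @ Uo + lam * I, Uo.T @ R[rows, j])
    return U, V, U @ V
```

In practice, k and λ would be chosen by the cross-validation procedure described below rather than fixed by hand.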

To evaluate algorithm performance, the order of the dataset was first randomized. The data were then split into two groups: 10% was allocated for testing (test data), and the rest was used for building predictions (training and validation data). To train the model and formulate predictions, the following procedure was adopted. First, 10-fold cross-validation was performed on the combined training + validation data, and the average validation root-mean-square error (RMSEval) across the 10 folds was recorded. The cross-validation was repeated over 10 epochs, with the data re-randomized before each epoch. This minimized the impact of noisy data being grouped into a single validation batch. The resulting RMSEval is therefore the average across 100 validation instances (10 epochs × 10 folds per epoch).
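The 10 × 10 repeated cross-validation scheme can be expressed as an index generator. The sketch below illustrates one assumed way of drawing the splits (NumPy-based, with a fixed seed); it is not the study's code. RMSEval would then be the mean of the validation RMSE over all 100 yielded splits.

```python
import numpy as np


def repeated_kfold(n_samples, n_folds=10, n_repeats=10, seed=0):
    """Yield (train_idx, val_idx) for n_repeats independently reshuffled k-fold splits.

    With the defaults this gives 10 x 10 = 100 validation instances.
    """
    rng = np.random.default_rng(seed)
    for _ in range(n_repeats):
        order = rng.permutation(n_samples)       # re-randomize before each epoch
        folds = np.array_split(order, n_folds)   # near-equal folds
        for f in range(n_folds):
            val_idx = folds[f]
            train_idx = np.concatenate([folds[g] for g in range(n_folds) if g != f])
            yield train_idx, val_idx
```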

This process was then repeated for different hyperparameter values (k and λ), and the values that minimized RMSEval were selected. Using the chosen hyperparameters, the model was trained on the combined training + validation data, and the resulting predictions were compared with the test data. Prediction accuracy was quantified with two metrics. The first was the RMSE for the test set (RMSEtest). The second was a classification accuracy, defined as the percentage of test data points that the algorithm correctly classified as a high-performing (titer rating ≥ 5.5) or low-performing (titer rating < 5.5) strain-product pair. Finally, the entire testing procedure was repeated until every data point in the original dataset had been tested. The average testing error and accuracy, along with their standard deviations, were used to compare the performance of the ALS and PPCA algorithms. To generate the final predictions and recommendations, the better-performing algorithm was trained on the entire dataset.
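The two test metrics can be computed as in the sketch below: RMSEtest, and the high/low classification accuracy at the 5.5 rating cut-off. This is an illustrative implementation, and the function names are hypothetical. The accuracy is returned as a fraction; multiply by 100 for a percentage.

```python
import numpy as np

THRESHOLD = 5.5  # rating cut-off between high- and low-performing pairs, as in the text


def rmse(pred, true):
    """Root-mean-square error between predicted and true ratings."""
    pred, true = np.asarray(pred, dtype=float), np.asarray(true, dtype=float)
    return float(np.sqrt(np.mean((pred - true) ** 2)))


def high_low_accuracy(pred, true, threshold=THRESHOLD):
    """Fraction of pairs whose predicted and true ratings fall on the same side of the cut-off."""
    pred, true = np.asarray(pred, dtype=float), np.asarray(true, dtype=float)
    return float(np.mean((pred >= threshold) == (true >= threshold)))
```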

5.3 Results

5.3.1 Characteristics of the dataset

After collecting strain-product-titer information from the literature and manually curating the individual entries, we obtained a dataset containing 334 unique organisms and 564 unique products. Consolidating the less-studied organisms into their respective genera (see section 5.2.1) reduced the total number of organisms to 195. The dataset contained 2,632 valid entries. Of these, 1,188 were unique strain-product pairs, and the rest were additional reports of the same strain-product combination at a different titer. Hence, the final data matrix had 195 rows (consolidated organisms) by 564 columns (products), with 1,188 filled elements, a matrix density of 1.08%. All subsequent analyses use the consolidated list of organisms rather than the full set. We do not expect this dataset to be an exhaustive collection of all strain and metabolic engineering results in the literature. Nonetheless, it should reflect the general activity of the field over the past two decades.
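The matrix density follows directly from the counts above:

$$\text{density} = \frac{1{,}188}{195 \times 564} = \frac{1{,}188}{109{,}980} \approx 0.0108 = 1.08\%$$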

Upon inspecting the dataset, we immediately noticed that the model organisms Escherichia coli and Saccharomyces cerevisiae accounted for the largest share of entries (45%, Figure 5.1a). This is not surprising, given that most research in industrial biotechnology is conducted in these two microbes. This is owing to the availability of genetic tools and biological knowledge, as well as prior engineering and culture experience. Nevertheless, Figure 5.1b shows the ten most-studied organisms in our dataset, and it indicates that a number of other microbes are widely researched and suited for various applications. Examples include:

- Corynebacterium glutamicum for amino acid production (45% of all C. glutamicum entries);
- Yarrowia lipolytica for the biosynthesis of acetyl-CoA-derived molecules (62% of all Y. lipolytica entries);
- Klebsiella pneumoniae for diol production (62% of all K. pneumoniae entries).

Several other species that serve as representative organisms for studying unique biology and metabolism are also prevalent in the dataset (Figure 5.1). These include the gram-positive bacterium Bacillus subtilis, the filamentous fungus Aspergillus niger, and the photosynthetic cyanobacterium Synechococcus elongatus. We define unconventional microbes as organisms other than E. coli or S. cerevisiae. Together, the unconventional microbes among the top ten account for 19% of the total entries in our dataset (Figure 5.1a). This suggests that a substantial amount of metabolic engineering research uses organisms with relatively underdeveloped tools.

We also analyzed the product diversity of each strain in our dataset by counting its associated unique products. This indicates how versatile each microbe is and serves as a surrogate for how well it can be engineered. When ranked by this metric, the list of top ten strains remains largely the same as the ranking by number of entries (Figure 5.1 and Figure 5.2). This reflects that the engineerability of a strain contributes to its ‘popularity’.

[Figure 5.1 image: (a) fraction of dataset entries by organism, with the remaining organisms grouped as ‘Other’; (b) number of entries for each of the top ten organisms: E. coli, S. cerevisiae, C. glutamicum, Y. lipolytica, B. subtilis, K. pneumoniae, Streptomyces sp., A. niger, S. elongatus, and K. marxianus.]

Figure 5.1. Most used organisms ranked by the number of associated entries in the dataset. (a) Number of entries associated with each strain relative to the total number of entries in the dataset (2,632). Only the top ten are shown. E. coli and S. cerevisiae, the two model organisms, account for the largest share of studies in the dataset (45%). However, several non-conventional organisms have also received a great deal of attention, contributing 19% of total entries. (b) Absolute number of entries associated with each strain.

[Figure 5.2 plot. y-axis: Number of unique products; organisms shown: E. coli, S. cerevisiae, C. glutamicum, Y. lipolytica, B. subtilis, K. pneumoniae, Streptomyces sp., P. putida, P. pastoris, S. elongatus.]

Figure 5.2. Top ten most used organisms ranked by the number of unique products. This ranking reflects the versatility of an organism at synthesizing bioproducts. It also reflects, to a certain degree, the engineerability of the organism.

We also performed a similar analysis on the products listed in our dataset. Here, the distribution of entries was more diverse, with the ten most-studied products accounting for only 25% of the dataset (Figure 5.3a). This reflects the flexibility of microbial biosynthesis.

Ethanol remained the most popular product among metabolic engineering studies (Figure 5.3a,b). This is presumably because it is simple to produce and because of the influence of first- and second-generation biofuels. Other commonly encountered biofuel products, such as lipids (i.e., single-cell oil), fatty acids, and butanol, also appear on the list (Figure 5.3a,b). Apart from fuels, certain industrially relevant chemicals (diols, lactic acid, and TCA cycle acids) have also been investigated numerous times (Figure 5.3a,b). Note that all of these products are either central carbon intermediates or can be derived from central carbon metabolism within several well-characterized enzymatic steps. Additionally, most of these products have well-known high producers, such as S. cerevisiae for ethanol, Lactobacillus sp. for lactic acid, and Y. lipolytica for lipids. Hence, they tend to receive the most attention. Nearly all of the most-studied products in our dataset are chemicals or fuels, yet microbial natural product synthesis is also a major area of research (Figure 5.3c). Given the functional and structural diversity of this class of compounds and its importance to metabolic engineering, we isolated the natural products class and examined its trends. According to our dataset, terpenoids are the dominant subclass of natural products when ranked by total number of entries (Figure 5.4a). Among the ten most synthesized natural products, carotenoids achieved several of the highest rankings (Figure 5.4b).

The field tends to focus on simple molecules because their pathways are easier to control. Mirroring this, the prevalence of terpenoids suggests that terpene biosynthesis is better understood than that of the other subclasses of natural products. In terms of titers, our dataset contained entries that spanned multiple orders of magnitude. Several bulk chemicals were produced at concentrations greater than 100 g/L, whereas certain complex compounds were synthesized only at quantities below 1 mg/L. Figure 5.5 shows the titer histograms of the four major classes of bioproducts. For amino acids, chemicals, and fuels, the titer distributions are largely similar to each other, generally clustering above 1 g/L with the peak in the 10-100 g/L range. Natural products, on the other hand, are synthesized at much lower amounts, with the distribution shifted to the left. Natural products are generally more challenging to produce in microbial hosts, so their lower titers are expected.
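The histograms in Figure 5.5 amount to placing each titer into decade-wide bins. The sketch below uses the bin labels from the figure's x-axis. The chapter does not state how a titer lying exactly on a bin edge was assigned, so the edge convention below is an assumption. The function names and toy records are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# Bin edges and labels follow the x-axis of Figure 5.5 (g/L).
EDGES = [0.001, 0.01, 0.1, 1, 10, 100]
LABELS = ["<=0.001", "0.001-0.01", "0.01-0.1", "0.1-1", "1-10", "10-100", ">=100"]


def titer_bin(titer):
    """Map a titer in g/L to its Figure 5.5 bin.

    Assumption: a titer exactly on an interior edge goes to the higher bin
    (e.g. 1.0 -> "1-10"); 0.001 stays in "<=0.001" and 100 in ">=100".
    """
    if titer <= EDGES[0]:
        return LABELS[0]
    return LABELS[bisect_right(EDGES, titer)]


def titer_histogram(records):
    """Count entries per product class and titer bin; records are (class, titer)."""
    counts = {}
    for product_class, titer in records:
        counts.setdefault(product_class, Counter())[titer_bin(titer)] += 1
    return counts


# Toy records with made-up titers, for illustration only.
toy = [("natural products", 0.0005), ("natural products", 0.02),
       ("fuels", 45.0), ("fuels", 120.0)]
hist = titer_histogram(toy)
print(hist["fuels"])  # Counter({'10-100': 1, '>=100': 1})
```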


[Figure 5.3 plot. (a) Share of entries per product (top ten plus Other). (b) y-axis: Number of entries; products shown: ethanol, lipid, butanol, citric acid, L-lactic acid, D-lactic acid, succinic acid, free fatty acid, 2,3-butanediol, 1,3-propanediol. (c) Product classes: amino acids, natural products, chemicals, fuels.]

Figure 5.3. Most produced products ranked by the number of associated entries in the dataset. (a) Number of entries associated with each product relative to the total number of entries in the dataset (2,632). Only the top ten are shown. There is a much larger degree of diversity here, with the top ten products accounting for only 25% of the dataset. This suggests that a wide range of products is well established for microbial synthesis. (b) Absolute number of entries associated with each product. All of these compounds are directly related to central carbon metabolism. (c) Distribution of the number of entries associated with the four major classes of products according to the dataset. The microbial synthesis of various chemical compounds represents the bulk of the entries (45%), whereas fuels, natural products, and amino acids account for 22%, 23%, and 10%, respectively.

[Figure 5.4 plot. (a) Subclasses: terpenoids, phenylpropanoids, polyketides, alkaloids. (b) y-axis: Number of entries; products shown: vanillin, geraniol, lycopene, naringenin, resveratrol, astaxanthin, pinocembrin, beta-carotene, amorphadiene, triacetic acid lactone.]

Figure 5.4. Most produced natural products ranked by the number of associated entries in the dataset. (a) Of the natural product-associated entries in our dataset (583), 41%, 30%, 21%, and 8% are attributed to terpenoids, phenylpropanoids, polyketides, and alkaloids, respectively. (b) Absolute number of entries for the ten most studied natural products according to our dataset. Terpenoids, and in particular carotenoids, rank among the highest.

5.3.2 Algorithm testing

For the recommender system, two algorithms, ALS and PPCA, were tested and their performances compared. We also investigated how best to handle cases in which our dataset contains multiple entries for the same strain-product pair. In these situations, the median, average, or maximum of the titer values for a given strain-product combination was used to construct the data matrix that served as input to the algorithms. Our results indicated that, for this dataset specifically, the ALS algorithm had better predictability. This held both for test RMSE and for classification accuracy over the test dataset (Figure 5.6; for the testing methodology, see section 5.2.2). One possible explanation is that the PPCA algorithm was overfitting the training data. This is indicated by comparing the training RMSE of the two algorithms: $\mathrm{RMSE}_{\mathrm{train,ALS}} = (2.71 \pm 0.05) \times 10^{-3}$ and $\mathrm{RMSE}_{\mathrm{train,PPCA}} = (4.59 \pm 0.09) \times 10^{-4}$. Furthermore, the variation in test RMSE was generally smaller for ALS than for PPCA (Figure 5.6). This suggests that ALS was less sensitive to local minima when operated on our dataset (Severson, Molaro and Braatz, 2017). Test RMSE and classification accuracy were similar across the different metrics used to handle multiple entries of the same strain-product pair. Using the maximum titer value caused a slight decrease in predictability, likely because it introduced more noise into the dataset. Considering all of these results, we implemented the ALS algorithm with the median metric to generate the final recommendations.
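The three ways of collapsing repeated strain-product entries compared above can be sketched as follows. Each aggregator reduces the list of reported ratings for one pair to a single matrix element, and unreported pairs are left empty for the recommender to fill. The function `build_rating_matrix`, the use of `None` for unobserved pairs, and the toy rating values are assumptions for illustration. The conversion from titer to titer rating is not reproduced here.

```python
from statistics import mean, median

# The three aggregation choices compared in this section.
AGGREGATORS = {"median": median, "average": mean, "maximum": max}


def build_rating_matrix(entries, metric="median"):
    """Collapse repeated (strain, product) entries into one rating per pair.

    entries: iterable of (strain, product, rating) tuples.
    Returns (strains, products, R), where R[i][j] is the aggregated rating of
    strains[i] producing products[j], or None if the pair was never reported.
    """
    aggregate = AGGREGATORS[metric]
    grouped = {}
    for strain, product, rating in entries:
        grouped.setdefault((strain, product), []).append(rating)

    strains = sorted({s for s, _ in grouped})
    products = sorted({p for _, p in grouped})
    row = {s: i for i, s in enumerate(strains)}
    col = {p: j for j, p in enumerate(products)}

    R = [[None] * len(products) for _ in strains]
    for (s, p), values in grouped.items():
        R[row[s]][col[p]] = aggregate(values)
    return strains, products, R


# Toy entries (made-up ratings): three reports for one pair, one for another.
toy = [
    ("E. coli", "lycopene", 4.0),
    ("E. coli", "lycopene", 5.0),
    ("E. coli", "lycopene", 9.0),
    ("S. cerevisiae", "ethanol", 9.0),
]
for metric in ("median", "average", "maximum"):
    print(metric, build_rating_matrix(toy, metric)[2])
# median [[None, 5.0], [9.0, None]]
# average [[None, 6.0], [9.0, None]]
# maximum [[None, 9.0], [9.0, None]]
```

The maximum always pulls a pair toward its single best report. This is one way the "more noise" effect noted above could arise.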

[Figure 5.5 plot. x-axis: Titer range (g/L), bins <=0.001, 0.001-0.01, 0.01-0.1, 0.1-1, 1-10, 10-100, >=100; y-axis: Number of entries; series: natural products, fuels, chemicals, amino acids.]

Figure 5.5. Titer histograms of the four major classes of bioproducts. The titer distributions of amino acids, chemicals, and fuels are largely similar, with the majority of values falling within the range of 10 to 100 g/L. However, for natural products, the titer distribution is shifted towards the lower end, leaving room for further improvement.


[Figure 5.6 plot. Panels (a) Median, (b) Average, (c) Maximum, each comparing ALS and PPCA; left y-axis: Test RMSE (0 to 5.0); right y-axis: Classification accuracy (0 to 1.0).]

Figure 5.6. Comparison of test error and classification accuracy between the ALS and PPCA algorithms. The test RMSE and classification accuracy of the ALS (alternating least squares) and PPCA (probabilistic principal component analysis) algorithms are shown (see section 5.2.2 for methodology). For classification accuracy, the algorithms were evaluated on how well they predicted whether a strain-product pair in the test data would have high (titer rating > 5.5) or low (titer rating < 5.5) performance. Different metrics were also compared for cases where the same strain-product pair had multiple entries in our dataset, and hence multiple titer values. These metrics were (a) the median, (b) the average, and (c) the maximum of all titers corresponding to the same strain-product combination. For our dataset, the ALS algorithm performed better in all cases, and neither algorithm was sensitive to the metric used.

5.3.3 Recommendation results for organisms

Running the ALS algorithm on the strain-product matrix generated from our dataset filled the missing elements with predicted titer ratings, yielding the prediction matrix. This prediction matrix was then used to recommend specific strain-product pairs, based either on how well a given organism makes different products or on how well a specific product can be produced by various organisms. We began by listing the recommended products for each individual organism. To do this, we sorted the titer ratings independently within each row of the prediction matrix and identified the top ten values. The full recommendation lists are large and are therefore stored along with the dataset rather than tabulated here. Nevertheless, we present several specific examples from the recommendation lists to illustrate the key points.

Table 5.1 shows the recommended products for the microbes presented previously in section 5.3.1 (Figure 5.1 and Figure 5.2). For many of the organisms in the table, the recommended products are ones already present in the dataset. This is especially the case for E. coli and S. cerevisiae, for which all recommendations are based on what has already been done. Additionally, nearly all recommended products that are in the dataset have very high titers. Some are high in absolute terms (e.g., ethanol in S. cerevisiae at 185.5 g/L). Others are high relative to other producers (e.g., piceatannol in E. coli at 0.88 g/L, the highest value among all strains reported to produce piceatannol). This is expected because, for these model organisms, the rows of the original data matrix were mostly filled, and with high titer values. Hence, few predictions needed to be made for E. coli and S. cerevisiae, and it would be difficult for any predicted titer to outcompete the existing ones. Interestingly, E. coli and S. cerevisiae overlap by more than 50% in the types of products they synthesize according to the dataset. Despite this, the recommendations generated for the two organisms are completely different and largely reflect what is distinctive about each (Table 5.1). For instance, many of the products recommended for E. coli (e.g., pinoresinol or 3-dehydroshikimic acid) are not produced by any other organism in the dataset. In contrast, the recommendations generated for S. cerevisiae are generally tailored towards yeasts (e.g., ethanol, lycopene, and artemisinic acid). The exact products listed in Table 5.1 for these two model organisms might not be very interesting. Still, the results indicate that the recommender system functions as expected, providing reasonable suggestions for how organisms can be used given the existing data.

Table 5.1. Top recommended products from the recommender system for selected organisms.

Organism E. coli S. cerevisiae C. glutamicum Y. lipolytica

Recommended products:

1-pipecolic acid 13r-manoyl oxide vanillin* beta-carotene

mevalonic acid dammarenediol-ii d-lactic acid triacetic acid lactone uridine protopanaxadiol 3-hydroxybutyric acid 3-hydroxy-gamma-decalactone 3-dehydroshikimic acid artemisinic acid violacein lipid hesperetin-3 lycopene muconic acid succinic acid indirubin aminobenzoic acid alpha-ketoglutaric acid leucocyanidin xylitol 1-pipecolic acid 2-hydroxyisobutyric acid pelargonidin 3-o-glucoside ethanol ethanol citric acid

piceatannol 2,3-butanediol succinic acid alkane* pinoresinol d-lactic acid l- malic acid*

Table 5.1 continued

Organism B. subtilis K. pneumoniae Streptomyces sp. A. niger

Recommended products:

adenosine butyric acid* malic acid* malic acid

inosine malic acid* 2,3-butanediol* lipid* butyric acid* d-lactic acid xylitol* citric acid riboflavin 1,3-propanediol daidzein gluconic acid 1-pipecolic acid* l-carnitine* butyric acid* butyric acid* uridine l-lactic acid chlortetracycline oxalic acid gluconic acid* 2,3-butanediol d-lactic acid* canolol violacein* ethanol* alpha-ketoglutaric acid docosahexaenoic acid*

l-* vanillin* free fatty acid* itaconic acid adipic acid* adipic acid* l-lactic acid* alkane*

Table 5.1 continued

Organism K. marxianus S. elongatus P. putida P. pastoris

Recommended products:

succinic acid* l-lactic acid* succinic acid* 3-hydroxygenistein

gluconic acid* isoprene gluconic acid* malic acid* mevalonic acid* malic acid* xylitol* 2,3-butanediol riboflavin* 3-hydroxybutyric acid 2,3-butanediol* s-adenosyl-l-methionine 1-pipecolic acid* l-lysine* d-lactic acid* butyric acid* butyric acid* xylitol* 3-hydroxypropionic acid* beta-carotene* violacein* gamma-aminobutyric acid* mevalonic acid* 1,3-propanediol* l-lysine* sucrose* lipid* itaconic acid*

xylitol * pyruvic acid* l-lactic acid* 4-hydroxy-1-* riboflavin* * acetoin*

*indicates that this recommendation was not present in the original dataset

More meaningful recommendations arise for organisms that have abundant associated data entries but still leave room for prediction. As with E. coli and S. cerevisiae, most of the products recommended for Y. lipolytica already exist in the dataset and have high titers (e.g., β-carotene, triacetic acid lactone, lipids, and succinic acid; Table 5.1). However, the system recommends alkanes and malic acid as two new products potentially suitable for Y. lipolytica. Neither pair is present in our dataset; both products are reported for other organisms but not for Y. lipolytica. These recommendations are reasonable given Y. lipolytica’s oleaginous nature and high TCA cycle flux, respectively (Abdel-Mawgoud et al., 2018). Additionally, alkane production has in fact been demonstrated in Y. lipolytica (Xu et al., 2016; Bruder et al., 2019), suggesting that this strain-product pair is promising. These reports were missed during dataset construction because the abstract did not contain the relevant information and the publication date was outside our search range.

Similarly, for K. pneumoniae, the algorithm recommends the two diols, as expected given the nature of this organism (Yang et al., 2017). It also suggests the production of adipic acid (Table 5.1). Once again, the K. pneumoniae-adipic acid pair is not present in our dataset. However, this organism has been reported to produce muconic acid, a biosynthetic precursor to adipic acid, at 2.1 g/L (Jung, Jung and Oh, 2015), so the recommendation seems sensible. Surprisingly, the recommender system suggested only one amino acid (L-lysine) for C. glutamicum, a prominent amino acid producer (Table 5.1). To rationalize this result, we extracted the entries associated with C. glutamicum. Although 42% of them were indeed amino acids, the highest-titer products reported for this microbe were mostly not amino acids, the exceptions being L-lysine and, to a lesser extent, L-glutamate (Figure 5.7). Hence, both the recommendation list and the raw data imply that C. glutamicum is better suited for applications other than amino acid production (Table 5.1 and Figure 5.7). It is becoming increasingly apparent that the production utility of C. glutamicum spans many chemicals (Baritugo et al., 2018). We nevertheless use this example to illustrate how our approach can reveal unusual characteristics of microbes that help refine their industrial applicability.
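The C. glutamicum check described above compares two things: the share of its entries that are amino acids, and how many amino acids appear among its highest-titer products. A minimal sketch under stated assumptions follows. The caller supplies the set of product names counted as amino acids. The function name `organism_profile` and the toy titers are hypothetical.

```python
def organism_profile(records, organism, amino_acids, top_n=30):
    """Amino acid share of an organism's entries, and of its top-titer products.

    records: iterable of (organism, product, titer_g_per_L) tuples.
    amino_acids: set of product names to count as amino acids.
    Returns (amino_acid_share, top_entries, amino_acid_share_in_top).
    """
    rows = [(product, titer) for org, product, titer in records if org == organism]
    share = sum(product in amino_acids for product, _ in rows) / len(rows)
    top = sorted(rows, key=lambda pt: pt[1], reverse=True)[:top_n]
    top_share = sum(product in amino_acids for product, _ in top) / len(top)
    return share, top, top_share


# Toy records with made-up titers, for illustration only.
toy = [
    ("C. glutamicum", "L-lysine", 100.0),
    ("C. glutamicum", "L-proline", 2.0),
    ("C. glutamicum", "L-arginine", 5.0),
    ("C. glutamicum", "succinic acid", 150.0),
    ("E. coli", "L-lysine", 1.0),
]
amino = {"L-lysine", "L-proline", "L-arginine"}
share, top, top_share = organism_profile(toy, "C. glutamicum", amino, top_n=2)
print(share, top, top_share)
# 0.75 [('succinic acid', 150.0), ('L-lysine', 100.0)] 0.5
```

A high overall share combined with a low share among the top titers is the pattern the text describes. Many studies target amino acids, but the best results are elsewhere.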

[Figure 5.7 plot. y-axis: Titer (g/L), 0 to 300; x-axis: the 30 highest-titer C. glutamicum products, including L-lysine, L-glutamic acid, L-arginine, L-proline, succinic acid, cadaverine, L-lactic acid, D-lactic acid, muconic acid, shikimic, ethanol, ectoine, GABA, and 3HB.]

Figure 5.7. The 30 products with the highest titers produced by C. glutamicum according to the dataset. Although 42% of the studies using C. glutamicum produced amino acids, the highest reported titers generally involved products that are not amino acids, with L-lysine and L-glutamate being the exceptions.

Despite the benefits of the strain-product recommender system discussed above, the algorithm has shortcomings when training data are insufficient, a common issue for most data-driven approaches. Specifically, for organisms with only a very small number of reported products, the recommendations are no longer based on characteristics of the microbe that the algorithm has ‘learned’. Instead, generic products with high average titers across the dataset are suggested. This can be seen in Table 5.1, where compounds such as 2,3-butanediol and lactic acid appear frequently. In these cases, the specific products in the recommendation list should therefore be examined more carefully. Product suggestions beyond the top ten listed here should also be considered (see full recommendation list, section 5.2.1).

5.3.4 Recommendation results for products

Similarly, we generated a list of the best strains for a given product by independently identifying the highest titer ratings in each column of the prediction matrix. Table 5.2 shows the suggested organisms for several of the non-natural products (Figure 5.3b) and natural products (Figure 5.4b) presented in section 5.3.1. In this table, several organisms appear repeatedly, such as Gluconobacter, Pseudomonas, and Rhizopus species. This is because our dataset contains only a few entries for these microbes, but nearly all of them have very high titers (see full dataset, section 5.2.1). When applying ALS, we noticed that the algorithm tends to classify these microbes as general high producers, filling their rows with high values regardless of the product. The recurrence of these organisms in Table 5.2 is therefore a direct consequence of the lack of data that properly describe their characteristics. These items should thus be carefully assessed for feasibility, and other suggestions in the recommendation list should be given more weight (see full recommendation list, section 5.2.1).
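The column-wise ranking described above is the transpose of the per-organism ranking. The sketch below also attaches a flag to each recommended strain with few observed entries, one way to act on the caution about data-poor organisms. The flag, its cutoff `min_entries=5`, the function name `top_k_per_column`, and all toy values are hypothetical illustrations, not part of the method used in this chapter.

```python
def top_k_per_column(P, strains, products, n_observed, k=10, min_entries=5):
    """Rank strains for each product by predicted rating, best first.

    P[i][j] is the predicted titer rating of strains[i] for products[j].
    n_observed maps each strain to its number of entries in the data matrix.
    Each recommendation carries a flag that is True when the strain has fewer
    than min_entries observations, i.e. its row is poorly constrained.
    """
    ranking = {}
    for j, product in enumerate(products):
        order = sorted(range(len(strains)), key=lambda i: P[i][j], reverse=True)[:k]
        ranking[product] = [(strains[i], n_observed[strains[i]] < min_entries)
                            for i in order]
    return ranking


# Toy predicted ratings (made up): a sparsely observed strain with uniformly
# high predictions tops every column, as described in the text.
P = [[9.0, 2.0],
     [7.0, 8.0],
     [9.5, 9.5]]
strains = ["S. cerevisiae", "E. coli", "G. oxydans"]
products = ["ethanol", "lycopene"]
n_observed = {"S. cerevisiae": 50, "E. coli": 80, "G. oxydans": 2}
result = top_k_per_column(P, strains, products, n_observed, k=2)
print(result["ethanol"])   # [('G. oxydans', True), ('S. cerevisiae', False)]
print(result["lycopene"])  # [('G. oxydans', True), ('E. coli', False)]
```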

When such species are ignored, many of the recommendations in Table 5.2 become reasonable and agree with current characterizations of the organisms. Examples include S. cerevisiae and Z. mobilis for ethanol (Xia et al., 2019), Clostridia species for butanol (Lütke-Eversloh and Bahl, 2011), Enterococcus and Lactobacillus species for lactic acid (Reddy et al., 2008), Y. lipolytica for β-carotene (Larroude et al., 2018), and Xanthophyllomyces for astaxanthin (Schmidt et al., 2011). However, we immediately noticed that neither E. coli nor S. cerevisiae was recommended for any of the non-natural products, except for the obvious S. cerevisiae suggestion for ethanol. This prompted us to search our dataset to see whether unconventional organisms outcompete E. coli and S. cerevisiae for the production of these compounds. For each non-natural product, we extracted the five highest titers and their producers (Figure 5.8). Once again, the two model organisms appeared only rarely. These results concern the overproduction of products that are well characterized and have identified producers. For such products, it is potentially more advantageous to directly exploit the specialized metabolism of various unconventional microbes, with a few engineering tweaks, than to attempt to replicate the full high-producing phenotype in model organisms. By contrast, in generating recommended strains for natural product synthesis, the algorithm favors E. coli and S. cerevisiae much more. All but one of these products have a model organism in the recommendation list (Table 5.2). This observation also agrees with the existing data for natural products, where the two model organisms are associated with high titers much more frequently (Figure 5.9). Natural products are typically secondary metabolites with long, heavily regulated biosynthetic pathways, so their overproduction demands a greater engineering effort. In these cases, using E. coli or S. cerevisiae, with their far greater abundance of genetic tools and biological knowledge, can be expected to improve the chances of success.
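The check behind Figure 5.8 has two steps. First, take the five highest reported titers for each product and list their producers. Then count how often a model organism appears. A minimal sketch, assuming records are (organism, product, titer) tuples. The function names and toy titers are hypothetical.

```python
from collections import defaultdict

MODEL_ORGANISMS = {"E. coli", "S. cerevisiae"}


def top_producers(records, products, n=5):
    """For each product, list the organisms behind its n highest reported titers.

    records: iterable of (organism, product, titer_g_per_L) tuples.
    """
    by_product = defaultdict(list)
    for organism, product, titer in records:
        if product in products:
            by_product[product].append((titer, organism))
    return {
        product: [org for _, org in sorted(by_product[product],
                                           key=lambda t: t[0], reverse=True)[:n]]
        for product in products
    }


def count_model_organisms(top):
    """Count how often E. coli or S. cerevisiae appear among the top producers."""
    return sum(org in MODEL_ORGANISMS for orgs in top.values() for org in orgs)


# Toy records with made-up titers, for illustration only.
toy = [
    ("S. cerevisiae", "ethanol", 100.0),
    ("Z. mobilis", "ethanol", 120.0),
    ("E. coli", "ethanol", 30.0),
    ("C. acetobutylicum", "butanol", 20.0),
    ("C. beijerinckii", "butanol", 18.0),
    ("E. coli", "butanol", 15.0),
]
top = top_producers(toy, ["ethanol", "butanol"], n=2)
print(top)  # {'ethanol': ['Z. mobilis', 'S. cerevisiae'], 'butanol': ['C. acetobutylicum', 'C. beijerinckii']}
print(count_model_organisms(top))  # 1
```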

Overall, both our data and the analysis from the recommender system suggest a criterion for choosing between model and non-model organisms. Model organisms suit sophisticated products that require complex pathways. Non-model organisms suit well-characterized products derived from central carbon metabolism within a few steps. This criterion could be a useful addition to the current list of host selection criteria.

[Figure 5.8 plot. Panels: ethanol, succinic acid, 2,3-BDO, butanol, L-lactic acid, 1,3-PDO; y-axis: Titer (g/L), 0 to 200; producers shown: E. coli, S. cerevisiae, Z. mobilis, K. pneumoniae, K. oxytoca, C. glutamicum, Y. lipolytica, L. casei, R. oryzae, Bacillus sp., C. butyricum, E. aerogenes, Monascus sp., Enterobacter sp., Enterococcus sp., C. tyrobutyricum, Actinobacteria sp., C. acetobutylicum.]

Figure 5.8. Organisms that produced the highest titers of several (non-natural) products. For ethanol, succinic acid, 2,3-butanediol (2,3-BDO), butanol, L-lactic acid, and 1,3-propanediol (1,3-PDO), the five highest titers are selected from the dataset and the associated producing organisms are presented. In the cases shown here, the two model organisms do not appear frequently, with only one instance of E. coli and four instances of S. cerevisiae (two of which are for ethanol).

Table 5.2. Top recommended organisms from the recommender system for selected non-natural and natural products.

Product ethanol succinic acid 2,3-butanediol butanol

Recommended organisms:

Pseudomonas sp* Gluconobacter oxydans* Gluconobacter oxydans* Kluyveromyces marxianus*

Bacillus licheniformis** Bacillus licheniformis* Pseudomonas sp* Actinobacteria sp Schizosaccharomyces sp Pseudomonas putida* Pseudomonas putida* Candida sp*

Corynebacterium glutamicum Pseudomonas sp* Streptomyces sp* Clostridium tyrobutyricum Kluyveromyces sp Kluyveromyces marxianus* Bacillus amyloliquefaciens Gluconobacter oxydans* Zymomonas mobilis Candida sp* Kluyveromyces marxianus* Clostridium acetobutylicum Corynebacterium crenatum** Bacillus sp* Enterobacter sp Clostridium beijerinckii Saccharomyces cerevisiae Aspergillus terreus* Aspergillus terreus* Bacillus sp

Thermoanaerobacterium sp Enterococcus sp Enterobacter aerogenes Clostridium sp Kluyveromyces marxianus Lactococcus lactis* Klebsiella oxytoca Pseudomonas sp*

Table 5.2 continued

Product l-lactic acid 1,3-propanediol beta-carotene astaxanthin

Recommended organisms:

Pseudomonas sp* Pseudomonas putida* Candida sp* Pseudomonas sp*

Gluconobacter oxydans* Klebsiella pneumoniae Bacillus sp* Candida sp* Aspergillus terreus* Bacillus sp* Pseudomonas sp* Escherichia coli

Monascus sp Klebsiella oxytoca Yarrowia lipolytica Brevibacterium flavum* Enterococcus sp Clostridium butyricum Gluconobacter oxydans* Pseudomonas putida* Pseudomonas putida* Gluconobacter oxydans* Escherichia coli Paracoccus sp Bacillus licheniformis Kluyveromyces marxianus* Mortierella alpina* Rhizopus sp* Kluyveromyces lactis Shimwellia sp Bacillus amyloliquefaciens* Thermoanaerobacterium sp*

Bacillus sp Lactobacillus reuteri Aspergillus sp* Xanthophyllomyces sp Lactobacillus casei Rhizopus sp* Kluyveromyces marxianus* Bacillus amyloliquefaciens*

Table 5.2 continued

Product resveratrol lycopene vanillin naringenin

Recommended organisms:

Pseudomonas sp* Pseudomonas sp* Corynebacterium glutamicum* Pseudomonas sp*

Pseudomonas putida* Pseudomonas putida* Bacillus licheniformis** Escherichia coli Rhizopus sp* Saccharomyces cerevisiae Amycolatopsis sp Kluyveromyces marxianus*

Escherichia coli Bacillus sp* Kluyveromyces marxianus* Candida sp* Candida sp* Gluconobacter oxydans* Pseudomonas putida Rhizopus sp* Saccharomyces cerevisiae Bacillus licheniformis* Candida sp* Pseudomonas putida* Kluyveromyces marxianus* Candida sp* Pseudomonas sp Aspergillus terreus* Klebsiella pneumoniae* Rhizopus sp* Rhizopus sp* Bacillus sp*

Brevibacterium flavum* Bacillus amyloliquefaciens* Gluconobacter oxydans* Brevibacterium flavum* Bacillus licheniformis* Aspergillus terreus* Klebsiella pneumoniae* Klebsiella pneumoniae*

*indicates that this recommendation was not present in the original dataset
**indicates that the recommendation with this specific strain was not present in the original dataset although the genus was present

[Figure 5.9 plot. Panels: β-carotene, astaxanthin, resveratrol, lycopene, vanillin, naringenin; y-axis: Titer (g/L); producers shown: E. coli, S. cerevisiae, Y. lipolytica, P. putida, Bacillus sp., B. trispora, Paracoccus sp., Pediococcus sp., Pseudomonas sp., Amycolatopsis sp., Xanthophyllomyces sp.]

Figure 5.9. Organisms that produced the highest titers of several natural products. For β-carotene, astaxanthin, resveratrol, lycopene, vanillin, and naringenin, the top five highest titers are selected from the dataset and the associated producing organisms are presented. Unlike the previous case for non-natural products, the two model organisms appear much more frequently for microbial natural product synthesis. In this case, the majority of the entries are associated with E. coli and S. cerevisiae.

5.4 Discussion

Selecting a microbial host for product biosynthesis is one of the most important steps in the metabolic engineering pipeline. Finding the proper microbe for a given product typically requires an understanding of the cell’s metabolism and physiology, the availability of genetic engineering tools, and knowledge of the product’s biosynthetic demands. Data-driven methods to guide host selection have been developed previously. However, most of them characterize an organism’s genome to inform the choice indirectly and contain no information about the products (Boudellioua et al., 2016; Clauwaert, Menschaert and Waegeman, 2019; Kim et al., 2020). In this work, a more direct approach was applied to rationalize the host selection process. We first gathered strain-product-titer information from the literature and then implemented a recommender system. Our aim was to algorithmically generate a list of suitable organisms for a desired product, or vice versa.

Ultimately, the recommender system serves to analyze the hidden patterns underlying past data, identifying the characteristics of each strain as well as the requirements of each product to find a match between the two. As such, this method can be viewed as a means to systematically summarize the principles that we apply during host selection, followed by an automated pairing scheme.
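In a latent-factor recommender, matching strain ‘characteristics’ to product ‘requirements’ corresponds to a specific model. Each strain s and each product p is represented by a low-dimensional vector, and the inner product of the two vectors approximates the titer rating. A standard regularized ALS formulation is written out below as a sketch. The latent dimension and the regularization weight λ are assumptions, not settings taken from this chapter. Here Ω is the set of observed strain-product pairs, Ω_s the products observed for strain s, and Ω_p the strains observed for product p.

$$
\hat{r}_{sp} = \mathbf{u}_s^{\top}\mathbf{v}_p,
\qquad
\min_{\{\mathbf{u}_s\},\{\mathbf{v}_p\}}
\sum_{(s,p)\in\Omega}\left(r_{sp}-\mathbf{u}_s^{\top}\mathbf{v}_p\right)^2
+\lambda\left(\sum_s\lVert\mathbf{u}_s\rVert^2+\sum_p\lVert\mathbf{v}_p\rVert^2\right)
$$

$$
\mathbf{u}_s \leftarrow \Big(\sum_{p\in\Omega_s}\mathbf{v}_p\mathbf{v}_p^{\top}+\lambda I\Big)^{-1}\sum_{p\in\Omega_s} r_{sp}\,\mathbf{v}_p,
\qquad
\mathbf{v}_p \leftarrow \Big(\sum_{s\in\Omega_p}\mathbf{u}_s\mathbf{u}_s^{\top}+\lambda I\Big)^{-1}\sum_{s\in\Omega_p} r_{sp}\,\mathbf{u}_s
$$

With one set of vectors held fixed, each update is an ordinary ridge regression. ALS therefore alternates between two convex subproblems, even though the joint problem is not convex.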

Since our approach is primarily data driven, a large portion of the recommended results stems from what has already been demonstrated, reflecting that the model has learned the training data well. In these cases, the algorithm is unable to find new strain-product combinations that outperform prior work. This also attests to how well established the current strain and product selection criteria are. Furthermore, the recommender system demonstrates its utility in situations where there are sufficient data but also ample room for prediction. One example is the set of products recommended for C. glutamicum. This organism is traditionally considered a prominent amino acid producer and is still used for amino acid biosynthesis today (D’Este, Alvarado-Morales and Angelidaki, 2018). However, both the data and the recommendation list suggest that C. glutamicum could perform better when tasked with synthesizing other products, such as organic acids. Similar reasoning can be applied to other less-characterized organisms to uncover their industrial potential.

Interestingly, the algorithm provides a criterion for choosing between model and non-model organisms. The debate on whether to exploit the unique metabolism and physiology of non-model organisms or the genetic manipulability of model organisms has been ongoing for a long time. However, the dataset presented here reflects the collective ‘voting’ of the metabolic engineering community, implicitly encoded in reported titers. Some compounds belong to central carbon metabolism or can be reached within a few enzymatic steps of it. For these, several unconventional microbes can be identified that produce them at very high titers, and these microbes should be prioritized. This is presumably because these organisms have specialized metabolism and exceptionally high flux through the relevant pathways. The result is an overproduction phenotype that is difficult to replicate elsewhere. On the other hand, more structurally and functionally diverse compounds, especially natural products, call for E. coli or S. cerevisiae. These hosts offer better coordination of long pathways, enhanced expression of engineered proteins, and refined control of metabolic networks, ultimately leading to higher titers.

Nonetheless, we recognize that our current application of the recommender system to the existing dataset has many limitations. For instance, the ALS algorithm that we chose may not be ideal given the characteristics of the strain-product data matrix, in which two rows are largely filled (E. coli and S. cerevisiae), several rows are partially filled (e.g., C. glutamicum and Y. lipolytica), and all other rows contain only a very small number of entries. Additionally, due to the presence of many under-studied strains and products, our dataset is rather small, which can lead to poor-quality predictions as discussed previously. Lastly, using titer as the rating metric lumps together the many factors that contribute to the high or low performance of a strain-product combination, and our algorithm is unable to differentiate among them. In light of these considerations, instead of strictly adhering to the specific ordering of items in our recommendation lists, it is perhaps more productive to view them as general guidelines against which user-chosen strain-product pairs can be benchmarked. Namely, a given combination of microbe and compound should first be selected according to established principles and experience. Then, the recommendation lists, used in conjunction with the dataset, can serve to support that decision or otherwise prompt consideration of better alternatives.
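For readers unfamiliar with ALS (alternating least squares), the sketch below shows the basic idea on a sparse strain-product matrix. Strain and product factor vectors are updated in turn, and each update solves a small ridge-regularized least-squares problem over only the observed entries. This is a minimal pure-Python illustration built on stated assumptions: log10 titers serve as ratings, the rank is 2, the regularization weight `lam` is 0.1, and the toy entries describe three hypothetical strains and products. It is not the implementation or the hyperparameters used in Chapter 5.

```python
import random


def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def als(ratings, n_rows, n_cols, rank=2, lam=0.1, iters=50, seed=0):
    """Factorize observed (row, col) -> rating entries as U @ V.T by alternating least squares."""
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    by_row, by_col = {}, {}
    for (i, j), r in ratings.items():
        by_row.setdefault(i, []).append((j, r))
        by_col.setdefault(j, []).append((i, r))

    def update(factors, fixed, observed):
        # For each row of `factors`, solve (F^T F + lam I) x = F^T r,
        # where F holds the fixed factors of that row's observed partners.
        for i, entries in observed.items():
            A = [[lam if p == q else 0.0 for q in range(rank)] for p in range(rank)]
            b = [0.0] * rank
            for j, r in entries:
                f = fixed[j]
                for p in range(rank):
                    b[p] += f[p] * r
                    for q in range(rank):
                        A[p][q] += f[p] * f[q]
            factors[i] = solve(A, b)

    for _ in range(iters):
        update(U, V, by_row)  # hold product factors fixed, refit strain factors
        update(V, U, by_col)  # hold strain factors fixed, refit product factors
    return U, V


def predict(U, V, i, j):
    return sum(u * v for u, v in zip(U[i], V[j]))


def rmse(U, V, ratings):
    errors = [(r - predict(U, V, i, j)) ** 2 for (i, j), r in ratings.items()]
    return (sum(errors) / len(errors)) ** 0.5


if __name__ == "__main__":
    # Hypothetical log10 titers: 3 strains x 3 products, 6 observed entries.
    toy = {(0, 0): 1.0, (0, 1): 2.0, (0, 2): 1.5, (1, 0): 2.0, (1, 1): 4.0, (2, 2): 3.0}
    U, V = als(toy, n_rows=3, n_cols=3)
    print("training RMSE:", round(rmse(U, V, toy), 3))
    print("predicted (strain 1, product 2):", round(predict(U, V, 1, 2), 2))
```

Each row update uses only that row's observed entries. For the sparsely filled strains described above, the factors are therefore pinned down by one or two numbers, which is one way to see why their predictions are fragile.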

Finally, the dataset that we provide as part of this work reflects the general trends of metabolic engineering within the past two decades and functions as a quick reference to what has been done previously, illustrating areas that have seen remarkable accomplishments as well as those that require further improvement. Hence, expanding this dataset with overlooked published results and future research would be beneficial in many ways. For the recommender system in particular, refining the data should lead to better predictions, mitigating a number of the aforementioned issues and eventually allowing the algorithm to propose strain-product pairs with a high probability of achieving superior titers. Furthermore, for each datapoint, information other than strain, product, and titer can be specified as well, such as substrate type, fermentation mode, and pathway length.

Doing so allows for the construction of a better rating metric that is a function of the many parameters defining the strain-product pair, which should improve the algorithm's performance. Having more information in the dataset also enables the use of context-aware recommender algorithms, in which the current strain-product matrix is augmented with additional dimensions to provide more meaningful suggestions under various conditions. Considering these benefits, it might be worthwhile to actively maintain a database that houses information tailored to metabolic engineering, to which authors of newly published articles can contribute their results, similar to the Experiment Data Depot (Morrell et al., 2017). This, in turn, would promote the use of data-driven approaches in metabolic engineering that complement the current experience- and knowledge-based methods.
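The proposed extension can be pictured as storing each datapoint as a record that carries the additional fields named above, with ratings indexed by (strain, product, context) rather than by (strain, product) alone. The sketch below is hypothetical throughout. The field names, the choice of substrate plus fermentation mode as the context, the use of the best log10 titer per cell, and the example records are all illustrative assumptions, not entries from the dataset.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Datapoint:
    """One hypothetical dataset record with context fields beyond strain/product/titer."""
    strain: str
    product: str
    titer_g_per_l: float
    substrate: str = "unknown"
    fermentation_mode: str = "unknown"  # e.g. batch, fed-batch, continuous
    pathway_length: int = 0             # number of enzymatic steps (assumed definition)


def build_context_tensor(points):
    """Map (strain, product, context) -> best observed log10 titer.

    Adding a context key turns the 2-D strain-product matrix into a sparse
    3-D tensor, which a context-aware recommender would factorize instead.
    """
    tensor = {}
    for p in points:
        key = (p.strain, p.product, (p.substrate, p.fermentation_mode))
        value = math.log10(p.titer_g_per_l)
        tensor[key] = max(value, tensor.get(key, float("-inf")))
    return tensor


def collapse_to_matrix(tensor):
    """Drop the context dimension, keeping the best value per strain-product pair."""
    matrix = {}
    for (strain, product, _context), value in tensor.items():
        pair = (strain, product)
        matrix[pair] = max(value, matrix.get(pair, float("-inf")))
    return matrix
```

Collapsing the tensor recovers a matrix of the kind used in the current analysis. The context dimension keeps apart, for example, a fed-batch result and a batch result for the same pair, instead of letting the higher one silently stand in for both.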


5.5 References

Abdel-Mawgoud, A. M. et al. (2018) ‘Metabolic engineering in the host Yarrowia lipolytica’, Metabolic Engineering, 50, pp. 192–208. doi: 10.1016/j.ymben.2018.07.016.

Alonso-Gutierrez, J. et al. (2015) ‘Principal component analysis of proteomics (PCAP) as a tool to direct metabolic engineering’, Metabolic Engineering, 28, pp. 123–133. doi: 10.1016/j.ymben.2014.11.011.

Baritugo, K.-A. et al. (2018) ‘Metabolic engineering of Corynebacterium glutamicum for fermentative production of chemicals in biorefinery’, Applied Microbiology and Biotechnology, 102, pp. 3915–3937. doi: 10.1080/09168451.2018.1452602.

Bishop, C. M. (2006) Pattern Recognition and Machine Learning. Information Science and Statistics.

Boudellioua, I. et al. (2016) ‘Prediction of metabolic pathway involvement in prokaryotic UniProtKB data by association rule mining’, PLoS ONE, 11(7), pp. 1–16. doi: 10.1371/journal.pone.0158896.

Bruder, S. et al. (2019) ‘Drop-in biofuel production using fatty acid photodecarboxylase from Chlorella variabilis in the oleaginous yeast Yarrowia lipolytica’, Biotechnology for Biofuels, 12(1), pp. 1–13. doi: 10.1186/s13068-019-1542-4.

Clauwaert, J., Menschaert, G. and Waegeman, W. (2019) ‘DeepRibo: A neural network for precise gene annotation of prokaryotes by combining ribosome profiling signal and binding site patterns’, Nucleic Acids Research, 47(6). doi: 10.1093/nar/gkz061.

Costello, Z. and Martin, H. G. (2018) ‘A machine learning approach to predict metabolic pathway dynamics from time-series multiomics data’, npj Systems Biology and Applications, 4(1), pp. 1–14. doi: 10.1038/s41540-018-0054-3.

D’Este, M., Alvarado-Morales, M. and Angelidaki, I. (2018) ‘Amino acids production focusing on fermentation technologies – A review’, Biotechnology Advances, 36(1), pp. 14–25. doi: 10.1016/j.biotechadv.2017.09.001.

Gadkar, K. G., Mehra, S. and Gomes, J. (2005) ‘On-line adaptation of neural networks for bioprocess control’, Computers and Chemical Engineering, 29(5), pp. 1047–1057. doi: 10.1016/j.compchemeng.2004.11.004.

Guan, N. et al. (2018) ‘Comparative genomics and transcriptomics analysis-guided metabolic engineering of Propionibacterium acidipropionici for improved propionic acid production’, Biotechnology and Bioengineering, 115(2), pp. 483–494. doi: 10.1002/bit.26478.

Han, X. et al. (2019) ‘ProGAN: Protein solubility generative adversarial nets for data augmentation in DNN framework’, Computers and Chemical Engineering, 131. doi: 10.1016/j.compchemeng.2019.106533.

Ilin, A. and Raiko, T. (2010) ‘Practical approaches to principal component analysis in the presence of missing values’, Journal of Machine Learning Research, 11, pp. 1957–2000.

Jung, H. M., Jung, M. Y. and Oh, M. K. (2015) ‘Metabolic engineering of Klebsiella pneumoniae for the production of cis,cis-muconic acid’, Applied Microbiology and Biotechnology, 99(12), pp. 5217–5225. doi: 10.1007/s00253-015-6442-3.

Kim, G. B. et al. (2020) ‘Machine learning applications in systems metabolic engineering’, Current Opinion in Biotechnology, 64, pp. 1–9. doi: 10.1016/j.copbio.2019.08.010.

Larroude, M. et al. (2018) ‘A synthetic biology approach to transform Yarrowia lipolytica into a competitive biotechnological producer of β-carotene’, Biotechnology and Bioengineering, 115(2), pp. 464–472. doi: 10.1002/bit.26473.

Libbrecht, M. W. and Noble, W. S. (2015) ‘Machine learning applications in genetics and genomics’, Nature Reviews Genetics, 16(6), pp. 321–332. doi: 10.1038/nrg3920.

Lütke-Eversloh, T. and Bahl, H. (2011) ‘Metabolic engineering of Clostridium acetobutylicum: Recent advances to improve butanol production’, Current Opinion in Biotechnology, 22(5), pp. 634–647. doi: 10.1016/j.copbio.2011.01.011.

Mellor, J. et al. (2016) ‘Semisupervised Gaussian Process for Automated Enzyme Search’, ACS Synthetic Biology, 5(6), pp. 518–528. doi: 10.1021/acssynbio.5b00294.

Morrell, W. C. et al. (2017) ‘The Experiment Data Depot: A Web-Based Software Tool for Biological Experimental Data Storage, Sharing, and Visualization’, ACS Synthetic Biology, 6(12), pp. 2248–2259. doi: 10.1021/acssynbio.7b00204.

Ohtake, T. et al. (2017) ‘Metabolomics-driven approach to solving a CoA imbalance for improved 1-butanol production in Escherichia coli’, Metabolic Engineering, 41, pp. 135–143. doi: 10.1016/j.ymben.2017.04.003.

Presnell, K. V. and Alper, H. S. (2019) ‘Systems Metabolic Engineering Meets Machine Learning: A New Era for Data-Driven Metabolic Engineering’, Biotechnology Journal, 14(9). doi: 10.1002/biot.201800416.

Reddy, G. et al. (2008) ‘Amylolytic bacterial - A review’, Biotechnology Advances, 26(1), pp. 22–34. doi: 10.1016/j.biotechadv.2007.07.004.

Ryu, J. Y., Kim, H. U. and Lee, S. Y. (2019) ‘Deep learning enables high-quality and high-throughput prediction of enzyme commission numbers’, Proceedings of the National Academy of Sciences of the United States of America, 116(28), pp. 13996–14001. doi: 10.1073/pnas.1821905116.

Schmidt, I. et al. (2011) ‘Biotechnological production of astaxanthin with Phaffia rhodozyma/Xanthophyllomyces dendrorhous’, Applied Microbiology and Biotechnology, 89(3), pp. 555–571. doi: 10.1007/s00253-010-2976-6.

Severson, K. A., Molaro, M. C. and Braatz, R. D. (2017) ‘Principal component analysis of process datasets with missing values’, Processes, 5(3), pp. 1–18. doi: 10.3390/pr5030038.

Shipp, M. A. et al. (2002) ‘Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning’, Nature Medicine, 8(1), pp. 2–8.

Tsai, A. G. et al. (2020) ‘Multiplexed single-cell morphometry for hematopathology diagnostics’, Nature Medicine, 26(3), pp. 408–417. doi: 10.1038/s41591-020-0783-x.

Wu, Z. et al. (2019) ‘Machine learning-assisted directed protein evolution with combinatorial libraries’, Proceedings of the National Academy of Sciences of the United States of America, 116(18), pp. 8852–8858. doi: 10.1073/pnas.1901979116.

Xia, J. et al. (2019) ‘Engineering Zymomonas mobilis for Robust Cellulosic Ethanol Production’, Trends in Biotechnology, 37(9), pp. 960–972. doi: 10.1016/j.tibtech.2019.02.002.

Xu, P. et al. (2016) ‘Engineering Yarrowia lipolytica as a platform for synthesis of drop-in transportation fuels and oleochemicals’, Proceedings of the National Academy of Sciences of the United States of America, 113(39), pp. 10848–10853. doi: 10.1073/pnas.1607295113.

Yang, K. K., Wu, Z. and Arnold, F. H. (2019) ‘Machine-learning-guided directed evolution for protein engineering’, Nature Methods, 16(8), pp. 687–694. doi: 10.1038/s41592-019-0496-6.

Yang, T. et al. (2017) ‘Metabolic engineering strategies for acetoin and 2,3-butanediol production: advances and prospects’, Critical Reviews in Biotechnology, 37(8), pp. 990–1005. doi: 10.1080/07388551.2017.1299680.

Chapter 6

Conclusions and Future Directions


6.1 Summary and conclusions

The focus of this thesis work is the design and execution of a synergistic substrate co-feeding scheme to accelerate cellular metabolism and product formation. The idea was applied to a two-stage non-photosynthetic system that can fix and convert CO2 into biodiesel using the energy from H2. To find the proper co-substrates for CO2 fixation by Moorella thermoacetica and for acetate-driven lipogenesis by Yarrowia lipolytica, we first identified the core issues, which were determined to be ATP and NADPH limitations, respectively. Correspondingly, co-feeding minor amounts of glucose to M. thermoacetica cells growing on H2/CO2 ‘trapped’ this reduced sugar within glycolysis, channeling its energy almost exclusively into ATP generation. This boosted the specific rate of CO2 conversion into acetic acid by ~4–5-fold while still maintaining very high levels of net CO2 fixation. Similarly, co-feeding acetate-cultured Y. lipolytica cells with small amounts of gluconate drove significant NADPH formation through recursive oxidation within the pentose cycle (6PG → R5P → F6P → G6P → 6PG + 2 NADPH). As a result, the specific lipid formation rate doubled.
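The NADPH yield of this cycle can be made explicit with a textbook carbon balance. The balance below is an idealized upper bound: it assumes that no carbon leaves the cycle and omits H2O, H+, and Pi. Six oxidative passes, carbon rearrangement through TKL/TAL, and reversed PGI together give:

```latex
\begin{aligned}
6\,\mathrm{G6P} + 6\,\mathrm{NADP^+} &\rightarrow 6\,\mathrm{6PG} + 6\,\mathrm{NADPH} && \text{(G6PDH)}\\
6\,\mathrm{6PG} + 6\,\mathrm{NADP^+} &\rightarrow 6\,\mathrm{Ru5P} + 6\,\mathrm{CO_2} + 6\,\mathrm{NADPH} && \text{(6PGDH)}\\
6\,\mathrm{Ru5P} &\rightarrow 5\,\mathrm{F6P} && \text{(RPI, RPE, TKL, TAL)}\\
5\,\mathrm{F6P} &\rightarrow 5\,\mathrm{G6P} && \text{(PGI, reverse)}\\[4pt]
\text{net:}\quad \mathrm{G6P} + 12\,\mathrm{NADP^+} &\rightarrow 6\,\mathrm{CO_2} + 12\,\mathrm{NADPH}\\
\text{entry as 6PG:}\quad \mathrm{6PG} + 11\,\mathrm{NADP^+} &\rightarrow 6\,\mathrm{CO_2} + 11\,\mathrm{NADPH}
\end{aligned}
```

In the steady cycle, 2 NADPH are formed per CO2 released, which matches the per-turn stoichiometry given above. Substrate entering as 6PG (as gluconate is assumed to here) bypasses one G6PDH step, so its ceiling is 11 rather than 12 NADPH.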

The implications of these findings are manifold. We observed synergy in our co-feeding experiments in that the experimentally measured productivity exceeded the linear sum of the productivities on the individual substrates (V12 > V1 + V2, where V represents the productivity on either substrate alone (subscript 1 or 2) or on both substrates (subscript 12)). This has not been reported previously, as all experimental efforts and models describing mixed substrate utilization up to this point have shown only sublinear enhancements (V12 > V1 or V2, but V12 < V1 + V2). Because of these widely accepted sublinear effects, dual-substrate co-utilization is often viewed as a nuisance, since the incremental improvements rarely outweigh the additional challenges (e.g., catabolite repression) and added complexity compared to single-substrate fermentations. Thus, there has been minimal motivation to apply multiple substrates in microbial cultures outside of cases where efficient use of complex renewable feedstocks is desired. However, with the discovery of the synergistic regime of dual-substrate metabolism, it might be worthwhile to reconsider our current views on the use of more than one carbon source. As we have demonstrated, judiciously chosen substrate pairs better balance the supply of the various biosynthetic components against the needs of the product, debottlenecking the rate-limiting steps. Furthermore, we also provide a simple and universal method that can bypass catabolite repression, easing the implementation of these designs. Additional ongoing work has also shown that it is possible to achieve synergistic yield improvements using a co-feeding system (Y12 > Y1 + Y2, where Y represents the yield in grams of product per gram of substrate), which brings economic benefits in terms of operating costs. These results should provide incentives to further explore the uncharted landscape of multi-substrate metabolism, which should now be regarded as an opportunity to further enhance strain performance after genetic engineering. Alternatively, for microbes without established genetic tools, synergistic co-feeding can serve as a viable substitute for rewiring metabolic fluxes, achieving effects similar to those of genetic modulation. Finally, generalizing the concept further, we now have a method to control intracellular fluxes in cells cultured in a multi-substrate environment by changing the feed ratios of the involved carbon sources. This could be useful in studying the intricacies of cellular metabolism, especially in mammalian systems, where the media inherently contain multiple nutrients.
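The classification criteria above can be stated compactly in code. The sketch below reads “V12 > V1 or V2” as V12 > max(V1, V2). It adds an “additive” boundary case, tested with a relative tolerance, for completeness; that category does not appear in the text. The same function applies unchanged to yields (Y1, Y2, Y12). The function names and example numbers are illustrative only.

```python
import math


def classify_cofeeding(v1, v2, v12, rel_tol=1e-9):
    """Classify a co-feeding outcome from single- and dual-substrate measurements.

    v1, v2: productivity (or yield) with substrate 1 or 2 alone
    v12:    productivity (or yield) with both substrates co-fed
    """
    total = v1 + v2
    if math.isclose(v12, total, rel_tol=rel_tol):
        return "additive"        # V12 = V1 + V2 (boundary case)
    if v12 > total:
        return "synergistic"     # V12 > V1 + V2
    if v12 > max(v1, v2):
        return "sublinear"       # max(V1, V2) < V12 < V1 + V2
    return "no enhancement"      # V12 <= max(V1, V2)


def synergy_ratio(v1, v2, v12):
    """V12 / (V1 + V2); values above 1 indicate synergy."""
    total = v1 + v2
    if total <= 0:
        raise ValueError("V1 + V2 must be positive")
    return v12 / total


if __name__ == "__main__":
    # Hypothetical productivities in arbitrary units.
    for v12 in (2.0, 1.2, 0.8):
        print(v12, classify_cofeeding(1.0, 0.5, v12), round(synergy_ratio(1.0, 0.5, v12), 2))
```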

The second contribution of this work is the creation of a dataset containing strain-product-titer information, which, when analyzed with machine learning tools such as the recommender system presented in the previous chapter, reflects some of the trends in current metabolic engineering research. Our approach provides a systematic summary of the implicit rules that apply to selecting a proper host organism or product based on the ‘collective voting’ of literature results.

For instance, there is a tendency to focus more on microbes with better engineerability and on products with simpler pathways. The strengths of various organisms and the requirements of different products also become explicit when viewing both the dataset and the recommendation lists provided by the algorithm. Perhaps the most interesting finding here is that the choice between model (abundant genetic tools, a wealth of biological knowledge) and non-model (specialized metabolism and physiology) organisms is largely dictated by the complexity involved in synthesizing the desired product. Bulk chemicals, fuels, and amino acids can generally reach higher titers in suitable, previously identified non-model organisms, while natural products typically require model organisms for better production.

Admittedly, this part of the thesis is more exploratory than the co-feeding work, largely because of the limited quality and quantity of our dataset. However, we were still able to achieve ~72% accuracy in predicting whether a strain-product pair is high- or low-performing, providing evidence that our algorithm can capture the hidden trends present in the data despite its drawbacks. Thus, this serves as another example that machine learning algorithms can be useful throughout the strain development pipeline in metabolic engineering (Presnell and Alper, 2019; Kim et al., 2020), even though the predictions may be imperfect. At a minimum, the results can augment rational designs based on a priori knowledge to better inform the user. To achieve higher predictive power, further development of high-throughput systems that rapidly generate relevant, high-quality data is clearly required. Yet simply having more data is insufficient, as most of the algorithms used are borrowed directly from other disciplines, leading to a mismatch between their intended and actual use cases. Therefore, novel algorithms should be created from the ground up in conjunction with data collection to ensure the best possible performance.
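A binary accuracy such as the ~72% figure can be computed by thresholding both the held-out observed titers and the corresponding predictions into high and low classes, then counting agreements. The sketch below assumes a single log-titer cutoff shared by observations and predictions, defaulting to the median of the observed values. This cutoff rule and the example numbers are placeholders, not the settings used in Chapter 5.

```python
import statistics


def binary_accuracy(observed, predicted, cutoff=None):
    """Fraction of pairs whose predicted class matches the observed class.

    A value >= cutoff is labeled 'high', otherwise 'low'. If no cutoff is
    given, the median of the observed values is used (an assumption).
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    if cutoff is None:
        cutoff = statistics.median(observed)
    hits = sum((o >= cutoff) == (p >= cutoff) for o, p in zip(observed, predicted))
    return hits / len(observed)


if __name__ == "__main__":
    # Hypothetical held-out log10 titers and model predictions.
    obs = [0.5, 1.5, 2.5, 0.2]
    pred = [0.7, 0.9, 2.0, 1.2]
    print(binary_accuracy(obs, pred))
```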


6.2 Suggestions for future work

6.2.1 Methanol as a more efficient NADPH generator

For Y. lipolytica cultured under co-feeding conditions with acetate as the primary carbon source, we previously described a pentose cycle composed of the oxidative pentose phosphate pathway (oxPPP), the transketolase (TKL) and transaldolase (TAL) steps, and the phosphoglucose isomerase (PGI) step. Note that the TKL, TAL, and PGI reactions are all highly reversible. To keep this cyclic pathway operating, the gluconeogenic flux therefore needs to be strong enough to push PGI in the reverse direction, catalyzing the conversion of fructose-6-phosphate (F6P) to glucose-6-phosphate (G6P) and closing the cycle. This requirement is met when the cells consume acetate, which transcriptionally activates a series of gluconeogenic enzymes (phosphoenolpyruvate carboxykinase (PEPCK) and fructose-1,6-bisphosphatase (FBP)) and drives mass-action kinetics that favor gluconeogenesis.
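The mass-action argument can be stated with the standard relation between the transformed reaction Gibbs energy and metabolite ratios. For the reverse PGI reaction (F6P → G6P) it reads as follows; no measured values are implied:

```latex
\Delta_r G' = \Delta_r G'^{\circ} + RT\ln\frac{[\mathrm{G6P}]}{[\mathrm{F6P}]}
\qquad\Longrightarrow\qquad
\Delta_r G' < 0 \iff \frac{[\mathrm{G6P}]}{[\mathrm{F6P}]} < K'_{\mathrm{eq}} = \exp\!\left(-\frac{\Delta_r G'^{\circ}}{RT}\right)
```

A strong gluconeogenic supply of F6P combined with continuous withdrawal of G6P by G6PDH keeps the ratio [G6P]/[F6P] low. That pushes it below the apparent equilibrium constant, so PGI can run in the reverse direction without any change to the enzyme itself.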

The reversed PGI seen in Y. lipolytica under our culture conditions resembles that of the dissimilatory ribulose monophosphate pathway (RuMP) in methylotrophs (Zhang et al., 2017).

This inspired us to implement the RuMP cycle in Y. lipolytica, allowing it to convert methanol, a biotechnologically important substrate (Zhang et al., 2018), exclusively into NADPH (Figure 6.1).

To this end, we overexpressed methanol dehydrogenase (MDH), hexulose-6-phosphate synthase (HPS), and phosphohexulose isomerase (PHI) as the three key enzymes of the pathway (Woolston et al., 2018). Co-feeding the cells with natural-abundance acetate and [13C]methanol suggests that the dissimilatory RuMP cycle is functioning in Y. lipolytica, although the contribution of methanol to NADPH generation is minimal in these preliminary experiments (Figure 6.2). The benefit of this construction over the previous pentose cycle is that 100% of the co-fed methanol entering the cycle is converted to NADPH, as no other metabolic intermediates are labeled (Figure 6.2, PEP). By contrast, in the previous study, a portion of the co-fed gluconate leaks from the pentose cycle and contributes to other parts of central carbon metabolism, leading to a lower NADPH yield (see Figure 4.7). Therefore, to maximize the efficiency of the supplemental substrate, the dissimilatory RuMP cycle driven by methanol is a superior alternative. Further optimization to improve methanol incorporation, possibly by introducing more active MDHs or alcohol oxidases (AOXs), as well as to reduce the toxicity of methanol, potentially by investigating glutathione-dependent pathways, can be topics for future research in this direction.
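The efficiency comparison above can be made semi-quantitative with idealized stoichiometry, as in the sketch below. Its assumptions are labeled in the comments:

- Gluconate enters the cycle as 6PG, so its full oxidation in the oxPPP cycle gives 11 NADPH, one G6PDH step short of the textbook G6P + 12 NADP+ → 6 CO2 + 12 NADPH.
- Each methanol passes once through G6PDH and 6PGDH via H6P → F6P → G6P → 6PG → Ru5P + CO2, giving 2 NADPH.
- Electrons from the MDH/AOX step are not counted.
- Leakage is modeled crudely as a fraction of substrate that leaves the cycle without contributing cycle NADPH.

The leakage fractions are hypothetical placeholders, not values from Figure 4.7 or Figure 6.2.

```python
# Idealized cycle stoichiometry (assumptions, see lead-in):
#   substrate -> (carbon atoms, mol NADPH per mol substrate if fully oxidized in the cycle)
IDEAL_CYCLE_NADPH = {
    "gluconate": (6, 11.0),  # enters as 6PG: 12 NADPH for G6P minus one G6PDH step
    "methanol": (1, 2.0),    # one pass through G6PDH + 6PGDH; MDH/AOX electrons not counted
}


def nadph_yield(substrate, fraction_cycled=1.0):
    """mol NADPH per mol substrate when only `fraction_cycled` of it is oxidized in the
    cycle and the remainder leaks to other pathways without yielding cycle NADPH."""
    if not 0.0 <= fraction_cycled <= 1.0:
        raise ValueError("fraction_cycled must lie between 0 and 1")
    _, nadph_max = IDEAL_CYCLE_NADPH[substrate]
    return nadph_max * fraction_cycled


def nadph_per_carbon(substrate, fraction_cycled=1.0):
    """Same yield normalized per mol of substrate carbon."""
    carbons, _ = IDEAL_CYCLE_NADPH[substrate]
    return nadph_yield(substrate, fraction_cycled) / carbons


if __name__ == "__main__":
    # Methanol: no leakage observed (Figure 6.2). Gluconate: hypothetical 30% leakage.
    print("methanol", nadph_per_carbon("methanol", 1.0))
    print("gluconate", round(nadph_per_carbon("gluconate", 0.7), 3))
```

On a per-carbon basis, methanol gives 2 NADPH even before leakage is considered, while gluconate gives at most 11/6. Any leakage from the pentose cycle widens that gap further.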

[Figure 6.1 schematic: Methanol → Formaldehyde (Aox/MDH) → H6P (HPS, with Ru5P) → F6P (PHI) → G6P (PGI) → 6PG (G6PDH) → Ru5P + CO2 (6PGDH); 2e− are released at the Aox/MDH, G6PDH, and 6PGDH steps]

Figure 6.1. The dissimilatory RuMP pathway. This pathway is commonly found in methylotrophs that can natively consume methanol. With PGI operating in the reverse direction due to acetate-driven gluconeogenic flux and the overexpression of MDH, HPS, and PHI, the dissimilatory RuMP cycle was constructed in Y. lipolytica to oxidize methanol for the exclusive generation of NADPH. Abbreviations used: G6P, glucose-6-phosphate; F6P, fructose-6-phosphate; 6PG, 6-phosphogluconate; Ru5P, ribulose-5-phosphate; H6P, hexulose-6-phosphate.

6.2.2 Multi-substrate cofeeding

In this thesis, we implemented our synergistic co-feeding system using substrate pairs for both M. thermoacetica and Y. lipolytica. However, the idea of having substrates complement each other in terms of metabolic function is not limited to two different sources. In analyzing the isotopic tracing results from M. thermoacetica co-fed with glucose (ATP) and H2 (e−) as complementary energy sources for CO2 fixation, we noticed that carbon from glucose also contributed significantly to aromatic amino acid and ribose (i.e., nucleic acid) synthesis. Given that the cells can derive their carbon only from glucose or CO2 in our system, it is unsurprising that these biosynthetic precursors are largely produced from glucose, as the relevant pathways do not involve CO2. Consequently, although glucose co-feeding resolved the issue of ATP availability, as evidenced by the high per-cell productivity, it was still insufficient to fully debottleneck the synthesis of these important biomass building blocks. This is evidenced by the low cell density: OD660 did not exceed 0.43, although biomass was significantly higher than under autotrophic conditions with H2/CO2 alone.

[Figure 6.2 plot: mass isotopomer distribution (MID; M+0 and M+1 fractions, y-axis 75–100%) of G6P, F6P, R5P, and PEP]

Figure 6.2. Exclusive conversion of methanol into NADPH through the RuMP cycle. Labeling patterns of relevant metabolic intermediates following [13C]methanol and natural-abundance acetate co-feeding revealed that methanol was incorporated into the dissimilatory RuMP cycle (M+1 in G6P and F6P) and subsequently decarboxylated through the oxPPP to form NADPH (no M+1 in R5P). Furthermore, no leakage of methanol outside of the RuMP cycle was observed, as evidenced by the labeling of PEP (no M+1). This indicates that 100% of the metabolized methanol was converted into NADPH, which is more efficient than the pentose cycle.

Additional abbreviations used: R5P, ribose-5-phosphate; PEP, phosphoenolpyruvate.

The analysis above prompted us to consider additional carbon sources that could be introduced to support biomass synthesis. Here, the obvious choice is to provide the limiting components identified above. These are the three aromatic amino acids tyrosine, phenylalanine, and tryptophan, together with xylose, a native pentose substrate of M. thermoacetica, for the synthesis of ribose moieties. As illustrated by Figure 6.3, addition of these substrates increased cell density, especially in the case of xylose. Furthermore, the specific CO2 fixation rates did not change appreciably (data not shown), indicating that the aromatic amino acids and xylose did not detrimentally affect the well-established glucose + H2 condition. Nevertheless, despite these encouraging results, the cell density within our chemostat system remained low, implying that there might be other limitations.

Hence, for future research related to this topic, it is worthwhile to formulate a systematic approach for identifying a set of complementary substrates that can satisfy all demands of the cell, so as to increase the growth rate and the specific CO2 fixation rate simultaneously. For instance, computational methods can be applied to screen combinations of substrates within a predefined pool. We note here that while it is tempting to add as many substrates as possible to the system to maximize all gains, having a large number of organic substances severely limits net CO2 fixation (see Figure 4.12), which is the ultimate goal. As such, the challenge here is to find a minimal set of substrates that can achieve a prespecified level of growth and net CO2 fixation, which can be formulated as an interesting optimization problem. Once the types of substrates are identified, the respective feed rates should also be determined, subject to the constraint that each is lower than the corresponding uptake rate of the cells (if catabolite repression is an issue).
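The optimization problem described above can be prototyped as an exhaustive search. In the sketch below, each candidate substrate is assumed to relieve a set of cellular demands and to add an organic-carbon load; the demand labels are taken loosely from this section. Net CO2 fixation is represented crudely by a cap on total organic carbon. All numbers, labels, and function names are hypothetical placeholders, not measured values. The search returns the smallest covering subset within the cap, breaking ties by lower organic carbon. A helper checks the feed-rate constraint.

```python
from itertools import combinations

# Hypothetical pool: substrate -> (demands it is assumed to relieve, organic-carbon load, arbitrary units)
POOL = {
    "glucose": ({"ATP"}, 6.0),
    "xylose": ({"ribose"}, 5.0),
    "aromatic_amino_acids": ({"aromatics"}, 9.0),
    "H2": ({"electrons"}, 0.0),
}


def minimal_substrate_set(pool, demands, max_organic_carbon):
    """Smallest subset of `pool` covering all `demands` with total organic carbon
    <= max_organic_carbon (a stand-in for a net CO2 fixation constraint).
    Returns (carbon, subset) or None. Exhaustive, so only suitable for small pools."""
    names = sorted(pool)
    for size in range(1, len(names) + 1):
        feasible = []
        for subset in combinations(names, size):
            covered = set().union(*(pool[n][0] for n in subset))
            carbon = sum(pool[n][1] for n in subset)
            if demands <= covered and carbon <= max_organic_carbon:
                feasible.append((carbon, subset))
        if feasible:
            return min(feasible)  # fewest substrates first, then least organic carbon
    return None


def feeds_below_uptake(feed_rates, max_uptake_rates):
    """True if every substrate is fed no faster than the cells can take it up,
    so that it does not accumulate in the medium."""
    return all(feed_rates[n] <= max_uptake_rates[n] for n in feed_rates)


if __name__ == "__main__":
    print(minimal_substrate_set(POOL, {"ATP", "electrons"}, max_organic_carbon=10.0))
    print(minimal_substrate_set(POOL, {"ATP", "electrons", "ribose", "aromatics"}, 15.0))
```

Exhaustive enumeration grows as 2^n with pool size. For larger pools, the same problem would more naturally be posed as an integer linear program, that is, a set-cover formulation with an added carbon constraint.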

[Figure 6.3 plot: OD660 (y-axis, 0–0.8) for the glucose/H2, +AAA, and +AAA +xylose conditions]

Figure 6.3. Multi-substrate cofeeding enhances cell density in addition to per-cell productivity in M. thermoacetica H2/CO2 cultures. Glucose/H2 is the standard dual-substrate cofeeding scheme described in previous chapters, and these two substrates are present in all three conditions tested here. We also added aromatic amino acids (tyr, trp, phe; the +AAA condition) and xylose (the +AAA +xylose condition) to the system for co-utilization. The results indicate that biomass synthesis can be improved by these modifications, with xylose addition contributing the most. Specific productivities remained similar across all conditions.

6.3 References

Kim, G. B. et al. (2020) ‘Machine learning applications in systems metabolic engineering’, Current Opinion in Biotechnology, 64, pp. 1–9. doi: 10.1016/j.copbio.2019.08.010.

Presnell, K. V. and Alper, H. S. (2019) ‘Systems Metabolic Engineering Meets Machine Learning: A New Era for Data-Driven Metabolic Engineering’, Biotechnology Journal, 14(9). doi: 10.1002/biot.201800416.

Woolston, B. M. et al. (2018) ‘Improving formaldehyde consumption drives methanol assimilation in engineered E. coli’, Nature Communications, 9(1). doi: 10.1038/s41467-018-04795-4.

Zhang, W. et al. (2017) ‘Guidance for engineering of synthetic methylotrophy based on methanol metabolism in methylotrophy’, RSC Advances, 7(7), pp. 4083–4091. doi: 10.1039/c6ra27038g.

Zhang, W. et al. (2018) ‘Current advance in bioconversion of methanol to chemicals’, Biotechnology for Biofuels, 11(1), pp. 1–11. doi: 10.1186/s13068-018-1265-y.

Appendix A

Isotopic tracing and MFA results for Y. lipolytica acetate cultures


Table A1. Model metabolic network for growth phase

Extracellular fluxes
v1: Acetate.ext (ab) -> Acetate (ab)
v2: AcCoA.c (ab) -> Lipids (ab)
v3: Glyc3P (abc) -> TAGs (abc)
v4: Ala + Arg + Asn + Asp + Cys + Gln + Glu + Gly + His + Ile + Leu + Lys + Met + Phe + Pro + Ser + Thr + Trp + Tyr + Val + CMP + GMP + UMP + AMP + DAMP + DCMP + DGMP + DTMP + Chitin + Trehalose + Mannan + Glucan + Ergo + Zymo + PA + PI + PS + PE + PC + AcCoA.c -> biomass (see Supplementary Table 1 for stoichiometry)

Pyruvate compartmentalization
v5: 0 Pyr.c (abc) -> Pyr.mnt (abc)
v6: 0 Pyr.m (abc) -> Pyr.mnt (abc)
v7: Pyr.mnt (abc) -> Pyr.fix (abc)

Malate compartmentalization
v8: 0 Mal.c (abcd) -> Mal.mnt (abcd)
v9: 0 Mal.m (abcd) -> Mal.mnt (abcd)
v10: Mal.mnt (abcd) -> Mal.fix (abcd)

Succinate compartmentalization
v11: 0 Suc.c (abcd) -> Suc.mnt (abcd)
v12: 0 Suc.m (abcd) -> Suc.mnt (abcd)
v13: Suc.mnt (abcd) -> Suc.fix (abcd)

Citrate compartmentalization
v14: 0 Cit.c (abcdef) -> Cit.mnt (abcdef)
v15: 0 Cit.m (abcdef) -> Cit.mnt (abcdef)
v16: Cit.mnt (abcdef) -> Cit.fix (abcdef)

Gluconeogenesis
v17: OAA.c (abcd) -> PEP (abc) + CO2 (d)
v18: PEP (abc) <-> PG3 (abc)
v19: PEP (abc) -> Pyr.c (abc)
v20: PG3 (abc) <-> G3P (abc)
v21: G3P (abc) <-> DHAP (abc)
v22: DHAP (abc) -> Glyc3P (abc)
v23: DHAP (cba) + G3P (def) <-> FBP (abcdef)
v24: FBP (abcdef) <-> F6P (abcdef)
v25: F6P (abcdef) <-> G6P (abcdef)

Pentose Phosphate Pathway
v26: G6P (abcdef) -> CO2 (a) + Ru5P (bcdef)
v27: Ru5P (abcde) <-> R5P (abcde)
v28: Ru5P (abcde) <-> X5P (abcde)
v29: X5P (abcde) <-> G3P (cde) + TKC2 (ab)
v30: E4P (cdef) + TKC2 (ab) <-> F6P (abcdef)
v31: R5P (cdefg) + TKC2 (ab) <-> S7P (abcdefg)
v32: G3P (def) + TAC3 (abc) <-> F6P (abcdef)
v33: S7P (abcdefg) <-> E4P (defg) + TAC3 (abc)

Pyruvate metabolism
v34: Mal.m (abcd) -> Pyr.m (abc) + CO2 (d)
v35: Mal.c (abcd) <-> OAA.c (abcd)
v36: AcCHO (ab) -> AcCoA.c (ab)
v37: Acetate (ab) -> AcCoA.c (ab)

Citric Acid Cycle
v38: Pyr.m (abc) -> AcCoA.m (bc) + CO2 (a)
v39: OAA.m (abcd) + AcCoA.m (ef) -> Cit.m (dcbfea)
v40: Cit.m (abcdef) <-> ICit.m (abcdef)
v41: ICit.m (abcdef) -> AKG (abcde) + CO2 (f)
v42: AKG (abcde) -> Suc.m (bcde) + CO2 (a)
v43: Suc.m (abcd) <-> Fum (abcd)
v44: Fum (abcd) <-> Mal.m (abcd)
v45: Mal.m (abcd) <-> OAA.m (abcd)

Glyoxylate Shunt
v46: Cit.c (abcdef) <-> ICit.c (abcdef)
v47: ICit.c (abcdef) -> Glx (ab) + Suc.c (edcf)
v48: Glx (ab) + AcCoA.c (cd) -> Mal.c (abdc)
v49: Suc.c (abcd) -> Suc.m (abcd)

One-carbon metabolism
v50: PG3 (abc) + Glu (defgh) -> Ser (abc) + AKG (defgh)
v51: Ser (abc) <-> Gly (ab) + C1 (c)
v52: Gly (ab) <-> CO2 (a) + C1 (b)

Transport Reactions
v53: Pyr.c (abc) -> Pyr.m (abc)
v54: Cit.m (abcdef) -> Cit.c (abcdef)
v55: AcCoA.c (ab) <-> AcCoA.m (ab)
v56: Mal.c (abcd) -> Mal.m (abcd)

Amino Acid Biosynthesis (neglecting One-carbon metabolism)
v57: AKG (abcde) -> Glu (abcde)
v58: Glu (abcde) -> Gln (abcde)
v59: Glu (abcde) -> Pro (abcde)
v60: Glu (abcde) + Glu (fghij) + CO2 (k) + Gln (lmnop) + Asp (qrst) -> AKG (abcde) + Arg (fghijk) + Glu (lmnop) + Fum (qrst)
v61: OAA.c (abcd) + Glu (efghi) -> Asp (abcd) + AKG (efghi)
v62: Asp (abcd) + Gln (efghi) -> Asn (abcd) + Glu (efghi)
v63: Pyr.m (abc) + Glu (defgh) -> Ala (abc) + AKG (defgh)
v64: Asp (abcd) -> Thr (abcd)
v65: Thr (abcd) -> Gly (ab) + AcCHO (cd)
v66: Ser (abc) + AcCoA.c (de) + Asp (fghi) + CO2 (j) -> Cys (abc) + Acetate (de) + Suc.c (gihj) + CO2 (f)
v67: Asp (abcd) + C1 (e) + AcCoA.c (fg) -> Met (abcde) + Acetate (fg)
v68: AKG (abcde) + AcCoA.c (fg) + Glu (hijkl) + Glu (mnopq) -> Lys (fgbcde) + CO2 (a) + AKG (hijkl) + AKG (mnopq)
v69: AcCoA.m (ab) + Pyr.m (cde) + Pyr.m (fgh) + Glu (ijklm) -> Leu (abdghe) + CO2 (c) + CO2 (f) + AKG (ijklm)
v70: Thr (abcd) + Pyr.m (efg) + Glu (hijkl) -> Ile (abfcdg) + CO2 (e) + AKG (hijkl)
v71: Pyr.m (abc) + Pyr.m (def) + Glu (ghijk) -> Val (abecf) + CO2 (d) + AKG (ghijk)
v72: PEP (abc) + PEP (def) + E4P (ghij) + Glu (klmno) -> Phe (abcefghij) + CO2 (d) + AKG (klmno)
v73: PEP (abc) + PEP (def) + E4P (ghij) + Glu (klmno) -> Tyr (abcefghij) + CO2 (d) + AKG (klmno)
v74: Ser (abc) + R5P (defgh) + PEP (ijk) + E4P (lmno) + PEP (pqr) + Gln (stuvw) -> Trp (abcedklmnoj) + CO2 (i) + G3P (fgh) + Pyr.c (pqr) + Glu (stuvw)
v75: R5P (abcde) + C1 (f) + Gln (ghijk) + Asp (lmno) -> His (edcbaf) + AKG (ghijk) + Fum (lmno)

Nucleotide Biosynthesis
v76: CO2 (b) + Asp (nfed) + R5P (vwxyz) + Gln (ijklm) -> UMP (bdefvwxyz) + CO2 (n) + Glu (ijklm)
v77: UMP (bdefvwxyz) + Gln (ijklm) -> CMP (bdefvwxyz) + Glu (ijklm)
v78: R5P (vwxyz) + Gln (ijklm) + Gln (nopqr) + Gly (de) + Asp (stua) + C1 (b) + C1 (h) + CO2 (f) -> IMP (bdefhvwxyz) + Glu (ijklm) + Glu (nopqr) + Fum (stua)
v79: IMP (bdefhvwxyz) + Asp (ijkl) -> AMP (bdefhvwxyz) + Fum (ijkl)
v80: IMP (bdefhvwxyz) + Gln (ijklm) -> GMP (bdefhvwxyz) + Glu (ijklm)
v81: AMP (abcdefghij) -> DAMP (abcdefghij)
v82: CMP (abcdefghi) -> DCMP (abcdefghi)
v83: GMP (abcdefghij) -> DGMP (abcdefghij)
v84: UMP (abcdefghi) -> DUMP (abcdefghi)
v85: DUMP (abcdefghi) + C1 (j) -> DTMP (abcdjefghi)

Lipid Synthesis
v86: Glyc3P (abc) -> PA (abc)
v87: Glyc3P (def) + Ser (abc) -> PS (abcdef)
v88: Glyc3P (def) + Ser (abc) -> PE (cbdef) + CO2 (a)
v89: Glyc3P (def) + Ser (abc) + C1 (g) + C1 (h) + C1 (i) -> PC (cbghidef) + CO2 (a)
v90: Glyc3P (ghi) + G6P (abcdef) -> PI (fedcbaghi)
v91: AcCoA.c (ab) + AcCoA.c (cd) -> AcAcCoA (abcd)
v92: AcAcCoA (abcd) + AcCoA.c (ef) -> HMGCoA (abcfed)
v93: HMGCoA (abcdef) -> MVL (edcbaf)
v94: MVL (abcdef) -> IPPP (edcbf) + CO2 (a)
v95: IPPP (abcde) -> DMAPP (abcde)
v96: DMAPP (abcde) + IPPP (fghij) + IPPP + DMAPP + IPPP + IPPP -> CO2 (d) + CO2 (e) + CO2 (j) + Zymo
v97: Zymo + C1 -> Ergo

Carbohydrate Biosynthesis
v98: G6P (abcdef) + G6P (ghijkl) -> Trehalose (abcdefghijkl)
v99: G6P (abcdef) + AcCoA.c (gh) -> Chitin (abcdefgh)
v100: G6P (abcdef) -> Glucan (abcdef)
v101: F6P (abcdef) -> Mannan (abcdef)

CO2 Output
v102: CO2 (a) -> CO2.out (a)

Table A2. Model metabolic network for lipid production phase

Extracellular fluxes
v1: Acetate.ext (ab) -> AcCoA.c (ab)
v2: AcCoA.c (ab) -> Lipids (ab)
v3: Glyc3P (abc) -> TAGs (abc)
v4: Cit.c (abcdef) -> Cit.ext (abcdef)

Pyruvate compartmentalization
v5: 0 Pyr.c (abc) -> Pyr.mnt (abc)
v6: 0 Pyr.m (abc) -> Pyr.mnt (abc)
v7: Pyr.mnt (abc) -> Pyr.fix (abc)

Malate compartmentalization
v8: 0 Mal.c (abcd) -> Mal.mnt (abcd)
v9: 0 Mal.m (abcd) -> Mal.mnt (abcd)
v10: Mal.mnt (abcd) -> Mal.fix (abcd)

Succinate compartmentalization
v11: 0 Suc.c (abcd) -> Suc.mnt (abcd)
v12: 0 Suc.m (abcd) -> Suc.mnt (abcd)
v13: Suc.mnt (abcd) -> Suc.fix (abcd)

Alanine compartmentalization
v14: 0 Pyr.c (abc) -> Ala (abc)
v15: 0 Pyr.m (abc) -> Ala (abc)
v16: Ala (abc) -> Ala.fix (abc)

Aspartate compartmentalization
v17: 0 OAA.c (abcd) -> Asp (abcd)
v18: 0 OAA.m (abcd) -> Asp (abcd)
v19: Asp (abcd) -> Asp.fix (abcd)

Gluconeogenesis
v20: OAA.c (abcd) -> PEP (abc) + CO2 (d)
v21: PEP (abc) <-> PG3 (abc)
v22: PEP (abc) -> Pyr.c (abc)
v23: PG3 (abc) <-> G3P (abc)
v24: G3P (abc) <-> DHAP (abc)
v25: DHAP (abc) -> Glyc3P (abc)
v26: DHAP (cba) + G3P (def) <-> FBP (abcdef)
v27: FBP (abcdef) <-> F6P (abcdef)
v28: F6P (abcdef) <-> G6P (abcdef)

Pentose Phosphate Pathway v29: G6P (abcdef) -> C02 (a) + Ru5P (bcdef) v30: Ru5P (abcde) <-> R5P (abcde) v31: Ru5P (abcde) <-> X5P (abcde) v32: X5P (abcde) <-> G3P (cde) + TKC2 (ab) v33: E4P (cdef) + TKC2 (ab) <-> F6P (abcdef) v34: R5P (cdefg) + TKC2 (ab) <-> S7P (abcdefg) v35: G3P (def) + TAC3 (abc) <-> F6P (abcdef) v36: S7P (abcdefg) <-> E4P (defg) + TAC3 (abc)

Pyruvate metabolism v37: Mal.m (abcd) -> Pyr.m (abc) + C02 (d)

183 v38: Mal.c (abcd) <-> OAA.c (abcd)

Citric Acid Cycle v39: Pyr.m (abc) -> AcCoA.m (bc) + C02 (a) v40: OAA.m (abcd) + AcCoA.m (ef) -> Cit.m (dcbfea) v41: Cit.m (abcdef) <-> ICit.m (abcdef) v42: ICit.m (abcdef) -> AKG (abcde) + C02 (f) v43: AKG (abcde) -> Suc.m (bcde) + C02 (a) v44: Suc.m (abcd) <-> Fum (abcd) v45: Fum (abcd) <-> Mal.m (abcd) v46: Mal.m (abcd) <-> OAA.m (abcd)

Glyoxylate Shunt v47: Cit.c (abcdef) <-> ICit.c (abcdef) v48: ICit.c (abcdef) -> Glx (ab) + Suc.c (edcf) v49: Glx (ab) + AcCoA.c (cd) -> Mal.c (abdc) v50: Suc.c (abcd) -> Suc.m (abcd)

Transport Reactions v51: Pyr.c (abc) -> Pyr.m (abc) v52: Cit.m (abcdef) -> Cit.c (abcdef) v53: AcCoA.c (ab) <-> AcCoA.m (ab) v54: Mal.c (abcd) -> Mal.m (abcd)

C02 Output v55: C02 (a) -> C02.out (a)
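The same letter strings also show where each product carbon comes from. As an illustration, the hypothetical helper below reads v40 (citrate synthase, OAA.m + AcCoA.m -> Cit.m) and lists the citrate positions that are inherited from acetyl-CoA. It assumes each product name occurs only once per reaction. That holds for v40 but not for reactions with repeated products, such as the three separate CO2 products of v96 in Table A1.

```python
import re

# "Name (letters)": one lowercase letter per carbon atom.
SPECIES = re.compile(r"([A-Za-z0-9.]+)\s*\(([a-z]+)\)")

def carbon_origins(rxn: str) -> dict[tuple[str, int], tuple[str, int]]:
    """Map (product, carbon position) -> (substrate, carbon position), both 1-based.

    Assumes each product name appears only once in the reaction string.
    """
    arrow = "<->" if "<->" in rxn else "->"
    left, right = rxn.split(arrow, 1)
    source = {}  # letter -> (substrate name, position)
    for name, letters in SPECIES.findall(left):
        for pos, letter in enumerate(letters, start=1):
            source[letter] = (name, pos)
    origins = {}
    for name, letters in SPECIES.findall(right):
        for pos, letter in enumerate(letters, start=1):
            origins[(name, pos)] = source[letter]
    return origins

# v40 from Table A2 (citrate synthase)
v40 = "OAA.m (abcd) + AcCoA.m (ef) -> Cit.m (dcbfea)"
origins = carbon_origins(v40)
from_acetyl = [pos for (prod, pos), (sub, _) in origins.items() if sub == "AcCoA.m"]
print(from_acetyl)  # [4, 5]
```

Read this way, v40 places acetyl-CoA carbons 2 and 1 at citrate positions 4 and 5. The four oxaloacetate carbons occupy positions 1, 2, 3 and 6 in reverse order.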


Table A3. Biomass formula for the MTYL037 and MTYL065 strains during the growth phase

Component                 MTYL037 (mmol/g DCW)   MTYL065 (mmol/g DCW)
Amino acids
  Alanine                 0.294                  0.241
  Arginine                0.174                  0.143
  Asparagine              0.196                  0.161
  Aspartate               0.196                  0.161
  Cysteine                0.036                  0.030
  Glutamine               0.239                  0.196
  Glutamate               0.239                  0.196
  Glycine                 0.214                  0.175
  Histidine               0.112                  0.092
  Isoleucine              0.159                  0.131
  Leucine                 0.319                  0.262
  Lysine                  0.254                  0.208
  Methionine              0.112                  0.092
  Phenylalanine           0.174                  0.143
  Proline                 0.170                  0.140
  Serine                  0.203                  0.167
  Threonine               0.221                  0.181
  Tryptophan              0.036                  0.030
  Tyrosine                0.112                  0.092
  Valine                  0.221                  0.181
Nucleotides
  CMP                     0.044                  0.036
  GMP                     0.070                  0.058
  UMP                     0.047                  0.039
  AMP                     0.057                  0.047
  dAMP                    0.029                  0.024
  dCMP                    0.028                  0.023
  dGMP                    0.028                  0.023
  dTMP                    0.029                  0.024
Carbohydrates
  Chitin                  0.261                  0.228
  Trehalose               0.0022                 0.0019
  Mannan                  0.080                  0.070
  Glucan                  0.273                  0.238
Lipids other than TAGs
  Ergosterol              0.013                  0.011
  Zymosterol              0.060                  0.050
  PA                      0.0016                 0.0013
  PI                      0.015                  0.012
  PS                      0.0045                 0.0037
  PE                      0.012                  0.010
  PC                      0.017                  0.014
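Dividing the MTYL065 column of Table A3 by the MTYL037 column shows that every component is lower in MTYL065 on a per-gram basis. Within each class the reduction is roughly constant: about 0.82 for the amino acids and AMP, about 0.83 for zymosterol, and about 0.87 for glucan and chitin. A minimal Python sketch of that arithmetic on a few rows of the table:

```python
# Selected rows of Table A3 (mmol/g DCW), stored as (MTYL037, MTYL065).
BIOMASS = {
    "Alanine": (0.294, 0.241),
    "Leucine": (0.319, 0.262),
    "AMP": (0.057, 0.047),
    "Zymosterol": (0.060, 0.050),
    "Glucan": (0.273, 0.238),
    "Chitin": (0.261, 0.228),
}

def strain_ratios(table: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Return the MTYL065/MTYL037 coefficient ratio per component, rounded to 3 decimals."""
    return {name: round(b / a, 3) for name, (a, b) in table.items()}

for name, ratio in strain_ratios(BIOMASS).items():
    print(f"{name:<11} {ratio}")
```

The ratios only describe the table. They do not by themselves explain why the MTYL065 coefficients are lower.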


Table A4. Intracellular metabolite mass isotopomer distributions during the growth phase

Notation: Fragment: the ion fragment measured by the mass spectrometer. The number after the underscore represents the mass of the M+0 isotopomer of the fragment. Mass isotopomers: the number of carbon atoms labeled with 13C within a metabolite. Measured: experimentally determined mole fraction of each mass isotopomer. Simulated: mole fraction of each mass isotopomer obtained by simulating the best-fit flux results.

[1-13C] sodium acetate tracer
Fragment / Mass isotopomer   MTYL037 Measured   MTYL037 Simulated   MTYL065 Measured   MTYL065 Simulated
Pyr_174 (GC-MS)
  M+0   0.3255   0.3263   0.3235   0.3171
  M+1   0.5486   0.5527   0.5613   0.5654
  M+2   0.0951   0.0904   0.0858   0.0877
  M+3   0.0280   0.0278   0.0265   0.0272
  M+4   0.0025   0.0025   0.0029   0.0023
  M+5   0.0013   0.0003   0        0.0004
Ala_232 (GC-MS)
  M+0   0.7217   0.7239   0.7287   0.7309
  M+1   0.1853   0.1867   0.1807   0.1827
  M+2   0.0774   0.0744   0.0758   0.0724
  M+3   0.0130   0.0125   0.0124   0.0118
  M+4   0.0025   0.0022   0.0024   0.0020
Ala_260 (GC-MS)
  M+0   0.2765   0.2829   0.2771   0.2732
  M+1   0.5217   0.5200   0.5281   0.5266
  M+2   0.1397   0.1380   0.1365   0.1343
  M+3   0.0519   0.0495   0.0490   0.0500
  M+4   0.0086   0.0080   0.0078   0.0110
  M+5   0.0016   0.0014   0.0013   0.0049
Gly_218 (GC-MS)
  M+0   0.7448   0.7414   0.7469   0.7483
  M+1   0.1747   0.1766   0.1722   0.1735
  M+2   0.0695   0.0693   0.0687   0.0716
  M+3   0.0110   0.0107   0.0103   0.0130
Gly_246 (GC-MS)
  M+0   0.2955   0.2930   0.2982   0.2942
  M+1   0.5243   0.5239   0.5193   0.5203
  M+2   0.1261   0.1286   0.1158   0.1307
  M+3   0.0466   0.0463   0.0455   0.0465
  M+4   0.0075   0.0069   0.0112   0.0071
Suc_289 (GC-MS)
  M+0   0.0536   0.0466   0.0397   0.0386
  M+1   0.5012   0.5028   0.5192   0.5199
  M+2   0.3248   0.3241   0.3153   0.3166
  M+3   0.0907   0.0940   0.0922   0.0933
  M+4   0.0250   0.0274   0.0272   0.0266
  M+5   0.0033   0.0043   0.0052   0.0042
  M+6   0.0014   0.0007   0.0013   0.0007
Fum_287 (GC-MS)
  M+0   0.0418   0.0426   0.0289   0.0351
  M+1   0.4900   0.4859   0.4919   0.4959
  M+2   0.3365   0.3403   0.3419   0.3381
  M+3   0.0983   0.0967   0.0970   0.0967
  M+4   0.0290   0.0290   0.0345   0.0288
  M+5   0.0064   0.0046   0.0058   0.0045
Ser_390 (GC-MS)
  M+0   0.2390   0.2447   0.2355   0.2382
  M+1   0.4746   0.4704   0.4819   0.4760
  M+2   0.1852   0.1846   0.1822   0.1848
  M+3   0.0773   0.0765   0.0762   0.0771
  M+4   0.0190   0.0187   0.0190   0.0187
  M+5   0.0049   0.0043   0.0052   0.0043
Akg_346 (GC-MS)
  M+0   0.0260   0.0244   0.0145   0.0194
  M+1   0.2879   0.2936   0.2841   0.2875
  M+2   0.4863   0.4881   0.4986   0.4961
  M+3   0.1344   0.1349   0.1378   0.1368
  M+4   0.0506   0.0488   0.0500   0.0498
  M+5   0.0105   0.0084   0.0079   0.0086
  M+6   0.0028   0.0015   0.0041   0.0016
  M+7   0.0015   0.0002   0.0029   0.0002
Mal_391 (GC-MS)
  M+0   0.2277   0.2229   0.2273   0.2277
  M+1   0.4943   0.4970   0.4867   0.4936
  M+2   0.1756   0.1808   0.1762   0.1801
  M+3   0.0783   0.0768   0.0795   0.0762
  M+4   0.0173   0.0177   0.0213   0.0176
  M+5   0.0069   0.0041   0.0090   0.0041
Mal_419 (GC-MS)
  M+0   0.0331   0.0340   0.0260   0.0289
  M+1   0.4045   0.4089   0.4128   0.4210
  M+2   0.3516   0.3544   0.3499   0.3495
  M+3   0.1403   0.1382   0.1405   0.1373
  M+4   0.0528   0.0500   0.0525   0.0490
  M+5   0.0128   0.0116   0.0128   0.0114
  M+6   0.0048   0.0025   0.0054   0.0025
Glu_330 (GC-MS)
  M+0   0.0624   0.0630   0.0525   0.0506
  M+1   0.6587   0.6563   0.6705   0.6643
  M+2   0.1878   0.1933   0.1891   0.1962
  M+3   0.0735   0.0715   0.0715   0.0727
  M+4   0.0140   0.0133   0.0145   0.0136
  M+5   0.0029   0.0023   0.0022   0.0023
  M+6   0.0006   0.0003   0.0007   0.0003
Gln_431 (GC-MS)
  M+0   0.0230   0.0213   0.0183   0.0170
  M+1   0.2546   0.2585   0.2494   0.2527
  M+2   0.4468   0.4549   0.4556   0.4611
  M+3   0.1734   0.1736   0.1741   0.1759
  M+4   0.0759   0.0705   0.0766   0.0718
  M+5   0.0195   0.0167   0.0193   0.0170
  M+6   0.0054   0.0038   0.0053   0.0038
  M+7   0.0014   0.0006   0.0013   0.0006
PEP_453 (GC-MS)
  M+0   0.2379   0.2446   0.2374   0.2365
  M+1   0.4622   0.4677   0.4738   0.4770
  M+2   0.1836   0.1837   0.1843   0.1861
  M+3   0.0828   0.0788   0.0793   0.0826
  M+4   0.0244   0.0196   0.0195   0.0241
  M+5   0.0091   0.0047   0.0047   0.0049
Cit_459 (GC-MS)
  M+0   0.0023   0.0032   0.0016   0.0022
  M+1   0.0669   0.0704   0.0561   0.0583
  M+2   0.4070   0.4102   0.4090   0.4186
  M+3   0.3289   0.3257   0.3361   0.3284
  M+4   0.1318   0.1303   0.1346   0.1319
  M+5   0.0473   0.0462   0.0478   0.0466
  M+6   0.0111   0.0110   0.0113   0.0111
  M+7   0.0036   0.0024   0.0027   0.0024
  M+8   0.0011   0.0004   0.0009   0.0004
3PG_585 (GC-MS)
  M+0   0.2154   0.2103   0.2069   0.2041
  M+1   0.4235   0.4272   0.4316   0.4317
  M+2   0.2069   0.2145   0.2155   0.2155
  M+3   0.1027   0.1036   0.1044   0.1043
  M+4   0.0372   0.0325   0.0326   0.0326
  M+5   0.0143   0.0093   0.0091   0.0093
S7P289_97 (LC-MS/MS)
  M+0   0.1177   0.1082   0.0979   0.0906
  M+1   0.3120   0.3201   0.2938   0.2983
  M+2   0.3317   0.3338   0.3476   0.3471
  M+3   0.1763   0.1754   0.1936   0.1945
  M+4   0.0507   0.0519   0.0579   0.0582
  M+5   0.0091   0.0094   0.0092   0.0101
  M+6   0.0024   0.0011   0        0.0092
G6P259_97 (LC-MS/MS)
  M+0   0.1574   0.1509   0.1294   0.1255
  M+1   0.3902   0.4000   0.3741   0.3798
  M+2   0.3227   0.3222   0.3564   0.3519
  M+3   0.1037   0.1038   0.1128   0.1170
  M+4   0.0220   0.0205   0.0272   0.0228
  M+5   0.0040   0.0023   0        0.0016
DHAP169_97 (LC-MS/MS)
  M+0   0.3701   0.3715   0.3538   0.3573
  M+1   0.5826   0.5789   0.5938   0.5919
  M+2   0.0390   0.0430   0.0446   0.0444
  M+3   0.0064   0.0063   0.0078   0.0062
  M+4   0.0019   0.0002   0        0.0009
3PG185_79 (LC-MS/MS)
  M+0   0.3754   0.3667   0.3601   0.3559
  M+1   0.5814   0.5871   0.5935   0.5996
  M+2   0.0361   0.0382   0.0387   0.0369
  M+3   0.0071   0.0077   0.0077   0.0073
PEP167_79 (LC-MS/MS)
  M+0   0.3771   0.3677   0.3644   0.3569
  M+1   0.5790   0.5884   0.5988   0.6010
  M+2   0.0371   0.0371   0.0307   0.0358
  M+3   0.0068   0.0065   0.0062   0.0061

40% [U-13C2] sodium acetate tracer
Fragment / Mass isotopomer   MTYL037 Measured   MTYL037 Simulated   MTYL065 Measured   MTYL065 Simulated
Pyr_174 (GC-MS)
  M+0   0.2546   0.2507   0.2596   0.2558
  M+1   0.3074   0.3138   0.3060   0.3076
  M+2   0.2690   0.2716   0.2676   0.2693
  M+3   0.1422   0.1387   0.1399   0.1417
  M+4   0.0202   0.0200   0.0209   0.0202
  M+5   0.0066   0.0049   0.0061   0.0051
Ala_232 (GC-MS)
  M+0   0.2771   0.2725   0.2751   0.2725
  M+1   0.4211   0.4248   0.4207   0.4248
  M+2   0.2257   0.2247   0.2277   0.2247
  M+3   0.0602   0.0600   0.0606   0.0600
  M+4   0.0159   0.0156   0.0160   0.0156
Ala_260 (GC-MS)
  M+0   0.2230   0.2219   0.2163   0.2253
  M+1   0.2930   0.2944   0.2954   0.2906
  M+2   0.2736   0.2729   0.2768   0.2711
  M+3   0.1587   0.1583   0.1596   0.1600
  M+4   0.0402   0.0397   0.0404   0.0400
  M+5   0.0115   0.0111   0.0114   0.0113
Gly_218 (GC-MS)
  M+0   0.4625   0.4582   0.4582   0.4626
  M+1   0.4049   0.4040   0.4040   0.4058
  M+2   0.0986   0.1009   0.1009   0.0980
  M+3   0.0341   0.0318   0.0318   0.0303
Gly_246 (GC-MS)
  M+0   0.3814   0.3722   0.2951   0.2966
  M+1   0.2403   0.2431   0.3822   0.3782
  M+2   0.2925   0.2943   0.2408   0.2430
  M+3   0.0616   0.0641   0.0615   0.0580
  M+4   0.0243   0.0227   0.0175   0.0205
Suc_289 (GC-MS)
  M+0   0.1883   0.1828   0.1797   0.1819
  M+1   0.2042   0.2043   0.2176   0.2068
  M+2   0.2800   0.2884   0.2765   0.2853
  M+3   0.1754   0.1748   0.1817   0.1768
  M+4   0.1146   0.1137   0.1102   0.1131
  M+5   0.0282   0.0266   0.0262   0.0266
  M+6   0.0093   0.0081   0.0081   0.0081
Fum_287 (GC-MS)
  M+0   0.1903   0.1861   0.1865   0.1862
  M+1   0.1927   0.1990   0.2027   0.1997
  M+2   0.2931   0.2919   0.2845   0.2901
  M+3   0.1719   0.1714   0.1766   0.1722
  M+4   0.1226   0.1154   0.1199   0.1154
  M+5   0.0294   0.0267   0.0299   0.0268
Ser_390 (GC-MS)
  M+0   0.1979   0.1884   0.1975   0.1910
  M+1   0.2777   0.2783   0.2770   0.2757
  M+2   0.2717   0.2742   0.2718   0.2725
  M+3   0.1732   0.1739   0.1737   0.1750
  M+4   0.0596   0.0603   0.0599   0.0606
  M+5   0.0200   0.0196   0.0201   0.0198
Akg_346 (GC-MS)
  M+0   0.1316   0.1259   0.1302   0.1271
  M+1   0.1718   0.1744   0.1727   0.1735
  M+2   0.2499   0.2532   0.2483   0.2528
  M+3   0.2139   0.2156   0.2121   0.2152
  M+4   0.1271   0.1357   0.1348   0.1356
  M+5   0.0728   0.0712   0.0739   0.0716
  M+6   0.0218   0.0181   0.0204   0.0182
  M+7   0.0111   0.0051   0.0077   0.0051
Mal_391 (GC-MS)
  M+0   0.2043   0.1981   0.1960   0.1974
  M+1   0.2690   0.2686   0.2719   0.2694
  M+2   0.2621   0.2681   0.2666   0.2685
  M+3   0.1784   0.1776   0.1775   0.1773
  M+4   0.0634   0.0615   0.0640   0.0614
  M+5   0.0228   0.0206   0.0240   0.0205
Mal_419 (GC-MS)
  M+0   0.1714   0.1624   0.1570   0.1616
  M+1   0.1831   0.1865   0.1966   0.1884
  M+2   0.2840   0.2801   0.2739   0.2780
  M+3   0.1760   0.1823   0.1879   0.1837
  M+4   0.1279   0.1286   0.1264   0.1283
  M+5   0.0414   0.0417   0.0421   0.0417
  M+6   0.0162   0.0145   0.0162   0.0145
Glu_330 (GC-MS)
  M+0   0.1540   0.1466   0.1630   0.1477
  M+1   0.2501   0.2508   0.2505   0.2504
  M+2   0.2485   0.2537   0.2439   0.2521
  M+3   0.2066   0.2097   0.2027   0.2098
  M+4   0.1031   0.1022   0.1021   0.1027
  M+5   0.0297   0.0284   0.0297   0.0286
  M+6   0.0080   0.0072   0.0081   0.0073
Gln_431 (GC-MS)
  M+0   0.1169   0.1098   0.1141   0.1108
  M+1   0.1676   0.1643   0.1673   0.1637
  M+2   0.2379   0.2414   0.2374   0.2411
  M+3   0.2146   0.2179   0.2158   0.2175
  M+4   0.1426   0.1469   0.1440   0.1468
  M+5   0.0822   0.0819   0.0828   0.0823
  M+6   0.0284   0.0269   0.0286   0.0271
  M+7   0.0098   0.0085   0.0099   0.0085
PEP_453 (GC-MS)
  M+0   0.1876   0.1873   0.1874   0.1904
  M+1   0.2810   0.2773   0.2725   0.2743
  M+2   0.2732   0.2739   0.2753   0.2720
  M+3   0.1783   0.1741   0.1779   0.1753
  M+4   0.0627   0.0613   0.0643   0.0616
  M+5   0.0186   0.0204   0.0227   0.0207
Cit_459 (GC-MS)
  M+0   0.0937   0.0902   0.0897   0.0909
  M+1   0.1133   0.1163   0.1157   0.1161
  M+2   0.2265   0.2280   0.2210   0.2275
  M+3   0.1860   0.1920   0.1910   0.1917
  M+4   0.1879   0.1877   0.1864   0.1874
  M+5   0.1014   0.1008   0.1044   0.1009
  M+6   0.0629   0.0590   0.0632   0.0592
  M+7   0.0206   0.0184   0.0210   0.0185
  M+8   0.0077   0.0061   0.0076   0.0061
3PG_585 (GC-MS)
  M+0   0.1586   0.1611   0.1675   0.1637
  M+1   0.2550   0.2576   0.2573   0.2553
  M+2   0.2695   0.2706   0.2726   0.2687
  M+3   0.1888   0.1880   0.1899   0.1887
  M+4   0.0851   0.0808   0.0815   0.0812
  M+5   0.0275   0.0306   0.0311   0.0309
S7P289_97 (LC-MS/MS)
  M+0   0.0340   0.0355   0.0424   0.0393
  M+1   0.1314   0.1355   0.1389   0.1376
  M+2   0.2446   0.2410   0.2388   0.2374
  M+3   0.2627   0.2642   0.2624   0.2589
  M+4   0.1988   0.1944   0.1832   0.1926
  M+5   0.0944   0.0954   0.0987   0.0972
  M+6   0.0245   0.0289   0.0285   0.0312
  M+7   0        0.0026   0.0071   0.0054
G6P259_97 (LC-MS/MS)
  M+0   0.0603   0.0622   0.0691   0.0672
  M+1   0.1912   0.1876   0.1848   0.1876
  M+2   0.2709   0.2759   0.2740   0.2706
  M+3   0.2510   0.2538   0.2460   0.2502
  M+4   0.1511   0.1524   0.1542   0.1523
  M+5   0.0569   0.0567   0.0608   0.0594
  M+6   0.0126   0.0107   0.0110   0.0121
DHAP169_97 (LC-MS/MS)
  M+0   0.2771   0.2781   0.2828   0.2781
  M+1   0.3373   0.3354   0.3292   0.3318
  M+2   0.2691   0.2683   0.2665   0.2642
  M+3   0.1165   0.1166   0.1198   0.1221
  M+4   0        0.0011   0.0012   0.0017
3PG185_79 (LC-MS/MS)
  M+0   0.2867   0.2809   0.2811   0.2855
  M+1   0.3211   0.3283   0.3226   0.3223
  M+2   0.2735   0.2668   0.2692   0.2651
  M+3   0.1187   0.1205   0.1282   0.1236
PEP167_79 (LC-MS/MS)
  M+0   0.2846   0.2817   0.2875   0.2863
  M+1   0.3249   0.3290   0.3239   0.3230
  M+2   0.2682   0.2667   0.2657   0.2651
  M+3   0.1223   0.1200   0.1228   0.1231
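Each Measured/Simulated pair in Table A4 can be compared numerically. A common way to score that agreement in 13C metabolic flux analysis is a sum of squared residuals weighted by the measurement variance. The Python sketch below computes this score for one fragment. It assumes a uniform standard deviation of 0.003 mole fraction per isotopomer. That value is a placeholder, because the measurement errors used in the actual fits are not listed in this appendix.

```python
def weighted_ssr(measured: list[float], simulated: list[float], sd: float = 0.003) -> float:
    """Variance-weighted sum of squared residuals for one fragment's MID.

    sd is a hypothetical, uniform per-isotopomer standard deviation.
    """
    if len(measured) != len(simulated):
        raise ValueError("MIDs must have the same number of isotopomers")
    return sum(((m - s) / sd) ** 2 for m, s in zip(measured, simulated))

# Pyr_174, MTYL037, [1-13C]acetate, growth phase (Table A4), M+0 ... M+5
pyr_measured = [0.3255, 0.5486, 0.0951, 0.0280, 0.0025, 0.0013]
pyr_simulated = [0.3263, 0.5527, 0.0904, 0.0278, 0.0025, 0.0003]

print(round(sum(pyr_measured), 4))  # measured fractions should sum to about 1
print(round(weighted_ssr(pyr_measured, pyr_simulated), 2))
```

In a full fit, this quantity is summed over all fragments and both tracers, and the fluxes are adjusted to minimize the total. The score for a single fragment is useful mainly for spotting outliers.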


Table A5. Intracellular metabolite mass isotopomer distributions during the lipid production phase

Notation: Fragment: the ion fragment measured by the mass spectrometer. The number after the underscore represents the mass of the M+0 isotopomer of the fragment. Mass isotopomers: the number of carbon atoms labeled with 13C within a metabolite. Measured: experimentally determined mole fraction of each mass isotopomer. Simulated: mole fraction of each mass isotopomer obtained by simulating the best-fit flux results.

[1-13C] sodium acetate tracer
Fragment / Mass isotopomer   MTYL037 Measured   MTYL037 Simulated   MTYL065 Measured   MTYL065 Simulated
Pyr_174 (GC-MS)
  M+0   0.3608   0.3618   0.3567   0.3560
  M+1   0.5062   0.5111   0.5120   0.5135
  M+2   0.0982   0.0965   0.0986   0.0985
  M+3   0.0273   0.0274   0.0281   0.0286
  M+4   0.0055   0.0029   0.0038   0.0031
  M+5   0.0019   0.0003   0.0007   0.0004
Ala_232 (GC-MS)
  M+0   0.6803   0.6827   0.6814   0.6819
  M+1   0.2123   0.2139   0.2114   0.2130
  M+2   0.0871   0.0838   0.0871   0.0850
  M+3   0.0169   0.0162   0.0168   0.0164
  M+4   0.0034   0.0030   0.0033   0.0031
Ala_260 (GC-MS)
  M+0   0.3108   0.3161   0.3142   0.3110
  M+1   0.4793   0.4804   0.4747   0.4820
  M+2   0.1463   0.1435   0.1470   0.1453
  M+3   0.0522   0.0494   0.0525   0.0507
  M+4   0.0096   0.0089   0.0097   0.0092
  M+5   0.0018   0.0015   0.0019   0.0016
Suc_289 (GC-MS)
  M+0   0.0703   0.0708   0.0689   0.0696
  M+1   0.5235   0.5163   0.5215   0.5156
  M+2   0.2906   0.2953   0.2855   0.2967
  M+3   0.0822   0.0885   0.0913   0.0889
  M+4   0.0272   0.0245   0.0265   0.0247
  M+5   0.0051   0.0039   0.0050   0.0040
  M+6   0.0013   0.0006   0.0013   0.0006
Fum_287 (GC-MS)
  M+0   0.0813   0.0687   0.0679   0.0683
  M+1   0.5025   0.5041   0.5086   0.5121
  M+2   0.3088   0.3064   0.3017   0.3003
  M+3   0.0867   0.0903   0.0845   0.0895
  M+4   0.0345   0.0257   0.0324   0.0250
  M+5   0.0061   0.0041   0.0050   0.0040
Akg_346 (GC-MS)
  M+0   0.0393   0.0392   0.0343   0.0392
  M+1   0.3262   0.3276   0.3239   0.3302
  M+2   0.4564   0.4514   0.4582   0.4497
  M+3   0.1226   0.1275   0.1217   0.1271
  M+4   0.0405   0.0448   0.0499   0.0445
  M+5   0.0103   0.0078   0.0095   0.0077
  M+6   0.0028   0.0014   0.0024   0.0014
  M+7   0.0017   0.0002   0.0011   0.0002
Mal_391 (GC-MS)
  M+0   0.2424   0.2441   0.2621   0.2642
  M+1   0.4815   0.4828   0.4625   0.4699
  M+2   0.1742   0.1771   0.1702   0.1733
  M+3   0.0766   0.0742   0.0797   0.0716
  M+4   0.0194   0.0172   0.0192   0.0166
  M+5   0.0058   0.0039   0.0060   0.0038
Mal_419 (GC-MS)
  M+0   0.0534   0.0559   0.0553   0.0571
  M+1   0.4213   0.4212   0.4343   0.4423
  M+2   0.3342   0.3322   0.3228   0.3175
  M+3   0.1309   0.1312   0.1290   0.1272
  M+4   0.0466   0.0461   0.0440   0.0433
  M+5   0.0110   0.0107   0.0106   0.0101
  M+6   0.0023   0.0023   0.0044   0.0021
Asp_390 (GC-MS)
  M+0   0.2494   0.2434   0.2570   0.2583
  M+1   0.4792   0.4832   0.4698   0.4738
  M+2   0.1772   0.1781   0.1728   0.1751
  M+3   0.0727   0.0737   0.0758   0.0718
  M+4   0.0175   0.0170   0.0197   0.0166
  M+5   0.0040   0.0038   0.0061   0.0037
Asp_418 (GC-MS)
  M+0   0.0534   0.0558   0.0513   0.0558
  M+1   0.4103   0.4206   0.4342   0.4384
  M+2   0.3362   0.3332   0.3244   0.3215
  M+3   0.1349   0.1313   0.1290   0.1281
  M+4   0.0496   0.0458   0.0450   0.0436
  M+5   0.0120   0.0106   0.0116   0.0101
  M+6   0.0036   0.0023   0.0034   0.0021
Glu_330 (GC-MS)
  M+0   0.0879   0.0910   0.0939   0.0898
  M+1   0.6352   0.6364   0.6353   0.6381
  M+2   0.1864   0.1882   0.1827   0.1879
  M+3   0.0724   0.0690   0.0683   0.0690
  M+4   0.0142   0.0128   0.0152   0.0128
  M+5   0.0031   0.0022   0.0036   0.0022
  M+6   0.0008   0.0003   0.0010   0.0003
PEP_453 (GC-MS)
  M+0   0.2761   0.2711   0.2703   0.2665
  M+1   0.4460   0.4426   0.4528   0.4436
  M+2   0.1884   0.1838   0.1774   0.1853
  M+3   0.0803   0.0769   0.0760   0.0783
  M+4   0.0241   0.0198   0.0199   0.0203
  M+5   0.0067   0.0048   0.0036   0.0050
S7P289_97 (LC-MS/MS)
  M+0   0.1370   0.1403   0.1035   0.1089
  M+1   0.3430   0.3408   0.3108   0.3078
  M+2   0.3112   0.3139   0.3297   0.3310
  M+3   0.1515   0.1519   0.1798   0.1817
  M+4   0.0455   0.0438   0.0564   0.0574
  M+5   0.0095   0.0081   0.0098   0.0114
  M+6   0.0023   0.0011   0        0.0009
G6P259_97 (LC-MS/MS)
  M+0   0.1734   0.1754   0.1550   0.1474
  M+1   0.3983   0.3957   0.3673   0.3754
  M+2   0.2976   0.3021   0.3289   0.3239
  M+3   0.1044   0.1025   0.1159   0.1217
  M+4   0.0232   0.0212   0.0281   0.0273
  M+5   0.0024   0.0028   0.0049   0.0038
  M+6   0.0006   0.0003   0        0.0001
DHAP169_97 (LC-MS/MS)
  M+0   0.4222   0.4243   0.3848   0.3909
  M+1   0.4474   0.4425   0.5131   0.5098
  M+2   0.1131   0.1178   0.0875   0.0867
  M+3   0.0186   0.0149   0.0146   0.0122
  M+4   0.0021   0.0005   0        0.0007
3PG185_79 (LC-MS/MS)
  M+0   0.4120   0.4064   0.4007   0.3995
  M+1   0.5382   0.5370   0.5416   0.5408
  M+2   0.0428   0.0481   0.0477   0.0501
  M+3   0.0069   0.0080   0        0.0064

40% [U-13C2] sodium acetate tracer
Fragment / Mass isotopomer   MTYL037 Measured   MTYL037 Simulated   MTYL065 Measured   MTYL065 Simulated
Pyr_174 (GC-MS)
  M+0   0.2450   0.2403   0.2423   0.2414
  M+1   0.3198   0.3266   0.3230   0.3252
  M+2   0.2782   0.2762   0.2762   0.2757
  M+3   0.1347   0.1325   0.1340   0.1331
  M+4   0.0172   0.0195   0.0194   0.0196
  M+5   0.0051   0.0047   0.0052   0.0047
Ala_232 (GC-MS)
  M+0   0.2801   0.2738   0.2770   0.2749
  M+1   0.4173   0.4225   0.4171   0.4206
  M+2   0.2269   0.2255   0.2294   0.2263
  M+3   0.0600   0.0600   0.0605   0.0601
  M+4   0.0157   0.0156   0.0160   0.0157
Ala_260 (GC-MS)
  M+0   0.2010   0.2099   0.2012   0.2109
  M+1   0.3138   0.3078   0.3097   0.3067
  M+2   0.2861   0.2794   0.2853   0.2789
  M+3   0.1504   0.1522   0.1539   0.1527
  M+4   0.0384   0.0386   0.0392   0.0387
  M+5   0.0103   0.0104   0.0107   0.0105
Suc_289 (GC-MS)
  M+0   0.1776   0.1752   0.1753   0.1757
  M+1   0.2117   0.2143   0.2152   0.2136
  M+2   0.2875   0.2851   0.2836   0.2853
  M+3   0.1781   0.1804   0.1814   0.1800
  M+4   0.1098   0.1098   0.1084   0.1101
  M+5   0.0265   0.0262   0.0276   0.0262
  M+6   0.0086   0.0078   0.0086   0.0078
Fum_287 (GC-MS)
  M+0   0.1793   0.1774   0.1760   0.1766
  M+1   0.2085   0.2107   0.2122   0.2123
  M+2   0.2899   0.2875   0.2928   0.2861
  M+3   0.1779   0.1780   0.1771   0.1792
  M+4   0.1151   0.1110   0.1131   0.1105
  M+5   0.0282   0.0262   0.0288   0.0262
Akg_346 (GC-MS)
  M+0   0.1275   0.1204   0.1194   0.1203
  M+1   0.1781   0.1791   0.1790   0.1792
  M+2   0.2504   0.2542   0.2501   0.2542
  M+3   0.2134   0.2176   0.2170   0.2176
  M+4   0.1338   0.1364   0.1374   0.1364
  M+5   0.0710   0.0690   0.0715   0.0689
  M+6   0.0190   0.0176   0.0192   0.0176
  M+7   0.0067   0.0048   0.0065   0.0048
Mal_391 (GC-MS)
  M+0   0.1938   0.1944   0.1864   0.1917
  M+1   0.2778   0.2723   0.2816   0.2751
  M+2   0.2675   0.2704   0.2744   0.2721
  M+3   0.1758   0.1761   0.1742   0.1750
  M+4   0.0627   0.0611   0.0618   0.0607
  M+5   0.0225   0.0203   0.0217   0.0200
Mal_419 (GC-MS)
  M+0   0.1557   0.1558   0.1429   0.1528
  M+1   0.2013   0.1943   0.2102   0.1992
  M+2   0.2795   0.2781   0.2733   0.2748
  M+3   0.1869   0.1871   0.1974   0.1903
  M+4   0.1213   0.1258   0.1211   0.1244
  M+5   0.0399   0.0411   0.0406   0.0409
  M+6   0.0154   0.0141   0.0145   0.0138
Asp_390 (GC-MS)
  M+0   0.1935   0.1942   0.1858   0.1922
  M+1   0.2784   0.2725   0.2819   0.2745
  M+2   0.2681   0.2705   0.2744   0.2718
  M+3   0.1752   0.1763   0.1741   0.1755
  M+4   0.0626   0.0610   0.0621   0.0608
  M+5   0.0223   0.0201   0.0218   0.0199
Asp_418 (GC-MS)
  M+0   0.1551   0.1557   0.1516   0.1533
  M+1   0.2017   0.1945   0.2007   0.1984
  M+2   0.2796   0.2781   0.2736   0.2753
  M+3   0.1873   0.1873   0.1982   0.1899
  M+4   0.1212   0.1257   0.1210   0.1247
  M+5   0.0398   0.0411   0.0406   0.0410
  M+6   0.0152   0.0139   0.0143   0.0138
Glu_330 (GC-MS)
  M+0   0.1592   0.1435   0.1403   0.1436
  M+1   0.2514   0.2522   0.2488   0.2521
  M+2   0.2479   0.2575   0.2530   0.2573
  M+3   0.2037   0.2096   0.2093   0.2096
  M+4   0.1010   0.1009   0.1004   0.1009
  M+5   0.0290   0.0280   0.0278   0.0280
  M+6   0.0078   0.0071   0.0103   0.0071
PEP_453 (GC-MS)
  M+0   0.1797   0.1796   0.1757   0.1806
  M+1   0.2803   0.2851   0.2815   0.2841
  M+2   0.2805   0.2788   0.2810   0.2782
  M+3   0.1704   0.1710   0.1753   0.1714
  M+4   0.0595   0.0603   0.0647   0.0604
  M+5   0.0160   0.0198   0.0218   0.0198
S7P289_97 (LC-MS/MS)
  M+0   0.0326   0.0333   0.0356   0.0355
  M+1   0.1261   0.1322   0.1341   0.1339
  M+2   0.2314   0.2435   0.2398   0.2413
  M+3   0.2789   0.2705   0.2657   0.2667
  M+4   0.1978   0.1959   0.1951   0.1948
  M+5   0.1002   0.0928   0.0949   0.0943
  M+6   0.0291   0.0271   0.0292   0.0285
  M+7   0.0039   0.0043   0.0057   0.0047
G6P259_97 (LC-MS/MS)
  M+0   0.0607   0.0577   0.0601   0.0602
  M+1   0.1798   0.1851   0.1801   0.1857
  M+2   0.2908   0.2827   0.2835   0.2796
  M+3   0.2578   0.2600   0.2560   0.2575
  M+4   0.1512   0.1509   0.1554   0.1513
  M+5   0.0499   0.0533   0.0537   0.0548
  M+6   0.0098   0.0096   0.0113   0.0102
F6P259_97 (LC-MS/MS)
  M+0   0.0586   0.0577   0.0611   0.0602
  M+1   0.1867   0.1851   0.1851   0.1857
  M+2   0.2903   0.2827   0.2804   0.2796
  M+3   0.2468   0.2600   0.2613   0.2575
  M+4   0.1560   0.1509   0.1602   0.1513
  M+5   0.0615   0.0533   0.0518   0.0548
3PG185_79 (LC-MS/MS)
  M+0   0.2752   0.2693   0.2723   0.2707
  M+1   0.3416   0.3436   0.3440   0.3417
  M+2   0.2727   0.2711   0.2708   0.2705
  M+3   0.1105   0.1127   0.1129   0.1137
PEP167_79 (LC-MS/MS)
  M+0   0.2796   0.2700   0.2726   0.2715
  M+1   0.3402   0.3443   0.3405   0.3424
  M+2   0.2727   0.2710   0.2697   0.2705
  M+3   0.1076   0.1121   0.1172   0.1131
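A compact way to compare the two tracers in Table A5 is the average mass shift of a fragment, defined as the sum over n of n times the fraction of M+n. Several fragments extend past the number of carbons in the underlying metabolite (Pyr_174 reaches M+5 for a three-carbon acid). These distributions therefore include heavy isotopes from atoms other than the metabolite carbons. For that reason the average shift is only a coarse summary, not a true 13C enrichment. The Python sketch below applies it to the measured Ala_232 distributions for MTYL037.

```python
def mean_mass_shift(mid: list[float]) -> float:
    """Average mass shift sum(n * f_n) / sum(f_n) for a mass isotopomer distribution."""
    total = sum(mid)
    return sum(n * f for n, f in enumerate(mid)) / total

# Ala_232, MTYL037, measured, lipid production phase (Table A5), M+0 ... M+4
ala_c1 = [0.6803, 0.2123, 0.0871, 0.0169, 0.0034]  # [1-13C] acetate
ala_u = [0.2801, 0.4173, 0.2269, 0.0600, 0.0157]   # 40% [U-13C2] acetate

print(round(mean_mass_shift(ala_c1), 4))
print(round(mean_mass_shift(ala_u), 4))
```

Dividing by the sum makes the function tolerant of rows that do not add up to exactly 1 after rounding. The full distributions, not this summary, are what constrain the fluxes.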


Table A6. MTYL037 growth phase best-fit flux values and flux confidence intervals

Notation: 95 lb: lower bound of flux 95% confidence interval 68 lb: lower bound of flux 68% confidence interval Best: best-fit flux value 68 ub: upper bound of flux 68% confidence interval 95 ub: upper bound of flux 95% confidence interval Net: net flux for a reversible reaction. Exch: exchange flux for a reversible reaction.

Flux       95 lb      68 lb      Best        68 ub      95 ub
v1         100        100        100         100        100
v2         10.411     11.165     11.935      12.680     13.448
v3         0.40532    0.43330    0.46187     0.49058    0.51855
v4         2.1247     2.2056     2.2775      2.3415     2.3977
v5         0.12702    0.52245    0.86471     1          1
v6         0          0          0.13529     0.47755    0.87298
v7         1          1          1           1          1
v8         0.050662   0.062102   0.073154    0.084373   0.096395
v9         0.90360    0.91563    0.92685     0.93790    0.94934
v10        1          1          1           1          1
v11        0          0          0.020452    0.042429   0.063070
v12        0.93693    0.95757    0.97955     1          1
v13        1          1          1           1          1
v14        0          0          0.52986     1          1
v15        0          0          0.47014     1          1
v16        1          1          1           1          1
v17        15.810     17.040     18.349      19.757     20.980
v18 net    10.266     10.820     11.317      11.906     12.507
v18 exch   175.74     625.91     2.47E+05    inf        inf
v19        3.7494     4.6213     5.5649      6.6029     8.0829
v20 net    8.7334     9.2058     9.6592      10.238     10.847
v20 exch   27.341     34.756     46.022      67.165     115.84
v21 net    5.6101     5.9999     6.4720      7.1132     7.7990
v21 exch   92.669     369.19     2.44E+05    inf        inf
v22        0.51936    0.54768    0.57597     0.60412    0.63217
v23 net    5.0377     5.4135     5.8960      6.5395     7.2288
v23 exch   3.5341     3.9646     32.511      inf        inf
v24 net    5.0377     5.4135     5.8960      6.5395     7.2288
v24 exch   3.5373     3.9707     6.0805      inf        inf
v25 net    9.1347     10.176     11.701      13.764     15.988
v25 exch   0          0          55.702      inf        inf
v26        7.8610     8.9007     10.440      12.533     14.790
v27 net    3.6008     3.9544     4.4534      5.1284     5.8542
v27 exch   0.35776    1.3355     5.2849      121.65     inf
v28 net    4.2510     4.9550     5.9870      7.4051     8.9356
v28 exch   1.9069     4.5755     60.642      inf        inf
v29 net    4.2510     4.9550     5.9870      7.4051     8.9356
v29 exch   1.9068     4.5737     65.154      inf        inf
v30 net    1.7518     2.1120     2.6268      3.3443     4.1188
v30 exch   4.3701     17.647     29.193      47.491     84.799
v31 net    2.4985     2.8412     3.3602      4.0606     4.8161
v31 exch   11.490     92.716     56924       inf        inf
v32 net    2.4985     2.8412     3.3602      4.0606     4.8161
v32 exch   13.793     47.424     28914       inf        inf
v33 net    2.4985     2.8412     3.3602      4.0606     4.8161
v33 exch   7.7755     10.700     16.179      25.038     80.886
v34        1.8396     2.7573     3.7732      4.6907     5.6655
v35 net    20.157     21.332     22.589      23.944     25.124
v35 exch   0          0          -1.728E-06  inf        inf
v36        0.27065    0.38988    0.54105     0.68962    0.85164
v37        100.31     100.33     100.34      100.35     100.35
v38        5.6116     5.7659     5.9286      6.6573     8.1501
v39        60.264     61.024     61.824      62.646     63.934
v40 net    32.589     33.383     34.360      35.323     36.272
v40 exch   0          0          0.00052408  inf        inf
v41        32.589     33.383     34.360      35.323     36.272
v42        30.104     30.911     31.909      32.852     33.886
v43 net    57.870     58.641     59.456      60.323     61.672
v43 exch   0          0          1E-07       25.951     54.207
v44 net    59.151     59.915     60.722      61.574     62.883
v44 exch   429.02     1640.1     4.08E+05    inf        inf
v45 net    60.264     61.024     61.824      62.646     63.934
v45 exch   0          0          1.28E-07    inf        inf
v46 net    26.617     27.033     27.464      27.964     28.750
v46 exch   0          0          38.584      inf        inf
v47        26.617     27.033     27.464      27.964     28.750
v48        26.617     27.033     27.464      27.964     28.750
v49        26.701     27.114     27.546      28.036     28.828
v50        1.4406     1.5528     1.6580      1.7620     1.8483
v51 net    0.76899    0.86233    0.95536     1.0457     1.1190
v51 exch   0          0          0.50494     13.99674   inf
v52 net    0.45580    0.51527    0.58995     0.66626    0.74129
v52 exch   0          0          1E-07       0.008328   0.029139
v53        3.8294     4.7033     5.6469      6.6847     8.1629
v54        26.617     27.033     27.464      27.964     28.750
v55 net    55.152     55.931     56.622      57.350     58.075
v55 exch   0          0          1E-07       17.445     105.71
v56        2.9641     3.8705     4.8755      5.7886     6.7312
v57        11.671     12.112     12.491      12.804     13.107
v58        3.0660     3.1830     3.2865      3.3788     3.4599
v59        0.36121    0.37491    0.38718     0.39806    0.40761
v60        0.36972    0.38397    0.39629     0.40743    0.41720
v61        3.8923     4.0639     4.2398      4.4024     4.5548
v62        0.41646    0.43223    0.44640     0.45894    0.46995
v63        0.62469    0.64832    0.66960     0.68841    0.70492
v64        1.1373     1.2570     1.4065      1.5596     1.7067
v65        0.27065    0.38988    0.54105     0.68962    0.85164
v66        0.076492   0.079385   0.081992    0.084295   0.086317
v67        0.23796    0.24701    0.25508     0.26225    0.26854
v68        0.53987    0.56023    0.57850     0.59475    0.60901
v69        0.67779    0.70362    0.72654     0.74695    0.76486
v70        0.33784    0.35061    0.36213     0.37230    0.38123
v71        0.46955    0.48770    0.50334     0.51748    0.52989
v72        0.36972    0.38397    0.39629     0.40743    0.41720
v73        0.23796    0.24701    0.25508     0.26225    0.26854
v74        0.076480   0.079385   0.081992    0.084295   0.086317
v75        0.23796    0.24701    0.25508     0.26225    0.26854
v76        0.31447    0.32660    0.33708     0.34655    0.35486
v77        0.15296    0.15877    0.16398     0.16859    0.17263
v78        0.39096    0.40583    0.41907     0.43084    0.44117
v79        0.18273    0.18964    0.19587     0.20137    0.20620
v80        0.20823    0.21612    0.22320     0.22947    0.23497
v81        0.061602   0.063944   0.066049    0.067904   0.069533
v82        0.059513   0.061795   0.063771    0.065563   0.067135
v83        0.059490   0.061753   0.063771    0.065563   0.067135
v84        0.061602   0.063944   0.066049    0.067904   0.069533
v85        0.061602   0.063944   0.066049    0.067904   0.069533
v86        0.0033995  0.0035292  0.0036441   0.0037464  0.0038363
v87        0.0095609  0.0099281  0.010249    0.010537   0.010790
v88        0.025498   0.026473   0.027331    0.028098   0.028772
v89        0.036136   0.037505   0.038718    0.039806   0.040760
v90        0.031864   0.033079   0.034163    0.035123   0.035965
v91        0.46532    0.48307    0.49878     0.51280    0.52509
v92        0.46532    0.48307    0.49878     0.51280    0.52509
v93        0.46532    0.48307    0.49878     0.51280    0.52509
v94        0.46532    0.48307    0.49878     0.51280    0.52509
v95        0.15510    0.16103    0.16626     0.17093    0.17503
v96        0.15510    0.16103    0.16626     0.17093    0.17503
v97        0.027622   0.028685   0.029608    0.030440   0.031170
v98        0.0046741  0.0048517  0.0050106   0.0051514  0.0052749
v99        0.55452    0.57546    0.59444     0.61115    0.62580
v100       0.58004    0.60216    0.62177     0.63924    0.65457
v101       0.16996    0.17645    0.18220     0.18732    0.19181
v102       104.98     106.86     108.98      111.33     114.06
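Reversible reactions in Tables A6 to A9 are reported as a net flux and an exchange flux. The Python sketch below recovers the forward and backward fluxes from that pair. It assumes the exchange flux follows a convention common in the 13C-MFA literature, namely that it equals the smaller of the forward and backward fluxes. That convention is an assumption here, because the notation above does not define exchange flux. An upper bound of inf indicates that the exchange flux is not constrained from above by the data. Tiny negative solver outputs, such as the -1.728E-06 best-fit exchange for v35, are treated as zero.

```python
def forward_backward(net: float, exch: float) -> tuple[float, float]:
    """Convert (net, exchange) into (forward, backward) fluxes.

    Assumes net = forward - backward and exch = min(forward, backward).
    """
    exch = max(exch, 0.0)  # treat small negative numerical noise as zero
    if net >= 0:
        return exch + net, exch
    return exch, exch - net

# v20 best-fit values from Table A6 (MTYL037, growth phase)
fwd, bwd = forward_backward(9.6592, 46.022)
print(fwd, bwd, fwd - bwd)  # the difference recovers the net flux
```

Under this convention a large exchange flux with a small net flux means that the reaction runs close to equilibrium in both directions.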


Table A7. MTYL065 growth phase best-fit flux values and flux confidence intervals

Notation: 95 lb: lower bound of flux 95% confidence interval 68 lb: lower bound of flux 68% confidence interval Best: best-fit flux value 68 ub: upper bound of flux 68% confidence interval 95 ub: upper bound of flux 95% confidence interval Net: net flux for a reversible reaction. Exch: exchange flux for a reversible reaction.

Flux       95 lb      68 lb      Best        68 ub      95 ub
v1         100        100        100         100        100
v2         18.134     18.464     18.795      19.133     19.458
v3         0.70021    0.71178    0.72369     0.73566    0.74723
v4         1.8540     1.9180     1.9831      2.0482     2.1036
v5         0          0.25472    0.59090     0.93419    1
v6         0          0.065814   0.40910     0.74528    1
v7         1          1          1           1          1
v8         0.024979   0.036160   0.048024    0.060544   0.073718
v9         0.92628    0.93946    0.95198     0.96384    0.97502
v10        1          1          1           1          1
v11        0          0          1E-07       0.0051032  0.018411
v12        0.98159    0.99490    1           1          1
v13        1          1          1           1          1
v14        0          0          0.029006    1          0.9999999
v15        0          0          0.97099     1          0.9999999
v16        1          1          1           1          1
v17        13.318     14.635     15.797      16.862     17.814
v18 net    10.348     11.112     11.924      12.873     13.384
v18 exch   41.439     116.92     3.45E+06    inf        inf
v19        1.3332     1.9543     2.8219      3.8795     5.0349
v20 net    8.5782     9.6696     10.903      11.804     12.285
v20 exch   12.862     17.373     24.592      37.908     70.712
v21 net    6.2734     7.3446     8.5577      9.4879     10.020
v21 exch   17.178     59.474     4.92E+06    inf        inf
v22        0.78118    0.79307    0.80500     0.81724    0.82908
v23 net    5.4703     6.5675     7.7527      8.6843     9.2186
v23 exch   2.3621     2.8233     7.7013      inf        inf
v24 net    5.4703     6.5675     7.7527      8.6843     9.2186
v24 exch   2.3635     2.8253     11.968      inf        inf
v25 net    12.051     15.221     18.834      21.707     23.398
v25 exch   0          0          14.558      inf        inf
v26        14.104     16.269     17.879      19.765     21.486
v27 net    4.3892     5.4530     6.6584      7.6041     8.1650
v27 exch   4.0527     13.623     2.63E+05    inf        inf
v28 net    6.7216     8.8198     11.221      13.154     14.315
v28 exch   8.5750     25.228     232.46      inf        inf
v29 net    6.7216     8.8198     11.221      13.154     14.315
v29 exch   8.5777     25.226     182.15      inf        inf
v30 net    3.1034     4.1548     5.3475      6.3109     6.9050
v30 exch   5.0430     8.3245     11.317      14.827     19.073
v31 net    3.6140     4.6706     5.8731      6.8364     7.4096
v31 exch   6.2489     25.233     2.03E+06    inf        inf
v32 net    3.6140     4.6706     5.8731      6.8364     7.4096
v32 exch   3.9346     11.541     6.29E+05    inf        inf
v33 net    3.6140     4.6706     5.8731      6.8364     7.4096
v33 exch   2.1912     3.7947     5.7057      8.0813     10.545
v34        3.1470     3.7926     4.7667      6.0114     7.3575
v35 net    16.887     18.107     19.173      20.138     20.990
v35 exch   0          0          34.604      inf        inf
v36        0.35700    0.54035    0.72623     0.91050    1.0979
v37        100.23     100.23     100.24      100.25     100.26
v38        4.1663     4.3039     5.1534      6.9451     8.2715
v39        57.071     57.557     58.577      60.102     61.457
v40 net    32.227     32.876     33.854      35.034     36.166
v40 exch   0          0          0.00022108  inf        inf
v41        32.227     32.876     33.854      35.034     36.166
v42        30.464     31.114     32.103      33.292     34.462
v43 net    55.357     55.846     56.885      58.440     59.825
v43 exch   0          0          1E-07       2.5796     9.9667
v44 net    56.280     56.764     57.793      59.333     60.705
v44 exch   680.51     2596.6     1.00E+07    inf        inf
v45 net    57.071     57.557     58.577      60.102     61.457
v45 exch   0          0          1.94E-07    inf        inf
v46 net    23.980     24.272     24.723      25.304     25.902
v46 exch   0          0          9.89E-08    inf        inf
v47        23.980     24.272     24.723      25.304     25.902
v48        23.980     24.272     24.723      25.304     25.902
v49        24.040     24.334     24.782      25.363     25.960
v50        0.80077    0.91425    1.0208      1.1279     1.2315
v51 net    0.31123    0.41585    0.51571     0.61592    0.71438
v51 exch   0          0          1E-07       0.025265   0.0930973
v52 net    0.40968    0.50104    0.59346     0.68386    0.77690
v52 exch   0.54510    1.4245     4.3687      inf        inf
v53        1.3947     2.0144     2.8814      3.9408     5.0947
v54        23.980     24.272     24.723      25.304     25.902
v55 net    52.804     53.368     53.943      54.492     54.953
v55 exch   0          0          24.961      230.44     inf
v56        3.9360     4.5838     5.5501      6.7836     8.1147
v57        8.5318     8.8189     9.1085      9.4000     9.6616
v58        2.2026     2.2785     2.3560      2.4331     2.4988
v59        0.25957    0.26851    0.27764     0.28675    0.29447
v60        0.26513    0.27430    0.28359     0.29288    0.30077
v61        2.9846     3.1791     3.3757      3.5461     3.7550
v62        0.29850    0.30881    0.31929     0.32976    0.33865
v63        0.44682    0.46222    0.47794     0.49361    0.50694
v64        0.97718    1.1602     1.3450      1.5237     1.7092
v65        0.35700    0.54035    0.72623     0.91050    1.0979
v66        0.055621   0.057538   0.059494    0.061443   0.063107
v67        0.17057    0.17645    0.18245     0.18843    0.19352
v68        0.38564    0.39893    0.41249     0.42602    0.43749
v69        0.48576    0.50250    0.51958     0.53648    0.55113
v70        0.24288    0.25125    0.25979     0.26824    0.27557
v71        0.33558    0.34715    0.35895     0.37064    0.38075
v72        0.26513    0.27430    0.28359     0.29288    0.30077
v73        0.17057    0.17645    0.18245     0.18843    0.19352
v74        0.055621   0.057538   0.059494    0.061422   0.063107
v75        0.17057    0.17645    0.18245     0.18843    0.19352
v76        0.22619    0.23399    0.24194     0.24987    0.25662
v77        0.10939    0.11316    0.11701     0.12084    0.12411
v78        0.28181    0.29153    0.30144     0.31116    0.31974
v79        0.13164    0.13617    0.14080     0.14541    0.14935
v80        0.15018    0.15535    0.16063     0.16587    0.17037
v81        0.044497   0.046030   0.047595    0.049149   0.050486
v82        0.042643   0.044112   0.045612    0.047105   0.048381
v83        0.042643   0.044113   0.045612    0.047109   0.048380
v84        0.044497   0.046030   0.047595    0.049149   0.050486
v85        0.044497   0.046030   0.047595    0.049149   0.050486
v86        0.0024103  0.0024933  0.0025781   0.0026624  0.0027346
v87        0.0068599  0.0070963  0.0073376   0.0075783  0.0077826
v88        0.018540   0.019179   0.019831    0.020480   0.021034
v89        0.025957   0.026851   0.027764    0.028675   0.029447
v90        0.022248   0.023015   0.023798    0.024574   0.025243
v91        0.33706    0.34868    0.36054     0.37236    0.38238
v92        0.33706    0.34868    0.36054     0.37236    0.38238
v93        0.33706    0.34868    0.36054     0.37236    0.38238
v94        0.33706    0.34868    0.36054     0.37236    0.38238
v95        0.11235    0.11623    0.12018     0.12412    0.12747
v96        0.11235    0.11623    0.12018     0.12412    0.12747
v97        0.019653   0.020330   0.021021    0.021708   0.022295
v98        0.0035227  0.0036441  0.0037680   0.0038899  0.0039968
v99        0.42272    0.43729    0.45216     0.46696    0.47958
v100       0.44126    0.45647    0.47199     0.48746    0.50060
v101       0.12978    0.13426    0.13882     0.14337    0.14724
v102       109.59     111.05     112.75      114.43     116.10
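Tables A6 and A7 report the same growth-phase network for the two strains, so their confidence intervals can be compared reaction by reaction. A simple screen is to check whether the 95% intervals overlap. Disjoint intervals suggest a real difference between strains, but overlapping intervals do not by themselves show that the fluxes are equal. The Python sketch below applies this rule of thumb. It is not a formal statistical test and is not presented as the comparison performed in this thesis.

```python
from typing import NamedTuple

class FluxCI(NamedTuple):
    """One flux estimate with its 95% confidence bounds (normalized units)."""
    lb95: float
    best: float
    ub95: float

def intervals_overlap(a: FluxCI, b: FluxCI) -> bool:
    """True if the two 95% confidence intervals share at least one value."""
    return a.lb95 <= b.ub95 and b.lb95 <= a.ub95

# Growth-phase values from Tables A6 (MTYL037) and A7 (MTYL065)
v2_mtyl037 = FluxCI(10.411, 11.935, 13.448)
v2_mtyl065 = FluxCI(18.134, 18.795, 19.458)
v42_mtyl037 = FluxCI(30.104, 31.909, 33.886)
v42_mtyl065 = FluxCI(30.464, 32.103, 34.462)

print("v2 overlap:", intervals_overlap(v2_mtyl037, v2_mtyl065))
print("v42 overlap:", intervals_overlap(v42_mtyl037, v42_mtyl065))
```

By this screen, v2 differs clearly between the strains, while v42 cannot be distinguished at the 95% level.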


Table A8. MTYL037 lipid production phase best-fit flux values and flux confidence intervals

Notation: 95 lb: lower bound of flux 95% confidence interval 68 lb: lower bound of flux 68% confidence interval Best: best-fit flux value 68 ub: upper bound of flux 68% confidence interval 95 ub: upper bound of flux 95% confidence interval Net: net flux for a reversible reaction. Exch: exchange flux for a reversible reaction.

Flux | 95 lb | 68 lb | Best | 68 ub | 95 ub
v1 | 100 | 100 | 100 | 100 | 100
v2 | 17.487 | 18.499 | 19.542 | 20.584 | 21.594
v3 | 0.67651 | 0.71222 | 0.74856 | 0.78535 | 0.82107
v4 | 3.3937 | 3.8094 | 4.2341 | 4.4746 | 4.8683
v5 | 0 | 0 | 0 | 1 | 1
v6 | 0 | 0 | 1 | 1 | 1
v7 | 1 | 1 | 1 | 1 | 1
v8 | 0.10274 | 0.11720 | 0.13211 | 0.14126 | 0.15509
v9 | 0.84491 | 0.85874 | 0.86789 | 0.88280 | 0.89726
v10 | 1 | 1 | 1 | 1 | 1
v11 | 0 | 0 | 1E-07 | 0.004275 | 0.015203
v12 | 0.98480 | 0.99572 | 1 | 1 | 1
v13 | 1 | 1 | 1 | 1 | 1
v14 | 0 | 0 | 1 | 1 | 1
v15 | 0 | 0 | 1E-07 | 1 | 1
v16 | 1 | 1 | 1 | 1 | 1
v17 | 0.10451 | 0.11886 | 0.13378 | 0.14296 | 0.15682
v18 | 0.84318 | 0.85704 | 0.86622 | 0.88114 | 0.89549
v19 | 1 | 1 | 1 | 1 | 1
v20 | 22.693 | 23.615 | 24.015 | 25.066 | 25.997
v21 net | 6.5654 | 7.2559 | 7.6086 | 8.3988 | 9.1699
v21 exch | 268.34 | 838.54 | 1.00E+07 | inf | inf
v22 | 15.412 | 16.093 | 16.407 | 17.128 | 17.606
v23 net | 6.5654 | 7.2559 | 7.6086 | 8.3988 | 9.1699
v23 exch | 16.992 | 19.995 | 23.025 | 30.004 | 40.014
v24 net | 6.5654 | 7.2559 | 7.6086 | 8.3988 | 9.1699
v24 exch | 12.885 | 15.154 | 18.708 | 22.791 | 27.817
v25 | 0.67651 | 0.71222 | 0.74856 | 0.78535 | 0.82107
v26 net | 5.8155 | 6.5068 | 6.8601 | 7.6508 | 8.4208
v26 exch | 69.797 | 128.00 | 1.00E+07 | inf | inf
v27 net | 5.8155 | 6.5068 | 6.8601 | 7.6508 | 8.4208
v27 exch | 62.462 | 123.51 | 1.00E+07 | inf | inf
v28 net | 17.447 | 19.520 | 20.580 | 22.951 | 25.262
v28 exch | 0 | 0 | 6.1759 | inf | inf
v29 | 17.447 | 19.520 | 20.580 | 22.951 | 25.262
v30 net | 5.8155 | 6.5068 | 6.8601 | 7.6508 | 8.4208
v30 exch | 37.150 | 55.955 | 282.28 | inf | inf
v31 net | 11.631 | 13.014 | 13.720 | 15.302 | 16.842
v31 exch | 73.259 | 143.02 | 1.00E+07 | inf | inf
v32 net | 11.631 | 13.014 | 13.720 | 15.302 | 16.842
v32 exch | 70.986 | 160.08 | 1.00E+07 | inf | inf
v33 net | 5.8155 | 6.5068 | 6.8601 | 7.6508 | 8.4208
v33 exch | 15.975 | 22.802 | 27.550 | 38.388 | 51.584
v34 net | 5.8155 | 6.5068 | 6.8601 | 7.6508 | 8.4208
v34 exch | 53.882 | 106.09 | 1.00E+07 | inf | inf
v35 net | 5.8155 | 6.5068 | 6.8601 | 7.6508 | 8.4208
v35 exch | 19.664 | 67.370 | 1.00E+07 | inf | inf
v36 net | 5.8155 | 6.5068 | 6.8601 | 7.6508 | 8.4208
v36 exch | 10.007 | 12.518 | 14.126 | 18.327 | 23.966
v37 | 0 | 0 | 0.54476 | 0.86935 | 1.6160
v38 net | 22.693 | 23.615 | 24.015 | 25.066 | 25.997
v38 exch | 0 | 0 | 31.715 | inf | inf
v39 | 16.415 | 16.735 | 16.952 | 17.306 | 17.653
v40 | 67.051 | 67.821 | 68.615 | 69.088 | 69.845
v41 net | 34.468 | 35.010 | 35.587 | 35.818 | 36.337
v41 exch | 0 | 0 | 17.293 | inf | inf
v42 | 34.468 | 35.010 | 35.587 | 35.818 | 36.337
v43 | 34.468 | 35.010 | 35.587 | 35.818 | 36.337
v44 net | 62.847 | 63.601 | 64.381 | 64.980 | 65.727
v44 exch | 0 | 0 | 1E-07 | 4.1905 | 17.188
v45 net | 62.847 | 63.601 | 64.381 | 64.980 | 65.727
v45 exch | 221.55 | 727.19 | 1.00E+07 | inf | inf
v46 net | 67.051 | 67.821 | 68.615 | 69.088 | 69.845
v46 exch | 0 | 0 | 34.308 | inf | inf
v47 net | 28.167 | 28.595 | 28.794 | 29.291 | 29.771
v47 exch | 0 | 0 | 63.217 | inf | inf
v48 | 28.167 | 28.595 | 28.794 | 29.291 | 29.771
v49 | 28.167 | 28.595 | 28.794 | 29.291 | 29.771
v50 | 28.167 | 28.595 | 28.794 | 29.291 | 29.771
v51 | 15.412 | 16.093 | 16.407 | 17.128 | 17.606
v52 | 32.323 | 32.792 | 33.028 | 33.523 | 34.002
v53 net | 50.087 | 50.863 | 51.663 | 52.073 | 52.765
v53 exch | 602.80 | 1595.0 | 1.00E+07 | inf | inf
v54 | 3.5621 | 4.0507 | 4.7789 | 5.0872 | 5.7789
v55 | 128.13 | 131.23 | 133.27 | 136.52 | 139.78
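The net and exchange rows of a reversible reaction can be converted back into forward and backward fluxes. The sketch below assumes the convention used by common 13C-MFA software in which the exchange flux is the smaller of the two unidirectional fluxes (exch = min(forward, backward)); the function name `unidirectional_fluxes` is hypothetical, and the example uses the best-fit v23 values from Table A8.

```python
import math


def unidirectional_fluxes(net: float, exch: float) -> tuple[float, float]:
    """Split a reversible flux into (forward, backward) components.

    Assumes exch = min(forward, backward), so that forward - backward = net.
    An infinite exchange flux (reaction effectively at equilibrium) yields
    infinite forward and backward fluxes.
    """
    if exch < 0:
        raise ValueError("exchange flux must be non-negative")
    forward = exch + max(net, 0.0)
    backward = exch + max(-net, 0.0)
    return forward, backward


# Best-fit v23 from Table A8: net 7.6086, exch 23.025
vf, vb = unidirectional_fluxes(7.6086, 23.025)
print(f"forward={vf:.4f}, backward={vb:.4f}")  # forward=30.6336, backward=23.0250

# An unresolved exchange flux ("inf" in the table) stays unbounded
print(unidirectional_fluxes(7.6086, math.inf))  # (inf, inf)
```

Exchange values reported as 1.00E+07 or inf indicate that the labeling data do not constrain the reversibility of that step, so the derived forward and backward fluxes are equally unconstrained.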

Table A9. MTYL065 lipid production phase best-fit flux values and flux confidence intervals

Notation: 95 lb, lower bound of the 95% flux confidence interval; 68 lb, lower bound of the 68% flux confidence interval; Best, best-fit flux value; 68 ub, upper bound of the 68% flux confidence interval; 95 ub, upper bound of the 95% flux confidence interval; Net, net flux of a reversible reaction; Exch, exchange flux of a reversible reaction.

Flux | 95 lb | 68 lb | Best | 68 ub | 95 ub
v1 | 100 | 100 | 100 | 100 | 100
v2 | 33.891 | 35.672 | 37.498 | 39.165 | 40.938
v3 | 1.1305 | 1.2811 | 1.4311 | 1.5862 | 1.7367
v4 | 0.58804 | 0.64867 | 0.71093 | 0.75873 | 0.81789
v5 | 0 | 0 | 1E-07 | 1 | 1
v6 | 0 | 0 | 1 | 1 | 1
v7 | 1 | 1 | 1 | 1 | 1
v8 | 0.0074757 | 0.017834 | 0.0283823 | 0.032305 | 0.038152
v9 | 0.96185 | 0.96769 | 0.97162 | 0.98217 | 0.99252
v10 | 1 | 1 | 1 | 1 | 1
v11 | 0 | 0 | 1E-07 | 0.0026339 | 0.0098572
v12 | 0.99014 | 0.99737 | 1 | 1 | 1
v13 | 1 | 1 | 1 | 1 | 1
v14 | 0 | 0 | 0.58491 | 1 | 1
v15 | 0 | 0 | 0.41509 | 1 | 1
v16 | 1 | 1 | 1 | 1 | 1
v17 | 0.027857 | 0.038208 | 0.0487481 | 0.052877 | 0.062084
v18 | 0.93792 | 0.94712 | 0.95125 | 0.96179 | 0.97214
v19 | 1 | 1 | 1 | 1 | 1
v20 | 19.991 | 20.415 | 20.623 | 21.352 | 22.276
v21 net | 12.001 | 12.854 | 13.941 | 14.410 | 14.924
v21 exch | 167.53 | 413.85 | 1.00E+07 | inf | inf
v22 | 6.3387 | 6.5692 | 6.6818 | 8.2654 | 9.2860
v23 net | 12.001 | 12.854 | 13.941 | 14.410 | 14.924
v23 exch | 16.466 | 19.720 | 22.047 | 28.022 | 36.173
v24 net | 12.001 | 12.854 | 13.941 | 14.410 | 14.924
v24 exch | 51.228 | 69.459 | 127.31 | 185.44 | 960.75
v25 | 1.1305 | 1.2811 | 1.4311 | 1.5862 | 1.7367
v26 net | 10.558 | 11.420 | 12.510 | 13.007 | 13.441
v26 exch | 29.445 | 34.484 | 51.036 | 173.80 | 548.41
v27 net | 10.558 | 11.420 | 12.510 | 13.007 | 13.441
v27 exch | 28.699 | 34.329 | 48.395 | 166.96 | 473.35
v28 net | 31.674 | 34.259 | 37.531 | 39.022 | 40.324
v28 exch | 0 | 0 | 227.60 | inf | inf
v29 | 31.674 | 34.259 | 37.531 | 39.022 | 40.324
v30 net | 10.558 | 11.420 | 12.510 | 13.007 | 13.441
v30 exch | 30.996 | 96.314 | 1.00E+07 | inf | inf
v31 net | 21.116 | 22.839 | 25.021 | 26.014 | 26.882
v31 exch | 114.94 | 299.88 | 1.00E+07 | inf | inf
v32 net | 21.116 | 22.839 | 25.021 | 26.014 | 26.882
v32 exch | 114.85 | 307.91 | 1.00E+07 | inf | inf
v33 net | 10.558 | 11.420 | 12.510 | 13.007 | 13.441
v33 exch | 19.024 | 25.100 | 29.010 | 37.973 | 44.700
v34 net | 10.558 | 11.420 | 12.510 | 13.007 | 13.441
v34 exch | 11.916 | 20.184 | 73.141 | 684.23 | inf
v35 net | 10.558 | 11.420 | 12.510 | 13.007 | 13.441
v35 exch | 0.58405 | 6.4933 | 4.32E+05 | inf | inf
v36 net | 10.558 | 11.420 | 12.510 | 13.007 | 13.441
v36 exch | 4.2084 | 6.2380 | 8.0848 | 13.479 | 18.018
v37 | 0 | 0 | 0.15842 | 0.27168 | 0.55783
v38 net | 19.991 | 20.415 | 20.623 | 21.352 | 22.276
v38 exch | 0 | 0 | 0.0038293 | inf | inf
v39 | 6.3828 | 6.6296 | 6.8402 | 8.5626 | 9.4424
v40 | 45.128 | 46.450 | 47.850 | 49.428 | 51.172
v41 net | 24.035 | 24.906 | 25.146 | 26.266 | 27.107
v41 exch | 0 | 0 | 31.955 | inf | inf
v42 | 24.035 | 24.906 | 25.146 | 26.266 | 27.107
v43 | 24.035 | 24.906 | 25.146 | 26.266 | 27.107
v44 net | 44.420 | 45.739 | 47.139 | 48.733 | 50.467
v44 exch | 0 | 0 | 1E-07 | 6.0084 | 29.092
v45 net | 44.420 | 45.739 | 47.139 | 48.733 | 50.467
v45 exch | 22.543 | 131.08 | 1.00E+07 | inf | inf
v46 net | 45.128 | 46.450 | 47.850 | 49.428 | 51.172
v46 exch | 0 | 0 | 11.552 | inf | inf
v47 net | 20.821 | 21.277 | 21.993 | 22.334 | 23.107
v47 exch | 0 | 0 | 47.435 | inf | inf
v48 | 20.821 | 21.277 | 21.993 | 22.334 | 23.107
v49 | 20.821 | 21.277 | 21.993 | 22.334 | 23.107
v50 | 20.821 | 21.277 | 21.993 | 22.334 | 23.107
v51 | 6.3387 | 6.5692 | 6.6818 | 8.2654 | 9.2860
v52 | 21.529 | 21.986 | 22.703 | 22.914 | 23.820
v53 net | 38.574 | 40.016 | 41.009 | 41.617 | 42.646
v53 exch | 0 | 0 | 1E-07 | 24.349 | 41.410
v54 | 0.59576 | 0.65821 | 0.86935 | 0.98051 | 1.2647
v55 | 109.55 | 113.15 | 116.44 | 120.14 | 123.75
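Each row of Tables A8 and A9 should satisfy 95 lb ≤ 68 lb ≤ Best ≤ 68 ub ≤ 95 ub, and a row whose 95% upper bound is infinite is not resolved by the data. The sketch below checks both properties for a few rows copied from Table A9; the `FluxCI` type and helper names are hypothetical, and "resolved" here simply means a finite 95% upper bound.

```python
import math
from typing import NamedTuple


class FluxCI(NamedTuple):
    name: str
    lb95: float
    lb68: float
    best: float
    ub68: float
    ub95: float


def parse_row(name: str, values: str) -> FluxCI:
    """Parse five whitespace-separated bounds; float() accepts 'inf' and '1.00E+07'."""
    lb95, lb68, best, ub68, ub95 = (float(v) for v in values.split())
    return FluxCI(name, lb95, lb68, best, ub68, ub95)


def is_nested(ci: FluxCI) -> bool:
    """True if 95 lb <= 68 lb <= best <= 68 ub <= 95 ub."""
    return ci.lb95 <= ci.lb68 <= ci.best <= ci.ub68 <= ci.ub95


def is_resolved(ci: FluxCI) -> bool:
    """Treat a flux as resolved if its 95% interval has a finite upper bound."""
    return math.isfinite(ci.ub95)


# Rows copied from Table A9 (MTYL065)
rows = [
    parse_row("v24 exch", "51.228 69.459 127.31 185.44 960.75"),
    parse_row("v34 exch", "11.916 20.184 73.141 684.23 inf"),
    parse_row("v35 exch", "0.58405 6.4933 4.32E+05 inf inf"),
    parse_row("v53 net", "38.574 40.016 41.009 41.617 42.646"),
]
for ci in rows:
    print(ci.name, is_nested(ci), is_resolved(ci))
```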

Appendix B

Isotopic tracing and MFA results for Y. lipolytica and M. thermoacetica cofeeding cultures

Table B1. Steady-state metabolite labeling from 13C gluconate in the Y. lipolytica lipogenic phase.

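13C Gluconate + Acetate
Metabolite | Isotopologue | Measured | S.E.M. | Simulated
G6P | M+0 | 63.2% | 0.3% | 64.4%
G6P | M+1 | 17.1% | 0.2% | 17.5%
G6P | M+2 | 7.9% | 0.2% | 6.9%
G6P | M+3 | 7.6% | 0.3% | 7.1%
G6P | M+4 | 2.2% | 0.2% | 2.1%
G6P | M+5 | 1.1% | 0.2% | 0.8%
G6P | M+6 | 1.0% | 0.2% | 1.3%
F6P | M+0 | 64.4% | 0.3% | 64.5%
F6P | M+1 | 16.9% | 0.3% | 17.5%
F6P | M+2 | 7.3% | 0.3% | 6.9%
F6P | M+3 | 7.4% | 0.2% | 7.2%
F6P | M+4 | 2.1% | 0.2% | 2.1%
F6P | M+5 | 1.1% | 0.2% | 0.7%
F6P | M+6 | 0.9% | 0.2% | 1.1%
3PG | M+0 | 93.0% | 0.2% | 92.6%
3PG | M+1 | 1.5% | 0.2% | 1.6%
3PG | M+2 | 0.1% | 0.2% | 0.6%
3PG | M+3 | 5.5% | 0.2% | 5.2%
S7P | M+0 | 53.6% | 0.2% | 53.6%
S7P | M+1 | 22.0% | 0.2% | 22.6%
S7P | M+2 | 9.7% | 0.2% | 9.7%
S7P | M+3 | 6.0% | 0.2% | 6.6%
S7P | M+4 | 3.6% | 0.2% | 3.7%
S7P | M+5 | 2.7% | 0.2% | 2.7%
S7P | M+6 | 1.2% | 0.2% | 0.8%
S7P | M+7 | 1.1% | 0.2% | 0.3%
6PG | M+0 | 63.7% | 0.3% | 62.6%
6PG | M+1 | 17.0% | 0.3% | 16.9%
6PG | M+2 | 6.6% | 0.3% | 6.5%
6PG | M+3 | 5.3% | 0.3% | 6.7%
6PG | M+4 | 1.6% | 0.3% | 2.0%
6PG | M+5 | 0.3% | 0.3% | 1.0%
6PG | M+6 | 5.5% | 0.3% | 4.3%
R5P | M+0 | 66.9% | 0.2% | 67.2%
R5P | M+1 | 16.3% | 0.2% | 17.0%
R5P | M+2 | 7.1% | 0.3% | 4.8%
R5P | M+3 | 4.7% | 0.2% | 4.6%
R5P | M+4 | 1.5% | 0.3% | 2.0%
R5P | M+5 | 3.5% | 0.3% | 4.4%
PEP | M+0 | 92.9% | 0.3% | 93.0%
PEP | M+1 | 1.2% | 0.2% | 1.5%
PEP | M+2 | 0.2% | 0.2% | 0.5%
PEP | M+3 | 5.7% | 0.3% | 5.0%
Pyr | M+0 | 93.6% | 0.2% | 93.6%
Pyr | M+1 | 1.2% | 0.2% | 1.4%
Pyr | M+2 | 0.5% | 0.2% | 0.5%
Pyr | M+3 | 4.7% | 0.2% | 4.6%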

G6P denotes glucose-6-phosphate; F6P, fructose-6-phosphate; 3PG, 3-phosphoglycerate; S7P, sedoheptulose-7-phosphate; 6PG, 6-phosphogluconate; R5P, ribose-5-phosphate; PEP, phosphoenolpyruvate; Pyr, pyruvate; S.E.M., standard error of the mean. The “Measured” column is the mean of three biological replicates measured by LC-MS. The “Simulated” column gives the simulated labeling from the flux solution that best fits the measured labeling, nutrient uptake, and product synthesis rates.
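The agreement between the “Measured” and “Simulated” columns is what the flux fit optimizes. As a simplified illustration, the sketch below computes a sum of squared residuals, each scaled by its S.E.M., for the 3PG rows of Table B1. The function name is hypothetical; a real 13C-MFA fit also includes the uptake and secretion rates and may impose a minimum measurement error, both of which this sketch omits.

```python
def weighted_ssr(measured, sem, simulated):
    """Sum of squared residuals, each residual divided by its standard error."""
    if not (len(measured) == len(sem) == len(simulated)):
        raise ValueError("inputs must have equal length")
    return sum(((m - s) / e) ** 2 for m, e, s in zip(measured, sem, simulated))


# 3PG mass isotopomer distribution (percent) from Table B1
measured = [93.0, 1.5, 0.1, 5.5]
sem = [0.2, 0.2, 0.2, 0.2]
simulated = [92.6, 1.6, 0.6, 5.2]

print(round(weighted_ssr(measured, sem, simulated), 2))  # 12.75
```

Because the S.E.M. values are small (0.2% to 0.3%), even a 0.5-point deviation, such as 3PG M+2, contributes a large term to the total.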

Table B2. Metabolic flux distribution of the Y. lipolytica lipogenic phase with 95% confidence intervals (mmol gCDW⁻¹ h⁻¹), determined by isotopomer balancing.

Gluconate + Acetate
Reaction | Substrates | Products | Flux | L.B. | U.B.
Gluconate_IN | Gluconate | 6PG | 0.06 | 0.05 | 0.06
PGI | G6P | F6P | -1.05 | -1.22 | -0.88
PFK_FBPase | F6P | FBP | -0.39 | -0.69 | -0.07
FBA | FBP | DHAP + GAP | -0.39 | -0.69 | -0.07
TPI | DHAP | GAP | -0.37 | -0.42 | -0.32
GAPDH | GAP | 13BPG | -0.32 | -0.36 | -0.27
PGK | 13BPG | 3PG | -0.32 | -0.36 | -0.27
PGM | 3PG | 2PG | -0.32 | -0.36 | -0.27
ENO | 2PG | PEP | -0.32 | -0.36 | -0.27
PYK | PEP | Pyr | 0.62 | 0.46 | 0.80
G6PDH | G6P | 6PG | 1.05 | 0.88 | 1.22
GND | 6PG | Ru5P + CO2 | 1.10 | 0.93 | 1.28
RPI | Ru5P | R5P | 0.37 | 0.31 | 0.43
RPE | Ru5P | Xu5P | 0.74 | 0.62 | 0.85
TKT1 | R5P + Xu5P | S7P + GAP | 0.37 | 0.31 | 0.43
TKT2 | E4P + Xu5P | F6P + GAP | 0.37 | 0.31 | 0.43
TAL | S7P + GAP | E4P + F6P | 0.29 | 0.00 | 0.58
SBA | DHAP + E4P | SBP | -0.08 | -0.37 | 0.22
SBPase | SBP | S7P | -0.08 | -0.37 | 0.22
PEP_IN | OA | PEP | 0.93 | 0.78 | 1.11
PYR_IN | Mal | Pyr | 0.06 | 0.03 | 0.10
Glyc3P_EX | DHAP |  | 0.06 | 0.05 | 0.06
Pyr_EX | Pyr |  | 0.67 | 0.51 | 0.87

DHAP denotes dihydroxyacetone phosphate; GAP, glyceraldehyde-3-phosphate; FBP, fructose-1,6-bisphosphate; 13BPG, 1,3-bisphosphoglycerate; 2PG, 2-phosphoglycerate; SBP, sedoheptulose-1,7-bisphosphate; Ru5P, ribulose-5-phosphate; Xu5P, xylulose-5-phosphate; E4P, erythrose-4-phosphate; OA, oxaloacetate; and Mal, malate. L.B. and U.B. are the lower and upper bounds of the 95% confidence intervals for the reaction fluxes.
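At metabolic steady state, every intracellular metabolite in Table B2 must be produced and consumed at equal rates. The sketch below checks this for the non-oxidative pentose phosphate pathway intermediates, using the best-fit fluxes and taking the stoichiometry from the Substrates and Products columns. Only a subset of nodes is checked, and because the table reports fluxes to two decimals, residuals of about 0.01 are expected from rounding alone.

```python
# Best-fit fluxes (mmol gCDW^-1 h^-1) from Table B2, non-oxidative PPP only
fluxes = {
    "GND": 1.10, "RPI": 0.37, "RPE": 0.74, "TKT1": 0.37,
    "TKT2": 0.37, "TAL": 0.29, "SBA": -0.08, "SBPase": -0.08,
}

# metabolite -> {reaction: coefficient}; positive = produced, negative = consumed
balances = {
    "Ru5P": {"GND": 1, "RPI": -1, "RPE": -1},
    "R5P": {"RPI": 1, "TKT1": -1},
    "Xu5P": {"RPE": 1, "TKT1": -1, "TKT2": -1},
    "S7P": {"TKT1": 1, "SBPase": 1, "TAL": -1},
    "E4P": {"TAL": 1, "TKT2": -1, "SBA": -1},
}

for met, terms in balances.items():
    net = sum(coef * fluxes[rxn] for rxn, coef in terms.items())
    print(f"{met}: net production {net:+.3f}")
```

Negative fluxes (for example SBA and SBPase) are handled naturally: they simply run the listed reaction in the reverse direction.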

Table B3. Steady-state metabolite labeling from 13C glucose in M. thermoacetica.

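13C Glucose + CO2
Metabolite | Isotopologue | Measured | S.E.M. | Simulated
3PG | M+0 | 3.1% | 0.5% | 3.2%
3PG | M+1 | 1.1% | 0.6% | 1.7%
3PG | M+2 | 12.2% | 2.1% | 10.7%
3PG | M+3 | 83.5% | 2.9% | 84.3%
PEP | M+0 | 1.6% | 1.6% | 3.6%
PEP | M+1 | 0.9% | 0.9% | 1.9%
PEP | M+2 | 16.6% | 3.2% | 11.6%
PEP | M+3 | 80.9% | 5.0% | 82.9%
Ala | M+0 | 34.7% | 1.0% | 33.8%
Ala | M+1 | 12.8% | 2.8% | 17.2%
Ala | M+2 | 32.3% | 1.5% | 31.5%
Ala | M+3 | 20.1% | 2.5% | 17.5%
Ser | M+0 | 32.1% | 2.8% | 24.0%
Ser | M+1 | 10.8% | 2.3% | 18.9%
Ser | M+2 | 17.3% | 1.8% | 17.3%
Ser | M+3 | 39.7% | 1.0% | 39.8%
Gly | M+0 | 39.0% | 2.9% | 43.7%
Gly | M+1 | 39.6% | 2.8% | 33.7%
Gly | M+2 | 21.4% | 4.7% | 22.6%
Glu | M+0 | 7.9% | 1.1% | 4.7%
Glu | M+1 | 8.7% | 1.4% | 6.1%
Glu | M+2 | 36.6% | 0.6% | 39.3%
Glu | M+3 | 23.5% | 1.8% | 26.0%
Glu | M+4 | 22.9% | 1.4% | 23.4%
Glu | M+5 | 0.4% | 0.4% | 0.5%
Asp | M+0 | 10.9% | 0.6% | 10.7%
Asp | M+1 | 5.2% | 0.5% | 6.0%
Asp | M+2 | 28.9% | 0.6% | 28.9%
Asp | M+3 | 54.1% | 1.4% | 53.2%
Asp | M+4 | 0.8% | 0.5% | 1.2%
cellular CO2* | M+0 | 96.6% | 1.6% | 97.7%
cellular CO2* | M+1 | 3.4% | 1.6% | 2.3%
acetyl group* | M+0 | 44.7% | 1.2% | 42.5%
acetyl group* | M+1 | 24.0% | 3.0% | 20.6%
acetyl group* | M+2 | 31.3% | 1.8% | 36.9%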

The “Measured” column is the mean of three biological replicates measured by LC-MS. *Inferred from pairs of metabolites with and without the moiety (acetyl group: acetyl-glutamate and glutamate; cellular CO2: citrulline and ornithine).
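The footnote states that the acetyl-group and CO2 labeling were inferred from metabolite pairs with and without the moiety. One way to do this, sketched below under the assumption that the product's mass isotopomer distribution (MID) is the convolution of the backbone and moiety MIDs (independent labeling, natural-abundance correction already applied), is a least-squares deconvolution. The thesis does not specify its exact procedure here, so the function names are hypothetical, and because acetyl-glutamate labeling is not listed in Table B3, the example builds a synthetic acetyl-glutamate MID from the tabulated glutamate and acetyl-group values and then recovers the acetyl part.

```python
import numpy as np


def convolve_mid(backbone, moiety):
    """MID of a molecule assembled from two independently labeled parts."""
    return np.convolve(backbone, moiety)


def deconvolve_moiety(product, backbone, n_carbons):
    """Least-squares estimate of a moiety MID from product and backbone MIDs.

    A[i, j] = backbone[i - j], so A @ moiety equals convolve(backbone, moiety).
    """
    backbone = np.asarray(backbone, dtype=float)
    m, k = len(backbone), n_carbons + 1
    A = np.zeros((m + k - 1, k))
    for j in range(k):
        A[j:j + m, j] = backbone
    x, *_ = np.linalg.lstsq(A, np.asarray(product, dtype=float), rcond=None)
    x = np.clip(x, 0.0, None)  # isotopologue fractions cannot be negative
    return x / x.sum()


# Measured glutamate and acetyl-group MIDs from Table B3, as fractions
glu = np.array([7.9, 8.7, 36.6, 23.5, 22.9, 0.4]) / 100
acetyl = np.array([44.7, 24.0, 31.3]) / 100

# Round trip: synthesize an acetyl-glutamate MID, then recover the acetyl moiety
acetyl_glu = convolve_mid(glu, acetyl)
recovered = deconvolve_moiety(acetyl_glu, glu, n_carbons=2)
print(np.round(recovered, 3))
```

With real measurements the product MID carries noise, which is why a least-squares solve with clipping and renormalization is used here instead of exact polynomial division.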

Table B4. Metabolic flux distribution of M. thermoacetica with 95% confidence intervals (mmol gCDW⁻¹ h⁻¹), determined by isotopomer balancing.

Glucose + CO2
Reaction | Substrates | Products | Flux | L.B. | U.B.
GAPD_PGK | GAP | 3PG | 6.61 | 6.57 | 6.62
PGM_ENO | 3PG | PEP | 6.56 | 4.07 | 6.58
PYK | PEP | PYR | 7.02 | 0.00 | 7.60
PPC | PEP | OA | -0.46 | -1.05 | 6.53
PC | PYR | OA | 0.72 | -6.28 | 1.31
PFOR | PYR | AcCoA + CO2 | 6.19 | 3.76 | 6.25
CS | AcCoA + OA | Cit | 0.10 | 0.08 | 0.10
ACON_IDH | Cit | aKG + CO2 | 0.10 | 0.08 | 0.10
AKGD | aKG | SuccCoA + CO2 | 0.00 | 0.00 | 0.00
PHGDH | 3PG | Ser | 0.05 | 0.00 | 2.54
SHMT | Ser | Gly + MLTHF | 0.02 | -0.03 | 2.52
GCS_a | Gly | Gly-CO2* + CO2 | -0.03 | -0.09 | 2.46
GCS_b | Gly-CO2* | MLTHF | -0.03 | -0.09 | 2.46
FDH_FTHL | CO2 | FTHF | 3.57 | -0.75 | 3.68
MTHFC_MTHFD | FTHF | MLTHF | 3.57 | -0.75 | 3.68
MTHFR | MLTHF | MTHF | 3.57 | 1.76 | 5.29
MTRCFSP | MTHF | MCFeSP | 3.57 | 1.76 | 5.29
CODH_ACS | MCFeSP + CO2 | AcCoA | 3.57 | 1.76 | 5.29
ACCOA_EX | AcCoA |  | 9.66 | 7.91 | 9.67
ALA_EX | Pyr |  | 0.10 | 0.08 | 0.10
ASP_EX | OA |  | 0.17 | 0.14 | 0.17
GLU_EX | aKG |  | 0.10 | 0.08 | 0.10
GLY_EX | Gly |  | 0.05 | 0.05 | 0.06
SER_EX | Ser |  | 0.03 | 0.03 | 0.03

*A '-' sign indicates a transition state in which the molecule following the sign is cleaved from the substrate preceding it. MCFeSP denotes methylcorrinoid iron–sulfur protein; ML, methylene; THF, tetrahydrofolate; MLTHF, methylene-THF; FTHF, formyl-THF; MTHF, methyl-THF; AcCoA, acetyl-CoA; Cit, citrate; aKG, α-ketoglutarate; and SuccCoA, succinyl-CoA. L.B. and U.B. are the lower and upper bounds of the 95% confidence intervals for the fluxes.
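Acetyl-CoA in Table B4 is produced by PFOR (from glucose-derived pyruvate) and by CODH_ACS (the acetyl-CoA synthase step of the Wood–Ljungdahl pathway), and consumed by CS and secretion (ACCOA_EX). The sketch below checks that this node balances with the best-fit values and computes the share of acetyl-CoA synthesis carried by CODH_ACS. It uses best-fit values only and ignores the wide confidence interval on CODH_ACS (1.76 to 5.29).

```python
# Best-fit fluxes (mmol gCDW^-1 h^-1) around acetyl-CoA, from Table B4
produced = {"PFOR": 6.19, "CODH_ACS": 3.57}
consumed = {"CS": 0.10, "ACCOA_EX": 9.66}

total_in = sum(produced.values())
total_out = sum(consumed.values())
imbalance = total_in - total_out  # ~0 at steady state, up to rounding

# Fraction of acetyl-CoA synthesized through CODH_ACS rather than PFOR
codh_share = produced["CODH_ACS"] / total_in

print(f"in={total_in:.2f}, out={total_out:.2f}, imbalance={imbalance:+.3f}")
print(f"CODH_ACS share of acetyl-CoA synthesis: {codh_share:.1%}")
```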

Table B5. New reactions and genes added to the M. thermoacetica metabolic model iAI563.

Reaction Abbreviation | Reaction Name | Reaction Formula | Genes Associated with Reactions | Model Subsystem
ALCD2x | Alcohol dehydrogenase (ethanol) | etoh[c] + nad[c] <=> acald[c] + h[c] + nadh[c] | ( Moth_1024 and Moth_1911 and Moth_2268 ) | Alternate Carbon Metabolism
ALCD2yi | Alcohol dehydrogenase (ethanol, NADP) | acald[c] + h[c] + nadph[c] -> etoh[c] + nadp[c] | ( Moth_1024 and Moth_1911 and Moth_2268 ) | Alternate Carbon Metabolism
ACALD | Acetaldehyde dehydrogenase (acetylating) | acald[c] + coa[c] + nad[c] <=> accoa[c] + h[c] + nadh[c] | Moth_1776 | Central Metabolism
AOR_MT | Acetaldehyde:ferredoxin oxidoreductase | ac[c] + fdxrd[c] + 3 h[c] -> acald[c] + fdxox[c] + h2o[c] | Moth_0722 | Central Metabolism
EX_etoh(e) | Ethanol exchange | etoh[e] <=> |  | Exchange
ETOHt | Ethanol reversible transport | etoh[e] <=> etoh[c] |  | Transport
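The “Genes Associated with Reactions” column is a Boolean gene–protein–reaction (GPR) rule: “and” joins genes that are all required, “or” joins alternatives. The sketch below evaluates such a rule against a set of available genes using Python's `ast` module. The function name is hypothetical, it is not the COBRA Toolbox or COBRApy implementation, and treating reactions without a rule (the exchange and transport entries) as always available is an assumption.

```python
import ast


def gpr_active(rule: str, present: set[str]) -> bool:
    """Evaluate a GPR rule such as '( g1 and g2 )' against available genes.

    'and' requires every listed gene; 'or' requires at least one.
    An empty rule is treated as always active.
    """
    rule = rule.strip()
    if not rule:
        return True

    def evaluate(node: ast.AST) -> bool:
        if isinstance(node, ast.BoolOp):
            results = [evaluate(value) for value in node.values]
            return all(results) if isinstance(node.op, ast.And) else any(results)
        if isinstance(node, ast.Name):
            return node.id in present
        raise ValueError(f"unsupported GPR syntax: {ast.dump(node)}")

    return evaluate(ast.parse(rule, mode="eval").body)


alcd2x_rule = "( Moth_1024 and Moth_1911 and Moth_2268 )"  # ALCD2x and ALCD2yi
print(gpr_active(alcd2x_rule, {"Moth_1024", "Moth_1911", "Moth_2268"}))  # True
print(gpr_active(alcd2x_rule, {"Moth_1024", "Moth_2268"}))  # False
print(gpr_active("Moth_0722", {"Moth_0722"}))  # True
```

Under this rule, deleting any one of Moth_1024, Moth_1911, or Moth_2268 would disable both alcohol dehydrogenase reactions.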

Table B6. Corrected reactions in the M. thermoacetica metabolic model iAI563.

Reaction Abbreviation | Reaction Name | Published Model iAI558 | Updated Model iAI563
FRHD | Hydrogenase (ferredoxin) | 2 fdxrd[c] + 3 h[c] <=> 2 fdxox[c] + h2[c] + h[e] | fdxrd[c] + 3 h[c] <=> fdxox[c] + h2[c] + h[e]
CODH | Carbon monoxide dehydrogenase (ferredoxin) | co[c] + 2 fdxox[c] + h2o[c] -> co2[c] + 2 fdxrd[c] + 2 h[c] | co[c] + fdxox[c] + h2o[c] -> co2[c] + fdxrd[c] + 2 h[c]
PFOR | Pyruvate ferredoxin oxidoreductase | 2 fdxrd[c] + 2 h[c] + co2[c] + accoa[c] <=> fdxox[c] + pyr[c] + coa[c] | 2 fdxrd[c] + 2 h[c] + co2[c] + accoa[c] <=> fdxox[c] + pyr[c] + coa[c]
HYDFDNr | Electron-bifurcating ferredoxin:NAD hydrogenase (HydABC) | 2 fdxox[c] + 2 h2[c] + nad[c] <=> 2 fdxrd[c] + 3 h[c] + nadh[c] | fdxox[c] + 2 h2[c] + nad[c] <=> fdxrd[c] + 3 h[c] + nadh[c]
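Most corrections in Table B6 change stoichiometric coefficients. The sketch below parses formulas written in the table's notation (optional numeric coefficient, metabolite id with a compartment suffix, and a `<=>` or `->` arrow) and reports which coefficients differ between the published and updated versions. The parser and function names are hypothetical and handle only this simple syntax; they are not the COBRA Toolbox or COBRApy parsers.

```python
import re

# Reversible (<=>) or irreversible (->) arrow, with surrounding whitespace
ARROW = re.compile(r"\s*(<=>|->)\s*")


def parse_side(side: str) -> dict[str, float]:
    """Parse 'a[c] + 2 b[c]' into {'a[c]': 1.0, 'b[c]': 2.0}."""
    coeffs: dict[str, float] = {}
    for term in side.split("+"):
        parts = term.split()
        if not parts:
            continue  # empty side, e.g. an exchange reaction
        coef, met = (float(parts[0]), parts[1]) if len(parts) == 2 else (1.0, parts[0])
        coeffs[met] = coeffs.get(met, 0.0) + coef
    return coeffs


def parse_reaction(formula: str) -> dict[str, float]:
    """Signed stoichiometry: reactants negative, products positive."""
    lhs, _arrow, rhs = ARROW.split(formula, maxsplit=1)
    stoich = {met: -c for met, c in parse_side(lhs).items()}
    for met, c in parse_side(rhs).items():
        stoich[met] = stoich.get(met, 0.0) + c
    return stoich


def stoich_diff(published: str, updated: str) -> dict[str, float]:
    """Coefficient changes (updated minus published) for metabolites that differ."""
    old, new = parse_reaction(published), parse_reaction(updated)
    changed = {}
    for met in sorted(set(old) | set(new)):
        delta = new.get(met, 0.0) - old.get(met, 0.0)
        if delta != 0:
            changed[met] = delta
    return changed


# FRHD: published (iAI558) vs updated (iAI563), from Table B6
print(stoich_diff(
    "2 fdxrd[c] + 3 h[c] <=> 2 fdxox[c] + h2[c] + h[e]",
    "fdxrd[c] + 3 h[c] <=> fdxox[c] + h2[c] + h[e]",
))  # {'fdxox[c]': -1.0, 'fdxrd[c]': 1.0}
```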