BIOCATALYSIS AND BIOPROCESS ENGINEERING FOR TERPENOID

PRODUCTION

by

Sonal Ayakar

B. Pharm., Mumbai University, 2012

M. Tech., Institute of Chemical Technology, 2014

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF

THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES

(Chemical and Biological Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA

(Vancouver)

May 2019

© Sonal Ayakar, 2019

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the dissertation entitled:

Biocatalysis and bioprocess engineering for terpenoid production

submitted by Sonal Ayakar in partial fulfillment of the requirements for

the degree of Doctor of Philosophy

in Chemical and Biological Engineering

Examining Committee:

Vikramaditya Yadav, Chemical and Biological Engineering Supervisor

Charles Haynes, Chemical and Biological Engineering Supervisory Committee Member

Joerg Bohlmann, Botany Supervisory Committee Member

Fariborz Taghipour, Chemical and Biological Engineering University Examiner

Nobuhiko Tokuriki, Biochemistry and Molecular Biology University Examiner

Abstract

The biomanufacturing of terpenoids is limited by the low yields of heterologously expressed biosynthetic pathways and by the challenges of recovering these products at commercial scale. To enhance the flux through the methylerythritol phosphate (MEP) pathway for terpenoid biosynthesis, I screened soil metagenomes for more active and stable orthologs of the rate-limiting enzymes. I identified three novel, natural fusions of IspD and IspF, one of which improved production of lycopene from 235 mg/L to 275 mg/L and production of isoprene from 3.6 mg/L to 6.3 mg/L compared with overexpression of the native enzymes. A comprehensive study of the role of the linking domain revealed higher activity of each of the catalytic domains and the absence of substrate channeling. In addition, non-natural fusions of E. coli enzymes catalyzing consecutive steps were constructed. One such fusion, of IspD and IspE, yielded 281 mg/L of lycopene, whereas the best-performing fusion of IspE and IspF yielded only 39 mg/L of lycopene. Further investigation of the sequence of this biocatalytic cascade led to the conclusion that it commences with the activity of IspE, followed by IspD and IspF, suggesting reactive plasticity in the MEP pathway.

I probed the promiscuous nature of terpene synthases (TSs) through a systematic study of monoterpene synthases from Picea abies in vivo and in vitro. I uncovered the influence of intracellular expression and oxygen supply on the promiscuity of TSs. Computational analysis revealed the putative roles of the amino acid residues within the active sites and their evolutionary trajectory. Finally, the fermentations of engineered E. coli strains producing carene and myrcene were scaled up to 1 L, and a new technique was developed for efficient product capture using a fluidized bed capture device (FBCD) containing a hydrophobic resin. The device was easy to integrate into the existing bioreactor setup. It yielded 2-fold higher carene titers and 17-fold higher myrcene titers.

In conclusion, the three aspects of terpenoid biomanufacture studied in this work address some of the biggest challenges facing the industry and lay a strong foundation for the commercialization of terpenoid biomanufacturing processes that employ genetically engineered microorganisms.

Lay Summary

Terpenoids are plant natural products with applications as pharmaceutical agents, flavors, fragrances, surfactants, fuel additives and polymer additives. Producing and purifying these terpenoids efficiently from plants poses various challenges. In this work, E. coli was genetically engineered to carry out a terpenoid biosynthesis route identified in Norway spruce. Metabolic engineering efforts in this area led to the discovery of novel biochemical pathways that operate at higher activity. Moreover, fermentative production with the engineered E. coli provided insights into the evolution and activity of the plant enzymes, which can guide higher terpenoid production rates. An efficient method was developed for recovering the terpenoid products from the fermentation process. This work will aid the discovery of new plant-based terpenoids with wider applications and facilitate the development of their manufacture in genetically engineered microbes.

Preface

This Ph.D. dissertation consists of seven chapters. The Ph.D. study was conducted by Sonal Ayakar under the direct supervision of Dr. Vikramaditya G. Yadav in the Department of Chemical and Biological Engineering at UBC. The literature review and experimental planning were done by Sonal Ayakar with direct input from Vikramaditya Yadav.

Chapter 2

The metagenomic database was generated by members of the Steven Hallam lab, UBC, and the screen was conducted with Sandip Pawar. Construction of pSASDFI was conducted by Sonal Ayakar and Protiva Roy. Other natural fusion constructs were completed by Sonal Ayakar. Construction of non-natural fusion genes was done by Carmen Bayly. Expression of the enzyme chimeras was done by Sonal Ayakar. Analysis, quantification of terpenoids and interpretation of the data for natural fusions were performed by Sonal Ayakar. The natural fusion discovery and application were presented as:

S. R. Ayakar, S. V. Pawar, S. J. Hallam and V. G. Yadav, “Multiplying productivity using metagenomics: beyond canonical metabolic engineering”, Synthetic biology for natural products conference, Cancun, Mexico (2017).

A patent application has been filed to protect the natural fusions:

S. R. Ayakar and V. G. Yadav et al., Metabolic Engineering of E. Coli for the Biosynthesis of Cannabinoid Products, USPTO PCT/CA2018/051073, Sept. 2018 (Technology transferred to Inmed Pharmaceuticals, Canada)

Homology modelling, analysis and quantification of terpenoids for the non-natural fusions were conducted by Sonal Ayakar and Adhithi Raghavan. The data interpretation was done by Sonal Ayakar. The patent application for non-natural fusions has been filed as:

S. R. Ayakar and V. G. Yadav et al., Compositions and methods for biosynthesis of cannabinoids products in a prokaryote, International PCT Appl. No. PCT/CA2018/051074, March 2019 (Technology transferred to Inmed Pharmaceuticals, Canada)

The entire chapter was written by Sonal Ayakar.

Chapter 3

The experimental design, experimental work and data interpretation were done by Sonal Ayakar. The LC-MS analysis was done by Lina Madilao, UBC. The docked substrates were generated by Vikramaditya Yadav. The entire chapter was written by Sonal Ayakar. This work was presented at:

S. R. Ayakar and V. G. Yadav, “Metagenomics yields new insights about metabolic flux through the non-mevalonate pathway”, DECHEMA Bioflavours, Frankfurt, Germany (2018).

Chapter 4

The experimental design, experimental work and data interpretation were done by Sonal Ayakar. The monoterpene gene sequences and monoterpene standards were procured from Joerg Bohlmann. The docked substrates were generated by Vikramaditya Yadav. The computational modelling and figures were generated by Sonal Ayakar. The entire chapter was written by Sonal Ayakar.

Chapter 5

The research idea was conceived by Sonal Ayakar. The initial planning of experiments for terpenoid capture was conducted by Sonal Ayakar and Azin Amiri. Printing of the fluidized bed capture device would not have been possible without Logan Phillips. The bioreactor runs and optimization experiments were done by Azin Amiri. The data interpretation and product spectra analysis were done by Sonal Ayakar. The entire chapter was written by Sonal Ayakar.

This work has now become Azin Amiri’s PhD project.

Chapter 6

The co-authors contributed equally to the following work. The lycopene-producing strain developed in Chapter 2 was used in the construction of a biogenic photovoltaic cell. This work was published as:

S. K. Srivastava, P. Piwek, S. R. Ayakar, A. Bonakdarpour, D. Wilkinson, V. G. Yadav, “A biogenic photovoltaic material”, Small, 14, 1800729 (2018).

A metric was developed to gauge the efficiency of biomass valorization and has been accepted for publication as:

S.C. Patankar, S.R. Ayakar, V.G. Yadav, and S. Renneckar, “The V-factor: A new metric for gauging the efficiency & profitability of manufacturing processes in the bioeconomy”, Bioproducts Business, 2018.bpb.047, accepted manuscript (2019).

A chemical catalytic process was developed for lignin valorization with the highest reported yield of vanillin:

S.C. Patankar, L. Liu, L. Ji, S.R. Ayakar, V.G. Yadav, and S. Renneckar, “Isolation of phenolic monomers from kraft lignin using a magnetically recyclable TEMPO nanocatalyst”, Green Chem., 21, 785–791 (2019).

A study characterizing the microbial degradation of nanofibrillated cellulose was also carried out. A manuscript on this work is in preparation.

Table of Contents

Abstract ...... iii

Lay Summary ...... v

Preface ...... vi

Table of Contents ...... x

List of Tables ...... xviii

List of Figures ...... xx

List of Equations ...... xxviii

List of Abbreviations ...... xxix

Glossary ...... xxxii

Acknowledgments ...... xxxiii

Dedication ...... xxxv

Chapter 1: Introduction ...... 1

1.1 Terpenoid chemical space ...... 1

1.2 Applications of terpenoids ...... 2

1.3 Isoprenoid synthesis ...... 3

Chemical synthesis ...... 3

Semi-synthesis ...... 3

Biosynthesis in heterologous hosts ...... 4

1.4 Biosynthetic pathways of isoprenoid production ...... 4

C5 precursor synthesis ...... 4

Chain elongation ...... 6

Pyrophosphate group removal ...... 7

Scaffold activation/modification ...... 7

1.5 MEP pathway ...... 8

Dxs (1-deoxy-d-xylulose 5-phosphate synthase) ...... 8

IspC (Dxr, 1-deoxy-d-xylulose 5-phosphate reductoisomerase) ...... 9

IspD (YgbP, 4-diphosphocytidyl-2-C-methylerythritol synthase) ...... 9

IspE (YchB, 4-diphosphocytidyl-2-C-methyl-D-erythritol kinase) ...... 10

IspF (YgbB, 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase) ...... 11

IspG (GcpE, 4-hydroxy-3-methylbut-2-en-1-yl diphosphate synthase) ...... 12

IspH (LytB, 4-hydroxy-3-methylbut-2-en-1-yl diphosphate reductase) ...... 12

Idi (Isopentenyl diphosphate delta-isomerase) ...... 13

1.6 Properties of isoprenoids...... 15

Chemical properties ...... 15

Physical properties ...... 16

1.7 Bioprocess development for isoprene production ...... 16

1.8 Challenges in terpenoid production ...... 17

1.9 Research questions ...... 19

1.10 Organization of thesis ...... 21

Chapter 2: Study of enzyme fusions to improve MEP pathway yield ...... 22

2.1 Introduction ...... 22

Improvements in the precursor supply for the MEP pathway ...... 22

MEP pathway optimization...... 23

2.1.2.1 Homologous expression of MEP pathway enzymes ...... 23

2.1.2.2 Heterologous expression of MEP pathway enzymes ...... 23

2.1.2.3 Improvement in genomic MEP control ...... 23

2.1.2.4 Evolution of MEP pathway genes ...... 24

Effect of culture conditions on the solubility of pathway enzymes ...... 24

Bifunctional enzyme discovery and applications ...... 24

2.1.4.1 IspDF discovery and structure ...... 24

2.1.4.2 Interactions of IspD, IspE and IspF ...... 25

Naturally occurring enzyme fusions catalyzing non-consecutive pathway steps ..... 27

Knowledge gap in the understanding of the MEP pathway ...... 27

Construction of non-natural fusions ...... 28

2.1.7.1 Demonstration of non-native function by fusions ...... 29

2.1.7.2 Non-natural fusions of MEP pathway enzymes ...... 29

2.2 Materials and methods ...... 30

Metagenomic screening for orthologs of MEP pathway enzymes ...... 30

Strains, genes and plasmids ...... 31

Non-natural protein fusions ...... 35

Construction of plasmids and strains for analysis of non-natural fusions ...... 36

Culture conditions ...... 38

Isoprene analysis ...... 38

Lycopene analysis ...... 38

2.3 Results ...... 39

Discovery of three IspDF fusions ...... 39

Construction of MEP pathway operon ...... 44

Influence of novel fusion enzymes on the MEP pathway ...... 46

Role of IspE ...... 48

Effect of linkers on IspDF 1 activity ...... 49

Effect of linkers on non-natural fusions of monofunctional IspD and IspF ...... 51

Effect of XL on non-natural fusions of monofunctional E. coli IspD and IspF ...... 52

Expression of domains of IspDF 1 as IspD 1 and IspF 1 ...... 52

Non-natural fusions of IspE and their effects on MEP pathway flux ...... 53

Categorical inferences ...... 54

2.4 Discussions ...... 55

Influence of native Dxs, IspD, IspF and Idi on MEP pathway flux ...... 55

IspDF fusion activity and the influence of IspE on MEP pathway flux ...... 56

Study on linker types ...... 57

Influence of IspDF 1 domain separation on MEP pathway flux ...... 58

Improvement in the flux on IspDE fusion ...... 59

Chapter 3: Study of MEP pathway biochemistry ...... 61

3.1 Introduction ...... 61

Kinetics of MEP pathway cascade from MEP to MEcPP ...... 61

Canonical MEP pathway and plausible bifurcation step ...... 62

3.2 Materials and methods ...... 64

Strains, plasmid and genes ...... 64

Media and growth conditions...... 65

Protein extraction and IspE purification ...... 66

IspE in vitro reactions ...... 66

Bioluminescent assay for ATP consumption ...... 67

HPLC analysis of ATP, ADP and AMP ...... 67

LC-MS analysis of MEP and ME-2,4-PP ...... 67

Computational analysis ...... 68

3.3 Results ...... 68

Protein expression and quantification ...... 68

Analysis of IspE reaction by ATP bioluminescent assay ...... 69

HPLC analysis of IspE reactions ...... 71

LC-MS analyses of IspE reaction mixture ...... 74

Computational analysis ...... 76

3.4 Discussions ...... 78

Chapter 4: Study of catalytic promiscuity of terpene synthases ...... 79

4.1 Introduction ...... 79

Catalytic promiscuity of TSs ...... 80

Factors responsible for TS promiscuity ...... 81

Study of TS activity and promiscuity ...... 82

Research variables and scope ...... 85

4.2 Materials and methods ...... 86

Strains and plasmid construction ...... 86

Media and growth conditions...... 87

In vitro reactions ...... 88

Metabolite analyses ...... 89

Computational analysis ...... 89

4.3 Results ...... 91

Plasmid construction and protein expression ...... 91

Study of parameters on SACar fermentations ...... 93

Study of strain and fermentation parameters on monoterpene production ...... 96

Aerobic and micro-aerobic fermentations ...... 104

Metabolite analyses ...... 107

Study of terpene synthase catalytic mechanisms ...... 109

Cyclization of primary geranyl carbocation (cisoid) to the α-terpinyl carbocation 112

Formation of carene from the α-terpinyl carbocation ...... 115

Pinyl cation formation by 2,7-closure of the α-terpinyl carbocation ...... 118

Acyclic terpenoid mechanisms from cisoid primary geranyl carbocation ...... 119

4.4 Discussions ...... 121

Optimization of fermentation conditions ...... 121

Optimization of shake flask fermentations ...... 121

Product profile characterization ...... 124

Computational analyses ...... 126

Comparison of product profile with computational analysis ...... 129

Chapter 5: Designing an efficient bioprocess for terpenoid recovery ...... 130

5.1 Introduction ...... 130

Challenges in terpenoid bioprocess development ...... 130

Continuous ex situ recovery (CESR) ...... 131

Drawbacks of state of the art ...... 131

The objective of the study ...... 132

5.2 Materials and methods ...... 132

Strains and fermentation conditions ...... 132

Bioreactor set up ...... 133

CESR for terpenoid ...... 133

Resins used for terpenoid capture ...... 134

Screening resin for carene capture ...... 136

5.3 Results and discussions ...... 137

Adsorption based carene recovery ...... 137

Bioreactor parameter optimization ...... 138

Terpenoid product analyses ...... 139

Terpenoid recovery comparison ...... 140

Chapter 6: Other scholarly contributions ...... 142

6.1 A Biogenic Photovoltaic Material (ref. 166) ...... 142

6.2 Isolation of phenolic monomers from kraft lignin using magnetically recyclable TEMPO nanocatalyst (ref. 173) ...... 145

6.3 The V-factor: Towards a new metric for gauging the efficiency & profitability of manufacturing processes for the bioeconomy ...... 149

6.4 Microbial growth and its characterization on nanocellulose synthesized from cherry veneer ...... 153

Chapter 7: Conclusions ...... 158

7.1 Improvement in the intracellular pool of C5 precursors ...... 159

7.2 Understanding the biocatalytic sequence of steps in the MEP pathway ...... 161

7.3 Study of terpene synthase biocatalysis ...... 162

7.4 Bioprocess development for terpenoid recovery ...... 163

7.5 Future work ...... 165

Improvement in MEP pathway flux (Chapter 2) ...... 165

MEP pathway biochemistry (Chapter 3) ...... 166

Terpene synthase study (Chapter 4) ...... 166

Terpenoid recovery from the bioprocess (Chapter 5) ...... 167

Bibliography ...... 168

Appendices ...... 179

Appendix A Fusion protein sequences ...... 179

A.1 IspDF 1 ...... 179

A.2 IspDF 2 ...... 179

A.3 IspDF 3 ...... 180

Appendix B HPLC chromatograms for ATP, ADP and AMP analysis ...... 181

Appendix C GC-MS analysis for terpenes ...... 182

List of Tables

Table 1.1 Theoretical yield of MEP and MVA pathways (ref. 27) ...... 6

Table 1.2 Kinetic parameters of MEP pathway reactions ...... 14

Table 1.3 Classes of terpenoids ...... 15

Table 2.1 Strains, genes and plasmids used for MEP pathway study ...... 32

Table 2.2 Types of linkers used in the study and their sequences ...... 35

Table 2.3 List of non-natural protein fusions ...... 36

Table 2.4 Strains and plasmid expressing non-natural fusion proteins ...... 36

Table 2.5 Protein alignment analysis of the bifunctional enzymes against E. coli IspD-IspF and cjIspDF using the online BLASTN search tool ...... 40

Table 2.6 Protein alignment analysis of each domain of the bifunctional enzymes against corresponding E. coli monofunctional enzymes using the online BLASTN search tool ...... 42

Table 2.7 Protein alignment analysis of each domain of the bifunctional enzymes against corresponding cjIspDF enzyme domains using the online BLASTN search tool ...... 43

Table 3.1 Enzyme amount and kinetics for MEP pathway steps (data extracted from ref. 118) ...... 62

Table 3.2 Strains, plasmids and genes ...... 65

Table 3.3 Reaction conditions for different experimental sets ...... 69

Table 4.1 Strains, plasmids and genes ...... 86

Table 4.2 Summary of fermentation conditions selected ...... 104

Table 4.3 Monoterpene product distribution and percent total ...... 108

Table 4.4 Physical properties of monoterpenes ...... 123

Table 5.1 Resins employed in the study and their properties ...... 135

Table 5.2 Screening resin for carene recovery ...... 137

Table 5.3 Product distribution comparison of shake flask and bioreactor scale fermentations .. 140

Table 6.1 Comparison of vanillin yield derived through oxidative depolymerization of kraft lignin using Fe@MagTEMPO catalyst with values in literature ...... 147

Table 6.2 Predicted global averages of fractional dollar outputs of selected sectors ...... 152

List of Figures

Figure 1-1 Suite of isoprenoid products ...... 2

Figure 1-2 MEP pathway ...... 9

Figure 1-3 Challenges in commercialization of terpenoid bioprocess ...... 18

Figure 2-1 Expression of the novel IspDFs as (His)6-tagged proteins in E. coli BL21(DE3) and their analyses on SDS/PAGE. Lanes 1 and 5: total and purified IspDF 1 extract respectively, lanes 2 and 6: total and purified IspDF 2 extract respectively, lanes 4 and 7: total and purified IspDF 3 extract respectively, lanes 2 and 8: protein ladder. The bands corresponding to the specific proteins are highlighted with a red box...... 40

Figure 2-2 Protein sequence alignment generated with the Centre for Genomic Regulation (CRG) M-Coffee tool. IspD-IspF denotes the sequences of E. coli IspD and IspF...... 41

Figure 2-3 SDS/PAGE image of the soluble protein fraction of pSASDFI. Lane 1: E. coli BL21(DE3), lane 2: protein ladder, lanes 3 and 4: SASDFI. The bands corresponding to the proteins are: Dxs (band a, 68.2 kDa), IspD (band b, 25.7 kDa), IspF (band d, 16.9 kDa) and Idi (band c, 21.2 kDa)...... 44

Figure 2-4 Influence of rate-limiting steps on MEP pathway flux. (a) Lycopene production, (b) Isoprene production. The IPTG concentrations used for induction are denoted in the legends. Primary Y-axis is terpene titer and secondary Y-axis is normalized terpene titer...... 45

Figure 2-5 Influence of novel IspDF fusions on MEP pathway flux. (a) Lycopene production, (b) Isoprene production. The IPTG concentrations used for induction are denoted in the legends. Primary Y-axis is terpene titer and secondary Y-axis is normalized terpene titer...... 47

Figure 2-6 Homology models for the fusion proteins generated by SWISS-MODEL tool. (a) cjIspDF (ref. 66), (b) IspDF 1, (c) IspDF 2 and (d) IspDF 3. The IspD domain is in pink, the IspF domain is in blue and the linker is in green. The N-terminal residue is colored black and the C-terminal residue is colored orange...... 48

Figure 2-7 Effect of IspE overexpression on lycopene production. The IPTG concentrations used for induction are denoted in the legends. Primary Y-axis is terpene titer and secondary Y-axis is normalized terpene titer...... 49

Figure 2-8 Linkers for IspDF 1 and their effect on MEP pathway flux. (a) Strains overexpressing Dxs, IspDF chimeras and Idi, (b) strains overexpressing Dxs, IspDF chimeras, IspE and Idi. The IPTG concentrations used for induction are denoted in the legends. Primary Y-axis is terpene titer and secondary Y-axis is normalized terpene titer...... 50

Figure 2-9 Linkers for non-natural fusions of E. coli IspD and IspF, and their effect on MEP pathway flux. (a) Strains overexpressing Dxs, IspDF chimeras and Idi, (b) strains overexpressing Dxs, IspDF chimeras, IspE and Idi. The IPTG concentrations used for induction are denoted in the legends. Primary Y-axis is terpene titer and secondary Y-axis is normalized terpene titer...... 51

Figure 2-10 Linkers for non-natural fusions of E. coli IspD and IspF on MEP pathway flux. The IPTG concentrations used for induction are denoted in the legends. Primary Y-axis is terpene titer and secondary Y-axis is normalized terpene titer...... 52

Figure 2-11 Effect of domain separation of IspDF 1 on MEP pathway flux. The IPTG concentrations used for induction are denoted in the legends. Primary Y-axis is terpene titer and secondary Y-axis is normalized terpene titer...... 53

Figure 2-12 Non-natural fusions of IspE and their effect on MEP pathway flux. The IPTG concentrations used for induction are denoted in the legends. Primary Y-axis is terpene titer and secondary Y-axis is normalized terpene titer...... 54

Figure 2-13 Categorical comparison of lycopene production for the various linkers...... 55

Figure 3-1 Schematic for the flow of metabolites through the MEP pathway for IspDF, IspE. (a) Canonical MEP pathway, (b) MEP pathway bifurcation. A: MEP, B: CDP-ME, C: CDP-MEP, D: MEcPP, B’: ME-2,4-PP ...... 63

Figure 3-2 Bifurcation in the MEP pathway with the promiscuity of IspE. The pathway highlighted with blue arrow is the canonical MEP pathway and the step highlighted by yellow arrow marks the bifurcation step from MEP...... 64

Figure 3-3 IspE expression and analysis on SDS/PAGE. Lane 1: total soluble protein fraction of E. coli BL21(DE3), lane 2: total soluble protein fraction of SAHisIspE induced with 0.5 mM IPTG, lane 3: protein ladder, lanes 4 and 5: purified IspE. The bands corresponding to IspE are highlighted with a red box...... 69

Figure 3-4 Bioluminescent assay monitoring ATP and ADP in the IspE reactions...... 70

Figure 3-5 HPLC analysis of set 1 reactions to study ATP consumption...... 71

Figure 3-6 HPLC analysis of set 2 reactions with higher IspE concentration to study ATP consumption...... 72

Figure 3-7 HPLC analysis of set 3 reactions when MEP is rate limiting reactant to study ATP consumption...... 73

Figure 3-8 HPLC analysis of set 4 reactions when ADP is used as a phosphate donor...... 74

Figure 3-9 Mass spectra of the peak identified as ME-2,4-PP in IspE reactions...... 75

Figure 3-10 The major mass fragments of ME-3,4-PP...... 75

Figure 3-11 Canonical substrate (MEP) and alternate substrate (ME-2,4-PP) docked on IspD. .. 76

Figure 3-12 Substrate alignments in the active site for canonical MEP pathway and bifurcation pathway. Top left image: MEP and CTP in IspE, top right image: ME-2,4-PP and CTP in IspE, bottom left image: CDP-ME and ATP in IspE, bottom right image: MEP and ATP in IspE...... 77

Figure 3-13 (a) Canonical substrate (CDP-ME) and (b) alternate substrate (MEPP) docked on IspE...... 78

Figure 4-1 Enzyme evolution trajectory to achieve a new function (reproduced from ref. 133) ...... 81

Figure 4-2 Protein expression and analysis on SDS/PAGE. (a) Gel image for GPPS expression, lane 1: total protein extract of E. coli BL21(DE3), lane 2: protein ladder, lane 3: total protein extract of engineered E. coli for GPPS uninduced, lane 4: total protein extract of engineered E. coli for GPPS with the band corresponding to GPPS (32.4 kDa) in the red box; (b) Gel image for (His)6-MTS (induction with 0.5 mM IPTG) purified by Ni-NTA column, lane 1: myrcene synthase induced, lane 2: carene synthase induced, lane 3: myrcene synthase uninduced, lane 4: carene synthase uninduced, lane 5: protein ladder, lane 6: limonene synthase uninduced, lane 7: linalool synthase uninduced, lane 8: limonene synthase induced, lane 9: linalool synthase induced. The bands corresponding to the specific proteins are highlighted with a red box...... 91

Figure 4-3 Protein expression of (His)6-GPPS and analysis on SDS/PAGE. Lane 1: protein ladder, lane 2: purified uninduced GPPS (5 µg), lane 3: purified uninduced GPPS (10 µg), lane 4: purified induced (with 0.5 mM IPTG) GPPS (5 µg), lane 5: purified induced (with 0.5 mM IPTG) GPPS (10 µg) ...... 92

Figure 4-4 Plasmid maps for MTS expression system. (a) Myrcene synthase, (b) Linalool synthase, (c) Carene synthase, (d) Limonene synthase ...... 93

Figure 4-5 Images of cultures (a) aerobic tube culture, (b) microaerobic tube culture, (c) aerobic flask culture, (d) microaerobic flask culture...... 94

Figure 4-6 Effect of culturing conditions on SACar. TA: aerobic tube culture, TM: micro-aerobic tube culture, FA: aerobic flask culture, FM: micro-aerobic flask culture. Primary Y-axis is carene titer and secondary Y-axis is normalized carene titer. The other culture conditions were: 30 °C temperature, LBY media (supplemented with the antibiotic and 2 mM MgCl2), 18 h incubation time and 10 % dodecane overlay...... 95

Figure 4-7 Effect of incubation temperature on carene production. Primary Y-axis is carene titer and secondary Y-axis is normalized carene titer. The other culture conditions were: micro-aerobic flask cultures, LBY media (supplemented with the antibiotic and 2 mM MgCl2), 18 h incubation time and 10 % dodecane overlay...... 96

Figure 4-8 Effect of growth media on terpene production. (a) SACar, (b) SAMyr, (c) SALin, (d) SALim+. Primary Y-axis is terpene titer and secondary Y-axis is normalized terpene titer. The fermentation conditions were: micro-aerobic flask cultures, 2 mM MgCl2, 18 h incubation time and 10 % dodecane overlay...... 98

Figure 4-9 Effect of Mg2+ concentration on terpene production. (a) SACar in LBY media, (b) SAMyr in LBY media, (c) SALin in TB media, (d) SALim+ in LBY media. Primary Y-axis is terpene titer and secondary Y-axis is normalized terpene titer. The fermentation conditions were: micro-aerobic flask cultures, 18 h incubation time and 10 % dodecane overlay...... 99

Figure 4-10 Effect of dodecane overlay on terpene production. (a) SACar in LBY media, (b) SAMyr in LBY media, (c) SALin in TB media, (d) SALim+ in LBY media. Primary Y-axis is terpene titer and secondary Y-axis is normalized terpene titer. The fermentation conditions were: micro-aerobic flask cultures, 2 mM MgCl2 and 18 h incubation time...... 101

Figure 4-11 Effect of IPTG induction on terpene production. (a) SACar, (b) SAMyr and (d) SALim+ in LBY media with 10 % dodecane overlay; (c) SALin in TB media with 30 % dodecane overlay. Primary Y-axis is terpene titer and secondary Y-axis is normalized terpene titer. The fermentation conditions were: micro-aerobic flask cultures, 2 mM MgCl2 and 18 h incubation time...... 103

Figure 4-12 Fermentative production of monoterpenes over time. (a) SACar, (b) SAMyr, (c) SALin, (d) SALim+. Primary Y-axis is terpene titer and secondary Y-axis is normalized terpene titer. Fermentation conditions are mentioned in Table 4.2...... 106

Figure 4-13 Isomeric structures of carbocations. (a) Cisoid geranyl carbocation, (b) transoid geranyl carbocation, (c) α-terpinyl carbocation and (d) β-terpinyl carbocation...... 110

Figure 4-14 Reaction mechanisms of monoterpene synthases. Each monoterpene structure is colored according to the penultimate carbocation from which it originates...... 111

Figure 4-15 Changes in the orientation of primary geranyl cation on 6,1-closure to form the α-terpinyl cation. (a) Orientation of primary geranyl cation, (b) orientation of α-terpinyl cation. The TSSs are highlighted in green...... 113

Figure 4-16 Role of amino acid residues in the TS active site towards cisoid primary geranyl cation reactivity. The TS is identified in the top left corner of each box. The cisoid primary geranyl cation is highlighted in green, the favorable interactions for 1,6-closure are highlighted in green dotted lines and unfavorable interactions are highlighted in purple dotted lines with distance measurements...... 114

Figure 4-17 TS active site contour towards formation of carene from the α-terpinyl cation. The TS is identified in the top left corner of each box. The cisoid primary geranyl cation is highlighted in purple and the α-terpinyl cation is highlighted in green. The α-terpinyl cation interactions are highlighted in purple dotted lines and carene interactions are highlighted in green dotted lines with distance measurements...... 117

Figure 4-18 TS active site contour towards formation of pinyl cation from the α-terpinyl cation. The TS is identified in the top left corner of each box. The α-terpinyl cation is highlighted in purple and the pinyl cation is highlighted in green. The α-terpinyl cation interactions are highlighted in purple dotted lines with distance measurements...... 118

Figure 4-19 TS active site contour towards formation of tertiary geranyl cation from cisoid primary geranyl cation. The TS is identified in the top left corner of each box. The cisoid primary geranyl cation is highlighted in purple and the tertiary geranyl cation is highlighted in green. The primary geranyl cation interactions are highlighted in purple dotted lines and tertiary geranyl cation interactions are highlighted in green dotted lines with distance measurements...... 120

Figure 4-20 TS active site contour and participating amino acid residues. The TS is identified in the top left corner of each box. The surface is highlighted in grey...... 128

Figure 4-21 Predictions for roles of the amino acid residues present in TS active site and structure-function correlations ...... 129

Figure 5-1 Schematic for construction of fluidized bed capture device (FBCD) ...... 133

Figure 5-2 Schematic of CESR setup ...... 134

Figure 5-3 Bioreactor parameter optimization for efficient carene production and recovery .... 138

Figure 5-4 Experimental apparatus ...... 139

Figure 6-1 Sequential representation of whole cell bio-PV materials. (a) molecular cloning of E. coli for expression of lycopene, (b) non-covalent surface binding of TiO2 nanoparticles resulting in core@shell-like morphology and (c) deployment of biogenic PV material towards DSSC fabrication...... 143

Figure 6-2 Electrochemical measurements of bio-PV DSSC. (a) I-V curves, (b) open-circuit photovoltage response (time dependent) and (c) cyclic voltammetry curves ...... 144

Figure 6-3 Oxidative depolymerization of kraft lignin using the Fe@MagTEMPO catalyst .... 146

Figure 6-4 Comparison between E-factor and V-factor ...... 149

Figure 6-5 AFM height profile of nanocellulose samples synthesized from cherry veneer using

Fe@MagTEMPO catalyst. (a) Freshly synthesized nanocellulose sample and (b) nanocellulose sample stored at 4 °C for 10 months ...... 156

Figure 6-6 AFM image showing assembly of microbes on nanocellulose fibrils synthesized from cherry veneer using Fe@MagTEMPO catalyst ...... 157

Figure 0-1 HPLC chromatogram for set 2 IspE in vitro reactions. The peaks for ATP (2.49 min), ADP (2.60 min) and AMP (2.72 min) are labelled.

Figure 0-2 GC-MS Chromatograms of aerobic shake flask fermentation analysis for SACar, SAMyr, SALin and SALim+. The monoterpene retention times are: pinene 6.48 min, sabinene 7.42 min, myrcene 7.86 min, carene 8.42 min, limonene 8.88 min, ocimene 9.36 min, terpinene 9.61 min, terpinolene 10.20 min, linalool 10.35 min.

List of Equations

Equation 1-1 MVA pathway stoichiometry

Equation 1-2 MEP pathway stoichiometry

Equation 6-1 The unabridged V-factor equation

Equation 6-2 V-factor equation

List of Abbreviations

ADP Adenosine diphosphate

AFM Atomic force microscopy

AMP Adenosine monophosphate

ATP Adenosine triphosphate

CDP-ME 4-Diphosphocytidyl-2-C-methylerythritol

CESR Continuous ex situ recovery

CTP Cytidine triphosphate

DHAP Dihydroxyacetone phosphate

DMA Dimethylallyl alcohol

DMAP Dimethylallyl monophosphate

DMAPP Dimethylallyl pyrophosphate

DO Dissolved oxygen

DOX 1-Deoxy-d-xylulose

DOXP 1-Deoxy-d-xylulose 5-phosphate

Dxs 1-Deoxy-d-xylulose 5-phosphate synthase

ED Entner-Doudoroff

EMP Embden–Meyerhof–Parnas

ESI Electrospray ionization

FPP Farnesyl pyrophosphate

GAP Glyceraldehyde-3-phosphate


GGPP Geranylgeranyl pyrophosphate

GPP Geranyl pyrophosphate

GRAS Generally recognized as safe

HMBPP 1-Hydroxy-2-methyl-2-(E)-butenyl-4-diphosphate

ICH International Council for Harmonisation

Idi Isopentenyl diphosphate delta-isomerase

IPK Isopentenyl monophosphate kinase

IPP Isopentenyl pyrophosphate

ISO Isopentenol

IspC or Dxr 1-Deoxy-d-xylulose 5-phosphate reducto-isomerase

IspD or YgbP 4-Diphosphocytidyl-2-C-methylerythritol synthase

IspE or YchB 4-Diphosphocytidyl-2-C-methyl-D-erythritol kinase

IspF or YgbB 2-C-Methyl-D-erythritol 2,4-cyclodiphosphate synthase

IspG or GcpE 4-Hydroxy-3-methylbut-2-en-1-yl diphosphate synthase

IspH or LytB 4-Hydroxy-3-methylbut-2-en-1-yl diphosphate reductase

LB Luria Bertani

log P o/w Logarithm of partition coefficient of a compound in oil from water

MD Molecular dynamics

ME 2-Methyl-d-erythritol

MEcPP 2-C-Methyl-D-erythritol-2,4-cyclodiphosphate

MEP 2-C-Methyl-d-erythritol-4-phosphate

MTS Monoterpene synthase


MVA Mevalonate/mevalonic acid

MWCO Molecular weight cut off

NAD Nicotinamide adenine dinucleotide

NADH Reduced nicotinamide adenine dinucleotide

NADP Nicotinamide adenine dinucleotide phosphate

NADPH Reduced nicotinamide adenine dinucleotide phosphate

NMR Nuclear magnetic resonance

OD 600 Absorbance at 600 nm wavelength

ORF Open reading frame

PDB Protein data bank

PCR Polymerase chain reaction


RBS Ribosome binding site

SDS/PAGE Sodium dodecyl sulphate polyacrylamide gel electrophoresis

TPP Thiamine pyrophosphate

TS Terpene synthase

TSS Transition state structure

vvm Volume of air passed through 1 volume of liquid per minute (in bioreactor)


Glossary

Cn This refers to a chemical structure containing ‘n’ number of carbon atoms. For example, C10 refers to a compound containing 10 carbon atoms.

Cn This refers to the carbon atom at the nth position in the structure. For example, C 10 refers to the carbon atom at the 10th position in the structure.

Error bars The error bars in the figures denote standard deviation.


Acknowledgments

I express my gratitude towards my supervisor Prof. Vikramaditya G. Yadav. He continually encouraged me at every step to lead the work and gave me the independence to take up new challenges. His guidance, educational expertise, scholarly feedback and provision of the resources required to conduct the research gave direction to my work. I learned many things from him and from being a Biofoundry group member that will take me a long way in my academic career. I thank Prof. Charles Haynes for his support and constructive feedback at every step of my research progress and for enriching my work with his enormous experience in the field.

I thank Joe and Sandip, who taught me the ABC of molecular biology techniques and trained me in cloning experiments. Protiva and I worked together on my first independent molecular biology project, the construction of the MEP pathway operon, and I am grateful for her guidance and support. The discovery of natural fusions would not have been possible without the support of Prof. Steven Hallam, who provided us access to the genomic library, and of Sandip, who worked with me in developing the metagenomic screen. I thank Carmen for answering my doubts regarding cloning techniques and constructing non-natural enzyme fusions. I thank Adhithi for her willingness to take the lycopene work forward and for her help in analysis. I thank Lina Madilao for her expertise and help in carrying out LC-MS analysis and interpreting its data.

Prof. Joerg Bohlmann encouraged me to study terpene synthases and provided all logistic support in terms of enzymes and their products required for the work. Every discussion with him gave me a new perspective on the activity of these enzymes and I deeply acknowledge his support and his work in the field. I thank Azin for her support in investigating the bioprocessing challenges and showing a keen interest in continuing the study in detail. The work also required engineering expertise that was provided by Logan. I thank Vicente and Rodrigo, the undergraduate students who worked with me at different stages, for their contributions and help in cloning.

I thank Prof. Scott Renneckar and Saurabh who gave me opportunities to employ my expertise to tackle the challenges in the biomass valorization field. I thank Sarvesh and Prof. David Wilkinson who realized the potential of my recombinant lycopene strain as a biogenic photovoltaic material. Both these inter-field academic collaborations stimulated my thinking and widened my research approach.

I also thank the industrial collaborator Inmed Pharmaceuticals for their interest in commercializing the platform strains I developed for cannabinoid biosynthesis. I appreciate inputs from their chief scientific officer Eric Hsu in the process. This experience led to the filing of two patent applications. I thank Protiva and Benson who are now working on scale-up studies of my work. My stay at UBC would never have been enjoyable without my labmates Julia, Parisa, Roza, Gaurav, Amir and Maryam, apart from the other members already mentioned. I would like to thank the workshop and administrative staff members of the department for their support. I appreciate all the relations I wove during my tenure at UBC, every stitch of which is unique and colorful.

Lastly, there are few people in my life whom I can never thank enough. My parents have always supported me through thick and thin. Their philosophy encouraged me to pursue higher education and seek excellence. My husband Saurabh has not only encouraged me but also reflected it every day through his deeds. My younger brother Harish always kept the innocence and charm alive in me.


Dedication

Dedicated to my Vibrant India


Chapter 1: Introduction

Chemical engineering and sciences have been catering to the needs of mankind. This industry is operating at a massive scale of $5.1 trillion per year of global sales 1. These include petrochemicals, polymers, bulk chemicals, consumer products and specialty chemicals. Natural products (NP), their derivatives, semi-synthetic analogs and synthetic derivatives are used in sectors like pharmaceuticals, nutraceuticals, flavors, fragrances, cosmetics, energy, agrochemicals etc., and their share in the market is increasing exponentially. Some of the examples of these natural products are isoprenoids, polyketides, alkaloids, phenylpropanoids etc.

1.1 Terpenoid chemical space

Isoprenoids are one of the largest and most diverse classes of natural products, with more than 80000 characterized structures 2. Isoprenoids are also known as terpenoids or terpenes, although terpenes are strictly the hydrocarbon moieties, which on rearrangement and functionalization lead to terpenoids containing oxygen moieties. The term ‘terpene’ is attributed to Otto Wallach, who received the Nobel Prize in 1910 for his pioneering work on terpene characterization 3.

Figure 1-1 depicts the isoprenoid compounds and biosynthetic interlinks that are summarized from sources on the Kyoto Encyclopedia of Genes and Genomes 4. The isoprenoid backbone also forms the structural basis for many other classes of natural products such as zeatin, quinone, alkaloids, polyketides etc.


Figure 1-1 Suite of isoprenoid products

1.2 Applications of terpenoids

Terpenoids are involved in a wide variety of biological functions of growth and development in their natural producers. Terpenoids have evolved to function as defense molecules in plants and hence are naturally tailored to bind to physiological targets. Isoprenoids have applications in the pharmaceutical, nutraceutical, food, flavor and fragrance industries. Terpenes possess high calorific value due to their hydrocarbon backbone and can serve as fuels. Artemisinin is used as an antimalarial drug and taxol as an anticancer agent. Sandalwood oil and patchouli oil are used as fragrances, isoprene is used in polymers, and myrcene and farnesene are used as fuel additives 5.


1.3 Isoprenoid synthesis

There are the following three routes for isoprenoid synthesis:

Chemical synthesis

Total chemical synthesis of terpenoids involves multistep catalytic processes that suffer from poor yields due to the complexity of terpenoid chemical structures. The total synthesis of taxol involves 42 steps with 0.75 % overall yield 6. Isodon polycyclic diterpene synthesis from a non-natural starting material takes 12 steps involving rigorous conditions of temperature and pressure 7. (-)-Lingzhiol takes 17 steps with multiple intermediary purifications from a substrate that needs to be synthesized from petrochemical resources 8.
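The penalty of a long linear route can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that every step of a linear synthesis has the same yield, and back-calculates the average per-step yield implied by the 42-step, 0.75 % overall figure quoted for taxol. The function names are hypothetical and are not taken from the cited work.

```python
def average_step_yield(overall_yield: float, n_steps: int) -> float:
    """Per-step yield of a linear route, assuming all steps have the same yield."""
    if not 0.0 < overall_yield <= 1.0 or n_steps < 1:
        raise ValueError("overall_yield must be in (0, 1] and n_steps >= 1")
    return overall_yield ** (1.0 / n_steps)


def overall_from_steps(step_yield: float, n_steps: int) -> float:
    """Overall yield of a linear route built from identical steps."""
    return step_yield ** n_steps


# Figures quoted in the text: 42 steps, 0.75 % overall yield (assumed linear route).
taxol_step = average_step_yield(0.0075, 42)
print(f"Average yield per step: {taxol_step:.1%}")
```

Under this equal-yield assumption, each step would need to average roughly 89 % to reach even 0.75 % overall, which illustrates why shortening the route matters more than optimizing any single step.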

Semi-synthesis

Although natural hosts such as plants produce terpenoids in very low quantities, they

selectively accumulate some of the stable key precursors at higher quantities. and yeast

are engineered to produce such acyclic or cyclic polyprenic intermediates in bulk which are then

purified and converted to the desired isoprenoid via chemical steps. These reactions involve one

or more derivatization steps of the precursor by electronegative groups like alkoxy, phenyl or

halogen followed by selective cyclization reactions 9. These biogenic chemical reactions 10 have lower yields and are difficult to engineer for regioselectivity. The cyclization steps utilize Lewis and Brønsted acid-promoted mechanisms 11. Continuous flow synthesis of artemisinin, an endoperoxide ring-containing sesquiterpene, has been shown to improve the yield and lower the manufacturing cost 12.


Biosynthesis in heterologous hosts

Harvesting terpenoids directly from their natural producers has several drawbacks. Their abundance in the natural host is low, so large amounts of biomass must be processed. For example, 20 kg of bark from 5 full grown Pacific Yew trees yields 1 kg of an anticancer drug, taxol 13. Moreover, terpenoids usually occur in the host as a mixture from which the desired product needs to be purified, making the downstream process time intensive and cost ineffective; an example is the extraction of sweet steviosides from other bitter and salty analogs in the Stevia plant 14. Apart from these challenges, the recovery of the desired product from the natural producer can often be inefficient; for example, the volatility of isoprene leads to its loss from plants before its capture 15. Hence, for the majority of this class of products, the growing demand exceeds the supply that can be sustained by isolation from their natural sources. Expression of the biosynthetic pathways in heterologous hosts that provide ease of engineering and processing has therefore gained importance as an alternative source. Microbial fermentation with recombinant strains has been shown to produce selected terpenoids at high titers and productivities 16–18. Pathway engineering tools have further improved yields 19–21.

1.4 Biosynthetic pathways of isoprenoid production

Biosynthesis of terpenoids in the natural host is a multistep process involving a wide range of enzymes catalyzing reactions that could be summarized as follows:

C5 precursor synthesis

Two isoprenoid biosynthetic pathways exist that synthesize the C5 precursors, IPP and its isomer DMAPP. Archaea and eukaryotes other than plants use the mevalonate-dependent (MVA) pathway exclusively to convert glucose to IPP, which is subsequently isomerized to DMAPP. Plants use both the MVA and the mevalonate-independent (MEP) pathways for isoprenoid synthesis, whereas MEP is dominant in prokaryotes 19,22,23. There is evidence of subcellular compartmentalization of the isoprenoid biosynthetic pathways in higher organisms like plants. The mevalonate pathway is located in the cytoplasm and the MEP pathway is located in plastids. The former leads to the formation of sesquiterpenes, triterpenes, squalene, sterols and higher polyprenoids. The MEP pathway, on the other hand, is responsible for the formation of the monoterpenes, diterpenes and terpenoids required for the photosynthetic machinery (carotenoids, phytol, prenyl chain of plastoquinone) 24,25.

3 + 3 + 2 ⁄ → + + 3 + 2 + 2 Equation 1-1 MVA pathway stoichiometry

+ + 3 + 3 ⁄ → + + 3 + 2 + 3 Equation 1-2 MEP pathway stoichiometry

Both Equation 1-1 and Equation 1-2 summarize the stoichiometry of the isoprenoid synthesis pathways from the intermediates of glycolysis. Both MVA and MEP pathways utilize the same number of ATP molecules, but MEP requires one extra NADPH equivalent per IPP synthesized. This explains why MVA pathway engineering yields higher isoprenoid yield than MEP pathway engineering 26. The substrate acetyl CoA is biosynthesized from pyruvate in a step that liberates one CO2 and generates one NADH (which generates one ATP in the electron transport chain). Acetyl CoA from the EMP pathway generates 1 molar equivalent of CO2. This leads to 4 molar equivalents of CO2 for MVA versus 1 molar equivalent of CO2 for MEP. The overall theoretical yield of both pathways in E. coli aerobic fermentation is summarized 27 in Table 1.1.

Pathway | From glucose | From glycerol
MEP | 1.255 Glucose → 1 IPP + 2.529 CO2 | 2.151 Glycerol → 1 IPP + 1.454 CO2
MVA | 1.5 Glucose → 1 IPP + 4 CO2 + 4 NADH | 3 Glycerol → 1 IPP + 4 CO2 + 7 NADH

Table 1.1 Theoretical yield of MEP and MVA pathway 27

The MVA pathway consumes 1.5 moles of glucose or 3 moles of glycerol per mole of IPP and produces excess reducing equivalents (NADH). Regeneration of these reducing equivalents and imbalances in co-factors further decrease IPP yield. In contrast, MEP utilizes 1.196 moles of glucose or 2 moles of glycerol for every mole of IPP. Overall, MEP is stoichiometrically and thermodynamically more balanced than the MVA pathway for isoprenoid biosynthesis 28. The MEP pathway hence holds the potential to serve as a platform to efficiently convert carbon substrates to terpenoids.
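The comparison of the two pathways can be checked with a simple carbon balance. The sketch below uses only the coefficients listed in Table 1.1 and the carbon counts of glucose (C6), glycerol (C3), IPP (C5) and CO2 (C1). The dictionary layout and function names are illustrative assumptions, and the carbon yield is defined here as the fraction of substrate carbon recovered in IPP.

```python
# Carbon atoms per molecule.
CARBONS = {"glucose": 6, "glycerol": 3, "IPP": 5, "CO2": 1}

# (pathway, substrate) -> (mol substrate, mol CO2) per mol IPP, from Table 1.1.
TABLE_1_1 = {
    ("MEP", "glucose"): (1.255, 2.529),
    ("MEP", "glycerol"): (2.151, 1.454),
    ("MVA", "glucose"): (1.5, 4.0),
    ("MVA", "glycerol"): (3.0, 4.0),
}


def carbon_balance(pathway: str, substrate: str) -> tuple:
    """Return (carbon in, carbon out) in mol C per mol IPP."""
    mol_substrate, mol_co2 = TABLE_1_1[(pathway, substrate)]
    carbon_in = mol_substrate * CARBONS[substrate]
    carbon_out = CARBONS["IPP"] + mol_co2 * CARBONS["CO2"]
    return carbon_in, carbon_out


def carbon_yield(pathway: str, substrate: str) -> float:
    """Fraction of substrate carbon that ends up in IPP."""
    carbon_in, _ = carbon_balance(pathway, substrate)
    return CARBONS["IPP"] / carbon_in


for key in TABLE_1_1:
    c_in, c_out = carbon_balance(*key)
    print(key, round(c_in, 3), round(c_out, 3), f"{carbon_yield(*key):.1%}")
```

With these coefficients the carbon balances close to within rounding. About 66 % (glucose) or 77 % (glycerol) of the substrate carbon is recovered in IPP through MEP, compared with about 56 % through MVA on either substrate.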

Chain elongation

IPP and DMAPP polymerize head to tail to form C10 GPP. Subsequent additions of IPP to the polymerized product result in longer chain carbon scaffolds, viz. C15 FPP and C20 GGPP. C30 and C40 scaffolds are formed by head-to-head condensation of two FPP or GGPP molecules, respectively. Enzymes catalyzing these reactions are called prenyltransferases. Loss of pyrophosphate from the allylic substrate generates an electron-deficient carbocation, which is attacked by the electron-rich double bond of IPP to form a longer chain molecule.

Pyrophosphate group removal

These enzymes (Type I) catalyze divalent cation-mediated abstraction of a pyrophosphate group from the scaffold product of the above step. The reaction also proceeds via stabilization of the carbocation by rearrangement of the double bond and hydride shift. In some cases, the reactions also cascade into cyclization of the scaffold by intramolecular carbon-carbon bond formation 17. Enzymes catalyzing these steps are terpene synthases (TSs) or terpene cyclases based on the function they perform.

Scaffold activation/modification

The hydrocarbon scaffolds are activated by various reactions like oxidation, methylation, isomerization, hydration etc. Cytochrome P450 dependent oxygenases play an important role in the modification. TSs with specialized reaction mechanisms have evolved from the large terpene synthase superfamily by diversification of their functions. In evolving to generate a plethora of compounds with diverse functionality, TSs have expanded their substrate specificity and catalytic elasticity. Most terpenes that need activation by P450 enzymes expand their chemical diversity further, as P450 oxygenases are catalytically non-specific too. This has limited their use in metabolic engineering.


1.5 MEP pathway

The MEP pathway involves the following catalytic steps, explained in the order of their natural occurrence (Figure 1-2).

Dxs (1-deoxy-d-xylulose 5-phosphate synthase)

The formation of 1-deoxy-d-xylulose 5-phosphate (DOXP) from pyruvate and glyceraldehyde-3-phosphate (GAP) as the thiamine-mediated enzymatic product of the transketolase-like enzyme Dxs was reported in the late 20th century 29,30. The initial experiments performed in E. coli 31,32 successfully incorporated the 1-deoxy-d-xylulose C5 chain in isoprenoid metabolites. The activity was confirmed in bacteria, yeast, fungi and plants. The reaction involves decarboxylative activation of pyruvate (Pyr) by thiamine, subsequent condensation with GAP and detachment to form DOXP 33. The 1-deoxy-d-xylulose C5 chain was also shown to be the precursor for the heterocycles of thiamine and pyridoxal 34. Dxs activity controls the overall flux through the MEP pathway by acting as a gatekeeper in most species 35. Dxs contains a TPP binding site that is shown to be competitively inhibited by IPP and DMAPP in Populus trichocarpa and Populus alba 36,37. E. coli Dxs bears similarity to Dxs from Populus trichocarpa, and the same feedback inhibition is likely to occur in E. coli.

[Pathway diagram: Pyr + GAP → DXP (Dxs, releasing CO2) → MEP (IspC, NADPH/H+ to NADP+) → CDP-ME (IspD, CTP to PPi) → CDP-MEP (IspE, ATP to ADP) → MEcPP (IspF, releasing CMP) → HMBPP (IspG) → IPP and DMAPP (IspH), interconverted by Idi]

Figure 1-2 MEP pathway
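The order of the steps in Figure 1-2 can also be written down as a small data structure, which is convenient when reasoning about which enzymes lie between two intermediates. The sketch below is illustrative only: the `Step` type, the list name and the helper functions are hypothetical, cofactors are omitted, and Idi is left out because it interconverts the two end products rather than extending the linear chain.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    enzyme: str
    substrate: str
    product: str


# Linear part of the MEP pathway as described in this section.
# DOXP is labelled DXP in Figure 1-2.
MEP_PATHWAY = [
    Step("Dxs", "Pyr + GAP", "DOXP"),
    Step("IspC", "DOXP", "MEP"),
    Step("IspD", "MEP", "CDP-ME"),
    Step("IspE", "CDP-ME", "CDP-MEP"),
    Step("IspF", "CDP-MEP", "MEcPP"),
    Step("IspG", "MEcPP", "HMBPP"),
    Step("IspH", "HMBPP", "IPP + DMAPP"),
]


def is_linear(steps: List[Step]) -> bool:
    """True if each step consumes the product of the previous step."""
    return all(a.product == b.substrate for a, b in zip(steps, steps[1:]))


def enzymes_between(start: str, end: str) -> List[str]:
    """Enzymes needed to convert intermediate `start` into intermediate `end`."""
    names: List[str] = []
    collecting = False
    for step in MEP_PATHWAY:
        if step.substrate == start:
            collecting = True
        if collecting:
            names.append(step.enzyme)
            if step.product == end:
                return names
    raise ValueError(f"no route from {start!r} to {end!r}")


print(enzymes_between("MEP", "MEcPP"))
```

For example, the query above returns the three enzymes IspD, IspE and IspF, the consecutive steps that later motivate the study of fusion enzymes.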

IspC (Dxr, 1-deoxy-d-xylulose 5-phosphate reductoisomerase)

Conversion of DOXP to 2-C-methyl-d-erythritol-4-phosphate (MEP) is a two-step reaction catalyzed by IspC. It involves an acid-catalyzed α-ketol rearrangement 38 to 2-C-methyl-d-erythrose-4-phosphate, followed by NADPH-mediated reduction to yield MEP 39, and requires the Mn²⁺ metal ion. The 5-phosphate group is required for activity, and 1-deoxy-d-xylulose (DOX) is not converted to 2-methyl-d-erythritol (ME) 31. The exact catalytic mechanism is unknown.

IspD (YgbP, 4-diphosphocytidyl-2-C-methylerythritol synthase)

Reactions conducted to characterize IspD activity were carried out with E. coli cell extracts using radiolabeled MEP and cytidine triphosphate (CTP) as substrates, and the purified product was analyzed by NMR spectroscopy 40. The product was thus characterized as 4-diphosphocytidyl-2-C-methylerythritol (CDP-ME), and enzymatic activity requires a divalent cation such as Mg²⁺, Mn²⁺ or Co²⁺ as a cofactor. It is the energy-intensive step in the MEP pathway. A homologous gene is also present in bacteria, eubacteria, yeast and plants. The ispF gene is present upstream of the ispD gene and both are cotranscribed. IspD exists as a homodimer (26 kDa as a monomer and 50 kDa as a dimer), as verified by SDS/PAGE under nondenaturing conditions 40. N-terminal sequencing revealed removal of the start methionine by posttranslational modification. The specific activity is reported to be 23 µmol mg⁻¹ min⁻¹, with Km values of 3.14 µM for MEP and 131 µM for CTP. Ribitol-5-phosphate and erythritol-4-phosphate did not serve as substrates, and the divalent cations Cu²⁺, Ni²⁺, Ca²⁺, Fe²⁺ and Zn²⁺ did not serve as cofactors. Relative utilization of nucleotide 5′-triphosphates by wild-type IspD is CTP = 100 %, UTP = 30 %, GTP = 20 %, ATP = 20 % and ITP = 17 %, suggesting that the enzymatic mechanism is selective for CTP 40.
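The practical meaning of the two Km values can be illustrated with a short calculation. The sketch below assumes simple Michaelis-Menten saturation for each substrate taken separately, with the other substrate in excess; this is an assumption made for illustration and not a rate law reported for IspD. The 10 µM test concentration and the function names are likewise hypothetical.

```python
# Km values quoted in the text for IspD (µM).
KM_UM = {"MEP": 3.14, "CTP": 131.0}


def fractional_rate(conc_um: float, km_um: float) -> float:
    """v/Vmax for one saturable substrate (Michaelis-Menten form)."""
    if conc_um < 0 or km_um <= 0:
        raise ValueError("concentration must be >= 0 and Km must be > 0")
    return conc_um / (km_um + conc_um)


def conc_for_fraction(fraction: float, km_um: float) -> float:
    """Substrate concentration (µM) that gives the requested v/Vmax."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return km_um * fraction / (1.0 - fraction)


for substrate, km in KM_UM.items():
    at_10_um = fractional_rate(10.0, km)       # hypothetical 10 µM substrate
    for_90_pct = conc_for_fraction(0.9, km)    # concentration for 90 % of Vmax
    print(f"{substrate}: v/Vmax at 10 µM = {at_10_um:.2f}, "
          f"90 % of Vmax needs {for_90_pct:.0f} µM")
```

Because the Km for CTP is about 40-fold higher than that for MEP, a much larger CTP concentration is needed to approach saturation under this simplified picture.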

IspE (YchB, 4-diphosphocytidyl-2-C-methyl-D-erythritol kinase)

The gene ispE was discovered by a systematic computer search following the distribution of dxs, ispC, ispD and ispF in the E. coli chromosome 41. It translates to a 31 kDa polypeptide, IspE, that catalyzes ATP-dependent phosphorylation of CDP-ME. This was confirmed by reaction of radiolabeled CDP-ME with ATP using purified recombinant IspE 41. 1H, 13C and 31P NMR spectroscopy studies of the reaction mixture and product confirmed 4-diphosphocytidyl-2-C-methyl-D-erythritol-2-phosphate (CDP-MEP) as the major product and ADP as a by-product. The NMR spectroscopy results ruled out erythritol-1-phosphate and erythritol-3-phosphate as products 41. Initial studies of isopentenyl monophosphate kinase (IPK), which belongs to the same group of kinases as IspE, from Mentha piperita and E. coli reported the promiscuous nature of IspE. It converts isopentenyl monophosphate (IP) to isopentenyl diphosphate, and this was reported as the terminal step in the MEP pathway 42. It contains a plastidial targeting sequence and forms a 33 kDa (M. piperita) or 31 kDa (E. coli) cytosolic polypeptide. It also exhibits activity towards isopentenol (ISO) and dimethylallyl alcohol (DMA), converting them to the respective monophosphates. Dimethylallyl monophosphate (DMAP) does not serve as a substrate. Radiolabeled substrate feeding studies showed that the ease of conversion of metabolites to terpenoids is in the order IPP > ISO > IP, and that DMAP and DMAPP were not incorporated into terpenoids in vivo 42.

IspF (YgbB, 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase)

The recombinant protein was found to convert CDP-MEP into 2-C-methyl-D-erythritol-2,4-cyclodiphosphate (MEcPP) and CMP using Mn²⁺ or Mg²⁺ as a cofactor. In vitro, it also converts CDP-ME to 2-C-methyl-D-erythritol-3,4-cyclophosphate, a product that is not involved in isoprenoid biosynthesis 43. The presence of the reaction cascade comprising the reactions catalyzed by Dxs, IspC, IspD, IspE and IspF was also confirmed in C. annuum chloroplasts. The crystal structure of the enzyme revealed an extra electron-dense cavity that can bind several isoprenoid diphosphates (such as IPP, DMAPP, GPP and FPP), with GPP exhibiting the best fit. The Arg142 δ-guanido side chain in the cavity interacts with the diphosphate, and the hydrophobic part of the cavity interacts with the hydrocarbon chain of the GPP molecule 44. These interactions, however, are not reported to be catalytic. In vitro studies could not confirm feedback regulation by the prenyl diphosphates, but showed that IspF exhibits enhanced activity in the presence of MEP and ME. This effect was diminished by FPP 45.


IspG (GcpE, 4-hydroxy-3-methylbut-2-en-1-yl diphosphate synthase)

IspG catalyzes the NADP-mediated reductive rearrangement of MEcPP into 1-hydroxy-2-methyl-2-(E)-butenyl-4-diphosphate (HMBPP), a reaction mediated by cysteine residues. IspG in E. coli has three conserved cysteine residues (269, 272 and 305) that are postulated to form a disulfide bond and to participate in the reaction via reductive cleavage of the disulfide bond by an attack from the electronegative C2 center of the substrate; the product was verified by a radiolabeling assay 46. Another study found that an oxygen-sensitive Fe/S redox cluster is necessary for IspG apoenzyme activity and that the enzyme is present as a [4Fe-4S] protein 47. The ErpA protein is essential for the assembly of the [4Fe-4S] cluster in IspG 48. The association of the prosthetic group with IspG was verified by UV/Vis spectroscopy. These studies were performed with cell-free extracts as well as with purified enzyme in the presence of flavodoxin, flavodoxin reductase and NADPH as the regeneration system. The substrate MEcPP accumulates in E. coli under oxidative and nitric oxide stress due to the dependence of IspG on radical oxygen species. HMBPP is also reported to be toxic to IspG activity 36.

IspH (LytB, 4-hydroxy-3-methylbut-2-en-1-yl diphosphate reductase)

The gene ispH was found to be essential for cell viability; E. coli with an ispH deletion survived only in media supplemented with mevalonate, suggesting its role in the MEP pathway 49. Both the IspG and IspH reactions involve the transfer of 2 electron equivalents and C-OH bond cleavage involving an Fe-S cluster as the prosthetic group 50. IspH catalyzes the conversion of HMBPP to IPP as well as DMAPP. The catalytic site contour determines the ratio of IPP to DMAPP. It is 5:1 in E. coli 46,51, 15:1 in Ginkgo biloba 52 and 2:1 in Burkholderia glumae 53. The point mutation G120A in IspH leads to DMAPP as the sole product 54.


Idi (Isopentenyl diphosphate delta-isomerase)

The gene was first isolated from Saccharomyces cerevisiae 55. Idi catalyzes the terminal step of both the mevalonate and MEP pathways. The reaction involves the conversion of the homoallylic IPP to its highly electrophilic isomer dimethylallyl diphosphate (DMAPP). The mechanism involves a 1,3-allylic rearrangement mediated by the Cys67, Glu116 and Tyr104 residues in E. coli 56,57. Idi is a 20 kDa protein and is present in bacteria, yeast and eukaryotes 55. This terminal step adjusts the distribution of IPP and DMAPP but, unlike in the MVA pathway, is not essential in E. coli because IspH can produce both IPP and DMAPP 52. Table 1.2 summarizes the rates and kinetic parameters of the enzymes in the pathway.

Table 1.2 Kinetic parameters of MEP pathway reactions

Enzyme | Maximum rate (µM min⁻¹ mg enz⁻¹) | Km (µM) | Kcat (s⁻¹) | Reaction conditions | Ref.
Dxs | - | 23.5 (GAP), 48.7 (Pyr) | 153.6 | 100 mM HEPES, 2 mM MgCl2, 2.5 mM tris(2-carboxyethyl)phosphine, 100 µM NADPH, 1 mM thiamine diphosphate as cofactor | 58
IspC | 35 | 45 (DXP), 0.5 (NADPH) | 29 | 50 mM HEPES at pH 7.6 and 3 mM MgCl2, 150 µM NADPH | 59
IspD | 23 | 3.14 (MEP), 131 (CTP) | - | 100 mM Tris-HCl pH 8.0, 20 mM sodium fluoride, 10 mM MgCl2 | 40
IspE | 34 | 200 (CDP-ME), 20 (ATP) | - | 5 mM MgCl2, 60 mM NaCl, 20 mM HEPES, 1 mM dithiothreitol (DTT), 0.5% DMSO, 200 µM CDP-ME (Echelon), and 40 µM ATP | 60,61
IspF | - | 410 (CDP-MEP) | 2.7 | 1.2 mM NADH, 6 mM MgCl2, 18 mM phosphoenolpyruvate, 6 mM ATP, pyruvate kinase (3 U), CMP kinase (1.5 U) | 62
IspG [4Fe-4S]+ | 0.55 | 311 (MEcPP) | 0.4 | [Fe]:[S]:[IspG] = 3.9:4.3:1, 1.0 mM reduced methyl viologen in 100 mM Tris-HCl, pH 8.0 | 63
IspH | 30.4 | 31.6 (HMBPP) | 1125 | 100 mM Tris-HCl pH 8.0, 1.0 mM HMBPP, 50 nM IspH, 5 mM dithionite, and 1 mM 6,7-dihydro-2,11-dimethyldipyrido[1,2-a:2,1-c]pyrazinium dibromide as mediator | 64
Idi | 4345 | 9.5 | - | 50 mM Tris-HCl buffer, pH 7.4, containing the divalent metal cofactors (1.5 mM Mn²⁺ and Mg²⁺) and [1-14C]IPP 18 µM | 56
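A common way to compare enzymes measured under different conditions is the catalytic efficiency, kcat/Km. The sketch below computes it for a few rows of Table 1.2 that report both parameters. It assumes the values are read correctly from the table (kcat in s⁻¹, Km in µM), uses the Km of the pathway intermediate where two substrates are listed, and leaves out the remaining rows. The dictionary and function names are illustrative.

```python
# kcat (s^-1) and Km (µM) as read from Table 1.2; an assumption of this sketch
# is that these readings are correct and that the units match the table header.
KINETICS = {
    "Dxs": {"kcat": 153.6, "Km": {"GAP": 23.5, "Pyr": 48.7}},
    "IspC": {"kcat": 29.0, "Km": {"DXP": 45.0, "NADPH": 0.5}},
    "IspF": {"kcat": 2.7, "Km": {"CDP-MEP": 410.0}},
    "IspG": {"kcat": 0.4, "Km": {"MEcPP": 311.0}},
}


def catalytic_efficiency(enzyme: str, substrate: str) -> float:
    """Return kcat/Km in µM^-1 s^-1 for the given enzyme and substrate."""
    entry = KINETICS[enzyme]
    return entry["kcat"] / entry["Km"][substrate]


# Pathway intermediate used as the reference substrate for each enzyme.
MAIN_SUBSTRATE = {"Dxs": "GAP", "IspC": "DXP", "IspF": "CDP-MEP", "IspG": "MEcPP"}

ranking = sorted(
    MAIN_SUBSTRATE,
    key=lambda name: catalytic_efficiency(name, MAIN_SUBSTRATE[name]),
)

for name in ranking:
    value = catalytic_efficiency(name, MAIN_SUBSTRATE[name])
    print(f"{name}: kcat/Km = {value:.4f} per µM per s")
```

The list is printed from the least to the most efficient enzyme. Such a ranking is only indicative, since the assays in Table 1.2 were run under different buffers, cofactors and enzyme sources.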


1.6 Properties of isoprenoids

Chemical properties

The diversity in the chemical structure of isoprenoids is the outcome of the multiplicity and order of a few basic plug-and-play reactions. Terpenes are labeled based on the number of C5 units they contain (Table 1.3). These units are joined in a ‘head to tail’ manner, except in squalene and carotenoids, where they are joined in a tail to tail manner, with a basic molecular formula of (C5H8)n, where n is the number of monomeric units.

Table 1.3 Classes of terpenoids

Terpenoid class | Number of isoprene units involved in the biosynthesis | Number of carbons
Hemiterpene | 1 | 5
Monoterpene | 2 | 10
Sesquiterpene | 3 | 15
Diterpene | 4 | 20
Sesterterpene | 5 | 25
Triterpene | 6 | 30
Tetraterpene | 8 | 40
Polyterpene | n (where n > 8) | 5n
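The classification in Table 1.3 and the (C5H8)n formula translate directly into a lookup by carbon number. The sketch below is a minimal illustration with hypothetical function names. It treats any carbon count that is not a multiple of five, or that has no class listed in the table (for example seven isoprene units), as an error.

```python
# Isoprene units -> class name, as listed in Table 1.3.
CLASSES = {
    1: "Hemiterpene",
    2: "Monoterpene",
    3: "Sesquiterpene",
    4: "Diterpene",
    5: "Sesterterpene",
    6: "Triterpene",
    8: "Tetraterpene",
}


def terpene_class(n_carbons: int) -> str:
    """Classify a terpene hydrocarbon by its number of carbon atoms."""
    if n_carbons <= 0 or n_carbons % 5 != 0:
        raise ValueError("carbon count must be a positive multiple of 5")
    units = n_carbons // 5
    if units > 8:
        return "Polyterpene"
    if units not in CLASSES:
        raise ValueError(f"{units} isoprene units is not listed in Table 1.3")
    return CLASSES[units]


def hydrocarbon_formula(units: int) -> str:
    """Molecular formula (C5H8)n written out for n isoprene units."""
    if units < 1:
        raise ValueError("units must be >= 1")
    return f"C{5 * units}H{8 * units}"


for carbons in (5, 10, 15, 20, 40):
    print(carbons, terpene_class(carbons), hydrocarbon_formula(carbons // 5))
```

For instance, a C10 scaffold such as the one derived from GPP is classified as a monoterpene with the hydrocarbon formula C10H16.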

Terpenoids are acyclic or cyclic and are non-aromatic. The geminal alkyl groups in the polyprenyl intermediates limit cyclization to 3-, 4- or 6-membered rings. This effect is known as the ‘gem dialkyl rule’, postulated by Ingold in 1921. Terpenoids also contain oxygen atoms that are generally present as hydroxyl (linalool, phytol), ether (cineole), ketone (verbenone) and endoperoxide (artemisinin) groups, among others.

Terpenes belonging to the same class are structural isomers of each other. Terpenoids also have chiral centers and exhibit stereoisomerism (enantiomers around chiral carbon). They yield isoprene as a major product on thermal degradation. Terpenoids are also prone to oxidation and addition reactions due to the presence of unsaturation.

Physical properties

Most terpenoids are colorless, fragrant, volatile liquids with densities lower than that of water (with exceptions such as camphor, which is solid at room temperature). Terpenes have very close boiling points and are difficult to separate from a mixture by distillation. Terpenes are hydrophobic compounds with high log P o/w values. Terpenoids have oxygen moieties that impart polarity to the molecule but are not completely ionized. Terpenoids are water immiscible and are extracted in organic solvents.

1.7 Bioprocess development for isoprene production

The biochemical properties of terpenoids make them excellent candidates for an efficient bioprocess. They are secondary metabolites and are commonly secreted out of the cell. Despite their many economically interesting applications, there are very few examples of their commercial production. Because of their cytotoxicity and volatility, a complete bioprocess must include in situ product recovery.


1.8 Challenges in terpenoid production

The commercial production of terpenoids using recombinant strains encounters several challenges. The yield of heterologously expressed terpenoid biosynthesis is low, and this challenge can be broken down into two parts. The first factor governing the overall yield is the MEP pathway: the flux through it is very low, resulting in a small intracellular pool of IPP and DMAPP. The second factor is that the downstream step catalyzed by terpene synthases generates various side products due to their catalytic promiscuity. Even when these factors are addressed and a recombinant strain that selectively makes the desired terpenoid at high yield is developed, recovery of the product is challenging. The current state of the art, biphasic extraction, is not scalable. These challenges are summarized in Figure 1-3.


Figure 1-3 Challenges in commercialization of terpenoid bioprocess


1.9 Research questions

The flux through heterologously expressed plant secondary metabolic pathways is typically quite low. Metabolic engineering, a branch of synthetic biology, is used to modulate the fluxes through these pathways through overexpression of rate-determining enzymes in the pathway. Such a robust and optimized precursor platform can be used to study isoprenoid biosynthesis with high throughput. This chassis can be used to mine new terpenoids 65 that would otherwise remain undetected in the native hosts due to lower abundance. As a consequence, the key questions considered in this thesis include:

• Can more active and stable orthologs of MEP pathway enzymes improve terpenoid production?

Protein engineering has been employed to improve enzyme throughput. On the other hand, a strategy that involves replacing bottleneck steps with more active and/or stable orthologous enzymes has not witnessed widespread adoption. Fusion enzymes have been shown to improve the efficiency of biocatalysis through substrate channeling and decreased diffusional limitations. This study was conducted to mine such fusions of MEP pathway enzymes from the soil metagenome.

• What are the effects of different linkers on the fusions of enzymes involved in the MEP pathway?

The IspDF bifunctional enzymes have been shown to improve flux through the MEP pathway due to the co-localization of active centers. The effect of linkers on this localization and on the flux has not been studied. Linker flexibility or rigidity is expected to play a role in the in situ association of the enzymes (IspD, IspE and IspF) that form a catalytic complex 66 .

• Why do natural fusions of IspD and IspF occur, but not fusions of IspE?

IspD and IspF catalyze non-consecutive steps in the MEP pathway and their fusions are widely found in . Fusions of IspE, which catalyzes the intermediary step, have not been reported. A study of non-natural fusions of IspE and of various fusion variants of IspDF can provide insights into MEP pathway control and engineering.

• What factors influence the promiscuity of heterologously expressed TSs?

TSs are reported to be a highly promiscuous class of enzymes. The amino acid residues at the active site control the catalytic mechanism and selectivity. The distinction between the in vitro and in vivo catalytic promiscuity of these enzymes is unclear. I planned to undertake a systematic study of this heterologous system.

• How can terpenoids be efficiently recovered from the bioreactor?

The current strategy for terpenoid recovery from the bioreactor, which involves overlaying an organic solvent, suffers from many disadvantages. Foremost, the solvents used have high boiling points, which makes the isolation of terpenoids energy- and cost-intensive. This study establishes a proof of concept for the use of resins in de novo terpenoid fermentation.


1.10 Organization of thesis

| Chapter | Section | Description | Output | Deliverables |
|---|---|---|---|---|
| 2 | MEP flux improvement | Discovery of three novel IspDF bifunctional enzymes. | IspDF 1 at least 17% more active enzyme than native homologs. | Patent 1 filed. |
| 2 | Study of non-natural enzyme fusions | Analysis of linkers. Construction of non-natural fusions of IspDF 1 domains and E. coli IspD, IspF, IspE. | Higher activity of each of the domains of IspDF 1. 2.3-fold improvement in the lycopene yield with IspDE fusion. | Patent 2 in preparation. |
| 3 | Study of MEP pathway biochemistry | In vitro reactions of IspE and in silico analysis of IspE, IspD and IspF. | Detection of a diphosphate intermediate and discovery of a branch point within MEP pathway. | Manuscript 1 ready for submission. |
| 4 | Study of terpene synthase promiscuity | Expression of TSs and analysis of products. | Evidence of influence of cellular environment on catalytic intermediates. Involvement of specific amino acid residues to catalytic function. | Manuscript 2 in preparation. |
| 5 | Development of efficient terpenoid bioprocess | Considerations to the efficient product recovery. | Design of adsorption based ex situ continuous system for terpenoid recovery. | Manuscript 3 in preparation. |


Chapter 2: Study of enzyme fusions to improve MEP pathway yield

2.1 Introduction

Flux through the MEP pathway in E. coli is very low, but the pathway is necessary for cell viability. Disruption of the pathway genes was reported to be lethal in E. coli 67,68 . The pathway downstream of the Dxs catalytic step can be complemented by heterologous expression of the rate-determining enzymes of the MVA pathway 69 . Deletion of Dxs cannot be complemented with the MVA pathway because of its role in vitamin B6 and B1 biosynthesis 34 . IPP and DMAPP, in turn, are essential for the prenylation of tRNAs 70 and quinones 71 .

As discussed in section 1.3 (page number 4), the MEP pathway operates at a higher theoretical yield and is thermodynamically favored over the MVA pathway 27 . The experimentally observed MEP pathway yield is, however, far from the theoretical maximum. Once optimized, the MEP pathway can serve as a highly robust heterologous platform for isoprenoid biosynthesis.

2.1.1 Improvements in the precursor supply for the MEP pathway

GAP and pyruvate are metabolites of the glycolytic pathway, which is part of central carbon metabolism. Efforts to improve flux through glycolysis have been limited to attempts at enhancing the sugar uptake rate 72–74 . As the glucose transporter was made more active, various steps in the glycolytic pathway lost their metabolic control 75 . The thermodynamics of the conversion of fructose-1,6-diphosphate to DHAP and GAP push the equilibrium towards the substrate 76 , and the isomerization between DHAP and GAP favors DHAP. Some successful efforts have channeled flux through the pentose phosphate pathway and the ED pathway for isopentenol production 77 . The distribution between GAP and pyruvate has a role in driving flux through the MEP pathway, and redirection of flux from pyruvate to GAP led to an improvement in downstream lycopene production 78 . The same study also reported that feeding GAP and pyruvate does not change the flux substantially.

2.1.2 MEP pathway optimization

Improvements in genome sequencing, genome mining, proteomics, metabolomics and bioinformatic tools have allowed metabolic engineering to find wider applications.

2.1.2.1 Homologous expression of MEP pathway enzymes

A well-studied strategy is pathway optimization with the tools of metabolic engineering. Overexpression of homologous enzymes at MEP pathway bottlenecks has been shown to greatly enhance the synthesis of terminal isoprenoid products. Overexpression of four genes (dxs, ispD, ispF and idi) was shown to improve taxol yield in E. coli 28 , whereas overexpression of dxs, ispD, ispF and ispH improved lycopene yield by 15-fold in Bacillus subtilis 79 .

2.1.2.2 Heterologous expression of MEP pathway enzymes

MEP flux can be upregulated by expressing more active heterologous MEP pathway enzymes. This involves replacing a single enzyme, or the entire pathway construct, with orthologous enzymes. For example, expression of Dxs from Arabidopsis thaliana in transgenic Lavandula latifolia led to a 5-fold higher total terpenoid yield 80 .

2.1.2.3 Improvement in genomic MEP control

The genes involved in the MEP pathway are controlled by constitutive promoters. Chromosomal exchange of the dxs promoter with the strong promoter Ptuf in Corynebacterium glutamicum improved Dxs activity by 60% and doubled lycopene production 51 .

2.1.2.4 Evolution of MEP pathway genes

Flux limitations arise from one or more of the following factors: low activity, low stability, low expression levels, low solubility, feedback regulation or toxicity. Modification of these enzymes at the genetic level through mutation has been attempted. Directed co-evolution of Dxs, Dxr and Idi led to a 60% improvement in lycopene yield in E. coli 81 .

2.1.3 Effect of culture conditions on the solubility of pathway enzymes

Dxs, IspG, IspH and Idi suffer from low solubility and form inactive inclusion bodies upon overexpression. Improving their solubility is therefore expected to greatly improve their activity. Lowering the incubation temperature, co-expression with chaperone proteins and protein mutagenesis can improve the solubility of otherwise insoluble proteins. Another strategy, supplementing the growth medium with betaine and sorbitol, increased Dxs solubility by 60% and also led to an overall improvement in the MEP pathway flux 82 .

2.1.4 Bifunctional enzyme discovery and applications

Fused IspDF enzymes are common in α- and ε-proteobacterial genomes but not in β- and γ-proteobacterial genomes 83 . IspDF has been isolated and studied in detail from Campylobacter jejuni 83 , Mesorhizobium loti 84 and Agrobacterium tumefaciens 85 .

2.1.4.1 IspDF discovery and structure

The first bifunctional gene was isolated from Campylobacter jejuni 83 . Its product (cjIspDF, a 42 kDa polypeptide) catalyzed the two reactions individually carried out by IspD and IspF, with rates of 3.9 µmol mg⁻¹ min⁻¹ and 0.8 µmol mg⁻¹ min⁻¹ respectively. These rates are lower than those of the E. coli monofunctional enzymes (Table 1.2). The cjIspDF had greater similarity with E. coli IspF (approx. 48 %) than with E. coli IspD (approx. 25 %). In vitro reactions of the purified His-tagged protein from recombinant E. coli with ¹³C-labeled MEP yielded CDP-ME. Addition of Zn²⁺ as a cofactor gave the highest rate (18.5 µmol mg⁻¹ min⁻¹), with Km values of 3 µM and 20 µM for CTP and MEP respectively at pH 5. The presence of ATP did not alter the reaction kinetics until IspE was added, which led to the formation of MEcPP, with the highest activity at pH 8 with Ca²⁺ as a cofactor and a Km value of 19 µM for CDP-MEP. The estimated shortest distance between the catalytic centers of the IspD and IspF subunits in cjIspDF is around 38 Å. The cjIspDF was reported to exist as a trimer, hexamer and dodecamer when analyzed by size exclusion chromatography 83 , whereas the crystal structure is hexameric 66 . The crystal structure also shows two distinct domains in each polypeptide, joined by a linker sequence. The hexameric assembly contains three IspD domain dimers and two IspF domain trimers. In this complex, IspF domains belonging to different IspD domain dimers associate to form the trimers. This means that the individual domains of the same bifunctional polypeptide do not associate with each other.
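The specific activities quoted above can be put on a per-enzyme basis by converting them to apparent turnover numbers. The following is a minimal sketch of that arithmetic, assuming one active site of each type per 42 kDa polypeptide and a pure, fully active preparation; neither assumption is stated in the source, and the function name is illustrative only.

```python
def specific_activity_to_kcat(activity_umol_per_mg_min: float, mass_kda: float) -> float:
    """Convert a specific activity (µmol mg^-1 min^-1) to an apparent turnover number (s^-1).

    Assumption: one active site per polypeptide and a pure, fully active enzyme.
    """
    # 1 kDa = 1000 g/mol = 1 mg/µmol, so the mass in kDa equals mg of protein per µmol.
    mg_per_umol = mass_kda
    turnovers_per_min = activity_umol_per_mg_min * mg_per_umol
    return turnovers_per_min / 60.0


CJ_ISPDF_MASS_KDA = 42.0  # polypeptide mass quoted in the text

kcat_ispd = specific_activity_to_kcat(3.9, CJ_ISPDF_MASS_KDA)  # IspD activity of cjIspDF
kcat_ispf = specific_activity_to_kcat(0.8, CJ_ISPDF_MASS_KDA)  # IspF activity of cjIspDF

print(f"Apparent IspD turnover: {kcat_ispd:.2f} s^-1")
print(f"Apparent IspF turnover: {kcat_ispf:.2f} s^-1")
```

Under these assumptions the two activities correspond to roughly 2.7 and 0.6 turnovers per second per polypeptide, which makes the difference between the two domains easier to compare with literature kcat values.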

Another well-studied bifunctional IspDF, from Mesorhizobium loti (mlIspDF), was expressed in E. coli and was also found to exhibit the catalytic activities of both IspD and IspF 84 . The IspD subunit had 46 % similarity with E. coli IspD, whereas the IspF subunit had 44 % similarity with E. coli IspF. Size exclusion chromatography of the protein sample showed the existence of monomeric and dimeric forms of mlIspDF. Higher molecular weight complexes were not observed.

2.1.4.2 Interactions of IspD, IspE and IspF

Experiments on the monofunctional E. coli enzymes were performed and analyzed by the sedimentation velocity method for three combinations: (a) IspD and IspE; (b) IspE and IspF; and (c) IspD, IspE and IspF. These studies revealed the assembly of three IspD dimers, three IspE dimers and two IspF trimers 66 . The same study revealed that the IspD and IspF domains of IspDF associate with IspE to form a mega-complex 66 that aids substrate channeling. This was reported for both cjIspDF and atIspDF 85 (IspDF from Agrobacterium tumefaciens). A trimer of IspD dimers and a trimer of IspE dimers complex with a dimer of IspF trimers to form an assembly of 18 catalytic centers (six each of IspD, IspE and IspF). atIspDF was also detected to associate at higher molecular weight ratios. For cjIspDF, the distance between two catalytic centers of the same multimer is 35 Å for the IspD subunit and 30 Å for the IspF subunit, which is less than the distance between the IspD and IspF catalytic centers of cjIspDF.

On the other hand, a similar study 85 was done on IspDF and IspE isolated from Agrobacterium tumefaciens (atIspDF and atIspE respectively). These enzymes were not found to associate in sedimentation velocity experiments. This was further validated in vitro by adding an inactive form of atIspE carrying the A152A point mutation. The inactive IspE did not change the course of the conversion of MEP to MEcPP through the atIspDF and atIspE cascade. If the enzymes associated to facilitate substrate channeling, the mutated IspE should have interacted with the complex and lowered the overall rate of reaction. There are other examples of fusions that do not channel substrates between their active sites. GlmU from E. coli, which is involved in peptidoglycan biosynthesis, is a bifunctional enzyme that catalyzes consecutive steps in the pathway, but the intermediate is released from the first active site and accumulates in the environment before it is acted upon by the second active site 86 .


2.1.5 Naturally occurring enzyme fusions catalyzing non-consecutive pathway steps

The natural occurrence of fusion enzymes that catalyze non-consecutive steps in a biosynthetic pathway is rare 25 . Gram-positive bacteria such as Enterococcus faecalis and Enterococcus faecium encode a bifunctional enzyme, MvaE, that possesses both 3-hydroxy-3-methylglutaryl CoA (HMG-CoA) reductase and acetyl-CoA acetyltransferase activities. These activities belong to the MVA pathway and are separated by one step, which is catalyzed by HMG-CoA synthase 87,88 . However, no association complex has been reported. The second example is found in the carotenoid biosynthetic pathway. The carRA gene, identified in the fungi Phycomyces blakesleeanus and Mucor circinelloides, encodes a fusion of phytoene synthase and lycopene cyclase 89,90 . Phytoene synthase is a prenyl transferase that catalyzes the synthesis of phytoene by the condensation of two GGPP molecules. Phytoene is then converted into lycopene by the dehydrogenase encoded by carB, and β-carotene is subsequently synthesized by cyclization catalyzed by lycopene cyclase. Although the reports acknowledge these fusions as exceptions, they fail to explain either the reason for their existence or their usefulness.

The occurrence of enzyme fusions at the genetic level is common. The fatty acid and polyketide synthesis pathways involve bifunctional enzymes, but all of them catalyze consecutive steps in the pathway. The reasons behind the existence of fusions such as IspDF, MvaE and CarRA remain unclear, though some researchers argue for their relevance to metabolic control.

2.1.6 Knowledge gap in the understanding of the MEP pathway

A gap exists between the theoretical maximum and the experimentally achieved yield of the MEP pathway. Many efforts have been made in genome engineering, protein engineering and metabolic engineering to close this gap. A strategy that involves replacing the bottleneck steps with more active and/or stable orthologous enzymes has not witnessed widespread adoption. The bifunctional enzymes reported to be involved in the pathway are promising targets. There are no reports of the influence of these bifunctional IspDFs on in vivo MEP flux; efforts so far have been directed towards studying the in vitro activities of the purified proteins.

In this work, I conducted metagenomic screening to identify fusions of MEP pathway enzymes, with a view to enhancing substrate channeling. All the fusions discovered were of IspD and IspF, enzymes that are reported to catalyze non-consecutive steps in the MEP pathway. I conducted a thorough study of the linker characteristics and their influence on MEP pathway flux. The linker sequence that connects the two domains in a bifunctional enzyme plays a critical role 91,92 . Its flexibility or rigidity plays a role in maintaining the independence of the movements of the domains. I also constructed non-natural fusions of IspE with each of IspD and IspF to mimic the natural fusions. Isoprene and lycopene were used as reporter molecules in a comparative study. Such a robust and high-yielding MEP pathway platform strain can thus be utilized to produce isoprenoids as well as to mine new compounds.

2.1.7 Construction of non-natural fusions

Synthetic fusion proteins that have more than one catalytic activity are designed either to expand the catalytic spectrum of the protein or to improve its catalytic efficiency. Expressing a single fusion protein also substantially reduces production cost, leading to higher industrial applicability 93 . In chemical catalysis, the strategy of a multifunctional catalyst tailored to catalyze more than one type of reaction is widely accepted and has gained popularity in industry 94,95 .

There are two major ways of generating non-natural fusions 96 . The first acts at the genetic level: the stop codon of the first gene and the start codon of the second gene are replaced with a nucleotide sequence that encodes a linking peptide, so that the two genes are translated as a single polypeptide. The second introduces tags into the proteins that trigger an association reaction, forming the peptide bond post-translationally.
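The genetic-level route described above can be illustrated with a short sketch. It joins two open reading frames in frame by dropping the stop codon of the first and the start codon of the second and inserting linker codons between them. The toy ORFs, the one-codon-per-residue table and the function names are hypothetical and chosen only for illustration; they are not the sequences or the cloning design used in this work, and the sketch assumes ATG start codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# One codon per residue, picked only for illustration (not codon-optimized).
ILLUSTRATIVE_CODONS = {"S": "AGC", "L": "CTG", "G": "GGT", "A": "GCG"}


def back_translate(peptide: str) -> str:
    """Turn a linker peptide into DNA using the illustrative codon table."""
    return "".join(ILLUSTRATIVE_CODONS[residue] for residue in peptide)


def fuse_orfs(first_orf: str, second_orf: str, linker_peptide: str) -> str:
    """Return one in-frame ORF: first gene, linker codons, second gene."""
    first_orf, second_orf = first_orf.upper(), second_orf.upper()
    if len(first_orf) % 3 or len(second_orf) % 3:
        raise ValueError("both ORFs must contain a whole number of codons")
    if first_orf[-3:] not in STOP_CODONS:
        raise ValueError("first ORF must end with a stop codon")
    if second_orf[:3] != "ATG":
        raise ValueError("second ORF is assumed to start with ATG")
    # Drop the stop codon of the first gene and the start codon of the second.
    return first_orf[:-3] + back_translate(linker_peptide) + second_orf[3:]


# Toy ORFs (hypothetical): Met-Lys-stop and Met-Gly-stop.
TOY_FIRST_ORF = "ATGAAATAA"
TOY_SECOND_ORF = "ATGGGTTGA"
FLEXIBLE_LINKER = "SLGGGGSAAA"  # a short flexible Gly/Ser-rich linker peptide

fused_orf = fuse_orfs(TOY_FIRST_ORF, TOY_SECOND_ORF, FLEXIBLE_LINKER)
print(fused_orf, len(fused_orf) // 3, "codons")
```

The length checks guard against the most common failure of this approach, a frameshift at the junction, which would scramble the second domain.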

2.1.7.1 Demonstration of non-native function by fusions

The conversion of L-erythrulose to 2-amino-1,3,4-butanetriol was catalyzed by a novel ω-transaminase using serine as the amine donor. This reaction generated hydroxypyruvate as a byproduct, which was shuttled back into a substrate-regenerating system in which a transketolase used it to convert glycolaldehyde to L-erythrulose. The fusion of the transaminase and the transketolase created an efficient closed-loop system 97 . Another study combined four heterologously expressed enzymes to create a multienzyme reaction cascade in E. coli for the conversion of ethylbenzenes to enantiopure (R)-1-phenylethanamines, eliminating the need for additional cofactors 98 .

2.1.7.2 Non-natural fusions of MEP pathway enzymes

There are no reports of non-natural fusions of MEP pathway enzymes. Whether fusions aid active-site co-localization, and thereby channel substrates for efficient conversion, is a highly debated topic in the field. Moreover, natural fusions of IspD and IspF, which catalyze non-consecutive steps in the pathway, do occur, whereas fusions of IspE have never been reported.

This study had two objectives. The first was to mine metagenomes for open reading frames coding for fusion enzymes involved in the MEP pathway, and the second was to design a strategy to study linker types and their influence on the pathway.


2.2 Materials and methods

Metagenomic screening for orthologs of MEP pathway enzymes

Soil samples were collected at the Skulow Lake site (SBS-3 WL), located at 52°20’N, 121°55’W, as part of the Long-term Soil Productivity (LTSP) study 99 . High molecular weight genomic DNA was extracted and purified to create large-insert fosmid libraries 100–102 . The NR fosmid library was created from the Bt soil horizon of a naturally disturbed reference site using the CopyControl™ Fosmid Library Production Kit (Epicentre) according to the manufacturer's protocol. Twenty 384-well plates from the library were Sanger end-sequenced at the Michael Smith Genome Science Center (GSC), UBC with the pCC1-Forward (5’-GGATGTGCTGCAAGGCGATTAAGTTGG) and pCC1-Reverse (5’-CTCGTATGTTGTGTGGAATTGTGAGC) primers, generating ~7680 paired-end sequences. Approximately 530 fosmids were selected in silico based on phylogenetic gene markers located on the fosmid ends and on functional screens, and were full-length sequenced on the Illumina HiSeq platform at the GSC. Sequence analysis, including open reading frame (ORF) prediction and annotation, was performed using the MetaPathways pipeline v2.5 supplied with a collection of reference databases (KEGG 2011-06-18, COG 2013-12-27, RefSeq 2014-01-18 and MetaCyc 2011-07-03) 99 . Protein family searches using the online HMMER tool version 2.17.3 103 were performed to confirm the functional annotations generated by MetaPathways. The resulting MetaPathways outputs for the fosmid ends and the fully sequenced fosmids were searched for the Enzyme Commission (EC) numbers of genes encoding bifunctional ispDF. Cognate nucleotide sequences were searched against the NCBI database using the online BLASTN search tool, and the resulting text files were uploaded into Megan 6.10.0 to assign taxonomy using the LCA algorithm 99 . Based on this analysis, the fosmid sequences NR0032_N05, NR0032_O07 and NR0037_N05 were assigned to Acidobacteria and the ispDFs were annotated as ispDF 1, ispDF 2 and ispDF 3 respectively.
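The idea behind a lowest common ancestor (LCA) assignment can be sketched in a few lines. The sketch below is a simplified, naive version that returns the deepest taxonomic rank shared by all hits of a query; it is not the implementation used by Megan, which also applies score-based filters to the hits. The lineages shown are hypothetical placeholders, not BLASTN results from this study.

```python
from typing import List, Sequence


def lowest_common_ancestor(lineages: Sequence[Sequence[str]]) -> List[str]:
    """Return the longest taxonomic prefix shared by every lineage.

    Each lineage is ordered from the highest rank (e.g. domain) downwards.
    """
    shared: List[str] = []
    # zip stops at the shortest lineage, so ranks missing from any hit are never shared.
    for ranks in zip(*lineages):
        if len(set(ranks)) == 1:
            shared.append(ranks[0])
        else:
            break
    return shared


# Hypothetical lineages of three database hits for one fosmid sequence.
hit_lineages = [
    ["Bacteria", "Acidobacteria", "Acidobacteriia"],
    ["Bacteria", "Acidobacteria", "Blastocatellia"],
    ["Bacteria", "Acidobacteria"],
]

assignment = lowest_common_ancestor(hit_lineages)
print(" > ".join(assignment))
```

Because the hits disagree below the phylum level, the sequence is assigned only as far as the last rank on which all hits agree, which is the conservative behaviour that makes LCA assignments suitable for metagenomic fragments.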

Strains, genes and plasmids

All strains, plasmids and genes used in this study are listed in Table 2.1. The table includes genetic operons with natural monofunctional enzymes as well as natural fusion enzymes of the MEP pathway. The genes dxs, ispD, ispE, ispF and idi were amplified from the E. coli strain K12 genome by polymerase chain reaction. The bifunctional genes ispDF1, ispDF2 and ispDF3, as well as ispS, were codon optimized and synthesized by Genewiz Inc. pTrc-trGPPS(CO)-LS was a gift from Jay Keasling (Addgene plasmid 50603) 104 , and its vector backbone was amplified to construct the plasmid variants. E. coli DH5α was used as the cloning host and E. coli BL21(DE3) as the expression host. Oligonucleotide primers were synthesized by Integrated DNA Technologies, USA. Genes were amplified using Phusion High-Fidelity DNA Polymerase (Thermo Scientific). Recombinant plasmid DNA was purified using the GeneJET plasmid miniprep kit (Thermo Scientific). DNA was recovered from agarose gels and PCR reactions using the GeneJET gel extraction and DNA cleanup micro kit (Thermo Scientific). The Wizard Genomic DNA purification kit was procured from Promega. All restriction enzymes and DNA ladders were from New England Biolabs. The protein ladder and AnyKD Mini-PROTEAN TGX precast protein gels were from Bio-Rad. Antibiotics, nutrient media and other reagents were purchased from Sigma-Aldrich. Protein expression was verified by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS/PAGE) of cell lysate samples.


Table 2.1 Strains, genes and plasmids used for MEP pathway study

| Strains | Description | Source |
|---|---|---|
| E. coli DH5α | Cloning strain | NEB (#C2987) |
| E. coli BL21(DE3) | Expression strain | NEB (#C2527) |
| E. coli strain K12 | Gene amplification | Sigma-Aldrich (#EC1) |
| SASDFI | pSASDFI expressed in E. coli BL21(DE3) | This study |
| SAIso | pSAIspS expressed in E. coli BL21(DE3) | This study |
| SAIso-SDFI | pSASDFI and pSAIspS coexpressed in E. coli BL21(DE3) | This study |
| SAIso-SDF 1I | pSASDF 1I and pSAIspS coexpressed in E. coli BL21(DE3) | This study |
| SAIso-SDF 2I | pSASDF 2I and pSAIspS coexpressed in E. coli BL21(DE3) | This study |
| SAIso-SDF 3I | pSASDF 3I and pSAIspS coexpressed in E. coli BL21(DE3) | This study |
| SAIso-SI | pSASI and pSAIspS coexpressed in E. coli BL21(DE3) | This study |
| SAIso-DF 1 | pSADF 1 and pSAIspS coexpressed in E. coli BL21(DE3) | This study |
| SAIso-DF 2 | pSADF 2 and pSAIspS coexpressed in E. coli BL21(DE3) | This study |
| SAIso-DF 3 | pSADF 3 and pSAIspS coexpressed in E. coli BL21(DE3) | This study |
| SALyc | pAC-LYC expressed in E. coli BL21(DE3) | This study |
| SALyc-SDFI | pSASDFI and pAC-LYC coexpressed in E. coli BL21(DE3) | This study |
| SALyc-SDFEI | pSASDFEI and pAC-LYC coexpressed in E. coli BL21(DE3) | This study |
| SALyc-SDF 1I | pSASDF 1I and pAC-LYC coexpressed in E. coli BL21(DE3) | This study |
| SALyc-SDF 1EI | pSASDF 1EI and pAC-LYC coexpressed in E. coli BL21(DE3) | This study |
| SALyc-SDF 2I | pSASDF 2I and pAC-LYC coexpressed in E. coli BL21(DE3) | This study |
| SALyc-SDF 3I | pSASDF 3I and pAC-LYC coexpressed in E. coli BL21(DE3) | This study |
| SALyc-SI | pSASI and pAC-LYC coexpressed in E. coli BL21(DE3) | This study |
| SALyc-DF 1 | pSADF 1 and pAC-LYC coexpressed in E. coli BL21(DE3) | This study |
| SALyc-DF 2 | pSADF 2 and pAC-LYC coexpressed in E. coli BL21(DE3) | This study |
| SALyc-DF 3 | pSADF 3 and pAC-LYC coexpressed in E. coli BL21(DE3) | This study |

| Plasmids | Description | Source |
|---|---|---|
| pSASDFI | Amp r; trc promoter; genes dxs, ispD, ispF and idi; pBR322 ori | This study |
| pSASDFEI | Amp r; trc promoter; genes dxs, ispD, ispF, idi and ispE; pBR322 ori | This study |
| pSASDF 1I | Amp r; trc promoter; genes dxs, ispDF 1 and idi; pBR322 ori | This study |
| pSASDF 2I | Amp r; trc promoter; genes dxs, ispDF 2 and idi; pBR322 ori | This study |
| pSASDF 3I | Amp r; trc promoter; genes dxs, ispDF 3 and idi; pBR322 ori | This study |
| pSASDF 1EI | Amp r; trc promoter; genes dxs, ispDF 1, idi and ispE; pBR322 ori | This study |
| pSADF 1 | Amp r; trc promoter; ispDF 1; pBR322 ori | This study |
| pSADF 2 | Amp r; trc promoter; ispDF 2; pBR322 ori | This study |
| pSADF 3 | Amp r; trc promoter; ispDF 3; pBR322 ori | This study |
| pSAIspS | Cam r; araBAD promoter; ispS; p15A ori | This study |
| pSAHisDF 1 | Cam r; T7 promoter; (His)6-tagged ispDF 1; p15A ori | This study |
| pSAHisDF 2 | Cam r; T7 promoter; (His)6-tagged ispDF 2; p15A ori | This study |
| pSAHisDF 3 | Cam r; T7 promoter; (His)6-tagged ispDF 3; p15A ori | This study |
| pSASI | Amp r; trc promoter; genes dxs and idi; pBR322 ori | This study |
| pAC-LYC | Cam r; crtE, crtI, and crtB under endogenous promoter; p15A ori | Addgene plasmid 53270 105 |

| Genes | Description | Source |
|---|---|---|
| dxs | 1-deoxy-D-xylulose-5-phosphate synthase | NCBI Gene ID: 945060 |
| ispD | 2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase | NCBI Gene ID: 948269 |
| ispE | 4-(cytidine 5'-diphospho)-2-C-methyl-D-erythritol kinase | NCBI Gene ID: 945774 |
| ispF | 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase | NCBI Gene ID: 945057 |
| idi | isopentenyl-diphosphate Delta-isomerase | NCBI Gene ID: 949020 |
| ispDF 1 | Codon optimized bifunctional 2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase/2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase (NR0032_N05) | This study, USPTO PCT/CA2018/051073 |
| ispDF 2 | Codon optimized bifunctional 2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase/2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase (NR0032_O07) | This study, USPTO PCT/CA2018/051073 |
| ispDF 3 | Codon optimized bifunctional 2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase/2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase (NR0037_N05) | This study, USPTO PCT/CA2018/051073 |
| ispS | Isoprene synthase (Populus alba sp.) | UniProtKB: Q50L36.1 106 |


Non-natural protein fusions

We constructed fusions with different linkers. The linkers used and their sequences are listed in Table 2.2. The linkers were added by PCR and the fusions were generated by Gibson assembly.

Table 2.2 Types of linkers used in the study and their sequences

| Linker type | Polypeptide sequence (N terminus → C terminus) | Reference |
|---|---|---|
| Flexible Linker (FL) | SLGGGGSAAA | 107,108 |
| Rigid Linker (RL) | AEAAAKEAAAKEAAAKEAAAKEAAAKAAA | 107,108 |
| cjIspDF Linker (CJ) | LPTPSFE | 83 |
| IspDF 1 Linker (XL) | RQRLRSAVAA | This study |

The CJ and XL linker sequences were identified by aligning the sequence of the respective fusion enzyme with E. coli IspD and IspF. Homology models were built for the natural as well as the non-natural chimeric fusions using the SWISS-MODEL server. The non-natural fusions are listed in Table 2.3. The IspD and IspF domains of IspDF 1 were also expressed separately. This was achieved by adding a stop codon (TAA) at the end of the genetic sequence for the IspD domain, removing the genetic sequence for the linker, and adding an RBS and a start codon (ATG) in frame with the genetic sequence for the IspF domain. This enabled separation of the two domains at the translational level. The genetic sequence coding for the IspD domain is denoted ispD 1 and the corresponding protein IspD 1. The genetic sequence coding for the IspF domain is denoted ispF 1 and the corresponding protein IspF 1.
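The domain-separation step described above amounts to a simple sequence edit, sketched below on a toy sequence. The ribosome binding region (a Shine-Dalgarno consensus followed by an arbitrary spacer), the toy fusion ORF and the linker coordinates are all hypothetical; the actual RBS and domain boundaries used for IspDF 1 are not given in this section.

```python
STOP_CODON = "TAA"
START_CODON = "ATG"

# Hypothetical ribosome binding region: Shine-Dalgarno consensus plus an arbitrary 7-nt spacer.
HYPOTHETICAL_RBS = "AGGAGG" + "AAAAAAA"


def split_fusion_orf(fusion_orf: str, linker_start: int, linker_end: int,
                     rbs: str = HYPOTHETICAL_RBS) -> str:
    """Replace the linker at [linker_start, linker_end) with stop codon + RBS + start codon.

    Coordinates are 0-based nucleotide positions within the fusion ORF.
    """
    if linker_start % 3 or linker_end % 3:
        raise ValueError("linker boundaries must fall on codon boundaries")
    if not 0 < linker_start <= linker_end < len(fusion_orf):
        raise ValueError("linker must lie inside the ORF")
    upstream_domain = fusion_orf[:linker_start]   # first domain, no stop codon yet
    downstream_domain = fusion_orf[linker_end:]   # second domain, keeps its own stop codon
    return upstream_domain + STOP_CODON + rbs + START_CODON + downstream_domain


# Toy fusion (hypothetical): Met-Ala-Ala | Arg-Gln (linker) | Gly-Gly-stop.
TOY_FUSION = "ATGGCTGCT" + "CGTCAG" + "GGTGGTTGA"

split_construct = split_fusion_orf(TOY_FUSION, linker_start=9, linker_end=15)
print(split_construct)
```

Both coding sequences remain on one transcript in this layout, so the two domains are separated at translation rather than at transcription.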


Table 2.3 List of non-natural protein fusions

| Fusion enzyme | Linker | N-terminal protein/domain | C-terminal protein/domain |
|---|---|---|---|
| IspD FL F | Flexible Linker (FL) | E. coli IspD | E. coli IspF |
| IspD RL F | Rigid Linker (RL) | E. coli IspD | E. coli IspF |
| IspD CJ F | cjIspDF Linker (CJ) | E. coli IspD | E. coli IspF |
| IspD XL F | IspDF 1 Linker (XL) | E. coli IspD | E. coli IspF |
| IspD FL F1 | Flexible Linker (FL) | IspD domain of IspDF 1 | IspF domain of IspDF 1 |
| IspD RL F1 | Rigid Linker (RL) | IspD domain of IspDF 1 | IspF domain of IspDF 1 |
| IspD CJ F1 | cjIspDF Linker (CJ) | IspD domain of IspDF 1 | IspF domain of IspDF 1 |
| IspD FL E | Flexible Linker (FL) | E. coli IspD | E. coli IspE |
| IspE FL F | Flexible Linker (FL) | E. coli IspE | E. coli IspF |

Construction of plasmids and strains for analysis of non-natural fusions

The non-natural fusions were cloned together with other genes of the MEP pathway to assess their influence on the pathway flux. These constructs and strains are listed in Table 2.4.

Table 2.4 Strains and plasmid expressing non-natural fusion proteins

Strains Description

SALyc-SD FL FI pSASDFL FI and pAC-LYC coexpressed in E. coli BL21(DE3)

SALyc-SD RL FI pSASDRL FI and pAC-LYC coexpressed in E. coli BL21(DE3)

SALyc-SD CJ FI pSASDCJ FI and pAC-LYC coexpressed in E. coli BL21(DE3)

SALyc-SD XL FI pSASDXLFI and pAC-LYC coexpressed in E. coli BL21(DE3)

SALyc-SD FL FEI pSASDFL FEI and pAC-LYC coexpressed in E. coli BL21(DE3)

SALyc-SD RL FEI pSASDRL FEI and pAC-LYC coexpressed in E. coli BL21(DE3)

SALyc-SD CJ FEI pSASDCJ FEI and pAC-LYC coexpressed in E. coli BL21(DE3)

SALyc-SD XL FEI pSASDXL FEI and pAC-LYC coexpressed in E. coli BL21(DE3)

36

SALyc-SD FL F1I pSASDFL F1I and pAC-LYC coexpressed in E. coli BL21(DE3)

SALyc-SD RL F1I pSASDRL F1I and pAC-LYC coexpressed in E. coli BL21(DE3)

SALyc-SD CJ F1I pSASDCJ F1I and pAC-LYC coexpressed in E. coli BL21(DE3)

SALyc-SD 1F1I pSASD 1-F1I and pAC-LYC coexpressed in E. coli BL21(DE3)

SALyc-SD FL F1EI pSASDFL F1EI and pAC-LYC coexpressed in E. coli BL21(DE3)

SALyc-SD RL F1EI pSASDRL F1EI and pAC-LYC coexpressed in E. coli BL21(DE3)

SALyc-SD CJ F1EI pSASDCJ F1EI and pAC-LYC coexpressed in E. coli BL21(DE3)

SALyc-SD 1F1EI pSASD 1-F1EI and pAC-LYC coexpressed in E. coli BL21(DE3)

SALyc-SDFL EFI pSASDFL EFI and pAC-LYC coexpressed in E. coli BL21(DE3)

SALyc-SEFL FDI pSASEFL FDI and pAC-LYC coexpressed in E. coli BL21(DE3) Plasmids Description

r pSASDFL FI Amp ; trc promoter; genes dxs, ispD FL F and idi; pBR322 ori r pSASDRL FI Amp ; trc promoter; genes dxs, ispD RL F and idi; pBR322 ori r pSASDCJ FI Amp ; trc promoter; genes dxs, ispD CJ F and idi; pBR322 ori r pSASDXL FI Amp ; trc promoter; genes dxs, ispD XL F and idi; pBR322 ori r pSASDFL FEI Amp ; trc promoter; genes dxs, ispD FL F, idi and ispE; pBR322 ori r pSASDRL FEI Amp ; trc promoter; genes dxs, ispD RL F, idi and ispE; pBR322 ori r pSASDCJ FEI Amp ; trc promoter; genes dxs, ispD CJ F, idi and ispE; pBR322 ori r pSASDXL FEI Amp ; trc promoter; genes dxs, ispD XL F, idi and ispE; pBR322 ori r pSASDFL F1I Amp ; trc promoter; genes dxs, ispD FL F1 and idi; pBR322 ori r pSASDRL F1I Amp ; trc promoter; genes dxs, ispD RL F1 and idi; pBR322 ori r pSASDCJ F1I Amp ; trc promoter; genes dxs, ispD CJ F1 and idi; pBR322 ori r pSASD 1-F1I Amp ; trc promoter; genes dxs, ispD1, ispF 1 and idi; pBR322 ori r pSASDFL F1EI Amp ; trc promoter; genes dxs, ispD FL F1, idi and ispE; pBR322 ori r pSASDRL F1EI Amp ; trc promoter; genes dxs, ispD RL F1, idi and ispE; pBR322 ori r pSASDCJ F1EI Amp ; trc promoter; genes dxs, ispD CJ F1, idi and ispE; pBR322 ori r pSASD 1-F1EI Amp ; trc promoter; genes dxs, ispD1, ispF 1, idi and ispE; pBR322 ori r pSASDFL EFI Amp ; trc promoter; genes dxs, ispD FL E, idi and ispF; pBR322 ori r pSASEFL FDI Amp ; trc promoter; genes dxs, ispE FL F, idi and ispD; pBR322 ori


Culture conditions

Both isoprene and lycopene starter cultures were cultivated overnight at 30 °C in LB medium (Sigma-Aldrich) containing the appropriate antibiotic(s). Isoprene starter cultures were then diluted with the medium to an OD600 of 0.2 in a final volume of 15 mL, induced with arabinose and/or IPTG, and allowed to grow for 24 h at 30 °C in 25 mL sealed glass tubes. Lycopene starter cultures were diluted with the medium to an OD600 of 0.2 in a final volume of 5 mL, induced with IPTG, and allowed to grow for 24 h at 30 °C in culture tubes in the dark.
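The inoculum volume needed to reach the target OD600 follows from a simple dilution balance (C1·V1 = C2·V2). The sketch below assumes that OD600 scales linearly with cell density over the range used; the starter OD600 of 3.0 is a hypothetical value chosen for illustration, not a measurement from this study.

```python
def starter_volume_ml(starter_od600: float, target_od600: float, final_volume_ml: float) -> float:
    """Volume of starter culture to dilute into the final volume (C1*V1 = C2*V2)."""
    if starter_od600 <= 0 or target_od600 <= 0 or final_volume_ml <= 0:
        raise ValueError("OD600 values and volume must be positive")
    if starter_od600 < target_od600:
        raise ValueError("starter culture is more dilute than the target")
    return target_od600 * final_volume_ml / starter_od600


HYPOTHETICAL_STARTER_OD600 = 3.0  # illustrative overnight culture density

isoprene_inoculum_ml = starter_volume_ml(HYPOTHETICAL_STARTER_OD600, 0.2, 15.0)
lycopene_inoculum_ml = starter_volume_ml(HYPOTHETICAL_STARTER_OD600, 0.2, 5.0)

print(f"Isoprene culture: {isoprene_inoculum_ml:.2f} mL starter in 15 mL")
print(f"Lycopene culture: {lycopene_inoculum_ml:.2f} mL starter in 5 mL")
```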

Isoprene analysis

Isoprene analysis was performed on a PerkinElmer Clarus 680 gas chromatograph coupled to a PerkinElmer Clarus SQ 8 T mass spectrometer (GC-MS). Since isoprene is a volatile hemiterpene, the sealed cultures were heated at 70 °C for 1 min and vortexed for 5 s before 200 µL of headspace was sampled using a gas-tight syringe. The standard curve for isoprene quantification was prepared in the same manner. An HP-5MS capillary column (25 m long, 0.2 mm internal diameter, 0.33 µm film thickness; Agilent Technologies) was used, with helium (1 mL/min) as the carrier gas. The oven temperature program was 35 °C for 3 min, followed by a ramp of 25 °C/min to 200 °C and a 1 min hold. The injector was maintained at 60 °C with a 20:1 split ratio. Mass spectra were acquired in SIR mode for the m/z 68 and m/z 67 ions.
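The length of one GC run follows directly from the oven program: the initial hold, the time taken by the ramp, and the final hold. A minimal sketch of that calculation is given below; the function name is illustrative, and the result covers the oven program only, not cool-down or equilibration between injections.

```python
def oven_program_minutes(initial_c: float, initial_hold_min: float,
                         ramp_c_per_min: float, final_c: float,
                         final_hold_min: float) -> float:
    """Total duration of a single-ramp GC oven program in minutes."""
    if ramp_c_per_min <= 0:
        raise ValueError("ramp rate must be positive")
    ramp_min = (final_c - initial_c) / ramp_c_per_min
    return initial_hold_min + ramp_min + final_hold_min


# Program from the text: 35 °C for 3 min, 25 °C/min to 200 °C, hold 1 min.
run_time_min = oven_program_minutes(35.0, 3.0, 25.0, 200.0, 1.0)
print(f"Oven program duration: {run_time_min:.1f} min")
```

The ramp from 35 °C to 200 °C takes 6.6 min, so each injection occupies the oven for 10.6 min.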

Lycopene analysis

Lycopene is an intracellular product. A 2 mL sample of cell culture was centrifuged at 8000 rpm for 5 min, and lycopene was extracted from the pellet with 1 mL of acetone. Extraction was performed at 55 °C for 20 min with intermittent vortexing under reduced light. The acetone suspension was centrifuged and filtered before analysis. Samples were analyzed on a PerkinElmer Flexar system equipped with a Zorbax C-18 column (4.6 × 250 mm, Agilent Technologies) maintained at 30 °C. The mobile phase consisted of 66% (v/v) methanol, 30% (v/v) tetrahydrofuran and 4% (v/v) water at a flow rate of 1 mL/min. Lycopene was detected by monitoring absorbance at 474 nm using a photodiode detector.
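For bench preparation, the v/v composition of the mobile phase translates directly into component volumes. The sketch below assumes a hypothetical 1 L batch and that each solvent is measured separately before mixing; neither detail is stated in the method.

```python
# Mobile phase composition in % (v/v), as given in the method above.
MOBILE_PHASE_PERCENT = {"methanol": 66, "tetrahydrofuran": 30, "water": 4}


def component_volumes_ml(total_ml, composition=MOBILE_PHASE_PERCENT):
    """Volume of each solvent (mL) to measure out for a batch of total_ml."""
    if sum(composition.values()) != 100:
        raise ValueError("composition must sum to 100 %")
    return {name: total_ml * percent / 100 for name, percent in composition.items()}


volumes = component_volumes_ml(1000)  # hypothetical 1 L batch
for solvent, volume in volumes.items():
    print(f"{solvent}: {volume:.0f} mL")
```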

2.3 Results

Discovery of three IspDF fusions

Screening of soil metagenome sequences led to the discovery of novel bifunctional fusions of IspD and IspF. They were isolated from fosmids NR0032_N05, NR0032_O07 and NR0037_N05, and the corresponding genes were annotated as ispDF 1, ispDF 2 and ispDF 3, respectively. The translated polypeptides were annotated as IspDF 1 (41.6 kDa), IspDF 2 (42.1 kDa) and IspDF 3 (40.2 kDa), respectively. The sequences of the novel enzymes are listed in Appendix A. These genes were tagged for affinity-based separation and expressed in E. coli BL21(DE3) using 0.5 mM IPTG as the inducer. Figure 2-1 shows the SDS/PAGE gel stained with Coomassie dye. The desired bands were seen on the gel, but the expression levels of the IspDFs were low. The insoluble cell debris was denatured and analyzed, which showed that all three fusions formed inclusion bodies.

Sequences of IspDF 1, IspDF 2 and IspDF 3 were aligned with E. coli IspD, IspF and cjIspDF (Table 2.5 and Figure 2-2). The discovered enzymes were more similar to the native monofunctional enzymes of E. coli; more differences were observed when they were aligned against cjIspDF 83. Though most of the residue functions were conserved among all five sequences (Figure 2-2), the dissimilarities occurred in clusters. The region between residues 220 and 250 was highly variable and was involved in linking the two domains. Other dissimilar clusters were observed in the IspD domain of the fusions. All three IspDFs discovered have novel sequences that have not been reported previously.

Figure 2-1 Expression of the novel IspDFs as (His)6-tagged proteins in E. coli BL21(DE3) and their analysis by SDS/PAGE. Lanes 1 and 5: total and purified IspDF 1 extract, respectively; lanes 2 and 6: total and purified IspDF 2 extract, respectively; lanes 4 and 7: total and purified IspDF 3 extract, respectively; lanes 2 and 8: protein ladder. The bands corresponding to the specific proteins are highlighted with red boxes.

Table 2.5 Protein alignment analysis of the bifunctional enzymes against E. coli IspD-IspF

and cjIspDF using the online BLASTN search tool

Bifunctional   Aligned with E. coli IspD, IspF          Aligned with cjIspDF
enzyme         % Query cover   % Sequence similarity    % Query cover   % Sequence similarity

IspDF 1        97              40.81                    94              29.71
IspDF 2        99              40.72                    97              29.75
IspDF 3        98              41.60                    99              32.20

Figure 2-2 Protein sequence alignment generated with the Centre for Genomic Regulation (CRG) M-Coffee tool. IspD-IspF denotes the sequences of E. coli IspD and IspF.

Each domain of the fusion enzymes was aligned against E. coli IspD and E. coli IspF (Table 2.6). The IspF domains of the fusions share greater sequence similarity with E. coli IspF than the IspD domains share with E. coli IspD. This observation is consistent with the similarity reported for cjIspDF with the native E. coli enzymes 66: the IspF domain of cjIspDF shares 48 % sequence similarity with E. coli IspF, whereas the IspD domain shares 25 % similarity with E. coli IspD.

Table 2.6 Protein alignment analysis of each domain of the bifunctional enzymes against

corresponding E. coli monofunctional enzymes using the online BLASTN search tool

Bifunctional   IspD domain aligned with E. coli IspD    IspF domain aligned with E. coli IspF
enzyme         % Query cover   % Sequence similarity    % Query cover   % Sequence similarity

IspDF 1        94              35.71                    90              56.38
IspDF 2        100             37.55                    89              49.35
IspDF 3        98              35.93                    96              51.30

When the domains of fusions were aligned against cjIspDF domains, a similar trend was observed (Table 2.7).

Table 2.7 Protein alignment analysis of each domain of the bifunctional enzymes against

corresponding cjIspDF enzyme domains using the online BLASTN search tool

Bifunctional   IspD domain aligned with IspD domain of cjIspDF   IspF domain aligned with IspF domain of cjIspDF
enzyme         % Query cover   % Sequence similarity             % Query cover   % Sequence similarity

IspDF 1        95              23.77                             86              42.36
IspDF 2        97              24.89                             86              41.72
IspDF 3        95              26.89                             95              43.14

Construction of MEP pathway operon

The enzymatic steps catalyzed by Dxs, IspD, IspF and Idi are the rate-controlling steps of the MEP pathway 28 in E. coli. An operon expressing these four enzymes was reconstructed (pSASDFI) and analyzed for protein expression. The soluble protein samples were run on an SDS/PAGE gel and stained with Coomassie dye.

Figure 2-3 SDS/PAGE image of the soluble protein fraction of pSASDFI. Lane 1: E. coli

BL21(DE3), lane 2: protein ladder, lane 3 and 4: SASDFI. The bands corresponding to protein

are: Dxs (band a, 68.2 kDa), IspD (band b, 25.7 kDa), IspF (band d, 16.9 kDa) and Idi (band c,

21.2 kDa).

SASDFI was tested for isoprene and lycopene production by co-expression with the downstream pathway (pSAIspS and pAC-LYC, respectively). A clone expressing only Dxs and Idi (pSASI) was constructed to account for the contribution of IspD and IspF to the improvement in MEP pathway flux.

[Figure 2-4 plots: lycopene produced (mg/L) and isoprene produced (mg/L) on the primary Y-axes, normalized titers (mg/L/OD) on the secondary Y-axes; legend entries: IPTG at 0, 25 and 50 µM, and normalized titer.]

Figure 2-4 Influence of rate-limiting steps on MEP pathway flux. (a) Lycopene production,

(b) Isoprene production. The IPTG concentrations used for induction are denoted in the legends.

Primary Y-axis is terpene titer and secondary Y-axis is normalized terpene titer.

SALyc and SAIso made the corresponding terpenoids at very low yield (Figure 2-4). These strains reflect the native expression level of the MEP pathway. Induction did not have a substantial influence on terpenoid production. IPTG induction of SAIso had a negative impact on cell growth, which is why it shows a higher normalized titer. Higher IPTG induction levels were detrimental to lycopene production and had a negative influence on growth. Overexpression of Dxs and Idi (strains SALyc-SI and SAIso-SI) produced 22-fold and 12-fold more terpenoid, respectively. Additional expression of IspD and IspF (strains SALyc-SDFI and SAIso-SDFI) further enhanced terpenoid production to 47-fold and 15-fold, respectively, relative to SALyc and SAIso. Uninduced cultures of SALyc-SI and SALyc-SDFI still produced lycopene at a higher yield than SALyc.
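The two quantities compared throughout this section can be written out explicitly. The sketch below assumes that the normalized titer is the titer divided by the culture OD600, as the axis units (mg/L/OD) suggest; the numbers are hypothetical placeholders, not measurements from this work.

```python
def normalized_titer(titer_mg_per_l, od600):
    """Titer per unit of optical density (mg/L/OD)."""
    return titer_mg_per_l / od600


def fold_change(value, reference):
    """How many times larger value is than reference."""
    return value / reference


# Hypothetical example values (assumptions for illustration only).
base_strain = {"titer": 4.0, "od600": 8.0}
engineered_strain = {"titer": 100.0, "od600": 5.0}

base_norm = normalized_titer(base_strain["titer"], base_strain["od600"])
engineered_norm = normalized_titer(engineered_strain["titer"], engineered_strain["od600"])
titer_fold = fold_change(engineered_strain["titer"], base_strain["titer"])

print(f"normalized titers: {base_norm} vs {engineered_norm} mg/L/OD")
print(f"titer fold change: {titer_fold}-fold")
```

A strain can therefore show a higher normalized titer either because it makes more product or because it grows to a lower OD600, which is the situation described for induced SAIso.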

Influence of novel fusion enzymes on the MEP pathway

The three fusions had different effects on isoprene and lycopene production (Figure 2-5). SALyc-SDF 1I and SAIso-SDF 1I were the best performers: the IspDF 1 strains showed 20 % and 75 % improvements in lycopene and isoprene production, respectively. The IspDF 2 and IspDF 3 versions lowered the titer. The OD600 values of the strains were in a similar range. The IspDF 1 variants showed higher normalized titers, which means that the catalytic throughput was improved as well. SALyc-SDF 1I was also tested at IPTG concentrations of 75 µM and 100 µM, but the titer declined; the maximum titer was obtained at 50 µM IPTG.
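The percentage improvements can be reproduced from the titers quoted in the abstract of this thesis (lycopene 235 to 275 mg/L, isoprene 3.6 to 6.3 mg/L). This is a minimal sketch; it assumes those abstract values correspond to the same comparison as the percentages reported here.

```python
def percent_improvement(new_value, reference_value):
    """Relative improvement of new_value over reference_value, in percent."""
    return 100.0 * (new_value - reference_value) / reference_value


# Titers (mg/L) quoted in the abstract: native enzymes vs. IspDF 1.
lycopene_gain = percent_improvement(275.0, 235.0)
isoprene_gain = percent_improvement(6.3, 3.6)

print(f"lycopene: {lycopene_gain:.0f} % improvement")
print(f"isoprene: {isoprene_gain:.0f} % improvement")
```

On these numbers the isoprene gain is 75 %, and the lycopene gain comes to about 17 %, of the same order as the rounded 20 % quoted above.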

To assess the sole contribution of the IspDFs, strains SAIso-DF 1, SAIso-DF 2 and SAIso-DF 3 were tested for isoprene production, and strains SALyc-DF 1, SALyc-DF 2 and SALyc-DF 3 were tested for lycopene production. All six strains made the respective terpenoid at levels equal to those of SAIso and SALyc (data not shown). Induction had no effect on the terpenoid titer.

[Figure 2-5 plots: lycopene produced (mg/L) and isoprene produced (mg/L) on the primary Y-axes, normalized titers (mg/L/OD) on the secondary Y-axes; legend entries: IPTG at 0, 25 and 50 µM, and normalized titer.]

Figure 2-5 Influence of novel IspDF fusions on MEP pathway flux. (a) Lycopene production,

(b) Isoprene production. The IPTG concentrations used for induction are denoted in the legends.

Primary Y-axis is terpene titer and secondary Y-axis is normalized terpene titer.

Homology models for the fusions were generated by SWISS-MODEL using cjIspDF as a template (Figure 2-6). All four fusions have conserved subunit structures. IspDF 1 and IspDF 3 align well with cjIspDF, but IspDF 2 has a longer linker. The active sites of the subunits are located at opposite ends. The putative linker sequences are EAIARGTGERAVGERAA for IspDF 2 and ERLIGARNTAGAM for IspDF 3. Since IspDF 1 improved the terpenoid titer, it was used for further study.
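The two putative linker sequences given above can be compared directly. The sketch below only counts residues; the choice of glycine and alanine counts as a crude indicator of small, flexible residues is an assumption made for illustration and not an analysis performed in this work.

```python
from collections import Counter

# Putative linker sequences as reported in the text.
PUTATIVE_LINKERS = {
    "IspDF 2": "EAIARGTGERAVGERAA",
    "IspDF 3": "ERLIGARNTAGAM",
}


def describe_linker(sequence):
    """Return length and counts of glycine and alanine residues in a linker."""
    counts = Counter(sequence)
    return {
        "length": len(sequence),
        "glycine": counts["G"],
        "alanine": counts["A"],
    }


summary = {name: describe_linker(seq) for name, seq in PUTATIVE_LINKERS.items()}
for name, stats in summary.items():
    print(name, stats)
```

The counts agree with the observation from the homology models that IspDF 2 carries the longer linker.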

Figure 2-6 Homology models for the fusion proteins generated by the SWISS-MODEL tool. (a) cjIspDF 66, (b) IspDF 1, (c) IspDF 2 and (d) IspDF 3. The IspD domain is in pink, the IspF domain is in blue and the linker is in green. The N-terminal residue is colored black and the C-terminal residue is colored orange.

Role of IspE

IspE is reported to influence the flux by associating with IspD and IspF 66. The association complex then assists the efficient transfer and conversion of metabolites from MEP to MEcPP. I investigated this phenomenon for lycopene production by testing recombinant E. coli strains expressing the five enzymes Dxs, IspD, IspF (or IspDF), IspE and Idi. Both SALyc-SDFEI and SALyc-SDF 1EI had lower lycopene titers than SALyc-SDFI and SALyc-SDF 1I, respectively (Figure 2-7). The percent loss in flux on IspE overexpression was greater for the IspDF 1 clone than for the monofunctional native enzyme clone. This effect was the combined result of a lower rate of lycopene production and a lower cell growth rate. The OD600 of the IspE clones was markedly lower (by 20-60 %). SALyc-SDFEI cultures showed more variable growth, which is reflected in the wider error bars.

[Figure 2-7 plot: lycopene produced (mg/L) on the primary Y-axis and normalized lycopene titer (mg/L/OD) on the secondary Y-axis; legend entries: IPTG at 0, 25 and 50 µM, and normalized titer.]

Figure 2-7 Effect of IspE overexpression on lycopene production. The IPTG concentrations used for induction are denoted in the legends. Primary Y-axis is terpene titer and secondary Y-axis is normalized terpene titer.

Effect of linkers on IspDF 1 activity

To evaluate the role of the linker in the enhancement of flux in SALyc-SDF 1I, I replaced the putative linker sequence with three types of linkers. The first is the linker identified from cjIspDF (‘CJ’). The second, ‘FL’, is a glycine and serine linker that imparts flexibility to the domains. The third, ‘RL’, forms an α-helix that restricts free movement and gives rigidity to the conformation. The effect of the linker was tested in strains with and without IspE overexpression. The non-natural linkers did not improve the overall titers of lycopene (Figure 2-8) but influenced cell viability and lowered the OD600 of the cultures. Normalized titers were highest for SALyc-SD RL F1I, followed by SALyc-SD CJ F1I. The clone with the flexible linker displayed the lowest lycopene titers in both sets.

[Figure 2-8 plots, panels (a) and (b): lycopene produced (mg/L) on the primary Y-axis and normalized lycopene titer (mg/L/OD) on the secondary Y-axis; legend entries: IPTG at 0, 25 and 50 µM, and normalized titer.]

Figure 2-8 Linkers for IspDF 1 and their effect on MEP pathway flux. (a) Strains overexpressing Dxs, IspDF chimeras and Idi, (b) strains overexpressing Dxs, IspDF chimeras, IspE and Idi. The IPTG concentrations used for induction are denoted in the legends. Primary Y-axis is terpene titer and secondary Y-axis is normalized terpene titer.

Effect of linkers on non-natural fusions of monofunctional IspD and IspF

The linkers in section 2.3.5 above failed to improve the lycopene titers but had a positive impact on the normalized titers. This means that the linkers improved the flux at the cost of cell growth. The same linkers, along with the natural linker of IspDF 1, were then employed to link E. coli IspD and IspF. None of the linkers performed better than the monofunctional enzyme construct for lycopene production (Figure 2-9). For the strains in Figure 2-9(b), the lower normalized titers were the result of higher OD600 values, which suggests that the overall carbon flux was channeled towards growth metabolism. For the strains in Figure 2-9(a), in contrast, the fusions had a negative impact on lycopene production without a substantial effect on cell growth.

[Figure 2-9 plots, panels (a) and (b): lycopene produced (mg/L) on the primary Y-axis and normalized lycopene titer (mg/L/OD) on the secondary Y-axis; legend entries: IPTG at 0, 25 and 50 µM, and normalized titer.]

Figure 2-9 Linkers for non-natural fusions of E. coli IspD and IspF and their effect on MEP pathway flux. (a) Strains overexpressing Dxs, IspDF chimeras and Idi, (b) strains overexpressing Dxs, IspDF chimeras, IspE and Idi. The IPTG concentrations used for induction are denoted in the legends. Primary Y-axis is terpene titer and secondary Y-axis is normalized terpene titer.

Effect of XL on non-natural fusions of monofunctional E. coli IspD and IspF

Since the strains exhibited a mixed response to the CJ, FL and RL linkers, fusions of IspD and IspF with the putative linker of IspDF 1 (XL) were constructed. These fusions lowered the MEP flux and further decreased lycopene production (Figure 2-10). This effect was more pronounced for SALyc-SD XL FI than for SALyc-SD XL FEI.

[Figure 2-10 plot: lycopene produced (mg/L) on the primary Y-axis and normalized lycopene titer (mg/L/OD) on the secondary Y-axis; legend entries: IPTG at 0, 25 and 50 µM, and normalized titer.]

Figure 2-10 Linkers for non-natural fusions of E. coli IspD and IspF on MEP pathway flux.

The IPTG concentrations used for induction are denoted in the legends. Primary Y-axis is

terpene titer and secondary Y-axis is normalized terpene titer.

Expression of domains of IspDF 1 as IspD 1 and IspF 1

The XL linker’s negative impact on the pathway flux suggested the need to study the

domains of IspDF 1 in isolation (Figure 2-11). The separation of the domains as individual enzymes had a more pronounced effect on SALyc-SD 1F1EI.

[Figure 2-11 plot: lycopene produced (mg/L) on the primary Y-axis and normalized lycopene titer (mg/L/OD) on the secondary Y-axis; legend entries: IPTG at 0, 25 and 50 µM, and normalized titer.]

Figure 2-11 Effect of domain separation of IspDF 1 on MEP pathway flux. The IPTG

concentrations used for induction are denoted in the legends. Primary Y-axis is terpene titer and

secondary Y-axis is normalized terpene titer.

Non-natural fusions of IspE and their effects on MEP pathway flux

To evaluate the reason behind the natural existence of fusions of enzymes that catalyze non-consecutive steps in the MEP pathway, I constructed non-natural fusions of IspE. The fusions were constructed using the flexible linker, and the linking strategy was kept similar to that of the natural IspDFs. The IspDE fusion was constructed by linking the C-terminus of IspD to the N-terminus of IspE, and the IspEF fusion by linking the C-terminus of IspE to the N-terminus of IspF. Figure 2-12 shows that the IspDE fusion exhibited a 20 % improvement in lycopene production compared to SALyc-SDFI and a 2.3-fold improvement over SALyc-SDFEI, whereas the IspEF fusion lowered lycopene production substantially.

[Figure 2-12 plot: lycopene produced (mg/L) on the primary Y-axis and normalized lycopene titer (mg/L/OD) on the secondary Y-axis; legend entries: IPTG at 0, 25 and 50 µM, and normalized titer.]

Figure 2-12 Non-natural fusions of IspE and their effect on MEP pathway flux. The IPTG concentrations used for induction are denoted in the legends. Primary Y-axis is terpene titer and

secondary Y-axis is normalized terpene titer.

Categorical inferences

Figure 2-13 summarizes the results obtained so far. It is a comparison plot of the different constructs with their highest titer and normalized titer values. Blank entries are denoted by ‘-’.

Figure 2-13 Categorical comparison of lycopene production for the various linkers.

*IspD and IspF domains of IspDF 1 are separated to be translated as two polypeptides.

2.4 Discussions

Influence of native Dxs, IspD, IspF and Idi on MEP pathway flux

The downstream lycopene production genes (crtE, crtB, crtI) are under the control of an endogenous promoter, and the MEP pathway genes are under the control of the trc promoter, which is reported to be leaky 109–111. For these reasons, uninduced lycopene cultures produced more lycopene than the base strain SALyc. Higher normalized titers in both lycopene and isoprene fermentations indicate an abundance of the C5 precursor metabolites IPP and DMAPP, which are shuttled to the respective downstream terpene synthesis pathways.

To study fusions and the role of linkers, it was necessary to construct the basal operon overexpressing Dxs, IspD, IspF and Idi, which was reported to increase the taxol yield 28. The strain containing plasmid pSASDFI served as the basis for comparison in this study. Some reports emphasized overexpression of only Dxs and Idi for improvement of MEP pathway flux 112,113, whereas the results of my study (Figure 2-4) showed that additional overexpression of IspD and IspF improved the titers of lycopene by 80 % and of isoprene by 35 %. The micro-aerobic, highly oxygen-limited environment of the isoprene cultures could be responsible for this disparity in titers. The lycopene titers obtained in SALyc-SDFI are comparable to the titers reported in the literature 114,115. Overall, the pSASDFI operon improved lycopene production by 47-fold and isoprene titers by 15-fold compared to the SALyc and SAIso strains, and the strategy proved to be effective in eliminating bottlenecks in the MEP pathway.

IspDF fusion activity and the influence of IspE on MEP pathway flux

Dxs is the gatekeeper enzyme of the MEP pathway, and Idi catalyzes the terminal step, maintaining the equilibrium between the IPP and DMAPP concentrations required for the downstream pathway of terpenoid biosynthesis. Hence, operons overexpressing only IspD and IspF, or only the IspDFs, did not influence the terpenoid titers. Production of terpenoids by SAIso-DF 1, SAIso-DF 2, SAIso-DF 3, SALyc-DF 1, SALyc-DF 2 and SALyc-DF 3 was not significantly different from that of the strains with no MEP pathway overexpression (data not shown). It was therefore necessary to include the genes dxs and idi in further experiments to study the influence of the intermediary steps.

The improvement in flux through the pathway due to IspDF 1 expression in the pSASDF 1I operon can be attributed to the linker imparting physical features (such as flexibility or catalytic site proximity/substrate channeling) to the catalytic domains, and/or to a higher stability and/or activity of IspDF 1 than the native monofunctional enzymes. The IspF domain of IspDF 1 has a higher similarity to E. coli IspF than those of IspDF 2 and IspDF 3. The magnitude of the effect of IspDF 1 overexpression differed between the lycopene strain and the isoprene strain. Since IspE catalyzes the step between IspD and IspF, further investigation was carried out to evaluate the role of IspE in the catalytic cascade. The IspE-catalyzed step is not reported to be a bottleneck in the pathway, and its overexpression exerted metabolic stress and lowered the lycopene titers. The stress effect was dominant in SALyc-SDF 1EI even though it expressed only four recombinant proteins versus five in SALyc-SDFEI. This result highlighted the existence of factor(s) other than metabolic stress.

Study on linker types

The first factor studied was the role of the linker. The type of fusion and linker affected only the activity of the enzymes and did not contribute to variability in protein expression levels. This was verified by enzyme expression and analysis on SDS/PAGE (data not shown). The flexible linker was chosen to impart mobility to the domains, and the rigid linker was chosen because it forms a long helix that restricts the movement of the domains. The linker from cjIspDF was employed as well. For the non-natural IspDF 1 fusions, carbon flux was diverted more to the MEP pathway and away from growth, resulting in higher normalized lycopene titers but lower total lycopene production. SALyc-SD RL F1I was the best-performing strain, with 22 % higher normalized titers than SALyc-SDF 1I and 33 % higher normalized titers than the basal strain SALyc-SDFI. This suggested that rigidity in the conformation of the fusion had a positive impact on the catalytic activity. Homology modeling of IspD RL F1 was inconclusive since the templates could not accurately replicate the folding of the linker. On the other hand, when IspE was overexpressed (strain SALyc-SD RL F1EI), production decreased by 30 % and the normalized titers fell by 80 %, while the OD600 of SALyc-SD RL F1EI was 50 % higher than that of SALyc-SD RL F1I. Since SALyc-SD RL F1EI expresses four heterologous enzymes, the effective quantity of each individual enzyme is lower than in SALyc-SD RL F1I, which expresses three. Hence the MEP pathway flux was lower, and the overall carbon flux was diverted to biomass generation.

To probe this effect further, non-natural fusions of E. coli IspD and IspF were constructed and compared. In these cases (Figure 2-9), co-localization of the activities had a negative impact on lycopene production as well as on normalized titers. Here, however, the overall OD600 of the strains overexpressing IspE was 10-50 % lower than that of their corresponding variants not overexpressing IspE. This pointed to an involvement of IspE beyond its influence on the health and growth of the cell. Moreover, the putative linker of IspDF 1, when used to link E. coli IspD and IspF, had effects similar to those of the other non-natural fusions.

Influence of IspDF 1 domain separation on MEP pathway flux

The chimeric enzyme with the RL-type linker exhibited the maximum flux through the MEP pathway so far, at the expense of cell growth. This prompted me to re-evaluate the necessity for the linker and for domain co-localization. The prevailing theory is that organizing enzymes as fusions improves the rate of a reaction cascade by lowering substrate diffusional limitations and enabling substrate channeling. However, recent evidence shows that the effect of fusions on a metabolic cascade is more complex than previously assumed 116. It is not simply the proximity between enzymes that enhances the initial reaction rate; rather, colocalization increases the local concentration of enzymes, which increases the chance that a diffusing substrate will interact with an active site cavity 117.

IspD 1 and IspF 1 retained their individual activities. Strain SALyc-SD 1F1I had 25 % lower lycopene production, but its OD600 was lower by the same factor as well; hence the overall flux and the normalized titers were similar. SALyc-SD 1F1EI had a 30 % lower OD600 and displayed an 82 % increase in lycopene production. Together, these observations amounted to a roughly three-fold improvement in normalized lycopene titers for SALyc-SD 1F1EI over SALyc-SDF 1EI. Though the overall lycopene titers remained lower than those of SALyc-SDF 1I, owing to the lower availability of enzyme copies from the longer operon, the improvement in the normalized titers bolstered the observation of higher stability and activity of IspDF 1.
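The way relative changes in titer and OD600 combine into a change in normalized titer can be made explicit. The sketch below assumes that the percentages quoted above are relative to the corresponding fused-domain strain and that the normalized titer is the titer divided by OD600.

```python
def relative_normalized_titer(titer_change_pct, od_change_pct):
    """Ratio of normalized titers (test / reference) from percentage changes."""
    titer_ratio = 1 + titer_change_pct / 100
    od_ratio = 1 + od_change_pct / 100
    return titer_ratio / od_ratio


# SALyc-SD 1F1I vs. SALyc-SDF 1I: titer and OD600 both 25 % lower.
without_ispe = relative_normalized_titer(-25, -25)

# SALyc-SD 1F1EI vs. SALyc-SDF 1EI: titer 82 % higher, OD600 30 % lower.
with_ispe = relative_normalized_titer(+82, -30)

print(f"without IspE: {without_ispe:.2f}-fold")
print(f"with IspE:    {with_ispe:.2f}-fold")
```

Under these assumptions the ratio is 1.0 without IspE and about 2.6 with IspE, consistent with the roughly three-fold improvement stated above.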

Improvement in the flux on IspDE fusion

The absence of any literature on fusions of IspE is noticeable, in contrast to the many discoveries of IspDFs. The role of fusions of enzymes catalyzing non-consecutive steps in the pathway, and the role of the intermediary-step enzyme, are not only highly debated but also rather unpredictable. I tried to unravel them by constructing non-natural fusions of IspE. The IspDE fusion performed many-fold better than the IspEF fusion. In fact, the IspDE fusion exhibited a 2.3-fold improvement in lycopene production and a 20 % improvement in normalized titers over SALyc-SDFEI. The OD600 of SALyc-SD FL EFI doubled as well. The IspEF fusion, in contrast, decreased lycopene production by at least 65 % and normalized titers by 85 % relative to SALyc-SDFEI.

The strain SALyc-SD FL EFI was the best-performing strain for lycopene production and the second best in MEP pathway flux after SALyc-SD RL F1I. This was because the individual domains of IspDF 1 are more active than native E. coli IspD and IspF.

In conclusion, I was able to engineer a more active ortholog of the MEP pathway enzymes in the form of the fusion IspDF 1. I examined it through a study of linkers, in which the non-natural IspDE fusion performed as well as IspDF 1. This raised the question of the plausible role of IspE in the enzyme cascade. The positive effect of active site co-localization should have worked about equally for IspDE and IspEF, since both are fusions of consecutive steps in the pathway. Instead, IspEF underperformed IspDE by 7-fold for lycopene production and by 8.5-fold for normalized titers, which are indirect measures of MEP pathway flux. These results raise questions about the canonically reported order of the IspD-, IspE- and IspF-catalyzed steps in the MEP pathway.

Chapter 3: Study of MEP pathway biochemistry

3.1 Introduction

The work presented in Chapter 2 of this thesis raised the question of what benefit is conferred by the natural existence of a fusion protein (IspDF) that catalyzes non-consecutive steps in the MEP pathway. Clearly, the reason is not substrate channeling; if it were, IspDE or IspEF fusions would be common too. Moreover, the non-natural IspDE fusion was more active than the IspEF fusion. If the reason instead lies in the ease of forming a multicatalytic complex 66 of IspD, IspE and IspF, it is notable that published work in this area has never reported the IspE-catalyzed reaction as rate limiting 66,85. This led us to reconsider the MEP pathway steps, in particular the cascade of IspD-, IspE- and IspF-catalyzed steps. These steps were discovered by radiolabelling individual substrates and testing the reactions in isolation in vitro. Of the three enzymes, IspF is reported to be promiscuous, catalyzing the cyclization of CDP-ME to 2-C-methyl-D-erythritol-3,4-cyclophosphate 43. Though this product was regarded as an in vitro artifact without biosynthetic relevance to the MEP pathway, the enzyme’s promiscuous nature cannot be ignored.

Kinetics of MEP pathway cascade from MEP to MEcPP

A recently published study gives some insights into the MEP pathway flux 118, which are summarized in Table 3.1. These estimates were based on experiments carried out using glucose as a substrate and E. coli cell lysate. Though they are not in vivo measurements, they give an overview of the three-enzyme cascade. The maximum flux through IspF is roughly 30-fold lower than that through IspE, and the flux through IspD is lower as well. Although the kcat value of IspE is lower than that of IspD, the higher availability of IspE drives a higher flux.

Table 3.1 Enzyme amount and kinetics for MEP pathway steps (data extracted from 118 )

Enzyme   Kcat (min^-1)   Amount per cell (molecules)   Maximum flux per cell (molecules min^-1)

IspD     2940            50 ± 8                        1.5 × 10^5
IspE     1010            208 ± 22                      2.1 × 10^5
IspF     60              111 ± 13                      6.7 × 10^3

Moreover, each enzyme association complex reported 66 contains six molecules each of IspD, IspE and IspF, but the amounts reported per cell are not equal. In fact, the rough ratio of molecules estimated (Table 3.1) is 1:4:2 for IspD:IspE:IspF, which departs from the stoichiometry required for complexation.
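The maximum fluxes and the copy-number ratio follow directly from the values in Table 3.1. A minimal sketch, assuming the maximum flux per cell is simply Kcat multiplied by the number of enzyme molecules per cell (mean values only, uncertainties ignored):

```python
# Mean values from Table 3.1: Kcat in min^-1, amount in molecules per cell.
TABLE_3_1 = {
    "IspD": {"kcat_per_min": 2940, "copies_per_cell": 50},
    "IspE": {"kcat_per_min": 1010, "copies_per_cell": 208},
    "IspF": {"kcat_per_min": 60, "copies_per_cell": 111},
}


def max_flux(entry):
    """Upper bound on turnover per cell (molecules per minute)."""
    return entry["kcat_per_min"] * entry["copies_per_cell"]


fluxes = {name: max_flux(entry) for name, entry in TABLE_3_1.items()}

# Copy numbers expressed relative to IspD.
ispd_copies = TABLE_3_1["IspD"]["copies_per_cell"]
copy_ratio = {
    name: entry["copies_per_cell"] / ispd_copies for name, entry in TABLE_3_1.items()
}

for name in TABLE_3_1:
    print(f"{name}: max flux {fluxes[name]:.2e} min^-1, copy ratio {copy_ratio[name]:.2f}")
```

The products round to the tabulated fluxes, and the copy ratio of about 1 : 4.2 : 2.2 corresponds to the rough 1:4:2 quoted above.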

The natural fusion enzyme IspDF 1 exhibited higher flux as well as terpenoid yield owing to the higher activity of each domain, which was clear from the non-natural fusion IspD RL F1 and from the separation of the domains into IspD 1 and IspF 1. Another of my studies revealed that the IspD FL E fusion improved the pathway flux but IspE FL F did not.

The interpretations from the literature as well as the observations from my earlier experiments directed me to study the pathway steps in detail with respect to the reported order of occurrence.

Canonical MEP pathway and plausible bifurcation step

Conversion of MEP to MEcPP happens through three steps in the order of cytidylyl transfer, phosphorylation and intramolecular cyclization. In the canonical MEP pathway (Figure 3-1(a)), MEP enters the IspD catalytic site and is converted to CDP-ME. CDP-ME diffuses out of IspD into the bulk and enters the IspE catalytic site, where it is converted to CDP-MEP. CDP-MEP then diffuses back into the bulk and enters the IspF active site to be converted to MEcPP. This route involves many diffusional steps.

Figure 3-1 Schematic for the flow of metabolites through the MEP pathway for IspDF and IspE. (a) Canonical MEP pathway, (b) MEP pathway bifurcation. A: MEP, B: CDP-ME, C: CDP-MEP, D: MEcPP, B’: ME-2,4-PP

We postulate that IspE can act on MEP to phosphorylate the 2-OH group, forming 2-methyl-erythritol-2,4-diphosphate (ME-2,4-PP) as an alternate intermediate. ME-2,4-PP is then acted upon by IspD to form CDP-MEP, after which the steps reported in the canonical pathway resume (Figure 3-1(b) and Figure 3-2). This route involves fewer diffusional steps in the case of fusions and is also consistent with my results in section 2.3.9 (page number 53) of this thesis.
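The canonical route and the postulated branch can be represented as a small reaction network and enumerated. The sketch below is purely illustrative: the branch through ME-2,4-PP is the hypothesis stated above, not an established reaction, and the data structure and function names are assumptions made for this example.

```python
# Each substrate maps to a list of (enzyme, product) steps.
CANONICAL = {
    "MEP": [("IspD", "CDP-ME")],
    "CDP-ME": [("IspE", "CDP-MEP")],
    "CDP-MEP": [("IspF", "MEcPP")],
}
# Postulated bifurcation (hypothesis under test in this chapter).
PROPOSED_BRANCH = {
    "MEP": [("IspE", "ME-2,4-PP")],
    "ME-2,4-PP": [("IspD", "CDP-MEP")],
}


def merge(*networks):
    """Combine several reaction networks into one."""
    merged = {}
    for network in networks:
        for substrate, steps in network.items():
            merged.setdefault(substrate, []).extend(steps)
    return merged


def routes(network, start, goal, path=()):
    """Yield every route from start to goal as a tuple of (substrate, enzyme, product)."""
    if start == goal:
        yield path
        return
    for enzyme, product in network.get(start, []):
        yield from routes(network, product, goal, path + ((start, enzyme, product),))


network = merge(CANONICAL, PROPOSED_BRANCH)
enzyme_orders = [
    [enzyme for _, enzyme, _ in route] for route in routes(network, "MEP", "MEcPP")
]
for order in enzyme_orders:
    print(" -> ".join(order))
```

Both routes have three enzymatic steps; they differ only in whether IspD or IspE acts first, which is the point at which a fusion partner could remove a diffusional step.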

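[Figure 3-2 scheme: canonical route MEP → CDP-ME (IspD; CTP in, PPi out) → CDP-MEP (IspE; ATP in, ADP out) → MEcPP (IspF; CMP out); proposed branch MEP → ME-2,4-PP (IspE; ATP/ADP in, ADP/AMP out) → CDP-MEP (IspD; CTP in, PPi out).]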

Figure 3-2 Bifurcation in the MEP pathway with the promiscuity of IspE. The pathway

highlighted with blue arrow is the canonical MEP pathway and the step highlighted by yellow

arrow marks the bifurcation step from MEP.

In this work, I systematically evaluated the feasibility of conversion of MEP to ME-2,4-PP

by IspE.

3.2 Materials and methods

Strains, plasmid and genes

All strains, plasmids and genes used in this study are listed in Table 3.2. The ispE gene was amplified from the genome of E. coli strain K12. All constructs were verified by sequencing at Genewiz Inc., USA. The IspE protein was tagged with 6×His residues for affinity-based purification.

Table 3.2 Strains, plasmids and genes

Strains              Description                                              Source

E. coli DH5α         Cloning strain                                           New England Biolabs (#C2987)
E. coli BL21(DE3)    Expression strain                                        New England Biolabs (#C2527)
E. coli strain K12   Gene amplification                                       Sigma-Aldrich (#EC1)
SAHisIspE            E. coli BL21(DE3) transformed with pSAHisIspE            This study

Plasmid              Description                                              Source

pSAHisIspE           CamR; T7 promoter; (His)6-tagged ispE; p15A ori          This study

Gene                 Description                                              Source

ispE                 4-(cytidine 5'-diphospho)-2-C-methyl-D-erythritol kinase NCBI Gene ID: 945774

Media and growth conditions

E. coli DH5α was used for plasmid construction and cultivated at 37 °C in Luria-Bertani (LB) medium (purchased from Sigma-Aldrich). E. coli BL21(DE3) was used as the expression host. Successful SAHisIspE transformants were selected on LB agar plates containing 50 µg/mL chloramphenicol. For expression of IspE, a single colony from the plate was inoculated into LB medium supplemented with chloramphenicol and grown at 37 °C for 6 h. This inoculum was diluted with fresh medium to an OD600 of 0.2 and grown at 37 °C. When the OD600 reached 0.8, the culture was induced with 0.5 mM IPTG and grown at 30 °C overnight. All chemicals were purchased from Sigma-Aldrich.
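The dilution of the inoculum to a target OD600 follows the usual C1V1 = C2V2 relation. The following is a minimal sketch of that arithmetic. The function name, the inoculum OD600 of 2.0 and the 50 mL culture volume are illustrative assumptions, not values from this work.

```python
def inoculum_volume(od_inoculum, od_target, final_volume_ml):
    """Volume of inoculum (mL) needed to reach od_target in final_volume_ml.

    Assumes OD600 scales linearly with cell density (C1*V1 = C2*V2),
    which holds only within the linear range of the spectrophotometer.
    """
    if od_inoculum <= od_target:
        raise ValueError("inoculum OD must exceed the target OD")
    return od_target * final_volume_ml / od_inoculum


# Hypothetical example: inoculum at OD600 = 2.0, diluted to OD600 = 0.2 in 50 mL
v_inoculum = inoculum_volume(od_inoculum=2.0, od_target=0.2, final_volume_ml=50.0)
v_medium = 50.0 - v_inoculum
print(f"inoculum: {v_inoculum:.1f} mL, fresh medium: {v_medium:.1f} mL")
```

With these assumed numbers, 5 mL of inoculum is made up to 50 mL with fresh medium.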

Protein extraction and IspE purification

The cells were harvested, and soluble proteins were extracted from the cell pellet. The lysis buffer consisted of 50 mM Tris-HCl (pH 8.0), 10 % (v/v) glycerol, 0.1 % (v/v) Triton X-100, 1 mM phenylmethylsulfonyl fluoride (PMSF), 2 mM magnesium chloride and 3 units/mL DNase. The samples were sonicated to lyse the cells completely. IspE was purified from the total soluble protein fraction by immobilized metal-affinity chromatography (IMAC) on Qiagen Ni-NTA Agarose (Qiagen Cat No./ID: 30210) following the supplier’s protocol.

The purified protein solution was exchanged into 0.1 mM Tris-HCl and concentrated by ultrafiltration. Protein concentration was measured by the Bradford method using bovine serum albumin as the standard. The protein samples were stored at -20 °C until use.

IspE in vitro reactions

The in vitro reaction mixture consisted of 0.1 mM Tris-HCl, 5 mM DTT, 10 mM MgCl2, ATP (or ADP), MEP and the IspE enzyme. Chemicals and standards were procured from Thermo Scientific. The reaction was initiated by addition of the substrate and allowed to proceed at 37 °C 85,119 . The reaction mixture was sampled, and the reaction was arrested by adding an equal volume of methanol and vortexing. The samples were then desalted using polyethersulfone (PES) membrane filters with a 3 kDa MWCO (VWR International).

Bioluminescent assay for ATP consumption

Initial reactions were performed using ATP as the phosphate donor. The course of the reaction was monitored with the ATP Bioluminescent Assay Kit (Sigma-Aldrich) following the supplier’s protocol.

HPLC analysis of ATP, ADP and AMP

The reaction samples were further analyzed for the products of ATP on a PerkinElmer Flexar HPLC fitted with a Zorbax C-18 column (4.6 × 250 mm, Agilent Technologies) maintained at 25 °C. Samples were run at 1 mL/min with a mobile phase of solvent A (10 mM ammonium acetate) and solvent B (acetonitrile). The gradient elution conditions were: 95 % A and 5 % B at 0 min; changed to 10 % A and 90 % B in 10 min; held for the next 4 min; returned to 95 % A and 5 % B in 4 min. Detection was at 254 nm. This analysis could detect neither MEP nor its phosphorylated product.
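The gradient program above can be written as a list of breakpoints and interpolated to give the mobile-phase composition at any time. This is only a sketch: it assumes that each ramp is linear and that each stated duration follows on from the end of the previous step, which gives breakpoints at 0, 10, 14 and 18 min. The names `HPLC_GRADIENT` and `percent_b` are illustrative.

```python
# Breakpoints (time in min, % solvent B) taken from the HPLC method described above.
# Assumption: ramps are linear and each stated duration follows the previous step.
HPLC_GRADIENT = [(0.0, 5.0), (10.0, 90.0), (14.0, 90.0), (18.0, 5.0)]


def percent_b(t_min, program=HPLC_GRADIENT):
    """Return % solvent B at time t_min by linear interpolation between breakpoints."""
    if t_min <= program[0][0]:
        return program[0][1]
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    return program[-1][1]  # after the last breakpoint, hold the final composition


for t in (0, 5, 10, 12, 16, 18):
    b = percent_b(t)
    print(f"t = {t:>2} min: {100 - b:.1f} % A, {b:.1f} % B")
```

The same function can be reused for the LC-MS gradient by passing a different breakpoint list as `program`.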

LC-MS analysis of MEP and ME-2,4-PP

The reaction components were further analyzed on an Agilent 1100 LC/MSD Trap XCT Plus system (Agilent Technologies) fitted with a Zorbax SB-C18 column (4.6 × 150 mm) maintained at 25 °C. Samples were run at 1 mL/min with a mobile phase of solvent A (0.2 % (v/v) formic acid) and solvent B (acetonitrile). The gradient elution conditions were: 95 % A and 5 % B for 1 min; changed to 10 % A and 90 % B in 5 min; held for the next 1 min; returned to 95 % A and 5 % B in 10 s. The ESI temperature was set at 350 °C and the nebulizer at 60 psi. Scans were performed in negative mode.

Computational analysis

The analysis was performed on IspD (PDB ID 1I52), IspE (PDB ID 1OJ4) and IspF (PDB ID 1JY8). The structures of all molecules that participate in the catalytic cycle were prepared in Marvin 120 . The molecules were then docked into the active sites of the enzymes using AutoDock Vina 121 . The input to each docking calculation was an initial set of atomic coordinates, and the program output putative poses of the molecule within the active site, each with a corresponding Gibbs free energy of binding (ΔG). To identify the pose that yields the global minimum of ΔG, the most stable pose was selected and its coordinates were reused as the starting point for a subsequent docking calculation. This procedure was repeated until the output pose no longer changed.
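The iterative re-docking procedure described above amounts to a fixed-point loop: dock, keep the pose with the lowest ΔG, and restart from it until the pose stops moving. The sketch below illustrates only that control flow. The `dock` argument is a hypothetical callable standing in for a docking engine and is not the AutoDock Vina interface. `toy_dock`, the target coordinates, the convergence tolerance and the RMSD criterion are all assumptions made for illustration.

```python
import math


def rmsd(a, b):
    """Root-mean-square deviation between two equally sized lists of (x, y, z) tuples."""
    sq = sum((x - y) ** 2 for p, q in zip(a, b) for x, y in zip(p, q))
    return math.sqrt(sq / len(a))


def refine_pose(dock, initial_coords, tol=1e-3, max_rounds=50):
    """Re-dock from the most stable pose until the pose no longer changes.

    dock: hypothetical callable, coords -> list of (coords, delta_g) poses.
    Returns (final coords, delta_g of the final pose, number of rounds used).
    """
    coords = initial_coords
    for round_no in range(1, max_rounds + 1):
        poses = dock(coords)
        best_coords, best_dg = min(poses, key=lambda pose: pose[1])  # lowest delta G
        shift = rmsd(coords, best_coords)
        coords = best_coords
        if shift < tol:  # pose unchanged within tolerance
            return coords, best_dg, round_no
    raise RuntimeError("pose did not converge within max_rounds")


# Toy stand-in for a docking engine: each call moves the ligand halfway
# towards a fixed, made-up optimum and scores poses by distance to it.
TARGET = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]


def toy_dock(coords):
    halfway = [tuple((c + t) / 2 for c, t in zip(atom, tgt))
               for atom, tgt in zip(coords, TARGET)]
    return [(halfway, -8.0 + rmsd(halfway, TARGET)),
            (coords, -8.0 + rmsd(coords, TARGET))]


final_coords, final_dg, rounds = refine_pose(toy_dock, [(4.0, 0.0, 0.0), (5.5, 0.0, 0.0)])
print(f"converged after {rounds} rounds, delta G = {final_dg:.3f}")
```

In a real workflow the stand-in would be replaced by a call that writes the coordinates to a ligand file, runs the docking program and parses the returned poses and scores.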

3.3 Results

Protein expression and quantification

The IspE protein was successfully expressed and analyzed by SDS/PAGE. (His)6-IspE is a 32.9 kDa protein that can be seen in the image of a gel stained with Coomassie dye (Figure 3-3).

Figure 3-3 IspE expression and analysis on SDS/PAGE. Lane 1: total soluble protein fraction of E. coli BL21(DE3), lane 2: total soluble protein fraction of SAHisIspE induced with 0.5 mM IPTG, lane 3: protein ladder, lanes 4 and 5: purified IspE. The bands corresponding to IspE are highlighted with a red box.

Analysis of IspE reaction by ATP bioluminescent assay

In vitro reactions of IspE were carried out using MEP and ATP/ADP as substrates. Four sets of conditions were studied (Table 3.3).

Table 3.3 Reaction conditions for different experimental sets

Set No.   IspE concentration (µM)   MEP concentration (µM)   ATP concentration (µM)   ADP concentration (µM)
1         6                         500                      150                      0
2         30                        500                      150                      0
3         30                        200                      1000                     0
4         30                        200                      0                        1000

Set 1 of the reactions was carried out with a molar ratio of ATP (150 µM) to MEP (500 µM) of 1:3.3 and an IspE concentration of 6 µM. The reaction was monitored using the bioluminescent assay, a two-step procedure that measures both ATP and ADP based on luciferase activity. The drop in relative luminescence units (RLU) corresponds to the utilization of ATP, whereas the ADP/ATP ratio indirectly gives the amount of ADP present. The course of this reaction was monitored for 4 h. The drop in RLU over time indicated consumption of ATP in the reaction (Figure 3-4), but a proportional increase in the ADP concentration was not observed.

The ADP concentration increased suddenly after 2 h and soon reached a plateau. Both ATP and ADP amounts reached saturation after 3 h.
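The donor-to-substrate ratios of the four reaction sets can be checked directly from Table 3.3. The sketch below encodes the table and reports the ratio of phosphate donor (ATP or ADP) to MEP for each set. Calling the reactant that is present in the smaller molar amount "limiting" assumes a 1:1 stoichiometry between donor and MEP, which is a simplification introduced here for illustration.

```python
# Concentrations in µM, copied from Table 3.3
REACTION_SETS = {
    1: {"IspE": 6, "MEP": 500, "ATP": 150, "ADP": 0},
    2: {"IspE": 30, "MEP": 500, "ATP": 150, "ADP": 0},
    3: {"IspE": 30, "MEP": 200, "ATP": 1000, "ADP": 0},
    4: {"IspE": 30, "MEP": 200, "ATP": 0, "ADP": 1000},
}


def donor_to_mep_ratio(conditions):
    """Molar ratio of phosphate donor (ATP + ADP supplied) to MEP."""
    return (conditions["ATP"] + conditions["ADP"]) / conditions["MEP"]


def limiting_reactant(conditions):
    """Reactant present in the smaller molar amount (assumes 1:1 stoichiometry)."""
    return "phosphate donor" if donor_to_mep_ratio(conditions) < 1 else "MEP"


for set_no, conditions in REACTION_SETS.items():
    ratio = donor_to_mep_ratio(conditions)
    print(f"set {set_no}: donor:MEP = {ratio:.2f}, limiting: {limiting_reactant(conditions)}")
```

Sets 1 and 2 give a donor:MEP ratio of 0.30 (that is, 1:3.3), and sets 3 and 4 give 5:1.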

[Figure 3-4: plot of drop in RLU (%) and ADP/ATP ratio against time (min).]

Figure 3-4 Bioluminescent assay monitoring ATP and ADP in the IspE reactions.

HPLC analysis of IspE reactions

The bioluminescent analysis showed consumption of ATP but a non-proportional buildup of ADP. To investigate this further, the set 1 experimental samples were analyzed by HPLC. The analysis revealed conversion of ATP to AMP, with ADP building up only at the later stages of the reaction (Figure 3-5). These results agreed with the bioluminescence analyses. The reaction did not go to completion, and 75 % of the ATP was consumed at the end of 3 h. The reaction rate is very low; the canonical conversion of CDP-ME to CDP-MEP is reported to be faster 85 . The initial rate of reaction, calculated from the ATP consumption data, was 0.781 µM·min⁻¹. Reaction samples without enzyme but with ATP, ADP and AMP were also analyzed to assess temperature- and pH-dependent degradation of the phosphates. No noticeable degradation was observed over 3 h at the reaction conditions.

[Figure 3-5: plot of ATP, ADP and AMP concentration (µM) against time (min).]

Figure 3-5 HPLC analysis of set 1 reactions to study ATP consumption.

Hence, the set 2 experiments were designed with exactly the same parameters but an IspE concentration of 30 µM. 75 % ATP consumption was achieved in 1 h; ATP was rapidly converted to AMP, and ADP built up over time (Figure 3-6). The rate of ATP conversion increased 2.3-fold to 1.795 µM·min⁻¹.

[Figure 3-6: plot of ATP, ADP and AMP concentration (µM) against time (min).]

Figure 3-6 HPLC analysis of set 2 reactions with higher IspE concentration to study ATP consumption.

Because ATP was the limiting reactant in the first two sets, the set 3 reactions were planned with an ATP to MEP ratio of 5:1 so that ATP was no longer limiting. The ATP consumption reached a maximum in the first 90 min of the reaction (Figure 3-7), and the initial rate was 2.521 µM·min⁻¹. However, the total conversion decreased to 30 %. Conversion of ATP to AMP was faster than that to ADP.

[Figure 3-7: plot of ATP, ADP and AMP concentration (µM) against time (min).]

Figure 3-7 HPLC analysis of set 3 reactions when MEP is rate limiting reactant to study ATP consumption.

This suggested sequential dephosphorylation of ATP, first to ADP and then to AMP. To assess the role of ADP as a phosphate donor for the IspE reaction, reaction set 4 was planned with only ADP and MEP as substrates in a ratio of 5:1. ADP was utilized and converted to AMP (Figure 3-8). The rate of reaction was 2.967 µM·min⁻¹, which is 17 % higher than that with ATP as the phosphate donor.
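The rate comparisons quoted for the four sets can be reproduced from the reported initial rates. The sketch below simply recomputes the fold change between sets 1 and 2 and the percentage difference between sets 3 and 4; the function names are illustrative.

```python
# Initial rates in µM/min as reported in the text for reaction sets 1 to 4
INITIAL_RATES = {1: 0.781, 2: 1.795, 3: 2.521, 4: 2.967}


def fold_change(new, reference):
    """Ratio of a rate to a reference rate."""
    return new / reference


def percent_increase(new, reference):
    """Relative increase of a rate over a reference rate, in percent."""
    return 100.0 * (new - reference) / reference


set2_vs_set1 = fold_change(INITIAL_RATES[2], INITIAL_RATES[1])
set4_vs_set3 = percent_increase(INITIAL_RATES[4], INITIAL_RATES[3])

print(f"set 2 vs set 1: {set2_vs_set1:.2f}-fold")
print(f"set 4 vs set 3: {set4_vs_set3:.1f} % higher")
```

The first value rounds to the 2.3-fold increase reported for the five-fold higher IspE concentration, and the second evaluates to about 17.7 %, in line with the 17 % stated for ADP relative to ATP as the phosphate donor.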

[Figure 3-8: plot of ADP and AMP concentration (µM) against time (min).]

Figure 3-8 HPLC analysis of set 4 reactions when ADP is used as a phosphate donor.

The HPLC analyses gave insights about the dynamics of the reaction but failed to detect

MEP and ME-2,4-PP.

LC-MS analyses of IspE reaction mixture

LC-MS analysis was performed on the set 3 and set 4 reaction samples. The MEP peak was identified using the standard. Chemical synthesis of ME-2,4-PP for use as a standard was attempted several times but failed. Hence, the online tool Competitive Fragmentation Modeling for Metabolite Identification (CFM-ID) was used to predict the probable fragmentation pattern of the product. The observed spectrum (Figure 3-9) closely corresponded to the peaks predicted for ME-2,4-PP (highlighted in the figure). The peak at m/z 296.64 is the molecular ion (M⁻) peak, and the peaks at m/z 280.71 and 264.71 are the major fragments.

[Figure 3-9: mass spectrum, ion counts against m/z, with the major peaks labelled at m/z 264.71, 280.71 and 296.64.]

Figure 3-9 Mass spectra of the peak identified as ME-2,4-PP in IspE reactions.

Additionally, spectral predictions for ME-3,4-PP gave fragments at m/z 153 and 155, corresponding to the carbonyl (Figure 3-10(a)) and hydroxy (Figure 3-10(b)) monophosphates, respectively. These tautomeric rearrangements are only possible when the phosphate is present at the C3 position and a methyl and a hydroxy group are present at the C2 position. The spectra of the sample lacked these fragments, which confirms that the ME-3,4-PP adduct is absent.
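The absence of the two diagnostic fragments can be illustrated with a simple tolerance search over the peak labels printed on Figure 3-9. This is a sketch only: the list contains the labelled m/z values of that figure rather than the full raw spectrum, and the matching window of ±0.5 m/z units is an assumption chosen for illustration.

```python
# m/z values of the peaks labelled in Figure 3-9 (labelled peaks only, not raw data)
LABELLED_PEAKS = [
    68.11, 96.97, 121.74, 144.87, 164.84, 180.8, 201.76, 216.8, 233.72,
    261.88, 264.71, 272.74, 280.71, 281.7, 294.72, 296.64, 315.71, 329.2,
    343.11, 355.4, 366.92, 378.72, 387.71, 397.03, 410.61, 418.95, 426.5,
    435.54, 447.68, 454.73, 464.54, 471.49, 480.43, 490.66, 498.67,
]


def nearest_peak(mz, peaks):
    """Labelled peak closest to the queried m/z value."""
    return min(peaks, key=lambda p: abs(p - mz))


def has_peak(mz, peaks, tol=0.5):
    """True if any labelled peak lies within +/- tol of the queried m/z (tol is assumed)."""
    return abs(nearest_peak(mz, peaks) - mz) <= tol


for fragment in (153.0, 155.0):  # fragments predicted for ME-3,4-PP
    closest = nearest_peak(fragment, LABELLED_PEAKS)
    print(f"m/z {fragment:.0f}: present = {has_peak(fragment, LABELLED_PEAKS)}, "
          f"nearest labelled peak = {closest}")
```

Neither predicted fragment has a labelled peak within the assumed window; the nearest labelled peaks lie at m/z 144.87 and 164.84.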

[Figure 3-10: chemical structures of the two monophosphate fragments, (a) and (b).]

Figure 3-10 The major mass fragments of ME-3,4-PP.

Computational analysis

The canonical substrate (MEP) and the alternate substrate (ME-2,4-PP) were docked at the IspD active site. The distance between the phosphate of CTP and the 4’-phosphate oxygen of the canonical substrate MEP is 6 Å, whereas the distance between CTP and the alternate substrate ME-2,4-PP is 5.6 Å (Figure 3-11). The distance between the phosphate of CTP and the 2’-phosphate oxygen of ME-2,4-PP is 8.3 Å. This suggests that the addition of CMP at 4’-phosphate is unlikely.

Figure 3-11 Canonical substrate (MEP) and alternate substrate (ME-2,4-PP) docked on

IspD.

The substrates for IspD and IspE were docked (Figure 3-12), which revealed that they bind in the same active pocket but in different regions of the pocket. CDP-ME is the canonical substrate for IspE, whereas MEP is an alternate substrate (Figure 3-13).

Figure 3-12 Substrate alignments in the active site for canonical MEP pathway and

bifurcation pathway. Top left image: MEP and CTP in IspE, top right image: ME-2,4-PP and

CTP in IspE, bottom left image: CDP-ME and ATP in IspE, bottom right image: MEP and ATP

in IspE.

Figure 3-13 (a) Canonical substrate (CDP-ME) and (b) alternate substrate (MEP) docked on IspE.

3.4 Discussions

The discovery and study of IspDF led me to study the MEP pathway cascade. The in vitro study revealed a plausible bifurcation step in the pathway. The initial bioluminescent assay showed ATP consumption without a substantial rise in ADP. To investigate the fate of ADP, HPLC analysis was conducted, which revealed accumulation of AMP by dephosphorylation of ADP.

Reactions were then conducted using ADP as the sole phosphate donor and analyzed by LC-MS. This showed that the kinase IspE can accept ATP as well as ADP as a phosphate donor to phosphorylate MEP to ME-2,4-PP. ME-2,4-PP was later confirmed by its mass fragmentation pattern. The rates of this bifurcation reaction are very low compared to the canonical reaction, and the conversion did not reach 100 %.

Since chemical synthesis of ME-2,4-PP was unsuccessful, the final reaction product could not be fully characterized. This study provides enough evidence to warrant a detailed investigation of the IspD, IspE and IspF cascade as future work.

Chapter 4: Study of catalytic promiscuity of terpene synthases

4.1 Introduction

Terpenoids are secondary metabolites that form part of plants’ defense mechanisms. They also have chemical and biological significance. Terpenoids have been used as anti-infectives, anticancer agents, antioxidants, pesticides, etc., since they are naturally tailored to bind to physiological targets (cellular components and receptors). The structure of a typical terpene comprises a hydrocarbon scaffold with multiple chiral centers and some heteroatoms. These attributes make terpenes lipophilic and volatile, which makes them particularly attractive to the cosmetic, flavor and fragrance industries. These features also help to improve their bioavailability at natural targets. They have a high calorific value because of the hydrocarbon backbone and hence are also explored as fuel additives. Such highly useful properties of the diverse class of terpenoids result from the promiscuous nature of the terminal steps of the biosynthetic pathway. These two terminal steps are catalyzed by catalytically promiscuous enzymes, terpene synthases (TSs) and P450 monooxygenases, in the order in which they take part in the biosynthesis. Terpene synthases act on the products of prenyltransferases, generating a pool of hydrocarbon scaffolds. P450 monooxygenases then functionalize each of the scaffolds into more diverse structures. Hence, efforts to improve selectivity towards one terpenoid product must start by addressing TS promiscuity.

Terpenes occur in low concentrations in their natural hosts, such as plants, because of their physiological purpose and need. Moreover, the production of terpenoids in plants is limited by several geographical and climatic factors 122,123 , and farming plants for terpenoid production is a threat to biodiversity. Extraction and purification of the desired terpenoid from a plant’s complex matrix require arduous techniques 124–126 , making the process uneconomical. Relying on the natural supply of these trace terpenoids is non-viable at industrial scale 127 . For example, the price and availability of limonene in the market are determined by cultivation in South American countries and are prone to fluctuations due to climatic variations and political instability. The uncertain and low availability of isoprenoids has discouraged their exploration for large-scale applications.

The use of engineered microbial cell factories such as bacteria has proved to be a more viable and sustainable option. This requires engineering the biosynthetic pathway in the heterologous host. It provides advantages such as the utilization of inexpensive carbon 128,129 , the ability to improve titer by genetic manipulation 81,104,130,131 , a faster and stable rate of production, engineered selectivity towards a single product and simpler downstream processing. Examples of developed commercial processes include isoprene, farnesene, artemisinin and squalene from Amyris; valencene, nootkatone and sandalwood oil from DuPont; and elemene and patchouli oil from Isobionics 5 . The findings on flux improvement from chapters 2 and 3 can be used to build a modular platform strain. When doing so, multiple products are undesirable because they not only lead to lower titers but also increase the cost of downstream purification. TS promiscuity restricts the potential use of heterologous expression as a means of production. It needs to be studied before engineering the enzymes and the pathway.

Catalytic promiscuity of TSs

TSs are the most diverse class of enzymes and yield a plethora of products by simple catalytic steps. The flexible polyisoprenyl diphosphate substrate (GPP, FPP, GGPP, etc.) occupies the TS active site depending on the site contour. The first step of the transformation involves the generation of a reactive substrate carbocation through abstraction of pyrophosphate. This step is mediated by a divalent metal ion such as Mg²⁺, and the DDxxD motif plays a role in stabilizing the product as well as the ionic pyrophosphate group 132 . The carbocation then undergoes cascades of hydride shifts and/or cyclizations, generating diverse transition state structures (TSS). The cation is then neutralized either by proton loss or by nucleophilic attack to generate terpenes. The trajectory of this cascade is determined by the stability of the substrate as well as of the intermediary steps. The promiscuity in the carbocation neutralization cascade determines the extent of promiscuity of the TS.

Factors responsible for TS promiscuity

The promiscuous nature of an enzyme is an intermediary step in the evolutionary trajectory (Figure 4-1) of an ancient, catalytically selective enzyme towards a newer function 133 . This is evident from the occurrence, in a single species, of more than one TS from the same class with high sequence similarity to each other. Phylogenetic analysis of these TSs also reveals their ancestral origin 134–136 .

Figure 4-1 Enzyme evolution trajectory to achieve the newer function (reproduced 133 )

The TS active site molds itself on substrate binding to a more product-like orientation, for example, the ‘open’ and ‘closed’ conformations of aristolochene synthase from Aspergillus terreus 137 . Reactions of terpene synthases are thermodynamically favorable and generate a single product if the active site contour is more product-like. If the contour is less product-like, the TS is promiscuous and generates multiple products 138 . As a consequence, one or a few amino acid changes in the active site make a huge change in product specificity 139–142 .

Apart from intrinsic parameters such as evolution and active site structure, extrinsic parameters such as the enzyme environment play a role as well. In vitro reactions of recombinant E. coli lysate expressing taxadiene synthase from Taxus brevifolia with GGPP yielded a single product, taxa-4(5),11(12)-diene 143 . The same enzyme, when expressed along with the MEP pathway cassette and GGPP synthase in E. coli K12, produced a mixture of taxadiene and isotaxadiene derivatives in vivo 144 . The selectivity for taxadiene was reported to be 96.1 % in vivo. When the same enzyme was expressed in the E. coli BL21(DE3) strain, it produced only 80 % taxadiene in vivo 137 . The effects of culture conditions and heterologous host on product yield are well studied, but their influence on catalytic promiscuity has not been explored systematically. Thorough knowledge of such factors contributing to the loss of flux can help to optimize the system further.

Study of TS activity and promiscuity

TSs fall into two main classes depending on their mechanism of generating a carbocation from prenyl diphosphate substrates. Class I TSs use divalent metal-aided chemistry to generate the cation and inorganic pyrophosphate, whereas class II TSs protonate the substrate double bond, aided by an acidic residue (such as aspartate), to yield a carbocation. MTSs exhibit the class I mechanism, and the active site is located in α-helical folds 145 . The TSs studied here have been reported to belong to class I.

The diterpene synthases from Norway spruce, isopimaradiene synthase (PaIso) and levopimaradiene/abietadiene synthase (PaLAS), exhibit very different catalytic selectivity even though they share 91 % protein sequence similarity. After homology modeling, the authors identified the catalytic AC domain in both TSs. When this domain was swapped, the catalytic product profiles were reversed. When the two specific amino acids identified in both TSs (H694 and S721 in PaIso, and Y686 and A713 in PaLAS) were swapped, the product outcome was not entirely reversed. The site-directed mutation A713D in PaLAS extinguished its native catalytic activity and produced 96 % of PaIso activity. This study identified four mutations that reciprocally reversed the catalytic profiles of these TSs 145 . This approach, in which the active site structure is identified by homology modeling, followed by identification of residues and mutagenesis to assess their impact, led to the recognition of catalytic plasticity as a crucial property of TSs. The rate-limiting step in this strategy is the generation of mutants, their characterization and the iterative process until the desired outcomes are achieved.

Another such study identified a disparity between the crystal structure and the catalytically active structure. The classical crystal structure reported for taxadiene synthase (TXS) is an ‘open’ structure with GGPP oriented in a catalytically non-feasible conformation. This structure was reported to be catalytically non-functional, and a new ‘closed’ structure was postulated based on molecular dynamics (MD) and docking studies 145 . The new structure verified the induced-fit mechanism involving the R754, S713, V714 amino acid triad. The study is an excellent example of the delineation of a TS reaction mechanism in which the discrepancy with the active form of the enzyme is deduced from docking studies. The inputs for the study were provided by in vitro and in vivo TS expression.

For MTSs, apart from the involvement of the aspartate-rich DDxxD motif at the active site, a study revealed the presence of a tandem arginine (RR) motif upstream. Deletion of all the amino acids upstream of it, including the RR motif, in limonene synthase from Mentha spicata rendered the enzyme inactive towards GPP as a substrate 146 . This also implied that the RR motif is required for the isomerization of GPP to linalyl pyrophosphate (LPP). Another study on the same enzyme reported electrostatic interactions of the first arginine residue (R58) of the RR motif with the E363 residue and H-bond interactions of the second arginine residue (R59) with the side chains of V357 and Y435. These interactions anchor the arginine side chains to the outside of the active site, thereby maintaining the closed active site structure 147 . The same study also reported that R315 forms an H-bond with the phosphoester oxygen of pyrophosphate after isomerization to LPP, whereas D496 forms an H-bond with the phosphoester oxygen of the pyrophosphate of GPP 147 .

In addition, a study of bornyl synthase from Salvia officinalis and its crystal structure revealed the role of R314 and R493 in H-bonding with the pyrophosphate and the presence of aromatic residues whose π-cation interactions stabilize the cationic structure 148 . Cineole synthase from Salvia fruticosa contains an N338 residue that is reported to activate the water molecule for hydroxylation, yielding terpenes with hydroxy groups 148 .

Homology models of pinene synthase from Abies grandis revealed the presence of R184, R365 and C543 residues at the active site 149 . These experiments on MTS activity were based on the enzyme crystal structure and homology model development.

The use of molecular dynamics (MD) and quantum mechanics (QM) approaches alone falls short in defining effective boundary conditions and results in simulations performed in a non-natural environment (such as a gas-phase model or water as solvent) 150 . Though the approach offers higher throughput, QM calculations involve a wide range of approximations, and the resulting speculations are often limited by the resolution of the system and necessitate verification by wet lab experiments. Despite advancements in our understanding of these enzymes, much remains to be learned about TS biocatalysis, especially their chemical control and how individual TSs diversified from the ancestral superfamily.

Research variables and scope

The variables in the study are the flux of substrate carbon, the source of the TSs and the structural diversity of the products. I designed a controlled study in which the substrate carbon content and TS source were kept the same, and TSs that generate a diverse class of products were selected.

I selected monoterpene synthases as the prototype because they are widely distributed among plants and have diverse product profiles. Monoterpenes are acyclic (hydrocarbons such as myrcene and hydroxylated compounds such as linalool), monocyclic or bicyclic (with a combination of 6- and 3-membered rings in carene or a combination of 6- and 4-membered rings in pinene). The second factor taken into consideration is the evolutionary pressure on the selected enzymes. Since enzymes isolated from a single plant have experienced similar evolutionary pressures, I selected Norway spruce (Picea abies), a coniferous tree, as the model source. The isolation and characterization of its MTSs have been reported 136,151 , and the genes were gifted to us by Joerg Bohlmann from his Treenomix consortium. In this work, I started developing a strategy to compare the in vitro and in vivo product distributions of TSs. These product profiles then guided the docking of substrates, transition states and products to deduce the catalytic mechanism. This study involves four enzymes, carene synthase, myrcene synthase, limonene synthase and linalool synthase, and their successful expression in E. coli without protein engineering.

4.2 Materials and methods

Strains and plasmid construction

All strains, plasmids and genes used in this study are listed in Table 4.1. The genes dxs, ispD, ispF and idi were amplified from the genome of E. coli strain K12. GPPS was amplified from pTrc-trGPPS(CO)-LS, a gift from Jay Keasling (Addgene plasmid # 50603) 152 . The genes CarTS, MyrTS, LimTS and LinTS were kindly provided by Joerg Bohlmann 136 and were suitably amplified. All constructs were verified by sequencing at Genewiz Inc., USA.

Table 4.1 Strains, plasmids and genes

Strains                             Description                                        Source
E. coli DH5α                        Cloning strain                                     New England Biolabs (#C2987)
E. coli BL21(DE3)                   Expression strain                                  New England Biolabs (#C2527)
E. coli BL21-CodonPlus (DE3) RIPL   Expression strain                                  Agilent Technologies (#230280)
E. coli strain K12                  Gene amplification                                 Sigma-Aldrich (#EC1)
SACar                               E. coli BL21(DE3) transformed with pSACar          This study
SAMyr                               E. coli BL21(DE3) transformed with pSAMyr          This study
SALin                               E. coli BL21(DE3) transformed with pSALin          This study
SALim                               E. coli BL21(DE3) transformed with pSACar          This study
SALim+                              BL21-CodonPlus (DE3) transformed with pSACar       This study

Plasmids   Description                                                                                       Source
pSACar     AmpR; GPPS and CarTS under trc promoter; dxs, ispD, ispF and idi under T7 promoter; pBR322 ori    This study
pSAMyr     AmpR; GPPS and MyrTS under trc promoter; dxs, ispD, ispF and idi under T7 promoter; pBR322 ori    This study
pSALim     AmpR; GPPS and LimTS under trc promoter; dxs, ispD, ispF and idi under T7 promoter; pBR322 ori    This study
pSALin     AmpR; GPPS and LinTS under trc promoter; dxs, ispD, ispF and idi under T7 promoter; pBR322 ori    This study

Genes    Description                                              Source
dxs      1-Deoxy-D-xylulose-5-phosphate synthase                  NCBI Gene ID: 945060
ispD     2-C-Methyl-D-erythritol 4-phosphate cytidylyltransferase NCBI Gene ID: 948269
ispF     2-C-Methyl-D-erythritol 2,4-cyclodiphosphate synthase    NCBI Gene ID: 945057
idi      Isopentenyl-diphosphate Delta-isomerase                  NCBI Gene ID: 949020
GPPS     Geranyl pyrophosphate synthase                           152
CarTS    (+)-3-Carene synthase from Picea abies                   136
MyrTS    Myrcene synthase from Picea abies                        136
LimTS    (-)-Limonene synthase from Picea abies                   136
LinTS    (-)-Linalool synthase from Picea abies                   136

Media and growth conditions

E. coli DH5α was used for plasmid construction and cultivated at 37 °C in Luria-Bertani (LB) medium (purchased from Sigma-Aldrich). E. coli BL21(DE3) and E. coli BL21-CodonPlus (DE3) were used as expression hosts. Successful transformants of SACar, SAMyr, SALin and SALim were selected on LB agar plates containing 100 µg/mL ampicillin. Successful transformants of SALim+ were selected on LB plates containing 100 µg/mL ampicillin and 50 µg/mL chloramphenicol.

For flask fermentation, a single clone was inoculated into medium containing the required antibiotics and supplemented with magnesium chloride (procured from Thermo Fisher Scientific), and grown at 37 °C for 6 h. The inoculum was diluted to 15 mL with the medium to an OD600 of 0.2 and allowed to grow at 37 °C. Cultures were induced with isopropyl β-D-1-thiogalactopyranoside (IPTG, purchased from Sigma-Aldrich) at an OD600 of 0.8 and overlaid with dodecane to trap the monoterpenes produced. Fermentations were continued at 37 °C, 30 °C, 22 °C and 16 °C. 30 mL glass tubes and 250 mL Erlenmeyer flasks were used, and the cultures were shaken at 220 rpm. Fermentations were carried out aerobically and micro-aerobically. For micro-aerobic fermentations, the tubes were capped with neoprene rubber stoppers and crimped, whereas the flasks were closed with Teflon-lined screw caps after induction. Since terpenoids are volatile, the conditions were optimized initially for the SACar clone. LB, modified LB (LBY; LB medium supplemented with 5 g/L yeast extract), terrific broth (TB; containing 24 g/L yeast extract, 20 g/L tryptone, 4 mL/L glycerol, 0.017 M monobasic potassium phosphate and 0.072 M dibasic potassium phosphate), 2x YT (16 g/L tryptone, 10 g/L yeast extract and 5 g/L sodium chloride, pH adjusted to 7) and Hi-Def Azure (HD, supplemented with 20 g/L glycerol) media were employed in the fermentations. Hi-Def Azure medium was purchased from Teknova, and all other media components were purchased from Sigma-Aldrich.

In vitro reactions

Proteins were extracted from the cell pellets of 20 mL cultures and used to carry out in vitro reactions. The lysis buffer consisted of 50 mM Tris-HCl (pH 8.0), 10 % (v/v) glycerol, 0.1 % (v/v) Triton X-100, 1 mM phenylmethylsulfonyl fluoride (PMSF), 2 mM magnesium chloride and 3 units/mL DNase. The samples were sonicated to lyse the cells completely. The specific protein was purified from the lysate. Both the total protein extract and the purified protein solution were exchanged into reaction buffer by ultrafiltration and tested for in vitro activity using geranyl pyrophosphate (GPP) as the substrate. The in vitro reactions consisted of 50 mM Tris-HCl (pH 8.0), 100 mM NaCl, 10 % (v/v) glycerol, the purified protein, 0.2 mM GPP and 0.4 mM MnCl2, overlaid with 1 mL of hexane 136 . Reactions were initiated by addition of the substrate, allowed to proceed at 22 °C and terminated by vortexing. The hexane layer was sampled and analyzed by GC-MS. All chemicals used were purchased from Sigma-Aldrich.

Metabolite analyses

Metabolites were analyzed by GC-MS. The system consisted of a PerkinElmer Clarus 680 gas chromatograph and a PerkinElmer Clarus SQ8T mass spectrometer (70 eV). An HP-5MS capillary column (25 m long, 0.2 mm internal diameter, 0.33 µm film thickness; Agilent Technologies) was used, with helium (1 mL/min) as the carrier gas. The oven temperature program was: 60 °C for 3 min, 5 °C/min to 100 °C, 40 °C/min to 270 °C and hold for 1 min. The injector was maintained at 250 °C. Dodecane from the overlay was diluted 100-fold, injected in split mode with a 20:1 split ratio and detected on the total ion chromatogram. Qualitative and quantitative analyses were done by monitoring m/z 93, which is a predominant monoterpene fragment. Peak identities were confirmed by running monoterpene standards, and the peaks were quantified using the respective calibration curves.
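The total run time implied by the oven temperature program can be worked out segment by segment. The sketch below encodes the program stated above as a list of hold and ramp segments; the tuple layout and the function name are illustrative choices, and the calculation ignores oven equilibration and cool-down time.

```python
# Oven program from the GC method above: (segment type, target temperature in °C, value).
# For a "hold" the value is a duration in min; for a "ramp" it is a rate in °C/min.
OVEN_PROGRAM = [
    ("hold", 60.0, 3.0),
    ("ramp", 100.0, 5.0),
    ("ramp", 270.0, 40.0),
    ("hold", 270.0, 1.0),
]


def run_time_min(program):
    """Total duration of an oven program in minutes (holds plus ramps)."""
    total = 0.0
    current = program[0][1]  # starting temperature
    for kind, target, value in program:
        if kind == "hold":
            total += value
        elif kind == "ramp":
            total += abs(target - current) / value
        else:
            raise ValueError(f"unknown segment type: {kind}")
        current = target
    return total


print(f"total run time: {run_time_min(OVEN_PROGRAM):.2f} min")
```

The segments contribute 3 min, 8 min, 4.25 min and 1 min, giving a run time of 16.25 min per injection.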

Computational analysis

The homology models for CarTS, MyrTS, LinTS and LimTS were generated by using I-

TASSER 153 using α-bisabolene synthase (PDB ID 3SAE), taxadiene synthase (PDB ID 3P5P) and

4S-limonene synthase (PDB ID 2ONH) as templates. The structures of all molecules that

participate in the catalytic cycle were prepared in Marvin 120. The molecules were then docked into the active site of the enzymes using AutoDock Vina 121. The input to the docking calculation comprised an initial set of atomic coordinates, and the program outputs putative poses of the molecule within the active site. A corresponding Gibbs free energy of binding (ΔG) was also computed for each pose. To identify the pose that yields the global minimum for ΔG, we selected the most stable pose and re-used its coordinates as the initial value for a subsequent docking calculation. We iterated this procedure until the output pose no longer changed.
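The iterative re-docking described above is, in effect, a fixed-point iteration on the pose. The following minimal sketch shows only the loop structure. The function toy_dock is a hypothetical stand-in for a docking run and does not call AutoDock Vina; the convergence tolerance, the round limit and the toy score are likewise assumptions made for illustration.

```python
import math


def iterate_docking(dock, start_pose, tol=0.01, max_rounds=50):
    """Re-dock from the best pose until the pose stops moving.

    dock: callable taking a pose (list of floats) and returning
          (best_pose, delta_g). Here it is a placeholder for a docking run.
    Returns (pose, delta_g, rounds_used).
    """
    pose = list(start_pose)
    for round_no in range(1, max_rounds + 1):
        new_pose, delta_g = dock(pose)
        shift = math.dist(pose, new_pose)  # how far the pose moved this round
        pose = list(new_pose)
        if shift < tol:
            return pose, delta_g, round_no
    raise RuntimeError("pose did not converge within max_rounds")


def toy_dock(pose):
    """Hypothetical docking stand-in: moves halfway towards a fixed optimum."""
    optimum = [1.0, -2.0, 0.5]
    new_pose = [p + 0.5 * (o - p) for p, o in zip(pose, optimum)]
    # Toy score that becomes more negative near the optimum (not a real energy).
    delta_g = -5.0 + math.dist(new_pose, optimum)
    return new_pose, delta_g


final_pose, final_dg, rounds = iterate_docking(toy_dock, [0.0, 0.0, 0.0])
print(f"converged after {rounds} rounds, toy score {final_dg:.3f}")
```

In a real workflow the placeholder would be replaced by a call that writes the current coordinates, runs the docking program and reads back the lowest-energy pose; a stopping criterion based on pose displacement is one reasonable way to express "an unchanging pose".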


4.3 Results

Plasmid construction and protein expression

GPPS (Abies grandis) 104 and each of the terpene synthases were cloned in the pSASDFI

operon (Section 2.2.2). Figure 4-2 shows images of the SDS/PAGE gels stained with Coomassie dye.

Figure 4-2 Protein expression and analysis on SDS/PAGE. (a) Gel image for GPPS

expression, lane 1: total protein extract of E. coli BL21(DE3), lane 2: protein ladder, lane 3: total

protein extract of engineered E. coli for GPPS uninduced, lane 4: total protein extract of

engineered E. coli for GPPS with the band corresponding to GPPS (32.4 kDa) in the red box; (b) Gel

image for (His)6-MTS (induction with 0.5 mM IPTG) purified by Ni-NTA column, lane 1:

myrcene synthase induced, lane 2: carene synthase induced, lane 3: myrcene synthase uninduced,

lane 4: carene synthase uninduced, lane 5: protein ladder, lane 6: limonene synthase uninduced,

lane 7: linalool synthase uninduced, lane 8: limonene synthase induced, lane 9: linalool synthase

induced. The bands corresponding to the specific protein are highlighted with a red box.


LinTS could not be detected on SDS/PAGE when expressed in the E. coli BL21(DE3) strain. When the plasmid was transformed into E. coli BL21-CodonPlus(DE3) RIPL, LinTS protein could be detected in the cell extract. Since the antibiotic resistance protein (beta-lactamase) has a size of 34.8 kDa, its band appears in Figure 4-2(a), lane 3. After induction, however, the band widens, corresponding to GPPS; this was also confirmed by protein-tag experiments (Figure 4-3).

Figure 4-3 Protein expression of (His)6-GPPS and analysis on SDS/PAGE. Lane 1: protein

ladder, lane 2: purified uninduced GPPS (5 µg), lane 3: purified uninduced GPPS (10 µg), lane 4:

purified induced (with 0.5 mM IPTG) GPPS (5 µg), lane 5: purified induced (with 0.5 mM

IPTG) GPPS (10 µg)

MTSs possess a transit peptide sequence at their N termini that localizes them to the plastid.

The sequence was removed, and the remaining sequence was successfully cloned with other genes in the operon. The plasmid maps generated by Genewiz software are shown in Figure 4-4.


Figure 4-4 Plasmid maps for MTS expression system. (a) Myrcene synthase, (b) Linalool

synthase, (c) Carene synthase, (d) Limonene synthase

Study of parameters on SACar fermentations

Two variables were optimized for fermentative production of terpenoids in recombinant E. coli: culture vessel and aeration. Four conditions (Figure 4-5) were tested for SACar in LBY media in tubes and flasks. Cultures were grown for 18 h at 30 °C after induction with 1 mM IPTG, with a 10 % (v/v) dodecane overlay.

Figure 4-5 Images of cultures (a) aerobic tube culture, (b) microaerobic tube culture, (c)

aerobic flask culture, (d) microaerobic flask culture.

The tube cultures produced less biomass as well as less carene (Figure 4-6). Carene production improved 8- to 20-fold when tubes were replaced by flasks for fermentation. The biomass density was marginally lower in microaerobic flask cultures, which had a higher normalized carene titer

(µg/L/OD).
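The normalized titer used throughout this chapter is the volumetric titer divided by the optical density of the culture. A small sketch follows, assuming a simple ratio; the readings in the example are hypothetical and are not data from this study.

```python
def normalized_titer(titer_ug_per_l, od600):
    """Return the OD-normalized titer (µg/L/OD) as titer divided by OD600."""
    if od600 <= 0:
        raise ValueError("OD600 must be positive")
    return titer_ug_per_l / od600


# Hypothetical readings (titer in µg/L, OD600), for illustration only.
cultures = {
    "culture A": (6.0, 12.0),  # higher titer, much higher biomass
    "culture B": (5.0, 7.0),   # slightly lower titer, lower biomass
}

for name, (titer, od) in cultures.items():
    print(f"{name}: {titer:.1f} µg/L, {normalized_titer(titer, od):.2f} µg/L/OD")
```

With these made-up numbers, culture A has the higher volumetric titer but the lower normalized titer, which is the kind of reversal the two Y-axes in the figures are meant to expose.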

[Figure 4-6: bar chart of carene produced (µg/L, primary Y-axis) and carene produced (µg/L/OD, secondary Y-axis) for conditions TA, TM, FA and FM.]

Figure 4-6 Effect of culturing conditions on SACar. TA: aerobic tube culture, TM:

microaerobic tube culture, FA: aerobic flask culture, FM: micro-aerobic flask culture. Primary

Y-axis is carene titer and secondary Y-axis is normalized carene titer. The other culture

conditions were: 30 °C temperature, LBY media (supplemented with the antibiotic and 2 mM

MgCl2), 18 h incubation time and 10 % dodecane overlay.

The next factor studied was incubation temperature after induction in microaerobic flask cultures (Figure 4-7). Cultures at temperatures ≤22 °C had a lower OD600 but higher carene production. At higher temperatures, cells grew faster but made less carene. Hence, 22 °C was determined to be optimal for the objective and was used for all subsequent experiments.

[Figure 4-7: bar chart of carene produced (µg/L, primary Y-axis) and carene produced (µg/L/OD, secondary Y-axis) at incubation temperatures of 16, 22, 30 and 37 °C.]

Figure 4-7 Effect of incubation temperature on carene production. Primary Y-axis is carene

titer and secondary Y-axis is normalized carene titer. The other culture conditions were: micro-

aerobic flask cultures, LBY media (supplemented with the antibiotic and 2 mM MgCl2), 18 h

incubation time and 10 % dodecane overlay.

Study of strain and fermentation parameters on monoterpene production

Strains SACar, SAMyr and SALin, when grown in LB media in shake flask fermentation, produced monoterpenes. Strain SALim, when grown in LB as well as other media, did not produce detectable levels of any monoterpene. pSALim transformed in E. coli BL21-CodonPlus(DE3)

RIPL produced monoterpenes. E. coli BL21-CodonPlus(DE3) RIPL contains extra copies of rare tRNA genes to aid expression of heterologous proteins.

Four fermentation parameters (growth media, divalent ion concentration, percent dodecane overlay and quantity of inducer) were studied for the engineered strains SACar, SAMyr, SALin and SALim+. Each parameter was optimized sequentially. I had found that, because of the volatile nature of the products, sealed cultures maintain the most consistent environment required for the study. All these experiments were therefore performed in micro-aerobic fermentative conditions in screw-cap shake flasks. I also observed that the biomass produced had a direct relation with monoterpene titers and that each strain produced a mixture of monoterpenes. The titer of the major product was monitored in each case (carene for SACar, myrcene for SAMyr, linalool for SALin and limonene for SALim+). Although strains SACar and SALim+ gave high titers in TB, TB also generated far more biomass, and LBY outperformed TB once OD600 was factored into the titers of SACar and SALim+ (Figure 4-8). SAMyr consistently displayed higher myrcene production in LBY than in any other media tested. Linalool titers were more than three times higher in TB medium than in the others. All further analyses on SACar, SAMyr and SALim+ were performed in LBY media, whereas those on SALin were performed in TB media.

Magnesium ions, which are required for the catalytic activity of MTSs 136, were supplemented in the media optimized earlier for each strain. I observed that Mg2+ is required for monoterpene production but did not have any effect on cell growth (Figure 4-9). A concentration of 2 mM magnesium chloride in the media was found to be optimal and was employed in all subsequent experiments.

[Figure 4-8: bar charts of terpene produced (µg/L, primary Y-axis) and terpene produced (µg/L/OD, secondary Y-axis) in LB, LBY, 2YT, TB and HD media; panels: (a) carene, (b) myrcene, (c) linalool, (d) limonene.]

Figure 4-8 Effect of growth media on terpene production. (a) SACar, (b) SAMyr, (c) SALin,

(d) SALim+. Primary Y-axis is terpene titer and secondary Y-axis is normalized terpene titer.

The fermentation conditions were: micro-aerobic flask cultures, 2 mM MgCl2, 18 h incubation

time and 10 % dodecane overlay.

[Figure 4-9: bar charts of terpene produced (µg/L, primary Y-axis) and terpene produced (µg/L/OD, secondary Y-axis) at MgCl2 concentrations of 1, 2 and 3 mM; panels: (a) carene, (b) myrcene, (c) linalool, (d) limonene.]

Figure 4-9 Effect of Mg2+ concentration on terpene production. (a) SACar in LBY media, (b)

SAMyr in LBY media, (c) SALin in TB media, (d) SALim+ in LBY media. Primary Y-axis is terpene titer and secondary Y-axis is normalized terpene titer. The fermentation conditions were:

micro-aerobic flask cultures, 18 h incubation time and 10 % dodecane overlay.

Monoterpenes are volatile and cytotoxic. In situ biphasic extraction serves the purpose of capturing monoterpenes and alleviating toxicity 136. A 10 % (v/v) dodecane overlay was observed to be sufficient for the capture of monoterpenes in cultures of SACar, SAMyr and SALim+. However, SALin required a 30 % (v/v) dodecane overlay (Figure 4-10). These optimized dodecane overlays were used in subsequent experiments.

[Figure 4-10: bar charts of terpene produced (µg/L, primary Y-axis) and terpene produced (µg/L/OD, secondary Y-axis) at dodecane overlays between 5 and 40 % (v/v); panels: (a) carene, (b) myrcene, (c) linalool, (d) limonene.]

Figure 4-10 Effect of dodecane overlay on terpene production. (a) SACar in LBY media, (b)

SAMyr in LBY media, (c) SALin in TB media, (d) SALim+ in LBY media. Primary Y-axis is terpene titer and secondary Y-axis is normalized terpene titer. The fermentation conditions were:

micro-aerobic flask cultures, 2 mM MgCl2 and 18 h incubation time.

Induction with IPTG was the next step in optimization. Both the MEP pathway operon and the terpene biosynthesis operon were cloned under IPTG-inducible promoters. The strains were observed to require differing amounts of IPTG for optimal induction (Figure 4-11). The optimal IPTG concentrations were 0.7 mM for SACar, 0.25 mM for SAMyr, 0.2 mM for SALim+ and 0.1 mM for SALin. Higher concentrations of IPTG had a negative effect on cell growth and subsequent monoterpene production, which is consistent with the literature. This effect was pronounced for SALim+ cultures.

[Figure 4-11: bar charts of terpene produced (primary Y-axis) and OD-normalized terpene produced (secondary Y-axis) at IPTG concentrations between 0 and 1 mM; panels: (a) carene, (b) myrcene, (c) linalool, (d) limonene.]

Figure 4-11 Effect of IPTG induction on terpene production. (a) SACar, (b) SAMyr and (d)

SALim+ in LBY media with 10 % dodecane overlay; (c) SALin in TB media with 30 %

dodecane overlay. Primary Y-axis is terpene titer and secondary Y-axis is normalized terpene

titer. The fermentation conditions were: micro-aerobic flask cultures, 2 mM MgCl2 and 18 h

incubation time.

Table 4.2 summarizes the fermentation conditions that were found to give higher terpenoid titers within the selected sample space.

Table 4.2 Summary of fermentation conditions selected

Strain    Growth media    Magnesium chloride        IPTG concentration    Dodecane overlay
                          concentration (mM)        (mM)                  (% v/v)
SACar     LBY             2                         0.70                  10
SAMyr     LBY             2                         0.25                  10
SALim+    LBY             2                         0.20                  10
SALin     TB              2                         0.10                  30

Aerobic and micro-aerobic fermentations

All shake flask fermentations were run using optimized parameters. SACar, SAMyr and

SALim+ fermentations were run in LBY media supplemented with 2 mM MgCl2, overlaid with

10 % (v/v) dodecane. SACar was induced with 0.7 mM IPTG, SAMyr with 0.25 mM IPTG and

SALim+ with 0.2 mM IPTG. SALin fermentations were run in TB media supplemented with 2 mM MgCl2, overlaid with 30 % (v/v) dodecane and induced with 0.1 mM IPTG. The higher titers obtained under these conditions implied that the flux through the pathway is high enough that the limiting step is the last step, catalyzed by the TS.

Figure 4-12 shows the major monoterpene product produced (carene for SACar, myrcene

for SAMyr, linalool for SALin and limonene for SALim+). SACar, SAMyr and SALin aerobic fermentations gave higher titers of the major monoterpene product than micro-aerobic fermentations, whereas SALim+ performed better in oxygen-limited conditions. All four strains showed a similar dependence of monoterpene production on fermentation time. The monoterpene titers increased with time after induction until they reached a maximum and then stayed constant. All four strains took 18 h to 26 h to reach the saturation point. Because of the volatile nature of terpenoids, any later drop in titer can be attributed to loss from the system by evaporation.

[Figure 4-12: time courses of major monoterpene produced (µg/L, primary Y-axis) and major monoterpene produced (µg/L/OD, secondary Y-axis), with the time axis running from 0 to 80; panels: (a) CarTS, (b) MyrTS, (c) LinTS, (d) LimTS; series: µg/L aerobic, µg/L micro-aerobic, µg/L/OD aerobic, µg/L/OD micro-aerobic.]

Figure 4-12 Fermentative production of monoterpenes over time. (a) SACar, (b) SAMyr, (c)

SALin, (d) SALim+. Primary Y-axis is terpene titer and secondary Y-axis is normalized terpene

titer. Fermentation conditions are mentioned in Table 4.2.

Maximum carene production for SACar in aerobic conditions was 12.6 µg/L at 21 h, whereas in micro-aerobic conditions it was 10.3 µg/L at the end of 26 h. Myrcene titers in SAMyr reached a maximum of 0.84 µg/L at 26 h in aerobic conditions but 0.75 µg/L at 21 h in micro-aerobic fermentations. SALim+ titers for limonene in aerobic conditions (0.93 µg/L) were lower than in micro-aerobic conditions (1.08 µg/L), both reaching maxima at 26 h after induction. SALin fermentations were greatly influenced by oxygen: the maximum linalool concentration was about 33 % lower in micro-aerobic conditions (aerobic: 10.3 µg/L at 18 h; micro-aerobic: 6.9 µg/L at 21 h). Except for SALim+ cultures, the other three strains had a higher monoterpene yield in aerobic than in micro-aerobic conditions. I therefore tested expression of limonene synthase from Mentha spicata 104 in E. coli BL21(DE3) and limonene production in aerobic and micro-aerobic fermentations. Aerobic cultures initially produced limonene at higher concentrations, but after the third day of fermentation micro-aerobic titers surpassed aerobic titers. I monitored these fermentations for five days, and the maximum titer (5 mg/L) was reached on the fourth day.
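The effect of oxygen limitation on each strain can be put on a common scale by expressing the micro-aerobic maximum relative to the aerobic maximum. The sketch below uses the maximum titers reported in the preceding paragraph; the choice of the aerobic value as the reference and the names MAX_TITERS and relative_change are assumptions made for this example.

```python
# Maximum titers reported in the text, as (aerobic, micro-aerobic) in µg/L.
MAX_TITERS = {
    "carene (SACar)": (12.6, 10.3),
    "myrcene (SAMyr)": (0.84, 0.75),
    "limonene (SALim+)": (0.93, 1.08),
    "linalool (SALin)": (10.3, 6.9),
}


def relative_change(aerobic, micro_aerobic):
    """Percent change of the micro-aerobic maximum relative to the aerobic one."""
    return (micro_aerobic - aerobic) / aerobic * 100.0


for product, (aerobic, micro) in MAX_TITERS.items():
    change = relative_change(aerobic, micro)
    print(f"{product}: {change:+.1f} % in micro-aerobic vs aerobic")
```

A negative value means the strain produced less under oxygen limitation; only the limonene entry is positive, in line with the observation that SALim+ performed better in micro-aerobic conditions.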

Cell growth during fermentation was monitored as optical density at 600 nm. LBY fermentations reached a maximum OD600 in the range of 6 to 8, whereas TB cultures reached a higher OD600 of 12-14. For all strains, OD600 did not change substantially after the third day and was higher for aerobic cultures than for micro-aerobic cultures, except for SACar. SALim+ cultures showed a drop in OD600 after the fourth day due to cell death and lysis.

Metabolite analyses

Dodecane samples from fermentations and hexane samples from in vitro reactions were

analyzed on GC-MS. Chromatograms were analyzed in SIM mode with m/z = 93, a characteristic

fragment for monoterpenes. In vitro product profiles for both the total protein extract and the purified protein extract were similar to each other as well as to the literature 140,129. I found some differences, not only in the amount of a specific product being made but also in which product is made. I identified products of the TSs beyond those reported previously, and these were dependent on the supply of oxygen during fermentation. Qualitative and quantitative product profiles are listed in Table 4.3 for nine different monoterpenes that were produced by the strains in varying concentrations. The percent share of each terpene out of the total was calculated on a molar basis and is reported with the standard error for triplicates. All of these monoterpenes are hydrocarbons (C10H16) except linalool (C10H18O), which has a hydroxyl group. The product distribution spans different classes of monoterpenes: acyclic (myrcene, ocimene, linalool), monocyclic (limonene, terpinene, terpinolene) and bicyclic (carene, pinene, sabinene).
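Because linalool (C10H18O) is heavier than the C10H16 hydrocarbons, a percent share on a molar basis differs slightly from a share on a mass basis. The sketch below illustrates the conversion, assuming molar masses of 136.24 g/mol for C10H16 and 154.25 g/mol for C10H18O; the titers in the example are hypothetical and are not data from this study.

```python
# Approximate molar masses in g/mol, computed from standard atomic weights.
MOLAR_MASS_C10H16 = 136.24   # monoterpene hydrocarbons
MOLAR_MASS_C10H18O = 154.25  # linalool


def molar_mass(name):
    """Return the molar mass for a product name (linalool is the only alcohol here)."""
    return MOLAR_MASS_C10H18O if name == "linalool" else MOLAR_MASS_C10H16


def mole_percent(titers_ug_per_l):
    """Convert mass titers (µg/L) into percent shares on a molar basis."""
    micromoles = {name: mass / molar_mass(name)
                  for name, mass in titers_ug_per_l.items()}
    total = sum(micromoles.values())
    return {name: 100.0 * n / total for name, n in micromoles.items()}


# Hypothetical titers in µg/L, for illustration only.
example = {"linalool": 9.7, "ocimene": 0.25, "carene": 0.05}
shares = mole_percent(example)
for name, share in shares.items():
    print(f"{name}: {share:.1f} mol %")
```

For equal masses of linalool and a C10H16 product, the linalool share comes out below 50 mol %, which is why the basis of the calculation matters when linalool is one of the products.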

Table 4.3 Monoterpene product distribution and percent total

All values are percent of total products.

SACar and CarTS (previously reported CarTS in vitro reactions: ref. 151)
Product         Aerobic          Micro-aerobic    In vitro         Previously
                fermentations    fermentations    reactions        reported
                (this study)     (this study)     (this study)     in vitro reactions
Carene          83.4±0.9         76.7±0.8         80.1±0.6         78.0
Terpinolene     3.9±0.2          4.8±0.2          9.9±0.4          11.0
Sabinene        8.5±0.2          10.4±0.4         4.3±0.2          5.0
Myrcene         2.5±0.2          2.6±0.2          2.8±0.3          3.0
Limonene        1.2±0.2          2.2±0.2          0.1±0.1          0.0
Terpinene       0.3±0.2          0.8±0.3          1.5±0.3          1.6
Pinene          0.1±0.1          2.5±0.4          0.7±0.3          0.9
Phellandrene    0.0              0.0              0.6±0.2          0.7

SAMyr and MyrTS (previously reported MyrTS in vitro reactions: ref. 129)
Product         Aerobic          Micro-aerobic    In vitro         Previously
                fermentations    fermentations    reactions        reported
                (this study)     (this study)     (this study)     in vitro reactions
Myrcene         98.7±0.8         93.7±0.7         100              100
Carene          1.0±0.2          6.1±0.5          0                0.0
Pinene          0.3±0.1          0.3±0.1          0                0.0

SALim+ and LimTS (previously reported LimTS in vitro reactions: ref. 129)
Product         Aerobic          Micro-aerobic    In vitro         Previously
                fermentations    fermentations    reactions        reported
                (this study)     (this study)     (this study)     in vitro reactions
Limonene        92.1±0.6         88.7±0.5         90±0.6           89.9
Myrcene         5.9±0.1          8.2±0.3          5±0.2            5.2
Pinene          0.8±0.1          1.0±0.1          5±0.2            4.9
Linalool        1.2±0.3          0.5±0.2          0                0.0
Carene          0.4±0.2          1.6±0.2          0                0.0

SALin and LinTS (previously reported LinTS in vitro reactions: ref. 129)
Product         Aerobic          Micro-aerobic    In vitro         Previously
                fermentations    fermentations    reactions        reported
                (this study)     (this study)     (this study)     in vitro reactions
Linalool        97.0±0.9         79.8±0.8         99.2±0.9         98.2
Ocimene         2.8±0.5          5.4±0.3          0.8±0.2          1.0
Carene          0.1±0.2          14.6±0.4         0                0.2
Myrcene         0.0              0.0              0                0.2
Terpinolene     0.0              0.0              0                0.1
Pinene          0.0              0.2±0.2          0                0.0
Others          -                -                -                Each <0.07

Study of terpene synthase catalytic mechanisms

The reaction mechanisms for monoterpene production from GPP are depicted in Figure

4-14. The mechanism begins with the abstraction of the pyrophosphate group to generate the primary geranyl carbocation. This carbocation can exist in two isomeric conformations: cisoid and transoid. For the cyclic monoterpenes to form, the geranyl carbocation undergoes 6,1-ring closure to generate the α-terpinyl carbocation. The distance between the cation and C6 is 5.8 Å in the trans isomer, whereas it is 4.7 Å in the cis isomer (Figure 4-13 (a) and (b)). Hence, cyclization is more favorable in the cisoid conformation than in the transoid. The α-terpinyl cation then undergoes a range of cyclizations, rearrangements and hydride shifts before the reaction is terminated by deprotonation. Similarly, the formation of the α-terpinyl carbocation is more favorable than that of the β-terpinyl carbocation (Figure 4-13 (c) and (d)).

Figure 4-13 Isomeric structures of carbocations. (a) Cisoid geranyl carbocation, (b) transoid

geranyl carbocation, (c) α-terpinyl carbocation and (d) β-terpinyl carbocation.

[Figure 4-14: reaction scheme. Species shown: geranyl pyrophosphate, primary geranyl cation (cisoid), tertiary geranyl cation, α-terpinyl cation, pinyl cation, terpinen-4-yl cation and thujyl cation. Steps labeled: 6,1-closure, 2,7-closure, 6,7-hydride shift, 2,6-closure and H2O addition. Products: myrcene, linalool, pinene, carene, limonene, terpinolene, γ-terpinene and sabinene.]

Figure 4-14 Reaction mechanisms of monoterpene synthases. The color of each monoterpene structure is based on the penultimate carbocation from which it originates.

The α-terpinyl carbocation leads to an array of cyclic monoterpenes through rearrangements. Carene, limonene and terpinolene are deprotonation products of the α-terpinyl carbocation. Carene synthesis involves 1,7-closure to generate C3 ring. α-Terpinyl carbocation can


undergo 2,7-closure to form a pinyl cation containing C4 ring that then forms pinene. α-Terpinyl carbocation can either form γ-terpinene through neutralization or can undergo a 6,7-hydride shift to thujyl carbocation containing C3 ring. Thujyl cation then generates sabinene by deprotonation.

Formation of acyclic monoterpenes like myrcene and linalool proceeds via the rearrangement of cisoid primary geranyl carbocation to form a tertiary carbocation. The tertiary carbocation is then neutralized to form myrcene by deprotonation and to linalool by water addition.

The following two characteristics were studied to evaluate the plausible catalytic cascade for each terpene synthase: the amino acids located at the active site and their proximity to the reactive centers, and the binding energetics of the transition state structure (TSS). It is reported that amino acids such as arginine, histidine, tyrosine, aspartate and glutamate are involved in hydrogen-mediated mechanisms 17,141,155. The TSSs and products were docked into the TS active site, and amino acid residues within a 5.0 Å radius were identified. The following discussion focuses on these docking experiments. PyMOL software was employed to study the results and generate the images.
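Identifying residues within a 5.0 Å radius of a reactive center amounts to a distance filter over atomic coordinates. The sketch below shows that calculation in plain Python. All coordinates are invented for illustration and do not come from the docked structures; the labels only mimic the residue and atom naming used in this section.

```python
import math

CUTOFF_ANGSTROM = 5.0

# Hypothetical ligand atom coordinates in Å (made up for this example).
ligand_atoms = {
    "C1": (0.0, 0.0, 0.0),
    "C6": (4.7, 0.0, 0.0),
}

# Hypothetical residue atom coordinates in Å (made up for this example).
residue_atoms = {
    "R480:NH2": (0.0, 4.1, 0.0),
    "H436:NE2": (4.7, 0.0, 4.3),
    "Y561:OH": (10.0, 0.0, 0.0),
}


def contacts_within(ligand, residues, cutoff=CUTOFF_ANGSTROM):
    """Return (residue atom, ligand atom, distance) tuples within the cutoff,
    sorted from the closest contact to the farthest."""
    found = []
    for res_label, res_xyz in residues.items():
        for lig_label, lig_xyz in ligand.items():
            distance = math.dist(res_xyz, lig_xyz)
            if distance <= cutoff:
                found.append((res_label, lig_label, distance))
    return sorted(found, key=lambda item: item[2])


contacts = contacts_within(ligand_atoms, residue_atoms)
for res_label, lig_label, distance in contacts:
    print(f"{res_label} to {lig_label}: {distance:.1f} Å")
```

With real data, the two dictionaries would be filled from the docked pose and the homology model; measuring from specific side-chain atoms rather than residue centers matches how the distances are quoted in the text.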

Cyclization of primary geranyl carbocation (cisoid) to the α-terpinyl carbocation

The likelihood of this reaction decides whether the terminal products are acyclic or cyclic.

For the cyclization to happen, C6 comes closer to the cationic carbon, decreasing the angles between C3, C4, C5 and C6 (Figure 4-15). For all the docking results, the active site surface is

highlighted in grey.


Figure 4-15 Changes in the orientation of primary geranyl cation on 6,1-closure to form the

α-terpinyl cation. (a) Orientation of primary geranyl cation, (b) orientation of α-terpinyl cation.

The TSSs are highlighted in green.

Figure 4-16 shows the outcomes of docking the geranyl carbocation at the active site.

The CarTS active site contains R480, the -NH2+ group of which is at 4.1 Å from the geranyl carbocation. The arginine is reported to catalyze pyrophosphate group removal and the proton elimination-condensation cascade 155,156. Moreover, the -NH+ group of H436 and the -SH group of C236 are each positioned at 4.3 Å from C6, where they can trigger the π-electron shift and cyclization. The energy of binding to CarTS is -4.7 kcal/mol. Similarly, LimTS contains the -NH2+ group of the R303 residue at 3.6 Å, the R485 residue at 4.2 Å and the R301 residue at 4.9 Å from the cationic carbon. The C6 atom is also oriented at 4.5 Å from R479. The binding energy of the geranyl carbocation is -5.0 kcal/mol due to the arginine triad (R301, R303 and R485). Both enzymes have an aromatic amino acid residue (Y561 in CarTS and F566 in LimTS) close to the methyl groups on C7, but these residues are oriented away from the direction of cyclization.


Figure 4-16 Role of amino acid residues in TS active site towards cisoid primary geranyl

cation reactivity. The TS is identified in the left top corner of each box. The cisoid primary geranyl cation is highlighted in green, the favorable interactions for 1,6-closure are highlighted

in green dotted lines and unfavorable interactions are highlighted in purple dotted lines with

distance measurements.


MyrTS and LinTS product spectra consist mainly of acyclic terpenoids. When the geranyl carbocation was docked in their active sites, few or no arginine residues were observed within a 5.0 Å radius. MyrTS has R482, whose -NH2+ is at 4.0 Å from the carbocation, and Y410, whose hydroxy group is at 4.2 Å from C6. However, the benzene ring of F563 is at 3.7 Å and 4 Å from the methyl groups on C7, and the ring faces the same side as the possible trajectory of the cyclization. The binding energy of the primary geranyl cation in MyrTS was found to be -4.9 kcal/mol. The LinTS active site is devoid of arginine residues and instead has Y399, Y403 and D328, whose hydroxy groups are at 3.5 Å, 3.9 Å and 4.0 Å, respectively. The geranyl moiety in LinTS is surrounded by four phenylalanine residues (F317, F429, F546 and F556) within a 5.0 Å radius, which obstruct cyclization, and it exhibited the strongest (most negative) binding energy of -5.6 kcal/mol.

Formation of carene from the α-terpinyl carbocation

CarTS fermentations made 77-83 % carene in the product mixture, whereas the other three TSs studied made carene at less than 6 % of the mixture. This drastic difference in the product ratios is a result of the reactivity of the α-terpinyl carbocation. Hence, the CarTS and MyrTS active sites were docked with both the α-terpinyl cation and carene, and LinTS and LimTS were docked with the α-terpinyl cation TSS (Figure 4-17).

The α-terpinyl cation TSS is dynamic inside the CarTS active site and flips on cyclization. This could result from the location of the -OH of D333 and the -NH2+ of R296 at 3.8 Å and 4.6 Å, respectively, from C7 of the α-terpinyl carbocation. Apart from this, the -SH group of C326 and the -NH+ of H436 occupy the active site at distances of 4.4 Å and 4.6 Å from C7 of carene. In addition to R480 identified earlier, R488 is also present, aligned with the plane of the α-terpinyl cation. CarTS is the only enzyme that has a cysteine residue (C326) in the catalytic site, and its interaction with the terpinyl cation can lead to the cyclization to carene.

MyrTS does not contain an arginine within a 5.0 Å radius. The benzene ring of F563 is at 3.3 Å from the terpinyl cation and at 5.3 Å from C1 of the terpinyl structure. In the case of LimTS, R485 is at 5.2 Å from the terpinyl cationic carbon. The LimTS active site also possesses a cysteine residue, C486, but it is at 8.0 Å from C7 of the α-terpinyl carbocation. The hydroxy group of the D338 residue in the LimTS active site is located at 3.7 Å from C1, generating the possibility of cyclization. The LinTS site is dominated by aromatic residues such as phenylalanine and tyrosine. The hydroxy group of T325 is at 3.8 Å from the cationic carbon. These interactions are very weak and are reflected in the lower titers of carene in LinTS aerobic fermentation.

The binding energy of the terpinyl cation is -5.0 kcal/mol at the CarTS site, -5.3 kcal/mol at the LimTS site and -5.5 kcal/mol at the MyrTS site. At the LinTS site it is -6.3 kcal/mol, also suggesting a lower activity for further catalytic reactions.


Figure 4-17 TS active site contour towards formations of carene from the α-terpinyl cation.

The TS is identified in the left top corner of each box. The cisoid primary geranyl cation is

highlighted in purple and the α-terpinyl cation is highlighted in green. The α-terpinyl cation interactions are highlighted in purple dotted lines and carene interactions are highlighted in green

dotted lines with distance measurements.


Pinyl cation formation by 2,7-closure of the α-terpinyl carbocation

Pinene was detected in trace levels during CarTS and LimTS fermentations. Hence, the TSSs were docked in their catalytic sites (Figure 4-18). In the case of CarTS, apart from the interactions of residues with the α-terpinyl carbocation described in the previous section 4.3.8, C2 of the α-terpinyl TSS does not have any residue within a 5.0 Å radius. In LimTS, the benzene ring of F566, located at 3.7 Å from C2 of the terpinyl cation, hinders the 2,7-closure, while R301 and R303 are at distances of 4.1 Å and 4.7 Å from C2, making catalytic activity towards the pinyl cation plausible. In both cases, however, the interactions are weak. CarTS, on the other hand, has a dominant 1,7-closure leading to carene.

Figure 4-18 TS active site contour towards formations of pinyl cation from the α-terpinyl

cation. The TS is identified in the left top corner of each box. The α-terpinyl cation is

highlighted in purple and the pinyl cation is highlighted in green. The α-terpinyl cation

interactions are highlighted in purple dotted lines with distance measurements.


Acyclic terpenoid mechanisms from cisoid primary geranyl carbocation

The acyclic monoterpenes such as myrcene and linalool originate from tertiary geranyl carbocation. Hence, the likelihood of formation of acyclic monoterpene products is determined by the conversion of cisoid primary geranyl cation to tertiary geranyl (Figure 4-19).

In CarTS, as discussed earlier, the primary geranyl cation is at 4.6 Å and 4.4 Å from R296 and R480, respectively. The orientation of the tertiary geranyl cation, however, increases the distance to 4.8 Å from R480 and to >5.0 Å from R296, reducing the stability. In the case of LimTS, the primary geranyl cation is in close proximity to R301 (4.9 Å), R303 (3.6 Å) and R485 (4.2 Å). Moreover, the -SH of C486 is at 4.9 Å from the primary carbocation. In contrast, only the hydroxy groups of D330 and Y413 are at 3.6 Å and 4.0 Å from the tertiary carbocation. This explains the higher distribution of cyclic monoterpenes for these TSs.

For MyrTS, the -NH2+ group of R482 is at 4.1 Å from the primary cation but 4.0 Å from the tertiary cation. Moreover, R298 and D335 are at 4.4 Å and 4.6 Å, respectively, from the tertiary cation. In the case of LinTS, both TSSs exhibit weaker interactions, and the active site is enriched in aromatic residues such as Y403 and Y399. The aspartate residue D328 is at 4.5 Å from the tertiary geranyl carbocation.


Figure 4-19 TS active site contour towards formations of tertiary geranyl cation from

cisoid primary geranyl cation. The TS is identified in the left top corner of each box. The

cisoid primary geranyl cation is highlighted in purple and the tertiary geranyl cation is

highlighted in green. The primary geranyl cation interactions are highlighted in purple dotted lines and tertiary geranyl cation interactions are highlighted in green dotted lines with distance

measurements.


4.4 Discussions

Optimization of fermentation conditions

Mixing is an important factor in terpenoid production in cell cultures. Mixing ensures a

homogeneous environment of nutrients, oxygen and metabolites; and also influences terpene

extraction in the organic phase. Terpenoids are hydrophobic and are effluxed out of the cell after

synthesis. They evaporate from the media due to their volatile nature. Organic solvent overlay

traps the terpenoids. Tube cultures have a smaller exposed surface area, and hence terpenoids can be efficiently stripped into dodecane. However, the same feature hinders mixing as well as the dissolution of oxygen in the media, which was evident from the low cell density. Aerobic cultures in tubes had higher cell growth but less carene detected, which could be due to loss to the environment. Flasks reached twice the OD600 of tubes. Aeration did not influence carene production in flask cultures to the extent that it did in tube cultures. Cultures in the flasks occupied 6 % of the vessel volume, which was several-fold lower than in tubes (50 %).

Optimization of shake flask fermentations

Monoterpenes are volatile and are sparingly soluble in aqueous media but have a high octanol:water partition coefficient. The dodecane overlay ensured that the monoterpenes partitioned into the overlay when produced and reduced further evaporation. When I carried out shake flask fermentations in the absence of a dodecane overlay and extracted monoterpenes at the end of fermentation by liquid-liquid extraction with hexane, I did not detect any monoterpenes. This suggested that dodecane was efficient in trapping terpenes as well as reducing their cytotoxicity. Even though the reduced evaporation rates aided terpene capture and detection, fermentation parameter optimizations were done in micro-aerobic conditions to minimize errors.

Another key parameter that played a vital role in improving titers was media composition.

Terpene production required enriched media. This was achieved by supplementing LB media with

0.5 % (w/v) yeast extract. For SACar, SAMyr and SALim+, this substantially increased monoterpene titers. For the strains producing cyclic monoterpenes as major products (SACar and SALim+), TB yielded 16 % and 23 % higher titers, respectively, than LBY by supporting higher biomass generation. However, monoterpene titers in TB did not increase in proportion to cell growth: when normalized to OD600 (µg/L/OD), they dropped by 40 % for SACar and 30 % for SALim+. Hence, LBY media was selected for these three strains.

Linalool production by SALin displayed a different trend. It had higher titers as well as higher OD600-normalized titers in TB than in any other media. The pH of SALin cultures measured after overnight incubation was 7.8 in LBY media and 7.2 in TB media. TB media is buffered with mono- and dibasic potassium phosphates. Since linalool is an oxygenated (-OH) terpene and the hydroxyl functionality is weakly acidic, the higher titers in TB indicate that the buffered conditions had a positive impact. LB, LBY and 2YT are unbuffered and yielded titers and cell densities in a similar range. HD media is also buffered but is not as enriched as TB; it did not support high cell growth and hence gave lower linalool titers. The effect of pH on the biosynthesis of monoterpenes in E. coli has not been studied extensively. Acidic pH has been reported to have an unfavorable effect on the bioconversion of limonene to perillic acid in yeast 157 . Another study showed an effect of pH on sesquiterpene production in E. coli that differed between terpenes of the same class but different structure 157 .

Apart from structural variability, linalool differs from the other monoterpenes in its physical parameters. It has a lower reported vapor pressure (0.159 mm Hg), a lower octanol:water partition coefficient (log Po/w 2.97) and a very high solubility in water (1590 mg/L) 158 (Table 4.4). This also explains the need for a larger dodecane overlay (30 % v/v) to capture the linalool produced. After analyzing these parameters, it is evident that the ionization state of linalool affects its partitioning and cytotoxicity and, in turn, its overall titers.
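The relationship between raw and OD600-normalized titers in TB and LBY can be checked with simple arithmetic. The sketch below assumes that the reported 40 % and 30 % drops in normalized titer are relative to the LBY values; the function and variable names are illustrative and are not part of the analysis performed in this work. Under that assumption, the reported changes imply that cultures in TB reached roughly 1.8 to 1.9 times the OD600 of cultures in LBY.

```python
# Sketch: biomass ratio implied by the reported titer changes in TB vs. LBY.
# Assumption: the normalized-titer drops are fractions of the LBY value.

def implied_od_ratio(titer_gain: float, normalized_drop: float) -> float:
    """Return OD600(TB) / OD600(LBY) implied by a fractional gain in raw
    titer and a fractional drop in OD600-normalized titer."""
    titer_ratio = 1.0 + titer_gain            # titer(TB) / titer(LBY)
    normalized_ratio = 1.0 - normalized_drop  # normalized(TB) / normalized(LBY)
    # normalized titer = titer / OD600, so OD ratio = titer ratio / normalized ratio
    return titer_ratio / normalized_ratio

# (fractional titer gain, fractional normalized-titer drop) as reported in the text
reported = {"SACar": (0.16, 0.40), "SALim+": (0.23, 0.30)}
od_ratios = {strain: implied_od_ratio(g, d) for strain, (g, d) in reported.items()}

for strain, ratio in od_ratios.items():
    print(f"{strain}: OD600 in TB is about {ratio:.2f}x that in LBY")
```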

Table 4.4 Physical properties of monoterpenes

Compound    Log Po/w    Vapor pressure (mm Hg)    Solubility in water (mg/L)
Myrcene     4.33        2.09                      5.60
Carene      4.38        3.48                      6.10
Limonene    4.57        1.55                      0.65
Linalool    2.97        0.16                      1590.00


Product profile characterization

To compare the product profiles, I referred to the catalytic mechanisms of monoterpene biosynthesis from GPP 159 . Monoterpene biosynthesis involves three major cationic intermediates generated when attack by Mg2+ leads to abstraction of the pyrophosphate group. These three carbocationic stages are the trans-geranyl cation, the cis-linalyl cation and the α-terpinyl cation, which occur in that order in the cascade of electron transfers that stabilizes the ion.

Product profiles of the in vitro reactions performed in this study at 22 °C were similar, both qualitatively and quantitatively, to those reported in the literature for reactions performed at 30 °C (table 2). This suggests that the strain engineering has not affected protein folding or activity. The lower reaction temperature also had no noticeable effect on the product profile.

Shake flask fermentation data provided two comparisons: first, between aerobic and micro-aerobic product profiles, and second, between in vivo expression and in vitro reactions. Discrepancies between in vitro product profiles of CarTS have been mentioned before 18,151 . Reiling et al. could not detect the minor products of CarTS when it was expressed in vivo but reported the in vitro product distribution relative to the carene titer. In addition to the products reported by Faldt et al., Reiling et al. detected limonene. I too observed limonene in SACar fermentations, but phellandrene remained undetected in this study. The percentage of carene in the products was lower, and the percentages of pinene and sabinene were higher, in micro-aerobic than in aerobic fermentation. All of these are cyclic monoterpenes that originate from neutralization of the α-terpinyl cation by hydride shifts. Pinene synthesis occurs from the pinyl cation by 2,7-closure of the α-terpinyl cation. Sabinene is formed by 2,6-closure through the thujyl cation. The absence of phellandrene in vivo suggests that phellandryl cation chemistry does not proceed in our cultures.


Catalytic promiscuity of MyrTS was detected in vivo, with myrcene as the major product, but not in vitro. The catalytic mechanism of myrcene biosynthesis involves quenching of the geranyl (trans) or linalyl (cis) carbocation by proton abstraction. The presence of carene and pinene in the fermentation suggests stabilization of the α-terpinyl cation by 6,1-closure of the linalyl cation. Though pinene titers did not change substantially, carene titers were higher in the micro-aerobic fermentations. SALim+ cultures showed a similar trend. In vivo promiscuity was higher, and carene and linalool were detected as additional products apart from limonene, myrcene and pinene. In this case, the linalool titer was higher in aerobic conditions, whereas myrcene and carene titers were higher in micro-aerobic conditions. Of the five products, linalool is the one whose formation involves water as a nucleophile and, based on the results, this is favored in aerobic conditions. Fermentative levels of pinene were lower in SALim+ cultures than in the in vitro reactions.

In vitro reactions with SALin yielded linalool as the major product and ocimene as a minor product. Other monoterpenes reported 129 remained undetected in our enzymatic reactions. The micro-aerobic titer of linalool was substantially lower and that of carene was higher. Linalool and carene are formed through two completely different mechanisms and carbocationic stages. The bicyclic monoterpene pinene was also detected at trace levels. This suggests that the cyclization mechanisms prevailed in micro-aerobic conditions. I did not detect any acyclic or monocyclic monoterpene produced by SALin.

In conclusion, I observed that linalool production was favored in aerobic conditions, whereas bicyclic monoterpene (carene and pinene) production was enhanced in oxygen-limited conditions.


Computational analyses

The homology models generated for the TSs, together with docking of various TSSs and products, revealed the structural basis for TS promiscuity. All four terpene synthases contain an aspartate residue (highlighted in green in Figure 4-21) that is the first residue of the DDxxD motif. This Mg2+ ion-binding motif is required for the generation of the primary geranyl cation from GPP.

Both CarTS and LimTS have an active site contour favoring cyclization of the primary geranyl carbocation to the terpinyl carbocation. The residues that interact during conversion of the primary geranyl carbocation to the tertiary geranyl carbocation are either absent or located far from the active center. In contrast, MyrTS and LinTS lack the active site residues responsible for 6,1-closure of the primary geranyl carbocation but display an active site shape favoring tertiary carbocation generation. Residues with charged or polar side chains, such as arginine, histidine, aspartate and tyrosine, populate the active site and perform two functions: charge delocalization and cation stabilization through solvation. Amino acids with aromatic side chains, such as phenylalanine and tyrosine, are enriched in MyrTS and LinTS and reduce the mobility of the TSSs for cyclization mechanisms. The residues involved at the active site are shown in Figure 4-20, and each MTS is discussed in turn in the following paragraphs.

CarTS: Residues R480, H436 and C326 are involved in the 1,6-closure mechanism to generate and stabilize the α-terpinyl cation. These steps facilitate the movement of the terpinyl cation and bring it into the proximity of H436, C326, D333, R296 and R488, which trigger the 7,1-closure leading to carene formation. Residues H436 and C326 are closer to C6 of the terpinyl cation, which creates the possibility of 2,7-closure leading to pinene as a product. The active site also contains Y561, but its benzene ring is located away from the active site and interacts minimally.


LimTS: The primary geranyl cation is less reactive towards formation of the tertiary geranyl cation because residues R303, R485, R301 and C486 are closer to the primary cation. Moreover, R479 in the vicinity of C6 triggers terpinyl cation formation, which then follows 2,7-closure to yield the pinyl cation owing to residues R303 and R301. The active site also contains the aromatic residue F566, but it points away from the active site pocket.

MyrTS: This active site contains R298, R482 and D335, which are closer to the tertiary geranyl cation than to the primary geranyl cation. The proximity of C6 to Y410 could translate into cyclic monoterpene formation, but the juxtaposed benzene ring of F563 limits the movement of the primary geranyl cation towards cyclization mechanisms.

LinTS: Like MyrTS, the LinTS active site contains residues (Y399, Y403, T325 and D328) that form a basket around the tertiary carbocation. Aromatic residues F317, F429, F546 and F556 surround the C1, C6 and C7 methyl groups, strongly obstructing movement for cyclization.


Figure 4-20 TS active site contour and participating amino acid residues. The TS is

identified in the left top corner of each box. The surface is highlighted in grey.


Comparison of product profile with computational analysis

All four TSs studied are from Picea abies and were considered to have evolved from a common ancestral enzyme. When the product spectra were compared with the outcomes of computational docking, a link between the conserved amino acids and catalytic function could be established. This comparison is summarized in Figure 4-21.

[Figure 4-21: active-site residues of each TS, color-coded by predicted role.
Residues: CarTS (R296, C326, D333, H436, R480, R488, Y561); LimTS (R301, R303, D338, R479, R485, C486, F566); MyrTS (R298, D335, Y410, R482, F563); LinTS (F317, T325, D328, Y399, Y403, F429, F546, F556).
Legend: generation of cyclic products; generation of primary geranyl cation; interaction (non-cyclization) with tertiary geranyl cation; interaction (cyclization) with tertiary geranyl cation; steric hindrance to α-terpinyl cation generation; unknown but involved in bicyclic product formation.]

Figure 4-21 Predictions for roles of the amino acid residues present in TS active site and

structure-function correlations


Chapter 5: Designing an efficient bioprocess for terpenoid recovery

5.1 Introduction

In my work on terpenoid biosynthesis in recombinant strains, I addressed the issues of low

yields of terpenoids in a heterologous host. I studied strategies to improve flux through the MEP

pathway by employing enzyme fusions. This shed light on the plausible bifurcation step in the

pathway. In any case, despite the flux through engineered MEP pathway being very high, it gets

attenuated further due to the promiscuous nature of the biosynthetic enzymes. My strategy of

studying the mechanistic basis of terpene synthase promiscuity could be applied towards

addressing low yields. However, these insights need to be translated to the chemical industry and

all the efforts for improving heterologous terpenoid production will be in vain if large scale

bioprocess development is not economically viable. This challenge will be addressed in the current

chapter.

Challenges in terpenoid bioprocess development

Terpenoids are secondary metabolites that are commonly secreted out of the cell. Despite many of their economically interesting applications and advancements in the synthetic biology tools that have enabled their production in simpler hosts like bacteria and fungi, there are very few examples of their commercial production. The major challenges in bioprocess development are physicochemical and biochemical properties of terpenoids. They are volatile hydrocarbons and are cytotoxic once they exceed a certain concentration. Microbial bioprocess development is hence challenging and calls for continuous product recovery 160 .


Continuous ex situ recovery (CESR)

The physicochemical properties of terpenoids (hydrophobic and volatile) are exactly opposite to those of the environment in which they are produced (aqueous and less volatile). These differences can be exploited to design the bioprocess efficiently. Various techniques capture the products either in direct contact with the culture media (in situ product recovery) or in parallel to the bioreactor by capturing the volatile fraction (ex situ product recovery). Both methods involve either a heterogeneous liquid phase or a gas phase 160 . Continuous extraction into the heterogeneous phase also minimizes the cytotoxicity of terpenoids.

There are many examples of the development of such processes for cell-free biocatalysis 16 or extraction from the cells post bioprocess 161 . Some examples that involve whole cell fermentation are centered towards single reaction conversion like interface bioreactor for racemic mixture resolution 162 . De novo bioprocesses for terpenoids widely use an organic solvent that is

overlaid on the fermentation media to capture products. It employs high boiling solvents like

dodecane 28,152,163 , isopropyl myristate 164 , methyl oleate 165 . Most of these processes are either only in experimental stages or are for high value-low volume pharmaceuticals like taxol.

Drawbacks of state of the art

Employing a high-boiling organic phase for terpenoid recovery is efficient but not sustainable. The boiling points of these solvents are: dodecane (216 °C), isopropyl myristate (167 °C) and methyl oleate (351 °C). Terpenoids are high boiling as well, for example monoterpenes (>140 °C) and sesquiterpenes (>220 °C). Hence, recovering the terpenoids from the organic phase is an energy-intensive process that often requires distillation. Such a process loses its industrial applicability and viability at the product recovery and purification stage. Low-boiling, recyclable solvents such as hexanes, methanol and ethanol are cytotoxic and cannot be used.

The objective of the study

Development of an efficient recovery technique for terpenoids will not only make the bioprocess economical but also enable large scale applications of terpenoids as bulk and commodity chemicals. The objective of this study is to devise a recovery strategy that is efficient, scalable and economical. The initial study was done on the SACar strain as a prototype and has since become a comprehensive project led by Azin Amiri for her PhD, studying SAMyr, SALim and SALin. I exploited the volatile nature of these terpenoids to capture them from the exhaust gases by adsorption.

5.2 Materials and methods

Strains and fermentation conditions

The SACar (described in Chapter 4: page number 86) fermentative process was optimized for carene recovery in a 2 L bioreactor with a 1 L working volume. Fermentation was carried out in LBY media supplemented with the appropriate antibiotic and 2 mM MgCl2. Starter cultures were grown in Erlenmeyer flasks at 30 °C overnight. Production cultures in the bioreactor were inoculated with 1 % (v/v) of starter culture and grown at 37 °C. The temperature was reduced to 22 °C when the culture reached an OD600 of 0.8. When the culture temperature reached 22 °C, IPTG was added to a final concentration of 0.7 mM and fermentation was continued for 18 h.


Bioreactor set up

Fermentations were carried out in New Brunswick Scientific BioFlo 110 equipped with probes for pH, DO, temperature; and controllers for temperature and pH. Rushton turbine system with two impellers was used for agitation with sparger located at the base of the bottom impeller.

The reactor vessel was 8.5" tall and 4.92" in diameter with an internal cooling coil and an external heating blanket. pH was monitored but not controlled. Antifoam addition was not required.

CESR for terpenoid

A fluidized bed capture device (FBCD) was constructed for CESR. It contained a resin bed fluidized in water and was connected to the air outlet stream through a condenser. The FBCD schematic is shown in Figure 5-1. The device was made from polylactic acid (PLA) polymer with an internal diameter of 1.5 cm and a height of 4.5 cm, giving a capacity of 8 mL. It was covered with a stainless steel mesh and filled with wet resin, and its inlet was connected to the air flow outlet of the bioreactor.
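The stated capacity of the FBCD can be cross-checked against its dimensions. The sketch below assumes that the device is a simple right circular cylinder, which the text does not state explicitly; the function name is illustrative.

```python
import math

def cylinder_volume_ml(inner_diameter_cm: float, height_cm: float) -> float:
    """Volume of a right circular cylinder in mL (1 cm^3 equals 1 mL)."""
    radius_cm = inner_diameter_cm / 2.0
    return math.pi * radius_cm ** 2 * height_cm

# Dimensions reported for the FBCD: 1.5 cm internal diameter, 4.5 cm height
fbcd_volume_ml = cylinder_volume_ml(1.5, 4.5)
print(f"FBCD internal volume: {fbcd_volume_ml:.2f} mL")
```

Under the cylinder assumption, the computed volume rounds to the reported capacity of 8 mL.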

Figure 5-1 Schematic for construction of fluidized bed capture device (FBCD)

The CESR schematic is shown in Figure 5-2. The condenser prevents water loss and maintains the volume of the culture. The exhaust air, enriched in evaporated terpenes, then passes through the fluidized bed of resin. The air bubbles travel upwards under pressure, and the monoterpenes are adsorbed on the resin surface owing to their affinity for it.

[Figure 5-2 labels: agitator, sparger, bioreactor, air flow out, FBCD (water, resin)]

Figure 5-2 Schematic of CESR setup

Resins used for terpenoid capture

Resins are polymeric compounds with tunable chemistry. A resin can be a homopolymer (made of a single type of repeating monomer) or a heteropolymer (made of more than one type of repeating monomer). The physicochemical properties of the polymer depend on the monomeric units that it contains. Monomer blends are chosen for desired characteristics such as hydrophobicity/hydrophilicity, type of interaction, stability and porosity. Moreover, the resins can be tuned to be highly solvent resistant and can be regenerated for reuse. Adsorption is an affinity-based process driven by weak forces such as van der Waals forces. It supports multilayer binding and is readily reversible. Since terpenoids are hydrophobic, adsorption is the preferred phenomenon for their recovery.

Different resins were screened for their efficiency in capturing carene. The resins employed in this study and their physicochemical properties are listed in Table 5.1. Purolite resins were gifts from Purolite, USA. Amberlite resins were purchased from Sigma. The resins were activated as per the supplier recommendations and washed thrice with water prior to use.

Table 5.1 Resins employed in the study and their properties

Resin                       Hydrophobicity                    Polymer structure             Pore size (Å)    Surface area (m2/g)
Amberlite XAD-4             Hydrophobic                       Styrene-divinylbenzene        50               725
Amberlite XAD-7             Hydrophilic (moderately polar)    Acrylic                       90               450
Amberlite XAD-8             Hydrophilic (moderately polar)    Acrylic ester                 225              140
Purolite Purosorb PAD610    Hydrophilic (moderately polar)    Polyacrylic-divinylbenzene    290              450
Purolite Macronet MN202     Hydrophobic                       Polystyrene-divinylbenzene    6-210            900
Purolite Macronet MN270     Hydrophobic                       Polystyrene-divinylbenzene    16-80            1100


Screening resin for carene capture

10 µL of carene was added to 1 mL of each resin dispersed in 9 mL of water in an airtight glass vial and shaken for 1 hour at 22 °C. The resin was then allowed to settle, and the solids were separated by filtration. Carene remaining in the filtered water was extracted with dodecane and analyzed as well. Hexane was used as the solvent to desorb carene from the resin. Desorption was carried out at room temperature. All steps were done in sealed glass vials. 2 µL of each hexane fraction was analyzed by GC-MS for characterization. The analytical method used is described in Section 4.2.4 (Page number 89).


5.3 Results and discussions

Adsorption based carene recovery

Desorption was carried out for 1 week. The results are reported in Table 5.2.

Table 5.2 Screening resin for carene recovery

Resin                       % Carene remaining in the     % Carene recovery (desorption time)
                            water after extraction        2 h         3 days      1 week
Amberlite XAD-4             15.5 ± 1.5                    58 ± 2      74 ± 4      72 ± 6
Amberlite XAD-7             4.1 ± 1.5                     42 ± 4      71 ± 7      75 ± 11
Amberlite XAD-8             0.08 ± 2.5                    35 ± 20     77 ± 9      69 ± 18
Purolite Purosorb PAD610    0.03 ± 0.0                    84 ± 6      85 ± 13     93 ± 15
Purolite Macronet MN202     0.03 ± 0.0                    57 ± 10     79 ± 7      86 ± 18
Purolite Macronet MN270     0.05 ± 0.5                    58 ± 10     67 ± 13     66 ± 19

Complete (100 %) recovery of carene was never achieved because of its volatility and loss by evaporation. For the same reason, carene recovery decreased on longer incubation for Amberlite XAD-4 and XAD-8. Purolite Purosorb PAD610 recovered 84 % of the carene after 2 h of desorption and was selected for bioprocess optimization. PAD610 is made of a polyacrylic-divinylbenzene copolymer, which is moderately hydrophilic. The analysis revealed that acrylate co-blocks aid desorption, whereas styrene co-blocks are important for adsorptive interactions.

Since the resin can be reused, optimization of both adsorption efficiency and desorption time is necessary. Although PAD610 could not be desorbed completely, higher efficiency can be achieved when it is recycled.
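The resin selection can be reproduced from the mean recoveries in Table 5.2. The sketch below is only an illustration of that comparison: it uses the mean values, ignores the reported ± spreads, and the variable names are hypothetical.

```python
# Mean % carene recovery from Table 5.2 after 2 h, 3 days and 1 week of desorption.
recovery = {
    "Amberlite XAD-4": (58, 74, 72),
    "Amberlite XAD-7": (42, 71, 75),
    "Amberlite XAD-8": (35, 77, 69),
    "Purolite Purosorb PAD610": (84, 85, 93),
    "Purolite Macronet MN202": (57, 79, 86),
    "Purolite Macronet MN270": (58, 67, 66),
}

# Resin with the highest mean recovery after the shortest desorption time (2 h)
best_at_2h = max(recovery, key=lambda resin: recovery[resin][0])

# Resins whose mean recovery fell between 3 days and 1 week
declined = [resin for resin, (_, day3, week1) in recovery.items() if week1 < day3]

print("Best at 2 h:", best_at_2h)
print("Lower at 1 week than at 3 days:", declined)
```

On mean values alone, Macronet MN270 also shows a one-point decrease between 3 days and 1 week, which is well within its reported spread.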

Bioreactor parameter optimization

The bioreactor parameters like air flow rate, agitation speed and resin bed volume were optimized. Optimal parameters are shown in Figure 5-3. The optimal conditions for monoterpene product capture were determined to be agitation speed of 250 rpm, air flow rate of 2 vvm and 7 mL resin. Higher rates of air flow through the resin bed were observed to dry up the resin. As a consequence, the FBCD was kept immersed in water.
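For orientation, the optimized aeration rate can be converted to a volumetric gas flow. The sketch below assumes the 1 L working volume given in Section 5.2 and treats vvm as volume of air per volume of liquid per minute. The empty-bed contact time is an idealized estimate added here for illustration only: it assumes that all exhaust gas passes through the 7 mL resin bed and ignores the water in the device. Function names are hypothetical.

```python
def vvm_to_l_per_min(vvm: float, working_volume_l: float) -> float:
    """Convert an aeration rate in vvm to a gas flow in L/min."""
    return vvm * working_volume_l

def empty_bed_contact_time_s(bed_volume_ml: float, gas_flow_l_per_min: float) -> float:
    """Idealized contact time: bed volume divided by volumetric gas flow."""
    gas_flow_ml_per_s = gas_flow_l_per_min * 1000.0 / 60.0
    return bed_volume_ml / gas_flow_ml_per_s

air_flow_l_per_min = vvm_to_l_per_min(2.0, 1.0)  # 2 vvm, 1 L working volume
contact_time_s = empty_bed_contact_time_s(7.0, air_flow_l_per_min)

print(f"Air flow: {air_flow_l_per_min:.1f} L/min")
print(f"Idealized empty-bed contact time: {contact_time_s:.2f} s")
```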

Figure 5-3 Bioreactor parameter optimization for efficient carene production and recovery


The bioreactor and FBCD set up is shown in Figure 5-4.

Figure 5-4 Experimental apparatus

Terpenoid product analyses

The optimization was done for the SACar strain. The SAMyr strain was also run in the bioreactor under the optimized parameters.

Table 5.3 summarizes the monoterpene product distributions obtained. The product distribution at bioreactor scale was compared with aerobic shake flask fermentation (reported in section 4.4.3). The results match closely but the bioreactor runs had higher variability. This suggests that the capturing mechanism is efficient for a variety of monoterpenes. This was also validated by an independent study of testing resin adsorption for a mixture of different monoterpenes.


Table 5.3 Product distribution comparison of shake flask and bioreactor scale

fermentations

Products       Aerobic shake flask fermentations    Bioreactor fermentations
               (Section 4.2.4)                      (This study)
SACar
Carene         83.4±0.9                             84.8±1.5
Terpinolene    3.9±0.2                              3.9±0.6
Sabinene       8.5±0.2                              7.4±0.9
Myrcene        2.5±0.2                              2.2±0.7
Limonene       1.2±0.2                              1.0±0.5
Terpinene      0.3±0.2                              0.5±0.2
Pinene         0.1±0.1                              0.1±0.1
SAMyr
Myrcene        98.7±0.8                             98.4±0.9
Carene         1.0±0.2                              1.2±0.5
Pinene         0.3±0.1                              0.4±0.6

Terpenoid recovery comparison

The SACar fermentations in the bioreactor were run with a 10 % (v/v) dodecane overlay for 18 h after induction, and 4.4 µg/L carene was recovered from the dodecane. When the FBCD was employed for capture in place of the dodecane overlay, the carene recovery was 26.6 µg/L, a 6-fold improvement. Similarly, 14.6 µg/L myrcene was recovered from the SAMyr fermentations. No monoterpenes were detected in the media or in the water used to immerse the FBCD.
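The reported improvement follows directly from the two carene recoveries. A minimal check, with an illustrative helper name:

```python
def fold_change(new_value: float, reference_value: float) -> float:
    """Ratio of a new measurement to a reference measurement."""
    return new_value / reference_value

# Carene recovered with the FBCD vs. with the dodecane overlay (µg/L)
carene_fold = fold_change(26.6, 4.4)
print(f"FBCD vs. dodecane overlay: {carene_fold:.2f}-fold")
```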

In conclusion, the terpenoids were successfully captured from the recombinant E. coli fermentation using the fluidized bed capture device. The device was connected externally to the air flow outlet and was filled with resin. Desorption was carried out using hexane, but other solvents such as methanol can be used as well. Scale-up to this bioreactor scale improved the terpenoid production of SACar and SAMyr, while the product distributions remained similar to those found in aerobic shake flask fermentations.


Chapter 6: Other scholarly contributions

6.1 A Biogenic Photovoltaic Material 166

Other contributors: Sarvesh Kumar Srivastava, Przemyslaw Piwek, Arman Bonakdarpour, David

P. Wilkinson, Vikramaditya G. Yadav

Organic dye-sensitized solar cells (DSSCs) 167 are fabricated by immobilizing Ru-based light-absorbing chelates over a layer of titania (TiO2) working electrode in the presence of an electrolyte 168 . However, despite significant progress, they require toxic solvents, carcinogenic chemicals and expensive, rigid process controls in a clean-room environment, which limits their practical application. In fact, a combination of water-based electrolytes 169 and naturally occurring dyes 170 has been envisioned to hold the key to the development of solar cells and to our future energy needs. The material that we have developed, in contrast, is biogenic 171,172 and comprises a porous mesh of E. coli BL21 cells that are encapsulated with TiO2 (Figure 6-1). The bacterial cells are genetically engineered to synthesize lycopene, a photosensitive dye, and TiO2 is deposited onto the cells via a tryptophan-mediated supramolecular interface to produce a core@shell-like morphology. Unlike other biological or bio-hybrid photovoltaic materials, not only is our synthetic scheme uncomplicated, but the resulting material is also exceptionally stable and exhibits impressive PV properties when used as an anode in an I−/I3−-based DSSC.


Figure 6-1 Sequential representation of whole cell bio-PV materials. (a) molecular cloning of

E. coli for expression of lycopene, (b) non-covalent surface binding of TiO2 nanoparticles

resulting in core@shell-like morphology and (c) deployment of biogenic PV material towards

DSSC fabrication.

We recorded an open circuit potential (VOC) of 0.289 V, a short circuit current (ISC) of 0.19 mA and a corresponding short circuit current density (JSC) of 0.686 mA/cm2 (Figure 6-2(a)). Furthermore, since we coated TiO2 NPs around dye-encapsulating bacteria, we generated a preponderance of biogenic-hydrophobic interfaces that significantly increase the resistance. The time-dependent (ON/OFF) illumination of the bio-PV DSSC (Figure 6-2(b)) exhibited a temporal variation of potential. When illuminated, a net potential difference of ~ 0.2 V was observed over a period of 5 minutes with minimal decay. We also recorded cyclic voltammograms (CV) to confirm the PV effect (Figure 6-2(c)). When the illumination was turned off, we observed typical capacitor-like charging-discharging behavior. However, when the DSSC was illuminated, we measured a current flow of ~ 5 µA (~ 0.02 mA/cm2).
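The reported currents and current densities can be cross-checked against each other. The sketch below back-calculates the active electrode area from ISC and JSC; this area is not stated in the text and is an inferred value, and it is assumed that the same area applies to the cyclic voltammetry measurement. Variable names are illustrative.

```python
# Reported values
isc_ma = 0.19           # short circuit current, mA
jsc_ma_per_cm2 = 0.686  # short circuit current density, mA/cm^2
cv_current_ua = 5.0     # approximate illuminated CV current, µA

# Inferred (not reported) active area: current divided by current density
area_cm2 = isc_ma / jsc_ma_per_cm2

# Current density for the CV measurement, assuming the same active area
cv_density_ma_per_cm2 = (cv_current_ua / 1000.0) / area_cm2

print(f"Implied active area: {area_cm2:.3f} cm^2")
print(f"CV current density: {cv_density_ma_per_cm2:.3f} mA/cm^2")
```

The implied area of roughly 0.28 cm2 reproduces the quoted CV current density of about 0.02 mA/cm2, so the two sets of figures are mutually consistent.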


Figure 6-2 Electrochemical measurements of bio-PV DSSC. (a) I-V curves, (b) open-circuit

photovoltage response (time dependent) and (c) cyclic voltammetry curves

The absence of any reduction-oxidation peaks suggests that the bio-PV material and associated DSSC are stable in the operational range of -0.1 to -0.2 V, which is also the corresponding range of the working potential. The use of metagenomics and synthetic biology to screen and clone microbial hosts that synthesize highly photoactive dyes, together with the flexibility and environmentally-friendly approach offered by biogenics (no further downstream processing, for instance), offers substantial advantages in the development of novel bio-PV materials and devices such as DSSCs, photodiodes and bio-polymeric cells.


6.2 Isolation of phenolic monomers from kraft lignin using magnetically recyclable TEMPO nanocatalyst 173

Other contributors: Saurabh C Patankar, Li-Yang Liu, Lun Ji, Vikramaditya Yadav and Scott

Renneckar

A future bioeconomy will utilize platform chemicals sourced from biomass to create functional products, an approach that mirrors one central tenet of green chemistry. For example, vanillin has been suggested as a platform chemical that can be transformed into materials such as polyethylene terephthalate analogs, useful in products ranging from plastic containers to carpets and other textiles. While much research has been performed on second generation biorefineries producing cellulosic ethanol, a potential cornerstone of a bioeconomy, cellulosic ethanol targets have not been met to date. In part, biofuels have a poor valorization factor compared to platform chemicals such as vanillin, and the environmental benefits have not been realized owing to difficulties associated with scaling the technologies. Moreover, it is not clear that cellulosic ethanol biorefineries can provide a return on investment before mass electrification of vehicle fleets occurs within the next decade. This issue positions the existing pulp and paper industry to lead the crucial transformation towards the production of renewable feedstock for a global bioeconomy. However, mills have faced economic hurdles and many of them have closed in the last decade. In part, like second generation biorefineries, pulp and paper companies struggle to deal with the quarter to a third of the biomass that is composed of lignin. The majority of chemical pulps are produced by the kraft process, which uses sodium sulfate to delignify wood chips, breaking native C-O ether linkages in lignin and subsequently leaching its fragments into the alkaline pulping liquor. In this process, lignin is modified with elemental sulphur and is further repolymerized into a complicated macromolecule with considerable heterogeneity. This heterogeneity limits the usage of lignin to its fuel value during chemical recovery and recycling of caustics. Moreover, this aromatic polymeric material has an important role in the production of platform chemicals in the bioeconomy, yet there are limited economically compelling reasons to work with lignin.

In North America, three different kraft lignins are available in semi-commercial quantities from three mills that all use gymnosperm species, which contain more than 90% guaiacyl lignin derived from the dehydrogenative polymerization of coniferyl alcohol. There are oxidative routes to isolate phenolic aldehydes from this material. However, the technologies employed require consumption of a large amount of alkali and high-temperature reactions with the pathway of vanillin cleavage. Deriving vanillin from biomass has low yields with large inputs of energy and material. Hence, novel catalytic oxidative depolymerization techniques have a crucial role in developing greener bio feedstock processes. In this work, we reported the use of novel catalyst designated as Fe@MagTEMPO developed by our collaborators for the oxidative depolymerization of lignin to obtain a higher yield of vanillin with lower calculated E factor than previously reported in the literature (Figure 6-3).

Figure 6-3 Oxidative depolymerization of kraft lignin using the Fe@MagTEMPO catalyst


Table 6.1 Comparison of vanillin yield derived through oxidative depolymerization of kraft

lignin using Fe@MagTEMPO catalyst with values in literature

Columns: Lignin type; Reaction conditions; Vanillin yield (%); E factor; Ref.

Kraft lignin, WestVaco Co. (Pinus spp.) (Ref. 174)
  Reaction conditions: 60 g.L-1 lignin, 2 N NaOH aq. solution; 9 bar with pO2 being 3 bar; 130 °C, 35 min
  Vanillin yield (%): 10.8    E factor: 20.60

Lignosulfonate LS1200 (Ref. 175)
  Reaction conditions: 110.6 g LS1200, NaOH solution (18 g NaOH in 16 cm3 water); preheated at 15 bar O2, 190 °C for 15 min; 1.08 g CuSO4.5H2O catalyst, 2 cm3 nitrobenzene; 12 bar O2, 190 °C, up to 30 min
  Vanillin yield (%): 5.9    E factor: 19.14

Kraft lignin (Pinus spp.), LWest from WestVaco Co. (Ref. 176)
  Reaction conditions: 30 mg LWest, 7 cm3 of 2 M NaOH aq. solution; 0.45 cm3 nitrobenzene; 170 °C, 4 h
  Vanillin yield (%): 12.14    E factor: 273.59

Sodium lignosulfonate NaLS (Ref. 177)
  Reaction conditions: 220 g.L-1 NaLS, 3 M NaOH aq. solution (pH 14); 4.6 g.L-1 copper sulfate (Cu2+); 11.5 bar with pO2 being 1.3 bar; air flowrate of 4.5x10-3 m3.min-1; 140-160 °C
  Vanillin yield (%): 7    E factor: 21.38

Spruce kraft lignin (Ref. 178)
  Reaction conditions: 140 mg lignin, 30 cm3 of 2 M NaOH aq. solution; 10 mg LaMn0.8Cu0.2O3 catalyst; 5 bar O2 + 15 bar He; 175 °C, 10 min
  Vanillin yield (%): 17.3    E factor: 217.73

This work
  Reaction conditions (all three lignin types): 1 % (w/w) lignin sample, 200 cm3 water as solvent; 0.175 mg.cm-3 Fe@MagTEMPO recyclable catalyst; 0.2 mmol NaBr, 5 mmol.g-1 NaClO; 25 °C, 4 h
  Indulin AT from West Rock: vanillin yield (%) 15; E factor 7.75
  Acetone soluble Indulin AT (ASKL): vanillin yield (%) 21; E factor 5.25
  Acid wash Indulin AT: vanillin yield (%) 19.7; E factor 5.66


The Fe@MagTEMPO catalyst has free amine groups attached to magnetic nanoparticles, which were observed to provide a local alkaline environment that abstracts a proton from the phenolic hydroxyl in lignin, eliminating the use of alkali during oxidative depolymerisation. This reduces waste generation by over 98% compared to traditional methods of isolating vanillin from kraft liquor, which were infamous for generating 160 kg of alkali waste per kilogram of vanillin.
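The waste arithmetic above can be sketched in a few lines of Python. The snippet treats the E factor as mass of waste per unit mass of product and compares two E-factor values quoted in Table 6.1. The function names are illustrative, and reading the 98% reduction against the 160 kg baseline is an assumption made here for illustration, since the text does not state the baseline explicitly.

```python
def e_factor(waste_kg: float, product_kg: float) -> float:
    """E factor: kilograms of waste generated per kilogram of product."""
    if product_kg <= 0:
        raise ValueError("product mass must be positive")
    return waste_kg / product_kg


def percent_reduction(baseline: float, improved: float) -> float:
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Values quoted in the text and in Table 6.1.
TRADITIONAL_ALKALI_WASTE = 160.0  # kg alkali waste per kg vanillin
LOWEST_LITERATURE_E = 19.14       # lignosulfonate LS 1200 entry
LOWEST_THIS_WORK_E = 5.25         # acetone soluble Indulin AT entry

# Assumption: the "over 98%" reduction is applied to the 160 kg baseline.
remaining_waste = TRADITIONAL_ALKALI_WASTE * (1 - 0.98)

if __name__ == "__main__":
    print(f"Waste left after a 98% cut: {remaining_waste:.1f} kg per kg vanillin")
    reduction = percent_reduction(LOWEST_LITERATURE_E, LOWEST_THIS_WORK_E)
    print(f"E-factor reduction vs. lowest literature entry: {reduction:.1f}%")
```

Under this reading, the traditional alkali waste would fall to roughly 3 kg per kilogram of vanillin.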


6.3 The V-factor: Towards a new metric for gauging the efficiency & profitability of manufacturing processes for the bioeconomy

Other contributors: Saurabh C Patankar, Vikramaditya Yadav and Scott Renneckar

Several metrics have been formulated to evaluate the environmental impact of chemical manufacturing processes. However, there are no formulas for quick, back-of-the-envelope estimation of their efficiency of resource utilization and profitability, which are hugely influential when it comes to determining investments. In this work, we developed a new metric called the V-factor to estimate these parameters for manufacturing processes that utilize biomass. Additionally, we also argued that co-assessment of E- and V-factors will greatly facilitate the development of environmentally and economically sustainable processes for the valorization of biomass (Figure 6-4).

Figure 6-4 Comparison between E-factor and V-factor

The unabridged V-factor incorporates all variables that influence the profitability of a manufacturing process, namely the costs of its reactants and products ($/kg), its fractional yield, its capital and operating expenditures (CapEx and OpEx, respectively, $/kg for both), tonnage of the product (MT or metric tons), and the tonnage of its product sector (MT). The unabridged V-factor (Equation 6-1) is estimated as:

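\[
\text{Unabridged V-factor} = \frac{\text{Cost of product}}{\text{Cost of reactant}} \times \text{Fractional yield} \times \frac{\text{CapEx}}{\text{OpEx}} \times \frac{\text{Tonnage of product}}{\text{Tonnage of product sector}}
\]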

Equation 6-1 The unabridged V-factor equation
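A minimal sketch of Equation 6-1 as a Python function may help make the four factors explicit. The function and argument names are illustrative, and the numbers in the usage example are hypothetical placeholders rather than data from this work.

```python
def unabridged_v_factor(
    product_cost: float,      # $/kg
    reactant_cost: float,     # $/kg
    fractional_yield: float,  # between 0 and 1
    capex: float,             # $/kg
    opex: float,              # $/kg
    product_tonnage: float,   # MT
    sector_tonnage: float,    # MT
) -> float:
    """Product of the four ratios that appear in Equation 6-1."""
    if not 0.0 <= fractional_yield <= 1.0:
        raise ValueError("fractional yield must lie between 0 and 1")
    if min(reactant_cost, opex, sector_tonnage) <= 0:
        raise ValueError("denominators must be positive")
    return (
        (product_cost / reactant_cost)
        * fractional_yield
        * (capex / opex)
        * (product_tonnage / sector_tonnage)
    )


if __name__ == "__main__":
    # Hypothetical inputs chosen only to exercise the formula.
    value = unabridged_v_factor(
        product_cost=10.0,
        reactant_cost=2.0,
        fractional_yield=0.5,
        capex=1.0,
        opex=2.0,
        product_tonnage=100.0,
        sector_tonnage=1000.0,
    )
    print(f"Unabridged V-factor: {value:.3f}")
```

Writing the metric this way also shows how many inputs a user must estimate before obtaining a single number.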

Although the unabridged V-factor adequately describes how efficiently a process valorizes its inputs, it is not easy to calculate. This is especially true for academic users who may not have good estimates for CapEx, or up-to-date information about pricing or product tonnages. It is also evident that the equation has redundancies that could be factored out in most cases. As a result, we have simplified the previous formula by incorporating the fractional dollar output for individual sectors. We term the simplified metric as the V-factor (Equation 6-2), and the fractional dollar output is an attribute of the industrial sector in which the product will be sold.

\[
\text{V-factor} = \frac{\text{Cost of product}}{\text{Cost of reactant}} \times \text{Fractional yield} \times \text{Fractional dollar output of sector}
\]

Equation 6-2 V-factor equation

This simplification greatly improves the utility of the V-factor and provides academic users with a useful tool to quickly assess profitability during the formulation of green chemistry and engineering research strategies. The fractional dollar output also ranges between 0 and 1. In addition, the sum of the fractional dollar outputs for all sectors of the chemical industry is one.

Forecasted global averages of fractional dollar outputs for some common industrial sectors between now and 2050 are listed in Table 6.2. The impacts of manufacturing volumes, capital and operating expenditures, and investments in R&D on profits have been assessed previously 179,180, and it has been observed that a sector’s profit share of the global chemical market has remained constant over a period of 20 years and is expected to remain the same over the next 25 years.

The volume or tonnage at which a product must be manufactured in order to attain economies of scale, the capital and operating expenditures, and the investments in R&D that are typically required for a sector greatly influence its profit share. For instance, the profit share is positively correlated with manufacturing volumes. Sectors with higher manufacturing volumes, such as petrochemicals, typically achieve greater economies of scale compared with speciality chemical producers 181. However, higher manufacturing volumes usually incur substantially greater capital expenditures. As a consequence, the profit share for operating in a low-tonnage sector, despite disadvantages such as higher product and process R&D expenditures, is not significantly lower in comparison. We used the sector’s profit share data and coined the term fractional dollar output to simplify the determination of the V-factor. The use of the fractional dollar output for estimating the profitability of processes is akin to performing a gate-to-gate life cycle analysis (LCA) in lieu of the more accurate cradle-to-grave LCA.


Table 6.2 Predicted global averages of fractional dollar outputs of selected sectors

Sector | Fractional dollar output
Basic chemicals, fuels and petrochemicals | 0.3
Pharmaceuticals and agrochemicals | 0.2
Speciality chemicals | 0.2
Polymers and fibres | 0.2
Oleochemicals, surfactants & auxiliary chemicals | 0.1
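The simplified V-factor of Equation 6-2 can be combined with the sector values of Table 6.2 in a short Python sketch. The dictionary mirrors the table; the function name, the prices, and the assignment of the product to the speciality chemicals sector are hypothetical choices made only for illustration.

```python
import math

# Fractional dollar outputs copied from Table 6.2.
FRACTIONAL_DOLLAR_OUTPUT = {
    "Basic chemicals, fuels and petrochemicals": 0.3,
    "Pharmaceuticals and agrochemicals": 0.2,
    "Speciality chemicals": 0.2,
    "Polymers and fibres": 0.2,
    "Oleochemicals, surfactants & auxiliary chemicals": 0.1,
}


def v_factor(product_cost: float, reactant_cost: float,
             fractional_yield: float, sector: str) -> float:
    """Simplified V-factor (Equation 6-2) for a product sold in `sector`."""
    if reactant_cost <= 0:
        raise ValueError("reactant cost must be positive")
    if not 0.0 <= fractional_yield <= 1.0:
        raise ValueError("fractional yield must lie between 0 and 1")
    sector_output = FRACTIONAL_DOLLAR_OUTPUT[sector]
    return (product_cost / reactant_cost) * fractional_yield * sector_output


if __name__ == "__main__":
    # The listed sector outputs sum to one, as stated in the text.
    assert math.isclose(sum(FRACTIONAL_DOLLAR_OUTPUT.values()), 1.0)

    # Hypothetical prices ($/kg) and sector assignment; 0.21 is used as an
    # example fractional yield.
    example = v_factor(product_cost=15.0, reactant_cost=0.5,
                       fractional_yield=0.21, sector="Speciality chemicals")
    print(f"V-factor: {example:.2f}")
```

Only two prices, a yield and a sector lookup are needed here, which is the practical advantage of the simplified form over the unabridged one.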

In this work, we demonstrated the use of the V-factor in understanding the profitability of over 50 chemical manufacturing processes.


6.4 Microbial growth and its characterization on nanocellulose synthesized from cherry veneer

Other contributors: Saurabh C Patankar, Muzaffer Karaaslan, Vikramaditya Yadav and Scott Renneckar

The use of the reusable TEMPO nanocatalyst (Fe@MagTEMPO) for the synthesis of nanofibrillated cellulose from kraft pulp 182 and phenolic monomers from kraft lignins 173 (Section 6.2) has been demonstrated. We wanted to investigate one-pot fractionation and modification of biomass using the same catalyst. The veneer was an excellent model for wood chips owing to its nominal thickness and the longitudinal arrangement of its fibers. The reactions were carried out in 200 cm3 distilled water containing 2 g cherry veneer chips (50 x 10 x 0.5 mm) at pH 10. Sodium bromide (30 mg) and Fe@MagTEMPO catalyst (50 mg) were added to the dispersion, the dispersion was heated to the desired temperature, and oxidation was started by dropwise addition of sodium hypochlorite (10 cm3). The reaction was stopped when no further drop in pH was observed.

The catalyst was separated from the reaction using an external magnet. It was ensured that the reaction mixture was still alkaline (pH 10) to keep the lignin fraction solubilized. The cellulose fraction was then separated by filtration and redispersed in water such that the solid concentration was 0.5% (w/w). This dispersion was blended for 30 min and homogenized for two cycles to obtain nanocellulose. The nanocellulose samples were stored in a closed container at 4 °C.

After 10 months of storage, the sample appeared to be contaminated with microbes.

This is the first report of characterization of contaminants of treated nanofibrillated cellulose. The nanocellulose sample was inoculated in tryptic soy broth (purchased from BD Biosciences) and modified Luria-Bertani broth (1% w/w tryptone, 1% yeast extract, 1% sodium chloride). Cultures were grown for seven days at 22 °C, 30 °C and 37 °C. The cultures were streaked on tryptic soy agar and modified Luria-Bertani agar plates and incubated again for seven days at 22 °C, 30 °C and 37 °C. Morphologically distinct colonies were isolated and analyzed using optical microscopy. Genomic DNA was extracted from the isolates by phenol/chloroform/isoamyl alcohol extraction. Cell pellets were washed and resuspended in 10 mM Tris hydrochloride buffer (pH 8) containing 1 mM EDTA and 3 mg/mL lysozyme. The suspension was incubated at 60 °C for an hour. Next, 15 µg/mL sodium dodecyl sulfate, Proteinase K and RNase were added, and the samples were incubated at 60 °C for an hour. Phenol solution (equilibrated with 10 mM Tris HCl, pH 8.0, 1 mM EDTA) was added to the samples. All samples were centrifuged at 12000 rpm to collect the aqueous layer containing DNA. The layer was washed several times with a mixture of chloroform and isoamyl alcohol to remove phenol. The purity of the DNA was assessed by the ratios of absorbance at 260 nm/280 nm and 260 nm/230 nm on a Thermo Fisher Scientific NanoDrop™ 2000/c Spectrophotometer. Genomic DNA was stored at -20 °C until further steps.
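The purity check described above can be illustrated with a short Python sketch that computes the two absorbance ratios. The thresholds used (about 1.8 for 260/280 and about 2.0 for 260/230) are commonly cited rules of thumb for DNA and are an assumption here, not acceptance criteria stated in this work; the absorbance readings are hypothetical.

```python
from typing import NamedTuple


class PurityRatios(NamedTuple):
    a260_a280: float
    a260_a230: float


def purity_ratios(a230: float, a260: float, a280: float) -> PurityRatios:
    """Return the A260/A280 and A260/A230 ratios for one sample."""
    if a230 <= 0 or a280 <= 0:
        raise ValueError("absorbance readings must be positive")
    return PurityRatios(a260 / a280, a260 / a230)


def looks_pure(ratios: PurityRatios,
               min_260_280: float = 1.8,
               min_260_230: float = 2.0) -> bool:
    """Rule-of-thumb check; the default thresholds are assumptions."""
    return ratios.a260_a280 >= min_260_280 and ratios.a260_a230 >= min_260_230


if __name__ == "__main__":
    # Hypothetical readings for two isolates.
    clean = purity_ratios(a230=0.50, a260=1.00, a280=0.55)
    carryover = purity_ratios(a230=0.90, a260=1.00, a280=0.70)
    for name, r in (("clean", clean), ("carryover", carryover)):
        print(name, f"{r.a260_a280:.2f}", f"{r.a260_a230:.2f}", looks_pure(r))
```

A low 260/230 ratio in such a check is often taken as a hint of residual phenol or salts, which is why repeated chloroform washes matter in this protocol.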

Ribosomal RNA (rRNA) gene sequencing was used for phylogenetic characterization of the four isolated colonies. Three sets of primers were used to amplify the rRNA gene. The 16S_F (5’-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG-3’) and 16S_R (5’-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC-3’) primers (Tm of 60 °C) target the bacterial 16S V3 and V4 region 183, amplifying a region of 500-600 bp. The ITS1 (5’-TCCGTAGGTGAACCTGCGG-3’) and ITS4 (5’-TCCTCCGCTTATTGATATGC-3’) primers (Tm of 57 °C) target the highly variable fungal Internal Transcribed Spacer (ITS) sequences surrounding the 5.8S-coding sequence that is situated between the Nuclear Small rDNA and Nuclear Large rDNA of the ribosomal operon 184, amplifying a region of 800-1000 bp. (GS primer info). Amplification was performed on extracted genomic DNA as a template by Phusion High-Fidelity DNA Polymerase in Phusion HF Buffer from Thermo Fisher Scientific. PCR conditions were as follows: initial denaturation at 98 °C for 3 min; 30 cycles of 10 sec denaturation at 98 °C, 30 sec primer annealing at a primer-specific melting temperature (Tm), 30 sec extension at 72 °C; final annealing for 10 min. The amplified DNA was separated on an agarose gel by electrophoresis and extracted using the GeneJET Gel Extraction Kit by Thermo Fisher Scientific. The purified amplicons were sequenced by Genewiz Inc. The sequences obtained were aligned against the NCBI GenBank database using the Basic Local Alignment Search Tool (BLAST). The sequence alignments that elicited 100% query coverage with 100% identity and an E value of 0.0 were analyzed further.
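The hit-selection rule stated above (100% query coverage, 100% identity and an E value of 0.0) can be expressed as a small filter. The sketch below is illustrative only: the `BlastHit` record, its field names and the example hits are hypothetical and do not correspond to any particular BLAST output format or to the hits obtained in this work.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class BlastHit:
    subject: str           # hypothetical subject identifier
    query_coverage: float  # percent
    identity: float        # percent
    evalue: float


def passes_filter(hit: BlastHit) -> bool:
    """Keep only hits with full coverage, full identity and E value 0.0."""
    return (
        hit.query_coverage == 100.0
        and hit.identity == 100.0
        and hit.evalue == 0.0
    )


def select_hits(hits: Iterable[BlastHit]) -> List[BlastHit]:
    return [hit for hit in hits if passes_filter(hit)]


if __name__ == "__main__":
    # Hypothetical hits used only to exercise the filter.
    hits = [
        BlastHit("hypothetical_subject_A", 100.0, 100.0, 0.0),
        BlastHit("hypothetical_subject_B", 100.0, 99.2, 0.0),
        BlastHit("hypothetical_subject_C", 97.0, 100.0, 1e-150),
    ]
    for hit in select_hits(hits):
        print(hit.subject)
```

Such a strict filter discards near-identical matches, so it favours confident strain-level assignments at the cost of leaving some isolates unassigned.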

rDNA sequencing analysis characterized the four isolated contaminants as one bacterium and three fungi. The bacterium was identified as Paraburkholderia fungorum strain OX1216. Papiliotrema flavescens strain 7 (also known as Cryptococcus flavescens) and Rhodotorula mucilaginosa strain F.P705162591 were identified as basidiomycetous yeasts. The fourth isolate, a multicellular filamentous fungus, was identified as Penicillium chrysogenum strain HGQ6. Paraburkholderia fungorum and Papiliotrema flavescens were isolated as pure cultures. Rhodotorula mucilaginosa and Penicillium chrysogenum could not be isolated from their co-cultures with Paraburkholderia fungorum. Paraburkholderia sp. is reported to occur in symbiotic relationships with yeasts and fungi 185. Various strains of Papiliotrema flavescens possess glycoside hydrolase activity 186. The bacterial strain Paraburkholderia fungorum is reported to exhibit β-galactosidase activity 187.


Atomic force microscopy (AFM) images were captured using Veeco multimode 8 system and RTESPA-525 cantilever tip with force constant of 200 N.m -1 and resonant frequency of 525 kHz. A drop of 0.001% (w/w) suspension of ligno-nanocellulose in water was put on freshly cleaved mica sheet and dried in air at 20 °C before AFM analysis. The AFM images were captured just after the synthesis of nanocellulose and after 10 months of storage in closed container at 4 °C.

Figure 6-5a shows the AFM height image of freshly prepared nanocellulose synthesized from cherry veneer using the Fe@MagTEMPO catalyst. The cellulose nanofibrils are individually separated and well dispersed on the mica due to the negative surface charge introduced by oxidation. The width of the nanofibrils estimated from the height images is in the range of 2-5 nm, which is in accordance with previous reports 188. Figure 6-5b shows the height image of the nanocellulose sample after 10 months of storage at 4 °C. The nanofibrils were broken down into smaller pieces and shorter fibrils, and chunks of these nanofibrils were aggregated.

Figure 6-5 AFM height profile of nanocellulose samples synthesized from cherry veneer using Fe@MagTEMPO catalyst. (a) Freshly synthesized nanocellulose sample and (b) nanocellulose sample stored at 4 °C for 10 months


In addition, the assembly of microorganisms on the cellulose nanofibrils was identified by AFM (Figure 6-6). The AFM height, adhesion and peak force error images showed microbial growth and assembly of microorganisms specifically on individual nanofibrils. The particles on the nanocellulose samples were approximately 200-500 nm in width and 20-80 nm in height.

Figure 6-6 AFM image showing assembly of microbes on nanocellulose fibrils synthesized from cherry veneer using Fe@MagTEMPO catalyst


Chapter 7: Conclusions

Terpenoids are a diverse class of natural products with great industrial potential, but they are produced in low quantities in their natural hosts such as plants. These natural sources fail to meet the increasing demand for these molecules. Total chemical syntheses of terpenoids have been attempted but usually involve low-yielding steps that require hazardous chemicals. The natural biochemistry is still the best path for terpenoid production in terms of modularity and potential for widespread adoption.

Application of microbial cell factories for terpenoid biosynthesis has provided an alternative, aided by advancements in synthetic biology and metabolic engineering tools. To design the system efficiently, a thorough knowledge of the biosynthesis is necessary. The terpenoid biosynthesis route in the natural hosts contains an upstream pathway that synthesizes the C5 precursors IPP and DMAPP from central carbon metabolism, and a downstream pathway of polymerization of the precursors followed by cyclization and activation. There are two upstream pathways for terpenoid production: the MVA pathway and the MEP pathway. The MEP pathway is energetically balanced and more efficient than the MVA pathway in the utilization of sugars 28. The MEP pathway is also natively expressed in bacteria, which have proven to be excellent heterologous hosts for terpenoid biosynthesis 25,79,118. The yields of terpenoids through such platforms are still lower than their theoretical maxima, and various strategies have been undertaken to optimize and improve the flux through the pathway. These efforts to improve heterologous terpenoid production will be in vain if large-scale bioprocess development is not possible. The lack of a suitable terpenoid recovery technique is impeding terpenoid commercialization. Hence, I set the objective to design and improve biocatalysis and bioprocess for terpenoid production in E. coli in a holistic way.

The overall question was ‘despite tremendous economic advantages, why are there so few examples of commercial terpenoid production?’ This question was broken down into three well-identified research objectives. The first objective sought to augment overall terpenoid production by improving the intracellular pool of the C5 precursors IPP and DMAPP. The second objective was to study terpene synthase catalysis with a focus on the factors governing promiscuity, and the third objective was to design an efficient terpene recovery technique for large-scale bioprocessing.

7.1 Improvement in the intracellular pool of C5 precursors

The MEP pathway is the first bottleneck in the biosynthetic route. Various strategies have been undertaken to optimize and improve the MEP pathway flux. The MEP pathway operon reconstructed from native MEP pathway genes achieved performance similar to previously reported best-in-class constructs that used multivariate modular metabolic engineering for taxol precursor biosynthesis 28. This construct was used as the basis for the study of augmenting C5 precursor availability. Isoprene and lycopene were used as the reporter molecules.

The strategy used to enhance production of the reporter terpenoids through MEP pathway engineering involved a metagenomic search for more active and stable orthologs of the pathway bottleneck enzymes Dxs, IspD, IspF and Idi. The recently discovered IspDF fusion enzymes are predicted to channel the substrate from one active site to the next and decrease diffusional limitations. The metagenomic search was therefore tailored towards such enzyme fusions. Three fusions were discovered, and all co-localized IspD and IspF active sites onto a single scaffold. When these fusions were tested, one fusion (IspDF 1) increased the lycopene titers by 17% and the isoprene titers by 73%. When the cell densities were factored in, the IspDF 1 construct enhanced the normalized titers by 9% and 60% for lycopene and isoprene, respectively. The fusion not only reduced the metabolic burden but also enhanced terpenoid production. However, this differential behavior between a hemiterpene like isoprene and a tetraterpene like lycopene remained unexplained.

The improvement in the flux upon fusion expression was initially attributed to the co-localization effect that channeled the substrate effectively through the pathway. This directed the next study on the role linkers play in the effectiveness of the enzyme fusion. This study was conducted on non-natural enzyme chimeras of IspDF 1 as well as E. coli IspD and IspF. Four different linkers (a flexible linker, a rigid linker, the linker from cjIspDF and the linker from IspDF 1) were tested. The rigid linker for the IspDF 1 chimera improved the normalized lycopene titers by 34% but lowered overall production by 21%. This was due to the lower cell growth rate, which was reflected in the lower OD 600 of the cultures. For the non-natural fusions of native IspD and IspF, in contrast, none of the fusions showed a positive effect. In the study so far, the role of IspE, which catalyzes the intermediary step, was unclear. Hence, a similar set of experiments was done for the operon now also overexpressing IspE. To our surprise, this had a net negative effect on lycopene production. This observation was attributed to metabolic stress.

Higher normalized titers for the fusion of IspD 1 and IspF 1 linked by the rigid linker directed us to separate the fusion into two genes, each expressing one domain individually. This gave outcomes similar to those of the fused enzyme. When IspE was overexpressed along with the operon comprising the best performing fusion, normalized titers increased by 23% but the overall lycopene titers dropped by 31%. We concluded that the individual domains in the IspDF 1 fusion were more active and that the effect of fusion was no more than a reduction in transcriptional burden.
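The paired percentages quoted in this section (change in overall titer versus change in normalized titer) imply a change in cell density. The sketch below back-calculates that implied relative OD 600 under the assumption that the normalized titer is the titer divided by OD 600; the text says only that cell densities were "factored in", so this definition, like the function name, is an assumption.

```python
def implied_od_ratio(titer_change_pct: float, normalized_change_pct: float) -> float:
    """Relative OD600 (test strain / reference) implied by the two changes.

    Assumes normalized titer = titer / OD600, so that
    OD ratio = (1 + titer change) / (1 + normalized titer change).
    """
    return (1 + titer_change_pct / 100) / (1 + normalized_change_pct / 100)


# (overall titer change %, normalized titer change %) as quoted in the text.
CASES = {
    "IspDF 1, lycopene": (17, 9),
    "IspDF 1, isoprene": (73, 60),
    "IspDF 1 chimera with rigid linker, lycopene": (-21, 34),
    "Best fusion with IspE overexpressed, lycopene": (-31, 23),
}

if __name__ == "__main__":
    for label, (titer_pct, normalized_pct) in CASES.items():
        ratio = implied_od_ratio(titer_pct, normalized_pct)
        print(f"{label}: implied relative OD600 = {ratio:.2f}")
```

Under this assumption, the two cases with reduced overall production correspond to cultures reaching roughly 55 to 60% of the reference density, consistent with the growth defect described in the text.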

Another approach, linking enzymes that catalyze consecutive steps in the MEP pathway, was then considered to augment the C5 precursor pool in E. coli. The IspDE and IspEF fusions with a flexible linker were constructed for native E. coli enzymes. The IspD FL EF fusion produced lycopene at the highest titer of 281 mg/L. This titer is 20% higher than that of the basal strain SALyc-SDFI (overexpressing Dxs, IspD, IspF and Idi), 2.3-fold that of strain SALyc-SDFEI (overexpressing Dxs, IspD, IspF, IspE and Idi) and 70-fold that of the SALyc strain (which utilizes the genomic MEP pathway and does not contain any additional MEP gene cassette). In this work, we demonstrated the strategy of non-naturally co-localizing pathway enzymes to enhance the flux through the pathway, thereby increasing the intracellular precursor pool.
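The fold comparisons above can be turned into approximate titers for the comparison strains. The following sketch simply divides the reported 281 mg/L by each stated ratio; the resulting values are back-calculated here and are only as precise as the rounded ratios in the text.

```python
BEST_TITER_MG_PER_L = 281.0

# Ratio of the best titer to each comparison strain, as stated in the text
# ("20% higher" is read as a ratio of 1.20).
RATIO_TO_STRAIN = {
    "SALyc-SDFI": 1.20,
    "SALyc-SDFEI": 2.3,
    "SALyc": 70.0,
}

# Back-calculated titers for the comparison strains (approximate).
implied_titers = {
    strain: BEST_TITER_MG_PER_L / ratio
    for strain, ratio in RATIO_TO_STRAIN.items()
}

if __name__ == "__main__":
    for strain, titer in implied_titers.items():
        print(f"{strain}: about {titer:.0f} mg/L")
```

The value implied for SALyc-SDFI is close to the 235 mg/L reported in the abstract for native enzyme overexpression, which suggests the ratios are internally consistent.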

7.2 Understanding the biocatalytic sequence of steps in the MEP pathway

The IspDF 1 fusion improved the flux through the MEP pathway, but IspDF 2 and IspDF 3 lowered it. Moreover, the IspD FL E fusion was more effective than its unfused variant. These results raised the question of why natural IspDF fusions, which catalyze non-consecutive steps in the pathway, exist. Here I proposed a bifurcation in the pathway wherein IspE can also precede IspD in the catalytic cascade. The in vitro reactions of IspE were analyzed by ATP bioluminescence assay, HPLC and LC-MS. The analysis led to the characterization of ME-2,4-PP as a product. The chemical synthesis of ME-2,4-PP was challenging and failed, and hence the final product could not be characterized in greater detail. In addition, the IspE reactions accepted both ATP and ADP as the phosphate donor. This study opened up the possibility of plasticity in the MEP pathway.


The strategy employed to improve the flux through the MEP pathway is not limited to only the MEP pathway but has wider applicability. Moreover, this study also guided the further investigation of the pathway cascade. Though much more work needs to be done to verify it in detail, this is the first report of the existence of the MEP pathway bifurcation. The robust and high precursor producing strain can not only be employed to improve terpenoid yield but also used to mine newer terpenoids that otherwise remain undetected in the natural host due to lower abundance.

7.3 Study of terpene synthase biocatalysis

Terpene synthases (TSs) are reported to possess catalytic promiscuity, converting a single substrate to an array of products that are usually structural isomers of each other. Strategies for improving precursor supply are in vain if the downstream TS enzymes do not selectively use the precursors to make only the desired product. A few TSs are highly selective, but a selective enzyme is not available for every known terpenoid. Hence, the only option available is to engineer TSs to reduce their promiscuity. Individual attempts have been made in this direction and have proven to be useful 137,156,189,190. A fact often ignored is that the natural hosts, such as plants, that are the source of these enzymes contain more than one TS. Evolution has diversified the enzymes to achieve newer functions. The objective of this study was to investigate TS promiscuity in the framework of C5 precursor abundance. The factor of evolutionary pressure was kept constant by selecting four enzymes, carene synthase, myrcene synthase, limonene synthase and linalool synthase, all from Norway spruce. We reported the successful expression of MyrTS, LimTS and LinTS in E. coli and characterization of the product spectra.


The first conclusion of this work was that the product distribution changed when studied in vivo. In addition, MyrTS, which was reported to be a highly selective enzyme 136 in in vitro reactions with GPP, made carene and pinene as by-products in vivo. Newer products were characterized for the other TSs as well. The study also showed the influence of fermentation parameters such as oxygen supply on product distribution. The highest terpenoid titers achieved were 12.64 µg/L carene for CarTS, 0.84 µg/L myrcene for MyrTS, 1.08 µg/L limonene for LimTS and 10.33 µg/L linalool for LinTS. The substrate and TSS docking studies on homology models generated for the enzymes revealed the plausible role of the amino acid residues that shape the active site. Residues like arginine were dominant in CarTS and LimTS, which form cyclic products, whereas residues like phenylalanine were dominant in MyrTS and LinTS, which form acyclic products. The conclusions from this comprehensive study can be utilized to engineer each of these enzymes for the desired activity.

7.4 Bioprocess development for terpenoid recovery

There are many lab-scale successes of efficient strain engineering for terpenoid production. However, scale-up is challenging, mainly because the product recovery technique (such as biphasic extraction) that is suitable at a small scale becomes unprofitable at a large scale. A module, the fluidized bed capture device (FBCD), was designed that requires minimal hardware perturbations for integration with the bioreactor. The device was designed to exploit the volatility and hydrophobicity of the terpenoids to recover them from the fermentation. The device contains a resin bed fluidized in water by the air flowing out of the bioreactor. The terpenoids in the air outlet were efficiently captured by adsorption. The terpenoids were then recovered from the resin by desorption using solvents like methanol and hexane. The terpenoid products can then be distilled and the solvents recycled. Methanol and hexane are defined as class II restricted solvents in pharmaceuticals by the ICH (International Council for Harmonisation) Q3C guidelines, and their residual content should be closely monitored in the final product. The process can also be tailored to utilize class III solvents like acetone, isopropanol and pentane, which are Generally Recognized As Safe (GRAS).

The recovery of carene increased six-fold when the dodecane-based biphasic in situ extraction was replaced by the FBCD. The process was validated for the SACar and SAMyr strains, and their product distributions were similar to those obtained in the aerobic shake flask fermentations in my previous study. The titers of the major terpenoid achieved with the optimized bioprocess with the FBCD were 26.6 µg/L for SACar and 14.6 µg/L for SAMyr. The fold improvement was a result of improved aeration in the system as well as reduced product toxicity.

This architecture, developed and studied for terpenoid bioprocesses, has wider applications for products with similar characteristics, such as butanol and alkanes.

In conclusion, the entire work was conducted in a holistic way to address the existing challenges in terpenoid biomanufacturing. The expertise gained during this work was also applied to problems in related fields, which are summarized in Chapter 6 of the thesis.


7.5 Future work

This work was based on addressing three major challenges in the field of study. The work led to many novel outcomes and observations. Further experiments and tests towards bolstering claims in this thesis as well as providing depth to the scientific findings can be carried out. This chapter discusses different aspects of these future efforts in detail.

Improvement in MEP pathway flux (Chapter 2)

Instead of using reporter terpenoids and indirect measurements of the pathway flux, a more insightful study could be carried out using radio- or isotopically labelled substrates. In vivo kinetic parameter estimation for each catalytic step would provide greater insight into the flux determination. Another, simpler approach would be to use a defined carbon source (like glucose or glycerol) in minimal media and quantify the yield of the reporter terpenoids. The present work did not consider this option due to slower growth rates in minimal media.

The metagenomic bifunctional enzyme screening could be applied to different genomic libraries to identify suitable targets. Advanced tools can be used to develop homology models for the natural as well as non-natural enzymes. The best performing fusions, IspDF 1 and IspD FL E, could be purified to obtain crystal structures and to study the effect of substrate channeling further. To evaluate the observations on IspE overexpression, transcriptomic analysis would reveal its influence on cellular activity, and a fair comparison could then be made by normalizing the terpenoid titers to individual enzyme expression levels. Moreover, parallel experiments could be carried out in cell-free systems, and the optimal MEP pathway enzyme concentrations could be quantified.


The activity of the IspD 1 and IspF 1 domains can be estimated and, depending on the results, further studies can be conducted by replacing each domain with the corresponding E. coli monofunctional enzyme, for example a fusion of IspD 1 and E. coli IspF. Such a study can be extended to domains of other natural fusions as well as non-natural IspE fusions. As a step forward, sequence-activity correlations could be used to evolve the IspD and IspF enzymes for higher activity. Together, these experiments can also guide the engineering of the MEP pathway for improved terpenoid production.

MEP pathway biochemistry (Chapter 3)

The study undertaken in this work was rather serendipitous and served as a proof of concept. A detailed analysis of the MEP pathway cascade is necessary to confirm the findings. Future work would be directed towards detailed analysis of each reaction step using an isotopically labeled substrate. There are no reports of testing MEP as a substrate for IspE. In vivo product characterization of these steps can add great value to the understanding of the MEP pathway. The chemical synthesis of ME-2,4-PP can be pursued further with a newer approach, and the products can be characterized in detail.

Terpene synthase study (Chapter 4)

The bioprocessing parameters, like oxygenation and pH, that influenced the terpenoid titer and the product spectra could be studied for other TSs to deduce the exact mechanism. The MTSs studied could be purified and their crystal structures determined to reveal the active site architecture. The observations of the homology modeling and the structure-function correlations can also be verified further with mutations. The product spectra of the mutants can be correlated back to the structure of the active site. The comprehensive study would then generate an excellent model to engineer monoterpene synthases. The strategy of comparing in silico observations with the in vivo product profile of the MTSs can be applied to other species of plants as well as to other classes of TSs. A database of such observations can be used to direct evolution studies for the improvement of TS activity.

Terpenoid recovery from the bioprocess (Chapter 5)

The efficiency of the developed capture technique could be improved further in several ways. The fluidized bed height can be increased to allow longer contact of the rising air bubbles with the water as well as the resin beads. A gas distributor plate can be added at each nozzle, and an air bubble breaker can be used to distribute the bubbles in smaller sizes, increasing the area for stripping. Bringing the beads into direct contact with the culture could greatly improve the recovery. I attempted this by dispersing beads in the fermentation media, but the growing cells clogged the pores; moreover, downstream processing and recovery of terpenoids were cumbersome. Another approach would be to use porous adsorbent pads, either integrated into the bioreactor headspace or covering the top layer of the culture. For the latter to work, the adsorbent material has to be resistant to biofilm formation. More work on this topic is currently being pursued by our collaborators.


Appendices

Appendix A Fusion protein sequences

A.1 IspDF 1

MIALQRSLSMHVTAIIAAAGEGRRLGAPLPKQLLDIGGRSILERSVMAFARHERIDDVIV

VLPPALAAAPPDWIAASGRVPAVHVVSGGERRQDSVANAFDRVPAQSDVVLVHDAAR

PFVTAELISRAIDGAMQHGAAIVAVPVRDTVKRVDPDGEHPVITGTIPRDTIYLAQTPQA

FRRDVLGAAVALGRSGVSATDEAMLAEQAGHRVHVVEGDPANVKITTSADLDQARQR

LRSAVAARIGTGYDLHRLIEGRPLIIGGVAVPCDKGALGHSDADVACHAVIDALLGAAG

AGNVGQHYPDTDPRWKGASSIGLLRDALRLVQERGFTVENVDVCVVLERPKIAPFIPEIR

ARIAGALGIDPERVSVKGKTNEGVDAVGRGEAIAAHAVALLSES

A.2 IspDF 2

MRCEIPHRCVRRKYRIRRHSPFRLRNCWEGRSVRASSLLRQLAKGGCAARRLSWVTPGF

SQSRRCKTTASELTRIGIGIRIQGDIMQVTAIIAAGGRGRRFGGGVPKQLVGVGGRPILER

TVAAFLGHPAIHEVVVALPAELMADPPAYLRAAPKPIRLVAGGVQRQDSVRQAFQAAN

EQSDVIVIHDAARPFASADLISRTIAAAAEGGAALAAVPARDTVKRGAFAAGRTGPAGR

QAVEGAPLLVVAETLPRDSIYLAQTPQAFRRDVLRDALALGEAGSEATDEATLAERAGH

IVRLVEGEPANIKITTPDDLLVAEAIARGTGERAVGERAAFRIGAGYDLHRLVEGRPLVL

GGVTIPFERGLLGHSDADAICHAVTDAVLGAAAAGDIGRHFPDSDPKWRDWSSIDLLRR

ASAIVKGRGYAIANVDAVVIAERPKLAPFLDEMRANVAGAIGIAVDAVGIKGKTNEGLG

ELGRGEAIAVHAVALLHL

A.3 IspDF 3

MRCEIPHRCVRRKYRIRRHSPFRLRNCWEGRSVRASSLLRQLAKGGCAARRLSWVTPGF

SQSRRCKTTASELTRIGIGIRIQGDIMVHVSAIIAAGGRGERFGGPQPKQLLLLGGVPILKR

TVDAFLRGYPFIEVIVALPAEFVANPPDYLDDVIVVEGGARRQDSVANAFRAVAPSAQV

VVIHDAARPLVTPSLIERTVDAAVKHGAAIAALRATDTVKRGDASRVIRGTLPRDEIFLA

QTPQAFRAGVLRDALALAASAADATDEAMLAEQAGHHVRLVDGDPRNLKITTPEDLE

MAERLIGARNTAGAMRIGNGYDLHRLVTGRPLVLGGVTIPFEKGLQGHSDADAVCHAI

TDAILGAASAGDIGRHFPDTDPAWKDAKSIVLLQQAAQIVSRAGYAIANLDVVVIAQQP

KLVPHIDAIRHSVAHALGIDVQQVSVKGKTNEGVDSMGAGESIAVHAVALLQHS

Appendix B HPLC chromatograms for ATP, ADP and AMP analysis

Figure 0-1 HPLC chromatogram for set 2 IspE in vitro reactions. The peaks for ATP (2.49 min), ADP (2.60 min) and AMP (2.72 min) are labelled.

Appendix C GC-MS analysis for terpenes

Figure 0-2 GC-MS chromatograms of aerobic shake flask fermentation analysis for SACar, SAMyr, SALin and SALim+. The monoterpene retention times are: pinene 6.48 min, sabinene 7.42 min, myrcene 7.86 min, carene 8.42 min, limonene 8.88 min, ocimene 9.36 min, terpinene 9.61 min, terpinolene 10.20 min, linalool 10.35 min.
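
The retention times listed in the caption can be used as a lookup table when labelling peaks. The following is only a minimal sketch in Python, not the procedure used in this work: the function name `assign_peak` and the 0.05 min matching tolerance are hypothetical choices made for illustration. A retention-time match only suggests an identity, which would still need to be confirmed from the mass spectrum.

```python
# Retention times (min) copied from the Figure 0-2 caption.
RETENTION_TIMES_MIN = {
    "pinene": 6.48,
    "sabinene": 7.42,
    "myrcene": 7.86,
    "carene": 8.42,
    "limonene": 8.88,
    "ocimene": 9.36,
    "terpinene": 9.61,
    "terpinolene": 10.20,
    "linalool": 10.35,
}


def assign_peak(rt_min, tolerance_min=0.05):
    """Return the listed monoterpene nearest to rt_min, or None.

    tolerance_min is an assumed window (not taken from the thesis); it is
    smaller than half the closest spacing between listed peaks (0.15 min),
    so at most one compound can match.
    """
    name, ref = min(
        RETENTION_TIMES_MIN.items(), key=lambda item: abs(item[1] - rt_min)
    )
    return name if abs(ref - rt_min) <= tolerance_min else None


if __name__ == "__main__":
    # Hypothetical observed peak times, for illustration only.
    for observed in (8.90, 10.33, 7.00):
        print(observed, assign_peak(observed))
```

A peak that falls outside the tolerance of every listed compound is returned as `None` rather than forced onto the nearest name.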
