University of Calgary PRISM: University of Calgary's Digital Repository

Graduate Studies The Vault: Electronic Theses and Dissertations

2013-05-06 Biochemical and evolutionary studies of sesquiterpene lactone metabolism in the sunflower (Asteraceae) family

Nguyen, Trinh Don

Nguyen, T. D. (2013). Biochemical and evolutionary studies of sesquiterpene lactone metabolism in the sunflower (Asteraceae) family (Unpublished doctoral thesis). University of Calgary, Calgary, AB. doi:10.11575/PRISM/25121 http://hdl.handle.net/11023/702 doctoral thesis

University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission. Downloaded from PRISM: https://prism.ucalgary.ca

UNIVERSITY OF CALGARY

Biochemical and evolutionary studies of sesquiterpene lactone metabolism

in the sunflower (Asteraceae) family

by

Trinh Don Nguyen

A THESIS

SUBMITTED TO THE FACULTY OF GRADUATE STUDIES

IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE

DEGREE OF DOCTOR OF PHILOSOPHY

DEPARTMENT OF BIOLOGICAL SCIENCES

CALGARY, ALBERTA

APRIL, 2013

© Trinh Don Nguyen 2013

Abstract

Plants have evolved the capacity to synthesize a myriad of specialized metabolites which enhance their fitness in specific living conditions. These compounds are also widely utilized for human purposes. Elucidating the enzymes involved in plant specialized metabolism has been one of the main forces driving plant biochemistry. The more intriguing question, however, is how these enzymes evolved to acquire their existing functions. The Asteraceae, the largest flowering plant family, is well-known for its enormously diverse and lineage-characteristic contents of sesquiterpene lactones (STLs). Thousands of compounds in this subclass of specialized metabolites have been studied extensively for their structures and valuable bioactivities. However, the details of their metabolism are poorly understood. Studying STLs in the Asteraceae thus improves our knowledge of the biosynthesis of these compounds. Furthermore, the tight links between STLs and the Asteraceae family provide an excellent model to explore adaptive evolution. My thesis aims to advance our understanding of STL metabolism by focusing on elucidating the enzyme that is responsible for the oxidation of sesquiterpene to sesquiterpene carboxylic acid in the general STL biosynthetic route of the Asteraceae. In lettuce, two cytochrome P450-dependent monooxygenase (P450) isoforms that catalyze the three consecutive oxidations of germacrene A to germacrene A carboxylic acid in the biosynthesis of costunolide were characterized. This was achieved using a combination of genomic and biochemical approaches, with the aid of a metabolically-engineered yeast system. Furthermore, this germacrene A oxidase (GAO) activity was demonstrated to be highly conserved throughout the Asteraceae, even in the phylogenetically basal subfamily Barnadesioideae, which split from the rest of the family at least 50 million years ago.
Previous studies have characterized an Artemisia annua-specific sesquiterpene oxidase, amorphadiene oxidase (AMO), which is considered to have diverged from an ancestral GAO. The substrate specificity/promiscuity of AMO and GAOs towards each other’s natural substrates and seven other non-natural substrates was investigated to test the general hypothesis of enzyme evolution from ancestral promiscuity. The results from these combinatorial biochemistry studies and phylogenetic relations of AMO and GAOs provided deep insights into the evolution of these P450s in the context of the chemical diversity of the Asteraceae.

Acknowledgements

This work would never have been possible without the kind support of many people. First of all, I would like to express my gratitude to my supervisor, Dr. Dae-Kyun Ro, for introducing me to the exciting world of the sunflower family and , and for mentoring and inspiring me throughout the years. I would like to thank Dr. Chendanda C. Chinnappa and Mrs. Janaki Chinnappa for encouraging me to get my education at the University of Calgary and making me feel at home in Canada. I want to thank my two Supervisory Committee members, Dr. Sui-Lam Wong and Dr. Gregory Moorhead, and Dr. Edward C. Yeung for continuously giving me helpful advice during my study. I would like to thank all current and former members of the Ro Lab, including Gillian MacNevin, Dr. Nobuhiro Ikezawa, Dr. Jens Göpfert, Vince Yang Qu, Dr. Hue T. Tran, Dr. Romit Chakrabarty, Dr. Benjamin Pickel, Kiva Ferraro, Tegan Haslam, Dr. Jorge Barriuso, Rod Mitchell, Mohamed Attia, Bryan Pyle, Merisa Tran, and Elysabeth Reavell-Roy for their help and friendliness. My research would have been very difficult without the kind help of many colleagues, particularly Dr. David Dietrich (University of Alberta), Chi Nguyen (University of California, Irvine), the late Dr. Dean McIntyre (University of Calgary), and Dr. Ping Zhang (University of Calgary). I also want to acknowledge Dr. John Vederas (University of Alberta), Dr. Sheryl S.C. Tsai (University of California, Irvine, USA), Dr. Isabelle Barrette-Ng (University of Calgary), and my colleagues from the University of Calgary’s Department of Biological Sciences, particularly the Facchini Laboratory and the Zaremberg Laboratory, for their support. My study was financially supported in part by the University of Calgary’s Department of Biological Sciences, and the Bettina Bahlsen Memorial Graduate Scholarship.
I want to thank all my friends in Calgary, especially anh Hoàng – Huyền, cô Kim – chú Dzũng, anh Thắng – chị Trang, and many Canadians who did not even let me know their names, for their help in times of need. I appreciate Dr. Dương Tấn Nhựt (Tây Nguyên Institute for Scientific Research, Viet Nam) and the Biotechnology Center of Hô Chi Minh City (Viet Nam) for supporting my intention to study in Canada. I am truly grateful to my grandparents, my parents and parents-in-law, my brothers and sister, cô Hiền – chú Tý, cô Kiều Anh – chú Chí, and chị Tuyết Phương for all their love, support, and encouragement. And to my wife, Thuỷ, I am forever indebted for your love, sacrifice, and friendship.

Dedication

,

for always believing in me.

Table of Contents

Abstract
Acknowledgements
Dedication
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Published Materials
Chapter 1: General introduction
1.1. Secondary or specialized metabolism in plants
1.2. Terpenoid biosynthesis
1.2.1. Structures and functions of terpenoids
1.2.2. Biogenesis and cellular compartmentalization of the central precursors
1.2.3. Biogenesis of terpenoid structural diversity by TPSs and modifying enzymes
1.2.4. Cytochrome P450-dependent monooxygenases
1.3. STLs in the Asteraceae family
1.3.1. Structures and functions of STLs
1.3.2. Occurrence of STLs in the Asteraceae family
1.3.3. Biosynthesis of STLs
1.4. Approaches to understanding the biochemistry of plant specialized metabolism
1.4.1. Traditional biochemical approaches
1.4.2. Genomics-based approaches
1.4.3. Metabolic engineering and synthetic biology in studying specialized metabolism
1.5. Adaptive enzyme evolution
1.5.1. Enzyme evolution in the context of metabolism networks
1.5.2. Evolution of new enzyme activities
1.6. Rationale, hypothesis, and objectives
Chapter 2: Materials and methods
2.1. Plant materials
2.2. Sequence analysis, RNA extraction, and isolation of cDNAs
2.2.1. Data mining
2.2.2. RNA extraction and cDNA synthesis
2.2.3. Isolation of full-length cDNAs for candidate genes
2.2.4. Phylogenetic analysis of GAOs
2.2.5. Isolation of STSs
2.3. Yeast platform for the biochemical characterization of enzymes
2.3.1. Yeast strains
2.3.2. Expression constructs
2.3.3. Yeast culture media
2.3.4. Yeast culture conditions
2.3.5. Microsomal preparation, immunoblot analysis, and in vitro enzyme assays
2.4. Metabolite purification and analysis
2.4.1. Metabolite preparation
2.4.2. Metabolite analysis by gas chromatography
2.4.3. Metabolite analysis by LC–MS
2.4.4. Metabolite purification for structural elucidation
2.4.5. Structural analysis
2.5. Quantitative RT-PCR for gene expression analysis in lettuce
2.6. Structure–function analysis of AMO and GAOs
2.6.1. Homology modeling
2.6.2. Site-directed mutagenesis
Chapter 3: Biochemical conservation and cross-reactivity of GAO in the Asteraceae
3.1. Introduction
3.2. Isolation of GAO and pathway reconstitution in engineered yeast
3.2.1. Isolation of GAO from lettuce
3.2.2. Metabolite profiles of EPY300 expressing GAS, GAO, and CPR
3.2.3. Structural analyses of the products isolated from yeast culture
3.2.4. Altered metabolite profiles of transgenic EPY300 in pH-adjusted condition
3.2.5. Structural elucidation of germacrene A carboxylic acid
3.2.6. AMO/GAO homologues in the Asteraceae
3.2.7. Occurrence of the STL biosynthetic pathway in B. spinosa
3.2.8. Phylogenetic analysis of AMO and GAOs
3.2.9. Cross-reactivities of AMO and GAOs
3.3. Discussion
3.3.1. GAO activity in the Asteraceae family
3.3.2. Cross-reactivities of AMO and GAOs
Chapter 4: Probing the evolution of AMO and GAOs
4.1. Introduction
4.2. Functional characterization of the second GAO isoform in lettuce
4.2.1. Biochemical activities of LsGAO2
4.2.2. Differential expressions of LsGAO1 and LsGAO2 in lettuce
4.3. Substrate specificity of AMO and GAOs
4.3.1. De novo synthesis of sesquiterpene substrates in yeast
4.3.2. Metabolite profiles of EPY300 co-expressing combinations of AMO/GAOs and STSs
4.3.3. Structural analyses of the novel products
4.3.4. Metabolite profiles of EPY300 co-expressing combinations of AMO/GAOs and STSs in buffered (neutral) culture condition
4.4. Evolutionary analysis of AMO and GAOs
4.4.1. Phylogenetic analysis of GAO isoforms
4.4.2. Selection analysis of AMO and GAOs in the STL metabolic network
4.5. Structure–function analysis of AMO and GAOs
4.5.1. Homology modeling of AMO and GAOs
4.5.2. Site-directed mutagenesis of GAOs
4.6. Discussion
4.6.1. The two GAO isoforms in lettuce
4.6.2. Substrate promiscuity of GAOs
4.6.3. Natural selection of AMO and GAOs
4.6.4. Structure–function analysis of AMO and GAOs
Chapter 5: General discussion and future perspectives
5.1. Contribution of GAOs to the fitness of the Asteraceae
5.1.1. Critical role of GAOs in STL metabolism
5.1.2. Conservation of GAO activity in the Asteraceae
5.2. The many “talents” of GAOs and chemical diversity of the Asteraceae
5.3. AMO/GAO structure–function analysis
References
Appendix: Yeast as a platform for biochemical studies and de novo synthesis of natural products
A.1. General introduction
A.2. Molecular characterization of LovA in lovastatin biosynthesis
A.2.1. Introduction
A.2.2. Isolation and expression of LovA in yeast
A.2.3. Yeast feeding assay of LovA
A.2.4. In vitro assay for LovA activity
A.2.5. Summary
A.3. De novo synthesis of high-value plant sesquiterpenoids in yeast
A.3.1. Introduction
A.3.2. De novo production of : and 5-epi-aristolochene
A.3.3. De novo production of hydroxylated sesquiterpenes: capsidiol
A.3.4. Summary

List of Tables

Table 2.1. A list of ESTs that share high sequence identity with A. annua AMO found in the Compositae Genome Project Database
Table 2.2. Sequences of primers for cloning full-length cDNAs of GAOs into MCS1 of pESC vector
Table 2.3. List of CHSs and STL biosynthetic genes included in the Ka/Ks measurement
Table 3.1. 1H and 13C NMR spectral data of γ-costic acid, β-costic acid, ilicic acid, and germacrene A carboxylic acid
Table 3.2. AMO/GAO homologues in the Asteraceae
Table 4.1. 1H and 13C NMR spectral data of δ-guaienic acid, epi-ilicic acid, and iso-epi-ilicic acid

List of Figures

Figure 1.1. Major biosynthetic routes leading to the specialized metabolites
Figure 1.2. Re-construction of terpenoids based on isoprene units
Figure 1.3. Biosynthetic routes from the central precursors to the major subclasses of terpenoids with representative examples
Figure 1.4. Two biosynthetic pathways leading to the central precursors of terpenoids
Figure 1.5. Catalytic mechanism of γ- synthase from giant fir (Abies grandis)
Figure 1.6. Biosynthetic routes from the diterpenoid skeletal structure ent-kaurene to the bioactive gibberellins
Figure 1.7. Schematic cycle of catalysis of the most common reaction by a P450
Figure 1.8. Major groups of STL skeletons and their hypothetical biogenetic routes
Figure 1.9. Representative STL structures naturally occurring in the Asteraceae family
Figure 1.10. STL biosynthetic routes in the Asteraceae family
Figure 1.11. Jensen’s hypothesis of enzyme evolution from ‘generalist’ progenitors
Figure 2.1. EPY300 yeast platform for in vivo enzyme assays
Figure 2.2. Alignment of LsGAO1 and LsGAO2 cDNA sequences
Figure 3.1. GC–MS profile of metabolite extracts from transgenic yeast
Figure 3.2. GC–MS and LC–MS comparative analyses of sesquiterpenoids from yeast co-expressing LsGAO1, CPR, and GAS in acidic and neutral conditions
Figure 3.3. Biochemical analyses of GAOs from various species of the Asteraceae family
Figure 3.4. (–)-LC–MS analysis in selected ion monitoring mode of metabolite extract from Barnadesia spinosa leaf and flower for m/z 233
Figure 3.5. Phylogenetic analysis of AMO and GAOs from various plants in the Asteraceae
Figure 3.6. Amino acid sequence alignment of AMO and GAOs
Figure 3.7. Relative cross-reactivities of AMO and GAOs in transgenic yeast
Figure 4.1. GC–MS profiles of metabolite extracts from yeast expressing LsGAO1/CPR/GAS and LsGAO2/CPR/GAS in native (acidic) yeast culture condition
Figure 4.2. (–)-LC–MS analyses of LsGAO2 activities in transgenic yeasts
Figure 4.3. Relative transcript levels of LsGAO1 and LsGAO2 in different tissues of three-month-old lettuce
Figure 4.4. Structures of the sesquiterpenes subjected to the test of substrate promiscuity/specificity of AMO/GAOs
Figure 4.5. De novo synthesis of plant sesquiterpenes in the engineered yeast strain EPY300
Figure 4.6. Analysis of combinatorial biochemistry of P450s and STSs
Figure 4.7. Oxidation of valerenadiene to by GAOs
Figure 4.8. Oxidation of valencene to nootkatone by GAOs
Figure 4.9. Oxidations and rearrangements of δ-guaiene, 5-epi-aristolochene, and valencene in yeasts expressing GAOs
Figure 4.10. (–)-LC–MS analysis of metabolite profiles of EPY300 co-expressing LsGAO2/CPR/STSs in acidic and neutral conditions
Figure 4.11. Phylogenetic tree of AMO/GAOs including LsGAO2 and CiGAO2
Figure 4.12. Models of amorphadiene docked in AMO, LsGAO1, and BsGAO structures
Figure 4.13. Sequence alignment of AMO, LsGAO1, and BsGAO
Figure 4.14. Production of artemisinic acid by mutant and native GAOs
Figure 5.1. Biosynthetic pathway of costunolide
Figure 5.2. Model of sesquiterpene oxidase evolution based on Jensen’s hypothesis of enzyme evolution from ancestral promiscuity
Figure 5.3. Structural similarities between naturally-occurring STLs in the Asteraceae and the sesquiterpene substrates tested for oxidation by GAOs
Figure A1. Lovastatin biosynthesis in Aspergillus terreus
Figure A2. HPLC–UV metabolite profiles of yeast feeding assays
Figure A3. (–)-LC–MS analyses of in vitro LovA enzyme assays
Figure A4. Proposed mechanism of LovA
Figure A5. Effect of thioredoxin fusion on in vivo STS productivity in yeast
Figure A6. Production of capsidiol and other oxygenated compounds in engineered yeast

List of Abbreviations

ADS: amorpha-4,11-diene synthase
AMO: amorpha-4,11-diene oxidase
amorphadiene: amorpha-4,11-diene
CHS: chalcone synthase
CpVS: Citrus × paradisi valencene synthase
CTAB: cetyltrimethylammonium bromide
DMAPP: dimethylallyl pyrophosphate (dimethylallyl diphosphate)
GC–MS: gas chromatography–mass spectrometry
EST: expressed sequence tag
FPP: farnesyl pyrophosphate (farnesyl diphosphate)
GAA: germacrene A carboxylic acid
GaCDS: Gossypium arboreum δ-cadinene synthase
GAO: germacrene A oxidase
GAS: germacrene A synthase
HEPES: 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid
HPLC: high-performance liquid chromatography
hr: hour(s)
IPP: isopentenyl pyrophosphate (isopentenyl diphosphate)
LC–MS: liquid chromatography–mass spectrometry
MCS: multi-cloning site
min: minute(s)
MEP: 2-C-methyl-D-erythritol-4-phosphate
MVA: mevalonate / mevalonic acid
NMR: nuclear magnetic resonance
NtEAH: Nicotiana tabacum 5-epi-aristolochene-1,3-dihydroxylase
NtEAS: Nicotiana tabacum 5-epi-aristolochene synthase
ORF: open reading frame
P450: cytochrome P450-dependent monooxygenase
qRT-PCR: quantitative reverse-transcription polymerase chain reaction
RACE: rapid amplification of cDNA ends
ScGDS: Solidago canadensis (+)-germacrene D synthase
sec: second(s)
STL: sesquiterpene lactone
STS: sesquiterpene synthase
SynGUS: synthetic guaiene synthase
SynVS: synthetic (+)-valencene synthase
TPS: terpene synthase
Trx: thioredoxin
valerenadiene: valerena-4,7(11)-diene
VoVDS: Valeriana officinalis valerena-4,7(11)-diene synthase

Published Materials

Below is the list of publications related to this PhD research project, with details of my contribution to each published manuscript.

Nguyen TD, Göpfert JC, Ikezawa N, MacNevin G, Kathiresan M, Conrad J, Spring O, Ro DK (2010). Biochemical conservation and evolution of germacrene A oxidase in Asteraceae. Journal of Biological Chemistry 285:16588–16598.
- Cloning CiGAO1, BsGAO, and characterizing all recombinant GAOs
- Purifying GAA for structural analysis
- Investigating cross-reactivities, and phylogenetic relations of AMO and GAOs
- Metabolite profiling of Barnadesia spinosa
Chapter 3 is adapted from this manuscript.

Barriuso J, Nguyen TD, Li JW, Roberts JN, MacNevin G, Chaytor JL, Marcus SL, Vederas JC, Ro DK (2011). Double oxidation of the cyclic nonaketide dihydromonacolin L to monacolin J by a single cytochrome P450 monooxygenase, LovA. Journal of the American Chemical Society 133:8078–8081.
- Characterizing recombinant LovA in vitro
- Purifying LovA products for structural analyses
Appendix A.2 is adapted from this manuscript.

Nguyen TD, MacNevin G, Ro DK (2012). De novo synthesis of high-value plant sesquiterpenoids in yeast. Methods in Enzymology 517:261–278.
- Establishing yeast transformation and cultures
- Generating constructs and expressing sesquiterpene synthases (except valencene synthase), thioredoxin-fusion for sesquiterpene synthases, co-expressing P450/CPR/sesquiterpene synthase to produce sesquiterpenes and oxygenated sesquiterpenes
Chapter 2 and Appendix A.3 are adapted from this manuscript.

Ikezawa N, Göpfert JC, Nguyen TD, Kim SU, O’Maille PE, Spring O, Ro DK (2011). Lettuce costunolide synthase (CYP71BL2) and its homolog (CYP71BL1) from sunflower catalyze distinct regio- and stereoselective hydroxylations in sesquiterpene lactone metabolism. Journal of Biological Chemistry 286:21601–21611.
- Cloning and testing four costunolide synthase candidates from sunflower

Chapter 1: GENERAL INTRODUCTION

1.1. Secondary or specialized metabolism in plants

Throughout the course of evolution, plants have acquired the capacity to synthesize a vast array of molecules for necessary biochemical and physiological functions. Among these, the basic substances (e.g. nucleotides, amino acids, phytohormones, lipids, and sterols) directly required for the survival of plants are ubiquitously found in all plant species and are conventionally referred to as “primary metabolites”. Plants possess another group of compounds that are secondarily derived from the primary metabolites. These compounds are normally referred to as “secondary metabolites” since, in the early days of plant metabolism study, they were speculated to be waste products or, at best, not strictly necessary for plant growth and development (Wink, 2010). The increasing knowledge of “secondary metabolites” in the past years has demonstrated that, far from being unnecessary, these small molecules are critically important in the adaptation of plants as sessile living organisms. Their eco-physiological roles include defence against herbivores and microbes, attraction of pollinators and fruit-dispersing animals, and allelopathy in competing with other plants (Dewick, 2009). As they indeed increase the fitness of plants in specific ecological niches and through evolutionary adaptations, these compounds are better described as “specialized metabolites”. These substances have also been utilized by humans for medicine, dyes, and flavours long before we had the slightest idea about their nature. The majority of modern drugs are either specialized metabolites or synthesized (or designed) based on specialized metabolites (Newman and Cragg, 2007; Harvey, 2008).

Figure 1.1. Major biosynthetic routes leading to the specialized metabolites with some representative examples. Italicized texts indicate biochemical pathways. MVA: mevalonic acid. MEP: methylerythritol phosphate. TCA: tricarboxylic acid cycle.

Despite the great structural and functional diversity of more than 100,000 known specialized metabolites (Wink, 1988) and up to hundreds of thousands yet to be elucidated (Pichersky and Gang, 2000), they can be grouped into a few categories based on their basic biosynthetic routes (Dewick, 2009) (Figure 1.1). The major groups of specialized metabolites are terpenoids, alkaloids, phenylpropanoids and flavonoids, polyketides, and cyanogenic and sulfur-containing compounds (Wink, 2010).

Terpenoids represent the most structurally diverse class of specialized metabolites with more than 55,000 known structures (Breitmaier, 2006). All of these substances share the central precursor isopentenyl pyrophosphate (IPP) and its isomer dimethylallyl pyrophosphate (DMAPP). Details of their biosynthesis and functions will be discussed in the subsequent section.

Alkaloids, with more than 12,000 compounds (Croteau et al., 2000), constitute a group of nitrogen-containing specialized metabolites. With a very long history of use in human society, alkaloids include many well-known examples, such as “God’s own medicine” morphine (Taylor, 1965) and the parasympathomimetic compound nicotine. Most alkaloids are biosynthesized from positively charged amino acids (including arginine, lysine, and ornithine), and aromatic ring-containing amino acids (including phenylalanine, tyrosine, tryptophan, anthranilic acid, and niacin) (Roberts et al., 2010).

More than 8,000 phenolic compounds, including phenylpropanoids and flavonoids, have played a critical role in plant evolution since the rise of land plants during the Paleozoic era (541–252.2 million years ago) (Chaloner, 1970). Besides making up the cell wall structures that support plants in terrestrial environments, these metabolites also include non-structural molecules for defence, colours, and flavours (Croteau et al., 2000). They are synthesized in plants via the shikimate pathway (Figure 1.1).

Polyketides are synthesized from malonyl-CoA with various priming molecules. Found in plants, fungi, and bacteria, polyketides have their backbones synthesized by successive rounds of decarboxylative Claisen-type condensations of extension units (malonyl-CoA or methylmalonyl-CoA) on a starter unit (acetyl-CoA or malonyl-CoA). Compared with microbes, plants utilize a wider range of starter molecules for polyketide synthesis, such as p-coumaroyl-CoA derived from the phenylpropanoid pathway (Staunton and Weissman, 2001), hexanoyl-CoA in the cannabinoid biosynthetic pathway (Stout et al., 2011), and cinnamic acid in curcumin biosynthesis (Kita et al., 2008).

The amino acid derivatives in plants also constitute a group of specialized metabolites including more than 100 glucosinolates and cyanogenic glycosides (Wink, 2010). They share a common activation mechanism. Upon hydrolysis, particularly after herbivore attack, glucosinolates and cyanogenic glycosides release sulfur-containing toxic materials and hydrocyanic acid, respectively, to deter the herbivores. Other amino acid derivatives include the cysteine sulfoxides, which are specific to the onion (Alliaceae) family (Dewick, 2009).

In addition to the aforementioned classes, plant specialized metabolites also comprise many complex structures of multiple origins. For instance, the anti-cancer drug paclitaxel (commercially known as Taxol) is an diterpenoid naturally occurring in the Taxus genus (Wani et al., 1971). Another example is the neurotoxin coniine, which is an alkaloid with a polyketide origin (Roberts et al., 2010).

1.2. Terpenoid biosynthesis

1.2.1. Structures and functions of terpenoids

Figure 1.2. Re-construction of terpenoids based on isoprene units with examples representing three different joining configurations.

Of all the specialized metabolites, terpenoids stand out as the largest class with more than 55,000 structures reported to date (Breitmeier, 2011). The name “terpenoids” came from “turpentine”, distillation mixture from live tree . These compounds are also called “isoprenoids” since, in concept, the backbone of each molecule can be constructed from five-carbon isoprene units. In the vast majority of terpenoid structures, isoprene units are joined by head-to-tail condensations (Ashour et al., 2010). Recent studies showed that head-to-head (e.g. triterpenoids and sterols) and head-to-middle (e.g. pyrethrin I, chrysanthemene) condensations can also occur (Rivera et al., 2001; Ro, 2011) (Figure 1.2).

Figure 1.3. Biosynthetic routes from the central precursors to the major subclasses of terpenoids with representative examples. Note that the polyterpenoids are grouped together merely based on the large number of carbons as they do not necessarily share a common biosynthetic route. IPP: isopentenyl pyrophosphate. DMAPP: dimethylallyl pyrophosphate. GPP: geranyl pyrophosphate. FPP: farnesyl pyrophosphate. GGPP: geranylgeranyl pyrophosphate.

The concept of isoprene units is also used to classify terpenoids based on the unit number. Structures that contain one, two, three, four, five, six, and eight isoprene units are called hemiterpenoids, monoterpenoids, sesquiterpenoids, diterpenoids, sesterterpenoids, triterpenoids, and tetraterpenoids, respectively. Molecules that have more than eight isoprene units are referred to as polyterpenoids (Figure 1.3). This nomenclature system was based on the previously dominant view that ten-carbon structures (with two isoprene units) were the smallest terpenoids and thus considered “mono-” (Croteau et al., 2000).
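The nomenclature described above can be summarized as a simple lookup. The following Python sketch is illustrative only; the function and variable names are hypothetical, and the mapping simply restates the unit counts given in the text, assuming five carbons per isoprene unit.

```python
# Class names keyed by the number of isoprene (C5) units, as listed in the text.
TERPENOID_CLASSES = {
    1: "hemiterpenoid",
    2: "monoterpenoid",
    3: "sesquiterpenoid",
    4: "diterpenoid",
    5: "sesterterpenoid",
    6: "triterpenoid",
    8: "tetraterpenoid",
}


def classify_by_isoprene_units(units: int) -> str:
    """Return the structural class for a given number of isoprene units."""
    if units < 1:
        raise ValueError("a terpenoid skeleton needs at least one isoprene unit")
    if units in TERPENOID_CLASSES:
        return TERPENOID_CLASSES[units]
    if units > 8:
        return "polyterpenoid"
    # Seven units: no class name is given in the text, so none is invented here.
    return "unclassified"


def classify_by_carbon_count(carbons: int) -> str:
    """Classify an intact skeleton by carbon count, assuming five carbons per unit."""
    units, remainder = divmod(carbons, 5)
    if remainder:
        raise ValueError("carbon count is not a multiple of five")
    return classify_by_isoprene_units(units)


print(classify_by_carbon_count(15))    # a C15 skeleton has three isoprene units
print(classify_by_isoprene_units(12))  # more than eight units
```

Such a lookup is purely structural: it reports how many isoprene units a skeleton contains and says nothing about the biosynthetic origin of the compound.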

Although the isoprene unit concept is useful in understanding terpenoid structures, the true biological precursor of terpenoids is not isoprene, but isopentenyl pyrophosphate (IPP) and its isomer dimethylallyl pyrophosphate (DMAPP). Furthermore, certain terpenoids are breakdown products (formed by loss of carbon) of other terpenoids. For example, the plant hormone abscisic acid can be structurally considered a sesquiterpenoid as it has three isoprene units; however, it is a breakdown product of carotenoids, which have eight isoprene units (Nambara and Marion-Poll, 2005). Similarly, the recently identified group of plant hormones, strigolactones, are also cleavage products of carotenoids (Schwartz et al., 2004). Therefore, feeding assays with labelled precursors may be required to fully elucidate the biosynthetic origin of terpenoids.

The functions of terpenoids are as diverse as their structures. They are also excellent examples of the blurred boundary between “primary” and “secondary” metabolites, since many of them are essential to the basic physiological processes in all plants while others are specialized for certain plant lineages. Although the biological functions of the majority of terpenoids remain unknown, it is well documented that terpenoids play critical roles in a wide range of processes of plant growth and development as well as interactions with the biotic and abiotic environment. These roles include, but are not restricted to, phytohormones (gibberellins, abscisic acid, brassinosteroids, and strigolactones), cellular membrane component (), photosynthetic pigments (carotenoids, and phytol), electron carriers (ubiquinone and plastoquinone), pollinator attractants (e.g. 1,8-cineole and ), herbivore deterrents (e.g. sesquiterpene lactones, triterpenoid ), antimicrobial agents (e.g. capsidiol, and oryzalexins), and allelochemicals (e.g. momilactones, messegenic acids, and breviones) (Kato et al., 1973; Harmatha and Nawrot, 1984; Mori and Waku, 1985; McGarvey and Croteau, 1995; Croteau et al., 2000; Vyvyan, 2002; Roberts, 2007; Maldonando-Bonilla et al., 2008; Ro, 2011).

As specialized terpenoids are the results of plant adaptation to the environment, they exert bioactivities on other living organisms, including humans. Our society has long included a myriad of useful terpenoids in foods and drugs, even without understanding the nature of these substances. Modern science has helped reveal the terpenoid structures underlying our favourite flavours and fragrances, such as (peppermint flavour), , nootkatone ( flavour), linalool (floral pleasant scent) among others (Caputi and Aprea, 2010). Many terpenoids have nutraceutical values, particularly the carotenoids, while the defensive functions of some terpenoids make them or their derivatives potent pesticides (e.g. pyrethrin, nerolidol, , 1,4-cineole, ) (Vaugh and Spencer, 1996; Croteau et al., 2000; Abdel-Rahman et al., 2013). The configuration of cis-1,4-polyisoprene gives the desired elasticity that cannot be achieved by synthetic rubber (trans-configuration) (Croteau et al., 2000; Ro, 2011). The greatest attention has been paid to terpenoids with medicinal properties. Since as early as 200 BCE, the sesquiterpene lactone (STL) artemisinin (known as “qinghaosu” in China) has been used to treat many illnesses including malaria (Croteau et al., 2000). Other prominent examples of medicinal terpenoids include a plethora of anti-cancer agents such as ginseng triterpenoid saponins, paclitaxel, , triptolide, and celastrol (Croteau et al., 2000; Yue et al., 2011; Huang et al., 2012). Several researchers have also recorded other medicinal uses of terpenoids based on their anti-viral, anti-allergenic, anti-hyperglycemic, anti-inflammatory, and immune-modulatory characteristics (Ajikumar et al., 2008). In addition, terpenoids have been increasingly studied as a potential renewable source of hydrocarbon biofuels since the discovery that sesquiterpenes from Amazon copaiba (Copaifera sp.) could be used for diesel engines (Bohlmann and Keeling, 2008; Ro, 2011).

1.2.2. Biogenesis and cellular compartmentalization of the central precursors

Terpenoids are derived from a common five-carbon precursor, IPP, and its isomer DMAPP. In higher plants, two independent pathways are responsible for the biosynthesis of IPP (Figure 1.4). The cytosolic mevalonate (MVA) pathway, elucidated in the 1950s, starts with acetyl coenzyme-A to generate a key intermediate, MVA, and finally IPP, which is then isomerized to DMAPP by IPP isomerase. This was thought to be the only biosynthetic route leading to IPP until the 1990s, when the non-MVA 2-C-methyl-D-erythritol-4-phosphate (MEP) pathway, or 1-deoxy-D-xylulose-5-phosphate (DXP) pathway, was established (Rohmer et al., 1993). In plastids, the MEP pathway converts pyruvate and glyceraldehyde-3-phosphate to MEP, and finally to IPP and DMAPP through several steps. The repetitive condensations of IPP and DMAPP units create a series of prenyl pyrophosphates that can be used as immediate precursors for different classes of terpenoids (Croteau et al., 2000; Ro, 2011).
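The carbon arithmetic behind this series of prenyl pyrophosphates can be made explicit with a short Python sketch. It is only an illustration under a stated assumption: each chain is treated as one DMAPP starter unit extended by successive C5 IPP units. The function name `prenyl_carbons` is hypothetical; the abbreviations GPP, FPP, and GGPP are the ones used for the C10, C15, and C20 precursors in this section.

```python
# Assumption: a prenyl pyrophosphate is built from one DMAPP starter (C5)
# extended by n IPP units (C5 each).
C5_UNIT = 5

# Abbreviations used in this section for the C10, C15, and C20 precursors.
PRENYL_NAMES = {10: "GPP", 15: "FPP", 20: "GGPP"}


def prenyl_carbons(n_ipp_added: int) -> int:
    """Return the carbon count after adding n IPP units to one DMAPP."""
    if n_ipp_added < 1:
        raise ValueError("at least one IPP unit must be condensed with DMAPP")
    return C5_UNIT * (1 + n_ipp_added)


for n in (1, 2, 3):
    carbons = prenyl_carbons(n)
    print(f"DMAPP + {n} IPP -> C{carbons} ({PRENYL_NAMES[carbons]})")
```

Each additional condensation adds five carbons, which is why the terpenoid classes derived from these precursors differ in steps of C5.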

The biogenesis of terpenoids in plant cells is highly compartmentalized. Generally, in the plastids, the MEP-derived IPP provides precursor for the C10 geranyl pyrophosphate (GPP) and

C20 geranylgeranyl pyrophosphate (GGPP). These give rise to , , and , including the photosynthesis-related isoprenoids (chlorophyll, , plastoquinone, and ) and some phytohormones (gibberellins and abscisic acid).

Figure 1.4. Two biosynthetic pathways leading to the central precursors of terpenoids. Bold text represents the enzymes involved in the pathways. IPP: isopentenyl pyrophosphate. DMAPP: dimethylallyl pyrophosphate. HMG-CoA: 3-hydroxy-3-methylglutaryl-coenzyme A. CDP: cytidine 5’-pyrophosphate. Modified from Phillips et al. (2008).

On the other hand, the MVA-derived IPP pool leads to the formation of the C15 farnesyl pyrophosphate (FPP), giving rise to sesquiterpenes, triterpenes (squalenes), and phytohormones, such as cytokinins. In addition to the IPP pools in cytosol and plastids, mitochondrial IPP can also serve as precursor for ubiquinone, which then participates in the electron transport chain (Eisenreich et al., 2004; Dudareva et al., 2005; Lüker et al., 2007). These steps are followed by primary modifications catalyzed by specific terpene synthases (TPS) and secondary modifications by various enzymes (mostly with redox activity) to yield functional terpenoids (Croteau et al., 2000).

The aforementioned compartmentalization, however, is not absolute. Many studies have shown that there are metabolite cross-talks between the plastidial MEP and the cytosolic MVA pathways. Using radio-labelled substrates and specific inhibitors for each IPP biosynthetic route, researchers have revealed that the MEP-pathway-derived intermediates could contribute to the biosynthesis of cytosolic terpenoids and vice versa in Arabidopsis thaliana, tobacco (Nicotiana tabacum), spinach (Spinacia oleracea), kale (Brassica oleracea), Indian mustard (Brassica juncea), and snapdragon (Antirrhinum majus) (Arigoni et al., 1997; Nagata et al., 2002; Bick and Lange, 2003; Hemmerlin et al., 2003; Dudareva et al., 2005). A recent 13C-labelled feeding experiment in Artemisia annua also showed that the STL artemisinin is a terpenoid of both sources (Schramek et al., 2010). The cross-talks are regulated based on the nature of the precursor and the physiological state of the plant (Schuhr et al., 2003). Despite the cross-talks, the two IPP biosynthetic pathways cannot compensate perfectly for each other, and the unidirectional transport of terpenoid intermediates from plastid to cytosol is preferred (Bick and Lange, 2003; Laule et al., 2003; Rodríguez-Concepción et al., 2004). In addition, the current view of the cytosol/plastid compartmentalization of IPP/DMAPP biosynthesis is being reconsidered as new evidence suggests that the MVA pathway is localized in peroxisomes (Kovacs et al., 2002; Sapir-Mir et al., 2008).

1.2.3. Biogenesis of terpenoid structural diversity by TPSs and modifying enzymes

The central intermediate prenyl pyrophosphates (Figure 1.3) are transformed into basic skeletal structures by members of the TPS superfamily. These basic structures are then chemically modified and decorated further by other enzyme superfamilies, particularly the cytochrome P450-dependent monooxygenases (P450s). The enormous diversity of terpenoid structures is due to the large numbers of members of these enzyme superfamilies, as well as their special catalytic mechanisms.

Geranyl-, farnesyl-, and geranylgeranyl-pyrophosphates are converted to the basic skeletal structures by TPSs. These enzymes are also called terpene cyclases as many of them produce cyclic products from acyclic prenyl pyrophosphates. A TPS co-ordinates divalent metal ions (Mg2+) to interact with the pyrophosphate group of a prenyl pyrophosphate while the hydrocarbon chain of the substrate is held in the enzyme’s hydrophobic pocket (Starks et al., 1997; Greenhagen and Chappell, 2001). This results in the pyrophosphate group being cleaved off, leaving a highly reactive allylic carbocation that subsequently goes through double-bond rearrangement, hydride shift, methyl migration, deprotonation, and reprotonation that possibly give rise to more than one product depending on its interactions with other residues in the active site of the enzyme (Greenhagen and Chappell, 2001). Indeed, many TPSs catalyze a multiproduct reaction, such as grand fir (Abies grandis) γ-humulene synthase, which produces 52 sesquiterpenes from FPP (Figure 1.5). This multiproduct characteristic, together with the natural occurrence of hundreds of TPS sequences, of which sesquiterpene synthases alone include more than 100 (Degenhardt et al., 2009), fundamentally contributes to the diversity of terpenoid structures.

Figure 1.5. Catalytic mechanism of γ-humulene synthase from grand fir (Abies grandis). Deprotonations (1 and 8), cyclizations (2 to 7), and many rearrangements allow the synthesis of 52 sesquiterpenes from FPP. Adapted from Yoshikuni et al. (2006).

The diversity of terpenoid structures is further enriched by chemical modifications on the basic skeletal structures. One terpenoid backbone could undergo further enzymatic reactions including oxidations, reductions, isomerizations, hydrations, and conjugations to generate several end products (Ro, 2011). These steps involve modifying enzymes such as the membrane-bound P450s, the soluble non-heme 2-oxoglutarate-dependent dioxygenases, the soluble dehydrogenases, and the soluble glycosyltransferases (Bowles et al., 2005; Ashour et al., 2010), leading to changes in terpenoids’ solubility, stability, localization, availability, and ultimately bioactivity. The majority of these secondary enzymatic modifications of terpenoid structures are not well understood at the molecular level. However, some biosynthetic routes have been extensively studied and demonstrate the aforementioned theme. One such example is the biosynthesis of the diterpenoid phytohormone gibberellins. In Arabidopsis, the diterpenoid hydrocarbon ent-kaurene is oxidized in three consecutive steps by a single P450 to ent-kaurenoic acid, which is subsequently oxidized to gibberellin 12 by another P450. This compound is in turn a substrate for oxidations by 2-oxoglutarate-dependent dioxygenases to yield other gibberellins (Yamaguchi, 2008) (Figure 1.6).

Figure 1.6. Biosynthetic routes from the diterpenoid skeletal structure ent-kaurene to the bioactive gibberellins GA1, GA3, and GA4. Bold text represents the involved enzymes, and colours indicate an enzyme with its associated chemical decoration. P450: cytochrome P450-dependent monooxygenase. 2ODD: 2-oxoglutarate-dependent dioxygenase. GA: gibberellin. Modified from Yamaguchi (2008).

1.2.4. Cytochrome P450-dependent monooxygenases

The enzymes of interest in this research are P450s. They constitute the largest enzyme superfamily among the aforementioned modifying enzymes in terpenoid biosynthesis, and even in plant metabolism in general, with 5,100 annotated plant P450 sequences (Nelson and Werck-Reichhart, 2011) and a proportion of up to 1% in each sequenced plant genome (Mizutani and Ohta, 2010). Although their number is by far the largest in plants, these heme-thiolate enzymes are present in all living organisms and even some viruses, and are considered to be the “most versatile biological catalyst” (Coon, 2005; Lamb et al., 2009). Due to their ability to catalyze the transformations of natural products, P450s are an important target of drug discovery/design and metabolism studies in animals (Anzenbacher and Anzenbacherová, 2001; de Groot, 2006).

Sequences of P450s vary significantly between different kingdoms of life (less than 20% identity between bacterial and human P450s) (Hasemann et al., 1995). However, increasing knowledge about P450s’ structures reveals a well-conserved topography and folding of all P450s (Werck-Reichhart and Feyereisen, 2000). The core structure contains a four-helix bundle, of which two helices confine the prosthetic heme group. The heme group is also tethered to an absolutely conserved cysteine residue on a heme-binding loop with the consensus sequence Phe-Xxx-Xxx-Gly-Xxx-Arg-Xxx-Cys-Xxx-Gly. The sulfur ligand between this cysteine and heme gives the characteristic Soret absorbance at 450 nm in the reduced carbon monoxide-bound form, hence the name of the enzymes (Coon, 2005). Most eukaryotic P450s reported to date, including those in plants, are membrane-bound proteins, attached to the cytoplasmic surface of the endoplasmic reticulum (ER) by a short N-terminal chain.

Although they can catalyze a wide range of reactions including epoxidations, heteroatom dealkylations, oxygenations, methylene-dioxybridge formation, phenol coupling reactions, and oxidative rearrangement and cleavage of carbon skeletons (Mizutani and Sato, 2011), most of the P450s catalyze the NADPH- and O2-dependent monooxygenation/hydroxylation reactions (Bowell et al., 1994; Chapple, 1998) with the typical scheme:

RH + O2 + NADPH + H+ → ROH + H2O + NADP+ (Figure 1.7)

As the reaction involves electron transport, most P450s require a protein partner, cytochrome P450 reductase (CPR), to deliver one or more electrons from NADPH to reduce the heme iron of P450s and hydroxylate the substrates (Hannemann et al., 2006).

Figure 1.7. Schematic cycle of catalysis by a P450. Red and ox indicate the reduced and oxidized states of the redox partner (e.g. CPR), respectively. Compound 0 is the ferric hydroperoxo form, and compound I is the ferryl–oxo form of the P450. Adapted from Munro et al. (2007).

1.3. STLs in the Asteraceae family

1.3.1. Structures and functions of STLs

This research focuses on STLs, a terpenoid subclass that is characterized by a cis- or trans-fused γ-lactone ring (five-membered) on a 15-carbon sesquiterpene skeleton (Figure 1.8). In some STLs, this lactone moiety contains an α-methylene group. More than 4,000 STL structures have been identified so far (Chaturvedi, 2011). Despite this structural diversity, STLs can be categorized into a few groups based on their basic skeletal backbones. Similar to the basic terpenoid skeletons, STL basic structures in each of the groups depicted in Figure 1.8 can be further chemically decorated to produce a large number of STL end-products. These modifications include oxidations, epoxidations, rearrangements, and conjugations (Figure 1.9).

Many STLs are known to be biologically active. They deter herbivorous animals by giving plants a bitter taste as well as causing allergic and poisoning reactions in animals. STLs also suppress the growth of microbes and other plants as part of plant defensive mechanisms (Rodriguez et al., 1976; Rees and Harborne, 1985; Picman, 1986). The capacity of STLs to interact with other living organisms has also been utilized extensively by humans for their health benefits. Prominent examples of medicinal STLs include parthenolide (anti-inflammatory), artemisinin (anti-malarial, anti-leishmanial, and anti-tumour), lactucopicrin (sedative and analgesic), thapsigargin (anti-cancer), and helenalin (anti-bacterial) (Hall et al., 1979; Christensen et al., 1999; Hehner et al., 1999; Eckstein-Ludwig et al., 2003; Wesołoska et al., 2006; Fraga, 2007; Chaturvedi, 2011).

Figure 1.8. Major groups of sesquiterpene lactone (STL) skeletons and their hypothetical biogenetic routes. Adapted from Rodriguez et al. (1976).

The α-methylene γ-lactone structure is thought to be the major factor responsible for the bioactivities of STLs. This unsaturated carbonyl moiety interacts with the highly nucleophilic groups in cysteine residues of proteins and glutathione, leading to the disruption of enzymatic activity and intracellular redox balance. In addition, the O=C–C=CH2 bonding system of a cyclopentenone group in many STLs interacts with protein and free intracellular glutathione in a similar manner to provide enhanced bioactivities for STLs (Rodriguez et al., 1976; Chaturvedi, 2011). Because of their cytotoxicity, STLs are stored in specialized cell types, such as glandular trichomes and laticifers, to avoid self-toxicity (Sessa et al., 2000; Spring et al., 2003; Göpfert et al., 2005; Majdi et al., 2011).

1.3.2. Occurrence of STLs in the Asteraceae family

The natural occurrence of STLs has been reported in at least ten plant families including the Magnoliaceae, the Lauraceae, the , and the Apiaceae (Robles et al., 1995; Zhang et al., 2005). However, the vast majority of STLs are found in the sunflower (Asteraceae or Compositae) family, with more than 3,000 structures (Seaman, 1982; Chaturvedi, 2011).

Originating in South America and starting its diversification more than 50 million years ago (Barreda et al., 2010), the Asteraceae currently has 1,700 genera with an estimated 24,000 to 30,000 species, including many popular and important crops such as sunflower (Helianthus spp.), lettuce (Lactuca spp.), dandelion (Taraxacum spp.), artichoke (Cynara cardunculus), chrysanthemum (Chrysanthemum spp.), and marigold (Tagetes spp.). It is arguably the largest flowering plant family, and its extremely diverse members are distributed all over the planet except Antarctica (Funk et al., 2005). The phylogeny of the Asteraceae family members has been established based on intensive studies using fossil records and molecular data (Jansen and Palmer, 1987; Panero and Funk, 2008; Barreda et al., 2010). STLs in the Asteraceae have diversified in a lineage-specific manner with the Asteraceae plants and have been commonly used as chemotaxonomic markers of the family (Seaman, 1982; Zidorn, 2008; Scotti et al., 2012) (Figure 1.9). Given the eco-physiological characteristics of STLs as described earlier, it is reasonable to suggest that STLs play a critical part in the evolutionary success of the Asteraceae.

Figure 1.9. Representative STL structures naturally occurring in the Asteraceae family. Red and black colours indicate the STL skeleton and secondary chemical decorations, respectively. Text in brackets indicates the species from which these structures were isolated.

1.3.3. Biosynthesis of STLs

Despite the wealth of results from intensive studies on structures and bioactivities of STLs, the biosynthesis of these compounds remains poorly understood at the molecular level. Early research proposed that germacranolides, particularly the structurally simplest costunolide, are the hypothetical precursors of the diverse STLs (Rodriguez et al., 1976; Seaman, 1982) (Figure 1.8).

Figure 1.10 (right) depicts the proposed biosynthetic pathway of costunolide. As in the biosynthesis of any other sesquiterpenoids, the formation of STLs starts with the cyclization of FPP to the basic sesquiterpene skeleton germacrene A by germacrene A synthase (GAS). The C12 of germacrene A is then oxidized in three consecutive steps to yield germacrene A carboxylic acid (germacra-1(10),4,11(13)-triene-12-oic acid) (GAA), followed by another hydroxylation at the C6 position to form 6α-hydroxy-GAA. The C6-hydroxyl group and C12-carboxylic group on 6α-hydroxy-GAA subsequently undergo a lactonization to become costunolide (de Kraker et al., 2001a). The STL skeleton is further chemically elaborated to produce more complex and bioactive STLs (de Kraker et al., 2002). A number of GAS genes have been isolated and characterized in the Asteraceae (Bennett et al., 2002; Bouwmeester et al., 2002; Bertea et al., 2006; Göpfert et al., 2009). However, prior to this research, the oxidation of germacrene A to GAA and other downstream steps were not biochemically elucidated. Based on assays using crude enzyme extracts from chicory, de Kraker et al. (2001a) suggested that germacrene A was hydroxylated at the C12 position by a P450 to become germacrene A alcohol, which was subsequently further oxidized by an alcohol dehydrogenase and an aldehyde dehydrogenase to yield GAA. The hydroxylation at the C6 position of GAA was also demonstrated to be mediated by a P450 using chicory extract (de Kraker et al., 2002). Despite these efforts, the genes responsible for these proposed steps were not isolated.
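The sequence of oxidations, hydroxylation, and lactonization described above can be followed as simple atom bookkeeping. The Python sketch below is illustrative only and rests on assumptions not stated in the text: germacrene A is taken to be the sesquiterpene hydrocarbon C15H24, each hydroxylation is modelled as the insertion of one oxygen atom, each oxidation of an alcohol to an aldehyde as the loss of two hydrogens, and the lactonization as the loss of one water molecule. The names `STEPS`, `trace_pathway`, and `nominal_mass` are hypothetical, and nominal integer masses are used instead of exact masses.

```python
# Nominal (integer) atomic masses; sufficient for a bookkeeping check.
NOMINAL_MASS = {"C": 12, "H": 1, "O": 16}

# Assumption: germacrene A, a sesquiterpene hydrocarbon, is C15H24.
GERMACRENE_A = {"C": 15, "H": 24, "O": 0}

# Each step is (product name, change in atom counts). Assumed net changes.
STEPS = [
    ("germacrene A alcohol", {"O": +1}),   # C12 hydroxylation inserts one O
    ("germacrene A aldehyde", {"H": -2}),  # alcohol oxidized to aldehyde
    ("GAA", {"O": +1}),                    # aldehyde oxidized to carboxylic acid
    ("6alpha-hydroxy-GAA", {"O": +1}),     # C6 hydroxylation inserts one O
    ("costunolide", {"H": -2, "O": -1}),   # lactonization releases H2O
]


def nominal_mass(formula):
    """Sum nominal atomic masses over all atoms in the formula."""
    return sum(NOMINAL_MASS[el] * n for el, n in formula.items())


def format_formula(formula):
    """Render e.g. {'C': 15, 'H': 24, 'O': 0} as 'C15H24'."""
    return "".join(f"{el}{n if n > 1 else ''}" for el, n in formula.items() if n > 0)


def trace_pathway(start, steps):
    """Apply each atom-count change in order and record every intermediate."""
    current = dict(start)
    trace = [("germacrene A", dict(current))]
    for name, delta in steps:
        for element, change in delta.items():
            current[element] += change
        trace.append((name, dict(current)))
    return trace


pathway = trace_pathway(GERMACRENE_A, STEPS)
for name, formula in pathway:
    print(f"{name}: {format_formula(formula)} (nominal mass {nominal_mass(formula)})")
```

Under these assumptions the three consecutive oxidations at C12 add two oxygen atoms and remove two hydrogens overall, and the final lactonization is the only step that lowers the oxygen count.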

Figure 1.10. STL biosynthetic routes in the Asteraceae family. Left: the pathway in Artemisia annua. Right: the proposed general pathway in the Asteraceae. FPP: farnesyl pyrophosphate. AMO: amorphadiene oxidase. DBR2: artemisinic aldehyde reductase. GAO: germacrene A oxidase. Dashed arrow indicates uncharacterized step(s).

Research has also focused on another STL biosynthetic pathway in the Asteraceae, that of artemisinin. Artemisinin is the well-known anti-malarial STL, naturally occurring only in Artemisia annua. In this case, FPP is converted to the sesquiterpene backbone amorpha-4,11-diene (hereafter referred to as “amorphadiene”) by amorphadiene synthase (ADS) (Bouwmeester et al., 1999; Mercke et al., 2000; Kim et al., 2006). Similar to the proposed costunolide biosynthetic route, amorphadiene is also oxidized at the C12 position to become a sesquiterpene carboxylic acid (Figure 1.10, left). Molecular studies in 2006 demonstrated that this three-step oxidation is catalyzed by a single P450 named amorphadiene oxidase (AMO or CYP71AV1) (Ro et al., 2006; Teoh et al., 2006). Although it is dihydroartemisinic acid, not artemisinic acid, that is the true biological precursor of artemisinin (Figure 1.10, left), the critical role of AMO in artemisinin biosynthesis is indisputable and it is the sole enzyme responsible for the oxidation at C12 of amorphadiene to alcohol, aldehyde, and carboxylic acid (Brown and Sy, 2007; Zhang et al., 2008). This is different from the proposed mechanism for germacrene A oxidation by three different enzymes as described earlier by de Kraker et al. (2001a) (Figure 1.10). As the amorphadiene route only occurs in a single species (A. annua), it could be reasoned that this is a divergent evolution product from the more general germacrene A pathway, and that the oxidation of germacrene A might not be catalyzed by three distinct enzymes as proposed (P450 and dehydrogenase enzymes) but by a single P450 homologous to AMO.

1.4. Approaches to understanding biochemistry of plant specialized metabolism

1.4.1. Traditional biochemical approaches

Traditional biochemical approaches have contributed significantly to the early studies on molecular biology of plants in general and plant specialized metabolism in particular. One of these methods is isolating enzymes from plant tissues. This was applied successfully in characterizing cyclases from sage (Salvia officinalis) (Gambliel and Croteau, 1984) and grand fir (A. grandis) (Lewinsohn et al., 1992). Characterization of enzymes purified directly from plants could go one step further by deducing the amino acid sequence after trypsin digestion, from which degenerate primers were designed to clone the full-length ORF sequence of the responsible gene. Examples include the characterization of myrcene synthase, (–)-4S-limonene synthase, and (–)-pinene synthase from grand fir (Bohlmann et al., 1997).

A different traditional method is to probe for genes using degenerate primers designed based on the conserved regions of homologous genes. In fact, the genes responsible for the first committed step in STL biosynthesis, GASs, were cloned using this strategy in lettuce (Lactuca sativa) (Bennett et al., 2002), chicory (Cichorium intybus) (Bouwmeester et al., 2002), and sunflower (Helianthus annuus) (Göpfert et al., 2009).

1.4.2. Genomics-based approaches

Recent progress in DNA sequencing technologies and bioinformatics has significantly facilitated biochemical studies of plant specialized metabolism. With the establishment of expressed sequence tag (EST) databases based on sequencing cDNA libraries of plants, or even specific tissues, of interest, researchers can isolate candidate genes based on sequence similarity to previously characterized genes using the basic local alignment search tool (BLAST) (Altschul et al., 1990). This strategy has been successfully applied in the search for GAS in A. annua (Bertea et al., 2006) and AMO (Ro et al., 2006; Teoh et al., 2006). It is also employed in this research to elucidate the STL biosynthetic pathway in the Asteraceae family, owing to the fact that up to hundreds of thousands of ESTs of many Asteraceae species have been made available through the Compositae Genome Project Database (http://compgenomics.ucdavis.edu/) at the University of California, Davis (USA).

Recent advances brought about by the “next-generation” sequencing technologies with unprecedented coverage (Schuster, 2008) make this genomics-based approach even more applicable. Several deep transcriptome databases have been constructed, including the PhytoMetaSyn Project for high-value plant metabolites (http://www.phytometasyn.ca), from which many sesquiterpenoid biosynthetic genes have been elucidated, such as kunzeaol synthase in Thapsia garganica (Pickel et al., 2012), (+)-epi-α-bisabolol synthase in Lippia dulcis (Attia et al., 2012), and valerena-4,7(11)-diene synthase (VoVDS) in Valeriana officinalis (Pyle et al., 2012). Similar databases, including those of the Medicinal Plant Consortium (http://medicinalplantgenomics.msu.edu/) and the Medicinal Plant/Human Health Consortium (http://uic.edu/pharmacy/MedPlTranscriptome/index.html), are also being constructed. In addition, the increasing number of completely sequenced genomes makes the traditional map-based cloning method easier with ever-detailed information about molecular markers (Austin et al., 2011).

1.4.3. Metabolic engineering and synthetic biology for studying plant specialized metabolism

One of the challenges in studying plant specialized metabolism is the lack of substrates and product standards. Most of the specialized metabolites of interest are intermediates in the pathway and are thus present in trace amounts in plants. Their complex structures, which often have chiral center(s), make them difficult to synthesize chemically. There is thus a need to engineer a system that can provide specialized metabolites at high abundance. Such a system not only contributes to research in the field but also serves as a platform for de novo production of valuable natural products.

Metabolic engineering is the purposeful modification and/or creation of entire or partial metabolic pathways in living organisms for the desired chemical phenotypes. With the aid of continuous advances in molecular biology and analytical methods, metabolic engineering has been utilized over the last two decades to produce a vast array of products that cannot be easily obtained by chemical synthesis (Stephanopoulos, 1999; Dietrich et al., 2010). The ultimate goal of metabolic engineering is to maximize the desired products from the engineered biological systems. The conventional approach to achieve this goal is to increase the expression level of the involved genes by using strong promoters (Stephanopoulos, 1999) and/or employing high-copy plasmids (Ro et al., 2008). Another approach is to create “better” enzymes with desired properties by directed evolution or rational design (Bloom et al., 2005; Yoshikuni et al., 2006).

Thanks to the new sequencing technologies, plant genome and transcriptome information has recently exploded, and the genes in specialized metabolism are discovered at an increasing speed. The availability of gene sequences and DNA synthesis technology paves the way for the rise of synthetic biology as a strategy to design and assemble desired biological logics in living organisms (Keasling and Nielsen, 2011; Facchini et al., 2012). With these new tools, we have witnessed the establishment of new microbial strains that can produce rare and valuable chemicals (Ro et al., 2006; Chapter 2). Metabolic engineering and synthetic biology also allow the microbial reconstitution of the plant specialized metabolism pathways, including benzylisoquinoline alkaloids and aromatic monoterpenes in yeast (Hawkins and Smolke, 2008; Herrero et al., 2008), and sesquiterpenoids, monoterpenoids, and some alkaloids in Escherichia coli (Carter et al., 2003; Martin et al., 2003; Nakagawa et al., 2011). Furthermore, many “artificial” natural products have been synthesized via combinatorial biochemistry in a synthetic biology system, particularly in polyketide biosynthesis (Zhang and Tang, 2008).

1.5. Adaptive enzyme evolution

1.5.1. Enzyme evolution in the context of metabolism networks

One of the most intriguing questions in the studies of plant specialized metabolism is how enzymes evolve and contribute to the enormous chemical diversity observed today. Hundreds of thousands of specialized metabolites are the catalytic products of a comparable number of enzymes, accounting for up to 15–25% of genes in plant genomes (Somerville and Somerville, 1999; Pichersky and Gang, 2000). This impressive diversity of enzymes in specialized metabolism is the result of gene duplications and neo-functionalization (gaining new functions), and to a smaller extent, domain swapping of the few ancestral enzymes (Ober, 2010).

Gene duplications mainly result from unequal crossing-over of the chromosomes. It has long been theorized that, once such an event happens, one copy of the gene maintains its normal physiological role in the organism while the other duplicate is free to evolve for novel functions (Kinch and Grishin, 2002; Glasner et al., 2006). Under different selection pressures, divergent evolution from these gene duplication events can lead to the modern enzyme families and super-families. Members of each of these groups share sequence identities, many aspects of their catalysis (e.g. common reactions, intermediates or transition states), and overall structures (Glasner et al., 2006). Among many enzyme families and super-families involved in plant specialized metabolism are TPSs, polyketide synthases, P450s, 2-oxoglutarate-dependent dioxygenases, acyltransferases, O-methyltransferases, and glycosyltransferases (Pichersky and Gang, 2000; Gang, 2005; Chen et al., 2011).

New metabolic pathways evolve by recruiting several individual members of an enzyme family/super-family as well as members of different enzyme groups. This can be illustrated by the biosynthetic pathway of the defensive compound 2,4-dihydroxy-7-methoxy-1,4-benzoxazin-3-one-β-D-glucoside (DIMBOA-β-D-glucoside). This pathway in maize (Zea mays) starts with the oxidation of indole in four consecutive steps by four different P450s to 2,4-dihydroxy-1,4-benzoxazin-3-one (DIBOA) (Glawischnig et al., 1998). DIBOA subsequently undergoes several steps catalyzed by glucosyltransferase, dioxygenases, and methyltransferase to yield the final product DIMBOA-β-D-glucoside (Jonczyk et al., 2008). The emergence of enzymatic novelty being recruited for new metabolic pathways, however, is not well understood.

1.5.2. Evolution of new enzyme activities

New enzyme activities have constantly emerged following gene duplications throughout the history of life. However, how new functions efficiently diverge from their progenitor is still debatable (Khersonsky and Tawfik, 2010). As modern enzymes are the products of natural selection, they are not “designed” but built on previous evolutionary starting points (Jacob, 1977). According to Jensen (1976), these starting points were the broad catalytic promiscuity with low efficiencies in primitive enzymes. This broad specificity helped the ancient living organisms survive with the limited coding capacity of their genomes. Moreover, it is this ability to perform several functions that gives the enzyme its evolvability (i.e. being selected), leading to new, specific functions in descendant enzymes under specific selection pressures.

Although most enzymes are often described as highly specific in terms of substrates and products, many enzymes accepting a broad range of substrates and/or producing multiple products have been reported (O’Brien and Herschlag, 1999; Pichersky and Gang, 2000). Directed evolution studies also showed that “specialist” enzymes could be generated in laboratories from the “generalist” ones (Yoshikuni et al., 2006; Khersonsky and Tawfik, 2010). This suggests that catalytic plasticity of progenitor enzymes may serve as starting points for evolving to more specialized enzymes through evolutionary selections (Figure 1.11). Most of these findings, however, are only circumstantial in supporting Jensen’s arguments (Khersonsky and Tawfik, 2010), and more data are required to test this hypothesis as well as to better understand the mechanisms underlying the transition from promiscuity to specificity.

Figure 1.11. Jensen’s hypothesis of enzyme evolution from a ‘generalist’ progenitor with low levels of broad activities to ‘specialist’ descendants with optimized levels of specific activities. Adapted from Khersonsky et al. (2006).

1.6. Rationale, hypothesis, and objectives

The primary objective of this research is to advance our understanding of STL metabolism in Asteraceae with a particular emphasis on obtaining evidence of enzyme evolution and chemical diversification. The hypothesis to test is that the P450 enzyme in the ancient and general metabolic pathway displays a higher degree of catalytic promiscuity than the recent and specialized ones. The STL metabolism in Asteraceae can be a good model to assess this hypothesis because a well-studied phylogeny, including the basal clade, has been established in the taxonomy community. Furthermore, the Compositae Genome Project has deposited an extensive amount of transcriptomics data in the public domain since 2000. In addition, the availability of the genetically engineered yeast (EPY 300, Chapter 2) allows me to overcome the difficulties in obtaining costly substrate as well as to elucidate product structure after classical purification.

Specific objectives are:

1) To isolate and characterize the germacrene A oxidase (GAO) from lettuce and related Asteraceae plants (chicory, sunflower, costus, and Barnadesia spinosa);

2) To analyze the biochemical and phylogenetic relation of AMO and various GAOs to infer the lineage of the biochemical evolution that AMO has taken over 50 million years;

3) To test if evolutionarily earlier GAO enzymes have catalytic promiscuity and, if so, to evaluate the degree of promiscuity in the GAO enzymes; and

4) To determine the structures of the compounds if novel compounds are synthesized due to promiscuous activities.

Experimental data for objectives 1 and 2 are described in Chapter 3, and those of objectives 3 and 4 are described in Chapter 4. At the start of this thesis project, GAO enzymes had yet to be identified and characterized, and their evolutionary lineage in the Asteraceae family had not been studied at the molecular level.

Chapter 2: METHODS AND MATERIALS

2.1. Plant materials

Five Asteraceae plant species were used in this research: lettuce (Lactuca sativa cv Mariska), chicory (Cichorium intybus), sunflower (Helianthus annuus cv HA300), costus (Saussurea lappa), and Barnadesia spinosa. Lettuce, sunflower, and wild-type chicory were cultivated in the greenhouse at the University of Calgary (22 ± 3 °C, 14-hr light with supplementary light between October and March). Samples were harvested after four to five months (chicory) or two months (sunflower). Costus was grown under greenhouse conditions for two months and harvested at the University of Hohenheim (Germany). The frozen, ground mixture of Barnadesia spinosa leaf and flower tissue was a gift from the Center of Genomics and Bioinformatics at Indiana University (USA).

2.2. Sequence analysis, RNA extraction, and isolation of cDNAs

2.2.1. Data mining

Sequence information at the start and stop codons of putative germacrene A oxidases (GAOs) from lettuce, chicory, and B. spinosa was obtained from the Compositae Genome Project Databases at the University of California, Davis (USA) (http://compgenomics.ucdavis.edu/) by using Artemisia annua amorphadiene oxidase (CYP71AV1) as query. The ESTs that displayed >70% identity to the query were selected as candidate genes for GAOs from the respective plants (Table 2.1).
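The selection rule described above (keep ESTs whose identity to the query exceeds 70%) can be sketched as a simple filter. This is only an illustration: the `Hit` type and `select_candidates` function are hypothetical names, and the five rows are a small subset transcribed from Table 2.1, not the full search result.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    genbank_id: str
    identity_pct: float  # percent identity to the CYP71AV1 query


# A few rows transcribed from Table 2.1 (illustrative subset only).
HITS = [
    Hit("DY975738", 86),
    Hit("DY964666", 77),
    Hit("EH696767", 74),
    Hit("EH684197", 72),
    Hit("GE526649", 83),
]


def select_candidates(hits, min_identity_pct=70.0):
    """Return the hits whose identity is strictly greater than the cutoff."""
    return [h for h in hits if h.identity_pct > min_identity_pct]


if __name__ == "__main__":
    for hit in select_candidates(HITS):
        print(hit.genbank_id, hit.identity_pct)
```

Raising `min_identity_pct` narrows the candidate list, which is a quick way to see how sensitive the selection is to the chosen cutoff.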


Table 2.1. A list of ESTs that share high sequence identity with A. annua AMO found in the Compositae Genome Project Database

GenBank ID  Contig  Start/stop codons  Gene    Identity  E-value

Lettuce (Lactuca sativa), 81,330 ESTs
DY975738    a       Start                      86%       1 x 10^–158
DY972978    a       (none)                     87%       6 x 10^–163
DY968314    a       (none)                     87%       1 x 10^–161
DW137188    a       Start                      86%       1 x 10^–148
DY964666    b       Start                      77%       4 x 10^–142
DW123725    b       Start                      81%       8 x 10^–105
DW124018    b       Start                      81%       4 x 10^–104
BQ846488    c       Stop                       80%       2 x 10^–87
BQ873310    c       Stop                       80%       2 x 10^–87
BQ874735    c       Stop                       80%       3 x 10^–87
BQ843287    c       Stop                       80%       3 x 10^–87
BQ872261    c       (none)                     80%       2 x 10^–61
DY975676    a       Start                      84%       1 x 10^–36
BQ856014    c       Stop                       92%       4 x 10^–33
BQ848661    c       Stop                       92%       4 x 10^–33

Chicory (Cichorium intybus), 53,973 ESTs
EH676964    d       (none)             CiGAO2  80%       4 x 10^–160
EH689030            Stop               CiGAO1  82%       5 x 10^–160
EH696767    d       (none)             CiGAO2  74%       2 x 10^–153
EH672544    e       (none)             CiGAO1  84%       8 x 10^–147
EH673835    e       Start              CiGAO1  85%       2 x 10^–145
EH683314    d       Start              CiGAO2  80%       2 x 10^–141
EH697393    d       (none)             CiGAO2  74%       5 x 10^–140
EH675626    d       Start                      79%       8 x 10^–138
EH672499    d       (none)             CiGAO2  80%       2 x 10^–133
EH684197    d       (none)             CiGAO2  72%       6 x 10^–124
EH679329    d       (none)             CiGAO2  83%       1 x 10^–92
EH704913    d       (none)             CiGAO2  78%       3 x 10^–56

Barnadesia spinosa, 28,483 ESTs
GE526649    f       Start                      83%       8 x 10^–127
GE537963    f       Start                      81%       1 x 10^–125
GE541913    f       (none)                     82%       7 x 10^–125
GE535322            (none)                     74%       2 x 10^–110

Note: Same letters (a, b, c, d, e, f) indicate ESTs that can be assembled into a contig.


The full-length sequence of the putative GAO in sunflower, which shares 84% identity with AMO, was acquired from the sunflower glandular trichome cDNA library. This library of 767 unigenes was previously generated from sunflower glandular trichomes, where STLs are known to accumulate (Göpfert et al., 2005; Göpfert et al., 2009; Ikezawa et al., 2011). Since transcriptomics information for costus was not available, primers designed from conserved regions of various GAOs were used to amplify a fragment, and the 3’- and 5’-end sequences were determined by the rapid amplification of cDNA ends (RACE) technique (see below).

While only one isoform exists in sunflower and B. spinosa, transcriptomics data from lettuce and chicory revealed that two isoforms of putative GAO are present in these plant species. In this work, they were named LsGAO1 and LsGAO2 (contigs a and b, respectively, in lettuce), CiGAO1 and CiGAO2 (contigs e and d, respectively, in chicory) (Table 2.1), HaGAO (sunflower), and BsGAO (B. spinosa).

2.2.2. RNA extraction and cDNA synthesis

Plant tissues were frozen and ground in liquid nitrogen, and 200 mg of ground tissue was vortexed at 65 °C with cetyl trimethyl ammonium bromide (CTAB) extraction buffer, consisting of 0.5 mL CTAB buffer (20 g L⁻¹ cetrimonium bromide, 100 mM Tris pH 8.0, 20 mM EDTA pH 8.0, 1.4 M NaCl, 10 g L⁻¹ polyvinyl pyrrolidone), 10 µL β-mercaptoethanol, and 5 µL 50 g L⁻¹ spermidine trihydrochloride.

To the mixture, 0.5 mL 24:1 chloroform:isoamyl alcohol was added, and the sample was vortexed and centrifuged at 12,000 x g for 20 min at room temperature. The aqueous phase was collected and vortexed with another 0.5 mL 24:1 chloroform:isoamyl alcohol, followed by centrifugation at 12,000 x g for 20 min at room temperature. The aqueous phase was repeatedly collected after vortexing with chloroform:isoamyl alcohol until the protein layer became negligible.


The aqueous phase was vortexed with 125 µL 10 M LiCl, and the mixture was left to precipitate at 4 °C overnight or on ice for one hr. The mixture was then centrifuged at 12,000 x g for 35 min at 4 °C, and the supernatant was discarded. The dried pellet was then resuspended in 50 µL diethylpyrocarbonate-treated water, vortexed with 50 µL 24:1 chloroform:isoamyl alcohol, and centrifuged at 14,000 x g for 30 min at 4 °C. The aqueous phase was vortexed with 100 µL ice-cold 100% ethanol, and the RNA was precipitated for 30 min at –80 °C. The supernatant was removed from the precipitate after centrifugation at 14,000 x g for 20 min at 4 °C. The precipitate was dried and washed with 100 µL 75% ethanol, followed by centrifugation at 14,000 x g for 20 min at 4 °C. The pellet containing total RNA was resuspended in 50 µL diethylpyrocarbonate-treated water.

Reverse transcription of RNA to cDNA was performed using the SuperScript® II Reverse Transcriptase kit (Invitrogen). Complete reactions were assumed for cDNA concentration calculations in later quantitative reverse-transcription PCR (qRT-PCR) experiments.

2.2.3. Isolation of full-length cDNAs for candidate genes

PCR primers were designed based on the sequence information (Tables 2.1, 2.2) to isolate full-length cDNAs of putative GAOs from lettuce, chicory, and sunflower. B. spinosa cDNA clones (Table 2.1: GE526649, GE537963, GE541913, and GE535322) were ordered from the Arizona Genomics Institute at the University of Arizona (USA). The GE537963 clone was found to contain a full-length cDNA, and was used to design PCR primers (Table 2.2).

For isolating cDNA of putative GAO in costus (SlGAO), a forward primer, 5’-ACCGTGGCTCAAAGCTCTCAGTC-3’, and a reverse primer, 5’-GACTCCCCATAATCGGTCACATGC-3’, were designed based on the highly-conserved domain of other GAOs. A 1.4-kb fragment of SlGAO was amplified, and its 5’- and 3’-sequences were determined by RACE to design primers for full-length cDNA cloning. The reverse primer for 5’-RACE and the forward primer for 3’-RACE were 5’-GATATCGTGGGTCGTTAAGATCTC-3’ and 5’-GAATGTGTCCAGGAGCCGCACTC-3’, respectively. All primers for full-length cDNA isolation have XbaI or SpeI sites to clone the entire open reading frames (ORFs) in the SpeI site of the pESC vector, resulting in an in-frame fusion to the FLAG epitope.

Table 2.2. Sequences of primers for cloning full-length cDNAs of GAOs into multi-cloning site 1 of pESC vector

Gene    Forward (5’ to 3’)                       Reverse (5’ to 3’)
LsGAO1  CGAGGTCTAGAATGGAGCTTTCAATAACCACC         GCCCTCTAGAGCAAAACTCGGTACGAGTAACAAC
CiGAO1  ACGTCTAGAATGGAGCTCTCACTCACTCACTACTTCCA   ACGTCTAGAGCAAAACTTGGTACGAGTATCAATTCGGT
HaGAO   GCACTAGTATGGAAGTCTCCCTCACCACTTC          CGATACTAGTGCAAAACTTGGTACAAGCATCAA
BsGAO   ATATCTAGAACCATGGAACTCACTCTCACCACTTCCC    ATACTAGTCGAGCAGAGTTGTTAGCAGTCTTGTAAGCTG
SlGAO   TAATCTAGAATGGAACTCTCCTTCACCACTTCCATTGC   TATTCTAGACGAAAACTAGGTACCAGTACCAAATGAGTC
LsGAO2  TGATCTAGAATGGAGATTTCTGTCACCACTACCCTTGGC  AATTCTAGAGCATAACTTGACGCTTCAAGTGTTTTG

Note: Underlined sequences indicate recognition sites of XbaI (TCTAGA) or SpeI (ACTAGT).
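As a quick consistency check on Table 2.2, the XbaI (TCTAGA) and SpeI (ACTAGT) recognition sites can be located in each primer programmatically. The sketch below is illustrative only: the primer sequences are transcribed from Table 2.2, while the dictionary layout and the `find_sites` helper are hypothetical names introduced here.

```python
import re

# Recognition sequences of the two enzymes named in the table note.
SITES = {"XbaI": "TCTAGA", "SpeI": "ACTAGT"}

# (forward, reverse) primers transcribed from Table 2.2.
PRIMERS = {
    "LsGAO1": ("CGAGGTCTAGAATGGAGCTTTCAATAACCACC",
               "GCCCTCTAGAGCAAAACTCGGTACGAGTAACAAC"),
    "CiGAO1": ("ACGTCTAGAATGGAGCTCTCACTCACTCACTACTTCCA",
               "ACGTCTAGAGCAAAACTTGGTACGAGTATCAATTCGGT"),
    "HaGAO": ("GCACTAGTATGGAAGTCTCCCTCACCACTTC",
              "CGATACTAGTGCAAAACTTGGTACAAGCATCAA"),
    "BsGAO": ("ATATCTAGAACCATGGAACTCACTCTCACCACTTCCC",
              "ATACTAGTCGAGCAGAGTTGTTAGCAGTCTTGTAAGCTG"),
    "SlGAO": ("TAATCTAGAATGGAACTCTCCTTCACCACTTCCATTGC",
              "TATTCTAGACGAAAACTAGGTACCAGTACCAAATGAGTC"),
    "LsGAO2": ("TGATCTAGAATGGAGATTTCTGTCACCACTACCCTTGGC",
               "AATTCTAGAGCATAACTTGACGCTTCAAGTGTTTTG"),
}


def find_sites(seq):
    """Return {enzyme: [0-based start positions]} for each site found in seq."""
    found = {}
    for enzyme, site in SITES.items():
        # A lookahead keeps overlapping matches, should any occur.
        positions = [m.start() for m in re.finditer(f"(?={site})", seq)]
        if positions:
            found[enzyme] = positions
    return found


if __name__ == "__main__":
    for gene, (fwd, rev) in PRIMERS.items():
        print(gene, "forward:", find_sites(fwd), "reverse:", find_sites(rev))
```

Every primer in the table should report at least one of the two sites near its 5’ end, upstream of the start codon in the forward primers.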

2.2.4. Phylogenetic analysis of GAOs

Amino acid sequences of AMO and GAOs were aligned using the ClustalW algorithm. Evolutionary relations were inferred using the maximum parsimony method in the phylogenetic analysis software MEGA5 (Tamura et al., 2011). The phylogenetic tree was constructed using the Subtree-Pruning-Regrafting algorithm with search level 1, in which the initial trees were obtained by the random addition of sequences (ten replicates). The analysis involved ten amino acid sequences. All positions containing gaps and missing data were eliminated, and 1,000 replicates of the bootstrap analysis were performed to evaluate the statistical significance of each node.

2.2.5. Isolation of sesquiterpene synthases

Publicly-available sequences of previously-characterized sesquiterpene synthases (STS) were used to design primers to isolate the full-length genes or to synthesize their yeast codon-optimized versions. Germacrene D synthase (ScGDS; GenBank ID: AJ583447) (Prosser et al., 2004) was isolated from Canadian goldenrod (Solidago canadensis) cDNA with the forward primer 5’-AGTTGGGCCCGCCATGGCTGCTAAACAAGGAGAGGTTG-3’ and the reverse primer 5’-TAGCCTCGAGTCAAACACTAATAGCATTAATGAAA-3’. Synthetic yeast codon-optimized valencene synthase (SynVS) was generated by a previous study in our laboratory (unpublished) based on the natural sequence of grapefruit (Citrus x paradisi) valencene synthase (CpVS; see the Appendix). It was subcloned into the yeast expression vector using the forward primer 5’-AGTTGGGCCCGCCATGAGTTCTGGTGAAACCTTTAG-3’ and the reverse primer 5’-TAGCCTCGAGTTAGAAAGGAACATGGTCACCTAG-3’. In this research, synthetic guaiene synthase (SynGUS) was codon-optimized based on Aquilaria crassna guaiene synthase (GU083698) (Kumeta and Ito, 2010), and was subsequently subcloned directly into the yeast expression vector. These three genes were cloned into the ApaI and XhoI sites of multi-cloning site (MCS) 2 on the pESC-Leu2d vector.

Valerenadiene synthase (VoVDS; JQ437840) and thioredoxin-fused 5-epi-aristolochene synthase (NtEAS; see the Appendix) were readily available in MCS2 of the yeast expression vector pESC-Leu2d from two previous studies in our laboratory (Pyle et al., 2012; Nguyen et al., 2012). The sixth sesquiterpene synthase, cotton (Gossypium arboreum) δ-cadinene synthase (GaCDS; U23206) (Chen et al., 1995), was cloned into MCS2 of the pESC-Leu2d vector by the homologous recombination-based method using the forward primer 5’-GGGCCCGGGCGTCGAGCCATGGCTTCACAAGTTTCTCAAATGC-3’ and the reverse primer 5’-TCTGTTCCATGTCGACAAGTGCAATTGGTTCAATGAGC-3’, with the In-Fusion® HD Cloning Kit (Clontech Laboratories, Inc).

2.2.6. Selection analysis

Selection analysis of AMO, GAOs, GASs, and 21 FPP synthases (FPSs) of several Asteraceae species was performed based on the ratios of non-synonymous to synonymous substitutions (Ka/Ks). Fifteen chalcone synthases (CHSs) of the family were also included as a universal gene reference (Table 2.3). Coding sequences of AMO and GAOs (excluding the first 63 nucleotides, corresponding to the 21-residue membrane-bound domains), GASs, and FPSs were aligned using the ClustalW algorithm.

Table 2.3. List of CHSs and STL biosynthetic genes included in the Ka/Ks measurement

Gene  GenBank ID  Species                        Subfamily      Reference
CHS   AB550239    Gynura bicolor                 Asteroideae    Shimizu et al., 2010
      Z38096      Gerbera hybrida                Mutisioideae   Helariutta et al., 1995
      Z38098      Gerbera hybrida                Mutisioideae   Helariutta et al., 1995
      DQ521272    Chrysanthemum x morifolium     Asteroideae    Zhang et al., 2006 (unpublished)
      AB591825    Dahlia pinnata                 Asteroideae    Ohno et al., 2011
      AB591826    Dahlia pinnata                 Asteroideae    Ohno et al., 2011
      AB576661    Dahlia pinnata                 Asteroideae    Ohno et al., 2011
      JN182806    Silybum marianum               Carduoideae    Sanjari et al., 2011 (unpublished)
      DQ350888    Saussurea medusa               Carduoideae    Yu et al., 2006 (unpublished)
      EF070339    Rudbeckia hirta                Asteroideae    Schlangen et al., 2006 (unpublished)
      Z67988      Callistephus chinensis         Asteroideae    Radhakrishnan et al., 2010
      FJ913888    Ageratina adenophora           Asteroideae    Guo et al., 2011
      AM906210    Gerbera cv. ‘Terra Regina’     Mutisioideae   Teeri et al., 2007 (unpublished)
      AM906211    Gerbera cv. ‘Terra Regina’     Mutisioideae   Teeri, 2007 (unpublished)
      AM906212    Gerbera cv. ‘Terra Regina’     Mutisioideae   Teeri, 2007 (unpublished)
FPS   AF019892    Helianthus annuus              Asteroideae    Gaffe et al., 2000
      AY308476                                   Asteroideae    Hemmerlin et al., 2003
      AY308477    Artemisia tridentata           Asteroideae    Hemmerlin et al., 2003
      AF112881    Artemisia annua                Asteroideae    Chen et al., 2000
      U36376      Artemisia annua                Asteroideae    Matsushita et al., 1996
      EF675758    Matricaria recutita            Asteroideae    Mueller, 2007 (unpublished)
      X82542      Parthenium argentatum          Asteroideae    Pan et al., 1996
      JX424551    Achillea asiatica              Asteroideae    Liu et al., 2012
      JX424552    Achillea asiatica              Asteroideae    Liu et al., 2012
      JX424554    Chrysanthemum lavandulifolium  Asteroideae    Liu et al., 2012
      JX424555    Chrysanthemum lavandulifolium  Asteroideae    Liu et al., 2012
      JX424557    Leucanthemum vulgare           Asteroideae    Liu et al., 2012
      JX424558    Leucanthemum vulgare           Asteroideae    Liu et al., 2012
      JX424559    Tanacetum coccineum            Asteroideae    Liu et al., 2012
      JX424560    Tanacetum coccineum            Asteroideae    Liu et al., 2012
      JX424562    Aster ageratoides              Asteroideae    Liu et al., 2012
      JX424564    Helianthus annuus              Asteroideae    Liu et al., 2012
      JX424568    Leibnitzia anandria            Mutisioideae   Liu et al., 2012
      JX424569    Leibnitzia anandria            Mutisioideae   Liu et al., 2012
      JX424566    Taraxacum mongolicum           Cichorioideae  Liu et al., 2012
      JX424567    Taraxacum mongolicum           Cichorioideae  Liu et al., 2012
GAS   DQ447636    Artemisia annua                Asteroideae    Bertea et al., 2006
      DQ016667    Helianthus annuus              Asteroideae    Göpfert et al., 2009
      EU327785    Helianthus annuus              Asteroideae    Göpfert et al., 2009
      GU176380    Helianthus annuus              Asteroideae    Göpfert et al., 2010
      AF489964    Lactuca sativa                 Cichorioideae  Bennett et al., 2002
      AF489965    Lactuca sativa                 Cichorioideae  Bennett et al., 2002
      AF498000    Cichorium intybus              Cichorioideae  Bouwmeester et al., 2002
      AF497999    Cichorium intybus              Cichorioideae  Bouwmeester et al., 2002


Ka/Ks was measured in MEGA5 (Tamura et al., 2011) using the modified Nei-Gojobori model with an assumed ratio of 2 for transitional to transversional mutations (Zhang et al., 1998). The analysis involved eight nucleotide sequences. All positions containing gaps and missing data were eliminated.
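The final arithmetic behind a Ka/Ks value can be sketched as follows. This is not MEGA5's implementation. It assumes that the numbers of synonymous and non-synonymous differences and sites have already been counted (in the modified Nei-Gojobori method, the transition/transversion ratio enters at that counting stage), and it assumes a Jukes-Cantor correction, which the text above does not specify. The function names and the example counts are hypothetical.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for a proportion of differences p."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(syn_diffs, syn_sites, nonsyn_diffs, nonsyn_sites):
    """Ratio of non-synonymous (Ka) to synonymous (Ks) substitution rates.

    Inputs are pre-computed counts for one pair of aligned coding sequences.
    """
    ks = jukes_cantor(syn_diffs / syn_sites)
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    if ks == 0:
        raise ValueError("Ks is zero; the ratio is undefined")
    return ka / ks


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(ka_ks(syn_diffs=30, syn_sites=300, nonsyn_diffs=20, nonsyn_sites=1000))
```

A ratio below 1 is conventionally read as purifying selection, near 1 as neutral evolution, and above 1 as positive selection.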

2.3. Yeast platforms for enzymes’ biochemical characterization

2.3.1. Yeast strains

For in vivo enzyme assays, this research employed the yeast (Saccharomyces cerevisiae) strain EPY300, engineered to over-produce FPP (S288C, MATα his3Δ1 leu2Δ0 PGAL1–tHMG1::δ1 PGAL1–upc2-1::δ2 erg9::PMET3–ERG9::HIS3 PGAL1–ERG20::δ3 PGAL1–tHMG1::δ4) (Ro et al., 2006; Figure 2.1). This metabolic engineering was achieved by: (i) genomic integration of two copies of 3-hydroxy-3-methyl glutaryl-CoA reductase (HMG-CoA reductase) as an N-terminal truncated version under the GAL1 promoter (Donald et al., 1997); (ii) genomic integration, under the GAL1 promoter, of the mutant version of the transcription factor upc2-1, which activates several genes in yeast (Davies et al., 2005); (iii) replacement of the promoter of squalene synthase (ERG9) with the methionine-repressible promoter (PMET3); and (iv) genomic integration of FPP synthase (ERG20) under the GAL1 promoter. Galactose and methionine in the culture medium of EPY300 increase the MVA pathway flux and decrease the metabolic flux to ergosterol, respectively (Figure 2.1).


Figure 2.1. EPY300 yeast platform for in vivo enzyme assays. EPY300 was engineered to overproduce farnesyl pyrophosphate (FPP), the central precursor for sesquiterpenoids. Thick, solid arrows indicate genes (HMG-CoA reductase and FPP synthase) directly upregulated by the GAL1 promoter. Empty arrows indicate genes (HMG-CoA synthase, mevalonate kinase, and phosphomevalonate kinase) indirectly upregulated by the transcription factor upc2-1. The cross indicates the step controlled by the methionine-repressible promoter. Dashed arrows indicate the steps catalyzed by recombinant enzymes. For most of this research, either the STS-harbouring vector (A) or the triple expression vector (B) was transformed into EPY300 to produce the desired sesquiterpenes or oxidized sesquiterpenes, respectively. When yeast containing vector A was cultured, a layer of dodecane was overlaid to trap the volatile sesquiterpene product(s). Adapted from Nguyen et al. (2012).


For in vitro enzyme assays, cDNAs encoding the enzymes of interest were expressed in the protease-deficient S. cerevisiae strain (BY47471, MATα his3Δ1 leu2Δ0 met15Δ0 ura3Δ0 YPL154c::kanMX4). This strain has a deletion in the gene Pep4 (YPL154C), which encodes the vacuolar protease proteinase A (Parr et al., 2007).

2.3.2. Expression constructs

The pESC yeast expression vectors containing FLAG and cMyc epitopes (Agilent) were used to express STSs, P450s, and CPR under the GAL1 and GAL10 promoters. This research also used the high-copy pESC-Leu2d plasmid, which differs from the commercial vector pESC-Leu only at the promoter of the Leu2 gene (selection marker). In pESC-Leu2d, the majority of the Leu2 promoter was deleted, leaving only 29 bp. When grown under leucine-deficient selection conditions, yeast transformed with pESC-Leu2d compensates for the weak promoter activity by increasing the copy number of the plasmid, which results in a higher level of gene expression (Ro et al., 2008). For expressing STSs in yeast, isolated cDNAs were cloned into the multi-cloning site (MCS) 2 of pESC under the GAL1 promoter.

For in vivo evaluation of the oxidation activities of P450s on sesquiterpenes in yeast, pESC was modified to include an additional expression cassette of MCS2 under the GAL1 promoter. To construct this triple expression vector, a pESC dual expression vector was generated first by cloning cDNAs of P450 into MCS1 (Table 2.2) and replacing the empty expression cassette of the GAL1 promoter with that containing A. annua CPR from the previously-generated vector pESC-Ura::CPR (Ro et al., 2006) using BamHI and NheI. The expression cassette of GAL1 promoter–MCS2–CYC1 terminator of another pESC vector, which already contained an STS, was amplified with the forward primer 5’-GTCAATCACTACGTGAGTACGGATTAGAAGCCGCCGA-3’ and the reverse primer 5’-GTCAATGCCGGCCTTCGAGCGTCCCAAAACCT-3’, digested with DraIII and NaeI, and ligated into the pESC dual expression vector to generate the triple expression vector pESC::P450/CPR/STS (Figure 2.1, vector B). This triple expression vector could be used to test the activities of different P450s on different sesquiterpene substrates simply by replacing the entire cassettes of P450 and STS using the two pairs of restriction enzymes, NotI–PacI and DraIII–NaeI, respectively.

For in vitro enzyme assays, the aforementioned pESC-Leu2d dual expression vector was used to express P450s and CPR in the protease-deficient Pep4-KO strain. The expression constructs were transformed into yeast using the lithium acetate/single-stranded DNA/polyethylene glycol method (Gietz and Schiestl, 2007).

2.3.3. Yeast culture media

Yeast was cultured in either selection (SC dropout) or rich (YPA) media. Components of 1 L YPA medium include 10 g Bacto™ yeast extract, 20 g Bacto™ peptone (BD Bioscience), and 80 mg adenine hemisulfate (Sigma-Aldrich). Components of 1 L SC dropout medium include 6.7 g yeast nitrogen base without amino acids (Difco Microbiology), 21 mg adenine, 173.4 mg leucine, 85.6 mg of each of the other 18 standard amino acids (without methionine), and 85.6 mg uracil (Sigma-Aldrich). As EPY300 is selected in SC –His –Met medium, EPY300 containing the expression constructs was grown in SC –His –Met –Leu, SC –His –Met –Ura, or SC –His –Met –Leu –Ura, accordingly.

Glucose and galactose were used as carbohydrate sources for yeast cultures. Solutions of 20% (w/v) concentration were prepared separately and added to the media before use. For solid medium, 15 g L⁻¹ Difco Bacto agar (Difco Microbiology) was added.
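The volume of a 20% (w/v) sugar stock needed for a given culture follows from the usual dilution relation C1 x V1 = C2 x V2. The sketch below is a minimal illustration: the function name is hypothetical, and the 50-mL culture volume is an arbitrary example within the 20–100 mL range used for metabolite analysis, combined with the 0.2% glucose and 1.8% galactose concentrations given in Section 2.3.4.

```python
def stock_volume_ml(final_pct, final_volume_ml, stock_pct=20.0):
    """Volume of stock (mL) giving final_pct (w/v) in final_volume_ml of medium."""
    if final_pct < 0 or final_volume_ml <= 0 or stock_pct <= 0:
        raise ValueError("concentrations and volumes must be positive")
    if final_pct > stock_pct:
        raise ValueError("final concentration cannot exceed the stock concentration")
    # C1 * V1 = C2 * V2, solved for V1.
    return final_pct * final_volume_ml / stock_pct


if __name__ == "__main__":
    culture_ml = 50  # illustrative culture volume
    print("20% glucose stock (mL):", stock_volume_ml(0.2, culture_ml))
    print("20% galactose stock (mL):", stock_volume_ml(1.8, culture_ml))
```

The returned volume is part of the final culture volume, so the remaining medium components should be made up to the difference.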


2.3.4. Yeast culture conditions

All yeast inoculations and cultures were performed in a controlled environment incubator shaker at 30 °C and 200 rpm. Cultures were started with overnight inoculations (15–20 hr) in appropriate SC dropout media with 2% glucose. For standard in vivo enzyme assays in yeast, inoculations were diluted 50- to 100-fold into the same media with 0.2% glucose, 1.8% galactose, and 1 mM methionine and cultured for 24–120 hr. When only sesquiterpenes were to be analyzed, a layer of dodecane equivalent to 10% of the total culture volume was overlaid to trap the volatile hydrocarbons. For preparing enzymes for in vitro assays, inoculations were diluted 50- to 100-fold into the same media with 2% glucose, cultured for 12–15 hr, and shifted to fresh YPA media with 2% galactose for 24 hr. Cultures of 20 to 100 mL were used for metabolite analysis and microsomal preparation, while 1–5 L cultures were used for metabolite purification. To maintain pH > 6, 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid (HEPES) buffer was added to SC dropout medium to a final concentration of 100 mM (pH 7.0–7.5).

2.3.5. Microsomal preparation, immunoblot analysis, and in vitro enzyme assays

Yeast microsomes were prepared according to the published protocol with a small modification (Pompon et al., 1996). Yeast cells were collected from cultures by centrifugation and broken open with glass beads (500-µm diameter) at 4 °C in a Micro-beadbeater (Biospec Products) for 90 sec in TES buffer (50 mM Tris, 1 mM EDTA, 0.6 M sorbitol, pH 7.4). Microsomes were isolated by sequential centrifugation at 10,000 x g and 100,000 x g. Pelleted microsomes were resuspended in TEG buffer (50 mM Tris, 1 mM EDTA, 20% glycerol, pH 7.4) and stored at –80 °C.

For immunoblot analysis, microsomal proteins were separated on a 10% SDS-PAGE gel and transferred onto a polyvinylidene fluoride membrane in transfer buffer (25 mM Tris, 192 mM glycine, 20% methanol). The membrane was blocked in TBS-T buffer (25 mM Tris-HCl, 150 mM NaCl, 0.05% Tween-20, pH 7.5) with 5% (w/v) skimmed milk for at least one hr. The membrane was then incubated with anti-FLAG M2 mouse antibodies (Sigma-Aldrich) at a 1:5,000 dilution in 3% (w/v) skimmed milk to detect P450s fused to the FLAG epitope on pESC vectors, washed three times with TBS-T, and incubated with goat anti-mouse antibodies (GE Healthcare) at a 1:5,000 dilution in 3% (w/v) skimmed milk. The excess anti-mouse antibodies were then removed by washing three times with TBS-T, and the bound antibodies were visualized with ECL Plus detection reagents (GE Healthcare).

For in vitro enzyme assays, 0.1–3 mg of microsomes was used in a 3-mL reaction mixture of 50 mM HEPES/NaOH (pH 7.5) buffer containing substrate and 500 µM NADPH, with or without an NADPH regeneration system (10 mM glucose-6-phosphate and three units of glucose-6-phosphate dehydrogenase). The assays were performed at 23 °C for four hr with gentle agitation.

2.4. Metabolite purification and analysis

2.4.1. Metabolite preparation

For analyzing the profiles of sesquiterpenes produced by transgenic yeast, the aforementioned dodecane overlay was separated from the culture by centrifugation at 3,000 x g for five min. In other cases, the cultures were extracted with ethyl acetate for non-buffered cultures, or adjusted to pH 6.0 before extraction with ethyl acetate for HEPES-buffered cultures. To evaluate the occurrence of the STL biosynthetic pathway in the subfamily Barnadesioideae, B. spinosa frozen tissue was extracted with 5 mL ethyl acetate, filtered, dried, and resuspended in methanol for metabolite profiling by liquid chromatography–mass spectrometry (LC–MS) as described below.


2.4.2. Metabolite analysis by gas chromatography

Metabolites from transgenic yeast cultures were analyzed on a 6890 N gas chromatograph coupled with either a 5975B mass spectrometer (GC–MS) or a 6850 flame ionization detector (GC–FID) (Agilent). Samples (1–5 µL) were injected onto a DB5-MS column (30-m length x 250-µm inner diameter x 0.25-µm film thickness) for separation with a gradient temperature program. An Agilent chiral Cyclodex-B column was used to distinguish α-costic acid from its stereoisomer β-costic acid. The standard port temperature was 250 °C, but lower (150 °C) and higher (330 °C) port temperatures were also used. The carrier gas was helium with a constant flow of 1 mL min⁻¹. To facilitate the detection of sesquiterpenoid carboxylic acids in some cases, metabolite samples were derivatized with trimethylsilyldiazomethane (Sigma-Aldrich) in the presence of methanol. Derivatization was quenched by adding acetic acid to ~5% of the total volume, and the samples were subjected to GC analysis.

2.4.3. Metabolite analysis by liquid chromatography–mass spectrometry

The ethyl acetate extracts of yeast metabolites were evaporated completely under a gentle nitrogen flow, and the residue was redissolved in 20% methanol. Five to ten µL of these mixtures were injected into an Agilent 1200 Rapid Resolution LC system coupled with an Agilent 6410 MS. After separation on a reverse-phase Eclipse plus C18 Zorbax column (2.1-mm diameter x 50-mm length, 1.8-µm particle size), the metabolites went through electrospray ionization (ESI) and were subjected to mass spectrometry. For most of the LC–MS analyses in this research, metabolites were separated with a solvent system of water plus 1% acetic acid (A) and acetonitrile (B), running from 80:20 (A:B) to 20:80 (A:B) over 12 min or to 0:100 (A:B) over 14 min. Both negative and positive modes were used for total ion scans as well as selected ion modes.
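The solvent programs above can be expressed as a small function that returns the percentage of solvent B at a given time. This sketch assumes a linear gradient, which the text does not state explicitly, and holds the final composition after the gradient ends. The function name and its parameters are hypothetical.

```python
def percent_b(t_min, start_b=20.0, end_b=80.0, duration_min=12.0):
    """Percent of solvent B at time t_min, assuming a linear gradient.

    Defaults describe the 80:20 to 20:80 (A:B) program over 12 min.
    After duration_min the composition is held at end_b.
    """
    if t_min < 0:
        raise ValueError("time must be non-negative")
    fraction = min(t_min / duration_min, 1.0)
    return start_b + (end_b - start_b) * fraction


if __name__ == "__main__":
    for t in (0, 3, 6, 9, 12):
        b = percent_b(t)
        print(f"{t:>2} min: A = {100 - b:.0f}%, B = {b:.0f}%")
```

The second program (to 0:100 over 14 min) is obtained by calling `percent_b(t, end_b=100.0, duration_min=14.0)`.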


2.4.4. Metabolite purification for structural elucidation

The ethyl acetate extracts were concentrated in a vacuum concentrator. For GAA, the solvent of the concentrated extract was replaced with methanol, and the extract was fractionated on a reverse-phase Sep-Pak Plus C18 column (55–105-µm particles) (Waters) with a step-wise solvent system of water and acetonitrile. For other compounds of interest, the concentrated extracts were separated by thin-layer chromatography (TLC) or liquid chromatography on silica gel 60 F254 (Merck) with a solvent system of hexane:ethyl acetate. In some cases, purification was facilitated by a basic wash (saturated NaHCO3, pH 8.0) followed by acidification with HCl (pH 2.0) and ethyl acetate extraction. These preliminary purifications were followed by further separation by high-performance liquid chromatography (HPLC). For ilicic acid, an isocratic solvent system (40% acetonitrile, adjusted to pH 2.5 with trifluoroacetic acid) was used on a Grom-Sil 120 ODS-5ST column (4.6 mm x 250 mm, 5 µm) in a P80 Dionex Liquid Chromatography system. Optimized gradient solvent systems (water and acetonitrile or methanol, adjusted to pH 4.0 with acetic acid) on a Waters Nova-Pak C18 column (3.9 mm x 150 mm, 4 µm) or Waters SunFire™ C18 column (4.6 mm x 150 mm, 5 µm) in a Waters 279 Separation Module were also used. Fractions containing target sesquiterpenoid carboxylic acids were visualized as blue spots on TLC plates by p-anisaldehyde stain (p-anisaldehyde:H2SO4:ethanol = 5:5:95, heated to 120 °C for 1–2 min) or Hanessian’s stain (2.5 g (NH4)2MoO4 and 1.0 g Ce(NH4)4(SO4)4·2H2O in 10% H2SO4, heated to 180 °C for 1–2 min).

2.4.5. Structural analysis

Structural information of metabolites was obtained by high-resolution mass spectrometry on a Fourier transform ion cyclotron resonance mass spectrometer or a 6500 quadrupole time-of-flight (qTOF) mass spectrometer (Agilent), followed by nuclear magnetic resonance (NMR) spectroscopy. For costic and ilicic acids, NMR spectra were acquired on a Varian Unity Innova 500 MHz spectrometer equipped with a 3-mm indirect detection pulsed field gradient probe. The 1H and 13C NMR chemical shifts were referenced to solvent signals at δH/C 7.14/127.68 (C6D6) relative to tetramethylsilane. Two-dimensional NMR spectra included gCOSY, TOCSY, ROESY, gHSQCAD, and gHMBCAD. Adiabatic broadband and band-selective GHSQCAD and GHMBCAD spectra were recorded using CHEMPACK 4.0 pulse sequences (in Varian Vnmrj 2.1B spectrometer software). For GAA, 1H and 13C NMR spectra were recorded at 400.13 and 100.6 MHz on a Bruker AVANCE 400 spectrometer with a 5-mm inverse probe with triple axis gradients and standard pulse sequences under Xwin-nmr. Chemical shifts were referenced to tetramethylsilane. Experiments included one-dimensional 1H and 13C with proton decoupling, 13C double quantum filter, TOCSY with 60-ms mixing time, heteronuclear multiple quantum coherence, and heteronuclear multiple bond coherence. For other compounds, NMR spectra were recorded on Varian Innova 500 MHz, Varian 500 MHz (equipped with cryo-probe), Varian 600 MHz, and Varian 700 MHz spectrometers. Chemical shifts of 1H and 13C spectra were referenced to solvent signals at δH/C 7.24/77.0 (CDCl3), 3.30/49.0 (CD3OD), or 5.33/54.2 (CD2Cl2). Signal assignments were achieved with COSY, HMQC, HMBC, HSQC, APT, and ROESY experiments.

2.5. Gene expression analysis in lettuce by qRT-PCR

RNA from different tissue types of lettuce, including root, stem, latex, young leaf (top three leaves), old leaf (bottom three leaves), and floral bud, was extracted using the CTAB method, and cDNA was synthesized according to the aforementioned methods. Primers were designed to amplify fragments of 143 and 137 bp from LsGAO1 and LsGAO2, respectively, and to span the second of the two introns recently identified in LsGAO genes to avoid amplifying contaminating genomic DNA (Figure 2.2).

Primer–template specificities were tested before qRT-PCR experiments with Taq polymerase for 25 cycles of the following program: 94 °C for 30 sec (template denaturation), 58 °C for 30 sec (primer–template annealing), and 72 °C for 30 sec (primer elongation). This thermocycling was preceded by 94 °C for 2 min for initial denaturation, and followed by 72 °C for 10 min for final primer elongation.

Power SYBR® Green PCR Master Mix was used in quantitative PCR on a StepOnePlus™ Real-Time PCR system (Applied Biosystems) for 40 cycles of template denaturation, primer–template annealing, and primer elongation. The qRT-PCR melt curves were recorded and qRT-PCR products were run on agarose gels to evaluate the specificity. Lettuce actin was used as reference with the forward primer 5’-GGAGATGAGGCACAATCCAAAAGAGG-3’ and reverse primer 5’-CACGGAGCTCGTTGTAGAAAGTGTGA-3’. Transcripts of LsGAOs and actin were relatively quantified using the method described by Pfaffl (2001).
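The relative quantification model of Pfaffl (2001) expresses the target transcript level in a sample relative to a calibrator, normalized to a reference gene and corrected for the amplification efficiency of each primer pair. A minimal sketch is given below. The function name and the threshold-cycle (Ct) values are hypothetical, and the choice of calibrator tissue is an assumption not stated in the text.

```python
def pfaffl_ratio(e_target, ct_target_calibrator, ct_target_sample,
                 e_ref, ct_ref_calibrator, ct_ref_sample):
    """Relative expression ratio following the Pfaffl (2001) model.

    e_target and e_ref are amplification efficiencies per cycle
    (2.0 corresponds to perfect doubling). Ct values are threshold cycles
    for the calibrator and the sample of interest.
    """
    delta_target = ct_target_calibrator - ct_target_sample
    delta_ref = ct_ref_calibrator - ct_ref_sample
    return (e_target ** delta_target) / (e_ref ** delta_ref)


if __name__ == "__main__":
    # Hypothetical Ct values: the target (e.g., LsGAO1) crosses the threshold
    # three cycles earlier in the sample, while actin is unchanged.
    print(pfaffl_ratio(2.0, 25.0, 22.0, 2.0, 18.0, 18.0))
```

When both efficiencies equal 2, the expression reduces to the familiar 2^(–ΔΔCt) calculation.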

2.6. Structure–function analysis of AMO and GAOs

2.6.1. Homology modelling

Structural models of AMO and GAOs were constructed using the modelling software MODELLER (Eswar et al., 2007) based on crystal structures of CYP2C9 (PDB: 1R9O), CYP19A1 (PDB: 3EQM), and CYP1B1 (PDB: 3PM0). These templates were suggested by the homology detection and structure prediction server HHpred (Hildebrand et al., 2009). The model was energy-minimized using the Crystallography & NMR System (Brunger et al., 1998).


Figure 2.2. Alignment of LsGAO1 and LsGAO2 cDNA sequences by ClustalW algorithm. Positions that share identity are shaded. Red boxes indicate forward and reverse primers for quantitative PCR. Blue triangles indicate intron positions as identified by comparison with genomic DNA sequence (The Compositae Genome Project, personal communication).


Models of substrates (amorphadiene and germacrene A) were generated and energy-minimized using ChemBioDraw Ultra 12.0 (CambridgeSoft). In silico docking of substrates into the enzymes’ models was performed using GOLD Docking (Cambridge Crystallographic Data Centre, UK). Structural data were analyzed using the molecular visualization system PyMOL (Schrödinger LLC). Docking solutions that showed close proximity of the substrate’s C12 and/or C2 (only (+)-valencene) to the protein’s heme and no clash between the substrate and side chains of the protein’s amino acids were selected for structural analysis. All residues within a 10-Å radius of the substrates were selected and compared between the P450s.
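The residue selection step can be illustrated with a plain distance filter. This sketch is not the PyMOL workflow used here. It assumes that a residue counts as selected when any of its atoms lies within 10 Å of any substrate atom, and the coordinates and residue labels are made-up placeholders rather than values from the models.

```python
import math


def residues_within(substrate_atoms, residue_atoms, cutoff=10.0):
    """Return sorted labels of residues with any atom within cutoff of the substrate.

    substrate_atoms: list of (x, y, z) coordinates in angstroms.
    residue_atoms: dict mapping a residue label to a list of (x, y, z) coordinates.
    """
    selected = []
    for label, atoms in residue_atoms.items():
        if any(math.dist(atom, sub) <= cutoff
               for atom in atoms for sub in substrate_atoms):
            selected.append(label)
    return sorted(selected)


if __name__ == "__main__":
    # Placeholder coordinates for illustration only.
    substrate = [(0.0, 0.0, 0.0)]
    residues = {
        "A1": [(3.0, 4.0, 0.0)],                     # 5 angstroms away
        "B2": [(20.0, 0.0, 0.0)],                    # too far
        "C3": [(30.0, 0.0, 0.0), (0.0, 0.0, 9.5)],   # one atom within range
    }
    print(residues_within(substrate, residues))
```

Running the same filter on each P450 model and comparing the resulting residue lists mirrors the comparison described above.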

2.6.2. Site-directed mutagenesis

Amino acids positioned within 10 Å of the docked substrates and differing between AMO and GAOs were selected for mutagenesis. The codons encoding these selected amino acids in LsGAO1 and BsGAO were changed to the corresponding codons in AMO by gene synthesis. The two new mutant genes were synthesized by GenScript, and named LsAMO and BsAMO. LsAMO and BsAMO were cloned into the pESC-Leu2d-based triple expression vector with CPR and amorphadiene synthase (ADS) using the aforementioned method. The vector was transformed into the EPY300 yeast strain, which was grown in SC media. After 48 hr of culture, 10 mL was extracted with 2 x 1.5 mL of ethyl acetate and analyzed by GC–MS and GC–FID for artemisinic acid production. The rest of the culture (20 mL) was used for microsomal preparation.


Chapter 3: BIOCHEMICAL CONSERVATION AND CROSS-REACTIVITY

OF GERMACRENE A OXIDASE IN THE ASTERACEAE

3.1. Introduction

More than 4,000 STLs have been structurally characterized to date, and some of their eco-physiological roles in plants (e.g., anti-herbivorous, anti-microbial, and allelopathic) and pharmaceutical activities in humans (e.g., anti-malarial and sedative) have been demonstrated (Herz, 1977; Furuya et al., 1994; Eckstein-Ludwig et al., 2003; Wesołowska et al., 2006). However, when this project was initiated, the biosynthetic pathway of STLs had yet to be fully understood at the molecular level.

The proposed biosynthetic route of costunolide, the simplest STL, involves the following steps: cyclization of FPP into germacrene A by a germacrene A synthase (GAS); oxidation of germacrene A at the C12 position to yield GAA by a P450; hydroxylation of GAA to form 6α-hydroxy-GAA by another P450; and lactonization between the C6-hydroxyl and C12-carboxylic groups to form costunolide (Figure 1.10, right). Subsequent chemical decorations of the basic STL backbone by other modifying enzymes allow the synthesis of diverse STLs.

GAS from lettuce, chicory, and sunflower have been characterized in the Asteraceae family (Bennett et al., 2002; Bouwmeester et al., 2002; Bertea et al., 2006; Göpfert et al., 2009). The downstream steps of the costunolide pathway were not well understood, although it was proposed, based on chicory cell-free assays, that three distinct enzymes (i.e., P450, alcohol dehydrogenase, and aldehyde dehydrogenase) are involved in the transformation of germacrene A to germacrene A acid (de Kraker et al., 2001a). However, in artemisinin biosynthesis, an enzyme that catalyzes three consecutive oxidations of the sesquiterpene amorphadiene to artemisinic acid has been identified (Ro et al., 2006; Teoh et al., 2006). This discovery strongly suggested that a similar multifunctional P450 could catalyze the conversion of germacrene A to germacrene A acid in lettuce and related Asteraceae species.

This chapter focuses on identifying and characterizing germacrene A oxidase (GAO). This enzyme plays a critical role in the biosynthesis of costunolide, the hypothetical central intermediate of various STLs. Lettuce (Lactuca sativa) is known to accumulate STLs in the laticifers of stems and leaves (Sessa et al., 2000), and hence I used lettuce as the starting plant material to isolate GAO. Since lettuce is an important crop, extensive EST collections have been generated and deposited in the public domain, which expedites STL research. Starting from lettuce GAO, the biochemical conservation of this multifunctional activity was also demonstrated throughout the Asteraceae family.

3.2. Isolation of GAO and pathway reconstitution in engineered yeast

3.2.1. Isolation of GAO from lettuce

As amorphadiene oxidase (AMO) and the putative GAO catalyze very similar reactions (Figure 1.10), it was reasoned that GAO is most likely a P450 homologous to AMO in its primary sequence (Ro et al., 2006; Teoh et al., 2006). To identify GAO from lettuce, the publicly available lettuce EST database was searched using AMO as the query. This BLAST search identified two isoforms of putative GAO (Table 2.1). The first isoform, named LsGAO1, was present as a full-length cDNA and was isolated by PCR amplification using primers designed at the start and stop codons. However, the second isoform (LsGAO2) lacked the 3’-end of the cDNA. LsGAO1 was thus selected for functional characterization first. The isolated LsGAO1 cDNA shared 83% nucleotide identity with AMO and encodes a polypeptide of 488 amino acids with a predicted molecular mass of 54.9 kDa. The deduced amino acid sequence of LsGAO1 showed 86% identity to A. annua AMO. The isolation and characterization of LsGAO2 are presented in Chapter 4.

Germacrene A was required as the substrate to test the activities of LsGAO1. This volatile compound exists at trace levels in plants and is not commercially available. Therefore, the yeast strain EPY300, which was previously engineered to produce an elevated level of FPP (an immediate precursor of germacrene A), was used for this purpose. Over-expression of lettuce GAS in EPY300 demonstrated that a significant amount of germacrene A hydrocarbon could be synthesized in vivo from FPP (Bennett et al., 2002; Göpfert et al., 2009). The sesquiterpene oxidase activities of LsGAO1 could thus be evaluated in EPY300 co-expressing GAS and the P450 redox partner, CPR. The ORFs of GAO and CPR were cloned and fused with a FLAG epitope under the GAL10 promoter and a cMyc epitope under the GAL1 promoter, respectively, in the pESC-Ura vector. The resulting dual expression vector, pESC-Ura::GAO/CPR, was co-transformed into EPY300 yeast with the previously generated pESC-Leu::GAS vector. Upon induction by galactose, these genes were simultaneously expressed, and their expression was confirmed by immunoblot analysis with FLAG and cMyc antibodies (data not shown).

3.2.2. Metabolite profiles of EPY300 expressing GAS, GAO, and CPR

Three EPY300 yeast strains were generated to investigate the function of the identified P450. These were the GAS single expressor, the LsGAO1/CPR co-expressor, and the yeast co-expressing GAS and LsGAO1/CPR. After 48-hr induction by galactose, their metabolites were extracted and subjected to GC–MS analysis. Since the expected product is a sesquiterpene carboxylic acid, the metabolite extracts were derivatized to convert carboxylic acids to methyl esters to enhance detection sensitivity in GC–MS analysis.


Figure 3.1. GC–MS profile of metabolite extracts from transgenic yeast. A: GC–MS chromatograms of metabolite extracts from EPY300 transformed with the indicated genes (lines a and b are negative controls). Inset shows the separation of peaks 2 and 3 on the chiral Cyclodex-B column. B: Electron impact mass fragmentation spectra for the four novel peaks of line c. Adapted from Nguyen et al. (2010).

Results showed that, instead of one novel peak compared to the negative controls, yeast expressing LsGAO1/CPR/GAS produced four new metabolites as shown in the GC–MS chromatograms (Figure 3.1.A). All four compounds showed the same parental mass of m/z 248, corresponding to the methylated form of a sesquiterpene carboxylic acid (C16H24O2), but they have distinct mass fragmentation spectra in electron impact (EI)–MS (Figure 3.1.B). Peaks 2 and 3 were not well separated on the DB-5 MS column, but they could be clearly distinguished on a chiral Cyclodex-B column (Figure 3.1.A).

The non-derivatized metabolite extracts were also analyzed on LC–MS in both positive and negative ionization modes. Results showed that peaks 1, 2, and 3 in the previous GC–MS analysis (Figure 3.1.A) had parental ions of m/z 233 and m/z 235 in negative ([M–H]–) and positive ([M+H]+) modes, respectively. The electrospray ionization (ESI) technique generally produces ions in LC–MS analysis by addition of a proton to the analyte (positive mode) or abstraction of a proton from it (negative mode). The nominal mass of the compounds in peaks 1, 2, and 3 is thus 234 (C15H22O2), supporting the indication above that these were the products of three-step oxidation of germacrene A (C15H24). Interestingly, peak 4 had m/z 251 in (–)-LC–MS and m/z 235 in (+)-LC–MS. In (+)-LC–MS, water loss after protonation is very common, and hence the m/z 235 positive ion is likely to be derived from the m/z 253 ([M+H]+) ion after water loss (–18 mass units). Therefore, the compound in peak 4 had a neutral nominal mass of 252, corresponding to a hydrated sesquiterpene carboxylic acid (C15H24O3).
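The mass bookkeeping in this argument can be checked with a short sketch. It uses nominal (integer) atomic masses only; the helper names `nominal_mass` and `esi_ions` are illustrative and not part of any analysis software used in the study.

```python
# Nominal (integer) atomic masses of the most abundant isotopes.
NOMINAL = {"C": 12, "H": 1, "O": 16}

def nominal_mass(formula):
    """Nominal mass of a composition given as a dict, e.g. {"C": 15, "H": 22, "O": 2}."""
    return sum(NOMINAL[element] * count for element, count in formula.items())

def esi_ions(formula):
    """Nominal m/z of the singly charged [M-H]- and [M+H]+ ions."""
    m = nominal_mass(formula)
    return {"[M-H]-": m - 1, "[M+H]+": m + 1}

germacrene_a = {"C": 15, "H": 24}
acid = {"C": 15, "H": 22, "O": 2}            # peaks 1-3
hydrated_acid = {"C": 15, "H": 24, "O": 3}   # peak 4
methyl_ester = {"C": 16, "H": 24, "O": 2}    # derivatized acid seen in GC-MS
water = {"H": 2, "O": 1}

print(nominal_mass(germacrene_a))                  # 204
print(nominal_mass(acid), esi_ions(acid))          # 234, ions at 233 and 235
print(nominal_mass(methyl_ester))                  # 248
print(nominal_mass(hydrated_acid), esi_ions(hydrated_acid))  # 252, ions at 251 and 253
# Protonated peak 4 after loss of water reproduces the observed positive ion.
print(esi_ions(hydrated_acid)["[M+H]+"] - nominal_mass(water))  # 235
```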

3.2.3. Structural analyses of the products isolated from yeast culture

Chemical purification of the novel metabolites was pursued for structural elucidation by NMR spectroscopy. This analytical technique requires milligram quantities of analytes, and the yeast culture was therefore scaled up to 2.5 L to produce sufficient amounts of the target metabolites. To enhance plasmid stability and copy number in yeast, all three genes (i.e., GAS, GAO, and CPR) were cloned into a single high-copy triple expression construct, pESC-Leu2d::LsGAO1/CPR/GAS (Figure 2.1, vector B).

Through TLC and HPLC, compounds 1, 3, and 4 were successfully isolated and subjected to extensive one- and two-dimensional NMR analyses. Results showed the structures of compounds 1 and 4 to be γ-costic acid and ilicic acid, respectively (Table 3.1; Figure 3.2). Although only a small amount of compound 3 could be purified, partially assigned NMR signals indicated that it was β-costic acid. Purification of compound 2 could not be achieved. However, as γ- and β-costic acids are two of the three cyclization products of GAA, based on previous knowledge of cyclization of germacrene A-type structures (de Kraker et al., 1998), it was assumed that compound 2 is the third cyclization form, α-costic acid, based on the observed m/z 234 in GC–MS analysis. Ilicic acid, in this case, was speculated to be the hydration product of the double bonds in costic acids (Figure 3.2.D).

3.2.4. Altered metabolite profiles of transgenic EPY300 in pH-adjusted condition

Contrary to prediction, no GAA was detected. Instead, three isomers of costic acid and a hydrated form of costic acid, ilicic acid, were identified from the yeast culture. Therefore, it was speculated that GAA synthesized in yeast had been cyclized due to the acidic conditions of yeast cultures (pH dropped from 6 to ~3 after 48 hr). To overcome the natural acidification of yeast cultures, the media were buffered with HEPES/NaOH (pH 7.5) to maintain near-neutral pH during the course of the experiment. After 48 hr of cultivation of EPY300 containing pESC-Leu2d::LsGAO1/CPR/GAS, the pH of the media was ~6. Metabolites were extracted and analyzed on GC–MS without derivatization to avoid any contact with acidic reagents, which might cause structural rearrangement.

GC–MS analysis revealed a very different metabolite profile from that of the acidic culture. The four novel peaks described earlier (Figure 3.1) became very minor. Instead, a broad peak with m/z 234, corresponding to a sesquiterpene carboxylic acid, appeared as the dominant feature of the chromatogram (Figure 3.2.A, top). Previous GC–MS analyses already reported a broad peak corresponding to germacrene A (de Kraker et al., 2001b). The peak broadening is attributed to the on-column heat-induced Cope rearrangement of a germacrene A-type structure, which was speculated to be GAA in this case, into a faster-migrating β-elemene-type structure (Figure 3.2.D).

To evaluate whether the broad peak in GC–MS analysis represented the precursor of the costic acids, metabolite extracts from neutral cultures were acidified with trichloroacetic acid (pH < 2.0). When the acidified sample was re-analyzed on GC–MS, the broad peak no longer existed, but three peaks identified as the three costic acids were clearly detected (Figure 3.2.A, middle). In addition, to test whether the broad peak in the GC–MS metabolite profile of the neutral culture was indeed a germacrene A-type structure whose rearrangement could be induced by heat, the GC inlet temperature was raised from 150 °C to 330 °C and the sample was re-analyzed. As expected, the size of the broad peak decreased significantly, and the minute peak preceding this broad peak increased 19-fold (Figure 3.2.A, bottom). This result strongly suggested that the high inlet temperature facilitated a more complete rearrangement, and that the fast-migrating compound was the heat-induced rearrangement product, β-elemenic acid.


Figure 3.2. GC–MS and LC–MS comparative analyses of sesquiterpenoids from yeast co-expressing LsGAO1, CPR, and GAS in acidic and neutral conditions. A: GC–MS chromatograms of metabolite extracts from neutral culture at different inlet (injection port) temperatures. TCA indicates the treatment of metabolite extracts with trichloroacetic acid (pH < 2.0). B: Extracted ion chromatograms of (–)-LC–MS analyses of metabolite extracts from acidic and neutral cultures. The extracted ions included m/z 171, 233, and 251, corresponding to decanoic acid (internal standard), sesquiterpenoid carboxylic acids (costic acids and GAA), and ilicic acid, respectively. C: Extracted ion chromatograms of (–)-LC–MS analysis of in vitro enzyme assay. Adapted from Nguyen et al. (2010).


LC–MS analyses also showed a clear difference in metabolite profiles between acidic and neutral (buffered) cultures of EPY300 co-expressing LsGAO1/CPR/GAS. The metabolite profile of the acidic culture showed dominant peaks of costic and ilicic acids with a minimal amount of the putative GAA (Figure 3.2.B, top). In contrast, (–)-LC–MS analysis of the neutral culture showed decreases of 15% and 34% in costic and ilicic acids, respectively, compared to their corresponding levels in acidic cultures. The abundance of a new sesquiterpene carboxylic acid peak (m/z 233) increased 44-fold (Figure 3.2.B, bottom). Furthermore, (–)-LC–MS analysis of the in vitro enzyme assay of LsGAO1 under neutral conditions (pH > 6.0) showed only a single dominant product (Figure 3.2.C) with m/z 233, which was distinct from the costic acids seen in the acidic culture of yeast co-expressing LsGAO1/CPR/GAS. These results showed that growing the transgenic yeast in buffered conditions dramatically altered the metabolite profile, reducing the rearranged products and producing a single dominant peak of sesquiterpene carboxylic acid.

3.2.5. Structural elucidation of GAA

To purify the dominant compound synthesized in neutralized yeast culture, the culture of EPY300 transformed with pESC-Leu2d::LsGAO1/CPR/GAS was scaled up to 1 L in HEPES/NaOH-buffered conditions and incubated for 48 hr. About 4 mg of the new sesquiterpene carboxylic acid compound was isolated after solid-phase extraction using a Sep-Pak® C18 cartridge (Waters) and HPLC. High-resolution Fourier transform ion cyclotron resonance mass spectrometry showed that the new compound had an ion at m/z 233.1547 in negative mode, corresponding to the exact mass of the [M–H]– ion of a sesquiterpene carboxylic acid (C15H21O2–).
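The reported exact mass can be reproduced from standard monoisotopic atomic masses. The sketch below is an independent check, not the instrument software's calculation; the function name `ion_mz` is illustrative, and the atomic masses are rounded to eight decimal places.

```python
# Monoisotopic masses (u) of the most abundant isotopes, and the electron mass.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}
ELECTRON = 0.00054858

def ion_mz(formula, charge):
    """m/z of an ion whose elemental composition (dict) already reflects
    the gain or loss of protons; `charge` is a signed integer."""
    neutral = sum(MONOISOTOPIC[el] * n for el, n in formula.items())
    # A negative ion carries extra electrons; a positive ion lacks them.
    return (neutral - charge * ELECTRON) / abs(charge)

deprotonated_acid = {"C": 15, "H": 21, "O": 2}  # [M-H]- of C15H22O2
print(round(ion_mz(deprotonated_acid, charge=-1), 4))  # 233.1547
```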


Table 3.1. 1H and 13C NMR spectral data of costic acids (1, 3), ilicic acid (4), and GAA.

Columns: position, followed by 13C (δ, ppm) and 1H (δ, mult; J [Hz]) for germacrene A acidΔ, γ-costic acid (1)*, β-costic acid (3)* (13C◊), and ilicic acid (4)*.

1.30, bt; J=12.0 1.08,dt;J=4.5,13.0 1.16, brm 1 126.06 4.76, bd, J=7.4 40.20 41.4 40.39 1.46, ov 1.28, ov 1.26, brm 2.04, ov 1.55, ov 1.55, ov 1.42, m 2 26.22 19.13 19.78 2.23, ov 1.55, ov 1.47, ov 1.36, m 1.83, m 1.82, m 1.89, dt,J=5.5,13.3 1.60, bt, J=13.1 3 38.70 33.06 42.66 2.12, ov 1.94, m 2.23, bd, J=13.2 1.86, bd, J=13.0 4 128.38 125.04 73.28 5 131.26 4.51, bd, J=10.1 134.14 49.7 1.70, bd 54.27 1.51, bd, J=4.8 2.08, ov 1.75, bt J=13.3 1.27, ov 0.87, ddd, J=3x12.0 6 34.37 31.60 - 27.44 2.16, ov 2.79, ddd, J=2.6, 3.2, 13.5 1.74, ov● 2.30, bd, J=12.6 7 43.59 2.49, bt, J=10.2 40.30 2.55, tt, J=3.7, 12.6 43.4 2.59, bt, J=11.8 39.20 2.60, bt, J=11.0 1.58, ov 1.49, ov 1.34, ov● 8 33.56 27.86 25.54 1.38, ov 1.61, ov 1.59, ov 1.59, ov● 2.03, ov 1.28, dt, J=4.0 1.14, ov● 9 41.08 42.02 43.91 1.33, ov 2.29, ov 1.45, ov 1.31, ov● 10 137.29 34.33 35.6 34.69 11 149.36 144.99 145.44 12 168.08 172.29 170.47 5.52, bs 5.35, bs 5.32, bs 5.33, bs 13 120.89 124.49 123.4 122.37 5.96, bs 6.30, bs 6.24, bs 6.32, bs 14 15.88 1.35, s 24.38 1.02, s 16.0 0.70, s 18.14 0.70, s 4.48, s 15 16.45 1.49, s 19.09 1.59, bs 105.4 20.95 1.07, s 4.77, s

brm: broad multiplet; bs: broad singlet; bt: broad triplet; d: doublet; dd: doublet of doublets; ddd: doublet of doublets of doublets; m: multiplet; tt: triplet of triplets; ov: (partially) overlapped signals.
Δ Sample measured in DMSO-D6 at 400 MHz with TMS as internal standard; exhibited exchange broadening.
* Samples measured in C6D6 at 500 MHz.
◊ Due to the limited amount of sample, 13C chemical shifts could only be partially detected by gHSQCAD and gHMBCAD.
● Values may be interchanged.
Adapted from Nguyen et al. (2010).


Complete signal assignment of the compound by one- and two-dimensional NMR experiments identified the structure as GAA (germacra-1(10),4,11(13)-trien-12-oic acid) (Table 3.1). The extensive peak broadening recorded in the 1H-NMR spectrum of the compound was attributed to the occurrence of different conformers of germacrene A-type structures, as observed previously in the analysis of NMR signals of germacrene A (Faraldos et al., 2007). In summary, the in vivo, in vitro, and chemical analysis data demonstrated that LsGAO1 encodes an enzyme catalyzing the three-step oxidation of germacrene A to GAA in lettuce.

3.2.6. AMO/GAO homologues in the Asteraceae

The occurrence of STLs is considered a characteristic of the Asteraceae family. It was, therefore, hypothesized that the enzymes responsible for the oxidation of sesquiterpenes in STL biosynthesis in other species are homologous to AMO in A. annua and LsGAO1 in lettuce. A combination of EST mining in the Compositae Genome Project Database (http://compgenomics.ucdavis.edu/), PCR of conserved regions, and rapid amplification of cDNA ends was employed to isolate full-length sequences of AMO/GAO homologues in selected species (Table 2.2).

Sunflower (H. annuus), chicory (C. intybus), and costus (S. lappa) were chosen as representatives of the Asteroideae, the Cichorioideae, and the Carduoideae, respectively. These are the core subfamilies that constitute 95% of all Asteraceae species. In addition, B. spinosa was selected as the representative of the phylogenetically basal subfamily Barnadesioideae (Figure 3.5.A), which split from the rest of the family over 50 million years ago (Panero and Funk, 2008; Barreda et al., 2010). As AMO is unique to A. annua, the homologues from other species were considered putative GAOs and named as such. In each of these species except chicory, a single copy of putative GAO was found. Similar to that of lettuce, the EST database of chicory revealed two versions of GAO, named CiGAO1 and CiGAO2 (Table 2.1). Only CiGAO1 was successfully isolated in this research.

The enzymatic activities of the isolated GAO clones were examined in the EPY300 yeast system by co-expression with CPR and GAS using the triple expression vector pESC-Leu2d::GAO/CPR/GAS (Figure 2.1, plasmid B). To avoid the complication of possible heat-induced Cope rearrangement of germacrene A-type structures on GC–MS (Figure 3.2.A), all the metabolite extracts were analyzed on (–)-LC–MS. Results showed that all GAO homologues, even that from the basal species B. spinosa, produced a single metabolite with the same retention time and m/z 233 as GAA produced by LsGAO1. Quantitative analyses indicated that comparable amounts of GAA were synthesized in all samples, except for the sunflower GAO, whose yield was 10–15-fold lower (Figure 3.3.A). Microsomes of yeast expressing GAOs were isolated for semi-quantitative immunoblot analyses. The data revealed that the heterologous expression level of sunflower GAO in EPY300 yeast was at least 10-fold lower than those of the other GAOs (Figure 3.3.B). Normalization of the (–)-LC–MS signal against the immunoblot data resulted in an activity of sunflower GAO comparable to those of the other homologues. These data indicate a strong conservation of GAO activities throughout the Asteraceae family.
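The normalization step amounts to dividing product yield by relative enzyme abundance. The sketch below illustrates the arithmetic only; the function name `normalized_activity` and all numbers are hypothetical and are not the measured values from Figure 3.3.

```python
def normalized_activity(product_yield, relative_expression):
    """Product yield per unit of enzyme, with expression given relative
    to a reference clone (reference = 1.0)."""
    if relative_expression <= 0:
        raise ValueError("relative_expression must be positive")
    return product_yield / relative_expression

# Hypothetical values: yields in arbitrary units, expression relative to a
# reference GAO. The second clone mimics a ~12-fold lower yield combined
# with a 10-fold lower expression level.
clones = {
    "reference GAO": (100.0, 1.0),
    "low-expression GAO": (8.0, 0.1),
}
for name, (gaa_yield, expression) in clones.items():
    print(name, round(normalized_activity(gaa_yield, expression), 1))
```

With these illustrative inputs, the low-expression clone reaches about 80% of the reference after normalization, which is the sense in which a poorly expressed enzyme can still have comparable per-enzyme activity.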


Figure 3.3. Biochemical analyses of GAOs from various species of the Asteraceae family. A: Extracted ion chromatograms of (–)-LC–MS analyses for m/z 171 (decanoic acid, internal standard) and m/z 233 (GAA). Arrow indicates GAA, and arrow head indicates the internal standard. The yields of GAA are given for each clone (mean ± SD, four independent transformants). B: Immunoblot analysis of recombinant GAOs in EPY300 yeast. Loaded microsome amounts are indicated. Negative control was EPY300 transformed with pESC-Leu2d::GAS. Adapted from Nguyen et al. (2010).


Figure 3.4. (–)-LC–MS analysis in selected ion monitoring mode of metabolite extract from Barnadesia spinosa leaf and flower tissue for m/z 233. Metabolite profile of B. spinosa tissue is represented by the top chromatogram. Standards for GAA and costic acids are shown in the middle and bottom chromatograms, respectively. Adapted from Nguyen et al. (2010).

3.2.7. Occurrence of the STL biosynthetic pathway in B. spinosa

Chemical analyses and in vitro enzyme assays in previous studies have demonstrated the possible involvement of GAA in the biosynthesis of STLs in lettuce, chicory, sunflower, and costus (de Kraker et al., 2001a, 2001b, 2002; Göpfert et al., 2009). However, the biochemical relevance of GAA and the occurrence of an STL biosynthetic pathway have not been addressed in B. spinosa. Therefore, the presence of GAA in B. spinosa was examined by (–)-LC–MS analysis of the metabolite extract from B. spinosa leaf and flower tissues.

Analysis of the metabolite extracts in selected ion monitoring mode for m/z 233 indicated the presence of GAA and costic acids at a concentration of 0.013% (weight per tissue fresh weight) (Figure 3.4). This low abundance suggested that GAA was most likely an intermediate metabolite in B. spinosa. In summary, the yeast in vivo biochemical data for BsGAO and the metabolite analysis of B. spinosa leaf and flower tissues indicated that the three-step oxidation of germacrene A to GAA at the C12 position is conserved in B. spinosa, a member of the basal subfamily Barnadesioideae of the Asteraceae.

3.2.8. Phylogenetic analysis of GAOs and AMO

The amino acid sequences of GAOs share significant sequence identities with each other (79–97%) (Table 3.2). Interestingly, AMO shares a higher degree of identity with the GAOs from lettuce, chicory, sunflower, and costus (84–86%), while BsGAO shares only 79–84% identity with the other GAOs. To obtain better insight into the evolution of AMO and GAOs, a phylogenetic tree was reconstructed for AMO and five GAOs. Two additional P450s, tobacco 5-epi-aristolochene dihydroxylase (EAH) and Hyoscyamus muticus premnaspirodiene oxygenase (HPO) (Ralston et al., 2001; Takahashi et al., 2007), were chosen as out-groups as they are non-Asteraceae sesquiterpene oxidases.
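Percent identity values such as those above are computed from pairwise alignments. The sketch below shows one common convention (identical residues divided by ungapped aligned columns); the convention used in the study is not stated, so this choice is an assumption, and the sequences are toy fragments rather than real GAO or AMO sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        matches += a == b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared

# Toy aligned fragments (hypothetical): 8 ungapped columns, 6 identical.
print(percent_identity("MKLT-AVSP", "MKITQAVTP"))  # 75.0
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, which gives slightly different percentages for gapped alignments.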

Table 3.2. AMO/GAO homologues in the Asteraceae

Clone    GenBank ID   ORF size   Predicted polypeptide mass   Amino acid identity to AMO   Amino acid identity to LsGAO1
CiGAO1   GU256644     1,467 bp   54.9 kDa                     86%                          97%
SlGAO    GU256645     1,467 bp   55.0 kDa                     86%                          91%
HaGAO    GU256646     1,467 bp   55.0 kDa                     84%                          89%
BsGAO    GU256647     1,491 bp   55.6 kDa                     79%                          84%
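The ORF sizes in Table 3.2 can be related to polypeptide length with simple arithmetic. The sketch below assumes that the listed ORF sizes include the stop codon, and it uses a rule-of-thumb average residue mass of 110 Da, so the mass estimates are rough and are not expected to match the sequence-based predictions in the table exactly.

```python
def orf_to_residues(orf_bp):
    """Number of amino acids encoded by an ORF that includes the stop codon."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_bp // 3 - 1  # the stop codon encodes no residue

AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb value, an assumption

for bp in (1467, 1491):
    residues = orf_to_residues(bp)
    approx_kda = residues * AVERAGE_RESIDUE_MASS_DA / 1000
    print(bp, residues, round(approx_kda, 1))
```

Under this assumption a 1,467-bp ORF encodes 488 residues, the same length reported for LsGAO1, and the 1,491-bp BsGAO ORF encodes 496 residues.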


Figure 3.5. Phylogenetic analysis of AMO and GAOs from various plants in the Asteraceae. A: Simplified phylogeny of the Asteraceae family based on the phylogenetic tree constructed by Panero and Funk (2008). Asterisks and parentheses indicate the subfamilies and the species, respectively, from which GAOs were isolated. B: Phylogenetic tree of AMO/GAOs in the Asteraceae family. HPO (Hyoscyamus muticus premnaspirodiene oxygenase) and EAH (tobacco 5-epi-aristolochene-1,3-dihydroxylase) were used as out-groups. Bootstrap values are shown as percentages from 1,000 replicates. Scale bar indicates number of changes.


The reconstructed phylogenetic tree showed that GAOs from sunflower, lettuce, chicory, and costus, together with AMO, constitute a different lineage from BsGAO. Within this lineage, AMO forms a node distinct from the major GAO clade with strong statistical support (Figure 3.5.B). When the phylogeny of the Asteraceae family is considered (Figure 3.5.A), it is observed that GAO from sunflower does not form a clade with AMO from A. annua, although sunflower and A. annua are grouped in the same subfamily, Asteroideae. These data indicated that AMO recently underwent a specific biochemical microevolution that was not reflected by the overall speciation pattern in this subfamily.

3.2.9. Cross-reactivities of AMO and GAOs

The astonishingly high identities between AMO and GAOs are best illustrated by their amino acid sequence alignment (Figure 3.6). After excluding the 21-amino acid membrane-bound domains from the P450s, there are only 21 residues in AMO that differ from the corresponding conserved residues in GAOs, and only seven residues unique to AMO that are not present among the variable corresponding amino acids in GAOs. Such high sequence identities raised the question of substrate specificities between the two types of enzyme. To address this question, the cross-reactivities of GAOs towards amorphadiene and of AMO towards germacrene A were tested using the in vivo EPY300 yeast system. The P450s in the yeast triple expression constructs described earlier were swapped, resulting in the new pESC-Leu2d::AMO/CPR/GAS and pESC-Leu2d::GAO/CPR/ADS plasmids. EPY300 yeast transformed with these new constructs was cultured and its metabolites were profiled.
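The two classes of AMO-specific residues described above can be identified from a multiple alignment column by column. The following is a minimal sketch under stated assumptions: the function name `classify_amo_differences` is illustrative, the sequences are toy fragments rather than the real alignment, and gap characters are treated like ordinary residues.

```python
def classify_amo_differences(amo, gaos):
    """Classify alignment columns where AMO differs from every GAO.

    Returns two lists of 0-based column indices:
    - conserved_diff: all GAOs share one residue and AMO has another;
    - unique_in_amo: GAOs vary at the column and AMO matches none of them.
    """
    if any(len(g) != len(amo) for g in gaos):
        raise ValueError("all aligned sequences must have equal length")
    conserved_diff, unique_in_amo = [], []
    for i, residue in enumerate(amo):
        column = {g[i] for g in gaos}  # residues seen among GAOs at this column
        if residue in column:
            continue  # AMO agrees with at least one GAO
        if len(column) == 1:
            conserved_diff.append(i)
        else:
            unique_in_amo.append(i)
    return conserved_diff, unique_in_amo

# Toy aligned fragments (hypothetical, not the real sequences).
amo = "MKSLTAVW"
gaos = ["MKALTGVF", "MKALSGVY", "MKALTGVF"]
print(classify_amo_differences(amo, gaos))  # ([2, 5], [7])
```

Positions in the first list correspond to the residues marked as conserved in GAOs but different in AMO, and positions in the second list to residues unique to AMO at otherwise variable columns.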

Metabolite analysis revealed that AMO had negligible activity for GAA synthesis. However, all GAOs displayed significant activities towards amorphadiene oxidation to produce artemisinic acid. These relative activities were 5–40% of that of the native enzyme pair of AMO and ADS, and were 100–1,000-fold higher than the relative activity of AMO paired with GAS (Figure 3.7). These data suggest that during the evolution from GAO to AMO, the capacity to oxidize germacrene A has been lost. In contrast, GAOs from various subfamilies, including the phylogenetically basal Barnadesioideae, have the catalytic potential to oxidize amorphadiene.

Figure 3.6. Amino acid sequence alignment of AMO and GAOs. Stars indicate the residues conserved in GAOs but different in AMO. Circled marks indicate residues unique in AMO but not conserved in GAOs. The alignment is shaded to a 50% consensus. Dark and light shadings represent identical and similar residues, respectively. Adapted from Nguyen et al. (2010).

Figure 3.7. Relative cross-reactivities of amorphadiene oxidase and germacrene A oxidases in transgenic yeast. Activities of the natural pairs of sesquiterpene synthase and P450 (left: LsGAO1 and germacrene A synthase; right: AMO and amorphadiene synthase) were set as 100%. Relative activities are the results of the production levels of GAA (left) or artemisinic acid (right) normalized against relative abundance of P450s estimated by immunoblot analysis. Values are mean ± SD (n = 3).

3.3. Discussion

3.3.1. GAO activity in the Asteraceae family

The functional characterization of AMO homologues from lettuce, chicory, sunflower, costus, and B. spinosa revealed that they are germacrene A oxidases, catalyzing a critical step in the STL biosynthetic routes of the Asteraceae (Figure 1.10). As a cosmopolitan plant family in which many species are of high medicinal, agricultural, and industrial value, the Asteraceae has been studied intensively in terms of its biology and chemistry. A fossil record and molecular data reported in the last 30 years have shown that the family originated in South America and started its diversification more than 50 million years ago to become a dominant plant family all over the planet, except Antarctica (Cronquist, 1977; Panero and Funk, 2008; Barreda et al., 2010) (Figure 3.5). In particular, the discovery that a 22-kb DNA inversion in the chloroplast genome, shared by all other Asteraceae species, is lacking in the subfamily Barnadesioideae indicated its ancient split from the rest of the family (Jansen and Palmer, 1987). This chapter demonstrated that, despite this enormous diversification, the germacrene A oxidase activity is highly conserved in the five plant species representing the major subfamilies and the phylogenetically basal subfamily Barnadesioideae.

The fact that the sequences and activities of all GAOs are conserved throughout the Asteraceae over the course of more than 50 million years of evolution is intriguing. Genes involved in plant specialized metabolism, such as P450s, have often gone through rapid divergence and neofunctionalization. In particular, the CYP71 clan of cytochrome P450s, to which all of the GAOs and AMO described here belong, has undergone a burst of diversification, accounting for more than 50% of all higher plants’ P450s (Nelson and Werck-Reichhart, 2011). The retention of GAO activity throughout the Asteraceae’s evolution strongly indicates that it offers enhanced fitness, and that the STLs synthesized via GAO activities are a selective advantage for many members of the Asteraceae.

The detection of GAA and costic acids in B. spinosa tissues indicated the biochemical relevance of GAO and suggested the occurrence of an STL biosynthetic pathway in the species. This research, however, was restricted by the limited amount of B. spinosa tissue. Studies on the metabolites of B. spinosa to date have focused only on flavonoids, and so far no evidence of STLs has been reported (Bohm and Stuessy, 1995; Mendiondo et al., 1997). Further investigation is thus required to better understand the correlation of GAO and STLs in the Barnadesioideae.


In addition to GAA, other compounds identified in this study, including costic acids, are known plant natural products (de Kraker et al., 2001b). The natural occurrence of ilicic acid and its derivatives was also reported in the Asteraceae (Hernández et al., 2001; Zheng et al., 2003; Abou-Douh, 2008), and their anti-tumour and anti-inflammatory activities were even suggested (Léon et al., 2009). However, while it was not totally unexpected to detect costic acids, because their formation by rearrangement of GAA is easily induced (de Kraker et al., 1998; 2001b), it is more difficult to explain the presence of ilicic acid in the experimental products. Given the stereo-specificity of the hydroxyl group at the C4 position, ilicic acid might be produced from costic acids by a non-specific yeast enzyme. The nature of this hydration, however, was not further pursued.

3.3.2. Cross-reactivities of AMO and GAOs

A. annua is the only plant species known to produce artemisinic acid and artemisinin. The occurrence of the sesquiterpene amorphadiene in this species appears to be a specific evolutionary event that might have been followed by the microevolution of GAO to AMO to accommodate the new substrate. The cross-reactivity study in this research indicated that AMO had compromised its activity for germacrene A oxidation to acquire a new substrate specificity towards amorphadiene. In contrast, all GAOs displayed noticeable activities in oxidizing amorphadiene despite the fact that they have not been exposed to this sesquiterpene in nature.

The observed activity of GAOs for amorphadiene supports the general hypothesis that progenitor enzymes have broader catalytic ambiguity, thereby providing necessary plasticity for specific activities to be selected and optimized (Jensen, 1976; Khersonsky et al., 2006).

In directed evolution studies, enzyme evolvability towards non-natural substrates with decreased activity for the native substrate has been shown a number of times (Khersonsky et al., 2006). A similar example can also be found in the case of deoxyhypusine synthase and homospermidine synthase (Ober and Hartmann, 1999). The primordial deoxyhypusine synthase, conserved in all eukaryotes, can utilize the substrate of homospermidine synthase. However, homospermidine synthase, which evolved from deoxyhypusine synthase and is specific for pyrrolizidine alkaloid biosynthesis in certain plants, cannot utilize the substrate of deoxyhypusine synthase.

In most plants of the Asteraceae family, diversification and speciation of primitive GAO may not be favoured due to the importance of germacrene A acid in the STL pathway. In A. annua, GAO (a progenitor of AMO) might be under a unique selection pressure to evolve to efficiently accommodate a different substrate, amorphadiene. This study suggested that the naturally occurring STS/P450 gene pairs of GAS/GAO and ADS/AMO can be considered as an excellent model system to infer the co-evolutionary process in a linear metabolic pathway.


Chapter 4: PROBING THE EVOLUTION OF

AMORPHADIENE OXIDASE AND GERMACRENE A OXIDASES

4.1. Introduction

In the previous study, AMO and GAOs were isolated from five Asteraceae plant species, and the cross-reactivity of the recombinant AMO and GAO enzymes was evaluated by swapping their native STSs (i.e., ADS and GAS) in yeast expression systems. The results clearly demonstrated that GAOs can utilize amorphadiene as a substrate, whereas AMO poorly accepts germacrene A. This chapter asks whether the substrate promiscuity of GAOs extends to other structurally diverse sesquiterpenes. I reasoned that experimental data addressing this question would be valuable in understanding enzyme evolution and chemical diversity.

To explain the mechanism of enzyme evolution, Jensen (1976) hypothesized that a progenitor enzyme possesses catalytic ambiguity, providing starting points for the enzyme to be selected and optimized for a specialized function. This hypothesis is well rationalized but lacks convincing evidence from nature (Khersonsky and Tawfik, 2010). This chapter describes the use of the AMO/GAO enzyme set to further test Jensen’s hypothesis. The substrate specificity and promiscuity of AMO and GAOs were investigated with several sesquiterpene substrates, providing more convincing and systematic evidence for the hypothesis of enzyme evolution from “generalist” to “specialist”, and shedding some light on the adaptive evolution of STL biosynthesis. Structure–function analysis of GAO and AMO was also attempted to understand the molecular basis underlying the differential substrate specificity of AMO and GAOs.

4.2. Functional characterization of the second GAO isoform in lettuce

4.2.1. Biochemical activities of LsGAO2

Of the two AMO homologues identified from the lettuce EST database, only one, named LsGAO1, was found with both start and stop codons. This gene was isolated and characterized in Chapter 3. For the second isoform, named LsGAO2, only the 5’-end sequence with the start codon could be identified from the lettuce ESTs (Table 2.1). The 3’-end sequence was thus determined from a fragment isolated by 3’-RACE. Using primers encompassing the start and stop codons, a full-length cDNA for LsGAO2 was isolated, encoding a polypeptide of 497 amino acids with a predicted molecular mass of 56.2 kDa. Its deduced amino acid sequence shares 76% and 75% identity with LsGAO1 and AMO, respectively. To assess the catalytic activity of LsGAO2, the gene was co-expressed with GAS and CPR, as in the characterization of LsGAO1. The triple expression vector pESC-Leu2d::LsGAO2/CPR/GAS was constructed and transformed into the EPY300 yeast strain. The metabolite profile of this transgenic yeast was compared to that of EPY300 co-expressing LsGAO1/CPR/GAS.

GC–MS analysis of the transgenic yeast cultured in the unbuffered (acidic) condition showed similar chromatograms for the two yeast strains (Figure 4.1). The four observed products have retention times and m/z values identical to those of the costic acids and ilicic acid. When the transgenic yeast was cultured in the buffered (neutral) condition, its metabolite profile was altered and resembled that of yeast expressing LsGAO1/CPR/GAS in the same condition. Only a single dominant peak was found in (–)-LC–MS analysis, and it was identified as GAA by comparison with the retention time and m/z value of the authentic standard (Figure 4.2, left). These data indicate that lettuce possesses a second isoform of GAO that also catalyzes the three-step oxidation of germacrene A to yield GAA.

Figure 4.1. GC–MS profiles of metabolite extracts from yeast expressing LsGAO1/CPR/GAS (top) and LsGAO2/CPR/GAS (bottom) in native (acidic) yeast culture conditions. Peaks 1, 2, 3, and 4 represent γ-, β-, and α-costic acids, and ilicic acid, respectively.

As LsGAO2 also has a high degree of homology to AMO, the cross-reactivity of LsGAO2 towards oxidizing amorphadiene was investigated. EPY300 yeast transformed with the triple expression vector pESC-Leu2d::LsGAO2/CPR/ADS was employed for this purpose. (–)-LC–MS analysis of the metabolite extract showed that the transgenic yeast produced a new product, identified as artemisinic acid by the authentic standard (Figure 4.2, right). These data indicated a broad substrate promiscuity of LsGAO2, similar to that of the other GAOs (see Chapter 3). In summary, two copies of GAO are encoded in lettuce and they showed the same catalytic activities.

Figure 4.2. (–)-LC–MS analyses of LsGAO2 activities in transgenic yeasts. Left: Metabolite profiles from EPY300 co-expressing LsGAOs/CPR/GAS in neutral (buffered) conditions. Right: Metabolite profiles from EPY300 co-expressing LsGAOs/CPR/ADS. “Standard” indicates the isolated GAA or artemisinic acid, and “IS” is internal standard (decanoic acid).

4.2.2. Differential expressions of LsGAO1 and LsGAO2 in lettuce

To investigate potential differences in the physiological roles of the two GAO isoforms in lettuce, their expression patterns were analyzed by qRT-PCR in various tissues including root, stem, old leaf, young leaf, and stem laticifer (via latex). The expression levels of LsGAO1 and LsGAO2 relative to the old leaf are shown in Figure 4.3.

Figure 4.3. Relative transcript levels of LsGAO1 and LsGAO2 in different tissues of three-month-old lettuce. Transcript level in old leaf was used as the control with actin as reference gene.

LsGAO1 showed a high transcript level in root and stem, with reduced levels of expression in young and old leaves. Interestingly, almost no expression was detected in latex, where STLs accumulate. The relative expression of LsGAO2 was equally high in root, old leaf, and young leaf; however, a significantly higher level of LsGAO2 transcript was detected in stem latex than in stem (Figure 4.3). When the relative expressions of LsGAO1 and LsGAO2 were compared in stem and latex, their differential expression was obvious. LsGAO1 showed more than 105-fold higher expression in stem than in latex, whereas LsGAO2 showed more than five-fold higher expression in latex than in stem. As lettuce is known to accumulate STLs in latex (Sessa et al., 2000), these data suggest differential roles of LsGAO1 and LsGAO2 in STL biosynthesis of this species.
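As a rough illustration of how fold differences of this kind are commonly derived from qRT-PCR data, the sketch below applies the comparative Ct calculation (2^-ΔΔCt), with old leaf as the control tissue and actin as the reference gene, as in Figure 4.3. The choice of this particular calculation, the assumption of 100% amplification efficiency, the function name, and every Ct value are illustrative assumptions rather than procedures or data taken from this thesis.

```python
def relative_expression(ct_target, ct_reference,
                        ct_target_control, ct_reference_control):
    """Fold change of a target gene versus a control tissue (2^-ddCt).

    Assumes 100% amplification efficiency for both target and reference.
    """
    # Normalize the target gene to the reference gene (e.g. actin) in each tissue.
    delta_ct_sample = ct_target - ct_reference
    delta_ct_control = ct_target_control - ct_reference_control
    # Compare the tissue of interest with the control tissue (e.g. old leaf).
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: target Ct 22 in latex vs 25 in old leaf, actin Ct 18 in both.
fold_in_latex = relative_expression(22.0, 18.0, 25.0, 18.0)
print(f"fold change relative to old leaf: {fold_in_latex:.1f}")
```

A target that crosses the threshold three cycles earlier than in the control tissue, at equal reference Ct, corresponds to an eight-fold higher relative transcript level under these assumptions.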

4.3. Substrate specificity of AMO and GAOs

4.3.1. De novo synthesis of sesquiterpene substrates in yeast

Figure 4.4. Structures of the sesquiterpenes to be tested for the substrate promiscuity/specificity of AMO/GAOs. The expected oxidation position is highlighted.

The previous data (Chapter 3) validated the cross-reactivities of GAO/AMO against germacrene A and amorphadiene but did not address their possible promiscuous activities towards other substrates. Therefore, seven more sesquiterpenes, bearing different sesquiterpene backbones, were tested using the yeast system. These new sesquiterpene substrates were one monocyclic sesquiterpene (germacrene D), three bicyclic sesquiterpenes with two six-carbon rings (5-epi-aristolochene, valencene, and δ-cadinene), two bicyclic sesquiterpenes with five- and seven-carbon rings (α- and δ-guaienes), and one bicyclic sesquiterpene with five- and six-carbon rings (valerenadiene) (Figure 4.4).

The cDNA clones for the synthesis of the seven sesquiterpenes are publicly available, and thus these cDNAs were expressed in yeast to provide substrates in vivo. First, to examine whether yeast could synthesize the seven substrates endogenously, the cDNA of each STS was cloned into the pESC-Leu2d vector and expressed in EPY300 (Figure 2.1.A). After 48 hr of culture, GC–MS analysis of the dodecane overlay of the culture, which captures the volatile sesquiterpenes, showed that all of the STS clones produced the desired sesquiterpenes in yeast, with an estimated yield of 10–400 µg mL⁻¹. In addition, trace levels of minor products were also observed (Figure 4.5), as reported previously (Prosser et al., 2004; Kumeta and Ito, 2010; Pyle et al., 2012). These data indicated that the STS clones were all expressed in yeast, and thus sesquiterpene oxidation activities could be investigated in these yeast backgrounds by co-expressing the P450s with the STSs.

Figure 4.5. De novo synthesis of plant sesquiterpenes in the engineered yeast strain EPY300. Yeast transformed with the empty vector was the negative control. GaCDS: cotton δ-cadinene synthase. ScGDS: goldenrod germacrene D synthase. VoVDS: valerian valerenadiene synthase. SynGUS: synthetic guaiene synthase. NtEAS: tobacco 5-epi-aristolochene synthase. SynVS: synthetic valencene synthase.

4.3.2. Metabolite profiles of EPY300 co-expressing various combinations of AMO/GAOs and STSs

Among the six functionally characterized GAOs, three clones were selected for further substrate specificity studies in comparison to AMO. The first selected GAO was BsGAO, representing a sesquiterpene oxidase from the phylogenetic base, the Barnadesioideae. The other two were LsGAO1 and LsGAO2 in lettuce, representing two isoforms of GAOs in the subfamily Cichorioideae (Figure 3.7) (Panero and Funk, 2008).

GAS and ADS in the triple expression vectors with the four P450s described earlier were swapped with the six STS clones, generating 24 triple expression constructs. They were transformed into the yeast strain EPY300, and their metabolite extracts were analyzed by LC–MS for oxidative products. As AMO and GAOs are known to catalyze the three-step oxidations of amorphadiene and germacrene A to form sesquiterpene carboxylic acids, I predicted that if AMO and GAOs had any activity on the seven non-native substrates, the products would be sesquiterpene carboxylic acids and/or their derivatives. As carboxylic acids are easily ionized by deprotonation, (–)-LC–MS was used to analyze the metabolites synthesized by the transgenic yeasts. In the yeast expressing the three GAOs, new products with m/z 233 (nominal MW = 234, corresponding to sesquiterpene carboxylic acid) and m/z 251 (nominal MW = 252, corresponding to hydrated carboxylic acid) were detected at significantly higher abundance than in the yeast expressing AMO (Figure 4.6). An m/z 233 and/or an m/z 251 compound was detected in all of the transgenic yeasts, except for the one expressing SynGUS, which had two closely migrating m/z 233 compounds at 8.7 and 8.8 min (12a and 12b, respectively) (Figure 4.6). These two m/z 233 compounds were likely to be the carboxylic acid forms of the two substrates α- and δ-guaienes. Despite the high yields of germacrene D and δ-cadinene in EPY300 yeast (Figure 4.5), the abundance of the oxidative sesquiterpene products from the yeast expressing GAOs together with ScGDS or GaCDS was lower than that of other combinations (Figure 4.6). These results indicate that GAOs oxidize germacrene D and δ-cadinene less efficiently than the other sesquiterpenes.

In addition, the possible one- or two-step oxidation of the sesquiterpenes by AMO and GAOs was also investigated. (+)-LC–MS analysis was utilized to detect compounds with m/z 219 (nominal MW = 218, corresponding to sesquiterpene aldehyde or sesquiterpene ketone) and m/z 221 (nominal MW = 220, corresponding to sesquiterpene alcohol) because they are more readily ionized by protonation than by deprotonation. Data showed that while peaks with m/z 221 were not clearly detected, there was a distinct peak with m/z 219 when SynVS was co-expressed with the GAOs (Figure 4.8). Similar to the aforementioned data from (–)-LC–MS, this peak was negligible from yeast expressing AMO compared to yeasts expressing GAOs.
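The m/z values monitored in the two ionization modes follow directly from the nominal masses of the expected oxidation products of a C15H24 sesquiterpene. The sketch below recomputes them; the molecular formulae are the standard ones for these compound classes, while the dictionary and function names are illustrative choices and not part of the thesis workflow.

```python
# Nominal (integer) atomic masses are enough to reproduce the monitored ions.
NOMINAL_MASS = {"C": 12, "H": 1, "O": 16}

# Expected products of stepwise oxidation of a C15H24 sesquiterpene.
SPECIES = {
    "sesquiterpene": {"C": 15, "H": 24},
    "alcohol": {"C": 15, "H": 24, "O": 1},
    "aldehyde_or_ketone": {"C": 15, "H": 22, "O": 1},
    "carboxylic_acid": {"C": 15, "H": 22, "O": 2},
    "hydrated_carboxylic_acid": {"C": 15, "H": 24, "O": 3},
}


def nominal_mass(formula):
    """Sum nominal atomic masses over an element-count dictionary."""
    return sum(NOMINAL_MASS[element] * count for element, count in formula.items())


def expected_mz(species, mode):
    """Nominal m/z of [M-H]- (mode 'negative') or [M+H]+ (mode 'positive')."""
    if mode not in ("negative", "positive"):
        raise ValueError("mode must be 'negative' or 'positive'")
    shift = -1 if mode == "negative" else 1
    return nominal_mass(SPECIES[species]) + shift


for name in ("carboxylic_acid", "hydrated_carboxylic_acid"):
    print(name, expected_mz(name, "negative"))
for name in ("aldehyde_or_ketone", "alcohol"):
    print(name, expected_mz(name, "positive"))
```

Under these formulae the acids appear at m/z 233 and 251 in negative mode, and the aldehyde or ketone and the alcohol at m/z 219 and 221 in positive mode, matching the ions described in the text.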

To confirm the expression of the P450s in these combinatorial tests, immunoblot analysis was performed for all combinations. Results showed that all P450s were expressed, indicating that the lack of novel products was not due to a failure to express the P450s (Figure 4.6, top). In summary, these data showed that GAOs have a much broader substrate-utilization spectrum than AMO has.

Figure 4.6. Analysis of combinatorial biochemistry of P450s and STSs. The six top panels show immunoblot analysis for the expression of AMO (Aa), BsGAO (Bs), LsGAO1 (Ls1), and LsGAO2 (Ls2) in yeast expressing the six STSs. The bottom chromatograms are from (–)-LC–MS metabolite profiling of transgenic yeast. The STSs and the P450s are indicated on top and in the far right, respectively. The ions extracted were m/z 171 for decanoic acid (IS: internal standard spiked in the extraction solvent), m/z 233 (diamond) for sesquiterpene carboxylic acid, and m/z 251 (triangle) for hydrated sesquiterpene carboxylic acid. GaCDS: cotton δ-cadinene synthase. ScGDS: goldenrod germacrene D synthase. VoVDS: valerian valerenadiene synthase. SynGUS: synthetic guaiene synthase. NtEAS: tobacco 5-epi-aristolochene synthase. SynVS: synthetic valencene synthase.

4.3.3. Structural analyses of the novel products

The chemical identities of the newly synthesized compounds were investigated using commercial standards when available and by purification and subsequent NMR spectroscopy when standards could not be obtained. For example, valerenic acid and nootkatone are commercially available and, therefore, could be used as standards.

Valerenadiene is naturally oxidized to valerenic acid in the valerian plant (Valeriana officinalis) (Bent et al., 2006; Benke et al., 2009). The oxidative product (MW = 234) isolated from the yeast expressing VoVDS and GAO was identified as valerenic acid by LC–MS and GC–MS analyses using the authentic standard (Figure 4.7). This indicates that GAOs catalyzed the three consecutive oxidations at the C12 position of the non-native substrate valerenadiene to form valerenic acid. Quantification based on the authentic standard showed that the production yield was 3.6 ± 1.6 µg mL⁻¹ (by LsGAO2).

Another known oxidized sesquiterpene is nootkatone. This is a sesquiterpene ketone biosynthesized from valencene and has a unique grapefruit flavour (Figure 4.8.A). (+)-LC–MS results showed that a clear peak of m/z 219 was observed when SynVS was co-expressed with GAOs. Using the authentic standard, this m/z 219 peak was identified as nootkatone (Figure 4.8.B and C). The yield of de novo nootkatone synthesis reached 3.6 ± 1.1 µg mL⁻¹ in EPY300 yeast co-expressing LsGAO2/CPR/SynVS.

Figure 4.7. Oxidation of valerenadiene to valerenic acid by GAOs. A: Schematic presentation of the three-step oxidation at C12 of valerenadiene to valerenic acid by GAOs. B: Extracted ion chromatograms for m/z 171 for decanoic acid (IS: internal standard) and m/z 233 (sesquiterpene carboxylic acid) from (–)-LC-MS analysis of different combinations of AMO/GAOs with VoVDS. Diamond indicates m/z 233 peak. C: Comparison of the electron impact fragmentation patterns of the de novo synthesized compound from valerenadiene by GAOs and authentic valerenic acid standard.

Figure 4.8. Oxidation of valencene to nootkatone by GAOs. A: Schematic presentation of the two-step oxidation at C2 of valencene to nootkatone by GAOs. B: Extracted ion chromatograms for m/z 219 (sesquiterpene ketone) from (+)-LC-MS analysis of different combinations of AMO/GAOs with SynVS. Star indicates m/z 219 peak. C: Comparison of the electron impact fragmentation patterns of the de novo synthesized compound from valencene by GAOs and authentic nootkatone standard.

The yeast expressing SynVS, NtEAS, or SynGUS, together with GAO, produced a high abundance of oxidized sesquiterpene products, which could be further purified for structural studies. As shown in Figure 4.6, three-step oxidation products from these compounds included sesquiterpene carboxylic acids (m/z 233 in (–)-LC–MS) and their hydrated forms (m/z 251). One of the putative sesquiterpene carboxylic acids from guaienes (12a) was of reasonable abundance and was thus further purified for structural confirmation. However, the yields of the putative sesquiterpene carboxylic acids from valencene and 5-epi-aristolochene were negligible compared to those of their putative hydrated forms, rendering them difficult to purify. Therefore, only the hydrated forms (13 and 14) were purified for these two cases.

Cultures of transgenic yeasts co-expressing LsGAO2/CPR with SynGUS, SynVS, and NtEAS were scaled up to 5 L. After 48 hr, the cultures were extracted, and the target compounds were purified by a combination of alkaline wash, silica column chromatography, and HPLC. (–)-ESI high-resolution time-of-flight MS analyses of the isolated compounds confirmed the molecular formulae of 12a as C15H22O2 (exact mass 233.1547, 0.7 ppm from the calculated mass of a deprotonated sesquiterpene carboxylic acid), and 13 and 14 as C15H24O3 (exact mass 251.1651, 0.8 ppm from the calculated mass of a deprotonated, hydrated sesquiterpene carboxylic acid). The three compounds were subsequently subjected to one- and two-dimensional NMR analyses. Data showed that the structures of 13 and 14 were iso-ilicic acid (Figure 4.9.B) and iso-epi-ilicic acid (Figure 4.9.C), respectively (Table 4.1). These were speculated to be hydration products of 5-epi-aristolochenoic and valencenoic acids. The abundance of compound 12a isolated for structural analysis was too low to obtain full NMR signals. However, partial assignments indicated that 12a was δ-guaienoic acid (Figure 4.9.A; Table 4.1). The isolation of products from germacrene D and δ-cadinene was not attempted due to their low abundance.
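The mass accuracy quoted above compares a measured m/z with the monoisotopic mass calculated for the deprotonated ion. A minimal sketch of that comparison follows; the atomic masses are standard monoisotopic values, the function names are illustrative, and because the measured masses are reported here to only four decimal places, the recomputed errors need not reproduce the quoted ppm values exactly.

```python
# Standard monoisotopic atomic masses (u) and the electron mass (u).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}
ELECTRON_MASS = 0.00054858


def deprotonated_mz(formula):
    """Calculated m/z of the singly charged [M-H]- ion of a neutral formula."""
    neutral = sum(MONOISOTOPIC[element] * count for element, count in formula.items())
    # Removing a proton equals removing a hydrogen atom but keeping its electron.
    return neutral - MONOISOTOPIC["H"] + ELECTRON_MASS


def ppm_error(measured, calculated):
    """Signed mass error in parts per million."""
    return (measured - calculated) / calculated * 1e6


acid = {"C": 15, "H": 22, "O": 2}           # sesquiterpene carboxylic acid
hydrated_acid = {"C": 15, "H": 24, "O": 3}  # hydrated sesquiterpene carboxylic acid

for label, formula, measured in (("12a", acid, 233.1547),
                                 ("13/14", hydrated_acid, 251.1651)):
    calculated = deprotonated_mz(formula)
    print(f"{label}: calculated {calculated:.4f}, "
          f"error {ppm_error(measured, calculated):+.2f} ppm")
```

Both recomputed errors stay below 1 ppm in magnitude, which is consistent with the sub-ppm agreement described in the text.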

Figure 4.9. Oxidations and rearrangements of δ-guaiene, 5-epi-aristolochene, and valencene in yeasts expressing GAOs. A: Oxidation (red) by GAO at C12 of δ-guaiene to δ-guaienoic acid (12a). B: Oxidation (red) by GAO at C12 of 5-epi-aristolochene to 5-epi-aristolochenoic acid, which undergoes rearrangements and hydration (blue) under acidic conditions to iso-ilicic acid (13). C: Oxidation (red) by GAO at C12 of valencene to valencenoic acid, which undergoes rearrangements and hydration (blue) under acidic conditions to iso-epi-ilicic acid (14).

Figure 4.10. (–)-LC–MS analysis of metabolite profiles of EPY300 co-expressing LsGAO2/CPR/STSs in acidic and neutral conditions. The STSs and P450s are indicated on top and in the far right, respectively. The ions extracted were m/z 171 for decanoic acid (IS: internal standard spiked in the extraction solvent), m/z 233 (diamond) for sesquiterpene carboxylic acid, and m/z 251 (triangle) for hydrated sesquiterpene carboxylic acid. GaCDS: cotton δ-cadinene synthase. ScGDS: goldenrod germacrene D synthase. VoVDS: valerian valerenadiene synthase. SynGUS: synthetic guaiene synthase. NtEAS: tobacco 5-epi-aristolochene synthase. SynVS: synthetic valencene synthase.

Table 4.1. 1H and 13C NMR spectral data of δ-guaienoic acid (12a), iso-ilicic acid (13), and iso-epi-ilicic acid (14). 13C values are δ in ppm; 1H values are δ, multiplicity, J [Hz]. Where two 1H entries are given for one position, they are separated by a semicolon.

Position | iso-epi-ilicic acid1 (14) 13C | iso-epi-ilicic acid1 (14) 1H | iso-ilicic acid2 (13) 13C | iso-ilicic acid2 (13) 1H | δ-guaienoic acid3 (12a) 13C | δ-guaienoic acid3 (12a) 1H
1 | 40.9 | 1.14, m; 1.38, m | 41.1 | 1.15, m; 1.40, m | 142.1 |
2 | 20.1 | 1.56, m | 20.3 | 1.56, m | 25.5 | 2.06, ov
3 | 43.5 | 1.35, bd, J = 7.1; 1.80, dd, J = 12.6, 1.4 | 43.6 | 1.38, bt, J = 6.1; 1.83, m | 34.5 | 2.00, ov; 2.25, bt, J = 12.5
4 | 72.0 | | 72.4 | | 38.8 | 2.11, bt, J = 6.9
5 | 54.9 | 1.3, bd, J = 2.3 | 55.2 | 1.32, bd, J = 2.3 | 46.1 | 2.51, bt, J = 8.9
6 | 27.2 | 1.48, bd, J = 3.4; 1.63, m | 27.5 | 1.40, m; 1.65, ov | 33.1 | 0.97, d, J = 6.5; 1.32, ov
7 | 40.2 | 2.50, tt, J = 11.6, 12.1 | 40.4 | 2.52, tt, J = 2x12.0 | 43.2 | 2.7, bt, J = 11.9
8 | 26.7 | 1.20, d, J = 12.3; 1.94, m | 26.6 | 1.22, d, J = 12.1; 1.96, m | 30.5 | 2.17, m; 2.32, m
9 | 44.5 | 1.28, ov; 1.44, bt, J = 3.0 | 44.7 | 1.28, bd, J = 2.8; 1.45, ov | 34.5 | 2.00, ov; 2.25, bt, J = 12.5
10 | 34.5 | | 34.8 | | 128.9 |
11 | 145.1 | | 144.9 | | |
12 | 169.9 | | 169.7 | | |
13 | 124.0 | 5.70, s; 6.30, s | 124.7 | 5.67, s; 6.27, s | 124.7 | 5.56, s; 6.24, s
14 | 18.4 | 0.95, s | 18.9 | 0.92, s | 16.1 | 1.60, s
15 | 22.3 | 1.10, s | 22.7 | 1.12, s | 15.5 | 0.89, d, J = 7.1

1 from valencene; signals were measured in CD2Cl2 at 500 MHz
2 from 5-epi-aristolochene; signals were measured in CDCl3 at 500 MHz
3 from δ-guaiene; signals were measured in CDCl3 at 600 MHz
m: multiplet; bd: broad doublet; d: doublet; dd: doublet of doublet; tt: triplet of triplet; bt: broad triplet; ov: overlapped; s: singlet

4.3.4. Metabolite profiles of transgenic EPY300 expressing various combinations of AMO/GAOs and STSs in buffered (neutral) culture condition

Since the dominant novel products of yeast co-expressing P450s and STSs appeared to be hydrated sesquiterpene carboxylic acids (Figure 4.6), I reasoned that the sesquiterpene carboxylic acids formed by the P450s’ activities were hydrated and rearranged, at least in the cases of 5-epi-aristolochene and valencene, due to the acidic conditions of yeast cultures (Figure 4.9). To evaluate this possibility, yeast cultures were buffered with HEPES/NaOH (pH 7.5) and the pH was maintained at 7.0 until the cultures were extracted for metabolite profiling. As LsGAO2 yielded the oxidative products at the highest abundance compared to the other P450s, transgenic yeasts co-expressing LsGAO2 and all STSs were chosen for this purpose.

(–)-LC–MS analyses of transgenic yeasts in buffered conditions showed that the metabolite profiles of yeasts expressing LsGAO2 with GaCDS, ScGDS, and VoVDS were largely unaltered. Two novel peaks of m/z 233 were present in the products of δ-cadinene and germacrene D, but their abundance was only 3% and 10% of the corresponding m/z 251 compounds, respectively (Figure 4.10). In contrast, there was a significant change in the metabolite profiles between acidic and neutral culture conditions of transgenic yeasts expressing LsGAO2 with SynGUS, NtEAS, and SynVS. In each of these three cases, the abundance of the m/z 251 peak decreased, and a new m/z 233 peak appeared. The new m/z 233 peak was 33% of the m/z 251 peak in yeast expressing SynGUS, while this ratio was as high as 120% and 60% in yeast expressing NtEAS and SynVS, respectively (Figure 4.10). These data suggest that the investigated sesquiterpenes were oxidized to sesquiterpene carboxylic acids and subsequently hydrated in yeast cultures. This hydration is likely to be facilitated by the acidic nature of yeast cultures; therefore, the hydration was reduced when the pH was maintained at a neutral level.

Figure 4.11. Phylogenetic tree of AMO/GAOs, including LsGAO2 and CiGAO2 in the Asteraceae family. HPO (Hyoscyamus muticus premnaspirodiene oxygenase) and EAH (tobacco 5-epi-aristolochene-1,3-dihydroxylase) were used as out-groups. Bootstrap values are shown as percentages from 1,000 replicates. Scale bar indicates number of changes.

4.4. Evolutionary analysis of AMO and GAOs

4.4.1. Phylogenetic analysis of GAO isoforms

Pair-wise sequence alignment data showed that the newly identified LsGAO2 shares a lower degree of homology with LsGAO1 than with BsGAO (83% identity with LsGAO1 and 86% identity with BsGAO). A new isoform of GAO has recently been reported from chicory (Cankar et al., 2011). Sequence comparison showed that this isoform was identical to the CiGAO2 described in Chapter 2, and it shared 96% sequence identity with LsGAO2. To better understand the evolutionary relations of these new isoforms with the previously characterized GAOs and AMO, a phylogenetic tree was re-constructed (Figure 4.11).

The phylogenetic tree showed that LsGAO2 and CiGAO2 clustered in the same clade as BsGAO, supported by a significant bootstrap value (88%). On the other hand, LsGAO1 and CiGAO1 form a distinct clade with the other GAOs (bootstrap value: 89%). I refer to the clade for BsGAO/LsGAO2/CiGAO2 as clade I and the clade for the other GAOs as clade II. Interestingly, the genome-wide analysis of GAO clade I sequences in various Asteraceae plants such as sunflower, costus, and B. spinosa, could only identify the GAOs from the GAO clade II. These results implied that GAO clade I from B. spinosa, which served as an evolutionary template for GAO clade II, disappeared in many Asteraceae plants but was retained in species in the subfamily Cichorioideae (Figure 3.5).

4.4.2. Selection analysis of AMO and GAOs in the STL metabolic network

In order to detect the differential selection pressures on AMO and GAOs, pairwise ratios of non-synonymous (Ka) to synonymous (Ks) substitutions between BsGAO (i.e. the phylogenetic base) and AMO or other GAOs were measured. The highest Ka/Ks ratio was observed between BsGAO and AMO (0.14), while those between BsGAO and the other GAOs were 0.09 (CiGAO2), 0.10 (LsGAO2, LsGAO1, HaGAO), 0.11 (CiGAO1), and 0.13 (SlGAO). The average of the pairwise Ka/Ks ratios between AMO and all GAOs, and that of all pairwise Ka/Ks ratios between these eight P450s, are 0.12 ± 0.01 (n = 7) and 0.10 ± 0.02 (n = 28), respectively. These values are significantly lower than 1, indicating that these P450s are under negative (purifying) selection (Hurst, 2002).

Ka/Ks values of 21 FPSs and eight GASs from various Asteraceae species (Table 2.3) were also measured to infer the differential selection constraints between different enzymatic steps of STL biosynthesis, in comparison with the universal chalcone biosynthesis in the Asteraceae. The lowest Ka/Ks ratio, 0.09 ± 0.05 (n = 105), was found for the universal CHS genes. In the STL biosynthetic route, the average Ka/Ks ratio of FPSs was calculated to be 0.14 ± 0.04 (n = 210), lower than the 0.22 ± 0.09 (n = 28) of the subsequent GASs. However, this trend is not followed by GAOs. Although GAOs are positioned downstream of GASs in the pathway, their Ka/Ks ratio is only 0.10 ± 0.02 (n = 21), lower than that of GASs. All of these ratios are statistically significantly different (P < 0.0001, two-sample t test) except between GAOs and CHSs.
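The sample sizes quoted for the pairwise Ka/Ks averages correspond to the number of unordered pairs among the sequences compared, and the reported means and standard deviations are enough to recompute a test statistic. The sketch below checks the pair counts and computes a Welch-type t statistic from the summary values for FPSs and GASs. The text states only that a two-sample t test was used, so the Welch variant is an assumption, as are the function names; the figure of 15 CHS sequences is inferred from n = 105 rather than stated in this section.

```python
from math import comb, sqrt


def n_pairwise(n_sequences):
    """Number of unordered sequence pairs available for pairwise Ka/Ks."""
    return comb(n_sequences, 2)


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch two-sample t statistic computed from summary statistics."""
    standard_error = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


# Pair counts implied by the numbers of sequences in each gene family.
print("P450s (8 sequences):", n_pairwise(8))
print("FPSs (21 sequences):", n_pairwise(21))
print("GAOs (7 sequences):", n_pairwise(7))
print("CHSs (15 sequences, inferred):", n_pairwise(15))

# FPSs (0.14 +/- 0.04, n = 210) versus GASs (0.22 +/- 0.09, n = 28).
print("t (FPS vs GAS):", round(welch_t(0.14, 0.04, 210, 0.22, 0.09, 28), 2))
```

Pairwise ratios that share sequences are not independent observations, so a statistic computed this way should be read as a rough check rather than a reproduction of the reported P values.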

4.5. Structure–function analysis of AMO and GAOs

4.5.1. Homology modelling of AMO and GAOs

The intriguing difference in substrate specificity and promiscuity between AMO and GAOs raised the question of the underlying structural difference. The high sequence homology shared between AMO and GAOs (Figure 3.6) suggests that there may be only a few plasticity residues that play critical roles in the substrate specificity of these enzymes. To obtain an insight into these structural elements, homology modelling and in silico substrate docking were performed. Three P450s were selected for this analysis: the highly substrate-specific enzyme AMO, and LsGAO1 and BsGAO, which represent the two GAOs that share the highest and lowest sequence identities with AMO.

As AMO has evolved to specifically and efficiently accept and transform amorphadiene to artemisinic acid, substrate docking experiments of amorphadiene in AMO as well as in LsGAO1 and BsGAO were expected to offer insights into the structural elements underlying the substrate specificity and promiscuity of these enzymes. Docking solutions of amorphadiene positioned with C12 in close proximity to the heme were selected for analysis (Figure 4.12). Among the amino acids within 10 Å of the substrate, 12 and 17 residues differentiate AMO from LsGAO1 and BsGAO, respectively (Figure 4.13). Of these residue variations between AMO and the two GAOs, 11 residues were conserved in LsGAO1 and BsGAO, which may specify GAO activity (in AMO: Leu109, Val115, Leu197, Leu236, Arg237, Ile240, Ile283, Ser300, Leu362, Ser472, and Met476). Intriguingly, all of these residues are positioned on or adjacent to the six substrate-recognition sites (SRS) of P450 enzymes (Figure 4.13; Gotoh, 1992; Denisov et al., 2005). Therefore, the residues identified from the modelling are ideal targets for mutagenesis.
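The sequence side of this comparison, finding positions where the two GAOs agree with each other but differ from AMO, can be sketched on residues 61–120 of the alignment in Figure 4.13. The sketch is illustrative only: the function name is hypothetical, and it does not apply the 10 Å distance filter from the docking models, so it may report positions that are not among the 11 residues listed above.

```python
# Residues 61-120 of the three sequences, copied from the alignment in Figure 4.13.
AMO = "RKYGSLMHLQLGEVPTIVVSSPKWAKEILTTYDITFANRPETLTGEIVLYHNTDVVLAPY"
LSGAO1 = "RKHGSLMHLQLGEVSTIVVSSPKWAKEILTTYDITFANRPETLTGEIIAYHNTDIVLAPY"
BSGAO = "RKYGSLMHLQLGEVSTIVVSSPRWAKEVLTTYDITFANRPETLTGEIVAYHNTDIVLSPY"
FIRST_POSITION = 61  # residue number of the first character in each string


def gao_conserved_differences(amo, gao_a, gao_b, first_position=1):
    """List (position, AMO residue, GAO residue) where both GAOs agree
    with each other but differ from AMO."""
    hits = []
    for offset, (a, x, y) in enumerate(zip(amo, gao_a, gao_b)):
        if x == y and a != x:
            hits.append((first_position + offset, a, x))
    return hits


for position, amo_residue, gao_residue in gao_conserved_differences(
        AMO, LSGAO1, BSGAO, FIRST_POSITION):
    print(f"{amo_residue}{position} in AMO -> {gao_residue} in both GAOs")
```

In this segment the comparison returns positions 75, 109, and 115. Positions 109 and 115 correspond to Leu109 and Val115 in the list above, while position 75 illustrates a difference that a sequence-only comparison finds without the structural filter.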

4.5.2. Site-directed mutagenesis of GAOs

To evaluate whether the variable amino acids are AMO/GAO plasticity residues, site-directed mutagenesis of LsGAO1 and BsGAO at multiple residues was performed by chemical gene synthesis. The synthetic LsGAO1 and BsGAO genes include mutations of 12 and 17 residues, respectively, to the residues found at the corresponding positions in AMO (Figure 4.13). In anticipation of altering the substrate specificity of the mutant enzymes, the synthetic LsGAO1 and BsGAO genes including multiple mutations were named LsAMO (mutated LsGAO1) and BsAMO (mutated BsGAO). In order to assess their catalytic activities, these synthetic genes were co-expressed with CPR and ADS in EPY300 yeast.

Figure 4.12. Models of amorphadiene docked in AMO, LsGAO1, and BsGAO structures. The enzymes’ heme and amorphadiene are presented in stick form. Residues within 10 Å of the substrate are highlighted with yellow colour.

AMO     MALSLTTSIALATILLFVYKFATRSKSTKKSLPEPWRLPIIGHMHHLIGTTPHRGVRDLA  60
LsGAO1  MELSITTSIALATIVFFLYKLATRPKSTKKQLPEASRLPIIGHMHHLIGTMPHRGVMDLA  60
BsGAO   MELTLTTSLGLAVFVFILFKLLTGSKSTKNSLPEAWRLPIIGHMHHLVGTLPHRGVTDMA  60

AMO     RKYGSLMHLQLGEVPTIVVSSPKWAKEILTTYDITFANRPETLTGEIVLYHNTDVVLAPY  120
LsGAO1  RKHGSLMHLQLGEVSTIVVSSPKWAKEILTTYDITFANRPETLTGEIIAYHNTDIVLAPY  120
BsGAO   RKYGSLMHLQLGEVSTIVVSSPRWAKEVLTTYDITFANRPETLTGEIVAYHNTDIVLSPY  120

AMO     GEYWRQLRKICTLELLSVKKVKSFQSLREEECWNLVQEIKASGSGRPVNLSENVFKLIAT  180
LsGAO1  GEYWRQLRKLCTLELLSVKKVKSFQSIREEECWNLVKEVKESGSGKPINLSESIFTMIAT  180
BsGAO   GEYWRQLRKLCTLELLSAKKVKSFQSLREEECWNLVKEVRSSGSGSPVDLSESIFKLIAT  180

AMO     ILSRAAFGKGIKDQKELTEIVKEILRQTGGFDVADIFPSKKFLHHLSGKRARLTSLRKKI  240
LsGAO1  ILSRAAFGKGIKDQREFTEIVKEILRQTGGFDVADIFPSKKFLHHLSGKRARLTSIHKKL  240
BsGAO   ILSRAAFGKGIKDQREFTEIVKEILRLTGGFDVADIFPSKKILHHLSGKRAKLTNIHNKL  240

AMO     DNLIDNLVAEHTVNTSSKTNETLLDVLLRLKDSAEFPLTSDNIKAIILDMFGAGTDTSSS  300
LsGAO1  DNLINNIVAEHHVSTSSKANETLLDVLLRLKDSAEFPLTADNVKAIILDMFGAGTDTSSA  300
BsGAO   DSLINNIVSEHPGSRTSSSQESLLDVLLRLKDSAELPLTSDNVKAVILDMFGAGTDTSSA  300

AMO     TIEWAISELIKCPKAMEKVQAELRKALNGKEKIHEEDIQELSYLNMVIKETLRLHPPLPL  360
LsGAO1  TVEWAISELIRCPRAMEKVQAELRQALNGKEKIQEEDIQDLAYLNLVIRETLRLHPPLPL  360
BsGAO   TIEWAISELIRCPRAMEKVQTELRQALNGKERIQEEDIQELSYLKLVIKETLRLHPPLPL  360

AMO     VLPRECRQPVNLAGYNIPNKTKLIVNVFAINRDPEYWKDAEAFIPERFENSSATVMGAEY  420
LsGAO1  VMPRECREPVNLAGYEIANKTKLIVNVFAINRDPEYWKDAEAFIPERFENNPNNIMGADY  420
BsGAO   VMPRECREPCVLAGYEIPTKTKLIVNVFAINRDPEYWKDAETFMPERFENSPINIMGSEY  420

AMO     EYLPFGAGRRMCPGAALGLANVQLPLANILYHFNWKLPNGVSYDQIDMTESSGATMQRKT  480
LsGAO1  EYLPFGAGRRMCPGAALGLANVQLPLANILYHFNWKLPNGASHDQLDMTESFGATVQRKT  480
BsGAO   EYLPFGAGRRMCPGAALGLANVELPLAHILYYFNWKLPNGARLDELDMSECFGATVQRKS  480

AMO     ELLLVPSF--------  488
LsGAO1  ELLLVPSF--------  488
BsGAO   ELLLVPTAYKTANNSA  496

Figure 4.13. Sequence alignment of AMO, LsGAO1, and BsGAO. The six substrate recognition sites are highlighted in yellow. Different residues within 10 Å of amorphadiene according to the enzyme–substrate models are indicated in red. These residues were mutated to the respective amino acids in AMO to generate the mutant proteins LsAMO and BsAMO.

Figure 4.14. Production of artemisinic acid by mutant and native GAOs. Relative activities were calculated after normalization against quantitative immunoblot analysis data. Data presented are average ± SD of at least five replicates.

(–)-LC–MS analysis of the metabolite extracts from transgenic yeast showed that yeast expressing LsAMO and BsAMO produced a significantly lower yield of artemisinic acid compared to native LsGAO1 and BsGAO, respectively (Figure 4.14). Immunoblot analysis of the P450s also indicated lower expression levels of the mutants compared to those of the native enzymes. When the abundance of artemisinic acid was normalized against the expression levels of the enzymes, the relative activities of LsAMO and BsAMO were 24% and 33% of those of the respective native P450s. These results suggested that changes in the selected residues do not enhance substrate specificity toward amorphadiene as expected, and that the promiscuous activities of the mutant enzymes were lower than those of the native GAO enzymes.
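The normalization described above divides the amount of product by the amount of enzyme detected before comparing mutant with native. A minimal sketch of that arithmetic follows; the function name and all input numbers are hypothetical placeholders chosen for illustration, not measurements from this work.

```python
def normalized_relative_activity(product_mutant, expression_mutant,
                                 product_native, expression_native):
    """Mutant activity as a fraction of native activity, after dividing each
    product abundance by the corresponding immunoblot expression signal."""
    activity_mutant = product_mutant / expression_mutant
    activity_native = product_native / expression_native
    return activity_mutant / activity_native


# Hypothetical inputs: the mutant makes 12% of the native product amount
# while being expressed at half the native level.
fraction = normalized_relative_activity(12.0, 0.5, 100.0, 1.0)
print(f"relative activity: {fraction:.0%}")
```

With these placeholder values, a mutant that yields 12% of the product at half the expression level retains 24% of the native activity per unit of enzyme, which shows why raw product yield alone would understate the mutant's per-enzyme activity.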

4.6. Discussion

4.6.1. The two GAO isoforms in lettuce

A combination of molecular biology, biochemical, and phylogenetic approaches was used to characterize the second isoform of GAO in lettuce (LsGAO2), and the evolutionary lineage of AMO and GAOs in the Asteraceae family was studied in this work. This study demonstrated that two GAO isoforms are present in lettuce, LsGAO1 and LsGAO2, which encode two enzymes with an identical catalytic activity (Figures 1.10, 4.1). Despite the shared biochemical activity, LsGAO1 and LsGAO2 have distinct expression patterns in different tissues of lettuce, particularly in stem and stem latex (Figure 4.3).

Similar to lettuce, two isoforms of GAO were also found in the chicory EST database (Table 2.1). Although only CiGAO1 was biochemically characterized in this research, Cankar et al. (2011) recently reported another germacrene A oxidase homologue, CYP71AV8, in chicory that has the same activity on the substrate germacrene A and shares 82% amino acid identity with both LsGAO1 and CiGAO1. When compared to the GAO-like ESTs from chicory that did not derive from CiGAO1, CYP71AV8 showed 98% nucleotide sequence identity to contig d (Table 2.1), demonstrating that it is the second isoform of GAO in chicory, CiGAO2. The presence of this second copy of chicory GAO had been speculated earlier in this study.
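As an illustration of how a percent-identity figure of this kind is obtained from an alignment, the following minimal sketch counts identical residues over aligned columns. It assumes the sequences are already aligned and that columns containing a gap are excluded from the denominator (other conventions exist and give slightly different values). The two strings are the 60-residue block of AMO and LsGAO1 shown in Figure 4.13, so the result describes that block only, not the full-length proteins.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identical positions over aligned columns that have no gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # skip gap columns (one possible convention)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Alignment block taken from Figure 4.13 (residues up to position 480).
AMO_BLOCK = "EYLPFGAGRRMCPGAALGLANVQLPLANILYHFNWKLPNGVSYDQIDMTESSGATMQRKT"
LSGAO1_BLOCK = "EYLPFGAGRRMCPGAALGLANVQLPLANILYHFNWKLPNGASHDQLDMTESFGATVQRKT"

print(f"{percent_identity(AMO_BLOCK, LSGAO1_BLOCK):.1f}% identity in this block")
```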

Although two GAO clades (I and II) can be identified, I could not find evidence that GAO clade II is present in the B. spinosa EST database. Intuitively, it can be reasoned that gene duplication and neo-functionalization of the GAO clade I have occurred in various Asteraceae plant species including lettuce and chicory. However, the GAO clade I could not be identified in many Asteraceae plants outside of the Cichorioideae subfamily. Therefore, although GAO clade I may have served as a gene source for GAO clade II in the Asteraceae genome evolution, GAO clade I may not have been selected evolutionarily in many Asteraceae plants. It is interesting to note that this ancient gene signature (GAO clade I) is present in lettuce and chicory in the subfamily Cichorioideae. These two species have a specialized secretory cell type called the laticifer, where various specialized metabolites accumulate. In the qRT-PCR analysis, LsGAO1 and LsGAO2 showed a marked expression difference in stem latex and stem. There was almost no expression of LsGAO1 in latex, while LsGAO2 showed stronger expression in latex than in stem. Since stem latex is part of the stem, it is possible that LsGAO2 is exclusively expressed in latex. I speculate that the retention of two copies of GAO in lettuce and chicory is implicated in specialized cellular differentiation, such as laticifer differentiation, in the Cichorioideae. However, further studies are necessary to address the physiological role of each GAO isoform and its implication in laticifer evolution and differentiation.

4.6.2. Substrate promiscuity of GAOs

LsGAO2 and LsGAO1 share not only their activity towards the native substrate germacrene A but also their substrate promiscuity towards other sesquiterpene substrates. The co-expression strategy used here clearly showed the substrate promiscuity of the GAOs, which accept and oxidize seven non-native sesquiterpenes of various structural types (Figure 4.6). In addition to this three-step oxidation activity on the C12 position of the substrates, GAOs also catalyze the two-step oxidation on the C2 position of valencene to produce nootkatone (Figure 4.8). The occurrence of more than one product and the diversity of the substrates’ structural types convincingly demonstrate that GAOs readily accept a wide range of sesquiterpene substrates, including those that GAOs do not meet in nature.

The question of how new activities evolved is of fundamental scientific importance, but it remains poorly understood. This is due to the inherent lack of knowledge of historical events including evolutionary “accidents” and selection pressures (Arnold et al., 2001). Jensen (1976) proposed the general hypothesis of enzyme evolution from ancestral promiscuity. Although this hypothesis is well rationalized, it is supported by few examples in nature (Khersonsky et al., 2006; Ober, 2010). The differential cross-reactivities between AMO and GAOs presented here provide an excellent piece of evidence supporting this hypothesis. The environmental conditions have constantly changed since the Asteraceae started its diversification more than 50 million years ago, and so have the metabolic pathways of Asteraceae plants. Therefore, the broad substrate ambiguity of GAOs likely conferred selective advantages on the plants, as it readily provides room for selection and optimization of specific catalytic functions. This process resulted in the subfunctionalization of GAO to become the more specialized, modern AMO and might have resulted in other specific sesquiterpene oxidases that are yet to be elucidated. This means that the myriad of strikingly diverse STLs in the Asteraceae may not be biosynthesized via one or a few common parental STLs (Figure 1.8) but from specific sesquiterpene backbones by specialized sesquiterpene oxidases.

Novel oxidation products obtained when GAOs were co-expressed with VoVDS, SynGUS, NtEAS, and SynVS included sesquiterpene carboxylic acids, hydrated sesquiterpene carboxylic acids, and a sesquiterpene ketone. A combination of GC–MS, LC–MS, and NMR spectroscopy analyses showed that recombinant GAOs in yeast yielded valerenic acid from valerenadiene, δ-guaienoic acid from δ-guaiene, iso-ilicic acid from 5-epi-aristolochene, and nootkatone and iso-epi-ilicic acid from valencene (Figures 4.7, 4.8, 4.9; Table 2.1). Altered metabolite profiles from different yeast culture conditions suggested that the presence of hydrated sesquiterpene carboxylic acids, including iso-ilicic acid and iso-epi-ilicic acid, was due to culture artifacts (i.e., the acidic environment of yeast cultures) (Figure 4.10). I speculated that 5-epi-aristolochene and valencene were oxidized at their C12 by GAOs to form 5-epi-aristolochenoic acid and valencenoic acid. Under the acidic conditions (pH ≈ 3 after 48 hr of culture), the C9=C10 and C1=C10 double bonds of 5-epi-aristolochenoic acid and valencenoic acid, respectively, were protonated. The resulting bridgehead carbocations were most likely chemically unstable and thus underwent a series of 1,2-methyl and 1,2-hydride migrations, and final hydrations to form iso-ilicic and iso-epi-ilicic acids (Figure 4.9). Among the oxidation products, valerenic acid and nootkatone are well-known natural products with high medicinal and industrial values (Bent et al., 2006; Benke et al., 2009; Furusawa et al., 2005). However, to the best of my knowledge, δ-guaienoic acid, iso-ilicic acid, iso-epi-ilicic acid, and the hypothetical valencenoic and 5-epi-aristolochenoic acids are not found in the literature. Therefore, artificial co-expression of STSs and GAO in yeast in the present work yielded new laboratory compounds, increasing the repertory of artificial natural products.

No novel oxidative products from yeast co-expressing GAOs with ScGDS and GaCDS were structurally elucidated due to their low abundance (Figure 4.6) despite the high levels of the substrates (Figure 4.5). The low activities of GAO on germacrene D and δ-cadinene are likely due to the lack of a double bond adjacent to the oxidation position (Figure 4.4), which makes hydrogen abstraction by P450 more difficult. It is also interesting to note that the metabolite profiles were largely unchanged between acidic and buffered cultures of these transgenic yeasts (Figure 4.10). As germacrene D is highly prone to acid-induced rearrangements (Yoshihara et al., 1969), even mild and/or local acidic conditions were sufficient to completely induce the hydration and rearrangement of the hypothetical germacrene D carboxylic acid to a hydrated sesquiterpene carboxylic acid. Similarly, δ-cadinene carboxylic acid might be highly prone to acid-facilitated hydration due to the presence of a bridgehead double bond in its structure. The nature of these hydrations and/or rearrangements, however, was not further pursued.

4.6.3. Natural selection of AMO and GAOs

Pairwise measurements of Ka/Ks ratios between BsGAO and AMO and other GAOs showed that BsGAO–AMO has the highest Ka/Ks ratio (0.14) among the ratios between BsGAO and other P450s (0.09–0.13). These data indicate that the divergence of AMO from GAOs involved relatively more positive selection, and that the increased specificity towards amorphadiene conferred a selective advantage in the adaptation of A. annua. Nevertheless, the low Ka/Ks ratios (0.12 ± 0.01 < 1) of AMO imply that this gene is under a general negative (purifying) selection. In addition to the advantage of improved specificity towards amorphadiene, the possible disadvantage of substrate promiscuity in the adaptive evolution of the enzyme might have driven the ancestral enzyme towards trading off its multi-specificity for a single dominant activity.
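For reference, the ratio used in this analysis and its conventional interpretation can be written as follows, where K_a is the number of nonsynonymous substitutions per nonsynonymous site and K_s is the number of synonymous substitutions per synonymous site. The symbol ω is introduced here only as shorthand, and the block assumes the amsmath package. Note that a gene-wide pairwise ratio below 1 does not exclude positive selection acting on a small number of individual sites.

```latex
\omega = \frac{K_a}{K_s}, \qquad
\begin{cases}
\omega < 1 & \text{purifying (negative) selection} \\
\omega \approx 1 & \text{neutral evolution} \\
\omega > 1 & \text{positive (diversifying) selection}
\end{cases}
```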

The average Ka/Ks ratios of GAOs (0.10 ± 0.02) are also significantly smaller than 1, indicating that they are under very strong purifying selection. Previous studies in anthocyanin and terpenoid metabolism in higher plants have shown that the selection constraint on genes encoding enzymes in later steps of the pathways is more relaxed compared to the constraint on those involved in earlier steps (Rausher et al., 2008; Ramsay et al., 2009). However, data from this research showed that the Ka/Ks ratios of GAOs are even lower than those of FPSs (0.14 ± 0.04) and GASs (0.22 ± 0.09), and comparable to that of CHSs (0.09 ± 0.05) in the Asteraceae. These data imply that the selection on these P450s is stronger than that on FPSs and GASs, and is at the same level as that on CHSs, a ubiquitous gene in higher plants. These results again reinforce that GAOs play a critical role in the adaptation of the Asteraceae. Therefore, any deleterious mutations in these genes would most likely be eliminated, and the sequences are thus highly conserved throughout the family.
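The comparison above can be laid out explicitly with a small sketch that ranks the gene families by their mean Ka/Ks and labels the selection regime. The mean ± SD values are those quoted in the text; the class name, the function names, and the tolerance used to call a ratio "neutral" are assumptions made for illustration. Ranking by the mean alone is not a statistical test, and the standard deviations of some families overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KaKs:
    family: str
    mean: float
    sd: float


# Mean ± SD Ka/Ks values as quoted in the text.
FAMILIES = [
    KaKs("GAO", 0.10, 0.02),
    KaKs("FPS", 0.14, 0.04),
    KaKs("GAS", 0.22, 0.09),
    KaKs("CHS", 0.09, 0.05),
]


def selection_regime(omega: float, tolerance: float = 0.05) -> str:
    """Label a Ka/Ks ratio; `tolerance` is an arbitrary illustrative band around 1."""
    if omega < 1.0 - tolerance:
        return "purifying"
    if omega > 1.0 + tolerance:
        return "positive"
    return "neutral"


def rank_by_constraint(families):
    """Order families from strongest constraint (lowest mean Ka/Ks) to weakest."""
    return sorted(families, key=lambda f: f.mean)


for fam in rank_by_constraint(FAMILIES):
    print(f"{fam.family}: Ka/Ks = {fam.mean:.2f} ± {fam.sd:.2f} "
          f"({selection_regime(fam.mean)})")
```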

4.6.4. Structure–function analysis of AMO and GAOs

AMO and GAOs share a very high degree of sequence identity, but their substrate specificities are remarkably different. It is reasonable to speculate that the ancient GAO accumulated a number of mutations in the substrate-binding sites to accommodate amorphadiene as a substrate, although we do not understand the selection pressure that imposed this metabolic transformation in A. annua. However, site-directed mutagenesis of the hypothetical plasticity residues of GAOs in this study could not improve GAO’s specificity towards amorphadiene. These data suggest that, in addition to the residues selected for site-directed mutagenesis, other amino acids may contribute to the substrate specificity of AMO and GAOs. Several examples of non-active-site residues playing crucial roles in the interaction between enzymes and substrates have been reported (Wongtrakul et al., 2003; Valmsen et al., 2007). Recent studies on two promiscuous P450s in human, CYP3A4 and CYP2B4, using site-directed mutagenesis and computational approaches also indicated that residues on the substrate entrance and exit channels have an important role in the activities of enzymes (Fishelovitch et al., 2009; Wilderman et al., 2011). Furthermore, the inherent lack of absolute accuracy in protein modelling cannot be ignored. Enzymes may have more than one conformation in solution, and the conformational diversity of an enzyme can be linked to its multi-specificity (James and Tawfik, 2003). As this diversity is not reflected well in the models inferred from crystal structures, a combination of homology modelling and directed evolution may work better in pinpointing the structural elements behind the substrate specificity of the P450s. Although understanding how AMO has structurally evolved from the ancestral GAO proved challenging, such an understanding will have significant practical implications for enzyme engineering in the future.

Chapter 5: GENERAL DISCUSSION AND FUTURE PERSPECTIVES

The general goal of my research is to advance the understanding of STL metabolism in the Asteraceae family. Using a combination of genomics and biochemical approaches, this study first aimed to isolate and characterize GAO, an enzyme responsible for the oxidation of germacrene A to GAA in the STL biosynthetic pathway in lettuce. Furthermore, the functional conservation of GAO activity throughout the Asteraceae family was demonstrated, and deeper insights into their evolution contributing to the STL chemical diversity were obtained from studies of biochemical and phylogenetic relations of GAO with AMO.

5.1. Contribution of GAOs to the fitness of the Asteraceae

5.1.1. Critical role of GAOs in STL metabolism

Despite the extensive studies on the structures and bioactivities of STLs, their biosynthesis is still poorly understood. Two STL biosynthetic routes have been proposed in the Asteraceae (Figure 1.10). The first pathway, leading to artemisinin, is unique to A. annua. The second pathway leads to costunolide, the simplest STL and hypothetical precursor for a plethora of other STLs in many species, and thus represents the general STL metabolism route in the Asteraceae (Figure 1.8). Except for the first step from FPP to germacrene A, the downstream part of the costunolide biosynthetic pathway was unclear at the molecular level before this research started.

Figure 5.1. Biosynthetic pathway of costunolide. GAS: germacrene A synthase. GAO: germacrene A oxidase. COS: costunolide synthase. GAA: germacrene A carboxylic acid. Two isoforms of GAS and GAO (x 2) have been characterized in lettuce.

This study successfully isolated and characterized GAO, encoding the enzyme catalyzing the oxidation of germacrene A to GAA, in lettuce. In addition, the functional conservation of GAO was demonstrated in five species representing the three largest subfamilies (Asteroideae, Cichorioideae, and Carduoideae) and the phylogenetic base (Barnadesioideae) of the Asteraceae. Once plants produce GAA, attaching a hydroxyl group adjacent to the C12 carboxylic acid (i.e., at the C6 or C8 position of GAA) will result in STL formation in the simplest scenario (Figures 1.10, 5.1) (Ikezawa et al., 2011). GAO activity is the prerequisite for the synthesis of STLs, and thus confers a selective advantage on the Asteraceae species through the eco-physiological activities of STLs. In fact, the functional characterization of GAOs and the yeast system for de novo synthesis of GAA in this research paved the way for the characterization of costunolide synthase (Figure 5.1; Ikezawa et al., 2011).

It should be noted that GAO can catalyze three consecutive oxidations of germacrene A to germacrene A alcohol, germacrene A aldehyde, and GAA. Multi-functional P450s have been identified in several plant species. One example is the loblolly pine (Pinus taeda) abietadienol/abietadienal oxidase, which catalyzes two consecutive oxidations of abietadienol to abietic acid (Ro et al., 2005). Other examples include ent-kaurene oxidase and ent-kaurenoic acid oxidase, which catalyze the three-step oxidations of ent-kaurene and ent-kaurenoic acid, respectively, in the biosynthesis of gibberellin-12 (Yamaguchi, 2008). However, such multi-functional P450s are not commonly found in nature, compared to one-step hydroxylation enzymes. Therefore, the acquisition of the uncommon GAO activity in the Asteraceae was likely a crucial event for STL evolution in the family.

Two GAOs that encode two enzymes with identical activity were found in lettuce. While LsGAO1 is highly expressed in stem and almost absent in stem latex, the transcript level of LsGAO2 is several-fold higher in latex than in stem. Latex is produced by laticifers, which are known as storage sites of plant specialized metabolites (Hagel et al., 2008). In opium poppy (Papaver somniferum), immuno-localization data revealed that codeinone reductase, catalyzing the last step to codeine, is expressed in laticifers, where codeine is found (Weid et al., 2004). Similarly, the two enzymes responsible for the terminal steps leading to vindoline in Catharanthus roseus, desacetoxyvindoline 4-hydroxylase and deacetylvindoline 4-O-acetyltransferase, were found to be expressed in laticifers (St-Pierre et al., 1999). As STLs accumulate in latex in lettuce (Sessa et al., 2000), it is reasonable to speculate that the differential expression patterns of LsGAO1 and LsGAO2 between stem and latex are linked to their differential roles in STL biosynthesis in lettuce.

In addition, at least two isoforms of GAS have been reported in lettuce (Bennett et al., 2002) (Figure 5.1). More systematic research is required to understand the roles of these TPSs and P450s in co-ordinating the regulation of STL metabolism. Such studies can include specific silencing of each GAO and GAS isoform, and localization of GAOs and GASs in lettuce. Furthermore, questions about which enzymes are responsible for the downstream steps of STL biosynthesis and whether costunolide is the true precursor for other STLs in the Asteraceae need to be addressed.

5.1.2. Conservation of GAO activity in the Asteraceae

The retention of GAO activity over 50 million years of diversification in the Asteraceae indicates that these genes significantly contribute to the evolutionary success of the family. It is intriguing that conserved GAO activity was found in B. spinosa, and GAA was detected in this plant. Apparently, a transcript encoding a biochemically active GAO is present in B. spinosa. It is reasonable to predict that B. spinosa can synthesize STLs. However, there is no report in the scientific literature on the presence of STLs in any species from the subfamily Barnadesioideae despite the intensive phytochemical studies in the 1960s and 1970s. Also, NMR signals for the α-methylene-γ-lactone moiety could not be detected in herbarium samples of B. spinosa (personal communication with Dr. Otmar Spring, University of Hohenheim, Germany). It should be noted that STLs accumulate in specialized cells/organs such as glandular trichomes (e.g., sunflower) or laticifers (e.g., lettuce), probably to avoid self-toxicity of STLs. The anatomy of B. spinosa does not suggest the presence of such secretory organs (Funk et al., 2009). Although it is speculative, the development of dedicated storage organs may be required to implement STL metabolism in any plant, and B. spinosa might not be able to develop such specialized cells for STLs. In short, B. spinosa may represent an STL-free Asteraceae plant with GAO activity.

The occurrence of sesquiterpene carboxylic acids has been widely reported in the Asteraceae (Bohlmann and Zdero, 1978; Marco et al., 1994; Sanz et al., 1997; Hernández et al., 2001). Many of these compounds, such as γ-costic acid, ilicic acid, and tessaric acid, have anti-feedant activities (González-Coloma et al., 2005). As two of these sesquiterpene carboxylic acids could be produced by GAO activity as shown in this study, the presence of GAO, even without STLs, could contribute to the survival of the Barnadesioideae plants. The later evolution of STL biosynthesis, building on a GAO pathway already retained by natural selection, would then have provided enhanced fitness.

Clearly, more studies on GAO activity and STL metabolism in the Barnadesioideae are required to explore their roles in the evolution of the Barnadesioideae in particular and the whole Asteraceae family in general. In fact, glandular trichomes have been recorded in at least one genus, Doniophyton, in the subfamily Barnadesioideae (Katinas and Stuessy, 1997). It will be of great scientific significance to determine whether the Barnadesioideae species are indeed STL-free and how much GAOs contribute to their fitness.

5.2. The many “talents” of GAOs and chemical diversity of the Asteraceae family

This study showed that GAOs have a very high degree of substrate promiscuity that is absent in AMO. In addition to germacrene A, GAOs accept and oxidize many sesquiterpenes with various structural features. Among these substrates, at least one (valencene) can be oxidized at two different positions (Figures 4.6–4.9). Oxidative products of GAOs include known natural products (e.g. valerenic acid and nootkatone) and potentially novel “unnatural” sesquiterpenoids (e.g. iso-ilicic acid, iso-epi-ilicic acid and δ-guaienoic acid). These findings open up the possibility of using GAOs for the production of desired “unnatural” sesquiterpenoids in the future.

Figure 5.2. Model of sesquiterpene oxidase evolution based on Jensen’s hypothesis of enzyme evolution from ancestral promiscuity. GAOs have features of a generalist (ancestral) enzyme (Eg) while AMO represents a specialist (descendant) enzyme (Es).

More importantly, the differential substrate specificity between AMO and GAOs demonstrated by this research convincingly supports the general hypothesis of enzyme evolution from ancestral promiscuity (Jensen, 1976). Although we do not know the ecological conditions that favoured the emergence of amorphadiene and its derivatives, the functional flexibility that allowed the ancestral GAO to accept amorphadiene was certainly a selective advantage. This ancestral promiscuity was selected and “tinkered” with by evolution for optimal activity on amorphadiene, leading to modern AMO. It is, however, still difficult to understand how GAS/GAO has optimally co-evolved into ADS/AMO for efficient synthesis of artemisinic acid in A. annua.

Figure 5.3. Structural similarities between naturally-occurring STLs (bottom) in the Asteraceae and the sesquiterpene substrates tested for oxidation by GAOs (top). Examples of STLs include a guaianolide (lactucin) from lettuce, an eremophilanolide (1α-tigloyloxy-8βH,10βH-eremophil-7(11)-en-8α,12-olide) from Senecio poepigii, and a cadinanolide from Pseudelephantopus spicatus. Black colour indicates the core sesquiterpene backbone of each structure.

The multi-specificity of GAOs may also help explain the enormous chemical diversification in the Asteraceae. It has long been speculated that many STLs are derived from chemical decorations and rearrangements of the parental structure costunolide (Figure 1.8). However, analyses of biochemical and phylogenetic relations between GAOs and AMO in this study raise the possibility of independent pathways for specific STLs with specialized STSs and sesquiterpene oxidases (Figure 5.2). For example, the structural backbones of many STLs found in the Asteraceae, including guaianolides, eremophilanolides, and cadinanolides, are very similar to the sesquiterpenes oxidized by GAOs in this research (Sessa et al., 2000; Ragasa and Rideout, 2001; Reina et al., 2004). With the remarkable substrate promiscuity of GAOs, it can be reasoned that these STLs may be derived from their respective specialized STSs and AMO/GAO homologues (Figure 5.3).

This view is supported by the selection (Ka/Ks) analysis of GAOs. The selection constraints on genes in the same metabolic pathway have been suggested to be progressively relaxed from early to later steps due to the pleiotropic nature of upstream genes (Otto, 2004; Ramsay et al., 2009). This trend is observed for FPSs and GASs. However, GAOs appear to be under a stronger constraint (lower Ka/Ks) although they are downstream of FPSs and GASs (Chapter 4) in STL biosynthesis. This unusually strong constraint on GAOs, together with the demonstrated substrate promiscuity of the enzymes they encode, strongly suggests that they represent a highly connected node in the STL metabolic network. GAOs are thus very likely a critical step responsible for the diversity of STLs in the Asteraceae. With the power of next-generation sequencing technologies, the unprecedented growth of plant genomic and transcriptomic data will help future studies to resolve these questions.

5.3. AMO/GAO structure–function analysis

The remarkable sequence homology between AMO and GAOs strongly suggests that changes of a few amino acids are sufficient to alter the substrate specificity of these P450s. However, rational site-directed mutagenesis of active-site residues based on homology modelling in this research could not improve the specificity of GAOs towards amorphadiene. Therefore, it can be inferred that residues outside of the 10 Å radius of the substrate influence the substrate specificity.
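The distance criterion referred to above can be made concrete with a minimal sketch that lists the residues having at least one atom within a cutoff distance of any substrate atom in a model. It assumes that atomic coordinates (in Å) have already been extracted from the enzyme–substrate model; the function name, residue numbers, and coordinates below are hypothetical and do not come from the models used in this study.

```python
import math


def residues_within(residue_atoms, ligand_atoms, cutoff=10.0):
    """Return sorted residue numbers with any atom within `cutoff` Å of any ligand atom.

    residue_atoms: dict mapping residue number -> list of (x, y, z) coordinates
    ligand_atoms:  list of (x, y, z) coordinates of the docked substrate
    """
    near = set()
    for resnum, atoms in residue_atoms.items():
        if any(math.dist(a, l) <= cutoff for a in atoms for l in ligand_atoms):
            near.add(resnum)
    return sorted(near)


# Hypothetical coordinates (Å), for illustration only.
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
residues = {
    101: [(3.0, 4.0, 0.0)],   # 5.0 Å from the first ligand atom
    205: [(0.0, 0.0, 9.5)],   # 9.5 Å from the first ligand atom
    310: [(20.0, 0.0, 0.0)],  # 18.5 Å from the nearest ligand atom
}

print(residues_within(residues, ligand))
```

A fixed radius of this kind necessarily excludes residues on access channels or on the protein surface, which is one reason the selected positions may have missed the residues that matter.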

Several studies on P450s and other enzymes have shown that the activities of enzymes on certain substrates depend not only on the active site but also on the interactions between the substrates and non-active-site residues (Wongtrakul et al., 2003; Valmsen et al., 2007; Fishelovitch et al., 2009; Wilderman et al., 2011). These interactions in P450s include initial substrate recognition, substrate movement from the enzyme surface to the buried active site, and enzyme–redox partner binding (Scott et al., 2002; Scott et al., 2004; Kumar et al., 2005). Such interactions between substrates and the non-active-site amino acids cannot be well predicted by homology modelling. Therefore, other approaches should be employed to deduce the plasticity residues behind the differential activities between P450s.

Without a solid knowledge of the interactions between substrates and the enzymes as well as between residues of an enzyme, directed evolution could be utilized to elucidate the critical amino acids. Recent attempts at improving the activities of bacterial and mammalian P450s using directed evolution have achieved promising results and revealed the critical roles of residues outside of the active sites. Glieder et al. (2002) used laboratory evolution to pinpoint eleven residues which could be mutated to improve the hydroxylation capacity of the bacterial P450 BM3 on unnatural alkane substrates; ten of these eleven residues were not in the active site. Directed evolution was also used to enhance the activities of human CYP1A2 towards two different substrates, and the critical residues were identified to be distant from the substrates (Kim and Guengerich, 2004a,b). This approach can be applied to understand the structural difference between AMO and GAOs. However, due to the lack of a simple screening system, homology modelling will still be useful in narrowing the target regions for directed evolution.
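The logic of a directed-evolution experiment (diversify, screen, keep the best variant, repeat) can be sketched as a toy loop. This is a conceptual illustration only: the scoring function below is a hypothetical stand-in that counts matches to an arbitrary target string, whereas a real screen would measure enzyme activity on the substrate of interest. All names, sequences, and parameters are assumptions.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate(seq: str, rng: random.Random, rate: float = 0.05) -> str:
    """Randomly substitute each position with probability `rate`."""
    return "".join(
        rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq
    )


def evolve(parent: str, score, rounds: int = 20, library_size: int = 50, seed: int = 0) -> str:
    """Iterate mutagenesis and screening, carrying the best variant forward."""
    rng = random.Random(seed)
    best = parent
    for _ in range(rounds):
        library = [mutate(best, rng) for _ in range(library_size)]
        # Keep the parent in the pool so the score can never decrease.
        best = max(library + [best], key=score)
    return best


# Hypothetical target and score: a stand-in for a real activity assay.
TARGET = "MKTAYIAKQR"


def toy_score(seq: str) -> int:
    return sum(a == b for a, b in zip(seq, TARGET))


start = "MKAAAAAAAA"
evolved = evolve(start, toy_score)
print(toy_score(start), "->", toy_score(evolved))
```

The number of variants that must be scored in each round (`library_size`) is what makes the availability of a simple screening system the practical bottleneck.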

Another option to answer the question of structural difference between AMO and GAOs is enzyme crystallography. The purification and crystallization of membrane-bound proteins such as AMO and GAOs are difficult but feasible, as shown by several examples of crystal structures of human P450s (Johnson et al., 2002; Yano et al., 2004; Rowland et al., 2006). Morita et al. (2011) reported a successful application of crystal structure-based mutagenesis to alter the catalytic activity of a promiscuous polyketide synthase towards efficient production of “unnatural” polyketide-alkaloid hybrid compounds with bioactivities. The remarkable substrate promiscuity of GAOs could also be utilized in such a way to produce valuable “novel” sesquiterpenoids. In addition, understanding which structural elements underlie the functional transformation from GAO to AMO will have significant implications for studying structural evolution.

In conclusion, this research has successfully isolated and characterized the sesquiterpene oxidases, GAOs, which encode the enzymes catalyzing the central three-step oxidation of germacrene A to GAA in the general STL biosynthetic route of the Asteraceae family. It also provided convincing evidence supporting the general hypothesis of enzyme evolution from ancestral promiscuity. Although this study raises more questions than answers, it opens one more door into the realm of STL metabolism, which is for the most part still a mystery.

This study overcame a typical obstacle in plant metabolism research, which is the lack and instability of biosynthetic substrates and/or intermediates. Both germacrene A and GAA are unstable and transient metabolites in the STL biosynthetic pathway in plants. Therefore, it is practically impossible to obtain these compounds in sufficient amounts for biochemical studies. By combining an available metabolic engineering platform with the expression of multiple genes in yeast, this research demonstrated a convenient alternative system for enzyme characterization and an adequate level of metabolite production. More examples of employing yeast for specialized metabolism study are given in the Appendix.

References

Abdel-Rahman FH, Alaniz NM, Saleh MA (2013). Nematicidal activity of terpenoids. Journal of Environmental Science and Health, Part B 48:16–22.
Abou-Douh AM (2008). New eudesmane derivatives and other sesquiterpenes from the epigeal parts of Dittrichia graveolens. Chemical & Pharmaceutical Bulletin 56:1535–1545.
Aharoni A, Gaidukov L, Khersonsky O, Gould SM, Roodveldt C, Tawfik DS (2004). The ‘evolvability’ of promiscuous protein functions. Nature Genetics 37:73–76.
Ajikumar PK, Tyo K, Carlsen S, Mucha O, Phon TH, Stephanopoulos G (2008). Terpenoids: opportunities for biosynthesis of natural product drugs using engineered microorganisms. Molecular Pharmaceutics 5:167–190.
Altschul S, Gish W, Miller W, Myers E, Lipman D (1990). Basic local alignment search tool. Journal of Molecular Biology 215:403–410.
Alves R, Chaleil RAG, Sternberg MJE (2002). Evolution of enzymes in metabolism: a network perspective. Journal of Molecular Biology 320:751–770.
Anzenbacher P, Anzenbacherová E (2001). Cytochromes P450 and metabolism of xenobiotics. Cellular and Molecular Life Sciences 58:737–747.
Arigoni D, Sagner S, Latzel C, Eisenreich W, Bacher A, Zenk MH (1997). Terpenoid biosynthesis from 1-deoxy-D-xylulose in higher plants by intramolecular skeletal rearrangement. Proceedings of the National Academy of Sciences of the USA 94:10600–10605.
Arnold FH, Wintrode PL, Miyazaki K, Gershenson A (2001). How enzymes adapt: lessons from directed evolution. Trends in Biochemical Sciences 26:100–106.
Ashour M, Wink M, Gershenzon J (2010). Biochemistry of terpenoids: monoterpenes, sesquiterpenes and diterpenes. Annual Plant Reviews 40:258–303.
Attia M, Kim SU, Ro DK (2012). Molecular cloning and characterization of (+)-epi-α-bisabolol synthase, catalyzing the first step in the biosynthesis of the natural sweetener, hernandulcin, in Lippia dulcis. Archives of Biochemistry and Biophysics 527:37–44.
Austin RS, Vidaurre D, Stamatiou G, Breit R, Provart NJ, Bonetta D, Zhang J, Fung P, Gong Y, Wang PW, McCourt P, Guttman DS (2011). Next-generation mapping of Arabidopsis genes. Plant Journal 67:715–725.
Barriuso J, Nguyen TD, Li JW, Roberts JN, MacNevin G, Chaytor JL, Marcus SL, Vederas JC, Ro D-K (2011). Double oxidation of the cyclic nonaketide dihydromonacolin L to monacolin J by a single cytochrome P450 monooxygenase, LovA. Journal of the American Chemical Society 133:8078–8081.
Barreda VD, Palazzesi L, Tellería MC, Katinas L, Crisci JV, Bremer K, Passalia MG, Corsolini R, Rodríguez Brizuela R, Bechis F (2010). Eocene Patagonia fossils of the daisy family. Science 329:1621–1621.
Benke D, Barberis A, Kopp S, Altmann KH, Schubiger M, Vogt KE, Rudolph U, Mohler H (2009). GABAA receptors as in vivo substrate for the anxiolytic action of valerenic acid, a major constituent of valerian root extracts. Neuropharmacology 56:174–181.

Bennett MH, Mansfield JW, Lewis MJ, Beale MH (2002). Cloning and expression of sesquiterpene synthase genes from lettuce (Lactuca sativa L.). Phytochemistry 60:255– 261. Bent S, Padula A, Moore D, Patterson M, Mehling W (2006). Valerian for sleep: a systematic review and meta-analysis. American Journal of Medicine 119:1005–1012. Bertea CM, Voster A, Verstappen FW, Maffei M, Beekwilder J, Bouwmeester HJ (2006). Isoprenoid biosynthesis in Artemisia annua: cloning and heterologous expression of a germacrene A synthase from a glandular trichome cDNA library. Archives of Biochemistry and Biophysics 448:3–12. Bick JA, Lange BM (2003). Metabolic cross talk between cytosolic and plastidial pathways of isoprenoid biosynthesis: unidirectional transport of intermediates across the chloroplast envelope membrane. Archives of Biochemistry and Biophysics 415:146–154. Bloom JD, Meyer MM, Meinhold P, Otey CR, MacMillan D, Arnold FH (2005). Evolving strategies for enzyme engineering. Current Opinion in Structural Biology 15:447–452. Bohm BA, Stuessy TF (1995). Flavonoid chemistry of Barnadesioideae (Asteraceae). Systematic Botany 20:22–27. Bolhmann F, Zdero C (1978). New sesquiterpenes and acetylenes from Athanasia and Pentzia species. Phytochemistry 17:1595–1599. Bolhmann J, Keeling CI (2008). Terpenoid biomaterials. Plant Journal 54:656–669. Bolhmann J, Steele CL, Croteau R (1997). synthases from grand fir (Abies grandis): cDNA isolation, characterization, and functional expression of myrcene synthase, (–)-4S-limonene synthase, and (–)-(1S,5S)-pinene synthase. Journal of Biological Chemistry 272:21784–21792. Bouwmeester HJ, Kodde J, Verstappen FWA, Altug IG, de Kraker JW, Wallaart TE (2002). Isolation and characterization of two germacrene A synthase cDNA clones from chicory. Plant Physiology 129:134–144. Bouwmeester HJ, Wallaart TE, Janssen MHA, van Loo B, Jansen BJM, Posthumus MA, Schmidt CO, de Kraker JW, König WA, Franssen MCR (1999). 
Amorpha-4,11-diene synthase catalyses the first probable step in artemisinin biosynthesis. Phytochemistry 52:843–854. Bolwell GP, Bozak K, Zimmerlin A (1994). Plant cytochrome P450. Phytochemistry 37:1491–1506. Bowles D, Isayenkova J, Lim EK, Poppenberger B (2005). Glycosyltransferases: managers of small molecules. Current Opinion in Plant Biology 8:254–263. Breitmaier E (2006). Terpenes: flavors, fragrances, pharmaca, pheromones. Wiley-VCH, Germany. Brown GD, Sy LK (2007). In vivo transformations of dihydroartemisinic acid in Artemisia annua plants. Tetrahedron 60:1139–1159. Brunger AT, Adams PD, Clore GM, Gros P, Grosse-Kunstleve RW, Jiang JS, Kuszewski J, Nilges M, Pannu NS, Read RJ, Rice LM, Simonson T, Warren GL (1998). Crystallography & NMR System (CNS): a new software suite for macromolecular structure determination. Acta Crystallographica D54:905–921.

Campbell CD, Vederas JC (2010). Biosynthesis of lovastatin and related metabolites formed by fungal iterative PKS enzymes. Biopolymers 93:755–763. Cankar K, van Houwelingen A, Bosch D, Sonke T, Bouwmeester HJ, Beekwilder J (2011). A chicory cytochrome P450 mono-oxygenase CYP71AV8 for the oxidation of (+)-valencene. FEBS Letters 585:178–182. Caputi L, Aprea E (2010). Use of terpenoids as natural flavouring compounds in food industry. Recent Patents on Food, Nutrition & Agriculture 3:9–16. Carter OA, Peters RJ, Croteau R (2003). Monoterpene biosynthesis pathway construction in Escherichia coli. Phytochemistry 64:425–433. Chaloner WG (1970). The rise of the first land plants. Biological Reviews 45:353–377. Chang MCY, Eachus RA, Trieu W, Ro DK, Keasling JD (2007). Engineering Escherichia coli for production of functionalized terpenoids using plant P450s. Nature Chemical Biology 3:274–277. Chapple C (1998). Molecular-genetic analysis of plant cytochrome P450-dependent monooxygenases. Annual Review of Plant Physiology and Plant Molecular Biology 49:311–343. Chaturvedi D (2011). Sesquiterpene lactones: structural diversity and their biological activities. In: Tiwari VK, Mishra BB (Eds) Opportunity, Challenge and Scope of Natural Products in Medicinal Chemistry. Research Signpost, India. pp. 313–334. Chen F, Tholl D, Bohlmann J, Pichersky E (2011). The family of terpene synthases in plants: a mid-size family of genes for specialized metabolism that is highly diversified throughout the kingdom. Plant Journal 66:212–229. Chen XY, Chen Y, Heinstein P, Jo Davisson V (1995). Cloning, expression, and characterization of (+)-δ-cadinene synthase: a catalyst for cotton phytoalexin biosynthesis. Archives of Biochemistry and Biophysics 324:255–266. Chen DH, Ye HC, Li GF (2000). Expression of a chimeric farnesyl diphosphate synthase gene in Artemisia annua L. transgenic plants via Agrobacterium tumefaciens-mediated transformation. Plant Science 155:179–185.
Christensen SB, Andersen A, Kromann H, Treiman M, Tombal B, Denmeade S, Isaacs JT (1999). Thapsigargin analogues for targeting programmed cell death of androgen-independent prostate cancer cells. Bioorganic & Medicinal Chemistry 7:1273–1280. Coon MJ (2005). Cytochrome P450: nature’s most versatile biological catalyst. Annual Review of Pharmacology and Toxicology 45:1–25. Cronquist A (1977). The Compositae revisited. Brittonia 29:137–153. Croteau R, Kutchan TM, Lewis NG (2000). Natural products (secondary metabolites). In: Buchanan B, Gruissem W, Jones R (Eds) Biochemistry and molecular biology of plants. American Society of Plant Biologists, USA. pp. 1250–1318. Davies BSJ, Wang HS, Rine J (2005). Dual activators of the sterol biosynthetic pathway of Saccharomyces cerevisiae: similar activation/regulatory domains but different response mechanisms. Molecular and Cellular Biology 25:7375–7385. de Groot MJ (2006). Designing better drugs: predicting cytochrome P450 metabolism. Drug Discovery Today 11:601–606.

de Kraker JW, Franssen MCR, Dalm MCF, de Groot A, Bouwmeester HJ (2001a). Biosynthesis of germacrene A carboxylic acid in chicory roots: demonstration of a cytochrome P450 (+)-germacrene A hydroxylase and NADP+-dependent sesquiterpenoid dehydrogenase(s) involved in sesquiterpene lactone biosynthesis. Plant Physiology 125:1930–1940. de Kraker JW, Franssen MCR, de Groot A, König WA, Bouwmeester HJ (1998). (+)-Germacrene A biosynthesis. Plant Physiology 117:1381–1392. de Kraker JW, Franssen MCR, de Groot A, Shibata T, Bouwmeester HJ (2001b). Germacrenes from fresh costus roots. Phytochemistry 58:481–487. de Kraker JW, Franssen MCR, Joerink M, de Groot A, Bouwmeester HJ (2002). Biosynthesis of costunolide, dihydrocostunolide, and leucodin: demonstration of cytochrome P450-catalyzed formation of the lactone ring present in sesquiterpene lactones of chicory. Plant Physiology 129:257–268. Degenhardt J, Köllner TG, Gershenzon J (2009). Monoterpene and sesquiterpene synthases and the origin of terpene skeletal diversity in plants. Phytochemistry 70:1621–1637. Denisov IG, Makris TM, Sligar SG, Schlichting I (2005). Structure and chemistry of cytochrome P450. Chemical Reviews 105:2253–2277. Dewick PM (2009). Medicinal natural products: a biosynthetic approach (Third Edition). John Wiley & Sons Ltd., UK. Dietrich JA, McKee AE, Keasling JD (2010). High-throughput metabolic engineering: advances in small-molecule screening and selection. Annual Review of Biochemistry 79:21.1–21.8. Donald KAG, Hampton RY, Fritz IB (1997). Effects of overproduction of the catalytic domain of 3-hydroxy-3-methylglutaryl coenzyme A reductase on squalene synthesis in Saccharomyces cerevisiae. Applied and Environmental Microbiology 63:3341–3344. Dudareva N, Anderson S, Orlova I, Gatto N, Reichelt M, Rhodes D, Boland W, Gershenzon J (2005). The nonmevalonate pathway supports both monoterpene and sesquiterpene formation in snapdragon flowers.
Proceedings of the National Academy of Sciences of the USA 102:933–938. Eckstein-Ludwig U, Webb RJ, van Goethem ID, East JM, Lee AG, Kimura M, O’Neill PM, Bray PG, Ward SA, Krishna S (2003). Artemisinins target the SERCA of Plasmodium falciparum. Nature 424:957–961. Eisenreich W, Bacher A, Arigoni D, Rohdich F (2004). Biosynthesis of isoprenoids via the non-mevalonate pathway. Cellular and Molecular Life Sciences 61:1401–1426. Eswar N, Webb B, Marti-Renom MA, Madhusudhan MS, Eramian D, Shen MY, Pieper U, Sali A (2007). Comparative protein structure modeling using MODELLER. In: Current Protocols in Protein Science. Wiley Interscience. pp. 2.9.1–2.9.31. Facchini PJ, Bohlmann J, Covello PS, De Luca V, Mahadevan R, Page JE, Ro DK, Sensen CW, Storms R, Martin VJJ (2012). Synthetic biosystems for the production of high-value plant metabolites. Trends in Biotechnology 30:127–131. Facchini PJ, Chappell J (1992). Gene family for an elicitor-induced sesquiterpene cyclase in tobacco. Proceedings of the National Academy of Sciences of the USA 89:11088–11092.

Faraldos JA, Wu S, Chappell J, Coates RM (2007). Conformational analysis of (+)-germacrene A by variable-temperature NMR and NOE spectroscopy. Tetrahedron 63:7733–7742. Fishelovitch D, Shaik S, Wolfson HJ, Nussinov R (2009). Theoretical characterization of substrate access/exit channels in the human cytochrome P450 3A4 enzyme: involvement of phenylalanine residues in the gating mechanism. Journal of Physical Chemistry B 113:13018–13025. Fraga BM (2007). Natural sesquiterpenoids. Natural Product Reports 24:1350–1381. Funk VA, Bayer RJ, Keeley S, Chan R, Watson L, Gemeinholzer B, Schilling E, Panero JL, Baldwin BG, Garcia-Jacas N, Susanna A, Jansen RK (2005). Everywhere but Antarctica: using a supertree to understand the diversity and distribution of the Compositae. Biologiske Skrifter 55:343–374. Funk VA, Susanna A, Stuessy TF, Bayer RJ (2009). Systematics, evolution, and biogeography of Compositae. International Association for Plant Taxonomy. University of Vienna, Vienna, Austria. Furusawa M, Toshihiro H, Yoshiaki N, Yoshinori A (2005). Highly efficient production of nootkatone, the grapefruit aroma from valencene, by biotransformation. Chemical & Pharmaceutical Bulletin 53:1513–1514. Furuya Y, Lundmo P, Short AD, Gill DL, Isaacs JT (1994). The role of calcium, pH, and cell proliferation in the programmed (apoptotic) death of androgen-independent prostatic cancer cells induced by thapsigargin. Cancer Research 54:6167–6175. Gaffe J, Bru JP, Causse M, Vidal A, Stamitti-Bert L, Carde JP, Gallusci P (2000). LEFPS1, a tomato farnesyl pyrophosphate gene highly expressed during early fruit development. Plant Physiology 123:1351–1362. Gambliel H, Croteau R (1984). Pinene cyclases I and II: two enzymes from sage (Salvia officinalis) which catalyze stereospecific cyclizations of geranyl pyrophosphate to monoterpene olefins of opposite configuration. Journal of Biological Chemistry 259:740–748. Gang DR (2005). Evolution of flavors and scents.
Annual Review of Plant Biology 56:301–325. Gao X, Xie X, Pashkov I, Sawaya MR, Laidman J, Zhang W, Cacho R, Yeates TO, Tang Y (2009). Directed evolution and structural characterization of a simvastatin synthase. Chemistry & Biology 16:1064–1074. Gietz RD, Schiestl RH (2007). High-efficiency yeast transformation using the LiAc/SS carrier DNA/PEG method. Nature Protocols 2:31–34. Glasner ME, Gerlt JA, Babbitt PC (2006). Evolution of enzyme superfamilies. Current Opinion in Chemical Biology 10:492–497. Glawischnig E, Grün S, Frey M, Gierl A (1998). Cytochrome P450 monooxygenases of DIBOA biosynthesis: specificity and conservation among grasses. Phytochemistry 50:925–930. Glieder A, Farinas ET, Arnold FH (2002). Laboratory evolution of a soluble, self-sufficient, highly active alkane hydroxylase. Nature Biotechnology 20:1135–1139.

González-Coloma A, Guadaño A, Tonn CE, Sosa ME (2005). Antifeedant/insecticidal terpenes from Asteraceae and Labiatae species native to Argentinean semi-arid lands. Zeitschrift für Naturforschung C 60:855–861. Göpfert JC, Bülow AK, Spring O (2010). Identification and functional characterization of a new sunflower germacrene A synthase (HaGAS3). Natural Product Communications 5:709–715. Göpfert JC, Heil N, Conrad J, Spring O (2005). Cytological development and sesquiterpene lactone secretion in capitate glandular trichomes of sunflower. Plant Biology 7:148–155. Göpfert JC, MacNevin G, Ro DK, Spring O (2009). Identification, functional characterization and developmental regulation of sesquiterpene synthases from sunflower capitate glandular trichomes. BMC Plant Biology 9:86–103. Gotoh O (1992). Substrate recognition sites in cytochrome P450 family 2 (CYP2) protein inferred from comparative analyses of amino acids and coding nucleotide sequences. Journal of Biological Chemistry 267:83–90. Greenhagen B, Chappell J (2001). Molecular scaffolds for chemical wizardry: learning nature’s rules for terpene cyclases. Proceedings of the National Academy of Sciences of the USA 98:13479–13481. Guo H, Pei X, Wan F, Cheng H (2011). Molecular cloning of allelopathy related genes and their relation to HHO in Eupatorium adenophorum. Molecular Biology Reports 38:4651–4656. Hagel JM, Yeung EC, Facchini PJ (2008). Got milk? The secret life of laticifers. Trends in Plant Science 13:631–639. Hall IH, Lee KH, Starnes CO, Sumida Y, Wu RY, Waddell TG, Cochran JW, Gerhart KG (1979). Anti-inflammatory activity of sesquiterpene lactones and related compounds. Journal of Pharmaceutical Sciences 68:537–542. Hannemann F, Bichet A, Ewen KM, Bernhardt R (2006). Cytochrome P450 systems – biological variations of electron transport chains. Biochimica et Biophysica Acta 1770:330–344. Harmatha J, Nawrot J (1984).
Comparison of the feeding deterrent activity of some sesquiterpene lactones and a lignan lactone towards selected insect storage pests. Biochemical Systematics and Ecology 12:95–98. Harvey AL (2008). Natural products in drug discovery. Drug Discovery Today 13:894–901. Hasemann CA, Kurumbail RG, Boddupalli SS, Peterson JA, Deisenhofer J (1995). Structure and function of cytochromes P450: a comparative analysis of three crystal structures. Structure 2:41–62. Hawkins KM, Smolke CD (2008). Production of benzylisoquinoline alkaloids in Saccharomyces cerevisiae. Nature Chemical Biology 4:564–573. Hehner SP, Hofmann TG, Dröge W, Schmitz ML (1999). The antiinflammatory sesquiterpene lactone parthenolide inhibits NF-κB by targeting the IκB kinase complex. Journal of Immunology 163:5617–5623. Helariutta Y, Kotilainen M, Elomaa P, Teeri TH (1995). Chalcone synthase-like genes active during corolla development are differentially expressed and encode enzymes with

different catalytic properties in Gerbera hybrida (Asteraceae). Plant Molecular Biology 28:47–60. Hemmerlin A, Hoeffler JF, Meyer O, Tritsch D, Kagan IA, Grosdemange-Billiard C, Rohmer M, Bach TJ (2003). Cross-talk between the cytosolic mevalonate and the plastidial methylerythritol phosphate pathways in tobacco bright yellow-2 cells. Journal of Biological Chemistry 278:26666–26676. Hemmerlin AM, Rivera SB, Erickson HK, Poulter CD (2003). Enzymes encoded by the farnesyl diphosphate synthase gene family in the big sagebrush Artemisia tridentata ssp. spiciformis. Journal of Biological Chemistry 278:32132–32140. Hernández V, del Carmen Recio M, Máñez S, Prieto JM, Giner RM, Ríos JL (2001). A mechanistic approach to the in vivo anti-inflammatory activity of sesquiterpenoid compounds isolated from Inula viscosa. Planta Medica 67:726–731. Herrero O, Ramón D, Orejas M (2008). Engineering the Saccharomyces cerevisiae isoprenoids pathway for de novo production of aromatic monoterpenes in wine. Metabolic Engineering 10:78–86. Herz W (1977). Sesquiterpene lactones in the Compositae. In: Heywood VH, Harborne JB, Turner BL (Eds) The biology and chemistry of the Compositae. Volume 1, pp. 334–358. Academic Press, Inc., New York. Hildebrand A, Remmert M, Biegert A, Söding J (2009). Fast and accurate automatic structure prediction with HHpred. Proteins 77 (Suppl 9):128–132. Huang M, Lu JJ, Huang MQ, Bao JL, Chen XP, Wang YT (2012). Terpenoids: natural products for cancer therapy. Expert Opinion on Investigational Drugs 21:1801–1808.

Hurst LD (2002). The Ka/Ks ratio: diagnosing the form of sequence evolution. Trends in Genetics 18:486–487. Ikezawa N, Göpfert JC, Nguyen TD, Kim SU, O’Maille PE, Spring O, Ro DK (2011). Lettuce costunolide synthase (CYP71BL2) and its homolog (CYP71BL1) from sunflower catalyze distinct regio- and stereoselective hydroxylations in sesquiterpene lactone metabolism. Journal of Biological Chemistry 286:21601–21611. Jacob F (1977). Evolution and tinkering. Science 196:1161–1166. James LC, Tawfik DS (2003). Conformational diversity and protein evolution – a 60-year-old hypothesis revisited. Trends in Biochemical Sciences 28:361–368. Jansen RK, Palmer JD (1987). A chloroplast DNA inversion marks an ancient evolutionary split in the sunflower family (Asteraceae). Proceedings of the National Academy of Sciences of the USA 84:5818–5822. Jensen RA (1976). Enzyme recruitment in evolution of new function. Annual Review of Microbiology 30:409–425. Johnson EF, Wester MR, Stout CD (2002). The structure of microsomal cytochrome P450 2C5: a steroid and drug metabolizing enzyme. Endocrine Research 28:435–441. Jonczyk R, Schmidt H, Osterrieder A, Fiesselmann A, Schullehner K, Haslbeck M, Sicker D, Hofmann D, Yalpani N, Simmons C, Frey M, Gierl A (2008). Elucidation of the final reactions of DIMBOA-glucoside biosynthesis in maize: characterization of Bx6 and Bx7. Plant Physiology 146:1053–1063.

Katinas L, Stuessy TF (1997). Revision of Doniophyton (Compositae, Barnadesioideae). Plant Systematics and Evolution 206:33–45. Kato T, Kabuto C, Sasaki N, Tsunagawa M, Aizawa H, Fujita K, Kato Y, Kitahara Y (1973). Momilactones, growth inhibitors from rice, Oryza sativa L. Tetrahedron Letters 14:3861–3864. Keasling JD, Nielsen J (2011). Synergies between synthetic biology and metabolic engineering. Nature Biotechnology 29:693–695. Kennedy J, Auclair K, Kendrew SG, Park C, Vederas JC, Hutchinson CR (1999). Modulation of polyketide synthase activity by accessory proteins during lovastatin biosynthesis. Science 284:1368–1372. Khersonsky O, Roodveldt C, Tawfik DS (2006). Enzyme promiscuity: evolutionary and mechanistic aspects. Current Opinion in Chemical Biology 10:498–508. Khersonsky O, Tawfik DS (2010). Enzyme promiscuity: a mechanistic and evolutionary perspective. Annual Review of Biochemistry 79:11.1–11.35. Kim D, Guengerich FP (2004a). Enhancement of 7-methoxyresorufin O-demethylation activity of human cytochrome P450 1A2 by molecular breeding. Archives of Biochemistry and Biophysics 432:102–108. Kim D, Guengerich FP (2004b). Selection of human cytochrome P450 1A2 mutants with enhanced catalytic activity for heterocyclic amine N-hydroxylation. Biochemistry 43:981–988. Kim SH, Heo K, Chang YJ, Park SH, Rhee SK, Kim SU (2006). Cyclization mechanism of amorpha-4,11-diene synthase, a key enzyme in artemisinin biosynthesis. Journal of Natural Products 69:758–762. Kinch LN, Grishin NV (2002). Evolution of protein structures and functions. Current Opinion in Structural Biology 12:400–408. Kita T, Imai S, Sawada H, Kumagai H, Seto H (2008). The biosynthetic pathway of curcuminoid in turmeric (Curcuma longa) as revealed by 13C-labeled precursors. Bioscience, Biotechnology, and Biochemistry 72:1789–1798. Komagata D, Shimada H, Murakawa S, Endo A (1989). Biosynthesis of monacolins: conversion of monacolin L to monacolin J by a monooxygenase of Monascus ruber.
Journal of Antibiotics (Tokyo) 42:407–412. Koonin EV, Wolf YI (2010). Constraints and plasticity in genome and molecular-phenome evolution. Nature Reviews Genetics 11:487–498. Kovacs WJ, Olivier LM, Krisans SK (2002). Central role of peroxisomes in isoprenoids biosynthesis. Progress in Lipid Research 41:369–391. Kumar S, Chen CS, Waxman DJ, Halpert JR (2005). Directed evolution of mammalian cytochrome P450 2B1: mutations outside of the active site enhance the metabolism of several substrates, including the anticancer prodrugs cyclophosphamide and ifosfamide. Journal of Biological Chemistry 280:19569–19575. Kumeta Y, Ito M (2010). Characterization of δ-guaiene synthases from cultured cells of Aquilaria, responsible for the formation of sesquiterpenes in agarwood. Plant Physiology 154:1998–2007.

Lamb DC, Lei L, Warrilow AGS, Lepesheva GI, Mullins JGL, Waterman MR, Kelly SL (2009). The first virally encoded cytochrome P450. Journal of Virology 83:8266–8269. Laule O, Fürholz A, Chang HS, Zhu T, Wang X, Heifetz PB, Gruissem W, Lange BM (2003). Crosstalk between cytosolic and plastidial pathways of isoprenoid biosynthesis in Arabidopsis thaliana. Proceedings of the National Academy of Sciences of the USA 100:6866–6871. Léon LG, Donadel OJ, Tonn CE, Padrón JM (2009). Tessaric acid derivatives induce G2/M cell cycle arrest in human solid tumor cell lines. Bioorganic & Medicinal Chemistry 17:6251–6256. Lewinsohn E, Gijzen M, Croteau R (1992). Wound-inducible pinene cyclase from grand fir: purification, characterization, and renaturation after SDS-PAGE. Archives of Biochemistry and Biophysics 293:167–173. Liu PL, Wan JN, Guo YP, Ge S, Rao GY (2012). Adaptive evolution of the chrysanthemyl diphosphate synthase gene involved in irregular monoterpene metabolism. BMC Evolutionary Biology 12:214–224. Lücker J, Bouwmeester HJ, Aharoni A (2007). Metabolic engineering of terpenoid biosynthesis in plants. In: Verpoorte R, Alfermann AW, Johnson TS (Eds) Applications of plant metabolic engineering. Springer, the Netherlands. pp. 219–236. Ma SM, Li JW, Zhou H, Lee KK, Moorthie VA, Xie X, Kealey JT, Da Silva NA, Vederas JC, Tang Y (2009). Complete reconstruction of a highly reducing iterative polyketide synthase. Science 326:589–592. Majdi M, Liu Q, Karimzadeh G, Malboobi MA, Beekwilder J, Cankar K, de Vos R, Todorović S, Simonović A, Bouwmeester H (2011). Biosynthesis and localization of parthenolide in glandular trichomes of feverfew (Tanacetum parthenium L. Schulz Bip.). Phytochemistry 72:1739–1750. Maldonado-Bonilla LD, Betancourt-Jiménez M, Lozoya-Gloria E (2008). Local and systemic gene expression of sesquiterpene phytoalexin biosynthetic enzymes in plant leaves. European Journal of Plant Pathology 121:439–449.
Marco JA, Sanz-Cervera JF, Garcia-Lliso V, Susanna A, Garcia-Jacas N (1994). Sesquiterpene lactones, lignans and aromatic esters from Cheirolophus species. Phytochemistry 37:1101–1107. Martin VJ, Pitera DJ, Withers ST, Newman JD, Keasling JD (2003). Engineering a mevalonate pathway in Escherichia coli for production of terpenoids. Nature Biotechnology 21:796–802. Matsushita Y, Kang W, Charlwood BV (1996). Cloning and analysis of a cDNA encoding farnesyl diphosphate synthase from Artemisia annua. Gene 172:207–209. McGarvey DJ, Croteau R (1995). Terpenoid metabolism. Plant Cell 7:1015–1026. Mendiondo ME, Juárez BE, Seeligmann P (1997). Flavonoid patterns of some Barnadesioideae (Asteraceae): eventual chemosystematic significance. Biochemical Systematics and Ecology 25:673–674. Mercke P, Bengtsson M, Bouwmeester HJ, Posthumus MA, Brodelius PE (2000). Molecular cloning, expression, and characterization of amorpha-4,11-diene synthase, a key enzyme

of artemisinin biosynthesis in Artemisia annua L. Archives of Biochemistry and Biophysics 381:173–180. Mizutani M, Ohta D (2010). Diversification of P450 genes during land plant evolution. Annual Review of Plant Biology 61:291–315. Mizutani M, Sato F (2011). Unusual P450 reactions in plant secondary metabolism. Archives of Biochemistry and Biophysics 507:194–203. Mori K, Waku M (1985). Synthesis of oryzalexins a, b, and c, the diterpenoidal phytoalexins isolated from rice blast leaves infected with Pyricularia oryzae. Tetrahedron 41:5653–5660. Morita H, Yamashita M, Shi SP, Wakimoto T, Kondo S, Kato R, Sugio S, Kohno T, Abe I (2011). Synthesis of unnatural alkaloid scaffolds by exploiting plant polyketide synthase. Proceedings of the National Academy of Sciences of the USA 108:13504–13509. Munro AW, Girvan HM, McLean KJ (2007). Variations on a (t)heme – novel mechanisms, redox partners and catalytic functions in cytochrome P450 superfamily. Natural Product Reports 24:585–609. Nagata N, Suzuki M, Yoshida S, Muranaka T (2002). Mevalonic acid partially restores chloroplast and etioplast development in Arabidopsis lacking the non-mevalonate pathway. Planta 216:345–350. Nakagawa A, Minami H, Kim JS, Koyanagi T, Katayama T, Sato F, Kumagai H (2011). A bacterial platform for fermentative production of plant alkaloids. Nature Communications 2:326. Nakamura T, Komagata D, Murakawa S, Sakai K, Endo A (1990). Isolation and biosynthesis of 3 alpha-hydroxy-3,5-dihydromonacolin L. Journal of Antibiotics (Tokyo) 43:1597–1600. Nambara E, Marion-Poll A (2005). Abscisic acid biosynthesis and catabolism. Annual Review of Plant Biology 56:165–185. Nelson D, Werck-Reichhart D (2011). A P450-centric view of plant evolution. Plant Journal 66:194–211. Newman DJ, Cragg GM (2007). Natural products as sources of new drugs over the past 25 years. Journal of Natural Products 70:461–477. Nguyen TD, Göpfert JC, Ikezawa N, MacNevin G, Kathiresan M, Conrad J, Spring O, Ro D-K (2010).
Biochemical conservation and evolution of germacrene A oxidase in Asteraceae. Journal of Biological Chemistry 285:16588–16598. Nguyen TD, MacNevin G, Ro DK (2012). De novo synthesis of high-value plant sesquiterpenoids in yeast. Methods in Enzymology 517:261–278. O’Brien PJ, Herschlag D (1999). Catalytic promiscuity and the evolution of new enzymatic activities. Chemistry & Biology 6:R91–R105. Ober D (2010). Gene duplications and the time thereafter – examples from plant secondary metabolism. Plant Biology 12:570–577. Ober D, Hartmann T (1999). Homospermidine synthase, the first pathway-specific enzyme of pyrrolizidine alkaloid biosynthesis, evolved from deoxyhypusine synthase. Proceedings of the National Academy of Sciences of the USA 96:14777–14782. Ohno S, Hosokawa M, Kojima M, Kitamura Y, Hoshino A, Tatsuzawa F, Doi M, Yazawa S (2011). Simultaneous post-transcriptional gene silencing of two different chalcone

synthase genes resulting in pure white flowers in the octoploid dahlia. Planta 234:945–958. Otto SP (2004). Two steps forward one step back: the pleiotropic effects of favoured alleles. Proceedings of the Royal Society B–Biological Sciences 271:705–714. Pan Z, Herickhoff L, Backhaus RA (1996). Cloning, characterization, and heterologous expression of cDNAs for farnesyl diphosphate synthase from the guayule rubber plant reveals that this prenyltransferase occurs in rubber particles. Archives of Biochemistry and Biophysics 332:196–204. Panero JL, Funk VA (2008). The value of sampling anomalous taxa in phylogenetic studies: major clades of the Asteraceae revealed. Molecular Phylogenetics and Evolution 47:757–782. Parr CL, Keates RA, Bryksa BC, Ogawa M, Yada RY (2007). The structure and function of Saccharomyces cerevisiae proteinase A. Yeast 24:467–480. Pfaffl MW (2001). A new mathematical model for relative quantification in real-time RT-PCR. Nucleic Acids Research 29:2002–2007. Phillips MA, Leon P, Boronat A, Rodríguez-Concepción M (2008). The plastidial MEP pathway: unified nomenclature and resources. Trends in Plant Science 13:619–623. Pichersky E, Gang DR (2000). Genetics and biochemistry of secondary metabolites in plants: an evolutionary perspective. Trends in Plant Science 5:439–445. Pickel B, Drew DP, Manczak T, Weitzel C, Simonsen HT, Ro DK (2012). Identification and characterization of a kunzeaol synthase from Thapsia garganica: implications for the biosynthesis of the pharmaceutical thapsigargin. Biochemical Journal 448:261–271. Picman AK (1986). Biological activities of sesquiterpene lactones. Biochemical Systematics and Ecology 14:255–281. Pompon D, Louerat B, Bronine A, Urban P (1996). Yeast expression of animal and plant P450s in optimized redox environments. Methods in Enzymology 272:51–64. Prosser I, Altug IG, Phillips AL, König WA, Bouwmeester HJ, Beale MH (2004).
Enantiospecific (+)- and (–)-germacrene D synthases, cloned from goldenrod, reveal a functionally active variant of the universal isoprenoid-biosynthesis aspartate-rich motif. Archives of Biochemistry and Biophysics 432:136–144. Pyle BW, Tran HT, Pickel B, Haslam TM, Gao Z, MacNevin G, Vederas JC, Ro DK (2012). Enzymatic synthesis of valerena-4,7(11)-diene by a unique sesquiterpene synthase from the valerian plant (Valeriana officinalis). FEBS Journal 279:3136–3146. Radhakrishnan EK, Varghese RT, Vasudevan SE (2010). Unusual intron in the second exon of a type III polyketide synthase gene of Alpinia calcarata Rosc. Genetics and Molecular Biology 33:141–145. Ralston L, Kwon ST, Schoenbeck M, Ralston J, Schenk DJ, Coates RM, Chappell J (2001). Cloning, heterologous expression, and functional characterization of 5-epi-aristolochene-1,3-dihydroxylase from tobacco (Nicotiana tabacum). Archives of Biochemistry and Biophysics 15:222–235. Ramsay H, Rieseberg LH, Ritland K (2009). The correlation of evolutionary rate with pathway position in plant terpenoid biosynthesis. Molecular Biology and Evolution 26:1045–1053.

Ragasa CY, Rideout JA (2001). An antifungal cadinanolide from Pseudoelephantopus spicatus. Chemical and Pharmaceutical Bulletin 49:1359–1361. Rausher MD, Lu YQ, Meyer K (2008). Variation in constraint versus positive selection as an explanation for evolutionary rate variation among anthocyanin genes. Journal of Molecular Evolution 67:137–144. Rees SB, Harborne JB (1985). The role of sesquiterpene lactones and phenolics in the chemical defence of the chicory plant. Phytochemistry 24:2225–2231. Reina M, González-Coloma A, Domínguez-Díaz D, Cabrera R, Mariňo CG, Rodríguez ML, Villarroel L (2004). Bioactive eremophilanolides from Senecio poepigii. Natural Product Research 20:13–19. Rivera SB, Swedlund BD, King GJ, Bell RN, Hussey Jr CE, Shattuck-Eidens DM, Wrobel WM, Peiser GD, Poulter CD (2001). Chrysanthemyl diphosphate synthase: isolation of the gene and characterization of the recombinant non-head-to-tail monoterpene synthase from Chrysanthemum cinerariaefolium. Proceedings of the National Academy of Sciences of the USA 98:4373–4378. Ro DK (2011). Terpenoid biosynthesis. In: Ashihara H, Crozier A, Komamine A (Eds) Plant metabolism and biotechnology. John Wiley & Sons Ltd., UK. pp. 217–240. Ro DK, Arimura GI, Lau SYW, Piers E, Bohlmann J (2005). Loblolly pine abietadienol/abietadienal oxidase PtAO (CYP720B1) is a multifunctional, multisubstrate cytochrome P450 monooxygenase. Proceedings of the National Academy of Sciences of the USA 102:8060–8065. Ro DK, Ouellet M, Paradise EM, Burd H, Eng D, Paddon CJ, Newman JD, Keasling JD (2008). Induction of multiple pleiotropic drug resistance genes in yeast engineered to produce an increased level of anti-malarial drug precursor, artemisinic acid. BMC Biotechnology 83:83–96. Ro DK, Paradise EM, Ouellet M, Fisher KJ, Newman KL, Ndungu JM, Ho KA, Eachus RA, Ham TS, Kirby J, Chang MCY, Withers ST, Shiba Y, Sarpong R, Keasling JD (2006). Production of the antimalarial drug precursor artemisinic acid in engineered yeast.
Nature 440:940–943. Roberts MF, Strack D, Wink M (2010). Biosynthesis of alkaloids and betalains. Annual Plant Reviews 40:20–91. Roberts SC (2007). Production and engineering of terpenoids in plant cell culture. Nature Chemical Biology 3:387–395. Robles M, Aregullin M, West J, Rodriguez E (1995). Recent studies on the zoopharmacognosy, pharmacology and neurotoxicology of sesquiterpene lactones. Planta Medica 61:199–203. Rodriguez E, Towers GHN, Mitchell JC (1976). Biological activities of sesquiterpene lactones. Phytochemistry 15:1573–1580. Rodríguez-Concepción M, Forés O, Martínez-García JF, González V, Phillips MA, Ferrer A, Boronat A (2004). Distinct light-mediated pathways regulate the biosynthesis and exchange of isoprenoid precursors during Arabidopsis seedling development. Plant Cell 16:144–146.

Rohmer M, Knani M, Simonin P, Sutter B, Sahm H (1993). Isoprenoid biosynthesis in bacteria: a novel pathway for the early steps leading to isopentenyl diphosphate. Biochemical Journal 295:517–524. Rowland P, Blaney EE, Smyth MG, Jones JJ, Leydon VR, Oxbrow AK, Lewis CJ, Tennant MG, Modi S, Eggleston DS, Chenery RJ, Bridges AM (2006). Crystal structure of human cytochrome P450 2D6. Journal of Biological Chemistry 281:7614–7622. Sadowitz B, Seymour K, Costanza MJ, Gahtan V (2010). Statin therapy–Part II: clinical considerations for cardiovascular disease. Vascular and Endovascular Surgery 44:421–433. Sanz MBK, Donadel OJ, Rossomando PC, Tonn CE, Guerreiro E (1997). Sesquiterpenes from Tessaria absinthioides. Phytochemistry 44:897–900. Sapir-Mir M, Mett A, Belausov E, Tal-Meshulam S, Frydman A, Gidoni D, Eyal Y (2008). Peroxisomal localization of Arabidopsis isopentenyl diphosphate suggests that part of the plant isoprenoids mevalonic acid pathway is compartmentalized to peroxisomes. Plant Physiology 148:1219–1228. Schramek N, Wang H, Romisch-Margl W, Keil B, Radykewicz T, Winzenhorlein B, Beerhues L, Bacher A, Rohdich F, Gershenzon J, Liu B, Eisenreich W (2010). Artemisinin biosynthesis in growing plants of Artemisia annua: a 13CO2 study. Phytochemistry 71:179–187. Schuhr CA, Radykewicz T, Sagner S, Latzel C, Zenk MH, Arigoni D, Bacher A, Rohdich F, Eisenreich W (2003). Quantitative assessment of crosstalk between the two isoprenoid biosynthesis pathways in plants by NMR spectroscopy. Phytochemistry Reviews 2:3–16. Schuster SC (2008). Next-generation sequencing transforms today’s biology. Nature Methods 5:16–18. Schwartz SH, Qin X, Loewen MC (2004). The biochemical characterization of two carotenoid cleavage enzymes from Arabidopsis indicates that a carotenoid-derived compound inhibits lateral branching. Journal of Biological Chemistry 279:46940–46945. Scott EE, He YQ, Halpert JR (2002).
Substrate routes to the buried active site may vary among cytochrome P450: mutagenesis of the F-G region in P450 2B1. Chemical Research in Toxicology 15:1407–1413. Scott EE, White MA, He YA, Johnson EF, Stout CD, Halpert JR (2004). Structure of mammalian cytochrome P450 2B4 complexed with 4-(4-chlorophenyl)imidazole at 1.9-Å resolution: insight into the range of P450 conformations and the coordination of redox partner binding. Journal of Biological Chemistry 279:27294–27301. Scotti MT, Emerenciano V, Ferreira MJ, Scotti L, Stefani R, da Silva MS, Mendonça Junior FJ (2012). Self-organizing maps of molecular descriptors for sesquiterpene lactones and their application to the chemotaxonomy of the Asteraceae family. Molecules 17:4684–4702. Seaman FC (1982). Sesquiterpene lactones as taxonomic characters in the Asteraceae. Botanical Review 48:121–595. Sessa RA, Bennett MH, Lewis MJ, Mansfield JW, Beale MH (2000). Metabolite profiling of sesquiterpene lactones from Lactuca species: major latex components are novel oxalate and sulfate conjugates of lactucin and its derivatives. Journal of Biological Chemistry 275:26877–26884.

Sharon-Asa L, Shalit M, Frydman A, Bar E, Holland D, Or E, Lavi U, Lewinsohn E, Eyal Y (2003). Citrus fruit flavor and aroma biosynthesis: isolation, functional characterization, and developmental regulation of CsTPS1, a key gene in the production of the sesquiterpene aroma compound valencene. Plant Journal 36:664–674. Shimizu Y, Maeda K, Kato M, Shimomura K (2010). Isolation of anthocyanin-related MYB gene, GbMYB2, from Gynura bicolor leaves. Plant Biotechnology 27:481–487. Sommerville C, Somerville S (1999). Plant functional genomics. Science 285:380–383. Sorensen JL, Auclair K, Kennedy J, Hutchinson CR, Vederas JC (2002). Transformations of cyclic nonaketides by Aspergillus terreus mutants blocked for lovastatin biosynthesis at the lovA and lovC genes. Organic & Biomolecular Chemistry 1:50–59. Spring O, Zipper R, Conrad J, Vogler B, Klaiber I, da Costa FB (2003). Sesquiterpene lactones from glandular trichomes of Viguiera radula (Heliantheae; Asteraceae). Phytochemistry 62:1185–1189. Starks CM, Back K, Chappell J, Noel JP (1997). Structural basis for cyclic terpene biosynthesis by tobacco 5-epi-aristolochene synthase. Science 277:1815–1820. Staunton J, Weissman KJ (2001). Polyketide biosynthesis: a millennium review. Natural Product Reports 18:380–416. Stephanopoulos G (1999). Metabolic fluxes and metabolic engineering. Metabolic Engineering 1:1–11. Stout JM, Boubakir Z, Ambrose SJ, Purves RW, Page JE (2011). The hexanoyl-CoA precursor for cannabinoid biosynthesis is formed by an acyl-activating enzyme in Cannabis sativa trichomes. Plant Journal 71:353–365. St-Pierre B, Vazquez-Flota FA, De Luca V (1999). Multicellular compartmentation of Catharanthus roseus alkaloid biosynthesis predicts intercellular translocation of a pathway intermediate. Plant Cell 11:887–900. Takahashi S, Yeo Y, Greenhagen BT, McMullin T, Song L, Maurina-Brunker J, Rosson R, Noel JP, Chappell J (2007). Metabolic engineering of sesquiterpene metabolism in yeast. 
Biotechnology and Bioengineering 97:170–181. Takahashi S, Yeo YS, Zhao Y, O’Maille PE, Greenhagen BT, Noel JP, Coates RM, Chappell J (2007). Functional characterization of premnaspirodiene oxygenase, a cytochrome P450 catalyzing regio- and stereo-specific hydroxylations of diverse sesquiterpene substrates. Journal of Biological Chemistry 282:31744–31754. Takahashi S, Zhao Y, O’Maille PE, Greenhagen BT, Noel JP, Coates R, Chappell J (2005). Kinetic and molecular analysis of 5-epi-aristolochene 1,3-dihydroxylase, a cytochrome P450 enzyme catalyzing successive hydroxylations of sesquiterpenes. Journal of Biological Chemistry 280:3686–3696. Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S (2011). MEGA5: Molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Molecular Biology and Evolution 28:2731–2739. Taylor N (1965). Plant drugs that changed the world (First Edition). Harper & Row, Canada. Teoh KH, Polichuk DR, Reed DW, Nowak G, Covello PS (2006). Artemisia annua L. (Asteraceae) trichome-specific cDNAs reveal CYP71AV1, a cytochrome P450 with a key role in the biosynthesis of the antimalarial sesquiterpene lactone artemisinin. FEBS Letters 580:1411–1416.

Treiber LR, Reamer RA, Rooney CS, Ramjit HG (1989). Origin of monacolin L from Aspergillus terreus cultures. Journal of Antibiotics (Tokyo) 42:30–36. Valmsen K, Boeglin WE, Järving R, Järving I, Varvas K, Brash AR, Samel N (2007). A critical role of non-active site residues on helices 5 and 6 in the control of prostaglandin stereochemistry at carbon 15. Journal of Biological Chemistry 282:28157–28163. Vaughn SF, Spencer GF (1993). Volatile monoterpenes as potential parent structures for new herbicides. Weed Science 41:114–119. Vyvyan JR (2002). Allelochemicals as leads for new herbicides and agrochemicals. Tetrahedron 58:1631–1646. Wani M, Taylor H, Wall M, Coggon P, McPhail A (1971). Plant antitumor agents. VI. The isolation and structure of taxol, a novel antileukemic and antitumor agent from Taxus brevifolia. Journal of the American Chemical Society 93:2325–2327. Weid M, Ziegler J, Kutchan TM (2004). The roles of latex and the vascular bundle in morphine biosynthesis in the opium poppy, Papaver somniferum. Proceedings of the National Academy of Sciences of the USA 101:13957–13962. Wesołowska A, Nikiforuk A, Michalska K, Kisiel W, Chojnacka-Wójcik E (2006). Analgesic and sedative activities of lactucin and some lactucin-like guaianolides in mice. Journal of Ethnopharmacology 107:254–258. Wilderman PR, Gay SC, Jang HH, Zhang Q, Stout CD, Halpert JR (2011). Investigation by site-directed mutagenesis of the role of cytochrome P450 2B4 non-active-site residues in protein–ligand interactions based on crystal structures of the ligand-bound enzyme. The FEBS Journal 279:1607–1620. Wink M (1988). Plant breeding: importance of plant secondary metabolites for protection against pathogens and herbivores. Theoretical and Applied Genetics 75:225–233. Wink M (2010). Introduction: biochemistry, physiology and ecological functions of secondary metabolites. 
Annual Plant Reviews 40:1–19. Wrongtrakul J, Udomsinprasert R, Ketterman A (2003). Non-active site residues Cys69 and Asp150 affected the enzymatic properties of glutathione S-transferase AdGSTD3-3. Insect Biochemistry and Molecular Biology 33:971–979. Xie X, Meehan MJ, Xu W, Dorrestein PC, Tang Y (2009). Acyltransferase mediated polyketide release from a fungal megasynthase. Journal of the American Chemical Society 131:8388–8389. Yamaguchi S (2008). Gibberellin metabolism and its regulation. Annual Review of Plant Biology 59:225–251. Yano JK, Wester MR, Schoch GA, Griffin KJ, Stout CD, Johnson EF (2004). The structure of human microsomal cytochrome P450 3A4 determined by X-ray crystallography to 2.05-Å resolution. Journal of Biological Chemistry 279:38091–38094. Yoshihara K, Ohta Y, Sakai T, Hirose Y (1969). Germacrene D, a key intermediate of cadinene group compounds and bourbonenes. Tetrahedron Letters 27:2263–2264.

Yoshikuni Y, Ferrin TE, Keasling JD (2006). Designed divergent evolution of enzyme function. Nature 440:1078–1082. Yue PYK, Huang Y, Wong RNS (2011). Ginsenoside Rg3 and Rh2: the anti-cancer and anti-angiogenic saponins from ginseng. In: Uh-Rahman A, Choudhary MI (Eds) Anti-angiogenesis drug discovery and development. Bentham Science, UAE. pp 34–57. Zhang J, Rosenberg HF, Nei M (1998). Positive Darwinian selection after gene duplication in primate ribonuclease genes. Proceedings of the National Academy of Sciences of the USA 95:3708–3713. Zhang S, Won YK, Ong CN, Shen HM (2005). Anti-cancer potential of sesquiterpene lactones: bioactivity and molecular mechanisms. Current Medicinal Chemistry – Anti-Cancer Agents 5:239–249. Zhang W, Tang Y (2008). Combinatorial biosynthesis of natural products. Journal of Medicinal Chemistry 51:2629–2633. Zhang Y, Teoh KH, Reed DW, Maes L, Goossens A, Olson DJH, Ross ARS, Covello PS (2008). The molecular cloning of artemisinic aldehyde Δ11(13) reductase and its role in glandular trichome-dependent biosynthesis of artemisinin in Artemisia annua. Journal of Biological Chemistry 283:21501–21508. Zheng QX, Xu ZJ, Sun XF, Guéritte F, Cessario M, Sun HD, Cheng CH, Hao XJ, Zhao Y (2003). Eudesmane derivatives and other sesquiterpenes from Laggera alata. Journal of Natural Products 66:1078–1081. Zidorn C (2008). Sesquiterpene lactones and their precursors as chemosystematic markers in the tribe Cichorieae of the Asteraceae. Phytochemistry 69:2270–2296.

Appendix: YEAST AS A PLATFORM FOR BIOCHEMICAL STUDIES AND DE NOVO SYNTHESIS OF NATURAL PRODUCTS

A.1. General introduction

“Natural products” is a common term for the enormously diverse array of specialized metabolites synthesized by fungi and plants. Although they are not directly involved in normal growth and development, they play critical roles in enhancing the fitness and survival of the species, and their use for human medicinal, agricultural, and industrial purposes has a long history and is of increasing importance. Natural products share a common theme in their biosynthesis: their fundamental backbones are formed from precursors supplied by basic metabolic pathways (i.e., glycolysis, citric acid cycle, pentose phosphate cycle, and shikimate pathway) through the activities of several enzyme families such as terpene synthases and polyketide synthases. These structures can function as specialized metabolites in their own right, but many of them are further chemically decorated (e.g., by acetylation, methylation, lipidation, oxidation, or epoxidation) for their specific roles in an organism through the co-ordinated catalysis of other enzyme families, including P450s.

Despite the diversity and critical roles of natural products in nature and human society, the majority of their biosynthetic pathways remain unknown, and almost all of them are still obtained by extraction from natural sources. However, recent progress in the use of heterologous hosts (e.g., Escherichia coli and Saccharomyces cerevisiae), synthetic biology, and advanced technologies for sequencing thousands of transcripts or even whole genomes has made it possible to characterize natural product biosynthesis at the molecular level and to produce desired metabolites.

In addition to understanding the biochemistry of STL biosynthesis in the Asteraceae, this thesis presents the use of genetically-modified strains of the yeast S. cerevisiae as a simple and reliable platform for biochemical studies and production of some other plant and fungal natural products.

A.2. Molecular characterization of LovA, a multi-functional P450 in lovastatin biosynthesis

A.2.1. Introduction

Lovastatin and its natural analogues and semi-synthetic derivatives (including compactin, simvastatin, and pravastatin) are potent drugs commonly prescribed for lowering cholesterol because of their inhibitory effects on 3-hydroxy-3-methylglutaryl coenzyme A reductase, a rate-limiting enzyme in the cholesterol biosynthetic route (Campbell and Vederas, 2010; Sadowitz et al., 2010).

In nature, lovastatin occurs in the filamentous fungus Aspergillus terreus, and a gene cluster encoding the enzymes necessary for lovastatin biosynthesis has been found (Kennedy et al., 1999). The pathway starts with the condensation of acetyl coenzyme A and malonyl coenzyme A units to form 4a,5-dihydromonacolin L acid (Figure A1, DML acid, 1a) by the iterative type-I polyketide synthase LovB together with the trans-acting enoyl reductase LovC (Ma et al., 2009; Gao et al., 2009; Xie et al., 2009). This step is followed by oxidation to 3α-hydroxy-3,5-dihydromonacolin L acid (2a), dehydration to monacolin L acid (ML acid, 3a), and another oxidation to monacolin J acid (MJ acid, 4a), all catalyzed by yet-to-be-determined enzyme(s). Finally, a 2-methylbutyryl side chain synthesized by the second iterative type-I polyketide synthase LovF is attached to the C8 position of MJ acid by the acyltransferase LovD to yield lovastatin (5a) (Kennedy et al., 1999).

Although the enzyme(s) catalyzing the conversion of DML acid (1a) to MJ acid (4a) had not been elucidated, the step from hydroxy-DML acid (2a) to ML acid (3a) was known to be a non-enzymatic dehydration (Treiber et al., 1989; Nakamura et al., 1990). Importantly, Komagata et al. (1989) used microsomes from Monascus ruber, a lovastatin-producing filamentous fungus, and a carbon monoxide inhibition assay to demonstrate that the oxidation of ML acid (3a) to MJ acid (4a) is catalyzed by cytochrome P450 monooxygenase(s) (P450). The lovastatin biosynthetic gene cluster indeed contains two P450s, LovA and ORF17 (Figure A1), and a genetic study of A. terreus carrying a mutant LovA showed the complete absence of the metabolites downstream of DML acid (1a) (Sorensen et al., 2002). These results strongly suggest that the oxidations of DML acid (1a) to hydroxy-DML acid (2a), and of ML acid (3a, the non-enzymatic dehydration product of 2a) to MJ acid (4a), are cytochrome P450-dependent, and that LovA is involved in one or both of them.

A.2.2. Isolation and expression of LovA in yeast

A previous attempt to express LovA in Escherichia coli was unsuccessful. Therefore, the protease-deficient yeast strain Saccharomyces cerevisiae YPL154C::PEP4 KO (Chapter 2) was employed as an alternative host.

To provide a redox partner for LovA, the A. terreus cytochrome P450 reductase (CPR) was identified by searching with other eukaryotic CPRs as queries. A. terreus CPR was then amplified using the forward primer 5’-TACGGGATCCAAAATGGCTCAACTCGACACTCTCGA-3’ and the reverse primer 5’-TACGCTCGAGTGACCACACGTCCTCCTGGTA-3’, digested with BamHI and XhoI, and cloned into the BamHI and SalI positions of the pESC-Leu2d plasmid (see Chapter 2). Microsomes of yeast containing pESC-Leu2d::CPR were assayed and demonstrated reduction activity (7.3-fold higher than microsomes of yeast transformed with pESC-Leu2d only).
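The primer design described above can be sanity-checked in a few lines. The following is a minimal sketch that scans the two CPR primers for the recognition sequences of BamHI (GGATCC) and XhoI (CTCGAG); the function and variable names are illustrative assumptions and are not part of the original work.

```python
# Recognition sequences of the two enzymes used to digest the CPR amplicon.
SITES = {"BamHI": "GGATCC", "XhoI": "CTCGAG"}

# Primer sequences as quoted in the text (5' to 3').
CPR_FORWARD = "TACGGGATCCAAAATGGCTCAACTCGACACTCTCGA"
CPR_REVERSE = "TACGCTCGAGTGACCACACGTCCTCCTGGTA"


def find_sites(primer, sites=SITES):
    """Return {enzyme: 0-based position} for each recognition site present."""
    return {name: primer.find(seq) for name, seq in sites.items() if seq in primer}


forward_hits = find_sites(CPR_FORWARD)
reverse_hits = find_sites(CPR_REVERSE)
print("forward primer:", forward_hits)
print("reverse primer:", reverse_hits)
```

Each primer carries exactly one of the two sites near its 5’ end, preceded by a few extra bases, which is consistent with directional cloning of the amplicon.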

Figure A1. Lovastatin biosynthesis in Aspergillus terreus. A: Lovastatin biosynthetic gene cluster. Black arrows indicate genes involved in lovastatin biosynthesis, and gray arrows indicate genes of unknown function. The two P450s are represented by open triangles. B: Proposed lovastatin biosynthetic pathway. Compounds are shown in their hydroxy acid forms (a), but their corresponding lactone forms (b) have also been isolated. Adapted from Barriuso et al. (2012).

The forward primer 5’-ACGTCTAGAATGACTGTCGACGCGCTCACA-3’ and the reverse primer 5’-ACGTCTAGAGCTAGTGAACCAGGAAGGCG-3’ were used to amplify LovA. The amplicon was digested with XbaI and cloned into the SpeI position of the pESC-Leu2d::CPR vector, resulting in pESC-Leu2d::CPR/LovA for expression in yeast.

Despite the use of a eukaryotic host, LovA expression was undetectable in immunoblot analysis using antibodies against the C-terminal FLAG tag fused to LovA, as encoded by the pESC-Leu2d::CPR/LovA vector. As LovA transcripts could be detected by reverse transcriptase-PCR, inefficient translation or sub-cellular targeting in yeast was speculated to cause the low expression. This was overcome by synthesizing a codon-optimized LovA for yeast expression (synthetic LovA = S-LovA), and by replacing the 58 N-terminal amino acids of LovA with the 43 N-terminal residues of the well-expressed LsGAO1 (Chapter 3) for proper endoplasmic reticulum targeting (hybrid LovA = H-LovA). A combination of these two versions, hybrid synthetic LovA (HS-LovA), was also made. Immunoblot analysis of microsomes from yeast harbouring the pESC-Leu2d dual expression vectors showed that both S-LovA and H-LovA could be clearly detected, and the highest expression was observed for the combined version HS-LovA (data not shown). As H-LovA and HS-LovA had the same amino acid sequence, only HS-LovA and S-LovA were used for further investigation.

A.2.3. Yeast feeding assay of LovA

Activity of LovA, in yeast transformed with pESC-Leu2d::CPR/S-LovA and pESC-Leu2d::CPR/HS-LovA, was evaluated by feeding the yeast cultures with 100 µM DML acid (1a, in the lithium salt form). After 8 hr, HPLC with a diode array detector (DAD) at 240 nm revealed four new peaks compared to the negative control (yeast with pESC-Leu2d::CPR only).

The retention times and UV spectra of these peaks matched those of authentic standards for ML acid and lactone (3a and 3b), and MJ acid and lactone (4a and 4b) (Figure A2). LC–MS analyses in both negative and positive ionization modes also showed that the masses of these four products were consistent with those of the standards. In addition, S-LovA demonstrated a higher activity than HS-LovA, and thus was selected as the sole version of LovA for further experiments.

Figure A2. HPLC-UV metabolite profiles of yeast feeding assays. The substrate DML acid (1a) and its lactone form (1b) could not be detected at 240 nm. The resulting products are ML acid (3a), ML lactone (3b), MJ acid (4a), and MJ lactone (4b). IS is the internal standard (cinnamic acid, 20 µM). Adapted from Barriuso et al. (2012).

When DML lactone (1b) was fed to the transgenic yeast cultures, LC–MS analysis showed almost no product compared to the DML acid (1a) feeding assay, suggesting that the lactone form of DML is not a suitable substrate for LovA. Also, DML acid feeding assays performed in acidic conditions (pH ~ 3.0 after 24 hr) produced ML and MJ lactones (3b and 4b) almost exclusively, while ML and MJ acids (3a and 4a) were observed as the dominant peaks on HPLC-UV when the yeast cultures were buffered (pH ~ 6.8 after 24 hr). These data indicate that LovA expressed in yeast converted DML acid (1a) to ML acid (3a) and MJ acid (4a), and that the lactone forms of ML and MJ (3b and 4b) were by-products of the acidic conditions in yeast cultures.

LovA products were prepared for structural analyses by scaling up the culture (1 L) of yeast expressing CPR/S-LovA, maintaining the pH at 6.8–7.0, and feeding DML acid. The putative ML and MJ acids (3a and 4a) were isolated using HPLC. NMR analysis of the putative MJ acid (4a) showed signals identical to those of the authentic MJ acid standard. The structure of the other product was confirmed to be ML acid (3a) with the following NMR signals: ¹H NMR (700 MHz, CD₃OD) δ 5.85 (dd, 1H, J = 13.9, 9.7 Hz), 5.70 (m, 1H), 5.39 (br s, 1H), 4.11 (m, 1H), 3.78 (m, 1H), 2.38 (dd, 1H, J = 15.5, 5.0 Hz), 2.34–2.28 (m, 3H), 2.05 (m, 1H), 1.82–1.69 (m, 3H), 1.65–1.58 (m, 3H), 1.50 (m, 2H), 1.40–1.28 (m, 2H), 1.20 (m, 1H), 0.98 (d, 3H, J = 7.2 Hz), 0.89 (d, 3H, J = 7.1 Hz); ¹³C NMR (125 MHz, CD₃OD) δ 138.4, 134.2, 130.8, 129.5, 71.3, 68.9, 44.9, 44.8, 43.6, 36.4, 35.8, 32.8, 30.6, 30.1, 26.0, 23.6, 21.7, 14.2 (the carbonyl carbon was not observed owing to its low intensity, a consequence of the long relaxation time of quaternary carbon atoms). High-resolution FT-ICR-MS analyses of the putative ML and MJ acids (3a and 4a) also matched the theoretical masses of these compounds (within 0.4 ppm deviation).
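As a quick consistency check on the NMR assignment above, the listed proton integrals and carbon signals can be tallied against the expected atom counts. The sketch below assumes that ML acid has the molecular formula C19H30O4 with three exchangeable protons (two hydroxyl and one carboxyl) that are not observed in CD₃OD; this formula is an assumption consistent with the nominal MW of 322 and is not stated explicitly in the text.

```python
# Proton counts from the 1H NMR list above, in order of appearance.
PROTON_INTEGRALS = [1, 1, 1, 1, 1, 1, 3, 1, 3, 3, 2, 2, 1, 3, 3]

# 13C chemical shifts (ppm) from the list above.
CARBON_SHIFTS = [138.4, 134.2, 130.8, 129.5, 71.3, 68.9, 44.9, 44.8, 43.6,
                 36.4, 35.8, 32.8, 30.6, 30.1, 26.0, 23.6, 21.7, 14.2]

# Assumed composition of ML acid (C19H30O4); see the lead-in for caveats.
N_CARBONS = 19
N_HYDROGENS = 30
N_EXCHANGEABLE_H = 3   # two OH and one COOH, exchanged with the deuterated solvent
N_UNOBSERVED_C = 1     # the carbonyl carbon, reported as not observed

observed_h = sum(PROTON_INTEGRALS)
observed_c = len(CARBON_SHIFTS)

print("protons observed:", observed_h, "expected:", N_HYDROGENS - N_EXCHANGEABLE_H)
print("carbons observed:", observed_c, "expected:", N_CARBONS - N_UNOBSERVED_C)
```

Under these assumptions the tallies agree: 27 non-exchangeable protons and 18 observed carbons.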

A.2.4. In vitro assay for LovA activity

Microsomes of yeast harbouring pESC-Leu2d::CPR/S-LovA were extracted for the in vitro enzyme assay. In general, (–)-LC/ESI-MS analysis showed results consistent with the in vivo feeding assay. No products were detected when DML lactone (1b) was used as substrate, whereas ML acid (3a, m/z 321) and MJ acid (4a, m/z 337) were clearly detected when DML acid (1a) was used as substrate (Figure A3).
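The ions monitored in negative-mode LC–MS (Figure A3) are the deprotonated molecules [M–H]⁻, so each nominal m/z is one unit below the nominal molecular weight. The sketch below computes these values from monoisotopic atomic masses. The molecular formulas are assumptions chosen to match the nominal MWs quoted in this appendix, and the "measured" value in the ppm example is hypothetical.

```python
# Monoisotopic masses (u) of the most abundant isotopes, and the electron mass.
MASS = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}
ELECTRON = 0.00054858

# Assumed formulas as (C, H, O) counts, matching the nominal MWs in the text.
FORMULAS = {
    "DML acid (1a)": (19, 32, 4),          # MW 324
    "ML acid (3a)": (19, 30, 4),           # MW 322
    "MJ acid (4a)": (19, 30, 5),           # MW 338
    "hydroxy-DML acid (2a)": (19, 32, 5),  # MW 340
}


def monoisotopic_mass(c, h, o):
    return c * MASS["C"] + h * MASS["H"] + o * MASS["O"]


def deprotonated_mz(c, h, o):
    # [M-H]-: remove one hydrogen atom, then add back its electron.
    return monoisotopic_mass(c, h, o) - MASS["H"] + ELECTRON


def ppm_error(measured, theoretical):
    return (measured - theoretical) / theoretical * 1e6


nominal = {name: int(deprotonated_mz(*f)) for name, f in FORMULAS.items()}
for name, mz in nominal.items():
    print(f"{name}: [M-H]- nominal m/z {mz}")

# Hypothetical measurement lying 0.3 ppm above the theoretical value.
theory = deprotonated_mz(*FORMULAS["ML acid (3a)"])
measured = theory * (1 + 0.3e-6)
print(f"deviation: {ppm_error(measured, theory):.2f} ppm")
```

The same `ppm_error` helper expresses the kind of mass-accuracy criterion quoted for the FT-ICR-MS analysis (within 0.4 ppm).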

To test whether ML acid (3a) is an intermediate or a shunt product irreversibly released from LovA during the conversion of DML acid (1a) to MJ acid (4a), another LovA in vitro enzyme assay was performed with ML acid (3a) as substrate. (–)-LC/ESI-MS analysis revealed a clear conversion of ML acid (3a) to MJ acid (4a) in microsomes of yeast transformed with pESC-Leu2d::CPR/S-LovA and no conversion in the negative control (Figure A3). A kinetic study of the microsomes showed a Vmax of 9.1 ± 0.5 pmol min⁻¹ mg⁻¹ and a KM of 6.2 ± 1.1 µM, which falls within a reasonable physiological concentration range for metabolic intermediates. This indicates that ML acid (3a) is a true intermediate in lovastatin biosynthesis.
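To illustrate what the reported kinetic constants imply, the sketch below evaluates the Michaelis–Menten rate equation v = Vmax·[S]/(KM + [S]) at a few substrate concentrations. It assumes simple Michaelis–Menten behaviour, which the text implies by reporting Vmax and KM but does not state explicitly; the chosen concentrations are arbitrary examples.

```python
VMAX = 9.1  # pmol min^-1 mg^-1 microsomal protein (from the text)
KM = 6.2    # µM ML acid (from the text)


def rate(substrate_um, vmax=VMAX, km=KM):
    """Michaelis-Menten rate at a given substrate concentration (µM)."""
    return vmax * substrate_um / (km + substrate_um)


for s in (1.0, KM, 25.0, 100.0):
    v = rate(s)
    print(f"[S] = {s:6.1f} µM  ->  v = {v:.2f} pmol/min/mg ({v / VMAX:.0%} of Vmax)")
```

At [S] = KM the rate is exactly half of Vmax, and at the 100 µM concentration used in the feeding assays the enzyme would be operating close to saturation under this model.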

Unlike in the in vivo feeding assay, however, two additional peaks with m/z 339 were present in the in vitro product profile. It was speculated that these new peaks could be 3α-hydroxy-3,5-dihydromonacolin L acid (2a), a known intermediate between DML acid and ML acid, and its isomer 4aα-hydroxy-4a,5-dihydromonacolin L acid (MW = 340). They could also be α- and β-3,4-epoxy-dihydromonacolin L (MW = 340), as P450s can catalyze epoxidation (Figure A4).

Figure A3. (–)-LC–MS analyses of in vitro LovA enzyme assays. Ions of m/z 321, 323, 337, and 339 were selected to detect ML acid (3a, MW = 322), DML acid (1a, MW = 324), MJ acid (4a, MW = 338), and hydroxy-3,5-dihydromonacolin L acids or other potential intermediates (MW = 340), respectively. A: DML acid (1a) as substrate. B: ML acid (3a) as substrate. Adapted from Barriuso et al. (2012).

Figure A4. Proposed mechanism of LovA. The two proposed intermediates with m/z 339 (MW = 340) in (–)-LC–MS analysis are 4aα-hydroxy-4a,5-dihydromonacolin L and 3α-hydroxy-3,5-dihydromonacolin L acids. Adapted from Barriuso et al. (2012).

To examine whether the two new products were α- and β-3,4-epoxy-dihydromonacolin L, these compounds were chemically synthesized and compared with the LovA in vitro assay products. Retention times and MS/MS fragmentation patterns on LC–MS ruled out this possibility (data not shown). The two m/z 339 compounds could not be isolated in sufficient amounts for NMR analysis. From these data, it is proposed that LovA catalyzes the conversion of DML acid (1a) to ML acid (3a) by first abstracting a hydrogen at the C4a position, followed by oxygen rebound onto the resulting allylic radical to install a hydroxyl group at C3 (Figure A4), producing the intermediate 3α-hydroxy-3,5-dihydromonacolin L acid (2a) (MW = 340). Oxygen rebound could also occur, in a similar manner, to introduce a hydroxyl group at C4a, giving the other intermediate 4aα-hydroxy-4a,5-dihydromonacolin L acid (MW = 340).

A.2.5. Summary

Using genetically-modified yeast as a heterologous host, this research unambiguously demonstrated that LovA catalyzes the two consecutive oxidations of DML acid to ML acid and MJ acid, closing the remaining biochemical gaps in the biosynthetic pathway of lovastatin. Although other genes in the lovastatin biosynthetic gene cluster still await functional characterization, elucidation of the complete set of enzymes necessary for lovastatin biosynthesis paves the way for metabolic engineering towards more efficient production of this valuable compound.

A.3. De novo synthesis of high-value plant sesquiterpenoids in yeast

A.3.1. Introduction

Plant sesquiterpenoids constitute a class of enormously diverse naturally-occurring 15-carbon structures derived from the common precursor FPP. Their functions are as diverse as their structures, with critical roles in plant physiology and human health and wellness (Chapter 1). Limited natural resources and difficult chemical synthesis have made the production of sesquiterpenoids very costly for research and application. To overcome these restrictions, the yeast strain EPY300 has been developed to produce an elevated level of farnesyl pyrophosphate (Ro et al., 2006; Ro et al., 2008). Expressing an STS, separately or in combination with sesquiterpene-modifying enzymes, particularly P450s, in this metabolically engineered yeast platform allows the synthesis of desired sesquiterpenoids in amounts sufficient for chemical and biological studies. Because expression is driven by galactose-induced promoters, the cost of de novo sesquiterpenoid production in this yeast system is negligible. This work applied the simple procedure of expressing an STS to produce a high abundance of valencene, an important aroma component of citrus (Furusawa et al., 2005), and 5-epi-aristolochene, a precursor of several phytoalexins against fungal pathogens. Furthermore, the same yeast system was used to produce the phytoalexin dihydroxy-5-epi-aristolochene (capsidiol), a critical defensive compound in tobacco and chili pepper (Maldonando-Bonilla et al., 2008), by additionally expressing a P450 and CPR.

A.3.2. De novo production of sesquiterpenes: valencene and 5-epi-aristolochene

The high-copy pESC-Leu2d plasmid was employed to express valencene and 5-epi-aristolochene synthases under the GAL1 promoter. Grapefruit (Citrus x paradisi) flavedo was used to isolate valencene synthase (CpVS) cDNA with the forward primer 5’-ATCGAGGGCCCGCCATGTCGTCTGGAGAAACATTTCG-3’ and the reverse primer 5’-ACTCGAGTCAAAATGGAACGTGGTCTCCTAG-3’. These primers were designed based on the C. sinensis valencene synthase gene (GenBank ID: AF441124) (Sharon-Asa et al., 2003). The cDNA of NtEAS was isolated from tobacco (Nicotiana tabacum) with the forward primer 5’-ATCGAGGGCCCGCCATGGCCTCAGCAGCAGTTGCAAAC-3’ and the reverse primer 5’-ACCATCTCGAGTCAAATTTTGATGGAGTCCACAAGT-3’. PCR products obtained with these primers were digested with ApaI and XhoI and ligated into the corresponding positions of the plasmid to generate pESC-Leu2d::CpVS and pESC-Leu2d::NtEAS.

A previous study showed that sunflower (H. annuus) δ-cadinene synthase could only be expressed properly in E. coli and yeast as an N-terminal fusion with thioredoxin (Göpfert et al., 2009). As thioredoxin fusion reduces the formation of inclusion bodies in E. coli and may also promote proper folding and stability of STSs in yeast, the thioredoxin gene (Trx) was fused to the N-terminus of CpVS and NtEAS to evaluate its effect on these enzymes. This was achieved by including the thioredoxin gene in the GAL1 promoter–MCS2 cassette of pESC-Leu2d. Trx was amplified from the commercial pET–48b(+) vector (EMD Biosciences) with the forward primer 5’-AATGTAGATCTGCCATGAGCGATAAAATTATTCACC-3’ and the reverse primer 5’-TCAGTAGATCTGGCCAGGTTAGCGTCGAGGAACTCTTTC-3’. The amplicon was digested with BglII and BamHI and ligated to pESC-Leu2d under the GAL1 promoter. As the start codons of CpVS and NtEAS were placed immediately after the ApaI site, these ORFs were in frame with Trx. The CpVS and NtEAS encoded in pESC-Leu2d::Trx–CpVS and pESC-Leu2d::Trx–NtEAS were translationally fused with Trx by a 10-residue linker of Arg-Ser-Val-Ile-Arg-Leu-Thr-Ile-Gly-Pro.
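The in-frame arrangement described above can be illustrated by translating the forward primers of CpVS and NtEAS starting at the ApaI site. The sketch below assumes that the reading frame coming from Trx treats the ApaI site (GGGCCC) as two codons, which is consistent with the linker ending in Gly-Pro; the helper names are illustrative and the standard genetic code is used.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

APAI_SITE = "GGGCCC"

# Forward primers as quoted in the text (5' to 3').
PRIMERS = {
    "CpVS": "ATCGAGGGCCCGCCATGTCGTCTGGAGAAACATTTCG",
    "NtEAS": "ATCGAGGGCCCGCCATGGCCTCAGCAGCAGTTGCAAAC",
}


def translate_from_apai(primer):
    """Translate a primer in the frame that starts at its ApaI site."""
    coding = primer[primer.index(APAI_SITE):]
    usable = len(coding) - len(coding) % 3  # drop any incomplete final codon
    return "".join(CODON_TABLE[coding[i:i + 3]] for i in range(0, usable, 3))


peptides = {name: translate_from_apai(seq) for name, seq in PRIMERS.items()}
for name, peptide in peptides.items():
    print(name, peptide)
```

In this frame both primers read Gly-Pro from the ApaI site, then one Ala codon (GCC) carried by the primer, then the start Met of the synthase, with no stop codon in between.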

The aforementioned plasmids were transformed into EPY300 yeast. Transgenic yeast was cultivated for 48 hr with a dodecane overlay to trap volatile sesquiterpenes (Figure 2.1). GC–MS analysis of the dodecane fraction showed that EPY300 expressing CpVS produced 5.3 µg mL⁻¹ valencene while EPY300 containing pESC-Leu2d::NtEAS yielded 39.2 µg mL⁻¹ 5-epi-aristolochene (Figure A5). The difference in yields between valencene and 5-epi-aristolochene might result from different kinetic efficiencies of the two STSs. Takahashi et al. (2007) reported that the kinetic efficiency of NtEAS was 10-fold higher than that of a valencene synthase that shares 99% identity with the CpVS reported here. However, other factors including enzyme folding, stability, and solubility cannot be excluded.

Figure A5. Effect of thioredoxin fusion on in vivo STS productivity in yeast. Values are mean ± SD (n = 3). CpVS: valencene synthase from grapefruit (Citrus x paradisi). NtEAS: 5-epi-aristolochene synthase from tobacco (Nicotiana tabacum). Adapted from Nguyen et al. (2012).

When the N-terminal thioredoxin-fused version of NtEAS was expressed in EPY300, a remarkable increase in the 5-epi-aristolochene level to approximately 400 µg mL⁻¹ (about 10-fold higher than in the yeast expressing NtEAS alone) was observed. These data clearly showed the usefulness of thioredoxin fusion for improving sesquiterpene yield in yeast. This positive effect of thioredoxin fusion, however, is not generally applicable to all STSs. Indeed, EPY300 expressing Trx–CpVS showed a decrease in valencene synthesis to 50% of the yield seen in yeast expressing CpVS only (Figure A5). The low yield in the latter case might be fundamentally limited by the low kinetic efficiency of CpVS. In addition, the N-terminal thioredoxin fusion might interfere with optimal CpVS activity in yeast. In the case of NtEAS, non-optimal folding, stability, and solubility might occur without thioredoxin fusion, while in CpVS, the structure might be distorted by thioredoxin fusion. Although more studies are required to understand the mechanism by which N-terminal thioredoxin fusion affects STS activities in yeast, it remains an option to consider for improving sesquiterpene production in yeast.
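The yield comparisons in this section reduce to simple ratios, which the sketch below works through using only the values quoted in the text. The Trx–CpVS yield is derived here from the stated 50% figure rather than taken from a reported measurement, and the dictionary keys are illustrative labels.

```python
# Yields in µg per mL of culture, as quoted in the text.
YIELDS = {
    "CpVS": 5.3,         # valencene
    "NtEAS": 39.2,       # 5-epi-aristolochene
    "Trx-NtEAS": 400.0,  # 5-epi-aristolochene, approximate value
}

# Derived (not measured): Trx-CpVS is stated to give 50% of the CpVS yield.
YIELDS["Trx-CpVS"] = 0.5 * YIELDS["CpVS"]


def fold_change(value, reference):
    return value / reference


eas_vs_vs = fold_change(YIELDS["NtEAS"], YIELDS["CpVS"])
trx_gain_eas = fold_change(YIELDS["Trx-NtEAS"], YIELDS["NtEAS"])
trx_gain_vs = fold_change(YIELDS["Trx-CpVS"], YIELDS["CpVS"])

print(f"NtEAS vs CpVS:       {eas_vs_vs:.1f}-fold")
print(f"Trx-NtEAS vs NtEAS:  {trx_gain_eas:.1f}-fold")
print(f"Trx-CpVS vs CpVS:    {trx_gain_vs:.1f}-fold")
```

The ratios show the opposite effects of the fusion on the two synthases: roughly a ten-fold gain for NtEAS and a two-fold loss for CpVS.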

A.3.3. De novo production of hydroxylated sesquiterpene: capsidiol

Capsidiol, or 1,3-dihydroxy-5-epi-aristolochene, is a dihydroxy-sesquiterpene produced from 5-epi-aristolochene in chili pepper and tobacco by the P450 5-epi-aristolochene dihydroxylase (CYP71D20, or NtEAH) (Takahashi et al., 2005). To reconstitute the capsidiol biosynthetic route from FPP in yeast, the dual expression pESC-Leu2d vector was modified to include an additional expression cassette under the GAL1 promoter (Figure 2.1), allowing the co-expression of NtEAS and NtEAH with its redox partner CPR.

The ORF of NtEAH (GenBank ID: AF368376) was amplified from tobacco leaf cDNA with the forward primer 5’-ACTTATTCTAGAGCCATGCAATTCTTCAGCTTGGTTTCC-3’ and the reverse primer 5’-TAGTTTTCTAGAGCCTCTCGAGAAGGTTGATAAGGAGTGG-3’. The amplicon was digested with XbaI and ligated into the SpeI position of pESC-Leu2d under the GAL10 promoter. Artemisia annua CPR was subcloned from pESC-Ura::CPR (Ro et al., 2006) into pESC-Leu2d::NtEAH using BamHI and NheI to generate the dual expression vector pESC-Leu2d::CPR/NtEAH. As the thioredoxin-fused NtEAS yielded a significantly higher abundance of 5-epi-aristolochene compared to the non-fused enzyme, this fusion form was selected for co-expression with NtEAH and CPR. The expression cassette of Trx–NtEAS from pESC-Leu2d::Trx–NtEAS was inserted into the dual expression vector pESC-Leu2d::CPR/NtEAH using the method described in Chapter 2, yielding the triple expression vector pESC-Leu2d::Trx–NtEAS/CPR/NtEAH.

EPY300 transformed with the triple expression vector was cultivated and its capsidiol yield was monitored for up to 120 hr. Sesquiterpenes, as hydrocarbons, are poorly soluble in water, but they can diffuse through the yeast cell membrane and are then sequestered by dodecane, which would remove the substrate from the P450. Therefore, a dodecane overlay was not used in this case, to avoid interference with the production of oxygenated sesquiterpenoids.

Both capsidiol (7) and the substrate 5-epi-aristolochene (6) were clearly detected in GC–MS analysis of the ethyl acetate extract from the culture. In addition, farnesol (9), a derivative of FPP, and an unknown peak (8) were also present (Figure A6.A). The parent ion in the electron impact mass fragmentation pattern of peak 8 had an m/z of 220, indicating that this might be a hydroxy-5-epi-aristolochene, a one-step oxidation product of 5-epi-aristolochene (Figure A6.B).

The metabolite profiles of capsidiol-producing yeast varied depending on the culture medium (Figure A6.C). In synthetic dropout (SC) medium, the capsidiol yield increased relatively fast in the first 48 hr, approached a plateau at 72 hr, and marginally increased to a final yield of 35.7 µg mL⁻¹ after 120 hr of culture. However, when the yeast was cultured in YPA (rich) medium, an exponential increase in capsidiol yield was observed between 48 and 72 hr. Capsidiol reached its highest level of approximately 250 µg mL⁻¹, about seven-fold higher than that in SC medium, but the accumulation of farnesol and compound 8 also became prominent.

The stability of the pESC-Leu2d vector allows de novo synthesis of natural products in yeast cultured in both SC (selective) and YPA (non-selective) media. This work showed successful reconstitution of one-gene (STS alone) and three-gene (STS, P450, and CPR) biosynthetic routes. The same system has also been used for studying and producing natural products by co-expressing three genes (Chapters 3 and 4) or even four genes (STS, two P450s, and CPR) (Ikezawa et al., 2011) in both SC and YPA media.

The use of YPA rich medium can yield a significantly higher abundance of desired products, as described earlier. However, Ro et al. (2008) reported that when artemisinic acid, a sesquiterpene carboxylic acid, was produced using the EPY300 platform, ATP-binding cassette (ABC) transporter genes were substantially induced, and plasmid stability became very low because there is no selection pressure for yeast to maintain the pESC-Leu2d plasmid in rich medium. Apparently, the cytotoxicity of the sesquiterpenoids produced in yeast has a large impact on the usefulness of YPA medium. Nevertheless, YPA medium can be considered a simple way to evaluate the possibility of improving sesquiterpenoid yield as well as the possible cytotoxicity of the target sesquiterpenoids.

A.3.4. Summary

The yeast platform described in this work uses common sugars (glucose and galactose) for de novo synthesis of sesquiterpenoids at up to hundreds of µg mL⁻¹, with the production of the valuable sesquiterpenoids valencene, 5-epi-aristolochene, and capsidiol as examples. The triple expression vector presents a simple “plug-and-play” system for producing sesquiterpenes and oxygenated sesquiterpenes in yeast at sufficient abundance for structural analysis and further biochemical studies.

Figure A6. Production of capsidiol and other oxygenated compounds in engineered yeast. A: GC–MS metabolite profile of EPY300 expressing Trx–NtEAS/CPR/NtEAH after 48 hr of culture in YPA medium, revealing the de novo syntheses of 5-epi-aristolochene (6), capsidiol (7), an unknown compound (8), and farnesol (9). B: Electron impact mass fragmentation pattern of compound 8, putatively a hydroxy-5-epi-aristolochene (MW = 220). C: Time-course yields and relative abundance of capsidiol and other oxygenated compounds in two different media (mean ± SD, n = 5). Quantification was based on total ion counts in GC–FID analysis. Adapted from Nguyen et al. (2012).
